Compiling LLVM for P2 on Windows (Updated!)

rogloh · 2026-04-15 13:38

I just dumped the symbols, and wow, yours includes a lot of extra stuff. I see you are setting the clock but not much more than that compared with my example. Setting the clock seems to have pulled in a lot of extra string functionality as well as time functions. I think you may well have used different compiler command-line settings. This is what I used to build my ELF file:

clang -I. -I../.. -Ibuild -Wall -Werror --target=p2 -fno-jump-tables -c -fdata-sections -ffunction-sections -o main.o main.c
clang -v --target=p2 -Wl,--gc-sections -Wl,-L/Users/roger/Applications/p2llvm/libc/lib -Wl,-L/Users/roger/Applications/p2llvm/libp2/lib -o main.elf main.o

EDIT: Yep, after using this on your source I get the symbols in out2.txt, which is a smaller image than your original .elf file dumped in out.txt. It might be the --gc-sections option doing this. EDIT2: Yes, it is that option that makes the difference.

Rayman · 2026-04-15 18:12

Last time I went through this, I was told that the .elf doesn't really reflect the size of the actual binary...

Seems you have to convert the .elf to a .bin to see exactly how big it is on chip?

Rayman · 2026-04-15 22:03

Used web search to figure out how to convert .elf to .bin

Rayman · 2026-04-15 22:10

Built it the way shown in post #32 above and the size is smaller...

Rayman · 2026-04-15 22:11

@rogloh Guess I can look at what you posted in other thread, but...

Are the changes you made just to the lib files? Or, to the main LLVM files too?

Wuerfel_21 · 2026-04-15 22:54

32k for a hello world is still ridiculous.
I always find it annoying that the typical compiler->linker arrangement is unable to really provide readable listings that actually correspond to the output binary. Very hard to figure out what's actually in your code.

(don't read this post as me being all negative!)

Rayman · 2026-04-15 23:33

@Wuerfel_21 code is posted above…

Would be interesting to compare binary size with Flexprop…

rogloh · 2026-04-15 23:36

@Rayman said:
@rogloh Guess I can look at what you posted in other thread, but...

Are the changes you made just to the lib files? Or, to the main LLVM files too?

Details are in the other thread, so it's best to read it. But yes, it has changes to the libs (which is what I provided you) and also to the LLVM source, which needs fixes for the C modulus "%" operation (otherwise it crashes LLVM) as well as other CORDIC dependency fixes etc. You really should take those file changes and update your own LLVM build to resolve these. Also, there are still cases where disassembling random bytes, say with llvm-objdump -D, will crash the tool because the disassembler doesn't know about ALL P2 instructions yet. But if you disassemble genuine P2 compiled C code with llvm-objdump -d it's fine.

@Wuerfel_21 said:
32k for a hello world is still ridiculous.

Often the case with C with libs included.

I always find it annoying that the typical compiler->linker arrangement is unable to really provide readable listings that actually correspond to the output binary. Very hard to figure out what's actually in your code.

I find in this case that objdump -d is generally okay. I get a good disassembly listing of all P2 code in the binary. But if you want to see the symbols for global data accesses, you don't see them used in the code; it's just absolute read/write hex addresses, which is not nice. You only see data symbols and their addresses in the symbol table. They really should cross-reference them back in the disassembly listing IMO, or maybe that's just not implemented in the P2 port right now. Also, it'd be really nice to have a way in the listing to see which C function arguments are being accessed in the stack frame, or which initial registers they get copied into, by somehow referring them back to the source code. That's a bit tricky if there's no way to carry it through in the .elf file. From memory, I think enabling debugging helps pass more info down through the intermediate files.

main.elf:       file format elf32-p2

Disassembly of section .text:

00000000 <__entry>:
       0: f8 a1 03 fb                    rdlong r0, ptra        
       4: 10 00 80 fd                    jmp #\16
                ...

00000040 <__start0>:
      40: f8 a1 03 fb                    rdlong r0, ptra        
      44: 98 00 00 ff                    augs #152
      48: 28 a1 07 f6                    mov r0, #296   
      4c: d0 a1 03 fb                    rdlong r0, r0  
      50: 02 a0 97 fb                    tjz r0, #2
      54: 00 00 90 ff                    augd #1048576
      58: 00 fe 65 fd                    hubset #255
      5c: 00 00 00 ff                    augs #0
      60: 68 a0 07 f6                    mov r0, #104   
      64: d0 01 e8 fc                    coginit #0, r0 

00000068 <__start>:
      68: f8 a1 03 fb                    rdlong r0, ptra        
      6c: 98 00 00 ff                    augs #152
      70: 38 a5 07 f6                    mov r2, #312   
      74: 29 fe 67 fd                    setq2 #511
      78: 01 00 00 ff                    augs #1
      7c: 00 00 04 fb                    rdlong $0x000, #0

Rayman · 2026-04-16 10:23

Guess it’d be interesting to use spin2cpp on something and see if clang can compile it….

iseries · 2026-04-16 10:34

In the other thread it was noted that the elf files were bloated because they clear all of memory and not just the code.

I don't remember if there was an option to not do that or I built a loader that removed that.

Mike

Rayman · 2026-04-16 11:48

Think I have a minimum build folder (at least compiles hello.c).

To make .o from .c:
clang -I. -I./sys -Ibuild -Wall -Werror --target=p2 -fno-jump-tables -c -fdata-sections -ffunction-sections -o hello.o hello.c

To make .elf from .o
clang -v --target=p2 -Wl,--gc-sections -Wl,-L./ -Wl,-L./ -o hello.elf hello.o

To make .bin from .elf:
llvm-objcopy -O binary hello.elf hello.bin

Rayman · 2026-04-16 11:51

Guess I forgot that I made a webpage for Clang a long time ago...

https://www.rayslogic.com/Propeller2/Clang.htm

I've uploaded a minimum build folder that can compile as in post #42.
But, still need to add the fixes from @rogloh , so not really ready yet...

Wuerfel_21 · 2026-04-16 15:46

@Rayman said:
@Wuerfel_21 code is posted above…

Would be interesting to compare binary size with Flexprop…

Ahh, we're doing printf, which is a typical bloat landmine. Most likely pulling in a bunch of floating point support along the way. Though IIRC P1 GCC could do printf with float support and not run totally out of RAM.

FlexC has multiple levels of mitigation for this problem, so beating it is hard:
- printf (though not sprintf or other variants) is treated as a builtin (__builtin_printf) and the compiler scans the format string and uses simpler functions to accomplish the same job if possible
- If the user program doesn't use floats, float support in the library is automatically disabled
- The actual library formatting implementation is pretty lean overall.

( It used to be possible to reduce bloat related to file descriptors etc if you just want to print to the console, but this got busted)

The fairer comparison would be to use puts, I guess.

Most direct equivalent code to your LLVM example:

enum {
    _CLKFREQ = 200000000
};
#include "propeller.h"
#include "stdio.h"
int main() {
    _setbaud(115200 * 2);
    printf("Hello World!\n");
    while(1) {
        _waitx(CLKFREQ);
    }
}

Comes out to 7352 bytes.

If we force the real formatting implementation by using fprintf, which has no builtin processing:

enum {
    _CLKFREQ = 200000000
};
#include "propeller.h"
#include "stdio.h"
int main() {
    _setbaud(115200 * 2);
    fprintf(stdout,"Hello World!\n");
    while(1) {
        _waitx(CLKFREQ);
    }
}

Comes out to 8588 bytes.

If we also enable float support...

enum {
    _CLKFREQ = 200000000
};
#include "propeller.h"
#include "stdio.h"
int main() {
    _setbaud(115200 * 2);
    float foo = 1.0;
    fprintf(stdout,"Hello World!\n");
    while(1) {
        _waitx(CLKFREQ);
    }
}

We get 13244 bytes

Going the other way, if we don't include stdio (which will bog us down with a bunch of function pointers the compiler struggles to get rid of) and call the builtin directly:

enum {
    _CLKFREQ = 200000000
};
#include "propeller.h"
int main() {
    _setbaud(115200 * 2);
    __builtin_printf("Hello World!\n");
    while(1) {
        _waitx(CLKFREQ);
    }
}

We're down to 5768 bytes

If the aforementioned simple IO feature wasn't busted... (does not work on current versions)

enum {
    _CLKFREQ = 200000000
};
#define _SIMPLE_IO
#pragma exportdef _SIMPLE_IO
#include "propeller.h"
int main() {
    _setbaud(115200 * 2);
    __builtin_printf("Hello World!\n");
    while(1) {
        _waitx(CLKFREQ);
    }
}

It would be 3040 bytes. Still a lot, but most of it is actually zero-padding that the compiler will generate no matter what.

(EDIT: by adding -H 32 to the command line, some of it is saved and the size is exactly 2048 bytes - IDK why it's there by default)

And for comparison, using puts instead of printf:

enum {
    _CLKFREQ = 200000000
};
#include "propeller.h"
#include "stdio.h"
int main() {
    _setbaud(115200 * 2);
    puts("Hello World!\n");
    while(1) {
        _waitx(CLKFREQ);
    }
}

4688 bytes

(all at default settings, -Os might make things slightly smaller but let's not)

So that's definitely something that needs improving to make LLVM a good option for P2.

@Wuerfel_21 said:
32k for a hello world is still ridiculous.

Often the case with C with libs included.

I always find it annoying that the typical compiler->linker arrangement is unable to really provide readable listings that actually correspond to the output binary. Very hard to figure out what's actually in your code.

I find in this case that objdump -d is generally okay.

Didn't even know about that one... >.<
Though it still ends up being an annotated disassembly of the already built program.

Rayman · 2026-04-16 15:51

Hmm... If things can be under 32k, maybe can compile for P1 too somehow?
That's probably pretty futile without XMM though, would guess...

Rayman · 2026-04-16 15:52

Thought those old notes on compiling LLVM would give me some insight into how to compile the .a libraries. But it seems I couldn't figure it out back then either.

Rayman · 2026-04-16 16:01

Ok, copied the @rogloh files from https://forums.parallax.com/discussion/169862/micropython-for-p2/p23 into a build folder and rebuilt.
Copied over new files from llmvfixes.zip first. Seems should be ready to go.

The p2llmv-fixes.zip looks to have stuff for building the .a libraries, but since was gifted those .a files from @rogloh (above) and can't build it anyway, skipping that.

Do have a question about the clang*.exe files... They all have exactly the same file size. Thinking they are all actually the same file. Are they?
7-Zip can compress them down as though they were one file, so I think that is true...

rogloh · 2026-04-17 01:08

@Rayman said:
Ok, copied the @rogloh files from https://forums.parallax.com/discussion/169862/micropython-for-p2/p23 into a build folder and rebuilt.
Copied over new files from llmvfixes.zip first. Seems should be ready to go.

The p2llmv-fixes.zip looks to have stuff for building the .a libraries, but since was gifted those .a files from @rogloh (above) and can't build it anyway, skipping that.

Do have a question about the clang*.exe files... They all have exactly the same file size. Thinking they are all actually the same file. Are they?
7-Zip can compress them down as though they were one file, so I think that is true...

When I build LLVM I get these files in the "bin" folder, and the clang* files are different. Not sure exactly what you are talking about; maybe it's a Windows-specific thing with Visual Studio. I do see a couple of symlinks that point to the same clang binary, if that's what you meant.

❯ ls -l
.rwxr-xr-x roger staff 556 B  Mon Mar 16 16:56:27 2026  analyze-build
.rwxr-xr-x roger staff 109 MB Fri Apr 10 11:22:54 2026  bugpoint
.rwxr-xr-x roger staff  99 MB Fri Apr 10 11:22:57 2026  c-index-test
lrwxr-xr-x roger staff   8 B  Mon Mar 16 17:14:20 2026  clang ⇒ clang-14
lrwxr-xr-x roger staff   5 B  Sat Apr 11 17:05:48 2026  clang++ ⇒ clang
.rwxr-xr-x roger staff 281 MB Fri Apr 10 11:22:56 2026  clang-14
.rwxr-xr-x roger staff 176 MB Fri Apr 10 11:22:55 2026  clang-check
lrwxr-xr-x roger staff   5 B  Sat Apr 11 17:05:48 2026  clang-cl ⇒ clang
lrwxr-xr-x roger staff   5 B  Sat Apr 11 17:05:48 2026  clang-cpp ⇒ clang
.rwxr-xr-x roger staff  88 MB Fri Apr 10 11:22:54 2026  clang-extdef-mapping
.rwxr-xr-x roger staff 8.1 MB Mon Mar 16 17:10:24 2026  clang-format
.rwxr-xr-x roger staff 6.0 MB Mon Mar 16 17:11:40 2026  clang-nvlink-wrapper
.rwxr-xr-x roger staff  15 MB Mon Mar 16 17:10:55 2026  clang-offload-bundler
.rwxr-xr-x roger staff  16 MB Mon Mar 16 17:11:52 2026  clang-offload-wrapper
.rwxr-xr-x roger staff  99 MB Mon Mar 16 17:12:39 2026  clang-refactor
.rwxr-xr-x roger staff  93 MB Mon Mar 16 17:12:38 2026  clang-rename
.rwxr-xr-x roger staff 257 MB Fri Apr 10 11:22:56 2026  clang-repl
.rwxr-xr-x roger staff 199 MB Fri Apr 10 11:22:56 2026  clang-scan-deps
.rwxr-xr-x roger staff  33 KB Mon Mar 16 17:06:41 2026  count
.rwxr-xr-x roger staff  30 MB Mon Mar 16 17:12:37 2026  diagtool
.rwxr-xr-x roger staff  64 MB Fri Apr 10 11:22:53 2026  dsymutil
.rwxr-xr-x roger staff 3.0 MB Mon Mar 16 17:07:02 2026  FileCheck
.rwxr-xr-x roger staff  22 KB Mon Mar 16 16:56:27 2026  git-clang-format
.rwxr-xr-x roger staff 9.7 KB Mon Mar 16 16:56:27 2026  hmaptool
.rwxr-xr-x roger staff 562 B  Mon Mar 16 16:56:27 2026  intercept-build
lrwxr-xr-x roger staff   3 B  Sat Apr 11 17:05:48 2026  ld.lld ⇒ lld
lrwxr-xr-x roger staff   3 B  Sat Apr 11 17:05:48 2026  ld64.lld ⇒ lld
lrwxr-xr-x roger staff   3 B  Sat Apr 11 17:05:48 2026  ld64.lld.darwinnew ⇒ lld
lrwxr-xr-x roger staff   3 B  Sat Apr 11 17:05:48 2026  ld64.lld.darwinold ⇒ lld
.rwxr-xr-x roger staff  91 MB Fri Apr 10 11:22:53 2026  llc
.rwxr-xr-x roger staff 144 MB Fri Apr 10 11:22:54 2026  lld
lrwxr-xr-x roger staff   3 B  Sat Apr 11 17:05:48 2026  lld-link ⇒ lld
.rwxr-xr-x roger staff  78 MB Mon Mar 16 17:13:44 2026  lli
.rwxr-xr-x roger staff 4.1 MB Mon Mar 16 17:13:25 2026  lli-child-target
lrwxr-xr-x roger staff  15 B  Sat Apr 11 17:05:48 2026  llvm-addr2line ⇒ llvm-symbolizer
.rwxr-xr-x roger staff  15 MB Fri Apr 10 11:22:49 2026  llvm-ar
.rwxr-xr-x roger staff  17 MB Mon Mar 16 17:11:42 2026  llvm-as
.rwxr-xr-x roger staff 2.3 MB Mon Mar 16 17:10:47 2026  llvm-bcanalyzer
lrwxr-xr-x roger staff  12 B  Sat Apr 11 17:05:48 2026  llvm-bitcode-strip ⇒ llvm-objcopy
.rwxr-xr-x roger staff  54 MB Fri Apr 10 11:22:52 2026  llvm-c-test
.rwxr-xr-x roger staff  16 MB Mon Mar 16 17:11:42 2026  llvm-cat
.rwxr-xr-x roger staff  22 MB Fri Apr 10 11:22:50 2026  llvm-cfi-verify
.rwxr-xr-x roger staff 1.1 MB Mon Mar 16 17:06:56 2026  llvm-config
.rwxr-xr-x roger staff  18 MB Mon Mar 16 17:11:10 2026  llvm-cov
.rwxr-xr-x roger staff 5.8 MB Mon Mar 16 17:10:53 2026  llvm-cvtres
.rwxr-xr-x roger staff  15 MB Mon Mar 16 17:10:54 2026  llvm-cxxdump
.rwxr-xr-x roger staff 1.8 MB Mon Mar 16 17:09:13 2026  llvm-cxxfilt
.rwxr-xr-x roger staff 2.4 MB Mon Mar 16 17:10:39 2026  llvm-cxxmap
.rwxr-xr-x roger staff  11 MB Mon Mar 16 17:10:50 2026  llvm-diff
.rwxr-xr-x roger staff  10 MB Mon Mar 16 17:10:50 2026  llvm-dis
lrwxr-xr-x roger staff   7 B  Sat Apr 11 17:05:46 2026  llvm-dlltool ⇒ llvm-ar
.rwxr-xr-x roger staff  19 MB Fri Apr 10 11:22:44 2026  llvm-dwarfdump
.rwxr-xr-x roger staff  59 MB Fri Apr 10 11:22:52 2026  llvm-dwp
.rwxr-xr-x roger staff  29 MB Mon Mar 16 17:14:11 2026  llvm-exegesis
.rwxr-xr-x roger staff  23 MB Mon Mar 16 17:12:50 2026  llvm-extract
.rwxr-xr-x roger staff  58 MB Fri Apr 10 11:22:52 2026  llvm-gsymutil
.rwxr-xr-x roger staff  15 MB Mon Mar 16 17:11:13 2026  llvm-ifs
lrwxr-xr-x roger staff  12 B  Sat Apr 11 17:05:48 2026  llvm-install-name-tool ⇒ llvm-objcopy
.rwxr-xr-x roger staff  48 MB Fri Apr 10 11:22:44 2026  llvm-jitlink
.rwxr-xr-x roger staff 4.1 MB Mon Mar 16 17:09:34 2026  llvm-jitlink-executor
lrwxr-xr-x roger staff   7 B  Sat Apr 11 17:05:46 2026  llvm-lib ⇒ llvm-ar
.rwxr-xr-x roger staff  15 MB Mon Mar 16 17:10:55 2026  llvm-libtool-darwin
.rwxr-xr-x roger staff  20 MB Mon Mar 16 17:12:51 2026  llvm-link
.rwxr-xr-x roger staff  15 MB Fri Apr 10 11:22:51 2026  llvm-lipo
.rwxr-xr-x roger staff 105 MB Fri Apr 10 11:22:54 2026  llvm-lto
.rwxr-xr-x roger staff 115 MB Fri Apr 10 11:22:54 2026  llvm-lto2
.rwxr-xr-x roger staff 7.1 MB Fri Apr 10 11:22:47 2026  llvm-mc
.rwxr-xr-x roger staff 6.6 MB Fri Apr 10 11:22:47 2026  llvm-mca
.rwxr-xr-x roger staff 6.2 MB Fri Apr 10 11:22:47 2026  llvm-ml
.rwxr-xr-x roger staff  16 MB Mon Mar 16 17:11:42 2026  llvm-modextract
.rwxr-xr-x roger staff 1.4 MB Mon Mar 16 17:09:15 2026  llvm-mt
.rwxr-xr-x roger staff  16 MB Fri Apr 10 11:22:50 2026  llvm-nm
.rwxr-xr-x roger staff  19 MB Mon Mar 16 17:11:05 2026  llvm-objcopy
.rwxr-xr-x roger staff  21 MB Fri Apr 10 11:22:44 2026  llvm-objdump
.rwxr-xr-x roger staff 2.8 MB Mon Mar 16 17:10:53 2026  llvm-opt-report
lrwxr-xr-x roger staff  12 B  Sat Apr 11 17:05:48 2026  llvm-otool ⇒ llvm-objdump
.rwxr-xr-x roger staff  24 MB Mon Mar 16 17:11:31 2026  llvm-pdbutil
.rwxr-xr-x roger staff  71 KB Mon Mar 16 17:06:42 2026  llvm-PerfectShuffle
.rwxr-xr-x roger staff 9.3 MB Mon Mar 16 17:10:51 2026  llvm-profdata
.rwxr-xr-x roger staff  28 MB Fri Apr 10 11:22:44 2026  llvm-profgen
lrwxr-xr-x roger staff   7 B  Sat Apr 11 17:05:46 2026  llvm-ranlib ⇒ llvm-ar
.rwxr-xr-x roger staff 6.8 MB Mon Mar 16 17:10:58 2026  llvm-rc
lrwxr-xr-x roger staff  12 B  Sat Apr 11 17:05:48 2026  llvm-readelf ⇒ llvm-readobj
.rwxr-xr-x roger staff  23 MB Mon Mar 16 17:11:39 2026  llvm-readobj
.rwxr-xr-x roger staff  16 MB Fri Apr 10 11:22:51 2026  llvm-reduce
.rwxr-xr-x roger staff  14 MB Fri Apr 10 11:22:44 2026  llvm-rtdyld
.rwxr-xr-x roger staff  12 MB Mon Mar 16 17:11:34 2026  llvm-sim
.rwxr-xr-x roger staff  14 MB Mon Mar 16 17:10:54 2026  llvm-size
.rwxr-xr-x roger staff  18 MB Mon Mar 16 17:11:50 2026  llvm-split
.rwxr-xr-x roger staff 8.6 MB Mon Mar 16 17:11:33 2026  llvm-stress
.rwxr-xr-x roger staff 1.8 MB Mon Mar 16 17:10:53 2026  llvm-strings
lrwxr-xr-x roger staff  12 B  Sat Apr 11 17:05:48 2026  llvm-strip ⇒ llvm-objcopy
.rwxr-xr-x roger staff  20 MB Mon Mar 16 17:11:32 2026  llvm-symbolizer
.rwxr-xr-x roger staff  15 MB Mon Mar 16 17:10:56 2026  llvm-tapi-diff
.rwxr-xr-x roger staff  16 MB Mon Mar 16 17:07:04 2026  llvm-tblgen
.rwxr-xr-x roger staff 2.1 MB Mon Mar 16 17:06:57 2026  llvm-undname
lrwxr-xr-x roger staff   7 B  Sat Apr 11 17:05:48 2026  llvm-windres ⇒ llvm-rc
.rwxr-xr-x roger staff  22 MB Mon Mar 16 17:11:35 2026  llvm-xray
.rwxr-xr-x roger staff 801 KB Mon Mar 16 17:06:56 2026  not
.rwxr-xr-x roger staff  28 MB Mon Mar 16 17:11:33 2026  obj2yaml
.rwxr-xr-x roger staff 115 MB Fri Apr 10 11:22:54 2026  opt
.rwxr-xr-x roger staff  22 MB Fri Apr 10 11:22:44 2026  sancov
.rwxr-xr-x roger staff  20 MB Mon Mar 16 17:11:34 2026  sanstats
.rwxr-xr-x roger staff  56 KB Mon Mar 16 16:56:27 2026  scan-build
.rwxr-xr-x roger staff 550 B  Mon Mar 16 16:56:27 2026  scan-build-py
.rwxr-xr-x roger staff 4.6 KB Mon Mar 16 16:56:27 2026  scan-view
.rwxr-xr-x roger staff 3.8 KB Mon Mar 16 16:56:27 2026  set-xcode-analyzer
.rwxr-xr-x roger staff 1.7 MB Mon Mar 16 17:06:56 2026  split-file
.rwxr-xr-x roger staff  18 MB Mon Mar 16 17:11:43 2026  verify-uselistorder
lrwxr-xr-x roger staff   3 B  Sat Apr 11 17:05:48 2026  wasm-ld ⇒ lld
.rwxr-xr-x roger staff 2.0 MB Mon Mar 16 17:06:57 2026  yaml-bench
.rwxr-xr-x roger staff  14 MB Mon Mar 16 17:11:13 2026  yaml2obj

rogloh · 2026-04-17 01:23

@Rayman said:
The p2llmv-fixes.zip looks to have stuff for building the .a libraries, but since was gifted those .a files from @rogloh (above) and can't build it anyway, skipping that.

I just logged the output of the make process for building these libraries which should help you reverse engineer things so you can build your own versions if needed.

RossH · 2026-04-17 12:25

@Wuerfel_21 said:

Ahh, we're doing printf, which is a typical bloat landmine. Most likely pulling in a bunch of floating point support along the way. Though IIRC P1 GCC could do printf with float support and not run totally out of RAM.

As Catalina still can, of course - I can't resist putting an ad in here! ...

<advertisment>

Using Catalina's COMPACT mode you can have full stdio support for programs executing entirely from Hub RAM - including full floating point and full file system support - on a P1 or a P2. The maximum overhead is about 14k.

For example, "Hello World" for the Propeller 1:

catalina hello_world.c -lcx -O5 -C COMPACT
code size 14952

or, for the Propeller 2:

catalina hello_world.c -P2 -lcx -O5 -C COMPACT
code size 14988

Of course, if you don't need full file system or full floating point support (as "Hello World" doesn't) you don't have to include either one.

Catalina does this by providing different libraries, that have different combinations of stdio and floating point support:

-lcx full floating point support, full stdio support, full file system support - max overhead about 14k
-lcix floating point support, stdio support but no floating point I/O, full file system support - max overhead about 11k
-lc full floating point support, stdio support but no file system - max overhead about 10k
-lci floating point support, stdio support but no floating point I/O and no file system - max overhead about 3k

So ...

catalina hello_world.c -p2 -lcix -O5 -C COMPACT
code size 11372 bytes

catalina hello_world.c -p2 -lc -O5 -C COMPACT
code size 10136 bytes

catalina hello_world.c -p2 -lci -O5 -C COMPACT
code size 3436 bytes

As you might expect, it is including file system support that is the largest memory hog (edit: see note, below).

In the case of "Hello, World", Catalina also offers several other ways to reduce code size.

You can use stdio but add a smaller version of printf (slightly less functional, but adequate for most programs) by adding -ltiny:

catalina hello_world.c -p2 -lci -O5 -C COMPACT -ltiny
code size 1520 bytes

Or you can replace printf with an even smaller version that does not pull in any stdio code at all:

catalina hello_world.c -p2 -lci -O5 -C COMPACT -Dprintf=t_printf
code size 1056 bytes

None of these require any modifications to hello_world.c, which in all the above cases is as follows:

#include <stdio.h>

void main() {
   printf("Hello, world!\n");
}

However, if you are ok with modifying the program, you can do better.

For example, this program - tiny_world.c - is functionally identical to hello_world.c, but uses only "built in" capabilities:

#define printf(str) t_string(1, str);

void main() {
   printf("Hello, world!\n");
}

Then ...

catalina tiny_world.c -p2 -lci -O5 -C COMPACT -C NO_EXIT -C NO_REBOOT
code size 100 bytes

Catalina's strength is that you usually do not need to modify a C program to get it to execute. Of course, there are additional libraries offered that add functionality that will only work on the Propeller 1 or Propeller 2, but if you stick to "clean" C (originally only C89, now also C99, C11 or C23), you don't need to modify programs, whether they are going to execute on the Propeller 1, Propeller 2, and whether they are compiled as COMPACT or NATIVE programs to execute from Hub RAM, or as XMM programs to execute from external RAM.

</advertisment>

Edited to add note: Technically, it is not "file system" support that bloats stdio so much - it is "stream" support. The -lci library variant has simplified streams that support only stdin, stdout and stderr. The other library variants all have full stream support.

rogloh · 2026-04-18 03:28

I just compiled this...under P2LLVM with the printf and stdio.h include file commented out. I still get a large program generated so it looks like it sucks in a lot of library stuff by default. That definitely needs to be optimized to reduce the size of the binaries being created. You can see what it's bringing in inside the sorted symbol table output I extracted as out.txt using llvm-objdump -x. I think part of the problem is that P2LLVM is packing certain built-in functions into LUTRAM to accelerate commonly(?) made calls which then reference external code in HUB it brings in afterwards by default. This includes a lot of floating point conversion code which isn't necessary in many cases. It makes sense to do memcpy and memmove in LUT but not sure about all the other ones (unless you really want to do floating point).

//#include <stdio.h>
#include <propeller.h>

int main(void)
{
_uart_init(63,62,115200,0);
//printf("hello world\n");
return 0;
}

Here's part of what it puts into LUTRAM. A lot of floating point conversion and comparison calls, seemingly back to HUB RAM anyway at copies of the same label (with different code). A bug in the output generated perhaps? IMO it's probably best to keep this all in HUB anyway, only included when needed, and not even use LUTRAM. Also it could have used a JMP instead and had the RETA of the called function return to the original location, which would be faster and save a long each time.
```
00000358 <__fixunsdfdi>:
358: b4 44 c0 fd calla #__fixuint
35c: 2e 00 64 fd reta

00000360 <__fixunsdfsi>:
360: 90 45 c0 fd calla #__fixuint
364: 2e 00 64 fd reta

00000368 <__fixunssfdi>:
368: 30 46 c0 fd calla #__fixuint
36c: 2e 00 64 fd reta

00000370 <__fixunssfsi>:
370: e4 46 c0 fd calla #__fixuint
374: 2e 00 64 fd reta

...

00000200 g F .text 00000008 __adddf3
00000208 g F .text 00000008 __addsf3
00000210 g F .text 00000054 __ashldi3
00000264 g F .text 00000058 __ashrdi3
000002bc g F .text 00000008 __eqdf2
000002bc g F .text 00000008 __ledf2
000002bc g F .text 00000008 __ltdf2
000002bc g F .text 00000008 __nedf2
000002c4 g F .text 00000008 __gedf2
000002c4 g F .text 00000008 __gtdf2
000002cc g F .text 00000008 __unorddf2
000002d4 g F .text 00000008 __eqsf2
000002d4 g F .text 00000008 __lesf2
000002d4 g F .text 00000008 __ltsf2
000002d4 g F .text 00000008 __nesf2
000002dc g F .text 00000008 __gesf2
000002dc g F .text 00000008 __gtsf2
000002e4 g F .text 00000008 __unordsf2
000002ec g F .text 00000008 __divdi3
000002f4 g F .text 00000008 __divdf3
000002fc g F .text 00000034 __divsi3
00000330 g F .text 00000008 __divsf3
00000338 g F .text 00000008 __extendsfdf2
00000340 g F .text 00000008 __fixdfsi
00000348 g F .text 00000008 __fixsfdi
00000350 g F .text 00000008 __fixsfsi
00000358 g F .text 00000008 __fixunsdfdi
00000360 g F .text 00000008 __fixunsdfsi
00000368 g F .text 00000008 __fixunssfdi
00000370 g F .text 00000008 __fixunssfsi
00000378 g F .text 00000008 __floatdisf
00000380 g F .text 00000008 __floatsidf
00000388 g F .text 00000008 __floatundidf
00000390 g F .text 00000008 __floatundisf
00000398 g F .text 00000008 __floatunsidf
000003a0 g F .text 00000008 __floatunsisf
000003a8 g F .text 000000d8 __floatsisf
00000480 g F .text 00000054 __lshrdi3
000004d4 g F .text 00000094 memcpy
00000568 g F .text 00000068 memmove
000005d0 g F .text 00000028 memset
000005f8 g F .text 00000008 __moddi3
00000600 g F .text 00000034 __modsi3
00000634 g F .text 00000008 __muldf3
0000063c g F .text 00000008 __mulsf3
00000644 g F .text 00000098 __muldi3
000006dc g F .text 00000014 __negdi2
000006f0 g F .text 00000024 __subdf3
00000714 g F .text 00000018 __subsf3
0000072c g F .text 00000014 __udivdi3
00000740 g F .text 000000cc __udivmoddi4
0000080c g F .text 0000003c __umoddi3
00000848 g F .text 00000008 sqrtf
00000850 g F .text 00000008 powf
```

Rayman · 2026-04-23 13:11

Rebuilt Clang using the updates from @rogloh

Modulus operator seems to work now, so should be in good shape.
Updated the repo here:
https://www.rayslogic.com/Propeller2/Clang.htm

Think everything needed is in this zip:
https://www.rayslogic.com/Propeller2/LLVM/LLVM_bin.zip

Just extract the bin folder somewhere.
Open cmd.exe there and type "build hello2" to see it compile...

Upload to P2 using loadp2 or FlexProp

Christof Eb. · 2026-04-23 14:55

Is it thinkable, that this might open a path to have arduino for P2? I do love their libraries....

Rayman · 2026-04-23 15:08

I was wondering the same thing... Think it should work.

rogloh · 2026-04-23 23:28

@"Christof Eb." said:
Is it thinkable, that this might open a path to have arduino for P2? I do love their libraries....

A full GCC-style compiler opens up all sorts of possibilities. Now whether there are bugs and we can find and fix them, that's another story, but I've been very pleasantly surprised so far running MicroPython. Best part is that Clang is only one front end, and other LLVM-compatible front-end languages should also work if built for the P2. Whether they make sense or not is another story. Would you want to run Fortran on a P2? Maybe for numerical stuff it could still be good and could process data from a P2 directly (albeit without an FPU). Hah, maybe I could try to dig up my first-year numerical methods Maths assignments coded in Fortran back in the 80's, lol. I think I may still have them saved somewhere on a 5.25-inch floppy (not punched cards thankfully, although they probably still had those things connected to all those antiquated Vax machines at uni). They finally switched to using 286-based terminals/PCs and better after that, just as we graduated...

Rayman · 2026-04-24 00:25

@rogloh Maybe we can find a regression test for llvm…. I’m guessing it already exists…

rogloh · 2026-04-24 07:01

@Rayman said:
@rogloh Maybe we can find a regression test for llvm…. I’m guessing it already exists…

Just took a look and there's a bunch of test programs in the test folder of p2llvm from Nikita's repo. These seem to be set up to test different parts of C to make sure they work with the P2-compiled program. It doesn't seem to test individual instructions directly. So if we add one more instruction that C doesn't use then it won't be tested this way. Still, this is probably very good as a regression test to ensure nothing in C breaks with the changes. Give it a try and see if you get it going... it seems to need some Python stuff and cmake plus loadp2.

rogloh · 2026-04-24 08:02

@Rayman I messed up applying the optimizations to your hello program you sent yesterday. With the correct setting used and inlining disabled it seems that it is now far better optimized and that enlarged test function that bothered me doing the extra reads/writes is now far better behaved and is just 3 instructions - QDIV, GETQY, RETA which is perfect! This is as good as it gets and this was even with -O1 setting too.

00000a00 <main>:
     a00: 28 06 64 fd                           setq    #3
     a04: 61 a1 67 fc                           wrlong  r0, ptra++
     a08: 04 80 00 ff                           augs    #$8004
     a0c: f8 a1 07 f6                           mov     r0, #$1f8
     a10: e1 f5 05 ff                           augs    #$5f5e1
     a14: 00 a2 07 f6                           mov     r1, #0
     a18: cc e8 c0 fd                           calla   #\_clkset
     a1c: 3f a0 07 f6                           mov     r0, #$3f
     a20: 3e a2 07 f6                           mov     r1, #$3e
     a24: c2 01 00 ff                           augs    #$1c2
     a28: 00 a4 07 f6                           mov     r2, #0
     a2c: 00 a6 07 f6                           mov     r3, #0
     a30: b8 e9 c0 fd                           calla   #\_uart_init
     a34: 93 00 00 ff                           augs    #$93
     a38: 51 a1 07 f6                           mov     r0, #$151
     a3c: 80 4f c0 fd                           calla   #\puts
     a40: 64 a0 07 f6                           mov     r0, #$64
     a44: 32 a2 07 f6                           mov     r1, #$32
     a48: 80 0a c0 fd                           calla   #\test
     a4c: f8 df 63 fc                           wrlong  r31, ptra
     a50: f8 a1 03 f6                           mov     r0, ptra
     a54: 04 a0 07 f1                           add     r0, #4
     a58: 93 00 00 ff                           augs    #$93
     a5c: 60 a2 07 f6                           mov     r1, #$60
     a60: d0 a3 63 fc                           wrlong  r1, r0
     a64: 08 f0 07 f1                           add     ptra, #8
     a68: cc 5c c0 fd                           calla   #\printf
     a6c: 08 f0 87 f1                           sub     ptra, #8
     a70: 14 a0 07 f6                           mov     r0, #$14
     a74: d0 a3 03 fb                           rdlong  r1, r0
     a78: 1f a2 63 fd                           waitx   r1
     a7c: f4 ff 9f fd                           jmp     #-12

00000a80 <test>:
     a80: d1 a1 13 fd                           qdiv    r0, r1
     a84: 19 de 63 fd                           getqy   r31
     a88: 2e 00 64 fd                           reta
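
For reference, a C function along these lines would be expected to reduce to that QDIV/GETQY pair, since GETQY reads the remainder of the CORDIC divide. This is only a guess at the shape of the source, not the actual test code from the hello program. The unsigned types and the noinline attribute are assumptions.

```c
/* Hypothetical reconstruction: the real test() source is not shown in the thread.
   Unsigned operands are assumed because QDIV is an unsigned divide. */
__attribute__((noinline))
unsigned test(unsigned a, unsigned b)
{
    /* Remainder of a / b: on the P2 this is what GETQY returns after QDIV. */
    return a % b;
}
```

Calling test(100, 50) matches the arguments loaded in main above (r0 = $64, r1 = $32).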

Also learned a bit more about configuring LLVM today. Seems there are a bunch of different instruction template settings that the passes of the toolchain use to help decide if various optimizations can be applied. Not many of these are currently being configured in the instruction tables for the P2 (so it's likely just using defaults) and I'm wondering if this may still leave some code unoptimized. If the compiler doesn't know which specific instructions read/write memory and what is safe to move around, it probably can't optimize all that easily. I think these flags might have some influence and wonder how many should be applied in the P2 case. Some actually are set already, such as isReturn, isBranch, isCall and isTerminator. But the isMoveImm/isMoveReg settings are not really applied anywhere from what I can tell, unless they get set up somewhere common by default.

 // instruction.
  bit isReturn     = false;     // Is this instruction a return instruction?
  bit isBranch     = false;     // Is this instruction a branch instruction?
  bit isEHScopeReturn = false;  // Does this instruction end an EH scope?
  bit isIndirectBranch = false; // Is this instruction an indirect branch?
  bit isCompare    = false;     // Is this instruction a comparison instruction?
  bit isMoveImm    = false;     // Is this instruction a move immediate instruction?
  bit isMoveReg    = false;     // Is this instruction a move register instruction?
  bit isBitcast    = false;     // Is this instruction a bitcast instruction?
  bit isSelect     = false;     // Is this instruction a select instruction?
  bit isBarrier    = false;     // Can control flow fall through this instruction?
  bit isCall       = false;     // Is this instruction a call instruction?
  bit isAdd        = false;     // Is this instruction an add instruction?
  bit isTrap       = false;     // Is this instruction a trap instruction?
  bit canFoldAsLoad = false;    // Can this be folded as a simple memory operand?
  bit mayLoad      = ?;         // Is it possible for this inst to read memory?
  bit mayStore     = ?;         // Is it possible for this inst to write memory?
  bit mayRaiseFPException = false; // Can this raise a floating-point exception?
  bit isConvertibleToThreeAddress = false;  // Can this 2-addr instruction promote?
  bit isCommutable = false;     // Is this 3 operand instruction commutable?
  bit isTerminator = false;     // Is this part of the terminator for a basic block?
  bit isReMaterializable = false; // Is this instruction re-materializable?
  bit isPredicable = false;     // 1 means this instruction is predicable
                                // even if it does not have any operand
                                // tablegen can identify as a predicate
  bit isUnpredicable = false;   // 1 means this instruction is not predicable
                                // even if it _does_ have a predicate operand
  bit hasDelaySlot = false;     // Does this instruction have an delay slot?
  bit usesCustomInserter = false; // Pseudo instr needing special help.
  bit hasPostISelHook = false;  // To be *adjusted* after isel by target hook.
  bit hasCtrlDep   = false;     // Does this instruction r/w ctrl-flow chains?
  bit isNotDuplicable = false;  // Is it unsafe to duplicate this instruction?
  bit isConvergent = false;     // Is this instruction convergent?
  bit isAuthenticated = false;  // Does this instruction authenticate a pointer?
  bit isAsCheapAsAMove = false; // As cheap (or cheaper) than a move instruction.
  bit hasExtraSrcRegAllocReq = false; // Sources have special regalloc requirement?
  bit hasExtraDefRegAllocReq = false; // Defs have special regalloc requirement?
  bit isRegSequence = false;    // Is this instruction a kind of reg sequence?
                                // If so, make sure to override
                                // TargetInstrInfo::getRegSequenceLikeInputs.
  bit isPseudo     = false;     // Is this instruction a pseudo-instruction?
                                // If so, won't have encoding information for
                                // the [MC]CodeEmitter stuff.
  bit isExtractSubreg = false;  // Is this instruction a kind of extract subreg?
                                // If so, make sure to override
                                // TargetInstrInfo::getExtractSubregLikeInputs.
  bit isInsertSubreg = false;   // Is this instruction a kind of insert subreg?
                                // If so, make sure to override
                                // TargetInstrInfo::getInsertSubregLikeInputs.
  bit variadicOpsAreDefs = false; // Are variadic operands definitions?
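
To make the mayLoad/mayStore point concrete, here is a toy model in C of the kind of check a reordering pass has to make. This is not LLVM's API: the struct, its field names and can_swap() are all made up for illustration, and register dependencies are ignored.

```c
#include <stdbool.h>

/* Toy stand-in for a few of the bits listed above. Hypothetical names;
   this is not how LLVM stores instruction descriptions. */
struct insn_flags {
    bool may_load;
    bool may_store;
    bool is_call;
    bool is_terminator;
};

/* Conservative check: may two adjacent instructions be swapped,
   judging only by their flags? */
bool can_swap(struct insn_flags a, struct insn_flags b)
{
    /* Calls and block terminators stay where they are. */
    if (a.is_call || b.is_call) return false;
    if (a.is_terminator || b.is_terminator) return false;

    /* A store must not cross a load or store that might touch the same address. */
    if (a.may_store && (b.may_load || b.may_store)) return false;
    if (b.may_store && (a.may_load || a.may_store)) return false;

    return true;
}
```

In this model an instruction that is pessimistically marked as both loading and storing blocks every memory access from moving past it, which is the kind of lost opportunity the flags are meant to avoid.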

Also LLVM can apparently do conditional execution (predicable instructions), where some blocks are conditionally executed inline rather than branching out into different blocks based on the flags. I don't typically see that happening much in the generated code; it mainly compares to set the flags and then branches on them right away. It may need some weights applied so it can check whether a branch costs more than conditionally skipping over the instructions. Maybe some of that needs setting up somewhere...?
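
As an illustration of the kind of code that is a candidate for this, here is a small C sketch with the same operation written in a branchy form and in a form that needs no branch. The function names are made up, and the P2 assembly in the comment is hand-written to show the idea, not compiler output.

```c
/* Branchy form: typically compiled as a compare followed by a conditional jump. */
unsigned bump_if_less_branchy(unsigned x, unsigned a, unsigned b)
{
    if (a < b)
        x += 1;
    return x;
}

/* Same result with no branch. With predication on the P2, the hand-written
   equivalent would be roughly:
           cmp  a, b  wc     ' C = 1 if a < b (unsigned)
     if_c  add  x, #1        ' executed only when C is set
   (illustrative assembly only) */
unsigned bump_if_less_predicated(unsigned x, unsigned a, unsigned b)
{
    return x + (unsigned)(a < b);
}
```

Both functions return the same value for all inputs. The second just makes explicit what an if-converted version of the first would compute.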

I read this too about LLVM:
Latency-Aware Scheduling: Updating a target's scheduling model (latencies and throughput) allows LLVM to reorder instructions more effectively to fill execution slots and hide long-latency operations like loads.
This could potentially help with hubexec load timing, although the FIFO probably gets in the way too.
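
A toy timing model in C shows why the ordering matters. Everything here is an assumption for illustration: the one-op-per-cycle in-order issue, the function name, and the latency value you pass in are made up and are not real P2 hub timing.

```c
/* Toy in-order model: op 0 is a load, every op takes one issue cycle, and the
   load's result becomes available load_latency cycles after the load issues.
   uses_load[i] is nonzero if op i consumes the loaded value.
   Returns the total cycle count for the sequence. */
unsigned cycles(const int *uses_load, int n, unsigned load_latency)
{
    unsigned t = 0;      /* cycle at which the next op can issue */
    unsigned ready = 0;  /* cycle at which the loaded value is available */

    for (int i = 0; i < n; i++) {
        if (i == 0) {
            ready = t + load_latency;      /* the load itself */
        } else if (uses_load[i] && t < ready) {
            t = ready;                     /* stall until the value arrives */
        }
        t += 1;                            /* issue this op */
    }
    return t;
}
```

With a made-up latency of 4, the order load, use, other, other takes 7 cycles in this model, while load, other, other, use takes 5, because the independent ops fill the wait.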
