I just dumped the symbols; wow, yours includes a lot of extra stuff. I see you are setting the clock but not much more than that vs my example. The clock seems to have sucked in a lot of extra string functionality as well as time functions. I think you may well have used different compiler command line settings. This is what I used to build my elf file.
EDIT: yep after using this on your source I get the symbols in out2.txt which is a smaller image vs your original .elf file dumped in out.txt. It might be the --gc-sections option doing this. EDIT2: yes it is that option which makes the difference.
32k for a hello world is still ridiculous.
I always find it annoying that the typical compiler->linker arrangement is unable to really provide readable listings that actually correspond to the output binary. Very hard to figure out what's actually in your code.
@Rayman said:
@rogloh Guess I can look at what you posted in other thread, but...
Are the changes you made just to the lib files? Or, to the main LLVM files too?
Details are in the other thread so best to read it. But yes, it has changes to the libs (which is what I provided you), and also to the LLVM source, which needs fixes for the C modulus "%" operation (otherwise it crashes LLVM), plus other CORDIC dependency fixes etc. You really should take those file changes and update your own LLVM build to resolve these. Also, there are still cases where disassembling random bytes, say with llvm-objdump -D, will crash this tool because the disassembler doesn't know about ALL P2 instructions yet. But if you disassemble genuine P2 compiled C code with llvm-objdump -d it's fine.
@Wuerfel_21 said:
32k for a hello world is still ridiculous.
Often the case with C with libs included.
I always find it annoying that the typical compiler->linker arrangement is unable to really provide readable listings that actually correspond to the output binary. Very hard to figure out what's actually in your code.
I find in this case that objdump -d is generally okay. I get a good disassembly listing of all P2 code in the binary. But if you wanted to see the symbols for global data accesses you don't see them used in the code it's just absolute read/write hex addresses which is not nice. You only see data symbols and their addresses in the symbol table. They really should cross reference them back in the disassembly listing IMO, or maybe that's just not implemented in the P2 port right now. Also it'd be real nice to have a way in the listing to see which C function arguments are being accessed in the stack frame or which initial registers they get copied into by somehow referring them back to the source code. Bit tricky if they don't have a way to carry it through in the .elf file. From memory I think enabling debugging helps pass more info down through the intermediate files.
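The cross-referencing wished for here boils down to a nearest-symbol lookup on each absolute address. Below is a minimal sketch in C of what that could look like, assuming a symbol table sorted by address like the one `llvm-objdump -x` prints. The `sym_t` type and the `lookup_symbol` function are hypothetical names made up for illustration, not part of any LLVM tool; the three entries are copied from the symbol dump posted in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record: start address, size in bytes, name. */
typedef struct {
    uint32_t addr;
    uint32_t size;
    const char *name;
} sym_t;

/* Entries copied from the symbol table posted in this thread, sorted by address. */
static const sym_t symtab[] = {
    { 0x4d4, 0x94, "memcpy"  },
    { 0x568, 0x68, "memmove" },
    { 0x5d0, 0x28, "memset"  },
};
static const size_t symtab_len = sizeof symtab / sizeof symtab[0];

/* Binary search for the symbol whose [addr, addr + size) range contains target.
   Returns NULL when no symbol covers the address. */
const sym_t *lookup_symbol(const sym_t *tab, size_t n, uint32_t target)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (target < tab[mid].addr) {
            hi = mid;                      /* target lies below this symbol */
        } else if (target >= tab[mid].addr + tab[mid].size) {
            lo = mid + 1;                  /* target lies above this symbol */
        } else {
            return &tab[mid];              /* target falls inside this symbol */
        }
    }
    return NULL;
}
```

A disassembly printer could call `lookup_symbol(symtab, symtab_len, 0x570)` and, on a hit, print the address as name plus offset (here `memmove` plus 8) instead of a bare hex value.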
Would be interesting to compare binary size with Flexprop…
Ahh, we're doing printf, which is a typical bloat landmine. Most likely pulling in a bunch of floating point support along the way. Though IIRC P1 GCC could do printf with float support and not run totally out of RAM.
FlexC has multiple levels of mitigation for this problem, so beating it is hard:
- printf (though not sprintf or other variants) is treated as a builtin (__builtin_printf) and the compiler scans the format string and uses simpler functions to accomplish the same job if possible
- If the user program doesn't use floats, float support in the library is automatically disabled
- The actual library formatting implementation is pretty lean overall.
( It used to be possible to reduce bloat related to file descriptors etc if you just want to print to the console, but this got busted)
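To make the first two mitigations more concrete, here is a rough sketch in C of the kind of format string scan a compiler could run at build time. This is not FlexC's actual implementation; `format_is_plain` and `format_uses_float` are hypothetical helpers written only to illustrate the idea that a constant format string can tell the compiler which formatting code is really needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when fmt contains no '%' at all, so printf(fmt)
   could be lowered to a plain string write. "%%" is rejected as well, to keep
   the sketch simple (it would need unescaping first). */
bool format_is_plain(const char *fmt)
{
    return strchr(fmt, '%') == NULL;
}

/* Hypothetical helper: true when any conversion in fmt is a floating point
   one (a, e, f, g in either case), i.e. float formatting code is required. */
bool format_uses_float(const char *fmt)
{
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            continue;
        }
        p++;
        if (*p == '%') {
            continue;                      /* literal percent sign */
        }
        /* Skip flags, width, precision and length modifiers. */
        while (*p != '\0' && strchr("-+ #0123456789.*hlLjzt", *p) != NULL) {
            p++;
        }
        if (*p == '\0') {
            break;                         /* truncated specifier at end of string */
        }
        if (strchr("aAeEfFgG", *p) != NULL) {
            return true;
        }
    }
    return false;
}
```

With checks like these, a call such as `printf("Hello World!\n")` could be routed to a simple string output routine, and the float formatting code would only need to be linked when some format string actually asks for it.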
The fairer comparison would be to use puts, I guess.
Going the other way, if we don't include stdio (which will bog us down with a bunch of function pointers the compiler struggles to get rid of) and call the builtin directly:
(all at default settings, -Os might make things slightly smaller but let's not)
So that's definitely something that needs improving to make LLVM a good option for P2.
@Wuerfel_21 said:
32k for a hello world is still ridiculous.
Often the case with C with libs included.
I always find it annoying that the typical compiler->linker arrangement is unable to really provide readable listings that actually correspond to the output binary. Very hard to figure out what's actually in your code.
I find in this case that objdump -d is generally okay.
Didn't even know about that one... >.<
Though it still ends up being an annotated disassembly of the already built program.
Thought those old notes on compiling LLVM would give me some insight on how to compile the .a libraries. But, seems couldn't figure it out back then either
The p2llmv-fixes.zip looks to have stuff for building the .a libraries, but since was gifted those .a files from @rogloh (above) and can't build it anyway, skipping that.
Do have a question about the clang*.exe files... They all have exactly the same file size. Thinking they are all actually the same file. Are they?
7-Zip can compress them down as though they were one file, so think that is true...
When I build LLVM I get these files in the "bin" folder area and the clang* files are different. Not sure what exactly you are talking about, or maybe it's a Windows-specific thing with Visual Studio. I do see a couple of symlinks are used to target the same clang binary if that's what you meant.
❯ ls -l
.rwxr-xr-x roger staff 556 B Mon Mar 16 16:56:27 2026 analyze-build
.rwxr-xr-x roger staff 109 MB Fri Apr 10 11:22:54 2026 bugpoint
.rwxr-xr-x roger staff 99 MB Fri Apr 10 11:22:57 2026 c-index-test
lrwxr-xr-x roger staff 8 B Mon Mar 16 17:14:20 2026 clang ⇒ clang-14
lrwxr-xr-x roger staff 5 B Sat Apr 11 17:05:48 2026 clang++ ⇒ clang
.rwxr-xr-x roger staff 281 MB Fri Apr 10 11:22:56 2026 clang-14
.rwxr-xr-x roger staff 176 MB Fri Apr 10 11:22:55 2026 clang-check
lrwxr-xr-x roger staff 5 B Sat Apr 11 17:05:48 2026 clang-cl ⇒ clang
lrwxr-xr-x roger staff 5 B Sat Apr 11 17:05:48 2026 clang-cpp ⇒ clang
.rwxr-xr-x roger staff 88 MB Fri Apr 10 11:22:54 2026 clang-extdef-mapping
.rwxr-xr-x roger staff 8.1 MB Mon Mar 16 17:10:24 2026 clang-format
.rwxr-xr-x roger staff 6.0 MB Mon Mar 16 17:11:40 2026 clang-nvlink-wrapper
.rwxr-xr-x roger staff 15 MB Mon Mar 16 17:10:55 2026 clang-offload-bundler
.rwxr-xr-x roger staff 16 MB Mon Mar 16 17:11:52 2026 clang-offload-wrapper
.rwxr-xr-x roger staff 99 MB Mon Mar 16 17:12:39 2026 clang-refactor
.rwxr-xr-x roger staff 93 MB Mon Mar 16 17:12:38 2026 clang-rename
.rwxr-xr-x roger staff 257 MB Fri Apr 10 11:22:56 2026 clang-repl
.rwxr-xr-x roger staff 199 MB Fri Apr 10 11:22:56 2026 clang-scan-deps
.rwxr-xr-x roger staff 33 KB Mon Mar 16 17:06:41 2026 count
.rwxr-xr-x roger staff 30 MB Mon Mar 16 17:12:37 2026 diagtool
.rwxr-xr-x roger staff 64 MB Fri Apr 10 11:22:53 2026 dsymutil
.rwxr-xr-x roger staff 3.0 MB Mon Mar 16 17:07:02 2026 FileCheck
.rwxr-xr-x roger staff 22 KB Mon Mar 16 16:56:27 2026 git-clang-format
.rwxr-xr-x roger staff 9.7 KB Mon Mar 16 16:56:27 2026 hmaptool
.rwxr-xr-x roger staff 562 B Mon Mar 16 16:56:27 2026 intercept-build
lrwxr-xr-x roger staff 3 B Sat Apr 11 17:05:48 2026 ld.lld ⇒ lld
lrwxr-xr-x roger staff 3 B Sat Apr 11 17:05:48 2026 ld64.lld ⇒ lld
lrwxr-xr-x roger staff 3 B Sat Apr 11 17:05:48 2026 ld64.lld.darwinnew ⇒ lld
lrwxr-xr-x roger staff 3 B Sat Apr 11 17:05:48 2026 ld64.lld.darwinold ⇒ lld
.rwxr-xr-x roger staff 91 MB Fri Apr 10 11:22:53 2026 llc
.rwxr-xr-x roger staff 144 MB Fri Apr 10 11:22:54 2026 lld
lrwxr-xr-x roger staff 3 B Sat Apr 11 17:05:48 2026 lld-link ⇒ lld
.rwxr-xr-x roger staff 78 MB Mon Mar 16 17:13:44 2026 lli
.rwxr-xr-x roger staff 4.1 MB Mon Mar 16 17:13:25 2026 lli-child-target
lrwxr-xr-x roger staff 15 B Sat Apr 11 17:05:48 2026 llvm-addr2line ⇒ llvm-symbolizer
.rwxr-xr-x roger staff 15 MB Fri Apr 10 11:22:49 2026 llvm-ar
.rwxr-xr-x roger staff 17 MB Mon Mar 16 17:11:42 2026 llvm-as
.rwxr-xr-x roger staff 2.3 MB Mon Mar 16 17:10:47 2026 llvm-bcanalyzer
lrwxr-xr-x roger staff 12 B Sat Apr 11 17:05:48 2026 llvm-bitcode-strip ⇒ llvm-objcopy
.rwxr-xr-x roger staff 54 MB Fri Apr 10 11:22:52 2026 llvm-c-test
.rwxr-xr-x roger staff 16 MB Mon Mar 16 17:11:42 2026 llvm-cat
.rwxr-xr-x roger staff 22 MB Fri Apr 10 11:22:50 2026 llvm-cfi-verify
.rwxr-xr-x roger staff 1.1 MB Mon Mar 16 17:06:56 2026 llvm-config
.rwxr-xr-x roger staff 18 MB Mon Mar 16 17:11:10 2026 llvm-cov
.rwxr-xr-x roger staff 5.8 MB Mon Mar 16 17:10:53 2026 llvm-cvtres
.rwxr-xr-x roger staff 15 MB Mon Mar 16 17:10:54 2026 llvm-cxxdump
.rwxr-xr-x roger staff 1.8 MB Mon Mar 16 17:09:13 2026 llvm-cxxfilt
.rwxr-xr-x roger staff 2.4 MB Mon Mar 16 17:10:39 2026 llvm-cxxmap
.rwxr-xr-x roger staff 11 MB Mon Mar 16 17:10:50 2026 llvm-diff
.rwxr-xr-x roger staff 10 MB Mon Mar 16 17:10:50 2026 llvm-dis
lrwxr-xr-x roger staff 7 B Sat Apr 11 17:05:46 2026 llvm-dlltool ⇒ llvm-ar
.rwxr-xr-x roger staff 19 MB Fri Apr 10 11:22:44 2026 llvm-dwarfdump
.rwxr-xr-x roger staff 59 MB Fri Apr 10 11:22:52 2026 llvm-dwp
.rwxr-xr-x roger staff 29 MB Mon Mar 16 17:14:11 2026 llvm-exegesis
.rwxr-xr-x roger staff 23 MB Mon Mar 16 17:12:50 2026 llvm-extract
.rwxr-xr-x roger staff 58 MB Fri Apr 10 11:22:52 2026 llvm-gsymutil
.rwxr-xr-x roger staff 15 MB Mon Mar 16 17:11:13 2026 llvm-ifs
lrwxr-xr-x roger staff 12 B Sat Apr 11 17:05:48 2026 llvm-install-name-tool ⇒ llvm-objcopy
.rwxr-xr-x roger staff 48 MB Fri Apr 10 11:22:44 2026 llvm-jitlink
.rwxr-xr-x roger staff 4.1 MB Mon Mar 16 17:09:34 2026 llvm-jitlink-executor
lrwxr-xr-x roger staff 7 B Sat Apr 11 17:05:46 2026 llvm-lib ⇒ llvm-ar
.rwxr-xr-x roger staff 15 MB Mon Mar 16 17:10:55 2026 llvm-libtool-darwin
.rwxr-xr-x roger staff 20 MB Mon Mar 16 17:12:51 2026 llvm-link
.rwxr-xr-x roger staff 15 MB Fri Apr 10 11:22:51 2026 llvm-lipo
.rwxr-xr-x roger staff 105 MB Fri Apr 10 11:22:54 2026 llvm-lto
.rwxr-xr-x roger staff 115 MB Fri Apr 10 11:22:54 2026 llvm-lto2
.rwxr-xr-x roger staff 7.1 MB Fri Apr 10 11:22:47 2026 llvm-mc
.rwxr-xr-x roger staff 6.6 MB Fri Apr 10 11:22:47 2026 llvm-mca
.rwxr-xr-x roger staff 6.2 MB Fri Apr 10 11:22:47 2026 llvm-ml
.rwxr-xr-x roger staff 16 MB Mon Mar 16 17:11:42 2026 llvm-modextract
.rwxr-xr-x roger staff 1.4 MB Mon Mar 16 17:09:15 2026 llvm-mt
.rwxr-xr-x roger staff 16 MB Fri Apr 10 11:22:50 2026 llvm-nm
.rwxr-xr-x roger staff 19 MB Mon Mar 16 17:11:05 2026 llvm-objcopy
.rwxr-xr-x roger staff 21 MB Fri Apr 10 11:22:44 2026 llvm-objdump
.rwxr-xr-x roger staff 2.8 MB Mon Mar 16 17:10:53 2026 llvm-opt-report
lrwxr-xr-x roger staff 12 B Sat Apr 11 17:05:48 2026 llvm-otool ⇒ llvm-objdump
.rwxr-xr-x roger staff 24 MB Mon Mar 16 17:11:31 2026 llvm-pdbutil
.rwxr-xr-x roger staff 71 KB Mon Mar 16 17:06:42 2026 llvm-PerfectShuffle
.rwxr-xr-x roger staff 9.3 MB Mon Mar 16 17:10:51 2026 llvm-profdata
.rwxr-xr-x roger staff 28 MB Fri Apr 10 11:22:44 2026 llvm-profgen
lrwxr-xr-x roger staff 7 B Sat Apr 11 17:05:46 2026 llvm-ranlib ⇒ llvm-ar
.rwxr-xr-x roger staff 6.8 MB Mon Mar 16 17:10:58 2026 llvm-rc
lrwxr-xr-x roger staff 12 B Sat Apr 11 17:05:48 2026 llvm-readelf ⇒ llvm-readobj
.rwxr-xr-x roger staff 23 MB Mon Mar 16 17:11:39 2026 llvm-readobj
.rwxr-xr-x roger staff 16 MB Fri Apr 10 11:22:51 2026 llvm-reduce
.rwxr-xr-x roger staff 14 MB Fri Apr 10 11:22:44 2026 llvm-rtdyld
.rwxr-xr-x roger staff 12 MB Mon Mar 16 17:11:34 2026 llvm-sim
.rwxr-xr-x roger staff 14 MB Mon Mar 16 17:10:54 2026 llvm-size
.rwxr-xr-x roger staff 18 MB Mon Mar 16 17:11:50 2026 llvm-split
.rwxr-xr-x roger staff 8.6 MB Mon Mar 16 17:11:33 2026 llvm-stress
.rwxr-xr-x roger staff 1.8 MB Mon Mar 16 17:10:53 2026 llvm-strings
lrwxr-xr-x roger staff 12 B Sat Apr 11 17:05:48 2026 llvm-strip ⇒ llvm-objcopy
.rwxr-xr-x roger staff 20 MB Mon Mar 16 17:11:32 2026 llvm-symbolizer
.rwxr-xr-x roger staff 15 MB Mon Mar 16 17:10:56 2026 llvm-tapi-diff
.rwxr-xr-x roger staff 16 MB Mon Mar 16 17:07:04 2026 llvm-tblgen
.rwxr-xr-x roger staff 2.1 MB Mon Mar 16 17:06:57 2026 llvm-undname
lrwxr-xr-x roger staff 7 B Sat Apr 11 17:05:48 2026 llvm-windres ⇒ llvm-rc
.rwxr-xr-x roger staff 22 MB Mon Mar 16 17:11:35 2026 llvm-xray
.rwxr-xr-x roger staff 801 KB Mon Mar 16 17:06:56 2026 not
.rwxr-xr-x roger staff 28 MB Mon Mar 16 17:11:33 2026 obj2yaml
.rwxr-xr-x roger staff 115 MB Fri Apr 10 11:22:54 2026 opt
.rwxr-xr-x roger staff 22 MB Fri Apr 10 11:22:44 2026 sancov
.rwxr-xr-x roger staff 20 MB Mon Mar 16 17:11:34 2026 sanstats
.rwxr-xr-x roger staff 56 KB Mon Mar 16 16:56:27 2026 scan-build
.rwxr-xr-x roger staff 550 B Mon Mar 16 16:56:27 2026 scan-build-py
.rwxr-xr-x roger staff 4.6 KB Mon Mar 16 16:56:27 2026 scan-view
.rwxr-xr-x roger staff 3.8 KB Mon Mar 16 16:56:27 2026 set-xcode-analyzer
.rwxr-xr-x roger staff 1.7 MB Mon Mar 16 17:06:56 2026 split-file
.rwxr-xr-x roger staff 18 MB Mon Mar 16 17:11:43 2026 verify-uselistorder
lrwxr-xr-x roger staff 3 B Sat Apr 11 17:05:48 2026 wasm-ld ⇒ lld
.rwxr-xr-x roger staff 2.0 MB Mon Mar 16 17:06:57 2026 yaml-bench
.rwxr-xr-x roger staff 14 MB Mon Mar 16 17:11:13 2026 yaml2obj
@Rayman said:
The p2llmv-fixes.zip looks to have stuff for building the .a libraries, but since was gifted those .a files from @rogloh (above) and can't build it anyway, skipping that.
I just logged the output of the make process for building these libraries which should help you reverse engineer things so you can build your own versions if needed.
Ahh, we're doing printf, which is a typical bloat landmine. Most likely pulling in a bunch of floating point support along the way. Though IIRC P1 GCC could do printf with float support and not run totally out of RAM.
As Catalina still can, of course - I can't resist putting an ad in here! ...
<advertisment>
Using Catalina's COMPACT mode you can have full stdio support for programs executing entirely from Hub RAM - including full floating point and full file system support - on a P1 or a P2. The maximum overhead is about 14k.
Of course, if you don't need full file system or full floating point support (as "Hello World" doesn't) you don't have to include either one.
Catalina does this by providing different libraries, that have different combinations of stdio and floating point support:
- `-lcx` full floating point support, full stdio support, full file system support - max overhead about 14k
- `-lcix` floating point support, stdio support but no floating point I/O, full file system support - max overhead about 11k
- `-lc` full floating point support, stdio support but no file system - max overhead about 10k
- `-lci` floating point support, stdio support but no floating point I/O and no file system - max overhead about 3k
Catalina's strength is that you usually do not need to modify a C program to get it to execute. Of course, there are additional libraries offered that add functionality that will only work on the Propeller 1 or Propeller 2, but if you stick to "clean" C (originally only C89, now also C99, C11 or C23), you don't need to modify programs, whether they are going to execute on the Propeller 1, Propeller 2, and whether they are compiled as COMPACT or NATIVE programs to execute from Hub RAM, or as XMM programs to execute from external RAM.
</advertisment>
Edited to add note: Technically, it is not "file system" support that bloats stdio so much - it is "stream" support. The -lci library variant has simplified streams which support only stdin, stdout and stderr. The other library variants all have full stream support.
I just compiled this...under P2LLVM with the printf and stdio.h include file commented out. I still get a large program generated so it looks like it sucks in a lot of library stuff by default. That definitely needs to be optimized to reduce the size of the binaries being created. You can see what it's bringing in inside the sorted symbol table output I extracted as out.txt using llvm-objdump -x. I think part of the problem is that P2LLVM is packing certain built-in functions into LUTRAM to accelerate commonly(?) made calls which then reference external code in HUB it brings in afterwards by default. This includes a lot of floating point conversion code which isn't necessary in many cases. It makes sense to do memcpy and memmove in LUT but not sure about all the other ones (unless you really want to do floating point).
Here's part of what it puts into LUTRAM. A lot of floating point conversion and comparison calls, seemingly back to HUB RAM anyway at copies of the same label (with different code). A bug in the output generated perhaps? IMO it's probably best to keep this all in HUB anyway, only included when needed, and not even use LUTRAM. Also it could have used a JMP instead and had the RETA of the called function return to the original location, which would be faster and save a long each time.
```
00000358 <__fixunsdfdi>:
358: b4 44 c0 fd calla #__fixuint
35c: 2e 00 64 fd reta
00000200 g F .text 00000008 __adddf3
00000208 g F .text 00000008 __addsf3
00000210 g F .text 00000054 __ashldi3
00000264 g F .text 00000058 __ashrdi3
000002bc g F .text 00000008 __eqdf2
000002bc g F .text 00000008 __ledf2
000002bc g F .text 00000008 __ltdf2
000002bc g F .text 00000008 __nedf2
000002c4 g F .text 00000008 __gedf2
000002c4 g F .text 00000008 __gtdf2
000002cc g F .text 00000008 __unorddf2
000002d4 g F .text 00000008 __eqsf2
000002d4 g F .text 00000008 __lesf2
000002d4 g F .text 00000008 __ltsf2
000002d4 g F .text 00000008 __nesf2
000002dc g F .text 00000008 __gesf2
000002dc g F .text 00000008 __gtsf2
000002e4 g F .text 00000008 __unordsf2
000002ec g F .text 00000008 __divdi3
000002f4 g F .text 00000008 __divdf3
000002fc g F .text 00000034 __divsi3
00000330 g F .text 00000008 __divsf3
00000338 g F .text 00000008 __extendsfdf2
00000340 g F .text 00000008 __fixdfsi
00000348 g F .text 00000008 __fixsfdi
00000350 g F .text 00000008 __fixsfsi
00000358 g F .text 00000008 __fixunsdfdi
00000360 g F .text 00000008 __fixunsdfsi
00000368 g F .text 00000008 __fixunssfdi
00000370 g F .text 00000008 __fixunssfsi
00000378 g F .text 00000008 __floatdisf
00000380 g F .text 00000008 __floatsidf
00000388 g F .text 00000008 __floatundidf
00000390 g F .text 00000008 __floatundisf
00000398 g F .text 00000008 __floatunsidf
000003a0 g F .text 00000008 __floatunsisf
000003a8 g F .text 000000d8 __floatsisf
00000480 g F .text 00000054 __lshrdi3
000004d4 g F .text 00000094 memcpy
00000568 g F .text 00000068 memmove
000005d0 g F .text 00000028 memset
000005f8 g F .text 00000008 __moddi3
00000600 g F .text 00000034 __modsi3
00000634 g F .text 00000008 __muldf3
0000063c g F .text 00000008 __mulsf3
00000644 g F .text 00000098 __muldi3
000006dc g F .text 00000014 __negdi2
000006f0 g F .text 00000024 __subdf3
00000714 g F .text 00000018 __subsf3
0000072c g F .text 00000014 __udivdi3
00000740 g F .text 000000cc __udivmoddi4
0000080c g F .text 0000003c __umoddi3
00000848 g F .text 00000008 sqrtf
00000850 g F .text 00000008 powf
````
I just dumped the symbols; wow, yours includes a lot of extra stuff. I see you are setting the clock but not much more than that vs my example. The clock seems to have sucked in a lot of extra string functionality as well as time functions. I think you may well have used different compiler command line settings. This is what I used to build my elf file.
clang -I. -I../.. -Ibuild -Wall -Werror --target=p2 -fno-jump-tables -c -fdata-sections -ffunction-sections -o main.o main.c
clang -v --target=p2 -Wl,--gc-sections -Wl,-L/Users/roger/Applications/p2llvm/libc/lib -Wl,-L/Users/roger/Applications/p2llvm/libp2/lib -o main.elf main.o
EDIT: yep after using this on your source I get the symbols in out2.txt which is a smaller image vs your original .elf file dumped in out.txt. It might be the --gc-sections option doing this. EDIT2: yes it is that option which makes the difference.
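For anyone wondering why `--gc-sections` needs the `-ffunction-sections` and `-fdata-sections` flags from the compile step: the linker can only discard whole sections, so each function has to sit in its own section before unused ones can be dropped. A small sketch in C follows; the function names are made up for illustration, and the comments assume a `main()` elsewhere that calls only the first function.

```c
#include <stdint.h>

/* With -ffunction-sections, an ELF compiler such as clang places each
   function in its own section, e.g. .text.checksum and .text.count_nonzero,
   instead of one shared .text section for the whole file. */

/* Assume main() calls this one: its section is reachable from the entry
   point, so the linker keeps it. */
uint32_t checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Assume nothing references this one: with --gc-sections the linker is free
   to discard .text.count_nonzero. Without -ffunction-sections it would share
   a section with checksum() and be kept along with it. */
uint32_t count_nonzero(const uint8_t *buf, uint32_t len)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            n++;
        }
    }
    return n;
}
```

The same reasoning applies to library code: whether unused library functions actually drop out depends on how the `.a` files themselves were compiled.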
Last time I went through this, was told that the .elf doesn't really reflect the size of the actual binary...
Seems you have to convert the .elf to a .bin to see exactly how big it is on chip?
Used web search to figure out how to convert .elf to .bin
Built in the way from post #32 above and size is smaller...
@rogloh Guess I can look at what you posted in other thread, but...
Are the changes you made just to the lib files? Or, to the main LLVM files too?
32k for a hello world is still ridiculous.
I always find it annoying that the typical compiler->linker arrangement is unable to really provide readable listings that actually correspond to the output binary. Very hard to figure out what's actually in your code.
(don't read this post as me being all negative!)
@Wuerfel_21 code is posted above…
Would be interesting to compare binary size with Flexprop…
Details are in the other thread so best to read it. But yes, it has changes to the libs (which is what I provided you), and also to the LLVM source, which needs fixes for the C modulus "%" operation (otherwise it crashes LLVM), plus other CORDIC dependency fixes etc. You really should take those file changes and update your own LLVM build to resolve these. Also, there are still cases where disassembling random bytes, say with `llvm-objdump -D`, will crash this tool because the disassembler doesn't know about ALL P2 instructions yet. But if you disassemble genuine P2 compiled C code with `llvm-objdump -d` it's fine.
Often the case with C with libs included.
I find in this case that `objdump -d` is generally okay. I get a good disassembly listing of all P2 code in the binary. But if you wanted to see the symbols for global data accesses you don't see them used in the code; it's just absolute read/write hex addresses, which is not nice. You only see data symbols and their addresses in the symbol table. They really should cross reference them back in the disassembly listing IMO, or maybe that's just not implemented in the P2 port right now. Also it'd be real nice to have a way in the listing to see which C function arguments are being accessed in the stack frame or which initial registers they get copied into by somehow referring them back to the source code. Bit tricky if they don't have a way to carry it through in the .elf file. From memory I think enabling debugging helps pass more info down through the intermediate files.

    main.elf: file format elf32-p2

    Disassembly of section .text:

    00000000 <__entry>:
    0: f8 a1 03 fb rdlong r0, ptra
    4: 10 00 80 fd jmp #\16
    ...

    00000040 <__start0>:
    40: f8 a1 03 fb rdlong r0, ptra
    44: 98 00 00 ff augs #152
    48: 28 a1 07 f6 mov r0, #296
    4c: d0 a1 03 fb rdlong r0, r0
    50: 02 a0 97 fb tjz r0, #2
    54: 00 00 90 ff augd #1048576
    58: 00 fe 65 fd hubset #255
    5c: 00 00 00 ff augs #0
    60: 68 a0 07 f6 mov r0, #104
    64: d0 01 e8 fc coginit #0, r0

    00000068 <__start>:
    68: f8 a1 03 fb rdlong r0, ptra
    6c: 98 00 00 ff augs #152
    70: 38 a5 07 f6 mov r2, #312
    74: 29 fe 67 fd setq2 #511
    78: 01 00 00 ff augs #1
    7c: 00 00 04 fb rdlong $0x000, #0

Guess it’d be interesting to use spin2cpp on something and see if clang can compile it….
In the other thread it was noted that the elf files were bloated because they clear all of memory and not just the code.
I don't remember if there was an option to not do that or I built a loader that removed that.
Mike
Think I have a minimum build folder (at least compiles hello.c).
To make .o from .c:
clang -I. -I./sys -Ibuild -Wall -Werror --target=p2 -fno-jump-tables -c -fdata-sections -ffunction-sections -o hello.o hello.c
To make .elf from .o
clang -v --target=p2 -Wl,--gc-sections -Wl,-L./ -Wl,-L./ -o hello.elf hello.o
To make .bin from .elf:
llvm-objcopy -O binary hello.elf hello.bin
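On the point that the .elf file size doesn't reflect what ends up on chip: the file also carries headers, the symbol table and possibly debug info, none of which is loaded. As a rough cross-check against the .bin size, one can sum the file sizes of the loadable segments. The sketch below is an assumption-laden illustration, not a P2LLVM tool: it assumes a little-endian ELF32 image already read into memory, `elf32_loadable_bytes` is a made-up name, and the field offsets are the standard ELF32 header and program header layout.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1u  /* program header type of a loadable segment (ELF spec) */

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint32_t rd16(const uint8_t *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (rd16(p + 2) << 16); }

/* Sum p_filesz over all PT_LOAD program headers of an ELF32 image in memory.
   Returns 0 if the buffer does not look like a 32-bit ELF file. */
uint32_t elf32_loadable_bytes(const uint8_t *img, size_t len)
{
    if (len < 52 || img[0] != 0x7f || img[1] != 'E' || img[2] != 'L' ||
        img[3] != 'F' || img[4] != 1) {
        return 0;                              /* too short, bad magic, or not ELFCLASS32 */
    }
    uint32_t phoff     = rd32(img + 28);       /* e_phoff: program header table offset */
    uint32_t phentsize = rd16(img + 42);       /* e_phentsize: size of one entry */
    uint32_t phnum     = rd16(img + 44);       /* e_phnum: number of entries */
    uint32_t total = 0;

    for (uint32_t i = 0; i < phnum; i++) {
        size_t off = (size_t)phoff + (size_t)i * phentsize;
        if (phentsize < 32 || off + 32 > len) {
            break;                             /* malformed or truncated table */
        }
        if (rd32(img + off) == PT_LOAD) {
            total += rd32(img + off + 16);     /* p_filesz: bytes stored in the file */
        }
    }
    return total;
}
```

Note that `llvm-objcopy -O binary` also fills any address gaps between sections, so the .bin can come out larger than this sum; treat the number as a lower bound rather than the exact on-chip size.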
Guess forgot that made webpage for Clang a long time ago...
https://www.rayslogic.com/Propeller2/Clang.htm
I've uploaded a minimum build folder that can compile as in post #42.
But, still need to add the fixes from @rogloh , so not really ready yet...
Ahh, we're doing `printf`, which is a typical bloat landmine. Most likely pulling in a bunch of floating point support along the way. Though IIRC P1 GCC could do printf with float support and not run totally out of RAM.
FlexC has multiple levels of mitigation for this problem, so beating it is hard:
- `printf` (though not `sprintf` or other variants) is treated as a builtin (`__builtin_printf`) and the compiler scans the format string and uses simpler functions to accomplish the same job if possible
- If the user program doesn't use floats, float support in the library is automatically disabled
- The actual library formatting implementation is pretty lean overall.
( It used to be possible to reduce bloat related to file descriptors etc if you just want to print to the console, but this got busted)
The fairer comparison would be to use `puts`, I guess.
Most direct equivalent code to your LLVM example:

    enum { _CLKFREQ = 200000000 };
    #include "propeller.h"
    #include "stdio.h"
    int main() {
        _setbaud(115200 * 2);
        printf("Hello World!\n");
        while(1) { _waitx(CLKFREQ); }
    }

Comes out to 7352 bytes.
If we force the real formatting implementation by using fprintf, which has no builtin processing:

    enum { _CLKFREQ = 200000000 };
    #include "propeller.h"
    #include "stdio.h"
    int main() {
        _setbaud(115200 * 2);
        fprintf(stdout,"Hello World!\n");
        while(1) { _waitx(CLKFREQ); }
    }

Comes out to 8588 bytes.
If we also enable float support...

    enum { _CLKFREQ = 200000000 };
    #include "propeller.h"
    #include "stdio.h"
    int main() {
        _setbaud(115200 * 2);
        float foo = 1.0;
        fprintf(stdout,"Hello World!\n");
        while(1) { _waitx(CLKFREQ); }
    }

We get 13244 bytes.
Going the other way, if we don't include stdio (which will bog us down with a bunch of function pointers the compiler struggles to get rid of) and call the builtin directly:

    enum { _CLKFREQ = 200000000 };
    #include "propeller.h"
    int main() {
        _setbaud(115200 * 2);
        __builtin_printf("Hello World!\n");
        while(1) { _waitx(CLKFREQ); }
    }

We're down to 5768 bytes.
If the aforementioned simple IO feature wasn't busted... (does not work on current versions)

    enum { _CLKFREQ = 200000000 };
    #define _SIMPLE_IO
    #pragma exportdef _SIMPLE_IO
    #include "propeller.h"
    int main() {
        _setbaud(115200 * 2);
        __builtin_printf("Hello World!\n");
        while(1) { _waitx(CLKFREQ); }
    }

It would be 3040 bytes. Still a lot, but most of it is actually zero-padding that the compiler will generate no matter what.
(EDIT: by adding `-H 32` to the command line, some of it is saved and the size is exactly 2048 bytes - IDK why it's there by default)
And for comparison, using `puts` instead of `printf`:

    enum { _CLKFREQ = 200000000 };
    #include "propeller.h"
    #include "stdio.h"
    int main() {
        _setbaud(115200 * 2);
        puts("Hello World!\n");
        while(1) { _waitx(CLKFREQ); }
    }

4688 bytes
(all at default settings, -Os might make things slightly smaller but let's not)
So that's definitely something that needs improving to make LLVM a good option for P2.
Didn't even know about that one... >.<
Though it still ends up being an annotated disassembly of the already built program.
Hmm... If things can be under 32k, maybe can compile for P1 too somehow?
That's probably pretty futile without XMM though, would guess...
Thought those old notes on compiling LLVM would give me some insight on how to compile the .a libraries. But, seems couldn't figure it out back then either
Ok, copied the @rogloh files from https://forums.parallax.com/discussion/169862/micropython-for-p2/p23 into a build folder and rebuilt.
Copied over new files from llmvfixes.zip first. Seems should be ready to go.
The p2llmv-fixes.zip looks to have stuff for building the .a libraries, but since was gifted those .a files from @rogloh (above) and can't build it anyway, skipping that.
Do have a question about the clang*.exe files... They all have exactly the same file size. Thinking they are all actually the same file. Are they?
7-Zip can compress them down as though they were one file, so think that is true...
When I build LLVM I get these files in the "bin" folder area and the clang* files are different. Not sure what exactly you are talking about, or maybe it's a Windows-specific thing with Visual Studio. I do see a couple of symlinks are used to target the same clang binary if that's what you meant.
I just logged the output of the make process for building these libraries which should help you reverse engineer things so you can build your own versions if needed.
As Catalina still can, of course - I can't resist putting an ad in here!
...
<advertisment>
Using Catalina's COMPACT mode you can have full stdio support for programs executing entirely from Hub RAM - including full floating point and full file system support - on a P1 or a P2. The maximum overhead is about 14k.
For example, "Hello World" for the Propeller 1:

    catalina hello_world.c -lcx -O5 -C COMPACT
    code size 14952

or, for the Propeller 2:

    catalina hello_world.c -P2 -lcx -O5 -C COMPACT
    code size 14988

Of course, if you don't need full file system or full floating point support (as "Hello World" doesn't) you don't have to include either one.
Catalina does this by providing different libraries, that have different combinations of stdio and floating point support:
- `-lcx` full floating point support, full stdio support, full file system support - max overhead about 14k
- `-lcix` floating point support, stdio support but no floating point I/O, full file system support - max overhead about 11k
- `-lc` full floating point support, stdio support but no file system - max overhead about 10k
- `-lci` floating point support, stdio support but no floating point I/O and no file system - max overhead about 3k

So ...

    catalina hello_world.c -p2 -lcix -O5 -C COMPACT
    code size 11372 bytes

    catalina hello_world.c -p2 -lc -O5 -C COMPACT
    code size 10136 bytes

    catalina hello_world.c -p2 -lci -O5 -C COMPACT
    code size 3436 bytes

As you might expect, it is including file system support that is the largest memory hog (edit: see note, below).
In the case of "Hello, World", Catalina also offers several other ways to reduce code size.
You can use stdio but add a smaller version of printf (slightly less functional, but adequate for most programs) by adding -ltiny:

    catalina hello_world.c -p2 -lci -O5 -C COMPACT -ltiny
    code size 1520 bytes

Or you can replace printf with an even smaller version that does not pull in any stdio code at all:

    catalina hello_world.c -p2 -lci -O5 -C COMPACT -Dprintf=t_printf
    code size 1056 bytes

None of these require any modifications to hello_world.c, which in all the above cases is as follows:

    #include <stdio.h>
    void main() {
        printf("Hello, world!\n");
    }

However, if you are ok with modifying the program, you can do better.
For example, this program - tiny_world.c - is functionally identical to hello_world.c, but uses only "built in" capabilities:

    #define printf(str) t_string(1, str);
    void main() {
        printf("Hello, world!\n");
    }

Then ...

    catalina tiny_world.c -p2 -lci -O5 -C COMPACT -C NO_EXIT -C NO_REBOOT
    code size 100 bytes

Catalina's strength is that you usually do not need to modify a C program to get it to execute. Of course, there are additional libraries offered that add functionality that will only work on the Propeller 1 or Propeller 2, but if you stick to "clean" C (originally only C89, now also C99, C11 or C23), you don't need to modify programs, whether they are going to execute on the Propeller 1, Propeller 2, and whether they are compiled as COMPACT or NATIVE programs to execute from Hub RAM, or as XMM programs to execute from external RAM.
</advertisment>
Edited to add note: Technically, it is not "file system" support that bloats stdio so much - it is "stream" support. The `-lci` library variant has simplified streams which support only stdin, stdout and stderr. The other library variants all have full stream support.
I just compiled this...under P2LLVM with the printf and stdio.h include file commented out. I still get a large program generated so it looks like it sucks in a lot of library stuff by default. That definitely needs to be optimized to reduce the size of the binaries being created. You can see what it's bringing in inside the sorted symbol table output I extracted as out.txt using `llvm-objdump -x`. I think part of the problem is that P2LLVM is packing certain built-in functions into LUTRAM to accelerate commonly(?) made calls which then reference external code in HUB it brings in afterwards by default. This includes a lot of floating point conversion code which isn't necessary in many cases. It makes sense to do memcpy and memmove in LUT but not sure about all the other ones (unless you really want to do floating point).

    //#include <stdio.h>
    #include <propeller.h>
    int main(void) {
        _uart_init(63,62,115200,0);
        //printf("hello world\n");
        return 0;
    }

Here's part of what it puts into LUTRAM. A lot of floating point conversion and comparison calls, seemingly back to HUB RAM anyway at copies of the same label (with different code). A bug in the output generated perhaps? IMO it's probably best to keep this all in HUB anyway, only included when needed, and not even use LUTRAM. Also it could have used a JMP instead and had the RETA of the called function return to the original location, which would be faster and save a long each time.
```
00000358 <__fixunsdfdi>:
358: b4 44 c0 fd calla #__fixuint
35c: 2e 00 64 fd reta
00000360 <__fixunsdfsi>:
360: 90 45 c0 fd calla #__fixuint
364: 2e 00 64 fd reta
00000368 <__fixunssfdi>:
368: 30 46 c0 fd calla #__fixuint
36c: 2e 00 64 fd reta
00000370 <__fixunssfsi>:
370: e4 46 c0 fd calla #__fixuint
374: 2e 00 64 fd reta
...
00000200 g F .text 00000008 __adddf3
00000208 g F .text 00000008 __addsf3
00000210 g F .text 00000054 __ashldi3
00000264 g F .text 00000058 __ashrdi3
000002bc g F .text 00000008 __eqdf2
000002bc g F .text 00000008 __ledf2
000002bc g F .text 00000008 __ltdf2
000002bc g F .text 00000008 __nedf2
000002c4 g F .text 00000008 __gedf2
000002c4 g F .text 00000008 __gtdf2
000002cc g F .text 00000008 __unorddf2
000002d4 g F .text 00000008 __eqsf2
000002d4 g F .text 00000008 __lesf2
000002d4 g F .text 00000008 __ltsf2
000002d4 g F .text 00000008 __nesf2
000002dc g F .text 00000008 __gesf2
000002dc g F .text 00000008 __gtsf2
000002e4 g F .text 00000008 __unordsf2
000002ec g F .text 00000008 __divdi3
000002f4 g F .text 00000008 __divdf3
000002fc g F .text 00000034 __divsi3
00000330 g F .text 00000008 __divsf3
00000338 g F .text 00000008 __extendsfdf2
00000340 g F .text 00000008 __fixdfsi
00000348 g F .text 00000008 __fixsfdi
00000350 g F .text 00000008 __fixsfsi
00000358 g F .text 00000008 __fixunsdfdi
00000360 g F .text 00000008 __fixunsdfsi
00000368 g F .text 00000008 __fixunssfdi
00000370 g F .text 00000008 __fixunssfsi
00000378 g F .text 00000008 __floatdisf
00000380 g F .text 00000008 __floatsidf
00000388 g F .text 00000008 __floatundidf
00000390 g F .text 00000008 __floatundisf
00000398 g F .text 00000008 __floatunsidf
000003a0 g F .text 00000008 __floatunsisf
000003a8 g F .text 000000d8 __floatsisf
00000480 g F .text 00000054 __lshrdi3
000004d4 g F .text 00000094 memcpy
00000568 g F .text 00000068 memmove
000005d0 g F .text 00000028 memset
000005f8 g F .text 00000008 __moddi3
00000600 g F .text 00000034 __modsi3
00000634 g F .text 00000008 __muldf3
0000063c g F .text 00000008 __mulsf3
00000644 g F .text 00000098 __muldi3
000006dc g F .text 00000014 __negdi2
000006f0 g F .text 00000024 __subdf3
00000714 g F .text 00000018 __subsf3
0000072c g F .text 00000014 __udivdi3
00000740 g F .text 000000cc __udivmoddi4
0000080c g F .text 0000003c __umoddi3
00000848 g F .text 00000008 sqrtf
00000850 g F .text 00000008 powf
````