Compiling HLL to PASM possible?
SRLM
Posts: 5,045
I would like to write code in a high level language (C/C++, BASIC, etc.) and have it compiled down to assembly. This would then be loaded into a cog and run as native assembly code.
The question: I would like to know which compiler(s) can compile directly to Propeller assembly.
I understand the limitations inherent in doing this (small code size, less efficient than pure assembly, etc.). I'm not looking for any interpreter-type setup or LMM, or advanced language features such as objects, functions, etc. I want to be able to compile down so that I can write my code (math routines to be performed in a fast loop) in the high level language and not have to worry too much about the assembly code. I'd probably still do some tweaks and modifications, but it would be nice to start with the HLL.
Comments
If you look in the propgcc demos you will find a FullDuplexSerial driver written in C that runs in COG and works at 115200 baud.
In some cases you don't even have to do anything special to get code to run in COG. In the propgcc demos there is a Fast Fourier Transform, the core loop of which gets loaded into COG to run at full PASM speed. Have a look for "fcache" when you find the propgcc threads.
This is actually quite amazing. I had always argued that compiling C/C++ to COG was going to be useless, as the code would be too big and there is no stack or indexed addressing etc. to help the C compiler. All in all, more work than it's worth. But the guys did it anyway and it works very well.
[size=+2]PropBasic[/size]
I used PropBasic version 00.01.14 (2011-07-26) to test with. The source code that I used is a simple program to multiply some numbers together, add them, and divide with them. This matches the math-intensive, no-I/O application that I need. It's probably not a good benchmark to use if you're going to be doing complicated serial communication or anything with delays, I/O, etc.
A side note about PropBasic: the syntax is a bit quirky. It requires that your code have only one operator/statement per line. So "num = a+b+c" is out. It's odd, but easy enough to work with.
I used the following command to test with:
There don't seem to be any command-line options to use. Anyway, that generated the following Spin file:
I think the compiler did a good job of being faithful to the original code, but I noticed some things:
1. Every source code line is in the .spin file as a comment, which is very helpful.
2. The multiplication and division are done inline, so each additional multiplication consumes 18 longs. They do share temporary variables, however.
3. All variables are stored in cog RAM, and user defined variables use the user defined name.
4. The compiler added the remnants of some serial communication code: three longs at "__RAM" and a constants block.
5. The code is nicely formatted straight from the compiler (although it uses spaces instead of tabs).
[size=+2]Propeller GCC[/size]
I used the most recent (and only) version posted on the GCC downloads page (v0_2_3 from 2012-02-08). The source program I used was the same as for the PropBasic test, modified a bit for C.
I based it on the fft_bench.c demo, which is why it has the various preprocessor statements at the beginning. Note the use of the keyword "volatile" for the int declaration: without it the compiler simply optimized everything away into a simple jump loop.
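To illustrate the "volatile" point, here is a minimal sketch of the kind of test program described. It is not the original source: the names num1, num2, num3 and result, the values, and the function name are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the test program (names and values assumed).
   volatile forces the compiler to perform every load and store, so the
   arithmetic cannot be folded away at compile time. */
volatile int32_t num1 = 3, num2 = 4, num3 = 5;
volatile int32_t result;

/* One pass of the math loop body: multiply, add, divide. */
void math_step_all_volatile(void)
{
    /* Read each volatile exactly once into a local before using it. */
    int32_t a = num1;
    int32_t b = num2;
    int32_t c = num3;

    result = (a * a + b * b + c * c) / a;
}
```

Without the volatile qualifiers, an optimizing compiler is free to work out the answer at build time (or drop the calculation entirely if the result is never used), which would be consistent with the empty jump loop mentioned above.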
Anyway, I used the following command to generate the code:
The options do the following:
-Os: optimize code for minimum size
-S: stop after compilation and output the assembly code as a file
-mcog: use the cog memory model (put everything in a single cog)
-mspin: generate the resulting spin file
There is also the -mfcache option, but in this case it did not generate code any differently.
When run, it generated the following Spin code:
Some things that I have noticed about this code:
1. The output lacks suitable comments, and the resultant code is rather difficult to understand. It doesn't use original variable names.
2. It creates a multiplication subroutine. This is slightly less efficient in execution time than putting it inline, but it is vastly more efficient on space.
3. The code stores variables in the hub, not the cog as expected.
4. The -Os option appears to be needed: with no optimization the output code is 192 lines. Interestingly, -O2 gives the same output as -Os.
5. The multiply loop ("__MULSI") is very compact (9 longs). It looks like it is O(1) in the sense that the work is bounded: the loop is only 4 instructions long and runs at most 32 times, so it takes at most 32*4 instructions to complete. I'm not sure how it works yet though (especially with a sign).
6. The divide routine is a bit more expensive: 51 longs. In its favor, though, its loop ("__UDIVSI") is as efficient as the multiply loop.
7. GCC isn't very efficient in memory management by default: it creates a 256-long hub stack and a 16-long cog stack frame. This could probably be cleaned up manually.
8. It's missing a "FIT" statement at the end.
9. The generated code isn't very well formatted.
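On the question in point 5 of how such a short multiply loop can work, including with signed values: a common technique for a processor without a hardware multiplier is shift-and-add, and the matching technique for division is bit-by-bit restoring division. The sketch below shows both in C. It assumes that this is what the library routines do; it is not a transcription of __MULSI or __UDIVSI, and the function names are made up.

```c
#include <stdint.h>

/* Shift-and-add multiply (hypothetical helper, not the library routine).
   Each pass tests the lowest bit of b, conditionally adds a, then
   doubles a and halves b. It stops when b runs out of set bits, so it
   never takes more than 32 passes. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u)
            product += a;   /* add the current shifted copy of a */
        a <<= 1;            /* next bit is worth twice as much */
        b >>= 1;
    }
    return product;         /* low 32 bits of the full product */
}

/* Restoring division: always 32 passes, one quotient bit per pass.
   The caller must make sure d is not zero. */
uint32_t udiv_restoring(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next bit */
        if (r >= d) {                    /* divisor fits: subtract, set bit */
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

The sign takes care of itself in the multiply: in two's complement the low 32 bits of a product are the same whether the operands are read as signed or unsigned, so casting the result of mul_shift_add((uint32_t)-3, 5u) back to int32_t gives -15 on the usual compilers. Division does not have that property, which may be part of why the divide support code is so much larger than the multiply.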
Next, I tried a slightly modified source:
Note here that the only variable marked volatile is result0. I compiled with
And got the following output:
This is much better: the output no longer has a bunch of RDLONGs and WRLONGs, and is hence much more efficient. Previously, the main loop was 34 lines (many of which were hub accesses), and now it is 22 lines. The other comments still apply though. Also as before, -mfcache did not change the output code.
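The pattern that made the difference can be sketched as follows. This is an illustration, not the modified source itself: only result0 comes from the description above, while the function name and the use of parameters for the inputs are assumptions.

```c
#include <stdint.h>

/* Only the final sink is volatile, so the store to it must happen. */
volatile int32_t result0;

/* The inputs and the intermediate sum are ordinary values, so the
   compiler is free to keep them in registers for the whole
   calculation instead of reloading them from memory each time. */
void math_step_volatile_result(int32_t num1, int32_t num2, int32_t num3)
{
    int32_t sum = num1 * num1 + num2 * num2 + num3 * num3;
    result0 = sum / num1;   /* the one access the compiler must keep */
}
```

The inputs are passed as parameters here so that the compiler cannot simply compute the answer at build time. Calling math_step_volatile_result(3, 4, 5) leaves 16 in result0.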
[size=+2]Conclusion[/size]
I think I will look into Propeller GCC more. It seems to do a good job of compiling down to efficient Propeller assembly, and it isn't too hard to read the output. I hope that it will be improved over time as well. The PropBasic compiler has more understandable output, but the inefficient use of cog RAM and the lack of updates (no changes in 8 months) have me worried. Propeller GCC seems to fit my requirements.
As written, it's quite inefficient, but I'd bet that GCC will make a temporary variable for num1*num1+num2*num2+num3*num3.
Even if it reorders operations and breaks up lines of code the compiler still has to make a syntax tree, which has enough information to output useful comments. Some comments are better than none, especially when it changes the logic and order of the operations.
@Dave Hein
If you include the flags "-mspin" and "-S", it should produce the complete assembly code (i.e., code with the support assembly included instead of just the business logic directly from the code).
EDIT: Oh, I see you mentioned it in your previous post on March 4, and you also posted the output from PropGCC. Sorry I missed that. I usually skim through any post that is more than a dozen lines or so. I should have read it in more detail.