LLVM Backend for Propeller 2
TL;DR: I want a modern C/C++ toolchain for Propeller 2 that compiles to native instructions directly, so I made one, on the shoulders of giants. It lives here: https://github.com/ne75/p2llvm
First things first, and I think this is important: I'm not a computer scientist and I've never studied compilers. I'm just an eager EE who really likes Propeller (and Propeller 2!). I've probably done some dumb things in this project. If I did, let me know; I want to learn.
Second things second, the immediate question I'm sure everyone will have is: "Why are you writing another compiler for Propeller 2?" I will answer that with a bit of a rant.
<rant>
Propeller (and Propeller 2) are powerful chips that can do A LOT in a small, simple, and power-efficient package. The high flexibility allows them to be used in a very wide variety of applications without having to include a lot of support hardware. As such, they should be used widely in industry, but they're not (I work in aerospace, where a multicore chip like this would be extremely useful, yet no one I work with has even heard of it). I think there are several reasons for the lack of adoption, but one of the biggest is the lack of a modern toolchain and modern language support. Propeller 1 addressed this with PropGCC, but it arrived several years after the release of Propeller 1 and was built around GCC 4, which is outdated in the modern world. Additionally, there appears to be a game of chicken going on between Parallax and the Propeller community: Parallax is focused on Spin and the development of Propeller hardware (which is where their focus should be right now), so they are hoping the community steps up (again) and develops the tools it desires, while the community is hoping to see something official come out and doesn't want to put too much effort into developing something that might be pushed aside by an "official" toolchain. (As a general note, I'm not trying to start a debate about that here; it's just my observation.) As a result, we have a few toolchains that are not quite good enough by industry standards (sorry to those who work on them; I know these tools take a lot of work and I do appreciate all the work that has been put in so far): some don't fully support C/C++ (like fastspin), some have full C++ support but do not support the full functionality of Propeller hardware (like RISCV-P2), and some are in between, like p2gcc, which is more or less a band-aid for making use of PropGCC for Propeller 2 (p2gcc also doesn't support the most "standard" P2 library, which makes code developed with it not very portable).
While these are excellent tools to demonstrate the capabilities of the hardware, they make developing scalable products difficult if not impossible. There have also been several requests for support for various languages (MicroPython, Arduino, Rust, etc.), all of which will require developing a compiler for Propeller's architecture.
</rant>
This project aims to solve all of the problems listed above. LLVM is a modern toolchain used by many companies around the world. It was originally developed at the University of Illinois at Urbana-Champaign and is now backed by a number of large companies, Apple among them. It has an intermediate representation (IR) that frontends (such as clang for C/C++/Objective-C) compile down to, and target-specific backends that compile the IR down to target machine instructions. The majority of the work needed to add a new backend is already baked into LLVM, and the P2 target is just another backend (same as the existing x86, AArch64, MIPS, AVR, MSP430, etc. backends) that provides basic information (such as registers, instruction encoding, and ABI information) to connect the dots between the various compiler passes that LLVM runs. Once complete, it will provide access to the full functionality of several languages for Propeller.
I am developing this project with two main goals in mind:
1. Create as much backward compatibility as possible with PropGCC projects. This won't be completely possible due to a few differences between the Propeller 1 and Propeller 2 architectures, but the hope is that porting those P1 projects to P2 will be easy. I don't want to rewrite a ton of my existing code.
2. Provide a tool that the community finds useful, regardless of adoption by Parallax as a formal tool. I know there's been some griping on the forums about the community's hard work not being adopted as much as people hope, but I am not pushing this for formal adoption. We'll just see what happens.
So without further ado, here it is: https://github.com/ne75/p2llvm. I went ahead and actually copied the entire LLVM repo (rather than making a fork) to not copy over the 1M+ commits in the main LLVM repo, but maybe that's not a good idea--I can always fix it later.
What you can do with it:
- compile and link C/C++ programs using the p2 target in clang and lld into ELF binaries for loading onto a P2 board
- disassemble compiled binaries using llvm-objdump
- generate printed PASM listings. These aren't perfectly formatted so they won't immediately compile with spin tools, but you can at least see the instructions being generated.
What I've tested so far:
- basic math operations
- flow control and branching
- function calling
- clock configuration
- starting cogs with parameters
- variable argument functions
- basic C++ classes
What I still need to test:
- dynamic memory allocation
- more complicated programs with complex flow control and complex loops
- inheritance and virtual functions (C++)
- A lot of other things I can't immediately think of.
What still needs implementation:
- the majority of instructions
- more hardware control (smart pins, streamer, etc etc)
- more efficient use of cog/lut RAM
- a math library
Over the next few weeks I'll try to put together some demos of what we can do with it. This will also help flesh out the remaining functionality needed. With 400+ instructions, it will take a while to add them all. The current tests folder was really just a scratch pad to test out features and find bugs. The first demo I expect to do is an LVDS display driver I wrote a few months back but was never really able to use in the project I was designing it for. If there's something anyone would like to see demoed, let me know; I need ideas.
I'd also appreciate any pointers on what is our current "standard" P2 library, so that I can implement/port a version for this compiler.
What I'm currently not planning on doing is porting the entire C standard library. That's a big undertaking and I need to learn a lot more about the standard and what I can steal from PropGCC and what needs to be fundamentally changed. I did however compile the stdlib portion of the C standard library. I haven't tested it yet but it does compile.
I encourage everyone to read the wiki.md to see roughly how this works and, more importantly, what doesn't work. This whole project is a heavy work in progress. Expect bugs, missing features, or things that just don't seem right. I'll keep working on it as I can, burning down the todo list and continuously adding to it. If you want to work on it, feel free! Just make a PR. If you see something that's weird, doesn't make sense, or doesn't work, please let me know so I can take a look and fix it. Eventually I'll write up details on how LLVM is actually structured, but that's an entirely separate project.
Comments
Another piece of the P2 puzzle solved...
P2GCC is certainly a band-aid as you say and limits the performance possible, and there are issues with the others too when it comes to fully supporting C. With any luck once done, this could really help improve the performance of native MicroPython too. I do hope you can stick through this and complete it. We already had an earlier GCC port apparently abandoned.
Probably no need to try to make full use of all those 400 P2 instructions. The key would be to just select the set you need for some efficient C to PASM2 code generation and allow use of others via inline PASM for example. Tighter prolog/epilogs using setq for block transfers, efficient register parameter passing and conditional branches/calls, minimising the load/stores to hub and using registers efficiently probably makes a huge difference and already could buy a lot compared to p2gcc. I expect the LLVM toolchain can optimize things well to begin with but I've never looked at its output.
For the library I guess PropGCC is one place to steal from. I found it was incomplete when it came to math stuff however and there'll be other missing parts. But it may get you going at least until you find something better.
A P2 "getting started" cheat sheet might still be useful as a road map though.
@n_ermosh : This looks very promising, but I'm afraid I'm stuck at the very first step of building it. The generic LLVM instructions in README.md suggest using cmake, but there's no CMakeLists.txt in the repository. Is that an oversight, or is there some way to generate CMakeLists.txt automatically? Could you perhaps post a "getting started for P2" cheat sheet?
Thanks,
Eric
Please please please, if you haven't already, look into this thread: https://forums.parallax.com/discussion/170253/propeller2-h-for-c-compilers
It will be really important that all of the different compilers use a common header, allowing libraries to be as interoperable between them as possible.
One other thing I'd encourage you to do, though I know it might be painful, is to copy your changes over to a proper fork. Without your repository being a fork, there is no hope of ever getting it merged into upstream llvm. If it _does_ get merged into upstream, then we'll get a new Propeller compiler every time a new version of LLVM comes out. The sooner your changes get copied to a real fork, the easier it will be.
This is a good idea. What I really want to do is use LUT RAM for every cog's stack so it doesn't use hub memory at all, but that becomes a bit difficult when dealing with byte- and word-sized variables. However, using it just to push/pop registers in the prologue/epilogue is something I can implement pretty easily. I'll put it on the list. I do want all instructions available, at least for inline ASM code. Eventually I also want to make the assembly parser support all PASM constructs so that we can compile blocks of assembly directly without needing other toolchains, but that's an issue for another day. The big thing that doesn't work in the parser right now is condition and effect flags. I hard-coded them into the instruction strings for comparisons and branches for specific effects, but there's more that could be done.
I'll put something more useful together today--I realize the wiki stuff I have is not very helpful to fresh eyes.
Hm, that's interesting. I did a clean build before I posted to make sure I had no issues and it built fine. I'll try actually reconfiguring from scratch too. I'm running macOS as well. My best guess is it's some dependency issue.
I'll take a look at the thread again. The last time I read it (like 4-5 months ago) it was still pretty new and there was not a lot of agreement on what it would be, but I'll catch up again. I completely agree we should have a standard header for all compilers, and I'll start making tracks to conform to that one. One potential deviation that I'll have (and hope to solve) is that right now, statements like DIRA |= 1 << 1; are not possible, since the compiler has no knowledge of cog vs. hub RAM, so I can't give a pointer to cog memory. And as far as I know, C doesn't have a construct for assigning a value directly to a register (which is what LLVM thinks DIRA is).
I'll also start making this a proper fork. I've also noticed that LLVM is starting a port of the standard C library for backends to implement. That would be super helpful for creating the propeller 2 version for this as well.
Thanks. I still haven't managed to actually build a whole application although I have been able to get clang -S to show me assembly code from a simple function, so that's a start. Some guidance on what flags to pass to cmake (e.g. which LLVM tools we should build, and where we should put them) would be great.
I think propeller2.h has settled down quite a bit; at least there's a useful common subset supported by riscvp2, fastspin, and Catalina.
Honestly I think for P2 it's OK for us to require people to use macros like _pinh(x) and _pinl(x) rather than writing directly to registers. With 64 pins figuring out which of DIRA/DIRB, OUTA/OUTB, INA/INB to access is a pain, whereas the instructions like PINL do that automatically.
riscvp2 has a similar problem: the P2 instructions like PINL are implemented (via custom RISC-V instructions) but the registers are not directly accessible. In practice though this doesn't seem to hurt; I've been able to convert my VGA driver, garry's USB driver, and various other Spin2 code none of which needs direct register access via the register names.
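To make the 64-pin point concrete, here is a minimal host-side sketch of the bookkeeping that a pin helper hides from the caller. Everything in it is an assumption for illustration: `pin_high` and `pin_low` are hypothetical names (not the propeller2.h macros), the four registers are modelled as ordinary C variables so the code runs on a PC, and it assumes pins 0-31 live on port A and pins 32-63 on port B. It is not the p2llvm implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the P2 direction and output registers. On real hardware
   these are cog registers, not C variables; plain globals keep this
   sketch runnable on a host machine. */
uint32_t dira, dirb, outa, outb;

/* Hypothetical helper: make pin 0..63 an output and drive it high.
   Assumes pins 0-31 map to port A and 32-63 to port B. */
void pin_high(unsigned pin)
{
    assert(pin < 64);
    uint32_t mask = UINT32_C(1) << (pin & 31);
    if (pin < 32) {
        outa |= mask;
        dira |= mask;
    } else {
        outb |= mask;
        dirb |= mask;
    }
}

/* Hypothetical helper: make pin 0..63 an output and drive it low. */
void pin_low(unsigned pin)
{
    assert(pin < 64);
    uint32_t mask = UINT32_C(1) << (pin & 31);
    if (pin < 32) {
        outa &= ~mask;
        dira |= mask;
    } else {
        outb &= ~mask;
        dirb |= mask;
    }
}
```

Calling `pin_high(33)` touches only the port B variables, which is the A/B selection a single pin instruction would otherwise do for you.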
(EDIT: DIRA has to be on the clobber list for operator=)
@Wuerfel_21 that's clever. I think I'll include it as an option for compatibility with older propgcc code, but I agree with Eric, it will be a pain now that we have 64 pins.
Have I mentioned that I'm excited for this? It's true. I am. Thank you, @n_ermosh, for getting the ball rolling on this!
https://www.avrfreaks.net/forum/avr-llvm-released-1#new
Yeah, looking into this again today it appears from the warning that appeared prior to the error that I am probably using a version of Clang to build your LLVM that it doesn't like. The source code shows use of newer C++ features which I'm not too familiar with and maybe my Clang compiler version may not be up to it. My system is still running MacOSX Yosemite and is getting old now. I mainly keep it for using BST with my P1 projects.
For now I've commented out the offending line in the bug report note generator that returns some string (who knows what that will do) and the compile is now continuing, currently at 40% complete.
Update: Spoke too soon, build reached 63% but barfed again on its own includes. My setup must be incompatible in some way, I'm probably just too out of date somehow.
@rogloh clang 6 should work fine, though I've got 9.1.
in theory, gcc should also work, you can set it to be built using gcc by setting the CMAKE_C_COMPILER and CMAKE_CXX_COMPILER flags when running the initial cmake (see https://llvm.org/docs/GettingStarted.html#local-llvm-configuration)
@jmg I actually referenced the AVR backend a lot when working on this. Essentially, I used a lot of AVR, MIPS, MSP430, and Sparc backend code to make this work and to understand how LLVM actually works. A little bit of the ARM backend as well.
It's interesting that despite doing runtime instruction translation riscvp2 is usually fastest. Is GCC 8 that much better than GCC 4? I dug deep into riscvp2 to know that it only does a few optimizations. I can't wait to see some benchmarks for clang.
I'll check my llvm compile in the morning.
From what I've seen so far, the clang optimizer is pretty good. I can also give it hints about the cost of each Propeller instruction (I haven't even started looking into this yet...), so if it knows of multiple ways to convert the program graph into actual instructions, the compiler can choose the one that costs the least. That will probably be super helpful when trying to reduce hub access or CORDIC use to improve performance.
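The idea of choosing the cheapest option can be sketched in a few lines of C. This is only an illustration of the selection step, not how LLVM expresses or consumes instruction costs internally; the `candidate` struct, the `pick_cheapest` function, and any cost numbers passed to it are hypothetical placeholders rather than measured P2 timings.

```c
#include <assert.h>
#include <stddef.h>

/* One candidate lowering for the same operation. The cost is a
   placeholder number, not a measured P2 cycle count. */
typedef struct {
    const char *name;
    unsigned cost;
} candidate;

/* Return the index of the cheapest candidate (the first one wins ties). */
size_t pick_cheapest(const candidate *c, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (c[i].cost < c[best].cost)
            best = i;
    }
    return best;
}
```

Given made-up costs where a hub access is more expensive than reusing a register, the function simply returns the register-reuse entry; the interesting work in a real backend is producing honest cost numbers.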
Also, do I need to tell it to build Clang as well if I already have my own Clang version running on my Mac? Is your LLVM something that is built as an add-on to an existing Clang, or does it require its own freshly compiled Clang front end, so to speak?
I had originally used this setup in cmake. Maybe I don't need to define that clang if I already have it?
cmake -DLLVM_ENABLE_PROJECTS="clang" ../llvm/
rdlong a, ptra[4]
If you can utilize a pointer register for the stack frame access that might be better than doing things like this:
mov temp, sp
add temp, #4
rdlong a, temp
It saves a lot of space and is faster at the same time. Of course there are range limits of the offsets to consider too. In some cases you would still need to use the second form.
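A rough sketch of that decision, written as host-side C: pick the short indexed form when the index fits, and fall back to the mov/add form otherwise. The limits used here (-32 to 31 for a non-updating pointer index) are an assumption from memory and should be checked against the P2 documentation; `choose_stack_access` and the enum names are hypothetical and do not come from p2llvm.

```c
#include <assert.h>

/* Assumed index range for the non-updating ptra[index] form.
   Verify against the P2 documentation before relying on it. */
enum { PTR_INDEX_MIN = -32, PTR_INDEX_MAX = 31 };
static_assert(PTR_INDEX_MIN <= PTR_INDEX_MAX, "index range must not be empty");

typedef enum {
    FORM_INDEXED, /* single instruction: rdlong a, ptra[index]  */
    FORM_MOV_ADD  /* fallback: compute the address in a temp    */
} access_form;

/* Hypothetical helper: choose the access form for a given index,
   where index is the value written between the brackets. */
access_form choose_stack_access(int index)
{
    if (index >= PTR_INDEX_MIN && index <= PTR_INDEX_MAX)
        return FORM_INDEXED;
    return FORM_MOV_ADD;
}
```

With these assumed limits, an index of 4 takes the short form while an index of 32 has to use the longer sequence.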
Yes, I figured that one out recently and that's how I push/pop values to the stack. However, sometimes it is necessary to store the pointer separately in a specific register for later use, so I still end up performing a mov/add (actually it's a sub, but same thing) operation. And even for large offsets, we can just insert augs beforehand (which I've already set up in the code but have never tested; something for the todo list). I like the idea of doing a block transfer in the prologue/epilogue; I need to look into that more (on top of all the other features that still need work).
as an example, here's a putc() function for printing out a character I just dumped super quick for reference:
Yes I did some work on this with p2gcc first talked about here:
https://forums.parallax.com/discussion/comment/1473617/#Comment_1473617
I created some Perl scripts to hunt for the prolog/epilog patterns and replaced the full code with my shortened setq version. In my tests with p2gcc I'd say it probably gave less than a 5% boost, but it would be more for heavily nested short function calls. It only improves further the more registers you need to save, so it's certainly a gain worth considering. You would just need to lay out and order your preserved registers appropriately and contiguously, because setq can only burst read/write to memory in one direction (i.e. increments only). Keep this one in mind; it's some low-hanging fruit for utilizing P2 capabilities.
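The contiguity requirement can be checked mechanically. The sketch below is a hypothetical host-side helper, not code from p2gcc or p2llvm: `is_ascending_run` tests whether a list of register numbers forms one ascending, gap-free run, and `setq_value` returns the count to load before the burst, on the assumption that setq takes the number of longs minus one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if regs[0..n-1] are consecutive and ascending, e.g. 4,5,6,7.
   An empty list is not a usable run. */
bool is_ascending_run(const unsigned *regs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (regs[i] != regs[i - 1] + 1)
            return false;
    }
    return n > 0;
}

/* Value to load with setq for a burst of n longs.
   Assumes the instruction expects the count minus one. */
unsigned setq_value(size_t n)
{
    assert(n > 0);
    return (unsigned)(n - 1);
}
```

A register set such as 4, 5, 7 fails the check, so it would need either a reordered register allocation or two separate transfers.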