riscvp2: a C and C++ compiler for P2
ersmith
Update July 31, 2020: I've downloaded specific toolchains for Win32, MacOSX, and Linux, applied the changes to them, and uploaded the binaries to:
https://github.com/totalspectrum/riscvp2/releases/latest
So you can get the final binaries and don't have to build them yourself. Note that these are command line tools only, although since they're standard GCC (with one extra command line option for the P2 link) it should be easy to hook them up to your favorite IDE.
Provided are all the GNU binutils, GCC, and G++, along with newlib libraries. If you get this you'll be all set up for developing applications for both P2 and RISC-V.
Under the hood this compiler works by doing a JIT (just in time) translation from RISC-V code to P2 code. The RISC-V compressed instruction set is supported, so binaries can be reasonably small (but beware, the newlib libraries tend to add a lot of bloat; you'll probably want to use the "nano" version of the libraries). Performance is actually very respectable; on many benchmarks riscvp2 beats all of the other C compilers for the P2 (fastspin, Catalina, and p2gcc). On a substantial "real world" workload (micropython) the riscvp2-compiled binary ended up about 10% slower than the p2gcc-compiled one, but was substantially smaller.
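A side note on the compressed instruction set: in RISC-V you can tell a 16-bit compressed instruction from a standard 32-bit one by its lowest two bits (anything other than binary 11 means compressed). Here's a minimal C sketch of that rule, assuming only the 16- and 32-bit encodings appear in the code being scanned. The function names are made up for illustration; they aren't part of riscvp2.

```c
#include <stdint.h>
#include <stddef.h>

/* Length in bytes (2 or 4) of the RISC-V instruction whose first 16-bit
   parcel is `parcel`. Assumes only the standard 16-bit (C extension) and
   32-bit encodings are used; longer reserved encodings are not handled. */
static inline unsigned rv_insn_length(uint16_t parcel)
{
    return ((parcel & 0x3u) == 0x3u) ? 4u : 2u;
}

/* Walks `nbytes` of little-endian RISC-V code, returning the number of
   instructions and storing how many of them are compressed in *ncompressed. */
static size_t rv_count_insns(const uint8_t *code, size_t nbytes, size_t *ncompressed)
{
    size_t count = 0, comp = 0, pc = 0;
    while (pc + 2 <= nbytes) {
        uint16_t parcel = (uint16_t)(code[pc] | (code[pc + 1] << 8));
        unsigned len = rv_insn_length(parcel);
        if (pc + len > nbytes)
            break;                  /* truncated instruction at the end */
        if (len == 2)
            comp++;
        count++;
        pc += len;
    }
    if (ncompressed)
        *ncompressed = comp;
    return count;
}
```

For example, c.nop (0x0001) and c.jr ra (0x8082) are 2 bytes each, while the uncompressed nop (addi x0,x0,0 = 0x00000013) is 4 bytes.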
AFAIK this is currently the only solution for full C++ on P2.
Original message:
I've posted what I hope are useful instructions for compiling C and C++ applications for P2 using a RISC-V toolchain. The repository is at https://github.com/totalspectrum/riscvp2. The README.md should describe everything you need, including where to find a pre-built RISC-V toolchain. Once your RISC-V toolchain is installed, edit the Makefile to have the correct paths and then do "make install". That should update your RISC-V toolchain so that it can build P2 binaries as well, by just giving the "-T riscvp2.ld" switch to riscv-none-embed-gcc or riscv-none-embed-g++.
"make hello-c++.binary" will build a simple C++ demo application that should run on the P2 Eval board (it works for me, at least!)
Feedback is very welcome. I've been using this kind of setup for a while, so I may well have overlooked some step in the descriptions, or perhaps not made something as clear as it should be. If so, please let me know!
(Tested on Debian Linux. It should work the same for any Linux or Mac OSX. For Windows it *should* work if you have some kind of POSIX system like msys or cygwin installed, but I haven't actually tested it there.)
Comments
I compiled it with:
(This uses the LUT caching version of the JIT compiler, which is the fastest but has a small cache, so best for small programs like this.)
If I've done everything right, the inner loop will compile in the end to two instructions:
Running from LUT cache this should take just 6 cycles per loop. The system clock is 160 MHz, so we should get about 27 million toggles per second. I don't have an oscilloscope to check this, but I've attached the binary so interested readers can double check my work.
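The arithmetic behind that estimate, assuming one pin toggle per loop iteration:

```latex
\frac{160 \times 10^{6}\ \text{cycles/s}}{6\ \text{cycles/loop}} \approx 26.7 \times 10^{6}\ \text{toggles/s}
```

Since one full square-wave period takes two toggles, a scope on the pin should show roughly 13.3 MHz if the loop really runs at 6 cycles.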
You'll also probably notice that the binary is huge, relatively speaking. That's a consequence of the newlib library that comes with most RISC-V toolchains, which for an "embedded" library is incredibly bulky. We'll probably want to port proplib or some other P2 library to RISC-V to get a more compact binary.
Has anyone tried this out? If you've tried and failed (or couldn't understand the docs) please let me know -- I would like to improve the installation instructions.
I started fixing some but quickly grew tired. Porting PropWare to P2 is going to require far more than an hour on a weeknight.
But, it does seem feasible, and the fact that this is all based on a GCC toolchain means no crazy hacks need to be applied to CMake. It's very promising!
Thanks!
Ah, __builtin_propeller_rev reveals something we're missing from propeller2.h -- a way to access the _rev() instruction. We should add that. There are probably a few other things that are operators in Spin2 that we'll want to add as C functions.
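Until something like _rev() exists in propeller2.h, a portable C fallback is easy to write, and it's also handy as a reference to test an intrinsic against later. The sketch below reverses all 32 bits of a word (bit 0 swaps with bit 31, and so on), which is my understanding of what the P2 REV instruction does. rev32_portable() is a hypothetical name, not an existing propeller2.h function.

```c
#include <stdint.h>

/* Portable fallback: reverse the order of all 32 bits of x.
   rev32_portable is a hypothetical helper, not part of propeller2.h. */
static inline uint32_t rev32_portable(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* swap adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* swap bit pairs */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* swap nibbles */
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  /* swap bytes */
    return (x >> 16) | (x << 16);                             /* swap halfwords */
}
```

For example, rev32_portable(0x12345678) gives 0x1E6A2C48, and applying it twice returns the original value.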
CNT of course is now _cnt() on P2.
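When porting P1 timing code that read CNT, one habit carries over unchanged: take the difference of two counter readings with unsigned 32-bit arithmetic, so a counter wrap between the readings is harmless. A small sketch, assuming _cnt() returns the low 32 bits of the system counter as an unsigned value; ticks_elapsed() and ticks_to_us() are hypothetical helper names.

```c
#include <stdint.h>

/* Elapsed system-clock ticks between two 32-bit counter readings
   (e.g. two values from _cnt() on P2, or CNT on P1). Unsigned subtraction
   is modulo 2^32, so this stays correct if the counter wrapped once. */
static inline uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Convert ticks to microseconds for a system clock of clk_hz.
   The 64-bit intermediate keeps ticks * 1000000 from overflowing. */
static inline uint32_t ticks_to_us(uint32_t ticks, uint32_t clk_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / clk_hz);
}

/* Typical use on P2 (assumes propeller2.h provides _cnt();
   do_something() is a placeholder):
 *     uint32_t t0 = _cnt();
 *     do_something();
 *     uint32_t us = ticks_to_us(ticks_elapsed(t0, _cnt()), 160000000u);
 */
```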
Thanks for giving this a try. I hope you were able to run something simple like "hello world"?
Sorry, what kind of gotchas with condition codes? Oh, you mean porting P1 assembly code to P2. Yeah, in general there are lots of gotchas there. Even more so if you're using the RISC-V toolchain, which only supports RISC-V inline assembly. If you stick with the macros in propeller2.h you should be OK though.
The gnu-mcu-eclipse RISC-V toolchain that I point to in the README has Windows and MacOSX binaries, and the code to change the RISC-V toolchain to a P2 toolchain is fairly generic (it does use some Unix utilities that should be available for MacOS and would be available for Windows in cygwin or msys). But I haven't tried it myself on Windows.
Also, I'm guessing that inline assembly is not possible. Is that true?
If so, how would one add assembly drivers to a project?
As David mentioned, inline RISC-V assembly works fine. The emulated RISC-V has a number of extended instructions (RISC-V is inherently extensible) which map directly to P2 instructions, and most of those are supported via macros from propeller2.h.
To add assembly drivers to a project, convert them to C code via spin2cpp (or compile them with fastspin to a binary blob) and then use the _coginit() function (from propeller2.h) to start them. That's how I added the USB and VGA drivers to the micropython port.
Is there a way to include the binary blob in the C++ source? Need to convert the binary to text to do it?
looking at this: https://csl.name/post/embedding-binary-data/
I assume this comes with GCC, right?
By JIT compiling, do you mean you are taking the RISC-V binary and converting it to P2 binary in real time as the program runs?
Isn't this an emulator?
Mainly I was thinking about a PASM driver. In theory you could probably do it for any fastspin language, but by default fastspin produces hubexec and you'd have to do some careful juggling to get that to link together with a C++ program. But if you used --code=cog so it would run in another COG it probably would work.
For a binary blob, there are a number of ways to include it in the program. The GNU linker could be used to link it into a program, or you could use xxd or some similar tool to convert it to a C array of hex bytes. Or if you're using spin2cpp it will (by default) output the DAT section as an array of bytes, and convert the Spin parts to C++ code, so the result can just be compiled as a .c or .c++ file.
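For the xxd route, running "xxd -i driver.bin > driver_bin.h" emits a C array named after the file, plus a length variable. The sketch below shows that shape; the file name and the eight bytes are placeholders, not a real driver. The comment shows the GNU ld alternative, with the symbol names ld derives from the input file name; the exact riscv-none-embed-ld invocation is an assumption I haven't tested with this toolchain.

```c
/* Roughly what "xxd -i driver.bin" produces: the array name comes from the
   file name with '.' replaced by '_'. These bytes are placeholders. */
unsigned char driver_bin[] = {
  0xde, 0xad, 0xbe, 0xef, 0x00, 0x00, 0x00, 0x00
};
unsigned int driver_bin_len = 8;

/* GNU linker alternative (untested here):
 *     riscv-none-embed-ld -r -b binary driver.bin -o driver_blob.o
 * should produce an object defining _binary_driver_bin_start and
 * _binary_driver_bin_end, which C code can declare as:
 *     extern const unsigned char _binary_driver_bin_start[];
 *     extern const unsigned char _binary_driver_bin_end[];
 * Either way, the blob's address is what you'd hand to _coginit(). */
```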
Yes, it does convert the RISC-V binary to P2 at run time. But it is a compiler, not an interpreter -- it caches the compiled code and re-uses it. So there is some latency the first time through any loop (as the RISC-V instructions are compiled to P2 instructions) but subsequent loop iterations run at full hubexec speed (or LUT exec speed, depending on the options you give).
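To make the compile-once, reuse-afterwards idea concrete, here's a toy model of a translation cache in C. It is not riscvp2's actual implementation: translate() is a stand-in that returns a fake value instead of emitting P2 code, and the direct-mapped layout and CACHE_SLOTS size are assumptions chosen for simplicity. It does show the effects described in this thread: the first pass through a loop pays for compilation, later passes hit the cache, and code that doesn't fit (here, two addresses competing for one slot) keeps getting recompiled.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_SLOTS 64u            /* hypothetical cache size */

typedef struct {
    bool     valid;
    uint32_t guest_pc;             /* RISC-V address this slot translates */
    uint32_t host_code;            /* stand-in for a pointer to P2 code */
} cache_entry;

typedef struct {
    cache_entry slots[CACHE_SLOTS];
    unsigned    compiles;          /* slow path: translation performed */
    unsigned    hits;              /* fast path: cached translation reused */
} jit_cache;

static void jit_init(jit_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Stand-in for the real translator: pretend to emit P2 code for pc. */
static uint32_t translate(uint32_t guest_pc)
{
    return guest_pc ^ 0xA5A5A5A5u;
}

/* Return the translation for guest_pc, compiling it on a miss.
   Direct-mapped: each pc (2-byte aligned, since compressed instructions
   are 16 bits) has exactly one possible slot, so two hot addresses that
   map to the same slot keep evicting each other. */
static uint32_t jit_lookup(jit_cache *c, uint32_t guest_pc)
{
    cache_entry *e = &c->slots[(guest_pc >> 1) % CACHE_SLOTS];
    if (e->valid && e->guest_pc == guest_pc) {
        c->hits++;
        return e->host_code;
    }
    c->compiles++;
    e->valid = true;
    e->guest_pc = guest_pc;
    e->host_code = translate(guest_pc);
    return e->host_code;
}
```

Running a 4-instruction loop 10 times through jit_lookup() gives 4 compiles and 36 hits; alternating between 0x100 and 0x180 (which share a slot) never hits at all.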
Also, the RISC-V processor that is emulated has some custom P2 instructions, so if any of those are used then it won't run on "real" RISC-V hardware (unless someday somebody makes a RISC-V with those custom instructions).
Here's the performance of some compilers on Heater's fft benchmark: I wouldn't pay too much attention to the binary sizes, those are mostly influenced by the libraries (Catalina and riscvp2 have pretty big libraries, fastspin and p2gcc have minimalistic ones). But the performance numbers show that riscvp2 more than holds its own. The dhrystone benchmark numbers are similar. On some other benchmarks (e.g. micropython) riscvp2 trails p2gcc by a bit, so it isn't always the fastest, but it is definitely competitive. It's also the only compiler at present to have 64-bit double and long long support.
You seem to be calling it an emulator but also a compiler (and not an interpreter).
I think if this were running on a PC, it would be called an emulator, right?
Aw crud... that has me a bit depressed. I was hoping that, with some work to change header files around and a few PropGCC-specific macros, I could compile all of PropWare with riscvp2. But since PropWare makes extensive use of inline (Propeller) assembly, porting to riscvp2 would be a significant undertaking - one that definitely isn't worth it so long as a proper GCC/LLVM port is still on the table.
(1) You could just pretend it's a P2 compiler. After all, it takes C/C++ code and produces a binary that can run on a P2. The binaries it produces are a bit smaller than those from most P2 compilers, and potentially a little bit slower (although in practice, as shown above, they're often faster!)
(2) You could call it a RISC-V emulator, because it works by translating (at run time) RISC-V instructions into P2 instructions. But that's also a little bit misleading, because the RISC-V it emulates doesn't really exist (it has custom instructions that no actual RISC-V hardware has right now, although if Chip wants to use the RISC-V instruction set in P3 then he could use this as a starting point).
When you say "emulator", especially in the Propeller world, people tend to assume a (slow) interpreter. Since this one uses a JIT compiler, the output does run at full hubexec speed -- it just has some startup latency due to compilation (and really big programs that don't fit in cache do tend to slow down, but that's a rare case). So I've been kind of avoiding the "emulator" word. But yes, it's fair to call it an emulator -- but a very, very fast one.
Do you have any comparisons with a compiler that produces a native P2 binary? That should be the fastest of all.
In the JIT compiler case, if the P2 can manage this itself (with some time overhead), could a PC also do the same work, remove that load from the P2, and so create a native P2 file?
How much larger does that P2 file become?
Do the PropWare files have non-inline (probably slower, of course) versions? If so, those would just compile. If not, you may want to add them: it'll make your code a lot more portable, possibly even to other compilers like fastspin (fastspin only supports a very limited subset of C++ now, but perhaps someday it could handle more). Besides the P2 instruction set changes, things like FCACHE are going to be very different between P2 and P1. Many P2 compilers will skip FCACHE entirely, since it isn't nearly as much of a win over hubexec as it is over LMM.
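A minimal sketch of keeping a plain C version next to an assembly fast path, using population count as the example (P2 has a ONES instruction for it). HAVE_P2_ONES_ASM and p2_ones() are hypothetical names that no toolchain defines for you; with that macro undefined, GCC uses its __builtin_popcount() and any other compiler falls back to a plain loop.

```c
#include <stdint.h>

/* Count the 1 bits in x, with an optional inline-assembly fast path.
   HAVE_P2_ONES_ASM and p2_ones() are hypothetical placeholders. */
static inline unsigned count_ones(uint32_t x)
{
#if defined(HAVE_P2_ONES_ASM)
    return p2_ones(x);                       /* would wrap the P2 ONES instruction */
#elif defined(__GNUC__)
    return (unsigned)__builtin_popcount(x);  /* GCC builtin */
#else
    unsigned n = 0;
    while (x) {
        x &= x - 1u;                         /* clear the lowest set bit */
        n++;
    }
    return n;
#endif
}
```

The portable branches also give you something to compare the assembly version against when testing.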
riscvp2 does have a <propeller2.h> file with macros for a lot of common P2 instructions.
In theory someone could do something like this on the PC; it would be like p2gcc but taking RISC-V input instead of P1 input. It's not a project I'm particularly interested in. In general RISC-V compressed code is 25% smaller than uncompressed RISC-V code, which is in turn roughly 10% smaller than P1 code (the RISC-V instruction set is more compact). Obviously your mileage may vary.
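Chaining those two rough figures gives the size of compressed RISC-V code relative to P1 code:

```latex
S_{\text{RVC}} \approx 0.75\,S_{\text{RV}},\qquad
S_{\text{RV}} \approx 0.90\,S_{\text{P1}}
\;\Longrightarrow\;
S_{\text{RVC}} \approx 0.75 \times 0.90\,S_{\text{P1}} = 0.675\,S_{\text{P1}}
```

So compressed RISC-V code would come out at roughly two thirds the size of equivalent P1 code. This compares input code sizes only; it doesn't tell you how big the translated P2 code would be.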
But, I do appreciate the negative connotations of "emulator".
Maybe I'd call it a "real time emulator" or "full speed emulator" or something like that...
No, it really is literally a JIT compiler. A JIT ("just in time") compiler is a compiler that translates at run time one instruction set into another. For example, the JVM JIT compiler translates Java bytecode to x86 (or whatever) instructions. That's exactly what riscvp2 does, at run time it compiles the RISC-V instructions to P2 instructions.