Can't Wait for PropGCC on the P2?
Dave Hein
I've been hoping that P2 support for PropGCC would happen for a while. However, it seems like this won't happen until silicon is available, so I decided to do it myself -- sort of. I looked at the assembler output from the P1 compiler, and the COG-model code looks very similar to P2 code, except for a few minor details. So I wrote a conversion program called s2p that converts a P1 .s assembler file to a .spin2 file that can be assembled by PNut. The attached zip file contains the s2p.c source file along with an s2p.exe executable that runs in a command prompt under Windows. s2p.c can be compiled under Linux by typing "gcc s2p.c -o s2p". There is also a c2p.bat file for Windows that compiles a C program to assembly and then converts it to a .spin2 file. In a command prompt you would just type something like "c2p fibo" to compile fibo.c and create fibo.spin2, which can then be assembled and loaded by PNut.
There are 4 sample programs -- fibo.c, bas.c, dry.c and chess.c. dry.c runs the Dhrystone benchmark program. bas.c is a Basic interpreter. chess.c is my threaded chess program, but only running on a single cog. Moves are entered by typing something like "d2-d4".
At some point I hope to get my P2 assembler updated to the latest instruction set, and then write a loader so everything can be done from the command line.
There is also another file called prefix.spin2 that contains the startup code and several C library functions, such as putchar, getchar, gets and printf. s2p copies this file to the beginning of the output file. Eventually I may implement a simple linker so that only the needed object files will be linked in, unless of course PropGCC becomes available for the P2 first.
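The "copy the prefix, then append the converted program" step can be sketched in C as shown below. The function names (copy_prefix, convert_line, write_spin2) are hypothetical and do not come from s2p.c. convert_line is only a placeholder that passes each line through unchanged. In the real converter, that is where P1 assembly would be rewritten into PASM2.

```c
#include <stdio.h>

/* Append the contents of prefix_path to out, byte for byte.
   Returns 0 on success, -1 if the prefix file cannot be opened. */
static int copy_prefix(const char *prefix_path, FILE *out)
{
    FILE *in = fopen(prefix_path, "rb");
    int c;

    if (in == NULL)
        return -1;
    while ((c = fgetc(in)) != EOF)
        fputc(c, out);
    fclose(in);
    return 0;
}

/* Hypothetical per-line hook: a real converter would translate the
   P1 assembly line into PASM2 here. This sketch copies it unchanged. */
static void convert_line(const char *line, FILE *out)
{
    fputs(line, out);
}

/* Write the startup/library prefix first, then the converted program. */
static int write_spin2(const char *prefix_path, FILE *src, FILE *out)
{
    char line[512];

    if (copy_prefix(prefix_path, out) != 0)
        return -1;
    while (fgets(line, sizeof line, src) != NULL)
        convert_line(line, out);
    return 0;
}
```

A simple linker, as mentioned above, would replace the single copy_prefix call with logic that copies only the library routines the program actually references.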
EDIT: The latest version of the compiler tools is contained in the zip file p2gcc006.zip. See the readme.txt files contained in the release for more information, and also check the latest posts in this thread.
Comments
Can you add a summary of what is the same and what is different compared with the P1 flow?
I take it there is a 1:1 opcode mapping, so P2 code is exactly the same size as P1 code?
Is the speed then exactly 2.0x the P1, for an 80 MHz P2 FPGA?
Are the libraries you mention coded in PASM2, so they can be smaller/faster than on the P1, or are they translated P1 libs?
Suppose gccp1 were tweaked to optionally raise the COG size. Would that larger ASM file convert to a mix of COG and HUBEXEC code on the P2 (or is that COG/LUT/HUB)?
No, the speed is not exactly 2x the P1. Average hub access latencies should be about the same, at around 16 cycles, so code that does a lot of random hub accesses will run at about the same speed. However, hub exec should be much quicker than LMM on the P1. Part of the problem with using the P1 COG mode is that the compiler generates a lot of pointers in cog memory, which end up in hub memory with my approach. I'll have to look at starting from LMM code, which may be more efficient for converting to P2 hub exec.
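A back-of-envelope model shows why COG-model code shouldn't be expected to run exactly 2x faster at the same clock. The sketch below rests on two assumptions, not measurements. First, ordinary instructions take 4 clocks on the P1 and 2 on the P2. Second, every hub access costs about 16 clocks on both chips. It does not model LMM or hub exec at all.

```c
/* Rough cycle-count model (assumptions, not measurements):
   - ordinary cog instructions: 4 clocks on the P1, 2 clocks on the P2
   - each hub access: about 16 clocks on both chips
   Both chips are assumed to run at the same clock frequency. */
static double estimated_speedup(double cog_instrs, double hub_accesses)
{
    const double p1_cycles = cog_instrs * 4.0 + hub_accesses * 16.0;
    const double p2_cycles = cog_instrs * 2.0 + hub_accesses * 16.0;

    return p1_cycles / p2_cycles;   /* > 1.0 means the P2 is faster */
}
```

With 1000 cog instructions and no hub accesses the estimate is 2.0. With 1000 cog instructions and 250 hub accesses it drops to about 1.33. Hub-heavy code therefore gains much less than the raw instruction timing suggests.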
The libraries started out as C code, and are pre-compiled/converted to PASM2. They went through the same process as user program code.
If the code would fit in COG and LUT memory it will run at the full speed, and generated pointers could be used directly without having to be loaded from hub RAM. So small programs could be run entirely from COG/LUT memory.
Does the converter default one way or the other?
A smarter assembler could automate the 'bump' to a dual-opcode sequence, which could be arranged either way.
e.g. default larger and shrink when possible, or default smaller and add when needed.
In both cases, multiple passes look to be needed, but multi-pass assemblers are practical.
I'm not following this exactly. Are you saying the final memory map differs from P1 to P2, with some vars shifting from COG to HUB?
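The "default smaller and add when needed" strategy can be sketched as below, under a simplified model with three assumptions. Every instruction is one long (4 bytes) unless it needs an AUGS prefix. An immediate can hold values up to 511, matching the P2's 9-bit immediate field. The only large immediates are absolute label addresses. The struct insn layout and relax() are hypothetical. Each pass lays out addresses from the current size guesses, then grows any instruction whose target no longer fits. Because sizes only ever grow, the passes always terminate.

```c
#include <stdbool.h>
#include <stddef.h>

#define IMM_MAX 511u   /* largest value a 9-bit immediate can hold */

struct insn {
    int target;        /* index of the instruction whose address is used
                          as an immediate, or -1 if none */
    unsigned size;     /* 1 long, or 2 longs when an AUGS prefix is needed */
    unsigned addr;     /* byte address assigned in the latest pass */
};

/* Start with every instruction at 1 long and repeat layout passes until
   no instruction changes size. Returns the number of passes performed. */
static int relax(struct insn *code, size_t n, unsigned base)
{
    int passes = 0;
    bool changed = true;

    for (size_t i = 0; i < n; i++)
        code[i].size = 1;

    while (changed) {
        unsigned pc = base;

        changed = false;
        passes++;

        /* Assign addresses from the current size guesses. */
        for (size_t i = 0; i < n; i++) {
            code[i].addr = pc;
            pc += 4u * code[i].size;
        }
        /* Grow any instruction whose immediate no longer fits. */
        for (size_t i = 0; i < n; i++) {
            int t = code[i].target;

            if (t >= 0 && code[i].size == 1 && code[t].addr > IMM_MAX) {
                code[i].size = 2;   /* add an AUGS prefix long */
                changed = true;
            }
        }
    }
    return passes;
}
```

Take code that starts at byte address 500. An instruction that refers to a label at 512 gets bumped to two longs. That pushes the following labels up, which can in turn force another bump on the next pass.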
Then, this magically makes it work for P2 in hubexec mode, right?
Sounds like a great start.
I looked at the assembly from the LMM model, and it avoids using a register with the address. This will probably run more efficiently on the P2 using hub exec. I think I just need to add support for a few LMM pseudo-ops to make this work.
I just finished trying the LMM mode, and I got mixed results. The LMM fibo was 35% slower than the COG version. The LMM dhrystone was 1% faster.
Built s2p.c on macOS without any compiler problems!
Tested a short LED blinker that builds in SimpleIDE, then ran c2p and created a .spin2 file. That file fails to build in PNut on Windows 10 (running under VMware), with the following error:
The C code as built in SimpleIDE:
How should waitcnt be converted to something that's compatible with Spin2?
dgately
Have you considered trying http://embed.cs.utah.edu/csmith/ to generate some C code for testing? I saw that the RISC-V guys were using it. I have no idea how difficult this would be to do.
Yes, it would be simpler. The CT stuff is only necessary if you need to correlate with the system counter.
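Eric's point can be illustrated with a small simulation of a 32-bit counter. A relative delay, like the P2's WAITX, is simpler, but any time spent in the loop body is added to every period. An absolute deadline stays locked to the system counter. P1-style waitcnt(t += period) works this way, as do CT-based waits such as ADDCT1/WAITCT1 on the P2. The simulated counter and the function names below are hypothetical, and no real hardware timing is modeled.

```c
#include <stdint.h>

/* Simulated 32-bit free-running system counter (hypothetical model). */
static uint32_t now;

/* Relative delay, in the spirit of WAITX: wait 'cycles' from now. */
static void wait_relative(uint32_t cycles)
{
    now += cycles;
}

/* Absolute wait, in the spirit of waitcnt or WAITCT1: wait until the
   counter reaches 'deadline'. The signed difference keeps the
   comparison correct when the 32-bit counter wraps around. */
static void wait_until(uint32_t deadline)
{
    if ((int32_t)(now - deadline) < 0)
        now = deadline;
}

/* Run 'iters' iterations whose body costs 'work' cycles and return the
   elapsed cycles. absolute != 0 selects deadline-based waits. */
static uint32_t run_loop(int absolute, uint32_t period, uint32_t work, int iters)
{
    uint32_t start = now;
    uint32_t deadline = now;

    for (int i = 0; i < iters; i++) {
        now += work;                /* time spent in the loop body */
        if (absolute) {
            deadline += period;     /* next deadline, independent of work */
            wait_until(deadline);
        } else {
            wait_relative(period);  /* work time accumulates as drift */
        }
    }
    return now - start;
}
```

Take a 1000-cycle period with 50 cycles of work per iteration. Ten relative iterations then take 10,500 cycles, while ten deadline-based iterations take exactly 10,000. When that drift doesn't matter, the simpler relative wait is fine.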
Eric, thanks for the tip on handling waitcnt.
KeithE, it looks like csmith is used for testing out the compiler. I'm not sure it's useful for what I'm doing, but I'll look into it some more.
Yes, I can now compile the .spin2 file created by s2p, with PNut.
dgately
Even if you continue to use the PNut assembler you might find the listing file from p2asm to be very useful. I am also working on a P2 loader program called p2load. I have it working under Cygwin, but I need to get it to run under Linux also. Once the loader is working it will be possible to compile a C program, assemble it and load it all from the command line. p2load will also serve as a terminal emulator. I'm hoping to post p2load by tomorrow night along with an updated s2p program that contains a couple of bug fixes.
That brings up the question: is there a way to query the version of the loader? ISTR the tags are related to the silicon.
Macros, conditional assembly...?
It works with an earlier version of P2-hot.
Initial build on macOS (Sierra) returned:
Modifying line 584 of that file (just return -1, or any integer, in the error path):
With this slight change to line 584 of p2asm.c, it builds on macOS (Sierra) and runs all of the tests in the verify directory...
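The actual code at line 584 of p2asm.c isn't shown here, so the function below is a hypothetical example of the pattern being described: a function declared to return int that has a bare "return;" on an error path. C99 makes that a constraint violation. Clang, the default compiler on macOS, reports it as an error, while GCC under Cygwin only flagged it with -Wall, as noted in the reply below. Returning an explicit error value satisfies both compilers.

```c
#include <stddef.h>

/* Hypothetical example: parse a decimal count, returning -1 on error. */
static int parse_count(const char *s)
{
    int value = 0;

    if (s == NULL || *s == '\0')
        return -1;          /* previously a bare "return;" */
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;      /* explicit error value, not a bare return */
        value = value * 10 + (*s - '0');
    }
    return value;
}
```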
dgately
dgately, thanks for checking into that. It's odd that GCC under Cygwin doesn't flag that as an error, or even issue a warning. I recompiled with the -Wall option, and it did issue a warning for it along with warnings about unused variables and a few other things. I cleaned it up and posted an updated version in the attached zip file.
David, I do recall there was a p2load or p2loader before, so as it turns out I actually called my loader loadp2 instead. I used the wrong name in my previous post. Most of my code is borrowed from the loader you wrote for the P1.
So what I'm thinking is that I'll post my loadp2 program that just supports Cygwin for now, and maybe you can update your p2load to support the latest FPGA based on the code in loadp2. At that point p2load would then become the "official" P2 loader. I'll post my code later today. It basically works by loading the MainLoader program first, and then loads the user program using MainLoader. loadp2 runs at a fixed baud rate of 115,200.
Edit: Duh. I should have read your post more carefully. I assume "MainLoader" is a second-stage loader.
In theory, loading could be done using the Prop_Hex or Prop_Ascii command, but this runs using the 12 MHz RC clock, which seems to drift a lot. I had to insert several ">" characters in the Prop_Hex stream to ensure that MainLoader got loaded OK. The ">" character is used to re-calibrate the baud rate.
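The Prop_Hex stream described above can be sketched as follows. Each byte is sent as two hex digits followed by a space. An extra ">" goes in every few bytes so the ROM can re-measure the baud rate while running from the drifting RC clock. The function name, the recalibration interval, and the exact header and terminator strings are all assumptions to be checked against the P2 ROM loader documentation, which is why they are left as parameters.

```c
#include <stddef.h>
#include <stdio.h>

/* Write a Prop_Hex-style download stream to 'out': the header, each byte
   as two hex digits plus a space, a "> " every 'recal_every' bytes (0
   disables it), and finally the trailer. Header, trailer, and interval
   are caller-supplied assumptions, not a verified protocol definition. */
static void write_hex_stream(FILE *out, const char *header,
                             const unsigned char *data, size_t len,
                             size_t recal_every, const char *trailer)
{
    fputs(header, out);
    for (size_t i = 0; i < len; i++) {
        /* A periodic '>' lets the receiver re-calibrate its baud timing. */
        if (recal_every != 0 && i != 0 && i % recal_every == 0)
            fputs("> ", out);
        fprintf(out, "%02x ", (unsigned)data[i]);
    }
    fputs(trailer, out);
}
```

Inserting ">" more often costs a little bandwidth but tolerates more clock drift, so the interval is a tuning knob rather than a fixed value.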
Here's the code that does the actual loading. It resets the P2, then loads the MainLoader program using Prop_Hex. The user program is then loaded after that.
Is that 115200 a command-line option?
It seems nice to default to 115200, but with an option for higher baud rates once users know their system is working...
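A baud-rate option along those lines could look like the following sketch. The -b flag and parse_baud() are hypothetical and are not loadp2's actual interface. The default matches the fixed 115,200 baud mentioned above.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_BAUD 115200L

/* Scan argv for a hypothetical "-b <rate>" option. Returns the baud rate
   (DEFAULT_BAUD if the option is absent), or -1 if the value is not a
   positive decimal integer. */
static long parse_baud(int argc, char **argv)
{
    long baud = DEFAULT_BAUD;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-b") == 0 && i + 1 < argc) {
            char *end;

            baud = strtol(argv[++i], &end, 10);
            if (*end != '\0' || baud <= 0)
                return -1;      /* reject "fast", "", "0", etc. */
        }
    }
    return baud;
}
```

Keeping 115,200 as the default means existing command lines behave the same, and faster rates become opt-in.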
p2asm looks very promising, and combined with loadp2 it should make compiler work much easier. It's definitely much faster to be able to automate things in a script and/or run from a command line than to have to open PNut and recompile and redownload after every change. Thanks for working on these!
(A Linux version of loadp2 would really be gravy, but if nobody else tackles it I could take a look at porting it from Cygwin.)
Thanks,
Eric