Can't Wait for PropGCC on the P2?

Dave Hein · 2017-04-09 19:02

I've been hoping that P2 support for PropGCC would happen for a while. However, it seems like this won't happen until silicon is available, so I decided to do it myself -- sort of.

I looked at the assembler output from the P1 compiler, and the COG version looks very similar to P2 code, except for a few minor details. So I wrote a conversion program called s2p that converts a P1 .s assembler file to a .spin2 file that can be assembled by PNut. The attached zip file contains the s2p.c source file along with an s2p.exe executable file that will run in a command prompt under Windows. s2p.c can be compiled under Linux by typing "gcc s2p.c -o s2p". There is also a c2p.bat file for Windows that will compile a C program to assembly, and then convert it to a .spin2 file. In a command prompt you would just type something like "c2p fibo" to compile fibo.c and create a fibo.spin2 file. This can then be assembled and loaded by PNut.

There are 4 sample programs -- fibo.c, bas.c, dry.c and chess.c. dry.c runs the Dhrystone benchmark program. bas.c is a Basic interpreter. chess.c is my threaded chess program, but only running on a single cog. Moves are entered by typing something like "d2-d4".

At some point I hope to get my P2 assembler updated to the latest instruction set, and then write a loader so everything can be done from the command line.

There is also another file called prefix.spin2 that contains the startup code and several C library functions, such as putchar, getchar, gets and printf. s2p copies this file to the beginning of the output file. Eventually I may implement a simple linker so that only the needed object files will be linked in, unless of course PropGCC becomes available for the P2 first.
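Putting prefix.spin2 at the front of every output is essentially file concatenation. A minimal sketch, assuming the .s-to-.spin2 conversion has already happened (it is not shown here); `emit_spin2` and `copy_stream` are hypothetical names, not functions from s2p.c:

```c
#include <stdio.h>

/* Hypothetical helper: copy every byte of src to dst. Returns 0 on success. */
static int copy_stream(FILE *src, FILE *dst)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, src)) > 0)
        if (fwrite(buf, 1, n, dst) != n)
            return -1;
    return ferror(src) ? -1 : 0;
}

/* Write the startup code and C library functions from the prefix file
   first, then the converted program body. Returns 0 on success. */
int emit_spin2(FILE *prefix, FILE *body, FILE *out)
{
    if (copy_stream(prefix, out) != 0)
        return -1;
    return copy_stream(body, out);
}
```

A linker that pulls in only the needed library objects would replace the single copy of the whole prefix file with a selective copy.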

EDIT: The latest version of the compiler tools is contained in the zip file p2gcc006.zip. See the readme.txt files contained in the release for more information, and also check the latest posts in this thread.

jmg · 2017-04-09 20:08

That's very nifty, and a great way to get more code coverage into P2 for testing.

Can you add a summary of what is the same as, and what differs from, the P1 flow?

I take it there is a 1:1 opcode mapping, so P2 code is exactly the same size as P1 ?
Is speed then exactly 2.0x P1, for an 80MHz P2 FPGA ?
Are the libraries you mention coded in PASM2, so they can be smaller/faster than P1, or are they translated P1 libs ?
Suppose gccp1 was tweaked to optionally simply raise the COG size, would that larger ASM file convert to a mix of COG and HUBEXEC on P2 ? (or is that COG/LUT/HUB ?)

Dave Hein · 2017-04-09 21:18

My main motivation for this is to get more testing of the P2 in the areas used by C code. I don't think there's been much testing of the hub exec mode, which is the primary way that C will be run. The P1 code is converted to assembly assuming the COG memory model. Fortunately, PropGCC doesn't check for code size during the compile phase. The COG memory limitation is only imposed at link time. In the P2, there is no difference between code executing from COG memory versus Hub memory, except for the address size. At the source level, constant addresses must use ## instead of # so that all the address bits fit in the instruction. This increases the code size somewhat, because AUGS/AUGD instructions are automatically generated when ## is used. However, compared to P1 LMM the size is probably similar.
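The underlying rule is that a P2 immediate operand (#S or #D) is a 9-bit field, so only 0 to 511 fit directly; a larger constant, such as most hub addresses, needs an AUGS/AUGD prefix, which costs one extra long. A small sketch of that size rule, with illustrative helper names. It only models which constants *require* the prefix; per the post above, the assembler inserts AUGS/AUGD whenever ## is written.

```c
#include <stdint.h>

/* A P2 immediate (#S or #D) is a 9-bit field: 0..511. */
#define P2_IMM_MAX 511u

/* Returns 1 if the constant cannot be a plain # immediate and so needs
   ## (an AUGS/AUGD prefix), else 0. */
int needs_aug(uint32_t value)
{
    return value > P2_IMM_MAX;
}

/* Size in longs of one instruction using this constant as an immediate:
   the instruction itself, plus one long for the prefix when required. */
int insn_longs(uint32_t value)
{
    return 1 + needs_aug(value);
}
```

For example, `insn_longs(1024)` is 2, because hub address 1024 does not fit in 9 bits.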

No, the speed is not exactly 2x the P1. Average hub access latencies should be about the same at 16 cycles, so code that does a lot of random hub accesses will be about the same. However, hub exec should be much quicker than LMM on P1. Part of the problem with using the P1 COG mode is that the compiler generates a lot of pointers in cog memory, which end up being in hub memory for my approach. I'll have to look at starting from LMM code, which may be more efficient for converting to P2 hub exec.

The libraries started out as C code, and are pre-compiled/converted to PASM2. They went through the same process as user program code.

If the code would fit in COG and LUT memory it will run at the full speed, and generated pointers could be used directly without having to be loaded from hub RAM. So small programs could be run entirely from COG/LUT memory.

jmg · 2017-04-09 22:22

Dave Hein wrote: »

My main motivation for this is to get more testing of the P2 in the areas used by C code. I don't think there's been much testing of the hub exec mode, which is the primary way that C will be run. The P1 code is converted to assembly assuming the COG memory model. Fortunately, PropGCC doesn't check for code size during the compile phase. The COG memory limitation is only imposed at link time.

That's fortunate, so larger 'Virtual P1' COG mode code can translate to P2 this way ?

Dave Hein wrote: »

In the P2, there is no difference between code executing from COG memory versus Hub memory, except for the address size. At the source level, constant addresses must use ## instead of # so that all the address bits fit in the instruction. This increases the code size somewhat, because AUGS/AUGD instructions are automatically generated when ## is used. However, compared to P1 LMM the size is probably similar.

Does the converter default one way or the other ?
A smarter Assembler could automate the 'bump' to dual-opcode, which could be arranged either way.
eg Default larger, and shrink when possible, or default smaller and add when needed.
In both cases, multiple passes look to be needed, but multi-pass assemblers are practical.
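The "default smaller and add when needed" strategy is a classic relaxation loop. A minimal sketch under simplifying assumptions: each instruction may reference one other instruction as an immediate operand, the address unit is left abstract, and the only rule modeled is that a target above 511 forces a prefix. `Insn` and `relax` are hypothetical names, not part of any existing assembler.

```c
#include <stddef.h>

/* Hypothetical model: target is the index of the instruction referenced
   by the immediate operand (e.g. a jump or address constant), or -1. */
typedef struct {
    int target;
    int longs;    /* current size: 1 = plain, 2 = with AUGS/AUGD */
} Insn;

/* Assign an address to every instruction from the current sizes. */
static void assign_addresses(const Insn *code, size_t n, unsigned *addr)
{
    unsigned pc = 0;
    for (size_t i = 0; i < n; i++) {
        addr[i] = pc;
        pc += (unsigned)code[i].longs;
    }
}

/* Start every instruction at one long, then repeatedly grow any whose
   target address does not fit in the 9-bit immediate. Sizes only grow,
   so the loop terminates after at most n + 1 passes.
   Returns the number of passes taken. */
int relax(Insn *code, size_t n, unsigned *addr)
{
    int passes = 0, changed;

    for (size_t i = 0; i < n; i++)
        code[i].longs = 1;
    do {
        changed = 0;
        passes++;
        assign_addresses(code, n, addr);
        for (size_t i = 0; i < n; i++) {
            if (code[i].target >= 0 && code[i].longs == 1 &&
                addr[code[i].target] > 511u) {
                code[i].longs = 2;
                changed = 1;
            }
        }
    } while (changed);
    return passes;
}
```

Growing one instruction shifts every later address up, which can push another target past 511; that cascade is why more than one pass may be needed.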

Dave Hein wrote: »

Part of the problem with using the P1 COG mode is that the compiler generates a lot of pointers in cog memory, which end up being in hub memory for my approach.

I'm not following this exactly - are you saying the final memory map differs from P1 to P2, with some vars shifting from COG to HUB ?

Rayman · 2017-04-09 22:38

Very nice. So, one would start with PropGCC for P1 and use COG memory model.
Then, this magically makes it work for P2 in hubexec mode, right?
Sounds like a great start.

Dave Hein · 2017-04-09 23:52

The main issue with the cog model is that the P1 compiler uses cog locations to store addresses of variables. This is because hub addresses do not fit in the 9-bit immediate field of an instruction. So instead of doing a "rdlong reg1, #variable" it does a "rdlong reg1, reg2", where reg2 contains the address of the variable. In the way I implemented the converter, the address is stored in a hub location instead of a cog location. This requires doing 2 rdlongs to read a variable. The first rdlong gets the address, and the second rdlong gets the variable contents.
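In C terms, moving the pointer from cog to hub turns one load into two dependent loads. A toy model, not real P2 code: hub RAM is a small array of longs, and `rdlong` here is a counting helper standing in for the instruction.

```c
#include <stdint.h>

#define HUB_LONGS 256u   /* toy hub size, in longs */

typedef struct {
    uint32_t mem[HUB_LONGS];
    int reads;           /* number of RDLONG-style hub accesses */
} Hub;

static uint32_t rdlong(Hub *hub, uint32_t long_addr)
{
    hub->reads++;
    return hub->mem[long_addr % HUB_LONGS];
}

/* P1 COG model: reg2 is a cog register holding the variable's address,
   so "rdlong reg1, reg2" is a single hub access. */
uint32_t load_via_cog_pointer(Hub *hub, uint32_t reg2)
{
    return rdlong(hub, reg2);
}

/* Converted code: the pointer itself lives in a hub slot, so the first
   read fetches the address and the second fetches the variable. */
uint32_t load_via_hub_pointer(Hub *hub, uint32_t ptr_slot)
{
    uint32_t addr = rdlong(hub, ptr_slot);
    return rdlong(hub, addr);
}
```

With average hub latency roughly the same on both chips, doubling the reads per variable access is a plausible reason the COG-model conversion leaves performance on the table.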

I looked at the assembly from the LMM model, and it avoids using a register with the address. This will probably run more efficiently on the P2 using hub exec. I think I just need to add support for a few LMM pseudo-ops to make this work.

jmg · 2017-04-10 00:12

Dave Hein wrote: »

The main issue with the cog model is that the P1 compiler uses cog locations to store addresses of variables. This is because hub addresses do not fit in the 9-bit immediate field of an instruction. So instead of doing a "rdlong reg1, #variable" it does a "rdlong reg1, reg2", where reg2 contains the address of the variable. In the way I implemented the converter, the address is stored in a hub location instead of a cog location. This requires doing 2 rdlongs to read a variable. The first rdlong gets the address, and the second rdlong gets the variable contents.

The P1 approach would still work tho ? It just consumes a COG register where it strictly did not need to. But could there be cases where, if that register is modified elsewhere, the shift from COG to HUB could break code ?

Dave Hein · 2017-04-10 01:24

I could put the address pointer in cog RAM, but it would require 2 passes instead of 1. I might do that later on.

I just finished trying the LMM mode, and I got mixed results. The LMM fibo was 35% slower than the COG version. The LMM dhrystone was 1% faster.

dgately · 2017-04-10 22:10

Hi Dave,

Built s2p.c on macOS without any compiler problems!

Tested a short LED blinker that builds in SimpleIDE, then ran c2p and created a .spin2 file. That file fails to build in PNut on Windows 10 running under VMware, with an error on the line shown after the code:

The C code as built in SimpleIDE:

#include <propeller.h>

int main(void)
{
  int32_t	Index;
  DIRA |= (63);
  while (1) {
    for(Index = 0; Index <= 5; Index++) {
      OUTA |= (1 << Index);
      waitcnt(((CLKFREQ / 50) + CNT));
      OUTA &= (~(1 << Index));
      waitcnt(((CLKFREQ / 50) + CNT));
    }
    for(Index = 5; Index >= 0; Index--) {
      OUTA |= (1 << Index);
      waitcnt(((CLKFREQ / 50) + CNT));
      OUTA &= (~(1 << Index));
      waitcnt(((CLKFREQ / 50) + CNT));
    }
  }
}

waitcnt R6, #0

Can waitcnt be converted to something that's compatible with Spin2?

dgately

KeithE · 2017-04-10 22:36

Dave Hein wrote: »

My main motivation for this is to get more testing of the P2 in the areas used by C code.

Have you considered trying http://embed.cs.utah.edu/csmith/ to generate some C code for testing? I saw that the RISC V guys were using it. I have no idea how difficult this would be to do.

ersmith · 2017-04-10 22:37

In fastspin I translated waitcnt with:

    "pri waitcnt(x)\n"
    "  asm\n"
    "    addct1  x, #0\n"
    "    waitct1 x\n"
    "  endasm\n"

(sorry about the quoted strings, that was cut and paste from the fastspin source code.)
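Inside a converter like s2p, a mapping like this could be applied line by line. A hypothetical sketch, not how s2p actually does it: `translate_waitcnt` recognizes only the `waitcnt reg, #0` form from the error above and emits the two-instruction P2 sequence. The operand-less `waitct1` follows the form used later in this thread; the exact WAITCT1 syntax should be checked against the current instruction set.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical: translate a P1 "waitcnt reg, #0" line into
   "addct1 reg, #0" + "waitct1". Returns 0 and fills out on success,
   or -1 if the line is not exactly that form. Only #0 is handled:
   then the add is a no-op, so it doesn't matter whether it happens
   before or after the wait. */
int translate_waitcnt(const char *line, char *out, size_t outlen)
{
    char reg[32];
    int end = -1;   /* %n sets this only if the whole pattern matched */

    if (sscanf(line, " waitcnt %31[^, \t] , #0%n", reg, &end) != 1 || end < 0)
        return -1;
    for (const char *p = line + end; *p; p++)   /* only blanks may follow */
        if (!isspace((unsigned char)*p))
            return -1;
    snprintf(out, outlen, "\taddct1\t%s, #0\n\twaitct1\n", reg);
    return 0;
}
```

Rejecting anything other than `#0` keeps the sketch honest. On P1 the source operand is added to the register after the wait, so a nonzero value would need more care than a one-line rewrite.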

ozpropdev · 2017-04-10 23:38

In this example

waitcnt(((CLKFREQ / 50) + CNT));

It would also need a "GETCT" instruction:

	getct x
	addct1	x,##80_000_000 / 50
	waitct1
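For scale: at the 80 MHz clock assumed in these snippets, `CLKFREQ / 50` is 1,600,000 ticks (20 ms), far beyond the 511 that fits in a plain immediate, hence the `##`. The GETCT + ADDCT1 pair computes "now plus delay" in 32-bit arithmetic, which C's `uint32_t` models exactly, including wraparound. The function name is illustrative.

```c
#include <stdint.h>

/* Models "getct x / addct1 x, ##clkfreq/divisor": the target is the
   current count plus the delay, computed in 32-bit unsigned arithmetic
   so it wraps the same way a 32-bit counter does. */
uint32_t ct1_target(uint32_t ct_now, uint32_t clkfreq, uint32_t divisor)
{
    uint32_t ticks = clkfreq / divisor;   /* 80,000,000 / 50 = 1,600,000 */
    return ct_now + ticks;                /* wraps modulo 2^32 */
}
```

Near the top of the counter range, the target wraps: `ct1_target(0xFFFF0000, 80000000, 50)` gives `0x00176A00`.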

jmg · 2017-04-10 23:49

Can the WAITX opcode be used in some cases ?

cgracey · 2017-04-10 23:58

jmg wrote: »

Can the WAITX opcode be used in some cases ?

Yes, it would be simpler. The CT stuff is only necessary if you need to correlate with the system counter.

ozpropdev · 2017-04-11 00:02

jmg wrote: »

Can the WAITX opcode be used in some cases ?

Yep, in this case saving one instruction

	waitx	##80_000_000 / 50

Dave Hein · 2017-04-11 00:26

dgately, I've attached a new s2p.c file that handles waitcnt. It also handles the CNT register, which was another problem I encountered when converting your code. I ran your code on my DE2-115, but I had to change DIRA and OUTA to DIRB and OUTB to see it on the LEDs on that board.

Eric, thanks for the tip on handling waitcnt.

KeithE, it looks like csmith is used for testing out the compiler. I'm not sure it's useful for what I'm doing, but I'll look into it some more.

dgately · 2017-04-11 07:49

Dave Hein wrote: »

dgately, I've attached a new s2p.c file that handles waitcnt. It also handles the CNT register, which was another problem I encountered when converting your code. I ran your code on my DE2-115, but I had to change DIRA and OUTA to DIRB and OUTB to see it on the LEDs on that board.

Yes, I can now compile the .spin2 file created by s2p, with PNut.

dgately

KeithE · 2017-04-11 15:16

Dave Hein wrote: »

KeithE, it looks like csmith is used for testing out the compiler. I'm not sure it's useful for what I'm doing, but I'll look into it some more.

I think that was the intent - but I had seen some use of it in the RISC V world for testing hardware. It may be a pain to use, though. For example:

Note that csmith is not guaranteed to produce terminating test programs, so any timed-out test is marked by [Expected Fail] in the output of the script.

Dave Hein · 2017-04-18 02:31

I updated my P2 assembler with the latest instruction set, and I've attached the zip file below. The assembler is called p2asm. This is the same as the qasm assembler that I've posted before, but I just renamed it since it now only assembles for the P2. I haven't included an executable file, but it can be built by running the buildit script file under Cygwin or Linux. There is a verify directory where you can run the testall script file to verify that the executable works correctly. It will print the name of each source file that it assembles, and if there is no error output, the binary and listing files match the reference files.
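The verification amounts to a byte-for-byte comparison of each generated file against its reference. I don't know how the testall script actually compares them (it may simply call cmp or diff); this is a minimal C sketch of the same check, with a hypothetical function name:

```c
#include <stdio.h>

/* Returns 1 if the two files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_match(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;

    if (a && b) {
        int ca, cb;
        do {
            ca = getc(a);
            cb = getc(b);
        } while (ca == cb && ca != EOF);
        result = (ca == cb);   /* both hit EOF together means a match */
    }
    if (a) fclose(a);
    if (b) fclose(b);
    return result;
}
```

Opening both files in binary mode matters for the .bin comparison on Windows, where text mode could translate line endings.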

Even if you continue to use the PNut assembler you might find the listing file from p2asm to be very useful. I am also working on a P2 loader program called p2load. I have it working under Cygwin, but I need to get it to run under Linux also. Once the loader is working it will be possible to compile a C program, assemble it and load it all from the command line. p2load will also serve as a terminal emulator. I'm hoping to post p2load by tomorrow night along with an updated s2p program that contains a couple of bug fixes.

David Betz · 2017-04-18 02:35

Dave Hein wrote: »

I updated my P2 assembler with the latest instruction set, and I've attached the zip file below. The assembler is called p2asm. This is the same as the qasm assembler that I've posted before, but I just renamed it since it now only assembles for the P2. I haven't included an executable file, but it can be built by running the buildit script file under Cygwin or Linux. There is a verify directory where you can run the testall script file to verify that the executable works correctly. It will print the name of each source file that it assembles, and if there is no error output, the binary and listing files match the reference files.

Even if you continue to use the PNut assembler you might find the listing file from p2asm to be very useful. I am also working on a P2 loader program called p2load. I have it working under Cygwin, but I need to get it to run under Linux also. Once the loader is working it will be possible to compile a C program, assemble it and load it all from the command line. p2load will also serve as a terminal emulator. I'm hoping to post p2load by tomorrow night along with an updated s2p program that contains a couple of bug fixes.

FYI, I already have a program called p2load although it currently only knows how to load the old P2 FPGA images.

jmg · 2017-04-18 04:02

David Betz wrote: »

FYI, I already have a program called p2load although it currently only knows how to load the old P2 FPGA images.

Do you mean it works with an earlier version of the bootloader ?
Brings up the question of is there a way to query the version of the loader, ISTR the tags related to Silicon ?

jmg · 2017-04-18 04:04

Dave Hein wrote: »

I updated my P2 assembler with the latest instruction set, and I've attached the zip file below. .

Cool, is there a manual covering Assembler directives and supported syntax ?
Macros, Conditional Assembly.., ?

Tubular · 2017-04-18 04:16

jmg wrote: »

David Betz wrote: »

FYI, I already have a program called p2load although it currently only knows how to load the old P2 FPGA images.

Do you mean it works with an earlier version of the bootloader ?
Brings up the question of is there a way to query the version of the loader, ISTR the tags related to Silicon ?

It works with an earlier version of P2-hot.

dgately · 2017-04-18 04:42

Dave Hein wrote: »

...but it can be built by running the buildit script file under Cygwin or Linux.

Initial build on macOS (Sierra) returned:

p2asm.c:584:9: error: non-void function 'FindNeededSymbol' should return a value [-Wreturn-type]
        return;
        ^

I modified line 584 of that file to return an integer (-1, or any other value that works as an error code):

return -1;

With this slight change to line 584 of p2asm.c, this builds on macOS (sierra) and runs all of the tests in the verify directory...

dgately

David Betz · 2017-04-18 09:52

jmg wrote: »

David Betz wrote: »

FYI, I already have a program called p2load although it currently only knows how to load the old P2 FPGA images.

Do you mean it works with an earlier version of the bootloader ?
Brings up the question of is there a way to query the version of the loader, ISTR the tags related to Silicon ?

I think it supports the FPGA image prior to the old P2-hot. That is the last version that PropGCC supports as well. I don't think I'd bother trying to support both that and the newest P2 images. No one will want to go back to the older ones once we have real silicon and I'm hoping there won't be yet another redesign. When I update p2load I'll make it work only with Chip's new ASCII loader protocol.

Dave Hein · 2017-04-18 12:09

jmg, there's no manual. The assembler is fairly straightforward. You just type "p2asm file.spin2" and it generates file.bin and file.lst. The program does contain some code for generating an object file that contains a list of global and unresolved symbols, but that's not really supported right now. I may add support for GAS directives later on, and look into what it takes to generate an ELF file.
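Deriving file.bin and file.lst from file.spin2 is a simple extension swap. A hypothetical helper showing one way to do it (I haven't checked how p2asm itself builds its output names); it only looks at "/" as a path separator:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: replace the extension of the last path component with
   ext ("file.spin2" + "bin" -> "file.bin"). If there is no extension,
   ext is appended. Returns 0 on success, -1 if out is too small. */
int replace_ext(const char *in, const char *ext, char *out, size_t outlen)
{
    const char *dot = strrchr(in, '.');
    const char *slash = strrchr(in, '/');
    size_t stem = (dot && (!slash || dot > slash)) ? (size_t)(dot - in)
                                                   : strlen(in);
    int n = snprintf(out, outlen, "%.*s.%s", (int)stem, in, ext);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Checking the last separator keeps a dot in a directory name, as in "dir.v1/prog", from being mistaken for the file's extension.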

dgately, thanks for checking into that. It's odd that GCC under Cygwin doesn't flag that as an error, or even issue a warning. I recompiled with the -Wall option, and it did issue a warning for it along with warnings about unused variables and a few other things. I cleaned it up and posted an updated version in the attached zip file.

David, I do recall there was a p2load or p2loader before, so as it turns out I actually called my loader loadp2 instead. I used the wrong name in my previous post. Most of my code is borrowed from the loader you wrote for the P1.

So what I'm thinking is that I'll post my loadp2 program that just supports Cygwin for now, and maybe you can update your p2load to support the latest FPGA based on the code in loadp2. At that point p2load would then become the "official" P2 loader. I'll post my code later today. It basically works by loading the MainLoader program first, and then loads the user program using MainLoader. loadp2 runs at a fixed baud rate of 115,200.

David Betz · 2017-04-18 12:16

Dave Hein wrote: »

David, I do recall there was a p2load or p2loader before, so as it turns out I actually called my loader loadp2 instead. I used the wrong name in my previous post. Most of my code is borrowed from the loader you wrote for the P1.

So what I'm thinking is that I'll post my loadp2 program that just supports Cygwin for now, and maybe you can update your p2load to support the latest FPGA based on the code in loadp2. At that point p2load would then become the "official" P2 loader. I'll post my code later today. It basically works by loading the MainLoader program first, and then loads the user program using MainLoader. loadp2 runs at a fixed baud rate of 115,200.

Sounds good. I'll see if I can steal your code for handling the ASCII loader protocol. I assume you have implemented a second-stage loader as well so you can load all of hub memory. Is that correct?

Edit: Duh. I should have read your post more carefully. I assume "MainLoader" is a second-stage loader.

Dave Hein · 2017-04-18 13:41

MainLoader.spin2 is the program that Chip includes with the FPGA distribution. He had said somewhere that PNut also uses it. The byte count is set in the hub image before loading it in a cog, and it just reads bytes from the serial port and writes them into hub memory starting at location 0.
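The receive side described here is a simple counted copy. A host-side C model of that behavior, not MainLoader's actual PASM2; `get_byte` stands in for the serial port, and all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the next received byte (0..255), or -1 on error. */
typedef int (*get_byte_fn)(void *ctx);

/* Receive `count` bytes and store them in hub RAM starting at address 0.
   Returns the number of bytes actually stored. */
size_t mainloader_model(uint8_t *hub, size_t hub_size, uint32_t count,
                        get_byte_fn get_byte, void *ctx)
{
    size_t addr = 0;

    while (addr < count && addr < hub_size) {
        int c = get_byte(ctx);
        if (c < 0)
            break;
        hub[addr++] = (uint8_t)c;
    }
    return addr;
}

/* A byte source over a memory buffer, usable as a get_byte_fn. */
typedef struct { const uint8_t *data; size_t len, pos; } ByteSource;

int buffer_get_byte(void *ctx)
{
    ByteSource *s = ctx;
    return s->pos < s->len ? s->data[s->pos++] : -1;
}
```

Because the count is fixed before MainLoader starts, the host just streams raw bytes with no framing; MainLoader stops once it has stored that many.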

In theory, loading could be done using the Prop_Hex or Prop_Ascii command, but this runs using the 12 MHz RC clock, which seems to drift a lot. I had to insert several ">" characters in the Prop_Hex stream to ensure that MainLoader got loaded OK. The ">" character is used to re-calibrate the baud rate.
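The loader string in the code that follows puts a ">" before every 16 bytes. A sketch that produces Prop_Hex text in that same layout from raw bytes; the 16-byte group size is copied from that string, not from any documented requirement, and the function name is hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* Format bytes as Prop_Hex text: two-digit hex separated by spaces, with
   "> " before each group of 16 bytes so the ROM can re-calibrate the
   baud rate. out needs at least 3*n + 3*((n + 15) / 16) + 1 chars.
   Returns the length of the string written. */
size_t format_prop_hex(const unsigned char *data, size_t n, char *out)
{
    size_t len = 0;

    for (size_t i = 0; i < n; i++) {
        if (i % 16 == 0) {
            if (i > 0)
                out[len++] = ' ';
            out[len++] = '>';
            out[len++] = ' ';
        } else {
            out[len++] = ' ';
        }
        len += (size_t)sprintf(out + len, "%02x", data[i]);
    }
    out[len] = '\0';
    return len;
}
```

Feeding it the first 17 bytes of MainLoader reproduces the start of the loader string: "> 00 fe 65 fd ... 60 fd > 24".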

Here's the code that does the actual loading. It resets the P2, then loads the MainLoader program using Prop_Hex. The user program is then loaded after that.

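#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Defined elsewhere in loadp2 (prototypes assumed here): hwreset() pulses
   the P2 reset line, and tx() sends count bytes out the serial port. */
void hwreset(void);
void tx(char *buf, int count);

/* MainLoader image as Prop_Hex text. Each ">" lets the ROM re-calibrate
   its baud rate against the drifting RC clock. */
char *loader =
"> 00 fe 65 fd 00 00 8c fc 20 7e 65 fd 24 08 60 fd > 24 28 60 fd 02 00 80 ff 28 12 64 fd 08 08 dc fc > 4f 7e 74 fd 01 24 84 f0 01 00 80 ff 28 5c 65 fd > 18 24 44 f0 13 24 60 fd f4 23 6c fb 00 00 7c fc > 00 00 ec fc";

int loadfile(char *fname)
{
    FILE *infile;
    int num, size;
    char outstr[100];

    infile = fopen(fname, "rb");    /* binary mode, so no bytes get translated */
    if (!infile)
    {
        printf("Could not open %s\n", fname);
        return 1;
    }

    /* Get the size of the user program */
    fseek(infile, 0, SEEK_END);
    size = ftell(infile);
    fseek(infile, 0, SEEK_SET);

    /* Byte count as four little-endian hex bytes, appended to the
       MainLoader image so MainLoader knows how many bytes to receive */
    sprintf(outstr, " %2.2x %2.2x %2.2x %2.2x ",
        size&255, (size >> 8) & 255, (size >> 16) & 255, (size >> 24) & 255);
    printf("Loading %s - %d bytes\n", fname, size);

    /* Reset the P2, then send MainLoader with the ROM's Prop_Hex command;
       "~" terminates the Prop_Hex data */
    hwreset();
    usleep(1000);
    tx("> Prop_Hex 0 0 0 0 ", 19);
    tx(loader, strlen(loader));
    tx(outstr, strlen(outstr));
    tx("~", 1);
    printf("loader loaded\n");
    usleep(10000);

    /* Stream the user program as raw bytes for MainLoader to store at hub address 0 */
    while ((num = fread(outstr, 1, 100, infile)))
    {
        tx(outstr, num);
    }
    fclose(infile);
    printf("%s loaded\n", fname);
    return 0;
}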

jmg · 2017-04-18 20:48

Dave Hein wrote: »

.... loadp2 runs at a fixed baud rate of 115,200.

IIRC Chip had his loader working up towards 2MBd in latest versions.
Is that 115200 a command line option ?
Seems nice to default to 115200, but option for higher baud rates once users know their system is working...
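A default-plus-override scheme is straightforward to parse. A hypothetical sketch: the "-b <rate>" flag is an assumption for illustration, not an option loadp2 is known to have, and a real loader would also need to tell MainLoader the new rate.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_BAUD 115200L

/* Hypothetical: scan argv for "-b <rate>", falling back to 115200.
   Returns the baud rate, or -1 if the value is missing or invalid. */
long parse_baud(int argc, char **argv)
{
    long baud = DEFAULT_BAUD;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-b") == 0) {
            char *end;
            if (i + 1 >= argc)
                return -1;
            baud = strtol(argv[++i], &end, 10);
            if (*end != '\0' || baud <= 0)
                return -1;
        }
    }
    return baud;
}
```

Keeping 115200 as the fallback means existing command lines behave the same, while users who know their setup works can ask for something faster.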

ersmith · 2017-04-18 23:38

Dave:

p2asm looks very promising, and combined with loadp2 it should make compiler work much easier. It's definitely much faster to be able to automate things in a script and/or run from a command line than to have to open PNut and recompile and redownload after every change. Thanks for working on these!

(A Linux version of loadp2 would really be gravy, but if nobody else tackles it I could take a look at porting it from Cygwin.)

Thanks,
Eric

David Betz · 2017-04-18 23:45

ersmith wrote: »

Dave:

p2asm looks very promising, and combined with loadp2 it should make compiler work much easier. It's definitely much faster to be able to automate things in a script and/or run from a command line than to have to open PNut and recompile and redownload after every change. Thanks for working on these!

(A Linux version of loadp2 would really be gravy, but if nobody else tackles it I could take a look at porting it from Cygwin.)

Thanks,
Eric

I just downloaded the zip file and typed "cc -o p2asm *.c -lm" and it built fine with four warnings.
