p1spin
Dave Hein
Posts: 6,347
in Propeller 2
The attached zip file contains p1spin, which runs P1 Spin programs on the P2. The P1 Spin program is included as a binary file at the end of the program using the FILE directive. p1spin currently maps ports A and B on the P1 to port B on the P2. The port mapping can be changed by modifying the "regmap" table in p1spin.spin.
I have only run this under spinsim, but it should work with an FPGA board as well. I have included a P1 Spin program that runs the Dhrystone 1.1 benchmark. The result from running this under spinsim is as follows:
# ./spinsim -t -b19200 ../p1spin/p1spin.obj
Testing Spin Interpreter
Dhrystone(1.1) time for 500 passes = 259 msec
This machine benchmarks at 1930 dhrystones/second
The P1 binary file can also be run using spinsim with the following results.
# ./spinsim -b19200 ../p1spin/test.binary
Testing Spin Interpreter
Dhrystone(1.1) time for 500 passes = 692 msec
This machine benchmarks at 722 dhrystones/second
P2 runs 2.7 times faster than P1. Adjusting for the difference between the 80 MHz P1 clock and the 50 MHz P2 clock gives a speed-up factor of 4.3.
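For anyone who wants to check the arithmetic, here is a small Python sketch of how the 2.7 and 4.3 figures fall out of the two spinsim timings. The function names are just for illustration; the only inputs are the numbers quoted in this post.

```python
# Timings reported by spinsim for 500 Dhrystone passes (from the two runs in this post).
P2_MSEC = 259            # P1 Spin program run through p1spin on the simulated P2
P1_MSEC = 692            # same program run as a plain P1 binary
P1_CLOCK_HZ = 80_000_000
P2_CLOCK_HZ = 50_000_000

def raw_speedup(p1_msec, p2_msec):
    """Wall-clock ratio: how many times faster the P2 run finished."""
    return p1_msec / p2_msec

def clock_adjusted_speedup(p1_msec, p2_msec, p1_hz, p2_hz):
    """Scale the raw ratio by the clock ratio to compare per-clock throughput."""
    return raw_speedup(p1_msec, p2_msec) * (p1_hz / p2_hz)

raw = raw_speedup(P1_MSEC, P2_MSEC)
adjusted = clock_adjusted_speedup(P1_MSEC, P2_MSEC, P1_CLOCK_HZ, P2_CLOCK_HZ)
print(f"raw: {raw:.1f}x, clock-adjusted: {adjusted:.1f}x")
# prints "raw: 2.7x, clock-adjusted: 4.3x"
```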
Comments
Did you basically port the P1 Spin Interpreter to P2?
Not sure what your pin mapping means, but I'm sure I'll figure it out...
The register mapping is used to handle the difference in register addresses for DIRA, OUTA and INA between the P1 and the P2. Even though the P1 doesn't support Port B, the P1 Spin interpreter and compiler do support it. I currently map the registers for P1 port A to P2 port B because that's where the serial pins are located.
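To make the idea concrete, here is a rough Python model of such a mapping table. This is only a sketch, not the actual "regmap" table from p1spin.spin, which may be laid out differently. The P1 addresses are the documented P1 special-register addresses; the P2 addresses are an assumption based on production P2 silicon and may not match the FPGA image this thread was written against. Passing unmapped addresses through unchanged is also my assumption.

```python
# P1 special-purpose register addresses (from the Propeller 1 manual).
P1_INA, P1_INB = 0x1F2, 0x1F3
P1_OUTA, P1_OUTB = 0x1F4, 0x1F5
P1_DIRA, P1_DIRB = 0x1F6, 0x1F7

# ASSUMPTION: P2 port B register addresses as on production P2 silicon.
P2_DIRB, P2_OUTB, P2_INB = 0x1FB, 0x1FD, 0x1FF

# Both P1 ports are folded onto P2 port B, as described in the first post.
REGMAP = {
    P1_INA: P2_INB,   P1_INB: P2_INB,
    P1_OUTA: P2_OUTB, P1_OUTB: P2_OUTB,
    P1_DIRA: P2_DIRB, P1_DIRB: P2_DIRB,
}

def map_register(p1_addr):
    """Translate a P1 register address to the P2 register that stands in for it.

    Addresses not in the table are passed through unchanged (hypothetical
    behaviour chosen for this sketch).
    """
    return REGMAP.get(p1_addr, p1_addr)

print(hex(map_register(P1_OUTA)))
```

Moving the P1 port A entries to point at the P2 port A registers instead would give the "original pin mapping" variant.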
I make a special case for register $1F1, which is the CNT register. When I detect $1F1 I call a routine that uses GETCT and puts that value on the stack. I also check if location $0000 is read, which is the CLKFREQ location. I return 50_000_000 when location $0000 is read.
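In Python terms, the two special cases look roughly like this. It is a model of the behaviour described above, not a translation of the interpreter code: get_ct() is a stand-in for the P2 GETCT instruction, and the cog_regs and hub containers are hypothetical.

```python
import time

P1_CNT = 0x1F1             # P1 system counter register
CLKFREQ_ADDR = 0x0000      # hub location where P1 Spin keeps CLKFREQ
P2_CLKFREQ = 50_000_000    # value returned instead of the stored one

def get_ct():
    """Stand-in for GETCT: a free-running 32-bit tick counter (simulated)."""
    return int(time.perf_counter() * P2_CLKFREQ) & 0xFFFFFFFF

def read_register(addr, cog_regs):
    """Register read with the CNT special case."""
    if addr == P1_CNT:
        return get_ct()
    return cog_regs[addr]

def read_hub_long(addr, hub):
    """Hub long read with the CLKFREQ special case."""
    if addr == CLKFREQ_ADDR:
        return P2_CLKFREQ
    return int.from_bytes(hub[addr:addr + 4], "little")

# The P1 binary carries its own clock frequency at $0000; the read is overridden.
hub = bytearray(16)
hub[0:4] = (80_000_000).to_bytes(4, "little")
print(read_hub_long(0, hub))
```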
I don't support the PAR register yet. I'm pretty sure that the PAR value is passed in PTRA on the P2, but I haven't seen any documentation on how this is passed by the COGINIT instruction.
I might try the original pin mapping and just modify the SimpleSerial.spin to use port B.
That should work, right?
I suspect that PAR is passed through the D register in the "COGINIT D, S" instruction, but it would need to be shifted to the left to avoid the bits that specify the cog number, "new" cog and hubexec mode. The PAR value would set the initial value of PTRA. At least that's how it worked in P2-Hot. Maybe Chip, or someone else who knows, could comment on this.
If you do a SETQ before a COGINIT, you establish the PTRA value, which is akin to PAR on the Prop1.
Did you need to change the baud rate to 1200 to get it to work? With spinsim it works fine at 19200, and I suspect it could run even higher. I wonder why the run time isn't repeatable on the DE2-115. It seems like it should be the same every time it is run.
Am I missing something obvious?
At first the serial terminal just displayed garbage until I changed the section ldclkfreq to match the core frequency.
I also changed nutson's file to operate at 80MHz to match.
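A quick Python sketch of why a clock mismatch turns the serial output into garbage. It assumes the serial driver derives its bit time from CLKFREQ as clkfreq / baud, which is how P1 Spin serial objects usually do it; the 50 MHz and 80 MHz figures are the ones mentioned in this thread, and the function names are made up for the example.

```python
def bit_ticks(assumed_clkfreq, baud):
    """Clock ticks per serial bit, as computed from the CLKFREQ the program sees."""
    return assumed_clkfreq // baud

def actual_baud(real_clkfreq, assumed_clkfreq, baud):
    """Baud rate that really appears on the pin when the core clock differs."""
    return real_clkfreq / bit_ticks(assumed_clkfreq, baud)

# Program believes it runs at 50 MHz, but the core is clocked at 80 MHz.
ticks = bit_ticks(50_000_000, 19200)
print(ticks, round(actual_baud(80_000_000, 50_000_000, 19200)))
```

With these numbers the pin runs at roughly 1.6 times the requested rate, so a terminal set to 19200 cannot decode it until the reported frequency matches the real one.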
It's a strange one.
Other than that it's great!
There is also an error in computing the elapsed time. I'll look at it to determine the exact cause of the problem.
I will perform some other bit twiddling tests to confirm the speed this A9 is running at.
It's fine when you are using a single pin but not for multiple.
This works:-
This doesn't:-
If you have bstc, p2asm and loadp2 installed you can use the runtest script to assemble the code and run it. Or you can just run the pre-built P2 binary file using loadp2. You have to specify a baud rate of 57600.
You can also use the Spin tool to assemble the .spin code, and PNut to assemble and load p1spin.spin2.
Shouldn't this be called something clearer than p1spin?
That name suggests Spin for P1, and P2 does not appear. Maybe P1SpinOnP2 or p1spin4p2 or...
Or, if there is going to be a V2Spin (as in fastspin) that has much more language 'oomph' and runs on P2 and maybe P1 too, perhaps this needs to drop the confusion of numbers?
Maybe SpinO (Spin Old) and SpinN (Spin New)? (And maybe SpinR can be reserved for the exact ROM binary image of the original P1 Spin?)
dgately
@dgately, it appears that fastspin uses different bytecodes than bstc. I don't recall offhand what 2C 32 does, but I'll look into it.
WOW. I didn't recall this so thanks for updating.
I've used PTRA=pcurr and PTRB=dcurr as these are the most used. This saves lots of space by using PTRA/B++ and --PTRA/B.
I used a vector_table for my version. It uses longs in hub (now moved to LUT), each of which contains 3 routine addresses and 5 flags (flags unused?). Most bytecodes only use 1 or 2 routine addresses. Some routines require a popx/popyx/popayx first before their main routine.
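As an illustration of how three routine addresses and five flags fit in one long, here is a Python sketch. The field widths (three 9-bit cog addresses plus 5 flag bits, 32 bits in total) and the field order are my assumptions for the example; the real table may order the fields differently.

```python
ADDR_BITS = 9                       # ASSUMPTION: 9-bit cog addresses
ADDR_MASK = (1 << ADDR_BITS) - 1    # 0x1FF
FLAG_MASK = 0x1F                    # 5 flag bits

def pack_vector(addr1, addr2=0, addr3=0, flags=0):
    """Pack up to three routine addresses and five flags into one 32-bit long."""
    for addr in (addr1, addr2, addr3):
        if not 0 <= addr <= ADDR_MASK:
            raise ValueError(f"address {addr:#x} does not fit in {ADDR_BITS} bits")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags must fit in 5 bits")
    return (flags << 27) | (addr3 << 18) | (addr2 << 9) | addr1

def unpack_vector(entry):
    """Split a table entry back into (addr1, addr2, addr3, flags)."""
    return (
        entry & ADDR_MASK,
        (entry >> 9) & ADDR_MASK,
        (entry >> 18) & ADDR_MASK,
        (entry >> 27) & FLAG_MASK,
    )

# A bytecode that needs a pop routine first, then its main routine.
entry = pack_vector(0x040, 0x123)
print(f"{entry:#010x}", unpack_vector(entry))
```

An entry with an unused second or third address simply leaves that field at zero, which matches the note that most bytecodes only need one or two routines.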
The maths routines have 2 entry points, for binary and unary, followed by 5 different routines. One is where there is a mapped P2 instruction, and the instruction opcode is the 3rd vector. But I've been looking at perhaps using skipf and loading the skip values from a table.
I was tracing my code yesterday and realised the different special register mapping for DIR/IN/OUT. Shame we didn't realise this earlier, as we could have asked Chip to maintain the mapping. So, I was about to put in another mapping table. I support both ports A & B. BTW the new instruction register bit mapping on the next silicon will be a boost here too.
Anyway, I think there is some synergy in combining features from both our versions. Your fully unrolled loops are great (I didn't have space on P1).
Are you interested?
Even the use of rdlong x,ptrx++ saves 2 clocks, and most bytecodes have a pop and push.