Propeller 2 Spin - any news?

Bill Henning · 2013-03-10 10:12

Now that I have my DE2-115 running, I am very interested in how Spin2 is doing...

Cluso99 · 2013-03-10 21:15

Bill, I have not progressed any further on conversion of my faster P1 version. IIRC I posted the latest code in its own P2 thread.

Of course, I am interested to hear what Chip's view on P2 spin is.

cgracey · 2013-03-10 21:19

I started working on it, but haven't done much in several months, as we've been wrapping the silicon up. And then there's all that chip documentation to finish....

Cluso99 · 2013-03-10 21:31

Chip,
I fully understand you have been too busy...
Can I ask a few questions about your spin2 thoughts?

I was interested in using my faster P1 spin as a base because its quicker decoding freed up valuable cog space, which permitted most of the other spin codes to be sped up, particularly the maths ones. I had thought that the decoding would fit in the CLUT, but I now think this is better used for the stacks. The reason is that almost every spin code contains at least one push and one pop to/from the stack, whereas decoding happens only once per code. Hence the speed gain will be much better using the CLUT as stacks. Do you agree?

Are you happy to use something like my vector table in hub to decode each spin code? Each spin code reads a long containing 3 addresses and 5 bits of flags from hub. The addresses point to the routines to be performed by the spin code.
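As a rough illustration of the vector-table idea, here is a minimal Python sketch. It assumes each of the three addresses is a 9-bit cog address, so that 3 x 9 + 5 = 32 bits fill one long. The field order and the names `pack_vector`, `unpack_vector` and `decode` are hypothetical and not taken from the actual interpreter.

```python
# Hypothetical layout (an assumption, not the real interpreter's format):
#   bits 31..27 = flags (5 bits)
#   bits 26..18 = address C, bits 17..9 = address B, bits 8..0 = address A
ADDR_BITS = 9
ADDR_MASK = (1 << ADDR_BITS) - 1   # 0x1FF, one 9-bit cog address
FLAG_MASK = 0x1F                   # 5 bits of flags


def pack_vector(addr_a, addr_b, addr_c, flags):
    """Pack three 9-bit addresses and 5 flag bits into one 32-bit long."""
    for addr in (addr_a, addr_b, addr_c):
        if not 0 <= addr <= ADDR_MASK:
            raise ValueError("cog address does not fit in 9 bits")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags do not fit in 5 bits")
    return (flags << 27) | (addr_c << 18) | (addr_b << 9) | addr_a


def unpack_vector(entry):
    """Split a 32-bit table entry back into (addr_a, addr_b, addr_c, flags)."""
    return (
        entry & ADDR_MASK,
        (entry >> 9) & ADDR_MASK,
        (entry >> 18) & ADDR_MASK,
        (entry >> 27) & FLAG_MASK,
    )


def decode(vector_table, bytecode):
    """One table read per bytecode: index the table, then split the long."""
    return unpack_vector(vector_table[bytecode])
```

The point of the layout is that a single hub read per bytecode yields everything the interpreter needs to dispatch, which is why decoding costs so little compared with the push/pop traffic.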

Are there any extras you can think of that we could start implementing while we wait for you to get some free time?

jazzed · 2013-03-10 22:11

Cluso99 wrote:

Are there any extras you can think of that we could start implementing ...?

Greater than 16 bit addresses? Maybe that has already been fixed.

I would be worried about stack growth > clut size.
Anybody could always use a different version of the interpreter if necessary though.

cgracey · 2013-03-10 22:25

Cluso99 wrote:

Chip,
I fully understand you have been too busy...
Can I ask a few questions about your spin2 thoughts?

I was interested in using my faster P1 spin as a base because its quicker decoding freed up valuable cog space, which permitted most of the other spin codes to be sped up, particularly the maths ones. I had thought that the decoding would fit in the CLUT, but I now think this is better used for the stacks. The reason is that almost every spin code contains at least one push and one pop to/from the stack, whereas decoding happens only once per code. Hence the speed gain will be much better using the CLUT as stacks. Do you agree?

Are you happy to use something like my vector table in hub to decode each spin code? Each spin code reads a long containing 3 addresses and 5 bits of flags from hub. The addresses point to the routines to be performed by the spin code.

Are there any extras you can think of that we could start implementing while we wait for you to get some free time?

I was planning on sticking with byte codes for density, since RDBYTEC is pretty fast. The other certainty would be to use the stack RAM as the run-time stack. That would place certain limits on call depth, but would be very efficient and fast. From where I left off, I was figuring Spin on Prop2 was going to be about 30x faster than on Prop1.
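To illustrate the call-depth limit that comes with a fixed stack RAM, here is a minimal Python sketch. The depth of 256 longs and the per-call frame size are assumptions made for the example, and `StackRam` and `max_call_depth` are hypothetical names; none of these figures come from this thread.

```python
class StackRam:
    """Fixed-size stack of 32-bit longs, standing in for the cog's stack RAM."""

    def __init__(self, depth=256):      # depth is an assumed value
        self.mem = [0] * depth
        self.sp = 0                     # index of the next free slot

    def push(self, value):
        if self.sp >= len(self.mem):
            # A hub-based stack could keep growing; a fixed RAM cannot.
            raise OverflowError("stack RAM exhausted")
        self.mem[self.sp] = value & 0xFFFFFFFF
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            raise IndexError("pop from empty stack")
        self.sp -= 1
        return self.mem[self.sp]


def max_call_depth(depth, longs_per_frame):
    """Upper bound on nested calls if every call uses longs_per_frame longs."""
    return depth // longs_per_frame
```

With these assumed numbers, a frame of 8 longs (return information, parameters and locals together) would cap nesting at `max_call_depth(256, 8)` calls, which is the kind of limit deeply recursive code would run into.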

Cluso99 · 2013-03-11 00:37

cgracey wrote:

I was planning on sticking with byte codes for density, since RDBYTEC is pretty fast.

Agreed. No reason to change this.

The other certainty would be to use the stack RAM as the run-time stack. That would place certain limits on call depth, but would be very efficient and fast.

This sounds good. I have not examined the interpreter from real usage point of view.

From where I left off, I was figuring Spin on Prop2 was going to be about 30x faster than on Prop1.

WOW. I thought maybe somewhere between 10x to 20x which would be very acceptable.

The biggest improvements I made were with the (simple) maths bytecodes. I also implemented your? improved multiply and divide routines. Many bytecodes also improved because I was able to inline some of the push/pop and other called routines (due to cog space restrictions).

By using LMM, some of the lesser used opcodes could be moved to LMM, leaving more room to streamline some of the bytecode execution.

David Betz · 2013-03-11 04:39

cgracey wrote:

I was planning on sticking with byte codes for density, since RDBYTEC is pretty fast. The other certainty would be to use the stack RAM as the run-time stack. That would place certain limits on call depth, but would be very efficient and fast. From where I left off, I was figuring Spin on Prop2 was going to be about 30x faster than on Prop1.

I don't know if you want to take this into account, but a small stack might interfere with using Femto Basic on the P2, since I believe it uses a recursive parser. Of course, typical Basic expressions are fairly simple, so that might not make a big difference.

potatohead · 2013-03-11 08:40

30X? Heck, SPIN might end up on par with PASM on P1.

Bill Henning · 2013-03-11 13:11

I wonder if an intermediate port of Spin would be useful...

- lower 64KB is used for Spin code sections (any free space could be used by PASM code as buffers)
- upper 64KB is used for DAT
- this should allow keeping to just 16 bit pointers, albeit by implementing a split I&D scheme

This may reduce the work needed to get a sub-optimal Spin running.
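A minimal Python sketch of the split I&D idea, assuming a 128KB hub with code in the lower 64KB and DAT in the upper 64KB. The constants and the helper names `code_addr` and `data_addr` are hypothetical; the interpreter would choose the base depending on whether a pointer refers to code or data.

```python
BANK_SIZE = 64 * 1024          # each space is addressable with 16 bits
CODE_BASE = 0x00000            # lower 64KB: Spin code sections
DATA_BASE = 0x10000            # upper 64KB: DAT (assumed placement)


def code_addr(ptr16):
    """Hub address for a 16-bit pointer into the code space."""
    return CODE_BASE + (ptr16 & 0xFFFF)


def data_addr(ptr16):
    """Hub address for a 16-bit pointer into the data space."""
    return DATA_BASE + (ptr16 & 0xFFFF)
```

The same 16-bit value, for example `0x0100`, maps to two different hub locations depending on which space it is used in. That is the price of keeping pointers at 16 bits.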

Cluso99 · 2013-03-11 17:21

Bill Henning wrote:

I wonder if an intermediate port of Spin would be useful...

- lower 64KB is used for Spin code sections (any free space could be used by PASM code as buffers)
- upper 64KB is used for DAT
- this should allow keeping to just 16 bit pointers, albeit by implementing a split I&D scheme

This may reduce the work needed to get a sub-optimal Spin running.

I thought that restricting spin to the lower 64KB hub would be fine, at least to get it running. That is double what we have on the P1. I don't think anyone ran out of space for spin programs... it was the buffer space for video, etc and of course these can be tasked to fit in the upper 64KB.

Bill Henning · 2013-03-11 17:26

You are right; for the first "Spin" (pun intended) no need for split I&D, as you say, the last 64KB can be reserved for buffers etc.

pedward · 2013-03-13 23:03

cgracey wrote:

I was planning on sticking with byte codes for density, since RDBYTEC is pretty fast. The other certainty would be to use the stack RAM as the run-time stack. That would place certain limits on call depth, but would be very efficient and fast. From where I left off, I was figuring Spin on Prop2 was going to be about 30x faster than on Prop1.

Wow, that's something like 15MIPS!?

jazzed · 2013-03-13 23:20

Cluso99 wrote:

I thought that restricting spin to the lower 64KB hub would be fine, at least to get it running. That is double what we have on the P1. I don't think anyone ran out of space for spin programs...

I certainly exhausted code space. :frown:

Cluso99 · 2013-03-14 00:59

jazzed wrote:

I certainly exhausted code space. :frown:

Do you mean 32KB of spin bytecode??? Or just filled the whole 32KB with buffers, spin, pasm, etc???

jazzed · 2013-03-14 07:27

Cluso99 wrote:

Do you mean 32KB of spin bytecode??? Or just filled the whole 32KB with buffers, spin, pasm, etc???

I said code space. One project was a g-code decoder and x-y stepper driver (driven by one pin toggles). It was almost entirely SPIN except for the full-duplex serial driver and some data. That g-code program is lost except that I sent it to someone in email once.

Cluso99 · 2013-03-14 20:05

Thanks jazzed. I was not aware of anyone who had almost a whole 32KB of spin code.

Guess there may be a reason to have the 64KB version initially, and a second 64KB+ version later. It should not be restricted to 128KB, as hopefully we may get a P2B with >128KB hub in the future, if there is enough demand.

FredBlais · 2013-03-18 06:28

I think SPIN is essential for the Propeller. From my experience with the first Prop, here are some things that I would change (just my opinion...)

Function calls: functions should be called with ( ) at the end. Right now, I sometimes confuse functions with variables.
Objects: it would be nice to be able to read an object's global variable directly with the dot operator, without going through a function call.
Operators: I think that <= >= != would be more standard than =< => <>
Structures: would make the code easier to understand than this

labels[label_cnt*3] := commands[pc+1] 'Store label id in an array
labels[label_cnt*3+1] := pc 'Store the PC address for later use
labels[label_cnt*3+2] := 0 'Not looping
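For comparison, here is a sketch of the same bookkeeping written with a structure. It is in Python rather than Spin, and the names `Label`, `label_id`, `looping` and `add_label` are made up for the illustration. It only shows how named fields replace the `*3`, `*3+1`, `*3+2` index arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Label:
    label_id: int          # was labels[label_cnt*3]
    pc: int                # was labels[label_cnt*3+1]
    looping: bool = False  # was labels[label_cnt*3+2], 0 meaning not looping


def add_label(labels, label_id, pc):
    """Append one record; the list length takes the place of label_cnt."""
    labels.append(Label(label_id=label_id, pc=pc))
    return len(labels)
```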

Just my 2 cents.
