Fast Spin? And how to launch it?

JasonDorie · 2013-05-08 17:28

I remember reading a while back about a faster version of the Spin interpreter. I've poked and prodded and Googled, and found references to the faster interpreter, but I haven't seen any concrete examples of how one might launch a standard Spin function with a non-ROM Spin interpreter. Has anyone done this, and if so, do you have an example you could share?

Why I want it: I'm working on the math routines for my DCM-based quad. They're pretty fast, but I'd like to make them faster. I'm using a lot of lines that look like this:

Result := (V ** N) << 4

The << 4 part can be eliminated if I just alter the code that handles the ** instruction to do it for me. My code is littered with these shifts, so it would make things quite a bit faster. I can make the changes to the interpreter without any trouble, but I have no idea how to go about actually launching the interpreter and giving it my function to run.

Mike Green · 2013-05-08 17:43

You can't really launch a Spin function with a non-ROM Spin interpreter. What you do is launch a whole Spin interpreter like any other PASM program, then set up the initial stack and parameters to start interpreting at some place in your Spin program. It's just like what happens with the ROM Spin interpreter, just using the non-ROM Spin interpreter instead. Once that's running, you can use one of the undefined interpretive codes to load a small PASM routine into the COG for execution. Your particular expression might not be a good fit for this since it does a multiplication which is done by subroutine and may not fit into the space available for this feature.

What are the possible values for N? Perhaps you can use table lookup or some other optimization for (V ** N) << 4.

JasonDorie · 2013-05-09 00:16

I'm using the ((V ** N) << 4) to multiply pairs of 4.28 fixed point numbers together, so I'm using most of the range. The << 4 is required to bring them back up after losing the tail 4 bits. The ** code in the interpreter actually *has* all 64 bits of the result, so I've made & tested a modified version of the code that does the trailing shift for me and preserves the full 4.28 precision in the result (it's an extra mask or two and an OR). What I want to do is simply run my existing code with all the shifts removed on a version of the interpreter that contains the modified version of that opcode. For the function in question, I never use the ** operator other than to do 4.28 math, so I always need the trailing shift.
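To make the precision loss concrete, here is a minimal sketch in Python (not Spin) that models the arithmetic described above. It assumes that ** returns the upper 32 bits of the signed 64-bit product, which is how the Spin operator is documented. The function names are hypothetical, and the mask-and-OR variant is only a guess at what such a change could look like, not the actual modified interpreter code.

```python
MASK32 = 0xFFFFFFFF
ONE = 1 << 28  # 1.0 in 4.28 fixed point

def to_signed32(x):
    """Wrap an integer to a signed 32-bit value, as a 32-bit register would."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def spin_mul_high(a, b):
    """Model of Spin's ** operator: upper 32 bits of the signed 64-bit product."""
    return to_signed32((a * b) >> 32)

def mul_4_28_shifted(a, b):
    """(a ** b) << 4 as written in the post: the low 4 result bits are always zero."""
    return to_signed32(spin_mul_high(a, b) << 4)

def mul_4_28_full(a, b):
    """Hypothetical modified op: keep 4 more bits from the low long of the product."""
    p = a * b
    hi = (p >> 32) & MASK32   # upper long of the 64-bit product
    lo = p & MASK32           # lower long of the 64-bit product
    return to_signed32((hi << 4) | (lo >> 28))

a = 0x18000001  # 1.5 plus one LSB
b = 0x08000001  # 0.5 plus one LSB
print(hex(mul_4_28_shifted(a, b)))  # 0xc000000
print(hex(mul_4_28_full(a, b)))     # 0xc000002
```

The two results differ only in the low 4 bits, which is the precision the trailing shift throws away.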

I know that to launch a ROM-Spin interpreter I use cognew( Function, @stack), and to launch a PASM cog I use cognew( @DatEntry, param). I assume that to run my own interpreter I need to use the PASM form of cognew to run my copy of the interpreter, but I have no idea what to pass it to get it to start executing my compiled Spin code.

You said "...set up the initial stack and parameters to run your Spin program" - Are there examples of how to do that anywhere?

Thanks for the reply.

kuroneko · 2013-05-09 00:32

The built-in interpreter is effectively started with cognew($F004, $0004). Assuming everything else is equal you could try something like coginit(cogid, @replacement, $0004) to run at least the primary cog (you'd need to set a flag to skip the replacement though). Running the cognew(method, @stack) equivalent requires stack setup which has been done but would need some digging.

JasonDorie · 2013-05-09 00:54

So, if I understand correctly, I could launch a normal Spin function with the built-in interpreter, and then swap it out for mine like this:

VAR
  byte NewInterpStarted

PUB EntryPoint
  if( NewInterpStarted == 0 )
    NewInterpStarted := 1
    coginit( cogid, @MyInterpreter, $0004 )

  'Rest of code here

Do I have that right? That's easier than I was expecting.

kuroneko · 2013-05-09 01:00

JasonDorie wrote: »

So, if I understand correctly, I could launch a normal Spin function with the built-in interpreter, and then swap it out for mine like this ...

As I said, this will work when the handling of the parameter area is the same (I never looked at this fast version). This code works for the built-in interpreter:

VAR
  byte  swapped

PUB null

  ifnot swapped~~
    coginit(cogid, $F004, $0004)

  dira[16..23]~~
  outa[16..23] := swapped
  waitpne(0, 0, 0)

So just give it a go ...

JasonDorie · 2013-05-09 01:15

I'm actually not using the fast version - I'm using the code for the interpreter posted by Chip here : http://forums.parallax.com/showthread.php/101483-Propeller-ROM-source-code-HERE?highlight=INTERPRETER+spin+booter

The only thing I was going to change is the code for that one opcode. Unfortunately I get the impression from your code that this will only work for the primary cog. I tried it on a cog that I'm launching to run my DCM loop and that didn't work.

PUB Main
  'Startup code precedes this

  CustomSpinStarted := 0
  cognew( RunDCM , @DCMStack )

  'Main flight loop follows



PUB RunDCM | tx, tz, loopTime, loopindex

  'if( CustomSpinStarted == 0 )
  '  CustomSpinStarted := 1
  '  coginit( cogid, @CustomSpin, $0004 ) 

  DCM.Init
  'additional code follows

Is the $0004 argument the location of the Main function? If so, it looks like I'm going to need to track down the stack-frame setup version after all.

Thanks though - this is helping me understand it a little better.

kuroneko · 2013-05-09 01:19

JasonDorie wrote: »

Is the $0004 argument the location of the Main function? If so, it looks like I'm going to need to track down the stack-frame setup version after all.

Yes, primary only I'm afraid. But given that we have the interpreter source it shouldn't be too hard to figure out what's required.

Could you verify your ** change(s) with a simple primary only test?

JasonDorie · 2013-05-09 01:37

I've tested that code independent of the interpreter and the results are correct.

It would be possible for me to actually make the main thread be the DCM loop, and just have it spawn the flight loop on another cog. I don't see why that wouldn't work, but it would require me rearranging a bunch of code to try it.

In parallel to this, I'm also writing a bunch of math functions in PASM that can be called with a sequence of instructions (in fact, you helped me fix a bug in it not long ago). Ultimately I may just write the DCM code in PASM, but if I can get the interpreter to run a modded version of my code, I might get the speedup I need without having to do the full translation.

Bedtime for me - I'll have to follow this up later, but your help is much appreciated.

kuroneko · 2013-05-09 06:36

OK, the reason a cognew doesn't work is that the preparation (byte) code is in ROM, which means that it also drops the address of the ROM interpreter on the stack before continuing with coginit. Solution: point to a copy of the preparation code in hub RAM, patch the interpreter entry point and you're good to go. A minor challenge was finding a spare long in the interpreter. Anyway, the attached sample does all this for you. I verified it by telling the ++ op to do a --. That patch has since been removed; the only change is the long acquisition, and the address patches are done from SPIN.

Cluso99 · 2013-05-09 07:03

My faster Spin interpreter has two ways to launch it. I did it a couple of years ago and it's particularly faster in the maths routines, including a faster multiply routine iirc.
There is also my zero footprint debugger. iirc it launches my interpreter and a soft version of the ROM interpreter.
One launch method uses a set of Spin patch instructions to modify the interpreter once it is loaded into the cog. This might be useful too.
Links are in my signature to the tools list. hippy did a lot of the initial research into patching the ROM interpreter once loaded.
