You might want to consider making a just-in-time compiler that translates P1 Spin byte codes into P2 assembly instructions.
An interpreter keeps the bytecode small. With a JIT compiler the bytecodes get expanded into native instructions, which increases the memory footprint and therefore limits how large a program can fit.
Am I following your suggestion properly?
I'm assuming that a JIT compiler does not need to keep the translated code around all the time. It just translates bits of code, executes them, and keeps the translations in a cache that can be purged if memory gets low.
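The idea above can be sketched roughly like this. This is a hypothetical model, not anyone's actual implementation: bytecode blocks are translated on first use, kept in a small cache, and the oldest entry is purged when the cache fills up. The class and names are made up for illustration.

```python
# Hypothetical sketch of a JIT translation cache: bytecode blocks are
# translated on first execution, cached, and the oldest entry is purged
# when the cache is full (standing in for "purge when memory gets low").

class JitCache:
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.cache = {}          # bytecode address -> translated code

    def get(self, addr, translate):
        if addr not in self.cache:
            if len(self.cache) >= self.max_entries:
                # Purge the oldest entry (dicts preserve insertion order).
                self.cache.pop(next(iter(self.cache)))
            self.cache[addr] = translate(addr)   # translate only on a miss
        return self.cache[addr]

# Stand-in for the real translator: pretend each bytecode block
# becomes a string of native instructions.
jit = JitCache(max_entries=2)
code = jit.get(0x10, lambda a: f"native@{a:#x}")
```

Repeated hits on the same address reuse the cached translation; only a miss pays the translation cost, which is where the cache-miss concern below comes in.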
For bytecodes I suspect it'll be hard to beat the performance of XBYTE, which would be the way to go if you really want a high performance bytecode interpreter. For programs that will fit in the JIT's cache the JIT will win, but as soon as you start getting cache misses I think XBYTE will win, since it's effectively using hardware to do the cache fills and instruction decode.
OTOH if performance is the goal, well, there's already fastspin
I'm sure Chip will eventually produce a byte code compiler for Spin2. It seems to me that any efforts to execute P1 byte code on the P2 will be short lived in their usefulness.
For P2 there's another interesting option. Since the bytecode interpreter is included with the application (rather than in ROM) it can be optimized for the particular program it's running. It'd be kind of interesting to see domain specific bytecode interpreters for different classes of applications. I don't know if p1spin or Cluso's prospective spin interpreter would be good starting points for this, but perhaps they might? You'd have a kind of "base" interpreter with the instructions everyone uses, and then replace some specific bytecode instructions with ones to speed up the application it's being linked with.
@"Dave Hein" I seem to recall that you modified a Spin interpreter to run LMM PASM in a similar way, by replacing one of the lesser used Spin opcodes?
Do you mind then if I use your code as a base as you've already unrolled the code better than I was able to do on the P1?
Feel free to use anything you want from p1spin. It has an MIT license.
@Eric, there is one unused bytecode. I think it is 3F, but I may be wrong. I wrote a Spin object that patched the Spin interpreter in cog memory, and inserted a small LMM interpreter. When the previously unused bytecode was encountered the interpreter would execute LMM code. The LMM code would jump back to the Spin interpreter to continue running Spin bytecodes. The object is in the OBEX, and I believe I called it SpinLMM.
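The trick Dave describes can be modeled like this. A minimal sketch, not the actual SpinLMM code: a dispatch table where the (possibly) unused opcode 3F escapes to a native routine (a Python callable standing in for the LMM PASM handler), then control falls back to the bytecode loop. The opcodes other than 3F are invented for illustration.

```python
# Sketch of patching an unused bytecode slot (3F here, per the post above)
# so it escapes to "native" code, then returns to the bytecode loop.

def run(bytecodes, native_escape):
    stack = []
    handlers = {
        0x01: lambda s: s.append(1),                      # push literal 1
        0x02: lambda s: s.append(s.pop() + s.pop()),      # add top two
        0x3F: native_escape,   # the patched, previously unused opcode
    }
    for op in bytecodes:
        handlers[op](stack)    # each handler then "returns to the loop"
    return stack
```

Hitting 3F runs arbitrary native code with access to the interpreter's state, and the next iteration of the loop resumes normal bytecode execution, which is the essence of the SpinLMM approach.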
There are certain areas that will benefit from a rewrite, whether your code or mine. I continued on my code today, doing the parts that will benefit the most from a rewrite using the P2's new features.
The skipf results in really tiny code for the maths, as you can just put each P2 instruction straight after one another and select the appropriate one with the skipf bits. It's really neat.
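For readers unfamiliar with SKIPF, the selection idea can be modeled roughly as follows. This is not real PASM, just an illustration of the concept: the candidate operations sit back-to-back, and a skip mask (one bit per instruction, 1 = skip) leaves only the desired operation active. The function and slot contents are invented for the example.

```python
# Rough model of SKIPF-style selection: consecutive operations share one
# straight-line sequence, and a skip mask picks which one actually runs.

def skipf_select(a, b, skip_mask):
    ops = [
        lambda: a + b,   # slot 0: add
        lambda: a - b,   # slot 1: subtract
        lambda: a * b,   # slot 2: multiply
        lambda: a & b,   # slot 3: bitwise and
    ]
    for i, op in enumerate(ops):
        if not (skip_mask >> i) & 1:   # bit clear -> execute this slot
            return op()
```

With mask 0b1101, only slot 1 (subtract) executes; each maths bytecode just supplies a different mask over the same block of instructions, which is why the code ends up so small.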
@"Dave Hein"
When you get a chance, can you send me a link or two regarding the interpreter's Spin-to-PASM translation? I would like to look at a good example of what happens.
Look at the file p1spin.spin2 in the zip file. Search for the label "loop". That's the main loop for the interpreter. It reads a byte, which is used as an index into the jump table. Each bytecode is implemented by a small piece of PASM code. The PASM code then either jumps back to loop or to pushx1 depending on whether it saves the result on the stack.
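The loop Dave describes can be modeled in miniature like this. This is a sketch of the general dispatch pattern, not the actual p1spin code; the opcodes and handler names are invented, and a Python list stands in for the PASM jump table.

```python
# Model of the main interpreter loop described above: fetch a byte,
# use it to index the jump table, run that handler, repeat. Handlers
# that produce a value push it on the stack (the "pushx1" path).

def interpret(bytecode):
    stack = []
    pc = 0

    def lit():                   # opcode 0: push the literal byte that follows
        nonlocal pc
        stack.append(bytecode[pc])
        pc += 1

    def add():                   # opcode 1: pop two, push the sum
        stack.append(stack.pop() + stack.pop())

    jump_table = [lit, add]

    while pc < len(bytecode):    # the "loop" label
        op = bytecode[pc]        # read a byte...
        pc += 1
        jump_table[op]()         # ...and index the jump table with it
    return stack
```

Each PASM handler in p1spin plays the role of one of these small functions, ending with a jump back to loop or to pushx1.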
I may have been premature with this statement. I've been doing more research on JIT compiling and on cache strategies (some of which is discussed in my LMM thread). Some stack based virtual machines may benefit a lot from a JIT compiler, if it can take advantage of knowledge of basic blocks to elide stack pushes and pops. The HUB RAM looks like it may be a bottleneck in a stack based system, so I think there actually is real potential in a JIT compiler for a stack based VM (like the spin1 one).
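The basic-block optimization mentioned here can be sketched as follows. This is a simplified illustration, not a real compiler: within a basic block the JIT tracks the virtual stack symbolically and emits register-style three-address operations, so no HUB push or pop is generated at all. The instruction format and names are made up for the example.

```python
# Sketch of eliding stack pushes/pops inside a basic block: the compiler
# tracks the virtual stack symbolically and emits register operations
# instead of HUB RAM stack traffic.

def compile_block(ops):
    """ops: ('push', name) or ('add',) tuples. Returns (code, virtual stack)."""
    vstack = []      # symbolic stack, exists only at compile time
    code = []        # emitted three-address pseudo-instructions
    tmp = 0
    for op in ops:
        if op[0] == 'push':
            vstack.append(op[1])          # no instruction emitted
        elif op[0] == 'add':
            b, a = vstack.pop(), vstack.pop()
            dest = f"t{tmp}"
            tmp += 1
            code.append(f"{dest} = {a} + {b}")   # one register op,
            vstack.append(dest)                  # no pushes or pops
    return code, vstack
```

A naive translation of `push x; push y; add` would touch HUB RAM three times; here it becomes a single register add, which is exactly where a JIT could beat a stack-based interpreter.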
I looked for the file p1spin.spin2 but cannot find it.