Interpreter toolkit for P2
ersmith
in Propeller 2
I've created a simple interpreter toolkit for P2. Source code is at: https://github.com/totalspectrum/p2-jit-tools
It's based on the JIT translation code I used for my P2 ZPU emulator. It comes with a very stripped-down, stack-based virtual machine in 4 versions: a very straightforward ("plain") interpreter, an XBYTE-based interpreter, a simple JIT compiler, and an optimizing JIT compiler. There's a timing program that uses the virtual machine to toggle pin 0 one million times. Times for the various interpreters running the same code are:
plain interpreter:      336_000_321 cycles
xbyte interpreter:      160_000_129 cycles
JIT w. HUB cache:       168_001_865 cycles
JIT w. LUT cache:       120_002_177 cycles
optimized JIT w. HUB:    64_002_363 cycles
optimized JIT w. LUT:    48_002_425 cycles
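Since every configuration runs the same loop one million times, dividing each total by 1_000_000 gives an approximate cost per loop iteration; the leftover few hundred to few thousand cycles are work outside the loop (presumably including translation time for the JITs). The loop body in the listing is 8 bytecodes, so a rough per-bytecode figure falls out as well. A quick Python calculation over the numbers above:

```python
# Cycle counts reported above for 1,000,000 iterations of the toggle loop.
timings = {
    "plain interpreter": 336_000_321,
    "xbyte interpreter": 160_000_129,
    "JIT w. HUB cache": 168_001_865,
    "JIT w. LUT cache": 120_002_177,
    "optimized JIT w. HUB": 64_002_363,
    "optimized JIT w. LUT": 48_002_425,
}
ITERATIONS = 1_000_000
# Loop body: PUSHIM, PINLO, PUSHIM, PINHI, PUSHIM, ADD, DUP, JNEG
BYTECODES_PER_ITERATION = 8

# Integer division drops the small one-off cost outside the loop.
per_iter = {name: cycles // ITERATIONS for name, cycles in timings.items()}
baseline = timings["plain interpreter"]

for name, cycles in timings.items():
    print(f"{name:22} {per_iter[name]:4} cycles/iteration "
          f"({per_iter[name] / BYTECODES_PER_ITERATION:3.0f} per bytecode), "
          f"{baseline / cycles:.2f}x vs plain")
```

By this arithmetic the plain interpreter spends about 42 cycles per bytecode and the optimized LUT JIT about 6, roughly a 7x difference.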
The timing program itself looks like:
InitialPC
        byte    OP_PUSHIM
        long    @startmsg
        byte    OP_PRSTR
        byte    OP_GETCNT
        byte    OP_PUSHIM
        long    @starttime
        byte    OP_STORE

'' establish loop counter: toggle 1000000 times
        byte    OP_PUSHIM
        long    1000000         ' loop counter
        ' negate it so we can count up
        byte    OP_PUSHIM
        long    0
        byte    OP_SWAP
        byte    OP_SUB
loop
        byte    OP_PUSHIM
        long    0
        byte    OP_PINLO
        byte    OP_PUSHIM
        long    0
        byte    OP_PINHI
        ' decrement loop counter
        byte    OP_PUSHIM
        long    1
        byte    OP_ADD
        byte    OP_DUP
        byte    OP_JNEG
        long    loop
exitloop
        ' push elapsed time onto stack
        byte    OP_GETCNT
        byte    OP_PUSHIM
        long    @starttime
        byte    OP_LOAD
        byte    OP_SUB
        ' print elapsed time
        byte    OP_PUSHIM
        long    @endmsg
        byte    OP_PRSTR
        byte    OP_PRHEX
        byte    OP_PUSHIM
        long    @newline
        byte    OP_PRSTR
        ' end of loop
        byte    OP_HALT

startmsg
        byte    "toggling pin 0 1_000_000 times:"
newline
        byte    13, 10, 0
endmsg
        byte    "done. elapsed cycles: 0x", 0

        alignl
starttime
        long    0
endtime
        long    0
vara
        long    0
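To make the bytecode semantics concrete, here is a minimal Python sketch of a "plain" fetch/decode/execute loop for the opcodes used in the listing. Only the opcode names come from the program; everything else is an assumption for illustration: instructions are (op, arg) tuples rather than packed bytes, labels and addresses are strings, SUB computes second-from-top minus top (which is what the negate and elapsed-time steps need), STORE expects the address on top, JNEG pops its operand, and GETCNT returns a count of executed bytecodes rather than real P2 clock cycles.

```python
def run(program, labels, memory, pins):
    """Plain interpreter sketch: fetch, decode and execute one bytecode at a time."""
    stack, pc, ticks, out = [], 0, 0, []
    while True:
        op, arg = program[pc]          # fetch
        pc += 1
        ticks += 1                     # pseudo cycle counter (assumption)
        if op == "PUSHIM":             # decode + execute
            stack.append(arg)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "ADD":
            b = stack.pop()
            stack.append(stack.pop() + b)
        elif op == "SUB":              # second-from-top minus top
            b = stack.pop()
            stack.append(stack.pop() - b)
        elif op == "GETCNT":
            stack.append(ticks)
        elif op == "STORE":            # address on top, value beneath it
            addr = stack.pop()
            memory[addr] = stack.pop()
        elif op == "LOAD":
            stack.append(memory[stack.pop()])
        elif op in ("PINLO", "PINHI"):
            pins.append((stack.pop(), 0 if op == "PINLO" else 1))
        elif op == "JNEG":             # pop; branch if negative
            if stack.pop() < 0:
                pc = labels[arg]
        elif op == "PRSTR":
            out.append(memory[stack.pop()])
        elif op == "PRHEX":
            out.append(format(stack.pop() & 0xFFFFFFFF, "X"))
        elif op == "HALT":
            return out
        else:
            raise ValueError(f"unknown opcode {op}")

N = 5  # a short run instead of 1,000,000
memory = {
    # startmsg has no 0 terminator in the listing, so it runs on into newline
    "startmsg": f"toggling pin 0 {N} times:\r\n",
    "endmsg": "done. elapsed cycles: 0x",
    "newline": "\r\n",
    "starttime": 0,
}
program = [
    ("PUSHIM", "startmsg"), ("PRSTR", None),
    ("GETCNT", None), ("PUSHIM", "starttime"), ("STORE", None),
    ("PUSHIM", N), ("PUSHIM", 0), ("SWAP", None), ("SUB", None),  # leaves -N
    # loop (index 9)
    ("PUSHIM", 0), ("PINLO", None), ("PUSHIM", 0), ("PINHI", None),
    ("PUSHIM", 1), ("ADD", None), ("DUP", None), ("JNEG", "loop"),
    # exitloop
    ("GETCNT", None), ("PUSHIM", "starttime"), ("LOAD", None), ("SUB", None),
    ("PUSHIM", "endmsg"), ("PRSTR", None), ("PRHEX", None),
    ("PUSHIM", "newline"), ("PRSTR", None), ("HALT", None),
]
pins = []
out = run(program, {"loop": 9}, memory, pins)
print("".join(out))
```

Every bytecode pays for the fetch and the if/elif decode on every iteration, which is the overhead the XBYTE and JIT versions attack.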
Comments
Could you briefly explain the JIT concept, please? I know it means just-in-time, but what is happening, exactly? You are translating something into code and executing it, right?
And what is the difference between JIT and 'optimized JIT'?
Each bytecode is implemented by a short sequence of P2 instructions. Instead of executing those instructions right away, as XBYTE would, the just-in-time compiler appends that instruction sequence to its cache, and only executes it once the cache line is finished (basically when either the cache is full or we branch somewhere else).
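The line-building step can be sketched in Python, with closures standing in for the translated P2 instructions. This is a hypothetical model, not the toolkit's code: `translate`, `compile_line`, `run_line` and the 16-entry `MAX_LINE` capacity are invented for illustration, and only the opcode names come from the timing program. Translation happens once; after that, the finished line runs with no per-bytecode fetch or decode.

```python
MAX_LINE = 16                  # hypothetical cache-line capacity, in bytecodes
BRANCHES = {"JNEG", "HALT"}    # bytecodes that end a cache line

def translate(op, arg):
    """Return a closure doing the work of one bytecode (stands in for P2 code)."""
    if op == "PUSHIM":
        return lambda s: s["stack"].append(arg)
    if op == "ADD":
        return lambda s: s["stack"].append(s["stack"].pop() + s["stack"].pop())
    if op == "DUP":
        return lambda s: s["stack"].append(s["stack"][-1])
    if op in ("PINLO", "PINHI"):
        level = 0 if op == "PINLO" else 1
        return lambda s: s["pins"].append((s["stack"].pop(), level))
    raise ValueError(f"no translation for {op}")

def compile_line(program, pc):
    """Append translations until a branch or a full line ends the cache line."""
    line = []
    while len(line) < MAX_LINE and program[pc][0] not in BRANCHES:
        line.append(translate(*program[pc]))
        pc += 1
    return line, pc            # pc of the bytecode that ended the line

def run_line(line, state):
    """Execute a finished line: straight-line code, no decoding."""
    for fn in line:
        fn(state)

# The toggle loop body, with JNEG branching back to its own start.
program = [("PUSHIM", 0), ("PINLO", None), ("PUSHIM", 0), ("PINHI", None),
           ("PUSHIM", 1), ("ADD", None), ("DUP", None), ("JNEG", 0),
           ("HALT", None)]
line, end_pc = compile_line(program, 0)    # translated once
state = {"stack": [-3], "pins": []}        # negated loop counter
while True:
    run_line(line, state)
    # The line ended at JNEG, whose target (0) is this line's own start,
    # so a taken branch simply reruns the already-translated line.
    if state["stack"].pop() >= 0:
        break
```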
When a branch is encountered, we look in the cache for a line that starts at the new PC; if one is found, we just jump directly into the cache. That's where the speed improvement comes from: we don't have any interpretation overhead for loops that fit in the cache, it's just running raw machine code at that point. (There's an extra "trampoline" mechanism to avoid the cache check entirely for jumps from cache to cache.)
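A hypothetical Python model of the dispatch side: cache lines live in a dictionary keyed by their starting PC, and each line remembers the successor lines it has already jumped to, so repeated cache-to-cache jumps skip the lookup. That per-line link table is only a stand-in for the trampoline mechanism; the real one patches P2 code, and the names `Line`, `compile_line`, `lookup` and `run` are invented here. For brevity, lines end only at branches in this sketch.

```python
class Line:
    """One cache line: translated ops plus how the line exits."""
    def __init__(self, ops, exit_op, target, fallthrough):
        self.ops = ops                  # closures standing in for P2 code
        self.exit_op = exit_op          # "JNEG" or "HALT"
        self.target = target            # branch target PC
        self.fallthrough = fallthrough  # PC after the branch
        self.links = {}                 # successor PC -> resolved Line

def translate(op, arg):
    table = {
        "PUSHIM": lambda s: s["stack"].append(arg),
        "ADD": lambda s: s["stack"].append(s["stack"].pop() + s["stack"].pop()),
        "DUP": lambda s: s["stack"].append(s["stack"][-1]),
        "PINLO": lambda s: s["pins"].append((s["stack"].pop(), 0)),
        "PINHI": lambda s: s["pins"].append((s["stack"].pop(), 1)),
    }
    return table[op]

def compile_line(program, pc):
    """Translate from pc up to the next branch."""
    ops = []
    while program[pc][0] not in ("JNEG", "HALT"):
        ops.append(translate(*program[pc]))
        pc += 1
    op, arg = program[pc]
    return Line(ops, op, arg, pc + 1)

def run(program, state):
    cache, stats = {}, {"compiles": 0, "lookups": 0}

    def lookup(pc):
        stats["lookups"] += 1
        if pc not in cache:             # miss: translate a new line
            stats["compiles"] += 1
            cache[pc] = compile_line(program, pc)
        return cache[pc]

    line = lookup(0)
    while True:
        for fn in line.ops:             # raw "machine code", no decoding
            fn(state)
        if line.exit_op == "HALT":
            return stats
        taken = state["stack"].pop() < 0            # JNEG
        next_pc = line.target if taken else line.fallthrough
        nxt = line.links.get(next_pc)
        if nxt is None:                 # first time along this edge only
            nxt = line.links[next_pc] = lookup(next_pc)
        line = nxt

program = [("PUSHIM", 0), ("PINLO", None), ("PUSHIM", 0), ("PINHI", None),
           ("PUSHIM", 1), ("ADD", None), ("DUP", None), ("JNEG", 0),
           ("HALT", None)]
state = {"stack": [-1000], "pins": []}
stats = run(program, state)
print(stats)  # {'compiles': 2, 'lookups': 3}
```

Even with 1000 iterations, the cache is consulted only three times: once to start, once for the first jump back to the loop, and once for the exit; every other jump follows the stored link.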
The "regular" JIT is a dead simple translation of the plain and XBYTE interpreters; it copies basically the same instructions that they would execute into the code cache.
The "optimized" JIT compiler does some optimization on the sequences. It can do this because we know where the branches are, and know that we can never branch into the middle of a cache line. The main optimization it performs is keeping track of what's on the stack, so we can turn pushes and pops into register moves. It also optimizes the size of moves. So, for example, the bytecode to drive pin 56 low looks like:
In the regular JIT this becomes:
but in the optimized JIT it's compiled to:
because we know that the sequence leaves the stack unchanged at the end, we don't need to push or pop any values. There's an obvious further optimization to:
but I haven't implemented that yet.
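The stack-tracking idea can be illustrated with a hypothetical Python translator that emits pseudo-assembly text. `drvl`/`drvh` are real P2 pin instructions, but `push`/`pop`, the register names and the exact sequences are invented here and are not what the toolkit generates; move-size optimization is not modeled. The regular translator sends every operand through the stack. The optimized one keeps a compile-time "virtual stack" of registers, and only touches the real stack when a line consumes a value pushed by an earlier line or ends with values still on it.

```python
def regular_jit(bytecodes):
    """Copy each bytecode's stand-alone sequence; operands go via the stack."""
    out = []
    for op, arg in bytecodes:
        if op == "PUSHIM":
            out.append(f"push #{arg}")
        elif op in ("PINLO", "PINHI"):
            out += ["pop t0", ("drvl" if op == "PINLO" else "drvh") + " t0"]
        elif op == "ADD":
            out += ["pop t1", "pop t0", "add t0, t1", "push t0"]
        else:
            raise ValueError(f"unsupported opcode {op}")
    return out

def optimized_jit(bytecodes):
    """Track stack contents at compile time; use registers inside the line."""
    out, vstack, nregs = [], [], 0

    def new_reg():
        nonlocal nregs
        nregs += 1
        return f"r{nregs - 1}"

    def take():
        if vstack:                 # produced earlier in this line
            return vstack.pop()
        r = new_reg()              # pushed by an earlier line: really pop it
        out.append(f"pop {r}")
        return r

    for op, arg in bytecodes:
        if op == "PUSHIM":
            r = new_reg()
            out.append(f"mov {r}, #{arg}")
            vstack.append(r)
        elif op in ("PINLO", "PINHI"):
            out.append(("drvl" if op == "PINLO" else "drvh") + f" {take()}")
        elif op == "ADD":
            b, a = take(), take()
            out.append(f"add {a}, {b}")
            vstack.append(a)
        else:
            raise ValueError(f"unsupported opcode {op}")
    for r in vstack:               # line ends: materialize leftover values
        out.append(f"push {r}")
    return out

pin56_low = [("PUSHIM", 56), ("PINLO", None)]
print(regular_jit(pin56_low))
print(optimized_jit(pin56_low))
```

For the pin 56 example the push and pop cancel inside the line, so the optimized output has no stack traffic at all; a line that leaves a value behind (for example `PUSHIM 1` followed by `ADD`) still ends with a real push.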
You are quite the busy P2 guy. Producing piles of useful stuff!