Emulating a 16-bit instruction machine - anyone else done this?
Mark_T
Posts: 1,981
For a few PASM programs I've written doing arithmetic of various kinds, I've run out of cog space and decided to
write an emulator for a 3-address, 16-register processor to handle the numeric code. The problem with PASM
is that, being a 2-address architecture with 32-bit instructions, each arithmetic operation (which might be just an add
instruction or a call to a routine) takes 3 to 4 longs (MOV two arguments, CALL, MOV back the result).
So code with a dozen arithmetic ops takes about 50 longs of COG space. An emulated 3-address machine with 16-bit
instructions takes 12 words / 6 longs for the same sequence, and of course the code can live in HUB RAM too...
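To make the size comparison concrete, here is a rough back-of-envelope calculation in Python (run on a PC, not on the Propeller). It uses only the figures quoted above. The function names are made up for illustration, and the fixed cost of the interpreter itself and of any end-of-script marker is deliberately left out.

```python
# Rough code-size comparison using the figures from the post.
# Function names are hypothetical; interpreter overhead is not counted.

def pasm_longs(num_ops, longs_per_op):
    """Cog longs for inline PASM at a given cost per arithmetic op."""
    return num_ops * longs_per_op

def script_longs(num_words):
    """Longs needed to hold a script of 16-bit words (two per long), rounded up."""
    return (num_words + 1) // 2

ops = 12
print(pasm_longs(ops, 3), pasm_longs(ops, 4))  # prints "36 48": low and high PASM estimates
print(script_longs(ops))                       # prints "6": one 16-bit word per op
```

The saving only pays off once the scripts are long enough to outweigh the interpreter, which also has to sit in the cog.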
For ordinary integer arithmetic the emulation is very slow, but one of the programs was manipulating points on a discrete elliptic curve,
where each add/sub/mult operation is many instructions anyway, so the decode overhead matters much less. An example there was coding up the calculation of inverses.
So the basic technique was to pack a 4-bit opcode and three 4-bit "register" numbers into a WORD and decode it in a little
emulator (something like this):
exec_script
:sloop          rdword  op, script  wz
                add     script, #2
        if_z    jmp     #exec_script_ret    ' zero instruction means end of script
                call    #getarg             ' pick apart 16-bit instruction " op r, a, b"
                mov     b, arg
                call    #getarg
                mov     a, arg
                call    #getarg
                mov     r, arg
                ' now op just contains the opcode field, so dispatch mechanism follows
                cmp     op, #1  wz
        if_z    call    #addd               ' the various operations of the emulated machine
                cmp     op, #2  wz
        if_z    call    #subtract
                cmp     op, #3  wz
        if_z    call    #mult
                cmp     op, #4  wz
        if_z    call    #halve_into
                ' in this example a,b,r happen to be addresses of operands so no need to write back result
                jmp     #:sloop
exec_script_ret ret

getarg          mov     arg, op
                shr     op, #4
                and     arg, #$F            ' got rightmost arg number
                add     arg, #X             ' add X's cog address (X is the zero'th register)
                movs    mov_ins, arg        ' overwrite instruction with correct register spec
                nop
mov_ins         mov     arg, X              ' modified instruction to move register
getarg_ret      ret
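As a cross-check of the instruction layout, here is a small host-side model in Python (not Propeller code). It assumes the field order implied by the listing: getarg peels off the low nibble first for b, then a, then r, which leaves the opcode in the top four bits. The names pack, unpack and run and the handler table are illustrative assumptions. The real addd, subtract and mult routines are not shown in the post, so the handlers here are plain-integer stand-ins.

```python
# Host-side model of the 16-bit "op r, a, b" encoding (hypothetical helper names).

def pack(op, r, a, b):
    """Pack a 4-bit opcode and three 4-bit register numbers into one 16-bit word."""
    for field in (op, r, a, b):
        if not 0 <= field <= 0xF:
            raise ValueError("each field must fit in 4 bits")
    return (op << 12) | (r << 8) | (a << 4) | b

def unpack(word):
    """Split a word the way getarg does: low nibble first (b, a, r), opcode left over."""
    b = word & 0xF
    a = (word >> 4) & 0xF
    r = (word >> 8) & 0xF
    op = (word >> 12) & 0xF
    return op, r, a, b

def run(script, regs, handlers):
    """Execute words until a zero word, dispatching on the opcode field."""
    for word in script:
        if word == 0:                  # zero instruction means end of script
            break
        op, r, a, b = unpack(word)
        handler = handlers.get(op)     # unknown opcodes fall through, as in the cmp chain
        if handler is not None:
            handler(regs, r, a, b)
    return regs

# Stand-in operations on plain integers (assumption: the real ones work on big numbers).
def op_add(regs, r, a, b):
    regs[r] = regs[a] + regs[b]

def op_sub(regs, r, a, b):
    regs[r] = regs[a] - regs[b]

def op_mul(regs, r, a, b):
    regs[r] = regs[a] * regs[b]

handlers = {1: op_add, 2: op_sub, 3: op_mul}

regs = [0] * 16
regs[1], regs[2] = 6, 7
script = [pack(3, 0, 1, 2), pack(1, 3, 0, 1), 0]   # r0 = r1*r2 ; r3 = r0+r1 ; end
run(script, regs, handlers)
print(regs[0], regs[3])   # prints "42 48"
```

A packer like this could also be used to generate the WORD data for a script instead of hand-assembling the hex values.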
For tests and branches I just did those directly in PASM, and each basic block was a separate script. A branch mechanism could just about be shoehorned into the emulated code though.
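One possible way such a branch could be fitted into the 16-bit format is sketched below as a Python model. None of it comes from the post: it assumes opcode $F is unused and reads it as "if register r is zero, add a signed 8-bit word offset (held in the a and b fields) to the script pointer". The names BRZ, pack_brz and run are hypothetical, and the add/subtract operations are plain-integer stand-ins. In PASM the offset would need doubling, since the script pointer there is a byte address.

```python
# Hypothetical branch extension for the 16-bit script format (not from the post).
# Opcode 0xF is assumed free: "if register r is zero, pc += signed 8-bit offset".

BRZ = 0xF

def pack(op, r, a, b):
    """Pack an ordinary 'op r, a, b' instruction into a 16-bit word."""
    return (op << 12) | (r << 8) | (a << 4) | b

def pack_brz(r, offset):
    """Pack a branch-if-zero with a signed word offset in the low 8 bits."""
    if not -128 <= offset <= 127:
        raise ValueError("offset must fit in a signed 8-bit field")
    return (BRZ << 12) | (r << 8) | (offset & 0xFF)

def run(script, regs):
    pc = 0
    while pc < len(script):
        word = script[pc]
        pc += 1                        # pointer is already past this word when it executes
        if word == 0:                  # zero instruction means end of script
            break
        op, r = (word >> 12) & 0xF, (word >> 8) & 0xF
        a, b = (word >> 4) & 0xF, word & 0xF
        if op == BRZ:
            offset = word & 0xFF
            if offset >= 0x80:         # sign-extend the 8-bit field
                offset -= 0x100
            if regs[r] == 0:
                pc += offset
        elif op == 1:                  # stand-in for the add operation
            regs[r] = regs[a] + regs[b]
        elif op == 2:                  # stand-in for the subtract operation
            regs[r] = regs[a] - regs[b]
    return regs

regs = [0] * 16
regs[1], regs[2], regs[3] = 3, 1, 5    # loop count, constant one, addend; r4 stays 0
script = [
    pack(2, 1, 1, 2),    # r1 = r1 - r2
    pack(1, 0, 0, 3),    # r0 = r0 + r3
    pack_brz(1, 1),      # if r1 == 0, skip the next word
    pack_brz(4, -4),     # r4 is always 0, so this jumps back to the first word
    0,
]
run(script, regs)
print(regs[0])   # prints "15": the addend 5 accumulated three times
```

The cost of a scheme like this is that it uses up one of the sixteen opcodes and limits branch range to 127 words forward or 128 words back.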
Comments
There is more that can be done with it given time of course :thumb:
Thanks for sharing.
--Steve