Hacking Spin Interpreter Cog Ram
hippy
A bit technical this and perhaps not of much interest to anyone who isn't toying with hacking the Spin Interpreter and object code, but for those who are, this could be a useful discovery.
Apologies if anyone has worked this out and announced it already but if they have I've missed it !
There are three Spin bytecodes which have had me intrigued for a long while ...
3F 80+n    PUSH  spr
3F A0+n    POP   spr
3F C0+n    USING spr
When n is $10 to $1F these allow Spin to access the Special Purpose Registers at $1F0 to $1FF ( INA, OUTA, DIRA etc ), when n is $00 to $0F they perform another function, for example, to push the current Cog ID to the stack ...
3F 89 PUSH $09
I knew about the '$3F 9x' opcodes, but '$3F 8x' remained a mystery. Working on the latest bytecode disassembler I set about trying to fill in the gaps.
Looking at the Spin Interpreter source; what do we find at $1E9 ? "id", which is set early on in the interpreter initialisation and never otherwise used within the interpreter ( I had wondered why that was ).
Obvious with hindsight and when the light comes on; +n is +$00 to +$1F accessing Cog Ram $1E0 to $1FF which just happens to include the SPR. A typical Chip optimisation ( I recall I did say getting into Chip's head was a big key to cracking and understanding the interpreter ).
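To make the mapping concrete, here is a rough Python sketch of how the byte following $3F breaks down, going only by the three forms listed above and the "+n is $1E0 + n" observation. The function name `decode_3f` is made up for illustration, and only "id" at $1E9 is named among the interpreter variables because that is the only address pinned down here; the $1F0-$1FF names are the standard Propeller SPR names.

```python
# Names of the Special Purpose Registers at $1F0-$1FF on the Propeller.
SPR_NAMES = ["PAR", "CNT", "INA", "INB", "OUTA", "OUTB", "DIRA", "DIRB",
             "CTRA", "CTRB", "FRQA", "FRQB", "PHSA", "PHSB", "VCFG", "VSCL"]

# Operation selected by the top three bits of the operand byte (80+n, A0+n, C0+n).
OPS = {0x80: "PUSH", 0xA0: "POP", 0xC0: "USING"}


def decode_3f(operand):
    """Decode the byte that follows $3F into (operation, cog address, name)."""
    op = OPS.get(operand & 0xE0)
    if op is None:
        raise ValueError(f"operand ${operand:02X} is not one of the three forms listed")
    addr = 0x1E0 + (operand & 0x1F)   # n = $00..$1F maps onto Cog Ram $1E0..$1FF
    if addr >= 0x1F0:
        name = SPR_NAMES[addr - 0x1F0]
    elif addr == 0x1E9:
        name = "id"                   # the one interpreter variable located in the post
    else:
        name = None                   # interpreter-internal, deliberately left unnamed
    return op, addr, name


for byte in (0x89, 0x94, 0xB6):
    op, addr, name = decode_3f(byte)
    print(f"3F {byte:02X}  {op} ${addr:03X} {name or ''}")
```

So '3F 89' and '3F 9x' are the same opcode family; the only difference is whether n lands below or above $1F0.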
Now, what do we see after "id" ? The variables the interpreter itself uses; program counter, stack pointer, object base, variable base, etc.
So a few things ...
1) For people writing their own Spin Interpreters, it's crucial that either Cog Ram at $1E9-$1EF is used exactly as in the ROM Interpreter, or the '$3F xx' opcodes are abstracted so they behave the same when those variables are placed elsewhere. That explains why attempts to move or remove any of those variables lead to a crash of a hub-loaded interpreter.
2) Reading Cog Ram from $1E0 to $1EF should be possible ( assuming we can put the required bytecode into the image ) so it's possible to determine current stack base, current stack pointer, current PC etc without jumping through convoluted hoops in Spin.
3) If we can read Cog Ram we can write it, which opens the door to some potentially clever tricks with altering stack and PC on the fly, all of which could be useful for DOL and other 'bootload object', and 'object overlay' hackery.
4) As we can write to $1E0 through $1EF, there's a good chance we could inject code into the interpreter and force it to be executed, for example by overwriting part of the 'range' routine. That would allow injected code to dump the entire interpreter from Cog Ram ( irrelevant now, and a catch 22 ! ), but it is an attack vector Chip will probably want to close for the Prop II.
5) More importantly with code injection, it should be possible to switch a running interpreter to an LMM interpreter and back again, or even switch it to running in-line PASM !
I haven't had a chance to try this yet but looking at the interpreter source it appears to hold up as a valid analysis.
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH
Tested on real hardware - Dynamically changing the stack pointer on the fly. This is just a raw proof of concept hack and it would be much better in an object abstracted with PUB IncStack(n), PUB DecStack(n) methods.
Dynamically changing PC ...
I have a 6 long debugger which allows me full access to the spin code in cog. I have saved 6 longs in the code (more up to about $20 but not yet consolidated, and is much faster) which allows me to run an LMM debugger. This is the debugger I spoke about.
I have simplified the codes $00-3F and am reasonably sure they work (not totally validated like I am doing with the $E0-$FF bytecodes)
Last night I set about a routine to validate the math codes $E0-$EF but my laptop didn't see action and went to sleep (shutdown) so I didn't validate all parameters. I am comparing to the spin code for verification of all possible x and y values.
I've left the codes at $1E3 or was it $1E5 (hate 2 laptops - it's always on the other) in the same location.
I only now have to understand the calling of mathop from mrop. Then I can place it in the new RamInterpreter.
During my validation I discovered that the spin interpreter has a couple of quirks with the repeat loop
   repeat 0 to $FF    only uses a byte and therefore left of the displayed long (endian issue)
   repeat $0 to $FFFF_FFFF stops at $8000_0000 because of the sign
so I have to use 2 repeats
  repeat $0 to $7FFF_FFFF
     ...
  repeat $8000_0000 to $FFFF_FFFF
     ...
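A small Python sketch of why the split into two repeats helps, assuming the loop limits are treated as signed 32-bit values (which is what "because of the sign" suggests). This only models the signed interpretation of the limits, not the interpreter's actual loop code, and `to_signed32` is just an illustrative helper.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    value &= 0xFFFF_FFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


# The full range: in signed terms the end is below the start.
print(to_signed32(0x0000_0000), to_signed32(0xFFFF_FFFF))

# Each half of the split is a plain ascending range in signed terms.
print(to_signed32(0x0000_0000), to_signed32(0x7FFF_FFFF))
print(to_signed32(0x8000_0000), to_signed32(0xFFFF_FFFF))
```

The sign flips between $7FFF_FFFF and $8000_0000, which is exactly where the two repeats are split.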
Sucking ROM Interpreter code out of the Cog's RAM bypassing all encryption mechanisms.
I'll be the first to put my hands up to acknowledge this wouldn't have been easy without knowing the interpreter code to start with but it is possible without knowing it. Whether it would have been worthwhile to try this approach which would have been 'stabbing in the dark' is another question.
The trick is to subvert the interpreter code which is used to do range checking, then force that code to execute ( LookUp in this case ), then restore it so other bytecode using that still works.
This is all rather academic given that we have the Interpreter source code and can use a RAM Interpreter such as the one Cluso99 is working on. The trick won't work on the Prop II if Chip adds code to prevent it, and I expect he will, although I hope he leaves the PC, SP etc data accessible, as that cannot be used to suck out the interpreter code.
It's been good fun though and I'm easily amused by sticking crowbars in cracks and seeing what happens
Thanks Hippy - I learnt a few more things getting this to work. It has given me a few more ideas for debugging my Interpreter.
On a related note, Can you remember that long dual-port ram discussion we tagged on a few months ago I believe? I just watched a few parts of the 7 part Chip Gracey interview on youtube. It sounds like he's quad-porting the ram, or so he says. Pipeline.
It's quite a pleasure to see Parallax so laid back about the digging myself and others have done. I'm sure Chip is quite fascinated watching what people do pull out the bag - a bit like a proud parent pushing a child out into the world and finding out what it can do they never expected !
I expect Chip and others ( myself included ) would be rather upset if anyone set about deliberately undermining the Propeller or Parallax but it seems to me he's happy to see anything which isn't damaging as positive. I don't expect a cease and desist notice any time soon
Regards,
John Twomey
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Necessity is the mother of invention'
Sorry for the OT comment.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
Regards,
John
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Necessity is the mother of invention'
I want to use spin to output these bytecodes...
I know I can fudge it as Hippy has done (above).
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
I am sorry that I haven't had time to try the compiler yet. I am trying to reduce the complexity of the spin debugger, so it is simple for others. Otherwise it's working well - I can switch from spin down to pasm and back on the fly, so I can just trace a particular bytecode. It also displays instruction counts (spin and pasm).
Really?
Oh, I see.. it compiles code just fine, it won't let you jump to a label after $1F0 but you can jump to an immediate value just fine.
Yeah, that'll be easy to fix in a new compiler.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
A RAM Spin Interpreter could be made to do that using the same zero-footprint debug trick as in Spin. It would run into problems though if two Cogs launched the same code at the same time as what's in hub would need to be replaced by a micro-bootloader and restored in the Cog later.
I was thinking
X1 := REG[$1EF]
Where REG[XX] works precisely as any other register operation as long as the XX is between $1E0 and $1FF
You could even do REG[$1ED][3..6]~~
It would be incredibly easy to implement and allow you to access anything from $1E0-$1FF with all the features of a normal register access.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
In the Spin Interpreter, it is curious that only the $3F bytecode allows access to the registers (cog memory) from $1E0-$1FF whereas SPR[reg] only allows access to $1F0-$1FF (because the bytecodes $24-$26 OR the register with $10 and subsequently OR with $1E0 in the $3F section).
Curious ? Yes and no.. more no than yes..
If you look at it, the compiler is specifically geared not to allow you access to $1E0-$1EF, although it does itself for COGID. So, logically it needs the ability to do that, but it does not want you to do that. Now look at SPR[] which allows you to poke any value in there you want.. it needs to range check that in the interpreter to ensure you don't go where it does not want you to go.
Makes sense to me anyway.
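For what it's worth, the address arithmetic described above (SPR[reg] via $24-$26 ORs the index with $10, then the shared $3F path ORs with $1E0) can be sketched in a few lines of Python. Both function names are made up for illustration, and only indexes 0..15 are modelled for SPR[] since how the interpreter treats larger values isn't stated here.

```python
def spr_address(reg):
    """Cog address reached by SPR[reg] after the two ORs described above."""
    if not 0 <= reg <= 0x0F:
        raise ValueError("only SPR[0]..SPR[15] are modelled in this sketch")
    return 0x1E0 | (reg | 0x10)       # bit 4 is forced on, so the result is >= $1F0


def direct_3f_address(n):
    """Cog address reached by a raw '$3F 80+n' style operand."""
    if not 0 <= n <= 0x1F:
        raise ValueError("n must be $00..$1F")
    return 0x1E0 | n                  # bit 4 comes from n itself, so $1E0-$1EF is reachable


spr_range = [spr_address(r) for r in range(16)]
raw_range = [direct_3f_address(n) for n in range(32)]
print(f"SPR[]  reaches ${min(spr_range):03X}-${max(spr_range):03X}")
print(f"$3F n  reaches ${min(raw_range):03X}-${max(raw_range):03X}")
```

The forced $10 bit is what keeps SPR[] out of $1E0-$1EF, while a hand-placed $3F operand has no such restriction.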
Having the compiler source code though, means you can add anything you like to exploit the entire feature set of the available code.
I'm looking at ways of using PCURR to allow easy runtime patching of the Spin Bytecode..
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!