Yes-a-silicon-bug: Bypassing DEBUG protection
Wuerfel_21
(Original topic title: Not-a-silicon-bug: FIFO writes do not bypass DEBUG protection - crazy hack further down)
Just a random thing I was thinking about: Can the DEBUG memory protection be bypassed by setting up the FIFO to write into debug ram and then executing BRK?
Apparently not! It seems that the FIFO/streamer keeps running in the background while the debugger is active, but cannot write to debug RAM. I don't think this is documented anywhere. (@cgracey ?)
I might need to check what happens if the FIFO is started from within the debugger.
Here's some code that attempts to write a jump instruction into the debug handler, but it doesn't work like that:
_CLKFREQ = 100_000_000

PUB main()

  pr0 := (%1111_1101100_0 << 20) + @haxx
  pr1 := $FF840 + (cogid()^15)<<7
  'pr1 := @test_buffer
  pr2 := negx

  org
        '' uncomment to trigger payload manually
        'jmp pr0

        wrpin #P_REPOSITORY,#0 addpins 3
        drvl #0 addpins 3
        nop
        nop
        wxpin pr0,#0
        shr pr0,#8
        wxpin pr0,#1
        shr pr0,#8
        wxpin pr0,#2
        shr pr0,#8
        wxpin pr0,#3
        setscp #64
        waitx #128
        getscp pr0

        wrfast #1,pr1
        setxfrq ##$1000_0000
        xinit ##X_4ADC8_0P_4DAC8_WFLONG|X_WRITE_ON|$FFFF,#0
        brk #1
  end

  debug(uhex_long(pr0,pr1))
  debug("Pink fluffy unicorns dancing on Rainbows! The world is alright!",uhex_long_array(pr1,16),uhex_long(pr0,pr1))

DAT
        orgh
test_buffer long 0[16]

        orgh
haxx
        loc ptra,#pwntext
.loop
        rdbyte pr0,ptra++ wz
.stop
  if_z  jmp #.stop
.wait
        rdpin pr1,#62 wc
  if_c  jmp #.wait
        drvl #62
        wypin pr0,#62
        jmp #.loop

pwntext byte "APPARITIONS STALK THE NIGHT AND ALL IS LOST!",10,13,0
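For anyone puzzling over the pr0 line in the listing above, here is a rough Python sketch of how that long is put together. The field names (condition, opcode, R bit) follow my reading of the PASM2 encoding for JMP #\addr; the function name jmp_abs and the example address are made up for illustration and are not part of the original code.

```python
# Sketch: rebuild the "JMP #\addr" long that the listing stores in pr0.
# Field names are my reading of the PASM2 encoding; jmp_abs is a hypothetical helper.

COND_ALWAYS = 0b1111      # EEEE field: execute unconditionally
OPCODE_JMP = 0b1101100    # 7-bit opcode field, as written in the listing
ABSOLUTE = 0              # R bit: 0 selects an absolute 20-bit address (the "\" form)


def jmp_abs(addr: int) -> int:
    """Return the 32-bit instruction long for an absolute jump to a 20-bit address."""
    if not 0 <= addr < (1 << 20):
        raise ValueError("address must fit in 20 bits")
    # Same 12-bit prefix as %1111_1101100_0 in the listing.
    prefix = (COND_ALWAYS << 8) | (OPCODE_JMP << 1) | ABSOLUTE
    # The listing uses '+', which equals '|' while the address stays below 2**20.
    return (prefix << 20) | addr


if __name__ == "__main__":
    # $01234 is an arbitrary example address, not the real location of haxx.
    print(hex(jmp_abs(0x01234)))
```

So pr0 is nothing more than a ready-made jump instruction; the rest of the code is about getting that long somewhere the debugger will execute it.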
Comments
My guess is the debug area will be writeable by the matching FIFO of the cog that has write access at the time. The moment that cog exits debug then the write access is removed for its FIFO too.
That's what happens here, but no. There's only one cog active. By playing with the streamer parameters, you can see that it runs while the DEBUG interrupt is active (leaving the ORG/END section will kill it), but still can't write into the debug area.
Actually, all DEBUG()'s are a hubexec. The FIFO is reloaded before the write protect gets removed. I don't think you can test this.
I presume any streamer writes will vanish when the FIFO is configured as RDFAST. Although, those writes may possibly still cycle as a RFLONG and advance the FIFO, crashing the cog.
No, the debugger is specifically designed to not use hubexec because it can't reset the FIFO into just the right state (that really should have been fixed instead; it'd be really useful to start the FIFO part-way into a block). As said, the streamer actually keeps running, which I verified empirically: try setting XFREQ to a low-ish value. Without the BRK it will only fill a few slots of the buffer because the END will kill it, but with the BRK it keeps running for the couple hundred cycles during which the debugger prints to the serial port.
My theory is that there's an extra signal that's only enabled when doing a regular synchronous WRxxxx instruction while in debug mode that overrides the memory protection.
I guess that shows how little attention I've paid to DEBUG.
Oh, another way to prove the FIFO keeps running during DEBUG is to set XFREQ to $8000_0000 and the streamer op to infinite length ($FFFF), as that will completely stall out the ROM's attempt to save $000..$00F and thus hang the CPU.
So here's how you actually bypass the debugger protection...
Try running it for yourself! (only tested with flexspin)
Do I need a CVE for this? :P
Okay, I don't see how "haxx" gets run. I can tell the program size is important. If I change it then it stops working. I can see PR0 is initialised with a JMP #\ op-code and the address of haxx, but then the very first Pasm instruction MOV 0-0,pr0 doesn't make sense to me. Is it somehow used by the following DEBUG()?
The overall size shouldn't matter, it's just that every instruction here is load-bearing (including both DEBUG()s). The value placed in register zero is used by the "oh no" DEBUG, which is why you don't see that string printed.
I have no idea how the debug()'s can do that. How come I can't remove the NOP?
It's a load-bearing NOP. Though you can replace it with another 2-cycle instruction of choice. (I arrived at the correct timing for this by trial and error, don't know why it needs those cycles, but it sure does)
I still have no idea how "haxx" is launched.
Check the silicon doc for "Debug ROM" and take a good gander at what it does.
And what is "negx"?
Predefined constant for $80000000
I'm home again, so here's the mystery revealed...
The aforementioned debug ROM. It saves the first 16 registers and then loads 16 debugger bootstrap instructions into their place.
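To make the addresses concrete, here is a small Python sketch of where those two 16-long blocks live for each cog. The bootstrap address uses the same expression as the pr1 line in the first listing. The register-save block sitting $40 bytes (16 longs) below it is my reading of the silicon doc rather than something stated in this thread, so treat it as an assumption. The function names are made up.

```python
# Sketch: per-cog debug addresses in the protected hub area.
# debug_isr_addr mirrors the pr1 expression from the listing.
# debug_save_addr is an ASSUMPTION based on my reading of the silicon doc.

DEBUG_BASE = 0xFF840      # constant taken from the listing
BLOCK_BYTES = 16 * 4      # 16 longs = $40 bytes


def debug_isr_addr(cog: int) -> int:
    """Hub address of the 16 bootstrap longs the ROM loads into $000..$00F."""
    if not 0 <= cog <= 7:
        raise ValueError("the P2 has cogs 0..7")
    # Explicit parentheses: Python's '+' binds tighter than '<<', unlike the
    # Spin2 expression, which only gives sensible addresses if the shift goes first.
    return DEBUG_BASE + ((cog ^ 15) << 7)


def debug_save_addr(cog: int) -> int:
    """Assumed hub address where the ROM saves registers $000..$00F."""
    return debug_isr_addr(cog) - BLOCK_BYTES


if __name__ == "__main__":
    for cog in range(8):
        print(cog, hex(debug_save_addr(cog)), hex(debug_isr_addr(cog)))
```

For cog 0 the bootstrap block comes out at $FFFC0, which matches the address quoted further down for loading the 16 longs manually.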
The theory of the exploit is based on the observation that asynchronous RDFASTs can interfere with regular RDLONG/WRLONG type instructions. In fact, it's very easy to observe that something simple like
is likely to corrupt the register saving (seemingly by cancelling it altogether - remember that). TODO: Does this mean you can never reliably single-step over RDFAST instructions?
However, by the time the ROM gets to loading the code, the FIFO has finished loading and no glitch occurs. Two methods are used to extend the time spent loading the FIFO while in DEBUG mode:
This delays the FIFO startup enough to interfere with the code read, which causes $000 to not be loaded from hubRAM, keeping its previous value. By making this a jump instruction, arbitrary user code can be executed in debug mode. If you wanted to make an "invisible" version of this, you could load the 16 longs from $FFFC0 manually and jump into them, which would cause the interrupt to continue as usual.
The last piece of the puzzle then is the "everything's fine... so far" DEBUG. This is load-bearing because the ROM's register save also gets glitched. In effect this causes the previous values to persist in RAM, so simply doing a normal DEBUG before the exploit allows the saved registers to be correctly restored after the injected exploit code.
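The bookkeeping is easier to follow as a toy model. The Python sketch below is not a hardware simulation: it assumes that a glitched block transfer is simply skipped as a whole (the thread only says the save "seemingly" cancels and that $000 is not loaded), and every name in it is made up for illustration.

```python
# Toy model of the debug ROM entry/exit bookkeeping. NOT a hardware simulation.
# ASSUMPTION: a glitched 16-long transfer is skipped entirely.

SHADOWED = 16  # registers $000..$00F


def debug_entry(regs, hub_save, hub_isr, glitch_save=False, glitch_load=False):
    """Save $000..$00F, load the bootstrap, return what sits in $000 (the jump target)."""
    if not glitch_save:
        hub_save[:] = regs[:SHADOWED]   # normal case: registers saved to hub
    if not glitch_load:
        regs[:SHADOWED] = hub_isr       # normal case: bootstrap loaded into cog
    return regs[0]                      # the ROM then jumps to register $000


def debug_exit(regs, hub_save):
    """Restore $000..$00F from whatever the save area currently holds."""
    regs[:SHADOWED] = hub_save


def demo():
    hub_isr = [f"isr{i}" for i in range(SHADOWED)]
    hub_save = [None] * SHADOWED
    regs = [f"user{i}" for i in range(SHADOWED)]

    # 1. An ordinary DEBUG: fills the save area with good values.
    first = debug_entry(regs, hub_save, hub_isr)
    debug_exit(regs, hub_save)
    before = list(regs)

    # 2. The exploit: plant a jump in $000, then enter with both transfers glitched.
    regs[0] = "jmp haxx"
    second = debug_entry(regs, hub_save, hub_isr, glitch_save=True, glitch_load=True)
    debug_exit(regs, hub_save)

    # The stale save area from step 1 restores the registers cleanly.
    return first, second, regs == before


if __name__ == "__main__":
    print(demo())
```

In this model the first entry runs the real bootstrap, the second runs the planted jump, and the stale save area from the first DEBUG is what puts the registers back afterwards.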
This also leads to the realization that if you change IJMP0 to point to custom code that doesn't immediately do hub access, this glitch won't work. (But that custom code would need to either be in memory that's writable anyway, or be hubexec in the debugger area, destroying FIFO state.)
Oh, man, I'd blanked out that finding. It was traumatising! O_o
I've just been going over it again. I couldn't believe it would be a cancelling effect but that's looking to be what's going on. Sort of.
Adding a timer to the old test code sure enough reveals that the RDBYTE that gets screwed over is executing in less than 9 sysclock ticks. It's still sticking to its hubRAM slice but is short by a whole rotation.
Only for the non-blocking prefetch period. A normal
RDFAST #0,addr
doesn't have a problem.
Of course. Though it should still work in most cases (for single-step specifically), since RDFAST doesn't change any registers, so the previous ones should be fine. You'd have to be combining it with an autoincrementing ALTx where the counter is in a low register.
Seems the P2 needs some errata after all.
Rayman,
As an EE friend of mine says, "There is no perfect hardware and there is no perfect software."