Dumping COGRAM - code snippets & sample program
mindrobots
in Propeller 2
I've been playing around with some monitoring and debugging and found a really nifty reason I like the totally unassisted COGINIT and HUBEXEC. You can kill a running COG from a COG in HUBEXEC mode and dump out the COGRAM of the COG you just stopped! (OK, I thought it was cool)
From your "monitor" COG running in HUBEXEC, you can do something like this:

        ' start a COG to dump itself
        mov     cognum,#2
        loc     ptrb,@dump_cog
        coginit cognum,ptrb

where dump_cog is this simple code, which will be running in HUBEXEC after the coginit (I haven't tried it with the target COG running in COGEXEC, but I see no reason it would not work the same):
dump_cog
' start a COG to dump its memory to HUBRAM
'
        loc     adra,@cog_dmp_img
        setq    #$1FF
        wrlong  $000,adra
        cogid   0
        cogstop 0
        ret

' set it to $33333333 just so you notice changes
cog_dmp_img     long    $33333333[$1FF]
The data in cog_dmp_img will be a copy of the COGRAM from whatever COG you did the coginit on.
After starting the test COG with this HUBEXEC code
cog_exercise
' load up COGRAM with pattern
        loc     adra,@pattern_buf
        setq    #$1EF                   ' fill COGRAM
        rdlong  $010<<2,adra
        mov     $011<<2,#$01
        mov     $012<<2,#$02
        mov     $013<<2,#$04
        mov     $014<<2,#$08
        mov     $015<<2,#$10
        mov     $016<<2,#0
rollp   rol     $011<<2,#1
        rol     $012<<2,#1
        rol     $013<<2,#1
        rol     $014<<2,#1
        rol     $015<<2,#1
        add     $016<<2,#1
        jmp     @rollp
and then "dumping" it at some point, you end up with COGRAM looking like this:
FFFFFFFF-00000000-00000000-00000000-00000000-00000000-10101010-10101010
00001084-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-00000001-00000002-00000002-00000004-00000008-000AA05F-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010

I thought it was pretty interesting...but then, I'm easily entertained.
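If you capture a dump like this over PST, a few lines of host-side Python can pick out the longs that changed. This is only a sketch: the $10101010 fill value is inferred from the dump above (the contents of pattern_buf are not shown in the post), the sample string is a shortened stand-in in the same format, and parse_dump and changed_longs are made-up helper names.

```python
def parse_dump(text):
    """Split a dash-separated hex dump into a list of 32-bit integers."""
    # Drop all whitespace first so stray spaces or line breaks from
    # copy-and-paste do not split a hex word in two.
    cleaned = "".join(text.split())
    return [int(word, 16) for word in cleaned.split("-") if word]


def changed_longs(words, fill=0x10101010):
    """Return (position, value) pairs for longs that differ from the fill.

    The fill value is an assumption inferred from the posted dump.
    """
    return [(i, w) for i, w in enumerate(words) if w != fill]


# Shortened sample in the same format, including one stray space.
sample = "FFFFFFFF-00000000-10101010-00001084-10101 010-00000001-000AA05F"

for position, value in changed_longs(parse_dump(sample)):
    # position is the index within the dump text, not necessarily a COG address
    print(f"{position:03X}: {value:08X}")
```

The position printed is just the index within the pasted text, so it only matches a COGRAM address if the dump starts at register $000.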
The entire program is below if you want to play with it (it's ugly). Load it, start up PST, press any key to launch the second COG, press again to stop that COG, press again to dump the captured memory.
Have fun!
(Yeah, Chip, it's FUN to program the P2!! Thanks!!)
Comments
EDIT: It might be easier to instead add a way for a cog to inject a LINK or CALL[AB] or something into another cog's instruction pipeline.
If the new COG choices allow start of either Local or HUBEXEC, you could get close.
Some small overhead would exist, unless external custom debug code pulled in some registers over a serial link and then restored them.
Unlike the finesse of a break, this would be more a 'core dump' bigger hammer.
Co-operative debug of Step/Break is likely to be more useful, just need to make that as compact and 'invisible' as possible.
We just need a register that stores the PC (+C&Z?) when a cogstop/coginit occurs so we could read it back to use as a restart!
Chip said this in another thread, so it sounds like nice Debug improvements are in the pipeline.
This is one of those things that we'll need to feel out, somewhat, to make sure we've got the right approach.
When a coginit is issued on a cog, if the cog is/was already executing, would it be possible for the C&Z flags and the PC to be saved in a hidden register that could be read by a special instruction?
This would permit an errant program to be interrupted and examined and/or single stepped (debugged).
Good idea!
Can the COGINIT just push the CZ/PC to the COG's internal stack? If you want it after the COGINIT, you can just pop it off the stack, if not, life goes on.
It just becomes the known startup state of the COG without needing anything out of instruction space.
This becomes the NMI for a COG - the COG can't ignore it but at least it saves execution state for restart or forensics.
If you don't want the value, you wouldn't even need to pop it - just leave it there and nobody will ever notice it. If the stack overflows and the restart point falls off, it wouldn't even matter.
That's exactly what I meant by "if not, life goes on"
This sounds an excellent method.
Would this have any impact on the code security / protection / cryptography facility? What if someone flashed a protected manufactured product with their own image, with a hook to their own debug software on Propeller 2 startup? I am not very well versed in encoded images, so this may be irrelevant.
Obvious way to counter that is to inhibit on Cog 0 (which I think does the boot decryption) with a first-exec flip-flop, if it is needed at all.
Code security would inhibit anybody from modifying any program images. If changed, they just wouldn't work.
With the debugging hooks, we already have asynchronous breaks from other cogs, which won't reset the ports and I/O states. The limitation is, though, that if the cog is locked up in some kind of a WAIT, the asynchronous break will never be seen.
I'll see about pushing C/Z/PC onto the hardware stack when a cog is COGINIT'd. In some cases, that would be your only hope of discovering what went wrong.
Man, you guys come up with some great ideas!
Pushing C/Z/PC when a COGINIT is issued sounds preferable to forcing the WAIT to see a falsified completion state, though you would then be dependent on the hardware stack having a free entry, which hopefully in most cases it would.
Could you perhaps instead force a similar type of action on WAIT as when, say, a timer interrupt occurs and you force a CALL (LINK?) to the ISR, instead forcing a CALL/LINK to the asynchronous break instruction address @$400+(CogID*4)?
Ah ha, the good old fashioned 'infinity stack' ! :-D
Could you please elaborate? I'm not understanding.
Wait.. there may be some misunderstanding here. Those $400 + cogid*4 vectors are only initial vectors. Those addresses are set in the tiny boot ROM within the cog. They can be changed to point anywhere, after the initial break. In some cases, you will want to have your whole debug routine in cog or LUT, to not suffer eggbeater-FIFO disruption that comes with hub exec. At the outset of a program, though, it does not matter because everything has been reset, so a jump into hub doesn't cost anything, but a few cycles.
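For anyone following along, the initial vector arithmetic is simply one long per cog starting at $400. A tiny Python sketch, where initial_debug_vector is a made-up name and the number of cogs listed is arbitrary (the thread does not state how many cogs there are):

```python
def initial_debug_vector(cog_id, base=0x400):
    """Initial break vector for a cog: base + cog_id * 4 (one long per cog)."""
    return base + cog_id * 4


# List the first few vectors; the count of 4 is arbitrary for illustration.
for cog in range(4):
    print(f"cog {cog}: ${initial_debug_vector(cog):03X}")
```

As noted above, these are only the initial vectors; they can be repointed after the first break.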
The hardware stack is a simple hardwired LIFO that just pushes and pops to/from the next level. There's no pointer, just a bunch of 22-bit-wide-registers that can load from the one below or above. The bottom-most one loads from Z/C/PC or D[21:0] on a CALL or PUSH. On a RET or POP, the top level is copied to the level below. So, after 8 pops, you keep getting the same data.
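To make that behavior concrete, here is a rough Python model of the stack as described: no pointer, a push shifts everything one level away from the entry point, and a pop shifts everything back while the farthest level keeps its value. The depth of 8 is taken from the "after 8 pops" remark; the power-up contents of zero and the class name HardwareStack are assumptions for illustration.

```python
class HardwareStack:
    """Toy model of a pointerless, hardwired LIFO of 22-bit registers."""

    DEPTH = 8                 # implied by "after 8 pops" in the description
    MASK = (1 << 22) - 1      # each level is 22 bits wide

    def __init__(self):
        # levels[0] is the entry that CALL/PUSH loads and RET/POP reads.
        # Starting at zero is an assumption; real power-up state is unknown here.
        self.levels = [0] * self.DEPTH

    def push(self, value):
        # Every level loads from its neighbor; the farthest value falls off.
        self.levels = [value & self.MASK] + self.levels[:-1]

    def pop(self):
        value = self.levels[0]
        # Every level loads from the one farther away; the farthest level
        # has nothing to load from, so it simply keeps its value.
        self.levels = self.levels[1:] + [self.levels[-1]]
        return value


stack = HardwareStack()
for address in range(1, 10):      # nine pushes: the first one falls off the end
    stack.push(address)

popped = [stack.pop() for _ in range(10)]
print(popped)
```

In this model nine pushes followed by ten pops give 9 down to 2 and then 2 repeated, which is the "you keep getting the same data" effect.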
Ah, ok, that means you cannot do a simple 8 x POP to end up 'back where you were'.
The 'same data' is what feeds into the top-most register Dn on POP, which is what? 0x000?
Why not couple that top-most Dn to the lower Qn to allow an 8xPOP content-preserving read?
Would be useful for debug, and some stack gymnastics some may want to try.
Sorry, I wasn't very clear, was I?
I think I'm correct that when interrupts are enabled and an interrupt occurs, you effectively force a CALL or LINK (names may have changed to protect the innocent) which save C/Z & PC on a stack, the ISR (interrupt service routine) is then executed. Surely the ISR is also executed if the Cog is currently executing a WAIT instruction.
Thus, referring to your comment:
My question is can the same method of forced CALL or LINK as used in the ISR method be used to invoke the code for that Cog at the $400 + cogid*4 vector. This means if a WAIT is in effect, the asynchronous break will now be seen.
I tried to make it work like this the first time around, but popping the cog out of WAITs creates some mystery as to what was going on, exactly, at the break, as we can't go back and recreate the WAIT circumstance. I decided it was better to NOT do that, since, in practice, WAITs free up soon enough.
If the thing is really locked up in a WAIT that is not releasing, you can COGINIT it and see where its PC was via the hardware stack. That's the last resort.
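The stack entries are described in this thread as 22 bits wide and loaded from Z/C/PC, which suggests two flag bits plus a 20-bit PC. Here is a hedged Python sketch of how a debugger might split a popped value. The bit positions (Z in bit 21, C in bit 20, PC in bits 19..0) are an assumption, not something confirmed here, and pack_state and unpack_state are made-up names.

```python
PC_BITS = 20                      # assumed: 22-bit entry minus two flag bits
PC_MASK = (1 << PC_BITS) - 1


def pack_state(z, c, pc):
    """Build a 22-bit stack entry from flags and PC (assumed layout Z, C, PC)."""
    return (int(bool(z)) << 21) | (int(bool(c)) << 20) | (pc & PC_MASK)


def unpack_state(word):
    """Split a 22-bit stack entry back into Z, C, and PC (assumed layout)."""
    return {
        "z": (word >> 21) & 1,
        "c": (word >> 20) & 1,
        "pc": word & PC_MASK,
    }


entry = pack_state(z=1, c=0, pc=0x00412)
state = unpack_state(entry)
print(f"Z={state['z']} C={state['c']} PC=${state['pc']:05X}")
```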
Routinely busting into WAITs just compromises the quality of debugging, I discovered. I liken it to being in the bathroom, taking care of your business so you can get back to whatever you were doing, when all of a sudden the door gets kicked in and a SWAT team yanks you off the pot. It just wasn't right.
That sounds fine. COGINIT gets immediate control, and the PC tells where it was.
And yes, agreed that waitx and rep are exceptions.
Good rationale, and as jmg has just reminded us in his post, we can inspect via the pushed PC.
So, if I am understanding Chip right, a debugger that's unwinding the stack can only detect the bottom when it encounters the same PC/C/Z a second time. Of course, this is also what a recursive call will look like. On the other hand, if you feed the popped value back in the other end of the stack, you have the opposite problem: you can never safely tell when you've reached the end of the stack. Of course, after 8 pops, it doesn't really matter.
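A small Python sketch of that ambiguity, assuming the replicate-on-pop behavior described earlier in the thread. The names unwind and make_pop are hypothetical, and the return addresses are invented purely for the demonstration.

```python
def unwind(pop, max_depth=8):
    """Pop until a value repeats back-to-back or max_depth is reached."""
    frames = []
    for _ in range(max_depth):
        value = pop()
        if frames and value == frames[-1]:
            break             # either the bottom of the stack or a recursive call
        frames.append(value)
    return frames


def make_pop(levels):
    """Return a pop() over `levels` (next-to-pop first) that replicates the last entry."""
    state = list(levels)

    def pop():
        value = state[0]
        state[:] = state[1:] + [state[-1]]
        return value

    return pop


# Three real return addresses sitting on top of stale zeros.
shallow = unwind(make_pop([0x300, 0x200, 0x100, 0, 0, 0, 0, 0]))

# A routine at $250 that called itself twice: the repeat looks like the bottom.
recursive = unwind(make_pop([0x300, 0x250, 0x250, 0x250, 0x100, 0, 0, 0]))

print([hex(v) for v in shallow])
print([hex(v) for v in recursive])
```

In the second case the unwinder stops at the repeated $250 and never reaches the $100 caller, which is exactly the recursion problem described above.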
As an alternative, I wonder if there'd be a way to maintain a stack depth counter that could be read. That way, you could know exactly how deep the stack is. Also, if the count was greater than 8, you'd also be able to detect a stack overflow (or underflow). As long as the counter was large enough (e.g. 10 bits), the odds of a runaway recursive routine being stopped by the debugger when the value was between 0 and 8 would be low (e.g. <1%).
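A rough Python sketch of the counter idea and the odds estimate. It assumes a wrapping counter that increments on push and decrements on pop; DepthCounter and plausible_fraction are hypothetical names, and none of this reflects actual hardware.

```python
class DepthCounter:
    """Wrapping push/pop counter alongside an 8-level stack (hypothetical)."""

    DEPTH = 8

    def __init__(self, counter_bits=10):
        self.modulus = 1 << counter_bits
        self.count = 0

    def push(self):
        self.count = (self.count + 1) % self.modulus

    def pop(self):
        self.count = (self.count - 1) % self.modulus

    def status(self):
        # 0..8 is a believable depth; anything else means the stack
        # overflowed, or underflowed and wrapped around.
        return "valid" if self.count <= self.DEPTH else "overflow or underflow"


def plausible_fraction(counter_bits=10, depth=8):
    """Fraction of counter values (0..depth) that look like a valid depth."""
    return (depth + 1) / (1 << counter_bits)


counter = DepthCounter()
counter.pop()                       # underflow: wraps to the top of the range
print(counter.count, counter.status())
print(f"{plausible_fraction():.4%}")
```

With 10 bits, 9 of the 1024 counter values fall in the 0 to 8 range, which is a little under 1%, assuming a runaway routine is equally likely to be caught at any counter value.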
Note that some enterprising individual might also use this to virtualize the stack: detect when the stack is 8 and store some or all of the stack elsewhere, then detect when the stack is 0 and pull some or all of the stack back in from elsewhere. If you got really crazy, having the counter hit 8 or 0 would set another interrupt flag.
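A conceptual Python sketch of that spill-and-refill scheme. It glosses over how the oldest entries would actually be reached in hardware (you would have to pop your way down to them) and simply moves half the stack to a backing list when the depth hits 8 and brings half back when it hits 0. VirtualStack and its fields are made-up names.

```python
class VirtualStack:
    """Toy model: 8 'hardware' levels backed by an unbounded spill area."""

    DEPTH = 8
    HALF = DEPTH // 2

    def __init__(self):
        self.hw = []        # stands in for the hardware levels, newest last
        self.spill = []     # stands in for hub RAM or other backing store

    def push(self, value):
        if len(self.hw) == self.DEPTH:
            # Depth counter reads 8: move the oldest half out of the way.
            self.spill.extend(self.hw[:self.HALF])
            del self.hw[:self.HALF]
        self.hw.append(value)

    def pop(self):
        if not self.hw and self.spill:
            # Depth counter reads 0: pull the most recently spilled half back in.
            self.hw = self.spill[-self.HALF:]
            del self.spill[-self.HALF:]
        return self.hw.pop()    # raises IndexError if truly empty


vstack = VirtualStack()
for value in range(20):         # far deeper than 8 levels
    vstack.push(value)

unwound = [vstack.pop() for _ in range(20)]
print(unwound)
```

Spilling half rather than all of the stack is an arbitrary choice here; it just avoids a spill on every push when the depth hovers around 8.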
IMPORTANT! Chip: Even if the idea of adding a counter has merit, please do not put any more thought into it until after you've released another FPGA image.