Dumping COGRAM - code snippets & sample program

mindrobots · 2015-10-06 17:17

I've been playing around with some monitoring and debugging and found a really nifty reason I like the totally unassisted COGINIT and HUBEXEC. You can kill a running COG from a COG in HUBEXEC mode and dump out the COGRAM of the COG you just stopped! (OK, I thought it was cool)

From your "monitor" COG running in HUBEXEC, you can do something like this:

' start a COG to dump itself
		mov	cognum,#2
		loc	ptrb,@dump_cog
		coginit	cognum,ptrb

where dump_cog is this simple code, which will be running in HUBEXEC after the coginit (I haven't tried it with the target cog running in COGEXEC, but I see no reason it would not work the same):

dump_cog	' start a COG to dump its memory to HUBRAM
		'
		loc	adra,@cog_dmp_img
		setq	#$1FF
		wrlong 	$000,adra
		cogid	0
		cogstop 0
		ret

' set it to $33333333 just so you notice changes
cog_dmp_img	long		$33333333[$1FF]

The data in cog_dmp_img will be a copy of the COGRAM from whatever COG you did the coginit on.

After starting the test COG with this HUBEXEC code

cog_exercise
		' load up COGRAM with pattern
		loc	adra,@pattern_buf
		setq	#$1EF 		' fill COGRAM
		rdlong	$010<<2,adra

		mov	$011<<2,#$01
		mov	$012<<2,#$02
		mov	$013<<2,#$04
		mov	$014<<2,#$08
		mov	$015<<2,#$10
		mov	$016<<2,#0
rollp		
		rol	$011<<2,#1	
		rol	$012<<2,#1
		rol	$013<<2,#1
		rol	$014<<2,#1
		rol	$015<<2,#1
		add	$016<<2,#1	
		jmp	@rollp

and then "dumping" it at some point, you end up with COGRAM looking like this:

FFFFFFFF-00000000-00000000-00000000-00000000-00000000-10101010-10101010
00001084-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-00000001-00000002-00000002-00000004-00000008-000AA05F-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010-10101010-10101010-10101010-10101010-10101010
10101010-10101010-10101010
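
A quick way to sanity-check a dump like this is to replay the test loop off-chip. The sketch below is a Python model (not P2 code). It assumes the dump lists registers in order starting at $000, so that the run 00000001-00000002-00000002-00000004-00000008-000AA05F is registers $011 through $016. The names rol32 and snapshot are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, bits):
    """Rotate a 32-bit value left by `bits` positions."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32


def snapshot(counter, rols_done):
    """Model registers $011..$015 after `counter` complete passes through
    the rollp loop plus `rols_done` ROL instructions of the next pass."""
    seeds = [0x01, 0x02, 0x04, 0x08, 0x10]       # values set by the MOVs
    regs = [rol32(seed, counter) for seed in seeds]
    for i in range(rols_done):
        regs[i] = rol32(regs[i], 1)
    return regs


# Values read from the dump (assumed to be registers $011..$016).
observed = [0x1, 0x2, 0x2, 0x4, 0x8]
counter = 0x000AA05F

# How many ROLs of the next pass had executed when the cog was restarted?
matches = [n for n in range(6) if snapshot(counter, n) == observed]
print(matches)
```

Under those assumptions the only match is 2: the counter says 31 rotations (mod 32) had completed, and the cog was restarted between the second and third ROL of the next pass.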

I thought it was pretty interesting...but then, I'm easily entertained.

The entire program is below if you want to play with it (it's ugly). Load it, start up PST, press any key to launch the second COG, press again to stop that COG, press again to dump the captured memory.

Have fun!
(Yeah, Chip, it's FUN to program the P2!! Thanks!!)

Electrodude · 2015-10-06 17:53

It would be really cool if when you stopped or restarted a cog, the old PC and any other necessary state (how much is there?) got written somewhere. Then you could do things like hijack a cog to do a debug dump and then have it resume whatever it was doing without it ever knowing! Sort of like external interrupts sourced from other cogs.

EDIT: It might be easier to instead add a way for a cog to inject a LINK or CALL[AB] or something into another cog's instruction pipeline.

jmg · 2015-10-06 18:06

Electrodude wrote: »

It would be really cool if when you stopped or restarted a cog, the old PC and any other necessary state (how much is there?) got written somewhere. Then you could do things like hijack a cog to do a debug dump and then have it resume whatever it was doing without it ever knowing! Sort of like external interrupts sourced from other cogs.

EDIT: It might be easier to instead add a way for a cog to inject a LINK or CALL[AB] or something into another cog's instruction pipeline.

If the new COG choices allow start of either Local or HUBEXEC, you could get close.
Some small overhead would exist, unless external custom debug code pulled in some registers over a serial link and then restored them.

Unlike the finesse of a break, this would be more of a 'core dump', bigger-hammer approach.
Co-operative debug of Step/Break is likely to be more useful; we just need to make that as compact and 'invisible' as possible.

Seairth · 2015-10-06 18:22

@mindrobots: nice idea!

Cluso99 · 2015-10-06 23:16

Agreed, nice idea mindrobots!

We just need a register that stores the PC (+C&Z?) when a cogstop/coginit occurs so we could read it back to use as a restart!

jmg · 2015-10-07 00:30

Chip said this in another thread, so it sounds like nice Debug improvements are in the pipeline.

cgracey wrote: »

I've been detoured a little bit, also, thinking about how to establish debugging in cogs without them needing to add any debug code. I think I have it figured out. ...

cgracey · 2015-10-07 06:41

We have single-step breakpoint, address breakpoint, interrupt breakpoints, and asynchronous breakpoint. I'm thinking about having all cogs start in single-step mode and break to $400+cogid*4, when they start, as an initial breakpoint. If that hub long contains an RETI0 (return from debug interrupt), breakpoint mode is over and the cog runs normally. If you put a jump to a handler at that location, instead, you can debug however you want. This way, all cogs can be debugged if some setup is done in hub RAM before they launch.

This is one of those things that we'll need to feel out, somewhat, to make sure we've got the right approach.
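
For anyone setting up hub RAM ahead of a launch, the initial vector arithmetic described above can be sketched in Python. This is only an illustration of the $400 + cogid*4 formula from the post. COG_COUNT = 16 is an assumption about the FPGA image being discussed, and initial_debug_vector is a hypothetical helper name.

```python
DEBUG_VECTOR_BASE = 0x400   # from the post: initial break goes to $400 + cogid*4
COG_COUNT = 16              # ASSUMPTION: number of cogs in the image under discussion


def initial_debug_vector(cogid):
    """Return the hub address of the initial debug vector for `cogid`."""
    if not 0 <= cogid < COG_COUNT:
        raise ValueError(f"cogid {cogid} out of range")
    return DEBUG_VECTOR_BASE + cogid * 4


# One hub long per cog, laid out back to back.
for cogid in range(COG_COUNT):
    print(f"cog {cogid:2d}: ${initial_debug_vector(cogid):03X}")
```

Each of those hub longs would hold either the RETI0 or a jump to a handler, as described above.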

Cluso99 · 2015-10-07 10:28

Chip,
When a coginit is issued on a cog, if the cog is/was already executing, would it be possible for the C&Z flags and the PC to be saved in a hidden register that could be read by a special instruction?

This would permit an errant program to be interrupted and examined and/or single stepped (debugged).

mindrobots · 2015-10-07 11:44

Cluso99 wrote: »

Chip,
When a coginit is issued on a cog, if the cog is/was already executing, would it be possible for the C&Z flags and the PC to be saved in a hidden register that could be read by a special instruction?

This would permit an errant program to be interrupted and examined and/or single stepped (debugged).

Good idea!

Can the COGINIT just push the CZ/PC to the COG's internal stack? If you want it after the COGINIT, you can just pop it off the stack, if not, life goes on.

It just becomes the known startup state of the COG without needing anything out of instruction space.

This becomes the NMI for a COG - the COG can't ignore it but at least it saves execution state for restart or forensics.

Electrodude · 2015-10-07 12:38

mindrobots wrote: »

Can the COGINIT just push the CZ/PC to the COG's internal stack? If you want it after the COGINIT, you can just pop it off the stack, if not, life goes on.

If you don't want the value, you wouldn't even need to pop it - just leave it there and nobody will ever notice it. If the stack overflows and the restart point falls off, it wouldn't even matter.

mindrobots · 2015-10-07 12:41

Electrodude wrote: »

mindrobots wrote: »

Can the COGINIT just push the CZ/PC to the COG's internal stack? If you want it after the COGINIT, you can just pop it off the stack, if not, life goes on.

If you don't want the value, you wouldn't even need to pop it - just leave it there and nobody will ever notice it. If the stack overflows and the restart point falls off, it wouldn't even matter.

That's exactly what I meant by "if not, life goes on"

78rpm · 2015-10-07 13:33

cgracey wrote: »

We have single-step breakpoint, address breakpoint, interrupt breakpoints, and asynchronous breakpoint. I'm thinking about having all cogs start in single-step mode and break to $400+cogid*4, when they start, as an initial breakpoint. If that hub long contains an RETI0 (return from debug interrupt), breakpoint mode is over and the cog runs normally. If you put a jump to a handler at that location, instead, you can debug however you want. This way, all cogs can be debugged if some setup is done in hub RAM before they launch.

This is one of those things that we'll need to feel out, somewhat, to make sure we've got the right approach.

This sounds an excellent method.

Would this have any impact on the code security / protection / cryptography facility? If someone flashed a protected manufactured product with their own image with a hook to their own debug software on Propeller 2 startup? I am not very well versed in encoded images so this may be irrelevant.

Obvious way to counter that is to inhibit on Cog 0 (which I think does the boot decryption) with a first-exec flip-flop, if it is needed at all.

cgracey · 2015-10-07 18:28

78rpm wrote: »

cgracey wrote: »

We have single-step breakpoint, address breakpoint, interrupt breakpoints, and asynchronous breakpoint. I'm thinking about having all cogs start in single-step mode and break to $400+cogid*4, when they start, as an initial breakpoint. If that hub long contains an RETI0 (return from debug interrupt), breakpoint mode is over and the cog runs normally. If you put a jump to a handler at that location, instead, you can debug however you want. This way, all cogs can be debugged if some setup is done in hub RAM before they launch.

This is one of those things that we'll need to feel out, somewhat, to make sure we've got the right approach.

This sounds an excellent method.

Would this have any impact on the code security / protection / cryptography facility? If someone flashed a protected manufactured product with their own image with a hook to their own debug software on Propeller 2 startup? I am not very well versed in encoded images so this may be irrelevant.

Obvious way to counter that is to inhibit on Cog 0 (which I think does the boot decryption) with a first-exec flip-flop, if it is needed at all.

Code security would inhibit anybody from modifying any program images. If changed, they just wouldn't work.

With the debugging hooks, we already have asynchronous breaks from other cogs, which won't reset the ports and I/O states. The limitation is, though, that if the cog is locked up in some kind of a WAIT, the asynchronous break will never be seen.

I'll see about pushing C/Z/PC onto the hardware stack when a cog is COGINIT'd. In some cases, that would be your only hope of discovering what went wrong.

Man, you guys come up with some great ideas!

78rpm · 2015-10-07 21:20

cgracey wrote: »

With the debugging hooks, we already have asynchronous breaks from other cogs, which won't reset the ports and I/O states. The limitation is, though, that if the cog is locked up in some kind of a WAIT, the asynchronous break will never be seen.

I'll see about pushing C/Z/PC onto the hardware stack when a cog is COGINIT'd. In some cases, that would be your only hope of discovering what went wrong.

Man, you guys come up with some great ideas!


Pushing C/Z/PC when a COGINIT is issued sounds preferable to forcing the WAIT to see a falsified completion state, though you would then be dependent on the hardware stack having a free entry, which hopefully in most cases it would.

Could you perhaps instead force a similar type of action on WAIT as when say a timer interrupt occurs and you force a CALL (LINK?) to the ISR, instead forcing a CALL/LINK to the asynchronous break instruction address @$400+(CogID*4) ?

jmg · 2015-10-07 21:40

78rpm wrote: »

Pushing C/Z/PC when a COGINIT is issued sounds preferable to forcing the WAIT to see a falsified completion state, though you would then be dependent on the hardware stack having a free entry, which hopefully in most cases it would.

I think the stack wraps, so there will always be space, it means the oldest stack value may be over written.

78rpm · 2015-10-07 21:56

jmg wrote: »

I think the stack wraps, so there will always be space, it means the oldest stack value may be over written.

Ah ha, the good old fashioned 'infinity stack' ! :-D

cgracey · 2015-10-07 22:41

78rpm wrote: »

...Could you perhaps instead force a similar type of action on WAIT as when say a timer interrupt occurs and you force a CALL (LINK?) to the ISR, instead forcing a CALL/LINK to the asynchronous break instruction address @$400+(CogID*4) ?

Could you please elaborate? I'm not understanding.

Wait.. there may be some misunderstanding here. Those $400 + cogid*4 vectors are only initial vectors. Those addresses are set in the tiny boot ROM within the cog. They can be changed to point anywhere, after the initial break. In some cases, you will want to have your whole debug routine in cog or LUT, to not suffer eggbeater-FIFO disruption that comes with hub exec. At the outset of a program, though, it does not matter because everything has been reset, so a jump into hub doesn't cost anything, but a few cycles.

cgracey · 2015-10-07 22:44

jmg wrote: »

78rpm wrote: »

Pushing C/Z/PC when a COGINIT is issued sounds preferable to forcing the WAIT to see a falsified completion state, though you would then be dependent on the hardware stack having a free entry, which hopefully in most cases it would.

I think the stack wraps, so there will always be space, it means the oldest stack value may be over written.

The hardware stack is a simple hardwired LIFO that just pushes and pops to/from the next level. There's no pointer, just a bunch of 22-bit-wide-registers that can load from the one below or above. The bottom-most one loads from Z/C/PC or D[21:0] on a CALL or PUSH. On a RET or POP, the top level is copied to the level below. So, after 8 pops, you keep getting the same data.
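
The behaviour described above can be modelled in a few lines of Python. This is a rough software sketch, not the actual logic: the 8 levels are inferred from "after 8 pops", the reset value of 0 is an assumption, and HardwareStack is a hypothetical name.

```python
LEVELS = 8                    # inferred from "after 8 pops"
WIDTH_MASK = (1 << 22) - 1    # 22-bit-wide registers


class HardwareStack:
    """Pointerless LIFO: every level loads from its neighbour."""

    def __init__(self):
        # regs[0] is the entry level (the "bottom-most" one in the post);
        # regs[-1] is the far end (the "top level").
        self.regs = [0] * LEVELS      # ASSUMPTION: cleared at reset

    def push(self, value):
        # Every level loads from the one nearer the entry; the far end is lost.
        self.regs = [value & WIDTH_MASK] + self.regs[:-1]

    def pop(self):
        value = self.regs[0]
        # Every level loads from the one further away; the far end keeps
        # its value, so it gradually fills the whole stack.
        self.regs = self.regs[1:] + [self.regs[-1]]
        return value


stack = HardwareStack()
for v in range(1, 9):
    stack.push(v)
print([stack.pop() for _ in range(11)])
```

In this model the first eight pops return 8 down to 1, and every pop after that returns 1 again, which is the "same data" effect.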

jmg · 2015-10-07 23:02

cgracey wrote: »

The hardware stack is a simple hardwired LIFO that just pushes and pops to/from the next level. There's no pointer, just a bunch of 22-bit-wide-registers that can load from the one below or above. The bottom-most one loads from Z/C/PC or D[21:0] on a CALL or PUSH. On a RET or POP, the top level is copied to the level below. So, after 8 pops, you keep getting the same data.

Ah, ok, that means you cannot do a simple 8 x POP to end up 'back where you were'.
The 'same data' is what feeds into the top-most register Dn on POP, which is what ? 0x000 ?
Why not couple that top most Dn to lower Qn to allow 8xPOP content-preserve read ?
Would be useful for debug, and some stack gymnastics some may want to try.

78rpm · 2015-10-07 23:03

cgracey wrote: »

78rpm wrote: »

...Could you perhaps instead force a similar type of action on WAIT as when say a timer interrupt occurs and you force a CALL (LINK?) to the ISR, instead forcing a CALL/LINK to the asynchronous break instruction address @$400+(CogID*4) ?

Could you please elaborate? I'm not understanding.

Wait.. there may be some misunderstanding here. Those $400 + cogid*4 vectors are only initial vectors. Those addresses are set in the tiny boot ROM within the cog. They can be changed to point anywhere, after the initial break. In some cases, you will want to have your whole debug routine in cog or LUT, to not suffer eggbeater-FIFO disruption that comes with hub exec. At the outset of a program, though, it does not matter because everything has been reset, so a jump into hub doesn't cost anything, but a few cycles.

Sorry, I wasn't very clear was I.

I think I'm correct that when interrupts are enabled and an interrupt occurs, you effectively force a CALL or LINK (names may have changed to protect the innocent) which saves C/Z & PC on a stack; the ISR (interrupt service routine) is then executed. Surely the ISR is also executed if the Cog is currently executing a WAIT instruction.

Thus, referring to your comment:

cgracey wrote: »
With the debugging hooks, we already have asynchronous breaks from other cogs, which won't reset the ports and I/O states. The limitation is, though, that if the cog is locked up in some kind of a WAIT, the asynchronous break will never be seen.

My question is can the same method of forced CALL or LINK as used in the ISR method be used to invoke the code for that Cog at the $400 + cogid*4 vector. This means if a WAIT is in effect, the asynchronous break will now be seen.

cgracey · 2015-10-08 00:51

78rpm wrote: »

cgracey wrote: »

78rpm wrote: »

...Could you perhaps instead force a similar type of action on WAIT as when say a timer interrupt occurs and you force a CALL (LINK?) to the ISR, instead forcing a CALL/LINK to the asynchronous break instruction address @$400+(CogID*4) ?

Could you please elaborate? I'm not understanding.

Wait.. there may be some misunderstanding here. Those $400 + cogid*4 vectors are only initial vectors. Those addresses are set in the tiny boot ROM within the cog. They can be changed to point anywhere, after the initial break. In some cases, you will want to have your whole debug routine in cog or LUT, to not suffer eggbeater-FIFO disruption that comes with hub exec. At the outset of a program, though, it does not matter because everything has been reset, so a jump into hub doesn't cost anything, but a few cycles.

Sorry, I wasn't very clear was I.

I think I'm correct that when interrupts are enabled and an interrupt occurs, you effectively force a CALL or LINK (names may have changed to protect the innocent) which saves C/Z & PC on a stack; the ISR (interrupt service routine) is then executed. Surely the ISR is also executed if the Cog is currently executing a WAIT instruction.

Thus, referring to your comment:

cgracey wrote: »
With the debugging hooks, we already have asynchronous breaks from other cogs, which won't reset the ports and I/O states. The limitation is, though, that if the cog is locked up in some kind of a WAIT, the asynchronous break will never be seen.

My question is can the same method of forced CALL or LINK as used in the ISR method be used to invoke the code for that Cog at the $400 + cogid*4 vector. This means if a WAIT is in effect, the asynchronous break will now be seen.

I tried to make it work like this the first time around, but popping the cog out of WAITs creates some mystery as to what was going on, exactly, at the break, as we can't go back and recreate the WAIT circumstance. I decided it was better to NOT do that, since, in practice, WAITs free up soon enough.

If the thing is really locked up in a WAIT that is not releasing, you can COGINIT it and see where its PC was via the hardware stack. That's the last resort.
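
Reading the old PC back would mean splitting the popped 22-bit word into its fields. The Python sketch below shows one way that could look. The bit layout (Z in bit 21, C in bit 20, PC in bits 19..0) is only an ASSUMPTION based on the "Z/C/PC" ordering and the 22-bit width mentioned earlier in the thread, and pack_state / unpack_state are hypothetical helper names.

```python
PC_BITS = 20
PC_MASK = (1 << PC_BITS) - 1
C_SHIFT = PC_BITS          # ASSUMED position of the C flag
Z_SHIFT = PC_BITS + 1      # ASSUMED position of the Z flag


def pack_state(z, c, pc):
    """Combine the Z flag, C flag and PC into one 22-bit stack word."""
    return ((z & 1) << Z_SHIFT) | ((c & 1) << C_SHIFT) | (pc & PC_MASK)


def unpack_state(word):
    """Split a 22-bit stack word back into (z, c, pc)."""
    return (word >> Z_SHIFT) & 1, (word >> C_SHIFT) & 1, word & PC_MASK


# Example: a cog stuck at hub address $01234 with Z set and C clear.
word = pack_state(1, 0, 0x01234)
z, c, pc = unpack_state(word)
print(f"word=${word:06X}  Z={z} C={c} PC=${pc:05X}")
```

The monitor would do the equivalent masking on whatever it pops after the COGINIT, once the real layout is settled.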

Routinely busting into WAITs just compromises the quality of debugging, I discovered. I liken it to being in the bathroom, taking care of your business so you can get back to whatever you were doing, when all of a sudden the door gets kicked in and a SWAT team yanks you off the pot. It just wasn't right.

jmg · 2015-10-08 00:55

cgracey wrote: »

If the thing is really locked up in a WAIT that is not releasing, you can COGINIT it and see where its PC was via the hardware stack.

That sounds fine. COGINIT gets immediate control, and the PC tells where it was.

Cluso99 · 2015-10-08 01:06

cgracey wrote: »

.....
I liken it to being in the bathroom, taking care of your business so you can get back to whatever you were doing, when all of a sudden the door gets kicked in and a SWAT team yanks you off the pot. It just wasn't right.

ROFL

And yes, agreed that waitx and rep are exceptions.

78rpm · 2015-10-08 01:21

cgracey wrote: »

I tried to make it work like this the first time around, but popping the cog out of WAITs creates some mystery as to what was going on, exactly, at the break, as we can't go back and recreate the WAIT circumstance. I decided it was better to NOT do that, since, in practice, WAITs free up soon enough.

Good rationale, and as jmg has just reminded in his post we can inspect via the pushed PC.

Seairth · 2015-10-08 02:12

jmg wrote: »

cgracey wrote: »

The hardware stack is a simple hardwired LIFO that just pushes and pops to/from the next level. There's no pointer, just a bunch of 22-bit-wide-registers that can load from the one below or above. The bottom-most one loads from Z/C/PC or D[21:0] on a CALL or PUSH. On a RET or POP, the top level is copied to the level below. So, after 8 pops, you keep getting the same data.

Ah, ok, that means you cannot do a simple 8 x POP to end up 'back where you were'.
The 'same data' is what feeds into the top-most register Dn on POP, which is what ? 0x000 ?
Why not couple that top most Dn to lower Qn to allow 8xPOP content-preserve read ?
Would be useful for debug, and some stack gymnastics some may want to try.

So, if I am understanding Chip right, a debugger that's unwinding the stack can only detect the bottom when it encounters the same PC/C/Z a second time. Of course, this is also what a recursive call will look like. On the other hand, if you feed the popped value back in the other end of the stack, you have the opposite problem: you can never safely tell when you've reached the end of the stack. Of course, after 8 pops, it doesn't really matter.

As an alternative, I wonder if there'd be a way to maintain a stack depth counter that could be read. That way, you could know exactly how deep the stack is. Also, if the count was greater than 8, you'd also be able to detect a stack overflow (or underflow). As long as the counter was large enough (e.g. 10 bits), the odds of a run-away recursive routine being stopped by the debugger when the value was between 0 and 8 would be low (e.g. <1%).
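
The "<1%" figure can be checked with a little arithmetic. The Python sketch below assumes the counter value is uniformly distributed over its full range at the moment a run-away routine is stopped, which is a simplification. fraction_in_range is a hypothetical helper name.

```python
from fractions import Fraction


def fraction_in_range(counter_bits, low, high):
    """Chance that a uniformly distributed counter of `counter_bits` bits
    holds a value in the inclusive range low..high."""
    total_values = 1 << counter_bits
    return Fraction(high - low + 1, total_values)


# 10-bit counter, "plausible" depths 0 through 8.
p = fraction_in_range(10, 0, 8)
print(p, f"{float(p):.4%}")
```

With 10 bits that works out to 9 values out of 1024, a little under 0.9%.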

Note that some enterprising individual might also use this to virtualize the stack: detect when the stack is 8 and store some or all of the stack elsewhere, then detect when the stack is 0 and pull some or all of the stack back in from elsewhere. If you got really crazy, having the counter hit 8 or 0 would set another interrupt flag.

IMPORTANT! Chip: Even if the idea of adding a counter has merit, please do not put any more thought into it until after you've released another FPGA image.
