
Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i


Comments

  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 04:11
    The debugger is aiming to take the "automagic" approach. There should be no code difference between debug and release versions of customers' apps. You'd just hit, maybe F11, instead of shift-F11 to do the different downloads.
  • jmgjmg Posts: 15,140
    cgracey wrote: »
    Ah, but I keep forgetting... having separate debug programs (64 longs, or whatever size) means that you need to know the number of the cog BEFORE you start it. This practically means doing a COGNEW to get a cog, but have it launch into an infinite loop program, then you ready its debug program from its newly-discovered ID, then COGINIT that cog with the final program. It's a pain in the butt when you are in the context of the greater tool system, which needs to be able to engage cogs on a random basis.

    Wouldn't a debugger also need to be able to stipulate the COG being run, in order to have a 'least variables' test environment?
    Seems 'pick any random COG' is adding another variable for little/no benefit ?
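
    For reference, a minimal sketch of the two-step launch described in the quote above: start a spare cog in a stub loop to learn its ID, then restart that same cog with the real program. The operand encodings and symbol names are assumptions (they follow later P2 documentation, not necessarily this FPGA release):

        MOV     cog, #%1_0000           'bit 4 set = "start any free cog" (encoding assumed)
        COGINIT cog, stub_addr  WC      'stub_addr holds the hub address of a do-nothing loop; C=1 if no cog was free
  IF_C  JMP     #no_free_cog            'on success, cog now holds the newly-discovered cog ID
        '...use that ID to prepare the cog's debug buffer/program...
        COGINIT cog, app_addr           'then COGINIT the SAME cog with the final program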
  • jmgjmg Posts: 15,140
    cgracey wrote: »
    The debugger is aiming to take the "automagic" approach. There should be no code difference between debug and release versions of customers' apps. You'd just hit, maybe F11, instead of shift-F11 to do the different downloads.

    Sounds cool - and you can do that over 8 cogs, or just over 7 ?

  • msrobotsmsrobots Posts: 3,701
    Question for clarification, please.

    Do we still have the debug vectors accessible by a COG or is the current Debug-Mechanism completely predefined?

    Sure, having a Debug interrupt seamlessly replacing some COG RAM and running some ROM-based debug code is nice, but can this be optional?

    Is it still possible that one can override the - hopefully still existing - debug interrupt vector with a call to one's own routine instead of calling the built-in debugger, like one can write code for other interrupts?

    This should be possible.

    Mike
  • cgraceycgracey Posts: 14,131
    jmg wrote: »
    cgracey wrote: »
    The debugger is aiming to take the "automagic" approach. There should be no code difference between debug and release versions of customers' apps. You'd just hit, maybe F11, instead of shift-F11 to do the different downloads.

    Sounds cool - and you can do that over 8 cogs, or just over 7 ?

    If I enhance the LOCK bits, all 8 cogs.
  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 04:29
    jmg wrote: »
    cgracey wrote: »
    Ah, but I keep forgetting... having separate debug programs (64 longs, or whatever size) means that you need to know the number of the cog BEFORE you start it. This practically means doing a COGNEW to get a cog, but have it launch into an infinite loop program, then you ready its debug program from its newly-discovered ID, then COGINIT that cog with the final program. It's a pain in the butt when you are in the context of the greater tool system, which needs to be able to engage cogs on a random basis.

    Wouldn't a debugger also need to be able to stipulate the COG being run, in order to have a 'least variables' test environment?
    Seems 'pick any random COG' is adding another variable for little/no benefit ?

    Pick-any-random-cog is already a feature of the Propeller. It doesn't want to be straight-jacketed into absolutism. You don't program it that way. So, we can't introduce such stricture when it comes to debug.
  • jmgjmg Posts: 15,140
    cgracey wrote: »
    jmg wrote: »
    cgracey wrote: »
    The debugger is aiming to take the "automagic" approach. There should be no code difference between debug and release versions of customers' apps. You'd just hit, maybe F11, instead of shift-F11 to do the different downloads.

    Sounds cool - and you can do that over 8 cogs, or just over 7 ?

    If I enhance the LOCK bits, all 8 cogs.

    Then that sounds well worth doing.

  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 04:39
    msrobots wrote: »
    Question for clarification, please.

    Do we still have the debug vectors accessible by a COG or is the current Debug-Mechanism completely predefined?

    Sure, having a Debug interrupt seamlessly replacing some COG RAM and running some ROM-based debug code is nice, but can this be optional?

    Is it still possible that one can override the - hopefully still existing - debug interrupt vector with a call to one's own routine instead of calling the built-in debugger, like one can write code for other interrupts?

    This should be possible.

    Mike

    Yes. It is all RAM-based. The only ROM code is the five-instruction program that saves the lower registers, loads in the debug program, and jumps to it, and the three-instruction program which restores the registers and does an RETI0. The memory locking is there to securely establish a reliable general-purpose debugger.

    So, the individual vectors are gone, but you have register save buffers for each cog, and then a common debug program which gets swapped in. Before all this, you have the hidden INA debug interrupt vector which could point to anywhere in cog or hub memory. The vector initially points to the five-instruction ROM program at $1F8.
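
    A rough sketch of what those two ROM stubs amount to, per the description above. The swap size, buffer addresses, and labels are assumptions, not the actual ROM listing:

        ' entry stub at $1F8 (five instructions)
        SETQ    #DEBUG_LONGS-1          'DEBUG_LONGS = however many lower registers get swapped
        WRLONG  0, save_ptr             'save them to this cog's buffer in the protected upper hub RAM
        SETQ    #DEBUG_LONGS-1
        RDLONG  0, debug_ptr            'pull the common debug program into those same registers
        JMP     #0                      'run it from register 0

        ' exit stub (three instructions)
        SETQ    #DEBUG_LONGS-1
        RDLONG  0, save_ptr             'put the user's registers back
        RETI0                           'return from the hidden debug interrupt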
  • jmgjmg Posts: 15,140
    cgracey wrote: »
    Pick-any-random-cog is already a feature of the Propeller. It doesn't want to be straight-jacketed into absolutism. You don't program it that way. So, we can't introduce such stricture when it comes to debug.

    Understood, but some customers will not want to be straight-jacketed into randomness, either.
    Seems to me being able to define a COG for debug stages of project development, is a way to eliminate variables, which everyone is keen to do when chasing down issues...
  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 04:45
    jmg wrote: »
    cgracey wrote: »
    Pick-any-random-cog is already a feature of the Propeller. It doesn't want to be straight-jacketed into absolutism. You don't program it that way. So, we can't introduce such stricture when it comes to debug.

    Understood, but some customers will not want to be straight-jacketed into randomness, either.
    Seems to me being able to define a COG for debug stages of project development, is a way to eliminate variables, which everyone is keen to do when chasing down issues...

    I'm thinking the approach here must be to have individual debug enable bits by cog, instead of a global enable bit. It might take some experimentation to discover which cog you actually want to singly enable debugging on. Looking at one's source code, you never know which cog is going to be used at run time.
  • msrobotsmsrobots Posts: 3,701
    edited 2018-03-16 04:55
    cgracey wrote: »
    jmg wrote: »
    cgracey wrote: »
    Pick-any-random-cog is already a feature of the Propeller. It doesn't want to be straight-jacketed into absolutism. You don't program it that way. So, we can't introduce such stricture when it comes to debug.

    Understood, but some customers will not want to be straight-jacketed into randomness, either.
    Seems to me being able to define a COG for debug stages of project development, is a way to eliminate variables, which everyone is keen to do when chasing down issues...

    I'm thinking the approach here must be to have individual debug enable bits by cog, instead of a global enable bit. It might take some experimentation to discover which cog you actually want to singly enable debugging on. Looking at one's source code, you never know which cog is going to be used at run time.

    Just put a break into the code, enable all, rerun and you know your COG number...

    Mike
  • cgraceycgracey Posts: 14,131
    msrobots wrote: »
    cgracey wrote: »
    jmg wrote: »
    cgracey wrote: »
    Pick-any-random-cog is already a feature of the Propeller. It doesn't want to be straight-jacketed into absolutism. You don't program it that way. So, we can't introduce such stricture when it comes to debug.

    Understood, but some customers will not want to be straight-jacketed into randomness, either.
    Seems to me being able to define a COG for debug stages of project development, is a way to eliminate variables, which everyone is keen to do when chasing down issues...

    I'm thinking the approach here must be to have individual debug enable bits by cog, instead of a global enable bit. It might take some experimentation to discover which cog you actually want to singly enable debugging on. Looking at one's source code, you never know which cog is going to be used at run time.

    Just put a break into the code, rerun and you know your COG number...

    Mike

    Good idea!
  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 05:33
    I just added 16 cog debug enable bits, instead of one global bit, in the HUBSET instruction.

    This will allow you to easily handle one cog in debug, without the 64-long debug program having to field all kinds of debug interrupts from cogs that you're not interested in.

    This is nice, too, because there may only be one or a few cogs of interest which you want to debug, and this will prevent unwanted debug traffic from even occurring.

    Now, I'm working on those LOCK bits cancelling when their owner gets reset.
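
    A sketch of how a loader might use such per-cog enable bits; the HUBSET command field below is purely a placeholder, since the actual encoding isn't given in this thread:

        MOV     mask, #1
        SHL     mask, target_cog            'one debug-enable bit per cog
        OR      mask, ##DEBUG_ENABLE_CMD    'DEBUG_ENABLE_CMD: placeholder for the HUBSET command bits
        HUBSET  mask                        'enable debug interrupts for just that one cog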
  • Cluso99Cluso99 Posts: 18,066
    This is way more complicated than it needs to be.

    It's easy enough to have a BREAK instruction executed (in any cog) that jumps to a cog location that jumps to hubexec. The code in hub will perform whatever debugging is necessary and when complete, RETI. That's only one cog location for a "jump to hub" instruction. No need for any registers or shadow ROM or specific code in a specific hub location.

    To expand on this, if 5-6 cog registers are reserved, then a complete LMM execution engine can be built. I did this in the P1 in my "Zero Footprint Debugger" that totally resides in the start of shadow RAM.
  • Unless things have changed, I don't think we can avoid leaving a debug footprint of some sort.
    We also need to save the FIFO hub pointer (GETPTR) and restore it in cog exec to keep things valid when bouncing back and forth from hub exec.
  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 11:33
    No. You guys are missing the point!

    By swapping out lower cog registers with debug code and then restoring them afterwards, we are avoiding messing up whatever state the hub FIFO is in. In the Spin interpreter, it is continuously doing RFBYTE via XBYTE to get more program data. Jumping to hub would blow that up. What we have is not intrusive and leaves no footprint, at all. The only footprint is missing time.

    And when the debugger engages, your code doesn't need to change, at all. What happens is, the loader sets up debugging and kicks things off as normal. Except, now you can interrupt your program and see what it's doing. In hub, the last 16KB of RAM is used at the top of the map to act as a stealthy debug buffer which is write-protected from all non-debug-ISR access.

    It's like alien abduction or MK-Ultra personality splintering. The main program doesn't know it's going on.
  • cgraceycgracey Posts: 14,131
    Another matter is that the FIFO is too complex to restore, exactly. You don't know what FBLOCK might have been done or what the original block size and start-address were. Also, the streamer could continue to use the FIFO during a debug interrupt.
  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 11:49
    I've changed the way the LOCK bits work, so that they are very robust. In place of the old LOCKCLR/LOCKSET instructions, there is now just LOCKTRY:

    LOCKTRY {#}D WC

    LOCKTRY attempts to get the D[3:0] lock and "own" it. If it succeeds, C=1, else C=0. The cog that "owns" the lock is the only one that can release it, through another LOCKTRY. So, LOCKTRY attempts to get the lock, or releases it if the cog already owns it. If the owner cog gets reset via COGSTOP or COGINIT before releasing the lock, the lock automatically releases and becomes fair game for all other cogs vying for it. So, now, lock bits get owned by cogs, which is different than before.

    For debug which doesn't require a coordinator cog, I needed a lock that worked this way, so that in case some cog is sitting in a debug ISR having a chat with the host, and some other cog does a COGSTOP or COGINIT on him, his lock is freed and another cog can capture it and safely take over the serial port to maintain host connection. As cogs own and release the lock that will be used for debugging, they acquire it in a round-robin fashion, so that no cog gets left out. This is a byproduct of the hub spinning around.
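
    A sketch of the usage pattern this implies, assuming lock 15 is the one set aside for debug (lock number and labels are placeholders):

get_lock
        LOCKTRY #15             WC      'try to take lock 15; C=1 means this cog now owns it
  IF_NC JMP     #get_lock               'spin until acquired
        '...guarded section, e.g. exclusive use of the host serial pins...
        LOCKTRY #15                     'per this post, a second LOCKTRY by the owner releases the lock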
  • And the debug mechanism is using those lock bits? What if you're already using them in your code?
  • cgraceycgracey Posts: 14,131
    Seairth wrote: »
    And the debug mechanism is using those lock bits? What if you're already using them in your code?

    There are 16 LOCK bits. We just need one for debugging without a coordinator cog. Before your app runs, we'll check out (LOCKNEW) all 16 locks, then return LOCKs 0..14, keeping 15 reserved. This is maybe the single instance where some resource does get taken away. Probably no one is going to miss a single LOCK bit.
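
    A loader-side sketch of that reservation step, assuming LOCKNEW/LOCKRET still exist alongside the new LOCKTRY (register names are placeholders):

        MOV     count, #16
.take   LOCKNEW id                      'check out every lock, 0..15
        DJNZ    count, #.take
        MOV     id, #15
.give   SUB     id, #1                  'hand locks 0..14 back to the application...
        LOCKRET id
        TJNZ    id, #.give              '...keeping lock 15 reserved for debug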
  • cgraceycgracey Posts: 14,131
    edited 2018-03-16 13:01
    While making the new LOCK bits, I had to relearn the hub pipeline timing, which is pretty simple compared to the cog timing. I realized there was a latent bug: a cog could issue a COGINIT/COGSTOP on another cog one clock before the other cog did a COGSTOP/COGINIT on himself, causing the other cog's COGINIT/COGSTOP to enter the hub pipeline on the clock just before he was getting reset and to execute right after the one intended by the first cog. It turned out to be an easy fix, thankfully. That would have caused some very hard-to-diagnose intermittent bugs. Cog N would have needed to do a COGINIT/COGSTOP on cog N+1 while cog N+1 was doing a COGINIT/COGSTOP on himself on the next clock.

    Anyway, the new LOCKTRY locks seem to work fine. When a cog gets shut down, any locks owned by him are freed.
  • cgracey wrote: »
    I've changed the way the LOCK bits work, so that they are very robust. In place of the old LOCKCLR/LOCKSET instructions, there is now just LOCKTRY:

    LOCKTRY {#}D WC

    LOCKTRY attempts to get the D[3:0] lock and "own" it. If it succeeds, C=1, else C=0. The cog that "owns" the lock is the only one that can release it, through another LOCKTRY. So, LOCKTRY attempts to get the lock, or releases it if the cog already owns it.

    Why not keep separate LOCKCLR/LOCKSET or LOCKCLR/LOCKTRY names for clarity? A LOCKCLR alias, if nothing else.

    If debugging is not in use, do we still have 512KB of contiguous hub RAM starting at 0?
  • Hi Chip

    IMHO, being capable of time-stamping the exact moment that a debug interrupt was entered could be useful to solve some intricate problems, including keeping a number of COGs in sync.

    But, before any debug processing can take place, there are many HUB accesses that must be done, to transfer registers to/from HUB RAM.

    Since this will consume some time and will depend on the particular moment the interrupt arrives and begins executing, in relation to the current position of the round-robin mechanism, it'll be very hard, if not impossible, to calculate the value of CT at the time when the debug interrupt began its execution, at each interrupted COG.

    Sure, if a certain COG is firing debug at many others, this COG can save a copy of CT, to be later retrieved by the other ones.

    But the interrupted ones will not know the exact moment when their "normal" code ceases execution, at the beginning of each one's debug code.

    In such a situation, is there any way to freeze a copy of CT for each COG, to be later retrieved during each one's individual debug processing?
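
    Short of hardware capture, one software-side mitigation (sketch only; the jitter from the register save/load transfers remains) is to grab CT as the very first action of the swapped-in debug program:

debug_entry
        GETCT   stamp                   'earliest point the debug code can timestamp itself
        WRLONG  stamp, stamp_ptr        'stash it in this cog's hub debug buffer for the host to read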
  • Chip,
    I like the new LOCKTRY for the part where it tries to get a lock, but I think releasing the lock should be explicit with a different instruction (LOCKCLR). It will be confusing to read code that has LOCKTRY getting the lock and LOCKTRY releasing the lock. How will you know which is which without explicitly commenting it? Please keep LOCKTRY, but bring back LOCKCLR to do the releasing.
  • I second this.
  • cgraceycgracey Posts: 14,131
    I agree.
  • jmgjmg Posts: 15,140
    cgracey wrote: »
    Another matter is that the FIFO is too complex to restore, exactly. You don't know what FBLOCK might have been done or what the original block size and start-address were. Also, the streamer could continue to use the FIFO during a debug interrupt.
    That's quite compelling.
    I take it this assumes the Debug stub fits entirely inside the swapped COG, and never uses HUBEXEC ?

  • cgraceycgracey Posts: 14,131
    LOCKCLR is a little bit of a misnomer, though, and may cause more confusion with the older LOCKs.

    How about these:

    LOCKTRY
    LOCKBYE
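
    With an explicit release mnemonic, the earlier lock sketch reads unambiguously (LOCKBYE is just the name floated in this post; the mnemonic was still in flux at this point):

.try    LOCKTRY #15             WC      'acquire
  IF_NC JMP     #.try
        '...critical section...
        LOCKBYE #15                     'release - clearly distinct from the acquire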
  • jmgjmg Posts: 15,140
    Yanomani wrote: »
    IMHO, being capable of time-stamping the exact moment that a debug interrupt was entered could be useful to solve some intricate problems, including keeping a number of COGs in sync.

    But, before any debug processing can take place, there are many HUB accesses that must be done, to transfer registers to/from HUB RAM.
    ...
    In such situation, is there any way to freeze a copy of CT for each COG, to be later retrieved during each one individual debug processing?

    Good point.
    The time jitter is not large, so it may be possible to save a smaller copy of CT, to save logic resource ? Debug code sends both values to the PC-side for adjustments.
    Is 5~6 bits sufficient here ?
  • cgraceycgracey Posts: 14,131
    jmg wrote: »
    cgracey wrote: »
    Another matter is that the FIFO is too complex to restore, exactly. You don't know what FBLOCK might have been done or what the original block size and start-address were. Also, the streamer could continue to use the FIFO during a debug interrupt.
    That's quite compelling.
    I take it this assumes the Debug stub fits entirely inside the swapped COG, and never uses HUBEXEC ?

    If 64 longs isn't big enough, overlays can be done with SETQ+RDLONG. So, we'll have as big of a virtual cog-register-based debugger as we need. We could get by with 16 longs and overlays just as well, and it would save hub RAM and time. There's not much elbow room in 16 longs, though. Even 8 longs could work, but there'd be overlays out the wazoo. 64 is quite rich, anyway.
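
    For illustration, an overlay load for such a register-resident debugger might look like this (sizes and names are placeholders):

        SETQ    #OVL_LONGS-1            'burst-read the next overlay from hub...
        RDLONG  ovl_base, ovl_ptr       '...into cog registers starting at ovl_base
        CALL    #ovl_base               'run it, then fall back into the resident debug stub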