Propeller 2 debug specification
ozpropdev
I'm finalizing testing of my Prop2 debugger and have a few thoughts.
Currently I'm using PNut as my primary compiler and extracting debug data from
the .lst file generated by the Ctrl-M function.
Consider the following test code:
con
        xtal    = 20_000_000

dat
        org
start   nop
        orgf    12
label1  nop
.loop   nop
label2  nop
.loop   nop
myreg   res     1

'hub only addresses
        orgh
mylong  long    1
myword  word    1
mybyte  byte    1
PNut produces the following debug data:
TYPE: 4B VALUE: 01312D00 NAME: XTAL
TYPE: 56 VALUE: 00000000 NAME: START
TYPE: 56 VALUE: 00C00030 NAME: LABEL1
TYPE: 56 VALUE: 00D00034 NAME: LOOP'0002
TYPE: 56 VALUE: 00E00038 NAME: LABEL2
TYPE: 56 VALUE: 00F0003C NAME: LOOP'0003
TYPE: 57 VALUE: 01000040 NAME: MYREG
TYPE: 56 VALUE: FFF00040 NAME: MYLONG
TYPE: 55 VALUE: FFF00044 NAME: MYWORD
TYPE: 54 VALUE: FFF00046 NAME: MYBYTE
Looking at the output a bit more closely, we find:
Type 4B represents a constant
Type 56 represents a "LONG"
Type 55 represents a "WORD"
Type 54 represents a "BYTE"
Type 57 represents a "RES" location
The format of the "VALUE" field is either a combination of cog and hub address, or the constant value:
VALUE = cog_address << 20 | hub_address
Note that addresses defined under an ORGH directive return a cog address of FFF.
NAME is self-explanatory.
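As a rough host-side illustration, here is a minimal sketch in Python of how those entries could be decoded, using the VALUE packing described above (decode_entry and the returned dictionary are just illustrative names, not anything PNut provides):

# Minimal sketch: decode PNut debug entries of the form
#   TYPE: 56 VALUE: 00C00030 NAME: LABEL1
# Field meanings are taken from the description above; everything else is illustrative.

TYPE_NAMES = {0x4B: "constant", 0x54: "BYTE", 0x55: "WORD", 0x56: "LONG", 0x57: "RES"}

def decode_entry(line):
    parts = line.split()          # ['TYPE:', '56', 'VALUE:', '00C00030', 'NAME:', 'LABEL1']
    sym_type = int(parts[1], 16)
    value    = int(parts[3], 16)
    name     = parts[5]
    if sym_type == 0x4B:                       # constant: VALUE is the value itself
        return name, "constant", {"value": value}
    info = {"hub": value & 0xFFFFF}            # low 20 bits = hub address
    cog  = value >> 20                         # top 12 bits = cog address
    if cog != 0xFFF:                           # FFF = defined under ORGH (hub-only)
        info["cog"] = cog
    return name, TYPE_NAMES.get(sym_type, hex(sym_type)), info

print(decode_entry("TYPE: 56 VALUE: 00C00030 NAME: LABEL1"))
# ('LABEL1', 'LONG', {'hub': 48, 'cog': 12})   i.e. hub $00030, cog $00C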
While this information has been very useful, I believe we need to expand it further.
A problem arises when you have multiple labels representing the same cog/lut address.
What is needed is a means of separating definitions into groups/blocks for each cog.
Maybe something like:
CON 'adc_cog'
DAT 'adc_cog'
.
.
CON 'hdmi'
DAT 'hdmi'

If no name is included with the CON/DAT directive, the constants/labels are global.
Or maybe a "BLOCK" or "NAME" directive?
BLOCK "adc_cog"
So we end up with something like:
TYPE: 4B VALUE: 01312D00 BLOCK: ADC_COG NAME: XTAL
TYPE: 56 VALUE: 00000000 BLOCK: ADC_COG NAME: START

This way the user can select the correct labels to match the cog being debugged.
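On the host side, the proposed BLOCK field would let the debugger keep one symbol table per block and show only the set matching the cog under debug. A minimal sketch of that grouping, assuming the proposed entry format above (purely illustrative):

# Sketch: group BLOCK-tagged debug entries so the debugger can show
# only the symbols belonging to the cog being debugged.
from collections import defaultdict

def group_by_block(lines):
    blocks = defaultdict(dict)
    for line in lines:
        parts = line.split()
        value = int(parts[3], 16)   # VALUE field
        block = parts[5]            # text after 'BLOCK:'
        name  = parts[7]            # text after 'NAME:'
        blocks[block][name] = value
    return blocks

entries = [
    "TYPE: 4B VALUE: 01312D00 BLOCK: ADC_COG NAME: XTAL",
    "TYPE: 56 VALUE: 00000000 BLOCK: ADC_COG NAME: START",
]
print(group_by_block(entries)["ADC_COG"])   # {'XTAL': 20000000, 'START': 0}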
Thoughts?
Comments
Your solution doesn't handle local names such as ".loop", which are quite common. Does this matter?
PNut already deals with multiple ".label" names in the same code space.
As the sample code above shows, the two ".loop" labels have a unique number appended.
You have to know what cog is going to run your code in order to set the appropriate debug enable bits with hubset.
So you can't use coginit #16,#address anyway.
No. For a system-level debug, you enable debugging on every cog and trap every COGINIT. That's why the upper memory protection is there and there's the requirement that only a cog in a debug ISR can asynchronously breakpoint another cog. It's like hypnosis, where the user app doesn't know these things are going on, and the debug app is untouchable.
Ok, I hadn't thought of enabling ALL cogs in debug mode.
Enabling all now....
The hardware handles that for the debug interrupt, but there is a need for some reporting convention so that a debug ISR can alert the greater context to its status. That would involve using COGID to calculate its status-reporting address, which is a convention that the greater debugger software must define.
Ah, you need to have debugger stub code ready for ALL cogs, up at the top of memory. You don't know what to expect, so you must be ready for anything.
You could, but that gets complicated, as special setup is needed. Better to debug the entire app, and when a cog program of interest is identified on COGINIT, do special debug handling for it, while ignoring the others. Or, maintain awareness of ALL cogs and allow everything to be debugged. Lots of possibilities, but probably best to keep the door open to all of them.
You don't know, because of COGNEW, what cog will be running what code. So, you need to trap everything, like Checkpoint Charley, and look at their papers, which may not even make sense if they are running code that was built at runtime. In that case, you may not be able to offer symbolic debugging.
To make it most useful, you need to add context awareness, where you look at PTRB when a cog starts up, and try to match it up to source code, so that you can display things sensibly. That's going to be most of the work. The debugger in the upper 16KB is pretty finite. It's the host platform that needs to figure out the relationships back to the source code and provide the user interface.
I'm very close to having it all running, Chip.
I can already do symbolic debugging of a single cog, including XBYTE with Spin2 symbols.
I just broke things with ALL debug, though. Fixing it now.
It would be so awesome to have a 4K monitor loaded with debug info, animating as things run. Being able to watch variables by name would be so neat. Also, automatic graphing would be nice. The more the programmer can see into his app, the better he can develop it.
Imagine debugging commands that could be written right into the source code. They would generate no code, but just instruct the debugger on what to display and how to show it. So many possibilities, but the debugger software that runs in the top of hub RAM only needs to do several things to make all this possible. There's not much to it. It's the development platform that must orchestrate all the source-to-debugger dialogue.
The LOCK functionality was changed so that if the LOCK-holding cog is stopped (or restarted), that LOCK is freed and up-for-grabs again. This was done so that the debugging system could use, say, LOCK #15 (since it is the most remote and would be LOCKNEW'd ahead of time, anyway) as a permission to use the host connection.
Each time a cog were to go into a debug ISR, it would wait to get that LOCK, so that it could exclusively use the serial connection on P63/P62 to ask the host what it wanted to do about that debug interrupt. The host might say, "give me these registers and these RAM data, then continue executing." Or, the host might know that the user must now be polled for a response, so it says, "Ask me in another 100ms." After each communication, the cog in the debug ISR holding the LOCK would release it, in case another cog is in, or about to be in, a debug ISR, needing its own marching orders from the host.
The point is, there would be no central debug app in the P2. Each cog would negotiate its own interaction with the host on each debug ISR event. A LOCK is used to share the host connection among all cogs in debug ISRs. The host must juggle conversations with the cogs. In case a cog gets restarted, the host would get a just-started debug message from that cog, indicating that any previous conversation in progress with that cog is now superseded by a restart. It's pretty simple, really.
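A rough sketch of that host-side juggling, purely to illustrate the idea (the message names and replies here are hypothetical, not the actual debug protocol):

# Hypothetical sketch: the host holds one independent conversation per cog.
# A "just-started" message from a cog supersedes whatever conversation was in progress.

class CogConversation:
    def __init__(self, cog_id):
        self.cog_id = cog_id
        self.state = "idle"

    def on_message(self, msg):
        if msg == "just-started":          # cog was (re)started: reset this conversation
            self.state = "idle"
            return "send-registers"
        if self.state == "waiting-for-user":
            return "ask-again-in-100ms"    # "Ask me in another 100ms."
        self.state = "waiting-for-user"
        return "send-registers"            # "give me these registers and RAM data"

conversations = {cog: CogConversation(cog) for cog in range(8)}

def handle(cog_id, msg):
    # Each debug ISR event is routed to that cog's own conversation state.
    return conversations[cog_id].on_message(msg)

print(handle(3, "just-started"))   # cog 3 restarted: any previous dialogue is superseded
print(handle(5, "breakpoint"))     # cog 5's conversation proceeds independently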
To a non-expert observer, that sounds potentially like PropScope on steroids.
Am I correct in my understanding?
It could be.
To summarize: the intention of the debug scheme is that the host system would have as many concurrent, but time-sliced, conversations going on as there are cogs executing code in the P2.
The host has to be able to hold eight (or even 16, if we had a 16-cog chip) distinct conversations simultaneously.
Each cog must handle its own dialogue with the host from within its debug ISRs.
This will be a lot of fun to get working. The debug software in the cog will exist in a 16KB write-protected (but writable from within debug ISRs) block at $FC000. It really only needs to do several things, which are not complex. The host-side software can become much more complicated, as it needs to tie the debug information back to the source code, which may contain debugging meta-commands. The host-side software, in its simplest form, could just be an unassuming hexadecimal debugger with disassembler. There are so many possibilities. It's fun to think about.
My long-term goal is to get Spin2 working, along with such a debugger, so that in a split-second after hitting a "run" key, your code is running in the chip, reporting live debug data.
To help with all this traffic, I have some fresh numbers from the EFM8UB3 serial code I've been working on with Peter for the P2D2 & P2D2Pi.
The Rx buffer is now up to 1280 bytes, which means it can receive up to a 1792-byte continual burst of data from the P2 at the 8MBaud 8.n.2 setting (that takes 2.464ms).
Above that, the lower average USB speed of just over 3MBd needs pauses in the data stream, but they can be timed. The burst limit is somewhat greater than the 1280-byte buffer limit, because USB sends whilst the buffer is filling.
This also means anyone wanting a real-time pass-tag in code can set up a Smart Pin cell to send a 3 x 8.n.2 = 32-bit frame, and write once to that on the fly.
That 3-byte tag sends in 4us, so you can pass-tag any two points in code down to as close as 4us apart, without needing to pause for Tx.Done.
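Those figures check out against the framing described (a quick arithmetic check, using only the numbers quoted above):

# 8.n.2 framing = 1 start + 8 data + 2 stop = 11 bits per byte on the wire;
# the trace-tag goes out as a single 32-bit smart-pin frame.
baud = 8_000_000

burst_ms = 1792 * 11 / baud * 1e3
tag_us   = 32 / baud * 1e6

print(f"burst: {burst_ms:.3f} ms")   # 2.464 ms for the 1792-byte continual burst
print(f"tag:   {tag_us:.1f} us")     # 4.0 us for the 3-byte (32-bit frame) tag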
Attached file shows this RX Burst in action. (Host is a FT232H, which manages continual 8MBd no problems)
Blue trace is Rx Overrun error, with RXData visible as crosstalk for the 2.464ms
Red trace is USB Block move, for each 64 byte block, with a slight change in width visible between RX-Active and Rx-Done sections.
1280 bytes would be fine, as these exchanges are going to be very brief.
These things have a habit of growing, so I bumped the original 128 bytes to 512 and then to 1280.
A trace-tag of up to 3 bytes can be very brief in time indeed, just the time to write to a smart pin.
It may be useful to add some language support for that type of in-line tag? Then the PC side could do some profiling.
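As one way that PC-side profiling might look (a sketch only; it assumes the host timestamps each tag as it comes off the serial link, so its resolution depends on host buffering rather than the 4us emission spacing):

# Sketch: crude PC-side profiling between two tagged points in the code.
import time

log = []   # (timestamp, tag_id) pairs, appended as tags arrive off the link

def on_tag(tag_id):
    log.append((time.perf_counter(), tag_id))

def interval(tag_a, tag_b):
    """Seconds from the most recent tag_a to the first tag_b that follows it."""
    t_a = None
    for t, tag in log:
        if tag == tag_a:
            t_a = t
        elif tag == tag_b and t_a is not None:
            return t - t_a
    return None

# Feed on_tag() as tags arrive, then query e.g. interval('A', 'B').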
When a debug interrupt occurs, 16 longs of cog RAM ($0-$F) are copied to hub RAM.
Then the ISR stub (also 16 longs) is copied from hub RAM to cog RAM.
Then you do your thing, and on exit the original 16 longs are copied back to cog RAM.
This has to be factored into any time calculation too.
Yes, there are multiple tools in the debug toolbox.
Full single-step debug gives good visibility, but it pauses the user code whilst the debug code runs.
The P2 has an edge here, in that the other cogs are not paused, whereas a conventional microcontroller pauses its single core.
Simple printf-style emits add the overhead of a library call and the time to send the message, but they are easy to understand.
The trace-tag approach I mention is similar to printf, in that it is source-code in-line, but the overhead is much lower.
Also, there is no debug break to service, and the very small tag packet means there is no send-delay wait either. It can make great use of the P2's 32-bit serial ability.
It's almost as low impact as a pin-toggle debug entry, but does not need a scope/LA connected to get useful information.
I have used an FT2232H @ 12Mbits quite reliably with the P2.
Even dabbled with sync mode on the same device @ 48Mbits!
With smartpins, this is eezy peezy.