Propeller 2 debug specification
ozpropdev
I'm finalizing testing of my Prop2 debugger and have a few thoughts.
Currently I'm using PNut as my primary compiler and extracting debug data from
the .lst file generated by the Ctrl-M function.
Consider the following test code:
con
        xtal    = 20_000_000

dat
        org
start   nop
        orgf    12
label1  nop
.loop   nop
label2  nop
.loop   nop
myreg   res     1

'hub only addresses
        orgh
mylong  long    1
myword  word    1
mybyte  byte    1
PNut produces the following debug data:
TYPE: 4B VALUE: 01312D00 NAME: XTAL
TYPE: 56 VALUE: 00000000 NAME: START
TYPE: 56 VALUE: 00C00030 NAME: LABEL1
TYPE: 56 VALUE: 00D00034 NAME: LOOP'0002
TYPE: 56 VALUE: 00E00038 NAME: LABEL2
TYPE: 56 VALUE: 00F0003C NAME: LOOP'0003
TYPE: 57 VALUE: 01000040 NAME: MYREG
TYPE: 56 VALUE: FFF00040 NAME: MYLONG
TYPE: 55 VALUE: FFF00044 NAME: MYWORD
TYPE: 54 VALUE: FFF00046 NAME: MYBYTE
Looking at the output a bit more closely, we find:
Type 4B represents a constant
Type 56 represents a "LONG"
Type 55 represents a "WORD"
Type 54 represents a "BYTE"
Type 57 represents a "RES" location
The format of the "VALUE" field is either a combination of cog and hub address, or the constant value:
VALUE = cog_address << 20 | hub_address
Note that addresses defined under an ORGH directive return a cog address of FFF.
NAME is self-explanatory.
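As a rough host-side illustration, here is a minimal sketch in Python of how those entries could be decoded, using the VALUE packing described above (decode_entry and the returned dictionary are just illustrative names, not anything PNut provides):

# Minimal sketch: decode PNut debug entries of the form
#   TYPE: 56 VALUE: 00C00030 NAME: LABEL1
# Field meanings are taken from the description above; everything else is illustrative.

TYPE_NAMES = {0x4B: "constant", 0x54: "BYTE", 0x55: "WORD", 0x56: "LONG", 0x57: "RES"}

def decode_entry(line):
    parts = line.split()          # ['TYPE:', '56', 'VALUE:', '00C00030', 'NAME:', 'LABEL1']
    sym_type = int(parts[1], 16)
    value    = int(parts[3], 16)
    name     = parts[5]
    if sym_type == 0x4B:                       # constant: VALUE is the value itself
        return name, "constant", {"value": value}
    info = {"hub": value & 0xFFFFF}            # low 20 bits = hub address
    cog  = value >> 20                         # top 12 bits = cog address
    if cog != 0xFFF:                           # FFF = defined under ORGH (hub-only)
        info["cog"] = cog
    return name, TYPE_NAMES.get(sym_type, hex(sym_type)), info

print(decode_entry("TYPE: 56 VALUE: 00C00030 NAME: LABEL1"))
# ('LABEL1', 'LONG', {'hub': 48, 'cog': 12})   i.e. hub $00030, cog $00C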
While this information has been very useful, I believe we need to expand it further.
A problem arises when you have multiple labels representing the same cog/lut address.
What is needed is a means of separating definitions into groups/blocks for each cog.
Maybe something like:
CON 'adc_cog'
DAT 'adc_cog'
.
.
CON 'hdmi'
DAT 'hdmi'

If no name is included with the CON/DAT directive, the constants/labels are global.
Or maybe a "BLOCK" or "NAME" directive?
BLOCK "adc_cog"
So we end up with something like:
TYPE: 4B VALUE: 01312D00 BLOCK: ADC_COG NAME: XTAL
TYPE: 56 VALUE: 00000000 BLOCK: ADC_COG NAME: START

This way the user can select the correct labels to match the cog being debugged.
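On the host side, the proposed BLOCK field would let the debugger keep one symbol table per block and show only the set matching the cog under debug. A minimal sketch of that grouping, assuming the proposed entry format above (purely illustrative):

# Sketch: group BLOCK-tagged debug entries so the debugger can show
# only the symbols belonging to the cog being debugged.
from collections import defaultdict

def group_by_block(lines):
    blocks = defaultdict(dict)
    for line in lines:
        parts = line.split()
        value = int(parts[3], 16)   # VALUE field
        block = parts[5]            # text after 'BLOCK:'
        name  = parts[7]            # text after 'NAME:'
        blocks[block][name] = value
    return blocks

entries = [
    "TYPE: 4B VALUE: 01312D00 BLOCK: ADC_COG NAME: XTAL",
    "TYPE: 56 VALUE: 00000000 BLOCK: ADC_COG NAME: START",
]
print(group_by_block(entries)["ADC_COG"])   # {'XTAL': 20000000, 'START': 0}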
Thoughts?
Comments
Your solution doesn't handle local names such as ".loop", which are quite common. Does this matter?
PNut already deals with multiple ".label" names in the same code space.
As the sample code above shows, the two ".loop" labels have a unique number appended.
You have to know what cog is going to run your code in order to set the appropriate debug enable bits with hubset.
So you can't use coginit #16,#address anyway.
No. For a system-level debug, you enable debugging on every cog and trap every COGINIT. That's why the upper memory protection is there and there's the requirement that only a cog in a debug ISR can asynchronously breakpoint another cog. It's like hypnosis, where the user app doesn't know these things are going on, and the debug app is untouchable.
Ok, I hadn't thought of enabling ALL cogs in debug mode.
Enabling all now....
The hardware handles that for the debug interrupt, but there is a need for some reporting convention so that a debug ISR can alert the greater context to its status. That would involve using COGID to calculate its status-reporting address, which is a convention that the greater debugger software must define.
Ah, you need to have debugger stub code ready for ALL cogs, up at the top of memory. You don't know what to expect, so you must be ready for anything.
You could, but that gets complicated, as special setup is needed. Better to debug the entire app, and when a cog program of interest is identified on COGINIT, do special debug handling for it, while ignoring the others. Or, maintain awareness of ALL cogs and allow everything to be debugged. Lots of possibilities, but probably best to keep the door open to all of them.
You don't know, because of COGNEW, what cog will be running what code. So, you need to trap everything, like Checkpoint Charley, and look at their papers, which may not even make sense if they are running code that was built at runtime. In that case, you may not be able to offer symbolic debugging.
To make it most useful, you need to add context awareness, where you look at PTRB when a cog starts up, and try to match it up to source code, so that you can display things sensibly. That's going to be most of the work. The debugger in the upper 16KB is pretty finite. It's the host platform that needs to figure out the relationships back to the source code and provide the user interface.
I'm very close to having it all running, Chip.
I can already do symbolic debugging of a single cog, including XBYTE with Spin2 symbols.
I just broke things with ALL debug, though. Fixing it now.
It would be so awesome to have a 4K monitor loaded with debug info, animating as things run. Being able to watch variables by name would be so neat. Also, automatic graphing would be nice. The more the programmer can see into his app, the better he can develop it.
Imagine debugging commands that could be written right into the source code. They would generate no code, but just instruct the debugger on what to display and how to show it. So many possibilities, but the debugger software that runs in the top of hub RAM only needs to do several things to make all this possible. There's not much to it. It's the development platform that must orchestrate all the source-to-debugger dialogue.
The LOCK functionality was changed so that if the LOCK-holding cog is stopped (or restarted), that LOCK is freed and up-for-grabs again. This was done so that the debugging system could use, say, LOCK #15 (since it is the most remote and would be LOCKNEW'd ahead of time, anyway) as a permission to use the host connection.
Each time a cog were to go into a debug ISR, it would wait to get that LOCK, so that it could exclusively use the serial connection on P63/P62 to ask the host what it wanted to do about that debug interrupt. The host might say, "give me these registers and these RAM data, then continue executing." Or, the host might know that the user must now be polled for a response, so it says, "Ask me in another 100ms." After each communication, the cog in the debug ISR holding the LOCK would release it, in case another cog is in, or about to be in, a debug ISR, needing its own marching orders from the host.
The point is, there would be no central debug app in the P2. Each cog would negotiate its own interaction with the host on each debug ISR event. A LOCK is used to share the host connection among all cogs in debug ISRs. The host must juggle conversations with the cogs. In case a cog gets restarted, the host would get a just-started debug message from that cog, indicating that any previous conversation in progress with that cog is now superseded by a restart. It's pretty simple, really.
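A rough sketch of that host-side juggling, purely to illustrate the idea (the message names and replies here are hypothetical, not the actual debug protocol):

# Hypothetical sketch: the host holds one independent conversation per cog.
# A "just-started" message from a cog supersedes whatever conversation was in progress.

class CogConversation:
    def __init__(self, cog_id):
        self.cog_id = cog_id
        self.state = "idle"

    def on_message(self, msg):
        if msg == "just-started":          # cog was (re)started: reset this conversation
            self.state = "idle"
            return "send-registers"
        if self.state == "waiting-for-user":
            return "ask-again-in-100ms"    # "Ask me in another 100ms."
        self.state = "waiting-for-user"
        return "send-registers"            # "give me these registers and RAM data"

conversations = {cog: CogConversation(cog) for cog in range(8)}

def handle(cog_id, msg):
    # Each debug ISR event is routed to that cog's own conversation state.
    return conversations[cog_id].on_message(msg)

print(handle(3, "just-started"))   # cog 3 restarted: any previous dialogue is superseded
print(handle(5, "breakpoint"))     # cog 5's conversation proceeds independently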
To a non-expert observer, that sounds potentially like PropScope on steroids.
Am I correct in my understanding?
It could be.
To summarize: the intention of the debug scheme is that the host system would have as many concurrent, but time-sliced, conversations going on as there are cogs executing code in the P2.
The host has to be able to hold eight (or even 16, if we had a 16-cog chip) distinct conversations simultaneously.
Each cog must handle its own dialogue with the host from within its debug ISRs.
This will be a lot of fun to get working. The debug software in the cog will exist in a 16KB write-protected (but writable from within debug ISRs) block at $FC000. It really only needs to do several things, which are not complex. The host-side software can become much more complicated, as it needs to tie the debug information back to the source code, which may contain debugging meta-commands. The host-side software, in its simplest form, could just be an unassuming hexadecimal debugger with disassembler. There are so many possibilities. It's fun to think about.
My long-term goal is to get Spin2 working, along with such a debugger, so that in a split-second after hitting a "run" key, your code is running in the chip, reporting live debug data.
To help with all this traffic, I have some fresh numbers from the EFM8UB3 serial code I've been working on with Peter for the P2D2 & P2D2Pi.
The Rx buffer is now up to 1280 bytes, which means it can receive up to a 1792-byte continual burst of data from the P2 at the 8MBaud 8.n.2 setting (that takes 2.464ms).
Above that, the lower average USB speed of just over 3MBd needs pauses in the data stream, but they can be timed. The burst limit is somewhat greater than the 1280-byte buffer limit, because USB sends whilst the buffer is filling.
This also means anyone wanting a real-time pass-tag in code can set up a Smart Pin cell to send a 3 x 8.n.2 = 32-bit frame, and write once to that on the fly.
That 3-byte tag sends in 4us, so you can pass-tag any two points in code down to as close as 4us apart, without needing to pause for Tx.Done.
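Those figures check out against the framing described (a quick arithmetic check, using only the numbers quoted above):

# 8.n.2 framing = 1 start + 8 data + 2 stop = 11 bits per byte on the wire;
# the trace-tag goes out as a single 32-bit smart-pin frame.
baud = 8_000_000

burst_ms = 1792 * 11 / baud * 1e3
tag_us   = 32 / baud * 1e6

print(f"burst: {burst_ms:.3f} ms")   # 2.464 ms for the 1792-byte continual burst
print(f"tag:   {tag_us:.1f} us")     # 4.0 us for the 3-byte (32-bit frame) tag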
Attached file shows this RX Burst in action. (Host is a FT232H, which manages continual 8MBd no problems)
Blue trace is Rx Overrun error, with RXData visible as crosstalk for the 2.464ms
Red trace is USB Block move, for each 64 byte block, with a slight change in width visible between RX-Active and Rx-Done sections.
1280 bytes would be fine, as these exchanges are going to be very brief.
These things have a habit of growing, so I bumped the original 128 bytes to 512 and then to 1280.
A trace-tag of up to 3 bytes can be very brief in time indeed, just the time to write to a smart pin.
It may be useful to add some language support for that type of in-line tag? Then the PC side could do some profiling.
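As one way that PC-side profiling might look (a sketch only; it assumes the host timestamps each tag as it comes off the serial link, so its resolution depends on host buffering rather than the 4us emission spacing):

# Sketch: crude PC-side profiling between two tagged points in the code.
import time

log = []   # (timestamp, tag_id) pairs, appended as tags arrive off the link

def on_tag(tag_id):
    log.append((time.perf_counter(), tag_id))

def interval(tag_a, tag_b):
    """Seconds from the most recent tag_a to the first tag_b that follows it."""
    t_a = None
    for t, tag in log:
        if tag == tag_a:
            t_a = t
        elif tag == tag_b and t_a is not None:
            return t - t_a
    return None

# Feed on_tag() as tags arrive, then query e.g. interval('A', 'B').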
When a debug interrupt occurs, 16 longs of cog RAM ($0-$F) are copied to hub RAM.
Then the ISR stub (also 16 longs) is copied from hub RAM to cog RAM.
Then you do your thing, and on exit the original 16 longs are copied back to cog RAM.
This has to be factored into any time calculation too.
Yes, there are multiple tools in the debug toolbox.
Full single-step debug gives good visibility, but it pauses the user code whilst the debug code runs.
The P2 has an edge here, in that the other cogs are not paused, whereas a conventional microcontroller pauses its single core.
Simple printf-style emits add the overhead of a library call and the time to send the message, but they are easy to understand.
The trace-tag approach I mention is similar to printf, in that it is source-code in-line, but the overhead is much lower.
Also, there is no debug break to service, and the very small tag packet means there is no send-delay wait either. It can make great use of the P2's 32-bit serial ability.
It's almost as low impact as a pin-toggle debug entry, but does not need a scope/LA connected to get useful information.
I have used an FT2232H @ 12Mbits quite reliably with the P2.
Even dabbled with sync mode on the same device @ 48Mbits!
With smartpins, this is eezy peezy.