SKIPF and SKIP
Special SKIPF Branching Rules
From the manual...
Within SKIPF sequences where CALL/CALLPA/CALLPB are used to execute subroutines in which skipping will be suspended until after RET, all CALL/CALLPA/CALLPB immediate branch addresses must be absolute in cases where the instruction after the CALL/CALLPA/CALLPB might be skipped. This is not possible for CALLPA/CALLPB but CALL can use '#\address' syntax to achieve absolute immediate addressing. CALL/CALLPA/CALLPB can all use registers as branch addresses, since they are absolute.
For non-CALL/CALLPA/CALLPB branches within SKIPF sequences, SKIPF will work through all immediate-relative branches, which are the default for immediate branches within cog/LUT memory. If an absolute-address branch is being used (#\label, register, or RET, for example), you must not skip the first instruction after the branch. This is not a problem with immediate-relative branches, however, since the variable PC stepping works to advantage, by landing the PC at the first instruction of interest at, or beyond, the branch address.
Today I was testing to see if I could nest subroutines while keeping the skip in place for return.
Here is what I found (only tested in COG)...
* SKIPF fails if the CALL is relative and the next instruction is to be skipped
* SKIP works correctly even if the call is relative (at least my test did)
* SKIPF and SKIP both work correctly if the call is absolute (ie #\label)
* When it works, 2 level nesting works (ie the CALLed routine makes another CALL).
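To make the rule concrete, here is a minimal untested sketch. It is not the code from the test above. The labels seq and sub and the registers a and b are made up for illustration; only SKIPF, CALL with '#\', and RET come from the manual text.

```
' Hypothetical sketch: the instruction after the CALL is skipped,
' so the CALL uses an absolute address (#\sub).
seq     skipf   #%0100          'bit 0 = first instruction after SKIPF; bit 2 set = skip the third
        mov     a, #1           'bit 0 clear: executes
        call    #\sub           'bit 1 clear: executes; skipping is suspended until sub returns
        mov     a, #2           'bit 2 set: skipped after the return
        mov     b, a            'bit 3 clear: executes
        ret

sub     add     a, #10          'runs normally, no skipping inside the subroutine
        ret

a       res     1               'hypothetical scratch registers
b       res     1
```

Going by the findings above, changing #\sub to the default relative #sub would be the failing case under SKIPF, while plain SKIP tolerated it in that test.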
I think I've found a useful trick when using PTRA/B operations. It's a little specialised but I'm sure it can be repurposed in other ways. The trick is that the POP'd C/Z flags held in the PTRA register are preserved while PTRA is being used as a pointer.
'===============================================
'Emit string from immediate code in hubRAM
'  input:  (hardware call stack) - hubRAM address of string
' result:  (none)
'scratch:  pb, temp1
'
putsi
        mov     temp1, ptra             'preserve existing PTRA
        pop     ptra                    'address of immediate data following the CALL (includes the calling C/Z flags)
.loop
        rdbyte  pb, ptra++      wz      'get next character, Z sets with null termination
if_nz   call    #putch                  'emit character
if_nz   jmp     #.loop
        push    ptra                    'update return address to instruction following the null character
        mov     ptra, temp1             'restore prior PTRA
        ret                     wcz     'calling C/Z preserved
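A call site for putsi might look something like the sketch below. This is an illustration, not code from the post: it assumes the caller runs from hubRAM (hubexec), where as far as I know instructions do not need to be long-aligned, so execution can resume at whatever byte address follows the null. The string and the instruction after it are hypothetical.

```
' Hypothetical call site, assumed to be executing from hubRAM
        call    #putsi
        byte    "Hello, world", 13, 10, 0       'string sits inline after the CALL; the null marks its end
        mov     pa, #0                          'hypothetical next instruction; putsi returns here
```

Because putsi pushes the updated PTRA back as the return address and ends with RET WCZ, the caller should see its own C/Z flags unchanged after the string is emitted.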
I've been getting myself into trouble with concurrent cordic ops. It's quite cool firing one off and coming back later to collect the results ... but if, for example, I add in some debug type code, I find I'm breaking things too easily now because all my decimal printing uses the cordic divide operation.
So my first step to tidying this up a little is to at least make the printing routines themselves reliable in this scenario. The trick here is how to know you are getting the newest result, the one from print's QDIV operation. A little experimenting later and two instructions do it, eg:
emitclkfrq
        qdiv    clk_freq, ##1_000_000
        pollqmt                         'clear old event
.flushloop
        getqx   pa                      'MHz whole number - at final pipeline result
        jnqmt   #.flushloop             'wait for QMT flag - CORDIC pipeline flushed
        getqy   temp2                   'six decimal places
        ...
EDIT: PS: I fixed my problem. It was a bug: I wasn't clearing the event flag before using it. I keep forgetting that the things that set the event flags don't reset them.
EDIT2: It's in contrast to the straight-through code that assumes the pipeline is empty prior to the routine call, which would be coded like this instead:
emitclkfrq
        qdiv    clk_freq, ##1_000_000
        getqx   pa                      'MHz whole number
        getqy   temp2                   'six decimal places
        ...
My first attempt was to check before use, but that immediately annoyed me as bloaty code. eg:
emitclkfrq
        pollqmt                         'clear old event
.flushloop
        getqx   inb
        jnqmt   #.flushloop             'wait for QMT flag - CORDIC pipeline flushed
        qdiv    clk_freq, ##1_000_000
        getqx   pa                      'MHz whole number - at final pipeline result
        getqy   temp2                   'six decimal places
        ...
Interesting. It works because PTRA++ only increments the lower 20 bits, so the upper bits remain unchanged.
Certainly a nice way to pass parameters.
A trap with the smartpin pulse out modes: This applies to pulse %00100 and transition %00101 out modes at least. Presumably also applies to all DAC, NCO and PWM modes as well. It really only affects pulse and transition modes though because they have an end count of pulses.
The "base period" is a metronomic clock from when the smartpin mode is first configured. This stays actively ticking within the smartpin even if the smartpin is not generating pulses. EDIT: What this means is that when WYPIN issues more pulses to generate, the smartpin is not instruction aligned but rather will start the pulse generation at the beginning of the next base period.
Most of the time this detail can be ignored. But I've been playing around with aligning a streamer bursting of SPI data out to coincide with a smartpin emulating a SPI clock. This means, because of the base period effect, the SPI clock pin will then have an unpleasant alignment dither with respect to the SPI data pin if the smartpin is not reconfigured for each burst. A disable/enable combo is not enough.
PS: It may be possible to give the streamer the same "base period" and use XCONT instead of XINIT for each burst to duplicate the smartpin's behaviour. Not something I've tried out yet ...
PPS: Correction: Along with a compensation, clearing out the chaff allowed a DIRL+DIRH combo on the SPI clock smartpin to do the job. XCONT wasn't the answer.
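As a rough illustration of the DIRL+DIRH combo from the PPS, something like the following could precede each burst. It is an untested sketch: spi_clk (the clock pin number) and clocks (the pulse count register) are hypothetical names, the smartpin is assumed to be already configured with WRPIN/WXPIN, and the streamer setup and the timing compensation mentioned above are not shown.

```
' Hypothetical sketch: restart the smartpin before each burst
        dirl    #spi_clk                'DIR low holds the smartpin in reset
        dirh    #spi_clk                're-enable; the base period starts ticking from here
        wypin   clocks, #spi_clk        'queue the pulses for this burst
```

The idea is that the base period then has a fixed relationship to the instructions that follow, rather than to whenever the pin was first configured.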
I removed all code in the ... part to rule anything else out.
This works fine:
        mov     pb, ptrb                'make a copy to preserve things
        getnib  a, pb, #5               'extract pin group
This does not:
        mov     ptra, ptrb              'make a copy to preserve things
        getnib  a, ptra, #5             'extract pin group
Neither does this:
        getnib  a, ptrb, #5             'extract pin group
The next time ptra gets accessed later in my code it is overwritten with a new value anyway, so leaving residual data in it is not causing problems. And it doesn't have to be the pb register that somehow inadvertently makes it work; other general registers work too. It just seems that using ptra or ptrb doesn't work here for getting upper nibbles; somehow the upper bits get lost. I thought these registers were meant to still be 32 bits.
ps. I am executing this code from LUT RAM in case that could possibly make any difference...?
revB at the moment. After earlier confusions with revA vs revB vs FPGA I have it list a few crucial detected parameters on each run. First text emitted of all recent runs:
Total smartpins = 64 1111111111111111111111111111111111111111111111111111111111111111
Rev B silicon. Sysclock 4.0000 MHz
I am running P2ASM and I have been overclocking somewhat in the 252-308MHz range. I'll check the P2ASM output to make sure it is not generating bad opcodes.
Yes, I'm using fastspin almost exclusively these days. I tested mine up to 395 MHz without issue. No issue with the data values at 400 MHz but it does crash as expected on repeated runs.
I think this bug has been in p2asm from the beginning. If the source is ptra or ptrb p2asm will generate the pointer encoding instead of just using the pointer cog memory location. This affects getnib, rolnib, getbyte, rolbyte, getword and rolword. I'll fix it in GitHub in the next few minutes.
EDIT: This is now fixed in version 0.017.
Blast! I wish I had read this thread before releasing the latest version of Catalina!
Is anyone supporting programming of the boot Flash EEPROM on board the Eval boards in their tools? Cluso, Chip, and Peter, I think, worked out a pinout convention for having both SD and SPI bootable components on the same four pins, P58-61. Chip has them documented in the prop2 doc. I presume Peter also uses the same pinout for P2D2 boards.
While the P2 can boot from Serial/FLASH/SD I am not aware if any downloaders are capable of writing to FLASH or SD currently.
Perhaps the download authors can comment please ???
loadp2 doesn't currently support programming the flash, but if there's some stand-alone code for programming the flash it should be fairly straightforward to incorporate that.
Cluso, were any of your ROM based SD init & write sector routines made available in a callable manner? If so it might be more straightforward to load and run some very small SD downloader PASM into the P2 somewhat like it did with its MainLoader1.spin that can access these routines and then we can write directly to a file, instead of developing an entire SD handling object first before that will be possible.
I know we can yank the SD card and write it in a PC etc, but on the P2-EVAL getting the microSD in and out becomes a chore fast and is not that ideal during development. I think there are some extender cards available that would help with that.
Catalina has a command that can program any .bin file into the FLASH RAM on the P2 EVAL board. It uses a version of the Flash_Loader_1.2 by ozpropdev.
See the command "flash_payload"
Good to hear. That should make it easy to integrate what I've done with speeding up the booting loader code.
I did also rework Brian's low-level serial programming routines but all that can be ignored. It was from when I was trying to figure out why nothing was working on the revB Eval board. Turns out I had a non-soldered CS pin on the Flash chip.
Follow down left column as code executes...
So from Reset, the code tests for PU on P59, and if yes, Try will "try Serial", and if it fails it will go on to "try FLASH" in the first column
Next "try FLASH", the code tests for PU on P61, and if yes, Try will "try FLASH", and if successful will load/run FLASH, else will go on to "try SD" in the first column
Next "try SD", the code tests for PU on P60, and if yes, Try will "try SD", and if successful will load/run SD, else will go on to "try Serial" in the first column
Next, "try Serial", the code tests for PD on P59, and if yes, Try will "STOP", else will wait for "SERIAL" (after timeout will STOP IIRC)
Certainly could be expressed better - I did it in a hurry to see what pullups and pulldowns I need on a pcb.
Hmm, I'll add that Chip has six boot combinations listed in the google doc and, of the four boot lines you've got there, the first two, "Reset..." and "try FLASH..." are definitely wrong.
P59 PD was specifically to disable serial boot altogether. So saying STOP is appropriate in this case.
Boot ROM code execution is in this order:
P59 PU - TRY SERIAL
P61 PU - TRY FLASH
P60 PU - TRY SD
P59 PD - IGNORE SERIAL
To force it to check serial first before anything else after reset you would add a pull-up to P59. ( Cluso99's "flow" table is correct )
If I wanted it to always check Flash first then I would have a pull-up on the Flash CS
If an SD card is inserted then it is detected as a pull-up on its CS = P60, and initializing the card can take many hundreds of milliseconds. (don't use an external PU)
So with a Flash in the system you would tend to have a PU on its CS so it would get checked first.
If the Flash didn't have valid boot code then it would check the SD card as long as there is one inserted.
Finally it would check serial as long as there isn't a PD there.
Comments
Here is an extract of the code I used. Using absolute addressing works.
When you copy ptrb to ptra the upper bits in ptra are somehow lost/trashed. This code fails: which behaves differently to this code below, which works.
The snipped ... code in the middle is innocuous and doesn't ever access ptra.
Not seeing that here. Here's my test code:
and output:
Might be worth checking compiler output.
I'm running Pnut and I think evan runs fastspin?
IIRC you're running P2ASM?
Seems this is a bug in P2ASM. @"Dave Hein" are you still doing bug fixes? Actually I am running v0.016. I'd better check I'm up to the latest.
Update: Yes, I think it is the latest version on github
https://github.com/davehein/p2gcc/blob/master/p2asm_src/p2asm.c
Serves me right for not keeping up to date
PS: I've had very good success in tuning up Brian's demo code to make the booting very fast even for large binaries - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866
On the second page I had been reworking the low level reads for booting to get max loading speed - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866
Oh, that's not working for me - https://forums.parallax.com/discussion/comment/1482105/#Comment_1482105
You've got three types of "try"s. A column, 3 lines, and individual grid entries. What's the differences?