SKIPF and SKIP
Special SKIPF Branching Rules
From the manual...
Within SKIPF sequences where CALL/CALLPA/CALLPB are used to execute subroutines in which skipping will be suspended until after RET, all CALL/CALLPA/CALLPB immediate branch addresses must be absolute in cases where the instruction after the CALL/CALLPA/CALLPB might be skipped. This is not possible for CALLPA/CALLPB but CALL can use '#\address' syntax to achieve absolute immediate addressing. CALL/CALLPA/CALLPB can all use registers as branch addresses, since they are absolute.
For non-CALL\CALLPA\CALLPB branches within SKIPF sequences, SKIPF will work through all immediate-relative branches, which are the default for immediate branches within cog/LUT memory. If an absolute-address branch is being used (#\label, register, or RET, for example), you must not skip the first instruction after the branch. This is not a problem with immediate-relative branches, however, since the variable PC stepping works to advantage, by landing the PC at the first instruction of interest at, or beyond, the branch address.
Today I was testing to see if I could nest subroutines while keeping the skip in place for return.
Here is what I found (only tested in COG)...
* SKIPF fails if the CALL is relative and the next instruction is to be skipped
* SKIP works correctly even if the call is relative (at least my test did)
* SKIPF and SKIP both work correctly if the call is absolute (ie #\label)
* When it works, 2-level nesting works (i.e. the CALLed routine makes another CALL).
Here is an extract of the code I used
000d8 036 00 C0 07 F6 |         mov     lmm_x, #0
000dc 037 32 18 64 FD |         skipf   #%0000_1100             ' SKIPF result is $0000_0C31 - WRONG!!!
000e0 038             | '       skip    #%0000_1100             ' SKIP result is $0000_FE31 - correct
000e0 038 01 C0 47 F5 |         or      lmm_x, #%0000_0001      ' xxxx xxxx xxx0
000e4 039 1C 00 B0 FD |         call    #sr1                    ' xxxx xxxx xx0x
000e8 03a 04 C0 47 F5 |         or      lmm_x, #%0000_0100      ' xxxx xxxx x1xx skip
000ec 03b 08 C0 47 F5 |         or      lmm_x, #%0000_1000      ' xxxx xxxx 1xxx skip
000f0 03c 10 C0 47 F5 |         or      lmm_x, #%0001_0000      ' xxxx xxx0 xxxx
000f4 03d 20 C0 47 F5 |         or      lmm_x, #%0010_0000      ' xxxx xx0x xxxx
000f8 03e             |
000f8 03e 28 CB AF FD |         call    #_hubHex8
000fc 03f E4 CA AF FD |         call    #_hubTxCR
00100 040 78 CD 8F FD |         jmp     #_hubMonitor
00104 041             |
00104 041             | sr1
00104 041 1C 00 B0 FD |         call    #sr2                    '\ gets skipped if SKIPF and CALL #sr1 is relative
00108 042 01 00 00 FF
0010c 043 00 C0 47 F5 |         or      lmm_x, ##%0010_0000_0000 '/ gets skipped if SKIPF and CALL #sr1 is relative
00110 044 02 00 00 FF
00114 045 00 C0 47 F5 |         or      lmm_x, ##%0100_0000_0000
00118 046 04 00 00 FF
0011c 047 00 C0 47 F5 |         or      lmm_x, ##%1000_0000_0000
00120 048 2D 00 64 FD |         ret
00124 049             |
00124 049             | sr2     or      lmm_x, ##%0001_0000_0000_0000
00124 049 08 00 00 FF
00128 04a 00 C0 47 F5
0012c 04b 10 00 00 FF
00130 04c 00 C0 47 F5 |         or      lmm_x, ##%0010_0000_0000_0000
00134 04d 20 00 00 FF
00138 04e 00 C0 47 F5 |         or      lmm_x, ##%0100_0000_0000_0000
0013c 04f 40 00 00 FF
00140 050 00 C0 47 F5 |         or      lmm_x, ##%1000_0000_0000_0000
00144 051 2D 00 64 FD |         ret
00148 052             |
00148 052             | ' SKIPF result is $0000_0C31
00148 052             | ' SKIP result is $0000_FE31 - correct
Using absolute addressing works
000e4 039 41 00 A0 FD |         call    #\sr1                   ' xxxx xxxx xx0x 'SKIPF result is $0000_FE31 - correct
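As a cross-check of the two result values, here is a small Python sketch that ORs together the constants of every instruction that should execute. It is only an arithmetic model of the test listing, not a simulation of the P2; the names MAIN, SR1, SR2 and expected_result are made up for this illustration.

```python
# Hypothetical arithmetic model of the test listing (not a P2 simulator).
# Each entry is the constant ORed into lmm_x; None marks the CALL #sr1 slot.
MAIN = [0x0001, None, 0x0004, 0x0008, 0x0010, 0x0020]
SR1 = [0x0200, 0x0400, 0x0800]           # ORs in sr1, after its CALL #sr2
SR2 = [0x1000, 0x2000, 0x4000, 0x8000]   # ORs in sr2


def expected_result(skip_mask: int) -> int:
    """Value of lmm_x if skipping is suspended inside the CALLed routines."""
    result = 0
    for index, value in enumerate(MAIN):
        if (skip_mask >> index) & 1:
            continue                     # bit set: this instruction is skipped
        if value is None:                # the CALL runs sr1, which runs sr2
            for constant in SR2 + SR1:
                result |= constant
        else:
            result |= value
    return result


correct = expected_result(0b0000_1100)
# Observed failure: CALL #sr2 and the first OR in sr1 never took effect.
observed_wrong = correct & ~(0x0200 | 0xF000)
print(f"${correct:04X} ${observed_wrong:04X}")
```

The two printed values match the listing comments: the full result, and the result with sr2's four bits and the first sr1 bit missing.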
I think I've found a useful trick when using PTRA/B operations. It's a little specialised but I'm sure it can be repurposed in other ways. The trick is the POP'd C/Z flags within the PTRA register are preserved across its operational use.
'===============================================
'Emit string from immediate code in hubRAM
' input: (hardware call stack) - hubRAM address of string
' result: (none)
'scratch: pb, temp1
'
putsi
        mov     temp1, ptra             'preserve existing PTRA
        pop     ptra                    'address of immediate data following the CALL (includes the calling C/Z flags)
.loop
        rdbyte  pb, ptra++      wz      'get next character, Z sets with null termination
if_nz   call    #putch                  'emit character
if_nz   jmp     #.loop
        push    ptra                    'update return address to instruction following the null character
        mov     ptra, temp1             'restore prior PTRA
        ret                     wcz     'calling C/Z preserved
I've been getting myself in trouble with concurrent cordic ops. It's quite cool firing it off and coming back later to collect the results ... but, if for example I add in some debug type code, I find I'm breaking things too easy now because all my decimal printing is using the cordic divide operation.
So my first step to tidying this up a little is to at least make the printing routines themselves reliable in this scenario. The trick here is how to know you are getting the newest result - from print's QDIV operation. A little experimenting later and two instructions does it, eg:
emitclkfrq
        qdiv    clk_freq, ##1_000_000
        pollqmt                         'clear old event
.flushloop
        getqx   pa                      'MHz whole number - at final pipeline result
        jnqmt   #.flushloop             'wait for QMT flag - CORDIC pipeline flushed
        getqy   temp2                   'six decimal places
        ...
EDIT: PS: I fixed my problem. It was a bug: I wasn't clearing the event flag before using it. I keep forgetting that the things that set the event flags don't reset them.
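The stale-flag bug described in the EDIT can be illustrated with a tiny Python model of a sticky event flag. This is only a sketch of the general idea (a flag that latches until it is polled); StickyEvent and wait_iterations are invented names and nothing here models real CORDIC timing.

```python
class StickyEvent:
    """Hypothetical latched event flag: setting it never clears it."""

    def __init__(self) -> None:
        self.flag = False

    def trigger(self) -> None:
        self.flag = True              # latches until somebody polls

    def poll(self) -> bool:
        was_set = self.flag
        self.flag = False             # polling reads and clears the flag
        return was_set


def wait_iterations(event: StickyEvent, clear_first: bool, ticks_until_event: int) -> int:
    """Count loop passes until the flag is seen set."""
    if clear_first:
        event.poll()                  # discard any old event first
    iterations = 0
    while True:
        iterations += 1
        if iterations == ticks_until_event:
            event.trigger()           # the event we are really waiting for
        if event.poll():
            return iterations


stale = StickyEvent()
stale.trigger()                       # left over from earlier code
early_exit = wait_iterations(stale, clear_first=False, ticks_until_event=3)

stale.trigger()
proper_wait = wait_iterations(stale, clear_first=True, ticks_until_event=3)
print(early_exit, proper_wait)
```

Without the initial clear, the loop exits on its first pass because of the old event; with the clear, it waits for the new one.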
My first attempt was to check before use, but that immediately annoyed me as bloaty code. eg:
emitclkfrq
        pollqmt                         'clear old event
.flushloop
        getqx   inb
        jnqmt   #.flushloop             'wait for QMT flag - CORDIC pipeline flushed
        qdiv    clk_freq, ##1_000_000
        getqx   pa                      'MHz whole number - at final pipeline result
        getqy   temp2                   'six decimal places
        ...
Interesting. Because PTRA++ only increments the lower 20 bits, and the upper bits remain unchanged.
Certainly a nice way to pass parameters.
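To make the "upper bits remain unchanged" point concrete, here is a short Python sketch of a 32-bit value whose low 20 bits are an address and whose top bits carry the flags. The bit positions (C in bit 31, Z in bit 30) are an assumption made for this illustration, and pack_return and post_increment are hypothetical helper names.

```python
ADDRESS_MASK = 0xFFFFF        # low 20 bits hold the hub address
C_BIT = 1 << 31               # assumed position of the saved C flag
Z_BIT = 1 << 30               # assumed position of the saved Z flag


def pack_return(c: bool, z: bool, address: int) -> int:
    """Build a return-address long carrying the caller's flags."""
    return (C_BIT if c else 0) | (Z_BIT if z else 0) | (address & ADDRESS_MASK)


def post_increment(pointer: int, step: int = 1) -> int:
    """Advance only the low 20 bits, leaving the upper bits untouched."""
    upper = pointer & ~ADDRESS_MASK & 0xFFFFFFFF
    return upper | ((pointer + step) & ADDRESS_MASK)


pointer = pack_return(c=True, z=False, address=0x12345)
for _ in range(6):            # walk over a 5-character string plus its null
    pointer = post_increment(pointer)
print(hex(pointer))
```

Because the increment never carries out of bit 19, the flag bits survive the string walk and are still there when the value is pushed back for the final return.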
A trap with the smartpin pulse out modes: This applies to pulse %00100 and transition %00101 out modes at least. Presumably also applies to all DAC, NCO and PWM modes as well. It really only affects pulse and transition modes though because they have an end count of pulses.
The "base period" is a metronomic clock from when the smartpin mode is first configured. This stays actively ticking within the smartpin even if the smartpin is not generating pulses. EDIT: What this means is that when WYPIN issues more pulses to generate, the smartpin is not instruction aligned but rather will start the pulse generation at the beginning of the next base period.
Most of the time this detail can be ignored. But I've been playing around with aligning a streamer bursting of SPI data out to coincide with a smartpin emulating a SPI clock. This means, because of the base period effect, the SPI clock pin will then have an unpleasant alignment dither with respect to the SPI data pin if the smartpin is not reconfigured for each burst. A disable/enable combo is not enough.
PS: It may be possible to give the streamer the same "base period" and use XCONT instead of XINIT for each burst to duplicate the smartpin's behaviour. Not something I've tried out yet ...
PPS: Correction: Along with a compensation, clearing out the chaff allowed a DIRL+DIRH combo on the SPI clock smartpin to do the job. XCONT wasn't the answer.
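The alignment dither can be pictured with a little Python arithmetic. This is a sketch under the assumption that a request landing exactly on a base-period boundary starts immediately; pulse_start is a hypothetical helper and the tick numbers are made up.

```python
def pulse_start(request_time: int, base_period: int, config_time: int = 0) -> int:
    """Tick at which pulses begin: the next base-period boundary.

    Boundaries are counted from config_time, the moment the smartpin
    mode was configured.
    """
    elapsed = request_time - config_time
    periods = -(-elapsed // base_period)      # ceiling division
    return config_time + periods * base_period


requests = [100, 103, 107]                    # WYPIN issued at these ticks
delays = [pulse_start(t, base_period=8) - t for t in requests]
print(delays)

# Reconfiguring at the moment of the request restarts the base period,
# so the delay no longer depends on when the request happens.
realigned = [pulse_start(t, base_period=8, config_time=t) - t for t in requests]
print(realigned)
```

The first list varies from burst to burst, which is the dither; the second list is constant because the period is restarted each time.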
I removed all code in the ... part to rule anything else out.
This works fine:
        mov     pb, ptrb                'make a copy to preserve things
        getnib  a, pb, #5               'extract pin group
This does not:
        mov     ptra, ptrb              'make a copy to preserve things
        getnib  a, ptra, #5             'extract pin group
Neither does this:
        getnib  a, ptrb, #5             'extract pin group
The next time ptra is accessed later in my code it is overwritten with a new value anyway, so leaving residual data in it is not causing problems. And it doesn't have to be the pb register that somehow inadvertently makes it work; other general registers work too. It just seems that using ptra or ptrb doesn't work here for getting upper nibbles; somehow the upper bits get lost. I thought these registers were meant to still be 32 bits.
ps. I am executing this code from LUT RAM in case that could possibly make any difference...?
revB at the moment. After earlier confusions with revA vs revB vs FPGA I have it list a few crucial detected parameters on each run. First text emitted of all recent runs:
Total smartpins = 64
1111111111111111111111111111111111111111111111111111111111111111
Rev B silicon.  Sysclock 4.0000 MHz
I am running P2ASM and I have been overclocking somewhat in the 252-308MHz range. I'll check the P2ASM output to make sure it is not generating bad opcodes.
Yes, I'm using fastspin almost exclusively these days. I tested mine up to 395 MHz without issue. No issue with the data values at 400 MHz but it does crash as expected on repeated runs.
I think this bug has been in p2asm from the beginning. If the source is ptra or ptrb p2asm will generate the pointer encoding instead of just using the pointer cog memory location. This affects getnib, rolnib, getbyte, rolbyte, getword and rolword. I'll fix it in GitHub in the next few minutes.
EDIT: This is now fixed in version 0.017.
Is anyone supporting programming of the boot Flash EEPROM on board the Eval boards in their tools? Cluso, Chip, and Peter I think, worked out a pinout convention for having both SD and SPI bootable components on same four pins, P58-61. Chip has them documented in the prop2 doc. I presume Peter also uses same pinout for P2D2 boards.
While the P2 can boot from Serial/FLASH/SD I am not aware if any downloaders are capable of writing to FLASH or SD currently.
Perhaps the download authors can comment please ???
loadp2 doesn't currently support programming the flash, but if there's some stand-alone code for programming the flash it should be fairly straightforward to incorporate that.
Cluso, were any of your ROM based SD init & write sector routines made available in a callable manner? If so it might be more straightforward to load and run some very small SD downloader PASM into the P2 somewhat like it did with its MainLoader1.spin that can access these routines and then we can write directly to a file, instead of developing an entire SD handling object first before that will be possible.
I know we can yank the SD card and write it in a PC etc, but on the P2-EVAL getting the microSD in and out becomes a chore fast and is not that ideal during development. I think there are some extender cards available that would help with that.
Catalina has a command that can program any .bin file into the FLASH RAM on the P2 EVAL board. It uses a version of the Flash_Loader_1.2 by ozpropdev.
See the command "flash_payload"
Good to hear. That should make it easy to integrate what I've done with speeding up the booting loader code.
I did also rework Brian's low-level serial programming routines but all that can be ignored. It was from when I was trying to figure out why nothing was working on the revB Eval board. Turns out I had a non-soldered CS pin on the Flash chip.
Follow down left column as code executes...
So from Reset, the code tests for PU on P59 and, if present, tries Serial; if that fails it goes on to "try FLASH" in the first column.
Next, at "try FLASH", the code tests for PU on P61 and, if present, tries FLASH; if successful it loads/runs FLASH, else it goes on to "try SD" in the first column.
Next, at "try SD", the code tests for PU on P60 and, if present, tries SD; if successful it loads/runs SD, else it goes on to "try Serial" in the first column.
Next, at "try Serial", the code tests for PD on P59 and, if present, STOPs; else it waits for SERIAL (after a timeout it will STOP, IIRC).
Certainly could be expressed better - I did it in a hurry to see what pullups and pulldowns I need on a pcb.
Hmm, I'll add that Chip has six boot combinations listed in the google doc and, of the four boot lines you've got there, the first two, "Reset..." and "try FLASH..." are definitely wrong.
P59 PD was specifically to disable serial boot altogether. So saying STOP is appropriate in this case.
Boot ROM code execution is in this order:
P59 PU - TRY SERIAL
P61 PU - TRY FLASH
P60 PU - TRY SD
P59 PD - IGNORE SERIAL
To force it to check serial first before anything else after reset you would add a pull-up to P59. ( Cluso99's "flow" table is correct )
If I wanted it to always check Flash first then I would have a pull-up on the Flash CS
If an SD card is inserted then it is detected as a pull-up on its CS = P60, and initializing the card can take many hundreds of milliseconds. (Don't use an external PU.)
So with a Flash in the system you would tend to have a PU on its CS so it would get checked first.
If the Flash didn't have valid boot code then it would check the SD card as long as there is one inserted.
Finally it would check serial as long as there isn't a PD there.
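The order listed above can be written down as a small Python sketch. It models only the sequence as described in this post (which other posts in the thread partly dispute), not the actual boot ROM; boot_sequence, its parameters and the "PU"/"PD"/None pin values are all hypothetical.

```python
from typing import List, Optional


def boot_sequence(p59: Optional[str], p61: Optional[str], p60: Optional[str],
                  serial_ok: bool = False, flash_valid: bool = False,
                  sd_valid: bool = False) -> List[str]:
    """Return the steps taken; pins are "PU", "PD" or None (floating)."""
    steps = []
    if p59 == "PU":                       # P59 PU - TRY SERIAL
        steps.append("try serial")
        if serial_ok:
            return steps + ["run serial"]
    if p61 == "PU":                       # P61 PU - TRY FLASH
        steps.append("try flash")
        if flash_valid:
            return steps + ["run flash"]
    if p60 == "PU":                       # P60 PU - TRY SD
        steps.append("try sd")
        if sd_valid:
            return steps + ["run sd"]
    if p59 == "PD":                       # P59 PD - IGNORE SERIAL
        return steps + ["stop"]
    return steps + ["wait for serial"]


# Flash fitted with a pull-up on its CS, SD card inserted, P59 floating.
flash_first = boot_sequence(None, "PU", "PU", flash_valid=True)
fallback_sd = boot_sequence(None, "PU", "PU", sd_valid=True)
no_serial = boot_sequence("PD", None, None)
print(flash_first, fallback_sd, no_serial)
```

The second call shows the fallback described above: the Flash is checked first, and the SD card is only tried once the Flash turns out to hold no valid boot code.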
Comments
EDIT2: It's in contrast to the straight through code that assumes the pipeline is empty prior to routine call. Which would be coded like this instead:
emitclkfrq
        qdiv    clk_freq, ##1_000_000
        getqx   pa                      'MHz whole number
        getqy   temp2                   'six decimal places
        ...
When you copy ptrb to ptra the upper bits in ptra are somehow lost/trashed. This code fails:
        mov     ptra, ptrb              'make a copy to preserve things
        ...
        getnib  a, ptra, #5             'extract pin group
which behaves differently to this code below, which works.
        mov     pb, ptrb                'make a copy to preserve things
        ...
        getnib  a, pb, #5               'extract pin group
The snipped ... code in the middle is innocuous and doesn't ever access ptra.
Not seeing that here. Here's my test code:
        mov     bcdlen, #8
        mov     count, #10
.loop
        getrnd  ptrb
        mov     ptra, ptrb
        getnib  pa, ptra, #5
        call    #itoh
        call    #putsp
        mov     pa, ptra
        call    #itoh
        call    #putsp
        mov     pa, ptrb
        call    #itoh
        call    #putsp
        getnib  pa, ptrb, #5
        call    #itoh
        call    #putnl
        djnz    count, #.loop
        jmp     #$
and output:
00000003 aa30f2d5 aa30f2d5 00000003
00000006 c865965d c865965d 00000006
00000000 9704b86f 9704b86f 00000000
00000002 ac2d8dcd ac2d8dcd 00000002
0000000c bdc56ccd bdc56ccd 0000000c
0000000f a8fcdcb5 a8fcdcb5 0000000f
00000009 909f3ad5 909f3ad5 00000009
00000001 131cc72a 131cc72a 00000001
00000006 e9611c4d e9611c4d 00000006
0000000e 49e2fc62 49e2fc62 0000000e
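For anyone wanting to check that output by hand, nibble 5 is simply bits 23..20 of the 32-bit value. A minimal Python sketch (getnib here is just a stand-in function, not the instruction itself) reproduces the first and last columns from the middle ones:

```python
def getnib(value: int, n: int) -> int:
    """Extract nibble n (0 = least significant) from a 32-bit value."""
    return (value >> (4 * n)) & 0xF


# Random values taken from the middle columns of the output above.
samples = [0xAA30F2D5, 0xC865965D, 0x9704B86F, 0xAC2D8DCD, 0xBDC56CCD,
           0xA8FCDCB5, 0x909F3AD5, 0x131CC72A, 0xE9611C4D, 0x49E2FC62]
nibbles = [getnib(value, 5) for value in samples]
print([f"{n:08x}" for n in nibbles])
```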
Might be worth checking compiler output.
I'm running Pnut and I think evan runs fastspin?
IIRC you're running P2ASM?
Bad:
00910 303 f603f1f9         mov     ptra, ptrb      'make a copy to preserve things
0094c 312 f86f1500         getnib  a, ptra, #5     'extract pin group
Good:
00910 303 f603eff9         mov     pb, ptrb        'make a copy to preserve things
0094c 312 f86b15f7         getnib  a, pb, #5       'extract pin group
The S address in "getnib a, ptra, #5" looks a bit weird if it's $100. Seems bad and almost like it's using the GETNIB D form, but not quite.
Seems this is a bug in P2ASM. @"Dave Hein" are you still doing bug fixes? Actually I am running v0.016. I'd better check I'm up to the latest.
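The difference between the two GETNIB opcodes is easier to see with the fields pulled apart. The Python sketch below assumes the usual P2 layout of a 4-bit condition, an immediate flag in bit 18, D in bits 17..9 and S in bits 8..0, with the nibble number in bits 21..19 for GETNIB; decode is a hypothetical helper.

```python
def decode(opcode: int) -> dict:
    """Split a 32-bit opcode into fields (layout assumed, see above)."""
    return {
        "cond": opcode >> 28,
        "n": (opcode >> 19) & 0x7,        # nibble number (GETNIB only)
        "i": (opcode >> 18) & 0x1,        # 1 = S is an immediate/pointer form
        "d": (opcode >> 9) & 0x1FF,
        "s": opcode & 0x1FF,
    }


bad = decode(0xF86F1500)     # getnib a, ptra, #5 as emitted by p2asm 0.016
good = decode(0xF86B15F7)    # getnib a, pb, #5
for name, fields in (("bad", bad), ("good", good)):
    print(name, {key: hex(value) for key, value in fields.items()})
```

Both opcodes have the same D and nibble number; the bad one has the immediate bit set with S = $100, while the good one has it clear with S = $1F7, a plain register address.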
Update: Yes, I think it is the latest version on github
https://github.com/davehein/p2gcc/blob/master/p2asm_src/p2asm.c
Blast! I wish I had read this thread before releasing the latest version of Catalina!
Serves me right for not keeping up to date
PS: I've had very good success in tuning up Brian's demo code to make the booting very fast even for large binaries - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866
On the second page I had been reworking the low level reads for booting to get max loading speed - https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866
Oh, that's not working for me - https://forums.parallax.com/discussion/comment/1482105/#Comment_1482105
You've got three types of "try"s. A column, 3 lines, and individual grid entries. What's the differences?