Reading back through the docs, I now see the two-clock delay comment. For slaves that read on the rising edge, I suppose you could get down to sysclock/4 (so that the output is effectively written on the falling edge).
I think it's worse. While it takes two clocks for the smartpin to see the clock pin change, it also takes another two clocks for the shift out to appear at the sending data pin. I'd need to double check.
Yes, this technique would work for 1, 2, 4, 8, 16, and 32-bit widths.
I realized today it can also work for any size transfer. By setting the count in the streamer command to $FFFF (infinite), you could control the transfer size by the number of transitions expressed in D for the WYPIN instruction. You would wait for the cpin's IN to go high, indicating the clock transitions were finished. Then, do an XSTOP. Actually, there would be a few bits of overrun in that case. It would be better to record CT right before you begin the initiation sequence, then once begun, set up a WAITCT for the point in time two clocks before you will do an XSTOP to stop the streamer.
I looked into two-bit data mode for our flash chip, but the bits are reversed. D0 is above D1. So, you would have to swap even and odd bits, before or after the transfer. Or, you could just permit all bit pairs to be reversed in the flash memory. The data pins were arranged this way, so that if you connected up D2 and D3 below for QSPI, you would have a contiguous stretch of pins that were ordered, albeit upside down, in an integrally-placed nibble at P[56:59].
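The even/odd swap described above can be sketched in a few lines. This is only a Python model of the bit manipulation, not P2 code; the function name `swap_bit_pairs` and the 32-bit word size are assumptions made for illustration.

```python
def swap_bit_pairs(value: int) -> int:
    """Swap every even bit with its odd neighbour in a 32-bit word."""
    value &= 0xFFFFFFFF
    odd_bits = value & 0xAAAAAAAA   # bits 1, 3, 5, ...
    even_bits = value & 0x55555555  # bits 0, 2, 4, ...
    return (odd_bits >> 1) | (even_bits << 1)


# A byte whose D0/D1 pairs arrived reversed is restored by one more swap.
reversed_pairs = swap_bit_pairs(0x000000B4)
print(f"${reversed_pairs:08X}")                  # $00000078
print(f"${swap_bit_pairs(reversed_pairs):08X}")  # $000000B4
```

Because the swap is its own inverse, the same routine would serve before programming or after reading, whichever side ends up doing the fix.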
I was thinking about rearranging the bit order of the burst data anyway. I wasn't planning on delving into it until after I've done the mode-checking code to work out what each SPI device supports. Alas, I've had some trouble with my teeth and just haven't been able to concentrate much of late.
I got the second-stage boot loader done. It's only 18 longs. Using RCFAST, it loads 1KB every ~700us at clk/2 rate.
This program goes into the 8-pin flash at $000000..$0003FF, while the application that will be loaded into the hub starting at $00000 follows in the flash starting at $000400.
Next, I need to make the code that programs this loader, plus the main application's data, into the flash. Then I can integrate them into PNut.exe so that with one key, you can compile, download, and program the flash with PASM or Spin code.
' *** Fast-load SPI flash program into hub memory and execute ***

CON             spi_cs = 61                     'low on entry, flash reading at $400
                spi_ck = 60                     'low on entry, cycle for next bit
                spi_di = 59                     'floating on entry
                spi_do = 58                     'floating on entry, flash outputting MSB of byte[$400]

' This $100-long block of code gets read from the 8-pin flash, from addresses
' $000000..$0003FF, into cog registers $000..$0FF, then executed by the ROM booter.
'
' On entry, the flash is outputting bit 7 of the byte at address $400. Starting
' there, this program quickly reads 1KB blocks into hub $00000..<=$FFFFF and then
' does a 'COGINIT #0,#$00000' to launch the loaded application.

DAT             org

                wrpin   #%01_00101_0,#spi_ck    'set spi_ck for transition output, drives low
                fltl    #spi_ck                 'reset smart pin
                wxpin   #1,#spi_ck              'set timebase to 1 clock per transition
                drvl    #spi_ck                 'enable smart pin

                setxfrq ##$4000_0000            'set streamer rate to clk/2

                wrfast  #0,#0                   'ready to write to $00000+

nextkb          wypin   tran16k,#spi_ck         '2      start clock transitions
                waitx   #3                      '2+3    align clock transitions with input sampling
                xinit   bit8k,#0                '2      start inputting spi_do data to hub
                waitxfi                         '2+16k  wait for streamer to finish
                djnz    blocks,#nextkb          '4      get next 1KB block

                wrfast  #0,#0                   'ensure last data written to hub
                wrpin   #0,#spi_ck              'clear smart pin
                coginit #0,#$00000              'relaunch cog from $00000

tran16k         long    $4000                   '16K transitions for 8K bits
bit8k           long    $C081_2000 + spi_do<<17 'streamer mode, 1-pin input, 8K bits

                orgf    $100-2                  'space to $100 longs

blocks          long    1                       'number of 1KB blocks to load (set by compiler)
checksum        long    -1                      '"Prop" - sum of these longs (set by compiler)
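To make the two constants in the listing above less opaque, here is a small Python model of how `bit8k` and `tran16k` are put together. It is a sketch of the arithmetic only; the helper names `streamer_cmd` and `transitions_for_bits` are invented for this example. Note that Python's `+` binds tighter than `<<`, so the shift is parenthesised to match how the assembler evaluates `$C081_2000 + spi_do<<17`.

```python
SPI_DO = 58  # pin number taken from the CON block


def streamer_cmd(mode_base: int, pin: int, bit_count: int) -> int:
    """Combine streamer mode bits, pin field (bit 17 up) and 16-bit count."""
    if not 0 <= bit_count <= 0xFFFF:
        raise ValueError("streamer count must fit in 16 bits")
    return (mode_base + (pin << 17) + bit_count) & 0xFFFFFFFF


def transitions_for_bits(bit_count: int) -> int:
    """Each SPI bit needs one rising and one falling clock transition."""
    return 2 * bit_count


bit8k = streamer_cmd(0xC081_0000, SPI_DO, 8 * 1024)
tran16k = transitions_for_bits(8 * 1024)
print(f"bit8k   = ${bit8k:08X}")    # bit8k   = $C0F52000
print(f"tran16k = ${tran16k:04X}")  # tran16k = $4000
```

The `$C0F52000` result agrees with the little-endian bytes `00 20 F5 C0` in the raw loader dump posted in this thread.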
Here's the raw data for this loader. Allocating 256 longs for a second-stage loader was overkill in the ROM booter code.
You're missing SPI chip select and the read command ($03) and address.
No, the ROM booter transfers control to the second-stage booter with the flash being read at $400, with bit7 coming out of its SPI_DO pin. You're already on the bike, you just have to pedal it.
It would explain the reason I had to do so many steps to reset everything when configuring events and the like.
You mean that you've made second-stage booter code, already, yourself?
For normal application download, all smart pins are cleared to zero mode, and made inputs, so there should be no trace of anything. What were you seeing?
When the second-stage SPI booter gets control, there are no smart pins configured, just SPI_CS and SPI_CLK are low outputs and the flash is in read mode - that's it.
No such luck with the SD card. In 4bit SD bus mode (as compared to SPI mode), CS turns into D3, DI turns into CMD and DO turns into D0 (and D1/D2 are often not hooked up at all). So I guess one needs a full 4 extra pins to hook the data bits up to. (I assume there's no trouble in connecting two P2 pins to the same highspeed data line?).
Also speaking of which, I guess there might be some trouble if there's response data coming in on the CMD line while a data transfer is active (I'm not entirely sure that is avoidable, the spec document is terrible). Fast SD access might have to be a two-cog job.
Brian made it. I tinkered with it for speed - a dualSPI mode using smartpins. Eric has it included with FlexGUI. I'm reworking it now to handle different SPI flash parts so it can autodetect supported SPI modes.
It would have just been the enabled outputs. I was being cheap in early testing of the rework and not doing any DIRL or FLTL before reconfiguring the pins. It had some oddball side effects, including the first event not triggering unless I used both a POLLSE1 and an initial blind event.
Possibly two COGs, yes, but hopefully some way can be found to make it work with a single COG if the output clock is under our control. Perhaps the clock can be slowed while decoding the incoming response on CMD and collecting/outputting DAT nibbles, and then sped up for the remainder of the data transfer once the CMD response has been fully received. Maybe an independent smartpin could be allocated to the CMD pin in serial mode (to detect the first response start bit), which could be examined while the streamer reads/writes the nibbles (we may still need to consider a data CRC here too). Whether a dynamic clock variation like this is allowed, or how it may affect SD block writes if they are somehow timed off it, I'm not sure.
I'm working on the 2nd-stage flash booter for application launching. The first thing to sort out is how to quickly program the flash, so the user doesn't have to wait long. Then, the loader which executes on reset must pull the data from the flash into memory very quickly.
So with the first straightforward approach at clk/4 (200ns per bit) you could load 512kB in less than one second. With the optimised clk/2 transfer it's less than half a second. I think most programs are much smaller and load in virtually no time, so there's no need for further speed optimisation. If anybody has to transfer large files to play sounds, videos or whatever, that could be handled with objects that are coded for speed and can be configured specifically for the hardware they run on.
IMHO, the bootloader has to work on any possible hardware and should not depend on special features like 2 or 4 bit SPI modes. If you think you need more speed at any cost please make it optional.
This is using standard SPI mode, which is 1 data bit. I've got it loading 512KB in 350ms now using the built-in RCFAST oscillator (20MHz+). There's no reliability problem in doing this, at all. It was just a matter of figuring how to best use the P2 peripherals to get the clk/2 data rate.
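The load-time figures quoted in the last few posts can be cross-checked with a little arithmetic. The sketch below is plain Python, assumes a 20 MHz RCFAST clock for the "200ns per bit" case (the thread only says "20MHz+"), and ignores per-block setup overhead; the function names are made up for the example.

```python
def load_time_s(num_bytes: int, sysclk_hz: float, clocks_per_bit: int) -> float:
    """Seconds to clock num_bytes over 1-bit SPI at sysclk / clocks_per_bit."""
    return num_bytes * 8 * clocks_per_bit / sysclk_hz


def implied_sysclk_hz(num_bytes: int, seconds: float, clocks_per_bit: int) -> float:
    """System clock that a measured transfer time would correspond to."""
    return num_bytes * 8 * clocks_per_bit / seconds


HUB_BYTES = 512 * 1024
print(f"clk/4 at 20 MHz: {load_time_s(HUB_BYTES, 20e6, 4):.3f} s")  # 0.839 s
print(f"clk/2 at 20 MHz: {load_time_s(HUB_BYTES, 20e6, 2):.3f} s")  # 0.419 s
print(f"350 ms at clk/2: {implied_sysclk_hz(HUB_BYTES, 0.350, 2) / 1e6:.2f} MHz")  # 23.97 MHz
```

So the "less than one second" and "less than half a second" estimates hold at 20 MHz, and the measured 350ms is what you would expect if RCFAST is running at roughly 24 MHz.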
I take it you've got some urgency for your other board to work?
No urgency at all! I'm currently a bit busy with other projects anyway. I just don't want Chip to waste his precious time on something that has to be changed back eventually because of compatibility problems.
Well, there's two start bits (the spec calls the second "transmission bit", but it seems to just be a second zero bit?), so there might be time to cleanly slow the clock in such cases even at high speed relative to sysclock. Then again, to get higher than 50MHz clock, one has to switch to 1.8V signalling (that also needs another pin and some kind of transistor, since apparently one needs to powercycle the card to get it back into 3.3V/SPI mode at that point?) I think there was some trouble with reading fast 1.8V signals though?
It's just some bytes that you tack onto the front of your application's bytes, and then download. It programs your application into the SPI flash with a small second-stage loader that loads and runs your application on reset. All SPI activity happens at clk/2 in RCFAST. I just need to integrate it into PNut.exe next.
I documented the program and boot times:
' *** SPI FLASH PROGRAMMER AND LOADER
' *** Works with 16MB flash W25Q128JV on P2 Eval board.
' *** Writes loader and application to SPI flash, then reboots to execute.
'
' Program/Boot performance (RCFAST)
'
'                       program         boot
'       bytes           time            time
'       -------------------------------------
'       0..2KB          30ms            10ms
'       4KB             60ms            11ms
'       8KB             90ms            14ms
'       16KB            125ms           20ms
'       32KB            190ms           30ms
'       64KB            260ms           52ms
'       128KB           500ms           95ms
'       256KB           1.00s           184ms
'       512KB           1.95s           358ms
'
' Use:  1) append application bytes at app_start
'       2) set app_size to number of application bytes
'       3) download and execute composite image (uses RCFAST)
'       4) after programming is complete, chip will reboot
'
CON             spi_cs = 61
                spi_ck = 60
                spi_di = 59
                spi_do = 58
'****************
'*  Programmer  *
'****************
'
DAT             org

                jmp     #prep_data              '@0: jump to prep_data
app_size        long    24 '(per example)       '@4: application size in bytes (set by compiler)
'
'
' If loader + application are under $400 bytes, pad with zeros and adjust app_size
'
prep_data       add     app_end,app_size        'make app_end
                sub     loader_end,app_end  wcz 'is loader_end > app_end ?
        if_a    add     app_size,loader_end     'if loader_end > app_end, adjust app_size so that loader + app take $400 bytes
        if_a    shr     loader_end,#2           'if loader_end > app_end, fill app_end..loader_end with zeros (overfills 1..4 bytes)
        if_b    mov     loader_end,#$100/4-1    'if loader_end < app_end, fill app_end..+255 with zeros to keep last page clean
        if_ne   setq    loader_end
        if_ne   wrlong  #0,app_end

                wrlong  app_size,##@app_bytes   'set app_bytes in loader
'
'
' Calculate loader checksum
'
                rdfast  #0,#@loader             'sum $100 longs of loader
                mov     x,#0
                rep     #2,#$100
                rflong  y
                add     x,y

                sub     csum,x                  'compute checksum
                wrlong  csum,##@checksum        'set checksum in loader
'
'
' Get ready to program flash
'
                drvh    #spi_cs                 'spi_cs high

                fltl    #spi_ck                 'reset smart pin spi_ck
                wrpin   #%01_00101_0,#spi_ck    'set spi_ck for transition output, starts out low
                wxpin   #1,#spi_ck              'set timebase to 1 clock per transition
                drvl    #spi_ck                 'enable smart pin

                drvl    #spi_di

                setxfrq ##$4000_0000            'set streamer rate to clk/2

                rdfast  #0,#@loader             'start fifo read at loader

                add     app_size,#@app_start-@loader    'get total number of bytes to program
'
'
' Main loop - erase 4/32/64KB block, program 16/128/256 sequential 256-byte pages, repeat
'
.block          encod   x,app_size              'pick fastest block-erase command
                setd    .cmd,#$20               'set 4KB erase (25ms)
                sets    .tst,#$0F
                cmp     x,#14           wc      'if bytes >= $4000, set 32KB erase (100ms)
        if_nc   setd    .cmd,#$52
        if_nc   sets    .tst,#$7F
                cmp     x,#15           wc      'if bytes >= $8000, set 64KB erase (140ms)
        if_nc   setd    .cmd,#$D8
        if_nc   sets    .tst,#$FF

                callpa  #$06,#spi_cmd8          'write enable
.cmd            callpa  #$20,#spi_cmd32         'erase 4/32/64KB block
                call    #spi_wait               'wait for erase complete

.page           callpa  #$06,#spi_cmd8          'write enable
                callpa  #$02,#spi_cmd32         'program 256-byte page
                xinit   rmode,pa                '2      start outputting 256*8 bits
                wypin   tranp,#spi_ck           '2      start 256*8*2 clock transitions
                waitxfi                         '~4k    wait for streamer done
                call    #spi_wait               'wait for program complete

                sub     app_size,#$100  wcz     'if done, reset chip to reboot
        if_be   hubset  reset

                add     addr,#$0001             'inc address by 256
.tst            test    addr,#$000F     wz      'if not 4/32/64KB block boundary, program next page
        if_nz   jmp     #.page
                jmp     #.block                 'else, erase next block
'
'
' SPI command 8-bit - use callpa
'
spi_cmd8        drvh    #spi_cs                 'new command
                drvl    #spi_cs

                xinit   bmode,pa                '2      start outputting 8 bits
                wypin   #16,#spi_ck             '2      start 16 clock transitions
        _ret_   waitxfi                         '~16    wait for streamer to finish
'
'
' SPI command 32-bit - use callpa
'
spi_cmd32       drvh    #spi_cs                 'new command
                drvl    #spi_cs

                shl     pa,#16                  'shift command up
                or      pa,addr                 'or in address
                shl     pa,#8                   'shift up to get bytes: command[7:0], addr[15:0], $00
                movbyts pa,#%%0123              'rearrange bytes for top-to-bottom output
                xinit   lmode,pa                '2      start outputting 32 bits
                wypin   #64,#spi_ck             '2      start 64 clock transitions
        _ret_   waitxfi                         '~64    wait for streamer to finish
'
'
' SPI wait
'
spi_wait        getptr  x                       'remember fifo pointer

.try            callpa  #$05,#spi_cmd8          'issue read-status-register command
                wrfast  #0,#0                   'get result, write byte to hub at $00000
                wypin   #16,#spi_ck             '2      start 16 clock transitions
                waitx   #3                      '2+3    align clock transitions with input sampling
                xinit   smode,#0                '2      start inputting spi_do data to hub
                waitxfi                         '~16    wait for streamer to finish
                wrfast  #0,#0                   'wait for byte written to hub
                rdbyte  y,#0                    'get byte and check busy bit
                test    y,#$01          wc
        if_c    jmp     #.try                   'if busy set, try again

        _ret_   rdfast  #0,x                    'busy clear, restore fifo read
'
'
' Data
'
loader_end      long    @loader + $400
app_end         long    @app_start
csum            byte    "Prop"
tranp           long    256 * 8 * 2
bmode           long    $4081_0008 + spi_di<<17 'streamer mode, 1-pin output, msb-first byte from s
lmode           long    $4081_0020 + spi_di<<17 'streamer mode, 1-pin output, msb-first long from s
rmode           long    $8081_0800 + spi_di<<17 'streamer mode, 1-pin output, msb-first $100 bytes from hub
smode           long    $C081_0008 + spi_do<<17 'streamer mode, 1-pin input, msb-first byte to hub
addr            long    $000000
reset           long    $1000_0000

x               res     1
y               res     1
'************
'*  Loader  *
'************
'
' The ROM booter reads this code from the 8-pin flash, from addresses $000000..$0003FF,
' into cog registers $000..$0FF, then executes it in order to load the application.
'
' The initial application data trailing this code at app_start..$0FF needs to be moved
' to hub $00000+. Then, any additionally-needed application data must be read from the
' flash and stored in the hub from where the initial application data left off.
'
' Once all application data has been moved/loaded into the hub, cog 0 is restarted from
' hub $00000, in order to execute the application.
'
' On entry, both spi_cs and spi_ck are low outputs, the flash is outputting bit7 of the
' byte at address $400 into spi_do. By cycling spi_ck, any additional application data
' can be read.
'
                org
'
'
' First, move application data in cog app_start..$0FF into hub $00000+.
' If application bytes met or exceeded, launch app
'
loader          setq    #$100-app_start-1       'move code from cog app_start..$0FF to hub $00000+
                wrlong  app_start,#0

                sub     app_bytes,w     wcz     'if app_bytes met or exceeded, done
        if_be   coginit #0,#$00000              'relaunch cog 0 from $00000
'
'
' Need to load more application data from flash, read in remaining bytes, launch app
'
                wrpin   #%01_00101_0,#spi_ck    'set spi_ck smart pin for transitions, drives low
                fltl    #spi_ck                 'reset smart pin
                wxpin   #1,#spi_ck              'set transition timebase to clk/1
                drvl    #spi_ck                 'enable smart pin

                setxfrq ##$4000_0000            'set streamer rate to clk/2

                wrfast  #0,w                    'ready to write to hub at app continuation

.block          bmask   w,#12                   'try max streamer block size for whole bytes (8191)
                fle     w,app_bytes             'limit to number of bytes left
                sub     app_bytes,w             'update number of bytes left
                shl     w,#3                    'get number of bits, insert into streamer command
                setword wmode,w,#0
                shl     w,#1                    'double for number of spi_ck transitions

                wypin   w,#spi_ck               '2      start clock transitions
                waitx   #3                      '2+3    align clock transitions with input sampling
                xinit   wmode,#0                '2      start inputting spi_do data to hub
                waitxfi                         '?      wait for streamer to finish
                tjnz    app_bytes,#.block       'if more bytes left, read another block

                wrfast  #0,#0                   'done, ensure last data gets written to hub
                wrpin   #0,#spi_ck              'clear spi_ck smart pin
                coginit #0,#$00000              'relaunch cog 0 from $00000
'
'
' Data
'
w               long    ($100-app_start)*4      'initially, hub start address for additional app data
wmode           long    $C081_0000 + spi_do<<17 'streamer mode, 1-pin input, msb-first bytes to hub
app_bytes       long    0                       'number of bytes in application (set by prep_data)
checksum        long    0                       '"Prop" - sum of $100 loader longs (set by prep_data)

app_start                                       'data from here to $0FF is first part of application

' Example program which writes random values to P[63:56] every ~100ms using RCFAST

                byte    $FF,$F6,$DF,$F8,$1B,$0C,$60,$FD
                byte    $06,$FA,$DB,$F8,$42,$0F,$80,$FF
                byte    $1F,$00,$65,$FD,$EC,$FF,$9F,$FD
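The erase-size selection in the `.block` loop above is compact but easy to misread, so here is a rough Python model of it. This is a sketch of the decision logic only: `pick_erase` and `erase_plan` are hypothetical names, `bit_length() - 1` stands in for ENCOD, and the model assumes each block is fully consumed before `.block` runs again.

```python
PAGE_BYTES = 256


def pick_erase(bytes_left: int) -> tuple[int, int]:
    """Return (erase opcode, page-count mask) as the .block code would set them."""
    if bytes_left <= 0:
        raise ValueError("nothing left to program")
    top_bit = bytes_left.bit_length() - 1  # models ENCOD x,app_size
    cmd, page_mask = 0x20, 0x0F            # 4KB erase, 16 pages per block
    if top_bit >= 14:                      # bytes >= $4000
        cmd, page_mask = 0x52, 0x7F        # 32KB erase, 128 pages
    if top_bit >= 15:                      # bytes >= $8000
        cmd, page_mask = 0xD8, 0xFF        # 64KB erase, 256 pages
    return cmd, page_mask


def erase_plan(total_bytes: int) -> list[tuple[int, int]]:
    """List (opcode, flash address) for each erase the loop would issue."""
    plan, addr, left = [], 0, total_bytes
    while left > 0:
        cmd, page_mask = pick_erase(left)
        block_bytes = (page_mask + 1) * PAGE_BYTES
        plan.append((cmd, addr))
        addr += block_bytes
        left -= block_bytes
    return plan


for cmd, addr in erase_plan(0x18000):
    print(f"erase ${cmd:02X} at ${addr:06X}")
# erase $D8 at $000000
# erase $D8 at $010000
```

Since the remaining byte count only shrinks, the chosen block size never grows again, which keeps every erase aligned to its own block size in this model.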
Here's the object code, for size:
Programmer code
00000- 04 00 90 FD 18 00 00 00 01 A0 00 F1 50 9E 98 F1 '............P...'
00010- 4F 02 00 11 02 9E 44 10 3F 9E 04 C6 28 9E 60 5D 'O.....D.?...(.`]'
00020- 50 00 68 5C 00 00 00 FF D0 03 64 FC 64 01 7C FC 'P.h\......d.d.|.'
00030- 00 B2 04 F6 00 05 DC FC 12 B4 60 FD 5A B2 00 F1 '..........`.Z...'
00040- 59 A2 80 F1 00 00 00 FF D4 A3 64 FC 59 7A 64 FD 'Y.........d.Yzd.'
00050- 50 78 64 FD 3C 94 0C FC 3C 02 1C FC 58 78 64 FD 'Pxd.<...<...Xxd.'
00060- 58 76 64 FD 00 00 A0 FF 1D 00 64 FD 64 01 7C FC 'Xvd.......d.d.|.'
00070- 74 02 04 F1 01 B2 80 F7 20 4E B4 F9 0F 64 BC F9 't....... N...d..'
00080- 0E B2 14 F2 52 4E B4 39 7F 64 BC 39 0F B2 14 F2 '....RN.9.d.9....'
00090- D8 4E B4 39 FF 64 BC 39 0E 0C 4C FB 12 40 4C FB '.N.9.d.9..L..@L.'
000A0- 68 00 B0 FD 0B 0C 4C FB 0F 04 4C FB F6 AB A0 FC 'h.....L...L.....'
000B0- 3C A4 24 FC 24 36 60 FD 50 00 B0 FD 00 03 9C F1 '<.$.$6`.P.......'
000C0- 00 B0 60 ED 01 AE 04 F1 0F AE CC F7 D4 FF 9F 5D '..`............]'
000D0- A0 FF 9F FD 59 7A 64 FD 58 7A 64 FD F6 A7 A0 FC '....Yzd.Xzd.....'
000E0- 3C 20 2C FC 24 36 60 0D 59 7A 64 FD 58 7A 64 FD '< ,.$6`.Yzd.Xzd.'
000F0- 10 EC 67 F0 57 EC 43 F5 08 EC 67 F0 1B EC FF F9 '..g.W.C...g.....'
00100- F6 A9 A0 FC 3C 80 2C FC 24 36 60 0D 34 B2 60 FD '....<.,.$6`.4.`.'
00110- F0 0B 4C FB 00 00 8C FC 3C 20 2C FC 1F 06 64 FD '..L.....< ,...d.'
00120- 00 AC A4 FC 24 36 60 FD 00 00 8C FC 00 B4 C4 FA '....$6`.........'
00130- 01 B4 D4 F7 D8 FF 9F CD 59 00 78 0C 64 05 00 00 '........Y.x.d...'
00140- D8 01 00 00 50 72 6F 70 00 10 00 00 08 00 F7 40 '....Prop.......@'
00150- 20 00 F7 40 00 08 F7 80 08 00 F5 C0 00 00 00 00 ' ..@............'
00160- 00 00 00 10 28 C4 65 FD '....(.e.'
Loader code
00160-                         00 3A 64 FC 19 36 98 F1 '        .:d..6..'
00170- 00 00 EC EC 3C 94 0C FC 50 78 64 FD 3C 02 1C FC '....<...Pxd.<...'
00180- 58 78 64 FD 00 00 A0 FF 1D 00 64 FD 19 00 88 FC 'Xxd.......d.....'
00190- 0C 32 CC F9 1B 32 20 F3 19 36 80 F1 03 32 64 F0 '.2...2 ..6...2d.'
001A0- 19 34 20 F9 01 32 64 F0 3C 32 24 FC 1F 06 64 FD '.4 ..2d.<2$...d.'
001B0- 00 34 A4 FC 24 36 60 FD F5 37 9C FB 00 00 8C FC '.4..$6`..7......'
001C0- 3C 00 0C FC 00 00 EC FC 8C 03 00 00 00 00 F5 C0 '<...............'
001D0- 00 00 00 00 00 00 00 00
Example application - blinks LEDs randomly
001D0-                         FF F6 DF F8 1B 0C 60 FD '        ......`.'
001E0- 06 FA DB F8 42 0F 80 FF 1F 00 65 FD EC FF 9F FD '....B.....e.....'
Very handy, those programming times look nice and responsive. We won't be waiting too long when we re-flash.
I guess this inline flash+loader approach means we just need to keep our final applications $1D8 = 472 bytes shorter than 512kB so the whole thing can be downloaded in one go?
> @rogloh said:
> Very handy, those programming times look nice and responsive. We won't be waiting too long when we re-flash.
>
> I guess this inline flash+loader approach means we just need to keep our final applications $1D8 = 472 bytes shorter than 512kB so the whole thing can be downloaded in one go?
That is correct. I had always imagined the PC waiting for the device being programmed to finish, having some dialogue, but it's not really necessary. If the program time is very fast and it reboots quickly, so you can see that it works, maybe we don't need anything fancier. As I started working this out, it just kind of became what it now is.
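For anyone wondering where the $1D8 figure comes from, it can be rebuilt from the object dump in this thread. The sketch below is plain Python; the two constants are read off the dump (the `loader` label sits at byte offset $164 and `app_start` is cog long $1D), and the variable names are only illustrative.

```python
PROGRAMMER_BYTES = 0x164  # byte offset of the `loader` label in the dump
LOADER_LONGS = 0x1D       # loader longs ahead of app_start (cog address $1D)
HUB_BYTES = 512 * 1024

overhead = PROGRAMMER_BYTES + LOADER_LONGS * 4
max_app_bytes = HUB_BYTES - overhead

print(f"overhead = ${overhead:X} = {overhead} bytes")      # overhead = $1D8 = 472 bytes
print(f"largest one-shot application = {max_app_bytes}")   # largest one-shot application = 523816
```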
> @evanh said:
> Chip,
> Not a good idea for demo program to be writing random data to EEPROM pins when it's enabled!
That crossed my mind. Oh, there could even be electrical conflicts. Maybe I'll change it to resistive drive. Then, there's the probability that the data in the flash could be disturbed.
Not every flash chip supports 32kB block-erase; it may even be quite specific to Winbond. 4kB and 64kB are the standard sizes.
Good point. BTW, what are the requirements that qualify a particular flash chip to be compatible with the P2 boot loader? Which commands and page sizes have to be supported? Frequency/timing should not be an issue, most chips support >100MHz.
The ROM booter tries to get the flash on-line, no matter what mode it might have been in. Then, it issues a read command ($03) and reads in $400 bytes:
'
'
' Try to load from SPI memory
'
try_spi         drvh    #spi_cs                 'drive spi_cs high
                drvl    #spi_ck                 'drive spi_ck low

                neg     pb,#1                   'set command bits to all 1's
                drvh    #spi_do                 'drive spi_do high in case quad/dual mode
                callpa  #2,#spi_cmd             'send exit-quad command
                callpa  #8,#spi_cmd             'send exit-quad command
                callpa  #16,#spi_cmd            'send exit-dual command
                fltl    #spi_do                 'float spi_do

                callpb  #$66,#spi_cmd8          'send reset-enable command
                callpb  #$99,#spi_cmd8          'send reset command
                waitx   ##rc_max/20_000         'wait 50us

                callpb  #$04,#spi_cmd8          'send write-disable command to clear WEL

.wait           callpb  #$05,#spi_cmd8          'send read-status command
                call    #spi_in                 'get status
                testbn  x,#1            wz      'if WEL high, no SPI memory (z=0)
        if_nz   jmp     #.fail
                testbn  x,#0            wz      'if BUSY high, wait for erase/write to finish
        if_nz   jmp     #.wait

                mov     pa,#32                  'send read-from-start command
                callpb  #$03,#spi_cmd

                decod   y,#10                   'ready to input $400 bytes from SPI
                wrfast  #0,#0                   'ready to write bytes to hub
.data           call    #spi_in                 'get byte
                wfbyte  x                       'store byte into hub
                djnz    y,#.data                'loop for next byte (y=0 after)

                rdfast  #0,#0                   'ready to read longs from hub
                rep     @.sum,#$100             'ready to read and sum $100 longs
                rflong  z                       'read long
                add     y,z                     'sum long
.sum
                cmp     y,csum          wz      'verify checksum, z=1 if okay
                bitz    flags,#spi_ok           'if program verified, set spi_ok flag
.fail
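The checksum rule that ties the programmer and the ROM booter together can be modelled off-chip. The following Python sketch assumes that the ROM's `csum` constant (not shown in this excerpt) is the long formed by the bytes "Prop", as the loader comments suggest; `patch_checksum` and `rom_accepts` are hypothetical helper names.

```python
import struct

PROP = int.from_bytes(b"Prop", "little")  # $706F7250, assumed value of csum


def patch_checksum(image: bytearray, checksum_offset: int) -> None:
    """Set one long so that all $100 longs of the 1KB image sum to "Prop"."""
    if len(image) != 0x400:
        raise ValueError("loader image must be exactly $400 bytes")
    struct.pack_into("<I", image, checksum_offset, 0)
    total = sum(struct.unpack("<256I", image)) & 0xFFFFFFFF
    struct.pack_into("<I", image, checksum_offset, (PROP - total) & 0xFFFFFFFF)


def rom_accepts(image: bytes) -> bool:
    """Model the ROM check: sum $100 longs and compare against csum."""
    return sum(struct.unpack("<256I", bytes(image))) & 0xFFFFFFFF == PROP


# Dummy 1KB image with the checksum long placed last, as in the 18-long loader.
demo = bytearray(range(256)) * 4
patch_checksum(demo, 0x3FC)
print(rom_accepts(demo))  # True
```

The checksum long can sit anywhere in the first $400 bytes, which is why the later loader can keep it just ahead of `app_start` rather than at the very end.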
Here's the raw data for the 18-long second-stage boot loader posted earlier. Allocating 256 longs for a second-stage loader was overkill in the ROM booter code.
00000- 3C 94 0C FC 50 78 64 FD 3C 02 1C FC 58 78 64 FD '<...Pxd.<...Xxd.'
00010- 00 00 A0 FF 1D 00 64 FD 00 00 8C FC 3C 1E 24 FC '......d.....<.$.'
00020- 1F 06 64 FD 00 20 A4 FC 24 36 60 FD FB FD 6D FB '..d.. ..$6`...m.'
00030- 00 00 8C FC 3C 00 0C FC 00 00 EC FC 00 40 00 00 '....<........@..'
00040- 00 20 F5 C0 00 00 00 00 00 00 00 00 00 00 00 00 '. ..............'
00050- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00060- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00070- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00080- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00090- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
000A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
000B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
000C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
000D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
000E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
000F0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00100- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00110- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00120- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00130- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00140- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00150- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00160- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00170- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00180- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00190- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
001A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
001B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
001C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
001D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
001E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
001F0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00200- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00210- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00220- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00230- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00240- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00250- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00260- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00270- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00280- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00290- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
002A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
002B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
002C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
002D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
002E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
002F0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00300- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00310- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00320- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00330- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00340- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00350- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00360- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00370- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00380- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
00390- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
003A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
003B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
003C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
003D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
003E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
003F0- 00 00 00 00 00 00 00 00 01 00 00 00 FF FF FF FF '................'
No, the ROM booter transfers control to the second-stage booter with the flash being read at $400, with bit7 coming out of its SPI_DO pin. You're already on the bike, you just have to pedal it.
It would explain why I had to do so many steps to reset everything when configuring events and the like.
You mean that you've made second-stage booter code, already, yourself?
For normal application download, all smart pins are cleared to zero mode, and made inputs, so there should be no trace of anything. What were you seeing?
No such luck with the SD card. In 4-bit SD bus mode (as compared to SPI mode), CS turns into D3, DI turns into CMD and DO turns into D0 (and D1/D2 are often not hooked up at all). So I guess one needs a full 4 extra pins to hook the data bits up to. (I assume there's no trouble in connecting two P2 pins to the same high-speed data line?)
Also speaking of which, I guess there might be some trouble if there's response data coming in on the CMD line while a data transfer is active (I'm not entirely sure that is avoidable, the spec document is terrible). Fast SD access might have to be a two-cog job.
It would have just been the enabled outputs. I was being cheap in early testing of the rework and not doing any DIRL or FLTL before reconfiguring the pins. It had some oddball side effects, including the first event not triggering unless I did both a POLLSE1 and an initial blind event.
So with the first, straightforward approach at clk/4 (200ns per bit) you could load 512kB in less than one second. With the optimised clk/2 transfer it's less than half a second. I think most programs are much smaller and load in virtually no time, so there's no need for further speed optimisation. If anybody has to transfer large files to play sounds, videos or whatever, that could be handled with objects that are coded for speed and can be configured specifically for the hardware they run on.
IMHO, the bootloader has to work on any possible hardware and should not depend on special features like 2 or 4 bit SPI modes. If you think you need more speed at any cost please make it optional.
This is using standard SPI mode, which is 1 data bit. I've got it loading 512KB in 350ms now using the built-in RCFAST oscillator (20MHz+). There's no reliability problem in doing this, at all. It was just a matter of figuring how to best use the P2 peripherals to get the clk/2 data rate.
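The load-time figures in this exchange can be sanity-checked with a little arithmetic. The sketch below is plain Python rather than P2 code; `load_time_s` and `implied_clock_hz` are made-up helper names, it assumes "512KB" means 512 x 1024 bytes, and it ignores command overhead and block setup.

```python
def load_time_s(nbytes, sysclk_hz, clocks_per_bit):
    """Seconds to shift nbytes over 1-bit SPI at sysclk/clocks_per_bit."""
    return nbytes * 8 * clocks_per_bit / sysclk_hz


def implied_clock_hz(nbytes, seconds, clocks_per_bit):
    """System clock that would give the observed transfer time."""
    return nbytes * 8 * clocks_per_bit / seconds


SIZE = 512 * 1024  # assumption: 512KB = 524288 bytes

print(load_time_s(SIZE, 20_000_000, 4))        # clk/4 at a nominal 20 MHz
print(load_time_s(SIZE, 20_000_000, 2))        # clk/2 at a nominal 20 MHz
print(implied_clock_hz(SIZE, 0.350, 2) / 1e6)  # MHz implied by 350 ms at clk/2
```

Under those assumptions this gives roughly 0.84 s and 0.42 s at 20 MHz, and the 350 ms figure corresponds to an RCFAST of about 24 MHz, which fits the "20MHz+" remark.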
No urgency at all! I'm currently a bit busy with other projects anyway. I just don't want Chip to waste his precious time on something that has to be changed back eventually because of compatibility problems.
Well, there are two start bits (the spec calls the second the "transmission bit", but it seems to just be a second zero bit?), so there might be time to cleanly slow the clock in such cases, even at high speed relative to sysclock. Then again, to get a clock higher than 50MHz, one has to switch to 1.8V signalling (that also needs another pin and some kind of transistor, since apparently one needs to power-cycle the card to get it back into 3.3V/SPI mode at that point?). I think there was some trouble with reading fast 1.8V signals, though?
It's just some bytes that you tack onto the front of your application's bytes, and then download. It programs your application into the SPI flash with a small second-stage loader that loads and runs your application on reset. All SPI activity happens at clk/2 in RCFAST. I just need to integrate it into PNut.exe next.
I documented the program and boot times:
' *** SPI FLASH PROGRAMMER AND LOADER
' *** Works with 16MB flash W25Q128JV on P2 Eval board.
' *** Writes loader and application to SPI flash, then reboots to execute.
'
' Program/Boot performance (RCFAST)
'
'               program         boot
' bytes         time            time
' -------------------------------------
' 0..2KB        30ms            10ms
' 4KB           60ms            11ms
' 8KB           90ms            14ms
' 16KB          125ms           20ms
' 32KB          190ms           30ms
' 64KB          260ms           52ms
' 128KB         500ms           95ms
' 256KB         1.00s           184ms
' 512KB         1.95s           358ms
'
' Use: 1) append application bytes at app_start
'      2) set app_size to number of application bytes
'      3) download and execute composite image (uses RCFAST)
'      4) after programming is complete, chip will reboot
'
CON             spi_cs = 61
                spi_ck = 60
                spi_di = 59
                spi_do = 58

'****************
'*  Programmer  *
'****************
'
DAT             org
                jmp     #prep_data              '@0: jump to prep_data
app_size        long    24      '(per example)  '@4: application size in bytes (set by compiler)
'
'
' If loader + application are under $400 bytes, pad with zeros and adjust app_size
'
prep_data       add     app_end,app_size        'make app_end
                sub     loader_end,app_end wcz  'is loader_end > app_end ?
        if_a    add     app_size,loader_end     'if loader_end > app_end, adjust app_size so that loader + app take $400 bytes
        if_a    shr     loader_end,#2           'if loader_end > app_end, fill app_end..loader_end with zeros (overfills 1..4 bytes)
        if_b    mov     loader_end,#$100/4-1    'if loader_end < app_end, fill app_end..+255 with zeros to keep last page clean
        if_ne   setq    loader_end
        if_ne   wrlong  #0,app_end
                wrlong  app_size,##@app_bytes   'set app_bytes in loader
'
'
' Calculate loader checksum
'
                rdfast  #0,#@loader             'sum $100 longs of loader
                mov     x,#0
                rep     #2,#$100
                rflong  y
                add     x,y
                sub     csum,x                  'compute checksum
                wrlong  csum,##@checksum        'set checksum in loader
'
'
' Get ready to program flash
'
                drvh    #spi_cs                 'spi_cs high
                fltl    #spi_ck                 'reset smart pin spi_ck
                wrpin   #%01_00101_0,#spi_ck    'set spi_ck for transition output, starts out low
                wxpin   #1,#spi_ck              'set timebase to 1 clock per transition
                drvl    #spi_ck                 'enable smart pin
                drvl    #spi_di
                setxfrq ##$4000_0000            'set streamer rate to clk/2
                rdfast  #0,#@loader             'start fifo read at loader
                add     app_size,#@app_start-@loader    'get total number of bytes to program
'
'
' Main loop - erase 4/32/64KB block, program 16/128/256 sequential 256-byte pages, repeat
'
.block          encod   x,app_size              'pick fastest block-erase command
                setd    .cmd,#$20               'set 4KB erase (25ms)
                sets    .tst,#$0F
                cmp     x,#14           wc      'if bytes >= $4000, set 32KB erase (100ms)
        if_nc   setd    .cmd,#$52
        if_nc   sets    .tst,#$7F
                cmp     x,#15           wc      'if bytes >= $8000, set 64KB erase (140ms)
        if_nc   setd    .cmd,#$D8
        if_nc   sets    .tst,#$FF
                callpa  #$06,#spi_cmd8          'write enable
.cmd            callpa  #$20,#spi_cmd32         'erase 4/32/64KB block
                call    #spi_wait               'wait for erase complete
.page           callpa  #$06,#spi_cmd8          'write enable
                callpa  #$02,#spi_cmd32         'program 256-byte page
                xinit   rmode,pa                '2      start outputting 256*8 bits
                wypin   tranp,#spi_ck           '2      start 256*8*2 clock transitions
                waitxfi                         '~4k    wait for streamer done
                call    #spi_wait               'wait for program complete
                sub     app_size,#$100  wcz     'if done, reset chip to reboot
        if_be   hubset  reset
                add     addr,#$0001             'inc address by 256
.tst            test    addr,#$000F     wz      'if not 4/32/64KB block boundary, program next page
        if_nz   jmp     #.page
                jmp     #.block                 'else, erase next block
'
'
' SPI command 8-bit - use callpa
'
spi_cmd8        drvh    #spi_cs                 'new command
                drvl    #spi_cs
                xinit   bmode,pa                '2      start outputting 8 bits
                wypin   #16,#spi_ck             '2      start 16 clock transitions
        _ret_   waitxfi                         '~16    wait for streamer to finish
'
'
' SPI command 32-bit - use callpa
'
spi_cmd32       drvh    #spi_cs                 'new command
                drvl    #spi_cs
                shl     pa,#16                  'shift command up
                or      pa,addr                 'or in address
                shl     pa,#8                   'shift up to get bytes: command[7:0], addr[15:0], $00
                movbyts pa,#%%0123              'rearrange bytes for top-to-bottom output
                xinit   lmode,pa                '2      start outputting 32 bits
                wypin   #64,#spi_ck             '2      start 64 clock transitions
        _ret_   waitxfi                         '~64    wait for streamer to finish
'
'
' SPI wait
'
spi_wait        getptr  x                       'remember fifo pointer
.try            callpa  #$05,#spi_cmd8          'issue read-status-register command
                wrfast  #0,#0                   'get result, write byte to hub at $00000
                wypin   #16,#spi_ck             '2      start 16 clock transitions
                waitx   #3                      '2+3    align clock transitions with input sampling
                xinit   smode,#0                '2      start inputting spi_do data to hub
                waitxfi                         '~16    wait for streamer to finish
                wrfast  #0,#0                   'wait for byte written to hub
                rdbyte  y,#0                    'get byte and check busy bit
                test    y,#$01          wc
        if_c    jmp     #.try                   'if busy set, try again
        _ret_   rdfast  #0,x                    'busy clear, restore fifo read
'
'
' Data
'
loader_end      long    @loader + $400
app_end         long    @app_start
csum            byte    "Prop"
tranp           long    256 * 8 * 2
bmode           long    $4081_0008 + spi_di<<17 'streamer mode, 1-pin output, msb-first byte from s
lmode           long    $4081_0020 + spi_di<<17 'streamer mode, 1-pin output, msb-first long from s
rmode           long    $8081_0800 + spi_di<<17 'streamer mode, 1-pin output, msb-first $100 bytes from hub
smode           long    $C081_0008 + spi_do<<17 'streamer mode, 1-pin input, msb-first byte to hub
addr            long    $000000
reset           long    $1000_0000
x               res     1
y               res     1

'************
'*  Loader  *
'************
'
' The ROM booter reads this code from the 8-pin flash, from addresses $000000..$0003FF,
' into cog registers $000..$0FF, then executes it in order to load the application.
'
' The initial application data trailing this code at app_start..$0FF needs to be moved
' to hub $00000+. Then, any additionally-needed application data must be read from the
' flash and stored in the hub from where the initial application data left off.
'
' Once all application data has been moved/loaded into the hub, cog 0 is restarted from
' hub $00000, in order to execute the application.
'
' On entry, both spi_cs and spi_ck are low outputs, the flash is outputting bit7 of the
' byte at address $400 into spi_do. By cycling spi_ck, any additional application data
' can be read.
'
                org
'
'
' First, move application data in cog app_start..$0FF into hub $00000+.
' If application bytes met or exceeded, launch app
'
loader          setq    #$100-app_start-1       'move code from cog app_start..$0FF to hub $00000+
                wrlong  app_start,#0
                sub     app_bytes,w     wcz     'if app_bytes met or exceeded, done
        if_be   coginit #0,#$00000              'relaunch cog 0 from $00000
'
'
' Need to load more application data from flash, read in remaining bytes, launch app
'
                wrpin   #%01_00101_0,#spi_ck    'set spi_ck smart pin for transitions, drives low
                fltl    #spi_ck                 'reset smart pin
                wxpin   #1,#spi_ck              'set transition timebase to clk/1
                drvl    #spi_ck                 'enable smart pin
                setxfrq ##$4000_0000            'set streamer rate to clk/2
                wrfast  #0,w                    'ready to write to hub at app continuation
.block          bmask   w,#12                   'try max streamer block size for whole bytes (8191)
                fle     w,app_bytes             'limit to number of bytes left
                sub     app_bytes,w             'update number of bytes left
                shl     w,#3                    'get number of bits, insert into streamer command
                setword wmode,w,#0
                shl     w,#1                    'double for number of spi_ck transitions
                wypin   w,#spi_ck               '2      start clock transitions
                waitx   #3                      '2+3    align clock transitions with input sampling
                xinit   wmode,#0                '2      start inputting spi_do data to hub
                waitxfi                         '?      wait for streamer to finish
                tjnz    app_bytes,#.block       'if more bytes left, read another block
                wrfast  #0,#0                   'done, ensure last data gets written to hub
                wrpin   #0,#spi_ck              'clear spi_ck smart pin
                coginit #0,#$00000              'relaunch cog 0 from $00000
'
'
' Data
'
w               long    ($100-app_start)*4      'initially, hub start address for additional app data
wmode           long    $C081_0000 + spi_do<<17 'streamer mode, 1-pin input, msb-first bytes to hub
app_bytes       long    0                       'number of bytes in application (set by prep_data)
checksum        long    0                       '"Prop" - sum of $100 loader longs (set by prep_data)
app_start                                       'data from here to $0FF is first part of application

' Example program which writes random values to P[63:56] every ~100ms using RCFAST
                byte    $FF,$F6,$DF,$F8,$1B,$0C,$60,$FD
                byte    $06,$FA,$DB,$F8,$42,$0F,$80,$FF
                byte    $1F,$00,$65,$FD,$EC,$FF,$9F,$FD
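The arithmetic in the loader's .block loop can be modelled in a few lines of Python to see the numbers it produces. This is only a sketch of the bookkeeping, not P2 code: `plan_blocks` is a made-up name, and `remaining` stands for app_bytes after the cog-resident part has already been subtracted.

```python
MAX_BLOCK = (1 << 13) - 1  # bmask w,#12 gives 8191 bytes per streamer block


def plan_blocks(remaining):
    """Return (bytes, bits, clock transitions) for each streamer block."""
    blocks = []
    while remaining:
        nbytes = min(MAX_BLOCK, remaining)  # fle w,app_bytes
        remaining -= nbytes                 # sub app_bytes,w
        nbits = nbytes << 3                 # shl w,#3 (goes into the command's low word)
        transitions = nbits << 1            # shl w,#1 (two spi_ck edges per bit)
        blocks.append((nbytes, nbits, transitions))
    return blocks


for block in plan_blocks(20000):
    print(block)
```

A full block is 8191 bytes, which is 65528 bits, so the bit count still fits in the 16-bit field that setword fills in.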
Here's the object code, for size:
Programmer code
00000- 04 00 90 FD 18 00 00 00 01 A0 00 F1 50 9E 98 F1 '............P...'
00010- 4F 02 00 11 02 9E 44 10 3F 9E 04 C6 28 9E 60 5D 'O.....D.?...(.`]'
00020- 50 00 68 5C 00 00 00 FF D0 03 64 FC 64 01 7C FC 'P.h\......d.d.|.'
00030- 00 B2 04 F6 00 05 DC FC 12 B4 60 FD 5A B2 00 F1 '..........`.Z...'
00040- 59 A2 80 F1 00 00 00 FF D4 A3 64 FC 59 7A 64 FD 'Y.........d.Yzd.'
00050- 50 78 64 FD 3C 94 0C FC 3C 02 1C FC 58 78 64 FD 'Pxd.<...<...Xxd.'
00060- 58 76 64 FD 00 00 A0 FF 1D 00 64 FD 64 01 7C FC 'Xvd.......d.d.|.'
00070- 74 02 04 F1 01 B2 80 F7 20 4E B4 F9 0F 64 BC F9 't....... N...d..'
00080- 0E B2 14 F2 52 4E B4 39 7F 64 BC 39 0F B2 14 F2 '....RN.9.d.9....'
00090- D8 4E B4 39 FF 64 BC 39 0E 0C 4C FB 12 40 4C FB '.N.9.d.9..L..@L.'
000A0- 68 00 B0 FD 0B 0C 4C FB 0F 04 4C FB F6 AB A0 FC 'h.....L...L.....'
000B0- 3C A4 24 FC 24 36 60 FD 50 00 B0 FD 00 03 9C F1 '<.$.$6`.P.......'
000C0- 00 B0 60 ED 01 AE 04 F1 0F AE CC F7 D4 FF 9F 5D '..`............]'
000D0- A0 FF 9F FD 59 7A 64 FD 58 7A 64 FD F6 A7 A0 FC '....Yzd.Xzd.....'
000E0- 3C 20 2C FC 24 36 60 0D 59 7A 64 FD 58 7A 64 FD '< ,.$6`.Yzd.Xzd.'
000F0- 10 EC 67 F0 57 EC 43 F5 08 EC 67 F0 1B EC FF F9 '..g.W.C...g.....'
00100- F6 A9 A0 FC 3C 80 2C FC 24 36 60 0D 34 B2 60 FD '....<.,.$6`.4.`.'
00110- F0 0B 4C FB 00 00 8C FC 3C 20 2C FC 1F 06 64 FD '..L.....< ,...d.'
00120- 00 AC A4 FC 24 36 60 FD 00 00 8C FC 00 B4 C4 FA '....$6`.........'
00130- 01 B4 D4 F7 D8 FF 9F CD 59 00 78 0C 64 05 00 00 '........Y.x.d...'
00140- D8 01 00 00 50 72 6F 70 00 10 00 00 08 00 F7 40 '....Prop.......@'
00150- 20 00 F7 40 00 08 F7 80 08 00 F5 C0 00 00 00 00 ' ..@............'
00160- 00 00 00 10 28 C4 65 FD '....(.e.
Loader code
00160- 00 3A 64 FC 19 36 98 F1 ' .:d..6..'
00170- 00 00 EC EC 3C 94 0C FC 50 78 64 FD 3C 02 1C FC '....<...Pxd.<...'
00180- 58 78 64 FD 00 00 A0 FF 1D 00 64 FD 19 00 88 FC 'Xxd.......d.....'
00190- 0C 32 CC F9 1B 32 20 F3 19 36 80 F1 03 32 64 F0 '.2...2 ..6...2d.'
001A0- 19 34 20 F9 01 32 64 F0 3C 32 24 FC 1F 06 64 FD '.4 ..2d.<2$...d.'
001B0- 00 34 A4 FC 24 36 60 FD F5 37 9C FB 00 00 8C FC '.4..$6`..7......'
001C0- 3C 00 0C FC 00 00 EC FC 8C 03 00 00 00 00 F5 C0 '<...............'
001D0- 00 00 00 00 00 00 00 00
Example application - blinks LEDs randomly
001D0- FF F6 DF F8 1B 0C 60 FD ' ......`.'
001E0- 06 FA DB F8 42 0F 80 FF 1F 00 65 FD EC FF 9F FD '....B.....e.....'
I guess this inline flash+loader approach means we just need to keep our final applications $1D8 = 472 bytes shorter than 512kB so the whole thing can be downloaded in one go?
Not a good idea for demo program to be writing random data to EEPROM pins when it's enabled!
> Very handy, those programming times look nice and responsive. We won't be waiting too long when we re-flash.
>
> I guess this inline flash+loader approach means we just need to keep our final applications $1D8 = 472 bytes shorter than 512kB so the whole thing can be downloaded in one go?
That is correct. I had always imagined the PC waiting for the device being programmed to finish, having some dialogue, but it's not really necessary. If the program time is very fast and it reboots quickly, so you can see that it works, maybe we don't need anything fancier. As I started working this out, it just kind of became what it now is.
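For anyone wanting the size budget as a number, here is the subtraction spelled out in Python. It assumes "512kB" means 512 x 1024 bytes of hub RAM and takes the $1D8 prefix size straight from the object code listing; `max_app_bytes` is just an illustrative name.

```python
HUB_BYTES = 512 * 1024  # assumption: "512kB" means 524288 bytes
PREFIX_BYTES = 0x1D8    # programmer + loader bytes ahead of the application


def max_app_bytes(hub_bytes=HUB_BYTES, prefix_bytes=PREFIX_BYTES):
    """Largest application that still downloads in one go with the prefix."""
    return hub_bytes - prefix_bytes


print(PREFIX_BYTES)     # 472
print(max_app_bytes())  # 523816
```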
> Nice seeing the streamer used for the programming too. Smooth.
It's funny how the fastest approach took the least amount of code.
> Chip,
> Not a good idea for demo program to be writing random data to EEPROM pins when it's enabled!
That crossed my mind. Oh, there could even be electrical conflicts. Maybe I'll change it to resistive drive. Then, there's the probability that the data in the flash could be disturbed.
> Not every Flash chip supports 32kB block-erase, it may be even quite specific to Winbond. 4kB and 64kB are the standard sizes.
Good to know. I'll change it to just use the 4KB and 64KB erase commands. The 32KB erase time wasn't much of a game-changer, anyway. Thanks, Ariba.
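A rough Python model of what the erase selection might look like with only the two standard sizes. This is a sketch, not the revised programmer: `pick_erase` is a made-up name, and keeping the 64KB threshold at $8000 bytes remaining (as in the posted code) is an assumption.

```python
ERASE_4KB = (0x20, 16)    # (SPI command, 256-byte pages per erased block)
ERASE_64KB = (0xD8, 256)


def pick_erase(bytes_left):
    """Choose the block-erase command from the bytes still to program."""
    # Assumption: same cut-over point as the posted code (bytes >= $8000).
    return ERASE_64KB if bytes_left >= 0x8000 else ERASE_4KB


cmd, pages = pick_erase(0x12345)
print(hex(cmd), pages)
```

The page count is what the .tst mask encodes in the posted code: the loop programs that many 256-byte pages before erasing the next block.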
Good point. BTW, what are the requirements that qualify a particular flash chip to be compatible with the P2 boot loader? Which commands and page sizes have to be supported? Frequency/timing should not be an issue, most chips support >100MHz.
The ROM booter tries to get the flash on-line, no matter what mode it might have been in. Then, it issues a read command ($03) and reads in $400 bytes:
'
'
' Try to load from SPI memory
'
try_spi         drvh    #spi_cs                 'drive spi_cs high
                drvl    #spi_ck                 'drive spi_ck low
                neg     pb,#1                   'set command bits to all 1's
                drvh    #spi_do                 'drive spi_do high in case quad/dual mode
                callpa  #2,#spi_cmd             'send exit-quad command
                callpa  #8,#spi_cmd             'send exit-quad command
                callpa  #16,#spi_cmd            'send exit-dual command
                fltl    #spi_do                 'float spi_do
                callpb  #$66,#spi_cmd8          'send reset-enable command
                callpb  #$99,#spi_cmd8          'send reset command
                waitx   ##rc_max/20_000         'wait 50us
                callpb  #$04,#spi_cmd8          'send write-disable command to clear WEL
.wait           callpb  #$05,#spi_cmd8          'send read-status command
                call    #spi_in                 'get status
                testbn  x,#1            wz      'if WEL high, no SPI memory (z=0)
        if_nz   jmp     #.fail
                testbn  x,#0            wz      'if BUSY high, wait for erase/write to finish
        if_nz   jmp     #.wait
                mov     pa,#32                  'send read-from-start command
                callpb  #$03,#spi_cmd
                decod   y,#10                   'ready to input $400 bytes from SPI
                wrfast  #0,#0                   'ready to write bytes to hub
.data           call    #spi_in                 'get byte
                wfbyte  x                       'store byte into hub
                djnz    y,#.data                'loop for next byte (y=0 after)
                rdfast  #0,#0                   'ready to read longs from hub
                rep     @.sum,#$100             'ready to read and sum $100 longs
                rflong  z                       'read long
                add     y,z                     'sum long
.sum            cmp     y,csum          wz      'verify checksum, z=1 if okay
                bitz    flags,#spi_ok           'if program verified, set spi_ok flag
.fail
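The checksum step at the end of that listing can be mirrored on the PC side. The Python sketch below assumes the ROM's csum constant is the little-endian long for "Prop", which is what the programmer posted earlier targets. The helper names (`sum_longs`, `patch_checksum`, `rom_accepts`) are invented for illustration.

```python
import struct

TARGET = int.from_bytes(b"Prop", "little")  # assumed ROM csum value, $706F7250


def sum_longs(image):
    """32-bit wrapping sum of the $100 little-endian longs in a 1KB image."""
    return sum(struct.unpack("<256L", image)) & 0xFFFFFFFF


def patch_checksum(image, offset):
    """Return a copy whose longs sum to TARGET; offset is the checksum long's byte position."""
    assert len(image) == 0x400 and offset % 4 == 0
    img = bytearray(image)
    img[offset:offset + 4] = bytes(4)  # checksum field counts as zero while summing
    fix = (TARGET - sum_longs(bytes(img))) & 0xFFFFFFFF
    img[offset:offset + 4] = struct.pack("<L", fix)
    return bytes(img)


def rom_accepts(image):
    """Model of the 'cmp y,csum wz' test."""
    return sum_longs(image) == TARGET


blank = bytes(0x400)
print(rom_accepts(blank), rom_accepts(patch_checksum(blank, 0x3FC)))
```

So, as far as this listing shows, the only thing the first $400 bytes need is longs that sum to that constant; what the loader does after that is up to the loader.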