Faster SPI Bus Transfers - Page 2 — Parallax Forums

Comments

  • evanh Posts: 15,916
    Seairth wrote: »
    Reading back through the docs, I now see the two-clock delay comment. For slaves that can read on the rising edge, I suppose you could get down to sysclock/4 (so that output is effectively written on the falling edge).
    I think it's worse. While it takes two clocks for the smartpin to see the clock pin change, it also takes another two clocks for the shift out to appear at the sending data pin. I'd need to double check.
  • cgracey Posts: 14,155
    edited 2020-01-19 00:31
    Yes, this technique would work for 1, 2, 4, 8, 16, and 32-bit widths.

    I realized today it can also work for any size transfer. By setting the count in the streamer command to $FFFF (infinite), you could control the transfer size by the number of transitions expressed in D for the WYPIN instruction. You would wait for the cpin's IN to go high, indicating the clock transitions were finished. Then, do an XSTOP. Actually, there would be a few bits of overrun in that case. It would be better to record CT right before you begin the initiation sequence, then once begun, set up a WAITCT for the point in time two clocks before you will do an XSTOP to stop the streamer.

    I looked into two-bit data mode for our flash chip, but the bits are reversed. D0 is above D1. So, you would have to swap even and odd bits, before or after the transfer. Or, you could just permit all bit pairs to be reversed in the flash memory. The data pins were arranged this way, so that if you connected up D2 and D3 below for QSPI, you would have a contiguous stretch of pins that were ordered, albeit upside down, in an integrally-placed nibble at P[56:59].
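A minimal host-side sketch of the "swap even and odd bits" fix-up mentioned above, assuming 32-bit words. The function name `swap_bit_pairs` is made up for illustration; it is not part of any P2 tool.

```python
def swap_bit_pairs(word: int) -> int:
    """Swap bit 0 with bit 1, bit 2 with bit 3, and so on, in a 32-bit word."""
    odd_bits = word & 0xAAAAAAAA   # bits 1, 3, 5, ...
    even_bits = word & 0x55555555  # bits 0, 2, 4, ...
    return ((odd_bits >> 1) | (even_bits << 1)) & 0xFFFFFFFF


if __name__ == "__main__":
    # Bit pairs 00 01 10 11 become 00 10 01 11.
    print(hex(swap_bit_pairs(0x1B)))
```

Applying the swap twice returns the original word, so the same routine would serve before writing to the flash or after reading from it.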
  • evanh Posts: 15,916
    I was thinking about rearranging the bit order of the burst data anyway. Wasn't planning on delving into it until after I've done the mode-checking code to work out what each SPI device supports. Alas, I've had some trouble with my teeth and just haven't been able to concentrate much of late.

  • cgracey Posts: 14,155
    No worries, Evanh. I hope your teeth get straightened out soon. No fun to be in discomfort.
  • cgracey Posts: 14,155
    edited 2020-01-19 12:03
    I got the second-stage boot loader done. It's only 18 longs. Using RCFAST, it loads 1KB every ~700us at clk/2 rate.

    This program goes into the 8-pin flash at $000000..$0003FF, while the application that will be loaded into the hub starting at $00000 follows in the flash starting at $000400.

    Next, I need to make the code that programs this loader, plus the main application's data, into the flash. Then I can integrate them into PNut.exe so that with one key, you can compile, download, and program the flash with PASM or Spin code.
    ' *** Fast-load SPI flash program into hub memory and execute ***
    
    CON		spi_cs = 61	'low on entry, flash reading at $400
    		spi_ck = 60	'low on entry, cycle for next bit
    		spi_di = 59	'floating on entry 
    		spi_do = 58	'floating on entry, flash outputting MSB of byte[$400]
    
    ' This $100-long block of code gets read from the 8-pin flash, from addresses
    ' $000000..$0003FF, into cog registers $000..$0FF, then executed by the ROM booter.
    '
    ' On entry, the flash is outputting bit 7 of the byte at address $400. Starting
    ' there, this program quickly reads 1KB blocks into hub $00000..<=$FFFFF and then
    ' does a 'COGINIT #0,#$00000' to launch the loaded application.
    
    DAT		org
    
    		wrpin	#%01_00101_0,#spi_ck	'set spi_ck for transition output, drives low
    		fltl	#spi_ck			'reset smart pin
    		wxpin	#1,#spi_ck		'set timebase to 1 clock per transition
    		drvl	#spi_ck			'enable smart pin
    
    		setxfrq	##$4000_0000		'set streamer rate to clk/2
    		wrfast	#0,#0			'ready to write to $00000+
    
    nextkb		wypin	tran16k,#spi_ck	'2	start clock transitions
    		waitx	#3		'2+3	align clock transitions with input sampling
    		xinit	bit8k,#0	'2	start inputting spi_do data to hub
    		waitxfi			'2+16k	wait for streamer to finish
    		djnz	blocks,#nextkb	'4	get next 1KB block
    
    		wrfast	#0,#0			'ensure last data written to hub
    
    		wrpin	#0,#spi_ck		'clear smart pin
    
    		coginit	#0,#$00000		'relaunch cog from $00000
    
    
    tran16k		long	$4000			'16K transitions for 8K bits
    bit8k		long	$C081_2000 + spi_do<<17	'streamer mode, 1-pin input, 8K bits
    
    		orgf	$100-2			'space to $100 longs
    
    blocks		long	1			'number of 1KB blocks to load (set by compiler)
    checksum	long	-1			'"Prop" - sum of these longs (set by compiler)
    

    Here's the raw data for this loader. Allocating 256 longs for a second-stage loader was overkill in the ROM booter code.
    00000- 3C 94 0C FC 50 78 64 FD 3C 02 1C FC 58 78 64 FD   '<...Pxd.<...Xxd.'
    00010- 00 00 A0 FF 1D 00 64 FD 00 00 8C FC 3C 1E 24 FC   '......d.....<.$.'
    00020- 1F 06 64 FD 00 20 A4 FC 24 36 60 FD FB FD 6D FB   '..d.. ..$6`...m.'
    00030- 00 00 8C FC 3C 00 0C FC 00 00 EC FC 00 40 00 00   '....<........@..'
    00040- 00 20 F5 C0 00 00 00 00 00 00 00 00 00 00 00 00   '. ..............'
    00050- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00060- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00070- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00080- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00090- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    000A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    000B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    000C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    000D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    000E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    000F0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00100- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00110- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00120- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00130- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00140- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00150- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00160- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00170- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00180- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00190- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    001A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    001B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    001C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    001D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    001E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    001F0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00200- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00210- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00220- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00230- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00240- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00250- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00260- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00270- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00280- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00290- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    002A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    002B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    002C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    002D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    002E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    002F0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00300- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00310- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00320- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00330- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00340- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00350- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00360- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00370- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00380- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    00390- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    003A0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    003B0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    003C0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    003D0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    003E0- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   '................'
    003F0- 00 00 00 00 00 00 00 00 01 00 00 00 FF FF FF FF   '................'
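As a cross-check of the listing against the dump, here is a small Python sketch that rebuilds the `bit8k` streamer word and the `tran16k` transition count. The helper name `streamer_word` is hypothetical; the constants come from the source above.

```python
SPI_DO = 58  # pin number from the CON block


def streamer_word(base: int, pin: int) -> int:
    """Mirror the expression `base + pin<<17` from the listing (shift first, then add)."""
    return (base + (pin << 17)) & 0xFFFFFFFF


bit8k = streamer_word(0xC081_2000, SPI_DO)
bits = bit8k & 0xFFFF   # low word of the command is the bit count
transitions = bits * 2  # two clock-pin transitions per bit

if __name__ == "__main__":
    print(f"bit8k   = ${bit8k:08X}, bytes {bit8k.to_bytes(4, 'little').hex(' ').upper()}")
    print(f"tran16k = ${transitions:04X} for {bits} bits ({bits // 8} bytes)")
```

The little-endian bytes of `bit8k` should match `00 20 F5 C0` at offset $40 in the dump, and the transition count should match `00 40 00 00` at offset $3C.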
    
  • evanh Posts: 15,916
    You're missing SPI chip select and the read command ($03) and address.
  • cgracey Posts: 14,155
    edited 2020-01-19 12:14
    evanh wrote: »
    You're missing SPI chip select and the read command ($03) and address.

    No, the ROM booter transfers control to the second-stage booter with the flash being read at $400, with bit7 coming out of its SPI_DO pin. You're already on the bike, you just have to pedal it.
  • evanh Posts: 15,916
    Oh, that's a tad hairy!

    It would explain the reason I had to do so many steps to reset everything when configuring events and the like.
  • cgracey Posts: 14,155
    evanh wrote: »
    Oh, that's a tad hairy!

    It would explain the reason I had to do so many steps to reset everything when configuring events and the like.

    You mean that you've made second-stage booter code, already, yourself?

    For normal application download, all smart pins are cleared to zero mode, and made inputs, so there should be no trace of anything. What were you seeing?
  • cgracey Posts: 14,155
    When the second-stage SPI booter gets control, there are no smart pins configured, just SPI_CS and SPI_CLK are low outputs and the flash is in read mode - that's it.
  • Wuerfel_21 Posts: 5,053
    edited 2020-01-19 17:17
    cgracey wrote: »
    I looked into two-bit data mode for our flash chip, but the bits are reversed. D0 is above D1. So, you would have to swap even and odd bits, before or after the transfer. Or, you could just permit all bit pairs to be reversed in the flash memory. The data pins were arranged this way, so that if you connected up D2 and D3 below for QSPI, you would have a contiguous stretch of pins that were ordered, albeit upside down, in an integrally-placed nibble at P[56:59].

    No such luck with the SD card. In 4-bit SD bus mode (as compared to SPI mode), CS turns into D3, DI turns into CMD and DO turns into D0 (and D1/D2 are often not hooked up at all). So I guess one needs a full 4 extra pins to hook the data bits up to. (I assume there's no trouble in connecting two P2 pins to the same high-speed data line?)
    Also speaking of which, I guess there might be some trouble if there's response data coming in on the CMD line while a data transfer is active (I'm not entirely sure that is avoidable, the spec document is terrible). Fast SD access might have to be a two-cog job.
  • evanh Posts: 15,916
    edited 2020-01-19 22:11
    cgracey wrote: »
    evanh wrote: »
    Oh, that's a tad hairy!

    It would explain the reason I had to do so many steps to reset everything when configuring events and the like.

    You mean that you've made second-stage booter code, already, yourself?

    For normal application download, all smart pins are cleared to zero mode, and made inputs, so there should be no trace of anything. What were you seeing?
    Brian made it. I tinkered with it for speed - a dualSPI mode using smartpins. Eric has it included with FlexGUI. I'm reworking it now to handle different SPI flash parts so it can autodetect supported SPI modes.

    It would have just been the enabled outputs. I was being cheap in early testing of the rework and not doing any DIRL or FLTL before reconfiguring the pins. It had some oddball side effects, including the first event not triggering unless I used both a POLLSE1 and an initial blind event.
  • Wuerfel_21 wrote: »
    Also speaking of which, I guess there might be some trouble if there's response data coming in on the CMD line while a data transfer is active (I'm not entirely sure that is avoidable, the spec document is terrible). Fast SD access might have to be a two-cog job.
    Possibly two COGs, yes, but hopefully some way could be found to have it work with a single COG if the output clock is under our control. Perhaps the clock can be slowed while decoding the incoming response on CMD and collecting/outputting DAT nibbles, and then sped up for the remainder of the data transfer once the CMD response has been fully received. Maybe an independent smartpin could be allocated to the CMD pin in serial mode (to detect the first response start bit), which could be examined while the streamer reads/writes the nibbles (we may still need to consider a data CRC here too). Whether a dynamic clock variation like this is allowed, or how it may affect SD block writes if they are somehow timed off it, I'm not sure.
  • cgracey wrote: »
    I'm working on the 2nd-stage flash booter for application launching. The first thing to sort out is how to quickly program the flash, so the user doesn't have to wait long. Then, the loader which executes on reset must pull the data from the flash into memory very quickly.

    So with the first straightforward approach with clk/4 (200ns per bit) you could load 512kB in less than one second. With the optimised clk/2 transfer it's less than half a second. I think most programs are much smaller and load in virtually no time. So there's no need for further speed optimisation. If anybody has to transfer large files to play sounds, videos or whatsoever that could be handled with objects that are coded for speed and can be configured especially for the hardware they run on.

    IMHO, the bootloader has to work on any possible hardware and should not depend on special features like 2 or 4 bit SPI modes. If you think you need more speed at any cost please make it optional.
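The load-time estimates above can be checked with a short calculation. This sketch assumes a 20 MHz system clock (implied by "clk/4 (200ns per bit)") and ignores command and setup overhead; `load_time_s` is an illustrative name only.

```python
def load_time_s(num_bytes: int, sysclk_hz: float, clocks_per_bit: int) -> float:
    """Time to shift num_bytes over 1-bit SPI at sysclk / clocks_per_bit."""
    return num_bytes * 8 * clocks_per_bit / sysclk_hz


if __name__ == "__main__":
    size = 512 * 1024    # full 512KB hub image
    sysclk = 20_000_000  # assumed: 20 MHz, so clk/4 gives 200 ns per bit
    print(f"clk/4: {load_time_s(size, sysclk, 4):.3f} s")
    print(f"clk/2: {load_time_s(size, sysclk, 2):.3f} s")
```

Under that assumption the two cases come out just under one second and just under half a second respectively.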
  • evanh Posts: 15,916
    I take it you've got some urgency for your other board to work?

  • cgracey Posts: 14,155
    edited 2020-01-20 10:34
    ManAtWork wrote: »
    cgracey wrote: »
    I'm working on the 2nd-stage flash booter for application launching. The first thing to sort out is how to quickly program the flash, so the user doesn't have to wait long. Then, the loader which executes on reset must pull the data from the flash into memory very quickly.

    So with the first straightforward approach with clk/4 (200ns per bit) you could load 512kB in less than one second. With the optimised clk/2 transfer it's less than half a second. I think most programs are much smaller and load in virtually no time. So there's no need for further speed optimisation. If anybody has to transfer large files to play sounds, videos or whatsoever that could be handled with objects that are coded for speed and can be configured especially for the hardware they run on.

    IMHO, the bootloader has to work on any possible hardware and should not depend on special features like 2 or 4 bit SPI modes. If you think you need more speed at any cost please make it optional.

    This is using standard SPI mode, which is 1 data bit. I've got it loading 512KB in 350ms now using the built-in RCFAST oscillator (20MHz+). There's no reliability problem in doing this, at all. It was just a matter of figuring how to best use the P2 peripherals to get the clk/2 data rate.
  • evanh wrote: »
    I take it you've got some urgency for your other board to work?

    No urgency at all! I'm currently a bit busy with other projects anyway. I just don't want Chip to waste his precious time on something that has to be changed back eventually because of compatibility problems.

  • evanh Posts: 15,916
    Good to hear.
  • rogloh wrote: »
    Wuerfel_21 wrote: »
    Also speaking of which, I guess there might be some trouble if there's response data coming in on the CMD line while a data transfer is active (I'm not entirely sure that is avoidable, the spec document is terrible). Fast SD access might have to be a two-cog job.
    Possibly two COGs, yes, but hopefully some way could be found to have it work with a single COG if the output clock is under our control. Perhaps the clock can be slowed while decoding the incoming response on CMD and collecting/outputting DAT nibbles, and then sped up for the remainder of the data transfer once the CMD response has been fully received. Maybe an independent smartpin could be allocated to the CMD pin in serial mode (to detect the first response start bit), which could be examined while the streamer reads/writes the nibbles (we may still need to consider a data CRC here too). Whether a dynamic clock variation like this is allowed, or how it may affect SD block writes if they are somehow timed off it, I'm not sure.

    Well, there are two start bits (the spec calls the second the "transmission bit", but it seems to just be a second zero bit?), so there might be time to cleanly slow the clock in such cases, even at high speed relative to sysclock. Then again, to get a clock higher than 50MHz, one has to switch to 1.8V signalling (that also needs another pin and some kind of transistor, since apparently one needs to power-cycle the card to get it back into 3.3V/SPI mode at that point?). I think there was some trouble with reading fast 1.8V signals though?
  • cgracey Posts: 14,155
    edited 2020-01-20 20:18
    I got the flash programmer and loader done.

    It's just some bytes that you tack onto the front of your application's bytes, and then download. It programs your application into the SPI flash with a small second-stage loader that loads and runs your application on reset. All SPI activity happens at clk/2 in RCFAST. I just need to integrate it into PNut.exe next.

    I documented the program and boot times:
    ' *** SPI FLASH PROGRAMMER AND LOADER
    ' *** Works with 16MB flash W25Q128JV on P2 Eval board.
    ' *** Writes loader and application to SPI flash, then reboots to execute.
    '
    '	Program/Boot performance (RCFAST)
    '
    '			program		boot
    '	bytes		time		time
    '	-------------------------------------
    '	0..2KB		30ms		10ms
    '	   4KB		60ms		11ms
    '	   8KB		90ms		14ms
    '	  16KB		125ms		20ms
    '	  32KB		190ms		30ms
    '	  64KB		260ms		52ms
    '	 128KB		500ms		95ms
    '	 256KB		1.00s		184ms
    '	 512KB		1.95s		358ms
    '
    ' Use:	1) append application bytes at app_start
    '	2) set app_size to number of application bytes
    '	3) download and execute composite image (uses RCFAST)
    '	4) after programming is complete, chip will reboot
    '
    CON		spi_cs = 61
    		spi_ck = 60
    		spi_di = 59
    		spi_do = 58
    
    '****************
    '*  Programmer  *
    '****************
    '
    DAT		org
    
    		jmp	#prep_data		'@0: jump to prep_data
    app_size	long	24 '(per example)	'@4: application size in bytes (set by compiler)
    '
    '
    ' If loader + application are under $400 bytes, pad with zeros and adjust app_size
    '
    prep_data	add	app_end,app_size	'make app_end
    
    		sub	loader_end,app_end  wcz	'is loader_end > app_end ?
    
    	if_a	add	app_size,loader_end	'if loader_end > app_end, adjust app_size so that loader + app take $400 bytes
    
    	if_a	shr	loader_end,#2		'if loader_end > app_end, fill app_end..loader_end with zeros (overfills 1..4 bytes)
    	if_b	mov	loader_end,#$100/4-1	'if loader_end < app_end, fill app_end..+255 with zeros to keep last page clean
    	if_ne	setq	loader_end
    	if_ne	wrlong	#0,app_end
    
    		wrlong	app_size,##@app_bytes	'set app_bytes in loader
    '
    '
    ' Calculate loader checksum
    '
    		rdfast	#0,#@loader		'sum $100 longs of loader
    		mov	x,#0
    		rep	#2,#$100
    		rflong	y
    		add	x,y
    
    		sub	csum,x			'compute checksum
    
    		wrlong	csum,##@checksum	'set checksum in loader
    '
    '
    ' Get ready to program flash
    '
    		drvh	#spi_cs			'spi_cs high
    
    		fltl	#spi_ck			'reset smart pin spi_ck
    		wrpin	#%01_00101_0,#spi_ck	'set spi_ck for transition output, starts out low
    		wxpin	#1,#spi_ck		'set timebase to 1 clock per transition
    		drvl	#spi_ck			'enable smart pin
    
    		drvl	#spi_di
    
    		setxfrq	##$4000_0000		'set streamer rate to clk/2
    
    		rdfast	#0,#@loader		'start fifo read at loader
    
    		add	app_size,#@app_start-@loader	'get total number of bytes to program
    '
    '
    ' Main loop - erase 4/32/64KB block, program 16/128/256 sequential 256-byte pages, repeat
    '
    .block		encod	x,app_size		'pick fastest block-erase command
    		setd	.cmd,#$20		'set 4KB erase (25ms)
    		sets	.tst,#$0F
    		cmp	x,#14		wc	'if bytes >= $4000, set 32KB erase (100ms)
    	if_nc	setd	.cmd,#$52
    	if_nc	sets	.tst,#$7F
    		cmp	x,#15		wc	'if bytes >= $8000, set 64KB erase (140ms)
    	if_nc	setd	.cmd,#$D8
    	if_nc	sets	.tst,#$FF
    
    		callpa	#$06,#spi_cmd8		'write enable
    .cmd		callpa	#$20,#spi_cmd32		'erase 4/32/64KB block
    
    		call	#spi_wait		'wait for erase complete
    
    .page		callpa	#$06,#spi_cmd8		'write enable
    		callpa	#$02,#spi_cmd32		'program 256-byte page
    
    		xinit	rmode,pa		'2	start outputting 256*8 bits
    		wypin	tranp,#spi_ck		'2	start 256*8*2 clock transitions
    		waitxfi				'~4k	wait for streamer done
    
    		call	#spi_wait		'wait for program complete
    
    		sub	app_size,#$100	wcz	'if done, reset chip to reboot
    	if_be	hubset	reset
    
    		add	addr,#$0001		'inc address by 256
    
    .tst		test	addr,#$000F	wz	'if not 4/32/64KB block boundary, program next page
    	if_nz	jmp	#.page
    
    		jmp	#.block			'else, erase next block
    '
    '
    ' SPI command 8-bit - use callpa
    '
    spi_cmd8	drvh	#spi_cs			'new command
    		drvl	#spi_cs
    
    		xinit	bmode,pa		'2	start outputting 8 bits
    		wypin	#16,#spi_ck		'2	start 16 clock transitions
    	_ret_	waitxfi				'~16	wait for streamer to finish
    '
    '
    ' SPI command 32-bit - use callpa
    '
    spi_cmd32	drvh	#spi_cs			'new command
    		drvl	#spi_cs
    
    		shl	pa,#16			'shift command up
    		or	pa,addr			'or in address
    		shl	pa,#8			'shift up to get bytes: command[7:0], addr[15:0], $00
    		movbyts	pa,#%%0123		'rearrange bytes for top-to-bottom output
    
    		xinit	lmode,pa		'2	start outputting 32 bits
    		wypin	#64,#spi_ck		'2	start 64 clock transitions
    	_ret_	waitxfi				'~64	wait for streamer to finish
    '
    '
    ' SPI wait
    '
    spi_wait	getptr	x			'remember fifo pointer
    
    .try		callpa	#$05,#spi_cmd8		'issue read-status-register command
    
    		wrfast	#0,#0			'get result, write byte to hub at $00000
    
    		wypin	#16,#spi_ck		'2	start 16 clock transitions
    		waitx	#3			'2+3	align clock transitions with input sampling
    		xinit	smode,#0		'2	start inputting spi_do data to hub
    		waitxfi				'~16	wait for streamer to finish
    
    		wrfast	#0,#0			'wait for byte written to hub
    
    		rdbyte	y,#0			'get byte and check busy bit
    		test	y,#$01		wc
    	if_c	jmp	#.try			'if busy set, try again
    
    	_ret_	rdfast	#0,x			'busy clear, restore fifo read
    '
    '
    ' Data
    '
    loader_end	long	@loader + $400
    app_end		long	@app_start
    csum		byte	"Prop"
    
    tranp		long	256 * 8 * 2
    bmode		long	$4081_0008 + spi_di<<17	'streamer mode, 1-pin output, msb-first byte from s
    lmode		long	$4081_0020 + spi_di<<17	'streamer mode, 1-pin output, msb-first long from s
    rmode		long	$8081_0800 + spi_di<<17	'streamer mode, 1-pin output, msb-first $100 bytes from hub
    smode		long	$C081_0008 + spi_do<<17	'streamer mode, 1-pin input, msb-first byte to hub
    
    addr		long	$000000
    
    reset		long	$1000_0000
    
    x		res	1
    y		res	1
    
    
    '************
    '*  Loader  *
    '************
    '
    ' The ROM booter reads this code from the 8-pin flash, from addresses $000000..$0003FF,
    ' into cog registers $000..$0FF, then executes it in order to load the application.
    '
    ' The initial application data trailing this code at app_start..$0FF needs to be moved
    ' to hub $00000+. Then, any additionally-needed application data must be read from the
    ' flash and stored in the hub from where the initial application data left off.
    '
    ' Once all application data has been moved/loaded into the hub, cog 0 is restarted from
    ' hub $00000, in order to execute the application.
    '
    ' On entry, both spi_cs and spi_ck are low outputs, the flash is outputting bit7 of the
    ' byte at address $400 into spi_do. By cycling spi_ck, any additional application data
    ' can be read.
    '
    		org
    '
    '
    ' First, move application data in cog app_start..$0FF into hub $00000+.
    ' If application bytes met or exceeded, launch app
    '
    loader		setq	#$100-app_start-1	'move code from cog app_start..$0FF to hub $00000+
    		wrlong	app_start,#0
    
    		sub	app_bytes,w	wcz	'if app_bytes met or exceeded, done
    	if_be	coginit	#0,#$00000		'relaunch cog 0 from $00000
    '
    '
    ' Need to load more application data from flash, read in remaining bytes, launch app
    '
    		wrpin	#%01_00101_0,#spi_ck	'set spi_ck smart pin for transitions, drives low
    		fltl	#spi_ck			'reset smart pin
    		wxpin	#1,#spi_ck		'set transition timebase to clk/1
    		drvl	#spi_ck			'enable smart pin
    
    		setxfrq	##$4000_0000		'set streamer rate to clk/2
    
    		wrfast	#0,w			'ready to write to hub at app continuation
    
    .block		bmask	w,#12			'try max streamer block size for whole bytes (8191)
    		fle	w,app_bytes		'limit to number of bytes left
    		sub	app_bytes,w		'update number of bytes left
    
    		shl	w,#3			'get number of bits, insert into streamer command
    		setword	wmode,w,#0
    		shl	w,#1			'double for number of spi_ck transitions
    
    		wypin	w,#spi_ck		'2	start clock transitions
    		waitx	#3			'2+3	align clock transitions with input sampling
    		xinit	wmode,#0		'2	start inputting spi_do data to hub
    		waitxfi				'?	wait for streamer to finish
    
    		tjnz	app_bytes,#.block	'if more bytes left, read another block
    
    		wrfast	#0,#0			'done, ensure last data gets written to hub
    
    		wrpin	#0,#spi_ck		'clear spi_ck smart pin
    
    		coginit	#0,#$00000		'relaunch cog 0 from $00000
    '
    '
    ' Data
    '
    w		long	($100-app_start)*4	'initially, hub start address for additional app data
    wmode		long	$C081_0000 + spi_do<<17	'streamer mode, 1-pin input, msb-first bytes to hub
    
    app_bytes	long	0			'number of bytes in application (set by prep_data)
    checksum	long	0			'"Prop" - sum of $100 loader longs (set by prep_data)
    
    app_start					'data from here to $0FF is first part of application
    
    
    
    ' Example program which writes random values to P[63:56] every ~100ms using RCFAST
    
    byte	$FF,$F6,$DF,$F8,$1B,$0C,$60,$FD
    byte	$06,$FA,$DB,$F8,$42,$0F,$80,$FF
    byte	$1F,$00,$65,$FD,$EC,$FF,$9F,$FD
    


    Here's the object code, for size:
    Programmer code
    
    00000- 04 00 90 FD 18 00 00 00 01 A0 00 F1 50 9E 98 F1   '............P...'
    00010- 4F 02 00 11 02 9E 44 10 3F 9E 04 C6 28 9E 60 5D   'O.....D.?...(.`]'
    00020- 50 00 68 5C 00 00 00 FF D0 03 64 FC 64 01 7C FC   'P.h\......d.d.|.'
    00030- 00 B2 04 F6 00 05 DC FC 12 B4 60 FD 5A B2 00 F1   '..........`.Z...'
    00040- 59 A2 80 F1 00 00 00 FF D4 A3 64 FC 59 7A 64 FD   'Y.........d.Yzd.'
    00050- 50 78 64 FD 3C 94 0C FC 3C 02 1C FC 58 78 64 FD   'Pxd.<...<...Xxd.'
    00060- 58 76 64 FD 00 00 A0 FF 1D 00 64 FD 64 01 7C FC   'Xvd.......d.d.|.'
    00070- 74 02 04 F1 01 B2 80 F7 20 4E B4 F9 0F 64 BC F9   't....... N...d..'
    00080- 0E B2 14 F2 52 4E B4 39 7F 64 BC 39 0F B2 14 F2   '....RN.9.d.9....'
    00090- D8 4E B4 39 FF 64 BC 39 0E 0C 4C FB 12 40 4C FB   '.N.9.d.9..L..@L.'
    000A0- 68 00 B0 FD 0B 0C 4C FB 0F 04 4C FB F6 AB A0 FC   'h.....L...L.....'
    000B0- 3C A4 24 FC 24 36 60 FD 50 00 B0 FD 00 03 9C F1   '<.$.$6`.P.......'
    000C0- 00 B0 60 ED 01 AE 04 F1 0F AE CC F7 D4 FF 9F 5D   '..`............]'
    000D0- A0 FF 9F FD 59 7A 64 FD 58 7A 64 FD F6 A7 A0 FC   '....Yzd.Xzd.....'
    000E0- 3C 20 2C FC 24 36 60 0D 59 7A 64 FD 58 7A 64 FD   '< ,.$6`.Yzd.Xzd.'
    000F0- 10 EC 67 F0 57 EC 43 F5 08 EC 67 F0 1B EC FF F9   '..g.W.C...g.....'
    00100- F6 A9 A0 FC 3C 80 2C FC 24 36 60 0D 34 B2 60 FD   '....<.,.$6`.4.`.'
    00110- F0 0B 4C FB 00 00 8C FC 3C 20 2C FC 1F 06 64 FD   '..L.....< ,...d.'
    00120- 00 AC A4 FC 24 36 60 FD 00 00 8C FC 00 B4 C4 FA   '....$6`.........'
    00130- 01 B4 D4 F7 D8 FF 9F CD 59 00 78 0C 64 05 00 00   '........Y.x.d...'
    00140- D8 01 00 00 50 72 6F 70 00 10 00 00 08 00 F7 40   '....Prop.......@'
    00150- 20 00 F7 40 00 08 F7 80 08 00 F5 C0 00 00 00 00   ' ..@............'
    00160- 00 00 00 10 28 C4 65 FD                           '....(.e.'
    
    Loader code
    
    00160-                         00 3A 64 FC 19 36 98 F1   '        .:d..6..'
    00170- 00 00 EC EC 3C 94 0C FC 50 78 64 FD 3C 02 1C FC   '....<...Pxd.<...'
    00180- 58 78 64 FD 00 00 A0 FF 1D 00 64 FD 19 00 88 FC   'Xxd.......d.....'
    00190- 0C 32 CC F9 1B 32 20 F3 19 36 80 F1 03 32 64 F0   '.2...2 ..6...2d.'
    001A0- 19 34 20 F9 01 32 64 F0 3C 32 24 FC 1F 06 64 FD   '.4 ..2d.<2$...d.'
    001B0- 00 34 A4 FC 24 36 60 FD F5 37 9C FB 00 00 8C FC   '.4..$6`..7......'
    001C0- 3C 00 0C FC 00 00 EC FC 8C 03 00 00 00 00 F5 C0   '<...............'
    001D0- 00 00 00 00 00 00 00 00
    
    Example application - blinks LEDs randomly
    
    001D0-                         FF F6 DF F8 1B 0C 60 FD   '        ......`.'
    001E0- 06 FA DB F8 42 0F 80 FF 1F 00 65 FD EC FF 9F FD   '....B.....e.....'
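For reference, the loader's `.block` loop splits the remaining application bytes into streamer-sized pieces. A rough Python model of that bookkeeping follows, assuming the loop is entered with at least one byte left; `plan_blocks` is an illustrative name, not part of the loader.

```python
MAX_BLOCK_BYTES = (1 << 13) - 1  # bmask w,#12 gives 8191 whole bytes per streamer command


def plan_blocks(app_bytes: int) -> list[tuple[int, int, int]]:
    """Return (bytes, bits, clock transitions) for each streamer block."""
    blocks = []
    while app_bytes > 0:
        n = min(MAX_BLOCK_BYTES, app_bytes)  # fle w,app_bytes
        app_bytes -= n                       # sub app_bytes,w
        bits = n * 8                         # shl w,#3, goes into the streamer command
        blocks.append((n, bits, bits * 2))   # shl w,#1, goes to WYPIN
    return blocks


if __name__ == "__main__":
    for n, bits, transitions in plan_blocks(10_000):
        print(f"{n:5d} bytes -> {bits:6d} bits, {transitions:6d} transitions")
```

Capping at 8191 bytes keeps the bit count at 65,528, which still fits in the 16-bit count field of the streamer command.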
    
  • Very handy, those programming times look nice and responsive. We won't be waiting too long when we re-flash.

    I guess this inline flash+loader approach means we just need to keep our final applications $1D8 = 472 bytes shorter than 512kB so the whole thing can be downloaded in one go?
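A quick check of that size budget, assuming the $1D8 overhead is the offset at which the example application starts in the object code above:

```python
HUB_BYTES = 512 * 1024       # total hub RAM available for the download
PROGRAMMER_OVERHEAD = 0x1D8  # programmer + loader bytes ahead of the application

max_app_bytes = HUB_BYTES - PROGRAMMER_OVERHEAD

if __name__ == "__main__":
    print(f"overhead = {PROGRAMMER_OVERHEAD} bytes")
    print(f"largest application = {max_app_bytes} bytes")
```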

  • evanh Posts: 15,916
    Chip,
    Not a good idea for demo program to be writing random data to EEPROM pins when it's enabled!

  • evanh Posts: 15,916
    Nice seeing the streamer used for the programming too. Smooth. :)

  • Ariba Posts: 2,690
    Not every Flash chip supports 32kB block-erase, it may be even quite specific to Winbond. 4kB and 64kB are the standard sizes.
  • cgracey Posts: 14,155
    > @rogloh said:
    > Very handy, those programming times look nice and responsive. We won't be waiting too long when we re-flash.
    >
    > I guess this inline flash+loader approach means we just need to keep our final applications $1D8 = 472 bytes shorter than 512kB so the whole thing can be downloaded in one go?

    That is correct. I had always imagined the PC waiting for the device being programmed to finish, having some dialogue, but it's not really necessary. If the program time is very fast and it reboots quickly, so you can see that it works, maybe we don't need anything fancier. As I started working this out, it just kind of became what it now is.
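
The size limit discussed above can be checked with a few lines of host-side Python. This is only a sketch: it assumes the application starts at offset $1D8 (where the example application begins in the hex dump) and that the ceiling is exactly 512 × 1024 bytes. The names `FLASH_LIMIT`, `APP_OFFSET`, `max_app_size`, and `fits` are made up for illustration.

```python
# Assumption: programmer + loader occupy $000..$1D7, so the app starts at $1D8.
APP_OFFSET = 0x1D8
# Assumption: "512kB" means 512 * 1024 bytes.
FLASH_LIMIT = 512 * 1024


def max_app_size(limit=FLASH_LIMIT, offset=APP_OFFSET):
    """Largest application, in bytes, that still fits after the prefix."""
    return limit - offset


def fits(app_len):
    """True if an application of app_len bytes can go down in one download."""
    return 0 <= app_len <= max_app_size()


if __name__ == "__main__":
    print(hex(APP_OFFSET), APP_OFFSET, max_app_size())
```

Under those assumptions the prefix is 472 bytes and the largest application is 523,816 bytes.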
  • cgraceycgracey Posts: 14,155
    > @evanh said:
    > Nice seeing the streamer used for the programming too. Smooth. :)

    It's funny how the fastest approach took the least amount of code.
  • cgraceycgracey Posts: 14,155
    > @evanh said:
    > Chip,
    > Not a good idea for demo program to be writing random data to EEPROM pins when it's enabled!

    That crossed my mind. Oh, there could even be electrical conflicts. Maybe I'll change it to resistive drive. Then, there's the possibility that the data in the flash could be disturbed.
  • cgraceycgracey Posts: 14,155
    > @Ariba said:
    > Not every Flash chip supports 32kB block-erase, it may be even quite specific to Winbond. 4kB and 64kB are the standard sizes.

    Good to know. I'll change it to just use the 4KB and 64KB erase commands. The 32KB erase time wasn't much of a game-changer, anyway. Thanks, Ariba.
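
One way to cover a region with only 4KB and 64KB erases is sketched below in host-side Python. It is not the programmer code from this thread. It assumes the opcodes $20 (4KB sector erase) and $D8 (64KB block erase), which are common on SPI NOR flash but should be checked against the datasheet of the chip in use. `erase_plan` is a hypothetical helper name.

```python
SECTOR = 4 * 1024    # 4KB sector erase; opcode $20 assumed
BLOCK = 64 * 1024    # 64KB block erase; opcode $D8 assumed
OP_SECTOR = 0x20
OP_BLOCK = 0xD8


def erase_plan(start, length):
    """Return (opcode, address) pairs that erase [start, start + length).

    The range is widened to 4KB boundaries. A 64KB erase is used only
    where the address is 64KB-aligned and the whole block lies inside
    the widened range, so nothing outside that range is erased.
    """
    if length <= 0:
        return []
    addr = start - (start % SECTOR)                # round down to a sector
    end = -(-(start + length) // SECTOR) * SECTOR  # round up to a sector
    plan = []
    while addr < end:
        if addr % BLOCK == 0 and end - addr >= BLOCK:
            plan.append((OP_BLOCK, addr))
            addr += BLOCK
        else:
            plan.append((OP_SECTOR, addr))
            addr += SECTOR
    return plan


if __name__ == "__main__":
    for op, addr in erase_plan(0xF000, 0x12000):
        print(f"${op:02X} @ ${addr:05X}")
```

For a small image such as the 472-byte programmer and loader plus a short application, the plan is a single 4KB erase at address 0.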
  • ManAtWork
    > @Ariba said:
    > Not every Flash chip supports 32kB block-erase, it may be even quite specific to Winbond. 4kB and 64kB are the standard sizes.

    Good point. BTW, what are the requirements that qualify a particular flash chip to be compatible with the P2 boot loader? Which commands and page sizes have to be supported? Frequency/timing should not be an issue, most chips support >100MHz.
  • cgraceycgracey Posts: 14,155
    > @ManAtWork said:
    > > @Ariba said:
    > > Not every Flash chip supports 32kB block-erase, it may be even quite specific to Winbond. 4kB and 64kB are the standard sizes.
    >
    > Good point. BTW, what are the requirements that qualify a particular flash chip to be compatible with the P2 boot loader? Which commands and page sizes have to be supported? Frequency/timing should not be an issue, most chips support >100MHz.

    The ROM booter tries to get the flash on-line, no matter what mode it might have been in. Then, it issues a read command ($03) and reads in $400 bytes:
    '
    '
    ' Try to load from SPI memory
    '
    try_spi		drvh	#spi_cs			'drive spi_cs high
    		drvl	#spi_ck			'drive spi_ck low
    
    		neg	pb,#1			'set command bits to all 1's
    		drvh	#spi_do			'drive spi_do high in case quad/dual mode
    		callpa	#2,#spi_cmd		'send exit-quad command
    		callpa	#8,#spi_cmd		'send exit-quad command
    		callpa	#16,#spi_cmd		'send exit-dual command
    		fltl	#spi_do			'float spi_do
    
    		callpb	#$66,#spi_cmd8		'send reset-enable command
    		callpb	#$99,#spi_cmd8		'send reset command
    		waitx	##rc_max/20_000		'wait 50us
    
    		callpb	#$04,#spi_cmd8		'send write-disable command to clear WEL
    
    .wait		callpb	#$05,#spi_cmd8		'send read-status command
    		call	#spi_in			'get status
    		testbn	x,#1		wz	'if WEL high, no SPI memory (z=0)
    	if_nz	jmp	#.fail
    		testbn	x,#0		wz	'if BUSY high, wait for erase/write to finish
    	if_nz	jmp	#.wait
    
    		mov	pa,#32			'send read-from-start command
    		callpb	#$03,#spi_cmd
    
    		decod	y,#10			'ready to input $400 bytes from SPI
    		wrfast	#0,#0			'ready to write bytes to hub
    .data		call	#spi_in			'get byte
    		wfbyte	x			'store byte into hub
    		djnz	y,#.data		'loop for next byte (y=0 after)
    
    		rdfast	#0,#0			'ready to read longs from hub
    		rep	@.sum,#$100		'ready to read and sum $100 longs
    		rflong	z			'read long
    		add	y,z			'sum long
    .sum
    		cmp	y,csum		wz	'verify checksum, z=1 if okay
    		bitz	flags,#spi_ok		'if program verified, set spi_ok flag
    .fail
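
The checksum step at the end of the listing (sum $100 longs, compare with `csum`) can be mirrored on the host to check an image before flashing. The sketch below is Python, not ROM code. The value of `csum` is not shown in the excerpt; the code assumes it is the ASCII string "Prop" read as a little-endian long ($706F7250), so treat that constant as an assumption. `boot_checksum`, `verifies`, and `patch_checksum` are hypothetical helper names.

```python
import struct

# Assumption: csum equals "Prop" as a little-endian long ($706F7250).
PROP = int.from_bytes(b"Prop", "little")
BOOT_BYTES = 0x400   # the booter reads $400 bytes
BOOT_LONGS = 0x100   # and sums them as $100 longs


def boot_checksum(image):
    """Sum the first $400 bytes as 256 little-endian longs, modulo 2**32."""
    if len(image) != BOOT_BYTES:
        raise ValueError("boot block must be exactly $400 bytes")
    longs = struct.unpack("<%dI" % BOOT_LONGS, image)
    return sum(longs) & 0xFFFFFFFF


def verifies(image, csum=PROP):
    """True if the block sums to csum, mirroring the cmp y,csum test."""
    return boot_checksum(image) == csum


def patch_checksum(image, index=BOOT_LONGS - 1, csum=PROP):
    """Overwrite one long (the last by default) so the block sums to csum."""
    longs = list(struct.unpack("<%dI" % BOOT_LONGS, image))
    longs[index] = 0
    longs[index] = (csum - sum(longs)) & 0xFFFFFFFF
    return struct.pack("<%dI" % BOOT_LONGS, *longs)


if __name__ == "__main__":
    block = patch_checksum(bytes(range(256)) * 4)
    print(hex(boot_checksum(block)), verifies(block))
```

Which long a real tool reserves for the correction value depends on the image layout; the last long is used here only to keep the example short.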
    