Faster SPI Bus Transfers - Page 3 — Parallax Forums

Comments

  • Ah thanks, so then the device only has to support $66, $99 and $03 commands (reset enable, reset and read).

    But what does
    		callpa	#2,#spi_cmd		'send exit-quad command
    		callpa	#8,#spi_cmd		'send exit-quad command
    		callpa	#16,#spi_cmd		'send exit-dual command
    
    do? Aren't the dual and quad modes automatically cancelled as soon as /CS goes high?
  • cgracey Posts: 13,610
    edited 2020-01-21 10:34
    ManAtWork wrote: »
    Ah thanks, so then the device only has to support $66, $99 and $03 commands (reset enable, reset and read).

    But what does
    		callpa	#2,#spi_cmd		'send exit-quad command
    		callpa	#8,#spi_cmd		'send exit-quad command
    		callpa	#16,#spi_cmd		'send exit-dual command
    
    do? Aren't the dual and quad modes automatically cancelled as soon as /CS goes high?

    From the Micron data sheet:
    Interface Rescue

    For interface rescue, the second part of the sequence is for exiting from dual or quad-
    SPI protocol by using the following FFh sequence: DQ0 and DQ3 equal to 1 for 16 clock
    cycles within S# LOW; S# becomes HIGH before 17th clock cycle. For DTR protocol, 1
    should be driven on both edges of clock for 16 cycles with S# LOW. After this two-part
    sequence, the extended-SPI protocol is active.

    I remember that we went through a long effort to figure out how to get out of every possible state that might inhibit our boot effort.

    By the way, I got rid of the $52 command (32KB sector erase). I'm getting the loader all cleaned up. I'll post the new version soon. Thanks for looking into these matters. I just looked at a bunch of 32Mb SPI flash datasheets on Digi-Key. Lots of differences in the obscure details, but we need to be sure to stay within the common functionalities.
  • cgracey wrote: »
    I just looked at a bunch of 32Mb SPI flash datasheets on Digi-Key. Lots of differences in the obscure details, but we need to be sure to stay within the common functionalities.

    BTW, is there any reason why the flash has to be so large? I was hoping that I could also use a 4 or 8Mb (512KB or 1MB) chip. Of course, for a development board saving $1 makes no difference. For later volume production it does.
  • cgracey Posts: 13,610
    edited 2020-01-21 11:50
    ManAtWork wrote: »
    cgracey wrote: »
    I just looked at a bunch of 32Mb SPI flash datasheets on Digi-Key. Lots of differences in the obscure details, but we need to be sure to stay within the common functionalities.

    BTW, is there any reason why the flash has to be so large? I was hoping that I could also use a 4 or 8Mb (512KB or 1MB) chip. Of course, for a development board saving $1 makes no difference. For later volume production it does.

    As long as it supports the commands, it should be fine.

    We just put a big one on because it would be neat to use it as an SSD for computing apps.
  • Ok, I understand. I've just started another thread to further address the compatibility question.

    One more enhancement suggestion: Could you consider adding a verify pass to the downloader? I know this makes programming a bit slower but I think it's always a good feeling to get some feedback instead of blindly trusting that everything went well.

    We've programmed nearly 10,000 P1 boards over the last 10 years and had only two or three cases of bad flash chips. I don't even think it was actually the fault of the flash, but rather a bad P1 that wasn't able to program the flash. Never mind... My point is that it's always good to spot errors early.
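
A verify pass like the one suggested above could boil down to reading the flash back and comparing it page by page. Here is a minimal host-side sketch in Python. It assumes a read-back buffer has already been obtained somehow; `first_bad_page` and its arguments are hypothetical names, not part of the loader.

```python
PAGE_SIZE = 256  # the programmer writes the flash in 256-byte pages


def first_bad_page(written: bytes, read_back: bytes, page_size: int = PAGE_SIZE):
    """Return the index of the first page that differs, or None if all match.

    Hypothetical helper: `read_back` is assumed to hold the bytes read from
    the flash after programming. A short read-back shows up as a mismatch in
    the first page it fails to cover.
    """
    for offset in range(0, len(written), page_size):
        expected = written[offset:offset + page_size]
        actual = read_back[offset:offset + page_size]
        if expected != actual:
            return offset // page_size
    return None
```

Reporting the failing page rather than a plain pass/fail makes it easier to tell a dead chip (page 0 fails) from an interrupted write (a later page fails).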
  • cgracey Posts: 13,610
    edited 2020-01-21 17:06
    I found there was lots to improve in the flash loader.

    It now does only 4KB and 64KB block erases, so it's compatible with maybe every 16MB (and smaller) SPI flash out there. I was able to shrink it by 88 bytes, so it's now only 384 bytes.
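
To see how the two erase sizes get mixed, here is a small Python model of the page count and main loop, as I read the source posted further down in this message. It is a sketch, not the loader itself. The function names are made up, and the 0x70-byte loader size in the usage note is read off the hex dump, so treat it as an assumption.

```python
ERASE_64KB = 0xD8  # opcode initially in the .cmd instruction
ERASE_4KB = 0x20   # opcode patched in once few pages remain
PAGE_BYTES = 256


def pages_to_program(app_bytes: int, loader_bytes: int) -> int:
    """Pages covering loader plus application, never fewer than four."""
    return max(4, (loader_bytes + app_bytes + 0xFF) >> 8)


def erase_plan(pages: int):
    """Return the (opcode, flash_address) erases the modeled loop would issue."""
    if pages < 1:
        raise ValueError("at least one page is required")
    plan = []
    page = 0
    opcode, boundary_mask = ERASE_64KB, 0xFF
    while True:
        # .block: with $40 pages or fewer left, switch to 4KB erases for good
        if pages <= 0x40:
            opcode, boundary_mask = ERASE_4KB, 0x0F
        plan.append((opcode, page * PAGE_BYTES))
        while True:
            pages -= 1                      # one page programmed (djz)
            if pages == 0:
                return plan                 # .reboot
            page += 1
            if page & boundary_mask == 0:   # .tst: block boundary reached
                break                       # erase the next block
```

For the 16-byte example application, `pages_to_program(16, 0x70)` gives the four-page minimum and `erase_plan(4)` is a single 4KB erase at address 0. A 300-page image gets one 64KB erase followed by three 4KB erases for the tail.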

    Here's the object code:
    Programmer code:
    
    00000- 04 00 90 FD 10 00 00 00 78 01 C0 FE 61 03 64 FC   '........x...a.d.'
    00010- 80 03 04 F1 28 FE 65 FD 01 00 68 FC 10 03 84 F1   '....(.e...h.....'
    00020- FF 02 04 F1 08 02 44 F0 04 02 04 F3 10 01 7C FC   '......D.......|.'
    00030- 00 05 DC FC 12 00 60 FD 00 BC 80 F1 00 BD 64 FC   '......`.......d.'
    00040- 59 7A 64 FD 50 78 64 FD 3C 94 0C FC 3C 02 1C FC   'Yzd.Pxd.<...<...'
    00050- 58 78 64 FD 58 76 64 FD 1D B8 60 FD 10 01 7C FC   'Xxd.Xvd...`...|.'
    00060- 40 02 1C F2 20 38 B4 E9 0F 4C BC E9 0F 0C 4C FB   '@... 8...L....L.'
    00070- 13 B0 4D FB 6C 00 B0 FD 0C 0C 4C FB 10 04 4C FB   '..M.l.....L...L.'
    00080- F6 87 A0 FC 3C 80 24 FC 24 36 60 FD 54 00 B0 FD   '....<.$.$6`.T...'
    00090- 04 02 64 FB 01 7E 04 F1 FF 7E CC F7 D8 FF 9F 5D   '..d..~...~.....]'
    000A0- BC FF 9F FD 00 00 88 FF 00 00 64 FD 59 7A 64 FD   '..........d.Yzd.'
    000B0- 58 7A 64 FD F6 83 A0 FC 3C 20 2C FC 24 36 60 0D   'Xzd.....< ,.$6`.'
    000C0- 10 EC 67 F0 3F EC 43 F5 08 EC 67 F0 1B EC FF F9   '..g.?.C...g.....'
    000D0- 59 7A 64 FD 58 7A 64 FD F6 85 A0 FC 3C 80 2C FC   'Yzd.Xzd.....<.,.'
    000E0- 24 36 60 0D F1 0B 4C FB 3C 20 2C FC 1F 26 64 FD   '$6`...L.< ,..&d.'
    000F0- 40 74 74 FD EC FF 9F CD 2D 00 64 FD 00 00 00 00   '@tt.....-.d.....'
    00100- 00 10 00 00 08 00 F7 40 20 00 F7 40 00 08 F7 80   '.......@ ..@....'
    
    Loader code:
    
    00110- 28 C6 65 FD 00 38 64 FC 17 34 98 F1 3C 94 0C 1C   '(.e..8d..4..<...'
    00120- 50 78 64 1D 3C 02 1C 1C 58 78 64 1D 1D 30 60 1D   'Pxd.<...Xxd..0`.'
    00130- 17 00 88 1C 0C 2E CC 19 1A 2E 20 13 17 34 80 11   '.......... ..4..'
    00140- 03 2E 64 10 17 32 20 19 01 2E 64 10 3C 2E 24 1C   '..d..2 ...d.<.$.'
    00150- 1F 06 64 1D 00 32 A4 1C 24 36 60 1D F5 35 9C 1B   '..d..2..$6`..5..'
    00160- 00 00 8C 1C 3C 00 0C 1C 00 00 EC FC 90 03 00 00   '....<...........'
    00170- 00 00 00 40 00 00 F5 C0 00 00 00 00 B0 8D 90 8F   '...@............'
    
    Example application appended, blinks LEDs:
    
    00180- 5F F0 67 FD 25 26 80 FF 1F 80 66 FD F0 FF 9F FD   '_.g.%&....f.....'
    

    Here is the source:
    ' *** SPI FLASH PROGRAMMER AND LOADER
    ' *** Works with 16MB SPI flash chips.
    ' *** Writes loader and application to SPI flash, then reboots to execute.
    '
    ' Use:	1) Append application bytes at app_start.
    '	2) Set app_size to number of application bytes.
    '	3) Download and execute composite image.
    '	4) After programming completes, application will boot.
    '
    '
    '	Program/Boot performance using Winbond W25Q128 (RCFAST)
    '
    '			program		boot
    '	bytes		time		time
    '	-------------------------------------
    '	0..2KB		30ms		10ms
    '	   4KB		60ms		11ms
    '	   8KB		94ms		14ms
    '	  16KB		170ms		20ms
    '	  32KB		200ms		30ms
    '	  64KB		300ms		52ms
    '	 128KB		570ms		95ms
    '	 256KB		1.1s		184ms
    '	 512KB		2.2s		358ms
    '
    CON	spi_cs = 61
    	spi_ck = 60
    	spi_di = 59
    	spi_do = 58
    
    '****************
    '*  Programmer  *
    '****************
    '
    DAT		org
    
    x		jmp	#prep_data		'@0: jump to prep_data
    
    app_size	long	16 '(per example)	'@4: application size in bytes (set by compiler)
    '
    '
    ' Set app_bytes in loader
    '
    prep_data	loc	ptra,#\@app_bytes	'ready to write app_bytes and checksum into loader
    
    		wrlong	app_size,ptra++		'set app_bytes in loader
    '
    '
    ' Append trailing zeros after application
    '
    		add	app_size,#@app_start	'add $400 zeros after app to fill loader or last flash page
    		setq	#$100-1
    		wrlong	#0,app_size
    '
    '
    ' Determine number of 256-byte pages to program
    '
    		sub	app_size,#@loader	'determine number of 256-byte pages to program
    		add	app_size,#$FF
    		shr	app_size,#8
    		fge	app_size,#4		'four pages are needed to cover loader
    '
    '
    ' Calculate and install checksum in loader
    '
    		rdfast	#0,#@loader		'sum $100 longs of loader
    		rep	#2,#$100
    		rflong	x
    		sub	@app_bytes/4,x		'(use 'long 0' from loader)
    
    		wrlong	@app_bytes/4,ptra	'set checksum in loader
    '
    '
    ' Get ready to program flash
    '
    		drvh	#spi_cs			'spi_cs high
    
    		fltl	#spi_ck			'reset smart pin spi_ck
    		wrpin	#%01_00101_0,#spi_ck	'set spi_ck for transition output, starts out low
    		wxpin	#1,#spi_ck		'set timebase to 1 clock per transition
    		drvl	#spi_ck			'enable smart pin
    
    		drvl	#spi_di			'spi_di low
    
    		setxfrq	@clk2/4			'set streamer rate to clk/2 (use clk2 from loader)
    
    		rdfast	#0,#@loader		'start fifo read at loader
    '
    '
    ' Main loop - erase 64KB/4KB block, program 256/16 sequential 256-byte pages, repeat
    '
    .block		cmp	app_size,#$40	wcz	'initially set for 64KB erase (140ms)
    	if_be	setd	.cmd,#$20		'if pages <= $40, set 4KB erase (25ms)
    	if_be	sets	.tst,#$0F
    
    		callpa	#$06,#spi_cmd8		'write enable
    .cmd		callpa	#$D8,#spi_cmd32		'erase 64KB/4KB block
    
    		call	#spi_wait		'wait for erase cycle to complete
    
    .page		callpa	#$06,#spi_cmd8		'write enable
    		callpa	#$02,#spi_cmd32		'program 256-byte page
    
    		xinit	rmode,pa		'2	start outputting 256*8 bits
    		wypin	tranp,#spi_ck		'2	start 256*8*2 clock transitions
    		waitxfi				'~4k	wait for streamer done
    
    		call	#spi_wait		'wait for program cycle to complete
    
    		djz	app_size,#.reboot	'decrement pages, if zero then reboot
    
    		add	page,#$0001		'if not 64KB/4KB block boundary, program next page
    .tst		test	page,#$00FF	wz
    	if_nz	jmp	#.page
    
    		jmp	#.block			'else, erase next block
    '
    '
    ' Done programming, reboot chip to launch application
    '
    .reboot		hubset	##$1000_0000		'generate hardware reset
    '
    '
    ' SPI command 8-bit - use callpa
    '
    spi_cmd8	drvh	#spi_cs			'start new command
    		drvl	#spi_cs
    
    		xinit	bmode,pa		'2	start outputting 8 bits to spi_di
    		wypin	#16,#spi_ck		'2	start 16 spi_ck transitions
    	_ret_	waitxfi				'~16	wait for streamer to finish
    '
    '
    ' SPI command 32-bit - use callpa
    '
    spi_cmd32	shl	pa,#16			'shift command up
    		or	pa,page			'or in page
    		shl	pa,#8			'shift up to get {command[7:0], page[15:0], 8'h00}
    		movbyts	pa,#%%0123		'rearrange bytes for top-to-bottom output
    
    		drvh	#spi_cs			'start new command
    		drvl	#spi_cs
    
    		xinit	lmode,pa		'2	start outputting 32 bits to spi_di
    		wypin	#64,#spi_ck		'2	start 64 spi_ck transitions
    	_ret_	waitxfi				'~64	wait for streamer to finish
    '
    '
    ' SPI wait
    '
    spi_wait	callpa	#$05,#spi_cmd8		'read status register
    
    		wypin	#16,#spi_ck		'2	start 16 spi_ck transitions
    		waitx	#16+3			'2+19	align testp with last spi_ck transition
    		testp	#spi_do		wc	'2	sample spi_do to get busy bit
    
    	if_c	jmp	#spi_wait		'if busy set, try again
    
    		ret
    '
    '
    ' Data
    '
    page		long	$0000
    tranp		long	256 * 8 * 2
    bmode		long	$4081_0008 + spi_di<<17	'streamer mode, 1-pin output, msb-first byte from s
    lmode		long	$4081_0020 + spi_di<<17	'streamer mode, 1-pin output, msb-first long from s
    rmode		long	$8081_0800 + spi_di<<17	'streamer mode, 1-pin output, msb-first $100 bytes from hub
    
    
    '************
    '*  Loader  *
    '************
    '
    ' The ROM booter reads this code from the 8-pin flash, from addresses $000000..$0003FF,
    ' into cog registers $000..$0FF, then executes it in order to load the application.
    '
    ' The initial application data trailing this code at app_start..$0FF needs to be moved
    ' to hub $00000+. Then, any additionally-needed application data must be read from the
    ' flash and stored in the hub from where the initial application data left off.
    '
    ' Once all application data has been moved/loaded into the hub, cog 0 is restarted from
    ' hub $00000, in order to execute the application.
    '
    ' On entry, both spi_cs and spi_ck are low outputs, the flash is outputting bit7 of the
    ' byte at address $400 into spi_do. By cycling spi_ck, any additional application data
    ' can be read.
    '
    		org
    '
    '
    ' First, move application data in cog app_start..$0FF into hub $00000+.
    '
    loader		setq	#$100-app_start-1	'move code from cog app_start..$0FF to hub $00000+
    		wrlong	app_start,#0
    
    		sub	app_bytes,w	wcz	'if app_bytes met or exceeded, done
    '
    '
    ' If need to load more application data from flash, read in remaining bytes
    '
    	if_a	wrpin	#%01_00101_0,#spi_ck	'set spi_ck smart pin for transitions, drives low
    	if_a	fltl	#spi_ck			'reset smart pin
    	if_a	wxpin	#1,#spi_ck		'set transition timebase to clk/1
    	if_a	drvl	#spi_ck			'enable smart pin
    
    	if_a	setxfrq	clk2			'set streamer rate to clk/2
    
    	if_a	wrfast	#0,w			'ready to write to hub at app continuation
    
    .block	if_a	bmask	w,#12			'try max streamer block size for whole bytes ($1FFF)
    	if_a	fle	w,app_bytes		'limit to number of bytes left
    	if_a	sub	app_bytes,w		'update number of bytes left
    
    	if_a	shl	w,#3			'get number of bits
    	if_a	setword	wmode,w,#0		'insert into streamer command
    	if_a	shl	w,#1			'double for number of spi_ck transitions
    
    	if_a	wypin	w,#spi_ck		'2	start spi_ck transitions
    	if_a	waitx	#3			'2+3	align spi_ck transitions with spi_do sampling
    	if_a	xinit	wmode,#0		'2	start inputting spi_do bits to hub
    	if_a	waitxfi				'?	wait for streamer to finish
    
    	if_a	tjnz	app_bytes,#.block	'if more bytes left, read another block
    
    	if_a	wrfast	#0,#0			'done, ensure last byte gets written to hub
    
    	if_a	wrpin	#0,#spi_ck		'clear spi_ck smart pin
    '
    '
    ' Launch application
    '
    		coginit	#0,#$00000		'relaunch cog 0 from $00000
    '
    '
    ' Data
    '
    w		long	($100-app_start)*4	'initially, hub start address for additional app data
    clk2		long	$4000_0000		'clk/2 nco value for streamer
    wmode		long	$C081_0000 + spi_do<<17	'streamer mode, 1-pin input, msb-first bytes to hub
    app_bytes	long	0			'number of bytes in application (set by prep_data)
    checksum	byte	-"P",!"r",!"o",!"p"	'"Prop" - sum of $100 loader longs (set by prep_data)
    '
    '
    ' Application start
    '
    app_start					'append application bytes after this label
    
    
    
    ' Example program which toggles P[63:56] every ~250ms using RCFAST
    
    byte	$5F,$F0,$67,$FD,$25,$26,$80,$FF,$1F,$80,$66,$FD,$F0,$FF,$9F,$FD
    

    Now it'll go into PNut.exe.
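
For anyone checking the `spi_cmd32` byte shuffling on paper, here is a Python model of the four bytes that should go out on spi_di. It assumes that `movbyts pa,#%%0123` reverses the four bytes and that the streamer sends byte 0 first, each byte msb-first. Both assumptions are my reading of the comments in the source, not something verified on hardware. The function names are hypothetical.

```python
def spi_frame(command: int, page: int) -> bytes:
    """Four bytes in wire order: opcode, then the 24-bit address page * 256."""
    if not 0 <= command <= 0xFF:
        raise ValueError("command must fit in one byte")
    if not 0 <= page <= 0xFFFF:
        raise ValueError("page must fit in 16 bits")
    return bytes([command, (page >> 8) & 0xFF, page & 0xFF, 0x00])


def register_image(command: int, page: int) -> int:
    """Model of pa after the shl / or / shl / movbyts sequence."""
    value = (((command << 16) | page) << 8) & 0xFFFFFFFF  # {cmd, page[15:0], 00}
    # Assumed effect of movbyts %%0123: reverse the four bytes
    return int.from_bytes(value.to_bytes(4, "big"), "little")
```

With command $02 and page $0123, the frame is 02 01 23 00, which is the page-program opcode followed by the 24-bit address $012300.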
  • cgracey Posts: 13,610
    The flash loader is in PNut.exe and it's downloading code.

    Short Spin2 programs (which include the 4KB interpreter) take 280ms to download, program to flash, and execute. That seemed long and I realized that the reason is that the P2 is undergoing a reset and re-running the ROM, waiting through a >100ms host-connect time window, before running the flash code. A straight download without the flash programmer takes only 85ms. I don't think there's any reason to fake a reset, instead of doing one, though, because programming flash is a relatively-rare operation and not so time-critical on the rebound.
  • msrobots Posts: 3,409
    edited 2020-01-22 00:29
    Since when you are loading this you just came out of a reset, couldn't you just jump into the ROM instead of resetting?
  • evanh Posts: 11,253
    edited 2020-01-22 01:53
    Mike,
    Chip means an SPI reset of the flash part, not the Prop2. It is targeted at post-hard-reset of the Prop2, when the SPI chip might still be in some odd mode.
  • evanh Posts: 11,253
    Chip,
    It dawned on me the streamer modes as is won't work with revA Prop2's. In particular the immediate serial mode doesn't even exist in revA. That's not ideal.
  • cgracey Posts: 13,610
    > @evanh said:
    > Chip,
    > It dawned on me the streamer modes as is won't work with revA Prop2's. In particular the immediate serial mode doesn't even exist in revA. That's not ideal.

    Rev B got lots of improvements over Rev A. Some incompatibilities were introduced. There are only ~120 Rev A chips in existence, with thousands more Rev B's coming.
  • Cluso99 Posts: 17,833
    cgracey wrote: »
    > @evanh said:
    > Chip,
    > It dawned on me the streamer modes as is won't work with revA Prop2's. In particular the immediate serial mode doesn't even exist in revA. That's not ideal.

    Rev B got lots of improvements over Rev A. Some incompatibilities were introduced. There are only ~120 Rev A chips in existence, with thousands more Rev B's coming.
    Don’t you mean a few hundred Rev B, and thousands of Rev Cs coming?
    Although RevC is only a minor ADC pin modification.
  • cgracey Posts: 13,610
    Cluso99 wrote: »
    cgracey wrote: »
    > @evanh said:
    > Chip,
    > It dawned on me the streamer modes as is won't work with revA Prop2's. In particular the immediate serial mode doesn't even exist in revA. That's not ideal.

    Rev B got lots of improvements over Rev A. Some incompatibilities were introduced. There are only ~120 Rev A chips in existence, with thousands more Rev B's coming.
    Don’t you mean a few hundred Rev B, and thousands of Rev Cs coming?
    Although RevC is only a minor ADC pin modification.

    Yes, I'm sorry. I think we received about 1,000 Rev B's and we've got 7,500 Rev C's arriving soon.
  • cgracey Posts: 13,610
    edited 2020-01-27 21:23
    I've got checksums added to the flash programmer/loader.

    When the data is downloaded, a checksum is verified. Then, the flash is programmed. On each boot, the application data is checksum-verified before execution. This is very safe, I think.

    All you need to do to use this is append your application data, pad to the next long alignment, then add up all the longs in the entire image and write the negative of the sum to the long at offset 4. Download the data to execute the programmer and it will boot your application when done and on every reset, thereafter.
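
On the host side, the steps just described could look roughly like this in Python. It is a sketch under a few assumptions: longs are little-endian, the programmer/loader blob is already long-aligned, and the long at offset 4 is treated as zero while summing. `finalize_image` and `sum32` are hypothetical names, not anything in PNut.exe.

```python
import struct

MASK32 = 0xFFFFFFFF


def sum32(image: bytes) -> int:
    """32-bit wrapping sum of all little-endian longs in the image."""
    return sum(struct.unpack(f"<{len(image) // 4}I", image)) & MASK32


def finalize_image(programmer_and_loader: bytes, app: bytes) -> bytes:
    """Append the app, pad to a long boundary, and fix up the long at offset 4."""
    image = bytearray(programmer_and_loader + app)
    image += bytes(-len(image) % 4)          # pad to the next long alignment
    struct.pack_into("<I", image, 4, 0)      # clear the slot before summing
    total = sum32(bytes(image))
    struct.pack_into("<I", image, 4, (-total) & MASK32)  # negative of the sum
    return bytes(image)
```

After this, `sum32(finalize_image(blob, app))` comes out as zero, which is the condition the programmer checks before it touches the flash.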

    Here's the object code:
    CLKMODE:   $00000000
    CLKFREQ:  20,000,000
    XINFREQ:           0
    
    Hub bytes:         456
    
    00000- 31 02 64 FD 00 00 00 00 34 00 60 FD 28 FE 65 FD   '1.d.....4.`.(.e.'
    00010- 00 00 68 FC 02 00 44 F0 00 00 7C FC 00 04 D8 FC   '..h...D...|.....'
    00020- 12 02 60 FD 01 DC 08 F1 78 01 90 5D B8 01 C0 FE   '..`.....x..]....'
    00030- 72 00 84 F1 61 01 64 FC 61 01 64 FC C8 01 7C FC   'r...a.d.a.d...|.'
    00040- 00 04 D8 FC 12 02 60 FD 01 DE 80 F1 61 DF 64 FC   '......`.....a.d.'
    00050- 38 01 7C FC 00 05 DC FC 12 02 60 FD 01 E0 80 F1   '8.|.......`.....'
    00060- 61 E1 64 FC 24 00 04 F1 3F 00 04 F1 06 00 44 F0   'a.d.$...?.....D.'
    00070- 04 00 04 F3 59 7A 64 FD 50 78 64 FD 3C 94 0C FC   '....Yzd.Pxd.<...'
    00080- 3C 02 1C FC 58 78 64 FD 58 76 64 FD 1D D8 60 FD   '<...Xxd.Xvd...`.'
    00090- 38 01 7C FC 40 00 1C F2 20 52 B4 E9 0F 66 BC E9   '8.|.@... R...f..'
    000A0- 0F 0C 4C FB 13 B0 4D FB 64 00 B0 FD 0C 0C 4C FB   '..L...M.d.....L.'
    000B0- 10 04 4C FB F6 9B A0 FC 3C 94 24 FC 24 36 60 FD   '..L.....<.$.$6`.'
    000C0- 4C 00 B0 FD 04 00 64 FB 01 DC 04 F1 FF DC CC F7   'L.....d.........'
    000D0- D8 FF 9F 5D BC FF 9F FD 00 00 88 FF 00 00 64 FD   '...]..........d.'
    000E0- 59 7A 64 FD 58 7A 64 FD F6 97 A0 FC 3C 20 2C FC   'Yzd.Xzd.....< ,.'
    000F0- 24 36 60 0D 6E EC 2B F9 6C EC FF F9 59 7A 64 FD   '$6`.n.+.l...Yzd.'
    00100- 58 7A 64 FD F6 99 A0 FC 3C 80 2C FC 24 36 60 0D   'Xzd.....<.,.$6`.'
    00110- F3 0B 4C FB 3C 20 2C FC 1F 26 64 FD 40 74 74 FD   '..L.< ,..&d.@tt.'
    00120- EC FF 9F CD 2D 00 64 FD 00 10 00 00 08 00 F7 40   '....-.d........@'
    00130- 20 00 F7 40 00 08 F7 80 28 B6 65 FD 00 48 64 FC   ' ..@....(.e..Hd.'
    00140- DC 40 9C F1 00 00 EC EC 3C 94 0C FC 50 78 64 FD   '.@......<...Pxd.'
    00150- 3C 02 1C FC 58 78 64 FD 1D 3C 60 FD 01 00 00 FF   '<...Xxd..<`.....'
    00160- 70 01 8C FC 0A 46 CC F9 20 46 20 F3 23 40 80 F1   'p....F.. F .#@..'
    00170- 05 46 64 F0 23 3E 20 F9 01 46 64 F0 3C 46 24 FC   '.Fd.#> ..Fd.<F$.'
    00180- 1F 06 64 FD 00 3E A4 FC 24 36 60 FD F5 41 9C FB   '..d..>..$6`..A..'
    00190- 3C 00 0C FC 00 00 7C FC 21 04 D8 FC 12 46 60 FD   '<.....|.!....F`.'
    001A0- 23 44 08 F1 50 76 65 5D 00 04 64 5D 00 00 EC FC   '#D..Pve]..d]....'
    001B0- 00 00 00 40 00 00 F5 C0 00 00 00 00 00 00 00 00   '...@............'
    001C0- 00 00 00 00 B0 8D 90 8F                           '........'
    

    Here's the source:
    ' *** SPI FLASH PROGRAMMER AND BOOT LOADER
    ' *** Writes loader and application to SPI flash, then reboots to execute.
    ' *** All data is checksum-verified before programming and on each boot.
    '
    ' Use:	1) Append application bytes at app_start, pad to long alignment
    '	2) Write negative sum of all longs to long at offset 4
    '	3) Download all longs to execute flash programmer
    '	4) After flash programmer finishes, chip reboots to application.
    '
    '
    '	Program/Boot performance using Winbond W25Q128 (RCFAST)
    '
    '			program		boot
    '	bytes		time		time
    '	-------------------------------------
    '	0..2KB		30ms		10ms
    '	   4KB		60ms		11ms
    '	   8KB		94ms		14ms
    '	  16KB		170ms		20ms
    '	  32KB		200ms		30ms
    '	  64KB		300ms		52ms
    '	 128KB		570ms		95ms
    '	 256KB		1.1s		184ms
    '	 512KB		2.2s		358ms
    '
    CON	spi_cs = 61
    	spi_ck = 60
    	spi_di = 59
    	spi_do = 58
    
    
    '****************
    '*  Programmer  *
    '****************
    '
    DAT		org
    
    s		skip	#1			'@0: skip checksum			(reused as s)
    v		long	0			'@4: negative sum of all longs		(reused as v, set by compiler)
    '
    '
    ' Get number of bytes, add $400 zero bytes after download, verify checksum
    '
    		getptr	s			'get size of download in bytes
    
    		setq	#$400/4-1		'add $400 zeros after app to pad loader or last flash page
    		wrlong	#0,s
    
    		shr	s,#2			'get size of download in longs
    
    		rdfast	#0,#0			'verify checksum
    		rep	#2,s
    		rflong	v
    		add	@zeroa/4,v	wz	'(if checksum passes, @zeroa/4 = 0 afterwards)
    
    	if_nz	jmp	#@stop/4		'if checksum failed, float spi pins and stop clock
    '
    '
    ' Write settings into loader
    '
    		loc	ptra,#\@app_longs	'point to loader settings
    
    		sub	s,#@app_start/4		'get size of application in longs
    
    		wrlong	s,ptra++		'write app_longs in loader
    		wrlong	s,ptra++		'write app_longs2 in loader
    
    		rdfast	#0,#@app_start		'calculate app checksum
    		rep	#2,s
    		rflong	v
    		sub	@zerob/4,v
    		wrlong	@zerob/4,ptra++		'write app_sum in loader
    
    		rdfast	#0,#@loader		'calculate loader checksum
    		rep	#2,#$100
    		rflong	v
    		sub	@zeroc/4,v
    		wrlong	@zeroc/4,ptra++		'write loader_sum in loader
    '
    '
    ' Determine number of 256-byte pages to program to flash
    '
    		add	s,#app_start		'get size of flash data in longs
    		add	s,#$3F			'round upwards to next chunk of 64 longs
    		shr	s,#6			'get number of 256-byte pages of flash data
    		fge	s,#4			'a minimum of four pages are needed to cover loader
    '
    '
    ' Get ready to program flash
    '
    		drvh	#spi_cs			'spi_cs high
    
    		fltl	#spi_ck			'reset smart pin spi_ck
    		wrpin	#%01_00101_0,#spi_ck	'set spi_ck for transition output, starts out low
    		wxpin	#1,#spi_ck		'set timebase to 1 clock per transition
    		drvl	#spi_ck			'enable smart pin
    
    		drvl	#spi_di			'spi_di low
    
    		setxfrq	@clk2/4			'set streamer rate to clk/2
    
    		rdfast	#0,#@loader		'start fifo read at loader
    '
    '
    ' Main loop - erase 64KB/4KB blocks, program 256/16 sequential 256-byte pages, reboot when done
    '
    .block		cmp	s,#$40		wcz	'if pages <= $40, set 4KB erase @25ms
    	if_be	setd	.cmd,#$20		'(initially set for 64KB erase @140ms)
    	if_be	sets	.tst,#$0F
    
    		callpa	#$06,#spi_cmd1		'enable write
    .cmd		callpa	#$D8,#spi_cmd4		'erase 64KB/4KB block
    
    		call	#spi_wait		'wait for erase cycle to complete
    
    .page		callpa	#$06,#spi_cmd1		'enable write
    		callpa	#$02,#spi_cmd4		'program 256-byte page
    
    		xinit	rmode,pa		'2	start outputting 256*8 bits
    		wypin	tranp,#spi_ck		'2	start 256*8*2 clock transitions
    		waitxfi				'~4k	wait for streamer done
    
    		call	#spi_wait		'wait for program cycle to complete
    
    		djz	s,#.reboot		'decrement pages, reboot when done
    
    		add	@zeroa/4,#$0001		'if not 64KB/4KB block boundary, program next page
    .tst		test	@zeroa/4,#$00FF	wz
    	if_nz	jmp	#.page
    
    		jmp	#.block			'else, erase next block
    '
    '
    ' Done, reboot chip to launch application
    '
    .reboot		hubset	##$1000_0000		'generate hardware reset
    '
    '
    ' SPI command, 1 byte - use callpa
    '
    spi_cmd1	drvh	#spi_cs			'start new command
    		drvl	#spi_cs
    
    		xinit	bmode,pa		'2	start outputting 8 bits to spi_di
    		wypin	#16,#spi_ck		'2	start 16 spi_ck transitions
    	_ret_	waitxfi				'~16	wait for streamer to finish
    '
    '
    ' SPI command, 4 bytes - use callpa
    '
    spi_cmd4	setword	pa,@zeroa/4,#1		'get page address into pa[31:16]
    		movbyts	pa,#%%1230		'rearrange bytes to get {8'h00, page[7:0], page[15:8], command[7:0]}
    
    		drvh	#spi_cs			'start new command
    		drvl	#spi_cs
    
    		xinit	lmode,pa		'2	start outputting 32 bits to spi_di
    		wypin	#64,#spi_ck		'2	start 64 spi_ck transitions
    	_ret_	waitxfi				'~64	wait for streamer to finish
    '
    '
    ' SPI wait
    '
    spi_wait	callpa	#$05,#spi_cmd1		'read status register
    
    		wypin	#16,#spi_ck		'2	start 16 spi_ck transitions
    		waitx	#16+3			'2+19	align testp with last spi_ck transition
    		testp	#spi_do		wc	'2	sample spi_do to get busy bit
    
    	if_c	jmp	#spi_wait		'if busy, try again
    
    		ret
    '
    '
    ' Data
    '
    tranp		long	256 * 8 * 2
    bmode		long	$4081_0008 + spi_di<<17	'streamer mode, 1-pin output, bytes-msb-first, 1 byte from s
    lmode		long	$4081_0020 + spi_di<<17	'streamer mode, 1-pin output, bytes-msb-first, 4 bytes from s
    rmode		long	$8081_0800 + spi_di<<17	'streamer mode, 1-pin output, bytes-msb-first, $100 bytes from hub
    
    
    '************
    '*  Loader  *
    '************
    '
    ' The ROM booter reads this code from the 8-pin SPI flash from $000000..$0003FF, into cog
    ' registers $000..$0FF. If the booter verifies the 'Prop' checksum, it does a 'JMP #0' to
    ' execute this loader code.
    '
    ' The initial application data trailing this code in registers app_start..$0FF are moved to
    ' hub RAM, starting at $00000. Then, any additional application data are read from the flash
    ' and stored into the hub, continuing from where the initial application data left off.
    '
    ' On entry, both spi_cs and spi_ck are low outputs and the flash is outputting bit 7 of the
    ' byte at address $400 on spi_do. By cycling spi_ck, any additional application data can be
    ' received from spi_do.
    '
    ' Once all application data is in the hub, an application checksum is verified, after which
    ' cog 0 is restarted by a 'COGINIT #0,#$00000' to execute the application. If that checksum
    ' fails, due to some data corruption, the SPI pins will be floated and the clock stopped
    ' until the next reset. As well, a checksum is verified upon initial download of all data,
    ' before programming the flash. This all ensures that no errant application code will boot.
    '
    		org
    '
    '
    ' First, move application data in cog app_start..$0FF into hub $00000+
    '
    loader		setq	#$100-app_start-1	'move code from cog app_start..$0FF to hub $00000+
    		wrlong	app_start,#0
    
    		sub	app_longs,#$100-app_start  wcz	'if app longs met or exceeded, run application
    	if_be	coginit	#0,#$00000			'(small applications verified by 'Prop' checksum)
    '
    '
    ' Read in remaining application longs
    '
    		wrpin	#%01_00101_0,#spi_ck	'set spi_ck smart pin for transitions, drives low
    		fltl	#spi_ck			'reset smart pin
    		wxpin	#1,#spi_ck		'set transition timebase to clk/1
    		drvl	#spi_ck			'enable smart pin
    
    		setxfrq	clk2			'set streamer rate to clk/2
    
    		wrfast	#0,##$400-app_start*4	'ready to write to hub at application continuation
    
    .block		bmask	x,#10			'try max streamer block size for longs ($7FF)
    		fle	x,app_longs		'limit to number of longs left
    		sub	app_longs,x		'update number of longs left
    
    		shl	x,#5			'get number of bits
    		setword	wmode,x,#0		'insert into streamer command
    		shl	x,#1			'double for number of spi_ck transitions
    
    		wypin	x,#spi_ck		'2	start spi_ck transitions
    		waitx	#3			'2+3	align spi_ck transitions with spi_do sampling
    		xinit	wmode,#0		'2	start inputting spi_do bits to hub, bytes-msb-first
    		waitxfi				'?	wait for streamer to finish
    
    		tjnz	app_longs,#.block	'if more longs left, read another block
    
    		wrpin	#0,#spi_ck		'clear spi_ck smart pin mode
    '
    '
    ' Verify application checksum
    '
    		rdfast	#0,#0			'sum all application longs
    		rep	#2,app_longs2
    		rflong	x
    		add	app_sum,x	wz	'z=1 if verified
    
    stop	if_nz	fltl	#spi_di addpins 2	'if checksum failed, float spi_cs/spi_ck/spi_di pins
    	if_nz	hubset	#%0010			'..and stop clock until next reset
    
    		coginit	#0,#$00000		'checksum verified, run application
    '
    '
    ' Data
    '
    clk2		long	$4000_0000		'clk/2 nco value for streamer
    wmode		long	$C081_0000 + spi_do<<17	'streamer mode, 1-pin input, bytes-msb-first, bytes to hub
    
    zeroa						'(used by programmer as long 0)
    app_longs	long	0			'number of longs in application		(set by programmer)
    zerob						'(used by programmer as long 0)
    app_longs2	long	0			'number of longs in application		(set by programmer)
    zeroc						'(used by programmer as long 0)
    app_sum		long	0			'-sum of application longs		(set by programmer)
    x						'(used by loader as variable)
    loader_sum	byte	-"P",!"r",!"o",!"p"	'"Prop" - sum of $100 loader longs	(set by programmer)
    '
    '
    ' Application start
    '
    app_start					'append application bytes after this label
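
As a cross-check of the `.block` loop in the loader above, here is a Python model of how the remaining application longs get split into streamer blocks. It is a sketch of my reading of the source; `loader_blocks` is a hypothetical name. The $7FF cap keeps the bit count (longs * 32) within the 16-bit word that `setword` writes into the streamer command.

```python
MAX_BLOCK_LONGS = 0x7FF  # bmask x,#10


def loader_blocks(remaining_longs: int):
    """Return (longs, bits, spi_ck_transitions) for each streamer block."""
    blocks = []
    while remaining_longs:
        longs = min(MAX_BLOCK_LONGS, remaining_longs)  # fle x,app_longs
        remaining_longs -= longs                       # sub app_longs,x
        bits = longs << 5                              # shl x,#5
        blocks.append((longs, bits, bits << 1))        # shl x,#1 for transitions
    return blocks
```

For example, $800 longs left over would be read as one full block of $7FF longs followed by a one-long block.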
    


  • Chip, are you partitioning the flash so there is a flip-flop for code loading?

    Something like having a permanent boot loader that checks for a location and checksum in a block, if it's valid it loads the address from that block, then the program code is loaded indirectly?

    The flash would look like:
    00000 2nd stage bootloader
    01000 prog block 0 version+addr+checksum
    02000 prog block 1 version+addr+checksum
    03000 program 0
    83000 program 1
    

    When uploading a new program, you would flip-flop program blocks, the 2nd stage bootloader would look at prog block 0 and 1 and pick the one with the higher version. If the checksum of the prog-block is valid, it would load the program and checksum it, if it's valid it would start executing. If a problem happens where the program isn't fully written, the checksum is invalid and it falls back to the "backup" program and loads that. The purpose is to prevent power outages and failures from causing a bricked device.
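
The selection logic described above can be sketched in a few lines of Python. Everything here is hypothetical: the `ProgBlock` fields, the `program_valid` callback, and the idea that header validity has already been reduced to a boolean. It only illustrates the highest-valid-version-wins rule with fallback.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ProgBlock:
    version: int
    address: int          # flash address of the program this block points to
    header_valid: bool    # stands in for the prog-block checksum test


def choose_program(
    blocks: Sequence[ProgBlock],
    program_valid: Callable[[int], bool],
) -> Optional[ProgBlock]:
    """Pick the newest block whose header and program both check out."""
    for block in sorted(blocks, key=lambda b: b.version, reverse=True):
        if block.header_valid and program_valid(block.address):
            return block
    return None  # nothing bootable
```

If the newest program was only partly written, its checksum fails and the older slot is returned, which is the protection against bricking that the flip-flop is meant to provide.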
  • cgracey Posts: 13,610
    Pedward, I'm not doing that now. I can add that later, though.

    I've almost got Spin2 done. Just doing some reality checks on the Delphi code now.
  • @cgracey
    It looks like using one's complement addition for the checksum (addx instead of add) could improve error detection marginally, for no impact to execution speed or code space (that I can see).
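
To make the difference concrete, here is a Python comparison of a plain wrapping sum with a one's complement sum (end-around carry). How closely an `addx` loop on the P2 matches the second function is an assumption on my part. The model just folds each carry back into bit 0.

```python
MASK32 = 0xFFFFFFFF


def sum32(longs):
    """Two's complement style sum: carries out of bit 31 are discarded."""
    return sum(longs) & MASK32


def ones_complement_sum32(longs):
    """Sum with end-around carry: a carry out of bit 31 is added back in."""
    total = 0
    for value in longs:
        total += value & MASK32
        total = (total & MASK32) + (total >> 32)
    return total


good = [0x00000000, 0x00000000, 5]
bad = [0x80000000, 0x80000000, 5]   # bit 31 flipped in two longs
```

With these two lists, `sum32` returns 5 for both, so the damage goes unnoticed, while `ones_complement_sum32` returns 5 for `good` and 6 for `bad`.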
  • Cluso99 Posts: 17,833
    XOR was used as it was considered reasonable before CRCs were used.
    But we have a CRC bit and a CRC byte instruction, so why not use the CRC byte instruction?
  • Cluso99 wrote: »
    XOR was used as it was considered reasonable before CRCs were used.
    But we have a CRC bit and a CRC byte instruction, so why not use the CRC byte instruction?

    I looked at that, but it's CRCBIT and CRCNIB. As 32-bit one's complement addition trends towards 1.5% undetected errors, do we get enough benefit from CRC to justify the overhead?

    Of all of the options in common use, it turns out that XOR is the worst unless you team it with lateral parity which requires an extra bit per long.
  • Cluso99 wrote: »
    But we have a CRC bit and a CRC byte instruction, so why not use the CRC byte instruction?

    Oh, nice! Especially the fact that you can use any arbitrary polynomial. Most other processors have a fixed built in CRC polynomial if they support CRC in hardware at all.

    I don't care about undetected error statistics. If the flash chip write fails it fails completely in almost all cases. Common error sources are bad solder joints, P&P errors (wrong chip or chip rotated 180°) or power failure in the middle of programming due to regulator overheat (short somewhere else...)

    XOR is really bad, though. It gives the same result for an even number of identical errors. A block of 256 bytes that are all $FF instead of all $00 has the same checksum.
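The $FF versus $00 case is easy to reproduce on a host. The following Python sketch compares an XOR of all longs with a 32-bit sum over the same 256-byte block; the helper names are illustrative only.

```python
import struct
from functools import reduce
from operator import xor

MASK32 = 0xFFFFFFFF


def as_longs(block: bytes):
    """Split a block into little-endian 32-bit longs."""
    return struct.unpack(f"<{len(block) // 4}I", block)


def xor_longs(block: bytes) -> int:
    return reduce(xor, as_longs(block), 0)


def sum_longs(block: bytes) -> int:
    return sum(as_longs(block)) & MASK32


zeroed = bytes(256)               # 256 bytes of $00
erased = bytes([0xFF]) * 256      # 256 bytes of $FF

xor_pair = (xor_longs(zeroed), xor_longs(erased))
sum_pair = (sum_longs(zeroed), sum_longs(erased))
```

Both blocks XOR to zero because every set bit occurs an even number of times, while the 32-bit sums differ.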
  • cgracey Posts: 13,610
    Well, we are doing a 32-bit summation of all longs in the image. The idea is that, with an inserted compensation value, the correct sum winds up at $00000000. Or, in the case of the $100-long loader checked by the ROM Booter, the sum winds up at the long value "Prop".
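A rough Python sketch of the compensation-value idea, assuming little-endian longs so that "Prop" reads as $706F7250. The image contents below are made-up placeholders (the real loader is $100 longs), and `compensation` and `checks_out` are hypothetical helper names, not part of any Parallax tool.

```python
import struct

MASK32 = 0xFFFFFFFF
PROP = struct.unpack("<I", b"Prop")[0]   # "Prop" read as a little-endian long


def compensation(longs, target=0):
    """Value to insert so that the 32-bit sum of the image equals target."""
    return (target - sum(longs)) & MASK32


def checks_out(longs, target=0):
    """Verifier side: sum every long, including the compensation value."""
    return (sum(longs) & MASK32) == target


# Placeholder application image: must sum to $00000000.
app = [0x12345678, 0xDEADBEEF, 0x00000001]
app_image = app + [compensation(app)]

# Placeholder loader image: must sum to "Prop".
loader = [0xFD640001, 0xCAFEF00D]
loader_image = loader + [compensation(loader, PROP)]

app_valid = checks_out(app_image)
loader_valid = checks_out(loader_image, PROP)
```

The programmer only has to fill in one long per image; the verifier needs no separate stored checksum, just the expected final sum.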
  • ErNa Posts: 1,597
    Prop is nice, that brings me to the idea of talking numbers in general and pCRC
  • cgracey wrote: »
    Well, we are doing a 32-bit summation of all longs in the image. The idea is that, with an inserted compensation value, the correct sum winds up at $00000000. Or, in the case of the $100-long loader checked by the ROM Booter, the sum winds up at the long value "Prop".

    Yes, and in light of that approach I was suggesting a simple small tweak that would slightly improve the error detection rate.
    No skin off my nose if you don't wish to use it.
  • Do you think checksum or CRC is better?
  • Cluso99 Posts: 17,833
    FWIW here is the CRCBIT and CRCNIB discussion
    https://forums.parallax.com/discussion/comment/1427742/#Comment_1427742

    To accum 32 bits (4 bytes) takes 18 clocks for a CRC16
  • https://www.nayuki.io/page/forcing-a-files-crc-to-any-value

    I guess CRC-32 is just as malleable as checksum...
  • cgracey Posts: 13,610
    Cluso99 wrote: »
    FWIW here is the CRCBIT and CRCNIB discussion
    https://forums.parallax.com/discussion/comment/1427742/#Comment_1427742

    To accum 32 bits (4 bytes) takes 18 clocks for a CRC16

    18 clocks at 20MHz * 512K/4 = 118ms. That would increase the full-load boot time by 1/3. Is there sufficient benefit to doing so?
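The 118ms figure can be checked with a few lines of Python. The inputs are taken from the posts above: 18 clocks per long, a 20MHz clock and a 512KB image. The variable names are just for this sketch.

```python
clocks_per_long = 18           # CRC16 cost per 32-bit long, as quoted above
clock_hz = 20_000_000          # 20MHz clock during boot
image_bytes = 512 * 1024       # full 512KB load

longs = image_bytes // 4                   # 131072 longs
extra_clocks = clocks_per_long * longs     # 2359296 clocks
extra_ms = extra_clocks / clock_hz * 1000  # added boot time in milliseconds
```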
  • Malleability doesn't matter. The CRC in the loader is used to avoid hardware errors going unnoticed, not as protection against intentional hack attempts.

    BTW, the CRCNIB instruction is really useful. Pretty fast and doesn't need large tables. However I've noticed that CRCNIB shifts D right whereas most other CRC generators shift left. If the CRC is used only for internal comparison this doesn't matter. But if you compare the result against externally generated CRCs you have to reverse the polynomial and the result. Example
    CON
      polynomial = $11021 ' polynomial has to be reversed because of
      revpoly    = $8408  ' the P2 shifting right instead of left
    VAR
      long  crc   
    
    PUB crc16 (b): c | p
    ' data byte in, crc word out
      c:= crc
      p:= revpoly
      asm
        shl  b,#24
        setq b
        crcnib c,p
        crcnib c,p
      endasm
      crc:= c
      asm
        rev c
        shr c,#16
      endasm
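The relationship can be checked on a host with a bit-level model. This Python sketch assumes the behaviour described in the post: the CRC register shifts right, the polynomial is XORed in when the incoming data bit differs from bit 0, and data bits are still fed MSB first. The function names are illustrative, and the model is not taken from Parallax documentation.

```python
def rev16(x: int) -> int:
    """Reverse the bit order of a 16-bit value."""
    return int(f"{x & 0xFFFF:016b}"[::-1], 2)


def crc16_left(data: bytes, poly: int = 0x1021, crc: int = 0) -> int:
    """Conventional left-shifting CRC-16, data bits MSB first."""
    for byte in data:
        for i in range(7, -1, -1):
            feedback = ((crc >> 15) ^ (byte >> i)) & 1
            crc = (crc << 1) & 0xFFFF
            if feedback:
                crc ^= poly
    return crc


def crc16_right(data: bytes, revpoly: int = 0x8408, crc: int = 0) -> int:
    """Assumed model of the right-shifting variant, data bits MSB first."""
    for byte in data:
        for i in range(7, -1, -1):
            feedback = (crc ^ (byte >> i)) & 1
            crc >>= 1
            if feedback:
                crc ^= revpoly
    return crc


msg = b"123456789"
external = crc16_left(msg)              # what an external generator produces
internal = rev16(crc16_right(msg))      # right-shift result, bit-reversed
```

Under these assumptions the right-shifting register is simply the bit-reversed image of the left-shifting one at every step, which is why reversing both the polynomial ($1021 to $8408) and the final result makes the two agree.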
    
  • Electrodude Posts: 1,440
    edited 2020-01-29 21:52
    That's funny, I've been reversing the input, instead of the polynomial and the result, and it works. It's amazing that CRC is still useful for detecting hardware errors despite how many symmetries it has.
    Hmm, not sure... XOR is symmetrical, it doesn't matter in which order the operations are applied. But the shift direction is still wrong if you reverse the input instead of the polynomial and the output. If I change my code to
    CON
      polynomial = $1021
    
    PUB crc16 (b): c | p
      c:= crc
      p:= polynomial
      asm
        rev b
        setq b
        crcnib c,p
        crcnib c,p
      endasm
      crc:= c
      asm
        'setword c,#0,#1
      endasm
    
    ... I get different results. I cross checked with the original P1 spin function. My first version in the post above gives the same results.
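A small host-side model shows why the two variants disagree. The Python sketch below assumes a right-shifting CRC register that XORs in the polynomial when the incoming bit differs from bit 0; `crc16_right` and its parameters are hypothetical names for this illustration.

```python
def crc16_right(data: bytes, poly: int, lsb_first: bool) -> int:
    """Right-shifting CRC-16 model with selectable data bit order."""
    crc = 0
    for byte in data:
        order = range(8) if lsb_first else range(7, -1, -1)
        for i in order:
            feedback = (crc ^ (byte >> i)) & 1
            crc >>= 1
            if feedback:
                crc ^= poly
    return crc


# First version: reversed polynomial, data fed MSB first.
first = crc16_right(b"\x01", poly=0x8408, lsb_first=False)

# Second version: original polynomial constant, data bits reversed.
second = crc16_right(b"\x01", poly=0x1021, lsb_first=True)
```

In a right-shifting register the constant $1021 acts as the bit-reversed form of a different polynomial, so the second variant is still a usable CRC for internal comparison but does not match a generator using $1021 in the conventional left-shifting sense.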