SPI Coms: Smart Pins vs Bit-Banged

JonnyMac · 2020-09-01 17:42

I'm working on a flash object for the P1 and thought it would be best to work on the P2 version at the same time -- especially with the P2 Eval board conveniently having a flash chip ready to run with. I had a couple of frustrating days getting the smart pins code to work (it does now).

The attached code is the result of my experiments and may be useful to those wanting to play with smart pin SYNC TX and RX modes (SPI). As you go through you can see that I get things working in Spin, then translate to inline PASM. I left the Spin code [commented] in place so that you can see my journey, or just use it as is.

While speaking with Chip yesterday he suggested that SPI is so easy it might just be better to code it rather than using a smart pin. I think this experiment bears that out. Still, the code as I have it is blocking -- the advantage of a smart pin is that you can set it and forget it until later.

Anyway, your feedback is appreciated. We're all still new at the P2, and I'm especially new at PASM2.

Update: Reduced BB PASM2 for shiftin() by one instruction.

cgracey · 2020-09-01 18:00

In the 2nd-stage flash loader, a smart pin in 'transition' mode drives CK and the streamer reads bits in from DI and writes them into hub RAM. CK is transitioned once every system clock and DI is read on every 2nd system clock, with the bits assembled into bytes on their way into hub RAM. This works at 20MHz, but wouldn't work at 300MHz because of pin delays and SPI flash limitations.

'
'
' Read in remaining application longs (read command $03 already issued to flash)
'
		wrpin	#%01_00101_0,#spi_ck	'set spi_ck smart pin for transitions, drives low
		fltl	#spi_ck			'reset smart pin
		wxpin	#1,#spi_ck		'set transition timebase to clk/1
		drvl	#spi_ck			'enable smart pin

		setxfrq	clk2			'set streamer rate to clk/2

		wrfast	#0,##$400-app_start*4	'ready to write to hub at application continuation

.block		bmask	x,#10			'try max streamer block size for longs ($7FF)
		fle	x,app_longs		'limit to number of longs left
		sub	app_longs,x		'update number of longs left

		shl	x,#5			'get number of bits
		setword	wmode,x,#0		'insert into streamer command
		shl	x,#1			'double for number of spi_ck transitions

		wypin	x,#spi_ck		'2	start spi_ck transitions
		waitx	#3			'2+3	align spi_ck transitions with spi_do sampling
		xinit	wmode,#0		'2	start inputting spi_do bits to hub, bytes-msb-first
		waitxfi				'?	wait for streamer to finish

		tjnz	app_longs,#.block	'if more longs left, read another block

		wrpin	#0,#spi_ck		'clear spi_ck smart pin mode
'
'
' Verify application checksum
'
		rdfast	#0,#0			'sum all application longs
		rep	#2,app_longs2
		rflong	x
		add	app_sum,x	wz	'z=1 if verified

stop	if_nz	fltl	#spi_di addpins 2	'if checksum failed, float spi_cs/spi_ck/spi_di pins
	if_nz	hubset	#%0010			'..and stop clock until next reset

		coginit	#0,#$00000		'checksum verified, run application
'
'
' Data
'
clk2		long	$4000_0000		'clk/2 nco value for streamer
wmode		long	$C081_0000 + spi_do<<17	'streamer mode, 1-pin input, bytes-msb-first, bytes to hub

zeroa						'(used by programmer as long 0)
app_longs	long	0			'number of longs in application		(set by programmer)
zerob						'(used by programmer as long 0)
app_longs2	long	0			'number of longs in application		(set by programmer)
zeroc						'(used by programmer as long 0)
app_sum		long	0			'-sum of application longs		(set by programmer)
x						'(used by loader as variable)
loader_sum	byte	-"P",!"r",!"o",!"p"	'"Prop" - sum of $100 loader longs	(set by programmer)

The four instructions from WYPIN to WAITXFI form the code that reads in up to 65,504 bits at a whack from the flash chip. That's one byte every 16 clocks.
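For anyone checking those numbers, here is a quick sketch in Python (not P2 code) that mirrors the BMASK and SHL steps from the listing. The function name `bytes_per_second` is mine, not part of the loader, and it assumes the streamer really does take one bit every 2 system clocks as set by SETXFRQ.

```python
# Mirror of the block-size arithmetic in the loader listing (illustrative only).
MAX_LONGS = (1 << 11) - 1                     # bmask x,#10 sets bits 0..10 -> $7FF longs
bits_per_block = MAX_LONGS << 5               # shl x,#5 -> longs to bits
transitions_per_block = bits_per_block << 1   # shl x,#1 -> two CK transitions per bit

CLOCKS_PER_BIT = 2                            # streamer rate is clk/2
clocks_per_byte = 8 * CLOCKS_PER_BIT


def bytes_per_second(sysclk_hz):
    """Raw streaming rate, ignoring any per-block setup time (an assumption)."""
    return sysclk_hz / clocks_per_byte


print(bits_per_block, transitions_per_block, clocks_per_byte)
print(bytes_per_second(20_000_000))
```

The bit count of 65,504 also fits in the 16-bit field that SETWORD writes into the streamer command, which is why the block is capped at $7FF longs.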

Here are the program and boot times from the flash loader, assuming a 20MHz RCFAST clock. RCFAST actually runs closer to 24MHz, but that would only speed up booting, not programming:

'			program		boot
'	bytes		time		time
'	-------------------------------------
'	0..2KB		30ms		10ms
'	   4KB		60ms		11ms
'	   8KB		94ms		14ms
'	  16KB		170ms		20ms
'	  32KB		200ms		30ms
'	  64KB		300ms		52ms
'	 128KB		570ms		95ms
'	 256KB		1.1s		184ms
'	 512KB		2.2s		358ms
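As a rough cross-check of the boot column, the raw read time can be estimated from the 16-clocks-per-byte figure. This is a Python sketch, not loader code; `raw_read_ms` is a made-up helper, and it assumes the read loop is the only cost, which it is not (the first-stage load, command setup and checksum pass are ignored).

```python
def raw_read_ms(size_bytes, sysclk_hz, clocks_per_byte=16):
    """Time to stream size_bytes from flash at one byte per 16 clocks, in ms."""
    return size_bytes * clocks_per_byte / sysclk_hz * 1000.0


for kb in (128, 256, 512):
    at_20 = raw_read_ms(kb * 1024, 20_000_000)
    at_24 = raw_read_ms(kb * 1024, 24_000_000)
    print(f"{kb:4d}KB  {at_20:6.1f} ms @ 20MHz  {at_24:6.1f} ms @ 24MHz")
```

For 512KB this gives about 419 ms at 20MHz and about 350 ms at 24MHz. The 358ms in the table sits much closer to the 24MHz estimate, which fits the remark that RCFAST is really nearer 24MHz. The few milliseconds left over are not broken down anywhere in this thread.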

JonnyMac · 2020-09-01 18:03

I haven't figured out the streamer yet, and I wanted to make straightforward code that could be easily applied elsewhere. My generic SPI object is going to be bit-banged so that all SPI modes can be supported. I have some other objects, though -- like the MAX7219 -- that might benefit from smart pin SPI.
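To show what "all SPI modes" means for a bit-banged driver, here is a small Python model (not the Spin2 object, which is in the attachment). It uses the usual CPOL/CPHA convention: CPOL is the idle clock level, and CPHA picks whether data is sampled on the leading or trailing clock edge. Both function names are hypothetical.

```python
def shift_out_trace(value, bits=8, cpol=0, cpha=0):
    """Return (sck, mosi) pin states for an MSB-first bit-banged transfer."""
    idle, active = cpol, cpol ^ 1
    trace = []
    for i in range(bits - 1, -1, -1):
        bit = (value >> i) & 1
        if cpha == 0:
            trace.append((idle, bit))    # data set up while clock idles
            trace.append((active, bit))  # leading edge: receiver samples
        else:
            trace.append((active, bit))  # leading edge: data changes
            trace.append((idle, bit))    # trailing edge: receiver samples
    if cpha == 0:
        trace.append((idle, trace[-1][1]))  # return clock to idle
    return trace


def sampled_value(trace, cpol=0, cpha=0):
    """Recover the value a receiver would see, sampling on the mode's edge."""
    prev, value = cpol, 0
    for sck, mosi in trace:
        leading = prev == cpol and sck != cpol
        trailing = prev != cpol and sck == cpol
        if (leading and cpha == 0) or (trailing and cpha == 1):
            value = (value << 1) | mosi
        prev = sck
    return value


for cpol in (0, 1):
    for cpha in (0, 1):
        t = shift_out_trace(0xA5, 8, cpol, cpha)
        print(cpol, cpha, hex(sampled_value(t, cpol, cpha)))
```

In a bit-banged driver the mode only changes the idle level and the order of the pin writes, which is why supporting all four is cheap in software.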

The forum is doing that glitchy thing again where updated code doesn't show up. If you click on the date/time of the edit in my original post, the link to the code will re-appear.

cgracey · 2020-09-01 18:13

To use smart pins in tight places, you have to slow way down and think about your approach. In the end, the code is always short and the results are as fast as they could be, but it's hard to come up with the plan. Then, it takes some experimentation to get a handle on it.

When Jon and I spoke about this yesterday, we were talking about using the 'synchronous serial' smart-pin mode for this job, and my mind immediately got plugged up with details that would need to be worked through. Eventually, we will have enough samples and documentation that all the difficulty will be tamed, but for now it's still challenging. Part of my mental block was that I remembered that I used the 'transition' smart-pin mode and the streamer to accomplish all this in the flash loader, with the added benefit of having the streamer handle all the data I/O to and from hub RAM, without any extra activity in my code. So, I was kind of spoiled with one solution that I know works best for flash I/O.

JonnyMac · 2020-09-01 18:28

What was ultimately kicking me around was that I left the DO and DI smart pins enabled when I wasn't using them. Being attached to SCK, this created problems. The current code disconnects DO when not in use, and resets DI just before use to clear the internal result before the clocking starts.

While frustrating, this turned out to be a worthwhile learning experience, and now there is a bit of code in both styles (bit-bang vs smart pin) and languages (Spin2 vs PASM2) that may help others get into the groove.

Ultimately, a flash object will benefit from a very fast sector<-->buffer transfer. I'm going to do it with the code I have, then may ask for help in updating it to use the streamer.

Ariba · 2020-09-01 18:52

The biggest disadvantage of the smart pin solution is that the clock at input B can be at most 3 pins away from the data pin. This means that an SPI driver built on smart pins is not universally usable.
Smart pin SPI also only works up to 1/5 or 1/6 of the system clock, while bit-banged SPI, or the streamer together with a smart pin for the clock, can go up to 1/2 of the system clock and has no pin layout limitations.
But for inline-assembly SPI for SD cards, smart pin SPI is probably still the best solution.

Andy
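The two limits above can be put into a quick Python sketch for planning a pin layout. This is only a model: the helper names are made up, the pin numbers in the example are arbitrary, the check ignores any wrap-around in relative pin selection, and the smart pin divisor uses the more optimistic 1/5 figure.

```python
def clock_pin_reachable(data_pin, clock_pin, max_offset=3):
    """True if the clock pin is within 3 pins of the data pin (no wrap handling)."""
    return 0 < abs(clock_pin - data_pin) <= max_offset


def max_spi_hz(sysclk_hz, method):
    """Upper bound on SPI clock for each approach, per the ratios quoted above."""
    divisors = {"smartpin": 5, "bitbang": 2, "streamer": 2}
    return sysclk_hz / divisors[method]


print(clock_pin_reachable(10, 12))          # clock 2 pins from data
print(clock_pin_reachable(10, 14))          # clock 4 pins from data
print(max_spi_hz(200_000_000, "smartpin"))
print(max_spi_hz(200_000_000, "bitbang"))
```

These are ceilings from the P2 side only. The attached device, the wiring and pin delays will usually set a lower practical limit.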

avsa242 · 2020-09-01 20:44

Yeah I've run into the pin layout issue, too. In my library, I have a smartpin-based SPI engine (fastspin) that I've tested to 25MHz (with an SSD1331 OLED, IIRC) https://github.com/avsa242/p2-spin-standard-library/blob/testing/library/com.spi.spin2. There have been a few devices I couldn't get it to work with, though, and the on-board flash chip was one of them - and I think it was due to the pin-relationship limitation. It's also limited to mode 0, at the moment, so it could definitely use some work.
