Smart Pins Docs and features

samuell · 2019-07-27 14:01

Hi Evan,

What SPI mode are you using? It seems that you are changing state on high-to-low clock transitions, and therefore asserting on low-to-high. It seems to be mode 0 (CPOL = 0, CPHA = 0). Am I right?

Also, is the clock speed just 5 kHz? I need to check my calculations, but this seems to be very slow. Despite that, it is nice to see that SPI is working (well, it should anyway, or else one wouldn't be able to read the SD card, to begin with).

I'm hoping to have SPI functionality integrated into a C library. However, I don't have the expertise to do that. I need speeds on the order of 12 MHz for SPI, for a project.

Kind regards, Samuel Lourenço

Peter Jakacki · 2019-07-27 14:13

At 250 MHz CPU speeds I am getting 25 MHz read/write speeds using bit-bashing with zero setup time required. I think for a lot of tasks this is sufficient, and besides, a lot of devices have definite limitations in terms of clock speed. I'm not in a hurry to implement smartpin-based SPI, but I'm taking a backseat while I watch and wait for something that just works and is efficient at 50 MHz, which is what SD cards are specified to run up to using SPI timing.

So while smartpin based SPI bus is desirable, it is not a show-stopper for day to day work. If we wanted to run the P2 as a SPI slave then it would be useful to have smartpins handle this.

evanh · 2019-07-27 15:10

samuell wrote: »

What SPI mode are you using? It seems that you are changing state on high-to-low clock transitions, and therefore asserting on low-to-high. It seems to be mode 0 (CPOL = 0, CPHA = 0). Am I right?

Yes, you've described the polarity, but it wasn't meant to be a specific mode and can certainly be rearranged. Just a demo of tight clock and data aligning that the smartpins weren't really intended to handle.

Also, is the clock speed just 5 kHz? I need to check my calculations, but this seems to be very slow. Despite that, it is nice to see that SPI is working (well, it should anyway, or else one wouldn't be able to read the SD card, to begin with).

That scope snapshot was of the program running on RCFAST, so about 22 MHz sysclock. It was a demo of SPI clock at sysclock/2, so therefore SPI clock would be around 11 MHz. If sysclock was, say, 80 MHz then the SPI clock would be 40 MHz.
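The figures in this exchange all come from dividing the system clock by the number of sysclocks spent per SPI bit. A minimal Python sketch of that arithmetic (the function name is made up for illustration, and the 22 MHz RCFAST figure is only approximate):

```python
def spi_clock_hz(sysclock_hz: int, sysclocks_per_bit: int) -> float:
    """SPI clock frequency when each SPI bit takes `sysclocks_per_bit` system clocks."""
    if sysclocks_per_bit < 2:
        # One sysclock high plus one sysclock low is the shortest possible clock period.
        raise ValueError("an SPI clock period needs at least 2 sysclocks")
    return sysclock_hz / sysclocks_per_bit


print(spi_clock_hz(22_000_000, 2) / 1e6)    # RCFAST demo at sysclock/2 -> 11.0 (MHz)
print(spi_clock_hz(80_000_000, 2) / 1e6)    # 80 MHz sysclock at sysclock/2 -> 40.0 (MHz)
print(spi_clock_hz(250_000_000, 10) / 1e6)  # 250 MHz with 10 sysclocks per bit -> 25.0 (MHz)
```

By the same arithmetic, a 12 MHz SPI clock at sysclock/2 would need a sysclock of about 24 MHz.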

evanh · 2019-07-27 16:15

Here's a sysclock/4 example using smartpins + bit-bashed SPI clock and with OUT as tx smartpin clock input:

'Smartpin loopback test of sync. Tx and Rx modes 
'for P2-ES Eval board

dat	org
		hubset	##$0100_0008		' XI config (RCFAST operating)
		waitx	##(22_000_000/100)	' 10 ms
		hubset	##$0100_000a		' XI engage

'setup pin for SPI clock

		wrpin	##1<<16, sck		'registered pin
		drvl	sck

'setup smartpin for sync tx

		wrpin	##(%0000_0100<<24)+(1<<16)+%01_11100_0, tx	'clock from OUT, DIR forced on, registered pin
		wxpin	#$20 + 7,tx
		dirh	tx

'setup smartpin for sync rx

		wrpin	##(%0001_0010<<24)+(1<<16)+%11101_0, rx		'clock from rx + 2, data from rx + 1, registered pin
		wxpin	#$0 + 7,rx
		dirh	rx

'send bytes on tx pin and receive bytes on rx pin

		loc	ptra,#@msg
.loop		rdbyte	pb,ptra++ wz
	if_nz	call	#txrx
	if_nz	jmp	#.loop
done		jmp	#$

txrx
		rev	pb		'big-endian for conventional SPI ... and human sanity
		shr	pb,#24
		wypin	pb,tx		'load shifter and place first bit on SMART_OUT

		rep	@.loop,#8	'send 8 SPI clocks
		or	outa, #%1100	'SPI clock high, pins 2(tx) and 3(sck)
		andn	outa, #%1100	'SPI clock low, pins 2(tx) and 3(sck)
.loop
		waitx	#6		'twiddle thumbs while the lag from OUT to pin to smartpin propagates
		rdpin	pa,rx
		shr	pa,#24		'received byte

		cmp	pa,pb wz	'both leds lit if match
		drvl	#56		'alternate leds lit if mismatch
		drvnz	#57
		waitx	##25_000_000
		drvh	#56
		drvz	#57
		waitx	##25_000_000
		ret	wcz

rx		long	1		'smartpin locations
tx		long	2
sck		long	3

		orgh	$400
msg		byte	%1011_1010,"Smartpins",0
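The REV / SHR #24 pair in txrx is what turns a shifter that sends least-significant bit first into a conventional MSB-first SPI transmitter. The Python model below is only a sketch of that bit handling. It assumes, as the listing implies, that the tx smartpin shifts out LSB first and that the rx smartpin shifts incoming bits in from the top of a 32-bit register (hence the SHR #24 after RDPIN). All function names are invented for illustration.

```python
def rev32(value: int) -> int:
    """Reverse the bit order of a 32-bit word, like PASM2 REV."""
    return int(f"{value & 0xFFFFFFFF:032b}"[::-1], 2)


def prepare_tx(byte: int) -> int:
    """REV followed by SHR #24: the byte with its 8 bits mirrored."""
    return rev32(byte) >> 24


def shift_out_lsb_first(word: int, nbits: int = 8) -> list[int]:
    """Bits in the order an LSB-first shifter would put them on the wire."""
    return [(word >> i) & 1 for i in range(nbits)]


def shift_in_from_top(bits: list[int]) -> int:
    """Assumed rx behaviour: each new bit enters at bit 31, older bits move down."""
    reg = 0
    for bit in bits:
        reg = (reg >> 1) | (bit << 31)
    return reg


msg_byte = 0b1011_1010                      # first byte of msg in the listing
tx_word = prepare_tx(msg_byte)              # value written with WYPIN
wire = shift_out_lsb_first(tx_word)         # what appears on the tx pin, in time order
rx_word = shift_in_from_top(wire) >> 24     # RDPIN then SHR #24

print(f"{tx_word:08b}", wire, f"{rx_word:08b}")
```

On the wire the bits read 1,0,1,1,1,0,1,0, which is the original byte MSB first, and the received value equals the mirrored byte held in pb, which is why the CMP in the listing compares pa against pb rather than against the original byte.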

The scope snapshot shows a two-sysclock lag (with registered tx pin) on the tx data, so that the data transition lines up with the low-going clock edge.

evanh · 2019-07-27 16:45

I've also updated the block diagram to show there is separate OUT and SMART_OUT, indicating they can be used separately - https://forums.parallax.com/discussion/comment/1473762/#Comment_1473762

pilot0315 · 2019-07-27 21:47

@evanh

Thanks for the smart pin lesson. Was working it today. I changed to pin 0 for the scope.
Added a couple of comments, please check if I was correct on the pin sequence to start the pin.
Thanks

evanh · 2019-07-28 03:41

Yep, the pin number change is correct, and obviously has worked. The comments, not so much.

But first, I'd suggest removing the hubset #0 as that is just overriding your chosen system clock frequency of 160 MHz.

Okay, the comments:
- DIRL #0 lowers the output enable (direction) control signal of pin0. So it's an output disable rather than pin low. And what's more, once a smartpin mode is selected then this control becomes a smartpin enable/disable instead of the physical pin. Hence the comment about enabling the smartpin with the subsequent DIRH #0. The physical pin output is then enabled with the (%01<<6) in the WRPIN instruction.

- ADD daclevel,#1 is simply adding 1 to the variable "daclevel". It starts at 222 so the first add increments daclevel to 223. Second loop daclevel becomes 224, then 225, and so on until the variable exceeds 32 bits and rolls over back to zero.

- WYPIN daclevel,#0 places a copy of daclevel into smartpin 0, specifically register Y of smartpin 0. Because the smartpin is configured to use its register Y as the set level of the DAC, this sets the voltage out of the DAC at that moment.

So because on each loop, daclevel gets incremented and then repeatedly written into the smartpin, the smartpin is repeatedly raising the DAC value in small steps and therefore the voltage also in small steps. That's why the slope of the sawtooth is rising to the right.
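The increment-and-write loop described above can be modelled in a few lines of Python. This is only a sketch of the 32-bit ADD behaviour; it says nothing about how the DAC interprets the value, and the helper name is made up.

```python
MASK32 = 0xFFFF_FFFF


def add32(value: int, increment: int = 1) -> int:
    """32-bit ADD: the result wraps around instead of growing past 32 bits."""
    return (value + increment) & MASK32


daclevel = 222
history = []
for _ in range(3):
    daclevel = add32(daclevel)   # ADD daclevel,#1
    history.append(daclevel)     # WYPIN daclevel,#0 would copy this into register Y

print(history)               # [223, 224, 225]
print(add32(0xFFFF_FFFF))    # 0
```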

See if you can work out why the voltage also has the distinctive sawtooth vertical fall.

evanh · 2019-07-28 04:21

For comparison with above scope snapshot I've rerun my first sysclock/4 "hack" of Oz's demo, with the so-called tighten timing example that I posted - https://forums.parallax.com/discussion/comment/1474287/#Comment_1474287

Here's what that one looked like with the same first ID byte as above (and I've also changed this to big-endian to match):

It worked because the rx smartpin was configured to use the low going clock rather than high going. As you can see, the second low going clock edge is just after the data level has changed. Not really ideal.

evanh · 2019-07-28 04:28

samuell wrote: »

It seems to be mode 0 (CPOL = 0, CPHA = 0). Am I right?

I've just had a look around for the mode naming and yes, CPOL=0 and CPHA=0 looks to be a good fit there. The first data bit is being placed on the MOSI tx pin before the first rising SPI clock edge. And that same first rising SPI clock is the real condition for second data bit on tx pin. The fact that the second data bit doesn't appear at the tx pin until later is that detail of lag I keep raising. So there is some illusion in effect to arrange the desired timings.
[Image: SPI timing diagram (SPI_timing_diagram2.svg)]
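To make the mode 0 description concrete, here is a small Python sketch of an idealised CPOL=0, CPHA=0 waveform: the data bit is in place while the clock is low, the receiver samples on each rising edge, and the data changes at the falling edge. It ignores the pin lag discussed in this thread, and the function names are invented.

```python
def mode0_waveform(byte: int) -> list[tuple[int, int]]:
    """Return (sck, mosi) samples, two per bit, MSB first, clock idling low."""
    samples = []
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        samples.append((0, bit))   # clock low: new data bit is set up
        samples.append((1, bit))   # clock high: data held steady
    samples.append((0, 0))         # clock returns to its idle low level
    return samples


def sample_on_rising(samples: list[tuple[int, int]]) -> int:
    """Receiver side: capture MOSI on every low-to-high clock transition."""
    value = 0
    prev_sck = 0
    for sck, mosi in samples:
        if prev_sck == 0 and sck == 1:
            value = (value << 1) | mosi
        prev_sck = sck
    return value


wave = mode0_waveform(0b1011_1010)
print(wave[0])                       # (0, 1): first bit present before any clock edge
print(hex(sample_on_rising(wave)))   # 0xba
```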

evanh · 2019-07-28 12:28

After some reading and a small amount of testing, I can't see the smartpin tx synchronous serial mode being able to support CPHA = 1. A streamer could do it.

And rx mode can happily handle either, it's just a clock input polarity change for rx to move between first and second SPI clock edge. The prop2 can handle any polarity changes no problem.
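For reference, the usual SPI mode numbering and the clock edge a receiver samples on can be written as a short Python sketch. This reflects the general SPI convention rather than anything P2-specific, and the helper names are illustrative.

```python
def spi_mode(cpol: int, cpha: int) -> int:
    """Conventional mode number: mode = CPOL * 2 + CPHA."""
    return (cpol << 1) | cpha


def sample_edge(cpol: int, cpha: int) -> str:
    """Edge on which data is sampled: the first edge if CPHA = 0, the second if CPHA = 1."""
    return "rising" if cpol == cpha else "falling"


for cpol in (0, 1):
    for cpha in (0, 1):
        print(f"mode {spi_mode(cpol, cpha)}: CPOL={cpol} CPHA={cpha} samples on {sample_edge(cpol, cpha)} edge")
```

Modes 0 and 3 both sample on the rising edge, while modes 1 and 2 sample on the falling edge, which is why a receiver can switch between them with nothing more than a clock polarity change.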

evanh · 2019-07-28 12:48

evanh wrote: »

jmg wrote: »

I'm guessing this streamer-generates-clock cannot be used with a streamer-for-data ?
What speed could P2 manage for QuadSPI, using streamer for data ?

Well, could have both clock and data formatted together into hubram before sending them with a streamer. But the processing overhead would defeat the purpose.

I see Chip mentioned, a while back, the idea of using a smartpin for SPI clock gen alongside a streamer to do the 4-bit parallel data. This would be much more sensible since the data won't need reformatting.

Getting it fully functioning will be a decent undertaking though. Because QuadSPI, and its ilk, are a contorted extension to SPI, in practice, there will be commands and mode changes that are not always 4-bit parallel. So the implementation will need to handle back and forth mode transitions in a clean manner.

evanh · 2019-07-28 13:08

evanh wrote: »

... the idea of using a smartpin for SPI clock gen alongside a streamer to do the 4-bit parallel data.

There is an added trickiness to this arrangement. When using smartpins for the SPI data, they are following the provided SPI clock. A streamer does no such thing. It instead has to be separately programmed with the same timing details as the clock source is. The software has to arrange for them to be matched on alignment, sync and run-length.

pilot0315 · 2019-07-28 18:41

@evanh
faster ramp down of the voltage. similar to capacitive charge and discharge

pilot0315 · 2019-07-28 21:37

@evanh
Got reversing sawtooth working. Thanks for your help. Starting to get a handle on this.

pilot0315 · 2019-07-28 21:55

@evanh
Had the wrong code

evanh · 2019-07-29 00:37

pilot0315 wrote: »

@evanh
Got reversing sawtooth working. Thanks for your help. Starting to get a handle on this.

Good stuff. Now try removing the HUBSET instruction and see what happens.

cheezus · 2019-07-29 09:42

Peter Jakacki wrote: »

At 250 MHz CPU speeds I am getting 25 MHz read/write speeds using bit-bashing with zero setup time required. I think for a lot of tasks this is sufficient, and besides, a lot of devices have definite limitations in terms of clock speed. I'm not in a hurry to implement smartpin-based SPI, but I'm taking a backseat while I watch and wait for something that just works and is efficient at 50 MHz, which is what SD cards are specified to run up to using SPI timing.

So while smartpin based SPI bus is desirable, it is not a show-stopper for day to day work. If we wanted to run the P2 as a SPI slave then it would be useful to have smartpins handle this.

I agree that bit bashing is great for most things P2 but I think the smartpins could really help for SPI WHERE THERE IS PROCESSING TO BE DONE BETWEEN DATAWORDS. ie. What I really want to use smartpin SPI for is reading touchscreens. The large 7" resistive screens I use require a calibration, so before a final X,Y is output it must be filtered and processed. I'm not 100% sure how this would work out but seems doable. I see some room to improve the quality of my samples as well, which should help.

Smartpin SPI also has the advantage of going to sysclock/2.

I've had a chance to sit down and look at things a bit more, and @evanh's comment about not being able to do CPHA = 1 made everything click. I have the SD driver working solidly using Mode 0,0, but my hardware uses Mode 1,1, with pullups to allow the pins to be used for other things. I have a couple of ideas I want to try to get Mode 1,1 working, but even if I don't, at least I learned some things about the smartpins!

I know most people won't care what mode their SD driver is running in and since Mode 0,0 works I'll probably base my driver off of this, instead of the path I'm going to head down shortly.

An aside: I'm only able to test SD up to 25 MHz; at around 27 MHz my card stops responding. I'm pretty sure this would run to 50 MHz no problem with a sysclock of 100 MHz or greater. @"Peter Jakacki" if you would like to test what I have working so far I'll update the SD post with some code. Just let me know.

@samuell If you are looking for generic SPI code I could probably help you out.

Peter Jakacki · 2019-07-29 11:47

@cheezus - Bear in mind that the eval board has hideously long pcb traces to the SD card so this will limit the maximum speed anyway due to the inductance, capacitance, and perhaps even cross-talk and skew.
I can try this on my P2D2 board though.

evanh · 2019-07-29 13:23

I've been pondering a generic SPI handler at sysclock/2. It can't practically be done with smartpins in short bursts other than one byte at a time, which is not particularly useful. The only time it can benefit anyone is when doing burst transfers like data blocks with SD cards or ADC/DAC sampling. Of course, long bursts are exactly the right situation to make it work.

The key point I'm making here is that these types of transfers can be speed optimised for their longer lengths. This has a bearing on the structure of the overheads. In the case of a data block, the length is known exactly and is an easy round number. In the case of ADC sampling there isn't any concern about trailing data loss; an arbitrary cut-off is fine.

Very high speed unbroken data rates will need some setup, methinks. Using a streamer, even for pure SPI, seems inescapable to achieve this.
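To give a feel for what a long unbroken burst buys, here is a rough Python estimate of the raw transfer time for one data block. The 512-byte block size and the 160 MHz sysclock are example values chosen for illustration, and the estimate ignores command overhead, CRC and any gaps between words.

```python
def burst_time_us(nbytes: int, sysclock_hz: int, sysclocks_per_bit: int) -> float:
    """Raw time in microseconds to clock `nbytes` out with no gaps between bits."""
    total_sysclocks = nbytes * 8 * sysclocks_per_bit
    return total_sysclocks / sysclock_hz * 1e6


for divider in (2, 4, 8):
    t = burst_time_us(512, 160_000_000, divider)
    print(f"sysclock/{divider}: {t:.1f} us per 512-byte block")
```

At sysclock/2 this works out to about 51 microseconds per block, and it doubles with each step down to sysclock/4 and sysclock/8.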

jmg · 2019-07-29 23:45

evanh wrote: »

I've been pondering a generic SPI handler at sysclock/2. It can't practically be done with smartpins in short bursts other than one byte at a time, which is not particularly useful. The only time it can benefit anyone is when doing burst transfers like data blocks with SD cards or ADC/DAC sampling. Of course, long bursts are exactly the right situation to make it work.

The key point I'm making here is that these types of transfers can be speed optimised for their longer lengths. This has a bearing on the structure of the overheads. In the case of a data block, the length is known exactly and is an easy round number. In the case of ADC sampling there isn't any concern about trailing data loss; an arbitrary cut-off is fine.

Very high speed unbroken data rates will need some setup, methinks. Using a streamer, even for pure SPI, seems inescapable to achieve this.

Yes, it is somewhat klunky to have all that smart-pin serial shifting silicon, and then have to generate the clock by bit-bashing, which consumes all the processor time.
ie the best SPI examples should really have HW Clocks and HW shifters.
If the final silicon is at least as fast as P2-ES, & cooler, maybe SysCLK/4 is going to be tolerable ?

Did you ever look at using 2 clk pins, so the TX shifter clock can be phase-moved relative to RX ? That costs a pin, but maybe that's tolerable for the very highest speeds ?

evanh · 2019-07-30 00:05

That one would be smartpin doing the SPI clock and streamer doing the data, with all the setup complexities, as per Chip's suggestion. It would be the only chance to achieve unbroken bit rate at sysclock/2.

Using smartpins for data, sysclock/4 would manage a unidirectional unbroken rate only with a careful tight loop consuming the cog for the burst length.

Two clock pins is exactly what the recent streamer based SPI clock was doing. Just that the second clock stayed internally mapped to the tx smartpin only. And then I duplicated the idea back to the bit-bashed SPI clock version as well. It is what allowed the superior alignment via OUT-as-input config - https://forums.parallax.com/discussion/comment/1474472/#Comment_1474472

jmg · 2019-07-30 00:55

evanh wrote: »

Two clock pins is exactly what the recent streamer based SPI clock was doing. Just that the second clock stayed internally mapped to the tx smartpin only.

Yes, but I think that consumed the streamer to generate the clock, and was not suited to long bursts ?
Could that 'second clock stayed internally mapped to the tx smartpin' instead be generated by a smart pin, running in pulse-count-out mode ?
Hopefully, that can free the streamer for data transport ?
Saving a physical pin clearly has appeal, so if that can be internal, that is great.

evanh · 2019-07-30 01:34

The sysclock/4 with bit-bashed SPI clock didn't need the streamer. Although that one could only handle one byte at a time.

As I just pointed out, I think it would be possible to engineer a longer burst at sysclock/4 with a customised routine just for that case. This one would use the streamer for the SPI clock, to do the OUT-as-input config.

Sysclock/8 would use a smartpin for both because the lag issue fades at this point. No longer requires the OUT-as-input config.

Also, as I've been saying a few times, to get unbroken data from sysclock/2 will require using streamer to handle the data. So far I've only done streamer for clock.

Chip's suggestion is streamer for data and smartpin for clock. In this mode there is no clock following occurring. It's purely a case of the software aligning the starts of both the clock and data to coincide. And ensuring the divider timings also match so that they don't go out of sync mid way. And also good idea to make them finish together as well.
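Since nothing in hardware ties the two together in this arrangement, the three conditions (same start, same rate, same length) can be checked on paper before committing to a setup. The Python sketch below is a purely illustrative model: the parameters are sysclock tick counts, and neither the function nor its arguments correspond to any actual P2 register.

```python
def alignment_report(clk_start: int, clk_div: int, clk_bits: int,
                     data_start: int, data_div: int, data_bits: int) -> dict:
    """Compare an independently programmed clock source and data source.

    *_start: sysclock tick at which the source begins
    *_div:   sysclocks per bit
    *_bits:  number of bits (clock pulses or data bits) in the burst
    """
    clk_end = clk_start + clk_div * clk_bits
    data_end = data_start + data_div * data_bits
    drift_per_bit = data_div - clk_div
    return {
        "start_skew": data_start - clk_start,   # fixed offset, may be intentional
        "drift_per_bit": drift_per_bit,         # non-zero means they slide apart mid-burst
        "end_skew": data_end - clk_end,         # non-zero means they do not finish together
        "matched": drift_per_bit == 0 and clk_bits == data_bits,
    }


print(alignment_report(100, 2, 4096, 100, 2, 4096))   # matched burst
print(alignment_report(100, 2, 4096, 100, 3, 4096))   # divider mismatch: drifts 1 sysclock per bit
```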

evanh · 2019-07-30 01:57

jmg wrote: »

Could that 'second clock stayed internally mapped to the tx smartpin' instead be generated by a smart pin, running in pulse-count-out mode ?

No, the OUT-as-input config can only use OUT from the cogs/streamers. The smartpins can't control OUT. In the block diagram, because of this, I've now separated the two paths and renamed the smartpin's output as SMART_OUT - https://forums.parallax.com/discussion/comment/1473762/#Comment_1473762

cheezus · 2019-07-30 02:07

I've been working out how to clean up some uglies and this may be applicable to more than just SD.

One problem I was having is when sysclock was not at least 2x spi_clk_max. This SHOULD fix that, although I'm sure there's a cleaner way to write it. I could use some help because I admit to not being good at the compound min-max statements...

    sbt :=((clkfreq / spi_clk_max) /2 )+1   ' SPI bit time
    if sbt < 2                              ' make sure at least sysclock /2
        sbt := 2
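The clamp above is a "limit minimum" operation. In Spin that is the `#>` operator, so an untested one-line equivalent would be `sbt := ((clkfreq / spi_clk_max) / 2 + 1) #> 2`. The Python sketch below mirrors the same integer arithmetic so the behaviour at a few clock rates can be checked. It models the expression exactly as posted and makes no claim about the resulting SPI frequency.

```python
def spi_bit_time(clkfreq: int, spi_clk_max: int) -> int:
    """Same integer arithmetic as the Spin snippet, with the minimum of 2 applied."""
    return max(2, (clkfreq // spi_clk_max) // 2 + 1)


print(spi_bit_time(160_000_000, 25_000_000))   # 4
print(spi_bit_time(100_000_000, 50_000_000))   # 2
print(spi_bit_time(40_000_000, 25_000_000))    # 2 (would be 1 without the clamp)
```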

One of the other things I didn't like was hard-coding the clock pin relative to the data pins so this should clean that up. Although it should probably be a one-liner too...

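PRI getmask(clk, data) | t
    t := clk - data             ' signed pin offset from the data pin to the clock pin
    if ( || t ) <> t            ' offset is negative: clock pin is below the data pin
      t := ( || t ) + %0100
    return t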
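For comparison, here is a Python sketch of both the posted mapping and a one-line alternative. The alternative assumes the relative-pin selector encoding as I read it in the P2 documentation (x001..x011 for +1..+3, and x101, x110, x111 for -3, -2, -1). If that reading is right, the posted version agrees only for an offset of -2, so it is worth checking against the current docs before relying on either.

```python
def getmask_as_posted(clk: int, data: int) -> int:
    """Direct translation of the Spin method above."""
    t = clk - data
    if abs(t) != t:
        t = abs(t) + 0b0100
    return t


def relative_pin_selector(clk: int, data: int) -> int:
    """One-liner under the assumed encoding: low three bits of the signed offset."""
    offset = clk - data
    if not -3 <= offset <= 3:
        raise ValueError("relative pin offset must be within -3..+3")
    return offset & 0b0111


for offset in range(-3, 4):
    print(offset, f"{getmask_as_posted(offset, 0):04b}", f"{relative_pin_selector(offset, 0):04b}")
```

In Spin the alternative would be a single masking expression, `(clk - data) & %0111`, again under the stated assumption about the encoding.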

Hopefully someone can point out that nice one-liner for those two methods. So simple I'm just too close to the problem, but at least I didn't implement getmask with a lookup table?

The idea of using the streamer to handle data is something I've been contemplating, although I tend to want to leave the streamer free so I can run code in hubexec at some point.

The fact I'm able to get sysclock/2 will probably be killed by the extra flops in the pins. Maybe sysclock/4 is realistic, and totally acceptable. I guess I'll deal with that when I get there. Doing full-duplex SPI at sysclock/2 is unrealistic as well. I'm probably only getting this to work out of luck. I'm not sure what the difference between an external device and same-cog loopback implies.

The one takeaway from my testing code: with bit-bashed pins, the time for the simple test to complete was usually a linear function of clock speed. Once sysclock*2 > spi_clk_max, the time to complete was not as dependent on clock speed, i.e. the test takes 13 s at 160 MHz and 11 s at 320 MHz. I haven't compared times on bit-bashed code, but decoupling from sysclock is nice and the streamer will probably help.

evanh · 2019-07-30 11:56

Doh! One streamer can't do data for both tx and rx together. But I suppose things like QuadSPI are half duplex anyway. Yet one more reason why top speed bursting is a specialised mode to switch in and out of.

cheezus · 2019-07-30 19:23

I've got a question I'm pretty sure I know the answer to but wanted to double check before I head down this rabbit hole...

If I'm using the smartpin SPI for byte read/write and wanted to change to say 32 bits for one transaction, instead of doing 4x byte transfers.. Do I need to dirl, dirh after the WXPIN? I'm pretty sure it would be needed, maybe easier to just do 4x bytes. I was thinking of setting up the smartpins for 32 bits and just only count to 8? Seems like this could work..

jmg · 2019-07-30 21:11

cheezus wrote: »

I've got a question I'm pretty sure I know the answer to but wanted to double check before I head down this rabbit hole...

If I'm using the smartpin SPI for byte read/write and wanted to change to say 32 bits for one transaction, instead of doing 4x byte transfers.. Do I need to dirl, dirh after the WXPIN? I'm pretty sure it would be needed, maybe easier to just do 4x bytes. I was thinking of setting up the smartpins for 32 bits and just only count to 8? Seems like this could work..

On Rx it is likely tricky, as somewhere a shifter transfers to a holding register, and it needs to know how many clocks to do that.
On Tx, you also have a holding register and shifter, but you may be ok if a write to the holding register 're-primes' the shifter.
The better SPI peripherals have FIFOs, and they can Tx and Rx continually with no pauses in the clocks. P2 may not quite achieve that both ways, but I think Chip has said it can manage no-pauses one way.

evanh · 2019-07-31 01:51

There is a single longword buffer and shifter for both tx and rx modes. How many of those bits are for tx word length depends on config.

There are two slightly different tx operating modes:
X[5]=1 is start/stop mode. This has auto filling of the shifter if transmitting is finished; the buffer automatically engages once transmitting has started.
X[5]=0 is continuous mode. This has the buffer engaged with DIR high and the shifter direct with DIR low. So initial filling of the shifter is by lowering DIR to load the first word into the shifter.

Both will start over if DIR is cycled. Both present first bit on tx pin before first clock edge.
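As a sketch of how the X value for the tx smartpin is put together, the Python helper below combines the X[5] mode bit with a word length. It assumes X[4:0] holds the bit count minus one, which is inferred from the `wxpin #$20 + 7` used for 8-bit transfers earlier in this thread. The function name is made up.

```python
def sync_tx_x(bits: int, start_stop: bool) -> int:
    """Build the X value for sync serial tx: X[5] = mode, X[4:0] = bits - 1 (assumed)."""
    if not 1 <= bits <= 32:
        raise ValueError("word length must be 1..32 bits")
    mode_bit = 0x20 if start_stop else 0x00
    return mode_bit | (bits - 1)


print(hex(sync_tx_x(8, True)))     # 0x27, same as #$20 + 7
print(hex(sync_tx_x(32, True)))    # 0x3f
print(hex(sync_tx_x(32, False)))   # 0x1f
```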

Changing the smartpin X parameter in mid operation will probably be actioned. However, the outcome isn't likely to be orderly.

cheezus · 2019-07-31 06:41

evanh wrote: »

Changing the smartpin X parameter in mid operation will probably be actioned. However, the outcome isn't likely to be orderly.

I was pretty sure this would be the case. It's probably better to just do 4x bytes...
