What SPI mode are you using? It seems that you are changing state on high-to-low clock transitions, and therefore asserting on low-to-high. It seems to be mode 0 (CPOL = 0, CPHA = 0). Am I right?
Also, is the clock speed just 5 kHz? I need to check my calculations, but this seems to be very slow. Despite that, it is nice to see that SPI is working (well, it should anyway, or else one wouldn't be able to read the SD card, to begin with).
I'm hoping to have SPI functionality integrated into a C library. However, I don't have the expertise to do that. I need SPI speeds on the order of 12 MHz for a project.
At 250 MHz CPU speeds I am getting 25 MHz read/write speeds using bit-bashing, with zero setup time required. I think for a lot of tasks this is sufficient, and besides, a lot of devices have definite limitations in terms of clock speed. I'm not in a hurry to implement smartpin based SPI; I'm taking a backseat while I watch and wait for something that just works and is efficient at 50 MHz, which is what SD cards are specified to run up to using SPI timing.
So while smartpin based SPI bus is desirable, it is not a show-stopper for day to day work. If we wanted to run the P2 as a SPI slave then it would be useful to have smartpins handle this.
What SPI mode are you using? It seems that you are changing state on high-to-low clock transitions, and therefore asserting on low-to-high. It seems to be mode 0 (CPOL = 0, CPHA = 0). Am I right?
Yes, you've described the polarity, but it wasn't meant to be a specific mode and can certainly be rearranged. Just a demo of tight clock and data aligning that the smartpins weren't really intended to handle.
Also, is the clock speed just 5 kHz? I need to check my calculations, but this seems to be very slow. Despite that, it is nice to see that SPI is working (well, it should anyway, or else one wouldn't be able to read the SD card, to begin with).
That scope snapshot was of the program running on RCFAST, so about 22 MHz sysclock. It was a demo of SPI clock at sysclock/2, so therefore SPI clock would be around 11 MHz. If sysclock was, say, 80 MHz then the SPI clock would be 40 MHz.
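Here is the arithmetic behind those numbers as a minimal C sketch. The helper `spi_clock_hz` is hypothetical and does not come from any P2 library. The SPI clock is the sysclock divided by the number of sysclocks per SPI clock period, which is 2 in this demo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration: SPI clock produced when each
   SPI clock period spans `div` sysclocks (div = 2 is sysclock/2). */
static uint32_t spi_clock_hz(uint32_t sysclock_hz, uint32_t div)
{
    /* a high phase plus a low phase needs at least 2 sysclocks */
    assert(div >= 2);
    return sysclock_hz / div;
}
```

For comparison, the 25 MHz bit-bashed rate quoted earlier at a 250 MHz sysclock works out to a divider of 10.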
Thanks for the smart pin lesson. Was working on it today. I changed to pin 0 for the scope.
Added a couple of comments, please check if I was correct on the pin sequence to start the pin.
Thanks
Yep, the pin number change is correct, and obviously has worked. The comments, not so much.
But first, I'd suggest removing the hubset #0 as that is just overriding your chosen system clock frequency of 160 MHz.
Okay, the comments:
- DIRL #0 lowers the output enable (direction) control signal of pin0. So it's an output disable rather than pin low. And what's more, once a smartpin mode is selected then this control becomes a smartpin enable/disable instead of the physical pin. Hence the comment about enabling the smartpin with the subsequent DIRH #0. The physical pin output is then enabled with the (%01<<6) in the WRPIN instruction.
- ADD daclevel,#1 is simply adding 1 to the variable "daclevel". It starts at 222 so the first add increments daclevel to 223. Second loop daclevel becomes 224, then 225, and so on until the variable exceeds 32 bits and rolls over back to zero.
- WYPIN daclevel,#0 places a copy of daclevel into smartpin 0, specifically register Y of smartpin 0. Because the smartpin is configured to use its register Y as the set level of the DAC, this sets the voltage out of the DAC at that moment.
So because on each loop, daclevel gets incremented and then repeatedly written into the smartpin, the smartpin is repeatedly raising the DAC value in small steps and therefore the voltage also in small steps. That's why the slope of the sawtooth is rising to the right.
See if you can work out why the voltage also has the distinctive vertical fall of a sawtooth.
Here's what that one looked like with the same first ID byte as above (and I've also changed this to big-endian to match):
It worked because the rx smartpin was configured to use the low going clock rather than high going. As you can see, the second low going clock edge is just after the data level has changed. Not really ideal.
It seems to be mode 0 (CPOL = 0, CPHA = 0). Am I right?
I've just had a look around for the mode naming and yes, CPOL=0 and CPHA=0 looks to be a good fit there. The first data bit is placed on the MOSI tx pin before the first rising SPI clock edge, and that same first rising edge is what actually triggers the second data bit on the tx pin. That the second data bit doesn't appear at the tx pin until later is the detail of lag I keep raising. So there is some illusion in effect to arrange the desired timings.
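The following is a purely software model of a mode 0 byte exchange, meant to make the role of each edge concrete. It is a sketch with hypothetical names and it assumes ideal edges, so it deliberately ignores the pin lag discussed here. Both ends present a bit while SCK is low and sample on the rising edge, MSB first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: result of one idealised SPI mode 0 byte exchange. */
typedef struct {
    uint8_t master_rx;
    uint8_t slave_rx;
} spi_result;

/* Mode 0 (CPOL = 0, CPHA = 0), MSB first, ideal edges (no pin lag). */
static spi_result spi_mode0_exchange(uint8_t master_tx, uint8_t slave_tx)
{
    spi_result r = {0, 0};
    for (int bit = 7; bit >= 0; bit--) {
        /* SCK low: both ends drive their current bit (MOSI and MISO). */
        int mosi = (master_tx >> bit) & 1;
        int miso = (slave_tx >> bit) & 1;
        /* Rising edge: each end samples the other's data line. */
        r.slave_rx  = (uint8_t)((r.slave_rx << 1) | mosi);
        r.master_rx = (uint8_t)((r.master_rx << 1) | miso);
        /* Falling edge: both ends move on to the next bit. */
    }
    return r;
}
```

Calling `spi_mode0_exchange(0xA5, 0x3C)` leaves 0x3C in `master_rx` and 0xA5 in `slave_rx`. Each side's byte ends up at the other end after eight clocks.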
After some reading and a small amount of testing, I can't see the smartpin tx synchronous serial mode being able to support CPHA = 1. A streamer could do it.
And rx mode can happily handle either, it's just a clock input polarity change for rx to move between first and second SPI clock edge. The prop2 can handle any polarity changes no problem.
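The CPOL/CPHA convention boils down to two things: the idle clock level, and which edge the receiver samples on. That is why moving rx between modes is just a clock polarity change. Here is a minimal C sketch of that table; the names are illustrative only.

```c
#include <assert.h>

/* Illustrative names; standard SPI numbering is mode = CPOL * 2 + CPHA. */
typedef enum { EDGE_RISING, EDGE_FALLING } spi_edge;

/* Clock level while the bus is idle. */
static int spi_idle_level(int cpol)
{
    return cpol ? 1 : 0;
}

/* Receiver samples on the first (leading) edge when CPHA = 0 and on the
   second (trailing) edge when CPHA = 1, which works out to the rising
   edge whenever CPOL == CPHA. */
static spi_edge spi_sample_edge(int cpol, int cpha)
{
    assert((cpol == 0 || cpol == 1) && (cpha == 0 || cpha == 1));
    return (cpol == cpha) ? EDGE_RISING : EDGE_FALLING;
}
```

Modes 0 and 3 sample on the same (rising) edge. They differ in the idle clock level and in when the first data bit is driven: before the first clock edge in mode 0, and on the first (falling) edge in mode 3.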
I'm guessing this streamer-generates-clock cannot be used with a streamer-for-data ?
What speed could P2 manage for QuadSPI, using streamer for data ?
Well, could have both clock and data formatted together into hubram before sending them with a streamer. But the processing overhead would defeat the purpose.
I see Chip mentioned, a while back, the idea of using a smartpin for SPI clock gen alongside a streamer to do the 4-bit parallel data. This would be much more sensible since the data won't need to be reformatted.
Getting it fully functioning will be a decent undertaking though. Because QuadSPI, and its ilk, are a contorted extension to SPI, in practice, there will be commands and mode changes that are not always 4-bit parallel. So the implementation will need to handle back and forth mode transitions in a clean manner.
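Here is a rough model of why those mode transitions matter: each phase of a QuadSPI transaction can use a different lane width. The phase layout used in the usage example is a hypothetical read, loosely in the style of common serial-flash quad reads. The command goes out on one lane, then the address, mode bits, dummy clocks and data use four lanes. Check any real device's datasheet before relying on it. The names in this C sketch are illustrative.

```c
#include <assert.h>

/* Hypothetical description of one transaction phase: bits carried,
   data lanes used (1 = classic SPI, 4 = quad), plus any dummy clocks. */
typedef struct {
    const char *name;
    unsigned bits;
    unsigned lanes;
    unsigned dummy_clocks;
} qspi_phase;

/* SPI clocks needed for one phase at its own lane width. */
static unsigned phase_clocks(const qspi_phase *p)
{
    assert(p->lanes == 1 || p->lanes == 2 || p->lanes == 4);
    assert(p->bits % p->lanes == 0);
    return p->bits / p->lanes + p->dummy_clocks;
}

/* Total SPI clocks for a sequence of phases; the driver must switch
   lane width between phases. */
static unsigned transaction_clocks(const qspi_phase *phases, unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++)
        total += phase_clocks(&phases[i]);
    return total;
}
```

With an 8-bit single-lane command, a 24-bit quad address, 8 quad mode bits, 4 dummy clocks and 512 data bytes on four lanes, this comes to 1044 SPI clocks. At sysclock/2 that would be 2088 sysclocks, ignoring any setup between phases.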
... the idea of using a smartpin for SPI clock gen alongside a streamer to do the 4-bit parallel data.
There is an added trickiness to this arrangement. When using smartpins for the SPI data, they are following the provided SPI clock. A streamer does no such thing. It instead has to be separately programmed with the same timing details as the clock source is. The software has to arrange for them to be matched on alignment, sync and run-length.
At 250 MHz CPU speeds I am getting 25 MHz read/write speeds using bit-bashing, with zero setup time required. I think for a lot of tasks this is sufficient, and besides, a lot of devices have definite limitations in terms of clock speed. I'm not in a hurry to implement smartpin based SPI; I'm taking a backseat while I watch and wait for something that just works and is efficient at 50 MHz, which is what SD cards are specified to run up to using SPI timing.
So while smartpin based SPI bus is desirable, it is not a show-stopper for day to day work. If we wanted to run the P2 as a SPI slave then it would be useful to have smartpins handle this.
I agree that bit bashing is great for most things P2, but I think the smartpins could really help for SPI WHERE THERE IS PROCESSING TO BE DONE BETWEEN DATAWORDS, i.e. what I really want to use smartpin SPI for is reading touchscreens. The large 7" resistive screens I use require a calibration, so before a final X,Y is output it must be filtered and processed. I'm not 100% sure how this would work out, but it seems doable. I see some room to improve the quality of my samples as well, which should help.
Smartpin SPI also has the advantage of going to sysclock/2.
I've had a chance to sit down and look at things a bit more, and @evanh's comment about not being able to do CPHA = 1 made everything click. I have the SD driver working solidly using Mode 0,0, but my hardware uses Mode 1,1 with pull-ups to allow the pins to be used for other things. I have a couple of ideas I want to try to get Mode 1,1 working, but even if I don't, at least I learned some things about the smartpins!
I know most people won't care what mode their SD driver is running in and since Mode 0,0 works I'll probably base my driver off of this, instead of the path I'm going to head down shortly.
An aside: I'm only able to test SD up to 25 MHz; at around 27 MHz my card stops responding. I'm pretty sure this would run to 50 MHz no problem with a sysclock of 100 MHz or greater. @"Peter Jakacki" if you would like to test what I have working so far, I'll update the SD post with some code. Just let me know.
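Here is a small C sketch for choosing the smallest integer divider that keeps the SPI clock at or below a target. `spi_divider_for` is a hypothetical helper. The sketch also shows why 50 MHz at sysclock/2 needs a sysclock of at least 100 MHz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest divider (>= 2) such that
   sysclock_hz / div <= target_hz, i.e. ceil(sysclock / target). */
static uint32_t spi_divider_for(uint32_t sysclock_hz, uint32_t target_hz)
{
    assert(target_hz > 0);
    uint32_t div = (uint32_t)(((uint64_t)sysclock_hz + target_hz - 1) / target_hz);
    return div < 2 ? 2 : div; /* sysclock/2 is the fastest option */
}
```

At 160 MHz, for example, a 50 MHz ceiling gives a divider of 4 (40 MHz), and a 25 MHz ceiling gives 7 (about 22.9 MHz).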
@samuell If you are looking for generic SPI code I could probably help you out.
@cheezus - Bear in mind that the eval board has hideously long pcb traces to the SD card so this will limit the maximum speed anyway due to the inductance, capacitance, and perhaps even cross-talk and skew.
I can try this on my P2D2 board though.
I've been pondering a generic SPI handler at sysclock/2. They can't practically be done with smartpins in short bursts other than one byte at a time, which is not particularly useful. The only time it can benefit anyone is when doing burst transfers like data blocks with SD cards or ADC/DAC sampling. Of course, long bursts are exactly the right situation to make it work.
The key point I'm making here is that these types of transfer can be speed optimised for their longer lengths. This has a bearing on the structure of the overheads. In the case of a data block, the size is known exactly and is an easy round number. In the case of ADC sampling there isn't any concern about trailing data loss; an arbitrary cut-off is fine.
Very high speed unbroken data rates will need some setup, methinks. Using a streamer, even for pure SPI, seems inescapable to achieve this.
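To put numbers on the overhead argument, here is a C sketch of the effective data rate when every burst pays a fixed setup cost. The 100-sysclock overhead used in the usage example is a made-up figure for illustration, not a measurement, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: effective bit rate when each burst of
   burst_bytes costs a fixed overhead_clocks of setup on top of
   div sysclocks per bit. */
static double effective_bits_per_sec(double sysclock_hz, unsigned div,
                                     unsigned burst_bytes,
                                     unsigned overhead_clocks)
{
    assert(div >= 2 && burst_bytes > 0);
    double bits = 8.0 * burst_bytes;
    double clocks = bits * div + overhead_clocks;
    return bits * sysclock_hz / clocks;
}
```

Assume that overhead, a 200 MHz sysclock and sysclock/2. A 512-byte SD block then keeps about 98.8 Mbit/s of the raw 100 Mbit/s, while single-byte transfers drop below 14 Mbit/s.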
Yes, it is somewhat clunky to have all that smart-pin serial shifting silicon, and then have to generate the clock by bit-bashing, which consumes all the processor time.
i.e. the best SPI examples should really have HW clocks and HW shifters.
If the final silicon is at least as fast as P2-ES, & cooler, maybe SysCLK/4 is going to be tolerable ?
Did you ever look at using 2 clk pins, so the TX shifter clock can be phase-moved relative to RX ? That costs a pin, but maybe that's tolerable for the very highest speeds ?
That one would be smartpin doing the SPI clock and streamer doing the data, with all the setup complexities, as per Chip's suggestion. It would be the only chance to achieve unbroken bit rate at sysclock/2.
Using smartpins for data, sysclock/4 would manage unidirectional unbroken rate only with careful tight loop consuming the cog for burst length.
Two clock pins is exactly what the recent streamer based SPI clock was doing. Just that the second clock stayed internally mapped to the tx smartpin only. And then I duplicated the idea back to the bit-bashed SPI clock version as well. It is what allowed the superior alignment via OUT-as-input config - https://forums.parallax.com/discussion/comment/1474472/#Comment_1474472
Two clock pins is exactly what the recent streamer based SPI clock was doing. Just that the second clock stayed internally mapped to the tx smartpin only.
Yes, but I think that consumed the streamer to generate the clock, and was not suited to long bursts ?
Could that 'second clock stayed internally mapped to the tx smartpin' instead be generated by a smart pin, running in pulse-count-out mode ?
Hopefully, that can free the streamer for data transport ?
Saving a physical pin clearly has appeal, so if that can be internal, that is great.
The sysclock/4 with bit-bashed SPI clock didn't need the streamer. Although that one could only handle one byte at a time.
As I just pointed out, I think it would be possible to engineer a longer burst at sysclock/4 with a customised routine just for that case. This one would use the streamer for the SPI clock, to do the OUT-as-input config.
Sysclock/8 would use a smartpin for both because the lag issue fades at this point. No longer requires the OUT-as-input config.
Also, as I've been saying a few times, getting unbroken data at sysclock/2 will require using the streamer to handle the data. So far I've only used the streamer for the clock.
Chip's suggestion is streamer for data and smartpin for clock. In this mode there is no clock following occurring. It's purely a case of the software aligning the starts of both the clock and data to coincide. And ensuring the divider timings also match so that they don't go out of sync mid way. And also good idea to make them finish together as well.
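Here is one way to see why the divider timings must match exactly. Suppose one side is paced by a 32-bit phase accumulator (an NCO) and the other by an exact integer divider. A rounded-down increment then shifts every event by a clock. The model assumes the streamer rate is set by an accumulator that adds its frequency value each sysclock and steps on carry-out, while the smartpin clock uses an exact integer period. Treat that as an assumption to verify against the P2 documentation. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: NCO increment for one event every `div`
   clocks, rounded down or up. */
static uint32_t nco_inc_floor(uint32_t div)
{
    assert(div >= 2);
    return (uint32_t)((UINT64_C(1) << 32) / div);
}

static uint32_t nco_inc_ceil(uint32_t div)
{
    assert(div >= 2);
    return (uint32_t)(((UINT64_C(1) << 32) + div - 1) / div);
}

/* Step a 32-bit accumulator once per clock (assumed NCO behaviour) and
   compare its carry-outs with an exact divide-by-div event train.
   Returns the first clock (1-based) where they disagree, or 0 if they
   agree for all nclocks. */
static uint64_t first_mismatch(uint32_t inc, uint32_t div, uint64_t nclocks)
{
    uint32_t acc = 0;
    for (uint64_t n = 1; n <= nclocks; n++) {
        uint32_t prev = acc;
        acc += inc;                      /* wraps modulo 2^32 */
        int nco_event = acc < prev;      /* carry out of the accumulator */
        int div_event = (n % div) == 0;  /* exact integer divider */
        if (nco_event != div_event)
            return n;
    }
    return 0;
}
```

In this model, divide-by-3 with the rounded-down increment runs one clock late from the very first event. Rounding up stays aligned for a 100000-clock run, and divide-by-2 ($8000_0000) is exact either way.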
Could that 'second clock stayed internally mapped to the tx smartpin' instead be generated by a smart pin, running in pulse-count-out mode ?
No, the OUT-as-input config can only use OUT from the cogs/streamers. The smartpins can't control OUT. In the block diagram, because of this, I've now separated the two paths and renamed the smartpin's output as SMART_OUT - https://forums.parallax.com/discussion/comment/1473762/#Comment_1473762
I've been working out how to clean up some uglies and this may be applicable to more than just SD.
One problem I was having is when sysclock was not at least 2x spi_max_clock. This SHOULD fix that, although I'm sure there's a cleaner way to write it. I could use some help because I admit to not being good at the compound min-max statements...
sbt := ((clkfreq / spi_clk_max) / 2) + 1     ' SPI bit time
if sbt < 2                                    ' make sure at least sysclock/2
  sbt := 2
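On the min/max question: in Spin, the limit-minimum operator `#>` should fold the clamp into one line, e.g. `sbt := ((clkfreq / spi_clk_max) / 2 + 1) #> 2`. The parentheses sidestep any precedence doubt. Here is the same calculation as a C sketch. `spi_bit_time` is a hypothetical name, and the formula is copied from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the SPI bit time formula from the Spin snippet,
   clamped so it never drops below 2 (the sysclock/2 limit). */
static uint32_t spi_bit_time(uint32_t clkfreq, uint32_t spi_clk_max)
{
    assert(spi_clk_max > 0);
    uint32_t sbt = clkfreq / spi_clk_max / 2 + 1;
    return sbt < 2 ? 2 : sbt;
}
```

The clamp only kicks in when the sysclock is less than twice `spi_clk_max`. In that case the integer division yields 0 and the formula would otherwise give 1.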
One of the other things I didn't like was hard-coding the clock pin relative to the data pins so this should clean that up. Although it should probably be a one-liner too...
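PRI getmask(clk, data) : t                    ' t is the return value
  t := clk - data                             ' clock pin offset relative to the data pin
  if (|| t) <> t                              ' negative offset?
    t := (|| t) + %0100                       ' encode as magnitude + %0100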
Hopefully someone can point out that nice one-liner for those two methods. So simple I'm just too close to the problem, but at least I didn't implement getmask with a lookup table?
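On the getmask one-liner, this assumes the P2 A/B input-selector encoding as given in the P2 documentation:

- %000 selects this pin.
- %001 to %011 select +1 to +3.
- %100 selects this pin's OUT.
- %101 to %111 select -3 to -1.

Under that encoding, the offset can be encoded by masking its two's-complement value to three bits. In Spin that would be `t := (clk - data) & %0111`. Under this table, -1 maps to %111 and -3 to %101, the reverse of the magnitude + %0100 mapping in getmask, so it is worth checking against the current silicon docs. Here it is as a C sketch, with `pin_selector` as a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 3-bit input selector for a clock pin at a
   relative offset of -3..+3 from the data pin, assuming the documented
   encoding (+1..+3 -> %001..%011, -3..-1 -> %101..%111). */
static uint32_t pin_selector(int clk, int data)
{
    int offset = clk - data;
    assert(offset >= -3 && offset <= 3);
    return (uint32_t)offset & 0x7u;   /* two's-complement low 3 bits */
}
```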
The idea of using the streamer to handle data is something I've been contemplating, although I tend to want to leave the streamer free so I can run code in hubex at some point??
Being able to get sysclock/2 here will probably be killed by the extra flops in the pins... maybe sysclock/4 is realistic, and totally acceptable. I guess I'll deal with that when I get there. Doing full-duplex SPI at sysclock/2 is unrealistic as well. I'm probably only getting this to work out of luck... I'm not sure what the difference between an external device and same-cog loopback implies.
The one takeaway from my testing code: with the simple test using bit-bashed pins, the time to complete was usually a linear function of clock speed. Once sysclock*2 > spi_max_clock, the time to complete was not as dependent on clock speed, e.g. the test takes 13 s at 160 MHz and 11 s at 320 MHz. I haven't compared times on bit-bashed code, but decoupling from sysclock is nice and the streamer will probably help.
Doh! One streamer can't do data for both tx and rx together. But I suppose things like QuadSPI are half duplex anyway. Yet one more reason why top speed bursting is a specialised mode to switch in and out of.
I've got a question I'm pretty sure I know the answer to but wanted to double check before I head down this rabbit hole...
If I'm using the smartpin SPI for byte read/write and wanted to change to say 32 bits for one transaction, instead of doing 4x byte transfers.. Do I need to dirl, dirh after the WXPIN? I'm pretty sure it would be needed, maybe easier to just do 4x bytes. I was thinking of setting up the smartpins for 32 bits and just only count to 8? Seems like this could work..
On Rx it is likely tricky, as somewhere a shifter transfers to a holding register, and it needs to know how many clocks to do that.
On Tx, you also have a holding register and shifter, but you may be ok if a write to the holding register 're-primes' the shifter.
The better SPI peripherals have FIFOs, and they can Tx and Rx continually with no pauses in the clocks. P2 may not quite achieve that both ways, but I think Chip has said it can manage no-pauses one way.
There is a single longword buffer and shifter for both tx and rx modes. How many of those bits are for tx word length depends on config.
There are two slightly different tx operating modes:
X[5]=1 is start/stop mode. The shifter is filled automatically if transmitting has finished, and the buffer automatically engages once transmitting has started.
X[5]=0 is continuous mode. The buffer is engaged while DIR is high, and writes go directly to the shifter while DIR is low. So the shifter is initially filled by loading the first word while DIR is low.
Both will start over if DIR is cycled. Both present first bit on tx pin before first clock edge.
Changing the smartpin X parameter in mid operation will probably be actioned. However, the outcome isn't likely to be orderly.
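Splitting one 32-bit word into four byte transfers does not change what appears on the wire. This holds as long as the bytes go out high byte first and each byte MSB first, which is the usual SPI and SD-card bit order. Here is a quick C check of that equivalence; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append `nbits` of `value`, MSB first, to a bit
   array (one bit per element). Returns the new length. */
static size_t push_bits_msb_first(uint8_t *bits, size_t len,
                                  uint32_t value, unsigned nbits)
{
    for (int i = (int)nbits - 1; i >= 0; i--)
        bits[len++] = (uint8_t)((value >> i) & 1u);
    return len;
}

/* Wire bit sequence for a single 32-bit transfer. */
static size_t wire_bits_word(uint8_t bits[32], uint32_t word)
{
    return push_bits_msb_first(bits, 0, word, 32);
}

/* Wire bit sequence for the same word sent as four byte transfers,
   most significant byte first. */
static size_t wire_bits_bytes(uint8_t bits[32], uint32_t word)
{
    size_t len = 0;
    for (int shift = 24; shift >= 0; shift -= 8)
        len = push_bits_msb_first(bits, len, (word >> shift) & 0xFFu, 8);
    return len;
}
```

The difference is timing, not content: each byte transfer may leave a gap between bytes, which SPI slaves usually tolerate because they are clocked by SCK and the clock simply pauses.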
Kind regards, Samuel Lourenço
The scope snapshot showing a two sysclock lag (with registered tx pin) on the tx data so that data transition lines up on the low going clock edge:
Faster ramp-down of the voltage, similar to capacitive charge and discharge.
Got the reversing sawtooth working. Thanks for your help. Starting to get a handle on this.
Had the wrong code
I was pretty sure this would be the case. It's probably better to just do 4x bytes...