Looking for a fast read and write SPI object.

Duane Degn · 2011-04-10 18:31

I'm presently working on an object for the Nordic nRF24L01+ modules.

As I've mentioned in a previous thread, I took out the delays in the SPI_Asm object which has sped it up.

This is working relatively well, except the SPI_Asm object can either read or write via SPI, not both at the same time.

The nRF24L01+ modules output the status register as a command is sent to the device.

I'd like to be able to capture this status information as I write a command.

I've looked through the OBEX and see some drivers of various devices that use SPI. I'm hoping someone here can save me from looking through all these specialized drivers to see if there is anything useful for my needs.

Does anyone know of a PASM SPI driver that can read as it writes?

Another item on my wish list is a SPI driver that can read or write up to 32 bytes at a time.

I think my programming skills are just about up to satisfying these needs myself, but I'd sure rather spend my programming time and energy on other projects.

Kye · 2011-04-10 18:34

Look at the lowest level ASM code for my file system block driver. It should be pretty easy for you to pick it out and adapt it for whatever you need. 10MHz read, 20MHz write. Uses the C flag only and sends bytes at a time.

Duane Degn · 2011-04-10 18:41

Kye, Thank you.

I'm glad to see you have some code I can use.

It's always a pleasure to read your code. I always learn a lot when I do.

These Nordic modules can do SPI at 10MHz. I'm hoping to have the airwaves buzzing with data.

Duane Degn · 2011-04-10 19:02

Kye, By "my file system block driver" you mean the SPI section in PASM of SD3.01_FATEngine?

Does this driver read and write at the same time? Can you point me to the section of code I should be looking at?

I see your "Command SPI" section and "Response SPI" section, but as far as I can tell these either read or write via SPI, not both at once.

These sections are still very educational. I need to learn how to use counters to control a clock signal. I was wondering how you could write at 20MHz; pretty cool.

kuroneko · 2011-04-10 19:26

Duane Degn wrote: »

These Nordic modules can do SPI at 10MHz.

At this speed you have 8 cycles available to do sampling and assembly (assuming 80MHz). I don't know if this is working. It all depends on the relationship between incoming and outgoing data relative to the clock. Let's say you drive bit 7 (DOUT, the only active output for this cog) and sample bit 8 (DIN).

DAT
        mov     outa, data              ' data is byte to send, bit 7 first

        test    mask, ina wc            ' collect 1st bit from DIN
        rcl     outa, #1                ' rotate left into outa, also update DOUT (6 -> 7)
        test    mask, ina wc            ' collect 2nd bit from DIN
        rcl     outa, #1    
        test    mask, ina wc            ' collect 3rd bit from DIN
        rcl     outa, #1    

        ...
        
mask    long    |< 8

jhh · 2011-04-10 20:11

I've been studying a lot of SPI code of late, and have to say that Kye's "fast mode" in FATEngine has to be closest to the limit of what is physically possible for the prop chip, and was very educational to me in learning how the clocks work.

The 20 and 10 MHz rates are done via those clocks, and the speeds are the fastest they can run and still manipulate the PHSB register to pull off what he did.

Writing is 1 instruction per bit, 4 clock cycles per instruction, so 80MHz/4 is 20MHz for the write rate. Reading is 2 instructions, 8 clock cycles per bit, or 10MHz @ 80MHz system clock.
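
That arithmetic can be checked with a quick sketch. It is written in Python rather than PASM so it can be run off-chip, and it assumes the usual 4 clocks per instruction and an 80MHz system clock; the function name is made up for illustration.

```python
def spi_bit_rate_hz(instructions_per_bit, clkfreq_hz=80_000_000, clocks_per_instruction=4):
    """Bit rate when every bit costs a fixed number of 4-clock instructions."""
    return clkfreq_hz / (instructions_per_bit * clocks_per_instruction)


print(spi_bit_rate_hz(1) / 1e6)  # write, 1 instruction per bit: prints 20.0
print(spi_bit_rate_hz(2) / 1e6)  # read, 2 instructions per bit: prints 10.0
```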

It's incredibly nice code for byte-oriented transfer.

One thing I don't know: although SPI allows "simultaneous transfer", where each bit cycle has read AND write, does anything actually require that? I have an SPI routine I've been knocking around which allows for it, so I found I couldn't use Kye's technique without driving the clock down further than I needed, since I have more overhead during transfer.

But does any device really need that? SD, SRAM, Flash, all seem to do "in-then-out" or "out-then-in" in a byte oriented way, ignoring what's read on writes, and feeding dummy values on reads. (I see two flavors of devices for the latter, some like $FF, some like $00?)

If it's really not needed ever, I'd love to ditch what I have and do something heavily inspired by what Kye did.

Duane Degn · 2011-04-10 20:39

kuroneko, Thanks for taking time to help with this.

While these devices can communicate via SPI at 10MHz, I'm very willing to use a slower speed if I can read and write simultaneously.

Just in case anyone is interested, the MOSI line is read on the rising edge of the clock. New data is ready on the MISO line a few nanoseconds after the falling clock. The Propeller may read this before or after the rising clock. The data needs to be present on the MOSI line for 2ns prior to the rising clock, so the clock can be driven high in the instruction following the one that sets the data line.

The data is read (and written) MSBit first, LSByte first.
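
A rough Python model of the timing described above, assuming it amounts to ordinary SPI mode 0. `ToySlave` and `exchange_byte` are hypothetical names, not part of any Propeller object, and the slave is a stand-in rather than a model of the nRF24L01+ itself.

```python
class ToySlave:
    """Hypothetical SPI mode 0 slave: shifts out `reply` while shifting in MOSI bits."""

    def __init__(self, reply):
        self.shift_out = reply & 0xFF
        self.shift_in = 0
        self.miso = (self.shift_out >> 7) & 1  # first bit is valid before the first clock

    def rising_edge(self, mosi):
        # The slave samples MOSI on the rising clock edge.
        self.shift_in = ((self.shift_in << 1) | mosi) & 0xFF

    def falling_edge(self):
        # The slave presents its next bit shortly after the falling edge.
        self.shift_out = (self.shift_out << 1) & 0xFF
        self.miso = (self.shift_out >> 7) & 1


def exchange_byte(slave, out_byte):
    """Send out_byte MSBit first and return the byte clocked in at the same time."""
    in_byte = 0
    for bit in range(7, -1, -1):
        mosi = (out_byte >> bit) & 1   # set the data line first
        miso = slave.miso              # sampled here, just before the rising edge
        slave.rising_edge(mosi)        # clock high: slave latches MOSI
        in_byte = (in_byte << 1) | miso
        slave.falling_edge()           # clock low: slave updates MISO
    return in_byte
```

Calling `exchange_byte(ToySlave(0x0E), 0xA5)` returns the slave's byte while the slave ends up holding `0xA5`, which is the "read as it writes" behaviour being asked for.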

kuroneko, I confess to not really understanding your code. It looks like you're shifting out the data, and I see you reading bits on "ina", but I don't see how these bits are assembled into a byte.
I'm relatively new to the world of PASM and this stuff is just barely hitting the top of my head (I'm hoping it's not over my head).

I was kind of hoping someone had an object similar to SPI_Asm but that can read as it writes. I'm not being lazy I'm being . . . efficient.

Thanks again to Kye and kuroneko for your help.

kuroneko · 2011-04-10 20:57

Duane Degn wrote: »

kuroneko, I confess to not really understanding your code.

Apologies. My mistake.

        mov     outa, data              ' data is byte to send, bit 7 first

        test    mask, ina wc            ' collect 1st bit from DIN
        rcl     outa, #1                ' rotate left into outa, also update DOUT (6 -> 7)

The first line places the byte into outa, MSB on top of bit 7.
The test instruction tests bit 8 and affects the C flag (parity) depending on what it sees (C set for '1').
rcl (rotate carry left) shifts outa to the left (exposing the next bit to be sent) and also inserts the C flag into bit 0.
After doing this 8 times the byte you sent will occupy bits 15..8 of outa and bits 7..0 now hold whatever came in from DIN.

outa[15..0] after executing rcl 3 times. HTH

              bit 8 (DIN) +----->- test mask, ina wc ->-----+
                          |                                 |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+        v
   |  |  |  |  |  |O7|O6|O5|O4|O3|O2|O1|O0|I7|I6|I5| <- carry flag
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
                             |
                             v
                             bit 7 (DOUT)
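
The same bookkeeping as a small Python model, for anyone who wants to step through it off-chip. It only tracks what `rcl` does to the register; the clock, and the question of whether 8 cycles per bit is enough in practice, are not modelled, and the function names are invented for this sketch.

```python
def rcl(value, carry, width=32):
    """Rotate-carry-left by one: shift left and put the carry flag into bit 0."""
    return ((value << 1) | carry) & ((1 << width) - 1)


def exchange_via_outa(data, din_bits):
    """Model of the test/rcl pairs: DOUT is outa bit 7, DIN is sampled into C."""
    outa = data & 0xFF                # mov outa, data
    sent = []
    for din in din_bits:              # one entry per bit time, MSBit first
        sent.append((outa >> 7) & 1)  # bit currently driven on DOUT
        carry = din                   # test mask, ina wc
        outa = rcl(outa, carry)       # rcl outa, #1
    return outa, sent


outa, sent = exchange_via_outa(0xA5, [0, 0, 0, 0, 1, 1, 1, 0])
print(hex(outa))  # 0xa50e: sent byte in bits 15..8, received byte in bits 7..0
```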

Duane Degn · 2011-04-10 21:01

jhh wrote: »

One thing I don't know, is that although SPI allows "simultaneous transfer", where each bit cycle has read AND write, is if anything actually requires that?

These devices don't "require" a simultaneous transfer but it would be very useful. The status register may be read as a command is given to a nRF24L01+. I'd like to frequently read this status register since it contains information about the success of the transmission of a data packet and other useful information. An ideal write method would return this status register so I could take appropriate action based on the information. As it is now, I can read the status register after the original write.

As I think about this a bit more, the only byte I'm interested in is this first one. The other bytes read during a write are dummy values. So an even better ideal write method would only read this first byte and not worry about the rest. So after the first byte is written the driver should switch to high speed mode and not worry about reading any more.
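
One way such a write method could look, sketched in Python. `exchange` is a hypothetical callable that clocks one byte out and returns the byte clocked in; the fake device and the byte values are there only to make the sketch runnable.

```python
def write_command(exchange, command, payload=b""):
    """Send command + payload; return only the byte clocked in during the command."""
    status = exchange(command)   # full duplex for the first byte only
    for b in payload:
        exchange(b)              # a real driver could use a write-only fast path here
    return status


def make_fake_exchange(status, log):
    """Stand-in device: answers the first byte with `status`, the rest with $00."""
    def exchange(out_byte):
        log.append(out_byte)
        return status if len(log) == 1 else 0x00
    return exchange


log = []
status = write_command(make_fake_exchange(0x0E, log), 0xA0, b"\x01\x02")
print(hex(status), log)  # 0xe [160, 1, 2]
```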

I'm not so sure how fast I'll really need these transceivers to communicate. I just thought it would be nice if the full speed of these modules were available.

Duane Degn · 2011-04-10 21:07

@kuroneko

Apologies. My mistake.

Apology accepted (though none was needed).

Thank you for further clarification. I didn't know the "outa" could be used that way. That is pretty slick.

Thanks again for your help.

Duane

Ahle2 · 2011-04-11 00:08

I have made an unreleased spi-driver which can read/write 20 Mbit/s to/from the spi-ram in the C3 using the video generator. And by 20 Mbit/s, I really mean continuous 20 Mbit/s bandwidth from/to hub-ram to/from spi-ram. While the video generator is at work sending a byte, the next byte is read and prepared.
That's multitasking, I LOVE the propeller.

It isn't released due to some issues with synchronizing the PLL phase. It works about 95% of the time the driver is started in a cog.

/Ahle2

kuroneko · 2011-04-11 00:14

Ahle2 wrote: »

It isn't released due to some issues with synchronizing the PLL phase. It works like 95% of the times the driver is started in a cog.

You could ask for help. Just saying ...

jhh · 2011-04-11 22:45

Duane Degn wrote: »

These devices don't "require" a simultaneous transfer but it would be very useful. The status register may be read as a command is given to a nRF24L01+. I'd like to frequently read this status register since it contains information about the success of the transmission of a data packet and other useful information. An ideal write method would return this status register so I could take appropriate action based on the information. As it is now, I can read the status register after the original write.

As I think about this a bit more, the only byte I'm interested in is this first one. The other bytes read during a write are dummy values. So an even better ideal write method would only read this first byte and not worry about the rest. So after the first byte is written the driver should switch to high speed mode and not worry about reading any more.

I'm not so sure how fast I'll really need these transceivers to communicate. I just thought it would be nice if the full speed of these modules were available.

The core loop of "my" (it's all inspired from elsewhere) routine is this:

spiDoClockCycles        max     spiLayer1Param, #32   ' at most 32 bits to shift
                        tjz     spiLayer1Param,#spiDoClockCycles_ret ' skip if 0
:repeatclock            shl     spiBitsOut, #1   wc       ' shift off the bit to send
                        andn    outa,spiMaskClk       ' clock 0
                        muxc    outa,spiMaskDI        ' send bit
                        or      outa,spiMaskClk       ' clock 1
                        test    spiMaskDO,ina    wc ' get incoming bit
                        rcl     spiBitsIn,#1          ' and rotate it into bit 0 of the buffer
                        djnz    spiLayer1Param, #:repeatClock
                        andn    outa,spiMaskClk       ' clock 0
                        andn    outa,spiMaskDI       ' DI 0
spiDoClockCycles_ret    ret

The number of bits to transfer goes in spiLayer1Param.
Bits to go out are in spiBitsOut (must be aligned to the MSB of the long, i.e. if sending only 8 bits, they need to be in the most significant byte of the long.)
Bits that come in are in spiBitsIn, and will be lined up to the LSB of the long.

Based on the length of the loop (7 instructions, 28 clocks), it probably gets just shy of 3MHz, but bidirectionally.
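
A Python model of that loop, one statement per PASM instruction, may make the alignment rules easier to see. It assumes spiBitsIn starts out cleared (the routine above does not clear it itself) and takes the MISO levels as a list; `do_clock_cycles` is just a name for this sketch.

```python
MASK32 = 0xFFFFFFFF


def do_clock_cycles(bits_out, nbits, miso_bits):
    """Model of the shl/muxc/test/rcl loop. Returns (bits_in, mosi_trace)."""
    nbits = min(nbits, 32)                           # max  spiLayer1Param, #32
    bits_in = 0                                      # assumed cleared by the caller
    mosi_trace = []
    for i in range(nbits):                           # tjz / djnz
        carry = (bits_out >> 31) & 1                 # shl  spiBitsOut, #1 wc (C = old bit 31)
        bits_out = (bits_out << 1) & MASK32
        mosi_trace.append(carry)                     # muxc outa, spiMaskDI
        carry = miso_bits[i]                         # test spiMaskDO, ina wc
        bits_in = ((bits_in << 1) | carry) & MASK32  # rcl  spiBitsIn, #1
    return bits_in, mosi_trace


# One byte out: it has to sit in the top 8 bits of the long.
bits_in, trace = do_clock_cycles(0xA5 << 24, 8, [0, 0, 0, 0, 1, 1, 1, 0])
print(hex(bits_in), trace)  # 0xe [1, 0, 1, 0, 0, 1, 0, 1]
```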

I think I'd worked out that the best I could get Kye's method to work bidirectionally was 16 cycles (1 write, 2 read, 1 for loop), but it starts to get a bit more complex as the loop gets longer. I wondered too about the relationship between the cycles of instruction processing and the hardware clock's rise and fall. Also, the fact that the end of the loop takes 8 cycles further complicates things re: starting/stopping the clock... you'll notice Kye's code actually skips the last bit of a byte, so the clock can be immediately turned off, and THEN it brings the last bit's processing into play. So I'd have to go down another clock divider to effectively 32 cycles per clock, which is why I stopped looking. I could do that, but @ 2.5 MHz, just keeping the loop I had would be faster in the end for doing bidirectional SPI.

PS: I need to read the original post before asking silly questions re: "is bidirectional even needed?", when that was half the point of the original post... :P

Duane Degn · 2011-04-12 09:53

jhh, Wow, I think I understand what you posted. (I'm very excited about my new PASM skills.)

One technique that is new to me is

:repeatclock            shl     spiBitsOut, #1   wc       ' shift off the bit to send
               '<snip>
                        muxc    outa,spiMaskDI        ' send bit

I'm guessing c is written if the MSB is 1? (Duh, Maybe that's why it's called the carry flag.)
I confess to not using the c flag much myself. I don't think I know enough to know when it would be useful. (Though I'm seeing how it is used a lot in this latest project.)

This does seem faster than Beau's method in SPI_Asm. (I've changed some of the variable names.)

MSB_Sout      test    t3,             t5      wc        ''          Test MSB of DataValue
              muxc    outa,           mosiMask          ''          Set DataBit HIGH or LOW
              shr     t5,             #1

t3 holds the data and t5 had previously been set up as a MSB mask.
I think your method is one instruction shorter.

I like that in your method, up to four bytes are sent/received at a time.

Do you have to be careful about the order your bytes are placed in the long? The little-endian thing often messes me up. In this case little-endian could be a good thing, since if the long only held one byte, the long's MSB would also be the byte's MSB.

Yes, my original post did request a way to read and write simultaneously. As I've thought more about when I'd want to be able to do so, I realize it's not really as important as I first thought.

My main need, as I previously mentioned, is to read the status register. But the times the status register information is going to be the most important is when the interrupt pin is set (active low). There are several reasons this may be set low and I won't know the reason until I read the status register. I doubt I'll want to write anything to the modules as I read this register. And then once I do know the contents of the status register, I won't need to be checking it as I write the next data to the module.

I still think it would be useful to be able to read and write at the same time but I think I can live without it for a while.

These modules like a $FF to be written as a read is performed. I just set the MOSI pin high as I perform a read. Although $00 didn't seem to cause a problem before I made this modification.

Thanks again for all your help. I've learned a lot.

(Note to self: Don't add code until after spellcheck.)

jhh · 2011-04-12 12:16

Quick replies:

Yes, you need to be careful about placement in the long if only doing a byte; it must live in the 8 most significant bits of the long.

The advantage of the code you pasted is that it's not destructive to the data being sent, whereas the one I showed will "destroy" the original data being sent, because I am shifting the bits out. (The code you pasted instead shifts a mask used for plucking bits out.)
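
The difference is easy to see in a small Python model of the two approaches. Both functions are illustrative stand-ins for the PASM shown earlier in the thread, not translations of complete drivers.

```python
def send_with_mask(data, nbits):
    """Walk a one-bit mask down from the MSB; `data` itself is never modified."""
    mask = 1 << (nbits - 1)
    bits = []
    while mask:
        bits.append(1 if data & mask else 0)  # test t3, t5 wc ; muxc outa, mosiMask
        mask >>= 1                            # shr  t5, #1
    return bits, data


def send_with_shift(data32, nbits):
    """Shift the data itself out through bit 31; the value is consumed as it goes."""
    bits = []
    for _ in range(nbits):
        bits.append((data32 >> 31) & 1)       # shl spiBitsOut, #1 wc ; muxc outa, spiMaskDI
        data32 = (data32 << 1) & 0xFFFFFFFF
    return bits, data32


print(send_with_mask(0xA5, 8))         # ([1, 0, 1, 0, 0, 1, 0, 1], 165)
print(send_with_shift(0xA5 << 24, 8))  # ([1, 0, 1, 0, 0, 1, 0, 1], 0)
```

Same bits on the wire either way; only the mask version still has the original value afterwards.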
