Hunting for an already invented wheel - Slave Sync Serial

photomankc · 2016-02-22 16:39

Looking to see if anyone has already developed a serial shift library that *is clocked by the other end*. In developing something using the Raspberry Pi, I had made the faulty assumption that the shiftin function on that end would have the clock line driven by the device sending data. That is not correct. It expects to drive the clock when shifting data in as well. Given that it clocks at 2.3 MHz, that's not an insignificant data rate to keep up with.

I was hoping to not have to chisel my own wheel here so I thought serial shift-in and out would be easy to implement since each side had a library, but each side also expects to drive the clock so I have a problem. Is there any library on the Propeller side that can shiftin and out using an external clock? I guess acting as something of a slave device?

Is there an IC that lets one side shift in data with its own clock, latch it, and then the other side can shift it out with its own clock? That kind of sucks though, since it cuts the data rate in half at least. Most shift registers I see go from serial to parallel. I was hoping to avoid chewing up the only hardware UART on the Pi for this or some USB -> Serial floppy cable and pin header. If I'm going to have to roll my own anyway, I guess I'll just burn 5 lines and make it a 4-bit bus so I can move the same data rate at a 4x slower clock rate, and it only uses 2 more pins than I am now.

DavidZemon · 2016-02-22 17:10

I'm afraid I have not seen anyone implement this in any language on the Propeller. The problem is that bit-banging the slave side of a synchronous serial protocol would be both tedious and slow. Everyone I know that has wanted to do this has found a different solution.

photomankc · 2016-02-22 17:39

That was what I was afraid the answer would be. Just sucks to be stuck with async serial at 115200. I've got other MCUs that can do slaves on SPI or I2C, but they have hardware for it. I hope at some point the "soft-peripherals" of the Prop will include those too.

Alright.... well that was a fun exercise anyway.

Edit: I guess it does come across whiny. Wasn't really my intent. I don't really have the assembly skill to do this well. Took me months to write and debug a 400 kHz I2C implementation back before C took over. It worked but was never really ready for prime-time before others got it done better. Maybe someday when I'm not trying to wrestle with a stalled out robot project.

DavidZemon · 2016-02-22 18:19

photomankc wrote: »

Just sucks to be stuck with async serial at 115200.

Why stuck at 115200? Many UART implementations I've seen for the Prop support much higher. I've seen a few that do 4 MBaud.

Peter Jakacki · 2016-02-22 18:35

I would always use asynch serial, and you can just use a dedicated receiver which will handle up to 4M baud. The transmit part back is easy. If you run half-duplex then you only need a single I/O line.

The FDS object is not designed for this though and even at 115.2k it is jittery.

kwinn · 2016-02-22 18:47

2.3 MHz gives you 434.78 ns between clocks, time for 8 instructions @ 80 MHz, so you might be able to use a tight PASM loop to do this.
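The arithmetic behind that estimate, as a rough sketch. It assumes the usual Propeller 1 figure of 4 system clocks per ordinary instruction; `waitpeq` and hub operations take longer, so the real budget is tighter. The function name is made up for illustration.

```python
def instructions_per_bit(bit_clock_hz, sys_clock_hz, clocks_per_instruction=4):
    """Return (bit period in ns, whole instructions that fit in one bit period)."""
    period_ns = 1e9 / bit_clock_hz
    clocks_per_bit = sys_clock_hz / bit_clock_hz
    # Assumption: every instruction costs the same number of system clocks.
    return period_ns, int(clocks_per_bit // clocks_per_instruction)


period_ns, budget = instructions_per_bit(2_300_000, 80_000_000)
print(f"{period_ns:.2f} ns per bit, room for {budget} instructions")
```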

JonnyMac · 2016-02-22 19:07

Just sucks to be stuck with async serial at 115200.

No, you're not. You could code what you presently don't have.

I hope at some point the "soft-peripherals" of the Prop will include those too.

My acting teacher used to say, "Don't whine, win!" I'll paraphrase with, "Stop complaining; start coding!"

Peter Jakacki · 2016-02-22 19:09

JonnyMac wrote: »

Just sucks to be stuck with async serial at 115200.

No, you're not. You could code what you presently don't have.

Or is all of your project simply assembled from pieces written by others?....

and this is true too, to take things into context if the application only involves <10,000 bytes/sec etc.

tomcrawford · 2016-02-22 21:32

kwinn wrote: »

2.3 MHz gives you 434.78nS between clocks, time for 8 instructions @80MHz so you might be able to use a tight pasm loop to do this.

I'm pretty sure it can be done in 9 instructions:

  waitpeq clock pin
  tst datapin, wz
  if_nz OR #1
  shl dataword
  djnz bit counter, #topOfLoop
  mov buffer, dataword
  add #$-1, #1
  mov bit counter, #bits per word
  djnz word counter, #topOfLoop

So close and yet so far. How about with a 6 MHz crystal?
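A quick check of the crystal question, as a sketch only. It assumes 4 system clocks per instruction, a PLL16X multiplier, and that all 9 instructions must fit inside one bit period (the worst case, at a word boundary). A 6 MHz crystal would put the chip at 96 MHz, which is above the 80 MHz it is usually run at.

```python
def min_sys_clock_hz(instructions, bit_clock_hz, clocks_per_instruction=4):
    """System clock needed so `instructions` fit inside one bit period."""
    return instructions * clocks_per_instruction * bit_clock_hz


BIT_CLOCK_HZ = 2_300_000
needed_hz = min_sys_clock_hz(9, BIT_CLOCK_HZ)

# Crystal frequency times an assumed PLL16X multiplier gives the system clock.
fits = {xtal_hz: xtal_hz * 16 >= needed_hz for xtal_hz in (5_000_000, 6_000_000)}

print(f"need at least {needed_hz / 1e6:.1f} MHz")
for xtal_hz, ok in fits.items():
    print(f"{xtal_hz / 1e6:.0f} MHz crystal -> {xtal_hz * 16 / 1e6:.0f} MHz: "
          f"{'fits' if ok else 'too slow'}")
```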

photomankc · 2016-02-23 17:09

JonnyMac wrote: »

Just sucks to be stuck with async serial at 115200.

No, you're not. You could code what you presently don't have.

I hope at some point the "soft-peripherals" of the Prop will include those too.

My acting teacher used to say, "Don't whine, win!" I'll paraphrase with, "Stop complaining; start coding!"

Ummmm, OK. Yeah, I get what you are saying. I could, but I likely won't. There are a lot of competing projects, so I won't be scratch-writing a synchronous serial slave. I was looking to see if one was ever written. It hasn't been, so that answers that. I'm going to go on with some other aspects of the bigger project and use what's already developed. Probably how this goes down when other people consider doing it too, I guess.

photomankc · 2016-02-23 17:17

DavidZemon wrote: »

photomankc wrote: »

Just sucks to be stuck with async serial at 115200.

Why stuck at 115200? Many UART implementations I've seen for the prop support much higher. I've seen a few that do 4 MBaud

Last time I tested the FDS serial in C, 115,200 was the limit at which I could get reliable communication. 230,400 worked sometimes, but sometimes not. 115K is probably enough. The main issue was finding a USB serial adapter that did not involve a USB A to Mini connection. I found one that is mounted to the USB A plug, and then I can solder a pig-tail and connector to connect it up to a Propeller board. Should be a more reliable mechanical arrangement.

photomankc · 2016-02-23 17:20

tomcrawford wrote: »
kwinn wrote: »

2.3 MHz gives you 434.78nS between clocks, time for 8 instructions @80MHz so you might be able to use a tight pasm loop to do this.

I'm pretty sure it can be done in 9 instructions:
  waitpeq clock pin
  tst datapin, wz
  if_nz OR #1
  shl dataword
  djnz bit counter, #topOfLoop
  mov buffer, dataword
  add #$-1, #1
  mov bit counter, #bits per word
  djnz word counter, #topOfLoop
So close and yet so far. How about with a 6 MHz crystal?

That's kind of where I was landing too. Possible but really close to the line and maybe just over it. Then if something changes and the Raspberry manages to go a little faster it's broken.

photomankc · 2016-02-23 17:22

JonnyMac wrote: »

Just sucks to be stuck with async serial at 115200.

Or is all of your project simply assembled from pieces written by others?....

Wow, that's.....

forget I asked.

tomcrawford · 2016-02-23 17:28

If you know ahead of time how many bits will arrive in a burst, you can do it in four instructions (5 MHz):

   waitpeq clockMask
   tst dataMask wz
   if_NZ OR DataWord, AppropriateBitFromATableOFPowersOfTwo
   waitpeq ZERO clockmask
   repeat as needful for however many bits you expect

That should work just fine for up to 32 bits at a time.
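The table-of-masks idea can be modelled off-chip to see what the unrolled loop accumulates. This is a Python model of the logic only, not Propeller code; it assumes MSB-first ordering and that each sample is the data-pin level seen when the clock goes high. `receive_bits` is a hypothetical name.

```python
def receive_bits(samples):
    """Accumulate sampled data-pin levels (0 or 1) into a word, MSB first."""
    n = len(samples)
    if not 1 <= n <= 32:
        raise ValueError("burst must be 1 to 32 bits")
    # One mask per expected bit: the table of powers of two, highest bit first.
    masks = [1 << (n - 1 - i) for i in range(n)]
    word = 0
    for level, mask in zip(samples, masks):
        if level:            # models: tst dataMask wz
            word |= mask     # models: if_NZ OR DataWord, mask
    return word


print(hex(receive_bits([1, 0, 1, 0, 0, 1, 0, 1])))
```

Because each bit gets its own mask, no shift and no loop counter is needed, which is where the saving to four instructions per bit comes from.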

DavidZemon · 2016-02-23 17:33

Ah... I just realized when I quoted 4 MBaud, I was thinking transmit not receive. I have no idea how fast the configurable UARTs can receive. If you want to give PropWare a try, it is configurable for any baud and might (or might not) run faster than FDS. PropWare::FullDuplexUART is unbuffered, but it isn't hard to run it in a second cog and have it write to a shared PropWare::Queue.

photomankc · 2016-02-23 18:04

tomcrawford wrote: »
If you know ahead of time how many bits will arrive in a burst, you can do it in four instructions (5 MHz):
   waitpeq clockMask
   tst dataMask wz
   if_NZ OR DataWord, AppropriateBitFromATableOFPowersOfTwo
   waitpeq ZERO clockmask
   repeat as needful for however many bits you expect
That should work just fine for up to 32 bits at a time.

That's slick! I could write in words or dwords and then take a break and send again. Really neat way to go using the table to OR in the value.

Bean · 2016-02-23 18:09

Transmit can be very fast since you can use the video generator to output multiple bits with one instruction. Either sync or async.
I'm not sure how fast, but probably tens of MHz.

P.S. Oops. I see this is with an external clock. That won't work...

Bean

kwinn · 2016-02-23 18:46

Then again, one could also use a serial to parallel chip ('595) and waitcnt to grab 8 bits at a time.

Peter Jakacki · 2016-02-23 19:09

The problem with any receive code that is too tight is that it doesn't have enough time to write the results of the data to the hub along with updating any hub write index etc. I have code in Tachyon which dedicates a cog to high speed receive, and you can always push it a bit more by using 2 or more stop bits when transmitting to it. No harm, but that's not 100% throughput. The transmit side I simply bit-bashed unbuffered straight from the application code, as buffering makes sense at slow speeds but takes too much time at very high speeds, to the point where it makes things slower again.
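To put a number on the cost of extra stop bits, here is a small sketch. It assumes plain framing of 1 start bit, 8 data bits, no parity, and back-to-back characters; the function names are illustrative.

```python
def payload_fraction(data_bits=8, stop_bits=1, start_bits=1):
    """Share of line time that carries data bits."""
    return data_bits / (start_bits + data_bits + stop_bits)


def payload_bytes_per_second(baud, data_bits=8, stop_bits=1):
    """Bytes per second when characters are sent back to back."""
    return baud / (1 + data_bits + stop_bits)


for stop_bits in (1, 2):
    print(f"{stop_bits} stop bit(s): {payload_fraction(stop_bits=stop_bits):.1%} payload, "
          f"{payload_bytes_per_second(4_000_000, stop_bits=stop_bits):,.0f} bytes/s at 4 Mbaud")
```

Going from one stop bit to two drops the payload share from 80% to about 73%, in exchange for one more bit time of slack per character at the receiver.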

If you want some PASM code isolated, I can give you a nice simple Ultra-FDS. But I'm still wondering, as we have been hinting, what your actual requirements are in regard to payload, response, etc. That determines the best approach, which in my opinion would be this fast and reliable asynch serial.

photomankc · 2016-02-23 20:55

Peter,

The actual data to move is not much, but there needs to be a lot of conversations reading sensors and sending control values. I used to have a spreadsheet that outlined all the commands and the data back and forth but I've lost that.

Thinking back I believe that my real concern over bit-rate had to do with the maximum time the receiver could spend waiting on all the data. On the multi processor prop it may be less of a problem to have one of them tied up in long 1 or 2ms data transactions where getting the data over fast was more critical on single processor systems so they could get the deed done in time.

I really think 115K will be fast enough for what I need to do.
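For a feel of what fits in those 1 or 2 ms windows, a rough sketch assuming 8N1 framing (10 bit times per byte) and no gaps between bytes. The byte counts are examples, not the project's actual message sizes.

```python
def transfer_time_ms(n_bytes, baud=115_200, bits_per_frame=10):
    """Time on the wire for n_bytes sent back to back, in milliseconds."""
    return n_bytes * bits_per_frame / baud * 1000


for n_bytes in (4, 10, 23):
    print(f"{n_bytes:2d} bytes take {transfer_time_ms(n_bytes):.3f} ms at 115200 baud")
```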

tomcrawford · 2016-02-25 17:18

Attached is an SPI Slave receive program that will input at 4 MHz MSBFIRST. The burst length is limited to 32 bits (which avoids the issue of storing the result in hub memory (and processing it)). There are four objects:

sps.start: Starts a PASM cog which then waits for an arming command
sps.stop: Kills the PASM cog
sps.arm: Tells the PASM cog which pins to look at (clock and data) and how many bits to accumulate (up to 32)
sps.wait: Waits until the PASM cog has accumulated n bits and returns them

The receive loop is "unwound". As written, it is four instructions per bit. I believe it will run at 4 MHz and will not run at 5 MHz (80 MHz clock).
If you can guarantee the clock high time is about 100 nsec or less, you could remove the fourth instruction for each bit and it ought to run at about 6 MHz.

A demo program is included.

DavidZemon · 2016-02-25 17:44

tomcrawford wrote: »

Attached is an SPI Slave receive program that will input at 4 MHz MSBFIRST. The burst length is limited to 32 bits (which avoids the issue of storing the result in hub memory (and processing it)). There are four objects:

sps.start: Starts a PASM cog which then waits for an arming command
sps.stop: Kills the PASM cog
sps.arm: Tells the PASM cog which pins to look at (clock and data) and how many bits to accumulate (up to 32)
sps.wait: Waits until the PASM cog has accumulated n bits and returns them

The receive loop is "unwound". As written, it is four instructions per bit. I believe it will run at 4 MHz and will not run at 5 MHz (80 MHz clock).
If you can guarantee the clock is about 100 nsec or less, you could remove the fourth instruction for each bit and it ought to run at about 6 MHz.

A demo program is included.

Very cool. Does it do 4 MHz with sequential words, or does it require a pause between each word for the HUB write? If it requires a pause, do you know how fast of a clock you can run such that no extra pause is required between words?

tomcrawford · 2016-02-25 18:15

Maximum of just 32 bits at a time. To do a hub write between bits takes, as they say, up to 23 cycles. Add to that, 1 instruction to determine it is time to do the write, another to modify the address in hub memory for the next write, and yet another to loop; pretty soon you're talking about, say, 40 or 50 cycles. I doubt that you could do sustained input much faster than 1 MHz.
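That estimate can be sanity-checked with a few lines. This sketch assumes an 80 MHz system clock and adds an assumed 16 clocks of per-bit sampling work (four 4-clock instructions) on top of the 40 to 50 cycles of store overhead quoted above.

```python
SYS_CLOCK_HZ = 80_000_000
PER_BIT_SAMPLE_CLOCKS = 16  # assumption: four ordinary instructions per bit


def max_sustained_bit_clock_hz(overhead_clocks, sys_clock_hz=SYS_CLOCK_HZ):
    """Highest bit clock at which sampling plus store overhead fits in one bit."""
    return sys_clock_hz / (PER_BIT_SAMPLE_CLOCKS + overhead_clocks)


for overhead in (40, 50):
    rate = max_sustained_bit_clock_hz(overhead)
    print(f"{overhead} clocks of overhead -> about {rate / 1e6:.2f} MHz")
```

Both cases land between 1.2 and 1.5 MHz, in line with the "not much faster than 1 MHz" guess.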

DavidZemon · 2016-02-25 19:57

Right, makes sense. I'm asking what's the max clock speed that can be handled with two sequential calls to wait()? Perhaps code like so:

PUB readTwoWords | word1, word2
    word1 := spi.wait
    word2 := spi.wait

And we'll assume n > 16, or else obviously this short example could be combined into a single call where n is doubled.

tomcrawford · 2016-02-25 23:24

David, sps.arm and sps.wait have to come in pairs. sps.arm specifies the pins and bit count; sps.wait waits on the PASM cog and returns the results. Can't have one without the other. sps.arm takes just over 20 instructions to sort out the pins, etc. and get set up to await the first clock.

guenthert · 2016-03-04 05:05

tomcrawford wrote: »

Attached is an SPI Slave receive program that will input at 4 MHz MSBFIRST. The burst length is limited to 32 bits (which avoids the issue of storing the result in hub memory (and processing it)). There are four objects:

sps.start: Starts a PASM cog which then waits for an arming command
sps.stop: Kills the PASM cog
sps.arm: Tells the PASM cog which pins to look at (clock and data) and how many bits to accumulate (up to 32)
sps.wait: Waits until the PASM cog has accumulated n bits and returns them

The receive loop is "unwound". As written, it is four instructions per bit. I believe it will run at 4 MHz and will not run at 5 MHz (80 MHz clock).
If you can guarantee the clock is about 100 nsec or less, you could remove the fourth instruction for each bit and it ought to run at about 6 MHz.

A demo program is included.

Thanks for sharing.

Maybe I missed it, but what mechanism do you have in mind for detecting frame errors? Assuming the peer (the SPI master) reboots for some reason in the middle of a transfer, how would master and slave ever synchronize again?

tomcrawford · 2016-03-04 20:09

The program was written as an exercise to see just how fast the Prop could run as an SPI slave; no other consideration. No defensive coding at all. If the SPI master stops the clock during a transfer (either high or low), the PASM cog will hang forever. You would have to make provisions in the "boss" cog to detect this (by timing out). The recovery procedure would include stopping and restarting the PASM cog.
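The timeout-and-restart idea could look something like the following. This is a Python sketch of the pattern only, not Propeller code. The `receiver` object and its `arm`, `poll`, `stop` and `start` methods are hypothetical stand-ins; in particular `poll` is an assumed non-blocking check, whereas the posted `wait` blocks.

```python
import time


def wait_with_timeout(poll_done, timeout_s, clock=time.monotonic, sleep=time.sleep):
    """Poll until poll_done() returns something other than None, or time runs out.

    Returns the result, or None on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        result = poll_done()
        if result is not None:
            return result
        sleep(0.0005)  # brief pause between polls
    return None


def receive_with_recovery(receiver, n_bits, timeout_s=0.01):
    """Arm the receiver; if no word arrives in time, stop and restart it."""
    receiver.arm(n_bits)
    result = wait_with_timeout(receiver.poll, timeout_s)
    if result is None:
        # The receiver is presumed hung on a stalled clock.
        receiver.stop()
        receiver.start()
    return result
```

The caller still has to decide what a `None` means at the protocol level, for example by asking the master to resend.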
