@evanh said:
Is that the SPI clock rate? If so then I'm impressed. And I assume your Prop2 is at 340 MHz and therefore achieving just short of sysclock/3. SPI receiver smartpin has proven to be good for that. Transmitter not so much - because of the lag issue.
Yes, the P2 runs at 340 MHz, but even with the RPi SPI clock at 110 MHz I am not getting data any faster. Is there any workaround that we can follow to receive data from RPi to P2 in 1.5 to 2 ms? Also, if we increase the RPi SPI clock frequency from 110 MHz to 130 MHz, we do not receive the data properly.
@disha_sudra said:
Is there any workaround that we can follow to receive data from RPi to P2 in 1.5 to 2 ms?
You still need to separate overheads from data rate. If we don't know the achievable data rate of the interface then we can't gauge how much of that 3 ms is overheads. Then, after that, figure out where the overheads are coming from.
Also, if we increase the RPi SPI clock frequency from 110 MHz to 130 MHz, we do not receive the data properly.
That would be right. It exceeds sysclock/3 rate. 340 / 3 = 113 MHz.
Theoretical peak data rate is 110 Mb/s. 19200 x 8 = 153600 bits to transfer. So ideal best case is 153600 / 110e6 = 1.4 ms. I guess, unlike SD card block transfers, there isn't any need for block gaps. It can transfer everything in one long burst. So the only overhead should be the initiation handshake.
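The arithmetic above can be written as a small helper. This is only a sketch of the best-case numbers quoted in the post: the function names are made up for illustration, and nothing here accounts for handshake, CS gaps or Linux-side latency.

```c
#include <assert.h>

/* Fastest SPI clock the P2 receiver smartpin is said to handle: sysclock/3. */
static double max_spi_clock_hz(double sysclock_hz)
{
    assert(sysclock_hz > 0.0);
    return sysclock_hz / 3.0;
}

/* Ideal time in milliseconds to clock `bytes` bytes at `bit_rate_hz`
 * bits per second, with zero overhead of any kind. */
static double ideal_transfer_ms(unsigned long bytes, double bit_rate_hz)
{
    assert(bit_rate_hz > 0.0);
    return (double)bytes * 8.0 / bit_rate_hz * 1000.0;
}
```

For example, `ideal_transfer_ms(19200, 110e6)` comes out just under 1.4 ms, and `max_spi_clock_hz(340e6)` is about 113 MHz, which is why 130 MHz fails.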
EDIT: A bonus is if a spare cog is used to be the SPI slave then it can effectively operate in the background. Not taking up any foreground processing time at all. At least not until notification of transfer completion. The RPi controls the transfer.
I'm not sure if this is an advantage for you though. Why do you need such a short transfer time?
There is a buffer limit in the P2, so we fixed the buffer size at 19200 bytes.
The whole scenario is as below:
The RPi should transfer 19200 bytes in 1.5 to 2 ms.
After receiving 19200 bytes, the P2 starts processing them while, in parallel, a cog and the RPi can start filling the next 19200 bytes. (The P2's processing time here is less than 2 ms.) So we need the RPi to transfer 19200 bytes in less than 2 ms to prevent any delay in printing.
I've managed to stay sane long enough to refine the SPI handler a little more. The events hardware isn't all that well documented so needed a rock solid understanding.
Here's the newest [revised again] version of the command handler I've been working on:
'==========================================
' Wait for a command from master
'==========================================
' If CLK glitches while CS still high then a fast reset of smartpins occurs
' If CS glitches then RDPIN returns a zero and jump table back to "cmdloop"
'
cmdloop
                testp   p_cs            wz      ' CS high?  (Z=1 is yes)
    if_nz       jmp     #cmdloop                ' wait until CS high, avoids random garbage with CS low
.retry
                dirl    p_txrx                  ' reset both Tx and Rx smartpins, clears buffers
                dirh    p_txrx                  ' enable both Tx and Rx smartpins
.cawait
                jpat    #.retry                 ' CS and CLK rise(clocking edge), clock glitch while CS high
                testp   p_mosi          wz      ' cmd+addr received?  (Z=1 is yes)
    if_nz       jnse2   #.cawait                ' wait until CA complete or CS pin rise
                rdpin   cmdaddr, p_mosi         ' read smartpin's buffer (and ACK the smartpin)
                rev     cmdaddr                 ' 32-bit endian bit-swap
                getnib  pb, cmdaddr, #7         ' first 4 bits is command
                setnib  cmdaddr, #0, #7         ' remove command from the address
                altgb   pb, #cmdjmp             ' lookup table of 8-bit pointers
                getbyte pb
                jmp     pb
Not to derail any of your efforts, but I am curious why you are using a SPI interface as opposed to just a standard USART interface between the Pi and the Propeller.
@"Beau Schwabe" said:
Not to derail any of your efforts, but I am curious why you are using a SPI interface as opposed to just a standard USART interface between the Pi and the Propeller.
That's a point. If the RPi can do 100 Mb/s UART with 32-bit word framing, then that'd be ideal for the Prop2.
Most likely the speed is a problem because regular hardware implementations of UARTs have x16 input sampling of each bit spacing. Such would then need 1600 MHz sampling clock on the RX serial pins.
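As a rough illustration of that oversampling argument, the receiver's sampling clock is just the baud rate times the oversampling factor. The helper name is hypothetical and the x16 factor is the typical value mentioned above, not something measured on the Pi.

```c
#include <assert.h>

/* Sampling clock a conventional UART receiver would need for a given
 * baud rate and oversampling factor (commonly 16). */
static double uart_sample_clock_hz(double baud, unsigned oversample)
{
    assert(oversample > 0);
    return baud * (double)oversample;
}
```

With `uart_sample_clock_hz(100e6, 16)` this gives the 1600 MHz figure quoted in the post.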
@evanh said:
That's a point. If the RPi can do 100 Mb/s UART with 32-bit word framing, then that'd be ideal for the Prop2.
Most likely the speed is a problem because regular hardware implementations of UARTs have x16 input sampling of each bit spacing. Such would then need 1600 MHz sampling clock on the RX serial pins.
Finding actual Pi UART use cases above even 4 MBd is not easy.
The web finds paper numbers like this:
UART: 476 baud to 31.25 Mbaud (theoretical maximum)
SPI: 3.8 kHz to 250 MHz (theoretical maximum)
I2C: 400 kbps
but the Pi also has small FIFOs, so you would need to lock a core to a UART to manage it.
Google also finds this:
If the system clock is 250 MHz and the speed field is zero the SPI clock frequency is 125 MHz. The practical SPI clock will be lower as the I/O pads can not transmit or receive signals at such high speed
There was a lot of problems but it can be done. I know that it took a lot of time but I created communication between RPi and other side devices (parallel connection) using baudrate = 10500000 (10,5Mb).... Now transfer 2GB of data takes about 1 hour
@jmg said:
How long does CS need to be high between bursts ?
Now that I've got something near complete. I haven't tested it for real but glancing at the typical code path looks like about 32 sysclocks from detecting a CS rise until ready for a subsequent CS low. So about 100 ns at 340 MHz.
3/4 of that time is the loop period of prior function. Not all are the same length so there is going to be variations. Maybe I could utilise interrupts to make that more responsive ...
EDIT: Interrupts now working smoothly!
I'm not sure what the worst case response will be though. It only needs two instructions (4 sysclock ticks) within the ISR to be ready. Worst case IRQ blocking is just branch instructions. So that's another 4 ticks I guess, plus the 4 ticks for the branch into the ISR itself. So 12 ticks total, maybe. So we're down to a relatively consistent 35 ns hopefully.
EDIT2: Okay, not that great. It needs 20 ticks (corrected from 16) when SPI clock frequency is sysclock/8. Didn't need any packing at all when sysclock/80. I'm thinking the old method probably needed over 120 ticks.
And the measure is not exactly how long the CS high pulse is but rather the time from CS rise until the shifter smartpins come out of reset. The CS high pulse can be shorter as long as the subsequent SPI clocking is delayed enough.
EDIT3: sysclock/3 and sysclock/4 are both 24 ticks. So at 340 MHz that's 70 ns minimum from CS rise until clocks can start. CS can go low again anytime in between.
Huh, turns out I was getting unexpected clock glitch events. By clearing that event state along with the shifter reset it has now closed the margins some more. And also explains some erratic-ness that would come and go.
New figure based on previous measuring technique is now 12 ticks at sysclock/3. Which is exactly what I was originally hoping for. But I should hook up the scope because I suspect there is a little more when viewed from the physical pin timings ...
EDIT: Right, it's actually 13 ticks on the scope. And that's to the shifter clocking edge. Which can be either the first or second clock edge depending on CPHA of SPI clock mode. So at 340 MHz sysclock that's 38 ns minimum between CS rise and first clocking edge of next command. With CS going low again anytime in between.
Well, I got a newly observed oddity now - The Tx smartpin doesn't forget its data during a reset. If re-enabled it'll just repeat the old data word over and over. It makes sense because both initial loading modes allow a data word to be placed directly into the shifter, bypassing the buffer, by starting with DIR low (Which would normally clear all the working registers in many other smartpin modes).
It could be argued it's not an issue but I think I should probably do something to clear the data.
@evanh said:
Well, I got a newly observed oddity now - The Tx smartpin doesn't forget its data during a reset. If re-enabled it'll just repeat the old data word over and over. It makes sense because both initial loading modes allow a data word to be placed directly into the shifter, bypassing the buffer, by starting with DIR low (Which would normally clear all the working registers in many other smartpin modes).
Do you mean in slave mode, if more clocks arrive before new data, it repeats the last sent data word ?
That underrun behaviour is somewhat common I think.
This is after a CS high reset of the smartpins - software performs a DIRL-DIRH combo. Yes, as the slave. The Tx buffer register retains prior data and happily feeds it out when the smartpin is re-enabled. The software needs to either leave the smartpin in reset or feed it a zero value to keep the data pin quiet. In the past I've always left it in reset until needed.
Now though, the Tx smartpin is live all the time. This is so that I can pace out a known number of SPI clocks to the start of a reply to a command.
PS: The fix is simply to add a WYPIN #0,txpin. Just had to work out the best place for it, was all. Most convenient place is after the smartpin reset.
ISR_cs_rise                                     ' abandon prior command handling
                dirl    p_txrx                  ' reset both Tx and Rx smartpins
                dirh    p_txrx                  ' enable both Tx and Rx smartpins
                pollpat                         ' clear any prior clock glitch event, placed after smartpin enable due to more I/O stages
                wypin   #0, p_miso              ' clear the Tx buffer and shifter
The down side is it is a little laggy at the physical pin. The Tx pin transitions about 3 sysclock ticks after the clocking edge. Which is no issue internally, the shifter still gets cleared before the bit-shift occurs. But that can still effect a high first bit externally when at the minimum 12 ticks.
Upping the required minimum gap, of 12 ticks from CS rise to new clocks, would sort it.
EDIT: Err, that was just plain wrong. I'd unwittingly removed the data source in my testing. The smartpin clearing has to be during reset. The minimum gap needs adjusted accordingly ...
ISR_cs_rise                                     ' abandon prior command handling
                dirl    p_txrx                  ' reset both Tx and Rx smartpins
                wypin   #0, p_miso              ' clear the Tx buffer and shifter
                dirh    p_txrx                  ' enable both Tx and Rx smartpins
                pollpat                         ' clear any prior clock glitch event, placed after smartpin enable due to more I/O stages
EDIT2: Bugger, even with the sooner WYPIN it's still 19 sysclock ticks from CS rise until Tx pin clears on the scope. So that puts a minimum of 20 ticks on that gap. 59 ns at 340 MHz sysclock.
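The tick counts in these posts convert to time as ticks divided by sysclock. A minimal sketch for checking the quoted figures (the function name is hypothetical):

```c
#include <assert.h>

/* Convert a count of sysclock ticks to nanoseconds. */
static double ticks_to_ns(unsigned ticks, double sysclock_hz)
{
    assert(sysclock_hz > 0.0);
    return (double)ticks * 1e9 / sysclock_hz;
}
```

At a 340 MHz sysclock, 12 ticks is about 35 ns, 13 ticks about 38 ns, 20 ticks about 59 ns and 24 ticks about 71 ns, which agrees with the numbers given above to within rounding.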
Righty-ho, first draft with just the minimum block read and write:
EDIT: Couple of fixes made and C tester program added.
EDIT2: Added new documenting comment:
CMD4 BlockRead is more speed limited than CMD5. Where CMD5 can achieve sysclock/3 rate, CMD4
is restricted to maybe sysclock/7. Maybe it can do better, I haven't fully tested.
EDIT3: Another typo in the docs. I'd written "24-bit address" but should have been "28-bit address". The diagram was right though. 4 + 28 = 32 bits for the command+address.
EDIT4: I've both fixed up and cleaned up the loopback tester.
It seems that the above code is written for sending data in a loop within the P2. In my case I do not need the P2 to fill sbuff, as it should be filled with data received from the RPi. So I modified the above code to only read data from MOSI, treating the P2 as a slave. I am confused by the "Block Read Test" portion. It should read data from its SPI pin, so why is wypin used there? Shouldn't it be rdpin?
for( j = 0; j < 12; j++ )
{
    printf( "Block Read Test " );
    filldat( (uint8_t *)sbuff, BUFFSIZE/4 );
//  wypin( LCLK, 5 );
//  waitus( 10 );
    __asm {
        dirl    #LMOSI
        mov     pb, ##0x4000_0000   // command nibble 4 in the top four bits
        add     pb, addr
        rev     pb
        wypin   pb, #LMOSI          // load CMD4 (BlockRead) + address into shifter
        outh    #LCS                // signal end of prior SPI transaction
        waitx   #4
        outl    #LCS                // signal start of SPI transaction
        dirh    #LMOSI              // ready for clocks
        dirl    #LCLK
        dirh    #LCLK
        wypin   #300, #LCLK         // fire!
        wypin   #0, #LMOSI          // clear buffer register
    }
    _waitms( 1 );                   // pause for clocking to complete
//  _pinh( LCS );                   // signal completed SPI transaction
    for( i = addr; i < addr + 6; i++ )
        printf( " %08x ", sbuff[i] );
    puts("");
}
Don't use that as an example for your use. It was just my loopback tester converted to C. It is only useful for seeing the simplicity of starting the slave cog.
PS: The main feature of the driver is the blank sheet of sbuff[]. The Rpi gets to read and write wherever it likes within sbuff[]. As does the other seven cogs of the Prop2. You can make it whatever size you like, and chop it up into however many structures you like.
Just to be clear, the driver takes care of all I/O at the Prop2 end. What you need to do now is write the Rpi I/O to communicate with it.
What you are looking at in that loopback tester is just a quick hack of mine at simulating a SPI master talking to the slave.
Sorry, I was maybe a little tired and shouldn't have posted that.
Oh, I probably should highlight that there are a couple of pin order restrictions because of smartpin routing limitations.
MISO and MOSI pins have to be a neighbouring pair. Also, it needs a minor update to allow either order. The existing source code assumes MISO comes first.
The second restriction on pin order is that the CLK pin has to be within three pins of both the MISO and MOSI pins.
The driver init routine does not complain at this stage. It just fails silently.
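Since the init routine fails silently, a caller-side check might help before starting the cog. The sketch below only encodes the two restrictions as worded in this post; `spi_pins_ok` is a hypothetical helper, not part of the driver, and it does not model any other smartpin routing detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if the pin choice satisfies the two stated restrictions:
 *   1. MISO and MOSI are neighbouring pins (either order),
 *   2. CLK is within three pins of both MISO and MOSI.
 * Pin numbers are assumed to be P2 pin numbers 0..63. */
static int spi_pins_ok(int clk, int miso, int mosi)
{
    assert(clk >= 0 && clk < 64 && miso >= 0 && miso < 64 && mosi >= 0 && mosi < 64);
    int distinct = (clk != miso) && (clk != mosi);
    int adjacent = abs(miso - mosi) == 1;
    int clk_near = abs(clk - miso) <= 3 && abs(clk - mosi) <= 3;
    return distinct && adjacent && clk_near;
}
```

For instance, CLK = 26 with MISO = 24 and MOSI = 25 passes, while moving CLK to pin 30 would fail the second restriction.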
EDIT: Here's a minor update allowing either MISO or MOSI to come first.
EDIT2: MISO is always driven by the slave. It doesn't float when CS high.
I am trying to send data using the RPi code below. On the P2 I used the latest code you provided (spislavetest1.c and spi_slave_truncated.spin2). I am wondering why I am not able to receive data on the P2. I tried changing cmode and frequency as well.
EDIT: I have used the same pins and configuration as you did. One more query: how is sbuff in the P2 filled when I am sending data from the RPi? Is it filled by the driver itself, or is a receiving function required?
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

#define CHUNK 19200

int fd;
unsigned char chunk[CHUNK] = {0};

// Send l bytes from chunkx[] as one SPI transfer
void Txfunc(unsigned char chunkx[], int l) {
    struct spi_ioc_transfer spi;
    memset(&spi, 0, sizeof(spi));
    spi.tx_buf = (unsigned long)chunkx;
    spi.len = l;
    int ret = ioctl(fd, SPI_IOC_MESSAGE(1), &spi);
    if (ret < 0)
        perror("SPI_IOC_MESSAGE");
}

int main() {
    fd = open("/dev/spidev0.0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/spidev0.0");
        return 1;
    }
    uint32_t speed = 10000000;
    ioctl(fd, SPI_IOC_WR_MAX_SPEED_HZ, &speed);
    // Fill the buffer with a repeating 1..100 pattern, staying within chunk[0..CHUNK-1]
    for (int i = 0, var = 1; i < CHUNK; i++) {
        chunk[i] = var;
        var++;
        if (var > 100) {
            var = 1;
        }
        //printf("p = %d\n",chunk[i]);
    }
    printf("Sending data....\n");
    for (int ix = 0; ix < 56; ix++) {
        Txfunc(chunk, CHUNK);
    }
    printf("finished sending data...\n");
    close(fd);
    return 0;
}
It uses commands like most SPI devices. CMD5 for the master to send data. It'll just ignore an invalid command and any associated data.
Example of writing a data block into the slave's buffer
starting from address $4AF (longword granular)
   _____                                                               ____
CS      |_______________________________________________ ... _________|
         CMD5  28-bit address ($4AF)         DATA in multiples of 32 bits
         0101  0000000000000000010010101111  xxxxxxxxxxxx ... xxxx
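A master-side sketch of how that 32-bit command+address word could be assembled is shown below. The helper names are hypothetical, and sending the word most-significant byte first is an assumption based on the bit order drawn in the diagram and on the usual MSB-first default of SPI masters. The byte order of the data longwords that follow is not specified here, so it is left out.

```c
#include <assert.h>
#include <stdint.h>

enum { CMD_BLOCK_READ = 4, CMD_BLOCK_WRITE = 5 };   /* CMD4 and CMD5 from the posts */

/* Pack a 4-bit command and a 28-bit longword address into one 32-bit word. */
static uint32_t make_cmdaddr(unsigned cmd, uint32_t addr)
{
    assert(cmd <= 0xFu);
    assert(addr <= 0x0FFFFFFFu);
    return ((uint32_t)cmd << 28) | addr;
}

/* Split the word into four bytes, most significant first (assumed wire order). */
static void cmdaddr_to_bytes(uint32_t word, uint8_t out[4])
{
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}
```

For the example in the diagram, `make_cmdaddr(CMD_BLOCK_WRITE, 0x4AF)` gives `0x500004AF`, so the four header bytes would be `50 00 04 AF`, followed by the data in the same chip-select low period.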
EDIT: This should get you going (Assuming it is working otherwise):
Thanks for the code, I tested it. You did an amazing job.
I modified the code to check the receive time for multiple buffers. To take the timestamps I watch the first and last bytes of the buffer; I'm not sure this is correct, but I tried it as given below. I found that the time is almost the same each time, around 1.8 milliseconds, but at the start some buffers take longer. Why is this time variation happening?
#include <Simpletools.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <sys/time.h>

enum {
//  _xinfreq = 20_000_000,
    _xtlfreq = 20_000_000,
    _clkfreq = 360_000_000,
    DOWNLOAD_BAUD = 230_400,
    DEBUG_BAUD = DOWNLOAD_BAUD,
    CS = 29,            // Rpi 24
    CLK = 26,           // Rpi 23
    MISO = 24,          // Rpi 21
    MOSI = MISO + 1,    // Rpi 19
    CMODE = 3,          // CPOL is bit1, CPHA is bit0
    BUFFSIZE = 19200,   // longwords
    SPINS = CS | CLK<<6 | MISO<<12 | MOSI<<18 | CMODE<<30,
};

struct __using("spi_slave_truncated.spin2") rpi;

//static uint32_t sbuff[BUFFSIZE];
unsigned char sbuff[BUFFSIZE];
static long stackxrts1[BUFFSIZE];
int countx = 0;

void main(void) {
    high(30);
    high(31);
    printf( " clkfreq = %d clkmode = 0x%x\n", _clockfreq(), _clockmode() );
    rpi.start_SPI_slave( SPINS, sbuff, BUFFSIZE );
    _waitatn();    // wait for cog to start up
    // Slave cog is ready and waiting. At this stage the tester program has not setup any of its pins,
    // so we are going to be glitching them from the start.
    int t1, t2;
    int flag = 1;
    while (1) {
        // first byte of the pattern has arrived: note the start time
        if ( (sbuff[0] == 1) && flag == 1 ) {
            t1 = _getus();
            flag = 0;
        }
        // last byte of the pattern has arrived: note the end time and report
        if( sbuff[BUFFSIZE-1] == 138 ) {
            t2 = _getus();
            countx++;
            printf("countx = %d\n t2-t1 = %d\n", countx, t2-t1);
            memset(sbuff, 0 , BUFFSIZE);
            flag = 1;
        }
    }
}
Comments
Ah, cool. Thanks.
So a sustained 76.8 Mb/s. Yeah, that's substantial. And as you're saying buffer size is naturally constrained.
Good news is a background slave would work fine. I'll continue with finishing that. Others will make use of it too.
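The 76.8 Mb/s figure follows from moving one 19200-byte buffer every 2 ms. A small sketch of that calculation (hypothetical helper name, payload bits only, no protocol overhead):

```c
#include <assert.h>

/* Sustained payload rate in Mb/s needed to move `bytes` bytes
 * every `window_ms` milliseconds. */
static double required_rate_mbps(unsigned long bytes, double window_ms)
{
    assert(window_ms > 0.0);
    return (double)bytes * 8.0 / (window_ms * 1e-3) / 1e6;
}
```

`required_rate_mbps(19200, 2.0)` is 76.8, and tightening the window to 1.5 ms raises it to 102.4, which is already close to the sysclock/3 ceiling at 340 MHz.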
How can we use this command handler in our existing code mentioned in comment #20?
Getting there. Still further to go. Just letting you know of progress.
That's very kind of you. I am actually unfamiliar with assembly code.
I think the Pi UART is nowhere near fast enough ?
and this, from https://forums.raspberrypi.com/viewtopic.php?t=192447#p1482053, which mentions a 'parallel connection' but also a baud rate??
The measured 2G/hr is ballpark 550k Bytes/sec averaged, so they are well short of sustaining the set 10.5MBd
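That ballpark can be reproduced with a couple of one-liners. The assumptions here are mine, not the original poster's: "2G" is taken as 2e9 bytes, and the UART is taken as 8N1, so 10 bit times per byte. The function names are hypothetical.

```c
#include <assert.h>

/* Average throughput in bytes per second. */
static double average_bytes_per_sec(double total_bytes, double seconds)
{
    assert(seconds > 0.0);
    return total_bytes / seconds;
}

/* Peak UART payload rate in bytes per second for a given baud rate and
 * number of bit times per byte (10 for 8N1: start + 8 data + stop). */
static double uart_peak_bytes_per_sec(double baud, unsigned bits_per_frame)
{
    assert(bits_per_frame > 0);
    return baud / (double)bits_per_frame;
}
```

`average_bytes_per_sec(2e9, 3600.0)` is roughly 556 kB/s, against a peak of about 1.05 MB/s from `uart_peak_bytes_per_sec(10.5e6, 10)`, so under these assumptions the link was busy only about half the time.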
From the 31.25 Mbaud UART maximum, I'll presume a 500 MHz sysclock. (31.25 x 16 = 500)
EDIT: Haven't read them but found these:
https://datasheets.raspberrypi.com/bcm2835/bcm2835-peripherals.pdf
https://datasheets.raspberrypi.com/bcm2711/bcm2711-peripherals.pdf
An answer to JMG's earlier question from https://forums.parallax.com/discussion/comment/1544113/#Comment_1544113
PS: The main feature of the driver is the blank sheet of sbuff[]. The Rpi gets to read and write wherever it likes within sbuff[]. As does the other seven cogs of the Prop2. You can make it whatever size you like, and chop it up into however many structures you like.
I'll add this note to the docs ...
Here's a starter program (untested) for starting up the slave driver and then reporting any changes to sbuff[0].
Did you modify the meaning of BUFFSIZE in the Spin code? It was written to be in longwords but you've used bytes instead.
Use this line to correct:
rpi.start_SPI_slave( SPINS, sbuff, BUFFSIZE/4 );
PS: I've posted the version 1 code to a new topic and named it PropSPI - https://forums.parallax.com/discussion/175543/propspi-emulation-of-a-generic-spi-ram-chip-using-the-prop2/p1
It has a new, undocumented feature: notification when a transfer has completed.
Hope this code works (It requires the newer v1 PropSPI and also the attached pllset.spin2):
EDIT: Improved transfer start detection