Well, when I clock SPI I've checked that the MISO data does indeed change on the falling edge. I sample it immediately, and the sample I read is the same one I see on the digital-signal analyzer.
SPIRD rep @.end,#8 ' 8 bits
xor outa,sck ' clock
xor outa,sck
test ina,miso wc ' read data from card
rcl tos,#1 ' shift in msb first
.end
BTW, by loading I mean capacitive effects of the LED on rise/fall time.
Try this one out, Peter. I'd love to know if it gives a stable result. You could say I'm testing what Chip has said, and it does depend on the clock active edge being where your code says.
' SPIRD ( dummy -- dat )
SPIRD xor outa,sck ' clock (active edge)
nop ' Fix clock skew, improve MHz tolerance & Tsu for Din
xor outa,sck ' cannot be active edge !
rep @.end,#7 ' Remaining 7 bits
xor outa,sck ' clock (active edge)
test ina,miso wc ' read data from card
xor outa,sck ' cannot be active edge !
rcl tos,#1 ' shift in msb first
.end
nop
test ina,miso wc ' read data from card
rcl tos,#1 ' shift in final lsb
ret
Just going through my SPI read timing, and I think I remember how I got it to work without the delays. There was an issue, and it's all a matter of clock polarity and offset "tricks". I will document what I did so that I could clock without any delays. The clock high time, btw, is 40ns and the clock cycle is 160ns, with data hold/valid at around 10ns.
That's nifty, slightly unrolled, & manages a 50% clock duty and a 4 opcode loop.
Relies on there being at least 2 clocks from CLK to Din.
Thanks, I will have a look at it soon. When I wrote this code I didn't know that you couldn't sample that fast, as it wasn't documented, so I just made it work!
Just a quick analysis of what I did: it seems that just before the SD card block read I inserted a single clock pulse, and after that I don't need to worry about anything else for the next 512 bytes. Just go for it, and 815us later the data is all buffered.
As for the WIZnet chip, well, that outputs the read data before the clock, so that just worked. Now I have to check the serial Flash. A lot of this stuff was done on the fly as I was still busy writing and testing various parts of Tachyon.
EDIT: I just inserted my sample in the middle of the clock pulse (which I have tried before) and that works fine due to the clock skew I use. The clock high/low is now 80ns/90ns.
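The figures quoted above can be cross-checked with a little arithmetic. This is only a rough sketch: the function names are made up for illustration, it assumes one bit per SPI clock with no gaps between bytes, and the posts do not say which clock timing the 815us block read was measured with.

```python
def spi_clock_stats(high_ns, low_ns):
    """Return (period_ns, frequency_mhz, duty_percent) for one SPI clock cycle."""
    period_ns = high_ns + low_ns
    frequency_mhz = 1000.0 / period_ns
    duty_percent = 100.0 * high_ns / period_ns
    return period_ns, frequency_mhz, duty_percent


def block_time_us(period_ns, n_bytes=512):
    """Time to clock n_bytes at one bit per SPI clock, ignoring any per-byte overhead."""
    return n_bytes * 8 * period_ns / 1000.0


# 40 ns high in a 160 ns cycle (so 120 ns low), then the later 80 ns / 90 ns split.
for high_ns, low_ns in ((40, 120), (80, 90)):
    period_ns, mhz, duty = spi_clock_stats(high_ns, low_ns)
    print(f"{high_ns}/{low_ns} ns: {mhz:.2f} MHz, {duty:.0f}% duty, "
          f"512 bytes in at least {block_time_us(period_ns):.0f} us")
```

Under these assumptions the raw bit time for a 512-byte block comes to roughly 655us or 696us, so the reported 815us would include some per-byte or setup overhead on top of the bit clocking.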
Here is how you could clock SPI data out of a flash efficiently:
rep @.end,#8 'ready for 8 bits
clrb outb,clk 'clk low
rcl data,#1 'save din bit (1st pass like nop)
setb outb,clk 'clk high
testb outb,din wc 'sample din (4 clocks since clk low)
.end
rcl data,#1 'save final din bit
Note there are 4 clocks between clk going low and din being sampled. This respects the Prop2 clock delays and the SPI device outputting after the falling clk edge.
That's a little more compact, and it still has 50% CLK, but the slow-memory tolerance is not as good as the code a few posts up.
How much margin is in the pipeline for adjacent opcode wr-rd cases? I think 2 CLKs, from your tests?
You're right. I didn't realize that about the slow-speed tolerance.
It took three clocks, at minimum, before pins echoed. 4 clocks is just practical given that instructions take 2 clocks each. That extra clock would give memories more time. I may not understand what you mean, though.
It took three clocks, at minimum, before pins echoed. ..
If 3 CLKs is visible and 2 is not, is there a remote chance that 2 becomes visible with extreme timing? That still leaves 1 clk of headroom, which is OK.
(i.e. there is, by design, no risk of a shift in phase?)
That extra clock would give memories more time. I may not understand what you mean, though.
The parts I looked at would need more than 1 SysCLK of Tco tolerance, but in other respects they do meet timing (i.e. they would load with SysCLK = 50MHz, 50% duty, 8c loop).
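To put rough numbers on that headroom, here is a small sketch. It is a simplification, not a timing analysis: it assumes the only budget for the device's clock-to-output delay (Tco) is the gap between the sample point (4 clocks after the clock edge, per the post above) and the 3-clock minimum pin echo. The function names and the 25ns Tco are hypothetical, chosen only for illustration.

```python
def tco_budget_ns(sysclk_mhz, sample_offset_clocks=4, echo_clocks=3):
    """Time left for the SPI device's clock-to-output delay, in ns.

    Simplified model: sample offset minus the pin round-trip (echo) delay,
    both counted in system clocks.
    """
    clock_ns = 1000.0 / sysclk_mhz
    return (sample_offset_clocks - echo_clocks) * clock_ns


def device_fits(tco_ns, sysclk_mhz, **kwargs):
    """True if a device with the given Tco meets the simplified budget."""
    return tco_ns <= tco_budget_ns(sysclk_mhz, **kwargs)


print(tco_budget_ns(50))                              # one 20 ns clock of headroom
print(device_fits(25, 50))                            # hypothetical 25 ns part: too slow
print(device_fits(25, 50, sample_offset_clocks=6))    # sampling two clocks later
```

In this model a part needing more than one system clock of Tco only fits if the sample point moves later, which is the slow-memory tolerance trade-off being discussed.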
Yep. That level of functionality is way beyond the scope of the P2. Heck, there is a good, strong case for FAT itself being beyond the scope. We shall see.
Comments
These delays matter when there is some tight output-to-input signalling.
I will write a fast SPI input routine tonight and post it here.
I said it was minor, and have never claimed it was unique, but I DID say details matter.
I am enjoying a wry smile here ...
That's pretty much what I was envisioning, too.
Ah, yes. You would need 'and data,#$FF' after the last 'rcl' instruction.
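A quick model shows why the mask is needed. This sketch mimics the 8-pass loop plus the trailing 'rcl' in plain Python; it assumes 'rcl data,#1' shifts left by one and brings the carry flag into bit 0, and the helper names, the starting value of data, and the stale carry are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def rcl1(value, carry):
    """Model 'rcl value,#1': shift left one bit, carry enters bit 0."""
    return ((value << 1) | carry) & MASK32


def read_byte(din_bits, data=0, carry=1):
    """Model the 8-pass loop and the final rcl; carry is the stale C flag on entry."""
    for bit in din_bits:            # rep @.end,#8
        data = rcl1(data, carry)    # rcl data,#1 (first pass shifts in the stale carry)
        carry = bit                 # sample din into C
    return rcl1(data, carry)        # rcl data,#1 saves the final din bit


bits = [1, 0, 1, 0, 0, 1, 0, 1]     # $A5, msb first
raw = read_byte(bits)
print(hex(raw), hex(raw & 0xFF))    # stale carry lands in bit 8 until masked
```

Nine shifts happen in total for eight data bits, so whatever was in the carry flag (and in data) beforehand ends up above bit 7 unless it is masked off or data and the carry are known to be clear on entry.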
http://www.denx.de/wiki/U-Boot/
The Raspberry Pi, my Cubieboard1, and my MR-3020 minirouter all are capable of using 'Das U-boot'. So I've been wondering why not the Propeller 2?
The numbers given for RAM and image sizes suggest this is nowhere near a 16k ROM footprint.