Ouch, I just had a look at it on the scope. The tx data is amazingly appearing 5 sysclocks behind (lagging) its SPI clock edge! Even more amazingly, it still puts the rx data within normal spec because the tx clocking is running 1.5 data bits (6 sysclocks) ahead of rx clocking.
EDIT: Another discovery is there is more leeway when at least the tx data pin is left unregistered, ie: changes to 4 sysclocks behind. SPI clock pin has to be registered to stay aligned with internal sysclock above 100 MHz. rx data pin seems to be fine either way.
Ah, just thought of a way to fix it real clean, it'll make sysclock/2 piece of cake ...
EDIT: Damn, didn't work. Oh well, just proved another fine detail with digital input selector.
EDIT2: Grr, I think it's got me beat. sysclock/2 isn't doable with smartpins. The tx smartpin is the limiting factor due to the turnaround of SPI clock back to tx data out. It really needs an internal SPI clock source but I can't find any way to select a neighbouring smartpin as input without it being a physical pin.
EDIT3: Okay, maybe a streamer could be dedicated as a clock generator. That might work by toggling all three OUTs and having the tx smartpin use OUT as its SPI clock input selector. Just need to work out how to make a streamer do that ...
EDIT4: Some good news, the smartpin's SPI data out doesn't seem to be feeding back into the input selector. So that's also a new detail, I'll need to update my block diagram to show this.
EDIT5: Yay, bit-bashing OUT as a clock is giving me data bits at tx smartpin's output pin!
EDIT6: Immediate LUT streamer mode is cool! I can still make little DMA'd patterns without engaging the FIFO. I think it's all go now. Last few steps ...
EDIT7: ugh, I'm crashing, it's gonna have to wait for some zzzz...
I've been bashing my head against a wall trying to get sync rx working, which sent me back to my "working" bit-bashing code to look at each byte, and it's not matching what I think I should see. Since I'm trying to get the concept of using smartpins for SPI, I'm trying to get my SD code working with smartpins, which has me wondering how that's working in the first place! I'm sure I'm doing something wrong and it will probably become obvious once I start taking a closer look.
I'm somewhat glad to hear what you've found once hooking things up to the scope. I had suspected it was something like this occurring, but couldn't figure out how to capture and verify with my LA. Timing diagrams are going to be ESSENTIAL to making these modes work.
For SD I'd be happy with just bit-bashing things but it seems like a simple use of the smartpin feature. The idea was to use a smartpin clock and data pin (half duplex for now) but I thought bit bashing the clock would be easier to start. I guess I opened up a can of worms here, hopefully it will be worth it later on when implementing full-duplex SPI.
The streamer looks really cool and I've thought about trying to use it for this. Baby steps. For now I'm going to see if I can get sync TX working at all.
I guess some lag is expected, but that all adds up..
Does that mean the 2H/2L clock is not really practical, if you want to allow a finite setup time on the spi slave ? Maybe SysCLK/5 or SysCLK/6 is the limit ?
Is there any difference in physical loop-back (jumper on board) and electrical loopback (Smart pin cell adjacent connect) ?
( Someone may use electrical (internal) loopback for POST )
I think the effects evanh is fighting are due to extra flops between pins and cogs. My understanding is the smartpins are physically closer than a cog's pin control, hence why they are perfect for handling the low-level pin toggling of SPI. It took a few false starts but at least it was less than 24 hours before I got smartpin RX working. I've posted a working version in my FSRW post but I need to refine the technique quite a bit. I think the trick is slowing the cog down enough, in the right places. I've got rather arbitrary waitx s in my code, and some trying to square the clock pulse for scopeability. It will be interesting to see where the timing issues creep up!
My understanding is the smartpins are physically closer than a cog's pin control, hence why they are perfect for handling the low-level pin toggling of SPI.
There's the custom pin layout around the outer perimeter of the die but that does not include the smartpins. The smartpins are in the core synthesised logic and end up wherever the synthesiser decides is best. I had hoped they would be placed near the physical pins but it became increasingly obvious with testing of each FPGA binary that that wasn't the case.
I realised after a while that because of the size and number of interlinking databuses involved that it was impractical to have them anywhere other than dead centre along with the hub control logic. And I think Chip has had to insert an extra staging flop between the custom ring and everything else for both IN and OUT signals.
Not to fear, for the ultimate speed freak, I have a solution almost done to go all the way to sysclock/2 ...
While I can run at about sysclock/10 using bit-bashing, this is entirely dependent upon the clock. If I want to run the SD at 25 MHz then I need a 250 MHz P2 clock. What would be interesting is being able to run up to 50 MHz SPI from a lower and more conservative P2 clock. @evanh - thanks for your work so far, I will even try out some of this code myself for the purpose of pushing SD card speeds at present.
The trick is in using the OUT of the tx pin redirected to the smartpin's B input as the SPI clock. Well, that's the first part, the second part is getting a sysclock/2 clock signal patterned onto that OUT without repurposing the tx smartpin. That's where the streamer comes in ...
Eliminates three sysclocks of lag in the tx smartpin!
I think I've got the basics solved and distilled down. I still really need to see what pin registering does. I'm not sure where the limit is here but I've hit a few brick walls and I'm sure I'll find more. I've done more than my fair share of chasing my tail on this but this works up to 280 MHz.
pri read | c, o, t ' Read eight bits from the card.
  c := clk
  o := do
  asm
        dirl    o
        wrpin   #sp_sync_rx, o    ' sync rx mode
        wxpin   #sp_sync_cfg, o   ' stop/start mode
        dirh    o                 ' enable smartpin
'       akpin   o                 ' doesn't reset pin?
        rep     #.end_rd, #8
        drvl    c
        waitx   #3                ' increased from 2 for higher MHz. 10 / 11 for square clock??
        drvh    c
.end_rd
        rdpin   t, o
        rev     t
        and     t, #$ff
  endasm
  return t
I'm excited to see what evanh comes up with.
By physically closer, I was referring to the extra set of flops (2 sets in the new chip iirc) between the cog and the pins. The smartpins don't suffer this penalty since there's no flops... unless you're clocking from the cog and not smartpin like I am. I'm not sure about that waitx in the loop, probably the card.. I believe 10 gives a fairly square clock. I also haven't confirmed the akpin is needed, my thought is it resets the smartpin? I'm not sure the pin is reset when activated with dirh, I think it is but need to test this still.
*edit
I've tested this a bit more and it seems the akpin doesn't reset the pin like I was hoping. I'm clearly not understanding this still.. While this works I don't understand it well enough for my liking. I can't figure out how to set the pin config ONCE in the init, and just reset the pin before use. I THOUGHT this was possible but I've tried a few different combinations and still no dice. It's probably something obvious...
*edit - GOT IT. It was obvious. Setting DIR high DOES reset the pin. I was hoping it was the akpin but alas. I've decided that I can either work on TX or CLK next and I think I'm feeling adventurous and going to go for clock.
pri read | c, o, t ' Read eight bits from the card.
  c := clk
  o := do
  asm
        dirh    o                 ' enable smartpin (DIR high also resets it)
        rep     #.end_rd, #8
        drvl    c
        waitx   #4                ' set for high MHz. 2 safe for 160ish
        drvh    c
.end_rd
        rdpin   t, o
        dirl    o                 ' disable smartpin
        rev     t
        and     t, #$ff
  endasm
  return t
'Smartpin loopback test of sync. Tx and Rx modes
'for P2-ES Eval board
con
dmadiv = 1 ' sysclocks per streamer cycle (sysclocks per SPI clock step)
dmalen = 16 ' streamer DMA cycles (number of SPI clock steps in a burst)
dat org
hubset #0
mov dira, #0
mov outa, #0
wrlut #0, #0 'SPI clock low
wrlut #%1100, #1 'SPI clock high on pin2 (tx) and pin3 (sck)
setxfrq ##($4000_0000/dmadiv)<<1 'set streamer data rate
'setup pin for SPI clock
wrpin ##(1<<16), sck 'registered pin, streamer will control this pin
dirh sck
'setup smartpin for sync tx
' this arrangement of streamer + unregistered gives a much reduced 1-sysclock data out delay
' which usefully acts as if the data change is on the next SPI clock edge
wrpin ##(%0000_0100<<24)+%01_11100_0, tx 'clock from streamer OUT, DIR forced on, unregistered pin
wxpin #$20 + 7,tx
dirh tx
'setup smartpin for sync rx
wrpin ##(%0001_0010<<24)+(1<<16)+%11101_0, rx 'clock from rx + 2, data from rx + 1, registered pin
wxpin #$0 + 7,rx
dirh rx
'send bytes on tx pin and receive bytes on rx pin
loc ptra,#@msg
.loop rdbyte pb,ptra++ wz
if_nz call #txrx
if_nz jmp #.loop
done jmp #$
txrx
rev pb 'big-endian for conventional SPI ... and human sanity
shr pb,#24
xinit stmrmode, stmrpat 'send 8 SPI clocks, can be started prior to loading tx
wypin pb, tx 'load shifter and place first bit on SMART_OUT
'waitx #500 'pause for bits to shift
waitx #20 'pause for bits to shift
rdpin pa,rx
shr pa,#24 'received byte
cmp pa,pb wz 'both leds lit if match
drvl #56 'alternate leds lit if mismatch
drvnz #57
waitx ##25_000_000
drvh #56
drvz #57
waitx ##25_000_000
ret wcz
rx long 1 'smartpin locations
tx long 2
sck long 3
'XINIT D/#[15:0] transfer count (Number of NCO rollovers that the command will be active for)
'XINIT D/#[31:16] (x = don’t care)
'dddd = %0000 no streamer DAC output
'eppp = %1000 enable output on pins 31..0
' mode dacs pins base S/# description
' ---- ---- ---- ---- ------ ---------------------------
'%0011_dddd_eppp_xxxx %0 32-bit RFLONG
'%1100_dddd_eppp_xxxx <long> 32-bit immediate
'%1000_dddd_eppp_bbbb <long> 1-bit immediate LUT
stmrmode long $8080_0000+dmalen 'DMA cycles (1-bit immed LUT), pins 0-31
stmrpat long $5555_5555 'string of 1-bit LUT addresses for SPI clocking
orgh $400
msg byte %1011_0010,"Smartpins",0
Scope snapshot of first byte transmitted with above code:
EDIT: Clarify comments on tx smartpin
EDIT2: Added big-endian tx ordering and first ID byte to string for scope verification
EDIT3: Added scope screenshot
EDIT4: Relabel smartpin output as SMART_OUT
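For anyone following along without a scope, here is a rough Python model of the clock burst the streamer setup above should produce. It is only a sketch: it assumes the immediate pattern is consumed LSB first, one bit per NCO rollover, and that an XFRQ value of $8000_0000 means one rollover per sysclock. The function names are made up for illustration, they are not part of any P2 tool.

```python
# Rough model of the 1-bit immediate-LUT streamer burst used in the demo.
# ASSUMPTIONS: pattern bits are taken LSB first, one per NCO rollover,
# and XFRQ = $8000_0000 gives one rollover per sysclock.

LUT = {0: 0b0000, 1: 0b1100}   # the two wrlut lines: entry 1 raises pin2 (tx OUT) and pin3 (sck)
STMRPAT = 0x5555_5555          # stmrpat from the demo
DMALEN = 16                    # dmalen from the demo
SCK_PIN = 3


def xfrq(dmadiv):
    """Value passed to setxfrq: ($4000_0000 / dmadiv) << 1."""
    return (0x4000_0000 // dmadiv) << 1


def sck_levels(pattern=STMRPAT, steps=DMALEN):
    """Level of the sck OUT bit for each streamer step."""
    levels = []
    for i in range(steps):
        addr = (pattern >> i) & 1            # 1-bit LUT address for this step
        levels.append((LUT[addr] >> SCK_PIN) & 1)
    return levels


def rising_edges(levels, idle=0):
    """Count low-to-high changes, starting from the idle level."""
    count, prev = 0, idle
    for level in levels:
        if level and not prev:
            count += 1
        prev = level
    return count


if __name__ == "__main__":
    print(hex(xfrq(1)))
    print(sck_levels())
    print(rising_edges(sck_levels()))
```

Under those assumptions, dmadiv = 1 and dmalen = 16 give sixteen one-sysclock steps, i.e. eight SPI clock pulses at sysclock/2 with the first step already high, which is enough to shift one byte.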
@evanh - That's some very nice code there. I'm understanding quite a bit, considering I'm not near the genius you are! I think I understand the smartpin section but the streamer is WAY above my head right now. I'm just happy I cracked using 2 smartpins at the same time. Pretty close on setting the clock period; I need to think it through a little more though. Here's my working concept, pretty sure it runs at sysclock/2 although I may have the math wrong on that part still.
spi_bitcycles :=((clkfreq / spi_clk_max) /2 )+1
pri read | c, o, t, bc ' Read eight bits from the card.
c := clk
o := do
bc := spi_bitcycles
asm
dirl c
wrpin #sp_clk, c
wxpin bc, c 'set base period
dirh c
dirh o 'enable smartpin in
wypin #byteclks, c 'start clock
.busy testp c wc
if_nc jmp #.busy
rdpin t, o
dirl o 'disable smartpin
wrpin #0, c
rev t
and t, #$ff
endasm
return t
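To check the math on the clock period, here is a small Python sketch of the spi_bitcycles formula. It assumes the X value written to the clock smartpin is the number of sysclocks per transition (as the "set base period (*2)" comment elsewhere in the thread suggests), so one full SPI clock takes 2 * X sysclocks. The helper names are hypothetical.

```python
# Sketch of the divider arithmetic, assuming X = sysclocks per transition
# in the transition-output mode, so one SPI clock period = 2 * X sysclocks.

def spi_bitcycles(clkfreq, spi_clk_max):
    """Same integer math as: ((clkfreq / spi_clk_max) / 2) + 1"""
    return (clkfreq // spi_clk_max) // 2 + 1


def spi_clock_hz(clkfreq, bitcycles):
    """Resulting SPI clock frequency for a given X value."""
    return clkfreq / (2 * bitcycles)


if __name__ == "__main__":
    for clkfreq in (40_000_000, 160_000_000, 250_000_000):
        x = spi_bitcycles(clkfreq, 25_000_000)
        print(clkfreq, x, spi_clock_hz(clkfreq, x))
```

If that assumption holds, the formula only yields X = 1 (sysclock/2) when clkfreq is below twice spi_clk_max. At 250 MHz with a 25 MHz limit it gives X = 6, which is sysclock/12 or about 20.8 MHz, so the +1 keeps the result under the limit rather than at it.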
By physically closer, I was referring to the extra set of flops (2 sets in the new chip iirc) between the cog and the pins. The smartpins don't suffer this penalty since there's no flops... unless you're clocking from the cog and not smartpin like I am.
Smartpins have the same number of clocked stages out to the pins that the cogs do. They are just as far away from the real world.
The trick I'm using in the latest example above, to make all those stages disappear, is supplying the SPI clock to the smartpin internally. Which means:
- The tx smartpin sees, and acts on, the internal SPI clock on the next sysclock
- The tx smartpin data can then travel down those same stages as the SPI clock is also doing
Which all means SPI tx data and clock arrive only one sysclock apart at the outside of the prop2. EDIT: Err, they're actually two sysclocks apart internally but the SPI clock pin is registered, taking an extra sysclock to emerge, while the tx pin is not.
... but the streamer is WAY above my head right now.
Yeah, lots to learn there, and it's spread out which doesn't help. I should try to document it better. Do some more of that config labelling at least.
I didn't realize the smartpins were behind those flops, just like the cogs are. Something to think about and I'm sure your example is going to come in handy when I need that full-duplex SPI (can't remember what device I wanted to use that required it though). Half-Duplex seems to be pretty easy, hoping now that I understand a little better the TX portion shouldn't drive me so mad!
The problem I'm having is understanding the docs can be a bit... We are going to need LOTS of timing diagrams and block diagrams!! Words are great but pictures are going to be what helps the engineer get from point A to point B. I realize it's still pretty early for that.
My above example works up to sysclock/2, although the SD card brick walls around 25 MHz I think. I'll be really happy once all this stuff is in its own cog and I can just forget about it again!
Just a little prod ... I don't want to be the kill-joy but if repeated resetting of the smartpin is required to make things work, then you've probably got a bug still to iron out. You should be able to leave the smartpins configured, sitting idle between bursts, and then be able continue without any reset/clearing of old state when there is more data to tx/rx.
I may have a bug to iron out still but if you are referring to the clock pin config at the beginning and end that's because I'm using the pin as a regular bit-bashed clock elsewhere for now. This is how I have the free-running clock setup, again disabling the smartpin so I can bash it with the cog.
asm
wrpin #sp_srx, o 'sync rx
wxpin #sp_srx_c, o 'stop/start mode
dirl c
wrpin #sp_clk, c
wxpin bt, c 'set base period (*2) ''2.5 400kmax init
dirh c
wypin #initclks, c 'start clock, tx data n-1?
.busy testp c wc
if_nc jmp #.busy
drvh csn
wypin #initclks, c 'start clock, tx data n-1?
rdpin pa,c
.busy2 testp c wc
if_nc jmp #.busy2
wrpin #0, c
endasm
I've probably looked at a hundred scope captures in the last day. When I saw the 5-sysclock lag on a 4-sysclock period, with both send and receive still working, my eyes popped out. Sometimes diagrams aren't enough when trying to max it out.
But yeah, certainly a much bigger documenting job coming up than the prop1 effort. Early adopters won't have everything at the start.
I just noticed a special config bit, X[5], for the synchronous receive smartpin for changing the timing of the data shift in. Setting it breaks my above example. I like the nicely aligned clock/data timings I've got now, so at this stage I'm not inclined to work out what use X[5] config can be.
Ah, I think I see now. No, it was actually the line
dirl o 'disable smartpin
You'd mentioned having to reset things in an earlier post and I hadn't made a great effort to read all you've written.
Looking back a little further in your code I see you've enabled the smartpin for only the duration of use. So not resetting out of necessity per se. Apologies for the earlier accusation.
No worries, I can understand if I'm a bit confusing in my explanations. I believe mostly it's because I'm still confused. It could be the legal cannabis though tbf. At least it keeps me from getting too high-strung while fighting the code monkey life.
RE config bit[5], that's basically pre/post clock sampling. In my example I'm using an inverted clock, inverting the clock input to the smartpin and sampling post-clock. I'm pretty sure it would make the loopback worse as it adds .5 bit times.
The docs don't say 0.5 bit times. Chip has been quite careful to precisely say what each thing is. Quite a lot can be understood by being very careful with reading his docs - I've tripped up many times. In this case though he hasn't been specific.
My guess is it is more likely to mean one sysclock prior to detection of the B-input rise, which is itself after the fact. This one definitely needs a timing diagram! More for understanding the implications than for function description.
My previous example is buggy, as in it works but it's not correct. I was cheating by using 15 transitions instead of the full 16. I'm still cheating, but it's better(?) because at least I'm using 3 smartpins, set up at init! What I need to do is figure out how to invert the smartpin clock so it defaults to high. My solution right now is to invert it on init with 15 transitions, but if the pin is disabled and re-enabled it will default back to low. There's a setting I'm missing. I thought it was the CIO in pppppp… but alas those don't describe what I need?
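The odd-transition-count trick described here comes down to parity. A minimal Python sketch, assuming each transition simply toggles the clock output and the pin starts low after being enabled (the function name is hypothetical):

```python
# Parity model of the transition-output clock pin.
# ASSUMPTION: every transition toggles the output; the pin starts at start_level.

def level_after(transitions, start_level=0):
    """Idle level of the clock pin once the burst has finished."""
    return start_level ^ (transitions & 1)


if __name__ == "__main__":
    print(level_after(15))           # odd count: clock is left high
    print(level_after(16))           # even count: clock returns to low
    print(level_after(9 * 16 - 1))   # the initclks value used in this thread
```

So one odd-length burst at init leaves the clock idling high, and every even-length burst after that (16 transitions per byte) preserves it, until the pin is disabled and re-enabled.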
Here's 3 pins, no bit-bashing.
pri send(outv) | c, i ' Send eight bits, then raise di.
i := di
c := clk
asm
SHL outv, #32-8 ' Set MSB First
REV outv ' Set MSB First
wypin outv, i ' data
rdpin pa, c
wypin #byteclks, c 'start clock, tx data
.busy testp c wc
if_nc jmp #.busy
wypin #$FF, i
endasm
pri read | c, o, t ' Read eight bits from the card.
c := clk
o := do
asm
rdpin pa, c
wypin #byteclks, c ' start clock
.busy testp c wc
if_nc jmp #.busy
rdpin t, o ' get data
rev t
and t, #$ff
endasm
return t
and the "overhead"
spi_clk_max = 25_000_000 ' 20mhz safe, 25 seems to work though?
'smartpin modes
sp_stx = %0000_0010_000_0000000000000_01_11100_0 '' +2
sp_srx = %0000_1011_000_0000000000000_00_11101_0 '' +3, inv
sp_srx_c = %0_00111
sp_clk = %0000_0000_000_0000000000000_01_00101_0
initclks = (9*(2*8)) -1 ' invert pin here!
byteclks = (2*8)
long spi_bitcycles
…
' and init stuffs
spi_bitcycles :=((clkfreq / spi_clk_max) /2 )+1
bt := spi_bitcycles
asm
drvh csn
dirl c
wrpin #sp_clk, c
wxpin bt, c 'set base period (*2) ''2.5 400kmax init
dirh c
dirl i
wrpin #sp_stx, i 'sync tx
wxpin #%1_00111, i 'stop/start mode
dirh i
wypin #$FF, i
dirl o
wrpin #sp_srx, o 'sync rx
wxpin #sp_srx_c, o 'stop/start mode
dirh o
wypin #initclks, c 'start clock, tx data n-1?
.busy testp c wc
if_nc jmp #.busy
wypin #byteclks, c 'start clock, tx data n-1?
rdpin pa, c
.busy2 testp c wc
if_nc jmp #.busy2
endasm
I fought it for a while and when I knew I was getting close I flipped the pre-post bit (the infamous bit5) and PRESTO it works. I need to take a break and put this in its own cog. Should have a pretty legit SD driver soon.
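The rev-based byte ordering used by the send and read routines in this thread can be modelled in a few lines of Python. This is a sketch under two assumptions: the tx smartpin shifts its word out LSB first, and the rx smartpin shifts bits in from the top so that RDPIN returns the byte MSB-justified. The helper names are invented for the example.

```python
# Bit-order model for the smartpin SPI byte transfers.
# ASSUMPTIONS: tx shifts out LSB first; rx shifts in from bit 31 downwards,
# so after 8 clocks the first-received bit sits at bit 24.

MASK32 = 0xFFFF_FFFF


def rev32(x):
    """PASM2 REV: reverse all 32 bits."""
    return int(format(x & MASK32, "032b")[::-1], 2)


def tx_word_a(byte):
    """rev pb / shr pb,#24"""
    return rev32(byte) >> 24


def tx_word_b(byte):
    """shl outv,#32-8 / rev outv"""
    return rev32((byte << 24) & MASK32)


def wire_bits(word, nbits=8):
    """Bits as they appear on the wire, LSB of the tx word first."""
    return [(word >> i) & 1 for i in range(nbits)]


def rx_shift(bits):
    """Receive shifter: each new bit enters at bit 31."""
    reg = 0
    for b in bits:
        reg = (reg >> 1) | (b << 31)
    return reg


def rx_byte(reg):
    """rev t / and t,#$ff"""
    return rev32(reg) & 0xFF


if __name__ == "__main__":
    word = tx_word_a(0xB2)
    print(hex(word), wire_bits(word), hex(rx_byte(rx_shift(wire_bits(word)))))
```

Both tx preparations give the same bit-reversed byte, so the original MSB goes onto the wire first, and the rev / and #$ff on the receive side puts it back in normal order.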
What I need to do is figure out how to invert the smartpin clock so it defaults to high. My solution right now is to invert it on init with 15 transitions but if the pin is disabled and reenabled it will default back to low. There's a setting i'm missing, I thought it was the CIO in pppppp… but alas those don't describe what I need?
Yes, the "O" bit inverts the output drive. This should do it:
sp_clk = %0000_0000_000_0000001000000_01_00101_0
One detail that isn't completely clear with those pulse and step (transition) smartpin modes: They always count off the period duration first, then the state change comes last.
That's another small advantage gained when I used the streamer, I was able to build the SPI clock pattern to start with a state change on the first sysclock, then have the period duration follow.
Screen shots are gold.
Do you have one of this sending multiple bytes ?
eg Common would be a SPI Opcode + address (which could be packed into 32b), then sent with 40(39?) clocks to cover [OP+Adr+Pause], & then a burst of data being read from flash/SRAM.
I'm guessing this streamer-generates-clock cannot be used with a streamer-for-data ?
What speed could P2 manage for QuadSPI, using streamer for data ?
It was only a one-byte-at-a-time loopback echo demo. I made no attempt to extend that functionality.
There are two configurable modes for how a tx smartpin can be loaded, which I think affects how the first data bit appears. I haven't tried to work out why.
Well, could have both clock and data formatted together into hubram before sending them with a streamer. But the processing overhead would defeat the purpose.
What I need to do is figure out how to invert the smartpin clock so it defaults to high. My solution right now is to invert it on init with 15 transitions but if the pin is disabled and reenabled it will default back to low. There's a setting i'm missing, I thought it was the CIO in pppppp… but alas those don't describe what I need?
Yes, the "O" bit inverts the output drive. This should do it:
sp_clk = %0000_0000_000_0000001000000_01_00101_0
Well that DOES invert the clock like I want but it totally breaks the rest of the timing! I'll need to investigate further since in my head, once I inverted the clock smartpin and removed the -1 from the initclocks I was done. Not the case so it seems. Got a friend in town for the next couple days so probably won't get much time to investigate.
From a quick look it seems the TX smartpin is running ahead. I had it "working" for a second until I changed the spi max frequency down to 2mhz which suggests the setup time was being violated. I had to invert the Bpin on both the TX and RX pins to make it work. I'm sure the answer is staring me in the face. I'll have to get back to it later though.
Understood. Thank you for understanding.
I will look at the code and work with it.
EDIT8: Updated the block diagram - https://forums.parallax.com/discussion/comment/1473762/#Comment_1473762
I guess you could use two CLK pins, in situations where speed mattered more than pin count