PNut/Spin2 Latest Version (v46 - DEBUG gating, clock-setter control, VAR flexibility, C_Z for DEBUG)

Cluso99 · 2020-05-20 06:35

Chip,
I am also really looking for guidance regarding loading binaries, etc from SD.
Can you take a look here and answer some questions please?
forums.parallax.com/discussion/171599/discussion-and-questions-about-a-p2-operating-system

My P2 OS is running. I now need to work out how to load files while keeping the OS alive and not overwriting it.

Rayman · 2020-05-20 15:37

There's no mention of inline assembly in the Spin2 docs...
I know it starts with "org" but don't remember how it ends...
Guess I'll have to dig through the forum...

Found it.. Looks like "end" is what I need.

Rayman · 2020-05-20 15:52

Unfortunately, PNut Spin2 does't have anything like the Waitx() that FastSpin does.

But, was able to do this, which FastSpin seems to let me do and PNut seems OK with too:

PUB waitx(x)
    org
        waitx   x
    end

Rayman · 2020-05-20 16:29

Oops, spoke too soon...

PNut says "Expected a unique method name" on my waitx() method.

So, PNut apparently doesn't implement waitx() and also won't let you implement it yourself...

But, changing from "waitx" to "waitnx" seems to work for both.

Cluso99 · 2020-05-20 18:03

There is a waitms(n)

Rayman · 2020-05-20 18:34

deleted...

JonnyMac · 2020-05-20 19:43

In P2ASM, waitx is a keyword, hence it cannot be used as a method name in Spin2 (PNut).

Rayman · 2020-05-20 20:33

I seem to have a problem with using REP inside inline assembly...
Is this supposed to work?

Rayman · 2020-05-20 23:24

The stack must start immediately after code because I found the hard way that writing a few hundred bytes past the end sends it off the rails...

Rayman · 2020-05-23 19:30

There's a slight glitch with assembly when using bytes...
Compiler complained and made me put in "alignl" when using this byte jump table:

long
jumps                   byte    0                       '0  'Can't use this one!
                        byte    Start_                  '1
                        byte    WriteRam_               '2
                        byte    ReadRam_                '3
                        byte    HyperVideo_             '4
                        byte    Dummy_                  '5
                        byte    Stop_                   '6
                        byte    ConfigureVideo_         '7



DAT 'Start_ 
'alignl 'Need this or PNut complains about alignment due to bytes above...

The problem is that when the # of bytes happens to equal 8 (or maybe any multiple of 4) it gives an error on the "alignl".
Had to comment it out.
Doesn't seem like I should need to do that...

Cluso99 · 2020-05-24 09:13

Chip,
I just tried this code

              mov       outa,            #0
              mov       dira,            #1
              mov       outa,            #1
              waitx     #6
              test      ina,             t1mask     wc

Using the TEST INA instruction requires waitx #6 or 8 clocks to see the output.

              mov       outa,            #0
              mov       dira,            #1
              mov       outa,            #1
              waitx     #5
              testp     #0                          wc

Using the TESTP instruction requires waitx #5 or 7 clocks to see the output.

From the documents v33 (Rev B silicon) (dated 2019_09_13). Is this the latest?

I/O PIN TIMING

I/O pins are controlled by cogs via the following cog registers:

DIRA - output enable bits for P0..P31 (active high)
DIRB - output enable bits for P32..P63 (active high)
OUTA - output state bits for P0..P31 (corresponding DIRA bit must be high to enable output)
OUTB - output state bits for P32..P63 (corresponding DIRB bit must be high to enable output)

I/O pins are read by cogs via the following cog registers:

INA - input state bits for P0..P31
INB - input state bits for P32..P63

Aside from general-purpose instructions which may operate on DIRA/DIRB/OUTA/OUTB, there are special pin instructions which operate on singular bits within these registers:

DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRRND/DIRNOT {#}D - affect pin D bit in DIRx
OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTRND/OUTNOT {#}D - affect pin D bit in OUTx
FLTL/FLTH/FLTC/FLTNC/FLTZ/FLTNZ/FLTRND/FLTNOT {#}D - affect pin D bit in OUTx, clear bit in DIRx
DRVL/DRVH/DRVC/DRVNC/DRVZ/DRVNZ/DRVRND/DRVNOT {#}D - affect pin D bit in OUTx, set bit in DIRx

As well, aside from general-purpose instructions which may read INA/INB, there are special pin instructions which can read singular bits within these registers:

TESTP {#}D WC/WZ/ANDC/ANDZ/ORC/ORZ/XORC/XORZ -read pin D bit in INx and affect C or Z
TESTPN {#}D WC/WZ/ANDC/ANDZ/ORC/ORZ/XORC/XORZ -read pin D bit in !INx and affect C or Z

When a DIRx/OUTx bit is changed by any instruction, it takes THREE additional clocks after the instruction before the pin starts transitioning to the new state. Here this delay is demonstrated using DRVH:

____0 ____1 ____2 ____3 ____4 ____5
Clock: / \____/ \____/ \____/ \____/ \____/ \____/
DIRA: | | DIRA-->| REG-->| REG-->| REG-->| P0 DRIV |
OUTA: | | OUTA-->| REG-->| REG-->| REG-->| P0 HIGH |
| |
Instruction: | DRVH #0 |

When an INx register is read by an instruction, it will reflect the state of the pins registered THREE clocks before the start of the instruction. Here this delay is demonstrated using TESTB:

____0 ____1 ____2 ____3 ____4 ____5
Clock: / \____/ \____/ \____/ \____/ \____/ \____/
INA: | P0 IN-->| REG-->| REG-->| REG-->| ALU-->| C/Z-->|
| |
Instruction: | TESTB INA,#0 |

When a TESTP/TESTPN instruction is used to read a pin, the value read will reflect the state of the pin registered TWO clocks before the start of the instruction. So, TESTP/TESTPN get fresher INx data than is available via the INx registers:

____0 ____1 ____2 ____3 ____4
Clock: / \____/ \____/ \____/ \____/ \____/
INA: | P0 IN-->| REG-->| REG-->| REG-->| C/Z-->|
| |
Instruction: | TESTP #0 |

While it's likely in the silicon the output will not be seen on the same clock edge, accounting for one clock. But that does not account for at least another an additional clock.

Can you advise what the real delays should be please

evanh · 2020-05-24 09:37

Cluso,
It gets worse as the sysclock frequency is raised - up to +4 round trip. The effects can be noted below even 80 MHz sysclock. Temperature and other reactance factors affects the usable frequency bands. And registered pins adds another +2 round trip as well.

I'm not sure that stating any input vs output components matters without a huge amount more detail of the exact relationships at every stage of hardware sources, recipients and associated instructions.

Just the undefined nature of what constitutes the output timing, at the ALU, from an instruction is enough to make a specified number useless.

Cluso99 · 2020-05-24 09:48

evanh wrote: »

Cluso,
It gets worse as the sysclock frequency is raised - up to +4 round trip. The effects can be noted below even 80 MHz sysclock. Temperature and other reactance factors affects the usable frequency bands. And registered pins adds another +2 round trip as well.

I'm not sure that stating any input vs output components matters without a huge amount more detail of the exact relationships at every stage of hardware sources, recipients and associated instructions.

Just the undefined nature of what constitutes the output timing, at the ALU, from an instruction is enough to make a specified number useless.

It’s a mandatory requirement so that bit-bashing can be done reliably!
Since it’s clocked it should be possible to define with certainty. The only issue would be when an output arrives at the same clock clocking the input, but a clock either side should be specific.

The whole reason for clocking is to define what happens irrespective of frequency and internal delays.

evanh · 2020-05-24 11:11

Cluso99 wrote: »

It’s a mandatory requirement so that bit-bashing can be done reliably!

Those numbers don't help one bit for doing the I/O timing without also knowing their internal relationships.

The straight forward answer is simply measure and fine tune as we've already been doing.

The whole reason for clocking is to define what happens irrespective of frequency and internal delays.

I was hammer on along those lines at the beginning too. Hasn't panned out that way at all. It works up to a certain frequency but above that the usable bands kick in and things get fuzzy.

cgracey · 2020-05-24 11:59

Cluso, as Evanh stated, there is no hard rule at high frequencies because we are suffering analog delays through the 3.3V I/O pins.

Cluso99 · 2020-05-24 13:39

cgracey wrote: »

Cluso, as Evanh stated, there is no hard rule at high frequencies because we are suffering analog delays through the 3.3V I/O pins.

Chip,
That’s not a reasonable answer. You have to realise that the P2 does not have traditional I/O blocks like other micros. If you cannot provide specifics then the P2 will not make the cut with professional engineers.
Even if it’s a table depending on the frequency, it’s a mandatory requirement. Otherwise, how do expect anyone to use the I/O to build those blocks that are missing on the P2.
While the smart pins can do some things, they cannot do everything.

Every other micro these days has an abundant supply of silicon peripherals, and of course the massive manuals that go with them.

My testing was done at 200MHz with nothing attached to the pins. My results do not match the stated characteristics in your document.

BTW I’ve been speeding up my SD driver and this is what I’ve found.

Seairth · 2020-05-24 17:26

Is the latency consistent with documentation at 180 MHz? If not, I agree that should at least be corrected. As for overclocked values, I see three ways forward:

* Do nothing (each engineer needs to figure out their own timing tolerances)
* Figure out frequency bands for each additional clock of latency, with each band shifted downward to provide a margin of error.
* Figure out the worst case for the highest reasonable overclocked value and state that as the overclocked latency.

In the case of something like a your SD driver, it seems the most likely approach is to go with option 3 if you expect people to use it in overclocked scenarios (which seems highly likely with this chip).

cgracey · 2020-05-24 17:30

Cluso99 wrote: »

cgracey wrote: »

Cluso, as Evanh stated, there is no hard rule at high frequencies because we are suffering analog delays through the 3.3V I/O pins.

Chip,
That’s not a reasonable answer. You have to realise that the P2 does not have traditional I/O blocks like other micros. If you cannot provide specifics then the P2 will not make the cut with professional engineers.
Even if it’s a table depending on the frequency, it’s a mandatory requirement. Otherwise, how do expect anyone to use the I/O to build those blocks that are missing on the P2.
While the smart pins can do some things, they cannot do everything.

Every other micro these days has an abundant supply of silicon peripherals, and of course the massive manuals that go with them.

My testing was done at 200MHz with nothing attached to the pins. My results do not match the stated characteristics in your document.

BTW I’ve been speeding up my SD driver and this is what I’ve found.

There are three dimensions to this timing problem: process, voltage, and temperature. If turn-around time is a problem at high frequencies, you will need to have some kind of automatic calibration on a continuous basis. I don't see any other way around it. Do you?

Dave Hein · 2020-05-24 19:42

Cluso99, it seems like what you need are the specs for setup, hold, and delay times on the I/O pins, plus the pipeline delays on the inputs and outputs. Of course, the time specs would be relative to the internal clock, so I don't know how useful that is since we don't have direct access to the internal clock. At 200 MHz the clock period is only 5 nano-seconds, so the times are really quite small. Any logic that you have on the pins, such as an SD card, will probably shift the signals over to the next clock period, or even multiple clock periods.

Cluso99 · 2020-05-24 22:25

Seairth wrote: »

Is the latency consistent with documentation at 180 MHz? If not, I agree that should at least be corrected. As for overclocked values, I see three ways forward:

* Do nothing (each engineer needs to figure out their own timing tolerances)
* Figure out frequency bands for each additional clock of latency, with each band shifted downward to provide a margin of error.
* Figure out the worst case for the highest reasonable overclocked value and state that as the overclocked latency.

In the case of something like a your SD driver, it seems the most likely approach is to go with option 3 if you expect people to use it in overclocked scenarios (which seems highly likely with this chip).

cgracey wrote: »

Cluso99 wrote: »

cgracey wrote: »

Cluso, as Evanh stated, there is no hard rule at high frequencies because we are suffering analog delays through the 3.3V I/O pins.

Chip,
That’s not a reasonable answer. You have to realise that the P2 does not have traditional I/O blocks like other micros. If you cannot provide specifics then the P2 will not make the cut with professional engineers.
Even if it’s a table depending on the frequency, it’s a mandatory requirement. Otherwise, how do expect anyone to use the I/O to build those blocks that are missing on the P2.
While the smart pins can do some things, they cannot do everything.

Every other micro these days has an abundant supply of silicon peripherals, and of course the massive manuals that go with them.

My testing was done at 200MHz with nothing attached to the pins.

BTW I’ve been speeding up my SD driver and this is what I’ve found.

There are three dimensions to this timing problem: process, voltage, and temperature. If turn-around time is a problem at high frequencies, you will need to have some kind of automatic calibration on a continuous basis. I don't see any other way around it. Do you?

There needs to be a specific set of timings, even if this is done at a particular speed. We have no idea about the internals of the silicon.

I thought, maybe wrongly, that clock-gating was the solution to ensuring that the time delay from setting an output instruction to it appearing at the output was a fixed number of clocks after the instruction, and that the time delay from a pin being latched to being received by the instruction was also a fixed number of clocks. So I thought that these could (and were) specified. I thought that conditions external to the chip such as loading would not effect these internal fixed delays, so it would only be these external delays which could affect the rise and fall times of the signal at the pin. And these fixed delays would be constant over a wide clock range, whatever that range is.
Sure there may be some point in overclocking where these delays between the clock stages become marginal (ie subject to silicon process variation) and then fail. But surely these can be reasonably characterised as to where the limit may be?
Is OnSemi's software able to query these paths? I thought that was the reason an additional clock was added to avoid critical path problems.

What I am finding is that the current document is wrong. It needs to be corrected, and notes added giving details of what to expect. I realise we are in the early days but this will need to be precisely spelt out in the documents.

To put this bluntly, the P2 cannot be taken seriously without this basic information for designers. To tell an engineer (potential source of volume sales) that they will have to work it out for themselves will immediately lose any credibility that they may have to use the P2.

You would be surprised at the reasons chips get "dumped" by engineers. It's hard enough to get engineers to consider the P1 or P2 for a design in the first place, let alone give them a simple reason to give it a miss.

The I/O pins are a fundamental part of the P2 design, particularly in light of the fact there are no peripheral blocks in silicon.

As I said, my test results do not match the stated characteristics in your document.

I will redo my test at different clock speeds, but don't expect a design engineer to do this. If he doesn't have a base spec to work with it will be game over.

My SD Driver
FWIW I can read, and write SPI from/to the SD card at 8 clocks (4 instructions) per bit for a sustained average of 9 clocks over the entire 512 byte sector + 2 byte CRC16. It is running in cogexec.

In between the two CLK=0 and CLK=1, I need to insert the sample and accumulate instructions. But I need to locate the test/testb/testp in the correct window. I need to provide characterisation information with this SD Driver to the user.

Cluso99 · 2020-05-24 22:40

FYI here is the SD read code, and a timing diagram based on the current document info

BTW The bit numbering is incorrect: bit7 is read first, down to bit0 last

'+-------[ SD Card read SECTOR ]-----------------------------------------------+
_readFastSECTOR  call    #_enable_pins                   ' outputs:/CS=1 CLK=0 DI=1      'debug-time >>>>>>
'+-----------------------------------------------------------------------------+
'+ Read Block/Sector:  (512 bytes)                                             +
'+      CMD17, PAR=blocknr, CRC=$xx, reply=R1($??) +n*$FF +($FE+block+CRC16)   +
'+-----------------------------------------------------------------------------+
                mov     sdx_cmdout,      #CMD17          ' CMD17:
                mov     sdx_cmdpar,      sdx_sector      ' PAR=sector#
                mov     PTRB,            sdx_bufad       ' set hub buffer addr
                getct   sdx_timeout                      '\ set sdx_timeout for cmd17
                addct1  sdx_timeout,     ##delay1s       '/                           
'+-----------------------------------------------------------------------------+
                getct   sdx_dbg                                                          'debug-time
                wrlong  sdx_dbg,         ##sdx_dbg_1                                      'debug-time
                call    #_cmdRZtoken                     ' /CS=0, send cmd, recv R1, /CS=0(ena)
        if_nz   jmp     #_fail                           '<>$00=error="NZ"=failed                       
'+-----------------------------------------------------------------------------+
                call    #_getreply                       ' n*$FF+$FE
                cmp     sdx_reply,       #$FE        wz  ' $FE=valid Data Token
        if_nz   jmp     #_fail                           ' <>$FE=error="NZ"=failed                       
'+-----------------------------------------------------------------------------+
                getct   sdx_dbg                                                          'debug-time
                wrlong  sdx_dbg,         ##sdx_dbg_2                                      'debug-time
'+-----------------------------------------------------------------------------+
'read 512 bytes: wkg but we are sampling 3 clocks after CLK=H and 1 clock before CLK=L   48714 clocks (was 194742)
                rep     @.rep,            ##512          ' read 512 bytes
'               outl    #sdx_ck                          ' CLK=0  (already 0 first time)
.s0             outh    #sdx_ck                          ' CLK=1
                mov     sdx_reply,        #0             ' clear reply
                outl    #sdx_ck                          ' CLK=0
                nop
.s1             outh    #sdx_ck                          ' CLK=1
                testp   #sdx_do                      wc  ' read input bit0:  sample on/after prior CLK rising edge
                outl    #sdx_ck                          ' CLK=0
                rcl     sdx_reply,        #1             ' accum DO input bit 0
.s2             outh    #sdx_ck                          ' CLK=1
                testp   #sdx_do                      wc  ' read input bit1:  sample on/after prior CLK rising edge
                outl    #sdx_ck                          ' CLK=0
                rcl     sdx_reply,        #1             ' accum DO input bit 1
.s3             outh    #sdx_ck                          ' CLK=1
                testp   #sdx_do                      wc  ' read input bit2:  sample on/after prior CLK rising edge
                outl    #sdx_ck                          ' CLK=0
                rcl     sdx_reply,        #1             ' accum DO input bit 2
.s4             outh    #sdx_ck                          ' CLK=1
                testp   #sdx_do                      wc  ' read input bit3:  sample on/after prior CLK rising edge
                outl    #sdx_ck                          ' CLK=0
                rcl     sdx_reply,        #1             ' accum DO input bit 3
.s5             outh    #sdx_ck                          ' CLK=1
                testp   #sdx_do                      wc  ' read input bit4:  sample on/after prior CLK rising edge
                outl    #sdx_ck                          ' CLK=0
                rcl     sdx_reply,        #1             ' accum DO input bit 4
.s6             outh    #sdx_ck                          ' CLK=1
                testp   #sdx_do                      wc  ' read input bit5:  sample on/after prior CLK rising edge
                outl    #sdx_ck                          ' CLK=0
                rcl     sdx_reply,        #1             ' accum DO input bit 5
.s7             outh    #sdx_ck                          ' CLK=1
                testp   #sdx_do                      wc  ' read input bit6:  sample on/after prior CLK rising edge
                outl    #sdx_ck                          ' CLK=0
                rcl     sdx_reply,        #1             ' accum DO input bit 6
'               nop                                     ' delay not reqd
                testp   #sdx_do                      wc  ' read input bit7:  sample on/after prior CLK rising edge
                rcl     sdx_reply,        #1             ' accum DO input bit 7
                wrbyte  sdx_reply,        PTRB++         ' save byte
.rep
'+-----------------------------------------------------------------------------+
' since we are not checking crc(2 bytes), no need to sample, so can go 2x speed (50MHz @ 200MHz clock)
' in cog we can do rep just as fast as inline hubexec
                rep     @.repcrc,         #2*8
                outh    #sdx_ck                          ' CLK=1
                outl    #sdx_ck                          ' CLK=0
.repcrc
'+-----------------------------------------------------------------------------+
                getct   sdx_dbg                                                          'debug-time
                wrlong  sdx_dbg,          ##sdx_dbg_3                                     'debug-time
'+-----------------------------------------------------------------------------+
                MODCZ   _clr,_set                  wcz  ' "NC" & "Z" = success
                jmp     #_releaseDO                     ' releaseDO & /CS=1

Electrodude · 2020-05-25 04:11

@Cluso99 Isn't the rated Fmax for the P2 technically only something like 180 MHz? Why should anyone bit-banging still work reliably when you operate it well out of spec? I mean, it's great that it works at such high frequencies (albeit with timing caveats), and I operate my P2 at 360 MHz all the time, but doing this is still, nevertheless, operating it out of spec.

Cluso99 · 2020-05-25 05:42

Here are some test results on my RevB chip on the P2EVAL pcb.
There is nothing attached to the test pins 0-53, 54 & 55 have the buffer/leds IIRC.
For testb I only tested pins 0-31.

                                TESTP   TestB
40-140MHz  (20MHz steps)        6       7      clocks
160MHz                          6-7     7-8
180-300MHz (20MHz steps)        7       8
320-350MHz (10MHz steps)        7-8     8-9
360-390MHz (10MHz steps)        8       9

Note that my chip operates fine to 390MHz on this test - single cog using serial smart pins.
But at 392MHz starts to fail, so 390MHz is definately the top (and probably above max) .

Code is attached. Compiled with pnut and requires the serial to be working (5s delay after downloading) - I use PST.

This is the basics of the test code

              mov       pin,             #0             ' pin under test
next_pin
              mov       delay,           #0             ' actually starts at "1"
hi_loop       add       delay,           #1             ' delay++
              DRVL      pin                             ' make output and Low
              waitx     #10                             ' just a delay to let pin settle
              DRVH      pin                             ' L -> H
              waitx     delay                           ' delay+2 clocks
              testp     pin                         wc  ' pin = H ?
    if_nc     jmp       #hi_loop                        ' nc: try next delay
.....
              mov       pin,             #0             ' pin under test
next_pin2
'              mov       pinmask,         #1             '\ generate...
'              shl       pinmask,         pin            '/ ...pinmask
              mov       delay,           #0             ' actually starts at "1"
hi_loop2      add       delay,           #1             ' delay++
              DRVL      pin                             ' make output and Low
              waitx     #10                             ' just a delay to let pin settle
              DRVH      pin                             ' L -> H
              waitx     delay                           ' delay+2 clocks
              testb     ina,             pin        wc  ' pin = H ?
    if_nc     jmp       #hi_loop2                       ' nc: try next delay

I thought the effect of clock-gating was to ensure consistent results. I do understand that there is a possible uncertainty if the pin transitions right at the same time coincident with the clock. But we have quite a range here even while operating within design expectations.

IMHO this does not sit well for bit-bashing as code may have to be tailored for the specific clock used

Cluso99 · 2020-05-25 05:56

Electrodude wrote: »

@Cluso99 Isn't the rated Fmax for the P2 technically only something like 180 MHz? Why should anyone bit-banging still work reliably when you operate it well out of spec? I mean, it's great that it works at such high frequencies (albeit with timing caveats), and I operate my P2 at 360 MHz all the time, but doing this is still, nevertheless, operating it out of spec.

My tests were done at 200MHz. But I've just posted a range of tests.

Cluso99 · 2020-05-25 08:58

Here is the timing from the above observations. I'm not sure whether it's the output or input or both that get shifted as the clock frequency changes. Currently I cannot think of a way to determine it either.

ManAtWork · 2020-05-25 11:35

Cluso99 wrote: »

What I am finding is that the current document is wrong. It needs to be corrected, and notes added giving details of what to expect. I realise we are in the early days but this will need to be precisely spelt out in the documents.

To put this bluntly, the P2 cannot be taken seriously without this basic information for designers. To tell an engineer (potential source of volume sales) that they will have to work it out for themselves will immediately lose any credibility that they may have to use the P2.

I fully agree that the documentation should be as precise and detailed as possible, and especially, of course, correct. But threadening to drop the P2 only because one single parameter is not documented as you liked it is... well, I'd say a bit over-reaction, at least.

I've worked with ARM chips from Atmel/Microchip for some time. And I've found at least one serious design flaw or undocumented bug PER DAY. Many features weren't documented at all and you had to find out how they work by reverse engineering example code. These chips are horribly complex and contain so much unnecessary limitations that I really suspect that they are sponsored by some pharmaceutical company selling headache pills. So iI understand your concerns but please note that it could be worse, MUCH WORSE.

evanh · 2020-05-25 12:16

Cluso,
You test simply by measuring the outcome. The program I used to map HyperRAM compensations visually gives me a spread of timing measurements and covers the spectrum too. From this I can easily see the needed compensations.

Once I got confident with behaviour from that code, I am now able to use another existing working module, for SD cards for example, and optimise its timing just by relying on working or not outcomes of each edit.

That HR testing program isn't documented but Von was using it and I answered a few questions for him over a couple of pages of posts - https://forums.parallax.com/discussion/comment/1496361/#Comment_1496361

evanh · 2020-05-25 13:38

PS: One of the details discovered through this exercise with sdspi_bashed.spin2 code is I've found out that SD cards, in SPI mode at least, use timing mode of CPOL = 1 and CPHA = 1. Most generic SPI devices use CPOL = 0 and CPHA = 0 instead.

whicker · 2020-05-25 16:11

So there's digital clock delays (measured in clock ticks) plus the time it takes for the pin to rise and fall.

160 MHz is the first critical spot where this rise and fall time reaches the period of the clock frequency.

160 MHz is 6.25 ns period.

Are we thinking that the pin rise time from 0 to 1 logic threshold is about 6.25 ns?

Cluso99 · 2020-05-25 19:43

Thanks Evan.
Yes I’m fairly certain I’m using CPOL=CPHA=0 which according to documentation is the preferred SPI mode for SD cards - data out with clk going low, sampling on clock going high.

Thanks for the link to the Hyperram thread. While I’ve been skimming it I’ve not really taken much notice since I’m not really interested in hyperram. Lots of good info there!

I’m sure some of that can be useful for SD. But most of the time wasted with SD is in waiting for the SD Card to acknowledge the command (mostly 2.7ms on my SD, but as bad as 4ms and best 1.6ms. At 8 clocks (4 instructions) per byte gives ~120us IIRC for the actual read or write of the 512 byte sector and 2 byte crc16.
I have determined the card works fine at double this speed (4 clocks per bit) and 200MHz clock ie 50MHz which it should do.
BTW IIRC there’s no A class rating on my card.

I am happy with my SD Driver that now can run in its’ own Cog. So I’ll have two versions - one time slower hubexec Version shared cog (can be pasm or spin2) and a faster separate cog exec version.
I just have to tweek the command section of the code to use the faster writes and then I can release it.
Later I might revisit it to use smart pins.

PNut/Spin2 Latest Version (v46 - DEBUG gating, clock-setter control, VAR flexibility, C_Z for DEBUG)

Comments