There's no mention of inline assembly in the Spin2 docs...
I know it starts with "org" but don't remember how it ends...
Guess I'll have to dig through the forum...
There's a slight glitch with assembly when using bytes...
Compiler complained and made me put in "alignl" when using this byte jump table:
        long
jumps   byte    0               '0 'Can't use this one!
        byte    Start_          '1
        byte    WriteRam_       '2
        byte    ReadRam_        '3
        byte    HyperVideo_     '4
        byte    Dummy_          '5
        byte    Stop_           '6
        byte    ConfigureVideo_ '7

DAT     'Start_
        'alignl                 'Need this or PNut complains about alignment due to bytes above...
The problem is that when the number of bytes happens to equal 8 (or maybe any multiple of 4), it gives an error on the "alignl".
Had to comment it out.
Doesn't seem like I should need to do that...
I just tried this code. Using the TEST INA instruction requires waitx #6 or 8 clocks to see the output. Using the TESTP instruction requires waitx #5 or 7 clocks to see the output.
From the documents v33 (Rev B silicon) (dated 2019_09_13). Is this the latest?
I/O PIN TIMING
I/O pins are controlled by cogs via the following cog registers:
DIRA - output enable bits for P0..P31 (active high)
DIRB - output enable bits for P32..P63 (active high)
OUTA - output state bits for P0..P31 (corresponding DIRA bit must be high to enable output)
OUTB - output state bits for P32..P63 (corresponding DIRB bit must be high to enable output)
I/O pins are read by cogs via the following cog registers:
INA - input state bits for P0..P31
INB - input state bits for P32..P63
Aside from general-purpose instructions which may operate on DIRA/DIRB/OUTA/OUTB, there are special pin instructions which operate on singular bits within these registers:
DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRRND/DIRNOT {#}D - affect pin D bit in DIRx
OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTRND/OUTNOT {#}D - affect pin D bit in OUTx
FLTL/FLTH/FLTC/FLTNC/FLTZ/FLTNZ/FLTRND/FLTNOT {#}D - affect pin D bit in OUTx, clear bit in DIRx
DRVL/DRVH/DRVC/DRVNC/DRVZ/DRVNZ/DRVRND/DRVNOT {#}D - affect pin D bit in OUTx, set bit in DIRx
As well, aside from general-purpose instructions which may read INA/INB, there are special pin instructions which can read singular bits within these registers:
TESTP {#}D WC/WZ/ANDC/ANDZ/ORC/ORZ/XORC/XORZ -read pin D bit in INx and affect C or Z
TESTPN {#}D WC/WZ/ANDC/ANDZ/ORC/ORZ/XORC/XORZ -read pin D bit in !INx and affect C or Z
When a DIRx/OUTx bit is changed by any instruction, it takes THREE additional clocks after the instruction before the pin starts transitioning to the new state. Here this delay is demonstrated using DRVH:

                  ____0     ____1     ____2     ____3     ____4     ____5
Clock:           /    \____/    \____/    \____/    \____/    \____/    \____/
DIRA:            |         | DIRA-->| REG -->| REG -->| REG -->| P0 DRIV |
OUTA:            |         | OUTA-->| REG -->| REG -->| REG -->| P0 HIGH |
                 |         |
Instruction:     | DRVH #0 |

When an INx register is read by an instruction, it will reflect the state of the pins registered THREE clocks before the start of the instruction. Here this delay is demonstrated using TESTB:

                  ____0     ____1     ____2     ____3     ____4     ____5
Clock:           /    \____/    \____/    \____/    \____/    \____/    \____/
INA:             | P0 IN-->| REG -->| REG -->| REG -->| ALU -->| C/Z -->|
                 |              |
Instruction:     | TESTB INA,#0 |

When a TESTP/TESTPN instruction is used to read a pin, the value read will reflect the state of the pin registered TWO clocks before the start of the instruction. So, TESTP/TESTPN get fresher INx data than is available via the INx registers:

                  ____0     ____1     ____2     ____3     ____4
Clock:           /    \____/    \____/    \____/    \____/    \____/
INA:             | P0 IN-->| REG -->| REG -->| REG -->| C/Z -->|
                 |          |
Instruction:     | TESTP #0 |

It's likely that in the silicon the output will not be seen on the same clock edge, which accounts for one clock. But that still leaves at least one additional clock unaccounted for.
Can you advise what the real delays should be, please?
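Tallying the stage counts quoted in the doc excerpt above gives a nominal round trip (a sketch; the constant and function names here are mine, not from the doc):

```python
# Documented pipeline delays from the v33 doc excerpt above.
OUT_TO_PIN = 3    # DIRx/OUTx change -> pin starts transitioning (3 clocks after instruction)
PIN_TO_INX = 3    # pin state is registered 3 clocks before an INx-reading instruction
PIN_TO_TESTP = 2  # pin state is registered 2 clocks before a TESTP/TESTPN

def min_round_trip(read_latency):
    """Clocks from a DRVH to the earliest read that can see the new level,
    per the documented numbers (ignoring pad rise time and external loading)."""
    return OUT_TO_PIN + read_latency

print("DRVH -> TESTP   :", min_round_trip(PIN_TO_TESTP), "clocks")
print("DRVH -> INx read:", min_round_trip(PIN_TO_INX), "clocks")
```

By this tally a TESTP round trip would be 5 clocks and an INx read 6, which matches the low end of the waitx #5 (TESTP) and #6 (TEST INA) figures reported earlier.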
Cluso,
It gets worse as the sysclock frequency is raised - up to +4 clocks round trip. The effects can be noted even below 80 MHz sysclock. Temperature and other reactance factors affect the usable frequency bands. And registered pins add another +2 round trip as well.
I'm not sure that stating any input vs output components matters without a lot more detail of the exact relationships at every stage: hardware sources, recipients, and associated instructions.
Just the undefined nature of what constitutes the output timing, at the ALU, from an instruction is enough to make a specified number useless.
It’s a mandatory requirement so that bit-bashing can be done reliably!
Since it’s clocked it should be possible to define with certainty. The only issue would be when an output arrives on the same clock edge that clocks the input, but a clock either side should be deterministic.
The whole reason for clocking is to define what happens irrespective of frequency and internal delays.
Those numbers don't help one bit for doing the I/O timing without also knowing their internal relationships.
The straightforward answer is simply to measure and fine-tune, as we've already been doing.
I was hammering on along those lines at the beginning too. It hasn't panned out that way at all. It works up to a certain frequency, but above that the usable bands kick in and things get fuzzy.
Cluso, as Evanh stated, there is no hard rule at high frequencies because we are suffering analog delays through the 3.3V I/O pins.
Chip,
That’s not a reasonable answer. You have to realise that the P2 does not have traditional I/O blocks like other micros. If you cannot provide specifics then the P2 will not make the cut with professional engineers.
Even if it’s a table depending on the frequency, it’s a mandatory requirement. Otherwise, how do you expect anyone to use the I/O to build those blocks that are missing on the P2?
While the smart pins can do some things, they cannot do everything.
Every other micro these days has an abundant supply of silicon peripherals, and of course the massive manuals that go with them.
My testing was done at 200MHz with nothing attached to the pins. My results do not match the stated characteristics in your document.
BTW I’ve been speeding up my SD driver and this is what I’ve found.
Is the latency consistent with documentation at 180 MHz? If not, I agree that should at least be corrected. As for overclocked values, I see three ways forward:
* Do nothing (each engineer needs to figure out their own timing tolerances)
* Figure out frequency bands for each additional clock of latency, with each band shifted downward to provide a margin of error.
* Figure out the worst case for the highest reasonable overclocked value and state that as the overclocked latency.
In the case of something like your SD driver, it seems the most likely approach is the third option, if you expect people to use it in overclocked scenarios (which seems highly likely with this chip).
There are three dimensions to this timing problem: process, voltage, and temperature. If turn-around time is a problem at high frequencies, you will need to have some kind of automatic calibration on a continuous basis. I don't see any other way around it. Do you?
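The continuous-calibration idea could look something like this in outline (a pure sketch: drive_pin/read_pin are hypothetical stand-ins for DRVH/TESTP, and the 7-clock latency is just a toy value):

```python
# Sketch of "automatic calibration": drive a pin, then sweep the wait before
# sampling until the new level is actually seen. Everything here is
# hypothetical scaffolding, not real P2 code.

def calibrate(drive_pin, read_pin, max_delay=16):
    """Return the smallest wait (in clocks) at which the driven level reads back."""
    for delay in range(1, max_delay + 1):
        drive_pin(1)                 # DRVH equivalent
        if read_pin(delay) == 1:     # TESTP after 'delay' clocks
            return delay
    return None

# Toy model: pretend the silicon needs 7 clocks before the level is visible.
LATENCY = 7
level = [0]
def drive_pin(v): level[0] = v
def read_pin(delay): return level[0] if delay >= LATENCY else 0

print(calibrate(drive_pin, read_pin))  # -> 7
```

Run periodically, this would track the process/voltage/temperature drift mentioned above.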
Cluso99, it seems like what you need are the specs for setup, hold, and delay times on the I/O pins, plus the pipeline delays on the inputs and outputs. Of course, the time specs would be relative to the internal clock, so I don't know how useful that is since we don't have direct access to the internal clock. At 200 MHz the clock period is only 5 nanoseconds, so the times are really quite small. Any logic that you have on the pins, such as an SD card, will probably shift the signals over to the next clock period, or even multiple clock periods.
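For a feel of the magnitudes involved (simple arithmetic, nothing chip-specific):

```python
# Clock period at the sysclock frequencies discussed in this thread.
def period_ns(mhz):
    """Clock period in nanoseconds for a sysclock in MHz."""
    return 1000.0 / mhz

for f in (80, 160, 180, 200, 360):
    print(f, "MHz ->", period_ns(f), "ns per clock")
# e.g. 200 MHz -> 5.0 ns, so a 3-clock pipeline stage spans only 15 ns.
```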
There needs to be a specific set of timings, even if this is done at a particular speed. We have no idea about the internals of the silicon.
I thought, maybe wrongly, that clock-gating was the solution to ensuring that the time delay from setting an output instruction to it appearing at the output was a fixed number of clocks after the instruction, and that the time delay from a pin being latched to being received by the instruction was also a fixed number of clocks. So I thought that these could (and were) specified. I thought that conditions external to the chip such as loading would not affect these internal fixed delays, so it would only be these external delays which could affect the rise and fall times of the signal at the pin. And these fixed delays would be constant over a wide clock range, whatever that range is.
Sure there may be some point in overclocking where these delays between the clock stages become marginal (ie subject to silicon process variation) and then fail. But surely these can be reasonably characterised as to where the limit may be?
Is OnSemi's software able to query these paths? I thought that was the reason an additional clock was added to avoid critical path problems.
What I am finding is that the current document is wrong. It needs to be corrected, and notes added giving details of what to expect. I realise we are in the early days but this will need to be precisely spelt out in the documents.
To put this bluntly, the P2 cannot be taken seriously without this basic information for designers. To tell an engineer (potential source of volume sales) that they will have to work it out for themselves will immediately lose any credibility that they may have to use the P2.
You would be surprised at the reasons chips get "dumped" by engineers. It's hard enough to get engineers to consider the P1 or P2 for a design in the first place, let alone give them a simple reason to give it a miss.
The I/O pins are a fundamental part of the P2 design, particularly in light of the fact there are no peripheral blocks in silicon.
As I said, my test results do not match the stated characteristics in your document.
I will redo my test at different clock speeds, but don't expect a design engineer to do this. If he doesn't have a base spec to work with it will be game over.
My SD Driver
FWIW I can read and write SPI from/to the SD card at 8 clocks (4 instructions) per bit, for a sustained average of 9 clocks over the entire 512 byte sector + 2 byte CRC16. It is running in cogexec.
In between CLK=0 and CLK=1, I need to insert the sample and accumulate instructions. But I need to locate the test/testb/testp in the correct window. I need to provide characterisation information with this SD driver to the user.
FYI here is the SD read code, and a timing diagram based on the current document info. BTW the bit numbering is incorrect: bit7 is read first, down to bit0 last.
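The MSB-first ordering (bit7 first, bit0 last) just means each sampled bit is shifted in from the right; a sketch of what the accumulate step computes:

```python
# The SPI byte arrives bit7 first, bit0 last. This mirrors what the
# sample-and-accumulate instructions in the read loop build up.
def assemble_msb_first(bits):
    """bits: iterable of 8 sampled pin states, earliest first (bit7 first)."""
    value = 0
    for b in bits:
        value = (value << 1) | (b & 1)
    return value

print(hex(assemble_msb_first([1, 0, 1, 0, 0, 1, 0, 1])))  # -> 0xa5
```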
@Cluso99 Isn't the rated Fmax for the P2 technically only something like 180 MHz? Why should anyone bit-banging still work reliably when you operate it well out of spec? I mean, it's great that it works at such high frequencies (albeit with timing caveats), and I operate my P2 at 360 MHz all the time, but doing this is still, nevertheless, operating it out of spec.
Here are some test results on my RevB chip on the P2EVAL pcb.
There is nothing attached to the test pins 0-53; pins 54 & 55 have the buffer/LEDs IIRC.
For testb I only tested pins 0-31.
                            TESTP   TESTB   (clocks)
40-140MHz (20MHz steps)     6       7
160MHz                      6-7     7-8
180-300MHz (20MHz steps)    7       8
320-350MHz (10MHz steps)    7-8     8-9
360-390MHz (10MHz steps)    8       9
Note that my chip operates fine to 390MHz on this test - single cog using serial smart pins.
But at 392MHz it starts to fail, so 390MHz is definitely the top (and probably above max).
Code is attached. Compiled with pnut and requires the serial to be working (5s delay after downloading) - I use PST.
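The measured bands above can be captured as a lookup table, e.g. (this code is my summary of the table, not part of the test program; boundary frequencies that showed mixed results are kept as ranges):

```python
# Measured DRVH->read latencies (RevB, P2EVAL, unloaded pins).
BANDS = [
    # (fmax_mhz, testp_clocks, testb_clocks)
    (140, "6",   "7"),
    (160, "6-7", "7-8"),
    (300, "7",   "8"),
    (350, "7-8", "8-9"),
    (390, "8",   "9"),
]

def latency(sysclk_mhz):
    """Return (TESTP, TESTB) latency strings for a given sysclock in MHz."""
    for fmax, testp, testb in BANDS:
        if sysclk_mhz <= fmax:
            return testp, testb
    raise ValueError("above tested range (390 MHz)")

print(latency(200))  # -> ('7', '8')
```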
This is the basics of the test code:
         mov     pin, #0         ' pin under test
next_pin
         mov     delay, #0       ' actually starts at "1"
hi_loop  add     delay, #1       ' delay++
         DRVL    pin             ' make output and Low
         waitx   #10             ' just a delay to let pin settle
         DRVH    pin             ' L -> H
         waitx   delay           ' delay+2 clocks
         testp   pin wc          ' pin = H ?
if_nc    jmp     #hi_loop        ' nc: try next delay
         .....
         mov     pin, #0         ' pin under test
next_pin2
'        mov     pinmask, #1     '\ generate...
'        shl     pinmask, pin    '/ ...pinmask
         mov     delay, #0       ' actually starts at "1"
hi_loop2 add     delay, #1       ' delay++
         DRVL    pin             ' make output and Low
         waitx   #10             ' just a delay to let pin settle
         DRVH    pin             ' L -> H
         waitx   delay           ' delay+2 clocks
         testb   ina, pin wc     ' pin = H ?
if_nc    jmp     #hi_loop2       ' nc: try next delay
I thought the effect of clock-gating was to ensure consistent results. I do understand that there is a possible uncertainty if the pin transitions right at the same time coincident with the clock. But we have quite a range here even while operating within design expectations.
IMHO this does not sit well for bit-bashing, as code may have to be tailored to the specific clock used.
My tests were done at 200MHz. But I've just posted a range of tests.
Here is the timing from the above observations. I'm not sure whether it's the output or input or both that get shifted as the clock frequency changes. Currently I cannot think of a way to determine it either.
I fully agree that the documentation should be as precise and detailed as possible, and especially, of course, correct. But threatening to drop the P2 only because one single parameter is not documented the way you'd like is... well, I'd say a bit of an over-reaction, at least.
I've worked with ARM chips from Atmel/Microchip for some time. And I've found at least one serious design flaw or undocumented bug PER DAY. Many features weren't documented at all and you had to find out how they work by reverse engineering example code. These chips are horribly complex and contain so many unnecessary limitations that I really suspect they are sponsored by some pharmaceutical company selling headache pills. So I understand your concerns, but please note that it could be worse, MUCH WORSE.
Cluso,
You test simply by measuring the outcome. The program I used to map HyperRAM compensations visually gives me a spread of timing measurements and covers the spectrum too. From this I can easily see the needed compensations.
Once I got confident with the behaviour from that code, I am now able to take another existing working module, for SD cards for example, and optimise its timing just by relying on the working-or-not outcome of each edit.
PS: One of the details discovered through this exercise with sdspi_bashed.spin2 code is I've found out that SD cards, in SPI mode at least, use timing mode of CPOL = 1 and CPHA = 1. Most generic SPI devices use CPOL = 0 and CPHA = 0 instead.
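For reference, the four SPI modes (my summary, not from the posts above; CPOL is the idle clock level, CPHA selects whether sampling happens on the first or second edge):

```python
# The four standard SPI modes and their sampling edges.
SPI_MODES = {
    # mode: (CPOL, CPHA, sampling edge)
    0: (0, 0, "rising"),    # idle low,  sample on rising edge
    1: (0, 1, "falling"),   # idle low,  sample on falling edge
    2: (1, 0, "falling"),   # idle high, sample on falling edge
    3: (1, 1, "rising"),    # idle high, sample on rising edge
}

def mode_of(cpol, cpha):
    """Mode number from the CPOL/CPHA pair."""
    return cpol * 2 + cpha

# CPOL = 1, CPHA = 1 (what Evanh observed for SD in SPI mode) is mode 3.
print(mode_of(1, 1), SPI_MODES[mode_of(1, 1)])
```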
Thanks Evan.
Yes I’m fairly certain I’m using CPOL=CPHA=0 which according to documentation is the preferred SPI mode for SD cards - data out with clk going low, sampling on clock going high.
Thanks for the link to the Hyperram thread. While I’ve been skimming it I’ve not really taken much notice since I’m not really interested in hyperram. Lots of good info there!
I’m sure some of that can be useful for SD. But most of the time wasted with SD is in waiting for the SD card to acknowledge the command (mostly 2.7ms on my SD, but as bad as 4ms and best 1.6ms). At 8 clocks (4 instructions) per bit, that gives ~120us IIRC for the actual read or write of the 512 byte sector and 2 byte crc16.
I have determined the card works fine at double this speed (4 clocks per bit) and 200MHz clock ie 50MHz which it should do.
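The clock arithmetic behind that claim is just sysclock divided by clocks per bit:

```python
# SPI bit rate from sysclock and clocks-per-bit.
def spi_bit_rate_mhz(sysclk_mhz, clocks_per_bit):
    return sysclk_mhz / clocks_per_bit

print(spi_bit_rate_mhz(200, 8))  # -> 25.0 (MHz, the current 8-clocks/bit loop)
print(spi_bit_rate_mhz(200, 4))  # -> 50.0 (MHz, the doubled speed mentioned above)
```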
BTW IIRC there’s no A class rating on my card.
I am happy with my SD driver, which can now run in its own cog. So I’ll have two versions - a slower hubexec version for a shared cog (can be pasm or spin2) and a faster separate cogexec version.
I just have to tweak the command section of the code to use the faster writes and then I can release it.
Later I might revisit it to use smart pins.
Comments
I am also really looking for guidance regarding loading binaries, etc from SD.
Can you take a look here and answer some questions please?
forums.parallax.com/discussion/171599/discussion-and-questions-about-a-p2-operating-system
My P2 OS is running. I now need to work out how to load files while keeping the OS alive and not overwriting it.
Found it... Looks like "end" is what I need to close inline assembly.
But I was able to do this, which FastSpin seems to let me do and PNut seems OK with too:
PNut says "Expected a unique method name" on my waitx() method.
So, PNut apparently doesn't implement waitx() and also won't let you implement it yourself...
But, changing from "waitx" to "waitnx" seems to work for both.
Is this supposed to work?
That HyperRAM testing program isn't documented, but Von was using it and I answered a few questions for him over a couple of pages of posts - https://forums.parallax.com/discussion/comment/1496361/#Comment_1496361
160 MHz is the first critical spot where this rise and fall time reaches the period of the clock frequency.
160 MHz is 6.25 ns period.
Are we thinking that the pin rise time from 0 to 1 logic threshold is about 6.25 ns?
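If that hypothesis is right, the clock period crosses below the rise time just past 160 MHz (a toy check; the 6.25 ns figure is the one speculated above, not a measured spec):

```python
# Hypothesised 0->1 threshold time from the post above.
RISE_TIME_NS = 6.25

def first_freq_where_period_below_rise(step_mhz=10, fmax=400):
    """First sysclock step (MHz) whose period is shorter than the rise time."""
    f = step_mhz
    while f <= fmax:
        if 1000.0 / f < RISE_TIME_NS:
            return f
        f += step_mhz
    return None

print(first_freq_where_period_below_rise())  # -> 170 (first 10 MHz step past 160)
```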