One more clock delay on next silicon inputs
cgracey
Wendy at ON Semi noticed there's a potential metastability problem with the way we are registering pin inputs. We added another set of flops for the incoming pin signals. This means that there is now one more clock delay on all inputs from pins. I will make new FPGA files soon so we can confirm this doesn't cause any problems with existing code, especially what will go into ROM.
Comments
IIRC currently we see 1 clock prior to the start of a testp and 2 clocks prior for ina/inb, so that will now be 2 & 3 respectively.
IIRC outputs are 3 clocks after the instruction completion. This will remain the same?
Postedit 14 Feb 2019 - fix
I'll do it later
fixed 14 Feb 2019
Does this look correct? This is my send/receive code in the current ROM...
BTW your flash code seems to sample differently ???
Note:
Data Out currently follows CLK going low by 2 clocks. It could either precede CLK low by 2 clocks or follow it by 2 clocks.
If I have the info correct, Data In will be sampled precisely on CLK going high in the new silicon.
Postedit - add out/sample
Postedit fix out/sample
Does that mean the flash timing should be adjusted to give enough margin?
1. Data OUT follows the CLK LOW by 2 clocks (as-is)
-or-
2. Data OUT precedes the CLK LOW by 2 clocks
Each bit-time is 23 clocks plus a lost hub window.
Checking the 25Q128 specs, option 1 is preferable: all timings are met.
For the current ES silicon (v32i), presuming the delay from DRVx to the pin is +3 clocks after the end of the DRVx instruction, a minimum delay of 5 clocks (waitx #3) is required for the TESTP x instruction to see the output value on the pin, i.e. the TESTP instruction latches the pin value 1 clock prior to the start of the TESTP instruction.
On the respin silicon, this is expected to become waitx #4, i.e. the pin value is latched 2 clocks prior to the start of the TESTP instruction. INx instructions should latch one clock earlier still.
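The arithmetic above can be sketched as a small model, in Python rather than PASM. This is only an illustration under stated assumptions: `min_waitx` and its parameters are hypothetical names, WAITX #w is taken to occupy w + 2 clocks (which matches "5 clocks (waitx #3)" above), and the sample is assumed to need to land at least one full clock after the pin changes. It ignores any analog pad delay.

```python
# Timing model for DRVx -> WAITX #w -> TESTP/INx, in sysclk units.
# Assumptions (illustrative model only, not taken from silicon documentation):
#   - t = 0 is the end of the DRVx instruction
#   - WAITX #w occupies w + 2 clocks
#   - the input sample must land at least one full clock after the pin changes

WAITX_OVERHEAD = 2  # clocks consumed by WAITX on top of its operand


def min_waitx(out_latency: int, in_latency: int) -> int:
    """Smallest WAITX operand for which the read sees the newly driven level.

    out_latency: clocks from end of DRVx until the pin changes
    in_latency:  clocks before the start of the reading instruction
                 at which the pin value is latched
    """
    # The reading instruction starts at t = w + 2 and samples in_latency
    # clocks earlier. Require: (w + 2) - in_latency >= out_latency + 1
    gap = out_latency + in_latency + 1
    return max(gap - WAITX_OVERHEAD, 0)


if __name__ == "__main__":
    print("current ES silicon, TESTP: waitx #%d" % min_waitx(3, 1))
    print("current ES silicon, INx:   waitx #%d" % min_waitx(3, 2))
    print("respin, TESTP:             waitx #%d" % min_waitx(3, 2))
    print("respin, INx:               waitx #%d" % min_waitx(3, 3))
```

With an output latency of 3 and an input latency of 1 this gives waitx #3 for the current silicon, and with an input latency of 2 it gives waitx #4 for the respin, matching the figures quoted above.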
Here is the test code for the P2D2. Change the clock values in the CON section as marked for P2-EVAL.
So w=5 for reliable results at 80MHz.
I will retry mine shortly at 25MHz on P2D2. I was running at 24MHz for timing above.
24 MHz - w=3 (w=2 fails)
200MHz - w=4 (w=3 varies per pin so is on the edge)
300MHz - w=5 (w=4 varies per pin so is on the edge)
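As a rough cross-check of these numbers, the extra wait clocks needed at higher sysclk can be converted to nanoseconds. This is a sketch only: it assumes the w=3 result at 24MHz is the baseline where any fixed-time delay is small compared with the clock period, and the name `extra_delay_ns` is made up for illustration. Because w is quantised to whole clocks, the results are coarse indications, not measurements.

```python
# Minimum reliable WAITX operand per sysclk, copied from the results above.
MEASURED = {24e6: 3, 200e6: 4, 300e6: 5}

# Assumption: the 24 MHz result is the baseline with no visible extra delay.
BASELINE_W = 3


def extra_delay_ns(freq_hz: float, w: int) -> float:
    """Extra wait beyond the baseline operand, converted to nanoseconds."""
    period_ns = 1e9 / freq_hz
    return (w - BASELINE_W) * period_ns


if __name__ == "__main__":
    for freq_hz, w in MEASURED.items():
        extra = extra_delay_ns(freq_hz, w)
        print(f"{freq_hz / 1e6:.0f} MHz: w={w}, extra wait = {extra:.2f} ns")
```

Under that assumption the extra wait works out to about 5 ns at 200MHz and about 6.7 ns at 300MHz, which would be consistent with a delay that is fixed in time rather than in clocks.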
This is the essence of the test
Cluso,
There is a good chance that all those lags go up by one in the re-spin. I don't understand it and I'm not confident that Chip has sorted it. We might just have to live with it in the Prop2.
Hmm. That test seems to be saying the async path delays exceed one clock period at higher SysCLKs (which the added delay is not going to fix, but I think that was done for other reasons).
It also means
* ROM code may not work at higher sysclks (some wanted to call into ROM routines?)
* Deterministic pin-pin paths may not be possible above some sysclk, and may require sysclk bands (with PVT edges) above that sysclk.
If you enable clocking in the digital pin mode, things should firm up. In your current test, you are stacking up a bunch of delays, whereas if clocking were to be enabled, the 3.3V I/O pin would register the input and output, hiding the internal propagation delays for which setup and hold time requirements ARE covered for clocked mode. If I could redesign the I/O pad, I would make it always synchronous (clocked).
My ROM SD code is working nicely at 24MHz which is the target speed so I think I will just leave that part of the code as-is. Do you agree?
-Phil
I agree.
There are many more levels of logic. Without those registers, the Fmax might otherwise be only 40 megahertz.
That has never made one bit of difference to consistency right from first discovery! This is why I've always been raising concern.
So, enabling clocking does increase latency by one clock, for both input and output, right?
But you are seeing additional delay times that are longer than a clock at very high frequencies. Those delays must be from the circuitry in the 3.3V I/O pad, then, which is not going to become different from what it currently is. If I had known the chip was going to actually work at 300MHz, I would have designed the I/O pad to work faster.
At this point, the I/O pad is what it is.
Wendy at ON is implementing some extra timing constraints which will group the delivery of all core-to-pin signals within 300ps, aside from being within setup-time and hold-time requirements. The pin-to-core IN signals will be constrained similarly. This will regulate asynchronous pin I/O, so that all pins will behave as identically as is reasonably possible.
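To put the 300ps grouping in perspective, it can be expressed as a fraction of the clock period at the frequencies mentioned in this thread. This is plain arithmetic, not a statement about how the constraint is implemented; `skew_fraction` is a hypothetical helper name.

```python
SKEW_PS = 300.0  # grouping window quoted above, in picoseconds


def skew_fraction(freq_hz: float) -> float:
    """Return the 300 ps window as a fraction of one sysclk period."""
    period_ps = 1e12 / freq_hz
    return SKEW_PS / period_ps


if __name__ == "__main__":
    for mhz in (24, 80, 200, 300):
        pct = 100.0 * skew_fraction(mhz * 1e6)
        print(f"{mhz} MHz: 300 ps is {pct:.1f}% of a clock period")
```

At 300MHz the window is about 9% of a clock period, and at 80MHz it is about 2.4%.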
And Cluso has just proven the problem is not only no better in the real silicon but the overall effect is worse because the finished product has so much over-clock-ability.
The FPGA is showing the exact same behaviour. And always has.
Here is code to test this out. It scans pins 0-61 and outputs the results to serial.
You can change the clock in the CON section (currently 300MHz for P2D2)
This is the important bit. I run through this with w varying from #2 to #5, outputting either a 0 or 1. I test both low and high initially.
At different speeds w needs to be different, and at some frequencies there is a marginal value of w where the result can be either 0 or 1, so we are on the edge with that value.
from the main test section
The remaining code/equates are chaff to utilise the inbuilt ROM serial routines.
I don't know what to do about it. An I/O pin is big and slow compared to internal signals. It's like the difference between a hydraulic backhoe and your hands on the controls. Just to implement ESD protection on the I/O pad, you incur 1000x the capacitive loading of the core-side signals.