One more clock delay on next silicon inputs

Wendy at ON Semi noticed there's a potential metastability problem with the way we are registering pin inputs. We added another set of flops for the incoming pin signals. This means that there is now one more clock delay on all inputs from pins. I will make new FPGA files soon so we can confirm this doesn't cause any problems with existing code, especially what will go into ROM.
Comments
IIRC, the pin is currently sampled 1 clock prior to the start of a testp and 2 clocks prior for ina/inb, so that will now be 2 and 3 clocks respectively.
IIRC outputs are 3 clocks after the instruction completion. This will remain the same?
Postedit 14 Feb 2019 - fix
I'll do it later
fixed 14 Feb 2019
Does this look correct? This is my send/receive code in the current ROM...
BTW your flash code seems to sample differently ???
Note:
Data Out follows CLK low by 2 clocks. It can either precede or follow CLK low by 2 clocks.
Data In will precisely sample on CLK going high in the new silicon if I have the info correct.
Postedit - add out/sample
Postedit fix out/sample
Does that mean the flash timing should be adjusted to give enough margin?
1. Data OUT follows the CLK LOW by 2 clocks (as-is)
-or-
2. Data OUT precedes the CLK LOW by 2 clocks
Each bit-time is 23 clocks plus a lost hub window.
Checking the 25Q128 specs, option 1 is preferable. All timings are met.
For the current ES silicon (v32i), presuming the delay from DRVx to the pin is +3 clocks after the end of the DRVx instruction, a minimum delay of 5 clocks (waitx #3) is required for the TESTP x instruction to see the output value on the pin, i.e. the TESTP instruction latches the pin value 1 clock prior to the start of the TESTP instruction.
On the respin silicon, this is expected to be waitx #4, i.e. the pin value is latched 2 clocks prior to the start of the TESTP instruction. INx instructions should latch one clock earlier still.
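To make the arithmetic above easy to check, here is a minimal Python sketch of it. The function name `min_waitx` and the pass condition (the sample point must fall strictly after the pin has changed) are my own assumptions, chosen because they reproduce the waitx #3 and waitx #4 figures quoted above. It also assumes WAITX #n occupies n + 2 clocks, which matches "5 clocks (waitx #3)".

```python
def min_waitx(drv_to_pin_clocks, sample_lead_clocks):
    """Smallest n for WAITX #n so the read-back sees the new pin level.

    Assumptions (not confirmed by the silicon docs):
      - WAITX #n occupies n + 2 clocks after the DRVx instruction ends
      - the pin is latched sample_lead_clocks before the reading
        instruction starts
      - the latch point must be strictly later than the DRVx-to-pin delay
    """
    n = 0
    while (n + 2) - sample_lead_clocks <= drv_to_pin_clocks:
        n += 1
    return n


# DRVx reaches the pin 3 clocks after the instruction ends (figure from the post)
print("ES silicon, TESTP :", min_waitx(3, 1))   # latched 1 clock prior
print("respin,     TESTP :", min_waitx(3, 2))   # latched 2 clocks prior
print("respin,     INx   :", min_waitx(3, 3))   # one clock earlier still
```

This only models whole clocks. It says nothing about the extra analog delay in the pad that shows up at high sysclk later in the thread.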
Here is the test code for the P2D2. Change the clock values in the CON section as marked for P2-EVAL.
'' +--------------------------------------------------------------------------+
'' | Cluso's P2 Test Program (c)2013-2019 "Cluso99" (Ray Rodrick)|
'' +--------------------------------------------------------------------------+
'' RR20190201 0001a generalised test program

''============================[ CON ]============================================================
CON
{{'P2-EVAL
  _XTALFREQ     = 20_000_000            ' crystal frequency
  _XDIV         = 2             '\              '\ crystal divider to give 10.0MHz
  _XMUL         = 10            '| 100 MHz      '| crystal / div * mul to give 100 MHz
  _XDIVP        = 1             '/              '/ crystal / div * mul /divp to give 100 MHz
  _XOSC         = %10           '15pF           ' %00=OFF, %01=OSC, %10=15pF, %11=30pF
}}
'P2D2
  _XTALFREQ     = 12_000_000            ' crystal frequency
  _XDIV         = 3             '\              '\ crystal divider to give 4.0MHz
  _XMUL         = 12            '| 24 MHz       '| crystal / div * mul to give 48 MHz
  _XDIVP        = 2             '/              '/ crystal / div * mul /divp to give 24 MHz
  _XOSC         = %01           'Osc            ' %00=OFF, %01=OSC, %10=15pF, %11=30pF

  _XSEL         = %11           'XI+PLL         ' %00=rcfast(20+MHz), %01=rcslow(~20KHz), %10=XI(5ms), %11=XI+PLL(10ms)
  _XPPPP        = ((_XDIVP>>1) + 15) & $F       ' 1->15, 2->0, 4->1, 6->2...30->14
  _CLOCKFREQ    = _XTALFREQ / _XDIV * _XMUL / _XDIVP    ' internal clock frequency
  _SETFREQ      = 1<<24 + (_XDIV-1)<<18 + (_XMUL-1)<<8 + _XPPPP<<4 + _XOSC<<2   ' %0000_000e_dddddd_mmmmmmmmmm_pppp_cc_00 ' setup oscillator
  _ENAFREQ      = _SETFREQ + _XSEL                                              ' %0000_000e_dddddd_mmmmmmmmmm_pppp_cc_ss ' enable oscillator

  us            = _clockfreq/1_000_000  ' 1us
'------------------------------------------------------------------------------------------------
  _baud         = 115_200
  _bitper       = (_clockfreq / _baud) << 16 + 7        ' 115200 baud, 8 bits
  _txmode       = %0000_0000_000_0000000000000_01_11110_0       'async tx mode, output enabled for smart output
  _rxmode       = %0000_0000_000_0000000000000_00_11111_0       'async rx mode, input enabled for smart input
'------------------------------------------------------------------------------------------------
  rx_pin        = 63            ' pin serial receiver
  tx_pin        = 62            ' pin serial transmitter
  spi_cs        = 61            ' pin SPI memory select (also sd_ck)
  spi_ck        = 60            ' pin SPI memory clock (also sd_cs)
  spi_di        = 59            ' pin SPI memory data in (also sd_di)
  spi_do        = 58            ' pin SPI memory data out (also sd_do)
'------------------------------------------------------------------------------------------------
  test_pin      = 49

CON
'' +--------------------------------------------------------------------------+
'' | Cluso's LMM_SerialDebugger for P2 (c)2013-2018 "Cluso99" (Ray Rodrick)|
'' +--------------------------------------------------------------------------+
'' xxxxxx : xx xx xx xx ... <cr> DOWNLOAD: to cog/lut/hub {addr1} following {byte(s)}
'' xxxxxx - [xxxxxx] [L] <cr> LIST: from cog/lut/hub {addr1} to < {addr2} L=longs
'' xxxxxx G <cr> GOTO: to cog/lut/hub {addr1}
'' Q <cr> QUIT: Quit Rom Monitor and return to the User Program
'' Lffffffff[.]xxx<cr> LOAD: Load file from SD
'' Rffffffff[.]xxx<cr> RUN: Load & Run file from SD
'' <esc><cr> TAQOZ: goto TAQOZ
'' +--------------------------------------------------------------------------+
'' LMM DEBUGGER - CALL Modes...(not all modes supported)
'' +--------------------------------------------------------------------------+
  _MODE         = $F << 5       ' mode bits defining the call b8..b5 (b4..b0 are modifier options)
  _SHIFT        = 5             ' shr # to extract mode bits
  _HEX_         = 2 << 5        ' hex...
  _REV_         = 1 << 4        ' - reverse byte order
  _SP           = 1 << 3        ' - space between hex output pairs
  '_DIGITS      = 7..0 where 8->0       ' - no. of digits to display
  _LIST         = 3 << 5        ' LIST memory line (1/4 longs) from cog/hub
  _ADDR2        = 1 << 4        ' 1= use lmm_p2 as to-address
  _LONG_        = 1 << 1        ' 1=display longs xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
  _TXSTRING     = 4 << 5        ' tx string (nul terminated) from hub
  _RXSTRING     = 5 << 5        ' rx string
  _ECHO_        = 1 << 4        ' - echo char
  _PROMPT       = 1 << 3        ' - prompt (lmm_x)
  _ADDR         = 1 << 2        ' - addr of string buffer supplied
  _NOLF         = 1 << 1        ' - strip <lf>
  _MONITOR      = 7 << 5        ' goto rom monitor
'' +--------------------------------------------------------------------------+
'' P2 ROM SERIAL ROUTINES (HUBEXEC)
'' +--------------------------------------------------------------------------+
  _SerialInit   = $fcab8        ' Serial Initialise (lmm_x & lmm_bufad must be set first)
  _HubTxCR      = $fcae4        ' Sends <cr><lf> (overwrites lmm_x)
  _HubTxRev     = $fcaec        ' Sends lmm_x with bytes reversed
  _HubTx        = $fcaf0        ' Sends lmm_x (can be up to 4 bytes)
  _HubHexRev    = $fcb24        ' Sends lmm_x with bytes reversed as Hex char(s) as defined in lmm_f
  _HubHex8      = $fcb28        ' Sends lmm_x as Hex char(s) after setting lmm_f as 8 hex chars
  _HubHex       = $fcb2c        ' Sends lmm_x as Hex char(s) as defined in lmm_f
  _HubTxStrVer  = $fcb9c        ' Sends $0 terminated string at lmm_p address after setting lmm_p=##_str_vers
  _HubTxString  = $fcba4        ' Sends $0 terminated string at lmm_p address
  _HubListA2H   = $fcbc4        ' List/Dump line(s) from lmm_p address to lmm_p2 address after setting lmm_f=#_LIST+_ADDR2
  _HubList      = $fcbc8        ' List/Dump line(s) from lmm_p address to lmm_p2 address according to lmm_f
  _HubRx        = $fcb10        ' Recv char into lmm_x
  _HubRxStrMon  = $fccc4        ' Recv string into lmm_bufad address after setting prompt=lmm_x=#"*" & params=lmm_f=#_RXSTRING+_ECHO_+_PROMPT
  _HubRxString  = $fcccc        ' Recv string into lmm_p/lmm_bufad address according to params in lmm_f
  _HubMonitor   = $fcd78        ' Calls the Monitor; uses lmm_bufad as the input buffer address
  _RdLongCogHub = $fcf34        ' read cog/lut/hub long from lmm_p address into lmm_x, then lmm_p++
  _str_vers     = $fd014        ' locn of hub string, $0 terminated
'' +--------------------------------------------------------------------------+
'' HUB ADDRESSES
'' +--------------------------------------------------------------------------+
  _HUBROM       = $FC000        ' ROM $FC000
  _HUBBUF       = $FC000        ' overwrite Booter
  _HUBBUFSIZE   = 80            ' RxString default size for _HUBBUF
'' +--------------------------------------------------------------------------+
''============[ COG VARIABLES $1E0-$1EF - MONITOR]=============================
''-------[ LMM parameters, etc ]-----------------------------------------------
  lmm_x         = $1e0          ' parameter passed to/from LMM routine (typically a value)
  lmm_f         = $1e1          ' parameter passed to LMM routine (function options; returns unchanged)
  lmm_p         = $1e2          ' parameter passed to/from LMM routine (typically a hub/cog ptr/addr)
  lmm_p2        = $1e3          ' parameter passed to/from LMM routine (typically a 2nd hub/cog address)
  lmm_c         = $1e4          ' parameter passed to/from LMM routine (typically a count)
''-------[ LMM additional workareas ]------------------------------------------
  lmm_w         = $1e5          ' workarea (never saved - short term use between calls, except _HubTx)
  lmm_tx        = $1e6          ' _HubTx
  lmm_hx        = $1e7          ' _HubHex/_HubString
  lmm_hx2       = $1e8          ' _HubHex
  lmm_hc        = $1e9          ' "
  lmm_lx        = $1ea          ' _HubList
  lmm_lf        = $1eb          ' "
  lmm_lp        = $1ec          ' "
  lmm_lp2       = $1ed          ' "
  lmm_lc        = $1ee          ' "
  lmm_bufad     = $1ef          ' _HubRxString
'' +--------------------------------------------------------------------------+
'' ASCII equates
'' +--------------------------------------------------------------------------+
  _CLS_         = $0C
  _BS_          = $08
  _LF_          = $0A
  _CR_          = $0D
  _TAQOZ_       = $1B           ' <esc> goto TAQOZ
'' +--------------------------------------------------------------------------+

DAT
                org     0
entry
'+=============================================================================+
'+-------[ Set Xtal ]----------------------------------------------------------+
entry2          hubset  #0                      ' set 20MHz+ mode
                hubset  ##_SETFREQ              ' setup oscillator
                waitx   ##20_000_000/100        ' ~10ms
                hubset  ##_ENAFREQ              ' enable oscillator
'+-----------------------------------------------------------------------------+
                waitx   ##_clockfreq*5          ' just a delay to get pc terminal running
'+-----------------------------------------------------------------------------+
'+=============================================================================+
'+ Display results +
'+=============================================================================+
'+-------[ Start Serial ]------------------------------------------------------+
                mov     lmm_bufad, ##_HUBBUF    ' locn of hub buffer for serial routine
                mov     lmm_x, ##_bitper        ' sets serial baud
                call    #_SerialInit            ' initialise serial
'+-----------------------------------------------------------------------------+
''              mov     lmm_x, #0               ' clear screen
''              call    #_hubTx
'+-----------------------------------------------------------------------------+
                mov     lmm_f, #_TXSTRING+0     ' send string, $00 terminated
                mov     lmm_p, ##_hubstring     ' must be in hub!
                call    #_HubTxString
'+-----------------------------------------------------------------------------+
'+=============================================================================+
'+ Test Section +
'+=============================================================================+
                mov     w,#2
                mov     pin,#0                  ' start at P0
                mov     c,#62
.again          call    #loop
                add     pin,#1
                djnz    c,#.again
                call    #_hubTxCR
                mov     w,#3
                mov     pin,#0                  ' start at P0
                mov     c,#62
.again3         call    #loop
                add     pin,#1
                djnz    c,#.again3
'+-----------------------------------------------------------------------------+
                jmp     #$                      ' loop here <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
'+-----------------------------------------------------------------------------+
loop            mov     x,#0
                drvl    pin
                waitx   #8
                drvh    pin
                waitx   w
                testp   pin wc
                rcl     x,#1
                or      x,#"0"
                mov     lmm_x,x
                call    #_hubTx
                mov     x,#0
                drvh    pin
                waitx   #8
                drvl    pin
                waitx   w
                testp   pin wc
                rcl     x,#1
                or      x,#"0"
                mov     lmm_x,x
                call    #_hubTx
                call    #_hubTxCR
                flth    pin
                ret
'+-----------------------------------------------------------------------------+
                jmp     #$                      ' loop here <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
'+-----------------------------------------------------------------------------+
x               long    0
c               long    0
w               long    2'3
pin             long    55
                fit     $1E0                    ' check fits in cog with lmm_x
'+-----------------------------------------------------------------------------+
''============[ HUB VARIABLES ]=================================================
DAT
                orgh    $
_hubstring      byte    "P2 Test Program v0003b",13,10,0
                alignl
''---------------------------------------------------------------------------------------------------
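For anyone changing the clock values in the CON section, here is a small Python sketch that mirrors the `_XPPPP`, `_CLOCKFREQ`, `_SETFREQ` and `_ENAFREQ` arithmetic so the resulting HUBSET words can be sanity-checked on a PC. The function name `p2_clock_config` is hypothetical. The sketch assumes the Spin shifts bind tighter than `+`, which is what the field layout in the `_SETFREQ` comment implies, so the shifts are parenthesised explicitly in Python.

```python
def p2_clock_config(xtal_hz, xdiv, xmul, xdivp, xosc, xsel):
    """Mirror the CON-section clock arithmetic from the listing (a sketch)."""
    # _XPPPP = ((_XDIVP>>1) + 15) & $F   maps 1->15, 2->0, 4->1, ...
    xpppp = ((xdivp >> 1) + 15) & 0xF
    # _CLOCKFREQ = _XTALFREQ / _XDIV * _XMUL / _XDIVP, integer maths, left to right
    clockfreq = xtal_hz // xdiv * xmul // xdivp
    # Field layout per the listing's comment: %0000_000e_dddddd_mmmmmmmmmm_pppp_cc_00
    setfreq = ((1 << 24) + ((xdiv - 1) << 18) + ((xmul - 1) << 8)
               + (xpppp << 4) + (xosc << 2))
    # _ENAFREQ = _SETFREQ + _XSEL fills in the low two "ss" bits
    enafreq = setfreq + xsel
    return clockfreq, setfreq, enafreq


# Values taken from the two CON blocks in the listing; _XSEL = %11 is shared
p2d2 = p2_clock_config(12_000_000, 3, 12, 2, 0b01, 0b11)
p2eval = p2_clock_config(20_000_000, 2, 10, 1, 0b10, 0b11)

for name, (hz, setf, enaf) in (("P2D2", p2d2), ("P2-EVAL", p2eval)):
    print(f"{name}: {hz} Hz  _SETFREQ=${setf:08X}  _ENAFREQ=${enaf:08X}")
```

Swapping in other divider/multiplier values gives a quick check of the sysclk you will actually get before building.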
P2 Test Program v0003b 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 11 11 11 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 11 11 11 01 01 01 01 01 01 01 01 11 11 10 11 11 01 10 10 10 11 10 10 01 01 11 01 10 11 11 11 10 10 10 11 10 10 10 11 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 10 01 11 10 11 10 11 11 11 01 01 01 01 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 10 10 10 10
So w=5 for reliable results at 80MHz.
I will retry mine shortly at 25MHz on P2D2. I was running at 24MHz for timing above.
24MHz - w=3 (w=2 fails)
200MHz - w=4 (w=3 varies per pin so is on the edge)
300MHz - w=5 (w=4 varies per pin so is on the edge)
This is the essence of the test
                drvl    pin
                waitx   #8
                drvh    pin
                waitx   w
                testp   pin wc
                rcl     x,#1
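The w values measured at 24MHz, 200MHz and 300MHz can be converted from clocks to time. This is only a rough Python sketch: it assumes WAITX w occupies w + 2 clocks and ignores the fixed clocks contributed by the DRVx and TESTP instructions themselves, so the numbers are the WAITX portion only. The helper name `waitx_ns` is mine.

```python
# (sysclk in Hz, smallest reliable w) as reported above
measurements = [
    (24_000_000, 3),
    (200_000_000, 4),
    (300_000_000, 5),
]


def waitx_ns(sysclk_hz, w):
    """Duration of WAITX w in nanoseconds, assuming it takes w + 2 clocks."""
    return (w + 2) * 1e9 / sysclk_hz


for hz, w in measurements:
    period_ns = 1e9 / hz
    print(f"{hz // 1_000_000:>3} MHz: period {period_ns:6.2f} ns, "
          f"waitx #{w} = {w + 2} clocks = {waitx_ns(hz, w):7.2f} ns")
```

The clock count needed goes up as the period shrinks, which is what you would expect if part of the round trip is a fixed delay in nanoseconds and not a fixed number of clocks.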
Cluso,
There is a good chance that all those lags go up by one in the re-spin. I don't understand it and I'm not confident that Chip has sorted it. We might just have to live with it in the Prop2.
Hmm... That test seems to be saying the async path delays exceed one clock period at higher sysclks (which the added delay is not going to fix, but I think that was done for other reasons).
It also means
* ROM code may not work at higher sysclks (some were wanting to call into ROM routines?)
* Deterministic pin-pin paths may not be possible above some sysclk, and may require sysclk bands (with PVT edges) above that sysclk.
If you enable clocking in the digital pin mode, things should firm up. In your current test, you are stacking up a bunch of delays, whereas if clocking were to be enabled, the 3.3V I/O pin would register the input and output, hiding the internal propagation delays for which setup and hold time requirements ARE covered for clocked mode. If I could redesign the I/O pad, I would make it always synchronous (clocked).
My ROM SD code is working nicely at 24MHz which is the target speed so I think I will just leave that part of the code as-is. Do you agree?
-Phil
I agree.
There are many more levels of logic. Without those registers, the Fmax might otherwise be only 40 megahertz.
That has never made one bit of difference to consistency right from first discovery! This is why I've always been raising concern.
So, enabling clocking does increase latency by one clock, for both input and output, right?
But you are seeing additional delay times that are longer than a clock at very high frequencies. Those delays must be from the circuitry in the 3.3V I/O pad, then, which is not going to become different from what it currently is. If I had known the chip was going to actually work at 300MHz, I would have designed the I/O pad to work faster.
At this point, the I/O pad is what it is.
Wendy at ON is implementing some extra timing constraints which will group the delivery of all core-to-pin signals within 300ps, aside from being within setup-time and hold-time requirements. The pin-to-core IN signals will be constrained similarly. This will regulate asynchronous pin I/O, so that all pins will behave as identically as is reasonably possible.
And Cluso has just proven the problem is not only no better in the real silicon but the overall effect is worse because the finished product has so much over-clock-ability.
The FPGA is showing the exact same behaviour. And always has.
Here is code to test this out. It scans pins 0-61 and outputs the results to serial.
You can change the clock in the CON section (currently 300MHz for P2D2)
This is the important bit
loop            mov     x,#0
                drvl    pin
                waitx   #8
                drvh    pin
                waitx   w
                testp   pin wc
                rcl     x,#1
                or      x,#"0"
                mov     lmm_x,x
                call    #_hubTx
I run through this with w varying from #2 to #5, outputting either a 0 or a 1. I test with the pin initially low and initially high. At different speeds w needs to be different, and at some frequencies there is a w where the result can go either way, so we are on the edge with that value.
from the main test section
'+=============================================================================+
'+ Test Section +
'+=============================================================================+
                mov     w,#2
                mov     pin,#0                  ' start at P0
                mov     c,#62
.again          call    #loop
                add     pin,#1
                djnz    c,#.again
                call    #_hubTxCR
                mov     w,#3
                mov     pin,#0                  ' start at P0
                mov     c,#62
.again3         call    #loop
                add     pin,#1
                djnz    c,#.again3
                call    #_hubTxCR
                mov     w,#4
                mov     pin,#0                  ' start at P0
                mov     c,#62
.again4         call    #loop
                add     pin,#1
                djnz    c,#.again4
                call    #_hubTxCR
                mov     w,#5
                mov     pin,#0                  ' start at P0
                mov     c,#62
.again5         call    #loop
                add     pin,#1
                djnz    c,#.again5
'+-----------------------------------------------------------------------------+
                jmp     #$                      ' loop here <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
'+-----------------------------------------------------------------------------+
loop            mov     x,#0
                drvl    pin
                waitx   #8
                drvh    pin
                waitx   w
                testp   pin wc
                rcl     x,#1
                or      x,#"0"
                mov     lmm_x,x
                call    #_hubTx
                mov     x,#0
                drvh    pin
                waitx   #8
                drvl    pin
                waitx   w
                testp   pin wc
                rcl     x,#1
                or      x,#"0"
                mov     lmm_x,x
                call    #_hubTx
                call    #_hubTxCR
                flth    pin
                ret
'+-----------------------------------------------------------------------------+
The remaining code/equates are chaff to utilise the inbuilt ROM serial routines.
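If you capture the serial output, a few lines of Python will tally what each two-character result means. This is a sketch based on my reading of the loop: the first character is the read-back after the low-to-high drive (1 = new level seen) and the second is the read-back after the high-to-low drive (0 = new level seen). The names `MEANING` and `summarise` are hypothetical, and the sample string is a short excerpt from the output posted earlier in the thread.

```python
from collections import Counter

# Interpretation of each pair, assuming first char = rising test, second = falling test
MEANING = {
    "10": "both edges seen",
    "01": "neither edge seen (stale reads)",
    "11": "rising edge seen, falling edge missed",
    "00": "falling edge seen, rising edge missed",
}


def summarise(pairs_text):
    """Count how many pins fall into each category for one sweep of w."""
    return Counter(MEANING[pair] for pair in pairs_text.split())


# Excerpt from the posted results (a marginal w, where pins disagree)
sample = "11 11 10 11 11 01 10 10 10 11 10 10"
result = summarise(sample)

for meaning, count in result.most_common():
    print(f"{count:2d} x {meaning}")
```

A sweep where every pin lands in "both edges seen" is a safe w. A mix of categories across pins is the "on the edge" case.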
I don't know what to do about it. An I/O pin is big and slow compared to internal signals. It's like the difference between a hydraulic backhoe and your hands on the controls. Just to implement ESD protection on the I/O pad, you incur 1000x the capacitive loading of the core-side signals.