waitpxx detailed timing?

ags · 2014-04-07 14:44

What is the timing (in clock cycles) between a pin being detected high (or low), "releasing" the waitpxx instruction, and the start of the next instruction?

From other discussions, I believe that waitpxx requires a minimum of 6 clocks. If it is like waitcnt, the comparison happens during the 4th clock (unclear what happens during the 5th) and results are written in the 6th. So for:

waitcnt value, delay

cnt is sampled during the 3rd clock or "e" clock (using modified SDeR (fetchSource/fetchDestination/execute/writeResults) terminology for waitxx instructions of SDwm.R - or fetchSource/fetchDestination/wait/match/nothing/writeResult).

With waitpxx, is INA sampled during the 3rd clock? If so, and steps similar to waitcnt occur (fetchSource/fetchDestination/sampleINA/match/nothing/writeResult) that would mean that the following instruction begins 3 clocks after the src operand matches INA.

Is this correct? I searched for waitpne and waitne but found nothing about this. Thanks.

kuroneko · 2014-04-07 16:39

As stated in docn.errata the 4th cycle is the first match opportunity. See attached sample for more details. HTH

Phil Pilgrim (PhiPi) · 2014-04-07 19:45

I think what matters most here is how soon after the wait event occurs can an instruction effect change. So I wrote the following test program that outputs a PLL signal on a pin that's asynchronous to the system clock, does a wait, and outputs a pulse on another pin:

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB  start

  cognew(@tester, 0)

DAT

              org       0
tester        mov       ctra,ctra0
              mov       frqa,frqa0
              mov       dira,#3
:loop         waitpne   one,#1
              waitpeq   one,#1
              or        outa,#2
              andn      outa,#2
              jmp       #:loop


ctra0         long      %00010 << 26 |  %011 << 23
frqa0         long      $0fcf_a5a5
one           long      1
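
For readers who want to check how far the test signal sits from the system clock, here is a rough sketch of the frequency that ctra0/frqa0 should produce. It assumes the usual Propeller 1 counter behavior (NCO rate = FRQA * CLKFREQ / 2^32, PLL multiplies by 16, PLLDIV selects a divide of 2^(7 - PLLDIV)); those formulas are not stated in this thread, and the function name is only illustrative.

```python
# Hedged sketch: estimate the PLL pin frequency of the test program.
# Assumed (not from the thread): NCO rate = FRQA * CLKFREQ / 2**32,
# PLL multiplies the NCO rate by 16, PLLDIV divides by 2**(7 - PLLDIV).
CLKFREQ = 5_000_000 * 16      # _xinfreq * pll16x = 80 MHz
FRQA0 = 0x0FCF_A5A5           # frqa0 from the listing
PLLDIV = 0b011                # PLLDIV field of ctra0


def pll_output_hz(frqa, plldiv, clkfreq=CLKFREQ):
    """Return the estimated PLL output frequency in Hz."""
    nco_hz = frqa * clkfreq / 2**32
    vco_hz = nco_hz * 16
    return vco_hz / 2 ** (7 - plldiv)


pin_hz = pll_output_hz(FRQA0, PLLDIV)
print(f"{pin_hz / 1e6:.4f} MHz, {CLKFREQ / pin_hz:.4f} system clocks per period")
```

Under those assumptions the signal comes out a little below 5 MHz with a period that is not a whole number of system clocks, so its edges drift through every phase of the system clock.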

Here's what the scope output looks like:

attachment.php?attachmentid=108029&d=1396927096

As expected, the uncertainty between the arrival time of the leading edge and recognition by the waitpeq is 12.5 ns. The amount of time it takes between that arrival and the output to the pin to register ranges between 56.25 ns and 68.75 ns (i.e. between 4.5 and 5.5 system clocks). See correction below.

-Phil

kuroneko · 2014-04-07 19:59

@PhiPi: Am I correct in assuming that the yellow trace represents outa[1]? If so, why do you measure outa[0] from the falling edge (it's a waitpeq after all, waiting for high)?

Phil Pilgrim (PhiPi) · 2014-04-07 20:17

kuroneko,

Oh, Smile! I grabbed the wrong edge. Yellow is outa[0]; blue, outa[1]. Here's the correct display:

attachment.php?attachmentid=108028&d=1396927003

The delay ranges from 6.75 to 7.75 system clocks.

Thanks for the catch!

-Phil

Tracy Allen · 2014-04-08 23:14

Interesting approach, Phil. Had to try it. Here it is on the LeCroy with width statistics. Result confirmed: 84.00 to 96.91 ns measured, compared to 84.38 to 96.875 ns for an exact 6.75 to 7.75 clocks. Average is close to 7.25 clocks. Trace A is the difference of (1)-(2). Trace C is the histogram of widths.
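
As a quick check of the clock-to-time conversion used here, a minimal sketch assuming the 80 MHz system clock from the test program (5 MHz crystal with pll16x); the helper name is only illustrative:

```python
# Hedged sketch: convert system clocks to nanoseconds at 80 MHz.
CLKFREQ_HZ = 80_000_000  # 5 MHz crystal * pll16x, as in the test program


def clocks_to_ns(clocks, clkfreq_hz=CLKFREQ_HZ):
    """Return the duration of `clocks` system clocks in nanoseconds."""
    return clocks * 1e9 / clkfreq_hz


for clocks in (1, 6.75, 7.25, 7.75):
    print(f"{clocks} clocks = {clocks_to_ns(clocks):.3f} ns")
```

One clock is 12.5 ns, so 6.75 and 7.75 clocks correspond to 84.375 ns and 96.875 ns.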


kuroneko · 2014-04-09 06:02

@PhiPi: can you please run the test for pins 7/8 again? Thanks.

Mark_T · 2014-04-09 06:03

This begs the question of what's the fastest input pin square wave that a stream of waitpeq/waitpne instructions
can synchronize to? Harder to measure if you don't allow other instructions between them, but you could
measure the speed of a loop like

:loop
                       waitpeq  mask, mask
                       waitpne  mask, mask
                       waitpeq  mask, mask
                       waitpne  mask, mask
                       waitpeq  mask, mask
                       waitpne  mask, mask
                       waitpeq  mask, mask
                       waitpne  mask, mask
                       xor      OUTA, pmask  ' output the input divided by 8 (or 10 when close to limit)
                       jmp     #:loop

And see the frequency beyond which it suddenly slows down as synchrony is lost. For 6 cycles it should
be at clkfreq/12, but will be sensitive to duty cycle if edges are subject to unequal delays.
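
To put numbers on that estimate, here is a small sketch assuming an 80 MHz clock and a 6-clock minimum per waitpxx (both taken from earlier in the thread). It ignores the extra clocks spent in xor and jmp, so it is an idealized bound, not a measurement, and the function names are only illustrative.

```python
# Hedged sketch: idealized tracking limit for back-to-back waitpeq/waitpne.
CLKFREQ_HZ = 80_000_000   # assumed system clock
WAIT_MIN_CLOCKS = 6       # minimum waitpxx duration assumed in the thread


def max_tracked_input_hz(clkfreq_hz=CLKFREQ_HZ, wait_clocks=WAIT_MIN_CLOCKS):
    """One input period needs one waitpeq plus one waitpne."""
    return clkfreq_hz / (2 * wait_clocks)


def divided_output_hz(input_hz, pairs_per_loop=4):
    """OUTA toggles once per loop pass; a full output period is two toggles."""
    return input_hz / (2 * pairs_per_loop)


limit = max_tracked_input_hz()
print(f"limit = {limit / 1e6:.3f} MHz")
print(f"pin output at 1 MHz input = {divided_output_hz(1e6) / 1e3:.1f} kHz")
```

With four waitpeq/waitpne pairs per pass, the toggled pin runs at the input frequency divided by 8 while the loop keeps up, which is the ratio to watch for when looking for the point where synchrony is lost.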

Phil Pilgrim (PhiPi) · 2014-04-09 14:10

I tested pin 7 vs. pin 8. The results are about the same (6.75 - 7.75 system clocks):

attachment.php?attachmentid=108075&d=1397077810

Here's the revised program. You can change the constants to test any pair of pins with this one:

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

  PLL_PIN       = 7
  OUT_PIN       = 8

PUB  start

  cognew(@tester, 0)

DAT

              org       0
tester        mov       ctra,ctra0
              mov       frqa,frqa0
              mov       dira,dira0
:loop         waitpne   test_ptn,test_ptn
              waitpeq   test_ptn,test_ptn
              or        outa,outa0
              andn      outa,outa0
              jmp       #:loop


ctra0         long      %00010 << 26 |  %011 << 23 | PLL_PIN
frqa0         long      $0fcf_a5a5
dira0         long      1 << OUT_PIN | 1 << PLL_PIN
outa0         long      1 << OUT_PIN
test_ptn      long      1 << PLL_PIN

-Phil

kuroneko · 2014-04-09 18:46

Phil Pilgrim (PhiPi) wrote: »

I tested pin 7 vs. pin 8. The results are about the same (6.75 - 7.75 system clocks) ...

Thanks, I was just wondering about the pad delay effect. Seems it's negligible in this context.

ags · 2014-04-10 17:29

So although I framed my original question in terms of "how does waitpxx work at the individual clock-cycle level?", the resulting answer is actually more useful and, my inherent curiosity about how things work under the covers aside, gives me the information I need. Bottom line is that, depending on when the input edge occurs relative to the Prop internal clock edge, the next instruction will begin no more than 4 and no fewer than 2 clock cycles later.

Back to my "how does it work?" quest, I suppose this means that INA is sampled/latched in the third clock cycle ("e" using the "SDeR" terminology) as is the case with most (all?) other instructions, and there are three more clocks that pass before waitpxx completes. This also matches what I've seen stating that waitpxx takes 6+ clocks.

Still on the "what is happening?" path, if live registers (e.g. INA) are latched in "e", and the actual comparison is done in the next clock ("e+1", or "R" for "normal instructions") and the last clock is used to write the result (if specified), then what is happening in the clock cycle in between ("e+2")? Is it the same as is happening with waitcnt (whatever that is)?

kuroneko · 2014-04-21 11:18

The waitxxx instructions perform an add underneath (except waitpne, which does dst += src + 1).
