Deterministic Timing

Jim C · 2006-10-18 16:37

Team:

I am working on a timing problem wherein there are two related clock signals, call them CLK and Conv_Pin for the purpose of discussion. I have set up a counter on a pin called CLK that runs at 10 MHz. After n CLK cycles, Conv_Pin should toggle. This number n is large, say 5000, but for the purpose of testing I have set it to 5. See assembly code below:

DAT
org $0
_Clocks
mov dira, PortBits 'set portbits 4 and 5 to output for this COG

movi ctra,#%0_00100_000 'set counter mode to NCO single ended
movs ctra,#%00000100 'set APIN to portbit 4: CLK
mov frqa, Freq_ '

movi ctrb,#%0_01010_111 'Mode to POS Edge detector of Pin A
movs ctrb,#%00000100 'set monitored pin to portbit 4
mov frqb,#1 'set frqb to 1 to simplify counting

mov phsb, #0 'zero phsb

:loop
'nop
cmp Match,phsb wc 'evaluate how many edges detected
If_nc jmp #:loop 'loop if phsb hasn't reached Match

xor outa,Conv_Pin 'toggle Conv_Pin if phsb gets to Match value.
mov phsb, #0 'reset phsb counter
jmp #:loop 'continue looping

Match long 5 '5000 'declare and initialize Match variable
PortBits long %00000000_00000000_00000000_00110000 'Indicate direction of I/O bits (1 is output)
Conv_Pin long %00000000_00000000_00000000_00100000 'Conv_Pin toggles when Match count reached
Freq_ long %00100000_00000000_00000000_00000000 '2**29 >> 10 MHz

Basically, everything works, almost. CLK (Pin 4) runs happily along at 10 MHz, and Conv_Pin (Pin 5) toggles approximately every 5 full cycles of CLK. Actually, Conv_Pin toggles every 7 and a half cycles of CLK. It is the half cycle that has me stumped.

Not being exactly 5 cycles is OK, but the main problem is that Conv_Pin needs to toggle at the same time (to within 10 nsec) as CLK. Conv_Pin does toggle within 10 nsec of a CLK change (actually about 3 nsec), but it needs to be on the UP-stroke of CLK, and that only happens on the upgoing change of Conv_Pin. When Conv_Pin goes low, it is timed with the DOWN-stroke of CLK.

I figured this oddity was due to the "jmp" command having a variable number of clock cycles (4 or 8), so I tried putting a "nop" into the :loop, at the top of this loop. This changes things to the opposite, where Conv_pin toggles down on the upstroke of CLK and vice versa.

In summary, my question is how can I get the Conv_Pin to toggle both directions on an upgoing change of CLK (to within 10 nsec)?

Thanks!!

Jim C.

Graham Stabler · 2006-10-18 16:45

you might want to put your code in code tags, it makes it much easier for us to read

Graham

Jim C · 2006-10-18 17:34

I'm learning, slowly. Try this.

DAT
              org       $0
_Clocks
              mov       dira, PortBits                  'set portbits 4 and 5 to output for this COG

              movi      ctra,#%0_00100_000              'set counter mode to NCO single ended
              movs      ctra,#%00000100                 'set  APIN to portbit 4:  CLK  
              mov       frqa, Freq_                        ' 

              movi      ctrb,#%0_01010_111              'Mode to POS Edge detector of Pin A
              movs      ctrb,#%00000100                 'set monitored pin to portbit 4
              mov       frqb,#1                         'set frqb to 1 to simplify counting 

              mov       phsb, #0                        'zero phsb
:loop         
              'nop
              cmp       Match,phsb   wc                 'evaluate how many edges detected                                 
  If_nc       jmp       #:loop                          'loop if phsb hasn't reached Match

              xor       outa,Conv_Pin                      'toggle Conv_Pin if phsb gets to Match value. 
              mov       phsb, #0                        'reset phsb counter
              jmp       #:loop                          'continue looping

Match         long      5    '5000                                    'declare and initialize Match variable
PortBits      long      %00000000_00000000_00000000_00110000  'Indicate direction of I/O bits (1 is output)
Conv_Pin      long      %00000000_00000000_00000000_00100000  'Conv_Pin toggles when Match count reached
Freq_         long      %00100000_00000000_00000000_00000000  '2**29 >> 10 MHz

Thanks

Jim C.

Mike Green · 2006-10-18 17:49

I'm not sure about the 1/2 cycle. Assuming that it has to do with the relative timing between the counters and the instruction execution, you might try providing the Conv_Pin directly from CtrB. Preload CtrB with the complement of two times the number of 10MHz POS edges you want to elapse between Conv_Pin toggles. Set up the CtrB output to output to the Conv_Pin pin (from PHSB bit 31). You'll still need the assembly routine to continually check PHSB for wraparound (between 0 and the initial value or between the negative initial value and $80000000). If it's in that "invalid" range, add a correction value to it (the initial value).
Mike

cgracey · 2006-10-18 18:25

Jim,
One thing: jumps/calls always take 4 clocks, like all standard instructions. DJNZ, TJZ, TJNZ take 4 on a branch, but 8 on a non-branch. You might be out of luck on the 10ns requirement, as at 80MHz, the Propeller has a 12.5ns clock period. This is the period at which the CTRs sample the pins (and update them), and instructions take 4 of these periods (except for hub instructions and those DJNZ... mentioned above).
When tracking pin changes with the CTRs, it is better to take relative readings of the CTRs' PHS registers than to clear them. This takes a few more instructions, but is lossless. The problem with just clearing PHS is that by the time you do it, it might have already advanced from your last reading, and now that data will be lost by the clear. In the case of your code below, this would result in a cumulative error.
Keep pushing forward. You're doing great!
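
The difference between clearing PHS and taking relative readings can be illustrated with a small Python model (not Propeller code). It is only a sketch: it assumes one rising edge every 8 system clocks (10 MHz at 80 MHz) and a hypothetical 12-clock gap between reading PHSB and clearing it, and the function names are made up for illustration.

```python
def edges_by(t, clocks_per_edge=8):
    """Rising edges the counter has seen by system-clock time t."""
    return t // clocks_per_edge

def lossy_total(read_times, clear_latency):
    """Total edges accounted for when PHS is zeroed clear_latency clocks after each read."""
    total = 0
    last_clear = 0
    for t in read_times:
        # PHS only holds edges that arrived after the previous clear.
        total += edges_by(t) - edges_by(last_clear)
        last_clear = t + clear_latency
    return total

def lossless_total(read_times):
    """Total edges accounted for when each reading is compared with the previous one."""
    total = 0
    last_reading = 0
    for t in read_times:
        reading = edges_by(t)          # PHS is never cleared
        total += reading - last_reading
        last_reading = reading
    return total

reads = [40, 80, 120]
print(lossy_total(reads, 12), lossless_total(reads))
```

In this model every edge that lands between a read and the following clear goes uncounted, so the lossy total falls further behind with each pass, which is the cumulative error described above.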

Jim C said...
I'm learning, slowly. Try this.
DAT
              org       $0
_Clocks
              mov       dira, PortBits                  'set portbits 4 and 5 to output for this COG

              movi      ctra,#%0_00100_000              'set counter mode to NCO single ended
              movs      ctra,#%00000100                 'set  APIN to portbit 4:  CLK  
              mov       frqa, Freq_                        ' 

              movi      ctrb,#%0_01010_111              'Mode to POS Edge detector of Pin A
              movs      ctrb,#%00000100                 'set monitored pin to portbit 4
              mov       frqb,#1                         'set frqb to 1 to simplify counting 

:loop2        mov       phsb, #0                        'zero phsb - This is lossy because you might have missed some!!
:loop                                                   'better practice is to subtract away last reading - lossless!
              'nop
              cmp       Match,phsb   wc                 'evaluate how many edges detected
  If_nc       jmp       #:loop                          'loop if phsb hasn't reached Match

              xor       outa,Conv_Pin                   'toggle Conv_Pin if phsb gets to Match value.
'get rid of   mov       phsb, #0                        'reset phsb counter
              jmp       #:loop2                         'continue looping

Match         long      5-1   '5000-1                         'declare and initialize Match - must be minus 1 for if_nc!!
PortBits      long      %00000000_00000000_00000000_00110000  'Indicate direction of I/O bits (1 is output)
Conv_Pin      long      %00000000_00000000_00000000_00100000  'Conv_Pin toggles when Match count reached
Freq_         long      %00100000_00000000_00000000_00000000  '2**29 >> 10 MHz

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey (Parallax)) : 10/18/2006 8:10:53 PM GMT

Jim C · 2006-10-18 18:39

Mike:
I think I understand where you are headed, but I am unsure about the details. I understand how to use counters for POS detection and count transitions of a pin. That is how the code shown does it. I like your idea, but I cannot figure out which control mode to use to monitor one pin for edges, and have an output on another pin. Seems to me the edge-detection modes don't change an output pin, but only add to the PHSx register.

Jim C

Tracy Allen · 2006-10-18 18:49

Did you also try it with NEG edge detector? I really don't think that will help, because that will shift it by two whole clock cycles.

At 10 MHz output from CTRA, with 80 MHz CLKFREQ, there is time for only two PASM instructions in the 100 nanoseconds. There are two instructions in the tight test :loop, and 6 instructions in the larger loop after the match. That sort of accounts for the 7-clock repeat cycle. The whole thing will phase lock.

But I don't know about the 1/2 cycle. I have a feeling that you could make it lock by inserting one of the Prop instructions that takes an odd number of clock cycles, which would be either a dummy hub access (7..22 cycles) or a WAITxxx instruction (5... cycles). That would take some strategic planning. Say, first get locked to the hub with a dummy hub access, then execute a fixed number of cycles to set counter A to 10 MHz, and then execute another dummy HUB access to burn up 1/2 cycle, and then execute the rest of the code to start counter B. Something like that.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Jim C · 2006-10-18 19:17

Team:

I think I misunderstood Mike's suggestion. I'll have to study it and these other suggestions a bit. Then, 'scope out a few ideas. First thing I'll try is something along the waitcnt lines.

Any other ideas are welcome.

Jim C

pjv · 2006-10-18 20:40

Hi All;

Now here is a perfect application for an SX28...... provided there are not a lot of untold other things to do.

Coherence of 10 nSec ? Easy !

Remember not to short change those wonderful snappy little chips.

Cheers,

Peter (pjv)

Jim C · 2006-10-19 01:47

Team:

I have answered the question of how to align two clocks, for one simple case:

DAT
              org       $0
_Clocks
              mov       dira, PortBits                  'set portbits 4 and 5 to output for this COG

              movi      ctra,#%0_00100_000              'set counter mode to NCO single ended
              movs      ctra,#%00000100                 'set  APIN to portbit 4:  CLK  
              mov       frqa, Freq_                        ' 

              mov       Time,   cnt
              add       Time,   Delay
              nop 
              waitcnt   Time,   Delay
               
:loop
              XOR       outa,   CONV_                   ' toggle CONV
              waitcnt   Time,   Delay                   
              jmp       #:loop                      

PortBits      long      %00000000_00000000_00000000_00110000  'Indicate direction of I/O bits (1 is output)
CONV_         long      %00000000_00000000_00000000_00100000  'Conv_Pin toggles when Match count reached
Freq_         long      %00100000_00000000_00000000_00000000  '2**29 >> 10 MHz
Delay         long      40_000
Time          res       1

In this simple example, it turns out that the CONV_ pin toggles correctly both high and low, with the upstroke of the CLK pin (pin 4). The number of CLK cycles between toggles is (Delay / 8). In this example, there would be 5,000 complete CLK cycles between each CONV toggle. (I didn't count them all, but it worked correctly with a Delay from 24-80.) For reference, the 'NOP' just before the loop is necessary to line things up. Without it, the CONV toggled on the down-stroke of CLK. It seems too easy. But, hey, that's OK with me.

The reason it works so easily relates to the CLK speed. The CLK rate of 10 MHz works out such that there are 8 Propeller CNT ticks for each full CLK cycle. I don't fully understand why the Delay calculation is so straightforward, since it would seem like you'd need to take account of the other instructions in the loop. But this works...
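
The "8 CNT ticks per CLK cycle" and "Delay / 8" figures can be checked with a few lines of Python. This is a sketch of the arithmetic only; it assumes the 80 MHz system clock mentioned elsewhere in the thread, and the function names are illustrative.

```python
CLKFREQ_HZ = 80_000_000   # assumed system clock
FRQA = 1 << 29            # the Freq_ constant from the listing

def nco_frequency_hz(frq, clkfreq_hz=CLKFREQ_HZ):
    """NCO output frequency: PHSA gains frq every clock and the pin follows bit 31."""
    return frq * clkfreq_hz // 2**32

def clocks_per_clk_cycle(frq):
    """System clocks (CNT ticks) per full cycle of the NCO output."""
    return 2**32 // frq

def clk_cycles_between_toggles(delay, frq=FRQA):
    """Full CLK cycles that fit in one WAITCNT delay."""
    return delay // clocks_per_clk_cycle(frq)

print(nco_frequency_hz(FRQA))            # 10000000
print(clocks_per_clk_cycle(FRQA))        # 8
print(clk_cycles_between_toggles(40_000))  # 5000
```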

As for Peter's comparison to an SX-28, I am quite familiar with an SX solution. The Spin solution under construction here will hopefully replace an SX solution I've been working with for a year or so. The SX can nicely keep track of 10 nsec timing, but the problem I have is with the interrupts. There is no way to count 5,000 CLK counts or 4,952 counts, and then do something. The interrupts had to be powers of 2, essentially, which was quite limiting. I got it to work, but it was pretty convoluted, and I was running out of time during the interrupts to take on additional computations and communications.

Hopefully I'll be able to figure out the Propeller assembly adequately. The above solution will eliminate several hardware parts, and hopefully help modularize the code and flow such that I can keep track of things better.

Thanks for the ideas,

Jim C

Mike Green · 2006-10-19 02:12

To answer the "you'd need to take account of the other instructions in the loop." ...

The WAITCNT instruction stops instruction processing until the destination value equals the CNT value. It then adds the source operand to the destination and continues execution. Because of the wait involved, the destination register contains the expected CNT value to produce the wait requested (by the source value of the first WAITCNT). If another WAITCNT is executed any time before the CNT register equals the value in the destination register, the time between the two WAITCNTs will be exactly what is requested no matter what happens between the two WAITCNT instructions. Once the WAITCNT is executed, you can also predict the time to any other event nearby in the instruction stream. As you get further along in the instruction stream, it becomes harder for the programmer to keep track accurately of the time since the last "mark point" of the WAITCNT, but it can be done to a large extent.
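
A simplified Python model of this behaviour may help. It is a sketch under stated assumptions: it ignores the clocks WAITCNT itself needs to execute, it assumes every loop body finishes before the next target, and the names `waitcnt` and `release_times` are illustrative rather than any real API.

```python
MASK = 0xFFFFFFFF  # CNT and the target register are 32 bits wide

def waitcnt(cnt, target, delta):
    """Block until CNT equals target, then add delta to target.
    Returns (CNT at release, new target)."""
    waited = (target - cnt) & MASK
    return (cnt + waited) & MASK, (target + delta) & MASK

def release_times(body_clocks, delay, start_cnt=1000):
    """CNT values at which successive WAITCNTs release, for loop bodies of varying length."""
    cnt = start_cnt
    target = (cnt + delay) & MASK
    times = []
    for body in body_clocks:
        cnt, target = waitcnt(cnt, target, delay)
        times.append(cnt)
        cnt = (cnt + body) & MASK   # time spent in the rest of the loop
    return times

times = release_times([8, 20, 12, 8], delay=40)
print([b - a for a, b in zip(times, times[1:])])   # [40, 40, 40]
```

The loop bodies take 8, 20, 12 and 8 clocks, yet the releases stay exactly 40 clocks apart, because each target is derived from the previous target and not from the time the loop happened to finish.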

Mike Green · 2006-10-19 02:26

About the NOP ... I don't see how the NOP (in that position) should be able to make any difference in the phase of the CLK signal. The time when the first WAITCNT continues is set by the point where the "mov Time,CNT" is executed (plus the value of Delay). The same number of system clock cycles will pass by regardless of what's between that MOV instruction and the WAITCNT. If you put a NOP before the MOV instruction, that's another story. For your information, it's usually more useful to reference CNT last in a computation of delay time like

     mov   Time,Delay
     add    Time,CNT

If, for some reason, the delay time calculation is changed, it's the time between this last access of CNT and the WAITCNT instruction that is fixed by the calculation.

pjv · 2006-10-19 02:46

Hi Jim;

No doubt about it. If significant other things need to be done while outputting the 10 MHz clock, then the SX solution is likely not as good a choice as a Propeller.

My point was to not short-change what can be done with an SX.

Hope you get it all working.

Cheers,

Peter (pjv)

Jim C · 2006-10-19 03:05

Mike:

Regarding the NOP and the phase, it is a definite phenomenon. I believe it has to do with when the 10 MHz NCO is started up, relative to when the loop is started. (Tracy Allen alludes to something like this earlier in this thread.)

As for the code related to WAITCNT, you are correct: I'm not totally clear on how it works. I will study your comments.

Peter:

The SX is great, as is the IDE. I would have loved to keep the project there, but it just didn't fit any more.

Jim C

cgracey · 2006-10-19 04:30

Jim C said...

As for the code related to WAITCNT, you are correct: I'm not totally clear on how it works. I will study your comments.

The WAITCNT D,S instruction waits for CNT to equal D, then writes D+S into D, setting the target value for the next time around. There is no need to count or include cycles which transpire within the loop formed by: WAITCNT, some code, and a JMP back to the WAITCNT. You just have to be sure that your loop won't take so long that you "miss the bus" and wind up waiting 54 seconds at 80MHz for it to come by again.
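
The 54-second figure follows from the 32-bit width of CNT. A short Python sketch of that arithmetic, assuming an 80 MHz clock (the function name is illustrative):

```python
def missed_bus_wait_seconds(overrun_clocks, clkfreq_hz=80_000_000):
    """Seconds WAITCNT blocks when CNT is already overrun_clocks past the target."""
    # CNT must wrap all the way around the 32-bit range to reach the target again.
    clocks_to_wait = (-overrun_clocks) % 2**32
    return clocks_to_wait / clkfreq_hz

print(round(missed_bus_wait_seconds(1), 2))   # 53.69
```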

Mike Green · 2006-10-19 04:44

Jim,
Still, the NOP doesn't make sense since the NCO will run the same number of system clock cycles until the first WAITCNT is released whether the NOP is there or not and the timing of the Conv_pin toggles is locked into the WAITCNT loop timing. Maybe Chip can explain it.
Mike

cgracey · 2006-10-19 05:03

I agree that the NOP shouldn't make any difference. The "die is cast" when the initial waitcnt value is computed. You could execute almost 10,000 instructions in place of that NOP and it shouldn't make a difference.

Tracy Allen · 2006-10-19 05:42

The thing is, CTRA is advancing at CLKFREQ/2. Since instructions take 4 or 8 cycles, they can be either aligned with the 0 phase of CTRA or with the 1 phase of CTRA. Once it gets aligned in the initialization, it stays that way forever, because CTRA and the instruction cycle are relentlessly multiples of 2. Unfortunately, the first program Jim posted got them off on the wrong foot, so to speak, with respect to matching the output toggle to the 0-1 state change of CTRA.

That is why it needs to have some instruction in there that jogs it by one clock cycle, that is, one instruction that takes an odd number of cycles. Once done, it should again stay there forever. It is that even/odd business that is important.

Jim, about the delay values from 24 to 80. Why 24? Was that the minimum? Did it work for all values of Delay, both even and odd? The main loop should take 8 clock cycles for the XOR and the JMP, plus the time required by the WAITCNT, which will be 5 up. But I really don't understand at all where in the cycle it makes the first comparison with the main clock. I'm guessing it takes 4 cycles to execute the WAITCNT and then the state machine is testing for the match at every clock cycle, and if the match occurs, the next instruction starts on the following clock cycle. So that would be 5 total. So your loop should be able to run in 13 clock cycles minimum with the minimum value of Delay. That is why I asked about the minimum delay and the other values. If it runs in 13 cycles, the output should (????) toggle alternately on rising and on falling edges of the CTRA.

The reason I brought up the dummy HUB operation is that it can jog the operation by 7 clock cycles. That should only be necessary in the initialization, not in the main loop. Once it gets locked onto the proper odd/even cycle, it should stay there. I'm still less clear about the timing of the WAITxxx instructions.

Mike Green · 2006-10-19 06:09

Tracy,
So the way to align the instruction "clock" with the counters is to use a WAITCNT with an odd or even delay for the first WAITCNT only, then have even delay counts from then on. I'm not sure how you would use the HUB operations to reliably jog the clock phase since these can vary by an unpredictable amount unless you do two close together with a specific number of other instructions between them. Wouldn't it be easier to use a WAITCNT? Do you know how to figure the phasing with the counters? In particular, where in the 4 clock sequence does the counter mode get updated (on a mov to CTRx) and how long does it take before it does the first count cycle?
Mike

Tracy Allen · 2006-10-19 07:19

Mike, now that I look at it a little more, the HUB access might not work. After the first HUB access, the next opportunities come at 9, 25, 41, 57,... clocks later, with time for 2, 6, 10, 14, ... instructions. There will always be an odd number of clock cycles thrown away when the next HUB access starts, plus 7 cycles for the HUB access itself, and the sum is an even number. So, that doesn't have the desired effect, a shift to the opposite phase of the /2 clock.
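
The parity argument can be tabulated with a short Python sketch. It simply takes the figures from this post at face value (hub opportunities at 9, 25, 41, ... clocks after the first access, 4 clocks per ordinary instruction, 7 clocks for the hub access itself); it is not a verified model of the hub, and the function name is illustrative.

```python
def hub_jog(n_instructions):
    """Return (clocks thrown away, total clocks to the end of the second hub access)."""
    used = 4 * n_instructions          # ordinary instructions between the two accesses
    opportunity = 9                    # first hub opportunity after the first access
    while opportunity < used:
        opportunity += 16              # later opportunities: 25, 41, 57, ...
    wasted = opportunity - used
    return wasted, used + wasted + 7

for n in range(8):
    wasted, total = hub_jog(n)
    print(n, wasted, total)
```

Under these assumptions the wasted count is always odd and the total is always even, so the second hub access cannot shift execution onto the opposite phase.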

It looks like WAITCNT with odd or even argument, or WAITPxx with 0 or 1, shows more promise. I don't know how to answer the question about where in the cycle it gets updated etc. I hope Chip can fill in the details.

Jim C · 2006-10-19 12:48

Tracy said...

Jim, about the delay values from 24 to 80. Why 24? Was that the minimum? Did it work for all values of Delay, both even and odd? The main loop should take 8 clock cycles for the XOR and the JMP, plus the time required by the WAITCNT, which will be 5 up. But I really don't understand at all where in the cycle it makes the first comparison with the main clock. I'm guessing it takes 4 cycles to execute the WAITCNT and then the state machine is testing for the match at every clock cycle, and if the match occurs, the next instruction starts on the following clock cycle. So that would be 5 total. So your loop should be able to run in 13 clock cycles minimum with the minimum value of Delay. That is why I asked about the minimum delay and the other values. If it runs in 13 cycles, the output should (????) toggle alternately on rising and on falling edges of the CTRA.

I was interested in aligning the timing with the 10 MHz signal, which takes 8 counter ticks, or two assembly instructions per cycle. I tried a waitcnt delay of 12, but the loop didn't work with that. It did work with 24, and as I could use that to decipher timing issues, I didn't go shorter than that. Regarding odd number of cycle delay ticks in the loop, consider it with 17. Each loop would take 1 tick longer than two full 10 MHz cycles (8 ticks per cycle): this 1 tick would force the loop to be 1/8 cycle out of phase. So, every 8 full cycles (of the 10 MHz part) the loop toggle pin would line up with the 10 MHz pin toggle. Note I did not test this, but I'm pretty sure it would look like this. I can test this later today to confirm.
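
The drift described here is modular arithmetic on the 8-tick CLK cycle, which a few lines of Python can tabulate. This is a sketch, assuming the first toggle coincides with a rising edge of CLK; phase 0 then means "on the rising edge" and phase 4 means "on the falling edge". The function name is illustrative.

```python
CLOCKS_PER_CLK_CYCLE = 8   # 80 MHz system clock / 10 MHz CLK

def toggle_phases(period_clocks, n_toggles):
    """Position of each toggle within the CLK cycle, in system clocks."""
    return [(k * period_clocks) % CLOCKS_PER_CLK_CYCLE for k in range(n_toggles)]

print(toggle_phases(17, 9))   # [0, 1, 2, 3, 4, 5, 6, 7, 0]
print(toggle_phases(16, 4))   # [0, 0, 0, 0]
print(toggle_phases(60, 4))   # [0, 4, 0, 4]
```

The last line uses 60 clocks, which is the 7.5 CLK cycles reported in the first post; in this model that period alternates between the rising and falling edge, while any multiple of 8 stays locked to the same edge.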

Jim C

Phil Pilgrim (PhiPi) · 2006-10-19 19:13

Here's a program that will allow you to adjust the phase of the toggle:

CON

  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

  pclk          = 1
  ptgl          = 0


PUB start

  cognew(@entry, 0)

DAT

              org       0
entry         mov       dira,dira0
              mov       frqa,frqa0
              mov       phsa,#0
              mov       ctra,ctra0
              waitpeq   zero,clk_mask
              waitpeq   clk_mask,clk_mask
              add       phsa,phadj
              mov       time,period
              add       time,cnt

:loop         waitcnt   time,period
              xor       outa,tgl_mask
              jmp       #:loop

dira0         long      1 << pclk | 1 << ptgl
frqa0         long      1 << 29
ctra0         long      %00100 << 26 | pclk
phadj         long      3 << 30
period        long      16
clk_mask      long      1 << pclk
tgl_mask      long      1 << ptgl
zero          long      0

time          res       1

It works by first synchronizing the program execution to the rising edge of the 10MHz clock, using a pair of waitpeqs. Then it adjusts PHSA by a chosen amount (phadj). This allows fine-tuning of the phase relationship between the clock and the toggle signal by a judicious choice of phadj. To keep this phase relationship constant, of course, the value for period has to be a multiple of 8. The smallest value for period that this program works with is 16.
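
As a rough guide to choosing phadj, the shift it produces can be expressed in system clocks. The Python sketch below assumes only that PHSA is a 32-bit accumulator that gains FRQA on every clock, with the pin following bit 31; the function names are illustrative.

```python
def phase_shift_clocks(phadj, frqa):
    """How many system clocks of NCO advance a one-off add of phadj to PHSA is worth."""
    return (phadj % 2**32) / frqa

def keeps_phase(period, frqa):
    """True if a WAITCNT period is a whole number of NCO cycles."""
    clocks_per_cycle = 2**32 // frqa
    return period % clocks_per_cycle == 0

print(phase_shift_clocks(3 << 30, 1 << 29))   # 6.0
print(keeps_phase(16, 1 << 29), keeps_phase(17, 1 << 29))   # True False
```

With the values in the listing, phadj moves the clock by 6 of the 8 system clocks in a CLK cycle, and smaller steps are available because FRQA is added once per clock while phadj can be any 32-bit value.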

-Phil