Distributed clock synchronization

ManAtWork · 2024-04-02 09:04

Say I'd like to synchronize multiple boards, each with a P2 on it running on a separate crystal, to an external clock. Each P2 board runs a control loop with a 1ms cycle time. The crystals have +/-50ppm tolerance, so in the worst case the highest and lowest frequencies can be 100ppm apart. Providing the synchronization signal is not the problem. This is done by the field bus controller IC that provides a SYNC signal locally to each P2 with one pulse per millisecond synchronized to +/- 10ns for all boards.

As the local crystal frequency differs for each P2 they have to stretch or squeeze their cycle times to keep synchronized. Letting them drift would lead to phase jitter and eventually an over or underrun where two or no command values would be received during one control loop cycle.
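
As a rough worked example of how quickly that drift matters (a sketch only; the function names are made up for illustration), the worst-case 100ppm mismatch and the 1ms cycle from the post can be plugged into two one-line formulas. The phase slips by about 100ns per cycle, so an overrun or underrun has to occur within roughly 10 seconds of free running:

```python
def slip_per_cycle_ns(cycle_time_s: float, mismatch_ppm: float) -> float:
    """Phase slip accumulated during one control cycle, in nanoseconds."""
    return cycle_time_s * mismatch_ppm * 1e-6 * 1e9


def seconds_until_full_cycle_slip(cycle_time_s: float, mismatch_ppm: float) -> float:
    """Time until two free-running loops have drifted apart by one whole cycle."""
    drift_per_second = mismatch_ppm * 1e-6  # seconds of slip per second of run time
    return cycle_time_s / drift_per_second


if __name__ == "__main__":
    print(slip_per_cycle_ns(1e-3, 100))              # about 100 ns per cycle
    print(seconds_until_full_cycle_slip(1e-3, 100))  # about 10 s
```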

The easiest way to do this would be to adjust the PLL itself because that would keep all smart pins (ADCs, PWM etc.) and other software events, interrupts and so on in sync. The problem is that the PLL has only relatively coarse adjustment granularity. If the crystal has 25MHz and the system clock is 200MHz the smallest frequency step would be 1/512 or ~2000ppm. Static frequency difference is <100ppm and the dynamic drift due to temperature changes can be as low as 1 or 2ppm. This would mean that synchronisation with the PLL would require some sort of sub-base-period dithering of the PLL settings. I fear that dynamically changing the PLL division and multiplication factors could have a bad effect on ADC noise.

I think the smart pin PWM frame period (X[31..16]) can be changed dynamically so that's not a problem. Adjusting the PWM output value (Y[15:0]) to keep the same duty cycle is only a simple multiplication. But what about changing the ADC sample period on the fly? Especially in SINC2 or SINC3 filtering mode this is probably a bad idea. If it's possible at all without invalidating the filter accumulators it would make VIO/GIO calibration extremely complicated as past calibration values would have to be re-scaled to match the current sample period. It's probably better to calculate the ADC period for the highest clock frequency and insert gaps to slow the sampling down to the actual required sample frequency to match the external clock.
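
The "simple multiplication" for the PWM value could look like the following sketch. It assumes the duty is expressed in the same counts as the frame period; `rescale_duty` is a hypothetical helper, not a P2 library call. Rounding to the nearest count keeps the duty ratio within half a count of the original:

```python
def rescale_duty(duty_counts: int, old_frame: int, new_frame: int) -> int:
    """Scale a PWM compare value so duty/frame stays (nearly) constant.

    Integer math with round-to-nearest, as one would do on a microcontroller.
    """
    if old_frame <= 0 or new_frame <= 0:
        raise ValueError("frame periods must be positive")
    return (duty_counts * new_frame + old_frame // 2) // old_frame


if __name__ == "__main__":
    # Frame stretched by one count: a 50% duty moves from 5000 to 5001 counts.
    print(rescale_duty(5000, 10_000, 10_001))
```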

Has anybody had a similar problem? How did you solve it?

pik33 · 2024-04-02 11:32

Maybe it would be simpler to generate one clock signal for all these P2s instead of using individual crystals.

ManAtWork · 2024-04-02 12:01

Distributing the clock with an extra cable would be expensive. The boards can be 100m apart, the cable length is unknown and the signal delay has to be compensated. This is already done by the EtherCat field bus controller. Feeding the P2 with an external clock is also not an option because it would require an extra watchdog that is independent of the external clock. Otherwise a signal interruption would cause the P2 to hang and blow up the power stage. (100% PWM -> current runaway)

BTW, to measure the deviation between external and local clock it is required to measure the phase difference of the 1ms SYNC signal vs. the internal control cycle. Unfortunately, there is no smart pin counter mode that captures CT on a rising edge of a pin input. But I thought there was a smart pin mode to measure phase difference or delay between A and B input. So by sacrificing a pin and toggling it in the software loop I could measure the phase difference to the external SYNC pulse. But I can't remember which mode it was. They have such confusing names...

Rayman · 2024-04-02 12:11

Like @pik33 said: I would have some 20 MHz source and send that out on coax to all the others. Maybe have one P2 with the clipped sine CMOS output and use buffer amps to send that same signal to the others. Sorta like one does with multiple scopes:
https://www.keysight.com/blogs/en/tech/bench/2018/12/05/how-to-time-synchronize-multiple-function-generators-together

evanh · 2024-04-02 12:44

Can't it just be done at the higher millisecond level? Count the number of sysclock ticks in each millisecond interval, and do the next millisecond's scaling using that interval. Or maybe average, with a gentle phase aligning. You'd probably also want to have a fallback for loss of signal too.
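
One way to read this suggestion is the sketch below. Everything in it is an assumption for illustration: the class name, the power-of-two gains, and the sign convention (a positive `phase_error_ticks` means the local cycle started ahead of the SYNC edge, so the next cycle is stretched). A missing measurement is passed as `None` and falls back to the last average:

```python
class CycleLengthTracker:
    """Smooths measured sysclock ticks per SYNC interval and nudges the phase."""

    def __init__(self, nominal_ticks: int, rate_shift: int = 3, phase_shift: int = 2):
        self.avg_ticks = float(nominal_ticks)
        self.rate_gain = 1.0 / (1 << rate_shift)    # averaging weight per update
        self.phase_gain = 1.0 / (1 << phase_shift)  # fraction of phase error removed

    def update(self, measured_ticks, phase_error_ticks: int = 0) -> int:
        """Return the number of ticks to use for the next local cycle."""
        if measured_ticks is None:
            # Loss of signal: keep running at the last known average rate.
            return round(self.avg_ticks)
        # Exponential moving average of the measured interval length.
        self.avg_ticks += (measured_ticks - self.avg_ticks) * self.rate_gain
        # Gentle phase alignment: correct only part of the error each cycle.
        return round(self.avg_ticks + phase_error_ticks * self.phase_gain)


if __name__ == "__main__":
    tracker = CycleLengthTracker(200_000)
    for _ in range(200):
        next_len = tracker.update(200_010, 0)
    print(next_len, tracker.update(None))
```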

evanh · 2024-04-02 12:50

I mean you're not trying to build a DSO, nor even an audio system. It's only motion control. A microsecond of jitter is nothing.

ManAtWork · 2024-04-02 13:16

It's not something I build for my own purpose a single time but something I'll sell thousands of times. The EtherCat connection is already there and does the job of synchronizing the clocks extremely well. Jitter is below 10ns which is overkill for motion control, maybe. But 1µs is too much for some applications.

100ppm @ 200MHz system clock and 1ms cycle time means I have to insert or skip a maximum of 20 clocks per cycle. That's 100ns which is tolerable. So instead of trying to tune the frequency perfectly we could simply wait until the error sums up enough and then insert a "leap tick" like a day is inserted every leap year to keep the calendar synchronized to the earth/sun orbit.
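
The leap-tick idea can be sketched as a fractional accumulator. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, the frequency offset is given in whole ppm, and a positive offset means the local crystal runs fast, so the cycle needs extra ticks. Integer "micro-ticks" avoid floating-point drift:

```python
class LeapTickScheduler:
    """Spreads a small frequency offset over cycles as occasional extra/skipped ticks."""

    MICRO = 1_000_000  # accumulator resolution: 1e-6 tick

    def __init__(self, nominal_ticks: int):
        self.nominal = nominal_ticks
        self.acc_microticks = 0

    def next_cycle_ticks(self, offset_ppm: int) -> int:
        # nominal * ppm * 1e-6 ticks of error per cycle == nominal * ppm micro-ticks
        self.acc_microticks += self.nominal * offset_ppm
        whole = abs(self.acc_microticks) // self.MICRO
        leap = whole if self.acc_microticks >= 0 else -whole
        self.acc_microticks -= leap * self.MICRO  # keep only the fractional remainder
        return self.nominal + leap


if __name__ == "__main__":
    sched = LeapTickScheduler(200_000)      # 200MHz sysclock, 1ms cycle
    print(sched.next_cycle_ticks(100))      # worst case: 20 extra ticks per cycle
    slow = LeapTickScheduler(200_000)
    print([slow.next_cycle_ticks(2) - 200_000 for _ in range(5)])  # occasional leap ticks
```

With a 2ppm offset the scheduler inserts on average 0.4 ticks per cycle, which shows up as a single leap tick in two out of every five cycles.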

ManAtWork · 2024-04-02 14:42

I ran a brute force Excel sheet that calculates all possible PLL settings. The result is that there are a lot of fine adjustments possible near "odd" rational numbers but only few near the integer factors.

%DDDDDD  %MMMM...  output MHz
49       398       199.5
50       406       199.5098039
51       414       199.5192308
52       422       199.5283019
53       430       199.537037
54       438       199.5454545
55       446       199.5535714
56       454       199.5614035
57       462       199.5689655
58       470       199.5762712
59       478       199.5833333
60       486       199.5901639
61       494       199.5967742
62       502       199.6031746
63       510       199.609375
0        7         200
1        15        200
...
38       311       200
43       351       200
63       512       200.390625
62       504       200.3968254
61       496       200.4032258
60       488       200.4098361
59       480       200.4166667
58       472       200.4237288
57       464       200.4310345
56       456       200.4385965
55       448       200.4464286
54       440       200.4545455
53       432       200.462963
52       424       200.4716981
51       416       200.4807692
50       408       200.4901961
49       400       200.5

So the nearest frequencies around 200MHz are 199.6 and 200.4. The possible output frequencies are not evenly distributed but in most cases they are no more than 10kHz (~50ppm) apart. So it should be possible to select 201MHz as system clock and set the cycle time to 201,000 clocks which equals 1ms.
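
The same brute-force search can be sketched in a few lines of Python. The assumptions here are that the output is 25MHz * (M+1) / (D+1) with a 6-bit D field and a 10-bit M field, as the table suggests, and that the post-divider and any VCO range limits are ignored. Function names are made up for illustration:

```python
from fractions import Fraction

XTAL_HZ = 25_000_000  # crystal frequency assumed from the thread


def pll_frequencies(xtal_hz: int = XTAL_HZ) -> dict:
    """Map every reachable output frequency to one (D, M) field pair producing it."""
    table = {}
    for d_field in range(64):          # divider is d_field + 1
        for m_field in range(1024):    # multiplier is m_field + 1
            freq = Fraction(xtal_hz * (m_field + 1), d_field + 1)
            table.setdefault(freq, (d_field, m_field))  # keep the first pair found
    return table


def neighbours(target_hz: int, table: dict):
    """Closest reachable frequencies strictly below and above the target."""
    below = max(f for f in table if f < target_hz)
    above = min(f for f in table if f > target_hz)
    return below, above


if __name__ == "__main__":
    table = pll_frequencies()
    below, above = neighbours(200_000_000, table)
    print(float(below), table[below])   # 199609375.0 (63, 510)
    print(float(above), table[above])   # 200390625.0 (63, 512)
```

Calling `neighbours(201_000_000, table)` in the same way gives the spacing around the proposed 201MHz operating point.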

Of course, this is not possible if the system clock is used as time base for sensitive things like the Ethernet PHY clock. But serial ports and other simple things should tolerate up to 1% frequency shift without problems.

Wuerfel_21 · 2024-04-02 15:02

You could use a voltage-controlled oscillator for the clock input and use a DAC pin to fine-tune it up or down. I'm no expert on such things, but it may be possible to make that happen with just external components to the P2's internal crystal drive circuit.

evanh · 2024-04-02 15:06

One thing that needs testing is whether, if you're going to be twiddling the PLL, it can reliably shift parameters without locking up or crashing. I'm assuming a direct transition to the new frequency is desirable. The default compiler-based solution at the moment for any adjustment is to switch to RCFAST in the interim, with the intermediate interval measured in milliseconds. The reason is that the final DIVP parameter can't be adjusted reliably while sysclock is operating from the PLL.

I don't think anyone has actually tried simply adjusting the other two, MUL and DIVD, parameters without switching to RCFAST first.

Also, at the high end, there was concern about frequency overshoot as the PLL tracked into the new settings. Which was the original reason for using RCFAST before the crashes were discovered. But again, this might be a minor bump when only adjusting by a small amount.

ManAtWork · 2024-04-02 15:32

Good point. I've looked up the docs and they say

WARNING: Incorrectly switching away from the PLL setting (%SS = %11 and %CC <> %00) with %PPPP = %1111 can cause a clock glitch which will hang the P2 chip until a reset occurs. In order to safely switch away, always start by switching to an internal RC oscillator (%SS = %00 or %01), while maintaining the %PPPP = %1111 and %CC settings.

So I think changing only the %DDD and %MMM parameters should work without glitches. I have to check if PLL changes affect the ADC accuracy and noise so if there are any glitches that cause the P2 to hang I should notice.

evanh · 2024-04-02 15:57

When hand-crafting HUBSETs like this there are a couple of housekeeping items, that I know of, to deal with eventually. One is updating the system variables for clkmode and clkfreq; the other is feeding the debugger the same via pin P63. Further reading is in the Spin2 Manual under the section "DEBUG dynamic clock frequency adaptation".

ManAtWork · 2024-04-02 17:38

Thanks evan. I think that's not necessary in my case. Changing the clock frequency only by 50 or 100ppm won't affect the debugger, I bet.

Christof Eb. · 2024-04-02 17:43

Hi,
Maybe something like this might help?
https://www.electronics-notes.com/articles/electronic_components/quartz-crystal-xtal/vcxo-voltage-controlled-crystal-xtal-oscillator.php
Christof

SaucySoliton · 2024-04-02 22:38

When I made an NTSC video capture program I used a variable frequency ADC for that. On the RevA chips I used the Goertzel mode. I put the window function into the LUT instead of sin/cos. On RevC chips I use the scope filter. Sample frequency is set by the streamer frequency. I used a software PLL to adjust the streamer frequency to get a fixed whole number of pixels per line.

Consecutive scope mode samples could be summed together like this: https://forums.parallax.com/discussion/comment/1555939/#Comment_1555939 But without a guaranteed time offset between samples it may not work so well.

I'm guessing that you have a higher frequency loop for things like PWM, ADC, etc. If that needs to be perfectly phase locked together, then I second the VCXO idea from Ada.

Not sure how much ADC precision you need. I did some quick math: 160MHz sysclock / 48kHz ADC = 3333.33 clocks/sample. 3333/3334 = 1 - 0.03%. The SINC2 full-scale value is N^2, so 3333^2 / 3334^2 = 1 - 0.06%. That is a 1-2mV difference, which will probably be buried in the noise.
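
The quick math above can be reproduced with the short sketch below. The 3.3V full-scale input range used to convert the relative change into millivolts is an assumption made here for illustration, not a figure from the post; the function names are likewise hypothetical:

```python
def sinc2_full_scale_change(n_short: int, n_long: int) -> float:
    """Relative change of the SINC2 full-scale value (proportional to N^2)
    when the sample period toggles between n_short and n_long clocks."""
    return 1.0 - (n_short / n_long) ** 2


def change_in_millivolts(relative_change: float, full_scale_volts: float = 3.3) -> float:
    """Convert a relative full-scale change to mV (3.3V range is an assumption)."""
    return relative_change * full_scale_volts * 1000.0


if __name__ == "__main__":
    change = sinc2_full_scale_change(3333, 3334)
    print(f"{change * 100:.3f} %")                    # roughly 0.06 %
    print(f"{change_in_millivolts(change):.2f} mV")   # roughly 2 mV at 3.3V full scale
```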

ManAtWork · 2024-04-03 08:10

If possible I'd prefer a software solution. The $1.50 for the VCXO is not the problem but I like to avoid special parts whenever possible to avoid procurement problems.

Yes, I have nested control loops that need to be phase locked. The servo current control loop runs at ~20kHz and the velocity control loop at 4 or 5kHz. You're right, if the ADC has 20kHz sample rate then the 20 clocks per 1ms maximum would mean only one clock difference per ADC cycle. That adds 1 LSB noise worst case. I have >2^13 bits per sample so if I get 10 bit accuracy and 12 bits resolution I'm happy. Audible noise is the biggest problem, even 10 bit accuracy would be enough for motion control alone.

So I think it should work with both methods, dynamic PLL adjustment or PWM+ADC cycle time variation. But I still don't know how to measure the phase difference between the internal and external clock signal. I still haven't found a smart pin counter mode that works fully automatically to measure phase difference. I think I have to use a trick like with the P1 counters. If the SYNC pulse is wide enough I can use it for two things at the same time:
a) trigger a smart pin in %10000 mode (time A-input states)
b) trigger an interrupt
The ISR then reads the counter and subtracts it from CT to get the actual time of the trigger edge to compensate the ISR response time.
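
The arithmetic of that compensation can be sketched as follows, assuming the ISR has some way to obtain the number of sysclock ticks elapsed since the SYNC edge (`ticks_since_edge`, however it is read) and that CT is a free-running 32-bit counter that wraps. The helper names are hypothetical:

```python
MASK32 = 0xFFFF_FFFF


def edge_timestamp(ct_now: int, ticks_since_edge: int) -> int:
    """CT value at the moment of the SYNC edge, independent of ISR latency."""
    return (ct_now - ticks_since_edge) & MASK32


def signed_phase(edge_ct: int, cycle_start_ct: int) -> int:
    """Edge time relative to the local cycle start as a signed 32-bit tick count."""
    diff = (edge_ct - cycle_start_ct) & MASK32
    return diff - (1 << 32) if diff & 0x8000_0000 else diff


if __name__ == "__main__":
    # ISR entered 37 ticks after the edge; the local cycle started at CT = 900.
    edge = edge_timestamp(1000, 37)
    print(edge, signed_phase(edge, 900))
```

Masking to 32 bits keeps the result correct when CT wraps between the edge and the ISR read.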

Simonius · 2024-04-03 09:54

how about

%10010 = Time X A-input highs
Time is measured until X highs are accumulated.
X[31:0] establishes how many highs are to be accumulated.
Time is measured in clock cycles until X highs are accumulated from the input. The measurement is then placed in Z, and IN is
raised. RDPIN/RQPIN can then be used to retrieve the measurement.
?

ManAtWork · 2024-04-03 10:35

To be honest, I have difficulty understanding Chip's notation of how the counter modes work. So "time X A-input highs" means...

counter keeps last value if A=0
counter increments one for every clock cycle whenever A=1
BTW this is mode %10001.

Mode %10010 and !Y[2] should be better if it works like I expect:

reset holds the counter at $00000001
I release reset with DIRH at the beginning of my 1ms software loop
time is measured until X A-input rises are accumulated, I interpret this as "the counter starts counting immediately after reset and stops on the first rising edge if I set X=1 and Y[2:0]=001".
I can read Z and get the phase difference between my software clock and the external SYNC signal (A).

Simonius · 2024-04-03 10:43

%10010 = Time X A-input highs
Time is measured until X highs are accumulated.

If X=1 it would stop on the signal edge,
give the clock ticks since the pin was started

If X=10 it would stop after ten events, which, depending on the setting, can be:
-clock ticks while A is high
-A signal edges (both, i guess)
-A signal rises

Simonius · 2024-04-03 11:09

%10010 AND Y.[2] = Timeout on X clocks of missing A-input high/rise/edge

If no A-input high/rise/edge occurs within X clocks, IN is raised, a new timeout period of X clocks begins, and Z maintains a
running count of how many clocks have elapsed since the last A-input high/rise/edge.

Z will be limited to $80000000 and can
be read any time via RDPIN/RQPIN.

If an A-input high/rise/edge does occur within X clocks, a new timeout period of X clocks begins and Z is reset to $00000001.

evanh · 2024-04-03 11:10

Here's a table I made up a while back. It was mainly to identify which mode of counting is reported in Z, but I also relabelled some other terminology as well.

   Smartpin Counter Modes
 ==========================

    %01011_0, P_QUADRATURE              ' Count: A-B quadrature encoder
    %01100_0, P_REG_UP                  ' Count: A clock up, B enable
    %01101_0, P_REG_UP_DOWN             ' Count: A clock, B direction
    %01110_0, P_COUNT_RISES     Y=%0    ' Count: A clock up
                                Y=%1    ' Count: A clock up, B clock down
    %01111_0, P_COUNT_HIGHS     Y=%0    ' Accum: A up
                                Y=%1    ' Accum: A up, B down
    %10000_0, P_STATE_TICKS             ' Time: of prior level duration
    %10001_0, P_HIGH_TICKS              ' Time: of prior high duration

    %10010_0, P_EVENTS_TICKS    Y=%0nn  ' Time: of X number of highs/pulses/steps
                                Y=%1nn  ' Time: since latest high/rise/edge, with X timeout
    %10011_0, P_PERIODS_TICKS           ' Time: of X number of A-B cycles
    %10100_0, P_PERIODS_HIGHS           ' Accum: A up, during X number of A-B cycles

    %10101_0, P_COUNTER_TICKS           ' Time: of complete A-B cycles, for at least X duration
    %10110_0, P_COUNTER_HIGHS           ' Accum: A up, during complete A-B cycles, for at least X duration
    %10111_0, P_COUNTER_PERIODS         ' Count: of complete A-B cycles, for at least X duration

    KEY
 =========
Count  : Z result (RDPIN) counted events
Time   : Z result (RDPIN) duration of event or events, measured in sysclock ticks
Accum  : Z result (RDPIN) input gated time, measured in sysclock ticks

High   : a high level
Low    : a low level

Rise   : a low-to-high level change
Fall   : a high-to-low level change
Edge   : a low-to-high or high-to-low level change

Pulse  : a rise-to-fall high time
Step   : a rise-to-fall high time or a fall-to-rise low time
Cycle  : a rise-to-rise high-low time

Mickster · 2024-04-03 11:20

@evanh said:
I mean you're not trying to build a DSO, nor even an audio system. It's only motion control. A microsecond of jitter is nothing.

Yup, and a zillion variables. Time constants are all over the place due to temperature/load fluctuation. For me, this is up there with uSec loop sampling rates that the big names are pushing. Marketing dep't at it again.

Craig

ManAtWork · 2024-04-03 14:40

@Simonius said:
%10010 AND Y.[2] = Timeout on X clocks of missing A-input high/rise/edge

If no A-input high/rise/edge occurs within X clocks, IN is raised, a new timeout period of X clocks begins, and Z maintains a
running count of how many clocks have elapsed since the last A-input high/rise/edge.

Z will be limited to $80000000 and can
be read any time via RDPIN/RQPIN.

If an A-input high/rise/edge does occur within X clocks, a new timeout period of X clocks begins and Z is reset to $00000001.

Haha, that reminds me of some "worst ever programming language" somebody invented on purpose as a joke. "Goto" was replaced by "come from" and "if ... then" by "unless".

So mode %10010 with Y[2]=0 can be used to measure the delay from software reset until a hardware input edge, and %10010 with Y[2]=1 for the opposite, the delay from a hardware input edge to a software poll of Z. This explanation should be added to the docs.

Wuerfel_21 · 2024-04-03 22:42

@ManAtWork said:
Haha, that reminds me of some "worst ever programming language" somebody invented on purpose as a joke. "Goto" was replaced by "come from" and "if ... then" by "unless".

Ruby actually has an unless keyword. Though Spin has ifnot, which is really the same thing.

jmg · 2024-04-04 07:13

@ManAtWork said:
If possible I'd prefer a software solution. The $1.50 for the VCXO is not the problem but I like to avoid special parts whenever possible to avoid procurement problems.

You can get multi-sourced, low-cost VCTCXOs using GPS frequencies.
Those default within 0.5~2ppm and you can trim them from there.

The P2 edge I think uses a TCXO. Cheapest (VC)TCXO are usually clipped sine.

If you want 20MHz, those are available too as VCTCXO

LCSC have these 0.5ppm at 57c/100
https://www.lcsc.com/product-detail/Temperature-Compensated-Crystal-Oscillators-TCXOs_KDS-Daishinku-1XTV20000CDA_C439723.html
Digikey have these 0.5ppm at $1.69/100
https://www.digikey.com/en/products/detail/taitien/TXEAADSANF-20-000000/6127602

50ppm is quite poor in 2024 for a crystal. You can get 10ppm tolerance and 10-20 ppm drift.
Of course you do need to take care to trim the load C to centre those, or you could store a calibration offset in flash that swallows offsets, soldering shift etc.

ManAtWork · 2024-04-04 07:38

Yes I know. I normally use 20ppm crystals. But the spec allows for +/-50ppm and I have to be compatible. It also makes no difference if my crystal has 50 or 0.5ppm. I have to compensate the long term drift and it can be done with an "add or subtract a single clock per PWM/ADC period". It just happens more often for 50ppm than for 0.5ppm. And if it can be done in software I see no reason to do it with hardware no matter how cheap the VCXOs are.

jmg · 2024-04-04 07:40

@ManAtWork said:
.... Providing the synchronization signal is not the problem. This is done by the field bus controller IC that provides a SYNC signal locally to each P2 with one pulse per millisecond synchronized to +/- 10ns for all boards.

Does that error accumulate?
P2 can capture sysclks on a pin edge, but 10ns in 1ms is going to give 10ppm LSB plus some sampling jitter.
If the error averages lower, you could capture sysclks every 1000 edges or 1 sec, which now moves your LSB to 10ppb
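
The resolution argument is just timing uncertainty divided by measurement interval. A tiny sketch (the function name is made up for illustration) reproduces both figures from the post:

```python
def frequency_lsb_ppm(timing_resolution_s: float, interval_s: float) -> float:
    """Smallest resolvable frequency difference, in ppm, for one interval measurement."""
    return timing_resolution_s / interval_s * 1e6


if __name__ == "__main__":
    print(frequency_lsb_ppm(10e-9, 1e-3))  # about 10 ppm over one 1ms interval
    print(frequency_lsb_ppm(10e-9, 1.0))   # about 0.01 ppm (10 ppb) over 1000 intervals
```

This only holds if the +/-10ns error does not accumulate from pulse to pulse, which is exactly the question raised above.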

jmg · 2024-04-04 08:06

@ManAtWork said:
100ppm @ 200MHz system clock and 1ms cycle time means I have to insert or skip a maximum of 20 clocks per cycle. That's 100ns which is tolerable. So instead of trying to tune the frequency perfectly we could simply wait until the error sums up enough and then insert a "leap tick" like a day is inserted every leap year to keep the calendar synchronized to the earth/sun orbit.

This approach is valid too, and keeps the local osc simple.
You could arrange for all slaves to be a nudge too fast, so you always pause for some small amount, then the reference tick can be what keeps everything sync'd up.
You could add a missing pulse detect, so a missing tick does not lock everything up.

ManAtWork · 2024-04-04 10:35

The EtherCat fieldbus controller already has a VERY complicated circuit built in that handles the distributed clock synchronization. It uses one of the local clocks in the network as reference clock and synchronizes all others to it using some sort of PID controlled PLL. Cable delay, offset and frequency difference are stored locally. So if one or several network packets are lost the local clock continues to run with the last frequency calculated. Frequencies change only with thermal drift so once the control loop is stabilized it takes very long to drift apart. It can surely handle several milliseconds before synchronization is lost. And even if that happens it's not a catastrophic failure. Only a complete hangup of the oscillator would cause damage because the power stage would stay in the on state and current would rise uncontrollably.

@jmg said:
You could arrange for all slaves to be a nudge too fast, so you always pause for some small amount, then the reference tick can be what keeps everything sync'd up.

Yes exactly. I'll simply toggle either between two PLL settings or between two PWM/ADC cycle times, one slightly too fast and one slightly too slow. The decision is made based on the last phase comparison. If a pulse is missing (can't happen as long as the fieldbus controller still works) the P2 just sticks to the last setting which can result in a slowly increasing error but no catastrophic failure.
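
The two-setting toggle can be checked with a small simulation. This is a sketch under assumed numbers: cycle lengths of 199,980 and 200,020 ticks (nominal +/-100ppm at 200MHz), a phase defined as local cycle start minus SYNC edge in ticks (positive means the local loop is late), and an optional `sync_lost_at` index after which the last setting is simply kept:

```python
def simulate_bang_bang(true_ticks_per_ms: float, short_ticks: int, long_ticks: int,
                       cycles: int, sync_lost_at=None) -> list:
    """Return the phase error (in ticks) after each cycle of a two-setting toggle loop."""
    phase = 0.0
    chosen = long_ticks
    history = []
    for i in range(cycles):
        if sync_lost_at is None or i < sync_lost_at:
            # Phase comparison available: late loop -> short cycle, else long cycle.
            chosen = short_ticks if phase > 0 else long_ticks
        # Local start advances by `chosen`, SYNC advances by the true interval.
        phase += chosen - true_ticks_per_ms
        history.append(phase)
    return history


if __name__ == "__main__":
    locked = simulate_bang_bang(200_007.3, 199_980, 200_020, 10_000)
    print(max(abs(p) for p in locked))          # stays below the 40-tick step spread
    lost = simulate_bang_bang(200_000.0, 199_980, 200_020, 10, sync_lost_at=0)
    print(lost[-1])                             # error grows slowly without SYNC
```

While SYNC is present the phase error stays bounded by the difference between the two cycle lengths; once it is lost, the error grows linearly at the rate of the last chosen setting, which matches the "slowly increasing error but no catastrophic failure" behaviour described above.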

namibj · 2024-09-19 04:50

In a very similar spirit, though with a software/smart pin based clock-and-data recovery from an optical link as the reference source instead of some EtherCat controller's 1000 Hz square wave sync output, I've just looked at how/if the %CC setting could be used to modulate the effective load capacitance of the P2's internal crystal oscillator. It seems to give a delta more in the neighborhood of +114 ppm / +137 ppm from turning on the integrated load capacitance to either 15 pF or 30 pF.

Unfortunately it seems like it will be a couple weeks before I'll have time to go about testing this in hardware.
