I'm also starting to favour having distinct selector bits for each of the CT.CLK and xNT.CLK sources, choosing either SysCLK or a pin.
That also frees up modes, as well as making things more flexible.
See the discussion around the External-Clock Gated Timer case, where it can make more sense to clock xNT from 'the other pin' than Chip's current code does.
Further to this, I think these modes (which are usually paired)
%10101 = WP Timer For whole periods in X+ clock cycles, count time
%10111 = WP Counter For whole periods in X+ clock cycles, count periods
can be the same as these (which can be used singly)
%10011 = For X periods, count time
%10100 = For X periods, count states
with xNT.CLK changed from SC for (%10101,%10111) to Ps for (%10011,%10100), i.e. moving to a simple user select of the X-clock source can collapse these modes.
In both cases, you are tracking whole periods, with a possible delayed start.
A difference in usage, is one typically has an unknown Fi, so stipulates some min-time-to-measure, whilst the other knows ~Fi and wants to average over some X periods.
If your dynamic range is smaller, you can use less Smart Pin resource.
Limiting cases: I think the P2 can manage up to 63 Known-frequency counters, or up to 32 Reciprocal Frequency Counters (very wide dynamic range)
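To make the paired whole-period idea concrete: one Smart Pin counts sysclocks, the other counts whole input periods over the same window, and the cog divides the two. A minimal sketch, using Python purely as a modelling language (the function name, the example readings and the 160 MHz SysCLK are assumptions for illustration, not P2 code):

```python
def reciprocal_frequency(periods: int, sysclocks: int, sysclk_hz: float) -> float:
    """Estimate the input frequency from a whole-period measurement.

    periods   -- whole input periods counted in the window
    sysclocks -- sysclocks elapsed over exactly those periods
    sysclk_hz -- system clock frequency (assumed known)
    """
    if periods <= 0 or sysclocks <= 0:
        raise ValueError("need at least one whole period and a non-zero time")
    return periods * sysclk_hz / sysclocks


# Hypothetical readings: 1_000 whole periods spanned 16_000_000 sysclocks
# with SysCLK assumed to be 160 MHz.
fi = reciprocal_frequency(1_000, 16_000_000, 160e6)
```

Whether X sets a minimum time (unknown Fi) or a period count (known ~Fi), the arithmetic in the cog is the same, which is part of the argument for collapsing the modes.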
I'd ditch the 3-sample mechanism and instead have a simple config decode into a tapping termination compare, with the 9-bit counter resetting on either an input state change or terminal count. This way every sysclock is a valid sample point, for superior filtering.
The filter tappings would be at sysclock counts of 1, 3, 7, 15, 31, 63, 127, 255, 511 ... I think, or even numbers might be easier.
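One way to read that proposal, sketched as a behavioural model in Python: the counter restarts whenever the input changes, and the output only takes the input state once the counter reaches the selected terminal count. The function name and the exact reset/update ordering are my assumptions, not anything from the actual Verilog:

```python
def run_length_filter(samples, terminal_count):
    """samples: list of 0/1 pin levels, one per sysclock.
    The output follows the input only after the input has held one state
    for `terminal_count` consecutive sysclocks following a change."""
    if not samples:
        return []
    last = samples[0]
    output = samples[0]
    count = 0
    out = []
    for s in samples:
        if s != last:
            count = 0                  # input state change resets the counter
        else:
            count += 1
            if count >= terminal_count:
                output = s             # terminal count: accept state, reset
                count = 0
        last = s
        out.append(output)
    return out


# A one-clock spike at index 5 is rejected; the real edge at index 11
# is accepted three clocks later.
sig = [0] * 5 + [1] + [0] * 5 + [1] * 10
filtered = run_length_filter(sig, 3)
```

Note that a continuously noisy input never reaches terminal count in this model, so the output never updates.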
Seems excessive ?
Other issues - Current map allocates only 4 choices for filtering.
If 3 vote sampling is not good enough, surely you have more problems than a 9 bit counter can fix ?
I could see merit in making one of the 4 choices instead select a common /N filtering clock (replacing /512) that is shared across all COGs.
That would reduce the local FFs significantly, and it also allows users to define and select a slow debounce-level clock.
I recall 200ms stable being an Alarm industry spec, and 10ms is a common button-bounce poll; those would be ~66ms and ~3ms as a 3-vote filter clock.
This type of debounce-pin filtering is becoming more common in small MCUs, as the modern programmer expects more done for them..
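The arithmetic behind those figures, as a small Python sketch. It follows the approximation used above (filter clock period = stable time / votes); the 160 MHz SysCLK and the function name are assumptions for illustration:

```python
def filter_clock(stable_time_s: float, votes: int, sysclk_hz: float):
    """Return (sample period in seconds, sysclocks per sample) such that
    `votes` consecutive samples span roughly `stable_time_s`."""
    period = stable_time_s / votes
    divider = round(period * sysclk_hz)
    return period, divider


# 200 ms alarm-style and 10 ms button-style debounce with a 3-vote filter,
# SysCLK assumed to be 160 MHz.
alarm_period, alarm_div = filter_clock(0.200, 3, 160e6)    # about 66.7 ms
button_period, button_div = filter_clock(0.010, 3, 160e6)  # about 3.3 ms

# For comparison, a local 9-bit counter tops out at /512:
max_local_period = 512 / 160e6                             # 3.2 us at 160 MHz
```

Under that assumed clock, both dividers are far beyond what a 9-bit per-pin counter can reach, which is the case for a shared slow /N clock.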
Yes, getting rid of those 9-bit counters in every smart pin is a great idea. We can pick, from the system counter, transitions of interest, instead.
If you wanted to move everything global, aside from the /1, maybe only 3 taps are needed (/1, /P, /Q ), to save 1 global route ?
I could see some use for MHz-region fast sensor filtering, and 10's of ms human type filtering, which is 2 globals - is there any need for (/1, /P, /Q, /R ) ?
I don't know. We have 4 slots, don't we?
Sure, comes down to what is the cost of a global route - I was trying to save die area... ?
I did give the Alarm industry example, so if you think 4 slots and 3 globals is ok, you could pitch to that sector, saying you can filter remote lines at ~200ms, local buttons/keypads at ~20ms and fast links at ~1us ?
You are using the Smart Pins to save code size, and they do live up to their name, even for something as comparatively low-brow as filtering...
Yes.
There is even a case for a 2-vote filter, at the top end.
If you have no filter at all, just a D-FF, then there is a small but finite chance even a nano-second wide spike can be caught and stretched to 1 Clk.
If you use 3-vote filtering, you have dropped the highest pin-toggle to SysCLK/3... less than ideal...
So a 2-vote filter will avoid single-digit ns spikes being expanded, & improves the highest pin toggle to SysCLK/2. This is a common MCU filter.
what about this ? :
000 = A, B (default) No filtering.
..
100 = A, B, filtered 2-of-2 at clock/1
101 = A, B, filtered 2-of-2 at clock/P
110 = A, B, filtered 3-of-3 at clock/Q
111 = A, B, filtered 4-of-4 at clock/R
If P,Q,R are general taps, this allows any Ff, and choice of Vote.
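As a sketch of how that encoding could decode (Python as a modelling language only). The tap divider values are placeholders I picked for illustration; the proposal leaves P, Q and R open, and the encodings elided with ".." are deliberately left unspecified here:

```python
# Placeholder dividers: assumptions only, the proposal does not fix P, Q, R.
TAPS = {"1": 1, "P": 64, "Q": 1 << 20, "R": 1 << 23}

# code -> (votes, tap name); None means A, B pass through unfiltered.
FILTER_MODES = {
    0b000: None,        # A, B (default), no filtering
    0b100: (2, "1"),    # 2-of-2 at clock/1
    0b101: (2, "P"),    # 2-of-2 at clock/P
    0b110: (3, "Q"),    # 3-of-3 at clock/Q
    0b111: (4, "R"),    # 4-of-4 at clock/R
}


def decode_filter(code: int, taps=TAPS):
    """Return None for no filtering, else (votes, sysclocks per sample)."""
    if code not in FILTER_MODES:
        raise ValueError(f"encoding %{code:03b} is not specified in the proposal")
    mode = FILTER_MODES[code]
    if mode is None:
        return None
    votes, tap = mode
    return votes, taps[tap]
```

With general taps, the product of votes and divider sets the 'stable for' time, so the same four slots cover spike filtering through to human-scale debounce.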
A proper filter is where the analogue section is slower than the digital. The only time a faster analogue works is when the signal is known to be regular and can be synchronised to.
Therefore, given this is all about unknown signal timing, there is no point in having large spaced samples without also recommending a large analogue filter in front of it. If such an analogue filter exists then there is no need for multiple samples.
So, the only question on a digital filter is how many sysclock samples does it run for.
This is not a digital filter, in the strict sense - think of it more as a simple vote system, to wait-for-stable outcomes.
The input is not analog, but commonly contacts with noisy edges, with a known time-band on the noise, or some accepted 'stable for' time.
3-votes from a lower clock, will do well enough for most real-world signals, and 2-votes from a highest clock should spike filter without too much impact on highest count speeds.
It is only a filter, its sole purpose is for filtering. The accumulating method of all-or-nothing may not be any good for determining analogue signals but analogue is not the objective is it.
At an Fclk of 160 MHz, a filter that does effectively re-align input signals to the local timing reference, on a clock per clock basis, can be extremely useful.
But I don't really understand how it was crafted.
Single stage?
Will it have any chance of being immune to metastability?
Despite adding some (almost) predictable delay to bit streams from a different time base, two (or better, three) stages of filtering, linked like a shift register rather than voting two out of three or three out of three so as not to restrain the maximum data rate, could perhaps do a better job.
Only a thought...
Henrique
On every Nth clock, a new sample is taken from the input pin and it is shifted into a 3-bit buffer. The output of the filter becomes "1" when all three buffered bits are 1's. The output becomes "0" when all three buffered bits are 0's.
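That description translates fairly directly into a behavioural model. A minimal Python sketch (the function name, the `votes` parameter and the initial state are my additions for illustration; the real logic is Verilog):

```python
def vote_filter(pin, n=1, votes=3, initial=0):
    """pin: iterable of 0/1 pin levels, one per sysclock.
    On every n-th clock a sample is shifted into a `votes`-bit buffer.
    The output becomes 1 when all buffered bits are 1, becomes 0 when all
    are 0, and otherwise holds its previous value."""
    buf = [initial] * votes
    output = initial
    out = []
    for clk, level in enumerate(pin):
        if clk % n == 0:
            buf = buf[1:] + [level]    # shift the new sample in
            if all(buf):
                output = 1
            elif not any(buf):
                output = 0
        out.append(output)
    return out


# A square wave whose states each last two clocks, sampled every clock:
wave = [1, 1, 0, 0] * 3
three_vote = vote_filter(wave, votes=3)   # never sees three equal samples
two_vote = vote_filter(wave, votes=2)     # passes, delayed by one clock
```

In this model the 3-vote buffer needs each state to last at least three samples, while a 2-vote buffer passes states lasting two.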
Will it have any chance of being immune to metastability?
Usually you reduce metastability & avoid aperture effects by having just one D-FF sample the pin, in the Pin-Cell.
From there, it can be sampled again, and/or buffered as needed.
On every Nth clock, a new sample is taken from the input pin and it is shifted into a 3-bit buffer. The output of the filter becomes "1" when all three buffered bits are 1's. The output becomes "0" when all three buffered bits are 0's.
Oh, now I understand the meaning of three out of three. But, in such a way, the maximum rate of change will be restricted to 1/3 of Fclk.
We can reduce the filter criteria to just two bits, instead of three, at the highest-frequency setting. Jmg pointed that out.
Usually you avoid metastability & aperture effects by having just one D-FF sample the pin, in the Pin-Cell.
From there, it can be sampled again, and/or buffered as needed.
Sorry jmg, I can be totally wrong, but to my understanding, it's just there where metastability occurs.
A single stage of D-FF can be good for same-timebase signal sampling, but input pins are not specifically meant to receive same-timebase signals.
Altera recommends at least two-stage D-FF input circuits. Better three, as Chip has designed them (thanks Chip, your description made me (finally) understand three out of three).
But this restricts the data rate to 1/3 of Fclk. And we are talking about steady data inputs (three consecutive equal logic level bits). Noisy data streams would certainly reduce the bandwidth.
Okay, I've been thinking about this a little more. JMG has a point about the filter thing. Without an analogue filter, my idea of sampling every clock on a noisy input just results in it never stabilising. Which is not useful.
To clarify, the three sample mechanism is not a "wait for stability" mechanism at all. Rather, it's a grab some random samples and hope at least one of them is always correct. That way, through sheer luck, a continuous noisy signal will produce a result without any need for a true filter. It's definitely a tad hit'n'miss though. Just better than nothing at all.
It's definitely not perfect. And consider, also, that we do have selectable Schmitt trigger input modes.
Well, yes I've rephrased my comment as you cannot totally avoid metastable effects, but you can minimize them.
A pin-Schmitt will help, as that increases the slew rate at the D-Pin, and it is likely there are more than a single D-FF in the total chain.
With modern CMOS processes the settling times are very rapid, especially on a lightly loaded FF (i.e. a buffer will help where the pin-FF drives too many loads).
The two stage simple series D-FF does not lower the pin toggle rate, but it does increase the latency. It is probably tolerable to have that at each pin buffer.
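A quick way to see the latency-versus-rate point is a model of a plain series D-FF chain. This is a Python sketch with a hypothetical function name; it models only the clocked shifting, not metastability itself:

```python
def synchroniser(pin, stages=2, initial=0):
    """Series chain of D flip-flops, all clocked by sysclock.
    out[k] is the last stage's state during cycle k, before it captures
    on that cycle's clock edge."""
    chain = [initial] * stages
    out = []
    for level in pin:
        out.append(chain[-1])
        chain = [level] + chain[:-1]   # each stage takes its predecessor's value
    return out


# Worst case for rate: the pin toggles on every sysclock.
pin = [1, 0, 1, 0, 1, 0, 1, 0]
synced = synchroniser(pin, stages=2)
```

Every toggle comes out the far end, just `stages` clocks late, whereas a 3-of-3 vote on the same input would never change state.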
It may be that Chip has one FF in the outer ring, and another at Core logic entry.
JMG,
Here's the actual quoted documentation mentioning the glitch potential for mode %10000 = Time A-input states
If states change faster than the cog is able to retrieve measurements, the measurements will effectively be lost, as old ones will be overwritten with new ones. This may be gotten around by using two smart pins to time highs, with one pin inverting its ‘A’ input. Then, you could capture both states, as long as the sum of the states’ durations didn’t exceed the cog’s ability retrieve both results. This would help in cases where one of the states was very short in duration, but the other wasn’t.
Now for achieving mode %10001 = Time A-input high states functionality with say mode %10011 = For X periods, count time
Set X = 1 so that each high pulse time is individually captured to Z.
Set Y = %00
Set A input and B input to same source and invert B input so that the falling edge is the capture point.
For achieving mode %10001 = Time A-input high states functionality with mode %10101 = For periods in X+ clock cycles, count time
Set X = 1 so that any/all high pulses will individually capture to Z.
Set Y = %00
Set A input and B input to same source and invert B input so that the falling edge is the capture point.
Note the config is identical for both alternate modes.
Umm, no....
That only works on a repetitive signal, as the whole period modes need two edges at least.
If they could be identical, Chip would already have done that.
Only Chip does not use the word glitch for that effect, and nor would I.
What is described there, is a simple over-run problem, common to all MCU + peripherals, above some certain speed.
This does raise a valid point, which I've been looking at for a couple of days, which is how to signal and manage Smart Pin overruns.
High-rate over-run is also one reason I proposed adding these modes, where the P2 can watch high speed data, without over-run.
PWCaptureLE ( Compares Pulse width with X, captures @ X if Pe occurs before X )
PWCaptureGE ( Compares Pulse width with X, captures @ Pe if X occurs before Pe )
Getting back to how to signal and manage Smart Pin overruns :
There is a 33-bit bus, so you can read into the C status flag.
I'd suggest adding Error flags to all capture modes, so if a second capture is attempted before the first one was read, C is SET.
This also neatly allows the new modes PWCaptureLE, PWCaptureGE to signal one or two+ cases of signal exception capture.
The last usage question is whether an over-run should overwrite the old value or lose the most recent one, or whether that should be a user choice ?
This also raises a question around Auto-Zero - if there is a noisy double-edge signal, and you have both Auto-Zero and overwrite, then you will get a bad/near zero outcome.
Without Auto-Zero, that last reading is only off by the noise width; without overwrite, you get the first edge, & some may consider that the best value.
Q: Should Auto-Zero and Capture-Overwrite decisions be made user-options, the same as I suggest for CT, xNT CLK sources ?
They do tend to be differing answers, depending on the application.
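To show how the two options interact on a noisy double edge, here is a small behavioural model in Python. The class, its option names and the choice to auto-zero even when a capture is dropped are all assumptions for illustration; none of this is existing Smart Pin behaviour:

```python
class CaptureModel:
    """Timer with a one-deep capture buffer and a proposed over-run flag."""

    def __init__(self, overwrite=True, auto_zero=True):
        self.overwrite = overwrite   # over-run replaces the unread value
        self.auto_zero = auto_zero   # timer restarts at every capture event
        self.timer = 0
        self.z = None                # captured value waiting to be read
        self.c = False               # set when a capture hits an unread value

    def tick(self, clocks=1):
        self.timer += clocks

    def capture(self):
        if self.z is None:
            self.z = self.timer
        else:
            self.c = True            # second capture before the first was read
            if self.overwrite:
                self.z = self.timer
        if self.auto_zero:
            self.timer = 0           # assumption: zeroes even on a dropped capture

    def read(self):
        value, overrun = self.z, self.c
        self.z, self.c = None, False
        return value, overrun


def noisy_double_edge(overwrite, auto_zero):
    """A 1000-clock interval whose closing edge bounces 3 clocks later."""
    pin = CaptureModel(overwrite, auto_zero)
    pin.tick(1000)
    pin.capture()
    pin.tick(3)
    pin.capture()
    return pin.read()
```

In this model, overwrite plus Auto-Zero returns the near-zero value 3, overwrite without Auto-Zero returns 1003 (off by the noise width), and no-overwrite returns the first edge at 1000; all three set C.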
I see at least one mode says this %10000 = Time A-input states
Upon each state change, the prior state is placed in the C-flag buffer,
That's certainly useful, but it would be better to be able to capture the pin value, and signal an over-run.
Does that mean a 34 bit bus, that feeds Pin Value -> Z flag, and Errors -> C, is a better solution ?
(there are other modes where Pin Value -> flag could be useful)
Only Chip does not use the word glitch for that effect, and nor would I.
Nothing wrong with calling that a glitch. It's not like glitch is a technical term.
What is described there, is a simple over-run problem, common to all MCU + peripherals, above some certain speed.
This does raise a valid point, which I've been looking at for a couple of days, which is how to signal and manage Smart Pin overruns.
High-rate over-run is also one reason I proposed adding these modes, where the P2 can watch high speed data, without over-run.
You can't call it a speed/rate problem, and it is not an overrun due to pulse rate.
Yes it is an overrun but only due to the Z buffer needing to be filled twice, for both high and low times, in short order when there is a tiny pulse length. It's an oddity and glitch prone.
I see at least one mode says this %10000 = Time A-input states
Upon each state change, the prior state is placed in the C-flag buffer,
That's certainly useful, but it would be better to be able to capture the pin value, and signal an over-run.
That's not a useful example. C-flag is there to identify the high/low state (pin value) of the captured prior time reading in the Z register. It has nothing to do with any potential fault condition.
Comments
Is there a shared /512 clock, routed to all pin-cells ?
Seems excessive ?
- every clock
- every 64th clock
- every 1Mth clock
- every 8Mth clock
Is every clock really useful?
Yes.
Maybe the serial modes have something ...