I've got all this SINC2/SINC3/conversion/raw/scope stuff done, finally.
Now I can get onto updating the streamer.
Thank you, everyone, for your help on this stuff.
The SINC smart pin mode does the following:
1) Complete 1..14-bit SINC2 conversions in 2^(n-1) clocks.
2) SINC2 filtering at 1..8192-clock period
3) SINC3 filtering at 1..512-clock period
4) Raw 32 bits every 32 clocks
5) Internally or externally clocked
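For anyone who has not met SINC filters before, here is a minimal Python sketch of an order-N SINC (CIC) decimator fed by a 1-bit stream. It is a textbook model only, not the smart pin logic, and the function name and arguments are made up for illustration.

```python
def sinc_decimate(bits, period, order):
    """Textbook SINC/CIC decimator (illustrative, not the P2 hardware).

    `order` integrators run on every clock; `order` differentiators run
    once every `period` clocks and produce one reading each time.
    """
    integrators = [0] * order
    previous = [0] * order      # last input seen by each differentiator
    readings = []
    for clock, bit in enumerate(bits, start=1):
        carry = bit
        for stage in range(order):
            integrators[stage] += carry
            carry = integrators[stage]
        if clock % period == 0:
            value = integrators[-1]
            for stage in range(order):
                # Difference against the value from one period ago.
                value, previous[stage] = value - previous[stage], value
            readings.append(value)
    return readings
```

With a constant stream of ones, the readings settle to period**order once the first `order` readings have flushed through, for example 256 for SINC2 at a 16-clock period.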
Will there be any changes to the Goertzel? I have never used one and I can't help until I learn more about it.
The only thing I plan to change in the Goertzel is to have all four LUT bytes signed internally, then invert their MSBs whenever they are output to DACs. This will make it possible to use three phases for output, 0, 120, and 240 degrees, while leaving one for 90 degrees, so that we have the requisite 0 and 90 for input, but get 3-phase concurrent output.
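The MSB inversion mentioned here is the usual way to turn a two's complement byte into an offset-binary DAC code. A minimal Python sketch, assuming 8-bit values and a DAC that takes 0..255 (the function name is hypothetical):

```python
def signed_to_dac(value):
    """Map a signed byte (-128..127) to an unsigned DAC code (0..255).

    Flipping the MSB of the two's complement byte is the same as adding 128.
    """
    return (value & 0xFF) ^ 0x80
```

So -128 becomes 0, 0 becomes 128 (mid-scale), and +127 becomes 255.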
How might one use the Goertzel with multiple inputs? It seems to be a fairly common setup. The Chevy Volt motor position sensor uses one excitation signal and 2 outputs. (picture attached) An RF network analyzer generates a single phase and measures many signals. It's often desirable to know forward power, reflected power and transmitted power. Perhaps time multiplexing would work for some of these applications.
In the RF world differential signals are fairly common. Our modulators are not differential, but emulating a differential input might still improve the power supply ripple rejection. The smart pins already have sinc1 for differential signals.
Also common for RF is quadrature signals. Most quadrature demodulator ICs have differential outputs. I'm not sure if anyone will use the P2 for serious software defined radio work. If we had an external quadrature demodulator then the Goertzel hardware could be used to extract a smaller portion of the total bandwidth. Different cogs could demodulate different channels. To make sense of the output of a quadrature demodulator it's necessary to measure both the I and Q outputs at the same frequency. The current Goertzel can only monitor one input at a time.
Other things that would be good for software defined radio: sinc3 filtering of the Goertzel accumulators. Updating the streamer/Goertzel NCO frequency at any time. If we wait until the end of the cycle, that adds a variable delay which might be problematic if we are trying to phase-lock an external input. Maybe for frequency modulation too. Note that decoding color NTSC/PAL video also requires quadrature demodulation.
James,
That's known as a resolver. They're the workhorse of top-of-the-line industrial servos. Can't beat 'em for rugged feedback. I first saw them when I started working on industrial gear in the mid-1990s, and those were fully digital drives even then. I believe resolvers had been in use in analogue drives decades earlier.
SINC3 filters for the Goertzel accumulators would be awesome, but bit growth would be huge. I can only imagine how much better that "Fun With Goertzel" demo would work.
Bit growth would require 32-bit integrators for just 256 clocks' worth of accumulation. Might as well bump up to 64 bits. That would need six 64-bit integrators to cover cosine and sine. That would add 3k flipflops to the design, or a 5% increase. Oh, the result buffers would add another 1k flops.
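The 32-bit figure matches the usual CIC bit-growth rule of thumb: register width = input width + order * log2(decimation period). A small Python sketch of that rule, assuming 8-bit signed samples going into the integrators (the function name is made up here):

```python
import math

def cic_register_bits(input_bits, order, period):
    """Register width needed so a CIC filter's largest reading still fits."""
    return input_bits + order * math.ceil(math.log2(period))

# 8-bit samples, SINC3, 256 clocks per decimation
sinc3_goertzel_bits = cic_register_bits(8, 3, 256)
```

Here `sinc3_goertzel_bits` comes out as 8 + 3 * 8 = 32.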
The P2 can capture fairly good quality video. It's certainly not professional grade, but it might be worth averaging several ADC pins together to see the result. P2 video is on the bottom:
There is no window filter used here. I just stream the ADC bitstream out of a digital pin. Most of the filtering is done by the video decoder in the TV or capture card. The digital pins may not be quite linear as a delta-sigma output at these rates. Doing some filtering on the P2 might reduce this distortion. Or maybe it would be better to use a DAC instead of digital mode. This test was done at 180MHz. I added an RC filter to try to reduce some of the delta-sigma noise. These values are not at all optimal. They're just values I had at my desk. The actual video decoding is done by a video capture card, not P2 code.
Nice, Saucy! Does 250MHz look much better? I would definitely try a group of 4 pins and sum all their bitstreams.
By the way, we can implement SINC2 filtering on Goertzel X and Y. Should be good for 64K samples (clocks) per decimation, sticking with 32-bit accumulators. It would add 64 flops per cog, not much.
Earlier I mentioned using a look-up-table to filter the bitstream. This is the implementation of that idea. 8 clocks per sample. I'm running the P2 at 180MHz, so 22.5MSPS! Timing is very tight. The streamer could be modified to run this algorithm. It's just table look up, add, shift. Although, what's the point if we have the scope filter?
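The general look-up-table idea can be sketched in Python as below. This is not the code from the post (which is not shown here); it only assumes a fixed set of filter weights, a bitstream packed 8 bits per byte with the LSB first, and one 256-entry table per byte of the window. All names are hypothetical.

```python
def build_tables(weights):
    """One 256-entry table per group of 8 taps.

    Each entry is the weighted sum of the bits that are set in that byte.
    """
    tables = []
    for base in range(0, len(weights), 8):
        taps = weights[base:base + 8]
        tables.append([
            sum(w for bit, w in enumerate(taps) if (byte >> bit) & 1)
            for byte in range(256)
        ])
    return tables

def filter_with_tables(tables, stream_bytes):
    """Filter one window: one lookup and one add per byte of bitstream."""
    return sum(table[byte] for table, byte in zip(tables, stream_bytes))

def filter_bit_by_bit(weights, stream_bytes):
    """Reference version that visits every bit, for comparison."""
    total = 0
    for index, w in enumerate(weights):
        byte = stream_bytes[index // 8]
        total += w * ((byte >> (index % 8)) & 1)
    return total
```

The table version gives the same sum as the bit-by-bit version while touching the data once per byte instead of once per bit, which is where the speed comes from.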
In these tests, the video actually looks worse at higher clock frequencies. Not surprising since raising the clock frequency raises the filter cutoff frequency. Having 4 different filter options is not overkill.
Saucy, I'm wondering if there may be some value in having the SINC modes look at 2 or 4 pins, summed together, instead of just one pin. Same with SINC2 Goertzel; 4 bits summed together result in: +2, +1, 0, -1, -2, which amounts to a left-shift or no-add, in addition to what we're already doing.
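A rough Python sketch of the 4-pin sum, assuming each pin contributes one bit per clock and the sum is re-centred to -2..+2. The function names and the accumulator handling are illustrative only, not the Verilog.

```python
def summed_sample(pin_bits):
    """Four 0/1 pin bits -> a value in -2..+2 (count of ones, minus 2)."""
    return sum(pin_bits) - 2

def scaled_add(accumulator, lut_value, pin_bits):
    """Add lut_value * summed_sample using only shift, add, or no-add."""
    sample = summed_sample(pin_bits)
    if sample == 0:
        return accumulator                      # no-add case
    term = lut_value << 1 if abs(sample) == 2 else lut_value
    return accumulator + term if sample > 0 else accumulator - term
```

No multiplier is needed: the five possible sums only ever select 0, the table value, or the table value shifted left once, with add or subtract.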
I did a quick test on the P1. Breadboarded on my Activity Board with 220pF caps instead of 1nF. Input was grounded.
           Triangle    Rectangle
Mean       1143.3      1153.9
Std Dev    0.13057     4.64507
Triangular window has a standard deviation 2.8% of the rectangular window. Have we been doing ADC wrong for 12 years? I'm probably going to build a second or third order modulator next.
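The effect can be reproduced in a rough simulation. The Python sketch below assumes an idealised first-order modulator with a noiseless DC input, which is not the P1 RC circuit used for the numbers above, so the ratio it gives will differ from the measured 2.8%. It only shows the direction of the effect. All names and the 200-bit window length are arbitrary choices.

```python
import math
import statistics

def delta_sigma_bits(level, count):
    """Idealised first-order modulator: emit 1 whenever the integrator overflows."""
    acc = 0.0
    bits = []
    for _ in range(count):
        acc += level
        if acc >= 1.0:
            acc -= 1.0
            bits.append(1)
        else:
            bits.append(0)
    return bits

def window_readings(bits, weights):
    """Weighted average of each consecutive, non-overlapping window."""
    span = len(weights)
    scale = sum(weights)
    return [
        sum(w * b for w, b in zip(weights, bits[start:start + span])) / scale
        for start in range(0, len(bits) - span + 1, span)
    ]

SPAN = 200
RECTANGLE = [1] * SPAN
TRIANGLE = [min(i + 1, SPAN - i) for i in range(SPAN)]   # 1, 2, ..., 100, 100, ..., 2, 1

def compare_windows(level=math.sqrt(2) - 1, windows=500):
    """Return (rectangular std dev, triangular std dev) for a DC input."""
    bits = delta_sigma_bits(level, SPAN * windows)
    return (statistics.pstdev(window_readings(bits, RECTANGLE)),
            statistics.pstdev(window_readings(bits, TRIANGLE)))
```

Calling `compare_windows()` returns the two standard deviations over the same number of bits per reading; in this model the triangular one is the smaller of the two.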
So, I was just playing with this on a Quickstart board, working on making a 2-channel triangular-weighted ADC, but for hardware reasons I was monitoring only a single physical ADC input (so the CTRB counter is in POS mode monitoring the same input pin as used by CTRA). I found that the standard deviation of the B channel was almost 4x that of channel A, even though supposedly they were monitoring the exact same data! I also verified that I saw similar results when just using an evenly-weighted acquisition.
What fixed the issue was having the CTRB actually monitor the output pin from CTRA (using LOGIC !A mode). Once I made that change the results were similar in average and standard deviation again. I am guessing that the CTRA was doing its job and toggling the output pin, and that feedback via a resistor was changing the input pin while CTRB was busy reading it?
I noticed that the above code used the same mode I had started with, and thus might have an artificially high standard deviation reported for the regular filter mode. In my testing I found that the triangular-weighting gave a standard deviation almost exactly 1/2 that of the evenly-weighted samples.
Close. What you measured was the aperture effect of having two FFs sampling an async pin.
Small skews in tsu/th mean the exact sample time is not the same, hence the different results.
The !CTRA is what is actually fed-back, and is what is being averaged in the C, so that is the better place to add a second monitor point.
It's hard to see the improvement with using 4 ADC pins because I had to go back to a rectangular filter. I don't know if using multiple ADCs would satisfy Phil. Even if the extra resolution is not "instrumentation quality" I think it's a huge win for video and software defined radio.
If we could get "professional quality" video using 8 pins, that is fine. Why? We could use an external video decoder IC instead. Those chips need at least 9 pins for their digital data connection. 8 data plus 1 for clock. Since the ADCs can measure adjacent pins, it's possible that some of these ADCs could measure the adjacent pin while their assigned pin is used for a digital output.
Interesting discovery, Jonathan. Hopefully there is no case where the window function INCREASES the noise.
I am implementing Sinc2 filtering in the Goertzel circuit and it works amazingly better than Sinc1. However, over time, the first integrator seems to accumulate a large offset and the final output becomes fractured-looking, with a DC offset proportional to the dwell time.
Now, in normal ADC Sinc2 filtering, we are only accumulating zeros and ones, whereas in this case we are accumulating 8-bit signed values in the first integrator. Being signed, they don't necessarily grow terminally, but there is an offset in the ADC that will accumulate and this seems to cause a mess after 5 minutes, or so.
Any ideas?
What should Sinc2 filtering look like for signed samples?
Are the 8-bit values sign-extended in the first integrator?
The issue must be a computational/algorithmic one. Any analogue imbalance should not cause a runaway output after the diff'ing. It must be small errors though, as in rounding, or it wouldn't be so slow.
PS: When I said apply an offset, I meant applied to output of the regular unsigned algorithm. Then it can be used as a signed integer.
The "regular algorithm" uses two's complement arithmetic. The difference here is that the input is 8-bit, not 1-bit. I think the maximum decimation rate R has been reduced from 1024 to 512. How many bits are being used for the accumulators?
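One thing that can be checked outside the hardware is whether two's complement roll-over by itself upsets a SINC2 filter fed with signed samples. The Python sketch below assumes a textbook two-integrator, two-differentiator structure, 24-bit registers, and noisy samples with a +5 DC offset. It is not the Goertzel Verilog and does not explain the drift reported above; the names are invented for the test.

```python
import random

def wrap_signed(value, bits):
    """Reduce an integer to a `bits`-wide two's complement value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def sinc2_signed(samples, period, bits=None):
    """Two integrators, two differentiators. bits=None means unlimited width."""
    fit = (lambda v: v) if bits is None else (lambda v: wrap_signed(v, bits))
    acc1 = acc2 = last_acc2 = last_diff = 0
    readings = []
    for clock, sample in enumerate(samples, start=1):
        acc1 = fit(acc1 + sample)
        acc2 = fit(acc2 + acc1)          # wraps many times with a DC offset
        if clock % period == 0:
            diff = fit(acc2 - last_acc2)
            last_acc2 = acc2
            readings.append(fit(diff - last_diff))
            last_diff = diff
    return readings

def offset_noise(count, offset=5, seed=1):
    """Signed samples in -127..127 with a DC offset, as stand-in test data."""
    rng = random.Random(seed)
    return [max(-127, min(127, rng.randint(-100, 100) + offset))
            for _ in range(count)]
```

In this model the wrapped 24-bit version gives exactly the same readings as the unlimited-width version, even though the second integrator rolls over repeatedly. That holds as long as the registers are at least as wide as the largest possible reading (here 127 * 64 * 64, which fits easily in 24 bits).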
Is it possible that the unbalanced range (+127,-128) is at fault?
This might be the issue, leading eventually to an overflow (or strictly speaking underflow) in the first accumulator. Is there any check for overflow?
The sin and cos table values range from -127..+127, so that's not an issue.
The first integrator's accumulations do accrue over time, though, and start to cause mild distortions, which get more pronounced over 15 minutes, causing the power baseline to become proportional to the measurement time. As well, the measurements become more grainy.
I've made a way to periodically reset the first integrators, but I think we need a more automatic approach.
That would be a good candidate for the problem. I suspect 2's complement roll-over is not mathematically compatible with a balanced signed input. And vice versa.
The Wikipedia article on Goertzel's algorithm links to a paper by W. M. Gentleman, "An error analysis of Goertzel's (Watt's) method for computing Fourier coefficients", The Computer Journal 12 (2): 160–164 (1 February 1969). The link in the article's footnotes leads to a copy of the paper, whose abstract states: "Goertzel's method, also known as Watt's algorithm, is one of the three standard methods of computing Fourier coefficients, and is commonly used when only a small number of coefficients is desired for a given sequence. This paper gives a floating point error analysis of the technique, and shows why it should be avoided, particularly for low frequencies."
After some experimentation, I found that periodically clearing the first integrators in the Goertzel circuit, by temporarily switching from Sinc2 to Sinc1, and back to Sinc2, gets rid of the buildup problem.
The new Goertzel circuit allows up to 4 ADC input pins with selectable polarity for each. This is going to make relative measurements possible, among other things.
In these videos, there is a stable 1MHz sine coming into one pin and a swept sine above and below 1MHz coming into another pin. The two pins are summed to form the input sample. The sample is then multiplied each clock by cosine and sine values which are accumulated and periodically converted to power (hypotenuse).
For the Sinc2 video, I'm clearing the first integrators at the outset of each swept-sine measurement. You can see the resolving difference between Sinc1 and Sinc2 quite nicely.
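The measurement described above (multiply each sample by cosine and sine, accumulate, then take the hypotenuse) can be sketched in a few lines of Python. This is a floating-point model with plain (Sinc1-style) accumulation, assuming a whole number of reference cycles per measurement; the function name is hypothetical and none of this is the hardware's fixed-point arithmetic.

```python
import math

def tone_power(samples, cycles):
    """Correlate samples against cos/sin references and return the magnitude.

    `cycles` is the number of whole reference cycles spanning len(samples).
    """
    count = len(samples)
    x = y = 0.0
    for k, sample in enumerate(samples):
        angle = 2 * math.pi * cycles * k / count
        x += sample * math.cos(angle)    # cosine accumulator
        y += sample * math.sin(angle)    # sine accumulator
    return math.hypot(x, y)
```

A unit-amplitude sine at the reference frequency gives a magnitude of half the sample count, while a sine that completes a different whole number of cycles in the same span gives essentially zero.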
Comments
Sounds great. Hopefully not too much added logic cost for doing this?
What are the expected sample rates vs ENOB on each of these, and do you have DNL/INL plots from real P2 captures through these filters?
So, I was just playing with this on a Quickstart board, working on making a 2-channel triangular-weighted ADC, but for hardware reasons I was monitoring only a single physical ADC input (so the CTRB counter is in POS mode monitoring the same input pin as used by CTRA). I found that the standard deviation of the B channel was almost 4x that of channel A, even though supposedly they were monitoring the exact same data! I also verified that I saw similar results when just using an evenly-weighted acquisition.
What fixed the issue was having the CTRB actually monitor the output pin from CTRA (using LOGIC !A mode). Once I made that change the results were similar in average and standard deviation again. I am guessing that the CTRA was doing its job and toggling the output pin, and that feedback via a resistor was changing the input pin while CTRB was busy reading it?
I noticed that the above code used the same mode I had started with, and thus might have an artificially high standard deviation reported for the regular filter mode. In my testing I found that the triangular-weighting gave a standard deviation almost exactly 1/2 that of the evenly-weighted samples.
thanks,
Jonathan
I am implementing Sinc2 filtering in the Goertzel circuit and it works amazingly better than Sinc1. However, over time, the first integrator seems to accumulate a large offset and the final output becomes fractured-looking, with a DC offset proportional to the dwell time.
Now, in normal ADC Sinc2 filtering, we are only accumulating zeros and ones, whereas in this case we are accumulating 8-bit signed values in the first integrator. Being signed, they don't necessarily grow terminally, but there is an offset in the ADC that will accumulate and this seems to cause a mess after 5 minutes, or so.
Any ideas?
What should Sinc2 filtering look like for signed samples?
Are the 8-bit values sign-extended in the first integrator?
Yes, of course.
We need to find out how the inc/dec version works. There must be some simple difference, but I'm not figuring it out.
I gather there is a good reason for wanting it as signed integers. May just have to apply an offset.
Yes, I am still trying to get my head around what the problem is, exactly.
something like this:
accum <- ((n-1) * accum) / n + sample
that should fight DC accumulation, while preserving the time constant if n is large enough.
in P1 pseudo-code 🙄 :
might require some headroom bits though.
Jonathan
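The P1 pseudo-code from that post is not reproduced here. As a stand-in, this is a minimal Python sketch of the same formula, accum <- ((n-1) * accum) / n + sample, using floats so that rounding does not cloud the picture. The function names are made up, and an integer version would need the extra headroom or fraction bits mentioned above.

```python
def plain_integrator(samples):
    """Ordinary accumulator: a DC offset makes it grow without limit."""
    accum = 0.0
    for sample in samples:
        accum += sample
    return accum

def leaky_integrator(samples, n):
    """accum <- ((n - 1) * accum) / n + sample, with time constant ~n samples."""
    accum = 0.0
    for sample in samples:
        accum = (n - 1) * accum / n + sample
    return accum
```

With a constant input of 5 and n = 64, the plain integrator reaches 25000 after 5000 samples and keeps climbing, while the leaky one settles at n * 5 = 320.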
The "regular algorithm" uses two's complement arithmetic. The difference here is that the input is 8-bit, not 1-bit. I think the maximum decimation rate R has been reduced from 1024 to 512. How many bits are being used for the accumulators?
CIC Filter Introduction shows the number of bits needed on p.4
The Wikipedia article on Goertzel's algorithm links to a paper by W. M. Gentleman, "An error analysis of Goertzel's (Watt's) method for computing Fourier coefficients", The Computer Journal 12 (2): 160–164 (1 February 1969). The link in the article's footnotes leads to a copy of the paper, whose abstract states: "Goertzel's method, also known as Watt's algorithm, is one of the three standard methods of computing Fourier coefficients, and is commonly used when only a small number of coefficients is desired for a given sequence. This paper gives a floating point error analysis of the technique, and shows why it should be avoided, particularly for low frequencies."
I haven't tried to read it, but I assume there is a recursive nature to the Goertzel algorithm, i.e. it can build its own error over time.
That's a key feature of the sincX algorithm I believe, it is lossless so doesn't accumulate errors of its own making.