I'm going to say that the Tukey is better at the same length. The 16 and 24 Tukey windows may compare favorably to the longer trapezoid32. I'd say trapezoid32 is clearly better than Tukey8. And the Tukey8 will only be worse when quantized.

The problem is we could debate window functions for days. The optimal window solution is likely to be this: https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-230001.3.2 including the overlapping samples. Unfortunately this is likely expensive to implement. It seems like the Goertzel mode could be used to implement the window function semi-autonomously.

Anything that comes out to zero on the edges is not contributing. Either chop them off or recalculate.

Seconded. And apply this to the other end where the values are 64. That saves us from having to generate that sixth bit. But if having a zero or so at the end helps with quantization or the generation logic, ok.

Tukey8 used for this plot
0.029560
0.119069
0.257159
0.426219
0.604687
0.769805
0.900515
0.980147

Have there been any window function discussions for the P1 adc? All it should take is writing the window function values into FRQx. It could even handle overlapping windows by using the high and low words as separate samples.
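A rough software model of that FRQx idea, for anyone who wants to try the arithmetic before touching hardware. It assumes a P1 counter mode that adds FRQx to PHSx on every clock where the ADC bit reads high. The function names and the small trapezoid window are made up for illustration, and the packed version only works while the low-word total stays below 65536.

```python
def windowed_reading(bits, window):
    """Model PHSx accumulating the current FRQx value whenever the ADC bit is 1."""
    phs = 0
    for bit, frq in zip(bits, window):
        if bit:
            phs = (phs + frq) & 0xFFFFFFFF  # PHSx is a 32-bit register
    return phs


def packed_overlap(bits, win_a, win_b):
    """Pack two window weights into one FRQx value: win_a high word, win_b low word."""
    phs = 0
    for bit, a, b in zip(bits, win_a, win_b):
        if bit:
            phs = (phs + ((a << 16) | b)) & 0xFFFFFFFF
    return phs >> 16, phs & 0xFFFF


ramp = list(range(1, 9))
window = ramp + [8] * 16 + ramp[::-1]      # hypothetical 32-clock trapezoid, peak 8
complement = [8 - w for w in window]       # second window for the overlapped sample
bits = [1, 0] * 16                         # half-scale test bitstream

print(windowed_reading(bits, window))
print(packed_overlap(bits, window, complement))
```

Software would rewrite FRQx once per clock step of the window, which is the expensive part on a real P1.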

p.s. Happy Thanksgiving! Tukey is making me think of Turkey. Turkey Turkey Turkey! :-*

Any chance we can expose the raw bit stream as a mode?

I do think a filter mode should be baked in. Many will be happy, it will just work. Over time, software will improve. Getting the bit stream would allow for that, and so what if a COG has to process it worst case? For the people seeking better, they will do it, and it will be worth it.

More pictures. These are details of 4096-sample conversions (12-bit) with the ADC receiving a finely-stepped ramp.

First, the straight accumulator approach (no windowing). The monitor DAC was wrapping vertically because of the noise amplitude:

It looks like the windowing is getting rid of sporadic +1/+2 contributions from the initial and terminal bit samples.

So, could you explain it in a way that lets me grasp what I see? Is the analog input voltage a staircase function?
Do you read out the counter at a given period?
Do you subtract the last read value from the current value?

If I didn't know better, I would think you look into the future as you start and into the past as you stop reading ;-)

And just another weird idea: couldn't it be that the spikes are an overflow of the scope, or of the ADC that outputs the signal?

That is very close to my guess.

Yes, where you see the full-span noise, the 8-bit DAC is rolling over. That's not actually what the signal looks like. I was just outputting {reading[3:0], 4'b0} to the DAC, so we could clearly see LSB activity.

Evanh, we could have an ADC mode in the smart pin that does run continuously. It would be great if it output a new sample on every clock. That way, it could be read at any time without concern for framing.

The trapezoid filter is just a counter. It's much less expensive than a Tukey filter. We can cheaply do a trapezoid 64 or 128. For the same length, I've noticed that the Tukey filter is absolutely better than the trapezoid filter.

That Vorbis filter looks ideal, but very expensive.

One great development from all this, as Saucy pointed out, is that the P1 can easily benefit from windowing to clean up its ADC readings. Something new.

Saucy, could you show response of a trap64 and trap128, relative to the others?

That's expected behaviour: ADCs that use filters usually specify a settling time. FWIR it is ~3 samples.
You could check the settling time of your filter after a source change?

Evanh, we could have an ADC mode in the smart pin that does run continuously. It would be great if it output a new sample on every clock. That way, it could be read at any time without concern for framing.

Yes, but you lose the auto/self-calibration features, and an ADC that lacks calibration is limited in use....

Chip, I'm working on a 16 sample Tukey now - nearly done.

Super. Maybe it could top-out at 32.

Now I've got the hang of this we could have max value of 32 or 64. I've called the new one tonight Tukey24/64 (samples/max) and we could have the following set:

The sad part is that the windowing needs to occur on the original bitstream. Once you have a composite sample, it's too late.

Thinking some more about the mechanism here, and the outlier removal nature of the filter...
When it does slice a rounded up portion of a bit on the ends, the next sample tends to miss that (ie it rounds down) - you can see on the plots that a common pattern is a 2 LSB step.
That behaviour means the info is not totally lost, and some multiple-sample smoothing can remove some of the outliers.
eg if you take 2 readings, and they differ by > 1 LSB, you can use the average instead. That's a natural LPF.
That can repeat over more samples, eg 4 samples can have 3 decisions, then 2 decisions then 1
Another simpler smoother could be to take 3 samples, and sum first and last, and 2x middle then divide that by 4, for a coarse window approach.
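Both smoothing ideas are easy to try on a list of readings. This is only a sketch of how I read the suggestion; the function names, the integer division and the sample readings are my own assumptions.

```python
def smooth_121(readings):
    """Coarse 3-sample window: (first + 2*middle + last) / 4."""
    return [(readings[i - 1] + 2 * readings[i] + readings[i + 1]) // 4
            for i in range(1, len(readings) - 1)]


def average_outlier_pairs(readings, threshold=1):
    """Use the pair average when two consecutive readings differ by more than threshold LSB."""
    out = []
    for a, b in zip(readings, readings[1:]):
        out.append((a + b) // 2 if abs(a - b) > threshold else b)
    return out


readings = [100, 102, 100, 101, 101, 103]   # made-up readings with 2 LSB steps
print(smooth_121(readings))
print(average_outlier_pairs(readings))
```

The pair averaging can be repeated on its own output to get the 3, then 2, then 1 decision cascade mentioned above.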

Evanh, we could have an ADC mode in the smart pin that does run continuously. It would be great if it output a new sample on every clock. That way, it could be read at any time without concern for framing.

As far as I get it: that's just what is needed.
I do not know anything about the streamer. So here is what I can imagine: every clock the comparator creates a 0 or 1. 32 bits need 32 clocks, time enough to run some assembler code. If there is also a counter that counts the number of bits set, we can read this counter at clkfreq/32 and have 5 bits of resolution.
We are still able to access the bits of the bitstream and apply some sliding average or whatever we find to be useful.
If in the best case we can only read the bit counter every second clock anyway, it may be useful to combine two consecutive bits to create a new bitstream and then average 3 bits of this bitstream. That could filter noise and still keep the signal as fast as possible, that is, with low phase shift!

The biggest logic shrink comes from 24 -> 16, as 16 -> 24 increases 'rom lines' by 50%, but 32->64 adds one adder bit, and one bit to the rom widths, for a 20% cost.
Put another way, the X-axis has a higher cost than the Y axis.

Of course, if that 20% gives no measurable gain, then prune it too. That's why these need to be range-measured to know what the gains actually are.

You can drop from 16 too, and each drop saves 'rom lines' so saves logic.
To me, the best cosine fit is when the middle samples are all in a line (ie best fit the straightest part of the cosine).
Currently,
Tukey16/32, 0, 1, 2, 4, 6, 8,11,14,18,21,24,26,28,30,31,32
has 2.3.3.4.3.3.2, but a small shrink in X could produce 3.4.4.4.4.3 - perhaps around Tukey12 ?
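For checking tables like this, here is a small sketch. The formula is an assumption on my part: a raised-cosine ramp sampled at half-step offsets and rounded. The thread does not say how the table was generated, but this happens to reproduce the Tukey16/32 values quoted above, and the helper name `tukey_ramp` is hypothetical.

```python
import math


def tukey_ramp(n, peak):
    """Raised-cosine ramp sampled at half-step offsets, rounded to integers."""
    return [round(peak / 2 * (1 + math.sin(-math.pi / 2 + (k + 0.5) * math.pi / n)))
            for k in range(n)]


table = tukey_ramp(16, 32)
steps = [b - a for a, b in zip(table, table[1:])]   # increment between neighbours

# Unrounded size of the centre step, for comparison with the rounded increment
centre_step = 32 * math.sin(math.pi / 32)

print(table)
print(steps)
print(round(centre_step, 5))
```

The middle of `steps` is the 2.3.3.4.3.3.2 pattern, and calling `tukey_ramp(12, 32)` gives a candidate for the shorter table.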

Below is a graph of Tukey16/32 showing the difference between it and the trapezoidal ramp. Note the symmetry in the orange values. Adding them to a simple counter will result in the Tukey window.

The peaks near fs/7 are:
Rectangular -66
32purple -87 (-21)
64green -92 (-5)
128sky -99 (-7)
256red -105 (-6)

About 6dB per doubling. So doubling the length halves the noise voltage.
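The arithmetic behind that, as a quick check. The numbers are the peak levels listed above; keying them by window length is my own labelling.

```python
# Peak levels near fs/7 in dB, keyed by window length (from the list above)
peaks_db = [(32, -87), (64, -92), (128, -99), (256, -105)]

# Change in dB each time the window length doubles
deltas = [b[1] - a[1] for a, b in zip(peaks_db, peaks_db[1:])]
mean_delta = sum(deltas) / len(deltas)


def db_to_voltage_ratio(db):
    """Convert a level change in dB to a voltage (amplitude) ratio."""
    return 10 ** (db / 20)


print(deltas, mean_delta)
print(db_to_voltage_ratio(mean_delta))   # close to one half
```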

No doubt that at longer lengths the trapezoid is far cheaper than a Tukey.

I think the Goertzel mode would work for those who want a more complex window function. I wonder if clearing the accumulators would cause any problems for using the sine and cosine accumulators as overlapping window functions. I'm thinking not, as long as they are read and cleared atomically. In the USB host, I couldn't clear PHSx because the P1 had about 7 clocks between reading it and clearing it.

Idea for how to get overlapped windowed samples with the Goertzel mode:
Use the NCO to ramp up the window.
Stop at the top.
Read the accumulators.
Ramp back down.
Stop at the bottom.
Read the accumulators.
Add 2 readings together to get a sample.
The LUT is chosen so one window ramps up while the other ramps down. It's necessary to read the accumulator while one of the windows is zero. The other window will be at full value, but that should not be a problem since any noise from that edge should be compensated when the other half is added.
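A software sketch of why complementary ramps make the overlap work. This does not model the Goertzel hardware or the NCO. It just shows that when one window ramps down with the mirror image of the ramp the next one climbs, every bit in the interior ends up with the same total weight. The ramp is the Tukey16/32 table from this thread; the plateau length and function names are assumptions.

```python
RAMP = [0, 1, 2, 4, 6, 8, 11, 14, 18, 21, 24, 26, 28, 30, 31, 32]  # Tukey16/32
PLATEAU = 48                                   # assumed flat-top length in clocks
WINDOW = RAMP + [32] * PLATEAU + RAMP[::-1]    # ramp up, hold, ramp down
HOP = len(RAMP) + PLATEAU                      # next window starts as this one ramps down


def overlapped_samples(bits):
    """One windowed sum per hop; consecutive windows overlap by one ramp length."""
    count = (len(bits) - len(WINDOW)) // HOP + 1
    return [sum(w * b for w, b in zip(WINDOW, bits[i * HOP:i * HOP + len(WINDOW)]))
            for i in range(count)]


def total_weight(n_bits):
    """Total weight each bit position receives across all overlapping windows."""
    weight = [0] * n_bits
    count = (n_bits - len(WINDOW)) // HOP + 1
    for i in range(count):
        for j, w in enumerate(WINDOW):
            weight[i * HOP + j] += w
    return weight


n_bits = 4 * HOP + len(RAMP)
weights = total_weight(n_bits)
print(set(weights[len(RAMP):n_bits - len(RAMP)]))   # interior weight is constant
print(overlapped_samples([1] * n_bits))
```

Only the first ramp up and the last ramp down see less than full weight.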

Yes, that +4 increment in Tukey16/32 due to rounding is slightly troubling, as the actual difference is only 3.13655.

I can create any TukeyX/Y with a Chi-square goodness of fit value and Tukey12/32 fits very well.

As you were typing, I was running the tables on exactly that set (I took a guess), and it shrinks the rom lines from 12 total to 8 total.
Question is, is the truncation starting to show yet, on real ADC INL/DNL numbers, or can we go further ?

Wasn't understanding, at first, how Goertzel could be helped by windowing, too. Totally get it now. Duh.

Whoa! Wait. I just realized it's not so simple. We take the one bit from the ADC and we use it to add or subtract two signed 8-bit LUT values to/from two 32-bit accumulators. We would have to have multipliers to scale those 8-bit values. Gets a little more expensive.

Thanks, Saucy.

This makes me realize that a Tukey just gets you there faster, allowing more signal in earlier. For a 4096-sample conversion, a trapezoid256 would only diminish amplitude by 1/16th. We could make it settable. Or, always proportional to the sample size.
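The 1/16th figure can be checked by comparing the area under the trapezoid with the area of a rectangular window of the same length. A minimal sketch; the function name and the way the ramp is sampled are my assumptions.

```python
def trapezoid_gain(total, ramp):
    """Area of a trapezoid window relative to a rectangular window of the same length."""
    window = [min(i + 1, total - i, ramp) / ramp for i in range(total)]
    return sum(window) / total


loss = 1 - trapezoid_gain(4096, 256)
print(loss)                                  # close to 1/16

# Ramp kept proportional to the sample size gives roughly the same loss
print(1 - trapezoid_gain(1024, 1024 // 16))
```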

I woke up this morning and realised there was an obvious extra step to try with Sinc2. Since the leading two samples of the rolling Sinc2 output are clearly poorer than the rest, why not do a shorter window and roll for multiple samples and only use the third or fourth sample for each data point of the non-rolling graph ...
Results look good! Not as perfect as unbroken but I think it might be the "good enough" I was after.

Someone with filter knowledge might be able to say how good.

The graph data points are each made from four rolling 64 clock samples with a filter reset in front of the leading sample. Each numerical value is sample3 + sample4 summed, with sample1 and sample2 discarded.
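For anyone wanting to reproduce the recipe offline, a small software model. It assumes a plain second-order CIC structure (two integrators running every clock, two differentiators at the decimation rate) with all state cleared at reset. It follows the steps described above and is not a model of the smart pin logic; the function names are hypothetical.

```python
def sinc2_after_reset(bits, decimation=64):
    """Second-order CIC decimator started from a full reset; returns the rolling outputs."""
    acc1 = acc2 = 0      # integrators, cleared by the reset
    prev1 = prev2 = 0    # differentiator state, cleared by the reset
    outputs = []
    for n, bit in enumerate(bits, start=1):
        acc1 += bit
        acc2 += acc1
        if n % decimation == 0:
            diff1 = acc2 - prev1
            prev1 = acc2
            diff2 = diff1 - prev2
            prev2 = diff1
            outputs.append(diff2)
    return outputs


def data_point(bits, decimation=64):
    """Four rolling samples after a reset: discard sample1 and sample2, sum sample3 + sample4."""
    s1, s2, s3, s4 = sinc2_after_reset(bits[:4 * decimation], decimation)
    return s3 + s4


print(data_point([1] * 256))      # full-scale bitstream
print(data_point([1, 0] * 128))   # half-scale bitstream
```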

octave:2> linspace(-4,4,8)'/10 % What I actually did
ans =
-0.400000
-0.285714
-0.171429
-0.057143
0.057143
0.171429
0.285714
0.400000
octave:3> linspace(-5,5,10)'/10 % What I meant to do (and remove the first and last)
ans =
-0.500000 % sin(-0.5*pi)=-1 when we add 1 this value is 0
-0.388889
-0.277778
-0.166667
-0.055556
0.055556
0.166667
0.277778
0.388889
0.500000 % sin(0.5*pi)=1 when we add 1 this value is "full scale"

(1+sin( x))/2 Piece of Tukey window
multiply by 64 to scale up

So, in an effort to save a line of code, I made a subtle mistake.

Tukey24/32 can be tested as a Tukey22/32 ignoring the first and last duplicate values. I suggest trying Tukey16/32 first. We don't want to use /64 provided /32 quality is acceptable.

What I was talking about there was repurposing the Goertzel hardware to perform sample windowing. With the right NCO frequency it should step through the LUT one at a time. Thus, if the LUT contains a window function instead of sine/cosine the Goertzel hardware will filter the sigma-delta output bits.

If one wanted to do a windowed Goertzel, we should be able to do it with the same method. We could pre-calculate a sine/cosine sequence at the desired frequency, then apply the window function to the ends of this sequence, and load the result into the LUT. This looks like it will limit the accumulation time to a maximum of 512 clocks. (Although there may be ways around this by modifying the LUT offset.)

I can't make plots tonight because I'm on a different computer.
-James

Are we not shoveling a lot of hay on the needle? It seems we do not precisely lay out what we are doing and what we are looking for. If there is a changing signal, every (useful) filter smooths the signal. This is equivalent to removing higher frequencies. A sawtooth shows an abrupt change of the signal value, which you can only detect after a given time. Any removal of higher frequencies will destroy the shape of the sawtooth.
The ADC we have is a filter by itself! The readout we have after, let's say, 4096 clocks is the integrated signal of 4096 clocks; that is, in principle we know nothing about how the signal changed in between. Is there any idea how much noise is to be expected? Instead of filtering, we should dig into the question of how the noise originates.
But this is NOT a problem of the current design. This design will not change, it's maybe up to the next propeller, if there is any reason to improve hardware.
All the aspects of filtering can be handled in software and I see nobody here having a solution ready. That definitely needs either experience (those people are engaged by the industry) or open R&D. That will take time.

ErNa, we can see that windowing the first and last samples improves noise by quite a bit. We just have to figure out what shape of window would be best.

Also, the analog front end of the ADC is only good for a few MHz. At 180MHz clock speed, a 32-sample ramp-up/down transition window only takes 178ns. Without the window, though, we definitely get 1 1/2 bits of noise.

Small detail, Erna. Smartpin mode %01100 is a Sinc1 filter suited to a sigma-delta bitstream.

The actual ADC hardware is modulating the bitstream. This bitstream appears as a digital pin IN data bit every sysclock. By leaving the smartpin disabled, Chip is able to use a Streamer to capture every clocked bit into a buffer in hubram, packed as 32 bits per longword, and then apply different effects.

So, alternative smartpin filters is what is being investigated here.

The only weak spots appear at the very high/low ends of the ramp. But that isn't a weakness of the method so much as a limit of reduced sample width versus the very sparse bitstream at each end of the 50000 ramp.

What does that graph mean? I need to look at his code.

## Comments

Oh, I've run out of ideas on what I was doing. Sinc2 is cool but it needs an unbroken stream. End of story.

EDIT: I had initially thought it might handle discontinuity in a good enough manner that it'd suit all purposes ... but that last graph shows the ends are same as Sinc1: https://forums.parallax.com/discussion/download/124343/50000sinc2.png

The problem is we could debate window functions for days. The optimal window solution is likely to be this: https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-230001.3.2 including the overlapping samples. Unfortunately this is likely expensive to implement. It seems like the Goertzel mode could be used to implement the window function semi-autonomously.

A CIC filter may be a good choice as well.

Happy Thanksgiving, Everyone!

These values are not in agreement with mine.

Chip, have you tried Tukey16/32?

It's the smallest, logically.

I haven't been able to, yet, but will tonight.

It would be good if Saucy could graph it, too.

Results look good! Not as perfect as unbroken but I think it might be the "good enough" I was after.

Someone with filter knowledge might be able to say how good.

The graph data points are each made from four rolling 64 clock samples with a filter reset in front of the leading sample. Each numerical value is sample3 + sample4 summed, with sample1 and sample2 discarded.

EDIT: Retitled the graph

I was experimenting with a phase offset that I forgot to remove. The Octave/Matlab code I used was

Explanation:

Generate a sequence of values (1+sin( x))/2 Piece of Tukey window

multiply by 64 to scale up

So, in an effort to save a line of code, I made a subtle mistake.

Here are the values I get for a symmetrical Tukey8/1:

I'm liking the latest ability of Sinc2 to be used for discontinuous single point sampling. Every one of those data points on the graph were built independently: https://forums.parallax.com/discussion/comment/1454857/#Comment_1454857
