Before we can know which way is best to proceed, we need to understand what Evanh is working on. It might be that we have an additional 8-bit bus from each smart pin conveying eight bits of ADC sample per clock.

Kind of a sample history shift register, with new samples being added to one end, while the older ones are being discarded at the other?

That would be expensive. I'm hoping we can just keep outputting accumulator values.

Yanomani, I'm thinking about the digital filter method that hopefully doubles the effective number of bits.

256 bits, each 0 or 1, can sum to any value from 0 to 256, which (apart from the all-ones case) fits in an 8-bit word. There is one way to produce a count of 0 or 256, 256 ways to produce a count of 1 or 255, and more and more ways to produce each of the other values. If we read the counter only every 256th clock, we lose the information about how the bits streamed in. If we keep the bitstream in a circular buffer, we can have 8-bit resolution on every clock. This value represents the average signal over 256 clocks, so there is a time (or phase) shift. As each value can differ from the previous one by no more than +1 or -1, the ADC cannot follow faster slopes. Summing a fixed number of bits from a bitstream is equivalent to multiplying the complete stream by a moving signal ...0000011111111100000..., an operation called convolution or folding.
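A minimal sketch of the circular-buffer idea described above, in Python rather than hardware. The function name `moving_ones_count` and the `window` parameter are made up for illustration; the point is only that a running count needs one add and one subtract per clock, and that consecutive outputs differ by at most 1.

```python
from collections import deque

def moving_ones_count(bits, window=256):
    """Yield the number of '1' bits among the last `window` bits, once per input bit.

    Hypothetical helper: the deque plays the role of the circular buffer.
    """
    history = deque([0] * window, maxlen=window)  # past bits, oldest at index 0
    count = 0
    for b in bits:
        count += b - history[0]  # add the newest bit, drop the oldest one
        history.append(b)        # maxlen makes the deque discard history[0]
        yield count

# Small demonstration with a 2-bit window.
print(list(moving_ones_count([1, 0, 1, 1], window=2)))
```

This is the same as convolving the bitstream with a rectangle of `window` ones, which is why the output lags the input and cannot change by more than one count per clock.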

ErNa, do you believe it's possible to get a 16-bit or a 13-bit result from 256 bits of 1-bit ADC samples?

The advantage is that no multiplier is needed and it works for any number of bits. The Tukey filter doesn't work on a bitstream; it needs bytes to multiply by factors and integrate. Yes, resolution can be increased if we are sure that the signal's slope is limited.

The differing parameters mean each graph scales according to whatever readings are produced. I'm not rescaling anything yet. Note: discontinuous sampling doesn't get close to the same bit depth as full-blown continuous Sinc2.

That latest discontinuous Sinc2 method is producing peak values of just over 8k. So 13-bit depth from 256 clocks per reading. Fully continuous is 16-bit depth from 256 clocks.

Okay, sounds great, but when you say 13-bit or 16-bit depth from 256 samples, are you able to resolve every single possible 13-bit or 16-bit value from 256 samples? This seems, to me, the true criterion for being able to claim we've doubled the bit depth. That's what I'm totally not clear on.

My skill level here is to look for visual ripples in the data. Hence the graphs.

Here's the spreadsheet with the useful data I've collected.

We could make a 15-bit shifter for incoming ADC bits and maintain a 4-bit inventory count. Would that be a basis for increasing resolution? It would certainly limit slope.

The Tukey could work on a bit stream if you had signals to know when the opening window was beginning and when the closing window was beginning. You would output a '1' every time you reached an MSB of summation for the weighted bits.

I think the best way to implement Tukey is directly as an up/down counter that counts in Tukey values, not binary. The only control inputs it needs are up/down and clear/start. It gets the signal to start and when it reaches the max value, probably 32, the programmed sampling period can begin. When this ends, the Tukey state machine counts down to zero and stops.

My program that generates Tukey values also calculates a Chi-Square sum to give an idea of the quality. However, errors for rounded values of 0 and 1 were distorting the results massively, so I now ignore those two and have edited the table I posted earlier. An upshot is that the quality of 17/32 is better than 16/32.

Here are the current suggested Tukey candidates for testing:

* indicates first and last duplicate values omitted.

Chip, could you try 17*/32 today? We really need to know how short the ramps can be. If 17*/32 is good, then 12/32 should be next.

A good test to run alongside the 17*/32 would be a time-doubled 7/32 (every value stays for 2 sysclks, taking 14 to walk the table).
It is a little more coarse, but has much less logic cost, and it is that logic cost that needs to be driven down, as it multiplies by 64 here.

Yes, I agree for the smaller Tukey sizes, a direct state engine can be used.

I'm really intrigued about getting more bits per clock.

What if we had a 31-bit shift register into which we shifted ADC bits, and then weighted each bit position to an up+down ramp (trapezoid or cosine)? The positions could sum to $FF if all 31 bits were high for 8-bit containment. It seems to me that that would give us an 8-bit sample every clock. Those 8-bit samples could be summed, as well, to get bigger samples.

A 31-position trapezoid pattern that sums to $FF would look like this:

1 2 3 4 5 6 7 8 9 A B C D E F F F E D C B A 9 8 7 6 5 4 3 2 1

So, on each clock, all 31 weighted bits are summed up to produce an 8-bit sample.
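As a quick check of the pattern above, here is a small Python sketch that builds the 31 weights and sums the weighted bits on every clock. The names `trapezoid_weights` and `weighted_samples` are invented for this example, and the list-based shift register is only a software stand-in for the hardware shifter.

```python
def trapezoid_weights():
    """Return the 31 weights 1..F, F, F, E..1 from the post above."""
    up = list(range(1, 16))              # 1 2 3 ... F
    return up + [15, 15] + up[-2::-1]    # two more F's, then E ... 1

def weighted_samples(bits, weights):
    """Shift each bit into a register and emit the weighted sum every clock."""
    window = [0] * len(weights)          # software model of the shift register
    out = []
    for b in bits:
        window = [b] + window[:-1]       # newest bit enters at position 0
        out.append(sum(w * x for w, x in zip(weights, window)))
    return out

w = trapezoid_weights()
print(len(w), hex(sum(w)))               # 31 positions, full scale $FF
print(weighted_samples([1] * 31, w)[-1]) # all bits high gives 255
```

So the weights do sum to $FF, and a register full of ones gives exactly the 8-bit maximum.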

I'm going to try this out to see what it looks like, using captured ADC streams.

Does anyone have any prediction about how this will work out? I realize there's going to be a low-pass effect.

First-order is square. Second-order is triangular.

Evan's sinc^3 filter is what I would call a CIC decimator. I didn't realize until today that a 1 stage CIC has a rectangular response to it. Or that adding more stages is the same as convolving rectangular filters together. Note how the response gets longer for higher order filters. It's probably necessary to oversample by the order and discard the extra samples to get non-overlapping samples. I compensated for this in my analysis by shortening the higher order sinc filters. I approached the problem as: what's the best ADC measurement we can get from a certain number of bits. Of course using more bits is going to be better.
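The statement that adding CIC stages is the same as convolving rectangular filters together can be checked with a few lines of Python. This is only an illustration with a rectangle of length 4; `convolve` is a plain textbook convolution written for this example, not anything from the P2 tools.

```python
def convolve(a, b):
    """Plain discrete convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

rect = [1, 1, 1, 1]               # first order: rectangular response
sinc2 = convolve(rect, rect)      # second order: triangular response
sinc3 = convolve(sinc2, rect)     # third order: smoother and longer still

print(sinc2)   # [1, 2, 3, 4, 3, 2, 1]
print(sinc3)   # [1, 3, 6, 10, 12, 12, 10, 6, 3, 1]
```

The response grows from 4 to 7 to 10 taps, which matches the remark that higher-order filters get longer, and the tap sums (4, 16, 64) show the gain growing as the rectangle length raised to the order.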

The sinc^2 / triangular window is cheap to implement, as it's just count up, count down. We could consider it "a long trapezoidal window taken to its limit." I was wondering about implementing these windows, mainly, how do we ramp the triangle back down when it's time?

Or it could be implemented as integrators. Only the integrators need to be in hardware, the differentiation/comb part can be done in software.
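A rough sketch of that split, assuming a second-order (sinc^2) filter: two accumulators run on every bit, standing in for the hardware integrators, and the two differences are taken afterwards in software on the snapshots. The function name `sinc2_readings` and the `period` parameter are invented here, and Python's unbounded integers hide the accumulator wrap-around that real hardware would have.

```python
def sinc2_readings(bits, period=256):
    """Second-order CIC: integrate every bit, difference once per `period`."""
    acc1 = acc2 = 0
    snapshots = []
    for i, b in enumerate(bits, 1):
        acc1 += b            # first integrator (would be hardware)
        acc2 += acc1         # second integrator (would be hardware)
        if i % period == 0:
            snapshots.append(acc2)   # software reads the accumulator
    # Comb part in software: difference the snapshots twice.
    d1 = [b - a for a, b in zip(snapshots, snapshots[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    return d2

print(sinc2_readings([1] * 1024))      # all ones: full scale 256*256
print(sinc2_readings([1, 0] * 512))    # half density: half of full scale
```

With 256 clocks between snapshots, an all-ones stream gives 65536 per reading, consistent with the 16-bit range quoted earlier for continuous Sinc2. Each reading spans two periods of bits, so consecutive readings overlap.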

It seems counter-intuitive that "throwing away" more bits will produce a better result, but that is the result of my analysis. Chip, can you try a triangular filter?
EDIT: I realized that the only way the P2 was able to process the bitstream in realtime was by using the "ones" instruction for the flat top part. But it may be possible with a look up table of 8-11 bits input to count the bits with increasing weights. Then add increasing offsets to the LUT output.
I may be testing some of these theories on the P1 soon.
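One possible reading of the look-up-table idea in the EDIT, sketched in Python: split the rising ramp into 8-bit chunks, use one table for the weighted sum inside a chunk (weights 1..8), and add an offset that grows with the chunk position. The table names and `ramp_sum` are assumptions made for this example; in particular, the offset here is scaled by the number of ones in the chunk, which may or may not be what the post had in mind.

```python
# Weighted sum of one byte with weights 1..8 (bit 0 is the lightest).
RAMP_LUT = [sum(k + 1 for k in range(8) if (v >> k) & 1) for v in range(256)]
# Number of '1' bits in one byte, like the P2 "ones" instruction on 8 bits.
ONES_LUT = [bin(v).count("1") for v in range(256)]

def ramp_sum(chunks):
    """Weighted sum over a rising ramp, where bit k of chunk c weighs 8*c + k + 1."""
    total = 0
    for c, byte in enumerate(chunks):
        total += RAMP_LUT[byte] + 8 * c * ONES_LUT[byte]  # LUT value plus growing offset
    return total

print(ramp_sum([0xFF, 0xFF]))   # weights 1..16 all set: 136
```

Two table reads and a multiply by a small constant per byte replace eight separate weighted adds, which is the kind of saving the EDIT is after.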

Saucy, do you mean that for a 256-clock bitstream, we ramp up the first 128 samples and then ramp down the second 128 samples? Samples would be weighted 1..128, then 128..1, right?
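Assuming the weighting is as described in that question (1..128 on the way up, 128..1 on the way down), a reading for one 256-bit frame could be computed like this. `triangle_weights` and `triangular_reading` are hypothetical names used only for this sketch.

```python
def triangle_weights(half=128):
    """Weights 1..half followed by half..1."""
    return list(range(1, half + 1)) + list(range(half, 0, -1))

def triangular_reading(bits, half=128):
    """Weighted sum of one frame of 2*half bits under a triangular window."""
    weights = triangle_weights(half)
    if len(bits) != len(weights):
        raise ValueError("frame must contain exactly 2*half bits")
    return sum(w * b for w, b in zip(weights, bits))

print(triangular_reading([1] * 256))     # full scale: 128 * 129 = 16512
print(triangular_reading([1, 0] * 128))  # half density: 8256
```

Full scale comes out at 16512, a little over 14 bits of range from 256 clocks. That is output range only; it says nothing yet about how many of those bits are noise-free.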

Saucy, I tried a triangular filter and it looks the best, yet! There's no jumping around +/- levels. What's the next step? How do we layer another one on?

You get multi-bit samples more quickly by using a triangular or trapezoidal sliding window that moves by a single-bit sample every clock. Bits moving up the ramp would be worth +1 compared to the last clock and bits moving down the ramp -1. You just need to keep a count of the number of '1' bits in the ramps and the plateau (if there is one). The problem is that this requires storing samples and only very small windows could be handled by a smart pin.
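Here is a sketch of that incremental update for a pure triangular window (no plateau), written in Python. The two deques stand in for the stored samples, and `half` is the ramp length; all names are invented for the example. On each clock the total loses one count for every '1' on the falling ramp and gains one for every '1' on the rising ramp, including the new bit.

```python
from collections import deque

def sliding_triangle(bits, half=4):
    """Yield the triangular-window sum (weights 1..half, half..1) every clock."""
    up = deque([0] * half)     # newest `half` bits, on the rising ramp
    down = deque([0] * half)   # older `half` bits, on the falling ramp
    up_ones = down_ones = 0
    total = 0
    for b in bits:
        total -= down_ones         # each '1' on the falling ramp is worth 1 less
        down_ones -= down.pop()    # oldest bit leaves the window
        mid = up.pop()             # bit at the peak moves to the falling ramp
        down.appendleft(mid)
        up_ones -= mid
        down_ones += mid
        up.appendleft(b)           # new bit enters with weight 1
        up_ones += b
        total += up_ones           # each '1' on the rising ramp is worth 1 more
        yield total

print(list(sliding_triangle([1] * 8, half=4)))  # climbs to full scale 20
```

Only two small counters and one accumulator change per clock, but the whole window of bits still has to be stored, which is the cost the post points out.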

TonyB_, I see now that the Tukey filter is the best when you want to "open the valve" in the shortest number of samples, but if shortening the time is not critical, a triangle filter is cheap. Any trapezoid window could be improved by a Tukey window in the same place, but Tukey256 would be very expensive compared to Trap256.

These SDM filters exist, but they are not compact.
Here is one example that Google finds, showing the FPGA resources used:

After computer simulations the design was implemented in a Xilinx FPGA (XC3S200-4FT256). Implementation results are shown in Table 4 below:
| Resource | Used | Available | Utilization |
|---|---|---|---|
| BUFGMUXs | 1 | 8 | 12 % |
| External IOBs | 16 | 173 | 9 % |
| LOCed IOBs | 16 | 16 | 100 % |
| MULT18X18s | 2 | 12 | 16 % |
| RAMB16s | 3 | 12 | 25 % |
| Slices | 451 | 1920 | 23 % |
| SLICEMs | 0 | 960 | 0 % |

Probably need to go to the CIC structure like shown on page 1.

The stuff in the do{}while loop of evanh's code snippet needs to happen for every bit in. The rest happens for every sample out. Consecutive samples will not be entirely independent.

Having the sums calculated in hardware would be a very elegant way to do the filtering. The code could just read one of the accumulators (the interval needs to be the same) and do some post-processing, although it would have to oversample by 4x or so to get independent samples. I don't know if it's worth having a 4th-order accumulator.
-James

A triangular window has a gain of 0.5, whereas a Tukey window could be close to 1.0. What does that mean for the effective number of bits?

How many counters and adders are there in each smart pin?

I'm not arguing in favour of Tukey at the expense of anything else. From the output of a binary counter, only a little logic would be needed to generate a small error term, which when added to the counter would create the Tukey value to add to the accumulator. One counter and two adders needed and Sinc3 requires two adders.

You can easily extract two 3-out-of-4 and one 6-out-of-8 majority votes, almost at a glance!

And because two nearby Gio/Vio node-dependent groups are being hammered at the same time, you can trust cleansing one of them based on its own history, and also on a 3-out-of-4 majority vote at its neighbour.

How is the gain pushing 2x? Double gain adds 1 bit, right?

## Comments

So, 5 out of 6 or 7, 6 out of 8, 7 out of 9 or 10, and so on.

But you must have some insight on the matter.

Right now, I'm trying to get my head around this double-the-bits idea. I realize that what I proposed above might be more realistic for 5-bit results.

Thanks, Chip. Are any as good as the original 32 sample version?

Here are some real streams from P2 silicon:

8 channels of bitstream x 64000 clocks

I'm not sure. I need to do more testing.

I don't know anything about @ErNa's reaction, but to me, at least, they look like two huge and delicious ice cream pots!

Thanks a lot, @ozpropdev

Thanks a lot for posting this data, Ozpropdev.

Fantastic!
