ADC Sampling Breakthrough - Page 26 — Parallax Forums

• lazarus666 wrote: »
Simple question? What happens if you take the output of the comparator and run it through a single clock delay, and then use a half adder to obtain a two-bit signal by computing sigma(z, z^-1) as a two-bit value running at clock rate, i.e. creating a {1,1} convolutional kernel? Repeating the process would give a three-bit signal at clock rate with an effective kernel of {1,2,1}, and then {1,3,3,1}, and then {1,4,6,4,1}, etc., according to the coefficients of the binomial theorem, a.k.a. Pascal's triangle. Those familiar with the convolution theorem will recognize that convolution in the time domain is the same as multiplication in the frequency domain, and thus the various sinc^N functions can be obtained, as well as approximate Gaussian modes, because the coefficients of Pascal's triangle approach the shape of exp(-z*z) as a limit as N approaches infinity.

Welcome lazarus666! All contributions are welcome.

I'm investigating a two-bit signal at half the clock rate at the moment.
• edited 2018-12-02 14:15
lazarus666 wrote: »
Simple question? What happens if you take the output of the comparator and run it through a single clock delay, and then use a half adder to obtain a two-bit signal by computing sigma(z, z^-1) as a two-bit value running at clock rate, i.e. creating a {1,1} convolutional kernel? Repeating the process would give a three-bit signal at clock rate with an effective kernel of {1,2,1}, and then {1,3,3,1}, and then {1,4,6,4,1}, etc., according to the coefficients of the binomial theorem, a.k.a. Pascal's triangle. Those familiar with the convolution theorem will recognize that convolution in the time domain is the same as multiplication in the frequency domain, and thus the various sinc^N functions can be obtained, as well as approximate Gaussian modes, because the coefficients of Pascal's triangle approach the shape of exp(-z*z) as a limit as N approaches infinity.
I feel like I did an experiment using a simple filter in cascaded form
ErNa wrote: »
See screenshots and excel data here
but didn't think about any specific application
For some reason, the quote doesn't work, so here's a link to the forum page: https://forums.parallax.com/discussion/169298/adc-sampling-breakthrough/p15

As I showed, the data cannot change as fast as it does in the last screenshot. So limiting the signal's slope to just 1 cuts off the noise that cannot, by definition, be part of the measured signal!
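The slope limit described above can be sketched as a simple slew-rate limiter. This is only an illustration: the function name `limit_slope` and the sample values are made up, and the post does not say how the limit was actually applied.

```python
def limit_slope(samples, max_step=1):
    """Clamp the change between consecutive outputs to +/- max_step."""
    out = []
    prev = None
    for s in samples:
        if prev is None:
            prev = s                      # first sample passes through unchanged
        else:
            delta = max(-max_step, min(max_step, s - prev))
            prev += delta                 # follow the input, but no faster than max_step
        out.append(prev)
    return out


# Hypothetical readings with one spike (15) that the real signal could not produce.
noisy = [10, 11, 15, 11, 12]
smoothed = limit_slope(noisy)
print(smoothed)
```

A limiter like this removes isolated spikes, but it also lags behind any genuine change faster than `max_step` per sample, so the limit has to match what the measured signal can really do.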

Thanks evanh, here are the links: DATA: https://forums.parallax.com/discussion/comment/1455261/#Comment_1455261 Screenshot: https://forums.parallax.com/discussion/comment/1455298/#Comment_1455298
• TonyB_ wrote: »
Evan, could you post your 50000 ramps as a test bitstream file?

https://forums.parallax.com/discussion/comment/1454674/#Comment_1454674

• ErNa,
A link to a specific post is in the date of that post.

• cgracey wrote: »
You can see the improvement is quite good by summing multiple integrators.

I'm amazed. This suggests, to me at least, that the pins must be very closely matched. This bodes well for the thermal drift solution being possible by calibrating one VIO/GIO for many other pins around it.

• cgracey wrote: »
jmg wrote: »
cgracey wrote: »
The one-pin thing we've got working now is still pretty decent.
Agreed.
Can you make sure to test what you have now with the external ADCs that need digital filters, to confirm they meet their spec when connected to a P2?
That gives a cross-check on the filter, as well as confirming it works with other parts, opening up more uses.

Do they come on development boards that you can connect the clock and data to?

Yes, here is my suggested shopping list for Digikey:

ADC module eval PCB (this new part includes a 2.5V 0.25% reference and no isolation, so lower cost)
1 x 296-50791-ND, quantity available 1, can ship immediately
Manufacturer part number AMC1035EVM (1V analog in), ~$50 (also available from Mouser, Verical, TI Store)

+
Some precision resistors
10~100 pcs of ERA-6ARW102V Panasonic 1K OHM 0.05% 1/8W 0805 ±10ppm/°C for dividers from their 0.25% reference (67c each at 10+)

Digikey also shows eval boards (same prices) for isolated variants

AMC1303/4/5/6 with choices of CMOS or LVDS interfaces
• TonyB_ wrote: »
TonyB_ wrote: »
TonyB_ wrote: »
The Tukey17/32 could be shortened like this:
```
 1    3    5    7   10   13   16   19   22   25   27   29   31
32   32   32
31   29   27   25   22   19   16   13   10    7    5    3    1
```
Sum is 512. Full scale output would be 128.

A Tukey without a plateau is a Hann window, so bit 6 would select Tukey/Hann mode.

It's not quite a Hann window. There are 3 32s on top. Which was chosen in part because the sum would be 512. I presume a Hann would have a single 32 on top.

Yes, you're right. I knew it was not quite a Hann. What if we turn the short plateau 32,32,32 into a peak 32,33,32? Does that show up in the frequency response? Full scale stays at 128.

The 33 appears to make the response worse. The noise level at the 10 bit sum is worse as well. It also makes the filter more expensive since the 33 is not used in the longer filter anywhere.

Thanks for the waveforms, do you mind doing just a few more? I've had what might be a final look at Tukeys, with a range of 64, before returning to the soft HDMI. The options below would add logic but only a little for one of them. I'm interested to learn how much better if at all the frequency responses are with the larger range, even if none could be implemented.

In the table below, Tukey17*/32 is called Tukey17_13_32 to make it clear the middle 13 values are used and the same naming scheme applies to the others.
```
tukey17_13_32							'existing
long	01,03,05,07,10,13,16,19,22,25,27,29,31		'13 up, up/down sum = 208 * 2
long	32,31,32					'19 top, top sum = 607
long	31,29,27,25,22,19,16,13,10,07,05,03,01		'13 down, total sum = 1023 (>>2 = 255)
```
```
tukey15_13_64							' NEW
long	01,03,07,12,18,25,32,39,46,52,57,61,63		'13 up, up/down sum = 416 * 2
long	64,63,64					'19 top, top sum = 1215
long	63,61,57,52,46,39,32,25,18,12,07,03,01		'13 down, total sum = 2047 (>>3 = 255)
```
```
tukey17_13_32_x2						'existing doubled for easier comparison
long	02,06,10,14,20,26,32,38,44,50,54,58,62		'13 up, up/down sum = 416 * 2
long	64,62,64					'19 top, top sum = 1214
long	62,58,54,50,44,38,32,26,20,14,10,06,02		'13 down, total sum = 2046 (>>3 = 255)
```
```
tukey17_13_64							' NEW
long	02,05,09,14,20,26,32,38,44,50,55,59,62		'13 up, up/down sum = 416 * 2
long	64,63,64					'19 top, top sum = 1215
long	62,59,55,50,44,38,32,26,20,14,09,05,02		'13 down, total sum = 2047 (>>3 = 255)
```
```
tukey17_15_64							' NEW
long	01,02,05,09,14,20,26,32,38,44,50,55,59,62,63	'15 up, up/down sum = 480 * 2
long	64,63,64					'17 top, top sum = 1087
long	63,62,59,55,50,44,38,32,26,20,14,09,05,02,01	'15 down, total sum = 2047 (>>3 = 255)
```

In ascending logic sizes, 17_13_64 would be slightly bigger than 17_13_32 as only four ramp values are different, then 15_13_64 with 17_15_64 the largest.

All the plateau values could be 64 by using a counter with overflow detection, as I mentioned earlier. The short Tukey lengths are 29, except 31 for 17_15_64 which, if the peak is 64, would be a true Hann window and this is the main reason for including it.
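As a cross-check of the sums annotated in the listings above, here is a small sketch that rebuilds each full window and recomputes its length, total and scaled full-scale value. The names `TABLES`, `build_window` and `summarize` are illustrative, and placing the single lower top value at the centre of the plateau is an assumption (the sums do not depend on its position).

```python
# Each entry: ramp-up values, top length, top value, the one lower top value, right shift.
TABLES = {
    "tukey17_13_32":    ([1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31], 19, 32, 31, 2),
    "tukey15_13_64":    ([1, 3, 7, 12, 18, 25, 32, 39, 46, 52, 57, 61, 63], 19, 64, 63, 3),
    "tukey17_13_32_x2": ([2, 6, 10, 14, 20, 26, 32, 38, 44, 50, 54, 58, 62], 19, 64, 62, 3),
    "tukey17_13_64":    ([2, 5, 9, 14, 20, 26, 32, 38, 44, 50, 55, 59, 62], 19, 64, 63, 3),
    "tukey17_15_64":    ([1, 2, 5, 9, 14, 20, 26, 32, 38, 44, 50, 55, 59, 62, 63], 17, 64, 63, 3),
}


def build_window(ramp, top_len, top_val, odd_val):
    """Ramp up, plateau, ramp down. The odd value is assumed to sit at the centre."""
    top = [top_val] * top_len
    top[top_len // 2] = odd_val
    return ramp + top + ramp[::-1]


def summarize(name):
    """Return (window length, coefficient sum, sum after the right shift)."""
    ramp, top_len, top_val, odd_val, shift = TABLES[name]
    window = build_window(ramp, top_len, top_val, odd_val)
    total = sum(window)
    return len(window), total, total >> shift


for name in TABLES:
    print(name, summarize(name))
```

Every table scales to a full-scale value of 255 after its shift, which agrees with the comments in the listings.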
```
WindowFunction    UnquantizedVpp98%  UnquantizedStdDev  HFNoisePower  HighBitCount
tukey17_13_32     1.828125           0.4276477          0.228645         95
tukey15_13_64     1.609375           0.4036476          0.3709245       104
tukey17_13_32_x2  1.828125           0.4276477          0.228645         95
tukey17_13_64     1.578125           0.3789672          0.2484211       100
tukey17_15_64     1.5625             0.3751425          0.240969        112
```
Vpp is the interval that 98% of returned samples fit into. The top and bottom 1% were ignored.
StdDev is the standard deviation. Both of these were measured at the summer output, scaled to 0-255 for consistency but not rounded or truncated (that would add additional noise).
NoisePower is the sum of FFT power from relative frequencies 0.05 to 0.5 (Nyquist).
HighBitCount is used to estimate the number of adders required to implement the filter.
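The post does not spell out how HighBitCount is computed. One reading that reproduces all five values in the table is the total number of set bits across every coefficient of the full-length window (both ramps plus the 19- or 17-value top). The sketch below uses that reading; treat it as an assumption, not a description of the actual script.

```python
# Ramp-up values, top length, top value and the one lower top value, from the listings.
TABLES = {
    "tukey17_13_32":    ([1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31], 19, 32, 31),
    "tukey15_13_64":    ([1, 3, 7, 12, 18, 25, 32, 39, 46, 52, 57, 61, 63], 19, 64, 63),
    "tukey17_13_32_x2": ([2, 6, 10, 14, 20, 26, 32, 38, 44, 50, 54, 58, 62], 19, 64, 62),
    "tukey17_13_64":    ([2, 5, 9, 14, 20, 26, 32, 38, 44, 50, 55, 59, 62], 19, 64, 63),
    "tukey17_15_64":    ([1, 2, 5, 9, 14, 20, 26, 32, 38, 44, 50, 55, 59, 62, 63], 17, 64, 63),
}


def high_bit_count(ramp, top_len, top_val, odd_val):
    """Count the 1 bits over all taps: ramp up, top, ramp down."""
    taps = ramp + [top_val] * (top_len - 1) + [odd_val] + ramp[::-1]
    return sum(bin(tap).count("1") for tap in taps)


counts = {name: high_bit_count(*table) for name, table in TABLES.items()}
print(counts)
```

Under this reading, doubling a table leaves the count unchanged (a left shift adds no set bits), which fits the identical 95 for tukey17_13_32 and tukey17_13_32_x2.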

I don't see any obvious correlation between the frequency metric and the output noise.
• cgracey wrote: »
Tonight I did an experiment where I ganged ADC pins together to see if there was any resolution increase if they were all fed the same signal and their output bits were summed together.

This exercise indicates what kind of performance increase we could expect if each ADC had multiple integrators, which would be simple to do. Instead of 1 huge integrator cap in the ADC (which is already 20x bigger than necessary), we could have 15 smaller caps, each fed by a flop with a uniquely-biased sense amp driving its data input, so they don't all coalesce. Then, we add up their output states to form a 4-bit word on each clock. It looks like the improvement would be quite dramatic.
...
You can see the improvement is quite good by summing multiple integrators.

This cannot go into the next silicon, but will go into a future design. The ADC input bandwidth needs to be increased dramatically, as well. Then, we'll have much faster and more accurate ADCs.
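The idea of summing several integrators into a 4-bit word can be mimicked with a toy model. This is an idealized sketch, not a model of the P2 pad: each modulator is a textbook first-order accumulator, and staggered starting states stand in (as an assumption) for the uniquely-biased sense amps that keep the integrators from coalescing. All names are illustrative.

```python
def first_order_modulator(samples, start):
    """Ideal first-order sigma-delta modulator: emit 1 whenever the integrator wraps."""
    acc = start
    bits = []
    for x in samples:                 # each x is expected in the range 0.0 .. 1.0
        acc += x
        if acc >= 1.0:
            acc -= 1.0
            bits.append(1)
        else:
            bits.append(0)
    return bits


def summed_modulators(samples, count=15):
    """Run `count` modulators with staggered starting states and sum their bits per clock."""
    streams = [first_order_modulator(samples, i / count) for i in range(count)]
    return [sum(column) for column in zip(*streams)]


level = 0.3                                   # hypothetical DC input, as a fraction of full scale
words = summed_modulators([level] * 1000)     # one 0..15 word per clock
average = sum(words) / len(words) / 15
print(words[:8], round(average, 4))
```

In this model each clock delivers a multi-level word instead of a single bit, so far fewer clocks are needed to resolve the input level. Real integrators share noise and drift, so the gain on silicon would be smaller than the ideal case suggests.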
That looks great! I would not consider 4 pins used for a single video input to be unreasonable. Regarding the frequency drop-off, it may be possible to add an RC "pre-emphasis" filter in front of the P2 to compensate.

There's a good sigma-delta tutorial here: https://beis.de/Elektronik/DeltaSigma/SigmaDelta.html Look about halfway down for the graph of SNR vs oversampling and modulator order.

It's probably more effective to increase the modulator order than to average multiple first order modulators. For instance, the ADS1675 claims 92dB SNR at an oversampling ratio of 8. That's off the chart. But it would be reasonable for a 6th or 7th order modulator.

If we're planning P3, why not add a sample-and-hold so we can do successive approximation? Or for interleaved conversions.

• evanh wrote: »
cgracey wrote: »
You can see the improvement is quite good by summing multiple integrators.

I'm amazed. This suggests, to me at least, that the pins must be very closely matched. This bodes well for the thermal drift solution being possible by calibrating one VIO/GIO for many other pins around it.

But did you see those later pics of the super accurate and quiet ramp and sine signals from the pad ring test chip running on the FPGA board? I was really surprised. This means that the digital ground noise in the substrate of the actual P2 die is demolishing our SNR. If we could quiet down that ground noise, the ADC performance would be fantastic. It should be possible in this next revision to improve noise isolation a little bit.
• edited 2018-12-02 21:29
Thanks, you guys, for posting more ideas. I need to get some more sleep in the next few hours and then I'll be back on to respond.
• That looks great! I would not consider 4 pins used for a single video input to be unreasonable.
I don't think Chip was planning to merge 4 pins, but rather to create multiple ADCs within one (future P3) pin cell, to allow statistical averaging?
• evanh wrote: »
cgracey wrote: »
You can see the improvement is quite good by summing multiple integrators.
I'm amazed. This suggests, to me at least, that the pins must be very closely matched. This bodes well for the thermal drift solution being possible by calibrating one VIO/GIO for many other pins around it.

I'm not sure you can make that leap. The plots thus far show poor correlation of thermal drift.

Chip gets the gains in these plots from summing the noise/sampling phases, so he is both effectively increasing the average sampling clock, and using averaging of the SDM periodic subclock elements.
Note this gain is where the chip was being pushed anyway, with very low sample windows, so sample quanta are well above the analog noise floors.
It may not gain so much, at lower frequencies.
• TonyB_ wrote: »

Sinc3 acc1 can have more than 1 bit as its input and if we could use two ADC bits then maybe we could run the Sinc3 logic at sysclock/2. This would have two effects:

1. The decimation rate R could not be odd, but a CIC filter works better with even R anyway, according to something I read the other day. R/4 an integer is better, R/8 an integer better still, etc.

2. The acc2 and acc3 adders could be replaced, in theory, by a single adder with one of the operands muxed. Adder operations (syntax deliberate):
```
acc2 := acc2 + acc1
acc3 := acc2 + acc3
```

The single adder could have acc2 as one operand always and either acc1 or acc3 as the other operand on alternate clocks. The adder output connects to both the acc2 and acc3 registers, with write enabled on alternate clocks again. The counter that increments acc1 is unchanged and can count on every clock, depending on the ADC bit.

The aim is to save logic but have the same functionality, apart from the even R caveat.

I have tested this idea and it does work. The scaling/bit shifting of the result is not the same, because the result is a quarter of the size compared to updating acc2 and acc3 every clock. There is a difference between the nth outputs for the two modes using the ramp bitstream but it is almost constant.
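The two update schedules described above can be compared with a small model. This is a simplified sketch, not cycle-accurate: in two-bit mode both acc2 and acc3 are updated together on every second clock rather than through one adder on alternating clocks, and the comb (differentiator) section is the usual Sinc3 arrangement, which the post does not show. The function name `sinc3` and its parameters are illustrative.

```python
def sinc3(bits, rate, two_bit_mode=False):
    """Third-order CIC decimator; `rate` (R) must be even in two-bit mode."""
    acc1 = acc2 = acc3 = 0
    d1 = d2 = d3 = 0                      # comb delay registers
    outputs = []
    for clock, bit in enumerate(bits, start=1):
        acc1 += bit                       # the acc1 counter runs on every clock
        if not two_bit_mode or clock % 2 == 0:
            acc3 += acc2                  # uses acc2 from before this update
            acc2 += acc1
        if clock % rate == 0:             # three differences at the decimated rate
            diff1 = acc3 - d1
            diff2 = diff1 - d2
            diff3 = diff2 - d3
            d1, d2, d3 = acc3, diff1, diff2
            outputs.append(diff3)
    return outputs


R = 16
ones = [1] * (R * 8)                       # full-scale input, eight output periods
one_bit = sinc3(ones, R)
two_bit = sinc3(ones, R, two_bit_mode=True)
print(one_bit[3:], two_bit[3:])            # settled values: R**3 and R**3 // 4
```

With a full-scale input the settled output is R^3 = 4096 when acc2 and acc3 update every clock and R^3/4 = 1024 when they update every other clock, which matches the quarter-size result and the two-bit shift difference.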
• edited 2018-12-03 04:58
TonyB_ wrote: »
TonyB_ wrote: »

Sinc3 acc1 can have more than 1 bit as its input and if we could use two ADC bits then maybe we could run the Sinc3 logic at sysclock/2. This would have two effects:

1. The decimation rate R could not be odd, but a CIC filter works better with even R anyway, according to something I read the other day. R/4 an integer is better, R/8 an integer better still, etc.

2. The acc2 and acc3 adders could be replaced, in theory, by a single adder with one of the operands muxed. Adder operations (syntax deliberate):
```
acc2 := acc2 + acc1
acc3 := acc2 + acc3
```

The single adder could have acc2 as one operand always and either acc1 or acc3 as the other operand on alternate clocks. The adder output connects to both the acc2 and acc3 registers, with write enabled on alternate clocks again. The counter that increments acc1 is unchanged and can count on every clock, depending on the ADC bit.

The aim is to save logic but have the same functionality, apart from the even R caveat.

I have tested this idea and it does work. The scaling/bit shifting of the result is not the same, because the result is a quarter of the size compared to updating acc2 and acc3 every clock. There is a difference between the nth outputs for the two modes using the ramp bitstream but it is almost constant.

If it causes a bit reduction, then maybe we could sample 16 bits of the ADC before operating the integrator. What do you think?
• How would you gang up ADCs on the ES and v1 P2 chips?
• edited 2018-12-03 05:40
cgracey wrote: »
TonyB_ wrote: »
TonyB_ wrote: »

Sinc3 acc1 can have more than 1 bit as its input and if we could use two ADC bits then maybe we could run the Sinc3 logic at sysclock/2. This would have two effects:

1. The decimation rate R could not be odd, but a CIC filter works better with even R anyway, according to something I read the other day. R/4 an integer is better, R/8 an integer better still, etc.

2. The acc2 and acc3 adders could be replaced, in theory, by a single adder with one of the operands muxed. Adder operations (syntax deliberate):
```
acc2 := acc2 + acc1
acc3 := acc2 + acc3
```

The single adder could have acc2 as one operand always and either acc1 or acc3 as the other operand on alternate clocks. The adder output connects to both the acc2 and acc3 registers, with write enabled on alternate clocks again. The counter that increments acc1 is unchanged and can count on every clock, depending on the ADC bit.

The aim is to save logic but have the same functionality, apart from the even R caveat.

I have tested this idea and it does work. The scaling/bit shifting of the result is not the same, because the result is a quarter of the size compared to updating acc2 and acc3 every clock. There is a difference between the nth outputs for the two modes using the ramp bitstream but it is almost constant.

If it causes a bit reduction, then maybe we could sample 16 bits of the ADC before operating the integrator. What do you think?

I'm not sure I can think at the moment as it's very late.

The input to the Sinc3 logic is 2-bit, even though the acc1 counter is exactly the same. This preliminary decimation by 2 means acc2 and acc3 can run at sysclock/2, which is all that is required for the single muxed adder. I think a higher 1st-stage decimation might smear edges.

The bit shift difference is two bits, e.g. right shift by 6 for a 16-bit result, right-aligned, when R=256*. I don't know whether this means we need fewer adder bits or could have a larger total R with the same number of bits. The even sampling rate might mean there is a config bit spare. If this idea might save an appreciable amount of logic, then it's worth examining now that I've proven it's not a waste of time.

* acc1 updated every clock for 256 clocks, acc2/acc3 updated every other clock.
• edited 2018-12-03 06:29
cgracey wrote: »
If it causes a bit reduction, then maybe we could sample 16 bits of the ADC before operating the integrator. What do you think?
That sounds riskier, as the filter seems to work best on the more granular signals (those with the highest frequency content), plus you move further away from an expected Sinc3 filter.
Easy enough to try, I guess?

Is there any significant logic saving with latches + muxes + 1 adder vs 2 adders? Is there enough timing margin to have mux + adder?
• TonyB_ wrote: »
Sinc3 acc1 can have more than 1 bit as its input and if we could use two ADC bits then maybe we could run the Sinc3 logic at sysclock/2. This would have two effects:

1. The decimation rate R could not be odd …

If you want to decimate by an odd number, such as 3, you could perform a six-point FFT on overlapping blocks of windowed input. That requires only additions and subtractions for the 1 and -1 coefficients, a shift operation for sin(30°) = 0.5, and something approximately equal to sqrt(3)/2 ≈ 0.86603. So maybe sin(60°) is approximated as 0.875 = 1 - 1/8 for that particular fake multiply, which hurts the anti-aliasing a tiny amount. It might still work out better in the long run, as it has no effect on the result when you average hundreds of samples to get the DC performance, apart from a slight overall multiplicative calibration factor, since the FFT is by its nature a "linear operator." Then one could throw away the high frequencies and convert the now "low-pass filtered" signal back to the time domain. Something worth looking into, using more fast bit-weaving with minimal logic?
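The "fake multiply" for sin(60°) mentioned above is cheap to illustrate. The helper name `times_sin60_approx` is hypothetical; the sketch only shows the 1 - 1/8 approximation and how far it sits from the exact value.

```python
import math


def times_sin60_approx(x):
    """Approximate x * sin(60 deg) as x - x/8, using one shift and one subtract."""
    return x - (x >> 3)


exact = math.sqrt(3) / 2                      # about 0.86603
relative_error = (0.875 - exact) / exact      # how far 0.875 overshoots the exact value

print(times_sin60_approx(800), round(relative_error * 100, 2), "% high")
```

The approximation overshoots by roughly one percent, which is the small calibration-factor error the post refers to.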
• Lazarus, you've got neat ideas about filtering the ADC bitstream. I still need more sleep before I can process.
• lazarus666 wrote: »
Simple question? What happens if you take the output of the comparator and run it through a single clock delay, and then use a half adder to obtain a two-bit signal by computing sigma(z, z^-1) as a two-bit value running at clock rate, i.e. creating a {1,1} convolutional kernel? Repeating the process would give a three-bit signal at clock rate with an effective kernel of {1,2,1}, and then {1,3,3,1}, and then {1,4,6,4,1}, etc., according to the coefficients of the binomial theorem, a.k.a. Pascal's triangle. Those familiar with the convolution theorem will recognize that convolution in the time domain is the same as multiplication in the frequency domain, and thus the various sinc^N functions can be obtained, as well as approximate Gaussian modes, because the coefficients of Pascal's triangle approach the shape of exp(-z*z) as a limit as N approaches infinity.

Note this code is under development, as a simulation of how the ADCs might work, so that I can run tests on a Pentium; it is still quite broken in the later stages, i.e. when trying to push for speed by bit-weaving the logic in SIMD-like fashion. Yet there are things that come to mind, such as that it should be possible to get the proposed P2 architecture A-D rate up to something much faster than 1 MSample/second for 8-bit and higher USABLE resolution … for certain types of signals. Note the word USABLE: it should be easy to get 6, 7 or even 8 noisy bits of data at 62 MHz (or 31) based on simple delay-and-add, delay-and-add, delay-and-add semantics, using the minimum number of gates.

Then the magic happens (warm, maybe). Let's say you are trying to process the R-Y and B-Y signals of an S-video link, where you are getting your stream at the output of an analog 3.5795454 MHz bandpass filter (just a coil and capacitor, or a video-grade op amp circuit). Then you might be able to use multiple A-D inputs to guess at the amplitude swings of a plurality of bandwidth-limited signals, without requiring a bunch of sample-and-hold circuits or ring modulators (such as the LM1496 or equivalent) to mix the chroma channels or whatever down to baseband, although that is also an option.

Broken code follows - this is a work in progress …
```
#define EVEN_BITS	(0x55555555)
#define ODD_BITS   	(0xaaaaaaaa)
#define EVEN_PAIRS	(0x33333333)
#define ODD_PAIRS	(0xcccccccc)
#define EVEN_NIBBLES	(0xf0f0f0f0)
#define ODD_NIBBLES	(0x0f0f0f0f)
#define EVEN_BYTES  (0x00ff00ff)
#define ODD_BYTES   (0xff00ff00)

{
static bool carry;

DWORD  q0, q1, q2, q3;
DWORD  r0, r1, r2, r3;
DWORD  s0, s1;
DWORD acc = 0;

// format b31.....b0, even bits have weight = 1
// odd bits have weight = 0.5*2, phase one - decimate by 2 using
// a [1,2,1] convolutional kernel yielding 16 3 bit values from a
// single DWORD containing 32 individual one bit input samples

q0 = input&EVEN_BITS;
q1 = input&ODD_BITS;
q2 = q0<<1;
q3 = (q1+q1<<2)>>1;
r0 = q2&EVEN_PAIRS;
r2 = q3&EVEN_PAIRS;
s0 = r0+r2;
r1 = q2&ODD_PAIRS;
r3 = q3&ODD_PAIRS;
s1 = r1+r3;

// phase two - apply the same transformation
// to the 16 three bit values, yielding
// eight five bit values, having a range [0..32]

q0 = s0&EVEN_NIBBLES;
q1 = s0&ODD_NIBBLES;
q2 = s1&EVEN_NIBBLES;
q3 = s1&ODD_NIBBLES;

// HMMM... FIXME?

s0 = q0+q2<<1+q3>>4;
s1 = q1+q3;

// phase three - apply the same transformation
// to the eight five bit values, yielding
// four seven bit values, having a range [0..64]

// finally repack into a 32 bit register
// for an input rate of 250Mbps - this results
// in an initial output rate of 31.25
return acc;
}
```
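Separately from the bit-weaving code above, the delay-and-add cascade from the original question can be sketched at the sample level. This is only an illustration with made-up names (`delay_and_add`, `cascade`); it works on one value per clock rather than on packed 32-bit words.

```python
def delay_and_add(samples):
    """One stage: y[n] = x[n] + x[n-1], with x[-1] taken as 0."""
    previous = 0
    out = []
    for s in samples:
        out.append(s + previous)
        previous = s
    return out


def cascade(samples, stages):
    """Apply the {1,1} stage repeatedly; each stage adds one bit of range."""
    for _ in range(stages):
        samples = delay_and_add(samples)
    return samples


impulse = [1, 0, 0, 0, 0, 0]
print(cascade(impulse, 2))        # effective kernel after two stages
print(cascade(impulse, 4))        # effective kernel after four stages
print(cascade([1] * 8, 3)[-1])    # settled value for an all-ones input, three stages
```

Feeding an impulse through N stages returns row N of Pascal's triangle as the effective kernel, and an all-ones input settles at 2^N, so each stage needs one more output bit.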

Looks very interesting, but not sure how it works. I am anxious to see what you can achieve, though.
• cgracey wrote:
Looks very interesting, but not sure how it works.

Mark_T gives a nice explanation in my thread on iterative binomial filtering:

https://forums.parallax.com/discussion/comment/1449994/#Comment_1449994

You have to use only the odd-numbered filters, though: 1,2,1; 1,4,6,4,1; etc. The even ones, apparently, have bad characteristics.

-Phil
• cgracey wrote:
Looks very interesting, but not sure how it works.

Mark_T gives a nice explanation in my thread on iterative binomial filtering:

https://forums.parallax.com/discussion/comment/1449994/#Comment_1449994

You have to use only the odd-numbered filters, though: 1,2,1; 1,4,6,4,1; etc. The even ones, apparently, have bad characteristics.

-Phil

Thanks, Phil.
• Chip, please don't take my lead as an endorsement of this entire thread. But if it works and simplifies what you were going to go ahead and do anyway, I guess I'd be in favor of it. -Phil
• Phil seems to be a fan of Colin Chapman
• Sinc3 with 2-bit input performs well overall, but not as well as 1-bit. The differences between corresponding outputs for the same bitstream are mostly zero or one. With R=16 and 8-bit* resolution there are some differences of 4-6 when the waveform is changing slowly and the 1-bit mode is more accurate. Differences with R=256 and 16-bit resolution are much smaller proportionately.

* Note that the max value after scaling for 1-bit and 2-bit modes is 256 when R=16, so special care is needed when bit shifting.

Below are two tables of the absolute differences between the two modes when using the first 102400 bits in Evan's 50000 ramp bitstream:
```
#R = 16, Output range = 0-256, Total outputs = 6400
#Diff, Freq
0, 3593
1, 2084
2,  383
3,  170
4,   53
5,   71
6,   46
```
```
#R = 256, Output range = 0-65535, Total outputs = 400
#Diff, Freq
0,  96
1, 133
2,  81
3,  45
4,  23
5,   6
6,   3
7,   1
9,   1
12,   3
13,   1
14,   1
16,   1
17,   1
24,   1
37,   1
46,   1
47,   1
```
Differences not shown never occur.

Sample sequential outputs for R=16, showing greater precision of 1-bit mode:
```
1-bit,2-bit
86, 8B
85, 83
86, 85
86, 8A
86, 81
86, 8C
86, 82
86, 89
86, 87
87, 85
86, 8A
87, 82
87, 8C
87, 82
87, 8B
87, 84
87, 8A
88, 85
87, 8A
87, 85
88, 8A
87, 85
88, 8A
87, 85
```
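The sample outputs above can be tabulated the same way as the difference tables. The sketch assumes the listed values are hexadecimal (digits such as 8B suggest so); the variable names are illustrative and the data is copied from the listing.

```python
from collections import Counter

# The 24 output pairs from the listing above, read as hexadecimal.
ONE_BIT = "86 85 86 86 86 86 86 86 86 87 86 87 87 87 87 87 87 88 87 87 88 87 88 87"
TWO_BIT = "8B 83 85 8A 81 8C 82 89 87 85 8A 82 8C 82 8B 84 8A 85 8A 85 8A 85 8A 85"

one = [int(v, 16) for v in ONE_BIT.split()]
two = [int(v, 16) for v in TWO_BIT.split()]

# Histogram of absolute differences between corresponding outputs.
diff_freq = Counter(abs(a - b) for a, b in zip(one, two))

# Spread of each mode over this stretch of slowly changing input.
one_span = max(one) - min(one)
two_span = max(two) - min(two)

for diff in sorted(diff_freq):
    print(diff, diff_freq[diff])
print("1-bit span:", one_span, "2-bit span:", two_span)
```

Over these 24 samples the 1-bit outputs stay within a span of 3 counts while the 2-bit outputs wander over 11, which is the precision gap the listing is meant to show.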
• edited 2018-12-04 01:37
I'm guessing this shows that the Sincs have a fast impulse response, and that what Chip feared about bad pairing combinations does indeed degrade the output, unlike with the Tukeys.

• evanh wrote: »
TonyB_ wrote: »
Evan, could you post your 50000 ramps as a test bitstream file?

https://forums.parallax.com/discussion/comment/1454674/#Comment_1454674

Thanks for the link, Evan. Here is the simple Sinc3 algorithm:
```
acc1 = acc1 + ADCbit
acc3 = acc3 + acc2	;Omit in 2-bit mode
acc2 = acc2 + acc1	;Omit in 2-bit mode

acc3 = acc3 + acc2
acc2 = acc2 + acc1
```
• evanh wrote: »
Tony,
That has similarity to the pruned output I had earlier. Both the slack start and the constant small oscillation.

Although I'm pleased my 2-bit input idea works, I would recommend it in a design only if separate adders for acc2 and acc3 cannot fit and one adder + muxed inputs can. I had doubts about whether two consecutive ADC bits are a valid 2-bit value, as the possible sums are 0, 1, 1 and 2. However, the mean is 1 which is twice the 1-bit mean of ½, so two bits are worth twice as much as one. The acc1 counter could use more than two bits before adding acc1 to acc2, but the resolving power would be less.
• TonyB_ wrote: »
Sinc3 with 2-bit input performs well overall, but not as well as 1-bit. The differences between corresponding outputs for the same bitstream are mostly zero or one. With R=16 and 8-bit* resolution there are some differences of 4-6 when the waveform is changing slowly and the 1-bit mode is more accurate. Differences with R=256 and 16-bit resolution are much smaller proportionately.

* Note that the max value after scaling for 1-bit and 2-bit modes is 256 when R=16, so special care is needed when bit shifting.

I didn't see it in my tests but it's possible the R=256 max value might be 65536. The FLE instruction will prevent errors.
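The overflow concern can be shown with plain arithmetic. The sketch assumes the largest raw Sinc3 output is R^3 for 1-bit mode and R^3/4 for 2-bit mode (a full-scale, all-ones input), and uses `min` as a stand-in for a clamp such as FLE. The function name `scale_and_clamp` is hypothetical.

```python
def scale_and_clamp(raw, shift, limit):
    """Right-shift the raw filter output, then force it to be <= limit."""
    return min(raw >> shift, limit)


# R = 16, 8-bit result: both modes reach 256 after shifting, one past the 8-bit maximum.
r16_one_bit = (16 ** 3) >> 4           # 1-bit mode, shift by 4
r16_two_bit = (16 ** 3 // 4) >> 2      # 2-bit mode, shift by 2

# R = 256, 16-bit result: the same thing happens at 65536.
r256_one_bit = (256 ** 3) >> 8         # 1-bit mode, shift by 8
r256_two_bit = (256 ** 3 // 4) >> 6    # 2-bit mode, shift by 6

print(r16_one_bit, r16_two_bit, r256_one_bit, r256_two_bit)
print(scale_and_clamp(16 ** 3, 4, 255), scale_and_clamp(256 ** 3, 8, 65535))
```

Without the clamp, truncating 256 to 8 bits (or 65536 to 16 bits) would wrap a full-scale reading around to zero.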
• TonyB_ wrote: »
evanh wrote: »
Tony,
That has similarity to the pruned output I had earlier. Both the slack start and the constant small oscillation.

Heh, I went back and deleted that because I wasn't sure.