ADC Sampling Breakthrough

TonyB_ · 2018-11-20 13:31

Ariba wrote: »

I would not call this a filter. It is in fact a window function with a trapezoid window, like those used for FFTs. This reduces the leakage effect and therefore gives more accurate results.
The integrator is still linear and not exponential like in classical filters.

It is a window function and seems to be closest to the Planck-taper window with a sharp rise and fall:
https://en.m.wikipedia.org/wiki/Window_function

A triangular window looks to be straightforward to implement. The code uses 512 longs of data and if some datasets were posted we could try different windows ourselves.

cgracey · 2018-11-20 13:36

The first and last samples are like the rest of the data, but their position is significant, as there is continuity in the bitstream. For some reason, fading them in at first and out at last cuts the signal noise down so that the ADC operation becomes quite ideal.

cgracey · 2018-11-20 13:43

This windowing operation is identical to oversampling with (almost entirely) the same data, but at 32 different sequential bit offsets. It's the same thing, just done with efficiency by exploiting commonalities.

TonyB_ · 2018-11-20 13:54

Will the ramp and plateau lengths be programmable? The window will be triangular if the latter is zero.

cgracey · 2018-11-20 14:00

I've done a lot of testing on this tonight, even making a closed loop to adjust the sample length, in order to drive the 1's delta between VIO and GIO measurements to constants like 255, in order to make mathless conversions. Didn't work well, due to frequent sample-length adjustments.

I think all we need is a smart pin mode to make windowed measurements for ADC sampling. That is not much logic. We can't automate ADC operation beyond the windowed measurement, though, because it gets too complex, too quickly. This means that the streamer won't have the planned ADC modes, because it would need to make windowed measurements for them to be of acceptable quality and that would over-complicate the streamer.

I'm really glad we've got the ADC working as it should, now. That trumps a lot of the other stuff. ErNa?

evanh · 2018-11-20 14:02

Ramps will be fixed at 32 clocks each, plateau will be (X - 64 clocks).

cgracey · 2018-11-20 14:22

evanh wrote: »

Ramps will be fixed at 32 clocks each, plateau will be (X - 64 clocks).

I think so. We could go more or less than 32, but 32 seems like a happy medium that doesn't impinge too badly on smaller sample sizes, like 8-bit quality.

I think the smart pin will work like this:

(1) Input first 32 samples, accumulate {1..32} * highs.
(2) Accumulate 32's for highs during duration of WXPIN period.
(3) Input last 32 samples, accumulate {32..1} * highs.
(4) Report accumulator >> 5 to result.

ErNa · 2018-11-20 14:37

cgracey wrote: »

I've done a lot of testing on this tonight, even making a closed loop to adjust the sample length, in order to drive the 1's delta between VIO and GIO measurements to constants like 255, in order to make mathless conversions. Didn't work well, due to frequent sample-length adjustments.

I think all we need is a smart pin mode to make windowed measurements for ADC sampling. That is not much logic. We can't automate ADC operation beyond the windowed measurement, though, because it gets too complex, too quickly. This means that the streamer won't have the planned ADC modes, because it would need to make windowed measurements for them to be of acceptable quality and that would over-complicate the streamer.

I'm really glad we've got the ADC working as it should, now. That trumps a lot of the other stuff. ErNa?

There will be another chance to salvage things that once went wrong because we did not see the full picture. At least we know there is no simple solution unless it comes from a strong genius ;-)

cgracey · 2018-11-20 14:40

ErNa wrote: »

There will be another chance to salvage things that once went wrong because we did not see the full picture. At least we know there is no simple solution unless it comes from a strong genius ;-)

Yes, what we have now seems very stable.

Rayman · 2018-11-20 14:41

I guess you can think of this as the average of 32 regular moving average filters, all centered on each other and with widths that increase by 2 samples...

cgracey · 2018-11-20 14:51

Rayman wrote: »

I guess you can think of this as the average of 32 regular moving average filters, all centered on each other and with widths that increase by 2 samples...

Yes, they are centered on each other, but the top one has 32 bits in front of it and 1 bit behind it. The next one below has 31 bits in front of it and 2 bits behind it, and so on, down to the bottom one which has 1 bit in front of it and 32 bits behind it. Now, add up all the bits and divide by 32 to get your sample.

Rayman · 2018-11-20 15:04

I think that's another way to look at it...

evanh · 2018-11-20 16:48

Well, I'm getting numbers. Not at all sure how to interpret them yet. The obvious observation is Sinc2 resolution is basically double the bit-depth of the others.

Electrodude · 2018-11-20 16:54

cgracey wrote: »

evanh wrote: »

Chip,
That drawing is third-order, not second-order. I just noticed you called it second-order. So, to get second-order, only do two of the three shown stages. And even then only do two accumulators. Forget the decimator stages since software can do those easily.

That way you aren't making it any bigger than existing resources in the smartpins.

EDIT: Here's all that's needed

With a second-order filter, wouldn't we need a 2nd-order integrator? I still have no idea how these things work.

Consider the raw bitstream from the ADC to be a one bit stream of samples containing all jitter and no signal. You want to take the jitter pulses and spread them out into more smaller signal pulses. So you take an accumulator wider than one bit and accumulate these jitter bits into it. When the accumulator is sampled periodically and subtracted from its prior reading, this increment can be considered to have two components: a desirable signal component and an undesirable jitter component. This is what you do now with just a simple accumulator.

Originally, the 1-bit bitstream was all jitter and no signal. The first accumulator converts much of this jitter into signal. When you add another accumulator, this second accumulator converts some of the remaining jitter into more signal.

If you added this second accumulator that accumulated the results of the first accumulator, and then added another differentiator to compensate for the extra accumulator, it would leave the first-order signal's signal component pretty much unaffected, but would convert some of its jitter into more signal by the same process that the first one did so.

Run the following C code that compares first order to second order for 8-bit samples. It prints out two columns. The first shows 8-bit first-order samples with 1 LSB of jitter. The second shows 16-bit second-order values with more than one bit of jitter, but the jitter is in the bottom 8 bits, which isn't meaningful because only the top 8 bits have any meaning because we're only taking 256 samples. If you want 32-bit accumulators, change all the uint16_t's to uint32_t's and change the %04x's to %08x's.

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv) {
	// bitstream generation
	uint16_t vacc_prev = 0;
	uint16_t vin = 0x3678;

	// accumulators
	uint16_t acc1 = 0;
	uint16_t acc2 = 0;

	// sampling period counter
	int samples = 0;

	// previous values
	uint16_t acc1_prev = 0;
	uint16_t acc2_prev = 0;
	uint16_t diff21_prev = 0;
	uint16_t diff22_prev = 0;
	uint16_t diff1_prev = 0;

	for (int i = 0; i < 16384; i++) {
		uint16_t vacc = vacc_prev + vin;         // increase bitstream accumulator
		uint16_t bit = vacc < vacc_prev ? 1 : 0; // detect carry

		acc1 += bit;  // update first accumulator
		acc2 += acc1; // update second accumulator

		if (++samples >= 256) {
			// first-order differentiation of first accumulator
			uint16_t diff1 = acc1 - acc1_prev;

			// second-order differentiation of second accumulator
			uint16_t diff21 = acc2 - acc2_prev;
			uint16_t diff22 = diff21 - diff21_prev;

			// first order, second order values
			printf("%04x, %04x\n", diff1, diff22);

			samples = 0; // reset sample count

			// save differentiation variables for next time
			acc1_prev = acc1;
			acc2_prev = acc2;
			diff21_prev = diff21;
			diff22_prev = diff22;
		}

		// save carry accumulator for next time
		vacc_prev = vacc;
	}

	return 0;
}

evanh · 2018-11-20 17:09

Here's five runs with differing parameters. Each run takes 20 readings (samples) and prints them consecutively.

First parameter is algorithm select. Second parameter is number of bits to read from the bitstream (chunk length). Third parameter is reset flag to zero the bitstream index and accumulators.

		reading = chunk( 1, 0x100, 1 );  // Square box, non-rolling
...
		reading = chunk( 2, 0x100, 1 );  // Chip's trapezoid, non-rolling
...
		reading = chunk( 11, 0x100, 0 );   // Sinc1  (Smartpin mode %01100), rolling
...
		reading = chunk( 12, 0x100, 1 );   // Sinc2, non-rolling
...
		reading = chunk( 12, 0x100, 0 );   // Sinc2, rolling

Console output, in respective order:

 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52 0x52

 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48 0x48

 0x52 0x52 0x52 0x52 0x52 0x53 0x52 0x52 0x52 0x52 0x53 0x52 0x52 0x52 0x52 0x53 0x52 0x52 0x52 0x52

 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0 0x28c0

 0x28c0 0x5232 0x522f 0x5230 0x5230 0x5230 0x5231 0x522f 0x5230 0x522f 0x5232 0x522f 0x5230 0x5230 0x522f 0x5233 0x522d 0x5232 0x522f 0x5230

Note the two rolling runs have a little noise. This will be because the bitstream was filled with a 12-bit NCO, while the sampling is only 256 bits long. DC level is 0x523.

EDIT: Corrected first reading of rolling runs to start as a reset.

evanh · 2018-11-20 17:17

Here's the five runs again but with chunk length set to 0x1000 (4 kbit):

 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523

 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519 0x519

 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523 0x523

 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292 0x291292

 0x291292 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000 0x523000

And again but with chunk length set to 0x10000 (64 kbit):

 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230

 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226 0x5226

 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230 0x5230

 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920 0x2917a920

 0x2917a920 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000 0x52300000

We're out of resolution. Can't go any further without scaling first.

EDIT: Corrected first reading of rolling runs to start as a reset.

evanh · 2018-11-20 17:55

Here's the bitstream generating source code:

	memset( bitstream, 0, sizeof(bitstream) );

	// fill the bitstream array with valid synthesised data
	dither = bitindex = 0;
	do {
		dither += 0x523;            // DC level, arbitrary
		rollover = dither & 0xfff;  // 12-bit based
		if( dither != rollover )
		{
			dither = rollover;
			bitstream[bitindex >> WORDEXP] |= 1 << (bitindex & SIZEMASK);  // C bitfield insertion
		}
		bitindex++;
	} while( bitindex < (sizeof(bitstream) << 3) );

And this is the various filters, for those that can be bothered scrolling:

	reading = 0;
	if( reset )  bitindex = 0;


	switch( chunktype )
	{
	case 1: // flat box chunk, using flat middle section from Chip's algorithm
		indexSV = bitindex + chunklen;
		do {
			bits = bitstream[bitindex >> WORDEXP];
			bitindex += (1 << WORDEXP);
			// count the set bits
			i = ones = 0;
			do {
				ones += (bits >> i) & 1;
				i++;
			} while( i < (1 << WORDEXP) );

			reading += ones << WORDEXP;

		} while( bitindex < indexSV );

		reading >>= WORDEXP;
		break;


	case 2: // Chip's trapezoidal algorithm
		// taper up section
		bits = bitstream[bitindex >> WORDEXP];
		bitindex += (1 << WORDEXP);
		i = ones = 0;
		do {
			ones++;
			if( (bits >> i) & 1 )  reading += ones;
			i++;

		} while( i < (1 << WORDEXP) );


		// flat middle section
		indexSV = bitindex + chunklen - (2 << WORDEXP);
		do {
			bits = bitstream[bitindex >> WORDEXP];
			bitindex += (1 << WORDEXP);
			// count the set bits
			i = ones = 0;
			do {
				ones += (bits >> i) & 1;
				i++;
			} while( i < (1 << WORDEXP) );

			reading += ones << WORDEXP;

		} while( bitindex < indexSV );


		// taper down section
		bits = bitstream[bitindex >> WORDEXP];
		bitindex += (1 << WORDEXP);
		ones = (1 << WORDEXP);
		i = 0;
		do {
			if( (bits >> i) & 1 )  reading += ones;
			ones--;
			i++;
		} while( ones );

		reading >>= WORDEXP;
		break;


	case 11:// sinc1 algorithm
		if( reset )  sinc1acc1 = sinc1diff1 = 0;

		indexSV = bitindex + chunklen;
		do {
			bits = bitstream[bitindex >> WORDEXP];
			if( (bits >> (bitindex & SIZEMASK)) & 1 )  // C bitfield extraction
			{
				sinc1acc1 += 1;
			}
			bitindex++;

		} while( bitindex < indexSV );

		reading = sinc1acc1 - sinc1diff1;
		sinc1diff1 = sinc1acc1;
		break;


	case 12:// sinc2 algorithm
		if( reset )  sinc2acc1 = sinc2acc2 = sinc2diff1 = sinc2diff2 = 0;

		indexSV = bitindex + chunklen;
		do {
			bits = bitstream[bitindex >> WORDEXP];
			if( (bits >> (bitindex & SIZEMASK)) & 1 )  // C bitfield extraction
			{
				sinc2acc1 += 1;
			}
			sinc2acc2 += sinc2acc1;
			bitindex++;

		} while( bitindex < indexSV );

		reading = sinc2acc2 - sinc2diff1 - sinc2diff2;
		sinc2diff2 = sinc2acc2 - sinc2diff1;
		sinc2diff1 = sinc2acc2;
		break;


	default:
		break;
	}

Full source code attached:

jmg · 2018-11-20 18:42

cgracey wrote: »

I think all we need is a smart pin mode to make windowed measurements for ADC sampling. That is not much logic. We can't automate ADC operation beyond the windowed measurement, though, because it gets too complex, too quickly. This means that the streamer won't have the planned ADC modes, because it would need to make windowed measurements for them to be of acceptable quality and that would over-complicate the streamer.

I thought this was going to be optional ?
Forcing any 'filter' and removing an ADC streamer ability, does not seem a step in the 'flexible P2' direction at all ?

Before encoding a single filter into silicon, a lot more testing and analysis is needed.
eg How does this vary with SysCLK - 250MHz seems a quite high test value which will have millivolts of charge cap variation, and thus be prone to supply ripple/ringing etc.

jmg · 2018-11-20 18:49

Cluso99 wrote: »

I am lost when it comes to understanding analog. I can lay out a pcb to be quiet for analog, but that's it.

Something seems amiss when you can effectively remove a group of first samples and last samples and end up with a superior result. Since you say the sampling is free running and has been before you start, the something is upsetting the results. Otherwise all the results would be noisy and you could just take any window of samples and you would get the same/similar results.

So the question is rather, what is causing those first and last sample groups to be poor?

What you have found is the result of a problem. Now it's time to find out the why. It's not the final silicon yet, so a workaround currently isn't the solution.

Just my observation.

I agree.
What is harder to figure out is that, even if there are clumping effects in the first/last windows of 32, the mirrored nature of the very small first/last windows will have varying effects, depending on the phase/frequency of that clumping.
ie the notch effect can shift with sample N, SysCLK, and imposed error signal.

evanh · 2018-11-20 18:54

I've added a Sinc3 now too, but even 4 kbits of bitstream is enough to break the 32-bit counters. EDIT: Sinc3 presumably triples the bit depth. So, 12 * 3 = 36-bit counters are required for a 4 kbit sampling period.

jmg · 2018-11-20 19:02

evanh wrote: »

I've added a Sinc3 now too, but even 4 kbits of bitstream is enough to break the 32-bit counters.

Do you have any numbers for predicted improvement in ENOB for each of the filters yet ? (eg vs start/end window size and N samples )
Is the noise here white noise ?

Electrodude · 2018-11-20 19:05

evanh wrote: »

I've added a Sinc3 now too, but even 4 kbits of bitstream is enough to break the 32-bit counters. EDIT: Sinc3 presumably triples the bit depth. So, 12 * 3 = 36-bit counters are required for a 4 kbit sampling period.

Can't you just truncate the bottom four (or more) bits on the third accumulator to make it fit in 32 bits? None of the bits below the top 12 or so really mean anything anyway and are just there to minimize quantization error.

evanh · 2018-11-20 19:09

Electrodude wrote: »

Can't you just truncate the bottom four (or more) bits on the third accumulator to make it fit in 32 bits? None of the bits below the top 12 or so really mean anything anyway and are just there to minimize quantization error.

Probably, it would have to be done within the accumulator circuit though, and then it's not as per that drawing.

evanh · 2018-11-20 19:11

jmg wrote: »

Do you have any numbers for predicted improvement in ENOB for each of the filters yet ?

No idea. That's the mathy part.

evanh · 2018-11-20 19:48

It's 9 AM, brain is dead, I'm off to bed.

cgracey · 2018-11-21 00:50

jmg wrote: »

cgracey wrote: »

With the filtering, for a steady-voltage input, the samples settle at one level. That's 4x the certainty, which counts for two bits.

- but the 'filtering' only applies to the leading and trailing 32 bits, the noise in all those other bits, passes straight through.

cgracey wrote: »

And it doesn't matter how long the sample run is, although past 6k bits, the 1/f noise becomes influential.

Noise is always there, spread over all the samples.

Jmg, rather than claim a 2-bit improvement, I should just say that the windowing is restoring two bits that should have been there already, but were being destroyed by cycling patterns.

cgracey · 2018-11-21 01:01

jmg wrote: »

cgracey wrote: »

I think all we need is a smart pin mode to make windowed measurements for ADC sampling. That is not much logic. We can't automate ADC operation beyond the windowed measurement, though, because it gets too complex, too quickly. This means that the streamer won't have the planned ADC modes, because it would need to make windowed measurements for them to be of acceptable quality and that would over-complicate the streamer.

I thought this was going to be optional ?
Forcing any 'filter' and removing an ADC streamer ability, does not seem a step in the 'flexible P2' direction at all ?

Before encoding a single filter into silicon, a lot more testing and analysis is needed.
eg How does this vary with SysCLK - 250MHz seems a quite high test value which will have millivolts of charge cap variation, and thus be prone to supply ripple/ringing etc.

Jmg, I've tested this across frequency, even with the 24MHz RCFAST clock, per Yanomani's request. It just works!

I think the best contemplative understanding is lent from the staggered-oversampling concept.

Notably, this algorithm totally fails if the order of the first and last 32 bits is tampered with. Those bits must be as they come from the ADC, in order.

There is a rational explanation for why this works, but none of us can exactly identify it. I'm pretty sure that it has to do with the tapering periods covering, at least, the <7-bit spans of rise-to-rise patterns in the ADC bitstream data.

cgracey · 2018-11-21 01:16

Evanh, your testing looks interesting. I get the concept of one accumulator following another, but cannot figure out what that would do. Do you see any indication that we could get better than an 8-bit result out of 256 samples? I don't suppose that's possible, but maybe by some subtlety in the bitstream, more resolution could be sussed out. When I get back in the office and I have bigger monitors, I will look through your posts' data more carefully.

potatohead · 2018-11-21 01:24

We should test with a variety of input signals. I will put that on my list for when I get a board.

Yanomani · 2018-11-21 01:38

cgracey wrote: »

Jmg, I've tested this across frequency, even with the 24MHz RCFAST clock, per Yanomani's request. It just works!

I think the best contemplative understanding is lent from the staggered-oversampling concept.

Notably, this algorithm totally fails if the order of the first and last 32 bits is tampered with. Those bits must be as they come from the ADC, in order.

There is a rational explanation for why this works, but none of us can exactly identify it. I'm pretty sure that it has to do with the tapering periods covering, at least, the <7-bit spans of rise-to-rise patterns in the ADC bitstream data.

Thanks Chip, for taking the burden of doing these extra tests.

After almost six days being forced to be in watch-only mode, thru a really small Android screen, now I'm back home, and a 15" seems a whole lot of a landscape to my eyes.

Now, I'm searching for some raw sample readings, because I want to peruse the bitstreams a little, with my own freshened-up eyes.

As long as there is a note about the sysclk value used in getting the sample bitstreams and some hint about the analog voltage or waveform being sampled, I believe they'll be enough for my needs.

Henrique
