ADC Sampling Breakthrough

cgracey Posts: 10,477
edited November 19 in Propeller 2
I've been thinking like mad about how to get good ADC samples.

We all know about oversampling, where you accumulate bits from an ADC for some number of times longer than your desired sample size, then you divide the accumulated value by your oversampling factor. It works okay, but kills bandwidth.
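
As a point of reference, plain oversampling can be sketched as below. This is only a minimal model, assuming the ADC bitstream is already available as a list of 0/1 values; the function name `oversampled_reading` and its parameters are hypothetical and do not come from the P2 code.

```python
def oversampled_reading(bits, sample_bits, factor):
    """Plain oversampling: sum 2**sample_bits * factor bits, then divide by factor.

    bits        -- list of 0/1 values from a 1-bit ADC stream (assumed input format)
    sample_bits -- desired sample size in bits (e.g. 8 for an 8-bit reading)
    factor      -- oversampling factor
    """
    needed = (1 << sample_bits) * factor
    if len(bits) < needed:
        raise ValueError("not enough bits for the requested reading")
    # Every bit carries the same weight, so the edges of the window are abrupt.
    return sum(bits[:needed]) // factor


if __name__ == "__main__":
    # A 50% duty stream read as an 8-bit sample with 4x oversampling.
    print(oversampled_reading([1, 0] * 512, sample_bits=8, factor=4))
```

The bandwidth cost is visible in `needed`: every doubling of `factor` doubles the number of clocks spent per reading.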

In our case, I noticed that for any reading up to ~4k bits, the sum always had one of four adjacent values for a fixed voltage into the ADC. I realized that this was due to the lumpiness of the bit stream coming from the ADC. It cycles state at least once every 7 clocks. Just grabbing a run of those bits to make a conversion is often like picking it up by sharp ends, since you never know when these periodic rises and falls are going to occur in the data stream, introducing a pointy bias that's costly to oversample away.

So, what if you could oversample 32 times, but on the SAME data at slightly different offsets? To do this is really simple, it turns out. It's kind of a tapered filter. It works like this on a stream of N+32 bits:

(1) Clear sample accumulator.
(2) If bit 0 high, add 1 into accumulator. (This is used by 1 offset.)
(3) If bit 1 high, add 2 into accumulator. (This is used by 2 offsets.)
(4) If bit 2 high, add 3 into accumulator. (This is used by 3 offsets.)
(5) If bits 3..31 high, add 4..32 into accumulator. (These are used by 4..32 offsets.)
(6) If bits 32..N high, add 32 into accumulator. (These are used by all offsets.)
(7) If bit N+1 high, add 32 into accumulator. Now we're tapering down.
(8) If bit N+2 high, add 31 into accumulator.
(9) If bit N+3 high, add 30 into accumulator.
(10) If bit N+4..N+32 high, add 29..1 into accumulator. Finish tapering down.
(11) Shift accumulator right by 5 bits to get final sample.
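
The steps above can be sketched in a few lines of Python. This is an illustrative model rather than the hardware or the PASM code: it assumes the bitstream is a list of 0/1 values, and the name `tapered_sample` and the `taper` parameter are made up for the example.

```python
def tapered_sample(bits, taper=32):
    """Tapered (trapezoid-weighted) sum of a 1-bit stream.

    The first `taper` bits get weights 1..taper, the last `taper` bits get
    weights taper..1, and every bit in between gets weight `taper`.
    """
    n = len(bits)
    if n < 2 * taper:
        raise ValueError("need at least 2*taper bits")
    acc = 0  # step (1): clear the sample accumulator
    for i, bit in enumerate(bits):
        # Ramp up from the start, ramp down toward the end, flat in the middle.
        weight = min(i + 1, taper, n - i)
        if bit:
            acc += weight
    # Step (11): divide by the taper length (a right shift by 5 when taper == 32).
    return acc // taper


if __name__ == "__main__":
    print(tapered_sample([1] * 128))  # all-ones stream
    print(tapered_sample([1, 0] * 64))  # 50% duty stream
```

With an all-ones stream of n bits the result is n - 31 for `taper=32`, because the two ramps together contribute about one ramp's worth of full-weight bits.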

This worked a complete miracle on the ADCs.

I used the streamer to read the ADC pin in on each clock cycle and assemble bytes which got dumped into hub RAM. I could peruse that data and try out different ideas.


Here is 8-bit-quality sampling of a slowly-rising signal. This detail shows 12 steps in the ADC LSBs:

ADC_taper_8_steps.jpg


Now look at the same thing, but 12-bit-quality:

9253ec4fa338f4832f55f394931488.jpg


Here is 12-bit-quality sampling of a steady voltage. No wandering, anymore:

ADC_taper_12_steady.jpg


Now here's 13-bit-quality sampling of a 13mV span sawtooth. At 13 bits, 1/f noise starts to become an issue:

ADC_taper_13_sawtooth.jpg


Here's my code that runs on the P2 silicon:
' ADC tapered-filter sampling

con	p	= 5		'ADC pin, adjacent DAC pin is p^1
	m	= 30		'monitoring DAC pin for watching LSBs of conversion
	s	= $200		'samples/32 per reading ($7FE max)

dat	org

	hubset	##%1_000001_0000011000_1111_10_00	'enable crystal+PLL, stay in 20MHz+ mode
	waitx	##20_000_000/100			'wait ~10ms for crystal+PLL to stabilize
	hubset	##%1_000001_0000011000_1111_10_11	'now switch to PLL running at 250MHz

	wrpin	dacmod,#p^1		'output test level on adjacent pin DAC for ADC input
	wxpin	#1,#p^1
	wypin	dacval,#p^1
	dirh	#p^1

	dirh	#m			'set output for what will be monitoring DAC
'
'
' Get tapered sample
'
loop	callpa	adcmodp,#getsamp	'get pin sample

	mov	x,#0			'reset sample accumulator

	mov	y,#1			'taper up from 1 sample to 32 samples
	rflong	z
	rep	#3,#32
	shr	z,#1		wc
if_c	add	x,y
	add	y,#1

	rep	#4,##s-2		'count middle samples as 32
	rflong	z
	ones	z
	shl	z,#5
	add	x,z

	mov	y,#32			'taper down from 32 samples to 1 sample
	rflong	z
	rep	#3,#32
	shr	z,#1		wc
if_c	add	x,y
	sub	y,#1

	shr	x,#5			'divide sample accumulator by 32 to make tapered average
'
'
' Output result
'
	shl	x,#2			'magnify LSBs
	sub	x,#55			'subtract offset (must adjust to center waveform)

	setbyte	dacres,x,#1		'output conversion LSBs to monitoring DAC
	wrpin	dacres,#m

	incmod	modctr,#0	wc	'update the DAC that feeds the ADC
if_c	add	dacval,#$01
	bitl	dacval,#8
	wypin	dacval,#p^1

	jmp	#loop			'loop
'
'
' Get sample
'
getsamp	wrpin	pa,#p			'set ADC mode
	waitx	adcpre			'allow ADC to acclimate
	wrfast	#0,buff			'ready to have streamer write ADC bits into buffer
	xinit	recmode,#0		'do streamer capture of ADC bits
	jnxfi	#$			'wait for streamer command finished
_ret_	rdfast	#0,buff			'ready to read sample longs (lsb=first sample, msb=last)

'
'
' Data
'
adcmodg	long	%100000_0000000_00_00000_0	'ADC gio
adcmodp	long	%100010_0000000_00_00000_0	'ADC pin adjacent
adcmodv	long	%100001_0000000_00_00000_0	'ADC vio

dacmod	long	%10110_00000000_01_00010_0	'DAC 16-bit with random dither
dacres	long	%10110_00000000_00_00000_0	'DAC 8-bit

recmode	long	$D<<28 + p<<17 + s<<5 + $20	'streamer single-pin record command

dacval	long	$8000+15			'DAC value for feeding ADC

adcpre	long	$20				'ADC acclimation period

modctr	long	0

buff	long	$1000				'sample buffer address in hub

x	res	1
y	res	1
z	res	1
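
For readers who want to check the arithmetic off-chip, here is a rough Python model of what the loop above does with the captured longs. It assumes the buffer is a list of 32-bit integers with the first sample in bit 0 of each long, as the `rdfast` comment states; the function name `tapered_from_longs` is hypothetical.

```python
def tapered_from_longs(longs):
    """Model of the PASM loop: taper up on the first long, count the middle
    longs at weight 32, taper down on the last long, then divide by 32."""
    if len(longs) < 2:
        raise ValueError("need at least a first and a last long")
    x = 0  # sample accumulator

    z = longs[0] & 0xFFFFFFFF
    for y in range(1, 33):  # taper up: weights 1..32, LSB first
        if z & 1:
            x += y
        z >>= 1

    for z in longs[1:-1]:  # middle longs: ONES, then SHL by 5
        x += bin(z & 0xFFFFFFFF).count("1") << 5

    z = longs[-1] & 0xFFFFFFFF
    for y in range(32, 0, -1):  # taper down: weights 32..1, LSB first
        if z & 1:
            x += y
        z >>= 1

    return x >> 5  # divide by 32 to make the tapered average


if __name__ == "__main__":
    print(tapered_from_longs([0xFFFFFFFF] * 4))
```

The model only covers the filtering arithmetic; pin setup, the streamer capture and the monitoring DAC output are not represented.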

Comments

  • cgracey Posts: 10,477
    edited November 19
    I definitely need to add this tapered filtering into the smart pins for measuring the ADCs. It's a variation on what's already there. This is going to make the ADCs work like they are supposed to for discrete samples.

    Using this, I'll now look into getting the auto-calibrated sampling going for really mindless instrumentation operation.
  • cgracey Posts: 10,477
    edited November 19
    Without the tapering filter, this is what all up-to-12-bit-quality sampling looks like on steady voltages:

    ADC_plain.jpg


    With the tapered filter, you get nice single-level samples if you're close to the exact voltage that level represents, with only occasional spikes to adjacent samples.

  • cgracey wrote: »
    I definitely need to add this tapered filtering into the smart pins for measuring the ADCs. It's a variation on what's already there. This is going to make the ADCs work like they are supposed to for discrete samples.

    Using this, I'll now look into getting the auto-calibrated sampling going for really mindless instrumentation operation.

    Will this q32 filter be optional, to allow someone to add a different filter if they find one that better suits their application?
  • jmg wrote: »
    cgracey wrote: »
    I definitely need to add this tapered filtering into the smart pins for measuring the ADCs. It's a variation on what's already there. This is going to make the ADCs work like they are supposed to for discrete samples.

    Using this, I'll now look into getting the auto-calibrated sampling going for really mindless instrumentation operation.

    Will this q32 filter be optional, to allow someone to add a different filter if they find one that better suits their application?

    It will just be a smart pin mode, like the rest.

    If they want some other filter, they'd have to do it in software, like I just did.
  • This sounds like a trapezoidal shaped moving average filter, right?
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman wrote: »
    This sounds like a trapezoidal shaped moving average filter, right?

    Right. Sounds like an apt description.
  • From the PDF:

    The amount of noise reduction is equal to the square-root of the number of points in the average. For example, a 100 point moving average filter reduces the noise by a factor of 10.
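
The square-root rule quoted above can be checked numerically. The sketch below is only an illustration under a stated assumption: the noise is uncorrelated Gaussian noise, which is not necessarily true of the ADC bitstream discussed in this thread. The helper names `moving_average` and `noise_reduction` are hypothetical.

```python
import random
import statistics


def moving_average(samples, points):
    """Boxcar moving average using a running sum."""
    running = sum(samples[:points])
    out = [running / points]
    for i in range(points, len(samples)):
        running += samples[i] - samples[i - points]
        out.append(running / points)
    return out


def noise_reduction(points, count=50_000, seed=1):
    """Ratio of noise standard deviation before and after averaging."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    noise = [rng.gauss(0.0, 1.0) for _ in range(count)]
    filtered = moving_average(noise, points)
    return statistics.pstdev(noise) / statistics.pstdev(filtered)


if __name__ == "__main__":
    for points in (4, 25, 100):
        print(points, round(noise_reduction(points), 2))
```

The printed ratios should land near 2, 5 and 10, that is, near the square root of the number of points.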
  • Tubular Posts: 3,290
    edited November 19
    I was reading the same thing, pedward. It's a good presentation with real-world application and commentary.

    I wonder how much logic this adds though, I would think it substantial, but I could be wrong.
  • I would bet the P2 could be used in place of that FPGA, reducing their cost by the square root of 100! :smiley:
  • Reflecting on this a bit further, I wonder how far we're overshooting a minimal solution that might require a whole lot less logic.

    I guess it's easy enough to test, quicker than building a model in this case, since it's already working with real hardware and real signals
  • Just be aware that the "trapezoidal" filter described in the cited paper is not the kind needed here. The one in the paper uses both positive and negative filter coefficients to create a trapezoidal output from a step input -- useful for peak detection. Here, you want positive, trapezoidal filter coefficients, or -- better -- Gaussian (binomial) coefficients.

    But, oh jeez, here we go again: more added hardware complexity to further delay deliverable silicon. Where -- and when -- does it end?

    -Phil
    “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” -Antoine de Saint-Exupery

  • But, oh jeez, here we go again: more added hardware complexity to further delay deliverable silicon. Where -- and when -- does it end?

    -Phil

    They have a meeting on the respin with ON in 7 or 8 days, Chip promised to meet that deadline.
  • Surely this meeting with ON won't be the only one before a respin. The first silicon won't be available for testing by a wider population of users until after that. What happens if we find additional bugs in the next few weeks? I would assume Parallax will wait to do a respin until they've gotten feedback from people working with the chips that they just received.
  • pedward wrote:
    They have a meeting on the respin with ON in 7 or 8 days, Chip promised to meet that deadline.
    Even still, the probability of nasty surprises in the new silicon -- and the need for yet another rev -- increases with the number of modifications made.

    The beauty of the P1 is that there are no built-in peripherals, except for the counters. Everything can be done simply in software. The P2 now appears to be getting overstuffed with specialized hardware to what? Simplify things for the user? I really don't get it. But that doesn't mean I can't be sold at some point.

    -Phil
    “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” -Antoine de Saint-Exupery
  • Tubular Posts: 3,290
    edited November 19
    Perhaps, in the domain of ADC quality, we're at that stage where things are still complicated, before they get simpler again.

    I wonder whether these ADC (and other peripheral) routines shouldn't go into the ROM, like a kind of analog BIOS, so they can be called from Taqoz or Cluso's monitor or other code. Also I dare say this trapezoidal filter idea will likely evolve, and ROM changes can apparently be slipped less painfully into the sausage, later in the process.

    --

    I think you'll like this P2 chip eventually, Phil. I remember my early days with the P1 and the expanse of "what suddenly became possible" and how easy it was to develop quick proof-of-concepts. This P2 chip already has me down at the local coil-makers, talking possibilities and getting custom stuff made.

    I'd also (respectfully) suggest that the P1 counters, while brilliant and flexible, still required an AN001 education process to be aware of how to take advantage of them, and the market and online catalogues _still_ think P1 doesn't have any UARTs, nor any timers, nor any ADC capability at all, despite us insiders knowing better.

    This is what Digikey says today
    P1_peripherals.png

    See that blank peripherals box? For P2 it'll be important to be recognized straight up for analog, video, streaming/DMA credentials, right from the beginning.

    USB, HDMI, SDXC and other licensable tech runs the risk of not being recognized like the P1 counters, but that can be dealt with by paying some fees and creating some definitive IP, if need be.

  • What we are ending up with is a mind-boggling peripheral block totally unlike anything around. They are not traditional UART/I2C/SPI/USB/etc garden run-of-the-mill peripherals, but extremely powerful building blocks capable of so much more.
    I am in awe of the streamer's ability to create VGA at 1920x1080 with so few instructions. You know what, it's such a waste of a cog :smiley: I think we need some tiny cogs for this ;) Seems like a cog with 64-128 cog ram longs and the same for lut would be sufficient.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Cluso, don't worry I'm hoping people will be able to make use of the remaining video COG processing power for sprites, fancy effects etc. If there is sufficient hub bandwidth and COG RAM left it won't need to go to waste. Maybe it could do some legacy PS/2 keyboard/mouse stuff too when it is not setting up the next scanline.
  • evanh Posts: 5,871
    edited November 20
    Chip,
    I'm pretty certain you have done a second-order filter - except it's chunked rather than windowed, ie: non-moving average.

    If you really want to throw a little extra hardware at the problem then I'd recommend doing a double accumulator. You might remember a drawing from the past:
    appnote.tsa002.png

    This has three accumulators so is third-order. Ignore the decimators, software handles that.

    MOD_CLK there becomes sysclock of the Prop2.

    Money is a placeholder for cooperation
  • ozpropdev Posts: 2,310
    edited November 20
    @Cluso99
    The streamers can be stuffed by interrupts.
    We already have video running this way.
    The video in my Logic Analyzer does this, even though I haven't taken advantage of the idle time.
    Melbourne, Australia
  • cgracey Posts: 10,477
    edited November 20
    evanh wrote: »
    Chip,
    I'm pretty certain you have done a second-order filter - except it's chunked rather than windowed, ie: non-moving average.

    If you really want to throw a little extra hardware at the problem then I'd recommend doing a double accumulator. You might remember a drawing from the past:
    appnote.tsa002.png

    This has three accumulators so is third-order. Ignore the decimators, software handles that.

    MOD_CLK there becomes sysclock of the Prop2.

    Evanh, while that 2nd-order filter diagram looks simple, I haven't been able to get my head around it after many years of trying. Maybe you could explain to me what that thing does.

    What we have in this new filter is extremely simple in hardware. You guys have way overblown the issue. This is simply an adder that adds 1 to 32, 32 for a while, then 32 to 1, based on each ADC sample bit. It's nothing to freak out about. I consider it a gift from God, because it's got the ADC's working exactly as they need to for discrete sampling. It's a miracle! I had no expectation that such an improvement would even be possible. The ADC's are exactly where they need to be now.
  • evanh Posts: 5,871
    edited November 20
    cgracey wrote: »
    Evanh, while that 2nd-order filter diagram looks simple, I haven't been able to get my head around it after many years. Maybe you could explain to me what that thing does.
    I'll do my best but I'm not really any better informed than you.

    This is simply an adder that adds 1 to 32, 32 for a while, then 32 to 1, based on each ADC sample bit.
    This is the same except that the double accumulator is not tied to 5 bits and is windowed.

    The confusing part is the way the accumulators roll over at an alarming rate. This washes out in the differentiation at the decimator because the summing is circular.

    That's the best overview I can muster.

    Money is a placeholder for cooperation
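
The point about accumulators rolling over and the wrap washing out in the differentiation can be illustrated with a generic second-order cascade (two accumulators, decimation, two differences). This is a sketch of the general technique only, not a reconstruction of the drawing referenced above; the names `cic2_decimate` and `triangle_fir` and the chosen widths are assumptions for the example. It relies on the accumulator width satisfying 2**width > ratio**2.

```python
import random


def cic2_decimate(bits, ratio, width):
    """Two wrapping accumulators at the bit rate, two differences at the
    decimated rate. All arithmetic is modulo 2**width."""
    mask = (1 << width) - 1
    acc1 = acc2 = 0
    prev_sum = prev_diff = 0
    out = []
    for n, bit in enumerate(bits, start=1):
        acc1 = (acc1 + bit) & mask   # first accumulator (wraps)
        acc2 = (acc2 + acc1) & mask  # second accumulator (wraps often)
        if n % ratio == 0:           # decimate: look only every `ratio` bits
            diff = (acc2 - prev_sum) & mask
            out.append((diff - prev_diff) & mask)
            prev_sum, prev_diff = acc2, diff
    return out


def triangle_fir(bits, ratio):
    """Reference: triangular weights 1..ratio..1 applied directly, no wrapping."""
    taps = [min(j + 1, 2 * ratio - 1 - j) for j in range(2 * ratio - 1)]
    out = []
    for end in range(ratio - 1, len(bits), ratio):
        out.append(sum(w * bits[end - j] for j, w in enumerate(taps) if end - j >= 0))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    stream = [rng.randint(0, 1) for _ in range(256)]
    print(cic2_decimate(stream, ratio=8, width=7) == triangle_fir(stream, ratio=8))
```

Even though the 7-bit accumulators overflow many times during the run, the modular differences recover the same values as the unwrapped triangular filter, which is the sense in which the summing is circular.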
  • Chip,
    Smartpin mode %01100, "Count A-input highs", with X set for the DEC_CLK period, is exactly like one accumulator and one decimator stage, with its differentiation, from that drawing.

    And by setting X to zero it just keeps counting up. Then the decimation, and differentiation, has to be arranged in software.

    Money is a placeholder for cooperation
  • evanh Posts: 5,871
    edited November 20
    So, without needing any more flops in the smartpins, it should be possible to have a mode that makes the decimation flops into a second accumulator.

    EDIT: And this shouldn't change the amount of summing logic either, since the existing differentiation requires it as well.

    Money is a placeholder for cooperation
  • Interesting but not trivial. Best to finish off this first, I think:
    http://forums.parallax.com/discussion/comment/1453877/#Comment_1453877
    Formerly known as TonyB
  • cgracey wrote: »

    I consider it a gift from God, because it's got the ADC's working exactly as they need to for discrete sampling. It's a miracle! I had no expectation that such an improvement would even be possible. The ADC's are exactly where they need to be now.

    So, as a casual observer, please give a rough useful effective sample rate for someone who wants 12 bit precision ADC in a P2 application. Willing to do software filter.
    I am the Master, and technology my slave.
  • cgracey Posts: 10,477
    edited November 20
    The_Master wrote: »
    cgracey wrote: »

    I consider it a gift from God, because it's got the ADC's working exactly as they need to for discrete sampling. It's a miracle! I had no expectation that such an improvement would even be possible. The ADC's are exactly where they need to be now.

    So, as a casual observer, please give a rough useful effective sample rate for someone who wants 12 bit precision ADC in a P2 application. Willing to do software filter.

    Because the ADC duty cycle ranges from ~16% to ~84%, ~1/3 of the range is of no use within the power rails. So, you'd need to oversample by 1/(2/3) = 1.5, i.e. ~50% more. So, that's 2^12 * 1.5, or 6,144 clocks, plus 32 for the taper. That's 6,176 clocks.

    If we have smart pin filtering, you can pick up a filtered sample every 6,176 clocks.

    Without smart pin filtering, you can capture a sample's worth of bits every 6,176 clocks and then process them in software. The time to process in software will be less than the sample time, but it would eat into your CPU time by ~20%. While that's not much, it would be a big bugaboo to have to have the filtering code resident and lose use of the streamer.

    So, in either case, at 250MHz, your sample rate could be 250M/6,176, or ~40.5kHz.
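
The arithmetic in the reply above can be written out as a small calculation. The figures (12 bits, a usable duty-cycle span of about 2/3, a 32-bit taper, 250MHz clock) are the ones quoted in the post; the function names and parameter names are hypothetical.

```python
from fractions import Fraction


def sample_clocks(bits=12, usable_span=Fraction(2, 3), taper=32):
    """Clocks per reading: 2**bits scaled by 1/usable_span, plus the taper."""
    clocks = Fraction(2 ** bits) / usable_span + taper
    return int(clocks)


def sample_rate_hz(clock_hz=250_000_000, **kwargs):
    """Readings per second when one reading is taken back to back."""
    return clock_hz / sample_clocks(**kwargs)


if __name__ == "__main__":
    print(sample_clocks())          # clocks per 12-bit reading
    print(round(sample_rate_hz()))  # readings per second at 250MHz
```

Changing `bits` shows how quickly the rate falls: each extra bit of resolution roughly halves the sample rate.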
  • Almost 44.1kHz.
  • evanh Posts: 5,871
    edited November 20
    Chip,
    That drawing is third-order, not second-order. I just noticed you called it second-order. So, to get second-order, only do two of the three shown stages. And even then only do two accumulators. Forget the decimator stages since software can do those easy.

    That way you aren't making it any bigger than existing resources in the smartpins.

    EDIT: Here's all that's needed
    [attached image, 304 x 176]
    Money is a placeholder for cooperation
  • evanh wrote: »
    Chip,
    That drawing is third-order, not second-order. I just noticed you called it second-order. So, to get second-order, only do two of the three shown stages. And even then only do two accumulators. Forget the decimator stages since software can do those easy.

    That way you aren't making it any bigger than existing resources in the smartpins.

    EDIT: Here's all that's needed

    With a second-order filter, wouldn't we need a 2nd-order integrator? I still have no idea how these things work.