Phil, this is a 65-tap Tukey filter, with the input going from 1/6-duty to 5/6-duty over 10,000 bit samples:
You can clearly see that there are way more than 64 steps here. Things get a little low-resolution at the 50% duty point (10101010...), the 25% and 75% points, and so on. What can we do to fix that, as it's the only problem we've got?
Here is the code:
'a=4 : b = 00 : m=32 '17-tap Hann window, t2 and t3 cancel out
'a=4 : b = 08 : m=16 '25-tap Tukey window
'a=8 : b = 00 : m=04 '33-tap Hann window, t2 and t3 cancel out
'a=8 : b = 16 : m=02 '49-tap Tukey window
a=8 : b = 32 : m=01.5 '65-tap Tukey window
'a=8 : b = 48 : m=01 '81-tap Tukey window
gx = 10 : cy = 540 : by = cy + 100 : ay = by + 100 : dy = ay + 50 : sy = dy + 50
sparse = 10
t0 = 0
t1 = t0 + a
t2 = t1 + a
t3 = t2 + b
t4 = t3 + a
t5 = t4 + a
topbit = t5
print "taps",,t0,t1,t2,t3,t4,t5
dim t(topbit)
for x = 0 to topbit : t(x) = 0 : next x 'clear bits
acc = 0
inta = 0
intb = 0
intc = 0
for iter = 0 to 9999
  for x = topbit to 1 step -1 : t(x) = t(x-1) : next x 'shift bits
  acc = acc + 1/6 + ((4/6) * (iter/10000)) 'new ADC bit
  if acc >= 1 then
    acc = acc - 1
    t(0) = 1
  else
    t(0) = 0
  endif
  delt = t(t0) - t(t1)*2 + t(t2) - t(t3) + t(t4)*2 - t(t5)
  inta = inta + delt
  intb = intb + inta
  intc = intc + intb * m
  x = gx + iter/sparse
  if x = int(x) then
    line x, cy, x, cy - intc/8 'plot intc
    line x, by, x, by - intb 'plot intb
    line x, ay, x, ay - inta*2 'plot inta
    line x, dy, x, dy - delt*8 'plot delta
    line x, sy, x, sy - t(0)*8 'plot ADC bit
  endif
next iter
You can get SmallBASIC here. It's only 4MB and is really simple to operate:
https://sourceforge.net/projects/smallbasic/
I don't doubt that the steps are small as the input slowly changes. I'm more concerned about DC stationarity and accuracy. What happens with a constant, steady voltage input, say, from a battery?
-Phil
The reason I did 10,000 samples was so that you could see that measurement-to-measurement there is small, rather consistent change. Remember that each sample is only a product of the 65 bits in the tap chain. There is no other state involved. You're seeing every 10th measurement of 10,000 measurements. There's hardly any difference between neighboring measurements.
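Chip's point that each reading depends only on the bits in the tap chain can be checked numerically. The Python sketch below is only an illustration: the function names are mine, not from the thread, and the `m` scale factor is left out. It derives the equivalent FIR weights from the tap pattern in the `delt` line, then compares the recursive delt/inta/intb/intc form against a direct weighted sum of the most recent 65 bits.

```python
from itertools import accumulate

COEFFS = (1, -2, 1, -1, 2, -1)  # weights applied to t0..t5 in the delt line


def tap_positions(a, b):
    """Tap indices t0..t5, computed the same way as in the listing."""
    t1 = a
    t2 = t1 + a
    t3 = t2 + b
    t4 = t3 + a
    t5 = t4 + a
    return (0, t1, t2, t3, t4, t5)


def tap_kernel(a, b):
    """Equivalent FIR weights: triple running sum of the sparse tap pattern."""
    taps = tap_positions(a, b)
    sparse = [0] * (taps[-1] + 1)
    for pos, c in zip(taps, COEFFS):
        sparse[pos] += c  # t2 and t3 cancel when b = 0
    k = sparse
    for _ in range(3):
        k = list(accumulate(k))
    return k


def first_order_bits(levels):
    """First-order modulator as in the listing: accumulate, emit 1 on overflow."""
    acc = 0.0
    bits = []
    for v in levels:
        acc += v
        if acc >= 1:
            acc -= 1
            bits.append(1)
        else:
            bits.append(0)
    return bits


def recursive_filter(bits, a, b):
    """delt -> inta -> intb -> intc, without the m scale factor."""
    taps = tap_positions(a, b)
    reg = [0] * (taps[-1] + 1)
    inta = intb = intc = 0
    out = []
    for bit in bits:
        reg = [bit] + reg[:-1]  # shift the new bit into t(0)
        delt = sum(c * reg[p] for p, c in zip(taps, COEFFS))
        inta += delt
        intb += inta
        intc += intb
        out.append(intc)
    return out


def fir_filter(bits, kernel):
    """Direct weighted sum of the most recent len(kernel) bits."""
    return [sum(w * bits[n - j] for j, w in enumerate(kernel) if n - j >= 0)
            for n in range(len(bits))]


ramp = [1/6 + (4/6) * (i / 10000) for i in range(0, 10000, 5)]
bits = first_order_bits(ramp)
kernel = tap_kernel(8, 32)
print(recursive_filter(bits, 8, 32) == fir_filter(bits, kernel))  # -> True
print(max(kernel), sum(kernel))  # -> 64 3072
```

Under this reading, the kernel sum times `m` comes to 4096 for every configuration listed at the top of the program except the active 65-tap one, where 3072 * 1.5 = 4608. That may be deliberate, but it is worth knowing when comparing plots across configurations.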
Can you add a plot of INL (the difference between the ideal straight line and the actual indicated value)?
As you say, it looks worst around 50%, 25%, etc., but none of the Manhattan plots seems to quantify that error.
It would also be nice to plot the many samples as an infinite-persistence, scope-type trace, to give a 'fat line' rather than a half-black plot.
A separate step-response plot would show the filter's rise-time and delay effects; I think that now partially appears on the left side of the polygon?
Maybe add some small noise in there, as real scope-mode use will include ADC noise.
There is not much point in making a filter that is much better than the silicon/system noise floor.
Looks to me like easiest thing to do is have chip provide stream of sums of bits at a user specified rate (like 64 bits, for example).
That should give user plenty of time to do whatever higher order sinc filter they want...
Wait, that doesn't work... Amazing how something so simple can be tough to analyze...
Here is a second order modulator. Third order modulators are tricky and you need to reduce the gains somewhat.
'a=4 : b = 00 : m=32 '17-tap Hann window, t2 and t3 cancel out
'a=4 : b = 08 : m=16 '25-tap Tukey window
'a=8 : b = 00 : m=04 '33-tap Hann window, t2 and t3 cancel out
'a=8 : b = 16 : m=02 '49-tap Tukey window
a=8 : b = 32 : m=01.5 '65-tap Tukey window
'a=8 : b = 48 : m=01 '81-tap Tukey window
gx = 10 : cy = 540 : by = cy + 100 : ay = by + 100 : dy = ay + 50 : sy = dy + 50
sparse = 10
t0 = 0
t1 = t0 + a
t2 = t1 + a
t3 = t2 + b
t4 = t3 + a
t5 = t4 + a
topbit = t5
print "taps",,t0,t1,t2,t3,t4,t5
dim t(topbit)
for x = 0 to topbit : t(x) = 0 : next x 'clear bits
adin = 0
adinta = 0
adintb = 0
adintc = 0
inta = 0
intb = 0
intc = 0
for iter = 0 to 9999
  for x = topbit to 1 step -1 : t(x) = t(x-1) : next x 'shift bits
  adin = 1/6 + ((4/6) * (iter/10000)) 'analog input level, ramping from 1/6 to 5/6
  adinta = adinta+adin
  adintb = adintb+adinta
  adintc = adintc+adintb
  ' adinta for first order
  ' adintb for second order
  ' adintc for third order (currently not working)
  if adintb >= 1.0 then
    adintc = adintc - 1.0
    adintb = adintb - 1.0
    adinta = adinta - 1.0
    t(0) = 1
  else
    t(0) = 0
  endif
  delt = t(t0) - t(t1)*2 + t(t2) - t(t3) + t(t4)*2 - t(t5)
  inta = inta + delt
  intb = intb + inta
  intc = intc + intb * m
  x = gx + iter/sparse
  if x = int(x) then
    line x, cy, x, cy - intc/8 'plot intc
    line x, by, x, by - intb 'plot intb
    line x, ay, x, ay - inta*2 'plot inta
    line x, dy, x, dy - delt*8 'plot delta
    line x, sy, x, sy - t(0)*8 'plot ADC bit
  endif
next iter
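To compare the two modulators in isolation from the filter and plotting code, here is a small Python port of just the bit-generation step. It is a sketch, not a definitive model: the function names are mine, and the third accumulator (`adintc`) is omitted because the listing marks it as not working and does not use it in the threshold test.

```python
def first_order_bits(levels):
    """Single accumulator: emit 1 and subtract 1 whenever it reaches 1."""
    acc = 0.0
    bits = []
    for v in levels:
        acc += v
        if acc >= 1.0:
            acc -= 1.0
            bits.append(1)
        else:
            bits.append(0)
    return bits


def second_order_bits(levels):
    """Port of the adinta/adintb loop: two cascaded accumulators, both
    reduced by 1 whenever a 1 bit is emitted."""
    inta = 0.0
    intb = 0.0
    bits = []
    for v in levels:
        inta += v
        intb += inta
        if intb >= 1.0:
            inta -= 1.0
            intb -= 1.0
            bits.append(1)
        else:
            bits.append(0)
    return bits


half = [0.5] * 8
print(first_order_bits(half))   # -> [0, 1, 0, 1, 0, 1, 0, 1]
print(second_order_bits(half))  # -> [0, 1, 1, 0, 0, 1, 1, 0]
```

Both streams carry four ones in eight bits for a 50% input, but the second-order loop arranges them differently, and that difference in bit ordering is what the downstream filter sees.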
For DC levels, we get toggling between two 6-bit-quality values.
Yup. Thanks for doing the test, Chip! I think this is where the rubber meets the road. A large percentage of apps are going to be reading things like temperature, light levels, etc., that don't change much from sample to sample. Precision and accuracy are what matter here. And you really need 2^n bit samples to get n-bit resolution -- maybe more with the presence of noise.
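The link between sample count and resolution for a steady input can be illustrated with a plain count of ones over a fixed window. This Python sketch assumes an ideal, noise-free first-order modulator like the one in Chip's listing, using integer arithmetic so the duty cycle is exact; the names `dc_bits` and `window_counts` are mine.

```python
def dc_bits(num, den, count):
    """Bitstream of an ideal first-order modulator for a DC input of num/den."""
    acc = 0
    bits = []
    for _ in range(count):
        acc += num
        if acc >= den:
            acc -= den
            bits.append(1)
        else:
            bits.append(0)
    return bits


def window_counts(num, den, window):
    """Set of ones-counts seen over every window position in one full period."""
    bits = dc_bits(num, den, den + window)
    return {sum(bits[s:s + window]) for s in range(den)}


for num in (85, 86, 87):
    print(num, sorted(window_counts(num, 256, 64)))
# -> 85 [21, 22]
# -> 86 [21, 22]
# -> 87 [21, 22]
print(sorted(window_counts(85, 256, 256)))  # -> [85]
```

With a 64-bit window, three inputs that differ by one part in 256 produce the same two readings, so a single reading cannot tell them apart. Only the 256-bit window resolves them, which matches the point about needing on the order of 2^n bits for n-bit resolution.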
Saucy, this looks great, but isn't this demonstrating a 2nd-order ANALOG integrator modulator? The slope looks fantastic, anyway.
What about the DC problem of only having 6-bit resolution?
I have been running some tests with a simulated bitstream, using a Windows app to generate and analyze the data while tweaking the DSP algorithms so that they will run (and do run) on a P1 when compiled under Simple IDE. So now I can report that I have sinc3 running on the P1 with a C++ function that compiles to just 42 assembly instructions, producing two 7-bit values every 16 clock cycles, i.e., by calling it twice for every overlapped 32 counts. When carry propagation is fully implemented in my sinc3 function, it will produce 4 processed samples every 32 counts. In the meantime, I am able to run some tests on a simulated bitstream by simulating the behavior of an actual sigma-delta modulator in software. That is not hard to do; see the attached code for the core of the algorithm!
And the results do seem to support the idea of getting 8 bits of usable data at NTSC rates and 16 or better at audio rates. For example, if I take a 75 Hz tone at -1 dB sampled at 44100 Hz and run it through a simulated sigma-delta with NO noise dithering, the spectrum analyzer shows a second-harmonic component at about -50 dB; everything else would be filtered out by a proper Nyquist filter when you scale to the expected performance at video rate. That takes into account the fact that I am simply processing audio right now at 32x, as opposed to ~5700x, which is the target rate for audio.
void sigma_delta::set_input (MATH_TYPE x)
{
    m_input = (int)(m_gain0*x-m_offset);
}

inline int sigma_delta::dac1 (bool fb)
{
    // 1-bit feedback DAC: +1 when fb is true, -1 otherwise
    return fb ? 1 : -1;
}

void sigma_delta::iterate (int N)
{
    int i;
    for (i=0;i<N;i++)
    {
        m_accum += m_input;
        m_accum += m_gain1*dac1 (q0);
        q0 = (m_accum<0);          // comparator output, fed back on the next pass
        m_bitstream <<= 1;         // shift the new bit into the LSB
        m_bitstream += (q0&0x01);
    }
}
// optimized version of sinc3 - compiles to just
// 42 instructions on a P1 - but carry propagation
// is not implemented yet, so you only get two
// valid 7-bit samples per iteration with a range
// of [0..64] and YES the overflow bit is important!
DWORD sigma_delta::sinc3B (DWORD input)
{
DWORD r0, r1, r2, r3, r4, r5, s0, s1;
r0 = input;
s0 = (input&EVEN_BITS)<<1;
s1 = ((input&ODD_BITS)+((input&ODD_BITS)>>2))>>1;
r1 = ((s0&ODD_PAIRS) + (s1&ODD_PAIRS))>>2;
r2 = (s0&EVEN_PAIRS) + (s1&EVEN_PAIRS);
s0 = (r1&EVEN_NIBBLES);
s1 = (r1&ODD_NIBBLES)>>4;
r4 = (r2&EVEN_NIBBLES)<<1;
r3 = (r2&ODD_NIBBLES)>>3;
r4+= (s0+s1);
r3+= ((s0>>8)+s1);
s0 = (r3&EVEN_BYTES);
s1 = (r3&ODD_BYTES);
r5 = (r4<<1)+s0+s1+(s0>>8)+(s1<<8);
return r5;
}
// earlier version of sinc3 ..
DWORD sigma_delta::sinc3 (DWORD input)
{
DWORD s0, s1;
REG[0] = input;
s0 = (input&EVEN_BITS)<<1;
s1 = ((input&ODD_BITS)+((input&ODD_BITS)>>2))>>1;
REG[1] = ((s0&ODD_PAIRS) + (s1&ODD_PAIRS))>>2;
REG[2] = (s0&EVEN_PAIRS) + (s1&EVEN_PAIRS);
s0 = (REG[1]&EVEN_NIBBLES);
s1 = (REG[1]&ODD_NIBBLES)>>4;
REG[4] = (REG[2]&EVEN_NIBBLES)<<1;
REG[3] = (REG[2]&ODD_NIBBLES)>>3;
REG[4]+= (s0+s1);
REG[3]+= ((s0>>8)+s1);
s0 = (REG[3]&EVEN_BYTES);
s1 = (REG[3]&ODD_BYTES);
REG[5] = (REG[4]<<1)+s0+s1+(s0>>8)+(s1<<8);
return 0;
}
MATH_TYPE sigma_delta::randomize3 (MATH_TYPE data)
{
    MATH_TYPE voltage = 0.10;
    int sample1, sample2, sample3, sample4, sample5;
    set_input (data);
    // since we are processing stereo samples on an
    // odd and even sample basis we need to flush
    // the buffer and precondition the bitstream to
    // the new channel - FIXME ELSEWHERE!!!
    iterate (32);
    sinc3(m_bitstream);
    // now grab the two center values because carry
    // propagation is not yet implemented
    // (mask is parenthesized because >> binds tighter than &)
    sample2 = REG[5]&0x0000ff00;
    sample1 = (REG[5]&0x00ff0000)>>8;
    set_input (data);
    // now iterate again - and get two more valid samples!
    iterate (16);
    sinc3(m_bitstream);
    sample4 = REG[5]&0x0000ff00;
    sample3 = (REG[5]&0x00ff0000)>>8;
    // use a [1,3,3,1] convolutional kernel for now
    sample5 = 0.5*(sample1+3*sample2+3*sample3+sample4);
    voltage = float(sample5-32768)/65536.0;
    return voltage;
}
Nice work. The fastspin compiler allows in-line assembler, which may help here. Did you try that?
AMC1035 has a second order modulator. I don't think that's what we have...
True, not inside the P2, but P2 needs to talk to these external ADCs, and the same filter can be used to improve the ADC the P2 does have.
If you average/filter the spreadsheet over a number of readings (for those DC-type uses), what ENOB does that spreadsheet indicate?
The ~10b you got is probably already better than the P2 1/f noise floor for DC-type use.
What about the DC problem of only having 6-bit resolution?
This is what you get when returning a filtered 16-bit value on every clock cycle?
For the sorts of measurements @"Phil Pilgrim (PhiPi)" is referring to, is this the mode you'd be using anyhow? Most of those measurements tend to be low frequency (in the Hz to low kHz range). In which case, would the original unfiltered ADC mode be a better choice here? From what I recall of the original ADC characterization, you'd still get at least a good 10-12 bits at a ~1 kHz sample rate with a sysclk of 180 MHz.
Phil also mentioned running from a battery, so I wonder if this is also about whether this will work well under low-power situations. If the chip is running at 180 MHz, but waiting most of the time for smart pin events, what would be a reasonable power consumption estimate? Conversely, if you were to run the chip at a lower clock rate, does that help the bit resolution issue at all? In other words, could you sample only 12 bits at 10 MHz and get the same resolution as sampling 16 bits at 160 MHz?
If the chip is running at 180 MHz, but waiting most of the time for smart pin events, what would be a reasonable power consumption estimate?
Currently, not great, waiting does not save much at all (unlike P1).
The SysCLK tree is global, and the COG influence rather smaller. ADC also can only clock at SysCLK.
Measurements indicate SysCpd=1.60nF for SysCLK tree and CogCpd = 83.33pF/COG
Id = (SysCpd + N*CogCpd) * Vcc * Fi
If Chip drops the integrating capacitor, as he has indicated (chasing higher scope-mode MHz), the lower ADC frequency limit will increase, pushing up the lowest ADC-power envelope.
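The supply-current formula above is easy to turn into a quick estimator. In this Python sketch the two capacitance figures are the ones quoted in the post; the 1.8 V supply is my assumption for illustration only (the post does not state Vcc), and the function name is hypothetical.

```python
SYS_CPD = 1.60e-9    # F, SysCLK tree, as quoted above
COG_CPD = 83.33e-12  # F per running COG, as quoted above


def supply_current(f_hz, cogs, vcc=1.8):
    """Id = (SysCpd + N*CogCpd) * Vcc * Fi, in amps.

    vcc defaults to 1.8 V, which is an assumed value, not one from the thread.
    """
    return (SYS_CPD + cogs * COG_CPD) * vcc * f_hz


for cogs in (1, 8):
    print(cogs, "cog(s) at 180 MHz:", round(supply_current(180e6, cogs), 3), "A")
print("clock-tree share with 8 cogs:", round(SYS_CPD / (SYS_CPD + 8 * COG_CPD), 2))
```

With these figures the clock tree accounts for roughly 70% of the eight-COG estimate, which is consistent with the remark that idling COGs saves little. Because the formula is linear in frequency, dropping from 180 MHz to 10 MHz scales the estimate by 1/18.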
Conversely, if you were to run the chip at a lower clock rate, does that help the bit resolution issue at all? In other words, could you sample only 12 bits at 10 MHz and get the same resolution as sampling 16 bits at 160 MHz?
Most ADCs have a broad but definite sweet spot of operating MHz; above or below that, they give worse specs.
The P2 has not really been characterized much yet on ENOB/MHz.
I think the battery was intended as an arbitrary but stable voltage point.
Correct.
Comments
You can get SmallBASIC here. It's only 4MB and is really simple to operate:
https://sourceforge.net/projects/smallbasic/
The valleys result when the input stream is a harmonic of the filter length.
For DC levels, we get toggling between two 6-bit-quality values.
For 85/256-duty, we get toggling between 86 and 82.
For 86/256-duty, we get mostly 86 with an occasional 82.
For 87/256-duty, we get toggling between 90 and 86.
These 4-step level changes in a 256-step scale indicate only 6-bit quality.
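The DC behaviour described here can be explored offline with a short Python sketch. It assumes an ideal first-order modulator and the 65-tap kernel implied by the `delt` line with a=8, b=32; the rescaling to a 256-step range is my own choice, so the printed values are not expected to match Chip's 86/82 figures exactly. All function names are hypothetical.

```python
from itertools import accumulate


def tukey_kernel(a=8, b=32):
    """FIR weights implied by the delt taps followed by three accumulators."""
    taps = (0, a, 2 * a, 2 * a + b, 3 * a + b, 4 * a + b)
    sparse = [0] * (taps[-1] + 1)
    for pos, c in zip(taps, (1, -2, 1, -1, 2, -1)):
        sparse[pos] += c
    k = sparse
    for _ in range(3):
        k = list(accumulate(k))
    return k


def dc_bits(num, den, count):
    """Ideal first-order bitstream for a DC input of num/den (integer math)."""
    acc = 0
    bits = []
    for _ in range(count):
        acc += num
        if acc >= den:
            acc -= den
            bits.append(1)
        else:
            bits.append(0)
    return bits


def dc_readings(num, den, kernel):
    """Filter output at every position across one full period of the stream."""
    n = len(kernel)
    bits = dc_bits(num, den, n - 1 + den)
    return [sum(w * bits[i - j] for j, w in enumerate(kernel))
            for i in range(n - 1, n - 1 + den)]


def rescale(readings, kernel, full_scale=256):
    """Distinct readings mapped onto a 0..full_scale range (assumed scaling)."""
    total = sum(kernel)
    return sorted({round(r * full_scale / total) for r in readings})


kernel = tukey_kernel()
for num in (85, 86, 87):
    print(num, rescale(dc_readings(num, 256, kernel), kernel))
```

Averaged over a full period the readings land exactly on the input (the sum of the 256 readings equals 85, 86 or 87 times the kernel sum), so the spread seen in individual readings is a matter of per-sample resolution rather than a bias.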
...Sigh...
I was playing with that Excel spreadsheet and changed input from sinusoid to fixed value of 0.5125
The output of 5th order sinc with 64 bit oversampling varies between 0.506 and 0.518.
That doesn't seem like a real 16-bit precision...
What does a 3rd-order sinc with 256x oversampling indicate? I believe that's what is planned for the P2 Sinc.
-Phil
very close to 10b, so better than 1/N
TI's data for sinc3/256 indicated ~ 13.5bits measured. (AMC1035)
I think the 64-tap window is not the best. Max value is only 64, ramp length 16 and plateau 32. Try this:
Max value is 256, ramp length 32 and plateau 0. Max sum is over twice as big.
CMA(8,8,8) and CMA(16,16,16) are length 24 and 48 Sinc3 filters.
That looks a little rougher than the Tukey64, but generates twice the filter value.
The problem with all these windows is that we're only getting 6-bit performance, in the case of a 65-tap setup.
Maybe we should just implement a Sinc3/32, staggered for samples every 16 clocks, and get really good 8-bit samples, in every case.
Right. We have a simple 1st-order modulator in each pin.
How are these Sinc3 filters?
They are Sinc3 by definition.
CMA(8) = [1,1,1,1,1,1,1,1] Sinc1
CMA(8,8) = [1,1,1,1,1,1,1,1] * [1,1,1,1,1,1,1,1] Sinc2
CMA(8,8,8) = [1,1,1,1,1,1,1,1] * [1,1,1,1,1,1,1,1] * [1,1,1,1,1,1,1,1] Sinc3
where * is convolution.
You can see how the rectangular Sinc1 convolved with itself turns into the triangular Sinc2.
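The convolution definitions above can be reproduced in a few lines of Python. This is a minimal sketch: `convolve` and `cma` are names I have chosen, and the code only builds the weight lists, not a streaming filter.

```python
def convolve(x, y):
    """Plain discrete convolution of two weight lists."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[i + j] += xv * yv
    return out


def cma(*lengths):
    """Cascade of moving averages: convolve one rectangle per length."""
    kernel = [1]
    for n in lengths:
        kernel = convolve(kernel, [1] * n)
    return kernel


print(cma(8))       # -> [1, 1, 1, 1, 1, 1, 1, 1]
print(cma(8, 8))    # -> [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
print(sum(cma(8, 8, 8)), len(cma(8, 8, 8)))  # -> 512 22
```

The second line shows the rectangle turning into a triangle. The third stage rounds the triangle into a bell shape whose weights sum to 8*8*8 = 512.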
CMA(x,y,z) is a Sinc3 filter when x=y=z. There are only four taps:
There are a couple of useful CMA(x,y,z) with sum=x*y*z = 2^N:
CMA(8,8,8), length = 24, sum = 512
CMA(16,16,16), length = 48, sum = 4096
And a couple with sum ~ 2^N:
CMA(10,10,10), length = 30, sum = 1000
CMA(20,20,20), length = 60, sum = 8000
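The lengths and sums listed above can be checked with a short Python sketch. It rests on an assumption of mine, made by analogy with Chip's tap-and-integrate listing: that the four taps are the weights 1, -3, 3, -1 of (1 - z^-x)^3 at delays 0, x, 2x and 3x, followed by three accumulators. The function names are hypothetical.

```python
from itertools import accumulate


def sinc3_taps(x):
    """(delay, weight) pairs of (1 - z^-x)^3; an assumed reading of 'four taps'."""
    return [(0, 1), (x, -3), (2 * x, 3), (3 * x, -1)]


def sinc3_kernel(x):
    """Triple running sum of the four sparse taps gives the full weight list."""
    sparse = [0] * (3 * x + 1)
    for pos, weight in sinc3_taps(x):
        sparse[pos] = weight
    k = sparse
    for _ in range(3):
        k = list(accumulate(k))
    return k


def summary(x):
    k = sinc3_kernel(x)
    return {"tap_span": 3 * x,
            "nonzero_weights": sum(1 for w in k if w != 0),
            "sum": sum(k)}


for x in (8, 16, 10, 20):
    print(x, summary(x))
```

Under that assumption the tap span 3x reproduces the quoted lengths (24, 48, 30, 60) and the sums come out as x^3 (512, 4096, 1000, 8000). The number of non-zero weights is slightly smaller at 3x - 2, since the last few entries of the integrated kernel are zero.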
______________
I think it's safe now to declare that the filtering methods discussed here cannot increase resolution beyond the natural scope of the sample size. There really are no rabbits in that hat. However, they might still be useful for reducing noise, at the expense of settling time. Nonetheless, I still aver that this is best left to software, rather than trying to integrate some sort of filter into the hardware. Despite the unfortunate delay this has caused in the P2's coming to fruition, it's certainly been interesting to witness the foment! Just remember, though, "If it looks too good to be true, it probably is." :-)
Cheers!
-Phil