And in other news, I think I have found the source of the thrashing that occurs between stairsteps and is causing so much grief, especially for DC users. The solution is to use a sinc3 filter driving a Nyquist filter, with the possible addition of an external "modulator". For audio use that modulator can be implemented with a simple op amp of the 741 variety configured as an "integrator", i.e. a low-pass filter, which (on a P1) would receive the bitstream from one pin, the way that people have always been doing "duty" output. With the right selection of components it would perform the desired two stages of summation before feeding the output to the P1's comparator on another pin, and presto: an instant second-order sigma-delta ADC. With a little luck I might even be digitizing NTSC by Christmas (well, I can always wish), but audio is a slam dunk.
O.K., here is the reason for the thrashing between stair-step values for certain amplitudes and inputs. This is based on running a software sigma-delta that takes 24-bit floating point input and runs a 32x oversample through sinc3 and a baby Nyquist, after changing the post-sinc3 summation kernel from the [1,3,3,1] of my earlier post to a [-1,5,5,-1] kernel, so that the P2 will achieve 8 bits at 31 MHz (believe!). What we find is that it is not dithering noise, or the lack thereof, that causes the instability; nor is it the input capacitance of the integrator FET (there is none, so the time constant is zero, unless I start running simulations in PSPICE, which is a possibility). Rather, it is because sigma-delta modulation is in and of itself a type of highly deterministic, chaotic, fractal convolutional code, even if it is less chaotic than, let's say, Mandelbrot or Julia, and rather more like Hilbert or Sierpinski. Off the top of my head I can't tell you the formal "fractal dimension" associated therewith, but I am thinking that it is in fact Hilbert complete, as opposed to being associated with any sort of Cantor group, which means of course it is a computable function on a suitable universal Turing machine capable of general computations involving deterministic finite automata.
Implication? All sigma-deltas oscillate "out of band" for certain inputs. That's how it works; for discrete input, it is a type of finite state machine. That means that if you look at one of the problem cases on a spectrum analyzer display fed by the output of a 2^N-bank polyphase filter, then for at least one particular case, around 40 Hz at about -20 dB, it is easy to see that the oscillation occurs in sub-bands 6 and 12, but the frequency of the spurious oscillation changes according to the input. The fix therefore is to use sinc3, with or without an external modulator, and an appropriate Nyquist filter to isolate the sub-band of interest.
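Here is a minimal sketch of that finite-state-machine claim (plain C++, all values made up): a bare first-order loop - one accumulator plus a comparator with +/-1 feedback - locks into a short repeating pattern whose period depends on the DC input, e.g. "01..." at midscale and "011..." at +1/3.

#include <cstdio>

// minimal first-order sigma-delta: integrate (input - feedback), compare.
// 'input' is a DC level in [-1, +1]; the printed bit pattern is periodic.
int main()
{
    double levels[] = { 0.0, 1.0/3.0, 0.5, 0.1 };
    for (double input : levels)
    {
        double accum = 0.0;
        int fb = 1;                        // +/-1 feedback DAC
        printf("DC %+.4f: ", input);
        for (int i = 0; i < 24; i++)
        {
            accum += input - fb;           // the integrator
            fb = (accum >= 0.0) ? 1 : -1;  // the comparator
            putchar(fb > 0 ? '1' : '0');
        }
        putchar('\n');
    }
    return 0;
}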
I think you still need an audio amp if you want to connect headphones...
Or, do you? Might get something with direct connection...
I bet we need an RC filter there too...
Good link, see also below.
It will 'work' with Class D / direct connection, but a proper low pass filter and amplifier will allow PC sound card capture too...
I also found this - older, but good because it includes plots of the nodes in the SDM ADCs:
http://www2.ece.rochester.edu/~zduan/teaching/ece472/reading/Aziz_1996.pdf
It gives numbers for 20 kHz audio targets (44/48 kHz sampling?) and 16-bit CD quality.
It has equations for those plots in your link:
First Order Sigma-Delta Modulation
For every doubling of the oversampling ratio, the SNR improves by 9.03 dB, or equivalently, the resolution improves by 1.5 bits.
It was desired to convert a 20 kHz audio band to CD quality resolution of 16 bits. Using Eq. 13, we can compute that the required fs with a 1 bit internal ADC is 96.78 MHz.
i.e. if we ignore the noise floors and real-world distortion/SNR, a P2 operating above 96.78 MHz can manage 20 kHz/16-bit audio with first-order oversampled modulators (single integrators).
Second Order Sigma-Delta Modulation
For every doubling of the oversampling ratio, fD/fB, the SNR improves by 15.02 dB, or the equivalent resolution by 2.5 bits.
A 20 kHz audio band needs to be converted to a resolution of 16 bits. The fs needed by a 2nd order sigma-delta modulator using only a 1 bit quantizer is, from Eq. 22, 6.12 MHz.
Or, put another way, an (ideal) second-order SDM can give 316.274 kHz bandwidth with a 96.78 MHz fs.
Real-world example: the AMC1035 (2nd order) achieves 16 bits of resolution with a dynamic range of 87 dB at a data rate of 82 kSPS (from a 21 MHz fs and a Sinc3 filter).
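Those doubling rules are enough to redo the arithmetic without the paper. A minimal sketch (C++; fs_required is a made-up helper that encodes only the two 16-bit anchor figures quoted above plus the 1.5 / 2.5 bits-per-octave rules, not the paper's actual Eq. 13/22):

#include <cmath>
#include <cstdio>

// Scale the paper's anchor points by its quoted doubling rules:
// 1st order gains 1.5 bits per doubling of OSR, 2nd order gains 2.5 bits.
// Anchors (from the quotes above): 16 bits over a 20 kHz band needs
// fs = 96.78 MHz (1st order) or 6.12 MHz (2nd order).
double fs_required(double bits, double fB_Hz, int order)
{
    double anchor_fs    = (order == 1) ? 96.78e6 : 6.12e6;
    double anchor_osr   = anchor_fs / (2.0 * 20e3);   // both anchors assume fB = 20 kHz
    double bits_per_oct = (order == 1) ? 1.5 : 2.5;
    double osr = anchor_osr * pow(2.0, (bits - 16.0) / bits_per_oct);
    return osr * 2.0 * fB_Hz;                         // fs = OSR * Nyquist rate
}

int main()
{
    // sanity check: the anchors come back out unchanged
    printf("1st order, 16b/20kHz: fs = %.2f MHz\n", fs_required(16, 20e3, 1) / 1e6);
    printf("2nd order, 16b/20kHz: fs = %.2f MHz\n", fs_required(16, 20e3, 2) / 1e6);
    // the 316 kHz claim: bandwidth a 2nd-order SDM gets from 96.78 MHz
    // at the same OSR that gave it 16 bits
    double osr2 = 6.12e6 / (2.0 * 20e3);
    printf("2nd order, 16b @ 96.78 MHz: fB = %.3f kHz\n", 96.78e6 / (2.0 * osr2) / 1e3);
    return 0;
}

It reproduces the 316.274 kHz figure exactly; extrapolations to other bit depths will drift from the paper's equations, since this is just the slope rule.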
Jmg, sounds like we really need to add a resistor and capacitor to our current first-order modulator, doesn't it? All the pieces are there, we could just rewire to achieve that. I am not absolutely certain of the topology, though. I believe the idea is just to cause more chaotic noise during the modulation process, which comes out in the wash.
I think a second order modulator would not get caught at the mid-level outputting 1010101010101010...., but would wander around the midpoint, averaging out to be the same. This would get us over the funny problem we have at the 1/2 point, the 1/4 point, the 3/4 point, etc., due to short, overly-repetitive output patterns which are hard to filter out.
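A quick sketch of that intuition (a minimal C++ loop with unity coefficients assumed - not the actual pin circuit): at exact midscale the extra integrator stretches the 1010... idle pattern into a longer cycle with the same 1/2 average, and dither or any input offset scrambles it further while the mean stays put.

#include <cstdio>

// second-order loop: two integrators ahead of the comparator.
int main()
{
    double s1 = 0.0, s2 = 0.001;   // tiny initial offset so the pattern shows
    double input = 0.0;            // exact midscale
    int fb = 1;
    for (int i = 0; i < 64; i++)
    {
        s1 += input - fb;          // first integrator
        s2 += s1 - fb;             // second integrator (also sees the feedback)
        fb = (s2 >= 0.0) ? 1 : -1;
        putchar(fb > 0 ? '1' : '0');
    }
    putchar('\n');
    return 0;
}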
Yes, it can. The clocked feedback is all in the pin, too, so it doesn't have to go through a cog's CTR.
So, is a 2nd-order modulator like what we've got, but with an additional RC stage, so that there are two in series?
Glad to hear that. It would be sad to give up that capability for the convenience of an internal modulator.
I imagine it would look like this:
         |-------|--digital out
        [R]     [R]
         |       |
in--[R]--|--[R]--|--digital in
        {C}     {C}
         |       |
gnd------|-------|--GND
Omitting the caps to Vio.
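One way to sanity-check the sketch is a crude discrete-time model of it (all hypothetical values; the component ratios and feedback polarity here are guesses, which is exactly the open question below): four equal R's, two C's, and a comparator sampled once per clock.

#include <cstdio>

// crude model of the schematic above: two RC nodes, four equal R's,
// comparator at 'digital in', registered feedback on 'digital out'.
int main()
{
    const double dt_over_RC = 0.05;   // time step as a fraction of the RC constant
    double v1 = 0.5, v2 = 0.5;        // node voltages, normalized 0..1
    double vin = 0.30;                // analog input, normalized
    double vdig = 1.0;                // feedback pin state
    int ones = 0, clocks = 2048;
    for (int i = 0; i < clocks; i++)
    {
        // node equations: the sum of currents through the R's charges each C
        double dv1 = dt_over_RC * ((vin - v1) + (vdig - v1) + (v2 - v1));
        double dv2 = dt_over_RC * ((v1 - v2) + (vdig - v2));
        v1 += dv1;
        v2 += dv2;
        int bit = (v2 < 0.5);         // comparator threshold at mid-rail
        vdig = bit ? 1.0 : 0.0;       // registered feedback closes the loop
        ones += bit;
    }
    printf("input %.2f -> ones density %.3f\n", vin, (double)ones / clocks);
    return 0;
}

With these guesses the loop regulates the second node to the comparator threshold, and the ones-density maps the input linearly (inverted, with these polarities), though over a narrower-than-full-scale duty range.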
Thanks. That's hard to think about. And how would those four R's be ratio'd?
Most of the newer papers/designs on higher-order modulators seem to use switched-capacitor integrators.
I think that's because high-precision / low-noise R's are just too hard, and C's plus analog switches are smaller, and avoid the LPF effects of high-value R's.
The older paper I found above,
http://www2.ece.rochester.edu/~zduan/teaching/ece472/reading/Aziz_1996.pdf
shows waveforms of the 2 integrator stages of a second-order modulator, and also gives numbers for 1st and 2nd order.
For many DC uses (eg strain gauges, thermocouples) where you want to reject 50/60 Hz, the rejection needs dictate the sample rates more than modulator order does.
The paper states 1st order can hit 16-bit/20 kHz audio (44/48 ksps) with a 96.78 MHz fs, which is easy P2 fs ballpark, so a much lower (eg) 10 sps mains-noise-reject rate is going to be system-noise limited, not modulator limited.
My suggestion would be to skip trying to do 2nd-order modulators inside the P2 (as the noise floors are too high) but to support the commercial external ones with Sinc3 & CLK+DAT pin modes, as those can deliver 16-bit resolution.
A filter mode that allows slower, higher-precision external 1st-order modulators should also cover 20-bit/10 sps type sensor uses.
(Maybe that's a Sinc3 variant, plus software, to get the 20~24-bit results?)
Using those equations, I calculate a 20-bit 1st-order ADC needs around a 2^13.33 oversample of fs, so a 96.78 MHz fs gives 9398 sps, or 9.678 MHz would give 939.8 sps, which lets users tune fs for their lowest-noise point.
That leaves a scope mode to shake out, which I think is best experimentally extracted from whatever the present P2 ADC can do, vis-à-vis the real P2 (1.x-order?) modulator and the P2 noise floor.
Here's an update:
https://www.dropbox.com/s/t258hbuh8t4hh9b/Test_ADC_Filtering - v002.zip?dl=1
There is now a black line showing the target value, you can zoom using the scroll wheel (restore zoom with a double-click in the chart), and you can pick a different endpoint and have a ramped target value. The target ramping happens per input bit.
thanks,
Jonathan
This would be really cheap to implement in the smart pin.
For 64-clock-quality samples, we'd get one every 8 clocks.
I'm not sure how that saves silicon if there are 12 instructions per processed bit and there are 5 bits that contribute to each sample, i.e. if you wanted to do sinc2 with a LUT. (I looked at doing the schematic in Logic Friday but instead just drew a schematic by hand while I was implementing the bit-weaver in software.) Of course, depending on what tool chain you are using, maybe it optimizes out; but what I see here adds up to something like 16 clocks per sample / 8 * 32 bits * 12 instructions = 768 logical operations? Not sure if I am reading that right. I think I can do a sinc2 in Logic Friday and post the schematic, which might make more sense if you don't have either SystemC or Inferno (I have neither) to turn the sinc3 C++ into RTL.
Jmg, sounds like we really need to add a resistor and capacitor to our current first-order modulator, doesn't it? ...
I'm not sure it is quite that simple...
The 2nd order connections and waveforms in that paper I linked show a sum of the 1st stage INT and the DAC - ie you need a charge current for C2 based on vC1-DAC, whilst the charge current for C1 is Vi-DAC.
I think that's pretty much what you have now, but doubled. ie another difference amplifier and another current mirror/mux, charging a C2.
Sounding like more than a simple patch, but maybe the P3 could optionally chain the 2 integrators of 2 pins, to give a 2nd-order modulator on every 2nd pin?
I'm not sure how that saves silicon if there are 12 instructions per processed bit... ...
This is a concept to be implemented in hardware, not software. The tool chain required is mental, only.
O.K., I tried what I think is a zero-phase-delay, i.e. equiripple, sinc2 in Logic Friday and it comes up with this.
Entered by truth table:
z0 = q0 q1 q2 q3 q4;
z1 = q0' q1' q2 q3 q4' + q0' q1' q2 q3 q4 + q0' q1 q2' q3 q4' + q0' q1 q2' q3 q4 + q0' q1 q2 q3' q4' + q0' q1 q2 q3' q4 + q0' q1 q2 q3 q4' + q0' q1 q2 q3 q4 + q0 q1' q2 q3' q4 + q0 q1' q2 q3 q4' + q0 q1' q2 q3 q4 + q0 q1 q2' q3 q4' + q0 q1 q2' q3 q4 + q0 q1 q2 q3' q4' + q0 q1 q2 q3' q4 + q0 q1 q2 q3 q4';
z2 = q0' q1' q2' q3 q4' + q0' q1' q2' q3 q4 + q0' q1' q2 q3' q4' + q0' q1' q2 q3' q4 + q0' q1 q2' q3' q4' + q0' q1 q2' q3' q4 + q0' q1 q2 q3 q4' + q0' q1 q2 q3 q4 + q0 q1' q2' q3 q4' + q0 q1' q2' q3 q4 + q0 q1' q2 q3' q4' + q0 q1' q2 q3 q4' + q0 q1' q2 q3 q4 + q0 q1 q2' q3' q4' + q0 q1 q2' q3' q4 + q0 q1 q2 q3' q4 + q0 q1 q2 q3 q4';
z3 = q0' q1' q2 q3' q4' + q0' q1' q2 q3' q4 + q0' q1' q2 q3 q4' + q0' q1' q2 q3 q4 + q0' q1 q2' q3 q4' + q0' q1 q2 q3' q4' + q0' q1 q2 q3' q4 + q0' q1 q2 q3 q4' + q0' q1 q2 q3 q4 + q0 q1' q2' q3' q4 + q0 q1' q2 q3' q4' + q0 q1 q2' q3' q4 + q0 q1 q2' q3 q4' + q0 q1 q2' q3 q4 + q0 q1 q2 q3' q4' + q0 q1 q2 q3 q4';
z4 = q0' q1' q2' q3' q4 + q0' q1' q2' q3 q4 + q0' q1' q2 q3' q4 + q0' q1' q2 q3 q4 + q0' q1 q2' q3' q4 + q0' q1 q2' q3 q4 + q0' q1 q2 q3' q4 + q0' q1 q2 q3 q4 + q0 q1' q2' q3' q4' + q0 q1' q2' q3 q4' + q0 q1' q2 q3' q4' + q0 q1' q2 q3 q4' + q0 q1 q2' q3' q4' + q0 q1 q2 q3' q4' + q0 q1 q2 q3 q4';
Minimized:
z0 = q0 q1 q2 q3 q4;
z1 = q0 q2 q3' q4 + q1 q2' q3 + q0 q1' q2 q3 + q1 q2 q4' + q0' q2 q3 + q0' q1 q2 ;
z2 = q0 q1 q3' q4 + q0' q1 q2 q3 + q0' q1' q2 q3' + q1' q2 q3' q4' + q1' q2' q3 + q0 q2 q3 q4' + q1 q2' q3' + q0 q1' q2 q3 ;
z3 = q0 q2' q3' q4 + q0 q1 q2' q3 + q0' q1' q2 q3' + q1' q2 q3' q4' + q1 q2 q4' + q1 q3 q4' + q0' q2 q3 + q0' q1 q2 ;
z4 = q0 q1' q4' + q0 q3' q4' + q0 q2 q3 q4' + q0' q4;
So, what goes in and what comes out? I see five bits going in and five coming out, asynchronously.
YEP! That's the secret recipe. It is a [1,4,6,4,1] convolutional kernel, and if you decimate twice it has to be evaluated at least once every 4 clocks to satisfy Nyquist. It's that simple - so simple that I did the math in my head while entering the truth table. Of course I've never programmed a CPLD, let alone an FPGA, but I have poked around in Quartus and checked the specs, and given the way they arrange the extra 1000 or so gates that you get with each LE and macrocell, I can always hope that, depending on how the AND-OR-INVERT daisy chains work out, it might only need 5 LEs - and that includes the latches. But the schematic sure looks impressive!
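For what it's worth, the plain [1,4,6,4,1] sum also fits in a 32-entry lookup, which is one way to picture those five LEs (a hedged C++ sketch, not the Logic Friday output itself):

#include <cstdio>

// the [1,4,6,4,1] kernel as a 32-entry LUT: last five modulator bits in,
// a 0..16 value out, evaluated every 4th clock when decimating by 4.
static unsigned char lut[32];

int main()
{
    const int w[5] = { 1, 4, 6, 4, 1 };
    for (int v = 0; v < 32; v++)           // build the lookup once
    {
        int sum = 0;
        for (int b = 0; b < 5; b++)
            if (v & (1 << b))
                sum += w[b];
        lut[v] = (unsigned char)sum;
    }
    unsigned int stream = 0x6DB6DB6D;      // example bitstream, ~2/3 ones density
    unsigned int window = 0;
    for (int i = 31; i >= 0; i--)
    {
        window = ((window << 1) | ((stream >> i) & 1)) & 0x1F;
        if ((i & 3) == 0)                  // decimate by 4
            printf("%d ", lut[window]);
    }
    putchar('\n');
    return 0;
}

Diffing this LUT against the hand-entered truth table above is also a cheap way to catch any minterm slips.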
Bit 0: 0 or 1
Bit 1: 0 or 1
Bit 2: 0 or 1. I think we can continue easily following this algorithm.
Number of combinations resulting from consecutive bits:
one bit: 2 possibilities
two bits: 4 possibilities
three bits: 8 possibilities. It looks like 10 bits can create 1k possible cases. But at any given time, only 1 combination really happened!
If for any reason we start at a given time, after 10 bits we get 1 value for the number of bits set, but there are different possible cases with an equal number of bits set.
If for any reason we start at the given time +1, the number of bits set may differ by one, as the first bit is removed and a new bit added.
That means: every possible chunk of bits has a lot of possibilities, but if there is a moving bit stream, every chunk has a history, so the number of possible values is limited. Even if we filter the chunk by multiplying with a window function, we cannot increase the number of possibilities; only the range changes, as now not all 0 and 1 bits count the same amount.
This seems to be the trap we are caught in. You cannot reduce a deficit on one side without increasing production on one side and increasing consumption on the other. Is this so hard to follow?
Close - but no rabbit yet; but there is a rabbit somewhere, and in the classic Monty Python it's just a harmless little bunny! I think of a 16-bit number, square it, and maybe, just to be sneaky, transpose a couple of digits; then I tell you the 32-bit number. You have this function called square root which, depending on how much damage I did to the number I gave you, will tell you what my original number was. Of course we are not doing ordinary arithmetic in sigma-delta land; but if it is provable that a sigma-delta modulator can be modelled as a convolutional code, then there exists an abstract algebra which is diffeomorphic to the operations of raising a vector to some power and finding the inverse, i.e. the root relative to that power. Just like with Reed-Solomon, where finding the syndrome and doing the error correction involves finding a logarithm relative to some polynomial in some Galois field.
I don't understand how you'd use it. Can you show some pseudo code of what you'd do with it? Sounds intriguing to me.
Here is the live code that I am using to simulate a sigma delta on my Windoze box; while trying to work out all of the bugs in order to get the core DSP algorithms up and running on the Hydra - eventually.
#include "stdafx.h"
#include <fstream>
#include <algorithm>
#include <queue>
#include <valarray>
#include <vector> // for std::vector (assuming the project headers don't already provide it)
#include "defines.h"
#include "externals.h"
#include "mainconfig.h"
#include "propeller.h"
#include "sigmadelta.h"
#include "audio.h"
#define EVEN_BITS (0x55555555)
#define ODD_BITS (0xaaaaaaaa)
#define EVEN_PAIRS (0x33333333)
#define ODD_PAIRS (0xcccccccc)
#define EVEN_NIBBLES (0x0f0f0f0f)
#define ODD_NIBBLES (0xf0f0f0f0)
#define EVEN_BYTES (0x00ff00ff)
#define ODD_BYTES (0xff00ff00)
#define MOVING_AVERAGE_CONSTANT 8
#define QUANTIZATION 16
class sigma_delta
{
protected:
bool q0; // comparator/quantizer output bit
int m_input; // scaled, offset input sample
int m_accum; // integrator of the first-order loop
MATH_TYPE m_offset;
MATH_TYPE m_gain0;
MATH_TYPE m_gain1; // feedback DAC gain
DWORD m_bitstream; // last 32 modulator output bits
bool carry;
unsigned int REG[8]; // virtual registers holding the sinc filter stages
vector<MATH_TYPE> my_noise; // dither table
MATH_TYPE get_noise (int N);
void iterate (int N);
DWORD sinc3 (DWORD input);
inline int dac1 (bool fb);
public:
sigma_delta ();
void create_noise (size_t N);
void filter_noise ();
void set_input (MATH_TYPE);
MATH_TYPE randomize1 (MATH_TYPE, int);
MATH_TYPE randomize2 (MATH_TYPE, int);
MATH_TYPE randomize3 (MATH_TYPE data1, MATH_TYPE data2);
DWORD sinc3B (DWORD input);
};
void sigma_delta::set_input (MATH_TYPE x)
{
m_input = (int)(m_gain0*x-m_offset);
}
sigma_delta::sigma_delta ()
{
m_bitstream = 0xcccccccc;
m_input = 0;
m_offset = 0;
m_gain0 = 32768;
m_gain1 = 32768+16;
m_accum = 0;
}
inline int sigma_delta::dac1 (bool fb)
{
// 1-bit feedback DAC: map the comparator bit to +/-1
return (fb ? 1 : -1);
}
void sigma_delta::iterate (int N)
{
int i;
for (i=0;i<N;i++)
{
m_accum += m_input;
m_accum += m_gain1*dac1 (q0);
q0 = (m_accum<0?true:false);
m_bitstream <<= 1;
m_bitstream += (q0&0x01);
}
}
// optimized version of sinc3 - compiles to just
// 42 instructions on a P1 - but carry propagation
// is not implemented yet, so you only get two
// valid 7-bit samples per iteration with a range
// of [0..64] - and YES, the overflow bit is important!
DWORD sigma_delta::sinc3B (DWORD input)
{
DWORD r0, r1, r2, r3, r4, r5, s0, s1;
r0 = input;
s0 = (input&EVEN_BITS)<<1;
s1 = ((input&ODD_BITS)+((input&ODD_BITS)>>2))>>1;
r1 = ((s0&ODD_PAIRS) + (s1&ODD_PAIRS))>>2;
r2 = (s0&EVEN_PAIRS) + (s1&EVEN_PAIRS);
s0 = (r1&EVEN_NIBBLES);
s1 = (r1&ODD_NIBBLES)>>4;
r4 = (r2&EVEN_NIBBLES)<<1;
r3 = (r2&ODD_NIBBLES)>>3;
r4+= (s0+s1);
r3+= ((s0>>8)+s1);
s0 = (r3&EVEN_BYTES);
s1 = (r3&ODD_BYTES);
r5 = (r4<<1)+s0+s1+(s0>>8)+(s1<<8);
return r5;
}
// earlier version of sinc3 ..
DWORD sigma_delta::sinc3 (DWORD input)
{
DWORD s0, s1;
REG[0] = input;
s0 = (input&EVEN_BITS)<<1;
s1 = ((input&ODD_BITS)+((input&ODD_BITS)>>2))>>1;
REG[1] = ((s0&ODD_PAIRS) + (s1&ODD_PAIRS))>>2;
REG[2] = (s0&EVEN_PAIRS) + (s1&EVEN_PAIRS);
s0 = (REG[1]&EVEN_NIBBLES);
s1 = (REG[1]&ODD_NIBBLES)>>4;
REG[4] = (REG[2]&EVEN_NIBBLES)<<1;
REG[3] = (REG[2]&ODD_NIBBLES)>>3;
REG[4]+= (s0+s1);
REG[3]+= ((s0>>8)+s1);
s0 = (REG[3]&EVEN_BYTES);
s1 = (REG[3]&ODD_BYTES);
REG[5] = (REG[4]<<1)+s0+s1+(s0>>8)+(s1<<8);
return 0;
}
MATH_TYPE sigma_delta::randomize3 (MATH_TYPE data1, MATH_TYPE data2)
{
MATH_TYPE voltage = 0.10;
int sample1, sample2, sample3, sample4, sample5;
int i;
set_input (data1);
// since we are processing stereo samples on an
// odd and even sample basis, we need to flush
// the buffer and precondition the bitstream to
// the new channel - FIXME ELSEWHERE!!
iterate (32);
sinc3(m_bitstream);
// now grab the two center values because carry
// propagation is not yet implemented
sample2 = REG[5]&0x0000ff00;
sample1 = (REG[5]&0x00ff0000)>>8; // parens added: '>>' binds tighter than '&'
set_input (data2);
// now iterate again - and get two more valid samples!
iterate (16);
sinc3(m_bitstream);
sample4 = REG[5]&0x0000ff00;
sample3 = (REG[5]&0x00ff0000)>>8; // same precedence fix as above
// use a [1,3,3,1] convolutional kernel for now
// sample5 = 0.5*(sample1+3*sample2+3*sample3+sample4);
sample5 = 0.5*(-sample1+5*sample2+5*sample3-sample4);
voltage = float(sample5-32768)/65536.0;
return voltage;
}
MATH_TYPE sigma_delta::randomize2 (MATH_TYPE data, int N)
{
MATH_TYPE voltage = 0.10;
MATH_TYPE noise;
int bitstream = 0;
int sample1, sample2, sample3;
int i, offset;
size_t sz = my_noise.size();
for (i=0;i<32;i++)
{
bitstream<<=1;
offset = N+i+1;
if (offset>sz)
continue;
noise = my_noise [sz-offset];
if (data>noise)
bitstream++;
}
sinc3(bitstream);
sample1 = REG[5]&0x0000ff00;
sample2 = (REG[5]&0x00ff0000)>>8; // parens added: '>>' binds tighter than '&'
sample3 = (sample1+sample2)<<1;
voltage = float(sample3-32768)/65536.0;
return voltage;
}
MATH_TYPE sigma_delta::randomize1 (MATH_TYPE data, int N)
{
MATH_TYPE voltage = 0.10;
MATH_TYPE noise;
int bits = 0;
int i, offset;
size_t sz = my_noise.size();
for (i=0;i<QUANTIZATION;i++)
{
offset = N+i+1;
if (offset>sz)
continue;
// noise = get_noise (sz-offset);
noise = my_noise [sz-offset];
if (data>noise)
bits++;
else
bits--;
}
voltage = float(bits)/QUANTIZATION;
return voltage;
}
// generate N dither samples into the
// member noise buffer my_noise
void sigma_delta::create_noise (size_t N)
{
static unsigned short value0;
static unsigned int value1 = 1;
static unsigned int value2 = 2;
MATH_TYPE dither, *nptr;
my_noise.resize (N);
nptr = &(my_noise[0]);
for (size_t i=0;i<N;i++)
{
value1 = random::GF16 (value1);
value2 = random::GF32 (value2);
dither = (1/32768.0)*(short)((value1^value2)&(0xffff));
nptr[i] = dither;
}
}
void sigma_delta::filter_noise ()
{
MATH_TYPE d;
vector<MATH_TYPE> tempn;
int N;
N = my_noise.size();
tempn.resize (N);
for (int i=0;i<N;i++)
{
d = get_noise(i);
tempn[i]=d;
}
my_noise = tempn;
}
MATH_TYPE sigma_delta::get_noise (int N)
{
MATH_TYPE a,b;
MATH_TYPE *nptr = &(my_noise[0]);
size_t i,j,k;
a = 0;
// center a MOVING_AVERAGE_CONSTANT-wide window on N, clamped to the array
if (N>MOVING_AVERAGE_CONSTANT/2)
i=N-MOVING_AVERAGE_CONSTANT/2;
else
i=0;
k = N+MOVING_AVERAGE_CONSTANT/2;
if (k>my_noise.size())
k=my_noise.size();
for (j=i;j<k;j++)
{
a += nptr[j];
}
a/=(k-i); // divide by the actual window size (the fixed constant was wrong at the edges)
b = nptr[N]-a; // subtract the local average, i.e. high-pass the dither
return b;
}
And here is how I patch a simulated ADC into my spectrum analyzer. It outputs a perfectly straight line within the ADC duty range. Eventually I would like to replace SimpleTerm so I can process live data from the Hydra, the way that PropScope does.
Lazarus666, do you have, maybe, a 10-line explanation in English of what you do with it?
I take the output of a sigma-delta modulator, i.e. the bitstream, and implement a bit-wise convolutional code which does the same work as a low-pass filter having a sinc1, sinc2, or sinc3 response in the frequency domain, by implementing the corresponding impulse response function in the time domain. This performs the useful function of reducing a 250-megabit sample rate down to something like 7 bits at 31.25 MHz in the sinc3 case, or else 5 bits at 62.5 MHz in the sinc2 case. The results of the sinc2 calculation and decimation are still in virtual registers REG[3] and REG[4] in the unoptimized C++ version of sinc3, so simply truncating the sinc3 implementation will give a working sinc2 - which is the earlier work. Now it turns out that there is a cog available in OBEX, which someone else wrote, that purportedly will do a 1024-point FFT on a P1 using four cogs in just 7.5 milliseconds, which works out to about 7 microseconds per point. That means that if I want to implement a brick-wall Nyquist filter to post-process the sinc3 result, it is a simple matter of taking the assembly that GCC gives me in SimpleIDE, if I know where to look, and porting it over to FastSpin inline assembler, as someone else suggested. Which seems to suggest that it is indeed possible to get perhaps 13 or 14 bits of filtered audio at at least AM radio quality, as far as the frequency response goes, on the P1. Right now I am still just building modules, profiling code, and running test cases.
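For anyone who wants to check the bit-twiddled sinc3() above against a reference, here is the same filter in its textbook CIC form - three integrators at the bit rate, decimate by R, then three differentiators (a sketch with an arbitrary example stream; the first couple of outputs are pipeline fill):

#include <cstdio>

// reference sinc3 decimator: three cascaded integrators, decimate by R,
// three differentiators. Steady-state output range is 0..R^3, so the
// [0..64] range mentioned above corresponds to R = 4.
int main()
{
    const int R = 8;                      // decimation ratio (8 bits -> 1 sample)
    long i0 = 0, i1 = 0, i2 = 0;          // integrator state
    long c0 = 0, c1 = 0, c2 = 0;          // comb (differentiator) state
    unsigned int stream = 0xB6DB6DB6;     // example modulator bits, ~2/3 ones density
    for (int n = 31; n >= 0; n--)
    {
        long bit = (stream >> n) & 1;
        i0 += bit;                        // three cascaded integrators
        i1 += i0;
        i2 += i1;
        if ((n % R) == 0)                 // decimate, then three differentiators
        {
            long d0 = i2 - c0;  c0 = i2;
            long d1 = d0 - c1;  c1 = d0;
            long d2 = d1 - c2;  c2 = d1;
            printf("%ld ", d2);           // 0..512 for R = 8
        }
    }
    putchar('\n');
    return 0;
}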
Lazarus666, so you are getting 7 bits in 8 clocks using Sinc3. That's pretty good. I'm trying to get 8 bits as quickly and often as possible. Have you got any advice?
When I say 7 bits, I mean 7 bits of resolution, not precision - however the data is useful even at that stage, depending on what you do with it. Broadcast HDTV, for example, has a "digital cliff" where it only needs about 14 dB of signal to noise to recover a perfect picture. However, the Viterbi algorithm - which is a bit beyond me as to how the trellis works on the decode side - expects a floating point value; and as for the rest, the magic knows how to generate a probability tree (the trellis) which de-ghosts the signal, fills in some dropouts, and gets it ready for the Reed-Solomon. Now, for what I am doing, most FFT algorithms expect a floating point input, although I think that KISS FFT works with 16-bit data in and 32 out; so those noisy bits do get used later on when the big guns go to work - and the more bits that we can give them without roundoff or truncation, i.e. so that as much information as possible is preserved, the better the final outcome will be. As I see it, what sinc1 is really good at is turning a stream like 0101010101 into DC, because we really don't care about what is happening at 125 MHz in the case of the P2 bitstream; and yet the theory says that we can use this filter, which throws away half the information, without damaging the low frequencies in any way. That saves a whole layer of butterflies and twiddles when it gets to FFT time, for each time that we can get away with a simple decimating filter that rids the stream of sequences like 01010101 or 001100110011 and replaces them with appropriate averages.
Of course in analog work it is common to talk about filters like 4-pole Butterworth or 8-pole Bessel, and so on; well then, how does something like a 64-pole Chebyshev sound? All you need is a 64-point FFT... and a little more magic, to make the filter tree; of course higher-order FFTs produce vastly superior results, which is easy to see from the spectrum plots that I posted earlier - which I will reattach for those who might just be tuning in.
What is the step response of that?
I suppose it would take nearly 256 clocks to go from rail to rail.
I've just noticed something that must have caused confusion in this thread.
A sinc filter, as used by nearly every textbook and Wikipedia, has a sinc _impulse response_. Such filters have a close-to brick-wall frequency response, and a lot of delay.
In the terminology of sigma-delta modulators, however, sinc seems to mean a filter with a sinc _frequency response_, or a power thereof. These are very easy to implement, but the frequency performance is necessarily mediocre.
However, the time-domain performance is better suited for a 'scope-style application, ie no ringing on edges. The price you pay for good time-domain behaviour is poorer frequency behaviour, and in a sigma-delta modulator that means more noise and less ENOB.
Audio sigma-delta ADCs have extreme brick-wall filters, and square waves ring like a textbook truncated odd-harmonic series - but for audio, noise matters a lot and ringing does not matter at all.
So I wonder if various sources of ENOB v. order v. decimation-factor data are assuming brick-wall filtering?
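For reference, the standard closed form behind that second meaning - a length-R moving average (one CIC stage) raised to the k-th power for sinc-k - is

$|H(f)| = \left| \frac{\sin(\pi R f / f_s)}{R \, \sin(\pi f / f_s)} \right|^{k}$

with nulls at multiples of f_s/R and the passband droop that makes the frequency performance mediocre, exactly as described above.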
I've got an 8x oversampling Sinc2/128 worked out that definitely puts out 8-bit samples every 16 bits that are the product of the last 256 bits.
decimation = 16
range = decimation*256

for iter = 0 to range-1
  acc = acc + iter/range 'first-order modulator model: ramp input -> new ADC bit
  if acc >= 1 then
    acc = acc - 1
    t = 1
  else
    t = 0
  endif
  inta = inta + t 'first integrator
  intb = intb + inta 'second integrator
  if iter/decimation = int(iter/decimation) then
    sample = intb - dif7 'comb: difference against the value 8 outputs ago
    dif7 = dif6
    dif6 = dif5
    dif5 = dif4
    dif4 = dif3
    dif3 = dif2
    dif2 = dif1
    dif1 = dif0
    dif0 = intb
    intb = 0 'integrate and dump
    x = iter/decimation
    y = pow(decimation,2)
    s = int((sample+7)/8)
    line x, y, x, y - s 'plot one output sample
    'print int(sample/8)
  endif
next iter
Comparing this to Sinc2 with integrators in hardware, differentiators in software and integrate and dump, there are 8x as many difX registers.
Yes, but this gives a true 8-bit sample every 16 clocks. You just do a RDPIN to get it, or the streamer can use it. Its frequency response is going to be poorer than a 64-tap window, but precision will be 8 bits, always.
My understanding is the sample output is a moving average of eight 32-clock triangular windows that are overlapped by 16 clocks and therefore each output is generated by 128 ADC bits.
What if only dif0-dif3 are used when decimation = 16?
Or dif0-dif3 or dif0-dif7 when decimation = 8?
Why does scope mode need 8 bits?
I'm thinking about a scope mode replacement that resolves DC better than 6 bits. We won't be able to get a sample every clock, but we can run different sets of acc2's and diff's for more frequent samples.
Are all the dif and bdif needed? I think the aim is to average eight successive Sinc2 outputs, but maybe four would be enough.
Something's not right here. I'm getting a sinc1 step response. If you want a sinc2 it should be like this:
Running intb at the reduced rate doesn't appear to change the step response. It could affect the high frequency rejection.
This is a CIC decimator with N=2, R=16, M=8. I think both are needed. It would be nice if we could make it work with one delay line. It seems like one delay line means one moving average or a sinc1.
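A reference model of that structure may help settle the sinc1-vs-sinc2 question (a hedged C++ sketch of a CIC with N=2, R=16, M=8, fed a unit step; register names are mine, not the hardware's):

#include <cstdio>

// CIC with N=2 integrators, decimation R=16, differential delay M=8,
// fed a unit step (all-ones bitstream).
int main()
{
    const int R = 16, M = 8;
    long i1 = 0, i2 = 0;               // integrators, run at the bit rate
    long d1[M] = {0}, d2[M] = {0};     // comb delay lines, run at the output rate
    int p1 = 0, p2 = 0;
    for (int n = 0; n < 24 * R; n++)
    {
        long bit = 1;                  // unit step input
        i1 += bit;
        i2 += i1;
        if ((n % R) == R - 1)          // decimate by R, then two combs
        {
            long c1 = i2 - d1[p1];  d1[p1] = i2;  p1 = (p1 + 1) % M;
            long c2 = c1 - d2[p2];  d2[p2] = c1;  p2 = (p2 + 1) % M;
            printf("%ld ", c2);        // settles at (R*M)^2 = 16384
        }
    }
    putchar('\n');
    return 0;
}

The step takes 2*R*M = 256 input clocks to settle, which lines up with the rail-to-rail estimate earlier in the thread; the shape of that ramp (straight for sinc1-like behaviour, S-curved for sinc2-like) is a quick way to classify what the BASIC sim above is actually producing.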