A new window, Tukey 23(17)/30, has only two absolute derivative values. I've chosen to use the nearest integer difference between taps, so the 3 and 27 below would be 4 and 26 if each actual tap were rounded to the nearest integer instead. The shift register is larger, but the max value of 30 means integrator b is only 6-bit.
I've shown Tukey 19(15)/34 again and added a couple of trapezoidal windows for logic comparison and completeness. Trap. 17(15)/32 requires an output clamp but the short version is perfectly triangular.
Tukey 23(17)/30, shift reg bits = 17+17+17 = 51
1, 2, 3, 5, 7, 9,11,13,15,17,19,21,23,25,27,28,29 Ramp up total = 255 Tukey/Hann
30[17] Plateau total = 510 Tukey
29,28,27,25,23,21,19,17,15,13,11, 9, 7, 5, 3, 2, 1 Ramp down total = 255 Tukey/Hann
Tukey 19(15)/34, shift reg bits = 15+15+15 = 45
1, 2, 4, 6, 8,11,14,17,20,23,26,28,30,32,33 Ramp up total = 255 Tukey/Hann
34[15] Plateau total = 510 Tukey
33,32,30,28,26,23,20,17,14,11, 8, 6, 4, 2, 1 Ramp down total = 255 Tukey/Hann
Trap. 17(15)/32, shift reg bits = 15+17+15 = 47, needs clamp
2, 4, 6, 8,10,12,14,16,18,20,22,24,26,28,30 Ramp up total = 240 Trap./Tri.
32[17] Plateau total = 544/32 Trap./Tri.
30,28,26,24,22,20,18,16,14,12,10, 8, 6, 4, 2 Ramp down total = 240 Trap./Tri.
Trap. 16(14)/30, shift reg bits = 14+20+14 = 48
2, 4, 6, 8,10,12,14,16,18,20,22,24,26,28 Ramp up total = 210 Trap./Tri.
30[20] Plateau total = 600/90 Trap./Tri.
28,26,24,22,20,18,16,14,12,10, 8, 6, 4, 2 Ramp down total = 210 Trap./Tri.
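The totals in these tables can be cross-checked with a few lines of Python. This is only a sketch: the dictionary layout and the function name are mine, while the ramp values, peaks and plateau lengths are copied from the listings above.

```python
# Cross-check of the ramp and plateau totals listed in the tables.
# Values are copied from the listings; all names here are illustrative.
windows = {
    "Tukey 23(17)/30": ([1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 28, 29], 30, 17),
    "Tukey 19(15)/34": ([1, 2, 4, 6, 8, 11, 14, 17, 20, 23, 26, 28, 30, 32, 33], 34, 15),
    "Trap. 17(15)/32": (list(range(2, 31, 2)), 32, 17),
    "Trap. 16(14)/30": (list(range(2, 29, 2)), 30, 20),
}

def totals(ramp, peak, plateau_len):
    """Return (ramp total, plateau total, whole-window total)."""
    ramp_total = sum(ramp)
    plateau_total = peak * plateau_len
    return ramp_total, plateau_total, 2 * ramp_total + plateau_total

for name, (ramp, peak, plateau_len) in windows.items():
    print(name, totals(ramp, peak, plateau_len))
```

Both Tukey windows and the Trap. 16(14)/30 window come to 1020 overall, while Trap. 17(15)/32 comes to 1024, which would fit the "needs clamp" remark if the output is meant to stay within 10 bits.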
TonyB_, I made a SmallBASIC program that lets me try any spacing between the three slope changes, between their inflections, and between the plateau edges.
I expanded the tap shifter to 73 bits and want to have four filter settings. The longest one will be lowest-pass. We just mux in the tap pairs. I want to see what a lowest-pass filter looks like. We are still a little rough at 45 taps.
dim t(71)
for x = 0 to 71
t(x) = 0
next x
a = 0 : b = 0 : c = 31 : d = 2 '64-tap triangle window
d1 = 1 : d2 = 0 : d3 = 0 : m = 1
'a = 1 : b = 1 : c = 9 : d = 18 '44-tap Saucy Tukey window
'd1 = 1 : d2 = 1 : d3 = 1 : m = 1
'a = 1 : b = 3 : c = 6 : d = 18 '46-tap TonyB_ Tukey 17/32 window (overflows!)
'd1 = 1 : d2 = 1 : d3 = 1 : m = 1
'a = 2 : b = 3 : c = 6 : d = 14 '46-tap TonyB_ Tukey 19/34 window
'd1 = 1 : d2 = 1 : d3 = 1 : m = 1
'a = 1 : b = 2 : c = 8 : d = 1 '29-tap Hann window
'd1 = 1 : d2 = 1 : d3 = 1 : m = 2
t0 = 0
t1 = t0 + a
t2 = t1 + b
t3 = t2 + c
t4 = t3 + b
t5 = t4 + a
t6 = t5 + d
t7 = t6 + a
t8 = t7 + b
t9 = t8 + c
t10 = t9 + b
t11 = t10 + a
print "taps",,
if d1 = 1 then print t0,
if d2 = 1 then print t1,
if d3 = 1 then print t2,
if d3 = 1 then print t3,
if d2 = 1 then print t4,
if d1 = 1 then print t5,
if d1 = 1 then print t6,
if d2 = 1 then print t7,
if d3 = 1 then print t8,
if d3 = 1 then print t9,
if d2 = 1 then print t10,
if d1 = 1 then print t11,
print : print
inta = 0
intb = 0
intc = 0
for y = 0 to 71*2
for x = 71 to 1 step -1
t(x) = t(x-1)
next x
t(00) = 1
if y>71 then t(00) = 0
delta = 0
if (t(t0) = 1 and t(t0+1) = 0) then delta = delta + d1
if (t(t0) = 0 and t(t0+1) = 1) then delta = delta - d1
if (t(t1) = 1 and t(t1+1) = 0) then delta = delta + d2
if (t(t1) = 0 and t(t1+1) = 1) then delta = delta - d2
if (t(t2) = 1 and t(t2+1) = 0) then delta = delta + d3
if (t(t2) = 0 and t(t2+1) = 1) then delta = delta - d3
if (t(t3) = 1 and t(t3+1) = 0) then delta = delta - d3
if (t(t3) = 0 and t(t3+1) = 1) then delta = delta + d3
if (t(t4) = 1 and t(t4+1) = 0) then delta = delta - d2
if (t(t4) = 0 and t(t4+1) = 1) then delta = delta + d2
if (t(t5) = 1 and t(t5+1) = 0) then delta = delta - d1
if (t(t5) = 0 and t(t5+1) = 1) then delta = delta + d1
if (t(t6) = 1 and t(t6+1) = 0) then delta = delta - d1
if (t(t6) = 0 and t(t6+1) = 1) then delta = delta + d1
if (t(t7) = 1 and t(t7+1) = 0) then delta = delta - d2
if (t(t7) = 0 and t(t7+1) = 1) then delta = delta + d2
if (t(t8) = 1 and t(t8+1) = 0) then delta = delta - d3
if (t(t8) = 0 and t(t8+1) = 1) then delta = delta + d3
if (t(t9) = 1 and t(t9+1) = 0) then delta = delta + d3
if (t(t9) = 0 and t(t9+1) = 1) then delta = delta - d3
if (t(t10) = 1 and t(t10+1) = 0) then delta = delta + d2
if (t(t10) = 0 and t(t10+1) = 1) then delta = delta - d2
if (t(t11) = 1 and t(t11+1) = 0) then delta = delta + d1
if (t(t11) = 0 and t(t11+1) = 1) then delta = delta - d1
inta = inta + delta * m
intb = intb + inta
intc = intc + intb
sample = int(intc/4)
print inta,intb,intc,sample
next y
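For anyone without SmallBASIC handy, here is a rough Python port of the program above. It is a sketch under my own assumptions: the function name and the `length` parameter are invented, and it returns the intc history instead of printing every column. The tap spacing, edge detection and triple integration follow the listing.

```python
def step_response(a, b, c, d, d1=1, d2=1, d3=1, m=1, length=72):
    """Feed `length` ones, then zeros, through the tap shifter; return intc per step."""
    # Tap positions t0..t11, built from the same spacings as the BASIC listing.
    pos = [0]
    for gap in (a, b, c, b, a, d, a, b, c, b, a):
        pos.append(pos[-1] + gap)
    weight = [d1, d2, d3, d3, d2, d1, d1, d2, d3, d3, d2, d1]
    sign = [+1, +1, +1, -1, -1, -1, -1, -1, -1, +1, +1, +1]

    taps = [0] * length
    inta = intb = intc = 0
    history = []
    for y in range(2 * length - 1):
        # Shift the register and bring in the next input bit.
        taps = [1 if y < length else 0] + taps[:-1]
        # t(p)=1 with t(p+1)=0 counts as +1, the opposite edge as -1.
        delta = sum(s * w * (taps[p] - taps[p + 1])
                    for p, w, s in zip(pos, weight, sign))
        inta += delta * m
        intb += inta
        intc += intb
        history.append(intc)
    return history

print(max(step_response(2, 3, 6, 14)) // 4)   # Tukey 19/34 setting
print(max(step_response(1, 3, 6, 18)) // 4)   # Tukey 17/32 setting
```

With the 19/34 spacings intc settles at 1020, and with the 17/32 spacings it reaches 1024, so the divide-by-4 sample lands on 256. That seems to be what the "(overflows!)" comment in the listing refers to.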
A Tukey window is very flat in the time domain, close to a value of 1.0 for a majority of the window. A ‘Taper Length’ can be specified which determines the amount of time that the window maintains a value of one (Figure 14). The lower the taper length, the longer the Tukey has a value of one over the measurement time.
tukey.png
Figure 14: The ‘Taper length’ determines how much of the Tukey window has a value of zero
Typically, Tukey windows are used to analyze transient data. The advantage of a Tukey window is that the amplitude of transient signal in the time domain is less likely to be altered compared to using Hanning or Flattop (see Figure 15). In a Hanning or Flattop window, the value of the window equals one for a shorter span compared to the Tukey window.
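To make the taper idea concrete, here is a small floating-point sketch of the textbook tapered-cosine (Tukey) window. It is not the integer window used in the smart pin, and mapping the article's 'Taper Length' onto the fraction `alpha` is my assumption.

```python
import math

def tukey(n_points, alpha):
    """Tapered-cosine window; `alpha` is the fraction of the window spent in the two tapers."""
    if n_points < 2 or alpha <= 0:
        return [1.0] * n_points          # no taper: rectangular window
    alpha = min(alpha, 1.0)              # alpha = 1 gives a Hann window
    last = n_points - 1
    window = []
    for n in range(n_points):
        x = n / last                     # position within the window, 0..1
        if x < alpha / 2:
            w = 0.5 * (1 + math.cos(math.pi * (2 * x / alpha - 1)))
        elif x <= 1 - alpha / 2:
            w = 1.0                      # flat section
        else:
            w = 0.5 * (1 + math.cos(math.pi * (2 * x / alpha - 2 / alpha + 1)))
        window.append(w)
    return window

def flat_count(window):
    """Number of samples where the window equals exactly one."""
    return sum(1 for v in window if v == 1.0)

print(flat_count(tukey(101, 0.2)), flat_count(tukey(101, 0.8)))
```

A smaller `alpha` leaves more samples at exactly one, which matches the statement that a lower taper length keeps the window at one for longer.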
Triangular window (aka Sinc2): sidelobes fall off at -12dB per octave.
Hann window: sidelobes fall off at -18dB per octave.
So, what would be quietest for a DC measurement?
Triangular, maybe.
A Tukey is a mixture of Hann + rectangular. Exact behaviour depends on lengths of plateau and ramps and we need Saucy's help for the frequency response. First or second sidelobes might not be attenuated as much as triangular, whereas higher ones probably are.
I've just been reading that a rectangular moving average window is hopeless in the frequency domain but cannot be beaten for random noise reduction as all samples have equal weight, although other windows can match it. Multiple-pass moving average filters are better in the frequency domain and two-pass is equivalent to triangular window.
There is also a Blackman window that is a relative of Hann. What is the main constraint now if not the shift register? Number of different slopes?
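The two-pass claim is easy to check numerically. The sketch below convolves an 8-tap rectangular window with itself and also compares a simple noise figure, sum(w^2)/sum(w)^2, which is the usual variance gain if the input noise is uncorrelated from sample to sample (that assumption is mine). The helper names are invented for the example.

```python
def convolve(x, h):
    """Plain discrete convolution of two integer sequences."""
    out = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def noise_gain(w):
    """Output noise variance relative to input, for uncorrelated input noise."""
    return sum(v * v for v in w) / sum(w) ** 2

box = [1] * 8
two_pass = convolve(box, box)     # two passes of an 8-tap moving average
print(two_pass)
print(noise_gain([1] * len(two_pass)), noise_gain(two_pass))
```

The two-pass result is the 15-tap triangle 1, 2, ..., 8, ..., 2, 1, and a rectangular window of the same 15-tap length has the lower noise gain of the two.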
Here is the new triple-integrating Tukey 17/32 that's in the smart pins. It reduced each smart pin by 35 ALMs, which is good.
...
Chip, I really think you're on to something here. I don't know how much logic it would save to go to the cascaded moving average setup. It's using 62 register bits for a 43 tap filter. This is 48 bits. This method is much more flexible at configuring the shape, length, and sum.
It seems to me that:
(tap[01:00] == 2'b01) - (tap[01:00] == 2'b10) = tap[00]-tap[01]
but beware of sign-extend issues. Perhaps the compiler already figured it out. Also, this might be introducing an extra differentiation. I'll run some more analysis, but I want to get some of this out sooner.
It may also be worth considering 4 accumulators. It might reduce the number of difference terms, depending on the filter shape.
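The identity can be checked exhaustively over the four input combinations. This is just a sanity check in Python with made-up function names. It says nothing about the sign-extension caveat, since Python integers are unbounded and a hardware description would still need the single-bit operands widened to a signed type.

```python
def edge_form(tap0, tap1):
    """(tap[01:00] == 2'b01) - (tap[01:00] == 2'b10), with tap[01] as the high bit."""
    pair = (tap1 << 1) | tap0
    return int(pair == 0b01) - int(pair == 0b10)

def difference_form(tap0, tap1):
    """tap[00] - tap[01]"""
    return tap0 - tap1

cases = [(t0, t1) for t0 in (0, 1) for t1 in (0, 1)]
for t0, t1 in cases:
    print(t0, t1, edge_form(t0, t1), difference_form(t0, t1))
```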
There is also a Blackman window that is a relative of Hann. What is the main constraint now if not the shift register? Number of different slopes?
Number of slopes. Trying to find some agreeable setting that will allow bypass of taps to achieve the Hann function. Tukey 17/32 does this, but needs clamping.
The 65-tap triangle filter really looks good.
I'm thinking that there must be some way to get higher-quality results every clock. Consider that a 16-sample Sinc3 is 8 bits in quality and uses the prior two samples (my understanding, anyway). That's all of 48 bits. We have that, and more.
Vpp is probably the best measure of noise in a scope display. Just ignore the HighBitCount as that's not really relevant to how we will implement it now. So longer filters work better, and Tukey is better than a triangle of the same length.
Compare this to the third derivative of Tukey17. Indices vary because I had to pad the edges to avoid losing data, and Matlab starts arrays at 1. Other than that, they are the same. So delta is the third derivative of the output. If we integrate it 3 times, we get the output.
But wait a minute, the second derivative has fewer non-zero terms. And that will eliminate an integrator as well. Also let's switch to Tukey23 because it performs better and saves summer terms.
What do you mean about getting rid of an integrator?
The cancellation doesn't do anything to get rid of integrators. If we have 12 differential terms, we would need 24 taps off the shift register. But some of these cancel so we are using 16 unique taps from the shift register. Doing the conversion to sum form and canceling terms shows that the taps your code uses match the positions of the third derivative of the filter impulse response.
How we get rid of an integrator is making the computation of delta match the second derivative of the impulse response. Basically what we are doing here is differentiating our filter taps, doing the convolution, then integrating to get the output. Using the third derivative of the impulse response for our taps results in more taps with non-zero values than using the second derivative. We want as many taps as possible to have zero values because they are cheapest to implement.
These formations in the calculation of delta are performing edge detection/differentiation of the input data. So it's necessary to integrate an additional time to get the output samples. Also, we need to get additional taps from the shift register to perform this differentiation.
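A quick way to see the saving is to difference the Tukey 23(17)/30 impulse response and count the non-zero terms. This is a sketch with my own helper names. The window values come from the table at the top of the thread, and zeros are assumed on either side of the window.

```python
from itertools import accumulate

def diff(seq):
    """First difference, assuming zeros before the start and after the end."""
    padded = [0] + list(seq) + [0]
    return [b - a for a, b in zip(padded, padded[1:])]

def nonzero(seq):
    return sum(1 for v in seq if v != 0)

ramp = [1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 28, 29]
tukey23 = ramp + [30] * 17 + ramp[::-1]

second = diff(diff(tukey23))      # taps when integrating twice
third = diff(second)              # taps when integrating three times

plus_taps = [i for i, v in enumerate(second) if v > 0]
minus_taps = [i for i, v in enumerate(second) if v < 0]
print(plus_taps, minus_taps, nonzero(second), nonzero(third))

# Integrating the second difference twice gives the window back.
rebuilt = list(accumulate(accumulate(second)))[:len(tukey23)]
```

For this window the second difference has 8 non-zero taps and the third has 16, so working from the second derivative halves the number of shift register taps and needs one integrator fewer.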
It would be nice to have an actual signal in the bitstream, e.g. 2 MHz of 10% FS amplitude
Here's my first attempt using smartpin sync serial receive to record. Simply fills all 512 KB of hubRAM and dumps it. Duration is about 0.4 seconds.
EDIT: I've added scope screenshot of analogue filtered bitstream of what goes to Prop pin.
EDIT2: And the schematic for ADC board with filter.
EDIT3: Note that the zip compression is only achieving slightly better than 2:1 compression. That tells me we aren't going to get great improvement at lossless compression. I've tried to use FLAC compressor tool but that needed PCM data, so no luck comparing that.
EDIT4: And source code snippet for the capturing.
EDIT5: Added snapshots of clock timing. About 14 ns of hold time on data rise. And about 12 ns of hold on data fall.
EDIT6: Correction on the recorded amount, 512 kB is 0.2 seconds worth. I had actually run for 1 MB but the second half wasn't kept. On that note, I can up that to 1 MB with other Prop2 image for FPGA ...
Saucy, getting rid of the cancelled-out bits in the delta computation definitely saves logic. The smart pin dropped by 10 ALMs.
Is that for Tukey17 integrated 3x or Tukey23 integrated 2x?
For my suggested delta calculation (Tukey23) I think it needs
delta 4 bits (range -4:4)
first integrator 6 bits (range -30:30)
second integrator 10 bits (range 0:1020)
delta=tap[00]+tap[03]+tap[49]+tap[52] -tap[15]-tap[18]-tap[34]-tap[37]; Tukey
delta=tap[00]+tap[03]+tap[32]+tap[35] -tap[15]-tap[18]-tap[17]-tap[20]; Hann
In Hann mode tap[34] is previous tap[16]. Shift second half forward by 17 bits.
The negative taps will interleave together, so don't mask off the bypassed section.
Sum is 510
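Here is a small simulation of the two delta equations with two integrators, to check the impulse responses and the suggested register ranges. It is only a sketch: the function and constant names, the 53-bit register length and the random test stream are my assumptions, while the tap numbers are taken from the equations above.

```python
import random

TUKEY = ((0, 3, 49, 52), (15, 18, 34, 37))   # (+taps, -taps), Tukey equation
HANN = ((0, 3, 32, 35), (15, 18, 17, 20))    # (+taps, -taps), Hann equation

def run(bits, taps, length=53):
    """Return (delta, inta, intb) histories for a 1-bit input stream."""
    plus, minus = taps
    reg = [0] * length
    inta = intb = 0
    deltas, firsts, seconds = [], [], []
    for bit in bits:
        reg = [bit] + reg[:-1]                 # shift in the new bit
        delta = sum(reg[p] for p in plus) - sum(reg[p] for p in minus)
        inta += delta                          # first integrator
        intb += inta                           # second integrator = output
        deltas.append(delta)
        firsts.append(inta)
        seconds.append(intb)
    return deltas, firsts, seconds

impulse = [1] + [0] * 59
tukey_response = run(impulse, TUKEY)[2]
hann_response = run(impulse, HANN)[2]
print(sum(tukey_response), sum(hann_response))

rng = random.Random(1)                         # fixed seed, arbitrary test data
noise = [rng.randint(0, 1) for _ in range(2000)]
deltas, firsts, seconds = run(noise, TUKEY)
print(min(deltas), max(deltas), min(firsts), max(firsts), min(seconds), max(seconds))
```

The Tukey impulse response reproduces the 1, 2, 3, 5, ... ramp from the table and sums to 1020. The Hann version sums to 510, which would match the "Sum is 510" line if that refers to Hann mode.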
Saucy, I experimented with a lot of filters and some were just noisy. I picked three that I could prove in my program and test on the FPGA with the ADC:
dim t(70)
for x = 0 to 70
t(x) = 0
next x
'a=2 : b=3 : c=8 : d=33 : m=1 '70-tap Tukey window
'a=2 : b=3 : c=6 : d=14 : m=2 '47-tap Tukey window
a=1 : b=2 : c=8 : d=01 : m=4 '30-tap Hann window
t0 = 0
t1 = t0 + a
t2 = t1 + b
t3 = t2 + c
t4 = t3 + b
t5 = t4 + a
t6 = t5 + d
t7 = t6 + a
t8 = t7 + b
t9 = t8 + c
t10 = t9 + b
t11 = t10 + a
print "taps",,t0,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11
print
inta = 0
intb = 0
intc = 0
print "iter","inta","intb","intc","sample"
for y = 0 to 70*2-1
for x = 70 to 1 step -1
t(x) = t(x-1)
next x
t(0) = 1
if y >= 70 then t(0) = 0
delta = 0
if (t(t0) = 1 and t(t0+1) = 0) then delta = delta + 1
if (t(t0) = 0 and t(t0+1) = 1) then delta = delta - 1
if (t(t1) = 1 and t(t1+1) = 0) then delta = delta + 1
if (t(t1) = 0 and t(t1+1) = 1) then delta = delta - 1
if (t(t2) = 1 and t(t2+1) = 0) then delta = delta + 1
if (t(t2) = 0 and t(t2+1) = 1) then delta = delta - 1
if (t(t3) = 1 and t(t3+1) = 0) then delta = delta - 1
if (t(t3) = 0 and t(t3+1) = 1) then delta = delta + 1
if (t(t4) = 1 and t(t4+1) = 0) then delta = delta - 1
if (t(t4) = 0 and t(t4+1) = 1) then delta = delta + 1
if (t(t5) = 1 and t(t5+1) = 0) then delta = delta - 1
if (t(t5) = 0 and t(t5+1) = 1) then delta = delta + 1
if (t(t6) = 1 and t(t6+1) = 0) then delta = delta - 1
if (t(t6) = 0 and t(t6+1) = 1) then delta = delta + 1
if (t(t7) = 1 and t(t7+1) = 0) then delta = delta - 1
if (t(t7) = 0 and t(t7+1) = 1) then delta = delta + 1
if (t(t8) = 1 and t(t8+1) = 0) then delta = delta - 1
if (t(t8) = 0 and t(t8+1) = 1) then delta = delta + 1
if (t(t9) = 1 and t(t9+1) = 0) then delta = delta + 1
if (t(t9) = 0 and t(t9+1) = 1) then delta = delta - 1
if (t(t10) = 1 and t(t10+1) = 0) then delta = delta + 1
if (t(t10) = 0 and t(t10+1) = 1) then delta = delta - 1
if (t(t11) = 1 and t(t11+1) = 0) then delta = delta + 1
if (t(t11) = 0 and t(t11+1) = 1) then delta = delta - 1
inta = inta + delta
intb = intb + inta
intc = intc + intb * m
sample = int(intc/8)
print y,inta,intb,intc,sample
next y
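The three settings look to be scaled so that a constant input gives the same full-scale sample. The sketch below rebuilds each window from its spacings and checks m * sum / 8. The function and dictionary names are mine, and it assumes all slope weights are 1, as in this listing.

```python
from itertools import accumulate

def window(a, b, c, d):
    """Impulse response implied by the tap spacings a, b, c, d (slope weights of 1)."""
    pos = list(accumulate((a, b, c, b, a, d, a, b, c, b, a), initial=0))
    sign = [+1] * 3 + [-1] * 6 + [+1] * 3
    second = [0] * (pos[-1] + 1)           # second derivative of the window
    for p, s in zip(pos, sign):
        second[p] += s
    return list(accumulate(accumulate(second)))

settings = {
    "70-tap Tukey": (2, 3, 8, 33, 1),      # a, b, c, d, m from the listing
    "47-tap Tukey": (2, 3, 6, 14, 2),
    "30-tap Hann": (1, 2, 8, 1, 4),
}

def full_scale(a, b, c, d, m):
    """Sample value for an all-ones input, using the listing's intc/8 scaling."""
    return m * sum(window(a, b, c, d)) // 8

for name, params in settings.items():
    print(name, full_scale(*params))
```

All three come out at 255 for an all-ones input, so the m multiplier appears to be there to equalise the gain across filter lengths.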
Comments
Have a look at synchronous serial smartpin mode. Data from pin A is shuffled into a 32 bit register when clock on pin B rises/lowers
I forget... Does a Tukey window smooth better than a triangle window? I know it lets more signal through, but this triangle window looks really good.
Chip, your programs work fine in QuickBASIC. I'll consult my literature.
Are simple registers in other modes re-cycled as the shift register in scope mode and therefore not additional logic?
Right, kind of. There are just mux's to existing flops in the smart pin.
Source for the Tukey window description (Figure 14):
https://community.plm.automation.siemens.com/t5/Testing-Knowledge-Base/Window-Types-Hanning-Flattop-Uniform-Tukey-and-Exponential/ta-p/445063
Good point. I can throw a cog at this ...
It seems to me that:
(tap[01:00] == 2'b01) - (tap[01:00] == 2'b10) = tap[00]-tap[01]
but beware of sign-extend issues. Perhaps the compiler already figured it out. Also, this might be introducing an extra differentiation. I'll run some more analysis, but I want to get some of this out sooner.
It may also be worth considering 4 accumulators. It might reduce the number of difference terms, depending on the filter shape.
-James
Whoa!!! I never realized that. I'm going to recode this to see if it gets smaller.
Doing the conversion and canceling I got this: Compare this to the third derivative of Tukey17. Indices vary because I had to pad the edges to avoid losing data, and Matlab starts arrays at 1. Other than that, they are the same. So delta is the third derivative of the output. If we integrate it 3 times, we get the output.
But wait a minute, the second derivative has fewer non-zero terms. And that will eliminate an integrator as well. Also let's switch to Tukey23 because it performs better and saves summer terms. Note: I have not checked the above for order of operations or sign-extension issues.
What is the logic cost of doing this for the rectangular, i.e. flat-top, part? It requires only 5 additions to run the function, by bit-weaving the data flow, so it could be available as a PASM opcode (if it isn't already), and it would run automatically if you wanted a Tukey accumulate in streaming data acquisition mode. For the ends of the sausage, start and stop functions could be called, which might use a one-sided special triangular slope or a half-cosine window function call. To start the Tukey you would compute the triangular starter window once, followed by count bits (scaling the sum as desired); to stop accumulating you would call "flat top MINUS the triangular". Or else remember that the Hann window is a raised cosine, so you need f(x) = 1 - cosine(x): here again is the 1 part, whereas the cosine part can actually be done with some similar trickery.
// every 32 clocks
#define EVEN_BITS    (0x55555555)
#define ODD_BITS     (0xaaaaaaaa)
#define EVEN_PAIRS   (0x33333333)
#define ODD_PAIRS    (0xcccccccc)
#define EVEN_NIBBLES (0x0f0f0f0f)
#define ODD_NIBBLES  (0xf0f0f0f0)
#define EVEN_BYTES   (0x00ff00ff)
#define ODD_BYTES    (0xff00ff00)
#define HIWORD(arg)  (((arg)&(0xffff0000))>>16)
#define LOWORD(arg)  ((arg)&(0x0000ffff))

// Counts the set bits in a 32-bit word with five additions.
// Each stage adds neighbouring fields in parallel, doubling the field width.
unsigned char propeller_adc::count_bits (DWORD input)
{
    DWORD sum;
    DWORD p1, p2;
    // stage 1: sixteen 2-bit sums of adjacent bits
    p1 = (input&ODD_BITS)>>1;
    p2 = (input&EVEN_BITS);
    sum = p2+p1;
    // stage 2: eight 4-bit sums
    p1 = (sum&ODD_PAIRS)>>2;
    p2 = (sum&EVEN_PAIRS);
    sum = p2+p1;
    // stage 3: four 8-bit sums
    p1 = (sum&ODD_NIBBLES)>>4;
    p2 = (sum&EVEN_NIBBLES);
    sum = p2+p1;
    // stage 4: two 16-bit sums
    p1 = (sum&ODD_BYTES)>>8;
    p2 = (sum&EVEN_BYTES);
    sum = p2+p1;
    // stage 5: final total, 0..32
    p1 = HIWORD(sum);
    p2 = LOWORD(sum);
    sum = p2+p1;
    return (unsigned char)sum;
}
It would be very small, simply a counter running at sysclock rate or a tree of adders summing 32 individual bits to a 6-bit sum. I've seen similar code here: graphics.stanford.edu/%7Eseander/bithacks.html#CountBitsSetParallel. Compared to the P1, the P2 has a new instruction to do this, "ones", which Chip used in the code in the first post of this thread.
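As a cross-check of the adder-tree idea, here is the same five-stage sum written in Python and compared against a straightforward count. It is a sketch with an invented function name. It models the arithmetic only, not the P2 instruction or any hardware timing.

```python
def count_bits(value):
    """Tree-style population count of a 32-bit word (five add stages)."""
    value &= 0xFFFFFFFF
    value = (value & 0x55555555) + ((value >> 1) & 0x55555555)   # 2-bit sums
    value = (value & 0x33333333) + ((value >> 2) & 0x33333333)   # 4-bit sums
    value = (value & 0x0F0F0F0F) + ((value >> 4) & 0x0F0F0F0F)   # 8-bit sums
    value = (value & 0x00FF00FF) + ((value >> 8) & 0x00FF00FF)   # 16-bit sums
    return (value & 0x0000FFFF) + (value >> 16)                  # total, 0..32

samples = [0, 1, 0x80000000, 0x55555555, 0xDEADBEEF, 0xFFFFFFFF]
for word in samples:
    print(hex(word), count_bits(word), bin(word).count("1"))
```

The largest possible result is 32, which is why a 6-bit sum is enough.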
I implemented them in the smart pin like this:
Good progress. Did you check continual RX did not drop any bits?
Maybe a second pin counting edges can do that?
Is that target still 1.80V -10% and 125 deg? You could tighten the 10% these days.
Voltages are spec'd at 1.8V and 3.3V, +/- 5%. Junction temperature is -55°C to +150°C.