Saucy, I experimented with a lot of filters and some were just noisy. I picked three that I could prove in my program and test on the FPGA with the ADC:
dim t(70)
for x = 0 to 70
t(x) = 0
next x
'a=2 : b=3 : c=8 : d=33 : m=1 '70-tap Tukey window
'a=2 : b=3 : c=6 : d=14 : m=2 '47-tap Tukey window
a=1 : b=2 : c=8 : d=01 : m=4 '30-tap Hann window
...
It would save some logic if the Hann window had identical ramps to the second Tukey window. I've always assumed that's how a Hann would be done. In fact, I think it should be possible to have one set of taps, for the long Tukey window, with shift register shortcuts for the other two. The following shows how similar the two Tukeys are:
Three windows need two filter select bits, which means there could be a fourth type of window.
That's right.
16 bits are dedicated to the trigger mechanism. There could be a mode without trigger that is highest resolution. What we have now with the 70 bit filter is pretty amazing. I will post pictures shortly.
Here are the various filters in the smart pin scope mode.
In these pictures, you are looking at the apex of a sine wave being digitized and played back on a DAC at 8x amplitude, so that the individual steps can be clearly seen. The longer the filter is, the lower its passband is, making it quieter, but not good for higher frequencies.
These filters slow down edges. In the 70-tap filter (which is actually more like a 68-tap filter, but needs extra bits for edge detection), 68 contiguous 1's must go into the filter before it reaches its max output of 255. Then, 68 contiguous 0's must go in before it reaches its internal 0, though it outputs 0 three clocks earlier. Rounding could lift things a little bit, but is maybe of no real value.
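The edge behaviour described above can be checked with a small simulation. This is only a sketch in Python rather than the thread's BASIC, assuming the 70-tap Tukey spacings (a=2, b=3, c=8, d=33, m=1) and the simplified two-integrator form of the tester; the names tap_positions and step_response are made up for illustration.

```python
SIGNS = (+1, +1, +1, -1, -1, -1, -1, -1, -1, +1, +1, +1)

def tap_positions(a, b, c, d):
    """Cumulative tap positions t0..t11 built from the four spacings."""
    taps = [0]
    for gap in (a, b, c, b, a, d, a, b, c, b, a):
        taps.append(taps[-1] + gap)
    return taps

def step_response(a, b, c, d, m, ones=70, zeros=70):
    """Feed `ones` 1-bits then `zeros` 0-bits; return (intb, sample) per clock."""
    taps = tap_positions(a, b, c, d)
    bits = [0] * (taps[-1] + 1)          # shift register, newest bit at index 0
    inta = intb = 0
    out = []
    for y in range(ones + zeros):
        bits = [1 if y < ones else 0] + bits[:-1]
        delta = sum(s * bits[t] for s, t in zip(SIGNS, taps))
        inta += delta
        intb += inta * m
        out.append((intb, intb // 8))    # intb never goes negative here
    return out

resp = step_response(2, 3, 8, 33, 1)
ones_needed = 1 + next(y for y, (_, s) in enumerate(resp) if s == 255)
fall = resp[70:]
zeros_needed = 1 + next(k for k, (acc, _) in enumerate(fall) if acc == 0)
early_zero_clocks = sum(1 for acc, s in fall if s == 0 and acc > 0)
print(ones_needed, zeros_needed, early_zero_clocks)   # 68 68 3
```

The three early zeros come from the divide-by-8: the accumulator still holds 7, 3 and 1 on the last three clocks before it empties.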
With a 7/8th reduction in the ADC's integrator capacitance, it will pass higher frequencies much better.
I was reading that to digitize baseband video, you need to sample at 13.5MHz. So, if you were to run at 216MHz you could have 16 clocks per sample in which an 8-bit-quality Sinc3 conversion could be performed.
With the ADC change, that should be viable. This will all be in the next silicon.
It sounds pretty good Chip. I hope it all fits in the next rev. Cheers!
Roger.
Any chance it could be extended to 20 clocks per sample, or would that mess with the filtering?
With sysclk at 270 MHz and 20 clocks per sample, it would enable almost direct coupling of the resulting data to HDMI.
Only a thought...
Sure. 20 clocks per sample would be better. You can use whatever decimation rate you want. Then, there's the need to use some Goertzel analysis on the chroma signal and then do some colorspace conversion to RGB.
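The clock budgets mentioned here are plain division of the system clock by the 13.5 MHz video sample rate. A minimal sketch of that arithmetic, in Python for convenience (clocks_per_sample is a hypothetical helper name, not anything on the chip):

```python
def clocks_per_sample(sysclk_hz, sample_rate_hz):
    """Whole system clocks available per output sample (decimation rate)."""
    clocks, remainder = divmod(sysclk_hz, sample_rate_hz)
    if remainder:
        raise ValueError("sysclk is not an integer multiple of the sample rate")
    return clocks

print(clocks_per_sample(216_000_000, 13_500_000))   # 16
print(clocks_per_sample(270_000_000, 13_500_000))   # 20
```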
7/8 reduction? The figures I think I've seen you post were 3.5 ~ 6.5 MHz current bandwidth. But that'd imply you only needed to halve or quarter the capacitor.
Thanks Chip. I get it: the more taps, the longer the filter, and the more clocks needed per effective sample. That and sysclk tell us where the roll-off is, and a bit about what it will look like.
So is all this recent stuff for the current chip or some future one?
Some of Chip's comments seem to involve changing the pad stuff and not just Verilog? Or am I misunderstanding?
Saucy, I experimented with a lot of filters and some were just noisy. I picked three that I could prove in my program and test on the FPGA with the ADC:
dim t(70)
for x = 0 to 70
t(x) = 0
next x
'a=2 : b=3 : c=8 : d=33 : m=1 '70-tap Tukey window
'a=2 : b=3 : c=6 : d=14 : m=2 '47-tap Tukey window
a=1 : b=2 : c=8 : d=01 : m=4 '30-tap Hann window
t0 = 0
t1 = t0 + a
t2 = t1 + b
t3 = t2 + c
t4 = t3 + b
t5 = t4 + a
t6 = t5 + d
t7 = t6 + a
t8 = t7 + b
t9 = t8 + c
t10 = t9 + b
t11 = t10 + a
print "taps",,t0,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11
print
inta = 0
intb = 0
intc = 0
print "iter","inta","intb","intc","sample"
for y = 0 to 70*2-1
for x = 70 to 1 step -1
t(x) = t(x-1)
next x
t(0) = 1
if y >= 70 then t(0) = 0
delta = 0
if (t(t0) = 1 and t(t0+1) = 0) then delta = delta + 1
if (t(t0) = 0 and t(t0+1) = 1) then delta = delta - 1
if (t(t1) = 1 and t(t1+1) = 0) then delta = delta + 1
if (t(t1) = 0 and t(t1+1) = 1) then delta = delta - 1
if (t(t2) = 1 and t(t2+1) = 0) then delta = delta + 1
if (t(t2) = 0 and t(t2+1) = 1) then delta = delta - 1
if (t(t3) = 1 and t(t3+1) = 0) then delta = delta - 1
if (t(t3) = 0 and t(t3+1) = 1) then delta = delta + 1
if (t(t4) = 1 and t(t4+1) = 0) then delta = delta - 1
if (t(t4) = 0 and t(t4+1) = 1) then delta = delta + 1
if (t(t5) = 1 and t(t5+1) = 0) then delta = delta - 1
if (t(t5) = 0 and t(t5+1) = 1) then delta = delta + 1
if (t(t6) = 1 and t(t6+1) = 0) then delta = delta - 1
if (t(t6) = 0 and t(t6+1) = 1) then delta = delta + 1
if (t(t7) = 1 and t(t7+1) = 0) then delta = delta - 1
if (t(t7) = 0 and t(t7+1) = 1) then delta = delta + 1
if (t(t8) = 1 and t(t8+1) = 0) then delta = delta - 1
if (t(t8) = 0 and t(t8+1) = 1) then delta = delta + 1
if (t(t9) = 1 and t(t9+1) = 0) then delta = delta + 1
if (t(t9) = 0 and t(t9+1) = 1) then delta = delta - 1
if (t(t10) = 1 and t(t10+1) = 0) then delta = delta + 1
if (t(t10) = 0 and t(t10+1) = 1) then delta = delta - 1
if (t(t11) = 1 and t(t11+1) = 0) then delta = delta + 1
if (t(t11) = 0 and t(t11+1) = 1) then delta = delta - 1
inta = inta + delta
intb = intb + inta
intc = intc + intb * m
sample = int(intc/8)
print y,inta,intb,intc,sample
next y
You still did not fix the issue of differentiating the input data. While the BASIC program shows 12 taps, it actually uses 24 taps since it uses tn and (tn)+1. It's not actually a bad idea to integrate 3 times if the filter was designed for cubic curves. As it is now, one of the integrators is just undoing the differentiation that was inadvertently added. Surely the code below would be much cheaper to implement.
dim t(70)
for x = 0 to 70
t(x) = 0
next x
'a=2 : b=3 : c=8 : d=33 : m=1 '70-tap Tukey window
'a=2 : b=3 : c=6 : d=14 : m=2 '47-tap Tukey window
a=1 : b=2 : c=8 : d=01 : m=4 '30-tap Hann window
t0 = 0
t1 = t0 + a
t2 = t1 + b
t3 = t2 + c
t4 = t3 + b
t5 = t4 + a
t6 = t5 + d
t7 = t6 + a
t8 = t7 + b
t9 = t8 + c
t10 = t9 + b
t11 = t10 + a
print "taps",,t0,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11
print
inta = 0
intb = 0
intc = 0
print "iter","inta","intb","sample"
for y = 0 to 70*2-1
for x = 70 to 1 step -1
t(x) = t(x-1)
next x
t(0) = 1
if y >=70 then t(0) = 0
delta = 0
delta = delta + t(t0)
delta = delta + t(t1)
delta = delta + t(t2)
delta = delta - t(t3)
delta = delta - t(t4)
delta = delta - t(t5)
delta = delta - t(t6)
delta = delta - t(t7)
delta = delta - t(t8)
delta = delta + t(t9)
delta = delta + t(t10)
delta = delta + t(t11)
inta = inta + delta * m
intb = intb + inta
sample = int(intb/8)
print y," ",inta," ",intb," ",sample
next y
We have to assume that there are zeros before and after the values shown. So using a derivative of the filter requires a few more bits in the shift register, but if done properly we come out way ahead because there will only be a handful of terms to sum.
It's rather inconvenient for me that the Matlab "diff" function does not assume that anything exists beyond the array boundaries, so the output gets shorter with each differentiation.
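To see that the first integrator in the 24-term version only undoes the edge detection, the two programs can be run side by side. The following is a rough Python port of both BASIC listings (not the FPGA logic), using the three sets of spacings quoted in the thread; the function and dictionary names are illustrative only.

```python
SIGNS = (+1, +1, +1, -1, -1, -1, -1, -1, -1, +1, +1, +1)
WINDOWS = {                      # name: (a, b, c, d, m)
    "tukey70": (2, 3, 8, 33, 1),
    "tukey47": (2, 3, 6, 14, 2),
    "hann30":  (1, 2, 8, 1, 4),
}

def tap_positions(a, b, c, d):
    taps = [0]
    for gap in (a, b, c, b, a, d, a, b, c, b, a):
        taps.append(taps[-1] + gap)
    return taps

def run(a, b, c, d, m, simplified):
    """70 ones then 70 zeros through either version; returns the samples."""
    taps = tap_positions(a, b, c, d)
    bits = [0] * 71                              # t(0)..t(70)
    inta = intb = intc = 0
    samples = []
    for y in range(140):
        bits = [1 if y < 70 else 0] + bits[:-1]
        if simplified:                           # 12 terms, two integrators
            delta = sum(s * bits[t] for s, t in zip(SIGNS, taps))
            inta += delta * m
            intb += inta
            samples.append(intb // 8)
        else:                                    # 24 terms, three integrators
            delta = sum(s * (bits[t] - bits[t + 1]) for s, t in zip(SIGNS, taps))
            inta += delta
            intb += inta
            intc += intb * m
            samples.append(intc // 8)
    return samples

for name, params in WINDOWS.items():
    same = run(*params, simplified=True) == run(*params, simplified=False)
    print(name, "identical" if same else "different")
```

Each edge detector computes t(tn) - t(tn+1), and since t(tn+1) is just last clock's t(tn), summing those differences over time telescopes back to t(tn). That is why dropping the edge detectors and one integrator leaves the output unchanged.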
So is all this recent stuff for the current chip or some future one?
Some of Chip's comments seem to involve changing the pad stuff and not just Verilog? Or am I misunderstanding?
It's only a tiny tweak. Only adding 6-10% extra silicon/logic - nothing major.
And it's all theory and a little testing on the FPGA with external ADC which is not like the real P2-ES. No testing on the real P2-ES as far as I can tell.
Good progress. Did you check continual RX did not drop any bits?
Maybe a second pin counting edges can do that?
AD7400 clock generator is not particularly stable. With Prop2 recording it varies about 150 ppm wrt the FPGA crystal. The scope is showing worse over shorter intervals.
EDIT: Prop2 is, agreeing with scope, up to 240 ppm wobble on repeated 0.1 second recordings. Compared to earlier 0.4 s recordings.
Wow! This is way better!!!
I wish I knew how to think about math more easily, because this should have occurred to me, too. I remember thinking it was weird that I needed to run that first integrator. There are so many neat things to learn about.
Anyway, this is going to save a lot of logic! Thanks for pointing this out.
Hann/Tukey is a cosine-squared function. Just rename the new 12-tap delta to inta and all the program outputs will be the same. There is still shift-register skipping for the smaller windows to try.
Here is the simplified version of the filter tester, per Saucy's recommendation. This is way better now:
a=2 : b=3 : c=8 : d=33 : m=1 '70-tap Tukey window
'a=2 : b=3 : c=6 : d=14 : m=2 '47-tap Tukey window
'a=1 : b=2 : c=8 : d=01 : m=4 '30-tap Hann window
t0 = 0
t1 = t0 + a
t2 = t1 + b
t3 = t2 + c
t4 = t3 + b
t5 = t4 + a
t6 = t5 + d
t7 = t6 + a
t8 = t7 + b
t9 = t8 + c
t10 = t9 + b
t11 = t10 + a
print "taps",,t0,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11
print
dim t(70)
for x = 0 to 70 : t(x) = 0 : next x 'clear bits
inta = 0
intb = 0
print "iter","delt","inta","intb","samp"
for y = 0 to 70*2-1
for x = 70 to 1 step -1 : t(x) = t(x-1) : next x 'shift bits
t(0) = (y < 70) 'insert new bit
delta = t(t0) + t(t1) + t(t2) &
- t(t3) - t(t4) - t(t5) &
- t(t6) - t(t7) - t(t8) &
+ t(t9) + t(t10) + t(t11)
inta = inta + delta
intb = intb + inta * m
sample = int(intb/8)
print y,delta,inta,intb,sample
next y
Now we'll see what kind of logic reduction this yields.
It's just amazing that so little math can compute the area under the filter curve on every clock. I wouldn't have thought this was possible. It's the miracle of calculus. Makes me wonder how many other things could be tackled like this. Seems like some live Goertzel could be done.
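One way to convince yourself that the two integrators really track the windowed sum on every clock is to compare them against a brute-force FIR on an arbitrary bitstream. A sketch in Python, assuming the 70-tap spacings and a made-up random test stream (the seed and the helper names are arbitrary):

```python
import random
from itertools import accumulate

SIGNS = (+1, +1, +1, -1, -1, -1, -1, -1, -1, +1, +1, +1)

def tap_positions(a, b, c, d):
    taps = [0]
    for gap in (a, b, c, b, a, d, a, b, c, b, a):
        taps.append(taps[-1] + gap)
    return taps

def window_weights(a, b, c, d, m):
    """Window y, recovered by integrating the signed tap impulses twice."""
    taps = tap_positions(a, b, c, d)
    impulses = [0] * (taps[-1] + 1)
    for s, t in zip(SIGNS, taps):
        impulses[t] += s
    slope = accumulate(impulses)                 # dy
    return [m * v for v in accumulate(slope)]    # y

def recursive_filter(bits, a, b, c, d, m):
    """12 tap lookups and two adds per clock, as in the simplified tester."""
    taps = tap_positions(a, b, c, d)
    inta = intb = 0
    out = []
    for n in range(len(bits)):
        delta = sum(s * bits[n - t] for s, t in zip(SIGNS, taps) if n >= t)
        inta += delta
        intb += inta * m
        out.append(intb)
    return out

def direct_fir(bits, weights):
    """Brute force: multiply every bit in the window by its weight."""
    return [sum(weights[k] * bits[n - k] for k in range(min(n + 1, len(weights))))
            for n in range(len(bits))]

rng = random.Random(1)
stream = [rng.randint(0, 1) for _ in range(500)]
params = (2, 3, 8, 33, 1)
weights = window_weights(*params)
print(sum(weights))                                                   # 2040
print(recursive_filter(stream, *params) == direct_fir(stream, weights))  # True
```

The weights sum to 2040, which is where the full-scale 255 comes from after the divide by 8.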
To make things clearer, if the window function is y = ƒ(t), then:
delta = dy/dt
inta = y
intb = ∫y dt
We could even go a level deeper and define the filter by only six points:
(1) initially from Y=0, begin upwards acceleration (+1)
(2) at the inflection point of the filter rise, begin upwards deceleration (-2)
(3) at the start of the plateau when delta is now zero, cancel deceleration (+1)
(4) at the end of the plateau, begin downwards acceleration (-1)
(5) at the inflection point of the filter fall, begin downwards deceleration (+2)
(6) at the end of the filter when Y and delta are now 0, cancel deceleration (-1)
It would take only two parameters to define the filter:
a) the acceleration/deceleration period for filter rise and fall
b) the plateau period
With a few more bits of resolution, we could dial it in much better than we have been.
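The six-point idea can be sketched by placing the six signed steps and integrating three times to get the window back. This is a Python illustration under my own assumptions about indexing (ramp is the acceleration/deceleration period, plateau is the gap between points 3 and 4); six_point_window is a hypothetical name and the exact plateau length depends on how the points are counted.

```python
from itertools import accumulate

def six_point_window(ramp, plateau):
    """Window shape from six acceleration changes and two parameters."""
    length = 4 * ramp + plateau + 1
    steps = [0] * length
    for position, change in (
        (0, +1),                          # (1) begin upwards acceleration
        (ramp, -2),                       # (2) switch to deceleration
        (2 * ramp, +1),                   # (3) cancel deceleration
        (2 * ramp + plateau, -1),         # (4) begin downwards acceleration
        (3 * ramp + plateau, +2),         # (5) switch to deceleration
        (4 * ramp + plateau, -1),         # (6) cancel deceleration
    ):
        steps[position] += change
    accel = accumulate(steps)             # acceleration level
    delta = accumulate(accel)             # slope of the window (dy)
    return list(accumulate(delta))        # the window itself (y)

window = six_point_window(3, 2)
print(window)   # [1, 3, 6, 8, 9, 9, 9, 9, 8, 6, 3, 1, 0, 0, 0]
```

With this construction the peak is ramp squared, so a running filter would need one more integrator than the 12-tap version, plus a scale factor to land on the same full-scale value.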
I compiled the simplified algorithm per Saucy's recommendation and it works great. It cut 18 ALMs from each smart pin.
Here is what the code looks like now. It's just amazing how much this has been simplified:
I think we've had separate savings of 34, 10 and now 18 ALMs, with a logic increase between the second and third. At one time it was 80+ ALMs per smart pin.
I'll look at modifying the BASIC program for shift register skipping to allow a single set of 12 taps later this afternoon.
We need extra bits to detect edges now. That's why it's longer.
Here is the 70-tap Tukey filter:
Here is the 47-tap Tukey filter:
Here is the 30-tap Hann filter:
Hz, kHz, MHz?
Boy, that 70 tap looks perty
I think Christmas is tying people up. I'll be offline soon enough. Testing with P2D2 and ES boards will have to wait till January.