To make things clearer, if the window function is y = ƒ(t), then

delta = dy
inta = y
intb = inty = ∫ydt

We could even go a level deeper and define the filter by only six points:

(1) initially from Y=0, begin upwards acceleration (+1)
(2) at the inflection point of the filter rise, begin upwards deceleration (-1)
(3) at the start of the plateau when delta is now zero, cancel acceleration (0)
(4) at the end of the plateau, begin downwards acceleration (-1)
(5) at the inflection point of the filter fall, begin downwards deceleration (+1)
(6) at the end of the filter when Y and delta are now 0, cancel acceleration (0)

It would take only two parameters to define the filter:

a) the acceleration/deceleration period for filter rise and fall
b) the plateau period
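Here is a sketch of the two-parameter idea. It assumes both periods are counted in samples and that the acceleration is held at +1, -1 or 0 exactly as in steps (1) to (6). Under those assumptions, the window is what you get by integrating the acceleration twice: once to get delta, and once more to get y. The helper name `window_from_profile` is hypothetical.

```python
from itertools import accumulate

def window_from_profile(accel_period, plateau_period):
    """Build y from the six-point acceleration profile (hypothetical helper).

    Acceleration is +1 for accel_period samples, -1 for accel_period samples,
    0 for plateau_period samples, then -1 and +1 for accel_period samples each.
    """
    a, p = accel_period, plateau_period
    accel = [+1] * a + [-1] * a + [0] * p + [-1] * a + [+1] * a
    delta = list(accumulate(accel))  # first integration: delta = dy
    y = list(accumulate(delta))      # second integration: y
    while y and y[-1] == 0:          # drop trailing samples where y is back at 0
        y.pop()
    return y

if __name__ == "__main__":
    print(window_from_profile(2, 0))
    print(window_from_profile(4, 8))
```

With accel_period=4 and plateau_period=8, the shape matches the `a=4 : b = 08` Tukey preset of the triple-integrating SmallBASIC program further down the thread.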

With a few more bits of resolution, we could dial it in much better than we have been.

What if we set min and max dy values? 0 <= |dy| <= 3

I compiled the simplified algorithm per Saucy's recommendation and it works great. It cut 18 ALMs from each smart pin.

Here is what the code looks like now. It's just amazing how much this has been simplified:

I think we've had separate savings of 34, 10 and now 18 ALMs, with a logic increase between the second and third. At one time it was 80+ ALMs per smart pin.

I'll look at modifying the BASIC program for shift register skipping to allow a single set of 12 taps later this afternoon.

Okay. That would be great. Remember they just have to sum to 2040..2047, preferably a single value.
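The 2040..2047 target follows from the final divide by 8. The BASIC program in this thread computes `sample = int(intb/8)`, so every window sum in that range floors to the 8-bit maximum of 255, while 2048 would overflow to 256. Here is a tiny check for the all-ones input case (the helper name is hypothetical):

```python
def full_scale_sample(window_sum):
    """8-bit output for a steady all-ones input, assuming sample = int(intb / 8)."""
    return window_sum // 8

if __name__ == "__main__":
    for s in (2039, 2040, 2047, 2048):
        print(s, full_scale_sample(s))
```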

I have the two Tukeys sharing the same taps in BASIC. It half worked the first time and fully worked the second time. I'll do the Hann, which won't be quite the same as now, then post the program.

a = 2: b = 3: c = 8: d = 33 'All windows
m = 1 'Tukey68
'm = 2 'Tukey45
'm = 4 'Hann30
'm = 8 'Blackman22
t0 = 0
t1 = t0 + a
t2 = t1 + b
t3 = t2 + c
t4 = t3 + b
t5 = t4 + a
t6 = t5 + d
t7 = t6 + a
t8 = t7 + b
t9 = t8 + c
t10 = t9 + b
t11 = t10 + a
print "taps", t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11
print
dim t(70)
for x = 0 TO 70: t(x) = 0: next x 'clear bits
inta = 0
intb = 0
print "iter", "delt", "inta", "intb", "samp"
for y = 0 TO 70 * 2 - 1
for x = 70 TO 1 step -1
'Tukey68
if m = 1 then t(x) = t(x - 1)
'Tukey45
if m = 2 then
if x <= 10 or (x > 13 and x <= 31) or (x > 51 and x <= 56) or x > 59 then
t(x) = t(x - 1)
end if
if x = t3 then t(x) = t(t3 - 3): ' t(13) = t(10)
if x = t6 then t(x) = t(t6 - 20): 't(51) = t(31)
if x = t8 + 3 then t(x) = t(t8): ' t(59) = t(56)
end if
'Hann30
if m = 4 then
if x <= 10 or (x > 13 and x <= 16) or (x > 53 and x <= 56) or x > 59 then
t(x) = t(x - 1)
end if
if x = t3 then t(x) = t(t3 - 3): ' t(13) = t(10)
if x = t5 then t(x) = t(t4): ' t(18) = t(16)
if x = t6 then t(x) = t(t6 - 33): 't(51) = t(18)
if x = t7 then t(x) = t(t6): ' t(53) = t(51)
if x = t8 + 3 then t(x) = t(t8): ' t(59) = t(56)
end if
'Blackman22
if m = 8 then
if x <= 8 or x > 61 then
t(x) = t(x - 1)
end if
if x = t3 then t(x) = t(t3 - 5): ' t(13) = t(8)
if x = t4 then t(x) = t(t3): ' t(16) = t(13)
if x = t5 then t(x) = t(t4): ' t(18) = t(16)
if x = t6 then t(x) = t(t6 - 33): 't(51) = t(18)
if x = t7 then t(x) = t(t6): ' t(53) = t(51)
if x = t8 then t(x) = t(t7): ' t(56) = t(53)
if x = t8 + 5 then t(x) = t(t8): ' t(61) = t(56)
end if
next x
if y < 70 then t(0) = 1 else t(0) = 0
delta = t(t0) + t(t1) + t(t2)
delta = delta - t(t3) - t(t4) - t(t5)
delta = delta - t(t6) - t(t7) - t(t8)
delta = delta + t(t9) + t(t10) + t(t11)
inta = inta + delta * m
intb = intb + inta
sample = int(intb/8)
print y, delta, inta/m, intb, sample
next y
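This Python sketch cross-checks the m = 1 (Tukey68) path. It places the same 12 tap signs on a delta comb, then applies the two running sums (`inta`, then `intb`). Its impulse response should reproduce the Tukey68 coefficients listed next. The tap-skipping for the other three windows is not modeled, and the function names are hypothetical.

```python
from itertools import accumulate

def tap_positions(a=2, b=3, c=8, d=33):
    """t0..t11, built as in the BASIC program (defaults are its 'All windows' values)."""
    return [0] + list(accumulate([a, b, c, b, a, d, a, b, c, b, a]))

def tukey68_window():
    """Impulse response of the m = 1 case: delta comb followed by two running sums."""
    t = tap_positions()
    # Same signs as the four 'delta = ...' lines: +t0..t2, -t3..t8, +t9..t11
    signs = [+1, +1, +1, -1, -1, -1, -1, -1, -1, +1, +1, +1]
    delta = [0] * (t[-1] + 1)
    for pos, sign in zip(t, signs):
        delta[pos] += sign
    inta = list(accumulate(delta))  # inta = inta + delta
    intb = list(accumulate(inta))   # intb = intb + inta
    while intb and intb[-1] == 0:   # trim the trailing zeros
        intb.pop()
    return intb

if __name__ == "__main__":
    w = tukey68_window()
    print(len(w), sum(w))
    print(w[:17])
```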

Here are the four windows for the BASIC program above:

Tukey 21/40 "Tukey68"
1, 2, 4, 6, 8,11,14,17,20,23,26,29,32,34,36,38,39 Ramp up/down = 340 x 2
40[34] Plateau = 1360
39,38,36,34,32,29,26,23,20,17,14,11, 8, 6, 4, 2, 1 Grand total = 2040, /8 = 255
Tukey 19/34 "Tukey45"
1, 2, 4, 6, 8,11,14,17,20,23,26,28,30,32,33 Ramp up/down = 255 x 2
34[15] Plateau = 510
33,32,30,28,26,23,20,17,14,11, 8, 6, 4, 2, 1 Grand total = 1020, /4 = 255
Hann 19/34 "Hann30"
1, 2, 4, 6, 8,11,14,17,20,23,26,28,30,32,33 Ramp up/down = 255 x 2
33,32,30,28,26,23,20,17,14,11, 8, 6, 4, 2, 1 Grand total = 510, /2 = 255
Blackman 15/24 "Blackman22"
1, 2, 4, 6, 8,11,14,17,20,22,23 Ramp up/down = 128 x 2
23,22,20,17,14,11, 8, 6, 4, 2, 1 Grand total = 256 *** clamp to 255 ***
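Here is a quick Python check of the table. It assumes each window's full-scale output is its coefficient sum times the BASIC program's multiplier `m`, divided by 8. The sketch rebuilds each window from its ramp and plateau, then reports three things: the length, the sum, and the (unclamped, clamped) full-scale output. The names are hypothetical.

```python
def symmetric(ramp, plateau_value=0, plateau_len=0):
    """Ramp up, optional flat plateau, then the mirrored ramp down."""
    return ramp + [plateau_value] * plateau_len + ramp[::-1]

RAMP_45 = [1, 2, 4, 6, 8, 11, 14, 17, 20, 23, 26, 28, 30, 32, 33]
WINDOWS = {  # name: (coefficients, multiplier m from the BASIC program)
    "Tukey68": (symmetric([1, 2, 4, 6, 8, 11, 14, 17, 20, 23, 26, 29, 32, 34, 36, 38, 39], 40, 34), 1),
    "Tukey45": (symmetric(RAMP_45, 34, 15), 2),
    "Hann30": (symmetric(RAMP_45), 4),
    "Blackman22": (symmetric([1, 2, 4, 6, 8, 11, 14, 17, 20, 22, 23]), 8),
}

def full_scale(coeffs, m):
    """(unclamped, clamped) 8-bit output for an all-ones input: sum * m / 8."""
    raw = sum(coeffs) * m // 8
    return raw, min(raw, 255)

if __name__ == "__main__":
    for name, (coeffs, m) in WINDOWS.items():
        print(name, len(coeffs), sum(coeffs), full_scale(coeffs, m))
```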

I'm curious to know how these windows affect the rise and fall times of square waves. The new Blackman window above is the shortest and thus should have the fastest response.

Today I made a 4th-order 22-bit filter that returns a 16-bit sample every clock. You can just do a RDPIN to get a sample.

I modeled an RC integrator and put four in series. The ADC bits feed in as $000000/$3FFFFF. It takes about 256 clocks to settle to a stable 16-bit reading for a steady input pattern. Of course, our ADC is only good for about 12 bits, so the filter is overkill.

Here is the filter model in SmallBASIC:

d = 16 'how much to divide errors before adding into accumulators
r = 10 'how often to report status (1=always)
for y = 0 to 500
acc = acc + 1/6 'simulate ADC bitstream
if acc >= 1 then
acc = acc - 1
bit = 1
else
bit = 0
endif
'bit = 1 'test all 1's
if bit = 1 then a0 = 0x3FFFFF else a0 = 0
e1 = a0 - a1
d1 = int(e1/d)
a1 = a1 + d1
e2 = a1 - a2
d2 = int(e2/d)
a2 = a2 + d2
e3 = a2 - a3
d3 = int(e3/d)
a3 = a3 + d3
e4 = a3 - a4
d4 = int(e4/d)
a4 = a4 + d4
if y/r = int(y/r) then print y,bit,hex(a1),hex(a2),hex(a3),hex(a4),,"sample " + hex(int((a4+32)/64))
next y
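For experimenting outside SmallBASIC, here is a Python port of the model above. It makes two assumptions:

- The bitstream comes from an exact integer counter (one '1' every 6 clocks) instead of the floating-point `acc = acc + 1/6`.
- SmallBASIC's `int()` rounds down, so Python's floor division `//` is used.

The names are hypothetical.

```python
FULL_SCALE = 0x3FFFFF  # a '1' bit enters the filter as $3FFFFF, a '0' as $000000

def adc_bits(n_clocks, period=6):
    """Steady 1/6-duty bitstream: one '1' every `period` clocks (exact integer version)."""
    acc = 0
    for _ in range(n_clocks):
        acc += 1
        if acc >= period:
            acc -= period
            yield 1
        else:
            yield 0

def rc4_filter(bits, d=16):
    """Four cascaded RC-style stages: a_k += floor((a_{k-1} - a_k) / d).

    Yields (a1, a2, a3, a4) and the rounded 16-bit sample (a4 + 32) // 64.
    """
    a = [0, 0, 0, 0]
    for bit in bits:
        x = FULL_SCALE if bit else 0
        for k in range(4):
            a[k] += (x - a[k]) // d  # error divided by d, rounded down
            x = a[k]                 # this stage feeds the next one
        yield tuple(a), (a[3] + 32) // 64

if __name__ == "__main__":
    for clock, (stages, sample) in enumerate(rc4_filter(adc_bits(1000))):
        if clock % 100 == 0:
            print(clock, [hex(v) for v in stages], "sample", hex(sample))
```

After settling, a 1/6-duty input should sit near 0x2AAA, which is roughly 0xFFFF / 6.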

Here is the Verilog code in the smart pin:

This filter works great, but I don't know how practical it is. I think the windowed filters are more realistic, as they are a convenient 8 bits and don't have an RC-looking lag. Plus, they are 50 ALMs smaller. This filter is good for resolution, but not bandwidth.

Here it is sampling a 10 kHz 20 mV sawtooth. Of sample[15:0], these are bits [8:1]:

This morning I made a triple-integrating window filter that only requires 4 taps for Hann and 6 taps for Tukey windows. Only two parameters are needed to define the shape.

Here is the code:

'a=4 : b = 00 : m=32 '17-tap Hann window, t2 and t3 cancel out
'a=4 : b = 08 : m=16 '25-tap Tukey window
'a=8 : b = 00 : m=04 '33-tap Hann window, t2 and t3 cancel out
'a=8 : b = 16 : m=02 '49-tap Tukey window
a=8 : b = 48 : m=01 '81-tap Tukey window
gx = 10 : gy = 270 : iy = gy + 100
t0 = 0
t1 = t0 + a
t2 = t1 + a
t3 = t2 + b
t4 = t3 + a
t5 = t4 + a
topbit = t5
for x = 1 to 35 : print : next x
print "taps",,t0,t1,t2,t3,t4,t5
print
dim t(topbit)
for x = 0 to topbit : t(x) = 0 : next x 'clear bits
inta = 0
intb = 0
intc = 0
print "iter","ADC","delt","inta","intb","intc","samp","clam"
for iter = 0 to topbit*2 + 1
for x = topbit to 1 step -1 : t(x) = t(x-1) : next x 'shift bits
t(0) = (iter <= topbit) 'new ADC bit
't(0) = 1 - t(1)
't(0) = int(rnd + 0.5)
delt = t(t0) - t(t1)*2 + t(t2) - t(t3) + t(t4)*2 - t(t5)
inta = inta + delt
intb = intb + inta
intc = intc + intb * m
clam = int(intc/4096)
samp = int(intc/16) - clam
print iter,t(0),delt,inta,intb,intc,samp,clam
line gx + iter*3, gy, gx + iter*3, gy - samp 'plot samples
line gx + iter*3, iy, gx + iter*3, iy - intb 'plot intb
next iter
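Here is a Python port of the loop above: tap positions, shift register, the three integrators, and the `clam` correction. It assumes SmallBASIC's `int()` acts as floor for these non-negative values. The sketch checks that every preset reaches intc = 4096 on a long run of 1s. At that point `intc/16 - intc/4096` tops out at 256 - 1 = 255 instead of overflowing to 256. The names are hypothetical.

```python
PRESETS = {  # (a, b, m) from the preset lines of the SmallBASIC program
    "17-tap Hann": (4, 0, 32),
    "25-tap Tukey": (4, 8, 16),
    "33-tap Hann": (8, 0, 4),
    "49-tap Tukey": (8, 16, 2),
    "81-tap Tukey": (8, 48, 1),
}

def run_step(a, b, m):
    """Feed topbit+1 ones, then zeros; return (intc, samp) for every iteration."""
    t = [0, a, 2 * a, 2 * a + b, 3 * a + b, 4 * a + b]  # t0..t5
    topbit = t[5]
    bits = [0] * (topbit + 1)
    inta = intb = intc = 0
    out = []
    for it in range(2 * topbit + 2):
        bits = [1 if it <= topbit else 0] + bits[:-1]  # shift in the new ADC bit
        delt = (bits[t[0]] - 2 * bits[t[1]] + bits[t[2]]
                - bits[t[3]] + 2 * bits[t[4]] - bits[t[5]])
        inta += delt
        intb += inta
        intc += intb * m
        clam = intc // 4096           # 1 only when intc reaches 4096
        samp = intc // 16 - clam      # maps 0..4096 onto 0..255
        out.append((intc, samp))
    return out

if __name__ == "__main__":
    for name, (a, b, m) in PRESETS.items():
        trace = run_step(a, b, m)
        print(name, "peak intc", max(c for c, _ in trace),
              "peak samp", max(s for _, s in trace))
```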

How much more logic does this take?

For supporting four windows, it would certainly take less.

Vpp98% is peak-to-peak noise with the output scaled to 0-255, but without quantization. It could be considered the number of counts that the reading will wander.
Cutoff frequencies are in MHz and assume a sysclock of 250MHz.
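One way to reproduce cutoff figures like these is to evaluate the window's frequency response directly and search for the -3 dB point, scaled by the 250 MHz sysclock. This is only a sketch: the thread does not say how its cutoff numbers were computed, so the results may not match them exactly. The helper names are hypothetical.

```python
import math

SYSCLK_MHZ = 250.0  # sysclock assumed in the thread

def magnitude(window, f_mhz, fs_mhz=SYSCLK_MHZ):
    """|H(f)| of an FIR window, normalized so that |H(0)| = 1."""
    w = 2 * math.pi * f_mhz / fs_mhz
    re = sum(c * math.cos(w * n) for n, c in enumerate(window))
    im = sum(c * math.sin(w * n) for n, c in enumerate(window))
    return math.hypot(re, im) / sum(window)

def cutoff_mhz(window, fs_mhz=SYSCLK_MHZ, level=1 / math.sqrt(2)):
    """Lowest frequency (MHz) where |H| drops to `level` (-3 dB by default)."""
    step = fs_mhz / (64 * len(window))  # coarse scan step, well inside the main lobe
    lo, hi = 0.0, step
    while magnitude(window, hi, fs_mhz) > level:
        lo, hi = hi, hi + step
    for _ in range(60):                 # refine the crossing by bisection
        mid = (lo + hi) / 2
        if magnitude(window, mid, fs_mhz) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(f"64-sample boxcar: {cutoff_mhz([1] * 64):.3f} MHz")
```

Passing any of the coefficient lists from this thread (for example, the Tukey68 values) gives that window's -3 dB point under the same 250 MHz assumption.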

The problem is these new windows are neither Hann nor Tukey, and I don't know what they are.

There is a family of "raised cosine" windows given by

w(n) = a0 - a1*cos(2*pi*n/N) + a2*cos(4*pi*n/N) for n = 0 to N-1
Window a0 a1 a2
Hann 0.5 0.5 0
Hamming 0.543 0.457 0
Blackman 0.423 0.498 0.079

The new fourth window mentioned above is a Blackman window, which has better stopband attenuation and lower passband ripple than either Hann or Hamming. The chosen Blackman is mostly identical to the Hann and Tukey windows, which reduces the logic required to implement it.
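The raised-cosine formula and coefficients above can be evaluated directly, for example to compare an integer window such as Blackman22 against the ideal curve. This Python sketch only computes the curves. N = 22 is an arbitrary choice that mirrors the Blackman22 length, and the helper name is hypothetical.

```python
import math

COSINE_COEFFS = {  # (a0, a1, a2), copied from the table above
    "Hann": (0.5, 0.5, 0.0),
    "Hamming": (0.543, 0.457, 0.0),
    "Blackman": (0.423, 0.498, 0.079),
}

def cosine_window(N, a0, a1, a2):
    """w(n) = a0 - a1*cos(2*pi*n/N) + a2*cos(4*pi*n/N) for n = 0 to N-1."""
    return [a0 - a1 * math.cos(2 * math.pi * n / N) + a2 * math.cos(4 * math.pi * n / N)
            for n in range(N)]

if __name__ == "__main__":
    for name, coeffs in COSINE_COEFFS.items():
        w = cosine_window(22, *coeffs)
        print(name, [round(v, 3) for v in w])
```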

The following table shows the taps that are skipped for three of the windows.

They are just simple integrations, and a close approximation to a raised sine.

This SmallBASIC program graphs it all:

'a=4 : b = 00 : m=32 '17-tap Hann window, t2 and t3 cancel out
'a=4 : b = 08 : m=16 '25-tap Tukey window
'a=8 : b = 00 : m=04 '33-tap Hann window, t2 and t3 cancel out
a=8 : b = 16 : m=02 '49-tap Tukey window
'a=8 : b = 48 : m=01 '81-tap Tukey window
gx = 10 : cy = 270 : by = cy + 100 : ay = by + 100 : dy = ay + 50
t0 = 0
t1 = t0 + a
t2 = t1 + a
t3 = t2 + b
t4 = t3 + a
t5 = t4 + a
topbit = t5
for x = 1 to 40 : print : next x
print "taps",,t0,t1,t2,t3,t4,t5
print
dim t(topbit)
for x = 0 to topbit : t(x) = 0 : next x 'clear bits
inta = 0
intb = 0
intc = 0
print "iter","ADC","delt","inta","intb","intc","samp","clam"
for iter = 0 to topbit*2 + 1
for x = topbit to 1 step -1 : t(x) = t(x-1) : next x 'shift bits
t(0) = (iter <= topbit) 'new ADC bit
't(0) = 1 - t(1)
't(0) = int(rnd + 0.5)
delt = t(t0) - t(t1)*2 + t(t2) - t(t3) + t(t4)*2 - t(t5)
inta = inta + delt
intb = intb + inta
intc = intc + intb * m
clam = int(intc/4096)
samp = int(intc/16) - clam
print iter,t(0),delt,inta,intb,intc,samp,clam
x = gx + iter*4
line x, cy, x, cy - intc/16 'plot intc
line x, by, x, by - intb 'plot intb
line x, ay, x, ay - inta*2 'plot inta
line x, dy, x, dy - delt*8 'plot delta
next iter

cmafilter(x,y,z) is a cascaded moving average. It means filter by a moving average of length x. Then filter the output of that by a moving average filter of length y. Do it again with a moving average of length z. If all 3 numbers were the same it would be a sinc3 filter. This is slightly more general. We basically have a sinc3 with the center part lengthened.
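The equivalences posted in this thread (e.g. cmafilter(8,8,16) = thirdorderfilter(8,0)) follow from a factorization. The 6-tap comb t0 - 2*t1 + t2 - t3 + 2*t4 - t5 equals (1 - z^-a)^2 (1 - z^-(2a+b)), and each running sum divides by (1 - z^-1). That leaves two moving sums of length a and one of length 2a + b, so thirdorderfilter(a, b) should equal cmafilter(a, a, 2a + b). This Python sketch checks that numerically. Both helpers are hypothetical and unnormalized: moving sums rather than averages.

```python
from itertools import accumulate

def convolve(x, y):
    """Plain integer convolution of two sequences."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def cmafilter(x, y, z):
    """Impulse response of moving sums of lengths x, y, z in cascade."""
    return convolve(convolve([1] * x, [1] * y), [1] * z)

def thirdorderfilter(a, b):
    """Impulse response of the 6-tap delta comb followed by three running sums."""
    taps = [0, a, 2 * a, 2 * a + b, 3 * a + b, 4 * a + b]
    weights = [1, -2, 1, -1, 2, -1]
    delt = [0] * (taps[-1] + 1)
    for pos, w in zip(taps, weights):
        delt[pos] += w  # when b = 0, the t2 and t3 terms cancel
    h = list(accumulate(accumulate(accumulate(delt))))  # inta, intb, intc
    while h and h[-1] == 0:
        h.pop()
    return h

if __name__ == "__main__":
    for a, b in [(4, 0), (4, 8), (8, 0), (8, 16), (8, 48)]:
        same = thirdorderfilter(a, b) == cmafilter(a, a, 2 * a + b)
        print(f"thirdorderfilter({a},{b}) == cmafilter({a},{a},{2 * a + b}): {same}")
```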

Here is the BASIC program for the 4th-order filter I added into the smart pin (to try it, anyway). At any time, you can do a RDPIN to get a 16-bit sample, which is noisy, though the filter is 16-bit stable:

d = 16 'how much to divide errors before adding into accumulators
r = 16 'how often to report status (1=always)
g = 1 'graphics/text mode
for i = 0 to 2048
acc = acc + 1/6 'simulate ADC bitstream
if i/320 = int(i/320) then t = 1-t
if t = 1 then acc = acc + 4/6
if acc >= 1 then
acc = acc - 1
bit = 1
else
bit = 0
endif
'bit = 1 'test all 1's
'bit = int(rnd + 0.5)
if bit = 1 then a0 = 0x3FFFFF else a0 = 0
e1 = a0 - a1
d1 = int(e1/d)
a1 = a1 + d1
e2 = a1 - a2
d2 = int(e2/d)
a2 = a2 + d2
e3 = a2 - a3
d3 = int(e3/d)
a3 = a3 + d3
e4 = a3 - a4
d4 = int(e4/d)
a4 = a4 + d4
sample = int((a4+32)/64)
if g = 1 then
x = 10 + i
s = 25000
line x, 200, x, 200 - bit*100 'plot bit
line x, 400, x, 400 - a1/s 'plot a1
line x, 600, x, 600 - a2/s 'plot a2
line x, 800, x, 800 - a3/s 'plot a3
line x, 1000, x, 1000 - a4/s 'plot a4
if sample = lastsample then same++ else same = 0
if same > 10 then line x, 1200, x, 1100
lastsample = sample
elseif i/r = int(i/r) then
print i,,bit,hex(a1),hex(a2),hex(a3),hex(a4),,"sample " + hex(sample)
endif
next i

When the program runs in graph mode (g=1), it plots, top-down, the ADC bitstream, a1, a2, a3, a4, and lines for when the filter output has been 16-bit-stable for 10 consecutive samples. Every 320 clocks it switches between 1/6 duty and 5/6 duty, which is what the ADC outputs at GND and VIO readings:

Chip, Blackman22 sums to 256, the rest to 255. I could modify the Blackman22 to sum to 250, but that's not great. This window matches a genuine Blackman, as much as the resolution allows. How many shift bits are available? It must be more than 70.

Your cascaded third-order filters need more adder bits. I calculate 27 vs 21, and the total logic size of the two could be close. I think it would be optimal to have the plateau bits at the end, as in the second table.

Sinc3 uses many more adder bits, of course, so Tukey/Hann/Blackman should be bigger. It's been interesting to work on, even if it doesn't make it into rev B.

TonyB_, I think I will implement your filters. They take care of the 8-bit issue nicely. For highest speed, they are necessary.

I wish we had infinite logic, so that we could also implement a 16-bit converter. Then, the streamer could write 16-bit samples to memory whenever it wanted to, as well.

Does anyone have any better ideas than the filter I implemented to do 16-bit conversions? It was a very simple notion and it works well, but there may be some better techniques. Maybe something else would take less logic, or have better pass-band characteristics, or settle faster.

Being able to grab a 16-bit conversion whenever you want it is really nice.

Thanks, Chip. I don't mind if other filters are used, whatever can fit. The other day I counted how many register bits in total were in use for the scope mode, and it came to 32x3 + 1x16 = 112. I'm not sure whether that is the limit, in which case the 80 taps you need for one of your cascaded filters would be impossible. If more bits are available, then I could look at slightly larger windows that might avoid the need to clamp one of them. Having said that, Hann/Tukey and Blackman might not fit together as well as they do now (see pic below). I've read some good things about the Blackman function, which is a combination of cosine-squared and cosine-squared-squared.
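The cosine-squared / cosine-squared-squared description can be checked against the raised-cosine formula given earlier in the thread, w(n) = a0 - a1*cos(2*pi*n/N) + a2*cos(4*pi*n/N). The short derivation below is not from the original posts. It writes θ = 2πn/N and applies the half-angle identities:

$$
\begin{aligned}
&\theta = \frac{2\pi n}{N}, \qquad s = \sin^2\frac{\theta}{2} = \sin^2\frac{\pi n}{N} \\
&\cos\theta = 1 - 2s, \qquad \cos 2\theta = 2\cos^2\theta - 1 = 1 - 8s + 8s^2 \\
&w(n) = a_0 - a_1\cos\theta + a_2\cos 2\theta = (a_0 - a_1 + a_2) + (2a_1 - 8a_2)\,s + 8a_2\,s^2
\end{aligned}
$$

With the Blackman coefficients from the table (0.423, 0.498, 0.079), this gives w(n) ≈ 0.004 + 0.364 sin²(πn/N) + 0.632 sin⁴(πn/N). Hann reduces to sin²(πn/N). Measured from the window centre, sin²(πn/N) = cos²(π(n − N/2)/N), which is the cosine-squared form.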

## Comments

What if we set min and max dy values? 0 <= |dy| <= 3

Okay. That would be great. Remember they just have to sum to 2040..2047, preferably a single value.

I have the two Tukeys sharing the same taps in BASIC. It half worked the first time and fully worked the second time. I'll do the Hann, which won't be quite the same as now, then post the program.

Is this a change or an observation? And side lobes means side lobes, not ramps?

I mean that on the longest filter, the ramp-up and ramp-down are longer.

EDIT:

Add Blackman22 window.

EDIT:

New Blackman window added as fourth one.

I'm curious to know how these windows affect the rise and fall times of square waves. The new Blackman window above is the shortest and thus should have the fastest response.

Thanks for the graphs.

What is Vpp98% again? Are the short windows a lot worse than Hann30?

Generally, the length is the most important factor in how well a filter performs. Probably.

Today I made a 4th-order 22-bit filter that returns a 16-bit sample every clock. You can just do a RDPIN to get a sample.

I modeled an RC integrator and put four in series. The ADC bits feed in as $000000/$3FFFFF. It takes about 256 clocks to settle to a stable 16-bit reading for a steady input pattern. Of course, our ADC is only good for about 12 bits, so the filter is overkill.

Here is the filter model in SmallBASIC:

Here is the Verilog code in the smart pin:

This filter works great, but I don't know how practical it is. I think the windowed filters are more realistic, as they are a convenient 8 bits and don't have an RC-looking lag. Plus, they are 50 ALMs smaller. This filter is good for resolution, but not bandwidth.

Here it is sampling a 10 kHz 20 mV sawtooth. Of sample[15:0], these are bits [8:1]:

Here is the code:

I graphed the filter, too:

How much more logic does this take?

For supporting four windows, it would certainly take less.

Cutoff frequencies are in MHz and assume a sysclock of 250MHz.

The problem is these new windows are neither Hann nor Tukey, and I don't know what they are.

http://forums.parallax.com/discussion/comment/1457894/#Comment_1457894

Details of all four are in this edited post:

http://forums.parallax.com/discussion/comment/1457896/#Comment_1457896

There is a family of "raised cosine" windows given by

The new fourth window mentioned above is a Blackman window, which has better stopband attenuation and lower passband ripple than either Hann or Hamming. The chosen Blackman is mostly identical to the Hann and Tukey windows, which reduces the logic required to implement it.

The following table shows the taps that are skipped for three of the windows.

The next table shows the plateau taps after the ramps, with the Tukey45 taps at the end of the plateau shift bits. It is easier to see both ramps and the overall similarities between the windows in this second table.

Thanks again for the graphs. Could you please add the new Blackman22 to the above?

http://forums.parallax.com/discussion/comment/1457896/#Comment_1457896

I think it's better to exclude zeroes in the names, hence Tukey68 and Tukey45.

EDIT:

I've noticed that you say the length of Hann30 is 28 but it does have 30 non-zero values.

They are just simple integrations, and a close approximation to a raised sine.

This SmallBASIC program graphs it all:

Here's the output for the Tukey49:

cmafilter(8,8,16) = thirdorderfilter(8,0)

cmafilter(4,4,8) = thirdorderfilter(4,0)

cmafilter(4,4,16) = thirdorderfilter(4,8)

cmafilter(8,8,32) = thirdorderfilter(8,16)

cmafilter(8,8,64) = thirdorderfilter(8,48)

cmafilter(x,y,z) is a cascaded moving average. It means filter by a moving average of length x. Then filter the output of that by a moving average filter of length y. Do it again with a moving average of length z. If all 3 numbers were the same it would be a sinc3 filter. This is slightly more general. We basically have a sinc3 with the center part lengthened.

When the program runs in graph mode (g=1), it plots, top-down, the ADC bitstream, a1, a2, a3, a4, and lines for when the filter output has been 16-bit-stable for 10 consecutive samples. Every 320 clocks it switches between 1/6 duty and 5/6 duty, which is what the ADC outputs at GND and VIO readings:

Tony, super work here. This looks very nice. Four good filters.

Do you know what they each sum to? The Blackman22 looks like it goes to 256. Not so sure about the rest.

Thanks for coming up with all these.

Your cascaded third-order filters need more adder bits. I calculate 27 vs 21, and the total logic size of the two could be close. I think it would be optimal to have the plateau bits at the end, as in the second table.

As far as I recall, an RC low-pass corresponds to a PT1 element, which is an IIR filter. IIR filters are inherently shorter and less expensive to implement than FIR filters, but they have a nonlinear phase response and often a non-flat passband.

https://gaussianwaves.com/2017/02/choosing-a-filter-fir-or-iir-understanding-the-design-perspective/

Sinc3 uses many more adder bits, of course, so Tukey/Hann/Blackman should be bigger. It's been interesting to work on, even if it doesn't make it into rev B.

I wish we had infinite logic, so that we could also implement a 16-bit converter. Then, the streamer could write 16-bit samples to memory whenever it wanted to, as well.

Being able to grab a 16-bit conversion whenever you want it is really nice.

Thanks, Chip. I don't mind if other filters are used, whatever can fit. The other day I counted how many register bits in total were in use for the scope mode, and it came to 32x3 + 1x16 = 112. I'm not sure whether that is the limit, in which case the 80 taps you need for one of your cascaded filters would be impossible. If more bits are available, then I could look at slightly larger windows that might avoid the need to clamp one of them. Having said that, Hann/Tukey and Blackman might not fit together as well as they do now (see pic below). I've read some good things about the Blackman function, which is a combination of cosine-squared and cosine-squared-squared.