Iterative binomial filter in Spin
Phil Pilgrim (PhiPi)
I'm working on a Propeller app that involves the smoothing of time-series data. One method that's often used for this is the very simplistic exponential averaging:
avg_i = avg_(i-1) * p + data * (1 - p), where 0 < p < 1
The closer p is to 1, the more smoothing you get. In physical terms, this is equivalent to a simple RC filter and is a type of infinite impulse response (IIR) filter. This means that, theoretically, a datum an infinite time in the past still affects the current average. Of course, due to rounding and truncation in computer math, the effect is finite, but can be quite long.
Another type of filter is the boxcar smoother:
avg = (x_(i-n+1) + x_(i-n+2) + ... + x_(i-1) + x_i) / n
This is a form of a finite impulse response (FIR) filter, in that the influence of any particular datum lasts only n samples in the data stream. It's not a very nice filter, because it has some nasty bumps at the high end of its frequency response.
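(For illustration only, a minimal Python sketch of the two smoothers described above; the names are chosen for clarity and aren't from the original post:)

def exponential_average(data, p):
    # avg_i = avg_(i-1) * p + data * (1 - p), seeded with the first sample.
    avg = data[0]
    out = []
    for x in data:
        avg = avg * p + x * (1 - p)
        out.append(avg)
    return out

def boxcar(data, n):
    # Average of the last n samples (fewer at the very start of the stream).
    out = []
    for i in range(len(data)):
        window = data[max(0, i - n + 1):i + 1]
        out.append(sum(window) / len(window))
    return out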
A binomial filter is like a boxcar smoother, except that the past values are not all weighted the same. In fact, they're weighted by the numbers in one of the odd rows of Pascal's triangle and divided by their sum, which is always a power of two:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
The further down the triangle you go, the larger the coefficients become, to the point where overflow is a real problem when they're used to multiply the data points. So what I've done is to apply just the [1, 2, 1] filter to the data iteratively n times to get to the nth odd row of the triangle. And this just involves adding and shifting at each step.
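(As a quick cross-check, not part of the original post: repeatedly convolving with [1, 2, 1] in Python does reproduce the odd rows of the triangle, e.g. two passes give [1, 4, 6, 4, 1]:)

def convolve(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

coeffs = [1]
for _ in range(3):
    coeffs = convolve(coeffs, [1, 2, 1])
    print(coeffs, "sum =", sum(coeffs))   # the sum is always a power of two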
To do this, the program uses a history array 3*n longs long. Each successive triad contains the history of the last three data points, where the data in triad i are the results of applying the [1, 2, 1] filter to triad i-1. I have sat down with pencil and paper to prove that doing this twice (n = 2) is equivalent to applying the [1, 4, 6, 4, 1] filter. So by extrapolation, I'm assured that successive odd rows fall out from further operations. The downside to doing it this way is the increased likelihood of truncation errors. But that's the price to be paid for no overflows.
Here's the program:
CON

  _clkmode  = xtal1 + pll16x
  _xinfreq  = 5_000_000

  MAXSTAGES = 50
  UNDEF     = $8000_0000

VAR

  long  history[MAXSTAGES * 3]

OBJ

  pst: "Parallax Serial Terminal"

PUB start | seed, yy, zz, i, base, avg

  pst.start(9600)
  configure_smoothing(@history)
  repeat i from 0 to 511
    if (i > 128 and i < 384)
      base := 2000                                      '+1000 pulse in the middle.
    else
      base := 1000
    yy := base + (?seed >> 25) + (i // 40) * 4          'Apply noise and sawtooth wave to signal.
    zz := smooth(@history, yy, 30)                      'Do a 30-stage binomial smoothing.
    if (i)
      avg := (avg * 11 + yy) / 12                       'Exponential smoothing for comparison.
    else
      avg := yy
    pst.dec(yy)
    pst.char(" ")
    pst.dec(zz)
    pst.char(" ")
    pst.dec(avg)
    pst.char(13)

'The following two subroutines set up and perform the binomial smoothing operation.

PUB configure_smoothing(hist_addr)

  long[hist_addr] := UNDEF                              'Constant used for undef, since Spin doesn't have one.

PUB smooth(hist_addr, value, n) | i

  n += n << 1                                           'Multiply n by 3.
  if (long[hist_addr] == UNDEF)                         'If first value, fill history with it.
    longfill(hist_addr, value, n)
    return value
  else
    longmove(hist_addr, hist_addr + 4, n - 1)           'Otherwise shift history left by one long.
    repeat i from 0 to n - 1 step 3                     'Iterate the [1 2 1] filter.
      long[hist_addr][i + 2] := value                   'Insert value as last in local history triad.
      value := (long[hist_addr][i] + (long[hist_addr][i + 1] << 1) + value + 3) >> 2
                                                        'New value := (value[i - 2] + 2 * value[i - 1] + value[i]) / 4
  return value
This uses a 30-stage binomial filter which, if done explicitly, would involve 61 coefficients. I plotted the data it output in Excel. Here's the graph:
The major takeaways are these:
1. Both the exponential and binomial filters do a good job of cleaning up short-term fluctuations in the data.
2. The binomial filter does a much better job following data trends that the exponential filter mushes up.
3. The binomial filter entails a delay equal to n. The reason for this is that the highest coefficient is in the middle, n data samples in the past from the current one. Another way to look at it is that the binomial filter considers not only the current value and values in the past, but also values in the future to compute the smoothing.
-Phil
Comments
It's a bit less computational load to use [1, 1] as the basis rather than [1, 2, 1]: twice as many
steps, but only 2 x (move + add) rather than 3 x move + shift + add + add.
The moves can be eliminated by ping-ponging between two copies; some Python to illustrate is sketched below (this ought to translate to two unrolled PASM loops, one for each value of sense, meaning the cost is mainly one add instruction per [1, 1] stage).
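(The original Python listing isn't reproduced here; the following is a stand-in sketch of the ping-pong idea, with names chosen purely for illustration:)

def binomial_smooth_block(samples, stages):
    # Two buffers alternate as source and destination, so nothing is moved:
    # each pass replaces dst[i] with src[i] + src[i + 1], i.e. one [1, 1] stage.
    src = list(samples)
    dst = [0] * len(samples)
    for s in range(stages):
        for i in range(len(src) - s - 1):
            dst[i] = src[i] + src[i + 1]
        src, dst = dst, src              # ping-pong: swap roles instead of copying
    # The first len(samples) - stages entries now hold the unnormalized result;
    # divide by 2**stages to restore the original scale.
    return [v / 2 ** stages for v in src[:len(samples) - stages]]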
Here's the raw data generated by the above program:
-Phil
Mike R.
Here is a 4th order delta filter ...
The superimposed image at the bottom, with the spike going down on the left and up on the right, is the delta between samples.
The sample starting at about 1000 is the original data.
The succeeding samples represent each successive order of filtering. I purposely offset the Y by 250 on each filter.
The filter consists of a FIFO window of three samples; call them A1, A2, and A3. D1, D2, and D3 are delta values. F1 is the filtered output. Each pass of the filter, you lose the last two data samples.
D1 = A1 - A2
D2 = A1 - A3
D3 = INT((D1 + D2) / 2)
F1 = A1 - D3
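(A minimal sketch of one pass of this filter, assuming A1 is the current sample and A2, A3 are the two that follow it, which is how the lose-the-last-two-samples bookkeeping works out:)

def delta_filter_pass(samples):
    # One pass of the delta filter described above; each pass drops the last
    # two points because A2 and A3 are not yet available for them.
    out = []
    for i in range(len(samples) - 2):
        a1, a2, a3 = samples[i], samples[i + 1], samples[i + 2]
        d1 = a1 - a2
        d2 = a1 - a3
        d3 = (d1 + d2) // 2          # INT((D1 + D2) / 2)
        out.append(a1 - d3)          # F1 = A1 - D3, roughly (A2 + A3) / 2
    return out

# A "4th order" version is just four passes, losing eight points in total:
# for _ in range(4): samples = delta_filter_pass(samples)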
So,
D3 = (D1 + D2) / 2 = (A1 - A2 + A1 - A3) / 2 = A1 - (A2 + A3) / 2
F1 = A1 - D3 = (A2 + A3) / 2
This looks like the [1, 1] kernel, but peering into the future, rather than into the past, which explains the increasing negative delay with each pass. Right?
-Phil
Peering into the future is correct, I guess, although technically you are still looking at the past, since you lose the last two data points (or rather, they become part of the equation) for each delta order you implement. The idea is that you need at least three data points in a first-in, first-out approach, so the output data for each order isn't valid until you have read at least three points; either way you lose the last two or the first two, depending on how you look at the incoming data stream. For the 4th-order delta you end up losing 8 data points.
Here is the same data taken to an 8th-order delta filter. Sixteen data points are lost at the end, so the 512-point data set is now 496.
EDIT: Losing 2 data points per order probably isn't correct. A phase shift or delay would be more accurate: each time you implement a filter stage, you introduce a delay of two samples.
-Phil
When it comes to filtering the data, the question always is: what do you expect to happen? This sample data seems artificial, as there is a pulse, a sawtooth, and noise. If you look at the derivative Beau created, it is obvious that there are spikes at a certain spacing and with a certain amplitude, so these can be predicted and seen as part of the signal. It is a good idea not to filter those spikes out. So one could say: calculate the derivative, limit the amplitude to a level of 3 sigma, apply a filter, and remove the filtered-out components from the original signal; that should suppress the noise and keep the information.
With this signal, the question is: what is the information? Is it every new sample, or is the information only valid after a sawtooth has completed? In the latter case, the value is the average over one sawtooth and you have a time delay of half a period anyway. In the former case, the history tells you there is a systematic sawtooth, and a new sample just tells you whether this sawtooth continues or ends. Such deviations can only be found when the noise is less than the change in the signal.
F1 = (A2 + A3) / 2
F2 = (A3 + A4) / 2, etc.
And that's the same as Mark_T's technique for iterating the binomial filter, at least after any even number of iterations. (If you stop at an odd number, you don't get a true binomial filter.)
Ideally, the filter I use would look something like this:
f_i = f(a_0*x_i + a_1*x_(i-1) + ... + a_n*x_(i-n))
The binomial filter fits the bill. But, because the middle a terms are the largest ones, they contribute to a large phase delay. What I propose is to come up with a binomial filter whose left tail is applied to past values, the center term to the current value, and the right tail to "future" values extrapolated from past values, using a polynomial or other regression technique. That way, the phase delay is minimized.
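(A rough sketch of that idea, purely illustrative: fit a line to the recent past, extrapolate the "future" half of the window, and apply the full set of binomial weights. The fit span and helper names are my own choices, not anything settled:)

import math

def binomial_weights(m):
    # Row 2m of Pascal's triangle, normalized to sum to 1 (2m + 1 taps).
    w = [math.comb(2 * m, k) for k in range(2 * m + 1)]
    s = sum(w)
    return [v / s for v in w]

def linear_extrapolate(history, steps):
    # Least-squares line through the recent history, extended 'steps' ahead.
    n = len(history)
    xs = list(range(n))
    xbar = sum(xs) / n
    ybar = sum(history) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, history))
             / sum((x - xbar) ** 2 for x in xs))
    return [ybar + slope * (n - 1 - xbar + k) for k in range(1, steps + 1)]

def low_lag_binomial(past, m):
    # 'past' holds the samples up to and including the current one;
    # assumes len(past) >= 2 * m.
    w = binomial_weights(m)
    future = linear_extrapolate(past[-2 * m:], m)   # fit span chosen arbitrarily
    window = past[-(m + 1):] + future               # m past, current, m "future"
    return sum(wi * xi for wi, xi in zip(w, window))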
-Phil
What we actually do when applying a filter is write one vector of length 1 whose n components are the filter coefficients, and then take the last n samples of the signal as a second vector and calculate the scalar product to get "how much" of the filter vector is "in" the signal.
Imagine the signal is constant but noisy. Then the scalar product to the middle of the binomial is just 50%. This will eliminate the noise and not introduce a phase shift; as the signal is constant, phase shift doesn't apply. This will also work for any filter coefficients. The binomial just weights the values close to the center a little more, so if the signal is not constant, your result will be closer to the actual value of the last sample. If you could look into the future and see that the signal is symmetrical to the past, then the next 50% of the signal would show the same value as the first 50%. But as we do not know how the signal develops, this is just a lottery.
So you could do this experiment: take 50% of the binomial and calculate the current value. Next, take 70% of the binomial, so the peak is in the past, and calculate the current signal. Now imagine going back in time to the peak of the 70% version, predicting what will happen 20% later (namely, the now-known current value), and determining whether your prediction was right. Then, since you know the error you made, assume that you have just made the same error again and correct it.
That is just the basic principle of how to make a learning system. But always check that your predictions don't violate conservation of momentum and energy. Or, to give a less abstract example: if your employment rate rises steadily, there is a natural limit of 100%. If someone tells you that growth will continue forever, he might be wrong.
Sounds like a recipe for a nonlinear chaotic system to me... And you lose all the predictable
properties of a time-invariant linear system.
In the future we will have the P2, that is for sure; in the past, we had no P2, that's for sure too. And in between, we wait.....
No!!! That's precisely what I'm trying to get away from. I'm totally committed to the FIR approach. I just need something without a huge phase lag that I can explain to my customer.
-Phil
Calculating the derivative, but D(i) = S(i) - S(i-2)
The idea behind that is: obviously, the steps in the signal (red) extend over more than one sample, so the highest-frequency (alternating) noise is suppressed while the signal steps are not. By differentiation the constant part is lost, so this part has to be added back separately when reintegrating.
Integration now picks up every one-sample change twice, since each difference spans two samples, so the result has to be divided by 2 (green). To improve noise suppression, the derivative can also be averaged (in this case (d(i) + d(i-1) + d(i-2))/3), which shows the result in red.
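(If I'm reading this right, the bookkeeping works out as in the sketch below; the baseline choice and the three-point averaging follow the description above, everything else is illustrative:)

def smooth_via_derivative(s):
    # d[i] = s[i] - s[i-2]; its running sum counts every one-step change twice,
    # hence the divide by 2; the constant lost by differentiating is added back.
    d = [0, 0] + [s[i] - s[i - 2] for i in range(2, len(s))]
    d_avg = []
    for i in range(len(d)):                 # (d(i) + d(i-1) + d(i-2)) / 3
        window = d[max(0, i - 2):i + 1]
        d_avg.append(sum(window) / len(window))
    base = (s[0] + s[1]) / 2                # constant part, lost by differentiation
    out, total = [], 0.0
    for di in d_avg:
        total += di
        out.append(base + total / 2)
    return out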
It looks like the key to filtering is to know what you are looking for. So if you look at your neighbor and you see a mob, take care that your view was not filtered by a mirror.
Then you have to put up with a small number of coefficients in the FIR - have you tried approximations to sinc()
using small integer coefficients (mainly powers of two) - for instance:
[1, 2, 1, 0, -2, -4, -2, 0, 4, 12, 16, 18, 16, 12, 4, 0, -2, -4, -2, 1, 2, 1]
Which has a cutoff of about fs/8 and a latency of 10 samples - derived by ad hoc means. I don't know how to
determine FIR coefficients for minimum phase, other than to simply truncate the future coefficients sooner.
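(For completeness, applying such a kernel is just a straight FIR sum; dividing by the coefficient total keeps the DC gain at exactly 1. A sketch, not from the post:)

KERNEL = [1, 2, 1, 0, -2, -4, -2, 0, 4, 12, 16, 18, 16, 12, 4, 0,
          -2, -4, -2, 1, 2, 1]

def fir(data, kernel=KERNEL):
    # Plain FIR convolution; output starts once a full window is available.
    scale = sum(kernel)
    out = []
    for i in range(len(kernel) - 1, len(data)):
        acc = sum(k * data[i - j] for j, k in enumerate(kernel))
        out.append(acc / scale)
    return out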
http://dspguru.com/files/cic.pdf and https://www.embedded.com/design/configurable-systems/4006446/Understanding-cascaded-integrator-comb-filters
Outside of the use for decimation and interpolation, an integrator-comb filter is another way of looking at
boxcar filtering as a combination of an integrator and a comb filter where the integrator pole cancels a single
zero of the comb filter.
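(A quick illustration of that equivalence, assuming nothing beyond the description above: a running integrator followed by an N-sample comb reproduces the N-point boxcar sum.)

def integrator_comb(data, n):
    # Integrator: running sum.  Comb: subtract the value n samples back.
    # The difference is the sum of the last n samples, i.e. a boxcar (times n).
    acc, integ = 0, []
    for x in data:
        acc += x
        integ.append(acc)
    out = []
    for i in range(len(integ)):
        prev = integ[i - n] if i >= n else 0
        out.append((integ[i] - prev) / n)   # divide by n for the boxcar average
    return out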
high-precision-propeller-sigma-delta-adc
And here's a pdf link to that nice article by Richard Lyons:
https://faculty-web.msoe.edu/prust/sdr/cic_filters.pdf
One thing I've discovered the hard way is that so much in the filtering world is applicable to AC signals (e.g. audio, RF) only, neglecting the DC component. I thought maybe I could use my FIR filter generator to come up with a filter for the time-series data. But no. Even though the TFilter FIR filter design tool claims a gain of 1.0 for a low-pass filter, the gain at 0 Hz can be less than that, which means the filtered data will be skewed downwards. This is because the ripple in the passband may not intersect the Y-axis at 0 dB.
The simplest way to guarantee DC unity gain is to make sure that all of the FIR filter coefficients sum to 1.0. If they are all positive, so much the better. This is not the case with many TFilter-generated filters, even though the AC gain is still 1.0.
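(A trivial check along those lines; the function names are just for illustration:)

def dc_gain(coeffs):
    # The gain at 0 Hz is simply the sum of the FIR coefficients.
    return sum(coeffs)

def normalize_dc(coeffs):
    # Rescale so the coefficients sum to 1.0, guaranteeing unity DC gain.
    g = sum(coeffs)
    return [c / g for c in coeffs]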
Currently, I'm looking at a combination of a causal FIR filter (i.e. one with minimum phase delay), followed by median filtering.
-Phil
The simplest assumption is: noise is stochastic. In this case the filter of choice is the Gaussian-scaled sliding average. If noise is created by accident, then it shouldn't make any difference which sampling points you take. But as there is an underlying signal, selecting the sampling points matters. So either way you have to have a model of the signal-generating mechanism.
Chopping the signal works well to bring the signal from DC up to a set higher frequency where it can be dealt with in an exquisite narrowband filter. Unfortunately, a lot of those environmental signals can't be chopped in a way that doesn't include the interference.