Mean, Median and Mode for sensor data.

jdolecki · 2017-08-07 21:40

While I'm no math whiz, what other averaging methods are used to smooth sensor data?

Also is there some formula that helps me decide how many samples to use?

And last, obviously the more samples I take, the longer it takes to update. How do I calculate the total time to update my sensor reading?
I assume the processor takes one sample per clock cycle? So if I take 25 samples, do I multiply that by the processor speed?
I know certain instructions take more cycles than others. Where do I find that info?

Thanks
John

Don M · 2017-08-07 22:40

I'm guessing you are referring to one of your other posts regarding the ultrasonic sensors?

I'm working on a project now using the Maxbotix USB sensors. I'm writing it in Python for use on the Raspberry Pi. I found that I would get some spurious readings once in a while also. Not very often but it did happen. So I wrote some code to create a list of readings but threw out the highs and lows and then averaged the remaining data. Seems to work ok. I have a variable that I can use to determine how many samples. Right now I'm using 5 samples.
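
A minimal Python sketch of that kind of filter (not the actual project code): sort the readings, drop the lowest and highest, and average what is left. The function name `trimmed_mean`, the `trim` parameter and the sample values are made up for illustration.

```python
def trimmed_mean(readings, trim=1):
    """Average the readings after dropping the `trim` lowest and `trim` highest."""
    if len(readings) <= 2 * trim:
        raise ValueError("need more than 2 * trim readings")
    kept = sorted(readings)[trim:len(readings) - trim]
    return sum(kept) / len(kept)


# Five hypothetical distance readings in cm; 250 is a spurious spike.
samples = [101, 99, 250, 100, 102]
print(trimmed_mean(samples))  # 101.0
```

With 5 samples and one reading trimmed from each end, a single spike in either direction never reaches the average.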

JasonDorie · 2017-08-07 22:48

"one sample per clock cycle?"

Not likely. Depending on the sensor in question, it may be reading an I2C or SPI device, waiting for an R/C circuit to time out, or something else. In all of those cases it'll take more than one clock cycle to get a reading, and in some cases, like reading a Ping or an R/C circuit, the time taken will vary. (Pings read quicker when the target is closer because it takes less time for the sound pulse to echo back.)

Your best option is to read the processor clock when you start, then again when you're done, and subtract the two values - that'll tell you how many clock cycles have elapsed. Divide by 80 for microseconds, 80,000 for milliseconds, assuming an 80MHz clock rate.
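
The arithmetic can be sketched in Python as below. The 80 MHz figure comes from the paragraph above; the 32-bit counter width and the function names are assumptions for illustration, so adjust them for your hardware.

```python
CLOCK_HZ = 80_000_000  # 80 MHz system clock, as assumed above


def elapsed_cycles(start, end, counter_bits=32):
    """Cycles between two counter readings, tolerating one wraparound.

    counter_bits=32 is an assumption about the free-running counter.
    """
    return (end - start) % (1 << counter_bits)


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert clock cycles to microseconds."""
    return cycles * 1_000_000 / clock_hz


def cycles_to_ms(cycles, clock_hz=CLOCK_HZ):
    """Convert clock cycles to milliseconds."""
    return cycles * 1_000 / clock_hz


cycles = elapsed_cycles(1_000, 1_601_000)
print(cycles, cycles_to_us(cycles), cycles_to_ms(cycles))  # 1600000 20000.0 20.0

# The counter rolled over between the two readings:
print(elapsed_cycles(0xFFFFFF00, 0x00000100))  # 512
```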

msrobots · 2017-08-07 22:57

More input needed.

First, the kind of sensor you use affects the minimum read time. A photocell might deliver readings quickly, but a Ping sensor has a much slower minimum response time.

Second, the language and protocol used to read your sensor(s) will affect the fastest time to read a sensor.

But you can also find it out by experiment.

Write a loop in the language of your choice that reads your sensor 1 time, 10 times, 100 times, and use the system counter CNT to calculate the time it takes.

That is, theoretically, your fastest sample time. But since you also need to process your input data, you usually do not reach that speed.
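
The same experiment can be sketched in Python on a PC or Raspberry Pi, using `time.perf_counter_ns()` in place of CNT. Here `read_sensor` is a hypothetical stand-in that just sleeps for about 1 ms; swap in your real sensor read.

```python
import time


def read_sensor():
    """Hypothetical stand-in for a real sensor read; replace with your own."""
    time.sleep(0.001)  # pretend one reading takes about 1 ms
    return 0


def average_read_time_us(read, count):
    """Call `read` `count` times; return the mean time per call in microseconds."""
    start = time.perf_counter_ns()
    for _ in range(count):
        read()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / count / 1000


for count in (1, 10, 100):
    per_read = average_read_time_us(read_sensor, count)
    print(count, "reads:", round(per_read), "us per read")
```

If the time per read stays roughly constant across 1, 10 and 100 reads, the loop overhead is small compared to the sensor itself.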

Now to simple averaging. That depends somewhat on the kind of data your sensors produce.

Say a temperature sensor changes slowly and should give a fairly smooth graph. In cases like that you can filter out noise by allowing only a certain distance from the current reading and kicking out 'extreme' values.
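
One way to read that idea, sketched in Python: keep the last accepted value, and ignore any new reading that jumps further from it than some limit. The function name `reject_jumps`, the `max_step` limit and the temperatures are illustrative assumptions.

```python
def reject_jumps(readings, max_step):
    """Yield one value per reading; a reading that jumps more than max_step
    from the last accepted value is replaced by that last accepted value."""
    last = None
    for value in readings:
        if last is None or abs(value - last) <= max_step:
            last = value  # plausible reading: accept it
        yield last


# Hypothetical temperatures in degrees C with two glitches (35.0 and -5.0).
temps = [20.1, 20.2, 35.0, 20.3, 20.2, -5.0, 20.4]
print(list(reject_jumps(temps, max_step=1.0)))
# [20.1, 20.2, 20.2, 20.3, 20.2, 20.2, 20.4]
```

Note that this simple version gets stuck if the real value ever does change by more than `max_step` between readings, so the limit has to be chosen with the sensor and sample rate in mind.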

An ADC used in, say, an oscilloscope can and should be able to go from 0 to whatever instantly, so you cannot filter out 'extreme' values.

Sensor data can sometimes have erratic readings, so where applicable it is good to filter 'extreme' values out before any averaging.

Averaging does smooth the data, but the cost is that the output follows changes more slowly, so response time suffers. Bad for a scope, good for slowly changing sensors.

I think the most useful one is a rolling average, mostly used with a small number of sensor readings, think 4 to 30.

For example, with 10:

After each sensor reading, add up the last 10 readings and divide the result by 10. This will flatten out fast changes.
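
A small Python sketch of a 10-reading rolling average, assuming the window is simply averaged over however many readings it holds until it fills up. The class name `RollingAverage` is made up for illustration.

```python
from collections import deque


class RollingAverage:
    """Average of the most recent `size` readings."""

    def __init__(self, size=10):
        self.window = deque(maxlen=size)  # oldest reading drops off automatically

    def update(self, reading):
        self.window.append(reading)
        return sum(self.window) / len(self.window)


avg = RollingAverage(10)
for _ in range(10):
    avg.update(0)                      # sensor sits at 0 for a while
print([avg.update(100) for _ in range(10)])
# [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
```

The output shows the flattening: a sudden jump from 0 to 100 takes 10 readings to come through completely.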

Now you can put weights on the readings: say the current reading times 3, PLUS the reading before it times 2, PLUS the 5 readings before that as they are, then divide by ten (the weights add up to 10).

Now it is still flattened, but the current value has more influence, so the reaction time is better.
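
Here is that weighting as a Python sketch: 3 for the newest reading, 2 for the one before, and 1 each for the five before that, so 7 readings with weights that sum to 10. The class name and the warm-up behaviour (dividing by the weights actually used until the window fills) are assumptions.

```python
from collections import deque

WEIGHTS = (3, 2, 1, 1, 1, 1, 1)  # newest reading first; weights sum to 10


class WeightedAverage:
    """Weighted average of recent readings, favouring the newest ones."""

    def __init__(self, weights=WEIGHTS):
        self.weights = weights
        self.window = deque(maxlen=len(weights))  # newest reading at index 0

    def update(self, reading):
        self.window.appendleft(reading)  # oldest falls off the right end
        pairs = list(zip(self.weights, self.window))
        total = sum(w * r for w, r in pairs)
        return total / sum(w for w, _ in pairs)


avg = WeightedAverage()
for _ in range(7):
    avg.update(0)
print([avg.update(100) for _ in range(7)])
# [30.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
```

After a jump from 0 to 100 the first output is already 30, where a plain 10-reading average would only have reached 10.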

Since dividing by 10 is slow on many small processors, it is better to use sizes where a shift can be used instead of a divide: 4, 8 or 16 samples.
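
To illustrate the shift trick in Python (where it makes no speed difference, but the arithmetic is the same): with 8 integer readings, dividing by 8 is a right shift by 3. The function name and sample values are made up.

```python
def shift_average(readings, shift):
    """Average exactly 2**shift integer readings, dividing with a right shift."""
    if len(readings) != 1 << shift:
        raise ValueError("expected exactly 2**shift readings")
    return sum(readings) >> shift  # same as sum // 2**shift


# Eight hypothetical integer readings, so shift = 3 (2**3 == 8).
samples = [100, 102, 98, 101, 99, 100, 103, 97]
print(shift_average(samples, 3))  # 100
```

The shift throws away the fractional part, which is usually fine for raw integer sensor counts.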

But a lot of it depends on what kind of sensor for what kind of job.

There is no magic formula; just experiment to find how fast it can go, and decide how fast it needs to go.

Enjoy!

Mike

Phil Pilgrim (PhiPi) · 2017-08-08 17:03

Here's where I would decide to use the various filtering methods:

Mean: Use when the outputs from sensor data are pretty consistent, with no radical outliers to skew the average. There are several methods to do this, including boxcar averages and IIR-type running averages.

Median: Use for data that has occasional radical outliers, such as that from ultrasonic distance sensors. This can be done in successive boxcar groups of data or FIFO style after every reading.

Mode: Useful only when the data comes from a small set of discrete values. Outliers will be discarded, but if the data is very noisy, it will be hard to find common values within a set of readings.
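
As a quick illustration of the difference, here are all three applied to the same made-up batch of readings using Python's standard `statistics` module. The values are hypothetical, with one radical outlier (400).

```python
from statistics import mean, median, mode

# Hypothetical ultrasonic readings in cm; 400 is a radical outlier.
readings = [100, 101, 100, 99, 100, 400, 100, 101]

print(mean(readings))    # 137.625  (dragged upward by the outlier)
print(median(readings))  # 100.0    (outlier ignored)
print(mode(readings))    # 100      (most common value)
```

The mode only works here because the readings are integers that repeat; with noisy floating-point data there may be no repeated value at all.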

-Phil

Tracy Allen · 2017-08-08 21:50

jdolecki wrote:

And last obviously the more samples I take the longer it takes to update. ...
Also is there some formula that helps me decide how many samples to use?

Say you want to average 32 samples. The slow way to do that is to add up 32 successive readings and then divide by 32 at the end. It does take 32 sample intervals between output readings.

An alternative is to set aside 32 memory locations as a circular buffer with a pointer that replaces the oldest reading with the newest one at each sample interval. The output can be calculated at every step, so the output rate is the same as the input rate. The computation is incremental and can do different types of statistics, for example a window average, weighted or not, or a median etc. The response lag and shape to a sudden step change at the input depends on the filter and is tied closely to the length of the buffer. In the case of a median filter, the output jumps suddenly at a time equal to 1/2 the sampling interval times the buffer length. In the case of an unweighted window average, the response is a linear ramp that completes to the final value exactly when the buffer fills up with the final value.
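
A rough Python sketch of such a circular buffer, using 8 slots instead of 32 to keep the output short. The class name `SlidingWindow` and the choice to pre-fill the buffer with an initial value are assumptions. The mean is updated incrementally from a running total; the median is simply recomputed from the buffer each time.

```python
from statistics import median


class SlidingWindow:
    """Fixed-size circular buffer giving a mean and a median on every sample."""

    def __init__(self, size, initial=0):
        self.buf = [initial] * size
        self.index = 0                 # position of the oldest reading
        self.total = initial * size    # running sum of the buffer contents

    def update(self, reading):
        self.total += reading - self.buf[self.index]  # swap oldest for newest
        self.buf[self.index] = reading
        self.index = (self.index + 1) % len(self.buf)
        return self.total / len(self.buf), median(self.buf)


win = SlidingWindow(8)                 # buffer starts full of zeros
results = [win.update(100) for _ in range(8)]   # input steps from 0 to 100
print([m for m, _ in results])
# [12.5, 25.0, 37.5, 50.0, 62.5, 75.0, 87.5, 100.0]
print([d for _, d in results])
# [0.0, 0.0, 0.0, 50.0, 100.0, 100.0, 100.0, 100.0]
```

The mean ramps linearly and finishes when the buffer is full of the new value, while the median switches over around the half-way point.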

IIR filters don't need a buffer. An exponential filter, for example, has an accumulator and an exponent:
accumulation += (new - accumulation) >> exponent
The step response settles to within 1/e of the final value in a time constant of about 2^exponent samples. For example, if the exponent is 5, then it takes about 32 samples to get there, then 32 more samples to close in by another factor of 1/e, and so on.
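
The update line can be tried out in Python to check the time constant. This sketch assumes an exponent of 5 (so 2^5 = 32 samples) and a deliberately large step of 1,000,000 so that the truncation from the integer shift is negligible; both values and the function name are illustrative.

```python
def exp_filter(readings, exponent, start=0):
    """Yield the exponential filter output after each integer reading."""
    accumulation = start
    for new in readings:
        accumulation += (new - accumulation) >> exponent
        yield accumulation


EXPONENT = 5       # time constant of about 2**5 = 32 samples
STEP = 1_000_000   # hypothetical step size, large to hide integer truncation

outputs = list(exp_filter([STEP] * 64, EXPONENT))
remaining_32 = (STEP - outputs[31]) / STEP   # fraction still to go after 32
remaining_64 = (STEP - outputs[63]) / STEP   # fraction still to go after 64
print(round(remaining_32, 2), round(remaining_64, 2))  # 0.36 0.13
```

For comparison, 1/e is about 0.37 and 1/e^2 is about 0.14. With small integer readings the shift can truncate the last few counts to zero and the output may stall just short of the input, which is why readings are often scaled up before filtering.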

For what to choose, the thing to do is collect a large sample of successive data points and look at how they are distributed in time and amplitude. If extreme outliers are indeed sparsely distributed in time, then a median filter will help.

Don M · 2017-08-09 00:18

I'm assuming (yes I know what can happen) if you're monitoring a tank level the level won't change very fast so who cares (basically) how long it takes to figure out the values? Within reason of course.

jdolecki · 2017-08-09 01:32

Thanks everyone for the responses. It seems there are as many ways to do this as there are members who responded. That's just what I was looking for.
Thanks again to all of you.
John
