Computing a running median

Phil Pilgrim (PhiPi) · 2008-05-24 03:35

In a recent thread, a question came up about array sorting using a BASIC Stamp. As part of the discussion, Tracy Allen mentioned sorting as a step in finding the median value in a sequence of sensor readings. The median is an important statistical concept, because it is less sensitive to the occasional "wild" reading, which can easily skew the mean (average) of a sequence of values.

The discussion got me thinking whether sorting is a necessary precursor in this process, or whether more efficient methods can be used. This is of particular importance for arrays stored in scratchpad RAM in the BS2p- family of Stamps, since swapping values (e.g. during a sort) can be a bit cumbersome with GET and PUT. I was hoping to find an algorithm that didn't require moving a value's location once it found its way into the scratchpad.

In pursuit of this, I came up with the following program that computes a "running median" using the N most recent sensor readings. (The test program uses random data in place of sensor readings.) After each reading, the program compares the current median with the new reading and with the value it replaces. Based on this, it determines whether the median value needs to change and, if so, scans the N most recent readings once to figure out the new median. For N readings, then, the algorithm is O(N²), the same as a bubble sort. But neither the readings themselves, nor an index table, need to be shuffled. Once a reading is written to RAM, it stays put until it's replaced by a newer reading.

Here's the program:

' {$STAMP BS2pe}
' {$PBASIC 2.5}

'Program to compute the running median of sample inputs.

N         CON 11    'Number of running samples for median. N must be odd.
N2        CON N * 2 'For word addressing.
NMID      CON N / 2 'Midpoint of array.

ptr       VAR Byte  'Location of last value written to scratchpad.
i         VAR Byte
k         VAR Word
nsame     VAR Byte
Rand      VAR Word
NewVal    VAR Word
Median    VAR Word
CurVal   VAR Word
NewMedian VAR Word

ptr = N2 - 2
Median = 0                              'Scratchpad initialized to 0's, so median is also 0.


FOR k = 1 TO 1000                       'Acquire 1000 samples.
  RANDOM Rand
  NewVal = Rand // 10 + 50              'Sample is a random integer on [noparse][[/noparse]50, 59].
  GOSUB RunningMedian                   'Compute running median.
  FOR i = 0 TO N2 - 2 STEP 2            'Print last N samples, <highlighting> latest one.
    GET i, Word CurVal
    IF (ptr = i) THEN
      DEBUG "<", DEC2 CurVal, ">"
    ELSE
      DEBUG " ", DEC2 CurVal, " "
    ENDIF
  NEXT
  DEBUG ":  ", DEC3 Median, CR          'Print computed median.
NEXT
END

'Subroutine fo computing median of N most recent samples.

RunningMedian:

  ptr = ptr + 2 // N2                                     'Increment pointer.
  GET ptr, Word CurVal                                    'Get value at that location.
  IF (NewVal <> CurVal) THEN                              'Is new value different?
    PUT ptr, Word NewVal                                  '  Yes: Write it.
    IF (NewVal <= Median AND CurVal >= Median) THEN       '       NewVal <= Median <= CurVal?
      NewMedian = 0                                       '         Yes: Find highest value in list
      nsame = 0                                           '              less than median,
      FOR i = 0 TO N2 - 2 STEP 2
        GET i, Word CurVal
        IF (CurVal < Median) THEN
          nsame = nsame + 1                               '              and count qualifying values.
          IF (CurVal > NewMedian) THEN NewMedian = CurVal
        ENDIF
      NEXT
    ELSEIF (NewVal => Median AND CurVal <= Median) THEN   '      NewVal >= Median >= CurVal?
      NewMedian = 65535                                   '        Yes: Find lowest value in list
      nsame = 0                                           '             greater than median,
      FOR i = 0 TO N2 - 2 STEP 2
        GET i, Word CurVal
        IF (CurVal > Median) THEN
          nsame = nsame + 1                               '             and count qualifying values.
          IF (CurVal < NewMedian) THEN NewMedian = CurVal
        ENDIF
      NEXT
    ENDIF
    IF (nsame > NMID) THEN Median = NewMedian    'Median changes iff number of qualifiers > NMID.
  ENDIF
  RETURN

To test this program, I wrote a PC program in Perl that takes output from the PBASIC program and computes the median at each step the old-fashioned way using a sort. In 1000 steps, it didn't find any errors, so I'm fairly confident the PBASIC program works as it should. Here's the test program. It prints only if an error is found:

use strict;
my $prev;
 
while (<DATA>) {
  s/[noparse][[/noparse]\n\l\r]+//;
  my @array = split /[noparse][[/noparse] :<>]+/;
  shift @array;
  my $median = pop @array;
  @array = sort {$a <=> $b} @array;
  if ($array[noparse][[/noparse] 5] != $median) {
    print "$prev\n$_\n\n"
  }
  $prev = $_;
}
 
 __END__
<52> 00  00  00  00  00  00  00  00  00  00 :  000
 52 <52> 00  00  00  00  00  00  00  00  00 :  000
 52  52 <52> 00  00  00  00  00  00  00  00 :  000
 52  52  52 <58> 00  00  00  00  00  00  00 :  000
 52  52  52  58 <57> 00  00  00  00  00  00 :  000
 52  52  52  58  57 <57> 00  00  00  00  00 :  052
 52  52  52  58  57  57 <59> 00  00  00  00 :  052
 52  52  52  58  57  57  59 <59> 00  00  00 :  052
 52  52  52  58  57  57  59  59 <56> 00  00 :  056
 52  52  52  58  57  57  59  59  56 <57> 00 :  057
 52  52  52  58  57  57  59  59  56  57 <50>:  057
<55> 52  52  58  57  57  59  59  56  57  50 :  057
 55 <52> 52  58  57  57  59  59  56  57  50 :  057
 55  52 <51> 58  57  57  59  59  56  57  50 :  057
 55  52  51 <59> 57  57  59  59  56  57  50 :  057
 55  52  51  59 <56> 57  59  59  56  57  50 :  056
 55  52  51  59  56 <58> 59  59  56  57  50 :  056
 ...

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 5/24/2008 10:53:01 PM GMT

RickH · 2008-05-24 21:53

Very nice, I am going to log this one for reference.

Tracy Allen · 2008-05-25 20:58

Hi Phil,

That is very nice and it is always interesting to follow through your logic!

I think your algorithm is closer to O(N) than to O(N²). Apart from the initial and final overhead, it makes only one pass through the data.

Actually the general bubble sort if properly implemented is O(n log n), not O(n²). That is because each successive pass through the loop is shorter than the last, and because the algorithm terminates when a pass through is made with zero swaps. The worst case is when the entire data array is repopulated with random numbers. Moreover, in the case where the data is already almost sorted, it can come pretty close to O(n). I do think bubbleSort routine I posted in the other thread can be improved in speed for the situation we are talking about. You've given me some food for thought.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Phil Pilgrim (PhiPi) · 2008-05-26 05:10

Tracy,

Each step of my algorithm is either O(N) or O(1), so O(N) overall. The O(N²) is for a group of N readings, in order to make it a fair comparison with regular, "non-running-median" calculations.

-Phil

Tracy Allen · 2008-05-27 16:28

Well, another fair comparison is with methods that achieve your goal, which you did, of tracking the running median without necessarily doing a full sort, and without having to do so many GETs and PUTs.

The attached program is another one that is faster than a full bubble sort, but it does maintain a sorted table. This is for situations such as a multiplayer game, where the players take turns, or in data acquisition, where readings are processed one at a time in sequence. This is I think slower than yours, because it does do a lot of GETting and PUTting, but it does maintain a sorted table. The trick is that it takes only one bubble pass, the direction of which is determined by the new score entered in comparison to the score that it replaces. Here is the core procedure. It took some doing to compress both the scan (jx=0 TO np2) and the scan (jx=np2 TO 0) into one routine!

' enter with new value and with the value0 that it is replacing in the table of scores, both words.
bubbler:
    IF value > value0 THEN  sign=1 ELSE sign=0   ' set direction of bubble, depending on comparison of new value with old
     FOR jx = np2 * sign TO np2 *(1-sign)     ' loop through, either from np2 TO 0, or from 0 TO NP2
      DEBUG "n"+sign                        ' show pass through inner loop, "n" or "o"
      GET ranks + jx, tmpA                 'gets the pointer to the score currently ranked jx
      GET ranks + jx + 1, tmpB         'gets the pointer to the score currently ranked jx+1
      GET 2*tmpA+scores, WORD value(1-sign)          'gets the score currently ranked jx
      GET 2*tmpB+scores, WORD value(sign)        'gets the score currently ranked jx+1
      IF value(1-sign) < value(sign) THEN       ' ' condition for swap also depends on the direction parameter
        PUT ranks + jx, tmpB               'swap the rankings 
        PUT ranks + jx + 1, tmpA
        DEBUG "."                        ' show swap occurred with a "."
      ENDIF                                         'otherwise leave them alone, current ranking correct
    NEXT
   RETURN

That still isn't very efficient. With a binary search for the ranking, it should be possible to achieve O(log n + n/k).

One data acquisition scheme is a combination median and window average, by including in the average only the values that fall within, say, the two middle quartiles.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

Post Edited (Tracy Allen) : 5/27/2008 4:34:23 PM GMT

Computing a running median

Comments