Shop OBEX P1 Docs P2 Docs Learn Events
QuickMedian Algorithm in Spin — Parallax Forums

QuickMedian Algorithm in Spin

Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
edited 2015-01-13 20:20 in Propeller 1
Here's an implementation of the QuickMedian algorithm, invented by C.A.R. Hoare, who also came up with QuickSort. You can read about it here. I adapted the C++ code presented there to Spin and made some corrections to the end conditions that kept the original code from terminating.

The Spin code is attached below. It makes use of my '.debug framework to display the median-finding progress as it goes along. You can download the '.debug background .exe here:

What makes QuickMedian interesting is that it doesn't do a complete sort of the array -- just enough to isolate the median value. I haven't come across any errors in my implementation yet, but I haven't tested the code exhaustively, either.

-Phil

Comments

  • Tracy AllenTracy Allen Posts: 6,664
    edited 2012-09-14 08:22
    That link to how it works was very helpful. Ingenious.

    Having the median, apply it again to the halves, to find the quartiles. Combine that with the extrema and you have all you need for a quick alternative to mean/variance/skew.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-12 12:46
    Phil,

    Thanks for sharing your work.

    I'm trying to figure out if I'm doing something wrong or if there's an error in the algorithm which when certain combinations of numbers are being sorted, the program gets caught in an endless loop.

    I've attached the program I used to generate the following output.
    LRF Median Demo
    
    |165 164 164 165[166]166 165 165|
     165 164 164 165[165]166 165|166
    ------------
    |165 164 164 165[165]165 166|166
     165|164 164 165[165]165 166 166
    ------------
     165 164 164 165[165]165 166 166
    
    Filtered Reading = 165 cm
    
    Array size: 8  Comparisons: 90  Swaps: 33
    Press any key to continue.
    

    In the above example the median is quickly found.

    Most of the output from the attached program look this way. However every once in a while the program hangs.
    LRF Median Demo
    
    |165 164 164 163[164]164 164 165|
     164|164 164 163[164]164|165 165
     164 164|164 163[164]164 165 165
    ------------
     164 164 164|163[164]164 165 165|
    ------------
     164 164 164|163[164]164|165 165
    ------------
     164 164 164|163[164]164|165 165
    

    The sequence shown in the last two lines just keeps repeating.

    I changed your original code to use PST and to display three digit numbers rather than two digit numbers and I also changed it to sort longs rather than bytes. Character 149 was placing asterisks in the terminal so I replace "149" characters with "|" characters. I hope my formatting is close enough to the original, you can make sense of the numbers.

    I'm hoping there's some small error in the original code which causes this endless loop. I don't see any reason why sorting longs would be different from sorting bytes but just to be sure, I'll try to replicate this with the byte sorting code.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-12 12:55
    I took the last two digits from the array which caused the endless loop in the previous example and used in the byte sorting code. It caused a similar endless loop.
    Quick Median Demo
    
    |65 64 64 63[64]64 64 65|
     64|64 64 63[64]64|65 65
     64 64|64 63[64]64 65 65
    ------------
     64 64 64|63[64]64 65 65|
    ------------
     64 64 64|63[64]64|65 65
    ------------
     64 64 64|63[64]64|65 65
    ------------
     64 64 64|63[64]64|65 65
    

    Here's the program used to generate this output. I moved the "array" to the DAT section so I could fill it with the offending numbers.
    CON
    
      _clkmode      = xtal1 + pll16x
      _xinfreq      = 5_000_000
      
      SIZE          = 8'21
    
    OBJ
    
      'rnd   : "RealRandom"
      Pst   : "Parallax Serial Terminal"
      
    VAR
    
      'byte  array[SIZE]
      long  seed
      long comparisons, swaps
    
    PUB  start | i
    
      'rnd.start
      'seed := 255 'rnd.random
      'rnd.stop
      Pst.Start(115_200)
      repeat
        Pst.Str(string(11, 13, "Press any key to begin."))
        result := Pst.RxCount
        waitcnt(clkfreq / 2 + cnt)
      until result
    
      Pst.RxFlush
      Pst.Clear
      Pst.Home  
      Pst.Str(string("Quick Median Demo", 13, 13))
      
      {repeat i from 0 to SIZE - 1
        'array[i] := (?seed >> 16) // 90 + 10
        array[i] := (?seed >> 16) // 18 + 10  }
      i := byte_median(@array, SIZE)
     
      Pst.Str(string(13, "Median: "))
      Pst.Dec(i)
      Pst.Str(string("  Array size: "))
      Pst.Dec(SIZE)
      Pst.Str(string("  Comparisions: "))
      Pst.Dec(comparisons)
      Pst.Str(string("  Swaps: "))
      Pst.Dec(swaps) 
      
      '.debug #CR, "Median: ", dec i, "  Array size: " , dec SIZE, "  Comparisions: ", dec comparisons, "  Swaps: ", dec swaps
    
    PUB byte_median(addr, n) | k, first, last, i, j
    
      first~
      last := n - 1
      k := n >> 1
      repeat while (first < last)
        i := byte_split(addr, byte[addr][k], first, last)
        j := i & $ffff
        i >>= 16
        if (i =< k)
          first := i
        if (j => k)
          last := j
        Pst.Chars("-", 12)
        Pst.NewLine     
        '.debug rep "-"\12, #CR
      show(first, last)
      return byte[addr][k]
    
    PUB byte_split (addr, pivot, first, last) | t
    
      repeat
        show(first, last)
        repeat while byte[addr][first] < pivot
          first++
          comparisons++
        repeat while byte[addr][last] > pivot
          last--
          comparisons++
        comparisons += 2
        if (first < last)
          t := byte[addr][first]
          byte[addr][first] := byte[addr][last]
          byte[addr][last] := t
          swaps++
          first++
          last := last - 1 #> first
      while first < last
      return first << 16 | (last #> first) 
    
    PUB show(first, last) | i
    
      repeat i from 0 to SIZE - 1
        Pst.Char(" ")
        if array[i] < 10
          Pst.Char(" ")
        Pst.Dec(array[i])
        '.debug " ", dec2 array[i]
      Pst.PositionX(first * 3)
      Pst.Char("|")
      Pst.PositionX(last * 3 + 3)
      Pst.Char("|")
      Pst.PositionX((SIZE >> 1)* 3)
      Pst.Char("[")
      Pst.MoveRight(2)
      Pst.Char("]")
      Pst.NewLine
      
      '.debug #MOVX, first * 3, 149, #MOVX, last * 3 + 3, 149
      '.debug #MOVX, (SIZE >> 1)* 3, "[", rep #RIGHT\2, "]", #LF, #MOVX, 0     
    DAT
    
    array         byte 65, 64, 64, 63, 64, 64, 64, 65
    

    I've also attached the above code as a Spin file.

    I'll be trying to figure this out but I find the sort algorithm a bit mystifying.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-12 13:19
    I used an even sized array. I'm betting that was my problem.

    Edit: Nope. I spoke too soon. It crashes with an odd sized array.
    LRF Median Demo
    
    |164 165 164 164[165]165 165 165 164|
     164 164|164 164[165]165 165 165|165
     164 164 164 164[165]165 165|165 165
    ------------
    |164 164 164 164[165]165 165|165 165
    ------------
    |164 164 164 164[165]165|165 165 165
    ------------
    |164 164 164 164[165]165|165 165 165
    ------------
    |164 164 164 164[165]165|165 165 165
    ------------
    |164 164 164 164[165]165|165 165 165
    
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-12 13:36
    I have learned the original code can't be made to sort longs just by changing "byte" to "long". I see now Phil stuffs two 16-bit return values into the 32-bit result. This still doesn't explain the issue with sorting bytes I'm seeing.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-12 19:51
    I've tested a few variations of data to see which data points will cause the endless loops. The data marked with "does not resolve" are sets which cause these endless loops.
    'array         byte 64, 65, 64, 64, 65, 65, 65, 65, 64 ' does not resolve
    'array         byte 64, 65, 64, 64, 64, 65, 65, 65, 64 ' Comparisons: 18  Swaps: 6
    'array         byte 65, 65, 64, 64, 65, 65, 65, 65, 64 ' Comparisons: 15  Swaps: 4
    'array         byte 64, 64, 64, 64, 65, 65, 65, 65, 65 ' does not resolve
    'array         byte 64, 64, 64, 64, 64, 65, 65, 65, 65 ' Comparisons: 20  Swaps: 4
    'array         byte 65, 65, 64, 64, 65, 65, 65, 64, 64 ' does not resolve
    array         byte 65, 65, 64, 64, 65, 65, 65, 64, 63 ' does not resolve
    

    I looked at the original C# code and I don't think Phil's translation completely agrees with the original code. The differences are with comparisons including "or equal to" additions or not. Even when I make the few changes I can see, the code still ends up in endless loops with some of the above data.

    I'm starting to wonder if the original C# code is correct.

    I wanted to use this code with my experiments using the Laser Rangefinder. One thing I noticed with the data I'm using is much of the data is identical. It's relatively likely the median value will be present in one of the first few elements of the data collected.

    I tried to come up with a median algorithm to take advantage of this similar data.

    This is the jest of my algorithm:
    PUB ByteMedian(arrayAddress, elements) | halfTheElements, toCompareIndex, {
    } compareAgainstIndex, elementsLarger, elementsSmaller, maxIndex 
    
      maxIndex := elements - 1
      halfTheElements := elements / 2
      repeat toCompareIndex from 0 to maxIndex
        elementsLarger := 0
        elementsSmaller := 0   
        repeat compareAgainstIndex from 0 to maxIndex
          if byte[arrayAddress][toCompareIndex] > byte[arrayAddress][compareAgainstIndex]
            elementsSmaller++
          elseif byte[arrayAddress][toCompareIndex] < byte[arrayAddress][compareAgainstIndex]
            elementsLarger++
        comparisons += elements * 2 + 2
        if elementsSmaller =< halfTheElements and elementsLarger =< halfTheElements
          loops := toCompareIndex + 1
          return byte[arrayAddress][toCompareIndex]
      
    

    It counts up how many values in the array are larger than element being tested. If the element being tested doesn't have at least half the other elements either larger than it or smaller than it, then it is assumed the element is the median. With lots of identical elements, it doesn't take long to find one which is equal to the median.

    If anyone gets the QuickMedian algorithm working with the arrays I've posted, I hope they let us know.

    I've attached the program I used to test my algorithm. In order to generate data without much variation, I changed the random number generation equation to:
    repeat result from 0 to SIZE - 1
          array[result] := (?seed >> 16) // 4 + 10 ' values 10 through 13
    

    As noted in the comments, this limits the values to numbers between 10 and 13 inclusive.

    Here's some sample output:
    Median Demo
    
    
    13, 12, 13, 13, 12, 11, 12, 10, 11
    
    Median = 12  Array size: 9  Comparisons: 40  loops: 2
    Median Demo
    
    
    12, 10, 10, 11, 12, 10, 10, 10, 10
    
    Median = 10  Array size: 9  Comparisons: 40  loops: 2
    Median Demo
    
    
    12, 13, 13, 12, 13, 11, 12, 13, 12
    
    Median = 12  Array size: 9  Comparisons: 20  loops: 1
    Median Demo
    
    
    10, 11, 11, 11, 11, 10, 13, 10, 10
    
    Median = 11  Array size: 9  Comparisons: 40  loops: 2
    Median Demo
    
    
    10, 11, 13, 13, 12, 10, 11, 13, 13
    
    Median = 12  Array size: 9  Comparisons: 100  loops: 5
    Median Demo
    
    
    12, 12, 13, 11, 13, 11, 13, 10, 13
    
    Median = 12  Array size: 9  Comparisons: 20  loops: 1
    
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-01-12 20:52
    Duane,

    I tested the code with random data similar to yours, and it did end up in an endless loop. I wish I had the time to devote to tracking down the reason, but I don't at the moment.

    Worst of all, though, I'd completely forgotten about this thread. I mean ... I looked at it and wondered, "Did I really post this? And it's only been ... what ... 2 1/2 years ago?" Hopefully, 10 ... 20 ... years from now, when I've completely lost it, a forumista or two might come to visit me at the "home." :)

    -Phil
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2015-01-12 22:12
    In the memory refresh department there is also this...
    New-type-of-"sort"-algorithmic-function-(I-think)

    And another one, while PBASIC, is informative. If I'm not mistaken, Phil's approach there made its way translated into Spin into his median filter for the Parallax #29124 altimeter.
    Computing-a-running-median
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-01-12 22:17
    Well, at least I do remember the PBASIC version and was recently reminded that it was incorporated into the Spin altimeter code. So that might be a better option for now ...

    Thanks, Tracy!
    -Phil
  • kuronekokuroneko Posts: 3,623
    edited 2015-01-13 02:28
    Duane Degn wrote: »
    array         byte 65, 65, 64, 64, 65, 65, 65, 64, 63 ' does not resolve
    

    I looked at the original C# code and I don't think Phil's translation completely agrees with the original code. The differences are with comparisons including "or equal to" additions or not. Even when I make the few changes I can see, the code still ends up in endless loops with some of the above data.

    I'm starting to wonder if the original C# code is correct.
    Works for me (C# or adapted C/C++ version). For the sequence above I get 65. I have a look at the SPIN version later once I get home.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-13 06:29
    kuroneko wrote: »
    Works for me (C# or adapted C/C++ version). For the sequence above I get 65. I have a look at the SPIN version later once I get home.

    I just found a big typo in Phil's code. He had the i and j swapped. Give me a few minutes to try to fix this.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-13 06:45
    I think this part of the C# code is incorrect. This part of the C# code seems odd to me.
    now swap values if they are in the wrong part:
      if (i <= j)
      {
        float t = a[i];
        a[i] = a[j];
        a[j] = t;
        i++;j--;
      }
    

    I think the "i" and "j" are swapped.
    Are the "i" and "j" swapped?

    I've now got code which no longer loops endlessly but I want to test it a bit more.

    Edit: This is Phil's typo (assuming he was translating the C# code).
    if (i =< k)
          first := i
        if (j => k)
          last := j
    

    Here's the C#
    if (j < k)  L = i;
      if (k < i)  R = j; 
    

    Fixing this wasn't enough to get the code to work. I'm still testing the latest version.

    So far my latest version doesn't work. If you get this working kuroneko, I hope you let us know.
  • kuronekokuroneko Posts: 3,623
    edited 2015-01-13 07:17
    Untested (comparisons, i/j typo, #> misuse):
    PUB byte_median(addr, n) | k, first, last, i, j
    
      first~
      last := n - 1
      k := n >> 1
      repeat while (first < last)
        i := byte_split(addr, byte[addr][k], first, last)
        j := i & $ffff
        i >>= 16
        if (j < k)
          first := i
        if (k < i)
          last := j
        '.debug rep "-"\12, #CR
      show(first, last)
      return byte[addr][k]
    
    PUB byte_split(addr, pivot, first, last) | t
    
      repeat
        show(first, last)
        repeat while byte[addr][first] < pivot
          first++
          comparisons++
        repeat while pivot < byte[addr][last]
          last--
          comparisons++
        comparisons += 2
        if (first =< last)
          t := byte[addr][first]
          byte[addr][first] := byte[addr][last]
          byte[addr][last] := t
          swaps++
          first++
          last--
      while first =< last
      return first << 16 | last
    
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-13 07:46
    kuroneko wrote: »
    Untested

    I just tested it and it appears to work.

    Now I need to see what you did that I missed.

    Thanks for your help.

    Here's the output from one of the arrays (which had been) giving me trouble.
    Quick Median Demo
    
    |65 65 64 64[65]65 65 64 64|
     64|65 64 64[65]65 65 64|65
     64 64|64 64[65]65 65|65 65
     64 64 64 64[65]65|65 65 65
    |64 64 64 64[65]65 65 65 65
     64 64 64 64[65]65 65 65 65
    
    Median: 65  Array size: 9  Comparisons: 16  Swaps: 5
    
  • kuronekokuroneko Posts: 3,623
    edited 2015-01-13 07:55
    Note that the way of returning updated i/j values is (now) not safe in case last goes below 0. Easy to resolve though ...

    Edit: But from a quick glance it looks like that never happens.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-13 08:39
    I found my mistake. I was missing these lines in my code.
    i := first
        j := last
    

    I modified the code to sort longs. While it's unlikely I'll be sorting arrays with more elements than the 16-bit return value Phil was using, I changed the code to pass pointers to the 'i" and "j" values. I'm thinking these changes probably slow down the code.
    PUB long_median(addr, n) | k, first, last, i, j
    
      first := 0
      last := n - 1
      k := n >> 1
      repeat while (first < last)
        i := first
        j := last
        long_split(addr, n, long[addr][k], @i, @j)
        if j < k 
          first := i
        if k < i   
          last := j 
        Pst.Chars("-", 12)
        Pst.NewLine
      show(addr, n, first, last, MAX_DIGITS)
      return long[addr][k]
    
    PUB long_split (addr, n, pivot, iPtr, jPtr) | t
    
      repeat
        show(addr, n, long[iPtr], long[jPtr], MAX_DIGITS)
        repeat while  long[addr][long[iPtr]] < pivot 
          long[iPtr]++
          comparisons++
        repeat while pivot < long[addr][long[jPtr]] 
          long[jPtr]--
          comparisons++
        comparisons += 2
        if long[iPtr] =< long[jPtr] 
          t := long[addr][long[iPtr]]
          long[addr][long[iPtr]] := long[addr][long[jPtr]]
          long[addr][long[jPtr]] := t
          swaps++
          long[iPtr]++
          long[jPtr]-- 
      while long[iPtr] =< long[jPtr] 
    

    Thanks Phil and kuroneko for (once again) continuing my education.
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2015-01-13 09:56
    Duane, I wonder if the data really needs a quickmedian sort? As outlined in the reference Phil cited in the first post, the quickmedian algorithm only makes sense and saves time when sorting a large array. I don't know what the cutoff point is. Also, in the case of sequential data, say you have 32 samples per second coming in, are you going to feed say 32 of those at a time into an array, find their median, and thus produce one output reading per second? Or, will each new reading be fed into an array and immediately produce an output, so 32sps in and 32sps out? For the latter case, the array can be maintained sorted, and a simple insertion sort for the new reading may be most efficient.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-13 10:03
    I'm not sure if this algorithm is the best to use in my application or not but it bugged me that I couldn't get it to work.

    I wanted to experiment with using multiple sensor readings from SF02 as discussed a bit in this thread.

    attachment.php?attachmentid=112500&d=1419657908

    I'm having my super cheapy pan and tilt mechanism scan over an array of points and record the readings at each point. I use a scan of a level area as the baseline in order to detect obstacles or drop offs. Subsequent scans would be compared with the data from the level area in order to find high and low spots.

    The filter would be applied to the individual points.

    I'm not sure if filtering the data will help. The LRF does its job just fine. It's the $3 servos which are giving me some trouble.
  • John AbshierJohn Abshier Posts: 1,116
    edited 2015-01-13 15:55
    Duane, I am wondering why you picked a median filter? Is the data in post 3 representative of LRF data or just synthetic data? To me using a median filter is indicated when bad reading are very much bigger or less than the true signal and have a large/long term effect on something like an average or exponential smoothing. Attached is a document I found on the net for optimal medians in C for orders 3, 5, 7, 9 and 25.

    I also found a file named SuperFastMedian3.txt with these two lines

    z:=a#>b<#(a<#b#>c)
    44 microseconds

    Do I win and unreadable code award?

    John Abshier
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2015-01-13 16:54
    John, that {z:=a#>b<#(a<#b#>c)} is a sweet algorithm, correct for the median of three values a, b and c and using the Spin notation. And it follows directly from the other Compact-use-of-#>-how-to.

    I too like the running median as a pre-filter to remove shot noise, single far outliners in the case of a three entry median. The output of the running median can feed a subsequent lowpass or window or adaptive filter. One unfortunate property of a long median filter comes when the robot! suddenly comes to the edge of a cliff. It takes some time for the high values in the table fill up, then suddenly after a delay of 1/2 the table size the median takes a sudden plunge. Not that traditional filters are much better, smooth rather than sudden.
  • Duane DegnDuane Degn Posts: 10,588
    edited 2015-01-13 20:20
    Duane, I am wondering why you picked a median filter? Is the data in post 3 representative of LRF data or just synthetic data?

    The data is almost real. I subtracted 100 from the distance values so the numbers would work with Phil's two digit sort display.

    These values were taken with the robot on the bench without any disturbances. I was thinking a median filter would take care of issues like a leaf blowing in front of the LRF or a possible stray wire swinging in front of the sensor.

    I'm sure I'll be looking for ways to speed up the data collection but for now, I'm just hoping to get some reliable numbers to see if my $10 (including the price of the servos) pan and tilt mechanism (when used to point the SF02) is capable to detecting problems in the robot's path.
Sign In or Register to comment.