QuickMedian Algorithm in Spin
Phil Pilgrim (PhiPi)
Posts: 23,514
Here's an implementation of the QuickMedian algorithm, invented by C.A.R. Hoare, who also came up with QuickSort. You can read about it here. I adapted the C++ code presented there to Spin and made some corrections to the end conditions that kept the original code from terminating.
The Spin code is attached below. It makes use of my '.debug framework to display the median-finding progress as it goes along. You can download the '.debug background .exe here:
What makes QuickMedian interesting is that it doesn't do a complete sort of the array -- just enough to isolate the median value. I haven't come across any errors in my implementation yet, but I haven't tested the code exhaustively, either.
-Phil
The Spin code is attached below. It makes use of my '.debug framework to display the median-finding progress as it goes along. You can download the '.debug background .exe here:
What makes QuickMedian interesting is that it doesn't do a complete sort of the array -- just enough to isolate the median value. I haven't come across any errors in my implementation yet, but I haven't tested the code exhaustively, either.
-Phil
Comments
Having the median, apply it again to the halves, to find the quartiles. Combine that with the extrema and you have all you need for a quick alternative to mean/variance/skew.
Thanks for sharing your work.
I'm trying to figure out if I'm doing something wrong or if there's an error in the algorithm which when certain combinations of numbers are being sorted, the program gets caught in an endless loop.
I've attached the program I used to generate the following output.
In the above example the median is quickly found.
Most of the output from the attached program look this way. However every once in a while the program hangs.
The sequence shown in the last two lines just keeps repeating.
I changed your original code to use PST and to display three digit numbers rather than two digit numbers and I also changed it to sort longs rather than bytes. Character 149 was placing asterisks in the terminal so I replace "149" characters with "|" characters. I hope my formatting is close enough to the original, you can make sense of the numbers.
I'm hoping there's some small error in the original code which causes this endless loop. I don't see any reason why sorting longs would be different from sorting bytes but just to be sure, I'll try to replicate this with the byte sorting code.
Here's the program used to generate this output. I moved the "array" to the DAT section so I could fill it with the offending numbers.
I've also attached the above code as a Spin file.
I'll be trying to figure this out but I find the sort algorithm a bit mystifying.
Edit: Nope. I spoke too soon. It crashes with an odd sized array.
I looked at the original C# code and I don't think Phil's translation completely agrees with the original code. The differences are with comparisons including "or equal to" additions or not. Even when I make the few changes I can see, the code still ends up in endless loops with some of the above data.
I'm starting to wonder if the original C# code is correct.
I wanted to use this code with my experiments using the Laser Rangefinder. One thing I noticed with the data I'm using is much of the data is identical. It's relatively likely the median value will be present in one of the first few elements of the data collected.
I tried to come up with a median algorithm to take advantage of this similar data.
This is the jest of my algorithm:
It counts up how many values in the array are larger than element being tested. If the element being tested doesn't have at least half the other elements either larger than it or smaller than it, then it is assumed the element is the median. With lots of identical elements, it doesn't take long to find one which is equal to the median.
If anyone gets the QuickMedian algorithm working with the arrays I've posted, I hope they let us know.
I've attached the program I used to test my algorithm. In order to generate data without much variation, I changed the random number generation equation to:
As noted in the comments, this limits the values to numbers between 10 and 13 inclusive.
Here's some sample output:
I tested the code with random data similar to yours, and it did end up in an endless loop. I wish I had the time to devote to tracking down the reason, but I don't at the moment.
Worst of all, though, I'd completely forgotten about this thread. I mean ... I looked at it and wondered, "Did I really post this? And it's only been ... what ... 2 1/2 years ago?" Hopefully, 10 ... 20 ... years from now, when I've completely lost it, a forumista or two might come to visit me at the "home."
-Phil
New-type-of-"sort"-algorithmic-function-(I-think)
And another one, while PBASIC, is informative. If I'm not mistaken, Phil's approach there made its way translated into Spin into his median filter for the Parallax #29124 altimeter.
Computing-a-running-median
Thanks, Tracy!
-Phil
I just found a big typo in Phil's code. He had the i and j swapped. Give me a few minutes to try to fix this.
I think the "i" and "j" are swapped.
Are the "i" and "j" swapped?
I've now got code which no longer loops endlessly but I want to test it a bit more.
Edit: This is Phil's typo (assuming he was translating the C# code).
Here's the C#
Fixing this wasn't enough to get the code to work. I'm still testing the latest version.
So far my latest version doesn't work. If you get this working kuroneko, I hope you let us know.
I just tested it and it appears to work.
Now I need to see what you did that I missed.
Thanks for your help.
Here's the output from one of the arrays (which had been) giving me trouble.
Edit: But from a quick glance it looks like that never happens.
I modified the code to sort longs. While it's unlikely I'll be sorting arrays with more elements than the 16-bit return value Phil was using, I changed the code to pass pointers to the 'i" and "j" values. I'm thinking these changes probably slow down the code.
Thanks Phil and kuroneko for (once again) continuing my education.
I wanted to experiment with using multiple sensor readings from SF02 as discussed a bit in this thread.
I'm having my super cheapy pan and tilt mechanism scan over an array of points and record the readings at each point. I use a scan of a level area as the baseline in order to detect obstacles or drop offs. Subsequent scans would be compared with the data from the level area in order to find high and low spots.
The filter would be applied to the individual points.
I'm not sure if filtering the data will help. The LRF does its job just fine. It's the $3 servos which are giving me some trouble.
I also found a file named SuperFastMedian3.txt with these two lines
z:=a#>b<#(a<#b#>c)
44 microseconds
Do I win and unreadable code award?
John Abshier
I too like the running median as a pre-filter to remove shot noise, single far outliners in the case of a three entry median. The output of the running median can feed a subsequent lowpass or window or adaptive filter. One unfortunate property of a long median filter comes when the robot! suddenly comes to the edge of a cliff. It takes some time for the high values in the table fill up, then suddenly after a delay of 1/2 the table size the median takes a sudden plunge. Not that traditional filters are much better, smooth rather than sudden.
The data is almost real. I subtracted 100 from the distance values so the numbers would work with Phil's two digit sort display.
These values were taken with the robot on the bench without any disturbances. I was thinking a median filter would take care of issues like a leaf blowing in front of the LRF or a possible stray wire swinging in front of the sensor.
I'm sure I'll be looking for ways to speed up the data collection but for now, I'm just hoping to get some reliable numbers to see if my $10 (including the price of the servos) pan and tilt mechanism (when used to point the SF02) is capable to detecting problems in the robot's path.