New type of "sort" algorithmic function (I think)
Bits
Posts: 414
I have a few question about a sorting function:
1. Who has the fastest "integer" sort object for the propeller. I want to use mine to compare. Specifically finding the Median but I would love to try others.
2. How does a sort function decide to "pick a value" if the values are not dividable. Look below
Out of a,b,c the answer could be 2 or 3 Is that a correct assumption?
Once I get this kink worked out Ill put up my object.
1. Who has the fastest "integer" sort object for the propeller. I want to use mine to compare. Specifically finding the Median but I would love to try others.
2. How does a sort function decide to "pick a value" if the values are not dividable. Look below
a := 2 b := 3 c := 3
Out of a,b,c the answer could be 2 or 3 Is that a correct assumption?
Once I get this kink worked out Ill put up my object.
Comments
-Phil
Having said that if you don't need a stable sort (which you don't to find the median value) then quicksort is widely regarded as the fastest (on average). Naive quicksort has a terrible worst-case performance, but you'd have to be v. unlucky to hit that - I presume you've found this object: http://obex.parallax.com/objects/715/
[edit: now that I've had a little time to remember stuff, I think there is a method to find the median that doesn't require a full sort, but is based on quicksort - only you only recurse on only one subset after partitioning - the one you know must contain the median - I think this is O(N) rather than O(N log N) IIRC ]
Say for simplicity
is 5 the answer or 6?
Your list above is not sorted; 0 belongs at the top, in which case the question becomes, "Is the answer 4 or 5?" If you have to pick one from the list, choose between 4 and 5 at random; otherwise take the average, i.e. 4.5.
-Phil
http://brisray.com/qbasic/qsort.htm
Also, a simple sort (like a bubble sort) will require less code (and not require recursion) and may be "fast enough" for a given purpose.
You don't, in fact, need to completely sort the sample. You just need to partition it recursively about a pivot value selected from the sample until you have two equal-sized partitions. At that point, the pivot is the median.
-Phil
I could only wish
All the sort functions I am looking at don't even come close to what I am doing. They went in a completely different direction, fingers crossed and Ill have something useful soon. Reality - I might be totally off...
-Phil
A lot depends on how the problem arises. The problem of sorting a new batch of random numbers is quite different from the problem of adding a new number to a table of previously sorted values. Animations show that clearly. The latter case comes up quite often when for example you have a multiplayer game, where you need to rank the scores as players take turns. Or in data acquisition, where you need an initial median filter for an incoming stream of samples. The simple insertion or bubble are quite efficient in those cases. Those cases keep both the rank and order of play (or the age of the samples, so that the oldest can be knocked off the end).
(Aha! Phil, I remember that. Nice algorithm.)
negative will be true ($ffff_ffff) if value is less than zero; zero, otherwise. BTW, this is efficient only in processors, like the Propeller, that have a barrel shifter.
-Phil
Note that in 2'complement representation as used in Spin zero is always positive.
-Phil
:blank:
Zero is certainly a valid number, and belongs in a basket, in a true sorting sense.
You may choose to discard zero for other reasons, depending on what your numbers represent/mean, but that is less of a sorting question.
Do you know anything about the distribution of your numbers, and how Median may relate to Mean ?
How many samples do you have, and what is your time budget ?
cleaver ? sounds scary.
Odd or even is just the LSB, but that may not be what you meant ?
Yes my sample size 1-9 so far with numbers ranging from 0-9. But this is going to increase as soon as I get the basics down.
The lsb is a great solution. Thanks.
Create a base 32 number system (in firmware) and allow each real bit to represent a number from 0-31. Easier for numbers 0-9 so lets keep my explanation short and stick to 0-9 for now.
Imagine a grouping of numbers ranging from 0 to 9 are inputted into a function, that function sets a bit accordingly. Repeat this process until all numbers are accounted for.
To complete a sort all you need to do is shift right X amount of times till you find the median number. wiki explains the what to do if the group (X) of numbers are odd or even.
Now my only issue so far is if the numbers are repeated as in the set 1,2,1,4,5,3,8. The number 1 is repeated and thus overall count will also be off - and now I suck.
I do have a vision to solve this issue and to solve numbers larger than 31 but it requires matrix type programming with arrays. I think it can be done but not sure how effective it will be.
There's nothing wrong with your approach. It's a special case of the histogram method broached by Dave Hein, above. Multiple readings of the same number could be handled by a cluster of bits, rather than just one. For each input, just shift a one bit left by its value x (cluster width) and add to the accumulated long. You just need to make sure that each cluster is wide enough to accommodate the number of readings in the sample.
-Phil
I'm the author of the previously mentioned object http://obex.parallax.com/objects/715/ -- If you are going to test your algorithm's speed against one of the algorithms in the object, you'll have to test it against three of the algorithms included in the object: insertion, shell, and quick. I've found that for decimal sorting, shell is the all around champ (using less memory than quick, and is the fastest out of all of them when sorting between 20 and 70 values), but for small sorts (like 10 integers) insertion would probably be the fastest.
@ericball, the way I have bubble sort and shell sort written, shell sort is only 11 bytes larger (17%) and is much faster. So bubble seems pretty worthless.
For a 500 value sort, shell is 31 times faster than bubble.
For a 100 value sort, shell is 9 times faster than bubble.
For a 10 value sort, shell is 1.8 times faster than bubble.
Jonathan
The rest of it, though, I'm having some troubles with, it is affecting some memory outside of it's intended range, and often yields incorrect results (the two problems may be related). I'm looking into it.
I think the problem is "REPEAT i FROM i TO endpt" did you mean something different?
Unlike in most languages, Spin reevaluates all of its loop arguments -- even the limits -- every time through the loop.
-Phil