New type of "sort" algorithmic function (I think)

Bits · 2012-09-05 14:32

I have a few question about a sorting function:

1. Who has the fastest "integer" sort object for the propeller. I want to use mine to compare. Specifically finding the Median but I would love to try others.

2. How does a sort function decide to "pick a value" if the values are not dividable. Look below

a := 2
b := 3
c := 3

Out of a,b,c the answer could be 2 or 3 Is that a correct assumption?

Once I get this kink worked out Ill put up my object.

Bits · 2012-09-05 14:35

Fastest object in spin that is

Phil Pilgrim (PhiPi) · 2012-09-05 14:36

The median value in your example is b (3).

-Phil

Mark_T · 2012-09-05 14:43

The median is the middle value of the list once its sorted, so for 2, 3, 3 its 3. Quicksort uses an estimate of the median to partition a set into two - if you could devine the true median you'd get a balanced partitioning, but to guarantee getting the true median you would have to first sort the set....

Having said that if you don't need a stable sort (which you don't to find the median value) then quicksort is widely regarded as the fastest (on average). Naive quicksort has a terrible worst-case performance, but you'd have to be v. unlucky to hit that - I presume you've found this object: http://obex.parallax.com/objects/715/

[edit: now that I've had a little time to remember stuff, I think there is a method to find the median that doesn't require a full sort, but is based on quicksort - only you only recurse on only one subset after partitioning - the one you know must contain the median - I think this is O(N) rather than O(N log N) IIRC ]

Bits · 2012-09-05 14:47

Thanks guys I think its becoming clear but what do I do with 0's, do I need to weight them as well?

Say for simplicity

a := 1
b := 2
c := 3
d := 4
e := 5
f := 6
g := 7
h := 8
i := 9
j := 0

is 5 the answer or 6?

Phil Pilgrim (PhiPi) · 2012-09-05 15:02

When the number of elements is even, the median is often taken to be the average of the two middle values.

Your list above is not sorted; 0 belongs at the top, in which case the question becomes, "Is the answer 4 or 5?" If you have to pick one from the list, choose between 4 and 5 at random; otherwise take the average, i.e. 4.5.

-Phil

Mark_T · 2012-09-05 15:14

For even length sets you can elect to chose either, or if its meaningful to take the mean of the two middle ones (4.5 here), Sometimes that's not meaningful though.

Rayman · 2012-09-05 17:42

I used to like the QBasic sorting demo... You can see the different methods here:
http://brisray.com/qbasic/qsort.htm

jazzed · 2012-09-05 17:52

One of my favorites: http://en.wikipedia.org/wiki/Merge_sort

ericball · 2012-09-06 05:55

Note: what algorithm is best to accomplish a given function can be architecture specific, i.e. for a sort does a comparison or swap require more time, and is the sort done "in place" or does it use additional memory. What was fast (or slow) in the 70s and 80s on mainframes is often not the same as current PCs or microcontrollers.

Also, a simple sort (like a bubble sort) will require less code (and not require recursion) and may be "fast enough" for a given purpose.

Dave Hein · 2012-09-06 06:42

You don't need to sort the data to determine the median. You could accumulate a histogram and then sum up the histogram bins from the lowest to the highest value. The median is the bin where the sum is half of the total.

Phil Pilgrim (PhiPi) · 2012-09-06 07:36

Dave Hein wrote:

You don't need to sort the data to determine the median. You could accumulate a histogram and then sum up the histogram bins from the lowest to the highest value.

That's efficient if the sample domain is small w.r.t the size of the sample. But, for example, if numbers can range from 1 to 1000, and there are only ten numbers in the sample, it's more efficient to sort the sample than to scan through 1000 bins.

You don't, in fact, need to completely sort the sample. You just need to partition it recursively about a pivot value selected from the sample until you have two equal-sized partitions. At that point, the pivot is the median.

-Phil

Heater. · 2012-09-06 07:54

I suspect that if you have come up with a new type of sort that can be smaller/faster than any existing then you are up for an ACM Turing Award like this guy: http://en.wikipedia.org/wiki/Tony_Hoare

Bits · 2012-09-06 08:47

Heater. wrote: »

I suspect that if you have come up with a new type of sort that can be smaller/faster than any existing then you are up for an ACM Turing Award like this guy: http://en.wikipedia.org/wiki/Tony_Hoare

I could only wish

All the sort functions I am looking at don't even come close to what I am doing. They went in a completely different direction, fingers crossed and Ill have something useful soon. Reality - I might be totally off...

Phil Pilgrim (PhiPi) · 2012-09-06 08:59

BTW, here's an O(n) algorithm I came up with a few years back for computing a running median, i.e. the median of the most recent n values in a data stream, recomputed for each new value:

http://forums.parallax.com/showthread.php?103778

-Phil

Tracy Allen · 2012-09-06 09:03

Animations of sorting applied to different types of data are informative. Here is a nice one: http://www.sorting-algorithms.com/

A lot depends on how the problem arises. The problem of sorting a new batch of random numbers is quite different from the problem of adding a new number to a table of previously sorted values. Animations show that clearly. The latter case comes up quite often when for example you have a multiplayer game, where you need to rank the scores as players take turns. Or in data acquisition, where you need an initial median filter for an incoming stream of samples. The simple insertion or bubble are quite efficient in those cases. Those cases keep both the rank and order of play (or the age of the samples, so that the oldest can be knocked off the end).

(Aha! Phil, I remember that. Nice algorithm.)

Bits · 2012-09-06 12:28

What is the easiest way to tell if an integer is positive or negative? I prefer to not use conditional statements or division if possible, but nothing is coming to mind at the moment.

Heater. · 2012-09-06 12:32

Assuming your machine is using normal 2's complement binary representation of signed integers just test the most significant bit. If it is 1 then the number is negative. So in Spin for 32 bit numbers test bit 31.

Phil Pilgrim (PhiPi) · 2012-09-06 12:38

If you want a true/false answer, you can also do an arithmetic right shift by 31 bits, viz:

negative := value ~> 31

negative will be true ($ffff_ffff) if value is less than zero; zero, otherwise. BTW, this is efficient only in processors, like the Propeller, that have a barrel shifter.

-Phil

Heater. · 2012-09-06 12:40

Mind you, what is wrong with "if someNumber < 0"?
Note that in 2'complement representation as used in Spin zero is always positive.

Phil Pilgrim (PhiPi) · 2012-09-06 12:46

heater wrote:

Note that in 2'complement representation as used in Spin zero is always positive.

That's a good point, which brings to mind the three-way compare/test operator <=> that Spin does not implement (returning -1, 0, or 1) but which can be simulated thus:

three_way_test := (value < 0) | -(value > 0)

-Phil

Bits · 2012-09-06 13:24

Sorry guys I wanted to find an cleaver clever way to find if this is odd or even. My mistake up there on post #17

:blank:

jmg · 2012-09-06 13:43

Bits wrote: »

Thanks guys I think its becoming clear but what do I do with 0's, do I need to weight them as well?

Zero is certainly a valid number, and belongs in a basket, in a true sorting sense.

You may choose to discard zero for other reasons, depending on what your numbers represent/mean, but that is less of a sorting question.

Do you know anything about the distribution of your numbers, and how Median may relate to Mean ?

How many samples do you have, and what is your time budget ?

jmg · 2012-09-06 13:45

Bits wrote: »

Sorry guys I wanted to find an cleaver way to find if this is odd or even. My mistake up there on post #17

cleaver ? sounds scary.
Odd or even is just the LSB, but that may not be what you meant ?

Bits · 2012-09-06 13:53

Clever!!

Yes my sample size 1-9 so far with numbers ranging from 0-9. But this is going to increase as soon as I get the basics down.

The lsb is a great solution. Thanks.

Bits · 2012-09-11 18:21

Okay so I failed. My idea, I think, would work but I cant seem to overcome one slight issue. below is the vision I had.

Create a base 32 number system (in firmware) and allow each real bit to represent a number from 0-31. Easier for numbers 0-9 so lets keep my explanation short and stick to 0-9 for now.

Imagine a grouping of numbers ranging from 0 to 9 are inputted into a function, that function sets a bit accordingly. Repeat this process until all numbers are accounted for.

To complete a sort all you need to do is shift right X amount of times till you find the median number. wiki explains the what to do if the group (X) of numbers are odd or even.

Now my only issue so far is if the numbers are repeated as in the set 1,2,1,4,5,3,8. The number 1 is repeated and thus overall count will also be off - and now I suck.

I do have a vision to solve this issue and to solve numbers larger than 31 but it requires matrix type programming with arrays. I think it can be done but not sure how effective it will be.

Phil Pilgrim (PhiPi) · 2012-09-11 19:38

Bits,

There's nothing wrong with your approach. It's a special case of the histogram method broached by Dave Hein, above. Multiple readings of the same number could be handled by a cluster of bits, rather than just one. For each input, just shift a one bit left by its value x (cluster width) and add to the accumulated long. You just need to make sure that each cluster is wide enough to accommodate the number of readings in the sample.

-Phil

Bobb Fwed · 2012-09-12 09:35

This is a very interesting thread. It is always cool to see people trying to circumnavigate the pre-existing tried and true methods. That's how breakthroughs are discovered!
I'm the author of the previously mentioned object http://obex.parallax.com/objects/715/ -- If you are going to test your algorithm's speed against one of the algorithms in the object, you'll have to test it against three of the algorithms included in the object: insertion, shell, and quick. I've found that for decimal sorting, shell is the all around champ (using less memory than quick, and is the fastest out of all of them when sorting between 20 and 70 values), but for small sorts (like 10 integers) insertion would probably be the fastest.

@ericball, the way I have bubble sort and shell sort written, shell sort is only 11 bytes larger (17%) and is much faster. So bubble seems pretty worthless.
For a 500 value sort, shell is 31 times faster than bubble.
For a 100 value sort, shell is 9 times faster than bubble.
For a 10 value sort, shell is 1.8 times faster than bubble.

lonesock · 2012-09-12 10:44

@Bobb Fwed: I really like your sort object! Btw, I've found that bubble sort is usually about twice as fast if you alternate pass directions. That also bypasses the trivially worst case scenario where the array starts sorted in reverse order. One other tiny note from the Spin code is the direction comparison (ASC or DESC) could be a bit faster. Something like this (semi-tested...certainly not guaranteed [8^):

PUB bubblesort(arrayAddr, arraylength, asc_desc) : i | endpt, swapped, tmp
'' thanks Jon "JonnyMac" McPhalen (aka Jon Williams) (jon@jonmcphalen.com) for the majority of this code
'' slowest, but simplest sorting system

  arraylength -= 2                                                        ' reduce this so it doesn't re-evaluate each loop
  endpt := arraylength
  REPEAT
    swapped := false                                                            ' assume no changes
    REPEAT i FROM i TO endpt                                              ' loop through array
      'IF (asc_desc == ASC AND long[arrayAddr][i] > long[arrayAddr][i + 1]) OR (asc_desc == DESC AND long[arrayAddr][i] < long[arrayAddr][i + 1]) ' compare values
      IF (long[arrayAddr][i+asc_desc] > long[arrayAddr][i + 1 - asc_desc])
        tmp := long[arrayAddr][i]                                               ' swap values
        long[arrayAddr][i] := long[arrayAddr][i + 1]                            ' swap values
        long[arrayAddr][i + 1] := tmp                                           ' swap values
        swapped := true
    endpt := arraylength - endpt
  while swapped

thanks for the object!
Jonathan

Bobb Fwed · 2012-09-12 13:17

I like the change to the direction comparison, I can use that in a few of the algorithms, that's much better.
The rest of it, though, I'm having some troubles with, it is affecting some memory outside of it's intended range, and often yields incorrect results (the two problems may be related). I'm looking into it.

I think the problem is "REPEAT i FROM i TO endpt" did you mean something different?

Phil Pilgrim (PhiPi) · 2012-09-12 13:26

Bobb,

Unlike in most languages, Spin reevaluates all of its loop arguments -- even the limits -- every time through the loop.

-Phil

New type of "sort" algorithmic function (I think)

Comments