Shop OBEX P1 Docs P2 Docs Learn Events
Sort an array of strings — Parallax Forums

Sort an array of strings

Bobb FwedBobb Fwed Posts: 1,119
edited 2011-02-15 16:15 in Propeller 1
So I wonder if someone has done this before. Or if not, what are some pointers to do this?

The string addresses are stored in an array, and the best option would be to arrange the original array (not creating a new one -- so not to increase memory usage).

Any help would be nice.

Comments

  • Mike GreenMike Green Posts: 23,101
    edited 2011-02-08 09:27
    There are all sorts of articles on simple sorting. Start with the Wikipedia on "exchange sort".

    What you will be sorting is the array of string addresses using the strings themselves for comparison. You'll need to be able to compare two strings for "less than" given their addresses.
  • Dave HeinDave Hein Posts: 6,347
    edited 2011-02-08 09:31
    You could do a bubble sort. It's slow, but easy to implement. It should be OK if you don't have a large number of strings to sort.
  • JonnyMacJonnyMac Posts: 9,208
    edited 2011-02-08 12:00
    As an exercise to shake the cobwebs from my brain (been under the weather), I wrote the attached (based on suggestions listed above); perhaps it will get you going.
  • Bobb FwedBobb Fwed Posts: 1,119
    edited 2011-02-08 13:41
    JonnyMac wrote: »
    As an exercise to shake the cobwebs from my brain (been under the weather), I wrote the attached (based on suggestions listed above); perhaps it will get you going.
    That get's me very close. Thank you.
  • Bobb FwedBobb Fwed Posts: 1,119
    edited 2011-02-09 11:36
    So, I have heavily modified your code to fit my needs (and it sped it way up). And I have decided to go with the cocktail sort instead of the bubble sort, it's a little less than twice as fast (because it does about half as many compares).

    Attached is my test code. I will implement it in the next version of Memory Storage Management. I might try a couple different sorting methods, as there may be quite a few strings to sort, thus speed will become an issue. Insertion sort doesn't seem too complicated, and it should be faster in pretty much all situations.
  • Bobb FwedBobb Fwed Posts: 1,119
    edited 2011-02-09 13:06
    Well, that was simpler than I was expecting.
    It takes up a lot less code space than cocktail sorting, and it's faster.
    PUB insertionSort (arrayAddr, arraylength, ASC_DESC) | j, i, val, done, tmp
    '' Sorts array of address to strings based on string value
     
      REPEAT i FROM 1 TO arraylength - 1
        val := word[arrayAddr][i]
        j := i - 1
        done := false
        REPEAT
          tmp := strcmp(word[arrayAddr][j], val)
          IF (ASC_DESC == ASC AND tmp > 0) OR (ASC_DESC == DESC AND tmp < 0)
            word[arrayAddr][j + 1] :=  word[arrayAddr][j]
    
            IF (--j < 0)
              done := true
          ELSE
            done := true
        UNTIL done
        word[arrayAddr][j + 1] := val
    
    I just finished it, and it seems to work, I will test it more.

    Now I need to make it so it isn't case sensitive (or optionally not).
  • Bobb FwedBobb Fwed Posts: 1,119
    edited 2011-02-09 15:36
    A couple more contributions to this sorting stuff: I added ascending/descending option to the method above, and I improved on the strcmp method by first making it slightly smaller and faster, then making it case insensitive (which made it larger and slower).
    PRI strcmp(s1, s2)
    '' thanks Jon "JonnyMac" McPhalen (aka Jon Williams) (jon@jonmcphalen.com) for the majority of this code
    '' altered so results are not case sensitive, and slightly faster (when considering the case insensitivity)
    
    '' Returns 0 if strings equal, positive if s1 > s2, negative if s1 < s2
    
      REPEAT WHILE ((byte[s1] & constant(!$20)) == (byte[s2] & constant(!$20)))     ' if equal (not perfect case insensitivity, but fast -- we mostly work with just a-z/0-9)
        IF (byte[s1] == 0)                                                          '  if at end
          RETURN 0                                                                  '    done
        ELSE
          s1++                                                                      ' advance pointers
          s2++
    
      RETURN (byte[s1] & constant(!$20)) - (byte[s2] & constant(!$20))              ' (not perfect case insensitivity, but fast -- we mostly work with just a-z/0-9)
    
  • Bobb FwedBobb Fwed Posts: 1,119
    edited 2011-02-11 15:14
    So I went through five different sorting algorithms to figure out which is the best for my purpose, now I wish to share the work I've done with everyone else, so I have posted the algorithms on the OBEX. They are available in both the string sorting and decimal sorting variety.

    Download here: http://obex.parallax.com/objects/715/

    I have included bubble sort, cocktail sort, insertion sort, shell sort, and quick sort. The first two are pretty worthless (under almost all circumstances) once you introduce the last three.
  • max72max72 Posts: 1,155
    edited 2011-02-12 06:14
    Thanks!
    There is a lot to learn in the objects...
    Massimo
  • Bobb FwedBobb Fwed Posts: 1,119
    edited 2011-02-15 14:29
    I just released a new version 1.01.

    Changes:
    1) Most importantly, I have been able to find optimizations for all five sorting algorithms. Most significantly, quick sort. For decimal sorting, it is almost always the fastest sorting method as long as there are more than 17 elements in the array. Less than 17, it is only slightly slower than insertion sort.
    2) Now all methods have the order (ascending/descending) option. I did not have these on shell and quick sort before.
    3) In the string demo, the strings are now randomly arranged before sorting (for less bias times). There are also (optionally) 90 strings to arrange in the demo.

    Here is a chart to help determine which is best for your uses (it is based on the speed of the demo):
    +-------------------------------------------------------------+
    |                 SORTING SPEED RANKING CHART                 |
    +------------------------------+------------------------------+
    |          DECIMALS            |            STRINGS           |
    +------------------------------+------------------------------+
    | Less than 17 array Elements: | Less than 13 array Elements: |
    |   1) Insertion Sort          |   1) Insertion Sort          |
    |   2) Quick Sort              |   2) Quick Sort              |
    |   3) Shell Sort              |   3) Shell Sort              |
    +------------------------------+------------------------------+
    | 17 =< Elements =< 25:        | 13 =< Elements =< 19:        |
    |   1) Quick Sort              |   1) Insertion Sort          |
    |   2) Insertion Sort          |   2) Shell Sort              |
    |   3) Shell Sort              |   3) Quick Sort              |
    +------------------------------+------------------------------+
    | More than 25 array Elements: | More than 19 array Elements: |
    |   1) Quick Sort              |   1) Shell Sort              |
    |   2) Shell Sort              |   2) Insertion Sort          |
    |   3) Insertion Sort          |   3) Quick Sort              |
    +------------------------------+------------------------------+
    
  • Bobb FwedBobb Fwed Posts: 1,119
    edited 2011-02-15 16:15
    A thing to keep in mind, is while quick sort is the fastest decimal method, it uses significantly more stack memory. For example, with 1000 elements it is almost 30% faster than shell sort, but it recurses, so the math (is something close to this):
    n := 1000                     ' number of elements in the array
    
      memory := 14                  ' first call of the quick sort method
      REPEAT WHILE (n > 15)         ' repeats until array parts are less than 15, then uses insertion sort
        n /= 2                      ' (optimally) splits array in half before recursion (this could be worse, as a perfect split will not always happen)
        memory += 14                ' recursive calls
      memory += 12                  ' call to insertion sort
    
    That's over 100 longs (best case). It's logarithmic, so doubling the sorted array will only add 14 more longs, but even at only 32 elements, it's using 54 longs in stack.
    Alternatively the shell sort would only use 13 longs of stack, no matter the size of the array.
Sign In or Register to comment.