Yeah, I understand that, and that's why it seems like an odd statement. I get its potential; I've just never seen it used before. The algorithm isn't working, so I was checking whether it was correct.
Is there an example of the algorithm he is performing? (There are usually examples somewhere around the web in C.)
PUB bubblesort(arrayAddr, arraylength, asc_desc) : i | del, swapped, tmp
'' thanks Jon "JonnyMac" McPhalen (aka Jon Williams) (jon@jonmcphalen.com) for the majority of this code
'' slowest, but simplest sorting system

  --arraylength                                         ' reduce this so it doesn't re-evaluate each loop
  del := 1
  REPEAT
    swapped := false                                    ' assume no changes
    REPEAT arraylength                                  ' loop through array
      'IF (asc_desc == ASC AND long[arrayAddr][i] > long[arrayAddr][i + 1]) OR (asc_desc == DESC AND long[arrayAddr][i] < long[arrayAddr][i + 1])
      IF (long[arrayAddr][i + asc_desc] > long[arrayAddr][i + 1 - asc_desc]) ' compare values (asc_desc is 0 or 1)
        tmp := long[arrayAddr][i]                       ' swap values
        long[arrayAddr][i] := long[arrayAddr][i + 1]
        long[arrayAddr][i + 1] := tmp
        swapped := true
      i += del
    -del                                                ' negate del: the next pass sweeps the other way
    i += del
  WHILE swapped
My quick set of tests says it works. Thanks!
Of course, now it's only 4 bytes smaller, and still slower than my unoptimized shell sort for anything above 6 elements. If I figure out a way to apply your sorting change to the shell sort as well, the bubble sort will be slower at all sizes.
No problem! Yeah, I'm not trying to say people should use bubble sort... merely that there is a trivial change that will give a 2x-ish improvement if you _do_ end up using it. [8^)
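In case it helps anyone reading along, here is the direction trick from that Spin routine sketched in C (the function name bubble_dir is mine, not from the object above): indexing the comparison with the 0-or-1 direction flag swaps the operands, so asc vs desc is never tested inside the loop.

```c
#include <stddef.h>

/* Bubble sort where dir is 0 for ascending, 1 for descending.
   Comparing a[i + dir] with a[i + 1 - dir] swaps the operands of
   the test, so the direction flag costs nothing per iteration. */
void bubble_dir(long *a, size_t n, int dir)
{
    int swapped = 1;
    while (swapped && n > 1) {
        swapped = 0;
        for (size_t i = 0; i + 1 < n; i++) {
            if (a[i + dir] > a[i + 1 - dir]) {
                long tmp = a[i];            /* classic three-step swap */
                a[i]     = a[i + 1];
                a[i + 1] = tmp;
                swapped  = 1;
            }
        }
    }
}
```

With dir = 0 the test is a[i] > a[i+1] (ascending); with dir = 1 it becomes a[i+1] > a[i] (descending), the same i + asc_desc indexing as the Spin version.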
PUB RadixSort_nybble(arrayAddr, arraylength, tempArray) | swapVar, Counts[16], Indices[16], i, pass, bucket
'Does 8 passes, using nybbles as the keys

  arraylength--                                         ' reduce this so it doesn't re-evaluate each loop

  'Radix only deals with positive numbers, so offset the values so
  'negative numbers start at zero, and positive ones start at 1<<31
  repeat i from 0 to arraylength
    long[arrayAddr][i] += 2_147_483_648

  repeat pass from 0 to 28 step 4
    longfill(@Counts, 0, 16)
    repeat i from 0 to arraylength                      ' count the keys in each bucket
      bucket := (long[arrayAddr][i] >> pass) & 15
      Counts[bucket]++
    Indices[0] := 0
    repeat i from 0 to 14                               ' prefix sums give each bucket's start index
      Indices[i + 1] := Indices[i] + Counts[i]
    repeat i from 0 to arraylength                      ' stable scatter into the temp array
      bucket := (long[arrayAddr][i] >> pass) & 15
      long[tempArray][Indices[bucket]++] := long[arrayAddr][i]
    swapVar := arrayAddr                                ' ping-pong the two buffers
    arrayAddr := tempArray
    tempArray := swapVar

  'Remove all the digit offsets (8 passes is an even number of
  'buffer swaps, so arrayAddr is the caller's array again)
  repeat i from 0 to arraylength
    long[arrayAddr][i] -= 2_147_483_648
PUB RadixSort_byte(arrayAddr, arraylength, tempArray) | swapVar, Counts[256], Indices[256], i, pass, bucket
'Does 4 passes, using bytes as the keys

  arraylength--                                         ' reduce this so it doesn't re-evaluate each loop

  'Radix only deals with positive numbers, so offset the values so
  'negative numbers start at zero, and positive ones start at 1<<31
  repeat i from 0 to arraylength
    long[arrayAddr][i] += 2_147_483_648

  repeat pass from 0 to 24 step 8
    longfill(@Counts, 0, 256)
    repeat i from 0 to arraylength                      ' count the keys in each bucket
      bucket := (long[arrayAddr][i] >> pass) & 255
      Counts[bucket]++
    Indices[0] := 0
    repeat i from 0 to 254                              ' prefix sums give each bucket's start index
      Indices[i + 1] := Indices[i] + Counts[i]
    repeat i from 0 to arraylength                      ' stable scatter into the temp array
      bucket := (long[arrayAddr][i] >> pass) & 255
      long[tempArray][Indices[bucket]++] := long[arrayAddr][i]
    swapVar := arrayAddr                                ' ping-pong the two buffers
    arrayAddr := tempArray
    tempArray := swapVar

  'Remove all the digit offsets (4 passes is an even number of
  'buffer swaps, so arrayAddr is the caller's array again)
  repeat i from 0 to arraylength
    long[arrayAddr][i] -= 2_147_483_648
Because of the higher per-pass overhead, the byte-bucket radix sort does poorly on very short lists, but goes MUCH faster on longer lists. Here's how it stacks up against the others. A PASM implementation would murder them all.
Cycle counts by number of values sorted:

Name                      30           300          1000
Shell Sort         1,200,406    27,252,688   117,658,214
PASM Shell Sort       31,459       451,884     1,884,771
Radix Sort         2,486,976    21,080,304    69,285,104
Radix Byte         4,397,568    14,264,496    39,845,296
Quick Sort         1,044,086    21,623,142    87,516,099
Insertion Sort     1,645,136   136,504,000
Cocktail Sort      2,951,184   281,561,651
Bubble Sort        4,691,264
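If it helps to see the byte-bucket version outside of Spin, here's a sketch of the same LSD radix idea in C (the name radix_byte is mine, not from the object above). XORing the sign bit plays the role of the += 2_147_483_648 offset, and four byte-wide passes is an even number of buffer swaps, so the ping-ponged data ends up back in the caller's array.

```c
#include <stdint.h>
#include <stddef.h>

/* LSD radix sort of 32-bit values, one byte per pass (4 passes).
   Flipping the sign bit maps signed order onto unsigned order,
   the same job the 2_147_483_648 offset does in the Spin version. */
void radix_byte(uint32_t *a, uint32_t *tmp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] ^= 0x80000000u;                          /* add sign offset */

    for (int pass = 0; pass < 32; pass += 8) {
        size_t counts[256] = {0}, indices[256];

        for (size_t i = 0; i < n; i++)                /* histogram pass */
            counts[(a[i] >> pass) & 255]++;

        indices[0] = 0;                               /* prefix sums */
        for (int b = 0; b < 255; b++)
            indices[b + 1] = indices[b] + counts[b];

        for (size_t i = 0; i < n; i++)                /* stable scatter */
            tmp[indices[(a[i] >> pass) & 255]++] = a[i];

        uint32_t *swap = a; a = tmp; tmp = swap;      /* ping-pong buffers */
    }

    /* 4 passes is even, so a points at the caller's array again */
    for (size_t i = 0; i < n; i++)
        a[i] ^= 0x80000000u;                          /* remove offset */
}
```

To sort signed 32-bit values, the caller passes the array reinterpreted as unsigned; the sign-bit flip makes the unsigned byte comparisons agree with signed order.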
A C "for" loop evaluates its initialization expression once; it then evaluates the termination condition one more time than the loop body is executed, and the step expression once after each pass through the body.
As can be seen here:
#include <stdio.h>

int loopCount;

int init(void)
{
    printf("init:\n");
    loopCount = 0;
    return loopCount;
}

int test(void)
{
    printf("test:\n");
    if (loopCount < 10)
    {
        return 1;
    }
    return 0;
}

void step(void)
{
    printf("step:\n");
    loopCount++;
}

int main(int argc, char *argv[])
{
    for (init(); test(); step())
    {
    }
    printf("loopCount = %d\n", loopCount);
    return 0;
}
The BASIC Stamp has that behavior too, in FOR:NEXT statements. The manual gives a couple of examples of how that can be used to create an up-down sweep of numbers and an exponentially increasing series. The Prop manual has one example where changing the delta value lets a REPEAT loop achieve "interesting effects".
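The delta-flipping trick is easy to show in any language; here's a small C sketch (sweep is a hypothetical helper, not from either manual) that produces an up-down sweep by negating the step variable at the peak.

```c
#include <stddef.h>

/* Fill out[] with an up-then-down sweep: 0,1,...,top,top-1,...,0.
   Negating `step` at the peak reverses the loop's direction, the
   same trick the BASIC Stamp manual shows for FOR:NEXT loops. */
size_t sweep(int *out, int top)
{
    size_t n = 0;
    int step = 1;
    for (int x = 0; x >= 0; x += step) {
        out[n++] = x;
        if (x == top)
            step = -step;          /* reverse direction at the peak */
    }
    return n;                      /* 2*top + 1 values written */
}
```

For top = 5 this writes 0 1 2 3 4 5 4 3 2 1 0, one loop and no second counter.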
I looked at a couple of variations of the insertion sort: #1 takes out the ascending-vs-descending decision, which was being evaluated every time around the inner loop; #2 substitutes a single longmove once per trip around the outer loop for the multiple swaps that were taking place within the inner loop.
PUB insertionsort1 (arrayAddr, arraylength) | j, i, val
  arraylength--
  REPEAT i FROM 1 TO arraylength
    val := long[arrayAddr][i]                           ' store value for later
    j := i - 1
    REPEAT WHILE long[arrayAddr][j] > val               ' no asc vs desc test here
      long[arrayAddr][j + 1] := long[arrayAddr][j]      ' insert value
      IF (--j < 0)
        QUIT
    long[arrayAddr][j + 1] := val                       ' place value (from earlier)

PUB insertionsort2 (arrayAddr, arraylength) | j, i, k, val
  arraylength--
  REPEAT i FROM 1 TO arraylength
    val := long[arrayAddr][i]                           ' store value for later
    j := i
    k~
    REPEAT WHILE long[arrayAddr][--j] > val             ' compare values
      if ++k == i                                       ' no swaps in inner loop
        QUIT
    if k
      longmove(j := arrayAddr + (i-k+1)<<2, j - 1, k)   ' one longmove
      long[arrayAddr][i-k] := val
The following results are for 5 loops on random data. The insertion sort without the asc/desc evaluation runs in about 0.6x the time, and the insertion sort with the longmove is faster still (also with no asc/desc evaluation).
The following instead uses a largely presorted (ascending) array, with 3 random values poked into the 1/4, 1/2 and 3/4 positions. This is the kind of precondition where the insertion sort has an advantage over the Shell and Quick sorts (but the PASM version of Shell still shines). The insertion sort with the longmove was a little slower here than the version with the swaps. I think that happens when a high value is discovered early in the process: it is slow to propagate to the top, taking multiple longmoves.
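For comparison, the single-move idea looks like this in C (a sketch using memmove in place of Spin's longmove; the name insertion_move is mine): find the insertion point first, then shift the whole run up in one call.

```c
#include <string.h>
#include <stddef.h>

/* Insertion sort that locates the insertion point first, then shifts
   the displaced run up with one memmove instead of swapping element
   by element -- the same idea as the longmove in insertionsort2. */
void insertion_move(long *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        long val = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > val)     /* scan down for the slot */
            j--;
        if (j != i) {
            memmove(&a[j + 1], &a[j], (i - j) * sizeof a[0]);
            a[j] = val;                     /* drop value into place */
        }
    }
}
```

memmove is defined to handle overlapping source and destination, which is exactly what an upward shift by one slot needs.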
I may have overstated my position when I said "most languages" do not reevaluate their loop limits each time through. Perhaps "most languages that I've used" (i.e. not including C, C++, or Java) would be more nearly correct. Anyway, here's a more exhaustive discussion of Spin loop semantics, including pathological examples not suitable for those with delicate sensitivities; so be forewarned:
I was hoping people would come along and find optimizations in my coding. I'm sure there are optimizations that can be done to all the algorithms. The other thing your chart is missing is the effect your insertion sort changes had on quick sort: cutting the insertion sort's time in half sped that algorithm up too (from my quick testing, it is about 10-20% faster).
I'm doing some tests to see how much faster removing the ascending/descending option is. Everything could always sort in ascending order; if people want the opposite, they can just run through the array backward, which is not that big of a deal. Though I don't think a couple of the algorithms will change much in speed, due to how they are written. We'll see.
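The run-through-backward idea amounts to a helper like this in C (reverse_longs is my name for it, just for illustration): sort ascending once, then reverse in place only when a descending result is actually wanted.

```c
#include <stddef.h>

/* Reverse an array of longs in place.  Sorting ascending and then
   reversing (or just walking the array from the end) yields a
   descending result without an asc/desc test in the sort itself. */
void reverse_longs(long *a, size_t n)
{
    if (n < 2)
        return;                       /* nothing to swap */
    for (size_t lo = 0, hi = n - 1; lo < hi; lo++, hi--) {
        long tmp = a[lo];
        a[lo] = a[hi];
        a[hi] = tmp;
    }
}
```

The reversal is a single O(n) pass, so it costs far less than carrying a direction test through every inner loop of every sort.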
Comments

http://en.wikipedia.org/wiki/Radix_sort

Out of interest, which languages are you thinking of? Not C, C++, or Java.

Oh, Jason, the radix sort is very cool!
-Phil