Multicore program crashes or produces garbage when calculating large prime numbers

samuell · 2016-04-26 15:51

ThinkersALot and yeti, my thoughts exactly. The recursive method has one edge, though: you'll only need to list the primes from 2 to sqrt(n) if you are to calculate primes up to n. The Sieve of Erastosthenes works in a similar fashion, if you think about it.

In the end, the recursive method I've mentioned earlier is a cheaper method of doing it without having a complete array of n numbers, as you only need an array of primes up to sqrt(n); that you either calculate on the fly recursively, or were previously calculated (if you can store a list of previously calculated numbers and recall it later for less effort). In fact, I may implement this on the PC to see if there is a difference. The potential is huge.

JasonDorie · 2016-04-26 16:06

You'll notice I said "...but at some point keeping track of the extra counters will have more overhead than just doing the division." I simply meant that for this code on this hardware doing the every-3rd test would be a win. On a PC with hardware division it wouldn't.

I know it can be extended to infinity and overwhelm the code, but you're already skipping even numbers. Adding 3's with a simple counter will save you about 20% of the work you're doing.

samuell · 2016-04-26 16:10

JasonDorie wrote: »

You'll notice I said "...but at some point keeping track of the extra counters will have more overhead than just doing the division." I simply meant that for this code on this hardware doing the every-3rd test would be a win. On a PC with hardware division it wouldn't.

I know it can be extended to infinity and overwhelm the code, but you're already skipping even numbers. Adding 3's with a simple counter will save you about 20% of the work you're doing.

Hi Jason,

I'm afraid there is some confusion. On the PC I'll use the recursive method, the one that uses a lot of memory. The method you've mentioned I'll apply to the Propeller, not to the PC.

Kind regards, Samuel Lourenço

JasonDorie · 2016-04-26 16:33

Oh no, you understood me - my comments were to Tinkers and Yeti, but I think I may have missed the fact that they were talking to each other, not me.

samuell · 2016-09-09 22:04

JasonDorie wrote: »

I don't know if this would make the inner loop faster or slower, but every 3rd number in your series is divisible by 3:

3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 . . .

You could pre-test division by three like you do for division by two, then keep a second counter in your loop. Skip the modulo test for every 3rd value and just reset the counter back to zero.

If you're compiling as LMM or PASM it should be much faster to increment a number and test for == 3 than to do the division involved in the modulo. You could use the same trick for 5's, too, saving you some additional modulo tests, but at some point keeping track of the extra counters will have more overhead than just doing the division.

Hi Jason,

After seeing an interesting algorithm to test primality on Wikipedia, it reminded me of your suggestion of also suppressing divisions with multiples of 3. I've decided to try that, and it is possible to reduce test time up to 50% for larger numbers. The difference is not that dramatic for smaller numbers, and even to the point of being about the same time for the old and new algorithm for small series of "tiny" numbers (smaller than 10000). Overall, the new algorithm is noticeably faster, and never slower.

I've uploaded the new algorithm.

Kind regards, Samuel Lourenço

wmosscrop · 2016-09-28 12:11

Jumping in here with a possible compromise...

Why not have multiple algorithms?

* There are only 25 prime numbers between 1 and 100. Each can be stored in a byte. That's only 25 bytes.
* The first algorithm would calculate the primes from 1 to 100 and store them in an array of bytes.
* The second algorithm would then calculate the primes from 101 to whatever by...
* First using the values from the first step as divisors. If it passes all of these values...
* Then a third (whew!) algorithm would start with 101 as a divisor and then continue with 103, 105, etc.

If you don't like the idea of hardcoding a prime value (101) then just arbitrarily fill an array of fixed size with primes. If you exhaust all of the values in the array then start with the last divisor + 2.

You could even use the primes up to 255 (251 is the last). There are only 54 of them. That's a reduction of ~57% of divisions for the most likely divisors.

The advantage of this method is that you can eliminate 50% or more of the first divisors. Of course, the code is more complicated, but also more optimized for each test.

Many (many!) years ago I implemented this on a Monroe 1665 printing calculator. It had 512 bytes of ram.

Good luck,

Walter

samuell · 2016-09-29 10:03

wmosscrop wrote: »

Jumping in here with a possible compromise...

Why not have multiple algorithms?

* There are only 25 prime numbers between 1 and 100. Each can be stored in a byte. That's only 25 bytes.
* The first algorithm would calculate the primes from 1 to 100 and store them in an array of bytes.
* The second algorithm would then calculate the primes from 101 to whatever by...
* First using the values from the first step as divisors. If it passes all of these values...
* Then a third (whew!) algorithm would start with 101 as a divisor and then continue with 103, 105, etc.

If you don't like the idea of hardcoding a prime value (101) then just arbitrarily fill an array of fixed size with primes. If you exhaust all of the values in the array then start with the last divisor + 2.

You could even use the primes up to 255 (251 is the last). There are only 54 of them. That's a reduction of ~57% of divisions for the most likely divisors.

The advantage of this method is that you can eliminate 50% or more of the first divisors. Of course, the code is more complicated, but also more optimized for each test.

Many (many!) years ago I implemented this on a Monroe 1665 printing calculator. It had 512 bytes of ram.

Good luck,

Walter

Hi Walter,

Well, it is always possible to optimize that way, but I think that would have little effect. The more numbers you have hardcoded, the less advantage you have. For instance, I have a 50% speed increase for large numbers by skipping multiples of 2 and 3. I guess if I would skip multiples of 5 too, the advantage would be less than 50% over the algorith that skips multiples of 2 and 3, so to speak.

Kind regards, Samuel Lourenço

wmosscrop · 2016-09-29 15:10

samuell wrote: »

Well, it is always possible to optimize that way, but I think that would have little effect. The more numbers you have hardcoded, the less advantage you have. For instance, I have a 50% speed increase for large numbers by skipping multiples of 2 and 3. I guess if I would skip multiples of 5 too, the advantage would be less than 50% over the algorith that skips multiples of 2 and 3, so to speak.

I'm not saying to hardcode multiples of 2, 3, 5, 7, etc. I'm saying that you can create an array of the first "n" primes and use them as the preliminary divisors before having to check every odd number not a multiple of 3 as a divisor.

And yes, as the numbers being checked increase in size, the advantage decreases. But the more (computed) primes that you can use the better.

Everything is a compromise.

Walter

samuell · 2016-10-04 09:50

wmosscrop wrote: »

samuell wrote: »

Well, it is always possible to optimize that way, but I think that would have little effect. The more numbers you have hardcoded, the less advantage you have. For instance, I have a 50% speed increase for large numbers by skipping multiples of 2 and 3. I guess if I would skip multiples of 5 too, the advantage would be less than 50% over the algorith that skips multiples of 2 and 3, so to speak.

I'm not saying to hardcode multiples of 2, 3, 5, 7, etc. I'm saying that you can create an array of the first "n" primes and use them as the preliminary divisors before having to check every odd number not a multiple of 3 as a divisor.

And yes, as the numbers being checked increase in size, the advantage decreases. But the more (computed) primes that you can use the better.

Everything is a compromise.

Walter

Hi Walter,

That would require memory, and a more complex algorithm. Note that this algorithm, when compiled using LMM, already occupies most of the EEPROM/RAM.

Kind regards, Samuel Lourenço

Multicore program crashes or produces garbage when calculating large prime numbers

Comments