What's left is a set of unflagged numbers that we need to test for primeness by calling a generic is_prime() function. If we learn of a new prime number n, we must flag the numbers in the array that are multiples of it, i.e. 2n, 3n, 4n..., to remove those from consideration. We leave the actual prime number (1n) unflagged.
So we proceed through the array, performing a primeness test on each unflagged number; whether it proves prime or composite, we update the array accordingly. When we reach the end of the array, we know that the remaining unflagged numbers are in fact primes.
The Sieve of Eratosthenes simply lets you avoid factoring the multiples of known primes, which is a large percentage of the numbers in the array, thereby saving you time.
One disadvantage of Eratosthenes is that you must allocate a fixed-length array at the start. Even using a bit array, determining all of the primes among the first 4 billion numbers requires an array of 0.5 billion bytes (500 MB), which makes this a resource-hungry endeavor.
Eratosthenes is ultimately limited by the speed of your main memory, because the array is typically far too large to fit in your CPU's caches. If you allocate a bitmap larger than your available RAM, you can expect a lot of paging to disk, which will greatly slow the program.
I originally wrote this in 32-bit x86 assembly language, which provided enough 8-bit registers to skip over most of the multiples of the lower primes. When I rewrote it in x86_64 assembly language, I had more registers available, so the coverage is more complete. To repeat the basic theme: by eliminating multiples of the 12 lowest primes, the vast majority of numbers are removed from consideration and never need to be checked for primeness.
The most time-consuming part of generating prime numbers is always the checking of a number for primeness by dividing known primes into it, because divisions are very compute-intensive. One's goal should always be to avoid doing divisions. By using counters, I can avoid doing most of those divisions.
I also avoid the memory-intensive approach of Eratosthenes.
The pertinent data for avoiding perhaps 95% of the divisions is held in the fastest memory available: the processor's registers.
The remaining necessary work consists mainly of divisions, done to determine the primeness of a candidate number: each unsigned integer in the array of known primes is divided into the candidate until either a zero remainder is found (the candidate is composite) or the divisors are exhausted (the candidate is prime). As program execution continues, that array keeps growing in size, because it is, after all, the output array.
The Sieve of Smith also uses the common technique, in my is_prime(n) routine, of checking n only against primes up to and including ceil(sqrt(n)).
Square roots are time-consuming but usually involve only a handful of divisions.
Here is my pseudocode for the Sieve of Smith, checking against the 9 lowest primes (2 is handled by a simple evenness test, the other 8 by counters):
function generate_primes {
    array primes[] = { 3, 5, 7, 11, 13, 17, 19, 23 }
    array prime_counters[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }
    let n_primes := 8
    let n := 0
    while (always) {
        ++n
        let skip := false
        for i = 0 to 7 do {
            prime_counters[i]++
            if prime_counters[i] >= primes[i] {
                prime_counters[i] = 0      // n is a multiple of primes[i]
                skip = true
            }
        }
        if (skip) continue
        if (n is even) continue            // multiples of 2
        if (n == 1) continue               // 1 is not prime
        if (is_prime (n, primes, n_primes)) {
            primes[n_primes] = n
            n_primes++
            println ("%Lu is prime", n)
        }
    }
}
On my 2.4 GHz Core i5 with DDR3 main memory, a 32-bit variant of my sieve generated 203 million primes in 85 minutes using 32-bit divisions.
On my Core i5 2.4 GHz with DDR3 main memory, 100 million primes were generated in 3284 seconds, or about 55 minutes, using 64-bit divisions.
On an old Celeron 2.8 GHz with slow DDR main memory, 100 million primes required 4792 seconds, or about 80 minutes. This was running my 32-bit C code variant, compiled with gcc -O6, on GNU/Linux, kernel 2.6.