A Fast Method for Generating Primes
© by Zack Smith All rights reserved.
IntroductionAfter considering the problem of how to most efficiently generate prime numbers I hit upon an idea for optimizing the process on standard 32- and 64-bit Intel based CPUs.
Two sievesLet's compare two sieves, which generate a list of prime numbers: The Sieve of Eratosthenes versus mine.
Sieve of EratosthenesIn the Sieve of Eratosthenes, one first creates a blank bitmap representing candidate numbers, i.e. those whose primeness is to be determined.
Suppose one wants to check 100 candidate integers for primeness, e.g. 2 to 101. This requires a bitmap of 100 bits. Given the primes that we already know, we flag all multiples of those primes within the array e.g.
What's left is a set of unflagged numbers that we need to test for primeness by calling a generic is_prime() function. If we learn of a new prime number, we must flag the numbers in the array that are multiples of it, e.g. 2n, 3n, 4n... to remove those from consideration. We leave the actual prime number (1n) unflagged.
So we proceed through the array, doing a primeness test on each non-flagged number. If it's prime or non-prime, we update the array. When we reach the end of the array, we know that the remaining non-flagged numbers are in primes.
On a computer with 8 gigabytes of RAM, a bitmap of 7GB could be safely allocated, providing an efficient method of scanning the first 60,129,542,144 integers.
The Sieve of Eratosthenes simply lets you avoid factoring the multiples of known primes, which is a large percentage of the numbers in the array, thereby saving you time.
One disadvantage of Eratosthenes is that you have to allocate this fixed-length array at the start. This limits how many integers can be checked to how much RAM you have i.e. the RAM requirement is proportional to the number of integers to be checked.
Eratosthenes is ultimately limited by the speed of your main memory, because the array is typically far too large to fit in your CPU's caches. If you plan to allocate a bitmap that is larger than your available RAM, you can expect a lot of paging to disk, which will impact greatly on the speed of the program.
Sieve of Smith (my sieve)My sieve takes some inspiration from the above sieve. Instead of eliminating every single multiple of every prime using a bitmap, I keep counters for the lower primes in the CPU registers. As one proceeds sequentially into higher numbers, I use the counters to skip over multiples of primes, rather than a bitmap.
I originally wrote this in 32-bit x86 assembly language, which provided enough 8-bit registers to skip over most of the multiples of lower primes. When I rewrote this in x86_64 assembly language, I had more registers to use and so the coverage is more complete. To repeat the basic theme: By eliminating multiples of the 12 lowest primes, the vast majority of numbers are removed from consideration and do not need to be checked for primeness.
The most time-consuming part of generating prime numbers is always the checking of a number for primeness by dividing known primes into it, because divisions are very compute-intensive. One's goal should always be to avoid doing divisions. By using counters, I can avoid doing most of those divisions.
I am also avoiding the memory-intensive approach of Eratosthenes as well. For my sieve, the RAM requirement is proportional to the number of primes found.
The pertinent data for avoiding perhaps 95% of the divisions is held in the fastest memory available: The processor's registers. Multiples of the smallest primes eliminate most numbers.
The remaining necessary work consists mainly of divisions, done to determine the primeness of a candidate number, which requires dividing each unsigned integer in the array of prime numbers into the candidate number, until a zero remainder is found or not. As program execution continues, that array is always increasing in size because it is, after all, the output array.
Sieve of Smith also uses the common technique
Here is my pseudocode for the Sieve of Smith, checking against the lower 9 primes:
AnalysisMy sieve avoids testing most numbers for primeness, because it exploits the fact that multiples of the lower primes emcompass most candidate numbers, and certainly more than do multiples of higher primes.
PerformanceThese numbers assume using my enhanced is_prime function.
On my 2.4 GHz Core i5, running a 32-bit variant of my sieve, 203 million primes were generated in 85 minutes using 32-bit divisions.
On my 2.7 GHz Core i5, 100 million primes were generated using 64-bit divisions in 4079 seconds, or about 68 minutes,
On an old Celeron 2.8 GHz with slow DDR main memory, 100 million primes required 4792 seconds, or about 80 minutes. This was running my 32-bit C code variant, compiled with gcc -O6, on GNU/Linux, kernel 2.6.
DownloadThe source code includes an output file containing the first 16777216 primes.