Memory Efficient Multithreaded Incremental Segmented Sieve Algorithm: Introduction

cover
29 Mar 2024

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Evan Ning, Milton Academy & [email protected];

(2) David R. Kaeli, Department of Electrical and Computer Engineering & Northeastern University [email protected].

1. Introduction

Prime numbers are fundamental in number theory and play a significant role in various areas, from pure mathematics to practical applications. For example, public key encryption techniques, especially in cryptography (e.g., the Rivest Shamir Adelman encryption scheme (RSA) [1]), rely heavily on the novel properties of prime numbers. The degree of difficulty in factorizing very large numbers which are products of two prime numbers ensures the security of data encrypted using these methods.

Even though prime numbers are fundamental in number theory, determining the primality of large numbers or generating long lists of prime numbers can be computationally challenging as primes are pseudorandomly distributed. Thus, despite several conjectures and theorems that lend insight into the distribution of prime numbers, there is no formula to compute the next prime. The prime sieve algorithm, more formally known as the "Sieve of Eratosthenes," is a well-known and efficient technique used to find prime numbers [2].

With the growing computational needs associated with prime enumeration tasks, traditional implementations of the Sieve algorithm remain a computational bottleneck, especially given the advances in multi-core processors. As the rate of data grows and computational demands rise, harnessing the power of multithread processing becomes imperative. Over the years, there have been prior studies that have delved into the challenge of parallelizing the Sieve of Eratosthenes. Some of these approaches try to exploit the capabilities of multicore processors, breaking down the computation task into chunks [3]. This approach lends itself to using parallelization libraries and middleware, including OpenMP (shared memory) and MPI (inter processor message queues) [4]. Some of the common drawbacks and challenges of parallel implementations are that too many threads accessing shared memory can lead to memory contention, causing bottlenecks and limiting the performance gains from parallelism. Furthermore, prior work has not adequately addressed scalability issues, such that when the prime number being searched for is very large, it eventually can exceed the limits of memory. Additionally, more optimizations can be made within each iteration with improvements in mathematical algorithms, which can lead to significant savings.

We introduce a multithreaded implementation of the Segmented Sieve [3]. In our implementation, instead of handling large prime ranges in one iteration, the sieving process is broken down incrementally, which theoretically eliminates the challenges of working with large numbers, and can reduce memory usage, providing overall more efficient utilization over extended computations. In this work, we also explored some mathematical optimization to speed up computation. The code is implemented in POSIX threads C++ [6] and tested on a high-performance multi-core cluster [7]. We have been able to achieve a significant reduction in prime enumeration time, as well as a significant reduction in the associated memory footprint.

1.1 Sieve of Eratosthenes

Prime numbers are hard to compute, which makes them indispensable in various fields, especially in cryptography. The security of many encryption systems depends on the computational difficulty of factorizing large numbers. The "Sieve of Eratosthenes," is an old, yet efficient, algorithm used to find all prime numbers up to a specified integer. The algorithm works by iteratively marking the multiples of each prime number given a range.

The algorithm begins by creating a list of consecutive integers, from 2 up to n: (2, 3, 4, ..., n). Next, we set the current number, P, to the value 2. Then, remove all multiples of P beginning at P2 from the list, as all multiples of P less than P2 would have been removed in previous iterations already. For instance, the first time we run the algorithm, we remove all even numbers (except 2). Then, we remove all multiples of 3 (except 3), starting from 9 as 6 has already been removed, and so on. After each removal, we find the next number in the list after P that has not been removed. We set this new number to P and repeat the process.

The Sieve of Eratosthenes can be expressed in pseudocode [5], as shown in Figure 1. The Sieve of Eratosthenes is especially efficient for computing all prime numbers up to a certain limit. Its time complexity is O(n log log n), making it one of the most efficient ways to find all primes up to a given limit n.

Figure 1. Pseudocode for the Sieve of Eratosthenes.