Problem 7: 10001st Prime
View on Project EulerProject Euler Problem 7 Solution
EulerSolve provides an optimized solution for Project Euler Problem 7, 10001st Prime, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The task is to determine the 10001st prime number. If \(p_n\) denotes the \(n\)-th prime and \(\pi(x)\) denotes the number of primes not exceeding \(x\), then the problem is equivalent to finding the smallest integer \(x\) such that \(\pi(x) \ge 10001\). The implementations do not test primality one candidate at a time. Instead, they choose a finite interval that is expected to contain \(p_n\), generate every prime in that interval with a sieve, and then read off the desired entry from the ordered list. Mathematical Approach The whole method rests on two ideas: turning the question into a counting problem, and using a sieve whose correctness follows from a simple factorization invariant. From the \(n\)-th prime to a counting threshold The prime-counting function is $\pi(x)=\#\{q \le x : q \text{ is prime}\}.$ With this notation, the \(n\)-th prime can be written as $p_n=\min\{x \ge 2 : \pi(x)\ge n\}.$ So the problem becomes: find some upper limit \(U\) with \(\pi(U)\ge n\), list all primes up to \(U\), and take the \(n\)-th element of that increasing list. Choosing a practical upper bound For \(n \ge 6\), a standard explicit estimate for the \(n\)-th prime is $p_n \lt n(\log n+\log\log n).$ The implementations therefore start from $U(n)=\left\lfloor n(\log n+\log\log n)\right\rfloor+16.$ The extra 16 is a safety margin....
Detailed mathematical approach
Problem Summary
The task is to determine the 10001st prime number. If \(p_n\) denotes the \(n\)-th prime and \(\pi(x)\) denotes the number of primes not exceeding \(x\), then the problem is equivalent to finding the smallest integer \(x\) such that \(\pi(x) \ge 10001\).
The implementations do not test primality one candidate at a time. Instead, they choose a finite interval that is expected to contain \(p_n\), generate every prime in that interval with a sieve, and then read off the desired entry from the ordered list.
Mathematical Approach
The whole method rests on two ideas: turning the question into a counting problem, and using a sieve whose correctness follows from a simple factorization invariant.
From the \(n\)-th prime to a counting threshold
The prime-counting function is
$\pi(x)=\#\{q \le x : q \text{ is prime}\}.$
With this notation, the \(n\)-th prime can be written as
$p_n=\min\{x \ge 2 : \pi(x)\ge n\}.$
So the problem becomes: find some upper limit \(U\) with \(\pi(U)\ge n\), list all primes up to \(U\), and take the \(n\)-th element of that increasing list.
Choosing a practical upper bound
For \(n \ge 6\), a standard explicit estimate for the \(n\)-th prime is
$p_n \lt n(\log n+\log\log n).$
The implementations therefore start from
$U(n)=\left\lfloor n(\log n+\log\log n)\right\rfloor+16.$
The extra 16 is a safety margin. For very small indices the formula is inconvenient because of the \(\log\log n\) term, so the code uses the fixed bound \(64\) instead, and for \(n=1\) it returns \(2\) immediately. This matches the actual control flow in all three languages.
Correctness does not depend on the first estimate being perfect. If the sieve up to \(U\) produces fewer than \(n\) primes, then \(U \lt p_n\), so the program doubles \(U\) and tries again. Because primes are infinite in number, repeated doubling must eventually reach a bound above \(p_n\).
Why the sieve is correct
The sieve keeps a Boolean table for the integers \(0,1,\dots,U\). At the beginning, every entry at least 2 is treated as a prime candidate. When the outer loop reaches an integer \(p\) that is still marked, \(p\) must actually be prime: if \(p\) were composite, it would have some prime divisor \(r \lt p\), and that smaller prime would already have crossed \(p\) off.
Once \(p\) is known to be prime, the code marks
$p^2,\ p^2+p,\ p^2+2p,\ \dots$
as composite. Starting at \(p^2\) is enough. Any smaller multiple \(kp\) with \(2 \le k \lt p\) already has a factor below \(p\), so it was eliminated earlier. After every prime \(p \le \sqrt{U}\) has been processed, every unmarked number in the table is prime and every marked number is composite.
Worked examples
The checkpoint example used by the implementations is \(n=6\). The first six primes are
$2,3,5,7,11,13,$
so \(p_6=13\).
For the actual Project Euler input \(n=10001\), the initial estimate is
$U(10001)=\left\lfloor 10001\bigl(\log 10001+\log\log 10001\bigr)\right\rfloor+16=114335.$
This already gives a modest interval in which to search for the 10001st prime.
How the Code Works
The C++, Python, and Java implementations all share the same computational core. They first handle the trivial first-prime case directly, then choose an initial upper bound: \(64\) for small indices and the logarithmic estimate above for \(n \ge 6\).
Next, the implementation allocates a Boolean array of length \(U+1\), marks 0 and 1 as non-prime, and runs the classical Eratosthenes loop for every candidate \(p\) with \(p^2 \le U\). Whenever \(p\) is still marked, all multiples starting at \(p^2\) are crossed off. A final scan through the table collects the surviving entries in increasing order.
If the resulting list contains at least \(n\) primes, the implementation returns the \(n\)-th entry. Otherwise it doubles the bound and rebuilds the sieve on the larger interval. The surrounding input and output details differ slightly between languages, but the number-theoretic algorithm is identical.
Complexity Analysis
One sieve pass up to \(U\) uses \(O(U)\) memory and \(O(U\log\log U)\) time. Since the chosen bound satisfies \(U=\Theta(n\log n)\), the dominant cost is
$O\!\bigl(n\log n\,\log\log(n\log n)\bigr).$
The retry loop does not change the asymptotic order. If the final successful bound is \(U_{\ast}\), then the total amount of sieving over repeated doublings is bounded by a geometric series:
$U_{\ast}+\frac{U_{\ast}}{2}+\frac{U_{\ast}}{4}+\cdots \lt 2U_{\ast}.$
So even in the fallback case, the total work stays within a constant factor of the last successful sieve.