Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 7: 10001st Prime

View on Project Euler

Project Euler Problem 7 Solution

EulerSolve provides an optimized solution for Project Euler Problem 7, 10001st Prime, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary The task is to determine the 10001st prime number. If \(p_n\) denotes the \(n\)-th prime and \(\pi(x)\) denotes the number of primes not exceeding \(x\), then the problem is equivalent to finding the smallest integer \(x\) such that \(\pi(x) \ge 10001\). The implementations do not test primality one candidate at a time. Instead, they choose a finite interval that is expected to contain \(p_n\), generate every prime in that interval with a sieve, and then read off the desired entry from the ordered list. Mathematical Approach The whole method rests on two ideas: turning the question into a counting problem, and using a sieve whose correctness follows from a simple factorization invariant. From the \(n\)-th prime to a counting threshold The prime-counting function is $\pi(x)=\#\{q \le x : q \text{ is prime}\}.$ With this notation, the \(n\)-th prime can be written as $p_n=\min\{x \ge 2 : \pi(x)\ge n\}.$ So the problem becomes: find some upper limit \(U\) with \(\pi(U)\ge n\), list all primes up to \(U\), and take the \(n\)-th element of that increasing list. Choosing a practical upper bound For \(n \ge 6\), a standard explicit estimate for the \(n\)-th prime is $p_n \lt n(\log n+\log\log n).$ The implementations therefore start from $U(n)=\left\lfloor n(\log n+\log\log n)\right\rfloor+16.$ The extra 16 is a safety margin....

Detailed mathematical approach

Problem Summary

The task is to determine the 10001st prime number. If \(p_n\) denotes the \(n\)-th prime and \(\pi(x)\) denotes the number of primes not exceeding \(x\), then the problem is equivalent to finding the smallest integer \(x\) such that \(\pi(x) \ge 10001\).

The implementations do not test primality one candidate at a time. Instead, they choose a finite interval that is expected to contain \(p_n\), generate every prime in that interval with a sieve, and then read off the desired entry from the ordered list.

Mathematical Approach

The whole method rests on two ideas: turning the question into a counting problem, and using a sieve whose correctness follows from a simple factorization invariant.

From the \(n\)-th prime to a counting threshold

The prime-counting function is

$\pi(x)=\#\{q \le x : q \text{ is prime}\}.$

With this notation, the \(n\)-th prime can be written as

$p_n=\min\{x \ge 2 : \pi(x)\ge n\}.$

So the problem becomes: find some upper limit \(U\) with \(\pi(U)\ge n\), list all primes up to \(U\), and take the \(n\)-th element of that increasing list.

Choosing a practical upper bound

For \(n \ge 6\), a standard explicit estimate for the \(n\)-th prime is

$p_n \lt n(\log n+\log\log n).$

The implementations therefore start from

$U(n)=\left\lfloor n(\log n+\log\log n)\right\rfloor+16.$

The extra 16 is a safety margin. For very small indices the formula is inconvenient because of the \(\log\log n\) term, so the code uses the fixed bound \(64\) instead, and for \(n=1\) it returns \(2\) immediately. This matches the actual control flow in all three languages.

Correctness does not depend on the first estimate being perfect. If the sieve up to \(U\) produces fewer than \(n\) primes, then \(U \lt p_n\), so the program doubles \(U\) and tries again. Because primes are infinite in number, repeated doubling must eventually reach a bound above \(p_n\).

Why the sieve is correct

The sieve keeps a Boolean table for the integers \(0,1,\dots,U\). At the beginning, every entry at least 2 is treated as a prime candidate. When the outer loop reaches an integer \(p\) that is still marked, \(p\) must actually be prime: if \(p\) were composite, it would have some prime divisor \(r \lt p\), and that smaller prime would already have crossed \(p\) off.

Once \(p\) is known to be prime, the code marks

$p^2,\ p^2+p,\ p^2+2p,\ \dots$

as composite. Starting at \(p^2\) is enough. Any smaller multiple \(kp\) with \(2 \le k \lt p\) already has a factor below \(p\), so it was eliminated earlier. After every prime \(p \le \sqrt{U}\) has been processed, every unmarked number in the table is prime and every marked number is composite.

Worked examples

The checkpoint example used by the implementations is \(n=6\). The first six primes are

$2,3,5,7,11,13,$

so \(p_6=13\).

For the actual Project Euler input \(n=10001\), the initial estimate is

$U(10001)=\left\lfloor 10001\bigl(\log 10001+\log\log 10001\bigr)\right\rfloor+16=114335.$

This already gives a modest interval in which to search for the 10001st prime.

How the Code Works

The C++, Python, and Java implementations all share the same computational core. They first handle the trivial first-prime case directly, then choose an initial upper bound: \(64\) for small indices and the logarithmic estimate above for \(n \ge 6\).

Next, the implementation allocates a Boolean array of length \(U+1\), marks 0 and 1 as non-prime, and runs the classical Eratosthenes loop for every candidate \(p\) with \(p^2 \le U\). Whenever \(p\) is still marked, all multiples starting at \(p^2\) are crossed off. A final scan through the table collects the surviving entries in increasing order.

If the resulting list contains at least \(n\) primes, the implementation returns the \(n\)-th entry. Otherwise it doubles the bound and rebuilds the sieve on the larger interval. The surrounding input and output details differ slightly between languages, but the number-theoretic algorithm is identical.

Complexity Analysis

One sieve pass up to \(U\) uses \(O(U)\) memory and \(O(U\log\log U)\) time. Since the chosen bound satisfies \(U=\Theta(n\log n)\), the dominant cost is

$O\!\bigl(n\log n\,\log\log(n\log n)\bigr).$

The retry loop does not change the asymptotic order. If the final successful bound is \(U_{\ast}\), then the total amount of sieving over repeated doublings is bounded by a geometric series:

$U_{\ast}+\frac{U_{\ast}}{2}+\frac{U_{\ast}}{4}+\cdots \lt 2U_{\ast}.$

So even in the fallback case, the total work stays within a constant factor of the last successful sieve.

Footnotes and References

  1. Project Euler Problem 7
  2. Prime number theorem
  3. Prime-counting function
  4. Sieve of Eratosthenes
  5. Bounds for the \(n\)-th prime

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 6 · All Project Euler solutions · Next: Problem 8

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮