Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 7: 10001st Prime

Project Euler Problem 7 Solution

EulerSolve provides an optimized solution for Project Euler Problem 7, 10001st Prime, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary The task is to determine the 10001st prime number. If $p_n$ denotes the $n$-th prime and $\pi(x)$ denotes the number of primes not exceeding $x$, then the problem is equivalent to finding the smallest integer $x$ such that $\pi(x) \ge 10001$. The implementations do not test primality one candidate at a time. Instead, they choose a finite interval that is expected to contain $p_n$, generate every prime in that interval with a sieve, and then read off the desired entry from the ordered list. Mathematical Approach The whole method rests on two ideas: turning the question into a counting problem, and using a sieve whose correctness follows from a simple factorization invariant. From the $n$-th prime to a counting threshold The prime-counting function is $\pi(x)=\#\{q \le x : q \text{ is prime}\}.$ With this notation, the $n$-th prime can be written as $p_n=\min\{x \ge 2 : \pi(x)\ge n\}.$ So the problem becomes: find some upper limit $U$ with $\pi(U)\ge n$, list all primes up to $U$, and take the $n$-th element of that increasing list. Choosing a practical upper bound For $n \ge 6$, a standard explicit estimate for the $n$-th prime is $p_n \lt n(\log n+\log\log n).$ The implementations therefore start from $U(n)=\left\lfloor n(\log n+\log\log n)\right\rfloor+16.$ The extra 16 is a safety margin....

Detailed mathematical approach

Problem Summary

The task is to determine the 10001st prime number. If $p_n$ denotes the $n$-th prime and $\pi(x)$ denotes the number of primes not exceeding $x$, then the problem is equivalent to finding the smallest integer $x$ such that $\pi(x) \ge 10001$.

The implementations do not test primality one candidate at a time. Instead, they choose a finite interval that is expected to contain $p_n$, generate every prime in that interval with a sieve, and then read off the desired entry from the ordered list.

Mathematical Approach

The whole method rests on two ideas: turning the question into a counting problem, and using a sieve whose correctness follows from a simple factorization invariant.

From the $n$-th prime to a counting threshold

The prime-counting function is

$\pi(x)=\#\{q \le x : q \text{ is prime}\}.$

With this notation, the $n$-th prime can be written as

$p_n=\min\{x \ge 2 : \pi(x)\ge n\}.$

So the problem becomes: find some upper limit $U$ with $\pi(U)\ge n$, list all primes up to $U$, and take the $n$-th element of that increasing list.

Choosing a practical upper bound

For $n \ge 6$, a standard explicit estimate for the $n$-th prime is

$p_n \lt n(\log n+\log\log n).$

The implementations therefore start from

$U(n)=\left\lfloor n(\log n+\log\log n)\right\rfloor+16.$

The extra 16 is a safety margin. For very small indices the formula is inconvenient because of the $\log\log n$ term, so the code uses the fixed bound $64$ instead, and for $n=1$ it returns $2$ immediately. This matches the actual control flow in all three languages.

Correctness does not depend on the first estimate being perfect. If the sieve up to $U$ produces fewer than $n$ primes, then $U \lt p_n$, so the program doubles $U$ and tries again. Because primes are infinite in number, repeated doubling must eventually reach a bound above $p_n$.

Why the sieve is correct

The sieve keeps a Boolean table for the integers $0,1,\dots,U$. At the beginning, every entry at least 2 is treated as a prime candidate. When the outer loop reaches an integer $p$ that is still marked, $p$ must actually be prime: if $p$ were composite, it would have some prime divisor $r \lt p$, and that smaller prime would already have crossed $p$ off.

Once $p$ is known to be prime, the code marks

$p^2,\ p^2+p,\ p^2+2p,\ \dots$

as composite. Starting at $p^2$ is enough. Any smaller multiple $kp$ with $2 \le k \lt p$ already has a factor below $p$, so it was eliminated earlier. After every prime $p \le \sqrt{U}$ has been processed, every unmarked number in the table is prime and every marked number is composite.

Worked examples

The checkpoint example used by the implementations is $n=6$. The first six primes are

$2,3,5,7,11,13,$

so $p_6=13$.

For the actual Project Euler input $n=10001$, the initial estimate is

$U(10001)=\left\lfloor 10001\bigl(\log 10001+\log\log 10001\bigr)\right\rfloor+16=114335.$

This already gives a modest interval in which to search for the 10001st prime.

How the Code Works

The C++, Python, and Java implementations all share the same computational core. They first handle the trivial first-prime case directly, then choose an initial upper bound: $64$ for small indices and the logarithmic estimate above for $n \ge 6$.

Next, the implementation allocates a Boolean array of length $U+1$, marks 0 and 1 as non-prime, and runs the classical Eratosthenes loop for every candidate $p$ with $p^2 \le U$. Whenever $p$ is still marked, all multiples starting at $p^2$ are crossed off. A final scan through the table collects the surviving entries in increasing order.

If the resulting list contains at least $n$ primes, the implementation returns the $n$-th entry. Otherwise it doubles the bound and rebuilds the sieve on the larger interval. The surrounding input and output details differ slightly between languages, but the number-theoretic algorithm is identical.

Complexity Analysis

One sieve pass up to $U$ uses $O(U)$ memory and $O(U\log\log U)$ time. Since the chosen bound satisfies $U=\Theta(n\log n)$, the dominant cost is

$O\!\bigl(n\log n\,\log\log(n\log n)\bigr).$

The retry loop does not change the asymptotic order. If the final successful bound is $U_{\ast}$, then the total amount of sieving over repeated doublings is bounded by a geometric series:

$U_{\ast}+\frac{U_{\ast}}{2}+\frac{U_{\ast}}{4}+\cdots \lt 2U_{\ast}.$

So even in the fallback case, the total work stays within a constant factor of the last successful sieve.