Problem 10: Summation of Primes
View on Project EulerProject Euler Problem 10 Solution
EulerSolve provides an optimized solution for Project Euler Problem 10, Summation of Primes, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For a given upper bound \(L\), compute $S(L)=\sum_{\substack{p\lt L\\ p\text{ prime}}} p.$ The inequality is strict: if \(L\) itself is prime, it is still excluded. When \(L\le 2\), the sum is \(0\) because there are no primes below \(2\). The task is not to test one number for primality, but to add all primes below a large bound without repeating the same divisibility work for each candidate. The implementation therefore classifies every integer from \(0\) to \(L-1\) in one coordinated pass and then sums the entries that survive. Mathematical Approach The solution uses the Sieve of Eratosthenes. Instead of treating each integer independently, it keeps a table of numbers that are still possible primes and progressively deletes the composite ones. The Quantity Being Summed A convenient way to write the target is $S(L)=\sum_{n=2}^{L-1} n\,\mathbf{1}_{n\text{ is prime}}.$ So the real objective is to build the indicator \(\mathbf{1}_{n\text{ is prime}}\) for every \(n\lt L\). Once that information is known, the sum itself is just a linear scan through the surviving indices. The Sieve State and Its Invariant Begin with a Boolean table indexed by \(0,1,\dots,L-1\). Mark \(0\) and \(1\) as non-prime, and treat every integer from \(2\) onward as provisionally prime. Then inspect candidate bases in increasing order....
Detailed mathematical approach
Problem Summary
For a given upper bound \(L\), compute
$S(L)=\sum_{\substack{p\lt L\\ p\text{ prime}}} p.$
The inequality is strict: if \(L\) itself is prime, it is still excluded. When \(L\le 2\), the sum is \(0\) because there are no primes below \(2\).
The task is not to test one number for primality, but to add all primes below a large bound without repeating the same divisibility work for each candidate. The implementation therefore classifies every integer from \(0\) to \(L-1\) in one coordinated pass and then sums the entries that survive.
Mathematical Approach
The solution uses the Sieve of Eratosthenes. Instead of treating each integer independently, it keeps a table of numbers that are still possible primes and progressively deletes the composite ones.
The Quantity Being Summed
A convenient way to write the target is
$S(L)=\sum_{n=2}^{L-1} n\,\mathbf{1}_{n\text{ is prime}}.$
So the real objective is to build the indicator \(\mathbf{1}_{n\text{ is prime}}\) for every \(n\lt L\). Once that information is known, the sum itself is just a linear scan through the surviving indices.
The Sieve State and Its Invariant
Begin with a Boolean table indexed by \(0,1,\dots,L-1\). Mark \(0\) and \(1\) as non-prime, and treat every integer from \(2\) onward as provisionally prime. Then inspect candidate bases in increasing order.
The key invariant is this: after all prime bases smaller than \(p\) have been processed, every number below \(L\) that has a prime factor smaller than \(p\) has already been marked composite. Therefore, if the entry for \(p\) is still unmarked when the scan reaches it, then \(p\) must itself be prime. If it were composite, it would have a least prime factor \(r\lt p\), and the pass for \(r\) would already have crossed out \(p\).
Why Crossing Off Starts at \(p^2\)
Once a prime \(p\) is confirmed, the sieve marks
$p^2,\ p^2+p,\ p^2+2p,\ \dots$
as composite. The first term is \(p^2\), not \(2p\), for a precise reason. Any smaller multiple \(kp\) with \(2\le k\lt p\) already has a factor smaller than \(p\), so it was removed earlier while processing that smaller factor or one of its prime divisors.
This is where the sieve gains most of its efficiency: the inner pass never restarts from the full list of multiples.
Why It Is Enough to Stop at \(\sqrt{L}\)
Let \(n\lt L\) be composite. Write \(n=ab\) with \(2\le a\le b\). Then \(a^2\le n\lt L\), so \(a\lt \sqrt{L}\). In particular, \(n\) has a prime factor \(r\le a\lt \sqrt{L}\). That prime factor will appear as a sieve base before the outer loop finishes, and its marking pass will cross out \(n\).
So once the scan reaches a value with \(p^2\ge L\), no composite below \(L\) can still be waiting for its first prime factor to be processed. Every remaining unmarked entry is prime, and the algorithm can move directly to summation.
Worked Example: \(L=30\)
Initially, the candidates are \(2,3,4,\dots,29\).
Processing \(p=2\) removes
$4,6,8,10,12,14,16,18,20,22,24,26,28.$
Processing \(p=3\) removes
$9,12,15,18,21,24,27,$
though some of those were already removed by \(2\).
The candidate \(4\) is skipped because it is already known to be composite. Next, \(p=5\) removes only \(25\), since the marking starts at \(5^2\). After that, \(6^2>30\), so the marking phase is over.
The numbers still marked prime are
$2,3,5,7,11,13,17,19,23,29,$
and therefore
$S(30)=129.$
The same mechanism explains the standard checkpoint \(S(10)=2+3+5+7=17\).
How the Code Works
The C++, Python, and Java implementations all use the same three-phase structure.
Initialize the Table and Handle Small Limits
If the bound is at most \(2\), the answer is returned immediately as \(0\). Otherwise, the implementation allocates a Boolean table of length \(L\), sets the entries for \(0\) and \(1\) to false, and leaves the rest provisionally true.
Strike Composite Multiples in Increasing Base Order
The outer scan starts at \(2\) and continues while \(p^2\lt L\). Whenever the current entry is still marked prime, the implementation crosses out the arithmetic progression
$p^2,\ p^2+p,\ p^2+2p,\ \dots \lt L.$
Candidates already known to be composite are skipped automatically, so only genuine prime bases launch a marking pass.
Accumulate the Surviving Primes
After the sieve phase, a second linear pass adds every index that remains marked prime. The typed implementations use a 64-bit accumulator because the total for the problem input is far beyond 32-bit range; the Python implementation relies on arbitrary-precision integers.
Complexity Analysis
Initializing the Boolean table costs \(O(L)\) time. The marking work is bounded by
$\sum_{\substack{p\le \sqrt{L}\\ p\text{ prime}}}\left(\left\lfloor\frac{L-1-p^2}{p}\right\rfloor+1\right)=O\!\left(L\sum_{p\le\sqrt{L}}\frac{1}{p}\right)=O(L\log\log L).$
The final accumulation pass adds another \(O(L)\), so the total running time is \(O(L\log\log L)\). The Boolean table uses \(O(L)\) memory.
Footnotes and References
- Project Euler problem page: Problem 10 - Summation of Primes
- Sieve of Eratosthenes: Wikipedia
- Algorithmic discussion of the sieve: cp-algorithms
- Background on prime numbers: Wikipedia - Prime number