Problem 50: Consecutive Prime Sum
View on Project EulerProject Euler Problem 50 Solution
EulerSolve provides an optimized solution for Project Euler Problem 50, Consecutive Prime Sum, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(N = 10^6\). Among all primes below \(N\), we want the one that can be written as the sum of the longest block of consecutive primes. If \(q_0 = 2, q_1 = 3, q_2 = 5, \dots\) are the primes in increasing order, then the task is to maximize the length \(L = j - i\) of a block $q_i + q_{i+1} + \cdots + q_{j-1}$ subject to the sum itself being prime and still satisfying \(q_i + \cdots + q_{j-1} \lt N\). The built-in checkpoint examples in the implementations are the standard ones: below \(100\), the winner is \(41 = 2 + 3 + 5 + 7 + 11 + 13\); below \(1000\), the winner is \(953\), obtained from \(21\) consecutive primes. The million-case is the same search problem on a larger range, so the main challenge is not exotic number theory but an efficient way to scan all relevant consecutive-prime intervals. Mathematical Approach The prime list is ordered, and every term is positive. Those two facts are enough to turn the naive search into a manageable interval problem. Consecutive blocks become differences of prefix sums Define the prefix totals $P_0 = 0,\qquad P_m = \sum_{t=0}^{m-1} q_t \quad (m \ge 1).$ Then every block of consecutive primes has the closed form $S(i,j) = q_i + q_{i+1} + \cdots + q_{j-1} = P_j - P_i,$ where \(0 \le i \lt j\), and the length of the block is \(L = j - i\)....
Detailed mathematical approach
Problem Summary
Let \(N = 10^6\). Among all primes below \(N\), we want the one that can be written as the sum of the longest block of consecutive primes. If \(q_0 = 2, q_1 = 3, q_2 = 5, \dots\) are the primes in increasing order, then the task is to maximize the length \(L = j - i\) of a block
$q_i + q_{i+1} + \cdots + q_{j-1}$
subject to the sum itself being prime and still satisfying \(q_i + \cdots + q_{j-1} \lt N\).
The built-in checkpoint examples in the implementations are the standard ones: below \(100\), the winner is \(41 = 2 + 3 + 5 + 7 + 11 + 13\); below \(1000\), the winner is \(953\), obtained from \(21\) consecutive primes. The million-case is the same search problem on a larger range, so the main challenge is not exotic number theory but an efficient way to scan all relevant consecutive-prime intervals.
Mathematical Approach
The prime list is ordered, and every term is positive. Those two facts are enough to turn the naive search into a manageable interval problem.
Consecutive blocks become differences of prefix sums
Define the prefix totals
$P_0 = 0,\qquad P_m = \sum_{t=0}^{m-1} q_t \quad (m \ge 1).$
Then every block of consecutive primes has the closed form
$S(i,j) = q_i + q_{i+1} + \cdots + q_{j-1} = P_j - P_i,$
where \(0 \le i \lt j\), and the length of the block is \(L = j - i\). This is the central mathematical object in the solution: instead of recomputing a sum term by term, each candidate interval is represented by two endpoints and evaluated by a single subtraction.
Positivity gives a strict stopping rule
For a fixed start index \(i\), the values
$S(i,i+1),\ S(i,i+2),\ S(i,i+3),\ \dots$
are strictly increasing, because every step adds another positive prime. Therefore, once some end index \(j\) gives \(S(i,j) \ge N\), every later end index \(j' \gt j\) is impossible as well. No later window with the same start can come back below the bound.
This monotonicity is what justifies the early break in the search. It is not merely an optimization trick; it is a direct consequence of the fact that all primes are positive.
The key invariant: only record-breaking lengths matter
Suppose the best chain found so far has length \(B\). Any new interval with length at most \(B\) cannot improve the answer, even if its sum is prime, because the problem asks for the longest chain. So for a fixed start \(i\), the search only needs to consider end indices \(j\) with
$j - i \ge B + 1.$
Equivalently, the first relevant end index is \(j = i + B + 1\). Whenever a prime-valued interval is discovered, \(B\) is updated to that larger length, and the lower bound for all future searches becomes stricter. This is the main invariant encoded by the implementations.
Notice that this is slightly different from a brute-force "test every length from longest to shortest" strategy. The code keeps a moving threshold for admissible lengths and examines only intervals that could still set a new record.
Why the search is complete
Take any interval \((i,j)\) that could possibly be the first interval of some new record length. When the outer scan reaches the start index \(i\), the current record length \(B\) must satisfy \(B \lt j - i\); otherwise the interval would not improve anything. That means \(j\) lies inside the inner loop's search range, which begins at \(i + B + 1\).
Since the inner loop then increases \(j\) one step at a time until the sum reaches the limit, every potentially record-setting interval is examined exactly once. The algorithm therefore performs a complete search over all intervals that can still matter and discards only intervals that are mathematically incapable of beating the current record.
Worked example: the checkpoint at 1000
With \(q_0 = 2, q_1 = 3, q_2 = 5, q_3 = 7, \dots\), the \(21\)-term block
$7 + 11 + 13 + 17 + 19 + 23 + 29 + 31 + 37 + 41 + 43 + 47 + 53 + 59 + 61 + 67 + 71 + 73 + 79 + 83 + 89 = 953$
starts at \(q_3\) and ends at \(q_{23}\). In prefix-sum notation this is simply
$S(3,24) = P_{24} - P_3 = 953.$
The checkpoint confirms that below \(1000\) no longer consecutive-prime block gives a prime. This example shows the whole method in miniature: represent the block by its endpoints, compute the value through a prefix difference, and compare the resulting length with the current best record.
How the Code Works
The C++, Python, and Java implementations first build a sieve up to \(N - 1\). That one table serves two purposes at once: it generates the ordered list of primes and it answers primality queries for candidate sums in constant time.
Next, the implementation constructs the prefix array \(P\). The running totals are stored in integer types wide enough to hold cumulative sums larger than the final answer bound, because prefix totals keep growing even after individual candidate windows have already exceeded \(N\).
The main search keeps two pieces of state: the best prime found so far and the best length \(B\). For each start index \(i\), the end index begins at \(i + B + 1\), so intervals that are already too short are skipped completely. Each candidate sum is computed as \(P_j - P_i\). If that value reaches the limit, the scan for that start stops immediately; if it is prime, the stored best length and best prime are updated on the spot.
The implementations also contain two sanity checkpoints, namely \(41\) for limit \(100\) and \(953\) for limit \(1000\). These checks are useful because they verify that the sieve, the prefix sums, and the record-tracking invariant all agree before the million-scale computation is run.
Complexity Analysis
Building the sieve costs \(O(N \log \log N)\) time and \(O(N)\) space. Extracting the prime list and constructing the prefix totals add \(O(\pi(N))\) time and space, where \(\pi(N)\) is the prime-counting function.
In the worst case, the interval search is \(O(\pi(N)^2)\), because it is a nested scan over start and end positions in the prime list. In practice it is far better than naive quadratic work for two reasons: the sum for a fixed start grows monotonically and triggers an early break, and the current record length eliminates every interval that is too short to matter.
Footnotes and References
- Problem page: Project Euler 50 - Consecutive prime sum
- Prime numbers: Wikipedia - Prime number
- Sieve construction: Wikipedia - Sieve of Eratosthenes
- Prefix totals: Wikipedia - Cumulative sum
- Prime density: Wikipedia - Prime-counting function