Problem 249: Prime Subset Sums
View on Project EulerProject Euler Problem 249 Solution
EulerSolve provides an optimized solution for Project Euler Problem 249, Prime Subset Sums, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(\mathcal{P}\) be the set of all primes below 5000. There are 669 such primes, so a brute-force search would have to reason about \(2^{669}\) different subsets. For each subset \(A \subseteq \mathcal{P}\), form the sum \(\sum_{p \in A} p\). The task is to count how many subsets have a prime-valued sum and report that count modulo \(10^{16}\). The count is over subsets, not over distinct sums. If two different subsets produce the same prime total, both must be counted. The empty subset contributes the sum 0, which is not prime, so it never enters the final answer, but it is still the correct base case for the recurrence. Mathematical Approach The natural state space is the set of all achievable subset sums. Since every input value is itself prime and may be chosen at most once, the problem is a counting version of 0/1 subset sum rather than an unbounded partition problem. The generating function for subset sums Order the input primes increasingly as $p_1 < p_2 < \cdots < p_m,\qquad m=669.$ If $G(x)=\prod_{i=1}^{m}(1+x^{p_i}),$ then choosing the term \(1\) means “do not use \(p_i\)”, while choosing \(x^{p_i}\) means “use \(p_i\) once”....
Detailed mathematical approach
Problem Summary
Let \(\mathcal{P}\) be the set of all primes below 5000. There are 669 such primes, so a brute-force search would have to reason about \(2^{669}\) different subsets. For each subset \(A \subseteq \mathcal{P}\), form the sum \(\sum_{p \in A} p\). The task is to count how many subsets have a prime-valued sum and report that count modulo \(10^{16}\).
The count is over subsets, not over distinct sums. If two different subsets produce the same prime total, both must be counted. The empty subset contributes the sum 0, which is not prime, so it never enters the final answer, but it is still the correct base case for the recurrence.
Mathematical Approach
The natural state space is the set of all achievable subset sums. Since every input value is itself prime and may be chosen at most once, the problem is a counting version of 0/1 subset sum rather than an unbounded partition problem.
The generating function for subset sums
Order the input primes increasingly as
$p_1 < p_2 < \cdots < p_m,\qquad m=669.$
If
$G(x)=\prod_{i=1}^{m}(1+x^{p_i}),$
then choosing the term \(1\) means “do not use \(p_i\)”, while choosing \(x^{p_i}\) means “use \(p_i\) once”. Therefore
$G(x)=\sum_{s=0}^{S_{\max}} F(s)x^s,$
where \(F(s)\) is exactly the number of subsets whose elements sum to \(s\), and
$S_{\max}=\sum_{p\in\mathcal{P}} p = 1{,}548{,}136.$
The required answer is then
$\text{Answer}=\sum_{\substack{q \text{ prime} \\ 0 \le q \le S_{\max}}} F(q)\pmod{10^{16}}.$
The recurrence after the first \(k\) primes
Let \(F_k(s)\) be the number of subsets of \(\{p_1,\dots,p_k\}\) whose sum is \(s\). Then
$F_0(0)=1,\qquad F_0(s)=0 \text{ for } s>0,$
and for \(k\ge 1\), each subset counted by \(F_k(s)\) either omits \(p_k\) or includes it exactly once. That gives the recurrence
$F_k(s)=F_{k-1}(s)+F_{k-1}(s-p_k),$
where the second term is interpreted as 0 when \(s<p_k\). This is the whole mathematics behind the dynamic program: every valid subset of the first \(k\) primes appears in exactly one of those two disjoint cases.
Why one descending array is enough
The implementations do not store the full two-dimensional table. Instead, they keep a single array of coefficients and update it in descending order of \(s\). After the first \(k-1\) primes, suppose the array stores \(F_{k-1}(s)\). When processing \(p_k\), the update
$F(s+p_k)\leftarrow F(s+p_k)+F(s)\pmod{10^{16}}$
must read the old value of \(F(s)\), not a value already modified by the same prime. Sweeping \(s\) downward guarantees exactly that, so each prime is used either zero times or one time, never twice in the same stage.
A second invariant is that after the first \(k\) primes, all nonzero entries lie in
$0 \le s \le \sigma_k,\qquad \sigma_k=p_1+\cdots+p_k.$
This explains why the code only iterates over the currently reachable frontier instead of scanning the whole array after every prime.
Worked example: primes below 10
For the smaller set \(\{2,3,5,7\}\), the generating function is
$G(x)=(1+x^2)(1+x^3)(1+x^5)(1+x^7).$
Expanding it gives
$G(x)=1+x^2+x^3+2x^5+2x^7+x^8+x^9+2x^{10}+2x^{12}+x^{14}+x^{15}+x^{17}.$
The prime exponents are \(2,3,5,7,17\), with coefficients \(1,1,2,2,1\). Therefore the number of subsets with prime sum is
$1+1+2+2+1=7.$
This example also shows why we count subsets rather than distinct sums: the coefficient of \(x^5\) is 2 because both \(\{5\}\) and \(\{2,3\}\) produce the prime sum 5.
From subset counts to the final total
Once every coefficient \(F(s)\) has been computed modulo \(10^{16}\), the remaining task is only to know which indices \(s\) are prime. That is why the solution performs a second sieve up to \(S_{\max}\): it marks prime sums once, then adds the stored subset counts at exactly those indices.
How the Code Works
Building the prime input set
The C++, Python, and Java implementations first run a sieve up to 5000 to list the input primes. During that same pass they accumulate the total sum \(S_{\max}\), which determines the length of the dynamic-programming array.
Running the one-dimensional counting DP
The array starts with one nonzero entry: there is exactly one way to make sum 0 before any prime has been processed. Then each prime is folded in by a descending sweep over currently reachable sums. If a sum \(s\) already has \(c\) representations, then \(s+p\) gains another \(c\) representations by appending the new prime \(p\).
Every entry is reduced modulo \(10^{16}\) immediately, so the code never needs the astronomically large exact counts. In the fixed-width implementations this is especially convenient: each update combines two residues below \(10^{16}\), which comfortably fits in 64-bit arithmetic.
Filtering by prime totals
After the subset-count array has been completed, the implementation runs another sieve, now on the interval from 0 to \(S_{\max}\). This second sieve is about the possible subset sums, not about the input primes. A final linear pass adds the counts stored at prime indices only.
The small sanity cases are consistent across the implementations: using primes below 10 gives 7, and using primes below 20 gives 65. Those checks confirm that the recurrence and the prime-sum filter are aligned correctly.
Complexity Analysis
If \(m=\pi(5000)=669\) and \(S_{\max}=1{,}548{,}136\), the dynamic program costs \(O(mS_{\max})\) time and \(O(S_{\max})\) memory. The final sieve on subset sums costs \(O(S_{\max}\log\log S_{\max})\), which is smaller than the DP term at this scale.
So the problem is solved with one large but manageable one-dimensional table instead of enumerating \(2^{669}\) subsets. That is the decisive reduction: exponential search over subsets becomes polynomial counting over reachable sums.
Footnotes and References
- Project Euler problem page: Project Euler 249
- Subset sum problem: Wikipedia - Subset sum problem
- Dynamic programming: Wikipedia - Dynamic programming
- Generating function: Wikipedia - Generating function
- Sieve of Eratosthenes: Wikipedia - Sieve of Eratosthenes