Problem 106: Special Subset Sums: Meta-testing
View on Project EulerProject Euler Problem 106 Solution
EulerSolve provides an optimized solution for Project Euler Problem 106, Special Subset Sums: Meta-testing, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(A=\{a_1<a_2<\cdots<a_{12}\}\) be a candidate special sum set. The second special-sum rule already says that whenever one disjoint subset has more elements than another, its sum must be larger. Therefore the only remaining ambiguity comes from two disjoint subsets \(B\) and \(C\) with the same size. Problem 106 asks how many unordered pairs \((B,C)\) still need an explicit test for \(S(B)\neq S(C)\). The count depends only on the relative positions inside the ordered set, and for \(n=12\) the total is \(21384\). Mathematical Approach The implementations work entirely with indices \(0,1,\dots,n-1\), not with the actual set values. That is the key invariant of the problem: once the set is strictly increasing, the question of whether two equal-size subsets are automatically ordered depends only on how their positions interleave. Why only equal-size pairs survive If \(|B|\neq |C|\), the second special-sum condition already rules out equality. If \(|B|=|C|=1\), equality is impossible because all elements are distinct....
Detailed mathematical approach
Problem Summary
Let \(A=\{a_1<a_2<\cdots<a_{12}\}\) be a candidate special sum set. The second special-sum rule already says that whenever one disjoint subset has more elements than another, its sum must be larger. Therefore the only remaining ambiguity comes from two disjoint subsets \(B\) and \(C\) with the same size. Problem 106 asks how many unordered pairs \((B,C)\) still need an explicit test for \(S(B)\neq S(C)\). The count depends only on the relative positions inside the ordered set, and for \(n=12\) the total is \(21384\).
Mathematical Approach
The implementations work entirely with indices \(0,1,\dots,n-1\), not with the actual set values. That is the key invariant of the problem: once the set is strictly increasing, the question of whether two equal-size subsets are automatically ordered depends only on how their positions interleave.
Why only equal-size pairs survive
If \(|B|\neq |C|\), the second special-sum condition already rules out equality. If \(|B|=|C|=1\), equality is impossible because all elements are distinct. So the only nontrivial cases are
$|B|=|C|=k,\qquad 2\le k\le \left\lfloor\frac{n}{2}\right\rfloor.$
Reducing the problem to index patterns
Write the two subsets in sorted order:
$B=\{b_1<b_2<\cdots<b_k\},\qquad C=\{c_1<c_2<\cdots<c_k\}.$
If \(b_i<c_i\) for every \(i\), then each element chosen from \(B\) is smaller than the corresponding element chosen from \(C\), so
$a_{b_1}+a_{b_2}+\cdots+a_{b_k}<a_{c_1}+a_{c_2}+\cdots+a_{c_k}.$
The same argument works with the roles reversed when \(c_i<b_i\) for every \(i\). In either situation, the inequality between subset sums is automatic and no equality test is needed. A pair needs testing exactly when the coordinatewise comparison crosses, meaning that one position favors \(B\) and another favors \(C\).
Counting all disjoint pairs of size \(k\)
First choose the \(2k\) positions involved, then split them into two unlabeled \(k\)-subsets. This gives
$\binom{n}{2k}\cdot\frac{1}{2}\binom{2k}{k}$
unordered disjoint pairs with \(|B|=|C|=k\).
Which of those pairs are automatic?
Fix one chosen set of positions \(x_1<x_2<\cdots<x_{2k}\). Any split into two \(k\)-subsets can be encoded by a word of length \(2k\): write \(B\) if \(x_j\) goes into the first subset and \(C\) if it goes into the second. To count each unordered pair once, orient it so that \(x_1\in B\).
Under that orientation, the condition \(b_i<c_i\) for all \(i\) is equivalent to saying that in every prefix of the word, the number of \(B\)'s is at least the number of \(C\)'s. Before the first \(C\) appears there must already be one \(B\); before the second \(C\) appears there must already be two \(B\)'s; and so on. Those prefix-balanced words are Dyck words of semilength \(k\), so their count is the Catalan number
$C_k=\frac{1}{k+1}\binom{2k}{k}.$
Therefore, for each fixed set of \(2k\) positions, the number of unordered pairs that are automatically ordered is \(C_k\), and the number that really need testing is
$\frac{1}{2}\binom{2k}{k}-C_k=\frac{k-1}{2(k+1)}\binom{2k}{k}.$
Worked example: the \(k=2\) pattern
Take four ordered positions \(x_1<x_2<x_3<x_4\). There are exactly three unordered splits into two pairs:
$\{x_1,x_2\}\mid\{x_3,x_4\},\qquad \{x_1,x_3\}\mid\{x_2,x_4\},\qquad \{x_1,x_4\}\mid\{x_2,x_3\}.$
The first two are automatic because one sorted pair is pointwise smaller than the other. The third is the only crossing configuration, so it is the only one that needs a genuine sum comparison. This matches
$\frac{1}{2}\binom{4}{2}-C_2=3-2=1,$
which is also the checkpoint value for \(n=4\).
Final formula for Problem 106
Summing over all feasible subset sizes gives
$T(n)=\sum_{k=2}^{\lfloor n/2\rfloor}\binom{n}{2k}\left(\frac{1}{2}\binom{2k}{k}-\frac{1}{k+1}\binom{2k}{k}\right).$
For \(n=12\), the contributions are
$\binom{12}{4}(3-2)=495,\qquad \binom{12}{6}(10-5)=4620,\qquad \binom{12}{8}(35-14)=10395,$
$\binom{12}{10}(126-42)=5544,\qquad \binom{12}{12}(462-132)=330,$
so
$T(12)=495+4620+10395+5544+330=21384.$
How the Code Works
The C++, Python, and Java implementations do not use the Catalan closed form directly. Instead, they compute the same quantity by explicit enumeration, which is still practical because the target instance is only \(n=12\).
Generating all \(k\)-subsets
For each \(k=2,3,\dots,\lfloor n/2\rfloor\), the implementation generates every \(k\)-subset of \(\{0,1,\dots,n-1\}\). The C++ version represents subsets through integer masks, the Python version uses combinations, and the Java version builds arrays recursively. In every case the stored subset is already sorted.
Filtering the pairs that really matter
The nested loops inspect each unordered pair of generated subsets. First they discard any pair that is not disjoint. For the remaining pairs, they compare the sorted coordinates position by position. If all coordinates of one subset are smaller than the corresponding coordinates of the other, the pair is automatically ordered and is skipped. Only the crossing cases increase the counter.
Built-in sanity checks
One implementation also checks the smaller cases \(n=4\) and \(n=7\) before evaluating \(n=12\). The expected counts are \(1\) and \(70\), and the same formula above gives
$T(7)=\binom{7}{4}(3-2)+\binom{7}{6}(10-5)=35+35=70.$
Complexity Analysis
Let \(M_k=\binom{n}{k}\). For a fixed \(k\), the direct enumeration examines \(\binom{M_k}{2}\) unordered pairs of \(k\)-subsets. The coordinatewise comparison costs \(O(k)\), and the overlap test is at worst \(O(k^2)\) in the tuple and array based versions, so a safe bound for the implemented approach is
$O\!\left(\sum_{k=2}^{\lfloor n/2\rfloor}\binom{n}{k}^2\,k^2\right).$
Memory usage is dominated by storing all \(k\)-subsets for the current size, namely \(O\!\left(\binom{n}{k}k\right)\), with the largest layer near \(k=n/2\). For the actual input \(n=12\), these costs are tiny, which is why the implementations can afford brute-force enumeration even though the mathematics collapses to a short summation.
Footnotes and References
- Problem page: https://projecteuler.net/problem=106
- Catalan number: Wikipedia - Catalan number
- Binomial coefficient: Wikipedia - Binomial coefficient
- Dyck path: Wikipedia - Dyck path