Problem 467: Superinteger
View on Project EulerProject Euler Problem 467 Solution
EulerSolve provides an optimized solution for Project Euler Problem 467, Superinteger, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(p_1,p_2,\dots\) be the primes and \(c_1,c_2,\dots\) be the composite numbers. Define the digital-root sequences $P_n=(\operatorname{dr}(p_1),\dots,\operatorname{dr}(p_n)),\qquad C_n=(\operatorname{dr}(c_1),\dots,\operatorname{dr}(c_n)),$ where $\operatorname{dr}(x)=1+((x-1)\bmod 9).$ The task is to form the shortest decimal digit string that contains both \(P_n\) and \(C_n\) as subsequences. If several shortest strings exist, we take the lexicographically smallest one; because every digit lies in \(\{1,\dots,9\}\), this is also the numerically smallest shortest superinteger. The required output is that enormous integer modulo \(10^9+7\), and the implementations target \(n=10000\). Mathematical Approach Write $A=(a_1,\dots,a_n)=P_n,\qquad B=(b_1,\dots,b_n)=C_n.$ We must compute the lexicographically least shortest common supersequence of these two digit sequences. Step 1: Build the Two Digit Sequences The first stage is purely arithmetic. We scan the integers from \(2\) upward, separate primes from composites, and replace each accepted value by its digital root. Because $\operatorname{dr}(x)\in\{1,2,\dots,9\},$ both sequences consist only of ordinary decimal digits and contain no zeros. This matters later, because equal-length digit strings are ordered lexicographically exactly as their decimal integers are ordered....
Detailed mathematical approach
Problem Summary
Let \(p_1,p_2,\dots\) be the primes and \(c_1,c_2,\dots\) be the composite numbers. Define the digital-root sequences
$P_n=(\operatorname{dr}(p_1),\dots,\operatorname{dr}(p_n)),\qquad C_n=(\operatorname{dr}(c_1),\dots,\operatorname{dr}(c_n)),$
where
$\operatorname{dr}(x)=1+((x-1)\bmod 9).$
The task is to form the shortest decimal digit string that contains both \(P_n\) and \(C_n\) as subsequences. If several shortest strings exist, we take the lexicographically smallest one; because every digit lies in \(\{1,\dots,9\}\), this is also the numerically smallest shortest superinteger. The required output is that enormous integer modulo \(10^9+7\), and the implementations target \(n=10000\).
Mathematical Approach
Write
$A=(a_1,\dots,a_n)=P_n,\qquad B=(b_1,\dots,b_n)=C_n.$
We must compute the lexicographically least shortest common supersequence of these two digit sequences.
Step 1: Build the Two Digit Sequences
The first stage is purely arithmetic. We scan the integers from \(2\) upward, separate primes from composites, and replace each accepted value by its digital root. Because
$\operatorname{dr}(x)\in\{1,2,\dots,9\},$
both sequences consist only of ordinary decimal digits and contain no zeros. This matters later, because equal-length digit strings are ordered lexicographically exactly as their decimal integers are ordered.
Step 2: Reformulate the Problem as SCS
A shortest common supersequence (SCS) of \(A\) and \(B\) is a shortest string that contains both sequences in order, not necessarily contiguously. Every valid answer must preserve the internal order of \(A\) and the internal order of \(B\). If \(a_i=b_j\), one output digit can serve both sequences simultaneously; if they differ, the next output digit must come from either \(A\) or \(B\).
So the problem is not about decimal arithmetic first. It is a sequence-merging problem whose final merged digit string is interpreted as an integer only at the end.
Step 3: Dynamic Programming for the Optimal Length
Let \(L(i,j)\) be the length of the shortest common supersequence of the suffixes
$a_i a_{i+1}\dots a_n\qquad\text{and}\qquad b_j b_{j+1}\dots b_n,$
with the convention that \(L(n+1,n+1)=0\). The boundary cases are immediate:
$L(n+1,j)=n-j+1,\qquad L(i,n+1)=n-i+1.$
For interior states we have
$L(i,j)= \begin{cases} 1+L(i+1,j+1), & a_i=b_j,\\ 1+\min\{L(i+1,j),L(i,j+1)\}, & a_i\ne b_j. \end{cases}$
The recurrence is standard: matching digits are consumed together, while different digits force a choice of which sequence contributes the next output digit.
Step 4: Recover the Lexicographically Smallest Optimal Answer
The length table tells us which moves keep the answer shortest. Reconstruction then follows these rules:
1. If one sequence is exhausted, append the rest of the other sequence.
2. If \(a_i=b_j\), output that digit once and advance in both sequences.
3. If \(a_i\ne b_j\), compare \(L(i+1,j)\) and \(L(i,j+1)\).
If one option gives a smaller remaining length, that branch is forced. If both options preserve the optimal length, then both produce shortest supersequences, so the first digit decides lexicographic order immediately. Therefore we output \(\min(a_i,b_j)\). By induction on the remaining suffix length, this greedy tie-break yields the lexicographically smallest shortest supersequence.
Step 5: Evaluate the Huge Integer Modulo \(10^9+7\)
If the final digit string is \(d_1d_2\dots d_t\), its value modulo \(10^9+7\) can be accumulated online by
$V_0=0,\qquad V_{k+1}\equiv 10V_k+d_{k+1}\pmod{10^9+7}.$
This avoids constructing a huge big integer. The algorithm only needs the next chosen digit, so reconstruction and modular evaluation can happen in the same pass.
Worked Example: \(n=10\)
The first ten prime digital roots are
$P_{10}=(2,3,5,7,2,4,8,1,5,2),$
and the first ten composite digital roots are
$C_{10}=(4,6,8,9,1,3,5,6,7,9).$
The common subsequence \((4,8,1,5)\) already shows that the answer can be shorter than length \(20\), and the dynamic-programming table proves that the true optimum length is \(16\). After lexicographic tie-breaking, the shortest supersequence becomes
$2357246891352679.$
This is the checkpoint used by the implementations for \(f(10)\). They also verify that \(f(100)\equiv 771661825 \pmod{10^9+7}\).
How the Code Works
The C++, Python, and Java implementations follow the same pipeline. First they sieve enough integers to collect the first \(n\) primes and the first \(n\) composites, converting each accepted number to its digital root immediately. Next they fill an \((n+1)\times(n+1)\) dynamic-programming table from bottom right to top left so that every state already knows the optimal remaining length of its suffixes.
After that, the implementation reconstructs one digit at a time. Equal digits are merged, unequal digits consult the two neighboring table entries, and ties are broken by the smaller next digit. At the same moment, the answer is updated modulo \(10^9+7\). For small checkpoint cases the full digit string can also be materialized, but that is not needed for the final \(n=10000\) computation.
Complexity Analysis
Let \(M\) be the largest integer that must be sieved in order to collect \(n\) primes and \(n\) composites. Building primality information up to \(M\) costs \(O(M\log\log M)\) time and \(O(M)\) memory. Since the prime side is the sparse one, \(M\) is on the order of the \(n\)-th prime, so \(M=\Theta(n\log n)\).
The SCS stage dominates: filling the length table takes \(O(n^2)\) time and \(O(n^2)\) memory. Reconstruction adds only \(O(t)\) time, where \(t\le 2n\). For the target \(n=10000\), the quadratic dynamic programming is the main cost.
Footnotes and References
- Problem page: https://projecteuler.net/problem=467
- Digital root: Wikipedia — Digital root
- Shortest common supersequence: Wikipedia — Shortest common supersequence problem
- Sieve of Eratosthenes: Wikipedia — Sieve of Eratosthenes
- Dynamic programming: Wikipedia — Dynamic programming