Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 467: Superinteger

Project Euler Problem 467 Solution

EulerSolve provides an optimized solution for Project Euler Problem 467, Superinteger, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let $p_1,p_2,\dots$ be the primes and $c_1,c_2,\dots$ be the composite numbers. Define the digital-root sequences $P_n=(\operatorname{dr}(p_1),\dots,\operatorname{dr}(p_n)),\qquad C_n=(\operatorname{dr}(c_1),\dots,\operatorname{dr}(c_n)),$ where $\operatorname{dr}(x)=1+((x-1)\bmod 9).$ The task is to form the shortest decimal digit string that contains both $P_n$ and $C_n$ as subsequences. If several shortest strings exist, we take the lexicographically smallest one; because every digit lies in $\{1,\dots,9\}$, this is also the numerically smallest shortest superinteger. The required output is that enormous integer modulo $10^9+7$, and the implementations target $n=10000$. Mathematical Approach Write $A=(a_1,\dots,a_n)=P_n,\qquad B=(b_1,\dots,b_n)=C_n.$ We must compute the lexicographically least shortest common supersequence of these two digit sequences. Step 1: Build the Two Digit Sequences The first stage is purely arithmetic. We scan the integers from $2$ upward, separate primes from composites, and replace each accepted value by its digital root. Because $\operatorname{dr}(x)\in\{1,2,\dots,9\},$ both sequences consist only of ordinary decimal digits and contain no zeros. This matters later, because equal-length digit strings are ordered lexicographically exactly as their decimal integers are ordered....

Detailed mathematical approach

Problem Summary

Let $p_1,p_2,\dots$ be the primes and $c_1,c_2,\dots$ be the composite numbers. Define the digital-root sequences

$P_n=(\operatorname{dr}(p_1),\dots,\operatorname{dr}(p_n)),\qquad C_n=(\operatorname{dr}(c_1),\dots,\operatorname{dr}(c_n)),$

where

$\operatorname{dr}(x)=1+((x-1)\bmod 9).$

The task is to form the shortest decimal digit string that contains both $P_n$ and $C_n$ as subsequences. If several shortest strings exist, we take the lexicographically smallest one; because every digit lies in $\{1,\dots,9\}$, this is also the numerically smallest shortest superinteger. The required output is that enormous integer modulo $10^9+7$, and the implementations target $n=10000$.

Mathematical Approach

Write

$A=(a_1,\dots,a_n)=P_n,\qquad B=(b_1,\dots,b_n)=C_n.$

We must compute the lexicographically least shortest common supersequence of these two digit sequences.

Step 1: Build the Two Digit Sequences

The first stage is purely arithmetic. We scan the integers from $2$ upward, separate primes from composites, and replace each accepted value by its digital root. Because

$\operatorname{dr}(x)\in\{1,2,\dots,9\},$

both sequences consist only of ordinary decimal digits and contain no zeros. This matters later, because equal-length digit strings are ordered lexicographically exactly as their decimal integers are ordered.

Step 2: Reformulate the Problem as SCS

A shortest common supersequence (SCS) of $A$ and $B$ is a shortest string that contains both sequences in order, not necessarily contiguously. Every valid answer must preserve the internal order of $A$ and the internal order of $B$. If $a_i=b_j$, one output digit can serve both sequences simultaneously; if they differ, the next output digit must come from either $A$ or $B$.

So the problem is not about decimal arithmetic first. It is a sequence-merging problem whose final merged digit string is interpreted as an integer only at the end.

Step 3: Dynamic Programming for the Optimal Length

Let $L(i,j)$ be the length of the shortest common supersequence of the suffixes

$a_i a_{i+1}\dots a_n\qquad\text{and}\qquad b_j b_{j+1}\dots b_n,$

with the convention that $L(n+1,n+1)=0$. The boundary cases are immediate:

$L(n+1,j)=n-j+1,\qquad L(i,n+1)=n-i+1.$

For interior states we have

$L(i,j)= \begin{cases} 1+L(i+1,j+1), & a_i=b_j,\\ 1+\min\{L(i+1,j),L(i,j+1)\}, & a_i\ne b_j. \end{cases}$

The recurrence is standard: matching digits are consumed together, while different digits force a choice of which sequence contributes the next output digit.

Step 4: Recover the Lexicographically Smallest Optimal Answer

The length table tells us which moves keep the answer shortest. Reconstruction then follows these rules:

1. If one sequence is exhausted, append the rest of the other sequence.

2. If $a_i=b_j$, output that digit once and advance in both sequences.

3. If $a_i\ne b_j$, compare $L(i+1,j)$ and $L(i,j+1)$.

If one option gives a smaller remaining length, that branch is forced. If both options preserve the optimal length, then both produce shortest supersequences, so the first digit decides lexicographic order immediately. Therefore we output $\min(a_i,b_j)$. By induction on the remaining suffix length, this greedy tie-break yields the lexicographically smallest shortest supersequence.

Step 5: Evaluate the Huge Integer Modulo $10^9+7$

If the final digit string is $d_1d_2\dots d_t$, its value modulo $10^9+7$ can be accumulated online by

$V_0=0,\qquad V_{k+1}\equiv 10V_k+d_{k+1}\pmod{10^9+7}.$

This avoids constructing a huge big integer. The algorithm only needs the next chosen digit, so reconstruction and modular evaluation can happen in the same pass.

Worked Example: $n=10$

The first ten prime digital roots are

$P_{10}=(2,3,5,7,2,4,8,1,5,2),$

and the first ten composite digital roots are

$C_{10}=(4,6,8,9,1,3,5,6,7,9).$

The common subsequence $(4,8,1,5)$ already shows that the answer can be shorter than length $20$, and the dynamic-programming table proves that the true optimum length is $16$. After lexicographic tie-breaking, the shortest supersequence becomes

$2357246891352679.$

This is the checkpoint used by the implementations for $f(10)$. They also verify that $f(100)\equiv 771661825 \pmod{10^9+7}$.

How the Code Works

The C++, Python, and Java implementations follow the same pipeline. First they sieve enough integers to collect the first $n$ primes and the first $n$ composites, converting each accepted number to its digital root immediately. Next they fill an $(n+1)\times(n+1)$ dynamic-programming table from bottom right to top left so that every state already knows the optimal remaining length of its suffixes.

After that, the implementation reconstructs one digit at a time. Equal digits are merged, unequal digits consult the two neighboring table entries, and ties are broken by the smaller next digit. At the same moment, the answer is updated modulo $10^9+7$. For small checkpoint cases the full digit string can also be materialized, but that is not needed for the final $n=10000$ computation.

Complexity Analysis

Let $M$ be the largest integer that must be sieved in order to collect $n$ primes and $n$ composites. Building primality information up to $M$ costs $O(M\log\log M)$ time and $O(M)$ memory. Since the prime side is the sparse one, $M$ is on the order of the $n$-th prime, so $M=\Theta(n\log n)$.

The SCS stage dominates: filling the length table takes $O(n^2)$ time and $O(n^2)$ memory. Reconstruction adds only $O(t)$ time, where $t\le 2n$. For the target $n=10000$, the quadratic dynamic programming is the main cost.

Footnotes and References

Problem page: https://projecteuler.net/problem=467
Digital root: Wikipedia — Digital root
Shortest common supersequence: Wikipedia — Shortest common supersequence problem
Sieve of Eratosthenes: Wikipedia — Sieve of Eratosthenes
Dynamic programming: Wikipedia — Dynamic programming