Problem 714: Duodigits
View on Project EulerProject Euler Problem 714 Solution
EulerSolve provides an optimized solution for Project Euler Problem 714, Duodigits, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For each positive integer \(n\), define \(d(n)\) as the smallest positive multiple of \(n\) whose decimal expansion uses at most two distinct digits; such a number is called a duodigit. The task is to evaluate $D(k)=\sum_{n=1}^{k} d(n).$ A direct search through multiples of \(n\) is far too slow, because the first valid multiple can be much larger than \(n\) and can have many digits. The implementation instead rewrites the digit pattern as a modular subset-sum problem and searches by increasing decimal length. Mathematical Approach The key idea is to describe every candidate duodigit of a fixed length in a uniform algebraic way. That turns the divisibility test into a congruence and lets us search over residue classes instead of enumerating all decimal strings. Step 1: Write every candidate in a fixed two-digit form Fix a decimal length \(t\ge 1\) and a leading digit \(f\in\{1,\ldots,9\}\). Let the second digit be \(f+g\), where $-f \le g \le 9-f,\qquad g\ne 0.$ The repeated-digit backbone is the repunit-scaled number $R_t=1+10+\cdots+10^{t-1}=\frac{10^t-1}{9},\qquad fR_t=ff\ldots f.$ If \(S\subseteq \{0,1,\ldots,t-2\}\) is the set of non-leading positions where we replace \(f\) by \(f+g\), then the candidate number is $N=fR_t+g\sum_{j\in S}10^j.$ The leading position is excluded from \(S\), so the first digit stays equal to \(f\)....
Detailed mathematical approach
Problem Summary
For each positive integer \(n\), define \(d(n)\) as the smallest positive multiple of \(n\) whose decimal expansion uses at most two distinct digits; such a number is called a duodigit. The task is to evaluate
$D(k)=\sum_{n=1}^{k} d(n).$
A direct search through multiples of \(n\) is far too slow, because the first valid multiple can be much larger than \(n\) and can have many digits. The implementation instead rewrites the digit pattern as a modular subset-sum problem and searches by increasing decimal length.
Mathematical Approach
The key idea is to describe every candidate duodigit of a fixed length in a uniform algebraic way. That turns the divisibility test into a congruence and lets us search over residue classes instead of enumerating all decimal strings.
Step 1: Write every candidate in a fixed two-digit form
Fix a decimal length \(t\ge 1\) and a leading digit \(f\in\{1,\ldots,9\}\). Let the second digit be \(f+g\), where
$-f \le g \le 9-f,\qquad g\ne 0.$
The repeated-digit backbone is the repunit-scaled number
$R_t=1+10+\cdots+10^{t-1}=\frac{10^t-1}{9},\qquad fR_t=ff\ldots f.$
If \(S\subseteq \{0,1,\ldots,t-2\}\) is the set of non-leading positions where we replace \(f\) by \(f+g\), then the candidate number is
$N=fR_t+g\sum_{j\in S}10^j.$
The leading position is excluded from \(S\), so the first digit stays equal to \(f\). The special one-digit-value case is simply the empty choice \(S=\varnothing\), which gives \(N=fR_t\).
Step 2: Convert divisibility into one congruence
Let
$s(S)=\sum_{j\in S}10^j.$
Then \(n\mid N\) is equivalent to
$g\,s(S)\equiv -fR_t \pmod n.$
Write
$r\equiv -fR_t \pmod n,\qquad d=\gcd(g,n).$
This congruence has a solution only if \(d\mid r\). When that happens we divide by \(d\) and obtain
$\frac{g}{d}\,s(S)\equiv \frac{r}{d}\pmod{n/d}.$
Now \(\gcd(g/d,n/d)=1\), so \((g/d)^{-1}\) exists modulo \(n/d\). Therefore all valid residues are described by
$s(S)\equiv w_0 \pmod{n/d},\qquad w_0\equiv \frac{r}{d}\left(\frac{g}{d}\right)^{-1}\pmod{n/d}.$
Viewed modulo \(n\), that means the acceptable residues are
$w_0,\ w_0+\frac{n}{d},\ w_0+2\frac{n}{d},\ \ldots,\ w_0+(d-1)\frac{n}{d}.$
So the arithmetic test for a digit pair \(\{f,f+g\}\) reduces to a short list of target residue classes.
Step 3: Split the lower positions into two residue collections
For length \(t\), the adjustable place values are \(1,10,\ldots,10^{t-2}\). The implementation assigns these powers of ten to two groups, say \(X\) and \(Y\), and treats the chosen set as a union \(S=A\cup B\) with \(A\subseteq X\) and \(B\subseteq Y\).
For each half we care about two quantities:
$\rho(A)\equiv \sum_{j\in A}10^j \pmod n,\qquad \sigma(A)=\sum_{j\in A}10^j,$
and similarly for \(B\). Then
$s(S)=\sigma(A)+\sigma(B),\qquad s(S)\equiv \rho(A)+\rho(B)\pmod n.$
This is a meet-in-the-middle decomposition: if a target residue \(w\) is required, we only need residues \(x\) from the first half and \(y\) from the second half such that
$x+y\equiv w\pmod n.$
Step 4: Why each residue class only needs a minimum and a maximum
For fixed \(t\), \(f\), and \(g\), the base term \(fR_t\) is constant, so minimizing \(N\) is the same as optimizing \(s(S)\) in
$N=fR_t+g\,s(S).$
If \(g\gt 0\), a smaller subset sum gives a smaller number, so we only need the minimum value of \(\sigma\) for each residue. If \(g\lt 0\), a larger subset sum gives a smaller number, so we only need the maximum value of \(\sigma\) for each residue.
That is why each residue class stores just two extremal values: the smallest attainable subset sum and the largest attainable subset sum. Full lists of subsets are unnecessary.
Step 5: Grow the search one decimal place at a time
The search proceeds through lengths \(t=1,2,3,\ldots\). At each new length, the repeated-digit backbone \(fR_t\) is updated, and one more lower place value \(10^{t-1}\) becomes available for future subset choices.
When a new weight \(10^j\) is added to one residue collection, every existing subset either keeps that weight out or includes it. So each stored residue \(u\) generates a shifted residue
$u+10^j \pmod n,$
and the corresponding subset sum increases by \(10^j\). The implementation inserts the new place value into whichever side is cheaper to expand, which keeps the two tables practical while preserving exactly the same set of possible totals.
There is also a useful special case: if \(10\mid n\), then every multiple of \(n\) must end in \(0\), so the second digit must be \(0\). In the formula above that means \(g=-f\), and all other offsets can be skipped.
Step 6: Worked examples
For \(n=12\), take length \(t=2\), leading digit \(f=1\), and offset \(g=1\). Then \(R_2=11\), and choosing \(S=\{0\}\) gives
$N=1\cdot 11+1\cdot 1=12.$
In congruence form,
$1\cdot s(S)\equiv -11\equiv 1\pmod{12},$
so \(s(S)=1\) works immediately.
For \(n=103\), a negative offset is needed. Take \(t=3\), \(f=5\), and \(g=-4\). With \(S=\{1\}\),
$N=5\cdot 111-4\cdot 10=555-40=515,$
and indeed \(515=5\cdot 103\). This illustrates why the search must support both \(g\gt 0\) and \(g\lt 0\).
How the Code Works
The C++ implementation performs the actual number-theoretic search. For each admissible digit offset \(g\in\{-9,\ldots,-1,1,\ldots,9\}\), it precomputes the gcd reduction data and the modular inverse needed to solve the congruence modulo \(n/d\). That lets it reject impossible digit pairs quickly and turn every possible pair into a small list of target residues.
Two residue tables are maintained. Each table starts with the empty subset, and whenever a new lower place value \(10^j\) is introduced, the chosen table updates all of its reachable residues by considering both possibilities: exclude \(10^j\) or include \(10^j\). For every residue class modulo \(n\), the table keeps only the smallest and largest actual subset sums that realize that residue.
For the current length \(t\) and leading digit \(f\), the implementation first tests the repeated-digit number \(fR_t\). If that does not work, it tries every valid second digit. For each admissible target residue \(w\), it joins the two tables by checking which residue \(x\) from the first table can be paired with residue \(w-x\) from the second. The stored extremal subset sums then reconstruct the smallest numeric candidate for that sign of \(g\).
After \(d(n)\) has been found, the C++ implementation adds it to the running total. For large \(k\), the range \(1,\ldots,k\) is split into chunks and processed in parallel by worker threads, then the partial sums are added exactly. The Python and Java implementations do not rederive the mathematics; they invoke the same compiled core and parse its printed result.
Complexity Analysis
For a fixed \(n\), each residue table has at most \(n\) residue classes. Adding one new power of ten to a table therefore costs \(O(n)\) time in the worst case, because every currently reachable residue may create one shifted state. The join step also works over residue classes rather than over all subsets.
If the smallest duodigit multiple of \(n\) has length \(t_n\), the search for that \(n\) uses \(O(t_n n)\) memory-time scale up to small constant factors coming from the finite set of digit offsets. This is dramatically smaller than a naive exploration of all \(2^{t_n-1}\) subsets of lower positions.
For the outer sum, the total work is
$\sum_{n=1}^{k} O(t_n n),$
and the implementation parallelizes this sum across CPU cores because the values \(d(1),d(2),\ldots,d(k)\) are independent.
Footnotes and References
- Problem page: Project Euler 714
- Repunit numbers: Wikipedia — Repunit
- Modular arithmetic: Wikipedia — Modular arithmetic
- Modular multiplicative inverse: Wikipedia — Modular multiplicative inverse
- Subset-sum background: Wikipedia — Subset sum problem