Problem 529: $10$-substrings
View on Project EulerProject Euler Problem 529 Solution
EulerSolve provides an optimized solution for Project Euler Problem 529, $10$-substrings, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary A decimal string \(d_1d_2\dots d_\ell\) with \(d_1\ne 0\) is valid if every position belongs to at least one contiguous substring whose digit sum is \(10\). Let \(a_\ell\) be the number of valid strings of length \(\ell\), and define $T(n)=\sum_{\ell=1}^{n} a_\ell \pmod{10^9+7}.$ The target input is enormous, so the implementation cannot iterate lengths up to \(n\) directly. The solution turns the coverage condition into a finite automaton, extracts a linear recurrence for \(T(n)\), and evaluates that recurrence at very large indices. Mathematical Approach Write the prefix sums of the digits as \(P_0=0\) and \(P_t=d_1+\cdots+d_t\). A substring \(d_i\dots d_j\) has digit sum \(10\) exactly when $P_j-P_{i-1}=10.$ The whole method is built around scanning the string from left to right while tracking only the information that can still matter for future sum-\(10\) substrings. Step 1: Track the Leftmost Unresolved Position Suppose we have processed digits up to position \(j\), and let \(c\) be the first position that is not yet guaranteed to be covered by a sum-\(10\) substring. Then the unresolved window is \(d_c\dots d_j\), whose total digit sum is $\Delta=P_j-P_{c-1}.$ All digits are nonnegative, so \(\Delta\) can only increase as we append more digits....
Detailed mathematical approach
Problem Summary
A decimal string \(d_1d_2\dots d_\ell\) with \(d_1\ne 0\) is valid if every position belongs to at least one contiguous substring whose digit sum is \(10\). Let \(a_\ell\) be the number of valid strings of length \(\ell\), and define
$T(n)=\sum_{\ell=1}^{n} a_\ell \pmod{10^9+7}.$
The target input is enormous, so the implementation cannot iterate lengths up to \(n\) directly. The solution turns the coverage condition into a finite automaton, extracts a linear recurrence for \(T(n)\), and evaluates that recurrence at very large indices.
Mathematical Approach
Write the prefix sums of the digits as \(P_0=0\) and \(P_t=d_1+\cdots+d_t\). A substring \(d_i\dots d_j\) has digit sum \(10\) exactly when
$P_j-P_{i-1}=10.$
The whole method is built around scanning the string from left to right while tracking only the information that can still matter for future sum-\(10\) substrings.
Step 1: Track the Leftmost Unresolved Position
Suppose we have processed digits up to position \(j\), and let \(c\) be the first position that is not yet guaranteed to be covered by a sum-\(10\) substring. Then the unresolved window is \(d_c\dots d_j\), whose total digit sum is
$\Delta=P_j-P_{c-1}.$
All digits are nonnegative, so \(\Delta\) can only increase as we append more digits. Therefore, if \(\Delta>10\), no future extension can create a sum-\(10\) substring containing position \(c\), and that branch is impossible. This gives the crucial bound
$0\le \Delta \le 10.$
Step 2: Record Only Finite Data
Inside the current unresolved window we keep the set of partial sums
$S=\{P_t-P_{c-1}: c-1\le t\le j\}\subseteq\{0,1,\dots,\Delta\}.$
These values describe every internal boundary of the unresolved window. We also keep a second set
$H\subseteq\{0,1,\dots,10\},$
whose elements are offsets from earlier prefix positions that can still serve as left boundaries of a future sum-\(10\) substring. When a new digit \(x\) is appended, the unresolved total becomes
$\Delta'=\Delta+x.$
If \(\Delta'>10\), the transition is rejected. Otherwise we enlarge the partial-sum set to
$S'=S\cup\{\Delta'\}.$
Step 3: Detect When the Unresolved Window Closes
The new right endpoint closes the whole unresolved window exactly when there is a complementary historical offset:
$10-\Delta' \in H.$
In that case some earlier prefix position \(u\) satisfies \(P_{c-1}-P_u=10-\Delta'\), so
$P_{j+1}-P_u=10.$
Hence the substring from \(u+1\) to \(j+1\) has digit sum \(10\) and covers every position from \(c\) through \(j+1\). After this closure, the unresolved window becomes empty again, so the window data resets to
$S_{\text{new}}=\{0\}.$
The historical offsets relative to the new boundary are updated by translation and by every internal cut of the closed window:
$H_{\text{new}}=\{h+\Delta' : h\in H,\ h+\Delta'\le 10\}\cup\{\Delta'-s : s\in S'\}.$
Step 4: Finite Automaton and Acceptance
Because both \(H\) and \(S\) live inside \(\{0,1,\dots,10\}\), only finitely many states can ever occur. Each appended digit gives a deterministic state transition. A state is accepting exactly when no unresolved window remains, which in this encoding means
$S=\{0\}.$
So counting valid strings of length \(\ell\) becomes counting length-\(\ell\) paths in a deterministic finite automaton. The first digit uses the alphabet \(\{1,\dots,9\}\), while all later digits use \(\{0,\dots,9\}\).
Step 5: From the Automaton to a Linear Recurrence
Let \(a_\ell\) be the number of accepting paths of length \(\ell\). Dynamic programming on the minimized automaton gives the initial values of \(a_\ell\) and therefore of
$T(n)=\sum_{\ell=1}^{n} a_\ell.$
Any fixed finite automaton induces repeated matrix multiplication modulo \(10^9+7\), so the cumulative sequence \(T(0),T(1),T(2),\dots\) satisfies a linear recurrence. The implementation samples enough initial terms, recovers the shortest recurrence with Berlekamp-Massey, and then evaluates the \(n\)-th term with polynomial binary exponentiation.
Worked Example
Consider the string \(352\). Its prefix sums are \(0,3,8,10\). While reading it, the unresolved total moves from \(0\) to \(3\), then to \(8\), then to \(10\), never exceeding the allowed bound. At the last digit we have \(10-\Delta'=0\), and the initial historical set already contains \(0\), so the whole string closes as one sum-\(10\) block. Therefore \(352\) is valid.
As a small counting check, no length-\(1\) string can be valid, so \(a_1=0\). For length \(2\), the only valid strings are \(19,28,37,46,55,64,73,82,91\), giving
$a_2=9,\qquad T(2)=9.$
How the Code Works
The C++, Python, and Java implementations use the same mathematical pipeline. The compiled solver first enumerates every reachable automaton state from the start configuration by repeatedly applying the digit-transition rule described above. After that, it minimizes the raw automaton by partition refinement so that equivalent states are merged without changing the accepted language.
Next, the implementation runs dynamic programming on the minimized automaton. The first step allows digits \(1\) through \(9\); each later step allows \(0\) through \(9\). Summing the counts in accepting states yields \(a_\ell\), and cumulative sums yield \(T(\ell)\). The code generates \(2s+5\) initial values when the minimized automaton has \(s\) states, which is enough to recover the recurrence used in practice.
Finally, Berlekamp-Massey extracts the shortest linear recurrence for the cumulative sequence modulo \(10^9+7\). The solver then evaluates that recurrence at index \(n\) using Kitamasa-style polynomial reduction with binary exponentiation. The Python entry point delegates to the compiled solver, so its numerical behavior matches the compiled versions.
Complexity Analysis
Let \(s_{\text{raw}}\) be the number of reachable raw states, \(s\) the number of minimized states, and \(r\) the recovered recurrence order. Building the reachable raw automaton requires \(O(10s_{\text{raw}})\) transitions. Minimization is near-linear in the transition graph, typically written as \(O(10s_{\text{raw}}\log s_{\text{raw}})\).
The initial dynamic programming phase computes \(2s+5\) terms, so it costs \(O(10s(2s+5))=O(s^2)\) time. Once the recurrence is known, the large-index evaluation costs
$O(r^2\log n)$
time and \(O(r)\) extra memory, on top of the automaton tables.