Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 316: Numbers in Decimal Expansions

View on Project Euler

Project Euler Problem 316 Solution

EulerSolve provides an optimized solution for Project Euler Problem 316, Numbers in Decimal Expansions, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let $p=p_1p_2p_3\cdots$ be an infinite random decimal stream, where each digit is chosen independently and uniformly from \(\{0,\dots,9\}\). For a positive integer \(n\) with decimal representation $s=s_1s_2\cdots s_d,$ let \(k\) be the first position where the block \(s\) appears in the stream. The problem asks for $\sum_{m=2}^{999999} g\!\left(\left\lfloor\frac{10^{16}}{m}\right\rfloor\right),$ where \(g(n)=\mathbb E[k]\). Mathematical Approach 1) End position is more convenient than start position. If the first occurrence of \(s\) starts at position \(k\), then it ends at $T_{\mathrm{end}}=k+d-1.$ So once we know the expected ending index, we recover the required quantity by $g(n)=\mathbb E[T_{\mathrm{end}}]-d+1.$ This is why the code first computes an expected end position and only subtracts \(d-1\) at the very end. 2) KMP states give the exact Markov chain. State \(i\in\{0,\dots,d\}\) means that the longest suffix of the digits seen so far equals the prefix \(s_1\cdots s_i\). State \(d\) is absorbing because the pattern has already appeared. If \(E_i\) is the expected number of additional digits needed to reach state \(d\), then the exact linear system is $E_d=0,$ $E_i=1+\frac1{10}\sum_{x=0}^{9} E_{\delta(i,x)} \qquad (0\le i\lt d),$ where \(\delta(i,x)\) is the usual KMP transition after reading digit \(x\)....

Detailed mathematical approach

Problem Summary

Let

$p=p_1p_2p_3\cdots$

be an infinite random decimal stream, where each digit is chosen independently and uniformly from \(\{0,\dots,9\}\).

For a positive integer \(n\) with decimal representation

$s=s_1s_2\cdots s_d,$

let \(k\) be the first position where the block \(s\) appears in the stream. The problem asks for

$\sum_{m=2}^{999999} g\!\left(\left\lfloor\frac{10^{16}}{m}\right\rfloor\right),$

where \(g(n)=\mathbb E[k]\).

Mathematical Approach

1) End position is more convenient than start position.

If the first occurrence of \(s\) starts at position \(k\), then it ends at

$T_{\mathrm{end}}=k+d-1.$

So once we know the expected ending index, we recover the required quantity by

$g(n)=\mathbb E[T_{\mathrm{end}}]-d+1.$

This is why the code first computes an expected end position and only subtracts \(d-1\) at the very end.

2) KMP states give the exact Markov chain.

State \(i\in\{0,\dots,d\}\) means that the longest suffix of the digits seen so far equals the prefix \(s_1\cdots s_i\). State \(d\) is absorbing because the pattern has already appeared.

If \(E_i\) is the expected number of additional digits needed to reach state \(d\), then the exact linear system is

$E_d=0,$

$E_i=1+\frac1{10}\sum_{x=0}^{9} E_{\delta(i,x)} \qquad (0\le i\lt d),$

where \(\delta(i,x)\) is the usual KMP transition after reading digit \(x\).

So the problem is really an absorbing Markov chain on the prefix-function automaton.

3) The whole system collapses to the border chain.

Let

$d=\ell_0 \gt \ell_1 \gt \ell_2 \gt \cdots \gt \ell_t \gt 0$

be the lengths obtained by starting at \(d\) and repeatedly following the KMP failure link

$\ell_{j+1}=\pi[\ell_j-1].$

These are exactly the lengths of the pattern itself and all its non-empty proper borders.

The standard exact solution of the KMP expectation system is

$\mathbb E[T_{\mathrm{end}}]=\sum_{j=0}^{t}10^{\ell_j}.$

Equivalently,

$g(n)=\sum_{j=0}^{t}10^{\ell_j}-d+1.$

4) Why borders are the only overlaps that matter.

If the word had no self-overlap, every fresh full hit would need an average of \(10^d\) ending positions, so we would simply get \(\mathbb E[T_{\mathrm{end}}]=10^d\).

What changes this is overlap. After reading many digits, the automaton may fail to complete the word, yet still keep a suffix that is also a prefix of the pattern. The only suffix lengths that can survive in this way are the border lengths found by the prefix-function chain.

Those recyclable overlaps contribute the extra terms \(10^{\ell_1},10^{\ell_2},\dots\). No other state survives in the final closed form. That is why the implementation does not solve a dense linear system; it only walks the border chain.

5) Worked example: \(n=535\).

The decimal word is \(s=\texttt{535}\), so \(d=3\). Its prefix function is

$\pi=[0,0,1],$

hence the border chain is

$3\to1\to0.$

Therefore

$\mathbb E[T_{\mathrm{end}}]=10^3+10^1=1010,$

and

$g(535)=1010-3+1=1008.$

This matches the value given in the statement and the checkpoint in the C++ code.

6) The Markov equations for \(535\).

If we really write the state equations, with states \(0,1,2,3\) for matched prefixes \(\emptyset,\texttt{5},\texttt{53},\texttt{535}\), we obtain

$E_3=0,$

$E_2=1+\frac9{10}E_0+\frac1{10}E_3=1+\frac9{10}E_0,$

$E_1=1+\frac8{10}E_0+\frac1{10}E_1+\frac1{10}E_2,$

$E_0=1+\frac9{10}E_0+\frac1{10}E_1.$

Solving these gives

$E_0=1010.$

So the border formula is not a heuristic shortcut; it is the exact closed form of the full automaton system.

7) Another quick example: a word with no border.

For \(s=\texttt{12}\), there is no non-empty proper border. The chain contains only \(\ell_0=2\), hence

$\mathbb E[T_{\mathrm{end}}]=10^2=100,\qquad g(12)=100-2+1=99.$

This is exactly what we expect for a two-digit block with no overlap bonus.

8) Why \(g(n)\) is automatically an integer.

The formula uses only integer powers of \(10\) and an integer correction \(d-1\):

$g(n)=\left(\sum_j 10^{\ell_j}\right)-d+1.$

So the integrality of \(g(n)\), which looks surprising in the problem statement, becomes completely transparent.

9) What the code actually does.

For each query value

$q=\left\lfloor\frac{10^{16}}{m}\right\rfloor,$

the solver:

a) converts \(q\) to its decimal string \(s\),

b) builds the KMP prefix function \(\pi\) in \(O(d)\),

c) walks the chain

$d,\ \pi[d-1],\ \pi[\pi[d-1]-1],\dots$

and adds the precomputed powers \(10^{\ell}\),

d) subtracts \(d-1\) to turn expected end position into expected start position.

This is exactly the loop for (len = d; len > 0; len = pi[len - 1]) expected_end += pow10[len];.

Algorithm

1) Precompute \(10^0,10^1,\dots,10^{19}\).

2) For each \(m=2,\dots,999999\), set \(q=\lfloor 10^{16}/m\rfloor\).

3) Build the prefix function of the decimal expansion of \(q\).

4) Sum \(10^{\ell}\) over the full border chain.

5) Convert expected end index to \(g(q)\) by subtracting \(d-1\).

6) Accumulate the result in a 128-bit integer.

Complexity Analysis

There are

$Q=999998$

queries, and each queried number has at most \(16\) decimal digits. Therefore the total running time is

$O\!\left(\sum_{m=2}^{999999} d_m\right)=O(16Q),$

which is effectively linear in the number of queries. Memory per query is \(O(d)\), again at most \(16\) integers.

Checks And Final Result

The implementation checks the two values stated by the problem:

$g(535)=1008,$

$\sum_{n=2}^{999} g\!\left(\left\lfloor\frac{10^6}{n}\right\rfloor\right)=27280188.$

The final answer produced by the program is

$542934735751917735.$

Further Reading

  1. Problem page: https://projecteuler.net/problem=316
  2. Knuth-Morris-Pratt algorithm: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
  3. Prefix function and borders: https://cp-algorithms.com/string/prefix-function.html

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 315 · All Project Euler solutions · Next: Problem 317

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮