Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 238: Infinite String Tour

View on Project Euler

Project Euler Problem 238 Solution

EulerSolve provides an optimized solution for Project Euler Problem 238, Infinite String Tour, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Start with \(s_0=14025256\) and iterate $s_{n+1}=s_n^2 \pmod{20300713}.$ Write the decimal expansions of \(s_0,s_1,s_2,\dots\) one after another to obtain an infinite digit string \(w=w_1w_2w_3\cdots\). For each positive integer \(k\), define \(p(k)\) to be the smallest starting position \(z\ge 1\) for which there exists an ending position \(y\ge z\) such that the digit sum of the block \(w_z w_{z+1}\dots w_y\) is exactly \(k\). The problem asks for $\sum_{k=1}^{2\cdot 10^{15}} p(k).$ A direct attack is impossible: the outer summation is enormous, and the string itself is infinite. The solution used by the implementations turns the problem into finite data extracted from one period of the quadratic generator. Mathematical Approach The key objects are the state cycle of the modular recurrence, the resulting periodic digit block, and the prefix sums of that block taken modulo one full-period digit sum. The modular squaring sequence is purely periodic The recurrence evolves in a finite state space, so some state must eventually repeat. Once a state repeats, the future trajectory repeats as well. In this problem the repetition is especially clean: the first repeated state is the initial seed itself, so there is no transient prefix before the cycle starts. Therefore the sequence of generated integers is a pure cycle from \(s_0\) onward....

Detailed mathematical approach

Problem Summary

Start with \(s_0=14025256\) and iterate

$s_{n+1}=s_n^2 \pmod{20300713}.$

Write the decimal expansions of \(s_0,s_1,s_2,\dots\) one after another to obtain an infinite digit string \(w=w_1w_2w_3\cdots\). For each positive integer \(k\), define \(p(k)\) to be the smallest starting position \(z\ge 1\) for which there exists an ending position \(y\ge z\) such that the digit sum of the block \(w_z w_{z+1}\dots w_y\) is exactly \(k\). The problem asks for

$\sum_{k=1}^{2\cdot 10^{15}} p(k).$

A direct attack is impossible: the outer summation is enormous, and the string itself is infinite. The solution used by the implementations turns the problem into finite data extracted from one period of the quadratic generator.

Mathematical Approach

The key objects are the state cycle of the modular recurrence, the resulting periodic digit block, and the prefix sums of that block taken modulo one full-period digit sum.

The modular squaring sequence is purely periodic

The recurrence evolves in a finite state space, so some state must eventually repeat. Once a state repeats, the future trajectory repeats as well. In this problem the repetition is especially clean: the first repeated state is the initial seed itself, so there is no transient prefix before the cycle starts.

Therefore the sequence of generated integers is a pure cycle from \(s_0\) onward. Concatenating their decimal representations gives a digit block

$d_1,d_2,\dots,d_L$

that repeats forever. The implementations discover this period directly and obtain

$L=18{,}886{,}117,\qquad S=\sum_{i=1}^{L} d_i = 80{,}846{,}691.$

Here \(L\) is the number of digits in one full period, and \(S\) is the sum of the digits in that period.

Prefix sums linearize substring sums

Define the one-period prefix sums by

$P_0=0,\qquad P_i=\sum_{t=1}^{i} d_t\quad (1\le i\le L).$

Then \(P_L=S\). Because the digit block repeats, every prefix sum of the infinite string has the form

$T_{mL+i}=mS+P_i,\qquad m\ge 0,\ 0\le i\le L,$

where \(T_n=w_1+\cdots+w_n\) is the sum of the first \(n\) digits of the infinite string.

A block beginning at position \(z\) and ending at position \(y\) has digit sum \(k\) exactly when

$T_y-T_{z-1}=k.$

So \(p(k)\) is the smallest \(z\) for which this equation has at least one solution \(y\ge z\).

Residues modulo \(S\) are the right state space

Write \(z-1=mL+b\) with \(0\le b<L\). Then

$T_{z-1}=mS+P_b.$

If we want a block of digit sum \(k\) starting at \(z\), we need

$T_y=mS+P_b+k.$

But every \(T_y\) is some \(qS+P_i\), so the existence of such a \(y\) is equivalent to the congruence

$P_i\equiv P_b+k \pmod S$

for at least one \(i\in\{0,1,\dots,L\}\). This leads to the residue set

$R=\{P_i\bmod S:0\le i\le L\}.$

Now the entire question “does position \(z\) work for target sum \(k\)?” becomes the finite test

$\big(P_b+k\big)\bmod S\in R.$

That is exactly the condition checked in the implementations. The infinite search over end positions is reduced to a residue lookup.

Periodicity of \(p(k)\)

The validity test depends on \(k\) only through \(k\bmod S\). Therefore the set of starting positions that work for \(k\) is the same as the set of starting positions that work for \(k+S\), so

$p(k+S)=p(k).$

There is a second periodicity as well: the starting residue repeats every \(L\) digits, so whether position \(z\) works depends only on \((z-1)\bmod L\). Hence, whenever a target sum is attainable, its earliest valid start lies within the first period.

As a result, it is enough to compute \(p(1),p(2),\dots,p(S)\) once. If \(N=2\cdot10^{15}\), then

$\sum_{k=1}^{N} p(k)=\left\lfloor\frac{N}{S}\right\rfloor\sum_{k=1}^{S}p(k)+\sum_{k=1}^{N\bmod S}p(k).$

This is the central reduction in the problem: a huge outer range collapses to one finite period in \(k\).

Worked Example from the Beginning of the String

The infinite string begins with the decimal expansions of \(14025256,7410149,5847003,\dots\), so its first digits are

$1,4,0,2,5,2,5,6,7,4,1,0,1,4,9,\dots$

The corresponding prefix sums begin

$0,1,5,5,7,12,14,19,25,32,36,37,37,38,42,51,\dots$

Take \(k=6\). Starting at \(z=1\) would require a later prefix sum equal to \(6\), but the running totals jump from \(5\) to \(7\), so no substring beginning at the first digit has digit sum \(6\).

Starting at \(z=2\), however, the starting prefix sum is \(1\), and \(1+6=7\) does appear later as a prefix sum. Therefore the block \(4,0,2\) has digit sum \(6\), so

$p(6)=2.$

This is a miniature version of the full method: subtract the prefix sum at the starting position, look for the shifted target among the visited prefix residues, and take the earliest start that succeeds.

How the Code Works

Building one full digit period

The C++, Python, and Java implementations iterate the quadratic recurrence from the given seed, remember the first occurrence of each generated state, and stop when a state repeats. Because the repeated state is the seed, the collected decimal digits already form one complete period of the infinite string. The implementations then build the one-period prefix sums and the total digit sum \(S\).

Preparing the residue tables

Next, the implementation marks every residue \(P_i\bmod S\) in a dense membership table and also stores the repeating sequence of starting residues \(P_0,P_1,\dots,P_{L-1}\bmod S\). These two arrays encode the mathematical criterion above: for a candidate start position and a target \(k\), one modular addition and one table lookup decide whether that start position works.

Computing \(p(k)\) for one full \(k\)-period

For each \(k=1,2,\dots,S\), the implementation scans starting positions \(z=1,2,3,\dots\) until the residue condition succeeds, and records that first successful position as \(p(k)\). The C++ and Java implementations split the independent \(k\)-ranges across multiple threads; the Python implementation performs the same search serially. Once the table \(p(1),\dots,p(S)\) is available, the final answer is obtained from the block decomposition formula.

Complexity Analysis

Let \(L\) be the number of digits in one period and \(S\) the digit sum of one period. Constructing the state cycle, the digit block, and the prefix sums costs \(O(L)\) time. Marking the residue set also costs \(O(L)\) time and uses \(O(S)\) memory for the membership table.

The dominant cost in these implementations is the direct search for each \(p(k)\). In the worst case, one value of \(k\) can scan up to one whole digit period before the first success, so the overall worst-case running time is \(O(SL)\). The memory usage is \(O(S+L)\). Parallel execution in the compiled implementations reduces wall-clock time, but the main mathematical gain comes from periodicity: instead of handling \(2\cdot10^{15}\) targets separately, the algorithm computes one finite period and reuses it.

Footnotes and References

  1. Problem page: https://projecteuler.net/problem=238
  2. Prefix sum: Wikipedia - Prefix sum
  3. Modular arithmetic: Wikipedia - Modular arithmetic
  4. Pigeonhole principle: Wikipedia - Pigeonhole principle
  5. Periodic sequence: Wikipedia - Periodic sequence

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 237 · All Project Euler solutions · Next: Problem 239

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮