Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 419: Look and Say Sequence

Project Euler Problem 419 Solution

EulerSolve provides an optimized solution for Project Euler Problem 419, Look and Say Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let $A_1 = 1$, and obtain each later term by reading the previous term as consecutive runs of equal digits. Thus $1 \to 11 \to 21 \to 1211 \to 111221 \to \cdots.$ If $a_n$, $b_n$, and $c_n$ denote the numbers of digits $1$, $2$, and $3$ in $A_n$, the goal is to determine $(a_n,b_n,c_n)$ for an enormous index $n$. Directly generating $A_n$ is hopeless because the string length grows exponentially, so the solution must compress the evolution into a fixed-size linear system. Mathematical Approach Step 1: Replace strings by Conway atoms Conway's cosmological theorem states that, after finitely many initial terms, every look-and-say word generated from $1$ can be decomposed into a concatenation of 92 irreducible building blocks, usually called Conway atoms or Conway elements. The crucial point is that each atom has a deterministic one-step decay into a short list of atoms. Let $v_n \in \mathbb{N}^{92}$ be the atom-count vector of $A_n$, where $v_n(i)$ is the number of copies of atom $i$....

Detailed mathematical approach

Problem Summary

Let $A_1 = 1$, and obtain each later term by reading the previous term as consecutive runs of equal digits. Thus

$1 \to 11 \to 21 \to 1211 \to 111221 \to \cdots.$

If $a_n$, $b_n$, and $c_n$ denote the numbers of digits $1$, $2$, and $3$ in $A_n$, the goal is to determine $(a_n,b_n,c_n)$ for an enormous index $n$. Directly generating $A_n$ is hopeless because the string length grows exponentially, so the solution must compress the evolution into a fixed-size linear system.

Mathematical Approach

Step 1: Replace strings by Conway atoms

Conway's cosmological theorem states that, after finitely many initial terms, every look-and-say word generated from $1$ can be decomposed into a concatenation of 92 irreducible building blocks, usually called Conway atoms or Conway elements. The crucial point is that each atom has a deterministic one-step decay into a short list of atoms.

Let $v_n \in \mathbb{N}^{92}$ be the atom-count vector of $A_n$, where $v_n(i)$ is the number of copies of atom $i$. Define the transition matrix $M$ by

$M_{ij} = \text{number of copies of atom } i \text{ produced from one copy of atom } j \text{ in one step}.$

Then the nonlinear string process becomes the linear recurrence

$v_{n+1} = M v_n.$

This is the main mathematical reduction used by the implementation: the sequence is complicated at the string level, but it is linear on the 92-dimensional atom population vector.

Step 2: Choose an explicit anchor term

The implementations do not start from $A_1$ in atom form. Instead they move to a small term that already decomposes cleanly into Conway atoms. The first eight terms are

$1,\ 11,\ 21,\ 1211,\ 111221,\ 312211,\ 13112221,\ 1113213211.$

The eighth term already lies inside the Conway alphabet and splits as

$A_8 = 11132 \, 13211.$

These two substrings are Conway atoms, so $v_8$ has exactly two nonzero entries, each equal to $1$. Therefore, for every $n \ge 8$,

$v_n = M^{n-8} v_8.$

For a huge target such as $n = 10^{12}$, this formula is the reason the problem becomes tractable.

Step 3: Recover the counts of $1$, $2$, and $3$

For each atom $i$, precompute its digit histogram

$d_i = (u_{i,1}, u_{i,2}, u_{i,3}),$

where $u_{i,r}$ is the number of times digit $r$ appears in that atom. If we write

$x_n = (a_n,b_n,c_n),$

then the final answer is obtained by linear aggregation:

$x_n = \sum_{i=1}^{92} v_n(i)\, d_i.$

Equivalently, if $D$ is the $3 \times 92$ matrix of per-atom digit counts, then

$x_n = D v_n.$

So the full computation consists of two stages: evolve the atom population, then convert that population into the required digit totals.

Step 4: Fast exponentiation of the transition

Applying $M$ step by step would still require about $10^{12}$ updates, which is far too slow. Instead, the implementation computes $M^{n-8}$ by exponentiation by squaring. That reduces the dependence on $n$ from linear to logarithmic:

$M^{n-8} = \prod_{\text{bits of } n-8} M^{2^k}.$

Because the required output is taken modulo $2^{30}$, every intermediate matrix entry, vector entry, and digit accumulation can also be reduced modulo $2^{30}$ without changing the final result modulo $2^{30}$.

Worked Example: the anchor term

The chosen anchor is also a convenient consistency check. Since

$A_8 = 1113213211,$

we immediately obtain

$a_8 = 6,\qquad b_8 = 2,\qquad c_8 = 2.$

This agrees with the atom decomposition $11132 \, 13211$: once the initial vector is correct, the matrix method and direct counting coincide at the anchor before any large exponentiation begins.

How the Code Works

The C++, Python, and Java implementations all encode the same mathematical data: the 92 Conway atom strings, the one-step decay rule for each atom, and the corresponding sparse $92 \times 92$ transition matrix. Each column has only a few nonzero entries because one atom decays into at most six atoms.

They also precompute, once, how many $1$s, $2$s, and $3$s occur inside each atom. Starting from the two-atom decomposition of the 8th term, the implementation applies binary exponentiation to evolve the atom-count vector to the requested term and then forms the final digit triple by a weighted sum of those precomputed histograms.

The C++ implementation additionally includes a tiny direct generator for small indices so that checkpoint terms can be verified against the matrix method. The fast path for the real target is the same linear-algebra approach in all three languages.

Complexity Analysis

Let $K = 92$. In a dense matrix model, one matrix squaring costs $O(K^3)$, and exponentiation by squaring needs $O(\log n)$ squarings, so the total running time is

$O(K^3 \log n).$

Applying a matrix to a vector costs $O(K^2)$ and does not change the overall asymptotic bound. The memory usage is $O(K^2)$ for the transition matrix and $O(K)$ for vectors and per-atom digit histograms. In practice the runtime is noticeably better than the dense estimate because the transition matrix is sparse.

References

Problem page: https://projecteuler.net/problem=419
Look-and-say sequence: Wikipedia — Look-and-say sequence
Conway cosmological decay: Wikipedia — Cosmological decay
Exponentiation by squaring: Wikipedia — Exponentiation by squaring