Problem 419: Look and Say Sequence
View on Project EulerProject Euler Problem 419 Solution
EulerSolve provides an optimized solution for Project Euler Problem 419, Look and Say Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(A_1 = 1\), and obtain each later term by reading the previous term as consecutive runs of equal digits. Thus $1 \to 11 \to 21 \to 1211 \to 111221 \to \cdots.$ If \(a_n\), \(b_n\), and \(c_n\) denote the numbers of digits \(1\), \(2\), and \(3\) in \(A_n\), the goal is to determine \((a_n,b_n,c_n)\) for an enormous index \(n\). Directly generating \(A_n\) is hopeless because the string length grows exponentially, so the solution must compress the evolution into a fixed-size linear system. Mathematical Approach Step 1: Replace strings by Conway atoms Conway's cosmological theorem states that, after finitely many initial terms, every look-and-say word generated from \(1\) can be decomposed into a concatenation of 92 irreducible building blocks, usually called Conway atoms or Conway elements. The crucial point is that each atom has a deterministic one-step decay into a short list of atoms. Let \(v_n \in \mathbb{N}^{92}\) be the atom-count vector of \(A_n\), where \(v_n(i)\) is the number of copies of atom \(i\)....
Detailed mathematical approach
Problem Summary
Let \(A_1 = 1\), and obtain each later term by reading the previous term as consecutive runs of equal digits. Thus
$1 \to 11 \to 21 \to 1211 \to 111221 \to \cdots.$
If \(a_n\), \(b_n\), and \(c_n\) denote the numbers of digits \(1\), \(2\), and \(3\) in \(A_n\), the goal is to determine \((a_n,b_n,c_n)\) for an enormous index \(n\). Directly generating \(A_n\) is hopeless because the string length grows exponentially, so the solution must compress the evolution into a fixed-size linear system.
Mathematical Approach
Step 1: Replace strings by Conway atoms
Conway's cosmological theorem states that, after finitely many initial terms, every look-and-say word generated from \(1\) can be decomposed into a concatenation of 92 irreducible building blocks, usually called Conway atoms or Conway elements. The crucial point is that each atom has a deterministic one-step decay into a short list of atoms.
Let \(v_n \in \mathbb{N}^{92}\) be the atom-count vector of \(A_n\), where \(v_n(i)\) is the number of copies of atom \(i\). Define the transition matrix \(M\) by
$M_{ij} = \text{number of copies of atom } i \text{ produced from one copy of atom } j \text{ in one step}.$
Then the nonlinear string process becomes the linear recurrence
$v_{n+1} = M v_n.$
This is the main mathematical reduction used by the implementation: the sequence is complicated at the string level, but it is linear on the 92-dimensional atom population vector.
Step 2: Choose an explicit anchor term
The implementations do not start from \(A_1\) in atom form. Instead they move to a small term that already decomposes cleanly into Conway atoms. The first eight terms are
$1,\ 11,\ 21,\ 1211,\ 111221,\ 312211,\ 13112221,\ 1113213211.$
The eighth term already lies inside the Conway alphabet and splits as
$A_8 = 11132 \, 13211.$
These two substrings are Conway atoms, so \(v_8\) has exactly two nonzero entries, each equal to \(1\). Therefore, for every \(n \ge 8\),
$v_n = M^{n-8} v_8.$
For a huge target such as \(n = 10^{12}\), this formula is the reason the problem becomes tractable.
Step 3: Recover the counts of \(1\), \(2\), and \(3\)
For each atom \(i\), precompute its digit histogram
$d_i = (u_{i,1}, u_{i,2}, u_{i,3}),$
where \(u_{i,r}\) is the number of times digit \(r\) appears in that atom. If we write
$x_n = (a_n,b_n,c_n),$
then the final answer is obtained by linear aggregation:
$x_n = \sum_{i=1}^{92} v_n(i)\, d_i.$
Equivalently, if \(D\) is the \(3 \times 92\) matrix of per-atom digit counts, then
$x_n = D v_n.$
So the full computation consists of two stages: evolve the atom population, then convert that population into the required digit totals.
Step 4: Fast exponentiation of the transition
Applying \(M\) step by step would still require about \(10^{12}\) updates, which is far too slow. Instead, the implementation computes \(M^{n-8}\) by exponentiation by squaring. That reduces the dependence on \(n\) from linear to logarithmic:
$M^{n-8} = \prod_{\text{bits of } n-8} M^{2^k}.$
Because the required output is taken modulo \(2^{30}\), every intermediate matrix entry, vector entry, and digit accumulation can also be reduced modulo \(2^{30}\) without changing the final result modulo \(2^{30}\).
Worked Example: the anchor term
The chosen anchor is also a convenient consistency check. Since
$A_8 = 1113213211,$
we immediately obtain
$a_8 = 6,\qquad b_8 = 2,\qquad c_8 = 2.$
This agrees with the atom decomposition \(11132 \, 13211\): once the initial vector is correct, the matrix method and direct counting coincide at the anchor before any large exponentiation begins.
How the Code Works
The C++, Python, and Java implementations all encode the same mathematical data: the 92 Conway atom strings, the one-step decay rule for each atom, and the corresponding sparse \(92 \times 92\) transition matrix. Each column has only a few nonzero entries because one atom decays into at most six atoms.
They also precompute, once, how many \(1\)s, \(2\)s, and \(3\)s occur inside each atom. Starting from the two-atom decomposition of the 8th term, the implementation applies binary exponentiation to evolve the atom-count vector to the requested term and then forms the final digit triple by a weighted sum of those precomputed histograms.
The C++ implementation additionally includes a tiny direct generator for small indices so that checkpoint terms can be verified against the matrix method. The fast path for the real target is the same linear-algebra approach in all three languages.
Complexity Analysis
Let \(K = 92\). In a dense matrix model, one matrix squaring costs \(O(K^3)\), and exponentiation by squaring needs \(O(\log n)\) squarings, so the total running time is
$O(K^3 \log n).$
Applying a matrix to a vector costs \(O(K^2)\) and does not change the overall asymptotic bound. The memory usage is \(O(K^2)\) for the transition matrix and \(O(K)\) for vectors and per-atom digit histograms. In practice the runtime is noticeably better than the dense estimate because the transition matrix is sparse.
References
- Problem page: https://projecteuler.net/problem=419
- Look-and-say sequence: Wikipedia — Look-and-say sequence
- Conway cosmological decay: Wikipedia — Cosmological decay
- Exponentiation by squaring: Wikipedia — Exponentiation by squaring