Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 419: Look and Say Sequence

View on Project Euler

Project Euler Problem 419 Solution

EulerSolve provides an optimized solution for Project Euler Problem 419, Look and Say Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let \(A_1 = 1\), and obtain each later term by reading the previous term as consecutive runs of equal digits. Thus $1 \to 11 \to 21 \to 1211 \to 111221 \to \cdots.$ If \(a_n\), \(b_n\), and \(c_n\) denote the numbers of digits \(1\), \(2\), and \(3\) in \(A_n\), the goal is to determine \((a_n,b_n,c_n)\) for an enormous index \(n\). Directly generating \(A_n\) is hopeless because the string length grows exponentially, so the solution must compress the evolution into a fixed-size linear system. Mathematical Approach Step 1: Replace strings by Conway atoms Conway's cosmological theorem states that, after finitely many initial terms, every look-and-say word generated from \(1\) can be decomposed into a concatenation of 92 irreducible building blocks, usually called Conway atoms or Conway elements. The crucial point is that each atom has a deterministic one-step decay into a short list of atoms. Let \(v_n \in \mathbb{N}^{92}\) be the atom-count vector of \(A_n\), where \(v_n(i)\) is the number of copies of atom \(i\)....

Detailed mathematical approach

Problem Summary

Let \(A_1 = 1\), and obtain each later term by reading the previous term as consecutive runs of equal digits. Thus

$1 \to 11 \to 21 \to 1211 \to 111221 \to \cdots.$

If \(a_n\), \(b_n\), and \(c_n\) denote the numbers of digits \(1\), \(2\), and \(3\) in \(A_n\), the goal is to determine \((a_n,b_n,c_n)\) for an enormous index \(n\). Directly generating \(A_n\) is hopeless because the string length grows exponentially, so the solution must compress the evolution into a fixed-size linear system.

Mathematical Approach

Step 1: Replace strings by Conway atoms

Conway's cosmological theorem states that, after finitely many initial terms, every look-and-say word generated from \(1\) can be decomposed into a concatenation of 92 irreducible building blocks, usually called Conway atoms or Conway elements. The crucial point is that each atom has a deterministic one-step decay into a short list of atoms.

Let \(v_n \in \mathbb{N}^{92}\) be the atom-count vector of \(A_n\), where \(v_n(i)\) is the number of copies of atom \(i\). Define the transition matrix \(M\) by

$M_{ij} = \text{number of copies of atom } i \text{ produced from one copy of atom } j \text{ in one step}.$

Then the nonlinear string process becomes the linear recurrence

$v_{n+1} = M v_n.$

This is the main mathematical reduction used by the implementation: the sequence is complicated at the string level, but it is linear on the 92-dimensional atom population vector.

Step 2: Choose an explicit anchor term

The implementations do not start from \(A_1\) in atom form. Instead they move to a small term that already decomposes cleanly into Conway atoms. The first eight terms are

$1,\ 11,\ 21,\ 1211,\ 111221,\ 312211,\ 13112221,\ 1113213211.$

The eighth term already lies inside the Conway alphabet and splits as

$A_8 = 11132 \, 13211.$

These two substrings are Conway atoms, so \(v_8\) has exactly two nonzero entries, each equal to \(1\). Therefore, for every \(n \ge 8\),

$v_n = M^{n-8} v_8.$

For a huge target such as \(n = 10^{12}\), this formula is the reason the problem becomes tractable.

Step 3: Recover the counts of \(1\), \(2\), and \(3\)

For each atom \(i\), precompute its digit histogram

$d_i = (u_{i,1}, u_{i,2}, u_{i,3}),$

where \(u_{i,r}\) is the number of times digit \(r\) appears in that atom. If we write

$x_n = (a_n,b_n,c_n),$

then the final answer is obtained by linear aggregation:

$x_n = \sum_{i=1}^{92} v_n(i)\, d_i.$

Equivalently, if \(D\) is the \(3 \times 92\) matrix of per-atom digit counts, then

$x_n = D v_n.$

So the full computation consists of two stages: evolve the atom population, then convert that population into the required digit totals.

Step 4: Fast exponentiation of the transition

Applying \(M\) step by step would still require about \(10^{12}\) updates, which is far too slow. Instead, the implementation computes \(M^{n-8}\) by exponentiation by squaring. That reduces the dependence on \(n\) from linear to logarithmic:

$M^{n-8} = \prod_{\text{bits of } n-8} M^{2^k}.$

Because the required output is taken modulo \(2^{30}\), every intermediate matrix entry, vector entry, and digit accumulation can also be reduced modulo \(2^{30}\) without changing the final result modulo \(2^{30}\).

Worked Example: the anchor term

The chosen anchor is also a convenient consistency check. Since

$A_8 = 1113213211,$

we immediately obtain

$a_8 = 6,\qquad b_8 = 2,\qquad c_8 = 2.$

This agrees with the atom decomposition \(11132 \, 13211\): once the initial vector is correct, the matrix method and direct counting coincide at the anchor before any large exponentiation begins.

How the Code Works

The C++, Python, and Java implementations all encode the same mathematical data: the 92 Conway atom strings, the one-step decay rule for each atom, and the corresponding sparse \(92 \times 92\) transition matrix. Each column has only a few nonzero entries because one atom decays into at most six atoms.

They also precompute, once, how many \(1\)s, \(2\)s, and \(3\)s occur inside each atom. Starting from the two-atom decomposition of the 8th term, the implementation applies binary exponentiation to evolve the atom-count vector to the requested term and then forms the final digit triple by a weighted sum of those precomputed histograms.

The C++ implementation additionally includes a tiny direct generator for small indices so that checkpoint terms can be verified against the matrix method. The fast path for the real target is the same linear-algebra approach in all three languages.

Complexity Analysis

Let \(K = 92\). In a dense matrix model, one matrix squaring costs \(O(K^3)\), and exponentiation by squaring needs \(O(\log n)\) squarings, so the total running time is

$O(K^3 \log n).$

Applying a matrix to a vector costs \(O(K^2)\) and does not change the overall asymptotic bound. The memory usage is \(O(K^2)\) for the transition matrix and \(O(K)\) for vectors and per-atom digit histograms. In practice the runtime is noticeably better than the dense estimate because the transition matrix is sparse.

References

  1. Problem page: https://projecteuler.net/problem=419
  2. Look-and-say sequence: Wikipedia — Look-and-say sequence
  3. Conway cosmological decay: Wikipedia — Cosmological decay
  4. Exponentiation by squaring: Wikipedia — Exponentiation by squaring

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 418 · All Project Euler solutions · Next: Problem 420

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮