Problem 341: Golomb's Self-describing Sequence
View on Project EulerProject Euler Problem 341 Solution
EulerSolve provides an optimized solution for Project Euler Problem 341, Golomb's Self-describing Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The Golomb sequence \(G(n)\) is the unique nondecreasing sequence of positive integers in which the integer \(n\) appears exactly \(G(n)\) times. Its beginning is $1,2,2,3,3,4,4,4,5,5,5,6,6,6,6,\dots$ Project Euler 341 asks for $\sum_{1\le n \lt 10^6} G(n^3).$ The official checkpoints \(G(10^3)=86\), \(G(10^6)=6137\), and \(\sum_{1\le n \lt 10^3}G(n^3)=153506976\) show that the queried positions become enormous very quickly. A naive approach that expands the sequence all the way to the largest queried position \((10^6-1)^3\) would be hopelessly too slow and too memory-hungry. Mathematical Approach Self-Describing Structure By definition, value \(k\) occurs exactly \(G(k)\) times. The implementation generates the sequence with the classical recurrence $G(1)=1,\qquad G(n)=1+G\bigl(n-G(G(n-1))\bigr)\quad(n\ge 2).$ This recurrence is equivalent to the usual Project Euler formulation and allows us to precompute \(G(1),G(2),\dots\) once. First Prefix Sum: Value Boundaries Define the first prefix sum $S(k)=\sum_{i=1}^{k}G(i).$ Since the value \(k\) appears \(G(k)\) times, the positions holding \(k\) are exactly $S(k-1)+1,\ S(k-1)+2,\ \dots,\ S(k).$ Therefore $\boxed{G(n)=k\iff S(k-1)\lt n\le S(k).}$ So \(S(k)\) marks where the block of value \(k\) ends. Second Prefix Sum: Blocking the Indices Themselves Now look at the indices \(m\) for which \(G(m)=k\)....
Detailed mathematical approach
Problem Summary
The Golomb sequence \(G(n)\) is the unique nondecreasing sequence of positive integers in which the integer \(n\) appears exactly \(G(n)\) times. Its beginning is
$1,2,2,3,3,4,4,4,5,5,5,6,6,6,6,\dots$
Project Euler 341 asks for
$\sum_{1\le n \lt 10^6} G(n^3).$
The official checkpoints \(G(10^3)=86\), \(G(10^6)=6137\), and \(\sum_{1\le n \lt 10^3}G(n^3)=153506976\) show that the queried positions become enormous very quickly. A naive approach that expands the sequence all the way to the largest queried position \((10^6-1)^3\) would be hopelessly too slow and too memory-hungry.
Mathematical Approach
Self-Describing Structure
By definition, value \(k\) occurs exactly \(G(k)\) times. The implementation generates the sequence with the classical recurrence
$G(1)=1,\qquad G(n)=1+G\bigl(n-G(G(n-1))\bigr)\quad(n\ge 2).$
This recurrence is equivalent to the usual Project Euler formulation and allows us to precompute \(G(1),G(2),\dots\) once.
First Prefix Sum: Value Boundaries
Define the first prefix sum
$S(k)=\sum_{i=1}^{k}G(i).$
Since the value \(k\) appears \(G(k)\) times, the positions holding \(k\) are exactly
$S(k-1)+1,\ S(k-1)+2,\ \dots,\ S(k).$
Therefore
$\boxed{G(n)=k\iff S(k-1)\lt n\le S(k).}$
So \(S(k)\) marks where the block of value \(k\) ends.
Second Prefix Sum: Blocking the Indices Themselves
Now look at the indices \(m\) for which \(G(m)=k\). From the previous identity, those indices are
$m=S(k-1)+1,\dots,S(k),$
and there are exactly \(G(k)\) of them. Each such index \(m\) contributes the value \(k\), so the total length contributed by the entire \(k\)-block is \(k\cdot G(k)\). This motivates the weighted prefix sum
$P(k)=\sum_{i=1}^{k} i\,G(i).$
After finishing all blocks up to \(k\), the occupied sequence positions are precisely \(1,\dots,P(k)\).
Inverting a \(P\)-Block
Suppose \(x\) satisfies
$P(k-1)\lt x\le P(k).$
Then \(G(x)\) must lie inside the block made of the values
$S(k-1)+1,\dots,S(k),$
where each value is repeated exactly \(k\) times. Let
$d=x-P(k-1).$
Every consecutive group of \(k\) positions advances the answer by 1, hence the offset inside the block is
$\left\lceil\frac{d}{k}\right\rceil.$
So we get the fast query formula
$\boxed{G(x)=S(k-1)+\left\lceil\frac{x-P(k-1)}{k}\right\rceil.}$
This is the core identity used by the function golomb_at.
Worked Example: \(x=10\)
The first boundaries are
$S(1)=1,\qquad S(2)=3,\qquad S(3)=5,$
and the weighted boundaries are
$P(1)=1,\qquad P(2)=5,\qquad P(3)=11.$
Since \(P(2)=5\lt 10\le 11=P(3)\), the query lies in the \(k=3\) block. Therefore
$G(10)=S(2)+\left\lceil\frac{10-P(2)}{3}\right\rceil =3+\left\lceil\frac{5}{3}\right\rceil =5.$
Indeed, the tenth term of the Golomb sequence is 5.
Why the Monotone Sweep Works
The problem only asks for \(x=n^3\), and these queries arrive in strictly increasing order. Therefore the correct block index \(k\) never moves backward. The code keeps the current \(k\), \(S(k)\), and \(P(k)\) in a reusable state object and only advances them when the next cube crosses a new block boundary. Precomputation stops as soon as \(P(k)\) reaches the largest required cube \((10^6-1)^3\).
How the Code Works
precompute_golomb_until(max_query_position) builds the table golomb[n]=G(n) using the recurrence above. While doing so, it also tracks products, which is exactly the weighted prefix sum \(P(k)\). Thus it stops as soon as the largest needed query position is covered.
QueryState stores the current block index \(k\), the current boundaries \(S(k)\) and \(P(k)\), and the previous boundaries \(S(k-1)\) and \(P(k-1)\). In golomb_at(x, golomb, st), a simple while loop advances to the unique block satisfying \(P(k-1)\lt x\le P(k)\), then evaluates
$G(x)=S(k-1)+\left\lceil\frac{x-P(k-1)}{k}\right\rceil.$
The ceiling is implemented with pure integer arithmetic:
$\left\lceil\frac{a}{b}\right\rceil=\left\lfloor\frac{a+b-1}{b}\right\rfloor.$
Finally, sum_golomb_cubes(limit, golomb) loops over \(n=1,2,\dots,\text{limit}-1\), computes \(x=n^3\), queries \(G(x)\), and accumulates the total.
Complexity Analysis
Let \(K\) be the largest block index needed so that \(P(K)\ge (10^6-1)^3\). Precomputing \(G(1),\dots,G(K)\) costs \(O(K)\) time and \(O(K)\) memory. The cube sweep costs \(O(10^6+K)\) amortized, because the pointer over \(k\)-blocks only moves forward. There is no per-query binary search and no attempt to materialize the sequence up to index \(10^{18}\).
Footnotes and References
- Problem page: https://projecteuler.net/problem=341
- Golomb sequence: Wikipedia - Golomb sequence
- Prefix sums: Wikipedia - Prefix sum