Problem 214: Totient Chains
View on Project EulerProject Euler Problem 214 Solution
EulerSolve provides an optimized solution for Project Euler Problem 214, Totient Chains, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For the Project Euler parameters \(N=40{,}000{,}000\) and \(T=25\), define the totient chain of a positive integer by repeatedly applying Euler's totient function until the value reaches \(1\): $n \to \varphi(n) \to \varphi(\varphi(n)) \to \cdots \to 1.$ The problem asks for the sum of all primes \(p < N\) whose chain has length exactly \(T\). If we denote the chain length by \(\ell(n)\), then the target quantity is $S(N,T)=\sum_{\substack{p < N\\ \ell(p)=T}} p,$ where the sum runs over primes. A naive prime-by-prime search would recompute the same totient values again and again. The implementations avoid that duplication by building one global table of \(\varphi(n)\) and one global table of chain lengths. Mathematical Approach The key object is the directed map \(n \mapsto \varphi(n)\). For Problem 214, every useful fact comes from understanding how this map behaves on all integers up to the bound, not just on the primes that appear in the final sum. The totient map always moves downward For every integer \(n > 1\), Euler's totient satisfies \(\varphi(n) < n\). Therefore repeated iteration cannot cycle above \(1\); it must eventually terminate at \(1\). This makes the chain length well defined for every positive integer. The natural base case is $\ell(1)=1,$ because the chain starting at \(1\) already consists of a single term....
Detailed mathematical approach
Problem Summary
For the Project Euler parameters \(N=40{,}000{,}000\) and \(T=25\), define the totient chain of a positive integer by repeatedly applying Euler's totient function until the value reaches \(1\):
$n \to \varphi(n) \to \varphi(\varphi(n)) \to \cdots \to 1.$
The problem asks for the sum of all primes \(p < N\) whose chain has length exactly \(T\). If we denote the chain length by \(\ell(n)\), then the target quantity is
$S(N,T)=\sum_{\substack{p < N\\ \ell(p)=T}} p,$
where the sum runs over primes. A naive prime-by-prime search would recompute the same totient values again and again. The implementations avoid that duplication by building one global table of \(\varphi(n)\) and one global table of chain lengths.
Mathematical Approach
The key object is the directed map \(n \mapsto \varphi(n)\). For Problem 214, every useful fact comes from understanding how this map behaves on all integers up to the bound, not just on the primes that appear in the final sum.
The totient map always moves downward
For every integer \(n > 1\), Euler's totient satisfies \(\varphi(n) < n\). Therefore repeated iteration cannot cycle above \(1\); it must eventually terminate at \(1\). This makes the chain length well defined for every positive integer.
The natural base case is
$\ell(1)=1,$
because the chain starting at \(1\) already consists of a single term. For \(n \ge 2\), one step of the chain removes the first term and leaves the chain starting at \(\varphi(n)\), so
$\ell(n)=1+\ell(\varphi(n)).$
This is the central recurrence in all three implementations. Because \(\varphi(n)\) is strictly smaller than \(n\), the values form a dependency graph that is automatically topologically ordered by the ordinary numeric order \(1,2,3,\dots,N\).
Primes enter through the identity \(\varphi(p)=p-1\)
If \(p\) is prime, then every number \(1 \le a \lt p\) is coprime to \(p\), so
$\varphi(p)=p-1.$
Hence every candidate prime immediately reduces to a composite even number:
$\ell(p)=1+\ell(p-1).$
This does not mean we can skip the composite numbers; on the contrary, the prime case depends on them directly. Problem 214 is solved only after the chain length has been computed for every integer up to the bound.
Why the linear sieve gives every totient value in one pass
The implementations compute all totients with a linear sieve. Maintain a growing list of discovered primes. When an integer \(i\) has not been marked composite, it is prime, so its totient is known immediately:
$\varphi(i)=i-1.$
Then combine \(i\) with each known prime \(q\) to form \(v=i q\). Two cases determine \(\varphi(v)\):
$\varphi(iq)= \begin{cases} \varphi(i)\,q, & q \mid i,\\ \varphi(i)\,(q-1), & q \nmid i. \end{cases}$
The second line is just multiplicativity for coprime factors: if \(\gcd(i,q)=1\), then \(\varphi(iq)=\varphi(i)\varphi(q)=\varphi(i)(q-1)\). The first line follows from writing \(i=q^a m\) with \(q \nmid m\): then
$\varphi(i)=q^{a-1}(q-1)\varphi(m),\qquad \varphi(iq)=q^a(q-1)\varphi(m)=q\,\varphi(i).$
The linear sieve stops the inner generation as soon as it uses the smallest prime divisor of \(i\). That invariant ensures each composite is produced exactly once, so the totient table is filled in overall \(O(N)\) time.
Worked example: small chains and the checkpoint sums
A few small chains show the recurrence in action:
$5 \to 4 \to 2 \to 1,\qquad \ell(5)=4,$
$7 \to 6 \to 2 \to 1,\qquad \ell(7)=4,$
$11 \to 10 \to 4 \to 2 \to 1,\qquad \ell(11)=5.$
So for the small test case \(N=100\) and \(T=4\), the qualifying primes are \(5\) and \(7\), whose sum is \(12\). For \(N=1000\) and \(T=5\), the qualifying primes are \(11,13,19\), whose sum is \(43\). Those two values are exactly the checkpoint cases used by the C++ implementation.
The final summation is a simple filter once the tables exist
After the sieve has produced the prime list and all values \(\varphi(n)\), and after the recurrence has produced all values \(\ell(n)\), nothing difficult remains. One final pass over the prime list is enough: keep the primes with \(\ell(p)=T\) and add them. There is no need to follow any chain on demand during this last stage, because each chain length is already available as a table lookup.
How the Code Works
First pass: build the prime list and the totient table
The C++, Python, and Java implementations all start with a linear sieve over \(2,3,\dots,N\). In the same loop they identify primes, mark composites, and assign \(\varphi(n)\) using the two-case recurrence above. This is the mathematically decisive preprocessing step, because every later calculation depends on \(\varphi(n)\).
Second pass: fill chain lengths in ascending order
With \(\varphi(1),\varphi(2),\dots,\varphi(N)\) already known, the implementations allocate a second array for chain lengths, set \(\ell(1)=1\), and process \(n=2,3,\dots,N\) in order. Each entry is obtained from the recurrence
$\ell(n)=1+\ell(\varphi(n)).$
Because \(\varphi(n) < n\), the right-hand side always refers to an earlier entry. No recursion is needed; the table is filled iteratively in one linear sweep.
Third pass: inspect primes only and accumulate the answer
Once chain lengths are known, the final pass is over primes only. Whenever a prime has the target length, it is added to the running total. The Python and Java implementations hardcode the Project Euler parameters \(N=40{,}000{,}000\) and \(T=25\). The C++ implementation uses the same algorithm but also supports alternate bounds and target lengths, and it verifies itself on the two small checkpoint sums before the full computation.
Complexity Analysis
The linear sieve is \(O(N)\) time because each composite number is generated once. The chain-length pass is another \(O(N)\) sweep, and the final prime filter is \(O(\pi(N))\), which is contained in \(O(N)\). Therefore the overall running time is linear in the bound:
$O(N).$
The memory usage is also \(O(N)\): one array for composite markers, one for totients, one for chain lengths, and a list of primes. For Problem 214 this linear behavior is essential, because \(N=40{,}000{,}000\) is far beyond the range where repeated per-prime chain simulation would be comfortable.
Footnotes and References
- Project Euler Problem 214: https://projecteuler.net/problem=214
- Euler's totient function: Wikipedia - Euler's totient function
- Linear sieve overview: cp-algorithms - Linear Sieve
- Sieve of Eratosthenes and related sieves: Wikipedia - Sieve of Eratosthenes
- Dynamic programming: Wikipedia - Dynamic programming