Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 14: Longest Collatz Sequence

View on Project Euler

Project Euler Problem 14 Solution

EulerSolve provides an optimized solution for Project Euler Problem 14, Longest Collatz Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary For every starting integer \(1 \le n \lt 10^6\), repeatedly apply the Collatz map $T(n)=\begin{cases}\dfrac{n}{2}, & n \equiv 0 \pmod{2},\\[4pt] 3n+1, & n \equiv 1 \pmod{2}.\end{cases}$ The goal is to find which starting value produces the longest chain before reaching 1, counting both the starting value and the final 1 as terms of the chain. The key difficulty is that the orbit of a small starting value can rise far above the search bound before it comes back down. That means the computation cannot be organized as a simple left-to-right table on the interval \(1,2,\dots,10^6-1\); it has to reuse information wherever different trajectories merge. Mathematical Approach Write the Collatz orbit of \(n\) as $n_0=n,\qquad n_{k+1}=T(n_k).$ Define the chain length \(\lambda(n)\) to be the number of terms from \(n\) down to 1, inclusive. Then the basic recurrence is $\lambda(1)=1,\qquad \lambda(n)=1+\lambda(T(n)) \quad (n \ge 2).$ Everything in the implementations follows from using this recurrence efficiently. The right state space is the orbit, not just the search interval The search is restricted to starting values below one million, but the recurrence itself lives on all positive integers visited by those starts....

Detailed mathematical approach

Problem Summary

For every starting integer \(1 \le n \lt 10^6\), repeatedly apply the Collatz map $T(n)=\begin{cases}\dfrac{n}{2}, & n \equiv 0 \pmod{2},\\[4pt] 3n+1, & n \equiv 1 \pmod{2}.\end{cases}$ The goal is to find which starting value produces the longest chain before reaching 1, counting both the starting value and the final 1 as terms of the chain.

The key difficulty is that the orbit of a small starting value can rise far above the search bound before it comes back down. That means the computation cannot be organized as a simple left-to-right table on the interval \(1,2,\dots,10^6-1\); it has to reuse information wherever different trajectories merge.

Mathematical Approach

Write the Collatz orbit of \(n\) as $n_0=n,\qquad n_{k+1}=T(n_k).$ Define the chain length \(\lambda(n)\) to be the number of terms from \(n\) down to 1, inclusive. Then the basic recurrence is $\lambda(1)=1,\qquad \lambda(n)=1+\lambda(T(n)) \quad (n \ge 2).$ Everything in the implementations follows from using this recurrence efficiently.

The right state space is the orbit, not just the search interval

The search is restricted to starting values below one million, but the recurrence itself lives on all positive integers visited by those starts. For example, the chain beginning at 13 is $13 \to 40 \to 20 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1,$ so even a small start immediately moves outside any tiny local neighborhood. The mathematical object being computed is therefore a directed functional graph on positive integers, where each \(n\) has exactly one outgoing edge to \(T(n)\).

Shared suffixes create a dynamic-programming recurrence

If two starting values ever reach the same intermediate value \(m\), then everything after \(m\) is identical for both chains. So once \(\lambda(m)\) is known, there is no reason to recompute the tail below \(m\) again. The central invariant is simple: if \(n_a=n_b=m\), then the remaining suffix from that point onward is identical. In other words, the problem is not about storing whole chains; it is about storing chain lengths at merge points.

The exact backfill formula

Suppose a forward walk starting from \(n\) produces distinct visited values $n_0,\ n_1,\ n_2,\ \dots,\ n_{r-1},\ n_r,$ and the first value already known in the memo table is \(n_r\). Then $\lambda(n_{r-1})=\lambda(n_r)+1,$ $\lambda(n_{r-2})=\lambda(n_r)+2,$ and in general $\lambda(n_i)=\lambda(n_r)+(r-i)\qquad (0 \le i \lt r).$ This is exactly why the implementations walk forward first, save the visited values on a stack, and then assign lengths while popping that stack in reverse order.

Worked example: recovering \(\lambda(13)\)

Starting from 13, the orbit is $13,\ 40,\ 20,\ 10,\ 5,\ 16,\ 8,\ 4,\ 2,\ 1.$ If the memo table already contains \(\lambda(1)=1\), reverse backfilling gives $\lambda(2)=2,\ \lambda(4)=3,\ \lambda(8)=4,\ \lambda(16)=5,$ $\lambda(5)=6,\ \lambda(10)=7,\ \lambda(20)=8,\ \lambda(40)=9,\ \lambda(13)=10.$ This matches the explicit length of the chain and also matches the checkpoint used by the C++ implementation.

From chain lengths to the final answer

Once \(\lambda(n)\) can be obtained quickly, the rest of the problem is a record search: $\underset{1 \le n \lt 10^6}{\arg\max}\ \lambda(n).$ The implementations scan the starting values in increasing order, compute each chain length with the memoized recurrence above, and keep the start that achieves the largest length seen so far. A smaller built-in checkpoint confirms that among starts below 20, the record holder is 18.

How the Code Works

Seed the memo table with the base case

The computation begins from the only chain length known a priori: $\lambda(1)=1.$ A hash table or dictionary stores already computed lengths. This choice matters because trajectories may rise above the original search limit, so the memoized values are not restricted to a fixed contiguous array of small integers.

Walk forward until a known value appears, then fill backward

For each new start, the implementation repeatedly applies the Collatz rule until it reaches a value whose length has already been stored. Every unseen value on that path is pushed onto a temporary stack. Once a known value is reached, the code pops the stack and assigns each stored value with the recurrence \(\lambda(x)=1+\lambda(T(x))\). This avoids recursion, uses the recurrence exactly, and ensures that every stored value is written with its correct final chain length.

Scan the search range and maintain the best start

After each chain length is recovered, the implementation compares it with the current record and updates the best starting value only when a strictly longer chain is found. The C++, Python, and Java implementations all follow this same logic. The C++ and Java versions use 64-bit integer arithmetic for the orbit values, Python uses arbitrary-precision integers, and the C++ version also includes an explicit guard against overflow in the odd step.

Complexity Analysis

Let \(L=10^6\) be the search bound, and let \(M\) be the number of distinct integers that appear in all trajectories started from \(1 \le n \lt L\) before they hit a memoized value. Because each newly discovered integer is inserted into the memo table once, pushed once, and popped once, the total expected running time is $O(L+M),$ where the expectation comes from constant-time average hash-table operations.

The memory usage is $O(M),$ not merely \(O(L)\), because the memo table also stores intermediate values above the search bound whenever a trajectory climbs that high. In practice this is still easily manageable for the Project Euler bound, and the memoized reuse of shared tails is what makes the search fast.

Footnotes and References

  1. Project Euler, Problem 14: https://projecteuler.net/problem=14
  2. Collatz conjecture: Wikipedia - Collatz conjecture
  3. Memoization: Wikipedia - Memoization
  4. Dynamic programming: Wikipedia - Dynamic programming
  5. Functional graph: Wikipedia - Functional graph
  6. Collatz problem overview: MathWorld - Collatz Problem

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 13 · All Project Euler solutions · Next: Problem 15

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮