Problem 494: Collatz Prefix Families
View on Project EulerProject Euler Problem 494 Solution
EulerSolve provides an optimized solution for Project Euler Problem 494, Collatz Prefix Families, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For Problem 494 we must evaluate \(f(90)\), where \(f(t)\) counts Collatz prefix families of length \(t\). The checkpoints built into the implementations are \(f(5)=5\), \(f(10)=55\), and \(f(20)=6771\). A direct search in the inverse Collatz tree grows too quickly, so the solution splits the count into a universal Fibonacci backbone and a small finite correction coming from exceptional seeds. Mathematical Approach Let $T(n)=\begin{cases} n/2, & n\equiv 0 \pmod{2}, \\ 3n+1, & n\equiv 1 \pmod{2}. \end{cases}$ The required counting function is evaluated by walking backward through the inverse Collatz graph. The crucial point is that almost all of the structure can be summarized by a Fibonacci term, while the remaining irregular part is concentrated in a fixed set of six seed values. Step 1: Write the inverse Collatz rules If \(T(m)=x\), then \(m\) can arise in two ways. The even predecessor is always \(2x\). An odd predecessor exists exactly when \(x\equiv 4\pmod{6}\), in which case it is \((x-1)/3\). So the inverse predecessor set is $P(x)=\left\{2x\right\}\cup\left\{\frac{x-1}{3}: x\equiv 4\pmod{6}\right\}.$ Doubling is unconditional, but the odd branch appears only in one congruence class modulo \(6\). That arithmetic restriction is the whole reason the count is structured rather than chaotic....
Detailed mathematical approach
Problem Summary
For Problem 494 we must evaluate \(f(90)\), where \(f(t)\) counts Collatz prefix families of length \(t\). The checkpoints built into the implementations are \(f(5)=5\), \(f(10)=55\), and \(f(20)=6771\). A direct search in the inverse Collatz tree grows too quickly, so the solution splits the count into a universal Fibonacci backbone and a small finite correction coming from exceptional seeds.
Mathematical Approach
Let
$T(n)=\begin{cases} n/2, & n\equiv 0 \pmod{2}, \\ 3n+1, & n\equiv 1 \pmod{2}. \end{cases}$
The required counting function is evaluated by walking backward through the inverse Collatz graph. The crucial point is that almost all of the structure can be summarized by a Fibonacci term, while the remaining irregular part is concentrated in a fixed set of six seed values.
Step 1: Write the inverse Collatz rules
If \(T(m)=x\), then \(m\) can arise in two ways. The even predecessor is always \(2x\). An odd predecessor exists exactly when \(x\equiv 4\pmod{6}\), in which case it is \((x-1)/3\).
So the inverse predecessor set is
$P(x)=\left\{2x\right\}\cup\left\{\frac{x-1}{3}: x\equiv 4\pmod{6}\right\}.$
Doubling is unconditional, but the odd branch appears only in one congruence class modulo \(6\). That arithmetic restriction is the whole reason the count is structured rather than chaotic.
Step 2: Isolate the Fibonacci backbone
Whenever the odd predecessor \((x-1)/3\) is used, the result is odd. Therefore the very next inverse step cannot again use the odd-predecessor condition \(x\equiv 4\pmod{6}\). In other words, two odd-branch choices can never occur consecutively.
This creates the standard Fibonacci recurrence. If \(F_1=F_2=1\) and
$F_t=F_{t-1}+F_{t-2}\qquad (t\ge 3),$
then the backbone contribution extracted by the implementations is exactly \(F_t\). This is why the small checkpoints begin as
$f(5)=F_5=5,\qquad f(10)=F_{10}=55,$
before any exceptional seed has enough remaining depth to contribute.
Step 3: Reduce the correction term to six seeds
The correction comes from the finite seed set
$S=\{9,19,37,51,155,159\}.$
For each seed \(s\in S\), define its forward Collatz distance to the first power of two by
$\lambda(s)=\min\left\{k\ge 0: T^{(k)}(s)\text{ is a power of }2\right\}.$
If we want families of length \(t\), the seed \(s\) matters only when the remaining depth
$r_s=t-\lambda(s)$
is non-negative. The count therefore splits as
$f(t)=F_t+\sum_{s\in S,\ r_s\ge 0} B_{r_s}(s),$
where \(B_r(a)\) denotes the number of inverse descendants of \(a\) after \(r\) backward steps. Seeds with \(r_s<0\) contribute nothing.
Step 4: Count inverse descendants directly
The literal backward recurrence is
$B_0(a)=1,$
$B_{r+1}(a)=B_r(2a)+ \begin{cases} B_r\!\left(\dfrac{a-1}{3}\right), & a\equiv 4\pmod{6},\\ 0, & \text{otherwise}. \end{cases}$
This is the exact mathematical rule used by the simpler implementations: at each layer, replace every current value \(a\) by the always-valid predecessor \(2a\), and add \((a-1)/3\) only when the congruence condition holds.
There is one immediate simplification. If \(3\mid a\), then no odd predecessor can ever appear, because values divisible by \(3\) never satisfy \(x\equiv 4\pmod{6}\). Hence
$3\mid a \implies B_r(a)=1\qquad\text{for all }r\ge 0.$
Step 5: Compress the large-depth counting by residue classes
Directly expanding the inverse tree is still too expensive for the final target. The optimized implementation therefore groups states by the number \(k\) of odd-branch moves and by the residue class modulo
$M_k=2\cdot 3^k.$
After a backward walk with exactly \(k\) odd-branch moves, this modulus retains exactly the parity and divisibility-by-\(3\) information needed for all future admissibility tests.
If a state is known modulo \(M_k\), the doubling update comes from solving
$2u\equiv v \pmod{M_k},$
which produces two residue classes for \(u\). The odd-branch update comes from
$v=3u+1,$
so one residue class modulo \(M_{k-1}\) lifts to a single residue class modulo \(M_k\). This turns enormous families of backward walks into residue-table counts.
Step 6: Use a four-step transfer for the remaining tail
The optimized implementation also batches the last part of the backward expansion in blocks of four steps. For values not divisible by \(3\), the set of admissible four-step descendants depends only on the current residue modulo \(18\).
That means long remaining depths can be reduced by repeated four-step transfers until the state reaches the precomputed residue-table horizon, after which table lookup finishes the count. The underlying count is unchanged; only the bookkeeping is compressed.
Worked Example: \(t=20\)
The Fibonacci backbone gives
$F_{20}=6765.$
Now compute the forward distances to the first power of two:
$\lambda(9)=15,\qquad \lambda(19)=16,\qquad \lambda(37)=17,\qquad \lambda(51)=20,$
while \(\lambda(155)=81\) and \(\lambda(159)=50\), so the last two seeds do not contribute at \(t=20\).
The remaining depths are therefore \(5,4,3,0\). Because \(9\) and \(51\) are divisible by \(3\), each contributes exactly \(1\).
For \(19\), the backward expansion is
$19 \to 38 \to 76 \to \{152,25\} \to \{304,50\},$
so \(B_4(19)=2\).
For \(37\), the backward expansion is
$37 \to 74 \to 148 \to \{296,49\},$
so \(B_3(37)=2\).
Therefore
$f(20)=6765+1+2+2+1=6771,$
which is exactly the checkpoint verified by the implementations.
How the Code Works
The C++, Python, and Java implementations all begin with the same decomposition: compute the Fibonacci backbone \(F_t\), determine the six seed depths \(\lambda(s)\), and add the inverse-descendant counts for the seeds that still have non-negative remaining depth.
The Python and Java implementations keep the correction term in its most literal form. They simulate the backward recurrence layer by layer, replacing each current value by its doubled predecessor and, when permitted, its odd predecessor. That makes the mathematical structure very transparent.
The C++ implementation uses exactly the same mathematics but accelerates the correction term. It precomputes residue-class tables modulo \(2\cdot 3^k\), orders the seeds by remaining depth, reduces long tails with four-step transfers governed by residues modulo \(18\), and finally accumulates the matching residue counts for each seed. The checkpoints \(5\), \(55\), and \(6771\) are verified before the final computation of \(f(90)\).
Complexity Analysis
If the correction term is implemented literally, inverse expansion is exponential in the remaining seed depth because each layer can branch. That is acceptable only for the short explanatory versions. The optimized implementation replaces most of that tree by residue tables whose widest layer has size \(2\cdot 3^k\), where \(k\) is the number of odd-branch moves being tracked. If \(R=\max_s \max(0,t-\lambda(s))\), the precomputed depth is about \(R/3\), so memory grows like \(O(3^{R/6})\) and the table-building time like \(O(R\,3^{R/6})\), followed by only a small amount of per-seed work because there are just six seeds. In practice this is what makes the final target \(t=90\) feasible.
Footnotes and References
- Problem page: https://projecteuler.net/problem=494
- Collatz conjecture and inverse trees: Wikipedia — Collatz conjecture
- Fibonacci numbers: Wikipedia — Fibonacci number
- Modular arithmetic: Wikipedia — Modular arithmetic
- Dynamic programming: Wikipedia — Dynamic programming