Problem 308: An Amazing Prime-generating Automaton
View on Project EulerProject Euler Problem 308 Solution
EulerSolve provides an optimized solution for Project Euler Problem 308, An Amazing Prime-generating Automaton, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The automaton in Problem 308 is a graphical encoding of Conway's prime-generating FRACTRAN program. If we simulate it literally, the number of steps explodes long before we reach the \(10001\)-st prime output. The key is to understand one complete candidate cycle analytically and count steps without walking through every transition. Mathematical Approach 1) State interpretation: prime exponents, not raw integers The FRACTRAN machine can be represented by exponent counts of the primes $2,3,5,7,11,13,17,19,23,29.$ So a machine state is equivalent to a vector of exponents, or to the integer $N=2^a3^b5^c7^d11^e13^f17^g19^h23^i29^j.$ The automaton “prints” a value exactly when the state is a pure power of two: $N=2^m.$ Then the printed number is the exponent \(m\). The brute-force checkpoint in the C++ code confirms that the first outputs are $2,3,5,7,11,13,17,19,23,29,\dots$ with step indices $19,69,281,710,2375,3893,8102,11361,19268,36981,\dots$ 2) Canonical start-state for candidate \(n\) The program does not test primes by trial division in the usual sense....
Detailed mathematical approach
Problem Summary
The automaton in Problem 308 is a graphical encoding of Conway's prime-generating FRACTRAN program. If we simulate it literally, the number of steps explodes long before we reach the \(10001\)-st prime output. The key is to understand one complete candidate cycle analytically and count steps without walking through every transition.
Mathematical Approach
1) State interpretation: prime exponents, not raw integers
The FRACTRAN machine can be represented by exponent counts of the primes
$2,3,5,7,11,13,17,19,23,29.$
So a machine state is equivalent to a vector of exponents, or to the integer
$N=2^a3^b5^c7^d11^e13^f17^g19^h23^i29^j.$
The automaton “prints” a value exactly when the state is a pure power of two:
$N=2^m.$
Then the printed number is the exponent \(m\). The brute-force checkpoint in the C++ code confirms that the first outputs are
$2,3,5,7,11,13,17,19,23,29,\dots$
with step indices
$19,69,281,710,2375,3893,8102,11361,19268,36981,\dots$
2) Canonical start-state for candidate \(n\)
The program does not test primes by trial division in the usual sense. Instead, the FRACTRAN dynamics repeatedly reaches a canonical state that encodes “we are now testing candidate \(n\).” In the optimized derivation, the first step index of that canonical state is denoted by
$T_n.$
The source code uses
$T_2=2,$
meaning that after two initial transitions from the seed, the machine has entered the standard start configuration for candidate \(2\).
3) What one candidate cycle does
Once candidate \(n\) starts, the automaton behaves like a deterministic divisor-testing routine.
Setup: it builds the control structure for the current candidate.
Middle phase: it tests divisors \(d\) in descending order.
Exit: if a divisor is found, the machine moves on to candidate \(n+1\); if no divisor exists, it follows a special prime branch and emits \(2^n\).
This is why the whole process can be summarized by two exact arithmetic quantities:
$\Delta(n)=T_{n+1}-T_n,$
the total step count from the start of candidate \(n\) to the start of candidate \(n+1\), and
$H(p),$
the offset from \(T_p\) to the moment where prime candidate \(p\) is actually emitted.
4) Why the largest proper divisor appears
Let
$s=\operatorname{spf}(n)$
be the smallest prime factor of \(n\). Then the first divisor encountered while scanning downward is the largest proper divisor
$d_*=\frac{n}{s}.$
All integers
$d=n-1,n-2,\dots,d_*+1$
are guaranteed to be non-divisors, and the scan stops exactly at \(d_*\). That is why the composite-case formula naturally splits into “all non-divisor tests above \(d_*\)” plus “the final successful divisor branch at \(d_*\).”
5) Exact cost formulas
The code has already flattened the entire state graph of one candidate cycle into exact transition counts.
For a general candidate \(n\), the total jump to the next candidate is
$\Delta(n)=(2n-1)+\sum_{d=d_*+1}^{n-1}\bigl(6n+2+2\lfloor n/d\rfloor\bigr)+\bigl(5n+d_*+2\lfloor n/d_*\rfloor+2\bigr).$
The three parts are:
Setup: \(2n-1\).
Rejected divisor tests: one term \(6n+2+2\lfloor n/d\rfloor\) for every non-divisor above \(d_*\).
Terminal composite branch: the final successful branch at \(d_*\).
If \(p\) is prime, there is no divisor in \(2,\dots,p-1\), so the machine instead reaches the prime-output branch:
$H(p)=(2p-1)+\sum_{d=2}^{p-1}\bigl(6p+2+2\lfloor p/d\rfloor\bigr)+(6p+2).$
The last term \(6p+2\) is the prime branch itself.
6) Accumulating global time
Once \(\Delta(n)\) is known, the candidate-start times satisfy the prefix recurrence
$T_{n+1}=T_n+\Delta(n),\qquad T_2=2.$
If \(p_k\) is the \(k\)-th prime, then the answer to the Project Euler problem is simply
$T_{p_k}+H(p_k).$
For example, candidate \(2\) starts at step \(T_2=2\), and
$H(2)=17,$
so the first prime hit occurs at
$2+17=19.$
Likewise, \(\Delta(2)=20\), so \(T_3=22\); then \(H(3)=47\), giving the second prime hit at
$22+47=69.$
These match the brute-force checkpoints exactly.
7) Fast evaluation of the floor sums
The expensive-looking part is
$\sum \lfloor n/d\rfloor.$
The implementation evaluates such sums in quotient blocks: if several consecutive divisors have the same quotient \(q=\lfloor n/d\rfloor\), they are added at once. This is the standard harmonic-grouping trick and makes each floor-sum sublinear instead of linear.
How the Code Works
1) Prime bound. The solver first estimates an upper bound for the \(k\)-th prime using \(n(\log n+\log\log n)\)-type asymptotics, then enlarges it if necessary.
2) Linear sieve. A sieve produces both the prime list and the smallest-prime-factor table \(\operatorname{spf}(n)\).
3) Floor-sum helpers. floor_sum_upto() and floor_sum_range() implement quotient-block summation.
4) Closed formulas. candidate_delta() computes \(\Delta(n)\), and prime_hit_offset() computes \(H(p)\).
5) Prefix walk. The main loop advances \(T_n\) candidate by candidate until it reaches the target prime \(p_k\), then returns \(T_{p_k}+H(p_k)\).
6) Strong validation. The C++ code also contains a literal automaton simulator and checks that the first 10 prime hits and several small target indices agree with the closed formulas.
Complexity Analysis
The sieve costs \(O(B)\) time and memory up to the chosen prime bound \(B\). Each candidate then needs only a constant amount of algebra plus a few harmonic floor-sums, each taking about \(O(\sqrt n)\) block steps. This is enormously faster than literal FRACTRAN simulation, which would need to traverse about \(1.5\times10^{15}\) transitions for the final answer.
Further Reading
- Problem page: https://projecteuler.net/problem=308
- FRACTRAN background: https://en.wikipedia.org/wiki/FRACTRAN
- Harmonic grouping for floor sums: https://cp-algorithms.com/algebra/divisors.html