Problem 869: Prime Guessing
View on Project EulerProject Euler Problem 869 Solution
EulerSolve provides an optimized solution for Project Euler Problem 869, Prime Guessing, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(P_N\) be the set of primes not exceeding \(N\), with \(N=10^8\). One prime is chosen uniformly at random. We then try to guess its binary expansion one bit at a time, starting from the least significant bit. After each guess, the true next bit is effectively revealed because the candidate set is restricted to primes sharing the observed suffix. A correct guess scores one point. The goal is to maximize the expected total score, so the problem is really about making the best local decision at every partially revealed binary suffix. Mathematical Approach The natural state space is a binary trie built from reversed binary representations. Each node corresponds to a suffix of already revealed low bits, so every decision can be expressed in terms of how many remaining primes continue with bit \(0\) and how many continue with bit \(1\). Step 1: Encode Each Prime by Low Bits First Write a prime \(p\) in binary as $p=\sum_{d=0}^{\ell(p)-1} b_d(p)\,2^d,\qquad b_d(p)\in\{0,1\},\qquad b_{\ell(p)-1}(p)=1.$ The game reveals bits in the order \(b_0(p),b_1(p),b_2(p),\dots\), so a state is determined by an already known suffix \(s\) of low bits. A trie over these reversed bit strings stores exactly the primes compatible with each suffix....
Detailed mathematical approach
Problem Summary
Let \(P_N\) be the set of primes not exceeding \(N\), with \(N=10^8\). One prime is chosen uniformly at random. We then try to guess its binary expansion one bit at a time, starting from the least significant bit.
After each guess, the true next bit is effectively revealed because the candidate set is restricted to primes sharing the observed suffix. A correct guess scores one point. The goal is to maximize the expected total score, so the problem is really about making the best local decision at every partially revealed binary suffix.
Mathematical Approach
The natural state space is a binary trie built from reversed binary representations. Each node corresponds to a suffix of already revealed low bits, so every decision can be expressed in terms of how many remaining primes continue with bit \(0\) and how many continue with bit \(1\).
Step 1: Encode Each Prime by Low Bits First
Write a prime \(p\) in binary as
$p=\sum_{d=0}^{\ell(p)-1} b_d(p)\,2^d,\qquad b_d(p)\in\{0,1\},\qquad b_{\ell(p)-1}(p)=1.$
The game reveals bits in the order \(b_0(p),b_1(p),b_2(p),\dots\), so a state is determined by an already known suffix \(s\) of low bits. A trie over these reversed bit strings stores exactly the primes compatible with each suffix.
If \(S(s)\) denotes the set of primes whose reversed binary expansion begins with \(s\), then reaching node \(s\) means that the hidden prime is known to lie in \(S(s)\).
Step 2: Separate Continuing Primes from Finished Primes
At a trie node \(s\), let \(c(s)=|S(s)|\). Some primes end exactly at that node because their entire binary expansion has already been consumed. Let \(t(s)\) be that terminal count, and define
$q(s)=c(s)-t(s).$
Only those \(q(s)\) primes require another guess. If \(q(s)=0\), then the game is already over for every prime in that state, so the remaining expected score is \(0\).
Now look at the two child states \(s^{(0)}\) and \(s^{(1)}\), obtained by appending the next revealed bit. Let \(c_0(s)\) and \(c_1(s)\) be the numbers of compatible primes in those two children. Because every continuing prime must go to exactly one child, we have
$c_0(s)+c_1(s)=q(s).$
Step 3: The Optimal Guess Is the Local Majority Bit
Suppose we are at state \(s\) and must guess the next bit. Choosing \(0\) or \(1\) changes only the score of the current round. It does not change which child state is reached, because that depends on the hidden prime, not on our guess.
Therefore the optimal policy at node \(s\) is simply to guess the more common next bit among the continuing primes. The best immediate success probability is
$\frac{\max(c_0(s),c_1(s))}{q(s)}.$
This greedy choice is globally optimal because the future subproblem is the same regardless of which bit we guessed; only the current point is affected by the choice.
Step 4: Dynamic Programming Recurrence on Trie Nodes
Let \(E(s)\) be the maximum expected additional score once suffix \(s\) is known. For each child, some primes may stop immediately after that next bit. Let \(q_0(s)\) and \(q_1(s)\) be the numbers of primes that still continue beyond \(s^{(0)}\) and \(s^{(1)}\), respectively.
If \(q(s)=0\), then
$E(s)=0.$
If \(q(s)>0\), then
$E(s)=\frac{\max(c_0(s),c_1(s))}{q(s)}+\frac{q_0(s)}{q(s)}E\!\left(s^{(0)}\right)+\frac{q_1(s)}{q(s)}E\!\left(s^{(1)}\right).$
The first term is the expected point earned on the current guess. The remaining terms are weighted by the probabilities that the game continues into the two child states.
The required value for the problem is simply
$E(\varnothing),$
where \(\varnothing\) denotes the root, meaning that no bits have been revealed yet.
Worked Example: \(N=10\)
The primes are \(2,3,5,7\). Their reversed binary strings are
$2\to 01,\qquad 3\to 11,\qquad 5\to 101,\qquad 7\to 111.$
At the root, the next-bit counts are \(c_0=1\) and \(c_1=3\), so the optimal first guess is \(1\), worth \(3/4\) in expectation.
At the node reached after observing suffix \(1\), the next-bit counts are \(c_0=1\) and \(c_1=2\), so the optimal local score there is \(2/3\). From both of its relevant children, the continuation contributes exactly one more expected point.
Thus
$E(1)=\frac{2}{3}+\frac{1}{3}\cdot 1+\frac{1}{3}\cdot 1=\frac{4}{3},$
and the root value is
$E(\varnothing)=\frac{3}{4}+\frac{1}{4}\cdot 1+\frac{3}{4}\cdot \frac{4}{3}=2.$
This matches the small checkpoint used by the implementation. A second checkpoint in the implementations is \(N=30\), for which the same recurrence gives \(2.9\).
How the Code Works
The C++, Python, and Java implementations all follow the same two-phase plan. First they generate every prime up to \(10^8\) using an odd-only sieve, inserting \(2\) separately and then scanning only odd candidates.
Each prime is inserted into a binary trie from least significant bit to most significant bit. Every node records how many primes pass through it, whether a prime ends exactly there, and where the \(0\)-child and \(1\)-child are located.
Once the trie has been built, a depth-first traversal evaluates the recurrence bottom-up. At each node, the implementation reads the child counts, chooses the heavier child for the immediate guess, subtracts terminal primes when computing continuation probabilities, and combines those pieces into the expected score for that node.
Finally the root value is printed with fixed decimal precision. The three language versions differ only in storage details; mathematically they all compute the same trie dynamic program.
Complexity Analysis
Let \(N=10^8\). The sieve phase runs in \(O(N\log\log N)\) time. Inserting the primes into the trie costs
$O\!\left(\sum_{p\in P_N} \ell(p)\right)=O\!\bigl(\pi(N)\log N\bigr),$
because each prime contributes one trie step per binary digit. The final depth-first pass is linear in the number of trie nodes, so it is also \(O\!\bigl(\pi(N)\log N\bigr)\).
Memory usage is dominated by the odd-only sieve array together with the trie itself. In asymptotic form, the method uses \(O(N)\) sieve storage plus \(O\!\bigl(\pi(N)\log N\bigr)\) trie storage.
Footnotes and References
- Project Euler problem page: https://projecteuler.net/problem=869
- Trie data structure: Wikipedia - Trie
- Binary numeral system: Wikipedia - Binary number
- Sieve of Eratosthenes: Wikipedia - Sieve of Eratosthenes
- Expected value and dynamic programming ideas: Wikipedia - Expected value and Wikipedia - Dynamic programming