Problem 852: Coins in a Box
View on Project EulerProject Euler Problem 852 Solution
EulerSolve provides an optimized solution for Project Euler Problem 852, Coins in a Box, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary At a state with \(u\) unfair coins and \(f\) fair coins still in the box, one of the \(u+f\) remaining coins is selected uniformly at random. Therefore the current coin is unfair with prior probability $p=\frac{u}{u+f}.$ Before making the final classification, the player may toss that same coin as many times as desired. Every toss costs \(1\) point. The final classification pays \(+20\) if the type is identified correctly and \(-50\) if it is identified incorrectly. After that reveal, the coin is removed and the next round starts. The task is to maximize the expected total score when the box initially contains \(N\) unfair and \(N\) fair coins. The solution naturally splits into two layers: an optimal stopping problem for a single selected coin, and a second dynamic program for the changing composition of the box. Mathematical Approach Let \(E(p)\) be the optimal expected score contributed by the current coin when the prior probability of being unfair is \(p\). Let \(B(u,f)\) be the optimal expected total score from a box that still contains \(u\) unfair and \(f\) fair coins. The implementations first compute \(E(p)\) and then plug it into the recurrence for \(B(u,f)\). Step 1: Immediate decision value for one coin If we stop immediately, the best action is to guess the more likely type....
Detailed mathematical approach
Problem Summary
At a state with \(u\) unfair coins and \(f\) fair coins still in the box, one of the \(u+f\) remaining coins is selected uniformly at random. Therefore the current coin is unfair with prior probability
$p=\frac{u}{u+f}.$
Before making the final classification, the player may toss that same coin as many times as desired. Every toss costs \(1\) point. The final classification pays \(+20\) if the type is identified correctly and \(-50\) if it is identified incorrectly. After that reveal, the coin is removed and the next round starts.
The task is to maximize the expected total score when the box initially contains \(N\) unfair and \(N\) fair coins. The solution naturally splits into two layers: an optimal stopping problem for a single selected coin, and a second dynamic program for the changing composition of the box.
Mathematical Approach
Let \(E(p)\) be the optimal expected score contributed by the current coin when the prior probability of being unfair is \(p\). Let \(B(u,f)\) be the optimal expected total score from a box that still contains \(u\) unfair and \(f\) fair coins. The implementations first compute \(E(p)\) and then plug it into the recurrence for \(B(u,f)\).
Step 1: Immediate decision value for one coin
If we stop immediately, the best action is to guess the more likely type. Guessing “unfair” gives expected value
$20p-50(1-p)=70p-50,$
while guessing “fair” gives
$20(1-p)-50p=20-70p=70(1-p)-50.$
Hence the best immediate value is
$E_{\mathrm{stop}}(p)=\max\{70p-50,\;70(1-p)-50\}=70\max\{p,1-p\}-50.$
When \(p\) is close to \(1/2\), this quantity is negative, so blind guessing is worse than collecting evidence from tosses.
Step 2: Bayesian update after one more toss
If we toss the current coin once, the probability of a head is
$q_H=\frac{3}{4}p+\frac{1}{2}(1-p)=\frac{1}{2}+\frac{p}{4},$
and the probability of a tail is
$q_T=1-q_H=\frac{1}{2}-\frac{p}{4}.$
Bayes' rule turns these observations into posterior probabilities
$p_H=\frac{\frac{3}{4}p}{q_H}=\frac{3p}{2+p},\qquad p_T=\frac{\frac{1}{4}p}{q_T}=\frac{p}{2-p}.$
So if we continue for one more toss, the expected value is
$E_{\mathrm{cont}}(p)=-1+q_H E(p_H)+q_T E(p_T).$
The optimal single-coin value is therefore
$E(p)=\max\{E_{\mathrm{stop}}(p),\;E_{\mathrm{cont}}(p)\}.$
This recursion is exact, but it is infinite-horizon. The implementations turn it into a numerically stable finite-horizon dynamic program.
Step 3: Compress observation histories with log-odds
Let the prior odds be
$O_0=\frac{p_0}{1-p_0}.$
A head multiplies the odds by
$\frac{3/4}{1/2}=\frac{3}{2},$
while a tail multiplies them by
$\frac{1/4}{1/2}=\frac{1}{2}.$
After \(n\) tosses with \(h\) heads and \(n-h\) tails, the odds are
$O_{n,h}=O_0\left(\frac{3}{2}\right)^h\left(\frac{1}{2}\right)^{n-h}=O_0\,3^h\,2^{-n}.$
Taking logarithms yields
$\log O_{n,h}=\log O_0-n\log 2+h\log 3.$
Therefore the posterior depends only on the pair \((n,h)\), not on the exact order of the tosses. Recovering the posterior from the odds uses
$p_{n,h}=\frac{O_{n,h}}{1+O_{n,h}}.$
This state compression is the key reason the single-coin problem can be solved with a triangular dynamic-programming table.
Step 4: Backward induction for the single-coin game
The implementations use a depth cap \(D=220\). At depth \(D\), they force the process to stop and use the terminal value
$E_{D,h}=E_{\mathrm{stop}}(p_{D,h}).$
Then they sweep backward through the triangle with
$E_{n,h}=\max\left\{E_{\mathrm{stop}}(p_{n,h}),\;-1+q_{n,h}E_{n+1,h+1}+(1-q_{n,h})E_{n+1,h}\right\},$
where
$q_{n,h}=\frac{1}{2}+\frac{p_{n,h}}{4}.$
This is a standard optimal stopping recursion: at every posterior state, compare the payoff of guessing now with the payoff of paying one more point for more information.
The depth cap is effective because extra tosses always cost \(1\), while repeated evidence pushes the posterior probability rapidly toward \(0\) or \(1\), making immediate stopping optimal after sufficiently many observations.
Step 5: Dynamic programming for the whole box
Now return to the box with \(u\) unfair and \(f\) fair coins remaining. The selected coin is unfair with probability
$p=\frac{u}{u+f}.$
The current round contributes \(E(p)\) in expectation. After the coin is eventually revealed and removed, the future state is \((u-1,f)\) if the coin was unfair and \((u,f-1)\) if it was fair. Therefore
$B(u,f)=\frac{u}{u+f}B(u-1,f)+\frac{f}{u+f}B(u,f-1)+E\left(\frac{u}{u+f}\right).$
The boundary states are immediate:
$B(u,0)=20u,\qquad B(0,f)=20f,$
because once only one type remains, every remaining classification is certain and no tosses are useful. The required answer is
$S(N)=B(N,N).$
Worked Example: the case \(N=1\)
With one unfair coin and one fair coin, the first selected coin has prior probability
$p_0=\frac{1}{2}.$
Immediate stopping gives
$E_{\mathrm{stop}}\left(\frac{1}{2}\right)=70\cdot\frac{1}{2}-50=-15,$
so tossing is essential.
After one head, the posterior becomes
$p_H=\frac{3(1/2)}{2+1/2}=\frac{3}{5},$
and after one tail it becomes
$p_T=\frac{1/2}{2-1/2}=\frac{1}{3}.$
Stopping after exactly one toss is still unprofitable, because the immediate values are \(-8\) and \(-10/3\). But stronger evidence quickly changes the decision. After three consecutive heads, the odds are \(27/8\), so
$p=\frac{27}{35},\qquad E_{\mathrm{stop}}(p)=70\cdot\frac{27}{35}-50=4.$
After two consecutive tails, the odds are \(1/4\), so
$p=\frac{1}{5},\qquad E_{\mathrm{stop}}(p)=70\cdot\frac{4}{5}-50=6.$
The backward dynamic program decides exactly where that stop/continue boundary lies. For the starting prior \(1/2\), it yields
$E\left(\frac{1}{2}\right)\approx 0.558591.$
The other coin then becomes known with certainty and contributes \(20\), so
$S(1)=\frac{1}{2}\cdot 20+\frac{1}{2}\cdot 20+0.558591=20.558591,$
which matches the numerical checkpoint used by the implementation.
How the Code Works
The C++, Python, and Java implementations first solve the single-coin problem for every prior of the form \(u/(u+f)\) with \(1\le u,f\le N\). For each prior they build the last row of the depth-\(220\) triangle from immediate-stop values, then sweep backward one row at a time until they reach the starting state. The computation is performed in log-odds form so that repeated Bayesian updates do not overflow or underflow numerically.
When the log-odds become larger than \(50\) or smaller than \(-50\), the posterior is treated as numerically indistinguishable from \(1\) or \(0\). That avoids expensive exponentials in states where the decision is already effectively certain. The single-coin values are stored in a two-dimensional table indexed by the remaining numbers of unfair and fair coins.
After that precomputation, the implementation fills a second table for the box recursion. The first row and first column hold the certainty boundaries \(20u\) and \(20f\). Every interior cell applies the weighted transition to the two possible next box states and adds the optimal expected value of the current coin. The final cell corresponding to \((N,N)\) is printed with six digits after the decimal point.
Complexity Analysis
Let \(D\) be the depth cap for the single-coin backward induction. One evaluation of \(E(p)\) fills a triangular table with
$1+2+\cdots+(D+1)=O(D^2)$
states, so it costs \(O(D^2)\) time. The implementations only keep one row of that triangle at a time, so the extra memory for the single-coin stage is \(O(D)\).
This evaluation is repeated for every pair \((u,f)\) with \(1\le u,f\le N\), which gives \(O(N^2D^2)\) time for the single-coin precomputation. The outer box dynamic program adds only \(O(N^2)\) time and stores two \((N+1)\times(N+1)\) tables of floating-point values.
Therefore the full method uses
$O(N^2D^2)\text{ time},\qquad O(N^2+D)\text{ memory}.$
For the actual parameters \(N=50\) and \(D=220\), this is easily practical.
Footnotes and References
- Problem page: Project Euler 852 - Coins in a Box
- Bayes' theorem: Wikipedia - Bayes' theorem
- Posterior probability: Wikipedia - Posterior probability
- Optimal stopping: Wikipedia - Optimal stopping
- Dynamic programming: Wikipedia - Dynamic programming