Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 852: Coins in a Box

Project Euler Problem 852 Solution

EulerSolve provides an optimized solution for Project Euler Problem 852, Coins in a Box, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary At a state with $u$ unfair coins and $f$ fair coins still in the box, one of the $u+f$ remaining coins is selected uniformly at random. Therefore the current coin is unfair with prior probability $p=\frac{u}{u+f}.$ Before making the final classification, the player may toss that same coin as many times as desired. Every toss costs $1$ point. The final classification pays $+20$ if the type is identified correctly and $-50$ if it is identified incorrectly. After that reveal, the coin is removed and the next round starts. The task is to maximize the expected total score when the box initially contains $N$ unfair and $N$ fair coins. The solution naturally splits into two layers: an optimal stopping problem for a single selected coin, and a second dynamic program for the changing composition of the box. Mathematical Approach Let $E(p)$ be the optimal expected score contributed by the current coin when the prior probability of being unfair is $p$. Let $B(u,f)$ be the optimal expected total score from a box that still contains $u$ unfair and $f$ fair coins. The implementations first compute $E(p)$ and then plug it into the recurrence for $B(u,f)$. Step 1: Immediate decision value for one coin If we stop immediately, the best action is to guess the more likely type....

Detailed mathematical approach

Problem Summary

At a state with $u$ unfair coins and $f$ fair coins still in the box, one of the $u+f$ remaining coins is selected uniformly at random. Therefore the current coin is unfair with prior probability

$p=\frac{u}{u+f}.$

Before making the final classification, the player may toss that same coin as many times as desired. Every toss costs $1$ point. The final classification pays $+20$ if the type is identified correctly and $-50$ if it is identified incorrectly. After that reveal, the coin is removed and the next round starts.

The task is to maximize the expected total score when the box initially contains $N$ unfair and $N$ fair coins. The solution naturally splits into two layers: an optimal stopping problem for a single selected coin, and a second dynamic program for the changing composition of the box.

Mathematical Approach

Let $E(p)$ be the optimal expected score contributed by the current coin when the prior probability of being unfair is $p$. Let $B(u,f)$ be the optimal expected total score from a box that still contains $u$ unfair and $f$ fair coins. The implementations first compute $E(p)$ and then plug it into the recurrence for $B(u,f)$.

Step 1: Immediate decision value for one coin

If we stop immediately, the best action is to guess the more likely type. Guessing “unfair” gives expected value

$20p-50(1-p)=70p-50,$

while guessing “fair” gives

$20(1-p)-50p=20-70p=70(1-p)-50.$

Hence the best immediate value is

$E_{\mathrm{stop}}(p)=\max\{70p-50,\;70(1-p)-50\}=70\max\{p,1-p\}-50.$

When $p$ is close to $1/2$, this quantity is negative, so blind guessing is worse than collecting evidence from tosses.

Step 2: Bayesian update after one more toss

If we toss the current coin once, the probability of a head is

$q_H=\frac{3}{4}p+\frac{1}{2}(1-p)=\frac{1}{2}+\frac{p}{4},$

and the probability of a tail is

$q_T=1-q_H=\frac{1}{2}-\frac{p}{4}.$

Bayes' rule turns these observations into posterior probabilities

$p_H=\frac{\frac{3}{4}p}{q_H}=\frac{3p}{2+p},\qquad p_T=\frac{\frac{1}{4}p}{q_T}=\frac{p}{2-p}.$

So if we continue for one more toss, the expected value is

$E_{\mathrm{cont}}(p)=-1+q_H E(p_H)+q_T E(p_T).$

The optimal single-coin value is therefore

$E(p)=\max\{E_{\mathrm{stop}}(p),\;E_{\mathrm{cont}}(p)\}.$

This recursion is exact, but it is infinite-horizon. The implementations turn it into a numerically stable finite-horizon dynamic program.

Step 3: Compress observation histories with log-odds

Let the prior odds be

$O_0=\frac{p_0}{1-p_0}.$

A head multiplies the odds by

$\frac{3/4}{1/2}=\frac{3}{2},$

while a tail multiplies them by

$\frac{1/4}{1/2}=\frac{1}{2}.$

After $n$ tosses with $h$ heads and $n-h$ tails, the odds are

$O_{n,h}=O_0\left(\frac{3}{2}\right)^h\left(\frac{1}{2}\right)^{n-h}=O_0\,3^h\,2^{-n}.$

Taking logarithms yields

$\log O_{n,h}=\log O_0-n\log 2+h\log 3.$

Therefore the posterior depends only on the pair $(n,h)$, not on the exact order of the tosses. Recovering the posterior from the odds uses

$p_{n,h}=\frac{O_{n,h}}{1+O_{n,h}}.$

This state compression is the key reason the single-coin problem can be solved with a triangular dynamic-programming table.

Step 4: Backward induction for the single-coin game

The implementations use a depth cap $D=220$. At depth $D$, they force the process to stop and use the terminal value

$E_{D,h}=E_{\mathrm{stop}}(p_{D,h}).$

Then they sweep backward through the triangle with

$E_{n,h}=\max\left\{E_{\mathrm{stop}}(p_{n,h}),\;-1+q_{n,h}E_{n+1,h+1}+(1-q_{n,h})E_{n+1,h}\right\},$

where

$q_{n,h}=\frac{1}{2}+\frac{p_{n,h}}{4}.$

This is a standard optimal stopping recursion: at every posterior state, compare the payoff of guessing now with the payoff of paying one more point for more information.

The depth cap is effective because extra tosses always cost $1$, while repeated evidence pushes the posterior probability rapidly toward $0$ or $1$, making immediate stopping optimal after sufficiently many observations.

Step 5: Dynamic programming for the whole box

Now return to the box with $u$ unfair and $f$ fair coins remaining. The selected coin is unfair with probability

$p=\frac{u}{u+f}.$

The current round contributes $E(p)$ in expectation. After the coin is eventually revealed and removed, the future state is $(u-1,f)$ if the coin was unfair and $(u,f-1)$ if it was fair. Therefore

$B(u,f)=\frac{u}{u+f}B(u-1,f)+\frac{f}{u+f}B(u,f-1)+E\left(\frac{u}{u+f}\right).$

The boundary states are immediate:

$B(u,0)=20u,\qquad B(0,f)=20f,$

because once only one type remains, every remaining classification is certain and no tosses are useful. The required answer is

$S(N)=B(N,N).$

Worked Example: the case $N=1$

With one unfair coin and one fair coin, the first selected coin has prior probability

$p_0=\frac{1}{2}.$

Immediate stopping gives

$E_{\mathrm{stop}}\left(\frac{1}{2}\right)=70\cdot\frac{1}{2}-50=-15,$

so tossing is essential.

After one head, the posterior becomes

$p_H=\frac{3(1/2)}{2+1/2}=\frac{3}{5},$

and after one tail it becomes

$p_T=\frac{1/2}{2-1/2}=\frac{1}{3}.$

Stopping after exactly one toss is still unprofitable, because the immediate values are $-8$ and $-10/3$. But stronger evidence quickly changes the decision. After three consecutive heads, the odds are $27/8$, so

$p=\frac{27}{35},\qquad E_{\mathrm{stop}}(p)=70\cdot\frac{27}{35}-50=4.$

After two consecutive tails, the odds are $1/4$, so

$p=\frac{1}{5},\qquad E_{\mathrm{stop}}(p)=70\cdot\frac{4}{5}-50=6.$

The backward dynamic program decides exactly where that stop/continue boundary lies. For the starting prior $1/2$, it yields

$E\left(\frac{1}{2}\right)\approx 0.558591.$

The other coin then becomes known with certainty and contributes $20$, so

$S(1)=\frac{1}{2}\cdot 20+\frac{1}{2}\cdot 20+0.558591=20.558591,$

which matches the numerical checkpoint used by the implementation.

How the Code Works

The C++, Python, and Java implementations first solve the single-coin problem for every prior of the form $u/(u+f)$ with $1\le u,f\le N$. For each prior they build the last row of the depth-$220$ triangle from immediate-stop values, then sweep backward one row at a time until they reach the starting state. The computation is performed in log-odds form so that repeated Bayesian updates do not overflow or underflow numerically.

When the log-odds become larger than $50$ or smaller than $-50$, the posterior is treated as numerically indistinguishable from $1$ or $0$. That avoids expensive exponentials in states where the decision is already effectively certain. The single-coin values are stored in a two-dimensional table indexed by the remaining numbers of unfair and fair coins.

After that precomputation, the implementation fills a second table for the box recursion. The first row and first column hold the certainty boundaries $20u$ and $20f$. Every interior cell applies the weighted transition to the two possible next box states and adds the optimal expected value of the current coin. The final cell corresponding to $(N,N)$ is printed with six digits after the decimal point.

Complexity Analysis

Let $D$ be the depth cap for the single-coin backward induction. One evaluation of $E(p)$ fills a triangular table with

$1+2+\cdots+(D+1)=O(D^2)$

states, so it costs $O(D^2)$ time. The implementations only keep one row of that triangle at a time, so the extra memory for the single-coin stage is $O(D)$.

This evaluation is repeated for every pair $(u,f)$ with $1\le u,f\le N$, which gives $O(N^2D^2)$ time for the single-coin precomputation. The outer box dynamic program adds only $O(N^2)$ time and stores two $(N+1)\times(N+1)$ tables of floating-point values.

Therefore the full method uses

$O(N^2D^2)\text{ time},\qquad O(N^2+D)\text{ memory}.$

For the actual parameters $N=50$ and $D=220$, this is easily practical.

Footnotes and References

Problem page: Project Euler 852 - Coins in a Box
Bayes' theorem: Wikipedia - Bayes' theorem
Posterior probability: Wikipedia - Posterior probability
Optimal stopping: Wikipedia - Optimal stopping
Dynamic programming: Wikipedia - Dynamic programming