Problem 232: The Race
View on Project EulerProject Euler Problem 232 Solution
EulerSolve provides an optimized solution for Project Euler Problem 232, The Race, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Player 1 and Player 2 both start at 0 and race to 100 points. Player 1 always plays the simple strategy: on each turn, flip one fair coin and score 1 point on heads, 0 on tails. Player 2 has a much richer move: before each turn, choose a positive integer \(T\), then succeed with probability \(2^{-T}\); on success, score \(2^{T-1}\) points, otherwise score nothing. Player 1 moves first, and the question asks for Player 2's maximum possible winning probability under optimal choices of \(T\). The solutions do not simulate individual games. Instead they solve a finite optimal-control problem on the remaining distances to the target. For the actual target 100, the computed starting probability is \(0.83648556\) to eight decimal places. Mathematical Approach The right viewpoint is to forget absolute scores and track only how many points each player still needs. That turns the game into a dynamic program with an explicit Bellman recurrence. Two state functions for the two turn types Let \(F(a,b)\) be the optimal probability that Player 2 eventually wins when Player 1 still needs \(a\) points, Player 2 still needs \(b\) points, and it is Player 2's turn to move. It is also convenient to introduce \(G(a,b)\), the corresponding probability at the start of Player 1's turn with the same remaining distances....
Detailed mathematical approach
Problem Summary
Player 1 and Player 2 both start at 0 and race to 100 points. Player 1 always plays the simple strategy: on each turn, flip one fair coin and score 1 point on heads, 0 on tails. Player 2 has a much richer move: before each turn, choose a positive integer \(T\), then succeed with probability \(2^{-T}\); on success, score \(2^{T-1}\) points, otherwise score nothing. Player 1 moves first, and the question asks for Player 2's maximum possible winning probability under optimal choices of \(T\).
The solutions do not simulate individual games. Instead they solve a finite optimal-control problem on the remaining distances to the target. For the actual target 100, the computed starting probability is \(0.83648556\) to eight decimal places.
Mathematical Approach
The right viewpoint is to forget absolute scores and track only how many points each player still needs. That turns the game into a dynamic program with an explicit Bellman recurrence.
Two state functions for the two turn types
Let \(F(a,b)\) be the optimal probability that Player 2 eventually wins when Player 1 still needs \(a\) points, Player 2 still needs \(b\) points, and it is Player 2's turn to move.
It is also convenient to introduce \(G(a,b)\), the corresponding probability at the start of Player 1's turn with the same remaining distances. The initial position of the real problem is therefore \(G(100,100)\), not \(F(100,100)\), because Player 1 moves first.
The boundary conditions are immediate:
\(F(a,0)=1\) and \(G(a,0)=1\), because Player 2 has already reached the target; \(G(0,b)=0\), because if Player 1 has already arrived before Player 2 gets another move, Player 2 has lost.
The easy transition: Player 1's turn
Player 1 either scores 1 point with probability \(1/2\) or scores nothing with probability \(1/2\). Therefore, for \(a\ge 1\) and \(b\ge 1\),
$G(a,b)=\frac{F(a,b)+F(a-1,b)}{2}.$
This formula is the first invariant used by the code: every time the game passes through Player 1's turn, the continuation value is just the average of two neighboring \(F\)-states.
Player 2's move and the raw Bellman equation
Suppose Player 2 chooses an integer \(t\ge 1\). Then
$q_t=2^{-t},\qquad d_t=2^{t-1}$
are, respectively, the success probability and the number of points gained on success.
Define the success continuation value by
\(S_t(a,b)=1\) if \(d_t\ge b\), because Player 2 wins immediately, and otherwise \(S_t(a,b)=G(a,b-d_t)\), because after scoring \(d_t\) points the turn passes to Player 1.
If \(t\) is fixed, the one-step Bellman equation is therefore
$F(a,b)=q_t\,S_t(a,b)+(1-q_t)\,G(a,b).$
The subtle point is the failure branch: when Player 2 misses, the scores do not change, so the game returns to the start of Player 1's turn at the same pair \((a,b)\). That is why \(F(a,b)\) appears on both sides after we expand \(G(a,b)\).
Removing the self-reference
Substitute \(G(a,b)=\frac{F(a,b)+F(a-1,b)}{2}\) into the previous equation:
$F(a,b)=q_t\,S_t(a,b)+(1-q_t)\frac{F(a,b)+F(a-1,b)}{2}.$
Now solve this linear equation for \(F(a,b)\). After collecting terms,
$F(a,b)=\frac{2q_t\,S_t(a,b)+(1-q_t)F(a-1,b)}{1+q_t}.$
Since Player 2 chooses the best \(t\), the true recurrence is
$\boxed{F(a,b)=\max_{t\ge 1}\frac{2q_t\,S_t(a,b)+(1-q_t)F(a-1,b)}{1+q_t}.}$
This closed form is exactly what the implementations evaluate for every state \((a,b)\).
Why the dynamic program can be filled bottom-up
For a fixed state \((a,b)\), the formula only refers to two kinds of previously known values:
\(F(a-1,b)\), which lies in the previous row, and \(S_t(a,b)\), which either equals 1 or involves \(G(a,b-d_t)\) with \(b-d_t<b\).
That means a row-by-row, left-to-right fill order works. When the code reaches \((a,b)\), every state on which it depends has already been computed.
The action set is also finite. Once \(d_t\ge b\), success already means immediate victory, and any larger \(t\) only lowers the success probability \(q_t\). So only \(O(\log b)\) candidates can matter, and a small global cap is enough for the whole target 100 table.
Worked example: why a bigger gamble can be optimal
Consider the state \((a,b)=(1,2)\): Player 1 needs 1 more point, Player 2 needs 2 more points, and it is Player 2's turn.
First compute the simpler state \((1,1)\). If Player 2 chooses \(t=1\), then \(q_1=1/2\) and success ends the game immediately. The formula gives
$F(1,1)=\frac{2\cdot(1/2)\cdot 1+(1-1/2)\cdot 0}{1+1/2}=\frac{2}{3},$
so
$G(1,1)=\frac{F(1,1)+F(0,1)}{2}=\frac{2/3+0}{2}=\frac{1}{3}.$
Now compare two actions at \((1,2)\).
If Player 2 chooses \(t=1\), a success scores only 1 point, so the success branch leads to \(G(1,1)=1/3\). Since \(F(0,2)=0\),
$F_{t=1}(1,2)=\frac{2\cdot(1/2)\cdot(1/3)+(1-1/2)\cdot 0}{1+1/2}=\frac{2}{9}\approx 0.2222.$
If Player 2 chooses \(t=2\), then \(q_2=1/4\) but the gain is 2 points, which wins immediately:
$F_{t=2}(1,2)=\frac{2\cdot(1/4)\cdot 1+(1-1/4)\cdot 0}{1+1/4}=\frac{2}{5}=0.4.$
So the optimal move at \((1,2)\) is the riskier \(t=2\). This is exactly the kind of local tradeoff the recurrence captures across the whole \(100\times 100\) state space.
How the Code Works
Table setup and precomputation
The C++, Python, and Java implementations store a two-dimensional table for \(F(a,b)\). The column \(b=0\) is initialized to 1, expressing that Player 2 has already won whenever no further points are needed.
They also precompute the candidate actions: for each tested \(t\), the code stores \(q_t=2^{-t}\) and \(d_t=2^{t-1}\). Because the gains double each time, only a small number of actions must be examined for target 100.
The bottom-up sweep
The main loop visits states in increasing order of \(a\) and \(b\). At each cell, the implementation reads the already known value \(F(a-1,b)\), evaluates every admissible action \(t\), computes the corresponding success value \(S_t(a,b)\), plugs those quantities into the closed-form recurrence, and keeps the maximum.
The C++ implementation additionally stores the maximizing choice of \(t\) for each state, while the Python and Java implementations keep only the probability table because the final answer needs only the optimal value.
Extracting the start state and cross-checking
After the table is complete, the required answer is obtained by one final Player 1 transition:
$G(100,100)=\frac{F(100,100)+F(99,100)}{2}.$
The C++ program also includes an independent Bellman value-iteration verifier on smaller targets and checks that its threaded verifier agrees with the single-threaded version. Those checks are not part of the mathematical core, but they confirm that the closed-form recurrence has been implemented correctly.
Complexity Analysis
For a general target \(N\), there are \(N^2\) states \((a,b)\). Each state tests only \(L\) candidate actions, where \(L=O(\log N)\) because the possible gains are powers of two. The running time is therefore \(O(N^2L)=O(N^2\log N)\).
The memory usage is \(O(N^2)\) for the probability table. One implementation also stores an auxiliary policy table and a verification routine, but the main solver itself is still a straightforward \(O(N^2)\)-space dynamic program. For the actual Project Euler input \(N=100\), this is easily fast enough.
Footnotes and References
- Problem page: https://projecteuler.net/problem=232
- Dynamic programming: Wikipedia - Dynamic programming
- Bellman equation: Wikipedia - Bellman equation
- Markov decision process: Wikipedia - Markov decision process
- Bernoulli trial: Wikipedia - Bernoulli trial