Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 232: The Race

Project Euler Problem 232 Solution

EulerSolve provides an optimized solution for Project Euler Problem 232, The Race, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Player 1 and Player 2 both start at 0 and race to 100 points. Player 1 always plays the simple strategy: on each turn, flip one fair coin and score 1 point on heads, 0 on tails. Player 2 has a much richer move: before each turn, choose a positive integer $T$, then succeed with probability $2^{-T}$; on success, score $2^{T-1}$ points, otherwise score nothing. Player 1 moves first, and the question asks for Player 2's maximum possible winning probability under optimal choices of $T$. The solutions do not simulate individual games. Instead they solve a finite optimal-control problem on the remaining distances to the target. For the actual target 100, the computed starting probability is $0.83648556$ to eight decimal places. Mathematical Approach The right viewpoint is to forget absolute scores and track only how many points each player still needs. That turns the game into a dynamic program with an explicit Bellman recurrence. Two state functions for the two turn types Let $F(a,b)$ be the optimal probability that Player 2 eventually wins when Player 1 still needs $a$ points, Player 2 still needs $b$ points, and it is Player 2's turn to move. It is also convenient to introduce $G(a,b)$, the corresponding probability at the start of Player 1's turn with the same remaining distances....

Detailed mathematical approach

Problem Summary

Player 1 and Player 2 both start at 0 and race to 100 points. Player 1 always plays the simple strategy: on each turn, flip one fair coin and score 1 point on heads, 0 on tails. Player 2 has a much richer move: before each turn, choose a positive integer $T$, then succeed with probability $2^{-T}$; on success, score $2^{T-1}$ points, otherwise score nothing. Player 1 moves first, and the question asks for Player 2's maximum possible winning probability under optimal choices of $T$.

The solutions do not simulate individual games. Instead they solve a finite optimal-control problem on the remaining distances to the target. For the actual target 100, the computed starting probability is $0.83648556$ to eight decimal places.

Mathematical Approach

The right viewpoint is to forget absolute scores and track only how many points each player still needs. That turns the game into a dynamic program with an explicit Bellman recurrence.

Two state functions for the two turn types

Let $F(a,b)$ be the optimal probability that Player 2 eventually wins when Player 1 still needs $a$ points, Player 2 still needs $b$ points, and it is Player 2's turn to move.

It is also convenient to introduce $G(a,b)$, the corresponding probability at the start of Player 1's turn with the same remaining distances. The initial position of the real problem is therefore $G(100,100)$, not $F(100,100)$, because Player 1 moves first.

The boundary conditions are immediate:

$F(a,0)=1$ and $G(a,0)=1$, because Player 2 has already reached the target; $G(0,b)=0$, because if Player 1 has already arrived before Player 2 gets another move, Player 2 has lost.

The easy transition: Player 1's turn

Player 1 either scores 1 point with probability $1/2$ or scores nothing with probability $1/2$. Therefore, for $a\ge 1$ and $b\ge 1$,

$G(a,b)=\frac{F(a,b)+F(a-1,b)}{2}.$

This formula is the first invariant used by the code: every time the game passes through Player 1's turn, the continuation value is just the average of two neighboring $F$-states.

Player 2's move and the raw Bellman equation

Suppose Player 2 chooses an integer $t\ge 1$. Then

$q_t=2^{-t},\qquad d_t=2^{t-1}$

are, respectively, the success probability and the number of points gained on success.

Define the success continuation value by

$S_t(a,b)=1$ if $d_t\ge b$, because Player 2 wins immediately, and otherwise $S_t(a,b)=G(a,b-d_t)$, because after scoring $d_t$ points the turn passes to Player 1.

If $t$ is fixed, the one-step Bellman equation is therefore

$F(a,b)=q_t\,S_t(a,b)+(1-q_t)\,G(a,b).$

The subtle point is the failure branch: when Player 2 misses, the scores do not change, so the game returns to the start of Player 1's turn at the same pair $(a,b)$. That is why $F(a,b)$ appears on both sides after we expand $G(a,b)$.

Removing the self-reference

Substitute $G(a,b)=\frac{F(a,b)+F(a-1,b)}{2}$ into the previous equation:

$F(a,b)=q_t\,S_t(a,b)+(1-q_t)\frac{F(a,b)+F(a-1,b)}{2}.$

Now solve this linear equation for $F(a,b)$. After collecting terms,

$F(a,b)=\frac{2q_t\,S_t(a,b)+(1-q_t)F(a-1,b)}{1+q_t}.$

Since Player 2 chooses the best $t$, the true recurrence is

$\boxed{F(a,b)=\max_{t\ge 1}\frac{2q_t\,S_t(a,b)+(1-q_t)F(a-1,b)}{1+q_t}.}$

This closed form is exactly what the implementations evaluate for every state $(a,b)$.

Why the dynamic program can be filled bottom-up

For a fixed state $(a,b)$, the formula only refers to two kinds of previously known values:

$F(a-1,b)$, which lies in the previous row, and $S_t(a,b)$, which either equals 1 or involves $G(a,b-d_t)$ with $b-d_t<b$.

That means a row-by-row, left-to-right fill order works. When the code reaches $(a,b)$, every state on which it depends has already been computed.

The action set is also finite. Once $d_t\ge b$, success already means immediate victory, and any larger $t$ only lowers the success probability $q_t$. So only $O(\log b)$ candidates can matter, and a small global cap is enough for the whole target 100 table.

Worked example: why a bigger gamble can be optimal

Consider the state $(a,b)=(1,2)$: Player 1 needs 1 more point, Player 2 needs 2 more points, and it is Player 2's turn.

First compute the simpler state $(1,1)$. If Player 2 chooses $t=1$, then $q_1=1/2$ and success ends the game immediately. The formula gives

$F(1,1)=\frac{2\cdot(1/2)\cdot 1+(1-1/2)\cdot 0}{1+1/2}=\frac{2}{3},$

so

$G(1,1)=\frac{F(1,1)+F(0,1)}{2}=\frac{2/3+0}{2}=\frac{1}{3}.$

Now compare two actions at $(1,2)$.

If Player 2 chooses $t=1$, a success scores only 1 point, so the success branch leads to $G(1,1)=1/3$. Since $F(0,2)=0$,

$F_{t=1}(1,2)=\frac{2\cdot(1/2)\cdot(1/3)+(1-1/2)\cdot 0}{1+1/2}=\frac{2}{9}\approx 0.2222.$

If Player 2 chooses $t=2$, then $q_2=1/4$ but the gain is 2 points, which wins immediately:

$F_{t=2}(1,2)=\frac{2\cdot(1/4)\cdot 1+(1-1/4)\cdot 0}{1+1/4}=\frac{2}{5}=0.4.$

So the optimal move at $(1,2)$ is the riskier $t=2$. This is exactly the kind of local tradeoff the recurrence captures across the whole $100\times 100$ state space.

How the Code Works

Table setup and precomputation

The C++, Python, and Java implementations store a two-dimensional table for $F(a,b)$. The column $b=0$ is initialized to 1, expressing that Player 2 has already won whenever no further points are needed.

They also precompute the candidate actions: for each tested $t$, the code stores $q_t=2^{-t}$ and $d_t=2^{t-1}$. Because the gains double each time, only a small number of actions must be examined for target 100.

The bottom-up sweep

The main loop visits states in increasing order of $a$ and $b$. At each cell, the implementation reads the already known value $F(a-1,b)$, evaluates every admissible action $t$, computes the corresponding success value $S_t(a,b)$, plugs those quantities into the closed-form recurrence, and keeps the maximum.

The C++ implementation additionally stores the maximizing choice of $t$ for each state, while the Python and Java implementations keep only the probability table because the final answer needs only the optimal value.

Extracting the start state and cross-checking

After the table is complete, the required answer is obtained by one final Player 1 transition:

$G(100,100)=\frac{F(100,100)+F(99,100)}{2}.$

The C++ program also includes an independent Bellman value-iteration verifier on smaller targets and checks that its threaded verifier agrees with the single-threaded version. Those checks are not part of the mathematical core, but they confirm that the closed-form recurrence has been implemented correctly.

Complexity Analysis

For a general target $N$, there are $N^2$ states $(a,b)$. Each state tests only $L$ candidate actions, where $L=O(\log N)$ because the possible gains are powers of two. The running time is therefore $O(N^2L)=O(N^2\log N)$.

The memory usage is $O(N^2)$ for the probability table. One implementation also stores an auxiliary policy table and a verification routine, but the main solver itself is still a straightforward $O(N^2)$-space dynamic program. For the actual Project Euler input $N=100$, this is easily fast enough.

Footnotes and References

Problem page: https://projecteuler.net/problem=232
Dynamic programming: Wikipedia - Dynamic programming
Bellman equation: Wikipedia - Bellman equation
Markov decision process: Wikipedia - Markov decision process
Bernoulli trial: Wikipedia - Bernoulli trial