Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 744: What? Where? When?

Project Euler Problem 744 Solution

EulerSolve provides an optimized solution for Project Euler Problem 744, What? Where? When?, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Consider independent Bernoulli trials with success probability $p$ and failure probability $q=1-p$. The process stops as soon as either successes or failures has appeared $n$ times. If $T$ denotes that stopping time, then the quantity evaluated by the implementation is $f(n,p)=\frac{2n+1-\mathbb{E}[T]}{2n+1}.$ So the problem is to compute a normalized expected remaining score in a first-to-$n$ race between two complementary outcomes. For small or moderate inputs the value can be summed exactly, while for huge $n$ with a strong bias one can replace the full stopping-time distribution by a drift-dominant approximation. Mathematical Approach The exact formula comes from the distribution of the stopping time $T$. Because one side reaches count $n$ first, we always have $n \le T \le 2n-1$. Step 1: Describe the Stopping Event Write $T=n+m$ with $0\le m\le n-1$. There are two mutually exclusive ways to stop on trial $n+m$: the last trial can be a failure, or the last trial can be a success. If the process ends with a failure on trial $n+m$, then the first $n+m-1$ trials must contain exactly $n-1$ failures and $m$ successes, followed by one more failure....

Detailed mathematical approach

Problem Summary

Consider independent Bernoulli trials with success probability $p$ and failure probability $q=1-p$. The process stops as soon as either successes or failures has appeared $n$ times. If $T$ denotes that stopping time, then the quantity evaluated by the implementation is

$f(n,p)=\frac{2n+1-\mathbb{E}[T]}{2n+1}.$

So the problem is to compute a normalized expected remaining score in a first-to-$n$ race between two complementary outcomes. For small or moderate inputs the value can be summed exactly, while for huge $n$ with a strong bias one can replace the full stopping-time distribution by a drift-dominant approximation.

Mathematical Approach

The exact formula comes from the distribution of the stopping time $T$. Because one side reaches count $n$ first, we always have $n \le T \le 2n-1$.

Step 1: Describe the Stopping Event

Write $T=n+m$ with $0\le m\le n-1$. There are two mutually exclusive ways to stop on trial $n+m$: the last trial can be a failure, or the last trial can be a success.

If the process ends with a failure on trial $n+m$, then the first $n+m-1$ trials must contain exactly $n-1$ failures and $m$ successes, followed by one more failure. Therefore

$A_m=\Pr(T=n+m\text{ and stop by a failure})=\binom{n+m-1}{m}p^m q^n.$

By symmetry, ending with a success has probability

$B_m=\Pr(T=n+m\text{ and stop by a success})=\binom{n+m-1}{m}q^m p^n.$

Step 2: Sum the Two Terminal Branches

The two branches are disjoint, so

$\Pr(T=n+m)=A_m+B_m.$

Since $\sum_{m=0}^{n-1}\Pr(T=n+m)=1$, the quantity being computed can be written either as a normalized expectation

$f(n,p)=\frac{2n+1-\mathbb{E}[T]}{2n+1},$

or directly as the exact finite sum

$f(n,p)=\frac{1}{2n+1}\sum_{m=0}^{n-1}(2n+1-(n+m))(A_m+B_m).$

Because $2n+1-(n+m)=n+1-m$, this becomes

$f(n,p)=\frac{1}{2n+1}\sum_{m=0}^{n-1}(n+1-m)\left(\binom{n+m-1}{m}p^m q^n+\binom{n+m-1}{m}q^m p^n\right).$

Step 3: Convert the Exact Sum into a Stable Recurrence

Computing binomial coefficients from scratch at every step is unnecessary. The two branch terms satisfy

$A_0=q^n,\qquad B_0=p^n,$

and for $m\ge 0$

$A_{m+1}=A_m\cdot \frac{n+m}{m+1}\,p,\qquad B_{m+1}=B_m\cdot \frac{n+m}{m+1}\,q.$

This follows from

$\binom{n+m}{m+1}=\binom{n+m-1}{m}\frac{n+m}{m+1}.$

So the exact mode needs only constant memory: accumulate the current contribution, then advance both terminal probabilities by one recurrence step.

Step 4: Interpret the Large-Drift Regime

When $p$ is far from $\frac12$, one outcome is dominant. Let

$d=\max(p,q).$

In that case the process usually ends when the dominant outcome collects its $n$-th occurrence, and the stopping time is close to the negative-binomial expectation

$\mathbb{E}[T]\approx \frac{n}{d}.$

Substituting this into the normalized form gives the approximation used by the implementation:

$f(n,p)\approx \frac{2n+1-\frac{n}{d}}{2n+1}.$

This is effective when $n$ is huge and the bias is strong enough that the losing branch contributes negligibly.

Step 5: Decide Which Regime to Use

The implementation measures the strength of the bias with

$\delta=|1-2p|\sqrt{\frac{n}{pq}}.$

The factor $|1-2p|$ is the drift per trial, while $\sqrt{npq}$ is the natural fluctuation scale after about $n$ trials. So $\delta$ compares deterministic drift against random noise. The exact summation is used when $n\le 2000$ or $\delta \lt 15$; otherwise the large-drift approximation is used.

Worked Example: $n=3,\ p=\frac12$

Here $q=\frac12$. The exact branch terms are

$A_0=B_0=\left(\frac12\right)^3=\frac18,$

$A_1=B_1=\binom{3}{1}\left(\frac12\right)^4=\frac{3}{16},$

$A_2=B_2=\binom{4}{2}\left(\frac12\right)^5=\frac{3}{16}.$

Therefore

$\Pr(T=3)=\frac14,\qquad \Pr(T=4)=\frac38,\qquad \Pr(T=5)=\frac38.$

The normalized value is

$f\left(3,\frac12\right)=\frac{4\cdot \frac14+3\cdot \frac38+2\cdot \frac38}{7}=\frac{23}{56}\approx 0.4107142857.$

This example shows exactly how the weights $n+1-m$ are paired with the stopping-time probabilities.

How the Code Works

The C++, Python, and Java implementations all follow the same hybrid plan. They first compute $q=1-p$ and the drift score $\delta$. If the input lies in the exact regime, the program initializes the two terminal-branch probabilities at $q^n$ and $p^n$, loops over $m=0,1,\dots,n-1$, adds the weighted contribution $\frac{n+1-m}{2n+1}(A_m+B_m)$, and updates both terms by the recurrence above.

If the input lies in the drift-dominant regime, the implementation skips the loop and returns $\frac{2n+1-n/\max(p,q)}{2n+1}$ directly. All three versions print the final decimal value to ten digits, and the C++ implementation also checks a few smaller benchmark cases before evaluating the huge target input.

Complexity Analysis

The exact regime performs one pass over $m=0,\dots,n-1$, so it runs in $O(n)$ time and $O(1)$ memory. The drift-dominant approximation is $O(1)$ time and $O(1)$ memory. The hybrid switch keeps the exact method where cancellation matters and avoids an impossible $O(n)$ loop when $n$ is extremely large.

Footnotes and References

Problem page: https://projecteuler.net/problem=744
Bernoulli process: Wikipedia - Bernoulli process
Negative binomial distribution: Wikipedia - Negative binomial distribution
Binomial coefficient: Wikipedia - Binomial coefficient
Sequential analysis: Wikipedia - Sequential analysis