Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 819: Iterative Sampling

View on Project Euler

Project Euler Problem 819 Solution

EulerSolve provides an optimized solution for Project Euler Problem 819, Iterative Sampling, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let \(X_0=n\). If the current state is \(k\), we perform \(k\) independent uniform draws from \(n\) labels, and the next state \(X_{r+1}\) is the number of distinct labels that appear in those \(k\) draws. Because every round satisfies \(1 \le X_{r+1} \le k\), the process eventually reaches the absorbing state \(1\). The task is to compute the expected number of rounds needed to go from \(n\) to \(1\), for the case \(n=1000\), and report the result to six decimal places. Mathematical Approach The key observation is that the process is a finite Markov chain whose transition law depends only on an occupancy count: from state \(k\), the next state is the number of occupied labels after \(k\) draws into \(n\) equally likely labels. Step 1: Define the states and transition probabilities Let \(t_k\) denote the expected number of additional rounds needed to reach state \(1\) when the current state is \(k\). Then $t_1=0.$ To determine \(t_k\), we need the transition probabilities $q_k(j)=\Pr(X_{r+1}=j\mid X_r=k),\qquad 1\le j\le k.$ So the combinatorial core of the problem is: after \(k\) independent draws from \(n\) equally likely labels, what is the probability that exactly \(j\) distinct labels appear? Step 2: Build the occupancy distribution Start with zero draws. Then only zero labels can be occupied: $q_0(0)=1.$ Now append one more draw....

Detailed mathematical approach

Problem Summary

Let \(X_0=n\). If the current state is \(k\), we perform \(k\) independent uniform draws from \(n\) labels, and the next state \(X_{r+1}\) is the number of distinct labels that appear in those \(k\) draws. Because every round satisfies \(1 \le X_{r+1} \le k\), the process eventually reaches the absorbing state \(1\). The task is to compute the expected number of rounds needed to go from \(n\) to \(1\), for the case \(n=1000\), and report the result to six decimal places.

Mathematical Approach

The key observation is that the process is a finite Markov chain whose transition law depends only on an occupancy count: from state \(k\), the next state is the number of occupied labels after \(k\) draws into \(n\) equally likely labels.

Step 1: Define the states and transition probabilities

Let \(t_k\) denote the expected number of additional rounds needed to reach state \(1\) when the current state is \(k\). Then

$t_1=0.$

To determine \(t_k\), we need the transition probabilities

$q_k(j)=\Pr(X_{r+1}=j\mid X_r=k),\qquad 1\le j\le k.$

So the combinatorial core of the problem is: after \(k\) independent draws from \(n\) equally likely labels, what is the probability that exactly \(j\) distinct labels appear?

Step 2: Build the occupancy distribution

Start with zero draws. Then only zero labels can be occupied:

$q_0(0)=1.$

Now append one more draw. To end with exactly \(j\) distinct labels after \(k+1\) draws, there are two mutually exclusive possibilities:

the first \(k\) draws already used exactly \(j\) labels and the new draw hits one of those \(j\) labels, or the first \(k\) draws used exactly \(j-1\) labels and the new draw hits a brand-new label.

This gives the recurrence

$q_{k+1}(j)=q_k(j)\frac{j}{n}+q_k(j-1)\frac{n-j+1}{n}.$

This is exactly the triangular probability table computed by the implementations. An equivalent closed form is

$q_k(j)=\frac{n(n-1)\cdots(n-j+1)}{n^k}\,S(k,j),$

where \(S(k,j)\) is a Stirling number of the second kind, but the recurrence is the most direct way to evaluate all needed probabilities numerically.

Step 3: Write the expectation equation by first-step analysis

From state \(k\), one round is spent immediately, and then the chain moves to state \(j\) with probability \(q_k(j)\). Therefore

$t_k=1+\sum_{j=1}^{k} q_k(j)\,t_j.$

At first glance this looks circular, because \(t_k\) appears on both sides. That is not a problem: the only unknown on the right that is not already a smaller-state value is the self-loop term corresponding to \(j=k\).

Step 4: Isolate the self-loop and obtain a triangular solve

Move the \(j=k\) term to the left:

$\left(1-q_k(k)\right)t_k=1+\sum_{j=1}^{k-1} q_k(j)\,t_j.$

Hence

$t_k=\frac{1+\sum_{j=1}^{k-1} q_k(j)\,t_j}{1-q_k(k)}.$

The denominator is positive for every \(k\ge 2\), because \(q_k(k)\) is only the probability that all \(k\) draws are distinct:

$q_k(k)=\frac{n(n-1)\cdots(n-k+1)}{n^k}\lt 1.$

So once the probabilities are known, the expectations can be computed in increasing order \(t_2,t_3,\dots,t_n\). That is why the system is triangular rather than requiring a full linear solver.

Step 5: Worked example for \(n=3\)

For state \(2\), two draws from three labels produce either one distinct label or two distinct labels:

$q_2(1)=\frac{1}{3},\qquad q_2(2)=\frac{2}{3}.$

Therefore

$t_2=1+\frac{1}{3}t_1+\frac{2}{3}t_2=1+\frac{2}{3}t_2,$

so

$t_2=3.$

For state \(3\), three draws from three labels give

$q_3(1)=\frac{1}{9},\qquad q_3(2)=\frac{2}{3},\qquad q_3(3)=\frac{2}{9}.$

Then

$t_3=1+\frac{1}{9}t_1+\frac{2}{3}t_2+\frac{2}{9}t_3.$

Substituting \(t_1=0\) and \(t_2=3\) gives

$t_3=1+0+2+\frac{2}{9}t_3,$

hence

$\frac{7}{9}t_3=3,\qquad t_3=\frac{27}{7}.$

This is the small exact checkpoint used to verify the recurrence.

Step 6: Final quantity for the problem

The required answer is \(t_n\) with \(n=1000\). No random simulation is needed. Once the occupancy table \(q_k(j)\) has been filled, the expected number of rounds follows deterministically from the formula above.

How the Code Works

The C++, Python, and Java implementations follow the same two-stage plan. First, they build a \((n+1)\times(n+1)\) probability table row by row. Each row corresponds to a draw count \(k\), and each entry stores the probability of seeing a specific number of distinct labels. Only the triangular region \(0\le j\le k\le n\) is relevant, so each new row is formed from the previous one using the occupancy recurrence.

After the transition table is ready, the implementation stores the expected remaining rounds for each state. It sets the absorbing value \(t_1=0\), then computes \(t_2,t_3,\dots,t_n\) in increasing order by applying the self-loop formula. Because every new state depends only on smaller states and on the already known diagonal probability \(q_k(k)\), no iterative approximation is required. The final printed value is the expectation for \(n=1000\), formatted to six digits after the decimal point.

Complexity Analysis

Building the occupancy table requires

$\sum_{k=0}^{n-1}(k+1)=O(n^2)$

updates. Solving the expectation equations requires

$\sum_{k=2}^{n}(k-1)=O(n^2)$

more arithmetic operations. Therefore the overall running time is \(O(n^2)\). The implementations keep the full triangular probability table in memory, so the space usage is \(O(n^2)\), while the expectation array itself contributes only \(O(n)\).

Footnotes and References

  1. Problem page: Project Euler 819
  2. Occupancy problem: Wikipedia - Occupancy problem
  3. Stirling numbers of the second kind: Wikipedia - Stirling numbers of the second kind
  4. Markov chains: Wikipedia - Markov chain
  5. Coupon collector's problem: Wikipedia - Coupon collector's problem

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 818 · All Project Euler solutions · Next: Problem 820

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮