Problem 493: Under the Rainbow
View on Project EulerProject Euler Problem 493 Solution
EulerSolve provides an optimized solution for Project Euler Problem 493, Under the Rainbow, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary An urn contains \(C\) colors, each represented by \(B\) balls, so the total number of balls is \(CB\). We draw \(D\) balls uniformly without replacement and ask for the expected number of distinct colors that appear in the sample. For Problem 493, the concrete parameters are \((C,B,D)=(7,10,20)\). Mathematical Approach Let \(X\) be the number of distinct colors seen in the draw. The key observation is that the expectation can be reduced to a single probability about one fixed color, and that probability is a simple hypergeometric complement. Step 1: Model the Random Variable For each color \(j\in\{1,\dots,C\}\), define an indicator variable \(I_j\) that equals \(1\) if color \(j\) appears at least once in the sample and equals \(0\) otherwise. Then the number of distinct colors is $X=\sum_{j=1}^{C} I_j.$ This reformulation turns a seemingly global counting question into a sum of very small yes-or-no events, one event for each color. Step 2: Apply Linearity of Expectation Taking expectations gives $\mathbb{E}[X]=\sum_{j=1}^{C}\mathbb{E}[I_j]=\sum_{j=1}^{C}\Pr(I_j=1).$ No independence is needed here; linearity of expectation always holds. By symmetry, every color has the same chance to appear. If we write $p=\Pr(I_1=1),$ then $\mathbb{E}[X]=Cp.$ So the whole problem is reduced to finding the appearance probability of one fixed color....
Detailed mathematical approach
Problem Summary
An urn contains \(C\) colors, each represented by \(B\) balls, so the total number of balls is \(CB\). We draw \(D\) balls uniformly without replacement and ask for the expected number of distinct colors that appear in the sample. For Problem 493, the concrete parameters are \((C,B,D)=(7,10,20)\).
Mathematical Approach
Let \(X\) be the number of distinct colors seen in the draw. The key observation is that the expectation can be reduced to a single probability about one fixed color, and that probability is a simple hypergeometric complement.
Step 1: Model the Random Variable
For each color \(j\in\{1,\dots,C\}\), define an indicator variable \(I_j\) that equals \(1\) if color \(j\) appears at least once in the sample and equals \(0\) otherwise. Then the number of distinct colors is
$X=\sum_{j=1}^{C} I_j.$
This reformulation turns a seemingly global counting question into a sum of very small yes-or-no events, one event for each color.
Step 2: Apply Linearity of Expectation
Taking expectations gives
$\mathbb{E}[X]=\sum_{j=1}^{C}\mathbb{E}[I_j]=\sum_{j=1}^{C}\Pr(I_j=1).$
No independence is needed here; linearity of expectation always holds. By symmetry, every color has the same chance to appear. If we write
$p=\Pr(I_1=1),$
then
$\mathbb{E}[X]=Cp.$
So the whole problem is reduced to finding the appearance probability of one fixed color.
Step 3: Count the Complementary Event
It is easier to compute the probability that the chosen color does not appear. The total number of equally likely samples of size \(D\) is
$\binom{CB}{D}.$
If one particular color is missing, then all \(D\) drawn balls must come from the other \(C-1\) colors, which contribute \((C-1)B\) balls altogether. Therefore
$q=\Pr(I_1=0)=\frac{\binom{(C-1)B}{D}}{\binom{CB}{D}}.$
Since \(p=1-q\), we obtain
$p=1-\frac{\binom{(C-1)B}{D}}{\binom{CB}{D}}.$
Step 4: Derive the Closed Formula
Substituting \(p\) into \(\mathbb{E}[X]=Cp\) yields the exact expectation
$\boxed{\mathbb{E}[X]=C\left(1-\frac{\binom{(C-1)B}{D}}{\binom{CB}{D}}\right).}$
For the Project Euler parameters, this becomes
$\mathbb{E}[X]=7\left(1-\frac{\binom{60}{20}}{\binom{70}{20}}\right).$
This formula is already the mathematical solution. The remaining task is simply to evaluate it numerically in a stable way.
Step 5: Rewrite the Binomial Ratio as a Product
Directly forming the two large binomial coefficients is unnecessary. Using falling products,
$\binom{n}{D}=\frac{n(n-1)\cdots(n-D+1)}{D!},$
so for any \(A\) and \(N\) with \(A\le N\) we have
$\frac{\binom{A}{D}}{\binom{N}{D}}=\frac{A(A-1)\cdots(A-D+1)}{N(N-1)\cdots(N-D+1)}=\prod_{i=0}^{D-1}\frac{A-i}{N-i}.$
With \(A=(C-1)B\) and \(N=CB\), the missing-color probability is therefore
$q=\prod_{i=0}^{D-1}\frac{(C-1)B-i}{CB-i}.$
This product form is exactly what the implementations use, because it avoids huge intermediate values and matches floating-point arithmetic naturally.
Worked Example: \((C,B,D)=(3,2,2)\)
This small case is easy to verify by hand. There are \(3\) colors with \(2\) balls each, so \(6\) balls in total. For one fixed color, the probability that it is absent from the two draws is
$q=\frac{\binom{4}{2}}{\binom{6}{2}}=\frac{6}{15}=\frac{2}{5}.$
Hence
$p=1-q=\frac{3}{5},\qquad \mathbb{E}[X]=3\cdot\frac{3}{5}=\frac{9}{5}=1.8.$
A direct enumeration confirms this. Among the \(\binom{6}{2}=15\) equally likely pairs, exactly \(3\) pairs have both balls of the same color and the remaining \(12\) pairs have two different colors, so
$\frac{3\cdot 1+12\cdot 2}{15}=\frac{27}{15}=\frac{9}{5}.$
How the Code Works
The C++, Python, and Java implementations all use the same formula. They first compute the total number of balls \(CB\) and the number of balls that remain after excluding one fixed color, namely \((C-1)B\).
They then evaluate the product
$\prod_{i=0}^{D-1}\frac{(C-1)B-i}{CB-i}$
term by term. Each iteration contributes one factor to the probability that the chosen color is missing from the sample.
After that, the implementation takes the complement of this probability and multiplies by \(C\):
$C\left(1-\prod_{i=0}^{D-1}\frac{(C-1)B-i}{CB-i}\right).$
The result is printed to nine digits after the decimal point. The C++ implementation also performs small sanity checks, including the toy example above and the edge case \(D=0\), before printing the final value.
Complexity Analysis
For general parameters, the method performs exactly \(D\) multiplicative updates, so the running time is \(O(D)\) and the extra memory usage is \(O(1)\). For the actual problem instance \(D=20\), the computation is tiny. The important point is that the algorithm never enumerates samples and never builds large combinatorial tables.
Footnotes and References
- Problem page: https://projecteuler.net/problem=493
- Hypergeometric distribution: Wikipedia — Hypergeometric distribution
- Indicator function and indicator variables: Wikipedia — Indicator function
- Expected value and linearity: Wikipedia — Expected value
- Binomial coefficient: Wikipedia — Binomial coefficient