Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 756: Approximating a Sum

Project Euler Problem 756 Solution

EulerSolve provides an optimized solution for Project Euler Problem 756, Approximating a Sum, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary The implemented quantity is an expected error term arising from approximating a weighted sum by sampling $m$ distinct positions from $\{1,\dots,n\}$ uniformly without replacement. If $X$ denotes the smallest sampled position and $f$ is the weight sequence, the error is the total weight lying strictly before the first sampled position: $\sum_{k=1}^{X-1} f(k).$ For Problem 756 the weight is Euler's totient function, so $f(k)=\varphi(k)$, and the production parameters are $n=12{,}345{,}678$ and $m=12{,}345$. The goal is therefore to evaluate the corresponding expectation accurately and efficiently. Mathematical Approach Write $S\subseteq\{1,\dots,n\}$ for the uniformly chosen set of size $m$, and let $X=\min S$. The implementations compute $E_{n,m}(f)=\mathbb{E}\left[\sum_{k=1}^{X-1} f(k)\right].$ Step 1: Rewrite the error with indicator variables Term $f(k)$ contributes to the error exactly when no sampled position appears among $1,\dots,k$, which is equivalent to $X\gt k$. Therefore $\sum_{k=1}^{X-1} f(k)=\sum_{k=1}^{n-m} f(k)\,\mathbf{1}_{\{X\gt k\}}.$ The upper limit is $n-m$ because the smallest selected position can never exceed $n-m+1$....

Detailed mathematical approach

Problem Summary

The implemented quantity is an expected error term arising from approximating a weighted sum by sampling $m$ distinct positions from $\{1,\dots,n\}$ uniformly without replacement. If $X$ denotes the smallest sampled position and $f$ is the weight sequence, the error is the total weight lying strictly before the first sampled position:

$\sum_{k=1}^{X-1} f(k).$

For Problem 756 the weight is Euler's totient function, so $f(k)=\varphi(k)$, and the production parameters are $n=12{,}345{,}678$ and $m=12{,}345$. The goal is therefore to evaluate the corresponding expectation accurately and efficiently.

Mathematical Approach

Write $S\subseteq\{1,\dots,n\}$ for the uniformly chosen set of size $m$, and let $X=\min S$. The implementations compute

$E_{n,m}(f)=\mathbb{E}\left[\sum_{k=1}^{X-1} f(k)\right].$

Step 1: Rewrite the error with indicator variables

Term $f(k)$ contributes to the error exactly when no sampled position appears among $1,\dots,k$, which is equivalent to $X\gt k$. Therefore

$\sum_{k=1}^{X-1} f(k)=\sum_{k=1}^{n-m} f(k)\,\mathbf{1}_{\{X\gt k\}}.$

The upper limit is $n-m$ because the smallest selected position can never exceed $n-m+1$. Taking expectations and using linearity gives

$E_{n,m}(f)=\sum_{k=1}^{n-m} f(k)\,\Pr(X\gt k).$

This reduces the whole problem to finding one survival probability for each prefix length $k$.

Step 2: Count $\Pr(X\gt k)$ combinatorially

The condition $X\gt k$ means that every sampled position lies in $\{k+1,\dots,n\}$, a set of size $n-k$. The number of admissible samples is therefore $\binom{n-k}{m}$, while the total number of samples is $\binom{n}{m}$. Hence

$\Pr(X\gt k)=\frac{\binom{n-k}{m}}{\binom{n}{m}} \qquad (1\le k\le n-m).$

Substituting this into the expectation yields the exact closed form

$E_{n,m}(f)=\sum_{k=1}^{n-m} f(k)\frac{\binom{n-k}{m}}{\binom{n}{m}}.$

No subset enumeration is needed; only combinatorial counting is.

Step 3: Turn the binomial formula into a stable recurrence

Directly evaluating huge binomial coefficients would be wasteful. Define

$q_k=\Pr(X\gt k)=\frac{\binom{n-k}{m}}{\binom{n}{m}}.$

Then the first value is

$q_1=\frac{n-m}{n},$

and consecutive terms satisfy

$\frac{q_{k+1}}{q_k}=\frac{\binom{n-k-1}{m}}{\binom{n-k}{m}}=\frac{n-k-m}{n-k},$

so

$q_{k+1}=q_k\frac{n-k-m}{n-k}\qquad (1\le k\le n-m-1).$

This recurrence lets the expectation be accumulated in one forward pass, using only one running probability instead of enormous intermediate integers.

Step 4: Specialize the weights to Euler's totient function

For Problem 756, the weight at index $k$ is $\varphi(k)$, the number of integers in $\{1,\dots,k\}$ that are coprime to $k$. Its standard multiplicative formula is

$\varphi(r)=r\prod_{p\mid r}\left(1-\frac{1}{p}\right).$

The implementations obtain all totient values up to $n$ with a sieve. Start with $\varphi(r)=r$ for every $r$. Whenever a prime $p$ is found, update each multiple of $p$ by

$\varphi(r)\leftarrow \varphi(r)-\frac{\varphi(r)}{p}.$

After all primes have been processed, every needed weight $\varphi(1),\dots,\varphi(n)$ is available, and the expectation becomes

$E_{n,m}(\varphi)=\sum_{k=1}^{n-m} \varphi(k)\frac{\binom{n-k}{m}}{\binom{n}{m}}.$

Worked Example: $n=5$, $m=2$, $f(k)=\varphi(k)$

Here $n-m=3$, so only $k=1,2,3$ matter. The survival probabilities are

$q_1=\frac{\binom{4}{2}}{\binom{5}{2}}=\frac{3}{5},\qquad q_2=\frac{\binom{3}{2}}{\binom{5}{2}}=\frac{3}{10},\qquad q_3=\frac{\binom{2}{2}}{\binom{5}{2}}=\frac{1}{10}.$

The relevant totients are

$\varphi(1)=1,\qquad \varphi(2)=1,\qquad \varphi(3)=2.$

Therefore

$E_{5,2}(\varphi)=1\cdot\frac{3}{5}+1\cdot\frac{3}{10}+2\cdot\frac{1}{10}=\frac{11}{10}.$

This agrees with direct enumeration of the ten 2-element subsets of $\{1,2,3,4,5\}$: four subsets start at 1 and contribute 0, three start at 2 and contribute 1, two start at 3 and contribute $1+1=2$, and one starts at 4 and contributes $1+1+2=4$. The average is $(0\cdot4+1\cdot3+2\cdot2+4\cdot1)/10=11/10$.

How the Code Works

The C++, Python, and Java implementations all follow the same two-phase plan. First they build the totient table up to $n$ using the prime-multiple sieve described above. That preprocessing is the dominant cost, but it is done only once.

Second, they evaluate the expectation with the recurrence for $q_k$. The first probability is $(n-m)/n$, and each later probability is obtained from the previous one by multiplying by $(n-k-m)/(n-k)$. At each step the current probability is multiplied by the current totient value and added to the running total.

The implementations never enumerate subsets and never form the huge binomial coefficients explicitly. The C++ version also checks two reference values before the main run: a linear-weight case with exact value $2525/1326$, and a totient-weight checkpoint at $n=10{,}000$, $m=100$ giving $5842.849907$. The production computation then uses $n=12{,}345{,}678$ and $m=12{,}345$ and prints the result to six decimal places.

Complexity Analysis

Computing all totients up to $n$ by the sieve costs $O(n\log\log n)$ time and $O(n)$ memory. The expectation pass uses one additional forward scan of length $n-m$, so it costs $O(n-m)$ time and $O(1)$ extra space beyond the totient table. Overall, the method runs in $O(n\log\log n)$ time and $O(n)$ memory.

Footnotes and References

Problem page: Project Euler 756
Euler's totient function: Wikipedia — Euler's totient function
Hypergeometric distribution and sampling without replacement: Wikipedia — Hypergeometric distribution
Negative hypergeometric distribution: Wikipedia — Negative hypergeometric distribution
Expected value and linearity of expectation: Wikipedia — Expected value