Problem 398: Cutting Rope
View on Project EulerProject Euler Problem 398 Solution
EulerSolve provides an optimized solution for Project Euler Problem 398, Cutting Rope, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Choose uniformly from all ordered compositions of \(n\) into \(m\) positive integers: $\mathcal{C}_{n,m}=\left\{(x_1,\dots,x_m)\in \mathbb{Z}_{>0}^m : x_1+\cdots+x_m=n\right\}.$ After sorting one sampled composition into nondecreasing order, write $X_{(1)}\le X_{(2)}\le \cdots \le X_{(m)}.$ The task solved by the code is to compute the expected value of the second shortest piece, namely \(\mathbb{E}[X_{(2)}]\), for the Project Euler parameters \(n=10^7\) and \(m=100\). Mathematical Approach 1) Total number of equally likely cuts By the standard stars-and-bars argument, the number of ordered compositions of \(n\) into \(m\) positive parts is $\left|\mathcal{C}_{n,m}\right|=\binom{n-1}{m-1}.$ So every probability in the program is a composition count divided by \(\binom{n-1}{m-1}\). 2) Tail-sum identity for an integer-valued random variable Because \(X_{(2)}\) is a positive integer, we may use the tail-sum formula $\mathbb{E}[X_{(2)}]=\sum_{k\ge 1}\Pr(X_{(2)}\ge k).$ This reduces the whole problem to counting, for each \(k\), how many compositions have second smallest part at least \(k\). 3) Interpret the event \(X_{(2)}\ge k\) The condition \(X_{(2)}\ge k\) means that at most one part is smaller than \(k\). Equivalently, the composition belongs to one of two disjoint cases: Case A: all \(m\) parts are at least \(k\)....
Detailed mathematical approach
Problem Summary
Choose uniformly from all ordered compositions of \(n\) into \(m\) positive integers:
$\mathcal{C}_{n,m}=\left\{(x_1,\dots,x_m)\in \mathbb{Z}_{>0}^m : x_1+\cdots+x_m=n\right\}.$
After sorting one sampled composition into nondecreasing order, write
$X_{(1)}\le X_{(2)}\le \cdots \le X_{(m)}.$
The task solved by the code is to compute the expected value of the second shortest piece, namely \(\mathbb{E}[X_{(2)}]\), for the Project Euler parameters \(n=10^7\) and \(m=100\).
Mathematical Approach
1) Total number of equally likely cuts
By the standard stars-and-bars argument, the number of ordered compositions of \(n\) into \(m\) positive parts is
$\left|\mathcal{C}_{n,m}\right|=\binom{n-1}{m-1}.$
So every probability in the program is a composition count divided by \(\binom{n-1}{m-1}\).
2) Tail-sum identity for an integer-valued random variable
Because \(X_{(2)}\) is a positive integer, we may use the tail-sum formula
$\mathbb{E}[X_{(2)}]=\sum_{k\ge 1}\Pr(X_{(2)}\ge k).$
This reduces the whole problem to counting, for each \(k\), how many compositions have second smallest part at least \(k\).
3) Interpret the event \(X_{(2)}\ge k\)
The condition \(X_{(2)}\ge k\) means that at most one part is smaller than \(k\). Equivalently, the composition belongs to one of two disjoint cases:
Case A: all \(m\) parts are at least \(k\).
Case B: exactly one part is in \(\{1,2,\dots,k-1\}\), and the remaining \(m-1\) parts are at least \(k\).
4) Count Case A: every part is at least \(k\)
Write \(x_i=y_i+(k-1)\) for every part. Then each \(y_i\ge 1\) and
$y_1+\cdots+y_m=n-m(k-1).$
Therefore the number of compositions in Case A is
$A_k=\binom{n-m(k-1)-1}{m-1}.$
As usual, this is understood to be \(0\) whenever the top entry is smaller than \(m-1\).
5) Count Case B: exactly one short part
Choose the short position in \(m\) ways. Let that short part have length \(t\), where \(1\le t\le k-1\). The other \(m-1\) parts are at least \(k\), so after subtracting \(k-1\) from each of them we get
$z_1+\cdots+z_{m-1}=n-t-(m-1)(k-1),\qquad z_i\ge 1.$
For fixed \(t\), the number of such compositions is
$\binom{n-t-(m-1)(k-1)-1}{m-2}.$
Summing over \(t\) gives
$B_k=m\sum_{t=1}^{k-1}\binom{n-t-(m-1)(k-1)-1}{m-2}.$
Now apply the hockey-stick identity:
$\sum_{t=1}^{k-1}\binom{n-t-(m-1)(k-1)-1}{m-2}=\binom{n-(m-1)(k-1)-1}{m-1}-\binom{n-m(k-1)-1}{m-1}.$
Hence
$B_k=m\left(\binom{n-(m-1)(k-1)-1}{m-1}-\binom{n-m(k-1)-1}{m-1}\right).$
6) Closed formula for the tail probability
Adding the two disjoint cases yields
$A_k+B_k=m\binom{n-(m-1)(k-1)-1}{m-1}-(m-1)\binom{n-m(k-1)-1}{m-1}.$
Dividing by the total number of compositions gives the exact tail used by the solver:
$\Pr(X_{(2)}\ge k)=\frac{m\binom{n-(m-1)(k-1)-1}{m-1}-(m-1)\binom{n-m(k-1)-1}{m-1}}{\binom{n-1}{m-1}}.$
Substituting this into the tail-sum identity gives the full expectation. This is the formula implemented in C++, Python, and Java.
7) Small checkpoints from the source code
The C++ program verifies two test cases before printing the final value. For \((n,m)=(3,2)\), the only compositions are \((1,2)\) and \((2,1)\), so the second shortest piece is always \(2\), hence \(\mathbb{E}[X_{(2)}]=2\).
For \((n,m)=(8,3)\), the denominator is \(\binom{7}{2}=21\). The nonzero tail terms are
$\Pr(X_{(2)}\ge 1)=1,$
$\Pr(X_{(2)}\ge 2)=\frac{3\binom{5}{2}-2\binom{4}{2}}{\binom{7}{2}}=\frac{18}{21}=\frac{6}{7},$
$\Pr(X_{(2)}\ge 3)=\frac{3\binom{3}{2}-2\binom{1}{2}}{\binom{7}{2}}=\frac{9}{21}=\frac{3}{7}.$
All later terms are zero, so
$\mathbb{E}[X_{(2)}]=1+\frac{6}{7}+\frac{3}{7}=\frac{16}{7},$
which matches the checkpoint in the repository.
How the Code Works
The implementations avoid constructing huge binomial coefficients directly. They set
$r=m-1,$
$R_k=\frac{\binom{n-(m-1)(k-1)-1}{r}}{\binom{n-1}{r}},\qquad S_k=\frac{\binom{n-m(k-1)-1}{r}}{\binom{n-1}{r}}.$
Then the tail probability becomes
$\Pr(X_{(2)}\ge k)=mR_k-(m-1)S_k.$
If the current binomial top is \(a\) and the next step decreases it by \(d\), the ratio update is
$\frac{\binom{a-d}{r}}{\binom{a}{r}}=\prod_{j=0}^{r-1}\frac{a-d-j}{a-j}.$
This telescoping product is exactly what ratio_binom_drop computes. The variables a_r and a_s track the current top entries for the two numerator terms, while ratio_r and ratio_s hold the normalized values \(R_k\) and \(S_k\). Once a top entry falls below \(r\), that term is set to zero and the loop terminates naturally.
Complexity Analysis
Each loop iteration updates two products of length \(r=m-1\), so one iteration costs \(O(m)\) time. If \(T\) tail levels are nonzero, the full computation costs \(O(Tm)\) time and \(O(1)\) additional memory. Since the top parameters decrease by \(m-1\) or \(m\) at every step, \(T\) is on the order of \(n/(m-1)\). The method is therefore compact in memory and efficient enough for the Project Euler input.
References
- Problem page: https://projecteuler.net/problem=398
- Order statistics: Wikipedia — Order statistic
- Compositions and stars-and-bars: Wikipedia — Stars and bars
- Binomial coefficient identities: Wikipedia — Binomial coefficient