Problem 231: Prime Factorisation of Binomial Coefficients
View on Project EulerProject Euler Problem 231 Solution
EulerSolve provides an optimized solution for Project Euler Problem 231, Prime Factorisation of Binomial Coefficients, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The target quantity is the sum of the prime factors of the binomial coefficient \(\binom{20{,}000{,}000}{15{,}000{,}000}\), where each prime is counted with multiplicity. If a positive integer \(N\) has prime factorisation $N=\prod_{p \text{ prime}} p^{e_p},$ then the required weighted sum is $S(N)=\sum_{p \text{ prime}} p\,e_p.$ So for this problem we must compute $S\!\left(\binom{20{,}000{,}000}{15{,}000{,}000}\right)=\sum_{p \le 20{,}000{,}000} p\,v_p\!\left(\binom{20{,}000{,}000}{15{,}000{,}000}\right),$ where \(v_p(x)\) is the exponent of the prime \(p\) in \(x\). Directly expanding or factoring the binomial coefficient is impossible in practice, so the solution works entirely with prime exponents. Mathematical Approach The whole method rests on one idea: instead of building the enormous integer \(\binom{n}{k}\), determine how many times each prime occurs in it and add the weighted contributions \(p \cdot v_p\). Prime Exponents Inside a Binomial Coefficient Write $\binom{n}{k}=\frac{n!}{k!(n-k)!}.$ Prime exponents are additive across multiplication and subtractive across division, so for every prime \(p\), $v_p\!\left(\binom{n}{k}\right)=v_p(n!) - v_p(k!) - v_p((n-k)!).$ This is the exact cancellation pattern used by the implementations. It turns the problem into three factorial-valuation queries per prime....
Detailed mathematical approach
Problem Summary
The target quantity is the sum of the prime factors of the binomial coefficient \(\binom{20{,}000{,}000}{15{,}000{,}000}\), where each prime is counted with multiplicity. If a positive integer \(N\) has prime factorisation
$N=\prod_{p \text{ prime}} p^{e_p},$
then the required weighted sum is
$S(N)=\sum_{p \text{ prime}} p\,e_p.$
So for this problem we must compute
$S\!\left(\binom{20{,}000{,}000}{15{,}000{,}000}\right)=\sum_{p \le 20{,}000{,}000} p\,v_p\!\left(\binom{20{,}000{,}000}{15{,}000{,}000}\right),$
where \(v_p(x)\) is the exponent of the prime \(p\) in \(x\). Directly expanding or factoring the binomial coefficient is impossible in practice, so the solution works entirely with prime exponents.
Mathematical Approach
The whole method rests on one idea: instead of building the enormous integer \(\binom{n}{k}\), determine how many times each prime occurs in it and add the weighted contributions \(p \cdot v_p\).
Prime Exponents Inside a Binomial Coefficient
Write
$\binom{n}{k}=\frac{n!}{k!(n-k)!}.$
Prime exponents are additive across multiplication and subtractive across division, so for every prime \(p\),
$v_p\!\left(\binom{n}{k}\right)=v_p(n!) - v_p(k!) - v_p((n-k)!).$
This is the exact cancellation pattern used by the implementations. It turns the problem into three factorial-valuation queries per prime. The complement \(n-k\) matters naturally because the denominator contains both \(k!\) and \((n-k)!\), and the identity \(\binom{n}{k}=\binom{n}{n-k}\) explains why the two smaller factorials play symmetric roles.
Legendre's Formula for Factorials
To evaluate \(v_p(m!)\), the code uses Legendre's formula:
$v_p(m!)=\sum_{j \ge 1}\left\lfloor \frac{m}{p^j}\right\rfloor.$
Each term has a clean meaning. The term \(\left\lfloor m/p \right\rfloor\) counts the multiples of \(p\) in \(1,2,\dots,m\); the term \(\left\lfloor m/p^2 \right\rfloor\) counts the extra copies contributed by multiples of \(p^2\); the next term does the same for \(p^3\), and so on. The sum is finite because once \(p^j > m\), every later floor is zero.
Computationally, this means we can start with \(x=m\), repeatedly divide by \(p\), and add the quotients:
$v_p(m!)=\left\lfloor \frac{m}{p}\right\rfloor+\left\lfloor \frac{m}{p^2}\right\rfloor+\left\lfloor \frac{m}{p^3}\right\rfloor+\cdots.$
Turning Valuations into the Required Sum
Substituting Legendre's formula into the binomial identity gives
$v_p\!\left(\binom{n}{k}\right)=\sum_{j \ge 1}\left(\left\lfloor \frac{n}{p^j}\right\rfloor-\left\lfloor \frac{k}{p^j}\right\rfloor-\left\lfloor \frac{n-k}{p^j}\right\rfloor\right).$
Therefore
$S\!\left(\binom{n}{k}\right)=\sum_{p \le n} p \sum_{j \ge 1}\left(\left\lfloor \frac{n}{p^j}\right\rfloor-\left\lfloor \frac{k}{p^j}\right\rfloor-\left\lfloor \frac{n-k}{p^j}\right\rfloor\right).$
Only primes up to \(n\) can contribute, because no prime larger than \(n\) appears in \(n!\). That is why the first phase of the algorithm is simply to list all primes \(p \le n\) with a sieve.
Once those primes are known, every contribution is independent. For each prime \(p\), compute its exponent in \(n!\), subtract the two denominator exponents, multiply by \(p\), and add the result to the total.
Worked Example: \(\binom{10}{3}\)
The small checkpoint used by the implementations is
$\binom{10}{3}=120=2^3 \cdot 3 \cdot 5,$
so the weighted prime-factor sum should be
$S\!\left(\binom{10}{3}\right)=2 \cdot 3 + 3 \cdot 1 + 5 \cdot 1 = 14.$
Legendre's formula reproduces that value directly. For \(p=2\),
$v_2(10!)=5+2+1=8,\qquad v_2(3!)=1,\qquad v_2(7!)=3+1=4,$
hence
$v_2\!\left(\binom{10}{3}\right)=8-1-4=3.$
For \(p=3\),
$v_3(10!)=3+1=4,\qquad v_3(3!)=1,\qquad v_3(7!)=2,$
so
$v_3\!\left(\binom{10}{3}\right)=4-1-2=1.$
For \(p=5\),
$v_5(10!)=2,\qquad v_5(3!)=0,\qquad v_5(7!)=1,$
thus
$v_5\!\left(\binom{10}{3}\right)=2-0-1=1.$
And for \(p=7\), the cancellation is complete:
$v_7(10!)=1,\qquad v_7(3!)=0,\qquad v_7(7!)=1,\qquad v_7\!\left(\binom{10}{3}\right)=0.$
This example shows exactly why the method is reliable: the binomial coefficient never has to be formed explicitly, yet every prime exponent is recovered correctly.
How the Code Works
The C++, Python, and Java implementations all follow the same three-stage computation.
Generate All Relevant Primes
First, the implementation runs a sieve of Eratosthenes up to \(n\). A byte or boolean array marks which integers are still prime, composite multiples are crossed out, and the surviving indices are collected into a prime list. Because the target instance uses \(n=20{,}000{,}000\), this precomputation is large but still straightforward.
Evaluate Factorial Valuations by Repeated Division
For each prime \(p\), the implementation computes \(v_p(m!)\) by repeatedly dividing \(m\) by \(p\) and accumulating the quotients. The same routine is applied to \(n\), to \(k\), and to the complementary value \(n-k\). This is a direct implementation of Legendre's formula and avoids any factorisation of individual integers.
Accumulate the Weighted Contribution
After the three valuations are known, the exponent of \(p\) in the binomial coefficient is obtained by subtraction:
$v_p\!\left(\binom{n}{k}\right)=v_p(n!) - v_p(k!) - v_p((n-k)!).$
The implementation then adds
$p \cdot v_p\!\left(\binom{n}{k}\right)$
to a running integer total. The C++ implementation also includes a small checkpoint for \(\binom{10}{3}\), verifying that the method returns 14 before the main instance is processed. The Python and Java versions perform the same mathematics with the same default parameters.
Complexity Analysis
The sieve up to \(n\) costs \(O(n \log \log n)\) time and \(O(n)\) memory. That is the dominant memory expense, because the implementation stores one primality mark per integer up to \(n\).
After the sieve, each prime \(p\) requires \(O(\log_p n)\) divisions to evaluate Legendre's sum, so the valuation phase costs
$O\!\left(\sum_{p \le n}\log_p n\right).$
Combining both parts gives total running time
$O\!\left(n \log \log n + \sum_{p \le n}\log_p n\right),$
with memory usage \(O(n)\). For the actual problem size, the algorithm is efficient because it never manipulates the gigantic binomial coefficient itself; it only works with prime counts.