Problem 906: A Collective Decision
View on Project EulerProject Euler Problem 906 Solution
EulerSolve provides an optimized solution for Project Euler Problem 906, A Collective Decision, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Problem 906 studies two independent random preference orders on the same set of \(n\) alternatives and asks for a probability \(P(n)\); the final evaluation uses \(n=20000\). The implementations show that the whole computation can be organized around one distinguished alternative \(x\). If \(x\) is in position \(a+1\) in the first order and in position \(b+1\) in the second, then among the other \(N=n-1\) alternatives there are \(a\) above \(x\) in the first order and \(b\) above \(x\) in the second. The remaining alternatives split into three blocks: those above \(x\) only in the first order, those above \(x\) only in the second order, and those below \(x\) in both orders. The event counted by the implementations requires the two upper blocks to be disjoint, and then it adds an auxiliary contribution coming from the common lower block. Averaging that contribution over the \(n^2\) equally likely rank pairs of \(x\) yields the desired probability. Mathematical Approach Let \(N=n-1\). For a fixed alternative \(x\), the two permutations induce a pair of rank coordinates. The solution turns the original probability question into a clean double sum over those coordinates. Fix one alternative and its two rank coordinates Write \(y\succ_1 x\) when the first preference order places \(y\) above \(x\), and \(y\succ_2 x\) for the second order....
Detailed mathematical approach
Problem Summary
Problem 906 studies two independent random preference orders on the same set of \(n\) alternatives and asks for a probability \(P(n)\); the final evaluation uses \(n=20000\). The implementations show that the whole computation can be organized around one distinguished alternative \(x\). If \(x\) is in position \(a+1\) in the first order and in position \(b+1\) in the second, then among the other \(N=n-1\) alternatives there are \(a\) above \(x\) in the first order and \(b\) above \(x\) in the second.
The remaining alternatives split into three blocks: those above \(x\) only in the first order, those above \(x\) only in the second order, and those below \(x\) in both orders. The event counted by the implementations requires the two upper blocks to be disjoint, and then it adds an auxiliary contribution coming from the common lower block. Averaging that contribution over the \(n^2\) equally likely rank pairs of \(x\) yields the desired probability.
Mathematical Approach
Let \(N=n-1\). For a fixed alternative \(x\), the two permutations induce a pair of rank coordinates. The solution turns the original probability question into a clean double sum over those coordinates.
Fix one alternative and its two rank coordinates
Write \(y\succ_1 x\) when the first preference order places \(y\) above \(x\), and \(y\succ_2 x\) for the second order. Define
$A=\{y:y\succ_1 x\},\qquad B=\{y:y\succ_2 x\}.$
If \(|A|=a\) and \(|B|=b\), then \(x\) has rank \(a+1\) in the first order and rank \(b+1\) in the second. Because the two orders are independent and each rank of \(x\) is uniform on \(\{1,\dots,n\}\), every pair \((a,b)\in\{0,\dots,N\}^2\) occurs with probability \(1/n^2\).
Why the upper sets must be disjoint
The implemented formula contributes only when no alternative lies above \(x\) in both orders. In set language that condition is
$A\cap B=\varnothing.$
For fixed sizes \(a\) and \(b\), the set \(B\) is a uniformly random \(b\)-subset of the \(N\) alternatives other than \(x\). Exactly \(\binom{N-a}{b}\) such subsets avoid the \(a\) elements already placed in \(A\). Therefore
$q(a,b)=\Pr(A\cap B=\varnothing\mid |A|=a,\ |B|=b)=\frac{\binom{N-a}{b}}{\binom{N}{b}}.$
This immediately explains the triangular summation region in the code: if \(a+b>N\), then disjoint sets of those sizes cannot exist, so the contribution is zero. It also explains the multiplicative update used inside the inner loop:
$q(a,0)=1,\qquad q(a,b+1)=q(a,b)\frac{N-a-b}{N-b}.$
The unanimous lower block
Once \(A\) and \(B\) are disjoint, every remaining alternative is below \(x\) in both orders. Denote that common lower block by
$C=\{y:x\succ_1 y\ \text{and}\ x\succ_2 y\},\qquad |C|=m=N-a-b.$
The implementations attach to this block the auxiliary quantity
$R_m=\sum_{c=0}^{m}\prod_{t=0}^{c-1}\frac{m-t}{N-t}=\sum_{c=0}^{m}\frac{(m)_c}{(N)_c}=\sum_{c=0}^{m}\frac{\binom{m}{c}}{\binom{N}{c}}.$
The term for a fixed \(c\) has a direct counting interpretation: \(\binom{m}{c}/\binom{N}{c}\) is the probability that a uniformly chosen \(c\)-subset of the \(N\) alternatives other than \(x\) lies entirely inside the unanimous lower block \(C\). The total \(R_m\) adds those probabilities for every feasible subset size \(c\).
Collapsing the auxiliary sum
The direct sum for \(R_m\) is already good enough for computation, but the mathematics becomes clearer after simplifying it. Start from
$\frac{\binom{m}{c}}{\binom{N}{c}}=\frac{1}{\binom{N}{m}}\binom{N-c}{m-c}.$
Substituting this into the previous formula gives
$R_m=\frac{1}{\binom{N}{m}}\sum_{c=0}^{m}\binom{N-c}{m-c}.$
Now set \(j=m-c\). The sum becomes
$\sum_{j=0}^{m}\binom{N-m+j}{j},$
which is a standard hockey-stick sum, so
$R_m=\frac{\binom{N+1}{m}}{\binom{N}{m}}=\frac{N+1}{N+1-m}.$
Since \(m=N-a-b\), this can be rewritten in the especially convenient form
$R_{N-a-b}=\frac{n}{a+b+1}.$
The final probability formula
Putting the two pieces together, the probability computed by the C++, Python, and Java implementations is
$\boxed{P(n)=\frac{1}{n^2}\sum_{a=0}^{N}\sum_{b=0}^{N-a} q(a,b)\,R_{N-a-b}=\frac{1}{n^2}\sum_{a=0}^{N}\sum_{b=0}^{N-a}\frac{\binom{N-a}{b}}{\binom{N}{b}}\cdot\frac{n}{a+b+1}.}$
Everything in the program is a fast way to evaluate this triangular double sum without ever forming large factorials or binomial coefficients explicitly.
Worked Example: \(n=3\)
Take \(n=3\), so \(N=2\). Then
$R_0=1,\qquad R_1=1+\frac12=\frac32,\qquad R_2=1+1+1=3.$
The admissible pairs \((a,b)\) satisfy \(a+b\le 2\). Their contributions are
$\begin{aligned} (0,0)&:\ q=1,\ m=2,\ qR_m=3,\\ (0,1)&:\ q=1,\ m=1,\ qR_m=\frac32,\\ (0,2)&:\ q=1,\ m=0,\ qR_m=1,\\ (1,0)&:\ q=1,\ m=1,\ qR_m=\frac32,\\ (1,1)&:\ q=\frac12,\ m=0,\ qR_m=\frac12,\\ (2,0)&:\ q=1,\ m=0,\ qR_m=1. \end{aligned}$
The total is \(17/2\), and dividing by \(n^2=9\) gives
$P(3)=\frac{17/2}{9}=\frac{17}{18}.$
This small case shows all the moving parts of the full solution: the disjointness factor, the lower-block sum, and the final average over rank pairs.
How the Code Works
Precomputation of reciprocal factors and lower-block values
The implementations first build a table of reciprocals \(1,2,\dots,N\). That allows every ratio in the double sum to be updated by multiplication instead of by repeated division. They also precompute all values of \(R_m\) for \(m=0,\dots,N\). For a fixed \(m\), the term with \(c=0\) is \(1\), and the next term is obtained from the previous one by multiplying by \((m-c+1)/(N-c+1)\). Summing those terms produces the required lower-block table.
The triangular sweep over \((a,b)\)
After the \(R_m\) table is available, the main work is the triangular loop over all \(a\) and \(b\) with \(0\le a\le N\) and \(0\le b\le N-a\). For each fixed \(a\), the implementation starts from \(q(a,0)=1\) and updates the disjointness factor with
$q(a,b+1)=q(a,b)\frac{N-a-b}{N-b}.$
At each lattice point \((a,b)\), it looks up \(R_{N-a-b}\), adds \(q(a,b)R_{N-a-b}\) to the running total, and finally divides the accumulated sum by \(n^2\).
Numerical accuracy and compiled-language optimizations
The calculation is purely floating-point. The compiled implementations therefore use compensated summation to reduce accumulation error, and the C++ version also keeps the main total in extended precision. The compiled versions split the outer \(a\)-range into blocks and process those blocks in parallel, while the Python implementation follows the same mathematics serially.
Complexity Analysis
Let \(N=n-1\). Building the reciprocal table costs \(O(N)\) time and \(O(N)\) memory. Precomputing all \(R_m\) values with the direct inner products used in the implementations costs \(O(N^2)\) arithmetic operations. The main triangular sum over \((a,b)\) is also \(O(N^2)\).
Therefore the overall running time is \(O(N^2)\) and the memory usage is \(O(N)\). Parallel execution reduces wall-clock time for the compiled versions, but it does not change the asymptotic order.
Footnotes and References
- Problem page: Project Euler 906
- Total order: Wikipedia - Total order
- Binomial coefficient: Wikipedia - Binomial coefficient
- Falling and rising factorials: Wikipedia - Falling and rising factorials
- Hypergeometric distribution: Wikipedia - Hypergeometric distribution
- Hockey-stick identity: Wikipedia - Hockey-stick identity
- Kahan summation algorithm: Wikipedia - Kahan summation algorithm