Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 906: A Collective Decision

View on Project Euler

Project Euler Problem 906 Solution

EulerSolve provides an optimized solution for Project Euler Problem 906, A Collective Decision, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Problem 906 studies two independent random preference orders on the same set of \(n\) alternatives and asks for a probability \(P(n)\); the final evaluation uses \(n=20000\). The implementations show that the whole computation can be organized around one distinguished alternative \(x\). If \(x\) is in position \(a+1\) in the first order and in position \(b+1\) in the second, then among the other \(N=n-1\) alternatives there are \(a\) above \(x\) in the first order and \(b\) above \(x\) in the second. The remaining alternatives split into three blocks: those above \(x\) only in the first order, those above \(x\) only in the second order, and those below \(x\) in both orders. The event counted by the implementations requires the two upper blocks to be disjoint, and then it adds an auxiliary contribution coming from the common lower block. Averaging that contribution over the \(n^2\) equally likely rank pairs of \(x\) yields the desired probability. Mathematical Approach Let \(N=n-1\). For a fixed alternative \(x\), the two permutations induce a pair of rank coordinates. The solution turns the original probability question into a clean double sum over those coordinates. Fix one alternative and its two rank coordinates Write \(y\succ_1 x\) when the first preference order places \(y\) above \(x\), and \(y\succ_2 x\) for the second order....

Detailed mathematical approach

Problem Summary

Problem 906 studies two independent random preference orders on the same set of \(n\) alternatives and asks for a probability \(P(n)\); the final evaluation uses \(n=20000\). The implementations show that the whole computation can be organized around one distinguished alternative \(x\). If \(x\) is in position \(a+1\) in the first order and in position \(b+1\) in the second, then among the other \(N=n-1\) alternatives there are \(a\) above \(x\) in the first order and \(b\) above \(x\) in the second.

The remaining alternatives split into three blocks: those above \(x\) only in the first order, those above \(x\) only in the second order, and those below \(x\) in both orders. The event counted by the implementations requires the two upper blocks to be disjoint, and then it adds an auxiliary contribution coming from the common lower block. Averaging that contribution over the \(n^2\) equally likely rank pairs of \(x\) yields the desired probability.

Mathematical Approach

Let \(N=n-1\). For a fixed alternative \(x\), the two permutations induce a pair of rank coordinates. The solution turns the original probability question into a clean double sum over those coordinates.

Fix one alternative and its two rank coordinates

Write \(y\succ_1 x\) when the first preference order places \(y\) above \(x\), and \(y\succ_2 x\) for the second order. Define

$A=\{y:y\succ_1 x\},\qquad B=\{y:y\succ_2 x\}.$

If \(|A|=a\) and \(|B|=b\), then \(x\) has rank \(a+1\) in the first order and rank \(b+1\) in the second. Because the two orders are independent and each rank of \(x\) is uniform on \(\{1,\dots,n\}\), every pair \((a,b)\in\{0,\dots,N\}^2\) occurs with probability \(1/n^2\).

Why the upper sets must be disjoint

The implemented formula contributes only when no alternative lies above \(x\) in both orders. In set language that condition is

$A\cap B=\varnothing.$

For fixed sizes \(a\) and \(b\), the set \(B\) is a uniformly random \(b\)-subset of the \(N\) alternatives other than \(x\). Exactly \(\binom{N-a}{b}\) such subsets avoid the \(a\) elements already placed in \(A\). Therefore

$q(a,b)=\Pr(A\cap B=\varnothing\mid |A|=a,\ |B|=b)=\frac{\binom{N-a}{b}}{\binom{N}{b}}.$

This immediately explains the triangular summation region in the code: if \(a+b>N\), then disjoint sets of those sizes cannot exist, so the contribution is zero. It also explains the multiplicative update used inside the inner loop:

$q(a,0)=1,\qquad q(a,b+1)=q(a,b)\frac{N-a-b}{N-b}.$

The unanimous lower block

Once \(A\) and \(B\) are disjoint, every remaining alternative is below \(x\) in both orders. Denote that common lower block by

$C=\{y:x\succ_1 y\ \text{and}\ x\succ_2 y\},\qquad |C|=m=N-a-b.$

The implementations attach to this block the auxiliary quantity

$R_m=\sum_{c=0}^{m}\prod_{t=0}^{c-1}\frac{m-t}{N-t}=\sum_{c=0}^{m}\frac{(m)_c}{(N)_c}=\sum_{c=0}^{m}\frac{\binom{m}{c}}{\binom{N}{c}}.$

The term for a fixed \(c\) has a direct counting interpretation: \(\binom{m}{c}/\binom{N}{c}\) is the probability that a uniformly chosen \(c\)-subset of the \(N\) alternatives other than \(x\) lies entirely inside the unanimous lower block \(C\). The total \(R_m\) adds those probabilities for every feasible subset size \(c\).

Collapsing the auxiliary sum

The direct sum for \(R_m\) is already good enough for computation, but the mathematics becomes clearer after simplifying it. Start from

$\frac{\binom{m}{c}}{\binom{N}{c}}=\frac{1}{\binom{N}{m}}\binom{N-c}{m-c}.$

Substituting this into the previous formula gives

$R_m=\frac{1}{\binom{N}{m}}\sum_{c=0}^{m}\binom{N-c}{m-c}.$

Now set \(j=m-c\). The sum becomes

$\sum_{j=0}^{m}\binom{N-m+j}{j},$

which is a standard hockey-stick sum, so

$R_m=\frac{\binom{N+1}{m}}{\binom{N}{m}}=\frac{N+1}{N+1-m}.$

Since \(m=N-a-b\), this can be rewritten in the especially convenient form

$R_{N-a-b}=\frac{n}{a+b+1}.$

The final probability formula

Putting the two pieces together, the probability computed by the C++, Python, and Java implementations is

$\boxed{P(n)=\frac{1}{n^2}\sum_{a=0}^{N}\sum_{b=0}^{N-a} q(a,b)\,R_{N-a-b}=\frac{1}{n^2}\sum_{a=0}^{N}\sum_{b=0}^{N-a}\frac{\binom{N-a}{b}}{\binom{N}{b}}\cdot\frac{n}{a+b+1}.}$

Everything in the program is a fast way to evaluate this triangular double sum without ever forming large factorials or binomial coefficients explicitly.

Worked Example: \(n=3\)

Take \(n=3\), so \(N=2\). Then

$R_0=1,\qquad R_1=1+\frac12=\frac32,\qquad R_2=1+1+1=3.$

The admissible pairs \((a,b)\) satisfy \(a+b\le 2\). Their contributions are

$\begin{aligned} (0,0)&:\ q=1,\ m=2,\ qR_m=3,\\ (0,1)&:\ q=1,\ m=1,\ qR_m=\frac32,\\ (0,2)&:\ q=1,\ m=0,\ qR_m=1,\\ (1,0)&:\ q=1,\ m=1,\ qR_m=\frac32,\\ (1,1)&:\ q=\frac12,\ m=0,\ qR_m=\frac12,\\ (2,0)&:\ q=1,\ m=0,\ qR_m=1. \end{aligned}$

The total is \(17/2\), and dividing by \(n^2=9\) gives

$P(3)=\frac{17/2}{9}=\frac{17}{18}.$

This small case shows all the moving parts of the full solution: the disjointness factor, the lower-block sum, and the final average over rank pairs.

How the Code Works

Precomputation of reciprocal factors and lower-block values

The implementations first build a table of reciprocals \(1,2,\dots,N\). That allows every ratio in the double sum to be updated by multiplication instead of by repeated division. They also precompute all values of \(R_m\) for \(m=0,\dots,N\). For a fixed \(m\), the term with \(c=0\) is \(1\), and the next term is obtained from the previous one by multiplying by \((m-c+1)/(N-c+1)\). Summing those terms produces the required lower-block table.

The triangular sweep over \((a,b)\)

After the \(R_m\) table is available, the main work is the triangular loop over all \(a\) and \(b\) with \(0\le a\le N\) and \(0\le b\le N-a\). For each fixed \(a\), the implementation starts from \(q(a,0)=1\) and updates the disjointness factor with

$q(a,b+1)=q(a,b)\frac{N-a-b}{N-b}.$

At each lattice point \((a,b)\), it looks up \(R_{N-a-b}\), adds \(q(a,b)R_{N-a-b}\) to the running total, and finally divides the accumulated sum by \(n^2\).

Numerical accuracy and compiled-language optimizations

The calculation is purely floating-point. The compiled implementations therefore use compensated summation to reduce accumulation error, and the C++ version also keeps the main total in extended precision. The compiled versions split the outer \(a\)-range into blocks and process those blocks in parallel, while the Python implementation follows the same mathematics serially.

Complexity Analysis

Let \(N=n-1\). Building the reciprocal table costs \(O(N)\) time and \(O(N)\) memory. Precomputing all \(R_m\) values with the direct inner products used in the implementations costs \(O(N^2)\) arithmetic operations. The main triangular sum over \((a,b)\) is also \(O(N^2)\).

Therefore the overall running time is \(O(N^2)\) and the memory usage is \(O(N)\). Parallel execution reduces wall-clock time for the compiled versions, but it does not change the asymptotic order.

Footnotes and References

  1. Problem page: Project Euler 906
  2. Total order: Wikipedia - Total order
  3. Binomial coefficient: Wikipedia - Binomial coefficient
  4. Falling and rising factorials: Wikipedia - Falling and rising factorials
  5. Hypergeometric distribution: Wikipedia - Hypergeometric distribution
  6. Hockey-stick identity: Wikipedia - Hockey-stick identity
  7. Kahan summation algorithm: Wikipedia - Kahan summation algorithm

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 905 · All Project Euler solutions · Next: Problem 907

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮