Problem 793: Median of Products
View on Project EulerProject Euler Problem 793 Solution
EulerSolve provides an optimized solution for Project Euler Problem 793, Median of Products, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary We generate the first \(n\) terms of the deterministic sequence $s_0=290797,\qquad s_{k+1}\equiv s_k^2 \pmod{50515093},$ sort those values into $a_0\le a_1\le \dots \le a_{n-1},$ and consider every product \(a_i a_j\) with \(0\le i<j<n\). There are $m=\binom{n}{2}=\frac{n(n-1)}{2}$ such products. The task is to return their median. In the order-statistic language used by the implementations, this means the element of rank $t=\left\lceil\frac{m}{2}\right\rceil=\left\lfloor\frac{m+1}{2}\right\rfloor,$ so when \(m\) is even, we want the lower of the two middle products. Mathematical Approach The central observation is that we do not need to list all \(\binom{n}{2}\) products. Instead, we can ask a threshold question: for a candidate value \(x\), how many pair products are at most \(x\)? Once that counting problem is solved efficiently, the median follows from binary search. Step 1: Generate the sequence and sort it The recurrence produces \(n\) positive integers. After sorting them, the array becomes monotone: $a_0\le a_1\le \dots \le a_{n-1}.$ This sorted order is the key structural fact. For a fixed left index \(i\), the products $a_i a_{i+1},\ a_i a_{i+2},\ \dots,\ a_i a_{n-1}$ also increase as the right index moves to the right. That monotonicity is what makes a linear counting pass possible....
Detailed mathematical approach
Problem Summary
We generate the first \(n\) terms of the deterministic sequence
$s_0=290797,\qquad s_{k+1}\equiv s_k^2 \pmod{50515093},$
sort those values into
$a_0\le a_1\le \dots \le a_{n-1},$
and consider every product \(a_i a_j\) with \(0\le i<j<n\). There are
$m=\binom{n}{2}=\frac{n(n-1)}{2}$
such products. The task is to return their median. In the order-statistic language used by the implementations, this means the element of rank
$t=\left\lceil\frac{m}{2}\right\rceil=\left\lfloor\frac{m+1}{2}\right\rfloor,$
so when \(m\) is even, we want the lower of the two middle products.
Mathematical Approach
The central observation is that we do not need to list all \(\binom{n}{2}\) products. Instead, we can ask a threshold question: for a candidate value \(x\), how many pair products are at most \(x\)? Once that counting problem is solved efficiently, the median follows from binary search.
Step 1: Generate the sequence and sort it
The recurrence produces \(n\) positive integers. After sorting them, the array becomes monotone:
$a_0\le a_1\le \dots \le a_{n-1}.$
This sorted order is the key structural fact. For a fixed left index \(i\), the products
$a_i a_{i+1},\ a_i a_{i+2},\ \dots,\ a_i a_{n-1}$
also increase as the right index moves to the right. That monotonicity is what makes a linear counting pass possible.
Step 2: Turn the median into a counting threshold
For any integer \(x\ge 0\), define
$F(x)=\#\left\{(i,j):0\le i<j<n,\ a_i a_j\le x\right\}.$
This function is monotone nondecreasing: if \(x_1\le x_2\), then every pair counted by \(F(x_1)\) is also counted by \(F(x_2)\). Therefore the answer is exactly
$\min\left\{x\in\mathbb{Z}_{\ge 0}:F(x)\ge t\right\}.$
That statement is just another way of saying: find the smallest threshold that includes at least the first half of the sorted products.
Step 3: Count \(F(x)\) in linear time with two pointers
Fix a threshold \(x\). For each left index \(i\), let \(J(i)\) be the largest index with \(J(i)>i\) and
$a_i a_{J(i)}\le x.$
Then every index between \(i+1\) and \(J(i)\) is also valid, so the contribution of row \(i\) is
$\max(0, J(i)-i).$
Hence
$F(x)=\sum_{i=0}^{n-1}\max(0, J(i)-i).$
The reason this is fast is that \(J(i)\) never increases as \(i\) increases. Once the left factor becomes larger, the right boundary cannot move to the right and still keep the product \(\le x\). So we can keep one pointer \(j\) starting at \(n-1\), move it only left while \(a_i a_j>x\), and add \(j-i\) when \(j>i\). Because that pointer moves left at most \(n-1\) times in total, one evaluation of \(F(x)\) costs \(O(n)\), not \(O(n^2)\).
Step 4: Binary-search the answer value
The answer must lie between the smallest possible threshold and the largest possible pair product:
$0\le x\le a_{n-2}a_{n-1}.$
Now apply a lower-bound binary search on that integer interval. At each midpoint \(\mathrm{mid}\):
$F(\mathrm{mid})\ge t \Rightarrow \mathrm{hi}=\mathrm{mid},\qquad F(\mathrm{mid})<t \Rightarrow \mathrm{lo}=\mathrm{mid}+1.$
When the search finishes, \(\mathrm{lo}=\mathrm{hi}\) is the smallest threshold whose count reaches rank \(t\), so it is exactly the median product.
Step 5: Worked Example with the first four generated values
The first four sequence values are already in increasing order:
$290797,\quad 629527,\quad 13339144,\quad 15552512.$
So \(n=4\) gives
$m=\binom{4}{2}=6,\qquad t=\left\lceil\frac{6}{2}\right\rceil=3.$
The six pair products are
$\begin{aligned} 290797\cdot 629527&=183064563019,\\ 290797\cdot 13339144&=3878983057768,\\ 290797\cdot 15552512&=4522623832064,\\ 629527\cdot 13339144&=8397351304888,\\ 629527\cdot 15552512&=9790726221824,\\ 13339144\cdot 15552512&=207457197129728. \end{aligned}$
After sorting, the third product is
$4522623832064.$
Equivalently, if \(x=4{,}000{,}000{,}000{,}000\), then only the first two products are counted, so \(F(x)=2<3\). If \(x=4{,}600{,}000{,}000{,}000\), then the first three products are counted, so \(F(x)=3\ge 3\). The lower-bound search therefore locks onto
$4522623832064,$
which is the correct median for this small instance.
How the Code Works
The C++, Python, and Java implementations all follow the same pipeline. They first generate the first \(n\) terms of the recurrence, store them in an array, and sort the array. Next they compute the total number of pair products \(m=\binom{n}{2}\) and the target rank \(t=\left\lfloor\frac{m+1}{2}\right\rfloor\).
They then binary-search over the integer interval \([0, a_{n-2}a_{n-1}]\). Each midpoint invokes the linear two-pointer counting routine described above: a left index scans from the beginning, a right index only moves left, and the routine accumulates how many products are at most the current threshold. If the count is already at least \(t\), the search keeps the left half; otherwise it keeps the right half.
The implementations also use integer types large enough for both the pair count and the product range. That guarantees the comparison logic remains exact throughout the search.
Complexity Analysis
Generating the sequence costs \(O(n)\). Sorting costs \(O(n\log n)\). Each evaluation of \(F(x)\) is \(O(n)\) because the right pointer moves only left and never resets. The binary search performs \(O(\log V)\) iterations, where
$V=a_{n-2}a_{n-1}.$
So the total running time is
$O(n\log n+n\log V),$
with memory usage \(O(n)\). In practice the sorting step dominates the preprocessing, while the counting oracle makes each search iteration linear and cache-friendly.
Footnotes and References
- Problem page: https://projecteuler.net/problem=793
- Order statistic: Wikipedia — Order statistic
- Binary search algorithm: Wikipedia — Binary search algorithm
- Two pointers technique: cp-algorithms — Two pointers method
- Modular arithmetic: Wikipedia — Modular arithmetic