Problem 593: Fleeting Medians
View on Project EulerProject Euler Problem 593 Solution
EulerSolve provides an optimized solution for Project Euler Problem 593, Fleeting Medians, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The sequence is defined by $S(k)=p_k^k \bmod 10007,\qquad S_2(k)=S(k)+S\left(\left\lfloor\frac{k}{10000}\right\rfloor+1\right),$ where \(p_k\) is the \(k\)-th prime. For each contiguous block \(S_2(i),S_2(i+1),\dots,S_2(i+K-1)\), let \(M(i,i+K-1)\) be its median, using the average of the two middle values when \(K\) is even. The target quantity is $F(n,K)=\sum_{i=1}^{n-K+1} M(i,i+K-1).$ A direct recomputation of every window median would be far too slow. The implementations succeed because the sequence can be generated online, every value lies in a very small fixed range, and a frequency structure can therefore answer median queries without re-sorting each window. Mathematical Approach The solution combines two observations: generating each term of \(S_2\) is cheap once the primes are streamed in order, and medians of a multiset can be recovered from prefix frequencies. Step 1: Generate \(S(k)\) efficiently from the prime stream Since \(10007\) is prime, every prime \(p_k\neq 10007\) is invertible modulo \(10007\). Therefore Fermat's little theorem allows the exponent to be reduced modulo $\varphi(10007)=10006.$ So, except for the single case \(p_k=10007\), we may compute $p_k^k \bmod 10007 = p_k^{\,k \bmod 10006} \bmod 10007.$ This turns each term into one fast modular exponentiation once the \(k\)-th prime is known....
Detailed mathematical approach
Problem Summary
The sequence is defined by
$S(k)=p_k^k \bmod 10007,\qquad S_2(k)=S(k)+S\left(\left\lfloor\frac{k}{10000}\right\rfloor+1\right),$
where \(p_k\) is the \(k\)-th prime. For each contiguous block \(S_2(i),S_2(i+1),\dots,S_2(i+K-1)\), let \(M(i,i+K-1)\) be its median, using the average of the two middle values when \(K\) is even. The target quantity is
$F(n,K)=\sum_{i=1}^{n-K+1} M(i,i+K-1).$
A direct recomputation of every window median would be far too slow. The implementations succeed because the sequence can be generated online, every value lies in a very small fixed range, and a frequency structure can therefore answer median queries without re-sorting each window.
Mathematical Approach
The solution combines two observations: generating each term of \(S_2\) is cheap once the primes are streamed in order, and medians of a multiset can be recovered from prefix frequencies.
Step 1: Generate \(S(k)\) efficiently from the prime stream
Since \(10007\) is prime, every prime \(p_k\neq 10007\) is invertible modulo \(10007\). Therefore Fermat's little theorem allows the exponent to be reduced modulo
$\varphi(10007)=10006.$
So, except for the single case \(p_k=10007\), we may compute
$p_k^k \bmod 10007 = p_k^{\,k \bmod 10006} \bmod 10007.$
This turns each term into one fast modular exponentiation once the \(k\)-th prime is known. The C++, Python, and Java implementations all estimate a safe upper bound for the \(n\)-th prime and use an odd-only sieve to enumerate primes in increasing order.
Step 2: Exploit the very small value range of \(S_2(k)\)
By construction,
$0\le S(k)\le 10006.$
Hence
$0\le S_2(k)=S(k)+S\left(\left\lfloor\frac{k}{10000}\right\rfloor+1\right)\le 20012.$
This bound is the key simplification. Every window value lies in the fixed domain
$\{0,1,2,\dots,20012\},$
whose size is only \(V=20013\). For the target instance \(n=10^7\), the shifted index \(\left\lfloor k/10000\right\rfloor+1\) never exceeds \(1001\), so the implementation only needs to remember those early sequence values to form the second summand.
Step 3: Rewrite the median as an order-statistics problem
For one window of length \(K\), let \(f_v\) be the number of occurrences of value \(v\), and define the prefix counts
$P(v)=\sum_{x=0}^{v} f_x.$
The \(t\)-th smallest value in the window is the unique \(v\) satisfying
$P(v-1) \lt t \le P(v).$
If the sorted window values are \(a_1\le a_2\le \cdots \le a_K\), then setting
$t_1=\left\lfloor\frac{K+1}{2}\right\rfloor,\qquad t_2=\left\lfloor\frac{K+2}{2}\right\rfloor$
gives the compact formula
$M=\frac{a_{t_1}+a_{t_2}}{2}.$
So every window median reduces to finding one or two order statistics inside a frequency table.
Step 4: Maintain those frequencies under a sliding window
Moving the window by one position changes only two elements: one outgoing value disappears and one incoming value appears. In frequency form this is
$f_{v_{\text{out}}}\leftarrow f_{v_{\text{out}}}-1,\qquad f_{v_{\text{in}}}\leftarrow f_{v_{\text{in}}}+1.$
A Fenwick tree stores the array \((f_0,f_1,\dots,f_{20012})\), supports both updates in \(O(\log V)\), and can also search for the smallest value whose prefix sum reaches a target rank \(t\). That makes median extraction \(O(\log V)\) per window as well.
Step 5: Keep the total integral by doubling every median
When \(K\) is even, a median may end in \(0.5\). Instead of using floating-point arithmetic, the implementation accumulates the doubled contribution
$2M=a_{t_1}+a_{t_2}.$
Equivalently, it sums
$2F(n,K)=\sum_{i=1}^{n-K+1} \bigl(a_{t_1}^{(i)}+a_{t_2}^{(i)}\bigr),$
where \(a_{t_1}^{(i)}\) and \(a_{t_2}^{(i)}\) are the two middle order statistics of the \(i\)-th window. Only at the very end is the result divided by \(2\), which guarantees an exact decimal ending in .0 or .5.
Worked Example: the first window of length \(10\)
For \(1\le k\le 10\), we have \(\left\lfloor k/10000\right\rfloor+1=1\), and \(S(1)=2\). Therefore the first ten values are
$\{S_2(1),\dots,S_2(10)\}=\{4,11,127,2403,941,3437,1640,2867,6554,9242\}.$
After sorting, the window becomes
$\{4,11,127,941,1640,2403,2867,3437,6554,9242\}.$
The middle positions are \(5\) and \(6\), so
$M(1,10)=\frac{1640+2403}{2}=\frac{4043}{2}=2021.5.$
The doubled contribution is \(4043\), exactly the value used by the implementation when checking this sample window.
How the Code Works
The C++, Python, and Java implementations follow the same pipeline. They first estimate a bound for the \(n\)-th prime and sieve only odd numbers, which is enough to stream primes in order without storing a full list of all candidates. As each prime arrives, the implementation computes the corresponding residue \(S(k)\), caches the early residues needed by the shifted term, and immediately forms the next \(S_2(k)\).
The current window is maintained by combining a circular buffer with a Fenwick tree over the value range \(0\) through \(20012\). Filling the first \(K\) values creates the initial multiset. Each later step removes the outgoing value, inserts the new one, and asks the Fenwick tree for the middle rank or the two middle ranks. The running sum is kept doubled, so no rounding is needed. The C++ implementation also verifies the sample medians and sample sums from the problem statement before evaluating the full instance; the Python and Java versions implement the same algorithmic idea with the same mathematics.
Complexity Analysis
Let \(B\) be the sieve bound used for the \(n\)-th prime, and let \(V=20013\). Building the odd-only sieve costs \(O(B\log\log B)\) time and uses sieve storage proportional to \(B\). Streaming the sequence contributes one modular exponentiation per prime, and because the modulus is fixed and the exponent is reduced modulo \(10006\), this behaves as constant-cost arithmetic per term. The sliding-window part performs two Fenwick updates and one or two rank searches per window, each in \(O(\log V)\) time. Since \(V\) is fixed, the median-maintenance cost is \(O(n\log V)\) with very small constants, and the additional working memory is \(O(V+K)\) besides the sieve.
Footnotes and References
- Problem page: https://projecteuler.net/problem=593
- Median: Wikipedia — Median
- Fenwick tree: Wikipedia — Fenwick tree
- Modular exponentiation: Wikipedia — Modular exponentiation
- Sieve of Eratosthenes: Wikipedia — Sieve of Eratosthenes