Problem 816: Shortest Distance Among Points
View on Project EulerProject Euler Problem 816 Solution
EulerSolve provides an optimized solution for Project Euler Problem 816, Shortest Distance Among Points, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary We generate \(k\) deterministic points in the plane from the recurrence $s_{n+1}=s_n^2 \bmod 50515093,\qquad s_0=290797,$ and, using zero-based indexing, define $P_i=(s_{2i},s_{2i+1}),\qquad 0\le i<k.$ The goal is to find the minimum Euclidean distance between any two generated points. For \(k=2{,}000{,}000\), a naive \(O(k^2)\) scan over all pairs is completely infeasible, so the solution uses closest-pair geometry to discard almost all comparisons. Mathematical Approach Write each point as \(P_i=(x_i,y_i)\). The target quantity is $\delta_k=\min_{0\le i<j<k}\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}.$ The key observation is that closest-pair algorithms do not need to compare every pair. Geometry lets us keep only a narrow set of candidates near the current point or near the current dividing line. Step 1: Generate the Point Set Deterministically The recurrence is deterministic, so the entire input is fixed once \(k\) is given. The first few points are $\begin{aligned} P_0&=(290797,629527),\\ P_1&=(13339144,15552512),\\ P_2&=(17939732,3034546),\\ P_3&=(22608053,23794117). \end{aligned}$ Nothing random remains after generation; the computational problem is purely geometric. Step 2: Minimize Squared Distance Instead of Distance Because the square-root function is strictly increasing, minimizing the Euclidean distance is equivalent to minimizing its square....
Detailed mathematical approach
Problem Summary
We generate \(k\) deterministic points in the plane from the recurrence
$s_{n+1}=s_n^2 \bmod 50515093,\qquad s_0=290797,$
and, using zero-based indexing, define
$P_i=(s_{2i},s_{2i+1}),\qquad 0\le i<k.$
The goal is to find the minimum Euclidean distance between any two generated points. For \(k=2{,}000{,}000\), a naive \(O(k^2)\) scan over all pairs is completely infeasible, so the solution uses closest-pair geometry to discard almost all comparisons.
Mathematical Approach
Write each point as \(P_i=(x_i,y_i)\). The target quantity is
$\delta_k=\min_{0\le i<j<k}\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}.$
The key observation is that closest-pair algorithms do not need to compare every pair. Geometry lets us keep only a narrow set of candidates near the current point or near the current dividing line.
Step 1: Generate the Point Set Deterministically
The recurrence is deterministic, so the entire input is fixed once \(k\) is given. The first few points are
$\begin{aligned} P_0&=(290797,629527),\\ P_1&=(13339144,15552512),\\ P_2&=(17939732,3034546),\\ P_3&=(22608053,23794117). \end{aligned}$
Nothing random remains after generation; the computational problem is purely geometric.
Step 2: Minimize Squared Distance Instead of Distance
Because the square-root function is strictly increasing, minimizing the Euclidean distance is equivalent to minimizing its square. Define
$\Delta_k=\min_{0\le i<j<k}\left((x_i-x_j)^2+(y_i-y_j)^2\right).$
Then
$\delta_k=\sqrt{\Delta_k}.$
This matters in the implementation because squared distances stay integral and can be compared exactly. The square root is needed only when a geometric width \(d=\sqrt{\Delta}\) is required, or when the final answer is printed.
Step 3: Sort by \(x\) and Keep Only a Moving Window
After sorting the points by increasing \(x\)-coordinate, suppose the current best squared distance is \(\Delta\), and let \(d=\sqrt{\Delta}\).
Consider a current point \(P_i\) and an earlier point \(P_j\). If
$x_i-x_j>d,$
then automatically
$ (x_i-x_j)^2+(y_i-y_j)^2 \ge (x_i-x_j)^2 > d^2=\Delta,$
so that pair cannot improve the answer. Therefore only points lying within a vertical strip of width \(d\) behind the current point can still matter.
Step 4: Restrict the Search Further with a \(y\)-Window
Even inside that vertical strip, most points can still be ignored. If
$|y_i-y_j|>d,$
then
$ (x_i-x_j)^2+(y_i-y_j)^2 \ge (y_i-y_j)^2 > d^2=\Delta,$
so the pair is again impossible. Hence only points with
$y_j\in[y_i-d,\ y_i+d]$
need to be checked. This is why the active candidates are maintained in \(y\)-order, or why the central strip is sorted by \(y\) in the divide-and-conquer formulation.
Step 5: The Same Geometry Also Yields Divide and Conquer
There is an equivalent classical view. Split the \(x\)-sorted array into left and right halves, solve each half recursively, and let
$\Delta=\min(\Delta_L,\Delta_R).$
Any pair that beats \(\Delta\) but lies across the split must satisfy
$|x-m|<d,$
where \(m\) is the \(x\)-coordinate of the dividing line, so both points lie inside a central strip of width \(2d\). Once that strip is ordered by \(y\), it is enough to compare each point only with later strip points until the \(y\)-gap alone already forces the squared distance to be at least \(\Delta\). This is the standard closest-pair theorem.
Worked Example: \(k=14\)
For the first \(14\) generated points, the closest pair is
$ (18885138,9652189),\qquad(19047286,9130354). $
The coordinate differences are
$\Delta x=162148,\qquad \Delta y=-521835,$
so the squared distance is
$162148^2+(-521835)^2=26291973904+272311767225=298603741129.$
Therefore
$d(14)=\sqrt{298603741129}\approx 546446.466846479,$
which is the checkpoint reproduced by the implementations.
How the Code Works
The C++, Python, and Java implementations all begin the same way: generate the deterministic points, sort them by \(x\), and store the current best value as a squared distance in integer arithmetic.
The C++ and Python implementations use a sweep line. As the scan moves from left to right, points whose \(x\)-distance from the current point already exceeds the current best distance are removed from the active structure. The remaining active points are ordered by \(y\), so the implementation inspects only the narrow \(y\)-window that can still improve the answer.
The Java implementation uses divide and conquer. It solves the left and right halves recursively, keeps the smaller squared distance, then builds the central strip near the median \(x\)-coordinate. That strip is sorted by \(y\), and comparisons stop as soon as the \(y\)-difference alone is too large.
In all three implementations, the final step is a single square root followed by formatting to nine decimal places.
Complexity Analysis
Point generation costs \(O(k)\) time and \(O(k)\) memory. Sorting costs \(O(k\log k)\). The Java divide-and-conquer implementation has the standard \(O(k\log k)\) running time with \(O(k)\) auxiliary memory. The C++ and Python sweep-line implementations also use \(O(k)\) memory and are designed to remain close to \(O(k\log k)\) on this data, because each point enters and leaves the active set once and only a narrow candidate strip is searched for each new point.
Footnotes and References
- Problem page: https://projecteuler.net/problem=816
- Closest pair of points problem: Wikipedia — Closest pair of points problem
- Sweep line algorithm: Wikipedia — Sweep line algorithm
- Euclidean distance: Wikipedia — Euclidean distance
- Divide-and-conquer algorithm: Wikipedia — Divide-and-conquer algorithm