Problem 596: Number of Lattice Points in a Hyperball
View on Project EulerProject Euler Problem 596 Solution
EulerSolve provides an optimized solution for Project Euler Problem 596, Number of Lattice Points in a Hyperball, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let $T(r)=\#\left\{(x_1,x_2,x_3,x_4)\in\mathbb Z^4: x_1^2+x_2^2+x_3^2+x_4^2\le r^2\right\}.$ The problem asks for this lattice-point count in four dimensions when \(r=10^8\), with the final value reduced modulo \(10^9+7\). A direct enumeration of all integer quadruples is hopeless, so the solution converts the geometry into a divisor sum that can be evaluated in about \(O(\sqrt{r^2})\) time. Mathematical Approach The key observation is that the hyperball can be counted shell by shell according to the squared distance from the origin. Step 1: Turn the Hyperball into a Summatory Function Set \(N=r^2\). For each integer \(n\ge 0\), let \(r_4(n)\) be the number of integer quadruples satisfying $x_1^2+x_2^2+x_3^2+x_4^2=n.$ Every lattice point inside the ball has some squared radius \(n\) with \(0\le n\le N\), so $T(r)=\sum_{n=0}^{N} r_4(n).$ This removes the geometric viewpoint and replaces it with a summation problem. Step 2: Apply Jacobi's Four-Square Theorem For \(n>0\), Jacobi's theorem gives $r_4(n)=8\sum_{\substack{d\mid n\\4\nmid d}} d,$ and of course \(r_4(0)=1\). Substituting this into the previous identity yields $T(r)=1+8\sum_{n=1}^{N}\sum_{\substack{d\mid n\\4\nmid d}} d.$ Now interchange the order of summation....
Detailed mathematical approach
Problem Summary
Let
$T(r)=\#\left\{(x_1,x_2,x_3,x_4)\in\mathbb Z^4: x_1^2+x_2^2+x_3^2+x_4^2\le r^2\right\}.$
The problem asks for this lattice-point count in four dimensions when \(r=10^8\), with the final value reduced modulo \(10^9+7\). A direct enumeration of all integer quadruples is hopeless, so the solution converts the geometry into a divisor sum that can be evaluated in about \(O(\sqrt{r^2})\) time.
Mathematical Approach
The key observation is that the hyperball can be counted shell by shell according to the squared distance from the origin.
Step 1: Turn the Hyperball into a Summatory Function
Set \(N=r^2\). For each integer \(n\ge 0\), let \(r_4(n)\) be the number of integer quadruples satisfying
$x_1^2+x_2^2+x_3^2+x_4^2=n.$
Every lattice point inside the ball has some squared radius \(n\) with \(0\le n\le N\), so
$T(r)=\sum_{n=0}^{N} r_4(n).$
This removes the geometric viewpoint and replaces it with a summation problem.
Step 2: Apply Jacobi's Four-Square Theorem
For \(n>0\), Jacobi's theorem gives
$r_4(n)=8\sum_{\substack{d\mid n\\4\nmid d}} d,$
and of course \(r_4(0)=1\). Substituting this into the previous identity yields
$T(r)=1+8\sum_{n=1}^{N}\sum_{\substack{d\mid n\\4\nmid d}} d.$
Now interchange the order of summation. A fixed divisor \(d\) contributes once for each multiple of \(d\) up to \(N\), so it appears exactly \(\left\lfloor N/d\right\rfloor\) times. Therefore
$T(r)=1+8\sum_{\substack{d\le N\\4\nmid d}} d\left\lfloor\frac{N}{d}\right\rfloor.$
Step 3: Remove the Multiples of 4 Cleanly
Introduce the summatory function
$S(N)=\sum_{d=1}^{N} d\left\lfloor\frac{N}{d}\right\rfloor.$
Then the terms with \(4\nmid d\) are obtained by subtracting the terms with \(d=4k\). Those discarded terms are
$\sum_{k=1}^{\lfloor N/4\rfloor} 4k\left\lfloor\frac{N}{4k}\right\rfloor.$
Since \(\left\lfloor N/(4k)\right\rfloor=\left\lfloor \left\lfloor N/4\right\rfloor /k\right\rfloor\), this becomes
$4S\!\left(\left\lfloor\frac{N}{4}\right\rfloor\right).$
Hence the whole problem collapses to the closed form
$\boxed{T(r)=1+8\left(S(N)-4S\!\left(\left\lfloor\frac{N}{4}\right\rfloor\right)\right),\qquad N=r^2.}$
Step 4: Evaluate \(S(N)\) in \(O(\sqrt N)\)
A naive evaluation of \(S(N)\) would still be linear in \(N\). The implementation avoids that by exploiting repeated values of \(\left\lfloor N/d\right\rfloor\).
Let
$M=\left\lfloor\sqrt N\right\rfloor.$
For the small divisors \(1\le d\le M\), compute the terms directly:
$\sum_{d=1}^{M} d\left\lfloor\frac{N}{d}\right\rfloor.$
For large divisors \(d>M\), group them by the common quotient
$q=\left\lfloor\frac{N}{d}\right\rfloor.$
All \(d\) producing the same \(q\) lie in the interval
$L_q=\left\lfloor\frac{N}{q+1}\right\rfloor+1,\qquad R_q=\left\lfloor\frac{N}{q}\right\rfloor.$
Only the portion above \(M\) belongs to the second half, so the effective lower bound is \(\max(M+1,L_q)\). The grouped contribution is then
$q\sum_{d=\max(M+1,L_q)}^{R_q} d,$
and the inner sum is an arithmetic progression:
$\sum_{d=l}^{r} d=\frac{(l+r)(r-l+1)}{2}.$
This split-hyperbola decomposition visits only about \(2\sqrt N\) meaningful ranges.
Step 5: Work Modulo \(10^9+7\)
The target answer is needed modulo
$P=10^9+7.$
So every addition and multiplication is reduced modulo \(P\). The division by \(2\) in the arithmetic-series formula is replaced by the modular inverse
$2^{-1}\equiv 500000004 \pmod{P}.$
Because the arithmetic-series sum is an exact integer before reduction, performing the reduction at every step preserves the correct final residue.
Worked Example: \(r=2\)
Here \(N=r^2=4\). First compute
$S(4)=1\cdot 4+2\cdot 2+3\cdot 1+4\cdot 1=15.$
Also, \(\left\lfloor N/4\right\rfloor=1\), so
$S(1)=1.$
Substituting into the closed form gives
$T(2)=1+8(15-4\cdot 1)=1+8\cdot 11=89.$
This matches the standard checkpoint for the problem and confirms that the divisor-sum reformulation is correct.
How the Code Works
The C++, Python, and Java implementations all evaluate the same formula
$T(r)=1+8\left(S(r^2)-4S\!\left(\left\lfloor\frac{r^2}{4}\right\rfloor\right)\right)\pmod{10^9+7}.$
Each implementation first computes an integer square root to split the summation for \(S(N)\) into a direct part and a grouped part. The direct part handles small divisors one by one. The grouped part iterates over quotient values \(q\), reconstructs the divisor interval that shares that quotient, clips the interval so that it stays strictly above \(\sqrt N\), and then adds the interval using the arithmetic-series formula.
The implementations keep every intermediate result modulo \(10^9+7\). The Python and Java versions partition the two summation phases across available worker tasks, while the C++ version performs the same mathematics in a single-threaded pass with careful wide-integer multiplication for safety. Despite those engineering differences, all three implementations are computing exactly the same divisor decomposition.
Complexity Analysis
For one evaluation of \(S(N)\), the direct loop has length \(\lfloor\sqrt N\rfloor\), and the grouped loop also runs over only \(O(\sqrt N)\) distinct quotient values. Therefore
$S(N)\text{ can be evaluated in }O(\sqrt N)\text{ time}.$
The final answer requires two such evaluations, namely at \(N=r^2\) and at \(\left\lfloor N/4\right\rfloor\), so the asymptotic running time remains \(O(\sqrt N)\). The serial method uses \(O(1)\) auxiliary memory, and the parallel variants add only modest task-management overhead on top of the same \(O(\sqrt N)\) total work.
Footnotes and References
- Problem page: https://projecteuler.net/problem=596
- Jacobi's four-square theorem: Wikipedia - Jacobi's four-square theorem
- Dirichlet hyperbola method: Wikipedia - Dirichlet hyperbola method
- Sum of squares function: Wikipedia - Sum of squares function
- Lattice point: Wikipedia - Lattice point