Problem 472: Comfortable Distance II
View on Project EulerProject Euler Problem 472 Solution
EulerSolve provides an optimized solution for Project Euler Problem 472, Comfortable Distance II, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary This problem defines an integer sequence \(f(n)\) coming from the comfortable-distance construction and asks for the large prefix sum $S(n)=\sum_{k=1}^{n} f(k),$ with the final result reported modulo \(10^8\). The implementations target values as large as \(n=10^{12}\), so a direct simulation of every term is completely impractical. The checkpoints built into the solution are $f(1)=1,\qquad f(15)=9,\qquad f(20)=6,\qquad f(500)=16,$ and $S(20)=83,\qquad S(500)=13343.$ The key observation is that the sequence is not generated term by term. Instead, it is organized into self-similar blocks on dyadic intervals, and each large block can be reduced to a smaller one plus three explicit arithmetic segments. Mathematical Approach For each \(k \ge 0\), define the block $B_k=\bigl(f(2^k+1),f(2^k+2),\ldots,f(2^{k+1})\bigr),$ so \(|B_k|=2^k\). Also define the power-of-two prefix and the internal block prefix by $P_k=\sum_{n=1}^{2^k} f(n),\qquad G_k(m)=\sum_{j=1}^{m} B_k(j),\qquad G_k(0)=0.$ Step 1: Seed Blocks and Notation The first five blocks are stored explicitly: $B_0=(2),\qquad B_1=(2,4),$ $B_2=(3,6,2,6),\qquad B_3=(3,8,2,6,2,4,9,4),$ $B_4=(3,8,2,6,2,4,10,4,2,4,6,8,15,6,5,4).$ Together with \(f(1)=1\), these values determine all small checkpoints. They also serve as the exact base of the recursion, so no approximation is introduced anywhere....
Detailed mathematical approach
Problem Summary
This problem defines an integer sequence \(f(n)\) coming from the comfortable-distance construction and asks for the large prefix sum
$S(n)=\sum_{k=1}^{n} f(k),$
with the final result reported modulo \(10^8\). The implementations target values as large as \(n=10^{12}\), so a direct simulation of every term is completely impractical. The checkpoints built into the solution are
$f(1)=1,\qquad f(15)=9,\qquad f(20)=6,\qquad f(500)=16,$
and
$S(20)=83,\qquad S(500)=13343.$
The key observation is that the sequence is not generated term by term. Instead, it is organized into self-similar blocks on dyadic intervals, and each large block can be reduced to a smaller one plus three explicit arithmetic segments.
Mathematical Approach
For each \(k \ge 0\), define the block
$B_k=\bigl(f(2^k+1),f(2^k+2),\ldots,f(2^{k+1})\bigr),$
so \(|B_k|=2^k\). Also define the power-of-two prefix and the internal block prefix by
$P_k=\sum_{n=1}^{2^k} f(n),\qquad G_k(m)=\sum_{j=1}^{m} B_k(j),\qquad G_k(0)=0.$
Step 1: Seed Blocks and Notation
The first five blocks are stored explicitly:
$B_0=(2),\qquad B_1=(2,4),$
$B_2=(3,6,2,6),\qquad B_3=(3,8,2,6,2,4,9,4),$
$B_4=(3,8,2,6,2,4,10,4,2,4,6,8,15,6,5,4).$
Together with \(f(1)=1\), these values determine all small checkpoints. They also serve as the exact base of the recursion, so no approximation is introduced anywhere.
Step 2: Large Blocks Split into Four Deterministic Pieces
For \(k \ge 5\), set
$r=2^{k-3}.$
Then \(|B_k|=2^k=8r\), and the block decomposes into four consecutive pieces:
$\text{Part I: } B_k(1),\ldots,B_k(3r)=B_{k-1}(1),\ldots,B_{k-1}(3r),$
$\text{Part II: } 4r+2,\ 2r,\ 2r-2,\ \ldots,\ 4,$
$\text{Part III: } 2,\ 4,\ 6,\ \ldots,\ 4r,$
$\text{Part IV: } 6r+3,\ 2r+2,\ 2r+1,\ \ldots,\ 4.$
So the first \(3r\) values are copied from the first three quarters of the previous block, while the remaining \(5r\) values are generated by closed arithmetic rules. This is the structural reason the problem becomes logarithmic rather than linear.
Step 3: Closed Forms for the Three New Pieces
Let the prefix sums of Parts II, III, and IV be denoted by \(A(r,m)\), \(E(m)\), and \(U(r,m)\).
For Part II, with \(1 \le m \le r\),
$A(r,m)=4r+2+(m-1)(2r-m+2),$
and the full segment sum is
$\Sigma_A(r)=A(r,r)=r(r+5).$
For Part III, with \(1 \le m \le 2r\),
$E(m)=m(m+1),$
because the segment is simply \(2,4,6,\ldots,2m\). Its full sum is
$\Sigma_E(r)=E(2r)=4r^2+2r.$
For Part IV, with \(1 \le m \le 2r\),
$U(r,m)=6r+3+\frac{(m-1)(4r-m+6)}{2},$
and the full segment sum is
$\Sigma_U(r)=U(r,2r)=2r^2+11r.$
If \(Q_k\) denotes the sum of the first three quarters of \(B_k\), then
$Q_k=Q_{k-1}+\Sigma_A(r)+\Sigma_E(r)=Q_{k-1}+5r^2+7r,$
and the whole block sum is
$\Sigma_k=Q_k+\Sigma_U(r).$
Step 4: Prefix Sums Reduce to One Block Query
Write any \(n \ge 1\) in the form
$n=2^k+t,\qquad 0 \le t \le 2^k.$
Then
$S(n)=P_k+G_k(t).$
For \(k \le 4\), \(G_k(t)\) is read directly from the stored seed-block prefixes. For \(k \ge 5\), the block prefix is piecewise:
$G_k(t)= \begin{cases} 0, & t=0,\\ G_{k-1}(t), & 1 \le t \le 3r,\\ Q_{k-1}+A(r,t-3r), & 3r \lt t \le 4r,\\ Q_{k-1}+\Sigma_A(r)+E(t-4r), & 4r \lt t \le 6r,\\ Q_{k-1}+\Sigma_A(r)+\Sigma_E(r)+U(r,t-6r), & 6r \lt t \le 8r. \end{cases}$
Only the first case makes a recursive descent, and each descent lowers the block index by one. That is why the query depth is \(O(\log n)\).
Step 5: Worked Example
The first nontrivial large block is \(B_5\), where \(r=2^{5-3}=4\). The first \(12\) entries are copied from the first \(12\) entries of \(B_4\):
$3,8,2,6,2,4,10,4,2,4,6,8.$
The three analytic pieces are then
$18,8,6,4,$
$2,4,6,8,10,12,14,16,$
and
$27,10,9,8,7,6,5,4.$
Hence
$B_5=(3,8,2,6,2,4,10,4,2,4,6,8,18,8,6,4,2,4,6,8,10,12,14,16,27,10,9,8,7,6,5,4).$
As a quick prefix check,
$P_4=1+2+6+17+38=64,$
and since \(20=16+4\),
$S(20)=P_4+G_4(4)=64+(3+8+2+6)=83,$
which matches the required checkpoint.
How the Code Works
The C++, Python, and Java implementations all follow the same plan. They store the five seed blocks, build prefix tables for those exact values, and then precompute for each higher level three quantities: the sum of the whole block, the sum of the first three quarters of the block, and the prefix sum up to the next power of two.
When a query \(S(n)\) arrives, the implementation finds the highest power of two not exceeding \(n\). The stored prefix up to that power contributes immediately, and only the remainder inside one block still has to be evaluated. That remainder is handled by the piecewise formulas above, using recursion only when the remainder falls into the copied first segment.
The arithmetic is exact before the final reduction modulo \(10^8\). The C++ version uses 128-bit integers for intermediate totals, the Java version uses arbitrary-precision integers, and the Python version relies on Python's built-in unbounded integers.
Complexity Analysis
If the target range is below \(2^{64}\), only \(64\) block levels are ever needed. Precomputing all stored block totals and power-of-two prefixes therefore costs \(O(\log n_{\max})\) time and \(O(\log n_{\max})\) memory. A single query for \(S(n)\) performs at most one descent per level, so its running time is \(O(\log n)\) and its extra working memory is \(O(1)\) beyond the precomputed tables.
Footnotes and References
- Problem page: https://projecteuler.net/problem=472
- Prefix sum: Wikipedia - Prefix sum
- Recurrence relation: Wikipedia - Recurrence relation
- Arithmetic progression: Wikipedia - Arithmetic progression