Problem 507: Shortest Lattice Vector
View on Project EulerProject Euler Problem 507 Solution
EulerSolve provides an optimized solution for Project Euler Problem 507, Shortest Lattice Vector, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let the Tribonacci sequence be defined by $t_1=0,\qquad t_2=t_3=1,\qquad t_{k+3}\equiv t_{k+2}+t_{k+1}+t_k \pmod{10^7}.$ For each \(n\ge 1\), take the 12 consecutive values $r_k=t_{12n-12+k}\qquad (1\le k\le 12),$ and form two vectors in \(\mathbb{Z}^3\): $v_n=(r_1-r_2,\ r_3+r_4,\ r_5r_6),\qquad w_n=(r_7-r_8,\ r_9+r_{10},\ r_{11}r_{12}).$ We must compute $s(n)=\min_{(a,b)\in \mathbb{Z}^2\setminus\{(0,0)\}} \left\|a v_n+b w_n\right\|_1,$ where $\left\|(x,y,z)\right\|_1=|x|+|y|+|z|,$ and then sum these values: $S(N)=\sum_{n=1}^{N} s(n).$ The target computation uses \(N=20{,}000{,}000\), so the solution has to generate the sequence and solve each rank-2 lattice problem without storing huge tables. Mathematical Approach The lattice generated by \(v_n\) and \(w_n\) is $L_n=\{a v_n+b w_n:\ a,b\in\mathbb{Z}\}.$ Because the lattice has rank 2, the implementations use an \(L_1\)-analogue of two-vector reduction: repeatedly shorten the longer generator by subtracting an integer multiple of the shorter one. The critical point is that the best multiple can be found by checking only a handful of integer candidates. Step 1: Generate the Tribonacci data in blocks of 12 Each lattice instance uses exactly 12 consecutive Tribonacci values....
Detailed mathematical approach
Problem Summary
Let the Tribonacci sequence be defined by
$t_1=0,\qquad t_2=t_3=1,\qquad t_{k+3}\equiv t_{k+2}+t_{k+1}+t_k \pmod{10^7}.$
For each \(n\ge 1\), take the 12 consecutive values
$r_k=t_{12n-12+k}\qquad (1\le k\le 12),$
and form two vectors in \(\mathbb{Z}^3\):
$v_n=(r_1-r_2,\ r_3+r_4,\ r_5r_6),\qquad w_n=(r_7-r_8,\ r_9+r_{10},\ r_{11}r_{12}).$
We must compute
$s(n)=\min_{(a,b)\in \mathbb{Z}^2\setminus\{(0,0)\}} \left\|a v_n+b w_n\right\|_1,$
where
$\left\|(x,y,z)\right\|_1=|x|+|y|+|z|,$
and then sum these values:
$S(N)=\sum_{n=1}^{N} s(n).$
The target computation uses \(N=20{,}000{,}000\), so the solution has to generate the sequence and solve each rank-2 lattice problem without storing huge tables.
Mathematical Approach
The lattice generated by \(v_n\) and \(w_n\) is
$L_n=\{a v_n+b w_n:\ a,b\in\mathbb{Z}\}.$
Because the lattice has rank 2, the implementations use an \(L_1\)-analogue of two-vector reduction: repeatedly shorten the longer generator by subtracting an integer multiple of the shorter one. The critical point is that the best multiple can be found by checking only a handful of integer candidates.
Step 1: Generate the Tribonacci data in blocks of 12
Each lattice instance uses exactly 12 consecutive Tribonacci values. There is no need to materialize the whole sequence up to index \(12N\); we only need a rolling state for three consecutive terms.
Once the current state \((t_k,t_{k+1},t_{k+2})\) is known, the next term is
$t_{k+3}\equiv t_{k+2}+t_{k+1}+t_k \pmod{10^7}.$
After every 12 streamed values, we obtain one pair \((v_n,w_n)\). This turns the outer problem into a sequence of independent shortest-vector queries in lattices of the form \(\mathbb{Z}v+\mathbb{Z}w\subseteq \mathbb{Z}^3\).
Step 2: Rewrite the shortest-vector search as a basis reduction problem
For fixed vectors \(v,w\), we want
$m(v,w)=\min_{(a,b)\in \mathbb{Z}^2\setminus\{(0,0)\}} \left\|av+bw\right\|_1.$
If we replace \(w\) by
$w'=w-tv\qquad (t\in\mathbb{Z}),$
then the lattice does not change, because
$\mathbb{Z}v+\mathbb{Z}w=\mathbb{Z}v+\mathbb{Z}(w-tv).$
This is the same unimodular basis change used in Gauss reduction. Therefore we are free to shear one generator by an integer multiple of the other whenever that makes the basis shorter in the \(L_1\) norm.
Step 3: Minimize one shear step by studying a convex piecewise-linear function
Suppose \(v\) is the shorter current vector and we want the best replacement for \(w\). Define
$f(t)=\left\|w-tv\right\|_1=\sum_{i=1}^{3}\left|w_i-t v_i\right|,\qquad t\in\mathbb{Z}.$
Each summand \(\left|w_i-t v_i\right|\) is a convex piecewise-linear function of \(t\). Its slope changes only when \(t=w_i/v_i\) for a coordinate with \(v_i\ne 0\). Therefore \(f(t)\) is also convex and piecewise linear, with all breakpoints coming from those coordinate ratios.
Over the integers, a minimizer of a convex piecewise-linear function must occur at an integer adjacent to one of the breakpoints, so it is enough to test
$\left\lfloor\frac{w_i}{v_i}\right\rfloor,\qquad \left\lfloor\frac{w_i}{v_i}\right\rfloor+1$
for every coordinate with \(v_i\ne 0\), together with \(t=0\). In three dimensions this gives at most seven candidates.
That is why the implementations can find the best integer shear in constant time per reduction step instead of searching over all integers.
Step 4: Repeat the reduction until no further improvement is possible
The reduction loop always keeps the shorter vector first. It computes the best integer \(t\), forms \(w-tv\), and accepts the change only if the \(L_1\) norm strictly decreases:
$\left\|w-tv\right\|_1<\left\|w\right\|_1.$
Every accepted step preserves the lattice and strictly lowers the norm of one basis vector, so the process must terminate. When no tested integer multiple makes the longer vector shorter, the pair is locally reduced for this \(L_1\) shear operation.
The implementations then return the shorter nonzero vector of the reduced pair. For the rank-2 lattices arising here, this is the quantity used throughout the computation of \(s(n)\).
Step 5: Jump to any starting index with a companion matrix
The C++ and Java implementations do not restart the Tribonacci sequence from the beginning whenever they need a later index. Instead they use the companion matrix
$A=\begin{pmatrix} 1 & 1 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix}.$
This matrix advances the state vector by one step:
$A\begin{pmatrix}t_{k+2}\\ t_{k+1}\\ t_k\end{pmatrix}=\begin{pmatrix}t_{k+3}\\ t_{k+2}\\ t_{k+1}\end{pmatrix}.$
So exponentiation by squaring gives any required starting state in \(O(\log k)\) time. After that jump, the next terms are streamed linearly.
Worked Example: The first lattice pair
The first 12 Tribonacci values are
$0,1,1,2,4,7,13,24,44,81,149,274.$
Hence
$v=(-1,3,28),\qquad w=(-11,125,40826).$
The initial norms are
$\|v\|_1=|-1|+|3|+|28|=32,$
$\|w\|_1=|-11|+|125|+|40826|=40962.$
To reduce \(w\), examine the coordinate ratios
$\frac{-11}{-1}=11,\qquad \frac{125}{3}\approx 41.67,\qquad \frac{40826}{28}\approx 1458.07.$
The best tested integer is \(t=1458\), giving
$w-1458v=(1447,-4249,2),$
with
$\|w-1458v\|_1=1447+4249+2=5698.$
This is a major improvement over \(40962\), but it is still larger than \(\|v\|_1=32\). No further integer shear shortens the second vector, so the shortest nonzero lattice vector has \(L_1\) length
$s(1)=32.$
This matches the first checkpoint used by the implementation. A second checkpoint verifies
$S(10)=130762273722.$
How the Code Works
The C++ and Java implementations share the same numerical method. They compute the Tribonacci values on the fly, buffer 12 consecutive terms, build the two 3-dimensional generators, run the reduction loop described above, and add the resulting shortest \(L_1\) length to a running total.
Because the third coordinate is a product of two Tribonacci values, intermediate vector norms can become large. The C++ implementation therefore uses 128-bit integer arithmetic for norm calculations and final accumulation, while the Java implementation uses arbitrary-precision integers for the same reason.
The C++ implementation also splits the range \(1\le n\le N\) into contiguous chunks when multiple hardware threads are available. Each chunk computes its own starting Tribonacci state via matrix exponentiation, then streams its local terms independently and returns a partial sum. Those partial sums are added at the end.
The Python implementation is intentionally thin: it ensures that the C++ solver is available, runs that compiled program, and returns the printed decimal answer. In other words, the mathematical work is the same, while Python serves as a bridge layer.
Before the full target run, the implementation checks the first generated vector pair, confirms \(s(1)=32\), and confirms \(S(10)=130762273722\). Those checkpoints protect against indexing mistakes in the sequence stream and against incorrect reduction behavior.
Complexity Analysis
Generating the 12 Tribonacci values for one lattice instance costs \(O(1)\) time and \(O(1)\) memory. Each reduction iteration evaluates at most seven candidate integers, so one iteration is also \(O(1)\). In practice the number of accepted reductions is small for these 3-dimensional rank-2 bases, which makes the average cost per \(n\) effectively constant.
As a result, the total sequential running time is linear in the number of instances:
$O(N).$
If the work is split across \(P\) independent chunks, each chunk pays an additional \(O(\log N)\) initialization cost for the matrix jump, so the total work stays linear and the wall-clock time is close to
$O\!\left(\frac{N}{P}+\log N\right)$
per chunk, with \(O(1)\) memory per chunk.