Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 433: Steps in Euclid's Algorithm

Project Euler Problem 433 Solution

EulerSolve provides an optimized solution for Project Euler Problem 433, Steps in Euclid's Algorithm, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary For positive integers $x$ and $y$, let $E(x,y)$ denote the number of divisions performed by Euclid's algorithm until the remainder becomes $0$. The problem asks for $S(N)=\sum_{1\le x,y\le N} E(x,y)$ at the very large value $N=5\cdot 10^6$. A direct double loop is hopeless: there are $N^2$ ordered pairs, and each pair would still require a separate Euclidean run. The solution therefore reorganizes the sum so that most of the work is done by arithmetic formulas, Möbius inversion, and floor-sum recurrences. Mathematical Approach 1. Reduce the Full Sum to the Lower Triangle When $x>y$, starting from $(y,x)$ performs one swap before entering exactly the same remainder chain, so $E(y,x)=E(x,y)+1 \qquad (x>y).$ Also, $E(x,x)=1$ for every diagonal pair. If we define $L(N)=\sum_{1\le y<x\le N} E(x,y),$ then the whole ordered sum becomes $\begin{aligned} S(N) &=\sum_{x=1}^{N}E(x,x)+\sum_{1\le y<x\le N}\bigl(E(x,y)+E(y,x)\bigr) \\ &=N+\sum_{1\le y<x\le N}\bigl(2E(x,y)+1\bigr) \\ &=2L(N)+\frac{N(N+1)}{2}. \end{aligned}$ So it is enough to evaluate the lower-triangular sum $L(N)$. 2. Peel Off the Universal First Steps Every pair with $x>y$ needs at least one Euclidean step....

Detailed mathematical approach

Problem Summary

For positive integers $x$ and $y$, let $E(x,y)$ denote the number of divisions performed by Euclid's algorithm until the remainder becomes $0$. The problem asks for

$S(N)=\sum_{1\le x,y\le N} E(x,y)$

at the very large value $N=5\cdot 10^6$. A direct double loop is hopeless: there are $N^2$ ordered pairs, and each pair would still require a separate Euclidean run. The solution therefore reorganizes the sum so that most of the work is done by arithmetic formulas, Möbius inversion, and floor-sum recurrences.

Mathematical Approach

1. Reduce the Full Sum to the Lower Triangle

When $x>y$, starting from $(y,x)$ performs one swap before entering exactly the same remainder chain, so

$E(y,x)=E(x,y)+1 \qquad (x>y).$

Also, $E(x,x)=1$ for every diagonal pair. If we define

$L(N)=\sum_{1\le y<x\le N} E(x,y),$

then the whole ordered sum becomes

$\begin{aligned} S(N) &=\sum_{x=1}^{N}E(x,x)+\sum_{1\le y<x\le N}\bigl(E(x,y)+E(y,x)\bigr) \\ &=N+\sum_{1\le y<x\le N}\bigl(2E(x,y)+1\bigr) \\ &=2L(N)+\frac{N(N+1)}{2}. \end{aligned}$

So it is enough to evaluate the lower-triangular sum $L(N)$.

2. Peel Off the Universal First Steps

Every pair with $x>y$ needs at least one Euclidean step. There is also an easily recognized second step: if $x<2y$, then the first quotient is $1$, so after one subtraction we are left with $(y,x-y)$, which is still a non-terminal pair. Hence

$E(x,y)=1+\mathbf{1}_{x<2y}+R(x,y),\qquad R(x,y)\ge 0.$

The easy contribution is therefore

$B(N)=\sum_{1\le y<x\le N}\left(1+\mathbf{1}_{x<2y}\right).$

The first part is just the number of pairs below the diagonal:

$\sum_{1\le y<x\le N}1=\frac{N(N-1)}{2}.$

For the indicator term, count the pairs with $y<x<2y$. Writing $N=2m$ or $N=2m+1$, one gets

$\sum_{1\le y<x\le 2m}\mathbf{1}_{x<2y}=m(m-1),$

$\sum_{1\le y<x\le 2m+1}\mathbf{1}_{x<2y}=m^2.$

So the closed base term is

$B(N)= \begin{cases} (3m-2)m, & N=2m, \\ 3m^2+m, & N=2m+1. \end{cases}$

This is exactly the closed form used by the implementations before any expensive computation begins.

3. Residual Contribution and the GCD Structure

The remaining term $R(x,y)$ measures everything beyond those universal first one or two steps. Because Euclid's algorithm is unchanged by scaling both arguments by the same factor,

$R(dx,dy)=R(x,y)\qquad (d\ge 1).$

That makes the gcd decomposition natural: the residual depends only on the primitive core of the pair.

Another useful observation is that $R(x,y)=0$ unless the larger coordinate is at least $5$. Indeed, the first primitive pairs with more than the peeled-off base behavior are $(5,2)$ and $(5,3)$. This is why the Möbius layer only needs to run up to $\lfloor N/5\rfloor$.

4. Möbius Inversion

Let $\mu$ be the Möbius function and let

$M(t)=\sum_{d\le t}\mu(d)$

be its prefix sum. The fast method isolates primitive pairs by Möbius inversion and rewrites the residual part of the lower triangle in the form

$L(N)=B(N)+2\sum_{d=1}^{\lfloor N/5\rfloor}\mu(d)\,A\!\left(\left\lfloor\frac{N}{d}\right\rfloor\right),$

where $A(V)$ is an unrestricted auxiliary counting function. The role of $\mu(d)$ is to remove contributions coming from non-coprime multiples. This is why all three implementations build Möbius values and their prefix sums before the main accumulation loop.

5. Reverse Euclid and the Auxiliary Function

To compute $A(V)$, the implementation reverses one step of Euclid's algorithm. If a reduced tail pair is $(u,v)$ with $u>v\ge 1$, then every predecessor that reaches it after one division has the shape

$\bigl(ku+v,\;u\bigr),\qquad k\ge 1.$

Imposing the upper bound $ku+v\le V$ turns the counting problem into lattice-point counting under linear inequalities. After summing all admissible predecessors of all reduced tails, the required quantities reduce to floor sums of the form

$\sum_t \left\lfloor\frac{\alpha t+\beta}{\gamma}\right\rfloor,$

together with a second weighted variant of the same type. The implementation evaluates both of these by a recursive Euclidean-style transform, which is the key reason the method stays fast.

6. Why the Loop Stops at $\sqrt{V}$

The predecessor lattice is symmetric between two coordinate regions. Because of that symmetry, the auxiliary computation only needs to enumerate reduced tail pairs with larger coordinate at most $\lfloor\sqrt{V}\rfloor$. One accumulated sum covers the full asymmetric region, another covers the square overlap, and the final combination is a doubled total minus that overlap correction. This explains the integer-square-root bound seen in the code.

7. Harmonic Interval Grouping

The Möbius sum is not evaluated one divisor at a time. The quotient

$q=\left\lfloor\frac{N}{d}\right\rfloor$

stays constant on intervals $l\le d\le r$, where

$r=\left\lfloor\frac{N}{q}\right\rfloor.$

So an entire block contributes

$\sum_{d=l}^{r}\mu(d)\,A(q)=\bigl(M(r)-M(l-1)\bigr)A(q).$

Only $O(\sqrt{N})$ distinct quotients occur, so the expensive auxiliary function is called much fewer times than in a linear scan.

Worked Check: $N=10$

For $N=10$, we have $m=5$, so the base part is

$B(10)=(3\cdot 5-2)\cdot 5=65.$

The residual term computed by the fast Möbius-floor-sum pipeline is $18$, hence

$L(10)=65+18=83.$

Returning to the full ordered sum gives

$S(10)=2L(10)+\frac{10\cdot 11}{2}=2\cdot 83+55=221,$

which matches the small checkpoint used to validate the implementation.

How the Code Works

The C++, Python, and Java implementations all follow the same plan. They first build the Möbius table and its prefix sums up to $\lfloor N/5\rfloor$. They then evaluate the closed base term $B(N)$, generate the harmonic intervals where $\lfloor N/d\rfloor$ is constant, and for each distinct quotient compute the reverse-Euclid auxiliary kernel via recursive floor sums. Each block is weighted by the corresponding Möbius prefix difference, added to the base term, and finally converted back to the full ordered total through

$S(N)=2L(N)+\frac{N(N+1)}{2}.$

Complexity Analysis

A naive method would inspect all $N^2$ ordered pairs and run Euclid's algorithm separately, which is entirely infeasible for $N=5\cdot 10^6$. The optimized method uses a linear-style Möbius sieve up to $\lfloor N/5\rfloor$, only $O(\sqrt{N})$ distinct quotient blocks in the outer summation, and a recursive floor-sum kernel whose depth is logarithmic in the relevant parameters. In practice this turns a quadratic enumeration into a heavily compressed arithmetic computation. The main memory cost is the Möbius and prefix tables, so the space usage is $O(N/5)$.

References

Problem page: https://projecteuler.net/problem=433
Euclidean algorithm: Wikipedia — Euclidean algorithm
Möbius function and inversion: Wikipedia — Möbius inversion formula
Continued fractions and Euclid's algorithm: Wikipedia — Continued fraction
Graham, Knuth, Patashnik, Concrete Mathematics, 2nd ed., sections on floor sums and Möbius inversion.