Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 551: Sum of Digits Sequence

Project Euler Problem 551 Solution

EulerSolve provides an optimized solution for Project Euler Problem 551, Sum of Digits Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary The sequence is defined by $a_1=1,\qquad a_{n+1}=a_n+\sigma(a_n),$ where $\sigma(x)$ is the decimal digit sum of $x$. The target index is so large that simulating every update one by one is infeasible, so the implementations accelerate the process by grouping many ordinary updates into exact carry-to-carry transitions. Mathematical Approach The key observation is that decimal digit sums behave well across powers of $10$. By isolating the lowest six digits, the recurrence splits into a rapidly changing low part and a slowly changing high part. That split makes it possible to precompute exact jumps to the next carry and then compose those jumps over long intervals. Step 1: Split Each Term into a High Part and a Six-Digit Low Part Set $B=10^6,\qquad a_n=Bq_n+r_n,\qquad 0\le r_n \lt B.$ Let $\sigma_6(r)$ denote the digit sum of the six-digit decimal block of $r$, allowing leading zeros....

Detailed mathematical approach

Problem Summary

The sequence is defined by

$a_1=1,\qquad a_{n+1}=a_n+\sigma(a_n),$

where $\sigma(x)$ is the decimal digit sum of $x$. The target index is so large that simulating every update one by one is infeasible, so the implementations accelerate the process by grouping many ordinary updates into exact carry-to-carry transitions.

Mathematical Approach

The key observation is that decimal digit sums behave well across powers of $10$. By isolating the lowest six digits, the recurrence splits into a rapidly changing low part and a slowly changing high part. That split makes it possible to precompute exact jumps to the next carry and then compose those jumps over long intervals.

Step 1: Split Each Term into a High Part and a Six-Digit Low Part

Set

$B=10^6,\qquad a_n=Bq_n+r_n,\qquad 0\le r_n \lt B.$

Let $\sigma_6(r)$ denote the digit sum of the six-digit decimal block of $r$, allowing leading zeros. Because $B$ is a power of $10$, the decimal digits of $q_n$ and $r_n$ do not interfere, so

$\sigma(a_n)=\sigma(q_n)+\sigma_6(r_n).$

Therefore one elementary update becomes

$a_{n+1}=Bq_n+r_n+\sigma(q_n)+\sigma_6(r_n).$

If the low part stays below $B$, then

$q_{n+1}=q_n,\qquad r_{n+1}=r_n+\sigma_6(r_n)+\sigma(q_n).$

If the low part reaches or exceeds $B$, one carry occurs:

$q_{n+1}=q_n+1,\qquad r_{n+1}=r_n+\sigma_6(r_n)+\sigma(q_n)-B.$

For the target computation, the high part never needs more than $12$ decimal digits, so $\sigma(q_n)\le 108$. Since $\sigma_6(r_n)\le 54$, every elementary update increases the low part by at most $162$. In particular, a single update can create at most one carry.

Step 2: Jump Directly to the Next Carry When the High Part Is Fixed

While $q_n$ is fixed, its digit sum is a constant. For a fixed value $c$, define

$F_c(x)=x+\sigma_6(x)+c.$

Starting from a low part $x$, the next carry happens after

$\tau_c(x)=\min\{t\ge 1:F_c^{(t)}(x)\ge B\}$

elementary updates, and the low part immediately after that carry is

$\rho_c(x)=F_c^{(\tau_c(x))}(x)-B.$

These quantities are exact: $\tau_c(x)$ counts how many ordinary recurrence steps are consumed before the boundary $B$ is crossed, and $\rho_c(x)$ is the true remainder after subtracting $B$.

Step 3: After a Carry, the Low Part Returns to a Tiny State Space

At the moment of crossing we have

$B\le F_c^{(\tau_c(x))}(x)\le B+54+108,$

so the post-carry remainder always satisfies

$0\le \rho_c(x)\le 162.$

This is the main compression step. Although the low part can wander through all values below $10^6$ before the next carry, the state immediately after a carry lies in the tiny set

$S=\{0,1,2,\dots,162\}.$

So a carry event can be viewed as a macro-step that starts from some remainder in $S$, spends a known number of elementary updates, and returns to another remainder in $S$.

Step 4: Represent One Carry Event as a Transition Depending Only on the Current High Part

For each current high part $h$, let $c=\sigma(h)$. One carry event then induces a transition $M_h$ on the small remainder set $S$: for each $x\in S$, the transition returns the new remainder $\rho_c(x)$ and the exact cost $\tau_c(x)$.

If we want to process several carries in a row, starting from high part $h$, we must compose the transitions

$M_h,\ M_{h+1},\ M_{h+2},\ \dots.$

For adjacent intervals of high-part values, composition is exact:

$\mathcal{T}_{[u,w)}=\mathcal{T}_{[v,w)}\circ \mathcal{T}_{[u,v)}\qquad (u\le v\le w).$

This means that long runs of carry events can be handled by composing reusable transition blocks instead of replaying each event separately.

Step 5: Build Decimal Blocks of Consecutive High Parts

Consider an aligned interval of length $10^d$, such as all high-part values from $2300$ to $2399$. Every number in that interval shares the same prefix and only the last $d$ digits vary. Because digit sums split over decimal concatenation, the digit sum of each high part is

$\sigma(\text{prefix}\cdot 10^d+\text{suffix})=\sigma(\text{prefix})+\sigma(\text{suffix}).$

Therefore the full transition of such a block depends only on two pieces of information: the block length $10^d$ and the digit sum of the fixed prefix. The implementations precompute all these aligned decimal blocks up to $d=12$. An arbitrary interval $[u,v)$ is then decomposed greedily into aligned blocks, so a long interval can be evaluated with only a small number of transition compositions.

Step 6: Use Doubling and Binary Search to Maximize the Number of Carry Events

Suppose $R$ elementary updates are still available. Let $C(m)$ be the number of elementary updates consumed by the next $m$ carry events from the current state. Since each carry event costs at least one ordinary update, $C(m)$ is strictly increasing.

So the algorithm finds

$m^*=\max\{m\ge 0:C(m)\le R\}$

by first doubling $m$ until the budget is exceeded and then applying binary search. After those $m^*$ carry events are executed in one shot, the remaining budget is too small to reach the next carry. The last few elementary updates are therefore performed directly with the high part fixed.

Worked Example: One Carry Event and a Short Tail

Take

$a=17\cdot 10^6+999995.$

Then $\sigma(17)=8$ and $\sigma_6(999995)=50$. One elementary update gives

$999995+50+8=1000053,$

so the next term is

$a'=18\cdot 10^6+53.$

This single carry event has consumed one ordinary recurrence step and moved the low part back into the small set $S$. Now the high part is $18$, so its digit sum is $9$. If two more elementary updates are taken before the next carry, the low part evolves as

$53\to 70\to 86,$

because $53+\sigma_6(53)+9=53+8+9=70$ and $70+\sigma_6(70)+9=70+7+9=86$. This is exactly the kind of short leftover tail that the algorithm finishes directly after the carry-jump phase.

How the Code Works

The C++, Python, and Java implementations begin by precomputing the digit sum of every six-digit value from $0$ to $999999$. This makes the low-part update $x\mapsto x+\sigma_6(x)+c$ constant-time for every possible state.

Next, for each possible digit sum $c$ of the high part, the implementation scans the full low-part range backward and determines the exact number of ordinary updates needed to hit the next carry, together with the post-carry remainder. Only the carry-to-carry states $0$ through $162$ need to be retained for later composition, because every carry lands inside that range.

After that, the implementation builds transition blocks for aligned decimal intervals of lengths $1,10,100,\dots,10^{12}$. Each block stores, for every small remainder, both the new remainder and the number of ordinary updates consumed across that whole interval of consecutive high-part values.

During the actual solve phase, the current value starts at $1$. The implementation repeatedly asks how many carry events can be afforded under the remaining step budget, answers that question with doubling and binary search, applies the resulting interval transition, and finally performs the tiny leftover suffix directly. The method is exact throughout; nothing is approximated.

Complexity Analysis

Let $B=10^6$, let $C=108$ be the maximum possible digit sum of the high part used by the implementations, let $R=162$ be the post-carry remainder bound, and let $D=12$ be the maximum decimal block depth. Precomputing all six-digit digit sums costs $O(B)$ time. Building the next-carry information for every $c\in[0,C]$ costs $O(BC)$ time and dominates the preprocessing.

The aligned block tables operate only on the tiny state space of size $R+1=163$, so their cost is much smaller than the carry-table construction. For the single final query, doubling and binary search use a logarithmic number of interval evaluations, and each interval evaluation touches only a small number of decimal blocks. Memory usage is dominated by the carry tables and is $O(BC)$.

Footnotes and References

Problem page: https://projecteuler.net/problem=551
Digit sum: Wikipedia — Digit sum
Decimal numeral system: Wikipedia — Decimal
Function composition and iteration: Wikipedia — Iterated function
Binary search: Wikipedia — Binary search algorithm