Problem 637: Flexible Digit Sum
View on Project EulerProject Euler Problem 637 Solution
EulerSolve provides an optimized solution for Project Euler Problem 637, Flexible Digit Sum, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For a base \(b\), write \(k\) in base \(b\) and cut its digit string into contiguous blocks. Interpret each block as a base-\(b\) integer and replace \(k\) by the sum of those block values. Let \(f_b(k)\) be the minimum number of such moves needed to make the value \(< b\). The problem asks for $g(n,b_1,b_2)=\sum_{k=1}^{n}[f_{b_1}(k)=f_{b_2}(k)]\,k,$ and the required output is \(g(10^7,10,3)\). The difficulty is that every number admits many different contiguous partitions, so a naive search over all reduction sequences would explode. Mathematical Approach The solution computes \(f_b(k)\) for every \(k\le n\) and for each required base separately. The key observation is that every useful move produces a strictly smaller number, which turns the problem into a dynamic program over increasing \(k\). Step 1: Define the transition precisely Let the base-\(b\) expansion of \(k\) be a digit string. A partition of that string into contiguous blocks \(P_1,\dots,P_r\) produces the next value $T_b(k;P_1,\dots,P_r)=\sum_{j=1}^{r} V_b(P_j),$ where \(V_b(P_j)\) is the ordinary integer value of block \(P_j\) in base \(b\)....
Detailed mathematical approach
Problem Summary
For a base \(b\), write \(k\) in base \(b\) and cut its digit string into contiguous blocks. Interpret each block as a base-\(b\) integer and replace \(k\) by the sum of those block values. Let \(f_b(k)\) be the minimum number of such moves needed to make the value \(< b\).
The problem asks for
$g(n,b_1,b_2)=\sum_{k=1}^{n}[f_{b_1}(k)=f_{b_2}(k)]\,k,$
and the required output is \(g(10^7,10,3)\). The difficulty is that every number admits many different contiguous partitions, so a naive search over all reduction sequences would explode.
Mathematical Approach
The solution computes \(f_b(k)\) for every \(k\le n\) and for each required base separately. The key observation is that every useful move produces a strictly smaller number, which turns the problem into a dynamic program over increasing \(k\).
Step 1: Define the transition precisely
Let the base-\(b\) expansion of \(k\) be a digit string. A partition of that string into contiguous blocks \(P_1,\dots,P_r\) produces the next value
$T_b(k;P_1,\dots,P_r)=\sum_{j=1}^{r} V_b(P_j),$
where \(V_b(P_j)\) is the ordinary integer value of block \(P_j\) in base \(b\). Then
$f_b(k)=0\qquad (0\le k < b),$
and for \(k\ge b\),
$f_b(k)=1+\min_{P_1|\cdots|P_r} f_b\!\left(T_b(k;P_1,\dots,P_r)\right).$
The one-block partition leaves the number unchanged, so it never improves the optimum; all useful partitions create a strict reduction.
Step 2: Bound every transition between digit sum and the original number
Let \(s_b(k)\) be the sum of the base-\(b\) digits of \(k\). Every block value is at least the sum of the digits inside that block, so any partition satisfies
$s_b(k)\le T_b(k;P_1,\dots,P_r)\le k.$
The left equality occurs when every digit is separated into its own block. The right equality occurs only when the whole digit string is kept as one block. Therefore every useful transition satisfies
$s_b(k)\le T_b(k;P_1,\dots,P_r)<k.$
This immediately gives a valid first candidate:
$f_b(k)\le 1+f_b(s_b(k)).$
Because all useful targets are smaller than \(k\), the values \(f_b(0),f_b(1),\dots,f_b(k-1)\) can be finalized before computing \(f_b(k)\).
Step 3: Enumerate all contiguous partitions by scanning digits once
After the dynamic-programming order is fixed, the remaining task for a given \(k\) is to enumerate its possible block sums. The implementation scans the digits from least significant to most significant.
At each new digit there are exactly two structural choices:
$\text{start a new block}\qquad \text{or} \qquad \text{extend the current block}.$
Those two choices generate the full binary partition tree. If the base-\(b\) expansion of \(k\) has \(d\) digits, then the raw search tree has \(2^{d-1}\) leaves, one for each placement of separators between adjacent digits. No partition is missed and none is counted twice.
Step 4: Derive a lower bound for any unfinished search state
During the depth-first search, part of the digit string has already been processed. Suppose a state stores
$A=\text{sum of finished blocks},\qquad C=\text{value of the current open block},\qquad R=\text{sum of the digits not processed yet}.$
Then every completion of that state produces a total \(T\) satisfying
$T\ge A+C+R.$
Why? The unfinished higher digits contribute at least their digit sum \(R\) if they are split as single digits, and any merging can only increase their contribution. Likewise, closing the current block as it stands gives \(C\), while extending that block can only make it larger. Hence
$L=A+C+R$
is an optimistic lower bound on every descendant of the current DFS node.
Step 5: Turn the lower bound into pruning
Let
$\Phi_k(L)=\min_{L\le x<k} f_b(x).$
Because all useful next values are smaller than \(k\), this minimum only ranges over already solved states. If the best answer found so far for the current number is \(B\), then every completion of the current DFS node needs at least
$1+\Phi_k(L)$
moves. Therefore, if
$1+\Phi_k(L)\ge B,$
the whole subtree can be pruned safely. This is the central speedup: the search does not need to finish a partition once the lower bound proves that it cannot beat the current best value.
Worked Example: \(k=13\) in bases \(10\) and \(3\)
In base \(10\), the digit string is \(13\). The two partitions are
$13\qquad\text{and}\qquad 1|3\mapsto 1+3=4.$
The nontrivial move sends \(13\) to \(4<10\), so
$f_{10}(13)=1.$
In base \(3\), we have \(13=(111)_3\). The contiguous partitions give
$111_3=13,\qquad 1|11_3=1+4=5,\qquad 11|1_3=4+1=5,\qquad 1|1|1_3=3.$
The best first move is therefore \(13\to 3\). But \(3=(10)_3\) is still not below \(3\), and one more move \(10_3\to 1\) finishes the reduction. Hence
$f_{3}(13)=2.$
So \(13\) does not contribute to \(g(n,10,3)\), because its optimal step counts in the two bases are different.
How the Code Works
The implementation builds the table of step counts for one base at a time. It first precomputes the base-\(b\) digit sum of every number up to \(n\) using the recurrence
$s_b(k)=s_b\!\left(\left\lfloor\frac{k}{b}\right\rfloor\right)+(k\bmod b).$
All values below \(b\) are initialized with step count \(0\), while larger values start with a large provisional bound. For each \(k\ge b\), the first candidate is the all-single-digit partition, namely \(1+f_b(s_b(k))\).
Next, a depth-first search scans the digits from right to left and enumerates all contiguous partitions by repeatedly choosing between closing the current block and starting a new one, or extending the current block with the next digit. At each node it evaluates the lower bound \(L=A+C+R\), updates the best answer using the already known value at \(L\), and then asks for the minimum solved step count on the suffix \([L,k)\). If even that optimistic suffix minimum cannot beat the current best, the branch is discarded.
A compact suffix-minimum summary supports those queries efficiently: for each attained step count it remembers how far to the right that value has already occurred among solved numbers. This is enough to recover \(\Phi_k(L)\) quickly without scanning the whole interval.
Once the table is complete for a base, the same process is run for the other base. The C++, Python, and Java implementations follow the same mathematical plan; the C++ version evaluates the two bases concurrently when they differ, the Java version performs the two passes directly, and the Python entry point delegates to the same C++ computation. After both tables are available, the implementation sums all \(k\) with matching step counts. One of the implementations also checks the smaller benchmark
$g(100,10,3)=3302$
before producing the final answer.
Complexity Analysis
Let \(d_b(k)\) be the number of base-\(b\) digits of \(k\). Without pruning, a number with \(d\) digits has \(2^{d-1}\) contiguous partitions, so the raw dynamic program would cost
$O\!\left(\sum_{k=b}^{n} 2^{d_b(k)-1}\right).$
Since \(d_b(k)\le \lfloor \log_b n \rfloor+1\), a coarse worst-case bound is
$O\!\left(n^{\,1+\log_b 2}\right)$
for one base, before pruning. The implemented method adds two linear-size arrays for step counts and digit sums, so the memory usage is \(O(n)\) per base computation, plus \(O(\log_b n)\) recursion depth and a small suffix-summary structure.
What makes the program practical is not a better worst-case formula, but the pruning. The initial candidate \(1+f_b(s_b(k))\) is often already close to optimal, and the lower-bound test removes most of the partition tree. For the required input \(n=10^7\), that practical reduction is strong enough to make the full computation feasible.
Footnotes and References
- Problem page: https://projecteuler.net/problem=637
- Positional notation: Wikipedia — Positional notation
- Digit sum: Wikipedia — Digit sum
- Dynamic programming: Wikipedia — Dynamic programming
- Branch and bound: Wikipedia — Branch and bound