Problem 156: Counting Digits
View on Project EulerProject Euler Problem 156 Solution
EulerSolve provides an optimized solution for Project Euler Problem 156, Counting Digits, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For a fixed digit \(d\in\{1,\dots,9\}\), let \(f_d(n)\) denote the total number of times \(d\) appears in the decimal expansions of all integers from \(0\) through \(n\). The task is to find every integer \(x\) satisfying $f_d(x)=x,$ then sum all such \(x\) over \(d=1,\dots,9\). The point \(x=0\) is trivially a fixed point for every nonzero digit, because the decimal expansion of \(0\) contains none of \(1,\dots,9\), but it contributes nothing to the final total. The real difficulty is to locate all positive fixed points without scanning an enormous search range one value at a time. Mathematical Approach The solution rests on two ideas. First, \(f_d(n)\) can be evaluated exactly in \(O(\log_{10} n)\) time by summing independent contributions from each decimal position. Second, once \(f_d\) is known to be monotone, large intervals can be discarded using only endpoint values, so the search can proceed recursively on decimal blocks rather than on individual integers. The cumulative counting function Let \(\nu_d(m)\) be the number of occurrences of digit \(d\) in the ordinary decimal expansion of the single integer \(m\)....
Detailed mathematical approach
Problem Summary
For a fixed digit \(d\in\{1,\dots,9\}\), let \(f_d(n)\) denote the total number of times \(d\) appears in the decimal expansions of all integers from \(0\) through \(n\). The task is to find every integer \(x\) satisfying
$f_d(x)=x,$
then sum all such \(x\) over \(d=1,\dots,9\).
The point \(x=0\) is trivially a fixed point for every nonzero digit, because the decimal expansion of \(0\) contains none of \(1,\dots,9\), but it contributes nothing to the final total. The real difficulty is to locate all positive fixed points without scanning an enormous search range one value at a time.
Mathematical Approach
The solution rests on two ideas. First, \(f_d(n)\) can be evaluated exactly in \(O(\log_{10} n)\) time by summing independent contributions from each decimal position. Second, once \(f_d\) is known to be monotone, large intervals can be discarded using only endpoint values, so the search can proceed recursively on decimal blocks rather than on individual integers.
The cumulative counting function
Let \(\nu_d(m)\) be the number of occurrences of digit \(d\) in the ordinary decimal expansion of the single integer \(m\). Then
$f_d(n)=\sum_{m=0}^{n}\nu_d(m).$
This immediately shows that \(f_d\) is nondecreasing, and in fact
$f_d(n+1)-f_d(n)=\nu_d(n+1).$
If we also write
$g_d(n)=f_d(n)-n,$
then the fixed-point equation is simply \(g_d(n)=0\), but
$g_d(n+1)-g_d(n)=\nu_d(n+1)-1.$
That difference can be negative, zero, or positive. So \(g_d\) is not monotone, which is why a direct binary search on \(g_d\) is not valid. The code therefore searches by interval elimination instead of by sign changes.
Counting one decimal position exactly
Fix one decimal position \(p=10^k\). We want the number of integers in \(0,\dots,n\) whose digit in that position equals \(d\). Set
$N=n+1,\qquad q=\left\lfloor\frac{N}{10p}\right\rfloor,\qquad r=N\bmod(10p).$
The first \(N\) integers split into \(q\) complete blocks of length \(10p\), plus one incomplete tail of length \(r\). In every full block, the chosen position runs through the digits \(0,\dots,9\), each repeated for exactly \(p\) consecutive numbers, so digit \(d\) contributes exactly \(p\) times per block. The full blocks therefore contribute \(qp\).
Only the tail needs casework. Inside the final block, the numbers whose \(p\)-digit equals \(d\) form a contiguous segment of length \(p\), starting after the first \(dp\) entries of that block. Hence the tail contribution is
$\min\!\bigl(\max(r-dp,0),\,p\bigr).$
Summing over all active decimal positions yields the exact formula used by the implementations:
$f_d(n)=\sum_{p=1,10,100,\dots}\left(\left\lfloor\frac{n+1}{10p}\right\rfloor p+\min\!\bigl(\max((n+1)\bmod(10p)-dp,0),\,p\bigr)\right).$
No leading-zero correction is needed here, because the problem only asks for digits \(1\) through \(9\). That detail matters: counting zeros would require a different treatment.
Worked example: computing \(f_1(12)\)
The identity \(f_1(12)=5\) is small enough to verify by hand and appears as a natural checkpoint for the method. From \(0\) to \(12\), the digit \(1\) appears in
$1,\ 10,\ 11,\ 12,$
with multiplicities \(1,1,2,1\), so the total is 5.
The positional formula reproduces the same count. For the units position \(p=1\),
$\left\lfloor\frac{13}{10}\right\rfloor\cdot 1+\min(\max(13\bmod 10-1,0),1)=1+1=2.$
For the tens position \(p=10\),
$\left\lfloor\frac{13}{100}\right\rfloor\cdot 10+\min(\max(13\bmod 100-10,0),10)=0+3=3.$
There are no higher active positions, so \(f_1(12)=2+3=5\).
The interval-pruning invariant
Suppose a fixed point \(x\) lies in an interval \([L,R]\). Since \(f_d\) is nondecreasing,
$f_d(L)\le f_d(x)=x\le f_d(R).$
Because \(x\in[L,R]\), we also have
$L\le x\le R.$
Combining the two chains gives the necessary conditions
$f_d(L)\le R,\qquad f_d(R)\ge L.$
Therefore, if either
$f_d(R)\lt L\qquad\text{or}\qquad f_d(L)\gt R,$
then \([L,R]\) cannot contain any solution and may be discarded immediately.
A concrete example is the interval \([30,39]\) for \(d=1\). One finds \(f_1(39)=14\), and \(14\lt 30\), so the entire interval is impossible. The algorithm never needs to test 30, 31, 32, and so on individually.
Why the search tree follows powers of ten
The search begins with a span \(10^m\) large enough to cover the chosen limit. Any surviving interval of span \(10^m\) is split into ten children of span \(10^{m-1}\). This is not an arbitrary design choice: the counting formula itself is organized around decimal blocks, so each split reveals one more decimal digit of any possible fixed point.
When the span reaches 1, the leaf represents a single integer candidate, and the implementation performs the exact equality test \(f_d(x)=x\). In this way the recursion visits only those decimal boxes that remain compatible with the endpoint bounds.
How the Code Works
The C++, Python, and Java implementations all follow the same high-level structure: a fast evaluator for \(f_d(n)\), followed by a recursive search over decimal intervals for each digit \(d=1,\dots,9\).
Fast endpoint evaluation
For any queried endpoint \(n\), the implementation rewrites the inclusive range \(0,\dots,n\) as a half-open interval of length \(n+1\). It then loops through the decimal positions \(1,10,100,\dots\), adds the contribution from complete blocks, adds the tail contribution from the final partial block, and returns the total. Because the loop advances by powers of ten, one evaluation uses only as many iterations as there are decimal digits.
Recursive filtering of decimal intervals
For each digit, the implementation chooses the smallest power of ten covering the search limit and treats it as the root span. At each recursive node it evaluates \(f_d\) at the left and right endpoints, applies the pruning test above, and returns immediately when the interval is impossible. Otherwise it descends into the ten decimal children.
At unit span, the left endpoint is the unique integer candidate represented by that leaf, so the implementation checks the equality exactly and adds the value to the total when it is a fixed point. The C++ implementation also contains small checkpoint verifications, including the hand-checkable fact \(f_1(12)=5\) and a brute-force comparison on a short range; the Python and Java implementations use the same search logic without that extra validation layer.
Complexity Analysis
Let \(U\) be the chosen search limit. Evaluating \(f_d(n)\) requires one pass over the decimal positions, so it takes \(O(\log U)\) time and \(O(1)\) auxiliary memory.
If \(V_d\) interval nodes survive pruning for digit \(d\), then the total time for that digit is \(O(V_d\log U)\), because each visited node performs two endpoint evaluations. The recursion depth is \(O(\log U)\), so the stack usage is also \(O(\log U)\).
In the pessimistic case, pruning could fail often enough that the search approaches a full scan, giving \(V_d=O(U)\) and hence \(O(U\log U)\) time. In practice the endpoint test eliminates most decimal intervals very early, which is why the method is effective without building any large tables.
Footnotes and References
- Problem page: Project Euler 156
- Decimal numeral system: Wikipedia - Decimal
- Positional notation: Wikipedia - Positional notation
- Floor and ceiling functions: Wikipedia - Floor and ceiling functions
- Monotonic function: Wikipedia - Monotonic function
- Fixed point: Wikipedia - Fixed point