Problem 658: Incomplete Words II
View on Project EulerProject Euler Problem 658 Solution
EulerSolve provides an optimized solution for Project Euler Problem 658, Incomplete Words II, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(F(k,n)\) denote the total requested in Problem 658. The target parameters are enormous, so any method that tries to enumerate words, states, or lengths directly is out of reach. The key observation encoded by the C++, Python, and Java implementations is that the whole quantity can be rewritten as a finite algebraic basis indexed only by \(r=0,1,\dots,k-1\). After that transformation, the problem is no longer a direct combinatorial search over incomplete words. It becomes the evaluation of a weighted sum of geometric series modulo $M=10^9+7.$ Mathematical Approach The implementations absorb the inclusion-exclusion coming from missing letters into coefficients \(A_r\). Once these coefficients are known, the dependence on the huge parameter \(n\) is isolated inside standard geometric sums. Step 1: Rewrite the target in a geometric basis The required total can be written as $F(k,n)=\sum_{r=0}^{k-1} A_r\,G_r(n)\pmod{M},$ where $G_r(n)=\sum_{t=0}^{n} r^t.$ The index \(r\) behaves like an effective alphabet size that survives one stage of inclusion-exclusion. The difficult combinatorics is therefore compressed into the scalar weights \(A_r\), while the length parameter appears only inside a geometric progression....
Detailed mathematical approach
Problem Summary
Let \(F(k,n)\) denote the total requested in Problem 658. The target parameters are enormous, so any method that tries to enumerate words, states, or lengths directly is out of reach. The key observation encoded by the C++, Python, and Java implementations is that the whole quantity can be rewritten as a finite algebraic basis indexed only by \(r=0,1,\dots,k-1\).
After that transformation, the problem is no longer a direct combinatorial search over incomplete words. It becomes the evaluation of a weighted sum of geometric series modulo
$M=10^9+7.$
Mathematical Approach
The implementations absorb the inclusion-exclusion coming from missing letters into coefficients \(A_r\). Once these coefficients are known, the dependence on the huge parameter \(n\) is isolated inside standard geometric sums.
Step 1: Rewrite the target in a geometric basis
The required total can be written as
$F(k,n)=\sum_{r=0}^{k-1} A_r\,G_r(n)\pmod{M},$
where
$G_r(n)=\sum_{t=0}^{n} r^t.$
The index \(r\) behaves like an effective alphabet size that survives one stage of inclusion-exclusion. The difficult combinatorics is therefore compressed into the scalar weights \(A_r\), while the length parameter appears only inside a geometric progression.
Step 2: Evaluate the geometric sums in closed form
For modular arithmetic, \(G_r(n)\) is handled by cases:
$G_r(n)=\begin{cases} 1,& r=0,\\ n+1,& r=1,\\ \dfrac{r^{n+1}-1}{r-1},& r\ge 2. \end{cases}$
The special cases \(r=0\) and \(r=1\) are treated separately so the formula remains exact and no division by zero appears. For \(r\ge 2\), division by \(r-1\) means multiplication by the modular inverse of \(r-1\) modulo \(M\).
Step 3: Build the coefficients by a descending recurrence
The transformed coefficients satisfy
$A_{k-1}=k,$
$A_r=2A_{r+1}+(-1)^{k-r+1}\binom{k+1}{k-r}-1\pmod{M}\qquad (0\le r\le k-2).$
This is the algebraic core of the solution. Each step doubles the next coefficient, adds one signed binomial correction from inclusion-exclusion, and subtracts \(1\). Because the sweep runs from \(r=k-1\) down to \(0\), every new coefficient depends only on the next already-known coefficient and one current binomial term.
Step 4: Update the binomial term in \(O(1)\)
If one step uses \(\binom{k+1}{k-r}\), then the next step needs \(\binom{k+1}{k-r+1}\). These values satisfy
$\binom{k+1}{k-r+1}=\binom{k+1}{k-r}\cdot\frac{r+1}{k-r+1}.$
Since \(M\) is prime, the division by \(k-r+1\) is implemented as multiplication by its modular inverse. This avoids factorial tables and keeps the whole coefficient sweep linear in \(k\).
Step 5: Assemble the final answer
Once the coefficients are ready, the answer is simply
$F(k,n)=\sum_{r=0}^{k-1} A_r\,G_r(n)\pmod{M}.$
For \(r\ge 2\), the only expensive subroutine is fast exponentiation of \(r^{n+1}\). That is why the dependence on the huge input \(n\) stays logarithmic.
Worked Example: \((k,n)=(4,4)\)
This is the smallest checkpoint built into the implementations. The recurrence gives
$\begin{aligned} A_3&=4,\\ A_2&=2A_3-\binom{5}{2}-1=8-10-1=-3,\\ A_1&=2A_2+\binom{5}{3}-1=-6+10-1=3,\\ A_0&=2A_1-\binom{5}{4}-1=6-5-1=0. \end{aligned}$
So the signed coefficients are \((A_0,A_1,A_2,A_3)=(0,3,-3,4)\). In modular storage the negative value \(-3\) is represented by \(M-3\), but the arithmetic is identical.
The corresponding geometric sums are
$G_0(4)=1,\qquad G_1(4)=5,\qquad G_2(4)=1+2+4+8+16=31,\qquad G_3(4)=1+3+9+27+81=121.$
Therefore
$F(4,4)=0\cdot 1+3\cdot 5-3\cdot 31+4\cdot 121=406,$
which matches the checkpoint used by the implementation.
How the Code Works
The C++, Python, and Java implementations first precompute modular inverses for all integers from \(1\) to \(k+1\). That single table supports both the binomial updates and the division by \(r-1\) in the geometric-series formula.
Next, the implementation performs one descending pass over \(r\) to fill the coefficient table with the recurrence above. During the same pass it updates the current binomial coefficient multiplicatively, so no factorial arrays are required.
Finally, it evaluates each basis term. The cases \(r=0\) and \(r=1\) are handled directly, while \(r\ge 2\) uses binary exponentiation to compute \(r^{n+1}\bmod M\). The C++ implementation additionally splits the final summation across several threads when \(k\) is large enough; the Python and Java implementations perform the same arithmetic in a single pass.
Complexity Analysis
Precomputing inverses costs \(O(k)\) time and \(O(k)\) memory. Building all coefficients also costs \(O(k)\). The final summation evaluates \(k-2\) nontrivial powers, each in \(O(\log n)\) time, so the dominant term is \(O(k\log n)\). The overall memory usage is \(O(k)\). Parallel summation in C++ improves wall-clock time but does not change the asymptotic bound.
Footnotes and References
- Problem page: https://projecteuler.net/problem=658
- Geometric series: Wikipedia — Geometric series
- Binomial coefficient: Wikipedia — Binomial coefficient
- Modular multiplicative inverse: Wikipedia — Modular multiplicative inverse
- Inclusion-exclusion principle: Wikipedia — Inclusion-exclusion principle