Problem 657: Incomplete Words
View on Project EulerProject Euler Problem 657 Solution
EulerSolve provides an optimized solution for Project Euler Problem 657, Incomplete Words, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For an alphabet with \(\alpha\) symbols, we must count all incomplete words whose lengths range from \(0\) to \(n\). A word is called incomplete when at least one symbol of the alphabet does not appear in it. The final answer is required modulo \(10^9+7\). If we tried to inspect every word directly, we would face \(\alpha^t\) possibilities at length \(t\), which is hopeless for the real parameters. The solution instead counts words by inclusion-exclusion, then sums those counts across all lengths in one algebraic expression. Mathematical Approach Let \(I_t(\alpha)\) be the number of incomplete words of exact length \(t\), and let $T(\alpha,n)=\sum_{t=0}^{n} I_t(\alpha).$ The goal is to evaluate \(T(\alpha,n)\) efficiently. Step 1: Count all words of one fixed length For a fixed length \(t\), there are exactly $\alpha^t$ words over the full alphabet, because each of the \(t\) positions can use any of the \(\alpha\) symbols. Among these words, some are complete and use every symbol at least once. If \(C_t(\alpha)\) denotes the number of complete words of length \(t\), then the incomplete words are simply the complement: $I_t(\alpha)=\alpha^t-C_t(\alpha).$ Step 2: Use inclusion-exclusion for complete words For each symbol, consider the event that this symbol is missing....
Detailed mathematical approach
Problem Summary
For an alphabet with \(\alpha\) symbols, we must count all incomplete words whose lengths range from \(0\) to \(n\). A word is called incomplete when at least one symbol of the alphabet does not appear in it. The final answer is required modulo \(10^9+7\).
If we tried to inspect every word directly, we would face \(\alpha^t\) possibilities at length \(t\), which is hopeless for the real parameters. The solution instead counts words by inclusion-exclusion, then sums those counts across all lengths in one algebraic expression.
Mathematical Approach
Let \(I_t(\alpha)\) be the number of incomplete words of exact length \(t\), and let
$T(\alpha,n)=\sum_{t=0}^{n} I_t(\alpha).$
The goal is to evaluate \(T(\alpha,n)\) efficiently.
Step 1: Count all words of one fixed length
For a fixed length \(t\), there are exactly
$\alpha^t$
words over the full alphabet, because each of the \(t\) positions can use any of the \(\alpha\) symbols.
Among these words, some are complete and use every symbol at least once. If \(C_t(\alpha)\) denotes the number of complete words of length \(t\), then the incomplete words are simply the complement:
$I_t(\alpha)=\alpha^t-C_t(\alpha).$
Step 2: Use inclusion-exclusion for complete words
For each symbol, consider the event that this symbol is missing. If we choose a set of exactly \(j\) symbols that are forbidden, then the word may use only the remaining \(\alpha-j\) symbols, so there are
$ (\alpha-j)^t $
such words.
By inclusion-exclusion, the number of complete words of length \(t\) is
$C_t(\alpha)=\sum_{j=0}^{\alpha}(-1)^j\binom{\alpha}{j}(\alpha-j)^t.$
Therefore
$I_t(\alpha)=\alpha^t-C_t(\alpha)=\sum_{j=1}^{\alpha}(-1)^{j+1}\binom{\alpha}{j}(\alpha-j)^t.$
Now reindex with \(k=\alpha-j\). Then \(k\) runs from \(0\) to \(\alpha-1\), and we obtain the form used by the implementations:
$I_t(\alpha)=\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}k^t.$
This formula already explains why the summation stops at \(\alpha-1\): the missing \(k=\alpha\) term corresponds to the full alphabet and is exactly what disappears when we pass from complete words to incomplete words.
Step 3: Sum over all lengths \(0\) through \(n\)
Insert the fixed-length formula into the outer sum:
$T(\alpha,n)=\sum_{t=0}^{n}\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}k^t.$
Because the summation is finite, we may swap the order:
$T(\alpha,n)=\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}\sum_{t=0}^{n}k^t.$
Define the geometric block
$G_k(n)=\sum_{t=0}^{n}k^t.$
Then
$T(\alpha,n)=\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}G_k(n).$
The three cases for \(G_k(n)\) are
$G_k(n)= \begin{cases} 1, & k=0,\\ n+1, & k=1,\\ \dfrac{k^{n+1}-1}{k-1}, & k\ge 2. \end{cases}$
The case \(k=0\) is special because only the empty word contributes: \(0^0\) is interpreted combinatorially as \(1\), while \(0^t=0\) for \(t>0\). The case \(k=1\) is also special because there is exactly one word at every length.
Step 4: Convert the formula into modular arithmetic
The answer is required modulo the prime
$M=10^9+7.$
For the actual parameters, \(\alpha<M\), so every integer \(1,2,\dots,\alpha\) has a modular inverse modulo \(M\). This makes two parts efficient.
First, the geometric term for \(k\ge 2\) becomes
$G_k(n)\equiv \left(k^{n+1}-1\right)(k-1)^{-1}\pmod M,$
where \(k^{n+1}\) is obtained by fast exponentiation.
Second, the binomial coefficients are generated incrementally instead of by factorial tables:
$\binom{\alpha}{0}=1,\qquad \binom{\alpha}{k+1}=\binom{\alpha}{k}\cdot\frac{\alpha-k}{k+1}.$
Modulo \(M\), division by \(k+1\) is replaced by multiplication with the modular inverse of \(k+1\). This is exactly why the code precomputes inverses from \(1\) to \(\alpha\).
Step 5: Worked example with \(\alpha=3\) and \(n=4\)
For \(\alpha=3\), the transformed formula is
$T(3,4)=\sum_{k=0}^{2}(-1)^{4-k}\binom{3}{k}G_k(4).$
Now evaluate each geometric block:
$G_0(4)=1,\qquad G_1(4)=5,\qquad G_2(4)=1+2+4+8+16=31.$
So
$T(3,4)=\binom{3}{0}\cdot 1-\binom{3}{1}\cdot 5+\binom{3}{2}\cdot 31=1-15+93=79.$
This also matches the direct length-by-length count:
$I_0=1,\quad I_1=3,\quad I_2=9,\quad I_3=21,\quad I_4=45,\quad 1+3+9+21+45=79.$
The example is useful because it checks both the combinatorial interpretation and the algebraic transformation used by the code.
How the Code Works
The C++, Python, and Java implementations all evaluate the same summation formula modulo \(10^9+7\). They first build modular inverses for the integers \(1\) through \(\alpha\), then generate the binomial coefficients \(\binom{\alpha}{k}\) one after another with the multiplicative recurrence above. This avoids storing factorial tables for such a large alphabet size.
Next, the implementation loops over \(k=0,1,\dots,\alpha-1\). For each \(k\), it evaluates the geometric block \(G_k(n)\). The cases \(k=0\) and \(k=1\) are handled directly, while \(k\ge 2\) uses fast modular exponentiation together with the inverse of \(k-1\).
Each term is multiplied by the corresponding binomial coefficient and inserted with alternating sign. The C++ implementation additionally splits the \(k\)-range across several threads for large inputs, but the mathematical expression is identical in all three languages.
Complexity Analysis
Precomputing modular inverses takes \(O(\alpha)\) time and \(O(\alpha)\) memory. Generating all binomial coefficients also takes \(O(\alpha)\) time and \(O(\alpha)\) memory. The final summation has \(\alpha\) terms, and for every \(k\ge 2\) it performs one fast exponentiation costing \(O(\log n)\). Therefore the overall running time is
$O(\alpha\log n),$
with total memory usage
$O(\alpha).$
Parallel execution in the C++ version improves wall-clock time on large machines but does not change the asymptotic bound.
Footnotes and References
- Problem statement: Project Euler 657
- Inclusion-exclusion principle: Wikipedia — Inclusion-exclusion principle
- Geometric series: Wikipedia — Geometric series
- Binomial coefficient: Wikipedia — Binomial coefficient
- Modular multiplicative inverse: Wikipedia — Modular multiplicative inverse