Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 657: Incomplete Words

Project Euler Problem 657 Solution

EulerSolve provides an optimized solution for Project Euler Problem 657, Incomplete Words, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary For an alphabet with $\alpha$ symbols, we must count all incomplete words whose lengths range from $0$ to $n$. A word is called incomplete when at least one symbol of the alphabet does not appear in it. The final answer is required modulo $10^9+7$. If we tried to inspect every word directly, we would face $\alpha^t$ possibilities at length $t$, which is hopeless for the real parameters. The solution instead counts words by inclusion-exclusion, then sums those counts across all lengths in one algebraic expression. Mathematical Approach Let $I_t(\alpha)$ be the number of incomplete words of exact length $t$, and let $T(\alpha,n)=\sum_{t=0}^{n} I_t(\alpha).$ The goal is to evaluate $T(\alpha,n)$ efficiently. Step 1: Count all words of one fixed length For a fixed length $t$, there are exactly $\alpha^t$ words over the full alphabet, because each of the $t$ positions can use any of the $\alpha$ symbols. Among these words, some are complete and use every symbol at least once. If $C_t(\alpha)$ denotes the number of complete words of length $t$, then the incomplete words are simply the complement: $I_t(\alpha)=\alpha^t-C_t(\alpha).$ Step 2: Use inclusion-exclusion for complete words For each symbol, consider the event that this symbol is missing....

Detailed mathematical approach

Problem Summary

For an alphabet with $\alpha$ symbols, we must count all incomplete words whose lengths range from $0$ to $n$. A word is called incomplete when at least one symbol of the alphabet does not appear in it. The final answer is required modulo $10^9+7$.

If we tried to inspect every word directly, we would face $\alpha^t$ possibilities at length $t$, which is hopeless for the real parameters. The solution instead counts words by inclusion-exclusion, then sums those counts across all lengths in one algebraic expression.

Mathematical Approach

Let $I_t(\alpha)$ be the number of incomplete words of exact length $t$, and let

$T(\alpha,n)=\sum_{t=0}^{n} I_t(\alpha).$

The goal is to evaluate $T(\alpha,n)$ efficiently.

Step 1: Count all words of one fixed length

For a fixed length $t$, there are exactly

$\alpha^t$

words over the full alphabet, because each of the $t$ positions can use any of the $\alpha$ symbols.

Among these words, some are complete and use every symbol at least once. If $C_t(\alpha)$ denotes the number of complete words of length $t$, then the incomplete words are simply the complement:

$I_t(\alpha)=\alpha^t-C_t(\alpha).$

Step 2: Use inclusion-exclusion for complete words

For each symbol, consider the event that this symbol is missing. If we choose a set of exactly $j$ symbols that are forbidden, then the word may use only the remaining $\alpha-j$ symbols, so there are

$ (\alpha-j)^t $

such words.

By inclusion-exclusion, the number of complete words of length $t$ is

$C_t(\alpha)=\sum_{j=0}^{\alpha}(-1)^j\binom{\alpha}{j}(\alpha-j)^t.$

Therefore

$I_t(\alpha)=\alpha^t-C_t(\alpha)=\sum_{j=1}^{\alpha}(-1)^{j+1}\binom{\alpha}{j}(\alpha-j)^t.$

Now reindex with $k=\alpha-j$. Then $k$ runs from $0$ to $\alpha-1$, and we obtain the form used by the implementations:

$I_t(\alpha)=\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}k^t.$

This formula already explains why the summation stops at $\alpha-1$: the missing $k=\alpha$ term corresponds to the full alphabet and is exactly what disappears when we pass from complete words to incomplete words.

Step 3: Sum over all lengths $0$ through $n$

Insert the fixed-length formula into the outer sum:

$T(\alpha,n)=\sum_{t=0}^{n}\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}k^t.$

Because the summation is finite, we may swap the order:

$T(\alpha,n)=\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}\sum_{t=0}^{n}k^t.$

Define the geometric block

$G_k(n)=\sum_{t=0}^{n}k^t.$

Then

$T(\alpha,n)=\sum_{k=0}^{\alpha-1}(-1)^{\alpha-k+1}\binom{\alpha}{k}G_k(n).$

The three cases for $G_k(n)$ are

$G_k(n)= \begin{cases} 1, & k=0,\\ n+1, & k=1,\\ \dfrac{k^{n+1}-1}{k-1}, & k\ge 2. \end{cases}$

The case $k=0$ is special because only the empty word contributes: $0^0$ is interpreted combinatorially as $1$, while $0^t=0$ for $t>0$. The case $k=1$ is also special because there is exactly one word at every length.

Step 4: Convert the formula into modular arithmetic

The answer is required modulo the prime

$M=10^9+7.$

For the actual parameters, $\alpha<M$, so every integer $1,2,\dots,\alpha$ has a modular inverse modulo $M$. This makes two parts efficient.

First, the geometric term for $k\ge 2$ becomes

$G_k(n)\equiv \left(k^{n+1}-1\right)(k-1)^{-1}\pmod M,$

where $k^{n+1}$ is obtained by fast exponentiation.

Second, the binomial coefficients are generated incrementally instead of by factorial tables:

$\binom{\alpha}{0}=1,\qquad \binom{\alpha}{k+1}=\binom{\alpha}{k}\cdot\frac{\alpha-k}{k+1}.$

Modulo $M$, division by $k+1$ is replaced by multiplication with the modular inverse of $k+1$. This is exactly why the code precomputes inverses from $1$ to $\alpha$.

Step 5: Worked example with $\alpha=3$ and $n=4$

For $\alpha=3$, the transformed formula is

$T(3,4)=\sum_{k=0}^{2}(-1)^{4-k}\binom{3}{k}G_k(4).$

Now evaluate each geometric block:

$G_0(4)=1,\qquad G_1(4)=5,\qquad G_2(4)=1+2+4+8+16=31.$

So

$T(3,4)=\binom{3}{0}\cdot 1-\binom{3}{1}\cdot 5+\binom{3}{2}\cdot 31=1-15+93=79.$

This also matches the direct length-by-length count:

$I_0=1,\quad I_1=3,\quad I_2=9,\quad I_3=21,\quad I_4=45,\quad 1+3+9+21+45=79.$

The example is useful because it checks both the combinatorial interpretation and the algebraic transformation used by the code.

How the Code Works

The C++, Python, and Java implementations all evaluate the same summation formula modulo $10^9+7$. They first build modular inverses for the integers $1$ through $\alpha$, then generate the binomial coefficients $\binom{\alpha}{k}$ one after another with the multiplicative recurrence above. This avoids storing factorial tables for such a large alphabet size.

Next, the implementation loops over $k=0,1,\dots,\alpha-1$. For each $k$, it evaluates the geometric block $G_k(n)$. The cases $k=0$ and $k=1$ are handled directly, while $k\ge 2$ uses fast modular exponentiation together with the inverse of $k-1$.

Each term is multiplied by the corresponding binomial coefficient and inserted with alternating sign. The C++ implementation additionally splits the $k$-range across several threads for large inputs, but the mathematical expression is identical in all three languages.

Complexity Analysis

Precomputing modular inverses takes $O(\alpha)$ time and $O(\alpha)$ memory. Generating all binomial coefficients also takes $O(\alpha)$ time and $O(\alpha)$ memory. The final summation has $\alpha$ terms, and for every $k\ge 2$ it performs one fast exponentiation costing $O(\log n)$. Therefore the overall running time is

$O(\alpha\log n),$

with total memory usage

$O(\alpha).$

Parallel execution in the C++ version improves wall-clock time on large machines but does not change the asymptotic bound.

Footnotes and References

Problem statement: Project Euler 657
Inclusion-exclusion principle: Wikipedia — Inclusion-exclusion principle
Geometric series: Wikipedia — Geometric series
Binomial coefficient: Wikipedia — Binomial coefficient
Modular multiplicative inverse: Wikipedia — Modular multiplicative inverse