Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 427: $n$-sequences

Project Euler Problem 427 Solution

EulerSolve provides an optimized solution for Project Euler Problem 427, $n$-sequences, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary An $n$-sequence is a word of length $n$ over the alphabet $\{1,2,\dots,n\}$, so the total number of such sequences is $n^n$. For a sequence $S$, let $L(S)$ denote the length of its longest contiguous run of equal values. The target quantity is $f(n)=\sum_S L(S)\pmod{10^9+9}.$ Direct enumeration is hopeless for the actual input $n=7{,}500{,}000$, so the implementations reorganize the problem into a counting question: for each $k$, how many sequences satisfy $L(S)\le k$? Mathematical Approach Step 1: Tail-Sum Identity Define $N_k(n)=\#\{S:\,L(S)\le k\}.$ For a fixed sequence with $L(S)=\ell$, the inequality $\ell\le k$ is true for exactly $n-\ell$ integers $k\in\{1,2,\dots,n-1\}$. Therefore $\ell=n-\sum_{k=1}^{n-1}\mathbf{1}_{\ell\le k}.$ Summing this identity over all $n^n$ sequences gives $f(n)=n\cdot n^n-\sum_{k=1}^{n-1}N_k(n)=n^{n+1}-\sum_{k=1}^{n-1}N_k(n).$ So the whole task reduces to computing $N_k(n)$ efficiently for every $1\le k\le n-1$. Step 2: Encode a Sequence by Its Runs Fix the first symbol. Any valid sequence with longest run at most $k$ can be written as $r$ consecutive runs with lengths $\ell_1+\ell_2+\cdots+\ell_r=n,\qquad 1\le \ell_i\le k.$ Once the first run value is fixed, each later run may use any symbol different from the previous one, so it contributes a factor $n-1$....

Detailed mathematical approach

Problem Summary

An $n$-sequence is a word of length $n$ over the alphabet $\{1,2,\dots,n\}$, so the total number of such sequences is $n^n$. For a sequence $S$, let $L(S)$ denote the length of its longest contiguous run of equal values. The target quantity is

$f(n)=\sum_S L(S)\pmod{10^9+9}.$

Direct enumeration is hopeless for the actual input $n=7{,}500{,}000$, so the implementations reorganize the problem into a counting question: for each $k$, how many sequences satisfy $L(S)\le k$?

Mathematical Approach

Step 1: Tail-Sum Identity

Define

$N_k(n)=\#\{S:\,L(S)\le k\}.$

For a fixed sequence with $L(S)=\ell$, the inequality $\ell\le k$ is true for exactly $n-\ell$ integers $k\in\{1,2,\dots,n-1\}$. Therefore

$\ell=n-\sum_{k=1}^{n-1}\mathbf{1}_{\ell\le k}.$

Summing this identity over all $n^n$ sequences gives

$f(n)=n\cdot n^n-\sum_{k=1}^{n-1}N_k(n)=n^{n+1}-\sum_{k=1}^{n-1}N_k(n).$

So the whole task reduces to computing $N_k(n)$ efficiently for every $1\le k\le n-1$.

Step 2: Encode a Sequence by Its Runs

Fix the first symbol. Any valid sequence with longest run at most $k$ can be written as $r$ consecutive runs with lengths

$\ell_1+\ell_2+\cdots+\ell_r=n,\qquad 1\le \ell_i\le k.$

Once the first run value is fixed, each later run may use any symbol different from the previous one, so it contributes a factor $n-1$. Hence a composition into $r$ parts contributes $(n-1)^{r-1}$ sequences for each fixed first symbol.

Let $G_k(M)$ be the number of valid continuations of length $M$ after fixing the first position. Then $N_k(n)=n\,G_k(n-1)$, and

$G_k(n-1)=\sum_{r\ge 1}(n-1)^{r-1}[x^n]\bigl(x+x^2+\cdots+x^k\bigr)^r,$

where $[x^m]$ denotes the coefficient of $x^m$.

Step 3: Generating Function

Set

$R_k(x)=x+x^2+\cdots+x^k=\frac{x(1-x^k)}{1-x}.$

Then the run decomposition becomes a geometric series:

$\sum_{r\ge 1}(n-1)^{r-1}R_k(x)^r=\frac{R_k(x)}{1-(n-1)R_k(x)}.$

Substituting the closed form for $R_k(x)$ yields

$\frac{x(1-x^k)}{1-nx+(n-1)x^{k+1}}.$

Therefore, with $M=n-1$,

$G_k(M)=[x^M]\frac{1-x^k}{1-nx+(n-1)x^{k+1}}.$

It is convenient to define

$A_{M,k}=[x^M]\frac{1}{1-nx+(n-1)x^{k+1}}.$

Then

$G_k(M)=A_{M,k}-A_{M-k,k},$

with the convention $A_{M,k}=0$ for $M<0$.

Step 4: Closed Form Used by the Implementation

Expand the denominator as a geometric series:

$\frac{1}{1-nx+(n-1)x^{k+1}}=\sum_{t\ge 0}\bigl(nx-(n-1)x^{k+1}\bigr)^t.$

Inside the $t$-th power, choose $p$ factors of $-(n-1)x^{k+1}$ and $t-p$ factors of $nx$. The total exponent of $x$ is then $t+pk$. To obtain the coefficient of $x^M$, we must have $t=M-pk$. This gives

$A_{M,k}=\sum_{p=0}^{\lfloor M/(k+1)\rfloor}\binom{M-pk}{p}\,n^{M-p(k+1)}\bigl(-(n-1)\bigr)^p.$

The alternating sign is exactly the correction that one also sees from inclusion-exclusion on overlong runs; the generating-function derivation packages the same cancellation into a compact formula.

Hence

$N_k(n)=n\bigl(A_{n-1,k}-A_{n-1-k,k}\bigr),$

and therefore

$\boxed{f(n)=n^{n+1}-n\sum_{k=1}^{n-1}\bigl(A_{n-1,k}-A_{n-1-k,k}\bigr)\pmod{10^9+9}.}$

This is exactly the formula evaluated by the C++, Python, and Java implementations.

Worked Example: $n=3$

There are $3^3=27$ sequences in total.

For $k=1$, every run must have length $1$, so adjacent values must always change. With the first symbol fixed, each of the next two positions has $2$ choices, giving $2^2=4$ continuations. Thus

$N_1(3)=3\cdot 4=12.$

For $k=2$, the only forbidden sequences are the constant ones $111$, $222$, and $333$. Therefore

$N_2(3)=27-3=24.$

Now apply the tail-sum formula:

$f(3)=3\cdot 27-12-24=45.$

This matches the checkpoint used by the implementations. They also verify $f(7)=1403689$ and $f(11)=496890792$.

How the Code Works

The implementations precompute factorials and inverse factorials modulo $10^9+9$, so every binomial coefficient in the closed form can be obtained in $O(1)$ time. They also precompute the powers $n^q$ and $(-(n-1))^q$ that appear throughout the sum.

For each $k$, the implementation evaluates the two coefficient sums corresponding to $A_{n-1,k}$ and $A_{n-1-k,k}$, converts them into $N_k(n)$, accumulates all these values, and finally subtracts the total from $n^{n+1}$. The C++ version parallelizes the outer loop over $k$, while the Python and Java versions follow the same mathematics without changing the formula.

Complexity Analysis

Precomputing factorials, inverse factorials, and power tables costs $O(n)$ time and $O(n)$ memory. For a fixed $k$, the inner summation over $p$ has $\lfloor (n-1)/(k+1)\rfloor+1$ terms. Summing over all $k$ gives

$\sum_{k=1}^{n-1} O\!\left(\frac{n}{k+1}\right)=O(n\log n).$

So the overall method runs in $O(n\log n)$ time and $O(n)$ memory, which is dramatically smaller than the impossible $O(n^n)$ brute-force search.

Footnotes and References

Problem page: https://projecteuler.net/problem=427
Runs in strings: Wikipedia — Run (string)
Integer compositions: Wikipedia — Composition (combinatorics)
Generating functions: Wikipedia — Generating function
Inclusion-exclusion principle: Wikipedia — Inclusion-exclusion principle