Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 427: $n$-sequences

View on Project Euler

Project Euler Problem 427 Solution

EulerSolve provides an optimized solution for Project Euler Problem 427, $n$-sequences, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary An \(n\)-sequence is a word of length \(n\) over the alphabet \(\{1,2,\dots,n\}\), so the total number of such sequences is \(n^n\). For a sequence \(S\), let \(L(S)\) denote the length of its longest contiguous run of equal values. The target quantity is $f(n)=\sum_S L(S)\pmod{10^9+9}.$ Direct enumeration is hopeless for the actual input \(n=7{,}500{,}000\), so the implementations reorganize the problem into a counting question: for each \(k\), how many sequences satisfy \(L(S)\le k\)? Mathematical Approach Step 1: Tail-Sum Identity Define $N_k(n)=\#\{S:\,L(S)\le k\}.$ For a fixed sequence with \(L(S)=\ell\), the inequality \(\ell\le k\) is true for exactly \(n-\ell\) integers \(k\in\{1,2,\dots,n-1\}\). Therefore $\ell=n-\sum_{k=1}^{n-1}\mathbf{1}_{\ell\le k}.$ Summing this identity over all \(n^n\) sequences gives $f(n)=n\cdot n^n-\sum_{k=1}^{n-1}N_k(n)=n^{n+1}-\sum_{k=1}^{n-1}N_k(n).$ So the whole task reduces to computing \(N_k(n)\) efficiently for every \(1\le k\le n-1\). Step 2: Encode a Sequence by Its Runs Fix the first symbol. Any valid sequence with longest run at most \(k\) can be written as \(r\) consecutive runs with lengths $\ell_1+\ell_2+\cdots+\ell_r=n,\qquad 1\le \ell_i\le k.$ Once the first run value is fixed, each later run may use any symbol different from the previous one, so it contributes a factor \(n-1\)....

Detailed mathematical approach

Problem Summary

An \(n\)-sequence is a word of length \(n\) over the alphabet \(\{1,2,\dots,n\}\), so the total number of such sequences is \(n^n\). For a sequence \(S\), let \(L(S)\) denote the length of its longest contiguous run of equal values. The target quantity is

$f(n)=\sum_S L(S)\pmod{10^9+9}.$

Direct enumeration is hopeless for the actual input \(n=7{,}500{,}000\), so the implementations reorganize the problem into a counting question: for each \(k\), how many sequences satisfy \(L(S)\le k\)?

Mathematical Approach

Step 1: Tail-Sum Identity

Define

$N_k(n)=\#\{S:\,L(S)\le k\}.$

For a fixed sequence with \(L(S)=\ell\), the inequality \(\ell\le k\) is true for exactly \(n-\ell\) integers \(k\in\{1,2,\dots,n-1\}\). Therefore

$\ell=n-\sum_{k=1}^{n-1}\mathbf{1}_{\ell\le k}.$

Summing this identity over all \(n^n\) sequences gives

$f(n)=n\cdot n^n-\sum_{k=1}^{n-1}N_k(n)=n^{n+1}-\sum_{k=1}^{n-1}N_k(n).$

So the whole task reduces to computing \(N_k(n)\) efficiently for every \(1\le k\le n-1\).

Step 2: Encode a Sequence by Its Runs

Fix the first symbol. Any valid sequence with longest run at most \(k\) can be written as \(r\) consecutive runs with lengths

$\ell_1+\ell_2+\cdots+\ell_r=n,\qquad 1\le \ell_i\le k.$

Once the first run value is fixed, each later run may use any symbol different from the previous one, so it contributes a factor \(n-1\). Hence a composition into \(r\) parts contributes \((n-1)^{r-1}\) sequences for each fixed first symbol.

Let \(G_k(M)\) be the number of valid continuations of length \(M\) after fixing the first position. Then \(N_k(n)=n\,G_k(n-1)\), and

$G_k(n-1)=\sum_{r\ge 1}(n-1)^{r-1}[x^n]\bigl(x+x^2+\cdots+x^k\bigr)^r,$

where \([x^m]\) denotes the coefficient of \(x^m\).

Step 3: Generating Function

Set

$R_k(x)=x+x^2+\cdots+x^k=\frac{x(1-x^k)}{1-x}.$

Then the run decomposition becomes a geometric series:

$\sum_{r\ge 1}(n-1)^{r-1}R_k(x)^r=\frac{R_k(x)}{1-(n-1)R_k(x)}.$

Substituting the closed form for \(R_k(x)\) yields

$\frac{x(1-x^k)}{1-nx+(n-1)x^{k+1}}.$

Therefore, with \(M=n-1\),

$G_k(M)=[x^M]\frac{1-x^k}{1-nx+(n-1)x^{k+1}}.$

It is convenient to define

$A_{M,k}=[x^M]\frac{1}{1-nx+(n-1)x^{k+1}}.$

Then

$G_k(M)=A_{M,k}-A_{M-k,k},$

with the convention \(A_{M,k}=0\) for \(M<0\).

Step 4: Closed Form Used by the Implementation

Expand the denominator as a geometric series:

$\frac{1}{1-nx+(n-1)x^{k+1}}=\sum_{t\ge 0}\bigl(nx-(n-1)x^{k+1}\bigr)^t.$

Inside the \(t\)-th power, choose \(p\) factors of \(-(n-1)x^{k+1}\) and \(t-p\) factors of \(nx\). The total exponent of \(x\) is then \(t+pk\). To obtain the coefficient of \(x^M\), we must have \(t=M-pk\). This gives

$A_{M,k}=\sum_{p=0}^{\lfloor M/(k+1)\rfloor}\binom{M-pk}{p}\,n^{M-p(k+1)}\bigl(-(n-1)\bigr)^p.$

The alternating sign is exactly the correction that one also sees from inclusion-exclusion on overlong runs; the generating-function derivation packages the same cancellation into a compact formula.

Hence

$N_k(n)=n\bigl(A_{n-1,k}-A_{n-1-k,k}\bigr),$

and therefore

$\boxed{f(n)=n^{n+1}-n\sum_{k=1}^{n-1}\bigl(A_{n-1,k}-A_{n-1-k,k}\bigr)\pmod{10^9+9}.}$

This is exactly the formula evaluated by the C++, Python, and Java implementations.

Worked Example: \(n=3\)

There are \(3^3=27\) sequences in total.

For \(k=1\), every run must have length \(1\), so adjacent values must always change. With the first symbol fixed, each of the next two positions has \(2\) choices, giving \(2^2=4\) continuations. Thus

$N_1(3)=3\cdot 4=12.$

For \(k=2\), the only forbidden sequences are the constant ones \(111\), \(222\), and \(333\). Therefore

$N_2(3)=27-3=24.$

Now apply the tail-sum formula:

$f(3)=3\cdot 27-12-24=45.$

This matches the checkpoint used by the implementations. They also verify \(f(7)=1403689\) and \(f(11)=496890792\).

How the Code Works

The implementations precompute factorials and inverse factorials modulo \(10^9+9\), so every binomial coefficient in the closed form can be obtained in \(O(1)\) time. They also precompute the powers \(n^q\) and \((-(n-1))^q\) that appear throughout the sum.

For each \(k\), the implementation evaluates the two coefficient sums corresponding to \(A_{n-1,k}\) and \(A_{n-1-k,k}\), converts them into \(N_k(n)\), accumulates all these values, and finally subtracts the total from \(n^{n+1}\). The C++ version parallelizes the outer loop over \(k\), while the Python and Java versions follow the same mathematics without changing the formula.

Complexity Analysis

Precomputing factorials, inverse factorials, and power tables costs \(O(n)\) time and \(O(n)\) memory. For a fixed \(k\), the inner summation over \(p\) has \(\lfloor (n-1)/(k+1)\rfloor+1\) terms. Summing over all \(k\) gives

$\sum_{k=1}^{n-1} O\!\left(\frac{n}{k+1}\right)=O(n\log n).$

So the overall method runs in \(O(n\log n)\) time and \(O(n)\) memory, which is dramatically smaller than the impossible \(O(n^n)\) brute-force search.

Footnotes and References

  1. Problem page: https://projecteuler.net/problem=427
  2. Runs in strings: Wikipedia — Run (string)
  3. Integer compositions: Wikipedia — Composition (combinatorics)
  4. Generating functions: Wikipedia — Generating function
  5. Inclusion-exclusion principle: Wikipedia — Inclusion-exclusion principle

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 426 · All Project Euler solutions · Next: Problem 428

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮