Problem 427: $n$-sequences
View on Project EulerProject Euler Problem 427 Solution
EulerSolve provides an optimized solution for Project Euler Problem 427, $n$-sequences, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary An \(n\)-sequence is a word of length \(n\) over the alphabet \(\{1,2,\dots,n\}\), so the total number of such sequences is \(n^n\). For a sequence \(S\), let \(L(S)\) denote the length of its longest contiguous run of equal values. The target quantity is $f(n)=\sum_S L(S)\pmod{10^9+9}.$ Direct enumeration is hopeless for the actual input \(n=7{,}500{,}000\), so the implementations reorganize the problem into a counting question: for each \(k\), how many sequences satisfy \(L(S)\le k\)? Mathematical Approach Step 1: Tail-Sum Identity Define $N_k(n)=\#\{S:\,L(S)\le k\}.$ For a fixed sequence with \(L(S)=\ell\), the inequality \(\ell\le k\) is true for exactly \(n-\ell\) integers \(k\in\{1,2,\dots,n-1\}\). Therefore $\ell=n-\sum_{k=1}^{n-1}\mathbf{1}_{\ell\le k}.$ Summing this identity over all \(n^n\) sequences gives $f(n)=n\cdot n^n-\sum_{k=1}^{n-1}N_k(n)=n^{n+1}-\sum_{k=1}^{n-1}N_k(n).$ So the whole task reduces to computing \(N_k(n)\) efficiently for every \(1\le k\le n-1\). Step 2: Encode a Sequence by Its Runs Fix the first symbol. Any valid sequence with longest run at most \(k\) can be written as \(r\) consecutive runs with lengths $\ell_1+\ell_2+\cdots+\ell_r=n,\qquad 1\le \ell_i\le k.$ Once the first run value is fixed, each later run may use any symbol different from the previous one, so it contributes a factor \(n-1\)....
Detailed mathematical approach
Problem Summary
An \(n\)-sequence is a word of length \(n\) over the alphabet \(\{1,2,\dots,n\}\), so the total number of such sequences is \(n^n\). For a sequence \(S\), let \(L(S)\) denote the length of its longest contiguous run of equal values. The target quantity is
$f(n)=\sum_S L(S)\pmod{10^9+9}.$
Direct enumeration is hopeless for the actual input \(n=7{,}500{,}000\), so the implementations reorganize the problem into a counting question: for each \(k\), how many sequences satisfy \(L(S)\le k\)?
Mathematical Approach
Step 1: Tail-Sum Identity
Define
$N_k(n)=\#\{S:\,L(S)\le k\}.$
For a fixed sequence with \(L(S)=\ell\), the inequality \(\ell\le k\) is true for exactly \(n-\ell\) integers \(k\in\{1,2,\dots,n-1\}\). Therefore
$\ell=n-\sum_{k=1}^{n-1}\mathbf{1}_{\ell\le k}.$
Summing this identity over all \(n^n\) sequences gives
$f(n)=n\cdot n^n-\sum_{k=1}^{n-1}N_k(n)=n^{n+1}-\sum_{k=1}^{n-1}N_k(n).$
So the whole task reduces to computing \(N_k(n)\) efficiently for every \(1\le k\le n-1\).
Step 2: Encode a Sequence by Its Runs
Fix the first symbol. Any valid sequence with longest run at most \(k\) can be written as \(r\) consecutive runs with lengths
$\ell_1+\ell_2+\cdots+\ell_r=n,\qquad 1\le \ell_i\le k.$
Once the first run value is fixed, each later run may use any symbol different from the previous one, so it contributes a factor \(n-1\). Hence a composition into \(r\) parts contributes \((n-1)^{r-1}\) sequences for each fixed first symbol.
Let \(G_k(M)\) be the number of valid continuations of length \(M\) after fixing the first position. Then \(N_k(n)=n\,G_k(n-1)\), and
$G_k(n-1)=\sum_{r\ge 1}(n-1)^{r-1}[x^n]\bigl(x+x^2+\cdots+x^k\bigr)^r,$
where \([x^m]\) denotes the coefficient of \(x^m\).
Step 3: Generating Function
Set
$R_k(x)=x+x^2+\cdots+x^k=\frac{x(1-x^k)}{1-x}.$
Then the run decomposition becomes a geometric series:
$\sum_{r\ge 1}(n-1)^{r-1}R_k(x)^r=\frac{R_k(x)}{1-(n-1)R_k(x)}.$
Substituting the closed form for \(R_k(x)\) yields
$\frac{x(1-x^k)}{1-nx+(n-1)x^{k+1}}.$
Therefore, with \(M=n-1\),
$G_k(M)=[x^M]\frac{1-x^k}{1-nx+(n-1)x^{k+1}}.$
It is convenient to define
$A_{M,k}=[x^M]\frac{1}{1-nx+(n-1)x^{k+1}}.$
Then
$G_k(M)=A_{M,k}-A_{M-k,k},$
with the convention \(A_{M,k}=0\) for \(M<0\).
Step 4: Closed Form Used by the Implementation
Expand the denominator as a geometric series:
$\frac{1}{1-nx+(n-1)x^{k+1}}=\sum_{t\ge 0}\bigl(nx-(n-1)x^{k+1}\bigr)^t.$
Inside the \(t\)-th power, choose \(p\) factors of \(-(n-1)x^{k+1}\) and \(t-p\) factors of \(nx\). The total exponent of \(x\) is then \(t+pk\). To obtain the coefficient of \(x^M\), we must have \(t=M-pk\). This gives
$A_{M,k}=\sum_{p=0}^{\lfloor M/(k+1)\rfloor}\binom{M-pk}{p}\,n^{M-p(k+1)}\bigl(-(n-1)\bigr)^p.$
The alternating sign is exactly the correction that one also sees from inclusion-exclusion on overlong runs; the generating-function derivation packages the same cancellation into a compact formula.
Hence
$N_k(n)=n\bigl(A_{n-1,k}-A_{n-1-k,k}\bigr),$
and therefore
$\boxed{f(n)=n^{n+1}-n\sum_{k=1}^{n-1}\bigl(A_{n-1,k}-A_{n-1-k,k}\bigr)\pmod{10^9+9}.}$
This is exactly the formula evaluated by the C++, Python, and Java implementations.
Worked Example: \(n=3\)
There are \(3^3=27\) sequences in total.
For \(k=1\), every run must have length \(1\), so adjacent values must always change. With the first symbol fixed, each of the next two positions has \(2\) choices, giving \(2^2=4\) continuations. Thus
$N_1(3)=3\cdot 4=12.$
For \(k=2\), the only forbidden sequences are the constant ones \(111\), \(222\), and \(333\). Therefore
$N_2(3)=27-3=24.$
Now apply the tail-sum formula:
$f(3)=3\cdot 27-12-24=45.$
This matches the checkpoint used by the implementations. They also verify \(f(7)=1403689\) and \(f(11)=496890792\).
How the Code Works
The implementations precompute factorials and inverse factorials modulo \(10^9+9\), so every binomial coefficient in the closed form can be obtained in \(O(1)\) time. They also precompute the powers \(n^q\) and \((-(n-1))^q\) that appear throughout the sum.
For each \(k\), the implementation evaluates the two coefficient sums corresponding to \(A_{n-1,k}\) and \(A_{n-1-k,k}\), converts them into \(N_k(n)\), accumulates all these values, and finally subtracts the total from \(n^{n+1}\). The C++ version parallelizes the outer loop over \(k\), while the Python and Java versions follow the same mathematics without changing the formula.
Complexity Analysis
Precomputing factorials, inverse factorials, and power tables costs \(O(n)\) time and \(O(n)\) memory. For a fixed \(k\), the inner summation over \(p\) has \(\lfloor (n-1)/(k+1)\rfloor+1\) terms. Summing over all \(k\) gives
$\sum_{k=1}^{n-1} O\!\left(\frac{n}{k+1}\right)=O(n\log n).$
So the overall method runs in \(O(n\log n)\) time and \(O(n)\) memory, which is dramatically smaller than the impossible \(O(n^n)\) brute-force search.
Footnotes and References
- Problem page: https://projecteuler.net/problem=427
- Runs in strings: Wikipedia — Run (string)
- Integer compositions: Wikipedia — Composition (combinatorics)
- Generating functions: Wikipedia — Generating function
- Inclusion-exclusion principle: Wikipedia — Inclusion-exclusion principle