Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 361: Subsequence of Thue-Morse Sequence

Project Euler Problem 361 Solution

EulerSolve provides an optimized solution for Project Euler Problem 361, Subsequence of Thue-Morse Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let $t=0110100110010110\ldots$ be the infinite Thue-Morse word, the fixed point of the morphism $\mu:0\mapsto01,\;1\mapsto10$. For each length $\ell$, let $T_\ell$ be the set of distinct factors of $t$ of length $\ell$. Sort $T_\ell$ lexicographically, keep only the words whose first bit is $1$, and then concatenate these filtered lists for $\ell=1,2,3,\ldots$. Reading every chosen word as a binary integer produces the sequence $A(1),A(2),\ldots$. The goal is to compute $\sum_{k=1}^{18} A(10^k) \pmod{10^9}.$ Mathematical Approach 1. Count the Number of Factors of Each Length Write $p(\ell)=|T_\ell|.$ The solution files use the standard Thue-Morse factor-complexity recurrence with the explicit base values $p(1)=2,\qquad p(2)=4,\qquad p(3)=6,$ and for $n\ge 2$, $p(2n)=p(n)+p(n+1),\qquad p(2n+1)=2p(n+1).$ Now define the cumulative count $s(\ell)=\sum_{j=1}^{\ell} p(j).$ The code memoizes the derived recurrences $s(2n)=3s(n)+s(n+1)-8,\qquad s(2n+1)=4s(n)+3p(n+1)-8.$ Why does this help with the Project Euler sequence? Because the Thue-Morse word is closed under bitwise complement: if $w\in T_\ell$, then $\overline{w}\in T_\ell$. No nonempty binary word equals its own complement, so the factors of length $\ell$ are paired into one word starting with $0$ and one starting with $1$....

Detailed mathematical approach

Problem Summary

Let $t=0110100110010110\ldots$ be the infinite Thue-Morse word, the fixed point of the morphism $\mu:0\mapsto01,\;1\mapsto10$. For each length $\ell$, let $T_\ell$ be the set of distinct factors of $t$ of length $\ell$. Sort $T_\ell$ lexicographically, keep only the words whose first bit is $1$, and then concatenate these filtered lists for $\ell=1,2,3,\ldots$. Reading every chosen word as a binary integer produces the sequence $A(1),A(2),\ldots$. The goal is to compute

$\sum_{k=1}^{18} A(10^k) \pmod{10^9}.$

Mathematical Approach

1. Count the Number of Factors of Each Length

Write

$p(\ell)=|T_\ell|.$

The solution files use the standard Thue-Morse factor-complexity recurrence with the explicit base values

$p(1)=2,\qquad p(2)=4,\qquad p(3)=6,$

and for $n\ge 2$,

$p(2n)=p(n)+p(n+1),\qquad p(2n+1)=2p(n+1).$

Now define the cumulative count

$s(\ell)=\sum_{j=1}^{\ell} p(j).$

The code memoizes the derived recurrences

$s(2n)=3s(n)+s(n+1)-8,\qquad s(2n+1)=4s(n)+3p(n+1)-8.$

Why does this help with the Project Euler sequence? Because the Thue-Morse word is closed under bitwise complement: if $w\in T_\ell$, then $\overline{w}\in T_\ell$. No nonempty binary word equals its own complement, so the factors of length $\ell$ are paired into one word starting with $0$ and one starting with $1$. Hence exactly half of them start with $1$, and the number of valid Euler terms up to length $\ell$ is

$\operatorname{count\_upto}(\ell)=\frac{s(\ell)}{2}.$

2. Convert a Global Index into a Length and a Local Rank

Given $N$, the program first finds the smallest length $\ell$ such that

$\frac{s(\ell)}{2}\ge N.$

This is done by doubling an upper bound and then applying binary search. Once $\ell$ is known, the rank of the desired word inside the filtered list for this fixed length is

$k=N-\operatorname{count\_upto}(\ell-1).$

Inside the full lexicographic list of all words in $T_\ell$, the leading-$0$ half comes first and the leading-$1$ half comes second. Therefore the wanted factor is the

$\frac{p(\ell)}{2}+k$

th factor of length $\ell$ in complete lexicographic order. This is the reason for the line

$\texttt{offset}=p(\ell)/2+k$

in all three implementations.

3. Reconstruct the Lexicographic Factor Order Recursively

The difficult part is selecting the $\texttt{offset}$-th factor of length $\ell$ without listing all factors. The implementations stop recursion at lengths $1,2,3$, where the factor sets are stored explicitly:

$T_1=\{0,1\},\qquad T_2=\{00,01,10,11\},\qquad T_3=\{001,010,011,100,101,110\}.$

For $\ell\ge 4$, the solution uses the unique reading frame induced by the length-2 morphism $\mu$. Every factor either starts on a block boundary of $\mu$ or one symbol later, and that leads to three lexicographic classes called $O1$, $E$, and $O0$ in the source code.

4. Even Length Factors

Let $\ell=2n$. If a factor starts on an even boundary, it is exactly $\mu(u)$ for a unique $u\in T_n$. This gives the middle class

$E(u)=\mu(u),\qquad u\in T_n,$

with size $p(n)$.

If the factor starts one position later, it begins inside one $\mu$-block and ends inside another. For an ancestor $a=a_1a_2\cdots a_{n+1}\in T_{n+1}$, the odd-start construction used by the code is

$\operatorname{OE}(a)=(1-a_1)\,\mu(a_2a_3\cdots a_n)\,a_{n+1}.$

If $a_1=1$, the result starts with $0$; if $a_1=0$, the result starts with $1$. Thus the complete lexicographic order for length $2n$ splits into three consecutive blocks:

$O1,\qquad E,\qquad O0,$

with sizes

$\frac{p(n+1)}{2},\qquad p(n),\qquad \frac{p(n+1)}{2}.$

A concrete example is $\ell=4$, where the ten factors are

$0010,0011,0100 \mid 0101,0110,1001,1010 \mid 1011,1100,1101.$

The first block is $O1$, the middle block is $E$, and the last block is $O0$.

5. Odd Length Factors

Let $\ell=2n+1$. There are again two alignment types, but now an odd-start factor crosses only the left block boundary, while an even-start factor crosses only the right one. For $a=a_1a_2\cdots a_{n+1}\in T_{n+1}$, the code uses

$\operatorname{OO}(a)=(1-a_1)\,\mu(a_2a_3\cdots a_{n+1}),$

$\operatorname{EO}(a)=\mu(a_1a_2\cdots a_n)\,a_{n+1}.$

Exactly as in the even case, the lexicographic order is

$O1,\qquad E,\qquad O0,$

but now the block sizes are

$\frac{p(n+1)}{2},\qquad p(n+1),\qquad \frac{p(n+1)}{2}.$

This is why the recursive selector only needs the values $p(n)$ and $p(n+1)$ to decide which branch contains the required factor.

6. Evaluate the Selected Word Modulo $10^9$ Without Expanding It

The chosen factor can be very long, so the program never builds its ordinary binary integer directly. Instead it defines

$M=10^9,\qquad B_i=2^{2^i}\bmod M,$

and lets $V_i(w)$ be the value of a binary word $w$ when read in base $B_i$. In particular, $V_0(w)$ is exactly the ordinary binary value modulo $M$.

Concatenation satisfies the standard rule

$V_i(uv)=V_i(u)\,B_i^{|v|}+V_i(v).$

For the morphism, one block $\mu(c)$ has value

$V_i(\mu(c))=1+(B_i-1)c,\qquad c\in\{0,1\}.$

So for $w=c_1c_2\cdots c_m$,

$V_i(\mu(w))=\sum_{j=1}^{m} B_i^{m-j}\bigl(1+(B_i-1)c_j\bigr).$

Separating the constant part from the digit-dependent part gives

$V_i(\mu(w))=G_{i+1}(m)+(B_i-1)V_{i+1}(w),$

where

$G_j(m)=\sum_{r=0}^{m-1} B_j^r.$

This identity is exactly what the cached MU node evaluation computes. The helper pow_sum stores both $B_i^m$ and $G_i(m)$, so each evaluation step remains small and reusable.

How the Code Works

The C++, Python, and Java versions all follow the same structure. They memoize p_count and s_count, use find_length_for_index to locate the correct length, and then call select_node to build only a symbolic expression DAG for the requested factor. WordBuilder has four node types: empty, leaf, concatenation, and morphism application. The helper methods prefix, suffix, and middle implement the ancestor-to-descendant maps behind the $O1/E/O0$ split. Finally get_eval computes the cached values $V_i$ for each node, and compute_A_mod returns $V_0$. The C++ file also contains sanity checks proving that the construction reproduces $A(100)=3251$ and $A(1000)=80852364498$.

Complexity Analysis

Because every recurrence halves its argument, computing $p(\ell)$ or $s(\ell)$ touches only $O(\log \ell)$ memoized states. The length search for $A(N)$ costs $O(\log N)$ calls to count_upto. After that, select_node descends through $O(\log \ell)$ recursive levels, and the node evaluator works with a fixed base table of size 64. So each query is handled in polylogarithmic time with $O(\log \ell)$ symbolic memory, which is dramatically smaller than generating all factors or building the full binary integer.

References

Problem page: https://projecteuler.net/problem=361
Thue-Morse sequence: Wikipedia — Thue-Morse sequence
Subword complexity: Wikipedia — Subword complexity
J.-P. Allouche and J. Shallit, Automatic Sequences, Cambridge University Press, 2003.