Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 361: Subsequence of Thue-Morse Sequence

View on Project Euler

Project Euler Problem 361 Solution

EulerSolve provides an optimized solution for Project Euler Problem 361, Subsequence of Thue-Morse Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let \(t=0110100110010110\ldots\) be the infinite Thue-Morse word, the fixed point of the morphism \(\mu:0\mapsto01,\;1\mapsto10\). For each length \(\ell\), let \(T_\ell\) be the set of distinct factors of \(t\) of length \(\ell\). Sort \(T_\ell\) lexicographically, keep only the words whose first bit is \(1\), and then concatenate these filtered lists for \(\ell=1,2,3,\ldots\). Reading every chosen word as a binary integer produces the sequence \(A(1),A(2),\ldots\). The goal is to compute $\sum_{k=1}^{18} A(10^k) \pmod{10^9}.$ Mathematical Approach 1. Count the Number of Factors of Each Length Write $p(\ell)=|T_\ell|.$ The solution files use the standard Thue-Morse factor-complexity recurrence with the explicit base values $p(1)=2,\qquad p(2)=4,\qquad p(3)=6,$ and for \(n\ge 2\), $p(2n)=p(n)+p(n+1),\qquad p(2n+1)=2p(n+1).$ Now define the cumulative count $s(\ell)=\sum_{j=1}^{\ell} p(j).$ The code memoizes the derived recurrences $s(2n)=3s(n)+s(n+1)-8,\qquad s(2n+1)=4s(n)+3p(n+1)-8.$ Why does this help with the Project Euler sequence? Because the Thue-Morse word is closed under bitwise complement: if \(w\in T_\ell\), then \(\overline{w}\in T_\ell\). No nonempty binary word equals its own complement, so the factors of length \(\ell\) are paired into one word starting with \(0\) and one starting with \(1\)....

Detailed mathematical approach

Problem Summary

Let \(t=0110100110010110\ldots\) be the infinite Thue-Morse word, the fixed point of the morphism \(\mu:0\mapsto01,\;1\mapsto10\). For each length \(\ell\), let \(T_\ell\) be the set of distinct factors of \(t\) of length \(\ell\). Sort \(T_\ell\) lexicographically, keep only the words whose first bit is \(1\), and then concatenate these filtered lists for \(\ell=1,2,3,\ldots\). Reading every chosen word as a binary integer produces the sequence \(A(1),A(2),\ldots\). The goal is to compute

$\sum_{k=1}^{18} A(10^k) \pmod{10^9}.$

Mathematical Approach

1. Count the Number of Factors of Each Length

Write

$p(\ell)=|T_\ell|.$

The solution files use the standard Thue-Morse factor-complexity recurrence with the explicit base values

$p(1)=2,\qquad p(2)=4,\qquad p(3)=6,$

and for \(n\ge 2\),

$p(2n)=p(n)+p(n+1),\qquad p(2n+1)=2p(n+1).$

Now define the cumulative count

$s(\ell)=\sum_{j=1}^{\ell} p(j).$

The code memoizes the derived recurrences

$s(2n)=3s(n)+s(n+1)-8,\qquad s(2n+1)=4s(n)+3p(n+1)-8.$

Why does this help with the Project Euler sequence? Because the Thue-Morse word is closed under bitwise complement: if \(w\in T_\ell\), then \(\overline{w}\in T_\ell\). No nonempty binary word equals its own complement, so the factors of length \(\ell\) are paired into one word starting with \(0\) and one starting with \(1\). Hence exactly half of them start with \(1\), and the number of valid Euler terms up to length \(\ell\) is

$\operatorname{count\_upto}(\ell)=\frac{s(\ell)}{2}.$

2. Convert a Global Index into a Length and a Local Rank

Given \(N\), the program first finds the smallest length \(\ell\) such that

$\frac{s(\ell)}{2}\ge N.$

This is done by doubling an upper bound and then applying binary search. Once \(\ell\) is known, the rank of the desired word inside the filtered list for this fixed length is

$k=N-\operatorname{count\_upto}(\ell-1).$

Inside the full lexicographic list of all words in \(T_\ell\), the leading-\(0\) half comes first and the leading-\(1\) half comes second. Therefore the wanted factor is the

$\frac{p(\ell)}{2}+k$

th factor of length \(\ell\) in complete lexicographic order. This is the reason for the line

$\texttt{offset}=p(\ell)/2+k$

in all three implementations.

3. Reconstruct the Lexicographic Factor Order Recursively

The difficult part is selecting the \(\texttt{offset}\)-th factor of length \(\ell\) without listing all factors. The implementations stop recursion at lengths \(1,2,3\), where the factor sets are stored explicitly:

$T_1=\{0,1\},\qquad T_2=\{00,01,10,11\},\qquad T_3=\{001,010,011,100,101,110\}.$

For \(\ell\ge 4\), the solution uses the unique reading frame induced by the length-2 morphism \(\mu\). Every factor either starts on a block boundary of \(\mu\) or one symbol later, and that leads to three lexicographic classes called \(O1\), \(E\), and \(O0\) in the source code.

4. Even Length Factors

Let \(\ell=2n\). If a factor starts on an even boundary, it is exactly \(\mu(u)\) for a unique \(u\in T_n\). This gives the middle class

$E(u)=\mu(u),\qquad u\in T_n,$

with size \(p(n)\).

If the factor starts one position later, it begins inside one \(\mu\)-block and ends inside another. For an ancestor \(a=a_1a_2\cdots a_{n+1}\in T_{n+1}\), the odd-start construction used by the code is

$\operatorname{OE}(a)=(1-a_1)\,\mu(a_2a_3\cdots a_n)\,a_{n+1}.$

If \(a_1=1\), the result starts with \(0\); if \(a_1=0\), the result starts with \(1\). Thus the complete lexicographic order for length \(2n\) splits into three consecutive blocks:

$O1,\qquad E,\qquad O0,$

with sizes

$\frac{p(n+1)}{2},\qquad p(n),\qquad \frac{p(n+1)}{2}.$

A concrete example is \(\ell=4\), where the ten factors are

$0010,0011,0100 \mid 0101,0110,1001,1010 \mid 1011,1100,1101.$

The first block is \(O1\), the middle block is \(E\), and the last block is \(O0\).

5. Odd Length Factors

Let \(\ell=2n+1\). There are again two alignment types, but now an odd-start factor crosses only the left block boundary, while an even-start factor crosses only the right one. For \(a=a_1a_2\cdots a_{n+1}\in T_{n+1}\), the code uses

$\operatorname{OO}(a)=(1-a_1)\,\mu(a_2a_3\cdots a_{n+1}),$

$\operatorname{EO}(a)=\mu(a_1a_2\cdots a_n)\,a_{n+1}.$

Exactly as in the even case, the lexicographic order is

$O1,\qquad E,\qquad O0,$

but now the block sizes are

$\frac{p(n+1)}{2},\qquad p(n+1),\qquad \frac{p(n+1)}{2}.$

This is why the recursive selector only needs the values \(p(n)\) and \(p(n+1)\) to decide which branch contains the required factor.

6. Evaluate the Selected Word Modulo \(10^9\) Without Expanding It

The chosen factor can be very long, so the program never builds its ordinary binary integer directly. Instead it defines

$M=10^9,\qquad B_i=2^{2^i}\bmod M,$

and lets \(V_i(w)\) be the value of a binary word \(w\) when read in base \(B_i\). In particular, \(V_0(w)\) is exactly the ordinary binary value modulo \(M\).

Concatenation satisfies the standard rule

$V_i(uv)=V_i(u)\,B_i^{|v|}+V_i(v).$

For the morphism, one block \(\mu(c)\) has value

$V_i(\mu(c))=1+(B_i-1)c,\qquad c\in\{0,1\}.$

So for \(w=c_1c_2\cdots c_m\),

$V_i(\mu(w))=\sum_{j=1}^{m} B_i^{m-j}\bigl(1+(B_i-1)c_j\bigr).$

Separating the constant part from the digit-dependent part gives

$V_i(\mu(w))=G_{i+1}(m)+(B_i-1)V_{i+1}(w),$

where

$G_j(m)=\sum_{r=0}^{m-1} B_j^r.$

This identity is exactly what the cached MU node evaluation computes. The helper pow_sum stores both \(B_i^m\) and \(G_i(m)\), so each evaluation step remains small and reusable.

How the Code Works

The C++, Python, and Java versions all follow the same structure. They memoize p_count and s_count, use find_length_for_index to locate the correct length, and then call select_node to build only a symbolic expression DAG for the requested factor. WordBuilder has four node types: empty, leaf, concatenation, and morphism application. The helper methods prefix, suffix, and middle implement the ancestor-to-descendant maps behind the \(O1/E/O0\) split. Finally get_eval computes the cached values \(V_i\) for each node, and compute_A_mod returns \(V_0\). The C++ file also contains sanity checks proving that the construction reproduces \(A(100)=3251\) and \(A(1000)=80852364498\).

Complexity Analysis

Because every recurrence halves its argument, computing \(p(\ell)\) or \(s(\ell)\) touches only \(O(\log \ell)\) memoized states. The length search for \(A(N)\) costs \(O(\log N)\) calls to count_upto. After that, select_node descends through \(O(\log \ell)\) recursive levels, and the node evaluator works with a fixed base table of size 64. So each query is handled in polylogarithmic time with \(O(\log \ell)\) symbolic memory, which is dramatically smaller than generating all factors or building the full binary integer.

References

  1. Problem page: https://projecteuler.net/problem=361
  2. Thue-Morse sequence: Wikipedia — Thue-Morse sequence
  3. Subword complexity: Wikipedia — Subword complexity
  4. J.-P. Allouche and J. Shallit, Automatic Sequences, Cambridge University Press, 2003.

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 360 · All Project Euler solutions · Next: Problem 362

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮