Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 384: Rudin-Shapiro Sequence

Project Euler Problem 384 Solution

EulerSolve provides an optimized solution for Project Euler Problem 384, Rudin-Shapiro Sequence, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary The Rudin-Shapiro sign sequence used by the solutions is $b(n)=(-1)^{\operatorname{popcount}(n \mathbin{\&} (n \gg 1))}\in\{-1,+1\}.$ The bit count here is exactly the number of overlapping occurrences of the binary pattern $11$ inside the expansion of $n$. Its prefix walk is $A(n)=\sum_{k=0}^{n} b(k).$ For integers $t\ge 1$ and $1\le c\le t$, define $g(t,c)$ as the $c$-th nonnegative index $n$ such that $A(n)=t$. The program evaluates $\sum_{t=2}^{45} g(F_t,F_{t-1}),\qquad F_0=F_1=1.$ A direct simulation would have to generate the walk up to extremely large indices. The local C++, Python, and Java implementations avoid that completely: they count how often level $t$ has appeared up to a bound $x$, then invert that monotone counting function by binary search. Mathematical Approach Step 1: Two-bit identities for $b(n)$ Write $q(n)=\operatorname{popcount}(n \mathbin{\&} (n \gg 1)) \bmod 2,$ so that $b(n)=(-1)^{q(n)}$. Looking at the last two binary digits gives four exact identities: $q(4m)=q(m),\qquad q(4m+1)=q(m),$ $q(4m+2)\equiv q(m)+m \pmod 2,\qquad q(4m+3)\equiv q(m)+m+1 \pmod 2.$ Therefore $b(4m)=b(m),\qquad b(4m+1)=b(m),$ $b(4m+2)=(-1)^m b(m),\qquad b(4m+3)=-(-1)^m b(m).$ This 4-block structure is the real mathematical source of the recursion in all three solution files....

Detailed mathematical approach

Problem Summary

The Rudin-Shapiro sign sequence used by the solutions is

$b(n)=(-1)^{\operatorname{popcount}(n \mathbin{\&} (n \gg 1))}\in\{-1,+1\}.$

The bit count here is exactly the number of overlapping occurrences of the binary pattern $11$ inside the expansion of $n$. Its prefix walk is

$A(n)=\sum_{k=0}^{n} b(k).$

For integers $t\ge 1$ and $1\le c\le t$, define $g(t,c)$ as the $c$-th nonnegative index $n$ such that $A(n)=t$. The program evaluates

$\sum_{t=2}^{45} g(F_t,F_{t-1}),\qquad F_0=F_1=1.$

A direct simulation would have to generate the walk up to extremely large indices. The local C++, Python, and Java implementations avoid that completely: they count how often level $t$ has appeared up to a bound $x$, then invert that monotone counting function by binary search.

Mathematical Approach

Step 1: Two-bit identities for $b(n)$

Write

$q(n)=\operatorname{popcount}(n \mathbin{\&} (n \gg 1)) \bmod 2,$

so that $b(n)=(-1)^{q(n)}$. Looking at the last two binary digits gives four exact identities:

$q(4m)=q(m),\qquad q(4m+1)=q(m),$

$q(4m+2)\equiv q(m)+m \pmod 2,\qquad q(4m+3)\equiv q(m)+m+1 \pmod 2.$

Therefore

$b(4m)=b(m),\qquad b(4m+1)=b(m),$

$b(4m+2)=(-1)^m b(m),\qquad b(4m+3)=-(-1)^m b(m).$

This 4-block structure is the real mathematical source of the recursion in all three solution files.

Step 2: Block formulas for the prefix walk $A(n)$

Summing one whole block gives

$b(4m)+b(4m+1)+b(4m+2)+b(4m+3)=2b(m).$

After accumulating blocks from $0$ to $m$, we obtain

$A(4m+3)=2A(m).$

Using neighboring terms inside the same block then yields the exact formulas used implicitly by the code:

$A(4m)=2A(m)-b(m),\qquad A(4m+1)=2A(m),$

$A(4m+2)=2A(m)+(-1)^m b(m),\qquad A(4m+3)=2A(m).$

So every recursive call reduces the index scale from $x$ to roughly $x/4$.

Step 3: Why the memo stores a pair, not one counter

The implementations memoize two functions:

$P_t(x)=\#\{0\le n\le x: A(n)=t,\ b(n)=+1\},$

$N_t(x)=\#\{0\le n\le x: A(n)=t,\ b(n)=-1\}.$

The total number of visits to level $t$ up to $x$ is then

$C_t(x)=P_t(x)+N_t(x),$

and the selector function is recovered by monotone inversion:

$g(t,c)=\min\{x\ge 0 : C_t(x)\ge c\}.$

This sign split is essential. The formulas for $A(4m)$ and $A(4m+2)$ depend on $b(m)$, so a single counter for $A(n)=t$ would discard information needed by the next recursive level.

Step 4: Parity turns the formulas into a closed recursion

Every summand $b(k)$ is odd, hence

$A(m)\equiv m+1 \pmod 2.$

So if $A(m)=u$, then $m\equiv u-1 \pmod 2$; if $A(m)=u+1$, then $m\equiv u \pmod 2$. This observation lets the code replace the parity of the unknown index $m$ by the parity of the known level parameter.

For compact notation define

$x_r=\left\lfloor\frac{x-r}{4}\right\rfloor,\qquad r\in\{0,1,2,3\}.$

Step 5: Recurrence for even levels

If $t=2u$, then only residues $1$ and $3$ can contribute, because $A(4m)$ and $A(4m+2)$ are always odd. From $A(4m+1)=A(4m+3)=2A(m)$ we get

$P_{2u}(x)=P_u(x_1)+ \begin{cases} P_u(x_3), & u \text{ even},\\ N_u(x_3), & u \text{ odd}, \end{cases}$

$N_{2u}(x)=N_u(x_1)+ \begin{cases} N_u(x_3), & u \text{ even},\\ P_u(x_3), & u \text{ odd}. \end{cases}$

The switch at residue $3$ comes from

$b(4m+3)=-(-1)^m b(m)=(-1)^u b(m),$

because $A(m)=u$ forces $m\equiv u-1\pmod 2$. When $u$ is even, the sign is preserved; when $u$ is odd, it flips.

Step 6: Recurrence for odd levels

If $t=2u+1$, only residues $0$ and $2$ are possible. For $n=4m$, the equation

$A(4m)=2A(m)-b(m)=2u+1$

forces exactly two cases:

$A(m)=u \text{ with } b(m)=-1,\qquad A(m)=u+1 \text{ with } b(m)=+1.$

For $n=4m+2$, the equation

$A(4m+2)=2A(m)+(-1)^m b(m)=2u+1$

together with the parity rule determines whether the source state is counted by $P_u$ or $N_u$, and likewise for $u+1$. The final recurrence is

$P_{2u+1}(x)= \begin{cases} P_u(x_2)+P_{u+1}(x_0), & u \text{ odd},\\ N_u(x_2)+P_{u+1}(x_0), & u \text{ even}, \end{cases}$

$N_{2u+1}(x)= \begin{cases} N_u(x_0)+P_{u+1}(x_2), & u \text{ odd},\\ N_u(x_0)+N_{u+1}(x_2), & u \text{ even}. \end{cases}$

Those are exactly the four branches in prefix_counts / prefixCounts in the local Python, C++, and Java sources.

Step 7: Why $g(t,c)$ exists for $1\le c\le t$

Let $T(t)$ be the total number of visits to level $t$ over the entire infinite walk. Taking $x\to\infty$ in the recurrences gives

$T(1)=1,\qquad T(2u)=2T(u),\qquad T(2u+1)=T(u)+T(u+1).$

The simple function $T(t)=t$ satisfies the same recurrence and the same base case, so

$T(t)=t.$

This explains the guard c > t in the code: level $t$ is reached exactly $t$ times, so the $c$-th hit exists precisely when $1\le c\le t$.

Worked Example

The first prefix values are

$A(0),A(1),A(2),\dots = 1,2,3,2,3,4,3,4,5,6,7,6,\dots$

Hence level $5$ is hit at

$n=8,12,14,16,26,\dots$

so

$g(5,3)=14.$

This also matches the total-count theorem above: level $5$ appears exactly five times, and the third occurrence is at index $14$.

How the Code Works

The core routine in each language is prefix_counts(t, x) or prefixCounts(t, x). It memoizes the pair $(P_t(x),N_t(x))$ with the boundary conditions $t=0$, $x<0$, and $t=1$. The helper count_total just adds the two components.

The function g(t, c) first doubles an upper bound hi until count_total(t, hi) >= c, then binary-searches the smallest valid index. Finally, solve() builds Fibonacci numbers with $F_0=F_1=1$ and sums g(F_t, F_{t-1}) for $2\le t\le 45$.

The C++ version adds two implementation details beyond the bare mathematics: it contains an internal checkpoint g(54321, 12345) = 1220847710, and it can split the 44 independent Fibonacci queries across threads. The Python and Java files are direct single-thread translations of the same recursion.

Complexity Analysis

A single evaluation of $C_t(x)$ only explores memoized states obtained by replacing $(t,x)$ with smaller arguments such as $(\lfloor t/2\rfloor,\lfloor x/4\rfloor)$ or $(\lfloor t/2\rfloor+1,\lfloor x/4\rfloor)$. The recursion depth is therefore logarithmic in $x$, and the number of cached states stays tiny compared with scanning every index up to $x$.

Finding $g(t,c)$ adds one exponential search for an upper bound and then a binary search, so the number of counter evaluations is $O(\log g(t,c))$. The whole solve loop performs this process only 44 times, which is why the recursive counting approach is practical while brute force is not.

Footnotes and References

Problem page: https://projecteuler.net/problem=384
Rudin-Shapiro sequence: Wikipedia - Rudin-Shapiro sequence
Automatic sequence: Wikipedia - Automatic sequence
J.-P. Allouche and J. Shallit, Automatic Sequences, Cambridge University Press, 2003.
Local source files used as the implementation source of truth: solutionsCpp/Euler384.cpp, solutionsPython/Euler384.py, solutionsJava/Euler384.java.