Problem 464: Möbius Function and Intervals
View on Project EulerProject Euler Problem 464 Solution
EulerSolve provides an optimized solution for Project Euler Problem 464, Möbius Function and Intervals, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let \(C(n)\) be the number of intervals \(I=[l,r]\subseteq [1,n]\) for which the values \(\mu(k)=1\) and \(\mu(k)=-1\) stay in almost perfect balance. If we write $P_I=\#\{k\in I:\mu(k)=1\},\qquad N_I=\#\{k\in I:\mu(k)=-1\},$ then the counted intervals are exactly those satisfying $99P_I\le 100N_I,\qquad 99N_I\le 100P_I.$ Values with \(\mu(k)=0\) are neutral: they lengthen the interval but do not change either count. The implementations verify the derivation with $C(10)=13,\qquad C(500)=16676,\qquad C(10000)=20155319,$ and then evaluate the same method at \(n=20000000\). Mathematical Approach The key reduction is to replace interval counting by a pair-counting problem on weighted prefix sums. Step 1: Generate the Möbius Values Efficiently We need \(\mu(1),\mu(2),\dots,\mu(n)\). The Möbius function takes the values $\mu(m)=\begin{cases} 1, & m=1,\\ (-1)^t, & m\text{ is a product of }t\text{ distinct primes},\\ 0, & m\text{ is divisible by }p^2\text{ for some prime }p. \end{cases}$ The implementations build these values with a linear sieve, so the preprocessing is essentially one forward pass through the integers....
Detailed mathematical approach
Problem Summary
Let \(C(n)\) be the number of intervals \(I=[l,r]\subseteq [1,n]\) for which the values \(\mu(k)=1\) and \(\mu(k)=-1\) stay in almost perfect balance. If we write
$P_I=\#\{k\in I:\mu(k)=1\},\qquad N_I=\#\{k\in I:\mu(k)=-1\},$
then the counted intervals are exactly those satisfying
$99P_I\le 100N_I,\qquad 99N_I\le 100P_I.$
Values with \(\mu(k)=0\) are neutral: they lengthen the interval but do not change either count. The implementations verify the derivation with
$C(10)=13,\qquad C(500)=16676,\qquad C(10000)=20155319,$
and then evaluate the same method at \(n=20000000\).
Mathematical Approach
The key reduction is to replace interval counting by a pair-counting problem on weighted prefix sums.
Step 1: Generate the Möbius Values Efficiently
We need \(\mu(1),\mu(2),\dots,\mu(n)\). The Möbius function takes the values
$\mu(m)=\begin{cases} 1, & m=1,\\ (-1)^t, & m\text{ is a product of }t\text{ distinct primes},\\ 0, & m\text{ is divisible by }p^2\text{ for some prime }p. \end{cases}$
The implementations build these values with a linear sieve, so the preprocessing is essentially one forward pass through the integers.
Step 2: Express the Interval Condition with Two Counts
For an interval \(I=[l,r]\), define
$P_I=\#\{k\in I:\mu(k)=1\},\qquad N_I=\#\{k\in I:\mu(k)=-1\}.$
The interval is counted precisely when neither sign dominates the other by more than the factor \(100/99\):
$99P_I\le 100N_I,\qquad 99N_I\le 100P_I.$
This formulation makes clear that entries with \(\mu(k)=0\) do not affect the balance at all. In particular, an interval containing only zeros is valid because then \(P_I=N_I=0\).
Step 3: Pass to the Complementary Bad Families
Instead of counting good intervals directly, count the two ways an interval can fail:
$\mathcal{A}=\{I:99P_I-100N_I\gt 0\},\qquad \mathcal{B}=\{I:99N_I-100P_I\gt 0\}.$
These two families are disjoint. Indeed, if both inequalities held, then adding them would give
$-(P_I+N_I)\gt 0,$
which is impossible. Therefore
$C(n)=\frac{n(n+1)}{2}-\#\mathcal{A}-\#\mathcal{B}.$
The total number of intervals appears because there are \(n(n+1)/2\) choices of \([l,r]\subseteq [1,n]\).
Step 4: Turn Each Bad Family into a Weighted Prefix-Sum Problem
Define two weight systems:
$a(k)=\begin{cases} 99, & \mu(k)=1,\\ -100, & \mu(k)=-1,\\ 0, & \mu(k)=0, \end{cases}\qquad b(k)=\begin{cases} -100, & \mu(k)=1,\\ 99, & \mu(k)=-1,\\ 0, & \mu(k)=0. \end{cases}$
Now define prefix sums
$A_j=\sum_{k=1}^{j}a(k),\qquad B_j=\sum_{k=1}^{j}b(k),\qquad A_0=B_0=0.$
For an interval \(I=[l,r]\),
$A_r-A_{l-1}=99P_I-100N_I,\qquad B_r-B_{l-1}=99N_I-100P_I.$
So
$I\in\mathcal{A}\iff A_r\gt A_{l-1},\qquad I\in\mathcal{B}\iff B_r\gt B_{l-1}.$
This converts interval counting into the problem of counting ordered pairs \((i,j)\) with \(0\le i\lt j\le n\) and a strict prefix inequality.
Step 5: Count Increasing Prefix Pairs
For the first bad family we must count
$\#\mathcal{A}=\#\{(i,j):0\le i\lt j\le n,\ A_j\gt A_i\},$
and similarly
$\#\mathcal{B}=\#\{(i,j):0\le i\lt j\le n,\ B_j\gt B_i\}.$
Scanning the prefixes from left to right, we only need to know how many earlier prefix values are strictly smaller than the current one. A Fenwick tree supports exactly this operation after the prefix values are shifted into a dense range or compressed into sorted coordinates.
Hence the full problem becomes a pair of standard order-statistics passes over two prefix arrays.
Worked Example: \(n=10\)
The Möbius values are
$\bigl(\mu(1),\dots,\mu(10)\bigr)=(1,-1,-1,0,-1,1,-1,0,0,1).$
Some representative intervals are easy to classify:
$[1,1]:\ P_I=1,\ N_I=0\ \Rightarrow\ 99P_I-100N_I=99\gt 0,$
so \([1,1]\in\mathcal{A}\).
$[2,3]:\ P_I=0,\ N_I=2\ \Rightarrow\ 99N_I-100P_I=198\gt 0,$
so \([2,3]\in\mathcal{B}\).
$[1,2]:\ P_I=1,\ N_I=1,$
hence both balance inequalities hold and \([1,2]\) is counted. Also \([8,9]\) contains only zeros, so \(P_I=N_I=0\) and it is counted as well.
For the first weight system, the prefix sums are
$A=(0,99,-1,-101,-101,-201,-102,-202,-202,-202,-103),$
and for the second they are
$B=(0,-100,-1,98,98,197,97,196,196,196,96).$
The number of increasing pairs in the first sequence is \(6\), and in the second sequence it is \(36\). Since the total number of intervals is
$\frac{10\cdot 11}{2}=55,$
we obtain
$C(10)=55-6-36=13,$
which matches the checkpoint used by the implementations.
How the Code Works
The C++, Python, and Java implementations all follow the same pipeline. First they compute the Möbius values up to \(n\) with a linear sieve. Then they run the interval-counting routine twice: once with weights \((99,-100)\) and once with weights \((-100,99)\). In each pass they build the weighted prefix sums, map those prefix values to Fenwick-tree indices, and accumulate how many earlier prefixes are strictly smaller than the current prefix. After obtaining the two bad counts, they subtract them from \(n(n+1)/2\).
The C++ and Java implementations also include a coordinate-compression fallback when the raw prefix-value span would make a dense Fenwick tree unnecessarily large. The mathematical quantity being counted is the same in every language.
Complexity Analysis
Computing all Möbius values up to \(n\) by linear sieve costs \(O(n)\) time and \(O(n)\) memory. Each weighted pass constructs one prefix array in \(O(n)\) time and performs Fenwick queries and updates in \(O(n\log n)\) time in the worst case; when the prefix span is already dense, the logarithm is taken over that span instead of over the number of distinct compressed values. Therefore the overall method runs in \(O(n\log n)\) time with \(O(n)\) memory, a dramatic improvement over naive \(O(n^2)\) interval enumeration.
Footnotes and References
- Problem page: https://projecteuler.net/problem=464
- Möbius function: Wikipedia - Möbius function
- Fenwick tree: Wikipedia - Fenwick tree
- Prefix sum: Wikipedia - Prefix sum