Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 609: $\pi$ Sequences

Project Euler Problem 609 Solution

EulerSolve provides an optimized solution for Project Euler Problem 609, $\pi$ Sequences, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let $\pi(x)$ denote the prime-counting function. A $\pi$-sequence is any finite sequence of positive integers $u=(u_0,\dots,u_r)$ with $r\ge 1$ and $u_{i+1}=\pi(u_i)\qquad(0\le i<r).$ For a fixed starting value $u_0$, the valid sequences are exactly the positive prefixes of the iterated chain $u_0,\ \pi(u_0),\ \pi(\pi(u_0)),\ \dots$ having at least two terms. Let $c(u)$ be the number of non-prime terms in $u$, with $1$ counted as non-prime. Then $p(n,k)$ counts the $\pi$-sequences with $u_0\le n$ and $c(u)=k$, and the target quantity is $P(n)=\prod_{k:\,p(n,k)>0} p(n,k).$ We must compute $P(10^8)\bmod(10^9+7)$. The central observation is that many different starts share the same tail after the first application of $\pi$, so the problem can be compressed by counting groups of starts together instead of processing each start independently. Mathematical Approach Write $M=\pi(n),$ and let $q_m$ denote the $m$-th prime. The only number with $\pi(u_0)=0$ is $u_0=1$, which cannot begin a valid positive sequence of length at least $2$, so the useful first values are $m=1,2,\dots,M$. Step 1: Partition starts by the first $\pi$-value For each $m$, consider the set of starting values $x\le n$ satisfying $\pi(x)=m$....

Detailed mathematical approach

Problem Summary

Let $\pi(x)$ denote the prime-counting function. A $\pi$-sequence is any finite sequence of positive integers $u=(u_0,\dots,u_r)$ with $r\ge 1$ and

$u_{i+1}=\pi(u_i)\qquad(0\le i<r).$

For a fixed starting value $u_0$, the valid sequences are exactly the positive prefixes of the iterated chain

$u_0,\ \pi(u_0),\ \pi(\pi(u_0)),\ \dots$

having at least two terms. Let $c(u)$ be the number of non-prime terms in $u$, with $1$ counted as non-prime. Then $p(n,k)$ counts the $\pi$-sequences with $u_0\le n$ and $c(u)=k$, and the target quantity is

$P(n)=\prod_{k:\,p(n,k)>0} p(n,k).$

We must compute $P(10^8)\bmod(10^9+7)$. The central observation is that many different starts share the same tail after the first application of $\pi$, so the problem can be compressed by counting groups of starts together instead of processing each start independently.

Mathematical Approach

Write

$M=\pi(n),$

and let $q_m$ denote the $m$-th prime. The only number with $\pi(u_0)=0$ is $u_0=1$, which cannot begin a valid positive sequence of length at least $2$, so the useful first values are $m=1,2,\dots,M$.

Step 1: Partition starts by the first $\pi$-value

For each $m$, consider the set of starting values $x\le n$ satisfying $\pi(x)=m$. Because $\pi(x)$ increases exactly when $x$ passes a prime, these sets are intervals:

$I_m=\{x\le n:\pi(x)=m\}=\begin{cases} [q_m,q_{m+1}-1], & 1\le m<M,\\ [q_M,n], & m=M. \end{cases}$

Let $B_m=|I_m|$. Then

$B_m=\begin{cases} q_{m+1}-q_m, & 1\le m<M,\\ n-q_M+1, & m=M. \end{cases}$

Each interval contains exactly one prime starting value, namely $q_m$. Every other element of $I_m$ is composite. This prime-versus-composite split is the only distinction that remains inside a fixed interval.

Step 2: All starts in one interval share the same tail

If $x\in I_m$, then the first step of the sequence is always $\pi(x)=m$. After that point the future no longer depends on the original start $x$. Define the common tail by

$t_{m,0}=m,\qquad t_{m,j+1}=\pi(t_{m,j}).$

Every valid sequence beginning from the block $I_m$ is therefore of the form

$\bigl(x,\ t_{m,0},\ t_{m,1},\ \dots,\ t_{m,j}\bigr)$

for some $j\ge 0$. Since $\pi(y)<y$ for every integer $y>1$, the tail strictly decreases until it reaches $1$, so each block produces only a short list of prefixes.

Step 3: Count non-primes along the shared tail

For a fixed block $m$, let $r_{m,j}$ be the number of non-prime terms among

$t_{m,0},\ t_{m,1},\ \dots,\ t_{m,j}.$

If the start is the unique prime $q_m$, then the sequence $(q_m,t_{m,0},\dots,t_{m,j})$ has exactly $r_{m,j}$ non-prime terms, because the first term contributes nothing to $c(u)$.

If the start is any other element of $I_m$, then the first term is composite, so the corresponding sequence has

$r_{m,j}+1$

non-prime terms. Thus every prefix of the common tail contributes

$1$

sequence to the bucket $k=r_{m,j}$, and

$B_m-1$

sequences to the bucket $k=r_{m,j}+1$. This is the key counting rule implemented in all three languages.

Step 4: Sum the contributions block by block

Once the values $r_{m,j}$ are known, the counting function can be written directly as

$p(n,k)=\sum_{m=1}^{M}\left(\#\{j:r_{m,j}=k\}+(B_m-1)\#\{j:r_{m,j}=k-1\}\right).$

The first term counts the prefixes whose start is the unique prime in $I_m$. The second term counts the prefixes coming from the composite starts in the same interval. After all buckets are accumulated, the required answer is simply

$P(n)=\prod_{k:\,p(n,k)>0} p(n,k)\pmod{10^9+7}.$

The important simplification is that the algorithm never iterates through every $u_0\le n$. It iterates through the much smaller set of first images $m=1,\dots,\pi(n)$, and each such value represents an entire interval of starting points.

Step 5: Why only a short $\pi$-table is required

After the first step, every later term of every sequence is at most $M=\pi(n)$. Therefore the implementation needs two different kinds of precomputation:

the full sieve up to $n$, in order to know which numbers are prime and where the prime intervals begin and end, and a short prime-count table only for inputs $0,1,\dots,M$, in order to continue the tails $m,\pi(m),\pi(\pi(m)),\dots$.

This explains the structure seen in the implementations: the expensive work is the initial sieve up to $n$, while the dynamic tail processing happens on the much smaller domain $[1,M]$.

Worked Example: $n=10$

For $n=10$, we have

$M=\pi(10)=4,\qquad (q_1,q_2,q_3,q_4)=(2,3,5,7).$

Hence the intervals of starting values are

$I_1=\{2\},\qquad I_2=\{3,4\},\qquad I_3=\{5,6\},\qquad I_4=\{7,8,9,10\}.$

The corresponding tails and running non-prime counts are

$\begin{aligned} m=1&:\quad 1,\qquad r=(1),\\ m=2&:\quad 2\to 1,\qquad r=(0,1),\\ m=3&:\quad 3\to 2\to 1,\qquad r=(0,0,1),\\ m=4&:\quad 4\to 2\to 1,\qquad r=(1,1,2). \end{aligned}$

If we write each block's contribution to

$\bigl(p(10,0),\,p(10,1),\,p(10,2),\,p(10,3)\bigr),$

we obtain

$\begin{aligned} m=1&:\quad (0,1,0,0),\\ m=2&:\quad (1,2,1,0),\\ m=3&:\quad (2,3,1,0),\\ m=4&:\quad (0,2,7,3). \end{aligned}$

Therefore

$p(10,0)=3,\qquad p(10,1)=8,\qquad p(10,2)=9,\qquad p(10,3)=3,$

and the product is

$P(10)=3\cdot 8\cdot 9\cdot 3=648.$

This small example shows exactly why interval grouping works: one short tail walk simultaneously counts the prime start in the interval and all composite starts beside it.

How the Code Works

The C++, Python, and Java implementations begin with a sieve up to $n$. From that sieve they obtain both a primality lookup for every value up to $n$ and the ordered list of all primes not exceeding $n$. The number of primes in that list is $M=\pi(n)$, and consecutive primes immediately determine every interval size $B_m$.

Next, the implementation builds a prime-count table only on the range $0$ through $M$. That table is enough because once a sequence has taken its first step, all later values belong to $[1,M]$. The program then loops through the blocks $I_1,\dots,I_M$, computes the number of composite starts in each block, and walks the common tail

$m,\ \pi(m),\ \pi(\pi(m)),\ \dots,\ 1.$

While walking the tail it maintains the running number of non-prime tail values. At every tail position it performs two logical updates: one for the unique prime start of the block, and one for the $B_m-1$ composite starts. The C++ and Java versions store the bucket counts in growable arrays, while the Python version uses a sparse mapping, but the mathematical update is identical in all three cases.

After all blocks have been processed, the implementation multiplies every non-zero bucket size modulo $10^9+7$. Because this final product is taken only once, the real cost lies in the sieve and the short iterated-$\pi$ tail walks.

Complexity Analysis

Building the sieve and the prime list up to $n$ costs $O(n\log\log n)$ time and $O(n)$ memory. This is the dominant part of the algorithm.

Let $M=\pi(n)$. For each $m\le M$, the tail length is $O(\log m)$: for every $x\ge 2$, at most one out of every two integers up to $x$ can be prime, so

$\pi(x)\le \frac{x+1}{2}.$

Repeated application of $\pi$ therefore shrinks the value geometrically until it reaches $1$. Summing the tail-walk cost over all $m\le M$ gives $O(M\log M)$ additional time.

Since $M=\pi(n)=O(n/\log n)$, the post-sieve work is asymptotically smaller than the sieve itself. The overall complexity is therefore

$O(n\log\log n)\text{ time and }O(n)\text{ memory}.$

Footnotes and References

Problem page: Project Euler 609
Prime-counting function: Wikipedia — Prime-counting function
Sieve of Eratosthenes: Wikipedia — Sieve of Eratosthenes
Composite number: Wikipedia — Composite number
Prime number theorem: Wikipedia — Prime number theorem