Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 603: Substring Sums of Prime Concatenations

View on Project Euler

Project Euler Problem 603 Solution

EulerSolve provides an optimized solution for Project Euler Problem 603, Substring Sums of Prime Concatenations, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary For a decimal string \(w\), let \(S(w)\) denote the sum of all contiguous decimal substrings of \(w\), counted with multiplicity and interpreted as ordinary integers. Let \(P(t)\) be the decimal concatenation of the first \(t\) primes, and let \(C(t,k)\) be \(k\) consecutive copies of \(P(t)\). The target is $S\big(C(10^6,10^{12})\big)\bmod(10^9+7).$ The direct string is far too large to build explicitly: even one copy of \(P(10^6)\) already contains millions of digits, and the problem asks for \(10^{12}\) copies. The solution therefore compresses the effect of appending digits into a fixed linear transformation and then raises one block transformation to a huge power. Mathematical Approach Write any decimal string as \(d_1d_2\cdots d_L\), where each \(d_j\in\{0,\dots,9\}\). We first derive an exact recurrence for substring sums, then package one digit-append step into a \(4\times4\) matrix, and finally exploit the fact that the large input is just one repeated prime-concatenation block. Step 1: Sum All Substrings Ending at One Position Let \(F_j\) be the sum of all substrings that end exactly at position \(j\). Then the total answer for the whole string is $S(d_1d_2\cdots d_L)=\sum_{j=1}^{L} F_j.$ Why is there a simple recurrence? Every substring ending at position \(j-1\) can be extended by digit \(d_j\), which multiplies its previous value by \(10\)....

Detailed mathematical approach

Problem Summary

For a decimal string \(w\), let \(S(w)\) denote the sum of all contiguous decimal substrings of \(w\), counted with multiplicity and interpreted as ordinary integers. Let \(P(t)\) be the decimal concatenation of the first \(t\) primes, and let \(C(t,k)\) be \(k\) consecutive copies of \(P(t)\). The target is

$S\big(C(10^6,10^{12})\big)\bmod(10^9+7).$

The direct string is far too large to build explicitly: even one copy of \(P(10^6)\) already contains millions of digits, and the problem asks for \(10^{12}\) copies. The solution therefore compresses the effect of appending digits into a fixed linear transformation and then raises one block transformation to a huge power.

Mathematical Approach

Write any decimal string as \(d_1d_2\cdots d_L\), where each \(d_j\in\{0,\dots,9\}\). We first derive an exact recurrence for substring sums, then package one digit-append step into a \(4\times4\) matrix, and finally exploit the fact that the large input is just one repeated prime-concatenation block.

Step 1: Sum All Substrings Ending at One Position

Let \(F_j\) be the sum of all substrings that end exactly at position \(j\). Then the total answer for the whole string is

$S(d_1d_2\cdots d_L)=\sum_{j=1}^{L} F_j.$

Why is there a simple recurrence? Every substring ending at position \(j-1\) can be extended by digit \(d_j\), which multiplies its previous value by \(10\). In addition, the new digit \(d_j\) appears as the last digit in each of the \(j\) substrings ending at \(j\): the one-digit substring \(d_j\), the two-digit substring ending at \(j\), and so on up to the full prefix \(d_1\cdots d_j\). Therefore

$F_j=10F_{j-1}+j\,d_j,\qquad F_0=0.$

This recurrence is exact and already reduces the problem from quadratic substring enumeration to a single left-to-right scan.

Step 2: Turn the Recurrence into a Linear State

To compose many digit steps efficiently, track four quantities after processing \(\ell\) digits:

$v_\ell=\begin{bmatrix}F_\ell\\ S_\ell\\ \ell\\ 1\end{bmatrix},\qquad S_\ell=\sum_{j=1}^{\ell} F_j.$

If the next digit is \(d\), then the new position is \(\ell+1\), so the recurrence becomes

$F_{\ell+1}=10F_\ell+(\ell+1)d,$

$S_{\ell+1}=S_\ell+F_{\ell+1},$

$\ell'=\ell+1.$

That update is linear in the state vector:

$v_{\ell+1}=T(d)\,v_\ell,$

with

$T(d)=\begin{bmatrix} 10 & 0 & d & d\\ 10 & 1 & d & d\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1 \end{bmatrix}.$

So appending one digit is no longer a special-case arithmetic step; it is just multiplication by a fixed-size matrix.

Step 3: A Whole String Becomes One Matrix

For a block \(w=d_1d_2\cdots d_L\), define its transform \(M(w)\) by feeding the digits from left to right. Starting from

$v_0=\begin{bmatrix}0\\0\\0\\1\end{bmatrix},$

we get

$v_L=M(w)\,v_0.$

Because matrix multiplication composes successive digit steps, concatenation of strings corresponds to multiplication of their block transforms. If \(u\) is processed first and then \(v\), then

$M(uv)=M(v)\,M(u).$

This is the key structural fact: instead of thinking about billions or trillions of individual digits, we can think about a much smaller number of block transformations.

Step 4: Repeating One Prime Block Means Matrix Powers

The problem input is not an arbitrary string; it is the same block repeated many times. Let

$B=M\big(P(10^6)\big).$

Then one copy of \(P(10^6)\) acts as \(B\), two copies act as \(B^2\), and in general

$M\big(C(10^6,k)\big)=B^k.$

Therefore the required state is

$v_{\text{final}}=B^{10^{12}}v_0,$

and the answer to the problem is the second component of \(v_{\text{final}}\), namely the accumulated substring sum.

This replaces an astronomically long concatenation with binary exponentiation on a \(4\times4\) matrix.

Step 5: Build the Prime Block Without Building the Giant String

To obtain \(B\), we only need the decimal digits of the first \(10^6\) primes in order. A sieve gives those primes up to a safe upper bound for the millionth prime, and each prime is then decomposed into decimal digits from most significant to least significant. Every digit updates the current \(4\times4\) block transform once.

No copy of \(P(10^6)\) is stored as one long string, and certainly no copy of \(C(10^6,10^{12})\) is stored. The entire method keeps only a tiny state matrix, a few fixed-size vectors, and the sieve structure needed to enumerate primes.

Worked Example

Take the ordinary string \(2024\). The recurrence gives

$F_1=2,$

$F_2=10\cdot 2+2\cdot 0=20,$

$F_3=10\cdot 20+3\cdot 2=206,$

$F_4=10\cdot 206+4\cdot 4=2076.$

Hence

$S(2024)=2+20+206+2076=2304.$

This matches the direct substring sum \(2+0+2+4+20+2+24+202+24+2024\). The same recurrence is what the full solution uses; the only difference is that the actual problem feeds in the digits of prime concatenations and then compresses repeated blocks with matrix powers.

How the Code Works

The C++, Python, and Java implementations all follow the same structure. First, they define modular arithmetic under \(10^9+7\) and implement \(4\times4\) matrix multiplication together with matrix-vector multiplication for the state described above.

Next, they generate the first million primes with a sieve up to a safe analytic upper bound. The digits of each prime are processed from most significant to least significant, and each digit left-composes the current block transform by the corresponding digit matrix. After the millionth prime has been processed, the code has the single matrix \(B\) for one block \(P(10^6)\).

Finally, they apply exponentiation by squaring to compute the action of \(B^{10^{12}}\) on the initial state vector. Because the matrix dimension is fixed at \(4\), this stage is tiny compared with prime generation and digit streaming. The printed output is the second state component after the exponentiation loop finishes.

Complexity Analysis

Let \(n=10^6\), let \(U\) be a safe upper bound for the \(n\)-th prime, and let \(D\) be the number of decimal digits in \(P(n)\). The sieve costs \(O(U\log\log U)\) time and \(O(U)\) memory, or \(O(U/2)\) logical storage in an odd-only representation. Feeding the digits of all primes into the block matrix costs \(O(D)\) time. Exponentiating a \(4\times4\) matrix to the power \(10^{12}\) costs \(O(\log 10^{12})\) matrix multiplications and only \(O(1)\) extra memory beyond the fixed matrices and vectors.

So the overall complexity is

$O(U\log\log U + D + \log 10^{12})$

time and \(O(U)\) memory. In practice, the dominant work is prime generation plus the single pass over the digits of the first million primes.

Footnotes and References

  1. Problem page: Project Euler 603
  2. Exponentiation by squaring: Wikipedia - Exponentiation by squaring
  3. Sieve of Eratosthenes: Wikipedia - Sieve of Eratosthenes
  4. Prime number theorem: Wikipedia - Prime number theorem
  5. Matrix multiplication: Wikipedia - Matrix multiplication

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 602 · All Project Euler solutions · Next: Problem 604

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮