Problem 234: Semidivisible Numbers
View on Project EulerProject Euler Problem 234 Solution
EulerSolve provides an optimized solution for Project Euler Problem 234, Semidivisible Numbers, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For each integer \(n\), let \(p\) and \(q\) be the consecutive primes satisfying \(p \lt \sqrt n \lt q\). The number \(n\) is called semidivisible when exactly one of those two primes divides it. The task is to compute $S(L)=\sum_{\substack{n\le L\\ n\text{ semidivisible}}} n$ for the given limit \(L=999966663333\). A direct search over all \(n\le L\) would be hopeless. The successful idea is to stop thinking about individual integers and instead group them by the pair of consecutive primes surrounding \(\sqrt n\). Once that pair is fixed, the semidivisible condition becomes a clean inclusion-exclusion calculation on an interval. Mathematical Approach The three implementations all exploit the same structure: partition the line into prime-square bands, compute one closed-form contribution for each band, and add those contributions. Prime-square bands identify the relevant primes Suppose \(p\) and \(q\) are consecutive primes. Then the condition $p \lt \sqrt n \lt q$ is equivalent to $p^2 \lt n \lt q^2.$ So every semidivisible number belongs to exactly one interval between two consecutive prime squares. Inside that whole band, the lower surrounding prime is always \(p\) and the upper surrounding prime is always \(q\)....
Detailed mathematical approach
Problem Summary
For each integer \(n\), let \(p\) and \(q\) be the consecutive primes satisfying \(p \lt \sqrt n \lt q\). The number \(n\) is called semidivisible when exactly one of those two primes divides it. The task is to compute
$S(L)=\sum_{\substack{n\le L\\ n\text{ semidivisible}}} n$
for the given limit \(L=999966663333\).
A direct search over all \(n\le L\) would be hopeless. The successful idea is to stop thinking about individual integers and instead group them by the pair of consecutive primes surrounding \(\sqrt n\). Once that pair is fixed, the semidivisible condition becomes a clean inclusion-exclusion calculation on an interval.
Mathematical Approach
The three implementations all exploit the same structure: partition the line into prime-square bands, compute one closed-form contribution for each band, and add those contributions.
Prime-square bands identify the relevant primes
Suppose \(p\) and \(q\) are consecutive primes. Then the condition
$p \lt \sqrt n \lt q$
is equivalent to
$p^2 \lt n \lt q^2.$
So every semidivisible number belongs to exactly one interval between two consecutive prime squares. Inside that whole band, the lower surrounding prime is always \(p\) and the upper surrounding prime is always \(q\). For a fixed pair \((p,q)\), the relevant interval is therefore
$A=p^2+1,\qquad B=\min(L,q^2-1).$
The final band may be cut off by the global limit \(L\), but the logic is the same.
Semidivisibility becomes an exclusive divisibility condition
Inside the band \((p^2,q^2)\), a number is semidivisible exactly when it is divisible by one of \(p\) or \(q\), but not both. That is an XOR condition:
$p\mid n \;\text{ xor }\; q\mid n.$
Define \(M_d(A,B)\) to be the sum of all multiples of \(d\) in the closed interval \([A,B]\). Then the contribution of the band determined by consecutive primes \(p\) and \(q\) is
$\mathcal{B}(p,q)=M_p(A,B)+M_q(A,B)-2M_{pq}(A,B).$
Why does this work? Every multiple of \(p\) contributes once through \(M_p\), every multiple of \(q\) contributes once through \(M_q\), and numbers divisible by both are counted twice. Because \(p\) and \(q\) are distinct primes, being divisible by both means being divisible by \(pq\), so subtracting \(2M_{pq}(A,B)\) removes exactly those forbidden cases and leaves only numbers divisible by exactly one of the two primes.
Each multiples sum is an arithmetic progression
For any divisor \(d\), the multiples of \(d\) in \([A,B]\) are
$d\,k_0,\ d(k_0+1),\ \dots,\ d\,k_1,$
where
$k_0=\left\lceil\frac{A}{d}\right\rceil,\qquad k_1=\left\lfloor\frac{B}{d}\right\rfloor.$
If \(k_0 \gt k_1\), there are no such multiples and the sum is \(0\). Otherwise the terms form an arithmetic progression, so
$M_d(A,B)=d\cdot\frac{(k_0+k_1)(k_1-k_0+1)}{2}.$
This is the key collapse. Instead of inspecting potentially millions of integers in one band, the implementations evaluate three closed-form sums: one for multiples of \(p\), one for multiples of \(q\), and one for multiples of \(pq\).
Worked example: the small case \(L=15\)
The first relevant consecutive-prime pairs are \((2,3)\) and \((3,5)\).
For \((2,3)\), the band is \(4 \lt n \lt 9\), so \(A=5\) and \(B=8\). The multiples are:
$M_2(5,8)=6+8=14,\qquad M_3(5,8)=6,\qquad M_6(5,8)=6.$
Hence
$\mathcal{B}(2,3)=14+6-2\cdot 6=8.$
The only semidivisible number there is \(8\): \(6\) is excluded because it is divisible by both \(2\) and \(3\).
For \((3,5)\), the global limit truncates the band to \(10\le n\le 15\). Now
$M_3(10,15)=12+15=27,\qquad M_5(10,15)=10+15=25,\qquad M_{15}(10,15)=15.$
Therefore
$\mathcal{B}(3,5)=27+25-2\cdot 15=22,$
which comes from the semidivisible numbers \(10\) and \(12\). Summing the two bands gives
$S(15)=8+22=30.$
Summing all bands
Once the band formula is known, the whole problem is just
$S(L)=\sum_{\substack{(p,q)\text{ consecutive primes}\\ p^2\le L}} \mathcal{B}(p,q),$
with \(B=\min(L,q^2-1)\) in the last active band. No semidivisible number is ever generated explicitly; every contribution comes from prime pairs and arithmetic-series identities.
How the Code Works
The C++, Python, and Java implementations first generate all primes up to slightly beyond \(\sqrt L\). That is enough because any relevant band is determined by a prime \(p\) with \(p^2\le L\) and the next prime \(q\). The code therefore prepares a prime list through about \(\sqrt L\) and makes sure there is at least one additional prime above that threshold for the final band.
It then scans adjacent prime pairs \((p,q)\). For each pair it computes the band endpoints \(A=p^2+1\) and \(B=\min(L,q^2-1)\). If \(A\gt B\), the band contributes nothing. Otherwise the implementation evaluates the three interval sums \(M_p(A,B)\), \(M_q(A,B)\), and \(M_{pq}(A,B)\) from the first multiple, the last multiple, and the arithmetic-progression formula, and adds
$M_p(A,B)+M_q(A,B)-2M_{pq}(A,B)$
to the running total.
Everything is done with integer arithmetic. The C++ implementation uses wider intermediate integers for safety, Python relies on arbitrary-precision integers, and Java stays within signed 64-bit arithmetic for this problem size. All three implementations stop as soon as the lower prime square \(p^2\) exceeds \(L\).
Complexity Analysis
Let \(N=\lfloor\sqrt L\rfloor+O(1)\). In these implementations, prime generation is done with a linear-sieve style method, so producing the prime table costs \(O(N)\) time and \(O(N)\) memory. The subsequent scan over consecutive prime pairs performs only constant work per pair, so it adds \(O(\pi(\sqrt L))\) operations, which is smaller than the sieve itself.
Therefore the overall complexity is
$O(\sqrt L)\text{ time and }O(\sqrt L)\text{ memory}.$
The important point is not just the asymptotic bound, but the structural reduction: the algorithm replaces a search across almost \(10^{12}\) integers by a pass over primes near \(\sqrt L\), with each band handled in constant time.
Footnotes and References
- Project Euler problem page: Project Euler 234 - Semidivisible Numbers
- Prime numbers: Wikipedia - Prime number
- Inclusion-exclusion principle: Wikipedia - Inclusion-exclusion principle
- Arithmetic progression: Wikipedia - Arithmetic progression
- Prime sieving: Wikipedia - Sieve of Eratosthenes