Problem 211: Divisor Square Sum
View on Project EulerProject Euler Problem 211 Solution
EulerSolve provides an optimized solution for Project Euler Problem 211, Divisor Square Sum, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For this problem we set \(L=64{,}000{,}000\) and study the divisor-square-sum function $\sigma_2(n)=\sum_{d\mid n} d^2.$ The goal is to compute $\sum_{\substack{1 \le n \lt L \\ \exists r\in \mathbb{Z}_{\ge 0}:\ \sigma_2(n)=r^2}} n.$ The condition is arithmetical rather than combinatorial: we are not counting divisors, but adding their squares and then testing whether the resulting integer is itself a square. Since \(\sigma_2(1)=1\), the sum already includes \(n=1\); the real work is to find all larger \(n\) below the limit that survive the same test. Mathematical Approach The common structure behind all three implementations is the prime-factor description of \(\sigma_2\). Once that formula is available, the problem becomes: generate or evaluate integers \(n\lt L\), compute \(\sigma_2(n)\) exactly, and keep only those for which the value is a square. Multiplicativity of \(\sigma_2\) If \(a\) and \(b\) are coprime, then every divisor of \(ab\) can be written uniquely as \(d_1d_2\) with \(d_1\mid a\) and \(d_2\mid b\). Therefore $\sigma_2(ab)=\sum_{d\mid ab} d^2=\sum_{d_1\mid a}\sum_{d_2\mid b}(d_1d_2)^2=\left(\sum_{d_1\mid a} d_1^2\right)\left(\sum_{d_2\mid b} d_2^2\right)=\sigma_2(a)\sigma_2(b).$ So \(\sigma_2\) is multiplicative....
Detailed mathematical approach
Problem Summary
For this problem we set \(L=64{,}000{,}000\) and study the divisor-square-sum function
$\sigma_2(n)=\sum_{d\mid n} d^2.$
The goal is to compute
$\sum_{\substack{1 \le n \lt L \\ \exists r\in \mathbb{Z}_{\ge 0}:\ \sigma_2(n)=r^2}} n.$
The condition is arithmetical rather than combinatorial: we are not counting divisors, but adding their squares and then testing whether the resulting integer is itself a square. Since \(\sigma_2(1)=1\), the sum already includes \(n=1\); the real work is to find all larger \(n\) below the limit that survive the same test.
Mathematical Approach
The common structure behind all three implementations is the prime-factor description of \(\sigma_2\). Once that formula is available, the problem becomes: generate or evaluate integers \(n\lt L\), compute \(\sigma_2(n)\) exactly, and keep only those for which the value is a square.
Multiplicativity of \(\sigma_2\)
If \(a\) and \(b\) are coprime, then every divisor of \(ab\) can be written uniquely as \(d_1d_2\) with \(d_1\mid a\) and \(d_2\mid b\). Therefore
$\sigma_2(ab)=\sum_{d\mid ab} d^2=\sum_{d_1\mid a}\sum_{d_2\mid b}(d_1d_2)^2=\left(\sum_{d_1\mid a} d_1^2\right)\left(\sum_{d_2\mid b} d_2^2\right)=\sigma_2(a)\sigma_2(b).$
So \(\sigma_2\) is multiplicative. If
$n=\prod_{i=1}^r p_i^{e_i},$
then
$\sigma_2(n)=\prod_{i=1}^r \sigma_2\!\left(p_i^{e_i}\right)=\prod_{i=1}^r \left(1+p_i^2+p_i^4+\cdots+p_i^{2e_i}\right).$
This is the key formula used everywhere in the page: the square test for \(\sigma_2(n)\) is controlled completely by the prime-power factors of \(n\).
Prime-power contribution and the useful recurrence
For a fixed prime \(p\), define
$G_p(e)=1+p^2+p^4+\cdots+p^{2e}=\frac{p^{2(e+1)}-1}{p^2-1}.$
Then the factorization above becomes
$\sigma_2(n)=\prod_i G_{p_i}(e_i).$
The closed form is mathematically convenient, but the implementations use an even simpler iterative identity:
$G_p(0)=1,\qquad G_p(e+1)=p^2\,G_p(e)+1.$
Indeed, multiplying \(1+p^2+\cdots+p^{2e}\) by \(p^2\) and then adding 1 produces \(1+p^2+\cdots+p^{2e}+p^{2e+2}\). This recurrence is exactly what lets the recursive search extend an exponent from \(e\) to \(e+1\) without recomputing a geometric series from scratch.
Worked example: \(n=42\)
A concrete example shows why the multiplicative viewpoint is so effective. Since
$42=2\cdot 3\cdot 7,$
we get
$\sigma_2(42)=\left(1+2^2\right)\left(1+3^2\right)\left(1+7^2\right)=5\cdot 10\cdot 50=2500=50^2.$
So \(42\) is one of the integers that must be included in the final sum. The important point is that no divisor list had to be generated explicitly: prime factorization plus multiplicativity already gives the answer.
Why the recursive factorization search is unique
The C++ and Python implementations maintain a state of the form
$n_0=\prod_{i=1}^k p_i^{e_i},\qquad s_0=\sigma_2(n_0),$
with strictly increasing primes \(p_1<p_2<\cdots<p_k\). To extend the state, they choose a larger prime \(q\) and an exponent \(f\ge 1\), then form
$n=n_0q^f,\qquad \sigma_2(n)=s_0\,G_q(f).$
Because each new prime is introduced only after all smaller chosen primes are fixed, every integer below the limit is generated exactly once, namely in the order dictated by its unique prime factorization. The pruning condition \(n<L\) is monotone, so as soon as a prime power pushes \(n\) past the limit, all larger exponents for that prime can be discarded immediately.
The sieve recurrence used by the full-array approach
The Java implementation uses the same mathematics in a different way. Write
$n=mp^e,\qquad p\nmid m,$
where \(p\) is the smallest prime dividing \(n\). Then
$\sigma_2(n)=\sigma_2(m)\,G_p(e).$
This gives a recurrence from a smaller integer \(m\) to the current integer \(n\). Once the smallest prime factor of every number is known, the algorithm can strip off the full \(p\)-power, build the geometric factor \(G_p(e)\), and recover \(\sigma_2(n)\) from a previously computed value. In other words, one family of implementations explores valid factorizations directly, while the other fills the entire table of \(\sigma_2(n)\) values from \(1\) up to \(L-1\).
How the Code Works
C++ and Python: enumerate factorizations
These implementations first build the prime list up to \(L-1\). They then recurse over increasing primes, carrying two exact integers: the current candidate \(n_0\) and the already-known value \(\sigma_2(n_0)\). For each newly chosen prime \(p\), they try exponents \(1,2,3,\dots\) as long as the product stays below the limit, update the prime-power factor with the recurrence \(G_p(e+1)=p^2G_p(e)+1\), and multiply it into the current \(\sigma_2\) value.
Whenever the new value of \(\sigma_2(n)\) is a perfect square, the corresponding \(n\) is added to the running total. Since the recursion visits each admissible factorization once, no deduplication structure is needed. The C++ version also splits the outermost prime choices across several worker threads so that separate branches of the factorization tree can be processed independently.
Java: build \(\sigma_2(n)\) for every \(n\)
The Java implementation takes an array-based route. It first computes the smallest prime factor of every integer below the limit. Then it processes \(n=2,3,\dots,L-1\) in order. If \(n\) is prime, the formula is immediate:
$\sigma_2(n)=1+n^2.$
If \(n\) is composite, the implementation extracts the full power \(p^e\) of its smallest prime factor, forms \(G_p(e)=1+p^2+\cdots+p^{2e}\), and multiplies that by the previously known value of \(\sigma_2(m)\) for the cofactor \(m\). After the array is filled, one final pass tests each \(\sigma_2(n)\) for being a square and accumulates the valid \(n\).
Exact perfect-square testing
The square test must be exact. Python uses the integer square-root routine directly. The C++ and Java implementations begin with a floating-point square-root estimate, but then adjust the candidate root upward or downward until \(r^2\le \sigma_2(n) < (r+1)^2\). This correction step avoids false positives from rounding and guarantees that the acceptance test really means
$\sigma_2(n)=r^2\quad\text{for some integer }r.$
Complexity Analysis
The three implementations share the same number theory but not the same cost profile. The recursive C++ and Python approach is output-sensitive: its running time is proportional to the number of prime-exponent states visited under the bound \(n<L\). The recursion depth is the number of distinct prime factors currently chosen, so it is small in practice and at worst \(O(\log L)\). Memory is dominated by the prime list plus the recursion stack.
The Java approach uses more memory but has a very regular control flow. It stores arrays indexed by all integers below \(L\), so its space usage is \(O(L)\). Time is dominated by the smallest-prime-factor sieve, the pass that reconstructs each \(\sigma_2(n)\), and the final square test sweep. For this fixed Project Euler limit, both strategies are practical: one saves memory by exploring only factorization states, while the other spends memory to make the arithmetic on each \(n\) very direct.
Footnotes and References
- Problem page: Project Euler 211
- Divisor function: Wikipedia - Divisor function
- Multiplicative function: Wikipedia - Multiplicative function
- Prime factorization: Wikipedia - Prime factor
- Perfect square: Wikipedia - Square number
- Sieve of Eratosthenes: Wikipedia - Sieve of Eratosthenes