Problem 619: Square Subsets
View on Project EulerProject Euler Problem 619 Solution
EulerSolve provides an optimized solution for Project Euler Problem 619, Square Subsets, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For an interval \([a,b]\), let \(C(a,b)\) be the number of non-empty subsets of \(\{a,a+1,\dots,b\}\) whose product is a perfect square. The task is to compute $C(10^6,1234567)\bmod(10^9+7).$ Small checkpoints such as \(C(5,10)=3\) and \(C(40,55)=15\) clarify the intended interpretation. The interval is far too large for direct subset enumeration, so the problem must be transformed into linear algebra over \(\mathbb F_2\). Mathematical Approach Write the interval elements as \(x_1,\dots,x_n\), where \(n=b-a+1\). The key observation is that a product is a square exactly when every prime exponent in that product is even, and parity is naturally encoded over \(\mathbb F_2\). Step 1: Replace each integer by its squarefree parity pattern If $x=\prod_p p^{e_p},$ then only the residues \(e_p\bmod 2\) matter. Define a binary vector \(v_x\) by $v_x(p)=e_p\bmod 2.$ Equivalently, every integer can be written as \(x=q^2s\) with \(s\) squarefree, and \(v_x\) records exactly the primes dividing \(s\). For example, \(12=2^2\cdot 3\) contributes only the prime \(3\), while \(18=2\cdot 3^2\) contributes only the prime \(2\). Step 2: Translate the square condition into a zero xor-sum For a subset \(S\subseteq \{a,a+1,\dots,b\}\), the product is $\prod_{x\in S}x=\prod_p p^{\sum_{x\in S}e_p(x)}.$ This product is a perfect square if and only if every exponent sum is even....
Detailed mathematical approach
Problem Summary
For an interval \([a,b]\), let \(C(a,b)\) be the number of non-empty subsets of \(\{a,a+1,\dots,b\}\) whose product is a perfect square. The task is to compute
$C(10^6,1234567)\bmod(10^9+7).$
Small checkpoints such as \(C(5,10)=3\) and \(C(40,55)=15\) clarify the intended interpretation. The interval is far too large for direct subset enumeration, so the problem must be transformed into linear algebra over \(\mathbb F_2\).
Mathematical Approach
Write the interval elements as \(x_1,\dots,x_n\), where \(n=b-a+1\). The key observation is that a product is a square exactly when every prime exponent in that product is even, and parity is naturally encoded over \(\mathbb F_2\).
Step 1: Replace each integer by its squarefree parity pattern
If
$x=\prod_p p^{e_p},$
then only the residues \(e_p\bmod 2\) matter. Define a binary vector \(v_x\) by
$v_x(p)=e_p\bmod 2.$
Equivalently, every integer can be written as \(x=q^2s\) with \(s\) squarefree, and \(v_x\) records exactly the primes dividing \(s\). For example, \(12=2^2\cdot 3\) contributes only the prime \(3\), while \(18=2\cdot 3^2\) contributes only the prime \(2\).
Step 2: Translate the square condition into a zero xor-sum
For a subset \(S\subseteq \{a,a+1,\dots,b\}\), the product is
$\prod_{x\in S}x=\prod_p p^{\sum_{x\in S}e_p(x)}.$
This product is a perfect square if and only if every exponent sum is even. In vector form, that is exactly
$\bigoplus_{x\in S} v_x=0.$
If we introduce coefficients \(\lambda_i\in\{0,1\}\) telling whether \(x_i\) is chosen, the condition becomes
$\lambda_1v_{x_1}\oplus \lambda_2v_{x_2}\oplus\cdots\oplus \lambda_nv_{x_n}=0.$
So square subsets are precisely the linear relations among the parity vectors.
Step 3: Count square subsets by rank-nullity
Let \(r\) be the rank of the family \(v_{x_1},\dots,v_{x_n}\) over \(\mathbb F_2\). The linear map sending \((\lambda_1,\dots,\lambda_n)\) to the xor-sum above has image dimension \(r\), so its kernel has size
$2^{n-r}.$
Every kernel vector corresponds to a subset whose product is a square, and the zero vector corresponds to the empty subset. Therefore
$C(a,b)=2^{n-r}-1.$
Once the rank is known, the counting part of the problem is finished.
Step 4: Why sparse elimination is the right model
A dense vector would have one coordinate for every prime up to \(b\), but most integers leave only a small set of primes with odd exponent. That means each row is naturally sparse. Over \(\mathbb F_2\), row addition is xor, which on supports is just symmetric difference: a prime remains present exactly when it appears in one row but not the other.
This lets the implementation work with sorted prime-index lists instead of huge dense bit matrices. Numbers that are already perfect squares produce the zero vector immediately, which is consistent with the fact that such singletons already form valid square subsets.
Step 5: Worked Example on \([5,10]\)
The six interval elements give the parity supports
$5\mapsto\{5\},\quad 6\mapsto\{2,3\},\quad 7\mapsto\{7\},\quad 8\mapsto\{2\},\quad 9\mapsto\varnothing,\quad 10\mapsto\{2,5\}.$
The nonzero rows for \(5,6,7,8\) are independent, while
$\{2,5\}=\{2\}\oplus\{5\}.$
Hence the rank is \(r=4\). Since \(n=6\), the nullity is \(6-4=2\), so
$C(5,10)=2^2-1=3.$
The three non-empty square subsets are \(\{9\}\), \(\{5,8,10\}\), and \(\{5,8,9,10\}\), which matches the checkpoint used by the implementation.
How the Code Works
The C++, Python, and Java implementations first build a smallest-prime-factor table up to the upper end of the interval. This makes repeated factorization fast enough for every number from \(a\) to \(b\).
Each interval element is then factored, and every prime whose exponent is odd is kept in a sparse support list. The list is ordered by prime index, so xor-reduction can be done by merging two ordered supports and removing entries that appear twice.
During Gaussian elimination over \(\mathbb F_2\), the implementation uses the largest remaining prime index as the current pivot. If no basis row owns that pivot yet, the current row becomes a new basis vector and the rank increases. Otherwise the current row is xor-reduced by the stored basis row and the process continues until the row vanishes or claims a new pivot.
After all rows have been processed, the implementation knows \(r\), computes \(n-r\), and returns
$2^{n-r}-1 \pmod{10^9+7}$
using fast modular exponentiation.
Complexity Analysis
Let \(B=b\) and \(n=b-a+1\). Building the smallest-prime-factor sieve up to \(B\) costs \(O(B\log\log B)\) time and \(O(B)\) memory. Factoring the \(n\) interval elements through that table is near-linear in practice and has average order \(O(n\log\log B)\).
The elimination phase depends on the total sizes of the sparse supports and on how much cancellation occurs during xor-reduction. In this problem that sparse representation is far smaller than a dense matrix with one column per prime up to \(B\). The overall method is therefore dominated by the sieve plus sparse elimination, while memory remains \(O(B)\) for the sieve together with the stored sparse basis.
Footnotes and References
- Problem page: https://projecteuler.net/problem=619
- Square-free integer: Wikipedia - Square-free integer
- Gaussian elimination: Wikipedia - Gaussian elimination
- Rank-nullity theorem: Wikipedia - Rank-nullity theorem
- Finite field: Wikipedia - Finite field