Problem 98: Anagramic Squares
View on Project EulerProject Euler Problem 98 Solution
EulerSolve provides an optimized solution for Project Euler Problem 98, Anagramic Squares, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary We are given a list of English words. Two words matter only if they are anagrams, meaning they contain exactly the same letters in a different order. A valid solution chooses one such pair together with a bijection from letters to digits so that both words become decimal integers, neither integer begins with 0, and both integers are perfect squares. The goal is to find the largest square that appears anywhere in such a pair. Because a word of length \(L\) must map to an \(L\)-digit number, the search is finite and can be organized by word length. Mathematical Approach The efficient viewpoint is to treat the problem as a comparison between two kinds of patterns: sorted-letter signatures for finding anagram classes, and repetition signatures for deciding whether a word can match a square at all. Once those invariants are identified, the remaining work is a finite scan over the relevant square numbers. Anagram Classes Fix the Search Domain Sort the letters of each word alphabetically. Words with the same sorted form belong to the same anagram class, and only such classes can contribute a valid pair. If one class contains \(m\) words, then only its \(\binom{m}{2}\) unordered pairs need to be examined. For a chosen pair \((W_1,W_2)\), the common word length is \(L\)....
Detailed mathematical approach
Problem Summary
We are given a list of English words. Two words matter only if they are anagrams, meaning they contain exactly the same letters in a different order. A valid solution chooses one such pair together with a bijection from letters to digits so that both words become decimal integers, neither integer begins with 0, and both integers are perfect squares.
The goal is to find the largest square that appears anywhere in such a pair. Because a word of length \(L\) must map to an \(L\)-digit number, the search is finite and can be organized by word length.
Mathematical Approach
The efficient viewpoint is to treat the problem as a comparison between two kinds of patterns: sorted-letter signatures for finding anagram classes, and repetition signatures for deciding whether a word can match a square at all. Once those invariants are identified, the remaining work is a finite scan over the relevant square numbers.
Anagram Classes Fix the Search Domain
Sort the letters of each word alphabetically. Words with the same sorted form belong to the same anagram class, and only such classes can contribute a valid pair. If one class contains \(m\) words, then only its \(\binom{m}{2}\) unordered pairs need to be examined.
For a chosen pair \((W_1,W_2)\), the common word length is \(L\). Therefore any candidate square must lie in the interval
$10^{L-1} \le n^2 < 10^L.$
The number of \(L\)-digit squares is
$S_L=\left\lfloor \sqrt{10^L-1} \right\rfloor-\left\lceil \sqrt{10^{L-1}} \right\rceil+1.$
That already cuts the problem down from all digit assignments to a concrete list of square numbers of the correct length.
Repetition Pattern Is the Key Invariant
A valid substitution must be a bijection: the same letter must always receive the same digit, and two different letters cannot share one digit. So the real question is whether the pattern of repeated letters in a word matches the pattern of repeated digits in a square.
Define the pattern signature \(\pi(s)\) of a string \(s=s_1s_2\cdots s_L\) by numbering symbols in order of first appearance. For example,
$\pi(\text{CARE})=(0,1,2,3),\qquad \pi(\text{MEET})=(0,1,1,2),$
and similarly
$\pi(1296)=(0,1,2,3),\qquad \pi(1225)=(0,1,1,2).$
A word \(W\) can match a digit string \(D\) under a bijection if and only if \(\pi(W)=\pi(D)\), together with the extra rule that the leading digit is nonzero. This is the main mathematical filter used by the strongest implementation: squares are pre-grouped by pattern, so a word with pattern \((0,1,1,2)\) never needs to be tested against a square with pattern \((0,1,2,3)\).
Once the First Square Is Chosen, the Second Number Is Forced
Suppose \(W_1\) is matched with some \(L\)-digit square \(D_1\). If that match is valid, then every letter occurring in the anagram pair has been assigned exactly one digit. Because \(W_2\) uses the same multiset of letters, the second digit string \(D_2\) is completely determined by the same bijection; there is no independent second search.
Each candidate test therefore has a precise structure: choose \(D_1\), verify the bijection \(W_1 \leftrightarrow D_1\), construct \(D_2\) by reading the letters of \(W_2\), reject it if it begins with 0, and finally check whether \(D_2\) is also a square. If all those conditions hold, \((D_1,D_2)\) is a valid square anagram pair.
Worked Example: CARE and RACE
The pair CARE/RACE has length \(L=4\), so only four-digit squares are relevant. Those are the squares from \(32^2\) through \(99^2\), because
$1000 \le n^2 \le 9999.$
Take \(D_1=1296=36^2\). Its digit pattern is \((0,1,2,3)\), which matches CARE, so we obtain the bijection
$C\mapsto 1,\qquad A\mapsto 2,\qquad R\mapsto 9,\qquad E\mapsto 6.$
Applying the same map to RACE gives
$RACE \mapsto 9216 = 96^2.$
Both numbers are squares, so this is a genuine solution pair. By contrast, a four-digit square such as \(1225\) fails immediately for CARE because its repeated-digit pattern does not match a word with four distinct letters.
How the Code Works
Grouping the Words
The C++, Python, and Java implementations first parse the quoted word list, normalize it to plain words, and group those words by sorted letters. Any group of size 1 is discarded immediately, while each larger group contributes all unordered anagram pairs.
Preparing Square Candidates
All three implementations organize perfect squares by digit length so that a word of length \(L\) is only tested against \(L\)-digit squares. The C++ implementation goes one step further: it also groups the square strings by repetition pattern and keeps a fast lookup structure for square values. The Python and Java implementations keep the length buckets and validate the mapped second number by an integer-square-root check.
Testing and Updating the Best Answer
For each anagram pair, the implementation iterates over candidate squares of the same length, builds the two-way letter-to-digit and digit-to-letter correspondence, rejects any collision that breaks bijectivity, constructs the mapped value for the partner word, rejects leading zeros, and then verifies that the partner value is square. Whenever both numbers are square, the larger of the two is compared with the current global maximum.
Complexity Analysis
Let \(A_L\) be the number of anagram pairs of length \(L\), and let \(S_L\) denote the number of \(L\)-digit squares. Checking one square against one word pair costs \(O(L)\), because each character position is examined a constant number of times while building and validating the bijection.
The Python and Java implementations therefore run in
$O\!\left(\sum_L A_L S_L L\right),$
with space \(O\!\left(\sum_L S_L\right)\) for the stored square strings plus the word groups. The C++ implementation has the same outer structure but reduces the practical candidate count by restricting each word pattern to the matching square-pattern bucket. In all cases the search is manageable because the number of relevant word lengths is small and \(S_L\) grows only like the width of a square-root interval.
Footnotes and References
- Problem page: https://projecteuler.net/problem=98
- Anagram: Wikipedia - Anagram
- Perfect square: Wikipedia - Square number
- Bijection: Wikipedia - Bijection
- Integer square root: Wikipedia - Integer square root