Problem 913: Row-major vs Column-major
View on Project EulerProject Euler Problem 913 Solution
EulerSolve provides an optimized solution for Project Euler Problem 913, Row-major vs Column-major, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For each pair \(2\le n\le m\le 100\), the problem considers a rectangle with side lengths \(n^4\) and \(m^4\). Its cells are listed once in row-major order and once in column-major order, so the reindexing from one list to the other is a permutation of all \((nm)^4\) positions. For each pair, we must find the minimum number of swaps needed to realize that permutation, and then sum those values over all admissible pairs. A direct cycle walk on the full permutation would be hopeless, because the rectangle for one pair already has \(Q=(nm)^4\) cells. The implementations succeed by turning the transpose permutation into modular multiplication and then counting cycles through divisors of \(Q-1\). Mathematical Approach Let the rectangle have \(r=m^4\) rows and \(c=n^4\) columns. Define $Q=rc=(nm)^4,\qquad L=Q-1,\qquad \alpha=r=m^4.$ The entire task becomes a problem about the cycle structure of one modular permutation. From Matrix Indices to a Modular Permutation Take a cell in row \(i\) and column \(j\), with \(0\le i\lt r\) and \(0\le j\lt c\)....
Detailed mathematical approach
Problem Summary
For each pair \(2\le n\le m\le 100\), the problem considers a rectangle with side lengths \(n^4\) and \(m^4\). Its cells are listed once in row-major order and once in column-major order, so the reindexing from one list to the other is a permutation of all \((nm)^4\) positions. For each pair, we must find the minimum number of swaps needed to realize that permutation, and then sum those values over all admissible pairs.
A direct cycle walk on the full permutation would be hopeless, because the rectangle for one pair already has \(Q=(nm)^4\) cells. The implementations succeed by turning the transpose permutation into modular multiplication and then counting cycles through divisors of \(Q-1\).
Mathematical Approach
Let the rectangle have \(r=m^4\) rows and \(c=n^4\) columns. Define
$Q=rc=(nm)^4,\qquad L=Q-1,\qquad \alpha=r=m^4.$
The entire task becomes a problem about the cycle structure of one modular permutation.
From Matrix Indices to a Modular Permutation
Take a cell in row \(i\) and column \(j\), with \(0\le i\lt r\) and \(0\le j\lt c\). Its row-major index is
$x=ic+j,$
while its column-major index is
$y=jr+i.$
Now compute \(rx\):
$rx=r(ic+j)=irc+rj=iQ+rj\equiv i+rj=y \pmod{Q-1}.$
So for every index except the last one, the reindexing permutation satisfies
$\pi(x)\equiv \alpha x \pmod L,\qquad 0\le x\lt L.$
The missing index \(x=L=Q-1\) is an extra fixed point. Because
$L=(nm)^4-1\equiv -1 \pmod m,$
we have \(\gcd(\alpha,L)=1\), so multiplication by \(\alpha\) genuinely permutes the residues modulo \(L\).
If a permutation on \(Q\) elements has \(C\) cycles, then the minimum number of swaps needed to realize it is \(Q-C\), since a cycle of length \(t\) takes exactly \(t-1\) swaps. The problem is therefore reduced to counting cycles of \(\pi\).
Layers Indexed by Additive Order
For a residue \(x\in \mathbb{Z}/L\mathbb{Z}\), define
$d=\frac{L}{\gcd(x,L)}.$
This number \(d\) is the additive order of \(x\) in the cyclic group \(\mathbb{Z}/L\mathbb{Z}\). Multiplication by \(\alpha\) preserves \(\gcd(x,L)\), because \(\alpha\) is invertible modulo \(L\). Therefore it also preserves \(d\). The permutation splits into invariant layers, one layer for each divisor \(d\mid L\).
Exactly \(\varphi(d)\) residues have additive order \(d\): they are the generators of the unique subgroup of size \(d\) inside the cyclic group \(\mathbb{Z}/L\mathbb{Z}\).
Why the Cycle Length Becomes a Multiplicative Order
Write
$g=\gcd(x,L),\qquad x=gu,\qquad L=gd,$
so that \(\gcd(u,d)=1\). Then the condition that \(x\) returns to itself after \(t\) steps is
$\alpha^t x\equiv x \pmod L.$
Substituting \(x=gu\) and \(L=gd\) gives
$gd\mid g(\alpha^t-1)u\iff d\mid(\alpha^t-1)u.$
Because \(u\) is invertible modulo \(d\), this is equivalent to
$\alpha^t\equiv 1 \pmod d.$
So every orbit in the layer indexed by \(d\) has length
$\operatorname{ord}_d(\alpha),$
the multiplicative order of \(\alpha\) modulo \(d\). Since the layer contains \(\varphi(d)\) elements, its number of cycles is
$\frac{\varphi(d)}{\operatorname{ord}_d(\alpha)}.$
The Divisor Sum for the Total Cycle Count
Summing those contributions over all divisors of \(L\) gives the number of cycles inside the modular part:
$C_L=\sum_{d\mid L}\frac{\varphi(d)}{\operatorname{ord}_d(\alpha)},$
with the natural convention \(\operatorname{ord}_1(\alpha)=1\). This already counts the fixed point \(x=0\). The index \(x=L=Q-1\) contributes one more fixed point outside the modular residue set, so the full cycle count is
$C=C_L+1.$
Therefore the swap count for the pair \((n,m)\) is
$\boxed{\text{swaps}=Q-1-\sum_{d\mid L}\frac{\varphi(d)}{\operatorname{ord}_d(\alpha)}.}$
Worked Example: A \(4\times 3\) Rectangle
The actual problem uses fourth powers, but the same mechanism is already visible in a small ordinary rectangle with \(r=4\) and \(c=3\). Then
$Q=12,\qquad L=11,\qquad \alpha=4,$
so on \(0\le x\lt 11\) we have
$\pi(x)\equiv 4x \pmod{11}.$
The only divisors of \(11\) are \(1\) and \(11\), hence
$C_{11}=1+\frac{\varphi(11)}{\operatorname{ord}_{11}(4)}=1+\frac{10}{5}=3.$
Indeed, the residues split as
$0,$
$1\to 4\to 5\to 9\to 3\to 1,$
$2\to 8\to 10\to 7\to 6\to 2,$
and the twelfth position \(x=11\) is the extra fixed point. So the total cycle count is \(4\), and the minimum number of swaps is
$12-4=8.$
This is the same small check used by the implementations to confirm the formula.
Factoring \(s^4-1\) Without Touching the Matrix
Set \(s=nm\). Then
$L=s^4-1=(s-1)(s+1)(s^2+1).$
Because \(s\le 10^4\), the three factors \(s-1\), \(s+1\), and \(s^2+1\) are moderate-size integers. Factoring those three numbers is enough to recover the complete prime factorization of \(L\).
For each prime power \(p^k\mid L\), the order satisfies
$\operatorname{ord}_{p^k}(\alpha)\mid \varphi(p^k)=p^{k-1}(p-1).$
The exact order is obtained by starting from \(\varphi(p^k)\) and repeatedly removing prime factors whenever the modular test
$\alpha^{\,\varphi(p^k)/q}\equiv 1 \pmod{p^k}$
still holds.
If
$d=\prod_i p_i^{e_i},$
then multiplicativity and the Chinese remainder theorem give
$\varphi(d)=\prod_i \varphi(p_i^{e_i}),\qquad \operatorname{ord}_d(\alpha)=\operatorname{lcm}_i\operatorname{ord}_{p_i^{e_i}}(\alpha).$
So a depth-first enumeration of the exponent choices \(0\le e_i\le v_{p_i}(L)\) visits every divisor \(d\mid L\) and accumulates its cycle contribution exactly, without ever traversing the \(Q\) entries of the rectangle.
How the Code Works
The C++, Python, and Java implementations follow the same number-theoretic pipeline.
Reusable Factorization Data
A prime sieve is built once. For a given pair \((n,m)\), the implementation forms \(s=nm\), factors \(s-1\), \(s+1\), and \(s^2+1\), and merges those exponents into the factorization of \(L=s^4-1\). Because different pairs can share the same product \(s\), this factorization is cached and reused. Factorizations of numbers of the form \(p-1\) are also cached, because they are needed repeatedly when reducing \(\varphi(p^k)\) to the exact order \(\operatorname{ord}_{p^k}(\alpha)\).
Per-Pair Evaluation
For each prime power \(p^k\mid L\), the implementation precomputes \(\varphi(p^k)\) and \(\operatorname{ord}_{p^k}(\alpha)\). It then runs a divisor DFS: each recursion level chooses one exponent for one prime, multiplies the running totient contribution, updates the running order with an \(\operatorname{lcm}\), and descends to the next prime. At every leaf, the divisor contribution
$\frac{\varphi(d)}{\operatorname{ord}_d(\alpha)}$
is added to the cycle total \(C_L\). The returned swap count is therefore \(Q-1-C_L\).
The outer loops sum this value over all \(2\le n\le m\le 100\). The implementations also compare the divisor formula against explicit cycle decompositions on small cases; for example, the ordinary \(3\times 4\) rectangle gives \(8\) swaps exactly, matching the theory above.
Complexity Analysis
For one pair \((n,m)\), the algorithm never iterates over all \(Q=(nm)^4\) positions. Its work is concentrated in three places: factoring \(s-1\), \(s+1\), and \(s^2+1\); computing multiplicative orders on the relevant prime powers; and enumerating the \(\tau(L)\) divisors of \(L\) with a DFS over prime exponents. In practice that is far smaller than a brute-force cycle walk on the full transpose permutation.
Memory usage is modest. The main data are the prime list, the caches for repeated factorizations, and the prime-power tables for the current pair. The method is efficient precisely because it manipulates the factorization of \(L=(nm)^4-1\), not the \((nm)^4\) matrix entries themselves.
Footnotes and References
- Problem page: https://projecteuler.net/problem=913
- Row-major and column-major order: Wikipedia - Row- and column-major order
- In-place matrix transposition: Wikipedia - In-place matrix transposition
- Euler's totient function: Wikipedia - Euler's totient function
- Multiplicative order: Wikipedia - Multiplicative order
- Chinese remainder theorem: Wikipedia - Chinese remainder theorem