Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 822: Square the Smallest

Project Euler Problem 822 Solution

EulerSolve provides an optimized solution for Project Euler Problem 822, Square the Smallest, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Start with the list $2,3,\dots,n$. Each operation finds the smallest current entry and replaces it by its square. After $m$ operations we need the sum of all entries modulo $1234567891$. Because $m$ can be enormous, the solution does not simulate every step on the raw integers. Mathematical Approach For each initial base $b\in\{2,\dots,n\}$, let $t_b$ be the number of times that entry has been selected so far. Its current value is then $b^{2^{t_b}}.$ The solution separates two tasks: determining which entry is currently smallest, and evaluating the final values modulo the prime modulus. Step 1: Move the Ordering Problem into Logarithms The logarithm is strictly increasing, so comparing values is equivalent to comparing their logarithms. For the entry that started at $b$, define $\lambda_b=\log\left(b^{2^{t_b}}\right)=2^{t_b}\log b.$ Choosing the smallest current number is therefore the same as choosing the smallest $\lambda_b$. One squaring sends $\lambda_b \longmapsto 2\lambda_b.$ This lets the implementation maintain a min-priority structure on logarithmic keys instead of on astronomically large integers. When two logarithmic keys are equal, the values are equal as well; the implementations use a deterministic secondary order so the data structure stays stable....

Detailed mathematical approach

Problem Summary

Start with the list $2,3,\dots,n$. Each operation finds the smallest current entry and replaces it by its square. After $m$ operations we need the sum of all entries modulo $1234567891$. Because $m$ can be enormous, the solution does not simulate every step on the raw integers.

Mathematical Approach

For each initial base $b\in\{2,\dots,n\}$, let $t_b$ be the number of times that entry has been selected so far. Its current value is then

$b^{2^{t_b}}.$

The solution separates two tasks: determining which entry is currently smallest, and evaluating the final values modulo the prime modulus.

Step 1: Move the Ordering Problem into Logarithms

The logarithm is strictly increasing, so comparing values is equivalent to comparing their logarithms. For the entry that started at $b$, define

$\lambda_b=\log\left(b^{2^{t_b}}\right)=2^{t_b}\log b.$

Choosing the smallest current number is therefore the same as choosing the smallest $\lambda_b$. One squaring sends

$\lambda_b \longmapsto 2\lambda_b.$

This lets the implementation maintain a min-priority structure on logarithmic keys instead of on astronomically large integers. When two logarithmic keys are equal, the values are equal as well; the implementations use a deterministic secondary order so the data structure stays stable.

Step 2: Simulate Explicitly While the Smallest Entry Can Stay Competitive

Write the ordered logarithms at some moment as

$\lambda_1\le \lambda_2\le \cdots \le \lambda_L,\qquad L=n-1.$

Suppose the smallest one satisfies

$2\lambda_1\le \lambda_L.$

After squaring that smallest entry, its new logarithm is still no larger than the current maximum. So it may remain among the small values and can affect the immediate future ordering in a nontrivial way. During this phase the algorithm performs exact heap updates one operation at a time.

The number of such explicit updates is usually far smaller than $m$, because each update doubles one logarithm and quickly pushes that entry upward.

Step 3: Prove That the Balanced Regime Becomes Cyclic

The explicit phase stops once

$2\lambda_1>\lambda_L.$

Now squaring the smallest entry sends it strictly past every untouched entry, because its new logarithm exceeds the current maximum. One step therefore transforms the sorted list into

$\lambda_2,\lambda_3,\dots,\lambda_L,2\lambda_1.$

The relative order of the untouched entries does not change, and the updated entry moves to the end. Repeating the same argument shows that the next selected entry must be $\lambda_2$, then $\lambda_3$, and so on. After one full round of $L$ operations, every entry has been squared exactly once and the ordered list becomes

$2\lambda_1,2\lambda_2,\dots,2\lambda_L.$

The key point is that the same inequality still holds after scaling the entire list by $2$:

$2(2\lambda_1)>2\lambda_L \iff 2\lambda_1>\lambda_L.$

So once this regime begins, the process continues in identical cycles forever.

Step 4: Distribute the Remaining Operations in Bulk

Let $r_{\mathrm{rem}}$ be the number of operations left after the explicit phase. Since the balanced regime proceeds in complete rounds over the $L$ entries, write

$r_{\mathrm{rem}}=qL+s,\qquad 0\le s<L.$

Then every entry is squared $q$ more times, and the first $s$ entries in the current sorted logarithmic order are squared once additional time. Equivalently, if the balanced-order list is

$\lambda_1\le \lambda_2\le \cdots \le \lambda_L,$

then entries $1,2,\dots,s$ receive $q+1$ further squarings, while entries $s+1,\dots,L$ receive $q$. This replaces an astronomically large tail of the process by one division and one remainder.

Step 5: Recover the Final Values Modulo the Prime

If the current residue modulo $M=1234567891$ is $v$, then applying $t$ further squarings produces

$v^{2^t}\pmod{M}.$

Therefore the bulk phase only needs the exponents $2^q$ and $2^{q+1}$. Because $M$ is prime and the initial bases $2,3,\dots,n$ are all nonzero modulo $M$, every later residue is also nonzero modulo $M$. Fermat's little theorem therefore reduces exponents modulo $M-1$:

$a^k\equiv a^{\,k\bmod (M-1)}\pmod{M}\qquad (a\not\equiv 0\pmod{M}).$

So the implementation computes $2^q\bmod (M-1)$, doubles it to obtain $2^{q+1}\bmod (M-1)$, and then evaluates the required modular powers for the final sum.

Worked Example: $(n,m)=(5,3)$

The initial list is $2,3,4,5$. The first three operations are easy to follow directly:

$[2,3,4,5]\to [4,3,4,5]\to [4,9,4,5]\to [16,9,4,5].$

Therefore the required sum is

$16+9+4+5=34,$

which matches the checkpoint used by the implementations.

To see the bulk rule in isolation, suppose the ordered logarithms after the explicit phase are $1.40,1.55,1.80,2.10$. Since $2\cdot 1.40=2.80>2.10$, the next four selections are forced to occur in that same order. If seven operations remain, then

$7=1\cdot 4+3,$

so every entry is squared once more, and the first three receive one extra squaring.

How the Code Works

The C++, Python, and Java implementations store two pieces of information for each entry: a logarithmic key used only for ordering, and the current residue modulo $1234567891$. They initialize a min-priority structure with the values $2,3,\dots,n$ and also track the largest current logarithm.

While doubling the smallest logarithm does not jump beyond the current maximum, the implementation performs one exact update: remove the minimum, double its logarithmic key, square its residue modulo the prime, reinsert it, and refresh the current maximum if needed.

Once the balanced inequality $2\lambda_{\min}>\lambda_{\max}$ holds, the remaining items are extracted and sorted by the same ordering rule. The implementation then computes the quotient $q$ and remainder $s$ of the remaining step count by $L=n-1$, converts those counts into the exponents $2^q$ and $2^{q+1}$ modulo $M-1$, and applies fast modular exponentiation to each stored residue. The first $s$ sorted entries receive the larger exponent, the rest receive the smaller one, and the resulting residues are summed modulo $M$.

Complexity Analysis

Let $L=n-1$, and let $u$ be the number of explicit priority-queue updates performed before the balanced regime begins. The implementations insert $L$ initial items into the priority structure, which costs $O(L\log L)$. The explicit phase then costs $O(u\log L)$, because each update removes and reinserts one element.

After that, the remaining work is one sort of $L$ items, one exponentiation to compute $2^q\bmod(M-1)$, and $L$ modular exponentiations for the final residues. This gives overall running time

$O(u\log L+L\log L+L\log M+\log q),$

which simplifies to $O(u\log L+L\log L)$ when the modulus is treated as fixed. The memory usage is $O(L)$. The important point is that the method avoids any linear dependence on the huge value of $m$ after the short explicit phase.

Footnotes and References

Problem page: https://projecteuler.net/problem=822
Logarithm: Wikipedia — Logarithm
Heap data structure: Wikipedia — Heap (data structure)
Modular exponentiation: Wikipedia — Modular exponentiation
Fermat's little theorem: Wikipedia — Fermat's little theorem