Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 701: Random Connected Area

View on Project Euler

Project Euler Problem 701 Solution

EulerSolve provides an optimized solution for Project Euler Problem 701, Random Connected Area, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary We consider a random \(W\times H\) binary grid, with the Project Euler instance using \(W=H=7\). Each cell is occupied independently with probability \(1/2\), and occupied cells are connected by 4-neighbor adjacency. If \(L(B)\) denotes the size of the largest occupied connected component of a board \(B\), the task is to compute the exact expectation \(\mathbb E[L(B)]\). A brute-force search would inspect all \(2^{WH}\) boards, which is impossible for \(7\times 7\). The successful idea is to scan the grid one column at a time and keep only the information that can still influence future growth. Mathematical Approach Step 1: Rewrite the expectation as an exact counting problem Let $\Omega_{W,H}=\{0,1\}^{W\times H}.$ For each board \(B\in\Omega_{W,H}\), let \(\mathcal C(B)\) be the set of occupied 4-connected components of \(B\), and define $L(B)=\max_{C\in\mathcal C(B)} \lvert C\rvert,$ with the convention \(L(B)=0\) when the board has no occupied cells. Since every board is equally likely, $\mathbb E[L]=\frac{1}{2^{WH}}\sum_{B\in\Omega_{W,H}} L(B).$ So the whole problem is to sum the largest-component size over all boards exactly, then divide by \(2^{WH}\). Step 2: Keep only the frontier information that matters Suppose the first \(c\) columns have already been processed....

Detailed mathematical approach

Problem Summary

We consider a random \(W\times H\) binary grid, with the Project Euler instance using \(W=H=7\). Each cell is occupied independently with probability \(1/2\), and occupied cells are connected by 4-neighbor adjacency. If \(L(B)\) denotes the size of the largest occupied connected component of a board \(B\), the task is to compute the exact expectation \(\mathbb E[L(B)]\).

A brute-force search would inspect all \(2^{WH}\) boards, which is impossible for \(7\times 7\). The successful idea is to scan the grid one column at a time and keep only the information that can still influence future growth.

Mathematical Approach

Step 1: Rewrite the expectation as an exact counting problem

Let

$\Omega_{W,H}=\{0,1\}^{W\times H}.$

For each board \(B\in\Omega_{W,H}\), let \(\mathcal C(B)\) be the set of occupied 4-connected components of \(B\), and define

$L(B)=\max_{C\in\mathcal C(B)} \lvert C\rvert,$

with the convention \(L(B)=0\) when the board has no occupied cells. Since every board is equally likely,

$\mathbb E[L]=\frac{1}{2^{WH}}\sum_{B\in\Omega_{W,H}} L(B).$

So the whole problem is to sum the largest-component size over all boards exactly, then divide by \(2^{WH}\).

Step 2: Keep only the frontier information that matters

Suppose the first \(c\) columns have already been processed. Any connected component that does not touch column \(c\) can never grow again, because all future cells lie strictly to the right. Therefore closed components only matter through a single number: the largest closed size seen so far.

Now look at column \(c\). Its occupied cells form vertical runs such as \(11100 11\mapsto (111)\) and \((11)\). Let these runs be \(R_1,\dots,R_t\). Two different runs in the same column may already belong to the same global component because earlier columns can connect them around the gaps. Hence the frontier is not described only by the mask of occupied cells; we must also know which runs belong together.

The state can therefore be written as

$\sigma=(b,\Pi,a_1,\dots,a_r),$

where:

Here \(b\) denotes the largest closed component size seen so far.

\(\Pi\) is a partition of the runs \(R_1,\dots,R_t\), and each block of \(\Pi\) corresponds to one open component touching the frontier,

For \(1\le j\le r\), \(a_j\) denotes the total size of the \(j\)-th open component.

This is sufficient because future columns can interact with the processed region only through those open components.

Step 3: Append one new column

Let the next column be the bitmask \(m\in\{0,1\}^W\). Its occupied cells also decompose into vertical runs, say \(S_1,\dots,S_u\).

An old open component touches the new column exactly at rows where both adjacent columns contain an occupied cell. Once one cell of a run \(S_k\) is touched, the whole run belongs to the same new connected piece, because cells inside one vertical run are already connected within the new column.

This produces three possibilities:

First, an old open component may touch nothing in the new column. Then it becomes closed forever and can only update the best closed value.

Second, one old component may touch one or more runs in the new column. Then it survives and its size grows by the number of newly occupied cells attached to it.

Third, several old components may touch the same run, or runs that become linked inside the new column. Then those components merge into one new open component.

If \(\mathcal D\) is the set of old open components that disappear, then

$b'=\max\!\Bigl(b,\max_{j\in\mathcal D} a_j\Bigr).$

For each new open component \(T\), let \(\mathcal A(T)\) be the set of old components that feed into it. Its new size is

$a'(T)=\lvert T\rvert+\sum_{j\in\mathcal A(T)} a_j,$

where \(\lvert T\rvert\) counts only the occupied cells in the newly appended column.

Step 4: Why partitions of runs are enough

Inside a single column, connectivity is completely determined by vertical runs. Cells in the same run are automatically connected, while cells in different runs are separated inside that column. The only missing information is whether two distinct runs are already linked through earlier columns. That is exactly what the partition \(\Pi\) records.

This compression is powerful because a width-7 column has at most four occupied runs, for example \(1010101\). So for each column mask there are only finitely many possible partitions of those runs, and all of them can be precomputed once. The dynamic program then works with canonical frontier patterns instead of full partial boards.

Step 5: Worked example

Take width \(W=3\). Suppose the first processed column is \(101\). Then there are two runs:

$R_1=\{1\},\qquad R_2=\{3\}.$

No earlier column exists, so they are separate open components of sizes \(1\) and \(1\). The state is

$\sigma=\bigl(0,\{\{R_1\},\{R_2\}\},1,1\bigr).$

Now append the next column \(111\). In the new column there is one run

$S_1=\{1,2,3\}.$

The old component at row \(1\) touches \(S_1\), and the old component at row \(3\) also touches \(S_1\). Because those contacts lie in the same vertical run, the whole run becomes one connected bridge, so the two old components merge. The new open component has size

$1+1+3=5.$

The new state is therefore

$\sigma'=\bigl(0,\{\{S_1\}\},5\bigr).$

If the following column were \(000\), that size-5 component would lose frontier contact, close permanently, and update the best closed value to \(5\).

Step 6: Recover the expectation from state counts

Let \(w_c(\sigma)\) be the number of \(c\)-column prefixes that produce state \(\sigma\). The transition defined above gives

$w_{c+1}(\sigma')=\sum_{\sigma}\sum_{\substack{m\in\{0,1\}^W\\ T_m(\sigma)=\sigma'}} w_c(\sigma),$

where \(T_m\) is the deterministic update induced by the next column mask \(m\). The dynamic program stores these weights as exact integers, not floating-point probabilities.

After all \(H\) columns have been processed, every terminal state \(\sigma=(b,\Pi,a_1,\dots,a_r)\) contributes

$\operatorname{score}(\sigma)=\max(b,a_1,\dots,a_r)$

to the largest connected area of each board counted by that state. Hence

$\mathbb E[L]=\frac{1}{2^{WH}}\sum_{\sigma\in\mathcal T} w_H(\sigma)\operatorname{score}(\sigma),$

where \(\mathcal T\) is the set of terminal states. This is exactly the quantity evaluated by the implementation.

How the Code Works

The implementation first enumerates every possible column mask, decomposes each mask into vertical runs, and precomputes the canonical partitions of those runs. It also precomputes how an old frontier pattern interacts with a new column: which old components survive, which ones merge, which ones disappear, and how many new occupied cells are absorbed into each surviving component.

The dynamic program then advances one column at a time. A state is keyed by the frontier partition together with the accumulated open-component sizes and the current best closed size. When a new column is appended, the implementation applies the precomputed connectivity rule, creates the resulting state, and merges identical target states by adding their integer multiplicities.

Because every column mask is equally likely, the C++, Python, and Java implementations count boards rather than carrying fractions at each step. Only at the very end do they divide by \(2^{WH}\). The Python and Java implementations use the same exact underlying solver as the C++ implementation, so all three languages follow identical transitions and produce the same final value.

The implementation also uses horizontal reflection on the penultimate layer: mirror-image masks contribute equally to the final expectation, so non-symmetric representatives can be counted once with multiplicity \(2\). This reduces the amount of final work without changing the exact result.

Complexity Analysis

Let \(P\) be the total number of canonical frontier partitions across all column masks, and let \(S_c\) be the number of reachable compressed states after \(c\) columns. Precomputing mask-to-mask contact information costs \(O(4^W\cdot W)\), and building the partition-transition tables costs \(O(P\cdot 2^W)\).

The main dynamic program processes each reachable state against each possible next-column mask, so its time complexity is

$O\!\left(\sum_{c=0}^{H-1} 2^W S_c\right).$

The memory usage is

$O(S_c+P\cdot 2^W)$

for the active DP layers and the precomputed transition data. For the \(7\times 7\) instance this is practical, while direct enumeration of all \(2^{49}\) boards is not.

Footnotes and References

  1. Problem page: https://projecteuler.net/problem=701
  2. Connected components in graph theory: Wikipedia — Connected component
  3. Transfer-matrix method: Wikipedia — Transfer-matrix method
  4. Bell numbers and set partitions: Wikipedia — Bell number
  5. Dynamic programming: Wikipedia — Dynamic programming

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 700 · All Project Euler solutions · Next: Problem 702

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮