Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 253: Tidying Up A

Project Euler Problem 253 Solution

EulerSolve provides an optimized solution for Project Euler Problem 253, Tidying Up A, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary We start with 40 occupied cells in a row. At each step, one occupied cell is chosen uniformly at random and removed. The occupied cells always form a disjoint union of contiguous segments. If $C_t$ denotes the number of occupied segments after the $t$-th removal, the goal is to compute $\mathbb{E}\!\left[\max_t C_t\right].$ The key difficulty is that the process has $40!$ possible removal orders, so a direct permutation average is infeasible. The code solves the problem exactly by compressing configurations to segment lengths and then applying a memoized expectation recursion. Mathematical Approach Reverse-Time Viewpoint Every deletion history is just a permutation of the 40 positions. If we reverse that permutation and insert cells instead of deleting them, we obtain the same sequence of occupied sets in reverse order. Therefore the maximum number of segments is identical in the forward deletion process and in the reverse insertion process. This observation is not needed for the main DP, but it explains why the brute-force validator in the C++ file can enumerate insertion permutations for small $n$. State Compression by Segment Lengths A configuration does not need absolute coordinates. Only the lengths of the current occupied segments matter....

Detailed mathematical approach

Problem Summary

We start with 40 occupied cells in a row. At each step, one occupied cell is chosen uniformly at random and removed. The occupied cells always form a disjoint union of contiguous segments. If $C_t$ denotes the number of occupied segments after the $t$-th removal, the goal is to compute

$\mathbb{E}\!\left[\max_t C_t\right].$

The key difficulty is that the process has $40!$ possible removal orders, so a direct permutation average is infeasible. The code solves the problem exactly by compressing configurations to segment lengths and then applying a memoized expectation recursion.

Mathematical Approach

Reverse-Time Viewpoint

Every deletion history is just a permutation of the 40 positions. If we reverse that permutation and insert cells instead of deleting them, we obtain the same sequence of occupied sets in reverse order. Therefore the maximum number of segments is identical in the forward deletion process and in the reverse insertion process.

This observation is not needed for the main DP, but it explains why the brute-force validator in the C++ file can enumerate insertion permutations for small $n$.

State Compression by Segment Lengths

A configuration does not need absolute coordinates. Only the lengths of the current occupied segments matter. So the state is stored as a sorted vector

$S=(\ell_1,\ell_2,\dots,\ell_m),\qquad 1\le \ell_1\le \ell_2\le \cdots \le \ell_m,$

where $m=|S|$ is the current number of segments and

$R(S)=\ell_1+\ell_2+\cdots+\ell_m$

is the number of occupied cells still remaining.

The order of segments is irrelevant: states such as $(1,3)$ and $(3,1)$ have the same future law because each segment evolves independently except for its length, and deletions never reconnect separated segments. In other words, a state is just an integer partition of the remaining occupied cells.

Single-Step Transitions

Consider one segment of length $L$.

If $L=1$, deleting its only cell removes that segment completely.

If $L\ge 2$, there are two qualitatively different removals:

1. Deleting an endpoint leaves one segment of length $L-1$. There are exactly 2 such cells.

2. Deleting an interior cell splits the segment into two smaller segments. If the removed cell leaves $a$ cells on the left and $b$ cells on the right, then

$a+b=L-1,\qquad a\ge 1,\quad b\ge 1.$

Equivalently, for each

$a=1,2,\dots,L-2,$

we obtain the split

$\bigl(a,\;L-1-a\bigr).$

The implementation removes one copy of $L$ from the sorted state vector and inserts the new lengths back in sorted order.

Aggregated Transition Weights

If the same segment length appears several times, the corresponding transitions are multiplied by that multiplicity. Suppose length $L$ occurs $c_L$ times in the state. Then every transition caused by deleting a cell from a segment of length $L$ receives an extra factor $c_L$.

The code aggregates all identical next states into an integer weight

$w(S\to S').$

Because each occupied cell is equally likely to be removed, the transition probability is simply

$\Pr(S\to S')=\frac{w(S\to S')}{R(S)}.$

The total outgoing weight is always

$\sum_{S'} w(S\to S')=R(S),$

because each of the $R(S)$ occupied cells contributes exactly one deletion outcome.

Expected Maximum Recurrence

Let

$E(S,M)$

denote the expected final value of the maximum segment count, assuming the current compressed state is $S$ and the largest segment count seen so far is $M$.

The base case is immediate:

$E(\varnothing,M)=M.$

If $S\neq\varnothing$, condition on the next deletion. Then

$E(S,M)=\sum_{S'} \Pr(S\to S')\;E\bigl(S',\max(M,|S'|)\bigr).$

Using the aggregated weights, this becomes

$E(S,M)=\frac{1}{R(S)}\sum_{S'} w(S\to S')\;E\bigl(S',\max(M,|S'|)\bigr).$

This is the exact recurrence implemented in all three languages.

Worked Example: Starting from 3 Cells

For the initial state $S=(3)$, there are three equally likely deletions. Removing either endpoint gives the state $(2)$, while removing the middle cell gives $(1,1)$. Hence

$E((3),1)=\frac{2}{3}E((2),1)+\frac{1}{3}E((1,1),2).$

Now $(2)$ can only go to $(1)$, so

$E((2),1)=1.$

And from $(1,1)$, deleting one singleton leaves $(1)$ while the maximum segment count has already reached 2, so

$E((1,1),2)=2.$

Therefore

$E((3),1)=\frac{2}{3}\cdot 1+\frac{1}{3}\cdot 2=\frac{4}{3}.$

This matches the brute-force average over the $3!=6$ possible deletion orders.

Why Memoization Is Exact

The process is Markovian after compression: once we know the multiset of current segment lengths and the best segment count seen so far, the future no longer depends on the detailed history. Thus the pair $(S,M)$ is a sufficient state description.

The recursion is exact because it is just the law of total expectation applied to the first random deletion. Memoization merely avoids recomputing the same subproblem; it does not introduce approximation.

Validation Checkpoint

The C++ code explicitly validates the recurrence against brute force for small sizes:

$E((6),1)=2.255555555556,$

$E((7),1)=2.544444444444.$

Both values agree with exhaustive enumeration of all $6!$ and $7!$ insertion permutations, which is a strong sanity check before solving the full $40$-cell problem.

How the Code Works

The helper make_next constructs a new sorted state after removing one segment length and inserting the child lengths. The function get_transitions scans equal segment lengths together, aggregates identical next states in a hash map, and stores the resulting list of (nextState, weight) pairs. The function expected_max memoizes the recurrence above; in C++ each compressed state owns a vector of length 41, because the running maximum can only be between 0 and 40. Finally, the program starts from the one-segment state

$S_0=(40)$

with initial maximum $M=1$, after first checking the DP against brute force for $n=6$ and $n=7$.

Complexity Analysis

A pure shape state with $r$ remaining cells is an integer partition of $r$. Therefore the number of possible compressed shape states is at most

$\sum_{r=0}^{40} p(r)=215308,$

where $p(r)$ is the partition function and $p(40)=37338$. This is dramatically smaller than $40!$ deletion orders.

For one fixed state $S$, the number of raw deletion outcomes is exactly $R(S)$, and after aggregation the number of distinct next states is still $O(R(S))$. Memoization ensures that each pair $(S,M)$ is solved only once, so the actual runtime is governed by the number of reachable compressed states and their aggregated transitions rather than by permutations.

Memory usage is proportional to the cached transition table plus the memo table for $(S,M)$.

Footnotes and References

Problem page: https://projecteuler.net/problem=253
Markov chains: Wikipedia — Markov chain
Dynamic programming and memoization: cp-algorithms — Introduction to Dynamic Programming
Integer partitions: Wikipedia — Partition (number theory)
Law of total expectation: Wikipedia — Law of total expectation