Problem 640: Shut the Box
View on Project EulerProject Euler Problem 640 Solution
EulerSolve provides an optimized solution for Project Euler Problem 640, Shut the Box, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary In this variant of Shut the Box , the cards are labeled \(1\) through \(12\). A roll of two fair six-sided dice produces an ordered outcome \(o=(x,y)\). After seeing that outcome, we may flip card \(x\), card \(y\), or card \(x+y\). Starting from the initial configuration and aiming for the state in which all twelve target bits are set, we want the strategy that minimizes the expected number of rolls. The full game is small enough for exact dynamic programming but too large for hand analysis: there are \(2^{12}=4096\) states, \(36\) ordered dice outcomes, and a decision to make after every observed roll. Mathematical Approach The problem is a finite Markov decision process with one absorbing goal state. The correct value function is the optimal expected number of remaining rolls from each state. Step 1: Encode the State Space Represent the current configuration by a 12-bit mask \(s\in\{0,\dots,2^{12}-1\}\). Bit \(a-1\) records the status of card \(a\). The implementations use $s_0=0,\qquad s^\star=2^{12}-1$ for the initial and target states. The exact interpretation of a bit as “open” or “closed” is irrelevant mathematically; what matters is that \(s_0\) is the start and \(s^\star\) is the absorbing goal....
Detailed mathematical approach
Problem Summary
In this variant of Shut the Box, the cards are labeled \(1\) through \(12\). A roll of two fair six-sided dice produces an ordered outcome \(o=(x,y)\). After seeing that outcome, we may flip card \(x\), card \(y\), or card \(x+y\). Starting from the initial configuration and aiming for the state in which all twelve target bits are set, we want the strategy that minimizes the expected number of rolls.
The full game is small enough for exact dynamic programming but too large for hand analysis: there are \(2^{12}=4096\) states, \(36\) ordered dice outcomes, and a decision to make after every observed roll.
Mathematical Approach
The problem is a finite Markov decision process with one absorbing goal state. The correct value function is the optimal expected number of remaining rolls from each state.
Step 1: Encode the State Space
Represent the current configuration by a 12-bit mask \(s\in\{0,\dots,2^{12}-1\}\). Bit \(a-1\) records the status of card \(a\). The implementations use
$s_0=0,\qquad s^\star=2^{12}-1$
for the initial and target states. The exact interpretation of a bit as “open” or “closed” is irrelevant mathematically; what matters is that \(s_0\) is the start and \(s^\star\) is the absorbing goal.
Step 2: Describe One Move After a Roll
For an observed dice result \(o=(x,y)\), the admissible toggle choices are
$A(o)=\{x,\ y,\ x+y\}\subseteq \{1,\dots,12\}.$
Choosing card \(a\in A(o)\) flips exactly one bit, so the successor state is
$T(s,a)=s\oplus 2^{a-1}.$
Every ordered outcome has probability
$p(o)=\frac{1}{36}.$
If \(x=y\), then the list \(x,y,x+y\) contains a repeated move. That causes no mathematical issue, because repeated candidates do not change the minimum over successor values.
Step 3: Write the Bellman Optimality Equation
Let \(V(s)\) be the optimal expected number of additional rolls needed to reach \(s^\star\). Then
$V(s^\star)=0.$
For every nonterminal state, the player first observes the dice outcome and only then chooses the best toggle. Therefore the minimization is inside the expectation:
$V(s)=1+\sum_{o} p(o)\min_{a\in A(o)} V\bigl(T(s,a)\bigr),\qquad s\ne s^\star.$
This is the key equation of the whole problem. A common mistake would be to choose one action before the roll; that would place the minimum outside the sum and would describe a different game.
Step 4: Evaluate a Fixed Policy
A deterministic policy \(\pi\) assigns one allowed toggle to every pair \((s,o)\):
$\pi(s,o)\in A(o).$
Once \(\pi\) is fixed, its value function satisfies
$V_{\pi}(s^\star)=0,$
$V_{\pi}(s)=1+\sum_{o} p(o)\,V_{\pi}\bigl(T(s,\pi(s,o))\bigr).$
The implementations do not solve this linear system explicitly. Instead they repeatedly apply the fixed-policy update in place until the maximum change becomes tiny. Numerically, that is a relaxation scheme for the policy-specific Bellman equations.
Step 5: Improve the Policy
After evaluating the current policy, each state-outcome pair is improved greedily by choosing the successor with the smallest estimated remaining cost:
$\pi_{\text{new}}(s,o)\in \arg\min_{a\in A(o)} V\bigl(T(s,a)\bigr).$
If no pair \((s,o)\) changes, the policy is already optimal for this finite state space, so the corresponding value function is the desired one. The programs then run one more high-precision evaluation pass to sharpen the final value \(V(s_0)\).
Worked Example: The 4-Card Checkpoint
Before solving the 12-card game, the C++, Python, and Java implementations validate the method on a smaller model with 4 cards and coin outcomes \(x,y\in\{1,2\}\), each with probability \(1/4\). Starting from the empty state \(\varnothing\), the four ordered outcomes are \((1,1)\), \((1,2)\), \((2,1)\), and \((2,2)\).
From \(\varnothing\), the Bellman equation is
$\begin{aligned} V(\varnothing)=1 &+\frac14 \min\bigl(V(\{1\}),V(\{2\})\bigr) \\ &+\frac14 \min\bigl(V(\{1\}),V(\{2\}),V(\{3\})\bigr) \\ &+\frac14 \min\bigl(V(\{1\}),V(\{2\}),V(\{3\})\bigr) \\ &+\frac14 \min\bigl(V(\{2\}),V(\{4\})\bigr). \end{aligned}$
This compactly shows the whole logic: every observed outcome contributes its own best successor, and only then are those contributions averaged. Solving all \(2^4=16\) states gives
$V(\varnothing)=5.673651\ldots$
which is exactly the numerical checkpoint used by the implementation before the full 12-card solve.
How the Code Works
The C++, Python, and Java implementations all follow the same policy-iteration strategy. They enumerate all ordered outcomes, precompute the three candidate toggles for every state-outcome pair, and store a value array over all \(4096\) states.
Policy evaluation repeatedly sweeps through the nonterminal states and replaces the current value by
$1+\sum_o p(o)\,V\bigl(T(s,\pi(s,o))\bigr).$
Policy improvement then checks the three possible successors for each \((s,o)\) and keeps the one with the smallest current value estimate. The outer loop alternates these two phases until no decision changes. After that, a longer final evaluation with a tighter tolerance produces a stable six-decimal answer from the initial state.
The Java program mirrors the C++ numerical logic directly. The Python entry point returns the same result by invoking the same compiled solver, so the mathematical method remains identical across all three language versions.
Complexity Analysis
Let \(S=2^{12}=4096\), let \(O=36\) be the number of ordered dice outcomes, and let \(A=3\) be the maximum number of candidate toggles. One policy-evaluation sweep costs \(O(SO)\), because the current action is already fixed for each state-outcome pair. One policy-improvement sweep costs \(O(SOA)\), because it compares up to three successors for each pair.
Memory usage is \(O(SO)\) for the policy table plus \(O(S)\) for the value array. With these constants, the method is comfortably small: the entire exact solve fits into a modest dynamic-programming loop instead of needing any large symbolic machinery.
Footnotes and References
- Problem page: https://projecteuler.net/problem=640
- Markov decision process: Wikipedia - Markov decision process
- Bellman equation: Wikipedia - Bellman equation
- Policy iteration: Wikipedia - Policy iteration
- Absorbing Markov chain: Wikipedia - Absorbing Markov chain