Problem 96: Su Doku
View on Project EulerProject Euler Problem 96 Solution
EulerSolve provides an optimized solution for Project Euler Problem 96, Su Doku, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The input consists of 50 standard Sudoku grids. Each grid is a \(9 \times 9\) array with digits \(0\) through \(9\), where \(0\) marks an empty cell. After solving a grid, the required contribution is the three-digit number formed by the first three entries of the top row. If the completed grid is written as \(x_{r,c}\) for \(1 \le r,c \le 9\), then one puzzle contributes $100x_{1,1}+10x_{1,2}+x_{1,3}.$ The final answer is the sum of this quantity over all 50 solved grids. The real work is therefore to complete each partially filled grid while respecting the Sudoku rules: every row, every column, and every \(3 \times 3\) box must contain each digit from 1 to 9 exactly once. Mathematical Approach The implementations treat Sudoku as a finite search problem with a very strict legality invariant. At every recursive step, the current board is a partial completion that already agrees with all row, column, and box constraints....
Detailed mathematical approach
Problem Summary
The input consists of 50 standard Sudoku grids. Each grid is a \(9 \times 9\) array with digits \(0\) through \(9\), where \(0\) marks an empty cell. After solving a grid, the required contribution is the three-digit number formed by the first three entries of the top row.
If the completed grid is written as \(x_{r,c}\) for \(1 \le r,c \le 9\), then one puzzle contributes
$100x_{1,1}+10x_{1,2}+x_{1,3}.$
The final answer is the sum of this quantity over all 50 solved grids. The real work is therefore to complete each partially filled grid while respecting the Sudoku rules: every row, every column, and every \(3 \times 3\) box must contain each digit from 1 to 9 exactly once.
Mathematical Approach
The implementations treat Sudoku as a finite search problem with a very strict legality invariant. At every recursive step, the current board is a partial completion that already agrees with all row, column, and box constraints.
Rows, columns, boxes, and the legality invariant
For a partially filled grid, define the used-digit sets
$R_r=\{x_{r,c}\ne 0:1\le c\le 9\},\qquad C_c=\{x_{r,c}\ne 0:1\le r\le 9\},$
and for each box
$B_b=\{x_{r,c}\ne 0:b(r,c)=b\}.$
A box is identified by
$b(r,c)=3\left\lfloor\frac{r-1}{3}\right\rfloor+\left\lfloor\frac{c-1}{3}\right\rfloor+1.$
The essential invariant is simple: no nonzero digit may appear twice inside the same \(R_r\), the same \(C_c\), or the same \(B_b\). Any recursive branch that would violate this invariant can be discarded immediately.
Candidate sets for an empty cell
For an empty position \((r,c)\), the allowable digits are exactly
$A(r,c)=\{1,2,\dots,9\}\setminus\bigl(R_r\cup C_c\cup B_{b(r,c)}\bigr).$
This set is the mathematical core of the solver. If \(A(r,c)=\varnothing\), the current partial grid cannot be extended to a full solution. If \(|A(r,c)|=1\), the placement is forced. Otherwise the solver must branch on the candidates in \(A(r,c)\).
The C++ implementation stores these used-digit sets as bit patterns, so the candidate set is recovered with bitwise operations. The Python and Java implementations compute the same set more directly by scanning the corresponding row, column, and box whenever they test a trial digit.
The recursive search recurrence
Let \(S\) be a legal partial Sudoku state. If there are no empty cells left, then \(S\) is already a complete solution. Otherwise choose one empty cell \(u\), compute its candidate set \(A(u)\), and try each legal digit in turn. In logical form, the search is
$\operatorname{Solve}(S)=\bigvee_{d\in A(u)} \operatorname{Solve}(S[u\leftarrow d]).$
Here \(S[u\leftarrow d]\) means the new state after writing digit \(d\) into cell \(u\). A branch returns failure as soon as some empty cell has no legal digit left.
The C++ implementation improves this recurrence by choosing the empty cell with the smallest candidate set, which reduces the branching factor. The Python and Java implementations keep a list of empty cells and recurse through that list in order. Both strategies enumerate only legal completions, and backtracking guarantees completeness because every admissible digit assignment is eventually explored unless a solution is found earlier.
Worked Example: why the checkpoint grid begins with 534
One built-in checkpoint uses a nearly complete grid whose first row is
$[0,3,4,6,7,8,9,1,2].$
The missing entry is \(x_{1,1}\). Row 1 is missing only the digit 5. Column 1 contains
$[0,6,1,8,4,7,9,2,3],$
so that column is also missing only 5. The upper-left \(3\times 3\) box contains
$[0,3,4;\ 6,7,2;\ 1,9,8],$
which again excludes every digit except 5. Therefore
$A(1,1)=\{5\},\qquad x_{1,1}=5.$
Once that forced move is made, the top-left three-digit value is
$100\cdot 5+10\cdot 3+4=534.$
This small example shows exactly how the solver reasons: legal digits are determined by row, column, and box exclusions, and forced cells collapse immediately before any deeper branching is needed.
How the Code Works
All three implementations read the 50 grids, solve each grid independently, and add the quantity \(100x_{1,1}+10x_{1,2}+x_{1,3}\) to a running total. The recursion updates the board in place, and a failed branch is undone before trying the next candidate.
Shared structure across the three implementations
Each implementation parses the nine text rows of a puzzle into a \(9 \times 9\) board. Empty cells are represented by zeroes. Once a solution is found, the first three cells of the top row are read and converted into the required three-digit contribution.
What differs between C++, Python, and Java
The C++ implementation first checks that the given clues do not already violate Sudoku constraints. It maintains row, column, and box occupancy masks, recomputes candidate masks efficiently, and at each recursive level chooses the still-empty cell with the fewest legal digits. The Python and Java implementations instead store the empty positions once and recurse through them in order; legality is tested by direct scans of the row, column, and \(3\times 3\) box for each trial digit. Mathematically the search tree is the same kind of backtracking tree, but the C++ version prunes it more aggressively.
Complexity Analysis
If a grid has \(m\) empty cells, the worst-case search tree has branching factor at most 9 and depth \(m\), so the crude upper bound is \(O(9^m)\). That is the unavoidable exponential behavior of generic backtracking on Sudoku constraints.
Inside one recursive step, the bookkeeping is constant-time on a fixed \(9 \times 9\) board. In the Python and Java implementations, checking one tentative digit means scanning at most 9 cells in the row, 9 in the column, and 9 in the box. In the C++ implementation, candidate computation is bitwise, but it also scans the board to find the empty cell with minimum candidate count. Memory usage is \(O(m)\) for the recursion stack, plus the fixed board representation and the auxiliary row, column, and box state.
Because the board size is fixed and there are only 50 puzzles, the practical running time is small even though the worst-case search bound is exponential.
Footnotes and References
- Project Euler problem page: https://projecteuler.net/problem=96
- Sudoku: Wikipedia - Sudoku
- Backtracking: Wikipedia - Backtracking
- Constraint satisfaction problem: Wikipedia - Constraint satisfaction problem
- Sudoku solving methods: Wikipedia - Sudoku solving algorithms