Problem 212: Combined Volume of Cuboids
View on Project EulerProject Euler Problem 212 Solution
EulerSolve provides an optimized solution for Project Euler Problem 212, Combined Volume of Cuboids, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Problem 212 asks for the volume of the union of \(N=50000\) axis-aligned cuboids. The cuboids are generated deterministically from a lagged Fibonacci sequence, so the input is large but completely fixed. Each cuboid has the half-open form $C_i=[x_i,x_i+d_{x,i})\times[y_i,y_i+d_{y,i})\times[z_i,z_i+d_{z,i}),$ with \(x_i,y_i,z_i\in\{0,\dots,9999\}\) and \(1\le d_{x,i},d_{y,i},d_{z,i}\le 399\). The challenge is the overlap structure: \(50000\) cuboids can intersect in extremely complicated ways, so direct inclusion-exclusion is useless. The successful approach is to exploit the arithmetic structure of the coordinates. Because all coordinates are integers and every \(x\)-length is at most \(399\), the entire 3D problem can be reduced to about \(10^4\) independent 2D union-area problems. Mathematical Approach From the sequence to the cuboids The pseudo-random sequence is $S_k=(100003-200003k+300007k^3)\bmod 10^6\qquad(1\le k\le 55),$ followed by the recurrence $S_k=(S_{k-24}+S_{k-55})\bmod 10^6\qquad(k\ge 56).$ Every cuboid uses six consecutive values: $x_i=S_{6i-5}\bmod 10000,\qquad y_i=S_{6i-4}\bmod 10000,\qquad z_i=S_{6i-3}\bmod 10000,$ $d_{x,i}=1+(S_{6i-2}\bmod 399),\qquad d_{y,i}=1+(S_{6i-1}\bmod 399),\qquad d_{z,i}=1+(S_{6i}\bmod 399).$ The half-open convention matters....
Detailed mathematical approach
Problem Summary
Problem 212 asks for the volume of the union of \(N=50000\) axis-aligned cuboids. The cuboids are generated deterministically from a lagged Fibonacci sequence, so the input is large but completely fixed. Each cuboid has the half-open form
$C_i=[x_i,x_i+d_{x,i})\times[y_i,y_i+d_{y,i})\times[z_i,z_i+d_{z,i}),$
with \(x_i,y_i,z_i\in\{0,\dots,9999\}\) and \(1\le d_{x,i},d_{y,i},d_{z,i}\le 399\).
The challenge is the overlap structure: \(50000\) cuboids can intersect in extremely complicated ways, so direct inclusion-exclusion is useless. The successful approach is to exploit the arithmetic structure of the coordinates. Because all coordinates are integers and every \(x\)-length is at most \(399\), the entire 3D problem can be reduced to about \(10^4\) independent 2D union-area problems.
Mathematical Approach
From the sequence to the cuboids
The pseudo-random sequence is
$S_k=(100003-200003k+300007k^3)\bmod 10^6\qquad(1\le k\le 55),$
followed by the recurrence
$S_k=(S_{k-24}+S_{k-55})\bmod 10^6\qquad(k\ge 56).$
Every cuboid uses six consecutive values:
$x_i=S_{6i-5}\bmod 10000,\qquad y_i=S_{6i-4}\bmod 10000,\qquad z_i=S_{6i-3}\bmod 10000,$
$d_{x,i}=1+(S_{6i-2}\bmod 399),\qquad d_{y,i}=1+(S_{6i-1}\bmod 399),\qquad d_{z,i}=1+(S_{6i}\bmod 399).$
The half-open convention matters. If two cuboids only touch on a face or an edge, that boundary has zero volume and is not double-counted. It also means that every unit box \([x,x+1)\times[y,y+1)\times[z,z+1)\) is either fully inside the union or fully outside it.
Why integer \(x\)-slabs are enough
Let
$X_{\max}=\max_i(x_i+d_{x,i}).$
For each integer \(x\in\{0,1,\dots,X_{\max}-1\}\), consider the slab \([x,x+1)\). Inside that slab, a cuboid contributes exactly when \(x_i\le x<x_i+d_{x,i}\), and then its cross-section is the rectangle
$[y_i,y_i+d_{y,i})\times[z_i,z_i+d_{z,i}).$
If we denote by \(A_x\) the union area of all such active rectangles, then the full volume is exactly
$V=\sum_{x=0}^{X_{\max}-1} A_x,$
where
$A_x=\operatorname{area}\!\Bigg(\bigcup_{i:\,x_i\le x<x_i+d_{x,i}}[y_i,y_i+d_{y,i})\times[z_i,z_i+d_{z,i})\Bigg).$
This is not an approximation. The slab thickness is 1, the active set is constant throughout the slab, and the coordinates are integral, so summing the cross-sectional areas gives the exact 3D volume.
Union area inside one slab
Fix one slab. Suppose its active rectangles are
$R_j=[a_j,b_j)\times[c_j,d_j)\qquad(1\le j\le m).$
We sweep upward in the \(y\)-direction. Each rectangle generates two events:
$ (a_j,\,[c_j,d_j),\,+1),\qquad (b_j,\,[c_j,d_j),\,-1). $
Between two consecutive event heights, the active set of rectangles does not change, so the covered length on the \(z\)-axis is constant there. If that covered \(z\)-length is \(L_z\) on the strip from \(y=u\) to \(y=v\), then the strip contributes
$L_z\,(v-u)$
to the slab area. Summing over all strips gives \(A_x\).
The segment-tree invariant on the \(z\)-axis
The only dynamic quantity during the sweep is the union length of the active \(z\)-intervals. The implementations maintain a segment tree over the integer \(z\)-range \([0,Z_{\max})\), where
$Z_{\max}=\max_i(z_i+d_{z,i}).$
For every node interval \(I=[\ell,r)\), the tree stores a cover count \(c(I)\) and a covered length \(\lambda(I)\). The invariant is
$\lambda(I)= \begin{cases} r-\ell, & c(I)>0,\\ 0, & c(I)=0\text{ and }r-\ell=1,\\ \lambda(I_{\mathrm{left}})+\lambda(I_{\mathrm{right}}), & c(I)=0\text{ and }r-\ell>1. \end{cases}$
If a node is covered by at least one active rectangle, its whole interval is covered. Otherwise, coverage is determined entirely by its children. Therefore the root always stores the current union length on the \(z\)-axis, exactly the quantity needed for the sweep formula.
Worked example
Take two cuboids
$C_1=[0,2)\times[0,3)\times[0,1),\qquad C_2=[1,3)\times[1,4)\times[0,2).$
They occupy the slabs \(x=0,1,2\).
For \(x=0\), only \(C_1\) is active, so \(A_0=3\cdot 1=3\).
For \(x=1\), both are active. The two rectangles in the \((y,z)\)-plane are
$[0,3)\times[0,1),\qquad [1,4)\times[0,2).$
The event heights are \(0,1,3,4\). The covered \(z\)-length is \(1\) on \([0,1)\), then \(2\) on \([1,3)\), then still \(2\) on \([3,4)\). Hence
$A_1=1\cdot(1-0)+2\cdot(3-1)+2\cdot(4-3)=7.$
For \(x=2\), only \(C_2\) is active, so \(A_2=3\cdot 2=6\). Therefore
$V=A_0+A_1+A_2=3+7+6=16.$
This miniature example is exactly the same geometry as the full problem: turn volume into slab areas, then turn each slab area into a sweep over rectangle events.
How the Code Works
The C++, Python, and Java implementations first generate the first \(6N\) sequence values and convert them into \(50000\) half-open cuboids. The C++ and Java versions then use a difference-array pass over the \(x\)-axis to count how many rectangles will be active in each slab, so they can reserve or allocate exact per-slab storage. The Python version directly appends rectangles into slab lists, but it builds the same mathematical object: for each integer \(x\), the full list of \((y,z)\)-rectangles that appear in the slab \([x,x+1)\).
Each slab is processed independently. The implementation builds \(2R_x\) events for a slab with \(R_x\) active rectangles, sorts them by \(y\), and sweeps upward while updating a segment tree on the raw \(z\)-coordinate range. No coordinate compression is needed here because the generated coordinates never exceed roughly \(10^4\), so the direct tree is already small enough. After every gap in the sorted event list, the current covered \(z\)-length at the root is multiplied by the height of the gap and added to the slab area.
Once the slab areas are available, they are simply summed to obtain the final volume. The C++ implementation distributes slabs dynamically across worker threads and reuses one sweep structure per worker. The Python implementation splits the \(x\)-range into blocks and evaluates those blocks in separate worker processes. The Java implementation performs the same slab-by-slab sweep serially. The C++ solution also contains internal validation against small brute-force tests and against the known \(100\)-cuboid checkpoint before running the full instance.
Complexity Analysis
Let \(R_x\) be the number of active rectangles in slab \(x\), and define
$M=\sum_x R_x=\sum_{i=1}^{N} d_{x,i}.$
Because every \(x\)-length satisfies \(d_{x,i}\le 399\), we have the concrete bound \(M\le 399N\). Building the cuboids is \(O(N)\), and materializing all slab lists is \(O(M)\) time and \(O(M)\) memory.
For each slab, the code sorts \(2R_x\) events and performs \(2R_x\) segment-tree updates. This gives total running time
$O\!\left(\sum_x R_x\log R_x + M\log Z_{\max}\right),$
with \(Z_{\max}\le 10398\). The additional working memory for one sweep structure is \(O(Z_{\max})\), so the full memory footprint is \(O(M+WZ_{\max})\) if \(W\) workers are used. In practice this is efficient precisely because the problem's coordinate bounds are small enough to justify explicit slab storage.
Footnotes and References
- Project Euler problem page: https://projecteuler.net/problem=212
- Lagged Fibonacci generator: Wikipedia - Lagged Fibonacci generator
- Sweep line algorithm: Wikipedia - Sweep line algorithm
- Segment tree: Wikipedia - Segment tree
- Klee's measure problem: Wikipedia - Klee's measure problem