Problem 107: Minimal Network
View on Project EulerProject Euler Problem 107 Solution
EulerSolve provides an optimized solution for Project Euler Problem 107, Minimal Network, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The network is given as a symmetric adjacency matrix. Each entry is either a positive integer edge weight or the symbol - for "no direct edge". We want to remove as much total weight as possible while keeping all 40 vertices connected. If \(G=(V,E,w)\) is the original weighted graph, the target quantity is the largest possible saving $\operatorname{saving}=W(G)-W(H),\qquad W(X)=\sum_{e\in E(X)} w(e),$ where \(H\) ranges over connected spanning subgraphs of \(G\). The key observation is that an optimal connected network can never contain a cycle. That turns the problem into a minimum spanning tree problem: keep the cheapest possible connected backbone, then subtract its weight from the original total. Mathematical Approach Let \(G=(V,E,w)\) be the weighted undirected graph extracted from the matrix, with \(n=|V|=40\). The derivation has two distinct parts. First we identify what a minimal network must look like. Then we justify the greedy construction used by the implementations. From a minimal network to a spanning tree Suppose \(H\) is a connected spanning subgraph of \(G\) with minimum total weight. If \(H\) contained a cycle, then deleting any one edge from that cycle would leave all vertices on the cycle still connected through the remaining edges of the cycle. So the graph would stay connected but become strictly lighter....
Detailed mathematical approach
Problem Summary
The network is given as a symmetric adjacency matrix. Each entry is either a positive integer edge weight or the symbol - for "no direct edge". We want to remove as much total weight as possible while keeping all 40 vertices connected. If \(G=(V,E,w)\) is the original weighted graph, the target quantity is the largest possible saving
$\operatorname{saving}=W(G)-W(H),\qquad W(X)=\sum_{e\in E(X)} w(e),$
where \(H\) ranges over connected spanning subgraphs of \(G\).
The key observation is that an optimal connected network can never contain a cycle. That turns the problem into a minimum spanning tree problem: keep the cheapest possible connected backbone, then subtract its weight from the original total.
Mathematical Approach
Let \(G=(V,E,w)\) be the weighted undirected graph extracted from the matrix, with \(n=|V|=40\). The derivation has two distinct parts. First we identify what a minimal network must look like. Then we justify the greedy construction used by the implementations.
From a minimal network to a spanning tree
Suppose \(H\) is a connected spanning subgraph of \(G\) with minimum total weight. If \(H\) contained a cycle, then deleting any one edge from that cycle would leave all vertices on the cycle still connected through the remaining edges of the cycle. So the graph would stay connected but become strictly lighter. That contradicts the assumption that \(H\) was already minimal.
Therefore every optimal connected spanning subgraph is acyclic. A connected acyclic graph on \(n\) vertices is a spanning tree, so any optimal network has exactly \(n-1\) edges. The problem is therefore equivalent to
$\text{find a spanning tree }T\subseteq G\text{ minimizing }W(T).$
Any such tree is a minimum spanning tree (MST). Once its weight is known, the saving is simply
$\operatorname{saving}=W(G)-W(T_{\min}).$
Uniqueness is irrelevant here: even if several MSTs exist, they all have the same total weight, so the saving is well defined.
Reading the matrix correctly
Because the graph is undirected, the adjacency matrix is symmetric. The edge between vertices \(i\) and \(j\) appears twice, at positions \((i,j)\) and \((j,i)\). So the correct way to build the graph is to read only the upper triangle \(i<j\). Every non-dash entry contributes one undirected edge and one copy of its weight.
That means the total original weight is
$W(G)=\sum_{0\le i<j<n,\ a_{ij}\ne \text{-}} a_{ij}.$
This formula matches the way the implementations accumulate the original network cost before any edges are removed.
Why Kruskal's greedy rule is correct
List the edges as \(e_1,e_2,\dots,e_m\) in nondecreasing order of weight:
$w(e_1)\le w(e_2)\le \cdots \le w(e_m).$
Kruskal's algorithm scans this list from left to right. An edge is accepted exactly when its endpoints lie in different connected components of the forest built so far; otherwise it would create a cycle and is skipped.
Two invariants drive the proof. First, the chosen edges always form a forest, because no accepted edge ever closes a cycle. Second, after every accepted edge there is still some MST containing all chosen edges. This is the safe-edge statement behind the cut property: when the algorithm selects the lightest edge joining two current components, that edge is a minimum-weight edge crossing the cut determined by either component, so replacing another crossing edge in an MST cannot increase the total weight.
Each accepted edge reduces the number of connected components by one. Starting from \(n\) isolated vertices, after exactly \(n-1\) accepted edges we obtain a connected acyclic graph, hence a spanning tree. By the invariant above, that spanning tree has minimum possible total weight.
The role of the disjoint-set structure
The only dynamic question in Kruskal's algorithm is whether two vertices are already in the same component. A disjoint-set union structure stores the current partition of the vertex set. The operation "find" identifies the representative of a component, and "union" merges two components after a safe edge is accepted.
With path compression and union by rank, the amortized cost per operation is \(O(\alpha(n))\), where \(\alpha\) is the inverse Ackermann function. For a 40-vertex graph this cost is effectively constant, so the dominant work lies in sorting the edges, not in maintaining connectivity.
Worked Example: The 7-vertex sample network
The sample network in the statement has 7 vertices. Reading only the upper triangle gives the 12 edge weights
$11,\ 12,\ 16,\ 17,\ 18,\ 19,\ 20,\ 21,\ 23,\ 27,\ 28,\ 31,$
so the total weight of the original graph is
$W(G)=243.$
Processing these weights in ascending order, Kruskal accepts
$11,\ 12,\ 16,\ 17,\ 18,\ 19,$
because each of them still joins two different components at the moment it is considered. Those six edges already connect all 7 vertices, so they form a spanning tree with weight
$W(T_{\min})=11+12+16+17+18+19=93.$
Every remaining edge would create a cycle and is therefore redundant. The saving is
$243-93=150,$
which is exactly the value quoted in the problem statement and checked by the C++ implementation.
Final formula for the answer
Once the minimum spanning tree has been found, the output is not another graph but a scalar difference:
$\boxed{\operatorname{saving}=W(G)-W(T_{\min}).}$
So the computation naturally splits into two totals: one over all original edges, and one over the \(n-1\) edges that survive in the MST.
How the Code Works
Parsing the weighted network
The C++, Python, and Java implementations read the comma-separated matrix row by row and ignore blank lines. For row \(i\), they inspect only columns \(j>i\). If an entry is -, no edge is added. Otherwise the token is converted to an integer weight, appended to the edge list, and added to the running total \(W(G)\).
This upper-triangle scan is the concrete implementation of the mathematical observation that each undirected edge should be counted exactly once.
Constructing the minimum spanning tree
After the edge list has been built, the implementation sorts it by weight and initializes one disjoint-set component per vertex. It then scans the sorted edges. Whenever an edge connects two different components, that edge is accepted into the spanning tree, its weight is added to the MST total, and the two components are merged.
If both endpoints are already in the same component, the edge would close a cycle and is ignored. The C++ implementation stops as soon as \(n-1\) edges have been accepted. The Python and Java implementations continue scanning the sorted list, but the connectivity test prevents any extra weight from being added after the tree is complete.
Producing the saving
At the end, all three implementations subtract the MST total from the original total. The C++ implementation also includes the 7-vertex sample network as a checkpoint and verifies that the saving for that sample is \(150\) before solving the full instance.
Complexity Analysis
Let \(n\) be the number of vertices and \(m\) the number of edges that actually appear in the network. Parsing the matrix costs \(O(n^2)\), because the input is an \(n\times n\) table. The dominant step in Kruskal's algorithm is sorting the edge list:
$O(m\log m).$
The disjoint-set operations contribute \(O(m\,\alpha(n))\), which is asymptotically smaller than the sorting cost. Space usage is \(O(m+n)\): the edge list stores the graph, and the disjoint-set structure stores one representative and one rank-like value per vertex.
For Problem 107 specifically, \(n=40\), and even a complete graph would have only \(40\cdot 39/2=780\) edges. So the program is easily fast enough; the real content of the problem is the mathematical reduction from "remove redundant weight while preserving connectivity" to "compute an MST and subtract".
Footnotes and References
- Project Euler, Problem 107: https://projecteuler.net/problem=107
- Minimum spanning tree: Wikipedia - Minimum spanning tree
- Spanning tree: Wikipedia - Spanning tree
- Kruskal's algorithm: Wikipedia - Kruskal's algorithm
- Disjoint-set data structure: Wikipedia - Disjoint-set data structure
- Cut (graph theory): Wikipedia - Cut (graph theory)