Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 107: Minimal Network

Project Euler Problem 107 Solution

EulerSolve provides an optimized solution for Project Euler Problem 107, Minimal Network, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary The network is given as a symmetric adjacency matrix. Each entry is either a positive integer edge weight or the symbol - for "no direct edge". We want to remove as much total weight as possible while keeping all 40 vertices connected. If $G=(V,E,w)$ is the original weighted graph, the target quantity is the largest possible saving $\operatorname{saving}=W(G)-W(H),\qquad W(X)=\sum_{e\in E(X)} w(e),$ where $H$ ranges over connected spanning subgraphs of $G$. The key observation is that an optimal connected network can never contain a cycle. That turns the problem into a minimum spanning tree problem: keep the cheapest possible connected backbone, then subtract its weight from the original total. Mathematical Approach Let $G=(V,E,w)$ be the weighted undirected graph extracted from the matrix, with $n=|V|=40$. The derivation has two distinct parts. First we identify what a minimal network must look like. Then we justify the greedy construction used by the implementations. From a minimal network to a spanning tree Suppose $H$ is a connected spanning subgraph of $G$ with minimum total weight. If $H$ contained a cycle, then deleting any one edge from that cycle would leave all vertices on the cycle still connected through the remaining edges of the cycle. So the graph would stay connected but become strictly lighter....

Detailed mathematical approach

Problem Summary

The network is given as a symmetric adjacency matrix. Each entry is either a positive integer edge weight or the symbol - for "no direct edge". We want to remove as much total weight as possible while keeping all 40 vertices connected. If $G=(V,E,w)$ is the original weighted graph, the target quantity is the largest possible saving

$\operatorname{saving}=W(G)-W(H),\qquad W(X)=\sum_{e\in E(X)} w(e),$

where $H$ ranges over connected spanning subgraphs of $G$.

The key observation is that an optimal connected network can never contain a cycle. That turns the problem into a minimum spanning tree problem: keep the cheapest possible connected backbone, then subtract its weight from the original total.

Mathematical Approach

Let $G=(V,E,w)$ be the weighted undirected graph extracted from the matrix, with $n=|V|=40$. The derivation has two distinct parts. First we identify what a minimal network must look like. Then we justify the greedy construction used by the implementations.

From a minimal network to a spanning tree

Suppose $H$ is a connected spanning subgraph of $G$ with minimum total weight. If $H$ contained a cycle, then deleting any one edge from that cycle would leave all vertices on the cycle still connected through the remaining edges of the cycle. So the graph would stay connected but become strictly lighter. That contradicts the assumption that $H$ was already minimal.

Therefore every optimal connected spanning subgraph is acyclic. A connected acyclic graph on $n$ vertices is a spanning tree, so any optimal network has exactly $n-1$ edges. The problem is therefore equivalent to

$\text{find a spanning tree }T\subseteq G\text{ minimizing }W(T).$

Any such tree is a minimum spanning tree (MST). Once its weight is known, the saving is simply

$\operatorname{saving}=W(G)-W(T_{\min}).$

Uniqueness is irrelevant here: even if several MSTs exist, they all have the same total weight, so the saving is well defined.

Reading the matrix correctly

Because the graph is undirected, the adjacency matrix is symmetric. The edge between vertices $i$ and $j$ appears twice, at positions $(i,j)$ and $(j,i)$. So the correct way to build the graph is to read only the upper triangle $i<j$. Every non-dash entry contributes one undirected edge and one copy of its weight.

That means the total original weight is

$W(G)=\sum_{0\le i<j<n,\ a_{ij}\ne \text{-}} a_{ij}.$

This formula matches the way the implementations accumulate the original network cost before any edges are removed.

Why Kruskal's greedy rule is correct

List the edges as $e_1,e_2,\dots,e_m$ in nondecreasing order of weight:

$w(e_1)\le w(e_2)\le \cdots \le w(e_m).$

Kruskal's algorithm scans this list from left to right. An edge is accepted exactly when its endpoints lie in different connected components of the forest built so far; otherwise it would create a cycle and is skipped.

Two invariants drive the proof. First, the chosen edges always form a forest, because no accepted edge ever closes a cycle. Second, after every accepted edge there is still some MST containing all chosen edges. This is the safe-edge statement behind the cut property: when the algorithm selects the lightest edge joining two current components, that edge is a minimum-weight edge crossing the cut determined by either component, so replacing another crossing edge in an MST cannot increase the total weight.

Each accepted edge reduces the number of connected components by one. Starting from $n$ isolated vertices, after exactly $n-1$ accepted edges we obtain a connected acyclic graph, hence a spanning tree. By the invariant above, that spanning tree has minimum possible total weight.

The role of the disjoint-set structure

The only dynamic question in Kruskal's algorithm is whether two vertices are already in the same component. A disjoint-set union structure stores the current partition of the vertex set. The operation "find" identifies the representative of a component, and "union" merges two components after a safe edge is accepted.

With path compression and union by rank, the amortized cost per operation is $O(\alpha(n))$, where $\alpha$ is the inverse Ackermann function. For a 40-vertex graph this cost is effectively constant, so the dominant work lies in sorting the edges, not in maintaining connectivity.

Worked Example: The 7-vertex sample network

The sample network in the statement has 7 vertices. Reading only the upper triangle gives the 12 edge weights

$11,\ 12,\ 16,\ 17,\ 18,\ 19,\ 20,\ 21,\ 23,\ 27,\ 28,\ 31,$

so the total weight of the original graph is

$W(G)=243.$

Processing these weights in ascending order, Kruskal accepts

$11,\ 12,\ 16,\ 17,\ 18,\ 19,$

because each of them still joins two different components at the moment it is considered. Those six edges already connect all 7 vertices, so they form a spanning tree with weight

$W(T_{\min})=11+12+16+17+18+19=93.$

Every remaining edge would create a cycle and is therefore redundant. The saving is

$243-93=150,$

which is exactly the value quoted in the problem statement and checked by the C++ implementation.

Final formula for the answer

Once the minimum spanning tree has been found, the output is not another graph but a scalar difference:

$\boxed{\operatorname{saving}=W(G)-W(T_{\min}).}$

So the computation naturally splits into two totals: one over all original edges, and one over the $n-1$ edges that survive in the MST.

How the Code Works

Parsing the weighted network

The C++, Python, and Java implementations read the comma-separated matrix row by row and ignore blank lines. For row $i$, they inspect only columns $j>i$. If an entry is -, no edge is added. Otherwise the token is converted to an integer weight, appended to the edge list, and added to the running total $W(G)$.

This upper-triangle scan is the concrete implementation of the mathematical observation that each undirected edge should be counted exactly once.

Constructing the minimum spanning tree

After the edge list has been built, the implementation sorts it by weight and initializes one disjoint-set component per vertex. It then scans the sorted edges. Whenever an edge connects two different components, that edge is accepted into the spanning tree, its weight is added to the MST total, and the two components are merged.

If both endpoints are already in the same component, the edge would close a cycle and is ignored. The C++ implementation stops as soon as $n-1$ edges have been accepted. The Python and Java implementations continue scanning the sorted list, but the connectivity test prevents any extra weight from being added after the tree is complete.

Producing the saving

At the end, all three implementations subtract the MST total from the original total. The C++ implementation also includes the 7-vertex sample network as a checkpoint and verifies that the saving for that sample is $150$ before solving the full instance.

Complexity Analysis

Let $n$ be the number of vertices and $m$ the number of edges that actually appear in the network. Parsing the matrix costs $O(n^2)$, because the input is an $n\times n$ table. The dominant step in Kruskal's algorithm is sorting the edge list:

$O(m\log m).$

The disjoint-set operations contribute $O(m\,\alpha(n))$, which is asymptotically smaller than the sorting cost. Space usage is $O(m+n)$: the edge list stores the graph, and the disjoint-set structure stores one representative and one rank-like value per vertex.

For Problem 107 specifically, $n=40$, and even a complete graph would have only $40\cdot 39/2=780$ edges. So the program is easily fast enough; the real content of the problem is the mathematical reduction from "remove redundant weight while preserving connectivity" to "compute an MST and subtract".

Footnotes and References

Project Euler, Problem 107: https://projecteuler.net/problem=107
Minimum spanning tree: Wikipedia - Minimum spanning tree
Spanning tree: Wikipedia - Spanning tree
Kruskal's algorithm: Wikipedia - Kruskal's algorithm
Disjoint-set data structure: Wikipedia - Disjoint-set data structure
Cut (graph theory): Wikipedia - Cut (graph theory)