Problem 79: Passcode Derivation
View on Project EulerProject Euler Problem 79 Solution
EulerSolve provides an optimized solution for Project Euler Problem 79, Passcode Derivation, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary We are given fifty successful three-digit login attempts. In each attempt, the three digits appear in the same relative order as they do in the unknown passcode, but they need not be adjacent in the passcode itself. The goal is to recover the shortest passcode compatible with every observed attempt. The implementations solve this by turning the keylog into an order relation on digits. Once that relation is known, the passcode is obtained by listing the digits in an order that respects every forced precedence constraint. Mathematical Approach Let the attempts be \((a_i,b_i,c_i)\) for \(i=1,\dots,50\), and let \(V\subseteq\{0,1,\dots,9\}\) be the set of digits that appear anywhere in the keylog. We build a directed graph \(G=(V,E)\) whose edges express compulsory precedence relations between digits. From Login Attempts to a Directed Graph Each observed triple \(abc\) means exactly that \(a\) must appear before \(b\), and \(b\) must appear before \(c\). The implementations therefore insert the directed edges $a \to b,\qquad b \to c.$ Nothing more is needed. The relation \(a\) before \(c\) is already implied by transitivity: if every valid passcode places \(a\) before \(b\) and \(b\) before \(c\), then it automatically places \(a\) before \(c\). So the code stores the immediate constraints and lets graph reachability represent the derived ones....
Detailed mathematical approach
Problem Summary
We are given fifty successful three-digit login attempts. In each attempt, the three digits appear in the same relative order as they do in the unknown passcode, but they need not be adjacent in the passcode itself. The goal is to recover the shortest passcode compatible with every observed attempt.
The implementations solve this by turning the keylog into an order relation on digits. Once that relation is known, the passcode is obtained by listing the digits in an order that respects every forced precedence constraint.
Mathematical Approach
Let the attempts be \((a_i,b_i,c_i)\) for \(i=1,\dots,50\), and let \(V\subseteq\{0,1,\dots,9\}\) be the set of digits that appear anywhere in the keylog. We build a directed graph \(G=(V,E)\) whose edges express compulsory precedence relations between digits.
From Login Attempts to a Directed Graph
Each observed triple \(abc\) means exactly that \(a\) must appear before \(b\), and \(b\) must appear before \(c\). The implementations therefore insert the directed edges
$a \to b,\qquad b \to c.$
Nothing more is needed. The relation \(a\) before \(c\) is already implied by transitivity: if every valid passcode places \(a\) before \(b\) and \(b\) before \(c\), then it automatically places \(a\) before \(c\). So the code stores the immediate constraints and lets graph reachability represent the derived ones.
Because the digits are stored in sets, repeated appearances of the same precedence relation do not change the graph. The mathematical object is therefore a finite directed graph on the participating digits, not a multigraph weighted by frequency.
The Partial Order Hidden in the Keylog
Write \(u \prec v\) if there is a directed path from \(u\) to \(v\) in \(G\). This reachability relation is the transitive closure of the recorded edges. When the keylog is consistent, it defines a partial order: some pairs of digits are forced into one order, while others may remain incomparable until more constraints are considered.
A candidate passcode \(P\) is valid exactly when every edge of \(G\) is respected by the left-to-right order inside \(P\). Indeed, if \(P\) respects every edge, then for each attempt \(abc\) we have \(a\) before \(b\) and \(b\) before \(c\), so \(abc\) occurs as a subsequence of \(P\). Conversely, any passcode containing every attempt as a subsequence must certainly respect all inserted edges. Thus the problem is equivalent to finding a linear extension of this partial order.
Why a Topological Order Gives the Shortest Passcode
A topological ordering of \(G\) is a listing of the vertices in which every edge points from an earlier digit to a later one. By the previous observation, such an ordering is automatically a valid passcode.
It is also shortest in the model used by the implementations. Every digit in \(V\) must appear at least once, otherwise any login attempt containing that digit would be impossible. A topological order uses each participating digit exactly once, so its length is \(|V|\). No shorter string can exist, because a string of length less than \(|V|\) cannot even contain all distinct required digits.
So the problem is reduced to finding a topological ordering of \(G\).
The Zero In-Degree Invariant
The implementations use Kahn's algorithm. For a remaining vertex \(v\), let \(\operatorname{indeg}(v)\) be the number of incoming edges from vertices not yet written into the passcode. If \(\operatorname{indeg}(v)=0\), then no unresolved constraint forces another remaining digit to appear before \(v\). Such a vertex is safe to place next.
The key invariant is: after writing some prefix of the passcode and deleting those vertices together with their outgoing edges, the residual graph describes exactly the precedence constraints that still matter for the unfinished suffix. Therefore repeatedly removing a zero in-degree vertex preserves correctness.
If the residual graph ever has no zero in-degree vertex while digits remain, then there is a directed cycle and the data are inconsistent. If there is exactly one zero in-degree choice at every step, the passcode is uniquely determined. If several choices are available, then several shortest passcodes are compatible unless a tie-breaking rule is imposed.
Worked Example from the Sanity Check
The C++ implementation includes the miniature keylog
$123,\qquad 135,\qquad 245.$
Its vertex set is \(V=\{1,2,3,4,5\}\), and the directly stored edges are
$1\to 2,\quad 2\to 3,\quad 1\to 3,\quad 3\to 5,\quad 2\to 4,\quad 4\to 5.$
The initial in-degrees are
$\operatorname{indeg}(1)=0,\ \operatorname{indeg}(2)=1,\ \operatorname{indeg}(3)=2,\ \operatorname{indeg}(4)=1,\ \operatorname{indeg}(5)=2.$
So the first digit must be \(1\). After removing \(1\), digit \(2\) becomes available. After removing \(2\), both \(3\) and \(4\) have in-degree zero. That means the constraints no longer distinguish between those two positions. The ordered implementations choose the smaller available digit first, so they continue with \(3\), then \(4\), and finally \(5\), producing
$12345.$
This example illustrates both the mathematics and the implementation detail: the graph guarantees validity, and an ordered container makes the output deterministic when the partial order is not yet total.
How the Code Works
Collect Digits and Deduplicate Edges
The C++, Python, and Java implementations read each three-character attempt, mark its digits as present, and add directed edges only between consecutive positions in the attempt. Because adjacency sets are used, duplicate precedence relations are collapsed automatically.
Compute In-Degrees and Repeatedly Remove Available Digits
Next, the implementations compute an in-degree count for every participating digit. Any digit with in-degree zero can be emitted immediately. After emitting such a digit, the implementation deletes its outgoing edges by decrementing the in-degrees of its successors; any successor whose count drops to zero becomes newly available.
This is exactly Kahn's topological-sort procedure. The C++ version keeps currently available digits in a min-priority queue. The Java version builds the graph with sorted digit sets and processes available digits through a queue. The Python version uses a deque and inserts outgoing neighbors in sorted order. All three are performing the same mathematical elimination process.
Form the Passcode String
The digits are appended in the order they are removed. If the graph is acyclic and all participating digits are processed, the resulting string is a passcode consistent with every attempt. Because each digit is written once, the produced string is also shortest under this precedence-graph model.
Complexity Analysis
Let \(A=50\) be the number of login attempts. Each attempt has length \(3\), so scanning the input takes \(O(A)\) time with tiny constants, and at most \(2A\) direct edges are inserted before deduplication. Since the vertices are decimal digits, we always have \(|V|\le 10\).
After the graph is built, Kahn's algorithm runs in \(O(|V|+|E|)\) time with a plain queue. The ordered extraction used by the C++ version changes this to \(O((|V|+|E|)\log |V|)\), but here \(|V|\le 10\), so the distinction is negligible. Memory usage is \(O(|V|+|E|)\).
Footnotes and References
- Problem page: https://projecteuler.net/problem=79
- Topological sorting: Wikipedia - Topological sorting
- Directed acyclic graph: Wikipedia - Directed acyclic graph
- Partially ordered set: Wikipedia - Partially ordered set
- Transitive closure: Wikipedia - Transitive closure
- Shortest common supersequence problem: Wikipedia - Shortest common supersequence problem