Problem 941: de Bruijn's Combination Lock
View on Project EulerProject Euler Problem 941 Solution
EulerSolve provides an optimized solution for Project Euler Problem 941, de Bruijn's Combination Lock, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For each \(i=1,\dots,N\), a linear congruential generator produces $a_i \equiv 920461\,a_{i-1}+800217387569 \pmod{10^{12}}, \qquad a_0=0.$ Each value is written as a 12-digit decimal string with leading zeros, and every digit \(d\in\{0,\dots,9\}\) is shifted to \(d+1\). That turns the number into a word \(w_i\in\{1,\dots,10\}^{12}\). The problem is not to compare these words lexicographically, but to compare them by the order in which they appear as consecutive windows in a de Bruijn cycle of order 12 on a 10-symbol alphabet. After assigning that de Bruijn rank to every generated value, the values are sorted by rank and we compute $F(N)=\sum_{m=1}^{N} m\,a_{(m)} \pmod{1234567891},$ where \(a_{(m)}\) is the \(m\)-th value after sorting by de Bruijn rank. Mathematical Approach The whole solution is a ranking problem on words. There are \(10^{12}\) possible length-12 words, so the de Bruijn cycle cannot be built explicitly. Instead, the implementations rank each word directly through necklace representatives, Lyndon words, and a short dynamic program. The de Bruijn order used by the lock Let \(k=10\) and \(n=12\). A standard theorem says that if we concatenate, in lexicographic order, all Lyndon words whose lengths divide \(n\), we obtain a de Bruijn cycle \(B(k,n)\)....
Detailed mathematical approach
Problem Summary
For each \(i=1,\dots,N\), a linear congruential generator produces
$a_i \equiv 920461\,a_{i-1}+800217387569 \pmod{10^{12}}, \qquad a_0=0.$
Each value is written as a 12-digit decimal string with leading zeros, and every digit \(d\in\{0,\dots,9\}\) is shifted to \(d+1\). That turns the number into a word \(w_i\in\{1,\dots,10\}^{12}\). The problem is not to compare these words lexicographically, but to compare them by the order in which they appear as consecutive windows in a de Bruijn cycle of order 12 on a 10-symbol alphabet. After assigning that de Bruijn rank to every generated value, the values are sorted by rank and we compute
$F(N)=\sum_{m=1}^{N} m\,a_{(m)} \pmod{1234567891},$
where \(a_{(m)}\) is the \(m\)-th value after sorting by de Bruijn rank.
Mathematical Approach
The whole solution is a ranking problem on words. There are \(10^{12}\) possible length-12 words, so the de Bruijn cycle cannot be built explicitly. Instead, the implementations rank each word directly through necklace representatives, Lyndon words, and a short dynamic program.
The de Bruijn order used by the lock
Let \(k=10\) and \(n=12\). A standard theorem says that if we concatenate, in lexicographic order, all Lyndon words whose lengths divide \(n\), we obtain a de Bruijn cycle \(B(k,n)\). Cutting that cycle immediately before the window \(1^{12}\) turns the cyclic order into a linear order on all \(10^{12}\) words of length 12.
So every word \(w\in\{1,\dots,10\}^{12}\) has a unique rank \(R(w)\in\{1,\dots,10^{12}\}\): the position at which \(w\) appears as a length-12 window in that linearized cycle. The final sort is performed by this rank.
Necklaces and the primitive block length
A word is a necklace if it is lexicographically smallest among all of its cyclic rotations. If a necklace \(\nu\) can be written as
$\nu=u^{\,n/p},$
where \(u\) is primitive and Lyndon, then \(p\) is the length of the fundamental block. This number is an important invariant in the code: it tells us how many consecutive windows are contributed by that Lyndon block in the de Bruijn construction.
For instance, \(1^{12}\) has \(p=1\), while \((1,2,1,2,\dots,1,2)\) has \(p=2\). When a necklace is ranked, the computation first counts all completed Lyndon blocks before it, and then steps backward inside its own block by exactly \(p-1\) positions.
Replacing an arbitrary boundary by the largest admissible necklace
Fix a divisor \(d\mid 12\). To count Lyndon words of length \(d\) up to a boundary induced by \(w\), the implementations first replace the boundary by the largest necklace \(\eta_d(w)\) that is not greater than the first \(d\) symbols of \(w\). This is the correct canonical boundary because necklace representatives, not arbitrary rotations, are what index the blocks in the de Bruijn construction.
That replacement is done by repeatedly locating the first place where the current prefix ceases to be necklace-compatible, lowering that symbol by 1, and filling the rest with the maximum symbol 10. The loop stops exactly when the word becomes a necklace.
The dynamic count and its recurrence
Once the boundary necklace \(\eta=\eta_d(w)\) is known, the implementations evaluate a counting function \(T_d(w)\). The main table satisfies
$B[t,j]=B[t,j+1]+\bigl(k-\eta_{j+1}\bigr)\,B[t-j-1,0], \qquad B[0,0]=1,$
for \(0\le j<t\). Conceptually, \(B[t,j]\) counts how many completions of remaining length \(t\) are possible once the first \(j\) positions have already matched the boundary necklace. The first term keeps the next symbol equal to the boundary and moves right; the second term chooses a larger symbol and then counts all free continuations.
A second table stores border information: for every relevant suffix, it remembers how much of that suffix also matches a prefix of \(\eta\). This lets the code continue comparisons across the cyclic boundary in constant time instead of rescanning characters. Using those two tables, the implementation accumulates the full value \(T_d(w)\), which counts periodic words compatible with the boundary necklace.
Möbius inversion extracts Lyndon counts
The quantity \(T_d(w)\) still counts periodic objects, so primitive Lyndon words must be isolated by Möbius inversion:
$L_d(w)=\frac{1}{d}\sum_{e\mid d}\mu\!\left(\frac{d}{e}\right)T_e(w).$
Here \(L_d(w)\) is the number of Lyndon words of length \(d\) that do not exceed the boundary induced by \(w\), and \(\mu\) is the Möbius function. The divisor sum removes the overcount coming from smaller primitive words repeated several times.
Rank formula for necklace words
If \(w\) is itself a necklace, then its de Bruijn rank is
$R(w)=1-p(w)+\sum_{d\mid 12} d\,L_d(w).$
The sum \(\sum d\,L_d(w)\) is the total number of length-12 windows contributed by all eligible Lyndon blocks up to \(w\). The correction \(1-p(w)\) moves from the end of the current block back to the specific window represented by \(w\).
Rotations and the wrap-around case
Every non-necklace word is a rotation of a unique necklace. The implementations rotate \(w\) left until the first necklace in its orbit appears; call it \(\nu\). If the rotation does not cross the point where the de Bruijn cycle was cut, then \(R(w)\) is just \(R(\nu)\) plus or minus the corresponding offset inside the same Lyndon block.
The only exceptional family is
$w=(\underbrace{10,\dots,10}_{t},\underbrace{1,\dots,1}_{12-t}), \qquad 1\le t\le 12.$
These are precisely the windows that wrap across the cut between the end of the cycle and the initial word \(1^{12}\). They occupy the last \(t\) positions of the linear order, so
$R(w)=10^{12}-t+1.$
A concrete example is \(w=(10,10,1,1,\dots,1)\), which has \(t=2\), hence rank \(10^{12}-1\). This matches the special case handled explicitly in all three implementations.
How the Code Works
The C++, Python, and Java implementations follow the same pipeline. They first generate the pseudo-random values, keep each one as a 12-digit object with leading zeros, and map those digits from \(\{0,\dots,9\}\) to the alphabet \(\{1,\dots,10\}\).
Next they precompute the tiny amount of reusable arithmetic: the powers \(10^0,10^1,\dots,10^{12}\) and the Möbius values for \(1,\dots,12\). For every generated word, the implementation decides whether it is already a necklace, finds the appropriate boundary necklace when it is not, evaluates the divisor-based counting formulas, and combines them into the de Bruijn rank \(R(w)\).
After each value has been paired with its rank, the list is sorted by rank. The final pass computes \(F(N)\) exactly for the small validation cases and modulo \(1234567891\) for the full input. The C++ version parallelizes the independent ranking work across threads; the mathematical result is otherwise identical in all three languages.
Complexity Analysis
The alphabet size and word length are fixed: \(k=10\) and \(n=12\). Therefore the ranking work for one value is bounded by a constant amount of computation over the divisors of 12 together with a few \(12\times12\) tables. For \(N\) generated values, the dominant cost is the final sort, so the overall time complexity is \(O(N\log N)\).
The memory usage is \(O(N)\) because the generated values and their ranks must be stored for sorting. The main mathematical achievement is that the algorithm never constructs the de Bruijn cycle of length \(10^{12}\); it reaches the correct position of each word directly through necklace normalization, Möbius inversion, and short dynamic counts.
Footnotes and References
- Problem page: https://projecteuler.net/problem=941
- de Bruijn sequence: Wikipedia - De Bruijn sequence
- Necklace (combinatorics): Wikipedia - Necklace (combinatorics)
- Lyndon word: Wikipedia - Lyndon word
- Möbius function: Wikipedia - Möbius function