Problem 679: Freefarea
View on Project EulerProject Euler Problem 679 Solution
EulerSolve provides an optimized solution for Project Euler Problem 679, Freefarea, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Let $\mathcal{P}=\{\text{FREE},\text{FARE},\text{AREA},\text{REEF}\}.$ We want the number \(F(n)\) of length-\(n\) words over the alphabet \(\{A,E,F,R\}\) in which every word in \(\mathcal{P}\) appears exactly once as a contiguous substring. For the actual problem one needs \(F(30)\). A brute-force search would inspect \(4^n\) words, which is hopeless for \(n=30\). The difficulty is not just the size of the search space: the four target words overlap in several ways, so ordinary inclusion-exclusion becomes awkward. The implementations therefore convert the problem into a finite-state dynamic program that records both the current suffix information and how many times each target has already been completed. Mathematical Approach The key observation is that only two pieces of information matter after reading a prefix: which suffixes can still grow into one of the target words, and how many times each target word has already been seen. That leads naturally to an Aho-Corasick automaton combined with dynamic programming. Step 1: Build a Finite Automaton for the Four Target Words Construct the trie of the four words in \(\mathcal{P}\), then add failure links in the standard Aho-Corasick way. After processing a prefix \(u\), the automaton state represents the longest suffix of \(u\) that is also a prefix of at least one word in \(\mathcal{P}\)....
Detailed mathematical approach
Problem Summary
Let
$\mathcal{P}=\{\text{FREE},\text{FARE},\text{AREA},\text{REEF}\}.$
We want the number \(F(n)\) of length-\(n\) words over the alphabet \(\{A,E,F,R\}\) in which every word in \(\mathcal{P}\) appears exactly once as a contiguous substring. For the actual problem one needs \(F(30)\).
A brute-force search would inspect \(4^n\) words, which is hopeless for \(n=30\). The difficulty is not just the size of the search space: the four target words overlap in several ways, so ordinary inclusion-exclusion becomes awkward. The implementations therefore convert the problem into a finite-state dynamic program that records both the current suffix information and how many times each target has already been completed.
Mathematical Approach
The key observation is that only two pieces of information matter after reading a prefix: which suffixes can still grow into one of the target words, and how many times each target word has already been seen. That leads naturally to an Aho-Corasick automaton combined with dynamic programming.
Step 1: Build a Finite Automaton for the Four Target Words
Construct the trie of the four words in \(\mathcal{P}\), then add failure links in the standard Aho-Corasick way. After processing a prefix \(u\), the automaton state represents the longest suffix of \(u\) that is also a prefix of at least one word in \(\mathcal{P}\).
For every state \(s\) and next letter \(\sigma\in\{A,E,F,R\}\), there is a deterministic transition
$\delta(s,\sigma).$
Each state also carries an output vector
$m(s)=\bigl(m_1(s),m_2(s),m_3(s),m_4(s)\bigr),$
where \(m_i(s)\) is the number of times the \(i\)-th target word ends when we arrive at \(s\), after suffix outputs from failure links have been propagated. This is exactly what lets one transition detect newly completed matches, even when overlaps are involved.
Step 2: Augment the State with a Capped Occurrence Profile
For a processed prefix of length \(\ell\), define
$D_\ell\bigl(s,c_1,c_2,c_3,c_4\bigr)$
to be the number of prefixes of length \(\ell\) that end in automaton state \(s\) and have seen the four target words with capped counts \(c_i\in\{0,1,2\}\). Here
$c_i=0 \text{ means “not seen yet,”}\qquad c_i=1 \text{ means “seen exactly once,”}$
and
$c_i=2 \text{ means “seen at least twice.”}$
This cap is enough because any word with some \(c_i=2\) can never contribute to the final answer. We do not need to distinguish between two occurrences and three occurrences; both are already invalid.
The initial condition is
$D_0(0,0,0,0,0)=1,$
with every other state equal to \(0\).
Step 3: Write the Transition Formula
Suppose a length-\(\ell\) prefix is counted by \(D_\ell(s,c_1,c_2,c_3,c_4)\), and we append one letter \(\sigma\). The automaton moves to
$s'=\delta(s,\sigma).$
The new capped counts are
$c_i'=\min\bigl(2,\;c_i+m_i(s')\bigr)\qquad (1\le i\le 4).$
If at least one \(c_i'=2\), this branch is discarded immediately. Otherwise we add
$D_\ell(s,c_1,c_2,c_3,c_4)$
to
$D_{\ell+1}(s',c_1',c_2',c_3',c_4').$
This recurrence is exact because appending one letter changes the future only through the new automaton state and the updated occurrence profile.
Step 4: Why Pruning at Two Occurrences Is Correct
The goal is “exactly once” for every target word. Therefore every valid completed word must end with profile
$\bigl(1,1,1,1\bigr).$
If some component ever reaches \(2\), then one of the target words has already appeared at least twice, and no continuation can repair that. So pruning those branches is logically exact, not merely a heuristic.
This is what makes the state space small. Instead of storing arbitrary occurrence numbers, the dynamic program only needs \(3^4=81\) occurrence profiles for each automaton state.
Step 5: Final Summation
After processing all \(n\) letters, we sum over all automaton states with the exact target profile:
$\boxed{F(n)=\sum_s D_n\bigl(s,1,1,1,1\bigr).}$
No extra correction term is required. The transition rule already enforced “at most once,” and the final profile enforces “at least once.” Together they give “exactly once.”
Worked Example: \(n=9\)
A concrete valid word is
$\text{FREEFAREA}.$
Its important length-4 substrings are
$\text{FREE},\ \text{REEF},\ \text{FARE},\ \text{AREA},$
each appearing exactly once. The occurrence profile evolves as follows:
$\begin{aligned} \text{after FREE} &\longrightarrow (1,0,0,0),\\ \text{after FREEF} &\longrightarrow (1,0,0,1),\\ \text{after FREEFARE} &\longrightarrow (1,1,0,1),\\ \text{after FREEFAREA} &\longrightarrow (1,1,1,1). \end{aligned}$
The implementations also check the checkpoint
$F(9)=1,$
so this length-9 construction is not merely valid; it is the unique valid word of that length.
How the Code Works
The C++, Python, and Java implementations all follow the same plan. First they build the trie for the four target words and perform a breadth-first construction of failure links. During that process, each state receives the accumulated output information from its failure ancestor, so every transition already knows which target words have just ended.
Next, the implementation allocates two rolling dynamic-programming layers. Each automaton state is paired with one of the \(81\) capped occurrence profiles, so a whole layer is just a flat table indexed by “automaton state + profile.” The base layer contains one way to be at the root with all counts equal to \(0\).
For each of the \(30\) character positions, the current layer is scanned. From every nonzero entry, the implementation tries the four possible next letters, follows the precomputed automaton transition, updates the four capped counts, and discards the transition whenever some target word would be counted for a second time. Otherwise the number of ways is added to the destination entry in the next layer.
After the last position, the implementation sums all entries whose profile is exactly \((1,1,1,1)\). The C++ version also includes two small sanity checks, namely \(F(9)=1\) and \(F(15)=72863\), before producing the value for \(F(30)\).
Complexity Analysis
Let \(S\) be the number of automaton states. Building the automaton and its failure links costs \(O(S\cdot 4)\), since the alphabet size is \(4\). The dynamic program performs
$O\bigl(n\cdot S\cdot 3^4\cdot 4\bigr)$
operations, because for each position it scans every state, every capped occurrence profile, and every next letter.
Using rolling layers keeps the memory usage at
$O(S\cdot 3^4).$
For this particular problem the automaton is tiny: the trie of the four length-4 words has only \(16\) states, so each DP layer has \(16\cdot 81=1296\) entries. That is why the exact count for \(n=30\) is easy to obtain once the state formulation is set up correctly.
Footnotes and References
- Problem page: Project Euler 679
- Aho-Corasick algorithm: Wikipedia - Aho-Corasick algorithm
- Trie data structure: Wikipedia - Trie
- Dynamic programming: Wikipedia - Dynamic programming
- Finite-state machine: Wikipedia - Finite-state machine