Problem 893: Matchsticks
View on Project EulerProject Euler Problem 893 Solution
EulerSolve provides an optimized solution for Project Euler Problem 893, Matchsticks, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary For each positive integer \(n\), let \(M(n)\) be the minimum number of matchsticks needed to display an expression with value \(n\), using decimal digits together with the operators \(+\) and \(\times\). The seven-segment digit costs are fixed, and each operator contributes two extra matchsticks. The target quantity is $S(N)=\sum_{n=1}^{N} M(n),\qquad N=10^6.$ The key observation is that the search space can be turned into a dynamic program over the integers \(1,2,\dots,N\), instead of a brute-force search over all expression trees. Mathematical Approach The implementation builds the answer from smaller integers upward. Every candidate representation comes from one of three sources: writing \(n\) directly, writing \(n\) as a product of smaller values, or writing \(n\) as a sum whose last term is multiplicative. Step 1: Price Every Decimal Literal Let \(c(d)\) be the matchstick cost of digit \(d\)....
Detailed mathematical approach
Problem Summary
For each positive integer \(n\), let \(M(n)\) be the minimum number of matchsticks needed to display an expression with value \(n\), using decimal digits together with the operators \(+\) and \(\times\). The seven-segment digit costs are fixed, and each operator contributes two extra matchsticks. The target quantity is
$S(N)=\sum_{n=1}^{N} M(n),\qquad N=10^6.$
The key observation is that the search space can be turned into a dynamic program over the integers \(1,2,\dots,N\), instead of a brute-force search over all expression trees.
Mathematical Approach
The implementation builds the answer from smaller integers upward. Every candidate representation comes from one of three sources: writing \(n\) directly, writing \(n\) as a product of smaller values, or writing \(n\) as a sum whose last term is multiplicative.
Step 1: Price Every Decimal Literal
Let \(c(d)\) be the matchstick cost of digit \(d\). For the standard seven-segment display used here,
$c(0)=6,\ c(1)=2,\ c(2)=5,\ c(3)=5,\ c(4)=4,\ c(5)=5,\ c(6)=6,\ c(7)=3,\ c(8)=7,\ c(9)=6.$
If \(n\) is written directly in decimal, its literal cost is
$L(n)=\sum_{i=1}^{k} c(d_i).$
The code computes this efficiently by the recurrence
$L(n)=L\!\left(\left\lfloor\frac{n}{10}\right\rfloor\right)+c(n\bmod 10),\qquad L(0)=0.$
This gives a valid upper bound for every \(M(n)\), because the literal representation is always available.
Step 2: Optimize a Single Multiplicative Term
Define \(P(n)\) as the minimum matchstick cost of a term that evaluates to \(n\) and uses only decimal literals and multiplication. Then
$P(n)=\min\!\left(L(n),\ \min_{\substack{d\mid n \\ 2\le d\le \sqrt{n}}}\bigl(P(d)+P(n/d)+2\bigr)\right).$
The extra \(2\) is the cost of the multiplication sign. Restricting the divisor search to \(d\le\sqrt{n}\) is enough because factor pairs are symmetric.
This recurrence is exact for multiplicative terms: every product can be split at its top multiplication into two smaller multiplicative subterms, and conversely every such split produces a valid candidate.
Step 3: Decompose Any Full Expression into a Sum of Terms
Now let \(M(n)\) denote the true optimum for the value \(n\). The implementations treat a full expression as a sum of multiplicative terms, so after choosing the last summand we get
$M(n)=\min\!\left(P(n),\ \min_{1\le a<n}\bigl(M(a)+P(n-a)+2\bigr)\right).$
Here the left part \(a\) may already be an optimized full expression, while the right part \(n-a\) is the final multiplicative term. Again the \(+2\) accounts for the plus sign.
This recurrence turns the original expression problem into a shortest-cost decomposition problem on the interval \([1,N]\).
Step 4: Replace the Exact Recurrences by Monotone Relaxation
For small limits, the recurrences above can be evaluated directly. For \(N=10^6\), the optimized method instead starts from the literal table and repeatedly applies cost-improving relaxations.
For multiplication, every pair \(x\le y\) with \(xy\le N\) yields the candidate
$\text{cost}(xy)\le \text{cost}(x)+\text{cost}(y)+2.$
Running this update until no entry decreases produces the multiplication-closed table.
For addition, integers are grouped by current cost:
$B_t=\{n\le N:\text{cost}(n)=t\}.$
If \(x\in B_a\) and \(y\in B_b\), then \(x+y\) has candidate cost \(a+b+2\). Let
$C_{\max}=\max_{1\le n\le N}\text{cost}(n).$
Whenever \(a+b+2\ge C_{\max}\), that bucket pair cannot improve anything, so it is skipped. This cost-bucket pruning is the main reason the large computation is practical.
All updates are of the form “replace by a smaller integer if possible”, so the process is monotone decreasing and must eventually stabilize.
Worked Example
The recurrences become clearer on concrete numbers.
For \(28\), the literal cost is
$L(28)=c(2)+c(8)=5+7=12.$
But \(28=4\times 7\), so
$P(28)\le L(4)+L(7)+2=4+3+2=9.$
Thus \(M(28)=9\), already much better than writing 28 directly.
For \(82\), the literal cost is \(7+5=12\), while the additive split \(82=11+71\) gives
$M(82)\le M(11)+P(71)+2=4+5+2=11.$
For \(88\), the literal cost is \(14\). A multiplicative form \(88=8\times 11\) gives cost \(7+4+2=13\), but the additive form \(88=11+77\) gives
$M(88)\le M(11)+P(77)+2=4+6+2=12.$
These examples show why both multiplication and addition relaxations are needed. As a global checkpoint, summing the optimal values up to \(100\) gives \(S(100)=916\).
How the Code Works
The C++, Python, and Java implementations begin by filling an array of literal costs with the decimal recurrence for \(L(n)\). That pass is linear in \(N\) once the digit table is fixed.
Next, they run repeated multiplication relaxations over all pairs \(x\le y\) with \(xy\le N\). Because multiplication is commutative, iterating only over \(x\le y\) avoids duplicate work without missing any product. When this loop reaches a fixed point, the table matches the optimum multiplicative costs \(P(n)\).
After that, the implementation switches to addition. It groups integers by their current matchstick cost, then combines bucket pairs in increasing cost order. Bucket pairs whose candidate cost is already too large are discarded immediately, and within a bucket pair the scan stops as soon as the numeric sum exceeds \(N\). Those two pruning rules remove most of the naive \(O(N^2)\) work.
One implementation also cross-checks the optimized routine against an exact small-limit dynamic program and verifies the checkpoint \(S(100)=916\) before evaluating the full \(N=10^6\) case.
Complexity Analysis
Computing all literal costs takes \(O(N\log_{10}N)\) time and \(O(N)\) memory. One full multiplication sweep costs
$\sum_{x=1}^{N}\left\lfloor\frac{N}{x}\right\rfloor=O(N\log N).$
The multiplication phase repeats this sweep until no further decrease occurs; in practice the number of rounds is small.
The addition phase first rebuilds the cost buckets in \(O(N)\) time, then inspects only bucket pairs whose candidate cost can still improve the table and only value pairs whose sum is at most \(N\). Its exact running time is data-dependent: the naive worst case is quadratic, but the implemented pruning makes the real workload much smaller for this problem size. Memory usage stays \(O(N)\) throughout.
Footnotes and References
- Problem page: https://projecteuler.net/problem=893
- Seven-segment display: Wikipedia — Seven-segment display
- Dynamic programming: Wikipedia — Dynamic programming
- Shortest path problem: Wikipedia — Shortest path problem
- Integer factorization: Wikipedia — Integer factorization