Problem 59: XOR Decryption
View on Project EulerProject Euler Problem 59 Solution
EulerSolve provides an optimized solution for Project Euler Problem 59, XOR Decryption, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary The ciphertext is given as a comma-separated list of ASCII codes obtained by XOR-encrypting an English plaintext with a repeating key of length 3. Each key character is a lowercase English letter, so the key belongs to a set with only \(26^3\) possibilities. The task is to recover the intended plaintext and then compute the sum of its ASCII values. The decisive facts are that XOR is its own inverse, the key repeats every three positions, and the implementations do not use a separate cryptanalytic shortcut. They test every admissible key and rank the resulting plaintexts with an explicit English-text score. Mathematical Approach Let the ciphertext be \(e_0,e_1,\dots,e_{n-1}\). For a candidate key \(k=(k_0,k_1,k_2)\) with each \(k_r\in\{97,98,\dots,122\}\), the decrypted byte at position \(i\) is $p_i(k)=e_i \oplus k_{i \bmod 3}.$ This turns the problem into a finite optimization problem over the key set \(K=\{97,\dots,122\}^3\). XOR as an involution The basic invariant is $x \oplus y \oplus y = x.$ If a plaintext byte \(p_i\) was encrypted as \(e_i=p_i \oplus k_{i \bmod 3}\), then applying the same key byte again restores the original symbol: $e_i \oplus k_{i \bmod 3}=(p_i \oplus k_{i \bmod 3}) \oplus k_{i \bmod 3}=p_i.$ No recurrence is needed....
Detailed mathematical approach
Problem Summary
The ciphertext is given as a comma-separated list of ASCII codes obtained by XOR-encrypting an English plaintext with a repeating key of length 3. Each key character is a lowercase English letter, so the key belongs to a set with only \(26^3\) possibilities. The task is to recover the intended plaintext and then compute the sum of its ASCII values.
The decisive facts are that XOR is its own inverse, the key repeats every three positions, and the implementations do not use a separate cryptanalytic shortcut. They test every admissible key and rank the resulting plaintexts with an explicit English-text score.
Mathematical Approach
Let the ciphertext be \(e_0,e_1,\dots,e_{n-1}\). For a candidate key \(k=(k_0,k_1,k_2)\) with each \(k_r\in\{97,98,\dots,122\}\), the decrypted byte at position \(i\) is
$p_i(k)=e_i \oplus k_{i \bmod 3}.$
This turns the problem into a finite optimization problem over the key set \(K=\{97,\dots,122\}^3\).
XOR as an involution
The basic invariant is
$x \oplus y \oplus y = x.$
If a plaintext byte \(p_i\) was encrypted as \(e_i=p_i \oplus k_{i \bmod 3}\), then applying the same key byte again restores the original symbol:
$e_i \oplus k_{i \bmod 3}=(p_i \oplus k_{i \bmod 3}) \oplus k_{i \bmod 3}=p_i.$
No recurrence is needed. Once the key is fixed, every plaintext byte is determined independently by this identity, and the period 3 tells us exactly which key byte is used at each index.
The key space is small enough to exhaust
Because the key has length 3 and each character must be a lowercase ASCII letter,
$|K|=26^3=17{,}576.$
That is small enough for direct exhaustive search. For each key, decryption costs one XOR per ciphertext position, so the entire search is completely practical. This is why the solution does not need a more elaborate derivation from residue classes or classical frequency tables.
The method is exact relative to its scoring rule: it does not estimate the best key, it evaluates every candidate key and keeps the best-scoring plaintext.
Scoring English-looking plaintexts
The implementations use a hard validity filter followed by a weighted English-text score. If any decrypted character lies outside the ASCII interval from 9 through 126, the candidate is rejected by assigning a very large negative score. Otherwise the score is
$S(p)=3N_{\text{space}}(p)+2N_{\text{letter}}(p)+40W_{\text{the}}(p)+25W_{\text{and}}(p)+20W_{\text{of}}(p)+20W_{\text{to}}(p),$
where \(N_{\text{space}}\) counts spaces, \(N_{\text{letter}}\) counts uppercase and lowercase English letters, and \(W_{\text{word}}\) counts occurrences of the indicated word written with surrounding spaces.
So the selected key is
$k^\star=\operatorname*{arg\,max}_{k\in K} S(p(k)).$
This is the real mathematical object used by the code. The search is performed on fully decrypted texts, not by solving each key position separately.
Worked Example: a short XOR round trip
The implementations include a small checkpoint with the plaintext “hello world” and the key “abc”. Writing the key bytes as \(97,98,99\), the first encrypted values are
$104\oplus97=9,\qquad 101\oplus98=7,\qquad 108\oplus99=15,$
and continuing with the same period-3 pattern gives the ciphertext prefix \([9,7,15,13,13,67,\dots]\). Decrypting with the same repeating key reverses the process:
$9\oplus97=104,\qquad 7\oplus98=101,\qquad 15\oplus99=108.$
The original text is recovered exactly. A wrong key may still produce printable symbols, but it usually scores much worse because it contains fewer letters, fewer spaces, and few or no common English words.
Recovering the required answer
Once the maximizing plaintext \(p(k^\star)\) has been found, the required result is simply
$A=\sum_{i=0}^{n-1} p_i(k^\star).$
The search phase identifies the plaintext, and the final summation converts that plaintext into the requested integer.
How the Code Works
Parsing the ciphertext
The C++, Python, and Java implementations read the ciphertext as raw text and scan it character by character. Consecutive decimal digits are collected into one token, converted to an integer, and appended to the ciphertext array. Commas and line breaks act only as separators.
Enumerating candidate keys and decrypting
Next, the implementation loops through all triples of lowercase letters. For each triple, it decrypts the entire ciphertext with the repeating period-3 XOR rule and materializes the candidate plaintext as a string.
Scoring, selecting, and summing
Each candidate plaintext is checked against the ASCII-range filter and then scored with the weighted English-text formula. Whenever a candidate beats the best score seen so far, the implementation stores it as the current winner. After the full \(26^3\) search finishes, it sums the ASCII codes of the winning plaintext. Before the main search, the implementations also verify the XOR round-trip identity on a short known example.
Complexity Analysis
If \(n\) is the number of ciphertext bytes, parsing costs \(O(n)\). The exhaustive search tries \(26^3\) keys, and each key requires \(O(n)\) work to decrypt and score the text, so the dominant running time is
$O(26^3 n).$
Because \(26^3=17{,}576\) is a fixed constant, the runtime is linear in the message length with a moderate constant factor. The extra word-count passes in the score are over four fixed short words, so they do not change the asymptotic bound.
The memory usage is \(O(n)\): the implementations store the ciphertext, one current plaintext, and a constant amount of bookkeeping for the best score and the key enumeration.
Footnotes and References
- Problem page: https://projecteuler.net/problem=59
- XOR cipher: Wikipedia - XOR cipher
- ASCII: Wikipedia - ASCII
- Frequency analysis and language statistics: Wikipedia - Frequency analysis