Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 22: Names Scores

Project Euler Problem 22 Solution

EulerSolve provides an optimized solution for Project Euler Problem 22, Names Scores, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary The input is a comma-separated list of quoted first names written with uppercase Latin letters. After extracting the names, we sort them alphabetically. If the sorted list is $s_{(1)}, s_{(2)}, \dots, s_{(N)}$, then the score of the name at position $i$ is its alphabetical value multiplied by $i$. The task is to compute the total of all these scores. This is not a difficult problem computationally, but it has a clean mathematical core: convert letters to numbers, impose lexicographic order, and evaluate one weighted sum over the sorted sequence. Mathematical Approach Let the extracted names be $a_1, a_2, \dots, a_N$. After sorting them, we write the ordered list as $s_{(1)} \le s_{(2)} \le \dots \le s_{(N)}$. Everything else follows from defining the value of one name and then summing those values with the correct positional weights. Extracting the names from the quoted list The data file uses a very simple CSV-like format: each name is surrounded by double quotes and commas separate entries. The implementations therefore use a one-pass scan with a single invariant. After reading any prefix of the file, the output list already contains exactly the names whose closing quote has been seen, and the temporary buffer contains exactly the characters since the last unmatched opening quote. This matters because commas never need to be parsed explicitly....

Detailed mathematical approach

Problem Summary

The input is a comma-separated list of quoted first names written with uppercase Latin letters. After extracting the names, we sort them alphabetically. If the sorted list is $s_{(1)}, s_{(2)}, \dots, s_{(N)}$, then the score of the name at position $i$ is its alphabetical value multiplied by $i$. The task is to compute the total of all these scores.

This is not a difficult problem computationally, but it has a clean mathematical core: convert letters to numbers, impose lexicographic order, and evaluate one weighted sum over the sorted sequence.

Mathematical Approach

Let the extracted names be $a_1, a_2, \dots, a_N$. After sorting them, we write the ordered list as $s_{(1)} \le s_{(2)} \le \dots \le s_{(N)}$. Everything else follows from defining the value of one name and then summing those values with the correct positional weights.

Extracting the names from the quoted list

The data file uses a very simple CSV-like format: each name is surrounded by double quotes and commas separate entries. The implementations therefore use a one-pass scan with a single invariant. After reading any prefix of the file, the output list already contains exactly the names whose closing quote has been seen, and the temporary buffer contains exactly the characters since the last unmatched opening quote.

This matters because commas never need to be parsed explicitly. Characters are copied only while the scan is inside quotes, and each closing quote finishes one complete name. For this dataset, that is enough to recover the sequence $a_1, a_2, \dots, a_N$ exactly.

Assigning a numerical value to a name

For a single uppercase letter $c$, define

$\alpha(c)=\operatorname{ord}(c)-\operatorname{ord}(\text{A})+1,$

so $\alpha(\text{A})=1$, $\alpha(\text{B})=2$, and so on up to $\alpha(\text{Z})=26$. For a name $s=s_1s_2\dots s_m$, its alphabetical value is

$v(s)=\sum_{j=1}^{m}\alpha(s_j).$

The code follows this formula directly by subtracting the code of A from each uppercase letter. Characters outside A through Z are ignored, which is harmless here because the dataset consists of plain uppercase names.

Sorting turns the task into a weighted sum

Once the names are sorted, the problem becomes

$T=\sum_{i=1}^{N} i \cdot v\bigl(s_{(i)}\bigr).$

The position $i$ is 1-based, so the first name contributes its value once, the second contributes twice its value, and so on. The order matters twice: first because sorting decides which name receives which index, and then because that index becomes the multiplier in the final sum.

There is no hidden combinatorics beyond this formula. The entire mathematical reduction is that alphabetical ordering transforms the raw list into a deterministic sequence, and then the answer is a dot product between the position vector $(1,2,\dots,N)$ and the vector of name values.

A useful running recurrence

If we define the partial totals by $T_0=0$ and, for $i \ge 1$,

$T_i=T_{i-1}+i \cdot v\bigl(s_{(i)}\bigr),$

then $T_N$ is the required answer. This simple recurrence is exactly what the implementations accumulate in one pass over the sorted list. It is also the reason no secondary data structure is needed after sorting.

Worked examples

A minimal checkpoint is the two-name list $\text{["B","A"]}$. After sorting, it becomes $\text{["A","B"]}$. Since $v(\text{A})=1$ and $v(\text{B})=2$, the total is

$T=1\cdot 1 + 2\cdot 2 = 5.$

The famous example from the problem statement is COLIN. Its alphabetical value is

$v(\text{COLIN})=3+15+12+9+14=53.$

If COLIN appears at position $938$ in the sorted list, then its name score is

$938 \cdot 53 = 49714.$

The full answer is obtained by applying the same computation to every sorted name and summing all contributions.

How the Code Works

Parsing the quoted payload

The C++, Python, and Java implementations read the entire text payload and scan it character by character. A Boolean state records whether the scan is currently inside quotation marks. While that state is true, letters are appended to the current name; when the closing quote arrives, the completed name is pushed into the list.

This is deliberately narrower than a full CSV parser, but it exactly matches the structure of the Project Euler data file.

Sorting and accumulation

After parsing, the implementation sorts the list with the standard string ordering of the language. Because the data uses uppercase ASCII names, that ordering agrees with the alphabetical order required by the problem. The code then performs one linear pass through the sorted list, computes each name value, multiplies by the 1-based position, and adds the result to a running total.

The implementations also include two small sanity checks before solving the full input: COLIN must have value $53$, and the list $\text{["B","A"]}$ must produce a total score of $5$. Those checkpoints test the two core ingredients of the method: letter valuation and position-weighted summation after sorting.

Complexity Analysis

Let $L$ be the total length of the input payload in characters, let $N$ be the number of names, and let $M=\sum_{i=1}^{N}|s_i|$ be the total number of letters inside the names. Parsing is $O(L)$ because the file is scanned once. The final scoring pass is $O(M)$.

The dominant step is sorting. In the standard comparison model, sorting $N$ strings of average length $k$ costs $O(Nk \log N)$, since each comparison may inspect several characters. Therefore the overall running time is $O(L + Nk \log N)$, commonly written as $O(Nk \log N)$ when $L$ and $M$ are both proportional to $Nk$. The memory usage is $O(L)$ to store the extracted names and the underlying text data.