Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 300: Protein Folding

View on Project Euler

Project Euler Problem 300 Solution

EulerSolve provides an optimized solution for Project Euler Problem 300, Protein Folding, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary For every binary HP sequence of length \(n\), we place the chain on the square lattice as a self-avoiding walk and ask for the maximum possible number of H-H contacts. Project Euler asks for the average of that maximum over all \(2^n\) sequences. The code solves the case $n=15.$ Mathematical Approach 1. Folding shapes are self-avoiding walks A fold is a self-avoiding walk $p_0,p_1,\dots,p_{n-1}\in\mathbb Z^2$ with unit steps and no repeated lattice site. Because translation and rotation do not change contact counts, the implementation fixes $p_0=(0,0),\qquad p_1=(1,0),$ and then performs a DFS over the remaining \(n-2\) steps. This removes the large symmetry factor before enumeration even starts. 2. A fold matters only through its contact map For a finished walk, the only geometric information relevant to scoring is which non-consecutive monomer indices end up adjacent on the lattice. For \(i<j\) with \(j\ge i+2\), define $C_{ij}=1 \iff |p_i-p_j|_1=1.$ This produces a contact map, which the code stores as a bitset over all pairs \((i,j)\) with \(j\ge i+2\). Different lattice walks can produce the same contact map, and then they have exactly the same score for every HP sequence. So the geometry phase ends by deduplicating maps with hashing....

Detailed mathematical approach

Problem Summary

For every binary HP sequence of length \(n\), we place the chain on the square lattice as a self-avoiding walk and ask for the maximum possible number of H-H contacts. Project Euler asks for the average of that maximum over all \(2^n\) sequences. The code solves the case

$n=15.$

Mathematical Approach

1. Folding shapes are self-avoiding walks

A fold is a self-avoiding walk

$p_0,p_1,\dots,p_{n-1}\in\mathbb Z^2$

with unit steps and no repeated lattice site. Because translation and rotation do not change contact counts, the implementation fixes

$p_0=(0,0),\qquad p_1=(1,0),$

and then performs a DFS over the remaining \(n-2\) steps. This removes the large symmetry factor before enumeration even starts.

2. A fold matters only through its contact map

For a finished walk, the only geometric information relevant to scoring is which non-consecutive monomer indices end up adjacent on the lattice. For \(i<j\) with \(j\ge i+2\), define

$C_{ij}=1 \iff |p_i-p_j|_1=1.$

This produces a contact map, which the code stores as a bitset over all pairs \((i,j)\) with \(j\ge i+2\). Different lattice walks can produce the same contact map, and then they have exactly the same score for every HP sequence. So the geometry phase ends by deduplicating maps with hashing.

For example, the solver finds only

$41$

distinct contact maps for \(n=8\), and

$12495$

distinct contact maps for \(n=15\).

3. Why consecutive H-H pairs are added separately

If positions \(i\) and \(i+1\) are both H, then they are always adjacent in the chain, no matter how the fold bends. Therefore that contribution is independent of the contact map. For an HP mask \(h\in\{0,1\}^n\), the number of consecutive H-H bonds is

$\operatorname{popcount}(h\ \&\ (h\gg 1)).$

This explains the extra term in the code:

$\text{total}(h,C)=\text{nonconsecutive\_contacts}(h,C)+\operatorname{popcount}(h\ \&\ (h\gg 1)).$

So the optimization over folds only needs to maximize the non-consecutive part.

4. Scoring one contact map by subset DP

Fix one contact map \(C\). The code converts it to adjacency bitmasks \(N_C(i)\), where bit \(j\) is set if \((i,j)\) is a non-consecutive contact in that fold. For an H-mask \(m\), define \(\text{score}_C(m)\) as the number of contact edges whose two endpoints are both H.

Instead of checking every contact edge from scratch for every mask, the implementation uses the recurrence

$\text{score}_C(m)=\text{score}_C(m\setminus\{i\})+\bigl|N_C(i)\cap(m\setminus\{i\})\bigr|,$

where \(i\) is the least significant set bit of \(m\). This works because when \(i\) is inserted last, exactly the contacts from \(i\) to already-present H positions are new. Every contact is counted once and only once.

5. Best fold for each HP sequence

Let \(B(m)\) be the best non-consecutive contact count over all unique maps:

$B(m)=\max_C \text{score}_C(m).$

The code maintains an array best[mask] and updates it while scanning all deduplicated contact maps. After that, the full optimal score for sequence mask \(m\) is

$B(m)+\operatorname{popcount}(m\ \&\ (m\gg 1)).$

6. Exact averaging over all \(2^n\) sequences

The final numerator is

$\sum_{m=0}^{2^n-1}\left(B(m)+\operatorname{popcount}(m\ \&\ (m\gg 1))\right).$

Since every HP sequence is equally likely, the desired expectation is

$\frac{1}{2^n}\sum_{m=0}^{2^n-1}\left(B(m)+\operatorname{popcount}(m\ \&\ (m\gg 1))\right).$

The denominator is a power of two, so the result is converted to an exact terminating binary-rational decimal by long division.

7. Checkpoints

The published checkpoint is

$n=8 \quad \Longrightarrow \quad \frac{850}{256}=3.3203125.$

The full solver for \(n=15\) returns

$8.0540771484375.$

How the Code Works

dfs(...) enumerates all self-avoiding walks after fixing the first edge. add_current_contact_map() converts one finished walk to a packed bit key and inserts it into a set. Then each unique map is turned into adjacency masks, and a subset DP computes \(\text{score}_C(m)\) for all \(m\). The array best stores the maximum over all maps, after which the sequence-independent consecutive-HH term is added and averaged.

Complexity Analysis

If \(M\) is the number of unique contact maps, the scoring phase costs

$O(M\cdot 2^n)$

with very small constants because every update is just a few bit operations. Memory is

$O(M+2^n).$

The expensive combinatorial step is the self-avoiding walk enumeration, but for \(n=15\) it is still manageable once symmetry is fixed and maps are deduplicated.

Further Reading

  1. Problem page: https://projecteuler.net/problem=300
  2. Self-avoiding walk: https://en.wikipedia.org/wiki/Self-avoiding_walk
  3. HP lattice model: https://en.wikipedia.org/wiki/HP_model

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 299 · All Project Euler solutions · Next: Problem 301

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮