Problem 352: Blood Tests
View on Project EulerProject Euler Problem 352 Solution
EulerSolve provides an optimized solution for Project Euler Problem 352, Blood Tests, with C++, Python, Java, and a step-by-step mathematical explanation.
Problem Summary Each person is infected independently with probability \(p\). A pooled blood test on any subset is negative if and only if every person in that subset is healthy. For a fixed group size \(s\), let \(T(s,p)\) be the minimum possible expected number of tests needed to determine all infection statuses. Problem 352 asks for $\sum_{i=1}^{50} T\left(10000,\frac{i}{100}\right).$ A brute-force search over all adaptive testing trees is infeasible. The repository solution succeeds because the optimal strategy can be expressed with two one-dimensional dynamic programming states whose transitions depend only on group size. Mathematical Approach Step 1: Two states and the basic probabilities Write \(q=1-p\). Then \(q^n\) is the probability that all \(n\) people are healthy, and \(1-q^n\) is the probability that at least one person in a group of size \(n\) is infected. Define \(A_n\) as the optimal expected number of tests for \(n\) people with no prior information, and \(B_n\) as the optimal expected number of tests for \(n\) people under the condition that at least one of them is infected. Because the people are exchangeable, only the size \(k\) of the next tested subgroup matters. The exact identities of the chosen people do not matter. The base values are forced: $A_0=B_0=0,\qquad A_1=1,\qquad B_1=0.$ \(A_1=1\) because one unconstrained person must be tested once....
Detailed mathematical approach
Problem Summary
Each person is infected independently with probability \(p\). A pooled blood test on any subset is negative if and only if every person in that subset is healthy. For a fixed group size \(s\), let \(T(s,p)\) be the minimum possible expected number of tests needed to determine all infection statuses.
Problem 352 asks for
$\sum_{i=1}^{50} T\left(10000,\frac{i}{100}\right).$
A brute-force search over all adaptive testing trees is infeasible. The repository solution succeeds because the optimal strategy can be expressed with two one-dimensional dynamic programming states whose transitions depend only on group size.
Mathematical Approach
Step 1: Two states and the basic probabilities
Write \(q=1-p\). Then \(q^n\) is the probability that all \(n\) people are healthy, and \(1-q^n\) is the probability that at least one person in a group of size \(n\) is infected.
Define \(A_n\) as the optimal expected number of tests for \(n\) people with no prior information, and \(B_n\) as the optimal expected number of tests for \(n\) people under the condition that at least one of them is infected.
Because the people are exchangeable, only the size \(k\) of the next tested subgroup matters. The exact identities of the chosen people do not matter.
The base values are forced:
$A_0=B_0=0,\qquad A_1=1,\qquad B_1=0.$
\(A_1=1\) because one unconstrained person must be tested once. \(B_1=0\) because if a single-person group is already known to contain an infected person, that person's status is already determined.
Step 2: Deriving the conditional recurrence for \(B_n\)
Assume \(n \ge 2\), and suppose we are in state \(B_n\): the group has size \(n\), and at least one infected person is guaranteed to exist. Testing the whole group again would be pointless, because that pooled test must be positive. So the first informative move is to test a proper subgroup of size \(k\), where \(1 \le k < n\).
Let \(E_n\) be the event that the full group of size \(n\) contains at least one infected person. Then
$\Pr(E_n)=1-q^n.$
If the tested subgroup of size \(k\) is negative, then all \(k\) tested people are healthy. Under the condition \(E_n\), the remaining \(n-k\) people must therefore contain at least one infected person. The corresponding conditional probability is
$p_a=\frac{q^k(1-q^{n-k})}{1-q^n}.$
In that branch, the problem becomes \(B_{n-k}\).
If the tested subgroup is positive, then that subgroup becomes a new conditional problem \(B_k\), while the remaining \(n-k\) people are still an unconditional problem \(A_{n-k}\). The complementary probability is
$p_b=1-p_a=\frac{1-q^k}{1-q^n}.$
Therefore the expected cost of choosing size \(k\) is
$1+p_aB_{n-k}+p_b(B_k+A_{n-k}),$
and minimizing over all legal \(k\) gives
$\boxed{B_n=\min_{1\le k<n}\left(1+p_aB_{n-k}+p_b(B_k+A_{n-k})\right).}$
Step 3: Deriving the unconditional recurrence for \(A_n\)
For \(A_n\), the code compares two families of strategies.
The first option is to test the whole group immediately. That costs one test. With probability \(q^n\) the result is negative and we are done; with probability \(1-q^n\) the result is positive and we then face the conditional state \(B_n\). So this strategy costs
$1+(1-q^n)B_n.$
The second option is to test a proper subgroup of size \(k\) first. Regardless of the outcome, the remaining \(n-k\) people are still an unconditional subproblem, so they contribute \(A_{n-k}\). Only when the tested subgroup is positive do we also pay the extra cost \(B_k\). Thus
$1+q^kA_{n-k}+(1-q^k)(B_k+A_{n-k})=1+A_{n-k}+(1-q^k)B_k.$
Taking the better of the whole-pool strategy and all split-first strategies yields
$\boxed{A_n=\min\left(1+(1-q^n)B_n,\ \min_{1\le k<n}\left(1+A_{n-k}+(1-q^k)B_k\right)\right).}$
Step 4: Dynamic-programming order and final target
Both recurrences use only smaller group sizes, so the arrays can be filled in increasing order \(n=2,3,\dots,s\). The implementations also precompute
$q^0,q^1,\dots,q^s$
so that each transition reads the needed probabilities in constant time instead of recomputing powers repeatedly.
Once the tables are complete, the desired value for a fixed probability is simply
$T(s,p)=A_s.$
The Project Euler target is therefore
$\boxed{\sum_{i=1}^{50} A_{10000}\Bigl(p=\frac{i}{100}\Bigr).}$
The C++ source also checks two benchmark values before producing the final answer:
$T(25,0.02)=4.155452,\qquad T(25,0.10)=12.702124.$
How the Code Works
The main implementation is in solutionsCpp/Euler352.cpp. It precomputes qpow[n]=q^n, fills B[n] from the conditional recurrence, then fills A[n] from the unconditional recurrence. The function T(s,p) returns A[s].
The outer solve() function evaluates \(p=0.01,0.02,\dots,0.50\) and sums the 50 independent results. In C++, those independent probability cases are distributed across pthread workers, capped at 8 threads. The Java file solutionsJava/Euler352.java is a direct port of the same DP and parallelizes with ExecutorService.
The Python file solutionsPython/Euler352.py is intentionally different: it is a small bridge that compiles the C++ solver if needed, runs the produced binary, and parses the output. So the mathematical source of truth is the C++ recurrence, with Java matching it directly and Python delegating to it.
Complexity Analysis
For fixed \((s,p)\), the dynamic program loops over \(n=2,\dots,s\) and, for each \(n\), tries every \(k=1,\dots,n-1\). The total number of candidate splits is
$\sum_{n=2}^{s}(n-1)=\frac{s(s-1)}{2}=O(s^2).$
The arrays qpow, A, and B each have length \(s+1\), so the memory usage is \(O(s)\). The final Project Euler computation repeats this for 50 independent values of \(p\). Parallelism improves wall-clock time, but the total asymptotic work remains \(50\cdot O(s^2)\).
References
- Problem page: https://projecteuler.net/problem=352
- Group testing overview: Wikipedia — Group testing
- Dynamic programming overview: Wikipedia — Dynamic programming
- Repository source implementations used here:
solutionsCpp/Euler352.cpp,solutionsJava/Euler352.java,solutionsPython/Euler352.py