Open Source · 7 Languages · 900+ Problems

Project Euler Solutions

Complete solutions in C++, Python & Java — with step-by-step mathematical explanations

All Problems

Problem 553: Power Sets of Power Sets

View on Project Euler

Project Euler Problem 553 Solution

EulerSolve provides an optimized solution for Project Euler Problem 553, Power Sets of Power Sets, with C++, Python, Java, and a step-by-step mathematical explanation.

Problem Summary Let \([n]=\{1,\dots,n\}\). We choose a non-empty family \(\mathcal{F}\) of non-empty subsets of \([n]\). Its intersection graph has one vertex for each chosen subset, and two vertices are adjacent exactly when the corresponding subsets intersect. We must compute \(C(n,K)\), the number of such families whose intersection graph has exactly \(K\) connected components. The target input is \(n=10000\) and \(K=10\), so direct enumeration is completely infeasible. Mathematical Approach The implementations solve the problem by first counting all families on a fixed label set, then extracting the connected ones with exponential generating functions. Step 1: Count every non-empty family on \(r\) labels On a ground set of size \(r\), there are \(2^r-1\) non-empty subsets. Any non-empty selection of those subsets is a valid family, so the raw count is $R_r=2^{2^r-1}-1.$ Even before connectivity is considered, this quantity grows doubly exponentially, which explains why brute force is hopeless for the real parameters. Step 2: Force every label to appear Let \(U_n\) be the number of families whose union is exactly \([n]\). Inclusion-exclusion removes families that miss at least one label. If we restrict attention to a chosen set of \(r\) labels, then every selected subset must lie inside that \(r\)-set, so there are \(R_r\) possibilities....

Detailed mathematical approach

Problem Summary

Let \([n]=\{1,\dots,n\}\). We choose a non-empty family \(\mathcal{F}\) of non-empty subsets of \([n]\). Its intersection graph has one vertex for each chosen subset, and two vertices are adjacent exactly when the corresponding subsets intersect. We must compute \(C(n,K)\), the number of such families whose intersection graph has exactly \(K\) connected components. The target input is \(n=10000\) and \(K=10\), so direct enumeration is completely infeasible.

Mathematical Approach

The implementations solve the problem by first counting all families on a fixed label set, then extracting the connected ones with exponential generating functions.

Step 1: Count every non-empty family on \(r\) labels

On a ground set of size \(r\), there are \(2^r-1\) non-empty subsets. Any non-empty selection of those subsets is a valid family, so the raw count is

$R_r=2^{2^r-1}-1.$

Even before connectivity is considered, this quantity grows doubly exponentially, which explains why brute force is hopeless for the real parameters.

Step 2: Force every label to appear

Let \(U_n\) be the number of families whose union is exactly \([n]\). Inclusion-exclusion removes families that miss at least one label. If we restrict attention to a chosen set of \(r\) labels, then every selected subset must lie inside that \(r\)-set, so there are \(R_r\) possibilities. Therefore

$U_n=\sum_{r=0}^{n}(-1)^{n-r}\binom{n}{r}R_r.$

For the generating-function step we normalize by factorials:

$a_n=\frac{U_n}{n!}\quad (n\ge 1),\qquad a_0=1.$

The special value \(a_0=1\) represents the empty assembly of connected components; it is a standard bookkeeping convention for the exponential formula.

Step 3: Why the exponential formula applies

Consider a family whose union is all of \([n]\). If two graph components shared a label \(t\), then some chosen subset in the first component and some chosen subset in the second would both contain \(t\). Those two subsets would intersect, producing an edge between the components, which is impossible. Hence different connected components use disjoint blocks of labels.

Conversely, if we partition \([n]\) into disjoint non-empty blocks and place one connected family on each block, then the resulting intersection graph has exactly those blocks as its connected components. So families that use every label are exactly sets of connected families on disjoint labeled blocks.

Define

$A(x)=\sum_{n\ge 0} a_n x^n,\qquad H(x)=\sum_{n\ge 1} h_n x^n,$

where \(h_n\) is the number of connected families using all \(n\) labels, divided by \(n!\). The exponential formula gives

$A(x)=\exp(H(x)).$

Step 4: Recover the connected coefficients

Differentiating \(A(x)=\exp(H(x))\) gives

$A'(x)=H'(x)A(x).$

Matching the coefficient of \(x^{m-1}\) yields

$m a_m=\sum_{i=1}^{m} i h_i a_{m-i}.$

Because \(a_0=1\), this becomes the recurrence

$h_m=a_m-\frac{1}{m}\sum_{i=1}^{m-1} i h_i a_{m-i}.$

Thus the connected counts can be recovered one size at a time after the all-label counts are known.

Step 5: Allow unused labels and require exactly \(K\) components

The original problem does not force every label to appear in a chosen subset. An unused label contributes no graph vertex and can be chosen independently, which gives the labeled factor

$e^x=\sum_{j\ge 0}\frac{x^j}{j!}.$

A set of exactly \(K\) connected components contributes

$\frac{H(x)^K}{K!}.$

Therefore the required count is

$\boxed{C(n,K)=n!\,\left[x^n\right]\left(e^x\frac{H(x)^K}{K!}\right).}$

This is the coefficient formula evaluated by the implementations.

Worked Example: \(n=2\)

We have \(R_0=0\), \(R_1=1\), and \(R_2=2^3-1=7\). Hence

$U_1=1,\qquad U_2=7-2\cdot 1=5,$

so

$a_1=1,\qquad a_2=\frac{5}{2}.$

The recurrence gives

$h_1=a_1=1,\qquad h_2=a_2-\frac{1}{2}h_1a_1=\frac{5}{2}-\frac{1}{2}=2.$

Thus

$H(x)=x+2x^2+O(x^3).$

For one connected component,

$C(2,1)=2!\,\left[x^2\right]\left(e^xH(x)\right)=2!\,(2+1)=6.$

For two connected components,

$C(2,2)=2!\,\left[x^2\right]\left(e^x\frac{H(x)^2}{2}\right)=2!\cdot\frac{1}{2}=1.$

This matches direct inspection: among the seven non-empty families on \(\{1,2\}\), only the family containing \(\{1\}\) and \(\{2\}\) but not \(\{1,2\}\) has two connected components.

How the Code Works

The C++, Python, and Java implementations all work modulo \(10^9+7\). They precompute factorials, inverse factorials, and modular inverses so that binomial factors and exponential-generating-function coefficients can be handled with simple modular multiplications.

Next they evaluate the raw counts \(R_r=2^{2^r-1}-1\). Because the modulus is prime, the enormous exponent \(2^r-1\) is reduced modulo \(10^9+6\) before fast modular exponentiation is applied. This keeps the computation small while preserving the correct modular value.

The inclusion-exclusion step is implemented as one polynomial convolution of the signed sequence \(((-1)^r R_r/r!)_{r\ge 0}\) with \((1/r!)_{r\ge 0}\). After the final sign adjustment, this produces the normalized coefficients \(a_n\).

The connected coefficients \(h_n\) are then recovered sequentially from the recurrence above. To obtain exactly \(K\) components, the implementation raises the truncated series \(H(x)\) to the \(K\)-th power by binary exponentiation, multiplies by the truncated series for \(e^x\), extracts the coefficient of \(x^N\), and finally multiplies by \(N!/K!\).

The C++ implementation additionally parallelizes the expensive convolutions and includes optional checkpoint validations on small cases before tackling the large target. The Python and Java implementations use the same mathematics with straightforward quadratic convolutions.

Complexity Analysis

Let \(M=\max(n,K)\). Precomputing factorials, modular inverses, and the raw counts up to \(M\) costs \(O(M\log \text{MOD})\) time and \(O(M)\) memory, which is not the dominant part.

The main cost comes from truncated polynomial convolutions of degree \(M\). With the naive multiplication used here, one convolution costs \(O(M^2)\) time and \(O(M)\) additional memory. Recovering all connected coefficients from the recurrence is also \(O(M^2)\).

Exponentiating \(H(x)\) to the \(K\)-th power requires \(O(\log K)\) convolutions, so the total running time is \(O(M^2\log K)\) with an additional \(O(M^2)\) term from the recurrence. In practice the quadratic series operations dominate. Memory usage remains \(O(M)\) because only a small number of coefficient arrays are active at once.

Footnotes and References

  1. Problem page: https://projecteuler.net/problem=553
  2. Inclusion-exclusion principle: Wikipedia - Inclusion-exclusion principle
  3. Exponential formula in combinatorics: Wikipedia - Exponential formula
  4. Exponential generating function: Wikipedia - Generating function
  5. Connected component in graph theory: Wikipedia - Connected component

Mathematical approach · C++ solution · Python solution · Java solution

Previous: Problem 552 · All Project Euler solutions · Next: Problem 554

Need help with a problem? Ask me! 💡
e
✦ Euler GLM 5.2
Hi! I'm Euler. Ask me anything about Project Euler problems, math concepts, or solution approaches! 🧮