Subset Sum is NP-Complete

SUBSET-SUM={S,tS={x1,,xk} and for some {y1,,yl}{x1,,xk},yi=t}\text{SUBSET-SUM} = \{\langle S, t \rangle | S = \{x_1, \dots, x_k\}\ \text{and for some}\ \{y_1, \dots, y_l\} \subseteq \{x_1, \dots, x_k\},\sum y_i = t \}

We will try to prove that the Subset Sum Problem is NP-Complete. To do this, we’ll assume that the 3SAT3-\text{SAT} problem is NP-Complete, and we’ll show that we can reduce the 3SAT3-\text{SAT} problem to the Subset Sum problem in polynomial time.

Hence, given a boolean formula in 3CNF3-\text{CNF} ϕ\phi, we will try to construct an instance of the Subset Sum Problem in polynomial time such that the subset problem will be satisfiable if and only if ϕ\phi is satisfiable.

The boolean variables in the formula ϕ\phi will be x1,x2,,xlx_1, x_2, \dots, x_l and the clauses will be c1,c2,,ckc_1, c_2, \dots, c_k.

An example of a 3-CNF we’ll use for demonstrative purposes is

(x1xˉ2x3)(x2x3)(xˉ3)(x_1 \lor \bar x_2 \lor x_3) \land (x_2 \lor x_3 \lor \dots) \land \dots \land (\bar x_3 \lor \dots \lor \dots)

We’ll construct a set of numbers, with each variable and clause corresponding to 2 numbers each. Subset Sum

Every boolean variable xix_i will correspond to 2 numbers yiy_i and ziz_i, while every clause corresponds to 2 numbers gig_i and hih_i.

Every number we construct will be in the decimal base system even though we’re only using 0s and 1s, and they will all be of l+kl + k digits, where ll is the number of boolean variables and kk is the number of clauses.

We’ll talk about each number with respect to indexing from the MSB and not the LSB, as it will make it easier to understand.

Numbers corresponding to boolean variables i.e. yi,ziy_i, z_i

Initially set all the digits to 00, and set digits according to this method.

First ll digits

For yiy_i and ziz_i, set the ithi^\text{th} digit to 11, and keep the rest of the l1l - 1 digits in the first ll digits set to 00.

Last kk digits

If xix_i appears in clause cjc_j, then the l+jl + j digit will be set to 11 for yiy_i. In the table, y3y_3’s l+1l + 1 digit is set to 11 as x3x_3 appears in the first clause.

Similarly, if a variable xˉi\bar x_i appears in clause cjc_j, then the l+jl + j digit will be set to 11 for ziz_i. As xˉ2\bar x_2 appears in the first clause, the l+1l + 1 digit is set to 11 for z2z_2.

Numbers corresponding to clauses i.e gi,hig_i, h_i

For each clause cic_i, we have 2 corresponding numbers in the set - gig_i and hih_i. These two numbers are identical, and only have the l+il + i digit set to 11, and the rest to 00.

The target sum tt is given as 1111133331111\dots1333\dots3, where we have the first ll digits as 11, and the last kk digits as 33. We claim that we can pick a subset of the set of numbers we constructed if and only if the 3CNF3-\text{CNF} is satisfiable.

Proof this instance of subset sum is equivalent to 3SAT3-\text{SAT}

Fixing values for variables

The first property we ensure is that we pick either xix_i or xˉi\bar x_i for a given ii. We can’t pick both, or pick none, we must pick exactly one of them.

This property is ensured by the fact that the first ll digits are 11s. We know that only yiy_i and ziz_i can have a digit in the first ll digits be set. For a given position aa in the first ll positions, we know that there are only two numbers that have the atha^{th} digit set, yay_a and zaz_a.

Hence, we have the option of choosing exactly one of them in the subset, as choosing both would set the sum of digits at the position to be 22, and choosing none would sum up to 00 for that position, both cases which don’t satisfy the target sum.

Ensuring all clauses are true

For ϕ\phi to be satisfiable, it must make sure that every clause has at least one term that is true. This is ensured by the last kk digits of every number.

First, we’ll consider only the numbers yiy_i and ziz_i. We need to pick the numbers in such a way that each clause has at least one true variable. In terms of the subset problem, we can restate the problem as ensuring that digits l+1l + 1 to l+kl + k in the sum are all 1\geq 1, as this implies that we’ve picked at least one true variable for each clause.

The max value a digit can have amongst the last kk digits is 33 (when all three variables are true in a clause). To ensure that you can reach exactly 33, we have gig_i and hih_i.

  • If bth digit =3b^\text{th}\ \text{digit}\ = 3, no need to pick extra variables.
  • If bth digit =2b^\text{th}\ \text{digit}\ = 2, pick gbg_b or hbh_b.
  • If bth digit =1b^\text{th}\ \text{digit}\ = 1, pick both gbg_b and hbh_b.
  • If bth digit =0b^\text{th}\ \text{digit}\ = 0, it’s unsatisfiable.

Hence, we can say that gig_i, hih_i were added to the set to provide some sort of padding for testing if each clause is true for a given subset of numbers chosen.

If there exists a subset of numbers that satisfy the subset problem constructed, we get a satisfying assignment to variables for the 3SAT3-\text{SAT} problem. For all ii where yiy_i was chosen, we assign xi=1x_i = 1, and for the rest we assign xi=0x_i = 0.

Hence, we’ve proven that the 3SAT3-\text{SAT} problem can be reduced to the Subset Sum Problem, proving that Subset Sum Problem is NP-complete\text{NP-complete}

Published Nov 22, 2022