DNF Approximate Counting

Decision Problem in NP - A problem $\pi$ is in NP if for any YES instance $I$, there exists a proof that $I$ is a YES instance that can be verified in polynomial time.
Counting Problem - We define a counting problem for a decision problem $\pi$ in NP as follows: given an instance $I$ of $\pi$, output the number of solutions for the instance, denoted $\#(I)$.

Polynomial Approximation Scheme

A deterministic algorithm $A$ for a counting problem $P$ that takes an instance $I$ and $\epsilon \in \mathbb{R}_{>0}$ as input, and in time polynomial in $n = |I|$ produces an output $A(I)$ such that

$$(1 - \epsilon)\#(I) \leq A(I) \leq (1 + \epsilon)\#(I)$$

A fully polynomial approximation scheme (FPAS) is a PAS that runs in time $\text{poly}(n, \frac{1}{\epsilon})$.

Polynomial Randomized Approximation Scheme

A randomized algorithm $A$ for a counting problem $P$ that takes an instance $I$ and $\epsilon \in \mathbb{R}_{>0}$ as input, and in time $\text{poly}(n)$ produces $A(I)$ such that

$$\mathbb{P}\left[(1 - \epsilon)\#(I) \leq A(I) \leq (1 + \epsilon)\#(I)\right] \geq \frac{3}{4}$$

A fully polynomial randomized approximation scheme (FPRAS) is a PRAS that runs in time $\text{poly}(n, \frac{1}{\epsilon})$.

An $(\epsilon, \delta)$-FPRAS for a counting problem is an FPRAS that takes an instance $I$ as input and computes an $\epsilon$-approximation to $\#(I)$ with probability $\geq 1 - \delta$ in time $\text{poly}(n, \frac{1}{\epsilon}, \frac{1}{\delta})$.

Abstract Problem

We have a finite set $U$ of known size and a function $f: U \rightarrow \{0, 1\}$, which defines $G = \{u \in U \mid f(u) = 1\}$. We want to estimate $|G|$.

Attempt 1

Sample $u_{1}, u_{2}, \dots, u_{N}$ from $U$ uniformly and independently.
For all $i \in [N]$, define

$$Y_{i} = \begin{cases} 1 & \text{if } f(u_{i}) = 1 \\ 0 & \text{otherwise} \end{cases}$$

And we define $Z$ as the random variable

$$Z = |U| \cdot \sum_{i=1}^{N} \frac{Y_{i}}{N}$$

$Z$ is our estimate of $|G|$, and it is unbiased:

$$\begin{aligned} \mathbb{E}[Z] &= \mathbb{E}\left[\sum_{i=1}^{N} Y_{i}\right] \cdot \frac{|U|}{N} = \sum_{i=1}^{N}\mathbb{E}[Y_{i}] \cdot \frac{|U|}{N} \\ &= \frac{|U|}{N}\sum_{i=1}^{N}\mathbb{P}[Y_{i}=1] = \frac{|U|}{N} \cdot N \cdot \frac{|G|}{|U|} = |G| \end{aligned}$$
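As a sketch, this naive estimator can be implemented directly when $U$ is small enough to enumerate (the names `universe` and `f` and the toy predicate below are illustrative, not from the notes):

```python
import random

def monte_carlo_estimate(universe, f, N, rng=random):
    """Estimate |G| = |{u in universe : f(u) = 1}| via Z = |U| * (sum Y_i) / N."""
    hits = sum(f(rng.choice(universe)) for _ in range(N))  # sum of the Y_i
    return len(universe) * hits / N                        # Z

# Toy check: count even numbers in {0, ..., 99} (true answer: 50)
universe = list(range(100))
Z = monte_carlo_estimate(universe, lambda u: 1 if u % 2 == 0 else 0, N=10_000)
```

Here $\rho = 1/2$, so a modest $N$ already concentrates $Z$ near the true count; the Estimator Theorem below makes this precise.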

Estimator Theorem

Let $\rho = \frac{|G|}{|U|}$. Then the Monte Carlo method yields an $\epsilon$-approximation to $|G|$ with probability $\geq 1 - \delta$, provided

$$N \geq \frac{4}{\epsilon^{2}\rho}\ln \frac{2}{\delta}$$

Derivation

Let $Y = \sum_{i=1}^{N} Y_{i}$, which lets us rewrite $Z = \frac{|U| \cdot Y}{N}$. Note that $\mathbb{E}[Y] = N\rho$. Computing the probability that $Z$ is an $\epsilon$-approximation, using a Chernoff bound on each tail:

$$\begin{aligned} \mathbb{P}\left[(1 - \epsilon)|G| \leq Z \leq (1 + \epsilon)|G|\right] &= \mathbb{P}\left[(1 - \epsilon)N\rho \leq Y \leq (1 + \epsilon)N\rho\right] \\ &\geq 1 - \mathbb{P}[Y > (1 + \epsilon)N\rho] - \mathbb{P}[Y < (1 - \epsilon)N\rho] \\ &\geq 1 - 2 e^{-N\rho\epsilon^{2}/4} \\ &\geq 1 - \delta \end{aligned}$$

Hence,

$$\begin{aligned} &\delta \geq 2 e^{-N\rho \epsilon^{2}/4} \\ \implies\ & \frac{\delta}{2} \geq e^{-N\rho \epsilon^{2}/4} \\ \implies\ & \frac{N\rho \epsilon^{2}}{4} \geq \ln \frac{2}{\delta} \\ \implies\ & N \geq \frac{4}{\epsilon^{2}\rho}\ln \frac{2}{\delta} \end{aligned}$$

However, $N$ depends on $\rho = \frac{|G|}{|U|}$, so $N$ can be exponential in the input size if $\rho$ is exponentially small. Next, we look at an example of how we can reduce the universe to get a larger $\rho$.

Example of DNF Counting

DNFs are the mirror image of CNFs: we take a logical OR of terms, each of which is a logical AND of literals.

$$F = T_{1}\lor T_{2}\lor \dots \lor T_{m}, \qquad T_{i} = L_{1}\land L_{2}\land \dots \land L_{r_{i}}$$

where term $T_{i}$ contains $r_{i}$ literals.

The universe is $U = \{T, F\}^{n}$, the set of all assignments to the $n$ variables. We need to find the number of satisfying assignments of $F$:

$$G = \{u \in U \mid F(u) = T\}$$

To count this, we make use of biased sampling, which essentially reduces our universe so that we can get an accurate estimate with fewer trials.

We define $H_{i}$ as the set of assignments that satisfy term $T_{i}$, giving us $H_{1}, H_{2}, \dots, H_{m}$.
We know

$$|H_{i}| = 2^{n - r_{i}}$$

as there is exactly one assignment to the $r_{i}$ variables appearing in $T_{i}$ that satisfies it, and the other $n - r_{i}$ variables can take any value.

The count we want is

$$|H| = \left|\bigcup_{i=1}^{m}H_{i}\right|$$

Instead of working with this union directly, we first work with a multiset union of the $H_{i}$.
Definition of the Universal Set

$$U = H_{1}\uplus H_{2}\uplus \dots \uplus H_{m}$$

$$|U| = \sum_{i=1}^{m}|H_{i}| \geq |H|$$

To make things easier, we define $U$ as

$$U = \{(v, i) \mid v \in H_{i}\}$$

so that we can distinguish between duplicates and see which term they came from.

Coverage Set

$$\text{cov}(v) = \{i \mid (v, i) \in U\}$$

This essentially captures all the terms that would be satisfied by the assignment $v$.
A trivial bound is $|\text{cov}(v)| \leq m$, as an assignment can't satisfy more terms than those that exist. Note that

$$|U| = \sum_{v\in H} |\text{cov}(v)|$$

Function to define $G$ in terms of $U$

$$f: U \rightarrow \{0, 1\}$$

$$f((v, i)) = \begin{cases} 1 & \text{if } i = \min\{j \mid (v, j) \in U\} \\ 0 & \text{otherwise} \end{cases}$$

$$G = \{(v, j) \mid f((v, j)) = 1\}$$

With this definition we keep only the minimum-index occurrence of each assignment, hence avoiding duplicates in $G$.

We can see that $|G| = |H|$, since each satisfying assignment $v \in H$ contributes exactly one pair to $G$.
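A quick brute-force sanity check on a tiny DNF (the 2-term formula and the dict-based term encoding are made up for illustration): for $F = (x_{1}\land x_{2}) \lor (x_{2}\land x_{3})$ over $n = 3$ variables, the multiset universe has $|U| = 4$ while $|G| = |H| = 3$.

```python
from itertools import product

# Terms as {variable index: required value}; F = (x1 AND x2) OR (x2 AND x3)
terms = [{0: True, 1: True}, {1: True, 2: True}]
n = 3

def satisfies(v, term):
    return all(v[var] == val for var, val in term.items())

# Multiset universe U = {(v, i) : v satisfies term i}
U = [(v, i) for v in product([False, True], repeat=n)
            for i, term in enumerate(terms) if satisfies(v, term)]

# G keeps only the minimum-index occurrence of each assignment
G = [(v, i) for (v, i) in U if i == min(j for (w, j) in U if w == v)]

H = {v for (v, i) in U}
print(len(U), len(G), len(H))  # prints "4 3 3"
```

The assignment $(T, T, T)$ satisfies both terms, so it appears twice in $U$ but only once (with index 0) in $G$.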

Lemma -

$$\rho = \frac{|G|}{|U|} \geq \frac{1}{m}$$

Proof -

$$\begin{aligned} |U| &= \sum_{v \in H} |\text{cov}(v)| \\ &\leq \sum_{v \in H} m \\ &= m|H| \\ &= m|G| \end{aligned}$$

$$\implies \frac{|G|}{|U|} \geq \frac{1}{m}$$

Hence, we've modified $U$ so that the ratio $\rho$ has a much better lower bound for this specific counting problem.

Using the Estimator Theorem, since $\rho \geq \frac{1}{m}$, the Monte Carlo method yields an $\epsilon$-approximation to $|G|$ with probability $\geq 1 - \delta$, provided

$$N \geq \frac{4m}{\epsilon^{2}} \ln \frac{2}{\delta}$$
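For concreteness, this bound is easy to evaluate; a small helper (the specific values of $m$, $\epsilon$, $\delta$ below are arbitrary, just to show the scale):

```python
import math

def sample_size(m, eps, delta):
    """Samples sufficient per the bound N >= (4m / eps^2) * ln(2 / delta)."""
    return math.ceil(4 * m / eps**2 * math.log(2 / delta))

# e.g. m = 30 terms, 10% accuracy, 99% confidence -- tens of thousands
# of samples, polynomial in m, 1/eps, and ln(1/delta)
print(sample_size(m=30, eps=0.1, delta=0.01))
```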

Randomized Algorithm for DNF

We still need to ensure that the samples $u_{1}, u_{2}, \dots, u_{N}$ are drawn uniformly at random from $U$. The algorithm to do this is as follows.

  1. Sample $i \in [m]$ with probability $\frac{|H_{i}|}{|U|}$.
  2. Sample $v$ from $H_{i}$ uniformly: fix the $r_{i}$ variables appearing in the term, and flip a fair coin to decide the value of each remaining variable.

$$\begin{aligned} \mathbb{P}[(v, i) \text{ is sampled}] &= \mathbb{P}[i \text{ is sampled}] \cdot \mathbb{P}[v \text{ is sampled from } H_{i} \mid i \text{ is sampled}] \\ &= \frac{|H_{i}|}{|U|} \cdot \frac{1}{|H_{i}|} = \frac{1}{|U|} \end{aligned}$$

Hence, every element of $U$ is sampled with uniform probability.
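Putting the sampler and the estimator together gives the full approximate counter. Below is a sketch under the same dict-based term encoding as before; the function name and defaults are assumptions for illustration:

```python
import math
import random

def count_dnf(terms, n, eps=0.1, delta=0.05, rng=random):
    """Approximately count satisfying assignments of a DNF formula.

    Each term is a dict {variable index: required boolean value}.
    """
    m = len(terms)
    sizes = [2 ** (n - len(t)) for t in terms]           # |H_i| = 2^(n - r_i)
    U = sum(sizes)                                       # |U| = sum of |H_i|
    N = math.ceil(4 * m / eps**2 * math.log(2 / delta))  # estimator theorem
    hits = 0
    for _ in range(N):
        # Step 1: pick term i with probability |H_i| / |U|
        i = rng.choices(range(m), weights=sizes)[0]
        # Step 2: sample v uniformly from H_i -- fix the term's variables,
        # flip a fair coin for every other variable
        v = [rng.random() < 0.5 for _ in range(n)]
        for var, val in terms[i].items():
            v[var] = val
        # f((v, i)) = 1 iff i is the smallest index of a term v satisfies
        first = min(j for j, t in enumerate(terms)
                    if all(v[var] == val for var, val in t.items()))
        hits += (first == i)
    return U * hits / N  # Z = |U| * (sum Y_i) / N, estimates |G| = |H|

# F = (x1 AND x2) OR (x2 AND x3) over 3 variables has exactly 3 solutions
terms = [{0: True, 1: True}, {1: True, 2: True}]
print(count_dnf(terms, n=3))  # close to 3 with high probability
```

Since $\rho \geq 1/m$, the number of samples $N$ is polynomial in $m$, $1/\epsilon$, and $\ln(1/\delta)$, so this is an FPRAS for DNF counting.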

Published Feb 2, 2023