Coupon Collector Problem

There are many cereal boxes, and in each cereal box with equal probability there is one of the nn coupons C1,C2,C3,CnC_{1}, C_{2}, C_{3}, \dots C_{n}.

How many boxes do we have to buy to collect all nn coupons?

XiX_{i} - Random variable that is defined to be equal to the number of purchases needed to get ithi^{\text{th}} coupon.
Our answer is E[i=1nXi]=i=1nE[Xi]E\left[\sum\limits_{i=1}^{n}X_{i}\right]= \sum\limits_{i=1}^{n} E[X_{i}]

X1=1X_{1} = 1 as the first box will always result in a new coupon for us

Finding XiX_{i}

We know the probability of success when you’re opening a box for getting the second new coupon is

Pr[success]=n1nPr[\text{success}] = \frac{n - 1}{n}

Therefore the expected number of trials for getting the second coupon is

E[X2]=nn1E[X_{2}] = \frac{n}{n - 1}

Hence, for general XiX_{i},

E[Xi]=nn(i1)E[X_{i}] = \frac{n}{n - (i - 1)}

Hence using linearity of expectation

i=1nE[Xi]=nn+nn1+nn2++n1nlnn\sum\limits_{i=1}^{n} E[X_{i}] = \frac{n}{n} + \frac{n}{n - 1} + \frac{n}{n - 2} + \dots + \frac{n}{1} \approx n \ln n

Probability of deviation from expected value

If we use Markov’s inequality for this problem to find the probability that it takes 2E[X]\geq 2 \cdot E[X] coupons, we’d get the bound

P[X2E[X]]E[X]2E[X]=12P[X \geq 2 E[X]] \leq \frac{E[X]}{2 E[X]} = \frac{1}{2}

However, that may not be a very good bound. We can instead use Chebyshev’s Inequality, which says -

P(xμa)Var(X)a2P(|x - \mu | \geq a) \leq \frac{Var(X)}{a^{2}}

Finding tighter bound for Coupon Collector problem

Variance of geometric distribution =1pp2= \frac{1 - p}{p^{2}}
For independent random variables X1,X2,,XnX_{1}, X_{2}, \dots, X_{n}, we can sum the variances

Var[X]=i=1nVar[Xi]=i=1n(1pi)pi2=i=1n(1(ni+1n))(ni+1n)2=i=1n(i1n)(ni+1)2n2=i=1nn(i1)(ni+1)2=j=1nn(nj)j2(j=ni+1)=(n2j=1n1j2)(nj=1n1j)n2π26nHnn2π26\begin{aligned} Var[X] &= \sum\limits_{i=1}^{n}Var[X_{i}]\\ &= \sum\limits_{i=1}^{n}\frac{(1 - p_i)}{p_{i}^{2}}\\ &= \sum\limits_{i=1}^{n}\frac{\left(1 - \left(\frac{n - i + 1}{n}\right)\right)}{\left(\frac{n -i + 1}{n}\right)^{2}}\\ &= \sum\limits_{i=1}^{n} \frac{\left(\frac{i - 1}{n}\right)}{\frac{(n - i + 1)^{2}}{n^{2}}}\\ &= \sum\limits_{i=1}^{n}\frac{n(i - 1)}{(n - i + 1)^{2}}\\ &= \sum\limits_{j=1}^{n}\frac{n(n - j)}{j^{2}} & (j &= n - i + 1)\\ &= \left(n^{2}\sum\limits_{j=1}^{n}\frac{1}{j^{2}}\right)- \left(n \sum\limits_{j=1}^{n}\frac{1}{j}\right)\\ & \leq \frac{n^{2}\pi^{2}}{6} - n \cdot H_{n}\\ &\leq \frac{n^{2}\pi^{2}}{6} \end{aligned}

Now, finding the probability that it takes 2E[X]2nlgn\geq 2 E[X] \approx 2 n \lg n coupons is

Pr[X2E[X]]Pr[XE[X]E[X]]Pr[X \geq 2 E[X]] \leq Pr[|X - E[X]| \geq E[X]]

Using Chebyshev’s Inequality, a=E[X]a = E[X].

Pr[XE[X]E[X]]Var(X)E[X]2π2n26n2Hn2=π26Hn22ln2(n)\begin{aligned} Pr[|X - E[X]| \geq E[X]] &\leq \frac{Var(X)}{E[X]^{2}} \leq \frac{\pi^{2}n^{2}}{6n^{2}H_{n}^{2}}\\ &= \frac{\pi^{2}}{6H_{n}^{2}} \leq \frac{2}{\ln^{2} (n)} \end{aligned}

Published Feb 2, 2023