There are many cereal boxes, and in each cereal box with equal probability there is one of the n coupons C1,C2,C3,…Cn.
How many boxes do we have to buy to collect all n coupons?
Xi - Random variable that is defined to be equal to the number of purchases needed to get ith coupon.
Our answer is E[i=1∑nXi]=i=1∑nE[Xi]
X1=1 as the first box will always result in a new coupon for us
Finding Xi
We know the probability of success when you’re opening a box for getting the second new coupon is
Pr[success]=nn−1
Therefore the expected number of trials for getting the second coupon is
E[X2]=n−1n
Hence, for general Xi,
E[Xi]=n−(i−1)n
Hence using linearity of expectation
i=1∑nE[Xi]=nn+n−1n+n−2n+⋯+1n≈nlnn
Probability of deviation from expected value
If we use Markov’s inequality for this problem to find the probability that it takes ≥2⋅E[X] coupons, we’d get the bound
P[X≥2E[X]]≤2E[X]E[X]=21
However, that may not be a very good bound. We can instead use Chebyshev’s Inequality, which says -
P(∣x−μ∣≥a)≤a2Var(X)
Finding tighter bound for Coupon Collector problem
Variance of geometric distribution =p21−p
For independent random variables X1,X2,…,Xn, we can sum the variances
Var[X]=i=1∑nVar[Xi]=i=1∑npi2(1−pi)=i=1∑n(nn−i+1)2(1−(nn−i+1))=i=1∑nn2(n−i+1)2(ni−1)=i=1∑n(n−i+1)2n(i−1)=j=1∑nj2n(n−j)=(n2j=1∑nj21)−(nj=1∑nj1)≤6n2π2−n⋅Hn≤6n2π2(j=n−i+1)
Now, finding the probability that it takes ≥2E[X]≈2nlgn coupons is
Pr[X≥2E[X]]≤Pr[∣X−E[X]∣≥E[X]]
Using Chebyshev’s Inequality, a=E[X].
Pr[∣X−E[X]∣≥E[X]]≤E[X]2Var(X)≤6n2Hn2π2n2=6Hn2π2≤ln2(n)2