RBJLabs ®

Hypergeometric distribution

Next we will look at one of the most widely used distributions in probability: the hypergeometric distribution.

This distribution describes drawing a random sample of size n, without replacement and without regard to order, from a set of N objects.

Because the draws are made without replacement, they are dependent. Of the N objects, r have the feature that interests us, and the random variable X is the number of objects in the sample that have that feature.

To make this clearer: there are {N \choose n} equally likely ways to select n objects. To obtain x successes, x objects must be selected from among the r that have the feature we are interested in, which can be done in {r \choose x} ways, and the remaining n - x objects must be selected from the N - r objects that do not have the feature, which can be done in {N-r \choose n-x} ways.
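As a sanity check of this counting argument, the sketch below enumerates every possible sample from a small, made-up population (N = 8, r = 3, n = 4, chosen only for illustration) and compares the number of samples with exactly x marked objects against {r \choose x}{N-r \choose n-x}. The function names are hypothetical; only Python's standard library is used.

```python
from itertools import combinations
from math import comb

def count_by_enumeration(N, r, n, x):
    """Count the n-object samples (order ignored) that contain exactly x
    of the r objects with the feature, by listing every sample."""
    # Label the objects 0..N-1 and treat labels 0..r-1 as having the feature.
    marked = set(range(r))
    return sum(
        1
        for sample in combinations(range(N), n)
        if len(marked.intersection(sample)) == x
    )

def count_by_formula(N, r, n, x):
    # Choose x of the r objects with the feature and n - x of the N - r without it.
    return comb(r, x) * comb(N - r, n - x)

N, r, n = 8, 3, 4  # hypothetical small population for illustration
for x in range(n + 1):
    print(x, count_by_enumeration(N, r, n, x), count_by_formula(N, r, n, x))
```

Both columns should agree for every x, and together they add up to {N \choose n}, the total number of equally likely samples.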

[Figure: hypergeometric distribution diagram]

Hypergeometric distribution formulas

Using the classical definition of probability and the multiplication rule, we obtain the probability density (strictly, the probability mass function, since X is discrete):

P[X = x] = \cfrac{ {r\choose x} {N-r \choose n-x} }{N \choose n }\quad \max\left[0,\ n-\left(N-r\right)\right] \le x \le \min\left(n,r\right)
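A minimal sketch of this formula, including its support, using exact fractions from Python's standard library. The names `hypergeom_pmf` and `support` are hypothetical helpers, not part of any library; the argument order (N, r, n, x) is an arbitrary choice.

```python
from fractions import Fraction
from math import comb

def support(N, r, n):
    # Lower bound: if only N - r objects lack the feature, at least n - (N - r) successes are forced.
    # Upper bound: we cannot draw more than n objects or more than the r that have the feature.
    return range(max(0, n - (N - r)), min(n, r) + 1)

def hypergeom_pmf(N, r, n, x):
    """Exact P[X = x] for a sample of size n drawn without replacement
    from N objects, r of which have the feature of interest."""
    if x not in support(N, r, n):
        return Fraction(0)
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

# Hypothetical parameters where the lower bound is not 0: N - r = 3 < n = 5.
print(list(support(10, 7, 5)))                                       # [2, 3, 4, 5]
print(sum(hypergeom_pmf(10, 7, 5, x) for x in support(10, 7, 5)))   # 1
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 is not affected by floating-point rounding.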

Its most important characteristics are the expectation and the variance, shown below:

E[X] = n \left( \cfrac{r}{N} \right)

Var[X] = n \left( \cfrac{r}{N} \right) \left( \cfrac{N - r}{N} \right) \left( \cfrac{N-n}{N-1} \right)
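One way to see these two formulas at work is a quick Monte Carlo simulation: draw many samples without replacement and compare the empirical mean and variance of X with n(r/N) and the variance formula. This is only a sketch; the parameters N = 30, r = 12, n = 8, the number of trials and the seed are hypothetical choices, and the helper names are illustrative.

```python
import random
from statistics import fmean

def hypergeom_mean(N, r, n):
    # E[X] = n * (r / N)
    return n * r / N

def hypergeom_var(N, r, n):
    # Var[X] = n * (r/N) * ((N - r)/N) * ((N - n)/(N - 1))
    return n * (r / N) * ((N - r) / N) * ((N - n) / (N - 1))

def simulate(N, r, n, trials=50_000, seed=0):
    """Draw `trials` samples of size n without replacement from objects
    labelled 0..N-1, treating labels 0..r-1 as having the feature, and
    return how many such objects land in each sample."""
    rng = random.Random(seed)
    population = range(N)
    return [sum(1 for obj in rng.sample(population, n) if obj < r)
            for _ in range(trials)]

N, r, n = 30, 12, 8  # hypothetical parameters for illustration
draws = simulate(N, r, n)
sample_mean = fmean(draws)
sample_var = fmean([(d - sample_mean) ** 2 for d in draws])
print(f"mean:     simulated {sample_mean:.3f}, formula {hypergeom_mean(N, r, n):.3f}")
print(f"variance: simulated {sample_var:.3f}, formula {hypergeom_var(N, r, n):.3f}")
```

The simulated values should land close to the formulas. The last factor, (N - n)/(N - 1), is what makes the variance smaller than that of a binomial with the same n and success probability r/N, since drawing without replacement removes some of the randomness.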

The symbols in the formulas above mean the following:

  • N is the population size (the whole batch)
  • r is the number of objects in the population with the feature of interest (for example, the defective units in a batch)
  • n is the sample size (the number of units tested)
  • x is the number of sampled objects that have the feature, the value whose probability we want to calculate

Hypergeometric distribution example

A foundry ships blocks in batches of 20 units. No manufacturing process is perfect, so bad blocks are inevitable. However, a block must be destroyed to identify a defect, so only three units are selected and tested before a lot is accepted. Suppose a given lot includes five defective units.

a) Express the density function.

For this exercise we have the following data:

  • N = 20 units
  • r = 5 defective units
  • n = 3 units that are tested

With these data we can write the density function:

P[X = x] = f(x) = \cfrac{ {r\choose x} {N-r \choose n-x} }{N \choose n } = \cfrac{ {5\choose x} {20-5 \choose 3-x} }{20 \choose 3 } \quad x = 0,1,2,3

Now we calculate the probability for each value of x, that is, the probability that none, one, two, or three of the tested units are defective:

x= 0 \qquad f(x) =  \cfrac{ {5\choose 0} {15 \choose 3-0} }{20 \choose 3 } = \cfrac{91}{228}\approx 0.399

There is a 39.9% chance that zero units will be defective.

x= 1 \qquad f(x) =  \cfrac{ {5\choose 1} {15 \choose 3-1} }{20 \choose 3 } = \cfrac{35}{76}\approx 0.461

There is a 46.1% chance that one unit will be defective.

x= 2 \qquad f(x) =  \cfrac{ {5\choose 2} {15 \choose 3-2} }{20 \choose 3 } = \cfrac{5}{38}\approx 0.132

There is a 13.2% chance that two units will be defective.

x= 3 \qquad f(x) =  \cfrac{ {5\choose 3} {15 \choose 3-3} }{20 \choose 3 } = \cfrac{1}{114}\approx 0.009

There is a 0.9% chance that three units will be defective.
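The four probabilities can be checked with exact fractions. This is a short sketch using Python's standard library; the function `f` is an illustrative name that mirrors the density written for this example, and the decimals are rounded to three places.

```python
from fractions import Fraction
from math import comb

N, r, n = 20, 5, 3  # batch size, defective units in the lot, units tested

def f(x):
    # P[X = x] = C(r, x) * C(N - r, n - x) / C(N, n)
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

table = {x: f(x) for x in range(0, 4)}
for x, p in table.items():
    print(f"x = {x}: {p} approx {float(p):.3f}")
print("total:", sum(table.values()))
```

It prints the fractions 91/228, 35/76, 5/38 and 1/114 with the rounded values 0.399, 0.461, 0.132 and 0.009, and a total of exactly 1, as it must be for a probability function.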

b) Find the expected value of defective units.

To find this value we will apply the expectation formula:

E(X) = n\left( \cfrac{r}{N}\right) = 3\left(\cfrac{5}{20} \right) = \cfrac{3}{4} = 0.75
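The shortcut formula can be compared with the definition of expectation, E(X) = \sum_x x f(x), summed over x = 0, 1, 2, 3. A small sketch with exact fractions (the names are illustrative):

```python
from fractions import Fraction
from math import comb

N, r, n = 20, 5, 3

def f(x):
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

# Expectation from the definition: sum of x * f(x) over the support
expected_by_definition = sum(x * f(x) for x in range(0, n + 1))
# Shortcut formula: E(X) = n * r / N
expected_by_formula = Fraction(n * r, N)
print(expected_by_definition, expected_by_formula)  # 3/4 3/4
```

By hand: 0 + 35/76 + 2(5/38) + 3(1/114) = 35/76 + 22/76 = 57/76 = 3/4.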

So we expect, on average, 0.75 defective units in each sample of three tested units.

c) Find the variance for this case.

To find the variance, we again apply the formula directly:

Var(X) = n \left( \cfrac{r}{N} \right) \left( \cfrac{N - r}{N} \right) \left( \cfrac{N-n}{N-1} \right) = 3\left(\cfrac{5}{20}\right)\left(\cfrac{15}{20}\right)\left(\cfrac{17}{19}\right) = \cfrac{153}{304}

This gives a variance of approximately 0.5033.
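As a cross-check, the variance can also be computed from its definition, Var(X) = E(X^2) - [E(X)]^2, using the probabilities of part a). A sketch with exact fractions (illustrative names, standard library only):

```python
from fractions import Fraction
from math import comb

N, r, n = 20, 5, 3

def f(x):
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

mean = sum(x * f(x) for x in range(n + 1))               # E(X)
second_moment = sum(x * x * f(x) for x in range(n + 1))  # E(X^2)
var_by_definition = second_moment - mean ** 2
# Closed form n (r/N) ((N - r)/N) ((N - n)/(N - 1)) as a single fraction
var_by_formula = Fraction(n * r * (N - r) * (N - n), N * N * (N - 1))
print(var_by_definition, var_by_formula)  # 153/304 153/304
print(round(float(var_by_formula), 4))    # 0.5033
```

Here E(X^2) = 81/76 and [E(X)]^2 = 9/16, so 81/76 - 9/16 = 153/304, matching the formula.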

That’s all. We hope this explanation of the hypergeometric distribution has been clear.

Thank you for spending this time with us : )