
Use our hypergeometric distribution calculator whenever you need to find the probability (or cumulative probability) of a random variable following the hypergeometric distribution. If you want to learn what the hypergeometric distribution is and what the hypergeometric distribution formula looks like, keep reading!

Besides those essential facts, we also provide you with an example of the hypergeometric distribution and discuss when to use the hypergeometric probability distribution vs. the (more familiar) binomial distribution.

What is the hypergeometric distribution?

The hypergeometric probability distribution describes the number of successes (objects with a specified feature, as opposed to objects without it) in a sample of fixed size, given the total number of items in the population and the number of success items among them (the total number of objects with that feature). Importantly, we assume sampling without replacement: once we choose an item from the population, we cannot select it again.
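To make "sampling without replacement" concrete, here is a minimal Monte Carlo sketch in Python. It assumes the population is encoded as 1s (success items) and 0s (failure items); the function name `simulate_hypergeometric` is our own, hypothetical choice. `random.sample` picks distinct positions from the list, so no item can appear twice in one sample.

```python
import random
from collections import Counter


def simulate_hypergeometric(N, K, n, trials, seed=0):
    """Estimate the distribution of X (successes in a sample of size n)
    by repeatedly drawing n items without replacement."""
    rng = random.Random(seed)
    # Hypothetical encoding: 1 = success item, 0 = failure item
    population = [1] * K + [0] * (N - K)
    # random.sample never picks the same position twice -> no replacement
    counts = Counter(sum(rng.sample(population, n)) for _ in range(trials))
    # Relative frequency of each observed number of successes
    return {k: c / trials for k, c in sorted(counts.items())}


estimates = simulate_hypergeometric(N=48, K=12, n=10, trials=50_000)
print(estimates)
```

As `trials` grows, the relative frequencies should settle near the exact probabilities given by the formula in the next section.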

The hypergeometric distribution is useful whenever a drawn item cannot be drawn again, e.g., in card games, where a card that has been dealt cannot be dealt a second time. It also appears in Fisher's exact test, which we use to test the difference between two proportions when the sample size is small (≤ 50).

Note that, although the population's items are divided into two mutually exclusive categories (success/failure), the hypergeometric distribution is not the same as the binomial distribution. See the last section for more details. But first, let's discuss the formula for the hypergeometric distribution.

Hypergeometric distribution formula

Three parameters define the hypergeometric probability distribution:

  • N - the total number of items in the population;
  • K - the number of success items in the population; and
  • n - the number of drawn items (sample size).

A random variable X follows the hypergeometric distribution if its probability mass function is given by:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

where,

  • k is the number of drawn success items.
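The formula translates almost directly into code. Below is a minimal Python sketch; the name `hypergeom_pmf` is our own, not a library function. It uses the standard library's `math.comb` for the binomial coefficients and returns 0 for values of k outside the possible range max(0, n - (N - K)) ≤ k ≤ min(n, K).

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for a hypergeometric distribution with parameters N, K, n."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # k must lie in the support: max(0, n - (N - K)) <= k <= min(n, K)
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # (ways to choose k successes) * (ways to choose n - k failures)
    # divided by (ways to choose any n items)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


print(round(hypergeom_pmf(4, N=48, K=12, n=10), 4))
```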

The formula above is usually written with binomial coefficients. Expanding them with the factorial operator !, we can rewrite it as:

$$P(X = k) = \frac{K!\,(N-K)!\,n!\,(N-n)!}{k!\,(K-k)!\,(n-k)!\,(N-K-n+k)!\,N!}$$
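As a sanity check that the factorial form describes the same quantity as the binomial-coefficient form, the sketch below evaluates both with exact rational arithmetic (`fractions.Fraction`), so no floating-point rounding is involved. The helper names are hypothetical, and the factorial version assumes k lies in the valid range, because `math.factorial` rejects negative arguments.

```python
from fractions import Fraction
from math import comb, factorial


def pmf_factorial(k, N, K, n):
    # K!(N-K)!n!(N-n)! / (k!(K-k)!(n-k)!(N-K-n+k)!N!), assuming k is in the support
    num = factorial(K) * factorial(N - K) * factorial(n) * factorial(N - n)
    den = (factorial(k) * factorial(K - k) * factorial(n - k)
           * factorial(N - K - n + k) * factorial(N))
    return Fraction(num, den)


def pmf_binomial(k, N, K, n):
    # C(K, k) * C(N - K, n - k) / C(N, n), kept as an exact fraction
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


exact = pmf_factorial(4, 48, 12, 10)
print(exact == pmf_binomial(4, 48, 12, 10), float(exact))
```

Python integers have arbitrary precision, so even large factorials such as 48! are computed exactly here.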

The mean and variance of the hypergeometric distribution

For a hypergeometric distribution with parameters N, K, n:

  • The mean (expected value) of the hypergeometric distribution is:

    n * K / N

  • The variance of the hypergeometric distribution is:

    n * K * (N - K) * (N - n) / [N² * (N - 1)]
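One way to double-check these closed forms is to compute the mean and variance directly from the probability mass function and compare. The Python sketch below does exactly that; the function names are illustrative, and the closed-form variance assumes N ≥ 2 so that N - 1 is not zero.

```python
from math import comb


def hypergeom_mean_var(N, K, n):
    # Closed forms from the text (assumes N >= 2)
    mean = n * K / N
    var = n * K * (N - K) * (N - n) / (N**2 * (N - 1))
    return mean, var


def moments_from_pmf(N, K, n):
    # Brute force: E[X] = sum k*P(X=k), Var[X] = sum (k - E[X])^2 * P(X=k)
    lo, hi = max(0, n - (N - K)), min(n, K)
    total = comb(N, n)
    probs = {k: comb(K, k) * comb(N - K, n - k) / total for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in probs.items())
    var = sum((k - mean) ** 2 * p for k, p in probs.items())
    return mean, var


print(hypergeom_mean_var(48, 12, 10))
print(moments_from_pmf(48, 12, 10))
```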

How to use this hypergeometric distribution calculator?

As you can see, there are lots of formulae related to the hypergeometric distribution that are not so trivial to evaluate. Fortunately, there's our hypergeometric distribution calculator! 😁 Let's explain how to use it before we move on to an example of the hypergeometric distribution.

  1. Enter the parameters of the hypergeometric distribution you want to consider.

  2. Choose what to compute: P(X = k) or one of the four types of cumulative probabilities: P(X > k), P(X ≥ k), P(X < k), P(X ≤ k).

  3. Our hypergeometric distribution calculator returns the desired probability.

  4. Switch to the advanced mode if you also want to see the mean and variance of your hypergeometric distribution.
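Each of the options in step 2 boils down to summing P(X = x) over the relevant values of x. The following Python sketch is a hypothetical stand-in for that step, not the calculator's actual code; the `kind` strings are labels we made up for the five options.

```python
from math import comb


def pmf(x, N, K, n):
    # P(X = x); zero outside the support max(0, n - (N - K)) <= x <= min(n, K)
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def hypergeom_query(kind, k, N, K, n):
    """kind is one of '=', '>', '>=', '<', '<=' (hypothetical labels)."""
    support = range(max(0, n - (N - K)), min(n, K) + 1)
    tests = {
        "=": lambda x: x == k,
        ">": lambda x: x > k,
        ">=": lambda x: x >= k,
        "<": lambda x: x < k,
        "<=": lambda x: x <= k,
    }
    if kind not in tests:
        raise ValueError(f"unknown kind: {kind!r}")
    # Sum the point probabilities of every x that satisfies the condition
    return sum(pmf(x, N, K, n) for x in support if tests[kind](x))


for kind in ("=", ">", ">=", "<", "<="):
    print(f"P(X {kind} 4) = {hypergeom_query(kind, 4, 48, 12, 10):.4f}")
```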

Example of the hypergeometric distribution

Now that you know what the hypergeometric distribution is, let's look at an example.

Imagine a bag of chocolate bars with 12 dark and 36 white chocolate bars. You close your eyes and draw 10 bars without replacement.

  • What is the probability that you have exactly 4 dark chocolate bars?

    The parameters are:

    N = 48, K = 12, n = 10, k = 4.

    So we apply the hypergeometric distribution formula and obtain:

    P(X = 4) = 12!*36!*10!*38! / (48!*4!*8!*6!*30!) ≈ 0.1474

  • What is the probability that you have at least 4 dark chocolate bars?

    P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10) ≈ 0.2023

  • What is the mean of this hypergeometric distribution?

    10 * 12 / 48 = 2.5

  • What is the variance of this hypergeometric distribution?

    10 * 12 * (48 - 12) * (48 - 10) / [48² * 47] = 285 / 188 ≈ 1.5160

  • What is the standard deviation of this hypergeometric distribution?

    √(285 / 188) ≈ 1.2312
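Every quantity in this example can be recomputed in a few lines of Python using only the standard library. The sketch below hard-codes the chocolate-bar parameters; the helper `pmf` is our own.

```python
from math import comb, sqrt

N, K, n = 48, 12, 10  # 48 bars in the bag, 12 of them dark, 10 drawn


def pmf(k):
    # P(X = k) for the parameters above
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


p_exactly_4 = pmf(4)
p_at_least_4 = sum(pmf(k) for k in range(4, min(n, K) + 1))  # k = 4..10
mean = n * K / N
variance = n * K * (N - K) * (N - n) / (N**2 * (N - 1))
std_dev = sqrt(variance)

print(f"P(X = 4)  ~ {p_exactly_4:.4f}")
print(f"P(X >= 4) ~ {p_at_least_4:.4f}")
print(f"mean      = {mean}")
print(f"variance  ~ {variance:.4f}")
print(f"std dev   ~ {std_dev:.4f}")
```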

Hypergeometric distribution vs. binomial distribution

The hypergeometric and binomial distributions both quantify the probability of k successes in n trials. However:

  • For the hypergeometric distribution, we sample without replacement, so each draw removes an item from the population. Consequently, the probability of success changes from one draw to the next; and
  • For the binomial distribution, we sample with replacement, so the probability of success remains the same for every trial.
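The difference between these two cases is easiest to see one draw at a time. The sketch below uses a hypothetical helper, `next_draw_success_prob`, with exact fractions to give the probability that the next draw is a success, with and without replacement, for the chocolate-bar numbers from the example (12 dark bars out of 48).

```python
from fractions import Fraction


def next_draw_success_prob(N, K, successes_so_far, draws_so_far, replace):
    """Probability that the next draw is a success, given the draws so far."""
    if replace:
        # With replacement, every drawn item goes back, so nothing changes
        return Fraction(K, N)
    # Without replacement, drawn items (and drawn successes) are gone
    return Fraction(K - successes_so_far, N - draws_so_far)


# First draw, then a second draw after the first one turned out dark
print(next_draw_success_prob(48, 12, 0, 0, replace=False))  # 1/4
print(next_draw_success_prob(48, 12, 1, 1, replace=False))  # 11/47
print(next_draw_success_prob(48, 12, 1, 1, replace=True))   # 1/4
```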

Tip: If the population size is large and the sample is small (relative to the population size), the hypergeometric distribution gives almost the same results as the binomial distribution.
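The tip can be checked numerically. The Python sketch below keeps the sample size (n = 10) and the success proportion (K/N = 0.25) fixed, grows the population, and compares P(X = 4) under both distributions. The particular values of N are arbitrary choices for illustration.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    # Sampling without replacement
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    # Sampling with replacement: constant success probability p
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, k, p = 10, 4, 0.25
for N in (48, 480, 4800, 48000):
    K = N // 4  # keep K / N = 0.25
    h = hypergeom_pmf(k, N, K, n)
    b = binom_pmf(k, n, p)
    print(f"N={N:>6}: hypergeometric={h:.4f}  binomial={b:.4f}  diff={abs(h - b):.6f}")
```

As N grows while n stays at 10, the gap between the two columns shrinks, which is the behavior the tip describes.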

So we can summarize as follows:

  • Use the hypergeometric distribution if you sample without replacement and the population is small enough that each draw noticeably changes the probability of success on the next draw; and

  • Use the binomial distribution for sampling with replacement or for sampling without replacement with a large population and a small sample size.

Anna Szczepanek, PhD candidate