Hypergeometric Distribution Calculator

Creators

Anna Szczepanek, PhD, Jagiellonian University in Kraków, Poland

Anna Szczepanek, PhD, is a mathematician at the Faculty of Mathematics and Computer Science of the Jagiellonian University in Kraków, where she researches mathematical physics and applied mathematics. At Omni, Anna uses her knowledge and programming skills to create math and statistics calculators. In her free time, she enjoys hiking and reading.

Reviewers

Dominik Czernia, PhD, Institute of Nuclear Physics PAN

Dominik Czernia, PhD, is a physicist at the Institute of Nuclear Physics in Kraków, specializing in condensed matter physics with a focus on molecular magnetism. He has led several national research projects, pioneering innovative approaches to novel materials for high technology. Passionate about making science accessible, Dominik has created various calculators, mostly in physics and math categories. In his free time, he enjoys family walks, city explorations, mountain hiking, and traveling everywhere by bike.

and Jack Bowater

Use our hypergeometric distribution calculator whenever you need to find the probability (or cumulative probability) of a random variable following the hypergeometric distribution. If you want to learn what the hypergeometric distribution is and what the hypergeometric distribution formula looks like, keep reading!

In addition to those essential facts, we also provide you with the properties of the hypergeometric distribution, an example of the hypergeometric distribution, and discuss when to use the hypergeometric probability distribution vs. the (more familiar) binomial distribution.

What is the hypergeometric distribution?

The hypergeometric probability distribution describes the number of successes (objects with a specified feature, as opposed to objects without this feature) in a sample of fixed size when we know the total number of items and the number of success items (total number of objects with that feature). Importantly, we assume sampling is without replacement — when we choose an item from the population, we cannot select it again.

The hypergeometric distribution turns out to be useful whenever an observed event cannot re-occur, e.g., in various card games, in which the fact that we drew a card implies we will not draw that card again. The hypergeometric distribution also appears in Fisher's exact test, which we use to test the difference between two proportions when the sample size is small (≤ 50). Check out our dedicated Fisher's exact test calculator to discover more.

Note that, although the population's items are divided into two mutually exclusive categories (success/failure), the hypergeometric distribution is not the same as the binomial distribution. See the last section and the binomial distribution calculator for more details.

But first, let's discuss the properties and the formula of the hypergeometric distribution.

The properties of the hypergeometric distribution

We focus on three properties of the hypergeometric distribution:

The mean;
The standard deviation; and
The variance.

Mean

The mean of the hypergeometric distribution can be calculated by using the following formula:

\mu = n \frac{K}{N}

Where:

n is the number of drawn items (sample size);
K is the number of success items in the population; and
N is the population size.

Standard deviation

The standard deviation of the hypergeometric distribution can be calculated as:

\sigma = \sqrt{ n \frac{K}{N} \frac{N - K}{N} \frac{N - n}{N - 1} }

Variance

The variance of the hypergeometric distribution is the square of the standard deviation:

\sigma^2 = n \frac{K}{N} \frac{N - K}{N} \frac{N - n}{N - 1}

Hypergeometric distribution formula

Three parameters define the hypergeometric probability distribution:

N — Total number of items in the population;
K — Number of success items in the population; and
n — Number of drawn items (sample size).

A random variable X follows the hypergeometric distribution if its probability mass function is given by:

P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}

where:

k — Number of drawn success items.

The hypergeometric distribution formula is usually written with binomial coefficients, as above. With the use of the factorial operator !, we can rewrite the above equation as:

P(X = k) = \frac{K! \, (N-K)! \, n! \, (N-n)!}{N! \, k! \, (K-k)! \, (n-k)! \, (N-K-n+k)!}

🔎 See the factorial calculator if you're not sure what the exclamation mark ! means.
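
The probability mass function above can be evaluated directly with binomial coefficients. The following is a minimal Python sketch, not the calculator's actual implementation; the function name `hypergeom_pmf` is hypothetical, and it assumes the inputs satisfy 0 ≤ K ≤ N and 0 ≤ n ≤ N.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for population size N, K success items, and sample size n.

    Assumes 0 <= K <= N and 0 <= n <= N (not validated here).
    """
    # k must lie between max(0, n - (N - K)) and min(n, K);
    # any other value is impossible, so its probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Small check: 5 items, 2 of them successes, draw 2, exactly 1 success.
print(hypergeom_pmf(1, 5, 2, 2))  # 2 * 3 / 10 = 0.6
```

Using `math.comb` rather than the factorial form keeps the intermediate integers smaller and avoids writing out five factorials in the denominator.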

The mean and variance of the hypergeometric distribution

For a hypergeometric distribution with parameters N, K, n:

The mean of the hypergeometric distribution (expected value) is equal to:

n × K / N

The variance of the hypergeometric distribution is equal to:

n × K × (N - K) × (N - n) / [N² × (N - 1)]
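
As a quick illustration, the two formulas translate into a few lines of Python. This is only a sketch: the function name `hypergeom_mean_var` is hypothetical, and it assumes N > 1 so that the variance formula does not divide by zero.

```python
def hypergeom_mean_var(N: int, K: int, n: int) -> tuple[float, float]:
    """Return (mean, variance) of the hypergeometric distribution.

    Assumes N > 1, 0 <= K <= N, and 0 <= n <= N (not validated here).
    """
    mean = n * K / N
    variance = n * K * (N - K) * (N - n) / (N**2 * (N - 1))
    return mean, variance

# Small check: 5 items, 2 of them successes, draw 2.
mean, variance = hypergeom_mean_var(5, 2, 2)
print(mean, variance)  # 0.8 0.36
```

The standard deviation, if needed, is the square root of the returned variance.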

How to use this hypergeometric distribution calculator?

As you can see, there are lots of formulae related to the hypergeometric distribution that are not so trivial to evaluate. Fortunately, there's our hypergeometric distribution calculator! 😁 Let's explain how to use it before we move on to an example of the hypergeometric distribution.

Enter the parameters of the hypergeometric distribution you want to consider.
Choose what to compute: P(X = k) or one of the four types of cumulative probabilities: P(X > k), P(X ≥ k), P(X < k), P(X ≤ k).
Our hypergeometric distribution calculator returns the desired probability.
At the very bottom of the calculator, you will find the mean and variance of your hypergeometric distribution.
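
The five probability types listed in the steps above all reduce to summing P(X = i) over a range of values of i. The sketch below shows one way to do that in Python. It is an illustration under stated assumptions rather than the calculator's own code: the function names and the mode labels ("eq", "gt", "ge", "lt", "le") are hypothetical.

```python
from math import comb

def _ways(i: int, N: int, K: int, n: int) -> int:
    """Number of samples of size n containing exactly i success items."""
    if i < max(0, n - (N - K)) or i > min(n, K):
        return 0  # impossible outcome
    return comb(K, i) * comb(N - K, n - i)

def hypergeom_prob(k: int, N: int, K: int, n: int, mode: str = "eq") -> float:
    """P(X = k), P(X > k), P(X >= k), P(X < k), or P(X <= k).

    `mode` is one of "eq", "gt", "ge", "lt", "le" (hypothetical labels).
    Assumes 0 <= K <= N and 0 <= n <= N (not validated here).
    """
    lo, hi = max(0, n - (N - K)), min(n, K)  # smallest and largest possible X
    ranges = {
        "eq": range(k, k + 1),
        "gt": range(k + 1, hi + 1),
        "ge": range(k, hi + 1),
        "lt": range(lo, k),
        "le": range(lo, k + 1),
    }
    # Sum exact integer counts first, then divide once by the total count.
    return sum(_ways(i, N, K, n) for i in ranges[mode]) / comb(N, n)

# Small check: 5 items, 2 of them successes, draw 2, at least 1 success.
print(hypergeom_prob(1, 5, 2, 2, "ge"))  # (6 + 1) / 10 = 0.7
```

Summing the integer counts before dividing avoids accumulating floating-point rounding errors across the terms.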

Example of the hypergeometric distribution

Now that you know what the hypergeometric distribution is, let's have a look at a hypergeometric distribution example.

Imagine a bag of chocolate bars with 12 dark and 36 white chocolate bars. You close your eyes and draw 10 bars without replacement.

What is the probability that you have exactly 4 dark chocolate bars?

The parameters are:

N = 48, K = 12, n = 10, k = 4.

So we apply the hypergeometric distribution formula and obtain:

P(X = 4) = 12!×36!×10!×38! / (48!×4!×8!×6!×30!) ≈ 0.1474

What is the probability that you have at least 4 dark chocolate bars?

P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10) ≈ 0.2023

What is the mean of this hypergeometric distribution?

10 × 12 / 48 = 2.5

What is the variance of this hypergeometric distribution?

10 × 12 × (48 - 12) × (48 - 10) / [48² × 47] = 285 / 188 ≈ 1.5160

What is the standard deviation of this hypergeometric distribution?

√(285 / 188) ≈ 1.2312

You can check these results with our hypergeometric distribution calculator!
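
Another way to sanity-check the chocolate bar example is to simulate the draws. The sketch below is a Monte Carlo illustration, not part of the calculator: the function name, the number of trials, and the seed are arbitrary assumptions, and the estimates only approximate the exact values given above.

```python
import random

def simulate_dark_bars(trials: int = 50_000, seed: int = 0):
    """Estimate P(X = 4), P(X >= 4), and the mean by repeated sampling."""
    rng = random.Random(seed)
    bag = [1] * 12 + [0] * 36  # 1 = dark bar, 0 = white bar
    # rng.sample draws without replacement, as in the example.
    counts = [sum(rng.sample(bag, 10)) for _ in range(trials)]
    p_exactly_4 = sum(c == 4 for c in counts) / trials
    p_at_least_4 = sum(c >= 4 for c in counts) / trials
    mean = sum(counts) / trials
    return p_exactly_4, p_at_least_4, mean

p_eq, p_ge, mean = simulate_dark_bars()
print(f"P(X = 4) ~ {p_eq:.3f}, P(X >= 4) ~ {p_ge:.3f}, mean ~ {mean:.2f}")
```

With tens of thousands of trials, the estimates should land within about one percentage point of 0.1474 and 0.2023, and the sample mean should be close to 2.5.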

Hypergeometric distribution vs. binomial distribution

The hypergeometric and binomial distributions both quantify the probability of k successes in n trials. However:

For the hypergeometric distribution, there is sampling with no replacement, so each draw decreases the population. In consequence, after each trial, the probability of success in the next trial changes; and
For the binomial distribution, there is sampling with replacement, so the probability of success remains the same for every trial.

Tip: If the population size is large and the sample is small (relative to the population size), the hypergeometric distribution gives almost the same results as the binomial distribution.
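
To see this tip in numbers, the sketch below compares the two probability mass functions while the population grows and the share of success items stays at 25%. The function names and the chosen population sizes are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when sampling n items without replacement."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 10, 4
for N in (48, 480, 4800):
    K = N // 4  # keep the share of success items at 25%
    h = hypergeom_pmf(k, N, K, n)
    b = binom_pmf(k, n, K / N)
    print(f"N = {N:5d}: hypergeometric = {h:.5f}, binomial = {b:.5f}, gap = {abs(h - b):.5f}")
```

The gap between the two values shrinks as N grows relative to the sample size n, which is why the binomial distribution is a convenient approximation for large populations.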

So we can summarize as follows:

Use the hypergeometric distribution if you sample without replacement and the population has few enough elements that a trial changes the probability of the next trial significantly; and
Use the binomial distribution for sampling with replacement or for sampling without replacement with a large population and a small sample size.