Sample Variance vs. Population Variance: What's the Difference?
Population variance and sample variance are essential statistical measures representing the degree of dispersion or spread in a set of values.
Sample variance is used when you only know part of the group (a sample) and want to estimate the variance of the entire group.
On the other hand, population variance measures the dispersion of values in a population (or whole group).
If this "sample variance vs. population variance" subject interests you, check out our article "What is variance in statistics?" to learn more about variance in general!
Variance quantifies the extent to which the values in a data set differ from that set's mean. In other words, it helps you understand the spread or variability of your data. As noted above, there are two types of variance:
- Sample variance; and
- Population variance.
What is population variance?
Population variance is the variance computed from every data point in a population. It can be used when data is available for the entire population. It is calculated by summing the squared differences between each value and the population mean, then dividing by the total number of observations.
The formula for calculating the population variance (σ²) is as follows:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
where:
- N — Number of data points in the population;
- xᵢ — Each data point in the population; and
- μ — Population mean.
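The formula translates almost term by term into code. Below is a minimal Python sketch; the function name `population_variance` and the data values are our own, chosen for illustration only, and the result is cross-checked against `statistics.pvariance` from Python's standard library, which computes the population variance.

```python
import statistics


def population_variance(data: list[float]) -> float:
    """Population variance: mean of the squared deviations from the population mean."""
    n = len(data)  # N, the number of data points in the population
    if n == 0:
        raise ValueError("population variance requires at least one data point")
    mu = sum(data) / n  # population mean
    return sum((x - mu) ** 2 for x in data) / n  # divide by N


# Illustrative values only, treated as a complete population
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_variance(population))   # 4.0
print(statistics.pvariance(population))  # 4.0
```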
💡 Interested in knowing more? Visit our population variance calculator!
What is sample variance?
Sample variance is the variance of a sample: a subset of the data that does not include the entire population. It is designed to account for the fact that the spread of a sample, measured around the sample's own mean, tends to underestimate the population variance.
As with population variance, it is calculated by summing the squared differences between each value and the mean, in this case the sample mean. Unlike population variance, however, the sum is divided by the number of observations minus 1 (n − 1), which slightly increases the value of the sample variance and eliminates the bias.
The formula for calculating the sample variance (s²) is as follows:
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
where:
- n — Number of data points in the sample;
- xᵢ — Each data point in the sample; and
- x̄ — Sample mean.
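Here is the sample formula as a minimal Python sketch. The function name `sample_variance` and the data are illustrative assumptions; `statistics.variance` from the standard library also divides by n − 1, so it serves as a cross-check.

```python
import statistics


def sample_variance(data: list[float]) -> float:
    """Sample variance: squared deviations from the sample mean, divided by n - 1."""
    n = len(data)  # n, the number of data points in the sample
    if n < 2:
        raise ValueError("sample variance requires at least two data points")
    x_bar = sum(data) / n  # sample mean
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)  # divide by n - 1


# Illustrative values only, treated here as a sample
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(sample))      # 32/7, roughly 4.5714
print(statistics.variance(sample))  # same value
```

The squared deviations of these eight values sum to 32; dividing by n − 1 = 7 rather than by 8 is what makes the estimate slightly larger than the divide-by-n value of 4.0.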
The table below summarizes the major differences between sample and population variance.
Feature | Sample variance | Population variance |
---|---|---|
Definition | Measure of dispersion in a sample | Measure of dispersion in a population |
Mean used | Sample mean | Population mean |
Denominator | n−1 (degrees of freedom) | N (total number of data points) |
Purpose | To estimate the population variance from a sample | To quantify the true variance of the population |
Usage | When only a subset of the population is available | When the entire population data is available |
💡 Did you know?
The standard deviation is the square root of the variance, and it is often preferred because it is expressed in the same units as the original data. Want to save time? Try our standard deviation calculator to calculate it for any data set quickly.
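Because the standard deviation is just the square root of the variance, you can obtain it from either quantity. The sketch below uses Python's standard `statistics` and `math` modules on illustrative values only; `statistics.pstdev` returns the population standard deviation directly (for a sample, `statistics.stdev` plays the same role using the n − 1 variance). If the data were in centimetres, the variance would be in cm² and the standard deviation back in cm.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only

pop_var = statistics.pvariance(data)  # population variance
pop_sd = math.sqrt(pop_var)           # standard deviation = square root of variance

print(pop_var, pop_sd)                # 4.0 2.0
print(statistics.pstdev(data))        # 2.0, computed directly
```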
You can use population variance when you know the data for the entire population and want to measure the actual variability within that population. Here are some cases:
- Census data: Analyzing data from a national census that includes every individual in the population;
- Controlled experiments: Experimenting on the entire population; and
- Complete data sets: Having a complete set of data points for the population.
You can use sample variance when working with a subset (sample) of the entire population and want to estimate the variability within the population. For instance:
- Quality control: Inspecting a sample of products to determine the quality of a large batch;
- Survey research: Surveying a sample of people to deduce the characteristics or behaviour of the entire population;
- Biological studies: Studying a sample of individuals of a species to understand its characteristics as a whole; and
- Experimental studies: Experimenting on a sample in order to generalize the results to the entire population.
In short, population variance measures the actual variability based on all data points, whereas sample variance estimates the variability of a population from a subset of its data. So, when you only have a sample, use sample variance; when you have data for the entire population, use population variance. Knowing which one to apply is important for accurate statistical analysis and for drawing meaningful insights from your data.
To use population variance, you need data for every member of the population, whereas sample variance only requires part of it. For example, if you select eight random students from a class and use their ages to estimate the age variance of the whole class, you would use the sample variance.
In statistics, Bessel's correction consists of using n − 1 instead of n in the formula for a sample's variance, where n is the number of observations in the sample. This corrects the bias in the estimation of the population variance.
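Switching the denominator from n to n − 1 is the same as multiplying the uncorrected variance by n/(n − 1). The sketch below illustrates this with made-up values; it uses `statistics.pvariance` on a sample purely to obtain the uncorrected (divide-by-n) value, and `statistics.variance`, which applies the n − 1 denominator, as a cross-check.

```python
import statistics

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
n = len(sample)

uncorrected = statistics.pvariance(sample)  # divides by n: 10 / 5 = 2.0
corrected = uncorrected * n / (n - 1)       # Bessel's correction: 10 / 4 = 2.5

print(uncorrected, corrected)               # 2.0 2.5
print(statistics.variance(sample))          # 2.5, divides by n - 1 directly
```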
To calculate the variance of a population:
1. Calculate the mean of the data provided.
2. Subtract the mean from each observation.
3. Add up the squares of the values obtained in step 2.
4. Divide the value obtained in step 3 by the total number of observations (N) to obtain the population variance.
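As a worked illustration of these four steps, take the hypothetical population 2, 4, 4, 4, 5, 5, 7, 9 (N = 8):
$$
\begin{aligned}
\mu &= \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5 \\
x_i - \mu &= -3,\ -1,\ -1,\ -1,\ 0,\ 0,\ 2,\ 4 \\
\sum_{i=1}^{8} (x_i - \mu)^2 &= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32 \\
\sigma^2 &= \frac{32}{8} = 4
\end{aligned}
$$
If the same eight values were instead a sample, step 4 would divide by n − 1 = 7, giving s² = 32/7 ≈ 4.57.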
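It turns out that the sum of the squared deviations from the sample mean is never greater than the sum of the squared deviations from the true population mean (the two are equal only when the sample mean happens to equal the population mean). Therefore, an uncorrected sample variance (dividing by n) tends to underestimate the population variance: on average, it falls short by a factor of (n − 1)/n.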
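The "on average" claim can be checked exactly on a tiny example. The sketch below assumes a hypothetical population of 1, 2, 3, 4 and samples of size 2 drawn independently (with replacement), the setting in which dividing by n − 1 is exactly unbiased. It enumerates every possible ordered sample and averages both estimators using exact fractions from Python's `fractions` module.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical population, N = 4
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2  # sample size (assumption: independent draws, with replacement)
samples = list(product(population, repeat=n))  # every ordered sample
biased_total = Fraction(0)
corrected_total = Fraction(0)
for sample in samples:
    x_bar = Fraction(sum(sample), n)
    ss = sum((x - x_bar) ** 2 for x in sample)  # squared deviations from the sample mean
    biased_total += ss / n            # uncorrected: divide by n
    corrected_total += ss / (n - 1)   # Bessel-corrected: divide by n - 1

mean_biased = biased_total / len(samples)
mean_corrected = corrected_total / len(samples)

print(sigma2)          # 5/4
print(mean_biased)     # 5/8, i.e. (n - 1)/n * sigma2
print(mean_corrected)  # 5/4, matches sigma2
```

Individual samples can still overshoot: the sample (1, 4) gives an uncorrected variance of 9/4, above σ² = 5/4. The bias describes the average over all samples, not every single one.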
This article was written by Claudia Herambourg and reviewed by Steven Wooding.