Prediction Interval vs. Confidence Interval: Understanding the Difference
When it comes to statistical forecasting, two types of intervals may seem similar at first glance: confidence intervals and prediction intervals. They serve different goals, but their definitions can be confusing for beginners. Let's clarify these two concepts by examining their purpose.
If you need to refresh your knowledge of confidence intervals, we recommend that you first consult our other articles, such as How to Interpret Confidence Intervals and Confidence Interval vs. Standard Deviation, so you can get the whole picture.
What is a confidence interval?
A confidence interval is a range of values that is likely to contain the true value of a population parameter, such as the mean or proportion, with a certain confidence level. For example, a 95% confidence interval for the average weight of adult women in a country means that if you repeat the sampling process many times, about 95% of the intervals obtained will include the actual average weight of all adult women in that country. To construct a confidence interval, you need to know the sample statistic, the standard error, and the confidence level. Check out our confidence interval calculator to learn more about it.
What is a prediction interval?
A prediction interval is a range of values that is likely to contain a single future observation from a population or process, with a certain confidence level. For example, a 95% prediction interval for the weight of a randomly selected adult woman in a country means that if you repeat the whole process many times, each time drawing a new sample and then a new individual, about 95% of the intervals obtained will include the weight of the selected individual. A prediction interval reflects both the uncertainty due to sampling error and the variability within the population or process. To construct a prediction interval, you need to know the sample statistic, the standard error, the confidence level, and the variability of individual observations around the prediction.
Before we dive into the differences between the confidence interval and prediction interval formulas, you should know that confidence intervals show up in two main contexts:
- Inferential statistics, where we estimate a population parameter like the mean from sampled data; and
- Regression analysis, where we estimate the average response of a dependent variable for a given independent variable.
So, first, let's look at the confidence interval formulas for these two contexts:
Confidence interval formula for statistical inference
When estimating a population mean from a sample, we use the following formula:

$$\text{CI} = \bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$$

where:
- $\bar{x}$ — Sample mean.
- $z^*$ — z-score for your confidence level.
- $\sigma$ — Population standard deviation. In practice, we rarely know the true population standard deviation. We usually use the standard deviation of our sample as an estimate.
- $n$ — Sample size.
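The calculation described above (sample mean plus or minus the z-score times the standard deviation divided by the square root of the sample size) can be sketched in a few lines of Python. This is only an illustration: the function name `mean_confidence_interval` and the weights are made up for this example, and the sample standard deviation stands in for the unknown population value. With a sample this small, a t-value would normally replace the z-score; the z-score is kept here to mirror the formula.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """z-based confidence interval for a population mean (illustrative helper)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample SD, used as an estimate of sigma
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Made-up weights in kg, for illustration only
weights_kg = [62.1, 70.4, 58.9, 66.3, 73.0, 64.8, 69.5, 61.2]
low, high = mean_confidence_interval(weights_kg)
print(f"95% CI for the mean: [{low:.1f}, {high:.1f}] kg")
```

Because the margin shrinks with the square root of the sample size, quadrupling the number of observations roughly halves the width of the interval.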
Confidence interval formula for regression
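When predicting the average value of $y$ for a given $x_0$, we use this formula:

$$\text{CI} = \hat{y}_0 \pm z^* \cdot SE \cdot \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}}$$

where:
- $\hat{y}_0$ — Predicted mean response at $x_0$;
- $z^*$ — z-score for your confidence level;
- $SE$ — Standard error of the regression;
- $n$ — Sample size;
- $\bar{x}$ — Mean of $x$ values; and
- $SS_x = \sum (x_i - \bar{x})^2$ — Sum of squares of $x$ deviations.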
Now that you understand the difference between these two confidence interval calculations, we can move on to the prediction interval formula.
Prediction interval formula
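When predicting one new observation of $y$ at $x_0$:

$$\text{PI} = \hat{y}_0 \pm z^* \cdot SE \cdot \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}}$$

where:
- $\hat{y}_0$ — Predicted mean response at $x_0$;
- $z^*$ — z-score for your confidence level;
- $SE$ — Standard error of the regression;
- $n$ — Sample size;
- $\bar{x}$ — Mean of $x$ values; and
- $SS_x = \sum (x_i - \bar{x})^2$ — Sum of squares of $x$ deviations.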
As you can see, the terms are the same as for the regression confidence interval. However, note the additional "1" inside the square root; it accounts for the variability of individual results, which makes the prediction intervals wider than the confidence intervals.
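To see the effect of that extra "1" in practice, here is a minimal Python sketch that fits a simple linear regression by least squares and computes both intervals at a chosen point. The helper name `regression_intervals` and the data are assumptions made up for this illustration, and the z-score is used for simplicity even though the sample is small.

```python
from math import sqrt
from statistics import NormalDist, mean


def regression_intervals(xs, ys, x0, confidence=0.95):
    """Return (y_hat0, ci, pi) for simple linear regression at x0 (illustrative helper)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    # Least-squares slope and intercept
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    intercept = y_bar - slope * x_bar
    # Standard error of the regression: residual SD with n - 2 degrees of freedom
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    se = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    y_hat0 = intercept + slope * x0
    # Term shared by both intervals: 1/n + (x0 - x_bar)^2 / SS_x
    shared = 1 / n + (x0 - x_bar) ** 2 / ss_x
    ci_margin = z_star * se * sqrt(shared)
    pi_margin = z_star * se * sqrt(1 + shared)  # the extra 1 widens the interval
    ci = (y_hat0 - ci_margin, y_hat0 + ci_margin)
    pi = (y_hat0 - pi_margin, y_hat0 + pi_margin)
    return y_hat0, ci, pi


# Made-up ages (days) and circumferences (mm), for illustration only
ages = [100, 300, 500, 700, 900, 1100, 1300, 1500]
circumferences = [30, 58, 81, 110, 118, 140, 171, 178]
y_hat0, ci, pi = regression_intervals(ages, circumferences, x0=900)
print(f"Prediction: {y_hat0:.1f} mm")
print(f"95% CI for the mean response: [{ci[0]:.1f}, {ci[1]:.1f}] mm")
print(f"95% PI for one observation:   [{pi[0]:.1f}, {pi[1]:.1f}] mm")
```

Both intervals are centered on the same prediction; only the margin differs. As the sample grows, the shared term shrinks toward zero and the confidence interval narrows, while the prediction interval can never become narrower than the z-score times the standard error of the regression.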
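💡 For large samples ($n \geq 30$), or for normally distributed data with a known standard deviation, using the z-score (e.g., 1.96 for 95% confidence) is common. For smaller samples, use the critical t-value $t^*$ instead, with $n - 1$ degrees of freedom for a mean and $n - 2$ for a simple linear regression.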
Feature | Confidence Interval (CI) | Prediction Interval (PI) |
---|---|---|
Purpose | Estimates a population parameter (mean, proportion) | Predicts the value of a single future observation for a given set of predictors |
What it describes | The range of plausible values for the true parameter based on sample data | The range of plausible values for a single new data point |
Variability accounted for | Accounts for sampling error only | Accounts for sampling error plus the natural variability among individual observations |
When to use | When estimating averages, proportions, regression mean responses, or other population parameters | When predicting outcomes for individual cases or making forecasts where a single value is needed |
Common applications | Poll results, average test scores, mean sales estimates, tolerance limits for a mean | Forecasting the next quarter’s sales, the weight of a specific manufactured part, an individual’s medical outcome |
Interpretation | "We are X% confident the true parameter is between A and B." | "We are X% confident a future observation will be between A and B." |
Imagine that you have data on the age and circumference of orange trees. Using a linear regression calculator, you fit a model to predict the circumference of a tree based on its age. For a tree that is 900 days old, the model predicts a circumference of 113.5 mm.
We also know the 95% confidence interval for the average circumference of all trees that are 900 days old, that is [105.3 mm, 121.7 mm]. This means we are 95% confident that the average circumference of all trees of this age lies somewhere within this range. You can try doing your own calculations using a 95% confidence interval calculator.
When we look at the 95% prediction interval for an individual tree instead, the range widens considerably to [64.5 mm, 162.5 mm]. Here, this means that if we randomly select a single 900-day-old tree from the population, we are 95% confident that its circumference will fall somewhere within this range.
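Comparing the half-widths of these two intervals shows how much of the prediction interval comes from tree-to-tree variability rather than from uncertainty about the mean. The sketch below assumes that both intervals were computed with the regression formulas given earlier, using the same critical value and standard error; only the interval endpoints come from the example, and the variable names are illustrative.

```python
from math import sqrt

ci = (105.3, 121.7)  # 95% CI for the mean circumference (mm), from the example
pi = (64.5, 162.5)   # 95% PI for a single tree (mm), from the example

ci_margin = (ci[1] - ci[0]) / 2
pi_margin = (pi[1] - pi[0]) / 2

# Under the stated assumption:
# pi_margin**2 = ci_margin**2 + (critical value * SE)**2
individual_margin = sqrt(pi_margin ** 2 - ci_margin ** 2)

print(f"CI half-width: {ci_margin:.1f} mm")                               # 8.2 mm
print(f"PI half-width: {pi_margin:.1f} mm")                               # 49.0 mm
print(f"Part due to individual variability: {individual_margin:.1f} mm")  # 48.3 mm
```

Under that assumption, almost all of the prediction interval's width comes from the variability of individual trees, which is why collecting more data would tighten the confidence interval much more than the prediction interval.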
A confidence interval indicates the precision of an estimated value or statistic given the data's variability. In contrast, a prediction interval quantifies the expected range for one or more additional future observations, including both the uncertainty about the parameter and the random variability of individual observations.
A prediction interval is a range of values likely to contain the value of a single new observation, given specified values of the predictors. For example, for a 95% prediction interval of [5, 10], you can be 95% confident that the next new observation will fall within this range.
The point prediction for an individual observation is identical to the estimated mean response. However, the prediction interval (PI) for an individual observation is wider than the confidence interval (CI) for the estimated mean response, because individual observations vary more than the mean.
This article was written by Claudia Herambourg and reviewed by Steven Wooding.