Normal Distribution Calculator
- Normal distribution definition
- What is standard normal distribution
- The normal CDF formula
- How to use the normal distribution calculator: an example
- The amazing properties of the bell curve probability distribution
- More about the central limit theorem
- Normal distribution table and multivariate normal
- Normal distribution and statistical testing
- Going beyond the bell curve
This normal distribution calculator (also a bell curve calculator) calculates the area under a bell curve and establishes the probability of a value being higher or lower than any arbitrary X. You can also use this probability distribution calculator to find the probability that your variable is in any arbitrary range, X to X₂, just by using the normal distribution mean and standard deviation values. This article explains some basic terms regarding the standard normal distribution, gives you the formula for the normal cumulative distribution function (normal CDF), and provides you with examples of normal distribution probabilities.
Normal distribution definition
Normal distribution (also known as the Gaussian) is a continuous probability distribution. Most data is close to a central value, with no bias to left or right. Many observations in nature, such as height of people or blood pressure, follow this distribution.
In a normal distribution the mean value (average) is also the median (the "middle" number of a sorted list of data) and the mode (value that appears most often). As this distribution is symmetric about the center, 50% of values are lower than the mean and 50% of values are higher than the mean.
Another parameter characterizing the normal distribution is the standard deviation. It describes how spread out the numbers are. Approximately 68% of values lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. The number of standard deviations from the mean is called the z-score. It may be the case that you know the variance, but not the standard deviation of your distribution. However, it's easy to work out the latter by simply taking the square root of the variance.
You can say that an increase in the mean value shifts the entire bell curve to the right. A change in the standard deviation tightens or spreads out the distribution around the mean. In strongly dispersed distributions, there's a higher likelihood for a random data point to fall far from the mean. The shape of the bell curve is determined only by those two parameters.
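The 68%, 95%, and 99.7% figures can be checked numerically. The snippet below is a minimal sketch that uses Python's standard `statistics.NormalDist`; the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area between -k and +k under the standard normal curve.
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

Because the z-score already measures distance in standard deviations, the same shares apply to any mean and standard deviation.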
What is standard normal distribution
You can standardize any normal distribution by converting each value to its standard score (z-score): subtract the population mean from the data value and divide the difference by the population standard deviation. The standard normal distribution has the following properties:
- Mean value is equal to 0;
- Standard deviation is equal to 1;
- Total area under the curve is equal to 1;
- Every value of variable x is converted into the corresponding z-score.
You can use this tool as a standard normal distribution calculator as well. If you input the mean, μ, as 0 and the standard deviation, σ, as 1, the z-score will be equal to X.
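Standardization is a one-line computation. Here is a small sketch; the function name `z_score` is an assumption for illustration, and the height figures are the ones used in this article's worked example.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (or below) the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


# With mu = 0 and sigma = 1, the z-score is simply the value itself.
print(z_score(1.5, mu=0.0, sigma=1.0))

# A height of 185 cm when the mean is 175.7 cm and the standard deviation is 10 cm.
print(round(z_score(185.0, mu=175.7, sigma=10.0), 2))
```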
The total area under the standard normal distribution curve is equal to 1. That means that it corresponds to probability. You can calculate the probability of your value being lower than any arbitrary X (denoted as P(x < X)) as the area under the graph to the left of the z-score of X.
Let's take another look at the graph above and consider the distribution values within one standard deviation. You can see that the remaining probability (0.32) is divided into two regions. The right-hand tail and the left-hand tail of the normal distribution are symmetrical, each with an area of 0.16. This mathematical beauty is precisely why data scientists love the Gaussian distribution!
The normal CDF formula
Calculating the area under the graph is not an easy task. You can either use the normal distribution table or evaluate the normal cumulative distribution function (normal CDF), which is the integral of the bell curve from minus infinity up to x:
Φ(x) = 1/√(2π) * ∫_{-∞}^{x} exp(-t^2/2) dt
For example, if you want to find the probability of a variable being lower than X, you should evaluate this integral from minus infinity to X. Similarly, if you want to find the probability of the variable being higher than X, you should evaluate it from X to infinity. Make sure to check out the p-value calculator for more information on this topic.
You can also use this calculator as a normal CDF calculator!
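The integral has no closed form in elementary functions, but it can be written with the error function as Φ(x) = (1 + erf(x/√2))/2. The sketch below compares that expression with a plain midpoint-rule integration of the standard normal bell curve. The function names, the lower cutoff of -10, and the number of steps are choices made for this illustration only.

```python
import math


def phi_density(t: float) -> float:
    """Standard normal bell curve (the integrand of the CDF)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_cdf_numeric(x: float, lower: float = -10.0, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral from `lower` to x.

    The true lower limit is minus infinity; the area below -10 is negligible.
    """
    h = (x - lower) / steps
    return h * sum(phi_density(lower + (i + 0.5) * h) for i in range(steps))


print(f"erf-based:   {normal_cdf(1.96):.6f}")
print(f"integration: {normal_cdf_numeric(1.96):.6f}")
# Probability of a value higher than X is the complementary area.
print(f"P(x > 1.96): {1.0 - normal_cdf(1.96):.6f}")
```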
Note, however, that the cumulative distribution function of the normal distribution should not be confused with its density function (the bell curve), which assigns a probability density, rather than a probability, to each argument:
φ(x) = 1/√(2π) * exp (-0.5 * x^2)
By definition, the density function is the first derivative, i.e. the rate of change, of the normal CDF.
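You can see the derivative relationship numerically by comparing the slope of the CDF with the density at a few points. This is only a sketch: the helper `cdf_slope` and the step size `h` are assumptions for the illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def cdf_slope(x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of the derivative of the normal CDF at x."""
    return (std_normal.cdf(x + h) - std_normal.cdf(x - h)) / (2.0 * h)


for x in (-1.0, 0.0, 1.5):
    print(f"x = {x:+.1f}  slope of CDF = {cdf_slope(x):.6f}  density = {std_normal.pdf(x):.6f}")
```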
How to use the normal distribution calculator: an example
- Decide on the mean of your normal distribution. For example, we can try to analyze the distribution of height in the United States. The average height of an adult man is 175.7 cm.
- Choose the standard deviation for your data set. Let's say it is equal to 10 cm.
- Let's say you want to use this bell curve calculator to determine the probability of an adult man being taller than 185 cm. Then, your X will be equal to 185 cm.
- Our normal distribution calculator will display two values: the probability of a person being taller than 185 cm (P(x > X)) and shorter than 185 cm (P(x < X)). In this case, the former is equal to 17.62% and the latter to 82.38%.
- You can also open the advanced mode to calculate the probability of a variable x being in a certain range (from X to X₂). For example, the probability of an adult American man's height being between 185 and 190 cm is equal to 9.98%.
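The steps above can be reproduced outside the calculator. The following sketch uses Python's `statistics.NormalDist` with the same mean (175.7 cm) and standard deviation (10 cm); the variable names are chosen for this illustration.

```python
from statistics import NormalDist

# Height of adult men, using the mean and standard deviation from the steps above.
height = NormalDist(mu=175.7, sigma=10.0)

p_shorter = height.cdf(185.0)                      # P(x < 185)
p_taller = 1.0 - p_shorter                         # P(x > 185)
p_between = height.cdf(190.0) - height.cdf(185.0)  # P(185 < x < 190)

print(f"P(x > 185 cm)       = {p_taller:.2%}")
print(f"P(x < 185 cm)       = {p_shorter:.2%}")
print(f"P(185 < x < 190 cm) = {p_between:.2%}")
```

The range probability is simply the difference between two values of the CDF, which is what the advanced mode computes.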
You can also use the SMp(x) function to simulate the normal distribution. It is a more versatile function, but it is a bit more complicated, too. Therefore, we recommend it to experienced statisticians (or the deeply curious).
The amazing properties of the bell curve probability distribution
The normal distribution describes a large number of natural phenomena: processes that happen continuously, and on a large scale. According to the law of large numbers, the average value of a sufficiently large sample drawn from some distribution will be close to the mean of its underlying distribution. The more measurements you take, the closer you get to the actual value of the mean for the population.
Keep in mind, however, that one of the most robust statistical tendencies is regression toward the mean. This term, coined by the famous British scientist Francis Galton, reminds us that things tend to even out over time. Taller parents tend to have, on average, children with height closer to the mean. After a period of high GDP (gross domestic product) growth, a country tends to experience a couple of years of more moderate total output.
It may frequently be the case that natural variation in repeated data looks a lot like a real change. However, it's just a statistical fact that relatively high (or low) observations are often followed by ones with values closer to the average. Regression to the mean is often the source of anecdotal evidence that cannot be confirmed on statistical grounds.
The normal distribution is known for its remarkable mathematical properties. The sums and averages of many independent random variables, whether discrete or continuous, tend to converge toward the normal distribution. This is called the central limit theorem, and it's clearly one of the most important theorems in statistics. Thanks to it, you can use the normal distribution mean and standard deviation calculator to simulate the distribution of even the largest datasets.
More about the central limit theorem
As your sample size gets larger and larger, the distribution of the sample mean approaches normality, regardless of the initial shape of the population distribution. For example, with a sufficiently large number of observations, the normal distribution may be used to approximate the Poisson distribution or the binomial probability distribution. Consequently, the normal distribution is often considered as the limiting distribution of a sequence of random variables.
That's why it is frequently claimed that many statistical tests and procedures need a sample of more than 30 data points for the normal approximation to be reasonable. In statistical language, such properties are often called asymptotic.
If you're not sure what the underlying distribution of your data is, but you can obtain a large number of observations, you can be fairly confident that their average is approximately normally distributed. This holds even for random walk phenomena, that is, processes that evolve with no discernible pattern or trend, because the position after many independent steps is a sum of random increments.
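A small simulation can make the theorem more tangible. The sketch below draws many samples from a strongly skewed exponential distribution (mean 1, standard deviation 1) and checks what share of the sample means falls within one standard error of the true mean. For a normal distribution, that share should be roughly 68%. The sample size of 50, the number of samples, and the seed are arbitrary choices for this illustration.

```python
import random
from statistics import fmean


def sample_means(sample_size: int, n_samples: int, seed: int = 0) -> list[float]:
    """Means of n_samples independent samples drawn from an exponential(1) distribution."""
    rng = random.Random(seed)
    return [
        fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


sample_size = 50
means = sample_means(sample_size=sample_size, n_samples=5000)

# Exponential(1) has mean 1 and standard deviation 1,
# so the standard error of the sample mean is 1 / sqrt(sample_size).
standard_error = 1.0 / sample_size ** 0.5
share = sum(abs(m - 1.0) <= standard_error for m in means) / len(means)

print(f"share of sample means within 1 standard error: {share:.3f}")
```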
Normal distribution table and multivariate normal
A standard normal distribution table is a great place to look up the reference values when building confidence intervals. You can use our normal distribution probability calculator to confirm that the value you used to construct the confidence intervals is correct. For example, if X = 1.96, then that X is the 97.5th percentile of the standard normal distribution. (Set mean = 0, standard deviation = 1, and X = 1.96, and see that 97.5% of values are below X.)
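The table lookup can also be done in code, in both directions. In the sketch below, `mean_confidence_interval` is a hypothetical helper for a confidence interval of a mean when the population standard deviation is known; the sample size of 100 is an assumption made purely for the example.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

p_below = std_normal.cdf(1.96)     # share of values below X = 1.96
z_975 = std_normal.inv_cdf(0.975)  # the reverse lookup: X for the 97.5th percentile

print(f"P(x < 1.96)       = {p_below:.4f}")
print(f"97.5th percentile = {z_975:.4f}")


def mean_confidence_interval(sample_mean: float, sigma: float, n: int,
                             level: float = 0.95) -> tuple[float, float]:
    """Two-sided interval for a mean, assuming a known population standard deviation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Illustrative numbers only: mean 175.7 cm, sigma 10 cm, 100 observations.
low, high = mean_confidence_interval(175.7, sigma=10.0, n=100)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```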
What's more, provided that the observations you use are random, independent, and drawn from a normal distribution, the sample mean and the sample variance are also independent of each other. The univariate Gaussian distribution (calculated for a single variable) may also be generalized to a set of variables. This generalization, called the multivariate normal distribution, describes the joint distribution of a specific number of variables. It may be used to model higher dimensional data, such as a comprehensive assessment of patients.
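To give a feel for the multivariate case, here is a sketch of the simplest version: two standard normal variables with a chosen correlation. It builds the second variable as a weighted mix of the first one and an independent normal draw, which is one common way to construct correlated normal pairs. The function names, the correlation of 0.6, and the seed are assumptions for this illustration.

```python
import math
import random


def sample_bivariate_normal(n: int, rho: float, seed: int = 0) -> list[tuple[float, float]]:
    """Draw n pairs of standard normal variables with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing z1 with an independent z2 gives unit variance and correlation rho.
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pairs


def pearson_r(pairs: list[tuple[float, float]]) -> float:
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


pairs = sample_bivariate_normal(n=5000, rho=0.6)
print(f"sample correlation: {pearson_r(pairs):.3f}")
```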
Normal distribution and statistical testing
Many types of statistical tests are based on the assumption that the observations used in the testing procedure follow the Gaussian distribution. This is true for much of inferential statistics, that is, when you use the information from the sample to make generalizations about the entire population.
For example, you may formally check whether the estimated value of a parameter is statistically different from zero, or if a mean value in one population is equal to the other. Most of the simple tests that help you to answer such questions (the so-called parametric tests) rely on the assumption of normality, and may give misleading results when an empirical distribution has different properties than a normal one.
This assumption should be tested before you apply these tests. There are a couple of popular normality tests to find out whether the distribution of your data is normal: the Shapiro-Wilk test, based on the ordered sample values and the sample variance, and the Jarque-Bera test, based on the skewness and the excess kurtosis of the empirical distribution. Both tests help you interpret your results accurately and maintain the explanatory power of statistical models.
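As an illustration of the second test, the Jarque-Bera statistic can be computed by hand as JB = n/6 * (S^2 + K^2/4), where S is the sample skewness and K is the excess kurtosis. Under normality, JB approximately follows a chi-square distribution with 2 degrees of freedom, whose upper tail probability is exp(-JB/2). The sketch below is a plain-Python version under those assumptions. The approximation is asymptotic, so treat the p-value with care for small samples. The function name and the simulated data are made up for the example.

```python
import math
import random


def jarque_bera(data: list[float]) -> tuple[float, float]:
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    mean = sum(data) / n
    # Central moments (divided by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("data must not be constant")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Upper tail of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

for name, sample in (("normal", normal_sample), ("skewed", skewed_sample)):
    stat, p = jarque_bera(sample)
    print(f"{name}: JB = {stat:.2f}, p-value = {p:.4f}")
```

A small p-value suggests that the data deviate from normality, while a large one means only that the test found no evidence against it.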
Testing for normality also helps you check if you can expect excess rates of return of financial assets, such as stocks, or how well your portfolio performs against the market. The mean of the empirical distribution may be used to approximate the effectiveness of your investment. The variance, on the other hand, can be used to assess the risk that characterizes a portfolio.
One of the most commonly used normality assumptions regards linear (or even non-linear) regression models. Typically, it is assumed that the least squares estimator residuals follow a normal distribution with a mean value of zero and a fixed (time-invariant) standard deviation (you can think of these residuals as the distances from the regression line to the actual data points). The goodness of fit of the least squares model may be assessed using the chi-square test. However, if the error distribution is non-normal, it may mean that your estimates are biased or inefficient.
Another important example in this area is ANOVA (analysis of variance), used to check whether the mean values of two or more samples are equal. In its canonical form, ANOVA may be successfully performed only when the distribution of model residuals is normal.
Going beyond the bell curve
There are several ways in which the distribution of your data may deviate from the bell curve distribution, but the two most important of them are:
- fat tails - extreme values may occur with higher probabilities (e.g. there's a relatively high chance of getting the abnormal results);
- skewness - distribution is asymmetric, mean and median values of the distribution are different (e.g. dispersion of wages in the labor market).
Non-normal distributions are common in finance, but you can expect the same kinds of problems to appear in psychology or social studies. One of many examples of such distributions is the geometric distribution, suitable for modeling the number of independent trials needed to get the first success, e.g., the number of dice rolls until a six appears. If you are interested in financial calculators, make sure to check out our net profit margin calculator that allows you to work out an intuitive measure of a company's profitability. Take a look at the compound interest calculator, too.