Statistics for Experiments
Are you confident running your experiments but hesitant to analyze them statistically? This article will help you understand, step by step, how to apply statistics for experiments.
- Step 1: p-value in science;
- Step 2: Confidence interval; and
- Step 3: Standard deviation.
One of the key factors in evaluating study results is the p-value. The p-value in science lies at the core of health data analysis and statistics for experiments. At its foundation is probability, the mathematical measure of how likely an event is to occur, ranging from 0 (impossible) to 1 (certain).
Relevance
In medical studies, the p-value helps researchers determine whether the observed results in a sample reflect a real effect or are merely due to random chance.
Thus, a p-value is the probability of obtaining results at least as extreme as the observed data, assuming that the null hypothesis (the idea that there is no real effect or difference) is true.
🙋 Interested in statistics for experiments? Learn more with our article p-value for the Null Hypothesis: When to Reject the Null Hypothesis.
Application
For example, if a study analyzed the effectiveness of a drug in lowering blood pressure, the null hypothesis would state that the drug has no effect on blood pressure. Let’s imagine that the placebo group has an average blood pressure of 140 mmHg while the experimental group (with the drug) has an average of 132 mmHg. To calculate the p-value, you need to apply a statistical test such as a t-test to determine how likely the observed difference between the two groups is under the null hypothesis. If a statistical tool or a p-value calculator gives us a p-value of 0.02, this means there is only a 2% probability of observing this result or something more extreme if the null hypothesis is true.
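The scenario above can be sketched in a few lines of Python. The blood pressure readings below are made-up numbers chosen only to mimic the example (placebo mean ≈ 140 mmHg, drug mean ≈ 132 mmHg), not data from a real study:

```python
# Sketch: two-sample t-test on illustrative blood pressure data
# (the values are invented to match the example, not real measurements).
from scipy import stats

placebo = [138, 142, 141, 139, 143, 140, 137, 140]  # mean = 140 mmHg
drug = [130, 134, 133, 131, 129, 135, 132, 132]     # mean = 132 mmHg

t_stat, p_value = stats.ttest_ind(placebo, drug)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A p-value below 0.05 is conventionally read as statistically significant.
if p_value < 0.05:
    print("Reject the null hypothesis: the difference is unlikely due to chance.")
```

With data this consistent, the test returns a very small p-value; with noisier or overlapping groups, the same code would return a larger one.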
Statistical significance
There is a common threshold that scientists use to determine statistical significance. A p-value of less than 0.05 essentially says there is a less than 5% probability of seeing data this extreme if the null hypothesis is true. Find more in our article A p-value Less Than 0.05 — What Does it Mean?.
However, the p-value measures reliability, not impact: it tells you how unlikely the observed results are under the null hypothesis, not how large the effect is.
Statistics for experiments go beyond the p-value, as a statistically significant result does not always imply a meaningful effect in real life. For example, a new drug might reduce blood pressure by just 1%, and even if this result is statistically significant (p<0.01), such a small change may have little or no practical benefit for a patient.
That’s why p-values should always be interpreted together with confidence intervals (CI).
Application
A confidence interval gives a range of likely values for the true effect. For example, you can be 95% confident (95% CI) that the true reduction in blood pressure lies between 8 and 12 units of measurement.
Why both matter
- p-value — tells you whether the result is likely due to chance; and
- Confidence interval — tells you how large and precise the effect is.
Looking at both together gives a more complete and clinically meaningful picture.
Understanding confidence intervals
- Wide confidence interval (e.g., from 1 to 50 units of measurement) — too imprecise to guide clinical decisions even if statistically significant; and
- Narrow interval (e.g., 9 to 11 units of measurement) — a clear and reliable estimate of what to expect in practice.
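To make this concrete, here is a minimal sketch of computing a 95% confidence interval for a mean blood pressure reduction. The per-patient reductions are assumed values for illustration only:

```python
# Sketch: 95% confidence interval for a mean reduction in blood pressure.
# The per-patient reductions below are illustrative, not real data.
import math
import statistics
from scipy import stats

reductions = [8, 12, 10, 9, 11, 10, 12, 8, 11, 9]  # mmHg drop per patient

n = len(reductions)
mean = statistics.mean(reductions)                  # point estimate of the effect
sem = statistics.stdev(reductions) / math.sqrt(n)   # standard error of the mean

# Critical t value for 95% confidence with n - 1 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n - 1)

lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(f"95% CI: ({lower:.1f}, {upper:.1f}) around a mean of {mean:.1f}")
```

With tightly clustered values like these, the interval comes out narrow around the mean, which is exactly the clear, reliable situation described above; more scattered data would widen it.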
🔎 Statistical calculations can be tricky; it is best to use statistical software or specific calculators, such as confidence interval calculator and standard deviation calculator.
In an experiment, standard deviation measures how spread out your data points are around the mean (average). If you are measuring the effect of a drug on blood pressure, the mean tells you the average drop in pressure, while the standard deviation tells you how consistent that drop was across all patients.
Interpretation
- Low standard deviation — most data points are tightly clustered near the mean, indicating high precision and consistent effect of the treatment on the subjects; and
- High standard deviation — the data points are widely spread out, indicating low precision and varying effects on the subjects.
Often, when a high standard deviation is observed in statistics for experiments, it is likely due to measurement error, natural biological variation, or uncontrolled experimental variables.
Sample standard deviation
In experimental research, you almost always calculate the sample standard deviation (denoted as s), because you are testing a sample of a population, not the entire population itself. The formula is:

s = √( ∑(xᵢ − x̄)² / (n − 1) )

where:
- xᵢ — Each individual data point;
- x̄ — Mean (average) of all data points;
- ∑ — Sum of what follows;
- n — Total number of data points; and
- n − 1 — Bessel's correction.
For example, imagine an experiment testing a new fertilizer on plant growth. You measure the height of 5 plants in centimeters: 10, 11, 12, 10, 12. To find the standard deviation, you need to follow these steps:
1. Find the mean:
   (10 + 11 + 12 + 10 + 12)/5 = 11
2. Calculate the deviation of each point from the mean, and square it:
   - (10 − 11)² = 1
   - (11 − 11)² = 0
   - (12 − 11)² = 1
   - (10 − 11)² = 1
   - (12 − 11)² = 1
3. Sum the squared deviations:
   1 + 0 + 1 + 1 + 1 = 4
4. Determine the variance by dividing by n − 1. The sample size n is 5, so n − 1 is 4. Thus, we divide:
   4/4 = 1
5. Take the square root:
   √1 = 1

That's it! The standard deviation is 1 cm.
This is a very low standard deviation, indicating that the fertilizer had a consistent effect on these plants.
For contrast, if another group’s heights were 5, 17, 11, 6, 16, the mean is still 11 cm, but the data is much more scattered, which is reflected by a standard deviation of ≈5.52 cm. Therefore, the higher standard deviation indicates that the effect is inconsistent across samples.
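The calculation above can be reproduced in a few lines of Python. This sketch implements the sample standard deviation formula directly and cross-checks it against the standard library's `statistics.stdev`, which also applies Bessel's correction:

```python
# Sketch: sample standard deviation (with Bessel's correction, n - 1).
import math
import statistics

def sample_std(data):
    """Compute s = sqrt(sum((x - mean)^2) / (n - 1)) step by step."""
    n = len(data)
    mean = sum(data) / n                            # step 1: the mean
    squared_devs = [(x - mean) ** 2 for x in data]  # step 2: squared deviations
    variance = sum(squared_devs) / (n - 1)          # steps 3-4: sum, divide by n - 1
    return math.sqrt(variance)                      # step 5: square root

consistent = [10, 11, 12, 10, 12]  # fertilizer group from the example
scattered = [5, 17, 11, 6, 16]     # same mean (11 cm), much more spread

print(sample_std(consistent))  # 1.0
print(sample_std(scattered))   # ≈ 5.52

# Cross-check against the standard library (same n - 1 formula).
assert math.isclose(sample_std(consistent), statistics.stdev(consistent))
assert math.isclose(sample_std(scattered), statistics.stdev(scattered))
```

Both groups share the same mean, so only the standard deviation reveals that the second group's results are far less consistent.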
This article was written by Julia Kopczyńska and reviewed by Steven Wooding.