Sampling and Confidence Intervals for Biology and Medicine

1. Learning Path — Introduction

Do you want to learn more about sampling and confidence intervals? Well, you’ve come to the right place. Just follow our 5-step journey. By the end, you should have mastered these subjects and feel confident applying sampling and confidence intervals in areas such as economics, sports, biology, and medicine.

Your path in this journey is composed of the following subjects:

Step 1: Populations vs. Samples;
Step 2: Normal Distribution and z-score;
Step 3: Law of large numbers — Central Limit Theorem;
Step 4: Confidence Intervals; and
Step 5: Application to biology and medicine.


2. Step 1: Populations vs. Samples

If you want to master the concepts of sampling and confidence intervals, the first step is to understand the difference between the population and the sample. Let’s use an example to make it easy for you. Suppose that we consider the inhabitants of New York City; they will be our population.

The sample will be a small group from this population, for instance, people living in one borough, such as Queens or Brooklyn. Since New York has several boroughs, we can draw several samples from the same population. The same idea applies to other scenarios, such as a colony of bacteria or a group of people testing a new vaccine.

The main goal in statistics is to use samples to estimate properties of the entire population, for instance, its average age. Of course, a sample will not give us that number with $100\,\%$ precision, because it is only a small part of the population. But it can provide a reasonable estimate; the difference between a sample estimate and the true population value is called the sampling error.
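To make the idea of sampling error concrete, here is a minimal Python sketch. The population is entirely made up for illustration (100,000 ages drawn uniformly between 0 and 90), and the sample size of 500 is an arbitrary choice.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 ages drawn uniformly between 0 and 90.
population = [random.randint(0, 90) for _ in range(100_000)]
population_mean = statistics.mean(population)

# A simple random sample of 500 people from that population.
sample = random.sample(population, 500)
sample_mean = statistics.mean(sample)

# Sampling error: sample estimate minus the true population value.
sampling_error = sample_mean - population_mean

print(f"Population mean age: {population_mean:.2f}")
print(f"Sample mean age:     {sample_mean:.2f}")
print(f"Sampling error:      {sampling_error:+.2f}")
```

In real studies the population mean is unknown, so the sampling error cannot be computed directly; that is why we later describe it with a confidence interval.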

3. Step 2: Normal Distribution and z-score

Many datasets, such as population age, patient blood pressure, or enzyme activity, cluster symmetrically around a mean, forming a bell-shaped curve called a normal (or Gaussian) distribution.

For this type of distribution, we can express how far a specific data point (or raw value) is from the mean in units of the standard deviation, which is the quantity that measures the dispersion of the distribution. This number of standard deviations is called the z-score. More details about this parameter can be found in our dedicated articles “Understanding z-score and z-critical value in statistics: A comprehensive guide” and “Z-score and p-value”.

If you want to determine the z-score of a data point, you can save time by using our z-score calculator or compute it with the formula:

z = \frac{x-\mu}{\sigma}

where:

$x$ — Raw value;
$\mu$ — Average value; and
$\sigma$ — Standard deviation.
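The z-score formula translates directly into code. The sketch below is only an illustration: the function name `z_score` and the blood pressure numbers are hypothetical, not taken from a real dataset.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations the raw value x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example: a systolic blood pressure of 135 mmHg in a
# population with mean 120 mmHg and standard deviation 10 mmHg.
z = z_score(135, mu=120, sigma=10)
print(z)  # 1.5, i.e. 1.5 standard deviations above the mean
```

A negative result would mean that the raw value lies below the mean.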

We will show in Step 4 how the z-score enters the formula for the confidence interval of a population mean.

🙋 Are you searching for more tools related to the z-score and to the concept of dispersion? Then, access our p-value calculator and standard deviation calculator.

4. Step 3: Law of large numbers — Central Limit Theorem

As we mentioned before, a sample only approximates the population, and with a small sample size, your statistical quantities may not reflect the behavior of the entire population. So, we need a larger sample. As we increase the sample size, the sample average gets closer and closer to the population average. This concept is known in statistics as the law of large numbers.
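A short simulation can illustrate the law of large numbers. This is a sketch under stated assumptions: the population is a hypothetical normal distribution of blood pressures (mean 120 mmHg, standard deviation 15 mmHg), and the sample sizes are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_MEAN, TRUE_SD = 120.0, 15.0  # hypothetical blood pressure population, mmHg


def sample_mean(n: int) -> float:
    """Draw n values from the hypothetical population and return their mean."""
    return statistics.fmean([random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)])


# Absolute error of the sample mean for increasing sample sizes.
errors = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    errors[n] = abs(sample_mean(n) - TRUE_MEAN)
    print(f"n = {n:>7}: |sample mean - true mean| = {errors[n]:.3f}")
```

The error does not have to shrink at every single step, because each sample is random, but it tends toward zero as the sample size grows.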

Complementary to the law of large numbers, we have the Central Limit Theorem. This theorem states that, under fairly general conditions, if you take many sufficiently large samples and calculate the average of each one, the distribution of those averages is approximately normal (Gaussian), even when the original data are not normally distributed.
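The following sketch illustrates the Central Limit Theorem with a deliberately skewed population. All numbers are assumptions chosen for illustration: an exponential distribution with mean 20, samples of size 50, and 5,000 repetitions.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

POP_MEAN = 20.0      # mean of the hypothetical exponential population
SAMPLE_SIZE = 50     # observations per sample
NUM_SAMPLES = 5_000  # number of samples drawn

# An exponential distribution has standard deviation equal to its mean,
# so the standard error of the sample mean is POP_MEAN / sqrt(SAMPLE_SIZE).
standard_error = POP_MEAN / math.sqrt(SAMPLE_SIZE)

# Mean of each sample drawn from the skewed population.
sample_means = [
    statistics.fmean([random.expovariate(1 / POP_MEAN) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# For a normal distribution, about 95% of values lie within 1.96 standard
# deviations of the mean; we check the same fraction for the sample means.
within = sum(abs(m - POP_MEAN) <= 1.96 * standard_error for m in sample_means)
fraction_within = within / NUM_SAMPLES

print(f"Mean of sample means: {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.2f} (theory {standard_error:.2f})")
print(f"Fraction within 1.96 standard errors: {fraction_within:.1%}")
```

Although the individual values are strongly skewed, the sample means behave very much like a normal distribution centered on the population mean.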

🙋 You can play with these concepts by checking out our central limit theorem calculator and normal distribution calculator.

5. Step 4: Confidence Intervals

The last concept needed to master the sampling process is the confidence interval. Any estimate based on a sample carries an inherent sampling error, so instead of a single number, we report an interval that is likely to contain the true value.

Let us go back to the computation of the average age of a population. In the figure below, we see that the average age of the population is estimated from the average ages of different samples. As you should remember, there is a sampling error, so the sample averages differ from the population average. However, there is a range in which the true value is expected to lie at a chosen confidence level. This range is the confidence interval.

Therefore, the confidence interval tells you how precisely your sample estimates a statistical quantity. For example, a 95% confidence level means that if we repeated the random sampling many times and built an interval from each sample, about 95% of those intervals would contain the true population mean.
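The meaning of a 95% confidence level can be checked with a simulation. The sketch below assumes a hypothetical normal population of ages (mean 40, standard deviation 12), samples of size 100, and the usual interval built from the sample mean, the sample standard deviation, and $z = 1.96$.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_MEAN, TRUE_SD = 40.0, 12.0  # hypothetical population of ages
N = 100          # sample size
REPEATS = 2_000  # number of repeated samples
Z = 1.96         # critical z-score for a 95% confidence level

hits = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    margin = Z * statistics.stdev(sample) / math.sqrt(N)
    # Count the interval if it contains the true population mean.
    if mean - margin <= TRUE_MEAN <= mean + margin:
        hits += 1

coverage = hits / REPEATS
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed fraction should be close to 95%, which is what the confidence level promises over many repetitions, not for any single interval.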

The confidence interval is written using the equation:

\mathrm{CI} = \bar{x} \pm z \times \frac{s}{\sqrt{n}}

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $z$ is the critical z-score for the chosen confidence level.

This equation shows that the width of the confidence interval depends on the z-score, the sample standard deviation, and the sample size. A higher confidence level requires a larger $z$ and gives a wider interval. A larger dispersion $s$ also widens the interval (a less precise estimate), while a lower dispersion or a larger sample size $n$ narrows it.
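To see how each ingredient of the formula changes the interval, the sketch below computes the margin of error $z \times s / \sqrt{n}$ for a few made-up inputs. The function name `margin_of_error` and all numbers are hypothetical.

```python
import math


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval: z * sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z * sd / math.sqrt(n)


baseline = margin_of_error(sd=10, n=25)       # reference case
more_spread = margin_of_error(sd=20, n=25)    # doubled standard deviation
bigger_sample = margin_of_error(sd=10, n=100) # four times the sample size

print(f"baseline:      {baseline:.2f}")
print(f"more spread:   {more_spread:.2f}")
print(f"bigger sample: {bigger_sample:.2f}")
```

Doubling the standard deviation doubles the margin, while quadrupling the sample size only halves it, because the sample size enters through a square root.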

In the table below, we show you different z-scores and the corresponding confidence levels:

z-score	Confidence level
1.645	90%
1.960	95%
2.576	99%
3.291	99.9%
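The values in this table can be recovered from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` is our own choice for illustration.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Two-sided critical z-score for a confidence level between 0 and 1."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    # A two-sided interval leaves (1 - level) / 2 in each tail, so we need
    # the quantile at (1 + level) / 2 of the standard normal distribution.
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%} -> z = {z_critical(level):.3f}")
```

Rounded to three decimals, the results agree with the four rows of the table.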

🙋 If you want to easily compute the confidence interval and understand this equation deeply, you can access our confidence interval calculator.

6. Step 5: Application to biology and medicine

Let’s apply the key concepts learned about sampling and confidence intervals in biology and medicine.

Suppose that a pharmaceutical company wants to launch a new fast-acting medication for headaches. To place the fast-acting label on the box, the average time to pain relief must be statistically significantly less than 20 minutes. But, as you can imagine, it’s impossible to test every person who suffers from headaches (the population).

To get around this, the company conducted a clinical trial with 100 patients (the sample). They gave everyone the pill and timed how long it took for their headache to disappear. The results were:

Sample mean $\bar{x}= 18\,\mathrm{minutes}$ ; and
Sample standard deviation $s = 8\,\mathrm{minutes}$ .

At the $95\%$ confidence level, what is the confidence interval for this dataset? Would the company be able to print the fast-acting label?

To find the answers, let us use the confidence interval formula:

\mathrm{CI} = 18 \pm 1.96 \times \frac{8}{\sqrt{100}}

where we used $z=1.96$ and $n=100$, since the company needs to be $95\%$ confident about the “Fast-Acting” claim.

Thus, the final result is:

\begin{split} \mathrm{CI} & = 18 \pm 1.57\\[.5em] & = 16.43\, – \,19.57 \end{split}

Therefore, even the upper bound of the confidence interval ($19.57$ minutes) is below 20 minutes. So, the company can use the fast-acting label for this new medication, having shown that the data are consistent with a population mean time below 20 minutes at the 95% confidence level.

As a second example, suppose a team of biologists wants to reintroduce a fish species into a river. To ensure that this fish will reproduce, the level of dissolved oxygen in the water must be higher than $6.0\,\mathrm{mg/L}$ at the $95\%$ confidence level.

To verify whether this river is safe, the team collected water samples from 64 locations. The results were:

Sample mean $\bar{x} = 6.5\,\mathrm{mg/L}$ ;
Sample size $n=64$ ; and
Sample standard deviation $s = 1.6\,\mathrm{mg/L}$ .

So, the confidence interval for this dataset is:

\begin{split} \mathrm{CI} & = 6.5 \pm 1.96 \times \frac{1.6}{\sqrt{64}}\\[1em] & = 6.5 \pm 0.392\\[.5em] & = 6.108\, – \,6.892 \end{split}

Thus, since even the lower bound of the interval ($6.108\,\mathrm{mg/L}$) is above $6.0\,\mathrm{mg/L}$, the dissolved oxygen level is safe for the reintroduction of this fish in this river.
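Both worked examples in this step can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `confidence_interval` and the variable names are our own, while the input numbers are the ones given for the headache trial and the river samples.

```python
import math

Z_95 = 1.96  # critical z-score for a 95% confidence level


def confidence_interval(mean: float, sd: float, n: int, z: float = Z_95):
    """Return (lower, upper) bounds of mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Headache medication: the whole interval must lie below 20 minutes.
head_low, head_high = confidence_interval(mean=18, sd=8, n=100)
fast_acting_ok = head_high < 20

# Dissolved oxygen: the whole interval must lie above 6.0 mg/L.
oxy_low, oxy_high = confidence_interval(mean=6.5, sd=1.6, n=64)
river_ok = oxy_low > 6.0

print(f"Pain relief: {head_low:.2f} to {head_high:.2f} min, label allowed: {fast_acting_ok}")
print(f"Oxygen:      {oxy_low:.3f} to {oxy_high:.3f} mg/L, river safe: {river_ok}")
```

In each case the decision only looks at the bound closest to the threshold: the upper bound for the medication and the lower bound for the oxygen level.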

7. You have arrived at the end of the journey!

Congratulations! We are happy that you reached your final destination. Now it is time to put your skills into practice and become a master of sampling and confidence intervals.

8. Who wrote this article?

This article was written by João Rafael Lucio dos Santos and reviewed by Steven Wooding.