Are the correct statistical methods used in the study?

Learn how to assess an article’s results to determine if the statistics are clinically relevant to your patients.

Peer reviewers: Franz Wiesbauer, MD MPH, Internist

Last update: 2nd Mar 2021

Do you believe in coincidences? What if you saw your favorite actor at a local coffee shop one Friday morning? You may chalk it up to a chance encounter. Now, what if you started attending this café every Friday morning and bumped into this actor every time? After a month, how confident would you be that you’d see them at the café on Friday mornings?

Researchers want to make sure that the results they found are due to the variables they’re examining and not to coincidence. They use statistical measures to show how unlikely it is that the results arose by chance alone.

A quick glance at the results of a research paper may show you what variables are statistically significant. But, you may need to ask yourself if the results are also clinically relevant.

There are many statistical tests that researchers use. But, in this article, we’ll just focus on the most important ones that you’ll need to know. Let’s break down the p value, statistical versus clinical significance, and effect size. In the end, you’ll be able to quickly identify if a paper’s results are worth considering.

What is a p value?

In scientific articles, a p value is the most commonly used method to summarize research results. To understand p values, we’ll first need to review null and alternative hypotheses.

What are null and alternative hypotheses?

Let’s look at an example. Let’s say that a scientist wants to study whether getting 8 hours of sleep a night helps you lose 1 kg of weight a month.

The null hypothesis is what the scientist is trying to prove wrong. In this example, the null hypothesis would be that 8 hours of sleep a night doesn’t lead to weight loss. The alternative hypothesis is that 8 hours of sleep a night does lead to weight loss. The p value would then be the probability of seeing results at least as strong as the ones observed (here, that much weight loss among the 8-hour sleepers), assuming that the null hypothesis is true.
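To make that definition concrete, here is a minimal sketch of one way a p value can be computed: a permutation test. The weight-loss numbers and the function name `permutation_p_value` are made up for illustration, and a real study would likely use a different test. The idea is to shuffle the group labels many times (which is what the world would look like if the null hypothesis were true) and count how often the shuffled difference is at least as large as the observed one.

```python
import random


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """One-sided permutation p value for mean(treated) - mean(control)."""
    rng = random.Random(seed)
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)

    pooled = list(treated) + list(control)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so we reassign them at random.
        rng.shuffle(pooled)
        shuffled_treated = pooled[:n_treated]
        shuffled_control = pooled[n_treated:]
        diff = (sum(shuffled_treated) / len(shuffled_treated)
                - sum(shuffled_control) / len(shuffled_control))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical monthly weight loss in kg (invented numbers, not study data).
sleep_8h = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3]
sleep_less = [0.3, 0.5, -0.1, 0.6, 0.2, 0.4, 0.0, 0.5]

p = permutation_p_value(sleep_8h, sleep_less)
print(f"p = {p:.4f}")
```

A small p here means that randomly relabeled groups almost never show a difference as large as the one in the invented data.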

Before starting the research, the researchers set a predetermined significance level to reach (i.e., a number that the p value must be less than) to reject the null hypothesis. This significance level is known as alpha (α).

What p value is considered statistically significant?

In scientific research, the alpha level is usually set to 0.05 so that a p value of 0.05 or less is considered statistically significant. When a p value is 0.05 or less, we say that the results reject the null hypothesis.

Based on our example, this means that if the null hypothesis were true (i.e., sleep doesn’t lead to weight loss), there would be a 5% or smaller chance of seeing results at least as strong as ours (8 hours of sleep a night going along with a 1 kg weight loss).

In some medical research, a stricter alpha of 0.01 is used, so p values are considered significant at 0.01 or less. In this case, if the null hypothesis were true, there would be a 1% or smaller chance of observing results at least as extreme as the ones found.
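Another way to read alpha is as the share of studies that would come out significant by chance alone when there is truly no effect. The sketch below simulates that under simplifying assumptions that are ours, not the article’s: both groups are drawn from the same normal distribution, the standard deviation is treated as known, and a two-sided z-test is used. The helper names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def two_sample_z_p_value(a, b, sd):
    """Two-sided p value for a difference in means, assuming a known common SD."""
    standard_error = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha, n_studies=2000, n_per_group=30, seed=1):
    """Fraction of simulated null studies (no true effect) with p <= alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution: the null is true.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_z_p_value(group_a, group_b, sd=1.0) <= alpha:
            significant += 1
    return significant / n_studies


rate_05 = false_positive_rate(alpha=0.05)
rate_01 = false_positive_rate(alpha=0.01)
print(f"alpha = 0.05: {rate_05:.3f} of null studies were 'significant'")
print(f"alpha = 0.01: {rate_01:.3f} of null studies were 'significant'")
```

In this setup the two rates should land close to 5% and 1%, which is why a stricter alpha means fewer chance findings.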

In a scientific article, p values can be found in the results section. Significant p values are often highlighted or bolded to make it easier for the reader to identify the significant variables.

Problems with the p value

Let’s say you come across a study that shows that people who work night shifts reported more chest pains than people who worked day shifts. In fact, the p value equaled 0.03, which is less than the predetermined α of 0.05! That’s great, right? Sure, it’s statistically significant. But let’s take a closer look at the study participants.

Now, this study was able to draw from a database with over 400 000 participants. On the surface, such a large sample size is a good thing. But, the problem with p values is that when you have a large sample, you’re more likely to see significant associations because the group averages are estimated very precisely. This means that even the smallest difference between groups could lead to a significant result.

Let’s take a look at another example.

What if a study wants to examine the relationship between premature babies’ (i.e., preemies) weight and their life expectancy? The null hypothesis is that there is no relationship between weight and life span. The alternative hypothesis is that an increase in weight is associated with a longer life span.

The researchers could gather data from a sample of preemies from one hospital and then look at the relationship between preemie weight and life span.

This might give them an answer to their research question, but the answer might not reflect the real world. Given the small sample size, there may be so much variability in the measures that preemie weight is not statistically associated with life expectancy when, in reality, it is. If this study were published, then you’d think weight differences in preemies didn’t affect their life span, and you’d be missing out on information that could really help your patients!

So, what if the researchers included all the preemies in the United States? With the larger sample, the results are more likely to be a good representative sample and generalizable to many populations. This may also make the findings applicable to your own patient population.

But now, the researchers may find that even the smallest increase in weight is significantly associated with an increase in life span! For example, the results may show that preemies who weighed 2 pounds and 4 ounces have an average life span of 82 years and 30 days; preemies who weighed 2 pounds and 5 ounces have an average life span of 82 years and 35 days. Due to the millions of babies included in the sample, this small difference may be statistically significant. But, does it have any real-world implications? Based on these results, is it important to actively try to increase a preemie’s weight when the benefit would only be 5 more days on average? Either way, the average preemie lives to a ripe old age.

So, what’s the problem with the p value? Well, with larger sample sizes, you’re more likely to find a significant difference, even if that difference is too small to matter in the real world. So, a significant p value from a large sample won’t help you pinpoint whether the association between the variables studied is truly meaningful or is significant mainly because of the large sample size.
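Here is a rough sketch of that effect, using the 5-day difference from the preemie example. Everything else is an assumption made for illustration: a standard deviation of 730 days in life span, equal group sizes, and a simple two-sided z-test. The function name `p_value_for_difference` is hypothetical.

```python
from statistics import NormalDist


def p_value_for_difference(mean_diff, sd, n_per_group):
    """Two-sided z-test p value for a fixed mean difference between two groups."""
    # The standard error shrinks as the groups get larger.
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same 5-day difference, same assumed spread; only the sample size changes.
for n in (1_000, 100_000, 1_000_000):
    p = p_value_for_difference(mean_diff=5, sd=730, n_per_group=n)
    print(f"n per group = {n:>9,}: p = {p:.6f}")
```

The difference between the groups never changes, yet under these assumptions it only crosses the 0.05 threshold once the groups are very large.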

And, it is important to know not only that a new treatment is significantly different from the standard treatment, but also how large that difference is. In other words, researchers would want to know that a new drug can significantly affect cholesterol levels, and by how much it affects cholesterol levels. Make sense?

So, it isn’t enough to just rely on p values. For the significant results, you also need to look at other statistics, such as effect size, to see which ones have real meaning!

What is effect size?

Remember, at its core, a p value is about weighing the evidence against the null hypothesis. The p value won’t tell you if the alternative hypothesis is true. A significant p value will also not tell you the size of the difference between the two groups. To quantify the effect of a certain therapy, clinical trials will often include effect sizes in their statistical analyses.

The effect size tells you how much effect a specific therapy or treatment has on the outcome being studied. In other words, it tells you how large the difference is between two experimental groups (usually between control and treatment groups). The larger the effect size is, the stronger the relationship between the experimental treatment and the outcome. And unlike the p value, effect size doesn’t grow just because the sample size does.
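There are several ways to express an effect size. One common choice for comparing two group means is Cohen’s d: the difference in means divided by the pooled standard deviation. The article doesn’t say which measure a given study would use, so treat this as a sketch of one option, with invented cholesterol numbers and a hypothetical function name.

```python
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical reductions in LDL cholesterol (mg/dL); invented for illustration.
new_drug = [32, 28, 35, 30, 27, 33, 29, 31]
standard_drug = [25, 22, 27, 24, 21, 26, 23, 25]

d = cohens_d(new_drug, standard_drug)
print(f"Cohen's d = {d:.2f}")
```

Cohen’s often-quoted rules of thumb treat a d of about 0.2 as small, 0.5 as medium, and 0.8 as large, but what counts as meaningful still depends on the clinical context.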

In clinical trials, researchers often try to test a new drug against a placebo or standard treatment. A significant p value would only tell them that the differences they are observing in their patients are unlikely to be due to chance. It wouldn’t be enough to simply say: the results showed that it’s unlikely that the lowered cholesterol levels observed in patients would occur if the drug had no effect (i.e., if the null hypothesis were true). If researchers are studying whether a new cholesterol drug can effectively reduce cholesterol levels, they would also want to know how much of a reduction the new drug produces.

So, we’d use the effect size. A large effect size would indicate that the patients on the new drug had a large change in cholesterol levels compared to the participants on the standard drug.

What if the effect size for the cholesterol drug was small? Well, then you may want to weigh the costs and benefits of adopting a new treatment versus continuing the standard of care. You may also want to factor in the price of the new drug. If the new drug is more expensive than the old drug (and the beneficial effect was small), you might not think it’s worth it to switch. This is because the findings may not be clinically relevant with a small effect size.

Statistically significant or clinically significant?

If a result is statistically significant, does that mean that it has clinical implications? Well, we already saw from our previous example that p values alone may not give you the full answer. With large study samples, a statistically significant association tells you less about clinical relevance, since even the smallest difference can be significant.

What if a study reported that a new drug required patients to take a pill twice a day versus the standard treatment of once a day? And, after all that, the new drug only lowered cholesterol levels by a very small amount? As a clinician, would it be worth it for your patients to take pills twice a day for only a slight reduction in their cholesterol?

So, the next time you are skimming through the results section of a research article, home in on the statistically significant variables and ask yourself: are the results clinically important as well? The effect sizes may help you answer this question.
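That two-question habit can be written down as a tiny helper. This is only a sketch: the function name, the wording of the labels, and especially the `smallest_relevant_effect` threshold are hypothetical, and choosing that threshold is a clinical judgment that no formula makes for you.

```python
def appraise_result(p_value, effect, alpha=0.05, smallest_relevant_effect=0.0):
    """Label a result by statistical significance and by size of effect.

    smallest_relevant_effect is a hypothetical threshold: the smallest
    effect you would consider worth acting on for your patients.
    """
    significant = p_value <= alpha
    relevant = abs(effect) >= smallest_relevant_effect

    if significant and relevant:
        return "statistically significant and large enough to matter"
    if significant:
        return "statistically significant, but possibly too small to matter"
    if relevant:
        return "not statistically significant, despite a sizable estimated effect"
    return "not statistically significant"


# Hypothetical example: a 2 mg/dL cholesterol reduction with p = 0.03,
# judged against an assumed threshold of 10 mg/dL.
print(appraise_result(p_value=0.03, effect=2, smallest_relevant_effect=10))
```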

Of course, you could also read how the authors of the study interpreted their results. How could you determine if the authors interpreted their findings correctly? Check out our next article on how to evaluate the conclusions drawn in a study.

So there you have it! A quick summary of the essential statistics you should know in order to quickly evaluate a clinical research paper. Interested in more than just a summary? Head on over to our Epidemiology Essentials Course, where we cover sample sizes, significance levels, and result interpretations of clinical trials in more detail.

About the author

Hafsa Abdirahman, MPH

Hafsa is a public health scientist and medical writer.
