Supplements: Evidence Tiers Explained
Systematic reviews consistently show that fewer than 20% of widely marketed supplements have Tier 1 evidence. Industry-funded trials are about 4× more likely to report positive outcomes than independent trials.
| Measure | Value | Unit | Notes |
|---|---|---|---|
| Evidence Tier | 1 | tier | Tier 1 — this page applies systematic review and meta-analytic methodology to classify evidence |
| Industry-Funded Positive Outcome Rate | 4× | multiplier | Industry-funded studies are ~4× more likely to report positive outcomes vs. independent trials (Lexchin et al. 2003) |
| Supplements with Tier 1 Evidence | <20 | % | Fewer than 20% of marketed supplements have replicated, independently funded RCT support |
| P-hacking threshold | 0.05 | p-value | When researchers test multiple outcomes and report only the significant ones, each additional test adds roughly a 5% chance of a false positive |
| Minimum RCT n for reliability | 100 | participants | Studies under n=100 are underpowered for most supplement effect sizes (Cohen's d typically 0.2–0.5) |
| Cohen's d small effect | 0.2 | d | A Cohen's d of 0.2 is 'small' — statistically significant but often not practically meaningful |
Most supplement marketing lives and dies on one phrase: “a study showed.” That phrase is nearly meaningless without context. A study in 12 obese rats, a study funded by the company selling the product, a study measuring a surrogate biomarker nobody cares about — all are “studies.” This page exists to give you the tools to tell the difference.
The Four-Tier Framework
| Tier | Name | Definition | Example Supplements | Red Flags | What It Means for You |
|---|---|---|---|---|---|
| 1 | Strong | Multiple large RCTs (n≥100/arm), replicated, independent funding, consistent effect sizes | Creatine monohydrate, caffeine, omega-3 (triglyceride reduction), vitamin D (deficiency correction) | Absence of negative trials, only industry-funded trials | Evidence justifies use if the effect is relevant to your goals |
| 2 | Moderate | Some RCTs, but limited size/number, mixed results, or predominantly industry-funded | Beta-alanine, dietary nitrates (beetroot), ashwagandha (KSM-66 form) | All studies from same lab, no independent replication | Use with realistic expectations; effect size may be smaller than advertised |
| 3 | Weak | Mostly animal or in vitro data, isolated small human RCTs, or conflicting results | Branched-chain amino acids (in leucine-sufficient diet), glutamine (healthy athletes), most herbal nootropics | “Cellular studies show…”, “animal models demonstrate…” | Effect in humans unproven; purchase is speculative |
| 4 | Insufficient | Anecdote, marketing claims, mechanism extrapolation, no credible human trials | Most “fat burners,” proprietary blends, exotic botanicals, many “recovery” products | Testimonials only, “clinically studied ingredients,” no study citations | Purchasing is paying for hope, not evidence |
Why “Statistically Significant” Doesn’t Mean “Works”
Statistical significance (p<0.05) means only this: if there were truly no effect, a result at least as extreme would occur less than 5% of the time, and even that interpretation holds only if the study's assumptions were met and a single pre-specified hypothesis was tested. In practice, researchers routinely test 10–30 outcomes and report the significant ones, inflating false-positive rates to 40% or higher (Ioannidis 2005, PMID 16060722).
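The inflation from multiple testing follows directly from the 5% threshold. A minimal sketch in plain Python, assuming independent outcomes (real biomarkers are often correlated, which changes the exact numbers but not the direction):

```python
# Probability of at least one false positive when testing k independent
# outcomes of an inert substance at alpha = 0.05 (family-wise error rate).
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

for k in (1, 10, 20, 30):
    rate = familywise_error_rate(k)
    print(f"{k:2d} outcomes tested -> {rate:.0%} chance of a 'significant' result")
```

At 10 outcomes the chance of at least one spurious "significant" finding is already around 40%, matching the inflation figure above; at 20–30 outcomes it climbs past 60%.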
Effect size is the number that actually matters. Cohen’s d of 0.2 is “small”: technically detectable but often meaningless for athletic performance. A supplement that improves your 5-km time by 0.2 standard deviations might translate to roughly 8 seconds for an average runner. Whether that justifies the cost and daily dosing is a question of values, not statistics.
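The conversion from a standardized effect size to raw units is just multiplication by the relevant standard deviation. A one-line sketch, assuming a hypothetical between-runner SD of about 40 s on 5-km time (a number invented here so that d = 0.2 lines up with the ~8 s figure):

```python
# Translate a standardized effect size (Cohen's d) into raw units.
# The SD of ~40 s for 5-km times is an assumed, illustrative value.
def effect_in_raw_units(d: float, sd: float) -> float:
    return d * sd

improvement_s = effect_in_raw_units(d=0.2, sd=40.0)
print(f"d = 0.2 with SD = 40 s -> ~{improvement_s:.0f} s faster")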
Funding Bias: The Specific Mechanism
A meta-analysis of pharmaceutical sponsorship (Lexchin et al. 2003) found industry-funded studies were 4× more likely to report favorable outcomes. This is not primarily about fraud. The mechanisms are:
- Selective publication: Negative trials simply never get published. One industry-funded positive trial can represent five buried neutral ones.
- Comparator selection: Comparing to a suboptimal dose of a competitor rather than an optimal dose.
- Outcome switching: Pre-specifying one primary outcome, finding it null, then reporting a secondary measure as if it were primary.
- Population selection: Testing in populations most likely to respond (e.g., deficient individuals) then marketing broadly.
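The first mechanism, selective publication, can be demonstrated by simulation. A sketch under assumed conditions: a truly inert supplement, many small trials (n = 30 per arm, a hypothetical figure), and only "positive, significant" results reaching print:

```python
import math
import random

random.seed(1)

# Selective-publication sketch: the true effect is zero. Each trial's
# effect estimate is ~ Normal(0, SE), where SE = sqrt(2/n) is the standard
# error of a standardized mean difference with n participants per arm.
n_per_arm = 30
se = math.sqrt(2 / n_per_arm)

estimates = [random.gauss(0.0, se) for _ in range(10_000)]
# Only positive results significant at the two-sided 0.05 level "get published".
published = [e for e in estimates if e / se > 1.96]

print(f"trials run:       {len(estimates)}")
print(f"trials published: {len(published)} ({len(published)/len(estimates):.1%})")
print(f"mean published effect: {sum(published)/len(published):.2f}")
```

Even though the true effect is exactly zero, the published trials show a sizable average "benefit," because the filter only lets large positive flukes through.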
How to Read a Supplement Study
Check these five things before trusting a result: (1) Was it pre-registered at ClinicalTrials.gov? (2) Who funded it? (3) What was the sample size and power calculation? (4) Was the primary outcome what was marketed, or a surrogate? (5) Has it been independently replicated?
If the answer to all five is unclear from the paper, assign it Tier 3 at best.
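The five questions and the "Tier 3 at best" rule can be expressed as a small helper. This is a hypothetical reading aid sketched here, not a formal instrument; `tier_ceiling` and its argument names are invented for illustration:

```python
# Map the five checklist answers to a provisional tier ceiling.
# Each answer is True (clear yes), False (clear no), or None (unclear
# from the paper). Invented heuristic, not a validated scoring tool.
def tier_ceiling(preregistered, independent_funding, powered,
                 primary_outcome_as_marketed, replicated):
    answers = [preregistered, independent_funding, powered,
               primary_outcome_as_marketed, replicated]
    if all(a is None for a in answers):
        return 3                      # all five unclear -> Tier 3 at best
    if all(answers):                  # every answer a clear yes
        return 1
    if any(a is False for a in answers):
        return 3 if answers.count(False) == 1 else 4
    return 2                          # some yes, rest merely unclear

print(tier_ceiling(True, True, True, True, True))    # clear on all five -> 1
print(tier_ceiling(None, None, None, None, None))    # all unclear -> 3
```

The point of writing it out is that "unclear" is scored almost as harshly as "no": a paper that hides its funding or power calculation should not get the benefit of the doubt.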
Sources
- Lexchin J et al. (2003). Pharmaceutical industry sponsorship and research outcome and quality. BMJ 326(7400):1167–1170.
- Ioannidis JP. (2005). Why most published research findings are false. PLoS Med 2(8):e124.
- Sterne JA et al. (2019). RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ 366:l4898.
Frequently Asked Questions
What makes a supplement 'Tier 1' evidence?
Tier 1 requires multiple large RCTs (typically n≥100 per arm), replicated findings across independent research groups, consistent effect sizes across populations, and studies not primarily funded by the manufacturer. All these criteria must be met simultaneously — one large industry-funded RCT does not qualify.
What is p-hacking and how does it inflate supplement evidence?
P-hacking occurs when researchers test many outcomes but report only the statistically significant ones. At a p<0.05 threshold, roughly 1 in 20 truly null tests will come up 'significant' by chance alone, so a study testing 20 biomarkers will likely find at least one 'significant' result even for an inert substance. This inflates the apparent evidence base for many supplements.
Does a large effect size always mean a supplement works?
No. Effect size must be interpreted alongside study quality. A massive effect size from a single unblinded study with n=20 means far less than a modest Cohen's d of 0.3 from a pre-registered, double-blind, independently-funded RCT with n=300. Extraordinary effect sizes in small studies often shrink dramatically in replication attempts — a phenomenon called the 'winner's curse.'
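The winner's curse is easy to demonstrate by simulation. A sketch under assumed numbers (a true d of 0.3 and small trials with n = 20 per arm, both invented here for illustration):

```python
import math
import random

random.seed(42)

# Winner's-curse sketch: the true standardized effect is d = 0.3, but each
# small trial (n = 20 per arm) estimates it with SE = sqrt(2/20) ~ 0.32.
true_d = 0.3
se = math.sqrt(2 / 20)

estimates = [random.gauss(true_d, se) for _ in range(50_000)]
# Keep only results positive and significant at the two-sided 0.05 level.
significant = [e for e in estimates if e / se > 1.96]

print(f"true effect:               {true_d}")
print(f"mean of all estimates:     {sum(estimates)/len(estimates):.2f}")
print(f"mean significant estimate: {sum(significant)/len(significant):.2f}")
```

The trials that clear the significance bar report an average effect more than double the truth, which is exactly why a headline-grabbing effect size from a small study tends to shrink on replication.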
Why should funding source matter if the data looks solid?
Industry-funded trials are 4× more likely to report positive results (Lexchin et al. 2003, BMJ). This is not always outright fraud — it manifests through selective outcome reporting, choosing comparators that favor the product, publishing only positive trials while burying neutral ones, and choosing doses or populations where the product looks best. The data can be technically accurate while the framing is deeply misleading.
What does 'no evidence' actually mean?
Absence of evidence is not evidence of absence — but it matters how thoroughly we've looked. For a supplement with zero human trials, 'no evidence' is genuinely uncertain. For one that has been studied in 15 RCTs with consistently null results, 'no evidence of effect' is a strong finding. The distinction is critical when evaluating marketing claims.