Supplements: Evidence Tiers Explained

Category: foundations · Updated: 2026-04-03

Systematic reviews consistently show that fewer than 20% of widely marketed supplements have Tier 1 evidence. Industry-funded trials are roughly 4× more likely to report positive outcomes than independent trials.

Key Data Points
| Measure | Value | Unit | Notes |
| --- | --- | --- | --- |
| Evidence Tier | 1 | tier | Tier 1: this page applies systematic-review and meta-analytic methodology to classify evidence |
| Industry-Funded Positive Outcome Rate | 4 | multiplier | Industry-funded studies are ~4× more likely to report positive outcomes vs. independent trials (Lexchin et al. 2003) |
| Supplements with Tier 1 Evidence | <20 | % | Fewer than 20% of marketed supplements have replicated, independently funded RCT support |
| P-hacking threshold | 0.05 | p-value | When researchers test multiple outcomes and report only the significant ones, each test adds roughly a 5% chance of a false positive, so false positives compound across tests |
| Minimum RCT n for reliability | 100 | participants | Studies under n=100 are underpowered for most supplement effect sizes (Cohen's d typically 0.2–0.5) |
| Cohen's d small effect | 0.2 | d | A Cohen's d of 0.2 is 'small': statistically detectable but often not practically meaningful |

Most supplement marketing lives and dies on one phrase: “a study showed.” That phrase is nearly meaningless without context. A study in 12 obese rats, a study funded by the company selling the product, a study measuring a surrogate biomarker nobody cares about — all are “studies.” This page exists to give you the tools to tell the difference.

The Four-Tier Framework

| Tier | Name | Definition | Example Supplements | Red Flags | What It Means for You |
| --- | --- | --- | --- | --- | --- |
| 1 | Strong | Multiple large RCTs (n≥100/arm), replicated, independent funding, consistent effect sizes | Creatine monohydrate, caffeine, omega-3 (triglyceride reduction), vitamin D (deficiency correction) | Absence of negative trials, only industry-funded trials | Evidence justifies use if the effect is relevant to your goals |
| 2 | Moderate | Some RCTs, but limited size/number, mixed results, or predominantly industry-funded | Beta-alanine, dietary nitrates (beetroot), ashwagandha (KSM-66 form) | All studies from same lab, no independent replication | Use with realistic expectations; effect size may be smaller than advertised |
| 3 | Weak | Mostly animal or in vitro data, single small human RCTs, or conflicting results | Branched-chain amino acids (in leucine-sufficient diet), glutamine (healthy athletes), most herbal nootropics | “Cellular studies show…”, “animal models demonstrate…” | Effect in humans unproven; purchase is speculative |
| 4 | Insufficient | Anecdote, marketing claims, mechanism extrapolation, no credible human trials | Most “fat burners,” proprietary blends, exotic botanicals, many “recovery” products | Testimonials only, “clinically studied ingredients,” no study citations | Purchasing is paying for hope, not evidence |

Why “Statistically Significant” Doesn’t Mean “Works”

Statistical significance (p<0.05) means only that a result at least this extreme would occur less than 5% of the time if there were no true effect, and even that interpretation holds only if all study assumptions were met and a single hypothesis was tested. In practice, researchers routinely test 10–30 outcomes and report the significant ones, inflating false-positive rates to 40% or higher (Ioannidis 2005, PMID 16060722).
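
The inflation is easy to verify: with k independent outcomes tested at a 0.05 threshold on a truly inert substance, the chance of at least one false positive is 1 - (1 - 0.05)^k, which passes 40% at k = 10. A minimal Python sketch of both the analytic rate and a Monte Carlo check (parameters are illustrative, not drawn from any particular study):

```python
import random

def familywise_rate(k: int, alpha: float = 0.05) -> float:
    """Analytic chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def simulated_rate(k: int, alpha: float = 0.05, studies: int = 100_000) -> float:
    """Monte Carlo check: under the null, p-values are uniform on [0, 1),
    so count how often a study testing k outcomes yields at least one p < alpha."""
    hits = sum(any(random.random() < alpha for _ in range(k)) for _ in range(studies))
    return hits / studies

random.seed(0)
for k in (1, 10, 20, 30):
    print(f"k={k:2d}  analytic={familywise_rate(k):.2f}  simulated={simulated_rate(k):.2f}")
# k=10 -> ~0.40, k=20 -> ~0.64: an inert product tested on enough
# outcomes will almost always produce a "significant" finding.
```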

Effect size is the number that actually matters. Cohen’s d of 0.2 is “small” — technically detectable but often meaningless for athletic performance. A supplement giving you 0.2 standard deviations of improvement in a 5-km run might translate to 8 seconds in an average runner. Whether that justifies cost and daily dosing is a question of values, not statistics.
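
The conversion from effect size to raw units is just multiplication by the relevant standard deviation. A minimal sketch, assuming a standard deviation of roughly 40 seconds for repeated 5-km efforts (the value implied by the 8-second example above; real SDs vary by runner and conditions):

```python
def raw_improvement(cohens_d: float, sd: float) -> float:
    """Convert a standardized effect size into raw units: improvement = d * SD."""
    return cohens_d * sd

SD_5K_SECONDS = 40  # assumed within-runner SD for 5-km time; illustrative only
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {raw_improvement(d, SD_5K_SECONDS):.0f} s faster over 5 km")
# d = 0.2 -> ~8 s: detectable in a trial, but whether it justifies the
# cost and daily dosing is a question of values, not statistics.
```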

Funding Bias: The Specific Mechanism

A meta-analysis of pharmaceutical sponsorship (Lexchin et al. 2003) found industry-funded studies were 4× more likely to report favorable outcomes. This is not primarily about fraud. The mechanisms are:

  1. Selective publication: Negative trials simply never get published. One industry-funded positive trial can represent five buried neutral ones (simulated in the sketch after this list).
  2. Comparator selection: Comparing to a suboptimal dose of a competitor rather than an optimal dose.
  3. Outcome switching: Pre-specifying one primary outcome, finding it null, then reporting a secondary measure as if it were primary.
  4. Population selection: Testing in populations most likely to respond (e.g., deficient individuals) then marketing broadly.
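
To see how powerful selective publication alone is, the sketch below simulates many trials of a truly inert supplement and "publishes" only the ones that come out significant in the favorable direction. All parameters are illustrative, not taken from Lexchin's data:

```python
import random
import statistics

def run_trial(n: int = 50, true_effect: float = 0.0) -> tuple[float, bool]:
    """One trial of an inert supplement: returns the estimated effect
    (in SD units) and whether it reached significance in the positive direction."""
    outcomes = [random.gauss(true_effect, 1.0) for _ in range(n)]
    estimate = statistics.mean(outcomes)
    se = 1.0 / n ** 0.5                      # standard error of the mean
    return estimate, estimate / se > 1.96    # two-sided p < 0.05, favorable direction

random.seed(1)
trials = [run_trial() for _ in range(2000)]
published = [est for est, significant in trials if significant]

print(f"all {len(trials)} trials: mean effect = {statistics.mean(e for e, _ in trials):+.3f}")
print(f"published ({len(published)} trials): mean effect = {statistics.mean(published):+.3f}")
# The full record averages ~0; the published slice shows a convincing
# positive "effect" of ~0.3 SD built entirely from sampling noise.
```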

How to Read a Supplement Study

Check these five things before trusting a result: (1) Was it pre-registered at ClinicalTrials.gov? (2) Who funded it? (3) What was the sample size and power calculation? (4) Was the primary outcome what was marketed, or a surrogate? (5) Has it been independently replicated?

If the answer to all five is unclear from the paper, assign it Tier 3 at best; the sketch below encodes that rule.
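
A minimal Python sketch of the checklist. The field names and the Tier 2 ceiling for a mixed picture are my own illustrative choices, not a validated instrument:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudyChecklist:
    """The five checks above; None means 'unclear from the paper'."""
    preregistered: Optional[bool]            # registered at ClinicalTrials.gov?
    independently_funded: Optional[bool]     # who funded it?
    adequately_powered: Optional[bool]       # sample size + power calculation reported?
    outcome_matches_claim: Optional[bool]    # primary outcome, not a surrogate?
    independently_replicated: Optional[bool]

def tier_ceiling(c: StudyChecklist) -> int:
    """Best (lowest-numbered) evidence tier a single study can support."""
    answers = [c.preregistered, c.independently_funded, c.adequately_powered,
               c.outcome_matches_claim, c.independently_replicated]
    if all(a is True for a in answers):
        return 1              # every check passes
    if all(a is None for a in answers):
        return 3              # all five unclear: Tier 3 at best
    return 2                  # mixed picture: illustrative middle ground

print(tier_ceiling(StudyChecklist(True, True, True, True, True)))   # 1
print(tier_ceiling(StudyChecklist(None, None, None, None, None)))   # 3
```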


Sources

Ioannidis JPA. Why most published research findings are false. PLoS Medicine. 2005;2(8):e124. PMID 16060722.
Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ. 2003;326(7400):1167-1170.

Frequently Asked Questions

What makes a supplement 'Tier 1' evidence?

Tier 1 requires multiple large RCTs (typically n≥100 per arm), replicated findings across independent research groups, consistent effect sizes across populations, and studies not primarily funded by the manufacturer. All these criteria must be met simultaneously — one large industry-funded RCT does not qualify.

What is p-hacking and how does it inflate supplement evidence?

P-hacking occurs when researchers test many outcomes but only report the statistically significant ones. At a p<0.05 threshold, 1 in 20 tests will show false positives by chance alone. A study testing 20 biomarkers will likely find one 'significant' result even with an inert substance. This inflates the apparent evidence base for many supplements.

Does a large effect size always mean a supplement works?

No. Effect size must be interpreted alongside study quality. A massive effect size from a single unblinded study with n=20 means far less than a modest Cohen's d of 0.3 from a pre-registered, double-blind, independently-funded RCT with n=300. Extraordinary effect sizes in small studies often shrink dramatically in replication attempts — a phenomenon called the 'winner's curse.'
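
The winner's curse is mechanical enough to simulate in a few lines: condition on significance in an underpowered study and the surviving estimates are biased upward. A sketch assuming a true effect of d = 0.2 and n = 20 per arm (illustrative values):

```python
import random
import statistics

TRUE_D = 0.2
N_PER_ARM = 20
SE = (2 / N_PER_ARM) ** 0.5       # approximate standard error of Cohen's d

random.seed(7)
estimates = [random.gauss(TRUE_D, SE) for _ in range(100_000)]
winners = [d for d in estimates if d / SE > 1.96]   # studies reaching p < 0.05

print(f"true effect:               d = {TRUE_D}")
print(f"power at n = {N_PER_ARM}/arm:       {100 * len(winners) / len(estimates):.0f}%")
print(f"mean of 'winning' studies: d = {statistics.mean(winners):.2f}")
# Only ~9% of such studies reach significance, and those that do report
# an effect several times the true one; replications then "shrink."
```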

Why should funding source matter if the data looks solid?

Industry-funded trials are 4× more likely to report positive results (Lexchin et al. 2003, BMJ). This is not always outright fraud — it manifests through selective outcome reporting, choosing comparators that favor the product, publishing only positive trials while burying neutral ones, and choosing doses or populations where the product looks best. The data can be technically accurate while the framing is deeply misleading.

What does 'no evidence' actually mean?

Absence of evidence is not evidence of absence — but it matters how thoroughly we've looked. For a supplement with zero human trials, 'no evidence' is genuinely uncertain. For one that has been studied in 15 RCTs with consistently null results, 'no evidence of effect' is a strong finding. The distinction is critical when evaluating marketing claims.
