
How We Rate Evidence

Our scoring system uses a multi-factor weighted algorithm to objectively evaluate supplement research. Every score is computed automatically from the raw study data — no human bias, no cherry-picking.

The Scoring Pipeline

Each supplement's evidence score flows through four stages. Every individual study is scored, then scores are aggregated with a confidence adjustment.

1. Study Design: How rigorous is the methodology?
2. Study Type: Human, animal, or in vitro?
3. Sample Size: How many participants?
4. Outcomes: Positive, negative, or mixed?

1 Study Design Quality

Not all studies are equal. A double-blind, placebo-controlled trial carries far more weight than a case study. We assign a multiplier based on the study's methodological rigor.

Meta-analysis / Systematic Review: 1.5×
Double-blind RCT + Placebo: 1.4×
Double-blind Trial: 1.3×
Randomized Controlled Trial: 1.2×
Controlled / Comparative: 1.0×
Open-label / Pilot: 0.8×
Observational / Cohort: 0.7×
Case Study: 0.5×
Why it matters: A meta-analysis synthesizes data from multiple trials, giving it 3× the weight of a case study.
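The multipliers above amount to a simple lookup table. A minimal sketch in Python (the key names are illustrative, not the site's actual identifiers):

```python
# Design-quality multipliers from the table above.
# Key names are hypothetical labels chosen for this sketch.
DESIGN_MULTIPLIER = {
    "meta_analysis": 1.5,
    "double_blind_rct_placebo": 1.4,
    "double_blind": 1.3,
    "rct": 1.2,
    "controlled": 1.0,
    "open_label": 0.8,
    "observational": 0.7,
    "case_study": 0.5,
}
```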

2 Study Type

Human clinical trials are the gold standard for supplement research. Animal and in vitro studies provide supporting evidence but can't be directly applied to humans.

Human Study (1.0×): Full weight. Clinical trials, cohort studies, and human observational data.

Animal Study (0.5×): Half weight. Useful for understanding mechanisms but may not translate to humans.

In Vitro (0.3×): Low weight. Lab cell studies provide early signals but are far from clinical proof.

3 Sample Size Scaling

Larger studies are more statistically reliable. We use a logarithmic scale so that going from 10 to 100 participants matters more than going from 1,000 to 10,000.

Formula: weight = min(2.2, 0.3 + log10(n) × 0.45). The logarithmic curve means diminishing returns.
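The formula above translates directly to code. A sketch (assuming n ≥ 1 participants):

```python
import math

def sample_size_weight(n: int) -> float:
    """Logarithmic sample-size weight, capped at 2.2.

    Implements the stated formula: min(2.2, 0.3 + log10(n) * 0.45).
    """
    return min(2.2, 0.3 + math.log10(n) * 0.45)
```

Going from 10 to 100 participants raises the weight by 0.45, while going from 1,000 to 10,000 adds the same 0.45 but runs into the 2.2 cap soon after, which is the diminishing-returns behavior described above.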

4 Outcome Analysis

We scan each study's results text for statistically significant language to determine whether findings were positive, negative, or mixed.

Positive (1.0): Study found statistically significant benefits.

Mixed (0.6): Both positive and negative signals found.

Unknown (0.5): Results text doesn't contain clear signals.

Negative (0.15): Study found no significant benefit.

Putting It All Together

Each study gets a weight (design × type × sample size) and an outcome score (0 to 1). These are combined into a weighted average, then adjusted for confidence.

Per-study weight: W = Design × Type × SampleSize
Raw score (weighted average): Raw = (Σ(Wᵢ × Outcomeᵢ) / ΣWᵢ) × 100
Confidence adjustment: Confidence = min(1, ΣW / 6)
Final score: Score = 50 + (Raw − 50) × Confidence
Why confidence matters: With only 1-2 studies, even if both are positive, we can't be confident yet. The confidence adjustment pulls the score toward 50 (uncertain) when evidence is thin.
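The aggregation steps above can be sketched end to end. This is an illustrative implementation under the stated formulas; the field names and data structure are assumptions, not the site's actual schema:

```python
def final_score(studies: list[dict]) -> float:
    """Combine per-study weights and outcomes into a 0-100 evidence score.

    Each study dict carries 'design', 'type', 'sample' (multipliers) and
    'outcome' (0 to 1). These field names are hypothetical.
    """
    weights = [s["design"] * s["type"] * s["sample"] for s in studies]
    total_w = sum(weights)
    # Weighted average of outcomes, scaled to 0-100.
    raw = sum(w * s["outcome"] for w, s in zip(weights, studies)) / total_w * 100
    # Thin evidence (low total weight) pulls the score toward 50.
    confidence = min(1.0, total_w / 6)
    return 50 + (raw - 50) * confidence
```

For example, a single positive double-blind placebo-controlled RCT with 100 participants carries weight 1.4 × 1.0 × 1.2 = 1.68, so confidence is only 0.28 and the score lands at 64 rather than 100. Three such trials push confidence to 0.84 and the score to 92.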

Evidence Level Thresholds

The final score maps to one of four evidence levels.

Weak: 0 – 37
Moderate: 38 – 54
Strong: 55 – 71
Very Strong: 72 – 100
Weak

Limited or no human clinical trials. Evidence may come only from animal or in-vitro studies.

Moderate

Some human evidence exists, but results are mixed or studies have limitations.

Strong

Multiple well-designed human studies show consistent positive results.

Very Strong

Extensive body of high-quality human research with consistently positive outcomes.
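The threshold table above maps directly to a small function. A sketch using the stated boundaries:

```python
def evidence_level(score: float) -> str:
    """Map a final 0-100 score to an evidence level (thresholds from the table above)."""
    if score >= 72:
        return "Very Strong"
    if score >= 55:
        return "Strong"
    if score >= 38:
        return "Moderate"
    return "Weak"
```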

Human Evidence Requirement

Confidence is primarily driven by human clinical data. Animal and in-vitro studies inform mechanisms but cannot replace human trials.

0 human studies

Evidence is capped at Weak regardless of how many positive animal studies exist.

<25% human weight

Evidence is capped at Moderate. Some human data exists but the majority comes from non-human models.

Substantial human data

Strong and Very Strong ratings require meaningful human clinical evidence.
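The caps described above can be sketched as a post-processing step. This is an assumed formulation: the exact boundary handling (e.g. whether exactly 25% human weight counts as capped) is not specified in the text:

```python
LEVEL_ORDER = ["Weak", "Moderate", "Strong", "Very Strong"]

def apply_human_cap(level: str, human_weight_fraction: float) -> str:
    """Cap the evidence level by the share of total study weight from human data.

    Illustrative logic only; boundary behavior at exactly 0.25 is an assumption.
    """
    if human_weight_fraction == 0:
        cap = "Weak"
    elif human_weight_fraction < 0.25:
        cap = "Moderate"
    else:
        return level  # Substantial human data: no cap applied.
    return LEVEL_ORDER[min(LEVEL_ORDER.index(level), LEVEL_ORDER.index(cap))]
```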

Transparency & Limitations

Fully Automated

All scores are computed algorithmically from the study data. No manual overrides, no editorial bias.

Not Medical Advice

These scores summarize research trends. They are not recommendations. Always consult a healthcare professional.

Keyword-Based Outcome Detection

Outcome scoring relies on text pattern matching, which may miss nuance.

Living Database

Scores update automatically as new studies are added. Evidence levels can change over time.