Inequity Index
A data-backed reference examining how inequality is produced, maintained, measured, and misrepresented across economics, health, housing, labor, and policy domains. Each entry scores the structural explanation of a claim from 0–100 and documents the mechanism, the evidence, and who benefits from the prevailing framing.
The United States ranks #23 on the World Happiness Report despite the highest GDP per capita of any large nation. The evidence — from the Easterlin Paradox to Wilkinson & Pickett's inequality-wellbeing correlation to the Nordic natural experiment — shows that this gap is explained by structural conditions, not by American attitudes or personal choices.
The domains and claims below document the structural mechanisms that produce this outcome. Explore them to see the evidence domain by domain.
What this is
The Inequity Index is not a political scorecard. It is a reference tool that applies a consistent analytical framework to contested claims about inequality: What is the mechanism? What does the best available evidence say? Who benefits from the current framing? What would a structural explanation predict, and does it hold?
Each claim receives a Structural Score (0–100): the degree to which a structural explanation accounts for the observed outcome, versus an individualist or behavioral explanation. A score of 100 means the structural explanation fully accounts for the variance; 0 means the evidence points entirely to individual factors. Most entries fall somewhere in between, and the score is a starting point for analysis, not a verdict on the person raising the claim.
Dual scoring: Structural position vs. Premise truth
Each claim is scored on two independent axes:
Structural Score (0–100) measures how much structural forces versus individual choice explain the observed phenomenon. A high structural score means that changing the system would change outcomes even if individuals didn't change; a low score means individual behavior dominates. This is independent of whether the claim as stated is true.
Premise Score (0–100) measures whether the claim itself is accurate. A claim like "alternative medicine is equally valid to conventional medicine" can have a high structural score (if people's beliefs about health are largely shaped by marketing and access to information) while having a very low premise score (because the claim itself is not supported by evidence).
The Verdict (refuted / contested / partial / supported / strongly_supported) reflects the premise score: whether the claim is true, not whether structural factors explain the phenomenon.
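As a rough illustration of the two independent axes, a claim could be stored as a small record like the one below. This is a sketch only: the class name, field names, and the example numbers are hypothetical and are not taken from the Index; only the 0–100 ranges and the five verdict labels come from the text above.

```python
from dataclasses import dataclass

# The five verdict labels listed in the text.
VERDICTS = ("refuted", "contested", "partial", "supported", "strongly_supported")


@dataclass(frozen=True)
class ClaimScore:
    """Hypothetical record for one claim; field names are illustrative."""
    claim: str
    structural_score: int  # 0-100: how much structure explains the phenomenon
    premise_score: int     # 0-100: whether the claim as stated is accurate
    verdict: str           # one of VERDICTS, reflecting the premise score

    def __post_init__(self):
        # The two axes are validated separately because they are independent.
        for name in ("structural_score", "premise_score"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be in 0-100, got {value}")
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")


# Made-up numbers, chosen only to show a high structural score
# coexisting with a low premise score.
example = ClaimScore(
    claim="alternative medicine is equally valid to conventional medicine",
    structural_score=80,
    premise_score=10,
    verdict="refuted",
)
```

Keeping the two scores as separate fields, rather than combining them, mirrors the point that a claim can be false while the phenomenon behind it is still largely structural.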
Structural Score scale
Verdicts (Premise assessment)
Domains
What is a structural explanation?
A structural explanation locates the cause of an outcome in the rules, institutions, incentives, and resource distributions that shape the choices available to individuals — rather than in the choices individuals make within those constraints. It asks: given the system people are embedded in, would we expect a different kind of person to produce a different outcome? If the answer is no — if the outcome follows from the position, not the person — then the explanation is structural.
The contrast is with an individualist explanation, which attributes outcomes to the traits, decisions, effort, values, or capabilities of the person experiencing them. Individualist explanations are not always wrong. Individual variation is real and matters. The question the Inequity Index asks is empirical: across the full distribution, how much of the observed variance in outcomes — wages, poverty rates, incarceration, health, educational attainment — is explained by structural position versus individual characteristics?
The clearest test is the cross-national natural experiment. If a gap between groups reflects individual characteristics — culture, work ethic, values, behavior — then that gap should appear everywhere those individuals exist. If it disappears or reverses in countries with different institutions, the individual explanation is falsified. American poverty rates are more than four times higher than Danish poverty rates. Danish people are not four times more industrious. The Danish welfare state, minimum wage structure, union density, and childcare policy produce different outcomes from the same distribution of human behavior. That is a structural effect.
A second test is the policy natural experiment. If outcomes are structurally determined, then changing the structure should change the outcome even when individuals have not changed. The US Child Tax Credit expansion in 2021 cut child poverty from 14.2% to 5.2% in six months: a structural change that produced an immediate change in outcomes with no change in the individuals involved. When the policy expired in January 2022, poverty rose from 12.1% to 16.8% within a month. (The two pairs of figures cannot come from the same poverty measure, since the 5.2% endpoint and the 12.1% starting point do not match; each pair should be read on its own.) Individuals did not change their behavior between December 2021 and January 2022. The structure changed, and the outcome followed.
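The size of these two shifts can be expressed both in percentage points and as a relative change. The sketch below does only that arithmetic on the figures quoted above; the function name is a hypothetical helper, not part of the Index.

```python
def poverty_change(before_pct, after_pct):
    """Return (percentage-point change, relative change in percent)."""
    point_change = after_pct - before_pct
    relative_change = point_change / before_pct * 100
    return round(point_change, 1), round(relative_change, 1)


# Figures as quoted in the text above.
expansion = poverty_change(14.2, 5.2)   # during the 2021 expansion
expiry = poverty_change(12.1, 16.8)     # after the policy expired

print(expansion)  # (-9.0, -63.4)
print(expiry)     # (4.7, 38.8)
```

Reporting both numbers helps because a 9-point drop and a 63% relative drop describe the same change from different baselines.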
Structural explanations are not claims that individuals have no agency, or that culture is irrelevant, or that every disparity reflects intentional discrimination. They are claims about the relative explanatory weight of position versus disposition — and that claim is testable. The Inequity Index is an attempt to apply that test consistently, using the best available evidence, to a set of contested claims where structural factors are systematically underweighted in public discourse.
A few important limits: structural and individual explanations are not mutually exclusive. A structural account of poverty says that labor market institutions, transfer policy, and housing costs explain most of the cross-national and cross-group variance — not that individual effort plays no role within a given system. The structural score does not rate the moral culpability of individuals; it rates the fraction of outcome variance that policies and institutions can, in principle, move.
How scores are calculated
Each claim receives two independent scores:
Structural Score (0–100) is the sum of four dimensions, each scored 0–25. It measures how much structural versus individual factors explain the observed phenomenon. This is not a judgment about whether the claim is true — it is a summary of the evidence about whether outcomes follow from system position or individual choice.
Premise Score (0–100) is also the sum of four dimensions, each scored 0–25. It measures whether the claim as stated is accurate. The verdict label (refuted / contested / partial / supported / strongly_supported) is derived from the premise score.
Scores are revisable. If new evidence changes the weight on any dimension, the score updates. Individual claim pages show the per-dimension breakdown and the rationale behind each sub-score.
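The arithmetic described above is a plain sum of four sub-scores, each bounded at 0–25. A minimal sketch, assuming sub-scores are whole numbers and using a hypothetical function name and made-up example values:

```python
def total_score(sub_scores):
    """Sum four dimension sub-scores (each 0-25) into a 0-100 score."""
    if len(sub_scores) != 4:
        raise ValueError("expected exactly four dimension sub-scores")
    for s in sub_scores:
        if not 0 <= s <= 25:
            raise ValueError(f"sub-score {s} is outside 0-25")
    return sum(sub_scores)


# Made-up sub-scores for illustration only.
print(total_score([22, 20, 24, 18]))  # 84
```

The same function applies to either axis, since the Structural Score and the Premise Score are each built from four 0–25 dimensions.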
Premise Score dimensions
The Premise Score (0–100) is the sum of four dimensions, each scored 0–25. Together they assess whether the claim as stated is accurate based on available evidence.
Quality and quantity of direct evidence for or against the claim — RCTs, systematic reviews, natural experiments, large cohort studies.
23–25: Multiple high-quality sources (RCTs, meta-analyses) consistently support or refute the claim.
19–22: Strong evidence from well-designed studies, though some inconsistency or limitations exist.
14–18: Decent evidence from multiple sources, but methodological limitations or conflicting results reduce confidence.
8–13: Limited or indirect evidence; single studies or weaker designs.
0–7: Little direct evidence, primarily anecdotal, or evidence contradicts the claim.
Whether the proposed mechanism is valid and established. Does the causal logic make sense, or are there fundamental flaws?
23–25: Mechanism is well-understood, empirically confirmed, with no major gaps in the causal logic.
19–22: Mechanism is plausible and mostly confirmed; minor uncertainties remain.
14–18: Mechanism is theoretically sensible but partially speculative; some evidence exists.
8–13: Mechanism is unclear or faces serious conceptual challenges; mostly speculative.
0–7: Mechanism is implausible, contradicted by evidence, or nonsensical.
Degree of agreement among domain experts and relevant scientific/policy bodies on whether the claim is accurate.
23–25: Strong consensus among experts that the claim is true or false; mainstream scientific position.
19–22: Majority expert agreement, with some legitimate minority views.
14–18: Genuine disagreement among experts; no clear consensus.
8–13: Minority of experts support the claim; most disagree.
0–7: Expert consensus rejects the claim; fringe position only.
Whether findings hold across independent studies, populations, and contexts. Resistance to p-hacking, publication bias, and single-study effects.
23–25: Consistent replication across multiple independent studies, contexts, and populations.
19–22: Generally consistent replication with occasional exceptions or confounds.
14–18: Mixed replication; some studies replicate, others conflict.
8–13: Weak replication; mostly single-study effects or contradictory results.
0–7: No replication; study-specific or contradicted across attempts.
Structural Score dimensions
The Structural Score (0–100) is the sum of four dimensions, each scored 0–25. It measures how much structural versus individual factors explain the observed phenomenon — independent of whether the claim is true.
Measures the causal strength of the evidence supporting the structural explanation.
23–25: Multiple RCTs or near-perfect natural experiments with large samples and pre-registered analysis.
19–22: Strong natural experiment or high-quality quasi-experimental design (difference-in-differences, instrumental variables).
14–18: Observational with good controls, or weaker natural experiments with plausible identification.
8–13: Observational correlations, descriptive statistics, or meta-analyses of lower-quality studies.
0–7: Anecdotal, case studies, or evidence that structurally favors one interpretation.
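For readers unfamiliar with the difference-in-differences design named in the 19–22 band, the core calculation is short. The sketch below shows the basic two-group, two-period estimate; the function name is hypothetical and the numbers are invented purely to show the arithmetic, not drawn from any study.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Basic 2x2 difference-in-differences estimate.

    Subtracts the control group's change from the treated group's change,
    on the assumption that both groups would otherwise have moved in parallel.
    """
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Invented rates (in percent) for illustration only.
effect = diff_in_diff(
    treated_before=15.0, treated_after=10.0,
    control_before=14.0, control_after=13.0,
)
print(effect)  # -4.0
```

The estimate is only as credible as the parallel-trends assumption, which is one reason this design sits a band below randomized trials in the rubric.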
Countries with different structural policies should have different outcomes if the explanation is structural. If it were behavioral, the pattern should hold everywhere.
23–25: Near-natural experiment — similar countries, meaningfully different policy, stark outcome difference (e.g. Finland vs. US on homelessness).
19–22: Strong pattern across multiple peer countries, partially confounded by other country-level differences.
14–18: Pattern exists but confounders are significant and not fully addressed.
8–13: Limited cross-national comparison, or the pattern is weak or inconsistent.
0–7: No meaningful cross-national comparison available, or the cross-national data cuts against the structural claim.
If the outcome is structurally produced, changing the structure should change the outcome. Policy natural experiments (a law passes, a program launches) test this prediction directly.
23–25: Multiple documented policy interventions with clear, measured causal effects on the outcome.
19–22: One strong policy natural experiment with measured effects and reasonable identification.
14–18: Some policy evidence, but identification is weaker or effects are smaller than theory predicts.
8–13: Indirect or suggestive policy evidence only.
0–7: No policy variation available to test, or policy evidence contradicts the structural prediction.
After controlling for the structural factors, how much outcome variance is attributable to individual choice, behavior, or traits? High scores mean structural factors absorb most of the variance; low scores mean individual factors remain dominant.
23–25: Individual factors become statistically insignificant or very small after structural controls are included.
19–22: Individual factors have modest residual effect after structural controls; structural dominates.
14–18: Meaningful individual variance remains; both explanations have real purchase.
8–13: Individual factors explain substantial variance independently of structural factors.
0–7: Individual factors dominate; structural controls explain little additional variance.
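Every dimension in both rubrics uses the same five bands (23–25, 19–22, 14–18, 8–13, 0–7). A small lookup like the following, offered as a sketch with hypothetical names and assuming whole-number sub-scores, finds the band a given sub-score falls into:

```python
# Band boundaries as listed in the rubric, highest first.
BANDS = ((23, 25), (19, 22), (14, 18), (8, 13), (0, 7))


def band_for(sub_score):
    """Return the (low, high) rubric band containing an integer sub-score."""
    for low, high in BANDS:
        if low <= sub_score <= high:
            return (low, high)
    raise ValueError(f"sub-score {sub_score} is outside 0-25")


print(band_for(20))  # (19, 22)
print(band_for(7))   # (0, 7)
```

The band descriptions themselves differ by dimension, so a fuller version would pair this lookup with the per-dimension text.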
Limitations
Sub-scores within each dimension involve judgment calls. Two analysts applying the same rubric to the same evidence may reach scores that differ by 2–5 points per dimension. We treat scores as precise enough to place claims in the right band, not precise enough to rank-order claims within a band. Scores of 82 and 86 are both "strongly structural"; the difference between them is not meaningful. The difference between 48 and 79 is.
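One way to read this limitation numerically: if analysts can differ by up to 5 points on each of four dimensions, totals could differ by as much as 20 points from judgment alone. The sketch below uses that worst case as a cutoff. This threshold is an assumption made for illustration, not the Index's stated rule, which is expressed in terms of bands; the function names are hypothetical.

```python
def worst_case_total_disagreement(per_dimension_points, n_dimensions=4):
    """Largest total gap if analysts differ by this much on every dimension."""
    return per_dimension_points * n_dimensions


def clearly_different(score_a, score_b, per_dimension_points=5):
    """True if the gap exceeds the assumed worst-case analyst disagreement."""
    gap = abs(score_a - score_b)
    return gap > worst_case_total_disagreement(per_dimension_points)


print(clearly_different(82, 86))  # False: a 4-point gap is within analyst noise
print(clearly_different(48, 79))  # True: a 31-point gap exceeds 20 points
```

Under this reading, the two examples in the text come out the same way the text describes them.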
The rubric scores structural evidence, not truth. A low structural score does not mean structural factors are absent; it means the evidence base for them is weak by these criteria. Some structural phenomena are hard to study experimentally, so a low score may reflect the weakness of the available evidence rather than the weakness of the phenomenon itself.
Scores are updated as the evidence base changes. Citations on each claim page reflect the sources used in scoring. Corrections and additions are welcome.