Cross-race misaggregation: Its detection, a mathematical decomposition, and Simpson's paradox

Bryan L. Koenig, Florian Van Leeuwen, Justin H Park

Research output: Contribution to journal › Article (Academic Journal) › peer-review



Researchers sometimes aggregate data, such as combining resident data into state-level means. Doing so can sometimes cause valid individual-level data to be invalid at the group level. We focus on cross-race misaggregation, which can occur when individual-level data are confounded with race. We discuss such misaggregation in the context of Simpson's Paradox and identify four diagnostic indicators: aggregated rates that correlate strongly with the relative size of one or more subgroup(s), unequal sample sizes across subgroups, unequal rates or mean values across subgroups, and aggregated rates that do not correlate with subgroup rates. To illustrate these diagnostic indicators, we decomposed data on the prevalence of sexually transmitted diseases (STDs) to confirm cross-race misaggregation in Parasite Stress USA, an ostensible index of parasite prevalence known to be confounded with the proportion of African American residents per state.
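The four diagnostic indicators can be illustrated with a minimal sketch on synthetic data. It is not the authors' analysis or data. It assumes two hypothetical subgroups, A and B, in 200 hypothetical regions, with subgroup rates drawn independently of subgroup size. It relies on the standard weighted-sum identity (aggregate rate = share of A × rate in A + share of B × rate in B), which may differ in form from the decomposition used in the article. All names, ranges, and the random seed are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_regions(n_regions=200, seed=0):
    """Build synthetic regions (all values are made up for illustration)."""
    rng = random.Random(seed)
    regions = []
    for _ in range(n_regions):
        share_b = rng.uniform(0.01, 0.40)    # subgroup B is always the smaller group
        rate_a = rng.uniform(0.015, 0.025)   # low rate in subgroup A
        rate_b = rng.uniform(0.08, 0.12)     # higher rate in subgroup B
        population = rng.randint(500_000, 5_000_000)
        n_b = population * share_b
        n_a = population - n_b
        cases = n_a * rate_a + n_b * rate_b  # expected case count, no sampling noise
        regions.append({
            "share_b": share_b,
            "rate_a": rate_a,
            "rate_b": rate_b,
            "aggregate": cases / population,  # rate pooled over both subgroups
        })
    return regions


def decompose(region):
    """Rebuild the pooled rate as a share-weighted sum of subgroup rates."""
    return ((1 - region["share_b"]) * region["rate_a"]
            + region["share_b"] * region["rate_b"])


regions = simulate_regions()
aggregate = [r["aggregate"] for r in regions]

# Indicator 1: the aggregated rate tracks the relative size of subgroup B.
r_share = pearson(aggregate, [r["share_b"] for r in regions])
# Indicator 4: the aggregated rate tracks the subgroup rates much less closely.
r_rate_a = pearson(aggregate, [r["rate_a"] for r in regions])
r_rate_b = pearson(aggregate, [r["rate_b"] for r in regions])
# The identity should hold up to floating-point error in every region.
max_gap = max(abs(decompose(r) - r["aggregate"]) for r in regions)

print(f"corr(aggregate, share of B) = {r_share:.2f}")
print(f"corr(aggregate, rate in A)  = {r_rate_a:.2f}")
print(f"corr(aggregate, rate in B)  = {r_rate_b:.2f}")
print(f"largest decomposition gap   = {max_gap:.2e}")
```

In this setup, indicators 2 and 3 (unequal subgroup sizes and unequal subgroup rates) are built into the simulated ranges, so the pooled rate mostly reflects how large subgroup B is in each region rather than how common the outcome is within either subgroup.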
Original language: English
Pages (from-to): 16-22
Number of pages: 7
Journal: Evolutionary Behavioral Sciences
Issue number: 1
Early online date: 28 Dec 2015
Publication status: Published - Jan 2017

Structured keywords

  • Cognitive Science
  • Social Cognition

Keywords


  • Simpson’s Paradox
  • ecological fallacy
  • parasite-stress theory
  • sexually transmitted diseases
  • population demographics

