TY - JOUR
T1 - Unbiased estimation of odds ratios
T2 - Combining genomewide association scans with replication studies
AU - Bowden, Jack
AU - Dudbridge, Frank
PY - 2009/9/29
Y1 - 2009/9/29
N2 - Odds ratios or other effect sizes estimated from genome scans are upwardly biased, because only the top-ranking associations are reported, and moreover only if they reach a defined level of significance. No unbiased estimate exists based on data selected in this fashion, but replication studies are routinely performed that allow unbiased estimation of the effect sizes. Estimation based on replication data alone is inefficient in the sense that the initial scan could, in principle, contribute information on the effect size. We propose an unbiased estimator combining information from both the initial scan and the replication study, which is more efficient than that based just on the replication. Specifically, we adjust the standard combined estimate to allow for selection by rank and significance in the initial scan. Our approach explicitly allows for multiple associations arising from a scan, and is robust to mis-specification of a significance threshold. We require replication data to be available but argue that, in most applications, estimates of effect sizes are only useful when associations have been replicated. We illustrate our approach on some recently completed scans and explore its efficiency by simulation.
KW - Genomewide scans
KW - Selection bias
KW - UMVUE
KW - Winner's curse
KW - WTCCC
UR - http://www.scopus.com/inward/record.url?scp=70149111937&partnerID=8YFLogxK
U2 - 10.1002/gepi.20394
DO - 10.1002/gepi.20394
M3 - Article (Academic Journal)
C2 - 19140132
AN - SCOPUS:70149111937
SN - 0741-0395
VL - 33
SP - 406
EP - 418
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 5
ER -