Polygenic scores using summary statistics via penalized regression

Created on 10th June 2016

Timothy Mak; Robert Milan Porsch; Shing Wan Choi; Xueya Zhou; Pak Chung Sham

Polygenic scores (PGS) summarize the genetic contribution of a person's genotype to a disease or phenotype. They are useful in a wide variety of analyses of genetic data. Many ways of calculating polygenic scores have been proposed, and recently there has been much interest in methods that incorporate information available in published summary statistics. As summary statistics carry no inherent information on linkage disequilibrium (LD), a pertinent question is whether LD information available elsewhere can be used to supplement such analyses. To answer this question, we proposed a method for constructing PGS using summary statistics and a reference panel in a penalized regression framework, which we called lassosum. We also proposed a general method for choosing the value of the tuning parameter in the absence of validation data. Our simulation results suggested that lassosum is faster and more robust than other similar methods in most scenarios. We also found that accounting for LD with a reference panel is beneficial only when the signals from the data are strong. In the presence of summary statistics from a large number of SNPs, clumping may either enhance or decrease the performance of standard PGS, although its effects on lassosum are attenuated. lassosum combined with pre-filtering by clumping appears to be a robust and reliable option for calculating predictive PGS.
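
To make the idea of penalized regression on summary statistics concrete, the following is a minimal sketch and not the lassosum implementation. It assumes an objective of the form β'R_s β − 2β'r + 2λ‖β‖₁, where r holds SNP-phenotype correlations taken from summary statistics and R_s = (1 − s)R + sI is a shrunken LD correlation matrix R estimated from a reference panel. The function names, the shrinkage parameter s, and the toy numbers are illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso soft-thresholding operator for a scalar z."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def summary_lasso(r, R, lam, s=0.5, n_iter=200, tol=1e-8):
    """Coordinate descent on beta' R_s beta - 2 beta' r + 2 lam ||beta||_1.

    r   : SNP-phenotype correlations (from summary statistics)
    R   : LD correlation matrix (from a reference panel)
    lam : L1 penalty (the tuning parameter)
    s   : assumed shrinkage of R towards the identity, in [0, 1]
    """
    r = np.asarray(r, dtype=float)
    R_s = (1.0 - s) * np.asarray(R, dtype=float) + s * np.eye(len(r))
    beta = np.zeros(len(r))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Correlation left for SNP j after removing the other SNPs' fitted effects
            partial = r[j] - R_s[j] @ beta + R_s[j, j] * beta[j]
            new = soft_threshold(partial, lam) / R_s[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

def polygenic_score(genotypes, beta):
    """PGS as the weighted sum of allele counts (rows are individuals)."""
    return np.asarray(genotypes, dtype=float) @ beta

# Toy example: three SNPs in linkage equilibrium (R is the identity),
# so the solution reduces to soft-thresholding r.
r_toy = np.array([0.3, -0.05, 0.1])
beta_toy = summary_lasso(r_toy, np.eye(3), lam=0.08)
scores_toy = polygenic_score([[0, 1, 2], [2, 0, 1]], beta_toy)
```

With no LD between SNPs the estimate is simply the soft-thresholded correlations, which is why the reference panel only changes the result when SNPs are correlated. In practice λ (and s, under this sketch's assumptions) would be chosen over a grid of candidate values.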
