Research area: genomics

Bayesian Integrated Analysis Of Multiple Types Of Rare Variants To Infer Risk Genes For Schizophrenia And Other Neurodevelopmental Disorders

Created on 28th June 2017

Hoang T. Nguyen; Amanda Dobbyn; Laura M. Huckins; Douglas Ruderfer; Giulio Genovese; Menachem Fromer; Xinyi Xu; Joseph Buxbaum; Christina Hultman; Pamela Sklar; Shaun M. Purcell; Xin He; Patrick F. Sullivan; Eli Ayumi Stahl

Integrating rare variation from family and case/control studies has successfully implicated specific genes contributing to risk of autism spectrum disorder (ASD). In schizophrenia (SCZ), however, while sets of genes have been implicated through the study of rare variation, very few individual risk genes have been identified. Here, we apply hierarchical Bayesian modeling of rare variation in SCZ and describe the proportion of risk genes and the distribution of risk variant effect sizes across multiple variant annotation categories. Briefly, we developed a pipeline, based on previous work in ASD studies, that jointly estimates genetic parameters for one or multiple combined populations of any disease. We applied this method to the largest available collection of rare variants in SCZ (1,077 families, 6,699 cases and 13,028 controls). We defined five variant annotation categories: three de novo mutation categories, namely disruptive (nonsense, frameshift and essential splice site mutations), missense damaging (predicted damaging by seven algorithms) and silentFCPk (silent mutations within frontal cortex-derived DNase I hypersensitive site (DHS) peaks); and two case/control singleton categories, disruptive and missense damaging. We estimated that 8.01% of genes are risk genes (95% credible interval, CI, 4.59-12.9%), with mean effect sizes (95% CIs) of 12.25 (4.8-22.22) for disruptive de novos, 1.44 (1-3.16) for missense damaging de novos, and 1.22 (1-2.16) for silentFCPk de novos. The mean effect sizes of damaging and disruptive singleton variants in the three case/control populations were 2.09 (1.04-3.54), 2.44 (1.04-5.73) and 1.04 (1-1.19), respectively. Our analysis identified only two genes with FDR < 0.05, SETD1A and TAF13, both known SCZ risk genes, and two other genes with FDR < 0.1: RB1CC1 and PRRC2A. We further used FDRs to directly test candidate gene sets for enrichment of Bayesian support.
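The abstract summarizes the model without equations. As an illustration only, the sketch below assumes a TADA-style formulation for a single de novo category (the "previous work used in ASD studies" is not named here, so this is an assumption): the de novo count in a gene is Poisson with mean 2Nμγ, where N is the number of trios, μ the per-gene mutation rate and γ the relative risk; γ = 1 for non-risk genes and γ ~ Gamma(γ̄β, β) for risk genes, so the prior mean relative risk is γ̄. Per-gene Bayes factors are converted to posterior probabilities using the prior risk-gene proportion π, and then to a Bayesian FDR by averaging (1 - posterior) over the top-ranked genes. The values `mean_rr = 12.25` and `pi = 0.0801` are taken from the estimates above; `mu`, `beta`, the counts and all function names are hypothetical and do not describe the authors' pipeline.

```python
import math


def log_poisson(x: int, lam: float) -> float:
    """log P(x) for x ~ Poisson(lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_poisson_gamma(x: int, lam: float, shape: float, rate: float) -> float:
    """log P(x) for x ~ Poisson(lam * g) with g ~ Gamma(shape, rate).

    Integrating out the relative risk g analytically gives a negative binomial.
    """
    return (math.lgamma(shape + x) - math.lgamma(shape) - math.lgamma(x + 1)
            + shape * math.log(rate / (rate + lam))
            + x * math.log(lam / (rate + lam)))


def de_novo_bayes_factor(x, n_trios, mu, mean_rr, beta):
    """Hypothetical helper: BF of 'risk gene' (g ~ Gamma(mean_rr*beta, beta)) vs 'null' (g = 1)."""
    lam = 2 * n_trios * mu              # expected de novo count under the null
    shape, rate = mean_rr * beta, beta  # prior mean of g is shape / rate = mean_rr
    return math.exp(log_poisson_gamma(x, lam, shape, rate) - log_poisson(x, lam))


def bayesian_fdr(bfs, pi):
    """Hypothetical helper: Bayesian FDR (q-value) per gene from Bayes factors.

    pi is the prior proportion of risk genes.
    """
    pp = [pi * b / (pi * b + 1 - pi) for b in bfs]  # posterior P(risk gene)
    order = sorted(range(len(bfs)), key=lambda i: pp[i], reverse=True)
    q = [0.0] * len(bfs)
    cum_null = 0.0
    for rank, i in enumerate(order, start=1):
        cum_null += 1 - pp[i]  # expected false discoveries among the top `rank` genes
        q[i] = cum_null / rank
    return q


if __name__ == "__main__":
    counts = [0, 1, 2, 3]  # hypothetical disruptive de novo counts for four genes
    bfs = [de_novo_bayes_factor(x, n_trios=1077, mu=5e-6, mean_rr=12.25, beta=1.0)
           for x in counts]
    for x, bf, q in zip(counts, bfs, bayesian_fdr(bfs, pi=0.0801)):
        print(f"count={x}  BF={bf:.3g}  FDR={q:.3g}")
```

Integrating out γ gives a closed-form marginal likelihood, so a single gene needs no sampling. Estimating π, γ̄ and β jointly across genes and categories, as the abstract describes, would require fitting the full hierarchical model (for example by MCMC), which this sketch does not attempt.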
Significant enrichment was observed for essential genes (recently reported to be enriched among autism genes) and for central nervous system (CNS)-related genes, in addition to gene sets previously found to be enriched (including in these data). We conducted power analyses under our inferred model for SCZ, estimating the number of risk genes that will be discovered as more data become available and quantifying the greater value of case/control samples over trio samples for discovering novel rare-variant risk genes. We also applied the method to four other neurodevelopmental disorders: ASD, intellectual disability (ID), developmental disorder (DD) and epilepsy (EPI), comprising in total 10,792 families and 4,058 cases and controls. The estimated proportions of risk genes in these diseases were smaller than in SCZ: 4.6% in ASD and < 3% for the other disorders. We report 164 and 58 genes with FDR < 0.05 for DD and ID, respectively, of which 101 and 15 are novel. Overall, replication of previous results supports the robustness of our approach, and our method is able to identify novel risk genes for SCZ as well as for other diseases.
