Author: Po-Ru Loh
Research area: genetics

Reference-based phasing using the Haplotype Reference Consortium panel

Created on 10th May 2016

Po-Ru Loh; Petr Danecek; Pier Francesco Palamara; Christian Fuchsberger; Yakir A Reshef; Hilary K Finucane; Sebastian Schoenherr; Lukas Forer; Shane McCarthy; Goncalo R Abecasis; Richard Durbin; Alkes L Price

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing within a genotyped cohort, an approach that can attain high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here, we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium, HRC) using a new data structure based on the positional Burrows-Wheeler transform. Our method improves phasing accuracy by >2x compared to the best publicly available alternative for phasing small European-ancestry cohorts, and it attains a ≈20x speedup and ≈10% increase in accuracy compared to reference-based phasing using SHAPEIT2. Our method is freely available for reference-based phasing using the HRC panel via the Sanger Imputation Service and the Michigan Imputation Server.
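For readers unfamiliar with the positional Burrows-Wheeler transform (PBWT) mentioned above, the following is a minimal sketch of its core idea only: at each site, haplotypes are kept sorted by their reversed prefix, so haplotypes sharing a long match ending at that site sit next to each other. It is not the data structure or algorithm introduced in this work. The function name `pbwt_prefix_arrays` and the toy panel are hypothetical and chosen for illustration; haplotypes are assumed to be biallelic and encoded as 0/1.

```python
from typing import List


def pbwt_prefix_arrays(haps: List[List[int]]) -> List[List[int]]:
    """Return the haplotype ordering before each site and after the last site.

    orders[k] lists haplotype indices sorted by their alleles at sites
    k-1, k-2, ..., 0 (the reversed prefix), with ties kept in input order.
    """
    n_sites = len(haps[0]) if haps else 0
    order = list(range(len(haps)))  # before any site: original order
    orders = [order[:]]
    for k in range(n_sites):
        # Stable partition on the allele at site k: haplotypes carrying 0
        # come first, then those carrying 1, each group keeping its order.
        zeros = [i for i in order if haps[i][k] == 0]
        ones = [i for i in order if haps[i][k] == 1]
        order = zeros + ones
        orders.append(order[:])
    return orders


# Toy panel (illustrative only): 4 haplotypes, 3 biallelic sites.
panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
orders = pbwt_prefix_arrays(panel)
```

Each update is a single stable pass over the haplotypes, so building all orderings takes time proportional to the number of haplotypes times the number of sites. That linear scaling is what makes PBWT-style structures attractive for reference panels of the size discussed here.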

