EVS Variants Track Settings
 
NHLBI GO Exome Sequencing Project (ESP) - Variants from 6,503 Exomes

Data last updated: 2014-03-28

Description

The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein-coding regions of the human genome across diverse, richly phenotyped populations. The project also aims to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders. The current data release (ESP6500SI-V2-SSA137), available through the Exome Variant Server (EVS) website, is taken from 6,503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data.

The data in this track were obtained from EVS release version v.0.0.25 (Feb. 7, 2014).

Display Conventions

In "dense" mode, a vertical line is drawn at the position of each variant. In "pack" and "full" modes, in addition to the vertical line, a label to the left shows the reference allele first and the variant alleles below it (A = red, C = blue, G = green, T = magenta, Indels = black). Hovering the pointer over any variant displays the occurrence counts for each allele in the Exome Sequencing Project's database. Clicking on any variant displays the full details of that variant, along with links to the ESP and dbSNP databases where available.
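The allele color scheme listed above can be written down as a small lookup. This is only an illustrative sketch, not the browser's own rendering code: the names `ALLELE_COLORS`, `allele_color` and `label_rows` are hypothetical, and treating every allele that is not a single A, C, G or T base as an indel is an assumption.

```python
# Colors as listed in the track description. All names here are hypothetical.
ALLELE_COLORS = {"A": "red", "C": "blue", "G": "green", "T": "magenta"}


def allele_color(allele):
    """Return the display color for one allele.

    Assumption: anything that is not a single A/C/G/T base
    (for example "AT" or "-") is treated as an indel and drawn in black.
    """
    return ALLELE_COLORS.get(allele.upper(), "black")


def label_rows(ref, alts):
    """Build the label rows: reference allele first, variant alleles below."""
    return [(allele, allele_color(allele)) for allele in [ref, *alts]]
```

For example, `label_rows("G", ["A", "GT"])` pairs the reference allele G with green, the variant allele A with red, and the insertion GT with black.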

Methods

Sequences were aligned to NCBI build 37 human genome reference using BWA. PCR duplicates were removed using Picard. Alignments were recalibrated using GATK. Lane-level indel realignments and base alignment quality (BAQ) adjustments were applied.

All data were simultaneously analyzed for exome SNP variants at the University of Michigan (by the Abecasis Laboratory). SNPs were called using a two-step approach. First, genotype likelihood files (GLFs) were generated using samtools pileup on individual BAM files. Next, glfMultiples, a multi-sample variant caller, was used to generate initial SNP calls. Details of the likelihood model implemented in glfMultiples are given in Li et al., 2011 (in the section entitled "Identifying Potential Polymorphic Sites"). The Michigan SNP calling pipeline is available at: http://genome.sph.umich.edu/wiki/UMAKE. For male samples, this pipeline makes diploid calls for the pseudo-autosomal regions of the X chromosome and haploid calls for the rest of the chromosome. Female samples have diploid calls for all regions on the X chromosome. SNPs were filtered by a machine-learning technique called support vector machine (SVM) classification (for a detailed description, see Filter Status).

Small INDEL variants were analyzed at the Broad Institute (by the Genome Sequencing and Analysis group) using the GATK variation discovery pipeline, following the guidelines in the GATK best practices v4. More specifically, each BAM was reduced to create a Reduced BAM. INDELs were then discovered by analyzing all samples simultaneously with the GATK UnifiedGenotyper and subsequently filtered by the GATK Variant Quality Score Recalibration (VQSR) filtering model, again following the v4 best practices. The INDEL genotypes for the X and Y chromosomes were adjusted to be consistent with each sample's sex. Female samples have diploid calls for all regions on the X chromosome. Male samples have diploid calls for pseudo-autosomal regions on the X chromosome and haploid calls for the rest of the X chromosome and for the Y chromosome. However, the INDEL calls for the ESP data are preliminary and not as robust as the SNP calls at this point. Users are advised to keep this difference in mind when applying the ESP data to research studies.
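The sex-dependent ploidy rules described above can be summarized in a short sketch. This is not the ESP pipeline code. The function names are hypothetical, the pseudo-autosomal region (PAR) coordinates are the commonly cited GRCh37 values and are not given in the text, and the zero returned for the Y chromosome in female samples is an assumption.

```python
# Pseudo-autosomal regions on chrX for GRCh37 (1-based, inclusive).
# Assumption: these coordinates are commonly cited values, not taken
# from the track description.
X_PAR_GRCH37 = [(60_001, 2_699_520), (154_931_044, 155_260_560)]


def in_x_par(pos):
    """Return True if a chrX position falls inside a pseudo-autosomal region."""
    return any(start <= pos <= end for start, end in X_PAR_GRCH37)


def expected_ploidy(chrom, pos, sex):
    """Expected number of alleles in a genotype call (hypothetical helper).

    Follows the rules in the text: females are diploid across chrX;
    males are diploid in the chrX PARs and haploid elsewhere on chrX
    and on chrY. Autosomes are diploid for everyone.
    """
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    if chrom not in ("X", "Y"):
        return 2
    if sex == "female":
        # Assumption: no chrY calls are expected for female samples.
        return 2 if chrom == "X" else 0
    if sex == "male":
        if chrom == "X":
            return 2 if in_x_par(pos) else 1
        return 1  # chrY is haploid in males
    raise ValueError(f"unknown sex: {sex!r}")
```

A genotype whose allele count disagrees with `expected_ploidy` for its sample (for instance, a diploid call for a male outside the chrX PARs) would be a candidate for the kind of adjustment the text describes.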

All SNPs and INDELs were further annotated by SeattleSeqAnnotation137, and the variant annotations at the coding-DNA and protein levels mostly follow HGVS notation.

The SNP calls are included in the release of dbSNP build 138. The full dataset is described in Fu, et al., 2013, and a subset of the data (i.e., 2,500 exomes) was published by the ESP Population Genetics and Statistical Analysis Working Group in Tennessen, et al., 2012.

Credits

The authors would like to thank the NHLBI GO Exome Sequencing Project and its ongoing studies which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing Project (HL-103010).

Contact: evsserver@uw.edu

References

Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Rieder MJ, Altshuler D, Shendure J et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013 Jan 10;493(7431):216-20. PMID: 23201682; PMC: PMC3676746

Li Y, Sidore C, Kang HM, Boehnke M, Abecasis GR. Low-coverage sequencing: implications for design of complex trait association studies. Genome Res. 2011 Jun;21(6):940-51. PMID: 21460063; PMC: PMC3106327

Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012 Jul 6;337(6090):64-9. PMID: 22604720; PMC: PMC3708544