Studying C4

The complex variation of the complement component 4 (C4) genes has prevented their effective inclusion in genome-wide studies based on SNP arrays and exome sequencing. Here we provide resources to facilitate the analysis of C4 in both C4-focused and genome-wide studies.

We describe a combination of molecular assays and downstream inferential strategies that use droplet digital PCR (ddPCR) to infer the C4 gene content of each genome analyzed, including the combination of C4AL, C4AS, C4BL, and C4BS genes present. This method showed complete concordance (64/64) with results obtained by Southern blotting. Common C4 alleles can also be imputed from flanking SNPs. Although recurrent mutation at C4 makes imputation less effective than it is for simpler variants, we found that the common C4 alleles can generally be imputed with 0.7 < r² < 1, making this approach useful for large cohort studies. We provide the imputation reference panel that we created from the HapMap samples.
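
To show why an inferential step is needed at all, here is a minimal Python sketch; it is not our published pipeline. It assumes that assays have already yielded integer copy numbers for C4A, C4B, long (L), and short (S) C4 genes, and it lists every combination of C4AL, C4AS, C4BL, and C4BS copy numbers consistent with those four counts. The function name `consistent_structures` is hypothetical.

```python
def consistent_structures(a, b, l, s):
    """Return every (C4AL, C4AS, C4BL, C4BS) copy-number tuple consistent
    with marginal counts of C4A (a), C4B (b), long (l) and short (s) genes.
    Hypothetical helper for illustration only."""
    if a + b != l + s:
        raise ValueError("A+B and L+S must both equal the total C4 copy number")
    solutions = []
    # Choosing how many C4A copies are long fixes the other three counts.
    for al in range(0, min(a, l) + 1):
        as_ = a - al          # remaining C4A copies are short
        bl = l - al           # remaining long copies must be C4B
        bs = b - bl           # remaining C4B copies are short
        if as_ >= 0 and bl >= 0 and bs >= 0:
            solutions.append((al, as_, bl, bs))
    return solutions


print(consistent_structures(2, 2, 3, 1))
print(consistent_structures(2, 2, 4, 0))
```

For example, two C4A, two C4B, three long, and one short copy fit both (1, 1, 2, 0) and (2, 0, 1, 1), so marginal counts alone can leave the AL/AS/BL/BS split ambiguous; resolving such cases requires information beyond these four numbers.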

Check back for updates: we are creating advanced reference panels from whole-genome sequencing of much larger numbers of people, which we hope will enable the imputation of lower-frequency C4 alleles.

Molecular analysis of C4 structural variation using droplet digital PCR

Reference panel for imputation

Sekar et al. Schizophrenia risk from complex variation of complement component 4. Nature, 2016.

Genome STRiP

Genome STRiP (Genome STRucture In Populations) is a set of software tools for analyzing genome structural variation in whole-genome sequence data from many individuals of the same species. Bob Handsaker in our lab is its lead developer and architect. We describe Genome STRiP and its applications in two papers in Nature Genetics. As of February 2015, Genome STRiP had been downloaded by 1,099 users and cited in 133 academic papers. Here are links to software downloads, documentation, cookbooks, and Bob Handsaker’s workshop videos for users. This work is supported by the iSeqTools program of the National Human Genome Research Institute.

Bob Handsaker makes available his new maps of human copy number variation, built from 1000 Genomes Project data. These maps contain the genomic locations, alleles, and allele frequencies of 8,659 segregating CNVs in diverse populations, including 1,356 multi-allelic CNVs that are described for the first time in terms of integer copy numbers, copy-number alleles, and allele frequencies. The maps also report the relationships of CNV alleles to SNPs and haplotypes, with plots suitable for reuse in papers and presentations.
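
A short sketch of how copy-number alleles relate to the integer copy numbers observed in individual genomes, assuming Hardy-Weinberg equilibrium: each diploid genome carries two alleles, so its copy number is the sum of two allele copy numbers, and the expected distribution of diploid copy numbers follows from the allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the maps.

```python
from collections import defaultdict


def diploid_copy_number_distribution(allele_freqs):
    """Expected frequencies of diploid copy numbers under Hardy-Weinberg
    equilibrium, given a mapping {allele copy number: allele frequency}."""
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    dist = defaultdict(float)
    # Each genome is modeled as two independently drawn alleles; its
    # diploid copy number is the sum of the two allele copy numbers.
    for c1, p1 in allele_freqs.items():
        for c2, p2 in allele_freqs.items():
            dist[c1 + c2] += p1 * p2
    return dict(sorted(dist.items()))


# Hypothetical allele frequencies, for illustration only.
example = diploid_copy_number_distribution({1: 0.5, 2: 0.3, 3: 0.2})
for copy_number, freq in example.items():
    print(copy_number, round(freq, 3))
```

In this toy case a diploid copy number of 4 arises from both 1 + 3 and 2 + 2. Because different allele pairs can give the same total, going the other way, from observed copy numbers to alleles, generally needs extra information such as the relationship of CNV alleles to flanking SNP haplotypes.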

Official website
Human CNV browser
Download data

Handsaker et al. Large multiallelic copy number variations in humans. Nature Genetics, 2015.
Handsaker et al. Discovery and genotyping of genome structural variation by sequencing on a population scale. Nature Genetics, 2011.

Human genome replication timing

DNA replication creates opportunities for mutation, and the timing of DNA replication correlates with the density of SNPs across the human genome. To enable deeper investigation of how DNA replication timing relates to human mutation and variation, we generated a high-resolution map of the human genome’s replication timing program and analyzed its relationship to point mutations, copy number variations, and the meiotic recombination hotspots utilized by males and females.
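
A correlation like the one mentioned above can be computed in many ways. The following minimal sketch, which is not the analysis from our papers, bins SNP positions into fixed-size windows and computes a Pearson correlation between per-window SNP density and a per-window replication-timing value. The window size, the timing scale, the function names, and the toy inputs are all assumptions.

```python
import math


def snp_density(snp_positions, n_windows, window_size):
    """SNPs per base pair in consecutive windows [i*size, (i+1)*size)."""
    counts = [0] * n_windows
    for pos in snp_positions:
        i = pos // window_size
        if 0 <= i < n_windows:   # ignore SNPs outside the windowed region
            counts[i] += 1
    return [c / window_size for c in counts]


def pearson_r(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy inputs for illustration only; not measured data.
timing = [1.0, 2.0, 3.0]  # one replication-timing value per window
density = snp_density([0, 5, 12, 25], n_windows=3, window_size=10)
print(pearson_r(timing, density))
```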

Amnon Koren makes available these data resources, which were generated and/or further analyzed in the work described in the papers below. The data come from Koren’s experiments profiling replication timing at loci across the human genome in lymphoblastoid cell lines. The experimental protocol was developed by Amnon Koren and is described in the AJHG paper.

Download data

Koren et al. Differential relationship of DNA replication timing to different forms of human mutation and recombination. Am J Hum Genet, 2012.
Koren and McCarroll. Random replication of the inactive X chromosome. Genome Research, 2013.
Koren et al. Genetic variation in human DNA replication timing. Cell, 2014.

Birdsuite

Birdsuite is an open-source set of software tools for analyzing data from SNP arrays to detect and report SNP genotypes, common copy-number polymorphisms (CNPs), and novel, rare, or de novo copy-number variants (CNVs). Birdsuite analyzes data generated using “hybrid” SNP/CNV genotyping arrays (e.g., the SNP 6.0 array) that we co-developed with Affymetrix. Although most components of the suite can be run individually (for instance, for SNP genotyping only), Birdsuite is intended especially for integrated analysis of SNPs and CNVs. As of February 2015, the two papers describing hybrid SNP/CNV arrays and Birdsuite had been cited in 712 and 529 academic papers, respectively.

Official Birdsuite website

McCarroll et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nature Genetics, 2008.
Korn et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics, 2008.

Drop-seq

Drop-seq is a technology we developed to enable biologists to analyze genome-wide gene expression in thousands of individual cells in a single experiment. Our Drop-seq resources site provides interested users with resources to implement Drop-seq in their own labs. We hope Drop-seq helps you do things you have always wanted to do. Tell us about it!
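
In Drop-seq, each cell is co-encapsulated with a bead whose capture oligonucleotides share one cell barcode but carry distinct unique molecular identifiers (UMIs), so reads can be assigned to cells and PCR duplicates collapsed. The sketch below shows only a simplified final counting step, assuming reads have already been aligned and tagged with cell barcode, UMI, and gene; it ignores barcode and UMI error correction and is not the Drop-seq software itself. The function name and toy sequences are hypothetical.

```python
from collections import defaultdict


def digital_expression(reads):
    """Build a cell-by-gene count table from (cell_barcode, umi, gene) tuples.

    Reads sharing a cell barcode, UMI, and gene are treated as copies of one
    captured molecule, so each (cell, gene) count is the number of distinct UMIs.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    matrix = defaultdict(dict)
    for (cell, gene), seen in umis.items():
        matrix[cell][gene] = len(seen)
    return dict(matrix)


# Toy tagged reads, for illustration only.
reads = [
    ("AAACGTTT", "GGCATTAC", "ACTB"),
    ("AAACGTTT", "GGCATTAC", "ACTB"),  # PCR duplicate of the read above
    ("AAACGTTT", "TTAGCCAG", "ACTB"),
    ("CCGTAAGA", "GGCATTAC", "GAPDH"),
]
print(digital_expression(reads))
```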

Macosko et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 2015.

Drop-phase

Drop-phase is an approach we developed to quickly determine the chromosomal phase of sequence variants by separating genomic DNA into thousands of nanoliter droplets and then analyzing the extent to which alleles partition into the same or different droplets. We hope it will be useful in studies involving genome editing or allele-specific expression, and in clinical scenarios involving compound heterozygosity.
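
The logic of drop-phase can be illustrated with a simplified calculation: alleles on the same DNA molecule land in the same droplet more often than independent partitioning would predict. This hypothetical sketch assumes per-droplet presence/absence calls for each allele are already available, and it compares the excess co-occurrence of one allele with each allele at a second site. The published method's statistics may differ; the function names and toy calls are illustrative only.

```python
def co_occurrence_excess(droplets, allele_x, allele_y):
    """Fraction of droplets containing both alleles, minus the fraction
    expected if the two alleles partitioned into droplets independently."""
    n = len(droplets)
    fx = sum(allele_x in d for d in droplets) / n
    fy = sum(allele_y in d for d in droplets) / n
    fxy = sum(allele_x in d and allele_y in d for d in droplets) / n
    return fxy - fx * fy


def infer_cis_partner(droplets, allele_x, candidates):
    """Pick the allele at a second site that co-partitions most with allele_x."""
    return max(candidates, key=lambda y: co_occurrence_excess(droplets, allele_x, y))


# Toy droplet calls: each set lists the alleles detected in one droplet.
droplets = [{"A", "C"}] * 3 + [{"G", "T"}] * 3 + [set()] * 4
print(infer_cis_partner(droplets, "A", ["C", "T"]))
```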

Regan, Kamitaki et al. A rapid molecular approach for chromosomal phasing. PLoS One, 2014.