Arnold Library

An expansive human regulatory lexicon encoded in transcription factor footprints.

Neph, Shane and Vierstra, Jeff and Stergachis, Andrew B and Reynolds, Alex P and Haugen, Eric and Vernot, Benjamin and Thurman, Robert E and John, Sam and Sandstrom, Richard and Johnson, Audra K and Maurano, Matthew T and Humbert, Richard and Rynes, Eric and Wang, Hao and Vong, Shinny and Lee, Kristen and Bates, Daniel and Diegel, Morgan and Roach, Vaughn and Dunn, Douglas and Neri, Jun and Schafer, Anthony and Hansen, R Scott and Kutyavin, Tanya and Giste, Erika and Weaver, Molly and Canfield, Theresa and Sabo, Peter and Zhang, Miaohua and Balasundaram, Gayathri and Byron, Rachel and MacCoss, Michael J and Akey, Joshua M and Bender, M A and Groudine, Mark and Kaul, Rajinder and Stamatoyannopoulos, John A (2012) An expansive human regulatory lexicon encoded in transcription factor footprints. Nature, 489 (7414). pp. 83-90. ISSN 1476-4687

NephNature_Sep2012.pdf - Accepted Version (6MB)
NephNature_Sep2012SupplFiles.pdf - Supplemental Material: supplemental figures and tables (38MB)
Article URL:


Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.

Item Type: Article or Abstract
Additional Information: The final version of this article is freely available at the URL listed above.
DOI: 10.1038/nature11212
PubMed ID: 22955618
Grant Numbers: HG004592, RC2HG005654, DGE-071824, UWPR95794
Fred Hutch Divisions: Basic Sciences
Depositing User: Library Staff
Date Deposited: 24 Jun 2013 20:59
Last Modified: 24 Jun 2013 20:59
