Computational Biology Colloquium
“ChIP-seq: unleashing its full potential through data integration”
Department of Biostatistics
Johns Hopkins University
Bloomberg School of Public Health
Thursday, September 18, 2014
Host: Ting Chen
One major goal of functional genomics is to comprehensively characterize the regulatory circuitry behind coordinated spatial and temporal gene activities. With the ability to map genome-wide transcription factor binding sites and histone modifications, ChIP-seq has quickly become an indispensable tool for studying gene regulation. Despite its unprecedented power, a number of challenges must be overcome before one can take full advantage of this high-throughput technology. First, ChIP-seq is increasingly used for analyzing dynamic changes of regulatory circuitry across different biological contexts (e.g., different cell types or time points). Conventional methods, which analyze data for one protein in one cell type, cannot meet the emerging need to characterize quantitative and synergistic changes in the DNA binding of multiple proteins between conditions. New methods are required for dealing with the exponentially growing number of multi-protein combinatorial patterns, and for evaluating statistical significance against the background of biological and technical variation. Second, ChIP-seq is high-throughput with respect to analyzing the whole genome, but low-throughput with respect to analyzing gene regulation in a large number of biological contexts. New strategies need to be developed to better utilize ChIP-seq data originally collected from one biological system to gain insight into gene regulation in other biological systems or diseases. Third, ChIP-seq data also contain information on allele-specific binding (ASB). However, applying ChIP-seq to study ASB often suffers from low statistical power because few reads map to heterozygous SNPs.
In this talk, I will demonstrate that the problems above may be approached computationally by developing new statistical methods for jointly analyzing multiple ChIP-seq datasets and methods for integrating ChIP-seq data with the enormous amounts of gene expression data in the Gene Expression Omnibus (GEO).