Sunday, April 15, 2018

CBIO Colloquium | 4/16/2018

Haris Vikalo
University of Texas at Austin
Research Profile

Efficient Algorithms for Haplotype Assembly and Viral Quasispecies Reconstruction

Monday April 16
2:00 PM
RRI 101

Abstract: We study two applications of high-throughput DNA sequencing: haplotype assembly and viral quasispecies reconstruction. These NP-hard tasks are made challenging by sequencing errors and the limited length of sequencing reads. We present a framework that casts haplotype assembly as the problem of decomposing a tensor of sequencing reads into a product of two factors -- one that reveals haplotype information and another that encodes the origin of the reads. We analyze the performance and convergence properties of the proposed tensor factorization method and, in doing so, establish guarantees on the achievable minimum error correction scores and correct phasing rate. We then extend this framework to reconstruct viral quasispecies characterized by uneven frequencies of their components. This is accomplished by successively inferring the strains in a quasispecies in order from the most to the least abundant; every time a strain is inferred, the sequencing reads generated from that strain are removed from the dataset. Extensive benchmarking studies on synthetic and experimental data demonstrate the efficacy of the proposed methods. Finally, after noting a connection between single individual haplotyping and the task of deciphering coded messages in communication systems, we employ information-theoretic tools to study the fundamental performance limits of haplotype assembly algorithms.
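
To make the minimum error correction (MEC) criterion mentioned in the abstract concrete, here is a minimal Python sketch for the diploid case. It is not the tensor factorization method from the talk. It is a plain alternating heuristic that assigns each read to one of two complementary haplotypes and then updates the haplotype by majority vote. The encoding (alleles as +1/-1, 0 for positions a read does not cover), the function names, and the toy reads are all assumptions made for illustration.

```python
def mismatches(read, hap):
    """Count covered positions (non-zero entries) where read and haplotype disagree."""
    return sum(1 for r, h in zip(read, hap) if r != 0 and r != h)


def mec_score(reads, hap):
    """MEC score for the diploid pair (hap, -hap): each read is charged
    the number of mismatches to whichever haplotype it fits better."""
    comp = [-h for h in hap]
    return sum(min(mismatches(r, hap), mismatches(r, comp)) for r in reads)


def assign(reads, hap):
    """Return +1 for reads that fit hap at least as well as its complement, else -1."""
    comp = [-h for h in hap]
    return [1 if mismatches(r, hap) <= mismatches(r, comp) else -1 for r in reads]


def assemble(reads, n_iter=20):
    """Alternate between read assignment and per-site majority vote (a heuristic)."""
    n_sites = len(reads[0])
    # Initialize from the first read; uncovered sites default to +1.
    hap = [r if r != 0 else 1 for r in reads[0]]
    for _ in range(n_iter):
        signs = assign(reads, hap)
        # Flip reads assigned to the complement, then vote at each site.
        votes = [sum(s * r[j] for s, r in zip(signs, reads)) for j in range(n_sites)]
        new_hap = [1 if v >= 0 else -1 for v in votes]
        if new_hap == hap:
            break
        hap = new_hap
    return hap, assign(reads, hap)


# Hypothetical error-free reads over 5 heterozygous sites.
reads = [
    [1, 1, -1, 0, 0],
    [0, 1, -1, 1, 0],
    [-1, -1, 1, 0, 0],
    [0, 0, 1, -1, 1],
    [0, 0, -1, 1, -1],
    [0, -1, 1, -1, 0],
]
hap, origin = assemble(reads)
print(hap, origin, mec_score(reads, hap))
```

The two returned lists loosely mirror the two factors described in the abstract: `hap` carries the haplotype information and `origin` records which haplotype each read was assigned to. A heuristic like this carries no optimality guarantee and can stall in a poor local solution on noisy data.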
