Loading...

MIA: Mertash Babadi, A scalable, Bayesian model for copy number variation; Samuel Lee, Bayesian PCA

464 views

Loading...

Loading...

Loading...

Rating is available when the video has been rented.
This feature is not available right now. Please try again later.
Published on Mar 16, 2017

Models, Inference, and Algorithms
Broad institute of MIT and Harvard
March 15, 2017

MIA Meeting: https://youtu.be/zGPPd-qbP3I?list=PLl...

Mehrtash Babadi
Data Sciences & Data Engineering, Broad

A scalable Bayesian framework for inferring copy number variation

Abstract: Inferring copy number variation (CNV) from next-generation sequencing (NGS) data is a challenging problem. On the one hand, the complexity of the NGS technology results in a highly non-uniform sampling of the genome with unknown latent factors. On the other hand, devising and implementing modern machine learning algorithms for CNV inference in a scalable and robust fashion is an arduous task due to the sheer size of the data. In this talk, we briefly review the existing approaches and glance over a number of their caveats, including difficulty with sex chromosomes, lack of a data-driven model for determining the number of bias latent factors, neglect of sampling noise, heuristic filtering and outlier detection, lack of self-consistency and scalability. Next, we introduce GATK gCNV, our principled and scalable Bayesian framework for germline CNV inference from whole-exome sequencing (WES) and whole-genome sequencing (WGS) data that addresses these caveats. We benchmark GATK gCNV, XHMM and CODEX on WES data against high-confidence Genome STRiP calls on matched WGS data as ground truth, and show that GATK gCNV yields up to 30 percent higher sensitivity and specificity compared to the existing tools. We conclude the talk with a brief discussion of our ongoing efforts toward addressing the difficulty with common and large CNV events, and generalization to somatic CNV inference.

Samuel Lee
Data Sciences & Data Engineering, Broad

Bayesian PCA

Abstract: The model at the heart of GATK gCNV builds heavily on the probabilistic and Bayesian approaches to principal component analysis (PCA). In contrast with traditional PCA, the probabilistic approach provides a predictive model that can account for missing data, while the fully Bayesian approach further enables a principled way to learn the effective dimensionality of the principal subspace (i.e., the appropriate number of principal components to use). Model inference can be performed using expectation-maximization and variational-Bayesian methods, respectively. We will give a pedagogical overview of these methods, drawing analogies between Bayesian PCA and the perhaps more familiar Gaussian mixture model.

For more information regarding the Broad Institute and Models, inference and algorithms visit:
http://www.broadinstitute.org/mia

Copyright Broad Institute, 2017. All rights reserved.

Comments are disabled for this video.
When autoplay is enabled, a suggested video will automatically play next.

Up next


to add this to Watch Later

Add to

Loading playlists...