Phyloseminar #62: Nicolas Lartillot (Université de Lyon)




Streamed live on Nov 16, 2016

Systematic errors in phylogenomic studies: on the importance of modeling pattern-heterogeneity across sites.

While all models now used in phylogenetic analyses account for rate-heterogeneity across sites, the case of pattern-heterogeneity (i.e. qualitative variation in substitution processes across nucleotide or amino-acid positions) is much less clear and has recently been the subject of some controversy. One main question is whether pattern-heterogeneity should be modelled at the level of genes (or groups of genes) or at the level of sites. Both approaches have been used in recent phylogenomic analyses of metazoans, sometimes leading to radically different conclusions, in particular concerning the early patterns of diversification within this group.

In this talk, I will first review the evidence for the presence and relative importance of each type of heterogeneity in empirical sequence alignments. Then, I will introduce Dirichlet process mixture models accounting for site-specific amino-acid preferences. The statistical meaning of Dirichlet processes, as a non-parametric method for estimating arbitrary distributions of site-specific effects, will be explained and illustrated through simulation experiments. Finally, based on simulations implementing pattern-heterogeneity simultaneously at both the gene and the site levels, I will show the importance of using models that explicitly account for pattern-heterogeneity across sites for reconstructing accurate phylogenies.

