Hello and welcome to this SIB in silico talk. My name is Robert Waterhouse. I am a SIB group leader and assistant professor at the Department of Ecology and Evolution of the University of Lausanne. In my group, we aim to elucidate interactions between gene evolution and gene function by developing computational approaches to interrogate evolutionary and functional genomics data. Today I will present an overview of our work exploring how beetles have evolved to feed on plants, published earlier this year in Genome Biology. The amazing diversity of beetle species has long been considered to be at least in part due to an evolutionary arms race with the plants on which many beetles feed. In our study, we explored the possible genomic consequences of plant-beetle interactions using genomic and transcriptomic data from 18 beetle species. The overarching question we set out to address has fascinated evolutionary biologists, and particularly entomologists, for many years. Can the incredible species diversity found amongst beetles today be explained by co-evolution with the many plants on which they feed? Specifically, with modern sequencing technologies allowing us to sample large-scale genomic data from many species, can we now find support for this hypothesis in the way that beetle genes and genomes have evolved? I aim to present the approaches we took to build our genomic data analysis workflow, covering the various tools and resources we used along the way, some of which are developed and maintained by SIB research groups. This behind-the-scenes look at our study is likely to be of particular interest to evolutionary biologists working with genomics data. For those less familiar with the wonderful world of insects, I should first set the scene. Haldane's quote, although probably apocryphal, responds to a theologian's question: what has the study of the natural world taught us about God?
With the answer being: one thing's for sure, the creator must have been incredibly fond of beetles. Why so? Amongst described species, beetles are by far the most speciose, with about 390,000 species representing some 40% of all insects. Estimated total species numbers are of course greater, but beetles still make up a large fraction of all insects, with possibly 1.5 million species. This incredible diversity of beetles is undisputed, and for most sensible people, the role of any deity in creating this diversity is also undisputed. Instead, several ideas have been put forward to explain this striking species diversity, with the predominant hypothesis focusing on plant-beetle feeding interactions, where, as land plants diversified, so too did the beetles that feed on them. This hypothesis is logically appealing, because plants offer an abundant and varied food source, but they also defend themselves against plant-eating insects. Beetles feed on various plant materials, often causing great damage. Indeed, some of our worst agricultural pests are beetles. But the plants fight back by producing distasteful deterrents and toxic poisons. In turn, beetles must continuously find the means to counteract these defenses, thereby propelling an endless arms race. Beetle responses to plant defenses are numerous, and aim to neutralize or minimize the effects either directly or indirectly. Direct neutralization is carried out by various families of detoxification enzymes, including cytochrome P450s, carboxylesterases, and glutathione S-transferases. In an arms race scenario, one hypothesis would be that, to deal more effectively with harmful toxins, plant-feeding beetle species will produce many more of these enzymes than other beetles that rely on other food sources. It follows, therefore, that one possible mechanism to achieve this would be to have many more copies in their genomes of the genes encoding these enzymes.
To test this, we first have to collect the relevant genomic data, that is, sequenced, assembled, and annotated transcriptomes and genomes of different beetle species. We sampled nine species from the mainly predatory suborder of beetles with transcriptomes, and nine species from the mostly plant-feeding suborder of beetles with both genomes and transcriptomes. Importantly, reaching a sample size of 18 beetle species in total was made possible only through the generous sharing of pre-publication genomic data from the i5K 5,000 arthropod genomes initiative and the 1KITE 1,000 Insect Transcriptome Evolution consortium. As with any scientific study, it is important to first assess the quality of the input data. In our case, we are particularly interested in examining gene copy number variation amongst our set of 18 beetle species. So for our comparisons to be valid, the gene sets need to be mostly complete. That is, our genomes and transcriptomes need to have captured as complete a catalogue of genes in each species as possible. To assess completeness, we used Benchmarking Universal Single-Copy Orthologs, or BUSCOs, a widely used SIB resource. BUSCO provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on an evolutionarily informed expectation of gene content from near-universal single-copy orthologs. Most of our data sets showed good completeness, allowing us to proceed with our analysis and to perform additional control analyses in which we could exclude some species with lower completeness levels. Next, we need to delineate gene families, or groups of orthologous genes, across our set of beetle species. These orthologous groups allow us to trace evolutionary histories and thereby examine changes in gene copy number throughout the evolution of these species. To delineate orthologous groups, we used the OrthoDB catalogue of orthologs, another widely used SIB resource.
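The completeness check described above can be pictured with a small sketch. It assumes each data set has already been scored into the usual BUSCO categories (complete single-copy, complete duplicated, fragmented, missing). The species names, the counts, and the 80% cut-off are hypothetical and chosen only for illustration; they are not values from the study.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """BUSCO category counts for one data set (illustrative values only)."""
    single: int      # complete and single-copy
    duplicated: int  # complete but found in more than one copy
    fragmented: int  # only partially recovered
    missing: int     # not found at all

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing

    @property
    def complete_pct(self) -> float:
        # "Complete" counts both single-copy and duplicated BUSCOs.
        return 100.0 * (self.single + self.duplicated) / self.total


def split_by_completeness(datasets, min_complete_pct=80.0):
    """Split data set names into (kept, flagged) by percent complete.

    The threshold is an assumption made for this sketch, not the study's.
    """
    kept = [name for name, c in datasets.items()
            if c.complete_pct >= min_complete_pct]
    flagged = [name for name in datasets if name not in kept]
    return kept, flagged


# Hypothetical counts for two made-up data sets.
example = {
    "species_A": BuscoCounts(single=900, duplicated=50, fragmented=30, missing=20),
    "species_B": BuscoCounts(single=600, duplicated=40, fragmented=160, missing=200),
}
kept, flagged = split_by_completeness(example)
```

Flagged data sets need not be discarded outright; as in the talk, they can be kept for the main analysis and dropped only in control analyses.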
All-against-all protein sequence alignments are used to identify best reciprocal hits and build clusters of orthologs, or to map genes from new species to existing orthologous groups. To model gene copy number changes, we first require a robust species phylogeny that defines the evolutionary relationships amongst our beetle species. The phylogeny of predatory and herbivorous beetles, rooted with an outgroup species, provides the evolutionary framework for analyzing traits such as gene copy number variation. To achieve this, we selected universal single-copy orthologs from our OrthoDB-delineated orthologous groups, aligned these with MAFFT, concatenated them into a superalignment, and then used this to reconstruct the species phylogeny with RAxML. With all the necessary input data assembled, we can now infer gene gain and loss events along the species phylogeny. To do this, we employed the Computational Analysis of gene Family Evolution, or CAFE, inference tool. CAFE uses a birth and death process to analyze changes in gene family sizes, while accounting for phylogenetic history as well as for potential errors or missing information in the input data, providing a statistical foundation for evolutionary inferences of gene family expansions or contractions. This allowed us to compare the rates of gene gain or loss per gene per million years, as well as the numbers of orthologous groups affected by such gene gain or loss events, between the herbivorous and the predatory beetle lineages. Our comparisons showed that gene family evolution, in terms of copy number variation, was much more dynamic amongst the herbivores than the predators. We examined only beetle-wide orthologous groups to avoid potential biases that could be introduced by including OGs specific to one lineage or the other.
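The best-reciprocal-hit idea mentioned at the start of this part can be sketched for a single pair of species. This is a minimal illustration of the concept only: OrthoDB's actual pipeline is considerably more elaborate, and the gene names and alignment scores below are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to an alignment score,
    where a higher score means a better match.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted gene pairs (a, b) that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between genes of species A and species B.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90,
          ("a2", "b2"): 180, ("a3", "b2"): 200}
b_vs_a = {("b1", "a1"): 245, ("b2", "a3"): 205, ("b2", "a2"): 175}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy example, `a2` finds `b2` as its best hit, but `b2` prefers `a3`, so only `a1`/`b1` and `a3`/`b2` are kept as reciprocal pairs.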
In a nutshell, considering all genes from such beetle-wide orthologous groups, the plant-feeding lineage showed a higher gain rate across more OGs, as well as a lower loss rate spread out over more OGs. It is possible that this greater dynamism may be generally linked to the greater species richness of the plant feeders, with no specific role for plant feeding underlying this trend. However, among the candidate OGs for detoxification, there are also more gains in plant feeders and, in contrast to the background, there were fewer OGs with losses. Thus, both gain and maintenance are higher for the detox gene families in plant feeders, which is consistent with a key role for plant defenses in driving dynamic gene repertoire evolution, and particularly lineage-specific expansions. Such gene repertoire changes may be significantly different between the two lineages, but are the lineage-specific expansions in plant feeders actually adaptive? To address this question, we tested for signatures of adaptive expansion in each suborder by comparing neutral Brownian motion to adaptive Ornstein-Uhlenbeck evolutionary models. The models consider per-species gene count as a trait that can evolve towards a value, which may or may not differ between the two suborders and may or may not be driven by selective pressure. This we call the optimum value in the models that invoke selection. Amongst all OGs with significant variations in their gene content, the vast majority showed a significantly higher optimum for the plant-feeding lineage. That is, Polyphaga showed more adaptive lineage-specific expansions. In addition, these adaptive expansions were enriched for candidate gene families involved in detoxification. A striking example of adaptive lineage-specific expansions is that of a group of glutathione S-transferases.
Genes from plant-eating species are shown here in red, and in total, they outnumber genes from predatory species almost two to one, with an Ornstein-Uhlenbeck optimum of 12 compared to just seven. The red and yellow bands highlight phylogenetically well-supported lineage- and species-specific expansions, respectively. This group of genes was identified first as dynamically evolving from the CAFE analysis, and then as adaptive from the OU analysis, with the best-fitting evolutionary model showing two optima. These adaptive lineage-specific expansions support beetle-plant co-evolution, with genomic signatures accompanying the dietary shift to plant-eating. Taken together, our results provide genomic support for the popular hypothesis that Coleoptera species richness may be in part explained by their interactions with land plants. Specifically, we found candidate detoxification genes were often part of the most dynamically evolving gene families, as quantified by our CAFE analysis. Furthermore, our evolutionary modeling with OU supported adaptive rather than neutral expansions of such families. Confirmation and generalization of our observed trends would ideally involve whole genome sequencing to assemble and annotate high-quality genomes for improved resolution and confidence, as well as sampling from other beetle clades or some of the many other groups of insects with dietary shifts towards plant-feeding to enable phylogenetic replication. Nevertheless, by performing multi-species analyses with rigorous statistics and evolutionary modeling using both genomic and transcriptomic data, we were able to identify evidence of adaptive selection on gene copy number beyond anecdotal observations of variable gene counts in different species.
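To make the contrast between the neutral and adaptive models more concrete, here is a minimal sketch of how the expected gene count and its variance behave over time under Brownian motion and under an Ornstein-Uhlenbeck process, using the standard textbook formulas for the two processes. The optima of 12 and 7 are the values quoted in the talk; the ancestral count, the pull strength `alpha`, the rate `sigma2`, and the elapsed time are hypothetical, and this is not the model-fitting procedure used in the study.

```python
import math


def bm_mean(x0, t):
    """Brownian motion: the expected trait value stays at the ancestral value."""
    return x0


def bm_variance(sigma2, t):
    """Brownian motion: variance grows linearly with time, without bound."""
    return sigma2 * t


def ou_mean(x0, theta, alpha, t):
    """Ornstein-Uhlenbeck: the expected value is pulled from x0 towards theta."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma2, alpha, t):
    """Ornstein-Uhlenbeck: variance levels off at sigma2 / (2 * alpha)."""
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))


# Hypothetical parameters, except the two optima quoted in the talk.
ancestral_count = 7.0   # assumed ancestral gene count
alpha = 0.05            # assumed strength of the pull towards the optimum
sigma2 = 1.0            # assumed rate of random change
elapsed = 100.0         # assumed time, in arbitrary units

plant_feeder_expected = ou_mean(ancestral_count, 12.0, alpha, elapsed)
predator_expected = ou_mean(ancestral_count, 7.0, alpha, elapsed)
neutral_expected = bm_mean(ancestral_count, elapsed)
```

Under these assumptions the plant-feeder expectation moves close to 12 while the predator expectation stays at 7, whereas under Brownian motion both lineages would share the same expectation and differ only by chance.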
I hope that this behind-the-scenes look at the approaches we employed to build our analysis workflow provides you with a clear understanding of the various tools and resources we used along the route to identify genomic signatures that have accompanied dietary shifts in beetles. It remains for me to acknowledge all the co-authors of this study, especially Nadir Alvarez and Mathieu Seppey, and of course, to thank you all for listening.