Welcome everybody. My name is Saskia Hiltemann, and I will give you a very brief overview of metagenomic sequencing in Galaxy.

Let's start with why we study the microbiome. There are different reasons and different applications. One big application is healthcare research. We as humans are full of microorganisms everywhere in and on our body, from our skin to our gut to our mouths, our noses, even our eyes. They are all host to many microorganisms, and these microorganisms can affect our health. They can affect how well drugs work and things like that. In fact, there is so much genetic material from these microorganisms that it is sometimes referred to as our second genome. These are very rough estimates and there is some debate about them, but there are approximately 10 times more microbial cells than human cells in your body, containing 100 times as many genes as you have, from thousands of different species. These do affect our health, so they are important to study.

Another application area is environmental studies. Microbes in the soil can affect how well plants and animals thrive, and studying them can help improve agriculture in regions that otherwise may be inhospitable to certain plants.

Now, there are two main approaches to microbiome analysis. The first is the shotgun method. This is where we take all the genetic material from all these different microorganisms and sequence it all together. You can think of each organism as a jigsaw puzzle. If we do shotgun sequencing, we cut all the genetic sequence into smaller pieces, into reads, and these land in one big pile that we then have to disentangle. As you can imagine, we have a lot of data, so we can get more information, but the process of analyzing the data and reconstructing the different genomes is also more complicated. And of course it is more costly, because we sequence more.

The other main approach is amplicon sequencing. This is a more targeted approach.
Instead of sequencing all the genetic material, we sequence only a specific gene. In the jigsaw analogy, we would look only at the corner pieces. These pieces are easy to recognize because they are different from the other pieces. We obviously have less information, so we can't reconstruct the entire picture of each puzzle, but it may be enough for us to tell things like: okay, this is a nature puzzle, or this is an artwork. Because we have less information, we can say less about the microorganisms, but it is also simpler to perform, cheaper, and less complex to analyze. So shotgun and amplicon sequencing are the two main approaches in microbiome analysis.

For amplicon sequencing, the targeted approach, we often use the 16S or 18S ribosomal RNA gene. The reason the 16S gene is often chosen is that it is present in all bacteria and archaea, but not in eukaryotes such as environmental fungi and plants. So you really target only the bacteria and archaea that you want to sequence. The other reason why this is a very suitable gene is that it has highly conserved regions, which makes it easy to target and find this gene across all bacteria. In between these highly conserved regions are highly variable regions, and these you can use to distinguish between different taxa, different microbes.

Often you sequence across several of these V regions; V1 to V9 are the variable regions. So you might sequence V1 and V2, this section, or you might sequence V3 to V5, this whole section, or V6 to V8. Typically you don't sequence the entire gene, but one or more of these variable regions. This allows you to distinguish between microbes, although usually you can't go further than genus level. You can't go down to the level of species differentiation, but down to genus is very doable, and for many use cases this is enough.
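To make the idea of conserved and variable regions concrete, here is a minimal Python sketch of pulling out the stretch between two conserved flanking sites. Everything in it is an assumption for illustration: the sequences are short made-up toy strings rather than real 16S data, the names `CONSERVED_LEFT`, `CONSERVED_RIGHT` and `extract_region` are hypothetical, and real primer matching tolerates mismatches, which this exact string search does not.

```python
from typing import Optional

# Toy stand-ins for conserved flanking sites (not real 16S primer sequences).
CONSERVED_LEFT = "ACGTACGT"
CONSERVED_RIGHT = "TTGGCCAA"


def extract_region(gene: str, left_site: str, right_site: str) -> Optional[str]:
    """Return the sequence between two conserved sites, or None if either is missing."""
    start = gene.find(left_site)
    if start == -1:
        return None
    start += len(left_site)  # the region begins right after the left site
    end = gene.find(right_site, start)
    if end == -1:
        return None
    return gene[start:end]


# Two made-up "microbes": identical conserved sites, different variable region.
genes = {
    "microbe_A": "GG" + CONSERVED_LEFT + "AAATTTCCC" + CONSERVED_RIGHT + "CC",
    "microbe_B": "GG" + CONSERVED_LEFT + "AAAGTTCGC" + CONSERVED_RIGHT + "CC",
}

regions = {
    name: extract_region(seq, CONSERVED_LEFT, CONSERVED_RIGHT)
    for name, seq in genes.items()
}

for name, region in regions.items():
    print(name, region)
```

The conserved sites are what let the same search work on both toy genes, and the extracted stretches are what differ between them. That is the property the variable regions provide for telling microbes apart.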
Okay, so for amplicon sequencing, some of the pros are that it is very well established; it has been used for a long time. It is not very expensive. There are many tools available that can do the analysis, and many reference databases that can be used.

Some of the downsides are that the choice of which V region to sequence may bias your results: some microbes might be easier to differentiate than others depending on which V region you choose. And like I said, this is based on a very highly conserved gene, so it is hard to resolve down to species or strain level, and usually genus-level differentiation is the best you can do. And of course, you can only do taxonomic profiling; you don't have extra functional information about what these microbes are up to, or about any mutations in certain genes.

The other approach is shotgun metagenomics, where you sequence all genetic material. Some of the pros here are that you are not biased by the amplicon primer sets or by which V region you choose. You are not limited by conservation of the amplicon, and you get more information, so you can also obtain functional information about the activity of these microbes.

Some of the cons: since you are not targeting, you will get some contamination from the host. Say you are sequencing the human microbiome; you will get some contamination from human cells. If you are taking environmental samples, you will get some contamination from surrounding plants and maybe fungi. It is also more expensive because we are sequencing more, and the data analysis is more complex because we have to reconstruct all this genetic material into its different genomes. For that more complex data analysis you need high-performance computing: more memory and more compute capacity.

Now, the whole end-to-end analysis looks a little bit like this: you start by taking your sample, you extract the genetic material, you amplify it, you do the sequencing, and then comes the bioinformatics.
And, as always, the choices you make in each of these steps can affect your downstream results, so you have to be very aware of the choices you make in each step. The bioinformatics in this case is a big part of your experiment: a lot of sequences are generated, and they come from a lot of different genomes. So bioinformatics is really important, and it can sometimes feel like you're drowning in sequencing data. We need some good tools and approaches to resolve this.

Here's an overview of the analysis pipelines. In both cases we start with preprocessing, which means cleaning our data and assessing the quality. For amplicon sequencing, we then remove sequencing artifacts called chimeras. We cluster very similar sequences into OTUs; this stands for operational taxonomic units, which are just clusters of very similar sequences. Then we assign a taxonomy to these clusters, and then we do some visualization. For shotgun, this is very similar on the left side, but on the right side you see we can do an additional functional analysis.

The quality control of the sequencing data is very similar to what you've seen in other analyses, I'm sure. This is a FastQC plot of the sequencing read quality. Here position one of the read is on the left and the higher positions are on the right. Very typical, as with other experiments, is that towards the end of the read the quality drops off. So you might want to do some filtering and trimming of this data before you start.

One thing that can happen with amplicon sequencing is that something goes wrong during PCR amplification, which leads to hybrid sequences that originally come from two different parent sequences. These are called chimeras, and we want to make sure to get rid of those from our sequences before we do our full analysis.

The next step is to cluster highly similar sequences into OTUs, operational taxonomic units. Often you see a 97% identity threshold used for this.
That means that any two sequences that are at least 97% identical will be grouped together. Here you see a schematic of that. This 97% identity threshold is used in the case that we are going down to genus level, so each of these clusters then represents a taxonomic unit, a genus unit, which we will try to assign to a specific taxonomy.

So we take the sequences from each of these clusters and try to assign a taxonomy. For this we have reference databases we can use. Amplicon sequencing is very established, so these reference databases are very mature. There are two main ones, SILVA and Greengenes. I think SILVA is a little bit more up to date nowadays than Greengenes, so we'll be using that in the tutorial. For shotgun sequencing, we have the MetaPhlAn database, which has a very large number of marker genes to aid in identification. But of course, again, the choice of which database you use can affect what you find, because you only find what you look for in this approach, and these databases are invariably incomplete. So just be aware of this factor.

There are different functional databases as well for shotgun sequencing, to identify gene families or groupings into other functional categories, such as KEGG or Gene Ontology, and you can also do pathway reconstruction with these.

Now, the final result for taxonomic identification is the OTU table. I already mentioned that we cluster sequences into OTUs, which are just groups of very similar sequences, and that we assign taxonomy to those. So at the very end, we have this OTU table. Each line represents a different OTU, a different microorganism. You see here the name of the OTU, the number of times it was observed, which can be used to say something about the relative abundance of these different organisms, and then the full taxonomy.
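As a rough illustration of the clustering step and the resulting table, the following Python sketch groups toy reads at a 97% identity threshold and reports observation counts and relative abundance per OTU. It is a simplified sketch under stated assumptions, not how any particular tool works: identity is computed position by position on equal-length, already aligned reads (real tools derive it from alignments), the greedy "join the first matching representative" strategy is just one possible approach, taxonomy assignment is left out, and the reads, the function names and the `Otu001` naming scheme are all made up for the example.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes equal-length, pre-aligned reads)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.97):
    """Greedy clustering: a read joins the first OTU whose representative is at
    least `threshold` identical to it; otherwise it founds a new OTU."""
    otus = []  # each entry is [representative_sequence, observation_count]
    # Visit the most abundant unique sequences first so they become representatives.
    for seq, count in Counter(reads).most_common():
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu[1] += count
                break
        else:
            otus.append([seq, count])
    return otus


def otu_table(otus):
    """Rows of (OTU name, times observed, relative abundance), most abundant first."""
    total = sum(count for _, count in otus)
    ranked = sorted(otus, key=lambda otu: otu[1], reverse=True)
    return [
        (f"Otu{i:03d}", count, count / total)
        for i, (_, count) in enumerate(ranked, start=1)
    ]


# Made-up 100-nucleotide toy reads (not real sequencing data).
base = "ACGT" * 25
near = "T" + base[1:50] + "A" + base[51:]  # 2 mismatches out of 100: 98% identical
other = "TTGCA" * 20                        # an unrelated sequence

reads = [base] * 5 + [near] * 2 + [other] * 3
otus = cluster_otus(reads)
table = otu_table(otus)

for name, count, abundance in table:
    print(f"{name}\t{count}\t{abundance:.2f}")
```

In this toy data the read with two mismatches falls within the threshold and is counted together with its near-identical neighbour, while the unrelated sequence forms its own OTU. Dividing each count by the total gives the relative abundance that the table's observation counts hint at.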
Here we see that the most prevalent organism in our sample was Staphylococcus, then Listeria, then Streptococcus, and so on. This can be very useful, for example, in healthcare: if a patient comes in with an infection, often you just want to know which infectious agent is present so that you know which antibiotic to give the patient. Here we don't need the full functional information that shotgun might have given us. If we know that Staphylococcus is causing this infection, that's enough for a treatment plan. Of course, you might like more information, for example: are these microorganisms resistant to certain antibiotics, do they have antibiotic resistance genes? Then you would need more information than amplicon sequencing can give you, and that's when shotgun sequencing comes in handy.

After all is done, it's always nicer to have a visualization than to be looking at this table. Krona is a very nice tool for this, to interactively explore these plots: you can click on these different taxonomic levels to explore your sample. We will show that in the tutorial as well. Phinch is another tool, an online one. Krona lets you really explore a single sample, while Phinch is made for multi-sample analysis. So these are some nice graphs to show how different samples compare.

Okay. And with that, I hope to see you at the tutorial.