Hi everyone, and welcome to this brief overview of what the Galaxy training material offers for sequence data analysis across a spectrum of viral pathogen genomes. My name is Wolfgang Meier. I'm a member of the Galaxy Europe team at the University of Freiburg, and I'm involved in many of the viral sequencing data analysis efforts that are taking place within the Galaxy project.

The three viral pathogens with existing associated training in the GTN are SARS-CoV-2, influenza A virus and lumpy skin disease virus. For those of you who don't know what the latter is, that's a poxvirus infecting livestock animals. Of course, these are just three out of many viral pathogens you could potentially want to analyze with Galaxy, but it's a selection of rather important pathogens that also serves to demonstrate how diverse viral pathogens are and how each of them poses its own unique challenges to data analysis. So what we are hoping you can learn from following these tutorials on the analysis of sequence data from these three different viruses is that there is a core set of recommended tools and shared aspects to the general analysis flow, and you should be aware of those, but that data from each virus also requires some unique steps that deal with the specifics of that particular pathogen in terms of its molecular biology and, in some cases, its epidemiological characteristics.

So let's start with an overview of the three viruses we are dealing with here. First, SARS-CoV-2, a betacoronavirus. Like other coronaviruses, it has its genetic material encoded on a single linear piece of single-stranded RNA with a size of roughly 30 kilobases. It exhibits only a medium rate of mutation, and actually coronaviruses as a family have really low mutation rates for RNA-based viruses because, unlike other such viruses, they are capable of proofreading during RNA replication. Influenza A in comparison is a more typical RNA-based virus in that regard.
It lacks proofreading activity and consequently has a high mutation rate. Its genome is only about half the size of that of SARS-CoV-2, but more importantly, while it's also a single-stranded RNA genome, it comes as eight separate so-called segments of RNA, which has major implications for the diversity of influenza A virus isolates.

Then finally we have lumpy skin disease virus, or LSDV for short. It's a poxvirus or, more specifically, one of three members of the genus of capripoxviruses. Its genome is gigantic in comparison to the other two viruses at around 150 kilobases. And as for all poxviruses, the genome consists of double-stranded DNA, which also means that it comes with a really low mutation rate and, overall, not so great diversity between isolates.

Now for a slightly more detailed look at each of the three viruses. In the figure here, you see the genome architecture of SARS-CoV-2, which has all its proteins encoded, as I said, on one linear piece of single-stranded RNA, and which, with just 30 kilobases in size, is still pretty straightforward to sequence. I've said before that for an RNA virus, SARS-CoV-2 shows only a rather low mutation rate, but more importantly, this virus got introduced into a large host population only a few years ago. So it simply hasn't had much time, even by the fast evolutionary standards of viruses, to diversify a lot so far. Because of this, and because the SARS-CoV-2 pandemic started in the age of molecular biology and high-throughput sequencing, there happens to be a single, clearly defined reference genome of SARS-CoV-2, the Wuhan-Hu-1 isolate, and you can map reads from whatever current isolate to that reference without major issues. So bioinformatically speaking, this is a really beginner-friendly virus, and as such it's certainly a good idea to start your viral data analysis journey here with SARS-CoV-2, unless you are very specifically interested in one of the other two pathogens, of course.
Now, judged by genome size alone, you would think that influenza A should be a rather similar case with its roughly 14 to 15 kilobase genome. However, that virus has several characteristics that require quite some extra effort on the bioinformatics side to analyze it properly. Influenza A, for one thing, has been circulating very widely in various large populations of hosts, be it mammalian or avian hosts, and has done so for a really long time. That fact combined with its high mutation rate means that influenza A viruses have evolved into an extraordinary diversity of subtypes, with especially high variability in the genes coding for the most antigenic proteins, which would be hemagglutinin, abbreviated HA, and neuraminidase, abbreviated NA.

Now, to make matters worse, the segmented nature of the influenza genome means that segments can reassort in hosts infected with two different strains of influenza A at the same time, so the genome segments can get shuffled into new sets of eight. For bioinformatics, that means that there is no such thing as one single reference genome you could map sequence reads to. In that case, of course, genome assembly is certainly an alternative, but on the other hand, assemblers tend to get confused by the conserved 5' and 3' ends of all the segments, and they tend to over-assemble the reads into contigs spanning multiple segments. So what the GTN tutorial on influenza A is going to demonstrate instead is analysis through a more sophisticated mapping-based approach that handles this diversity in references quite well, but still only uses mapping and not assembly for sequence reconstruction.

Then finally, we have lumpy skin disease virus, LSDV.
This virus, like influenza A, has also had a lot of time to evolve diversity, but its more limited spread, limited at least when compared to something like influenza, its double-stranded DNA poxvirus genome and its low base mutation rate have prevented the evolution of too much diversity. So diversity of isolates is much less of a problem for this virus than for influenza A, but recombination events still happen between not terribly closely related strains, and that can still pose a certain issue.

The other, more molecular biological challenges, which apply to poxviruses in general and not just to lumpy skin disease virus, are, first, their really large genome size, which makes de novo assembly, for example, already rather computationally expensive for these viruses and favors a mapping approach, and, at the same time, their large inverted terminal repeat regions, or ITRs, at both ends of the genome. These pose a mapping problem because, for reads that fall largely inside one of these two ITRs, it is hard for a mapping algorithm to decide on which side of the genome to place the read, since it will also have an almost perfect match at the other end of the genome. So the tutorial on LSDV will demonstrate a mapping-based approach that enables reliable genome sequence reconstruction of any isolate despite these poxvirus-specific challenges. As such, this tutorial can also be a very nice start if you are interested in the analysis of other poxviruses, like many people nowadays are in monkeypox sequencing data.

If you want to learn more about the diseases caused by lumpy skin disease virus and related capripoxviruses, there are links in the tutorial that allow you to read up on those diseases and their economic impact on African and Asian countries in particular.
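To make the inverted terminal repeat problem mentioned above a bit more concrete, here is a minimal Python sketch. It is a toy illustration only, not the method used in the LSDV tutorial: the genome is random sequence, the lengths are tiny compared to a real poxvirus, and all names (`find_hits`, `random_seq`, `itr`, `unique`) are made up for this example. It builds a genome whose right end is the reverse complement of its left end, then searches both strands for exact matches of a read.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hits(genome: str, read: str) -> list[tuple[int, str]]:
    """Return (position, strand) for every exact match of the read.

    Both the read itself ("+") and its reverse complement ("-") are
    searched, which is what a read mapper effectively does.
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return hits


rng = random.Random(0)  # fixed seed so the toy genome is reproducible


def random_seq(length: int) -> str:
    """Generate a random DNA sequence (hypothetical toy data)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


# Toy genome: left ITR + unique core + inverted copy of the ITR.
itr = random_seq(200)
unique = random_seq(1000)
genome = itr + unique + reverse_complement(itr)

# A read lying entirely inside the left ITR ...
read_in_itr = genome[50:100]
# ... and a read that spans the ITR boundary into unique sequence.
read_spanning = genome[170:220]

itr_hits = find_hits(genome, read_in_itr)
spanning_hits = find_hits(genome, read_spanning)

print("read inside ITR:", itr_hits)
print("read spanning ITR boundary:", spanning_hits)
```

In this toy setup, the read taken from inside the ITR should be found twice, once at each end of the genome and on opposite strands, while the read that reaches into the unique region should be found only once. That second case is why reads anchored in unique sequence can be placed reliably, whereas reads falling fully inside an ITR cannot be assigned to one end on sequence evidence alone.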
So finally, let me emphasize that even with a user-friendly platform like Galaxy, it really takes time and effort to put together high-quality data analysis workflows like the ones presented in these three tutorials, and so does the writing of the tutorials themselves. So an acknowledgement slide here is more than appropriate, I think, at this point. And with that, we've reached the end of this introductory session. I wish you fun and success with, and of course new insights from, following these tutorials. Thanks for your attention, and bye-bye.