Welcome to this introduction to the session Building a Pathogen Surveillance System with Galaxy. I'm Wolfgang Meyer, a member of the team behind the European Galaxy Server, and also one of the main contributors to the Galaxy COVID-19 project. If you're watching this video, then hopefully you've already seen a lot of the Galaxy training material around sequencing data analysis. And you will know, or you should have learned, that Galaxy is a really nice analysis platform, also for analyzing pathogen sequencing data. What might still seem like a big gap at this point is how to integrate a simple Galaxy-based pathogen data analysis into a public or animal health program. So take as an example the simple SARS-CoV-2 variant analysis tutorial that you might have worked through already from the training material. It explains in detail the different steps it takes to get from whole genome sequencing data of SARS-CoV-2 samples to lists of mutations identified in each of the samples. And it's certainly great to have the bioinformatics explained and made accessible through Galaxy. But in a routine molecular surveillance program of a pathogen, these steps are just one piece, personally I would say a rather important piece, of a much bigger task. So in a larger program, you might well have data generated through different sequencing schemes, and you will certainly have a lot of data, which means you want to automate as many analysis steps as possible while still being able to process data from different sources rather flexibly. You would also not want to stop at simple lists of mutations like this tutorial does, but instead, most likely, you want to answer higher-level questions about circulating lineages or circulating drug-resistant strains of the pathogen in your region under surveillance. And, again most likely, you would like to be alerted to shifts in those lineage or strain patterns.
So what you'll need for that are aggregated reports and visualizations that let you keep an overview of a large amount of data and that also let you detect patterns in that data. And then, of course, you want even more automation so that these reports and visualizations also get updated when more data becomes available. Finally, at the same time, and I think that is equally important, you also want to archive generated data safely and ideally publicly, I would say, so that others and your own research can profit from that data again later. So, all in all, the gap between that simple pathogen analysis tutorial and a full solution for molecular surveillance of any pathogen is a rather wide one. Luckily, though, national players and also supranational organizations, like, for example, the EU and also the International Atomic Energy Agency through its joint center with the Food and Agriculture Organization, just as examples, have recognized this issue too, and they have initiated programs to improve things. So within the Galaxy project, we are currently working together with the EU-funded BY-COVID project, which has the goal to make pathogen data open and accessible to everyone, and also with the IAEA coordinated research project with the VETLAB network, which works on bringing state-of-the-art molecular surveillance tools to veterinary diagnostic labs around the world. And our goal with both of these projects is to align Galaxy-based pathogen analysis better with the needs of public and animal health initiatives. The way things have gone over the last years in the middle of the pandemic, our efforts are most advanced for SARS-CoV-2, where we have already built a modular genome surveillance system with Galaxy at its core.
As a first component of that system, we have built and released a set of workflows for automated, reproducible analysis of SARS-CoV-2 sequencing data, and we have registered these workflows with the two major public workflow registries, Dockstore and WorkflowHub. From there, users can easily import defined releases of these workflows into their accounts on any Galaxy server of their choice and start a high-quality analysis of their own data immediately. The tutorial Mutation calling, viral genome reconstruction and lineage/clade assignment from SARS-CoV-2 sequencing data explains this kind of import of workflows into Galaxy and also how to use and combine the workflows. Importantly, it will explain that there are different workflows for different types of input data that let you analyze data from different sources, but that, in the end, these all arrive at standardized reports, visualizations and SARS-CoV-2 lineage assignments. We then also worked on automation scripts that let you trigger automated runs of all these workflows as new sequencing data becomes available within your project. The technical basis of this is explained in a dedicated tutorial, but we have also put together a demo of how to use this automation system on example data. The whole SARS-CoV-2 surveillance system we've created is also presented as a showcase on the BY-COVID Infectious Diseases Toolkit web pages. And you are, of course, very much encouraged to visit that showcase page, which provides a very structured overview of the system along with many additional links to the different components and resources we have used for it. So the clear goal for the coming years is to keep on extending and improving this kind of surveillance system and to also provide comparable support and solutions for pathogens other than SARS-CoV-2 from within Galaxy.
Then, finally, regarding the data archiving I've alluded to before, you should know by now that it is not particularly difficult with Galaxy to produce consensus genomes of viral and also bacterial samples from high-throughput sequencing data. And for many pathogens, there are domain-specific databases to submit these consensus sequences to, such as, for example, GISAID for influenza and COVID sequencing data. But it is important to realize that a consensus genome of any pathogen sample only captures a tiny fraction of the information present in the original raw sequencing data. And so we strongly urge everybody to also submit their original raw sequencing data to truly open databases like the European Nucleotide Archive, the ENA, or NCBI's Sequence Read Archive, SRA. This will not only allow other initiatives to benefit from your efforts, but it will also make sure you yourself can still access the data later when you might need to reanalyze it, be it in light of new insights into a pandemic or epidemic or new developments on the bioinformatics side. So for this aspect of data management, we've prepared a demo on how to upload data directly from within Galaxy to the European Nucleotide Archive and also a tutorial on dehosting your raw sequencing data before submission to such databases. So you see, there is a lot of material to be discovered by you, and I don't want to keep on talking and prevent you from accessing it right now. So with that, I'm at the end of this brief introduction. I hope it clarified a few things, and I wish you fun with and insights from all this available material. Good luck and bye-bye.