And here's the link where you can see what's going to be happening. So today's webinar will be about how Galaxy can be run, because Galaxy is a software framework and it can be run in a variety of ways. And this webinar will be given by three people: me, Enis Afgan, and Michael Schatz. I'm from Penn State, and Enis and Michael are from Johns Hopkins. So before we start, we probably need to introduce what Galaxy is. Galaxy is a framework for the analysis of large datasets. It has thousands of up-to-date tools, largely biomedical but not only biomedical, because there are Galaxy instances for non-biological data analysis needs as well. It has the latest types of reference data, the latest genome assemblies, for example. It has full-featured workflow functionality, so you can chain your tools into very complex analysis workflows. You can process a very large number of samples, and then you can analyze the results produced by these tools or workflows using ad hoc analysis environments such as Jupyter or RStudio. There is also a big training infrastructure with a lot of tutorials. It's all supported by a very large international community of users, developers, system engineers, and so on. So it's an open public framework for data analysis, which can be used right now. Depending on your needs, it can be used through existing public servers, on the cloud, or on your own laptop, if you will. Now, obviously, you can analyze data in a variety of ways. You can just use tools on the command line; nothing prevents you from doing that. You can chain tools together in a variety of ways, and there is an entire ecosystem of evolving workflow languages such as Nextflow, CWL, WDL, Snakemake, and others.
But in order to analyze data, you also need some kind of compute infrastructure, and you can run that infrastructure yourself; depending on your needs, it can be a simple or a complicated setup such as this one, for example. Or you can do all of this in one place, and Galaxy is that place. So let me just show you a quick overview of what Galaxy does. Suppose you work with cats. As you know, cats can be orange, green, or blue, and they can have a variety of ear shapes and many different tail forms. What is the genetic underpinning of this phenotypic variability? So you convert your cats into samples and you send them to the closest core facility. Because you're a good scientist, you're not just going to sequence one cat; you're going to sequence, say, 100 orange, 100 green, and 100 blue ones. So you send them off and you wait. Then your samples come back as a collection of many, many files. What do you do now? A great answer to this existential question is Galaxy. So here we are going to dump all these read files into the Galaxy Upload tool, and now we're going to choose from thousands of available tools. In this particular case, we're going to use BWA-MEM, because we want to map these reads against the cat genome, and we'll start mapping. This will generate a BAM dataset. This is fantastic, but you don't do analysis with one tool. In Galaxy, you can chain multiple tools together with workflows. So here we have the workflow editor; let's build a workflow. We'll start with input datasets. We'll add fastp for quality control, BWA-MEM for mapping, Samtools view for filtering alignments, MarkDuplicates for removing duplicates, FreeBayes for calling variants, and SnpEff for annotating them. And we're going to chain them together into a workflow which would look like this. So this is about 1% of Galaxy functionality; stay tuned for more. Hopefully this gives you a bit of an idea.
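For comparison, the chain of tools that the workflow editor wires together could be run by hand on the command line roughly like this. This is only a sketch: the file names, reference path, filter thresholds, and the SnpEff database name are illustrative assumptions, not from the webinar.

```shell
# Hypothetical command-line version of the Galaxy workflow described above.
# Assumes fastp, bwa, samtools, picard, freebayes, and snpEff are installed,
# and that cat_genome.fa has already been indexed with `bwa index`.

fastp -i cat_R1.fq.gz -I cat_R2.fq.gz \
      -o trimmed_R1.fq.gz -O trimmed_R2.fq.gz          # quality control / trimming

bwa mem cat_genome.fa trimmed_R1.fq.gz trimmed_R2.fq.gz \
    | samtools sort -o mapped.bam -                    # mapping, piped into sorting

samtools view -b -q 20 -o filtered.bam mapped.bam      # drop low-quality alignments

picard MarkDuplicates I=filtered.bam O=dedup.bam \
       M=dup_metrics.txt                               # remove duplicates

freebayes -f cat_genome.fa dedup.bam > variants.vcf    # call variants

snpEff ann catGenomeDb variants.vcf > annotated.vcf    # annotate (database name assumed)
```

Each command corresponds to one box in the Galaxy workflow; the point of Galaxy is that you get the same chain without managing tool installations, intermediate files, or a cluster yourself.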
But as I said, there is a whole universe of Galaxies, and Enis will explain what the possibilities are. So, as Anton was saying, you can perform end-to-end analyses in Galaxy, meaning that Galaxy incorporates the tools, the infrastructure for executing those tools, and the framework for linking them and manipulating datasets. As an outcome of that, there are different use cases and a variety of users that want to use Galaxy in different ways, and consequently there are a lot of different installations of Galaxy and a lot of different ways you can install Galaxy. So for the rest of this webinar we're going to try to break down what our recommended path is for different use cases, and which of the Galaxy servers or options you may want to use. To start the first slice through this universe of Galaxy options, we're going to look at how you can make your choice based on the role you play in your professional life. So if you're, for example, a researcher without IT support, the challenges you may encounter are things like local infrastructure that is limited or inflexible, requiring you to install command-line tools, or sysadmins that will not allow you to install a web application in your department. Perhaps the cloud and the command-line tools are just too technical for the task at hand. If you're facing those challenges, the solution we recommend is to just go use usegalaxy.*. And I mean, literally, usegalaxy.*: it's a collection of usegalaxy servers that are freely and readily available now. We will talk about each of the badges on the bottom row of this slide in more depth, and you'll see these badges carry on through the rest of the presentation.
If you're a researcher working with protected datasets, so if dbGaP means something in your daily life, or if your data is just too big to analyze locally, at that point we recommend using Galaxy on AnVIL. Again, we'll talk about what this option is. If you're an organization or a group collaborating closely, you have large datasets, you're concerned about data locality for a variety of reasons, you have privacy concerns, or you need heterogeneous resources such as GPUs, large-memory machines, or a lot of disk, there are several options available: you can install Galaxy locally, you can run Galaxy on the cloud via the Genomics Virtual Lab, or GVL, or you can use the commercial offering of Galaxy known as Galaxy Pro. Next, the other way we can slice this universe of Galaxy options is by use case. If you just want to try Galaxy, so if you're a student, if you're new, if you want to learn how to run an analysis, or if you're a senior data analyst and just need infrastructure and tools that are readily available, then we recommend using one of the usegalaxy.* servers. If you do omics research on model organisms, or otherwise use reference data that is readily available, and you're really interested in just doing the data analysis, we suggest again using one of the usegalaxy.* servers or the commercial Galaxy Pro service. If you work with human genetics, so again protected datasets that require FedRAMP or HIPAA compliance, Galaxy on AnVIL is the path here. If you're a tool developer or a software engineer and you're looking for a platform through which you can develop and disseminate your tools, we recommend either a local installation of Galaxy for developing and disseminating the tools, or the GVL for scaling out to larger infrastructure.
If you're an educator or a trainer and you're working on a syllabus, whether you're delivering workshops or teaching regular classes as part of your job, we recommend using either one of the usegalaxy.* servers or the GVL as a cloud installation; the GVL removes some of the variability over how long job queues may keep your students waiting. And lastly, if you represent an institution, you're a system admin, and you've been tasked with installing Galaxy and configuring it for a number of users, so if you're looking like a deer in the headlights at all the options that are available, we recommend either installing Galaxy locally, using Galaxy on the cloud through the GVL platform, or subscribing to the Galaxy Pro option. And I think with that, we're going to go back to Anton, and we're going to take a look at each of these. As I said, we use these five badges across the rest of the webinar, and we'll dissect each of them: what each offers and when you should consider each one. Thank you, Enis. So the first way of using Galaxy we're going to explain is the so-called usegalaxy.*. This is a constellation of multiple public instances. As Enis was mentioning, these are perfect for any researcher analyzing large datasets, for example datasets generated with next-generation sequencing, or other types of unprotected data. Some of the benefits of using these resources are that there are thousands of tools and the tool sets are continuously updated. Again, it's a community project; it's not something that is maintained by a single lab. There's a big community that watches for tools and wraps them. It's run on powerful and robust public infrastructure, or rather different infrastructures in different countries: here in the US we're using XSEDE, and there are also public infrastructures in Europe and in Australia.
There is access to the latest reference datasets. And because many people use these instances, you can collaborate and share your data with thousands of users. There is also centralized maintenance and support; these instances have very high uptime. There are some disadvantages, of course, such as limits on how much disk space you can use, and these are not recommended for protected datasets. So there are multiple instances, but there are the big three: here in the US, usegalaxy.org; in Europe, usegalaxy.eu; and in Australia, usegalaxy.org.au. But there are many other instances as well in multiple countries. There are European instances; there's also an African instance. And if we list all the Galaxy flavors that exist, there are over 100 platforms, and not all of them are biology-related. There are Galaxies, for example, used for climate science or for natural language processing. So Galaxy is much more than biology. This is the interface of usegalaxy.org, the main site here in the US. This is the instance in Europe, run out of Freiburg University, and it utilizes pan-European computational infrastructure. It has many flavors, sort of tool-specific flavors, such as, for example, RNA-seq, analysis of viral datasets, or the Human Cell Atlas. And in the next year, all the instances will have the ability to be customized based on what kind of tool set you would like to use. This is the Australian instance as well. There is also a unifying Galaxy Training Network. Again, it's a community-run effort in which multiple people create tutorials. These tutorials are geared to the entire distribution of users: some are very basic introductory tutorials, and some are more in-depth, for example explaining the actual biological or analytical nuances of RNA-seq, genome assembly, or single-cell RNA-seq types of analysis. Now we are going to Mike, Michael Schatz, who will introduce us to AnVIL.
Hi everyone. My name is Michael Schatz. I'm at Johns Hopkins, and I'm also one of the co-program managers for the AnVIL. So NHGRI, the National Human Genome Research Institute, which is one of the institutes of the United States NIH, made a call for action to develop what they call the AnVIL: the Genomic Data Science Analysis, Visualization, and Informatics Lab-space. As I'm sure many of you are aware, there's growing interest in human genetics in collecting very large cohorts of different patient samples in order to do association studies and other kinds of studies to identify some of the drivers of disease and development. In total, many tens of thousands of human genomes have been sequenced through NHGRI's efforts. In response to this, NHGRI established the AnVIL as a unified platform to host all these data, do analysis of them, and share the results in a very safe and protected way. It's highly scalable: it runs in the cloud, which affords many options to do great analysis at very large scales. So if you're working with protected data, AnVIL is definitely a platform you should consider; it also offers lots of options if you're just starting to face the challenges of large datasets. The major challenges here are that it does require some configuration to get up to speed in the cloud environment, and, because you are working in a cloud environment, there are costs associated with compute, storage, and egress of data from the cloud. Next slide, please. But overall, because these very large cohorts are being assembled, we're trying to invert the model of genomic data sharing. Historically, different institutions would do their own sequencing, and they would set up their own compute infrastructure to do the analysis.
This becomes very challenging when a lot of the power comes from the aggregation of many, many datasets. Instead of trying to copy large datasets to every institution in the world, let's host them in a centralized resource, where any researcher can go access them on an equal platform with all the latest and greatest tools available. Next slide, please. The AnVIL is composed of several different components, one of which is called Terra, developed at the Broad. This is the platform for large-scale batch analysis; it also has capabilities for launching interactive analysis, which I'll talk about in a second. Next. We also have Gen3, which is a system for organizing these very large cohorts. One of the key operations here is that, rather than having to go out and sequence a new cohort that has the phenotypes of interest and the population demographics of interest, maybe we can pick and choose samples that have already been sequenced, develop that synthetic cohort on the fly out of the data you already have, and make it more valuable. Next. The next major component is called Dockstore. This was developed at the University of California, Santa Cruz. It is a great resource for organizing, sharing, and documenting tools and workflows that can be applied at scale. Next, we have a couple of interactive environments available inside of AnVIL: we have Jupyter Notebooks, very popular, and also the very popular Bioconductor and RStudio suite, where you can do statistical processing, visualization, machine learning, all kinds of analysis through those packages. And then finally, and most relevant to this webinar, we're very pleased to announce that as of just a couple of weeks ago, we've now made Galaxy fully available inside of the AnVIL.
The functionality is exactly the same as what you would use on other servers, but it's all preconfigured, ready to go, and authorized to work with protected datasets. Next slide, please. I should also mention there are many more systems coming soon. All of this works in what's called the FISMA Moderate security zone, which means that there is extensive auditing and there are security practices put forth to make sure that this is robust. As of today, about 75,000 whole human genome datasets have been loaded into the AnVIL; this is about 1.1 petabytes of raw storage. To access them, you will need to apply through dbGaP or have consortium access to these datasets, but the platform is authorized for this. We also host a number of non-protected datasets, and you can bring your own. Next, please. So why do this? What does Galaxy bring to AnVIL? Well, all the functionality of Galaxy that you know and love, with the active community, users, tools, and so forth. But what does AnVIL bring to Galaxy? Number one would be access to this protected data. We can avoid the data download: you don't need to download 1.1 petabytes of data, because you do your analysis right there. There are no real quotas associated with it; you can scale as much as your scientific needs demand, and we can connect the datasets together in novel ways. Next slide. So what I'm going to do now is show a quick video demonstrating how you launch Galaxy inside of AnVIL, and then a quick tour of some of the other features that are there. This is also available on YouTube if you need it. The main portal for AnVIL is anvilproject.org. There you'll find resources for how to log on to AnVIL, how to set up your accounts, and, critically, how you can launch Galaxy. Here I'm showing the data dashboard, which shows a summary of all the cohorts that are available. You can search by phenotype; you can search by the types of consortiums that you need.
To sign in to AnVIL, you can go to Terra, use your Google credentials, and then explore the workspaces that are there. Workspaces are a way to organize data, tools, compute, and visualizations all in one place. The batch computing is done through workflows written in WDL, but then, critically, we can also launch into a variety of interactive tools. Here I'm going to show you how you can launch Galaxy inside of a new workspace so that there's harmonized data. Note it does cost about $0.84 per hour to use Galaxy; it's pretty inexpensive, but you get access to full-featured resources. Rather than working with protected human datasets for this training, I'm just going to load in some synthetic microbial data that's unprotected. I'm loading it in from my own computer, but then I need to upload it into an AnVIL workspace and then get it into Galaxy to actually work with. So here we're showing how it can be integrated into Galaxy. Once there, you have full functionality. Here I'm showing FastQC in order to visualize the quality of the data. Next I'm launching Bowtie to get alignments of those reads to a reference genome. I can plot the coverage. Then I show how you can call variants out of these data entirely inside of Galaxy, and how you can compute some statistics on that; in this dataset, 1,900 variants were detected. The last step I'm showing is how you can download from the AnVIL back to your home directory. And then the final step is to make sure to delete your instance of Galaxy running in the cloud so you don't incur any excess charges. With that, I'll hand it back over to Enis to talk about some of the other options for accessing Galaxy. Thanks, Mike.
So we've seen two options that are hosted and largely pre-configured so that you can easily get access to them. Another option for using Galaxy is to install it locally. Again, Galaxy is open source and community-developed, and as a result you have the freedom and the option to install the software on your local infrastructure. This local infrastructure can be anything from your laptop to a high-performance computing cluster, and anything in between, of course. The use cases that the local installation is most suitable for are: if you're working on private data, things that you don't want to upload to one of the public servers or possibly even the cloud, depending on your organizational requirements; if you want to control the quotas, so where the public servers have quotas in place, here you can manage quotas on your own; if you have administrative expertise and infrastructure available to run Galaxy; or if you want to develop Galaxy tools, workflows, or visualizations, so if you're a tool developer who would like to distribute or disseminate your tool via the Galaxy platform, then installing Galaxy locally is going to be necessary. The benefits of installing Galaxy locally are, of course, that you have maximal control. You can edit the application, you can configure it any way you'd like, you can customize it any way you'd like: whichever reference datasets you'd like, whichever tool set you'd like, it's in your hands. The downsides are that you do have to have local compute infrastructure, and, as I said, Galaxy will run on a laptop, but if you want to make Galaxy available to your group or your organization, the infrastructure should be proportionate to the expected workload. The other thing is that if you're going to run it over time, it requires ongoing maintenance. Within the local installation we fork one more time, where there are multiple installation options for you to choose from.
So if you're a developer and you want something up and running quickly, Galaxy is very simple to install, as we'll see in a second: simply clone it, run it, and within five minutes you will be up and running. You can easily edit the software and the configurations, but this is not intended as a platform for doing actual data analysis; this is mostly for development and trialing, and you have to configure the settings manually. Another option is using the Kubernetes stack that's actively being developed. Kubernetes is a container orchestration technology that allows us to link a lot of different services and publish them as a single unit, in this case a best-practice installation of Galaxy that includes all the components required to run a production-grade Galaxy. The configuration for Galaxy in this case happens through a tool called Helm. But in order to use Kubernetes and deploy on it, you do need Kubernetes-based infrastructure and knowledge of Kubernetes, to some degree at least. Another option is to use pre-built Docker images. There's a library of several dozen pre-built Docker images that have been preconfigured as units that can be deployed on local infrastructure as well. There are many domain-specific variants for various omics domains, and these Docker containers can be extended: you're not necessarily limited to the Docker container itself, but you can extend it to connect to an HPC cluster or to an external disk, and you get this as a unit that can then be linked to existing infrastructure. The last, but not least, option is using Ansible. Ansible is a method for scripting the deployment and management of software, and we have a library of Ansible playbooks and Ansible roles that codify best-practice standards for deploying Galaxy. You can use these roles in a variety of ways, so that you build an infrastructure that is most suitable for your needs.
This does require some assembly and context before you can actually just go ahead and run it, as we'll see. So I'll dive into each of these four options in a little more depth. As I said, the local installation for the developer option is very straightforward: in three commands, in about five minutes, you can have an installation of Galaxy running. If you simply copy and paste the three commands at the top of this slide, you will have Galaxy available on localhost. This gets set up with an SQLite database and without any real tools or reference data, so this is, again, good for development and experimentation. If you want to wrap tools, if you want to disseminate tools through Galaxy, you can use this. There's an SDK around this environment called Planemo that allows you to develop tools more easily and provides some high-level features. You can start here and evolve it into a production-grade installation of Galaxy by adding various components; more information about that is at the getgalaxy.org URL. The other option that I summarized early on is the Kubernetes path. We have developed a lot of Kubernetes components and wrapped them all up into a Galaxy Helm chart. A Helm chart is a way to parameterize and manage complicated Kubernetes deployments, which, in this case, Galaxy is. So there's a handful of commands, again listed here, that will let you get up and running, and what you get in turn is a production-grade installation of Galaxy. It comes with a high-performance database, namely PostgreSQL, and the CERN Virtual Machine File System, or CVMFS, which contains a lot of reference data. The tools are pre-built via BioContainers and configured into this particular installation of Galaxy. So if you start Galaxy up using this method, you will get a fully functional, largely preconfigured installation of Galaxy that is very similar to one of the public Galaxy servers.
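As a rough sketch of the two paths just described: the developer install follows the publicly documented Galaxy quick-start, while the Helm repository URL and release name below are assumptions based on the galaxy-helm project and may differ from the exact commands shown on the slides.

```shell
# Developer install: clone and run Galaxy locally.
# Serves on http://localhost:8080 with an SQLite database after a few minutes.
git clone https://github.com/galaxyproject/galaxy.git
cd galaxy
sh run.sh

# Kubernetes install: deploy the Galaxy Helm chart into an existing cluster.
# Assumes kubectl is pointed at a running cluster and Helm is installed;
# repo URL and chart name follow the galaxy-helm project and are assumptions.
helm repo add cloudve https://raw.githubusercontent.com/CloudVE/helm-charts/master/
helm install my-galaxy cloudve/galaxy   # "my-galaxy" is an illustrative release name
```

The developer route gives you an editable checkout for tool development with Planemo; the Helm route builds the production-grade stack (PostgreSQL, CVMFS, BioContainers tools) described above.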
And even though these commands are condensed and fairly short, the reality is that a fairly complex system is being built for you behind the scenes, so some knowledge of Kubernetes and the stack is recommended. The benefits of this approach include things like zero-downtime upgrades, meaning that if you want to change your configuration of Galaxy (in this case, we're just setting a simple brand on Galaxy), you run the following command, and Helm and Kubernetes will do the right thing: they will start up a new process before the old one is retired, and only then switch all the requests to the new server. You can also do rollbacks in this context. If you've made a mistake, Kubernetes and Helm automatically track revisions, and you are able to go back to a previous installation; if an upgrade fails, for example, you can revert to a working installation. Again, we highly recommend having some local Kubernetes administration expertise available for this. The Docker images are another very straightforward method of getting Galaxy up and running on local infrastructure. There's one command, listed at the top of the slide, that will get you a fully functional, very capable installation of Galaxy running, again in a matter of minutes. There is a library of several dozen pre-configured Galaxy images that cater to specific domains, imaging, epigenomics, and metabolomics among those mentioned here, so that as you bring Galaxy up, it has a toolset that matches that particular domain. You just have to change which Docker image you use, and you are up and running again very quickly. Once you have this Galaxy up, you may want to expand and grow it; like I said, you can connect it to an external cluster or external disk storage to make sure that the data persists. So that is something to keep in mind and look into if you choose this option.
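The one-command Docker route looks roughly like this. This is a sketch: `bgruening/galaxy-stable` is the widely used community image with a documented `/export` data volume, but the port mapping and host path here are illustrative choices, not from the slides.

```shell
# Run a preconfigured Galaxy container, published on http://localhost:8080,
# persisting Galaxy's data on the host so it survives container restarts.
docker run -d -p 8080:80 \
    -v /data/galaxy_storage:/export \
    bgruening/galaxy-stable

# Domain-specific flavors are separate images built on the same base,
# e.g. the RNA workbench (other flavor names vary):
#   docker run -d -p 8080:80 bgruening/galaxy-rna-workbench
```

Swapping the image name is all it takes to get a domain-specific toolset; the `-v` mount is what you would later repoint at external storage when growing the installation.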
Again, and we want to be very clear about this: while local installations of Galaxy may be very easy to get going with, over time you can expect some requirement for management, upgrades, and so on. And the last option for installing Galaxy locally is to use Ansible. Ansible is open-source, generally available software that allows you to manage software in a controlled fashion. In the context of Galaxy, we have developed a set of reusable Ansible roles that codify these best practices for installing Galaxy. Galaxy consists of a number of components: you have the Galaxy web application, you have the database, you have the tools, you have a number of components that make up a full-blown, production-grade installation of Galaxy. To compartmentalize those into manageable units, the Galaxy community has built a set of these Ansible roles that can then be linked into playbooks that run end to end and install a complete Galaxy server. GalaxyKickStart is one example of such a playbook that will install Galaxy on a virtual machine, for example, end to end. There's another playbook, called usegalaxy-playbook, that is used to manage usegalaxy.org. While this one is somewhat specific to the infrastructure at usegalaxy.org, it is a good example of what is possible with these Ansible playbooks and how you can link them into a very functional service. So with Ansible, there isn't a single command or single playbook that you can necessarily just run and say, "I want to install a Galaxy server," in a matter of minutes. You have to learn a little bit more about this and compose these Galaxy Ansible roles into a playbook that is suitable for your infrastructure.
If you want to learn more about Ansible and how to do this, there's a Galaxy admin training, generally held as an annual event, coming up at the end of January. Registration for this event is closing in about 10 days, and the URL is on this slide. You can go there, register, attend, and learn all about Ansible. So, the next top-level option: we've seen usegalaxy.*, we've seen AnVIL, we've seen the local installation. The one that's missing thus far is how you can run Galaxy on the cloud. The Genomics Virtual Lab, or the GVL, is the method through which Galaxy is made available on the cloud. If you choose to run Galaxy on the cloud, you get a dedicated Galaxy instance on scalable cloud infrastructure, meaning that you're able to scale the amount of compute and storage available to the Galaxy instance to match your needs. This is available on a variety of cloud providers, pre-configured with a fairly comprehensive toolset and reference data equivalent to that of usegalaxy.org. The use cases for using Galaxy via the GVL are: if you're operating on private data that you should not be uploading to one of the public servers; if you exceed the quotas of the public servers, since you can get additional storage and additional compute capacity on the GVL; or if you need a custom toolset, whether you developed your own tools or you want to install tools that may not be available on one of the public servers. The pros of this capability are that the installation is private to you, or to whoever launched this Galaxy for you. You are basically up and running in a matter of minutes: generally 12 to 15 minutes is what it takes to get going once you have figured out your credentials and access to cloud infrastructure. There are no quotas.
Again, the toolbox is customizable, and you can use versatile hardware, meaning that if you want to run some assemblies and you need a lot of memory, assuming the cloud provider you're using offers it, that's entirely possible. This option, however, may not be free: unless you're using one of the academic clouds, you will incur a charge from the cloud provider for using Galaxy on the cloud. You do have to launch this instance and maintain it over time. So unlike the usegalaxy.* servers, where you just visit a URL, here you have to go through a launch process, and I'll show you this in a video in a second. And if you plan on running the GVL over time, you may need a system administrator. Again, this is a complex system that over time requires maintenance: security upgrades, and upgrades in general. So even though the launch process is fairly straightforward, the long-term maintenance may require care. Next slide, please. The GVL, as I said, stands for the Genomics Virtual Lab, and it supports multiple applications on multiple clouds. This system has been designed from the ground up to be compatible with most of the popular clouds out there today: namely Amazon Web Services, the Google Cloud Platform, and OpenStack. We specifically have both allocations and experience working on Jetstream, an academic cloud in the United States, and Nectar, an academic cloud available in Australia, and then Microsoft Azure. The Virtual Lab portion of the GVL implies that this isn't just about Galaxy: the GVL supports Galaxy as the flagship application, but you can also work with Jupyter, RStudio, and a web-based terminal, and you can organize your data with Nextcloud as your data manager. And so if we go to the next slide and play this video, we will see how to launch an instance of the GVL on the cloud. At the start, you have to go to launch.usegalaxy.org and log in.
Logging in is a requirement for the server, and you choose the Genomics Virtual Lab as your appliance. You choose the version (currently the latest is Galaxy 20.09), and you choose the cloud on which you want to deploy Galaxy. There's a number of them, as I mentioned earlier, across different regions. You have to provide credentials for your GVL instance; you can save them on CloudLaunch, or you can provide temporary ones for one-time use. Once you've settled on the cloud choice, we give our GVL cluster a name, provide a password that protects access to it, choose the type of infrastructure we want to run, and you have the option of using a DNS mapping so that you don't just deal with an IP address but can use a proper domain name. Click launch, and like I said, in about 10 to 15 minutes you will have an installation of Galaxy available. In this case, again, we have to log in to our server using the password we provided. There's a number of applications available: Terminal, RStudio, Data Browser, Jupyter, and of course Galaxy. Clicking on that, we get access to Galaxy. It is configured very similarly to usegalaxy.org in that it has a number of tools available and a lot of reference data, so you can get up and running in a matter of minutes. And I think that wraps up the GVL option, and the last option we're going to talk about today is Galaxy Pro. Galaxy Pro was announced several months ago as a commercial offering of Galaxy. It is a subscription-based service for a managed Galaxy installation that comes with support. The use cases for this are if you're looking for reliability, improved performance, and privacy, so that you don't have to install or manage a Galaxy installation on your own. In turn, this enables increased research productivity through pre-built services and components of the system. So with Galaxy Pro, there is no need for system administration.
All of that is handled by the administrators. There are options for customizing Galaxy Pro to include a variety of tools, and it can be integrated with existing infrastructure or data sources that might exist in your organization. There are no quotas, and it comes with support. Each installation of Galaxy Pro is a dedicated one, so there is isolation and privacy: there's no sharing of the underlying infrastructure. On the hardware side, it's a cloud-based service, so there's versatile hardware available, including high-memory and high-CPU machines as well as GPUs. And there's a set of pro workflows, which I'll talk about in a second. However, as I said, this is a commercial subscription service, hence it is not free. And because these Galaxy installations are dedicated, sharing is not as freely available as on one of the usegalaxy.* servers, for example, where there are thousands of users and you can easily share with them; here, sharing is mostly within your group or organization. So what makes up the Galaxy Pro service? Of course, it includes the Galaxy software, so it is a managed installation of the Galaxy system. It has a verified toolset: everything available in the Pro toolset has been validated and tested to make sure it operates. There are no usage quotas. It comes with support and customizations, which means that if you're developing a pipeline or designing an experiment, there's an application scientist happy to coach you on how that should be done so that the analysis yields good results. And the tools can be customized for the needs of the organization. The infrastructure is entirely managed; that includes compute and storage, and it includes backups as part of this management component. And then there's system administration, which includes ongoing maintenance and software upgrades.
So you don't have to have a system administrator on staff, for example. Next slide, please. The other thing Galaxy Pro comes with is the so-called pro workflows. It is a growing library of high-quality workflows that include documentation. Currently, there are RNA-seq and Oxford Nanopore (ONT) variant calling workflows available, with more being actively developed. These workflows are regularly updated, and they are supported to reflect best-practice guidelines. They are a bit opinionated, but they can be customized to reflect the needs of a specific project or organization. And they are regularly tested with proper test data to make sure they are ready to use, so you don't have to go out and search for which version is most suitable or which one reflects best practice. And again, they can be customized to accommodate a variety of use cases. Next slide. Lastly, Galaxy Pro is not offered by the Galaxy Project directly; instead, there's a company called GalaxyWorks that offers this service. It's available at galaxyworks.io, and there's more information via email at info@galaxyworks.io. And with that, I'll hand it back to Anton to close us out and open the Q&A session. I just want to finish with acknowledgments. And acknowledgments really go, besides our funders for the public project, to the community. Because again, it's not a project which is supported by a single lab, a single PI, or a single grant. It's much more than that. And this is one of the greatest strengths of this project: it ensures that it's up to date and that it reflects best practices for analysis in each domain. And with that, I just want to thank, again, the Galaxy team and the community. Galaxy is primarily funded by NHGRI and NSF, and also by NIAID. And let's open for Q&A. We will need help from Dave Clements on that. So what do we do, Dave? Let's see.
So we have one outstanding question from Bellin, and I'm answering another one from Kerry as I speak. Bellin asked: can the Galaxy repository be used as a sort of GenBank in terms of storing data? In terms of GenBank, are you asking about storing your data? What data are you talking about? And Bellin, I can't give you permission to talk, or I don't know how to, so if you want to type in the chat, that's okay. I would say generally no, that's not what Galaxy is used for. If you are analyzing, for example, next-generation sequencing data, of course, you can store it in your account. And if you are working with common datasets, for example, the most popular thing of the day, which is of course the coronavirus, then if you're working with a particular set of coronavirus sequences, you can configure it as a library that would be accessible to all the users of a given Galaxy instance. But Galaxy is certainly not GenBank. It's not a replacement for GenBank or for the Short Read Archive. It leverages these resources; it allows you to get data from them and analyze it, but it does not serve as the data repository. The question from Kerry I sort of answered, but it would be good to have a more thorough answer. Kerry says: I want to assemble plant virus genome sequences from raw sequence data containing plant RNA sequences. What do I need to do? Do I need to register for Galaxy Pro? No. In your case, I would start with the Galaxy Training Network; I would begin right around here. That site has tutorials on assembly, and viral assembly is something that can be done very nicely within Galaxy with existing tools. So that would be the place to start. Good. Other questions? Oh, we have a question in chat from Nile: what are the subscription costs associated with Galaxy Pro?
Do they vary by usage? And what are your recommendations for a less-than-10-person lab using NGS methods, but not exclusively: cloud or pro? So it does vary by usage; it's a usage-based model. There are, I guess, two options there. One is a flat fee to avoid the runaway-script syndrome, where you pay a flat monthly fee and the cluster behind the scenes scales appropriately, so on a Monday it'll scale more than on a Sunday, assuming that on Sunday everybody's taking the day off. Alternatively, there's an hourly fee depending on how many jobs are currently running, which accrues over time. We can talk about the costs offline. For a less-than-10-person lab, either the cloud or the pro option works. Again, the main difference is that with pro, you don't have to deal with managing the infrastructure over time. With the cloud, you have to stand it up and maintain it over time, because you're the only person that has access to that particular instance. So it depends predominantly on whether you have the expertise and time to handle that installation. Thanks, Enis. We have another question from Hariharan, and I apologize for the pronunciation. What is the data storage limit per user on the public Galaxy servers? How long will the data, such as tools, usage history, and processed results, be stored? For public servers such as usegalaxy.org and usegalaxy.eu, the standard quota is 250 gigabytes. However, on EU, for example, if you are a member of or associated with ELIXIR, that quota is 500 gigabytes. But again, depending on your requirements, if you contact us, we can work with you on helping you manage that. The standard quota on usegalaxy.org is 250 gigabytes. I think that's true in Australia as well, but I'm not positive. I think yesterday I saw that usegalaxy.no has a 1,000-gigabyte limit, but I could be wrong about that too. You do have to be a Norwegian researcher, though.
Australia has a similar tiered system: if you're an Australian researcher, I believe you get more storage than a non-Australian user. I also didn't answer the second part of the question, which is how long your data is stored in your history. Your data is technically stored forever. However, there are some changes coming to the public instances: we will be switching to tiered storage, in which data that has not been used for a while will be moved to cold storage that is much cheaper for us, so it will take some time to retrieve it. But so far, if you created an account 10 years ago, your data is still there. We have another question about usegalaxy.eu versus .org: I've heard that some tools are available only on usegalaxy.eu but not on usegalaxy.org. Is that correct? Are those two websites different in other aspects? So this is why we talk about the concept of usegalaxy.*. The ultimate goal is to have all three big sites, and ultimately other sites as well, in complete sync. The toolsets of the three sites overlap to a very significant degree, but there are some differences. For example, EU has a dedicated set of tools for image processing which is not currently available at usegalaxy.org. But the ultimate goal going forward is to be in sync. Something else to consider is that there are over 125 public servers, and if you aggregate all the tools on all of those, it's a huge number. One of the most popular cases for creating a public server is when your lab has created tools and you want to make them easy for people to use: you stand up your own Galaxy server, and then people don't need to install anything to use those tools. So there are, in fact, lots of tools on lots of servers. Okay, let's see. No more in the Q&A. Is there anything in chat? There's one: when I run something, is it analyzed in the country nearest to where I am?
So assuming you're talking about the usegalaxy.* servers, it depends on which Galaxy server you go to. If you go to usegalaxy.org, for example, it will run on one of the servers in the United States exclusively. If you go to usegalaxy.org.au, it will run on one of the servers in Australia. If you go to the EU server, it will run in one of the European countries; it's part of the ELIXIR network, and there's a number of servers available. Most likely it will end up in Germany, but certainly in Europe. And then with the GVL, the cloud option, you have a choice of which cloud you want to deploy your Galaxy installation on. It's your choice, whether it's Amazon or Google or whatnot, and there are different regions that each of them provides and supports. It will then be running only in that particular region. We have a question from Amin, and again, I apologize if that's an incorrect pronunciation. What is the email address to be used in order to work out an extra allocation? That's an excellent question. On the Galaxy interface, there's a help section, so if you contact us through that mechanism, we will work with you. And it depends on which instance you are on; this is the case for .org. In the case of .eu, there's a specific form that you fill out. The overall vision, and I'm not talking only about storage, is that, sort of like how everything from Apple is starting to look like iOS, all the Galaxy servers are moving toward being highly synchronized, with very similar data, very similar toolsets, and very similar procedures for requesting more storage. There are still some differences at this point, but the project is evolving in a convergent way. So I have a follow-up question on that: if I create an account on usegalaxy.org, do I get an account on usegalaxy.eu by default? No. Are there plans to support that?
There are no plans to support that at this point; well, no plans as of now. Thank you. Okay. Any other questions? Oh, we have a question from Nasir: my goal is to have a local Galaxy installation equivalent to usegalaxy.org. The easiest way for me is via the Galaxy Docker image; otherwise, I have to go through a lengthy process of figuring out how to install tools. And, okay, I'll let you guys read that, because that's a long question. Am I correct that there is no Docker image set up this way, with all the tools used on usegalaxy.org? So the short answer is no, but the galaxy-stable Docker image is at least very close. You can alternatively start off with that and then connect it to the CVMFS read-only file system that's globally available and supported by the Galaxy Project. All the configurations of usegalaxy.org are available on that file system, as well as all the tools. Assuming you use Conda as the dependency resolver, you can simply link to that CVMFS server, and it will pull the tools as part of Galaxy's execution mechanism and make those tools available. You will probably not get 100% coverage, just because some of the tools have been around for more than a dozen years, so their dependencies are a bit questionable and are hand-addressed in the case of usegalaxy.org. I know this works quite well, though not 100%, because the GVL option uses that same mechanism of mounting the CVMFS server, and hence it's very functional. I would also advise considering the admin training mentioned earlier, because that forum will address many of these kinds of questions, and it's free. And yeah, as we mentioned, the deadline to apply is next week. One other comment: just be careful, because if you're trying to install thousands of tools locally, that will take up a huge amount of storage space. So you want to be selective about what you install.
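To make the route described above a bit more concrete, here is a rough sketch in shell commands. The image name (bgruening/galaxy-stable) and the CVMFS repository (data.galaxyproject.org) are the commonly documented ones, but treat this as an illustrative sketch under those assumptions, not an official recipe; the exact CVMFS client setup (keys and config repository) varies by distribution.

```shell
# Sketch: run the community Galaxy Docker image (assumed image name:
# bgruening/galaxy-stable), exposing the Galaxy web UI on host port 8080.
docker run -d --name galaxy -p 8080:80 bgruening/galaxy-stable

# Sketch: mount the Galaxy project's public CVMFS repository on the host,
# so a local Galaxy can read usegalaxy.org reference data and tool configs.
# Assumes the CVMFS client and the Galaxy public keys are already installed.
sudo cvmfs_config setup
echo 'CVMFS_REPOSITORIES=data.galaxyproject.org' | sudo tee -a /etc/cvmfs/default.local
echo 'CVMFS_HTTP_PROXY=DIRECT' | sudo tee -a /etc/cvmfs/default.local
sudo cvmfs_config probe data.galaxyproject.org   # verify the mount works
ls /cvmfs/data.galaxyproject.org                 # browse the reference data
```

Once the repository is mounted, a local Galaxy (or the container, with /cvmfs bind-mounted in) can point its reference data configuration at the CVMFS paths, which is essentially the mechanism the GVL uses.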
And ultimately, if you're planning, for example, to do assembly, you need to make sure that your infrastructure is capable of doing that. I mean, frankly, again, without knowing what your use case is, trying to just blindly mirror a server may not be necessary. If you talk to your users and see what the needs are, especially when it comes to tool versions and the toolset, it may be a lot less work and more functional for the consumers. Okay. So we promised less than an hour, and we have three minutes left. So I'm going to declare victory and thank the speakers, thank the audience, and thank everyone for the questions. Do the speakers have anything to say before we close out? One thing I want to mention at the end is that one of the coolest things that happened within the last year is the AnVIL functionality that Mike was talking about. This is very new, but it's an amazing way to be able to analyze these remarkable datasets, such as GTEx, for example, in place, without the need to download them. Mike, do you want to say a few words about that? It's very exciting. As I mentioned, there's more than a petabyte of protected human datasets that are now available, and they're all basically a click away. So this is just the very beginning. NHGRI and NIH are committing a lot of resources to this, so I expect this to grow and grow in capability in the coming years. I'm really excited to think about all the things and all the places we can take Galaxy in the future. This is the beginning of a completely new way of analyzing data in place, without the usual step of downloading it first. Okay. Well, thank you, everyone. It's been great. You know how to reach us online if you have any follow-up questions. This will be posted as a video on the Galaxy YouTube channel, and you know where to find us. Dave, back to you.