Hi, my name is Marius van den Beek, and I'm going to talk about a new initiative that I'm very excited about. We have done a lot of groundwork over the last year to start a new Galaxy workflow community, the Intergalactic Workflow Commission (IWC). Before I start, I want to briefly mention that the slides are available here, since there are a lot of links in this presentation that you might find useful.
Let me start by highlighting a few things that are great about workflows in Galaxy. For me, the number one reason to use workflows is that it is by far the fastest and most reliable way to perform a complex analysis in Galaxy, especially when you have to analyze more than a handful of samples or when the tools you need require a lot of non-default parameter values. I say reliable, since every parameter for each tool step is defined in a workflow. So you will not accidentally forget a specific parameter, something that might easily happen when doing an interactive analysis.
Workflows are often the outcome of an interactive analysis. As a user, you may have tried a few different tools and settings until you found exactly what works for your data and gives you the correct results, assuming there is a correct result. And when you've reached this point, you can delete the stuff you don't need anymore and extract the workflow from your history. So workflows reflect both the knowledge and the trial and error of the person who defined the workflow. But of course it would be nice if that knowledge and those lessons learned could be shared easily and transparently, so we can focus on the science and not the legwork that has been done many times before.
That brings me to a set of problems and roadblocks that I see today in the day-to-day use, or underuse, of workflows and their features. Traditionally, workflows did not allow for a great deal of parameterization. What I mean by that is that you might have designed a workflow for differential expression analysis for six C. elegans samples.
The aligner is set up to align against the C. elegans genome, your transcript quantification uses transcript coordinates specific to C. elegans, and maybe you set some other parameters that depend on the size of the C. elegans genome. What that means is that somebody who would like to run this workflow with samples from mouse or human or any other organism would need to change those parameters deep in the guts of the workflow, either in the workflow run form or in the workflow editor. If you're a programmer, you might consider such workflows to be tightly coupled to the data they can process, and that is not ideal.
So essentially Galaxy forced users to create copies of the workflows where they change those parameters, and then of course it's difficult to keep multiple copies of the workflow up to date with proper descriptions, latest versions of tools, annotations and other metadata, not to speak of proper versioning and changelogs. And why would you even go that extra mile if the workflow is likely only usable for your own data? If the workflow is built around your data and you know it's working on your Galaxy server, there's also little incentive to write tests for the workflow.
Galaxy has gained a lot of features to deal with these problems. Workflow parameters, for instance, can be defined once and connected to many steps. So one such parameter can define the organism. When you run your workflow, you only need to define this one parameter, and all dependent settings would use that parameter. That's an extra step that the workflow author needs to take, but I would argue that if we as a community decide to do that, we will end up with fewer but more universal, well-tested and user-friendly workflows, workflows that you can build upon even if you're not a Galaxy expert. The time that everyone spends on defining and troubleshooting workflows can then go towards properly described inputs, documentation, examples, benchmarks, and so on, making it easier and friendlier for beginners.
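The idea of one workflow parameter feeding many steps can be sketched in a few lines of Python. This is only an illustration of the decoupling described above, not how Galaxy implements workflow parameters: the `Step` class, the `resolve` helper, the step names and the per-organism file names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical per-organism settings; the file names are placeholders.
ORGANISM_SETTINGS = {
    "C. elegans": {"reference_genome": "ce11", "annotation": "ce11_transcripts.gtf"},
    "mouse": {"reference_genome": "mm10", "annotation": "mm10_transcripts.gtf"},
}


@dataclass
class Step:
    """A workflow step whose tool parameters are connected to shared settings."""
    name: str
    # Maps a tool parameter name to the key of the organism setting it uses.
    connections: dict = field(default_factory=dict)


def resolve(steps, organism):
    """Fill in every connected tool parameter from a single organism choice."""
    settings = ORGANISM_SETTINGS[organism]
    return {
        step.name: {param: settings[key] for param, key in step.connections.items()}
        for step in steps
    }


# Two hypothetical steps that both depend on the organism.
steps = [
    Step("align", {"genome": "reference_genome"}),
    Step("quantify", {"transcripts": "annotation"}),
]

for organism in ("C. elegans", "mouse"):
    print(organism, resolve(steps, organism))
```

Switching from C. elegans to mouse changes one value at run time, and neither step definition has to be copied or edited.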
A few years ago, we were in a similar situation for tools. There were many tool wrappers out there in the Tool Shed and other source repositories, but the quality and security of these tools varied, and admins who were security-minded had to scan the tool wrappers to see whether they had any flaws, which is time-consuming, inefficient and not very easy. It doesn't scale well, especially if you want to install hundreds of tools, let alone the thousands that are available now on the public Galaxy instances. Beyond the security implications, various patterns and conventions, good or bad, were copy-pasted from one tool to another. We didn't yet have the extensive tool schema and documentation that we have now, and collaborating to improve commonly used tool wrappers meant you had to send patches or bug reports by email.
So to address these issues, the Intergalactic Utilities Commission, the IUC, was founded. In practice that meant a repository on GitHub was created where the IUC collects tool wrappers that follow the highest standards, and those standards were documented. You can submit or improve a tool wrapper, request a new wrapper, or report bugs at the IUC. And of course there are many other community repositories that are fantastic and that follow a similar model.
So we can take some inspiration from how this works for tools. The IUC is not an anonymous mass: there are close to 200 volunteers who have added or changed tools over the years. And like many open source projects, the IUC has a couple of members, or we can call them commissioners, who are both knowledgeable in Galaxy tool development and willing to donate some of their time to respond to issues and reviews and to approve and merge pull requests. To summarize, the goals of the IUC are to determine and define best practices to guide authors in tool development. The IUC also organizes trainings, communicates requirements to the Galaxy development team, and importantly also develops and maintains the testing infrastructure.
So in practice the process of improving or adding a tool is that anyone can open a pull request. An IUC member will review it and perhaps make some suggestions for improvement. In parallel, automated tests are run so that we can be reasonably sure that the tool wrapper and the changes work as intended. So here we run planemo test on the tool, and if it passes, it should be fine, and then we can merge the pull request. Here's a visual overview of the PR process: the tests are run, and then at the end, if everything looks good, the changes will be deployed and the tools will be uploaded to the Tool Shed. This makes the tools available to the entire community and provides a mechanism for admins to install new or updated tools. And since we think that the IUC tools and tools from other community-maintained repositories do not contain any security vulnerabilities, you can confidently install these new tools on your server.
So this brings me back to the IWC and the central proposal for how we can do something similar for workflows. It's really important for tool development, or software development in general, to have tests that are run for every change and addition. So we need that. We also need a central repository where anyone can submit new workflows and report bugs. This has turned out to work well, and it lowers the overhead of managing the infrastructure for many small repositories and of subscribing to many repositories to respond to issues there. At the same time, we should aim to have proper releases for workflows. I'll talk about that some more later. And then we need to establish some conventions for metadata if there's no standard we can follow. And of course we need workflows.
So the first and simplest step was the creation of the galaxyproject/iwc repository. We actually created the repository in 2018, following GCCBOSC in Portland, but we didn't add the first workflows until February 2021.
That is when we started adding the SARS-CoV-2 workflows that Wolfgang Maier has developed and that are processing new samples every day on the usegalaxy.* servers. So there are IWC posters here. And this is an example of an addition of a workflow to the IWC.
So how do we do our testing? This is very similar to what we do for tools. We use Planemo, a command-line tool which uses Galaxy programmatically through Galaxy's REST API. Planemo runs the Galaxy workflow, waits until the workflow run is finished and compares the output to what is expected. On the top right, you see the test definition for one workflow test. To write such a test you need to provide the inputs, in this case run accessions: you provide a file which contains one or more accessions. And then you provide a snippet of the expected results. This is a piece of a FASTQ file that should be downloaded by this workflow.
We've chosen to run the Galaxy workflow tests directly within GitHub workflow runners. This has a couple of advantages over other possible approaches. We don't need to maintain a separate Jenkins instance. Everyone can fork the repository and run tests on their fork; they don't need additional permissions. Since Planemo also starts the Galaxy instance, we don't have to rely on an external Galaxy server, which may not have all the tools installed that the workflow requires and for which we can't simply show the logs, as that would be a security problem. All jobs run within Docker, so the tests can also be run locally, and we can reasonably expect to produce identical results in local testing. And finally, we can easily test against multiple versions of Galaxy. For instance, when a new Galaxy release is being prepared, we can ensure that the workflows continue to execute properly.
One advantage of testing against external Galaxy servers is that they come with reference data. But we set up the workflow tests to use the same reference data that the usegalaxy.* servers are using, via CVMFS.
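The shape of such a workflow test, inputs plus a snippet of the expected output, can be sketched as follows. This is a minimal illustration in plain Python, not Planemo's test file format or its comparison code; the input and output labels, the accession and the read contents are made up for the example.

```python
# A hypothetical test case: one input (a list of run accessions) and, for one
# workflow output, a snippet of text that must appear in the result.
test_case = {
    "inputs": {"Run accessions": "SRR0000000\n"},
    "expected": {"Single-end reads": "@SRR0000000.1"},
}


def check_outputs(test_case, outputs):
    """Return, per expected output label, whether the snippet was found.

    `outputs` maps output labels to the text the workflow run produced.
    A missing output counts as a failure.
    """
    return {
        label: snippet in outputs.get(label, "")
        for label, snippet in test_case["expected"].items()
    }


# Stand-in for what a finished workflow run might have produced.
produced = {"Single-end reads": "@SRR0000000.1 1 length=4\nACGT\n+\nIIII\n"}

print(check_outputs(test_case, produced))
print(check_outputs(test_case, {}))
```

Checking for a snippet rather than a byte-for-byte match keeps the expected data small, which matters when the real output is a large FASTQ file.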
And since the procedure of running workflow tests is not too different from running tool tests, we have unified these within a script and created a GitHub Action using the script, called planemo-ci-action. So most of the logic is shared between tool tests and workflow tests. I'm showing the central part of the GitHub CI workflow here, which uses planemo-ci-action. This means that when we need to change something within Planemo, we don't need to update each individual workflow; we can just update planemo-ci-action.
So now we need a place where Galaxy users and Galaxy itself can find workflows. And while the Tool Shed can host workflows, it is not ideal. Installation was complicated, and few users ever really used this. Luckily, we have better alternatives now. Dockstore and WorkflowHub are tool registries that implement the Global Alliance for Genomics and Health (GA4GH) Tool Registry Service standard, abbreviated TRS. They provide convenient interfaces for browsing workflows and viewing information about workflows. There will be a talk about WorkflowHub later in this session.
The other requirement that we wanted to satisfy, and I mentioned that before, is proper versioning of workflows. You all know that most software has versions. One difficulty with a central repository is that a version refers to the entirety of the code in the repository. For tools, we simply define the version attribute in the tool, as you can see here. But that means we can't use git directly to find the code for a specific version of a tool. It's better for software that lives in a separate repository, like Planemo for instance, as you can see here. And both Dockstore and WorkflowHub can link out to specific git releases of workflows. So it would be great if we could create releases for individual workflows.
So we've added functionality to Planemo that will take one or more related workflows in a directory of the central repository and upload them as a separate GitHub repository. And then we can version that repository. To illustrate this, we have a SARS-CoV-2 variant calling topic in the central IWC repository. Within the topic we have multiple directories, each containing a specialized workflow. Each directory here will be deployed to another GitHub repository under the iwc-workflows organization. You can see this here: on the left you see the repositories in the iwc-workflows organization, where each repository corresponds to a directory in the central repository that we saw earlier. And on the right, I picked out the changelog entry for a single version of a single workflow, which lists all the changes that were applied to that workflow since the previous version. So you can look at this and see what is new, what has changed, and what you need to know.
These new versions automatically appear in Dockstore, and we can find these workflows directly from within the Galaxy interface, including all previously published versions. You can see the process of finding workflows published by the IWC on the right: you search by organization, iwc-workflows, and that will show all the workflows. And then you can import a specific version of a workflow.
So, contributing. Of course we want you to submit workflows. The goal is that contributing a new workflow becomes as simple as possible. You can find the instructions in the README of the IWC repository. Basically, it boils down to starting with a workflow that you've run through the best practices panel in the workflow editor, which is explained here. This fixes a lot of common problems and alerts users to possible improvements. And then you can run planemo workflow_test_init on your workflow file, which creates this test template that assumes you're going to test all the workflow outputs. And often that's not the case.
So from this template, you can delete things you don't need. In our case, we can remove the paired-end output and reads here. And then you can lint the workflow, which checks that the workflow is syntactically correct. Now we need to generate a Dockstore YAML file. You can do this with planemo dockstore_init. And then you just need to add a README and a changelog, and you need to mention which release this is within the workflow itself. We've just recently started doing this, and hopefully things will become more straightforward in the future.
One thing that is a little difficult when we're reviewing workflows is that the .ga format isn't really meant to be read by humans. So here's an example step. Note how the index here is numeric. Connections are made through numeric IDs, and these numbers don't really carry any meaning. So it's difficult to see what the input is here. And the tool state is JSON, so this is JSON within JSON, which is really difficult to read.
John added functionality to visualize Galaxy workflows using Cytoscape.js. So, to address the difficulty of reviewing textual workflow representations, we might be able to visually review changes in the future. Another plan is to automatically convert the .ga format to gxformat2 workflows directly within a pull request. The gxformat2 representation allows for much nicer diffs. The tool state there is not doubly nested JSON, and it uses step labels to define inputs and outputs, so that actually has some meaning we can understand.
We would also love to have a static page for our workflows, with benchmarks and resource requirements. And of course, once this stabilizes and we have a good idea of what works, we hope this can become a template for other groups interested in maintaining Galaxy workflows. And finally, I think the IWC can play an important role in prioritizing the development of new workflow features in Galaxy. So I hope you will join us in creating many cool workflows.
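To make the earlier point about JSON within JSON concrete, here is a small sketch that decodes the nested tool state and replaces numeric connection IDs with step labels. The two-step workflow is made up and heavily trimmed, and the sketch assumes every connection is a single object with `id` and `output_name` keys.

```python
import json

# A made-up, heavily trimmed workflow in the style of the .ga format. Note
# that "tool_state" is itself a JSON document stored as a string.
ga_text = json.dumps({
    "a_galaxy_workflow": "true",
    "steps": {
        "0": {
            "id": 0,
            "type": "data_input",
            "label": "Reads",
            "input_connections": {},
            "tool_state": json.dumps({"optional": False}),
        },
        "1": {
            "id": 1,
            "type": "tool",
            "label": "Trim",
            "input_connections": {"input": {"id": 0, "output_name": "output"}},
            "tool_state": json.dumps({"min_length": "20"}),
        },
    },
})


def describe_steps(ga_text):
    """Decode each step's nested tool state and name its inputs by step label."""
    steps = json.loads(ga_text)["steps"]
    described = {}
    for step in steps.values():
        state = json.loads(step["tool_state"])  # second round of JSON parsing
        sources = {
            name: steps[str(connection["id"])]["label"]
            for name, connection in step["input_connections"].items()
        }
        described[step["label"]] = {"state": state, "inputs": sources}
    return described


print(json.dumps(describe_steps(ga_text), indent=2))
```

A diff of the decoded form shows that Trim reads from Reads, which is the kind of meaning a label-based workflow format exposes directly.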
We hope this will become a community project. I think improving the IWC and adding new workflows will be a great CoFest project. So with that, I'd like to thank John for all the work he's put into workflows, into Planemo and into the tool and workflow framework within Galaxy. Wolfgang has put in our first workflows; he focused on the SARS-CoV-2 analysis workflows, and he's also proven to be a careful reviewer of these workflows. Matthias put a lot of effort into the CI testing, both for the IWC and the IUC. I also want to thank the IUC for setting a great example of how we as a community can work together to create and maintain tool wrappers, and every individual and team that has ever created Galaxy tool wrappers, since without great tools we can't build workflows. Thank you. Let me know if there are any questions.