Okay, so we're going to talk about the Task Execution Service schema. Just by way of introduction, the Global Alliance for Genomics and Health has a Cloud Work Stream that's working on several different APIs. These include the previously mentioned Data Repository Service, there's a Tool Registry Service, there's the Workflow Execution Service, and then finally there's the Task Execution Service, which is the one I'll be talking about today. So TES, as we call it, is basically a way to issue a single job request: an asynchronous, long-running batch job. This is akin to the operations you would see in an HPC environment when submitting a job to a cluster. The key component is that the message packet includes a mapping of all of the inputs and outputs, from where they exist in some external object store or data system to where they should be mapped inside the container, plus a list of all of the command lines that should be executed on them. In terms of specification, it's very simple, documented and described in OpenAPI 3, so it's a very user-friendly way to describe how to get work done. The nice thing is that, because it's a common standard, it's being integrated into a large network of different products. On the workflow side, I think there was a version utilized by Seven Bridges, one used by the Broad's Cromwell engine, and there's a CWL engine that takes advantage of this API; more recently, there's also been some beta testing and investigation for Snakemake and Nextflow. Then, from the production or server side, you have a number of different implementations that make this service available. There's an implementation called TESK, written by ELIXIR over in Europe, that deploys onto Kubernetes clusters. There's an endpoint that's being packaged into Microsoft's implementation of Cromwell on Azure. And there's also a system called Funnel that I've worked on, which handles a bunch of HPC environments and some other systems like AWS Batch. So there are a lot of different clients and servers speaking this protocol, and obviously we want to bring Galaxy into this ecosystem. One of the exciting things is that not only does this enable a workflow engine to communicate with a number of different underlying systems at the same time, it also provides the capability of putting a gateway in front and federating out the requests. Because the request is introspectable, you can see what the inputs and outputs are; if one is, say, a DRS ID, you could decide that another data system is actually a better location and move the job over there. So it becomes an entry point into federation, moving jobs around, and we think that's an interesting and exciting way to connect with what's going on in Galaxy. To get this actually kicked off, last year, as part of a Google Summer of Code project, a student implemented a TES runner for Galaxy. It's currently under a pull request to the main Galaxy line, so we're hoping to get the details ironed out and integrated into Galaxy. The fun thing is that we uncovered a couple of nuances in the way Galaxy works, and those are going to be addressed in an update to the specification: we're currently on version 1.0, and we're going to do a 1.1 in the near future that will handle some of those issues.
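To make the shape of that message packet concrete, here is a minimal sketch of a task submission in Python, using only the standard library. The server URL, bucket paths, and container image are placeholders; the field names and the endpoint path follow the public TES 1.0 OpenAPI document.

```python
# Minimal sketch of a TES 1.0 task submission; server URL, bucket paths,
# and container image are placeholders.
import json
from urllib import request

TES_SERVER = "https://tes.example.org"  # placeholder TES endpoint

task = {
    "name": "samtools-view-example",
    # Map objects from external storage to paths inside the container...
    "inputs": [
        {"url": "s3://my-bucket/sample.bam", "path": "/data/sample.bam", "type": "FILE"},
    ],
    "outputs": [
        {"url": "s3://my-bucket/sample.sam", "path": "/data/sample.sam", "type": "FILE"},
    ],
    # ...and list the command line(s) to execute against those paths.
    "executors": [
        {
            "image": "quay.io/biocontainers/samtools:1.15--h1170115_0",
            "command": ["samtools", "view", "-h", "-o", "/data/sample.sam", "/data/sample.bam"],
        },
    ],
    "resources": {"cpu_cores": 1, "ram_gb": 4},
}

req = request.Request(
    f"{TES_SERVER}/ga4gh/tes/v1/tasks",
    data=json.dumps(task).encode(),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    # The server replies with the ID of the new asynchronous task.
    print(json.load(resp))
```

Any server speaking the protocol, whether TESK, Funnel, or the Cromwell on Azure endpoint, should accept a task of this shape; the client then polls GET /ga4gh/tes/v1/tasks/{id} to track the state of the long-running job.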
Just by way of acknowledging the people working on this: we have the student who actually implemented the TES runner; Alex, who is one of the co-chairs and actually managed that project; another co-chair of the standards group I work with, who helps a lot with building the standard up; and then the work stream leads who are leading this whole effort, not just TES but also the other APIs like DRS and WES and TRS. Thank you.

There was a question: are you working to have Galaxy also support DRS and WES? Well, DRS would be exactly what John was talking about. In terms of WES, I don't know if anybody's working on that at the moment. Then a question about security: that's been rolled in slowly. Right now it has been per-server; you roll your own endpoints, and so it's assumed that you run your own kind of security. That was the 1.0 specification. GA4GH as a consortium is coming out with larger security specifications, Passports and AAI, and we're going to incorporate those components into the APIs over the next couple of versions. At that point, you'll be able to authenticate yourself and show you're allowed into the endpoint, but we also have to have a way to pass credentials for storage systems. So if you point to an S3 bucket, we'll hand those S3 credentials to the executor that's actually doing the work in the end. That pass-through security is being worked on and inspected right now, to be added to the specification. Thank you.

Our next talk is on the Galaxy storage dashboard, by Dannon Baker. It's really great to see everyone. Okay. So, right at the top: this work was primarily done by David out of the Freiburg group, but he said he would give me a beer if I presented it, since he couldn't be here. So here we are. It's about the Galaxy storage dashboard, which is a new way to manage and visualize all of your Galaxy storage. The motivation: if you've ever administered a Galaxy server, you've probably gotten hundreds or thousands of questions like "I can't possibly be using 200 gigabytes, that's not right", "I need to recalculate my storage, please help me", "I need more quota". It happens all the time; I think Jen said it's probably the most frequent thing she has to deal with as a support person. So what is the Galaxy storage dashboard? Galaxy makes it really easy to get data into Galaxy, to run a whole bunch of jobs, and to generate lots of data. But then how do you get rid of it? The storage dashboard is a centralized place to visualize and manage all of your storage in Galaxy; we want a one-stop shop for seeing all of your stuff and performing actions on it. Right now, in 22.05, you can click either the little quota meter in the top right, or the line within a history that says how much space it's using, and you end up at the storage dashboard. Right now it's a minimal view that says: this is your quota, and this is how much storage you're using. Below that, it gives you a couple of little wizardy things that we'll talk about in a second. Note the refresh option in the middle: sometimes Galaxy's storage calculation gets out of sync.
If you've administered a Galaxy server, you've probably had to deal with "please recalculate my quota": an admin has to go in and click a couple of buttons to recalculate the user's quota. Now users can just do it themselves, so you shouldn't have to deal with as many of those requests. And we plan to add more of this sort of guided assistance in the future. We also want to be able to visualize the actual space you're using in a tree map, so you could see: oh, which histories are using all my space? I forgot about this stuff. That's going to be really nice. Right now, when you click the button at the bottom that says "cleanup wizard", there are these little actions you can perform; at the moment there are only two. You can discover and free deleted datasets across your histories. Something that really commonly happens is that users delete data and move on to another history, but the data never got purged, so it still counts against their quota. Now you can click "review and clear" at the bottom, and you get a box on the right that says: hey, this is all the stuff you've deleted; you can purge it if you're really done with it. Users can select everything, or individual items, and click one button; it does pop up and ask "are you really sure you want to delete all this stuff?" as a safety measure, and then it purges it for you. So hopefully that's much easier for users. As for future ideas, you can imagine all kinds of things you'd want: find large histories, old histories, intermediate files in workflow invocations; the sky's the limit. Anything you can query the Galaxy API for, you could build a little cleanup plugin around. And this is how easy it is to add one: when this first rolled out, there was only the deleted-datasets plugin, and Jen requested that we add deleted histories as well. There's an example PR for that, about 80 lines of JavaScript, and we had a new feature. So it's super easy to add these extra plugins; that PR is basically the template. As for additional future features, you can imagine exposing a lot more information on the tree map: the date a history was last touched, which other histories are sharing the same data, that kind of thing. We can add a lot of new features here. And that's it.

Can you repeat the question? You're asking if you could identify space that's being used and push it to something like cold storage, right? Yeah, or archive your history. Great question. I imagine that's exactly where John's work on user-based object storage will come in. You could imagine having this all in the same interface, the storage dashboard, where you see your storage and you go: okay, let's deprioritize this to infrequent access, or whatever. Another question: did I miss it, or is there an option to search for duplicate files? It's not built in right now, and I'm trying to think how you would actually do that in the UI. Yeah, that's a great idea. I don't know. Thank you.
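As an aside on the cleanup wizard described above: its core purge action corresponds to ordinary Galaxy API calls, so a rough equivalent can be sketched with BioBlend. The server URL, API key, and history ID below are placeholders, and the recalculate endpoint path is an assumption on my part rather than something confirmed for every Galaxy release.

```python
# Rough BioBlend equivalent of the cleanup wizard's purge action.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.example.org", key="YOUR_API_KEY")
history_id = "abc123"  # placeholder

# Find datasets that were deleted but never purged; these still count
# against the user's quota.
for ds in gi.histories.show_history(history_id, contents=True, deleted=True):
    if ds.get("deleted") and not ds.get("purged"):
        # purge=True actually frees the space; a plain delete does not.
        gi.histories.delete_dataset(history_id, ds["id"], purge=True)

# The self-service quota refresh corresponds to a call along these lines
# (endpoint path assumed; check the API docs for your Galaxy version):
gi.make_put_request(f"{gi.url}/users/current/recalculate_disk_usage")
```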
Our next talk is on the new history. It's great and fantastic to see everybody; it definitely feels new after two years. I will talk about the Galaxy history: our recent developments and future ideas. Before I start, I want to give an outline of the talk. First, I'll introduce what the history actually is, and what its role is within Galaxy. Then I'll talk about the new features and changes that came with the current release and what they enable users to do, and then we'll see a demonstration.

First of all, the role of the history. If we look at the Galaxy UI, we have three main panels, and usually the process starts on the left-hand side, where all the tools are listed and searchable. You select a tool, and that populates the center panel, in which you can parameterize the tool. Once you have decided on the parameters, and maybe selected additional data, you execute the tool, and that populates your history on the right-hand side. The main duty of the history is to display datasets, your data samples. Of course, once datasets are displayed, there are also quick links visible that allow you to delete datasets, edit metadata, and so on; it helps you manage your datasets too, but the main challenge is to quickly and efficiently display your samples. Usually the process is continuous: you select samples, tools, and parameters, run them, look at the results, maybe visualizations, dig deeper, derive a conclusion, and that leads you into the next iteration of your research cycle, producing new samples and continuing over and over. So the history's second most important role is to keep track of all the data you produce.

Given that role, what we have achieved in this release is significantly larger histories. With the ability to handle these larger histories come certain requirements, and these are the additional features: rapid filtering, because with much more data we need much faster filtering in order to select datasets; and bulk operations, which I will also demo briefly, and which can now run on the entire history. So: larger histories, better filtering, and operations on all the datasets. What made the larger history possible is a complete rewrite, a new iteration of the history code. We moved to a modern reactive framework, the Vue framework. Vue is a client-side JavaScript framework that lets you bind data in, reacts to changes in that data, and displays them immediately; it gives you a nice, clean structure to connect data. For our purposes the data comes from the API, from the actual history contents, so we developed some store data providers, shown here in yellow. Now, if someone has a Vue component and wants to show the data in the history, they can simply import a provider, just like any other Vue component, into their own component. The left-hand side here shows a Vue component; it could be a single component or a whole set of components. The main feature is that you import one of these, the generic store provider, and you specify, for example, the history items store, along with a getter and an action for that store. That will populate your component with the data you requested, and it will also be reactive: when the data changes, your component immediately changes too.
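The providers themselves are Vue components, but the data they serve comes over the regular Galaxy API. As a rough illustration of the kind of request such a provider wraps, here is a sketch in Python; the server, API key, and history ID are placeholders, and the exact pagination parameters are my assumption.

```python
# Rough sketch of the history-contents request a store provider wraps;
# server, key, and history ID are placeholders.
import requests

GALAXY = "https://usegalaxy.example.org"
headers = {"x-api-key": "YOUR_API_KEY"}

resp = requests.get(
    f"{GALAXY}/api/histories/abc123/contents",
    # Windowed fetching is what lets the panel scroll through thousands
    # of datasets without loading the whole history at once.
    params={"limit": 100, "offset": 0},
    headers=headers,
)
resp.raise_for_status()
for item in resp.json():
    print(item["hid"], item["name"])
```

Filter keywords, like the ones shown later in the demo, travel as extra query parameters on this same endpoint, which is part of why other components can stay independent of how the history panel itself renders.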
These generic store providers also allow you to specify props to configure and parameterize your stores, or the store getters you use. So there's a lot of flexibility behind one simple interface, which reduces the complexity of the stores. Additionally, the stores are always up to date. We achieve this with a watcher, which detects changes to the current history and updates all the stores, so the entire UI, as long as it's plugged in through this, always has the most current data without anyone having to press refresh. It can also be used in other embedded components, not just the history panel, because we also embed these datasets in other contexts, for example in invocations and in reports, and those will be up to date as well.

With this, I would like to show a brief demo of the current history. At the beginning we see a fresh reload. It's running locally, of course, but it's very fast. You can now scroll through thousands of datasets fairly quickly and display them, going from top to bottom; several thousand datasets are being scrolled through here. You still have the quick filter options, just like before; we didn't change much on the UI, to keep the transition simple and intuitive and the learning curve low. You can see that when you delete datasets, they disappear immediately; you don't have to press refresh for the UI to update. Additionally, we introduced a filtering panel to assist in identifying the keywords you want to search for. Technically, you could enter all the keywords directly in the search field, as done here: you can see we selected history datasets with an ID below 1000 and with the extension PDB. But you don't have to remember all the keywords, because you can just click on the double arrows to populate the form, press search, and the keywords are displayed to you. This also brings benefits for other features that use the history. For example, the collection builder is now much faster, and you can see the history populating in the background, because they are independent: the collection builder just communicates with the API and does what it needs to do, and it doesn't have to worry about what the history does, because the history will always be up to date thanks to the watcher and the connectivity. You also see some basic features: changing the name, adding tags that remain consistent, and the storage dashboard Dannon talked about appearing here in the center panel.

I want to highlight a feature which I mentioned but which was not shown in the video: the bulk operations. At first sight this looks very similar to how it worked before; we just improved the UI a little bit, but there are major changes underneath. As before, you click the checkbox in the upper left and you get the selection option. You can select datasets, and if you click the highlighted dropdown button, it indicates how many datasets there are and how many you have selected. As I said, you can now, for the first time, select all datasets in the history, not just the visible set as in the versions before; that's a major change. And additionally, you also have new bulk operations.
For example, you can change the database build, or add or remove tags, for all datasets at once, or delete them; all these operations can be applied to the complete history. Here we see a brief demo of quickly selecting all items in your history and changing the database build: you see a question mark first, then the operation proceeds through the API, and there you have your newly assigned database builds. Another very important feature, which we introduced in the least intrusive way, I would say: for the first time, given this new architecture, it's no problem at all to start connecting the datasets in the history with each other, to bring them into context. We demonstrate this here by highlighting the inputs for a given dataset. Here is dataset number 11, and you can see, via an additional option, that the three other datasets highlighted with a blue arrow are connected, in the sense that they were the inputs for the tool that produced it. So we have the option to add more features in a meaningful way, depending on the data that is available: one thing, of course, will be the outputs as well as the inputs, and maybe additional collapsing to improve the highlighting, but overall there is no limit to enriching the history with this interconnected data.

I'd also like to show some other developments that were in the periphery of this work; you saw how nicely John demonstrated the usage of this already. We upgraded our tours, which have been entirely reworked and can run on the new history. They are very robust, and they help, of course, to demonstrate new features, as we've seen; they also serve as tests, because these tours are implemented in such a way that our testing framework automatically runs them. So there are multiple levels of benefits here.

So what is next? Pretty conservative plans, since we had huge changes this time. My favorite three items: increase the test coverage for the next release; remove some of the legacy code (there is still some legacy code, because during this transition you are still able to switch between the two histories, just as a safety net, though I don't think anyone has used it); check that there aren't any missing features; and refine the processes. We have, of course, other ideas coming up, and I encourage everyone to join the conversation; it's on GitHub, and there's also an additional document. This is an example of how these ideas are communicated. For instance, there's this suggestion: can we have scrollers which help us navigate the history even further, say by jumping to a certain range of dates, or by highlighting certain tools? The technology enables us to do these things; we just have to decide what we want to do and what's most useful for the users, and that's what the discussion is for. With this, I want to really thank the entire community; this has been a continuous community effort over several years, across very different projects, up to the entire modernization of Galaxy itself, which made this possible. A special thanks definitely to the UI group, and to the backend, testing, and systems groups, which made this transition possible so efficiently; they worked excellently together and brought everything together. And I want to highlight some significant contributors here too.
David, who did not only the storage dashboard, as we saw, but also the bulk operations, and who was extremely helpful with the API support. Marius, who helped a lot with the data strategy; the data strategy design and the Vuex stores are pretty much from his hand. And of course also Ahmed and Assunta, for new features and refinements. With this, I want to thank everybody for their contributions; this is really the sum of everybody's work. Thank you so much.

There was a question: is there any plan for a way of, for example, connecting individual history items in some form of hierarchy, something intermediate between the history and the dataset? Or rather, would it be possible to group multiple histories together under something, because some histories might be related; they might be different runs with some additional data, and so on. So the question is: can we group histories with each other, several histories and subsets of histories? I think this is a great idea. I think it fits into the context of the multi-view history, which we are also starting to transition to and which will be in the next release; in that context, we should see if it makes sense. Currently you can tag histories, right? So that's one option for doing it; it might make sense to build further on that, maybe to be able to search and highlight. You can actually search tagged histories, so I guess that part would already be covered. Thank you again.

Our last talk for this session is by Keith Suderman, on automated benchmarking. Hello everyone. My name is Keith Suderman. I'm part of the Galaxy team at Johns Hopkins, and I'm going to introduce a tool that we developed to automate benchmarking in Galaxy. This was motivated by our work on the AnVIL cost modeling project. The goal is to collect some real-world data, running real-world workflows, to get an idea of the costs involved, and we quickly realized that running these benchmarks manually was not going to be an option. We want to make sure we're running the same workflows on the same data with the same configuration, so that we're comparing apples to apples. Amazon and Google have hundreds of different instance types, compute-optimized or memory-optimized, and when we start varying the number of CPUs or the amount of memory, the sheer number of possibilities makes the search space intractable. So we developed ABM, which is simply a Python command-line tool that wraps common Galaxy API tools such as BioBlend and Planemo. Also, since AnVIL runs on Kubernetes, it can interact with kubectl and Helm, so we can reconfigure the Galaxy instance, run a benchmark, reconfigure the Galaxy instance again, and run another benchmark. Then we can pull the runtime information out of Galaxy to see how things actually went. Now, when I talk about a benchmark here, I mean something relatively specific. Our benchmarking experiments consist of three components: the actual artifacts that we're using, that is, the workflows and the datasets; a "benchmark", air quotes, which is a given workflow run on a given dataset; and an experiment, which is running a benchmark on a given cluster with specific parameters.
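To make those three levels concrete, here is a hypothetical experiment description, written as a Python dict that mirrors the kind of YAML file described below; all field names are illustrative rather than ABM's actual schema. The BioBlend call at the end shows the kind of name-to-ID lookup, discussed next, that makes the same description portable across instances.

```python
# Hypothetical sketch of the three levels: artifacts, benchmark, experiment.
# Field names are illustrative, not ABM's actual schema.
experiment = {
    "name": "rnaseq-scaling",
    "runs": 3,  # repeat each benchmark to average out noise
    "benchmarks": [
        {
            # Artifacts are referenced by name, not by instance-specific ID.
            "workflow": "hisat2-paired-end",
            "datasets": ["SRR000001_1.fastq", "SRR000001_2.fastq"],
            "output_history": "benchmark-hisat2",
        },
    ],
    # Running the same benchmark on differently configured clusters is
    # what makes an experiment.
    "instances": ["galaxy-4cpu-16gb", "galaxy-8cpu-32gb"],
}

# On each target instance, names get resolved to that instance's own IDs,
# e.g. with BioBlend (URL and key are placeholders):
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy-8cpu-32gb.example.org", key="YOUR_API_KEY")
workflow_id = gi.workflows.get_workflows(name="hisat2-paired-end")[0]["id"]
```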
Running a benchmark on a single Galaxy instance is relatively easy and straightforward; that's not difficult to do. But when we start running it on multiple Galaxy instances, we run into a problem with the .ga files and whatnot: Galaxy uses custom ID values to identify both workflows and datasets, and those ID values are generated with a secret key that should be unique on every Galaxy instance, which means the ID values differ across Galaxy instances. That adds a little bit of a twist to running our Galaxy workflows. Fortunately, BioBlend allows us to look up an ID value given a name, and also to look up a name given an ID value, and that's what ABM does to find workflows and datasets. We have a method in there that, given a set of ID values, translates them into names, and we can then also validate the workflow on another Galaxy instance, to make sure that, 20 hours into the run, it's not going to fail because it can't find something. So how would a user use ABM? I'm a big fan of convention over configuration, so ABM itself requires very little setup. You write a profile file that defines the URLs of the Galaxy instances you want to use and your API keys for those Galaxy instances. There's also a way to specify the locations of datasets, so they can be loaded from S3 buckets, from Zenodo, or from any place with a URL we can use. Once we have our configuration in place, we can upload our workflows and our datasets, and then we write a simple YAML configuration file that defines how many times we want to run our benchmark, the benchmark configuration to use, and the Galaxy instances where we want to run things. Here's an example of a simple benchmark: the output history name, the number of runs, the benchmark to run, the input datasets it requires, and the name of the workflow. Once Galaxy has been running, we can inspect the jobs to make sure they haven't errored out, we can get the information from each job, and we can dump all the runtime statistics, CPU usage, memory usage and whatnot, out to a CSV file, so we can load it into a spreadsheet or a database for further analysis. Here's a quick analysis, just some runtimes in seconds for various tools: HISAT2 and Bowtie2 on the left, and BWA and BWA-MEM on the right. We can see that as we add CPUs, our runtime goes down relatively linearly; 8 CPUs seems to be about the sweet spot. Memory doesn't have the same effect, though: there is a point of diminishing returns, past which adding memory doesn't do anything except increase the cost. There's a little graph there that shows our runtimes and our costs. Long story short: compute-optimized instances give you the best bang for the buck. I'll be doing a demo tomorrow if you want to come and see it in action, and I think we've got a few moments for questions. What would we like to do once we have all this data? As I said, this was part of the AnVIL cost modeling project. When people are configuring instances, they make up numbers for how many CPUs or how much memory they need. The idea is to provide a dashboard where users can say: I want to run this tool on this many terabytes of data; what kind of instance is best for me, and approximately how much is it going to cost? That's where we're going, or where we'd like to go. Thank you.