Okay, so for today's call I'm planning to give an update on Total Perspective Vortex, or TPV for short. I suspect that many of you will have heard of it before, but it might be new to some of you, so I'll go through a quick introduction, or a recap of what it is, and then a practical demonstration of how to set it up.

For those of you who haven't heard of TPV before: it's a Galaxy plugin for dynamically routing jobs to appropriate destinations with appropriate CPU, memory, and other settings. Most Galaxy admins, and users too, have at some point grappled with the issue of how best to allocate resources for a particular job, and TPV helps with that exact issue. So I'll provide a brief overview, we'll see how to set up TPV on a fresh installation of Galaxy and do some basic things with it, and we'll look at the latest developments and plans for TPV.

(We're seeing the non-presentation mode of the slides. If I go into presentation mode, I think the screen might be too big. All right, you'd already switched slides and we were just looking at the wrong window.)

So TPV comes as a pip-installable package that you can install into your Galaxy environment. Once that's done, you essentially just edit the job configuration to point to TPV as a destination; it's just a dynamic job rule, a dynamic Python destination. As part of that configuration, you point to a TPV rules file, which is a YAML file containing the resource requirements for each tool.

So how do you define these basic resource requirements? A typical TPV config might look like this: you simply define the tools with their cores, memory, and GPU requirements and other environment settings, and you also define the destinations that are available to you. Once you switch to TPV, you don't need to define your destinations in the job configuration anymore; you define them in TPV.

Straight away you might notice, in this bowtie2 example, that mem is a computed expression: mem is cores * 4. It's a Python expression that will be dynamically evaluated. In fact, all fields in TPV are Python expressions; even a constant value just happens to be valid Python. That gives you a fair amount of flexibility in what you can express as a TPV value. In this particular example we have two destinations, and TPV will pick whichever destination fits the tool. Here I think both destinations fit just fine, because the tool only needs 4 cores and 24 gigs of RAM.

Okay, so how do you exert greater control over this? You can use something called tags, with which you can express preference for, or aversion to, a particular destination. Tags themselves have no intrinsic meaning; they're just matched up by TPV using fixed rules. For example, here we define a highmem tag and say that it's a required tag, or rather, a preferred one. We also say that the spades tool rejects the offline tag, so any destination carrying the offline tag will be immediately rejected; that's a simple way of taking a node offline, for example. Similarly, you can express a preference for a particular node: in this case we say "prefer highmem", and TPV will look for a destination that matches that requirement.
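As a concrete illustration of the kind of file being described, here is a minimal sketch; the tool IDs, numbers, and tag names are illustrative rather than the exact slide contents:

```yaml
tools:
  .*bowtie2.*:
    cores: 4
    mem: cores * 4            # a Python expression, evaluated at dispatch time
    gpus: 0
    scheduling:
      prefer:
        - highmem             # tags have no intrinsic meaning; TPV just matches them
  .*spades.*:
    scheduling:
      reject:
        - offline             # never schedule to a destination tagged 'offline'

destinations:
  big_node:
    runner: slurm
    max_accepted_cores: 32
    max_accepted_mem: 128
    scheduling:
      prefer:
        - highmem
  small_node:
    runner: slurm
    max_accepted_cores: 8
    max_accepted_mem: 32
```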
So simply by combining these tags, you can express preference or aversion, and that drives scheduling. TPV can also be configured to instantly reload its configuration; so if you had to take a node offline for maintenance, for example, you can do so instantly.

(Sorry, I was going to say: once or twice when I ran into this, I had to restart Galaxy for the change to take effect. Yes, there is a setting you have to configure, which is watch_job_rules; we'll come to that. Otherwise the files aren't monitored by Galaxy, so you have to enable that manually.)

Okay, so let's look at how you might conditionally change the job requirements. Again taking advantage of the fact that TPV expressions are all Python expressions, we can define an if conditional. It's a YAML field, but it evaluates to a value, and if that value is true, the accompanying settings are applied. So you can apply resources conditionally, apply tags conditionally, and you can also produce contextualized errors.

And finally, to reduce the repetitiveness of all this, you can do a basic form of inheritance. In this example, the trinity tool inherits from the spades tool but overrides the cores value. The nice thing is that mem is a computed expression, evaluated as late as possible, so after the override it becomes 16 * 4 instead of the previous value. It's a nice way of doing late binding of values.

Ultimately, I guess our goal is to reduce or eliminate the need for each admin to rediscover the ideal settings for running a tool. The idea is: can we benefit from a shared knowledge pool somewhere? TPV allows you to specify a remote URL with a list of rules to be loaded, so you can immediately have a highly functional Galaxy instance with recommended settings, mostly based on the usegalaxy.* federation's experience. You just point to the file and go. I think that makes the initial bootstrapping of a Galaxy instance much easier, and of course everyone can contribute back to that database.

So, now that you have a basic overview of how this whole thing works, let's do a quick demo of setting it up. For that, let me just share my other screen for a second. Can you see my terminal? (Yeah.) Okay. I have a freshly cloned instance of Galaxy, which I'm just going to start fresh. I've already cloned it, installed it, and started it up once, so it will be faster the second time, but otherwise it's a fairly standard clone. Before we start it, let's see what settings we have to edit. The first setting I'll edit is in the galaxy.yml file. (Is this big enough, or does the text need to be bigger? That looks much better.) So we're going to look for watch_job_rules. This is the setting we need to change so that TPV will automatically reload the rules.

(I can't see what you're showing, I don't know if something happened. Oh, something did happen: the tab got detached, sorry about that. Can you see it now? Not yet. How about now? Thanks, Sean. How about now? Yes, great, thank you.) So watch_job_rules is what we need to specify.
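A minimal sketch of the conditional and inheritance patterns just described; the rule IDs and thresholds are illustrative, and it assumes TPV's rules/if/fail and inherits fields, plus the input_size context variable, behave as in the TPV documentation:

```yaml
tools:
  .*spades.*:
    cores: 4
    mem: cores * 4
    rules:
      - id: spades_large_input        # hypothetical rule id
        if: input_size >= 20          # Python expression; input size in GB
        cores: 16                     # applied only when the condition is true
      - id: spades_input_too_large    # hypothetical rule id
        if: input_size >= 200
        fail: Input of {input_size}GB is too large for this instance  # contextualized error
  .*trinity.*:
    inherits: .*spades.*              # inherit everything from the spades entry...
    cores: 16                         # ...override cores; mem is late-bound, so it becomes 16 * 4
```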
Once you do that, Galaxy will start monitoring those files. I've already set it to 'polling'; for some reason that's the only mode that works on my Mac. In any case, that's all done.

The next thing is to create a new job configuration file, and into that job conf I'm just going to paste this block. You can see that all it does is define a local runner and define a dynamic destination, set as the default destination, called tpv_dispatcher, and all that is is a standard dynamic Python destination. That function in turn is parameterized with the TPV config files it should load. So the next step is to create the TPV config file it points to.

Into the rules file we'll add some basics. (Feel free to interrupt at any time if you have any questions.) In this particular file we start off with some global settings, where we just say that everything inherits from 'default'. Then we define that default tool itself, marked as abstract, meaning it can't be scheduled; it's just the base definition that every tool inherits from. There we define cores as 2 and mem as three times that. Essentially, this is what every tool will use by default once TPV is activated. Then we also define the destinations. As I mentioned earlier, we route to the local runner with these two parameters: local_mem is set to the computed mem value, and local_slots to the cores value. So whatever values the tool ends up with, the appropriate values are inserted here and those parameters go into the runner.

Once that's done, let's start Galaxy. We'll have to wait a few seconds, so meanwhile let's get ready for the next step, which is to upload some data and see whether these rules are applied; we should expect to see the local_slots and local_mem parameters on the job. All right, that's taking a while. Okay, so we make a new history, upload the data, and check whether the correct settings went in. And yes, we can see here, I hope the font is not too small, the local_mem and local_slots being computed and sent in for the tool.

Okay, so now let's expand on this by adding another destination. Let's assume we have a Slurm cluster at our disposal, and let's see whether we can get jobs to go to that Slurm cluster. (Can you see my editor window, not the Galaxy window? The font's a little small, but we can see it. All right, I'll make that bigger as well. Better.)

So here I'm going to define this additional destination called slurm. It's not really Slurm, I'm just going to route it to the local runner anyway, but we're going to pretend it's Slurm. The critical thing is that for Slurm we need to set the native_specification parameter, which is what Slurm uses to actually enforce the cores and mem values. So we just generate the native specification, parameterized with the cores and mem that we calculated through TPV. Note that mem is again a computed expression here. And that should do it.
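For reference, here are the two files from the demo, sketched rather than copied from the screen; paths and values are illustrative, and the key names follow the TPV documentation as best I can reconstruct them. First, the job configuration that routes everything through the TPV dispatcher:

```yaml
# job_conf.yml (sketch): a local runner plus TPV as the default dynamic destination
runners:
  local:
    load: galaxy.jobs.runners.local:LocalJobRunner

execution:
  default: tpv_dispatcher
  environments:
    tpv_dispatcher:
      runner: dynamic
      type: python
      function: map_tool_to_destination
      rules_module: tpv.rules
      tpv_config_files:
        # the shared database is added later in the demo; later files
        # override earlier ones, so the local rules file goes last
        # - https://raw.githubusercontent.com/galaxyproject/tpv-shared-database/main/tools.yml
        - config/tpv_rules_local.yml
```

And the TPV rules file it points to, with the abstract default tool and the two destinations (the 'slurm' one still pointing at the local runner, as in the demo):

```yaml
# config/tpv_rules_local.yml (sketch)
global:
  default_inherits: default

tools:
  default:
    abstract: true             # cannot be scheduled itself; just the base everything inherits
    cores: 2
    mem: cores * 3

destinations:
  local:
    runner: local
    params:
      local_slots: "{cores}"   # params with braces are evaluated as Python f-strings
      local_mem: "{mem}"
  slurm:
    runner: local              # pretend Slurm: a real setup would name a Slurm runner
    params:
      native_specification: "--nodes=1 --ntasks={cores} --mem={round(mem*1024)}"
```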
Before we do that, since we now have two destinations, let's say we want to send all new jobs to the slurm destination by default, and only run specifically tagged tools on the local destination. We can do that by simply giving the local destination an additional tag that tools must require. Since none of the tools have this tag, nothing will execute there by default, and we should expect all jobs to now go to slurm. Let's see whether that happens; this should take effect immediately. Okay, yes, we can see here it says it's reloading the TPV rules, so we should expect this to work now. I'm going to upload a new file, put in some nonsense, and let's go see whether the new settings are in effect. And yes, we can now see the native_specification going in, and again the parameters have been computed.

So now, if we want to actually send something to our local destination, we can do that too. Let's say we want all upload jobs to go to the local destination. We go back, edit the TPV file again, and say that the data fetch tool, which is the upload tool, now requires the local destination. Then we should expect all upload jobs to always go to the local destination, without the native_specification; instead they should have local_slots and local_mem. Again, let's see whether that happens: go back, run a new upload, and yes, it's now showing local_slots and local_mem for this job.

Okay, so now that we've got the basics working, let's see how we can hook into the shared database. To do that, we have to edit the job conf file: we add a new config file entry pointing to the shared database. Once you do this, TPV just loads all these files in order, and the ones defined later override the ones that come first. So essentially what we are saying here is that the local rules file overrides the shared database. In this case, though, we do have to restart Galaxy, so I'm just going to stop and start it again.

While it's starting, let's take a look at the shared database file that we just pointed to. If you go to the very top, you see it looks much the same as ours: there's a default tool defined, and then a whole bunch of tools with their memory and cores requirements. Here we see one that only defines mem, meaning cores will be the default, whatever that is. Some tools define scheduling tags, some define additional environment options, also containing computed expressions, and so on. And some even have dynamic rules, like, for example... let me find one... anyway, something like this.

Quick question: when you refer to the mem value, how does it know to pick the value from the default block and not from some subsequent tool?

So, when the tool is dispatched, it's matched against these available tool entries. This is a regex match, and each of the regexes that match is applied. For example, if you dispatch this particular tool, we end up matching this entry, and its cores and mem values are applied. Does that answer your question, or not quite?

Well, on the other screen, where you compute the value, in the other config file I believe it was... yes, this one: under destinations, under local, local_mem, you refer to mem in squiggly brackets.
And that is picked from the tools default, where it says mem: cores * 3. So how does it know to grab that value and not, say, the mem value of another tool under tools that specifies its own?

Oh, because at the point of routing we know which tool it is. When you get to the destination, the first thing TPV has done is match the tool. For example, when we executed the data fetch tool, it matched the data fetch entry; data fetch inherits from default, so all of the settings from default are applied and then the overrides on top. So we know it's the data fetch tool, and then we start matching it against the destinations. The available destinations are local and slurm, and we specifically required the local destination, so that's where it goes, and the cores and mem we've already computed are simply inserted here. (Yes, thank you.)

Okay. So we now have the shared rules file loaded. Let's run a tool that's defined in it and see whether those settings get set. I'm going to run, I think, Intersect intervals; yes, the bedtools intersectbed tool. That's defined here, and it has a mem setting of 40. So we should expect cores to be the default, I think two, and mem should be 40, because the shared database overrides the mem value and sets it to 40. Let's see what happens: I run Intersect intervals, and sure enough, 40 gigs of RAM have now been allocated to it. So essentially we are now using the remote database.

And if we want to, we can override that locally and just tweak a value; let's try that next. I'm just going to edit this file and say that intersectbed should instead use only 10 gigs of memory, and also pass in this custom parameter that I want. We rerun the tool, see whether that gets overridden now, and yes, sure enough, it's now 10 gigs. So by bootstrapping your instance with the shared database and then adding your overrides on top, you have a lot of control over exactly how you want things to run. You can even override rules: each rule has an ID, so you can target a specific rule and override it if you don't want it to apply. But that kind of concludes the demo; I just wanted to give a quick flavor of how you might set it up and use it in practice.

So, going back to our slides, what's currently happening? Just to talk about current developments: there's a lot of work going into the shared database, with people adding more tools, their resource requirements, and scaling rules, and a lot of that is coming from the usegalaxy.* federation. At the moment most of the major sites have switched to TPV, including, I believe, Galaxy Main (I'm not 100% sure on that), but certainly usegalaxy.eu and usegalaxy.org.au. So this is the place where community contributions would be really valuable, because all this knowledge is distributed amongst a large number of people. If you have a notion of the resource requirements for a specific tool, and of how to scale those requirements with input size and so on, please consider initiating a pull request and augmenting the database.
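The local override from the demo would look something like this; the tool ID regex is how entries in the shared database are keyed (illustrative here), the extra parameter is purely hypothetical, and it assumes this file is listed after the shared database in tpv_config_files so that its values win:

```yaml
tools:
  toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_intersectbed/.*:
    mem: 10                       # override the shared database's 40 GB down to 10
    params:
      submit_extra: --qos=short   # hypothetical custom submission parameter
```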
We also have a GTN tutorial now, and the admin training content has been updated to use TPV by default. And finally, a lot of tooling has come up to assist admins. For example, there's now a TPV dry-run command that allows you to check where a tool might be scheduled and what resources would be allocated, without actually running the tool; you can just do it on the command line. There are also Ansible roles for setting up TPV if you need them, and a lot of convenience methods for comparing tool versions, parameters, and so on.

If we now take a brief look at the future, one of the things planned on the horizon that might be really interesting and useful is to integrate some of the outputs from the NIH-funded cost modeling project from AnVIL, where the aim is to estimate resource utilization for tools and workflows. These estimates could be really valuable for further fine-tuning these resource requirements. As part of that project, we're also hoping to see machine learning models trained on historical usegalaxy.* federation data that can more accurately predict what the resource requirements might be; we could just plug such a model in as a function in TPV. Most of these rules would then probably simplify down to a couple of lines, and we'd just add our overrides on top where we want to tweak things.

Another thing we're hoping to do, maybe during the GCC CoFest, is to add support for clearly explaining why TPV chose a particular destination, and showing which rules affected that decision. As these configurations grow, there are more rules, more tags, and more interactions, and you can make things very complicated if you want to. It becomes necessary to see: okay, where did this rule come from? Is it from the remote database? Is it local? What happened? Obviously there's no substitute for keeping it simple, but I think we need to provide this kind of assistance as the number of rules grows.

So that kind of brings things to a close. These are some links to more information, and these are some of the people who made all of this happen and are keeping the ball rolling with TPV. I think we can switch to questions and comments now.

Yeah, thanks. Thanks so much for your presentation, Nuwan. Does anybody have questions or comments?

I have a quick question; it's more of a usage question, something I've been meaning to look up how to do. Is it possible to route jobs to destinations, or use particular configurations, based on a history tag in Galaxy?

Yes, I think that may be possible. I'm not 100% sure, but I'm assuming that information is available to the job at the time of dispatch, and if it is, then it should definitely be possible, because you can just interrogate the job object and write the rule; we can write arbitrary Python code here. So we can look at the job object, look at the parameters, look at any other relevant information, and make a decision: add this tag, or set these cores and mem values, and so on. So I think that should be possible, yes. To put it differently, anything a dynamic destination can do, a TPV rule can do, because all of the context that's provided to a dynamic destination is available to a TPV rule.
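As a sketch of what such a rule might look like, assuming the Galaxy job object is in scope as `job` in TPV expressions and that history tags are reachable through the model as shown (both worth verifying before relying on this):

```yaml
tools:
  default:
    rules:
      - id: history_tag_highmem          # hypothetical rule id
        # assumption: history tags carry their name in `user_tname`;
        # check the Galaxy model before using this in production
        if: |
          job.history is not None and any(
              tag.user_tname == 'highmem' for tag in job.history.tags)
        scheduling:
          require:
            - highmem
```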
You have the app object, you have the job object; you can find everything you need, query the database, anything you want to do. (Okay, cool. I'll take a deeper look into that.)

Maybe this is not a question for you, though: that reload thing was super cool, until the link to the remote database came into play. If we have another external file like that, we've got to restart the handlers. Wouldn't it be nice to have an admin panel option saying "reload these configs" by hand, without having to reboot the process, or some way to poll for a change in the remote file and do that periodically? Just some way to marry the two.

I don't know if you'd want to watch the remote file, but we have things like the "reload data tables" tasks that run on Celery; you could write a little task to do it, I guess. So from the admin panel you'd just say reload, like you click "reload tool data tables" or whatever; it would work the same way. There's also the watching of the data tables, but since it's a remote file, I don't think you'd want that; it's not like there's a file system event for it.

Just to get a sense: is anybody here using TPV at the moment, or has used it in the past? So this is all new to you? That's, I guess, hopefully good. (Well, I used TPV when we were doing our last benchmarking run. All right.)

I have a quick, easy question: I guess we can also create rules for groups and roles, and decide destinations for groups of users and so on? Yes, yes. The same things apply: you can use tags on them, and you can match up rules with users and roles, just as with tools.

Okay. And has there been any thought to... well, the rules that are in the shared database: are they sort of the bare minimum needed to run the tool? What's the intent there: do those rules reflect the minimum needed, and then if you have special considerations you customize locally?

They're more or less recommended settings from past experience on the usegalaxy.* federation. Some of them are massive; you see, let me just look here: 80, 100 gigs, I mean high-memory stuff. Exactly, so you might not have those resources locally. Assuming you can still run the tool, there are ways in TPV to clamp things down, so that you load the shared database but say: I have only this many cores locally, so clamp it down to this. There are settings called max_accepted_cores and max_accepted_mem, and then you can force values at the destination down to a particular maximum, and control it that way.

So if you had something specified as 80 in the defaults and you clamped it to 24 gigs or whatever, it would just try to use 24? That's right, yeah. Otherwise, I guess, the shared database becomes next to unusable for all but the biggest sites... Yeah, that's what I was trying to figure out.

And I guess the other thing about the shared database is that it's kind of general, because we probably can't have very specific rules in there. The idea is that you define the essential things that you're likely to have in every environment, so there is a bit of a lowest-common-denominator approach there.
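A sketch of that clamping pattern: max_accepted_cores and max_accepted_mem are the settings named above, while the min() overrides are an assumption about how the clamping itself would be written:

```yaml
destinations:
  local:
    runner: local
    # reject outright anything this node could never accommodate
    max_accepted_cores: 8
    max_accepted_mem: 64
    # assumption: destination-level expressions can clamp the tool's computed
    # values, so a shared-database request of 80 GB would run with 24 GB here
    cores: min(cores, 8)
    mem: min(mem, 24)
```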
But because you can clamp things down, I don't think anybody's too worried about that. Yeah, with the clamping, that sounds like a good approach.

I think the difficult part is inferring the rules in the first place: if the input size is this much, maybe you allocate this much memory, and that's a matter of experience. I guess that's part of the problem; eventually switching to more of a machine learning approach might be more appropriate and do better.

Yeah, it kind of relates to that: as I looked through these values, I was wondering where they came from, these defaults. I guess it's some combination of community experience and developer experience of what a typical run is, but some of these are all over the place, where "typical" could mean very different things.

Yeah. No, I mean, they're entirely seeded by the values used on usegalaxy.eu and usegalaxy.org.au. There is a fair bit of variation, and some of them do seem to be "oh, this didn't work, so I just pumped it up". There also seem to be values that somebody may simply have thought up without really testing; that's possible. So I think that fine-tuning has to happen over time. One thing we could do is check actual usage against recommended usage, maybe on the usegalaxy.* federation, and fine-tune the values from that; that's probably something that needs to happen at some point. Yeah, that's a good idea.

Any other questions from anyone? Well, we're in Smörgåsbord week; it might be a good opportunity to try out the tutorial and talk to Nuwan about it. Awesome. Well, yeah, thank you so much, Nuwan. Thanks, everybody, for joining today's community call. The next one will be on June 15th, so we look forward to seeing you all there. Thanks, everybody. See you next time.