So without further ado, I want to hand things over to Simon; he's going to tell us today about automatic updates for Galaxy tools and workflows.

Exactly, yeah. So my plan for this talk is to introduce this project that we've been working on for, I guess, a bit more than a year, for automatically updating software requirements in Galaxy tools and workflows, mostly with the aim of informing people who are working on developing new tools and workflows that this possibility exists. We've developed a command-line tool integrated into Planemo, as well as a GitHub bot, which performs these automatic updates. In order to explain exactly what this autoupdate functionality does, I want to start at the absolute beginning and go through what Galaxy tools and workflows actually are. I guess that most people should have some idea about this, but: most bioinformaticians or computational scientists run their software on the command line, and what Galaxy allows you to do is to run it in the Galaxy interface. So you have this tool wrapper, which maps inputs and outputs between the command line and the Galaxy API, and this is then rendered in the Galaxy interface. In the wrapper, which you see at the top of this slide, you have various things: you have the command section, which is not shown here; you have a list of requirements, which are installed using the conda package manager, or alternatively using Docker or Singularity containers, which are in turn produced from conda packages; and also a list of inputs and outputs defined for the tool, which are mapped between the command line and the API or the graphical interface. Galaxy workflows are then created when a user combines multiple Galaxy tools into a single item. So whereas Galaxy tools are defined by this wrapper, Galaxy workflows are defined by this workflow file, which specifies all of the component tools of the workflow, as well as how they're connected and any predefined inputs or parameters. And whereas for Galaxy tools the requirements are specified as conda packages or as containers, the requirements for a Galaxy workflow can be seen as the component tools which make up the workflow. So you can really think about a kind of hierarchy: you have the source code of some scientific software, and this is compiled to create a conda package, or a container built from it; then you have the Galaxy tool, which is based on the conda package; and the Galaxy workflow is the highest level of tool. Just like a Galaxy tool, the workflow is rendered in the user interface, like you can see on this slide, so it looks pretty similar. But of course, for a workflow, the workflow developer might have chosen to hide a lot of the parameters from the user, because the aim of the workflow is to perform a more specialized task, and not all parameters necessarily need to be exposed to the end user.

One problem which arises is that all of these tools and workflows specify versions of their respective dependencies (the versions of the conda packages for tools, and the versions of the component Galaxy tools for workflows) with the aim of ensuring the reproducibility of the analyses. But the problem, of course, is that the upstream tool developers might be constantly releasing new versions of their software. For example, for this tool, HyPhy, if you search using conda for software versions, you have a lot of different options; I just checked on the European server today, and the newest version there is 2.5.36, with a lot of versions missing. So the idea of this project is to try and solve this problem: when there's an update of an upstream tool, we want some automatic way to also update the Galaxy tools and workflows which are based on these software tools.
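To make the wrapper structure concrete, here is a minimal hand-written sketch of a tool XML file; the tool id, package name, and version are illustrative, not taken from the slides:

```xml
<tool id="example_tool" name="Example tool" version="1.2.3">
    <requirements>
        <!-- resolved via the conda package manager, or via a
             Docker/Singularity container built from the conda package -->
        <requirement type="package" version="1.2.3">example-package</requirement>
    </requirements>
    <command><![CDATA[
example_tool --input '$input1' --output '$output1'
    ]]></command>
    <inputs>
        <param name="input1" type="data" format="fasta" label="Input file"/>
    </inputs>
    <outputs>
        <data name="output1" format="tabular"/>
    </outputs>
</tool>
```

The `<requirements>` section is what pins the dependency versions for reproducibility.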
A big inspiration for this is what Bioconda is doing: they have a GitHub bot (BiocondaBot is the GitHub username) which is doing something very similar for conda packages, that is, for the next step down in the hierarchy that I mentioned. Conda packages are defined by conda recipes, which are effectively a list of instructions on how to compile a particular software package. You can see an example in the screenshot for cutadapt: just like a Galaxy tool or Galaxy workflow, it specifies a particular version of the software, it specifies a link where the source code should be downloaded from, and then it has some instructions about how this source code should be taken and compiled to produce the conda package, which is published and then downloaded whenever you run a Galaxy tool. So this was the inspiration for us to create a similar bot which would do the same thing for Galaxy tools and Galaxy workflows: just like the Bioconda bot checks these URLs (in this example, on PyPI) for every package to see if a new version has been released, and then creates a pull request if a new version is available, so that a new version of the conda package can be published as soon as possible.

The update functionality is implemented as a subcommand of Planemo, planemo autoupdate, and it makes a few assumptions about the tool. One difference between a Galaxy tool and a conda recipe is that a Galaxy tool might have more than one dependency, which complicates things a little bit. For example, in this case, which is pangolin, you probably only want to create a new version of the Galaxy tool if the main dependency, pangolin itself, has been updated. If there's just a new version of the csvtk dependency, which I guess is used for processing CSV files, then that's probably not a good enough reason to publish a new version of the Galaxy tool. So what we assume is that the tool has this @TOOL_VERSION@ token defined twice: once in the version of the tool on the first line here, and once in the requirements section; we then assume that this is the main requirement, the one that we care about. The autoupdate command checks if a new version of this dependency exists and can be installed via conda, and if so, it updates it. Only then does it check the rest of the dependencies, in this case scorpio and csvtk, and here it finds that there's indeed a new version of scorpio as well and also updates that. So it updates the token which is specified, and also updates any additional dependencies. The final step, which isn't necessary in this case, is that if the @VERSION_SUFFIX@ token is set to a value greater than zero, it has to be reset for the new version. This version suffix tracks changes to the Galaxy tool which are made without updating the dependency version; for example, if there's a missing feature which needs to be integrated, or perhaps a bug which needs to be fixed, then this version suffix gets bumped each time. And that's more or less it. There are a couple of complexities which we can discuss a bit later if anyone's interested, for example that Galaxy tools often use macros to define parts of the tool definition which can be repeated, and because various tools can share macros which might contain these requirements, that makes things a bit more complicated. But fundamentally, that's the process that this autoupdate performs for Galaxy tools.
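Concretely, the update amounts to a couple of token edits in the tool XML or its shared macros file. The real implementation lives in Planemo's autoupdate subcommand; the shell snippet below is only an illustrative sketch of the two edits (bump the main requirement's @TOOL_VERSION@, reset @VERSION_SUFFIX@ to zero), with made-up version numbers, assuming GNU sed:

```shell
# Write a toy macros file containing the two tokens the autoupdate cares about.
cat > macros.xml <<'EOF'
<macros>
    <token name="@TOOL_VERSION@">4.1.2</token>
    <token name="@VERSION_SUFFIX@">2</token>
</macros>
EOF

# Bump the main requirement to a (hypothetical) new upstream version
# and reset the version suffix, as the autoupdate does.
sed -i -e 's#\(@TOOL_VERSION@">\)[^<]*#\14.2.0#' \
       -e 's#\(@VERSION_SUFFIX@">\)[^<]*#\10#' macros.xml

cat macros.xml
```

With the common convention of a tool version string like `@TOOL_VERSION@+galaxy@VERSION_SUFFIX@`, both token edits together produce the new published tool version.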
One downside of this autoupdate compared to the Bioconda bot is that the pull requests which get created often still need a lot of work. We created this GitHub bot, planemo-autoupdate, which runs the autoupdate command periodically against the IUC tools repository and creates pull requests for all the tools which it thinks need updating. The big difference from the Bioconda bot is that for a conda recipe it's unlikely that you want to make big changes to the recipe every time the version is updated: it's just a list of compilation instructions, and this shouldn't change much. But for a Galaxy tool you might really need to change a lot of things between releases. If a new feature is added to the code, new flags, new subcommands, then very likely you want to incorporate those into the Galaxy tool, and someone needs to go ahead and take care of those updates. In this example, the autoupdate has found that pangolin needs updating, but then someone has to go ahead and check the changelog (we're assuming that a changelog exists; otherwise they have to check the diff between the two versions) and add any new features which need to be added, or which they think, in their opinion as a Galaxy tool developer, are important for the Galaxy users. And that raises the question, which in my opinion is not fully resolved, of who is actually responsible for this update process. One idea which we have, which was also discussed a little bit before, but maybe we could discuss it again, is to get some way to specify people who are responsible for the tools, so that they would automatically be pinged when these pull requests are created and would be assigned, not necessarily with a responsibility, but to take care of that tool. It's similar to how Bioconda recipes also have maintainers who take care of a particular recipe. So once we created this autoupdate command for tools, we
wanted to also extend it so we could do the same thing for workflows. So recently there's this IWC repository, which maintains workflows on GitHub, somewhat like the IUC does for tools, and so you can have a similar autoupdate GitHub bot which updates these Galaxy workflows. Again, it's the next stage in this update hierarchy: from the source code, to Bioconda (which is taken care of by the Bioconda bot), to the Galaxy tool, and then to the Galaxy workflow. It builds on the workflow refactor actions which were created by John Chilton, mostly, I think, in this pull request here. You might know about the 'Upgrade workflow' option in the workflow editor: you can select it in the drop-down menu, and it automatically updates all of the component Galaxy tools and subworkflows which make up that workflow. What this autoupdate subcommand does when it's applied to a workflow is exactly the same thing, but via the Galaxy API instead of the user interface. So if you run planemo autoupdate on your workflow, it spins up a local Galaxy server, installs all the Galaxy tools which are required into the server, loads the workflow, runs this upgrade-workflow refactor action, downloads the updated workflow to your machine, and then shuts Galaxy down. What you can also do, which can be quite nice and is also much quicker in many cases, is to run the autoupdate against a particular Galaxy server, for example usegalaxy.org, and then what you get is an autoupdate for the workflow which is tailored to that particular instance: the most recent version of the workflow which is capable of running on usegalaxy.org or usegalaxy.eu.
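As a usage sketch (the workflow filename is made up, and the exact options may differ between Planemo versions; `planemo autoupdate --help` is authoritative):

```shell
# Update every tool step of a local workflow file to the newest versions.
# By default this spins up a throwaway local Galaxy, installs the tools the
# workflow needs, applies the upgrade-workflow refactor action via the API,
# and writes the refreshed workflow back to disk.
planemo autoupdate my_workflow.ga
```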
Whichever server you're using, that's been quite useful for me as well when I've submitted workflows to the IWC, because the review process can take some time, and it can happen that new versions of Galaxy tools appear in the course of the workflow development; then I can quickly run this autoupdate command and have a fresh version of the workflow which I can push to my IWC pull request. And then, maybe somewhat surprisingly, the review process for the pull requests which are created for workflows is actually a bit simpler than it is for tools. The reason, I guess, is that most workflows have a very defined function, so you don't necessarily want to be changing lots of different things, in contrast to a tool, where you might want to add new features of the underlying software in order to provide the maximum functionality to the user. For a workflow you maybe don't care about that, because the workflow has a very specific task which it's supposed to perform. And the advantage of doing this workflow refactoring via the API is that it ensures that tool inputs or outputs which have become obsolete, because they got removed from the tools in the newer versions, are updated or removed, and the bot provides a list of the tools which have been changed or updated. And if the tests pass on the IWC, then to be honest it's probably safe to merge without too much human intervention.

So, to kind of finish off, I wanted to show this figure from the Planemo paper which we submitted recently. The idea is that we have an entire chain of automated software version updates. As I kind of mentioned before, we start with the
source code on GitHub; the Bioconda bot takes care of creating the new Bioconda package; then we have the autoupdate bot, which creates the new Galaxy tool and the new Galaxy workflow; the BioContainers community deals with creating the Docker and Singularity containers; and then the tools and workflows also get installed, via the Galaxy Tool Shed and Dockstore, onto the big servers like usegalaxy.eu or usegalaxy.org.

And then, yeah, of course I'll end by asking if there are questions, but I thought maybe I could ask some questions myself, since I'm interested in improving this. If possible, I wanted to see what people think about this tool maintainers idea in particular. On our computational chemistry tools repository we have this implemented: we just specify the maintainers in the .shed.yml file, the bot extracts this, and it pings the maintainers when it creates a PR. Then maybe how to prioritize PRs: at the moment I look through the open PRs, and if I see there's a particularly important tool, I try to focus on that one, check whatever features have been included, and push them to the PR. But maybe we can think about doing that in a more structured way, so that again someone gets assigned to it. And perhaps check the frequency of PRs: currently the bot is running once a week for the IUC tools and once a month for workflows, and maybe this frequency is not quite right, particularly for workflows. There's also the possibility to automatically update the tool test data, if anyone has an opinion on that; it could be nice to enforce it.
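For the maintainers idea, the computational chemistry repository mentioned above records them in the repository's .shed.yml. I'm sketching the shape from memory, so treat the maintainers key and all values here as illustrative rather than an official schema:

```yaml
# .shed.yml (sketch; field names and values are illustrative)
name: example_tool
owner: example-org
description: Example tool wrapper
categories:
  - Computational Chemistry
maintainers:
  - some-github-user
  - another-github-user
```

The bot would then parse this file and @-mention the listed users on the pull requests it opens.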
If you have questions, then I can answer those as well; I think that's more or less everything I wanted to cover.

Very nice, thanks so much, that's really super cool. We've been using it a lot, and it really makes a difference in how up to date the tools can be. For tools and for workflows this is really awesome work; it was really needed, and it really gives us an advantage. Yeah, for tool maintainers: I think we have a few code owners already; in GitHub you can put a CODEOWNERS file in the repository, and then whenever something changes you're pinged and asked for a review. In general it would be awesome to have more tool maintainers.

Yeah, I mean, of these two approaches which were proposed, one was to use this CODEOWNERS file, and the other was to include the maintainers in the .shed.yml file; I guess that in terms of their effects they're basically the same, right?

I mean, I wouldn't use the .shed.yml file; we have the creators metadata in the tool XML itself, and that would also be a way to do it: Planemo could generate the CODEOWNERS file based on this. We wanted to make more of creators anyway. And yeah, I think we want to update all the versions; I see zero cost there, but that's just me, and some people might disagree, so as you mentioned, we should have an open discussion. The same for the IWC: I definitely have zero problem with daily updates, I don't think there's a problem. Certainly, if people feel very, you know, hands-on about their workflows, we can work with the CODEOWNERS file, right? So if there's a code owner, then whoever is updating will first ask for their opinion, but otherwise we should update as often as possible. Updating test data I don't like, because usually you have to tailor your test data: you have to trim it, you have to look at it. If you want to jump in on that, sure, Marius.

Yeah, about updating the test data: I think it could be nice to have, like the BiocondaBot commands, an autoupdate bot command, so that if a maintainer asks for the update in a comment on the pull request, it can be done automatically.

Yeah, I think that was what was suggested originally in the issue where this got created. It would be nice to have, like you say, a slash command on GitHub analogous to the BiocondaBot commands, which then updates the test data. Then you could also use it not only for these automatic pull requests, but also for pull requests created by human users; that could be quite handy.

I feel like it works in, like, two percent of the cases, but yeah, I mean, there's no harm in having an extra command.

There's a question in the chat about whether it works only for Bioconda or for any other conda channels. So, by default it works in the same way as for planemo test: you have the two default channels, which are bioconda and conda-forge. So I guess the answer to the question is that by default it works with both, but you can also customize this using the conda channels flag in Planemo, where you can specify whatever other private conda channel you want to use.
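As a usage sketch of that customization (the private channel name is made up; I believe the relevant Planemo option is `--conda_ensure_channels`, but check `planemo autoupdate --help` on your version to be sure):

```shell
# Search additional conda channels when looking for newer package versions.
planemo autoupdate my_tool.xml \
    --conda_ensure_channels iuc,conda-forge,bioconda,my-private-channel
```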
I have a quick question: is this reflected in the UI through the 'Upgrade workflow' drop-down option in the workflow editor, or did I misunderstand this? Can I trigger this update process through the UI as admin, or?

So, this 'Upgrade workflow' option in the UI does exactly the same thing as the Planemo command.

Okay, it does exactly the same thing. But someone has to be admin on the instance to install these tools, or?

So, if you look at these two commands here: for the first command, it creates a local, throwaway Galaxy instance, just like for Planemo tool testing, and in this case the user is admin by default. For the second case, it doesn't do any tool installation; it just updates to the most recent versions of the tools which are available.

So the drop-down on the right, maybe that's why I got confused: if I click on 'Upgrade workflow', it triggers this process, right?

Yes, it triggers the process of upgrading all the tools and subworkflows in the workflow to the most recent versions which are available on that particular server, but it won't install anything, obviously.

Oh, okay, all right, thank you, that's very cool. So what's next? Do you have any more ideas that you want to work on?

No, not at the moment. I think that as far as this autoupdate project is concerned, it's probably been taken just about as far as it can go; if anyone disagrees, then please let me know.

No, I mean, I think this is really great. Something else you've done that deserves highlighting is that you've also added a workflow test init command to Planemo. I don't know who here has tried to write workflow tests; you can write them by hand, but it's a little cumbersome, because you need to know the test syntax and so on, and Simon added another command where you can just point it at an invocation and it'll produce the test files automatically. Do you think there is any value in doing the same thing for tool tests? I know some of us like to write the tests first, but some of us also write the tool first and then produce the test data. Do you think that would be worth doing for tool tests?

So you mean that you would somehow generate the test from a Galaxy job, or?

Yeah, the same way that you're doing it for the invocation, right? It's just a different template; the test language is actually the same, the difference is just that it's XML versus YAML or JSON.

Ah, right, yeah, I see what you mean. So you first write the tool wrapper, but without the tests, then you take the data, you run it, and you can generate a new test case.

I mean, for instance, if you found a bug, and ideally when you find a bug you want to have a test case that shows how you fixed it, that'd be one way to do it, but I don't know.

Yeah, I'm not sure. I just want to say, that's kind of what the update test data functionality already is. I mean, it doesn't change the test parameters, right?

Yeah, but it takes care of the test data, which is the difficult part, I guess. And I'm sure that I've done this: whenever I've written tools, I've written the test without the output.

Yeah, and I think it would be something nice for new users who are maybe not familiar with the test tags. But yeah, maybe.

Yeah, I think that's it. Thanks for telling me. Is there anything else we should be discussing, any questions, remarks, latest developments? Doesn't look like it, so maybe we can say goodbye until next time in two weeks. In two weeks we're going to be discussing the 22.05 release; the testing team is presenting then.

Are we? So, that is what is on the schedule; we can discuss offline if that's not going to be happening. I think the last time we talked about it, it didn't have to be just the
testing team, but just a release update in general; we can sort of crowdsource it.

Okay, I can change that on the Hub then. It's all contingent on when we do the actual testing, and right now there are a few minor things which are holding us up. Once we start, there will probably be a week of testing, and then another two or three days to prepare the report; that's why I'm not completely, 100% sure that we'll be able to present the full report in two weeks.

Got it, okay. We'll have something; we'll have some content about the release.

All right, well, we'll see you in two weeks, and we'll talk about the release then. All right, thanks everybody, bye. Thanks, bye.