We needed asynchronous endpoints, and we still do. There are a couple of applications we want to enable with them, so if you want to know more about those, ask me afterwards. FastAPI uses the standard OpenAPI specification, which is great for interoperability, and we can enable some really cool things with it: we can generate an OpenAPI spec from Galaxy and build things on top of it, and vice versa, we can take the specs of other projects and generate things for Galaxy. It's based on modern Python type hints and uses the Pydantic library, which provides validation: whenever a request comes in, we know exactly whether it is actually valid and includes everything it needs, and a nice exception message is generated if it doesn't. In the same way, outgoing responses are validated, which helps us catch bugs before they happen. We can generate fantastic documentation from the Pydantic models, and it's a huge help for the developer experience. I think this makes the difference between getting an endpoint right on the third, fourth, or fifth try and getting it right on the first try.

So, yes, this is the end of uWSGI serving Galaxy. Since 22.01 the default web server has been uvicorn, though uWSGI could still be used; that had actually been the situation since 19.09, I think, while we got things production ready. In 22.05 we dropped support for uWSGI, so Galaxy is now exclusively an ASGI application driven by FastAPI. We just recently learned that uWSGI has entered maintenance mode, so it's a good thing we started preparing the switch a year ago. The route migration is still ongoing: we have an ASGI middleware that lets us call into the legacy WSGI routes, and that was super essential for starting the work and bringing it into production. It's not critical that some routes aren't converted yet; what's important is that we now have the tools to convert the remaining routes. A consequence of this is that Gravity replaces uWSGI for managing Galaxy's processes, including job handlers; this was already mentioned in the Gravity update, so the two are tightly connected.

The architectural choice we made is a thin API layer backed by a service layer. What that means is that most of the Python code running in the API controllers is actually just a single line: it takes in the parameters and returns models. It's super simple, so should we one day have to change to another framework, it will be super easy. The service layer handles API-specific logic, so things like working with limits, with API keys, and decoding encoded IDs, so that the managers underneath can really deal with business logic, which for us mostly means talking to the database, talking to the task queue, and talking to storage. Something newer is that we're using media type headers to provide new responses on existing API routes. That's one way to version responses, and it also allows generating clients that know what type of response they're getting based on what they send in. That's really great for augmenting endpoints while keeping backwards compatibility: people depend on our API, and the point of an API is to be stable, so this lets us keep the API stable while still introducing new features. So far we have 107 routes converted, with massive documentation; around 330 routes are still undocumented.
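To make that thin-controller idea concrete, here is a minimal sketch of the pattern, with hypothetical names and a toy service rather than Galaxy's actual routes:

```python
from typing import List

from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI()

class DatasetSummary(BaseModel):
    # response model: what the route returns, also used for the generated docs
    id: str
    name: str

class DatasetsService:
    """Service layer: API-specific logic (limits, API keys, ID decoding)."""

    def index(self, limit: int) -> List[DatasetSummary]:
        # would delegate to a manager that talks to the database / task queue
        return [DatasetSummary(id="abc123", name="example")]

def get_service() -> DatasetsService:
    return DatasetsService()

@app.get("/api/datasets", response_model=List[DatasetSummary])
def index(limit: int = 10, service: DatasetsService = Depends(get_service)):
    # the API layer is essentially one line: delegate and return models
    return service.index(limit)
```

Because everything is typed, a request like /api/datasets?limit=banana is rejected with a descriptive 422 response before any handler code runs, and the same models feed the generated OpenAPI documentation.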
That 330 is an overestimate, though, because we have some magic routes that just take an entity type and dispatch on it, so they are many routes but the same amount of code. Here's an example of how you can actually interact with the API documentation: if you go to usegalaxy.org/api/docs, you can follow along and start interacting with the API. Here we just ask for a list of workflows; there's this little form that lets you select parameters, you can try it out, and it generates the curl commands for you, so you can take them and stick them into your bash script if you need to. I think the models make the API better: they nicely capture the required inputs and outputs. We can generate service stubs, and we've done this for the GA4GH TRS API. The models are great for parameter exchange on the REST API; they are also being used for tasks, and for validating client parameters, which is also great. And they really help to hide the inconsistencies we have in the API, but smoothing those out is the next step. We're probably never going to do a v2 API, but we can solidify the experience. Certain things, like plain query parameters, are a little more complicated, but it's not a big deal and we'll sort that out. Something else we learned on this project: FastAPI is probably the best option for us today, but given the way we've architected everything, it wouldn't be a disaster should that ever change. Next, we want to convert more API routes. We're migrating to SQLAlchemy 2.0, which will allow us to talk asynchronously to the database. We're going to add websockets and server-sent events, to increase interactivity and decrease the amount of polling we do, especially in the history panel. And we're going to generate API clients automatically, so you can just import an NPM package into your application. That's it, thank you.

Next we have Nicola and Michael. So, I'm Nicola, working here in Norwich, and together with me is Michael Crusoe, the CWL project leader, who joins from Germany. Together we're going to talk to you about supporting a portable workflow language in Galaxy. It's collective work, and these slides also reuse some material from Marius and the rest of the people in the acknowledgments. If you have ever downloaded a workflow from your Galaxy instance of choice, or from one of the four workflow repositories that are around, you will have received the Galaxy native format, with the .ga extension. This format is JSON-based, and it was never meant to be something to share, in the sense that to be shareable it should be reusable and readable by humans. In fact, it contains JSON inside of JSON, and it's basically a representation of the state of the workflow in the database. It doesn't have a formal schema, so it has been updated over the course of the years without changing any schema version, because there is no schema. And the main issue with it is really coming out right now, when communities of workflow developers are joining together, like the IWC, to develop best-practice workflows, which you then want to version, with new releases that update tools or add steps. If you want to diff this format to compare different versions of your workflow, it doesn't work very well: in practice, it is not diffable.
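To see why, here is a toy illustration of the JSON-inside-JSON problem (hypothetical fields, heavily trimmed, not a complete .ga file):

```python
import json

# one step of a workflow; tool_state is itself a JSON document stored as a string
step = {
    "id": 1,
    "tool_id": "Cut1",
    "tool_state": json.dumps({"columnList": "c1,c2", "delimiter": "T"}),
}
workflow = {
    "a_galaxy_workflow": "true",
    "format-version": "0.1",
    "steps": {"1": step},
}
print(json.dumps(workflow, indent=2))
```

A one-parameter change rewrites the whole tool_state string, so a line-based diff shows one long opaque line changing rather than the actual edit.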
So one thing we have tried is to move to a new workflow language, Format 2, which is mainly work by John Chilton. It's YAML-based, so it's much more readable and writable than the .ga format, and it's heavily modeled on CWL; in fact it has a schema, similar in style to CWL's, which is linked here in the slide. But we are not currently exposing it in the user interface; it's only available through the API. We are planning to make it the default export format soon, and there is an issue on the repository tracking this, linked in the slide.

So why, since we already have this CWL-like format, do we want to support the actual, proper workflow standard? The main reason is that we want to expand our community of workflow developers. Supporting CWL would allow us to reach other groups of workflow developers: it's already used in biology and bioinformatics, but it's also used outside our community. It also really creates collaborations with domain scientists who author tools and workflows but are not used to the Galaxy formats. And again, it will allow exchange of workflows across platforms: you could develop your workflow in Galaxy through the user interface, and on a different platform they would be able to run it directly, or vice versa.

Anyway, standards are great, so you all love standards, right? But why did we choose CWL in particular, given that there are hundreds of different workflow languages? First of all, it has very good documentation, starting from the introductory user guide, and the schema is well developed as well. It has hundreds of conformance tests that let you check your implementation against the standard; it's comprehensive enough to really validate an implementation. It has an established process for getting to new versions of the standard, which Michael will explain more about. And, an important point, the execution model of CWL, that is, how tools and workflows are represented, is similar enough to Galaxy's that we can support it by integrating with the current Galaxy workflow machinery.
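Before handing over to Michael: for a concrete sense of the Format 2 shape mentioned a moment ago, here is a hedged, abbreviated sketch (hypothetical tool id; the real schema is linked from the slide):

```python
import yaml  # pip install pyyaml

workflow = {
    "class": "GalaxyWorkflow",
    "inputs": {"input_fastq": "data"},
    "steps": {
        "trim": {
            "tool_id": "trimmer",            # hypothetical tool id
            "in": {"input": "input_fastq"},  # wire the workflow input in
        },
    },
}
# YAML output is line-oriented, so versioned workflows diff cleanly
print(yaml.safe_dump(workflow, sort_keys=False))
```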
So I'll give the stage to Michael to introduce CWL.

Okay, everybody. Just a little bit about the CWL project. The CWL project is like a boutique: a small standards development organization, hosted at the Software Freedom Conservancy, a public charity based here in the US that is legally obligated to work in the public interest. As a project, we care about the pre-standardization process, bringing people together in a neutral, convenient place; making the standards, the actual writing and editing, following open standards principles; and the post-standardization cycle, maintaining the standard, fixing the things that need to get fixed, and of course supporting the open source tool ecosystem around it. A little timeline in case you're not familiar: we just celebrated our eighth birthday. We were born at the Bioinformatics Open Source Conference Codefest; John Chilton is one of my co-founders, along with myself and Peter Amstutz, and here I'm also a voice for John. Over the years we've made steady progress in adoption, and this year the big news is that we got a paper published in Communications of the ACM, which is bringing CWL to an even wider audience; it's a nice journal that mixes industry and academia. And work continues on further documenting lots of corner cases, places where portability reportedly didn't happen with the CWL standards, and on more conformance tests to ensure additional portability.

As we mentioned, the CWL standards are actually two standards in one. One is the command line tool standard, which is actually the part that differs most from Galaxy. The other is the workflow language, which is conceptually very similar to gxformat2 and even the old Galaxy format; I think we're converging quite well. We support a very similar execution model that benefits from software containers but does not require them; in fact, some people run their CWL workflows with Conda packages. And we've seen broad adoption: while we started in bioinformatics, CWL is now in many fields, most recently the geospatial earth observation folks, who are folding CWL into their Open Geospatial Consortium standards. So maybe one day we get to add Galaxy to this slide, but outside of Galaxy you can write CWL on your laptop and run it on your cluster and cloud, and as you can probably imagine, there is always an open source implementation, and often there are commercially supported implementations as well. Just to show you a taste of the syntax, here is a command line tool description on the right. With the YAML format we try to strike a balance between human readable and machine readable. It's a very explicit model, very much informed by the experiences of the Galaxy community, and we also support extensions if needed. We mentioned that 1.2 came out: there are conditionals now, which I know is something Galaxy is looking at. And we just had a proposal come out for loops, as an extension first, and maybe that will make it into CWL 1.3. At the CoFest we'll look at the loop proposal to see how it might make sense for Galaxy.

Back to me: how did the work start, and what is the current status of the CWL implementation in Galaxy? The work started in 2015, when the CWL standard was still being developed, with John. That work supports both the CWL command line tool standard and the workflow standard. For tools, we subclass the regular Galaxy tool class to create the supported CWL tools, and we load them using cwltool, which is the reference implementation of CWL; after loading, the actual execution is very similar to that of normal Galaxy tools. For workflows, we don't need cwltool: we read the files directly and map them onto the relevant parts of Galaxy. From the technical point of view, we have a separate fork and branch that has been developed over a number of years, and slowly, piece by piece, we've been integrating the features upstream, trying to do it in small pieces of code that are useful to Galaxy by themselves, cleaning up the internal interfaces along the way.
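Since Galaxy loads CWL tool descriptions through cwltool, as mentioned above, here is a standalone illustration (separate from how Galaxy embeds it) of driving cwltool as a library:

```python
from pathlib import Path

from cwltool.factory import Factory  # pip install cwltool

# write out a minimal CommandLineTool so the example is self-contained
Path("echo.cwl").write_text(
    """\
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding: {position: 1}
outputs:
  out:
    type: stdout
"""
)

fac = Factory()
echo = fac.make("echo.cwl")         # load and validate the tool description
result = echo(message="hello CWL")  # run it; kwargs map to the declared inputs
print(result)                       # dict describing the captured stdout File
```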
So what have we accomplished so far? A huge number of pull requests have already been merged, probably around 100 by now that are specific to CWL. Three things are highlighted on this slide. First, subworkflows: if you didn't know, subworkflows in Galaxy originally came from the CWL branch. And also expression tools, which don't produce a dataset as output but something like a string or a number; these can now also be used to help implement conditionals. After that initial implementation, we have continued this work through several BioHackathons, supported by Galaxy Europe: four projects at four different BioHackathons from 2018 to 2021. The goal there was, again, to take small pieces out of the branch, open pull requests, and merge them, and in this way we have greatly reduced the size of the separate fork. Importantly, we are going to have a fifth one at the next BioHackathon Europe in Paris in November. I've got here a list of implemented features, and going through them all would be quite boring, but there are important ones like subworkflows, which are now merged, and some that are still only in the fork: partial Docker support, multi-input scattering, overriding tool input defaults, all these kinds of things.

What's the current status, though? At the last BioHackathon we opened an official pull request from the separate fork (the link is up here), and it is frequently rebased by myself or others in the group so that merge conflicts don't pile up; it's quite a bit of work just to keep it rebased and working. It has been reduced a lot, but it still touches around 100 files and, I think, around 3,000 lines of code. It's not huge, but it's not small either. Regarding the conformance tests, that is, how well we implement the CWL standard: for 1.2, the latest version of the standard, we have over 200 tests passing and around 100 failing. But of those 100 failures, only 12 are for features actually required by the CWL core; the rest are optional features. And something came together exactly today: we have an external user from the University of Manchester, Oliver, plus a couple of other people mentioned at the bottom of the slide, and they were able to take their own already existing CWL workflows, load them into Galaxy together with their respective tools, and run them through the user interface. That's quite exciting for us.

As for the remaining challenges, I think the main point is that when you import a CWL workflow, it comes with embedded tools, which is basically what Anthony mentioned yesterday: user-defined tools. This is the trickier part. In fact, we currently only allow it for admins, because, as we have already seen with interactive tools, there have been people abusing the main Galaxy servers, usegalaxy.eu and others, and that has been quite demanding. If we do something similar for workflows by allowing user-defined tools, there is obviously a security side we need to take into account: running those tools in containers, sandboxing them as fully as possible, and limiting access to the file system as much as we can. Also, we read the resource requirements that can be specified by tools or workflows, but we are not yet passing that information on to decide where and with what resources to run things. There's also the CWL concept of default files and directories, which is a stretch for us because everything in Galaxy is a dataset; this should become easier together with related work that is on the way. And finally, the implementation of workflow conditionals is not finalized; that is being worked on.
So what's the road to getting this development into the Galaxy servers that everyone uses? First we need to iron out the last few required conformance tests, and then we plan to merge the big branch, by the end of the year I would say, probably after the hackathon. Then we'll try to enable it on test.galaxyproject.org and see how people start running their CWL workflows in Galaxy. If you're interested in collaborating or participating in this development, we've put here a couple of links: the website, the official getting started guide, and other ways to get involved. Finally, acknowledgments: thanks of course to John, who started all this work, and Marius, who is now leading it together with us; Michael, for being on stage with me today; the other people on the slide; the BioHackathons where this work happened; and everyone who has been supporting us. Thank you all for your attention, and there is some time for questions.

Welcome back. This is a part of the Galaxy architecture slides, which are three and a half hours long and highly technical, condensed into the next seven minutes. Here's a typical interaction between the web browser and Galaxy; I'll just zoom into this part at the end. The web browser makes a request, and the Galaxy backend needs to respond, typically producing some JSON. This happens in a web request and generally takes at most a couple of seconds. But there's so much stuff we want to do in Galaxy that takes more than a couple of seconds: installing tools, generating PDFs, creating datasets, exporting histories, et cetera. Web servers are just not designed to do this kind of work, and this model breaks down a lot, so traditionally Galaxy has had all sorts of hacks over the years to deal with it. Over the last couple of years, though, we have been converging on Celery and message queues, which is the modern Python approach. The idea here is that the web request, that initial request to the API, just puts a message in the message queue saying "here's some work to do", and Celery workers on the backend take that and do the work, with as long as they need. This is how it should technically work, current best practice in the field, a very standard Python stack, and much better than our typical ad hoc hacks inside Galaxy. There are some downsides to using Celery: any time you add a piece of infrastructure to the stack, it's a little bit harder for admins to configure Galaxy and a little bit harder for developers to develop on it. But a lot of that was alleviated by Nate's amazing Gravity work, which he presented yesterday: Gravity starts up Celery and takes care of the details for you, to at least make the initial user, developer, and admin experience better; you start Galaxy and it all works. When you start Galaxy now, you'll see a list of Celery tasks, and we've really expanded the amount of work we're doing in Celery over the last year. That has been essential to a lot of the things we've shipped, and I'll get to that at the end of these slides.
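To make the enqueue/worker pattern concrete, here is its generic, non-Galaxy shape, a toy sketch assuming a RabbitMQ broker on localhost:

```python
from celery import Celery  # pip install celery

# assumes a message broker, e.g. RabbitMQ, listening on localhost
app = Celery("galaxy_sketch", broker="amqp://localhost")

@app.task
def generate_pdf(history_id: str) -> None:
    # runs in a Celery worker process, taking as long as it needs
    print(f"rendering a PDF for history {history_id}")

# inside the web request we only enqueue a message and return immediately
generate_pdf.delay("abc123")
```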
First, a little bit (this part is from the architecture slides) about how to define a task, from the developer's point of view. A simple task looks a lot like a FastAPI endpoint of the kind Marius talked about earlier. You've got a decorator on a function that says "this is a task" instead of "this is a controller method"; you've got the request coming in as parameters to the function, wired automatically based on type; and then you just do your small piece of work. So everything Marius said about protecting ourselves from FastAPI and framework changes by keeping controllers thin applies to our tasks as well, and the developer experience should feel very similar. The request here comes in as a "setup history export job" object: it's a Pydantic model, just like with FastAPI, and we've done the plumbing so that Celery knows how to take those and send them to the task. The models are really quite simple, just Python classes with typed properties. One little implementation detail: components, like the model store manager here, are injected from the Galaxy app object based on the type of the parameter. Executing the task from inside Galaxy, the client side of this, is pretty simple: build one of those request objects and call .delay() on the decorated function. It used to be a plain function, but with the decorator it's now a task object, so it's nice and easy. Best practices: keep things thin, keep the set of components you consume from the Galaxy app container as small as possible, place things in the right place, and you're good to go.

We have some initial success stories for tasks; these slides are actually all about the last year. A couple of years ago we had Galaxy Markdown and the PDF export support, and it was essentially not usable: the PDF generation took too long, and the feature was quite unstable; even if you wanted to set it up, it was quite difficult. So over the last couple of releases of Galaxy we've added the short-term storage component. It's a component that manages files that exist only for a short period of time and are served back to the user, and the generation of all of these artifacts can be done in Celery. This is going to get us around all sorts of things we previously had to do with nginx web server plugins and the like. Hopefully, in the future, admins will just need to know "here's the endpoint that hosts these files", point nginx at it, and everything else follows; the client can poll that endpoint in a consistent manner. The Pydantic model that drives this just takes a string token that tells Galaxy where to write the file, and then we've got one of these tasks, again with the request object and the components injected by Galaxy: a very lightweight task. As a result, PDF generation in Galaxy is now much more stable. On Sunday I talked about all the APIs I added for doing dataset work on histories, invocations, libraries, and collections in a consistent way, and all of that work is powered by Celery, by this task framework. Marius did a bunch of work on optimizing uploads: the API tests, which aren't even supposed to be testing that, went from two and a half hours to 50 minutes with this work of moving uploads into Celery tasks and jobs. We'll also explore composing tasks together, pipelines of tasks in Celery, so that's amazing work that will continue going
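Putting the pieces above together (the decorator, the typed request, the injected component, and .delay()), here is a hedged sketch of the pattern, with hypothetical names and the Pydantic v2 API, not Galaxy's actual plumbing:

```python
from celery import Celery
from pydantic import BaseModel

app = Celery("galaxy_sketch", broker="amqp://localhost")

class SetupHistoryExportJob(BaseModel):
    """Typed request object, serialized through the message queue."""
    history_id: int
    include_hidden: bool = False

class ModelStoreManager:
    """Stand-in for a component Galaxy would inject from the app container."""
    def export(self, request: SetupHistoryExportJob) -> None:
        print(f"exporting history {request.history_id}")

@app.task
def export_history(request_json: str) -> None:
    # Galaxy's plumbing handles the Pydantic (de)serialization; done by hand here
    request = SetupHistoryExportJob.model_validate_json(request_json)
    # injected by parameter type in the real thing; constructed inline here
    ModelStoreManager().export(request)

# "client side": build a request object and call .delay() on the task
export_history.delay(SetupHistoryExportJob(history_id=42).model_dump_json())
```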
forward. Batch operations, which Dan talked about among many other things, are also implemented with Celery, and the future work is basically: anything slow in the UI, we're going to Celery-fy it. Thanks so much.

Next I'm talking about the Galaxy Vault, which is a way of securely storing sensitive information in Galaxy. I think a lot of people are excited about the idea that you could propagate user identity to jobs or tools and maybe securely take actions on behalf of users; this is a much simpler thing, but it's a building block in that direction. Basically, Galaxy, as you know, interfaces with external systems, say an S3 bucket or Dropbox, on behalf of the user, and in all these cases Galaxy needs to be able to talk to those systems for the user and securely access the user's credentials. Of course, there are concerns with storing these secrets: really it's about encrypting them, maybe having an audit trail of access, some centralized management capability, and the ability to revoke or rotate your encryption keys. This is pretty hard to get right in Galaxy itself, and it incurs a lot of complexity, so ideally you want to delegate it to dedicated systems that only do this, and do it well. That is what the Galaxy Vault does: it's a very simple programmatic interface for managing secrets, with multiple pluggable backends. The interface itself just has three methods, really an abstraction over a key-value store, so it can write secrets to a store and read secrets back, and it's meant to be a building block for higher-level services.

Let's take a look at two examples of higher-level services that use it now. The first one is user preferences. As you may know, in the Galaxy user preferences screen, under "Manage Information" in the UI, there is a form that gathers user details. This form is generated from a YAML file, the user preferences extra configuration that you point to in your Galaxy configuration. We've now added the ability to mark specific fields in this form (you may have some sensitive password or similar) to be routed to the vault, so they are stored in an encrypted store. Another example is file sources. File sources, as we know, are a way for Galaxy to access remote storage, like S3 buckets or Dropbox or things like that, and in this case, again, Galaxy holds the user's credentials and needs a way to securely retrieve and handle them; we've now added the ability for file sources to read those credentials directly from the vault. So again, it increases your ability to manage secrets securely. The vault itself is configured very simply: we just add a setting called vault_config_file pointing to a config file, and the vault config file itself simply says which backend to use and the credentials for connecting to that backend.
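Coming back to the programmatic interface mentioned above, here is a hedged sketch of that three-method key/value abstraction with a toy backend (illustrative names, not necessarily Galaxy's exact interface):

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

class Vault(ABC):
    """Three methods over a key/value store, per the description above."""

    @abstractmethod
    def read_secret(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def write_secret(self, key: str, value: str) -> None: ...

    @abstractmethod
    def list_secrets(self, key: str) -> List[str]: ...

class InMemoryVault(Vault):
    """Toy backend; the real ones are Custos, HashiCorp Vault, or a database."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def read_secret(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def write_secret(self, key: str, value: str) -> None:
        self._store[key] = value

    def list_secrets(self, key: str) -> List[str]:
        prefix = key.rstrip("/") + "/"
        return [k for k in self._store if k.startswith(prefix)]

vault = InMemoryVault()
vault.write_secret("user/1/dropbox/access_token", "s3cr3t")
print(vault.read_secret("user/1/dropbox/access_token"))
```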
There are several supported backends. Custos is a managed service, run by the Custos project, an always-on web service: you can start using it straight away after the registration process, so that's probably one of the easiest ways to get started, and we collaborated closely with the Custos project to make this happen. Then, obviously, if you don't want to use an existing service and you want to run the vault yourself, you can use HashiCorp Vault, which is also what backs Custos, but of course you operate it yourself. And finally, if you have more modest requirements and you just want something to encrypt your secrets, there is a database-backed encryption management system as well: it simply stores them in a local table, and you can do basic things like key rotation. Well, not really, but better than nothing. There's more documentation here. So, as I mentioned, we have two services taking advantage of the vault right now, hopefully more in the future, so consider deploying this and using it to encrypt secrets on behalf of your users. If there are questions: do we have time for questions? Are there any questions?

Someone asked how hard it is to run HashiCorp Vault. I guess it's not that hard, but you still have to operate the service, make sure it doesn't go down, that kind of thing; so, comparatively speaking, the managed service is easier, but the vault itself is just a single command to start up. Another question: is the vault accessible from within jobs? Not yet; for that to happen, a lot more things have to happen along the way. We need to propagate identity, and I think the Galaxy API itself needs to be able to hand out time-limited API keys, so that jobs can communicate with the Galaxy API and access secrets on behalf of the user. And if you take it another level further, the secrets you get should also be time-limited, and so on. So there are more steps to get there.

Next up is John, with the new database migration system. I will talk about what the database migration system does, what it is and what has changed, what is new for both developers and admins, a little bit of the internals and how stuff works, a glimpse into troubleshooting, and where to look for more information. First of all, this is about database schema migration, and schema migration is the management of incremental, reversible changes to relational database schemas. Essentially, it's like a powerful Git for databases: database version control. A migration is performed on a database whenever it is necessary to upgrade or downgrade the schema to a specific version, and how does that happen? You execute a sequence of changes to the schema until it reaches the given state, so it goes from version 1 to 2 to 3 up to n, or the other way around. And how can we do it? Well, first of all, of course, we are all SQL experts, so we can perform database surgery, as Dannon Baker put it, with our eyes closed. At the same time, we are programmers, which makes us lazy by definition, so we write scripts that do it for us. And this approach works absolutely beautifully; for example, if you look at a database with two or three or four tables, what could possibly go wrong? Well, this is the Galaxy data model, in a fairly accurate graph representation. It contains 154 tables, represented as vertices, and the edges are explicitly defined relationships between these tables. The structure is not perfectly accurate (we do have tables with zero relationships), but the dimensions are accurate in the size of the input: it is dense, big, and scary. One does not simply migrate the Galaxy data model by hand; for that we have migration tools that automate the process. Galaxy has used SQLAlchemy Migrate up to now. It is a fine tool, or it was a fine tool; however, it hasn't been actively maintained for at least 10 years, and as a result it does not work with SQLAlchemy 2.0, which makes it a complete deal-breaker for us. So we have finally moved to Alembic, as of release 22.05.
So why Alembic, and what is Alembic? It is a very actively maintained and developed project. It has been around for more than 10 years and has had 102 releases; the most recent release was just a couple of days ago, and I checked the day before yesterday, so maybe they have a new release by now, I don't know. It has lots of issues filed, and they are well addressed; it has excellent documentation; and the code is very well written, very coherent, a joy to read. Furthermore, it is developed and maintained by the developer of SQLAlchemy, who is very responsive and very easy and pleasant to deal with on GitHub. In addition to all that, it is the recommended migration tool for SQLAlchemy and has been recommended by the community for at least 10 years. So that's why Alembic.

What's new for devs and admins? First of all, something that's not new, but that I bet many people didn't know: Galaxy's data model is actually two data models. There is the primary data model, the Galaxy core model, and then a smaller install data model for the tool shed. These models are not independent (they depend on each other, and they are not separate applications), but they may be backed by one database, the default, or by two separate databases; that is the install_database_connection configuration setting, and of course Galaxy has to accommodate both scenarios in one code base. How did this two-in-one model work with SQLAlchemy Migrate? Well, SQLAlchemy Migrate, fine tool that it is, is quite limited, so we had one version history located in two separate directories. One directory, for the Galaxy model, contained all the migration logic plus all the revision scripts; the other directory, for the install model, contained exactly the same migration logic plus manually added symlinks to the relevant revisions in the Galaxy model. The upgrade process was relatively straightforward: you write a revision script, you place it in the versions directory, you manually create a symlink if it's shared by the install model, and you run the upgrade script on one model or the other. How does this work with Alembic?
Well, first of all, Alembic is way more capable: it supports any number of data models via a concept it calls branches. Branches are used for many things in Alembic, but essentially they are version lines that share one common root and represent different parts of one parent model. So this is our setup: we have one Alembic installation, no redundant code, and we use two branches. One is gxy, the branch for the Galaxy model; the other is tsi, the branch for the tool shed install model. Each branch has its own version history, represented by revision scripts, which are located in two separate versions directories.

The new upgrade process: step one, you create a revision script template. You do that by running a command; the example on the slide translates as "create a new revision at the head of the gxy branch, labeled 'create table foo'". Alembic will generate the revision script template for you and place it into the appropriate directory. Step two, you open the script and fill in the bodies of the two functions, upgrade and downgrade. Step three, you run one of the two available migration scripts, and they will run upgrades of both models simultaneously. That's it.

Now, about those two available migration scripts: which one do you choose? There is manage_db.sh, and there is run_alembic.sh. manage_db.sh used to be a very thin wrapper around SQLAlchemy Migrate: it would set up some configuration values and hand control directly to SQLAlchemy Migrate. We don't have the SQLAlchemy Migrate code base anymore, so now manage_db.sh is something completely different under the hood: it provides a subset of the basic SQLAlchemy Migrate commands and translates them into input that can be understood by Alembic. It's an adapter, essentially. run_alembic.sh, as the name suggests, is just a thin wrapper around the Alembic CLI. So which one do you use? You can use either one, it really doesn't matter: if you need or prefer simplicity, go with manage_db.sh; if you prefer, or need, the full spectrum of Alembic CLI operations and options, which is quite impressive and very handy at times, use run_alembic.sh.

Internals: everything lives in lib/galaxy/model/migrations. That's all the code; everything happens there. The only two parts of it that are relevant, or potentially interesting, to a contributor are these. First, the versions directories: the two directories, versions underscore gxy and versions underscore tsi, are where all the revision scripts live, in case you need to modify the bodies of the upgrade and downgrade functions. Second, if you are interested in, or need to, figure out what's going on, there is the logic that Galaxy runs every time on startup, when it determines the state of the database (or the two databases) and decides what to do with that state: whether it can safely start up, needs to fail, or needs to try to automatically upgrade the database. If you want to dig in and understand what Galaxy does, that's where to go.
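To illustrate step two above, a filled-in revision script might look like this (hypothetical table and revision identifiers):

```python
"""create table foo

Revision ID: abc123def456
Revises: f00dfeed1234
"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic to order the version history
revision = "abc123def456"
down_revision = "f00dfeed1234"

def upgrade():
    op.create_table(
        "foo",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("name", sa.String(255)),
    )

def downgrade():
    # every change is reversible: undo exactly what upgrade() did
    op.drop_table("foo")
```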
Of course, the main issue in this update was migrating Galaxy's migration system itself. The goal was to provide a simple upgrade path for existing Galaxy installations, from 22.01 to 22.05, and again the plan is very, very simple. The system should check whether the database exists and whether it is empty; if it doesn't exist or is empty, create and initialize it, and it's done. Otherwise, determine whether the database is up to date; if not, check whether it can be upgraded automatically, and either upgrade or fail with an informative message. That's it, very simple. But it comes with a lot of complexity: there are two models, which may be in two separate databases or one combined database, which may have different versions, which may be upgraded, or may have been upgraded in different ways, or the migration might have stopped in the middle, and so it goes on and on and on. To ensure a smooth transition across multiple configurations and database setups, each model runs through this (and we don't have enough time to do it justice): essentially, this is the migration algorithm, a flow of logic that Galaxy follows on every startup. It tries to follow, within reason, every possible combination of configuration and setup in order to determine the path to the appropriate end state, which is either a "fail" with some kind of message, or a "done" rectangle, which means Galaxy continues to start up.

And with great complexity comes great responsibility, hence a test suite. There were two key test requirements. One: we needed to test the system under multiple configurations. Again, the plan is fairly straightforward. We know the set of input states of the world, so to speak (the database state, the configuration state), and we know the desired output state, which we can match against the actual result: the database being upgraded, or the system blowing up in our face with a pleasant, appropriate message. We would ideally want to use a test-first approach: codify the inputs, match them to appropriate outputs, put this into a test function, run it and watch it fail, then implement the functionality. The problem is that there was no easy way to codify multiple input states and multiple expected output states. The second requirement: each test case needs to run against its own database. That sounds easy, we run tons of tests against databases; but the problem is that our existing setup creates the database once per test session, which includes hundreds of tests, and does not create and tear down the database for each test.

So we set up new testing infrastructure. To address the first challenge, the input and output states, we have composable metadata objects containing one table each. They're like bricks you can use to build bigger structures: you take the metadata objects that describe a certain makeup of the database, put them together, and condense them to the size of one variable, which is a pytest fixture; we give that to another fixture, which takes this metadata and loads it into a live database, and again that is condensed into the size of one variable, one fixture, which is passed to the actual test function. The test function then gets a very simple format: you give it two arguments, which are in fact a database in a composed input state and the output we expect from a database that starts in that state. The test function's body calls the migration function, whatever function Galaxy calls on startup, and then verifies that the resulting state of the database matches the expected output state. Nothing more. The other requirement was addressed with new context managers, which ensure that both databases are created, populated, and properly torn down for every single test case. So that works, and it's all located under test/unit/data/model.
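A condensed sketch of that testing approach, with a fixture that composes a database input state and a test that checks the expected outcome (illustrative stand-ins, not the actual Galaxy fixtures):

```python
import sqlite3

import pytest

@pytest.fixture
def legacy_database(tmp_path):
    """A database composed into a known input state: pre-Alembic schema."""
    db = sqlite3.connect(tmp_path / "test.sqlite")
    db.execute("CREATE TABLE migrate_version (version INTEGER)")
    db.execute("INSERT INTO migrate_version VALUES (180)")
    db.commit()
    yield db
    db.close()  # torn down for every single test case

def is_up_to_date(db) -> bool:
    """Stand-in for the state check Galaxy runs on startup."""
    tables = {
        row[0]
        for row in db.execute("SELECT name FROM sqlite_master WHERE type='table'")
    }
    return "alembic_version" in tables

def test_legacy_database_needs_upgrade(legacy_database):
    # input state: legacy schema; expected output state: "needs an upgrade"
    assert not is_up_to_date(legacy_database)
```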
Troubleshooting, just a glimpse: when things go bad, look at the database. First of all, you may have the legacy migrate_version table, which must contain version 180; if it doesn't, that's the problem. Next, look at the alembic_version table, which must be present; it will contain two values, two revision identifiers, and those two revision identifiers need to correspond to the revision identifiers of the head revisions in the revision scripts in the two versions directories, versions underscore gxy and versions underscore tsi. If all of that looks fine, check the bodies of the upgrade and downgrade functions, and if all else fails, see the next slide: further information and where to find it. We have a README file in the code base that describes the migration system and a little bit of how to troubleshoot it. I also highly recommend the comment blocks at the top of the two scripts; they include detailed instructions on how to use them. There is also the key pull request, which contains a detailed, in-depth description of how the system works, and here are links to the relevant documentation. There is one very relevant tutorial coming up tomorrow, contributing a new feature to Galaxy, and of course the Galaxy backend working group is always happy to hear from you; we will try to help. Thank you, and sorry I went over time.

Now let's look at uploads in Galaxy, specifically browser uploads. There are a couple of upload options in Galaxy: you can upload files from your computer; you can use remote file sources, with all their different plugins, and you can write more; and we have tool uploads and data sources, since tools can also fetch data, which is also kind of an upload. But here we are talking about files uploaded from the browser. How did Galaxy do user uploads from the browser previously? As a bit of a historic overview: we transferred the whole file in one go, using a multipart form upload. Galaxy handed that off to the upload tool, which created a job, so it was a blocking upload, and the handling was slow: it was all done in Python, and that's not where Python excels. If the transfer got interrupted or the connection dropped, you were done. We improved on that using the nginx upload module, but it was kind of difficult to distribute: you had to compile it, with admin overhead involved, and whenever you upgrade your distribution you tend to have to rebuild it. It's also kind of unmaintained, though admittedly it works fine. Still, the entire file is transferred in one go with this approach, and there's no checksumming. In response to the difficulties of uploading large files, we then developed a custom chunked upload API. Again, on the downside: it's slow, it's handled in Python, and it requires Galaxy to be up and running at all times. It had limited use, but it was possible, and it was wired into the Galaxy user interface, where it created problems. There is an issue, opened in May 2020, about the user interface becoming unresponsive, and when you look at what's taking time on that specific API, you can see individual requests taking up to 15 minutes. And it makes sense: it's a user upload, trickling in slowly, so that part is okay, but it ties up the server.

So what do we need? Ideally, uploads that are independent from Galaxy; and I'd say that's true for most things we want to add: if it can be external to Galaxy, it should be. It should have reasonable performance. We want to have the possibility of checksum verification. There should be a good out-of-the-box experience that doesn't make running Galaxy more complicated for admins. And it has to work with any proxy.
We sort of recommend nginx, because it gives us a lot of useful capabilities, but it should still be possible to use Apache or whatever else is in use. And it should be reliable, and easy enough to replace the nginx upload module setup that we had recommended for a very long time, without forcing additional components on anyone.

So what are our options? I started looking into this, and the obvious thing is to search for "nginx upload" and see if there's something better. There is: nginx-big-upload, which is a Lua script, so it can be installed on most distributions without compiling the whole of nginx. But it's a custom protocol, it's coupled to nginx, and it requires the nginx Lua module, which is easier to get than the upload module, but it also looks unmaintained. Again, maybe it just works, but it doesn't inspire confidence. And as I started working on this, it occurred to me that this is a problem other platforms must have as well; we're certainly not the only ones uploading big files. So, searching a bit further, I found the tus protocol, an open protocol for resumable uploads. What's good about this is that it's an existing protocol, so we didn't have to come up with one; there are many server and client implementations; it's robust and performant; it's mostly external components; and it optionally does checksums. We're not actually using checksums right now, we're just getting some hands-on experience first, but that's definitely a possibility we may want to offer.

So how does it work? The client starts the upload with a fingerprint; that fingerprint identifies the file and the user. With that fingerprint it contacts the tus server and asks: do you already know this file? If yes, we get back the offset of what has already been uploaded, which is usually not the complete file, and we resume from there; otherwise we start uploading the chunks from the beginning. On the first request we pass through the headers, so that Galaxy can verify that the user is actually known, or anonymous, and not just someone doing crazy stuff. Then we go on our way, uploading the data chunk by chunk, and finally we pass the fingerprint to Galaxy in one tiny API request, and Galaxy handles the upload as it did before, except the file is already there. So during the upload process itself, Galaxy doesn't need to handle anything.

So what does this look like on your Galaxy instance when you update to a new release? Galaxy comes with a middleware that acts as the tus server. Obviously that's not as performant as having an external tus server, and it's partly back to the problems of the old chunked API: it doesn't do checksums, and Galaxy restarts will interrupt uploads. For production instances, we want a separate tus server behind the proxy. We have experience with tusd; rustus actually looks even more interesting, but it wasn't out when we started. You just have the proxy route that one API path to the tus server instead of the middleware, so uploads are handled by a dedicated tus server that you can scale and deploy any way you want, and uploads keep going even while Galaxy is currently restarting. And we have the docs for this. There are also additional clients: BioBlend knows how to do tus uploads now, and there is tooling building on that which provides a nice interface. The Galaxy front end uses the tus-js-client, so we didn't have to write it all ourselves.
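For a sense of what a tus client interaction looks like, here is a hedged sketch using the tuspy client; the endpoint path and API-key header are assumptions for this sketch, and BioBlend wraps this for you:

```python
from tusclient import client  # pip install tuspy

# endpoint path and API-key header are assumptions, not a documented contract
tus = client.TusClient(
    "https://galaxy.example.org/api/upload/resumable_upload",
    headers={"x-api-key": "YOUR_GALAXY_API_KEY"},
)
uploader = tus.uploader("reads.fastq.gz", chunk_size=10 * 1024 * 1024)
uploader.upload()    # resumes from the stored offset if interrupted
print(uploader.url)  # session URL, later handed to Galaxy in a tiny API request
```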
And here's a list of official implementations; the unofficial list is even longer. Going forward, history archive uploads will also start using tus, as will file staging with Pulsar; much of the tus machinery can be reused for those cases, and I have that in Pulsar pull request number 35. So if you have questions, I can answer them now.