to drive the slides, and yeah. So Sergey is going to talk about the release testing process for 21.01, which should be out in a couple of weeks. All right, go for it. All right, sharing my screen. OK, so here we go. This is our semi-formal report. The team was Assunta, David, John, Keith, and myself. First of all, release testing is hard and the team makes all the difference. Every single team member was absolutely invaluable, so thank you very much; the work became bearable, doable, and we did it. Moving on, we tried to plan our release testing based on the results of the 20.09 testing. Those results were not very quantitative, mostly qualitative, but at least that is what we based our plan on. The key results from the past cycle: we did 21 tutorials, using them as usage scenarios for testing. We did the tool test result comparison, comparing the tool tests on the new release against the previous runs. We covered only three, and that's an approximate number, only three out of the 350 release notes items for the previous release, and we spent approximately two weeks of time on it. So that was a major commitment, and I daresay we didn't get even close to what we anticipated we would be able to do. There were several recommendations based on that testing cycle. The three most important ones relevant to the release testing process were: one, a more structured testing plan based on some kind of protocol of what to do and what not to do, what to test, when to open an issue and when not to open an issue. Two, open issues more aggressively. In the previous testing cycle we would find an issue, spend a lot of time identifying whether it really was an issue or not, maybe post an issue, maybe jump straight into fixing the bug, and that could take an hour or two, or three or four days. So that was an item for improvement. And we also recommended mixing and matching testing teams so that there are both developers and active Galaxy users; this is the only item we were not able to implement this time around. So, the release testing plan. This is the specific plan we put together and tried to follow religiously. First of all, the scope. GTN tutorials: again, they proved to be a very good idea. They are well-defined, tried and tested scenarios for Galaxy users. So we decided to go through selected tutorials in the GTN, noting any required updates and opening issues where necessary, and especially noting anything that reflected changes stemming from the new release. Second, release notes. In the absence of a formally defined, set-in-stone set of requirements, we decided to use the release notes as a semi-formal itemized list of new items that are supposed to work. We planned to go over the release notes as time permitted, verifying each item. And we did not do tool test results, because luckily they were completed before the testing process started. So, the approach, and these are the specific differences compared to the previous testing cycle. One, we had a much more specific testing protocol. For the tutorials, we defined what warrants opening an issue and mapped out specific requirements for what constitutes an issue.
So for example: incorrect or outdated descriptions of steps that conflict with the current UI, outdated UI elements or outdated screenshots that conflict with the steps presented in the tutorial, and broken links to internal Galaxy locations. We also outlined items which do not require opening an issue. Again, it would be a good thing to open an issue for those too, but it's impossible to do everything given the scope of the release testing. For the release notes, we also specified how to verify that a problem is relevant, how to verify that it is indeed a problem caused by the new release, and what the steps would be for opening an issue if such an issue is not already open: how you tag it, et cetera. So we had this structure on our hands, which helped us move through the release testing process. The second very important change is that we did not fix bugs. We decided to focus specifically on what the release testing team is supposed to do: test, identify bugs, identify issues, and let the dev team fix them, as it really is supposed to be. For the time commitment we planned two full days; I daresay it was not two full days in the end, it was probably more like one week, but that was the plan. And moving on to the results summary. Thanks, Sergey. I'm going to take you through the results of what we did. We went through 24 tutorials and 30 release note items, opened 26 issues, and nine of those are blocking issues. Of the 24 tutorials we covered, four were admin tutorials, five were introduction tutorials, and 15 were Galaxy interface tutorials, so a pretty good spread. Of the 30 release note items we covered, three were considered enhancements, two were built-in tool updates, nine were new data types, two were new visualizations, and 14 were highlights. Of the 26 issues we opened, nine were in the training material repo and 17 are in the Galaxy repo itself. And like I said earlier, nine of them are considered blocking issues. And that's what we've done. Keith, are you there? Yes, I am here now. Sorry, I was a few minutes late to the meeting. Here are just a couple of examples of the problems we identified. As was pointed out, we found 26 issues, and these are just some examples. In the UI, #11443, dataset headers are not being displayed. There's a problem where text wrapping can break words incorrectly in the UI, so that will most likely be a CSS issue. Some controller routes fail when you reload; I think that was a known issue. Under server errors, there's the Rule Builder problem, #11437, which on occasion would return an empty list. And sometimes, #11451, there are unhandled errors when creating tags. Most of the tutorials were OK. There are a few outdated screenshots and some outdated or inconsistent steps; most of those fall into what Sergey said, things that we decided weren't important enough to actually open issues for. And then there are some other common ones. The last point there, #11504, the clear search button doesn't always clear the text. #11465, data libraries, the search triggering a refresh. And there was an issue, #11503, workflow best practices, that has already been addressed and closed. So if people want to, they can just take a quick look at those. So, moving on to blocking issues.
Sorry, I'll talk about this slide, but one thing I wanted to mention, listening to the talk, is that these 26 issues came after the whole team and the extended community spent a week on the Smorgasbord, a week getting ready for it and a week doing it. I know that many people spent more than a week getting ready for it, in terms of Anton yelling at the core team to get it done. So there were a lot of hands on all of those tutorials, and this release had a tremendous amount of testing already. And then the release testing team took over and still found 26 issues. I think that's really a testament to the thoroughness of the team, and I have nothing but respect and appreciation for Sergey's effort and everyone's. There are still nine blocking issues. I'm not sure that all of these were discovered as part of the release testing process, but some of them were. I'm trying to tackle the Rule Builder ones here; there are two of those open. I think Marius is taking care of most of the Sentry issues, but I think there's still one there. If anyone else has time to jump in and take one of these on, that would be great. I went through and pruned this list; it had about 40 items on it at the beginning of the day, but I pruned a bunch down, and I think these are the ones that are blocking the release. They seem like regressions, and if they're not regressions, just go in there and say this was broken in the last release and it needs to be pushed off. But yeah, those are the blocking issues. Okay, moving on. So that gets us to the testing process analysis. It's not a formal analysis, but a semi-structured way of summarizing the conclusions we came to after we looked back at the process. First of all, we made it better, I think, or so I hope, and these are the main points compared to the previous release testing cycle. First, it is much, much easier to do this when the protocol is detailed. I'm the only one on the team who did the previous testing cycle, so this is based on my opinion only, but the opinion is strong: I believe this really made a significant difference. The steps we outlined, check whether the issue already exists, if it does not, open an issue, tag it with this label, tag it with this milestone, et cetera, actually help. When you look at the list of items that constitute an issue that should be raised on the training material repo, it actually helps. When you see that, aha, this can be ignored, you immediately forget about it and move on to the next tutorial or the next batch of release notes to look for things to detect. Second, and this is significant, we actually did the release notes. Last time we completely ran out of time; there were 350 release note items in 20.09 and we covered three, but I daresay we covered none, because those three were important items anyway, in addition to the release notes. This time around, we actually covered 30 items from the release notes. Of course we prioritized, so we did not cover them randomly; we focused on the items listed at the top of the release notes. So, highlights; there were no security notices and no deprecation notices this time, but those are the kinds of items we would suggest normally including in any release testing cycle.
So these 30 items from the release notes are legitimate items which were examined to some extent and given a yea or nay. Three, the availability of VMs was very helpful. We had virtual machines; we underutilized them, but they were available. First of all, thank you very much, Keith, who set them up. I can't even guess how much time, effort, sweat, and pain that required, so thank you. And I'm sure we should utilize VMs more next time, both for GAT testing and also for comparative release testing, when you need to verify whether an item is indeed a problem on the new release but not on the previous release. So that was great. And probably the most important boost to our productivity was the fact that we focused on testing, not on testing and fixing. Thank you, Marius, for suggesting that at the very beginning of the testing cycle; this made a huge difference. Again, I remember that last release cycle we would discover an item and possibly spend hours, if not days, trying to figure out how to fix it, and that involved learning about this or that part of Galaxy for some of us on the team. This time we would identify an item and immediately move on to looking for more, and that helped us cover 30 items from the release notes. So that was great. And moving on. Isn't that great? Yes. For the VMs, was that on JetStream or in a commercial cloud? JetStream, Keith, JetStream, right? You're muted. Sorry, yes. Those were on JetStream. I just spun up one instance on a cluster on Google to do some last-minute testing with the AnVIL stuff, but that wasn't formally included as part of our release testing plan. Great. That was my next question: presumably, if it works on JetStream, that gives us a huge leg up for working in other cloud environments, including AnVIL or for ITCR. I guess at this point that may be beyond the scope of what we can do comprehensively now, but I think moving forward that needs to be part of the test plan. Well, we've talked about that a little bit: the next time around, we would have VMs on all the cloud providers that we can, JetStream, Google. I don't know if we want to do Amazon or something like that, but have those available just as a sanity test to make sure that they work. And some of these things require admin access, so people need to go in and actually run Ansible playbooks, and they can't really be doing that on Main or EU. Yeah. Great. Thank you. Moving on. Okay, I guess this is my slide. Thank you, Sergey. As John mentioned, the week before, people had been doing a lot of testing for the Smorgasbord. So when we ran through the tutorials, or at least when I ran through them, particularly the admin training stuff, there were no major problems found in the tutorials themselves. As for working through the release notes, we did identify a couple of issues, and one of them is simply the number of items in the release notes. There were, I think the slide says, 350 in 20.09, down to 283 in 21.01. There's no way a team our size, given the amount of time we had, could make it through that, even if we did find a way to do it efficiently. You get the sense that you're trying to empty the ocean with a teaspoon. You can try to whittle away a few of these items, and the list of outstanding items doesn't really get any smaller, so you don't really have a sense of making progress.
With the first set of issues, there were a dozen or so, and we were able to knock those down; you see the list getting smaller and you feel like you're achieving something. Then you get into the release notes and you're just swimming and treading water without any noticeable progress. One of the problems we had is that it was not always obvious to us how to prioritize these tests. Some items were highlighted as being important, but for the rest of them it was really hard work to figure out which were the most important ones we should be dedicating our effort to. And then when we did identify them, it wasn't always clear how we should go about testing that release note. Rather than saying, oh, click this, go do this, make sure this output appears, we'd have to go read PRs, and very often the release note would reference two to four PRs. So we spent a lot of time just figuring out what needed to be tested and how we would go about doing that testing. Once we did figure that out, there would be a lot of duplication, things that could or should be done by the automated testing. And not all of the items are testable on Main; they might require admin access or they might require tools that are not there. And as the last item, some items shouldn't be manually tested at all. If we've got a limited amount of resources in human hours, we need to identify what actually requires human intervention and which tasks can just be automated. Okay, and then some ideas that we had for improvements. Assunta came up with the idea of a PR template, I believe it was Assunta, I hope I'm not missing somebody for the credit: just a common template for PRs. People can go look at that pull request; I think it has actually already been accepted. It includes some details on what the PR does and how to test it, and ideally at least some screenshots from the person submitting the PR, maybe even a little animated GIF to show how to recreate the problem or how to test it. PRs could be tagged for items that need special attention, going back to the previous slide where we want to identify tasks that require human intervention; maybe we could have some tags in there that say a human needs to verify this. The use of the Galaxy admin tutorials was a great idea. And we also talked about maybe a greater focus on automated testing, perhaps creating some Selenium tests that basically mimic some of the tutorials, stepping through them, so that they could just be automatically kicked off as part of the release test. There is a comment in there about Sentry; somebody else will have to talk about Sentry, I'm not that familiar with it. And then the idea, again, of having a side-by-side test where we run through something on Main and on a VM, potentially with a previous version, so we can easily compare what was going on in the previous version against what is going on in the release version under test. Thank you, Keith. So all of these items are open for discussion, but before we jump into discussion, I just want to say two things. One, the PR template has already been merged. I encourage everyone to take a look; it is fantastic, and I think it should significantly improve release testing, among many other things. And there is a very detailed discussion in that PR.
So essentially it's a work in progress which we have merged so far, and we will try to improve it as we go. Secondly, Git GAT: Helena agreed to briefly introduce this, and we think it is going to be very, very useful to incorporate into the subsequent release testing cycle. So Helena, if you could just speak. Sure, absolutely. Thank you. So for the admin training, we've previously had some issues keeping the playbooks, as they are actually constructed, in sync with the tutorials, because the tutorials get updated every year: we add new features, we add new steps, we sometimes add new tutorials, and we need to keep the playbooks that we want to show as demonstrators updated. So this year we have replaced the playbooks, or sorry, we now represent every change that gets made in the tutorials as a diff. And I wrote some scripting to pull out all of these diffs, stack them up into a Git history, or pull them out as patches and apply them to an empty Git repository. And then we get this repository we're calling Git GAT, because it really rolls off the tongue. It gives a snapshot of the admin training materials at every single step, and it should be absolutely invaluable for testing, because then we can just go backwards and forwards in time. One of the biggest problems for me when testing admin training as we get ready for GAT is that if we need to make a change in a role, sometimes that change affects every single downstream step, so we have to go back to where that change starts to apply and then replay the playbooks from there on a fresh VM, which is really painful. With this, hopefully, we can say, okay, we've got all of our Git history and we can just check out specific positions and run the playbook there, or up to there. Yeah, I think that's going to be great. And let me know how we can help and what synergy we can get going here, because for the next GAT, okay, not the GCC GAT but the next proper GAT, I'd really love to see us running Molecule tests that run these playbooks against an empty VM, making sure that Galaxy gets set up, that the playbooks don't fail, and maybe running some Selenium tests, if any are available, to make sure that the features being deployed are actually enabled and working. Great, thank you, Helena. I'm sure we can utilize it and then we can actually help each other.
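As a rough illustration of the patch-stacking approach Helena describes, here is a minimal sketch, assuming each tutorial step's diff has already been extracted to a patch file; the directory layout, file naming, and commit messages are placeholders, not the actual Git GAT tooling:

```python
# Sketch only: build a Git history where each admin-training step becomes one commit,
# by applying pre-extracted patch files to an empty repository in order.
# "tutorial-diffs/" and the *.patch naming are assumptions, not the real layout.
import subprocess
from pathlib import Path

repo = Path("git-gat")
repo.mkdir(exist_ok=True)
subprocess.run(["git", "init"], cwd=repo, check=True)

for patch in sorted(Path("tutorial-diffs").glob("*.patch")):  # one diff per tutorial step
    subprocess.run(["git", "apply", str(patch.resolve())], cwd=repo, check=True)
    subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", f"Step: {patch.stem}"], cwd=repo, check=True)
```

With a history like that, checking out any commit gives the playbook state at that point in the training, which is the going-backwards-and-forwards-in-time testing Helena mentions.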
So all of these items are open for discussion. What do you guys think? Which tutorials do you want to automate with Selenium? Is that the admin tutorials or the user-facing ones? The science tutorials? Both, all of them. Okay. For the user-facing ones, it might make sense to use the workflow testing. Yeah, I know the slide said that everything went well in the tutorials, but I actually encountered quite a few issues doing the advanced workflow tutorial. There were three sections and I think we hit issues on two of them, and it was great to hit them and get that all resolved, and I appreciate everyone who helped, because it was a big community effort. But for all of the little user-interface-element tutorials, having the workflow test isn't quite good enough, right? Because these are screenshot-driven things where you're really showing how to use actual interface components. I would think all of those would be really good candidates for having screenshot tests. And I know Oleg is doing some work in this realm, trying to get the rule-based uploader test to automatically generate screenshots for the training materials. If we could just extend that to the workflow UI training, and maybe histories and certain other things, I think that would be really fantastic. Definitely, I would definitely support that. Yeah, there are definitely three categories: the science tutorials, which can be tested with workflows; the interface tutorials, which need Selenium; and the admin tutorials, which need special things plus Selenium. But yeah, that's a good point. We should definitely get all of the UI tutorials under Selenium tests if possible. I would love to use what Oleg is currently building for all of the UI tutorials; that would be a real bonus for us. Yeah, so I guess this call is good, there are some PIs on the call. It would be nice if the admin and testing groups could take on automating the testing of Git GAT as maybe a quarter-two or quarter-three task. And then, yeah, just continuing to support all of Oleg's work and doing whatever we need to do there to help with that and expand its scope. Those are great projects, so they should continue to make the roadmap, I think. Definitely, speaking on behalf of the admin community, we can add that to the Q2, Q3 plans.
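For a rough idea of what such a screenshot-driven interface test could look like, here is a generic Selenium sketch; it is not Galaxy's own Selenium framework or Oleg's work, and the instance URL and element ID are placeholders:

```python
# Sketch only: load a Galaxy instance, wait for a UI element, capture a screenshot.
# The URL and the "masthead" element ID are assumptions for illustration.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get("https://usegalaxy.org")  # placeholder instance under test
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.ID, "masthead"))
    )
    driver.save_screenshot("masthead.png")  # image could feed back into training material
finally:
    driver.quit()
```

Runs like this could be kicked off automatically as part of the release test, with the screenshots compared against the previous release or reused in the tutorials.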
I also wanted to say, it's something that you mentioned in the pull request that added the pull request template, but I think tags for things that should go in the release notes would also be really good. That makes it much easier to write the release notes, and it also makes it clearer what needs to be tested. Because I don't think we need to test everything, all 350 items; that's of course not possible, and also not useful, because a lot of internal changes have zero impact on anyone. Maybe they fix some bugs or move some code around, but not all of that is relevant to test. But tags for the things that we should test, I think that's a great idea. As a data point, it works well on the training materials: we have highlight, new contributor, improvement, and new feature, and these work well for pulling out things we want to talk about. I mean, we have enhancement, right? We have to decide between bug, refactoring, and enhancement, but still, most of that is not interesting. There are certain things that either should be in the user-facing release notes, or are going to have an impact on admins, or might need updating of the admin-centric tutorials, and things like that. I think we need some additional tag for that purpose. There's some feedback, so I'm not really hearing you. Sorry, I said I would not have known to use enhancement for that purpose. So yeah, maybe we should standardize some tags. Yeah, I think there's another one, kind/feature, more than enhancement, because we tag as enhancement all the ones that are not bug fixes, I think. And just to reinforce the point, I think it would probably be useful for the testing team to coordinate with whoever is writing the user-related release notes. What was the correct term? User facing. User facing, thanks. Because, more than the 300 things that we put in the global release notes, it's the user-facing ones that need to be tested by the team. Yeah, that's my suggestion. So up to this release, I usually started, I mean, Helena wrote the user-facing stuff and I pulled out all the remaining things; I stepped that up a little bit this release. But yeah, when we branch we should already do this, not some weeks later. It wasn't really clear whether the testing was going to happen or not this release, but next release we can plan this better. I looked at the participants, and I don't think Jen is on the call, but I know that in the past she's used other repositories with tags to mark, this is an issue that she encountered on Main and she needs to recheck it, or a note to herself to recheck it once the new release is deployed. So yeah, the tagging thing I really like. I've only thought it through in terms of the release notes, other than that we already sort of organize the initial batch by priority and such. But if something has a tag that says recheck on Main or recheck on EU, it would be nice if the QA team, the release testing team, could do that, and to track it and have a process in place for it; that would probably help Jen a lot. Yeah, so I did want to say, it would require some buy-in from the committers to start using tags more aggressively for describing how things should be tested by the QA team, but I really think that's a good idea. Should we maybe have designated testing-related tags, maybe with a test_ prefix prepended, so that we don't have to go hunting for the appropriate label, and also keep them to a minimum, not treating them as exhaustive and covering all possible cases, but using them only when really necessary? Yeah, I think that would be good. And I might even use QA as the prefix, because there are already some test tags in there, but the question is just how we can improve the whole overall QA process this way. And again, to clarify, I think it's doable and a very good idea, and I'm not pushing back on the idea of tagging. I'm just trying to make sure we keep it manageable and don't end up in a situation where we have so many PRs tagged, and then we add another tag, and for that set of tags to make sense it is supposed to be exhaustive and mutually exclusive, so we have to go back and appropriately tag another 200 PRs; that would be very difficult and definitely not accurate. Right, no, I agree completely. Yeah. I mean, we have to tag everything with kind and area anyway, so one more for, this needs this kind of testing or that kind of testing, that's not too much overhead compared to actually looking at the PR; that's fine. The default could also be a no-QA label, and then you decide, instead of QA-none, oh, this is QA-main or QA-admin or whatever it is; then it's just a pure bonus, I guess. Yes. And yeah, I think the pull request template will help too, right? If there are automated tests, that really reduces the burden of needing to write QA notes. Although, I mean, Debbie, for example, with these automated tests, I wouldn't say that automated tests prevent having to test things manually, because the Rule Builder had tests from the beginning and it's still not bug free, right? There are just so many things you can do, and the automated tests aren't really ever fully comprehensive. Let's say, but for a typical PR, not one of... Something else, I'm sorry. I'm sorry that I picked on one of you a bit; that example just came to mind.
I was going to pick on the best practices filter, though, which, David, you found the bugs in, right? Yeah. So a big, sprawling PR from Marius or Sam or John that touches 15 things probably still needs manual testing. But the typical PR is really changing something rather small, and so I think it's realistic that the tests would cover it. I think 90% of my PRs are adding a tool feature that has a tool test, right? Okay, the QA team does not need to worry about that. But yeah. It's not mutually exclusive, either: if a PR doesn't have automated tests, it doesn't necessarily mean that it needs to be tested manually. It might just mean that it's a refactoring, or a new feature somewhere that deals with the internal guts of Galaxy and has nothing to do with user-facing stuff, and it just misses a test. Yeah. I guess my major point, though, was that having the PR template will hopefully really help drive those QA tags. Yeah. I'll shut up now. Marius, did you want to talk more about Sentry? Because I thought that was really awesome. Yeah. I didn't prepare anything, but just in general, and I guess I shouldn't share it because things like emails appear in Sentry. But Sentry is this log aggregation system. You can connect your Galaxy instance to Sentry and then, I don't actually know whether it's all logs or just logs above a certain log level, for instance warning, appear in Sentry. So when we make the switch to a new release, you can check after a day: are there any new errors and exceptions? And I did find a couple of things. Because we have a lot of error-ish or exception-ish type issues in Galaxy, if you just go to Sentry and look at all errors, you will feel even worse than if you just look at the issues that went into a Galaxy release, because it's a lot. Most of them are fine, not really a problem; a lot of it is things like files that have been deleted but somebody is still trying to access them. The point is that when you make the release, you can look for new things, because they will appear as new, never seen before. Sentry also groups them, so you can see whether something is a new thing or not, and that's a good thing to do right when you make the release switch. These exceptions and errors can also be linked to issues, so you can say, okay, this issue should close it, and then when it reappears you see, okay, that didn't actually work. You also find a lot of code that nobody uses or that is almost unused. Occasionally you see there are only one or two errors per year and it might not be worth fixing, but sometimes you also see, whoa, there are 10,000 errors coming in since yesterday, something is really wrong. Yep. Recommended to everyone that runs a Galaxy instance: set up Sentry.
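For reference, a minimal sketch of wiring a Python application to Sentry with the generic sentry_sdk client; Galaxy itself is typically pointed at Sentry through its own configuration (a DSN setting) rather than code like this, and the DSN below is a placeholder:

```python
# Sketch only: generic sentry_sdk usage, not Galaxy's built-in integration.
# The DSN is a placeholder; log records at or above event_level become Sentry events.
import logging
import sentry_sdk
from sentry_sdk.integrations.logging import LoggingIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[
        LoggingIntegration(
            level=logging.INFO,           # INFO and above are kept as breadcrumbs
            event_level=logging.WARNING,  # WARNING and above become Sentry events
        )
    ],
)

logging.warning("rule builder returned an empty list")  # shows up grouped in Sentry
```

After a release switch, exception groups that Sentry has never seen before stand out as new, which is the check Marius describes.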
Are there any other discussion points? I have a question, out of curiosity, if there's time, about the VM you used for testing: was it set up with all the tools that you needed for the tutorials, or were the tutorial parts run on usegalaxy.org or something? Keith, how did you run yours? One second, let me unmute myself. Yeah, that was actually an issue that I did run into, now that you mention it. I went to run through one of the tutorials and it wanted a tool that wasn't installed on my VM; to create the Galaxy instances on the VM I basically used the Galaxy Ansible playbook. So whatever set of tools comes out of the box with that is what you get... That's nothing. Yeah. So part of the provisioning could then be installing any required tools for the tutorials. So I think that's... Marty? Oh, sorry. Sorry. I wanted to say this kind of builds on what Mike had said, that I think it would be great for future releases to have more platforms included in the testing. So each team member takes on a different deployment, starting with local instances, the usegalaxy servers, AnVIL, Amazon, whatever. A lot of those already have the tools available, so testing the admin capability could be done on a local machine, but testing the tools, without having to take on the overhead of installing any of them, could be done on one of the platforms that provide that, and that would serve a dual purpose. What is the tool testing? What do you mean? I didn't understand what you mean by testing the tools. I don't mean the tutorials; for those you've got to pull up the tools. Oh, okay. The tutorials I understand; the tools I don't quite understand. I just want to test the tools. If we install something bare-bones on a VM using just the plain Ansible, you'll just get the empty tool panel, and so you can't really run many tutorials, right? Because the tutorials will need the reference data and they'll need tools. So we need a more complete deployment, and setting that up one-off just for testing on a throwaway VM seems wasteful. We can mount CVMFS; I think that's a fairly standard setup. And there are lots of scripts for installing all of the tools needed for a tutorial, and if that's not easy enough, we can look into making it easier for the testing group. That's the point. Generally speaking, I guess we have some time depending on whether anyone's interested, but you can mount CVMFS on your laptop, right? And this is the easiest way to run tutorials or develop workflows, especially if you also need to look at whether there are some errors or exceptions occurring. So that's really valuable, and since this is a developer audience, I don't think it's too crazy to make sure that you know how to do this. And, to the point being made, it would help tremendously if that testing included some of these automated deployments, where as a user you're able to launch these machines, such as AnVIL or one of the cloud deployments, because that's how those are set up at the outset. You still have complete control and can see the logs, but it would make sure that those deployments are in fact tested with the new release. I mean, I like the idea, but there are also a lot of us that don't have access to AnVIL or any of that. I mean, that's fixable; the same way Keith set this up on JetStream, you can set up machines for others. Yeah, and currently, as I said, there's an AnVIL installation running on Google Cloud right now that is installed via the Helm charts. And if the admin team has scripts to install a lot of tools for tutorials, those could definitely be included in the provisioning system, so you say, oh, I want a VM to test this tutorial, and we can provision that for you easily enough. Yeah, but the AnVIL system is going to have CVMFS, right? So probably just having that set up would be enough, I think, presumably. I mean, if you want to test a tutorial and the tools are not there, then we can install them on Main, because that's what we've done for the Smorgasbord.
And I think as long as there's a tutorial that describes how to use these tools, there isn't a problem installing them on Main. Or, to put it another way, the number of tutorials you can run on Main is probably not that high, although it's a lot better now, thanks to the Smorgasbord. Well, we're moving into this kind of heterogeneous environment; for better or worse, the reality is we're in a heterogeneous environment where there are going to be multiple deployments that will all require some love and attention. Here's a question, related, but not quite what we were just discussing. How close do you think we are to requiring any new feature or any new bug fix to be accompanied by an automated test? And how much do we need to lower the bar for the ease of writing a test for that to become a reality? Because many, many major projects have that requirement. I mean, I would love it, right? We've discussed it many times. But it's complicated, because it's a project with a lot of different areas that can be tested in a lot of different ways, and not all tests are good. You can definitely cripple development by writing tests that fail, right? I've written my share of brittle tests which make updating very difficult; I'll fix that. I would argue for good judgment on the part of the reviewer. Maybe we should request more tests when we feel like something might not work or might have side effects we didn't think about, but I think requiring tests for everything is probably going to slow down development. I would push back on that and say that judgment is going to be applied unevenly, and it's much easier to request tests if there's a formal policy in place. And I do think it will slow down development, that is the case, but that's probably a good thing, because the things that are getting in should be tested. I have a thought, and then I lost it. I would argue that adding tests actually saves time when fixing future problems, so even though you think it's slowing things down, the net cost is actually much less. Oh, and don't get me wrong, it's even better to start with a test before you start writing; it's just that, you know, some things are difficult to test. One thing I was going to suggest is that bug reports or issues, when opened, if possible include a test that exposes the bug or the thing that's not functioning properly. Then whoever comes in to work on that bug simply has to write code until that test passes. So an issue comes with a failing test, and the issue is closed when the test passes. That's exactly what I was going to suggest. I really like the idea that if you found a bug, you make a test for it; that's a beautiful idea. I mean, people that open issues are not paid personnel; people that open issues are people that found that something is not working. He's right, though: if you're addressing an issue, it would be good if that issue came with a test. I'm afraid that if you've written a test that exposes a bug, you're that close to fixing the bug, because writing the test to expose the bug is usually more difficult than fixing the bug itself. Yeah, but the test is for the future. And it also depends on the nature of the bug. If it's very trivial, then yes, writing the test might be more work than actually fixing the bug, but you still need the test to demonstrate that it's been fixed. I completely agree with Keith that fixing a bug must come with an accompanying test.
What I'm saying is that maybe that test should be part of the PR and not part of opening the issue, because one, as Marius said, we don't pay the people who open our issues, and secondly, maybe it's not a dev or an admin who discovered the issue, so they wouldn't know the first thing about writing a test. Circling back to Marius's point about requiring tests versus not requiring tests, maybe what we could do, as a step in the direction that Sergey recommended, is that if you're not including a test in a PR, you've got to have an explanation why. So take that pull request template that Assunta put together and push it a little bit further and say: by default, all PRs require tests; if you don't have a test, explain how to test your PR manually and explain why it's too onerous to write an automated test for it. I think that might be a good step in that direction. Another step in that direction could be to differentiate between core contributors and external contributors. Adding tests always has the difficulty that you raise the bar to entry, right? And testing, as you said, is complicated, so you might push away new contributors. But what we could do is actually require tests for all committers, for all core contributors. Or, what would also be easy enough to do: I think we have these areas which we change, like API or frontend and so on, and we could require tests for one of these areas specifically, say every API change needs to have a test, and so on, because we know writing tests for certain areas is easier than for other areas. But we should take into consideration that as soon as we require tests at whatever level, we raise the bar to entry. I mean, it goes both ways, right? I think the only thing I'm a little concerned about is making it a hard requirement; otherwise, yeah, encouragement is always good, because the more tests we have, the more examples there are of how to use the thing that you're trying to fix. Maybe we can have a label in bright red, requires a test, and then the core contributors can find those open PRs which just require a test, and if it's a PR by a member of the community, well, help out. Yeah, I mean, that's still going to really slow down community contributions. Yeah, I think any hard requirement for tests just has to be for the paid people, the core committers and the Galaxy team; I don't think we can force community people, if they're reporting an issue, to include a test with it. Maybe there's some way we can automatically generate a test that just returns false or throws an exception by default, and then part of fixing the issue is writing the real test as well. At this point, it's not even clear to me how we would identify those people. There are so many; are Björn's people those people? We can start with the committers group, right? That is automatable. Yeah, that is the clear group. Is there any committer here who would object to requiring their PRs to have tests? I mean, as long as we can explain why there are no tests, sure, right? As long as there's an opt-out. Yes. But if we're going to have an opt-out, we might as well just make an opt-out for everyone; I think we could just have a universal rule. An opt-out reason could be, I'm a first-time contributor, I don't know how to write a test for this, right? Or, you guys already have tests there, I'm just changing something.
I mean, refactoring, right? Things like that are small and covered by tests already. Oh yeah, if it's already covered by tests, then you just check it: this is already covered by existing tests, right? And Nate, the answer's no; I think Nate has to write two tests. Nate has a backlog of tests he needs to write first before he's allowed to contribute anything else. I think that unit tests should be mandatory, because there's no setup required for writing unit tests, but for integration tests I can see situations where writing one is not easy, so maybe we could have exceptions for those only. So... A unit test can be not easy in the context of Galaxy, because you might be fixing some obvious thing that is part of a 300-line method which is not unit testable unless you refactor it, and that's potentially a can of worms. Not that we don't need to address it, but if we're core committers, staff, yes; if we're a community member, maybe that's too high a bar. For instance, Kvan, your PRs on the tool testing thing, right? That's much more easily addressed with an integration test at this point. Having to unit test something in basic.py, it's going to be a big test and a worthless test; it's all of those pieces moving together that I think are useful, just because of how it's designed, poorly, right? Designed poorly 300 years ago, and we can't fix it now. But... In Galaxy there's no unit testing because, you're right, there are no units. But if you're writing new code, there should be unit testing, right? I sort of disagree, and I think it depends on the component. If you've got a nice isolated component, a nice isolated unit test is fine, but some API functionality, the database stuff, it's all quite integrated. Every time I've tried to write unit tests for, I mean, I'm getting better, but for tool stuff or job stuff, I mocked out so much stuff to get to the unit test that I regret ever even writing those tests. And if there's an integration test, I don't... Yeah, and this is why I think the PR template's great, because it says an automated test, right? Sometimes a unit test is more appropriate and sometimes an integration test is more appropriate. Yes, as John said, unit tests are not necessarily easy in Galaxy. I wrote my share of brittle tests where I mocked out this and this and this and that, and what that leads to is greater test coverage, but we have to change and adjust those tests all the time, and ideally a good unit test should not have to be changed. So there is this decision on how much to mock, what to mock, and how to mock it, and this is something a first-time contributor is not capable of deciding on their own. I mean... We're not mocking at all, because I don't like to mock. I don't like to mock. But that's just a personal bias, a personal opinion. And mocking is pretty much a necessary evil for any sort of moderately complex integration test. Unless you dependency inject. Yeah, but my problem is that when mocked tests fail, I usually spend more time understanding what the mocking libraries do rather than what the problem was, because it's not as readable as Python or Java or whatever, so it's kind of an extra burden. That's because the mocking implementation sucks, not because the approach sucks. Yeah, I think that too, and the implementation is terrible; we try what we can. And not just mocking, I mean test doubles in general, whatever they are: fakes, mocks, stubs, all of this.
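As a small illustration of the issue-comes-with-a-failing-test idea discussed above, here is a pytest-style sketch; the function and the text-wrapping scenario are hypothetical stand-ins, not actual Galaxy code, and the example just mirrors the word-wrapping bug mentioned earlier:

```python
# Sketch only: a regression test attached to a bug report. It fails while the bug
# exists and passes once it is fixed. All names here are hypothetical.

def wrap_label(label: str, width: int) -> str:
    """Hypothetical stand-in for the UI text-wrapping helper being reported against."""
    # Buggy behaviour being reported: the label is broken every `width` characters,
    # which splits words in the middle instead of wrapping on spaces.
    return "\n".join(label[i:i + width] for i in range(0, len(label), width))


def test_wrapping_does_not_break_words():
    wrapped = wrap_label("dataset header", width=10)
    # Expectation from the issue: every piece of the wrapped text is still a whole word.
    assert set(wrapped.split()) <= {"dataset", "header"}
```

Whoever picks up the issue then writes code until the test passes, and the test stays behind as regression coverage.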
All right, it looks like the meeting's done. We got to that important integration-versus-unit-test debate. Yeah, so... a testing framework for our testing framework. And the next step: the Q2 roadmap for the testing group. All right, thanks everyone for your time. And thank you everyone, and thanks to the release testing team; I think you all did a fantastic job. Super impressed. Yes, great. Thank you. Good job. Thank you.