So this talk is really more about sharing an issue that I currently have with some tool integration, and I would like to have feedback on it: is it just my issue, or do we have a way to fix it? Because currently it means a lot of work for us. So, as most of you probably know, I have been working a lot on microbiome lately, and especially on metagenomic assembly. When we do metagenomic assembly, we take a sample of, I don't know, soil, gut, whatever, where there are microorganisms inside. Then there is a sequencing step, where we sequence the DNA from the organisms. You can see the different microorganisms as the different colors in this image. I'm sorry if it's not really easy to read; I drew it quite quickly, so I hope it's still readable. So you sequence and you get these small pieces of DNA from different organisms, but we still have no idea which organism they belong to. One step we can do is to classify these sequences to identify the taxon. Another thing that is happening more and more is to do assembly: we use specific tools, called metagenomic assemblers, that try to come up with longer sequences by combining these short sequences into contigs. And when we do a metagenomic assembly, the next step is usually binning these longer sequences, because the contigs still don't represent the full genome of an organism. So we need to somehow group these contigs into groups that correspond to a species, for example. So we bin them: we try to cluster them into groups, or bins. These are called metagenome-assembled genomes (MAGs). And for doing this metagenomic assembly, there are two possible approaches.
Usually, if you want to learn more about the microorganisms in a soil sample, for example, you don't take just one sample. You take several samples, maybe at different time points, to really have a good representation of the microbiome and of the different microorganisms in this soil. And then you can compare it to another soil, for example in another location. So you do a lot of sampling to get the data. And when you do the assembly, especially metagenomic assembly, you can use two approaches. You can use what is called co-assembly, where you take all the reads from all the samples and assemble them in one run: you somehow merge all the reads together before running the assembly, to get the contigs from all the samples in one output. But there is another approach that is becoming more and more used, called individual assembly, where you assemble the different samples in parallel. Then you process them in parallel: you do the binning in parallel, you do different things in parallel. And at the end, when you do the clustering and clean the bins that you got, you combine the different samples together to identify the metagenome-assembled genomes. So those are the two approaches. What we realized by trying to do that in Galaxy is that we have several issues. When you have multiple inputs, so these multiple samples as inputs, we need to explicitly add these two modes. When you have a collection, by default only the co-assembly approach is implemented. For example, when you look at metaSPAdes, whatever the type of data you have here, you see that there is always a for loop over all the inputs.
So if you have a collection of data, for example, in the metaSPAdes wrapper for Galaxy there is by default a for loop that goes through all the samples and adds them to the same command line. So by default metaSPAdes is run only once. So we have to explicitly add an option: which mode do you want to use, co-assembly or individual? And we put that in the command section of the wrapper to say: if you use co-assembly, yes, you do a for loop and make only one command for all the samples together. If you do an individual assembly, you don't do a for loop, and several jobs will be run, one for each of the samples in the input. So that's the issue we had, and we have to do that. And we have even trickier cases. For example, in this case we add quality control. So we do individual assembly, then we have the contigs on the second row, and then we do a quality control where we compare the contigs to the input samples. How it should work when we do individual assembly: we should be able to run the quality control on each sample individually, with a match between the two collections we give, where the raw data are the first collection and the contigs are the second collection. The QC is then run as many times as we have samples, and each sample is matched with its own contigs to run the quality control.
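The two modes described above can be illustrated with a minimal Python sketch. It is not the actual wrapper code: the `assembler` command and its `--pair` and `--out` flags are hypothetical placeholders, and the file names are made up. It only shows how a for loop inside the command yields one job for all samples, while dropping the loop yields one job per sample.

```python
from typing import Dict, List, Tuple

# (forward reads, reverse reads) for one sample; file names are made up.
ReadPair = Tuple[str, str]


def build_jobs(samples: Dict[str, ReadPair], mode: str) -> List[List[str]]:
    """Return one argument list per job that would be launched.

    `assembler`, `--pair` and `--out` are hypothetical placeholders,
    not the real metaSPAdes command line.
    """
    if mode == "co-assembly":
        # One command: the for loop appends every sample to the same call.
        command = ["assembler", "--out", "coassembly"]
        for forward, reverse in samples.values():
            command += ["--pair", forward, reverse]
        return [command]
    if mode == "individual":
        # No for loop inside a command: one job per sample instead.
        return [
            ["assembler", "--out", name, "--pair", forward, reverse]
            for name, (forward, reverse) in samples.items()
        ]
    raise ValueError(f"unknown assembly mode: {mode!r}")


samples = {
    "S1": ("S1_R1.fastq", "S1_R2.fastq"),
    "S2": ("S2_R1.fastq", "S2_R2.fastq"),
    "S3": ("S3_R1.fastq", "S3_R2.fastq"),
}
co_jobs = build_jobs(samples, "co-assembly")
individual_jobs = build_jobs(samples, "individual")
print(len(co_jobs), "job(s) for co-assembly")
print(len(individual_jobs), "job(s) for individual assembly")
```

In Galaxy itself the per-sample jobs would come from the framework mapping over the collection rather than from a list comprehension; the sketch only mirrors the resulting commands.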
But with one tool, QUAST, I was investigating the implementation and found something a bit trickier. If we have a paired collection as input for the samples, the collection produced by the assembly is not a paired collection. It's a simple collection, because we have only one output from the assembly, one contig file: even with forward and reverse reads as input, the output of the assembly is one contig file, so that's good. But then when we do the quality control and give the paired collection of samples as one input and the contig collection as the other input, there is a mix of for loops and non-for loops on some aspects. So yes, several jobs run, as many jobs as samples, but in each command all the contigs were put in as inputs for the quality control. So the raw reads from sample one were compared to the contigs from all the samples, same for sample two, et cetera. And what this quality control does is a mapping of the raw reads onto the contigs. So in this case the reads of each sample were mapped onto the contigs of every sample, every time. So a lot of computational resources were wasted, because of the way it was defined in the command section of the wrapper. And another case that I found in this same tool: when we have a forward and a reverse collection, everything was concatenated into one, so there was not one report for each of the samples, but everything was aggregated into one report
where everything was compared to everything, so it was a bit of a mix. So what I wanted to share here is that currently it can be quite tricky to identify when this happens, these cases where reality is different from what we expected, and we then need to modify the XML of a lot of tools to add this assembly mode. Sorry, I'm not sure if that was a question or not, but okay. So we need to modify the XML of these tools to add this assembly mode, unless there is another solution, which I would really love to hear, and then we need to remove the for loops in the individual assembly mode in the XML. We already started to do that on several tools, but there are still a lot of tools that don't provide it. I know that we have some collaborators who tried to run this and got some issues, and we would also like to implement metagenome-assembled genome building workflows, so we need to cover this aspect of individual assembly. And then...

Can I ask you a question?

Yes, sure.

So just to understand: you basically have a list, and your list gets merged into a single dataset because the tool iterates over all the items. Is that what's happening?

Yes, yes.

Okay, so there is the Apply Rules tool, which takes your collection and places each dataset into its own collection with a single item. Would that work?

But that means adding another step. Do you mean in the XML wrapper, or...

Or in the workflow?

Yeah, but I'm not sure if it will solve it, and then the users need to know that they need to do that before, which is not...

So your users are running tools?

Running what?

Your users are running tools, not workflows?

So there are a lot of cases. These are tools that we already have: MEGAHIT, metaSPAdes, QUAST, CheckM, CoverM.
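The quality-control mismatch described in the talk, where each sample's reads end up mapped against the contigs of every sample instead of only their own, can be sketched as follows. This is a toy model with made-up file names, not QUAST or wrapper code; it only counts which read/contig combinations get mapped in each situation.

```python
from typing import Dict, List, Tuple

Mapping = Tuple[str, str]  # (reads file, contigs file)


def observed_mappings(reads: Dict[str, str], contigs: Dict[str, str]) -> List[Mapping]:
    """One job per sample, but every job receives the contigs of all samples."""
    return [(reads[s], contigs[t]) for s in reads for t in contigs]


def expected_mappings(reads: Dict[str, str], contigs: Dict[str, str]) -> List[Mapping]:
    """One job per sample, each using only that sample's own contigs."""
    return [(reads[s], contigs[s]) for s in reads]


# Made-up file names for three samples.
reads = {s: f"{s}_reads.fastq" for s in ("S1", "S2", "S3")}
contigs = {s: f"{s}_contigs.fasta" for s in ("S1", "S2", "S3")}

observed = observed_mappings(reads, contigs)
expected = expected_mappings(reads, contigs)
print(len(observed), "mappings when all contigs go into every job")
print(len(expected), "mappings when samples are matched one to one")
```

With N samples the first situation costs N × N mappings and the second only N, which is where the wasted compute comes from.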
And there are more and more other tools in these cases where, yeah, I'm still not completely sure how to express it, to be honest. I just wanted to get feedback on that. So we have collections in Galaxy, and every time you have a collection here, in this case, you do a for loop over the items. In metaSPAdes, the command section is implemented this way. So by default, you see here that all the items of the collection are put in the same command. Do you get that?

And you just want to have it operate on a single file of the collection?

Yes.

So that's the solution then. And, you know, if people are not comfortable with the Apply Rules tool, which is super generic but maybe a bit hard to understand, it's pretty simple to write another tool that just reshapes your collection. It doesn't consume extra data, it doesn't consume extra compute.

But does that work? I mean, it's a matter of being familiar with Galaxy. I'm just not sure it will fit all cases. So when you have cases like this one here, will that work?

Well, I don't understand the graph, so you have to walk me through it.

Sorry. So you have the raw data at the top. You do the assembly and you get an output, which is the contigs. So you have a collection at the top and a collection at the output of the assembly. Then you want to compare the output of the assembly to the inputs that you gave to the assembly. That's the quality control step here. And you want that to be run one by one: S1 contigs compared to S1 raw data. And currently in some of the tools that is not the case, because of these for loops.

So that's exactly what my suggestion is about. When you have the for loop and you iterate over a simple list, you have to have a list.
So what do you do if your list shouldn't be merged? You make a list of lists where the inner part is just one item. I mean, we do this a lot. So I'm a little sad that you say you spend a lot of time adapting the XML, which is good, I mean, it's certainly something a lot of tools have: a pooled mode and a single mode, where pooled means you iterate over the entire input.

Okay, I didn't know about that. We discussed it a bit with Bjorn, and I think the only solution we had at the time was adding these two modes. So I didn't know about that.

But Bjorn is not the Galaxy community. So I think that's a great question to ask in one of the channels. I've given this answer a few times. And of course it's interesting to know that this is still a problem. The question is then how to communicate about this.

Yeah, that would be good to know, because maybe I need to try that first, to see if it really fits the use cases here or not. But that was mostly what I wanted to discuss with the community. It's also because I didn't find a solution and I had to rework the tools. So it's good to know if there is a solution for that; that would be really helpful, definitely.

Marius, is there a tutorial you would recommend? I mean, I guess it's kind of hard to look at the tool XML and know whether Galaxy should sort of say, hey, do you want to build a list of lists here? Is there a good example tool XML with a help section that explains this, so that for tools that have this pooled versus single mode we could say, look at this example tool when you're writing the wrapper?

I mean, if this pooled mode is something you would use, I would just write the pooled part.
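The list-of-lists suggestion made in this exchange can be sketched in Python. This is a simplified model of the behaviour as described in the discussion, not Galaxy's implementation: `pooled_tool` stands in for a wrapper whose command loops over all its inputs, `run` is a hypothetical dispatch rule (a flat list is reduced in one job, a list of lists gives one job per inner list), and `singletonize` plays the role of the reshaping step that Apply Rules would perform.

```python
from typing import List, Union

Collection = List[Union[str, List[str]]]


def singletonize(collection: List[str]) -> List[List[str]]:
    """Wrap every dataset in its own one-item list (list -> list of lists)."""
    return [[dataset] for dataset in collection]


def pooled_tool(datasets: List[str]) -> str:
    """Stand-in for a wrapper whose command loops over all its inputs."""
    return "tool " + " ".join(datasets)


def run(collection: Collection) -> List[str]:
    """Toy dispatch, assumed for illustration only.

    A flat list is consumed (reduced) by a single job; a list of lists
    is mapped over, giving one job per inner list.
    """
    if collection and all(isinstance(item, list) for item in collection):
        return [pooled_tool(inner) for inner in collection]
    return [pooled_tool(collection)]


flat = ["S1.fastq", "S2.fastq", "S3.fastq"]
pooled_jobs = run(flat)                 # one command with all samples
single_jobs = run(singletonize(flat))   # one command per sample
print(pooled_jobs)
print(single_jobs)
```

The point of the design is that the wrapper keeps only its pooled for loop; whether samples are pooled or processed one by one is decided by the shape of the collection handed to it.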
I don't think I would adjust the tool XML, because it's much more generic and simpler to write a light tool that does the for loop and then just have Galaxy pick out the right structure. The good thing about the other mode is that if you only have a single file, it's kind of easier: Galaxy doesn't offer the ability to auto-promote a file to a singleton list. But then you do need to have those two independent command sections. They're in the if statement, but kind of disconnected.

Yeah, which I guess I'm fine with. I mean, that's fair. I think that's a perspective, and I trust you: you use tools and I don't. I guess maybe the question is about examples that we could point people at for how to write tools with a help section like that.

I mean, you guys know how to write the tools, right? Because I think the Planemo docs go into depth on this, right?

No, but the tools don't have, yeah, there's that. And, you know, I wrote that, so I understand it. But I mean the help section should maybe have a description: okay, if I have a single file, how do I use this? If I want to break it into batches, how do I do that? Maybe just pointing at the Apply Rules tutorial or something.

I guess we do have that tutorial, but it's not structured in this way. So if I was going to improve tutorials, I would put in a couple of commonly used rules that you may want to use in the Apply Rules tool.

Okay. Which isn't a direct link to, you know, the tool form. Yeah, it feels like there's some disconnect there. I mean, we do say something about this in the tool form. It is clear on the tool form that you're going to reduce the dataset. It's just that you've got to know what that means.
And if you don't want to reduce it, how to get around that, right? I don't think that's there. Like: this is going to reduce it all down; if you want to consume it in batches, here's how. Maybe that's a little link we could put in the tool form. It feels very specific, but anytime you're reducing a collection completely, we could add the option to have a little help hint saying: if you don't want to reduce the collection completely, here's how to restructure your data.

Yeah. I wonder how we can do this so that it is not confusing. Because it's not that common, I guess. You know, when you build a workflow with 50 tools, then not common becomes common. But the average user probably doesn't have to worry about this.

I guess that's part of the problem here: for those tools this is a very common problem, for the tool on its own, even if you don't use it in workflows.

I mean, I'm not sure it's a problem really.

I mean, you know it's a problem because there's an entire metagenomics community that struggles with it.

I would say the question is, why did nobody ask the question? I do answer that question three or four times a year.

Well, Bjorn, maybe to flip it back on you then: I think Marius and I have put a lot of effort into the Apply Rules tool and the training around it. You know, I never write documentation, but I understood this was complex, so I actually wrote documentation for a change. What do you think the missing piece of documentation is? I think maybe Marius's point is that this is not a problem with the tool; it's a problem with how information flows through the ecosystem. Do you have an idea for how to improve that?

Not really, because we have two different kinds of problems, I guess.
We have the tool developer problem, right? This we can maybe improve with documentation: document one of these tools and then reference it as a best-practice implementation of this kind of use case. But then we also have the end-user problem, the people who consume those tools and need to use them. So these are two different kinds of problems, I think. And for the last one, I don't know. We need to link the Apply Rules tool more prominently in the tool form then, I guess.

And probably also have more training where this is really used, because I didn't realize that it could be used this way. So there are probably not that many tutorials that use it, or at least not among the ones I checked or worked on. So we need to add more of that, show applications of it maybe.

Yeah, I think that's a fair point. It's on the eternal to-do list. It should get done, I agree. We could certainly also offer an option for that: if you have a tool that consumes multiple datasets, we could maybe add a hook where we just reshape the collection in a way that things do not get reduced.

I mean, it's easier than opening up the Apply Rules tool, but I worry about the extra button, and you've still got to understand what's happening there. So I have this open PR that I started, where I was trying to make a little modal or something with all the different collection operations, with pretty pictures and maybe a higher-level view. And one could easily imagine, you know, one button that uses the Apply Rules tool to singletonize the list into a list of lists. But then we probably wouldn't want to put that modal on every tool form, but maybe the tool XML could have a hook.
Like, if this is an assembly tool, where we expect this to be a common operation, we could have a little flag that would do that.

So let me restate that: this isn't an assembly problem in any way, shape or form. I don't know how many projects are doing this, and I think we don't know that in advance. We don't know how our users are going to compose what they're doing. So I don't think it makes a lot of sense to include this in the tool XML. I think if something consumes something other than a simple data input, it should be an option on the row with the buttons that switch between collection and multiple inputs.

And you would be okay with that button?

I mean, I would find it handy, but...

Okay.

For that, I think we probably need to explore how people understand this, if they understand it. But we could certainly take users more by the hand, because for the most part we know this. We can say: this tool consumes a list of datasets and produces a list of datasets. That sort of simple thing. I mean, we aren't even showing what outputs a tool has, right? In order to make the right choice, I think the first step would actually be to show what outputs the tool is producing. That's currently not visible at all in the tool form. And then the next step would be to say: okay, we're taking your list and we're creating a single output. At which point the user will say: oh, but that's not what I want, I want a list in and a list out. And that is the logical place where you would put such an option, right?

Well, yeah, that's a great point. So maybe, before the tool success page, at the bottom of the tool form, we describe the outputs. Maybe even just having that there would catch people a lot quicker, and then we could go from there.

I think so. And I think it also ties in with the commonly requested feature of being able to name your outputs in the tool form.
Yeah, that seems like a pretty natural, good place to go.

Or could there be a tab in the form: one tab where you put in all the inputs and deal with all the parameters of the tool, and then maybe a tab where you can deal with the outputs, see what they will be, and possibly name them if that's of interest?

We have a great UX working group that I think could explore those different options. But yeah, I think there's a bunch of different ways you could do it. Bernice, does that sound like it all helped you get towards an answer for your tools?

Yeah. So I will definitely have a look at Apply Rules before going further. And if it still doesn't solve my issue, maybe I can check with you, Marius: we can discuss again, I can try to explain again and show the use cases where it really doesn't solve the issue.

That sounds good. I guess my other question is: where would you naturally ask a question like this? And what should be the place where people ask this question? Because I think most of the knowledge in this regard is in the IUC.

But for me it was difficult to formulate the problem, to be honest. Already identifying how to formulate it. Making this presentation was also a way for me to be sure that I understood the problem. But then, for users, or for someone who cannot or doesn't want to give a talk or something like this, how do you ask them to formulate it? That is difficult, I found. I was trying to explain this issue to Christo with that chart and it was really not easy.

Yeah, so I can't say I understood the chart, but it's a problem I've heard multiple times. I think if you're talking to an audience of Galaxy people, maybe a link to a history, saying: here I ran this tool on that input dataset, I got that output, and that's not what I want, what do I need to do?
That would be something we can easily answer. I guess in an ideal world this question would go to help.galaxyproject.org, because there it's discoverable. It's just, yeah, there are a lot of questions there and it's a bit intimidating.

I think part of the issue is that we don't have a visual language for collections. If people saw a chart, like, this is how this tool is going to consume it and this is how it's going to produce it, and they could see boxes and lines connecting them, I think it would be a lot easier for people. I think maybe Bernice's whole presentation would be: the boxes look like this and they need to look like this. And people wouldn't have to understand the 10 or so terms that we use, like reduction and consuming and lists of singletons. We've used a bunch of very jargony computer science terminology here, and we want our users to know all of that, but I don't think that's necessarily realistic, and it's a very visual application already. At one point Anton had some great little charts of what the collection tools were doing, and if we could put more of that in the UI, I think it would really help. I think even in the workflow editor it's a little bit more communicable, because you can see ribbons or not. Sure, even that could be vastly improved, a lot of information is lost in those ribbons, but just ribbon or not is already a start.

I mean, in a sense, the best way we had to go about this we've actually removed, with the multi-history view, because there we had these boxes where you could see the structure of your collection. But of course there's more involved in showing the transition between your input and your output.

And presumably we could build a diagram for each tool run, right?
There are going to be a couple of tools where we can't, but for the most part we can. So I think step one is just: how do we display the outputs? Step two is: we could add the naming, and that should be a pretty short project. Maybe step three, though, is: how can we communicate that visually? And then a question like this could just be: hey, I want this diagram to look a little different, what do I do?

Yeah, these are all great points. I guess my question was mostly about which avenue. I think for users it's maybe not entirely clear where they can even ask that question. If they want to get an answer from me, I obviously should be looking at Help, but realistically I only do if somebody points me at it. I am, however, present on the IUC channel. Is that a channel we would want to point more people at?

As a newer member, I would just be curious, or, you know, if Bernice wants to work on a training video together with me for this case, that might be a nice short-term answer while the long-term response on the functionality and the feature set gets fleshed out a little more. If there's anyone else who's interested, or who thinks a training video using Apply Rules would help, just to see if it solves any of the issues Bernice is experiencing, let me know.

I mean, working on training would definitely be good anyway. Having more training on that, with more use cases, would be really useful to show how to solve these issues. Anyway, for the metagenome-assembled genome workflow, I would put that in the tutorials when we write them. But something short would be good, yeah.

If you want to work on that, I'd be happy to help you.

Great. Thank you so much, Bernice, for sharing the issue that you ran into, and hopefully we can get you towards an answer.
I think maybe we can put together a blog post or something like that, which breaks down for folks where they can plug into the community if they get stuck. Maybe this kind of training video will already help to address it, but I think also maybe a blog post telling this story of how you got stuck here, the solution, and maybe pathways for other users, so that this can be a little more discoverable.

That might also be a good idea, but the tutorial may take care of that, so it may not be needed.

I just want to point out, also for the new people, that we had videos in the beginning, and there's a reason we don't do them anymore: they can't be updated easily. That's why we went with tutorials, and it's so much easier to find tutorials than videos.

That's a good point. It's hard to scale this kind of transfer of knowledge: videos take a lot of time to produce and can be extremely helpful, but as soon as something changes, they might not work for you anymore.

I hate videos. If there's nothing written, I will use something else. Personal opinion, but I think it's probably quite common that busy scientists will not take half an hour, or even five minutes, to watch a video.

Yeah, that's fair.

We could add more FAQs, for example in the GTN, where we explain: okay, here are your cases, here are the different solutions for the different cases. And then you can point to the FAQs, because each FAQ has its own page. But then we probably need to identify the different cases and the different solutions.

Yeah, I think these FAQs are fantastic. We need more of these because they are to the point.

With Andrew, we use an automatic rendering of Google Slides to video.
So that might be something we could explore here, where the slide notes are read out, kind of with a robot voice, but that could still be helpful and a little more updatable in a manageable way.

It's already implemented in the GTN. When we have slides, we have automatic video generation.

Yes. Okay, that's great then. I'm seeing some comments from Dan in the chat, mentioning that it's okay for people to end up in the lobby anytime; that way they can ask their question and get an answer there, or get pointed to the right channel. And it's okay to ask any kind of question, even one that's not fully formed or is a little hard to communicate, just as a place to start the conversation. And then...

I'm sorry to interrupt, but I wonder if we could use one of those text-generating AIs to steer people in the right direction? I don't know if that's too spammy. It sounds like it'd be fun. I mean, they generally even generate good-ish or workable templates for tools. So I wonder how usage questions would come out, and if we have the funds to pay for it.

I think that'd be a cool idea to explore. Were you going to say something, Michelle?

Oh, I was going to say that Hopkins is having a meeting about ChatGPT, a town hall or something. So I don't know what we'll be able to use, but it's an interesting question for sure. From a policy standpoint, I don't know what they're going to do or say about how academics can or can't use that. It's like a whole new world.

And then I'll also just read out the last comment that Dan made, which is that we kind of have two problems. One is answering the questions that people will have by providing documentation and materials, so being a little bit proactive. And the other is encouraging folks to just ask if it doesn't exist or they can't find it.
And I think sometimes, as someone who's learning, you don't know if it's something that you're missing and just can't find, or if it doesn't exist, which is totally a possibility. So I think this meeting has been helpful for us to bring some of those up, and we can continue to brainstorm how to encourage folks to ask those questions. Awesome. I hope this has been a helpful discussion for you, Bernice, and for all of you joining today. Any other questions or comments?

I had a quick comment. I apologize if I came off a little defensive in this meeting, that's on me, but I do really love this format of: we've got a specific problem, here's a presentation. I think this is really helpful. So I'd really like to encourage more of it, and I apologize if my tone or comments discouraged it, because I think it's really great. So I just wanted to throw that in. Thanks so much, Bernice.

Completely agree.

Thanks. I mean, I feel a bit stupid that I didn't know about Apply Rules and didn't try to see if it works or not. But on the other hand, it's helpful to have the discussion, because if I didn't figure out Apply Rules, there are probably a lot of users who didn't figure it out either. So how can we solve this bigger issue of finding it and making it more obvious? So my feeling is that this is good, to help build a better Galaxy, with a better interface or better materials. So all good. Thanks.

Thanks so much. Thanks everybody. With that, we'll close our community call today. The next meeting will be on April 13th, and we'll get an update about AnVIL. See you all in about a month, and many of you in Baltimore, hopefully. Bye.

Bye. Thanks everyone.