Hello, my name is Sunja Desanto. I'm a software engineer for the Galaxy project. I primarily work with the UI/UX team, but I also do some backend work and some outreach, so you might have seen me around a bit. Today I'm going to be walking you through the rule-based uploader tutorial. If you've never used the rule-based uploader, this is a great place to get started. There is an advanced tutorial that you can follow this up with, but we're just going to work through the basic rule-based uploader tutorial for now.

The rule-based uploader is a way for you to modify big datasets or big collections of datasets according to certain rules without needing to do it over and over again. It's scalable and reproducible. When you're working with somewhere in the range of 10,000 datasets, it is practically impossible to modify that data by hand according to a rule, the same way for everything. So it's really useful.

We're going to be using this tutorial here. It's called the rule-based uploader, so you should be on the one that says rule-based uploader with John and Helena as authors. There are, like I said, some prerequisites on dataset collections and managing your data. But as long as you've got some time working with Galaxy, you're probably okay there. If you've never used dataset collections, you might want to check out the dataset collection tutorial as well. There are three different hands-on examples that we're going to work with here. The first is uploading datasets with rules. The next one is creating a simple dataset list. And the last one is creating a list of dataset pairs. If you're ready with this tutorial up, and whatever instance of Galaxy you're going to be using up, then we can get started. You're definitely going to want to have the tutorial up in a tab, because we're going to be copying and pasting data from it into Galaxy.
And if you don't have this easily accessible, then it's going to slow you down. So if you have to pause here and pull up the link, do that, and then we can get started.

For the first hands-on exercise, we're just going to be uploading datasets with rules. And it's got some information here. All the datasets that we're going to be using for this tutorial are tabular. But that doesn't mean that the rule-based uploader can only handle tabular datasets. It can handle a bunch of other stuff, but tabular datasets just make a lot of sense here. So we're going to click on this copy button right here under the uploading datasets with rules section. That's going to take this data and copy it to our clipboard so that we can use it. Navigate to your Galaxy page and click on the upload data button. But you don't just want to paste it here. Instead, up at the top, we see these tabs. Usually you would just upload data as regular, composite, or collection, depending on what it is you're uploading, but we're going to go to the rule-based uploader. We're going to leave this as upload data as datasets, because that's what we're working on here, and load it from a pasted table. So you just want to paste your data in there and click Build.

Now you can see that we've got the data in here, but we don't need all of this data. And if you look up at the top, there's a warning here saying you can't move forward: you haven't specified a source column. I'm going to show you how to do all of that. The first thing we're going to do, based on the tutorial, is get rid of this header. We don't need that information, right? So we're going to go to Filter, then First or Last N Rows, and we're going to filter out the first row. The reason we don't need this row is that it doesn't include any data. It's human readable and it's telling us what these different columns are, but Galaxy doesn't need to know that. Galaxy is going to read this as another dataset. So we just want to remove it.
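What that filter does to the pasted table can be sketched outside of Galaxy in a few lines of Python. This is only an illustration under stated assumptions, not Galaxy's own implementation: the table contents, the example.org URLs, and the function names `parse_table` and `filter_first_rows` are all made up for the sketch.

```python
import csv
import io

# Hypothetical pasted table: one header line, then tab-separated data rows.
PASTED = (
    "study\tsample\texperiment\turl\n"
    "STUDY1\tSAMPLE1\tEXP1\thttps://example.org/EXP1.fastq.gz\n"
    "STUDY1\tSAMPLE2\tEXP2\thttps://example.org/EXP2.fastq.gz\n"
)


def parse_table(text):
    """Split pasted tab-separated text into a list of rows (lists of cells)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def filter_first_rows(rows, count=1):
    """Drop the first `count` rows, e.g. a header that is not a dataset."""
    return rows[count:]


rows = filter_first_rows(parse_table(PASTED), count=1)
for row in rows:
    print(row)
```

If the header row were left in, it would be treated like any other row, which is why it has to go before anything is uploaded.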
I think we're going to start all three of the hands-on exercises by doing that. So we're just going to click Apply and, boom, we can see that the top row is gone. We also see here in the rule builder the list of rules that we've applied. You can edit them if you need to. You can delete them: completely remove this one, and we see the header row pops back up. So it's always good to know. If we're wondering what we did to get to the point where we're at, we have a record of it. And as we do more stuff, the rules will just pop on there in order.

The next thing we need to do, in order to actually upload this, is rectify this problem. We need to specify our source column. Our source column is column D. This is the URL. But there is one other thing we are going to do here. We're also going to select column C as the name for our datasets. So we're going to add or modify column definitions, then add the definition for Name as C and the definition for URL as D. Let's see how we do that on here. We go to Rules, Add / Modify Column Definitions, add a definition for the name, and we want the names to be C because that is the unique name column. Then we want to add a definition for URL. When we add the URL one, the column it selects is not correct, so we don't want to leave it there. We want to drop this down to D. We set the URL and we notice that the error has gone away. The last thing we want to do is change the type. These are fastqsanger. And we've done everything we needed to do here. We can click Upload and you'll see that it says it's working on uploading them, and you can see in the history that now we've got six new datasets uploading and that they match up with that initial table that we had. In the background, Galaxy is going to fetch the data from these URLs. Ta-da, you've used the rule-based uploader. We created just a simple set of, was it six, I think?
Six datasets from URLs, and we modified our table a bit to make sure that we were fetching them from the right place, that we named them correctly, and that we weren't uploading extraneous data like that column header row.

Next we're going to work a little bit with collections. So if you haven't done anything using collections yet, now would be a good time to pause here and go try the collections tutorial, or to do some reading on collections and data in Galaxy and what that means. There are different types of collections in Galaxy. The ones we're going to be working with in this tutorial are a dataset list and then, in the next hands-on part of this tutorial, a list of dataset pairs. So that one is a list of pairs and this one is just a simple list. For this one we're actually going to make a purposeful whoopsie, a purposeful accident. We're going to copy our data, but we're going to upload it as a regular pasted upload. So we're going to do Paste/Fetch data and paste it here, just like that. Like, oh, we're adding some data. Uh-oh, we wanted to turn that into a collection and, uh-oh, we put it over here. Well, that's not the end of the world. What we can do now is go to the rule-based uploader and upload the data as collections instead of datasets. So you've got two options there. Then there's where you're going to load this tabular data from. There's a pasted table, which is what we did the first time, where you just put the data in here; a history dataset; or remote files. In this case, we're going to load it from somewhere else in our history, and we're going to choose that last dataset that we uploaded. Let's wait for it to finish uploading. Ba-da-da, ba-da-da. You can pause here and wait for it. I'm going to pause my video too. Mine is all uploaded now, so we're ready to continue. We're going to upload data using the rule-based tab.
We're going to upload as a collection, load from a history dataset, select this new history dataset, and click Build. Now you can see that we've got, again, some things that we need to resolve before we can proceed. We have to name the collection, because collections have a name; specify the source column, which is still going to be this column D; and specify a column as a list identifier. But first, like we did with the last one, we want to get rid of this top column. This top row, excuse me. This top row doesn't have any usable data for Galaxy. So we're going to filter out that first row, apply and, boom, that's gone. Again, we're going to go into Rules, Add / Modify Column Definitions. We're going to add our URL as column D and we're going to add our list identifier as column C, and that's just, again, the name. We can apply and we can see our three rules so far. We still have one warning we haven't resolved, and that's naming the collection. That's in a box down here: enter a name for your collection. You can call it whatever you want. I'm going to call it this, PRJDA60709. If you want to add the type, you can. Again, adding the type helps whenever you're uploading things. It's going to make it easier for you to use the right tools. So add the type as early as you can, in the box here, if that's possible. And that's it. This is ready. Our warning up here is gone. We've identified everything we need to and we can just click Upload. Again, it'll take a few minutes, but you're going to see now that the same six datasets are going to be encapsulated, basically, in this collection, which makes them one item in the history, really, that you need to run a tool on or whatever.
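The column definitions used so far (list identifier from column C, URL from column D, plus a collection name and an optional type) amount to a small mapping from table rows to collection elements. Here is a minimal Python sketch of that idea. It is not how Galaxy represents collections internally: the dictionary layout, the helper names `col` and `build_list_collection`, and the sample rows and example.org URLs are all assumptions made for illustration.

```python
def col(letter):
    """Convert a column letter such as 'C' into a zero-based index."""
    return ord(letter.upper()) - ord("A")


def build_list_collection(rows, name, identifier="C", url="D", ext=None):
    """Describe a simple list collection: one element per table row."""
    elements = [
        {"identifier": row[col(identifier)], "url": row[col(url)], "ext": ext}
        for row in rows
    ]
    identifiers = [element["identifier"] for element in elements]
    # The walkthrough picks column C because it holds unique names;
    # this sketch enforces that assumption explicitly.
    if len(set(identifiers)) != len(identifiers):
        raise ValueError("list identifiers must be unique")
    return {"name": name, "type": "list", "elements": elements}


# Hypothetical rows, with the header already filtered out.
rows = [
    ["STUDY1", "SAMPLE1", "EXP1", "https://example.org/EXP1.fastq.gz"],
    ["STUDY1", "SAMPLE2", "EXP2", "https://example.org/EXP2.fastq.gz"],
]

collection = build_list_collection(rows, name="STUDY1", ext="fastqsanger.gz")
print(collection["name"], [e["identifier"] for e in collection["elements"]])
```

Setting the type up front, as suggested above, corresponds to the `ext` argument here: every element carries it from the start instead of being corrected later.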
So like I said, if you're not sure what a collection is, or if you've never worked with collections, there's another tutorial about using collections that can help you figure out what they are and what they're used for, and empower you to use them more frequently. This upload is going to take a little bit of time. Again, it's going and fetching all of those datasets again. So while that's going, we're just going to assume it's going to turn green, and it should.

Now we're going to work on creating a list of dataset pairs. This is a different data sample this time. The actual data looks different. We've got a lot going on in these very long URLs here. And it's going to be a different collection as well, in terms of its structure. Instead of just having a list of datasets, like A, B, C, D, E, we're going to have a list of pairs. Each pair is going to have a forward and a reverse, and we're going to go through making these matches. And we're doing all of that in the rule-based uploader. So if you've never done that in the regular paired list collection builder, just creating a collection of dataset pairs, you might want to go and refresh on that for a bit. But if you're good, let's move on.

We're going to start again by copying our data. This time we're going to put it right into the rule-based uploader. We're creating a collection from a pasted table, and we're just going to paste it in there, boom. Now it's built, but you can see that this column, which is our URL column, is lengthy. And it's actually more than one URL. You can follow the link up to the point where you see a semicolon, and then you see another link start. That's because these are our forward and reverse pairs. So now we need to split this up. Okay, and this is going to show you some of the really advanced data manipulation that you can do with the rule-based uploader.
We're going to take this string and split it to create additional datasets here, and then we're going to pair them with their partners. Like we did every other time, we're going to start by getting rid of that top row. It's human readable, but we don't need it. It's not an actual dataset. Then we're going to go to the Rules menu and identify our list identifier as C, which is what we did last time as well. We just want to know which column is the name. Okay, the list identifier is set and we're good there.

Now, if you've never done anything with regular expressions, don't get worried. This is straightforward, and the tutorial gives you the regular expression we need to match. We're going to create columns matching expression groups. Each expression group is anything. That's what this is: the parentheses with dot-star inside, (.*), mean anything. So the whole expression, (.*);(.*), is anything, then a semicolon, followed by anything. And we want to split that into two. So that's going to take this column D and split it into two columns: one with the first URL and one with the other URL. So we're going to go to Column, Using a Regular Expression, excuse me, from column D. You can feel free to type it in. I'm just going to copy that regular expression from the tutorial rather than accidentally mess it up. If you know what you're working with, if you know your dataset, then you're obviously going to be fine writing your own. We want Create columns matching expression groups, and we want two groups, right? Because there are two URLs in here. So we apply, and then we can come to the end of our table here and see that we have split column D into column E and column F, which hold the two parts of the URL. Next up, just to make it easier to see, we can delete column D. Why not? It's not useful to us anymore. We've extracted the data we need from it. So you can just select column D and apply, boom. Column D is gone.
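The split just described can be reproduced with Python's `re` module to see exactly what the two groups capture. This is a sketch of the idea rather than Galaxy's implementation: the helper names `add_group_columns` and `remove_column`, the sample row, and the example.org URLs are hypothetical, and the pattern `(.*);(.*)` is the "anything, semicolon, anything" expression described in the walkthrough.

```python
import re


def add_group_columns(rows, column, pattern, group_count):
    """Append one new column per regex group matched against `column`."""
    regex = re.compile(pattern)
    result = []
    for row in rows:
        match = regex.match(row[column])
        if match is None:
            raise ValueError(f"pattern did not match {row[column]!r}")
        groups = [match.group(i) for i in range(1, group_count + 1)]
        result.append(row + groups)
    return result


def remove_column(rows, column):
    """Drop one column; the columns to its right shift left by one."""
    return [row[:column] + row[column + 1:] for row in rows]


# Hypothetical row: column D (index 3) holds two URLs joined by a semicolon.
rows = [[
    "STUDY2", "SAMPLE1", "EXP1",
    "https://example.org/EXP1_1.fastq.gz;https://example.org/EXP1_2.fastq.gz",
]]
D = 3

rows = add_group_columns(rows, D, r"(.*);(.*)", group_count=2)  # adds E and F
rows = remove_column(rows, D)                                   # E, F become D, E
print(rows[0])
```

Because the first group is greedy, this pattern only splits cleanly when each cell contains exactly one semicolon, which matches the forward/reverse layout described here.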
Column D got replaced by what were columns E and F; they shifted over. But the column D that came in initially is gone. It's not in our field of vision anymore. Now we're going to split the columns, because we've got this column D and this column E, and we want to split them so that they create more datasets for us to use. We're going to split the columns so that the odd-row column is D and the even-row column is E, and click Apply. It's in the Column menu... no, sorry, it's under Rules. I keep going to Column. So we split columns D and E and we apply. You can see now that this URL was in D and this one was in E, and it has just popped them one below the other. All the other stuff in the table here, C and B and A, was common. It was just D and E, those two, that were different. So it duplicated everything else and gave each row a different URL, which is what we wanted. Now we've got our forward and our reverse, and each is a separate row in our table. Now we just need to match them up.

We're going to use a regular expression again to create columns matching expression groups. And for this one, again, you're going to want to copy the regular expression. This is anything, followed by an underscore and then a number. The number can be one or two in our case, because that's what we know is in the data. That's followed by .fastq.gz, because that's the ending of the link. And we want one group. Okay, so copy this regular expression for now. It's going to be helpful when we go to Using a Regular Expression on column D. Okay, so enter the expression, and we want to create a column from an expression replacement, I believe... no, Create columns matching expression groups, and we want one group, sorry. We click Apply. What it did was pop out these numbers. So now we know from the link which one is forward and which one is reverse: that's one and two.
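Here is a rough Python sketch of those two steps, splitting columns into alternating rows and then pulling the 1 or 2 out of each URL. It illustrates the behavior as described in the walkthrough and is not Galaxy's code: the function names, the sample row, and the example.org URLs are made up, and the pattern `.*_(\d)\.fastq\.gz` is written from the spoken description (anything, underscore, a digit, then .fastq.gz) with the dots escaped.

```python
import re


def split_columns(rows, odd_column, even_column):
    """Turn each row into two rows: the first keeps `odd_column`,
    the second keeps `even_column`; all other cells are duplicated."""
    result = []
    for row in rows:
        result.append([v for i, v in enumerate(row) if i != even_column])
        result.append([v for i, v in enumerate(row) if i != odd_column])
    return result


def add_group_column(rows, column, pattern):
    """Append a column holding the first regex group matched in `column`."""
    regex = re.compile(pattern)
    result = []
    for row in rows:
        match = regex.match(row[column])
        if match is None:
            raise ValueError(f"pattern did not match {row[column]!r}")
        result.append(row + [match.group(1)])
    return result


# Hypothetical row after the earlier split: D and E each hold one URL.
rows = [[
    "STUDY2", "SAMPLE1", "EXP1",
    "https://example.org/EXP1_1.fastq.gz",
    "https://example.org/EXP1_2.fastq.gz",
]]
D, E = 3, 4

rows = split_columns(rows, odd_column=D, even_column=E)  # one URL per row, in D
rows = add_group_column(rows, D, r".*_(\d)\.fastq\.gz")  # indicator lands in E
for row in rows:
    print(row)
```

After these steps each row has a single URL in column D and its 1 or 2 indicator in column E, which is the state the table is in before the columns are swapped.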
There's an option here to swap columns D and E, just so we can pull the indicator forward. Why don't we go ahead and do it? Swap columns D and E, and that's going to switch the order for us, so that the ones and twos, the single digits here, are more prominent than the actual link. This is going to be useful to create our pairs. And finally, we're going to go back to Rules, Add / Modify Column Definitions, and add the paired-end indicator as D. We can apply that and move on. But we're nearly at the end here and we still haven't said what our source column is. Now that we have a column with just one URL, we're going to add our URL, and that is column E. You can click Apply and we're almost done. We can add our type, fastqsanger.gz, and we can enter whatever name we'd like. I'm going to use this name here, PRJDB3920, and I'm going to upload. What I'm going to have at the end, and we can wait for this to finish loading, is a list of paired-end forward and reverse reads.

There are all different kinds of paired-end indicators, right? Some people use R1 and R2, F or R, or the whole word forward or reverse. To use any of these other ones, if your data comes in as R1 and R2, you're just going to modify that last regular expression we had, so that it creates the column based on whatever your paired-end indicator is. We're going to have a list of these as pairs instead of just as individual datasets, and we can't see it yet. If you want to pause here, you can pause here, and then we'll look at the result and then we're really done. We're almost there.

All right, we can see that our collection is complete. We can click into it. We can see that we've got a list of items, but each one is a pair with a forward read and a reverse read, and so that's how it has split them up.
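To make the pairing step and the remark about other indicators concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the `DIRECTION` mapping of 1 to forward and 2 to reverse, the function name `build_paired_list`, the sample rows, the example.org URLs, and the R1/R2 variant of the pattern. It does not reproduce Galaxy's internal logic.

```python
import re

# Assumed meaning of the indicator values used in this walkthrough.
DIRECTION = {"1": "forward", "2": "reverse"}


def build_paired_list(rows, identifier_col, indicator_col, url_col):
    """Group rows into {identifier: {"forward": url, "reverse": url}}."""
    pairs = {}
    for row in rows:
        direction = DIRECTION[row[indicator_col]]
        pair = pairs.setdefault(row[identifier_col], {})
        if direction in pair:
            raise ValueError(f"duplicate {direction} read for {row[identifier_col]}")
        pair[direction] = row[url_col]
    for identifier, pair in pairs.items():
        if set(pair) != {"forward", "reverse"}:
            raise ValueError(f"incomplete pair for {identifier}")
    return pairs


# Hypothetical rows after swapping D and E: indicator in D, URL in E.
rows = [
    ["STUDY2", "SAMPLE1", "EXP1", "1", "https://example.org/EXP1_1.fastq.gz"],
    ["STUDY2", "SAMPLE1", "EXP1", "2", "https://example.org/EXP1_2.fastq.gz"],
    ["STUDY2", "SAMPLE2", "EXP2", "1", "https://example.org/EXP2_1.fastq.gz"],
    ["STUDY2", "SAMPLE2", "EXP2", "2", "https://example.org/EXP2_2.fastq.gz"],
]
pairs = build_paired_list(rows, identifier_col=2, indicator_col=3, url_col=4)

# If file names used R1/R2 instead, only the pattern would need to change.
r1r2_indicator = re.match(
    r".*_R(\d)\.fastq\.gz", "https://example.org/EXP1_R2.fastq.gz"
).group(1)

print(sorted(pairs), r1r2_indicator)
```

The check for incomplete pairs reflects why the indicator column matters: without a reliable 1 or 2 (or R1 or R2) on every row, there is no way to tell which two rows belong together.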
Compared to the first list we created, the first collection, where we just have datasets, there is no pair there. We can click into those to see the data, but we don't see any partner, any forward or reverse read. That is the difference between the two collections we created. And like I said, if you want to learn more about collections, there's a whole collections tutorial. I think it's linked here: using dataset collections.

That's it, we're done. Here at the bottom is a feedback form. If you want to add some information to help us make things better, that'd be great and we would really appreciate it. Hopefully you now feel a little more empowered to use the rule-based uploader. Like I said, there is a follow-up advanced rule-based uploader tutorial, and you can find that at the bottom of this tutorial as well. It's called rule-based uploader: advanced. Maybe at least you feel empowered enough to try that tutorial and learn more, and then start slowly incorporating that into your use of Galaxy. I hope you have a good day. Enjoy the rest of your training if you have more, and I look forward to seeing you again in the future. Have a great day, bye-bye.