Hello, my name is Wendy Bacon. I'm part of the training team at EMBL-EBI, and I'm going to be going through some of these tutorials with you. We're going to start with the Understanding Barcodes tutorial. A lot of this tutorial is conceptual. The way you can use this video is: if you can get through the tutorial on your own, awesome. If you get stuck, say one of the tools isn't working and you want to see me click through it, that's pretty much what this is for.
So this is an explanation of how the barcodes work. It's quite similar to 10x, which is what I tend to be more familiar with. You can certainly go and see this PowerPoint if you want some more information about plates, batches and barcodes. This is where we're distinguishing between the cell barcode and the unique molecular identifier (UMI) barcode, so that you can distinguish either between cells or between transcripts. So why is it important to know which cell a read came from? Well, you're doing single-cell analysis. Why do we need to barcode a read transcript too? I think this might even have a nice little explanation as well. Basically, say this one cell had a high level of, I don't know, GAPDH, and this other cell had a low level. You want to be able to tell whether that's PCR duplicates or an actual difference between the cells. Cool.
Here's an explanation of how the UMIs work. Again, all really important information to read through. Are UMIs not specific to certain genes? Can the same UMI map to different genes? No, they're not specific. They're random. Yeah, they can tag anything you like. And then can the same UMI map to different RNA molecules? Absolutely. In fact, it's likely that two different RNAs have the same UMI. It's pretty unlikely that it'll happen to two different transcripts of the same gene, though. So onwards with Galaxy!
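To put rough numbers on that last point, here is a minimal Python sketch. It assumes UMIs are 6 bases long (the CEL-Seq layout described later in this video, when we look at the reads) and that they are drawn uniformly at random, which real UMIs only approximate. The function name is made up for illustration.

```python
# Assumption: 6-base UMIs, each base one of A/C/G/T, drawn uniformly at random.
UMI_LENGTH = 6
N_POSSIBLE_UMIS = 4 ** UMI_LENGTH  # 4096 possible UMI sequences


def prob_shared_umi(n_molecules: int, n_umis: int = N_POSSIBLE_UMIS) -> float:
    """Probability that at least two of n_molecules carry the same UMI."""
    if n_molecules > n_umis:
        # More molecules than UMIs: a clash is guaranteed.
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_molecules):
        # The i-th molecule must avoid the i UMIs already taken.
        p_all_distinct *= (n_umis - i) / n_umis
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # A handful of transcripts of one gene versus all the RNAs in a cell.
    for n in (2, 10, 100, 1000):
        print(n, round(prob_shared_umi(n), 4))
```

For two molecules the chance of a clash is 1 in 4096, so two transcripts of the same gene rarely share a UMI. Across the thousands of RNAs in a whole cell, some shared UMIs are close to certain, which is fine because those molecules mostly come from different genes.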
Okay, so assuming you understand how barcodes work, you've made it to this point. Let's prepare the data. So we can copy this and go to upload. Okay, so you won't have these files; that's because I was playing around. You'll have something blank, so you hit the Paste/Fetch button and it'll give you a blank box. Paste that in and you can upload. Hit Start, Close, and it'll get there. The glorious orange circle of doom. And we're done!
I am going to give you a little bit of a trick here, though. I'm also going to be good and actually label my history, which you should always do. So here's another little trick, because sometimes uploading data is slow. With a lot of these things, you can go to Shared Data, Histories. For instance, I tend to label things with "input". When you take a training, sometimes we'll have these things for you, and you'll find different people do this in different ways. Anyway, you can click there and just import that history. Label it whatever you like, and then you can go from there. That workaround makes more sense with the bigger datasets or bigger tutorials that I've worked on. Alright, so we're there! Now what? If, however, you go and switch between histories, you're going to have to re-find your spot in this training link.
Okay, so we've done that. Now we need to build a dataset pair. Forward? No, that's not forward, because you need forward to be the R1. Okay, let's get out of that. Perform operations on everything. Okay, now what? Now we need to generate a list of reads. Oh, very exciting. So I'm going to come here, Paste/Fetch. Alright, and then I want it to be tabular. And hopefully, here is my list of stuff. It'll be the name. Alright, so come back to your tutorial while that's sorting itself out. Okay. So we now have paired FASTQ test data and a table of the read names. So now we get to extract. And if I'm not mistaken, sometimes this search bar doesn't necessarily work, but for this one I believe it does.
So, ooh, this stumps me all the time when trying to input the data. Make sure to click on the dataset collection, otherwise it won't work. Alright, now we make sure we have a tabular file. This is my list of stuff. You want to tick this box. We want just positive matches. Okay, ooh, we need to change the datatype. So I'm going to come over here, check this guy, fastqsanger. fastqsanger, cool.
Ooh, we're going to view all the reads side by side using the Scratchbook. Alright, so now let's look at what we can get from these reads. We can see that each read name starts with this @ symbol. Then we're getting the sequence of nucleotide bases, a separator, and then a quality score. The main thing that's interesting for us is specifically within the forward read. So wait, that's read two, so we want read one. And there, ooh, I wonder what those could be. We're looking at the cell barcode and the UMI barcode. If you look in the CEL-Seq protocol, you'll see that one through six is the UMI, and then seven through 12 is the cell barcode. And then you're getting the poly-A tail. And that's quite common in this. The reverse read is where we actually get our sequence of interest. Cool.
So now let's look at the quality. We're going to run FastQC on our original set, because we want everything, not just the four reads. So FastQC, I believe, will also show up. And we want, ugh, this gets me every time, the dataset pair. And we want the original one because we want to see everything, not just the four. Normally you would do this sort of first, before you even do anything, but we were trying to examine what the different sequences were. But you want to do this on both forward and reverse to make sure that your sequencing wasn't naff. If you're looking at quality for this, we're looking at what's on the actual bases.
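The read 1 layout just described (bases 1 to 6 are the UMI, 7 to 12 the cell barcode, then the poly-T stretch) can be sketched in a few lines of Python. This is only an illustration: the function name, the `Read1Parts` type and the example sequence are made up, and the default lengths are the ones quoted above.

```python
from typing import NamedTuple


class Read1Parts(NamedTuple):
    umi: str
    cell_barcode: str
    remainder: str  # whatever follows the barcodes, typically mostly T's


def split_read1(sequence: str, umi_len: int = 6, cb_len: int = 6) -> Read1Parts:
    """Split a forward read into UMI, cell barcode and the remaining bases."""
    if len(sequence) < umi_len + cb_len:
        raise ValueError("read 1 is too short to contain both barcodes")
    umi = sequence[:umi_len]                        # bases 1-6
    cell_barcode = sequence[umi_len:umi_len + cb_len]  # bases 7-12
    remainder = sequence[umi_len + cb_len:]         # bases 13 onwards
    return Read1Parts(umi, cell_barcode, remainder)


# Hypothetical forward read: UMI + cell barcode + a run of T's.
example = "ACGTAC" + "GGATCC" + "TTTTTTTTTTTT"
parts = split_read1(example)
print(parts)
```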
You may notice, depending on how well the Galaxy server is working today, that these things may be going quite rapidly. It's the magic of pre-recording. Okay, so we've done our FastQC, and now we're going to look at the webpage and then look at the per base sequence content. We want to look at the forward read. So I'm going to go to that, and then that webpage. I've just repeated that same mistake 18 times. Okay. Oh, this is pretty good sequencing quality. Anything in the green is great. And normally single-cell RNA-seq data is kind of crap.
All right. So we want the per base sequence content. And when we look in the tutorial, we are looking for this: smooth and relatively constant, then noisy and highly varied, and then the T's, where everything goes to hell. Yes, the T's. And then, yeah, you can see that here. It's the exact same image. Why might this be the case? Why is the UMI barcode distribution smoother than the cell barcode? Well, you only have X number of cells, but you have way more UMIs, don't you? So that makes for much more mixed variation.
Okay. So now we need to unite the barcodes with the sequence, because obviously they're in different reads. If you're looking at this example, which of these reads come from the same cell? Well, hopefully this is already separated out for you. You can see that all three of these have the same cell barcode. Cool. Which of these reads are PCR duplicates? Well, it can't be that one, because that's the only one from its cell. Then, in the same cell, you've got two with the same UMI. But that doesn't matter. That's just the super unlikely scenario wherein you got the same UMI on the same gene, because the sequences themselves aren't identical. They're most likely, hopefully, coming from two different transcripts of the same gene from the same cell, as opposed to one that got duplicated. Okay. So how do we do that sort of en masse, not just thinking about it on a piece of paper?
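As a rough sketch of that paper exercise done en masse, the Python below groups reads by cell barcode, UMI and sequence, and treats reads that match on all three as likely PCR duplicates. The reads are invented for illustration, and comparing raw sequences is a simplification: real pipelines normally compare where the read maps (the gene or position) rather than the literal bases.

```python
from collections import defaultdict

# Hypothetical reads: (cell_barcode, umi, read2_sequence)
reads = [
    ("GGATCC", "ACGTAC", "CTGAAGGTCGGAGTCAACGG"),
    ("GGATCC", "ACGTAC", "CTGAAGGTCGGAGTCAACGG"),  # same cell, UMI and sequence
    ("GGATCC", "ACGTAC", "GGTCGGAGTCAACGGATTTG"),  # same cell and UMI, different sequence
    ("GGATCC", "TTGCAA", "CTGAAGGTCGGAGTCAACGG"),  # same cell, different UMI
    ("AATTCG", "ACGTAC", "GATTACAGATTACAGATTAC"),  # a different cell
]


def group_reads(records):
    """Map (cell barcode, UMI, sequence) to the indices of reads sharing it."""
    groups = defaultdict(list)
    for index, (cell_barcode, umi, sequence) in enumerate(records):
        groups[(cell_barcode, umi, sequence)].append(index)
    return dict(groups)


groups = group_reads(reads)
# Each distinct key is counted as one original molecule.
n_molecules = len(groups)
# Keys seen more than once are the likely PCR duplicates.
duplicates = {key: idx for key, idx in groups.items() if len(idx) > 1}

print("molecules:", n_molecules)
print("likely PCR duplicates:", duplicates)
```

Note that the third read shares a cell barcode and UMI with the first two but is not flagged, matching the reasoning above: the sequences differ, so it most likely came from a different transcript.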
And now we're getting to UMI-tools. Every time, the search changes it to "uni". Okay. So we're going to do UMI-tools extract. We've got our paired-end dataset collection. We're only going to look at the four reads we've been looking at; it just makes it much faster. Barcode on the first read only. Use known barcodes? No, because they're random; we need this pattern. So if we come down here, we've got our barcode pattern, and we can copy that. And a quality filter? No. Execute. We have the orange circle of doom. Cool.
And now we can look at them. So we'll look at read one. All right. And then we'll look at read two. Okay. And what we can see from here is, well, read one is, look at that, just a bunch of T's. But read two over here suddenly has your cell barcode. Remember how you had the three reads with the same cell barcode? And then the UMIs as well. They've been chucked into the header along with the reverse read, the actual read from that transcript. And that's what you can find in these bits. Okay. Are the forward reads useful at all? Well, let's see. You've got the cell barcode. You've got the UMI. What else do you need them for? It's just a bunch of T's and rubbish. All right. So make sure you read through all of this before moving on to the next tutorial.
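To make the idea of that extract step concrete, here is a minimal Python sketch of moving the barcodes from read one into the read names. It is not the UMI-tools implementation. It assumes a barcode pattern of `NNNNNNCCCCCC` (N for UMI bases, C for cell barcode bases, matching the 1 to 6 and 7 to 12 layout from earlier) and assumes the barcodes are appended to the read name as `_CELL_UMI`; check the tutorial and the tool's own output for the exact pattern and header format. The read names and sequences are invented.

```python
from typing import Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def extract_barcodes(read1: Record, read2: Record,
                     pattern: str = "NNNNNNCCCCCC") -> Tuple[Record, Record]:
    """Move UMI (N) and cell barcode (C) bases from read 1 into both read names."""
    name1, seq1, qual1 = read1
    name2, seq2, qual2 = read2
    if len(seq1) < len(pattern):
        raise ValueError("read 1 is shorter than the barcode pattern")
    # Collect the bases sitting under each pattern letter.
    umi = "".join(base for base, p in zip(seq1, pattern) if p == "N")
    cell = "".join(base for base, p in zip(seq1, pattern) if p == "C")
    suffix = f"_{cell}_{umi}"  # assumed header layout
    n = len(pattern)
    # Read 1 keeps only what follows the barcodes; read 2 is left untouched.
    new_read1 = (name1 + suffix, seq1[n:], qual1[n:])
    new_read2 = (name2 + suffix, seq2, qual2)
    return new_read1, new_read2


# Hypothetical read pair.
r1 = ("read_1", "ACGTACGGATCCTTTTTTTT", "I" * 20)
r2 = ("read_1", "CTGAAGGTCGGAGTCAACGG", "I" * 20)
out1, out2 = extract_barcodes(r1, r2)
print(out1)
print(out2)
```

After this step, read two carries everything needed downstream (cell barcode, UMI and the transcript sequence), which is why what is left of the forward read is no longer much use.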