Hello! Welcome to this tutorial on deconvoluting bulk RNA-seq data into cell proportions. This tutorial was built beautifully by Mehmet Tekman, and the tutorial itself contains all of the scientific information you're going to need. I'm Wendy Bacon from the Open University, and I'm going to be taking you through the tutorial. This video is your guide in case you're having any issues following it or getting the tools to run. But like I said, for any of the scientific content or explanations, you're going to want to read the tutorial. Or indeed, if you just need some moral support to get you through, I can help with that too. And if you have any questions, please do get in touch using the different chat forums. Galaxy is really great at answering them.

Just so you know, I'm going to be doing it on the Human Cell Atlas instance, humancellatlas.usegalaxy.eu. That's what I'm used to. Any usegalaxy.eu instance should work just fine. Onward we go.

Right, so we're going to use the humancellatlas.usegalaxy.eu instance, and I'm going to use the tutorial view, which you open with this button. I'm going to come down to Transcriptomics, and then all the way down to "Bulk RNA Deconvolution with MuSiC". This is going to be hugely valuable, particularly if you're not doing this in 2022 but are still using the video, because it means you're going to be able to use the correct tools to make the whole tutorial run. Tools get updated all the time. There's a beautiful, if I say so myself, description within this section, so please do read it, but like I said, that's not the purpose of this video. This video is to help you get through the clicking, finding all the parameters, and making sure it works.

So we're going to start with creating a new history and saving it. It's always good to do, and I often forget that. All right. So we're going to get our data. I think they also want to.
Yeah, so then we also get our single-cell RNA-seq data. Now, I'm going to cheat throughout this whole process, and things are going to go a lot faster for me because I'm going to cut out all the waiting time for you. So don't get concerned if a tool takes a lot longer for you than it does for me. I'm using the magic of pre-recording. It's quite risky to try to name the datasets whilst they're still grey or even orange, because sometimes it all errors out and then you'll have to rename them anyway, so I'm playing fast and loose.

And we're back. Everything's uploaded. That's good. We've got them. Yep, they've got our tags. That's good. So we've got some important information about the datasets, and we can now investigate them. You can look in the little window here if you want, or you can view the data. So this is the phenotype data for the single-cell RNA data. You have what's probably the well number, but basically the cell ID. And then it tells you the subject. So these are non-type 2 diabetes, so there's probably some type 2 diabetes in there. And here you have all of the different cell types. I must say I appreciate their "unclassified endocrine". I appreciate the honesty when they're annotating their cells, because sometimes you just don't know.

All right, that's cool. And that's a little bit different to the bulk phenotype data. There you have your samples for the bulk data and information about each sample: age, BMI, the all-important HbA1c factor, gender, and then tissue, probably all pancreatic islets, but I don't know.

Right, and then we can also look at the actual experimental data. So here you have the cell IDs across the top for the single-cell dataset, and then the genes, all gene symbols, along the side. For the bulk dataset you have the same thing with gene symbols, but now you have the subjects, the people, across the top for the bulk samples. Onward we go. So yes, go through all of these.
That's good for you to do. Always look into your data, otherwise the next steps make no sense. This is going to be important later when we're looking at HbA1c levels, because we can consider anyone above 6.5% to have type 2 diabetes. It's going to be quite helpful when we're searching our samples later.

Oh, the expression set. I hate this about single-cell data. It's always like, let's make a new type of matrix, a mega matrix that stores all this data. And again, this is why we use the tutorial view: I can just click this button. I don't know why that happened, but if it happens to you, just click it again. That's what we've learned. All right, so we want assay data. So that's the expression data. Then we want phenotype data. And this is really cool because it's going to carry our tags through, which will be super duper helpful. I always like to cheat, so I'm just going to rerun it: phenotype, phenotype. Cool. I suppose at the beginning of this I could have just deleted this bit and named it something like "healthy expression". Anyway, fine. All right. And that's all set.

So now we can inspect it. Oh, yeah, don't do that; that just downloads it. Don't do what I just did. That's bad. We will inspect it properly. Inspect. Oh, nice. Sometimes I've had issues and had to click and drag, but this seems okay. Cool. So what are we going to inspect for? We want the feature data table of the scRNA dataset. My guess is we're going to do both of them, aren't we? And then we'll inspect the dimensions too. Okay. So the first one's done. Now, you might need to do this. It depends. We should see a list of gene names because it's the fData, the feature data. And yes, that is what we see: a massive list of gene names. Cool. And then the dimensions. So this is genes by samples, or in this case, because it's the single-cell data, genes by cells. Cool. All right. Let's move along. Let's estimate some cell type proportions.
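To make the "assay data plus phenotype data" idea a bit more concrete, here is a minimal Python sketch of an expression-set-like bundle. It is only a toy stand-in and not how the Galaxy tool or the underlying R object is implemented. The function names `build_expression_set`, `feature_names` and `dimensions`, and all gene names, cell IDs and counts, are made up for illustration.

```python
# Toy stand-in for an expression set: an assay table plus phenotype data.
# Every name and number here is hypothetical and only for illustration.

def build_expression_set(assay, phenotype):
    """Bundle an assay (gene -> {sample: count}) with per-sample phenotype rows."""
    samples = sorted({sample for counts in assay.values() for sample in counts})
    # Every column of the assay needs a matching phenotype row.
    missing = [sample for sample in samples if sample not in phenotype]
    if missing:
        raise ValueError(f"samples without phenotype data: {missing}")
    return {"assay": assay, "phenotype": phenotype, "samples": samples}


def feature_names(eset):
    """The feature data: here just the sorted gene names."""
    return sorted(eset["assay"])


def dimensions(eset):
    """Genes by samples (or genes by cells for single-cell data)."""
    return (len(eset["assay"]), len(eset["samples"]))


assay = {
    "GENE_A": {"cell_1": 0, "cell_2": 3},
    "GENE_B": {"cell_1": 5, "cell_2": 1},
    "GENE_C": {"cell_1": 2, "cell_2": 0},
}
phenotype = {
    "cell_1": {"cellType": "alpha"},
    "cell_2": {"cellType": "beta"},
}

eset = build_expression_set(assay, phenotype)
print(feature_names(eset))  # ['GENE_A', 'GENE_B', 'GENE_C']
print(dimensions(eset))     # (3, 2)
```

The check for missing phenotype rows mirrors why it is so easy to mix up the expression and phenotype inputs: the two tables only make sense together when their sample identifiers line up.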
Why should the single-cell RNA-seq dataset contain books? Oh, and then there will be a weird extra cell type. That's the wrong MuSiC. Single-cell dataset: scRNA. Bulk dataset: bulk. Purpose: I guess we're estimating proportions. We could untick this if we wanted, but for the purposes of this tutorial, we won't. Sample ID. Here we go. These are the cell types that we want. We don't want any phenotypic factors, so we want to exclude these. When it's working out what's driving the proportions of cells, we don't want it to take into account sample ID or subject name.

Show proportions of a disease factor? Yes. Cell target: we're going to put beta. So 6.5. That's the number that we're dealing with. This doesn't actually have to be from that. I mean, ideally your single-cell dataset has some disease cells in it. But in this case, this is just going to define the label in the graph. Anything in the bulk that meets this criterion of HbA1c 6.5 is then going to have T2D defining it. It doesn't actually have anything to do with your single-cell dataset. Okay. Sometimes you will want to do this. Otherwise, the axes become quite misleading.

All done through the magic of television. And then we can look at all these lovely results. Like I said, you can remove this in the tool parameters if you want. The two different algorithms pick up different levels of cells. Quelle surprise. All right. And this is cool because you see the triangles. Anything that had an HbA1c above 6.5 now gets a triangle, and we're calling it T2D. But again, you could have put anything there. And what's cool about this is the beta cells: in a diabetes patient we would expect fewer beta cells, and we see that the bulk samples with high HbA1c also have fewer beta cells. And that's cool because it tells us that our deconvolution works. This method works, to a degree.
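The labelling step described above can be sketched in a few lines of Python. This is a hedged illustration rather than what the MuSiC tool does internally: the sample IDs, HbA1c values and beta proportions are invented, the "Normal" label is a placeholder, and treating the cutoff as strictly "above 6.5" is an assumption taken from the wording in the video.

```python
from statistics import mean

HBA1C_THRESHOLD = 6.5  # percent; the cutoff mentioned in the video


def label_sample(hba1c, threshold=HBA1C_THRESHOLD):
    """Label a bulk sample from its HbA1c value (assumes strictly above the cutoff)."""
    return "T2D" if hba1c > threshold else "Normal"


def mean_beta_by_label(samples):
    """Average the estimated beta-cell proportion within each label."""
    groups = {}
    for sample in samples:
        groups.setdefault(label_sample(sample["hba1c"]), []).append(sample["beta"])
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical bulk samples with made-up estimated beta-cell proportions.
bulk_samples = [
    {"sample": "S1", "hba1c": 5.4, "beta": 0.30},
    {"sample": "S2", "hba1c": 5.9, "beta": 0.26},
    {"sample": "S3", "hba1c": 7.1, "beta": 0.12},
    {"sample": "S4", "hba1c": 8.0, "beta": 0.08},
]

summary = mean_beta_by_label(bulk_samples)
print(summary)
```

With real results, a lower average beta proportion in the T2D group than in the other group is the pattern the plot with the triangles is showing.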
This is another neat way of visualizing it. You can see that you're picking up these four different cell types within the samples, and you can see the variability across the samples. So this one had very few alpha and beta cells and a lot more acinar and ductal cells, whereas this one maybe has loads of alpha. So you can see the inter-sample variation, which could be very, very helpful. To be fair, it would be even more helpful if this was organized by disease versus non-disease. But here we are. Goals for the future.

Cool. That was exciting. And here's where it explains it and interprets it for you. Pretty good. I like this bit. I like it when stats are made for me. You know what I'm saying? I don't have to calculate it myself. This is cool. I've never looked at this. Very cool. Anyway, yeah. So this is a log of the MuSiC fitting. I'm going to have to zoom out for this, otherwise it looks insane. So you can see there are stars by gender and HbA1c and also the intercept, which means they're interacting. But the point is, these things are all factors. We know that they're all factors. We know that they're interacting. It's a whole thing. But HbA1c and male gender in particular are impacting the proportion of beta cells and, in general, the different proportions.

Cool. So we've made it to the halfway point of this bulk RNA deconvolution with MuSiC, which means it's time for a cute dog break. Take it away, Ziggs. Aren't you the most adorable dog that ever existed? This is the best dog in the world. Come pick up your toy. Look, it's a bear dog with a bear.

Right. Second part. New history. Well, I've done this tutorial quite a few times, and at no point did I actually catch "make a new history". Here we are. That's my poor data management. Import some files. Oh, we're getting in here. And then the single cell. Being a maverick, I'm kind of writing in the tag before it goes orange.
Because if it changes color whilst you're doing this, it all goes to hell, and I'm going to have to redo the tag anyway. I'm impatient, so I'm going for it. And we're there.

Okay, now what are we doing? You can explore the datasets as before. Tell me about yourself. Single-cell RNA-seq. So, yeah, this looks more like a droplet-based experiment. Those are your barcodes, which mice they came from, and indeed, their cell type. Cool, cool, cool. Expression, pretty much the same. Yeah, Sox17. You're in everything. And then here, there are way fewer bulk samples. So now we have these APOL1s versus the control, which is not particularly important in this particular tutorial, but it may be in the future.

Okay, so we have all that. Let's do some stuff. We'll construct our expression set objects again. The good old ESet. Expression, phenotype. So easy to get that wrong. I think this is my favorite tutorial for why tags are important, because they don't seem all that important until you start actually analyzing your own data. But here, they're crucial for not mixing them up. All right, I'm going to be honest: this one took a long time. So don't be surprised if the single-cell one takes an awfully long time. Okay, we've constructed them, and we can look at them if we want.

Let's go for it. So we're doing the second half, where this isn't going to just randomly make the groups. We're going to do it in stages and also give it some information. So scRNA-seq dataset, yes; bulk dataset, yes; but this time we're going to compute dendrograms. That does the first round of cluster grouping. All right. This is the list we want, because these are the single-cell types that we have in the sample. We don't know our cluster groups yet, so we're just going to go for it. It's done. Ooh, and now we can look at the dendrogram. Cool. So we'll focus on this one.
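As a rough picture of what cutting a dendrogram at a given height means, here is a small, self-contained Python sketch. It uses plain single-linkage clustering on invented two-dimensional profiles, which is an assumption made for illustration and not necessarily the distance or linkage the MuSiC tool uses. The function `cut_dendrogram`, the cell type names and the coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist


def cut_dendrogram(profiles, height):
    """Single-linkage agglomerative clustering, stopped at a cut height.

    Repeatedly merges the two closest clusters and stops as soon as the
    closest remaining pair is farther apart than `height`.
    """
    clusters = [[name] for name in sorted(profiles)]

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(dist(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) > height:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Made-up two-dimensional "expression profiles" for five cell types.
profiles = {
    "neutrophil": (9.0, 0.5),
    "rare_type": (0.5, 9.0),
    "epi_1": (5.0, 5.0),
    "epi_2": (5.5, 4.6),
    "epi_3": (4.5, 5.5),
}

groups = cut_dendrogram(profiles, height=2.0)
print(groups)  # [['epi_1', 'epi_2', 'epi_3'], ['neutrophil'], ['rare_type']]
```

Lowering the cut height leaves more, smaller groups, and raising it merges everything into one, which is the granularity choice being made when picking where to cut.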
So yeah, one of the questions is about this cutoff, where we can see that these are grouped, and these are grouped, and these are their own sort of things. If we cut it right there, that's going to be the next section. Okay. Go up. So yes, let's do that next section. If that was our granularity, how many groups would we have? So we're going to use that, and we're also going to use this known information. If we have a list of known epithelial markers and immune markers, that's going to help us identify the cell types in a supervised way. So it'll still go through and use the best genes.

Ooh. So now we're going to run it again, but I'm not going to click that button. Why? Because I already had to input a whole bunch of stuff, so I'm just going to rerun this one. But we are now going to insert some cluster groups. We've got C1, C2, C3, as we're calling them. And what goes in them? So C1 should be the neutrophils. And then C2, to be honest, I don't even know that cell type, but that's the best bit about bioinformatics: you don't need to. Then there were a whole bunch in this C3 group. And, well, just scroll if that happens to you. Okay. There are a whole bunch in this C3 group. But we can break some of these down with the epithelial markers. Okay. I'm going to do it again. And again, these genes can distinguish the immune cells. Well, let me think. Oh, I'm stretched now, chums. Cool.

So now you can see that within each bulk sample it's giving you the cell proportions. And we can look at some PDF plots. So now we have all of this jazz. All right. So that's all the same. But now we can look within each sample. Basically there's loads of that cell type, but you can sort of see the differences across the different phenotypes. Most of these are at zero. So yeah. Cool.

Ta-da! Conclusion! We've made it, folks. I hope you enjoyed it. If you have any questions, yes, check the FAQ page or indeed the GTN Gitter page.
Or if you're here during the Smörgåsbord, or whatever wonderful, fantastic Galaxy conference you're in, you can use those chat forums as well. All right. Cheerio!