Hi, everyone. My name's Quaid Morris. I'm a computational biologist and an associate professor at the University of Toronto. I've been doing this for about 10 years in terms of my lab. I'm interested primarily in post-transcriptional regulation, gene regulation, RNA-binding proteins. I've also become interested recently in analyzing cancer genomes, and I work on electronic medical records. And I'm a co-developer of one of the tools that we're going to be introducing to you tomorrow, called GeneMANIA. And I've been teaching in the Canadian Bioinformatics Workshops for about the last six or seven years. If you have any questions, please feel comfortable just stopping me and asking. I'm going to be here for the afternoon as well, so know that we're here to help you, and if anything's unclear, I'll try to make it clear. Just in the interest of time, if there are detailed questions, what I might do is try to answer them at lunch or in the afternoon instead. But please don't feel shy about asking me questions. I like it when there's a little bit of back and forth. Right. So what am I talking to you about today? I'm talking about finding overrepresented pathways in gene lists. This is somewhat of a theoretical talk, but it has some very practical aspects. What I'm going to teach you about is the statistics that go into computing the p-values that you're going to be reporting when you do pathway enrichment analysis, which is an important component of every genomics paper these days. All right. OK. So I have a list of learning objectives here. These are things that you should be able to do at the end of today. You can read them, I can read them aloud to you, or you can just look at them. But maybe just check them off at the end of the talk. OK.
OK. So what am I going to talk to you about? I'm going to do an introduction to enrichment analysis, just to make the concepts clear, because I realize there was some difficulty with some of the concepts during Yuri's talk. And I'm going to introduce you to the hypergeometric test. It's also called Fisher's Exact Test. If you're doing one type of enrichment analysis, where you're just comparing your gene list to what I'm going to call a gene set, or a set of gene annotations, there's only one test that you need to know. That's, in fact, the only test for this type of thing; all the other tests are approximations of it. And that test is called Fisher's Exact Test. Now, if instead of just having a fixed gene list, you have a ranking of genes, and you want to ask whether or not genes with a given annotation are near the top or near the bottom of the list, then you're in the world of tests for ranked lists. And I'm going to describe one test called the minimum hypergeometric test. There are about four or five other tests for this. They're all basically similar; there's one that's slightly different. But there, you have to make some choices, and I'll try to help guide your choices. Now, when you test for enrichment of more than one category or more than one gene set, you have to correct for multiple tests. Otherwise, the p-value you report will be wrong. And there are two multiple test corrections. There's one called the Bonferroni correction, which is really easy. And then there's the false discovery rate correction, which people like a little bit better; it's slightly more complicated, but not that complicated. All right. So as I said, there are two types of enrichment analysis. So you have a gene list. You've made some choices about which genes you think are the ones that you're going to look for enrichment in.
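The two corrections just mentioned can be sketched in a few lines of stdlib Python. This is a minimal illustration, not a full statistics library, and the p-values below are invented for the example:

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR: adjusted p = p * n / rank, made monotone
    by taking cumulative minima from the largest p-value down."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_end, i in enumerate(reversed(order)):
        rank = n - rank_from_end  # 1-based rank when sorted ascending
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.001, 0.008, 0.039, 0.041, 0.27]  # made-up p-values
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is more conservative; Benjamini-Hochberg controls the false discovery rate, which is why people tend to like it a little better.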
These are the ones that are responsive to whatever drug you've treated your cells with. These are the ones that showed up as being overexpressed in your microarray analysis, or your RNA-seq analysis nowadays. These are the proteins that you've detected in a proteomic analysis in the cell type of interest after the perturbation. And so here, this answers the question: are any gene sets surprisingly enriched or depleted in my gene list? OK. So there are two different sets of genes here, and I might mix them up when I describe them; I'll try to be as careful as I can. The gene list is what your experiment provides. The gene set is a set of annotations that someone has come up with. I'll try to call the gene sets annotations so I don't mix them up. But I realize there are two sets here. And you're going to be looking at the overlap between those two sets and asking whether or not that overlap is surprisingly big. OK. And as I said, there's one test, Fisher's Exact Test. That's all you need to know. Now, if you have some way of ranking your genes, say from some sort of differential expression analysis, these enrichment analyses on ranked lists answer the question: are any gene sets ranked surprisingly high or surprisingly low on my ranked list of genes? Are they near the top, or are they near the bottom? OK. And the specific test I'm going to describe is the minimum hypergeometric test. There are a ton of others that we won't discuss, but I'll make connections between the minimum hypergeometric test and the other statistical tests that are very similar to it when I introduce it. OK. So I went fairly quickly through the concept, so let's go more slowly through it. And we're going to use microarrays here because everybody kind of remembers what microarrays are and they're nice to think about. You can replace that with RNA-seq if you want. OK. So we have some gene expression table.
We have some way of deciding which genes are responsive to the condition or the perturbation that we're interested in studying. We take that information and we add these gene set databases, for example Gene Ontology, KEGG, some of the other pathway databases that Yuri mentioned in his talk. We compare those two things, and then we have an enrichment table where we have a gene set and a p-value associated with the enrichment of that gene set, based on the seven genes that are perturbed in this experiment. OK. So just to make things a little bit more clear, here's a gene list. These are genes that we think are responsive, or that I'm interested in studying. Gene sets, or annotations, come from Gene Ontology; maybe they're transcription factors that have binding sites in the promoters of genes. And the question, again, is: are any gene annotations surprisingly enriched in the gene list? That's right, this is just five genes. Yeah. If it were ranked, it would be a lot longer. Where do gene lists come from? How do we assess "surprising" with statistics? And how do we correct for repeating these tests? Right, because we're not going to be testing one gene set; we're going to be testing lots of gene sets. OK. Here's our expression matrix. We have case and control, for example, or wild type and perturbed, or something. And previously, we learned how to calculate these differential expression statistics. So we have genes that are potentially up-regulated and genes that are down-regulated. There are various ways to compute these statistics, which I hope you're all familiar with by now. And then you can threshold by saying, OK, I'm going to draw some line, let's say two-fold up-regulated, and I can call these genes the up-regulated ones. And let's say two-fold down-regulated, and call these genes the down-regulated ones. Now, that defines a set. These genes are a set. These genes here. Sorry, a list, a list. Yeah, question. So they do start out ranked.
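The thresholding step just described can be sketched in a few lines. The gene names and fold-changes here are made up purely for illustration:

```python
# Hypothetical log2 fold-changes from a differential expression analysis.
log2_fc = {"GENE_A": 2.3, "GENE_B": -1.7, "GENE_C": 0.4,
           "GENE_D": -2.2, "GENE_E": 1.1}

threshold = 1.0  # a two-fold change, on the log2 scale, as in the talk
up_list   = sorted(g for g, fc in log2_fc.items() if fc >=  threshold)
down_list = sorted(g for g, fc in log2_fc.items() if fc <= -threshold)
background = sorted(log2_fc)  # every gene the assay could have reported

print(up_list)    # ['GENE_A', 'GENE_E']
print(down_list)  # ['GENE_B', 'GENE_D']
```

Note that the background here is all assayed genes, not just the ones passing the threshold; that distinction matters later in the talk.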
But what we're going to do is convert that ranking into a gene list by thresholding. So right now, this is the two-class design; this is the gene list setup, without rank. Right. OK. Now, you could do it based on thresholding this ranked list. The other thing you can do is, say you have some sort of time course, and you see some change in gene expression across different conditions. Then, using some sort of clustering technique, you define three different sets of genes that are all responsive in the same way to this time course. Now, these different sets from the clusters, they themselves also become gene lists. There's no intrinsic rank here either. OK. So we're still in the gene list zone. Let's take our gene list; say these ones are up-regulated. And we also need to know what the background is, and we're going to get back to this issue here. When I define a gene list, I'm defining that gene list compared to a background. So for example, in the old days, when you did a microarray experiment, sometimes the microarrays didn't cover all the genes in the genome. There were microarrays, for example, called the immune array, and what the immune array contained was genes that were being expressed in the immune system. Now, if I take a subset of those genes and I ask, are those genes enriched for those with immune function? Guess what? Any subset will be. Or most subsets will be. So it shouldn't be particularly surprising that when you look at an array measuring genes that are expressed in the immune system and you take a subset of those genes, that subset, under many conditions, is going to be enriched for genes that have immune function. So you need to know what your background is. What you're doing is comparing your gene list against all the gene lists that could have come up in your experiment. And we'll get back to this concept in a while, because it's an important concept.
And I can see it's a little bit confusing, but I just want to introduce it now so that you're thinking about it when we get back to it later in the talk. OK. So here's the gene set. The gene set will overlap with the gene list. It will also overlap a little bit with the background. And there are going to be some genes that are neither in the gene list nor the background that are in the gene set. Let's go back to this example of the immune array: not all the genes are on the immune array. So yeah, question. A gene set is a set of genes? OK, so a gene set is something that you get from a pre-existing database, like an annotation. So it's like a pathway, or it's a GO annotation. That's the gene set. These are the things defining function that you're trying to find enrichment for. No, not huge necessarily. I mean, bigger than your gene list. No, not always, no. So the Gene Ontology has a lot of categories in it. For example, development, or heart development; those are all categories of gene function. Each one of those categories is associated with a gene set. So, development: genes involved in development. That's probably a huge list; I would say that's probably in the thousands. Genes involved in heart development: that's a smaller list; I'd say that's probably less than 100. Genes involved in eye development is probably even smaller. So one thing is that these categories vary a lot in size. And it tends to be that the most informative categories are the ones that are in the dozens, like 10 to 100. Because if you just find out that your gene list is enriched in genes involved in development, great, but it doesn't tell you very much specifically. And so actually, later on in the talk, I'll talk a little bit about how to choose GO categories, because it's important to do so: the more categories you test, the more stringent your multiple test correction needs to be. Yeah, so this is the database of these.
There's a whole bunch of gene sets here, and this is one specific gene set. This gene set has some overlap with the gene list. It has some overlap with the background. It also possibly has some genes that were neither in the gene list nor in the background, because the gene list and the background don't necessarily cover the entire genome. Keep that in mind for later. OK, so we call this the query set. Do you want me to call it the query set? Yeah, so the gene list would be the query set. OK. Yeah, if I'm going to have to change my language halfway through the talk, things are going to go off the rails really quickly. OK, so just map it onto the way you're thinking about it. The gene set is what you're testing enrichment for. The gene list is what you're testing enrichment in. And the background is the background you're testing enrichment against. Maybe? OK. OK, great. So the output of an enrichment test is the p-value. And the p-value assesses the probability that this overlap that you see here is at least as large as you would get by random sampling from this background. That's what the p-value assesses: it's a measure of how likely this overlap would be if I were randomly sampling gene lists of the same size from the background. OK, so step one, you define your gene list and your background list. Step two, you define your gene sets to test for enrichment, like your annotation databases. Step three, you run enrichment tests, then correct for multiple testing if necessary. And it almost always is necessary; it's necessary if you conduct more than one test. Step four, you interpret the enrichment. Step five, you publish. Done, right? OK, so there was discussion about ranked lists, so let's say why you might want to test enrichment in ranked lists. You can always take a ranked list and threshold it somewhere to define a gene list, an unranked gene list, right? So why would you want to consider the entire ranked list?
Well, maybe there's no natural value for this threshold in your ranked list. Sometimes there is a natural value, right? If you're calculating the p-value for differential expression, maybe you choose a threshold of p equals 0.05, for example, plus some other threshold on effect size; people often use two-fold up-regulated or 1.5-fold up-regulated. But regardless, you still might be losing statistical power by choosing a threshold. And it can be somewhat of an arbitrary choice, so you might get different results with different threshold settings. And, as I said, you get this loss of statistical power. Say you choose your threshold wrong, and then you've got a whole bunch of other genes with that annotation that are just below your threshold. Well, with the ranked list type of analysis, you can identify these enrichments even if the differential expression doesn't achieve your p of 0.05 threshold value. So that's the statistical power you can gain with this test. Yeah. This is why you might want to consider using a ranked list. It's certainly easier to test a gene list; there's one test, and there are tons of tools that will do it for you. That might be your first approach. But people have moved to testing ranked lists of genes. And certainly, if you don't have any natural way of ranking, like in the methylation example that was introduced, you do have to use a gene list. All right. And so the ranked list enrichment is basically the same as the gene-list-centered enrichment. But now, instead of defining a threshold, you just take the ranking. And you have to find some way of ranking the gene list. You can rank it by p-value. Maybe you rank it by effect size. Maybe you rank it by some combination of the p-value and effect size. I can't tell you the answer to that question; I will give you some advice later on. But there are various different ways to rank the list. OK. And then here you have to choose some test that looks for enrichment in ranked lists.
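One common way to combine p-value and effect size into a single ranking, one of the choices just mentioned, is a signed score: the sign of the fold-change times the negative log of the p-value. This is only one option among several, and the gene names and numbers are hypothetical:

```python
import math

# (gene, log2 fold-change, p-value) -- made-up values for illustration.
genes = [("GENE_A", 2.3, 1e-4), ("GENE_B", -1.7, 1e-3),
         ("GENE_C", 0.4, 0.3), ("GENE_D", -2.2, 1e-5)]

def signed_score(log2_fc, p):
    """-log10(p), carrying the sign of the fold-change, so strongly
    up-regulated genes land at the top and strongly down-regulated
    genes at the bottom of the ranking."""
    return math.copysign(-math.log10(p), log2_fc)

ranked = sorted(genes, key=lambda g: signed_score(g[1], g[2]), reverse=True)
print([g[0] for g in ranked])  # ['GENE_A', 'GENE_C', 'GENE_B', 'GENE_D']
```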
I'm going to talk about the minimum hypergeometric test, because that's the test that g:Profiler uses, and that's a tool that we're going to learn to use. And then you have the same thing: you get some enrichment table, and the output of this test is a p-value. OK. So the recipe for the ranked gene list is the same as the recipe for the normal gene list, except step one is rank your genes. Step five is still publishing. Yeah. Tell me. Sorry, can you please speak louder? I didn't hear you. Sorry. Ranking by clusters: if you want to have a ranked list based on a cluster. So the question is how you would rank genes by clusters. There are a couple of options. I think what you said is that you enumerate the clusters and then you assign each gene a score that is based on the cluster it comes from. You could do that. The thing is, you have to have some natural ordering of the clusters. Is cluster six somehow bigger than cluster five? And is cluster five somehow bigger than cluster four? If you don't have a natural ordering of the clusters, then it's just like having genes that are colored: OK, here's the red gene, here's the green gene, here's the blue gene, here's the purple gene. What comes first? You don't know. But if you do have a natural ordering, you can certainly do it by cluster number. The other thing you can do is: a lot of the clustering algorithms themselves have a centroid or an exemplar; they have a pattern which is shared by everything in the cluster. So you could rank based on choosing a specific cluster, looking at its pattern, this temporal expression pattern, and asking how close each gene's expression pattern is to that temporal expression pattern. That might be another way. That's like ranking by the degree to which each gene is likely to be in cluster one. That's another approach. Does that make sense? OK, so now we're going to do a little bit of theory.
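The centroid idea from that exchange can be sketched directly: pick a cluster's centroid pattern and rank every gene by the correlation of its expression profile with that pattern. The centroid and profiles below are invented for illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical cluster centroid (a temporal expression pattern) and genes.
centroid = [0.0, 1.0, 2.0, 1.0, 0.0]
profiles = {"GENE_A": [0.1, 0.9, 2.1, 1.2, 0.0],   # tracks the centroid
            "GENE_B": [2.0, 1.0, 0.0, 1.0, 2.0],   # anti-correlated
            "GENE_C": [0.5, 0.5, 0.6, 0.5, 0.4]}   # weakly varying

ranked = sorted(profiles, key=lambda g: pearson(profiles[g], centroid),
                reverse=True)
print(ranked)  # GENE_A first, GENE_B last
```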
So I talked about this Fisher's Exact Test. Sometimes people call it Fisher's Exact Test and sometimes people call it the hypergeometric test. I'm going to use both so that you're familiar with both and used to hearing both; also, sometimes I change what I call it. Then I'm going to tell you what the minimum hypergeometric test is for ranked lists. Then I'm going to tell you about these two multiple test corrections. OK. All right, the hypergeometric test. OK, here's the null hypothesis. Every time you have a p-value, you have to have some null hypothesis. The p-value is the probability of seeing data at least this extreme if the null hypothesis is what generated the data. So here the null hypothesis is that the gene list that you have, these are the five genes that I introduced before, and they're unranked, is a random sample from the background population. So what does that mean? It means that, in this case, the black balls are the genes in the gene set and the red balls are the other genes in the background. And I reach in and I pull out five random balls from this background population. Then, what's the probability that I get four or more black balls? That's what the hypergeometric p-value calculates. And so if the probability that, just by random chance, I pulled out four or more black balls is less than 0.05, then I say that this gene list is significantly enriched for black balls. So what is that probability? Well, you can compute it using something called the hypergeometric function. You can look that up on Wikipedia and you'll see lots of math. And basically, you can plot the probability that you would pull zero black balls out of five if you reached into this background population, one black ball out of five, two black balls, three, four, five. These are what those probabilities are. And the p-value is simply the probability that you get four or more black balls. So you sum up four plus five, and that's what that number is.
So you can compute this yourself. I'm not giving you the equation, but you can get it on Wikipedia. Most people just like to plug these numbers into some online calculator or use an online tool. And that's the null distribution. So the thing that you need to compute this p-value is something called the 2-by-2 contingency table. In this case, as the rows, I have the ones that are in the gene set and the ones that are not in the gene set; as the columns, the ones that are in the gene list and the ones that are not in the gene list. So there are four black balls: four genes that are both in the gene list and black. There's one gene that's in the gene list and red. And then, once you remove these five genes, there are 496 black balls left in the bin and 4,499 red balls left in the bin. That's how you fill out the 2-by-2 contingency table. The good news is that you generally don't need to do this; you can find online tools that will compute this p-value for you. Now, sometimes people aren't interested in testing for over-enrichment; they want to find depletion. If you want to find depletion, you can just test for over-enrichment of the background. Very rarely are you going to be looking for depletion, or getting a significant p-value for depletion; you need a very large gene list in order to get that. OK, so now let's go back to this discussion: you need to choose the background population appropriately. The example I used is kind of an old example, where we have a microarray that only looks at immune system genes. But in the present day we have RNA-seq, we have proteomics, so there are different ways you need to think about defining your background population. For example, say you're looking in some cell population and you're looking for differential expression of some gene under some condition.
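As a concrete sketch of the calculation (stdlib Python only, using the numbers from the slide: 500 black balls and 4,500 red balls, so a background of 5,000 genes, a gene set of 500, and a gene list of 5 with an overlap of 4):

```python
from math import comb

def hypergeom_pmf(k, K, n, N):
    """P(exactly k genes from the gene set when drawing a gene list of
    size n from a background of N genes, K of which are in the set)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_pvalue(k, K, n, N):
    """One-sided p-value: probability of an overlap of k or more."""
    return sum(hypergeom_pmf(i, K, n, N) for i in range(k, min(K, n) + 1))

def depletion_pvalue(k, K, n, N):
    """Probability of an overlap of k or fewer; equivalent to testing
    over-enrichment of the background, as mentioned in the talk."""
    return sum(hypergeom_pmf(i, K, n, N) for i in range(0, k + 1))

# The 2-by-2 table from the slide: 4 black + 1 red drawn,
# 496 black + 4,499 red left in the bin -> N = 5000, K = 500, n = 5.
p = enrichment_pvalue(4, 500, 5, 5000)
print(f"p = {p:.2e}")  # roughly 4.5e-4
```

`math.comb` makes the exact summation trivial on a modern machine, which is exactly the point made later about approximations like the chi-square test no longer being necessary.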
Now certainly, if that gene is not expressed in your population, either under normal conditions or under the differential expression conditions, maybe it shouldn't show up; maybe that gene shouldn't be in your background population. It's not something for which you're going to be able to detect differential expression. So what you need to think about when you're defining your background population is all the genes that could have appeared on your gene list if your experiment had gone differently. For me, that's the intellectual equivalent of assuming that you have this immune system array that only measures expression of a subset of the genes in the genome. Now, if you want to think about that for a little while and then get back to me with questions about how to define background populations, I'm happy to answer them at lunch or this afternoon, and I'm also here tomorrow. OK, because this is the part of the talk where people usually have the most questions. So in fact, you guys don't have any questions? Okay, good, I have one question. Yeah, you would take all the genes that are on your array. Okay, so let me distinguish between two different things here. In your paper, if you want to report that a gene has up-regulated expression, you should be using the p-value that you get for measuring up-regulated expression, right? Now, what we're doing here is defining a gene list of interest. When you're defining this gene list of interest, you can think of that p-value as a way of ranking your genes, as a way of having confidence that they're up-regulated. Now, you can define your gene list any way that you want. So you can use p-values that are larger than 0.05 to define your gene list, for example. You just don't want to go too big, because then you're going to start putting in genes that are probably not up-regulated.
But in terms of saying, in your paper, that these genes are up-regulated, and making that statistical claim, you have to use the p-value that you got out of that analysis. You can do that, sure. Could you, sorry; I have to think about your question in a little bit more detail. Can we talk about it at lunch, and then if other people are interested in the answer, I can get up in front of people? Yes, I agree with that. Okay, so did everyone follow that discussion? I wasn't sure if it was loud enough for everybody to hear. The question was about defining the background, and I think the answer that we came up with was that you define the background based on the genes that are actually expressed under the conditions that you're looking at. Two questions. An experiment in a cell-type-specific system. Right, so I think it's more of a comment. What you're saying is, if you're doing an experiment in a specific cell type, you might not actually know the set of genes that are being expressed in that cell type, so you would use the entire genome as a background by default. So, there are two ways of looking at the answer to that question. One way is: if that gene's not being expressed in the cell type, it's not going to be affected by your perturbation. The other way of looking at it is: your perturbation might induce ectopic expression, might induce expression of genes that normally wouldn't be expressed in that cell type. So it becomes difficult to answer the question of how to define the background; you have to make that decision yourself. Right, and the way you make that decision is, as you said, and as I said previously: the background is all the genes that could have shown up as being very differentially expressed under the condition that you're measuring. And that can be due to your ability to measure them.
Right, so some genes are very hard to map RNA-seq reads to because they contain a lot of repeats. Or some proteins are really hard to get mass-spec data out of because they don't yield good peptides. So that's one thing to keep in mind, and the other thing to keep in mind is whether or not it's ever possible for that gene to be expressed in the cell type that you're looking at. I mean, it's not terrible to use the entire genome as your background. You should just be careful when you look at the results, so that you don't fool yourself because of this immune array issue, right? If you're only ever going to be able to see genes that are expressed in the immune system in your gene list, you shouldn't be surprised if any random subset of your gene list is enriched for immune function. So you're switching from microarray to RNA-seq, for example, yeah. So you're using NGS or RNA-seq and the representation is random; what do you mean? Okay, so the question, again, is about defining the genomic background, and the specific question relates to, for example, using a new experimental protocol, like NGS sequencing, where not all genes would necessarily be measurable, because potentially they're not amenable to the protocol that you're using, in terms of being able to show up in the flow cell. We could discuss this. One thing you could do is look at genes that are expressed under any condition; those are certainly things that you can measure, and that could be your background. That's typically what I do in my lab: I say, okay, let's take the genes that are expressed under any of the conditions; those are my background, and then my gene list is the ones that are expressed under the specific condition. That's one way of defining it. But these are hard philosophical questions to answer.
For example, one of my colleagues, Anne-Claude Gingras, has proteomics data, and she keeps a database of the proteins that she's actually been able to detect under some perturbation in a cell type of interest. And that would answer the question you're asking: how to define this background set under the assumption that not all genes can be measured. But I see Michelle's getting a little bit uncomfortable, so I'm going to move on at this point, but I'm happy to discuss this more later on. Okay, all right. You've got to think about it. Yeah. All right, so here we go. What? Yeah, they might challenge her choice, yeah. And you, as a reviewer, can challenge someone else's background choice. Okay. So, if you just have a gene list, Fisher's Exact Test is the test that you use. Sometimes people will tell you it's the binomial test or the chi-square test. Those are approximations of Fisher's Exact Test from back when computers weren't fast, because with Fisher's Exact Test, you've got to sum up a whole bunch of different terms, right? And some people don't realize that computers are now fast enough to compute the exact test. That's why it's called the Exact Test; the chi-square is an approximation to the exact test. Ranked lists: like I said, the minimum hypergeometric test is what we're going to learn. This is extremely similar to another test called the GSEA test, which you've probably all heard about. And the GSEA test is extremely similar to another test called the Kolmogorov-Smirnov test. So these three tests: the description I'm going to give more or less applies to all three. There's another test, well, another two tests, which are really the same test with two different names. It's called either the Wilcoxon rank-sum test or the Mann-Whitney U test, and they're identical. Sometimes people call it the Wilcoxon-Mann-Whitney test. Originally, nobody realized that they're identical. And this is like a robust t-test.
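For concreteness, here is a bare-bones sketch of the Wilcoxon/Mann-Whitney idea applied to scored genes: the U statistic counts, over all pairs, how often a gene-set gene outscores a gene outside the set, and a normal approximation turns U into a z-score. This omits tie corrections, and the scores are hypothetical:

```python
import math

def mann_whitney_u(in_set_scores, out_set_scores):
    """U statistic: number of (in-set, out-of-set) pairs where the
    in-set gene scores higher; ties count one half. Sketch only,
    with no tie correction in the variance."""
    u = 0.0
    for a in in_set_scores:
        for b in out_set_scores:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def z_score(u, n1, n2):
    """Normal approximation to the null distribution of U."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean) / sd

# Hypothetical differential-expression scores.
in_set  = [3.1, 2.4, 1.9]        # genes carrying the annotation
out_set = [0.2, -0.5, 1.0, 0.1]  # everything else
u = mann_whitney_u(in_set, out_set)
print(u, z_score(u, len(in_set), len(out_set)))
```

Because it only compares medians of ranks, this test misses some kinds of differences (for example, a gene set split between the very top and very bottom of the list) that the KS-style tests can pick up, which is the trade-off mentioned above.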
So it asks whether or not the median of the genes in the gene set is different from that of the genes not in the gene set. Sorry? I can't see who's speaking, sorry. Oh, there you are, sorry. Which do I recommend? Well, ask me that question again when I introduce it. So they test for different things. The nice thing about the Wilcoxon is that it's easy to compute the p-value. The problem with the Wilcoxon is that there are some types of differences between gene lists that it doesn't detect, whereas these other tests will detect them. I used to have a slide about this; we'll see if that slide is still in my deck, and if it's not, I'll explain it when we get to that point. Okay, so how do you compute the minimum hypergeometric test? You calculate the p-value at multiple thresholds. So, you know how to compute the hypergeometric p-value now, right? Now, you have this ranked list, and everything on the list is either in the gene set or not in the gene set. You just go down the list, and one thing you could do is try every possible threshold, compute the p-value at every one of those thresholds, and then take the minimum p-value. That's not an incorrect way of describing what this test actually does. And here's a paper that introduces the test. Okay, but the problem, as you know, is that you're taking the minimum p-value over multiple thresholds, so you do need to correct for multiple testing, and I'm going to tell you how you would do this. Okay, so here we go. Here's your gene set, and I'm representing genes in the gene set as lines. And then I'm representing the list of genes that you have, all the genes in your background, as this kind of continuous gradient, right? And the location of each line shows you where that gene ranks in this continuous gradient. So, meaning, this is like rank five here and this is like, well, let's talk about percentiles.
So this is probably like the 60th percentile and this is like the 30th percentile or something, okay? So now the question is: are these lines significantly towards one end or the other end of the list? Okay, so to do that, you need some way of computing a score, some enrichment score, right? And like I said before, what you're doing in the minimum hypergeometric test is basically looking at each one of these lines, where a gene from the gene set shows up in the ranking, and saying: what would the p-value be if the list were thresholded just below that gene? So in the next slide, I'm going to plot those p-values as a function of where you are in that ranked list, right? Now, p-values are not really fun to look at in a plot, so what I'm going to plot instead is the negative log10 of the p-value. So what does that mean? If the p-value is 10 to the minus three, the value I'm going to plot is three, right? If the p-value is 10 to the minus five, the value I'm going to plot is five. It's just an easy way of looking at really small numbers. Okay, so here's the negative log base 10 p-value, right? If we go up here, the highest point is the threshold at which we get the smallest p-value, okay? So what's happened here is that I've calculated the p-value at every single gene in this list, and wherever you get a gene from the gene set, the p-value goes down, so the negative log p-value goes up. You can see that whenever you hit a new gene from the gene set, the number goes up; when you get a gene that's not in the gene set, the number goes down. It works like that, right? And so the maximum here corresponds to the minimum hypergeometric p-value. That's where you choose the threshold, and that's your final score, right? And this is essentially the gene list that you've identified by finding the maximum of the score.
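The scan just described can be sketched directly: walk down the ranked list, recompute the hypergeometric tail p-value each time a gene-set gene appears (the minimum can only occur right after a hit, since a non-hit raises the tail probability), and keep the best one. This returns the uncorrected minimum p-value; as noted above, the real test then has to correct for having tried every threshold. The membership vector is a made-up toy example:

```python
from math import comb

def tail_p(k, K, n, N):
    """P(k or more gene-set genes) under the hypergeometric null."""
    return sum(comb(K, i) * comb(N - K, n - i) / comb(N, n)
               for i in range(k, min(K, n) + 1))

def min_hypergeometric(membership):
    """Scan every threshold down the ranked list. membership[i] is True
    when the gene at rank i belongs to the gene set. Returns the best
    (uncorrected) p-value and the threshold achieving it."""
    N = len(membership)
    K = sum(membership)
    best_p, best_n, k = 1.0, 0, 0
    for n, hit in enumerate(membership, start=1):
        if hit:
            k += 1
            p = tail_p(k, K, n, N)
            if p < best_p:
                best_p, best_n = p, n
    return best_p, best_n

# Toy ranked list of 20 genes; True marks membership in the gene set.
ranks = [True, True, True, False, True] + [False] * 14 + [True]
p, cut = min_hypergeometric(ranks)
print(p, cut)  # best threshold is just after the 5th gene
```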
Yeah, so these lines here are the annotations, yeah. So the continuous gradient is all the genes in your experimental assay ranked from top to bottom, and where the lines show up, those are the appearances of genes that are in the gene set, the annotations. So this ranked gene list includes the background. Like I've talked about before, your background is the set of genes that you pull from; those are the ones that get ranked. Yeah, and you have a question? Well, so the question is whether that's a way to find a threshold for being up-regulated. But as I said earlier, the question of up-regulated or not is something that you settle with an earlier analysis, right? It's a way of defining the gene list of things that could be up-regulated. But you can't say these things are up-regulated, because you have another p-value that tells you that. Sorry, I just missed the last ten words that you said. You've got a gene list that's ranked by... okay, so the question is, can you use this thing to prune your background down to a smaller set? So you can think about choosing the threshold as a way of doing that, right? You're choosing the threshold where you get the best enrichment p-value for this gene set. So once you choose that threshold here, you have this set of genes, and this is the set of genes that are most enriched, by some measure, for the annotation that you're asking about. So if you wanna do pruning, you prune from here on, right? But the problem is you can't make decisions about how to do your enrichment based on the results of your enrichment, because then you get circularity. You see the problem. So you can't say, oh, now I'm gonna take this gene out and I'm gonna redo my enrichment, because you've essentially already done your enrichment.
And the only reason that you know that this is a bad gene is because you did your enrichment test. I mean, you could use that as a way of, say, identifying genes that might be artifacts, or might have shown up as being up-regulated for some reason that's independent of what you're studying. But you can't report a p-value that you got by doing an enrichment test and then massaging your gene list so that you can redo the enrichment test; that's circular, or operator bias, as you mentioned. So the question is, when you choose the threshold at the maximum, you're not necessarily getting everything. I mean, there's nothing you can do about that. Sorry, what was that? That's a good point. So the point is that when you're choosing this threshold, you might not get all the genes in the pathway. And certainly we're not getting these genes down here in the pathway. But later on, you can go back and use maybe some sort of network analysis to try to get more genes in that pathway. And that's actually something that I'm gonna be talking about tomorrow. So this is a really good point. The other thing to say is, for these genes that are down here, these pathway databases aren't perfect. And depending upon what evidence codes you use for the Gene Ontology, for some of these evidence codes the function of the gene is just being guessed by a computer, and no human has ever looked at it. Those are called electronic annotations. So maybe these genes are not actually involved in the pathway, and it's a mis-annotation. That's something to also think about here. I mean, you're not gonna be able to do anything about that. But certainly don't get uncomfortable if you don't pick up all the genes in the pathway. So the p-value is basically defined by taking the threshold here and then doing a hypergeometric test up here. That's what I'm calling the score. Now, that score isn't necessarily a p-value.
You have to do something to translate that score into a p-value. I'm not going to describe it in detail, but there is a translation from that score into a p-value. Yeah, one more question. So that's a great question. The question is, what about gene sets that are enriched at one end or the other, or gene sets that are enriched at both the top and the bottom of the list? Yeah, so there's two answers so far to that question. One is your answer, which is: just reverse the order of the gene list and redo this. And on the other side, what's your name? Joseph. Joseph's answer is to just take the absolute value. So instead of having up-regulated or down-regulated, just say differentially regulated. So this is, I think, the point where I should maybe say something about the other tests like this. What I've drawn here on these slides — I actually used these slides in previous talks to explain the GSEA test, because it's so similar to the minimum hypergeometric p-value test. The only difference is that this axis is a little bit different. They call it the enrichment score. Instead of being the negative log p-value, it's some number that goes up when you get a gene in the gene set and goes down by something small-ish when you get a gene that's not in the gene set. Okay, and the way that GSEA works is it finds the maximum difference from zero — the maximum absolute difference from zero. So here, in the minimum hypergeometric p-value, we're finding where the curve is highest, and GSEA just finds where it's most different from zero. And so, for example, in your question, where you have genes that are enriched down here, what's gonna happen with the ES score is you're gonna get a dip, right? That's not gonna happen when you compute the hypergeometric p-value.
I mean, it can happen if you set your zero at, say, 25; your p-value is gonna get higher. So it might work that way, but I don't know anybody who tries to make it work that way. So in this case, you just reverse the order of your gene list. Now, looking for this maximum difference from zero should also work in the case where the genes from your gene set are either at the beginning or at the end of the list. What's gonna happen is the curve is gonna go up, then it's gonna go down, and then it might go down again here, depending upon how you compute the enrichment score. And so GSEA will sometimes detect that, but the test that looks for the maximum difference, and definitely detects it, is called the Kolmogorov-Smirnov test, the KS test. Okay, and it's these interesting examples, where you have things at the top of the gene list and things at the end of the gene list, that Kolmogorov-Smirnov definitely detects, that the minimum hypergeometric test doesn't, and that GSEA sometimes detects. Right, but there's various ways around this, like reversing your gene list or taking the absolute value. Does that make sense? Okay, great, all right. So now we gotta correct this score for multiple testing. You can use a standard multiple testing correction, which we'll see in the next section. The other way that people have done this — I think this is what g:Profiler does — is you can compute empirical p-values using permutations. So what does that mean? What that means is that you take the ranked list that you have and you just randomly re-sort it, right? And then you see where the gene set shows up in that. You make the same plot, you find your maximum again, and that's a number that you associate with that random permutation of the ranked list. And you do that random permutation thousands of times, you collect these numbers, and then you ask: how often is the number that you get with the real ranking greater than or equal to the number that you get with a random permutation?
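To make the contrast concrete, here is a toy, unweighted running-sum score in the spirit of GSEA and the KS statistic: go up on every gene-set gene, down on every other gene, and report the maximum deviation from zero. This is a simplification for illustration only — the published GSEA statistic weights the steps by the ranking metric — and the function name is my own. It assumes the list contains at least one gene of each kind.

```python
# Toy unweighted running-sum enrichment score (KS-style).
# The real GSEA statistic weights hit steps by the ranking metric.
def running_enrichment_score(ranked_in_set):
    N = len(ranked_in_set)
    K = sum(ranked_in_set)
    hit_step = 1.0 / K          # step up on a gene-set gene
    miss_step = 1.0 / (N - K)   # step down on any other gene
    es, extreme = 0.0, 0.0
    for in_set in ranked_in_set:
        es += hit_step if in_set else -miss_step
        if abs(es) > abs(extreme):   # track maximum deviation from zero
            extreme = es
    return extreme
```

A set enriched at the top gives a large positive score; the same set at the bottom gives a large negative score, which is the "dip" mentioned above.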
Sorry, it's the other way around, but it'll be clear when I show you the slides. So remember what the null hypothesis is. The null hypothesis is that this gene set is distributed randomly in the ranked list, and the way that we generate samples from the null hypothesis is we randomly re-order our ranked list and find out where the gene set is. When we randomly re-order the ranked list, the gene set is distributed randomly in that list. So we randomly re-order the list, we get the score; we re-order the list, we get the score; we re-order the list, we get the score. Here this is done 2,000 times, this is the histogram of the scores that you get, and these are the counts. And the score that we got is way out over here, and the p-value, using this empirical way of estimating the p-value, is just the fraction of times you got a score at least as large as the score you got for the real list. It's not bootstrapping; it's called using permutations to compute empirical p-values. Bootstrapping is something that's slightly different. Bootstrapping is re-sampling and re-computing something, and what the bootstrap tells you is what your uncertainty is in that number. This is a way of generating samples from the null hypothesis when you can't analytically compute what the distribution under the null hypothesis should be. Sorry, you'll have to speak up, I can't hear you. So it doesn't depend upon your list size, because basically what you're asking is: how often would a randomly sorted list give me a score this high or higher? What it depends on is how significant you want your p-value to be. So if you need a p-value that's like 10 to the minus five — that's one in a hundred thousand — then you need to re-sort the list at least a hundred thousand times.
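The permutation scheme just described can be sketched as follows. The enrichment score here is a deliberately simple stand-in (hits in the top k); any of the scores discussed above could be plugged in. The add-one in the final ratio is a common convention so the empirical p-value is never reported as exactly zero; the function names are my own.

```python
# Empirical p-value by permutation: shuffle the ranked membership
# list many times and ask how often the null score beats the real one.
import random

def top_k_hits(ranked_in_set, k=10):
    # Toy enrichment score: gene-set members in the top k ranks.
    return sum(ranked_in_set[:k])

def empirical_pvalue(ranked_in_set, score_fn, n_perm=2000, seed=0):
    rng = random.Random(seed)
    observed = score_fn(ranked_in_set)
    shuffled = list(ranked_in_set)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # one sample from the null
        if score_fn(shuffled) >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

With 2,000 permutations the smallest p-value you can report is about 1/2001, which is exactly the point made above: the resolution is limited by how many times you re-sort.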
If you need a p-value that's like 10 to the minus three, one in a thousand, you only need to re-sort it a thousand times, or a bit more than a thousand. Yeah, that's right. Okay, so you're right: the reason that you need very small p-values is that later on you're gonna correct them. And there's a lot of discussion we could have about this — a beautiful statistical discussion that I love. But just to answer your question quickly: what people sometimes do is run the permutations and then fit some sort of functional form to the distribution to estimate these tail probabilities, so they can get estimates of the p-value that don't require a lot of permutations. All right, so here we go. So that's how these p-values are computed. I'm not recommending that you do this yourself; it's gonna take a long time. But when you see software that does an empirical p-value computation, this is what's going on, right? And if you have to wait a long time for the software to run, the reason you're waiting is that it's randomly re-sorting these lists and re-computing the enrichment score. Okay, great. So now we're on to multiple test corrections. Any quick remaining questions here? We've only got half an hour. Usually the background is five to ten thousand genes. Yeah, I mean, usually those are the genes that your gene list is chosen from. Okay, so maybe I misunderstood your question. I thought you were asking about the general length of the background list. The background list is just everything that you think your gene list should be taken from. Now, if you're asking how large a gene list you need to get significant p-values, that's a little bit of a different question. And it's a very important question, because you're gonna be choosing the threshold somewhat arbitrarily.
So often what I find is that gene lists somewhere between 30 and 100 genes give you the best resolution, in terms of being very specific about the functions that are being represented but also having enough statistical power that you can detect changes. This is a really, really rough guide. Now, if your gene list has like five genes in it, sometimes it's very hard to detect significance, because your statistical power depends on the size of your gene list — that number shows up when you compute the p-value. But that being said, people get gene lists a thousand genes long and they get really interesting things out of those gene lists. What you're gonna get is a lot of categories that are significant, and then you're gonna wanna use some sort of further analysis, like the enrichment map that Yuri showed, in order to group those categories into similar functions. All right, so we all wanna get to step five and publish our paper, right? We can't get there until we get a significant p-value. So how do we do that? Well, there's actually a really easy way to get a significant p-value. And that's just continuing to draw balls from this distribution until we get a draw that we like, right? So based on the p-value that we computed earlier for four black balls from this background population, on average you only have to do about 8,000 draws to get something that is that significant, right? So if you wanna publish your paper, you can just get past this p-value boundary by continuing to do random draws. Now, obviously we're not choosing a whole bunch of different gene lists, but when you look at different annotations, you are effectively choosing different gene lists, right? Changing the annotation is in some ways like changing the gene list, right? So you have one observed draw.
Now, we've marked these things as both being black and having a particular shape, right? So if we have an observed draw, like black balls, but then we say, okay, well, instead of looking at whether or not the ball is black, I wanna see if it's round or square — or there's 8,000 other features of the balls that I'm gonna look at — that's like taking your draw over and over again, right? And so, yeah, good point. So the point is that the features themselves can be linked. Yes. In this case, I'm assuming that the features are all independent of one another. Certainly, if the features are linked to one another, it's not like taking as many effective draws. Now, the thing is that if you know something about how the features are linked together, you can do the multiple test correction slightly differently. The multiple test corrections that I'm telling you about are ones where we don't know anything about the linkage of the features. Because basically, when you compute a p-value, you wanna be as conservative as possible, because you're making a claim. So the p-value that you compute is a bound, an upper bound, on the p-value, right? If we treat all these annotations as independent of one another and make the correction under that assumption, that stringent p-value is going to be higher than the p-value which incorporates all the linkage information, right? But we wanna be careful, right? You wanna avoid making false positive claims. And that's what the p-values are all about. Okay. So this is the super stringent correction that treats all these annotations as independent of one another, and this correction is called the Bonferroni correction. I love this correction. It's so easy to explain. I can show you an equation here. Say you have M different gene sets or annotations that are tested.
The corrected p-value is just the original p-value times M. So let's say we're using the Gene Ontology. We have 1,000 categories, and we've computed p-values for every single one of those categories. The Bonferroni-corrected p-value is 1,000 times the p-value that we computed, for each of those categories. Okay, does that make sense? Yeah, okay, easy, right? Okay, now the issue of the corrected p-value being an upper bound on the actual p-value comes up here. Say we test a category and there's no enrichment at all; we get a p-value of 0.5, right? That's not significant, right? It's not 0.05, it's 0.5 — it's 50%. Now we're gonna take that p-value and multiply it by 1,000. What do we get? We get a p-value of 500. Now, that sounds stupid, right? Because p-values are supposed to be at most one, right? And this is the point where people usually start to freak out. But what that means is just that it's an upper bound, right? The p-value is less than 500, right? In that case, you also know it's less than one, because p-values can't be greater than one. But this is what the Bonferroni correction gives you: an upper bound. Because it's making as few assumptions as possible. It's using something called the union bound in probability. Which, like I said, treats all the annotations as if they were unrelated; and as you brought up, the annotations might be linked in some way, so that two different annotations might actually be testing the same thing. All right, and this whole thing here about it being a bound — being greater than or equal to the probability that one or more of these is a random draw — that's the technical statement that corresponds to what I just said. And sometimes, when people use the Bonferroni correction, they call this controlling the family-wise error rate.
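As promised, the Bonferroni correction really is a one-liner. This sketch also caps the result at 1, since, as noted, the raw product is only an upper bound and a reported p-value should not exceed one; the function name is my own.

```python
# Bonferroni correction: multiply each nominal p-value by the number
# of tests, capping at 1 since p-values cannot exceed one.
def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

So with two tests, a nominal p-value of 0.01 becomes 0.02, while the hopeless 0.5 from the example above is simply capped at 1 rather than reported as something larger.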
Okay, and the only reason I put this down is that people have come up with more careful corrections than the Bonferroni correction that do the same thing and are basically similar to the Bonferroni correction. So sometimes you'll see in a paper that someone is controlling the family-wise error rate using some correction that you don't know. It's essentially going to be the Bonferroni; it might be a little bit better than the Bonferroni. Okay. Okay, so the problem is that if you're gonna multiply your p-values by 1,000, you're often in big trouble, right? It's very stringent, and it can wash away your real enrichments — you get false negatives, right? Now, if you're good friends with a statistician and you know that there are some links between your annotations, sometimes you can incorporate that information when you're computing a p-value that controls the family-wise error rate. But there's another thing that people have done — and this was actually introduced in genomics — which is to accept a less stringent condition. That condition is called the false discovery rate, or the FDR. And this leads to a much gentler correction when there are real enrichments. Okay, so what's the difference between the family-wise error rate and the false discovery rate? Let's see if I have a slide about that. Okay, oh, good, I do. So the family-wise error rate — what the Bonferroni correction controls — is the probability that one or more of your observed enrichments are due to random chance, right? And it's an upper bound on that: one or more, so it could be one, it could be two, it could be ten. It's like saying the probability that none of these enrichments are due to random chance is at least one minus the p-value, right? The false discovery rate, on the other hand, is the expected proportion of the observed enrichments that are due to random chance.
Okay, so say I compute p-values for each one of my individual categories, and then I do a Bonferroni correction — I multiply by 1,000 — and then some of my categories have a corrected p-value of 0.01, right? Sorry, let me take a step back and explain that again with a number you're familiar with. Let's say our significance threshold is 5%. So we compute p-values for each of the 1,000 categories, we multiply by 1,000 to get our Bonferroni-corrected p-values, and then we find that 100 of the categories still have corrected p-values less than 0.05. Okay, we're gonna call those significant, right, at a Bonferroni-corrected p-value of 0.05. So what does that mean? It means there's a 5% chance or less that any of these 100 are false enrichments — one or more of them. The FDR, instead, is about the proportion of these 100 that are false enrichments. So if we do the FDR correction and we control the false discovery rate at 5%, and we have 100 enrichments that we're gonna report, what the FDR is saying is that, on average, no more than 5% of these 100 are false. So the number that are false is five or fewer, on average. Some people are comfortable with that — certainly a lot of people are comfortable with that. It doesn't tell you which ones are the false ones, but it says that, in general, about 5% are. Sometimes people control the false discovery rate at 10%; that's very popular. That says 10% of my observed enrichments are due to random chance. Okay, and as you can imagine, that makes for a much less stringent threshold. And certainly how stringent that threshold is depends on how many enrichments you report.
If you report 700 enrichments, and you're controlling at 10%, you're saying that on average about 70 of my enrichments are gonna be false, okay? So it becomes less stringent as you report more enrichments, because you're going from, say, five to 70. Okay. So now how do you compute that? I told you how to compute the Bonferroni — everybody can do that now: you just count your number of tests and multiply the p-value by it. Computing the false discovery rate is a little bit more complicated, but not much. We can do it in an example. Okay, so the first thing you do is you take all the things that you're gonna test — these are the different categories, and there are 53 of them — and you compute the p-value for each, just using Fisher's exact test. This is called the nominal p-value: the p-value you compute with Fisher's exact test before you do any correction. And you rank them from smallest to largest. Then the next thing you do is compute the adjusted p-value. How do you compute the adjusted p-value? You multiply by the number of tests that you did, but then you divide by the rank in the list. So up here, for this adjusted p-value, I'm taking the p-value and multiplying by 53 — that's the number of tests I did. That's the Bonferroni-corrected p-value right there at the top of the list. But when I go down here, it comes out to be 0.053 — I constructed this example myself — because now I'm multiplying by 53 divided by two. So it's less of a correction, right? And that correction gets smaller and smaller as I go down, until the last p-value, which I don't correct at all — I just take the straight p-value, because I'm multiplying by 53 divided by 53. Right, so now we have the adjusted p-values. But look: these adjusted p-values aren't necessarily in sorted order anymore, right?
Because the correction factor you're multiplying by gets smaller as you go down. You can see that: 0.053, 0.053, 0.04, 0.053. And the math is there if you wanna try to reproduce this afterwards. Okay, so now the FDR, the false discovery rate — people sometimes call this the q-value instead — you get that by, for each one of the ranks, taking the smallest adjusted p-value at that rank or below it in the table. So at this rank, the smallest adjusted p-value at this rank or below is 0.053. At this rank, the smallest adjusted p-value at this rank or below is 0.04, so its FDR becomes 0.04. And now your p-value threshold for a false discovery rate less than 0.05 is the p-value at which the FDR reaches 0.05. Even though the adjusted p-value list doesn't always increase, this FDR list always increases, right? Because this value here is gonna be the minimum of this value or any that occur below it, and this value is gonna be the minimum of this value or anything that occurs below it — so it's always gonna increase or stay the same. Okay, so this is the p-value threshold at which you control the false discovery rate at 0.05. Now, you can implement this in Excel; it's really easy to implement in Excel. I mean, most of the tools that we're describing are gonna do this correction for you, but if you're ever in the wilderness and all you have is Excel, you're still okay. All right, so any questions about that? So now I've told you about two different corrections that you can make so that the p-values you report aren't reporting false positive enrichments. Okay, but some of these are stringent — even the FDR can be a little bit stringent — and there are things that you can do to avoid having to make these really nasty corrections on your p-values, right?
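The whole procedure just walked through — rank the nominal p-values, multiply each by m over its rank, then take the running minimum from the bottom of the table upward so the q-values never decrease — fits in a short function. This is a sketch of the Benjamini-Hochberg procedure as described; the function name is my own.

```python
# Benjamini-Hochberg FDR (q-values): adjust each p-value by m/rank,
# then take the running minimum from the largest p-value upward so
# the resulting q-values are monotone.
def benjamini_hochberg(pvalues):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):         # walk from the bottom of the table up
        i = order[rank - 1]
        adjusted = pvalues[i] * m / rank
        running_min = min(running_min, adjusted)
        qvalues[i] = running_min
    return qvalues
```

For instance, the nominal p-values [0.01, 0.02, 0.03, 0.5] give q-values of about [0.04, 0.04, 0.04, 0.5], so the first three pass an FDR threshold of 0.05 even though only the first would survive a Bonferroni correction.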
So if you're gonna test 10,000 gene sets, you're always gonna have to multiply your p-values by 10,000, or you're gonna have to do the FDR correction — and the FDR will save you under some circumstances, but it won't always save you, right? So another way of dealing with this is to reduce the number of tests that you do, right? If you're only interested in some functions, only test for those functions — that's the first thing that I would suggest. The second thing that I would suggest is to think about statistical power when you're choosing which tests to do, and think about what the test will tell you, right? So as Yuri explained, the Gene Ontology is represented as a hierarchy, right? The nodes at the top of the Gene Ontology are things like development, metabolism — very broad, not super informative things, right? And then the middle nodes are things like, say, eye development — that's the example that I came up with. Those are the middle nodes of the GO hierarchy. And then at the bottom are very, very specific processes. Okay, so if you're gonna choose categories to test, I would test the middle, right? Because for the top nodes — I mean, depending upon what you wanna do, there aren't that many categories at the top, so you can throw them in if you want. But does 'development' tell you what you wanna know? Does 'metabolism' tell you what you wanna know? It might be a little bit too broad for you. The stuff at the bottom is very informative — I can't even think of an example off the top of my head, but it's very specific. But these gene sets only have like three genes in them. And the problem with testing for a gene set that only has three genes is that you don't have a lot of statistical power.
The p-values that you can get — the size of the p-value — depend upon the size of the gene list, which we've already discussed, but they also depend on the size of the gene set. So if you only have three genes in the gene set, there's some minimum that your p-value can get to, and that minimum is often simply not small enough to survive the correction, right? So what do I do in this case? Some of the tools that are out there, like DAVID, for example, allow you to select the GO categories to test based on where they are in the hierarchy. And g:Profiler does the same thing, right? No, it doesn't, never mind. But g:Profiler does something that I like better, and that's selecting the categories by size: you select gene sets that are in a given size range. Yuri, you wanted to say something? I'll repeat it. So what Yuri was saying is that conceptually, selecting by GO level is not really a meaningful concept, because when you say 'I'm going to select GO level three' — that's the number of ancestors until you get to the top node — there are some nodes that are at level three and level four and level two all at once, because there are multiple paths to the top node. If you go one path, a node has two ancestors; if you go another path, it has three ancestors. So it's not a meaningful concept. But going by GO category size has always worked pretty well for me. And so you use sizes that would allow you to get p-values that are small enough. I usually use a minimum of about 10, sometimes 30, genes as the minimum size that I work at. Right, and what that does is it removes a lot of the smaller GO categories, so you're not testing for them. And because you're not testing for them, they don't go into the number that you have to multiply by when you do the correction. Now, just let me caution you about something here.
You can't make any choices about the tests that you're gonna do after you've done the tests. Right? That's cheating; that's circular. You're like, oh, these aren't significant, I'm gonna remove them when I compute the size of the correction I need to do. Okay, it's obvious that you can't do that. What people wanna do sometimes is take their gene list and say, okay, we're not gonna test for any categories that aren't represented somewhere in the gene list. Right, so I have a set of 10 genes, none of these genes are involved in eye development, so I'm not gonna test for eye development. But that's like implicitly doing the test, right? You're still doing the test implicitly, by noting the fact that there are no eye development genes in your gene list. So that's not fair either. That's like looking at the result of your enrichment test when you choose which tests you're gonna do. Well, the way you presented that makes it sound like it's okay, but it's not okay. No, that's not okay. It is a practical thing that people might do — that's why it's important to know that it's not okay. Right, it's an obvious thing to do; I mean, I wanna do it too, but it's not okay. Yeah, yeah, that's okay. I mean, it's not perfectly okay, but it's mostly okay, and it's what people do anyway — as long as you correct for the increased number of categories. Michelle? Oh, I'm sorry. So the question is: you start with a small list of categories — let's say you start with GO Slim. You don't find anything significant. So then you increase the list of categories, and you test in the larger list of categories. And is that okay? As long as, when you test in the larger list of categories, you do the appropriate correction for the larger list, it's almost okay. And it's almost okay enough that it's what a lot of people already do.
Right, so, in theory, you shouldn't be allowed to choose a category to test, and then choose another category to test, and so forth, because you're kind of using information about all the categories when you do that. But people do do this, and it's not terribly bad. So it's okay — just, you know, just among us. But with GO Slim, aren't you most likely to not get significance if you then use all the data? No, I don't agree with that. So the point was: if you don't find significance with GO Slim, you're not gonna find significance with smaller categories. Now, you're right in that the GO Slim categories are larger, so you're gonna have more statistical power. But it might be that with the GO Slim categories — a GO Slim category would be something broad, like development — you don't find significance, and you still find it with more specific categories. The claim was: if you're not gonna find significant results with GO Slim, then with the larger set of annotations you are even less likely to get a significant result, because you're seeing the full data set. Yeah, and by full data set, you mean the full set of annotations, right? Yeah. Right, but that's not necessarily true. It's not necessarily true that if you don't find significant things in GO Slim, you're not gonna find significance with more categories. Because GO Slim has very broad categories, like development, right? Which means that category might contain genes that are being affected by your assay, but also a lot of genes that aren't being affected by your assay. And the genes that aren't being affected by your assay are diluting the enrichment. So if you can be a little bit more specific about the question that you're asking — like eye development — you can get significance even if the entire set of development genes is not enriched. You could imagine that genes involved in eye development are up-regulated while genes involved in, say, liver development are not. Okay, more questions about that? Great, okay.
Wow, I think we're on time. So, today I told you about statistical tests for gene lists, and there's one you need to know: Fisher's Exact Test. And then for ranked lists, there's a bunch of them. I described the minimal hypergeometric test, but, you know, GSEA and the Kolmogorov-Smirnov test, you can call it the KS test, everybody does, are equivalent, they do basically the same thing using a slightly different scoring method. Wilcoxon and Mann-Whitney are like t-tests, basically, that's one way to think about it. And I talked about two multiple test corrections. One is the Bonferroni correction; this controls the probability of at least one false positive, also called the family-wise error rate. And the other correction I talked about, the FDR, is more forgiving, and it controls the expected proportion of false positives, right? So, sorry, if you have 100 categories that you're gonna say are enriched, the Bonferroni p-value is the probability that any one of those is a false positive. And the FDR is the expected proportion of those that are false positives. And there are the learning objectives. And are there any questions? Let's go here. I mean, if you go to a bigger annotation database. Yes. I mean, if you didn't find the results with the smaller annotation, then you went to the bigger database, you are less likely. And that's a very good point. And so the point is that there are two things that go on when you go from GO Slim to a larger annotation database. You get categories that are more specific, but because you're testing more categories, your multiple test correction becomes harsher. Yeah, and that's a very good point. So 5% of them, on average, will be due to random chance? That pass the FDR. That pass the FDR, yeah. And with 100 tests, say, we can see a lot more. So, but here's the thing. So like, okay. So say I did 10,000 tests. No, let's say I did 1,000 tests, right?
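The minimal hypergeometric (mHG) idea mentioned above can be sketched in a few lines. This is a simplified illustration, not the full published procedure: for every prefix of the ranked list it computes a hypergeometric tail probability and keeps the minimum. Note that the resulting mHG score is not itself a valid p-value (tools that use it compute an exact p-value for the minimum); the sketch only shows why annotated genes bunched at the top of the ranking give a small score. The example lists are made up.

```python
from math import comb

def hypergeom_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def min_hypergeom_score(labels):
    """mHG-style score over a ranked gene list: labels[i] is 1 if the
    i-th ranked gene carries the annotation.  For each cutoff n we ask
    how surprising the count of annotated genes in the top n is, and
    return the most surprising (smallest) tail probability."""
    N, K = len(labels), sum(labels)
    best, k = 1.0, 0
    for n, lab in enumerate(labels, start=1):
        k += lab
        best = min(best, hypergeom_tail(k, N, K, n))
    return best

# Four annotated genes bunched near the top of a 12-gene ranking,
# versus the same four spread evenly through the list (made-up data).
top_heavy = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
spread    = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
print(min_hypergeom_score(top_heavy))  # small: annotation sits at the top
print(min_hypergeom_score(spread))     # large: no cutoff looks surprising
```

This is also why the choice of cutoff disappears: the test scans every possible cutoff rather than making you pick one.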
You know, I can look at the p-values for all those tests. And if it's just random chance, those p-values are a random variable that's uniformly distributed on the interval between zero and one. So what that means is, if I do 1,000 tests and it's just random, like just random choosing, 50 of those tests are gonna have a p-value less than 0.05, but every one of those tests is a false positive, right? What the FDR says is, okay, so I do 1,000 tests, and I get some tests that pass an FDR of 0.05. Let's say I get 100 tests that pass an FDR of 0.05. Five of those are going to be false positives, because it's 5% of the ones that pass. So you're right, a nominal p-value of 0.05, it's like that's the probability that you got this enrichment due to just random chance. But that doesn't mean that if you just threshold at 0.05, only 5% of the tests that pass the threshold are going to be due to random chance. Yeah, that's right. Yeah, that's right. That's right. So let me try explaining it again, or do you wanna try? Is your question related to this question? Okay, so the issue that you're talking about is how you choose the threshold. Let's get back to that in a second. I wanna completely address this question first because it's a really important point. Okay, so what we're asking is: when we report something as being significantly enriched, what's the probability that it's a false positive? Now, I'm using that terminology very loosely, because the p-value is not quite that, but let's just, in this room, think of it as the probability of a false positive, okay? All right, all right, so we do 1,000 tests, and we get some collection of things that pass that p-value threshold.
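The "uniform under the null" claim is easy to check by simulation. A minimal sketch (pure simulation, no real data): when every null hypothesis is true, each p-value is just a uniform draw on [0, 1], so about 5% of 1,000 tests land under 0.05, and all of them are false positives.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# 1,000 tests where the null is true for every one of them: under the
# null, a p-value is just a uniform random number on [0, 1].
pvals = [random.random() for _ in range(1000)]

n_pass = sum(p < 0.05 for p in pvals)
print(n_pass)  # on the order of 50 -- and every single one is a false positive
```

This is exactly the scenario in the lecture: thresholding the nominal p-value at 0.05 can give you a result list that is 100% noise.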
So if you do 1,000 tests, 5% of them are gonna pass the p-value threshold of 0.05, because that's what a p-value means: the probability that you get that enrichment or higher due to random chance. So like you run this, you do 1,000 tests, there's no real enrichment, but you still get 50 that pass a p-value of 0.05. So in this case, every single one of those 50 is a false enrichment, okay? That's what happens when you look at the nominal p-value. Okay, when you look at the FDR, the FDR is saying something about all the tests that pass the threshold, okay? So for example, we do an FDR correction, let's say we get like 10 tests that pass the FDR threshold of 0.05. Now here, in this new world, there actually are some tests that are significant, right, there is some real enrichment. So if we, did I say 10 tests? Yes. Okay, I'm gonna say 100 instead, because 5% of 10 is not fun, right? Okay, so there are 100 tests that pass. So now, what the FDR is saying is that 5% of those 100 tests are false positives. So that's five tests out of the 100, sorry, five of the significant enrichments out of the 100 are false positives, right? Now in the nominal case, all of them could be false positives. In this case, I'm guaranteeing you that on average, 5% of them are false positives. So now we go to the Bonferroni case. 100 tests pass the family-wise error rate correction at 0.05. That means there's only a 5% chance that any of them are false positives. So, you know, on average, zero are false positives when you pass the Bonferroni correction. On average, 5% are false positives when you pass the FDR. And when you just use the nominal p-value, it's possible that all of them are false positives. Does that make sense? It's important to distinguish between these three.
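The two corrections contrasted above can be written out directly. This is a minimal sketch of the standard procedures (the function names and the ten p-values are made up for illustration): Bonferroni multiplies each p-value by the number of tests, while Benjamini-Hochberg FDR uses the rank-dependent threshold, so it accepts more results while still controlling the expected proportion of false positives among them.

```python
def bonferroni(pvals, alpha=0.05):
    """Family-wise error rate control: a test passes only if its
    p-value times the number of tests stays at or under alpha."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p * m <= alpha]

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR control: sort the p-values, find the
    largest rank k with p_(k) <= (k/m) * alpha, and accept everything
    up to that rank."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            cutoff = rank
    return sorted(order[:cutoff])

# Made-up p-values for ten enrichment tests, already sorted for clarity
pvals = [0.001, 0.008, 0.012, 0.02, 0.03, 0.04, 0.25, 0.5, 0.7, 0.9]
print(bonferroni(pvals))          # only the very strongest result survives
print(benjamini_hochberg(pvals))  # more forgiving: four results pass
```

On these numbers Bonferroni keeps only the first test (0.001 × 10 = 0.01 ≤ 0.05), while BH keeps the first four, with the guarantee that on average 5% of the accepted set is noise.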