talking to you about finding overrepresented pathways in gene lists. And this is really, I hate to say, the theoretical part of this, and then you're going to be doing a practical in about an hour. The reason we teach you this is that, even though in many cases there are tools that do these types of analyses for you, there's always a bit of turnover in tools. And often we've had students, and there are probably some here, whose organisms of interest weren't actually covered by the tools, and they had to figure out how to do this stuff themselves. Also, I think it helps to understand what the p-values you're getting out of these tools mean, what a false discovery rate is, and what a Bonferroni correction is, so that you're comfortable with these things, because the tools refer to them in different ways and different tools do different subsets of them. Please, if you want, stop me with questions at any time during the talk. I'm very happy to answer questions; in fact, I'd like a little bit of back and forth. So, our learning objectives: there's a big list here, and you can read it in your own time, or now if you want. But really there are four things I want you to learn. In terms of the basic tests, there are two types. There are tests where you have a list of genes and you want to find out whether any pathways are enriched in that list. And then there's another type of test where, instead of a discrete list of genes, you have a whole bunch of genes ranked by some score, like differential expression. That's the motivating example we use, but it works for any way of scoring genes where you think the genes at the top or the bottom of the list are somehow special; these tests are based on that. Those are the tests for ranked lists of genes. Okay, so those are the basic tests.
And then there are two types of corrections you can make for multiple tests. Everyone's heard about multiple test corrections, and I suspect a lot of you have heard about the Bonferroni correction or the false discovery rate. So we're going to talk about what those are and what those corrections mean, and we're even going to tell you how to compute them. It's actually very straightforward, so don't worry about that; we'll make it easy. Okay, so those are the learning objectives. There are a lot of words on these slides, so you can use them as notes later. I've pretty much outlined everything I'm going to talk about today. I'd love to point, but I can't because there are two screens, so maybe I'll try the mouse. Maybe I won't; we'll see how it works. I might walk back and forth. So the first basic test, the gene list test, is called Fisher's exact test or the hypergeometric test, and that's the only one you need to know for a gene list. All the other tests are approximations of this test, or work in special cases where a gene might occur more than once on the list. Basically, if you know Fisher's exact test, you know everything you need to know about looking for over-enrichment of a pathway in a gene list. For ranked lists, the situation is a bit more complicated, because people can't really agree on what the right test is, and that's because the tests basically ask different questions. So we're going to go over that, and I'm going to talk about two different tests for ranked lists. The first is the GSEA test, which was developed by people at the Broad Institute, who make the very popular GSEA analysis tools; that's one of the tools you're going to use in the integrated assignment.
The other test is called the minimum hypergeometric test. It was developed about 10 years ago, and it's the test used in g:Profiler, which is something else that's going to show up in your integrated assignment. Once I've described those tests, we'll go into multiple test corrections, as I mentioned before. Okay, so as I said, you have two types of enrichment analysis. One uses a gene list. Say you're thresholding expression change at two-fold, so you have a list of genes that changed in your treatment condition, or a list of genes that are up-regulated in your treatment condition. Where you set that threshold just depends on what you're looking for, and I can't help you too much with that decision. A lot of it depends on the statistics of your gene expression data, or on what exactly you're looking for. But I'm going to assume that you have a gene list, you think that gene list is somehow special, and now you're looking for pathways in that list. And then the question concerns these pathways. Here I've called them gene sets; I'm also going to call them annotations, but let's just use the word pathway, even though sometimes you're looking for things that aren't quite pathways. We'll use the word pathway so we can distinguish it from the gene list, because you have two lists that you're comparing to one another. So the first question is: are any pathways surprisingly enriched in my gene list? I don't know, here we go. This always happens to me in this talk. And like I said, there's one test for that. Then say you rank genes by differential expression. That's one way to do it; maybe you rank them by expression itself, or by the number of peptides you're able to recover from each gene in a mass spec experiment. The ranked-list test answers the question: are any gene sets ranked surprisingly high or low in my list of genes?
I hope that's not me. Whose phone was that? Okay, thanks, Gary, that was good. All right. So, like I said, there's a bunch of tests. I feel like I'm belaboring the same point, but here we go: here's the enrichment test. Let's see if I can get this pointer working. There we go. If I shake it enough, it becomes big, but only for a short while. We've been doing this for a while. In the old days, everyone used microarrays to measure gene expression, and I think everyone understands what a microarray is at this point. You take your microarray, with all the genes that are on it, and you look at the different conditions, and based on that you come up with your gene list: your set of genes that are acting strangely. Then you take your gene list and put it through an enrichment test, where you compare it to gene set or pathway databases. At the end of the day you come up with an enrichment table where each pathway is annotated with a score, and that score is the p-value. And everybody knows you want the p-value to be as low as possible. So, given the gene list and the pathways, which I'm also going to call gene sets or annotations (and as Gary said, there are a lot of different sources of gene annotations these days), the question is: are any of these annotations surprisingly enriched in the gene list? Are there more genes from that pathway than you would expect if the gene list had just been randomly selected from the background of all the genes you're considering in your assay? If you like proteins, I could say proteins instead.
But you have a background of things you're considering: genes, proteins, microRNAs, SNPs, who knows. And you want to ask: I've selected some special list through my experiment; are any pathways enriched in this list? That's what your test is asking. So I quickly talked about where these gene lists might come from; next is how to assess "surprisingly", which is where we have to do the statistics, and how to correct for repeating the test over and over again. This correction is important, because once you have your gene list, there are literally thousands of different pathways you can test. And everybody knows that if you compute a p-value over and over again, eventually you'll get something you like, which is great, but you don't want to publish that, because you're almost certainly going to be wrong. Okay, so there's the classic two-class design, where you have an untreated and a treated condition and you look at differential expression. Throughout the talk I'm going to use shading to go from high expression to low expression. Then you just threshold that differential expression at some point. There are a lot of different ways to choose that threshold, and there's an entire class on how to choose it. Or you use a differential expression tool that gives you a p-value, which you can then threshold. Either way, you threshold somehow: you might look only at the genes that are up-regulated, only at the ones that are down-regulated, or at all the differentially regulated genes. If you have time course data, one way to come up with gene lists is to cluster the expression profiles of each gene across the time course.
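To put a rough number on "eventually you'll get something you like", here is a small sketch. It assumes the tests are independent and that every null (boring) hypothesis is true; pathway tests on overlapping gene sets are usually not independent, so treat this only as an illustration of how fast the chance of a spurious hit grows.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one type one error across m independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 10, 100, 1000):
    print(f"{m:>5} tests: P(at least one type one error) = {familywise_error(0.05, m):.3f}")
```

With 10 pathways tested at 0.05, the chance of at least one spurious hit is already about 0.40, and with 100 pathways it is above 0.99, which is why the correction step matters.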
Then, based on that clustering, say k-means clustering or some sort of hierarchical clustering, you have different groups of genes, and you can ask within each group whether certain pathways or annotations are enriched. So how does it work? Well, here's your background: this little box represents all the genes under consideration. These days it seems like you don't have to worry about the background, but you still kind of do. In the old days, and in fact as recently as last year, I used to give a... I can't think of the word in English, which is my only language. Let's call it a short lecture on the fact that if you have a microarray with only immune-system genes on it (we used to call this the immune array), and you choose some gene list from that microarray and find that it's enriched for immune function, you shouldn't be surprised. So when you're looking for enrichment, you have to define your background, which is the set of genes under consideration. What that means is: these are the genes that could have come up in your gene list if your assay had gone differently. We can talk a little during questions about the right way to define that background, but the background is very important. Take mass spec, for example: there are some proteins you just can't detect. If you can't detect those proteins, they shouldn't be in your background. If those proteins have never been detected by mass spec in the cell line you're looking at, they shouldn't be in your background when you do the analysis. Then you have your gene list, which is a special subset of these; here I've sorted the genes so your gene list is at the top, but whatever.
And then your gene set overlaps with these; it's a bit of a Venn diagram. Some genes in the background are in your gene set, some genes in the foreground, the gene list, are in your gene set, and some gene set members live outside your background because they weren't considered in your experiment. Those ones you can ignore. The question is: is the overlap between the gene list and the gene set surprisingly large, given the sizes of the gene list, the gene set, and the background? The way you measure that is with a p-value. I'm going to spend a few minutes, or I guess a few seconds, later on explaining exactly what a p-value is. I know you've heard this over and over again, but I think it helps to say it a few more times and make it as clear as possible. Essentially, the p-value is the probability that you would see an overlap at least as large if you were just randomly selecting that gene list from the background. One way to compute that is to randomly select a whole bunch of gene lists of the same size from the background and count how often you see an overlap at least that large. That's called an empirical p-value, and people often do that. The problem is that you have to repeat the selection a lot of times to get really low p-values, because the p-value is one plus the number of random lists with an overlap at least that large, divided by one plus the number of random lists you drew. So if you want a p-value of 10 to the minus six, you have to do this about a million times. Luckily, under some conditions, we know what the distribution of overlaps from random sampling should be, and in that situation you can compute the p-value analytically. Fisher's exact test is one such example. Okay, so here's the recipe for doing the enrichment test. First, you define your gene list and your background. Then you select your gene sets or pathways to test for enrichment.
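The contrast between an empirical p-value and an analytic one can be sketched as follows. The gene names, list sizes, and number of random draws are hypothetical, and the empirical p-value uses the common (1 + hits) / (1 + draws) convention; this is an illustration, not any particular tool's implementation.

```python
import random
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Exact P(overlap >= k) when n genes are drawn without replacement
    from a background of N genes, K of which are in the pathway."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def empirical_pvalue(background, pathway, gene_list, n_draws=10_000, seed=0):
    """Draw random gene lists of the same size from the background and count
    how often the overlap with the pathway is at least as large as observed."""
    rng = random.Random(seed)
    observed = len(pathway.intersection(gene_list))
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background, len(gene_list))
        if len(pathway.intersection(draw)) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_draws)


# Hypothetical toy data: 1,000 background genes, a 50-gene pathway,
# and a 30-gene list that shares 10 genes with the pathway.
background = [f"g{i}" for i in range(1000)]
pathway = set(background[:50])
gene_list = background[:10] + background[500:520]

exact = hypergeom_sf(10, len(background), len(pathway), len(gene_list))
empirical = empirical_pvalue(background, pathway, gene_list)
print(f"exact p = {exact:.2e}, empirical p = {empirical:.2e}")
```

With 10,000 draws the empirical p-value can never go below 1/10,001 (about 1e-4), while the exact hypergeometric tail here is orders of magnitude smaller; resolving it empirically would take far more draws.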
Really, you should select these ahead of time; otherwise you'll just keep selecting pathways until you find one you like, and that's not quite kosher. Then you run your enrichment tests, correcting for multiple testing if necessary, which means if you're checking more than one pathway. Step four is to interpret your enrichments, and step five is publish. Great. Yes? Let's say I do exome sequencing: should my background list be all the genes, or the genes that are in the target? Right, so you're doing whole exome sequencing. You'd want to be a bit more careful in case whatever you're using to select the exons doesn't capture the entire genome. So I should take only the genes that are covered in all of them? Yes, and set that as your background list. It's not just the number, because you're going to be asking whether the genes that weren't mutated have that annotation. Yes? For expression data, would you consider all the genes, both those above the threshold and those below it, as your background? Expression data is something I struggle with in my own analysis. It really depends on the question you're asking. Let's say you're looking at a cell line, you do a knockdown or something, and then you see which genes are differentially regulated. To me, it seems like your background list shouldn't contain genes that aren't expressed in the cell line, unless you think your knockdown is going to up-regulate an arbitrary set of genes. That's just my opinion; it's not what everybody does.
Sometimes your gene list could be, as you point out, all the genes that are up-regulated, so above some threshold, or all the genes that are differentially regulated, either up or down. Those are two different ways to do it. But even in this situation, you might consider using one of the ranked-list tests; some of them can test for both up- and down-regulation. Are there any more questions? Right, we already talked about why we might want to consider ranked lists; you brought up one example, and there are others. For example, maybe there's no natural threshold, and there often isn't. You should worry about this if you get very different results at different threshold settings, because if you can't choose a natural threshold and your results depend on the threshold you choose, that's a little disconcerting. There's also a possible loss of statistical power from thresholding: if you only look at the most highly expressed genes, you may miss signals you would have detected with a slightly lower threshold. Yes? So ranked-list enrichment gets around the problem of threshold setting? Yes, exactly. And the workflow is exactly the same, except that instead of a gene list you have a ranked gene list, so you need some way of ranking the genes. Some tools, like GSEA, will actually compute that ranking for you. Otherwise the workflow is exactly the same. Yes? So if we rank the list, take the top 1,000, and test those 1,000, we're still doing a selection, just based on the ranking? That's right. That's one of the ways you can define a gene set. Sorry, a gene list.
You can threshold by saying you want things that are up-regulated at least two-fold, or you can just take the highest 100 or highest 500 and say, okay, that's my gene list. Wow, that's hard to say. Yes? I'm not entirely sure I know the answer to that question. The question is: what has more statistical power, thresholding to get a gene list, or looking at a ranked list? My last slide claimed that the ranked list avoids a loss of power, but I don't know for sure which has more power. What you might be able to detect are signals you otherwise would miss because you chose the threshold wrong. So I might say they have different power. Yes? Just briefly, what you said was...? So, like I said, it's a bit of a philosophical question what the appropriate background is, and it depends a little on the question you're asking. But if, for example, you're looking at a cell line, you see which genes are expressed in it, and you do a knockdown and see that some genes are down-regulated or up-regulated, it doesn't make sense to me to include in your background genes that aren't expressed in the cell line, have never been expressed there, and are unlikely to become expressed because of your knockdown. But it gets a bit complicated, because then you have to ask yourself: if I do this knockdown, are genes that aren't expressed in the cell line suddenly going to be expressed? If you think that's true, then your background should be the entire genome. Right, exactly. So that's what I said, but as Gary pointed out, some parts of the chromosome could be deleted. These questions are hard to answer, and it's definitely something you should think about. One way to address it is to try it both ways.
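The two ways of defining a gene list just described, a fold-change cutoff versus simply taking the top N genes, can be sketched as follows. The gene names and values are hypothetical, and the inputs are assumed to be log2 fold changes.

```python
import math


def list_by_fold_change(log2_fc, min_fold=2.0):
    """Gene list = genes up-regulated at least `min_fold`-fold (input on a log2 scale)."""
    cutoff = math.log2(min_fold)
    return {gene for gene, lfc in log2_fc.items() if lfc >= cutoff}


def list_by_top_n(scores, n=100):
    """Gene list = the n highest-scoring genes, regardless of effect size."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


# Hypothetical log2 fold changes.
log2_fc = {"geneA": 3.1, "geneB": 1.2, "geneC": 0.9, "geneD": -2.0, "geneE": 0.1}
print(sorted(list_by_fold_change(log2_fc)))  # up at least two-fold
print(sorted(list_by_top_n(log2_fc, n=3)))   # top three, whatever their size
```

To pick up differential regulation in either direction, one option is to compare `abs(lfc)` against the cutoff instead; either way, the enrichment result can shift with the cutoff, which is one motivation for the ranked-list tests.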
So Gary's suggestion, which is a good one, is that you could try it both ways. Yes? So, no. Gary, do you have suggestions? It's hard in general to... there are a lot of cell line expressions in cases, but I guess I would probably only question the cell lines. I mean, I can tell you what we do. Usually you have some cases and some controls, or untreated and treated conditions. For our background, we usually use the genes that are expressed in either of those conditions, and then we look for differential expression. I'm not convinced that's entirely perfect, but I'm pretty happy with it. What do you think of that, Gary? Yeah. I mean, it's just that you can't reuse that background for more data that comes in. Okay, great. So here's the outline; we've already talked about everything on this list. So, the test for gene lists: the only test you ever need to know is called Fisher's exact test or the hypergeometric test. I give it two names because whenever I give it one name, someone always comes up with the second. Fisher's exact test can test for both over-enrichment and under-enrichment, but there's a difference of opinion about that, so I don't know. Certainly, the way everybody uses Fisher's exact test is to test for over-enrichment. All right, so what we're going to do here is called hypothesis testing, which you've probably heard about in statistics. You have two hypotheses about the data, and you want to compare them to see which is the better explanation. Let's call them the boring hypothesis and the fun one: the null is the boring one, where nothing interesting is going on, and the fun one is that something interesting is going on. To understand what a p-value means, I think it helps to understand that we know a lot more about the boring hypothesis than about the fun one.
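The background rule described above, keeping genes that are expressed in either the control or the treated condition, might be sketched like this. The expression cutoff `min_expr`, the gene names, and the values are hypothetical; choose a cutoff appropriate to your assay.

```python
def expressed_in_either(control, treated, min_expr=1.0):
    """Background = genes whose expression reaches `min_expr` in either condition.
    `min_expr` is a hypothetical cutoff, not a recommended value."""
    genes = set(control) | set(treated)
    return {
        g for g in genes
        if control.get(g, 0.0) >= min_expr or treated.get(g, 0.0) >= min_expr
    }


def restrict_to_background(pathways, background):
    """Drop pathway members that could never have appeared in the gene list."""
    return {name: set(members) & background for name, members in pathways.items()}


control = {"geneA": 5.0, "geneB": 0.0, "geneC": 0.2}
treated = {"geneA": 4.0, "geneB": 3.0, "geneD": 0.0}
background = expressed_in_either(control, treated)
print(sorted(background))
print(restrict_to_background({"pathway_X": ["geneA", "geneC", "geneD"]}, background))
```

Following the "try it both ways" suggestion, you could rerun the enrichment with the whole genome as the background and check whether the top pathways change much.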
The boring hypothesis says the genes aren't changing, or there's no interesting pattern to the mutations in the exomes. If that were the case, we could describe what the data should look like. The problem with the fun hypothesis is that it's very hard to pin someone down on what the data should look like under it; they don't give you a very good description. For Fisher's exact test, under the boring hypothesis I know what a random draw looks like; people have studied this for a very long time, and I know all the probabilities involved in a random draw. But in terms of over-enrichment, I don't know how much over-enrichment there's supposed to be: maybe a bit, maybe a lot. So I can't describe the fun hypothesis very well, and the only thing you can get is whether or not the boring hypothesis can explain the data. So a p-value is the chance that you would see the enrichment, or something more extreme, if the boring hypothesis were true. The important thing is that it's conditional: if the boring hypothesis holds, then I see this data with this probability. What people often think a p-value is, is a false positive probability. It isn't, because to know whether something is a false positive, you need to know how likely it is that the fun hypothesis is true, and for that you need to know what the data should look like under the fun hypothesis, and no one will tell you that. So I came up with this kind of goofy example on my bike ride in today; I'll try it and see if it helps. I have this fantastic device called the two-nose machine. You put your head in it and a number comes out, and that number tells you whether you have one nose or two noses.
Now, 5% of the time, if you have one nose, the two-nose machine says you have two noses. So suppose the two-nose machine goes bing, bing, bing: you have two noses. How likely do you think it is that I have two noses? I mean, you can look at my face. The likelihood is essentially zero, because nobody, or a very small number of people, has two noses. On the other hand, suppose I had a one-nose machine that 5% of the time tells you that you have one nose when you actually have two. If it goes bing, bing, bing and says you have one nose, we believe it, even though 5% of the time the one-nose machine makes what's called a type one error: it says you have one nose when you actually have two. So that's what the p-value tells you: the probability of seeing this result, or something more extreme, if the boring hypothesis were true. It can't tell you anything about the fun hypothesis. Does that make sense? From now on I'm going to call it a type one error, because if I call it a false positive, people get confused and think of the false positive probability, which it isn't, because for that you need to know something about the fun hypothesis. So it's the probability of a type one error. I hate that language, but it's clear. All right, so say I tell you that in the background population there are 500 black genes and 4,500 red genes, and that I took a random sample of five genes from that population. Then this little histogram tells you the probability that I would get exactly five black genes, exactly four, three, two, one, or zero. Those probabilities are called hypergeometric probabilities.
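The histogram on the slide can be reproduced directly from the hypergeometric formula. This sketch uses only the numbers given above (500 black and 4,500 red genes, five drawn); the function name is just illustrative.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k black genes) in a random draw of n genes, without replacement,
    from N background genes of which K are black."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Slide example: 500 black + 4,500 red genes, 5 genes drawn at random.
N, K, n = 5000, 500, 5
pmf = {k: hypergeom_pmf(k, N, K, n) for k in range(n + 1)}
for k, p in pmf.items():
    print(f"exactly {k} black: {p:.3e}")
```

The counting is the "permutations and combinations" part: the number of ways to choose k black and n - k red genes, divided by the number of ways to choose any n genes.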
Someone's worked this out, and it has a lot to do with permutations and combinations. I could write it down for you, but I'm not going to; I'm not entirely sure I could write it down without help, though I could probably figure it out in about five minutes. So the p-value is just the sum of these two probabilities: four black genes or more. You add the probabilities of exactly four and exactly five together, and that gives you the p-value, the probability that you would see four or more black genes if this were a random draw. This is the null distribution, and this is the hypergeometric test. Usually all you do is put your background list and your foreground list into a tool, tick the pathways you want to check, and it computes these hypergeometric (Fisher's exact test) p-values for you and then does the correction. Some people aren't so lucky and the tools don't... oh, that's me. There, I'll put it away for the moment. Sorry, folks. Sometimes people have to do this themselves, and if you look up Fisher's exact test on the internet, you'll find that tools want a two-by-two contingency table. This is a two-by-two contingency table: the columns are in the gene list and not in the gene list, and the rows are in the gene set (the pathway) and not in the pathway. Once you put the numbers in, it computes the Fisher's exact test p-value for you. I would look very carefully at whether it's giving you a one-tailed or a two-tailed p-value, because of what I told you before: one-tailed looks only for over-enrichment, while two-tailed looks for both over- and under-enrichment. We're almost never in a situation where we can detect depletion; usually we can only detect over-enrichment. Okay, some important details. To test for under-enrichment of black, we can just test for over-enrichment of red; those two things are identical.
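Here is a sketch of the two-by-two table version, laid out as on the slide (rows: in pathway, not in pathway; columns: in gene list, not in gene list). It computes the one-tailed p-value for over-enrichment as the upper tail of the hypergeometric distribution, and checks the claim that under-enrichment of black is the same event as over-enrichment of red. The function names are illustrative, not from any particular library.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k): the one-tailed p-value for over-enrichment."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def hypergeom_cdf(k, N, K, n):
    """Lower tail P(X <= k): the one-tailed p-value for under-enrichment."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1)) / comb(N, n)


def fisher_greater(table):
    """One-tailed Fisher's exact test for over-enrichment from a 2x2 table.

    table = [[a, b], [c, d]]
      rows:    in pathway, not in pathway
      columns: in gene list, not in gene list
    """
    (a, b), (c, d) = table
    N = a + b + c + d   # background size
    K = a + b           # pathway genes in the background
    n = a + c           # gene list size
    return hypergeom_sf(a, N, K, n)


# Slide example: 4 of the 5 drawn genes are black (in the pathway).
p_over = fisher_greater([[4, 496], [1, 4499]])
print(f"P(4 or more black) = {p_over:.3e}")

# At most 1 black in a draw of 5 is the same event as at least 4 red.
print(hypergeom_cdf(1, 5000, 500, 5), hypergeom_sf(4, 5000, 4500, 5))
```

If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` should give the same one-tailed p-value for this layout; its default is two-sided, which is exactly the one-tailed versus two-tailed trap mentioned above.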
Again, as in my little lecture, you need to think about the background population. As Gary said, if you're not sure what the background population is, you can try a couple and see whether your results change a lot; if they do, you really have to think more carefully about it. Often it's obvious. To test for enrichment of more than one pathway, you just apply Fisher's exact test separately to each pathway, and that's what these tools do. For each pathway, the tool checks the overlap and computes a Fisher's exact test p-value separately. Then it corrects those p-values in some way for the fact that if you test multiple times, you're more likely to get a type one error. Remember, I'm using that language: a type one error is when the boring hypothesis is true, but the data look consistent with the fun hypothesis. All right, so Fisher's exact test is the only one you need to consider. You sometimes see chi-square tests used, but for a two-by-two table, where there are just two possibilities each way (in the pathway or not, in the gene list or not), Fisher's exact test is called exact because it computes the exact probability, whereas the chi-square test is an approximation to it. So you don't need the chi-square test; you can use it, but it's an approximation, and for the two-by-two contingency table you can use Fisher's exact test instead. Sometimes people also use the binomial test, but once again, Fisher's exact test is exact and the binomial test is an approximation.
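Running Fisher's exact test separately for each pathway, as these tools do, might look like the sketch below. The pathways and gene names are hypothetical, and the correction shown is the simple Bonferroni adjustment (multiply each p-value by the number of pathways tested, capped at 1); real tools may use other corrections, such as FDR.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) for a random draw of n genes from N, K of them in the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def enrichment_table(gene_list, background, pathways):
    """One-tailed Fisher's exact test per pathway, plus a Bonferroni-adjusted p-value."""
    background = set(background)
    hits = set(gene_list) & background          # gene list restricted to the background
    N, n, m = len(background), len(hits), len(pathways)
    rows = []
    for name, members in pathways.items():
        members = set(members) & background     # ignore pathway genes outside the background
        k = len(hits & members)
        p = hypergeom_sf(k, N, len(members), n)
        rows.append((name, k, len(members), p, min(1.0, p * m)))
    return sorted(rows, key=lambda row: row[3])  # most significant first


# Hypothetical toy data.
background = [f"g{i}" for i in range(1000)]
gene_list = background[:20]
pathways = {
    "pathway_A": background[:30],        # overlaps the gene list heavily
    "pathway_B": background[500:550],    # no overlap
    "pathway_C": background[10:60],      # partial overlap
}
for name, k, size, p, p_bonf in enrichment_table(gene_list, background, pathways):
    print(f"{name}: overlap {k}/{size}, p = {p:.2e}, Bonferroni p = {p_bonf:.2e}")
```

Restricting both the gene list and each pathway to the background is what keeps the "immune array" problem out of the test.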
And again, as I said, for ranked lists I've listed here all the tests I know of; maybe there are even more. Each of these tests tests something different, and there are some slides later on where I'll try to distinguish between them. There are at least five different tests, and two of them are actually identical; people just didn't realize they were coming up with the same test. The Wilcoxon rank-sum test and the Mann-Whitney U test are identical, so some people call it the Wilcoxon-Mann-Whitney test. So what's the minimum hypergeometric test? It's a nice idea. You take your ranked list of genes, start at the top, and try Fisher's exact test thresholding at every gene. You start with one gene in your gene list and compute Fisher's exact test, then two genes, then three genes, and go all the way down computing all those p-values. Then you find the point at which you get the minimum p-value; that's why it's called the minimum hypergeometric test. Then you have to go back and correct for the fact that you tested a whole bunch of times. Yes? So you just choose the best threshold? Yes, exactly; that's one way to think about it. As far as I can tell, this test was originally described in the paper I've cited here. The nice thing about the minimum hypergeometric test is that it gives you a threshold, which is sometimes helpful: you can say, this is the point at which I achieve the minimum p-value. That's useful. Yes? Can the p-value go up and down? It tries every single point on the list. Do you get a minimum p-value at each point? No; the p-value tells you something about the list.
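The scan described above can be sketched as follows: threshold after every gene, compute the one-tailed hypergeometric p-value, and keep the smallest. The original method computes the corrected p-value exactly; this sketch swaps in a simpler permutation correction (shuffle the ranking, recompute the minimum p-value) as a stand-in. The ranking, pathway, and number of permutations are hypothetical.

```python
import random
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for the top n genes of N, with K pathway genes in total."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def min_hypergeometric(ranked, pathway):
    """Return (minimum p-value, number of genes above the best cutoff)."""
    N = len(ranked)
    K = len(pathway.intersection(ranked))
    best_p, best_n, k = float("inf"), 0, 0
    for n, gene in enumerate(ranked, start=1):
        if gene in pathway:
            k += 1
        p = hypergeom_sf(k, N, K, n)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


def permutation_corrected(ranked, pathway, n_perm=200, seed=0):
    """Correct for trying every threshold by comparing the observed minimum
    p-value with minimum p-values from randomly shuffled rankings."""
    observed, _ = min_hypergeometric(ranked, pathway)
    rng = random.Random(seed)
    shuffled = list(ranked)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min_hypergeometric(shuffled, pathway)[0] <= observed:
            as_extreme += 1
    return (1 + as_extreme) / (1 + n_perm)


ranked = [f"g{i}" for i in range(100)]   # hypothetical ranking, best gene first
pathway = {"g0", "g1", "g2", "g4", "g6", "g9", "g50", "g70", "g80", "g95"}
p_min, cutoff = min_hypergeometric(ranked, pathway)
print(f"minimum p = {p_min:.2e} with the top {cutoff} genes")
print(f"permutation-corrected p = {permutation_corrected(ranked, pathway):.3f}")
```

The uncorrected minimum is always optimistic, because it is the best of many thresholds; the corrected value is the one to report.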
What the p-value tells you is a measure of enrichment at the top of the list for the pathway you're looking at. The threshold just tells you where the test achieved its minimum value, and you could think of that as having some meaning; it has a little bit of meaning. So yes, p-values can go down or go up as you move down the list; I think the next slide will make this a bit clearer. Is the p-value for the list? The p-value's for the list, yes. It's one list and one pathway. Usually you have one list and you're testing it against a bunch of different pathways, so you can say the p-value is connected to the pathway. The list doesn't change when you try different pathways; only the pathway changes, and the threshold, the place where you achieve the minimum p-value, might change. But the list and its ranking don't change. So yes, it's for the entire list, but it's better to think of the p-value as being for the pathway. Okay. In this figure, I've taken the pathway, or gene set; all the gray lines are the genes in the list, and the red lines are the genes that are also members of the pathway. What I've called the ES score, and I'll explain why in a moment, is the minus log p-value of the hypergeometric test at each threshold. Why do this? P-values are more exciting when they get smaller, but people always forget that and think bigger things should be more exciting. The way to make bigger numbers more exciting is to take one over the p-value, and the log is a nice way to avoid really big numbers: one over 10 to the minus six is a million, and one over 10 to the minus 15 is a number I don't know how to say, a thousand trillion, but if you take the log you get six or 15.
So this is the negative log to the base 10 of the p-value of the hypergeometric test. I'm gonna call that the ES score, and the reason I'm calling it the ES score is that I'm gonna connect this test with the GSEA test, which uses something called the ES score. Yeah, well, it depends; it depends what you interpret the threshold as meaning. For me, I just think of the threshold as a way of getting a p-value. It just so happens that to test the entire ranked list, you try all the thresholds and you take the minimum p-value. If you think the threshold is telling you something meaningful, then I guess you have a problem. Yeah. In this figure, it seems like the genes were added into the gene list [inaudible]. Sorry, please repeat your question. [inaudible question about how the ES score relates to the lines above the plot] Yeah, so basically, I'm gonna come over here and then I'll come over there. Okay. Then it's not gonna be able to record me either. This is too bad. Okay, I'm a little bit off the record now. Okay, so basically the way this works, right, is you start at the top of the list, and as you go down the list, when you encounter a gene that's not in the pathway, the p-value is gonna go up, which means our inverse p-value is gonna go down. Right, so it starts off, goes down, down, down, down, down. Once you hit a gene in the pathway, the p-value goes down, which means our inverse p-value goes up. Right, so the jumps occur where the red genes are. And so you go up, up, up, up, up, up, and then you go down, down, down, up, up, up, up, up, and you get to the end here and then you start going down, down, down, down, down. Oh, I was looking at whether it's only a linear order. Say we combine all the red bars: do we select all the red bars or not? I'm gonna go over to this side of the room now so they get equal time.
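To make the up-and-down behaviour concrete, the sketch below computes the ES score as defined in the talk, -log10 of the hypergeometric p-value, at every threshold of a toy ranked list. It is an illustration under assumptions (one-sided test, hypothetical gene and function names), not the code behind the figure. Because adding a pathway gene can never make the upper-tail p-value larger, and adding any other gene can never make it smaller, the curve rises (or stays flat) at red genes and falls (or stays flat) at black genes.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for the number of pathway genes in the top n of N genes (K in pathway)."""
    k = max(k, 0)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / math.comb(N, n)


def es_curve(ranked_genes, pathway):
    """ES score at each threshold t = 1..N, defined as -log10(p) for the top-t list."""
    N = len(ranked_genes)
    K = sum(1 for g in ranked_genes if g in pathway)
    curve, k = [], 0
    for t, gene in enumerate(ranked_genes, start=1):
        if gene in pathway:
            k += 1
        p = hypergeom_sf(k, N, K, t)
        curve.append(abs(math.log10(p)))  # p <= 1, so this equals -log10(p)
    return curve


ranked = [f"g{i}" for i in range(20)]   # toy ranked list, top first
pathway = {"g0", "g1", "g2", "g5"}      # hypothetical pathway ("red" genes)
for t, (gene, es) in enumerate(zip(ranked, es_curve(ranked, pathway)), start=1):
    colour = "red  " if gene in pathway else "black"
    print(f"{t:2d} {colour} ES = {es:.2f}")
```

The highest point is at threshold 6, which is exactly where the minimum p-value from the threshold scan sits, and the curve returns to 0 at the bottom of the list, where the "gene list" is the whole background and p = 1.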
Okay, all right, so we're not selecting anything here. This is just a way of indicating. The black lines here are indicating the genes; this is the top of the list and this is the bottom of the list. What I'm trying to show here, in a not ideal fashion, is which genes in the list are in the pathway and which ones aren't. The black lines are the ones that are not in the pathway, and the red lines are the ones that are in the pathway. And what I'm trying to indicate with this figure is, just visually, when you look at the ES score, the enrichment score, whenever you see a black line it goes down, because the p-value goes up, and whenever you see a red line it goes up, because the p-value goes down. And that's why I warned that the alignment here between the bars and the plot is a little bit off. Unfortunately, I've been a professor for too long and I can no longer code, so it's gotta be Microsoft PowerPoint for me to make all my figures. So yeah, that's what happens. That's what I'm trying to show with the black and red bars. And it is a linear order, because it's just a ranked list. You always just go from the top of the list to the bottom of the list; you don't have different paths that you can take down the list. Maybe you're thinking of something like a phylogenetic tree and then trying to check the pathway enrichment along some sort of branching tree. I don't know how you would do that. Yeah, okay, great. Any more questions? Okay, so, boom. This is the point at which you achieve the minimum p-value. That's where the enrichment score is the highest. Okay. Okay, so now here I've done something a little bit fishy. Well, not fishy, tricky, let's say. It's not fishy, it's tricky, in that I've put the minimum hypergeometric test and the GSEA test together and said they're the same thing. And in fact, it kind of is the same thing. I mean, what they're doing is essentially identical.
How they differ is how much you go up and how much you go down when you make this ES score plot, right? For the minimum hypergeometric p-value, how much you go down and how much you go up is governed by the p-value calculation, right? There's a certain amount that you're gonna go up based on where you are on the list and how many genes have come before. GSEA makes a different decision about what that number should be. Okay. If you took fixed-size steps, going down a fixed amount at every gene not in the pathway and up a fixed amount every time you encountered a gene in the pathway, that test is called the Kolmogorov-Smirnov test, or the KS test, okay? The GSEA test is almost identical to the KS test, except they sometimes have a weighting scheme, because they think going up and going down near the beginning or the end of the list is more important. So they take bigger steps at the beginning or at the end. Whereas for the minimum hypergeometric test, the relationship between the steps and how much the ES score goes up is kind of obscure. I mean, it's not obscure, you can write down what it is, but it's not gonna be something as straightforward as going up a certain amount or going down a certain amount. It's gonna depend on where you are on the list and a whole bunch of other things. But essentially, all the tests are doing the same thing. They're making this ES score plot and they're looking for the maximum ES score. Well, sorry: the minimum hypergeometric p-value is looking for the maximum ES score; GSEA and Kolmogorov-Smirnov are looking for the ES score that's most different from zero, okay? Let me explain that. No, okay, I'm gonna explain it now without a figure. So, oh, I have to go over here. Okay, so let's go back to this thing. The GSEA score, you said it goes up, up, up, up, up, up, down, down, down, down, down, down, down, down, right?
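Here is a minimal sketch of that running-sum walk, assuming the weighting described in the GSEA paper (Subramanian et al., 2005): each pathway gene moves the walk up in proportion to |score|^p, normalized so that all up-steps sum to 1, and every other gene moves it down by 1/(number of non-pathway genes). With the exponent set to 0 every up-step is the same size, which gives the unweighted, Kolmogorov-Smirnov-style walk; GSEA's "weighted" scheme uses an exponent of 1. The function name, arguments, and toy scores are hypothetical.

```python
def running_sum(scores, in_pathway, weight=1.0):
    """GSEA-style running enrichment score (a sketch, not the GSEA code).

    scores:     gene scores sorted from highest to lowest
    in_pathway: parallel list of booleans (True = gene is in the pathway)
    weight:     exponent on |score|; 0 gives equal up-steps (KS-style walk)
    Returns (walk, es), where es is the point of the walk farthest from zero.
    """
    n_r = sum(abs(s) ** weight for s, h in zip(scores, in_pathway) if h)
    n_miss = sum(1 for h in in_pathway if not h)
    walk, total = [], 0.0
    for s, h in zip(scores, in_pathway):
        if h:
            total += abs(s) ** weight / n_r   # step up at a pathway ("red") gene
        else:
            total -= 1.0 / n_miss             # step down at any other ("black") gene
        walk.append(total)
    es = max(walk, key=abs)  # positive: enriched at the top; negative: at the bottom
    return walk, es


scores = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]
top = [True, True, False, True, False, False, False, False, False, False]
print(running_sum(scores, top, weight=0)[1])         # pathway near the top: positive ES
print(running_sum(scores, top[::-1], weight=0)[1])   # same pattern at the bottom: negative ES
```

Because the up-steps and down-steps each sum to 1, the walk always ends at zero; that is why a pathway concentrated at the top gives a peak and one concentrated at the bottom gives a trough of the same size.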
It goes down every time there's a black line, and it goes up every time there's a red line. Okay, the reason it goes up and then goes down is because there are more red lines at the front. What if I reverse the order, right? What if all the black lines came first and then the red lines came? So the reds would be at the end of the list rather than the beginning. Can anyone tell me what this ES score would probably look like? [inaudible] Flip it this way, I think, and then maybe this way? Right, not sure, right? But yeah, it goes down, right? So it goes down, down, down, down, up, up, up, up, and this point right here, that's the minimum point, and for GSEA that's the ES score that you would use to compute your p-value. So GSEA looks for enrichment at the top of the list and at the end of the list. All right, so, okay, so far, yeah. So what does that mean? That's a great question. In the GSEA test, they call it the leading edge, and it somehow corresponds to some measure of how diffuse the pathway is at the top of the enrichment. For me, I don't completely understand the arguments, but I think Veronica will talk a little bit more about the leading edge analysis. So you could think of it as telling you where the enrichment is highest. The reason that I'm a little bit hesitant about this is that, at least with the p-value, you can often get situations like this. So what do I wanna say here? Say you had a gene list with 10 things on it, two of them were in the pathway, and that was an enrichment because the pathway was very rare. If you made that gene list 100 genes long and 20 of the things in it were in the pathway, the same proportion of the gene list is enriched, right? But the latter is much more significant.
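The 2-of-10 versus 20-of-100 comparison can be checked numerically. This sketch assumes a hypothetical background of 20,000 genes containing a 1,000-gene pathway (5%), so both lists have 20% of their genes in the pathway and only the list size differs. The helper name is hypothetical, and the test is the one-sided hypergeometric (Fisher's exact) test.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of at least k pathway genes in a random list of n genes."""
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / math.comb(N, n)


N, K = 20_000, 1_000                   # hypothetical background and pathway sizes
p_short = hypergeom_sf(2, N, K, 10)    # 2 of 10 genes in the pathway
p_long = hypergeom_sf(20, N, K, 100)   # 20 of 100 genes in the pathway
print(f"2/10:   p = {p_short:.3g}")
print(f"20/100: p = {p_long:.3g}")
```

With these assumed sizes the short list is not significant at 0.05 (p is roughly 0.09), while the long list gives a p-value several orders of magnitude smaller, even though the proportion of pathway genes is identical.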
The p-value is much lower, because it's much more surprising to take a gene list of size 100 and get 20 than it is to take a gene list of size 10 and get two. It's easier to get rare things to happen if you have a small gene list. So the p-value depends not only on the enrichment, which is the proportion of genes that are in the pathway, but also on the size of the gene list, right? And so as you go down, boom, boom, boom, boom, boom, your gene list gets bigger and bigger and bigger, so the leading edge, the point at which you do the cutoff, is not necessarily where the enrichment is the highest. [inaudible question] Right, yeah. Oh, yeah? There are multiple thresholds? Yeah, there are multiple peaks here, right? Look at this line. Boom, boom, boom, boom, there are lots of peaks. So then which threshold do you use? Oh, I'm not sure what they do in this case. I think it's kind of arbitrary. I think GSEA has a certain peak that they use in that circumstance, probably the first one. Minimum hypergeometric, I don't know. Probably arbitrary. It seems unlikely that that would happen, because these are real-valued numbers, and in the p-value calculation the relationship between the size of the gene list and the enrichment of the gene list is kind of complicated. So I suspect for the minimum hypergeometric p-value it's gonna be rare that you get this kind of two-peak problem. Unless, so sometimes, how do I say this? People have seen a lot of talks with p-values on them. If you've seen a lot of talks with p-values on them, there's a specific p-value you'll see a lot. I'm going way off topic. It's 2.2 times 10 to the minus 16. Maybe you don't care about numbers as much as I do, but I see this p-value all the time.
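The number 2.2 times 10 to the minus 16 is the double-precision machine epsilon, the gap between 1.0 and the next representable number; R, for example, prints p-values below it as `< 2.2e-16`. The sketch below (hypothetical helper names, illustrative sizes) shows one common way this limit bites: computing an upper-tail p-value as one minus the lower tail cannot resolve anything much below epsilon, while summing the small tail terms directly keeps them.

```python
import math
import sys


def hypergeom_pmf(i, N, K, n):
    """P(X = i) for X ~ Hypergeometric(N background, K in pathway, n drawn)."""
    return math.comb(K, i) * math.comb(N - K, n - i) / math.comb(N, n)


def sf_naive(k, N, K, n):
    """Upper tail as 1 - P(X < k): cannot resolve values much below epsilon."""
    return 1.0 - sum(hypergeom_pmf(i, N, K, n) for i in range(k))


def sf_direct(k, N, K, n):
    """Upper tail summed term by term: keeps very small p-values."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))


print(sys.float_info.epsilon)            # 2.220446049250313e-16
N, K, n, k = 20_000, 1_000, 100, 40      # 40 of 100 genes from a 5% pathway
print(sf_naive(k, N, K, n))              # 0.0 or rounding noise near 1e-16
print(sf_direct(k, N, K, n))             # a genuine, much smaller p-value
```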
And what that p-value means is that the p-value is below the precision the computer has to calculate the p-value. When I see that, I just think that people have coded up the statistical method badly. Basically, there is a way to get more significant p-values if you're a bit more careful about the way you code it, but you might see two peaks in the minimum hypergeometric p-value under those conditions. I suspect it'd be relatively rare to see that for the minimum hypergeometric p-value under other conditions. But with the way GSEA does things, I think you could possibly see two peaks, and then I don't know how to make the decision in that circumstance. Perhaps that's more of an answer to your question than you wanted. Any more questions? Yeah? Are the leading edge genes the ones that are driving the enrichment score? So yes and no. You can see here that there are still a lot of pathway genes afterwards; the density just decreases a little bit. So one way to think about it is where the enrichment is the highest. But the point I was trying to make with the p-values is that p-values get more significant as the gene list gets longer. So you can get a smaller p-value with a longer gene list that's less enriched than with a shorter one, right? So if I have a gene list that's four out of 10, that probably has a larger p-value than a gene list that's 20 out of 100, even though the second list's enrichment is smaller. So for the minimum hypergeometric p-value, it's kind of the point at which the density is the highest, but it's gonna lag behind a little bit. If I were gonna choose the highest density, I'd probably choose it there, right? So it's kind of like that. This is why I'm so cautious about the interpretation, and it also depends on how the step size is chosen. The leading edge is the solid red line at the bottom? Oh, let me show you. Actually, I can do this with the pointer, can't I?
This thing right here. Yeah, so this is the threshold, and anything above it would be the leading edge. This dotted red line, that's the maximum enrichment score, or alternatively the smallest p-value. Okay. So remember I told you that to compute a p-value, you either need to do a lot of random selections or random tests, or sometimes you can compute the p-value analytically. Well, for GSEA it's not entirely clear what the analytic solution is, so they have to redo it a bunch of times to assign a p-value to an enrichment score, right? And the way they do that is they randomize the order of the genes in some way, right? I think in GSEA they randomize the order of the genes by switching the labels of the cases and controls. So they try many random label switchings, each with the same number of cases and the same number of controls. When you do that, you do the same thing you did before: you come up with this enrichment score and find the maximum enrichment score, and then you can make this histogram of the enrichment scores. This one is for, I guess, 2,000 permutations. You see where the enrichment score you got with the original ranking lives, and you ask what proportion of these random shuffles have an enrichment score equal to or higher than the one you started with. That's how you get the p-value. In this case it's an empirical p-value. So when they ask you about permutations, that's what they're doing. Questions about that? Okay, and we've already talked about this. If the pathway genes are at the top, you go up and get enrichment; if they're at the bottom of the list, this thing goes down. What did you say about the GSEA score? [inaudible] This one? Yes. Oh.
So is the GSEA score exactly the same as the ES score, based on the diagram, and then converted? Yeah, so GSEA computes the enrichment score by making that plot and then finding the maximum point. Based on the minimum p-value, or based on the top of the list? So, I conflated these two tests because I tried to do things in parallel. For the minimum hypergeometric p-value, the enrichment score actually just corresponds to a p-value. In GSEA, they have a different way of computing the enrichment score, where they take a step up every time they encounter a gene that's in the pathway and a step down every time they encounter a gene not in the pathway. And the size of the steps could be constant, or it could depend on how close you are to the front or the back. So it's just some computation that they do. As a result of that computation, you get plots that look like this, and the enrichment score for a given ordering of the genes is the point with the maximum absolute value. So either the maximum or the minimum, okay? Whichever is biggest in magnitude, right? So now you have that enrichment score, and you wanna translate it into a p-value. For the minimum hypergeometric p-value, it's easy because it's already there. For GSEA it's harder, because there's no analytical way to do that. So they have to do it by permutation-based analysis. In the permutation-based analysis, you flip the labels of the cases and controls, you randomize them, recompute the enrichment score for that, and just do that a bunch of times. Then you look at the distribution of enrichment scores where you have a random assignment of cases and controls, and compare that to the one that you started with. Okay, any more questions? All right, so because Gary finished early, we have extra time, which means I can finish my slides.
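The label-permutation idea can be sketched in a few dozen lines. This is a simplified illustration, not GSEA itself: it ranks genes by a plain difference in group means (GSEA's default ranking metric is different), uses the unweighted KS-style walk, and compares absolute enrichment scores, whereas GSEA treats positive and negative scores separately and normalizes them. All function names and the toy data are hypothetical. The empirical p-value is the proportion of shuffled labelings whose score is at least as extreme as the observed one.

```python
import random
import statistics


def ks_es(ranked_in_pathway):
    """Unweighted (KS-style) running-sum score: the point of the walk farthest from zero."""
    n_hit = sum(ranked_in_pathway)
    n_miss = len(ranked_in_pathway) - n_hit
    total, best = 0.0, 0.0
    for h in ranked_in_pathway:
        total += 1.0 / n_hit if h else -1.0 / n_miss
        if abs(total) > abs(best):
            best = total
    return best


def es_for_labels(expr, labels, pathway):
    """Rank genes by mean(cases) - mean(controls), then score the pathway."""
    diffs = {}
    for gene, values in expr.items():
        cases = [v for v, lab in zip(values, labels) if lab]
        ctrls = [v for v, lab in zip(values, labels) if not lab]
        diffs[gene] = statistics.fmean(cases) - statistics.fmean(ctrls)
    ranked = sorted(diffs, key=diffs.get, reverse=True)
    return ks_es([g in pathway for g in ranked])


def permutation_p_value(observed, null_scores):
    """Proportion of permuted scores at least as extreme as the observed one."""
    return sum(abs(s) >= abs(observed) for s in null_scores) / len(null_scores)


def label_permutation_test(expr, labels, pathway, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = es_for_labels(expr, labels, pathway)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # keeps the same number of cases and controls
        null.append(es_for_labels(expr, shuffled, pathway))
    return observed, permutation_p_value(observed, null)


labels = [True] * 4 + [False] * 4          # 4 cases, 4 controls
expr = {                                    # hypothetical expression values
    "a": [5, 5, 5, 5, 0, 0, 0, 0],
    "b": [4, 4, 4, 4, 0, 0, 0, 0],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
    "d": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(label_permutation_test(expr, labels, {"a", "b"}, n_perm=500))
```

With only 8 samples there are just 70 distinct ways to split them into 4 cases and 4 controls, so the permutation p-value cannot get very small; in practice you need enough samples for label permutation to be informative.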
And so I wonder, though, if you wanna take like a one-minute breather, or maybe I should... no, no, no, Ann says no. Yeah, we'll never get you back in your seats. So let's talk quickly about multiple test corrections, and like with the p-values, there's one part of this that's easy and one part that's a little bit more difficult. Yeah, any questions before I start? Okay. Yes. Yeah, go. Enrichment in a particular pathway. Yeah. But we didn't say anything about individual genes, whether they're in the gene list. So... Well, maybe yes, maybe no; you can go gene by gene. Yeah, I mean, you have a variety of ways. If you have two different conditions, you have a variety of ways of coming up with gene lists. You can say, let's take all the genes that go up under either condition. Let's take all the genes that go up under one condition, I'm sorry, under both conditions. You have to come up with some way of defining the gene list. Okay, we could talk about this a little bit offline, because I'm a little bit behind, and I wanna make sure that you get the multiple test correction, because this stuff's pretty important. All right, so you want a small p-value. How do you get a small p-value? Well, one thing you can do is just continually take a whole bunch of samples of size five from the background population. If you do this enough, eventually something fun will happen. For example, in the case that we had before, say you want at least four black balls: as long as you're willing to take about 10,000 samples, you'll get four black balls, almost certainly, right? So what does this mean for us? If what we're trying to do is assess whether or not the gene list we took was randomly selected from the background, and we just continually take gene lists from the background at random and then stop and report the gene list that we like, sorry, that's not cool, right?
So when you try a whole bunch of different pathways, you're not taking a whole bunch of different gene lists out; you're doing something different. Oh no, this slide is a little bit behind. Instead of taking different gene lists, you're just relabeling the genes. So instead of looking for black balls and red balls, you're looking for square balls and circular balls, right? So you're kind of doing the same thing. If you take 10,000 pathways and ask the same question, are any of these pathways enriched in my gene list, it's almost like playing the p-value lottery, just in a slightly different way. So you can't do that, right? Because you're gonna get type one errors. You're gonna get situations in which there is no real enrichment, in which the enrichment you see in that pathway could have been due to random chance. Okay, so there's actually a simple way to correct for this, and it's called the Bonferroni correction. You've probably seen this before. Your corrected p-value is just equal to m times the original p-value, which we're gonna call the nominal p-value, right? And m is the number of tests you do. That's easy. So that's one way to do it. The other way to do it: you're testing against 0.05, you're testing against 5%. So you could either multiply the p-value and then test it against 5%, or take the original p-value and test it against 5% divided by the number of tests you've done. Those are mathematically equivalent. Okay. And that's what the Bonferroni correction does. Okay, so if m is really big, let's say m is 10,000, often you'll get corrected p-values that are bigger than one, right? Which is weird. Don't worry about that. P-values are like bounds.
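Both forms of the Bonferroni correction fit in a few lines. A minimal sketch with hypothetical function names and made-up nominal p-values; the option to cap corrected values at 1 is a common convention in software, not something the correction itself requires, since values above 1 are still valid (if uninformative) bounds.

```python
def bonferroni(p_values, cap=True):
    """Bonferroni-corrected p-values: m times each nominal p-value.

    Values above 1 are still valid upper bounds; cap=True clips them to 1,
    as many tools do.
    """
    m = len(p_values)
    corrected = [p * m for p in p_values]
    return [min(p, 1.0) for p in corrected] if cap else corrected


def passes_bonferroni(p_values, alpha=0.05):
    """Equivalent decision: compare each nominal p-value with alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


nominal = [0.01, 0.02, 0.5]              # hypothetical nominal p-values, m = 3
print(bonferroni(nominal))               # roughly [0.03, 0.06, 1.0]
print(bonferroni(nominal, cap=False))    # last value is 1.5: a bound, not a probability
print(passes_bonferroni(nominal))        # [True, False, False]
```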
This means that the probability of a type one error is less than or equal to the p-value. Okay, so if you have a p-value of like 100, well, the probability that you get a type one error is less than or equal to that, because it can't be greater than one, right? So just think of it as a bound. And the reason to think about it as a bound, especially in this case, is that the Bonferroni correction is probably the most stringent thing you can do. It makes the fewest assumptions about the relationship of the different pathways to one another. And often it's just way too stringent, right? Now, you can get a little bit more power out of things: there's a different correction of the same type that you can use, but almost nobody uses it, and it doesn't give you that much. Okay, Bonferroni's easy: you just multiply by the number of tests. When you use the Bonferroni correction, you correct for something called the family-wise error rate. Okay, so what does that mean? This is kind of important. If you do a thousand tests and then report all the pathways that are enriched at a Bonferroni, or family-wise-error-rate, corrected probability of 0.05 or less, you are saying something about the probability that any of your enrichments are type one errors, like any of them, even just one. So if you do a thousand tests, 500 are enriched, and you report a family-wise error rate of 5%, that's saying that the probability that any one of these 500 is a type one error is 5% or less. That's different from another thing you could say. You do your tests, you get 500 pathways that are enriched, and you say, I want a false discovery rate of 5%. I'll talk about what that is in a while, but it says that on average we expect 25 of these 500 to be type one errors, right?
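A quick numerical check of the difference, assuming the tests are independent (an assumption the Bonferroni bound itself does not need): without any correction, 1,000 tests at 0.05 make at least one type I error almost surely, while testing each at 0.05/1,000 keeps the family-wise error rate under 5%. The last line is the false-discovery arithmetic from the example, 5% of 500 reported pathways. The function name is hypothetical.

```python
def fwer_independent(m, alpha):
    """P(at least one type I error) among m independent tests of true null
    hypotheses, each run at level alpha. Independence is an assumption here;
    Bonferroni's guarantee does not need it."""
    return 1.0 - (1.0 - alpha) ** m


m = 1000
print(fwer_independent(m, 0.05))       # 1.0: an error is essentially certain
print(fwer_independent(m, 0.05 / m))   # about 0.0488, below 0.05
print(0.05 * 500)                      # 25.0 expected type I errors at 5% FDR
```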
Those are two very different things, right? One of them is the probability that any one is a type one error, and the other is that 25 of these 500 tests are type one errors: very, very different. Okay, so the family-wise error rate, that's the first type of correction: extremely stringent, really hardcore, and if you can use it, great, because you have all the guarantees in the world, and you can be pretty confident in everything you report. Okay, right, and so almost always you're gonna be very disappointed when you do a Bonferroni correction, and for that reason people are willing to accept this less stringent condition called the false discovery rate. That says: I've reported to you that 500 of these pathways are enriched, but on average 25 of them are probably type one errors. Okay, so just to repeat, it's the expected proportion of the observed enrichments that are due to random chance. Now, man, I hate this slide. Okay, the expected proportion of the enrichments that are type one errors. This false discovery rate is a different type of guarantee, and for that reason they don't call it a p-value; they call it a q-value, right? Now, the thing to realize is what the false discovery rate is saying: as the number of enrichments you report, the number of tests that pass, increases, the number of type one errors also increases. It's a proportion of the tests that you report. All right, questions about that? Okay, great. So the classic way to compute the false discovery rate is something called the Benjamini-Hochberg procedure, and don't worry, this takes like two minutes to go through and then we're done. Okay, so how do you compute it? You take the nominal p-values of all the pathways you've tested, and you sort them in increasing order of p-value, so the most significant ones are at the top, boom, boom, boom, boom, boom, boom.
And in this cartoon, there's a nominal p-value of 0.99 right at the bottom. Okay, then you compute an adjusted p-value. So what's the adjusted p-value? The adjusted p-value is equal to the original p-value times the number of tests you did, which in this case is 53, divided by the rank of that p-value in the list. So this is the highest-ranked one, so its rank is one. The second highest-ranked one is two. The third highest-ranked one is three. Okay, so that's the adjusted p-value. And we're almost there. The last step to compute the q-value is you look at all the adjusted p-values, and the q-value is equal to the minimum adjusted p-value at that rank or lower. So here in this example I have adjusted p-values of 0.053, 0.053, 0.053. I've got one that's 0.04, then I go back to 0.053. And let's just assume that all the adjusted p-values from here all the way down are higher than 0.053. Okay, so in this case, this 0.04 adjusted p-value becomes the q-value for all the higher ranks, right? And it doesn't propagate down to here. And so the nominal p-value threshold for an FDR less than 0.05 corresponds to the lowest-ranked p-value that has a q-value of less than 0.05. It's called the Benjamini-Hochberg step-up procedure. Is it step up or step down? I get that right about 50% of the time. It's still the minimum. Yeah. I'll come over here. It's still this one, because this one's smaller. The first place is there. You can see up here, actually, that this adjustment multiplies by the number of tests divided by the rank. For the top-ranked p-value, that's just the Bonferroni correction. That's super stringent, right? But as you go down the list, you become less and less stringent. The reason is that the false discovery rate depends on the number of tests that pass, right?
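The Benjamini-Hochberg steps above translate directly into code. A minimal sketch with a hypothetical function name: the adjusted p-value is the nominal p-value times m divided by its rank, and the q-value at each rank is the smallest adjusted value at that rank or further down the list. Capping q-values at 1 is a common convention rather than something from the slide, and the input p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original order of p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # most significant first
    # adjusted p-value = nominal p * m / rank (rank 1 = smallest p-value)
    adjusted = [p_values[i] * m / rank for rank, i in enumerate(order, start=1)]
    # q at each rank = smallest adjusted p-value at that rank or further down
    q_sorted = adjusted[:]
    for r in range(m - 2, -1, -1):
        q_sorted[r] = min(q_sorted[r], q_sorted[r + 1])
    q = [0.0] * m
    for rank_index, i in enumerate(order):
        q[i] = min(q_sorted[rank_index], 1.0)
    return q


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Here the adjusted values in sorted order are 0.04, 0.06, 0.0533 and 0.2; the 0.06 at rank 2 is pulled down to 0.0533 by the value below it, so the q-values never decrease going down the list, the same behaviour as the 0.04 in the slide example. Only the first pathway passes an FDR of 0.05.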
So you can become less stringent as you go down the list, because more and more tests pass and you're making a statement about the proportion of tests that pass. The reason is, I'll try to state it: it's about a group of tests that pass. When we make that group bigger, a small proportion of that group is due to type one errors, right? So if I have, say, 500 tests that all achieve a p-value of 0.05, and then 500 other tests that achieve a p-value of 0.06, in terms of a proportion, the bigger set has a larger proportion that are type one errors. But let's just leave it there. I feel like I'm talking myself into a hole. Yeah, multiple pathways. So you don't have to do this yourself in GSEA; GSEA does this for you. Whenever someone reports a false discovery rate, this is what they're doing. You don't have to do this. I'm telling you how it's done for your own edification, because numbers are fun. But the other reason is in case you have to do it yourself, in case you're in one of these situations where the tool doesn't cover what you need. Or you could use GSEA, but for some reason GSEA doesn't have the pathways; maybe you can upload the pathways in. But you're in a situation where you don't have full coverage of everything you want to do with the tool. But if you're using GSEA and someone gives you a q-value or a false discovery rate, this is how they got it. This is probably how they got it. There are other ways to compute the false discovery rate that make more assumptions; this is the one that makes the smallest number of assumptions. Some of the higher-ranked ones have a bigger number after they've been adjusted? So this, just right here. Well, you see the adjustment: you're multiplying by a smaller and smaller number, because you're increasing the denominator.
So even though the nominal p-values get bigger as you go down the list, the factor you're multiplying them by gets smaller and smaller, because you're dividing 53 by a bigger and bigger number. Here, number one is number three. Right. But for number four, the p-value is smaller after adjustment? It was bigger before? But after adjustment, its adjusted p-value is smaller than the one above it. Right. We were looking for the first adjusted p-value that's above 0.05. Less than 0.05. We're looking for the... For the last one, that's the biggest. Yeah. But the reason this one is bigger is because 53 divided by three is a bigger number than 53 divided by four. Yeah. Right. I guess my question is, after the adjustment, you can see there's some fluctuation in the adjusted p-value. Right. So this is just the adjusted p-value; it's not yet the q-value. To get the q-value, at each point you have to look down the list below it to find the smallest adjusted p-value. So even though the adjusted p-value can go up and down, the q-value always stays the same or gets bigger as you go down the list. Okay. Great. Yeah. So by dividing, you're saying that if the rank is higher, then you have more chance of being a false positive than if your rank is lower? No, all I'm saying is that... Because you are dividing by a bigger number. Right. So the multiplication factor is getting smaller. That means you're more sure that this cannot be a false positive? So again, remember what you are doing is looking at a proportion. As you go down further, you can get... Let's not call them false positives anymore. Let's call them type one errors, because people always get upset when you call them false positives; it gets confusing. Call them type one errors. As you go down the list, you do get more type one errors. That's right. Almost certainly you get more type one errors.
But the FDR is the proportion of the tests that are type one errors. So if, as you go down the list, you pick up type one errors more slowly than you add tests to the list, then the FDR goes down. It has to do with the rate... It varies with this division by the rank. Right. Because 53 divided by 5 is always smaller than 53 divided by 1. Right. So you're multiplying a smaller p-value by a bigger number and a bigger p-value by a smaller number. That's right. So then you're saying that for a smaller p-value, my type one error rate, which is the factor I'm multiplying by, is more compared to a bigger p-value, which gets multiplied by the smaller number. So my type one error for getting that p-value is less compared to a smaller p-value. Let's maybe talk about this offline, because we're over time. One way to think about it is that it's a bound instead of an exact number. But let's talk about it offline. There's one more point I want to make. I'm sorry I went over time. And I think this is one of the more important points: regardless of what you do, the number of tests you do is always going to decrease your power, right? If you're testing pathways that are unlikely to be enriched in your gene list, that's still going to end up affecting your ability to detect the pathways that are enriched. So be very careful about which pathways you look at, right? There are various ways to do that. You can use GO slim. You can restrict testing to only the appropriate GO annotations, or, one other way people tend to do it, they filter the pathways by their size. Okay, so let me be clear: you have to do this extremely carefully.
In terms of getting a small p-value, pathways with one or two or three genes in them, just because of the way the hypergeometric test works, or the way the ranked-list tests work, will essentially never give you a p-value small enough to survive correction, right? So a common thing people do is require the pathways they look at to have a certain minimum size. Some people use 10, I use 30, that type of thing. And that removes a lot of the pathways, because there are more small pathways, especially in Gene Ontology, as you go further down the hierarchy. Okay, so now, what I'm saying is you look at the pathway size before you do the testing. You look at the size of the pathway, meaning the number of genes from your background that are in the pathway. You can't use the number of genes from your gene set that are in the pathway to restrict them, because that's implicitly doing the test, right? If you take your gene set and look at all the pathways and go, oh, this pathway, this pathway, and this pathway had zero in my gene list, what you're implicitly doing is computing the p-value, right? But you can say, okay, here's my background. I don't know what my gene list is yet, but these pathways only have three genes in the background, so I'm just not gonna ever consider them, because I don't have the power to detect enrichment for those pathways. So you can reduce the stringency of the correction by removing these: not doing tests for things you're never gonna be able to detect enrichment for. Okay, so that was the last point I wanted to make. There's a summary. I'm over time, but I'm happy to stay up here to answer questions.
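Here is a minimal sketch of size filtering done the safe way, on background counts only, together with a check of why tiny pathways are hopeless. The sizes (20,000 background genes, a 1,000-gene list, 10,000 tests) are assumptions for illustration, the default cut-off of 10 is one of the values mentioned in the talk, and the function names are hypothetical. The best p-value a pathway can ever reach is the one-sided hypergeometric p-value when every pathway gene that fits is in the list.

```python
import math


def min_achievable_p(N, K, n):
    """Smallest one-sided hypergeometric p-value a pathway with K background
    genes can reach in a list of n genes drawn from N (all of it in the list)."""
    k = min(K, n)
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def filter_by_background_size(pathways, background, min_size=10):
    """Keep pathways with at least min_size genes in the BACKGROUND.

    Deliberately ignores the gene list: filtering on overlap with the
    gene list would amount to peeking at the test result.
    """
    return {name: genes for name, genes in pathways.items()
            if len(genes & background) >= min_size}


# Hypothetical numbers: 20,000 background genes, a 1,000-gene list,
# and 10,000 pathways tested at a Bonferroni threshold of 0.05 / 10,000.
threshold = 0.05 / 10_000
for K in (1, 3, 10, 30):
    p = min_achievable_p(20_000, K, 1_000)
    print(f"K = {K:2d}: best possible p = {p:.2g}, can pass: {p <= threshold}")
```

Under these assumptions a one-gene pathway can do no better than p = 0.05 and a three-gene pathway no better than about 1e-4, so neither can ever pass the corrected threshold; testing them only adds to m.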