So, the learning objectives are to be able to select the appropriate enrichment test for your data; to be able to determine the background list when running, for example, a Fisher's exact test or a hypergeometric test; to understand how a ranked list is interpreted with a minimum hypergeometric test; and to be able to determine when you need to apply a multiple testing correction, and in genomics you will mostly need to do that. There are a couple of different multiple testing corrections, and you should be able to understand what the Bonferroni correction does, or what a false discovery rate does, and you should be able to explain how you do either of those tests. Okay, so the outline. First we introduce enrichment analysis, though we touched on that in the previous lecture. Then we talk about the hypergeometric test, then a ranked version of that test, and then a couple of multiple testing corrections. So as we discussed, one of the most important inputs to pathway enrichment analysis is a gene list, and in the simplest case you just pick a gene list that has no additional information, just a list of genes. You depend on various parameters to define that gene list; for example, you may take genes that have a significance of less than 5% from your differential expression analysis. And then you would ask: are any gene sets, or corresponding pathways and processes, surprisingly frequently present in my gene list of interest? In other words, does my experiment pull out some sort of common biology from known pathway databases? The community commonly uses Fisher's exact test, or the hypergeometric test, in order to determine how surprising that really is. In the case of a ranked list, which is sometimes recommended because ranked lists carry more information for pathway enrichment analysis, you would extend your Fisher's exact test to a gene ranking rather than a flat gene list. In this case, you would answer the question: are any gene sets ranked surprisingly high in my ranked list of genes? In other words, since the genes in the list are ranked by decreasing significance, do the pathway genes generally sit among the higher scores in my input list? The statistical test here is a minimum hypergeometric test. I won't be discussing GSEA; Veronique will discuss GSEA, which is another approach to genome-wide ranked list analysis. So this is a common scenario, an overview of your workflow. You may have a microarray or an RNA-seq experiment or any other omics experiment, which is essentially a gene expression table where genes go from top to bottom with their particular values. Then there is a black box called the enrichment test, and that will spit out a list of pathways that have a significant p-value associated with your gene list. The black box also takes gene set or pathway databases as its input. We discussed that scenario quite a bit in the previous lecture. We have a gene list, we have some gene sets or annotations from Gene Ontology, and we ask: are any of these gene annotations surprisingly enriched in the experimental gene list? And then the details are the following. Where do they come from, both the gene lists and the gene sets? And how do we assess "surprisingly"?
That is the main topic of this lecture: how to correctly repeat these tests and not become overly enthusiastic about the data that you see. So the most common design that people use in omics experiments is a two-class design. You would have cases in blue and controls in red, or vice versa, and you compare a particular set of samples to some control samples. You know that there will be some gene expression differences, which you have hopefully determined with solid statistical techniques, and you have hopefully also created a good experimental design so there are no strong cofactors or confounding factors in your experimental matrix. And then, having run a gene expression analysis, for example, or a proteomics analysis, you can rank your genes according to how different they are between cases and controls. Ranking these genes will give you both genes that are up-regulated in cases and genes that are down-regulated in cases relative to controls. There are several ways of going about it. The first simple way is to just select your genes of interest according to some threshold. A reliable way is selecting these genes by statistics: for example, any genes that are up-regulated with a p-value less than 0.05. An unreliable way of doing it is selecting genes according to fold change. Please don't do that; that is a bad idea. Always use some sort of statistics, because fold change does not reflect the error in the data, right? The fold change could be incredibly high, but it's meaningless if the error is also incredibly high. So you should always use statistics to do thresholding, if you do do thresholding. An alternative is not to do any thresholding at all and let the pathway analysis take care of it, but that's a different approach. So this will give you a gene list that you trust in some sense. And if it's a gene expression analysis, you could actually have two gene lists: the genes that are going up according to some threshold, and another gene list of genes going down according to some threshold, or you could have a mixed gene list. It's up to you what you want to do, but maybe you want to start with the simplest approach and just one gene list. In some other cases, you may have time courses. For example, you take omics measurements at certain time intervals, maybe testing how well a drug works. Then you can also transform that time-wise gene expression matrix, or any other experimental matrix of measurements, into gene sets. You can perform clustering or ranking, for example, and each one of those clusters becomes a gene list of interest. Then each cluster can be analyzed for functions and pathways, saying that maybe apoptotic pathways become up-regulated over time because the drug starts to work. So how do we perform a gene set enrichment test? First, we have a gene expression table, as we discussed. Then we have a certain list of genes that we trust, maybe the ones up-regulated with a significant p-value. We have all these pathway databases, and we select one gene set from those pathway databases, representing a particular piece of biological knowledge. And then we count how many genes from that gene set land in our up-regulated gene list, and how many land elsewhere, outside our threshold, as in the sketch below.
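As a concrete illustration of that selection-and-counting step, here is a minimal sketch in base R. The gene names, column names, and pathway are hypothetical placeholders, not a fixed convention.

```r
# Minimal sketch: select a gene list by statistics, then count the overlap
# with one pathway gene set. All names and sizes below are made up.
set.seed(1)
de_table <- data.frame(
  gene = paste0("gene", 1:1000),
  pval = runif(1000)        # stand-in for per-gene differential expression p-values
)

gene_list <- de_table$gene[de_table$pval < 0.05]   # threshold by statistics, not fold change
pathway   <- paste0("gene", sample(1000, 50))      # hypothetical gene set from a pathway database

in_list  <- sum(gene_list %in% pathway)                          # gene set members in our list
outside  <- sum(setdiff(de_table$gene, gene_list) %in% pathway)  # members outside the threshold
c(in_list = in_list, outside = outside)
```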
And then the p-value that measures the surprisingness of the overlap between the gene set and the gene list is basically related to how often you would expect your genes to land in that pathway if the genes were drawn randomly. The underlying assumption is that you sample many gene lists of that size from your original experimental matrix, each time counting how many genes fall into that pathway, and that gives you a background estimate. Obviously we don't need to do that every time, because there are standard statistical tests that work with these distributions, but this is the meaning of the p-value here: how often would we see such a result if the data were randomly generated? So this is a recipe for high-impact publications: define your gene list and background list, select the gene sets that you want to test for enrichment, run the necessary enrichment tests, and always correct for multiple testing. Then interpret your enrichments, find the new mechanisms, and publish them. So why would you even bother thinking about the more complicated case, where you have a ranked list of genes? The possible problem with just testing a flat list is that you don't really know where to draw the threshold. You could say, I trust genes with a 5% error rate, so p-value 0.05 as a cutoff. But then it's due diligence to try different cutoffs: do they actually reflect the same biology, or do all your results disappear when you select the most stringent cutoff? And then you can also suffer a loss of statistical power because you constrain yourself to a way too stringent threshold. To avoid that situation, you may want to consider a ranked gene list instead, because each gene in a ranked gene list will have a particular significance, a p-value from the gene-by-gene expression analysis. And as you allow more freedom, or more power, to your pathway analysis, it may be able to detect signal among genes that are near the threshold but a little bit below it. This is one of the reasons why you want to work with a ranked gene list. So in a ranked gene list, instead of splitting your matrix into genes that we're interested in and other genes that we're not interested in, each gene has a value that decreases down the list, so the most interesting genes lie at the top. And again, you have this black box, which in this case is called the minimum hypergeometric test, and it will again spit out a list of pathways that are significantly associated with your experimentally derived gene list. And again, you will have to run it across many different pathway sets in order to capture the entire pathway space. One question on the actual ranking, because there is a big discussion on what kind of metric you actually use to rank the genes in the ranked case. What do you normally use? Many people use the p-value plus whether the genes are up- or down-regulated; other people think it's a good idea to also somehow include the fold change in the metric, plus the p-value. So what do you normally use as a ranking? Because the values have to be unique, right? As far as I know, in the case of GSEA they have to be unique, otherwise it's not a good idea. So, point number one: don't use FDR for ranking, because FDR will do exactly what you said, it will flatten the values. Instead of a continuous range of distinct values, you would see something more like a staircase. So for ranking, FDR won't work as nicely, because it doesn't give a complete ranking.
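To make that staircase point visible, here is a tiny base R sketch. The p-values are arbitrary made-up numbers, chosen only so that the Benjamini-Hochberg adjustment produces ties.

```r
# Minimal sketch: BH-adjusted values (FDR) contain ties, so they give an
# incomplete ranking; nominal p-values rank every gene uniquely.
p <- c(0.001, 0.008, 0.01, 0.02, 0.021, 0.03, 0.12, 0.15, 0.3, 0.5)
q <- p.adjust(p, method = "BH")
data.frame(p = p, q = round(q, 3))                         # note the repeated q values
c(unique_p = length(unique(p)), unique_q = length(unique(q)))
```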
Generally I would recommend starting with the simplest type of ranking. Just use one variable, because the more combinations you choose, the more questions you introduce. The p-value is oftentimes a good way of ranking. If you have reason to believe that in your experiment there was more up-regulation of genes than down-regulation, or if you want to capture something that is clearly being up-regulated, then you may want to separate your list into two lists, one up-regulated and another down-regulated, and try sorting them by a combination of fold change and p-value. But each time you make a combination, you have to justify why you picked this combination and not another. So simpler is better. And then the other suggestion is that if there's something solid going on in your data, some true signal, then it should come out whatever you do with the data: you should see the same thing over and over and over again, regardless of your parameters. So in terms of recipes, when you do have a ranked list of genes, the recipe is not that different. Rank your genes in a meaningful way, make sure you validate that the ranking is robust to changes in these parameters, and you can select up- and down-regulated genes separately. You can use a lenient threshold. Now, depending on the method, GSEA would require all the genes, each with some meaningful signal, while g:Profiler would require you to have some list. It could be a long list, but it still has to be a subset of all genes. So use a lenient threshold; in that case, you won't put yourself in doubt about whether there's anything more interesting below the threshold. Then select the gene sets or pathways to test for enrichment, which we discussed in the previous lecture. You run your enrichment tests. You definitely need to correct for multiple testing when you have a ranked list, because with a ranked list you test more hypotheses, so the false discovery rate is even more important. You have to interpret the results carefully, and then you publish. You can do that. Good, good. So the question was whether up-regulated and down-regulated genes are in the same list or in different lists. This will depend on which method you use. GSEA will have everything in the same list. g:Profiler, in this case, would require one list for up-regulated genes and another for down. Or you could combine them all together, but then a different ranking needs to be used: you could rank by p-value, and then you would expect up- and down-regulated genes to be somewhat mixed, coming one after another in the ranking. So the theory in this lecture is the following: how does a hypergeometric test work in broad principles, and how does the ranked version, the minimum hypergeometric test, work? And then we discuss two different multiple testing corrections, the classical Bonferroni correction and the more common Benjamini-Hochberg procedure that is frequently used in genomics. So, the hypergeometric test and Fisher's exact test: I think they are essentially the same thing, with one doing an exact computation and the other an approximation. We have good computing power, so we usually do the exact test these days. The null hypothesis is that our gene list is a random sample from the population. And the classical textbook example of the population is an urn full of balls, and the balls are either red or black, in this case.
And then the black balls would be, say, the genes that are involved in a particular pathway, and the red ones the population genes that are not involved in the pathway. So we pull out a random set of genes, and we would expect that most of them are red, and some of them, or maybe none, are black. And then we see this particular list where most of them actually turn out to be black, so associated with the particular pathway, and we want to understand: is that a random draw, or is it more likely to reflect some underlying biology? This is done with the hypergeometric distribution, where we know the probability of each outcome: if we pull out five balls, how many of them are likely to be black or red at random. So in most cases you would observe zero black balls, sometimes one, and then there's a long tail of rarer events. And then the p-value is the sum of the probability of your observation, so four black out of five, plus anything more extreme than that. A p-value is always a sum of probabilities: in this case, the probability of seeing four black balls plus the probability of seeing five, something even more extreme. And when you do this analysis in software like R, most likely you will be compiling a contingency table, a two-by-two table recording whether your genes are in the pathway or not, and whether all the other genes are in the pathway or not. This is the standard input you would provide to Fisher's exact test, and it gives you back a p-value. A couple of important details. Most of the time we are measuring over-representation in pathway enrichment analysis: we want to know, are there more apoptotic genes present in my list of experimental genes than expected? Sometimes you may want to measure under-representation: are some gene sets depleted, fewer than expected, in my list of genes? Then you would just reverse the hypothesis, and instead of measuring over-representation of red, you measure under-representation of black, or vice versa. Now, I have to mention that this comes with certain caveats. True negatives in biology are far less well described: a gene that is not related to apoptosis according to a pathway database is not necessarily unrelated to apoptosis; we may just not have discovered it yet. There are no good databases of negative results out there, so measuring under-representation comes with that caveat. Another thing that I also mentioned during the previous lecture is that you need to choose your background population appropriately, and that has to do with the sampling of genes from the background. One example: if you limit your search space considerably, then you need to account for that in your pathway analysis. For instance, in RNA-seq analysis people will often discard transcripts whose expression is below a certain level. It seems like a harmless move; you just want to remove noise. But if that covers a substantial number of your transcripts, say you used to measure 18,000 genes and now you're only measuring 12,000, that means your sampling space can no longer be the entire genome; it has to be the highly expressed transcriptome. Otherwise, you will overemphasize the pathways that tend to be highly expressed. Therefore, the urn needs to include only the genes that weren't filtered out because of their low expression, as in the sketch below.
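Here is a minimal sketch of that contingency table and test in R. The counts are hypothetical; the key point is that N, the background size, must match the genes you could actually have observed.

```r
# Minimal sketch: 2x2 contingency table and Fisher's exact test in R.
# Counts are hypothetical. N is the background ("universe") size; it should be
# the genes eligible to enter the list (e.g. the 12,000 transcripts that
# survived the expression filter, not all 18,000 genes).
N <- 12000   # background genes
k <- 300     # genes in the significant list
K <- 40      # genes annotated to the pathway, within the background
x <- 12      # overlap: pathway genes that are also in the list

tab <- matrix(c(x,     K - x,
                k - x, N - K - (k - x)),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("in pathway", "not in pathway"),
                              c("in list", "not in list")))
fisher.test(tab, alternative = "greater")$p.value   # over-representation test

# Equivalent upper-tail hypergeometric p-value: P(X >= x)
phyper(x - 1, K, N - K, k, lower.tail = FALSE)
```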
And you can probably recognize the difference when you do practical analysis and you either don't provide the background set, analyzing against every gene in the genome, or you provide the list of genes that weren't filtered. You would expect to see far fewer pathways pop up once you provide the right search space, because you're no longer biasing yourself towards the highly expressed transcriptome. And this becomes especially acute if, for example, you're only interested in transcription factors; say you're looking at ChIP-seq data or transcription factor binding data. Your search space is suddenly 1,800 transcription factors instead of 18,000 genes, and any pathway analysis you do on a list that only contains transcription factors will hand you transcriptional processes, unless you restrict yourself to the search space that you pre-defined in your experimental setup. So, other enrichment tests. We talked about Fisher's exact test, and there are other tests that do the same thing with certain differences in their statistical assumptions: we have the binomial test or the chi-square test. When we consider ranked list analysis, there are many different varieties of GSEA and related approaches, and among classical statistical tests there are the Wilcoxon rank-sum test, Kolmogorov-Smirnov, and so on. And many times, if you have your own previously not-well-described data, you may want to design your own pathway analysis test. What I would recommend is using permutation tests: randomly sample genes from your list and see how frequently pathways show up, as a sanity check. Before really getting enthusiastic about your results, try to repeat your experiment by randomly sampling genes and seeing if anything equivalent shows up. So, the minimum hypergeometric test is essentially a series of hypergeometric tests, or Fisher's exact tests, performed on increasingly long sublists from the top of your input. The assumption here is that if you have ranked your genes in a meaningful way, then the most important genes lie at the beginning of your list, and as you move the cutoff down you get fewer and fewer very important genes, and their scores also diminish. So the way the minimum hypergeometric test works is that you calculate p-values at multiple thresholds of the input list, determine some sort of optimum where the overlap with the pathway is best, and then correct for multiple testing. So maybe this is a good visualization here. This is your gene list, the colour intensity represents the scores, and we want to find out where the genes of a particular pathway sit. This mHG method always works on one pathway at a time. All the stripes over here are the various members of that pathway, and you can quickly see that most of them seem to be located in the red area: they cluster towards the beginning of the list, so they have higher scores. And the question the method works on is whether that distribution is random, or whether there is an enrichment of the pathway genes towards the beginning of the list. So what happens here is that at each point in your list a hypergeometric test is performed, calculating how surprising it is to see so many pathway genes at this high end of the gene list. A hypergeometric test occurs at each one of those steps along the curve, and then we determine the position in the gene ranking where the score is highest, as in the sketch below.
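As a concrete illustration of that stepping procedure, here is a minimal R sketch of the idea, with a hypothetical toy ranking. Real implementations (for example in g:Profiler) then correct the resulting minimum p-value for the many thresholds tried; that correction is omitted here.

```r
# Minimal sketch of the minimum hypergeometric (mHG) idea: slide a cutoff down
# the ranked list, compute a hypergeometric tail p-value at every prefix, and
# keep the best one. The multiple-threshold correction is omitted.
min_hypergeom <- function(ranked_genes, pathway) {
  N    <- length(ranked_genes)
  K    <- sum(ranked_genes %in% pathway)       # pathway genes present in the ranking
  hits <- cumsum(ranked_genes %in% pathway)    # pathway genes above each cutoff
  p    <- phyper(hits - 1, K, N - K, seq_len(N), lower.tail = FALSE)
  list(p_min = min(p), best_cutoff = which.min(p))
}

# Hypothetical toy ranking with most pathway genes concentrated near the top
set.seed(3)
top     <- sample(c(paste0("pw", 1:8), paste0("other", 1:12)))   # 8 of 10 pathway genes in the top 20
ranking <- c(top, sample(c(paste0("pw", 9:10), paste0("other", 13:92))))
min_hypergeom(ranking, paste0("pw", 1:10))
```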
That is where the surprisingness of seeing so many pathway genes is the highest. And that also tells you that maybe, for this particular pathway, most of the interesting genes lie within that little block, and then the significance of the enrichment starts to drop, because fewer and fewer pathway genes are present in the list after the cut-off. Why would we bother with such a more complicated approach to analyzing gene lists? First of all, we don't know in advance where the pathway genes are located in the list, and different pathways may sit at different depths. This approach is much more sensitive than a flat list of genes, because you can imagine that some very specific pathways have so few members that they wouldn't come out as significant if you considered the entire list that you selected. Your selected list may be a few thousand genes while the pathway has only 10 members, but all 10 of those pathway genes are the top 10 of the list you analyzed. Those top 10 genes would be highly significant if you only considered the first portion of your list, but they wouldn't be significant if you considered the entire list. On the other hand, there may be some general pathways that you're interested in, with hundreds of members, or maybe let's call them biological processes, so we don't think too much in terms of detailed pathways. They would be more scattered around your long list, yet still remain significant because there are so many more of them. If you only considered a very small portion at the beginning of your list, you would lose that more general pathway signal, because you wouldn't be looking at the big picture. So in that sense, using a ranked list is often preferable to using a flat list of genes as input, but you need to be sure of your ranking. As we mentioned before: garbage in, garbage out. If your ranking is of low quality, you may see results in pathway analysis that are not really representative of the biology. What you also need to know is that this minimum hypergeometric test is more sensitive, but it also does more statistical testing internally. So multiple testing correction becomes even more important, because you're more likely to encounter highly significant p-values in any data, even random data, and you have to be more conservative and careful about that. So multiple testing corrections are simply your tools against that issue of seeing far too much signal that turns out to be false positive. And why would we actually see multiple testing issues? Imagine how you would win a lottery if lottery tickets were free. You would just take a whole box of lottery tickets, and one of them would end up being the winner, right? But unfortunately, you have to pay for lottery tickets, so you'll never get a whole box. When you do bioinformatics analysis, though, lottery tickets don't cost anything, so you can try again and again and again. And in pathway analysis you naturally try again and again, because you try all these different pathways. When you have this background population, an urn of balls, and you sample only once, you expect the distribution where most of the balls are red and maybe one is black. But if you sample long enough and keep on sampling, then ultimately you will hit the jackpot: you will get something that looks really significant just by chance, because you did it so many times, as the little simulation below shows.
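Here is a minimal R sketch of that free-lottery effect, with hypothetical background, pathway, and list sizes: purely random gene lists still "win" about 5% of the time at p < 0.05.

```r
# Minimal sketch: draw random gene lists many times, test each against the
# same pathway, and count how often p < 0.05 arises by chance alone.
set.seed(4)
N <- 10000; K <- 100; k <- 300          # hypothetical background, pathway, list sizes
p <- replicate(1000, {
  x <- sum(sample(N, k) <= K)           # random draw: overlap with the pathway
  phyper(x - 1, K, N - K, k, lower.tail = FALSE)
})
mean(p < 0.05)                          # roughly 0.05 (a bit less, the test is discrete)
```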
And multiple testing corrections are essentially methods to systematically fight that effect of seeing things as highly significant just because you kept on trying. When you have a particular p-value, say 0.01, you would expect an equivalent enrichment to show up by chance if you try enough times: with a p-value of 0.01, if you try a hundred times, you'll likely see one result that looks that good even in totally random data. So this is the reason why you always have to ask: do I need FDR, do I need a multiple testing correction? And most of the time the answer is yes. Okay, another way of winning the p-value lottery: even if you don't have the same urn of balls but different urns of balls, and you try various hypotheses looking for the right answer, then ultimately you will find an answer that looks good if you don't correct for multiple testing. So in this case, you are sampling from black and red balls, but you're also sampling from square and round figures. If you test all these different potential outcomes, you will find an outcome that looks legitimate, but you need to correct for multiple testing. So the earliest and simplest multiple testing correction is called the Bonferroni correction, and Bonferroni works on the following assumption. If M is the number of tests you run, for example the number of pathways that you analyze for enrichment, then you should be very conservative about each p-value: you essentially take every p-value that you observe and multiply it by M. That will deal very efficiently and quickly with all those significant p-values, because it is a very stringent correction. And the assumption here is that the corrected p-value is greater than or equal to the probability that at least one of the observed enrichments is random. You run 100 tests and you guard against even one of them being wrong; that's a pretty strong requirement, because the complement of "at least one is wrong" is that none of them are wrong, right? So you take your set of p-values, multiply them by M, and the resulting p-values are the ones that you should be interpreting. The jargon, the statistical way of saying this, is that we control the family-wise error rate, FWER. This is even used to date, I think, in genome-wide association studies; people still talk about Bonferroni there sometimes. The problem with Bonferroni is that it's very stringent and can wash away real results. Especially if your signal is not that strong, applying Bonferroni on it will probably result in everything having a p-value of one, or above 0.5. These days the more common way is to accept some false discoveries and impose a less stringent condition. The false discovery rate, FDR, leads to a gentler correction, and there is a fundamental difference between the assumptions here. FDR is the expected proportion of observed enrichments that are wrong, that is, caused by random chance. In this case, that would be 5%, right? If you have a hundred discoveries, then after the correction you accept that five of them could be wrong. Compare that with Bonferroni, where we controlled the chance of even one being wrong. So FDR allows us to introduce more error into our measurements after multiple testing correction, but it also provides us with a better chance of finding something in our data. Both corrections are sketched in code below.
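A minimal R sketch of both corrections, on hypothetical p-values; the manual Bonferroni line mirrors the multiply-by-M rule above, and the BH line anticipates the worked example that follows.

```r
# Minimal sketch: Bonferroni vs. Benjamini-Hochberg via p.adjust.
p <- c(0.0004, 0.003, 0.012, 0.019, 0.04, 0.3)   # hypothetical nominal p-values
m <- length(p)

pmin(p * m, 1)                       # Bonferroni by hand: multiply by m, cap at 1
p.adjust(p, method = "bonferroni")   # the same thing

p.adjust(p, method = "BH")           # Benjamini-Hochberg FDR: gentler correction
```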
The usual procedure for FDR is the Benjamini-Hochberg procedure, but there are several other procedures which are claimed to be more or less stringent. It's also important to know that the Benjamini-Hochberg procedure is not ideally suited to the pathway case, because it assumes independence, and pathways are not independent by any means: I showed you the hierarchy before, and we also know that genes are involved in many processes. But regardless, we all use these techniques even though we know the tests are not independent. So the FDR-corrected value is often called a q-value, which relates to the p-value but has been corrected to be more conservative. Okay, and I'll walk you through one example. Please bear with me, because this is a little difficult to explain, but you'll see how it works on an example. So, for instance, we have tested all these pathways against our particular list of genes and retrieved a p-value for each. What we need to do next, in order to apply the FDR procedure, is to rank them from the strongest p-value all the way to the weakest, and also count how many there are. In this case, we were analyzing only 53 pathways, maybe from a very stringent pathway database. Then the adjusted p-value is the following: we take the nominal p-value and multiply it by how many p-values there were altogether, so almost like the Bonferroni procedure, except that we also divide it by the rank of that p-value. So the first is divided by one, the second by two, then by three, and so on. And then we get some sort of adjusted p-value which will be further transformed; some of them actually go beyond one. And then the q-value assigned to each nominal p-value is the smallest adjusted p-value found at its rank or any weaker rank. So you see that one of those p-values here obtains 0.04, and then anything above it in the ranking is also assigned 0.04, and the block below gets the next value. This is how you see that the FDR procedure takes the continuous space of p-values and makes it more discrete, so it looks like a staircase, right? A block of results that go at 0.04, and then another block that go at 0.05. Among other things, you need to notice that this no longer provides a complete ranking, because we don't really know the order any more; the values have been flattened out a little bit. Also note how a nominal p-value that used to look very, very significant is now corrected to something much closer to your traditional threshold. So this is how the multiple testing correction works. The more common one, the Benjamini-Hochberg one, just converts each finding into a more conservative finding, and no matter what you do, the more results you test at a time, the more stringent this conversion becomes. So one way to deal with strong multiple testing issues is to a priori select a narrow set of hypotheses to test. And then the last thing is that the p-value threshold at FDR 0.05 is actually 0.031 here; that corresponds to the level where the last result was still considered significant. The result after it is no longer considered significant, because its corrected value goes above the classical 0.05, and anything below will increasingly get a p-value that's closer to one. The fact that all those q-values are equivalent to each other, is that a mathematical consequence of doing this, or is it just something that happens?
This is what you commonly see: it will flatten a group of p-values into the same range. And then... But not the actual same number, right? It depends. You may see that the very first, very strong p-value remains unique compared to the next one, and then there will be a region, a sequence of them, where the corrected p-values are all the same, and then they drop down a little bit more. It really does look like a staircase. Yes? I don't think that's a good way of estimating, just by looking at your gene list length or anything like that. I would recommend starting simple, restricting yourself to fewer pathways and then expanding from there, rather than the other way around. Selecting fewer pathways for a start also has the advantage that you make fewer tests and subject yourself to weaker multiple testing correction effects. And as I just mentioned, the correction strength of a multiple testing correction directly depends on how many tests you do. So if you have a large dataset and you wish to analyze many, many pathways with it, then the best way is to choose a test that is really sensitive, or vice versa: if you know that your data is pretty noisy and you don't have the opportunity to run a very sensitive test on it, then choose fewer pathways to work on. So maybe that also answers the previous question: if there's a lot of noise, then arrange things so that you don't run too many tests in the beginning. And from the previous lecture, it probably makes sense for you to test the most interpretable pathways first, so biological processes, maybe Reactome molecular pathways, rather than diving into very noisy and complex pathways that you are likely not going to interpret. Another thing you can do: I showed you the diagram, the hierarchy of Gene Ontology. That also means that the gene sets corresponding to Gene Ontology terms vary widely in their sizes. When you analyze them, probably a third to a half of the Gene Ontology gene sets actually have fewer than five genes, or fewer than three genes. So there's a huge number of very small pathways and a small number of very large pathways. What you sometimes want to do for interpretation is to select a meaningful size range to test in your enrichment. So, for example, we often discard pathways that are smaller than five genes and discard pathways that have more than a thousand genes; those are more like big meta-processes. You don't necessarily gain a lot by using these very large or very small pathways, because they're hard to interpret, but you win considerably on the multiple testing issue: you leave aside a lot of data that you weren't going to interpret well anyway, focus on the slice of the data that's more interpretable, and rescue yourself from some of the multiple testing burden. All right, to summarize what we discussed so far: there are a couple of statistical tests that are commonly used in pathway enrichment analysis. There are obviously more, and they will be discussed in the workshop. If you just have a gene list, then there is no ranking involved, and you can use Fisher's exact test to determine the surprisingness of pathways in your gene list. More informative tests work on ranked gene lists; they are probably also more sensitive, because they capture smaller and larger pathways in the same go. One of them is called the minimum hypergeometric test, which is included in g:Profiler.
The very common and popular one is called GSEA, which will be discussed by Veronique later. And then, if you have your own custom data, there are some statistical tests that you can work from, for example the Wilcoxon rank-sum test, also known as Mann-Whitney, or the Kolmogorov-Smirnov test, which allow you to interpret ranked lists of genes. With GSEA, I'd like to give you a quick caveat: it was designed for microarray data, and out of the box it doesn't necessarily work on next-generation sequencing data. It's something to be aware of, but you can work around it by adding custom layers over the statistical tests. Now, multiple testing correction is essential in pathway analysis and also elsewhere in omics analysis. We have so many pathways, we conduct many tests on these pathways, and moreover, we have assumed that the pathways are independent when they are not. So any meaningful pathway analysis will give you many results, and you need to carefully correct for all the biases coming from multiple tests. Bonferroni was a correction where you essentially multiply your p-value by the number of tests you ran, and that's very stringent in most situations. More contemporary and commonly used procedures like the false discovery rate procedure by Benjamini-Hochberg are more forgiving, because they loosen up the conditions a little bit: you allow a percentage of results to be wrong rather than a fixed count of results to be wrong. And fortunately, we don't need to code these procedures in ourselves; most of the statistical packages will do them for you if you ask. Okay, so to summarize the learning objectives. You should be able to select the appropriate enrichment test for your data; the main two classes are ordered gene lists and flat gene lists. You should be able to determine the appropriate background set when you're doing your analysis, and this is where you have to think carefully about what your experimental design was, how you determined your gene list or protein list, and whether there is anything missing from the general genome-wide gene background, so that maybe you should use a custom background list instead. You should be able to understand what the minimum hypergeometric test does. And then you need to be able to determine when and how to do multiple testing correction, and to explore the two different families of corrections, really, Bonferroni and FDR; I also encourage you to look around for anything that specifically deals with your type of data. And g:Profiler, to run a little bit ahead, has a specific correction for multiple testing that also attempts to account for the fact that Gene Ontology processes are hierarchically related, with smaller processes sitting within larger processes. Okay, that concludes the lecture, so I'd be happy to take any questions. Just one question on the actual usage. Say you want to automate the whole thing: you have done your RNA-seq or whatever, you come up with the candidate genes that are differentially expressed, and then you want to automatically run some kind of downstream analysis, with g:Profiler or that kind of tool. So which of these tools actually offer a command-line interface? I'm a big user of R, and I know that there's a large number of R packages that do various kinds of pathway analysis. g:Profiler has an R package.
You may want to consider the fact that this R package is actually not doing local computation; it accesses the server back in Estonia, which could be a problem if you wanted to analyze 1,000 lists of genes at the same time. Otherwise it works, and the advantage is that you know the data are up to date. Other packages require you to download data, so you have this extra step of figuring out where to get the data. In general, command-line tools are available, and if you need to design a custom approach, then you could download something called a GMT file, which is essentially a database of gene sets, and then design your analysis around that, roughly as in the sketch below.
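To close the loop on automation, here is a minimal sketch of what such a programmatic call can look like from R with the gprofiler2 package. The gene names and the background file are hypothetical placeholders, and it's worth checking the package documentation for the current argument names.

```r
# Minimal sketch: programmatic enrichment with the gprofiler2 R package.
# Gene names and the background file are hypothetical placeholders.
# Note: gost() queries the g:Profiler web server rather than computing locally.
library(gprofiler2)

genes      <- c("CASP3", "CASP8", "BAX", "BCL2", "TP53")   # hypothetical input list
background <- readLines("expressed_genes.txt")             # hypothetical custom background

res <- gost(query             = genes,
            organism          = "hsapiens",
            custom_bg         = background,
            correction_method = "g_SCS")   # g:Profiler's hierarchy-aware correction
head(res$result[, c("term_name", "p_value")])
```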