my recording, so welcome back, everyone on Moodle as well. So there are two different types of normalization. The first one is the normalization of ratings, which means adjusting values measured on different scales to a common scale. If you think about a microarray: if you have two microarrays and one has more DNA on it than the other, then the overall intensity of the one array will be higher than the intensity of the other array. So you would do a normalization to bring these two distributions in line with each other; you would take the array that had too much DNA on it and subtract a constant from each of its values to bring it back to the same scale. That's the first part of normalization, a very specific type, and it's called normalization of ratings.

Then there is another process, which is normalization of scores, and that happens when you align your distribution to a normal distribution. Imagine I've measured intensity values by scanning the array with a laser. These intensity values will normally not follow a normal distribution, because the intensity will very often be zero: a lot of probes on the array will not catch any DNA, since those genes are not expressed. The genes that are expressed will have very high values; they shine really bright. So if you look at a single microarray, the intensity values on that microarray will not follow a normal distribution, and you want to transform the distribution you have into a normal distribution, to compensate for the fact that a lot of the genes were off and only some of them were on. Normally, in a given cell type, you expect something like 10 to 20% of the genome to be expressed; you don't expect every gene in the genome to be expressed in every cell.

These two types of normalization happen either simultaneously or in two steps in microarray analysis. Generally, when you normalize a microarray, the first thing you do is normalize your scores towards a normal distribution, and afterwards you go across the arrays and apply a rating normalization to make all of the distributions more or less the same. So normalization is nothing more than shifting and scaling the original values, with the intention that the normalized values let you compare corresponding values between different data sets in a way that eliminates the effect of certain gross influences. I hope that's clear.

In microarrays, the score normalization is normally the log2 transformation that we do on the intensity values, and the normalization of ratings is then done by quantile normalization. So how does that look? When we have our microarrays and we have log2-transformed them, you can still see that every sample, every different microarray, has a slightly different mean. Some microarrays also have a somewhat different distribution, this one sits a bit higher, and at the top we see a couple of outliers. We can get rid of this by saying we want to normalize between samples.
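To make that between-sample step concrete, here is a minimal sketch of quantile normalization in R. It assumes a hypothetical matrix of log2 intensities with one column per array and uses the limma package from Bioconductor; the package choice and all numbers are mine, not something given in the lecture.

```r
# Two hypothetical arrays whose log2 intensities sit on different scales.
library(limma)

set.seed(1)
mat <- cbind(array1 = rnorm(1000, mean = 7.0, sd = 2.0),
             array2 = rnorm(1000, mean = 8.5, sd = 2.5))

# Quantile normalization forces every column onto the same distribution.
norm <- normalizeBetweenArrays(mat, method = "quantile")

boxplot(list(array1_raw  = mat[, 1],  array2_raw  = mat[, 2],
             array1_norm = norm[, 1], array2_norm = norm[, 2]))
```

After this step the per-array boxplots line up, which is exactly the "make all distributions more or less the same" idea.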
So this between-sample step is a rating normalization, and it makes the arrays all similar, so that they all have more or less the same scale. It happens after you do the log2 transformation, because normally the raw intensity values on an array range from zero, not expressed, up to something like 20,000 to 40,000 intensity units, and that is not a normal distribution at all. Taking the log2 makes it approximately normal. You do that for each array independently, and afterwards you do the ratings normalization, which brings all of these distributions into the same range, in this case from about zero to slightly above 50. Quantile normalization, by the way, is one of the most used normalization techniques for microarrays.

I told you about one of the issues we had with the different behaviors of the two dyes. Here is a little example. The dynamic range of Cy5 runs from about 5,000 up to maybe 16,000 or 20,000, but Cy3 has a much higher intensity, simply because the green dye is much more intense than the red dye. When you use two-color microarrays you want to take the ratio. In this example the Cy5 channel of a spot had an intensity of 5,000 and the Cy3 channel had an intensity of 40,000, so when you look at the microarray the spot appears green, because there is more green than red. And when both samples that you labeled express the gene at almost the same level, you see around 20,000 for each. But the dynamic ranges of Cy5 and Cy3 are very different, so what you do is divide the one intensity by the other to get the ratio between them; here you get one part Cy5 to eight parts Cy3.

The issue comes in when you look at a red spot, where there are eight parts Cy5 to one part Cy3. These two situations are each other's inverse, but mathematically, going from a ratio of 1/8 to 1/4 is a much smaller step than going from 4 to 8: between 4 and 8 there are four whole units, while between 1/4 and 1/8 there is only one eighth of a unit. That is the problem with dividing things: if you divide A by B and B by A, the ratios flip depending on which of the two is higher, and the whole range between zero and one has to hold the same information as the range from one to infinity on the other side.

To compensate for that, you take the log2 ratio. On the raw ratio scale, going from 1 to 2 is a step of one and going from 1 to 4 is a step of three, but going from 1 to 1/2 is a step of only half a unit, and from 1/2 to 1/4 an even smaller one, so everything where the second channel wins gets squeezed into the interval between zero and one. The log2 ratio fixes this: if the ratio is one, so both samples are equal, it turns into a zero.
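A tiny numeric sketch of that squeezing argument, with made-up Cy5/Cy3 intensities:

```r
# Raw ratios squeeze everything below 1 into (0, 1), while log2 ratios
# are symmetric around 0 (illustrative numbers only).
cy5 <- c(5000, 20000, 40000)    # hypothetical red-channel intensities
cy3 <- c(40000, 20000, 5000)    # hypothetical green-channel intensities

ratio     <- cy5 / cy3          # 0.125, 1, 8   -> asymmetric around 1
log2ratio <- log2(cy5 / cy3)    # -3,    0, 3   -> symmetric around 0

data.frame(cy5, cy3, ratio, log2ratio)
```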
And a ratio of two to one becomes plus one, while a ratio of one to two becomes minus one, so if you look at the last line, the step size going from zero to one, or from one to two, is now the same in both directions. This is one of those mathematical tricks to prevent the squeezing of the numbers into the zero-to-one range on one side of the distribution, and it is one of the reasons why, especially for two-color microarrays, we would always use a log2 transformation of the ratio.

So why express intensities as log2 ratios? Because there is a big imbalance between the intensity ranges of the two channels. This imbalance between the two channels is known as dye bias, and you don't want dye bias in your data, because it has nothing to do with biology; it has everything to do with chemistry, with how these dyes react to the laser light, and with the fact that some dyes simply have a bigger dynamic range. So the log2 transformation improves the characteristics of the data distribution, it makes it more of a normal distribution, that's one; it also prevents the squeezing of the ratio values between zero and one when one of the two channels is higher; and it allows us to use classical parametric statistics for the analysis instead of non-parametric statistics. Because if your distribution is not normal, you have to switch to non-parametric statistics, and non-parametric statistics are generally less powerful than parametric statistics: parametric statistics assume a normal distribution, and because of that assumption you can do tests that you cannot do otherwise. It's the difference between doing a t-test and a Wilcoxon signed-rank test, and the t-test is simply more powerful because it assumes something that the Wilcoxon test cannot assume. Those are the three main reasons why you want to log-ratio your data if you have two-color microarrays. On normal single-channel microarrays, like I showed you here, there is just a single intensity, and we do the log2 transformation purely to make it a normal distribution, so purely for statistical reasons; with two-color microarrays we have the extra reason of dye bias. Of course, with a single-color microarray there is no dye bias, because every array uses the same dye.

When you are looking at microarrays, one of the most important steps in the analysis is background correction, because you want to adjust for non-specific hybridization: DNA will bind to DNA even when it is not a perfect match. Even with one or two mismatches it will still bind, and in the washing step you try to get rid of as many of these non-specific bindings as possible, but there is always some non-specific binding going on. Even sequences that are not complementary at all can get stuck to each other, and the wash doesn't remove them completely. If you then look at the array, you see that the whole array has a little bit of background intensity: even a spot with no DNA bound to it is still a bit brighter than the empty array surface next to it.
So hybridization of sample transcripts whose sequences do not perfectly match the probes on the array happens, and you have to compensate for it. In the old days, non-specific hybridization could be estimated from the fluorescence level in the immediate vicinity of the probes: if you have a very big array, say nine by twelve centimeters, you can look around the spots and see how much intensity there is next to them. But since spots became so small and so densely packed, you cannot really do that anymore. So currently we use exogenous negative control spots. If you order a human array, there are some probes on there which come from plants; these probes are not supposed to give a signal in human samples because their sequences do not occur in the human genome. They give you a negative control: you look at the intensity of these spots that should not light up, and use that as the minimum value. If another spot has the same value as one of these plant spots, you can say that this is just non-specific binding.

Another way of doing this is how Affymetrix does it. Affymetrix arrays always have mismatch probes: if they say there are 50,000 probes on the array, there are actually 100,000, because for each probe they put on the array they also include a probe with a single mismatch in it. They then look at the difference between the perfect-match probe and the mismatch probe to estimate the background level and, from that, what the real intensity of a spot is. So there are two ways of fixing this: in the old days, with the big arrays, we fixed it by looking around the spots, but now we use exogenous negative controls; there is often also an exogenous positive control, a spot which should always be on, next to the spots which should always be off, at least on human arrays; and mismatch probes are another way of fixing it.

There is also often some spatial bias in microarrays, and this is simply because pipetting is not perfect. Ideally you would have a very homogeneous intensity across the whole array, but when you pipette onto an array you pipette either from a corner or from the middle, and if one array is pipetted exactly in the middle and another at the side, you see spatial distortions, because the concentration of DNA will be higher at the point where you pipetted than at the points it merely flowed towards, and it will start binding there first. I think that's intuitive: if you take a piece of paper and pipette water onto it, the spot where you pipette is much wetter than where the water flowed to, because the paper starts absorbing it immediately. So what you can see here, for example, is three arrays where the pipetting happened more or less at the top, and you see a concentration difference towards the bottom of the array, where there was simply less DNA than at the top. And here you see one where it is more or less evenly distributed across the array.
But what you can then do is use spatial normalization techniques to say: no, I expect the top of the array to be just as intense as the bottom, because there is no reason to assume that all of the expressed genes are located at the top and all of the silent genes at the bottom. There is software that lets you correct this spatial bias and make the whole array a more or less uniform background, a uniform color. If you want to do this, look into Bioconductor. Bioconductor has the affy package to analyze Affymetrix arrays, and it also has a package for Illumina arrays. Oh, someone followed me. Thank you, Skorita, for following me. It still works; I like it that it still works. So Bioconductor has these packages and they include the best-known algorithms for pre-processing microarray data, including this spatial aberration and things like a little hair, because you can also have a hair stuck on your microarray. Yeah, it's funny, right, that you get this little sound effect and the name at the bottom; I thought it would be nice for users to get some feedback when they follow me, and I also get notified when people follow. Or annoy me, yes, you could annoy me as well, but then it's very easy: I just press one button and the sound goes off.

So for single-channel arrays there are three normalization techniques which are used a lot. MAS5 is one of the older ones and is not used that much anymore. Then there is RMA, the Robust Multi-array Average, and there is GC-RMA, which additionally compensates for the GC content of the different probes; the GC in GC-RMA refers to the GC content of the probe, and the reason it matters is of course that a G-C pair has three hydrogen bonds while A-T only has two. For two-channel arrays the standard normalization people use is loess normalization. You can read up on which one is best for your situation, but in general these days, with single-channel arrays, people do GC-RMA normalization of the background and the spatial effects, and with two-channel, two-color microarrays you use loess normalization. That is more or less everything I want to say about normalization of microarrays; you can just look up best practices, and Bioconductor is your friend here. Bioconductor has packages for reading in CEL files, normalizing them, and then writing them back to disk in more or less a text format.

If you then think about the statistical analysis, the goal is to identify genes that are differentially expressed between the groups of samples that you have. For example, I've measured 10 normal livers and 10 livers with cirrhosis, so affected by alcohol use or other things, and I want to know what the difference is, not at the DNA level itself but at the level of expression. For that I can use many different statistical methods, and the problem is that none of these methods is perfect. If you do statistics you are building a model: you are saying, well, I think this is what is going on, and then you can use all kinds of different statistical tests to see which model fits best.
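Before moving on to the statistics, here is roughly what that single-channel pre-processing chain looks like with the affy package; a minimal sketch assuming Affymetrix CEL files sit in the current working directory (the file and object names are mine):

```r
# Read raw Affymetrix data and run RMA: background correction,
# quantile normalization and log2 probe-set summarization in one call.
# (gcrma::gcrma() would additionally model the GC content of the probes.)
library(affy)

raw  <- ReadAffy()      # reads all *.CEL files in the working directory
eset <- rma(raw)        # returns an ExpressionSet of log2 expression values

expr <- exprs(eset)     # probes/genes in rows, samples in columns
write.table(expr, file = "normalized_expression.txt",
            sep = "\t", quote = FALSE)
```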
Each of these statistical tests has different applications and different pros and cons. Normally, if you have two groups, people use t-tests; t-tests work really well when you have a normal distribution. If you have more than two groups and those groups still follow a normal distribution, you use an ANOVA. Rank products is also used a lot; it was invented by the assistant professor of my previous group, so I'm always trying to push people to use rank products. Rank products is a non-parametric alternative to the t-test: it ranks the values and then does the test based on the ranking instead of the observed intensity values, and it still retains a lot of power compared to the Wilcoxon test. But of course, all statistical models are wrong and some are useful, and every statistical model has its own advantages and disadvantages. So when you do an analysis, do it using at least two different methods, to see whether the same thing comes up with different statistical tests. If a certain gene is always the most differentially expressed gene, you are more certain of it when you have computed that with two or three different methods than when you only used one method to analyze your data.

The t-test compares the difference between two groups; in R it is simply the t.test function. There is a lot of variation in how you can run a t-test, but the major distinction is between the one-sided t-test, where your hypothesis is that a gene in group A is expressed lower than in group B, and the two-sided test, which says: I have no hypothesis, I just want to see genes which differ between group A and group B. If you have a very good prior, for example you already know from previous experiments that the gene you are looking for is likely down-regulated in one of the groups, then you want the one-sided t-test, because it gives you more power to detect the effect than a two-sided test. But if you have no prior information and you are doing the experiment for the first time, you are almost always forced to use a two-sided test, because you cannot know whether you are looking for an up-regulated or a down-regulated gene.
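As a quick sketch of that difference, here is the t.test call for one gene with two hypothetical groups of ten samples each (all values simulated):

```r
set.seed(42)
groupA <- rnorm(10, mean = 7.0, sd = 0.5)   # e.g. normal liver, log2 scale
groupB <- rnorm(10, mean = 8.0, sd = 0.5)   # e.g. cirrhotic liver

# No prior hypothesis -> two-sided test.
t.test(groupA, groupB, alternative = "two.sided")

# Strong prior that the gene is lower in group A -> one-sided test,
# which has more power, but only in the direction you specified.
t.test(groupA, groupB, alternative = "less")
```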
The ANOVA is slightly different. The ANOVA in R is based on linear modeling: you set up a model, and this allows you to have three, four, or five groups. The power of the ANOVA is that you can compare differences between multiple groups, for example different genotype groups like AA, AT and TT, and it also lets you handle quantitative covariates. There is a lot to say about ANOVAs, they are very powerful, but the biggest advantage, which I didn't write on the slide, is that an ANOVA lets you compensate for known effects. If you know that half of your arrays were scanned on Monday and the other half at a different point in time, you can give this batch information to the ANOVA, and the first thing it will do is check whether there is a difference between the batches you specified, and then it will use the corrected data for the association test with the groups you are actually interested in.

In R, if you want to do this, you first create a linear model with the lm function and then use the anova function to test the significance of the terms in that model. It all works by specifying a model, saying, for example: I think the expression of a gene depends on the age of the sample, how old the mouse or the human was when we took the sample, and I also assume there is an effect of a certain gene or marker. So it allows you to include multiple things: multiple groups, not just two like the t-test, and covariates such as temperature, time of day, age of the sample, or the concentration of the DNA. All of these things go into your linear model, which lets you build up a hypothesis of what is going on in your data.

A linear model looks roughly like this: you have your response, which is generally the quantity of interest, in this case gene expression, and the expression level of a gene is modeled as being determined by some covariates, things like sex, age, weight or food intake, plus your predictor. The predictor is the thing we are actually investigating: in this case it could be diseased tissue versus normal tissue, cancer tissue versus normal tissue, or brain cells versus fat cells. We are generally not interested in the effect that age has on a certain gene's expression; we are interested in what is different between cancer cells and normal cells. Those are the basics of a linear model; a linear model can also contain interactions and these kinds of things, but for the assignments today we will just start building a simple linear model, and it's too much to explain linear models fully in the context of gene expression analysis. If you are really interested in linear modeling, the R course that I'm teaching in the summer semester has about three lectures on linear modeling and how to do it in R: how to model repeated measurements, how to do linear modeling when you don't have normal distributions, how to deal with things like time series. So the R course has three lectures on how to properly do linear modeling and what the results mean.
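In code, such a model with covariates might look like the sketch below; the data frame and all of its columns are hypothetical, just to show the lm/anova pattern:

```r
set.seed(1)
d <- data.frame(
  expression = rnorm(40, mean = 8, sd = 1),                  # one gene, log2
  age        = runif(40, 8, 20),                             # covariate
  batch      = factor(rep(c("Monday", "Friday"), 20)),       # known batch effect
  group      = factor(rep(c("tumor", "normal"), each = 20))  # the predictor
)

fit <- lm(expression ~ age + batch + group, data = d)
anova(fit)   # analysis-of-variance table; 'group' is the effect we care about
```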
So, a little example of an ANOVA based on data we collected here. We have B6, the reference mouse strain; we have our famous Berlin Fat Mouse Inbred (BFMI) line, one of these mice which weighs almost three times as much as a B6, a normal mouse; and we have an F1 group, which are the children of a B6 mouse crossed with a BFMI mouse. In our case we had two different tissues measured, and we set up the model saying that the expression of a gene depends on the tissue we measured plus the strain we are looking at. When I then run the ANOVA I get this analysis-of-variance table, and it tells me that there is a significant effect of tissue, so this gene was differentially expressed between the two tissues we were looking at, but it was not significantly different between the strains: every strain had the same expression level, but the gene of course differs between, say, brain tissue and liver tissue. That is a very basic introduction to ANOVA; like I said, if you want to know more you can ask me, I can talk about ANOVA and linear modeling for hours and hours, but I don't want to do that today.

Whichever statistical test you use, and that depends on the data, the number of covariates, the number of groups, and the distribution you have, there is one very significant issue when you do microarray analysis. In a microarray experiment you are measuring something like 10,000 genes, well, not 100,000, because humans don't have 100,000 genes, but somewhere between 10,000 and 20 to 30,000 genes, so you are doing thousands and thousands of statistical tests. The issue is that in biology we agreed that if the p-value is below 0.05, something is significantly different. But if we just generate random data, say two sets of 100 random numbers drawn from the same distribution, and we do a test, that test will tell me the two groups come from the same distribution 95% of the time; 5% of the time the test will make an error, because that is the type 1 error rate we agreed upon, 0.05. Meaning: if I randomly generate two samples from the same distribution and do that 100 times, then about 5 times the test will say there is a significant difference even though there isn't one, because I generated both from the same distribution.

So there are two kinds of errors here. A type 1 error is calling a gene significantly changed, say significantly up-regulated, when it is not; it comes up just by chance, just because we do so many statistical tests, and you can control this with a Bonferroni correction. Then there is the type 2 error, which is saying that a gene is not significantly different while it actually is, and that is also something you can control. When you do a statistical test you are always trading off the type 1 error rate against the type 2 error rate, because it matters how you do your correction procedure: if you do a Bonferroni correction, you optimize to minimize your type 1 errors.
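You can see this whole multiple-testing problem in a few lines of R; a small simulation sketch (nothing here comes from real arrays), including the p.adjust corrections discussed around here:

```r
# 10,000 "genes" with NO real difference between two groups of 10 samples.
set.seed(7)
pvals <- replicate(10000, t.test(rnorm(10), rnorm(10))$p.value)

sum(pvals < 0.05)                                    # ~500 false positives
sum(p.adjust(pvals, method = "bonferroni") < 0.05)   # controls type 1 errors
sum(p.adjust(pvals, method = "BH") < 0.05)           # false discovery rate
```

With the raw 0.05 cut-off you call roughly 500 genes significant even though nothing is going on; after either correction that number drops to essentially zero.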
If you do a Benjamini-Hochberg correction instead, also called the false discovery rate correction (there is sometimes a third name mentioned, because it wasn't invented by just the two of them), then you are optimizing for type 2 errors, meaning you are not optimizing for type 1 errors, so the type 1 error rate goes up. If you push your type 2 error rate down, your type 1 error rate goes up; push your type 1 error rate down, and your type 2 error rate goes up. The two are connected, and you trade one against the other. The important thing for you to know is this: if you test 10,000 genes and you simply take your 0.05 significance level, that is a problem, because out of 10,000 genes you would call about 500 differentially expressed that don't have to be differentially expressed at all, simply because your statistical test has an error rate, and all statistical tests do.

In R we can use the p.adjust function to correct our p-values and optimize for a certain minimal error rate. We can, for example, use the Bonferroni method to control the type 1 error rate: we take a p-value of 0.0015 and adjust it using the Bonferroni method, and the third parameter is the number of tests we did. Here we are telling the algorithm that we did 10 tests in total, so adjust this value based on the fact that there were 10 tests. That 10 stands for the number of probes or genes measured on the array, so in reality this number is in the order of 10,000, 20,000 or 30,000, depending on the array type. This makes a significant difference in the number of wrongly called differentially expressed genes you claim to have in your sample. Because in the end, if you do a microarray and find a gene which is very different according to your analysis, you go to the lab and use something like qPCR to validate it, and if you did not adjust your p-values correctly, you will often fail to confirm your differentially expressed gene in that separate experiment. Normally, if you find 100 genes which are differentially expressed based on your microarray and you then go to qPCR, then if you are a good statistician you want 95 or more of them to validate. So the separate experiment is, in the end, a test of what your error rate was on your microarrays, and you can do this with QTL mapping as well. Whenever you do a statistical analysis, it is good that someone in the lab confirms that the analysis was good, and good means that 95% of the things you claim can be validated in the lab; if it's lower than 95%, then as a statistician you should feel bad.

Alright, so imagine we did all of this: we did our microarrays, our normalization and background correction, our linear modeling, we corrected for multiple testing, and in the end we have our list of differentially expressed genes. Then we can do multiple things with it. We might want to see whether all of these genes share something, whether they all belong to an immunity pathway, or to muscle regulation, or to the cell cycle. To figure that out we can use things like Gene Ontology or KEGG, and we already talked a lot about KEGG.
KEGG is this database of high-level functions and it provides pathway maps, but you can also use the KEGG database to test whether a certain pathway is over-represented in your data. That can be interesting: say we look at livers from normal mice and from heavily obese mice, and we want to know what is going wrong in the liver. Just knowing that a certain gene is differentially expressed will not really help us pinpoint where things go wrong, but using something like KEGG we can figure out which cellular pathway or biological process is over-represented in our data set more than expected at random, which lets us reason about what is going wrong inside these cells or tissues.

We can also use Gene Ontology. Gene Ontology is a collaborative project to address the need for consistent descriptions of gene products across databases. It is an interesting project; I'm not the biggest fan of Gene Ontology, I like KEGG much more, which is why we have already discussed KEGG a couple of times. But Gene Ontology is something people use a lot and it is well received; I just don't like the hypergeometric testing that is usually done with it, but that is a personal opinion, and my personal opinion should not make you say, well, I'm not going to use Gene Ontology. Gene Ontology is a tool to figure out what is going on in your samples and in your experiments.
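For completeness, the kind of over-representation test sitting behind both KEGG and Gene Ontology enrichment is essentially a hypergeometric test; a toy sketch in base R, with all counts made up:

```r
N <- 20000   # genes on the array (the "universe")
K <- 300     # genes annotated to one category, e.g. cell cycle
n <- 150     # differentially expressed genes in my list
k <- 12      # DE genes that are also annotated to that category

# Probability of seeing k or more annotated genes purely by chance:
phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same idea as a one-sided Fisher's exact test on the 2x2 table:
fisher.test(matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2),
            alternative = "greater")$p.value
```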
So Gene Ontology consists of three different parts, and every gene in the genome gets annotated with these three things. The first is the cellular component: for each gene, people tried to figure out where the gene product is located, where it has its function. Some gene products are active in the nucleus, others in the endoplasmic reticulum, others in the cytosol, and some are secreted, so they are active in the extracellular matrix. So every gene is annotated with the location where its product is found to be active, and there is a lot of information and a lot of experiments behind figuring that out. And of course this can show up as over-representation: if you find that all of your differentially expressed genes are annotated with the cellular component mitochondrion, then you know, oh wait, perhaps we should take a closer look at the mitochondria in our next experiment, because that is probably where whatever goes wrong is most active; if the cellular component points to the nucleus, you take a closer look at the nucleus. So it helps you use your current set of differentially expressed genes to plan in which region of the cell to look for the problem.

All genes in the genome are also annotated with a biological process. A biological process is a term that describes a series of events accomplished by one or more ordered assemblies of molecular functions. A biological process is not really equivalent to a pathway like a KEGG pathway; it is more general, something like growth, cellular growth or cell division, and those are not pathways in themselves, because a pathway is defined by a metabolite and an enzyme working on it, converting it into another one. Metabolic pathways are described as steps where you go from one substance to another through a number of transformation steps, while biological processes are much broader: DNA replication is a biological process, and I think we saw this in Reactome as well. KEGG is really about a metabolite, a protein or enzyme working on it, and it turning into another metabolite, whereas a biological process, like the cellular component annotation, is something like DNA replication, growth or cell cycle, so much broader. But of course, if you find in your list of differentially expressed genes that most of them are annotated with, for example, cell cycle, then you can start thinking: that's interesting, there is probably something wrong with the cell cycle of these cells.

Furthermore, we have the molecular function. Again, every gene gets a molecular function annotation, and a molecular function describes an activity that occurs at the molecular level. It corresponds to an activity that can be performed by an individual gene product, although some activities are performed by assemblies of gene products. A molecular function is much smaller; it is very close to what a pathway does. A molecular function can be protein degradation, for example, whereas a biological process would not say that; it would be something like protein metabolism, so the degradation and the production of proteins together. So molecular functions are very close to pathways.

And Gene Ontology is a tree. If we look at the biological process tree, it splits into cellular processes and metabolic processes, and at the next level things are annotated as being part of a cellular metabolic process or a small-molecule metabolic process, and this then splits out into metabolic processes that have to do with vitamins, cellular metabolic processes that have to do with ketone bodies, cofactor metabolic processes, and so on, in finer and finer detail. But all of these form a tree. The cellular component is also a tree: inside versus outside of the cell, and when you are inside the cell you can be in the cytosol or in the nucleus, and when you are in the cytosol you can be in the plasma, so it is always a tree that you are looking at. Every gene gets annotated with one of these terms, and if you are annotated as a menaquinone metabolic process, then you are also a vitamin K metabolic process and also a ketone metabolic process, so the annotation propagates up the tree. That is very different from how KEGG works: KEGG works in pathways, a metabolite, an enzyme working on that metabolite, another metabolite being produced.

Alright, so I have again been talking for about 40 minutes, so I'm going to take a quick break, and then we have about 15 slides left. We will do those 15 slides and then we will talk about SARS-CoV-2, downloading data from Ensembl in R, and doing multiple sequence alignments on viral genomes, which Jan wanted to know more about. Good, so the second break is, what was it, koalas, so enjoy the koalas and I will be back in 5 to 10 minutes. I will see you all in 5 to 10 minutes and