Any questions about EDA before we move on to the next subject, which is going to be slightly more statistical? So we've got one more module to go in, is it like 45 minutes or, yeah, okay. But the module tomorrow morning is actually a bit shorter, so we'll probably finish it tomorrow morning. I want to make sure we understand it, because it's quite important. Okay, so we're now going to talk about one-sample and two-sample t-tests. So for example, in the gene expression experiment, we want to find the genes that are differentially expressed, right? We've normalized, we've looked at everything, everything seems okay, we've taken the log transform. Now we want to know which genes are differentially expressed between the two conditions. So when I did my PhD in statistics, we had to do consulting at the University of Washington in Seattle, with people from the whole university. They would come with these statistical problems and ask you about lots of things: how to do this, how to do that, how can I compute the p-value. And I remember one of the professors who ran the consulting lab there said, you know, doing consulting with someone on a statistical problem is basically trying to figure out where the t-test is. That's just to tell you how important the t-test is: often people are very interested in comparing the effect of a drug, or the difference between two samples, and it comes down to a t-test. Often it's about writing the problem down in a way that you're doing a test, and often you're just interested in comparing two conditions, that is, looking for the t-test. Okay, so what is a t-test? A t-test can be used to do what we call a one-sample or a two-sample test, and it's typically to test a hypothesis about the mean, or the means, of one or more distributions.
So for example, let's say we've got the gene expression data set and we want to know: is the mean expression in the HIV sample different from the mean expression level in the control sample? Typically, when you do a t-test, you have to assume that the data are normal. Though we'll see, and we'll talk a little bit about that, the t-test is fairly robust, and you can get away with that assumption provided you've got enough data points in your sample. Okay, so the one-sample t-test. Let's assume we've got a data set and we're just going to enumerate the data points as Y1 up to Yn. For the t-test, we have to assume that the data are independent and normally distributed with mean mu and variance sigma squared. What we want to test is: is the mean mu equal to mu naught, some value, against the alternative that it is not equal to that value. What we're going to do is form what's called the test statistic, and in this case it is the t-statistic. The t-statistic is written as a ratio: we compute the sample mean, we subtract the null hypothesis value mu naught, and then we divide by the sample standard deviation divided by the square root of n, where n is the sample size. So what does this do? The numerator is just the distance between the sample mean and the mean you're trying to test. The idea is that a large t-statistic will be evidence that you should probably reject your null hypothesis. If you get a t that's very large, it means you should probably reject the hypothesis that mu is equal to mu naught. If the true mean is very far from mu naught, then the sample mean should be far from that value, right? Because the data are going to tell you that the mean of the data is very different from mu naught. So this will be big. So it makes sense. However, you kind of want to penalize noisy samples.
If there's lots of variability in your data set, you kind of want to penalize your statistic for it, okay? Because if there's lots of variability, then maybe by chance alone you could get a mean that's very different from mu naught. So you want to penalize the statistic for that. So this is kind of a penalized version of the difference between the empirical mean and the mean that you're trying to test, mu naught. Large values of the statistic are evidence that we should probably reject the null hypothesis. If the mean is mu naught, so under the null hypothesis that the data really come from a normal distribution with mean mu naught, then T should follow a t distribution with n minus 1 degrees of freedom. Again, the t distribution, as I've told you earlier, is very similar to the normal, but there's that extra parameter called the degrees of freedom, which sort of tells you how heavy the tails are. And when n goes to infinity, it's basically a Gaussian distribution. I'm going to show you a couple of graphics, you'll see that. Any questions about this? You shouldn't worry too much about the formula, because you don't need to know it by heart: R does it for you, and there's a nice interface and a nice function that you can use for that. Now, how can we relate these quantities? I said, well, when T is large, we know that we should reject the null hypothesis. But what we'd like to have is a p-value that will help us in making the decision. So basically we're going to compute the p-value, which is computed like this: what is the probability that we observe a statistic as extreme or more extreme than the one that we computed from our data set? And I'll explain why it's computed like that. Let's look at the graphs; I think I have a graph after.
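As a quick sketch (the numbers here are made up, not the actual course data), the t-statistic and two-sided p-value described above can be computed by hand and checked against R's built-in t.test:

```r
# Hypothetical log ratios for one gene (made-up numbers)
y   <- c(0.21, -0.05, 0.33, 0.12)
mu0 <- 0                                       # null value mu naught

n     <- length(y)
tstat <- (mean(y) - mu0) / (sd(y) / sqrt(n))   # t = (ybar - mu0) / (s / sqrt(n))
pval  <- 2 * pt(-abs(tstat), df = n - 1)       # two-sided: 2 * P(T_{n-1} > |t|)

# Should match the built-in function:
out <- t.test(y, mu = mu0)
all.equal(unname(out$statistic), tstat)
all.equal(out$p.value, pval)
```
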
Okay, so this is one of the genes, gene 1, from the HIV data set. We can compute the log ratio; the gene is going to be differentially expressed if the log ratio is very different from zero. If the log ratio is close to zero, it means that there's no difference between the two samples, right? So what we'd like to know is: is the log ratio for gene 1 equal to zero, or is it different from zero? That is, is the gene differentially expressed? Does that make sense? Yes. Okay, so remember that we've taken the log, so the expression data set is on the log scale. What we want to know is: is there a difference between treatment and control? So it's basically asking, is the log ratio, the log difference between the two, different from zero? If there's no difference, it means the gene is not differentially expressed. If there's a difference, it means the gene is differentially expressed. Okay. Let me go back to the formula. So here, the Ys would be the log ratios. What we would do is just compute all of the log ratios. Here we've got a cDNA microarray, so we've got a pairing between the control and the treatment, and we can summarize the data set into the log ratio for each replicate, okay? Which will be just the difference between the treatment and the control for that gene, for replicate 1, replicate 2, replicate 3, replicate 4. And these will be the Ys. Is that clear? I know it's late, but if you've got questions, if it's not clear, please let me know. No, no, go ahead. Okay, so that's a good question: why do we do a one-sample t-test and not a two-sample t-test, given that we've got the control and the treatment? It's a very good point, and we'll get to that. There are two things we could do. We've got the two samples; let's assume we've got just one gene, so we have four data points in the control and four data points in the treatment. We could do a two-sample t-test.
Is the mean here equal to the mean here? Or we could compute the four differences and ask: is their mean zero or non-zero? These are two ways to look at the same problem, okay? And the key point is going to come when we look at the assumptions of the t-test, okay? And I can tell you right away that taking the differences and doing a one-sample t-test is what you should do, okay? And we'll see why. So now let's assume that we've taken the differences between the treatment and the control. What you want to know is: is the mean of the log ratios across the four replicates equal to zero or not? Okay. So that's a paired t-test. This is equivalent to doing a paired t-test. You could just say, I summarize my data like that and I do a one-sample t-test, but in fact it's just like doing a paired t-test. A paired t-test is: you compute differences and you do a one-sample t-test. So in fact, a paired t-test is just a one-sample t-test on differences, on paired differences. You guys are good. Okay, so let's look at the example to try to understand how we compute the p-value. So here I read the data, okay? Take the log, same as before. Here we compute the difference between the four HIV replicates and the four control replicates. Now I compute the t-test for the first gene. So there's the t.test function, specifying the first gene, and here the hypothesis is that the mean is zero, so mu equal to zero, okay? So let's try to do that in R. I think it's quite important we do it. Okay, so this is it. So here, read the data, compute the differences, so the log ratios, and then we're gonna do the t-test. Okay, so gene one will contain the t-test of testing whether the mean is equal to zero or not. We can also look at the output. So let's type that to see what it looks like. So it tells you this is a one-sample t-test.
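That equivalence is easy to check directly. Here is a sketch with made-up numbers standing in for the four treatment and four control replicates of one gene:

```r
# Hypothetical paired log expression values (made-up numbers)
hiv  <- c(8.1, 7.6, 8.4, 7.9)   # treatment replicates, log scale
ctrl <- c(7.8, 7.7, 8.0, 7.5)   # matching control replicates

paired  <- t.test(hiv, ctrl, paired = TRUE)   # paired t-test on the two samples
onesamp <- t.test(hiv - ctrl, mu = 0)         # one-sample t-test on the differences

# Identical statistic and p-value:
all.equal(paired$statistic, onesamp$statistic)
all.equal(paired$p.value, onesamp$p.value)
```
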
The data was just the first row of the M matrix, which is just the first row of differences between HIV and control on the log scale, which is just the log ratios for the first gene. This is the t-statistic. This is the degrees of freedom; why three? Because we've got four replicates. Remember that the formula is n minus one, so four minus one is three. This is the p-value. And the alternative hypothesis is that the true mean is not equal to zero. And this is the actual mean that's computed from the sample. So it gives you a little summary of the test and all of the things. Say that again? Yeah, so first of all, we've gone through some of the assumptions when we looked at the HIV data set. It's right that here we have to assume that the data are from a normal distribution. We'll talk a little bit about that after; we'll see that in fact the assumptions are not exactly correct, and maybe we're gonna run into a few problems. Okay, so the observed statistic is 0.74, right? Which we got from R, which is here. If you go on the other side, it's minus 0.74. Now we want to know, so the p-value by definition is: what's the probability that we observe something as extreme or more extreme? Okay, that is, either something larger than what we observe, or smaller than the negative of that value. And then you're going to compute the area under the curve. And that's very easy to do in R, because you can compute the probability under the curve. Okay, and this is how that p-value is computed. So if we go back to the formula, that's why I've written that the p-value can be computed as two times the probability that T with n minus one degrees of freedom is greater than the absolute value of t, because that probability only gives you one tail: it's the probability that you observe a statistic greater than that in absolute value on one side. And then you need to multiply by two because the distribution is symmetric, so you're going to get the other tail on the other side, okay?
Once again, it's more important that you know what the p-value means than the actual formula to compute it, because R can do that for you. Any questions? Yes, so here the alternative says that we're looking at mu not equal to zero. We could say that we're only interested if mu is strictly greater than zero, but because here we're looking for differentially expressed genes, we're looking at mu equal to zero versus mu not equal to zero. On the other hand, if you were to look at a treatment where you can only believe that the expression could be greater when you apply the treatment, you could do a one-sided t-test where the alternative is one-sided. Yeah, I mean, often you will do two-sided tests, and that's why I didn't even bother to talk about the one-sided test. Okay, so this is another example, just using gene number four, doing the exact same thing. And of course, because the expression is slightly different, you can see that the p-value that you get is very different. In this case, the p-value is 0.04, which technically, if you were to do a test, assume all the assumptions are correct, and use a 0.05 significance level, means you would say that the mean for this gene is probably different from zero. Yes, so you need to know what you're testing. So mu naught is just some value that you believe might be true, and you want to test that hypothesis. Y bar will be the empirical mean of these four observations. For those who've never seen the bar above the Y, it's just a way to denote the empirical mean, the sample mean. If there are things that you've never seen, that you don't understand, where you feel like maybe you're the only one, don't be ashamed to raise your hand and ask. So in this experiment, they are paired because they are the same, they originate from the same colony?
No, they are paired because, in a way, it's kind of a technical limitation of cDNA microarrays that you need to put the two samples on the same slide. So they are coming from the same array, yes. Okay, so you should know the answer to this one already, because I said it, but let me ask you the question: what is a p-value? (A) It's a measure of how much evidence we have against the null hypothesis. (B) It's the probability of making an error. (C) It's a name code used by statisticians. (D) It's something that biologists want below 0.05. (E) It's the probability of observing a value as extreme or more extreme by chance alone. Or (F) it's all of the above. So which ones are true? Okay, well, you know, we could argue that C is true and D is true. If we remove C and D, which ones do you think are true and not true? So A is true, E is true; what about B? Okay, so how many of you think that B is true? You need to raise your hand. So raise your hand now if you don't want to raise it next. How many of you think that it's not true? Okay, so it's almost half and half. It depends? Oh, that's great. It depends on what? No, it's wrong, it's false. The p-value has nothing to do with the probability of making an error, okay? And we'll see a little bit about that, but you could pick a p-value cutoff of 0.05 and the probability of making an error could be 50%. It's related to what we call the type one error, but there are other kinds of errors you can make. We'll see that. Okay, so two-sample t-tests. The two-sample t-test is very similar to the one-sample t-test; you just compute the t-statistic slightly differently, and the assumptions are slightly different too. So let's assume that we've got the data Yi1 to Yin, and i is one or two this time, right? Because we've got the control and the treatment. So it's two samples. We have to assume that the data are independent and normally distributed, just like in the one-sample t-test.
The data in group i have mean mu i, so either mu one or mu two, and variance sigma squared. In addition, we need to assume that the data in the two groups are independent, and that the variance is the same in the two groups. Okay, so right away you can see that the independence assumption, if we were to do a two-sample t-test in this case, could definitely be questioned, right? Because they are on the same array. Let's imagine that something goes wrong on one of the microarrays. Then they're definitely not independent, because if one of the measurements is bad, the other is very likely to be bad, right? So this time we have a two-sample hypothesis: mu one is equal to mu two. So the expression level of that gene in the first condition is equal to the expression level in the second condition, versus they are different. The formula is slightly more complicated, but it's basically the same thing. This time what you're going to do is look at the difference between the two empirical means in the two samples, treatment and control. The idea is that if this is big, there is probably a difference between the two, and the gene is differentially expressed. And if it's small, it probably means there is not much of a difference, and therefore the gene is not differentially expressed. However, we still want to penalize for the variability. Okay, and because we assume that the variance is constant, we pool the variance across the two samples when we compute the standard error. So it's exactly the same idea: the difference between the two sample means, penalized by large variances. If the means are equal, T follows the t distribution with n one plus n two minus two degrees of freedom. So again, you can see it's very similar; in terms of the formula, before we had n minus one.
This time, because we have two samples, we get n one plus n two observations, but because we estimate two parameters, the two means mu one and mu two, we lose two degrees of freedom. Once again, don't worry too much about the formula, because R can do that for you. The way you compute the p-value is exactly the same. So as far as the overall presentation and interpretation of the results is concerned, it's exactly the same thing. You can compute the t-statistic; this is the formula that you use to compute the two-sample t-statistic. And then this is what you would get. There's just one difference here: when you compute the test statistic using t.test, you need to input two samples, because it's a two-sample t-test. So here we input data from columns one to four for the HIV condition, and then data from columns five to eight for the control condition. This is still looking at gene 1. And in theory, you need to assume that the variances are equal. There's a way that you can relax that assumption, but if you do that, then some things are not exactly correct when you compute the t-statistic. By default, R does not assume they are equal, so R will do the correction for you. But here I wanted to show you first the case when they are assumed equal, so I forced it to assume that the variances are equal. Usually you would not worry about that; you would just use the default, and that would probably be good enough. So this is the result of the two-sample t-test. This is the t-statistic. This is the degrees of freedom: four plus four minus two, six. This is the p-value. And then it also gives you the sample mean of X and the sample mean of Y. Again, the p-value is computed exactly the same way, and you can see this time that the p-value is 0.5, so again we would not reject the null hypothesis.
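Here is a sketch of the pooled two-sample computation with made-up numbers, checked against t.test with var.equal = TRUE:

```r
# Hypothetical samples (made-up numbers)
hiv  <- c(8.1, 7.6, 8.4, 7.9)
ctrl <- c(7.8, 7.7, 8.0, 7.5)

n1 <- length(hiv); n2 <- length(ctrl)
# Pooled variance, assuming equal variances in the two groups
sp2   <- ((n1 - 1) * var(hiv) + (n2 - 1) * var(ctrl)) / (n1 + n2 - 2)
tstat <- (mean(hiv) - mean(ctrl)) / sqrt(sp2 * (1 / n1 + 1 / n2))
pval  <- 2 * pt(-abs(tstat), df = n1 + n2 - 2)

out <- t.test(hiv, ctrl, var.equal = TRUE)   # pooled t-test, df = n1 + n2 - 2 = 6
all.equal(unname(out$statistic), tstat)
all.equal(out$p.value, pval)
```
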
So in fact, even though first we used a one-sample t-test and here we used a two-sample t-test, it doesn't make a very big difference in terms of the conclusion about the null hypothesis. But if we look at gene four, we can do the same thing again. And you can see that here the p-value is slightly different, but it's just right below the threshold of 0.05. Okay, and you can do it exactly the same way. So everything's the same; it's just the way you specify the t-test. You've got the two samples, and here I specified the assumption because I wanted to show you the case when the variances are equal. So the pooled t-test needs the variances to be equal in the two groups. Yes? You can do it just for one gene, right? Yes. How can you do it for all the genes without doing it one by one? Do you have to do it one by one? No, we'll see that. You could use the apply function, for example, as we've done before with the mean and the standard deviation. You could do that and just apply the t-test to each row. Or we can do a loop, and we'll just do a loop for now. But we'll go over it. Okay, so let's go over the assumptions, because I think that is pretty important. The data need to be normal. If not, then one can use a transformation or a non-parametric test. So this is kind of an alternative to the t-test that is non-parametric. If the sample size is large enough, and typically large enough means n greater than 30, then the t-test will do just fine. That's because there are some limit results in statistics that say that when the sample size is large enough, the sample mean will be approximately normal, and therefore the t-test remains valid. So this is great, because this is all theory, but in practice, especially when you work in bioinformatics and biology, you never have n greater than 30, right? It's very rare that you have n greater than 30.
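The apply-versus-loop idea just mentioned can be sketched like this, with a small random matrix standing in for the genes-by-replicates matrix of log ratios:

```r
# Sketch: one-sample t-test on every row of a matrix of log ratios.
# M stands in for the real data; here it's random, made-up numbers.
set.seed(1)
M <- matrix(rnorm(5 * 4), nrow = 5)   # 5 hypothetical genes, 4 replicates each

# apply() runs t.test on each row and extracts the p-value
pvals <- apply(M, 1, function(y) t.test(y, mu = 0)$p.value)

# The equivalent explicit loop:
pvals2 <- numeric(nrow(M))
for (g in 1:nrow(M)) pvals2[g] <- t.test(M[g, ], mu = 0)$p.value
all.equal(pvals, pvals2)
```
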
So basically all of these nice rules and results about the t-test are just not valid in our case. We need to be very careful about that. Independence: usually it's satisfied, because if you're looking at replicates, especially biological replicates, typically they will be independent, so you don't have to worry too much about the independence between replicates. If not, then you really need more complex modeling, and you're sort of in trouble because it gets very complicated. What do you mean by a transformation? Oh, so remember when we were looking at the histograms and the box plots before taking the log: it was highly skewed, it was definitely not normal, right? It was not symmetric at all, so it was really bad. Taking a transformation can help you make the data more normal. And this is something very common in statistics: when some assumptions are not satisfied, you have two options. You try to find another test or another method that does not require the same assumptions, okay? Such as a non-parametric test, where there are very few assumptions. Or you try to transform your data, okay? You could take the log transformation; you try to transform the data so that the assumptions are valid or close to being valid. But for example, for a bimodal distribution, you cannot transform? No, so there are things that you couldn't really fix. Now, the independence between the two groups: this is what we've discussed. In the two-sample t-test, the groups need to be independent. If not, then one can use a paired t-test, which is what we've done here, right? If the two groups are not independent, that is, there's a natural pairing between the two, then you can do a paired t-test. And in fact, this is what you should do with cDNA microarrays. And remember, for the two-sample t-test, I said that we have to assume that the variances are the same.
Otherwise, it's not valid. But in fact, there's another variant of the t-test called Welch's t-test, which is in fact the default in R when you do a two-sample t-test, and which will correct for this and do a t-test with separate variances for the two groups. So do the variances have to be exactly equal? Yes, you need to assume that, well... I mean, you could test it by summarizing the data, right? Yeah, but the sample variances will never be exactly equal, right? That was my question. Because there's variability in the data. So if you go tomorrow and measure something and you come back the next day, you will probably get a different answer, because there is variability. So it's difficult to know if the variances are different or not. I mean, you can look at the numbers and get a feel for whether they are very different or not, especially here where we've got so many genes. So you can look at the variances in the first group and the second group for all genes and try to check that assumption. Typically, it's not that bad. But in fact, you don't lose much if you relax that assumption and use Welch's t-test. That's why it's the default in R: it's more flexible and you don't really pay a price for it. It still works fairly well. So this is something I will let you practice on your own. It's the exact same command, except that you remove the var.equal argument. So you just use the default two-sample t-test, and you can compute the p-value in exactly the same way. Okay? So if you use the default in R, you still cannot assume that it's a normal distribution, right? No, I mean, the truth is that there's nothing in real life that's exactly normal. The normal is a distribution that we have because it's nice and convenient. But in reality, there's nothing that's exactly normal, right? Because it's the real world.
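To see the difference between the default and the forced equal-variance version, here is a small sketch with made-up numbers:

```r
# Hypothetical samples (made-up numbers)
hiv  <- c(8.1, 7.6, 8.4, 7.9)
ctrl <- c(7.8, 7.7, 8.0, 7.5)

welch  <- t.test(hiv, ctrl)                    # default: Welch, separate variances
pooled <- t.test(hiv, ctrl, var.equal = TRUE)  # forced pooled variance

welch$parameter    # Welch's df, fractional (Welch-Satterthwaite approximation)
pooled$parameter   # pooled df = n1 + n2 - 2 = 6
```

When the sample variances are similar, the two versions give nearly identical p-values, which is the "you don't really pay a price" point above.
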
Yeah, so it's okay to use the t-test without worrying too much about the normality assumption if you have 30 samples, because the distribution of the sample mean is going to be approximately normal. If you have just four samples... then you're in trouble. What you can do is look at your data and try to make sure that there's no gross violation of the assumptions. But obviously, with four observations, that's very difficult. And you know what? With four observations, if you give me just one gene with four observations here and four observations here, and you ask me, look at the difference between the two and tell me if it's differentially expressed, I'll probably tell you, you know, see you later. I can't do anything. Gene expression experiments are different because, yes, we only have four measurements in the control and the treatment, but we've got thousands of genes. So in a way, you've got some information coming from the other genes. So maybe you can borrow some information from the others when trying to make a decision. If you have a distribution across all the genes in one sample, and that's normal, can you... No, you can't do that. Borrowing information from the other genes, we'll get to that. You'll see. So we'll get to that a little bit. One question, you probably know that, with these box plots or... Well, so one thing you could do, for example, okay. Why don't you tell me what you would do? I think the columns in this case, and each column in the box plot, you see the... That would be okay, except that the box plot we were looking at is a box plot per array. So we were looking at the variability across genes. Here, we want to know if the variances in the four replicates of that gene, in the treatment and in the control, are the same. But what you could do, for example, is compute the variance for each gene. You can compute the variance from the four replicates in the control and in the treatment.
And then you can do a scatter plot of the variances in control and in treatment for all genes. What you would like to see is maybe some kind of a line that says that they're roughly similar across genes. But it's very difficult, because variance estimates can be very noisy, in particular with four replicates. So it's hard. I mean, it will be very difficult to get something that says, oh yeah, they're exactly the same. So that's why, if you have any doubt, just use the default in R and it will take care of that for you; that's the best you can do. And in fact, it doesn't hurt you very much: even if the variances were equal and you assumed they're not, it's not going to change very much. So I will let you do the Welch t-test and the paired t-test, and you can compare to the previous results. And in fact, what you should observe is that when you do the paired t-test, it's exactly the same as doing a regular one-sample t-test on the differences, as we've done. It's just nicer because it's doing the differences and everything for you; you don't have to handle that. Okay, so what are non-parametric tests? We did not really talk about them. It's just a way to do a test like the t-test with essentially no assumptions on the distribution of the samples. So you might say, well, this is great, why don't we always do a non-parametric test, right? Because we don't make an assumption; I don't even know why people would bother with the t-test. That's true: it's very robust and you don't have to care too much about the distributional assumptions of your statistic. However, there are some drawbacks to using non-parametric alternatives. Obviously, they are very flexible, but in a way, they're not going to be as powerful as a regular t-test if the data are normal. So if the data are normally distributed and you do a t-test, typically your power, that is, the ability to detect true differences, will be much greater than if you use a non-parametric test.
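The per-gene variance comparison described above can be sketched like this, with random made-up data standing in for the log expression matrix:

```r
set.seed(2)
E <- matrix(rnorm(100 * 8), nrow = 100)   # 100 hypothetical genes, 8 arrays (4 HIV + 4 control)

v.hiv  <- apply(E[, 1:4], 1, var)         # per-gene variance across the 4 treatment replicates
v.ctrl <- apply(E[, 5:8], 1, var)         # per-gene variance across the 4 control replicates

# Scatter plot on a log scale; points near the diagonal support the equal-variance assumption
plot(v.ctrl, v.hiv, log = "xy")
abline(0, 1)
```

With only four replicates per group the cloud will be wide, which is exactly the "variance estimates are noisy" caveat above.
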
So you're going to lose some power if you use a non-parametric test. So in a way, using a non-parametric test, you will be slightly more conservative. You'll be like, I'm okay if I don't detect everything, but I just want to make sure I don't make a mistake. So it's a trade-off between the two. Again, these sorts of things are coded in R; you can do it very easily. So this is an exercise for you to do: use R to perform a non-parametric test, for example the Wilcoxon test on gene 1. Say you don't know what you're looking for. What you could do, and let's see if it is in the script, yeah, it's right here: if you don't know what you're looking for, you can just do help.search. I made a typo, yeah, that's good, found it. So you can see that here you get the distribution, and here you get the test. So what we want is wilcox.test, okay? It's very similar to t.test, the regular t-test, and you use it very much in the same way. Here we're going to do a two-sample test. So we'll just do it like that, 14.1, and you can look at the summary, okay? Say that again? It's just the way that they compute the statistics; they are different. Very much like the t-test with unequal variances and the t-test with equal variances, the assumptions will typically be slightly different. I can't remember exactly what the assumptions of each test are, but I would say that in most cases you will see both functions. Where do you see that? Just the Wilcoxon rank-sum and Wilcoxon signed-rank tests. I mean, I always see both in every program I use, and I've never known. It's just the way they compute the statistic. Again, I think some of the assumptions might be slightly different in either way. To be honest, I typically just do a t-test if I want to do a statistical test. But I think when you don't really know exactly, when you've got several options, you can assume that the defaults in R are typically what would be best for most users.
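With the same made-up numbers as before, wilcox.test is called just like t.test:

```r
# Hypothetical samples (made-up numbers)
hiv  <- c(8.1, 7.6, 8.4, 7.9)
ctrl <- c(7.8, 7.7, 8.0, 7.5)

rs <- wilcox.test(hiv, ctrl)    # two-sample Wilcoxon rank-sum (Mann-Whitney) test
sr <- wilcox.test(hiv - ctrl)   # one-sample signed-rank test on the paired differences
rs$p.value
sr$p.value
```

(With tied values in the differences, R may warn that it cannot compute an exact p-value and falls back to a normal approximation.)
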
If you just use the defaults, typically you're making the right choice. I always thought that the Mann-Whitney was different by that fact. It could well be, yeah. I don't remember exactly. It's kind of synonymous, apparently. That's very possible, yeah. This is what I told you earlier when I said, don't worry too much about the formulas and everything. The truth is that you cannot remember everything, especially the assumptions and so forth. What you need to know is that there are some assumptions, there are some tests, and you can go back to your notes. You can go back to your book, and you will be able to understand what the assumptions mean, how you can verify some of them, and that there exist alternative methods to do a statistical test depending on which assumption is violated. Okay, so we've got three more minutes for about 30 slides, so I think we're good. What we want to do next is basically to do the same kinds of tests for each gene, because the overall goal here is to test the hypothesis that the genes are differentially expressed. But you want to do that for all the genes, right? So it's very simple. What we're going to do is to apply... okay, I'm talking about something else. I should go back to the slides. Okay, I wanted to talk first about the permutation test. When computing the p-value, all we need to know is the distribution of our statistic under the null, right? Because what we want to know is: what is the probability that we observe something as extreme or more extreme than the statistic we observed, under the null distribution? So assuming that there's no difference, assuming that the gene is not differentially expressed, what are the chances that I would have observed something as extreme or more extreme than what I have observed?
So what we need to do is to be able to, if you want, simulate some data sets where we know that the null distribution is true, compute a bunch of statistics, and then compare them to the one we observed. If we find many things that are more extreme than what we have, we might think that, yeah, potentially the gene is not differentially expressed. If the number of statistics we observe that are greater than the one we have is very small, then we might think that probably there was a difference, because our p-value is very small, or the statistic was very extreme. So the question is: how can we estimate things under the null distribution? In the two-sample case, it's fairly easy to do that. That's a good question. I would say typically 100 to 1,000 permutations is probably enough. It depends on the number of samples you have. Exactly, right? Because here you've got four and four; you don't have that many permutations you can do, right? So very quickly you're going to see that you've done all your permutations. If you have more samples, you have more possible permutations, so you've got more power to... Exactly, you get more accurate p-values. Exactly. So it's an alternative to the t-tests for analyzing small-sample data, right? Exactly, and in fact this is something that people do a lot for microarrays. We're going to talk about that tomorrow. This is something they do in the SAM package, and we'll see that. It's actually kind of a nice way to do it. Okay, so here are a couple of facts about the permutation test. You can select the statistic you prefer. It could be the mean difference; here we could just compute the mean difference. You could compute a t-statistic. It's non-parametric; it does not really depend on the statistic you actually choose. What you do is compute the statistic for the original data, then you do a number of permutations.
Every time you have a new permuted data set, you recompute the statistic, and then the p-value will just be how many times, over all of the data sets you generated, you observed something that's as extreme or more extreme than the one you calculated from the original data set. So it will be the proportion of newly computed statistics that are as extreme or more extreme than the one you had originally. So this is an example in R of how to do the permutation test. And what I encourage you to do is to go over this code and make sure you understand it. If tomorrow morning you feel like you have no idea what it is doing, we can discuss it.
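A minimal version of such a two-sample permutation test, using the mean difference as the statistic and made-up numbers for the two groups:

```r
set.seed(3)
hiv  <- c(8.1, 7.6, 8.4, 7.9)   # hypothetical treatment values
ctrl <- c(7.8, 7.7, 8.0, 7.5)   # hypothetical control values

y   <- c(hiv, ctrl)
obs <- mean(hiv) - mean(ctrl)            # statistic on the original labels

B <- 1000                                # number of random relabellings
perm <- replicate(B, {
  idx <- sample(length(y), length(hiv))  # shuffle the group labels
  mean(y[idx]) - mean(y[-idx])
})

# p-value: proportion of permuted statistics as extreme or more extreme
pval <- mean(abs(perm) >= abs(obs))
pval
```

With four and four there are only choose(8, 4) = 70 distinct splits, so the 1,000 random draws revisit the same relabellings many times; that is the point made above about small samples limiting the number of permutations.
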