So, today we're going to get an introduction to probability, a whirlwind tour through probability. And I want to start with a little bit of history. Laplace was the person who basically introduced the modern ideas associated with probability; this was in 1814. To Laplace, probabilities were only necessary because we simply don't know the laws of nature well enough to make predictions. If Newton's equations were truly a description of everything there is to know about the universe, and you knew the position and velocity of every particle in the universe today, Laplace thought that was all you needed to predict everything about the future. So, to him, probabilities arose not because there was real stochasticity in the world, but because we simply didn't know the rules of nature; and because we didn't know the rules, the best we could do was approximate things with probabilities. In a way, Einstein's famous remark that God does not play dice with the universe reflects this same notion: that probability is something we use because we don't really understand the true underlying nature of things. As physics later showed, that actually isn't true: probabilities are real things, and they don't arise just because we don't know the underlying deterministic equations. So, I want to begin with three basic ideas from Laplace's description of what probabilities are. As you know, probability is basically the ratio of favorable cases to all possible cases. And his first rule had to do with describing these probabilities.
So, imagine that we throw a coin twice, and we want to know the probability of getting exactly one head. Let's say a head counts as a 1 and a tail as a 0. With two throws we can get (0,0), (0,1), (1,0), or (1,1). Two of these four cases give exactly one head, so the probability is 2/4 = 1/2. One of his first axioms was the following: if events are independent of one another, then the probability of their combined existence is the product of their respective probabilities. So, if events are independent, the probability of both events occurring equals the product of the probabilities of each event occurring. Let's give an example. Suppose we throw two dice and we want the probability of getting a one on each die; in gambling this is called snake eyes. The probability that x1 = 1 and x2 = 1 is the probability that x1 = 1 times the probability that x2 = 1, which is 1/6 times 1/6, or 1/36. So, this is the notion of independence of two events. The next notion I want to describe from Laplace's book is the idea that the probability of a simple event recurring consecutively is the product of those probabilities. For example, he was thinking about your father telling you a story about something that had happened to your family 20 generations ago.
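The coin and dice examples above are easy to check in code. Here's a quick Python sketch (the variable names are mine, not from the lecture) that enumerates the four coin outcomes and multiplies the dice probabilities:

```python
from itertools import product

# Enumerate all outcomes of two coin flips (1 = head, 0 = tail).
outcomes = list(product([0, 1], repeat=2))
favorable = [o for o in outcomes if sum(o) == 1]  # exactly one head
p_one_head = len(favorable) / len(outcomes)       # ratio of favorable to all cases

# Independence: P(die1 = 1 and die2 = 1) = (1/6) * (1/6).
p_snake_eyes = (1 / 6) * (1 / 6)
```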
And he was interested in knowing: if my father tells me this is what happened to my family 20 generations ago, how likely is it that it's true? Suppose that each teller passes the story on faithfully 90% of the time. How likely is it, then, that the thing they're talking about actually occurred, once it has been passed down the line? Let's write down his axiom: the probability of a simple event in the same circumstances occurring consecutively a given number of times is equal to the probability of that event raised to the power indicated by that number. So, in our problem, some event is being transmitted to us; it may have occurred or not, and each retelling is 90% reliable. When I hear the story from my dad, I'm 90% confident that what he's saying is what actually happened. But he heard it from his dad, and his dad heard it from his dad, and so on; say we go back 20 generations. The probability that what I hear is true after two retellings is just 0.9 raised to the power of 2. And after 20 retellings, the probability that the original event actually happened, x20 = 1 given what I hear, is 0.9 raised to the power of 20, which falls to about 0.12. In his book he says, look at this, this is astonishing: we think history is passed down by people telling what happened, and even if each teller is 90% truthful, it's very unlikely that what they're talking about actually happened.
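Laplace's repeated-event rule is a one-liner to verify; a small Python sketch using the lecture's numbers:

```python
# Each retelling preserves the story with probability 0.9;
# after 20 independent retellings, the rule gives 0.9 ** 20.
p_per_retelling = 0.9
p_after_20 = p_per_retelling ** 20  # falls to about 0.12
```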
So, he was using this to show that the repetition of a simple event occurring consecutively, someone telling the story to somebody who tells it to somebody else, makes it unlikely that the transmitted account is true. This is the basis of excluding hearsay: in court you can't use hearsay, meaning I heard somebody else say something about something else. That's already two steps removed, and the probabilities are too low for us to consider. Okay, the next concept he thought about was conditional probability, which, as you know, has to do with an event occurring conditioned on something else having occurred. And this is the way he wrote it: when two events depend on each other, the probability of the compound event, meaning that both occur, is the product of the probability of the first event and the probability that, given the first event has occurred, the second event will occur. What this means is that the probability that A = 1 and B = 1 can be written as the probability of B occurring given that A occurred, times the probability that A occurred. So, let's do an example. Suppose I have three jars with balls in them: one jar has black balls, and the other two have white balls. Now, I don't know which jar has which; I just know that one of the three jars has black balls and two of them have white balls. We reach into one of the jars, without knowing which is which, and pick a ball. What's the probability that the ball we pick is white?
So, say white means 1. We go and pick up a ball; what's the probability that it's white? Two thirds. Very good. Okay, so we have reached into one of the jars and picked a white ball. Now we reach into another jar and pick another ball. What's the probability that it is white? What I want is the probability that x2 = 1 given that x1 = 1. So, how do I use that? Well, the probability that x2 = 1 and x1 = 1 is the probability that x2 = 1 given x1 = 1, times the probability that x1 = 1. And the conditional part is easy: if I've already picked the white ball, I have two jars left, one with black balls and one with white, so that conditional probability is just 1/2. So the probability that both my first ball and my second ball are white equals this conditional probability times the probability that the first ball was white: 1/2 times 2/3, which is 1/3. So the probability that I pick two white balls in a row is 1/3. Okay? What I did was first write the conditional probability: I picked a ball out of a jar, it was white, and with two jars left the next pick is white with probability 1/2; the joint probability is then the product. This leads us to what's called Bayes' rule, which uses the joint probability to relate the conditional probabilities. I can write the probability of x2 and x1 as the probability of x2 given x1 times the probability of x1, and also as the probability of x1 given x2 times the probability of x2. Since these two must be equal to each other, the probability of x2 given x1 equals the probability of x1 given x2 times the probability of x2, divided by the probability of x1.
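The jar numbers can be sanity-checked by simulation; here's a quick Monte Carlo sketch in Python (my code, not from the lecture), where one jar holds black balls, two hold white, and we draw from two different jars:

```python
import random

random.seed(0)
jars = ['black', 'white', 'white']  # one jar of black balls, two of white

trials = 100_000
both_white = 0
for _ in range(trials):
    # Pick balls from two different jars, chosen at random without replacement.
    first, second = random.sample(jars, 2)
    if first == 'white' and second == 'white':
        both_white += 1

estimate = both_white / trials  # should be close to 1/3
```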
Alright, so let me put this up here and do an example. Suppose I have a group of people: some males and some females, some smokers and some non-smokers. And what I want to know is as follows: I'm going to tell you that a person you see is a smoker, and you tell me the probability that that person is male. Okay? I'm going to give you some information about the makeup of this group: how many males, how many females, and something about the relationship between being male and being a smoker. So, my x is going to be male or female, and my y is going to be smoker or non-smoker; smoker is 1, non-smoker is 0. Alright. Now, here is the information I have about my group. The probability of being male is 0.4 in this particular group, and the probability of being female is of course 0.6. I also have information about smoking: the probability of being a smoker given that you're male is 0.5, and the probability of being a smoker given that you're female is 0.3. So in this group, among the females most are non-smokers, and among the males half are smokers. So, the problem: I give you the information that the person you're seeing is a smoker, and what I need you to compute is the probability that x is male given that y says smoker. So, how do we do that? We can use Bayes' rule, right?
So, that's the probability of being a smoker given that you're male, times the probability that you're male, divided by the probability of being a smoker. Okay? The numerator is easy to handle: we have this number here, 0.5, and the probability of being male is 0.4. Now, what's the denominator, the probability of being a smoker? Right: we sum over both possibilities. It's the probability that y = 1 given x = m, times the probability that x = m, plus the probability that y = 1 given x = f, times the probability that x = f. That's what you were saying, right? That's the probability of being a smoker. The second conditional probability we have as well: 0.3 times 0.6. So we work it out, 0.5 times 0.4 divided by (0.5 times 0.4 plus 0.3 times 0.6), and we get about 0.53. Any questions about that? So, look at what I did: the probability of being a smoker I divided into what that probability is if you're male and what it is if you're female. A person can only be one of those two things, so this is the sum of the two joint probabilities, the probability of smoker and male plus the probability of smoker and female, and then I used the definition of conditional probability to expand each term. I'm summing x out of the joint probability here. All right. Now, we've been talking about joint probability and conditional probability. Much of what we'll do in this class has to do with the statistics associated with probability distributions: things like the expected value and the variance. Let's define them. So, you have a random variable x that can take some values, each with some probability; this distribution is what we call the probability density function, p(x). This p(x) has a special property: its integral is one.
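Collecting the smoker computation in one place, a short Python sketch (variable names are mine) that applies Bayes' rule with the lecture's numbers:

```python
p_male = 0.4
p_female = 0.6
p_smoker_given_male = 0.5
p_smoker_given_female = 0.3

# Marginal probability of being a smoker: sum the conditional over both sexes.
p_smoker = (p_smoker_given_male * p_male
            + p_smoker_given_female * p_female)  # 0.38

# Bayes' rule: P(male | smoker).
p_male_given_smoker = p_smoker_given_male * p_male / p_smoker  # about 0.53
```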
So, you can think of it as every possible value of x being weighted by how likely it is to occur. And so, what we mean by the expected value, the mean of x, is just the integral of x times p(x) dx; if x is discrete, it's the sum of x times p(x). And what we mean by the variance is the expected value of (x minus its mean) squared; I'm going to use a bar on top of x to denote its mean. So that's what I mean by the expected value and the variance of a random variable. Let's write the variance out and see what it looks like: it's the expected value of (x squared minus 2x times the mean plus the mean squared). And notice that expectation is a linear operator, because if I take the integral of this expression I can split it into the individual terms. So this is just E[x^2] minus 2 times E[x] times the mean, plus the mean squared; and since E[x] is just the mean, the two mean-squared terms partly cancel and I get E[x^2] minus the mean squared. So the expected value of a random variable squared equals its variance plus its mean squared: E[x^2] = Var(x) + (E[x])^2. When you square a random variable it always takes positive values, right? And its mean is equal to its variance plus the square of its mean. This is going to be an important equation for us later on when we deal with things like costs. Often when we think about costs we are squaring something: we're saying something costs more over here, less over here, but it's always a positive number.
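The identity E[x^2] = Var(x) + (E[x])^2 is easy to verify on a small discrete distribution. Here's a sketch using a fair die (my example, not the lecture's):

```python
# A fair die: values 1..6, each with probability 1/6.
values = range(1, 7)
p = 1 / 6

mean = sum(v * p for v in values)                # 3.5
var = sum((v - mean) ** 2 * p for v in values)   # 35/12
e_x2 = sum(v * v * p for v in values)            # 91/6

# e_x2 should equal var + mean**2.
```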
So, we're going to have random variables that are squared like this, and we're going to minimize something like the expected value of a squared quantity; what we'll see is that this means: find the variance of the random variable, and find the squared difference between its mean and the target, and that's what you're minimizing. Some other useful properties of the expectation operator: the expected value of a times x, where a is a constant and x is the random variable, is a times the expected value of x; and the variance of a times x is a squared times the variance of x. Sometimes we come across things like minimizing a mean squared error. Let's see what that means. Take the expected value of (x minus a) squared, where x is the random variable and a is our constant: the difference between what we're predicting, maybe, and what it should be. Expanding, that's E[x^2 - 2ax + a^2] = E[x^2] - 2a E[x] + a^2. Now I want a variance to appear, so I subtract and add (E[x])^2: this equals (E[x^2] - (E[x])^2) + ((E[x])^2 - 2a E[x] + a^2). The first term is the variance of x, right? And the second term, you can see, is a perfect square: (E[x] - a)^2. So E[(x - a)^2] = Var(x) + (E[x] - a)^2. This first piece is the variance of x; this second piece is the squared bias of x relative to a.
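The bias-variance decomposition of the mean squared error can be checked numerically. A Python sketch with an arbitrary target a (the numbers are mine):

```python
import random

random.seed(2)
a = 1.0  # the reference value in the cost
xs = [random.gauss(3.0, 2.0) for _ in range(100_000)]

n = len(xs)
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
mse = sum((x - a) ** 2 for x in xs) / n

# Mean squared error decomposes into variance plus squared bias.
bias_sq = (mean - a) ** 2
```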
So, when you see a cost that is the expected squared difference between a random variable and some constant a, and you want to minimize or maximize it, what you're doing in effect is reducing the bias between your random variable and your reference a while at the same time reducing the variance of your random variable. A cost that says minimize the squared error between x and some reference a is, in effect, weighing two things equally: the squared bias between x and a, and the variance of x. So that's a particular kind of cost you may want to minimize. Okay, all right, let me go back over here. Often in this class we'll be talking about two kinds of distributions: binomial distributions and Gaussian distributions. I want to give you a little intuition about each. In a binomial (or more generally multinomial) distribution, you have a random variable that can take on a discrete number of states: a coin, for example, can be 0 or 1, head or tail. Let's apply our terminology, expected value and variance, to a distribution like this. We have x that can be 0 or 1; so here's x on the x axis, with its two values, and above it I plot the probability of x: for x = 1 it takes the value q, and for x = 0 it takes the value 1 minus q. You notice that when I sum this distribution over its two values I get 1, so we're okay, we have a genuine probability distribution, and q is my probability of getting x = 1: P(x = 1) = q. Alright, so how do I write this distribution mathematically? How do I write p(x)?
So, p(x) is a distribution with two values: it's 1 minus q when x = 0, and q when x = 1. I can write it as q to the power of x times (1 minus q) to the power of (1 minus x): p(x) = q^x (1 - q)^(1 - x). Let's see if this works. When x = 0, p(x) is 1 times (1 minus q), right? When x = 1, p(x) is q times 1. So that's good, that works: that's a function I can use to write this discrete distribution. So, now suppose I have a vector x made up of these binary (Bernoulli) variables: x in trial 1, x in trial 2, up to x in trial n; maybe I've observed this variable n times. Then the probability of a particular sequence is the product of the individual terms: q^(x1) (1 - q)^(1 - x1) times q^(x2) (1 - q)^(1 - x2), and so on. So if I have a random variable made up of all these individual scalar variables, some particular sequence of events that I've observed, and I want to know the probability of that particular sequence, it's just the product of each individual independent observation. So that's the probability of a particular sequence. Now, what's of particular interest to us is n, the number of times a positive event occurred. So, what that means is that maybe what I want to know is: how many heads did I get?
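Written as code, the density and the sequence probability look like this (a sketch; the function names are mine):

```python
def bernoulli_pmf(x, q):
    """p(x) = q**x * (1 - q)**(1 - x) for x in {0, 1}."""
    return q ** x * (1 - q) ** (1 - x)

def sequence_prob(xs, q):
    """Probability of a particular sequence of independent 0/1 observations."""
    p = 1.0
    for x in xs:
        p *= bernoulli_pmf(x, q)
    return p
```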
So, any time I get a head, or any event that I like, we'll call it a 1, and I want to know how many of those I got; in particular, I'm interested in things like the expected value of n and the variance of n. So how might we describe that event? What might be, for example, the probability that exactly half of the trials I've observed are positive? Well, what I need to know is this: how many different ways can I observe exactly n positive events? I know the probability of getting one particular sequence with n successful trials; how many different permutations of that sequence give me n positive results? This is given by the binomial coefficient N! / (n! (N - n)!). This tells me, among N total tries, the number of ways I can arrange n positive events. Now I multiply this by the probability of actually seeing those outcomes: q to the power n times (1 minus q) to the power (N minus n). So the probability of exactly n positive events, with the rest negative, is P(n) = [N! / (n! (N - n)!)] q^n (1 - q)^(N - n). So let's see: if I have capital N events that I've tried and n of them are positive, all the positive events together contribute q raised to the power n, the rest are negative events contributing (1 minus q) raised to the power (N minus n), and then I have the number of permutations of that particular combination. Okay, let's do an example: let's find the expected value and the variance, these two quantities that I'm interested in.
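The full binomial probability can be written directly with Python's math.comb (a sketch; the function name is mine):

```python
from math import comb

def binomial_pmf(n, N, q):
    """P(exactly n positive events in N independent tries, each with success probability q)."""
    return comb(N, n) * q ** n * (1 - q) ** (N - n)

# As a check: the probabilities over all possible n must sum to 1.
total = sum(binomial_pmf(n, 10, 0.3) for n in range(11))
```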
So I have some number of events, and I want to know the expected value and the variance. Let me give you a specific example from a paper so we can move forward. There was a patient who had a lesion in the right parietal cortex. What right parietal damage does is make the individual tend to neglect the left part of a scene. This patient was shown a picture of a house that is on fire on the left side, and another picture of the same house without the fire, and asked: do these pictures look similar to you? The patient says yes, I can't see any difference between them. But then they're asked: which house would you rather live in? And the patient answers: I would rather live in the house without the fire. The patient saw these pictures 17 times, and each time said the two pictures look the same; yet 14 out of the 17 times, they picked the picture without the fire. So this is called blindsight, and there was a paper about it in Nature about 20 years ago, describing this patient who was given this task, which house would you rather live in, 17 times, and who picked the house without fire 14 of those times. Even though the two houses looked the same to them, and they couldn't tell the difference between the houses, somehow they would rather live in one than the other. For the purposes of our discussion, what I want to know is: is 14 out of 17 significant? Is that really different from chance, or is it basically not that far from chance? How do we answer that question? This person made a decision 17 times, and 14 out of those 17 times they picked one option. Chance, of course, would be not 14, but 8.5, right?
But what about the variance of our decision? How do we determine the standard deviation and the variance of this thing, so we can say whether 14 is a long way from 8.5 or just a little way from 8.5? What we need are the expected value and the variance. So what is n? Here n is the number of times this person picked that particular house, in this case 14, out of N = 17 tries; is 14 far from chance or not? So, what's the expected value of our random variable? We have a random variable x that can be 0 or 1: when it's 0, the probability is 1 minus q; when it's 1, the probability is q. Its expected value is 0 times (1 minus q) plus 1 times q, so the expected value is q. What's its variance? Well, first, what's the expected value of x squared? That's 0 squared times (1 minus q) plus 1 squared times q, which is just q. So the variance of the random variable is the expected value of x squared, which is q, minus the square of the expected value, q squared; that's q times (1 minus q). So this random variable has expected value q and variance q(1 - q). Okay, so for a Bernoulli random variable, we now know how to find its mean and its variance. Alright, so we had our patient: she saw 17 examples and picked 14 of them. We're going to write our problem as follows: every time she picked the house not on fire, I'm going to say x = 1. If this person were completely unable to tell the difference between these houses and were picking at random, then the probability that x = 1 would of course be just 1/2. And what would be the mean and variance of x under chance? Let me write both down. The expected value of x is going to be just 1/2.
The variance of x is going to be 1/2 times 1/2, which is 1/4. Okay, so this is chance: a distribution with mean 1/2 and variance 1/4, which means the standard deviation of x is 1/2. So now this person makes 17 choices in a row, and we want the expected value of the total. Here n is the sum of the xi for i = 1 to 17, and under the chance condition, what's our prediction for n? The expected value of n is just the sum of the expected values of the xi, which is 17 times 1/2, or 8.5. And the variance is 17 times 1/4, which is 4.25. What's the square root of 4.25? A little more than 2, about 2.06. Alright, so what did she do? She had a count of 14. So her performance corresponds to a probability of x = 1 equal to 14/17; our subject's per-trial expected value was 14/17. And we can compute her per-trial variance as well: (14/17) times (3/17), which is about 0.145. So if I now compute the expected value of n for her, of course it's 14, and her variance of n is 17 times 0.145, about 2.47, so a standard deviation of about 1.6. So: chance is 8.5 with a standard deviation of about 2, and her performance is 14 with a standard deviation of about 1.6; her count is more than two standard deviations away from the chance mean. That's the confidence with which we would say this person is significantly different from a naive, blind chooser. So the paper draws its conclusion from one person doing something 17 times, and we can still have a statistic for this one person.
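Putting the patient's numbers through these formulas, a Python sketch of the lecture's arithmetic (variable names are mine):

```python
from math import sqrt

N, n = 17, 14      # total choices, and choices of the no-fire house
q_chance = 0.5     # a blind chooser picks each house equally often

mean_chance = N * q_chance                        # 8.5
sd_chance = sqrt(N * q_chance * (1 - q_chance))   # about 2.06

# How many chance standard deviations away is her count of 14?
z = (n - mean_chance) / sd_chance                 # more than 2

# Her own estimated rate and the variance of her count:
q_hat = n / N                                     # 14/17
var_n_hat = N * q_hat * (1 - q_hat)               # about 2.47
```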
We have a mean and a variance for her, and a mean and a variance for chance, even though we didn't have a control subject. And based on that, we can make an inference that this person is different from naive. What's our confidence? Well, her count is more than two standard deviations away from chance; that's the statistic one would be using. Any questions about this? [Student question.] Yeah, it's more than two standard deviations away; it's between two and three, right. So really what we have here is two distributions. They're not normal distributions, of course, so I can't use any of the normal ways of testing between things. I have one distribution with this mean and this variance, another distribution with that mean and that variance, and the means are more than two standard deviations apart. That's as much as I can say about the data, and you can get a sense of whether you think that's a lot or not a lot. [Student: when you say the variance, I'm a little confused, because there was only one observation.] Yes, there is only one n, right? But that n is the sum of the 17 observations that we actually did. It's a sum, and each one of them is a random variable governed by this distribution. That's what I wrote here, these two equations: each instance that we see is a random variable with that particular distribution. So it's weird, right? You only have one observation, but how can you have a variance on it? Well, this is how you can. Alright, any other questions? [Student question.] Yes, that's the expected value of x squared: it's this equation, q minus q squared. Good. Okay, the last thing I want to tell you about is Gaussian distributions, the more familiar ones.
And I want to give you some intuition about what they mean. One aside: you should look up what e is and where it comes from; it's a useful thing to know and an interesting number, but maybe I'll leave that for another class. So when we say a Gaussian distribution, we write x as normally distributed with some mean and variance. And what do Gaussian distributions look like? If I integrate the probability in the tails, the parts more than two standard deviations from the mean, I get about 5%; 95% of my data is likely to occur in the middle. So about 95% of the observations of x will be within mu plus or minus 2 sigma. That's the scalar Gaussian. If we have a vector instead of a scalar, the vector x is made up of a random variable with components x1, x2, x3, and so on, and we write it as a normal distribution where n is the size of the vector, meaning how many dimensions it has, and the variance becomes a matrix, which I'm going to call C here: the variance-covariance matrix. And I'll show you what that is. So what is this C? What kind of a beast is it? Suppose my vector x is made up of x1 and x2. What do I mean by variance-covariance? Let me give you some intuition. My variance is going to be a matrix rather than just a scalar because I'm going to be keeping track of the variance of x1, the variance of x2, the covariance of x1 with x2, and the covariance of x2 with x1: the variances on the diagonal and the covariances off the diagonal. So what does this mean? What's this covariance thing?
So, when I say the variance of x1, what I mean is the expected value of (x1 minus its mean) squared. When I say the covariance of x1 and x2, what I mean is the expected value of (x1 minus its mean) times (x2 minus its mean). So let's do an example. Say the covariance matrix of my vector x = (x1, x2) looks like this: C = [[1, -1], [-1, 3]], so the variance of x1 is 1, the variance of x2 is 3, and the covariance of x1 with x2 is -1. What does that mean? Let's draw my random variables. I have a random variable that's a vector, made up of x1 and x2, and each time I draw from the distribution I get a vector (x1, x2). Say the mean is mu = (0, 0), so the mean of this vector is zero, but it has this covariance. First of all, the matrix tells me that the variance of x1 is small and the variance of x2 is large: 3 is the variance of x2, 1 is the variance of x1. So if I draw random numbers from this distribution, I'll get more variability along x2 than along x1; three times more variance. More importantly, however, there's a negative covariance. What does that mean? It means that when I pick an x1 that's positive, I'm likely to pick up an x2 that's negative; so the data will fall along a downward-tilted diagonal. So, I'm picking a vector whose covariance is described by this matrix: the diagonal entries tell me about the variances of the individual components x1 and x2, and the off-diagonal entries tell me about their covariance, that is, how x2 tends to change as x1 changes. Negative covariance means that as one increases, the other tends to decrease; positive covariance means the opposite.
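A Python sketch with NumPy (my code, not from the lecture) that draws samples with exactly this covariance matrix and checks that the empirical covariance recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(2)
C = np.array([[1.0, -1.0],
              [-1.0, 3.0]])  # var(x1) = 1, var(x2) = 3, cov(x1, x2) = -1

# Draw many vectors (x1, x2) from N(mu, C).
samples = rng.multivariate_normal(mu, C, size=200_000)

# Empirical variance-covariance matrix; should be close to C.
C_hat = np.cov(samples.T)
```

Plotting the samples would show the downward-tilted elliptical cloud described above: more spread along x2 than x1, with x1 and x2 tending to have opposite signs.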
So if those off-diagonal numbers were positive, the cloud would tilt the other way. Okay, one last thing. I said that would be the last thing, but I have one more thing to tell you. Let me just briefly show you the arithmetic we need to compute expected values and variances of simple algebraic expressions. The expected value of x + y is the expected value of x plus the expected value of y. And the variance of x + y, what is that? That's the expected value of (x + y minus x-bar minus y-bar) squared, right? Let me write it this way: call a = x minus x-bar and b = y minus y-bar. Then this is the expected value of (a + b) squared, which equals the expected value of a squared plus the expected value of b squared plus 2 times the expected value of a times b; that is, E[(x - x-bar)^2] plus E[(y - y-bar)^2] plus 2 E[(x - x-bar)(y - y-bar)]. So what are these? The first two terms are the variance of x plus the variance of y, and the last term is twice the covariance of x and y. Very good: Var(x + y) = Var(x) + Var(y) + 2 Cov(x, y). Okay, good. Thank you very much. Have a good rest of the week.
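As a check on that last identity, a small Python sketch with correlated variables (the construction is mine):

```python
import random

random.seed(3)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # y is correlated with x

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)
var_x = mean([(x - mx) ** 2 for x in xs])
var_y = mean([(y - my) ** 2 for y in ys])
cov_xy = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

s = [x + y for x, y in zip(xs, ys)]
ms = mean(s)
var_sum = mean([(v - ms) ** 2 for v in s])
# var_sum should equal var_x + var_y + 2 * cov_xy
```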