A teaser solution for this homework that's due Wednesday, for Part A in each problem. If you haven't gotten one, you can stop by my office... I have only one copy left. How many have not gotten it? Two? Okay, I'll hold on to it and give it to you at the end of class. Let's see. We're in Chapter 7, and we still have some things to talk about. There are two problems in the exercises that I don't really have time for; I don't want to assign them, but I'll give you the solutions. They're two instances of Murphy's laws. Murphy's law applies to everything, but I thought it would be interesting so you can see how wonderful probability is at explaining weird, unexpected things, or things that happen more often than you'd prefer. Okay. So let me state this theorem. We talked about the standard Gaussian density function, or normal distribution, last time, and I mentioned briefly that it's certainly a typical example of a density function. Let me remind you of that. We said that for a continuous random variable, the density function is the derivative of the distribution function, so the distribution function is the integral of the density function. And the density function, I think we used little g, so the distribution is the integral of 1 over square root of 2 pi, e to the minus x squared over 2, dx. Has anybody checked that this density function actually integrates to 1? No? Remember last time we talked about the reason for this factor: it's just so that the integral converges to 1 as t goes to infinity. Yeah, you remember that, too.
Right. The integral goes to 1. So this is the bell-shaped curve, which has inflection points at positive 1 and negative 1. This is g01 of x. And this factor here comes from the fact that the integral from negative infinity to infinity of e to the minus x squared over 2 is the square root of 2 pi; I said to try this as an exercise in double integrals, basically. Which implies that when you divide by the square root of 2 pi, the integral from negative infinity to infinity of g is 1. So the area under this curve is 1, because of this normalization. Other fun facts: the integral from negative 1 to 1 is about 0.68, the integral from negative 2 to 2 is about 0.95, and so on. You can actually tabulate this. Also, there is a so-called error function. What's the error function? It's the integral from 0 to t of this kind of exponential; I think it's built into MATLAB for sure. It's slightly different, actually: once you make a change of variables, you see how it relates. The exponent there is minus t squared, not t squared over 2, so you have to take t over the square root of 2 as the change of variable. So it's just one non-elementary function that one has handy, and this is related to it up to a change of variables. We're not going to use it, but okay. And the normal distribution is the antiderivative of this density function. Which antiderivative? The one that goes to 0 at negative infinity. Remember, the distribution function has to start at 0 at negative infinity, increase, and asymptotically go to 1. It's a function of t. That's the reason I go from negative infinity to t and not from some other number. You could start from any number, right?
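As an aside, that change-of-variables relation can be checked numerically. Here's a small sketch, in Python rather than MATLAB just for illustration, using the identity that the normal distribution function equals one half of (1 + erf(t over square root of 2)):

```python
import math

# Sanity check of the change of variables relating the error function
# to the standard normal distribution function:
#   Phi(t) = integral from -inf to t of g01(x) dx
#          = (1 + erf(t / sqrt(2))) / 2
def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# The "fun facts" above: area within one and two standard deviations.
print(normal_cdf(1) - normal_cdf(-1))   # about 0.68
print(normal_cdf(2) - normal_cdf(-2))   # about 0.95
```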
And you'd still get an antiderivative, but it won't have the property that it goes to 0 as t goes to negative infinity. Okay, so here's the central limit theorem. It says: take a sequence of independent, identically distributed random variables, i.i.d. r.v.'s for short, with mean, or expected value, denoted by mu. This is the same for all the random variables because they're identically distributed. And standard deviation sigma, where sigma squared is the variance, again the same for every Xi; I'll remind you that the variance is the expected value of (X minus mu) squared. Then we can talk about the following event: the event that X1 plus ... plus Xn, minus n times mu, divided by sigma times the square root of n, is less than t. This is an event: in the sample space, you take all outcomes for which this inequality holds, and that makes an event. And this probability converges, as n goes to infinity, to the normal distribution. Let me repeat the formula: the integral from negative infinity to t of 1 over square root of 2 pi, e to the minus x squared over 2, dx. Okay, so we're not going to go through the proof or anything, but it's a major tool in probability theory. What I'd like to do is show you how we use it in simple models. So here's an interpretation. For example, imagine we want to estimate the probability of the event that X1 plus ... plus Xn minus n mu, over sigma square root of n, is between two values. I'm picking negative one and one just to illustrate; the choice of negative one and one is so that we fall within one standard deviation of the mean for the normal distribution. But it could be 3 and 5, it doesn't matter. Any two numbers.
So that would be your value on the normal distribution. Then we can write the probability of this event. Let's give it a name: call the event A. We can rewrite this event as the difference of two events. One event is that one inequality happens, and I'm going to use less than or equal to one; minus, as sets, the event that the expression is less than negative one. Because my event A is actually the difference between the two: this is the larger event, where the expression is less than or equal to one, and this is the event where the expression is less than negative one. What we're computing here is the probability that the expression is between negative one and one. And because of this, we can write the probability of A as the difference of two probabilities. Let's give these names: this is B and this is C, so it's the probability of B minus the probability of C. C is a subset of B, a sub-event, so the probability of the difference is the difference of the probabilities. In essence, what we have is the probability that this ratio, X1 plus ... plus Xn minus n mu over sigma square root of n, is less than or equal to one, minus the probability that the same thing is less than negative one. Now, we should be careful with strict versus non-strict inequalities. One way to handle it is to put the strict inequality in one place and less than or equal to in the other, and then you have to deal with the probability that the expression exactly equals negative one. That would be relevant if, for instance, the variables are discrete random variables. But as n goes to infinity, here's what you can conclude from this computation.
We can conclude that this probability goes to the normal distribution value at one, and this one goes to the value at negative one. So this is the integral from negative infinity to one of the density function, minus the integral from negative infinity to negative one, which is the same as the integral from negative one to one of g01 of x dx, which we said is about 68%. So when n is large, the probability that the ratio is between negative one and one is close to that. We're not really saying how close; we don't have a rate of convergence, that's not part of this statement, but it approaches this number as n goes to infinity. And that's true in general: the probability that the ratio is between a and b, and we have to stick with this somewhat strange ratio because that's what the central limit theorem tells you about, goes to the integral from a to b of g01 of x, the area under the bell-shaped curve between a and b. And again, to be very precise, one should put strictly less than on one side, in case X is a discrete random variable. So for us, the practical implication is, for instance: what's the probability that this ratio, always the same ratio, is between negative two and two when n is large? In the limit this is exactly the integral from negative two to two, which is about 95%; for finite n it's not an equality, it's close to 95%. So we say the following. And let me rephrase this: instead of this ratio, I'm going to rephrase the event, but it's the same event, in terms of the sum. The sum minus n times the mean is between what?
Between minus two sigma square root of n and two sigma square root of n; that's close to 95% for n large, okay? One could rephrase this in different ways. For instance, you could divide all three sides of the inequality by n, and again the probability is close to 95%. In this form, you can see something we talked about before. We said that if I have a sequence of i.i.d. random variables, the average converges to the mean, the expected value, right? So this statement is, in a way, more precise information about how close the average is to the expected value, because these bounds, two sigma over square root of n, go to zero as n gets large. So this is the probability of the difference being less than a small quantity; but the small quantity depends on n, so it's a little bit delicate. It's 95%. In a way, it's adding more information about the average of the sequence of random variables, but in a very particular fashion, so it doesn't sit exactly on top of the strong law of large numbers. Okay, the final reformulation, the one I think we're going to use in the example: if you're interested in the sum itself and you want to know where the sum ends up, you add n mu to all sides above, and you see the sum is between n mu minus two sigma square root of n and n mu plus two sigma square root of n, with probability close to 95%. That's probably the most relevant form for the examples we do. Okay, so the example here in the book where this gets applied is an emergency service example; let me call it the house fire problem if you want. It says that an emergency service received an average of 171 calls per month from house fires over the past year.
On the basis of this data, the rate of house fire emergencies was estimated at 171 per month. The next month there were only 153 calls received, and the question is: does that indicate an actual reduction in the rate of house fires, or is it simply a random fluctuation? It looks kind of out of the blue; we don't really know what the random variable is. So in this modeling we have to start by saying how we reinterpret this. What is the random variable? That can be tricky, I think, but if we agree to pick the random variable Xn to be the time between fire calls, between fire call n and n plus one, then these are random variables, because the calls aren't scheduled. So the question is: what is the distribution of such a random variable? One could argue they're independent, because you can never predict one based on the others; the fires come at random times. And the distribution of Xn is assumed to be of exponential type. What does that mean? The density function is f of x equals lambda e to the minus lambda x if x is positive, and zero if x is negative. This is very different from the normal distribution, by the way. The graph starts at the value lambda and then drops exponentially, and before zero it's zero, so this density function looks like this. Now, the reason one takes this distribution is based on the nature of these arrival times: arrival times usually have exponential distributions. In chapter eight, if we have time, we'll talk a little bit about why. So what would be the expected value of such a random variable? Remember, it's the integral of x times the density function, right?
So if you do this, it's the integral of x times lambda e to the minus lambda x dx; and sorry, this is only from zero to infinity, because before zero the density is zero. You do integration by parts, and what you should get in the end is that this is one over lambda. That's something to remember. And what about the variance, or the standard deviation? The standard deviation squared is the variance: sigma squared is the variance of X, the integral from negative infinity to infinity of (x minus the expected value) squared times f of x dx. Again, this reduces to the integral from zero to infinity, and it takes a couple of integrations by parts to actually get the value, but it turns out to be one over lambda squared, meaning that sigma is one over lambda. It's just a computation of the integral; I don't want to do it here. So in a way it's reminiscent of the Poisson distribution for discrete random variables, where the variance equals the mean; here the standard deviation equals the expectation, and both are connected to the rate at which these arrivals occur. So let's see. If house fire calls average 171 a month, and we model these random variables with exponential distributions, then what will lambda be? The average time between two arrivals is one over lambda. So the expected time between calls is one over 171, and since this equals one over lambda, it means lambda is 171, okay?
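The two integration-by-parts results above, mean one over lambda and variance one over lambda squared, can be double-checked numerically. A quick midpoint-rule sketch (Python here for illustration; the truncation point and step count are arbitrary choices of mine):

```python
import math

# Numerically checking the exponential density f(x) = lam * exp(-lam*x)
# on [0, inf): the mean should be 1/lam and the variance 1/lam**2.
lam = 171.0

# Truncate the integral where the tail exp(-20) is negligible.
upper, steps = 20.0 / lam, 200000
dx = upper / steps
xs = [(k + 0.5) * dx for k in range(steps)]   # midpoint rule

mean = sum(x * lam * math.exp(-lam * x) * dx for x in xs)
var = sum((x - mean) ** 2 * lam * math.exp(-lam * x) * dx for x in xs)

print(mean, 1.0 / lam)        # both about 0.005848
print(var, 1.0 / lam ** 2)    # both about 3.42e-05
```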
So because lambda is 171, the standard deviation is also going to be one over 171, okay? So with just these two quantities, and the fact that the sequence Xn is i.i.d., independent and identically distributed random variables, the conclusion is that the sum of the random variables X1 through Xn lies between n over 171 plus 2 square root of n over 171, and n over 171 minus 2 square root of n over 171 (that's n mu plus or minus 2 sigma square root of n, with mu equal to one over 171), with probability about 95%. And what is the meaning of the sum of these random variables? It's the total time between the first and the nth call. So the time frame here is one: if you add up all those X's for a month's worth of calls, it adds up to one month. Exactly, the time frame is one, with the unit of time in months. But keep in mind, as we state this, this is an event. It's like you look at the past and say, well, this many calls happened. That's one instance of the experiment, one run, and this is what you observe, one outcome. Every month that goes by, there's an outcome of that experiment. This is just saying how likely that is to happen. It says that with 95% probability, this inequality is going to hold. Does that mean it will hold next month? No guarantee, but it's very unlikely not to. And this 95% is taken as a standard threshold; it's called a confidence interval. If things happen outside of this, if this inequality fails, then there is some external factor that wasn't taken into account. Otherwise, if the inequality holds, with this high probability, it's a normal fluctuation, a random fluctuation.
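The 95% statement can also be illustrated by simulation, which is not part of the lecture but shows what the theorem claims. A sketch in Python, using exponential inter-arrival times; the rate is set to 1 for simplicity, since the standardized ratio doesn't depend on it:

```python
import random

# Monte Carlo check of the central limit theorem: for i.i.d. exponential
# variables with rate lam (so mu = sigma = 1/lam), the standardized sum
#   (X1 + ... + Xn - n*mu) / (sigma * sqrt(n))
# should land in [-2, 2] roughly 95% of the time when n is large.
random.seed(0)
lam, n, trials = 1.0, 200, 5000
mu = sigma = 1.0 / lam

hits = 0
for _ in range(trials):
    s = sum(random.expovariate(lam) for _ in range(n))
    z = (s - n * mu) / (sigma * n ** 0.5)
    if -2.0 <= z <= 2.0:
        hits += 1

print(hits / trials)   # close to 0.95
```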
So being within two standard deviations of the mean for the standard normal distribution is what's referred to as statistically significant with this probability, or being within this confidence interval. Yes, except the density here, you see, is not symmetric, and it's hard to pinpoint things. The mean is one over lambda, somewhere here. If you think of the region under the curve, one over lambda is where its center of mass sits, where it balances; and since the density is not symmetric, that's not the same as the point with equal area on each side. And one standard deviation from the mean, which in this case is also one over lambda, takes you all the way down to zero on one side and up to two over lambda on the other. It's not the same as the Gaussian distribution, where everything is symmetric. So, for instance, two standard deviations above the mean would be three over lambda; that's what happens here. Is the 68% rule true here? No, that's only for the normal distribution, actually. Yeah, the z-score business. So I don't know the best way to interpret this, except that for this specific example, this quantity is used to determine the confidence interval: with 95% probability, the number of calls is going to lie between 171 plus or minus what, okay? So during any given month with n calls, this sum may not be exactly one; it may be slightly less than one or slightly bigger than one. But if you want to just get an estimate, substituting one for the sum is going to give you inequalities that n has to satisfy.
So during any given month with n calls, this is within normal random fluctuation if both inequalities hold, okay? It's a question of solving these two inequalities for n, and I guess you can solve them in many different ways; it could even be graphical, or iterative. It turns out that n has to be between 147 and 199, okay? So if the number of calls falls within this range, basically it's a random fluctuation. Outside of that, something wasn't taken into account in this model; for instance, maybe the assumption of an exponential distribution is not correct, okay? See, I thought you said the assumption of an exponential distribution might not be correct, but the central limit theorem says it doesn't matter where it came from: if you add up 150 of these things, the sum is going to be approximately normal, no matter what the underlying distribution was. So some assumption might not be correct, but I wouldn't think it would be the underlying distribution. Right, good point. The assumption is that 171 is the mean; that's just what happened in previous months. So even if the distribution is exponential, the mean may not be correct; the assumption that the mean is 1 over 171 may be what's off. I agree that the mean, or for that matter the standard deviation, may be incorrect, something like that. But it's also possible that the times between calls are not independent, right? Or they could have different distributions: one could have one distribution, another could have a different one. Some of them could be arson or something.
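Stepping back: the range 147 to 199 comes from solving the two inequalities n over 171 plus or minus 2 square root of n over 171 against 1. Substituting x = square root of n turns each into a quadratic; a small sketch of that computation:

```python
import math

# The event is n/171 - 2*sqrt(n)/171 <= 1 <= n/171 + 2*sqrt(n)/171.
# With x = sqrt(n), the endpoints solve x**2 -+ 2*x - 171 = 0.
lam = 171.0

x_hi = 1.0 + math.sqrt(1.0 + lam)    # positive root of x**2 - 2x - 171 = 0
x_lo = -1.0 + math.sqrt(1.0 + lam)   # positive root of x**2 + 2x - 171 = 0

n_lo = math.ceil(x_lo ** 2)          # smallest integer n allowed
n_hi = math.floor(x_hi ** 2)         # largest integer n allowed
print(n_lo, n_hi)                    # 147 199
```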
So all of the assumptions that led to this, or any one of them, can break: you could have a different distribution for some of the X's, or they may not be independent, and so on. Let's see. One of the homework problems I assigned, the one I gave you the teaser solution for, has basically the same flavor, except that those random variables are discrete. Problem number two, I mean. You can see that the key is to start with the same inequality and then use the information the central limit theorem gives you to conclude things about the probability of, in that case, a faulty diode. It's a lot of manipulating, but in a way you take out the randomness: in your calculation, all the randomness is built into the central limit theorem. That's the main assumption about the random nature of these models; past that, all you have is deterministic, algebraic work. I don't know if anybody tried to go further than Part A, but if you have Part A, it shouldn't be hard to extend. In Part A, I think you have 1,000 diodes that you test, and you find three faulty. But obviously, to conclude from that that the probability of a diode being faulty is exactly 0.3% may not be accurate. I know it's hard to think about, but there is a true number that's the probability of a diode being faulty, and it's just not necessarily 0.3%. To narrow it down further, you'd have to test more. And I think Part B says to increase that to 10,000, and you basically redo the computation.
And then Part C asks how much more you have to increase the sample to be confident, within that 95% confidence interval you found, that you've narrowed it down to the probability. So it's interesting; well, it's just playing with inequalities, but some of them are symbolic, so it's an interesting use of a computer to figure those numbers out, okay? Let's see. This exponential distribution we'll come back to in chapter eight; so far we've basically just talked about the expected value and the variance. Now, about the sum: each random variable is a time between two calls, so the sum of the random variables is basically the time from the first to the nth call. So if you say you have n calls in a month, you're saying that the sum is close to one. Or rather, maybe the sum of the first n is less than one, but the sum of the first n plus one is greater than one; I guess that would be more correct. Yeah, I had a problem with this too, thinking about that. The correct way would be to say this sum is less than or equal to one, and the one with n plus one terms is greater than one, and then use those inequalities. In other words, you would have to put n plus one here and n plus one here. Maybe we should have done that, but... This is such a strange problem the way it's phrased, because of the one-month time frame. At 171 calls a month, roughly you're getting a call every four hours or so, and you're adding up 171, or 153, of those gaps. So where do you see the random variables? Well, again, think of the experiment as: during a month, record all the fire calls. That's the experiment, okay?
Then you extract the random variables, which take values for each run of this experiment. I know it's a little odd to think about, but remember that in all of this you have to formulate what the sample space is. You have to understand what the sample space is, and that can sometimes be hard to write down, but in words, that's what it is. In any given period of one month, you record the times when you had a fire call, and then X1 is the time between the first and the second call, and so forth. Okay, now, as I was just saying, my take on this is: why don't we just put n plus 1 here and n plus 1 here? In the end, you're going to end up with a slightly different interval, but probably not a heck of a lot different from this, maybe some decimals. So in the end, it's the conclusion of your model; these are probabilistic models anyway, so there's never an exact number. Plus, these numbers are not very large, so maybe the central limit theorem doesn't even strictly apply here; n has to be very large for the central limit theorem to apply. Yeah, one month is the unit, so Xn would be the fraction of the month. We should have said that: when I said the expected time between calls is 1 over 171, that's a fraction of the one month, yeah. Okay, all right, so let's see. I think there was another problem, number six, which is simply a computation with a discrete random variable. Part A is just the computation of variance and standard deviation for the Poisson distribution, so that's basically dealing with a series.
If you'd done it like the previous problems, you would have used the same setup, with X taking values 1, 2, 3. Okay, yeah, okay. So now we're talking about a different random variable. It's the same general situation, I think, but now you call the random variable Nt, the number of arrivals during a time interval. That's a fundamental change: it's the same experiment, but you observe something else, the number of arrivals during a fixed time interval. And that has, basically, a Poisson distribution. Again, we're not really talking about why; we're just saying it has that kind of distribution at a given rate. There's a lambda there, exactly. So what is the expected number of arrivals during a given time interval? If the time interval is 1, it's going to be 171, right? So yeah, in that problem, your lambda is 171. And if your observed count is 153, then the formula has you taking 171 to the power of 153. How do you do that on a computer, right? Right. So the task, at least in part B of problem six, is basically to sum up a bunch of values, something like 37 of them: the probabilities that the number of calls is n, for n from 153 to 189, that is, from 171 minus 18 to 171 plus 18, whatever those numbers are, okay? And you want to find this sum. Okay, so there's a legitimate question: how do you sum this up? By hand, it would be impossible, right? The terms are e to the minus lambda t, times lambda t to the n, over n factorial. So there are two ways to do it.
One: in MATLAB, there is actually a built-in Poisson distribution function, which is called poisspdf. Does anybody remember? Okay, so that's the Poisson distribution, and by the way, it's part of the Statistics Toolbox, so I think you need to have that. So poisspdf will just compute those values, and then in a script you just sum up the series; that will take care of it, yeah. The other way, what I would say, is that you can actually build this up in a loop, as follows. How do you compute e to the minus lambda times lambda to the n over n factorial? The trouble is that lambda to the n, which is lambda times lambda, n times, and n factorial, which is 1 times 2 up to n, each get enormous, even though the term itself is a reasonable number. So one thing you can do is regroup: write the term as a product of the ratios lambda over 1, lambda over 2, up to lambda over n, times e to the minus lambda. When lambda is close to n, and that's what you have here, lambda is 171 and n is close to 171, all of these ratios are reasonable numbers, and the running product is also not too small or too large, okay? So you start with the first ratio, then multiply by the next and the next and the next, and then you sum this over all the n's. That's a good exercise to implement in a code. Any other questions? About the probability that the number of calls is in that range: what's the standard deviation? It turned out to be a weird number, so it wasn't 18. Right; 18 is not the standard deviation. It's just given to you as a number, to say: compute the likelihood that the number of calls is within that range. So it's not the standard deviation.
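That running-product idea can be sketched in code (Python here rather than MATLAB, just for illustration): build P(N = k) by multiplying by lambda over k at each step, so lambda to the n and n factorial are never formed on their own, then sum the terms from 153 to 189:

```python
import math

# Sum P(153 <= N <= 189) for a Poisson random variable with lam = 171,
# i.e. the probability that the monthly call count is within 171 +/- 18.
lam = 171.0
lo, hi = 153, 189

p = math.exp(-lam)          # P(N = 0): tiny, but still representable
total = 0.0
for k in range(1, hi + 1):
    p *= lam / k            # now p = e**(-lam) * lam**k / k!
    if k >= lo:
        total += p

print(total)                # the probability of 153 to 189 calls
```

Note that exp(-171) is about 1e-75, still within double-precision range; if lambda were much larger, one would keep the running product in logarithms instead.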
The standard deviation is probably what you said. Then the next question, Part C, is: how much do you have to increase that 18 to cover the normal random fluctuation? And again, that will not be the standard deviation. No, because these are discrete random variables; they have their own distribution, and it's not a Gaussian distribution. Behind all of this is the central limit theorem, which is what determines that 95% confidence interval. So the range of n, the number of calls, that counts as standard random fluctuation is what you need to determine, and it won't necessarily be a standard deviation of that random variable. Isn't the standard deviation of the Poisson distribution the square root of lambda? Yes, it's the square root of lambda, right, but it's not the same as computing that confidence interval. Let's see. Well, there are two more problems. I'll leave number seven; in a way, it's kind of similar to this Murphy's law business. It's just a counting sort of problem, and I gave you Part A, so at least you know how to set it up. If you'd like, look at eight and nine, well, maybe nine. And leave exercise 16, because I haven't talked about diffusion yet; it's really about diffusion, okay? So we don't have to do 16? Right, we'll talk about it on Wednesday. Wednesday we'll talk about diffusion, and I'll start chapter eight. And then we'll have the FCQ, so we'll have to quit a little bit early. Okay? Thank you.