This is a video about cumulative distribution functions for continuous random variables. The cumulative distribution function, almost always written with a capital letter F, is a function whose output is the probability that x is less than or equal to the input. So for example, f of 2 is the probability that x is less than or equal to 2, and f of 3 is the probability that x is less than or equal to 3. Now you know that if we look at the graph of the probability density function, then the probability that x is less than or equal to 3 is this area here, the area underneath the graph up to x equals 3. So f of 3 is this area, the area under the graph of the probability density function up to x equals 3. So f of 3 being the probability that x is less than or equal to 3 is the integral of the probability density function up to the limit x equals 3. And in general, f of x naught being the probability that x is less than or equal to x naught is equal to the integral of the probability density function up to the limit x equals x naught. You may be wondering here why I'm using x naught as the input to the cumulative distribution function. And in fact, you'll see that we often use x naught as the input to a cumulative distribution function. The reason is that when we have integrals like this, we want to avoid confusion between the limit x naught and the variable x in the probability density function. Okay, here are the most important things that you need to remember about cumulative distribution functions. The cumulative distribution function is a function whose output is equal to the probability that x is less than or equal to the input. And as that's the area underneath the graph of the probability density function up to and including x naught, that's equal to the integral of the probability density function as far as the limit x naught. Now what are you supposed to be able to do with cumulative distribution functions?
I'm going to spend the rest of this video looking at some example questions and showing you how to answer them. The first thing that you need to be able to do is to calculate probabilities. And here's a typical question. We're told the cumulative distribution function for the random variable x. We're told that f of x is equal to 0 when x is less than 0, 1 over 32 times 6x squared minus x cubed when x is between 0 and 4, and 1 when x is greater than 4. The question is to calculate the probability that x is greater than 2 and less than 3. Well think about the graph of the probability density function. The probability that x is between 2 and 3 is this area here, the area under the graph between 2 and 3. And one way to work that out would be to find all the area up to 3 and then subtract the area up to 2. In other words to find all the coloured area here and then subtract the yellow area. Well the whole coloured area would be f of 3 and the yellow area would be f of 2. So the probability that x is greater than 2 and less than 3 will be f of 3 minus f of 2. The definition of the cumulative distribution function tells us how to calculate these. f of 3 is 1 over 32 times 6 times 3 squared minus 3 cubed. And f of 2 is 1 over 32 times 6 times 2 squared minus 2 cubed. If you work all this out we get the answer 11 over 32. So that's the probability that x is greater than 2 and less than 3. Here's a similar question. This time let's find the probability that x is greater than or equal to 1.8. Well again if we drew the graph of the probability density function we'd see that we had to work out this area. The area under the curve to the right of 1.8. And one way to find that is to look at the total area under the curve which remember has to be 1 and to subtract the yellow area, the area to the left of 1.8. Okay well the area under the curve to the left of 1.8 will be f of 1.8. So the probability that we're looking for will be 1 take away f of 1.8.
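If you'd like to check probabilities like these on a computer, here is a minimal Python sketch. It assumes the cumulative distribution function given in this question; the names cdf, p_between and p_at_least are only illustrative choices and don't come from the question itself.

```python
def cdf(x):
    """Cumulative distribution function from the question:
    0 below 0, (6x^2 - x^3) / 32 between 0 and 4, and 1 above 4."""
    if x < 0:
        return 0.0
    if x > 4:
        return 1.0
    return (6 * x**2 - x**3) / 32


# P(2 < X < 3) is the area up to 3 minus the area up to 2.
p_between = cdf(3) - cdf(2)

# P(X >= 1.8) is the total area, 1, minus the area up to 1.8.
p_at_least = 1 - cdf(1.8)

print(p_between)   # 0.34375, which is 11/32
print(p_at_least)
```

Both answers only ever use the cumulative distribution function itself, so no integration is needed at this stage.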
Again the definition of the cumulative distribution function tells us how to calculate this. It tells us that f of 1.8 is 1 over 32 times 6 times 1.8 squared minus 1.8 cubed. And if we calculate this and subtract the answer from 1 we get 0.57475. And that's the answer to this question. Finally on this example let's calculate the probability that x is equal to 1. Okay well perhaps you can answer this for yourself. What do you think the probability that x is equal to 1 is? I hope you realize that this was actually a trick question. The probability that x is equal to 1 is the area of this line. And the area of a line is 0. The probability that x is equal to 1 is equal to 0. Okay the next thing I'd like to look at is converting a probability density function into a cumulative distribution function. There are several methods for doing this and I'm going to show you three. You might like to think about which of the methods you would prefer to use. The first method works when the probability density function is a linear function when the graph is made up out of straight lines. And you can see that that's the case here because each part of the definition is of the form y equals mx plus c. There are no x squareds or x cubes or anything funny like that. Now when the graph of the probability density function is made up out of straight lines we can find the cumulative distribution function using some simple geometry. It helps to visualize the graph and in this case it looks something like this. Now we've got two cases to consider. The first is when x naught is between 0 and 2 on the left hand side of the graph. In that case we're looking at something like this. We need to know the area of this shape. And as it's a triangle all we need to do is a half base times height. Well the base of that triangle is equal to x naught. And according to the formula at the top of the page the height is a quarter of x naught. 
So the total area would be a half times x naught times a quarter of x naught which is equal to an eighth of x naught squared. The second case we need to look at is when x naught is between 2 and 4. And in this case we need to work out this red area here. But it's going to be easier to think about the yellow area and subtract that from 1 because remember that the total area under the graph of the probability density function must be 1 and therefore the red area will be one take away the yellow area. Now again this is just a triangle so all we have to do is a half base times height. The base of the yellow triangle is going to be 4 take away x naught. And according to the formula at the top of the page the height is going to be a quarter of 4 take away x naught. So the yellow area will be a half times 4 take away x naught times a quarter times 4 minus x naught and the red area will be 1 minus that. That simplifies to 1 minus an eighth of 4 minus x naught squared. So putting all this together we can now say that when x naught is less than zero f of x naught is zero because the cumulative probability up to zero is obviously nothing. When x naught is between naught and 2 f of x naught is an eighth of x naught squared and when x naught is between 2 and 4 f of x naught is 1 minus an eighth of 4 minus x naught squared and when x naught is greater than 4 f of x naught is equal to 1 because eventually the cumulative probability must be 1. So that's the first method for converting a probability density function into a cumulative distribution function and remember it only works when the graph of the probability density function is linear when it's made up out of straight lines. Let's move on to a different method. Here's another question. Let's suppose that the probability density function for the random variable x is given by f of x is an eighth of x squared when x is between naught and 2 an eighth of x times 4 minus x when x is between 2 and 4 and zero otherwise. 
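Before working with a density like this, it can be reassuring to check that the total area under it really is 1. Here is a rough Python sketch that does this numerically. The function names pdf and integrate, and the choice of a midpoint rule with 10,000 strips, are assumptions made for illustration rather than part of the question.

```python
def pdf(x):
    """Density from the question: x^2 / 8 between 0 and 2,
    x(4 - x) / 8 between 2 and 4, and 0 otherwise."""
    if 0 <= x <= 2:
        return x**2 / 8
    if 2 < x <= 4:
        return x * (4 - x) / 8
    return 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total_area = integrate(pdf, 0, 4)
print(round(total_area, 6))  # 1.0
```

The two pieces also agree at x equals 2, where both give a half, so the graph has no break there.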
This time you can see that we've got a non-linear function. It's not going to be made up out of straight lines because we've got things like x squared involved. It will help to visualize the graph of the probability density function. It looks something like this and again we need to deal with two different cases. The first is when x naught is between zero and 2 and then we've got an area that looks something like this. We can work it out using integration. In this case f of x naught is going to be the integral of an eighth of x squared with limits zero and x naught. It's an eighth of x squared because of the formula at the top of the page. So let's work out the integral of an eighth of x squared with limits naught and x naught. It's going to be 1 over 24 times x cubed evaluated with limits naught and x naught and that's 1 over 24 times x naught cubed minus 1 over 24 times naught cubed which is just 1 over 24 times x naught cubed. The second case that we have to deal with is when x naught is between 2 and 4 and then we have an area that looks something like this. We can see that as consisting of two parts. Firstly the yellow area on the left and secondly the red area on the right. Now the yellow area on the left will be f of 2 and the red area on the right will be the integral of the probability density function with limits 2 and x naught. So in this case f of x naught will be f of 2 plus the integral of an eighth of x times 4 minus x with limits 2 and x naught. An eighth of x times 4 minus x again because of the formula at the top of the page. So let's work this out. We need to work out f of 2 plus the integral of an eighth of x times 4 minus x with limits 2 and x naught. We'll do these one at a time. We know that when x naught is between naught and 2 f of x naught is 1 over 24 times x naught cubed and we can use this to find f of 2. It means that f of 2 is 1 over 24 times 2 cubed which is 1 over 24 times 8 i.e. a third.
Okay next we need to work out the integral of an eighth of x times 4 minus x with limits 2 and x naught. Multiplying out those brackets gives us a half of x minus an eighth of x squared and we can integrate that with limits 2 and x naught. It's a quarter of x squared minus 1 over 24 times x cubed with limits 2 and x naught. And that's a quarter of x naught squared minus 1 over 24 times x naught cubed, minus, in brackets, a quarter times 2 squared minus 1 over 24 times 2 cubed. And if you work that out and simplify it you'll get a quarter of x naught squared minus 1 over 24 times x naught cubed minus 2 thirds. Okay now we've worked out both parts. We can add them together and find out that f of x naught is a third plus a quarter of x naught squared minus 1 over 24 times x naught cubed minus 2 thirds. That's equal to a quarter of x naught squared minus 1 over 24 times x naught cubed minus a third, which can be written more neatly as 1 over 24 times, in brackets, 6x naught squared minus x naught cubed minus 8. Okay now we can sum up and say finally that f of x naught is equal to 0 when x naught is less than 0, 1 over 24 times x naught cubed when x naught is between 0 and 2, 1 over 24 times 6x naught squared minus x naught cubed minus 8 when x naught is between 2 and 4, and 1 when x naught is greater than 4. Okay that was quite a lot of work wasn't it? And it's fair to warn you that questions of this type do tend to involve quite a lot of fiddly calculation. Let's move on to the third method. The third method for converting a probability density function into a cumulative distribution function can be a little harder to understand in the first place but may be quicker to use in the long run. Let's look at the same question as we just did but use this new method to find out the answer. This time instead of using definite integration we're going to use indefinite integration. We know that you can find the cumulative distribution function from the probability density function by integrating.
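That integration can also be done numerically, which gives a handy check on the piecewise formula found with the second method. The Python sketch below compares the formula with a midpoint-rule integral of the density at a few sample points. The names pdf, cdf, integrate and max_error, and the sample points themselves, are illustrative assumptions.

```python
def pdf(x):
    """Density from the question: x^2 / 8 on [0, 2], x(4 - x) / 8 on [2, 4]."""
    if 0 <= x <= 2:
        return x**2 / 8
    if 2 < x <= 4:
        return x * (4 - x) / 8
    return 0.0


def cdf(x0):
    """Piecewise cumulative distribution function found by definite integration."""
    if x0 < 0:
        return 0.0
    if x0 <= 2:
        return x0**3 / 24
    if x0 <= 4:
        return (6 * x0**2 - x0**3 - 8) / 24
    return 1.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Compare the formula with the numerically computed area up to each x0.
max_error = 0.0
for x0 in (0.5, 1.0, 2.0, 2.5, 3.0, 4.0):
    numeric = integrate(pdf, 0, x0)
    max_error = max(max_error, abs(cdf(x0) - numeric))
    print(x0, round(cdf(x0), 6), round(numeric, 6))
```

If the two columns agree at every sample point, that is good evidence (though not a proof) that the algebra was done correctly.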
In fact when x is between 0 and 2 f of x will be the integral of an eighth of x squared dx and that's 1 over 24 times x cubed plus c. Okay the only problem here is that we don't know the value of c but fortunately we can work it out because if we look at the start of the interval that we're talking about at x equals 0 we realise that we know what f of 0 is. We know that f of 0 is 0 because the cumulative probability at x equals 0 is nothing. And the fact that f of 0 is 0 means there's only one possible value of c. If you think about it if f of 0 has to be 0 then 1 over 24 times 0 cubed plus c must be 0. The only way that can be true is if c is 0 and that means that when x is between 0 and 2 f of x is 1 over 24 times x cubed. Okay now we need to look at the next interval and think about when x is between 2 and 4. Well when x is between 2 and 4 f of x must be the integral of an eighth of x times 4 minus x. We can work that out by multiplying out the brackets. It's the same as the integral of a half x minus an eighth of x squared which is a quarter of x squared minus 1 over 24 times x cubed plus another constant let's call it d. Okay again the only problem here is that we don't know the value of d yet but if we think about the start of the interval where x is equal to 2 we can find out what d is. We know that f of 2 must be 1 over 24 times 2 cubed using the previous formula which is a third and there's only one value of d which will give us the answer a third when we substitute x equals 2 into our new formula. When we substitute in 2 we get a quarter times 2 squared minus 1 over 24 times 2 cubed plus d and we know that that's meant to give us the answer a third so we've got an equation for d. If you solve it you find out that d is equal to minus a third. So therefore d is minus a third. Let me just make sure it's clear what happened there. 
We did an indefinite integral and we got an answer that involved an unknown constant d but then we realized there was a particular value of x, x equals 2 at the start of the interval where we knew the probability, we knew the value of f of x and so we were able to make an equation by substituting in x is equal to 2 for the unknown constant d and solving that equation told us that d was equal to minus a third. So now we can say that f of x is actually a quarter of x squared minus 1 over 24 times x cubed minus a third and that simplifies to 1 over 24 times 6x squared minus x cubed minus 8. Now we can say that f of x is the same function as we worked out earlier. It's 0 when x is less than 0, it's 1 when x is greater than 4, when x is between 0 and 2 it's 1 over 24 times x cubed and when it's between 2 and 4 it's 1 over 24 times 6x squared minus x cubed minus 8. And that finishes the third method for converting a probability density function into a cumulative distribution function. You might want to think now about which method you would like to use to answer questions about this. Okay, the next thing that you need to be able to do is fortunately a lot simpler. You also need to be able to convert the other way round and turn a cumulative distribution function into a probability density function. So let's look at the answer that we just obtained for the cumulative distribution function and turn it back into a probability density function. Well you know that the opposite of integration is differentiation and so all we need to do here is to differentiate these expressions. First of all we'll differentiate 1 over 24 times x cubed and that gives us the answer 1 eighth of x squared and secondly we'll differentiate 1 over 24 times 6x squared minus x cubed minus 8 and that gives us the answer 1 over 24 times 12x minus 3x squared which if you simplify a bit gives you an eighth of x times 4 minus x and that enables us to write down the probability density function. 
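If you want to check a differentiation like this without doing the algebra again, one option is to estimate the slope of the cumulative distribution function numerically. The sketch below uses a central difference, which is an approximation chosen here for illustration and not the method used in the video. The names cdf, pdf, derivative and max_gap are likewise only illustrative.

```python
def cdf(x):
    """Cumulative distribution function obtained earlier."""
    if x < 0:
        return 0.0
    if x <= 2:
        return x**3 / 24
    if x <= 4:
        return (6 * x**2 - x**3 - 8) / 24
    return 1.0


def pdf(x):
    """Density obtained by differentiating each piece by hand."""
    if 0 <= x <= 2:
        return x**2 / 8
    if 2 < x <= 4:
        return x * (4 - x) / 8
    return 0.0


def derivative(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# The estimated slope of the CDF should match the density at each point.
max_gap = 0.0
for x in (-1.0, 0.5, 1.5, 2.5, 3.5, 5.0):
    slope = derivative(cdf, x)
    max_gap = max(max_gap, abs(slope - pdf(x)))
    print(x, round(slope, 6), round(pdf(x), 6))
```

The sample points deliberately avoid the joins at 0, 2 and 4, where the formula changes from one piece to the next.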
We found out that it's an eighth of x squared when x is between 0 and 2. It's an eighth of x times 4 minus x when x is between 2 and 4 and obviously if you differentiate when x is less than 0 or when x is greater than 4 you get the answer 0. Good news, this is the same function that we started off with in the previous questions so that implies that we got the right answer. Okay the last thing that you have to be able to do is to recognize when a function is a valid cumulative distribution function. We're going to look at a couple of examples here. First of all how do you know if something is a valid cumulative distribution function? Well there are two conditions. First of all as x increases f of x has got to increase starting from 0 and ending up at 1. And secondly f must be a continuous function. That means that a small increase in x can't produce a sudden jump in the value of f of x. And the reason for that is that you obtain the cumulative distribution function by integrating the probability density function and if you move a little bit to the right on the graph you can only include a tiny little bit more area. You can't have a suddenly much bigger area. So I'm going to show you the graphs of some functions and ask you to think about whether they could be cumulative distribution functions. What about this one? Could this be the graph of a cumulative distribution function? The answer is no it couldn't because it doesn't increase starting from 0 and ending up at 1. In fact it decreases up to x equals 0 then it increases and then it decreases again from x is equal to 4 onwards. So it decreases then increases and then decreases again and that's not allowed. So this can't be a cumulative distribution function. What about this one? Could this be a cumulative distribution function? Well again the answer is no because this time there's a sudden increase in f of x where x is equal to 2. There's a jump in the cumulative distribution function and that's not allowed. 
Finally what about this? Could this be a cumulative distribution function? This time the answer is yes it could be a cumulative distribution function because it increases steadily starting out at 0 and ending up at 1 and there are no sudden jumps or anything strange. This is a perfectly valid cumulative distribution function. One final example on recognizing cumulative distribution functions is an incomplete definition of a cumulative distribution function. It's incomplete because it involves the value of k somewhere and I want to find out the value of k for which this would be a valid cumulative distribution function. So we're told that f of x is 0 when x is less than 0, a half of x when x is between 0 and 1, 1 minus k times the square of x minus 4 when x is between 1 and 4, and 1 when x is greater than 4. Now the key thing is there can't be any sudden jumps and we should be on the lookout to see if there are any jumps when x is 0, when x is 1 and when x is 4 because that's where the definition changes from one thing to another. Okay well you can see that there's no jump at x equals 0 because a half of x is 0 if x is 0 and there's also no jump at x equals 4 because 1 minus k times the square of 4 minus 4 is just 1 minus k times 0 which is 1 but there's the possibility of a jump at x is equal to 1. So what we say is that there can't be a sudden increase in f of x at x is equal to 1. Therefore substituting x equals 1 into the two different expressions, a half of x and 1 minus k times x minus 4 squared, must give us the same answer. In other words a half times 1 must give us the same answer as 1 minus k times the square of 1 minus 4. Okay let's solve that equation. That says that a half is equal to 1 minus 9k which is the same as saying that 9k must be equal to a half and that shows that k is equal to 1 over 18 and that's the answer to this question. k must be 1 over 18 for that to be a valid cumulative distribution function.
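The same reasoning can be written out as a short Python sketch using exact fractions, so that the answer comes out as 1 over 18 and not as a rounded decimal. The variable names and the helper function cdf are illustrative assumptions, not part of the question.

```python
from fractions import Fraction

# Value of the first piece, a half of x, at the join x = 1.
left_value = Fraction(1, 2) * 1

# The second piece is 1 - k * (x - 4)^2; at x = 1 the squared term is 9.
squared_term = (1 - 4) ** 2

# No jump at x = 1 means left_value = 1 - k * squared_term, so solve for k.
k = (1 - left_value) / squared_term
print(k)  # 1/18


def cdf(x):
    """The completed cumulative distribution function, using the k found above."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x <= 1:
        return x / 2
    if x <= 4:
        return 1 - k * (x - 4) ** 2
    return Fraction(1)


# Both expressions should now agree at the join x = 1.
print(cdf(1), 1 - k * (Fraction(1) - 4) ** 2)  # 1/2 1/2
```

With this value of k there is no jump at 0, 1 or 4, which is exactly the condition the question asks for.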
Okay well this has been a video about cumulative distribution functions and here are the key things that you need to remember. First of all the cumulative distribution function is a function whose output is the probability that x is less than or equal to the input and that's the area under the graph of the probability density function up to and including x naught, the input to the cumulative distribution function. Now the area under that graph can be obtained by integration and it's the integral of the probability density function up to the limit x naught. So that means that f of x naught is equal to the integral of the probability density function up to the limit x naught. Okay thank you very much for watching this video. I'm sure you'll have noticed that a lot of the work on this topic is going to be fiddly and there's a lot to think about. So do practice questions on this sort of thing. You'll find that it takes a while for you to get used to the logic involved. And you'll also find that you need to be extremely careful with all the little details in order to get the right answers. So good luck and thank you very much for watching this video.