 This is a video about using the Poisson distribution to approximate the binomial distribution. First I want to show you that the two distributions are closely connected. Imagine that this line represents an interval of time broken down into 10 subintervals. I'm interested in the probability of me receiving different numbers of phone calls. Let's say that x is the number of subintervals in which I receive a phone call. And y is the total number of phone calls I receive in the interval. Now x will have the binomial distribution because it's the number of successes in a sequence of trials. The trials being the subintervals and success being that I get a phone call. y on the other hand will have a Poisson distribution because it's the number of events in a continuous interval of time. So x has the binomial distribution and y has the Poisson distribution. In most cases though, x and y will be the same. To see why, consider this example. Suppose I get phone calls at these times. You can see that both x and y here will be equal to 3. x will be equal to 3 because there are 3 subintervals in which I receive a phone call. And y will be 3 because the total number of phone calls I receive is 3. So x has the binomial distribution and y has the Poisson distribution. But the vast majority of the time, x and y will be the same. And if they're almost always the same, they must have almost the same distribution. So that means there's a binomial distribution which is very similar to a Poisson distribution. Let's go into this in more detail. Let's suppose that the probability of getting a phone call in any subinterval is 0.1. Then x will have the binomial distribution with parameters 10 and 0.1, where 10 is the number of trials because there are 10 subintervals. And 0.1 is the probability of success because that's the probability of getting a phone call in a subinterval. That means, by the way, that the expected value of x will equal 1. Now let's assume that x and y do have the same distribution. 
In that case they must have the same expected value. And therefore the parameter lambda for the Poisson distribution must equal 1. 1 must be the total number of phone calls that we expect to happen. So y has the Poisson distribution with parameter 1 and the expected value of y is equal to 1. We would expect therefore that the binomial distribution with parameters 10 and 0.1 should be similar to the Poisson distribution with parameter 1. They won't be exactly the same but they should be similar. We can tell that they won't be exactly the same by thinking about the variances because the variance of x here will be 10 times 0.1 times 0.9 which is 0.9 whereas the variance of y will be lambda which is 1. So they won't be exactly the same. But nevertheless we should expect the binomial distribution with parameters 10 and 0.1 to be roughly the same as the Poisson distribution with parameter 1. Here's a graph which shows that this is indeed the case. The probabilities for the binomial distribution with parameters 10 and 0.1 are in blue and the probabilities for the Poisson distribution with parameter 1 are in red. So we can see that these two distributions are roughly the same. It's not a very good approximation but the probabilities aren't that far apart either and the general shape is the same in both cases. Now it's important to understand why x and y aren't always the same. To see why let's look at this example. Here there are four phone calls and the problem is that two of them happen in the same interval. So this means that the number of intervals in which a phone call happens is 3 and so x is 3 but the total number of phone calls is 4 and so y is 4. Now we can try and get around this problem by increasing the number of intervals. Suppose that I divide each of these intervals in half so that there are 20 subintervals. Now x and y are the same again. x and y are both 4 because there are four subintervals with a phone call and the total number of phone calls is 4. 
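The comparison between the binomial distribution with parameters 10 and 0.1 and the Poisson distribution with parameter 1 can be checked numerically. What follows is only a minimal sketch using Python's standard library; the helper names binomial_pmf and poisson_pmf are made up for this illustration and are not part of the video.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) when X is binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(Y = k) when Y is Poisson with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 10, 0.1   # 10 subintervals, probability 0.1 of a call in each
lam = n * p      # matching expected value, so lambda = 1

# Print the two sets of probabilities side by side for small counts.
for k in range(6):
    b = binomial_pmf(k, n, p)
    q = poisson_pmf(k, lam)
    print(f"k={k}: binomial={b:.4f}  poisson={q:.4f}  difference={b - q:+.4f}")

# The variances differ slightly, which is why the match is not exact.
binomial_variance = n * p * (1 - p)
poisson_variance = lam
print(f"variance of x = {binomial_variance:.2f}, variance of y = {poisson_variance:.2f}")
```

The printed differences play the role of the gaps between the blue and red bars in the graph.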
Let's go into detail again. x and y still represent the same things, but this time the intervals are smaller, so let's assume that the probability of success is 0.05. In that case x will have the binomial distribution with parameters 20 and 0.05, but the expected value of x will still be 1. If x and y do have the same distribution then lambda will still be 1, and so y will have the Poisson distribution with parameter 1 and the expected value of y will be 1. It should be the case that these two distributions are more nearly equal. The binomial distribution with parameters 20 and 0.05 should be more similar to the Poisson distribution with parameter 1, and in fact this is reflected in their variances, because this time the variance of x is 0.95 and the variance of y is 1, and these variances are closer than they were before. But you can also see that the distributions are more similar by looking at the graph. This is the graph that we had earlier, and you can see that when we change from the binomial distribution with parameters 10 and 0.1 to the binomial distribution with parameters 20 and 0.05, the bars are more nearly equal in height. Now in general, when we increase the number of subintervals we eliminate more and more cases where more than one event happened in the same interval. So as we increase the number of subintervals, x and y must be equal to one another more and more often, and the distributions of x and y should be more and more nearly equal to each other. So as we increase the number of subintervals from 20 to 100, it's not surprising that the probabilities are now very, very similar. The binomial distribution with parameters 100 and 0.01 is very similar to the Poisson distribution with parameter 1. If we increase the number of subintervals again, the bars are almost exactly equal in height. The heights are so similar that you can't see the difference in this picture, and this shows that the binomial distribution with parameters 1000 and 0.001 is very, very similar to the 
Poisson distribution with parameter 1. Okay, so I've been trying to show you that the binomial distribution and the Poisson distribution are very closely related, and what you need to remember is that if n is large and p is close to zero, then the binomial distribution with parameters n and p is approximately the same as the Poisson distribution with parameter lambda, where lambda is equal to n times p. What this means is that if n is large and p is close to zero, then the Poisson distribution can be used to approximate binomial probabilities. Let's look at some examples. First of all, suppose that x has the binomial distribution with parameters 1000 and 0.003, so there are 1000 trials and the probability of success is 0.003. Let's try to estimate the probability that x is greater than or equal to 2 and less than or equal to 5. Now you can see that this would be a pain to calculate without using an approximation. First of all, there are 1000 trials, so we can't look up any probabilities in the tables, and secondly, the binomial probabilities would be a pain to calculate using the formula because the numbers are so big. So this is a typical situation where we would want to use the Poisson distribution to approximate binomial probabilities. Now the first thing we need to do is to find the expected value of x, which in this case is 1000 times 0.003, which is 3, because then we can use the Poisson distribution with parameter 3 to approximate the binomial probabilities. What we can say is that the probability that x is greater than or equal to 2 and less than or equal to 5 is approximately the same as the probability that y is greater than or equal to 2 and less than or equal to 5. Now that's the chance that y is 2, 3, 4 or 5, which is the same as the chance that y is less than or equal to 5 take away the chance that y is less than or equal to 1. Because 2, 3, 4, 5 is all the numbers up to 5, take away the numbers 0 and 1. Okay, we can work out these probabilities by looking at the tables. 
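The same estimate, the chance that y is at most 5 take away the chance that y is at most 1, can also be computed directly rather than read from tables. This is a rough sketch in standard-library Python; the helper names poisson_cdf and binomial_cdf are hypothetical, chosen just for this illustration. It also computes the exact binomial answer for comparison, which a computer can do even though it would be painful by hand.

```python
from math import comb, exp, factorial


def poisson_cdf(k, lam):
    """P(Y <= k) when Y is Poisson with parameter lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def binomial_cdf(k, n, p):
    """P(X <= k) when X is binomial with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 1000, 0.003
lam = n * p  # expected value of x, used as the Poisson parameter

# P(2 <= Y <= 5) = P(Y <= 5) - P(Y <= 1)
poisson_estimate = poisson_cdf(5, lam) - poisson_cdf(1, lam)

# Exact binomial probability of the same event, for comparison.
binomial_exact = binomial_cdf(5, n, p) - binomial_cdf(1, n, p)

print(f"Poisson estimate: {poisson_estimate:.4f}")
print(f"Exact binomial:   {binomial_exact:.4f}")
```

Because the code works with unrounded probabilities, its last decimal place can differ slightly from an answer obtained by subtracting two rounded table entries.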
We need to find where lambda is equal to 3 and look along the row where x is 5 and where x is 1 to see the probabilities 0.9161 and 0.1991. And then we can do the sum 0.9161, take away 0.1991 to get the answer 0.7170. Now let's look at another example. This time suppose that x has the binomial distribution with 2,000 trials and 0.996 as the probability of success. We're going to estimate the chance that x is less than or equal to 1990. Now this time we can't use the Poisson approximation straight away because although n is large, p isn't close to 0. This time we need a trick and that's to think about the number of failures instead of the number of successes. Suppose that f is 2,000 minus x, so that f will give us the number of failures instead of the number of successes. Then f will still have the binomial distribution and still with 2,000 trials but this time the probability will be 0.004 instead of 0.996. So now we have a situation where n is large and p is very close to 0 so we can use a Poisson approximation. As before we need to find the expected value, we need to work out n times p which in this situation is 2,000 times 0.004 which is 8. So let's say that y has the Poisson distribution with parameter 8. Now we're asked to find the probability that x is less than or equal to 1990 and that's going to be the same as the probability that f is greater than or equal to 10. That's because 1,990 successes is the same as 10 failures and fewer than 1,990 successes is more than 10 failures. So we need to find the probability that f is greater than or equal to 10 and that's going to be approximately the same as the probability that y is greater than or equal to 10 because remember the binomial distribution with parameters 2,000 and 0.004 is approximately the same as the Poisson distribution with parameter 8. 
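The trick of counting failures instead of successes can be sketched in code as well. This is only an illustration under the same setup as the example (2,000 trials, failure probability 0.004); the function names poisson_cdf and binomial_cdf are assumptions made for this sketch. It evaluates the chance of 10 or more failures both with the Poisson approximation and with the exact binomial formula.

```python
from math import comb, exp, factorial


def poisson_cdf(k, lam):
    """P(Y <= k) when Y is Poisson with parameter lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def binomial_cdf(k, n, p):
    """P(F <= k) when F is binomial with n trials and probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 2000
p_failure = 0.004      # 1 - 0.996, the probability of a failure
lam = n * p_failure    # expected number of failures, which is 8

# P(X <= 1990) = P(F >= 10) = 1 - P(F <= 9), where F = 2000 - X.
poisson_estimate = 1 - poisson_cdf(9, lam)
binomial_exact = 1 - binomial_cdf(9, n, p_failure)

print(f"Poisson estimate of P(F >= 10): {poisson_estimate:.4f}")
print(f"Exact binomial P(F >= 10):      {binomial_exact:.4f}")
```

Working with failures keeps every term small enough to compute comfortably, which mirrors the reason for the trick in the first place: the approximation needs the probability to be close to 0.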
Okay, the chance that y is greater than or equal to 10 will be one take away the chance that y is less than or equal to 9, because the opposite of being 10 or more is being 9 or less. We can look up this probability in the tables. We need to find where lambda is equal to 8 and then look across the row where x is equal to 9, and we see 0.7166. So the calculation that we need to do is one take away 0.7166, which is 0.2834. Okay, I'd like to look at one more example, and this is going to be a practical example to do with DNA and genes. You know that the chromosomes inside the nucleus of each of your cells contain lots of DNA, and that DNA contains a genetic code for manufacturing proteins within your body. You can see that DNA is a double spiral where the bases A, C, T and G are joined to other bases. In fact they always occur in pairs, so A is always joined to T and C is always joined to G, and that's very important. Now the way DNA works is that the bases are codes for amino acids. If you take any three bases in a row, they're the code for a particular amino acid. For example, three T's in a row is the code for phenylalanine. Now amino acids are the constituent parts of proteins. So what's happening is that DNA contains the codes for amino acids, and amino acids are put together in the order the DNA specifies to make a protein. For example, this is hemoglobin, and the two red things you can see here are identical proteins that are manufactured from a particular gene, while the two blue things are a second pair of identical proteins manufactured from a different gene. Now I want to look at a question that's to do with DNA replication. Whenever a cell divides into two cells, the DNA inside it needs to be copied so that the two daughter cells each get a copy of the DNA. Here you can see an amazing picture of the chromosomes inside the nucleus of a cell being reorganized and starting to form two different cells. This is what happens when DNA is replicated. 
What you can see here is the gray original DNA starting from the right and proceeding off to the left. On the far right it's all coiled up as it is in its natural state, then it's being unzipped and added to with the red and the yellow to form two new DNA coils. The colorful structures that you can see are enzymes. One of them is unzipping it, whilst the two big green enzymes are responsible for completing the original pieces of gray DNA with the new red and yellow bits. What's happening here is that the gray DNA has a sequence of bases like A and G, and for each base there's only one possible partner base, because remember A always goes with T and C always goes with G. So the big green enzymes are responsible for attaching the correct partner base at every position along the gray original DNA. Here's an amazing picture of one of these enzymes doing its work, and here you can really get a sense that these enzymes are made out of individual atoms and they have a particular shape that enables them to do their work. Here you can see the DNA molecule in a sort of yellow and purple in the middle and the green enzyme wrapped all around it. Of course the enzyme isn't really green, this is just to show you what's happening. Now here's my question. The big green enzymes that I've just been talking about are called DNA polymerase because they're responsible for building DNA. And there are various kinds of DNA polymerase, but this is about just one of them, which is called DNA polymerase epsilon. Now it's been found that when DNA polymerase epsilon is adding a new base to a strand of DNA, the probability that it adds the wrong one is approximately 2 times 10 to the power of minus 5. Now DNA polymerase epsilon is a protein, like all other enzymes, and it's manufactured from a gene on the DNA. So at some point when the enzyme is replicating DNA it will be replicating the gene from which it's made. The gene that encodes DNA polymerase epsilon has about 200,000 base pairs. 
So my question is this: when DNA polymerase epsilon creates a copy of its own gene, the gene from which it's manufactured, what's the probability that the new gene contains at least five mistakes? So let's say that x is the number of errors in the new gene. You should be able to see that x will have the binomial distribution. The number of trials is 200,000 because there are 200,000 bases. The probability of an error is 2 times 10 to the power of minus 5. Now obviously we want to use the Poisson distribution to approximate the probability here. So we need to find the expected value of x. We need to calculate n times p. In this situation that's 200,000 times 2 times 10 to the power of minus 5, which is 4. So we're going to have to use the Poisson distribution with parameter 4. The question is what's the probability that x is greater than or equal to 5? And that's going to be about the same as the probability that y is greater than or equal to 5. Now getting 5 or more than 5 is the opposite of getting 4 or less than 4. So this is one take away the probability that y is less than or equal to 4. And we can look that up in the table. We need to find where lambda is 4 and then look along the row where x is equal to 4. And you should see 0.6288. So the calculation we need to do is 1 take away 0.6288, which is 0.3712. I'm going to round that to 0.4, to only one significant figure, because it's clear that the numbers in the question were only very approximate. 
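For anyone who wants to check the DNA calculation without tables, here is a minimal Python sketch. The figures 200,000 and 2 times 10 to the power of minus 5 are the approximate values quoted in the example; the helper names poisson_cdf and binomial_cdf are invented for this illustration.

```python
from math import comb, exp, factorial


def poisson_cdf(k, lam):
    """P(Y <= k) when Y is Poisson with parameter lam."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def binomial_cdf(k, n, p):
    """P(X <= k) when X is binomial with n trials and error probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 200_000   # approximate number of bases in the gene (from the example)
p = 2e-5      # approximate chance of adding the wrong base (from the example)
lam = n * p   # expected number of errors, which is 4

# P(X >= 5) = 1 - P(X <= 4), approximated by 1 - P(Y <= 4).
poisson_estimate = 1 - poisson_cdf(4, lam)
binomial_exact = 1 - binomial_cdf(4, n, p)

print(f"Poisson estimate of P(at least 5 errors): {poisson_estimate:.4f}")
print(f"Exact binomial value:                     {binomial_exact:.4f}")
```

With n this large and p this small, the two numbers should agree to several decimal places, so the rounding to one significant figure reflects the rough inputs rather than any weakness in the approximation.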
So what this shows is that there's a very significant probability that when DNA polymerase epsilon creates a copy of its own gene, it's going to introduce a lot of errors. And this should be worrying, because if every time DNA is replicated a lot of errors are introduced, and in particular errors are introduced into the genes which create the proteins which do the copying, then very soon the whole thing is going to go disastrously wrong, because the new DNA isn't going to be creating the right proteins and the whole process of copying will grind to a halt. Okay, well obviously the problem can't be as bad as I've just made out, because otherwise life couldn't continue. What happens is that after the DNA polymerase enzymes have gone along creating new bits of DNA, some other enzymes come along and do some proofreading, and they find the errors and they usually manage to fix them. In fact, in the vast, vast, vast majority of cases they can put them right. Here's an incredible picture of this happening. What you can see is the DNA strand in red, and there's a mistake which you can see clearly in yellow, and the mistake means that the DNA has got a slight kink in it. The big blue thing is an enzyme which has been crawling along the DNA looking for mistakes, and here it's just found one. The shape of this enzyme means that the mistake is going to get fixed. Okay, I hope you enjoyed that example. It gives you a little flavor of some of the amazing things that are going on inside our cells. But to summarize this video, what you need to remember is that if n is large and p is close to zero, then the binomial distribution with parameters n and p can be approximated by the Poisson distribution with parameter lambda, where lambda is equal to n times p. Thank you very much for watching.
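As a supplementary check on the summary rule, the sketch below measures how far the binomial probabilities are from the Poisson probabilities as the number of subintervals grows from 10 to 20 to 100 to 1000, keeping n times p equal to 1 as in the phone call example. It is an illustrative sketch only: the function max_pmf_gap and the choice to compare counts from 0 to 10 are assumptions made here, not something taken from the video.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) when X is binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(Y = k) when Y is Poisson with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)


def max_pmf_gap(n, lam, max_count=10):
    """Largest difference in bar heights between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(max_count + 1)
    )


lam = 1.0
subinterval_counts = [10, 20, 100, 1000]
gaps = [max_pmf_gap(n, lam) for n in subinterval_counts]

for n, gap in zip(subinterval_counts, gaps):
    print(f"n={n:5d}, p={lam / n:.3f}: largest gap between bars = {gap:.5f}")
```

The gap shrinks as n increases, which is the numerical counterpart of the bars in the graphs becoming more and more nearly equal in height.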