This is a video about the criteria for using the Poisson distribution. There are four criteria. First of all, the events must occur randomly in a fixed interval of time or space. Secondly, the events must occur at a constant average rate. What this means is that events are just as likely to happen in one part of the interval as they are in any other part. Or more precisely, if you divide the interval into equal subintervals, the number of events that you would expect to happen in each subinterval is the same. More generally, if we divide the interval into subintervals of any size, the number of events that you would expect to happen in each subinterval is proportional to the size of the subinterval. Thirdly, the events must occur independently, and fourthly, they must occur one at a time. Let's look at some examples to see how these criteria apply. First of all, imagine that you've broken down on a remote highway in a corner of America. Suppose that you've been broken down for two hours and that the random variable x is the number of cars that go past you. We can go through the criteria one at a time and see whether they apply. So first of all, it's reasonable to think that cars will go past in a random way, and we do have a fixed interval of time, two hours. So we can tick the first criterion. Secondly, cars will go past at a constant average rate. There's no reason to expect more cars to go past at the start of the two-hour period or at the end or anything like that. So we can tick the second criterion. Thirdly, cars will pass independently. In the middle of nowhere, it's very unlikely that any cars will be travelling in convoy. So if one car goes past, there's no reason to think that another car is about to go past. This means we can tick the third criterion. And finally, I've already said that the cars will go past one at a time, and so we can tick the fourth criterion. So in this case, the random variable will have a Poisson distribution. My next example is to do with email.
Suppose I choose a 30-minute interval during the working day, and x is the number of emails I receive. Let's go through the criteria again. Well, I think emails do arrive randomly, and I've specified a fixed interval of time. Secondly, over a period of half an hour, email probably does arrive at a constant average rate. It's not that emails will arrive more frequently at the start of the half hour and less frequently at the end or anything like that. The rate at which emails arrive will probably stay constant over a period of half an hour. It would be different if I chose to look at a whole day, because there must be some times of day when I receive more emails and other times of day when I receive fewer. For example, obviously in the middle of the night, I'm not going to be receiving emails from my work colleagues, whereas during working hours, I tend to receive rather a lot. I don't think that emails arrive independently, however. First of all, sometimes an email is sent to a group of people and that triggers a response from some of the recipients. And so it often happens that one email is quickly followed by some others. Also, sometimes I get an email and then I reply to it and then somebody replies to my reply. So again, when I receive one email, it increases the probability that I'll receive another email shortly afterwards. The last criterion is tricky. If you have a phone which is set to go and retrieve email every five minutes or so, it will appear as though you sometimes get emails together. But this is only because of the way that the phone is checking. In reality, you do receive emails one at a time. It's just that sometimes they're too close together for your phone to keep up. So I would say that emails do arrive one at a time. My next example is to do with births. Suppose that we look at a hospital maternity unit for a period of 24 hours and we say that X is the number of births. Let's go through the criteria.
I don't think it's the case that the events occur randomly in a fixed interval of time, because some births occur by elective cesarean and these are planned. Secondly, I don't think that births occur at a constant average rate. Elective cesareans presumably almost always happen during normal working hours. And also I think it's more likely that births will be induced during the daytime than in the middle of the night. So there are probably more births during the daytime than there are at night. Births also don't occur independently, because some births are births of twins or triplets. And of course, the birth of the second twin or the third triplet is bound to follow soon after the birth of the first one. Finally, I don't think it's always true that births occur one at a time, because presumably if twins are born by elective cesarean, they emerge at pretty much the same time. So this shows that the number of births on a maternity ward over a period of 24 hours does not have the Poisson distribution. Before we move on, there's something important to point out. Suppose that the hospital is interested only in the number of randomly occurring, genuinely unpredictable births. The number that occur in a period of 24 hours probably does have approximately the Poisson distribution. And this is because the exceptional cases, like the births of twins and triplets, only occur very rarely. And so their effect on the probabilities is only marginal. The probability for any particular number of births is probably very close to what you would calculate by assuming the Poisson distribution as a model. So if the hospital were trying to find the probability for a particular number of births, for example, so that it could plan its staffing grids, it would not be unreasonable to assume the Poisson distribution as a model and calculate the probabilities accordingly. The errors that would result from doing this would be extremely small. My next example is to do with crime.
Here's a very interesting graphic from The Guardian. It shows that since the Labour Party came to power in 1997, crime has approximately halved. It's gone down from nearly 20 million crimes per year to under 10 million. At the same time, popular impressions of crime are completely wrong. The vast majority of people think that crime has been increasing, at least on a national level. Intriguingly, only a minority thinks that crime is increasing in their local area. Anyway, suppose that we monitor a city centre for 24 hours and we say that X is the number of crimes committed. Let's go through the criteria. First of all, I think it's reasonable to say that crime occurs randomly, and I've specified a fixed interval of time. But does crime occur at a constant average rate? Well, I don't think so. Firstly, if you look at the geographical spread of crime, there are some parts of a city centre where you're more likely to find crime than others. Crime is concentrated in some very particular areas in most city centres. So this means that crime does not occur at a constant average rate. And there's another reason for this as well. It's obviously the case that crime is more likely to happen at some times of day than at others. The rate of crime is highest in the evening and lowest in the morning. Next, it doesn't seem very likely that crimes occur independently. Certain types of crime probably happen in quick succession, for example, street crime and robbery. If somebody has just robbed somebody, it's quite likely that they'll rob some other people in quick succession. And finally, I don't think that crimes happen one at a time, because again, if you think of certain types of crime, like street crime and violence in a city centre on a Friday evening, you probably get quite a lot of criminal activity happening simultaneously. So the number of crimes in a city centre over a period of 24 hours won't follow the Poisson distribution.
My last example is to do with the manufacture of computer chips, like this one in the centre of the screen. First I need to tell you something about how these are made. If you look inside the casing, you would see something like this. But you probably need to have a sense of what's going on up close. If we were able to zoom in really close, you'd see something like this. What you've got here is a silicon base that's full of transistors. In this picture, each of the red things is a transistor. The transistors are connected together by copper wires. A modern computer chip has hundreds of millions, or in fact probably billions of transistors, all connected together with tiny, tiny, tiny wires. So how are these made? Well, the process starts with an ingot of silicon, and you can see one in this picture. The silicon is then chopped into very thin wafers. The wafers are then prepared with transistors, and a variety of engineering processes are used to connect these together with copper. A silicon wafer actually ends up with many, many, many chips on it. This one here has got hundreds on it. And the wafer is cut up into individual chips at the end of the process. Now the manufacturing process is carried out by the most precise and sophisticated machinery. And it happens in an ultra clean and sterile environment, where people have to wear suits like this and breathe their own special air. But nevertheless, things always go wrong, and not all of the chips on the surface of a silicon wafer will be perfect. In fact, wafers always have flaws on them. And before they're cut up into chips, the chips have to be tested to find out which ones of them work and which don't. After this, many of them are thrown away, and only the ones which actually work are then passed on to be made into computers. So suppose that a silicon wafer is produced, and x is the number of flaws somewhere on the wafer. Let's see if this has a Poisson distribution. 
Well, first of all, the flaws do occur randomly. In this case, in a fixed interval of space, the surface of the wafer. Secondly, they do occur at a constant average rate, because flaws are just as likely to occur in one part of the wafer as any other. If you take any portion of the wafer and compare it to a different portion of the same size, you would expect the same number of flaws in each piece. Thirdly, flaws do occur independently. If you've got one flaw somewhere on the wafer, it's neither more nor less likely that you'll find another one close by. And finally, flaws do occur one at a time. So this means that the number of flaws on the surface of the wafer does have the Poisson distribution. Suppose now that we look at a different random variable, though. So instead of x being the number of flaws, we say that x is the number of faulty chips. Let's see if this has the Poisson distribution. Well, is it the case that the events occur randomly in a fixed interval of time or space? Well, no, it's not. We're not really dealing here with the situation where we've got events occurring randomly at a particular time or place. This isn't a random variable with the Poisson distribution at all. This is more likely to be a random variable with the binomial distribution, because here we're dealing with a number of successes or failures in a sequence of trials. Let's go through the criteria for the binomial distribution and see whether they apply. Well, first of all, the number of trials is fixed. We've got a certain number of chips on the surface of the wafer, and these chips constitute the trials. Secondly, each trial does have the same two possible outcomes. Either the chip works or it doesn't work. Thirdly, the trials are independent. If we know that one chip works, this doesn't raise or decrease the probability that any neighbouring chip works. And finally, the probability of a chip working is the same at any point of the wafer. 
All the chips have the same probability of working properly. So this illustrates something important. When we were thinking about the number of flaws on the wafer, this had the Poisson distribution, because there we were dealing with the number of events that were occurring randomly in a fixed interval of space. But when we were interested in the number of faulty chips, there we had something which had the binomial distribution, because then we were considering the number of successes in a sequence of trials. Okay, so you need to remember the criteria for using the Poisson distribution. The first is that the events must occur randomly in a fixed interval of time or space. The second is that they must occur at a constant average rate, and you need to remember what that means. Thirdly, they must occur independently. And finally, they must occur one at a time. I hope you found this video useful. Thank you for watching.
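As a rough companion to the wafer example, here is a minimal Python sketch of the two models side by side. The numbers in it are made up for illustration and do not come from the video: a mean of 3 flaws per wafer, 200 chips per wafer, and a probability of 0.02 that any one chip is faulty. The function names are also just illustrative. The simulation builds the Poisson criteria in directly: the interval is split into many tiny slots, each slot holds at most one event (one at a time), every slot has the same small probability (constant average rate), and the slots do not influence each other (independence).

```python
import math
import random
import statistics


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) when X counts events with the given mean per interval."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when X counts successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_counts(mean: float, slots: int, trials: int, seed: int = 0) -> list[int]:
    """Simulate event counts in a fixed interval split into tiny slots.

    Each slot independently contains one event with probability
    mean / slots, and never more than one event.
    """
    rng = random.Random(seed)
    p = mean / slots
    return [sum(rng.random() < p for _ in range(slots)) for _ in range(trials)]


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    mean_flaws = 3.0   # assumed average number of flaws per wafer
    chips = 200        # assumed number of chips on a wafer
    p_faulty = 0.02    # assumed probability that a given chip is faulty

    # Number of flaws on the wafer: events in a fixed region, so Poisson.
    for k in range(4):
        print(f"P({k} flaws) = {poisson_pmf(k, mean_flaws):.4f}")

    # Number of faulty chips: successes in a fixed number of trials, so binomial.
    for k in range(4):
        print(f"P({k} faulty chips) = {binomial_pmf(k, chips, p_faulty):.4f}")

    # Simulated counts: for a Poisson variable the mean and variance agree.
    counts = simulate_counts(mean_flaws, slots=1000, trials=2000)
    print("sample mean:", statistics.mean(counts))
    print("sample variance:", statistics.variance(counts))
```

If the criteria hold, the sample mean and sample variance of the simulated counts should both come out close to the chosen mean. Changing the simulation so that one event makes another more likely, as with the emails or the twins, would be one way to see the counts drift away from the Poisson probabilities.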