Hello! Today we're going to be talking about discrete random variables. This covers chapter four in the statistics book, and I really want you to go through and look at the examples given for each of the topics we're talking about. These topics may not make sense immediately, but if you see some examples that go along with them, especially for the distributions, you will have a much better idea of what a random variable is, how these different distributions are used, and what we learn from them. So make sure you go through chapter four and look at the examples, especially of how the distributions are used. First: a random variable describes the outcomes of a statistical experiment in words. For example, a student takes a 10-question true-false quiz. Because the student has such a busy schedule, he or she could not study and guesses randomly at each answer. I hope this is not you guys, okay? So what is the probability of the student passing the test with at least a 70%? Here we are describing a discrete random variable in words: instead of describing it as a specific value or number, we describe what it represents. Next example: a small company might be interested in the number of long-distance phone calls its employees make during the peak time of the day. Suppose the average is 20 calls. What is the probability that the employees make more than 20 long-distance phone calls during the peak time? Again, we are using words to describe the quantity of interest, in this case the number of long-distance phone calls. So discrete random variables are normally described in words rather than numbers.
Now, a little bit about random variable notation. Uppercase letters like X and Y denote random variables (you can choose any letter for your variable, of course), while lowercase letters denote the values of the random variable. So if you see an uppercase X, it denotes the random variable, which is described in words, and the lowercase x denotes a value of that random variable, which is an actual number: the value that we measure or observe. If X is a random variable, then X is written in words and x is given as a number. Next, a discrete probability distribution function has two characteristics. Again, whenever I talk about these distributions, think about their characteristics and then go to the book and look at the examples of each, because it gives a lot of very good ones. The two characteristics: each probability is between 0 and 1, and the sum of the probabilities is 1. That means all of the probabilities in the distribution have to add up to 1, so they are related to each other because they belong to the same probability distribution. An example of this, also from your book (there are many more): a child psychologist is interested in the number of times a newborn baby's crying wakes its mother after midnight. For a random sample of 50 mothers, the following information was obtained.
Let x equal the number of times per week a newborn baby's crying wakes its mother after midnight; for this example x = 0, 1, 2, 3, 4, or 5. P(x) is the probability that the random variable X takes on the value x. The table here gives the probability distribution for this discrete random variable. We can check that it is a valid discrete probability distribution: each value in the P(x) column is between 0 and 1, and if you add them all up, you will find the sum is 1. So x takes on the values 0 through 5, each P(x) is between 0 and 1, and the sum of all of the probabilities is 1, which tells us it is a discrete probability distribution. Again, really the best way to learn this is to go through all of the examples and make sure you understand what I mean by a discrete probability distribution. What is the distribution? The distribution is what we've observed, summarized in this table; make sure you understand how it works and how it would differ from other distributions. We're not getting to the other named distributions just yet. First, mean and standard deviation: the expected value of an experiment is the long-term average, or mean. Like I talked about before, the more data we have, the more accurate or confident we can be in our probability estimates. The expected value is what we calculate, and the observed values, if we have enough data, will eventually approach it. Depending on your measurements, you may or may not actually observe the expected value.
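Since the table itself isn't reproduced in this transcript, here is a minimal sketch of the two-characteristics check using illustrative probabilities for the wakings-per-week example (the specific fractions are assumed, not taken from the table):

```python
# Check the two characteristics of a discrete probability distribution:
# each P(x) is between 0 and 1, and the probabilities sum to 1.
# x = number of times per week the baby's crying wakes its mother.
# These probabilities are illustrative stand-ins for the book's table.
dist = {0: 2/50, 1: 11/50, 2: 23/50, 3: 9/50, 4: 4/50, 5: 1/50}

each_in_range = all(0 <= p <= 1 for p in dist.values())
total = sum(dist.values())

print(each_in_range)          # True
print(abs(total - 1) < 1e-9)  # True
```

If either check failed, the table would not be a valid discrete probability distribution.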
You may not get the expected value with only a few samples; the more samples you have, the closer you are likely to be to it. This is the law of large numbers again, and it still applies here, so we want as many samples as possible. Think about flipping a coin, like I talked about before: if you flip a coin 100 times, most likely heads and tails will not come out even. They won't be 50-50; you might get a 40-60 split, possibly even 30-70. They just won't be as even as we would expect. But if you get to around 10,000 or 100,000 flips, maybe even more, then the average gets closer and closer to the expected value, as long as we have many, many data points. To find the expected value, or long-term average, which we write with the symbol μ (pronounced "mu"), simply multiply each value of the random variable by its probability and add the products. Now for another distribution: the binomial distribution. Its first feature is a fixed number of trials. Think of trials as repetitions of an experiment: how many times did we actually repeat the experiment to get our results? If we do an experiment one time, depending on how it's set up, that may not give us enough data. Again, think about a coin flip. One trial might be: flip a coin and see whether it comes up heads or tails. If that is one trial, then we have to do many trials to get the amount of data we need to estimate our probability distribution confidently. So a binomial distribution features a fixed number of trials, and the letter n denotes the number of trials. There are only two possible outcomes, called success and failure, for each trial.
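The "multiply each value by its probability and add the products" rule can be sketched directly, reusing the same illustrative wakings-per-week probabilities as before (assumed values, since the table isn't reproduced here):

```python
# Expected value (long-term average) of a discrete random variable:
# mu = sum of x * P(x) over every value x the variable can take.
# Illustrative probabilities, standing in for the book's table.
dist = {0: 2/50, 1: 11/50, 2: 23/50, 3: 9/50, 4: 4/50, 5: 1/50}

mu = sum(x * p for x, p in dist.items())
print(round(mu, 2))  # 2.1
```

So with these numbers, over the long run we would expect the baby to wake its mother about 2.1 times per week on average.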
The letter p denotes the probability of success on one trial and q denotes the probability of failure on one trial, so p + q = 1. In this second feature you can start to see what's going on. It's a binomial: "bi" means two, so there are two options. We have success, we have failure, and the probability of success plus the probability of failure equals one. So we have a fixed number of trials, and the success and failure probabilities add to one. Finally, the n trials are independent and are repeated under identical conditions. We have an experiment set up where we can control the conditions, and we have n trials that are independent, which means the outcome of one trial will not affect the outcome of another trial. These are the features of a binomial distribution. The binomial distribution notation is X ~ B(n, p), which we read as: X is a random variable with a binomial distribution. The parameters are n and p, like we talked about: n is the number of trials and p is the probability of success on each trial. So we know the number of trials and the probability of success on each trial, and that fully describes the random variable X. Depending on what you are studying, think about what that would actually mean. For instance, it tells us whether we have a lot of data. Sometimes, especially in the medical field, we might only have two or three cases of somebody with a specific disease where a specific treatment was used; because we're dealing with people, we don't really see the disease very often.
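The binomial setup connects back to the quiz question from the start of the lecture: n = 10 true-false questions, p = 0.5 for each random guess, and passing means at least 7 correct. A minimal sketch of that calculation (the helper function name is my own, not from the book):

```python
from math import comb

# Binomial pmf: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Quiz example: X ~ B(10, 0.5), passing needs at least 70%, i.e. 7+ correct.
n, p = 10, 0.5
p_pass = sum(binom_pmf(k, n, p) for k in range(7, n + 1))
print(round(p_pass, 4))  # 0.1719
```

So a student guessing randomly passes only about 17% of the time, which is why studying is the better plan.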
So in that case we may not have a lot of data. If we're looking at something like successful treatments, we could use a binomial distribution to describe that, and we would accept a small number of trials, because we don't want to make people sick just to test on them; we only have the handful of patients who actually come in. Other times, when we can easily replicate our experiment, we would expect the number of trials to be relatively high, because that increases our confidence in the estimated probability of success for each trial. Next, the features of the geometric distribution. With a geometric distribution, you keep repeating what you are doing until the first success, and then you stop. So whatever you're attempting to test, you keep repeating your experiment until the first success, and then you stop. We still have success and failure: every time we attempt something, we get failure, failure, failure, and then eventually a success, and that's the point where we stop. In theory, the number of trials could go on forever if we keep failing, depending on what we're testing. And there must be at least one trial: if you get a success the first time, then you had exactly one trial. (If it only ever takes one trial, it's probably not a very interesting problem, depending on what you're testing, I guess.) The probability p of a success and the probability q of a failure are the same for each trial. So the probability of success plus the probability of failure equals one, and the probability of failure is q = 1 - p. In this way, just like the prior distribution, we can calculate the probability of success and the probability of failure from the data we collect, and they have to add up to one, because that means that on each trial something happened.
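The fail-fail-fail-then-succeed pattern above has a simple formula: the first success happens on trial k when we see (k - 1) failures followed by one success. A minimal sketch, with an assumed success probability of 0.3 just for illustration:

```python
# Geometric pmf: P(X = k) = q**(k - 1) * p, where q = 1 - p.
# X counts the trial on which the first success occurs.
def geom_pmf(k, p):
    return (1 - p)**(k - 1) * p

p = 0.3  # assumed, illustrative success probability per trial
# Probability the first success lands on trial 1, 2, or 3:
print([round(geom_pmf(k, p), 3) for k in [1, 2, 3]])  # [0.3, 0.21, 0.147]
```

Notice the probabilities shrink with each extra trial: long runs of failures before the first success become less and less likely, but never impossible.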
Geometric distribution notation: we write X ~ G(p), and we read this as: X is a random variable with a geometric distribution. The parameter is p, where p is the probability of a success on each trial. Now notice we're not explicitly told how many trials there were, but because of the way the geometric distribution works, we can reason about the number of trials from the probability of success: we fail, and once we have the first success, we stop, and that gives us our probability distribution. So this gives us very similar information to the binomial, but we use it for slightly different reasons. Again, look at the examples for more information. Okay, hypergeometric distribution features, another distribution. In this case, we take samples from two groups, and we are concerned with a group of interest, called the first group. We actually want to study this first group: we have two groups, and we're interested in one of them. From both groups combined, we sample without replacement. Remember what sampling without replacement does? If we sample without replacement, then by removing objects from the sample space we increase the probability that each remaining object is chosen. Say we have 100 objects, 100 jelly beans, and we sample one; we've taken one out, and now we have 99 jelly beans, so the probability that any particular remaining jelly bean is chosen next time is higher. So we're sampling without replacement from the combined groups, and each pick is not independent, since sampling is without replacement.
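The two-groups, without-replacement sampling just described has a counting formula: choose k items from the group of interest and the rest from the second group, out of all ways to draw the sample. A minimal sketch (function name and the concrete numbers are my own illustration):

```python
from math import comb

# Hypergeometric pmf for X ~ H(r, b, n):
# r = size of the group of interest, b = size of the second group,
# n = size of the sample drawn without replacement from the combined pool.
# P(X = k) = C(r, k) * C(b, n - k) / C(r + b, n)
def hypergeom_pmf(k, r, b, n):
    return comb(r, k) * comb(b, n - k) / comb(r + b, n)

# Illustrative numbers: 10 objects of interest, 100 others, sample of 5.
r, b, n = 10, 100, 5
# Probability exactly 2 of the 5 picks come from the group of interest:
print(round(hypergeom_pmf(2, r, b, n), 4))
```

Because the draws are without replacement, this is not a binomial calculation: each pick changes the pool the next pick comes from.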
So each pick depends on the prior picks. If I remove one jelly bean from the pot, I've increased the selection probability for the 99 remaining jelly beans, so each pick is dependent on my prior selections. That also means we're not dealing with Bernoulli trials here, because Bernoulli trials are independent, with the same success probability on every trial. Instead, we have samples from two groups, we're concerned with a group of interest, and we sample without replacement from the combined groups. The distribution notation looks much the same as before: X ~ H(r, b, n), where X is a random variable with a hypergeometric distribution; that's how we say it. The parameters are r, b, and n, where r is the size of the group of interest (the first group; remember, that's the group we want to study), b is the size of the second group, and n is the size of the chosen sample, that is, how many items we drew. So our first group, the one we're interested in, might have only 10 objects; b is the size of the second group, maybe 100 objects; and n is the size of the chosen sample, the number of items we actually took. From this sampling method we can generate the hypergeometric distribution. Next, Poisson distribution features. The Poisson probability distribution gives the probability of a number of events occurring in a fixed interval of time or space, if these events happen with a known average rate and independently of the time since the last event. Poisson distributions are used for a lot of different things, especially in investigations of event rates, but basically we're looking at probability distributions over time or space, and that's the important part.
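Given a known average rate μ for the fixed interval, the Poisson probability of seeing exactly k events in that interval has a standard closed form. A minimal sketch using the editor-style numbers (an average of 5 misspelled words per 100-page interval; the function name is my own):

```python
from math import exp, factorial

# Poisson pmf for X ~ P(mu), mu = average number of events per interval:
# P(X = k) = mu**k * exp(-mu) / k!
def poisson_pmf(k, mu):
    return mu**k * exp(-mu) / factorial(k)

mu = 5  # e.g. an average of 5 misspelled words per 100 pages
# Probability of finding exactly 3 misspellings in one 100-page interval:
print(round(poisson_pmf(3, mu), 4))  # 0.1404
```

Note that only the average rate μ is needed; unlike the binomial, there is no fixed number of trials n.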
The Poisson distribution may be used to approximate the binomial if the probability of success is small, such as 0.01, and the number of trials is large, such as a thousand or more. I thought this was best explained with an example: consider that a book editor might be interested in the number of words spelled incorrectly in a particular book. It might be that on average there are five words spelled incorrectly per 100 pages, so in this case the interval is 100 pages. This is the type of problem we can solve with the Poisson distribution, because we're interested in the probability that there will be some number of spelling errors per 100 pages. We have an interval, of space in this case, which is 100 pages, and we want to know the probability of finding an error. Poisson distribution notation: X ~ P(μ), where μ is the Greek letter mu. We read this as: X is a random variable with a Poisson distribution, and the parameter is μ, the mean for the interval of interest; in this case, the mean number of misspelled words per interval, or whatever we're measuring. So that is it for discrete random variables and the distributions that go along with them. Again, I urge you to go through the book and look at the many examples they have for each of those distributions. Each distribution is very powerful for specific types of problems, so we use them for very specific things; really go through and make sure you understand what each distribution does. So that's it for today, thank you very much.