Welcome to our lecture on probability. You may think we're going to define the word probability, but it really can't be defined. What we do define is the probability of an event. The probability of an event is the proportion of times, under identical circumstances, that the event can be expected to occur. So you see that that's more of an explanation than a definition, but it is quite mathematical, and it does relate to expected value, to expectations. It's the event's long-run frequency of occurrence over the very, very long term.

Let's say you toss a coin repeatedly under identical conditions: your hand doesn't get tired, there are no wind conditions. Then the probability of getting a head on a single coin toss, as we know, is 0.5. Half for heads and half for tails. If you repeat this process for a very, very long time, about half of the time you should get a head, which means that the probability of getting a head is 0.5. All of this, believe it or not, goes toward defining the word probability, which we can't really do. We explain probability as the probability of an event: the proportion of times under identical circumstances that that event can be expected to occur.

The other interesting thing about this topic is the difference between probability and statistics. They're kind of complementary. With statistical inference, as you know, we're only looking at a sample, and yet we're trying to draw inferences about the entire population. With probability, it's kind of the other way around. We don't need a sample, because with probability we know the entire population. We're analyzing the population, and either we have the entire population, as in a census, or we have a formula, a mathematical function that represents the population, as in a probability distribution, which will be our next topic after this one. The definition of the probability of an event that I just gave you on the previous slide is called an objective probability.
It's a mathematical probability. Long-run frequencies of occurrence, remember that. It's also considered the classical meaning of probability. We're referring to a stochastic process, a repetitive process where there are probabilities involved, obviously. Each individual outcome is not necessarily identical. When you toss a coin, it's going to be either a head or a tail. You don't know what you're going to get, and you can't predict with certainty the outcome on any particular repetition of the process, the coin toss being our example. But we can talk about relative frequencies, and using relative frequencies we get to the idea of proportions, the idea of probabilities. The individual results of a process like this are called events. The outcome is an event.

And we distinguish this from a loose, informal use of the word probability that we're not discussing in this lecture: subjective probabilities. Subjective probabilities basically just measure opinion. They measure the strength of your personal belief. So if I say, I'm pretty sure there's a 60% chance that this new product will succeed, we're not relying on data there; it really just measures my opinion. Or if I say there's a 90% chance that when I commute to school it will take me less than an hour, that's the strength of my personal belief. That's a subjective probability.

So a random variable is that which is observed as the result of a stochastic process, and a stochastic process is what we've been talking about. The random variable takes on values, usually numerical, one for each possible outcome or event from the process, and associated with each one of those values is a probability that that value will occur. For an example, take tossing a die, and I hope you all know what a die is: six sides, with one integer on each side.
The probability it'll come up one is the same as the probability it'll come up two, the same as three, the same as four, five, or six. They're all equally likely, so each one has probability one sixth.

Continuing with simple concepts here, and we'll see more about this later when we work with problems. A simple probability, also called a marginal probability, is just the probability of a single event, let's say the probability of A. A joint probability is the probability that two events will occur together, let's say the probability of A and B. A conditional probability is the probability of a particular event given that we know some other event has occurred. The vertical line means given. The probability of A given B: I know B has occurred; what's the probability of A? That's what that means.

Here are the basic rules of probability. Rule number one: the probability of an event will never be less than zero or greater than one. You're never going to have a probability that's negative, and you'll never have a probability that's greater than one. The entire universe is always one. There's no such thing as a 150% probability, and I never want to see you giving me an answer to a probability question that's less than zero. That will be one of the big no-nos. The mathematical way of writing it, as you see there: zero is less than or equal to the probability of the event A, which is less than or equal to one.

Rule two: if you take all the possible outcomes of a process and add up all the probabilities of those outcomes, the sum must equal one. So 100% is the total universe. If the process is, let's say, tossing a die, the die has six sides, and it's equally likely that any of the numbers from one to six will come up, then it's one sixth plus one sixth plus one sixth, and so on, and it equals 100%. If you toss a coin, the probability of a head is one half.
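The die example lets us check both rules in a few lines of Python. This is just an illustrative sketch; the variable names are mine, not anything from the slides.

```python
# A fair six-sided die: each face has probability 1/6 (illustrative sketch).
die_probs = {face: 1/6 for face in range(1, 7)}

# Rule 1: every probability is between 0 and 1.
assert all(0 <= p <= 1 for p in die_probs.values())

# Rule 2: the probabilities of all possible outcomes sum to 1.
print(round(sum(die_probs.values()), 10))  # 1.0
```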
The probability of a tail is one half. Together, the sum of the probabilities is one.

We're going to learn the rule of addition. If events A and B are mutually exclusive, which simply means that if A occurs you can't have B, and if B occurs you can't have A, then the rule is simple: the probability of A or B is the probability of A plus the probability of B. Simple as that. An example of mutually exclusive events would be head and tail. You can't have both. If somebody says, flip a coin, what did you get? and you say, I got a head and a tail, I'd think you're crazy. You get a head or you get a tail; you can't have both.

What happens if they're not mutually exclusive? This is the general formula: the probability of A or B is the probability of A plus the probability of B minus the probability of A and B. However, bear in mind that if A and B are mutually exclusive, that last term, the probability of A and B, is zero. In fact, that's the definition of mutually exclusive: the probability of A and B is zero, because they can't occur together.

Let's take the example of tossing a die. What's the probability of getting a one or a two? That again is a perfect example of mutually exclusive events. You can't get a one and a two; you get a one or you get a two, you can't get both. You toss a die: the probability of a one or a two is the probability of one, which is one sixth, plus the probability of two, which is one sixth. That probability then is one third.

A Venn diagram is an easy way to describe mutually exclusive versus non-mutually exclusive. On the left, you see A and B not intersecting. They're separate; A and B are mutually exclusive. Notice the diagram on the right shows A and B not mutually exclusive, because they intersect: the probability of A and B is not zero. There's an overlap, and that's called the intersection. That's the probability of A and B.
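The addition rule can be written as a tiny helper function; this is only a sketch, and the function name is mine. The die example serves as a check.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events, P(A and B) = 0 and the last term drops out.
    """
    return p_a + p_b - p_a_and_b

# Die example: P(one or two), two mutually exclusive outcomes.
print(prob_a_or_b(1/6, 1/6))  # 0.3333333333333333, i.e. one third
```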
On the Venn diagrams, you can see that the probability of not A (A' means not A) is just one minus the probability of A. You can also see that the probability of not A or not B is one minus the probability of A and B. You can see this very clearly from the diagram.

Let's look at an example with numbers. You go to a college that allows double majors. 10% of the students are majoring in accounting, 15% are majoring in business. A is the accounting major, B is the business major, and 3% are double majors, majoring in both A and B. Then you're asked: what is the probability of being an accounting or business major in that school? Note, you can't simply add 10% and 15%, because you'll be double counting. The students who are majoring in both will be on both lists. To avoid that problem, here's the formula: the probability of A or B is the probability of A plus the probability of B minus the probability of A and B. So it's 0.10 plus 0.15 minus 0.03, and that's 0.22. Again, if you don't subtract that 0.03, you've counted those people twice.

The rules of multiplication are used when you want to determine what's called a joint probability, the probability that two events will occur together. The probability of A and B is the probability of A times the probability of B, but only if the events A and B are independent. Events A and B are independent if knowledge of the occurrence of B has no effect on the probability that A will occur. We'll explain this on the next slide.

The probability of independent events. So let me define independent events: A and B are independent if knowledge of the occurrence of B has no effect on the probability that A will occur. Notice, even the word in English, independent, means they're unrelated. Here are some examples. Suppose I ask you, what is the probability of having blue eyes given that you're male? It's the same, because for men and women, for any group, the probability of blue eyes is constant.
So we say they're independent. Eye color and gender are independent. On the other hand, the probability of being over 6 feet tall given that you're female is not the same as the probability of being over 6 feet tall when you're including everybody. The minute you know that somebody is male, that's given male, you know the chance that they're over 6 feet tall is more than if you know that they're female.

Now look at example 3. The probability of getting into a car accident given that you're under 26 is not the same as the probability of getting into a car accident in general. And that's why you pay more for insurance if you're under 26. Now look at example 4. The probability of getting a head on the second toss of a coin, given that you got a head on the first toss, is still 0.50. They're independent. In fact, if you get four heads in a row, the probability of getting a head on the fifth trial is still 50%. It doesn't change. What happens on the first toss does not affect the second toss, does not affect the third toss.

And here we're going to learn some basic rules. In general, this is called the multiplication rule: the probability of A and B is the probability of A given B times the probability of B, or the probability of B given A times the probability of A. That's the general rule. The probability of A given B is called the conditional probability. If A and B are independent, that term, the probability of A given B, is just the probability of A. So if you look at the formula above, you'll see that if A and B are independent, the probability of A given B is just the probability of A, and likewise the probability of B given A is just the probability of B. So rule 5, conditional probability: if you look at the formula in 4B, you can see how to compute the conditional probability. The probability of A given B is the probability of A and B divided by the probability of B.
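Rule 5 can be sketched as a one-line helper (the name is mine, for illustration), using two fair coin tosses as a check on the independence claim above.

```python
def conditional(p_a_and_b, p_b):
    """Rule 5: P(A|B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

# Two fair coin tosses: P(head on both) = 0.25, P(head on first) = 0.5.
p = conditional(0.25, 0.5)
print(p)  # 0.5, the same as P(head): the second toss is independent of the first
```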
Bayes' theorem is an optional topic. But if you look at it, you'll see that when you want to figure out the probability of A given B, you can use those formulas there, and you'll see what Bayes' theorem is about. I'm going to discuss it without holding you responsible for it.

Let's look at a little problem. In a small village in upstate New York, we are looking at readership of two newspapers, the New York Times and the Wall Street Journal. We know that the probability that an individual reads the New York Times, that's P of T, is 0.25. We know that the probability that an individual reads the Wall Street Journal, that's event W, P of W, is 0.20. But those events are not mutually exclusive; some people could read both. And indeed, the joint probability of both events, reading the New York Times and reading the Wall Street Journal, is 0.05. So we know there is an overlap.

Now the question: what's the probability of reading anything? What's the probability of being either a New York Times reader or a Wall Street Journal reader, of reading at least one of those newspapers? Well, we have to use the formula. Remember, we have a formula for this; that's why we have them, because sometimes the formula is the easiest way to figure these things out. The probability of T or W is equal to the probability of T plus the probability of W minus the joint probability, the probability of the intersection. Whenever you have events that overlap, you take away that overlap piece, the joint probability, because when you add the two probabilities, you don't want to count it twice. We saw that before and we see it again in this example: 0.25 for the probability of T, plus 0.20 for the probability of W, minus 0.05 for the intersection, and all together that comes out to 0.40. The probability of reading either newspaper is 40%.

Another way to look at this problem is graphically, by using the Venn diagram.
There, instead of probabilities, we're looking at the universe as 100%, or 100 people, let's say, to represent 100%. The number of people out of 100 who don't read anything is 60, which makes sense, because on the previous slide we saw that the probability of reading anything was 40%. How does that break down? The probability of the overlap is 0.05, or 5 out of 100 people. 5 plus 15 is 20 people out of 100 who read the Wall Street Journal. 5 plus 20 is 25 out of 100 who read the New York Times. 20 people read the Times but not the Wall Street Journal, and so on. So we can really see how that's laid out when we graph this in a Venn diagram.

Okay, let's compute some more probabilities, still using that Venn diagram. The probability of not T and not W is 0.60: 60 out of 100, 60% of people don't read the New York Times and don't read the Wall Street Journal. Not T and yes W is 0.15. Yes T and not W is 0.20. The probability of T or W, we did that before; we know it computes to 0.40, or 40%. The probability of not T or not W, following the formula, ends up being 0.95, or 95%. This means that 95% of the people in this population fail to read at least one of the two papers. Only 5% read both of them, and to be in that 95%, you had to not read at least one.

So we've seen how to do this using the formulas and using the Venn diagram, and now we have another way, one we're actually going to use a lot in this topic and in a later topic: we construct a table of joint probabilities. The cells in the middle are the joint probabilities. The cells on the margins, the row totals and the column totals, are simple probabilities or marginal probabilities. So you see the probability of T and W, 0.05, and the probability of W and not T, 0.15. All the probabilities that you had in the Venn diagram are here, and then some, because we add up across the rows to get the row totals.
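Every cell of the newspaper table can be derived from the three given numbers. Here is a short sketch of that bookkeeping; the variable names are mine.

```python
# Newspaper example: derive the joint probability table from the three givens.
p_T, p_W, p_T_and_W = 0.25, 0.20, 0.05

p_T_and_notW = p_T - p_T_and_W        # reads the Times only
p_notT_and_W = p_W - p_T_and_W        # reads the Journal only
p_notT_and_notW = 1 - (p_T_and_W + p_T_and_notW + p_notT_and_W)  # reads neither
p_T_or_W = p_T + p_W - p_T_and_W      # reads at least one (addition rule)
p_notT_or_notW = 1 - p_T_and_W        # misses at least one of the two

print(round(p_T_and_notW, 2), round(p_notT_and_W, 2))  # 0.2 0.15
print(round(p_notT_and_notW, 2), round(p_T_or_W, 2))   # 0.6 0.4
print(round(p_notT_or_notW, 2))                        # 0.95
```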
We add down the columns to get the column totals. And of course, the column totals always add up to the total universe, 100%, and the row totals add up to 100% too.

It's very important to know the difference between the terms independence and mutually exclusive. Don't confuse those two terms. Independence has to do with the effect that B has on A, or A on B. So for example, if the probability of A given B is the probability of A, they're unrelated; they're independent. Mutually exclusive is more of a physical thing: the two events cannot occur together. If I look at a part, it's either defective or not defective, so the probability of it being both defective and not defective at the same time is 0. Likewise, the probability of a head and a tail, as we said before, is 0. You can't have them both. That's called mutually exclusive, and it means that the probability of A and B is 0.

And here's an example that we give you with waist size and gender. Are they independent of each other? Let's make it simple. We're looking at males and females, and we know somebody has a 24-inch waist. Generally, we can assume a person with a 24-inch waist is more likely to be female than male. How do we know that? We can look at the probabilities. The probability of having a 24-inch waist given that you're female should not be the same as the probability of having a 24-inch waist given that you're male. You can do the same, by the way, with height. If I said somebody is six feet tall, chances are they're more likely to be male than female.

So again, just to simplify things: if the probability of A and B is 0, they're mutually exclusive. If the probability of A given B is the same as the probability of A, then they're independent.

Let's talk about independence. Researchers are always testing for relationships. We want to know whether two variables are related.
For example, is there a relationship between cigarette smoking and cancer, or are they independent? We know the answer to that one. The minute we find that you're a cigarette smoker, we know that your lifespan is going to be approximately seven years less than if you're a non-smoker. Is there a relationship in general between occupation and how long you're going to live? Are they independent? The answer is they're not. The people with some of the longest lifespans are librarians. Professors also have very long lifespans. But if you're a coal miner, your lifespan is among the shortest. I think the only one that's worse is drug dealers. Coal miners, in general, are lucky if they get beyond 55. So clearly there's a relationship between occupation and longevity.

Is there a relationship between salary and how slender you are if you're a woman? Well, guess what? This is a certain kind of discrimination, obviously, but the studies show that women who are very slender earn higher salaries than women who are overweight or even normal weight. For a woman, being slender is a plus in terms of salary. How about dates and how you look? They've done studies of this type, and they found that women with blonde hair get more dates on dating sites like e-Harmony than women with dark hair. Men who are lawyers get many more dates than men in blue-collar occupations. So again, this is something we want to study: relationships between variables.

This is called a contingency table. Look, we have smoker and non-smoker. Notice S' means non-smoker. And then we have died of cancer, C, and C' means they didn't die of cancer. Notice there are 1,000 people. Now, these are not probabilities; probabilities go from 0 to 1. We're looking at the frequencies. The contingency table shows you the frequencies. If you divide all the numbers by 1,000, then you'll have probabilities. Why 1,000?
That's the sample size; there are 1,000 people. You can add up the rows or the columns and they add up to 1,000. So divide everything through. Now, from this table, once you divide through by 1,000, you can get all the joint probabilities. I'm going to show you a table with that in the coming slides.

So here, look at the joint probabilities. Let's take C and S. There are 100 people out of 1,000; that's 0.10. So 10% of the people here were smokers who died of cancer. Look at the probability of not C and S, that is, C' and S. That was 300 out of 1,000, which is 0.30. Let's look at one more: the probability of not being a smoker and not having cancer. That's 550 out of 1,000, or 0.55. These are called joint probabilities; those are the numbers in the cells. If you look at the row totals and the column totals, those are called marginal or simple probabilities. Let's look at the probability of just dying of cancer in this community. Well, 150 died out of 1,000: 0.15. How many people did not die of cancer? 850 out of 1,000, or 0.85. How many people were smokers in the sample? 400 out of 1,000: 0.40. How many were non-smokers? 600 out of 1,000, or 0.60.

A researcher looking at that table will try to determine whether smoking and cancer are independent. Independent means unrelated. So here's a simple way to do it. Look at the probability of dying of cancer. Is that the same as the probability of C given S, cancer given that you're a smoker? And is that the same as the probability of cancer given that you're a non-smoker? Let's look at the three different probabilities. Okay, the probability of C given S: using the formula, that's the probability of C and S over the probability of S. That turns out to be 0.10 over 0.40, or 0.25. In simple English, that means if you're a smoker in this community, there's a 25% chance you're going to die of cancer. Now let's look at the probability given non-smoker, S prime.
That's the probability of C and S prime over the probability of S prime, which is 0.05 divided by 0.60, or 0.083. Notice that 0.15, the probability of dying of cancer, which includes everybody, is a weighted average of those two numbers. So you can see clearly that cancer and smoking are not independent. There's some kind of relationship. In simple English: if you're a smoker, there's a 25% chance you're going to die of cancer; if you're a non-smoker, about 8%. It doesn't take a genius to figure out there's some kind of relationship here, and those who smoke are much more likely to die of cancer than those who do not.

Here's another way of doing the same problem. You want to know whether smoking and cancer are independent. By definition, if C and S are independent, then the probability of C and S is just the probability of C times the probability of S. We're going to check whether that's the case in this example. The probability of C and S, we saw, was 0.10. Is 0.10 equal to the probability of C, 0.15, times the probability of S, 0.40? Those are marginal probabilities. Well, 0.15 times 0.40 is 0.06, and 0.10 is not equal to 0.06. Right away, we know that cancer and smoking are not independent. That's why you want to set this up as a joint probability table; I'll show you in a moment how to do that.

Okay, here we're going to take the original data, divide through by 1,000, and right away you can see that you have what's called a joint probability table. That's the table on the bottom. We have S and not S, smoker and non-smoker, cancer and non-cancer. Notice, by dividing through by 1,000, we have a 0.10 probability for C and S, and 0.05 for C and not S. The marginal probability of 0.15, that's just dying of cancer in general. And for not dying of cancer we have two groups: the smokers, 0.30, and the non-smokers who didn't die of cancer, 0.55. Notice the marginal probability there is 0.85.
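The whole smoking example, from frequencies to the independence check, can be sketched end to end. The dictionary layout and names below are mine, chosen just to mirror the table.

```python
# Smoking/cancer example: frequencies -> joint probabilities -> independence check.
counts = {("C", "S"): 100, ("C", "S'"): 50,
          ("C'", "S"): 300, ("C'", "S'"): 550}
n = sum(counts.values())                     # 1,000 people
p = {cell: freq / n for cell, freq in counts.items()}

p_S = p[("C", "S")] + p[("C'", "S")]         # marginal P(S)  = 0.40
p_C = p[("C", "S")] + p[("C", "S'")]         # marginal P(C)  = 0.15

p_C_given_S = p[("C", "S")] / p_S            # 0.10 / 0.40 = 0.25
p_C_given_notS = p[("C", "S'")] / (1 - p_S)  # 0.05 / 0.60 ~ 0.083

# Multiplication-rule check: independent only if P(C and S) == P(C) * P(S).
print(p[("C", "S")], round(p_C * p_S, 3))    # 0.1 0.06 -- not equal, not independent
```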
So it's always good to take the table that has the frequencies, divide through by the full sample size N, and you get the joint probability table, including the marginal probabilities. Here you can see what you gain by turning it into a joint probability table. Once you have a joint probability table, you can answer any question about probabilities. The joint probabilities are in the center of the table; the marginal probabilities are on the margins. For the conditional probabilities, all you have to do is divide: the joint probability divided by the marginal gives you the conditional.

So let's take one example. You want to know the probability of C given S. You look at the probability of S, that's a marginal, 0.40, right there on the margin. Look at the probability of C and S, that's in the table itself, and that's 0.10. You get the probability of C given S by taking the probability of C and S, the 0.10, over the probability of S, the 0.40, and you get 0.25.

Here's an example: gender and drinking beer. We're only looking at male and female in this example; there really weren't enough people identified as anything other than male and female to make it into the example. And similarly, we're only looking at whether you drink beer or not, without any gradations of light beer, pale ale, and so on. The universe that was looked at was 2,000 people. The contingency table, in other words the two-way frequency table, is on top. The joint probability table computed from the frequency table is below it. And how do you get it? Again, you take your grand total of 2,000, divide every frequency in the table by 2,000, and you get your joint probabilities in the middle of the table and your marginal probabilities in the margins.
And you can see the joint probabilities and the marginal probabilities listed there. The joint probability of being a beer drinker and male is 0.225. The marginal probability of being male, that's 900 out of 2,000, is 0.45. The marginal probability of not being a beer drinker was 1,200 out of 2,000, or 0.60. And we'll continue on the next slide.

Here we have some questions to answer about the data on the previous slide. We have the joint probability table, and we can answer any question from the table, including conditional probabilities, where we just have to find the appropriate joint probability and the marginal probability for the event after the given, and we can compute the conditional probability. So, given that an individual is male, what's the probability the person is a beer drinker? That's the probability of B given M. The joint probability of B and M is 0.225. The marginal probability of male is 0.45. Divide one by the other, and the conditional probability we're looking for is 0.5. Next one: what about the probability of being a beer drinker given female? The probability of beer and female divided by the probability of female gives you 0.318.

So we've seen that the probability of being a beer drinker given that you're male is 0.5, and the probability of being a beer drinker given that the person is female is 0.318. The probability of being a beer drinker without looking at sex at all, we know from the table, is 0.4. So are these two events, beer drinking and sex, independent? Clearly not. If they were independent, then the probability of being a beer drinker given that the person is male would be equal to the probability of being a beer drinker without knowing anything about sex.
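Those conditional probabilities can be verified in a quick sketch from the three marginal and joint figures given above; the variable names are mine.

```python
# Beer/gender example: conditional probabilities from the joint table.
p_B_and_M, p_M, p_B = 0.225, 0.45, 0.40

p_F = 1 - p_M                 # 0.55
p_B_and_F = p_B - p_B_and_M   # 0.175, since B splits into (B and M) + (B and F)

p_B_given_M = p_B_and_M / p_M
p_B_given_F = p_B_and_F / p_F

print(p_B_given_M)            # 0.5
print(round(p_B_given_F, 3))  # 0.318
```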
Likewise, the probability of drinking beer given that the person is female would be the same as the probability of being a beer drinker regardless of the sex of the individual. So clearly there's some relationship; they're not independent.

Here is a very similar problem. I'm not going to belabor it. You may want to pause the video if you're watching the video, or wait before you move on if you're using the PowerPoint, and just take a look. We've got a population here of 1,000 individuals. We have male and female, and instead of beer, we're looking at whether they use Dove soap or not. Of the 1,000 people, 200 use Dove soap and 800 do not. 400 are male, 600 are female. 480 of the females do not use Dove soap. 80 males do use Dove soap. That's the data in the contingency table.

We can compute probabilities. The joint probabilities: the probability of Dove soap and male, 80 over 1,000, is 0.08. As a matter of fact, since the grand total is 1,000, all we have to do to create the probability table is move over the decimal places. The probability of using Dove soap is 0.2. The probability of female is 0.6. The row totals and the column totals each add up to 1. And can we also get conditional probabilities? Sure. We just take a joint probability from the center of the table and divide by a marginal probability.

Let's see how that works. We use the conditional probabilities to test for independence, because that's actually the definition of whether events are independent: if we know that one occurs, does that change the probability that the other one occurs? Well, if we're looking at the probability of using Dove soap, the marginal probability, the probability of that single event all by itself, is 0.2. The probability of using Dove soap given that you're male computes to 0.2. The probability of using Dove soap given that you're female also computes to 0.2. It's 0.2, 20%, in every case. So yes, these two events are independent.
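That conditional-probability test can be run directly from the frequencies; here is a sketch with my own variable names (the female Dove-user count, 120, follows from 600 females minus the 480 who don't use it).

```python
# Dove soap example: test independence via conditional probabilities.
n = 1000
dove_male, dove_female = 80, 120     # Dove users by sex
males, females = 400, 600

p_D = (dove_male + dove_female) / n  # marginal P(D) = 0.2
p_D_given_M = dove_male / males      # 0.2
p_D_given_F = dove_female / females  # 0.2

# All three agree, so Dove use and sex are independent.
print(p_D, p_D_given_M, p_D_given_F)  # 0.2 0.2 0.2
```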
The alternative method uses the formula for computing joint probabilities, the multiplication rule. If two events are independent, we can get the joint probability by multiplying the two simple probabilities, the marginal probabilities. Well, the probability of male and using Dove soap is 0.08. The probability of male is 0.4. The probability of D, using Dove soap, is 0.2. So then we test. Here's how we test: is the joint probability, 0.08, equal to the product of the two marginal probabilities, 0.4 times 0.2? That product is also 0.08. So yes, this second method gives us the same answer: the two events are independent.

Always do as much homework as you can find. Do as many problems as you can find. Do more than what's required, because it will stand you in good stead. You'll be able to do your problems quicker and more thoroughly. You'll know when you know something, and you'll know when you don't. Practice, practice, practice.