As Salaamu Alaikum, welcome to lecture number 22 of the course on statistics and probability. You will recall that in the last lecture, I discussed with you the concept of independent events, and we also considered the special case of the multiplication theorem for independent events. We did an interesting example concerning the gender of newborn babies and the nature of birth, live born versus still born, and through this example we also noticed that there is an important concept called marginal probability. In today's lecture, I will begin with Bayes' theorem. This theorem deals with conditional probabilities in an interesting manner. As you now see on the screen, Bayes' theorem states that if the events A1, A2 and so on up to Ak form a partition of a sample space S, that is, the events Ai are mutually exclusive and their union is S, and if B is any other event of S such that it can occur only if one of the Ai occurs, then for any i, P of Ai given B is equal to P of Ai into P of B given Ai, divided by the summation of P of Ai into P of B given Ai for i going from 1 to k. Students, I am sure that you are saying that this theorem is very difficult. Actually, it is not so. If you look at it step by step, as I have stated so many times earlier, everything will fall into place. The formula that I just presented to you is for the case of k mutually exclusive and exhaustive events. It can be written in the manner that you now see on the screen. P of Ai given B is equal to P of Ai into P of B given Ai, divided by P of A1 into P of B given A1 plus P of A2 into P of B given A2 plus so on up to P of Ak into P of B given Ak. If you expand the summation in the denominator, this is the formula that you get. If we take the case when k is equal to only 2, then the formula becomes P of Ai given B is equal to P of Ai into P of B given Ai over P of A1 into P of B given A1 plus P of A2 into P of B given A2.
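For reference, the spoken formula can be written out in symbols. This is only a transcription of the statement above, using the same events A_1, ..., A_k and B; the subscript notation is a typesetting choice and not something shown in the lecture.

```latex
% Bayes' theorem for a partition A_1, ..., A_k of the sample space S
P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{k} P(A_j)\,P(B \mid A_j)},
\qquad i = 1, 2, \dots, k.

% Special case k = 2
P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}
                     {P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2)},
\qquad i = 1, 2.
```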
Let me explain this to you with the help of an example. In a developed country where cars are tested for the emission of pollutants, 25 percent of all cars emit excessive amounts of pollutants. When tested, 99 percent of all cars that emit excessive amounts of pollutants failed, but 17 percent of the cars that did not emit excessive amounts of pollutants also failed. What is the probability that a car that fails the test actually emits excessive amounts of pollutants? Students, I will analyze this problem step by step and see how the formula of Bayes' theorem that I have just presented to you applies to it. The first thing is that, as you have seen, 25 percent of the cars emit excessive amounts of pollutants. So, it is obvious that 75 percent of the cars do not emit excessive amounts. Now, what was our condition for Bayes' theorem? That A1 and A2 form a partition of the sample space. A1 represents the event that the car does emit excessive amounts of pollutants, whereas A2 represents the event that it does not. So, together these two events are mutually exclusive and exhaustive, and hence they do form a partition of the sample space. This is the first thing. The next thing is that Bayes' theorem speaks of another event B which occurs only if one of the Ai occurs. In this example, that other event is that the car fails the test. The car either emits excessive amounts or it does not, and in either case it may fail the test. Therefore, this is the event which will be represented by B. Now that we have identified A1, A2 and B, let us look at the formula again. As you see on the screen, the formula for the case of two mutually exclusive events A1 and A2 is: probability of Ai given B is equal to probability of Ai into probability of B given Ai, divided by P of A1 into P of B given A1 plus P of A2 into P of B given A2. Students, in the denominator you can see that there is no problem.
That is, A1, A2 and B are all mentioned there, and we have already identified them. But you may be confused about what this Ai is. This formula can be applied two times. If you take i equal to 1, then naturally the formula will read as you now see on the screen: P of A1 given B is equal to P of A1 into P of B given A1 over the same denominator as before. And if we put i equal to 2, we have P of A2 given B is equal to P of A2 into P of B given A2 divided by the same denominator. Now, the question is whether in this particular example we want to compute P of A1 given B or P of A2 given B. If we go back to the question of this example, it said: what is the probability that a car that fails the test actually emits excessive amounts of pollutants? This can also be stated as: what is the probability that the car emits excessive amounts of pollutants given that it has failed the test? Since A1 is the event that the car does emit excessive amounts and A2 the event that it does not, what we want is the probability of A1 given B. So, applying the formula that I presented to you, probability of A1 given B is equal to probability of A1 into probability of B given A1 over P of A1 into P of B given A1 plus P of A2 into P of B given A2. In this question, we have available to us the values of P of A1, P of B given A1 and P of B given A2.
As you see on the screen once again, the statement was: when tested, 99 percent of all cars that emit excessive amounts of pollutants failed, but 17 percent of the cars that do not emit excessive amounts of pollutants also failed. These two statements can also be read as follows. The probability that a car will fail given that it emits excessive amounts is 99 percent, whereas the probability that a car will fail given that it does not emit excessive amounts of pollutants is 17 percent. In other words, P of B given A1 is 0.99 and P of B given A2 is 0.17. So, students, we can now substitute these values in the formula, and as you see on the screen, probability of A1 given B is equal to 0.25 into 0.99 over 0.25 into 0.99 plus 0.75 into 0.17. Here 0.25 is the probability that a car emits excessive amounts of pollutants, so naturally the probability that the car does not emit excessive amounts will be 0.75, because the two events are the complements of each other, and the rest of the explanation we have already discussed. So, having applied the formula, the final result is 0.66, that is, the probability is 66 percent that a car that fails the test actually emits excessive amounts of pollutants. Students, the example that we have just considered was the simplest case, when k is equal to 2 only, but of course it can be extended to the case when k is equal to 3, 4 or more.
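The substitution above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are chosen here for illustration and the numbers are the ones stated in the example.

```python
# Bayes' theorem for a two-event partition:
# A1 = car emits excessive pollutants, A2 = car does not, B = car fails the test.
p_a1 = 0.25          # P(A1), given in the example
p_a2 = 1 - p_a1      # P(A2), the complement of A1
p_b_given_a1 = 0.99  # P(B | A1)
p_b_given_a2 = 0.17  # P(B | A2)

# Denominator of the formula: the total probability of failing the test, P(B).
p_b = p_a1 * p_b_given_a1 + p_a2 * p_b_given_a2

# Probability that a car which failed the test really emits excessive amounts.
p_a1_given_b = p_a1 * p_b_given_a1 / p_b

print(round(p_b, 4))           # 0.375
print(round(p_a1_given_b, 2))  # 0.66
```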
Let us consider another example. In a bolt factory, 25 percent of the bolts are produced by machine A, 35 percent are produced by machine B and the remaining 40 percent are produced by machine C. Of their outputs, 2 percent, 4 percent and 5 percent respectively of the bolts are defective. If one bolt is selected at random and it is found to be defective, what is the probability that it came from machine A? Students, notice that in this example we have three mutually exclusive and exhaustive events, that is, the bolt is produced either by machine A or by B or by C. There is no machine other than these, and 100 percent of the bolts, that is, all of the bolts, are covered by these three machines: 25 percent by A, 35 percent by B and 40 percent by C. So, our basic requirement was that A1, A2, A3 and so on should be mutually exclusive and exhaustive, and you should note that this condition is being fulfilled here. A bolt can come either from machine A or from machine B or from machine C. It cannot happen that we say at the same time that it is coming from machine A and also from machine B. Obviously, any one bolt that we pick up will be coming from one of the three machines. The events are exhaustive because 25 percent plus 35 percent plus 40 percent is 100 percent, that is 1. So, students, our basic condition that A1, A2, A3 should form a partition of the sample space is being fulfilled here. And the other event, which we will call B in this example, is that the bolt is defective, that is, the bolt that was picked up at random came out to be defective. And what do we want to find? What is the probability that the bolt came from machine A given the fact that it is defective? So you can see that this means that we are trying to compute P of A1 given B, the probability that the bolt has come from the first machine given the information that it is defective.
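For anyone who wants to check a hand calculation of this kind afterwards, here is a minimal sketch of Bayes' theorem for a partition of any size k. The function name bayes_posteriors and the list layout are assumptions made for illustration; the figures are the ones stated in the bolt example.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(A_i | B) for every event A_i of a partition.

    priors[i]      = P(A_i), assumed to sum to 1
    likelihoods[i] = P(B | A_i)
    """
    # Numerators of Bayes' theorem, one per event A_i.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: the total probability of B.
    total = sum(joint)
    return [j / total for j in joint]


# Bolt factory: share of output and defect rate for machines A, B and C.
priors = [0.25, 0.35, 0.40]
defect_rates = [0.02, 0.04, 0.05]

posteriors = bayes_posteriors(priors, defect_rates)
for machine, p in zip("ABC", posteriors):
    print(machine, round(p, 4))
```

The first value printed is P of A1 given B, and the three values together add up to 1, as they must for a partition.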
Now, I would like to encourage you to apply Bayes' theorem to this example and to solve it yourself in order to determine this particular probability. This brings us to the end of the discussion of some of the very basic and important concepts of probability. I will now begin with you the next very important topic, and that is probability distributions. As I mentioned in the very beginning, there are two types of quantitative variables, the discrete variable and the continuous variable, and accordingly there are two types of probability distributions, the discrete probability distribution and the continuous probability distribution. I will first discuss with you the case of discrete probability distributions, and in this regard the first thing that we will do is to consider what is meant by the term random variable. As you now see on the screen, a numerical quantity whose value is determined by the outcome of a random experiment is called a random variable. A random experiment is any experiment in which the outcome is unpredictable. For example, if we toss a pair of dice, we know that there are 36 possible outcomes. We cannot say which of those 36 outcomes is going to occur. If we consider the case of tossing three coins simultaneously, students, you will recall that the sample space of this particular random experiment consists of eight outcomes: head head head, head head tail, head tail head, tail head head and so on. If we are not interested in the sequence but are only interested in the number of heads that appear, students, then we are talking about a random variable. The random variable X denotes the number of heads that appear, and in this example this number can be 0, 1, 2 or 3. If we get tail, tail, tail, it is obvious that X is equal to 0, that is, the number of heads is 0, and if we have, for example, head, head, tail, how much is X? Obviously X is equal to 2.
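The three-coin example can be enumerated directly. The sketch below lists the eight outcomes and the value of X, the number of heads, for each of them; the letters H and T and the variable names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Sample space of tossing three coins: every sequence of H and T of length 3.
outcomes = list(product("HT", repeat=3))

# The random variable X assigns to each outcome its number of heads.
x_values = {outcome: outcome.count("H") for outcome in outcomes}

for outcome, x in x_values.items():
    print("".join(outcome), "-> X =", x)

# How many outcomes give each value of X.
counts = Counter(x_values.values())
print(len(outcomes))           # 8
print(sorted(counts.items()))  # [(0, 1), (1, 3), (2, 3), (3, 1)]
```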
Now that the basic concept of a random variable is clear, the next question is: what kind of a random variable are we dealing with? Obviously, in the particular example that I just presented, X is a discrete random variable because it can take the value 0, no head, 1, one head, 2 or 3, but obviously we cannot have 2.9 heads or 1.7 heads. So, because we are dealing with a discrete random variable, the associated probability distribution will be called a discrete probability distribution, and you will be interested in noticing, as we study this topic, that there are a number of similarities between the discrete probability distribution and the discrete frequency distribution that we dealt with in an earlier lecture. Let us now study this concept in detail with the help of an example. If a biologist is interested in the number of petals on a particular flower, this number may take the values 3, 4, 5, 6, 7, 8, 9, and each one of these numbers will have its own probability. Suppose that upon observing a large number of flowers of this particular species, say 1000 flowers, the following results are obtained. The X values are 3, 4, 5, 6, 7, 8, 9 and the frequencies are 50, 100, 200, 300, 250, 75 and 25. Students, the table that you have just seen on the screen is obviously a frequency distribution, and you will say that this is then the same concept that we studied much earlier, when we were dealing with frequencies. But if you remember the relative frequency definition of probability and you relate that definition to this particular example, you will realize that the total number of flowers that have been inspected in this problem is 1000, and 1000 is a reasonably large number for us to treat the relative frequencies for the various values of X as probabilities.
After all, the relative frequency definition of probability was just this: if an experiment is repeated again and again a large number of times, then the proportion in which a certain event occurs can be regarded as the probability of that event. So, in this example, if I compute the relative frequencies, or in other words the proportion of flowers that had 3 petals, 4 petals, 5 petals and so on, these proportions are eligible to be treated as probabilities. So, as you now see on the screen, we can say that the probability that X is equal to 3 is 50 over 1000, that is 0.05, the probability that X is equal to 4 is 100 over 1000, that is 0.10, and similarly for all the other probabilities. In other words, the probability that a flower of this particular species contains 3 petals is only 5 percent, the probability that a flower of this species contains 4 petals is 10 percent, for 5 petals the probability is 20 percent, for 6 petals 30 percent, for 7 petals 25 percent, for 8 petals the probability is only 0.075, and for 9 petals it is only 0.025, that is 2.5 percent. Students, this table is an example of a discrete probability distribution. These three words that I have used, discrete, probability and distribution, let us look at them. First of all, why do we call it discrete? That is most obvious. The number of petals on a flower will be 3, 4, 5 and so on. Now, you may argue that it is possible that a flower had 6 petals but a three-fourths part of one petal has broken off, so that it no longer has a whole number of petals. It can be argued in this manner, but this is not the way that we will be interpreting it from the mathematical point of view. It will always be a whole number. After this, students, note that the total probability 1 has been distributed among the various values of X. As you once again see on the screen, 5 percent of it has been allocated to X equal to 3, 10 percent of it has been allocated to X equal to 4 and so on. And this is exactly the reason why we say probability distribution.
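The step from the frequency table to the probability table is just a division by the total number of flowers. A minimal sketch, using the frequencies quoted in the lecture (the dictionary layout is an assumption made for illustration):

```python
# Observed frequencies: number of petals -> number of flowers.
frequencies = {3: 50, 4: 100, 5: 200, 6: 300, 7: 250, 8: 75, 9: 25}

# Total number of flowers inspected.
total = sum(frequencies.values())

# Relative frequency of each X value, treated as its probability.
probabilities = {x: f / total for x, f in frequencies.items()}

print(total)  # 1000
for x, p in probabilities.items():
    print(x, p)
```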
It is just like the frequency distribution, where the total frequency was distributed among the various classes. Students, a discrete probability distribution has two basic properties. As you now see on the screen, the first property is that the probability of xi lies between 0 and 1 for each xi. And the second property states that the sum of the probabilities is equal to 1. In the example that we just considered, you can notice that each of these properties is being fulfilled, as will be the case for any discrete probability distribution. Let us now consider the graphical representation of a discrete probability distribution. Of course, in the case of the continuous variable, we talked about the histogram, the frequency polygon and the frequency curve. But, if you remember, in the case of the frequency distribution of a discrete variable, I proposed that you draw a line chart. This is exactly the chart that we will be drawing for the discrete probability distribution as well. And for the example that we just considered, we have the following situation. Since x1 is equal to 3 and the probability of this x value is 0.05, the height of the first line is 0.05. Similarly, the height of the second line is 0.10, the one for the third is 0.20 and so on. Hence, we get the line chart that you now see on the screen. And I think that you will agree with me that we can say that this particular discrete probability distribution is approximately symmetric. You have noted that I have brought in the concept of symmetry here. It is the same concept that we discussed earlier in the context of frequency distributions. But symmetry, skewness and kurtosis were discussed later. First we discussed the mean and then the spread. This means that in the case of a discrete probability distribution also, we may measure skewness later. First, we measure its mean, or central value, and its spread, or standard deviation.
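The two properties can be checked mechanically, and a rough text version of the line chart can be printed alongside. This is only a sketch; the helper name is_valid_pmf and the scale of one "#" per 0.005 of probability are assumptions made for illustration.

```python
import math


def is_valid_pmf(pmf, tol=1e-9):
    """Check the two basic properties of a discrete probability distribution."""
    # Property 1: every probability lies between 0 and 1.
    each_in_range = all(0 <= p <= 1 for p in pmf.values())
    # Property 2: the probabilities add up to 1 (allowing for rounding error).
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one


# Petal distribution from the lecture: number of petals -> probability.
petals = {3: 0.05, 4: 0.10, 5: 0.20, 6: 0.30, 7: 0.25, 8: 0.075, 9: 0.025}

print(is_valid_pmf(petals))  # True

# Rough text line chart: the length of each bar is proportional to P(x).
for x, p in petals.items():
    print(x, "#" * round(p * 200))
```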
As you see on the screen once again, this distribution is approximately symmetric. And because of that, you might agree with me that we can say that the mean number of petals for this particular example is about 6. And we are saying this because the central value of our random variable X in this problem is 6. Students, obviously this is only an estimate of the center, because I have not computed the mean according to the proper formula. How will I do that if I wish to compute the exact mean of this particular random variable? In order to understand this, I would like to draw your attention to the fact that the formula is actually very similar to the formula that we have in the case of a frequency distribution. As you will remember, the formula for X bar was sigma fX over sigma f, and this can also be written as sigma Xf over sigma f. Now, the next point to realize is that in that situation we had a column of frequencies, but now we have a column of probabilities. So, instead of f I can write P, and my formula will become sigma XP over sigma P. But sigma P, the sum of the probabilities, is 1, and hence the formula becomes sigma XP over 1. That is sigma XP, and this is exactly the formula for the mean of a discrete probability distribution. Formally speaking, the mean of a discrete probability distribution is equal to sigma X into P of X, and in this particular example, if we multiply the X column with the column of P of X, we obtain 0.15, 0.40, 1.00, 1.80 and so on. And upon adding this particular column, the mean of this particular probability distribution comes out to be 5.925. So, you have seen that our impression from the line chart, that this distribution is almost symmetric, is validated by this answer, because 5.925 is very close to 6, and if this distribution were exactly symmetrical the answer would have been exactly 6.
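The column of X into P of X and its sum can be reproduced as follows. This is a sketch of the calculation just described, with the petal probabilities from the lecture; the variable names are illustrative.

```python
# Petal distribution: number of petals -> probability.
petals = {3: 0.05, 4: 0.10, 5: 0.20, 6: 0.30, 7: 0.25, 8: 0.075, 9: 0.025}

# Column of x * P(x), one entry per value of X.
x_times_p = {x: x * p for x, p in petals.items()}

# Mean of the distribution: the sum of that column.
mean = sum(x_times_p.values())

for x, value in x_times_p.items():
    print(x, round(value, 3))
print(round(mean, 3))  # 5.925
```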
You may be confused that when I first explained this I said sigma X into P, and now I am saying sigma X into P of X. Well, that was only to explain the point that f is being replaced by P, the probability. But formally, the probability of X is always denoted as P of X. In fact, in many situations we do not even write P of X, rather we write f of X, and if you see this notation in a book, then please do not mix up f of X with f. If only f is written, then this means frequency. But if f of X is written for a discrete probability distribution, then it represents probability. Next, we consider the spread of the probability distribution. As you see on the screen, in case of a frequency distribution, the formula for the standard deviation is the square root of sigma fX square over sigma f minus sigma fX over sigma f whole square, exactly the same shortcut formula that we discussed in an earlier lecture. Now, this can also be written as the square root of sigma X square f over sigma f minus sigma Xf over sigma f whole square. But in case of a probability distribution, we will be replacing the frequencies by the probabilities, and our formula now becomes the square root of sigma X square into P of X over sigma P of X minus sigma X P of X over sigma P of X whole square. But because of the fact that the sum of the probabilities of a discrete probability distribution is 1, the values in the denominators become equal to 1 and our formula reduces to the square root of sigma X square P of X minus sigma X P of X whole square. In order to apply this formula for the example of the number of petals on the flowers of a particular species, or for any other such example, of course we should multiply the X column with the column of probabilities in order to obtain X into P of X, just as we did in the case of the mean, and we will create another column, and that is the column of X square into P of X, which will be obtained by multiplying the column of X by the column of X into P of X.
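Written in symbols, the chain of formulas just described looks as follows. This is a transcription of the spoken derivation; the symbols \bar{X}, s, \mu and \sigma follow the notation used in the lecture for frequency and probability distributions.

```latex
% Mean: frequency distribution, then probability distribution
\bar{X} = \frac{\sum f X}{\sum f}
\quad\longrightarrow\quad
\mu = \frac{\sum X\,P(X)}{\sum P(X)} = \sum X\,P(X),
\qquad \text{since } \sum P(X) = 1.

% Standard deviation: frequency distribution, then probability distribution
s = \sqrt{\frac{\sum f X^{2}}{\sum f} - \left(\frac{\sum f X}{\sum f}\right)^{2}}
\quad\longrightarrow\quad
\sigma = \sqrt{\sum X^{2}\,P(X) - \Bigl(\sum X\,P(X)\Bigr)^{2}}.
```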
So, as you see on the screen, in this case multiplying 3 by 0.05 gives us 0.15, and multiplying 0.15 by 3 we get 0.45. The sums of these columns are 5.925 and 36.925, and substituting them in the formula we obtain the standard deviation of X equal to approximately 1.3. Students, the notation for the mean of a probability distribution is mu and the notation for the standard deviation is sigma. The sigma that I am talking about now is the small sigma of the Greek alphabet; as you know, capital sigma stands for summation. Now that we have computed the mean and the standard deviation of our probability distribution, students, we can represent them on the graph of our distribution in exactly the same manner as we did in case of a discrete frequency distribution. So, as you now see on the screen, the mean of this particular probability distribution is 5.925 and the standard deviation is represented by a horizontal line segment which is equal to 1.3 units. Now that we have computed both the standard deviation and the mean of our distribution, students, is it not obvious that we can also compute the coefficient of variation? As you remember, in case of a frequency distribution the coefficient of variation is defined as S over X bar into 100. So, in the case of a probability distribution it will be defined as sigma over mu into 100. And as you now see on the slide, the coefficient of variation for this particular example comes out to be 21.9 percent. Students, in the example that we have just done, you will have noted that the probabilities were arrived at by the relative frequency definition of probability. But you already know that there are certain situations where probabilities are arrived at according to the classical definition. So, let us consider another example in which we obtain the probabilities using the classical definition of probability.
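The two column sums, the standard deviation and the coefficient of variation can be reproduced with the sketch below. Note one assumption about rounding: the lecture's figures of 1.3 and 21.9 percent are consistent with rounding the standard deviation to one decimal place before dividing by the mean, so the code shows both the unrounded and the rounded versions.

```python
import math

# Petal distribution: number of petals -> probability.
petals = {3: 0.05, 4: 0.10, 5: 0.20, 6: 0.30, 7: 0.25, 8: 0.075, 9: 0.025}

# Sum of the x * P(x) column (the mean, mu).
mean = sum(x * p for x, p in petals.items())

# Sum of the x^2 * P(x) column.
mean_of_squares = sum(x * x * p for x, p in petals.items())

# Shortcut formula: sigma = sqrt( sum x^2 P(x) - (sum x P(x))^2 ).
sd = math.sqrt(mean_of_squares - mean ** 2)

# Coefficient of variation: sigma over mu into 100.
cv = sd / mean * 100

print(round(mean, 3))             # 5.925
print(round(mean_of_squares, 3))  # 36.925
print(round(sd, 2))               # 1.35
print(round(cv, 1))               # 22.8

# With sigma rounded to one decimal place first, as the quoted figures suggest.
sd_rounded = round(sd, 1)
print(sd_rounded)                        # 1.3
print(round(sd_rounded / mean * 100, 1))  # 21.9
```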
As you now see on the screen, if we are interested in finding the probability distribution of the sum of the dots when two fair dice are thrown, we are going to apply the classical definition of probability. In this problem we are also interested in utilizing this probability distribution in order to find the probabilities of obtaining, number 1, a sum that is greater than 8 and, number 2, a sum that is greater than 5 but less than or equal to 10. In order to solve this problem, students, the first thing to realize is that if the dice are fair, this is going to be the case of equally likely outcomes and we are going to be able to apply the classical definition of probability. We do make this assumption: we will assume that the dice are properly made and that they are fair. So we apply this definition, and according to it we have to compute probabilities of the form m over n, where n is the total number of possible outcomes whereas m is the number of outcomes favoring what we want. So, in this problem we proceed step by step. First of all, the sample space of this experiment: as you remember, the total number of possible outcomes is 36. Which ones are they? (1, 1), (1, 2), (1, 3) and so on. After that, we have to note that the random variable we are interested in is the sum of the dots on the two dice, and students, you will realize immediately that it is a discrete random variable. Why? Because 1 plus 1 is 2, 1 plus 2 is 3 and so on, and the sum can never be 2.7 or 4.9. In every case the sums are going to be whole numbers, and they range from 2 to 12: 1 plus 1, as I said, is 2, and the last outcome, 6 and 6, gives us a sum of 12. Now that the sample space has been determined, let us talk about the probabilities and the table that we have to construct in order to obtain the probability distribution.
So, as you see on the screen, we will first create the column of x, which goes from 2 to 12, and then we create the column of probabilities P of x, which is to be filled out in such a way that the sum of the column will come out to be 1. Now, in order to fill out this column, of course we should count the number of outcomes that favor a sum of 2, a sum of 3, a sum of 4 and so on. As you see on the screen, the number of outcomes that favor a sum of 2 is only 1, and that is (1, 1). So, the probability that X is equal to 2, where X represents the sum of the dots on the two dice, is 1 by 36. Similarly, in order to compute the probability that the sum is equal to 3, we notice that there are two outcomes that favor this particular sum, and they are (1, 2) and (2, 1). Hence, the probability that X is equal to 3 is 2 by 36. Similarly, we can compute all the probabilities, and if you look at the sample space you will notice something very interesting: if you keep going along the diagonals, all the outcomes on a particular diagonal give the same sum. So, as you now see on the screen, the probability distribution for this particular example is: x is equal to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and the probabilities are 1 by 36, 2 by 36 and so on. They keep on rising until they attain the value 6 by 36 against a sum of 7, and then they decline, and we have 5, 4, 3, 2 and 1 by 36. As you have noted, the numerators of these probabilities increase in a steady fashion and then decline in the same manner, in a mirror image way, and hence I hope you realize that if we draw the line chart of this particular distribution it will be absolutely symmetric. But in this problem, our next question is that we should compute the probability of the event that our sum is greater than 8. How do we proceed?
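The counting described above can be done by enumerating the 36 equally likely outcomes. A minimal sketch using exact fractions (the variable names are assumptions made for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# m for each sum: the number of outcomes that favor it.
favorable = Counter(a + b for a, b in outcomes)

# Classical definition: P(X = x) = m / n with n = 36.
pmf = {x: Fraction(m, len(outcomes)) for x, m in sorted(favorable.items())}

for x in pmf:
    print(x, f"{favorable[x]}/36")

print(sum(pmf.values()))  # 1
```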
Obviously, greater than 8 means either 9 or 10 or 11 or 12, and the word 'or' reminds us of the addition theorem, and of the special case of the addition theorem: the probability of A or B, if A and B are mutually exclusive, is equal to the probability of A plus the probability of B. So, extending this theorem to the case of more than two events, we obtain in this example, as you see on the screen, that the probability that X is greater than 8 is equal to the probability that X is equal to 9 or 10 or 11 or 12, and that is equal to the probability that X is equal to 9 plus the probability that X is equal to 10 plus the probability that X is equal to 11 plus the probability that X is equal to 12. Substituting the values of these probabilities in this formula, the probability that we obtain a sum greater than 8 comes out to be 4 plus 3 plus 2 plus 1 by 36, and that is 10 by 36. Similarly, the other part of the question was to find the probability that the sum is greater than 5 but less than or equal to 10, and this is equal to the probability that X is equal to 6 or 7 or 8 or 9 or 10, and adding those probabilities this particular probability comes out to be 23 by 36. Now that we have discussed the concept of the discrete probability distribution in considerable detail, we have seen its basic format: just two columns, a column of x and a column of probabilities such that the sum is 1. The next concept that I would like to discuss with you is the concept of the distribution function. As you now see on the screen, the distribution function of a random variable X, denoted by capital F of x, is defined as the probability that capital X is less than or equal to small x. Students, you will have noted that I said capital F of x, where the x in the bracket is a small x, and I said the probability that our random variable capital X is less than or equal to small x. So, what is the difference between capital X and small x, students?
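Both probabilities follow from adding the relevant entries of the table, since the individual sums are mutually exclusive. The sketch below rebuilds the distribution so that it runs on its own; note that Python reduces 10/36 to its lowest terms, 5/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Distribution of the sum of the dots on two fair dice.
favorable = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(m, 36) for x, m in favorable.items()}

# P(X > 8) = P(9) + P(10) + P(11) + P(12)
p_greater_than_8 = sum(p for x, p in pmf.items() if x > 8)

# P(5 < X <= 10) = P(6) + P(7) + P(8) + P(9) + P(10)
p_between = sum(p for x, p in pmf.items() if 5 < x <= 10)

print(p_greater_than_8)  # 5/18, that is 10/36
print(p_between)         # 23/36
```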
Capital X represents the random variable that we are talking about, and small x represents some particular value of the random variable. In other words, if I am interested in determining the probability that the sum of the dots on the two dice is less than or equal to 5, I am talking about capital F of 5. As you see on the screen, capital F of 5 is equal to the probability that capital X is less than or equal to 5, and this is equal to the probability that X is equal to 2 or 3 or 4 or 5, and because of the fact that all these events are mutually exclusive and cannot occur together, this probability is equal to the probability that X is equal to 2 plus the probability that X is equal to 3 plus the probability that X is equal to 4 plus the probability that X is equal to 5. Substituting the values of these probabilities in the formula, we obtain 1 by 36 plus 2 by 36 plus 3 by 36 plus 4 by 36, and evidently this is equal to 10 by 36. In other words, students, the distribution function of a discrete probability distribution is simply the cumulation of the probabilities from the beginning up to the particular value that we are talking about, in this case 5. You have seen that this concept is exactly the same as the concept of the cumulative frequency distribution. As you see on the screen, if we construct the column of cumulative probabilities for the same example that we just discussed, we obtain 1 by 36, 3 by 36, 6 by 36, 10 by 36 and so on, and the last one, the one against x equal to 12, comes out to be 36 by 36, exactly equal to 1. These cumulative probabilities represent capital F of x, and as you can see, the probability that X is less than or equal to 5, that is 10 by 36, is seen at a glance if we have a look at the column of cumulative probabilities. So, you have seen that this is also a very useful concept: whenever you need the probability of a particular value or less, you can think of the distribution function capital F of x.
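The column of cumulative probabilities is a running total of the probability column. A minimal sketch, again rebuilding the two-dice distribution so that it is self-contained (the name distribution_function is an assumption made for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Distribution of the sum of the dots on two fair dice.
favorable = Counter(a + b for a, b in product(range(1, 7), repeat=2))
xs = sorted(favorable)
pmf = {x: Fraction(favorable[x], 36) for x in xs}

# F(x) = P(X <= x): running total of the probabilities in increasing order of x.
distribution_function = dict(zip(xs, accumulate(pmf[x] for x in xs)))

# Numerators over 36, to match the column shown in the lecture.
print([int(distribution_function[x] * 36) for x in xs])
# [1, 3, 6, 10, 15, 21, 26, 30, 33, 35, 36]

print(distribution_function[5] == Fraction(10, 36))  # True
print(distribution_function[12])                     # 1
```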
Students, next time we will take this concept a bit further and after that I will discuss with you in a formal mathematical manner, the concept of mathematical expectation. I wish you the very best in your study of this new topic that we have started today and until next time, Allah Hafiz.