In the last class I traced the origins of the term probability, and we saw the historical development of its definitions and of the methods of calculating it. We have seen that historically the problems considered were of a nature where we could use the mathematical, or so called classical, definition: games of chance such as dice throwing and tossing of a coin, where the sample space has a finite number of elements and the outcomes are considered equally likely. Later on it was discovered that this definition is quite restrictive and cannot be used for several kinds of phenomena about which we need probabilistic statements. A more practical definition was the so called empirical, or statistical, definition of probability, which is based on evidence, that is, on how many times a particular event occurs over a period of time. If this proportion stabilizes to a certain number, we take that number as the empirical probability. However, it was found that even this definition has certain inadequacies, and therefore a firm measure-theoretic definition of probability was later given by A. N. Kolmogorov. It is called the axiomatic definition, and towards the end of the last lecture I introduced it. Just to recall the terms: we have a random experiment which results in a sample space omega, and we consider a class of subsets of omega satisfying certain conditions, which we call a sigma field of subsets of omega. On this we define P as a probability function, satisfying the 3 axioms of probability. We then call (omega, B, P) a probability space.
So, let us look at our structure. We have a random experiment; the possible outcomes which are of interest to us are collected in a sample space called omega; we consider a class B of subsets of omega which satisfies certain conditions, as we recollect from our last lecture it is a sigma field; and on this we define P as a probability function which satisfies the 3 axioms of probability. Just to recollect what we did in the last class, let me show the slide with the definition of probability. It is as follows: probability is a non-negative function; the probability of the full space is equal to 1; and if we consider pairwise disjoint events, then the probability of their union is equal to the sum of their probabilities. These are the 3 basic axioms of the so called axiomatic definition of probability. So, when we say that (omega, B, P) is a probability space, these 3 axioms are satisfied. Now, let us look at certain properties that this probability function will have; these are consequences of the axiomatic definition of probability. The first consequence is that the probability of the empty set is 0. Here phi denotes the null event, or the impossible event, as we defined earlier. Note that we must expect the probability of the impossible event to be 0; what we are saying is that this is consistent with our axioms. Let me give a sketch of a proof here. In axiom 3, consider E1 equal to some set A, and E2, E3 and so on all equal to phi.
If we do that, the left hand side, the probability of E1 union E2 union and so on, becomes the probability of A, while the right hand side becomes the probability of A plus the probability of phi plus the probability of phi, and so on. We can remove the term probability of A by taking A equal to omega: then P(omega) appears on both sides and cancels, leaving P(phi) plus P(phi) plus ... equal to 0, and since each P(phi) is a non-negative number, this means P(phi) is actually 0. Let us look at the second consequence: if E1, E2, ..., En are pairwise disjoint sets in B, then the probability of the union of the Ei, i = 1 to n, is equal to the sum of the probabilities of the Ei. Let us compare this with the third axiom: if we have a countable collection of pairwise disjoint sets, then the probability of the union is equal to the sum of the probabilities. If in this axiom we take only a finite number of sets and substitute phi for the remaining ones, it reduces to this statement. The third consequence is that P is a monotone function, that is, if A is a subset of B then the probability of A is less than or equal to the probability of B. Once again you can easily construct a proof. If A is a subset of B, then B is equal to A union (B minus A), and this is a disjoint union, so the probability of B is equal to the probability of A plus the probability of B minus A. Since the latter term is non-negative, we conclude that the probability of A is less than or equal to the probability of B. So P is a monotone function. Some further simple consequences follow; for example, the probability of E complement is equal to 1 minus the probability of E for every event E, that is, the probability of the complement is 1 minus the probability of the original set.
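These consequences can be checked numerically on a small finite sample space. Here is a minimal sketch in Python; the fair die and the events A and B are my own illustration, not from the lecture.

```python
from fractions import Fraction

# Equally likely outcomes of a fair die (illustrative choice).
omega = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Classical probability: P(E) = |E| / |omega|."""
    return Fraction(len(event & omega), len(omega))

A = {1, 2}          # A is a subset of B
B = {1, 2, 3}

assert prob(set()) == 0                      # P(phi) = 0
assert prob(B) == prob(A) + prob(B - A)      # additivity on disjoint pieces
assert prob(A) <= prob(B)                    # monotonicity
assert prob(omega - A) == 1 - prob(A)        # complement rule
```

Exact rational arithmetic with `Fraction` avoids any floating-point doubt in the checks.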
Similarly, if A and B are any 2 events, then the probability of A union B is equal to the probability of A plus the probability of B minus the probability of A intersection B. This is called the addition rule of probability. The proof is again very simple; you can look at a sketch using Venn diagrams. Suppose I have event A here and event B here. Then A union B can be written as A together with the portion of B outside A, which is B minus (A intersection B). So the probability of A union B becomes the probability of A plus the probability of B minus the probability of A intersection B. One can generalize this result. Suppose in place of 2 events I have 3, say A, B, C. Then the probability of A union B union C is equal to the probability of A plus the probability of B plus the probability of C, minus the probability of A intersection B, minus the probability of B intersection C, minus the probability of C intersection A, plus the probability of A intersection B intersection C. What does this denote? As we explained earlier, the probability of A union B is the probability of occurrence of either A or B or both; similarly, here it means the probability of occurrence of at least one of A, B and C. Naturally one will be interested in finding the probability of occurrence of at least one of n events, and that gives us the general addition rule. If A1, A2, ..., An are any events in B, then the probability of the union of the Ai, i = 1 to n, equals the sum of the probabilities of the Ai, minus the sum of the probabilities of Ai intersection Aj over all pairs i < j, plus the sum of the probabilities of Ai intersection Aj intersection Ak over all triples i < j < k, and so on, up to (-1) to the power (n+1) times the probability of the intersection of the Ai, i = 1 to n. One can actually prove these results by induction: the proof for two events is almost trivial, and the extension to three follows by treating A union B as a single event and applying the two-event rule twice.
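The general addition rule (inclusion-exclusion) can be written down directly in a few lines; the following sketch, which I am adding for illustration, checks it against direct counting on a small finite sample space.

```python
from itertools import combinations
from fractions import Fraction

def prob(event, omega):
    """Classical probability on a finite space of equally likely outcomes."""
    return Fraction(len(event), len(omega))

def prob_union(events, omega):
    """General addition rule: P(union A_i) = sum over non-empty index sets S
    of (-1)^(|S|+1) * P(intersection of the A_i, i in S)."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        for subset in combinations(events, k):
            inter = set.intersection(*subset)
            total += (-1) ** (k + 1) * prob(inter, omega)
    return total

# Check against direct counting of the union.
omega = set(range(1, 13))
A = {1, 2, 3, 4}; B = {3, 4, 5, 6}; C = {4, 6, 7, 8}
assert prob_union([A, B, C], omega) == prob(A | B | C, omega)
```

The loop over `combinations` enumerates exactly the 2^n - 1 non-empty sub-collections that appear in the formula.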
Similarly, the general rule can be proved by induction. In this particular course I will be skipping the proofs, because I have to cover various topics which are to be used by scientists and engineers. If any user is interested in the proofs, one can look at the references given in the syllabus, and also at the other lectures on probability and statistics by myself which are available; those lectures cover the proofs of these facts. Here I will only state the main rules to be used by applied scientists and engineers who will be making use of the results of probability theory. Let me give an example of an elementary probability problem: the probability of getting exactly two cards of the same denomination, say two kings or two queens, irrespective of their suit, in the first 13 cards dealt from a well shuffled pack of cards. Let us look at the conditions of the experiment. We have a well shuffled pack of 52 cards, that is, 13 cards of hearts, 13 of spades, 13 of clubs and 13 of diamonds, numbered 1 to 10, with 11 the jack, 12 the queen and 13 the king. What is the probability that exactly two of the 13 cards dealt are of the same denomination? Let us count the possibilities. Since there are 13 different denominations, for example 1, 2, 3 and so on, there are 13 choices for the denomination of the pair; and for each denomination there are four cards, so the pair can be chosen in 4C2 ways. Together, this is possible in 13 into 4C2 ways. Now there are 11 more cards, and they should all be of distinct denominations, because we require that only two cards share a denomination; for each of the 11 remaining denominations, the card can be any of the 4 suits, which can be done in 4 to the power 11 ways. We have already used one denomination for the pair.
So, out of the remaining 12 denominations, 11 have to be chosen, and this selection can be done in 12C11 ways. Hence the required probability is 13 into 4C2 into 12C11 into 4 to the power 11, divided by the total number of ways of choosing 13 cards out of the pack of 52, which is 52C13. This number can be calculated, and it is approximately 0.0062. Let us take another example. In a study of a group of 1000 subscribers to a certain magazine, the following data is recorded: there are 312 professionals, 470 married persons, and 525 college graduates; so we have considered the data with respect to profession, marital status and education. Further, 42 are professional college graduates, 147 are married college graduates, 86 are married professionals, and 25 are married professional college graduates. Find the probability that a randomly selected subscriber will be at least one of professional, married, or college graduate, that is, that he satisfies at least one of the properties. And is there a fallacy in the data? That is an additional inquiry we are making; let us see. Suppose I define the event A as the event that the person is a professional, B as the event that the person is married, and C as the event that he is a college graduate. Then we are interested in the probability of A union B union C. According to the addition rule, this is the probability of A plus the probability of B plus the probability of C, minus the probability of A intersection B, minus the probability of B intersection C, minus the probability of C intersection A, plus the probability of A intersection B intersection C. The number of professionals is 312, so P(A) becomes 312 by 1000; the married persons are 470, so P(B) becomes 470 by 1000; the number of college graduates is 525, so P(C) is 525 by 1000; and in the same way we substitute the remaining values.
A intersection B is the event "married professional", so P(A intersection B) is 86 by 1000; B intersection C is "married college graduate", so P(B intersection C) is 147 by 1000; C intersection A is "professional college graduate", so P(C intersection A) is 42 by 1000; and A intersection B intersection C is "married professional college graduate", so P(A intersection B intersection C) is 25 by 1000. If you evaluate this, it turns out to be 1057 by 1000, which is obviously greater than 1. So the data is inconsistent, because you cannot have a probability greater than 1; using the addition rule we are able to detect the fallacy in the given data. Now we look at some further aspects of the rules of probability; we define what is called conditional probability. It is something like this. If we ask what is the probability that a randomly selected person is a professional, the answer is 312 by 1000. But suppose I ask: what is the probability that the randomly selected person is a professional, given that he is married? In that case I have to look at the persons who are married and ask how many among them are professionals, and therefore the answer becomes 86 by 470. Similarly, if you want to find the probability that the person is a professional given that he is a college graduate, then I have to look at all the college graduates and ask how many among them are professionals; there are only 42 such persons, so the answer is 42 by 525 rather than 312 by 1000. That means that if we impose a condition in our random experiment, then the probability of the original event may change; this is the concept of conditional probability. Let me introduce it formally. Let (omega, B, P) be a probability space, and let F be an event with probability of F greater than 0. Then the conditional probability of an event E, given that the event F has already occurred, is written as the probability of E given F.
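Both worked examples above can be verified with a few lines of Python; this is a quick numerical check I am adding, using `math.comb` for the binomial coefficients.

```python
from math import comb

# Exactly one pair of the same denomination among 13 dealt cards:
# 13 denominations for the pair, C(4,2) suit choices for it,
# C(12,11) choices of the remaining distinct denominations, 4 suits each.
favourable = 13 * comb(4, 2) * comb(12, 11) * 4 ** 11
p_pair = favourable / comb(52, 13)
print(round(p_pair, 4))   # approximately 0.0062

# Magazine data: the addition rule exposes the fallacy.
p_union = (312 + 470 + 525 - 86 - 147 - 42 + 25) / 1000
print(p_union)            # 1.057, greater than 1, so the data is inconsistent
```

Any data set whose inclusion-exclusion sum exceeds 1 cannot come from a genuine probability model, which is exactly the check performed here.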
This is the notation for conditional probability: it is read "E given F", not "E divided by F". It is defined as the probability of E intersection F divided by the probability of F. One can easily check that this conditional probability is a valid probability function. Now, let us look at some consequences of this definition. From it we see that the conditional probability is the probability of the simultaneous occurrence of the two events divided by the probability of the conditioning event. So from here we can also write: the probability of E intersection F is equal to the probability of F times the probability of E given F. Also, if the probability of E is positive, we may write the probability of E intersection F as the probability of E times the probability of F given E. These statements are called the multiplication rule of probability. One can easily see that in place of two events, if we have three events, four events and so on, then this statement can be extended further, and as a consequence we have a general multiplication rule. Let E1, E2, ..., En be n events, where it suffices to take the probability of the intersection of E1 through En-1, the smallest of these events, to be positive. Then the probability of the intersection of the Ei, that is, of their simultaneous occurrence, is equal to the probability of E1, times the probability of E2 given E1, times the probability of E3 given E1 intersection E2, and so on, up to the probability of En given the intersection of the Ei, i = 1 to n-1. One can again prove this by induction; I am skipping the proof here. Now, let us look at further consequences of this conditioning. Many times we look at phenomena in the following fashion. Consider a certain event, for example the death of a person; that is, we are looking at the records of deaths in a particular locality.
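The conditional probability and the chain-rule form of the general multiplication rule can be checked exactly on a small finite space; the events below are my own illustration, not from the lecture.

```python
from fractions import Fraction

def prob(event, omega):
    """Classical probability on a finite space of equally likely outcomes."""
    return Fraction(len(event), len(omega))

def cond_prob(e, f, omega):
    """P(E | F) = P(E intersection F) / P(F), defined only when P(F) > 0."""
    assert prob(f, omega) > 0
    return prob(e & f, omega) / prob(f, omega)

# Chain rule for three events:
# P(E1 n E2 n E3) = P(E1) * P(E2 | E1) * P(E3 | E1 n E2)
omega = set(range(1, 9))
E1 = {1, 2, 3, 4, 5, 6}; E2 = {2, 3, 4, 5}; E3 = {3, 4, 7}
lhs = prob(E1 & E2 & E3, omega)
rhs = prob(E1, omega) * cond_prob(E2, E1, omega) * cond_prob(E3, E1 & E2, omega)
assert lhs == rhs
```

The intermediate conditioning events telescope, which is why the product on the right collapses to the single intersection probability on the left.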
For example, it could be city municipality data. Now, we may like to look at the causes of death: a person might have died due to a disease, or due to an accident, and so on; and even among the diseases, one may die due to a lung disease, a kidney disease, a liver disease, etcetera. In that case, if we are looking at the event "death of a person", say the probability of death at a certain age in the total population, then we can classify it according to the different causes. This is what you can call the total probability: the total probability of a particular event can be split into the probabilities of that event arising from one cause, from another cause, and so on. So we have the theorem of total probability. Let E1, E2, ..., En be n events such that Ei intersection Ej is equal to phi for all i not equal to j, and the union of the Ei is equal to omega; basically, we are saying that the events are mutually exclusive and exhaustive. Then for any event A, the probability of A can be written as the sum over i of the probability of A given Ei times the probability of Ei. Here we are assuming that the probability of Ei is positive for each i. Let me explain the proof of this statement. Write the probability of A as the probability of A intersection the full space, since any event is a subset of the full space; and since the Ei are exhaustive, we can write omega as the union of the Ei. So this becomes the probability of the union of A intersection Ei, i = 1 to n. Now, since the Ei are mutually exclusive, the sets A intersection E1, A intersection E2, A intersection E3 and so on are also disjoint.
So, because probability is additive, this becomes the sum of the probabilities of A intersection Ei, and applying the multiplication rule to each term, it becomes the sum of the probability of A given Ei times the probability of Ei, i = 1 to n. A further consequence of this is the famous Bayes theorem, which is named after Reverend Thomas Bayes. We have the same setup here. Let E1, E2, ..., En be events with the probability of each Ei positive, Ei intersection Ej equal to phi for all i not equal to j, and the union of the Ei equal to omega. Let A be any event with positive probability. Then the probability of Ei given A is equal to the probability of A given Ei times the probability of Ei, divided by the sum over j = 1 to n of the probability of A given Ej times the probability of Ej. Here P(Ei given A) is called the posterior probability and P(Ei) the prior probability: we have the prior probabilities of the causes, and also the conditional probabilities of an event given those causes; then, if we know which event has actually occurred, we can calculate the posterior probability. That is, knowing the effect, we can find the probability of each cause. This is the so called Bayes theorem, named after Reverend Thomas Bayes, and it was published in 1763. There are very interesting consequences of this result, which I can explain through an example. A survey of people in a given region showed that 25 percent of the people drank regularly. The probability of death due to liver disease, given that a person drank regularly, was 6 times the probability of death due to liver disease given that a person did not drink regularly. The probability of death due to liver disease in the region is 0.005. If a person dies due to liver disease, what is the probability that he or she drank regularly?
So, now you can see that the death may be caused by liver disease, and for the liver disease the cause may be drinking regularly or not. We know that the person died due to liver disease, so we are looking at the posterior probability that he was a regular drinker. Let us define the events and apply Bayes theorem. Let A be the event that the person drinks regularly, and let B be the event that the death is due to liver disease. Now, what is given? It is given that the probability of B given A, that is, of death due to liver disease given that the person is a regular drinker, is 6 times the probability of B given A complement, that is, of death due to liver disease given that he is not a regular drinker. Let us set the probability of B given A equal to alpha, so that the probability of B given A complement is alpha by 6. Also, since it is given that 25 percent of the persons drink regularly, the probability of A is 0.25; and the probability of death due to liver disease, P(B), is 0.005. So we can apply the theorem of total probability: the probability of B is equal to the probability of B given A times the probability of A, plus the probability of B given A complement times the probability of A complement. Substituting, this becomes alpha by 4 plus (3 by 4) times (alpha by 6), that is, alpha by 4 plus alpha by 8, which is 3 alpha by 8; and this equals 0.005. From here you get alpha equal to 8 by 3 into 0.005; that means the probability of death due to liver disease given that he is a regular drinker is 8 by 3 into 0.005. Now, let us look at our question: what is the probability that he drank regularly, given that he died due to liver disease? Here we can apply Bayes theorem.
So, that is equal to alpha into 0.25, divided by 0.005, which equals 2 by 3, because alpha has already been calculated to be 8 by 3 into 0.005; if we substitute it here, we get 2 by 3. So, if a person dies due to liver disease, then there is a high chance that he was actually a regular drinker. Let us take a couple more problems of a similar nature. A research scholar asked her guide to give a letter of recommendation. She estimates that the probability that she will get the job is 0.9 with a strong letter, 0.6 with a medium letter and 0.1 with a weak letter. She also believes that the letter will be strong, medium or weak with respective probabilities 0.6, 0.3 and 0.1. What is the probability that the letter was strong, given that she got the job? Let us define A to be the event that she gets the job, B1 the event that the letter is a strong recommendation letter, B2 the event that it is a medium letter, and B3 the event that it is a weak letter. It is given that the probability of B1 is 0.6, the probability of B2 is 0.3, and the probability of B3 is 0.1. Also, the probability of A given B1 is 0.9, the probability of A given B2 is 0.6, and the probability of A given B3 is 0.1. We are asked to find the probability of B1 given A, that is, the probability that the letter was strong given that she got the job. If we apply Bayes theorem here, we get the probability of A given B1 times the probability of B1, divided by the sum over j = 1 to 3 of the probability of A given Bj times the probability of Bj. This equals 0.9 into 0.6, divided by 0.9 into 0.6 plus 0.6 into 0.3 plus 0.1 into 0.1, which evaluates to 54 by 73, or about 0.7397. So there is about a 74 percent chance that the letter was actually strong if we already know that she got the job. Now, let me also introduce the concept of independence of events.
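Both Bayes theorem examples can be reproduced with a small helper function; this is a numerical sketch I am adding, with the normalizing denominator supplied by the theorem of total probability.

```python
def bayes_posterior(priors, likelihoods, i):
    """P(E_i | A) = P(A | E_i) P(E_i) / sum_j P(A | E_j) P(E_j)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return likelihoods[i] * priors[i] / total

# Recommendation letter: priors for strong/medium/weak, P(job | letter type).
letter_priors = [0.6, 0.3, 0.1]
job_likelihoods = [0.9, 0.6, 0.1]
p_strong_given_job = bayes_posterior(letter_priors, job_likelihoods, 0)
print(p_strong_given_job)   # 54/73, about 0.7397

# Liver disease: P(drinks) = 0.25, P(B | A) = alpha, P(B | not A) = alpha/6,
# and total probability gives 0.005 = 3*alpha/8, so alpha = 0.005 * 8/3.
alpha = 0.005 * 8 / 3
p_drank_given_death = bayes_posterior([0.25, 0.75], [alpha, alpha / 6], 0)
print(p_drank_given_death)  # 2/3
```

Note that in the second call alpha cancels from numerator and denominator, which is why the posterior 2/3 does not depend on the overall death rate.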
So, here we have seen that conditioning alters the original probabilities. Look at this: the probability of B1 is 0.6, but if I condition on A, it turns out to be about 0.7397; in a similar way we can look at the effect on the probability of B2 and so on. Similarly, in one of the previous problems, the probability of A, that the person is a regular drinker, was 0.25; but given that he died due to liver disease, the probability that he was a regular drinker is quite high, 2 by 3. So you can see that the effect of conditioning changes the original probability. Now, there may be cases where the conditioning does not change the probability; this can happen if the two events are quite unrelated. For example, if you consider tossing a coin and throwing a die, then the outcome of the coin toss may have nothing to do with the outcome of the die throw; the two events are totally independent. Using this we define the concept of independence as follows. If the event B has no effect on the happening of A, then we say A is independent of B, and we can write that the probability of A given B is equal to the probability of A. If we write this out in an elaborate fashion, what does it mean? It means that the probability of A intersection B divided by the probability of B equals the probability of A, and this means that the probability of A intersection B is equal to the probability of A times the probability of B. This form of the definition is more symmetric, because if we say A is independent of B, we should also be able to say that B is independent of A. So we define: the events A and B are independent if the probability of A intersection B is equal to the probability of A times the probability of B.
Now, immediately one can look for the generalization of this. For example, if we have three independent events A, B, C, then we should have the probability of A intersection B equal to the probability of A times the probability of B; the probability of B intersection C equal to the probability of B times the probability of C; the probability of C intersection A equal to the probability of C times the probability of A; and the probability of A intersection B intersection C equal to the probability of A times the probability of B times the probability of C. When only the conditions taking two events at a time hold, the events are called pairwise independent; when all of the conditions are satisfied, they are called mutually independent. So, in general, n events A1, A2, ..., An are said to be independent if: for every pair, the probability of Ai intersection Aj equals the probability of Ai times the probability of Aj, for all i not equal to j; for every triple, the probability of Ai intersection Aj intersection Ak equals the probability of Ai times the probability of Aj times the probability of Ak, with i, j, k all distinct; and so on, up to the probability of the intersection of all the Ai, i = 1 to n, being equal to the product of all the probabilities. Altogether these are 2 to the power n, minus n, minus 1 conditions. This has two types of uses. One is that we have a complex experiment and we may like to check whether the events are independent. On the other hand, we may start with independent events, build a complex event from them, and then make use of the concept of independence in the actual calculation of its probability. We also consider what are known as independent experiments. Here we have, say, (omega 1, B1, P1) as one probability space and (omega 2, B2, P2) as another probability space.
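The gap between pairwise and mutual independence can be seen in a classical small example, two fair coin tosses; this illustration is mine, not from the lecture.

```python
from fractions import Fraction
from itertools import combinations

omega = {'HH', 'HT', 'TH', 'TT'}   # two fair coin tosses, equally likely

def prob(event):
    return Fraction(len(event), len(omega))

A = {'HH', 'HT'}   # first toss is heads
B = {'HH', 'TH'}   # second toss is heads
C = {'HH', 'TT'}   # the two tosses agree

# Pairwise independence holds for every pair of events:
for X, Y in combinations([A, B, C], 2):
    assert prob(X & Y) == prob(X) * prob(Y)

# ...but mutual independence fails: P(A n B n C) = 1/4, not 1/8.
assert prob(A & B & C) != prob(A) * prob(B) * prob(C)
```

So the three pairwise conditions do not imply the fourth, which is why the definition for n events must list all 2^n - n - 1 conditions.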
For example, it could mean that we are looking at the longevity of persons in a particular country, say India, and at the longevity of persons in a country in Europe, say France. If we sample persons in India and in France, the two samplings are considered to be independent. That means that events about the longevity of persons in India, for example that the average longevity is less than 50, more than 50, between 55 and 60, and so on, will have nothing to do with similar kinds of events in the other country, say France. These are independent experiments, and certainly, when we want to join events about India and France into a complex event, we can use independence to resolve it and find the probabilities of the corresponding events. So, we have considered various rules of probability. There are certain other things also, for example limiting probabilities: if we have a sequence of events, we can consider the probability of the limit of that sequence of sets, and under certain conditions it can be proved that it is equal to the limit of the probabilities of that sequence. So probability satisfies various nice properties; we have seen additivity, the general addition rule, the multiplication rule, total probability, limiting probability, and so on. In the following lectures I will introduce random variables and their distributions, and we will move over to special discrete and continuous distributions.