Now we will talk about probability mass functions and probability density functions. The probability mass function is for discrete random variables, and the probability density function is what we will use for continuous random variables. So, let us say my random variable X is discrete and takes some discrete values x1, x2, x3, and so on. The probabilities at these discrete points are what we call the probability mass function, and naturally the probabilities over all these values should add up to 1. We call P(xi) the mass assigned at the point xi, and if you recollect, these masses correspond to the amounts of the jumps on the CDF curve.

Now, the probability density function, which we will talk about for the continuous case. The way we define it is this: if the probability that X lies in a subset A can be represented as an integration of some function f(x), that is, P(X in A) = integral over A of f(x) dx, and this is true for any subset A of R (A could even be R itself), then I am going to call my random variable continuous, and this function f(x) I am going to call the probability density function.

Now let us look into some aspects. What happens if I take A to be R itself? I am basically asking for the probability that X lies somewhere on the real line, and that value is 1, because I am letting the random variable take any possible value; that is a property of probability. On the right-hand side I am integrating the function over the entire real line, from minus infinity to infinity, and this has to equal 1. So the first natural condition something must satisfy to be a PDF is that it integrates to 1; that is, the area under the PDF is 1. Next, if I take A to be a finite interval [a, b], all I need to do is integrate the function between a and b to get P(a <= X <= b). And if I set a = b, I am basically asking what is the probability that X equals a, and by definition, integrating the function from a to a gives 0. So whenever we have a continuous random variable we will not ask the question "what is the probability that it takes a particular value a", because that probability is 0. This question is fine for a discrete random variable, where the answer is the mass at that point (if it has mass there), but in the continuous case the mass at each individual point is 0; the mass over a region may be positive, but at a particular point it is 0.

What, then, is the meaning of the PDF? In the discrete case it was clear: the PMF at a particular point is how much mass is there. But if I just say f(x) equals some value, what does this mean?
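As a quick numerical illustration of these three properties, here is a minimal Python sketch; the exponential density f(x) = 2e^(-2x) is only an assumed example, not something fixed by the lecture.

```python
import numpy as np
from scipy.integrate import quad

# Assumed example density: exponential with rate 2,
# f(x) = 2 * exp(-2x) for x >= 0 (and 0 for x < 0).
def f(x):
    return 2.0 * np.exp(-2.0 * x)

# Property 1: the area under the PDF is 1 (the density is 0 below 0, so integrate over [0, inf)).
total, _ = quad(f, 0, np.inf)
print(total)        # ~1.0

# Property 2: P(a <= X <= b) is the integral of f between a and b.
p_ab, _ = quad(f, 0.5, 1.5)
print(p_ab)         # e^(-1) - e^(-3), about 0.318

# Property 3: P(X = a) corresponds to integrating from a to a, which is 0.
p_point, _ = quad(f, 1.0, 1.0)
print(p_point)      # 0.0
```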
Ok, let us say I have a function f drawn as some curve, and I pick a particular value x. What does the value f(x) indicate? Does it indicate the probability at this point? No, we said the probability is 0 at that point. Then what is it indicating? It is giving a kind of rate of change of the mass in that neighbourhood; let us try to understand that. Before that, recall that the CDF is simply F(x) = P(X <= x), and here X can be discrete or continuous; we can always define the CDF for a continuous as well as a discrete random variable, whereas we said the PMF is defined for the discrete case and the PDF for the continuous case. Now, by definition, if X is a continuous random variable, F(x) is the integral of f from minus infinity to x, and if you differentiate this relation you get f(x) = dF(x)/dx. So one way to interpret it is that f(x) at a point x is the rate of change of the cumulative distribution function at that point. Does this tell us that f(x) is a non-negative quantity at every point? Yes, because F is non-decreasing. Differentiation may not always make sense for discrete random variables; as long as the CDF is differentiable this is fine, and at a point where I cannot differentiate the CDF I cannot define f. That is why, remember, I am now talking about continuous random variables, not discrete ones. Naturally, this is telling us that if such a probability density function exists, then the cumulative distribution function of that continuous random variable is differentiable at every point. Does differentiability imply continuity? Yes. So it is saying that if my continuous random variable has a PDF, then its CDF has to look something like this: it has to be increasing, eventually go to 1 and saturate, and at every point there cannot be jumps; it has to be continuous and differentiable at every point.

Now let us go back and apply our definition. Take a small interval around a point a: this point is a, this is a + epsilon/2, and this is a - epsilon/2. I am now taking a small neighbourhood of a and asking the question: what is the probability that X takes a value in this range? By definition this is nothing but the integral of f(x) between a - epsilon/2 and a + epsilon/2. Now if this epsilon is very small, almost tending to 0, you are integrating over a very, very narrow region, and in that region you can approximately assume that f(x) remains constant. When you integrate over that small region under this assumption, the probability is nothing but epsilon times f(a).
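As a quick check of this neighbourhood approximation, here is a small sketch that assumes a standard normal distribution purely as a concrete example (the lecture does not fix any particular distribution); it compares the exact probability F(a + epsilon/2) - F(a - epsilon/2) with epsilon times f(a), and also checks f against a finite-difference derivative of F.

```python
import numpy as np
from scipy.stats import norm   # standard normal, used only as a concrete example

a = 0.7
eps = 1e-3

# Exact probability of landing in the small interval (a - eps/2, a + eps/2),
# computed from the CDF: F(a + eps/2) - F(a - eps/2).
exact = norm.cdf(a + eps / 2) - norm.cdf(a - eps / 2)

# Approximation from the lecture: eps * f(a), treating f as constant on the interval.
approx = eps * norm.pdf(a)
print(exact, approx)           # the two agree to several decimal places

# f is also the derivative of F: a finite-difference check.
h = 1e-6
deriv = (norm.cdf(a + h) - norm.cdf(a - h)) / (2 * h)
print(deriv, norm.pdf(a))      # again nearly equal
```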
So, it is basically saying that the probability of landing in that region is epsilon times the value given by the PDF. In a way, this tells you that the PDF is the rate at which probability accumulates in the neighbourhood: when epsilon is very small, this probability is simply the product of epsilon and the value of the PDF at that point. This whole argument is true only when the interval around a is small; if the interval is large, the approximation does not make sense. So one way to interpret it is that f(a) is a measure of how likely the random variable X is to be near the point a. When you conduct an experiment, it is not exactly giving the probability of getting a; it is describing the probability that your outcome falls in a small neighbourhood of a.

With this understanding of what a CDF is, what a PDF is, what continuous and discrete random variables are, and their properties, we will now study some commonly used distributions, which I have listed here. These are already covered in i621. Today let us start with some standard discrete random variables. We focus on them because when you want to model something, it is better to have some distributions that you know well, and these are the standard discrete random variables we know well; they also come in pretty handy in studying various applications. Let us look into them.

The Bernoulli random variable X is denoted Bernoulli(p) and comes with a parameter p. Whenever I talk about a random variable, I need to say what the possible outcomes are: here the outcomes are 0 and 1. Now I have to give probabilities to them. I am going to say that the probability that X takes the value 1 is p, and that X takes the value 0 is 1 - p, and this p is the parameter, which can be anything between 0 and 1; that is why Bernoulli is a parameterized random variable with parameter p. Where can this random variable be useful? It could be useful to model your coin toss: when a head comes, instead of calling it "head" you simply call it 1, and when a tail comes you call it 0, or, depending on your application, the other way around. Depending on the bias of the coin, a head comes with probability p and a tail with probability 1 - p. Most of the time we talk about a fair coin, for which p equals one half, but why does p have to be one half? p can be anything; p can be 1, in which case you are in a Sholay movie, or p can be strictly less than 1.
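To make the Bernoulli(p) definition concrete, here is a minimal simulation sketch; the bias p = 0.3 is an assumed value chosen purely for illustration. Repeated draws should show the value 1 appearing with frequency close to p.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                               # assumed bias, anything in [0, 1]

# Draw many Bernoulli(p) samples: 1 with probability p, 0 with probability 1 - p.
samples = (rng.random(100_000) < p).astype(int)

# The empirical frequency of 1s should be close to p.
print(samples.mean())                 # ~0.3
```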
And can you think of any other example? Instead of just a coin toss, it could be anything: in many experiments you are interested only in whether something happened or not, whether you passed or failed, won or lost; you put money in a lottery and you are interested only in knowing whether you won. Wherever the outcome is binary like this, you can use the Bernoulli random variable. This is used a lot in machine learning nowadays: in classification, if you are given a photo, you need to tell whether there is a dog in that photo or not, only two outcomes, yes or no, and there you can use the Bernoulli random variable.

Next is the binomial. Binomial comes with two parameters, n and p, where p is again between 0 and 1 but n is an integer, and here X takes values 0, 1, 2, all the way up to n. So how many possibilities? There are n + 1 possibilities. Now I have to assign probabilities to them: the probability that X equals i is C(n, i) p^i (1 - p)^(n - i). Why it is defined like this we will see later; for now, just take it. But once I write this, it should not be an arbitrary set of values: when I say PMF, the probabilities should add up to 1. Do they add up to 1? They do; check it. Where is this going to be useful? It is useful when you are counting over successive trials, when you are repeating something and then want to count. Suppose you have a coin and you have thrown it n times. Now you want to ask how many times a head appeared in all those n trials: a head may never have appeared, in which case the count is 0, it may have appeared once, or it may have appeared n times, and you may want to ask the question, out of these n trials, what is the chance that a head appeared 3 times, or 5 times, and so on. If you look at the formula a little carefully, it is saying that if I am interested in the number of heads being i, those i heads can come anywhere among the n trials, and the number of different ways they can come is C(n, i); each head is a success with probability p and each tail a failure with probability 1 - p, so you multiply, and that is how the expression comes about.

The geometric distribution is another one, which comes with a parameter p. You can think of it as counting the trials until the first head happens, and that first head can happen on trial 1, or 2, or 3, and so on. So X takes all the values 1, 2, 3, ..., and the probability that it takes the value i is (1 - p)^(i - 1) p. That is the example I gave: the number of trials until a head happens is geometric.

Another one is Poisson. Poisson is also over countably many values, all the non-negative integers including 0, and the probability that X equals i is given by e^(-lambda) lambda^i / i!, where lambda is a quantity greater than or equal to 0; again, if you sum these probabilities you will see that the total is 1. So Poisson comes with a parameter lambda and geometric comes with a parameter p. Where is Poisson used? Poisson is mostly used in counting: how many buses crossed the IIT main gate today, how many cars passed, how many people entered IIT; there you may want to count, and the count value can be anything among 0, 1, 2, 3, 4 and so on, and there you may want to use such a Poisson model. This rate lambda is going to control, or describe, these probabilities.
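To make the three PMFs concrete, here is a small sketch that evaluates each formula and verifies the probabilities add up to 1; the parameter values n = 10, p = 0.4, lambda = 3 are assumptions chosen only for illustration, and the geometric and Poisson sums are truncated at a large index since their supports are infinite.

```python
from math import comb, exp, factorial

n, p, lam = 10, 0.4, 3.0   # assumed parameter values, for illustration only

# Binomial(n, p): P(X = i) = C(n, i) p^i (1 - p)^(n - i), for i = 0, ..., n
binom_pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

# Geometric(p): P(X = i) = (1 - p)^(i - 1) p, for i = 1, 2, ...  (truncated at i = 99)
geom_pmf = [(1 - p)**(i - 1) * p for i in range(1, 100)]

# Poisson(lam): P(X = i) = e^(-lam) lam^i / i!, for i = 0, 1, 2, ...  (truncated at i = 59)
pois_pmf = [exp(-lam) * lam**i / factorial(i) for i in range(60)]

# Each set of probabilities should add up to (essentially) 1.
print(sum(binom_pmf))   # 1.0 up to floating point
print(sum(geom_pmf))    # ~1.0, tiny tail truncated
print(sum(pois_pmf))    # ~1.0, tiny tail truncated
```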
So, these are the main four we will talk about and we will stop here.