Dear students, I would like to present to you the concept of the marginal probability mass function. Let us consider a discrete random vector, in other words, a vector whose entries are discrete random variables. When we consider any one of these random variables in isolation, its distribution can be characterized by its probability mass function, right? When we have more than one random variable, this PMF of a single variable, considered in isolation from the others, is called the marginal probability mass function. So the first point is this: whenever you concentrate on one variable alone, you use the term marginal probability mass function for its distribution. The word "marginal" distinguishes this particular PMF from the joint PMF, which characterizes the joint distribution of all the variables taken together. You must be wondering why we use the word "marginal". To understand it, the discrete case is very helpful, because you can then think of a table, a bivariate table, for example, if you have two discrete random variables. In the first column of that table you have the numerical values of the first variable, and in the top row you have the numerical values of the second variable. In the body of the table you have the probabilities of all the combinations of values of the first and the second variable, and the sum of all those probabilities is equal to one. Once you picture the table in this form, it is easy to see why we use the word "marginal" when we concentrate on the probability distribution of either the first or the second variable in isolation from the other.
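To make the definition concrete, here is a minimal sketch, assuming the joint PMF is stored as a Python dictionary that maps each tuple of values (x, y, ...) to its probability. The marginal PMF of one entry of the random vector is obtained by adding the joint probabilities over all values of the other entries, that is, p_X(x) = Σ_y p_{X,Y}(x, y). The joint probabilities and the helper name `marginal_pmf` are hypothetical, chosen only for illustration; `Fraction` is used so the sums are exact.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint PMF of (X, Y): keys are (x, y) pairs, values are probabilities.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}

def marginal_pmf(joint, index):
    """Marginal PMF of the entry at position `index` of the random vector.

    Adds up the joint probabilities over all values of the other entries,
    so it works for vectors of any length, not only pairs.
    """
    assert sum(joint.values()) == 1, "a joint PMF must sum to one"
    marginal = defaultdict(Fraction)  # Fraction() starts each total at 0
    for values, p in joint.items():
        marginal[values[index]] += p
    return dict(marginal)

p_x = marginal_pmf(joint, 0)  # distribution of X alone
p_y = marginal_pmf(joint, 1)  # distribution of Y alone
print(p_x)  # {0: Fraction(2, 5), 1: Fraction(3, 5)}
print(p_y)  # {0: Fraction(2, 5), 1: Fraction(3, 10), 2: Fraction(3, 10)}
```

Each marginal PMF again sums to one, as any PMF must.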
It is as simple as this. If in the first column you have the values of the random variable X, and in the top row you have the values of the random variable Y, then to get the marginal distribution of X you just add up the probabilities in the body of the table row by row. When you add them and write each row's total in the right-hand margin, then the first column, holding the values of X, together with this last column, holding those row totals, gives you the distribution of X; and because those probabilities sit in the margin of the table, it is entirely correct to call them marginal probabilities. In the same way, across the top you have the values of the other variable, Y. If you want the probabilities of these values, you simply add up the probabilities in the body of the table column by column, and in the bottom margin of the table you get those sums, which are the probabilities of the values of Y listed in the top row. So we have X and the probabilities of X, Y and the probabilities of Y. The probabilities of X are in the right-hand margin of the table, and the probabilities of Y are in the bottom margin. Is it not obvious, then, that it is appropriate to call them marginal probabilities? Once the idea was developed in this way, I suppose it was natural to keep the same terminology for continuous variables as well: when the variables are continuous, instead of summing you integrate the joint PDF over one variable, and what you get is the marginal PDF of the other one. This is the concept of the marginal probability distribution.
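The table picture can be sketched directly in code. This is a minimal illustration, assuming a small hypothetical table with the values of X down the first column and the values of Y across the top row: row totals fill the right-hand margin (the PMF of X), column totals fill the bottom margin (the PMF of Y). The last part shows the continuous analogue under a further assumption: a hypothetical joint density f(x, y) = x + y on the unit square, integrated over y with a simple midpoint rule, so that f_X(x) should come out as x + 1/2. The names `marginal_pdf_x` and `joint_pdf` are illustrative only.

```python
from fractions import Fraction

# Hypothetical bivariate table: rows are values of X, columns are values of Y.
x_values = [0, 1]
y_values = [0, 1, 2]
body = [
    [Fraction(1, 10), Fraction(2, 10), Fraction(1, 10)],  # X = 0
    [Fraction(3, 10), Fraction(1, 10), Fraction(2, 10)],  # X = 1
]

# Right-hand margin: add across each row to get P(X = x).
row_margin = [sum(row) for row in body]
# Bottom margin: add down each column to get P(Y = y).
col_margin = [sum(column) for column in zip(*body)]

# Print the table with both margins filled in; the corner cell is the grand total.
print("X\\Y", *y_values, "P(X)", sep="\t")
for x, row, px in zip(x_values, body, row_margin):
    print(x, *row, px, sep="\t")
print("P(Y)", *col_margin, sum(row_margin), sep="\t")

# Continuous analogue: integrate the joint density over y instead of summing.
def joint_pdf(x, y):
    # Hypothetical density on the unit square; it integrates to 1 there.
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_pdf_x(joint_pdf, x, n=1000):
    """Approximate f_X(x), the integral of joint_pdf(x, y) over y in [0, 1], by the midpoint rule."""
    h = 1.0 / n
    return sum(joint_pdf(x, (k + 0.5) * h) for k in range(n)) * h

print(marginal_pdf_x(joint_pdf, 0.3))  # close to 0.3 + 0.5 = 0.8
```

The grand total in the bottom-right corner is 1, which is a quick check that the table is a valid joint PMF.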