Hello! I remember that when we started this series of lectures, the weather was pretty bad. In between we have had a lot of snow, but now it has really become summer, and it becomes increasingly difficult to make the choice to go inside and enjoy the lectures. I appreciate that, so I'm very happy to see you all here this morning. Today we have quite an extensive program, so we need to get going, and I also think we may have to be a little fast today. But nevertheless, I would like to introduce a couple of books to you. I'm not by any means promoting these books for sale, but I guess that, just like I did as a student, you are also reading books other than the ones you have to read in order to follow the courses. Some of the books I have read have provided me with ideas and insights which I would not otherwise have found, and I would like to share this with you. When it comes to the history and philosophy of probability and statistics, quite recently I found a book called Chances Are: Adventures in Probability. This is quite an interesting book, not only if you are interested in probabilities: there are a lot of other insights of general value in it, it provides an insight into the history of the development of statistics and probability, and it gives you an idea of the role of statistics and probability in the human and also in the natural sciences. So I can strongly recommend the information contained in this book. Another interesting book is Jared Diamond's Guns, Germs, and Steel, which gives an extremely interesting perspective on the development of civilizations in the world. It takes you basically from 50,000 years before Christ was born up to the present date, and it explains why the different civilizations of the world appear to be what they are today, and how that all happened.
And it also gives you an idea of how they might progress in the future. The same author more recently published a book called Collapse, which elaborates a little on the perspectives for the future. Collapse seems to be quite negative, but maybe from such considerations we can also learn a little. The third book, which really has an interesting title, is Zen and the Art of Motorcycle Maintenance. How many of you actually have a motorcycle? There's one there. I used to have one. So we are three. This book is actually quite a lot about motorcycles and at the same time very little about motorcycles; it has a lot to do with the philosophy of learning and the science of learning in general, and also a little about chasing your ghost, and I guess each of you, just like me, has a ghost which we are somehow trying to find. This is also a quite interesting book. Well, enough of that; now let's get down to business. Today the main interest of the lecture is to look more into the details of random variables and how we can characterize them. Remember that we use random variables to represent the uncertainties which influence the events for which we would like to calculate the probabilities of occurrence. That is the role of the random variables. In the last lecture we introduced a couple of characteristics, or ways of describing random variables: the probability density function and its parameters, and the cumulative distribution function, which is simply defined as the integral of the density function. We were talking about two types of random variables, namely discrete random variables and continuous random variables. The discrete random variables were used to express uncertain phenomena which can only take discrete realizations.
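As a small refresher on the discrete case, here is a minimal sketch; the fair six-sided die is my own illustrative choice, not an example from the lecture. The probability mass values of a discrete random variable must sum to one, and the cumulative distribution function at x is the accumulated probability of all realizations up to x.

```python
from fractions import Fraction

# A discrete random variable: a fair six-sided die (illustrative choice).
# The probability mass function assigns probability 1/6 to each outcome.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# The probabilities over the whole sample space must sum to one.
assert sum(pmf.values()) == 1

def cdf(x):
    """Cumulative distribution function: P(X <= x)."""
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(3))  # P(X <= 3) = 1/2
```

Using exact fractions rather than floats keeps the bookkeeping honest: any mass function that does not sum to exactly one is caught immediately.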
The continuous random variables represent uncertain phenomena which can take realizations in a continuous sample space. In order to describe these random variables we use the density functions and the cumulative distribution functions. But it is not enough just to have these functions; we also need to define them in terms of their parameters. We also looked at the moments of random variables. These moments can be applied to define the parameters of the probability density functions and the cumulative distribution functions. So that is more or less where we are today. We start from there, and then we look at further properties of the expectation operator which we introduced. We will also look at the situation where we do not have only one random variable but a whole set of random variables, which we collect in a vector; that is why we speak of vectors of random variables, or random vectors. When we deal with more than one random variable, we also need to introduce the concept of joint moments: not only the moments of a single random variable, but moments involving several random variables. I will also talk about conditional distributions and conditional moments. If you think back to the basics of probability theory, in the second lecture we introduced conditional probabilities, if you remember, for the situations where we actually had some information about an uncertain phenomenon. The same applies here: when dealing with moments and distributions, we can also introduce these conditional on information which we know.
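The relation recalled above, that the cumulative distribution function is the integral of the density function, can be checked numerically. The exponential density below is my own illustrative choice, not one from the lecture:

```python
import math

# Illustrative continuous random variable: exponential with rate lam = 2.
lam = 2.0

def density(x):
    """Probability density function f_X(x)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf_exact(x):
    """Closed-form cumulative distribution function F_X(x) = 1 - exp(-lam*x)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def cdf_numeric(x, n=100_000):
    """F_X(x) as the integral of the density from 0 to x (midpoint rule)."""
    dt = x / n
    return sum(density((i + 0.5) * dt) for i in range(n)) * dt

# Integrating the density reproduces the distribution function.
print(cdf_exact(1.0), cdf_numeric(1.0))
```

The same numerical integration works for any density, which is useful when no closed-form distribution function is available.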
Then we move a step further and look at functions of random variables. We start with the very simplest function you can imagine, namely a sum of random variables, and I will show you a quite nice little result on how to establish the cumulative distribution function and the density function for a sum of random variables. Then we take a little leap, jumping to general functions of random variables and how we can describe those in terms of density functions and cumulative distribution functions. So that's the program. Are you ready? Or did I just kill you? Well, random variables. We are dealing with the real world and with uncertain phenomena in the real world, and these uncertain phenomena generate whatever information we can observe of the real world. What we want to do is model the real world, and the means for modeling the real world is to represent the uncertainties generating the data we can observe. We do that by establishing probabilistic models based on random variables, and all engineering decision making is fundamentally based on these probabilistic models and, in a sense, on the random variables. The random variables we can then also use to generate outcomes. These outcomes, the data generated on the basis of the models, of course do not have the same kind of value as the real observations, but at least the information we have in our model should represent the real world in the best way possible, and the data generated by such models should represent, in the best way possible, data which could be expected to come out of the real world. Now, looking at the expectation operator: what is the value of this thing?
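Concretely, the expectation operator is just a probability-weighted average over the realizations. A minimal sketch on a made-up discrete distribution (values and probabilities are my own illustrative numbers), including the linearity property discussed below:

```python
# The expectation operator as a probability-weighted average, sketched on a
# made-up discrete distribution: pairs of (realization, probability).
outcomes = [(-10.0, 0.2), (0.0, 0.3), (20.0, 0.5)]

mean = sum(x * p for x, p in outcomes)               # E[X]
var = sum((x - mean) ** 2 * p for x, p in outcomes)  # Var[X] = E[(X - E[X])^2]

# Linearity of the expectation operator: E[a + b*X] = a + b*E[X].
a, b = 3.0, 2.0
mean_lin = sum((a + b * x) * p for x, p in outcomes)

print(mean, var, mean_lin)
```

For a continuous random variable the sums become integrals over the density, but the idea is identical.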
Well, the expectation operator makes it possible to assess the expected value and also the variance of a random variable. If we have some uncertain phenomenon, the first two characteristics which can give us an idea about the uncertainty in the phenomenon are a central value and a measure of the dispersion: the expected value and the variance. For this reason the expected value and the variance are interesting to be able to assess. By understanding how the expectation operator works, we are able to assess the expected value and the variance also of functions of random variables, and of course this is of general interest in engineering, because normally we are not dealing with just one uncertain phenomenon: the engineering models are somehow sums, products, and other functions of a set of random variables, uncertain parameters entering into the equations we use to evaluate some problem. So it is useful if we want to analyze engineering models involving one or more random variables with regard to their expected values and their variances, and this is why we need to look at it today. One example could be the analysis of the duration of a construction process as a function of the durations of the individual sub-processes. Imagine that we have a construction process comprising a lot of individual processes; if we have information, based on experience, on the typical durations of such sub-processes, then we can start to make statistical or probabilistic assessments and models for the sum of these sub-processes, and that gives us a tool, a framework, for planning when we are dealing with construction processes or projects. That was just a little argument for why we want to consider this at all. Now, the expectation operator has some properties which are very convenient to know. First of all, the expected value of a constant, so of a deterministic constant, is
simply equal to the constant itself: if there is no uncertainty, then the expectation operator simply returns the deterministic number. If we are dealing with a random variable multiplied by a constant, then the expectation operator works in such a way that the constant just flies outside of it: the expected value of c times the random variable X becomes c times the expectation operator working on the random variable X. As a consequence of these first two rules, the third one follows immediately: if we have the expectation of a constant plus a constant multiplied by a random variable, then the result simply becomes the first constant plus the second constant taken outside the expectation operator, like this. It is also important that the expectation operator is a linear operator: the expectation of a sum of two functions of random variables can be written as the sum of the expected values of the individual functions. We will come back to how to evaluate the expected value of a function. Now, the variance. Remember that the variance is defined as the second central moment; we introduced that last time. If we write that up, as we have here, the variance of the random variable X can be written as the expectation operator working on the squared difference between the random variable and its expected value. It is a central moment in the sense that we subtract the mean value, the expected value, and it is the second moment because we take the difference to the power of two. Now we can simply expand this expression a little and obtain this one here, and then we can look at it. We can immediately see that the expected value to the power of two is simply a deterministic value, and we know that the expectation operator is a linear operator, so we can split this up and take the expected value of each of
those terms. That becomes, first of all, this constant here; let me just use a pen, if I can find one, I'll take this one. This term here becomes this term, this term here goes down to this, and the last term here is what we have down here. Here you see that the constant goes outside of the expectation operator, and this term here is of course equal to the expected value of the random variable X, which means that this whole term results in what we have here; this part goes down here, and this part goes down here. Now we see that we have some expected values to the power of two: we have one here, but then we subtract two of them, so we end up with only one, with a minus sign. This is the result, and it is a quite useful little result which we will come back to. Now, for the variance operator: since the variance operator can be defined through the expectation operator, it is obvious that the first result written here, namely that the variance of a deterministic constant is simply equal to zero, follows directly. For the variance of a constant multiplied by a random variable X, we have to imagine that this constant goes into the expectation operator, and as a result the constant can be taken outside the variance operator, but raised to the power of two. The final result follows because the expectation operator is a linear operator: using the first rule, the variance of the first term, the deterministic constant, simply becomes zero, so we are left with the variance of b multiplied by X, and the result of that we get from the second equation, so we end up with this term here. This is the variance of a linear combination of a constant and a constant multiplied by a random variable. Now I want you to look again at the
expression for the variance of a random variable. What we ended up with was that the variance of a random variable could be written as the expected value of the random variable squared minus the square of the expected value. Now, looking at that, you might wonder whether, if we wanted to evaluate the expected value of this term here, it would be possible just to insert the expected value of X and take it to the power of two: insert mu of X here, to the power of two, and then subtract mu of X to the power of two. But then we see that we would get zero. That points to the insight that only in the case where we are dealing with a deterministic quantity, a parameter which is not associated with uncertainty, is the variance equal to zero. If we are dealing with a certain variable, where there is absolutely no uncertainty, then there is no dispersion and no variance, and in that case, and in that case only, can we take the expectation of a non-linear function by inserting the expected value directly into the function and evaluating the function in this way. You see, if we do it like that here, we get the result zero; it only applies if we are dealing with a deterministic variable. In general we cannot do that. So what does that lead to?
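The point just made can be checked by simulation: for a non-linear function, plugging the mean into the function does not give the expectation of the function. The uniform distribution below is my own arbitrary choice for the sketch:

```python
import random

random.seed(1)

# Samples of a random variable X, here uniform on [0, 1] (arbitrary choice).
xs = [random.random() for _ in range(200_000)]
n = len(xs)

mean_x = sum(xs) / n
e_of_g = sum(x * x for x in xs) / n   # E[g(X)] for the convex g(x) = x^2
g_of_e = mean_x ** 2                  # g(E[X]): the mean plugged into g

# E[X^2] - (E[X])^2 is the variance, which is positive for any uncertain X,
# so plugging the mean into g underestimates the expectation of g(X).
print(e_of_g, g_of_e, e_of_g - g_of_e)
```

The gap between the two numbers is exactly the (sample) variance of X, which vanishes only for a deterministic quantity, matching the argument above.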
That leads to the following result. Consider the expected value of a function of a random variable, for instance like this one, where the function takes X to the power of two. What we can say in general is that the expected value of a function of a random variable is larger than or equal to the function taken at the expected value; this insight is valid for convex functions, and the result is what we call Jensen's inequality. This is a quite useful insight. If it is a linear function, then we know that the expectation operator is a linear operator, and it is very easy to evaluate the expected value. But if it is a non-linear function, and we do not have the information or the equations required to evaluate the expectation of this function correctly, then what we can do is take the function at the expected value, and we know that the value we get will be smaller than or equal to the correct value. The equality holds for linear functions, because the expectation operator is a linear operator. Okay, now we have a small little exercise. These small exercises may relate to any lecture, but mostly we try to pick up material from the last lectures, and this one also goes back to the last lecture. Assume that we have a bag full of pins, which I guess is not the most normal situation, but just imagine that you have a bag full of pins, and all you know is that there are pins of two colors in the bag: some are red and some are green. Now you are asked to put your hand into the bag, randomly select a pin, and pull it out. The uncertainty associated with the color of the pin you will be pulling out: what type of uncertainty is that? I give you the following options: red means aleatory, green means epistemic, and yellow means
both. I see there is a great consistency in the answers; I would say the vast majority, with very few exceptions. There are a few exceptions: I see one green and two yellow. Let's try to think about this result. I'm sure that when you think about pulling out a pin, randomly selecting with closed eyes and whatever you can do in order to do it correctly, you imagine that the real component of uncertainty here is aleatory. But there is one thing you forgot: you do not know the proportion of red pins to green pins in the bag, and this uncertainty will also influence the uncertainty associated with the color of the pin you are pulling out of the bag. Imagine that you have to assess the probabilistic characteristics of this process of pulling pins out of the bag. It makes a big difference if there are only red pins in the bag: if you knew that from the very beginning, and you had to assess the probability that you would randomly put in your hand and select a pin, and that the pin would be red or green, then this information would be absolutely crucial. And the other way around: if there are only green pins in the bag, then however randomly you select the pin, it will be a green pin with probability one. So in this particular case, where we do not have a clue about the proportion of green pins to red pins, this lack of knowledge, this epistemic uncertainty, also influences the problem. Okay, one small additional exercise. It relates to the probability density function. If you remember, we had the definition that the probability density function can be derived from the cumulative distribution function through the first partial derivative of the cumulative distribution function with regard to the argument; this is what we have here. We also introduced that the probability of an outcome of a random variable in an interval, the interval going from x to x plus dx, can
be written as this probability here, namely that the random variable is contained in the interval x to x plus dx, and this probability is actually equal to the probability density function multiplied by the interval length dx. Now, if you look at this expression (I am sure many of you already heard about this in high school), if you let dx go to zero, then it is also clear that the probability that the random variable will be in the interval will also go to zero. That implies, mathematically or theoretically, that the probability of a continuous random variable taking one specific value is equal to zero. Now we could ask ourselves whether this means that it is actually impossible to obtain a given single value by observation, and here there are three possible answers again: if you support the statement that it is impossible, then you should give me the red card; if you mean no, then you should give me the green card; and if you do not know, then give me the yellow card. Remember, we are talking about realizations of continuous random variables. I see a lot of green here, and a little yellow; actually, I see some yellow. Okay. Well, no: in reality, no would be the actual situation. The thing is that whatever we observe, we only observe with a certain accuracy, and that means that whatever discrete value we are observing, it is actually not a completely discrete value; it corresponds to an interval given by the precision of our observations. For this reason we are actually always observing interval observations, in a sense, so for our observations there is a probability associated with what we observe. Okay, let's take the next step. We looked at the expectation operator, we had a look at the variance operator, and we also introduced important concepts relating to the expected value and the variance of simple functions. I would like you always to bear in mind Jensen's inequality: that we can simply not just take the
expectation of a function by evaluating the function at its expected values. Remember that when dealing with real engineering problems we do not deal with only one random variable; that is very seldom the case, and therefore it is useful to introduce the concept of vectors of random variables, or random vectors. In general such components may be dependent, which means that their outcomes may be correlated; we introduced the concept of correlation already in the third lecture. One example would be rainfall and water level: the water level in Lake Zurich would be dependent on the event of rainfalls, so there is a dependency between those two uncertain phenomena. It is therefore necessary that we establish probabilistic models which take this dependency into account, and what we can do in order to describe such uncertainties consistently, including the interdependencies between the random variables, is to introduce what we call joint cumulative distributions and joint moments. So let's consider that we have a vector of random variables, as I have here. I now write my capital letter X in bold, and that always means a vector or a matrix; in this case it is a vector, and I have written it out for you so that you may see what this vector is comprised of, namely a set of random variables X1 up to Xn. We are dealing with a vector of n random variables, and of course I write it transposed, because otherwise it is not in the form of what you understand a vector to be, which is basically a column. Now that we have a vector, the joint cumulative distribution function is given, in a way similar to that for a single random variable, as the probability that the random variable X1 is smaller than or equal to the argument x1, which
is contained in this vector here, that the random variable X2 is smaller than or equal to x2, and so on, up to the random variable Xn being smaller than or equal to xn. We take the intersection of all these individual events, and the joint cumulative distribution function expresses the probability that they will all occur. Just as for an individual random variable, we can develop the joint density function from the joint cumulative distribution function by taking the n-th order partial derivative with respect to all the arguments. We have the arguments of the function, these values here, and if we differentiate this partially with respect to all the arguments, as we do here, then we obtain the joint probability density function. That is actually a very simple operation: it is very easy to take the first order partial derivative of a function with regard to one particular argument, and in differentiating this joint cumulative distribution function we simply do it n times, one time with regard to each of the arguments, repeating the process up until the n-th variable. So that is really not complicated; it looks much worse than it is, I would say. The next thing I would like to show you is an illustration of a joint probability density function of discretely distributed random variables; this is what we have on the right side. This is the joint density function of two discretely distributed random variables, and what you see here are the probability densities, the densities p(x, y). Of course, here I just provided some numbers; this is just an example. The values defining the probability densities you have in this table. Okay, this is an illustration of a joint density for discretely distributed random variables. Now you can imagine that if we were dealing with two continuously distributed random
variables, then we would have a nice function, for instance the typical bell-shaped functions which you have seen earlier. Of course, if you take all the densities in this figure and sum them up, the sum should be equal to one, because the probability of all events together should be equal to one. Now, for jointly distributed random variables, considering the joint density function of a vector of random variables, we can define what we call the marginal probability density function. This is defined for one of the random variables in the vector, say the marginal probability density function of the random variable Xi, and we can evaluate it by taking the joint density function for the whole vector and folding this density function together in all dimensions except the i-th dimension. We are integrating out over all the arguments except xi; you see dxi is not in the integral, so we are simply folding together this n-dimensional function over n minus 1 dimensions, and then we get the marginal density function for the random variable Xi. How does that look in this small example? We started out with this joint density function, and now I would like to have the marginal density function of the random variable X; here we have x, and here we have y. What I do is take the joint density of X and Y and integrate out over all y values. The way this works, if you go into the density function table here where the values are defined: for one value of x, namely 1, I add up the densities of those entries and arrive at this value here, and I note this value; the result is this value here, in the x direction. Then I do the same for the next one: I note the values of these densities, add them up, and get this value here. I continue this process, I get this and I get this, and this is now the marginal probability density function
for the X variable. So this is the way of folding together. Of course, the marginal density functions give us some information about the uncertainty associated with each of the random variables, but they give no information about the dependencies between the two random variables X and Y. Very often, when people try to describe uncertain phenomena, and this is fair, because it is very difficult to build up an opinion about an n-dimensional space, what we tend to do as humans is to describe the phenomena individually, and the individual description of a random variable would normally correspond to the marginal probability density function. But we have to remember that there is more to the story, because this random variable which we are describing, implicitly through the marginal probability density function, would most likely be somehow dependent on other random variables, and in order to get the full picture we would need the joint density function. Okay, let's take a break. Okay, ladies and gentlemen, we start this hour, the next 45 minutes, again with one of these extremely small exercises. Consider a profit. You all want to make a profit, don't you? Is there anybody who is not interested in profit? Well, then we would be willing to share. Let's assume that the profit is uncertain and is modeled by a random variable X, and this is the profit earned by a contractor on a construction project. He is trying to look into the future, and he models this uncertain profit by a random variable, for which he has defined a probability density function. This is a continuous probability density function defined on the interval from minus 10 up to plus 70, and we are dealing with units of $1,000, so it goes from a loss of $10,000 to a possible income of $70,000. Of course this uncertainty is due to the combined effect of many
individual uncertainties which influence the success of the project. Now, the main concern of the contractor may be: what is the probability that this project will give me a loss? For this reason I would like you to come up, very easily and very fast, based on this information, with the probability that the contractor will suffer a loss, and you have a couple of options here: red means the probability is 0.2, green means the probability is 0.02, and yellow means the probability is 0. What do we have? We have mostly red, and then there is a green cluster, which is quite interesting; maybe there is some sort of group psychology involved in the answering. Anyhow, the right answer is red: you simply take the area under the probability density function over the negative values, and that is not so difficult here. It means we have to multiply this density value here by the length of this interval here, so 0.02 multiplied by 10, which is equal to 0.2. Yep, you got that. Good, then we will proceed. Okay, I really appreciate that you might find a lecture like this a little filled with formulas. Well, a part of this lecture is to give you tools on the basis of which you can do your own assessments, and you simply need a minimum toolbox; I recommend, and this is really a good idea, to have a minimum toolbox. I know the problem in my house: whenever I have to fix something which has broken, whenever I have a problem with windows which cannot close or something like that, I go and find my toolbox, and you can imagine my problem, because in my toolbox I basically have a hammer and a screwdriver. That means that all the problems I can solve, to me, look like problems which can be solved using a hammer and a screwdriver, and as a consequence the results are not particularly good, I have to admit. Also for this reason I do not have the problem of the neighbors coming to ask for my help. Now, with you, the situation should be
completely the other way around: you should have a toolbox which you can open whenever you have a problem, and you can find the right tool, so that when you have a screw, the only tool you have will not be a hammer. And it should be such that your neighbors come to you whenever they have a problem, because they realize that you have the right toolbox and know what tool to choose. So a part of this lecture is really to give you some tools, and I appreciate that during the lectures it is not immediately easy to see what type of problems really can be appropriately solved by each of these tools. The thing is that the tools are of very general application, so there may be many different kinds of problems which can be solved by the individual tools; it depends very much on the problem you are dealing with, and these problems will always be a little different. So it is not the case that I can give you the one example where you need this particular tool; there may be many, many examples, and what you need to achieve is an understanding of what the tool really can do, what the contents of the tool are. During the exercise tutorials on Thursday we try to give you a variety of problems where you can see the different contexts in which the tools can be applied. Having said that, I will introduce a new thing in the lecture, and this is a little exciting for me, maybe not for you, but for me it is; it is also a little scary. You know, of course, that in soccer the referee has the possibility to show a yellow card and a red card. Please don't give me your red card. But if there is something I am trying to explain and you really feel that it could not be understood, then of course it is because I did not explain it correctly or in a good way, and I can try to reformulate my explanation. If that happens, if you have a severe problem with
something I said, then give me the yellow card, and then I can immediately see that I need to explain something better. Will you do that? Green means yes, red means no. You will do that, thank you. Now we are going to introduce what is called the covariance, the covariance between the random variables in the vector. We are looking at the random variable Xi and the random variable Xj in a random vector of continuously distributed random variables, and this covariance is defined as the joint central moment: Cov[Xi, Xj] = E[(Xi − E[Xi])(Xj − E[Xj])], where E[·] is the expectation operator which we use to assess the moments of random variables. Where previously we looked at only one random variable when defining moments, we now have the random variable Xi minus its expected value, multiplied by the random variable Xj minus its expected value. You see the equivalence: if we were only dealing with the variance of a random variable, then i and j would be the same, i would be equal to j, and the whole expression would boil down to the expectation of (Xi − E[Xi]) to the power of 2, which is what you have seen before. Now it is a joint moment, so there are two different random variables. Given this definition of the joint central moment, it is also very easy to evaluate it using the usual equation for assessing moments: here we have the joint density function of the two random variables, and we integrate this density function, multiplied by these two arguments, over the definition domain of the random variables; I am assuming that the definition domain goes from minus infinity to infinity for both of them. The result of this integration is the joint central moment of the two random variables, and this is what we call the covariance. For the case that i is equal to j we of course recover the variance of Xi.
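As a small numerical sketch of this definition, the following estimates the covariance of two jointly normal random variables by Monte Carlo; the means, variances and the true covariance of 1.2 are illustrative assumptions, not numbers from the lecture.

```python
import numpy as np

# Sketch of the definition Cov[Xi, Xj] = E[(Xi - E[Xi]) * (Xj - E[Xj])].
# The bivariate normal below, with true covariance 1.2, is an assumed example.
rng = np.random.default_rng(42)
true_cov = np.array([[4.0, 1.2],
                     [1.2, 1.0]])
samples = rng.multivariate_normal(mean=[1.0, 2.0], cov=true_cov, size=200_000)

xi, xj = samples[:, 0], samples[:, 1]
cov_ij = np.mean((xi - xi.mean()) * (xj - xj.mean()))  # joint central moment
print(cov_ij)  # close to the true value 1.2
```

With 200,000 samples the estimate settles very close to the covariance we put into the model, which is exactly what the joint central moment is supposed to measure.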
We often arrange the covariances in a matrix, and this we call the covariance matrix. In this matrix the entry in row i and column j is C[Xi, Xj], which means that we have the variances in the diagonal and the covariances outside the diagonal. We also have that the covariance of Xi and Xj is equal to the covariance of Xj and Xi, so this covariance matrix is a symmetric matrix. Now I would like to delete this drawing again; there is something called an eraser with which we can simply delete, very convenient, so apart from a few red spots we are fine. Okay. Another thing I would like to introduce are the correlation coefficients, and those we obtain from the covariances: the correlation coefficient between the random variable Xi and the random variable Xj is defined as the covariance between Xi and Xj divided by the standard deviation of Xi and the standard deviation of Xj. For this reason you may also realize that the diagonal elements of the correlation coefficient matrix, which we can build in the same way as the covariance matrix, so the correlation coefficients between Xi and Xi, are all equal to 1, because each becomes the variance divided by the standard deviation to the power of 2, and that is of course equal to 1; the standard deviation to the power of 2 is equal to the variance, if you remember. Okay, so the expected value and the variance of a linear function is what we are now really able to assess, if we have some sort of engineering problem which can be described by a linear function of random variables.
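As a quick sketch, with an assumed covariance matrix, the correlation coefficient matrix can be built exactly as just described:

```python
import numpy as np

# rho_ij = Cov[Xi, Xj] / (sigma_i * sigma_j); the covariance matrix C is an
# assumed example, with variances 4.0 and 1.0 on the diagonal.
C = np.array([[4.0, 1.2],
              [1.2, 1.0]])

sigma = np.sqrt(np.diag(C))        # standard deviations sigma_i
R = C / np.outer(sigma, sigma)     # correlation coefficient matrix

print(R[0, 1])                     # 1.2 / (2.0 * 1.0) = 0.6
print(np.diag(R))                  # diagonal elements are all 1
```

Note that the correlation matrix inherits the symmetry of the covariance matrix, and its diagonal is identically 1, as argued above.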
For instance, consider the duration of a process comprised of sub-processes which follow consecutively, where there is uncertainty associated with the duration of each sub-process. Then we can build a probabilistic model for the total duration of the project in terms of a linear combination Y = a0 + a1 X1 + ... + an Xn, where a0 is a deterministic value and then we have a sum of n components, random variables multiplied by deterministic coefficients. Knowing what we have learned, namely that the expectation is a linear operator, we immediately obtain the result that the expected value of this function Y is equal to a0 plus the sum of the deterministic coefficients multiplied by the expected values of the individual random variables. In the same way, from what we have learned, we can assess the variance of this linear combination of random variables. Of course the constant a0 disappears, and we get the following terms: for the variance of ai multiplied by Xi, we remember that ai comes outside the variance as ai to the power of 2 times the variance of Xi, and we sum over those terms. We also have to take the dependencies into account, as they also play a role for the variance, and that leads to the product terms with the covariances: the covariance between the random variable Xi and the random variable Xj, multiplied by the coefficient ai and the coefficient aj, summed over i and j outside the diagonal, that is, for i different from j, over all components. So now we have a very useful tool: if we have some sort of engineering model which looks like this, then we can immediately evaluate the expected value and the variance of this function, provided we know the expected values, the variances and the covariances of the random variables. This is the information we need.
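A minimal sketch of these two rules, with assumed numbers for the coefficients, the expected values and the covariance matrix (think of the Xi as uncertain sub-process durations):

```python
import numpy as np

# Linear-function rules:
#   E[Y]   = a0 + sum_i a_i E[X_i]
#   Var[Y] = sum_i a_i^2 Var[X_i] + sum_{i != j} a_i a_j Cov[X_i, X_j]
#          = a^T C a   (one quadratic form collects both kinds of terms)
# All numbers below are illustrative assumptions.
a0 = 5.0
a = np.array([1.0, 2.0, 1.0])
mu = np.array([10.0, 4.0, 6.0])       # expected values E[X_i]
C = np.array([[2.0, 0.5, 0.0],        # covariance matrix of X
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])

mean_y = a0 + a @ mu                  # 5 + 10 + 8 + 6 = 29
var_y = a @ C @ a                     # quadratic form, here 10.3
print(mean_y, var_y)
```

Writing the variance as the quadratic form a^T C a is convenient in practice, because the diagonal of C supplies the ai^2 Var[Xi] terms and the off-diagonal entries supply the covariance terms in one expression.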
Using this second order information we can then analyze the first and second order characteristics of our linear function, and based on that we can evaluate probabilities of events related to this function which may be important in our engineering problem. We are also dealing with conditional distributions and conditional moments. Sometimes it is useful to be able to assess the probability of some event given that we know something about one of the uncertainties on which the event depends. As an example, let us assume that we want to calculate the probability that the project we were talking about previously will be delayed: we want the probability of a project delay under the condition that one of the sub-processes will exceed its planned duration by 50%. That may be useful information during the project: if for some reason we get information about one of the ongoing processes, and this information supports the suspicion that this particular sub-process will probably be delayed by 50% relative to the original plans, then it would be nice to be able to update the probability for the duration of the entire project, because that would be useful information for the project planners, so that they can make decisions in order to reduce the consequences. So, when we are dealing with jointly distributed random variables, we have a vector of random variables, which we again assume to be continuously distributed. The conditional probability density function for the random variable X1, given the outcome of the random variable X2, is written as f(x1 | x2), where the bar means "given", and if we want to write it in a way where everybody can see it immediately, we also write the arguments accordingly: x1 is the real argument, and the value of x2 is given. With reference to the example, the given value of x2 corresponds to the 50% delay of one of the sub-processes, and we
want to evaluate the probability distribution of X1, which could be the total duration of the project. If we assume that we can start out from the joint probability density function of X1 and X2, taking this as the basic information, then the conditional density function we want can be assessed as the joint density function divided by the probability density function of X2 evaluated at the given value x2: f(x1 | x2) = f(x1, x2) / f(x2). Of course, if X1 and X2 are independent, then we can separate the joint density function, writing it as the density function of X1 multiplied by the density function of X2, and then, dividing by the density function of X2, we can cancel, and we are left with the density of X1 alone. The conditional cumulative distribution function is simply obtained from the conditional density function by integrating it up, just as we do for any other density function; the cumulative distribution is always the integral of the density, whether it is a conditional density or an unconditional density. For this reason the conditional cumulative distribution function is defined as the integral of f(x1 | x2) from minus infinity up to x1, the argument of the cumulative distribution. Please remember that x2 here is a given value. Of course, if we have a conditional density, or say a conditional cumulative distribution function, then we can obtain the unconditional one by integrating the conditional cumulative distribution function over the possible outcomes of the variable upon which we have conditioned, weighting with the density of that variable: F(x1) is the integral of F(x1 | x2) multiplied by f(x2), taken over x2 from minus infinity to infinity.
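Here is a numerical sketch of this relation, under an assumed model that is not from the lecture: if X1 given X2 = x2 is normal with mean x2 and unit variance, and X2 is standard normal, then unconditionally X1 is normal with mean 0 and variance 2, and the weighted integral reproduces exactly that.

```python
import math

# Total probability: F_X1(x1) = integral F_{X1|X2}(x1 | x2) f_X2(x2) dx2.
# Assumed model: X1 | X2 = x2 ~ N(x2, 1), X2 ~ N(0, 1), so X1 ~ N(0, 2).
def pdf_std_normal(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_std_normal(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_x1(x1, lo=-8.0, hi=8.0, n=4000):
    # midpoint-rule integration over the conditioning variable x2
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x2 = lo + (k + 0.5) * dx
        total += cdf_std_normal(x1 - x2) * pdf_std_normal(x2) * dx
    return total

exact = cdf_std_normal(1.0 / math.sqrt(2.0))  # closed form for N(0, 2) at x1 = 1
print(cdf_x1(1.0), exact)                     # the two values agree closely
```

The point of the sketch is only that integrating the conditional distribution against the density of the conditioning variable removes the conditioning, just as the total probability theorem states.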
So we integrate over the variable which we have conditioned on, namely X2, and we weight with the probability density of this variable, and by doing that we obtain the unconditional cumulative distribution function of the random variable X1. This equation is also referred to as the total probability theorem: we are simply adding up the contributions of the conditional cumulative distribution function over all possible values of the variable upon which we have conditioned, namely X2, weighted by the probability that these outcomes will occur, which is the probability of X2 lying in the small interval of length dx2. Now we can also define the conditional expected value, just as we can define the expected value of a random variable, and when we deal with the conditional expected value we do it exactly as usual. We write it as E[X1 | X2 = x2], so the expectation operator is working on X1 under the condition that X2 is equal to a certain value. Just as normal, when we take the expectation we integrate over the entire definition domain of the random variable, which I assume goes from minus infinity to infinity, and we weight the variable x1 by the conditional probability density function of X1 given X2. It is completely similar to the unconditional case; the only difference is that we use the conditional density function when we take the expectation. Now it is time for a small exercise. Let us assume that we are dealing with this contractor again: he is planning to buy three bulldozers for a new project, and he knows by experience the probability density function for the number of bulldozers X, a random variable, which will break
down halfway through the project; it can be described through this probability density function. This is now a discrete probability density function giving you the probabilities for the number of bulldozers breaking down halfway through the project: there is a probability of 1/8 that none of the bulldozers will break down, there is 3/8 for each of the events of 1 and 2 bulldozers breaking down, and so on. Now I would like you to calculate the expected value of the number of bulldozers which will break down halfway through the project: red means 3, green means 2, and yellow means 1.5. How can 1.5 bulldozers break down halfway through the project? That is a philosophical problem. Now you are giving me mostly yellow; does that mean I have to explain this again, or does that mean 1.5? Come on, we need more cards; give yourselves 10 seconds and use a pencil to come up with the right answer. Okay, I see mostly yellow, and of course the right answer is yellow: we have 1.5 bulldozers breaking down halfway through the project. It is very easy to calculate, because we want to evaluate the expected value of this random variable, the number of bulldozers breaking down halfway through the project. It is a discrete random variable, so we sum over the possible numbers of bulldozers breaking down, which are of course the integer values 0, 1, 2 and 3, and we establish the expected value through the first order moment, namely the possible realizations weighted by their probability densities. The result is 1.5; that should be very clear. Now, in many cases we are interested in assessing the probabilities of functions of random variables, and this is what we are talking about today. Functions can be useful for describing the events we are interested in; these functions define our engineering models.
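The computation behind the answer is just the first order moment of a discrete random variable; in code form, using the probabilities from the exercise:

```python
# Bulldozer exercise: E[X] = sum_x x * p(x), with the pmf from the lecture,
# p(0) = 1/8, p(1) = 3/8, p(2) = 3/8, p(3) = 1/8.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

expected = sum(x * p for x, p in pmf.items())
print(expected)  # 1.5
```

The probabilities sum to 1, as a probability mass function must, and the weighted sum of the realizations gives exactly the 1.5 bulldozers from the clicker question.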
The engineering models will be functions of variables, and when these variables are uncertain, we can model them by random variables. The simplest case we can imagine involving more than one random variable is the sum of two random variables, and when dealing with such problems it is possible to derive a very nice little result for the probability density function, or say the cumulative distribution function, of the sum of these two random variables. There is also a more general result for monotonic functions of random variables, and I will give you that as well. Now, if we could focus here a little; I know that this might look a little bad, and I hear some mumbling, but we really need to focus on this, otherwise it will fly over your heads. Consider the sum of two random variables X1 and X2; this sum we call Y. Let us assume we have the joint density function of the two random variables X1 and X2. In order to proceed we use a trick: let us assume that we know the outcome of one of the two random variables, X1, and now we want to analyze the probability density of Y given X1. Are you following this up there? Assuming that X1 is given, what we need is the conditional density function of X2 given X1, and we have just learned that we can obtain that by taking what we had from the start, namely the joint density function, and dividing it by the density function evaluated at the value upon which we condition, namely x1. So this is what we have: the conditional density of X2 given X1. That gives us a first means of assessing the probabilistic structure of Y, because now we can write the conditional density function of Y given X1 by substituting X2 with Y minus X1: since Y = X1 + X2, once X1 is given we can isolate X2 as Y − X1. This is what we have here, so now we
have the conditional density function of X2 given X1, taken in the argument y − x1, conditional on x1. Based on this, if we take this conditional density function and want to remove the conditioning, all we need to do is multiply it by the density function of X1; then it becomes unconditional. This is what we do here: we take the conditional density and multiply by the density of X1, which gives the joint probability density function of Y and X1, and this is equal to the density function of X2 and X1 taken in the arguments (y − x1, x1). In order to now get the density function of Y, which is simply the marginal, all we need to do is integrate out over x1. So this is what we are doing: we integrate this joint density function, defined for the random variables X2 and X1, over the entire definition domain of the random variable X1, and so we obtain the marginal, which is the probability density function of Y. What you can also see is the nice thing that if the two random variables X1 and X2 are independent, then we can split up the joint density function as usual into its two marginal components, and this is what is called the convolution integral in probability theory. Now, in this case we were only developing the density function for the sum of two random variables, but you can continue: if you have a sum of n random variables, then you can apply this expression to two of them, group the result together, and continue over all the random variables in the sum. In this way we can develop the probability density function of any sum of random variables for which we have the joint probability distribution, and it becomes especially easy if the random variables are independent.
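As a sketch of the convolution integral in the independent case, with an assumed example of two independent standard normal variables (whose sum is normal with mean 0 and variance 2):

```python
import numpy as np

# Convolution integral for independent X1, X2:
#   f_Y(y) = integral f_{X2}(y - x1) f_{X1}(x1) dx1.
# Assumed example: X1, X2 ~ N(0, 1) independent, so Y = X1 + X2 ~ N(0, 2);
# we check the numerical convolution against that closed form at y = 1.
def f(x):  # standard normal density
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

x1 = np.linspace(-8.0, 8.0, 4001)
dx = x1[1] - x1[0]
y = 1.0

f_y = np.sum(f(y - x1) * f(x1)) * dx                 # convolution integral
exact = np.exp(-y**2 / 4.0) / np.sqrt(4.0 * np.pi)   # N(0, 2) density at y
print(f_y, exact)
```

The numerical convolution and the closed-form density of the sum agree to high accuracy, illustrating that the convolution integral really is the density of the sum.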
As you see, there is also another result, which relates to the cumulative distribution function of functions of random variables. Now we are looking at general functions Y = g(X), and we assume that we have the cumulative distribution function of X. We need to differentiate a little: if the function g is a monotonically increasing function and represents a one-to-one mapping, then the realization of Y is smaller than y0 only if the realization of X is smaller than x0, where x0 is evaluated through the inverse of the function taken in y0. Now, I think I said the vector X here, but that is not true; it is a scalar, not a vector, so this is a function of a single random variable, and it is a monotonic function and a one-to-one mapping from X to Y. The important thing is to realize that the realization of the random variable Y can only be smaller than y0 if the realization of X is smaller than x0, where x0 is given as the inverse function taken in y0. In that case we can write the cumulative distribution function of Y, which we can always write as the probability that Y is smaller than or equal to the argument y, as the probability that X is smaller than or equal to the inverse function taken in y. Then the cumulative distribution function of Y can be written in terms of the cumulative distribution function of X taken in the inverse function of y: F_Y(y) = F_X(g⁻¹(y)). If that is the case, then it is also easy to obtain the density function, because we simply take the first order derivative of this expression with respect to the argument y. We realize that we have a composite function, and when we take the first order derivative we have to do it in two factors: the first
factor is simply the first order derivative of the distribution function, which is the density function of X taken in the inverse function of y, and the other factor is the first order derivative of the function inside the cumulative distribution function, that is, of the inverse function itself. Looking at this term, we realize that the inverse function of y is, from the very beginning, equal to x, and therefore we can write the expression in this way: the density function of Y, where Y is a function of X, can be written immediately as f_Y(y) = f_X(x) multiplied by the differential quotient dx/dy. Let me just wrap up this lecture. In the case that we have a monotonically decreasing function instead of an increasing one, the whole thing applies again, but we need to change the sign. For this reason the general result, which applies for generally monotonic functions, is based on the absolute value of the differential quotient dx/dy: f_Y(y) = f_X(x) |dx/dy|. Finally, we can extend this to the situation where we are dealing with a vector of random variables. In that case I was not planning to give you any derivations at all, but I wanted to give you the result, and you see it is very similar: if the density function of the random vector X is the starting point, then we can establish the probability density function of Y, whose components are functions of the X's, in terms of what we call the Jacobian matrix, defined through the differential quotients of x with respect to y, where the signs here mean the absolute value of the determinant of the Jacobian matrix: f_Y(y) = f_X(x) |det J|, with J_ij = ∂x_i/∂y_j. That is a very good thing. Now, that was the absolutely last overhead for today, and I wish you a nice week.
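As a closing sketch of the scalar change-of-variables rule, with an assumed example: if X is standard normal and Y = exp(X), then x = ln(y) and |dx/dy| = 1/y, and the rule reproduces the standard lognormal density.

```python
import math

# Change of variables f_Y(y) = f_X(x) * |dx/dy| for a monotonic one-to-one
# mapping.  Assumed example: X ~ N(0, 1), Y = g(X) = exp(X), so
# x = g^{-1}(y) = ln(y) and |dx/dy| = 1/y.
def f_x(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_y(y):
    x = math.log(y)          # inverse mapping x = g^{-1}(y)
    dxdy = 1.0 / y           # |dx/dy| for the exponential mapping
    return f_x(x) * dxdy

# Compare with the standard lognormal density written out directly, at y = 2
y = 2.0
lognormal = math.exp(-0.5 * math.log(y) ** 2) / (y * math.sqrt(2.0 * math.pi))
print(f_y(y), lognormal)
```

The two expressions coincide, which is just the statement that the change-of-variables rule, applied to a normal variable under an exponential mapping, yields the lognormal density.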