Well, good morning. We are just about to start, and actually we are already 15 minutes late. I realize that in this building we always start a quarter past, and that means that whatever we do, we have to be out on time; but today that should not be too big a problem. Please come in and take a seat, 10 francs per seat. Are you in? I will try to give you a few minutes' break anyway; you deserve it. Okay, I can relax. This is the third-to-last lecture, you realize. We have today's lecture, then next week you have the small exam, then there will be one ordinary lecture in the following week, and then we will have the extra lecture. So this is the program: today we will take a short look at what we covered last time, and we will try to find out where we are in the framework of model building and estimation. Then we will look at how we can evaluate the models we have developed: how can we evaluate their goodness, how good are they? We will look at two methods in particular. The first is the chi-square goodness-of-fit test, and the second is the Kolmogorov-Smirnov goodness-of-fit test. After that I will give some comments on how to compare models. If we already have models which can be verified using these statistical methods, it might still be interesting to compare two competing models and evaluate which one is the better one. Okay, in the last lecture we looked at another interesting problem. In that lecture we started from the basis that we had already identified a cumulative distribution function for a random variable which we would like to use to model some uncertain phenomenon in engineering. Having made that model choice, the next issue was this: the cumulative distribution function is basically still only a family; it is a function of some distribution parameters, such as mean values, standard deviations, or just some constants which go into the expression for the cumulative distribution. The issue was how we can estimate these parameters if we have data. So assuming that we have some sample data, some observations, some results from the laboratory, how can we estimate these parameters?
We learned that this can indeed be done. There are different types of methods, and I introduced two very different types to you. One provides us with point estimates of the parameters, meaning that we get values for the parameters we are looking for, based on the data we have available, but we do not get any information about the uncertainty associated with these parameters. This method is what we call the method of moments. There are other methods providing point estimates, but this is the one I wanted to make sure you had heard about; it is a very convenient little tool. If you consider this to be the Volkswagen Beetle of estimation methods, then I also gave you the Porsche: the method of maximum likelihood, which from a philosophical point of view is a much stronger formulation and can be used in many contexts in statistics and probability. Using this formulation we can also estimate the parameters, but we do not only get point estimates; we get a full probabilistic description of the parameters. So the result of using the method of maximum likelihood is not just values for the parameters, but a full distribution of the parameters. I showed you that these distribution parameters turn out to be normally distributed if the number of data we have available is large enough. If we only have a small number of data, the method of maximum likelihood can still be used, of course, but the distribution is no longer normal; it turns out that the parameters are t-distributed, and the use of the results in that case becomes more complicated. So we get the parameter values in terms of the mean values, and we also get information about the uncertainty associated with the values of these parameters in terms of the covariance matrix of the parameters. Now, the method of moments: how did it actually work?
It worked in this way: if we have a distribution with, let's say, two parameters which we need to estimate, then we can first formulate the sample moment equations. Using the equations for the first and second sample moments, we can calculate those based on the observed data; nothing could be simpler than calculating the first two sample moments if you have a data set, if you have some observations. Now, as we started out, if you have already chosen the cumulative distribution function or the density function you want to use for your modeling, the issue really is how to determine the parameters of that density function. The idea in the method of moments is that we can also analytically establish the equations for the first two moments of the random variable corresponding to the chosen density function. Using these analytical expressions we can write out the first and the second moment. So now we have two ways of representing the moments: one based on the data and one based on the analytical expressions. We calculate the values corresponding to the sample moments, set these sample moments equal to the analytical moments, and then solve the equations with respect to the unknown parameters. In this case, where we are looking for two parameters, we need to consider two moment equations, so we get two equations with two unknowns, and we have to solve these two equations simultaneously. Sometimes this can be done analytically; sometimes we do it numerically, and I showed you last time, using Excel, how conveniently that can be done. It can be formulated as an optimization problem where you minimize the difference between the sample moments and the analytical moments: the squared difference between the first moments plus the squared difference between the second moments. Adding up these differences gives what I called an objective function, and if we minimize the objective function with respect to the unknown parameters, we find the optimal parameters as those which satisfy these equations. Very easy, and very convenient to do with simple tools like Excel.
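As a rough illustration of this numerical approach (not the exact Excel example from the lecture), here is a small Python sketch. The lognormal family, the placeholder data values, and the use of scipy's Nelder-Mead optimizer are my own assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sample data (placeholder values, not the lecture's data set)
data = np.array([31.2, 28.4, 35.1, 30.0, 33.7, 29.5, 32.8, 34.2, 27.9, 31.6])

# Sample moments from the data
m1 = np.mean(data)           # first sample moment
m2 = np.var(data, ddof=0)    # second central sample moment

# Analytical moments of an assumed lognormal family with parameters (mu, sigma)
def analytical_moments(mu, sigma):
    mean = np.exp(mu + 0.5 * sigma**2)
    var = (np.exp(sigma**2) - 1.0) * np.exp(2.0 * mu + sigma**2)
    return mean, var

# Objective function: sum of squared differences between sample and analytical moments
def objective(theta):
    mu, sigma = theta
    mean, var = analytical_moments(mu, sigma)
    return (mean - m1)**2 + (var - m2)**2

res = minimize(objective, x0=[np.log(m1), 0.1], method="Nelder-Mead")
print("method-of-moments estimates (mu, sigma):", res.x)
```

This is exactly the minimization of the objective function described above, just carried out with a general-purpose optimizer instead of Excel's solver.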
Of course, probably some of your calculators can do these things if you program them a little; you can also use MATLAB, or you can program it in C. Any which way, it is quite simple. With the method of maximum likelihood, as I said, we get full distribution estimates of the unknown parameters. The idea in the maximum likelihood method is to choose the parameters we are looking for in such a way that the observations we have become most likely. That is the underlying philosophy: what we have seen, what we have observed, the results we have from the laboratory, are simply the most likely results we can imagine. Now, how do we estimate the parameters so that this is fulfilled, so that the likelihood is maximized? We formulate a likelihood function, which is nothing other than the product of the density function we have chosen to model our uncertain phenomenon, evaluated at the values of the data we have. So if we have a vector of n observations, x_i is the i-th observation in the data set, and we take this product. The maximum likelihood is achieved by searching for those two parameters which we want to estimate such that this function is maximized, which is equivalent to minimizing the negative of the likelihood function. There is one particularly nice feature: instead of the likelihood function we can look at the logarithm of the likelihood function. You can imagine that we take the logarithm of this product, and you know from the rules of logarithms that products turn into sums. This is why, instead of the product sign, we have a sum sign here: we are just summing up the logarithms of the individual density functions evaluated at the sample outcomes. Very often it is simply more convenient to operate with a sum than with a product, and the numbers which come out are also in a nicer range, so there are several conveniences. But of course, optimizing this function is completely equivalent to optimizing the original one; we have not done anything fundamental to the function.
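Written out, the two functions mentioned here look as follows (the parameter vector theta and the density f_X are my notation; the slides may use different symbols):

$$L(\theta) = \prod_{i=1}^{n} f_X(x_i;\theta), \qquad l(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f_X(x_i;\theta)$$

Maximizing L and maximizing l give the same parameter values, since the logarithm is a monotonically increasing function.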
We just take the logarithm; it does not change anything with respect to where the maximum is located. When we then optimize, searching for these unknown parameters, the values which come out as the optimal values can be considered equal to the expected values, and those values correspond exactly to what you would get from the method of moments. But we also get additional information: if we take the second-order partial derivatives of this log-likelihood function, first with respect to one parameter and then the other, we can build the Hessian matrix based on the second-order partial derivatives of the log-likelihood function, evaluated at one particular point, namely the optimum. So when we have found this point, we insert these values and take the partial derivatives. Then we get what we call the information matrix H; this is sometimes called the Fisher information matrix. If we take the inverse of this matrix, we get the covariance matrix, which contains the information about the variances of the individual parameters, but also about the correlation between the parameters. And as I said, from theoretical considerations we can also show that the parameters are jointly normally distributed, so in the end we have full information on the uncertainty associated with the estimated parameters. It is an extremely strong tool. If you had to forget one method, let's imagine that ten years from now you would forget either the method of moments or the method of maximum likelihood (it could happen), then I hope you will remember the method of maximum likelihood. But of course you will never forget it; that was a ridiculous thing to say.
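As a minimal sketch of this procedure, assuming a normal model, placeholder data, and scipy's BFGS optimizer: the point estimates come from minimizing the negative log-likelihood, and the inverse Hessian of the negative log-likelihood at the optimum plays the role of the parameter covariance matrix. Note that the hess_inv returned by BFGS is only the optimizer's internal approximation of that inverse Hessian, so treat the printed covariance as indicative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical observations (placeholder values)
data = np.array([31.2, 28.4, 35.1, 30.0, 33.7, 29.5, 32.8, 34.2, 27.9, 31.6])

# Negative log-likelihood for an assumed normal model with parameters (mu, sigma)
def neg_log_likelihood(theta):
    mu, sigma = theta
    if sigma <= 0:
        return 1e10  # large penalty to keep the search in the admissible region
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Maximize the log-likelihood by minimizing its negative
res = minimize(neg_log_likelihood, x0=[np.mean(data), np.std(data)], method="BFGS")

mle = res.x          # point estimates, playing the role of the expected parameter values
cov = res.hess_inv   # approximate inverse of the Fisher information, i.e. the covariance matrix

print("maximum likelihood estimates (mu, sigma):", mle)
print("approximate parameter covariance matrix:\n", cov)
```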
Well, looking at estimation and model building, we have been looking at this small sketch a couple of times, and we have been looking at the various inputs to model building. We have looked at data quite extensively, and we have looked at how to choose the family of distributions. We introduced, as a rather heuristic method, the concept of probability paper, but I want you to keep in mind that probability paper is not the solution to all problems: physical and mathematical understanding of the problem we are dealing with is still the main guideline we have for choosing appropriate families of cumulative distributions. You have also seen throughout the lectures that, depending on the characteristics of the uncertain phenomena we are dealing with, we can end up with a particular family simply based on physical or mathematical considerations. You name it: normal distributions, lognormal distributions, extreme value distributions, and the distributions we call distributions of statistics, like the chi-square distribution, the t-distribution, the F-distribution, the gamma distribution, the exponential distribution. All of those distributions came out of physical considerations and model assumptions, so we should always keep those considerations in mind when looking for the best choice of a distribution family. But even with this insight into the characteristics of the problem, probability paper can still be a useful thing, and we can check whether our thinking is plausible simply by plotting the data on probability paper and seeing whether there is some gross contradiction between our model assumptions and the data we have available. And as I said, and this is very important when you look at probability paper: please keep in mind that you want a good fit in those regions of the data which are important for the decision-making.

Okay, the next step was the last lecture, where we looked at the estimation of the distribution parameters. We have talked enough about that, I guess, but what we really would like to do now is to check the validity, the goodness, of our model. So for the joint consideration of the choice of the family of distribution and the estimated parameters: is there a formalized way we can validate this model? Indeed there is, and this is what we will look at today. Having done that, if we have established such a model, if we have the distribution family and the distribution parameters, then we have ended up in a situation where we have a probabilistic model to represent the uncertainty in our engineering problem, and then we are home free; we have come a very big step forward, because then we have a representation of the uncertainty taking into account all the specifics of the problem, including the validation.

The next step, which we will take in the subsequent lecture, will be concerned with how to use these probabilistic models in the calculation of probabilities of the events which are important for our decision-making. So in the subsequent lecture (it will not be next week, because next week you will be busy with other things, but the week after) you will see how we can formulate events for engineering problems and how we can calculate their probabilities based on these probabilistic models. Then, when we have the probabilities, you remember that we use probability within the framework of risk-based decision-making, so all we need in addition to the probabilities are the consequences if the events happen. When we multiply the probabilities with the consequences of the events, we have the risks. And in the very last lecture you will see that when we have the risks, there is a fantastic framework for identifying optimal decisions using basic concepts of decision theory. So that is the outline of where we are now.

But before we go any further in today's lecture, here is a small exercise. Let us assume that we have identified the maximum of the log-likelihood function in a parameter estimation problem; the log-likelihood function is this animal up here, and the maximum corresponds to the parameter value being equal to pi. Now the question is: for the likelihood function, not the log-likelihood function, what is the corresponding parameter value at the maximum?
So it is not the value of the likelihood function itself; it is the parameter value corresponding to the maximum. I will say it again: if we have maximized the log-likelihood function and found that the optimal parameter is equal to pi, then, for the same problem, using the likelihood function and identifying its maximum, what would the parameter value be? Would it be equal to this expression here, this one here, or would it be equal to pi? You are thinking; some are thinking yellow, I see a red, we have one red, I see some strange markers. What else do you have in your bags? Okay, maybe as it is written there it is a little confusing, but those of you who understood what I was trying to say came up with the correct evaluation: the right answer is pi. It does not matter whether you formulate the problem using the likelihood or the log-likelihood; the parameter values optimizing these functions are the same, so the two are completely equivalent. You can use the likelihood or the log-likelihood. The advantage of the log-likelihood is that if we use it together with the concept of the Fisher information matrix, based on the second-order partial derivatives of the log-likelihood function, then we also get the covariance matrix. But the parameter values, the mean values of these parameters, are equally well determined using the likelihood or the log-likelihood; it does not matter.

Now let us assume that we have selected a distribution function as a model to describe an uncertain quantity. We have an engineering problem, and one of the parameters in the equation we have formulated, which is important for our engineering decision-making, is uncertain. We have selected a distribution function for this uncertain phenomenon, so we have a probabilistic model of this parameter: data and physics, you can say, provide us with the distribution family and also with the distribution parameters. That ends up, for instance, in a probabilistic model for the concrete compressive strength. Here you see a density function; as I will show you in the next lecture, we can use this density function for calculating probabilities of certain events. So we have everything: we have the family and we have the parameters. What we now want to do is to validate this model by means of a statistical test. We will look at two principally different cases, namely the verification of discrete cumulative distribution functions and of continuous cumulative distribution functions. For discrete cumulative distributions we have the chi-square test, and for continuous cumulative distributions
we have the Kolmogorov-Smirnov test. Now I want to give you a small comment on this, so please keep it in mind. What you will see is that sometimes, when we are dealing with a continuous distribution function, we formulate the test in such a way that, you can say, we discretize: we discretize the continuous cumulative distribution so that it is represented as a discrete cumulative distribution, and then we use the chi-square test to test whether the model is a good one or a bad one. So it is a little tricky: the chi-square test is formulated for testing whether a chosen discrete cumulative distribution is appropriate to represent data, but in many cases, if we have a continuous one, we discretize it so that we can use the chi-square test.

The idea behind the chi-square goodness-of-fit test is that the difference between the predicted and observed sample histograms should be appropriately small. If we have a model, we can calculate a sample histogram based on our model, and then we would like to compare this histogram with the sample histogram we have established from the observed data. If the deviations over the histogram, in terms of relative frequencies or absolute numbers, are small, then our chosen model is probably a good model, and this we can check with a certain significance by statistical testing, just as we have always used these statistical tests to check whether we can reject an assumption at a certain level of significance. This is what we are going to do here as well. So you can imagine that the light blue bars in this histogram are the numbers of values belonging to certain ranges of values, simply based on the observed data. Using our model we can also establish such a histogram, the dark blue one, but this one is based on our model only; we construct it artificially, it does not correspond to data, it corresponds to a model. Then we check these small differences (oops, that is very exciting; now it is standing still).
So these are the differences, and we sum up the squares of these differences over all the ranges. We use this sum as a statistic which we can test at a certain level of significance, and of course we would like the sum of all those squared deviations to be small; if it is small with significance, then we cannot reject our assumed model, the model on the basis of which we have produced the dark blue bars.

Now, you remember that the cumulative distribution function for a discretely distributed random variable can be written in this way: the cumulative distribution is simply evaluated by summing up the probability contributions at the individual discrete values which the random variable can take. If this were a die, you would have contributions at the values one, two, three, four, five and six, and they would have equal height. So what we do when we establish the cumulative distribution is that we sum up: this one plus this one is what we have at this value, then we add this one at the next value, and then we add the last one, and we end up with one.

Now, assume that we sample a discretely distributed random variable X n times. Then the number of realizations of this random variable having the value x_i can be considered a binomially distributed random variable. Imagine again that you are throwing dice: the number of times you get a certain outcome is binomially distributed. Every throw of the die is a trial in a Bernoulli experiment, with a probability of success and a probability of failure; for the die, the probability of success is one divided by six and, of course, the probability of failure is five divided by six. So the number of outcomes of a discretely distributed random variable at a certain value is a binomially distributed random variable. If you realize that, then you can calculate the expected value of the number of outcomes of the random variable at a certain value, and this is simply equal to n, the number of trials, the number of repetitions of your Bernoulli experiment, multiplied by the probability of the outcome of that particular event. We call this capital N with the index p, because it is the predicted number of outcomes of the random variable at this particular value: it is n multiplied by p_i, where the index i refers to the particular outcome value. We can also calculate the variance, since we know it is a binomially distributed random variable: we have the result, shown previously, that the variance is equal to n multiplied by p_i multiplied by (1 - p_i). There is another small thing which is very useful: if the postulated model is correct and n is large enough, then the difference between the predicted and the observed numbers, normalized by subtracting the mean value and dividing by the standard deviation, turns out to be standard normally distributed. This is a plausible assumption due to the central limit theorem. So all these small deviations turn out to be standard normally distributed when we standardize them in this way. This is useful.
You already see where we are. The next step is, of course, the extremely logical one: I said we want to look at all these deviations over the different possible outcome values of the random variable, so we want to look at all the epsilons corresponding to the different i. A negative deviation is just as important as a positive deviation, so we cannot simply look at the epsilons themselves; but if we look at the squares of the epsilons, then we treat negative deviations as just as important as positive ones. This is why we look at the squared epsilons, and this quantity I write up as the sum of the squared differences over all intervals of the outcomes of the random variable. If you write it out on the basis of these standardized, normally distributed individual errors or deviations, it looks like this.

Now, there is one slight thing we need to take into account, and that is the following. Let's say we have ten experiments: in the first interval we could have two outcomes, in the second interval two outcomes, so now we have four, in the third interval two outcomes again, so we have six. Now, in the last interval, how many outcomes do we have? We already have six from the first intervals, and we have ten experiments in total, so we simply have to have four in the last one; there is no choice. This small illustration shows that there is a dependency between the numbers of outcomes in the different intervals. We do not have complete freedom: if we only do ten experiments, there is absolutely no way we can have 15 outcomes in the first interval, and if we have eight in the first intervals, it is already clear that the sum of the outcomes in the last intervals has to be two. So there is a dependency introduced into this statistical model.

Based on the expression, you might immediately think that this quantity would be chi-square distributed with k degrees of freedom, because you are summing up k squared epsilons, and each of the epsilons is standard normally distributed. But due to this dependency it turns out that it is chi-square distributed only if we multiply each term by the factor (1 - p_i), so that this factor falls out of the denominator, and only if we consider k - 1 degrees of freedom. If we do that, then it is a chi-square distributed random variable: this modified sum of squared epsilons is chi-square distributed with k - 1 degrees of freedom.
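In symbols, with N_{o,i} the observed and N_{p,i} = n p_i the predicted number of outcomes in interval i (my notation; the slides may label these differently), the standardized deviation and the modified test statistic read:

$$\varepsilon_i = \frac{N_{o,i} - N_{p,i}}{\sqrt{n\,p_i\,(1-p_i)}} \sim N(0,1), \qquad \varepsilon^2 = \sum_{i=1}^{k} \frac{\left(N_{o,i} - N_{p,i}\right)^2}{N_{p,i}} \sim \chi^2_{k-1}$$

The factor (1 - p_i) has cancelled out of the denominator in the sum, and one degree of freedom is lost because the counts must add up to n.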
This is now a test statistic we can use to test the goodness of our chosen model, just as we did with the other types of tests on the mean value and the variance, given different levels of information. So we can formulate a test for the goodness of a certain model of an uncertain phenomenon. Following the usual scheme for significance testing, we have to choose a level of significance, and what we are testing is whether the sum of observed squared differences is plausibly small. We postulate the null hypothesis H0 that the assumed distribution function is not in gross contradiction with our observed data, and we formulate the operating rule that, if the null hypothesis is true, the test statistic exceeds a certain value Delta only with a probability equal to alpha; we reject the hypothesis if the observed statistic exceeds Delta. We could have formulated this differently: here we are validating the choice of one model, testing whether this model can be accepted. We could also have turned it around and taken as the null hypothesis that the chosen model is not a good model. In terms of what we are looking for this is equivalent, but such a test would not be as explicit, because it would only tell us that any one of all the other possible probabilistic models could be a better one. In this formulation we are checking whether the particular chosen model can be rejected or not.

So let us consider a small example. Assume that we are dealing with a model where we have already chosen a normal cumulative distribution, and that the parameters of this distribution have not been estimated using the data we will use for the test itself. So we have a data set, and we have what we can call a postulated model for the family of distribution, with parameters assessed independently of these data. Let us assume that the mean value is equal to 33 and the standard deviation is 5. Of course, as I said at the beginning when we looked at the chi-square testing scheme, the normal distribution is clearly not a discrete distribution, but what we can do is discretize it, and this is what we will do in order to be able to use the chi-square test. We can easily discretize it: you have the density function here, and we can split it up into intervals, the first interval being from zero to twenty-five. So what we look at now is this: we want to model this by a discretely distributed random variable, so for this interval we ask, what is the probability that we end up in this interval?
We can use the standard normal cumulative distribution function, simply calculating the area under this part of the curve, for the interval from 0 to 25. We evaluate the standard normal cumulative distribution function at the upper end of the interval, which corresponds to 25, after subtracting the mean value and dividing by the standard deviation, and then we subtract the value corresponding to the other end of the interval, which is zero, or minus infinity if you like; it makes no difference in this case, but in principle we should use the lower end of the interval. This is the probability that we are in this interval. Now, using the binomial distribution, the predicted number of points in this interval is equal to 20 times this probability, and if we make this calculation we get a number equal to 1.1; this is the value you have here. We can do the same for the intervals 25 to 30, 30 to 35, and 35 to infinity, and we get these predicted numbers of observations in the individual intervals. That is very easy, and you can do it for any chosen model; if it were not a normal distribution but some other distribution, you could do it in exactly the same way, simply by calculating the probability of being in each interval and multiplying each time by the total number of experiments. Here we assume that we have a data set with 20 experiments; this is why I am multiplying by 20 (I think I did not explain that before). So we end up with this predicted histogram, with the total number of experiments distributed over the intervals.

Now what we can do is compare the observed and the predicted histograms. So let us assume we have some observed histogram. However, and this is quite an important point, we have a relatively low number of outcomes in the first interval, and we based our derivation on the assumption that we have a sufficiently large number of outcomes in each interval; only in that case, and only if our model is correct, can we assume that the epsilon for each interval is normally distributed. Having on average one realization in an interval is not enough for that assumption to be valid. We therefore need to discretize a little more coarsely, so what we do is lump these two intervals together, we lump these two bars together. The first interval will then be longer: not from zero to 25 but from zero to 30, and in that way we have a sufficient number of outcomes in each interval. As a general rule we should have at least five outcomes in each interval, so if you have fewer outcomes in one interval, you need to lump intervals together.

Now we have three intervals: from zero to 30, from 30 to 35, and from 35 to infinity. Based on the data we can count the number of observed values in each of these histogram intervals, and in this example we have five in the first, nine in the second, and six in the third, which sums to 20. We can then determine the probability, based on our model, of having realizations in each of those intervals.
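To make the first of these calculations concrete, the probability for the original interval from 0 to 25, before lumping, works out as follows (Phi denotes the standard normal cumulative distribution function; the intermediate rounding is mine):

$$P(0 < X \le 25) = \Phi\!\left(\frac{25-33}{5}\right) - \Phi\!\left(\frac{0-33}{5}\right) \approx \Phi(-1.6) - \Phi(-6.6) \approx 0.055, \qquad N_{p,1} = 20 \cdot 0.055 \approx 1.1$$

which reproduces the predicted count of 1.1 mentioned above.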
This is what we did with the standard normal cumulative distribution function. Those probabilities are the values written up here, easily calculated, and of course they sum to one. Now, using the fact that we have 20 experiment outcomes, the predicted number of outcomes in each interval is simply each of these probabilities multiplied by 20, and you end up with these values for the predicted numbers of outcomes. It is then a very simple thing to calculate the differences between the observed and the predicted values, that is, the epsilons, and you can calculate the contributions to our statistic by taking the squared differences and normalizing with the predicted number of observations; you get this number here, and this is our statistic. Now, if we want to conduct the test at, say, the five percent significance level, we know that the test statistic is chi-square distributed. We have three components in the sum, so k is equal to three, but we have to subtract one degree of freedom because of the dependency between the numbers of outcomes in the intervals, so we only have two degrees of freedom. Based on this we can calculate the operating rule, the acceptance criterion, which is equal to 5.99. As the calculated test statistic is significantly smaller than 5.99, we cannot reject the null hypothesis, the H0 hypothesis. So this model, the normal distribution with mean value equal to 33 and standard deviation equal to 5, cannot be rejected on the basis of the observations.

What is happening? Yeah, sorry? Yes, of course, it is a very good comment. All right, it is a little confusing, we have no bells and nothing, and you are just sitting there and listening, and I feel so lucky that you are so interested; it is really difficult to stop. No, I am sorry, and if you had not stopped me I would have continued, I promise. So let us have one of the usual breaks, 15 minutes, which means we will meet more or less ten minutes after nine, I guess. Thank you.

Okay, we have to continue, whether we like it or not. In the last example we assumed that the parameters of the distribution function which we were testing had been estimated, or assessed, without using the data we were using for the test itself. Now this is important: if one or more of the parameters of our probabilistic model have been estimated based on the data we want to use for the test itself, then we have to subtract degrees of freedom depending on the number of parameters we have estimated. So in the case where, for instance, we had estimated the variance from the data but not the mean value, we would have only one degree of freedom: we had three intervals to start with, we subtract one because there is a dependency between the numbers of outcomes in the individual intervals, and then we have to subtract one more for each parameter which we have estimated using the data we are also using for the test. So if one parameter has been estimated based on the data, we subtract one more, and we end up with only one degree of freedom in this case.
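Written as a small rule of thumb (the symbol names are mine): with k intervals and m parameters estimated from the same data used for the test, the number of degrees of freedom is

$$\nu = k - 1 - m$$

so with k = 3 and m = 1 (only the standard deviation estimated from the data) we get nu = 1, and the five percent acceptance criterion becomes chi-square with one degree of freedom, about 3.84, as used in the next example.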
You see here: I maintain the same expected value as we had before, but now the standard deviation is slightly different; this is now a value which has been established based on the data we are using for the test itself. Everything else is the same. We can divide into intervals again; the observations do not change, so the number of observed values in the different intervals is exactly the same. The predicted probabilities, however, are not the same. Why? Because the standard deviation has changed, and that of course has an influence on the predicted probability of having realizations in the different intervals. We again calculate the predicted number of observations in the individual intervals, and we can establish our sample statistic. In this case, at the same level of significance, but now looking at a chi-square distributed random variable with only one degree of freedom as the test statistic, we get a criterion delta equal to 3.84. However, our test statistic is again much smaller than this number, so also in this situation we cannot reject our null hypothesis H0. It might appear to be impossible to reject this hypothesis no matter what we do, but of course that is not the case; it is just that, for these models, we could not reject their validity based on the observations.

Now, time for a small exercise: out of these equations, which one, or which ones, correspond to the method of moments? Come on, we need some color cards, it should not be so confusing. If you feel that there is more than one correct answer to this small exercise, then please keep in mind that you have more than one color card; of course I do not want to press you to show two colors. Thank you very much. It seems that you definitely feel very green about this question, and some of you also think that yellow is appropriate, and indeed green and yellow are both correct, because here you have the sample moment equations, which are set equal to the expected values based on the analytical expressions for the moments of the random variables in the two cases, namely the continuous case and the discrete case. So these two equations are the ones strongly related to the method of moments.

Now, just for the record: some of you sent me this little flyer here, and I think it is very nice of you, but I assume it is your mobile telephone number, and it would be nice if you could also leave your name so that I would know whom I would be calling. You can also be quite open about it; nothing to be ashamed of. I do not understand what is so funny about that. It happens in all my lectures.
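Before moving on to the Kolmogorov-Smirnov test, here is a minimal Python sketch that ties together the chi-square procedure from the first worked example above (normal model with mean 33 and standard deviation 5, parameters not estimated from the test data). The observed counts 5, 9, 6 and the lumped intervals are from the lecture example; the printed statistic is my own computation and scipy is assumed to be available.

```python
import numpy as np
from scipy.stats import norm, chi2

# Postulated model: normal with mean 33 and standard deviation 5
mu, sigma = 33.0, 5.0
n = 20                                  # total number of observations

# Interval boundaries after lumping: (0, 30], (30, 35], (35, inf)
edges = np.array([0.0, 30.0, 35.0, np.inf])
observed = np.array([5, 9, 6])          # observed counts from the lecture example

# Predicted probability of falling in each interval, and predicted counts
p = np.diff(norm.cdf(edges, loc=mu, scale=sigma))
predicted = n * p

# Chi-square statistic: squared deviations normalized by the predicted counts
statistic = np.sum((observed - predicted) ** 2 / predicted)

# k = 3 intervals, minus 1 for the dependency between the counts -> 2 degrees of freedom;
# subtract one more degree of freedom per parameter estimated from the same data
delta = chi2.ppf(0.95, df=2)            # acceptance criterion at the 5% level, about 5.99

print(f"test statistic = {statistic:.3f}, criterion = {delta:.2f}")
print("reject H0" if statistic > delta else "cannot reject H0")
```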
The Kolmogorov-Smirnov test is the other type of test we have to look at today. I already introduced it as a test we can use when we are dealing with continuous cumulative distribution functions, and indeed it is a convenient test when the information we have is directly related to the cumulative distribution function. The idea is that if the postulated cumulative distribution function is in good correspondence with the observed data, then the maximum difference between the observed and the predicted cumulative distribution functions should be small. That is the underlying idea here: we do not look at the number of realizations taking specific values or falling within specific intervals, but we look directly at the cumulative distribution function. If we are considering the cumulative distribution function corresponding to our model and we want to compare it with the information we can get from the data, then it is useful to look at the quantiles. What we can do is establish the observed cumulative distribution, and we can do that by looking at the ordered data set, as we did in descriptive statistics. Based on the ordered data set, the observed cumulative distribution value at the i-th ordered data point is the ratio i divided by n, where i is the index in the ordered data set and n is the total number of data points. You will see exactly how that works.

The statistic which can be applied for the test has the following form: looking at all the data points and all the values of the sample cumulative distribution, we take the maximum, over all these values, of the absolute difference between the observed and the predicted cumulative distributions. This we can write in this way; the predicted one is the one corresponding to our model. This becomes really easy, because we have already introduced the concept of probability paper. Using this concept, we can plot the data in the probability paper which corresponds to our postulated model; this probability paper could for instance be normal probability paper, which is the example considered here. The postulated model corresponds to a straight line in this paper, and we look at the best straight line through these data. Then we look at the maximum difference between the observed cumulative distribution, which corresponds to these points, and the predicted one, which corresponds to the straight line.
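In symbols (with x_i^o denoting the i-th value of the ordered sample and F_X the postulated cumulative distribution; the notation is mine):

$$F_{obs}\!\left(x_i^o\right) = \frac{i}{n}, \qquad D_{max} = \max_{1 \le i \le n} \left| F_{obs}\!\left(x_i^o\right) - F_X\!\left(x_i^o\right) \right|$$

D_max is the test statistic which is then compared with the tabulated acceptance criterion.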
So what we have to do is go into the data set and identify the value where we have the largest difference, and this is our test statistic. This is extremely easy and very operational. The less easy thing is to develop the statistical characteristics of this sample statistic, and we will not go into the details of that; what we have done is calculate the statistic for you, so it is tabulated as a function of the number of samples and of the significance level, and you can go in and check the values of the statistic. In this example, where the number of samples is equal to 20, and where, as in the previous examples, we look at a significance level of five percent, we go into the table and find the value 0.2941, which we now have to compare with the observed statistic from the previous overhead, which is equal to 0.106. We see that the observed statistic is smaller, so we cannot reject H0, the null hypothesis, at the five percent significance level. That is a very easy test, and you can do it for any type of postulated model, but you have to go in with a probability paper corresponding to that postulated model. In principle you do not need probability paper; you can also just calculate the corresponding cumulative distribution function values based on your postulated model, but plotting everything in probability paper makes it very easy to get an overview.
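Here is a minimal Python sketch of this procedure for an assumed normal model. The sample values are placeholders, not the lecture's data set; the critical value 0.2941 for n = 20 at the five percent level is taken from the table mentioned above, and scipy is assumed.

```python
import numpy as np
from scipy.stats import norm, kstest

# Hypothetical ordered sample of size n = 20 (placeholder values)
data = np.sort(np.array([24.1, 26.3, 27.8, 28.9, 29.4, 30.2, 30.8, 31.5, 32.0, 32.6,
                         33.1, 33.9, 34.4, 35.0, 35.8, 36.5, 37.2, 38.0, 39.1, 41.3]))
n = len(data)

# Postulated continuous model (assumed here: normal with mean 33 and standard deviation 5)
mu, sigma = 33.0, 5.0

# Observed cumulative distribution at the ordered data points: i / n
F_obs = np.arange(1, n + 1) / n

# Predicted cumulative distribution from the postulated model
F_pred = norm.cdf(data, loc=mu, scale=sigma)

# Kolmogorov-Smirnov statistic: maximum absolute difference
D_max = np.max(np.abs(F_obs - F_pred))

# Tabulated acceptance criterion for n = 20 at the 5% significance level
critical = 0.2941
print(f"D_max = {D_max:.3f}, criterion = {critical}")
print("reject H0" if D_max > critical else "cannot reject H0")

# Cross-check with scipy's built-in test (it uses a slightly different convention for F_obs)
print(kstest(data, "norm", args=(mu, sigma)))
```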
Now to another issue, and that is the issue of model comparison. We have to consider two cases: first, that it is shown that a model hypothesis cannot be rejected, and second, that it is shown that a model hypothesis can be rejected. The question is what information is actually contained in these two cases. Let us say the significance test shows that a given model hypothesis cannot be rejected. Then we have to keep in mind that we could equally well have attempted to verify other models, so it may very well be that we could also not reject other models which we could have proposed. There may be competing models which we cannot reject based on the data. And of course, in our applications in engineering we very often have few data, and due to that fact these significance tests are not very strong tools. They give us indications of whether a postulated model is in gross contradiction with the data, but they are not an absolute conclusion. Imagine that we have to make tests of soil properties, or of concentrations of pesticides in water, and we only have something like 20 or 30 tests; that would be a typical maximum in engineering applications, because the costs associated with collecting this information may be very substantial. So in general we have very sparse information. These statistical tests, which have been developed and described theoretically, have their origin in typical production decision problems, for instance in the medical industry, where you are producing pills and you want to be very sure that the pills you are producing have a certain quality. There you have millions of pills, you can make many, many tests, and they do not cost very much. In those situations, where you have huge amounts of data, the significance tests provide much stronger tools for rejecting hypotheses. In our cases they are more an indicator of the goodness of fit. So if you have very few data, and you have a very strong opinion, based on experience and physical understanding, that a model is good, you really feel this model is good, and then, just for the record, you decide to make a test like the chi-square or Kolmogorov-Smirnov test, and it turns out that based on the test you cannot verify your model, then it does not mean that the model cannot be correct. We already know the concept of committing type one and type two errors: rejecting a hypothesis even though it is true occurs with a certain probability, namely the level of significance. So in the case where the significance test shows that a model hypothesis should be rejected, it does not, as I said, mean that the model is really bad; it just means that the evidence, the data, is not enough to support it with the required significance. We have too little data.

So let us say we have two or more competing models. I am quite sure, and you can try it when you get back home if you want to practice a little, that you could assume a model which is not normally distributed but lognormally distributed, use the same observations I have used in my small example here, and you would not be able to reject that hypothesis either based on the observations. So now you have two models, you cannot reject either of them, and the question is which one is the better model, and whether there is any basis for comparing the goodness of the individual models against each other. What you see in the literature is that some propose to compare the sample statistics directly, and then to choose the model which has the smaller sample statistic. But that may actually not be completely consistent, because you could have cases where the number of degrees of freedom is not exactly the same, and then you cannot compare directly; the statistic depends on the number of degrees of freedom. What is much more consistent is to compare the sample likelihoods which you get from the postulated models by inserting the sampled values, the observations. So if you have a set of observations and you have a postulated model, establish the likelihood function corresponding to your model, with its parameters and everything, insert the observations, and then the model with the highest likelihood is the more appropriate model to choose. That does not depend on anything else; there are no problems related to the number of degrees of freedom and things like that. So that would be my recommendation.
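As a small sketch of this comparison, assuming two candidate models (normal and lognormal), placeholder data, and parameters taken simply as sample moments for illustration:

```python
import numpy as np
from scipy.stats import norm, lognorm

# Hypothetical observations (placeholder values)
data = np.array([31.2, 28.4, 35.1, 30.0, 33.7, 29.5, 32.8, 34.2, 27.9, 31.6])

# Candidate model 1: normal, parameters taken as the sample moments
mu, sigma = np.mean(data), np.std(data)
loglik_normal = np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Candidate model 2: lognormal, parameters of ln(X) taken as the moments of the log-data
m_log, s_log = np.mean(np.log(data)), np.std(np.log(data))
loglik_lognormal = np.sum(lognorm.logpdf(data, s=s_log, scale=np.exp(m_log)))

print(f"log-likelihood normal   : {loglik_normal:.3f}")
print(f"log-likelihood lognormal: {loglik_lognormal:.3f}")
print("more appropriate model:",
      "normal" if loglik_normal > loglik_lognormal else "lognormal")
```

Comparing the log-likelihoods is equivalent to comparing the likelihoods themselves, since the logarithm is monotonic, and the comparison does not depend on the number of degrees of freedom of any test.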
Oops, I hate this thing. How many of you think the right answer is green? Actually, this is a nice little trick; I think we have to consider it in the future. If we want people to really give us good answers, then we have to put in sequences where you see the answer for just a split second, so that without really realizing it, it is in your mind. Anyway, let us have a look at this small exercise. We have a sample space with only two possible states; it is a binary sample space, zero and one. We are living in the Matrix world. We choose randomly five outcomes out of that sample space and we get this vector of outcomes: zero, one, zero, zero, zero. Now, what is the likelihood function corresponding to this observation? And actually I think it is not too bad that you already saw some of the solution, because you did not see the complete solution; this is a trick question. We have to see some cards, otherwise you will get absolutely no break. I see a green, I see some reds, but I want to see many more cards. Come on, do not be afraid of my trick questions. So red is of course a very qualified guess, but there could also be other really good considerations. I also see a few yellow, and I like that, because basically there is no real difference between red and yellow here. In this vector x, you simply always have in mind that the vector x contains the observed values. So this is the right likelihood: we have inserted the observed values into the density function for this random variable, so to say, and we evaluate it. This was the assumed correct answer, but the yellow one is equally correct.
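For concreteness, one way to write out the likelihood for this observation vector, assuming the unknown parameter is p, the probability of the outcome one (this parametrization is my assumption; the slide's notation may differ):

$$L(p) = \prod_{i=1}^{5} p^{x_i} (1-p)^{1-x_i} = p^{1}(1-p)^{4}$$

since the vector (0, 1, 0, 0, 0) contains one outcome equal to one and four outcomes equal to zero.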
Now, what we have seen today is this concept of statistical testing, which we introduced some time ago, used now for the rejection or verification of hypotheses which we construct using data, so that we can check whether a hypothesis can be rejected at a certain significance level, that is, with a certain probability. This probability is related to the probability of rejecting a hypothesis even though it is correct; that is a type one error. The type two error is to accept a hypothesis even though it is wrong. Using this concept, we have shown that there is a way in which we can validate the goodness of postulated probabilistic models, models containing the whole picture, namely the choice of the cumulative distribution or density function as well as the parameters. We have two methods, the chi-square test and the Kolmogorov-Smirnov test, and we have to keep in mind, and this is important, that the chi-square test is originally formulated for testing discrete models, but we have seen that we can also use it in the continuous case by discretizing our continuously distributed random variables; the Kolmogorov-Smirnov test directly formulates a test based on the cumulative distribution function. Another important issue is that the number of degrees of freedom in the chi-square test depends on whether or not we have estimated some of the parameters of our model using the same data we are using for the test. If we have estimated one, two or three of those parameters using the same data, then we have to subtract one, two or three degrees of freedom from the total; in principle there can be many parameters, and for each one estimated using the data we are using for the test, we subtract one degree of freedom. Finally, we can also compare the goodness of competing models. It is very likely that you can formulate many models for the same phenomenon, and it may be that none of them can be rejected; you have already tested that, and now you would like to see which one is more appropriate. One consistent way of doing this is to compare the likelihoods, using the concept of the likelihood function we introduced for parameter estimation: the model which is more likely, following the concept of the likelihood function, is the more appropriate model choice. Now I only have one thing left to tell you, and that is that I really wish, for next week, that you will do just as well as you did on the last test, because I feel that you did well. Do just as well, but do not be nervous at all; face the problems with an open mind and solve them. I wish you good luck, and I will see you the week after next. Thank you.