Good morning everyone. I welcome you to lecture number two of the course Collective Dynamics of Firms. Last week I gave a broad introduction into the topic, as you probably recall. This is part of the larger field of industrial economics, also called industrial organization or industrial dynamics. This particular lecture focuses on data about thousands or even millions of companies and tries to find statistical regularities. Once we find these regularities, we have an interest to understand why they are there, and of course we would like to develop models in order to reproduce them.

In order to do so, I have introduced the state-of-the-art software that we use in this course to analyze the data. This is notably the statistical software R, and your first self-study exercise was about installing R and reproducing the normal distribution. Once you have installed R, it should have taken like 60 seconds to get the picture, I hope. Today we have our first exercise, and Parveen is ready to answer your questions in case there were problems.

Today we start to introduce the topic in a more formal way, and I hope that you did not get scared by the many equations that are printed in the handout. In fact, you shouldn't be scared. First of all, I'm here to explain everything. Secondly, we do this in the formal way to help you to understand it. It does not mean that you have to reproduce every single equation from these slides, right? This is not what we have in mind. Instead,
we would like to provide you with more insights. We will talk today about random variables and their distributions. In case you already took a course in computational statistics, there is not much new for you, but for all those who are average MTEC students, we found it useful to recapture some of the basics here. We will introduce two classes of distributions today that play a role through the whole course: the symmetric and the skewed distributions. In particular, you will learn how to measure skewness, and you will also learn how to determine specific parameter dependencies of these distributions.

Let me go back to the example that we already used last week. We would like to understand how firms grow. This means that we first have to have a proxy of the firm size in a given year, and then we need to have data about the firm size in different years; only then are we able to estimate the growth. So the question that is also to be answered in your second self-study task, which is distributed today, is: how much does the firm size vary between two consecutive years? Because that's needed to understand what growth means. A particular question that we will then answer in more detail, also in the next lecture, is this: once we have determined the growth rate between two consecutive years, does the growth rate depend on the size, or does it not depend on the size? If it does not depend on the size, it means that your small startup company has the same probability to grow as a big company like Nestlé, for example. You can think about this for a moment. Is it true or is it not true?
If it depends on size, then we would like to know how the growth rate of the company scales with its size. These are things that we will answer in the lecture and also in the self-study tasks.

The first step is: we need to get the data. There are various ways of proxying firm size, and I will discuss some of these during the lecture. For the moment it is sufficient to assume that we proxy the firm size by the number of employees that a firm has. But you can also think of other proxies of firm size. Maybe you have a few other suggestions. How can we measure this? Capitalization, right, so that's one. Revenue, yes. Other ideas? Okay, we will discuss a few; for example, sales is another measure, right? We will discuss some of these proxies, I think, in the next lecture. The interesting thing I can already tell you is: no matter what proxy you use, you'll see more or less the same pattern. That's why we are quite robust in our assumptions: we proxy firm size by the number of employees, and that's it.

So we already did the first step for you: we queried the database that I introduced last week and have provided you with 10^4 data entries about firms in two consecutive years. And now we talk about step number two: we have to analyze the data. The data is given to you now, and we would like to look into this data. The first question we have to answer is: what is the distribution of the firm size and the firm growth rate that is underlying this data? That is the question I would like to discuss with you in today's lecture.

To discuss this, we need to start a bit more in the formal way. We depart from the specific idea of the firm size and talk about random variables in general. The nice thing about this is: once you move to a different field, say stock market data or biological observations, you can still use the knowledge that you acquire from these slides. This is not specific to firm size.
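To make the growth-rate question concrete, here is a minimal sketch in Python (the course itself uses R, and the real data file is not reproduced here; the firm sizes below are synthetic stand-ins). The growth rate of firm i between two consecutive years is typically measured as g_i = log(x_i(t+1) / x_i(t)).

```python
import math
import random

random.seed(42)

# Synthetic stand-in for the real data file: firm sizes (number of
# employees) in two consecutive years, for 10^4 firms.
n = 10_000
size_year1 = [max(1, int(random.lognormvariate(3, 1))) for _ in range(n)]
# Each firm's size next year: the old size times a random growth factor.
size_year2 = [max(1, int(s * math.exp(random.gauss(0, 0.2)))) for s in size_year1]

# Logarithmic growth rate g_i = log(x_i(t+1) / x_i(t)).
growth = [math.log(s2 / s1) for s1, s2 in zip(size_year1, size_year2)]

mean_growth = sum(growth) / n
print(f"mean growth rate: {mean_growth:.3f}")
```

Whether these g_i depend on the initial size x_i(t) is exactly the question the lecture raises; with the synthetic data above they do not, by construction.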
This applies to any sort of random variable. So we assume that the data given to us in this little data file that we provide today is drawn from a distribution. That's a random draw: there is a distribution underlying this data. We do not know this distribution yet, but we assume that ten thousand times, someone makes a random draw from this distribution and gives us a number back, and these are the small numbers x_i. So what we have is a data stream of x_i: x_1, x_2, x_3, and so on. And then we have x_1, x_2, x_3 for the next year again. Our assumption is that these are random draws from a particular distribution.

In order to find out what the distribution is, we start by proxying the distribution with a histogram. That means we try to plot the frequency at which particular x values occur, and I have sketched this here in this little diagram. So we have here the x values distributed, and then you see in the gray bars the frequencies of these x values. You see that of course there is not a perfect match between the observations and the underlying distribution, and that's exactly the problem: we have to talk about how to statistically get information about the underlying distribution if the pattern that we get from the frequencies looks like this.

Our observations about firm size refer to a discrete variable. You have either two employees or three employees; you do not have two point five employees, right? Therefore we have to talk about discrete variables first, but later I will repeat the same story for continuous variables. Why are continuous variables useful? Because we can do some analytical approximations that are not that easy to get for discrete variables. But our observations are assumed to be discrete variables.
So, I pressed the wrong button here. This looks a bit mathematical, or maybe messy if you like. It is only meant to help you go through this formally; if you don't like math, you can skip all these equations and just listen to my explanation, right?

Okay. In order to describe the probability distribution of our variables, we have to introduce two different notions of the probability distribution. The first is called the probability function, or probability mass function in mathematics, and the second is called the cumulative distribution function. The latter is just the sum, or the integral, of the former.

Okay, what is the information we get from the probability mass function? It tells us how probable it is that I find a given value x_i, which is a realization of the random variable X. That means the small f(x) gives me, more or less, the probability that the random variable capital X has exactly the value x_i. That's the meaning of it. Here's one example. Of course, we have to think of the sample space of the possible outcomes. What is the sample space of the possible outcomes for firm sizes? Any idea? Right, so okay, first of all it starts with one. We don't know what the maximum number is, but a good proxy is the world population, for example; it shouldn't be much more than ten to the nine or so. So that's the lower and the upper bound; these are discrete values, and most importantly, none of these values is negative. That's very important.

Here's another example, much simpler.
We toss a coin, and then the sample space has only two possible outcomes, namely tail or head. We call these zero and one, and then we talk about the probability that zero occurs or the probability that one occurs, and each of them, as you know, is just one half. That's the meaning of the probability distribution.

The cumulative distribution function, which is sometimes only called the distribution, is written as capital F of x, and this gives you the probability that a value randomly drawn from the underlying distribution is below a given value x_j. For example, if we talk about income, then the capital F gives you the probability that your income is below one million francs. So that is the information. You already see that this does not contain all the information; it gives you just an upper limit. And the probability that the income of a Mr. X, who is the CEO of a company Y, is below one million francs is of course much smaller, right? Okay, so it really depends on the underlying distribution.

What we do here is we sum up all the values of this probability mass function up to this level x_j, and the cumulative value, that means the sum of all of these, gives me this probability. In the example of flipping a coin, it's very clear: you have these two realizations here only, and together they have to sum up to one, because there is no other value available. So let us just recapitulate: the small f gives me the probability that a certain x_i occurs, and the capital F gives me the probability that the value I get is below a certain value x_j.

Now that we know this definition of the distribution, we have to ask ourselves: how can we characterize the distribution?
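As a tiny illustration of the relation between the probability mass function f and the cumulative distribution function F, here is a sketch in Python for the fair-coin example (outcomes 0 and 1, each with probability one half); F is just the running sum of f up to a given value.

```python
# Probability mass function of a fair coin: outcomes 0 (tail) and 1 (head).
pmf = {0: 0.5, 1: 0.5}

# Cumulative distribution function F(x) = sum of f over all values <= x.
def cdf(x):
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(0))  # 0.5: probability of a value <= 0
print(cdf(1))  # 1.0: all probabilities must sum to one
```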
We have to give a formal expression and characterize it by two parameters, which play a major role throughout the course. One is called the mean value mu, and the other one is called the variance sigma squared. The issue here is that I do not know the mu and I do not know the sigma squared; all I know is a sequence of empirical observations, and then I have to calculate the mu and the sigma from these empirical observations. That's described here on the slide.

So we assume that a good proxy for the mean value is just the arithmetic mean: I sum over all of my observations, one to n, and I divide by n, and that's the mean value. How do I get this? We talk about it later, right? So this sounds natural to you; what else should the mean value be, right? That's the most obvious choice. But that's wrong: the mean value can be defined in different ways. I think I wrote a bit in the notes. Yes, there is also the geometric mean, which is a much better proxy for the mean value in certain circumstances; we come to examples in the next lecture, actually. So this is a proxy for the mean value, just the arithmetic mean, and the variance is then the deviation from the mean, and because it's squared, we square it here. So once we know the mean value, we calculate the deviations, and we scale them by one over n minus one. We come to this n minus one later again. Yeah, in this particular case, yes; in other cases not, so it depends a bit. Yes, you are right, in this particular case it is the mu when I write it like this, because I have defined it like that. But as I said, the mean value is not the same as the average. That's the average; that's very important. Therefore I took a capital X bar here.
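The two estimators from the slide can be written down directly; a minimal sketch in Python (note the n minus one in the variance, which comes up again later):

```python
def sample_mean(xs):
    # Arithmetic mean: sum over all observations, divided by n.
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Squared deviations from the mean, scaled by 1 / (n - 1).
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32 / 7 = 4.571...
```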
So, that's the average, and this is the mean value; for the moment you can assume it's the same. We come to other examples where it's not the same later on. Okay, so we use these variables with a hat to remind ourselves of the fact that these are proxies of true values that we do not know. Why do we not know these true values? Because we don't know what the underlying distribution is, right? You just have a finite sample set, 10,000 values, and we have to calculate it in some way.

Okay, now I come to the same thing, but just for continuous variables. As I said, in the observations we do not deal with continuous variables, but it's helpful to have an analytical expression. So then I can proxy the probability density function, the PDF, by a continuous function, and the small f(x) is the probability that I find this value x, which was the x_i in the discrete picture, in such an interval between t and t plus h, where h is a very small number. There is a lengthy discussion, I think I also wrote it in the notes, that we should not call f(x) a probability and so on; these are mathematically valid objections, but this is not a course on mathematics, so we skip this discussion here. It is just a reminder for you that there can be a deeper discussion underneath the whole topic. We skip it here.

So this basically gives you, somehow, the probability to find this value x. And the cumulative distribution function, as I said, is nothing but the sum up to a given value x, or here in this particular case t. That means it's the area under this function, right? So this is very obvious: we have two notions, the probability density function and the cumulative distribution function, and as I said, the latter
is just the sum, or the integral, of the former. Well, that's it, and we try to characterize the probability density function by two parameters: the first one is the mean value and the second one is the variance. For the discrete case I gave you a proxy for how to get these from the data, right?

All right, so now, how do we proxy the mu and the sigma squared for a continuous variable? We have a similar notion here compared to the discrete case. In this particular case we call the mu the expectation value, and this is simply x times the probability to find this x, integrated over the whole space; it's not a sample space here, it's a continuous space. The variance is calculated likewise, as the deviation from the mu. And the standard deviation, that's another term that is often used instead of variance: it's simply the square root of the variance. Sometimes I mix this up and talk about standard deviation, but you know what I really mean, right? Either the sigma squared or the sigma; there's a firm relation between them. So that's the formal definition of mean value and variance for the continuous case. We need to keep in mind that it is well defined; we will have to deal with the issue of how to estimate it from the discrete variables.

So now we start with one example, and like in school, we start with the most simple example: the normal distribution. When I thought about this, I wanted to have an example that is not totally boring, and then I came up with this idea of a shoe company, which I called the 3S company. It is a good illustration of how to apply what we learn about distributions to a very simple case. Let's assume you are the manager of this shoe company, and you want to produce shoes for the European market. The total market size is more than 700 million people, so it's certainly worth the effort to think about this market, in particular since the people obviously have the money to pay for the shoes, right?
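Coming back for a moment to the integral definitions of the expectation value and the variance: you can convince yourself numerically that they give back the parameters of a normal distribution. A sketch in Python, with mu = 1 and sigma = 2 assumed purely for the check:

```python
import math

mu, sigma = 1.0, 2.0  # assumed parameters for the check

def normal_pdf(x):
    # f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Crude Riemann sum over a range wide enough to capture almost all the mass.
h = 0.001
xs = [-20 + i * h for i in range(int(40 / h))]
expectation = sum(x * normal_pdf(x) * h for x in xs)        # integral of x f(x)
variance = sum((x - mu) ** 2 * normal_pdf(x) * h for x in xs)  # integral of (x - mu)^2 f(x)

print(round(expectation, 3))  # close to mu = 1
print(round(variance, 3))     # close to sigma^2 = 4
```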
So you produce the shoes in China, and you have to tell your manufacturer in China how many shoes of what size they have to produce, in order to ship them to Europe. And that's exactly the question: how can we know, for each shoe size, how many pairs of shoes of size 20 or 24 or 30 we have to produce?

So we go and do a little research in the literature. We found a paper, via a Japanese website, where people thought about what the shoe size distribution looks like, and we found it's a normal distribution, with the mean and the standard deviation, not the sigma squared but the sigma, given by these values. Obviously there is a gender difference between men and women; you could have guessed that from the outset. This is all that we know now, and we would like to apply it to our question: how many pairs of shoes of what size do we need to deliver to Europe? That's the question.

Okay, let me just plot the function first. You did it already in the first self-study exercise; everyone got this picture here. It is obviously not only helpful but also needed that you remember how the probability density function of the normal distribution looks; it would be good if you are able to write down this equation. You see it's a very simple structure: it's an exponential function, and in the exponent you have the difference between x and mu, where mu is the mean value, squared, and divided by two times the variance sigma squared. And then there is a normalization factor, which is one over the square root of two pi times sigma squared. We are not deriving this, but you should remember the structure, right?
So: first the exponential, then what's in the exponent, something with the square, and then a normalization factor, the square root of this denominator together with an additional factor of pi. That's how it looks. It's obviously a symmetric distribution; that's the first thing you have to notice. The way we plotted it, it is symmetric around zero, but this is because we assumed that the mu is zero here. It is symmetric around mu, actually; very easy to understand.

Now we have to talk about the cumulative distribution function that I already introduced, the capital F. How do we get this? By doing the integral of the distribution function, of the small f; I explained it to you. So now you start to do this, and then you recognize, okay, this becomes quite complicated or intricate: there is no closed-form solution for the cumulative distribution function of the normal distribution. Therefore, until very recently, people calculated this once and printed it in tables, and then you go and look up the value for a particular x in the table. That's what I did when I was a student, for example. Today everyone is doing it with a computer, of course, but we always had these tables.

Okay, in order to look it up in a table, we face a problem: we have a particular mu and we have a particular sigma squared, and they would not be able to print tables for all sorts of mus and all sorts of sigma squareds, right? Therefore we have to normalize this distribution. We introduce a normalized variable, that's the t, by just subtracting the mean and dividing by the sigma. By this we get a normalized normal distribution, where the mean is zero by definition and the sigma squared is one. That's a very important normalization, and only this function is printed in the table. This is sometimes also denoted as capital phi of t.

Well, let us take an example here.
Let's assume the mu is given as 4, the sigma squared as 4, and our x is 3. Then we know that our underlying normal distribution looks like this, and now we have to rescale it by just doing this transformation. In order to get from here to the phi, we have to subtract the four and divide by the two, and then we have the normalized distribution. We go and look it up in a table. I doubt that anyone still has these table books, right? No one has them. Therefore we do the same in R; in R it takes us like 15 seconds. Of course we have to tell R what the mean value is: mu is 4. What is the sigma? Not the sigma squared; the sigma is 2. And what is the x? 3. Then we type pnorm, which is the function calling the normal distribution, like this, and R returns this value to us. Because we talk about the cumulative distribution function, the interpretation of this value is very easy: the probability that a value x is less than or equal to 3 is about 31 percent. That is the message that we get from this. I hope you could follow up to here, right?

So now we go back and want to ask: how many shoes of what size do we need to produce for the European market? Okay, I have plotted here the two distributions for men and women; recall that they are both normally distributed, but with a different mean and a different variance. So these are the two distributions, and now we recall that we are able to calculate the cumulative distribution function. This tells us the probability that a value we randomly draw is lower than a given value. Okay, let's assume we are interested in shoe size 21 here, and we ask: what is the probability
that the shoe size is below 21? This can be calculated as I have described before: normalize it and then calculate it by using R. But this gives us all people that have a shoe size up to 21. So in order to know how many people really need 21, and not 20 or 19 and so on, we have to subtract how many of these people have a shoe size that is less than 20, right? Because we are only interested in those people who really need 21, and not in all people that need something below 21. Then we calculate this, and we get these kinds of numbers. So the cumulative distribution function helps us to estimate how many pairs of shoes people in Europe need. This is exactly the number of pairs of shoes that we have to produce of a given size. Everything clear? Yes, very good. There is a question at the very end where you have to calculate this for other values of the shoe size, and I want you to do it exactly as I have described on the slide here. Once you have understood it, it takes you like 30 seconds: 15 for the upper limit and 15 for the lower limit, right?
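What pnorm does in R can be reproduced with nothing but the error function, and the subtraction step, F(21) minus F(20), follows directly. A minimal sketch in Python; note that the shoe-size mean and standard deviation below are made-up placeholders, not the values from the slide:

```python
import math

def normal_cdf(x, mu, sigma):
    # Standardize t = (x - mu) / sigma, then use the standard normal CDF:
    # Phi(t) = (1 + erf(t / sqrt(2))) / 2.
    t = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# The lecture's pnorm example: mu = 4, sigma = 2, x = 3.
print(round(normal_cdf(3, 4, 2), 4))  # 0.3085, same as pnorm(3, mean = 4, sd = 2)

# Hypothetical shoe-size parameters (placeholders, not the slide's values).
mu_women, sigma_women = 23.5, 1.2

# Fraction of women who need exactly size 21:
# everyone up to size 21, minus everyone up to size 20.
frac_21 = normal_cdf(21, mu_women, sigma_women) - normal_cdf(20, mu_women, sigma_women)
print(f"fraction needing size 21: {frac_21:.4f}")
```

Multiplying such a fraction by the market size gives the number of pairs to order, which is exactly the calculation the self-study question asks for.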
Not more, okay. Now that we have answered this question, we already found out what the difference between the two genders, male and female, is. In order to have a better understanding of the whole thing, we would like to remove this gender difference from the distribution. In the case of the normal distribution it's very easy, because we already discussed how to normalize, or how to scale, a normal distribution; that was two slides before. That is the standard normalization: remember, from the values of x I have to subtract the mean, and I have to divide by the standard deviation. This was given to you a few slides before.

This means that instead of having a distribution from which I draw all the numbers for the females, and then a distribution from which I draw all the numbers for the males, I now have just one distribution, namely the normalized distribution, that describes both males and females, because I have normalized this. The result of this is shown here: instead of having this picture, where I have these two distributions, I only have one picture with one distribution. And what has changed between the two pictures here? What has changed? No one recognizes it? Please. The height has changed? Okay, that's part of it. The axes have changed, correct. So the axes have changed, the x and the y axis. I was able to merge the different distributions into one master curve once I got the right scaling idea about what is the new x and what is the new y. So that means, instead of talking about different distributions for different genders, or later about different distributions of growth rates for small firms and big firms, the example
we already used, we talk only about one distribution. But this needs the right scaling of the x and the y axis. Once I know what I have to write here, everything collapses onto one curve, right? And then I can claim that I understood the problem, because I found the common denominator of all of these, and I found the correct variable that describes all of these distributions. That's a very important point for you to take away. Okay, so we have a scaled variable here and a scaled variable there, and this allows us to collapse different distributions onto one master curve. Did you get this? I hope so. This is a bit of a complicated way of writing it up; it simply means x minus the mean value, divided by the standard deviation, that's basically the t as it was defined before. Everyone got this, I hope. We will talk about these scalings quite often in the course, so therefore I want you to think about how you get from here to there. Again, the conclusion is: we understood the role of gender here, the difference between these two, and because we understood this, we were able to remove the influence of gender from these two curves and make it one curve that is completely independent of whether we talk about males or females. That is the important step we did in the analysis.

Okay, there is another problem underlying the whole thing, which I just address here; we will talk about this in the next lecture in more detail. It's about data binning. Remember that we talk about discrete variables here, and therefore we have histograms that have a defined height and a defined width, and of course the histogram is very much dependent on how we define the bin size. I have plotted the same data here three times; the only difference between the three is the bin size. It's not the data, it's the bin size, right? I took my observations and then I fitted them here.
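The scaling collapse described above is easy to check numerically: draw from two different normal distributions, standardize each sample with its own mean and standard deviation, and both collapse onto (approximately) a standard normal. A sketch in Python with made-up parameters for the two genders:

```python
import random
import statistics

random.seed(1)

def standardize(xs):
    # t = (x - mean) / standard deviation, the scaling from the slide.
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

# Two different normal distributions (parameters are illustrative only).
women = [random.gauss(23.5, 1.2) for _ in range(5000)]
men = [random.gauss(26.0, 1.5) for _ in range(5000)]

for sample in (standardize(women), standardize(men)):
    # After scaling, both samples have mean ~0 and standard deviation ~1:
    # they lie on the same master curve.
    print(round(statistics.mean(sample), 6), round(statistics.stdev(sample), 6))
```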
Here I have many more bins, so I get a distribution like this that looks more or less symmetric, right? Then I choose a larger bin size, only 21, 22, 23 and so on, a bin size of one centimeter, and you see this distribution has lost its symmetric shape. Why is this? Because I have chosen a bin size that is too large to show any sort of symmetry. And then I have chosen an even larger bin size, namely 20 to 24, and then I get something like this. This looks like a uniform distribution. Why is this? Because I have chosen a bin size that is way too large for this problem, right? Therefore everything looks like it has the same probability.

This addresses the question of how to get the optimal bin size to bin the data. You can also think of something different here: you can think of having a very, very small bin size, one millimeter or something, or even a tenth of a millimeter. What would be the outcome then? You will end up with something that is more or less uniformly distributed, because for every single realization that you find, you have also defined a bin. That means the height is one in every bin, right, because you made such small bins that every observation can be distinguished from every other. So you have more or less equal heights in every bin. That means you should not choose a bin size that is too large, but also not one that is too small, in order to get some statistical information out of this.

How to choose the optimal bin size will be discussed in the next lecture, but I want you to already anticipate the problem here, because if you get a picture like the one I show in the middle, you would obviously assume that we talk about a skewed distribution here, where most of the mass is concentrated to the right. That's the consequence of this. The nice answer to this is: R is already choosing the right bin size for you, so you do not need to care about it. But I want you to understand this problem, right?
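The bin-size effect is easy to reproduce; a minimal sketch in Python that bins the same synthetic foot-length sample with three different bin widths and prints the resulting counts (the data and the bin boundaries are illustrative, not the lecture's):

```python
import random

random.seed(7)

# Synthetic foot lengths in centimeters (illustrative, not the lecture data).
data = [random.gauss(22.0, 0.8) for _ in range(1000)]

def bin_counts(xs, width, lo=19.0, hi=25.0):
    # Count observations per bin of the given width between lo and hi.
    nbins = int((hi - lo) / width)
    counts = [0] * nbins
    for x in xs:
        i = int((x - lo) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

for width in (0.25, 1.0, 4.0):  # small, medium, far too large
    print(width, bin_counts(data, width))
```

With width 0.25 the symmetric bell shape is visible in the counts; with width 4.0 almost everything falls into a single bin, so the shape is lost, which is exactly the problem shown on the slide.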
If you use R, then this is already optimized without you knowing it. Okay, now let's go back to the histograms, for which we need to calculate the two parameters. We still need to know: if we have the observations here, then we need to calculate our distribution and the mu and the sigma. You can remind me that I already gave you the mu and the sigma, correct? That was thanks to some guy who published it on the internet. In general you don't know the mu and the sigma; you have to do it yourself, and therefore we address this here.

So we have 1,000 measurements of the foot size, and then we get these numbers here, and then we plot the histogram: the number of times a given value is observed, divided by the total number of observations, which is 1,000 in our case. Then we get a picture like this. And now we need to know two things, which are addressed here. The first thing is: we need to know the two parameters that we need to characterize the distribution. That was the mu, the mean value, and the sigma squared; I have to calculate these. But the other problem is: I have to know whether this histogram that I just showed you follows a normal distribution or another distribution. If it follows another distribution, for example a log-normal distribution, then it is not clear that the mu and the sigma are calculated the same way, right? I said that because you got so used to the normal distribution, you can only think of the mu and the sigma as the arithmetic mean and the respective variance, but there are other cases. I want you to understand that this is like the chicken-and-egg problem. We can decide which one we want to solve first, but we have to keep the other one in mind. So we will talk about the mu and the sigma squared first, assuming that we have a normal distribution, but later we also have to answer the question: do we have a normal distribution?
Is the data telling us that we have a normal distribution, or is it just our lack of imagination that we think we have a normal distribution because we don't know about other possible distributions, and so on? There is a whole zoo of these. So you understand we have to solve these two problems together, and we do it step by step: instead of solving them together, we first talk about the first problem and then about the second.

Now the bell is ringing, exactly, so after the break, the break is 10 minutes, at 10 after 11 we continue with the lecture. 10 after, no matter whether there's a gong or not; 10 after 11 we continue. 10 minutes, yeah.

Let us please continue with lecture number two. By the way, all the material is available on Moodle, as you know, and there is a password for the registered students; I have put it up on the blackboard again. You can log in; you find the handout of the lecture, you find additional literature that is meant to further educate you, you find the self-study tasks and the data to download, and you find a link to the recording of this lecture as well. This recording is very nice: if you forgot what the meaning of a particular slide was, there is a slider, you go to this particular slide, and then you hear me, hopefully, telling you the right thing. It's a very convenient way. The drawback of this very convenient way is that many students are absent, because they no longer feel the need to sit here. That's the problem. From the registration there should be more than 20 students in this course. Okay.

Let me now continue with this issue of parameter estimation. We need to know how we calculate the mu and the sigma of the normal distribution. I gave you this example already, which means you know what the result is. But now we ask the question: if we do not know the result, how can we get it? How can we find an equation to calculate mu and sigma?
That's the task here: we need to find parameter estimates for the mu and the sigma. The method that we use here is maximum likelihood estimation. I'm sure that some of you have already heard about this. Can you just raise your hand if you are familiar with it? One, two, three, a bit. Yeah, okay, so then I can be very quick. But let's assume that those who watch the video recording are not all at the same level; therefore I put it here.

So we have realizations of the random variable X, and these, you can take a note, are supposed to be small x, if we want to be consistent with the previous slide. These should be small x, and capital X is just the name of the random variable. We assume that these variables are i.i.d., that means independently and identically distributed. This is an important assumption. When we talk about a random variable, random means random as opposed to correlated: if you draw a number, then the draw should not depend on the number you have drawn before or the number you will draw in the future. If that is not the case, then we talk about correlations, and then we do not talk about independently distributed variables. And identically distributed means: if I have a thousand of these realizations, I assume that they are drawn from the same distribution, and not 500 from one distribution and 500 from another distribution. That is the meaning of this. You can mark this; it's a very important assumption. If you go to econometrics courses, this is a standard assumption that you find everywhere, but you should understand what its meaning is.

Now we would like to discuss how we get our parameters mu and sigma for the foot lengths, and our distribution is assumed to be this normal distribution. That is not clear, as I mentioned to you; there is a related problem where we test the distribution: is it a normal distribution?
Now we assume it's a normal distribution, and then we calculate the mu and the sigma, and then we go and test whether this fulfills the criteria of the Kolmogorov-Smirnov test. And if it does not fulfill the criteria, then we have to go back and say: well, maybe it's a log-normal distribution, and then we have to recalculate the mu and the sigma, and then we have to test the next distribution, and so on and so on. This is the way we do this. This is just a starting exercise to get you acquainted with these methods.

Here we define a likelihood function, this capital L, that depends on the two parameters mu and sigma, and is formally given as the joint density of the measured data under a distribution with these two parameters. So you should read this, the f, as a question: given that I have the observations x_1 to x_n, what are the mu and the sigma that most likely describe this set of observations? Under the assumption that it's normally distributed; that's not written here, but it's part of the assumption. So this is how you read this. That's the likelihood function.

Okay, so now we make use of our assumption that we are talking about iid variables here, independently and identically distributed data. If these realizations of the variable x are independent of each other, what have you learned in probability theory about independent events, about the probability of independent events? Does someone recall this?
Yes, please. Right, so if I talk about dice, then the probability that I roll a one does not depend on whether I had a six before. That means these are independent realizations over the sample space one to six. And in the case of independent realizations, I can factorize this function into functions that no longer depend on all of the observations, but just on a single observation. According to probability theory, the joint probability is just, as you said, the product of the independent probabilities. That's something you have already heard in basic mathematics, right? So you understand why we are able to factorize: because these are independent.

And now we make use of the fact that they are identically distributed; they come from the same distribution. That means for each of these I assume that they follow a normal distribution; they all follow the same distribution, which is the normal distribution. So for the f I make this assumption here, that's a normal distribution, and please recall that I want this to be changed into small x_i. So that means we have a product of all these normal distributions here. The product comes from the fact that the variables are independent, and because they are identically distributed, every factor is the same normal distribution. This is the message of it. So that means I already have an expression for my likelihood function, and even if it looks complicated at first glance, it is not complicated at all: this is the normal distribution, whose equation I want you to recapitulate, and this is just a product over a number of realizations.

Okay, so then we ask ourselves: what are the parameters mu and sigma that maximize this likelihood function? That is the next question: what are the parameters that describe your data best?
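To make the factorization concrete, here is a minimal Python sketch (the course software is R; the foot-length values are invented). The joint likelihood of iid observations is literally the product of the individual normal densities, and parameters close to the data give a larger likelihood than parameters far away.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """Joint likelihood of iid data: the product of the individual densities."""
    L = 1.0
    for x in data:
        L *= normal_pdf(x, mu, sigma)
    return L

data = [25.1, 26.3, 24.8, 25.9, 25.4]   # made-up foot lengths in cm
# Parameters near the data beat parameters far from it:
print(likelihood(data, 25.5, 0.6) > likelihood(data, 30.0, 0.6))   # True
```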
There can be different mus and sigmas that fulfill the likelihood function, but the question is which ones describe it best, and this is equivalent to the question of finding the maxima, or the minima, the extrema, of a given function. You know this already from your courses in basic mathematics: if you want to determine the maxima of a function, what do you do? Hmm? Yes, you take the first derivative and set it to zero. So that is the same thing we do here: we take the first derivative of the likelihood function and set it to zero. Now, the likelihood function depends on two parameters, mu and sigma, so we have two partial derivatives here, which we both have to set to zero.

Okay. Now we are facing the problem that the likelihood function is a product of a number of normal distributions, which is okay, but makes it a bit difficult to handle. Therefore we say: if we take the logarithm of the likelihood function, this will not change the position of the maximum, but it makes things more convenient, because the logarithm of a product turns all the products into sums. That's a nice thing, and therefore we take it. So instead of our likelihood function with the product, we take the ln of the likelihood function, which is the small l here, and this turns the product into a sum. Then it looks like this: it's a sum over the ln of all these terms. And you see there is the ln of the pre-factor, and there is the ln of the exponential; the ln of the exponential is simply what we had in the exponent.

Yes, please? Where? Here? No, no, this is the ln here, and the ln is already taken from here, and the sum goes over the whole term. Is this your question? Why do I assume that the sum applies only to this part? The pre-factor does not depend on i; this is the part that depends on i. But if you like, you can also write the sum over the pre-factor, right?
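The logarithm is more than a convenience, which can be seen numerically: with many observations, the raw product of densities underflows the floating-point range, while the log-likelihood sum is harmless. A small Python illustration (the 0.05 values are just stand-ins for small density values):

```python
import math

# With many data points, the raw product of densities underflows to 0.0,
# while the sum of the log densities stays perfectly well behaved.
densities = [0.05] * 1000          # stand-ins for 1000 small density values
product = 1.0
for d in densities:
    product *= d
log_sum = sum(math.log(d) for d in densities)

print(product)     # 0.0 (floating-point underflow)
print(log_sum)     # about -2995.7, no numerical problem at all
```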
But again, this was how it looked: I take the logarithm of this, and this means the logarithm of the pre-factor plus the logarithm of this, and the logarithm of an exponential is simply what we have here. So that's how we got to this equation, and now we rewrite it in a slightly more convenient way, like this. It is the same; to get from there to here I do it step by step, to really show you that you do not need to get scared. It's very easy to understand what we are doing; everyone should be able to see why we end up with this equation for the log-likelihood function in the case of a normal distribution. Okay? You agree.

So now, once we have defined the log-likelihood function, we have to take the partial derivatives and then we have to set them to zero. Let's take the partial derivative of the log-likelihood function from the previous slide with respect to mu. Is there someone who volunteers to talk about this? No, it's too easy, right? Everyone understands this. So we look: this term doesn't depend on mu, this doesn't depend on mu, only this one depends on mu, and therefore we get this as the derivative. We can rewrite it in this way. And the same is true for the derivative with respect to sigma: this doesn't depend on sigma, but this depends on sigma and this depends on sigma. Therefore we get a first term, which is 1 over sigma, and since the sum has n terms, there are n times 1 over sigma, so we get the n here; and there is a second term that we get from the previous exponent. So, very easy to understand. And then, as the last step, we set this to zero and solve. Then we have two equations, which both depend on sigma and mu, and we have to solve two equations for two variables. I hope everyone is able to do this. And this is the result. The result is not a surprise, right?
What you see here is that the mu that maximizes the likelihood of describing the data that I have given to you is simply the arithmetic mean, with a small x here. So it's not a big deal. I told you the result before, but now you can assume that you didn't know it; then this is the way to calculate it.

We come to the log-normal distribution afterwards, and there you need an estimation of the mu, and this estimation looks completely different. Then you go to Wikipedia, and they just print this for you, and you ask yourself: how did they get this expression for the mu? And now you understand how they got it: by doing exactly the same thing, just not with a normal distribution but with a log-normal distribution. And if you want to test whatever distribution, a stretched exponential distribution or other things, we do the same thing: we have an equation for the distribution, and then we check what expressions for mu and sigma maximize the likelihood that this data is described by that kind of distribution function. Is everyone following what I'm talking about here? It's a very important step.

And for the sigma we get the same as we have defined for the sigma, and you all know the result, because we started from telling you this. But you also have to understand that the mu and the sigma look completely different if I'm not talking about the normal distribution, but, for example, the log-normal distribution. We come to this in a moment.

I also want you to understand that there is a difference between the estimators, which are something we get from the data, and the true values. The true values are those without anything on top: no hat, no tilde, no star or anything like this. Here our estimated values are the ones with the hat, which I gave you before. The true values we cannot know; we can only approximate them with estimators that we derive from a mathematical expression here.
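The closed-form result above can be checked numerically. A Python sketch with made-up data: compute the arithmetic mean and the 1/n variance, then verify that the log-likelihood at those values is at least as large as at slightly perturbed parameters.

```python
import math

data = [25.1, 26.3, 24.8, 25.9, 25.4]   # made-up sample
n = len(data)

# Closed-form maximum likelihood estimators for the normal distribution:
mu_hat = sum(data) / n                                # arithmetic mean
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n  # note: 1/n, not 1/(n-1)

def log_likelihood(mu, sigma2):
    """Log-likelihood of the data under N(mu, sigma2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

# The estimators should beat any slightly perturbed parameter values:
best = log_likelihood(mu_hat, sigma2_hat)
print(all(best >= log_likelihood(mu_hat + d, sigma2_hat + e)
          for d in (-0.1, 0.1) for e in (-0.05, 0.05)))   # True
```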
These estimators we derive from data. No? Okay.

So with this I come to the more interesting distributions. Recall that the normal distribution was just an exercise to tell you about the probability distribution, the cumulative distribution, and the maximum likelihood estimation to get the mu and the sigma. Now we are dealing with the real thing, and the real thing is: the distribution is not symmetric, but skewed in some way. What do we mean by skewed? It is lopsided, either to the left or to the right, and then we talk about a negative skew or a positive skew. Here you see that with a negative skew we have the left part drawn out, and with a positive skew we have the right part drawn out.

So the first question is: how can we describe the skewness? Is there a measure for the skewness? Okay, before we calculate the skewness, let me just talk about a few properties of skew distributions. If you look here, you see that if you calculate the mean by the arithmetic mean, you will probably end up somewhere here. But that's not a good proxy, because very many data points can lie in this long tail of the skew distribution. That means you have a low mean as compared to the maximum value of this distribution. That's not true for the normal distribution: in the normal distribution you find values that are more or less close to the mean, even if it is a factor of two or a factor of five away. But here you can find something that's a factor of a hundred away from the mean. That's the first thing that you see.

Yeah? One more question about the picture: why is it not normalized? Is the normal distribution not the dotted line? Yeah, it could be, yes, but okay, this is simply a sketch.
Okay, you're right, this is not a normalized distribution. But what Alex is telling us, rightly, is: if we normalize this thing, normalize meaning the area under the function is normalized to one, then this would be the normal distribution, and if I had a skew distribution it would look like this. That's what you're talking about, yeah. This one is not normalized, just to give you a better impression of the skewness. But of course, skew distributions are in many cases also normalized.

Okay, here are some properties of skew distributions. I said already there's a low mean value, low compared to the maximum value that you can observe, and we often have data without any negative values. The example that we already talked about was the size of the firm, which cannot have a negative value. Your wealth can also not have a negative value, even if your bank sees it differently. There are other quantities that do not have negative values, and there are two prominent examples of skew distributions which we discuss in this course: the log-normal distribution and the power law. We now need to characterize these two distributions, the log-normal and the power law. But before we characterize the log-normal distribution (the power law will be discussed in the next lecture), we discuss how to measure the skewness.

So, in order to get this, remember that we have our observed data from one to n, and from the sample we can calculate a skewness parameter, which we call gamma here, using the observed mean value and the observed variance. The gamma is defined as the third central moment, that is the mean of the cubed deviations from the mean, divided by the third power of the sigma. The sigma here is just the square root of our observed variance.
So we have to take the deviations to the third power, and the mu was given here by the mean. Then we get one number back, and if the gamma is equal to zero, or very close to zero, then we talk about a normal distribution, or more generally a symmetric distribution. And if it's negatively or positively skewed, then we get considerable numbers here.

And here is a little example of how you calculate this in R. I'm not going to retype this for you; sometimes I did, but today I'm not going to. It's very easy: you just write down what's written here, and you try to understand what we do. We have a table with our test data; we assign the values x_1 to x_n from this test data, and we define a function that is called skew, exactly as we have described here. There is the m, which we call mu_3 here: it is simply the sum over (x minus mu) to the power of three, divided by n. And the s is the variance to the power three halves, the third power of its square root. Then we divide the m by the s; that is how the function skew, or gamma, is defined. And then I just run this function on my sample: I read in all the data, and this is the number I get. So what's the result? Who is able to follow the explanation here? 1.37. There is a skewness, so it's very different from zero. And to what side? Too difficult, huh? Positively skewed. And what is positively skewed, it is drawn out to what side? Yeah, it's drawn out to the right. Most skewed distributions that we see are of this type.

Okay, so now we apply what we just learned to the first candidate of a skew distribution, the most important one: the log-normal distribution. What is the difference between a normal distribution and a log-normal distribution? The difference is simply that instead of talking about x, I'm talking about ln x, in fact.
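The R snippet described a moment ago is not reproduced in this transcript; as an equivalent sketch, here is the same gamma computation in Python, with invented test data rather than the course data set:

```python
def skew(x):
    """Sample skewness: third central moment divided by sigma cubed (1/n convention)."""
    n = len(x)
    mu = sum(x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n              # third central moment
    sigma3 = (sum((v - mu) ** 2 for v in x) / n) ** 1.5  # variance to the power 3/2
    return m3 / sigma3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]     # long tail to the right

print(skew(symmetric))      # 0.0
print(skew(right_skewed))   # positive, about 1.46
```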
I have a new variable, which is y equals ln x. So I replace my variable x here by ln x. And then, instead of a normal distribution, I have a distribution that is drawn out to the right and has a positive skew. You also notice that in the normalization there is the x; that's the next difference. So the exponent looks similar to the normal distribution, except that I talk about ln x where there was an x before; the denominator is the same, but in the normalization I get an extra 1 over x. Why is this the case? That's something you can check at home: you just normalize this, or you can do it by variable transformation. You know that y is ln x, and then you see that, thanks to the variable transformation, you get the factor 1 over x, which you then have to carry into the normalization. Do it at home, and then you will immediately understand this.

Okay, this is a plot of the distribution function on a normal scale, where you can notice the asymmetry, the skew. And we can also plot this, which is what we usually do, on a logarithmic scale. How should the log-normal distribution look on a logarithmic scale of x? Like a normal distribution, right? This is how we recognize that it is a log-normal distribution: we plot the x scale logarithmically, and then we see something symmetric, like in this case. That's the first indication of a log-normal distribution.

All right, so why do we talk about the log-normal distribution here? Let me just check with you: I gave you one paper to read about the importance of the log-normal. Where is it? Okay, yeah, here. A very interesting paper on the log-normal distribution.
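The at-home normalization check suggested above can be sketched numerically: the log-normal density is the normal density evaluated at ln x, divided by x, and only with that 1/x Jacobian factor does it integrate to one over (0, infinity). A Python sketch with arbitrarily chosen parameter values:

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def lognormal_pdf(x, mu, sigma):
    # Change of variables y = ln x brings in the Jacobian factor 1/x.
    return normal_pdf(math.log(x), mu, sigma) / x

mu, sigma = 0.5, 0.8        # arbitrary parameter values
dx = 0.001
# Riemann sum over (0, 100); the tail beyond 100 is negligible for these parameters.
area = sum(lognormal_pdf(i * dx, mu, sigma) * dx for i in range(1, 100000))
print(area)                 # very close to 1: the 1/x factor normalizes the density
```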
I mentioned it in the notes: it's a short paper that you can read to find out why the log-normal distribution is so important. It gives you a whole zoo of applications of the log-normal distribution, and many of the variables that are listed there as following a log-normal distribution you would probably not have expected to. It's an interesting thing to read, and there's no math in it; it's simply a bit of an overview. We have put it in our literature folder; you just download it, and there's a table that shows you all these variables, very interesting. There were many of these examples.

So now we have to characterize our distribution. But first of all, would you please read this little piece, to find out about the importance of the log-normal distribution. Geology, medicine, physics, biology: everywhere the log-normal distribution appears. So what you learn here about the log-normal distribution is not just specific to economic data; it can be applied anywhere.

Now we are left with the task to characterize the log-normal distribution, and remember, we want to characterize it by these two parameters, mu and sigma square. So we have to calculate the mu and the sigma square. How do we do this? You got the point already: with the maximum likelihood estimate. So why am I not showing you the derivation here? Because it's a bit more involved, a bit messier, and the normal distribution had the advantage that you already knew what the result is; here you don't know it. Therefore I skipped it. We do the same thing as we did before for the normal distribution, now with the log-normal distribution, and then we see that these are our estimators here. You may not find it a big deal: you see, okay, the mu is now the average of the log of the x, and the sigma square also depends on the log of the x. So that seems somehow trivial.
It is not, because what you see here is no longer the arithmetic mean of x; this is what you have to recognize. It is the average of the log of the x. And this maximizes our likelihood: if we ask what mu and sigma maximize the likelihood, under the assumption that the data follows the log-normal distribution function, this is the expression.

Okay. So now we have a mu that refers to the normal distribution, and we have a mu that refers to the log-normal distribution. How are these two related? Maybe you already calculated the arithmetic mean of the x; so what is the relation between the mu for the log-normal and the mu of the normal distribution function? Here we have printed for you this non-trivial relation between the variables characterizing the normal distribution and the variables characterizing the log-normal distribution, simply for you to see that there is a difference.

And here we have plotted the log-normal distribution for various values of sigma, and probably also of mu, but I cannot see it here. Then you see how the shape of the distribution changes if I have different values here. What you have to recognize is that the sigma has a very important influence, which is clear, because the log-normal distribution is stretched very far out to the right side, as you mentioned. There's also a nice note that I copied from Wikipedia: the term expected value can be misleading; it must not be confused with the most probable value. That's something you can read at home.

Okay, now again comparing the log-normal and the normal distribution: what makes the normal distribution so popular?
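Before moving on, both points just made can be checked on simulated data with known parameters, here in Python (the course uses R; the parameter values are invented). The log-normal estimators are the ordinary formulas applied to ln x, and the arithmetic mean of x itself is connected to them by one of the non-trivial relations mentioned above, E[X] = exp(mu + sigma^2 / 2), which is larger than exp(mu):

```python
import math
import random

random.seed(1)
# Simulated log-normal data with known parameters (values chosen arbitrarily):
mu_true, sigma_true = 1.0, 0.5
data = [math.exp(random.gauss(mu_true, sigma_true)) for _ in range(50000)]

# Maximum likelihood for the log-normal: the ordinary formulas applied to ln x.
logs = [math.log(x) for x in data]
mu_hat = sum(logs) / len(logs)
sigma2_hat = sum((y - mu_hat) ** 2 for y in logs) / len(logs)
print(mu_hat, sigma2_hat)       # close to 1.0 and 0.25

# Non-trivial relation to the arithmetic mean of x itself:
# E[X] = exp(mu + sigma^2 / 2), not exp(mu).
mean_x = sum(data) / len(data)
print(math.exp(mu_hat + sigma2_hat / 2), mean_x)   # the two should nearly agree
```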
First of all, we see it in the living world; we can really observe it. Most features related to the human body are normally distributed, for example the body height, or the shoe size; all these things are normally distributed. That means we observe it. If we observe outliers, extreme values, then we have a conceptual problem: to decide whether this is really an outlier, or whether it belongs to the distribution, and the distribution is just another one, not a normal distribution. That is the problem that prevented the log-normal distribution from being recognized for a very long time, because in previous times people automatically assumed that the most common distribution is the normal distribution, and then everything that didn't fit into the normal distribution was considered an outlier. Now we know that the power law and the log-normal distribution are as frequently observed as the normal distribution. That means that when you observe these outliers to the normal distribution, they can be values that fit well into a log-normal or power-law distribution. That is the important thing.

In order to determine whether outliers belong to the distribution or not, you need sufficient statistics, and in most cases one didn't have these statistics before. That's an important issue: it means you did not have enough evidence to tell that this particular value that you observed also belongs to another distribution, namely to a log-normal distribution.
Therefore all these things were dropped. But now we have this abundance of data; we have much more data than we ever had in history, and we can now test other distributions with a higher significance. The most noticeable example is from the stock market, for example.

No, you cannot easily answer this, because you can only estimate this frequency from a limited sample size, as you will see. Say I give you 10,000 data points, and I hope that you see a log-normal distribution there. Conclusion: 10,000 data points are sufficient. But that's not the point: this example was selected in a way that extreme values are nicely represented. In other cases extreme values are not so nicely represented, and then of course the statistics get worse at the end where the distribution is drawn out. It depends a bit on what you are talking about in terms of the example; that's an important issue. There is no general answer to what a good data size for the statistics is. It really depends on the sigma in particular, that's what I told you before, and the sigma is different for most of these examples that we talked about. You can really go to this little paper I mentioned, because they have printed what the sigma is, and you see that the sigma in some cases is really large. That means we need a large data set in order to find that this is a log-normal distribution; in other cases it's not that difficult.

Okay, so now we are left with our self-study number two. We give you data, and you need to calculate the skewness of the data. I already gave you the example of how to do this; every single line of code was given there. Once you have R installed and started, you should be able to rewrite this and get the skewness of the data.
The second challenge is that you should plot your data, in order to see whether it is drawn out to the left or to the right, or whether it is symmetric. And then you have to use the maximum likelihood estimates to get the parameters that you need. The solution is already given to you, but you need to calculate it yourself in order to understand how this works; I gave an example in detail during the lecture. So this is the self-study task; again, it lists what you have to learn and what you do not have to learn.

The last point is a longer discussion, basically, because in some cases there is a 1 over n when we talk about the variance, and in some cases there is a 1 over n minus 1. Where is the difference coming from? That's what this difference is about. These are all mathematical discussions; in practical circumstances you probably don't care, and that's also correct. But it could be that you are interested in whether we made a mistake in typing these equations, or whether there is a deeper reason; and this point addresses the deeper reason.

Okay, and these are the questions that you please answer, to recapitulate the content of this lecture. You have to understand the difference between the probability distribution function and the cumulative distribution function; that's the first thing. Then you have to calculate, exactly the way I described in the lecture, the probability of a given shoe size, 20, 45, and 50 here. You should also recall the equation for calculating the skewness. It's very easy: mu_3, the third central moment, divided by the sigma to the power of three.
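The 1 over n versus 1 over n minus 1 point mentioned above can be seen in a small simulation, sketched here in Python with an invented setup: averaged over many small samples from a distribution with true variance 1, the 1/n (maximum likelihood) estimator systematically comes out low by the factor (n-1)/n, while the 1/(n-1) version does not.

```python
import random

random.seed(42)
n, reps = 5, 20000
sum_biased = sum_unbiased = 0.0
for _ in range(reps):
    s = [random.gauss(0, 1) for _ in range(n)]   # sample from N(0, 1), true variance 1
    m = sum(s) / n
    ss = sum((x - m) ** 2 for x in s)
    sum_biased += ss / n          # maximum likelihood (1/n) estimator
    sum_unbiased += ss / (n - 1)  # unbiased (1/(n-1)) estimator

print(sum_biased / reps)     # about 0.8 = (n-1)/n: systematically too small
print(sum_unbiased / reps)   # about 1.0: unbiased
```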
That skewness equation you will probably derive yourself, but it's something you should remember. And then you explain what the underlying mechanism of the maximum likelihood estimation is, and why we first need an assumption about the distribution before we can calculate the proper mean and variance. That's the idea. Next week we will talk about the opposite problem, namely: how can we check whether a distribution is a normal distribution, or a log-normal distribution, or something else? That's the issue of next week. With this, I thank you for your attention and remind you of the exercise that happens today at five o'clock. Thanks.