So, good morning everyone. I welcome you to lecture number three of our course, Collective Dynamics of Firms. Let me just repeat what we did in the last two lectures. In the beginning we talked about the data we have available and about the software we want to use to analyze the data. In the last lecture we introduced the meaning of distributions; we talked about continuous versus discrete distributions, which is something we use today to calculate a few parameters of the distributions. Then, after talking about the normal distribution, we talked about skew distributions, because these are the distributions that are relevant for this lecture, and we talked about how to estimate the parameters of a distribution. You probably recall what the two important parameters were. Yes: mean and variance, correct. And we needed an expression that relates them to the data, because the assumption is not that the professor writes down what the mu or the sigma is; you calculate them from the data.

Today we first introduce another kind of skew distribution. Last week we talked about the log-normal distribution; today we talk about the power law, and then we try to apply this to the firm data, as promised in the beginning, by noting some so-called stylized facts. These are statistical regularities that we see in the data. We talk about two stylized facts today: one regarding the firm size distribution and one regarding the growth rate distribution. We also talk about a specific method to detect deviations from these distributions.

Let me introduce, to begin with, the second candidate for a skew distribution. Remember, last week we spent a lot of time introducing the log-normal distribution; the second candidate, as I already mentioned, is the power law. We characterize the power law first by its probability density and secondly by its cumulative distribution. As I mentioned in the last lecture, the probability density is given by an expression of the form one over x to the power alpha, that is, p(x) proportional to x^(-alpha). It is called a power law because alpha is the power, the exponent. You see that you cannot really apply this to very small values, values close to zero, because then the whole thing diverges. The minimum value, if you think of the firm size distribution and you measure firm size in number of employees, is certainly one, so we shouldn't have a problem there. So if we measure x in terms of employees, the lower cutoff x_min, if you write it like this, should be larger than zero, and we first have to determine the normalization constant. How do we do this? We integrate over the whole distribution and set this area to one in order to calculate the constant C. That is the normalization.
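As a compact reference, here is the normalization step just described, written out as a sketch (using x_min for the lower cutoff and alpha for the exponent):

```latex
p(x) = C\,x^{-\alpha}, \qquad x \ge x_{\min} > 0,\ \alpha > 1
\quad\Rightarrow\quad
1 = \int_{x_{\min}}^{\infty} C\,x^{-\alpha}\,dx
  = \frac{C}{\alpha-1}\,x_{\min}^{\,1-\alpha}
\quad\Rightarrow\quad
C = (\alpha-1)\,x_{\min}^{\,\alpha-1}
```

So the normalized density is p(x) = ((alpha-1)/x_min) (x/x_min)^(-alpha), and taking the logarithm gives ln p(x) = -alpha ln x + const, a straight line with slope -alpha in a log-log plot.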
If we carry out this integral, which everyone is certainly able to do, we get the normalization constant C, and the most important point is that it depends on the minimum value x_min at which the power law starts. Some people kindly ignore the x_min, but you cannot ignore it, as we will see later in the examples. First, you need it to define the normalization, and second, you will see that many data sets do not display power-law behavior over the whole range of x: they may start out as something different and only have a power-law tail. Wherever we decide the power-law tail starts, that defines the x_min.

With this normalization constant we get the full expression for the density, and from it an important restriction on alpha already follows: alpha has to be larger than one. That is very important; an alpha less than one does not make any sense here, because the distribution could not be normalized.

If we take the logarithm of both sides of this expression, the alpha, which is the interesting parameter, appears in front of the logarithm. So in fact we have a linear function, where the slope gives us the alpha; since there is a minus alpha and alpha is positive, the slope is negative. That is what you see in the figure: in a log-log plot, which is what the last line of the previous slide refers to, the power law is a straight line, and the negative slope indicates the alpha.

The real meaning of the power law becomes more obvious if you look at a non-logarithmic scale. Let's assume you have ten thousand observations from a data set, and most of them lie between zero and one, but occasionally there are one, two, or three values around three, four, or five. Then you might say: well, these can be outliers, who knows what happened on that particular day, maybe there was a sunspot that caused these irregularities. If your null hypothesis is the normal distribution, the Gaussian, you would clearly classify these values as outliers, because they have a very low probability of occurring, and you may be inclined to drop them. But, and that is the important message here, they may belong to the same distribution, namely to such a power law distribution; it is just your null hypothesis, that everything should be close to the values between zero and one, that is wrong. You understand what I mean: an outlier is only an outlier with respect to what you think the underlying distribution is. If you think it is a normal distribution, then it is an outlier. Therefore the best thing is to test other hypotheses, for example skew distributions, in order to find out whether this is really an outlier or whether it sits on such a distribution with a very, very low probability. A probability of ten to the minus six means that among a million observations, one of them is this one.
These are really rare events, but it is useful to test for them. Okay, what is the meaning of a power law distribution? We usually talk about these as scale-free distributions. What do we mean by this? You can already see from the distribution that there is no characteristic scale, as opposed to, for example, the normal distribution. The normal distribution does have a characteristic scale, which is approximately given by the mean: you can describe all phenomena more or less on a scale that relates to the mu. In a power law you do not see such a characteristic scale. All possible observations have a certain probability that fits nicely on this distribution, but there is no characteristic scale.

Here is an example from computer science: you see the same relation between occurrences of events independent of the scale on which you measure them. Take the files on your computer. There are very small files, like two-kilobyte files. Well, no one has two-kilobyte files any more, right? I get emails where people do not mind sending me ten-megabyte attachments; they just click on something and do not even recognize that it is a big file for the receiver. Two kilobytes was maybe twenty years ago. But it is interesting that there is a certain regularity in the occurrence of these file sizes: in this particular case the most common files are the one-kilobyte files, and two-kilobyte files occur with only a quarter of that probability. So there is a fixed relation between finding one-kilobyte files and two-kilobyte files. Scale-free means that the same relation between one kilobyte and two kilobytes also holds between one megabyte and two megabytes, and between one gigabyte and two gigabytes, and so on. No matter on what scale I measure the phenomenon, I always see the same relation, and this is exactly what is expressed in this linear relationship: if I put 500 in relation to 50, it is the same as if I put 50 in relation to 5, and so on.

This is also called self-similarity, in some respect. Think of the example of a fractal. A fractal is a structure that you see, for example, in a pattern with a dendritic shape, a structure with arms; I do not have a picture at hand right now, maybe we should add one. If you zoom into one of the arms of this dendrite, you see the same structure again, a smaller dendrite; and if you zoom into the small dendrite, you see the same thing again. No matter at what level you look into the fractal, you always see the same kind of pattern, in a statistical sense. That is the meaning of self-similarity: you can zoom in and out and it always looks the same, and that is the meaning of a fractal. Do we have this in real life? If you think of your body, for example, is this the case? Yes? Why?
Mm-hmm. That is true, but the question is whether this is self-similar on all scales. If I zoom into you, then on the nanometer level I see something different from the millimeter level, and from the centimeter level, and so on. So we should not expect self-similarity all the time. The classical example for this is the coastline of Britain. You measure it with, say, a yard stick and you get a number; then you measure it not with a yard stick but with an inch scale and you get a different, larger number. The measured length of the coastline of Britain depends heavily on the scale you use, but the relation between scales is the same. That is the interesting thing, and that is the meaning of self-similarity and of a fractal. We can describe this mathematically by extracting a prefactor that characterizes the scale: the distribution f(x) itself is always the same, but there is a scale parameter, a scaling function g(b), that basically describes on what scale we measure the thing, kilobytes or megabytes, inches or yards, and so on. The conclusion I want you to draw from the example of the body, which does not display a fractal structure on all scales, is that most phenomena are not self-similar; only some are. We have to test this hypothesis really carefully.

Okay, let me talk about the moments of the distribution. Remember, we want to describe the distribution by the mean and by the variance. Let me put my clock here. The mean is calculated as the first moment of the distribution, that is, x times f(x) integrated over the whole range of values, which starts at x_min and goes to infinity. Remember that we already calculated the normalization constant on the previous slide, and using it we get the mean, (alpha minus one) divided by (alpha minus two), times x_min. From here you recognize a second important restriction on alpha: the mean value is only defined if alpha is larger than two. And there are cases where alpha is not larger than two. Solar flares, for example, these outbursts on the Sun; or wars: if you measure the size of human conflicts in terms of casualties, the number of people who die, then you see power-law behavior, but the alpha is less than two. It is still a power law, but it does not have a mean. So alpha has to be larger than one, that is what we found out from the normalization, but the mean is only defined if alpha is larger than two. I come back to the meaning of this on another slide in a moment.

What does it mean if a distribution does not have a mean? Let's discuss this. Assume I give you ten thousand data points. Of course you can calculate a mean, because I just gave you ten thousand data points; the mean can be calculated from the sample, and you get a value. Now I give you a million data points, so you would say you have better statistics. Going from ten thousand to a million, what is the impact on the mean? It will be very different.
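To illustrate this point numerically, here is a small R sketch (my own code, not from the lecture): it draws samples from a power law by inverse-transform sampling and compares the sample mean after ten thousand and after a million observations. The function name rpower and the chosen exponents are assumptions for illustration.

```r
# Inverse-transform sampling from a power law with lower cutoff x_min:
# if u ~ Uniform(0,1), then x = x_min * u^(-1/(alpha-1)) has density ~ x^(-alpha).
rpower <- function(n, alpha, x_min = 1) x_min * runif(n)^(-1 / (alpha - 1))

set.seed(1)
for (alpha in c(1.5, 2.5)) {
  x <- rpower(1e6, alpha)
  cat(sprintf("alpha = %.1f: mean of first 1e4 = %10.2f, mean of all 1e6 = %10.2f\n",
              alpha, mean(x[1:1e4]), mean(x)))
}
# For alpha = 2.5 both values are close to the theoretical mean
# (alpha-1)/(alpha-2) * x_min = 3; for alpha = 1.5 the sample mean keeps
# growing with the sample size, because the distribution has no mean.
```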
That is exactly the point: the more data you get, the more the value of the mean you compute can change, depending on the size of your data set. In a practical circumstance you do not notice that you cannot calculate the mean, because for a finite sample you can always compute it; but this value changes as you go to larger or smaller data sets and does not converge. That is the important message.

Then we can also calculate the second moment, the mean square, in the same manner, and with it the variance. The variance is only defined if the alpha is larger than three. So for an alpha between two and three you have a mean but no variance, and for an alpha below two you have neither. To summarize: we have a power law, and alpha has to be at least one for the normalization; if it is below two, I do not have a mean; and if it is below three, I do not have a variance. What does it mean not to have a variance? It is the same discussion: if I give you a data set, you can calculate something and declare that this is the sigma, but this sigma changes heavily with the data set; it does not converge as you add more data.

So what kind of data follow a power law? Here we took one example, one of the earliest ever detected: city sizes. The city size is measured in terms of population, and in a normal plot the distribution looks like this, so you would probably ignore the small fraction of cities that have ten to the five, ten to the six, even ten to the seven inhabitants. But if you look on a log-log scale, you see that it fits a power law perfectly. And if you simply ignore these very small probabilities, then you miss a city like New York. That is the consequence. Think about this next time before you discard some of the values as outliers: maybe it is something like New York that you are throwing away at that moment.

There are a number of common sayings that refer to this scale-free or power-law behavior; I picked a few. "80 percent of the people only do 20 percent of the work." Well, it is the other way around: 20 percent of the people do 80 percent of the work. That is a consequence that holds almost everywhere: only a few people really feel responsible, and the rest feel responsible to a lesser degree. There are similar statements used when we talk about systems engineering: only 20 percent of the problems cost us 80 percent of the time, and 80 percent of the problems cost only 20 percent of the time. That means the underlying distribution is heavily skewed.

Okay, let us look at one important value here: the median. The median divides the underlying distribution into two halves; x_{1/2} is the value that splits the distribution so that half of the probability mass lies on each side. It is simply defined by taking the whole distribution and requiring that the part above this value is one half.
So then, if you know the F as we do, you get a firm expression for the median; of course the x_min appears in it again, that is clear. Now we would like to take one example, the wealth distribution, and look at how much wealth is concentrated in each of these two halves of the distribution. If you think about this, you might easily argue: because the median divides the distribution into two halves, one half of the wealth should be on one side and the other half on the other side. What is wrong with this argument? Who wants to argue about this? Yes, Alex, can you say it loudly for everyone? Right: because we integrate, we are comparing the two areas. That is the important point: the median is the value that divides the distribution into two equal areas, two equal probabilities, but in one of the halves there can be much more mass than in the other, in particular if you have a skew distribution. What would the situation be for the normal distribution? There it really is the same on both sides, and that is exactly the mental trap: you think it is always the same, a richer half and a poorer half with equal amounts of wealth, and that is wrong.

If you calculate this with the empirically determined alpha, which is about 2.1, the consequence is that 94 percent of the wealth is concentrated in the richer half and only 6 percent of the wealth is in the poorer half. We are talking about the areas here; that is something you have to know. That is also the underlying argument of Occupy Wall Street and similar movements: take all the people in the US, form two groups, the richer half and the poorer half, and this inequality appears; people are not satisfied with it. Why should half of the people own only 6 percent of the wealth, while the other half owns 94 percent?

You notice that from this empirical data the alpha is larger than two. Why was that important? Right: otherwise the mean would not even be defined. There are other examples of large inequalities; we have a special lecture on inequality, lecture number seven or eight or so, so I will talk about this again. But you understand: whenever you see a skew distribution, there are huge inequalities underlying it. Whenever you see a skew distribution and you calculate the median, you find something like 94 versus 6. This is true for the power law; for the log-normal it is not that extreme, but it is still a lot.

Let me move on to some other examples of power laws. The first thing one has to notice is that wealth is not power-law distributed; this was just an exercise for us.
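For completeness, here is a sketch of the little calculation behind that 94 versus 6 exercise, assuming a pure power law above x_min (not copied from the slides):

```latex
% median of a power law with lower cutoff x_min:
\int_{x_{1/2}}^{\infty} p(x)\,dx = \tfrac12
\;\Rightarrow\;
\Bigl(\frac{x_{1/2}}{x_{\min}}\Bigr)^{-(\alpha-1)} = \tfrac12
\;\Rightarrow\;
x_{1/2} = 2^{\,1/(\alpha-1)}\,x_{\min}

% share of total wealth held by the richer half (requires \alpha > 2):
\frac{\int_{x_{1/2}}^{\infty} x\,p(x)\,dx}{\int_{x_{\min}}^{\infty} x\,p(x)\,dx}
 = \Bigl(\frac{x_{1/2}}{x_{\min}}\Bigr)^{-(\alpha-2)}
 = 2^{-(\alpha-2)/(\alpha-1)}
 \approx 0.94 \quad\text{for }\alpha = 2.1
```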
So, to repeat: the wealth distribution is not a power law. If you look at the total wealth distribution, only the wealth of the richest Americans follows a power law; the wealth of the poorer Americans, as we see on this slide, is more or less a Boltzmann distribution, an exponential distribution as you may call it. That is important to see. If you have the data, you have a long tail, I have to show it this way, and then you have something curved here. The null hypothesis that this has to be one and the same distribution leads people to put all their effort into fitting the curved part into the power law as well, and people do this. In fact, the first part is described by a different distribution, and the two distributions just merge at a particular point; from there on it is a power law. That is the message: you should not assume that the same distribution fits on all scales. That is why I talked so much about x_min; x_min is a very important variable here.

Okay, you can read the rest yourself, but word frequency was the first famous example, studied empirically by Zipf. On one of these slides I wrote two notes about his work; you should check it out, it is really famous and really interdisciplinary work, one has to say.

Here you see the number of citations that papers receive. That is an important empirical fact if you want to predict the probability that your paper receives more than one citation, more than ten, more than a hundred, more than a thousand, and so on; all these probabilities, which you can test empirically, lie on the same distribution. The probability that your paper receives ten thousand citations is not zero; that is the important message. It is a non-negligible probability, but it is rather small, and the question is whether it occurs for your paper or for the paper of someone else.

All right. It is very important to also think of all the examples where we do not find a power law, because the power law has become such a popular mindset that people think they need to fit everything to a power law. As editor-in-chief I see lots of papers submitted to my journals, and most of them do something like this; the authors cannot imagine that other distributions may be just as good candidates as the power law. Any questions about this?

So now we have these two candidates: on one end the log-normal distribution that we discussed a week ago, and on the other end the power law that we just discussed. These are two skew distributions, but how do they compare to each other? To discuss this a bit further, Natalia has prepared these two nice slides for you. You already know what the log-normal distribution is: it is a normal distribution, as you see here, but for the logarithm of the variable, not for x itself. So this is a normal distribution, but instead of x we have ln x. Now we would like to know what the distribution of x itself is, and for that we have to do the variable transformation that I discussed last week and recommended you to practice. To get from ln x to x I have to take into account the derivative of ln x with respect to x, which gives me the factor one over x; last week we had the discussion about where this one over x comes from.
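As a reminder of that transformation, here it is written out as a sketch (mu and sigma are the parameters of the normal distribution of ln x):

```latex
% change of variables from z = \ln x to x:  p(x) = p(z)\,\bigl|dz/dx\bigr| = p(\ln x)\,\tfrac{1}{x}
p(x) \;=\; \frac{1}{x}\,\frac{1}{\sqrt{2\pi}\,\sigma}\,
\exp\!\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)
```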
Note that this is not an equality but a proportionality, because we dropped the normalization constant. Now let's take the logarithm of this. Why do we take the logarithm? Because with the power law we also always work in a log-log plot instead of a normal plot, so to compare the two we take the logarithm, and we get something like this. If you call z = ln x, you see that it is a quadratic equation: ln p(x) = -z^2/(2 sigma^2) + (mu/sigma^2 - 1) z + const, so minus z squared times something, plus something times z, plus a constant. In a log-log plot the log-normal therefore appears as a quadratic curve, while the power law appears as a linear function. What is the shape of the log-normal distribution in a log-log plot? Who is following? I said it thirty seconds ago: it is a quadratic curve. So we have a straight line and a quadratic curve, and for comparison, the usual exponential function is even further away from a straight line in such a plot.

From this you may assume: well, I can always distinguish between a log-normal distribution and a power law, because one is quadratic and the other is linear, I could never mix them up. But the problem is the following: this plot covers four orders of magnitude. What happens if you do not have four orders of magnitude, if you just have a little piece of this distribution? If you look at a narrow window, for example here, it is extremely difficult to distinguish between a power law and a log-normal distribution. The consequence: you can only talk about a power law if you have several orders of magnitude, not one. I see many papers where people plot a "power law" over one order of magnitude; you understand that this is complete nonsense, it can be anything. That is the mindset: people expect a power law, so they plot it from one to ten and call it a power law, but it could be anything. That is a very important point: if you only have pieces of the distribution, you have no chance of distinguishing between these two curves. And another point, which we come to in one of the upcoming lectures: if you look at time-dependent behavior, it becomes even more difficult, because there are firm relations between the two.

With this, I think we have introduced our two candidates for skew distributions fairly well, the log-normal and the power law, and we have even seen how they relate to each other in this little plot. Basically, we would argue that if the data is sufficient, we can probably tell them apart; at least that is the suggestion from the previous slide. Now we go to the real data and look at what we find, and I present what we find in terms of stylized facts. So what is a stylized fact? We use a definition given by the famous economist Nicholas Kaldor from Cambridge, and I will read it out loud for you. I even recommend that you learn it by heart, because what I see in the exam is that people come up with their own version of what they think a stylized fact is.
First of all, a stylized fact is a stable pattern that emerges from many different sources of empirical data. It is not a pattern that I see in one data set; that is the important message here. If I have one data set and I see a power law, I cannot call this a stylized fact. If I see it in various different data sets, firm size data from Japan, from the US, from Europe and so on, then I can claim it is a stylized fact. You get the point; that is very important. And it has to be a stable pattern: if I take a snapshot of 1960 and I see a power law, I cannot tell whether this is a transient distribution or a stable pattern. That is another important point. So, as Kaldor says, stylized facts are observations made in so many contexts that they are widely understood to be empirical truths to which theory must fit. The important notion here is: we believe that this repeatedly emerging pattern is true, but we do not have an explanation for it. A stylized fact is a fact, not an explanation of what we see. We observe it, but we have no idea where it comes from.

The conclusion: it is extremely nice if people send me papers where they show me lots of stylized facts, but these are basically empirical plots; they do not help us understand the underlying reason. If you want to do science, you have to deal with the other side of the problem as well. You present the stylized fact, but then you also have to struggle to find out what microscopic interaction dynamics produce this kind of stylized fact. You cannot ignore this; presenting a power law is not a scientific paper, period.

So what do we do with stylized facts? We use them as a reference point when we do modeling. The opposite extreme is as bad as the first one: you see all sorts of computer simulations where people start with free assumptions about how agents should interact and then present the results. Of course that is the result of a computer simulation, but what meaning does it have? How can I know whether the result of my computer simulation is correct? I can only know it by comparing it to some empirical data. But if you have an agent-based model of household income and spending, you will not match the outcome of the simulation exactly to an empirical finding; that is not possible. Therefore you compare the stylized outcome of your simulation with the stylized outcome of the empirical observation. That is how you do it, and it is very important to understand. If you want to know whether you are right or wrong with your computer simulation or your model, the only way to find out is to link it to stylized facts, because if you point to one particular empirical value, you will in most cases be wrong: household 177 may not have the predicted 1525 francs of income, and you will never match this with a computer simulation; you only match it in a statistical sense. So think a little about the meaning of a stylized fact, because it is important. The next slide states exactly what I already said.
How can we compare models and empirics? Not one by one. The only thing we can do is compare the statistical outcome of the computer simulations with the statistical outcome of the empirical observations. Model validation, again, is only possible on the level of stylized facts. Why is this? It has to do with the type of models we use here. Who attended the course Systems Dynamics and Complexity? One, two, three, a few, okay. There we talked about two model classes: models that are as accurate as possible, the flight-simulator type, and models that demonstrate a stylized fact; that was, I think, the Ising model of ferromagnetism and these kinds of things. Do you remember? What I want to say is that there are other types of models that can indeed be matched to specific experiments; those models are tailor-made to describe a phenomenon so that it exactly meets the empirical outcome. Take a physics example, superconductivity: there are models in physics that predict the critical temperature to the second digit after the comma. They are extremely accurate, and you go to the experiment and find exactly this value; that is a great thing. We are not talking about these kinds of models here. We are talking about models that capture the stylized dynamics of economics. We are hardly able to work on the level of superconductivity, because agents in an economic setting are not electrons and atoms, which have a very narrow distribution of their properties: all the same charge, the same mass, the same type of interaction, and so on. Economic agents are different; therefore we will not have a model that exactly predicts something very specific, and therefore we need to compare in this statistical way.

With this we have a break of ten minutes, and then we talk about firm size and firm growth afterwards.

Let us continue, please, with the second part of the lecture. We talked about stylized facts and about their relation to computer simulations, which is, as I said, an important topic that helps you understand their meaning. Now we come to the stylized facts themselves. Remember that we wanted to look at firm size, a scalar variable x assigned to a firm, and of course we have to discuss how we want to measure size. We already talked a bit about this: we can use different proxies, number of employees, returns, assets, sales, whatever, and we may see quite different behavior with these proxies, so we have to think twice. But thanks to Kaldor we learned that a stylized fact is something we see in very different data sets; that is an important point. If the proxies do not give you the same distribution, then you have to ask how robust the finding is and whether it really is a stylized fact.

Okay, so what do we know about the firm size distribution? That is stylized fact number one, and you should really know each of these stylized facts by number. It says: the firm size distribution is skewed. How does that sound as a stylized fact? It sounds completely disappointing at first. "Yes, Professor, what have you been telling us?"
For two lectures I have been telling you how important it is to get robust patterns in the data and how we compare them with computer simulations, models, and theory, and then I come up with this extremely weak statement, one we could have made even without all that work. Yes, it is disappointing at first, because it is very broad and does not tell us what we really want to know, namely what kind of distribution we have. But it is important that the distribution is skewed: it means your null hypothesis should be the normal distribution, and that null hypothesis is rejected.

As I said before, the pattern is relatively stable. No matter what data we look at, whether from the 19th, the 20th, or the 21st century, we will always see this pattern; that is number one. If we saw it only for the year 1999, we would have to think twice. But the stylized fact also points to the fact that the exact form of the distribution is controversial: with our two candidates, power law and log-normal, we cannot really decide which one is true. That is what we look at now in detail.

This is data analyzed by colleagues of mine from Boston. It covers all publicly traded US manufacturing companies over 20 years. All publicly traded US manufacturing companies, so we are talking about a big data set, because everything that is known about these US firms has been taken into account. "Publicly traded" refers to the stock market. "Manufacturing" refers to, can you recall what the SIC code was? I used it in the very beginning of the course, in lecture number one. It is an industry classification code, exactly, and the SIC codes from 2000 to 3999 refer to manufacturing. So you can go back to this classification and look it up; that is the important thing. We are also not talking about the firm we looked at in lecture one, because it had a SIC code very different from this range; it was in management and business services, not in manufacturing. Recall that. So this is very important: we are not talking about every firm on earth, we are talking about firms in the manufacturing sector, firms that produce something; we do not talk about financial services here. That is why the SIC number matters. Then we are talking about 20 years, and, as we can see here, the size of these companies is proxied not by employees, as we usually do, but by sales. So we have the sales data of all these companies.

So what do we see here? What is this? Can you explain it? Hello, everyone here. What is the most recognizable feature of this picture? It looks symmetric, right; it looks like a parabola, correct; it looks like a Gaussian distribution. That is the most recognizable feature, and it is different from all the linear functions I showed just two slides ago. And then I look at the axis and say: oh well, that is a log scale. So what kind of distribution do I have here?
It is an almost perfect normal distribution, not of x but of the logarithm of x: if I take the logarithm of x, it is a perfect normal distribution. That is an interesting thing, and it means this data clearly suggests a log-normal distribution. Very important point.

What is remarkable is that this log-normal distribution is stable over 20 years. In 20 years a lot of things happen. After all, they measure size in terms of sales, and the value of the dollar was probably changing all the time; there was inflation and all this kind of stuff. So how can I compare sales values of 1993 with sales values of 1974? That is the next issue: I have to sit down and carefully rescale the data against some base value, I have to remove what the dollar is doing by itself from these sales numbers. You may think you just plot the numbers they gave you, but you see this picture only after a very, very careful treatment of the data. That is the next important thing.

So what else do you see? Are there noticeable features you would like to address? Mister, what is your name? Olivier, so what do you see? Okay, I am not so sure about the size; what you probably want to say is that the fluctuations on this end are much larger than the fluctuations on that end. This is a very important observation: small firms show quite a bit of dynamics within these 20 years, whereas with big firms we do not really notice it. There can also be large fluctuations there, but thanks to the scale we do not see them: ten to the ten, ten to the eleven, ten to the twelve are huge numbers compared to ten to the three. But we notice that there is larger volatility for smaller firms and less volatility for bigger firms; that is something we have to look at when we later talk about growth rates. The life of smaller firms seems to be more interesting, if you want to put it positively. So this is exactly what I said: the distribution is stable year over year, which is partly due to the fact that the data have been rescaled, so that all the external dynamics, such as the value of the underlying currency, has been removed. The larger volatility of small firms is an important fact that will interest us again later.

Now, what did we say? A stylized fact has to be an observation that occurs in so many different data sets that we believe it to be true. Okay, so let's take another data set. Before, we talked about all publicly traded manufacturing firms in the US; now we are talking about all firms in the US. That is not the same. And then we see something like this, and that is not a log-normal distribution. Just compare the two pictures: this is clearly a power law, and a power law that spans from one to ten to the six, a huge range. So what is the difference between these two plots? The first thing to recognize, even before the details, is that it is not the same observation. I am not allowed to call it a stylized fact that firm sizes follow a power law, and I am also not allowed to call it a stylized fact that they follow a log-normal distribution, because neither is a robust observation across data sets. The only thing I can conclude from these different data sets is: the distribution is skewed. That is why the stylized fact says only that there is a skewed distribution underlying it.
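As a practical aside (not from the slides): a quick first check of the log-normal reading on any size variable is to look at the logarithm of the data. A minimal R sketch, where the vector sales is a placeholder for your own data:

```r
# Placeholder data; replace with the size variable from your own data set.
set.seed(42)
sales <- rlnorm(5000, meanlog = 15, sdlog = 2)

z <- log(sales)
hist(z, breaks = 50, freq = FALSE,
     main = "Histogram of log(size)", xlab = "log(sales)")
curve(dnorm(x, mean = mean(z), sd = sd(z)), add = TRUE, lwd = 2)  # fitted normal
qqnorm(z); qqline(z)  # points on a straight line support the log-normal reading
```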
We are not specific, unfortunately, about the form of the distribution. And why is it that we find one shape in this data set and the other shape in that data set? That is written here. First of all, we have a different measure of size, a different proxy: before it was sales, here we talk about employees, and there are other measures as well, like receipts. Second, this data is only for one year, not for 20 different years as before, though people have looked into other data as well to check the power law. But the most important difference is, I hope, on the next slide. Yes, this one: the underlying data set is a very different one. Before, we talked about the Compustat database, a commercial database that you can buy, and if you look at the firms in it, you see that there is a bias in the data set towards larger firms. The census data, on the other hand, covers all firms, and most firms are not large but small. That means if you under-sample the smaller firms, as the Compustat database does, you see this curved shape in the beginning that points to a log-normal distribution; but if you have better statistics on the small firms, as the US Census Bureau has, then you still see that it is skewed. Sorry, let me go back: we are talking about this region here, the small firms. In the log-normal plot you did not really see it, but this region is also skewed, and if you zoom into the tail, it is more or less a straight line. So the major difference is that these data sets cover either the larger or the smaller firms better. That is the important issue, and that is why in one case you find a log-normal distribution and in the other case you find more or less a power law. We are not talking about the same data; that is very important.

There are other comments I have put up here. If we follow the time evolution, and we have to when we talk about a 20-year period, then we will see that the log-normal distribution has a time evolution that makes it indistinguishable from a power law, at least in the tail. That is not in this lecture; it is in the modeling part, I remember now. So the contradiction you might assume at first, looking at these two completely different distributions, is not really there if you go to long times and focus only on the tail, that is, on the large firms. It is there if you have enough data about the small firms. So there is a mathematical reason why I cannot really tell these two distributions apart, but for the most part it is the differences between the two data sets that account for the different findings, and that is why we were so unspecific about the distribution in the stylized fact.

This slide compares the Compustat data and the census data again. First of all, the census data set is much larger, 5.5 million firms, and then you see that the coverage is different: in Compustat most of the firms are in this range, whereas here most firms are in the smaller size classes. So what is size class zero? What is the meaning of that?
That is a one-man company: the class from zero to one is the one-man company, and the class from one to four contains everything from two to four employees, because of how the boundaries are defined. To be precise, and I got it the wrong way around at first: the lower boundary of a class does not fall into the class, and the upper boundary does.

Okay, so this makes obvious that from different data sets you can expect different outcomes. That means our theory is not as predictive as we would like. We have to find out later, when we talk about modeling, how we can match these two seemingly different distributions within one model that gives us either the one or the other. That means we are now jumping about five lectures ahead, to where we talk about microscopic models: we have to come up with a microscopic interaction model that allows us to obtain either the log-normal distribution or the power law. If we write a microscopic interaction model that only gives us the power law, then we probably miss a lot, because it can only be applied to a very special case, for example large firms as in Compustat. So we need a model that can bridge between these two distributions. That is a task we can already take away from this analysis.

Now let us deal a bit more with the power law that Axtell found in this data set. We would like to know how skewed this power law is, that means: what is the alpha? So we have to calculate the alpha. If you first look at this little graph, this is another power law, but it is the cumulative distribution of a power law; here size is measured not by sales or by employees but by receipts, and the different curves are the different years. As we already discussed a week ago, the nice feature of the power law is that if the underlying distribution is a power law, then the cumulative distribution is also a power law, just with a different exponent. We can calculate the cumulative distribution by integrating over the density, and then instead of the exponent minus alpha we have minus alpha plus one; that is the relation between the underlying distribution and the cumulative distribution, nothing else. There is a special case, and here I have to point out a mistake on the slide: in the cumulative distribution the exponent is written as alpha, but it should be alpha minus one, and the special case is alpha minus one equal to one. Then we talk about the so-called Zipf distribution, where the exponent we find in the cumulative distribution is one, which means the alpha in the underlying density distribution is equal to two.
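Written out as a sketch, using the normalized density from the beginning of the lecture:

```latex
% complementary cumulative distribution of the power law
P(X > x) \;=\; \int_{x}^{\infty} \frac{\alpha-1}{x_{\min}}
\Bigl(\frac{x'}{x_{\min}}\Bigr)^{-\alpha} dx'
\;=\; \Bigl(\frac{x}{x_{\min}}\Bigr)^{-(\alpha-1)},
\qquad x \ge x_{\min}
% Zipf case: \alpha = 2 in the density, exponent \alpha - 1 = 1 in the cumulative.
```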
This special case applies to a lot of phenomena, and I have listed a few here. The frequency of word usage was Zipf's own example: he counted how often certain words appear in written text. But city sizes also follow Zipf's law, the immune system response follows Zipf's law, and so on. The Zipf law, again, is a power law with a special exponent: alpha equal to two in the density, or, in the cumulative distribution, alpha minus one equal to one; that corrects the mistake I mentioned. And here we see it clearly, with a slope equal to one in the cumulative distribution.

Let me repeat this. If you find alpha equal to one in the density distribution, can you recall what I said ten slides ago? Then we have a problem, because for the normalization the alpha should be larger than one, and if the alpha is not larger than two, we have a problem with the definition of the mean, and so on. You should recall these restrictions. Here the alpha in the density is equal to two, and alpha minus one is equal to one.

So now we would like to get the alpha, the most important value, from our data. If we want to repeat the exercise of Mr. Axtell, we have to calculate the alpha from the data. More precisely, we would like an estimator that tells us the best value of alpha, given that we have this underlying distribution, or rather this set of n observed values x_i: what is the probability that these values were generated from a distribution with this particular alpha? I would like the alpha that best describes the underlying data. The notation here, which I used a week ago when we talked about maximum likelihood, means "given": this expression means "given the x_i, which are my observations, what is the alpha?", and that one means "given the alpha, what is the distribution of the x_i?". The assumption here is that the data follow a power law. Remember, when we did our little exercise last week we also had to assume an underlying distribution, a normal distribution at that time, and then we calculated the mu and the sigma that best described our given data set. Here we do a similar exercise and ask: what is the alpha that maximizes the likelihood of our observations? Because if we know the alpha, then we also know the mean and the variance, provided they are defined.

Of course there is a relation between these two probabilities; it is given by Bayes' law: the probability of the alpha given the observed x_i is related to the probability of seeing these x_i given some alpha. In the paper by Mark Newman, which I mention on the next slide, there is an argument that the remaining factor is approximately constant. I mention this so that you do not mix things up: in the likelihood estimation we assume the data are given, and we ask for the expression of the mu and the sigma, or in this case of the alpha, that best fits this data set, under the assumption of a given distribution.

So you know exactly what to do now. In order to get the maximum likelihood estimator, we define the likelihood function, or here the log-likelihood function, as the logarithm of this probability, the probability of the data under a power law with some given alpha. So we take the logarithm of the power law, summed over all observations.
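Written out explicitly as a sketch, using the normalized power-law density with lower cutoff x_min from above:

```latex
\mathcal{L}(\alpha)
 \;=\; \sum_{i=1}^{n} \ln p(x_i \mid \alpha)
 \;=\; n\,\ln(\alpha-1) \;-\; n\,\ln x_{\min}
       \;-\; \alpha \sum_{i=1}^{n} \ln\!\frac{x_i}{x_{\min}}
```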
We then ask: what is the expression for the alpha that maximizes my likelihood function, or in this particular case my log-likelihood function? That means I first take the derivative with respect to alpha and then set it to zero. That is exactly what we did with the normal distribution, remember; it is the same exercise again, and I really want you to understand how we do this with a maximum likelihood estimator. From this we get the value: given a set of data x_i from my observations, and given the hypothesis that these data were sampled from a power law, I should calculate the alpha as one plus n divided by the sum of the logarithms of x_i over x_min. That is the best estimator for the alpha. Yes, please? Yes, yes, that is what people do a lot in practice: they simply fit a straight line to the log-log plot. One can do that, and it is what most people do, but it is not good. I did not prepare special slides for this, but on one slide I mention the paper by Mark Newman, which gives an argument for why we should not do this; if you want to read it, it is on the slide with the power law of the city size distribution. The reason we do not simply take the observations, take the logarithm, and fit a line is that this has very bad statistical properties. What you also notice here is that we can determine the alpha much more significantly from the cumulative distribution than from a binned histogram of the data. If you look at the statistical error, it gets smaller with n, so the uncertainty in the alpha goes down considerably, and you get the best result by working with the cumulative distribution. You might assume you get the same value either way, but that is not true; the cumulative distribution gives the better result.

All right, it is important that you understand how to get the alpha from the data. That is why we also have an exercise where you calculate the exponent of this power law from the data exactly as I have described it here; you now understand where this expression comes from, namely from the maximum likelihood estimate. If you have other distributions, you of course get other expressions; that we already understood.
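As a practical sketch (my own code, not from the lecture), here is the estimator just derived in R, together with the usual approximate error estimate from the Newman paper mentioned above; x and x_min are placeholders for your data and your chosen cutoff:

```r
# Maximum-likelihood estimate of the power-law exponent:
#   alpha_hat = 1 + n / sum(log(x_i / x_min))   for all observations x_i >= x_min,
# with approximate standard error (alpha_hat - 1) / sqrt(n).
estimate_alpha <- function(x, x_min) {
  x <- x[x >= x_min]                # keep only the power-law range
  n <- length(x)
  alpha_hat <- 1 + n / sum(log(x / x_min))
  se <- (alpha_hat - 1) / sqrt(n)   # statistical error shrinks with n
  c(alpha = alpha_hat, se = se, n = n)
}

# Quick check on synthetic data with a known exponent alpha = 2 (the Zipf case):
set.seed(7)
x <- runif(1e5)^(-1)                # inverse-transform sample with x_min = 1
estimate_alpha(x, x_min = 1)        # alpha should come out close to 2
```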
With this I come to a last problem, which I would like to discuss in the last ten minutes. It is about deviations from the underlying distribution. Look at this plot: again the number of companies against size, here proxied by sales, so basically similar data to what we already discussed in the paper of Amaral, where we saw the log-normal distribution. You see a log scale on the size axis and the plot looks pretty symmetric, so I have a log-normal distribution. But there is an issue. Most of the data is at this end, with the small firms, so the statistics are good there. If you go to large sizes, first of all you have fewer data points, and secondly, this range already spans two orders of magnitude, just as that range spans two orders of magnitude. On the small end I am certainly able to tell the difference between 9,950 and 10,000, but on the large end I am no longer able to, because we are talking about huge differences in size. The conclusion is that in the upper tail of the distribution I have difficulties telling whether the data really lie on the log-normal distribution or not. From this graph it looks like they do, and you would say, well, why not? But this has to do with the binning issues and these kinds of things, so for these large firms we have to put a question mark there. There is, however, a technique that allows us to investigate exactly this upper tail and to find out whether it matches the log-normal distribution or not, and it is called the Zipf plot. So Mr. Zipf is very famous in this lecture.

You should not mix this up with the Zipf distribution that we discussed before; I try to make this point quite often, and yes, on that earlier slide it was indeed stated wrongly. To repeat: the Zipf distribution is a particular power law where the alpha is equal to two in the density function, and therefore the exponent, alpha minus one, is equal to one in the cumulative distribution function. The Zipf plot is something different, and I do not know why it is also named after him; we usually call it a rank-frequency plot. Zipf dealt a lot with ranked distributions, of course, the word frequency issue I mentioned, city sizes, and so on.

So what do we do? We have our N observations, and we assume that the distribution is known, which also means that the cumulative distribution is known. It can be a normal distribution, a log-normal distribution, a power law, anything; our assumption is that we somehow did the job of finding out which distribution fits the data. By the way, we do that, I think, next week, when we talk about the Kolmogorov-Smirnov test; that is a way to test whether this is the log-normal distribution or not. So let us assume we know the distribution. Now we do the following: we order the observations so that the first value is the largest one, in descending order. If we talk about firm size, then the largest firm in the sample, probably General Electric or something, appears first. And because it is an ordered set, there is a one, two, three, and so on, and this little index i gives me what I call the rank. Rank one means the largest, rank two means the second largest, rank three means the third largest, and so on.
And rank N means the smallest. In some cases people order things the other way around, from smallest to largest, so you have to keep this in mind; here, rank one refers to the largest firm. So if I take the x_i from this ordered set, the index i tells me exactly what rank I have. Clear? And now there is a nice relation between the x_i and the cumulative distribution function, which we wrote as a capital F in the last lecture. One can show that there is a firm relation between the rank i and the cumulative distribution function. I have not prepared a slide justifying this, maybe we should do it next time; it is described in the paper I mentioned on the previous slide, the paper on Zipf plots and the size distribution of firms, from which we also got the data. If you want to look it up, there is a more detailed explanation there. The relation is that one minus F, where F is the cumulative distribution function, is precisely related to the rank of the observation; or, put the other way around, the rank i is a specific transformation of the cumulative distribution function.

Using this, we can now make the following plot. We take the data set, order it, and plot the logarithm of the rank i against the logarithm of the size x_i; that is what is called the Zipf plot here. Let me go to this example. We plot the same data as before, but now ordered in terms of ranks: log i, the rank, versus log x. Then I get a picture like the one you see here. Now I look at the data; the data is the curve that lies below the other one, unfortunately it is not in color. There are two curves: the data below, and above it the theoretical curve, which I get by assuming that the data follow a log-normal distribution. Again, I plot log i versus log x_i, and I see that the log-normal distribution fits the data nicely as long as I am talking about large ranks. What is a large rank, a rank of 1000? It means the small firms. The small firms are very well described by the log-normal distribution. But what happens if I look at ranks 1 to 10? Then I notice, and please keep in mind that this is a logarithmic scale, a huge difference. That means the firms with the low ranks do not follow the log-normal distribution; there is a considerable difference there. And what are the firms of rank 1 to 10? They are the biggest firms, because I started ordering with the biggest firm. Conclusion: small firms are nicely described by the log-normal distribution, and big firms are not.
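As a practical sketch of this construction (my own code, not from the lecture): order the sizes, plot log rank against log size, and overlay the curve expected under a fitted log-normal. The function name zipf_plot and the synthetic data are assumptions for illustration.

```r
# Zipf (rank-frequency) plot with a log-normal reference curve.
zipf_plot <- function(x) {
  x <- sort(x, decreasing = TRUE)     # rank 1 = largest observation
  n <- length(x)
  rank <- seq_len(n)
  plot(log(x), log(rank), type = "l",
       xlab = "log size", ylab = "log rank", main = "Zipf plot")
  # Theoretical curve: rank i corresponds to n * (1 - F(x_i)),
  # with F the cumulative distribution of the fitted log-normal.
  mu <- mean(log(x)); s <- sd(log(x))
  lines(log(x),
        log(n * plnorm(x, meanlog = mu, sdlog = s, lower.tail = FALSE)),
        lty = 2)
  # Deviations at the smallest ranks (the largest firms) show where the
  # log-normal hypothesis stops describing the data.
}

# Example with synthetic, log-normally distributed sizes:
set.seed(3)
zipf_plot(rlnorm(10000, meanlog = 3, sdlog = 1.5))
```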
Let's go back to the original figure. You could not have told this from that figure, right? You agree? Because there it all looks approximately the same. That means the Zipf plot, as we constructed it here, is a way to project out the differences for the firms with the low ranks, which are the bigger firms, so that you can see whether or not they follow the same underlying distribution. In this case you can clearly say that up to a rank of about 100 the data is no longer described by the log-normal distribution but by something else. We are talking about the biggest firms here: the biggest 100 firms do not follow the log-normal distribution. Should they be larger or smaller to lie on the log-normal curve? What do you see here? They should be larger, right? So in reality they are actually too small. Good.

Let me just finish with this. This is again the theoretical description: we need to calculate the inverse function of the cumulative distribution, and that is what we do here. Usually it is a distribution that is listed in a table, but you can simply calculate it in R in a very simple way; there is an expression for it, and that is the message here.

Unfortunately I was not able to finish everything. I will start next week by talking about stylized fact number two, which is about firm growth; I will continue with this and then try to be a bit more specific.

So what have you learned today? There are two practical things. The first is how to calculate the alpha from the data: if I want to check for power-law behavior, I need to know the alpha, and now I know how to calculate it from the data. The second is that I would like to know whether the large firms, which are the low ranks, follow a given distribution precisely or not, and I found a way, with the Zipf plot, to zoom into this and look at the deviations; it is a specific projection that allows me to see the difference. These are two practical things that you may want to use, irrespective of whether you are interested in firm sizes or not. And you have also learned how we are able to compare data with a simulation or a theory: we can only do this on the statistical level, in terms of stylized facts, and "precisely" here means precisely in the statistical sense. That is the important message of today. Thank you very much for your attention, and I will continue next week with this slide.