Kia is going to present the World Income Inequality Database, so we have about an hour, and then we are hoping to have some discussion about this important database. So Kia, over to you.
Thank you. What I am supposed to do today is talk about the database, the WIID: what it is, and how we have come to where we are. Many of you may not have seen this database before, so we will go over it in a minute, and then we will see where we go from here. The version that is on the website today is old, and we are in the process of revising it. Part of the idea of having a session today is to use the opportunity of all of you being present to brainstorm on how we should go forward with the revision. So we do hope to have your suggestions on what the next version should look like.
Okay, that's the outline. So let's start with what this is about. The WIID, as you can see, is the World Income Inequality Database. The first version came later, but really the seeds of it lie in the project that WIDER undertook in 1997-1999. The question there was: is poverty reduction compatible with rising income inequality? That was led by Professor Cornia, who was the head of the project, and we are honoured and glad to have him here. It's from this that in 2000 version 1 of the WIID came, which was essentially the output of the project, which was done with UNDP at that time. Then in 2005 there was a major revision of this, and I will talk to you about this later. After that the revisions were not major: 2b and 2c. The version that's currently on the website is 2c. It's essentially from 2005 but updated in 2008.
It's now 2013, so there is, as you can see, a need for a revised version. Now in version 2c we have 159 countries with 5,313 Ginis, for some 2,000 country-years, which means that there can be several Ginis for the same country-year, and that's because some of the attributes that we will discuss later are different. Some might be for the entire population and some might be for the rural population; some might be for income, some might be for consumption, and so forth. So these are different Ginis, and in all there are 5,313 of them.
There are also other aspects of the distribution in our database, not just the Gini but also deciles, means and medians, and this second Gini, which I'll talk about in a minute, which is in the version that is on the website. The number of observations for which we have those is not the same, it's much less than the number we have Ginis for, but they're there. Then, besides the distribution data, we also have data on other aspects, in terms of the income sharing unit and so on, and the next slides will talk about that. To make sure that when we are comparing inequality across countries or across time we are not comparing apples with oranges, this description of what these units mean is present, and that was in fact one of the major reasons why this database was created in the first place. We also have quality ratings for the various Ginis that we have, and we give a lot of information about that. And finally there's full documentation about the database on the website itself.
So just to give you a sense of where the database is to be found, I'm going to go to the website and put it up. This is the UNU-WIDER website, and this is what the database page looks like. There's documentation here, and the database is right here.
So this is in the form of an Excel sheet, and if you were to open it, the various variables that we have, and we'll talk about this in a minute, are country, country 3 and so forth; there's a Gini and there's a reported Gini (the Gini is the second Gini that I was talking about), mean x-y and so on. We will talk about these various variables. So this is how the data is available on the website. It's an Excel sheet, and it has all the details about all the other aspects of the Ginis and the distribution data that we have.
If we go back to where we were, then, before I go further about the other things in the database: the second Gini. It is computed using the method of Shorrocks and Wan, formerly at WIDER. Since Lorenz coordinates are grouped data, essentially giving you, for example, the percentage share of each quintile, what they do is ungroup the data. They have an algorithm which allows them to ungroup the data, and then, using the ungrouped data, they recalculate the Gini. The paper is from 2008, I think it was research paper 2008/16, and it is on our website. What this method does is allow you to arrive at a Gini as if you had ungrouped data, as if you had many more data points than these 10 points or 8 points or 4 points. How it does it is that it first fits a lognormal distribution to the various Lorenz coordinates. Then, from the fit, it takes, let's say, 100 sample points as if they were drawn from that lognormal distribution, and the algorithm adjusts these sample values so that the means of these sample values are the same as in the original Lorenz coordinates data.
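The two steps described above (fit a lognormal to the Lorenz coordinates, then adjust the synthetic sample so that the group means are preserved) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it assumes equal-sized groups such as quintiles, fits the single lognormal parameter by a plain grid search, and replaces the iterative adjustment step with a one-shot proportional rescaling within each group. All function names and the example shares are made up for the illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def lognormal_lorenz(p, sigma):
    """Lorenz curve of a lognormal distribution: L(p) = Phi(Phi^-1(p) - sigma)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p) - sigma)


def fit_sigma(shares):
    """Grid-search the sigma whose Lorenz curve best matches the grouped data."""
    k = len(shares)
    targets, cum = [], 0.0
    for i, share in enumerate(shares[:-1], start=1):
        cum += share
        targets.append((i / k, cum))  # interior Lorenz coordinates only

    def sse(sigma):
        return sum((lognormal_lorenz(p, sigma) - L) ** 2 for p, L in targets)

    return min((j / 1000 for j in range(1, 3001)), key=sse)


def ungroup(shares, n_per_group=200):
    """Synthesize a sample with mean 1 whose group shares equal the reported ones."""
    k = len(shares)
    n = k * n_per_group
    sigma = fit_sigma(shares)
    # Step 1: evenly spaced quantiles of the fitted lognormal (already sorted).
    raw = [math.exp(sigma * STD_NORMAL.inv_cdf((i + 0.5) / n)) for i in range(n)]
    # Step 2 (simplified): rescale each group so its total matches the reported share.
    sample = []
    for g in range(k):
        block = raw[g * n_per_group:(g + 1) * n_per_group]
        factor = shares[g] * n / sum(block)
        sample.extend(x * factor for x in block)
    return sample


def gini(values):
    """Gini coefficient of individual (ungrouped) values."""
    xs = sorted(values)  # rescaling can reorder values near group boundaries
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def grouped_gini(shares):
    """Trapezoid-rule Gini from equal-sized group shares; ignores within-group inequality."""
    k = len(shares)
    cum, area = 0.0, 0.0
    for share in shares:
        area += (2 * cum + share) / (2 * k)
        cum += share
    return 1 - 2 * area


quintile_shares = [0.05, 0.10, 0.15, 0.22, 0.48]  # made-up shares that sum to 1
synthetic = ungroup(quintile_shares)
print(round(grouped_gini(quintile_shares), 3), round(gini(synthetic), 3))
```

The grouped figure treats everyone inside a quintile as identical, so it can only understate inequality; the synthetic sample adds back an estimate of within-group dispersion while keeping each group's share unchanged.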
So there is an ungrouping process, and this ungrouping process preserves the means, which means that we don't come to new means. The first step doesn't give you that, the fitting doesn't give you that; it is the second step, which really involves some iteration with the sample values, that allows them to arrive at new sample values which give you the same means. The Gini is then computed using these new sample values. Now, if you look at the paper, they have tried to evaluate whether this is a good method, whether it works or not, and they have done that in a number of different ways. One of them was to start with real data, group them, then ungroup them using this method and find the Gini again, and they found that it's not much different. They have done a variety of things, and the paper gives you more details about it; by and large it's a satisfactory ungrouping solution. Therefore we have a second Gini, which is just called Gini right now in the database on the website, and which is a Gini calculated using this method.
Other aspects of the data that the database presents, other than the distribution: very importantly, whether it's income or consumption; within income, details of whether it's earnings and so on, whether it's post-tax or pre-tax, and whether it includes imputed rent and home production; and of course whether it's the distribution of income or consumption, or sometimes expenditure. Expenditure means, for example, in the case of durable goods, that when the total money spent on the good is counted, then obviously it's the expenditure version; if we're just looking at the use value of the good, that's consumption.
So that detail is there. Then there is the income sharing unit, which is essentially whether the unit within which income is considered to be shared is the household or the person, and there is the unit of analysis too, and then there's the equivalence scale. That means, for example, that if you have household data and you have five people in a household, then obviously the welfare of each person is not necessarily just the household amount divided by five, because there are economies of scale within the household, for example. So there are different ways of making sure that you have a good sense of what a given household income means, for a family of five or for a single person, in terms of the level of welfare, and there are various equivalence scales that are used. When the particular data that we are using uses a particular scale, then that scale is mentioned in our database.
Finally, very importantly, we give you the details of the area coverage and the age coverage: whether it's for all ages, whether it's all, rural or urban, and whether it's the total population or a subset of the population. There is also the currency. The totality of all these is what the database is about. It allows the researcher to compare like with like, rather than combine different kinds of Ginis based on different income definitions, different equivalence scales and different kinds of coverage; the researcher is able to cut out the particular set of numbers that are comparable in doing the research. So that is broadly what the database is about.
Now, one point that I mentioned about the database is the quality rating. The quality rating depends on a few things: what the income concept is, whether it is known what the income concept is, and what the survey quality is. About the income concept: if the underlying income concepts are not known, then it is not as good quality. And if the income concept is not comprehensive, then again it is not good quality.
Similarly, survey quality: the coverage of the survey, the questionnaires, the data collection methodology. Some of the work is in looking at these aspects of the survey on which the Gini numbers are based, on which the shares are based, and an assessment is made as to whether it's good quality or bad. For example, if you have a very long recall period in the case of a consumption survey, it's relatively worse quality than one with a shorter recall period; and between one which involves a single interview and one that involves multiple interviews, especially in poorer countries, the latter is better quality. Based on these, we have one, two, three quality ratings that are assigned to every country-year that we have data on.
About documentation: there are three ways in which the database presents documentation. One is the database itself, which has the details of what the figures mean. The second is the user guide, which is on the website, which we saw, which is freely available and explains all these data concepts that we're talking about. Finally, the structure of the database allows you to give only so much information, and oftentimes you want to know much more than that. To facilitate that, what we do is give, for each country, the details of the surveys, especially the surveys which have resulted in the numbers that we have put on the website. It gives you more details about the sampling, the interviews, the data collection and so on and so forth, and other details that might be available, to give you a full sense of how much value or how much weight you want to put on the numbers in the database.
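To make the "compare like with like" idea concrete, here is a minimal sketch of how a researcher might cut out a comparable subset using the attributes and quality ratings described above. The field names, the category labels and the sample rows are all hypothetical and are not the database's actual variable names or values; the sketch also assumes that a rating of 1 is the best.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    country: str
    year: int
    gini: float
    welfare: str      # "income" or "consumption"
    unit: str         # income sharing unit: "household" or "person"
    equiv_scale: str  # e.g. "per capita", "square root", "none"
    area: str         # "all", "urban" or "rural"
    quality: int      # assumption: 1 is the best rating


def comparable(observations, *, welfare, unit, equiv_scale, area="all", max_quality=2):
    """Keep only observations that share the same definitions and are rated well enough."""
    return [
        o for o in observations
        if o.welfare == welfare
        and o.unit == unit
        and o.equiv_scale == equiv_scale
        and o.area == area
        and o.quality <= max_quality
    ]


# Made-up rows: two hypothetical countries, several Ginis for the same country-year.
rows = [
    Observation("A", 2000, 41.0, "income", "household", "per capita", "all", 1),
    Observation("A", 2000, 35.5, "consumption", "household", "per capita", "all", 1),
    Observation("A", 2000, 44.2, "income", "household", "per capita", "urban", 2),
    Observation("B", 2000, 52.3, "income", "household", "per capita", "all", 2),
    Observation("B", 2000, 49.8, "income", "household", "per capita", "all", 3),
]

subset = comparable(rows, welfare="income", unit="household", equiv_scale="per capita")
for o in subset:
    print(o.country, o.year, o.gini)
```

Filtering on every attribute at once is deliberately strict: it drops the consumption-based, urban-only and low-rated rows, so whatever remains differs only in country and year.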
The last major revision was in 2005, and what was done in that revision was essentially that overlapping estimates were removed, and so were some that were based on limited coverage, which basically means a very small area, let's say just a few cities, especially if there was also data on a larger area, let's say the nation. So it was like pruning some of these things. Some of the older estimates were replaced by new ones, and that's especially true of LIS, where new estimates were made from the data. Of course there was also expansion, with some new data, especially for the transition economies. And finally there were some changes to the ratings as well.
Now, as I was saying, it's time for us to look at what we should do next, and so for the last few months we have been engaged in this exercise. The first thing is to also have a Stata version of this database, which allows you to manipulate the data. You can always take the Excel sheet into Stata, but it's not as easy as that; this would be directly a ready-made file in Stata format, which will be there alongside the Excel sheet. Then there is the expansion of the data set: more countries and more data have been collected over the last few months, so we have more countries, more Ginis and more country-years. Of course all this is still tentative, because numbers are changing and we still have to go back to these numbers and check them; some of them may not survive the scrutiny. But right now the in-house database has this many Ginis, this many countries and so on; once we have gone through the process of vetting, some may not survive, and we may have added some more to make up. So that's the expansion.
Then we are proposing, and this is of course open to discussion, some changes to the variable names. Some of the variable names, we realize, may not be self-explanatory in the current version of the database; for
example, there is a variable called country 3. What it really refers to is the three-letter ISO code of the country, so we could simply call it country code. Then there is the variable mean x-y; what the x-y stands for we did not find immediately obvious, so the x-y is what we will remove. What we need is just the mean from the survey underlying the Gini or other distribution numbers, and similarly median instead of median x-y, which is the current name. Currency and reference period is one variable right now, currency and reference, C-U-R-R-E-F, and it holds things like dollars per month, dollars per year and so on. Obviously we can easily separate the two, so we will have currency as a separate variable and reference period as a separate variable.
There is source 1 and there is source 2. Now, source 2 is not a second source; source 2 gives more detail about the source, sometimes the survey. Source 1 is where we got the data from: let's say if I get the data from LIS, then that's source 1, and source 2 will tell you what the survey is. So to make it more transparent and clear, instead of source 1 we just have source, and for source 2 we have source comments.
Then the second Gini is just called Gini, and the other Gini is called reported Gini, which is the Gini that comes from the source or that was computed in the early stages using the POVCAL software. But the second Gini is just called Gini, and it's quite easily confusing to people as to what it is, even though there is the reported Gini standing beside it. The second Gini is, as I said, based on the Shorrocks-Wan algorithm, so we might as well call it SW Gini or some such name, which makes it very clear that this is a different Gini from the reported one.
Then there is of course the issue of income definition. Now of course, originally this was meant to be an income inequality database, with income as one of the ways in which welfare is
measured. So it looks odd to have a variable called income definition whose values are income or consumption; we might as well call it welfare definition, whose values can then be income or consumption. An income definition of consumption is kind of awkward, and that's why we want to change that one. So these are the changes being proposed in the variable names in the database.
But there is a bigger change that has been proposed, which is the next one: to make this data accessible to a wider audience. Right now, I think, the original audience was really researchers who were working on income inequality and were hard pressed to find all the data in one place, in sufficient detail for them to be able to make sense of what these numbers actually mean. That was the purpose, and I think it has been served very well. But I think we can go beyond that, given the amount of interest that there is in inequality, especially in the last few years, and there is also a lack of an authentic source for these numbers in the public domain. Politicians, public policy makers, the press and so on do not care enough to distinguish between consumption and income, the different kinds of income, the different kinds of Ginis and so on and so forth; we can do some of those things for them, based on our own database. So if we want to reach out to this audience, we will have to go beyond the Excel sheet or the Stata version of the Excel sheet. It will have to come in a shape that does not require you to be a specialist to interpret it. To reach this audience we have to come up with some data visualization, which means that we present numbers in graphs and charts and so on. In fact some such charts went viral at the time of Occupy Wall Street, especially about the U.S.
We don't know how reliable those numbers are, because they come from individuals who are excited about the subject and are mostly in the game of advocacy. But this would provide some kind of a rigorous basis to those arguments, if we were to produce all these things based on our understanding of what is comparable and what is not comparable. So that is what data visualization is about.
The second is knowledge products, which basically means that we produce little leaflets on, say, what are the pluses of the Gini, what are the negatives of the Gini, what a lowering of the Gini does and does not mean, because all these things are not really very well known, even among some senior policy makers. A lowering of the Gini is considered to be unequivocally good, without realizing that it does not necessarily capture an improvement at the lower percentiles. There are many such things that are available in the literature but are not accessible to the general public, which is nonetheless using these measures. So we might as well provide the platform and provide these products, so that when people are talking about these things, we are sure that they are talking with some degree of understanding of these concepts. There is a whole lot of issues around inequality that we can think of which would require such knowledge products.
And the third is that we can then also contribute to the debate in some sense. We can bring to the public domain some of the things that we may have noticed in terms of how the distribution has changed or what has happened, which may be known at some level but may not have reached public debate and discussion. Many things happen in some countries, let's say inequality in Latin America is going up or going down, and so on and so forth, but people are not sufficiently aware of it. Then we could talk about it, and we could talk about, let's say, what policies in particular have driven the change in inequality in those countries.
So there is a certain contribution we can make to the debate itself, rather than just providing input to those people who are engaged in the debate. And this could come through our own products, interesting facts and so on. Of course, once we do that, there is the question of how far we want to go and how far we want to engage, and that's a separate question that WIDER has to decide on, as to how much of an activist role it wants to play in this game, but definitely it's a direction in which it needs to go. But these are all thoughts, actually, and we haven't yet finalized them. As I said before, we are hoping to use this opportunity to engage with you and get a sense of what you would like to see; these are just some general and some specific thoughts that we have in mind to share with you.
Now, the other thought that we had was that it is somewhat outdated to have a 2008 database in 2013. The reason is that if you decide to have a major update only once in three or four years, then people will lose interest in the database, because they will go elsewhere. We will lose a chunk of our own readership if we do not update regularly. Now, every month, every two months, there's some new country for which someone has computed a Gini, there's some new data. That need not wait for the next revision to find its way into our database. If tomorrow a new Gini is announced or published, we should be able to update it here. So that's another thought that we have: instead of having these versions, 2008, 2013 and so on, we should have regular updates. Of course we then have another question, of how to ensure that people who used the database on day one are able to see the difference between that database and the database on a later day. So we have to think about maintaining previous versions and so on and so forth.
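One way to let users see the difference between the database as it stood on one day and on a later day is to keep dated snapshots and compare them. The sketch below is only an illustration of that idea: the key structure (country, year, source), the function name and the sample values are assumptions, not a description of how the database is actually maintained.

```python
def diff_versions(old, new, tol=1e-9):
    """Compare two snapshots mapping (country, year, source) -> Gini.

    Returns the keys that were added, removed, or whose Gini was revised.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    revised = sorted(
        key for key in old.keys() & new.keys()
        if abs(old[key] - new[key]) > tol
    )
    return {"added": added, "removed": removed, "revised": revised}


# Made-up snapshots for two hypothetical countries.
snapshot_day_one = {
    ("A", 2005, "survey X"): 40.1,
    ("A", 2006, "survey X"): 40.5,
    ("B", 2005, "survey Y"): 51.0,
}
snapshot_later = {
    ("A", 2005, "survey X"): 40.1,   # unchanged
    ("A", 2006, "survey X"): 41.2,   # revised estimate
    ("A", 2007, "survey X"): 41.0,   # newly added
}

changes = diff_versions(snapshot_day_one, snapshot_later)
for kind, keys in changes.items():
    print(kind, keys)
```

Publishing such a change list alongside each update would let someone who cited an earlier snapshot check quickly whether the numbers they used have since been revised or dropped.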
That is a problem we will have to address if and when we move to regular updates. But regular updating is possible only if we have one thing, which is a regular data flow coming from statistical offices and other institutions that are linked to these statistical offices. Now, past experience has been that it is not very easy to get this information, in the kind of detail that we need, in the time frame that we have in mind. So we were thinking of trying to establish some kind of long-term institutional arrangement, some kind of formal understanding with these statistical offices and, let's say, regional offices that work with these offices locally, so that we are able to get the information on a more regular basis. In fact, in a very advanced version of this, they would even be able to input the information into our database themselves. So it is one of the thoughts that we have: it will help to develop this partnership in the long run, so that we are able to update this information effortlessly on a regular basis.
Now, why would they do this? The database is a public good, which is to say that we do not charge for this information; it is always freely available. And it is non-rival in consumption, because if one person consumes it, another can consume it too. So what we put out is a public good for all. Therefore I think it's something where we can expect national statistical offices to work with us and to find it worthwhile to work with us. We hope to build that relationship so that we can have more regular revision of this database.
And finally, there is one more thought that is also important: whether this database should confine itself just to inequality of income and consumption, or go beyond that to measures that capture more broadly the inclusiveness of growth.
This is a subject that, as you can see, has been receiving a lot of academic and policy attention over the last three years. As these new measures come up, whether we should or should not expand the database to include those things is a question that we have not yet answered. So these are the various things that we have been thinking about, and it would be very helpful to have suggestions from you on any of the things that we mentioned, or any other suggestions that you may have, so that we can provide this public database to a wide variety of audiences and achieve the purpose that was behind its creation in the first place. Thank you.