[inaudible] The World Top Incomes Database is probably the newest of the databases we are going to discuss today. Let me start with an overview. It seems that tax-based statistics always hit a nerve in the economics profession. [inaudible] there was a big debate in the 19th century [inaudible]. In the mid-fifties, Kuznets used tax data to [inaudible] in the U.S. Kuznets commented on [inaudible] and persistent. After the Second World War [inaudible]. And Arthur Burns, of the NBER and then Governor of the Board, said that this is one of the great social revolutions of history. And then recently the tax data work by Atkinson, Piketty, Saez and the work I am going to talk about now brought to the fore the big increase in inequality, in rich countries especially. And now "the 1%" has become part of our everyday language.
So it is strange how tax data produce such interest in the profession and also outside the profession. On the other hand, they have not been used much by economists, especially by people working on income inequality. There are many reasons for that, but it is still surprising that we had to wait until 2011, when the World Top Incomes Database was released, to have an international database based on tax data.
Now, the WTID collects data on the shares of personal income received by the richest groups of taxpayers. It has data for 29 countries over a period of 143 years. The first observation is from Denmark in 1870, and the last ones are for a bunch of countries in 2012. All figures in the database are computed from tax statistics, except for China and the more recent Finnish figures. Now, three characteristics of the project are important.
The project is dynamic, cooperative and easy to access. It is dynamic because the people involved in the project want to extend their time series forwards and backwards and to add series for new countries. They also plan to add series for other concepts, not only for income but also for earnings and wealth. It is a cooperative project in the sense that everyone is invited to participate if he or she has relevant data. And there is easy access because if you go to the website you can download all the data.
That is the coverage as of September 2014. The countries in red are the countries for which data are available, and the countries in blue are countries for which new series are being produced now. This next chart is something that you do not need to worry about. It is something that I did to give you an idea of what this dataset contains. Each gray square indicates that the figure for the top 1% income share is available. You have countries in columns and years in rows. Look first at the right-hand side: you can see that the series are very dense. You have a lot of observations and far fewer missing values than you would find in the other datasets we were talking about. But the interesting thing is that if you go to the left-hand side you have a lot of observations for the years before 1945. So it is for sure the only source that allows you, at least for now, to study pre-Second World War changes in income inequality.
So length and density are the two main strengths of the dataset. The third strength is that you have a much better coverage of the top of the income distribution than in sample surveys. We all know that rich people tend not to appear in sample surveys, or that their data tend not to be collected. Here we have information based on tax data for the richest people in each country. Of course there are weaknesses, and some of these weaknesses are linked to the fact that tax data have not been used much in recent decades.
The definitions of income refer to administrative rules. The reference unit is not the household or the individual, but the tax-paying unit. That is not what we would like for welfare analysis in many cases. There are important discontinuities due to changes in tax rules. There is of course the problem of tax avoidance. And the last thing: you cannot usually estimate standard inequality measures like the Gini index or the Theil index. But you can always have information about the share of income going to the top income earners. Of course we could have a similar slide for all the other sources, with strengths and weaknesses. But that is what we are talking about.
Now, before describing the dataset, let me mention briefly the problems of methodology in using tax data. That is the typical kind of raw material that is used for this kind of analysis. It is a table that I drew from the paper by Atkinson in the Journal of Economic Literature. It refers to income tax data in the UK in 1911-1912. The information that you have is the income brackets for the top of the distribution, the number of persons in each income bracket, and the total income received by these people during the fiscal year. Of course, having these numbers, you can compute the average income for each bracket. Then you have the total number of persons filing tax returns, just 12,000 people in 1911-1912 in the UK, and you have the total income of these people.
How can you use these data? When Pareto wrote, people were using just this information. Then what happens, thanks to Kuznets? Kuznets basically laid down the methodology, followed by Atkinson, Piketty, Saez, Alvaredo and all the others. First point: the basic procedure is to compare the number and the income of persons represented on federal income tax returns (Kuznets is talking about the US) with the total population and its income receipts.
So, going back here, we need the total number of people in the population, and we need the figure for total income, to know how the income reported here translates into the income share of top income groups. Of course, these brackets do not coincide with the percentiles. So what we have to do is to interpolate within brackets to compute the share of the top 1%, the top 5%, and so on of the population.
There is an excellent discussion of all these problems in the paper by Tony Atkinson. Let me remind you of some of the important issues. The first one I already mentioned: we can compute only one inequality measure. Then we have to decide about control totals for the population. Taxation is now mostly on an individual basis, but in many countries, over time, the tax unit has been the family. So we usually have to account for married women. The second point is control totals for income. Where do we find the total income needed to compute the top income shares? There are basically two methodologies. The first one is to use the income tax data themselves and to estimate the income of non-filers. The second is to use external controls, basically national accounts.
Then we have an issue about interpolation. Which distribution do we fit within each income bracket? The typical thing, which is what is done by almost all people working in this field, is to use a Pareto distribution of this kind. But there are other alternatives, as discussed by Frank Cowell and Mehta in their paper in the Review of Economic Studies. In the paper by Tony Atkinson, you find an estimate, assuming a Pareto distribution, of what happens if you change the size of the reference population. If you increase the reference population by 3%, as you see here, then you have to go further down the income distribution to locate the top x per cent group.
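The interpolation step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used for any series in the database: the function name `top_income_share`, its arguments and the bracket figures are all hypothetical, and the upper tail is assumed to follow a Pareto law exactly. The bracket data are generated from a Pareto distribution with coefficient 2, so the result can be checked against the known answer (the top fraction p receives a share p^0.5).

```python
def top_income_share(thresholds, counts, amounts, pop_total, income_total, p):
    """Estimate the income share of the top fraction p by Pareto interpolation.

    thresholds:   lower bound of each income bracket, in ascending order
    counts:       number of tax units in each bracket
    amounts:      total income reported in each bracket
    pop_total:    control total for the population (all tax units)
    income_total: control total for income
    """
    # Cumulate units and income from the top bracket downwards.
    n_above, y_above = [], []
    n_cum = y_cum = 0.0
    for n, y in zip(reversed(counts), reversed(amounts)):
        n_cum += n
        y_cum += y
        n_above.append(n_cum)
        y_above.append(y_cum)
    n_above.reverse()
    y_above.reverse()

    # Population share of the units above each bracket threshold.
    pop_shares = [n / pop_total for n in n_above]

    # Use the threshold whose population share is closest to the target p.
    i = min(range(len(thresholds)), key=lambda j: abs(pop_shares[j] - p))
    s, p_i = thresholds[i], pop_shares[i]

    # Ratio of mean income above the threshold to the threshold itself.
    b = (y_above[i] / n_above[i]) / s
    if b <= 1.0:
        raise ValueError("mean income above the threshold must exceed it")
    alpha = b / (b - 1.0)  # Pareto coefficient implied by b

    # Income level that cuts off exactly the top fraction p under Pareto.
    y_p = s * (p_i / p) ** (1.0 / alpha)

    # Under Pareto, mean income above y_p is b * y_p.
    return p * pop_total * b * y_p / income_total


# Hypothetical bracket data, generated from an exact Pareto law (alpha = 2).
thresholds = [5.0, 10.0, 20.0, 50.0]
counts = [30_000, 7_500, 2_100, 400]
amounts = [200_000.0, 100_000.0, 60_000.0, 40_000.0]

top1 = top_income_share(thresholds, counts, amounts, 1_000_000, 2_000_000.0, 0.01)
top05 = top_income_share(thresholds, counts, amounts, 1_000_000, 2_000_000.0, 0.005)
print(f"top 1% share: {top1:.4f}, top 0.5% share: {top05:.4f}")
```

Both control totals enter the formula directly, which is why the choice of the population and income totals discussed in the talk feeds straight into the level of the estimated shares.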
And that means that the top income share goes up; it goes up by this number here, if you assume a Pareto distribution. So the choice of the reference population matters. If you approximate your population of taxpayers with individuals aged 20 plus, or you approximate it with individuals aged 15 plus, you get different levels for the shares. But the effect in practice can be relatively small. You also have an effect of population revisions, because statistical offices revise the number of people resident in a country. And, more related to the dataset, you have variability across countries in this assumption.
This is a chart that, again, you do not have to read; it is a summary of the different definitions of the reference population used by the people producing the series in the dataset. You can see that Argentina has individuals aged 20 plus, Australia individuals aged 15 plus, and then you have adjustments for married people and so on. So you have a degree of variability across the series. Do we want to keep this or not? That is an open question.
When you move to income controls, again you have problems, and they probably matter much more than for population controls. The big issue here is the variability of the non-taxable items of income. Typical examples are transfers and capital gains. In many countries capital gains are excluded, in some others they are included, and capital gains can make a big difference for the estimation of top income shares. The same goes for transfers: in many places transfers are exempted from taxation, so they do not appear in tax statistics. Then, as before, we have a problem of revisions in the external controls. Because we use national accounts, and you all know very well that national accounts estimates change from time to time.
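The sensitivity of the top share to the reference population, mentioned above, can be illustrated with a short calculation. This is a sketch under strong assumptions, not the computation in the paper: the function name is hypothetical, the upper tail is assumed to be exactly Pareto with coefficient `alpha`, and the income control total is held fixed. Under these assumptions the income received by the top n units is proportional to n^((alpha - 1) / alpha), so scaling the population total by (1 + g) multiplies the top share by (1 + g)^((alpha - 1) / alpha). The value alpha = 2 below is purely illustrative.

```python
def share_ratio_after_population_change(growth, alpha):
    """Factor by which a top income share changes when the population
    control total is scaled by (1 + growth), holding the income control
    total fixed and assuming an exactly Pareto upper tail."""
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for mean income to be finite")
    # The top p% group now contains (1 + growth) times as many units,
    # and income of the top n units is proportional to n ** ((alpha - 1) / alpha).
    return (1.0 + growth) ** ((alpha - 1.0) / alpha)


# A 3% larger reference population with an illustrative alpha of 2.
ratio = share_ratio_after_population_change(0.03, 2.0)
print(f"top share multiplied by {ratio:.4f}")
```

The exponent is below one, so the share rises proportionally less than the population total, which is consistent with the remark that the effect in practice can be relatively small.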
We are going through one of these changes right now, with the incorporation in GDP of new items that were not included before, and that has an effect on the income of the household sector. And here again it is very important to compare the statistics against benchmarks.
Now, let me give you a quick example. This is from a paper forthcoming in the Journal of Economic Inequality by Richard Burkhauser and two, I think, Australian guys. It is about Australia. This series here is the series for Australia made by Tony Atkinson and Andrew Leigh. This series here, just below, is exactly the same as the other series, except that it incorporates the revisions in population totals and national accounts made by statistical offices. So there is no other change except for the revisions. The trend is more or less the same, but you can see that in some years there is a change in levels. Then Burkhauser and co-authors suggest that you can obtain a more homogeneous series over time by excluding capital gains, relative to the series produced by Atkinson and Leigh, and what they get is this series here. So the upward trend is the same, but it is much milder. That is an example of the problems that you have with these data.
Very quickly about this chart: it is something that I produced. I used the income totals available in the database, and I took the ratio of these income totals to pre-tax household income from national accounts, from the OECD database. This should be more comparable across countries. You can see that the amount of income captured in tax-based statistics varies over time. That is the U.S., and you see that there is a continuous decline. So that is the amount of income we are talking about when discussing inequality in the U.S. using top income data. But there is a large chunk of income that is not included. That is, 60% or 40% of the income going to the household sector is not included in this analysis.
In some other countries you have an upward trend, there is a flat trend in France, and there are big differences across countries in levels. So there may be problems of comparability across countries, something that is recognized by people. I do not have much time to go through all the details, but let me show you.
This chart is a comparison for the U.S. We have two dimensions: comparisons of inequality trends and comparisons across countries. If we look at inequality trends, this series here is the Piketty and Saez series for the U.S. Interestingly, this blue series here, which matches quite well, is estimated from the Bureau of Economic Analysis data. It matches quite well because it was essentially based on tax statistics. It is a synthetic series produced at the time. That is the series for pre-tax household income from the Current Population Survey. You can see that the overall trend is the same, but the increase in inequality is much smoother. The red one is the series computed from CPS data by Stephen Jenkins, Burkhauser and two other co-authors, trying to match as much as possible the definitions and so on of the Current Population Survey to the tax data. And you can see that the matching is much better.
That is a comparison of these data: personal income of individuals aged 20 plus, top 1% share. On the horizontal axis you have the number from the WTID. You can see that the WTID usually has much larger numbers. There are many explanations for that, but you can use the LIS data to get a better understanding of whether there is an inconsistency between the two sources that we have to explain, or whether it is due to statistical reasons.
Now, the website is very easy to use. There are nice features that I am not going to discuss. That is an example of the typical data available there. You have a code for each variable, but the codification rules are not provided. You have one series for each top share.
So that is, for example, the top 5% income share in Argentina. You have some notes, but they are not very developed. So we need more information on that side.
To conclude, how can we develop the WTID further? These suggestions are not only for the people there, but about what researchers can do in general. I would suggest that they streamline and release the variable codification rules. That is something that would help researchers. They should provide much more detail about which taxable income items are available. You can go to the two volumes to find this information, but if you want to use the dataset, it is better to have it there. I would also try to provide lower and upper limits, and in perspective it would be important to include all the original raw data in the database. And then cross-validation and cross-country comparability need to be further assessed.
Since I do not have time, let me just conclude with this quotation from Kuznets: it does not paint a fine picture, with thick brushes and large blobs of somewhat mixed colors, but it is still better than a white page. Thank you.