I'm a research fellow at UNU-WIDER, and I'm going to talk about what is now called the ICTD/UNU-WIDER Government Revenue Dataset; the name seems to get longer as we go along. The working title of my presentation is "Toward Closer Cohesion of International Tax Statistics." I've been working on the update of this dataset, and as I've gone along this year I've been making notes and trying to put together a paper on some of the inconsistencies and issues I've come across. One person who has read the paper so far described it as interesting in a very dry sort of way, so I hope it's at least interesting for some of you.

I'll start with a discussion of where the GRD fits into the bigger picture and current discussions around taxation and development. There has been a recent focus on domestic resource mobilisation, for example in the Addis Ababa Action Agenda, and it's enshrined in the Sustainable Development Goals; I won't read the target out, it's on the screen. There are a couple of SDG indicators that this kind of data could feed into. Indeed, it has been acknowledged that all of the Sustainable Development Goals need to be funded by an increase in domestic resource mobilisation. At the same time, there have been concerns about data quality and, in some corners, calls for a data revolution, so I think the GRD project is quite relevant and sits between these two current concerns on the development agenda.

The dataset is a partnership with the International Centre for Tax and Development (ICTD). It began life there around 2010 or 2011, and after many years of work it was finally launched at the end of 2014. Since then it has been used in quite a lot of research studies.
I did a Google Scholar search a few days ago, and I was personally surprised by the sheer number of people who had accessed the data and used it, and by the range of ways they had applied it to their research questions. It has been used by economists at the UN, the World Bank and even some in the Fiscal Affairs Department at the IMF, which is really encouraging for us. Management of the dataset moved over to UNU-WIDER at the end of 2015 or start of 2016. We held a small event last year where a number of researchers came together and presented findings based on the dataset. It is part of a broader programme on taxation and development at WIDER, so you might have been to some of the sessions on tax-benefit microsimulation models, or this morning's session where researchers talked about access to firm-level data in South Africa.

We advertise this as a research dataset. It is of course open to anyone to use, but it came out of a perceived need for more open, reliable, comprehensive and analytically accurate cross-country data for researchers. This arose because a number of previous macro-level studies on the links between taxation and development had been based on ad hoc datasets, often created by individual researchers, that sometimes weren't publicly available; even when they were, it was difficult to put the results in context with one another because the sources differed. There were also cases where tax policy advice was being given to developing countries on the basis of studies that had only used high-income or OECD countries in their analysis, probably because tax data for developing countries were not available.
Traditionally, for cross-country data people would look to the IMF's Government Finance Statistics (GFS), which has fairly limited country coverage and is not very detailed in many cases. The same goes for the OECD's Revenue Statistics. Other problems with existing sources of tax and revenue data are that neither the IMF nor the OECD systematically accounts for natural resource tax or natural resource non-tax revenue within total revenues. The two sources also differ quite a lot in how social contributions are recorded. I'll say a lot more about that in this talk, because it's one of the things I've tried to make consistent in this version of the dataset. And traditionally these sources have had fairly poor coverage of developing countries.

I should say that in the two years since I've been involved with this project and accessing the underlying data, both the OECD and the IMF have made improvements. The OECD released its Revenue Statistics in Africa for the first time last year for a limited number of countries; those statistics are quite detailed, and we've been able to incorporate some of them into the Government Revenue Dataset this time around. The IMF's GFS has also improved its coverage of a number of developing countries in recent years, so the availability of these statistics is certainly trending in the right direction. But there are still significant gaps, especially when you go back into the 1990s or 1980s.

To illustrate some of the challenges in the underlying data, I'm going to show you some data from the IMF's GFS and explain why it presents real challenges for users trying to make sense of the data and make cross-country comparisons. This series is total taxation in Algeria between 1995 and 2010.
It comes from the IMF's Government Finance Statistics, and you can see there's quite a lot of volatility. You might think that's probably due to resource revenue, hydrocarbon revenue, and you would be right. If you disaggregate this series (I'll just plot two categories, taxes on income and taxes on goods and services), you can see that for the first part of the period most of the volatility seems to be driven by income taxes, which suggests that some natural resource revenue is tied up in that figure until 2002. What happens for the rest of the time series I'm not entirely sure, but there appears to be a massive reallocation from income taxes toward taxes on goods and services for 2003 to 2005; then it seems to jump back, and then it jumps back again. In all honesty, I don't know the reason for this. It looks like a portion of resource tax that at one point has been classed as an income tax and at another as a tax on goods and services, but in truth I'm not sure what the right way to classify it is. Hopefully this illustrates that, for a researcher wishing to use either the income tax ratio or the goods and services tax ratio from this source, the series is very inconsistent and presents real challenges. This is the sort of thing that crops up in underlying data, and unfortunately these sources give the end user little guidance on how best to deal with it.
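The Algeria pattern, where one sub-series falls while another rises by roughly the same amount, can be screened for mechanically. Below is a minimal Python sketch of one such heuristic; it is not part of the GRD methodology described in this talk, and the function name, the thresholds and the numbers are all hypothetical (the figures are made up and are not Algeria's actual data). It flags years in which the income tax and goods-and-services ratios move in opposite directions by at least `min_shift` points of GDP while their changes largely cancel out.

```python
from typing import Dict, List


def offsetting_shifts(
    income: Dict[int, float],
    goods: Dict[int, float],
    min_shift: float = 2.0,
    tolerance: float = 0.25,
) -> List[int]:
    """Flag years where two tax ratios (percent of GDP) appear to swap revenue.

    Compared with the previous year that has data for both series, a year is
    flagged when both series move by at least `min_shift` points in opposite
    directions and the net change |d_inc + d_gs| is at most `tolerance` times
    the larger of the two moves. Thresholds are arbitrary, illustrative choices.
    """
    years = sorted(set(income) & set(goods))
    flagged = []
    for prev, year in zip(years, years[1:]):
        d_inc = income[year] - income[prev]
        d_gs = goods[year] - goods[prev]
        opposite = d_inc * d_gs < 0
        large = abs(d_inc) >= min_shift and abs(d_gs) >= min_shift
        cancels = abs(d_inc + d_gs) <= tolerance * max(abs(d_inc), abs(d_gs))
        if opposite and large and cancels:
            flagged.append(year)
    return flagged


# Made-up ratios, percent of GDP.
income = {2001: 20.0, 2002: 21.0, 2003: 8.0, 2004: 8.5, 2005: 20.5}
goods = {2001: 5.0, 2002: 5.2, 2003: 18.0, 2004: 18.1, 2005: 5.5}
print(offsetting_shifts(income, goods))  # [2003, 2005]
```

A flagged year is not proof of misclassification: a genuine tax reform can produce the same pattern, so the flag only marks where a user might go back to the source notes.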
To take a step back: we construct a cross-country dataset on government revenues that runs from 1980 until 2015. Some data exist for a number of countries back into the 1970s and even the 1960s, but coverage gets very patchy, so we draw a line at 1980, where coverage generally picks up and becomes quite complete. Our sources are the OECD's Revenue Statistics and the IMF's Government Finance Statistics. The CEPALSTAT database of the Economic Commission for Latin America and the Caribbean was previously quite a good source, but it hasn't been updated for quite a while, so we might not be able to include it going forward. Mainly we access data for developing countries from the statistical appendices of IMF staff reports, and on rare occasions we use national sources, such as revenue authorities, to fill in gaps where appropriate.

So the data run from 1980 to 2015, with data points for almost every country, over 180 of them I think. We present total revenue, total tax and their subcomponents: direct and indirect taxes, and within those, taxes on income, payroll, property, goods and services, trade, and other taxes. We follow the classification of the IMF's Government Finance Statistics Manual because it's fairly straightforward to follow. We express everything as a percent of what we call a common GDP figure. If you take a ratio to GDP from two different sources for the same country and year, not only might the tax figure be slightly different, but the GDP figure each source used may also differ. The correction is very simple: we take all of the data in a common local currency and divide by a single GDP figure, which comes from the IMF's World Economic Outlook. So, compared to existing or underlying data sources, I've identified
what I think are the four main innovations or improvements we're able to make. First, we achieve significant gains in coverage and consistency compared to other sources. Second, we present revenues both inclusive and exclusive of social security contributions. That point is perhaps more relevant for OECD or high-income countries, since many developing or lower-middle-income countries don't yet collect or report social security contributions, but it's an important point to bear in mind if you're using this kind of data for cross-country comparisons. Third, where possible, we try to distinguish the part of revenue that comes from natural resources; the Algeria example is just one case where that's a problem, so we try to do it as consistently as possible for as many countries as possible. Fourth, and perhaps most importantly, we provide notes and extensive interpretation of the data to guide users, so that they use it in a more responsible manner.

On the gains in coverage: we offer what we call a merged dataset, which incorporates data from both central and general government. Ideally we would use general government data for every country, but that's not possible because a large number of countries don't report at the general government level to the IMF, which is perfectly understandable. Where we can't get general government data, we try to account for as much revenue as possible. In some countries that means summing central and state government together; India is one example, where a lot of revenue is collected at both the central and the state level. In a large number of countries we're able to sum central government revenue with revenue collected at the social security
funds level of government. But in quite a large number of countries, data only really exist at the budgetary central government level, so that's all we can report. We also make the underlying central government and general government files available, should users wish to look at that data as well.

I said before that for most developing countries the data are lifted from the statistical appendices of IMF Article IV staff reports. This is an example from Benin in the early 2000s, and you can see that in this case the disaggregation is reasonably good: direct taxes, indirect taxes, trade taxes and so on, so where possible we can match those up with our revenue headings. I should say this is perhaps a rare example of a detailed statistical appendix; a lot of them are frustratingly limited in what they report.

The second thing I wanted to talk about is that the way social security is funded, and the way social security contributions are reported, differs widely across countries and across sources. If you're comparing tax ratios, this can skew your results or your charts, or your estimates if you include the data in an econometric approach, so it's something you should be aware of. Across countries, some fund social security via taxes and some via more traditional social security contributions, and in some countries employees are required to make compulsory contributions to the private sector, so those contributions never enter general government and don't show up in tax statistics. Across sources, between the IMF's GFS and the OECD's Revenue Statistics, social contributions are sometimes classified in different ways, and I'll show a few
examples of where that's the case. This plot is for Denmark and Finland and shows taxes exclusive of social security contributions, from the OECD's Revenue Statistics. It would suggest that Denmark collects a great deal more tax than Finland, but if you know anything about those two countries you know that's probably not the case. In truth this is a definitional issue, and it comes down to the way social security is funded in Denmark, where it's funded almost entirely through personal income taxes. When you express the series as taxes inclusive of social contributions, the two lines are much closer together, which gives a much more accurate picture of revenue collection in both countries. The green line shows that what is reported as social security contributions in Denmark is only a couple of points of GDP, whereas in Finland it's over 10% of GDP. Again, this is really a definitional issue: personal income taxes in Denmark are, I think, about 25% of GDP, which is more than a lot of countries collect in total taxes, and that's just due to the way social security is funded. The wider point is that if you use the variable taxes exclusive of social contributions, you may end up with an invalid cross-country comparison of the tax ratio in these two countries.

How does the GRD overcome this kind of inconsistency? It's fairly straightforward: we present total taxes, and also direct taxes, both inclusive and exclusive of social security contributions. So there are a lot of columns in the dataset, but hopefully having all those figures available helps users see these sorts of differences and make better use of the statistics. A second issue that crops up, and it doesn't
happen for a lot of countries, concerns how payroll taxes and social security contributions are split. This graph shows payroll taxes and social security contributions in Sweden. If you take the data from the IMF's GFS, payroll taxes are close to 10 percent of GDP and social security contributions are very low. But if you take the same numbers from the OECD's Revenue Statistics, the gray line doesn't really shift at all: both sources capture the same revenue, but the mix between payroll taxes and social security contributions is completely different. So if you were making inferences about the tax ratio in Sweden using taxes excluding social contributions, where payroll taxes count as taxes but social contributions do not, you could get answers perhaps eight percent of GDP apart depending on the source. Again, it's purely a definitional issue, to do with how the taxes are recorded in the different sources, but it's something users should be aware of. In cases like this, where one source says one thing and another says something totally different, we annotate the data: we say that we've selected data from the OECD's Revenue Statistics, that they suggest this about social security contributions, but that other sources suggest something different, so be aware.
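Both social-contribution problems above (Denmark versus Finland, and the IMF versus OECD split for Sweden) come down to which components are added up. Here is a minimal Python sketch, assuming a record of tax components all expressed as a percent of GDP; the class and field names are hypothetical rather than the GRD's actual column names, and the two records are made-up numbers chosen so that the sources disagree by eight points. Moving revenue between payroll taxes and social contributions changes the exclusive total but leaves the inclusive total unchanged.

```python
from dataclasses import dataclass


@dataclass
class TaxComponents:
    """Tax components for one country-year, percent of GDP (hypothetical fields)."""
    income: float
    payroll: float
    property_tax: float
    goods_services: float
    trade: float
    other: float
    social_contributions: float

    def taxes_excl_sc(self) -> float:
        # Total taxes excluding social security contributions.
        return (self.income + self.payroll + self.property_tax
                + self.goods_services + self.trade + self.other)

    def taxes_incl_sc(self) -> float:
        # Total taxes including social security contributions.
        return self.taxes_excl_sc() + self.social_contributions


# The same revenue, split differently between payroll taxes and contributions.
source_a = TaxComponents(income=10.0, payroll=10.0, property_tax=1.0,
                         goods_services=12.0, trade=0.5, other=0.0,
                         social_contributions=1.0)
source_b = TaxComponents(income=10.0, payroll=2.0, property_tax=1.0,
                         goods_services=12.0, trade=0.5, other=0.0,
                         social_contributions=9.0)

print(source_a.taxes_excl_sc(), source_b.taxes_excl_sc())  # 33.5 25.5
print(source_a.taxes_incl_sc(), source_b.taxes_incl_sc())  # 34.5 34.5
```

This is why, when sources disagree about the split, comparing the inclusive series gives the more robust picture of overall collection.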
A couple of other inconsistencies come to mind. I mentioned private sector contributions: as far as I know, in Switzerland some employees are required to make compulsory social contributions to the private sector, so those never enter the general government budget, and Switzerland's tax ratio ends up somewhat lower than you might expect for a high-income country. We can't account for this in the data, but we guide users and say that, compared with countries that collect all of their social security contributions at the general government or social security funds level, this ratio is somewhat lower.

Turning to natural resource revenues: researchers and policymakers are, I think, increasingly interested in the non-resource tax take, and certainly in the SDG context, with its push to increase revenue mobilisation in developing countries, it's important that researchers can get at that data. Most underlying sources are not very good at disaggregating the data this way. The new OECD Revenue Statistics in Africa actually do quite a good job of this for the limited number of countries they cover; unfortunately, the IMF's GFS doesn't yet disaggregate tax revenues in this manner. So in some cases we go into individual country sources to augment the data from the OECD, and I'll say a little more about that. As an example of why this is useful, here is total government revenue in Nigeria as a percent of GDP. I think the underlying source here is an IMF country report, and as you can see it's extremely volatile. In the GRD we're able to break this down into the resource component and the non-resource component.
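The breakdown amounts to a subtraction: non-resource revenue is total revenue minus total resource revenue, year by year. A minimal Python sketch under that assumption; the function name and the figures are hypothetical (the numbers are made up so that the residual sits at four percent of GDP, roughly the level described for Nigeria, and are not Nigeria's actual data). Years missing from either series are skipped rather than filled in.

```python
from statistics import pstdev
from typing import Dict


def non_resource_revenue(total: Dict[int, float],
                         resource: Dict[int, float]) -> Dict[int, float]:
    """Residual non-resource revenue (percent of GDP) for years with both series."""
    return {year: round(total[year] - resource[year], 2)
            for year in sorted(total) if year in resource}


# Made-up revenue ratios, percent of GDP.
total = {2010: 20.0, 2011: 22.0, 2012: 14.0, 2013: 11.0}
resource = {2010: 16.0, 2011: 18.0, 2012: 10.0, 2013: 7.0}

residual = non_resource_revenue(total, resource)
print(residual)  # {2010: 4.0, 2011: 4.0, 2012: 4.0, 2013: 4.0}

# The residual is far less volatile than the total.
print(pstdev(list(total.values())), pstdev(list(residual.values())))
```

Note that the resource series here is total resource revenue; splitting it further into resource tax and resource non-tax revenue is much less reliable.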
And you can see that the volatility is almost entirely driven by oil revenue. If we take the residual, non-resource revenue has been fairly constant for the last ten years or so, at about four percent of GDP. Can we do this for every country? No, we can't. Nigeria is a nice example, which is why I've put it up; the data are less consistent for other countries. There is an increasing number of countries where resource revenue is well reported, which is encouraging, but it's not always as consistent as this. We hope at least to give users an idea of the amount of revenue that comes from natural resources.

I should also say that the variable shown there is what we've called total resource revenue, or total hydrocarbon revenue, for Nigeria. It's not always possible to break that down into the part that comes from taxation and the part that comes from non-tax revenue; indeed, it's very difficult. The way these revenues are reported over time in country reports is sometimes wildly inconsistent: one report might classify a revenue as non-tax and the next might classify it as tax, so in some cases it's really difficult for us to know. Classifications also differ across countries: some countries class a royalty as a non-tax payment, while other sources class a royalty as a tax payment, so we're not always entirely sure. The total resource revenue figure is therefore probably more consistent than the figures on resource taxes or non-resource taxes.

How accurate are these figures? Sometimes it's really difficult to know. What I've done is match up cases where we have a data point on natural resource revenue with cases where the Extractive Industries Transparency Initiative (EITI) also has a data point. Some of the researchers at the Natural Resource Governance Institute very
usefully put all of the EITI data into Excel form, so thank you to them for doing that. It has allowed me to compare the figures from IMF country reports with EITI data, on the assumption that the EITI captures as much resource revenue as possible. I was reasonably encouraged by how close the figures were: our GRD figures are on the y-axis, EITI data on the x-axis, and most of the points cluster around the 45-degree line. But there is certainly a tendency for our figures to underestimate total resource revenue, assuming the EITI figure is the correct amount. What this has also enabled us to do is annotate the data. There are some points where the EITI suggests a country is collecting around 30% of GDP but our data suggest much less, and there we're able to say: these figures come from the IMF country report, which suggests around 10 to 15 percent of GDP comes from natural resource revenues, but the true figure may be much higher. So I've been able to use this data to provide more interpretation around the figures in the Government Revenue Dataset.

The fourth innovation, which I've already said a little about, is that where possible we provide interpretation of the data and guidance on what's actually going on in it, and on whether it's good research practice to include a given observation in your econometrics if you choose to do that. We have a series of notes, comments and what we call flags, where a flag is just a dummy variable equal to one, so if you were running a regression you could exclude flagged observations quite quickly.
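The EITI cross-check and the flag dummy can be combined in a simple workflow: mark observations where the resource figure falls well below the EITI figure, then drop flagged observations as a robustness check. A minimal Python sketch; the `Observation` class, its field names, the 50 percent `tolerance` threshold and the sample values are all hypothetical, and treating EITI as the true figure is the same assumption made in the talk. In the dataset itself the annotations are written by hand, so this illustrates the idea rather than the GRD's own procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    country: str
    year: int
    tax_ratio: float                # percent of GDP
    resource_grd: Optional[float]   # resource revenue in the dataset, percent of GDP
    resource_eiti: Optional[float]  # resource revenue reported to the EITI, percent of GDP
    flag: int = 0                   # dummy variable: 1 = use with caution


def flag_against_eiti(obs: List[Observation], tolerance: float = 0.5) -> None:
    """Set flag = 1 where the dataset figure is below (1 - tolerance) x the EITI figure."""
    for o in obs:
        if o.resource_grd is None or o.resource_eiti is None:
            continue  # no matching EITI data point: leave the flag unchanged
        if o.resource_grd < (1 - tolerance) * o.resource_eiti:
            o.flag = 1


def regression_sample(obs: List[Observation]) -> List[Observation]:
    """Keep only unflagged observations."""
    return [o for o in obs if o.flag == 0]


obs = [
    Observation("A", 2010, 15.0, 12.0, 30.0),         # far below EITI: flagged
    Observation("A", 2011, 16.0, 10.0, 11.0),         # close to EITI: kept
    Observation("B", 2010, 12.0, None, None),         # no EITI match: kept
    Observation("C", 2010, 20.0, 5.0, 6.0, flag=1),   # already flagged
]
flag_against_eiti(obs)
print([(o.country, o.year) for o in regression_sample(obs)])
# [('A', 2011), ('B', 2010)]
```

In line with the advice in the talk, a sensible use is to estimate with and without the flagged observations and check whether the results hold, rather than dropping them silently.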
The sorts of cases where we need to warn users are, certainly, when we can't account for natural resource revenues but we know they're driving volatility or inflating tax revenues. There we tell users that these figures seem high, probably because of natural resource revenue, but that we don't have a source that allows us to break it out. We also include comments and flags wherever there are inconsistencies in the way social contributions are reported, as in some of the examples I showed you: source X says this, source Y says that, so be careful if you're making a cross-country comparison. There are also cases where we warn users that we're not sure why the data are so high or so low, but something looks incorrect. This is quite often the case when a country has transitioned from a command economy to a market economy and the composition of revenues has shifted starkly, perhaps from non-tax revenue toward more traditional tax revenue, or the make-up of tax revenue has changed quite a bit. We provide interpretation and guidance for those cases, and in some we suggest that if you're doing econometrics you exclude the observation and see whether your results hold. So we hope users pay attention to these notes, because we put a lot of effort into manually cleaning the data and looking for observations where we can't entirely explain what's going on but think something may be biasing the data. We're also quite comfortable leaving a gap where a data point just looks wildly incorrect. It's not often the case, but there are some observations where we don't include any data because the figure just looks wrong, and
in cases like that we explain to users why we don't have a data point for that country and year. There are other cases where we've flagged the data for users, and hopefully, if you're interested in using the data, you will pay attention to those.

I've spent the last three or four months with my head buried in Excel trying to get this dataset updated, so I'll talk for five or ten minutes about what I've been doing. We've managed to improve coverage a little. The small table on the right shows the percentage of total observations we have for each of the disaggregated categories. I should say those numbers are probably off by a couple of percentage points, because a lot of countries in the dataset, former Soviet states for example, didn't exist before 1990, and I didn't make that small correction. Compared with the last version of the dataset, we've improved coverage for income taxes, trade taxes and taxes on goods and services by about three to three and a half percent of observations, which at the margin is quite an improvement, I think. Most of that is due to better disaggregation in underlying sources such as the OECD and the IMF. I've also tried hard to fill in gaps in the time series. Frustratingly, in the older version of the dataset, a country's time series from 1980 to 2012 would every now and again have one or two missing observations in the middle, and if you're doing any sort of panel econometrics, that country is probably going to get dropped from your estimation. I filled in a lot of those gaps, and I think there are only one or two countries where that's still the case. Off the top of my head, in 1996 or 1997 Morocco changed from reporting its taxes on a calendar year
basis, January to December, to a fiscal year basis running from July to June, so for 1996 there are only six months of data for Morocco. We just don't have a full data point for that year, so we can't include it; it isn't comparable with the surrounding data.

The data now run up to 2015. A small number of countries have already reported tax and revenue data for 2016, but for almost every country we have data up to 2015; tax statistics usually lag about two years behind the calendar year. I'll say a little more about some of these points.

Previously, when we took data from Article IV consultations or statistical appendices, it was sometimes unclear whether the trade tax figure included VAT collected on imports. VAT on imports is often collected by the customs authority, so in the country report it might just be included under the trade tax heading. The previous authors of the GRD, Wilson Prichard and others, were quite open about the fact that they struggled to handle this consistently. I've fairly painstakingly gone through each country and tried, where possible, to remove the part of sales tax, VAT or excises collected on trade that should be classed as a tax on goods and services according to GFSM 2014 standards. This is again the example from Benin, and it's a really nice one, because under taxes on international trade and transactions it shows the part that comes from VAT. So it's quite simple to lift that out and reclassify it as a tax on goods and services, which brings it in line with Benin's figures from the IMF GFS. Unfortunately, not many countries disaggregate this far, so it's sometimes quite tricky to make this correction. But I'm reasonably
(in fact, very) confident that we've either made the correction or identified the cases where it's not possible and removed those data points, because they're not comparable with other countries. There are only one or two countries in the dataset where that's the case; in every other case I've been able to make the correction. So if you have used the dataset before to look at trade or goods and services tax ratios as a percent of GDP, perhaps take another look and see whether those data points have changed this time around. So yes, I've corrected that inconsistency.

The final thing I wanted to mention is property tax. There's a very small literature on property tax in developing countries, but it is getting a bit of attention at the moment. As of the Government Finance Statistics Manual 2014, the IMF has reclassified part of what was included in property taxes. Historically this category included inheritance taxes, recurrent and non-recurrent taxes on immovable property, and taxes on financial and capital transactions, which are levied on, for example, the sale of a property. In accordance with SNA 2008 and ESA 2010 guidelines, that last item has been moved from taxes on property to taxes on goods and services. That's fine: whenever we re-accessed data from the IMF GFS, the correction had already been made. Unfortunately, the OECD's Revenue Statistics team didn't follow suit and make that reallocation. They weren't very forthcoming with a satisfying reason why, but they promised me they'll revisit that decision this year. For now, we've had to make that calculation ourselves for all of the OECD countries and, where possible, for data that came from IMF country reports. This might seem trivial, because if you've ever looked at tax statistics you'll know that property taxes make up a very small part of the
overall revenue picture. But if you're interested in using the property tax series from the GRD, this will probably have quite an impact, because taxes on financial and capital transactions were often half to two-thirds of the property tax figure; taxes on immovable property weren't actually that much. So if you've used the data on taxes on property, the numbers you see may change, but going forward this lets us be more consistent with how the statistics are classified, certainly by the IMF.

That's just a snapshot of some of the improvements we've made this time around. The dataset is freely available for anyone to use; it's online on the WIDER website under Projects, where it's the first one you see. Looking forward, we're hoping to visualise it and create an interactive tool. I appreciate that researchers are happy to delve into the Excel file, and I know I would always go straight there, but other people might just want a snapshot of trends across countries or within certain regions, so we're hoping to implement that at some point. We're also hoping to move to a simpler annual update cycle. This time around I took the chance to address a lot of underlying issues that I knew needed to be corrected at some point; going forward, the annual update will hopefully just add an extra year of observations and incorporate any revisions made in the underlying sources. And if new sources come along or existing sources improve their coverage, we can incorporate them and hopefully improve coverage further. We've always been happy to collaborate with other researchers, and with people in developing countries who have specific country knowledge. Please, if you use the
dataset, get in touch and tell us why you're using it. Purely selfishly, it's nice to know what people have done with the dataset, but please also give us feedback. In a number of cases, people from revenue authorities, or who have worked in particular countries, have looked at our data and said: we know this comes from the underlying source, the IMF or whoever, but we happen to know it's out by x percent of GDP, or something has been misclassified here or there. We're very happy to incorporate that sort of information and to collaborate with other people on the dataset, and that can only make it a richer resource going forward. Okay, so that's all I have to say on the dataset.