I'm going to be talking about a project called the Government Revenue Data Set (GRD), which I'm in charge of here at UNU-WIDER. So this is definitely on the more macro end of things. I'm going to talk about why we feel it's important to maintain and invest in this project and some of the reasons why it was started in the first place, then show some of the trends and highlights that you can pick out from this sort of data set, briefly discuss some research findings, and then talk a little about what we're trying to do in future with regard to communication around tax data. This project really came out of two large motivating concerns. The first was data quality. For a long time it was very hard to get hold of good, comparable, reliable and, in inverted commas, "accurate" cross-country tax data, especially for developing countries. The implication for researchers was that you might have a lot of different papers using different data sets from different sources, and then it's hard to know which one to believe. A lot of research came out of, for example, the IMF's Fiscal Affairs Department, and it often used data that wasn't publicly available for one reason or another, which meant that researchers had a hard time trying to replicate those results, interrogate them or extend them. So the idea was to create an open data set of government revenues that had better developing-country coverage and that brought together data from a number of different sources and made them comparable. And of course this speaks to the broader discussions about domestic resource mobilization, which have come to the fore in recent years. So what exactly is it? It's a cross-country data set of government revenues and tax subcomponents.
It covers things like total government revenue, total tax, income taxes, VAT, trade taxes, grants, social contributions and so on. It was initially started at the International Centre for Tax and Development (ICTD), which is based at the Institute of Development Studies in Sussex in the UK. The project itself began around 2010, but I have found references online and in reports to people calling for a data set like this as far back as 2005. It was first launched in 2014, and since then I've taken on the mantle of overseeing the updates every year. Just this week we're putting the finishing touches on this year's version, which will be online early next week. I've already mentioned some of the motivating concerns. If you compared data from three different sources, not only might the tax figures differ, but if they were expressed as tax ratios, the GDP figure used as the denominator would also differ, and this creates a lot of confusion for researchers, even if you're just displaying a tax ratio. So we take tax data from all of the major traditional sources where individual researchers might previously have got their data, primarily the OECD's Revenue Statistics and the IMF's Government Finance Statistics (GFS). But the bulk of our work is mining data from IMF Article IV consultations. If you've ever read those, they're in a very non-standard format, there's no way to mechanize that procedure, and it's a lot of work just to copy and paste the figures and try to make sense of them.
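The denominator problem can be illustrated with a minimal sketch. Everything here is hypothetical: the figures, the source labels and the helper names `tax_ratio` and `rebase_ratio` are for illustration only and are not part of the GRD. It shows that the same tax revenue yields different ratios against different GDP series, and one assumed way (not necessarily the GRD's exact procedure) to put a ratio reported by one source onto a common GDP denominator: recover the revenue level from the reported ratio and that source's own GDP, then divide by the common GDP figure.

```python
def tax_ratio(tax_revenue: float, gdp: float) -> float:
    """Tax revenue as a percentage of GDP; both in the same nominal currency units."""
    return 100.0 * tax_revenue / gdp


def rebase_ratio(ratio_pct: float, source_gdp: float, common_gdp: float) -> float:
    """Re-express a ratio reported against one GDP series using a common GDP series.

    The revenue level is recovered from the reported ratio and the source's own GDP,
    then divided by the common denominator. (Hypothetical helper.)
    """
    revenue = ratio_pct / 100.0 * source_gdp
    return tax_ratio(revenue, common_gdp)


# Hypothetical figures for a single country-year, in billions of local currency.
tax_revenue = 12.0
gdp_by_source = {"source_a": 80.0, "source_b": 75.0, "source_c": 96.0}

for source, gdp in gdp_by_source.items():
    print(source, round(tax_ratio(tax_revenue, gdp), 1))
# source_a 15.0
# source_b 16.0
# source_c 12.5

# Put source_a's reported 15% onto the (hypothetical) common GDP series of 96.0.
print(round(rebase_ratio(15.0, source_gdp=80.0, common_gdp=96.0), 1))  # 12.5
```

The same tax figure moves between 12.5 and 16 percent of GDP purely through the choice of denominator, which is why cross-country comparisons are only meaningful against a consistent GDP series.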
But that really improves the developing-country coverage, because quite often it's the only source we can go to to find out, for example, how much revenue is collected in those countries. The idea is that we take data from different sources, account for the fact that different sources classify revenues in different ways, and make adjustments so that you can compare accurately across countries no matter which source the data comes from. We end up with vastly improved country coverage compared to any other single source: I think there are data points for 196 or 197 countries, and as of the new version the series run until 2016. Tax data unfortunately lags by about 18 to 24 months depending on the country, so just this year we've managed to complete the series for 2016, and only about 20 or 30 countries have data for 2017 at the moment. Another major innovation, where possible, is to present tax and revenue data both inclusive and exclusive of natural resource revenues. Some countries have a vastly inflated tax or revenue ratio because of, for example, oil or mining revenues. It's not always possible to separate these out, but where we can we present both figures, and a lot of users find that really useful.
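As a minimal sketch of the inclusive/exclusive split, assuming a hypothetical record with a separately reported resource component (the class `CountryYear`, its fields and the figures are illustrative, not the GRD's variable names), the non-resource ratio is the total minus the resource component, and it is left undefined when that component is unavailable, mirroring the point that the split is not always possible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountryYear:
    """Hypothetical country-year record; all amounts in the same currency units."""
    gdp: float
    total_tax: float
    resource_tax: Optional[float] = None  # tax from oil, gas or mining, if it can be separated

    def tax_ratio(self, include_resources: bool = True) -> Optional[float]:
        """Tax as % of GDP, inclusive or exclusive of natural resource revenues.

        Returns None for the exclusive figure when the resource component is unknown,
        rather than silently reporting the inclusive figure in its place.
        """
        if include_resources:
            return 100.0 * self.total_tax / self.gdp
        if self.resource_tax is None:
            return None
        return 100.0 * (self.total_tax - self.resource_tax) / self.gdp


# Hypothetical oil exporter: a headline ratio inflated by resource revenue.
oil_exporter = CountryYear(gdp=100.0, total_tax=25.0, resource_tax=15.0)
print(oil_exporter.tax_ratio())                         # 25.0
print(oil_exporter.tax_ratio(include_resources=False))  # 10.0

# No resource breakdown available: only the inclusive figure can be reported.
unknown_split = CountryYear(gdp=100.0, total_tax=18.0)
print(unknown_split.tax_ratio(include_resources=False))  # None
```

Returning `None` instead of falling back to the inclusive ratio is a deliberate choice in this sketch: a missing breakdown stays visibly missing rather than being mistaken for a non-resource figure.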
I'll say a bit more later, but we put a lot of work into providing guidance and notes. Having had our heads stuck in this data for a number of years now, it becomes quite apparent that there's often a story behind the numbers. When you've compared the data for one country across sources, you can get to a point where you can figure out what's going on, or whether a revenue has been misclassified, and we put a lot of work into documenting this in the notes to the data set. I have no idea whether users actually read these notes or take heed of them, but we hope they do, because you sometimes have to be careful when making cross-country comparisons with this kind of data.
As an extreme example, I'm going to show you what can go wrong when you take data without actually looking at it or asking whether it makes sense. I quite like this example because the data have been very clearly wrong for a number of years, but they still exist within the IMF's Government Finance Statistics, and hopefully it highlights the importance of using caution when looking at revenue data. It shows total tax as a percentage of GDP in Algeria. There are quite a lot of fluctuations, primarily due to oil revenue, but the series seems broadly believable. When you take data from the IMF and plot income taxes and taxes on goods and services, the series between 1995 and 2002 also seem broadly believable, but then something goes wrong. What we found was that, for a few years until about 2005, the part of income tax that came from natural resource revenues was classified as a tax on goods and services, and then I think the classification changed again and again, to the point where these two subcomponents of total revenue don't really make a lot of sense. My understanding is that this is actually how the Algerian authorities presented the data to the IMF, so in that sense it is correct. But as soon as you plug this into a time series or panel regression, you're going to get some very misleading results.
So how does the GRD solve this problem? For Algeria, we end up taking our data from the IMF country consultation reports, which have much more straightforward and sensible data. Tax as a percentage of GDP follows broadly the same trend, but you can see that income tax looks more like what the true value would be, and taxes on goods and services are broadly level at around 5% of GDP across the same period. As I said, we try to present data both inclusive and exclusive of natural resource revenues, so we can go a bit further than the traditional IMF source and also show non-resource tax, the orange line, and non-resource income tax at the very bottom, the yellow line. So in some cases we can really improve on what exists at the moment and add a little more in terms of non-resource taxes. It's not always the case that the GRD can solve problems like this. Sometimes the data are still of pretty poor quality and quite misleading, and we just have to say "be careful with this", or we leave out a data point because we know it to be definitely incorrect. So that's a rough example of what can go wrong with cross-country revenue data.
Now I'm going to move on to some first trends from the 2018 Government Revenue Data Set. As Yuka mentioned, there have been continual improvements in the tax ratio in developing countries and regions over the past decade, and especially the last five years. This is really encouraging, but these improvements do mask some outliers. Here we see the tax ratio by region between 1990 and 2016. In just the last five or six years, both South Asia and the Middle East and North Africa have seen their tax ratios really take off, improving by around 40 or 50 percent. In the dark blue line, Sub-Saharan Africa was pretty stagnant for a long time through the 80s and 90s, but about 10 years ago it started to take off, and the average tax ratio in Sub-Saharan Africa is now about 16 to 16.5 percent, which is quite an encouraging story.
As I said, this does mask some outliers. You have countries like Senegal, which collects around 20 or 21% of GDP in taxes, which is more than Japan collects, although Japan obviously collects a lot of social security contributions. So some countries are doing incredibly well, while for others, such as Somalia, the best data we have, and again it's probably not very good, suggest that Somalia collects about one and a half percent of GDP in tax revenue. So the averages definitely mask some outliers, but the trend is on average pretty encouraging across the developing world. If we look at the evolution of the tax structure in low and lower-middle income countries, as defined by the 2018 income classification, you can see that the total tax take has been steadily increasing over time, but also that the tax structure has been changing, with a much heavier reliance on income taxes than would previously have been the case. It's a stylized fact of development that lower-income countries depend a lot more on trade taxes, and while that is still true compared to OECD countries, the weight of trade taxes in the total tax mix is falling; I think on average they now raise only about two or two and a half percent of GDP in developing countries. Income taxes make up pretty much all of the direct tax that developing countries collect, and you can still see a massive reliance on indirect taxes compared to rich or OECD countries, where the split would be about 50-50.
What we can also see, just about for the first time in this year's GRD, is the impact of the fall in oil prices around 2014. Because tax and revenue data lag by a couple of years, we haven't really been able to take a systematic look at this until now. I've picked out the data for some African oil-producing countries: this is total revenue as a percentage of GDP. It fluctuates quite a lot, but that's natural when you depend on natural resources. These are Congo, Nigeria, Chad, Angola, Algeria, Egypt, Gabon and Equatorial Guinea. The oil price fell around 2014, and after that there start to be some remarkable drops in total revenue collection in these countries. If we plot an average of these, you can see that in just about four years revenue dropped from just over 30% of GDP on average to about 18%. So a lot of countries are facing big struggles, and we've seen, for example, countries like Qatar, which never traditionally taxed its citizens, now introducing VAT, because they can no longer rely on oil revenues the way they always could before. Those are just some quick trends that we've taken out of the new data this year.
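The drop quoted for the oil producers can be read in two ways, and this short sketch separates them: a fall from about 30% to about 18% of GDP is 12 percentage points, or a relative decline of roughly 40%. The helper names are hypothetical, the 30 and 18 are the approximate averages quoted in the talk, and the group average is assumed to be a simple unweighted mean that skips missing country-years; the talk does not say how the average was computed.

```python
from statistics import mean
from typing import Optional, Sequence


def group_average(values: Sequence[Optional[float]]) -> float:
    """Unweighted mean across countries, skipping missing observations (None)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no observations for this year")
    return mean(present)


def change_in_points(before: float, after: float) -> float:
    """Change in a ratio, in percentage points."""
    return after - before


def relative_change_pct(before: float, after: float) -> float:
    """Change relative to the starting level, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical country values for one year; the missing country is skipped.
print(group_average([30.0, None, 20.0]))  # 25.0

# Approximate group averages of total revenue (% of GDP) quoted in the talk.
before, after = 30.0, 18.0
print(change_in_points(before, after))     # -12.0 (percentage points)
print(relative_change_pct(before, after))  # -40.0 (percent)
```

The same distinction applies to the 40 or 50 percent improvement quoted for South Asia and the Middle East and North Africa, which only makes sense as a relative change in the tax ratio.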
Every year when we update the data set we make thorough revisions. For the last few years, the OECD in particular has been expanding the number of countries it covers, so every time a new source for a country comes on board we have to ask: how does this line up with what we have? Is it better or worse? Does it tell us something we didn't know before? Sometimes that leads us to remove data: if a new source suggests that the initial data is incorrect or problematic, we prefer to take those observations out, but we leave a note telling users that we've removed the data because we know it to be potentially incorrect. If we're just not sure about it, we put a flag in the data saying that it looks suspicious for reasons X, Y and Z and that users should be careful using it in contexts A, B and C. In terms of coverage, these are just a couple of the indicators, but this is pretty much the best we can do for a lot of countries for which there has historically been no data publicly available in a cross-country source. For components like total revenue or total tax, around four-fifths of the potential observations exist. When you start to look at the disaggregates, the coverage is still remarkably better than any other effort like this, but there are still some holes. The picture is pretty encouraging, though, especially as the OECD and IMF are expanding their coverage, and each year, as I say, more sources are available to us when we update, which really helps. This year we also added a column for VAT. A lot of researchers had asked for this, because we had a column for domestic taxes on goods and services and then a column for sales taxes and VAT added together, so this year we went back to all the old reports and tried to find VAT data where possible. Hopefully some users find that useful.
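The kind of check behind a "looks suspicious" flag can be sketched as follows. This is an assumed heuristic, not the GRD's documented procedure, and the function name, the 3-percentage-point threshold and the series are hypothetical. The idea: a subcomponent that jumps sharply while total revenue barely moves suggests revenue being shifted between categories, as in the Algeria example, rather than a real change in collection. Flagged years are candidates for a note, not automatic deletions, since resource-dependent revenue can legitimately swing.

```python
from typing import Dict, List


def flag_possible_reclassification(
    component: Dict[int, float],
    total: Dict[int, float],
    threshold_pp: float = 3.0,
) -> List[int]:
    """Return years where a component moves by more than `threshold_pp` points of GDP
    while the total moves by less than half as much.

    Both arguments map year -> % of GDP. Only years present in both series are compared,
    and each year is compared with the previous year that has data in both.
    """
    years = sorted(set(component) & set(total))
    flagged = []
    for prev, curr in zip(years, years[1:]):
        d_component = component[curr] - component[prev]
        d_total = total[curr] - total[prev]
        if abs(d_component) > threshold_pp and abs(d_total) < abs(d_component) / 2:
            flagged.append(curr)
    return flagged


# Hypothetical series (% of GDP): the component jumps and falls back while the total is stable.
goods_and_services = {2001: 2.0, 2002: 8.5, 2003: 8.6, 2004: 2.4}
total_tax = {2001: 30.0, 2002: 30.5, 2003: 31.0, 2004: 29.0}
print(flag_possible_reclassification(goods_and_services, total_tax))  # [2002, 2004]
```

A jump that is matched by a similar move in the total (for example, a genuine rise in collections) is not flagged under this rule, which keeps the check focused on shifts between categories.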
In terms of research that's come out of the GRD, we had an event here in, I think, March 2016, and much like the data there's a two-year lag in getting research out. In March of this year there was an open-access special issue of the Journal of International Development, a collection of nine articles that use the Government Revenue Data Set. These are listed here; I think there might be a copy on the table outside, but it's open access online. A couple of these studies revisited existing results with new, extended data and found something slightly different, and a couple asked questions that we couldn't ask before simply because the data weren't systematically available. I'm not going to talk through each of them, but if any of those titles look interesting, I'd encourage you to have a look at the version online. The data set has been used in a large number of publications. When I was putting this presentation together, I tried to go through Google Scholar and count how many publications have used it, but I quickly gave up because there were many more than I thought. It's featured in Our World in Data, the Mo Ibrahim Foundation uses it in its governance index, and it's occasionally picked up in the media. The goal for us is that when researchers, policymakers, or anyone writing a report or working in an international organization wants to make cross-country comparisons, this would be the go-to data set. I think we're working towards that end, and we're increasingly being recognized as somewhere to get good-quality, reliable data.
The next step is something we've been talking about doing for a long time and finally got underway this year: creating an online dashboard. At the moment the data set is available in Excel and Stata formats, and that's fine for someone like me, because as a researcher I'm probably always going to download data in that format and get stuck into it. But that's not accessible for everyone, particularly those who aren't research-oriented but just want to get to the numbers quickly. I spend a lot of time answering requests from people who want me to make a table or a graph, so hopefully this new tool will free that up. I should mention that this is nothing new: a lot of similar data sets have online tools, but if you've ever used any of them, they're not always particularly user-friendly. Some of them are okay, but they're not great. So we've decided to build an interface around the data set instead of plugging the data set into an existing interface, one that will hopefully help users to really get the most out of this data and increase its usability.
As I've said, as a researcher I'm probably still just going to download the Excel file and go from there, but if you just want to make a quick chart or a quick comparison to embed in a report, for example, it's better to have something more user-friendly. So we want to create a bit of a hub around tax data on the WIDER website and make the guidance we've provided on interpreting these statistics more visible to users, and hope that they take advantage of it. We have a few mock-ups: we're working with a data design company, In Spin, to put this together, and it will hopefully look roughly like this. The colours aren't set in stone, but it should be a pretty user-friendly interface where you can decide which variables you want to see, choose regions, income groups and so on, select your time frame, and plot the data in different kinds of visualizations. The software will also try to be quite smart: it will see how many indicators, which countries and what time frame you've selected, and then suggest the most appropriate type of visualization. We're also going to create country profile pages, which will give a brief overview of tax collection and tax composition in every country. If you look closely you can see this is a mock-up, because the numbers don't quite add up. Here we're going to provide what we know about the tax data for, for example, Colombia. For many countries it might be "this is the tax data for Finland, it's pretty believable, there's nothing wrong, go ahead and use it"; for other countries it might be "there are severe problems here, be careful". So that's a first look at what we think will be ready by the end of this year, at least I hope so, because we've been talking about it for a long time. Hopefully this really works to encourage more people, especially those outside of research, to engage and work with the tax data. We have a few more screenshots in the network café on the PC upstairs, so if you drop by for a couple of minutes you can watch a presentation of them, and it would be great to get some feedback if possible.
To conclude: data quality remains an issue. This data set is only ever going to be as accurate as the underlying data. Almost two years to the day, McMurray, who is in charge at ICTD, where this project started, stood in here and called this data set "better than bad" but refused to use the word "good". I'm going to say it's better than better than bad, but I'm still not going to use the word "good" for it, and I see it as a kind of sticking plaster on the issue. There is an increasing international commitment to providing this kind of cross-country data, and we've seen that from investments by the OECD and others. Perhaps in an ideal world a third party wouldn't have had to create this data set, but that's what has happened. I'd also say there's definitely a balance that we're aware of: while we like to urge caution when using these statistics, we also want people to use the data and engage with it, so we're trying to strike that balance by saying "be careful here, but still use it". For the foreseeable future, and perhaps I'm justifying my own existence here, we definitely see a need for this to continue, because at present none of the other major sources has produced anything close, or anything with as much coverage. So that's a very quick run-through of what the project is about, and I'd love to talk to you more about it later if you're interested. Thank you.