I'm Salvatore Babones and today's lecture is the International Data Infrastructure. It can be easy to forget that data don't come into existence automatically. Data are compiled and published by people and organizations to serve specific purposes. As a result, the data about the world that are available for comparative analysis are highly skewed and are not always the data we might want. Social scientists and policy analysts must do what they can with the data they have, but it is always important to think about what data we don't have and why we don't have them.

Most of the more or less global data we have about the world consist of data about countries. The country is the default unit of analysis for comparative social research. This is because the modern world system is composed of states, and states produce statistics about the countries they govern. In fact, you can see the word state right there in the word statistics. Etymologically, statistics were originally just data about countries, data that were produced by states.

Nearly all global data are really just compilations of country data. These compilations are done by a variety of organizations. The most prominent are the World Bank, the International Monetary Fund, and the Organization for Economic Cooperation and Development, or OECD. There's also a database produced by the Central Intelligence Agency of the United States, the CIA World Factbook. This is a resource for quick access to data online, but it's not generally used by professionals as a broad source of international data.

It should always be remembered that many poor countries have only minimal data collection and compilation capacity. Remember that about half the countries of the world have fewer than 10 million people, and about half the countries of the world have a GDP per capita of less than $5,000 a year.
These small and poor countries tend to have very limited capacity to produce data simply because they have very limited state capacity in general. Imagine the kind of statistics that can be effectively produced by a country like Laos, or El Salvador, or the Central African Republic, and you'll get an idea of just how meaningful or meaningless data can be in international databases. Remember that the data in the international databases are simply the data reported by the countries. They are not data collected independently by the World Bank, or the IMF, or the CIA.

Now, data are reported for countries in the international data infrastructure, but often they pertain to different kinds of units of analysis. They are in one way or another aggregated to the country level. So for example, some international data are purely compositional. They're data for persons or for firms and they're just added up to compile data for the country. National income or gross domestic product is this sort of data. You simply find every person's income, every firm's income, add it all up, and you get the country's income.

Other kinds of data are structural and relate to relationships among individuals. For example, income inequality. No one person has a level of income inequality, so you can't add up everybody's income inequality in a population and get the total society's income inequality. Income inequality is derived as a structural feature of the relationships among people.

Still other kinds of data are integral to the country. They are only meaningful about the country. For example, whether a country is a democracy or a dictatorship is a characteristic of the country that cannot be reduced to the people inside the country.

Still other kinds of data are relational and involve multiple countries. For example, treaty memberships. No one country can be a party to a treaty on its own. All treaties involve at least two countries, and often many countries, agreeing to them.
So you can't have data about a particular country without having data about multiple countries at the same time.

And finally, there are contextual data that, even if they are applied to a country, really apply to an entire system that's much bigger than the country. For example, the total level of global trade could be reported for any particular year and connected to a country. So in the year 2015, the United States exists in a world with a certain level of global trade. But in that same year, 2015, Lesotho exists in a world with that same level of global trade. So these are contextual variables that apply to all countries within the system at a given point in time.

Comparative social scientists make extensive use of the compilations of country data that together make up what we call the international data infrastructure. The international data infrastructure is a collection of routinely used data that cover most, or many, or at least some of the countries of the world, depending on the type of data. Some kinds of data are available for nearly all countries. For example, population is reported with at least some degree of accuracy for 190 or 200 countries. Other kinds of data are only reported for a small number of countries. For example, something like the level of unemployment benefits would probably be available in a database for rich countries only, not for all 200 countries in the world.

There are three big sources of data in the international data infrastructure. The first are IGOs, intergovernmental organizations. These are organizations like the IMF and the World Bank that countries belong to, and countries report their data to the IGO to be collected, collated, and made available to the public. Second are non-governmental organizations, or NGOs. These usually compile their own data, and very often NGO data are not data at all.
For example, the Freedom House think tank publishes an annual rating of countries' levels of democracy or their levels of freedom. These ratings are done by a Freedom House expert panel. Essentially, a group of people meet in a room and give an evaluation: is a country a one, two, three, four, five, six, or seven on different scales of freedom and democracy? That's not really data, but it gets used as data because the press reports these figures as actual figures, and academics even use them as data in statistical analyses.

Finally, there are academic survey data produced systematically by university professors. The two biggest examples are the International Social Survey Programme, ISSP, and the World Values Survey, WVS. These two surveys are multi-country surveys done by universities in partnership around the world. The ISSP has something like 45 countries participating. The World Values Survey has more than 80 countries that have participated in it. Because these are done by academics, they're done slightly differently in every country, and they tend to be very contingent on funding. If a group of scholars in a country can get funding to participate in these programs, they do, but very often funding is not available, and so a country disappears from the survey only to reappear later when funding is made available once again.

For IGO data, nearly all of the data are reported by the countries themselves. They are not collected independently by the IGOs, and thus the data are of questionable value because countries always have political motives for reporting the data they report. Nearly all NGO data are driven by the NGOs' political agendas, and thus are of questionable value because they, to some extent, reflect the desired outcomes or the political mindset of the NGO itself.

Overall, the international data infrastructure is quite strong on economic data, relatively weak on political data, and extremely weak on cultural data.
There are almost no data available on languages, religions, or attitudes. These are really things for which we only have the broadest estimates available, because there's no standard reporting mechanism through which countries report cultural data. Political data are also pretty weak because of a lack of standardization and, obviously, because countries can't agree on what constitutes democracy or on appropriate standards for democratic behavior. But on economic data, there's relatively good agreement around the world and a standardized system for collecting, collating, and reporting economic data.

At the center of the international data infrastructure, as a result, are the data that are needed for the calculation of gross domestic product, or GDP. Gross domestic product is the sum total cash value of all the goods and services produced within a country's borders. There are three major methods for calculating GDP: the income method, the production method, and the consumption method (also known as the expenditure method). The easiest way to understand it is via the consumption method. In this method, arithmetically, GDP is defined as the sum of domestic consumption, domestic investment, government spending, and net exports, that is, exports minus imports. If you think about these four components, what they really sum up to is everything being produced inside a country, minus the things that have been imported into the country and so aren't really being produced by the people of the country themselves.

GDP per capita, or gross domestic product per person, is widely used as a broad indicator of national well-being. This is a very simple index. You just take a country's gross domestic product, usually expressed in US dollars, divide by the number of people in the country, and you get a number called GDP per capita or GDP per person. It's not perfect. It does not represent the amount of money each person in a country gets.
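To make that arithmetic concrete, here is a minimal sketch in Python of the consumption-method sum and the per-person division. The function names and every figure are invented for illustration; they are not real national accounts data, and real GDP compilation involves far more detail than this.

```python
def gdp_expenditure(consumption, investment, government, exports, imports):
    """GDP as consumption + investment + government spending + net exports."""
    net_exports = exports - imports
    return consumption + investment + government + net_exports


def gdp_per_capita(gdp, population):
    """Divide total GDP by the number of people in the country."""
    if population <= 0:
        raise ValueError("population must be positive")
    return gdp / population


# Hypothetical country; component values are in billions of US dollars
# and are made up purely for this example.
gdp_billions = gdp_expenditure(
    consumption=600.0,
    investment=200.0,
    government=250.0,
    exports=150.0,
    imports=200.0,
)
population = 25_000_000

# Convert billions to dollars before dividing by the head count.
per_capita = gdp_per_capita(gdp_billions * 1e9, population)

print(f"GDP: ${gdp_billions:,.0f} billion")
print(f"GDP per capita: ${per_capita:,.0f}")
```

Notice that imports enter with a minus sign: spending on imported goods is already counted inside consumption, investment, and government spending, so it has to be taken back out to leave only what was produced inside the country.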
But broadly speaking, it's a measure of how much money is available in a country to pay for each person's needs. Now, it doesn't mean the person herself or himself has control over the money. The money could be spent by government on your behalf. The money could be used by firms for investment instead of being spent by you on consumer goods. But broadly speaking, if a country has $40,000 GDP per capita, like most European countries, that makes it much better off than countries that have $4,000 GDP per capita, like most of the countries of Central America or Southeast Asia. Those countries, again, are much better off than countries that have less than $1,000 GDP per capita, like many sub-Saharan African countries. So when you're thinking about very broad categories, where one country's GDP per capita is 10 times as large as another's, it's certainly a very useful indicator of well-being. On the other hand, when you try to compare countries at a very fine level, a GDP per capita of $40,000 versus $44,000 per person, that starts to become much less meaningful, because GDP doesn't really capture exactly what we mean by well-being.

Now, the statistics that are used in the calculation of GDP have been systematically collected since the 1940s in what's called the System of National Accounts, or SNA. The System of National Accounts is supervised by the United Nations, but is really an agreement between the United States Treasury, the European Union, the OECD, the World Bank, and the IMF. The big economic players in the world decide on the technical details of how GDP will be calculated using national accounting.

Many things are very well measured in the System of National Accounts: things that countries and economic managers are interested in. For instance, total consumer spending is in the System of National Accounts because in order to calculate GDP, you have to know domestic consumption.
But the distribution of consumer spending is not necessarily very well understood. In other words, are some people spending a lot and other people spending very little? Also, the categories in which people spend are very well delineated in places like the United States and Western Europe, but can be much more poorly measured in poor and developing countries.

A second thing that's very well measured in the System of National Accounts is the average level of wages and the average level of savings. But again, the distribution of wages and the distribution of savings are not very well measured. So for example, average wages are measured by finding out how much all companies pay in wages and dividing by the number of people. So the average wage is something very different from the median or the typical wage. In one country, most of the wages may be paid to managers and very little to workers, while in another country, managers are paid very little and workers receive a lot of income. The average might be the same in both countries, but the distribution could be very different. Unfortunately, the System of National Accounts is based on aggregate quantities like the total wages paid, and as a result tells us very little about the distribution of wages, and the same goes for savings.

Other items that are measured very well are price levels, international trade, investment spending, and government spending: things that are very important to economists studying GDP, but maybe aren't so important to people trying to measure the well-being of individual people.

Systematically lacking from the international data infrastructure are economic data that don't enter into GDP calculations, as well as non-economic data. So one thing that doesn't go into GDP calculations is the household economy. Housework is not part of GDP. Voluntary work is not part of GDP. Unpaid work of all kinds is not part of GDP. GDP is really quite warped when non-monetary transactions become monetized.
So for example, in Australia most people clean their own houses, but in Singapore people hire cleaners, hire maids, to clean their houses for them. Well, in Singapore that becomes part of GDP. It appears as if the economy is richer because there are lots of people cleaning houses and getting paid for it. In Australia, where people tend to clean their own houses, GDP appears to be lower because less money is being spent on house cleaning. Now, in both countries the house gets cleaned. There is not a difference in welfare in the sense of how well off people are. People in both countries are living in clean houses. People in both countries have their lawns mowed. People in both countries have their repairs done. But in a country like Australia, where people tend to take on these tasks themselves, it's not recorded in GDP. In places like Singapore, where these tasks are paid, it is recorded in GDP.

Perhaps the most important aspect of the household economy that goes unrecorded in GDP is childcare. Childcare is typically not compensated labor. It's something that people, and let's face it, very often women, do for free with no economic compensation. As a result, it's not included in GDP. Having children, as a result, might take away from GDP because people leave the workforce and instead spend their time caring for children. Now, that doesn't mean that society is getting less done; raising children is definitely work. But it does mean that GDP is not necessarily an unbiased indication of how well off a society is. Again, this becomes a problem when a society moves from a model of people caring for their own children to a model where children are placed in childcare. It appears in the national statistics as GDP growth because more people are getting paid to do more things.
But there's no actual improvement in welfare or standard of living. It's just an increase in the number of transactions in the society that are monetary, transactions where money actually changes hands.

Another particularly large deviation or bias in GDP is introduced by home ownership. When you rent a house, that's money changing hands. A person rents a house, the landlord provides a housing service, and that's recorded in GDP as a service being rendered. When you own a house and live in it, of course, no service is being rendered. Now, developed countries make an adjustment for this called imputed rent for homeowners. It's the rent you would be paying yourself if you rented your house from yourself, and this is actually a very large component of GDP in countries like the United States and Australia, where home ownership is very common.

But it's interesting that countries like the United States and Australia, and for that matter the entire System of National Accounts, choose to include an imputation or an estimate for the real economic benefits of home ownership, but choose not to include an imputation of the economic benefits of raising children. So this is a criticism of how GDP is a very gendered statistic. The things that the people who calculate national income care about, and let's face it, those people are mostly men, get imputations that are included in GDP even though no money changes hands, like living in your own house. But things that those people don't value, like caring for children or caring for the elderly, do not receive an imputation and are not included in GDP.

Of course, non-economic data don't enter into GDP at all, and thus most of them are completely lacking in the international data infrastructure. The use of GDP per capita as a broad measure of national well-being is controversial but widespread, and it's unavoidable. GDP is in essence a measure of the size of the country's monetary economy.
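The house-cleaning and imputed-rent examples above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `Household` class, its field names, and all the dollar amounts are hypothetical, and this is not how national accounts are actually compiled. The only point is that two economies doing identical work can show different measured GDP depending on whether money changes hands.

```python
from dataclasses import dataclass


@dataclass
class Household:
    market_output: float        # goods and services sold for money
    home_services: float        # value of cleaning, childcare, repairs
    home_services_paid: bool    # True if someone is hired to do them
    imputed_rent: float = 0.0   # estimated rent on an owner-occupied home


def measured_gdp(households):
    """Count monetary transactions plus imputed rent, as described above."""
    total = 0.0
    for h in households:
        total += h.market_output + h.imputed_rent
        if h.home_services_paid:
            total += h.home_services  # only counted when money changes hands
    return total


def work_actually_done(households):
    """Count home services whether or not anyone was paid for them."""
    return sum(
        h.market_output + h.imputed_rent + h.home_services for h in households
    )


# Two invented economies, identical except for who does the housework.
do_it_yourself = [Household(50_000, 10_000, False, 8_000) for _ in range(3)]
hire_it_out = [Household(50_000, 10_000, True, 8_000) for _ in range(3)]

print("Measured GDP:", measured_gdp(do_it_yourself), measured_gdp(hire_it_out))
print("Work done:   ", work_actually_done(do_it_yourself),
      work_actually_done(hire_it_out))
```

In this sketch imputed rent is counted in both economies even though no money changes hands, while unpaid home services are counted in neither, which mirrors the asymmetry the lecture criticizes.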
Some GDP alternatives attempt to account for other aspects of life by monetizing them. So green GDP measures try to put a price on the environment and make a charge against GDP whenever the environment is harmed. The Genuine Progress Indicator is a similar attempt to put a value not just on the environment but also on quality of life, and even to put a value on vacations. Other attempts to replace GDP try to broaden the concept of well-being so that it's not just economic. The United Nations Human Development Index includes education and health indicators as well as GDP to create a broad human development index. Famously, the small country of Bhutan uses something called gross national happiness, which I put in big quotation marks because it's really very difficult to measure happiness.

But when it comes down to it, the international data infrastructure is primarily designed to produce one thing, GDP, and as a result those of us who use the international data infrastructure in a social policy context inevitably use GDP per capita as one of the most important variables in any comparison we do. In fact, GDP alternatives generally have to rely on the data that are produced in the creation of GDP. So conceptually we might prefer alternatives to GDP for measuring national well-being. The problem is that even if you want other data, they just don't exist. Practically speaking, the international data infrastructure consists of GDP, the components of GDP, and things that are necessary to produce GDP per capita, like population statistics, and very little else.

As a result, when you see analyses of the world or comparisons of countries in the world, you always see the same odd variables popping up again and again. My personal favorite is immunization, where you see people using data on measles and DPT immunization rates.
Very few people really care very much about measles and DPT immunization rates, but they are reported for nearly every country in the World Bank's World Development Indicators, and since they're there, people use them. People use them as a proxy for the robustness of the health system, as a proxy for child health, as a proxy for government effectiveness, just because they're reported. Whether or not the data are really right for the purpose is a secondary consideration.

Similarly, for studying education we tend to use the gross school enrollment ratio. Now, the gross school enrollment ratio is the number of people attending school divided by the population of a given age. It's not actually the percentage of people of a given age who are in school. In a country like Australia or the United States, it may be very obvious that six-year-olds are in first grade, so if you wanted to know the percentage of six-year-olds who attend school, you would just take the number of first graders divided by the number of six-year-olds and you would get around 100%. But in many countries people of all ages might be in first grade, because schooling is relatively inaccessible, or because many people are unable to go to school, or have to repeat grades, or attend school at a level that's wildly inappropriate for their ages. So you could find that for first grade the gross school enrollment ratio might be 150 or 200%, while for eighth grade it might be something like 10 or 20%. These figures are highly variable and not really very reliable.

Similarly, infant mortality rates are reported for pretty much every country in the world, but they're highly questionable. Advanced countries like the United States and Australia tend to have relatively high infant mortality rates because, first, all infant deaths are recorded and, second, many pregnancies that are very complicated make it through to delivery, whereas in poor countries those pregnancies would never have reached the point where the child was born.
So as a result of advanced medicine, ironically, the infant mortality rate tends to be higher in the most developed countries than it is in other countries. Now again, as with GDP per capita, if one country has an infant mortality rate of 10 children dying for every 100 born and another country has an infant mortality rate of 2, we can be reasonably confident that the country with 10 has worse health than the country with 2. Similarly, when infant mortality rates get up as high as 50 or 60 per 1,000 children, that's cause for alarm. But when comparing countries with rates of 1.9 versus 2 infant deaths per 1,000 births, these kinds of considerations of data quality become much more important.

Another indicator that's not as meaningful as it sounds is life expectancy at birth. Life expectancy at birth is calculated by looking at the age structure of the population as it exists today and trying to figure out what the death rate is for every single year of life. So we don't actually know how long you will live when you are born, but we can calculate, for all the people alive today, what the death rates are at each particular age, and then we can multiply through all those death rates to find an imputed life expectancy at birth. Once again, the difference between a life expectancy at birth of 50 years versus 80 years is surely important and meaningful, but differences between 78 years and 78.5 years are highly questionable and might be due much more to data considerations than to any real difference in life expectancy.

In general, much of the data in the international data infrastructure is of highly questionable value. Some of it is simply completely fabricated. For example, many countries, most recently Nigeria and India, have announced massive upward revisions of GDP due to the use of improved economic estimates. Now, the improved economic estimates are to be applauded.
It's really good that these countries have improved the measurement of GDP, but if you look at a trajectory of GDP per capita in the international data infrastructure for Nigeria, you'll find that in 2014, 2015, 2016 Nigeria grew by enormous amounts: 10 or 20 percent per year GDP growth rates. Now, that's not real growth. That's simply the change from one way of measuring GDP that gave a lower figure to another way that gave a higher figure. As a result, the growth figures reported are not necessarily very meaningful.

What's more, many countries report planned or targeted outcomes as actual data. Notoriously, China always reports GDP growth figures that are within a tenth of a percentage point of the planned figure. For 2015 the projected growth rate was 7 percent, and the quote unquote actual growth rate was 6.9 percent. In my own research I suggest that China's real GDP growth rate is more likely something like 3 or 4 percent, and even that 3 or 4 percent is entirely due to government deficit spending. The real economy in China outside of government spending may not be growing at all. Yet the government continues to report growth rates that almost exactly hit its target.

Similarly, NGOs have political agendas. Unfortunately, some present themselves as unbiased when in reality they are very biased. I already mentioned Freedom House, but another good example is Transparency International. Transparency International is an NGO funded mainly by the UK, US, and Nordic governments and by the OECD, and it's an organization to fight corruption. That sounds very noble, to fight corruption, but Transparency International has very specific definitions of corruption that involve illegal behavior. So for example, it's definitely corruption when a business person pays a bribe to a government official in order to receive a contract. There's no doubt that that's corruption, and Transparency International would appropriately characterize that as corruption.
But what about when a business person gives a large campaign contribution to a politician, that politician is then elected, and that politician coincidentally promotes or passes legislation that benefits the contributor? Well, that's not corruption, because that's entirely legal behavior. So their definition of corruption is a definition that penalizes poor countries where there's poor rule of law, but makes rich countries look inappropriately good, or at least makes them look less corrupt than we know they really are.

Key takeaways. First, the country is the default unit of analysis for data in the international data infrastructure. Second, the amount of non-economic data in the international data infrastructure is very limited. And third, despite many criticisms, gross domestic product per capita, or GDP per person, remains the most widely used indicator of national well-being.

I hope you enjoyed this lecture. To find more of my popular writing, visit SalvatoreBabones.com, where you can also sign up for my monthly newsletter.