One of the things people like to talk about, people who maybe don't have much familiarity with statistics but enough to feel their way around, is correlation. We understand correlation to be the degree to which two things move together. So when economists say things like, for example, increased trade is good for the economy, increased trade is associated with increased household income, with less unemployment, with less poverty, one of the things people ask is: well, what's the correlation between those two things?

Correlation is a good and a bad thing. It's good in the sense that it's this nice clean number. It goes from zero to one, right? Or negative one to positive one, depending on what kind of correlation you're using. It's this nice little compact thing where we understand that zero means these two things aren't correlated and one means they're very highly correlated. So people have this impression that the closer your correlation gets to one, the more correct your statement is, whatever statement you've just made.

So there are some things we have to be careful about when it comes to correlation. One of them is what we call a spurious relationship. A spurious relationship is one that, statistically speaking, looks like a nice strong relationship. Here are two things, they seem to move together, and they've got a nice high correlation. That's all well and good, but what you're seeing is simply due to random chance. There isn't any real relationship here; it's just randomness. We call that spuriousness, or a spurious relationship.

Case in point. You flip a coin, it comes up heads, and you look at the news and see that the stock market went up. Next day, you flip a coin, it comes up heads, you look at the news, and you see the stock market went up.
Next day you flip a coin, it comes up tails, and you see the stock market went down. And you say, good God, I've got the magic coin that predicts the stock market, right? Every time it's heads, the stock market goes up. Every time it's tails, the stock market goes down. Now, we all know that the coin has no relationship whatsoever to the stock market. What's going on is that, by random chance, you happened to match up the coin with the stock market. That's a spurious relationship.

The unfortunate thing about spurious relationships is that, by random chance, some of them are going to persist for a long time. So we have an example here, and this is actual data: the number of sunspots in the current year, from 1960 to 1980, and the number of Republicans in the Senate one year later. What you see, over a 20-year period, is that as the number of sunspots declines, one year later the number of Republicans in the Senate declines. As sunspots go up, one year later the number of Republicans in the Senate goes up. Right? And somebody might look at this and say, well, this has got to be a spurious relationship. There is no relationship between sunspots and the number of Republicans. It's just random chance. That's true. Except that this thing persisted for 20 years. Twenty years.

And it leaves you wondering: even though I know rationally there's no relationship between sunspots and Republicans, this apparent relationship is showing up in the data, so maybe I could use it to predict election outcomes. You're making the statement: I know these two things aren't related, but I also know they're showing up as correlated in the data, so maybe I can make use of this. Here's the problem. It took 20 years for this data to accumulate. So this nice picture that you're seeing, you would not have seen in 1960, and you wouldn't have seen it in 1970.
It's not until 1980 that there's enough data to construct this picture, and you look at it and say, oh my God, look at this thing. Let's start using sunspots to predict Republicans. Now, the problem is that no spurious relationship is guaranteed to persist. If you move time forward and look at the next 15 to 20 years, what you find is that the relationship disappeared. Again we see the number of sunspots going up and down, and one year later the number of Republicans in the Senate going up and down, but they're no longer moving in lockstep with the sunspots.

This is the problem with a spurious relationship. Because it's due to random chance, even though it's there and it exists, there's nothing to guarantee it's going to be there tomorrow. So if you start to base decisions on this thing, you're basing decisions on a relationship that could disappear at any moment. And this is what would have happened if we had started making our election predictions based on sunspots: we would have made very bad predictions, because they would have been based on a spurious relationship that just disappeared. So one problem with correlation is that we have to be careful about spurious relationships, things that appear to be correlated and in fact are only correlated due to random chance.

Another thing we have to be careful about is what we call third variable effects. This is two things that are correlated, and they're correlated because of some real underlying phenomenon. It's not random chance. However, the underlying phenomenon is not that one of the two things causes the other. To give an example: if you look at population data in the United States, you will find that communities that have more churches also experience more crime. The two are correlated. More churches goes with more crime. And I can tell you that the relationship is not spurious. It's not random chance; there's actually a relationship here. But here's where we go wrong.
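Stepping back to the spurious-relationship point for a moment, here is a minimal sketch of why such relationships show up and then vanish. It does not use the sunspot or Senate data. Everything in it is an assumption made for illustration: the series are pure random noise, and the names `pearson` and `best_spurious_match` are hypothetical. The idea is that if you search through enough unrelated series, one of them will line up nicely with your target over 20 "years" just by chance, and nothing about that match carries over to the next 20.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_match(n_candidates=1000, n_years=20, seed=0):
    """Search unrelated noise series for the best match to a noise target.

    Returns (in-sample r, out-of-sample r) for the candidate whose first
    n_years correlate most strongly with the target's first n_years.
    """
    rng = random.Random(seed)
    # The target is pure noise: nothing can truly predict it.
    target = [rng.gauss(0, 1) for _ in range(2 * n_years)]
    best = None
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(2 * n_years)]
        r_in = pearson(candidate[:n_years], target[:n_years])
        if best is None or abs(r_in) > abs(best[0]):
            # Score the same candidate on the years it was NOT selected on.
            r_out = pearson(candidate[n_years:], target[n_years:])
            best = (r_in, r_out)
    return best


r_in, r_out = best_spurious_match()
print(f"first 20 years: r = {r_in:+.2f}")
print(f"next 20 years:  r = {r_out:+.2f}")
```

The first number is selected to look impressive. The second is just another draw of noise, so there is no reason to expect it to stay high, which is the sunspots story in miniature.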
We go wrong if we think that, just because churches and crime are correlated, churches must cause crime, or maybe crime causes churches, right? What's going on is what we call a third variable effect. A third variable effect means that there is some other phenomenon that is correlated with these two things, so when this other phenomenon does what it does, these two things move and appear to be moving together and appear to be related, when in fact they aren't. The third variable here is population size. The more people you get, the more crime you'll get, because you have more people. But the more people you get, the more churches you'll get, because you have more people.

There's a beautiful example of this. It happened, I want to say, 15 or 20 years ago. A major soft drink manufacturer was rolling out product, trying to expand market share, and they were rolling it out in India, where at the time it was a relatively new thing to have vending machines. So this manufacturer was selling the soft drink in vending machines, and an interesting thing happened. The company introduced vending machines in a city in India, and a couple of weeks later there was an outbreak of hepatitis. They introduced the vending machines in another city in India, and a couple of weeks later there was an outbreak of hepatitis. Then a third city, and a couple of weeks later, an outbreak of hepatitis. This kept going on, and it got to the point that health officials were becoming quite concerned that this company's product was tainted in some fashion and was causing hepatitis.

This is a good example of a correlation. There's a very tight correlation: the company puts in the vending machine, and two weeks later, hepatitis. What was going on, interestingly, was a third variable effect. That is, these two things, the company's product and the hepatitis, were indeed correlated, but they weren't causal: one wasn't causing
the other. What was going on was this third variable effect: the children largely couldn't afford to buy a can of this product, so they would pool their coins, buy one can, and share it amongst themselves. It was the sharing of the product that was causing the hepatitis. It's a third variable effect.

So the warning here with correlation is two things. Just because you see a tight correlation doesn't mean that there's actually a relationship. It could simply be random chance; we call that spuriousness. Furthermore, just because you see a correlation and there is a relationship there doesn't mean that the relationship is causal. It could be a third variable effect, where neither of these two things is causing the other; it's a third variable that's causing both of them.

Another thing we have to be careful of when it comes to correlation is reverse causality. A good example of this: every morning you set your alarm, and every morning the sun rises. There is a causal relationship here. It's not spurious and it's not a third variable effect; the causal relationship is between these two things. But just because you set the alarm and the sun rises doesn't mean that your setting the alarm causes the sun to rise. In fact, the causality moves in the other direction: because you anticipate the sun rising at a certain time, you set your alarm appropriately. So this is one more thing we have to be careful of when we talk about correlation. Just because we see a relationship, and it's not spurious, it's real, and it actually is causal, we've got one thing causing the other, that doesn't mean the causality runs in the direction we think it does.

One interesting set of correlations to look at is the relationship between economic freedom and socioeconomic outcomes. I'm showing you here the relationship between economic freedom and the Global Peace Index. Every dot is a country, and they're measured horizontally by
economic freedom, as measured by the Fraser Institute. To the right means the country experiences more economic freedom; that is, the government is less intrusive into people's economic decisions, taxes are lower, regulation is less, this sort of thing. To the left is less economic freedom, so the government is more intrusive in people's economic decisions. Up and down is the Global Peace Index. Up means the country is less peaceful. It's not just a matter of being less peaceful with regard to neighboring states; the country is also less peaceful toward its own citizens. So if they use violence to put down protests, this sort of thing, the country would score high on this peace index. It's an inverse scale: high means less peaceful, low means more peaceful.

What you see here is an apparent correlation. There are clearly exceptions, but remember, this is a stochastic relationship, so exceptions are to be expected. What's interesting is the trend. On average, it appears that as countries are more economically free, they also score better on the Global Peace Index.

Interestingly, you find this same kind of phenomenon, correlations of economic freedom, with all sorts of other interesting things. Countries that are more economically free tend to have, on average, lower poverty rates than countries that are less economically free. And this is not just true for the rich countries; it's also true for the poor countries. Because you might say, well, yes, rich countries tend to be economically free because we have the leisure to be concerned with economic freedom and to tell the government to stay out of our lives, we want to do what we want to do. Oh, and by the way, because we're rich, we're going to have less child labor and less poverty. Okay, fine. But if you look at the poor countries, poor countries that are economically free, although they have very high child labor rates and very high poverty rates, those poverty rates and
child labor rates are lower for the poor, economically free countries than they are for the poor, economically unfree countries. So no matter how you slice it, you see this recurring theme: countries that are more economically free score better for child labor, and they score better for poverty. Interestingly, they score better on environmental measures like pollution and deforestation. You see in this data that they score better for peace. They also score better for income, which is kind of to be expected, right? When you think of economically free countries, you think of the more developed countries, which also have high incomes. But if you look at the poor countries, in poor countries that are economically free, incomes are low, but they're higher than they are in poor countries that are economically unfree. Interestingly, you see the same thing with inequality: countries that are more economically free have less income inequality than countries that are less economically free.

So there are interesting correlations here, and all the earlier arguments still apply. How do we know these relationships aren't spurious? How do we know that there's not a third variable effect? These are all very good questions, and there are economists who look into this data and address them. What is interesting to me is that no matter how you slice the data, whether you're looking at differences among countries, or among states in the United States, or among cities, or across time, the same pattern keeps emerging again and again: you get better socioeconomic outcomes in countries, cities, and states that are more economically free.

Now, one possible argument here is that economic freedom causes better outcomes, because we're seeing this correlation. And of course you can't say that, because we don't know. Is it reverse causality? Is it that countries that are
richer, that have cleaner environments, that have less inequality, demand more economic freedom? Does the causality go the other direction? Is there a third variable effect, something we haven't thought of, causing both of these things, the good outcomes and the economic freedom? I don't know the answer to that. So what I cannot say is that this data indicates that economic freedom causes good things.

What I can say, though, is this. Because every way you look at it you see the correlation going in that direction, more economic freedom correlated with good outcomes, what you can say is that economic freedom does not cause badness. That is, correlation does not imply causation, but the absence of correlation does imply the absence of causation. Because I don't see economic freedom correlated with bad outcomes, I can conclude that economic freedom does not cause the bad outcomes.

Now, there's a technical footnote here, which goes along these lines: it is possible that there could be some third variable that is negatively correlated with economic freedom and positively correlated with the outcome, and that the magnitude of its effect is large enough to outweigh the magnitude of the effect of economic freedom, so that in fact the relationship does go in the other direction and we're just not seeing it here. I'm not going to go into that argument, largely because it's highly technical. But I will tell you this: it is an argument, but there's a tremendously high bar for that argument to get over to become meaningful. Generally speaking, you're safe to make the statement that correlation does not imply causation, but the absence of correlation does imply the absence of causation.

So, about the Gini coefficient: would you say it's an accurate measure of income inequality?

The Gini
coefficient question is a good one, and it leads into the next topic. There are a variety of economic problems, in my opinion, with the whole idea of inequality. It ignores half of the economy. When we look at transactions and think about inequality, we look at the people who are accumulating dollars; we don't look at the people who are accumulating goods and services in exchange for those dollars. But those are economic issues. There are also some statistical issues with the idea of inequality. Put aside what particular measure you use; just the concept of inequality raises a problem, at least statistically, and that's called aggregation bias.

Aggregation bias occurs when you take a whole bunch of data, you average pieces of it together, and you then look at those pieces and draw some conclusion about the individual people on the basis of the averages. Sometimes, not always, but sometimes, the conclusions you draw can be faulty.

I'll give you a good example. Let's suppose we're going to calculate income inequality for a group of people. We ask everyone to come into the room and we say, what is your income? You've just started your career, so your income is very low. You're a little bit further on in your career, so your income is higher. I'm further along, so mine's higher. These two gentlemen are coming close to retirement, so their incomes are quite high. If we calculate inequality for this table, we get some decent inequality, from low incomes to very high incomes.

So we go away, and we reconvene 10 years later. Ten years later, you two are sitting in this position: you're mid-career and your incomes are moderate. I'm sitting over there; I'm close to retirement and my income's quite high. These two gentlemen have retired; they're gone. And replacing you two are two young people who've just entered the job market with low incomes. If we calculate inequality again, we again get this decent inequality:
we've got poor people here, we've got rich people here. Well, here's the interesting thing. If this is how we progress around the table, people coming into the job market, moving up, mid-career, retirement, go live in Florida, then over the course of our careers every one of us earns exactly the same income. So over the course of our careers we have perfect equality, even though every time we look, we see inequality.

Now, I'm not making the argument that there is no inequality in the world. The argument I'm making is that when we go to measure inequality, we take snapshots of the world, like looking at this table and saying, okay, what's the difference in our incomes? And in doing that, we can miss large components of equality.

Good case in point. We talk a lot in this country about inequality, and we'll say things like: in 2000, the poorest 20% of Americans earned 3.8% of all the income, and in 2007, the poorest 20% of Americans earned 3.4% of all the income. You look at those two things and you say, well, look, the lot of poor Americans has not improved. In fact, it's worsened a bit over these years. They used to get 3.8% of all the income; now they earn 3.4%. So we're concerned about that, and we talk about the stagnation of poverty: there are people here, they're trapped, and they're always there. There's always 20% of the population earning 3% of the income, whatever it is. That is, at least in part, an aggregation bias. We've taken a bunch of individuals, we've put them together into a single measure, and we look at that measure and assume that what is true of the measure is true of the individuals. That's not necessarily the case.

I'll give you another example. In 2000, the youngest 20% of Americans had an average age of 7.1 years. In 2010, the youngest 20% of Americans had an average age of 6.9 years. Now, if you apply the same logic to these people's ages that we did to their
incomes, you would conclude that these young Americans not only did not get older, they actually got younger over the course of 10 years, right? Their average age was 7.1; now their average age is 6.9. It's interesting to think about, because nobody got younger. We all got older, and yet the youngest 20% of Americans have a younger age. How is this happening? Of course, what's happening is that people are aging over the course of this decade and they're no longer part of the youngest 20%, and new people are being born, and they're born into the youngest 20%.

So when we talk about the youngest 20%, that's an aggregation, and when we compare the youngest 20% in 2000 to the youngest 20% in 2010, they're different sets of people. Some are the same, right? But a lot of them are different. New people have come in; old people have gone out. Just like in the example of the table: we come back here in 10 years, these guys are gone, I've moved over there, you're over here, and we've got two new people. It's a different set of people.

Similarly, when we talk about the poorest Americans in 2000 and the poorest Americans in 2007, some of those people are still there. Some of the people who constituted the poorest Americans in 2000 are still amongst the poorest Americans in 2007. But a lot of them are different. Some of the people who were amongst the poorest in 2000 now have higher incomes; they're no longer amongst the poorest. We've had immigrants, we've had young people enter the workforce, and they're now amongst the poorest Americans in 2007. They weren't there before. So, at least in part, it's a different set of people.

The moral of the story is: be careful when you look at aggregated data, averages of groups of people. What's true of the average, what's true of the aggregation, is not necessarily true of the individuals that comprise the aggregation. A beautiful example of this is this picture. We hear this thing about stagnation,
wage stagnation amongst the middle class. What you're seeing here, the blue line, is median worker compensation. Just to be clear about this: compensation means people's incomes plus employer-paid benefits, so everything that you get as a result of your job. Median worker means we've lined up all the American workers from poorest to richest and we're taking the one in the middle. And 2014 dollars means that it's adjusted for inflation. There's nothing special about the years; these are all the years that were readily available from the Census Bureau at the time. What you're seeing are the years 1992 through 2013, and the blue line is pretty flat. This is the story: the blue line is what leads us to the conclusion that median worker compensation hasn't changed over the past 20 years.

Now, if you look at the red line, this is a little bit different. This is compensation over the median career, and you have to get your head around what's going on here. Picture the red line as follows. In 1992, the Census Bureau asked a set of workers, what's your income? And then we pick the median one from this set of workers. In 1992 these are people who are just joining the labor market, so think 20-year-olds, 22-year-olds, something like that. The median income of these 20- to 22-year-olds is what you see on the left side of that red line. Then each year the Census Bureau goes back and asks those same people, what's your income? And what you see is that over the course of their careers their income is rising and rising and rising. It kind of plateaus around 2007, right? But it's certainly not the story of stagnation.

That red line represents an actual person's experience going through the course of his career. He starts out low, he earns more, and eventually he ends up at some higher level of income. That's a very different story than the blue line. The problem with the blue line is that it suffers from
aggregation bias. With the blue line, what you're seeing is, in 1992, the median income for all the workers, and in 2013, the median income for all the workers. What we're missing is the fact that the group of workers in 2013 is different from the group of workers in 1992. So although each worker's income, maybe not each worker's, but at least the median worker's income, was rising over time, the median for all the workers remains constant. So in a perverse way, the statement "median worker incomes have stagnated" is, in English, correct, but it does not characterize what's actually going on. What's actually going on is that the workers are earning more money over time.
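The table example and the blue-line and red-line picture can be put into a small simulation. This is only a sketch under made-up assumptions, not the Census Bureau data: the five career-stage incomes in `STAGE_INCOME` are hypothetical, as are the function names, and it assumes exactly one person enters and one retires each period. It uses the Gini coefficient as the inequality measure, computed from the mean absolute difference between every pair of incomes.

```python
from statistics import median

# Hypothetical income (in thousands) at each of five career stages.
STAGE_INCOME = [20, 40, 60, 80, 100]


def gini(values):
    """Gini coefficient via mean absolute difference (0 = perfect equality)."""
    n = len(values)
    mean = sum(values) / n
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * mean)


def simulate(periods=9):
    """Rotate people around the table: one enters and one retires each period."""
    stages = len(STAGE_INCOME)
    snapshots = []   # incomes of everyone at the table, one list per period
    lifetime = {}    # entry period -> total income earned inside the simulation
    for t in range(periods):
        table = []
        # One person entered the job market in each of the last `stages` periods.
        for entry in range(t - stages + 1, t + 1):
            income = STAGE_INCOME[t - entry]
            table.append(income)
            lifetime[entry] = lifetime.get(entry, 0) + income
        snapshots.append(table)
    # Keep only people whose whole career falls inside the simulated periods.
    full_careers = [total for entry, total in lifetime.items()
                    if entry >= 0 and entry + stages <= periods]
    return snapshots, full_careers


snapshots, full_careers = simulate()
print("snapshot Gini each period:  ", [round(gini(s), 3) for s in snapshots])
print("snapshot median each period:", [median(s) for s in snapshots])
print("one person's income path:   ", STAGE_INCOME)
print("Gini of lifetime incomes:   ", gini(full_careers))
```

Every snapshot shows the same inequality and the same flat median, which is the blue line. Each individual's income path rises through the stages, which is the red line. And the lifetime incomes are identical, so measured over whole careers the Gini coefficient in this toy setup is zero.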