The social sciences are being transformed by the challenge of big data. I want to try to explore that tonight with a simple argument: that there's been a sea change in the nature of social science thinking and that, like it or not (and many social scientists don't particularly like it), we are now seeing a form of data-driven analysis which is really reshaping the way we do sociology.
I want to begin by taking you back to something I wrote ten years ago, nine years ago, a famous paper and for some people an infamous paper. I wrote it with my colleague Roger Burrows, it was called The Coming Crisis of Empirical Sociology, and the argument in this paper was a simple one. It was to say that most social scientists, when they do research, do two or three things: they interview people, or they do surveys and get people to fill in questionnaires. A few people also do ethnography, but that's not very common these days. And my argument was this: those methods have been around in the social sciences for decades and they haven't actually changed that much. And yet over the last 20 or 30 years we have seen the proliferation of digital data. And I argued that sociologists had to take this world seriously. We couldn't just afford to sit on the margins of this world. We had to try and get our hands dirty and think about how we can use these kinds of data for our own purposes. Now this was a very controversial argument. One of the referees of the paper said it was the worst paper he'd ever read.
Fortunately, the other referee liked it, which indicates very much the flavour of the dispute about whether social scientists should use this data or just steer clear of it. And there are many critical comments made about the problems of big data, digital data. So, for instance, there is the fact that much big data does not contain any information on the motives of people, on the hermeneutic dimension of life. If you interview people you can ask them why they do things, but you can't do that using big data. All you get is the traces of people's credit card histories or telephone histories or Google searches or whatever it is.
Similarly, it's often said that you can't really establish causal links using this data. You can find fantastic patterns, you can explore networks between all sorts of connections or practices or transactions, but you can't really understand what is causing what in the way which you conventionally can using various kinds of modelling techniques in the social sciences. So it's quite a common critique of big data to say that it looks very interesting, but how have we actually been able to prove that something caused something else in a non-trite way, in a deep way?
Similarly, the world of big data is not a neutral world. It's not a neutral world because big data is only generated if you are part of the community of big data users, if you've got a mobile phone or if you're on the internet, and this is a world which is skewed towards the rich and powerful. So if you rely upon big data then there's a kind of inherent power bias in the sources you use. I was at a seminar a few days ago and they said, well, isn't the idea of big data another example of the way that the north of the globe has more power than the south of the globe, where there's an informal economy, where things aren't recorded and where these activities are invisible to transactional data?
Also, and this is something I'm going to talk about in a minute, one of the issues we have in social science, if you take a kind of Thomas Kuhn approach to how social science develops, is that you need exemplars of good studies which show how you can use big data in an interesting way. And it can be argued that we don't yet have many good exemplars of how the deployment of big data can actually improve our understanding and our analysis of the social world.
So you can see the stakes, and I think really this is a very serious debate because the future of the social sciences is at stake. You can argue the 20th century was the rise of the social sciences. In most nations the social sciences become massively more powerful as the 20th century advances. You could argue, in a pessimistic view, that the 21st century could see the decline of the social sciences and the rise of the natural sciences again, because you're now finding physicists and computer scientists doing social research using big data and ignoring what social scientists do. So the future of academic disciplines is very much tied up with this debate.
I think the fundamental issue is that you get all these sorts of graphics, and I'll show you some in a minute, but how do you make sense of them sociologically? You get all sorts of beautiful pictures and patterns and diagrams, but the fundamental issue is: so what? What does it actually tell us about the world in a way which is meaningful and advances our understanding?
I actually want to argue today that in fact we are now seeing social scientists beginning to make sense of these kinds of innovative data sources. I'm going to argue that really what we've seen is the rise of very powerful modes of social science thinking which depart from traditional modes. Traditional modes of social science thinking have predominantly been theoretical: they've been philosophical, they've been historical and fairly abstract.
Instead, what we're now seeing is high-profile social scientists being much less focused around theory and much more focused around using data of all kinds. As I'll say, not all of them use the big data label, and talking about big data may be a bit confusing, but it is a data-driven kind of analysis. So we're really seeing a very radical shift in the repertoire of how social scientists establish themselves as commentators on affairs and how they come to the attention of governments, of policy makers, of the media.
For those of you who are social scientists or intellectuals, you will hopefully know these four people. There's Ulrich Beck and Jürgen Habermas, very eminent German thinkers; Anthony Giddens, former director of the London School of Economics; and Michel Foucault. So my point is this: 20 years ago, 30 years ago, these would have been the leading social scientists of their day. Everyone listened to them, and they had a huge influence on public debate. None of them knew much about data. They were not data people. They were people who had a fantastic command of history, philosophy and social theory, and were very good at synthesising. They were not people who sat at the computer and analysed surveys or data.
But this is the challenge I throw out to you. I threw this out to my students a few days ago: can we find an equivalent to those people today, that kind of intellectual? I don't think you can. I think you've actually seen a shift away from those people who make grand claims: the risk society in the case of Beck, neoliberalism in the case of Foucault, or whatever. Those kinds of grand claims are no longer so significant. But instead you are finding other kinds of social scientists coming to the fore. I'm going to talk about three of them now. None of them use the big data word particularly, but they are all doing data analysis. It's the data which is making their names famous. First, Robert Putnam, the Harvard political scientist.
I'm not sure how prominent this book was in Germany, but certainly in the US and in the UK, ten years ago, this was the dominant book. It influenced governments all over the world. The argument was that people were no longer as civically engaged as they used to be. The point is Putnam is a very eminent political scientist, but it was a book which had lots of data in it. This is fairly conventional data in his case. It's mainly surveys, but also data from membership registers, from all sorts of different sources, mashed together.
Secondly, and again I'm not sure how familiar this book is in Germany, Richard Wilkinson and Kate Pickett are two British epidemiologists who wrote this book called The Spirit Level, which is a bestseller. It's sold about a quarter of a million copies. I'll show you some of its pages in a minute. This is an argument saying that more unequal societies are worse on a great variety of criteria. They're less healthy. People aren't as happy. People don't live as long. So it's a very wide-ranging argument, but it's an argument based on massive uses of data. It's not based on theory particularly.
The final example is of course this guy, who you will all know, who published this book last year. More than half a million copies have been sold around the world. It's true that most people don't get to the end of it, but nonetheless it's had an enormous influence. The book is fundamentally a data book.
We're in this very interesting situation where the really powerful intellectuals, the powerful social scientists, are very different. Piketty is a very different intellectual from Foucault or Habermas in the way he does his work. What I think each of them does, in different kinds of ways, is that they have a very skillful knack of taking multiple data sources, some digital, some survey-based, some linked to government records like taxation data or health data, and they mash those data sources together.
But the challenge of big data is always that everything can become so complicated and so complex that you lose sight of the big argument. What they do is they have a way of visualising basically one thing, which they keep banging home time and again. It's an enormous skill in cutting to the story which really matters. They do that by using visualisation tools. Most sociologists don't particularly like visualisations. After the Second World War sociologists normally did things by writing lots of words or by having figures and numbers. They didn't do visualisations. That would apply to Ulrich Beck and Anthony Giddens and Foucault and Habermas as much as anybody else.
But also Piketty, Wilkinson and Pickett, and Putnam are not really doing causal social science in the conventional sense. They are looking at descriptive patterns. Often social scientists say "it's only a description", as if it doesn't matter very much. Actually, and Piketty shows this particularly strongly, description is really powerful if you do it in a way that shows varying patterns. It's really very challenging for social science repertoires: the kind of social science that gets out into the public realm and influences policy makers today is of a different kind to that which was influential 20 or 30 years ago.
Let me give some examples of how it works in practice. This is the way Putnam proceeds. Putnam's big argument is that in America and in many nations, but particularly in America, the membership of many clubs increases after the Second World War and then it declines. The argument is there's a crisis of social capital in America. People are no longer so civically engaged. How does he make that case? He makes that case by showing figures like this time and again in the book. There's a simple visual here which is being mobilised. It's built up from a variety of databases. This one has to do with the membership of bowling clubs.
Here's another one I took, to do with membership of trade unions. Again, the same kind of peak. It's a very skillful way of showing how you tell a story by repeated visualisations, which are a bit different each time. There's a bit of nuance, but the reader as they read the book gets a simple message in a sense. It's a very powerful message. It's a descriptive message. There's been a big rise of collective membership and then a big decline. What do you do about it? That's the kind of message which has been very influential in the policy world.
The version which Wilkinson and Pickett do is a bit different. It's a different kind of visualisation, but it's repeated time and again in their book. What they're doing here is, on the X axis, contrasting countries with high and low income inequality. On the Y axis, they've arranged countries according to whether their health and social problems are worse or better. Their point is that when there are high levels of income inequality, as there are particularly in the USA, that's when these measures on the left are the worst. Even though the USA is a very rich country on average, it's a very unequal country too, and therefore, generally, health and social problems are worse. On the other hand, if you look at Japan, which has low income inequality, then the social problems are much better. There's a fairly straightforward line. There are about 20 countries on the line and it shows there's a kind of correlation there, a kind of pattern at work.
This basic graphic is reproduced time and again in their book. Here's another example. Income inequality is on the X axis, and this time we've got mental illness, and again the USA is in the top right-hand corner and Japan is in the bottom left-hand corner. So it's kind of just repeating itself, but by keeping on blasting you with the same kind of graphics, you get the message. And it's a very skillful way of finding a crisp way of visualising patterns.
What you can also do, if you have this technique, is that occasionally you can change it. You can vary it whilst sticking to the same repertoire. So in this case, the X axis is the gross national income per head, and the point is this: what Wilkinson and Pickett are arguing is that what matters is inequality, not income. So income shouldn't be so significantly linked to whether you're happy or not, and it's not. New Zealand and Norway have the same levels of happiness, even though they have very different levels of prosperity. So this is an argument against the classical economic argument that the main thing is to have a prosperous economy. But I hope you see the point here. It's a very effective use of visualisation, which is repeated a bit and tweaked a bit to make the message.
And this approach really reaches its apogee with Piketty's work. Because Piketty's work, 650 pages or whatever it is, is littered with versions of this figure. And it's the opposite of Putnam. Putnam was the mountain. Piketty is the valley. In this case the story is to do with the proportion of total income going to the top percentile. So the higher this line, the more unequal it is: the top 1% have the lion's share of the national income. And the argument is that in the Victorian period, before the First World War, things are highly unequal and the top 1% has a high proportion of the national income. That declines into the middle years of the 20th century. But it's rising again, and we're now moving back towards the inequality of the 19th century. It's a very crisp way of showing that pattern. He's using a very... Piketty is also important in terms of the big data argument because he never uses the words big data as far as I know, but actually he is using big data.
He has made this important argument that you can't use household surveys to ask about income, because if you ask people how much they earn, they don't tell you realistically. Particularly if you're at the top end, you tend not to be honest about it. He therefore says you've got to use taxation data. Obviously people don't always exactly tell the truth for taxation data either, but the point is this: the survey company never comes back and checks and says you ticked the wrong box, whereas the taxation people can. So you're more likely to think, I'd better give my correct income. And once you're into the world of taxation data, this is administrative data, huge databases, with a potential for using forms of digital big data in these kinds of analysis. So he has been developing with his colleagues the World Top Incomes Database, which is a kind of big database.
So here again, here's the U-shape, and here's another version of the U-shape which is done a bit differently. I think it's a lovely way of representing social change. Here he's got this very clever idea of looking at capital assets, wealth if you like, as a percentage of the national income. His argument is that, up until the First World War, about 600% of the national income was stored in the form of those kinds of capital: housing, land, savings, investments and suchlike. That rapidly declined in the middle years of the 20th century, partly as the effect of war, partly through redistribution. And then it's been rising again in the more recent period.
And then what he does, and again this is rather like Wilkinson and Pickett, is this. There's the same U-shape here in the black line. But he overlays that with the line with the white squares, which is the rate of growth. So it becomes part of his argument that actually the issue here is that the rate of growth tends to be less than the rate of return.
And therefore there's always a tendency for the growth of capital, the growth of wealth, to exceed the growth rate in the economy. So he's making an argument through a clever use of visuals, which are the same but slightly tweaked each time. And this links into his famous "r is greater than g" argument.
So this is really interesting. If you go back to Karl Marx, Weber, Durkheim, all the great social theorists until the last few years, none of them used visualisations. I would say, certainly in the British context, these three recent sets of writers have been the most influential in policy terms, in shaping debates in the social sciences. And they're doing it by a very skillful use of data. And I think it's worth us bringing out the skill involved here. The skill is about not getting too stuck into the detail of the data. It's about standing back and finding a technique for crisply summarising one trend again and again and again. So in many respects it's a kind of intuitive, historical, hermeneutic approach to data. And it involves hooking the visualisations up with a story and with a theory. In Putnam's case the concept is social capital. In Piketty's case it's capital, which has a kind of battery of links to Marx and those economists. But it also has a story about historical change that can be backed up visually.
They're also interesting because they're also very political in certain ways. I mean, they're all making political arguments. Wilkinson and Pickett very clearly, and they make no bones about this, are very left wing, very socialist, trying to show that inequality is a bad thing. Piketty similarly: interestingly, he makes it clear he's not a Marxist, but he's obviously on the left. Putnam is more of a kind of, I guess, liberal Democrat, American-style communitarian. But politics is central to what they're doing. It's not a kind of empiricist "let's see what the data says".
Politics is really informing the way they're interpreting the data. So what they're all doing is putting data sets together which aren't just the conventional data sets of surveys. It includes all kinds of data: it could be digital, it could be stored data, it could be any kind of data, but then they're webbed together. They're not doing complex models, even though Piketty could do the econometrics if he wanted to, but he doesn't do that. They use simple univariate and bivariate descriptive figures and visualisations. And the basic approach is to contextualise the variables which interest them over time or in space, looking at broader patterns and at trends either between areas or over time.
So what are the implications for big data? When I was writing my paper ten years ago on the coming crisis, I was quite pessimistic in some ways. I said, well, the social scientists are not really doing data analysis in an impressive way, and the high ground is being taken by the computer scientists, by the people who are doing amazing work with data sets. But actually I've changed my mind a bit now, because these three sets of people, and I think Piketty is the best example of this, are now very much using data. They're not calling it big data, and that's interesting. It's an open question whether that term is a helpful one. It actually has little power. But the fact that they are doing data analysis and that the data analysis is so powerful is interesting, and the fact that they don't need to talk about it as big data is even more telling perhaps.
And arguably the debate about big data conflates a number of different things. One of them is the data itself, which can take all sorts of forms: Google searches or tweets or taxation records or whatever. So Piketty, for instance, is using taxation data. In that respect he is using various kinds of big data, though he doesn't call it that.
But then there are the analytical strategies which you may want to use to analyse the data, and you can use very conventional methods to analyse big data if you want to. You can use regression models, which have been around for donkey's years. You don't need to use computer science tools, though you can. My point is this: you need to separate out the analytical strategies from the data itself. Data doesn't speak for itself. Whatever happens, data has to be interpreted and given sense. And that's why I think Piketty and Putnam and Wilkinson and Pickett are so powerful.
So my argument, as I said: 10 years ago I was thinking sociology and the social sciences could be going down because we can't really speak to this world of big data, this world of transactional and measured data. The high ground was being taken by natural scientists and by computer scientists. But I think I'm changing my mind, because social scientists, or certainly these three sets of social scientists, have shown themselves to be really adept at using data to be very influential, both in the world of intellectual life and in influencing policy and raising big issues. So the whole growth of interest in inequality, which has been a massively increasing issue, is driven by the work of Piketty and also Wilkinson and Pickett.
And I think the point is this: social scientists, unlike computer scientists or physicists for instance, can come to data sources with a much more creative and historical orientation to what they're looking at. And they're much better able, I think, to cut through the detail or the beautiful maps or whatever it happens to be and find the kernel of the argument, which is what will interest people. And therefore perhaps, and this is my final point, we're going to see the rise again of a sort of interdisciplinary and very historically oriented social science, because Piketty, amazingly, is an economist going back to the year 1700.
That's a 300-year period, and he is straddling all of that period of time. There's a kind of merger here between the historical sensibility and the social scientific sensibility, which is very exciting. And the world of big data, for this kind of approach to social science, is a very exciting one in which I think we're going to see some very interesting and very important work in the years to come. Thank you.