There are, I think, lots of implications for social scientists in this new data era, in terms of the skills you need to equip yourselves with and the increasing need to work in interdisciplinary ways. When I finished my PhD many, many years ago, the main way of approaching your career was to start out trying to publish a single-authored paper, maybe using survey data or focus groups. There's still a lot of mileage in doing that, but there are many more options open to people now, and hopefully what we'll see today are some of the challenges and opportunities offered by this new context.

So what is this new context? I'm going to play on my longevity a bit here and contrast the situation when I was finishing my PhD, and the sorts of data available then, with what's available now. I remember that when I worked on survey data, the British Household Panel Survey, you could send off an email and someone at Essex University would send you, a few weeks later, a CD with the data set on it. I remember thinking: this is amazing, this has opened up all these avenues. But that was only 20 years ago, and the landscape has completely changed.

We now have, of course, social media data: Twitter, Facebook, and all sorts of other platforms that people are using. As we'll see in these sessions, these serve both as ways of obtaining data that might shed light on the conventional questions social scientists are interested in (what are the dynamics of public opinion, who will people vote for in the election, and so on) and as sites of interest in their own right. These are now where people communicate and interact, so they matter not just as a new form of data, but as sites of scientific interest in themselves.

Online surveys are what I'll be talking about, and you'll hear more about them later. This is another huge development: the vast majority of surveys are now completed online, and that has major implications for the ways in which we can survey people and for the kinds of data we can obtain, their quality, their cost, and so on. A really huge change in the data landscape.

Admin data has been around for a while, offering a lot of promise. Going back five or ten years, there was a big push saying: we already have a lot of data on people, so why keep doing surveys asking how much people earn when we know this from their tax returns? If we can link things up in a smart way, we have a really excellent data infrastructure to use, and in many countries, particularly in Scandinavia, this kind of infrastructure is already well established. Here we've perhaps made less progress in accessing such data than we might like, but it remains a really big development: there's certainly more emphasis on admin data, and I think more availability, despite the access problems.

Cutting across all of this is the way we access this information: mobile digital devices.
My first mobile phone, which I think I got in the year 2000, was one of those black Nokia brick things. The battery never ran out, because you never really wanted to do anything with it other than take a phone call. That has changed enormously, and the things you can do with a smartphone once it's connected to the internet are key: completing online surveys, accessing social media, using GPS, and so on. It's transformative in another way too, in that there's a big age gradient in how people use these devices. I have kids who are 15 and 11, and they use their mobile devices in completely different ways to me, ways I would never think of, reading and watching films and so on. So there's an interesting angle, which I'll come back to, which is the promise for social scientists of being able to reach and collect data from people who are much more comfortable using these devices than someone of my vintage or older.

Another big development is the availability of what are broadly called textual archives: digitized text databases of all kinds. Because they're online and digitized, they are, not always, but largely in principle, accessible as data sources, and we can apply machine learning, natural language processing and so on to mine them, for both quantitative and qualitative analysis.

And another, and you can see there's a lot of cross-cutting here, these are not sharp boundaries, is what we can broadly call transactional data: data that arises as a kind of epiphenomenon, the exhaust fumes of the way we conduct our digital lives. We take taxi rides with Uber, and it's interesting, I was putting these slides together yesterday and saw something on Twitter (so there's probably a trail of me doing this that someone could access, my thought patterns). Someone tweeted: if you really want to know what you're like as a person, and what people think of you in business and in your organization, get your phone out and look at your Uber rating from the drivers. There was then an interesting discussion which essentially said that if you've got a low rating, and I think something like 4.5 out of 5 counts as low, it suggests you're probably the sort of person who gets in an Uber, doesn't say hello, gets straight on their phone, and doesn't tip. Now, I looked at mine and I've got a 4.82, so I'm not doing too badly. But you can see that this is something completely new, and if we can access that data, it gives us something potentially insightful about the way people behave, the way they interact with others, and the way those behaviors are perceived.

And obviously bike shares. These are connected, so you can get those quite neat-looking maps of the Boris bikes, or whatever the local equivalent is; they're all over the world these days, aren't they?
You can see where people are going, what sorts of rides they're taking, what the effect of various bike schemes is on real estate prices; these can all be linked up, offering new possibilities. And Airbnb: I was at a meeting last week about new ways of investigating trust between people, social trust, and someone raised a really interesting point about Airbnb and similar platforms where you get ratings. If someone has only one rating, or no ratings, how do you trust that person? How do you build up trust, and how important are these kinds of reviews in modern life?

All of these have barriers to access; some are proprietary data. One of the big things we know, and will hear about in the social media analysis session, is that with Twitter data you can't necessarily archive what you access, and what you access may change if people delete their tweets. So there are all sorts of difficult issues, but also all sorts of opportunities for social scientists. Although one of the things I want to pose as a challenge, and we can perhaps come back to this in the discussion, is whether there's a bit of hype underlying a lot of this, particularly on the big data front. Is this something that really is so exciting for social science? It's certainly useful in all sorts of ways, and there are undoubtedly huge benefits to be had, but is there a real social science payoff? That's an interesting question.

So, as I said, the focus of my talk, and of this afternoon, is what the implications of all this are for survey research in particular. As a hook I'll use a paper some of you may be familiar with: a paper by Mike Savage and Roger Burrows from 2007. It's in Sociology, it's open access, so you can easily get hold of it, and it's quite a provocative think piece. It was published in 2007, before the term big data had any real currency, but that's really what they're talking about. They're looking forward and asking: what are the implications of this data revolution, the sorts of things I've just been describing, for academic social scientists? They talk particularly about sociologists, because they're sociologists themselves, but I think there are wider implications. And I should say they're not just qualitative people having a go at surveys; Mike Savage uses surveys quite a lot himself, and they have a go at qualitative interviews too, so they're equally nasty to everyone.

They say: the sample survey is not something that stands outside history; its glory years, we contend, are in the past. That's a red flag for someone like me, who has spent a whole career doing surveys. And here's something even more provocative: it is unlikely, we suggest, that in the future the sample survey will be a particularly important research tool, and those sociologists or social scientists who stake the expertise of their discipline on this method might want to reflect on whether this leaves them exposed to marginalization, or even redundancy. So this is pretty scary stuff. And there's me, back in 2007, thinking: blimey, maybe I'll be out of a job.
Maybe I should jump on this big data, transactional data bandwagon. So this is, I think, deliberately provocative, but there are some interesting ideas here: not just the claim that surveys are old-fashioned, but the point that a lot of the new forms of data are not held within universities, as we saw. They're often proprietary, owned by corporations or by government. And a lot of the interesting developments are taking place not in sociology or statistics departments, but at Google or Facebook, and that's where a lot of the smart data scientists are locating themselves. So in some ways that's the hook for the rest of my talk: do I believe this? Are we in danger of finding ourselves redundant if we stick with the survey method? I'll give you a spoiler: I think the answer is no.

Okay. I'm going to talk about a few things, but mainly about a piece of work I've been doing with Rebecca Lough, a research fellow at NCRM; it's one of our NCRM research projects. What we're trying to do is understand how patterns of data usage have been changing over time: what kinds of data social scientists are using now, how this varies across disciplines, how it has changed. In particular, to address the Savage and Burrows question: are we witnessing a decline in the use of surveys, and a concomitant increase in the rather nebulous category of big data? A parenthetic, though I think important, question is how well people are reporting their survey methods: if they're using surveys, are they reporting response rates, mode of interview, sample size, and so on?

So, survey research: are we really in crisis mode? Do we need to be thinking about getting new jobs? Well, it's not just that there's all this new data out there; surveys have their own particular set of problems. The big one, which has been around for some time now and causes a lot of headaches, is the response rate issue. When I started in survey research, my first job was with the Office of Population Censuses and Surveys, now the Office for National Statistics. Back then it was fairly standard to get a response rate of 70% or above for a face-to-face survey; that was the standard you would expect, and a minimum standard as well. Over the ensuing 25 years or so, response rates have gone down and down. Now, depending on the design and the topic of the survey, you'd struggle to reach a 50% response rate, even for a relatively interesting topic. It's even worse for other modes of interview. We don't do phone surveys so much in this country, certainly not random digit dialling (that's what RDD stands for: random phone surveys). But in the US, even for high-quality government or academic surveys, it's routine to get response rates below 10%, and they're still going down. Legislation is making this even harder: do-not-call lists, various data protection laws coming in.
As all this is happening, the people who commission the sorts of surveys that would previously get 70% response rates are asking: what are we getting for the hundreds of thousands of pounds we're spending, particularly when we hear that we can get this for free from Google and from admin data? What's the payoff? And as response rates edge down towards 10%, what's the benefit relative to a good quota sample, something that doesn't have a response rate at all, something that doesn't use random probability sampling and takes us to a different and cheaper form of data collection?

So that's the context: response rates getting worse. At the same time, the cost of doing these high-quality surveys is increasing. And it's worth bearing in mind that we're spending more money even as response rates go down; if we kept our costs flat, the response rates would be even worse. I went to a presentation by Simon Jackman, who at the time was running the American National Election Study, the long-standing gold-standard election study in the US, going back to the 1950s. His estimate was that it cost $2,000, and this is public money, per complete interview for the 2012 ANES. So when you look at the ANES data set, each row, each person, cost $2,000 once everything is factored in. Now, I think he deliberately made that number as big as he possibly could; I see Joel isn't shaking his head. What does that include? It includes two waves of face-to-face data collection, so it's a bit of an overestimate. But in the US, face-to-face interviewing is enormously expensive, much more than it is here, because if you draw a sample point in Wyoming, you have to fly an interviewer there, put them up in a hotel, rent a car and so on, for a month.

Here's my ballpark estimate for the equivalent number, the cost of a row in a data set, in the UK. If you went to Curtis or Joel at NatCen or Kantar and said, I want a 50% response rate, about 1,500 respondents, a 45-minute CAPI interview, drawn in the standard way from a PAF sample, I think that would cost you £150 to £200 per respondent, maybe a bit more, maybe a bit less; we can have a discussion about that, and if you can do it cheaper, we should have a chat. So: very expensive, and in some ways getting worse in terms of quality, certainly in terms of the quality indicators we use. And again, sponsors ask: what do we get for this? If you went to YouGov instead and asked how much an opt-in online panel would cost, it would be about £5 per respondent, maybe a bit less; there are certainly cheaper operators than YouGov out there. So you really want to know what you're getting in terms of additionality.

What's driving the increase in costs? The response rates. And why are response rates going down? That's a big question, and we're not entirely sure. It's societal change; it's the more atomized lives we lead.
It's also the increasing number of survey requests we all get: every time you get on an aeroplane or borrow a book, you get a text asking, can you do our survey? Lots of different factors are coming together, but they're making people harder to persuade. So we send interviewers back to make more and more calls, and each call-back has a cost; that drives up costs. We also do refusal conversions: if someone says, no, I'm too busy, don't want to do it, we'll send another interviewer back another time, often a more senior or experienced one, which again is very expensive. Similarly, we're offering bigger incentives than we used to. The Understanding Society survey, in an experimental part of the study, found that they had to offer a £30 incentive to get people to complete online at the same rate as they had done face-to-face. So where we make savings by not using interviewers, much of that is burned up by larger incentives.

Essentially, a lot of the total cost comes from trying to get the response rate up to a reasonable 50%, really working the sample to reach a threshold response rate by getting the hard-to-get cases. And that, after all, is the logic of what we're doing with surveys: with a quota sample we don't get those hard-to-get cases, and getting them is what we think the benefit is. So it makes sense, but it's where the cost really ratchets up.

This graph is a bit dated, but it shows the basic point. It's from a paper by Curtin et al. in Public Opinion Quarterly, 2000, and it uses a survey with a consistent methodology over time. You can see that the number of contact attempts is increasing year on year, as is the percentage of refusals, while the response rate drifts down from 70% to 68%. You could pick any number of surveys showing the same trends: more effort, more cost, response rates flat or falling. So this is bad from both a cost and a quality perspective.

There are a few other things to worry about when we're really squeezing people to do our research for us. We may persuade people, give them £30 to do the survey, but they give us bad data: they straight-line through the questionnaire without giving it any thought. We're changing the nature of the social exchange, from saying, this is a worthy thing we're doing, we're doing research, can you give your time, it will benefit other people, to saying, this is a financial exchange: we'll give you £30, so spend as little of your time on it as possible. We're placing implicit pressure on respondents to just get through the questionnaire in order to get the money, and if that's what you're doing, you don't want to think about the responses you're giving. There's the same kind of pressure on interviewers: there's evidence in some of the high-quality comparative surveys of interviewers sitting on the kerb and making up data so that they get paid. There's an ethical boundary here: when people don't want to do our studies, we keep pushing, and sometimes that boundary can be crossed.
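Before moving on, it may help to make the cost gap from a few minutes ago concrete. Here's a minimal back-of-envelope sketch using the ballpark per-respondent figures quoted above; these are the rough estimates from the talk, not real quotes from any fieldwork agency.

```python
# Back-of-envelope total fieldwork costs for a 1,500-respondent survey,
# using the ballpark per-respondent figures quoted in the talk
# (illustrative estimates, not real fieldwork quotes).
n_respondents = 1_500

capi_per_respondent = (150, 200)   # face-to-face CAPI, ~50% response rate
panel_per_respondent = 5           # opt-in online panel, e.g. YouGov

capi_low, capi_high = (c * n_respondents for c in capi_per_respondent)
panel_total = panel_per_respondent * n_respondents

print(f"Face-to-face CAPI:   ~£{capi_low:,} to ~£{capi_high:,}")  # £225,000 to £300,000
print(f"Opt-in online panel: ~£{panel_total:,}")                  # £7,500
```

On those figures, the probability sample costs around 30 to 40 times as much per achieved interview, which is exactly the additionality question sponsors are asking.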
So there are certainly a lot of pressures on survey research as the premier method of the social sciences, and as these combine with the new data landscape, we might expect to see the Savage and Burrows forecast coming to fruition. That is a large driver, though not the whole, of what we're interested in in this piece of work, which as I said is joint with Dr Rebecca Lough. I should also say this is work in progress: these are provisional numbers we're still working through. In fact, the first time I saw them was at the weekend, and I've spent the last couple of days saying, that looks a bit odd, Rebecca, why is that? So there may be a few things here that look odd to you too, and I'm interested in your thoughts on this work.

What we're doing is trying to answer the question: how are patterns of social science data use changing over time, and how do they vary across disciplines? There are different ways one could skin that cat; how would you approach it? One way is to take a sample of journal articles and code the data being used in those published articles, and that's what we did. There are other ways, and this one certainly has its limitations, particularly as we've implemented it, and we're well aware of them; some of these are quite interesting issues in themselves.

We weren't the first people to do this. There's a paper by Stanley Presser, in Public Opinion Quarterly, I think in 1983, and a follow-up by Saris and Gallhofer; I'll give you the citations, the latter is, I think, a chapter in their 2007 book. What Stan Presser did was take the premier journals in each discipline, take all the articles in each, and code them for the data they were using. He was particularly interested in whether people were using surveys, not so much in other forms of data; that was his primary concern. He looked at 1949-50, 1964-65 and 1979-80, and Saris and Gallhofer updated this for 1994-95. So we thought it would be interesting to bring it up to date, and we looked at 2014-15; I'll show you what we found in a moment.

I said there are different ways one could approach this question. Another is to survey social scientists and ask what kinds of data they use. Indeed, there's a paper by Katie Metzler, Nick Allum and others published last year; you can find it online. They had a very large database of social scientists who are contacts of SAGE in one form or another; it's not exactly clear what the sampling frame is, people who've written for SAGE or signed up in some way. They sent out an online questionnaire invitation and got 9,500 people, mostly social scientists, to fill it in, which is an impressively large number. The problem is that the response rate is less than 2%, so there are questions about whether the sampling frame was representative of the population of social scientists in the first place, and then about the response rate on top of that. Their headline finding is that a third of social scientists report having undertaken big data research. That strikes me as a surprisingly high proportion, I would say an unfeasibly high one, and I think part of the reason is this.
If you get an invitation to an online survey about big data and you do big data research, you're probably more likely to click on it and take part; that's one thing. Also, we're all aware that big data is a rather nebulous, unfortunate term. We sort of think we know what it is, but once you drill down it's slippery, and people were essentially relying on their own definitions of what constitutes having done big data research. But that's another perspective, and another way one could try to answer this question. We certainly think there are flaws in our own approach, and we wouldn't claim that the percentages I'm about to show you are in any way correct, unbiased population estimates.

If we look first at the findings already published in the Presser and Saris and Gallhofer papers, you can see the percentage of articles using surveys; the numbers in parentheses are the number of journal articles on which each figure is based. You see two things. One is that surveys account for a substantial share of the data social scientists were using in this post-war period. The other is a trend towards increasing use of surveys, reflecting the fact that more surveys were being done, skills were improving, access to the data was increasing, and so on. Across disciplines the percentages vary quite a lot: as you'd expect, public opinion research is almost entirely based on surveys, much less so political science, certainly in the early days; political science as a discipline has, I think, changed quite a lot over that period. Updating to 1994-95 with Saris and Gallhofer, the trend continues: 70% of papers in sociology now use surveys, with increases in political science and economics, and a good high proportion across all disciplines, though with some variability. So in the pre-big-data era, we see an increasing trend towards survey research.

This final column is the big caveat. Stan Presser's study doesn't give a lot of detail about his methods, how he defined and did things, and some of his definitions of what counts as a survey are a bit odd: anything done by the Census Bureau, for example, would count as a survey, even if on closer examination no survey was involved. Saris and Gallhofer came up with their own definition of what constitutes a survey. You can see that the rates are still high, and similar for some journals, but for others the definition makes quite a big difference. That tells you something about doing this kind of study: your definitions, unsurprisingly, make a lot of difference to the patterns you find. Another point: these are percentages of all papers, including papers with no empirical content at all, theory papers and so on.

So we updated this analysis. I say we: it was actually a team of coders who read all these papers. We got 1,453 papers across all those journals for the two years, hired seven coders, trained them up, randomly assigned papers to coders, and came up with a set of codes that we asked them to apply to each paper they read.
First: was it a theory or review paper, with no data? If there was data, was it quantitative, qualitative or mixed methods? Then other things, which I won't report on in full, as we're only part-way through the analysis: which kinds of data were used, survey data, big data, and so on.

In terms of the reliability of the coding, 8% of papers were flagged by at least one coder as containing something ambiguous they weren't sure about, and there was quite a lot of variability in which papers were flagged, which tells you something about how difficult the task was. We also did a small coder reliability study, where each coder coded the same subset of papers, so we could get a better estimate of agreement. The rates of agreement are pretty good: 87% is the average pairwise agreement across all codes, with some variability across coders and particularly across types of data, ranging from very high agreement for the qualitative codes to lower, though still reasonably high, agreement for survey and administrative data.

Okay, sorry, my slide's not quite fitting on there. The first thing to show you is the breakdown of empirical and non-empirical papers: about 14% of papers had no empirical content at all. This varies a fair bit across disciplines, and in 2014-15 it's highest in economics. You'd expect that: there's a lot of formal theory in economics, macroeconomics and so on, which doesn't involve much in the way of data. What we focus on for the rest of these slides is this column, the 1,251 papers that had some empirical component.

Quantitative, qualitative and mixed: the vast majority of papers in these journals are quantitative. Again this varies in ways we might expect: virtually no qualitative papers in economics; a rather larger proportion, 11%, though still quite low, of qualitative papers in sociology; and mixed methods still quite common in both social psychology and sociology, particularly social psychology. But the vast majority of papers use quantitative methods. Breaking that down by kind of quantitative data: 48%, sorry, 47%, use surveys, and I should point out that these don't sum to 100, because a paper can use more than one kind of data. There are actually quite high rates of admin data use, and not much overtly big data, but that's from a base of zero in 1994-95. So we are certainly seeing evidence of the emergence of big data, particularly if we expand the definition to include some forms of admin data; it depends on what our definition of big data is. And certainly no real decline in survey data.

On the mainly qualitative side, the percentages are pretty small, but mainly interviews, focus groups or observational studies; quite a lot of text analysis; and a surprisingly high number of visual methods papers in social psychology. Here we see a sort of qualitative big data: some evidence, but less than on the quantitative side.
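For concreteness, here's a minimal sketch of the average pairwise agreement statistic mentioned above. The coders and codes are invented for illustration, and this is simple raw agreement, not a chance-corrected measure such as Cohen's kappa.

```python
from itertools import combinations

# Toy data: each coder assigns a binary code (e.g. "uses survey data")
# to the same five papers. Names and values are made up for illustration.
codes_by_coder = {
    "coder_a": [1, 0, 1, 1, 0],
    "coder_b": [1, 0, 0, 1, 0],
    "coder_c": [1, 1, 1, 1, 0],
}

def pairwise_agreement(x, y):
    """Proportion of papers on which two coders assigned the same code."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

# Average agreement over every pair of coders.
pairs = list(combinations(codes_by_coder.values(), 2))
average = sum(pairwise_agreement(x, y) for x, y in pairs) / len(pairs)
print(f"Average pairwise agreement: {average:.0%}")  # 73% for this toy data
```

In the study itself this figure, averaged across all codes, came out at 87%.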
For the direct comparison: we can't compare everything backwards, because we've come up with new codes in our data set that aren't in the previous studies. For the ones where we can compare, here is the proportion of empirical papers using surveys across disciplines, with the red line for 2014-15 and the blue for 1994-95. You can see there's not much change really: some decline in economics, a small drop-off in sociology, an increase in political science, and quite a big increase in social psychology. So broadly, on average, stable, with a bit of variability across disciplines.

Where there is more of a change, and I think it's interesting, is experimentation: across the board we see a big increase in experiments, social psychology up from 46% to 72%, public opinion likewise. My theory is that this partly reflects how easy it is to load experiments into surveys, particularly online surveys. There's been quite an explosion, and I think once we put the parts of this together we'll see a lot of papers doing both surveys and experiments. That's the clearest shift over this period: a big increase in experimentation. It probably also reflects the greater focus on causal inference we've seen, the need to demonstrate causal effects rather than setting causality aside. Yes, to pick up the question: if a paper mentioned doing an experiment and doing a survey, it was ticked in both categories. We'll be able to combine these eventually, but at the moment we've just got the univariate versions.

Observation, that is, participant observation: here we see a big decline in political science. This is one of the really big changes we want to go back and truth-check, but it rings true; political science has become a lot more quantitative over this period. There are small increases in social psychology and sociology, and zero in public opinion, which doesn't surprise me either. Text analysis is another where we see quite a big increase, which accords with my intuition: we're seeing a lot more text analysis as textual archives become more available, although with one small drop-off.

On the transparency and quality of survey reporting, unfortunately we haven't seen much improvement. A lot of what Presser was doing in his original paper was asking exactly this: what's the quality of reporting like? Basic reporting is often absent; there's a lack of detail about response rates and so on; a third of papers lack something really basic like the sampling method or the mode of interview. One new thing since Presser's day is that a lot of journals now refer you to online appendices where this information is available. That's fine, but there's a question about how much people actually access that information; it's there, but not as part of the paper. So the story on transparency and quality of reporting is not great.
Where we want to take this next is to carry on with the analysis, looking at the combining of methods and so on, and checking some of the strange or surprising results. More ambitiously, we want to use this as a training data set, because we're not very happy with the sampling, if you like, of journals. If we were doing this study from scratch, I wouldn't have started with that sample of journals and that timeframe; I would have defined my population and drawn a stratified random sample across journals. That's what we'd like to do; it may or may not be feasible. One of the obvious limitations of our approach, and we can come back to this, hopefully I'll finish in time for some questions, is that we're sticking with the same journals over time. There's been an enormous expansion in the number of journals over this period, and one would expect that new kinds of science, new kinds of data, innovation, happen first in more niche journals. A new journal of big data in social science, say, doesn't find its way into the American Journal of Political Science until the methods are more established. So the nature of this set of journals, the premier journals in each discipline, probably militates against finding these new kinds of data. If anything, these may be lower-bound estimates; across a proper random sample of journals, we'd see more evidence.

So, to paraphrase Mark Twain: are the reports of the death of the survey greatly exaggerated? I think they are. We've seen in the data I've just shown you that there's no real evidence of any decline in the rate at which surveys are used in this set of published journals, despite the caveats I've given. Here's another chart, from some work I've been doing on opinion polls: the frequency of opinion polls in Great Britain since 1940, where each dot can represent more than one poll. The key thing is the enormous increase from 2001 onwards; if we carried this on from the end of 2015 to 2017, we'd see the same effect. The first YouGov poll was published in 2001. So we are doing lots and lots of surveys, different kinds of surveys, probably many more than we did in the past. Some numbers to go with that: between 1945 and 2010, there were three and a half thousand published polls on who's going to win the election; between 2010 and 2015 alone, nearly 2,000. So we're seeing a huge increase in the rate at which people are surveyed. Here's another way of seeing it: global spend on online research in market research, up to 2014, shows a huge and growing increase. Not quite the same thing, but it's the direction of travel.

So the future, as I see it, and this is slightly a plug for our next speakers, is that surveys will be lower-cost and online, which means we won't be doing fewer surveys; we'll be doing more, less costly, different kinds of surveys. Why is that? One of the key things, I think, is that population inference is still really central to an enormous amount of social science. By that I mean we need to make inferences to the whole population.
It's not okay to make inferences only to the people who are on Twitter or Facebook, or to some undefined pool of respondents signed up to a YouGov panel. There are real problems with moving away from the model of drawing samples from a well-defined target population. I think that will be a persistent issue, which means this remains true. And as I said at the start, I have a bit of a bugbear here: I hear a lot of talk about big data, but I rarely see a decent social science paper that addresses an interesting social science question using big data. I'm very happy for people to challenge me on that, but I think it's another reason we're going to carry on doing surveys: being able to design questions that address your hypotheses remains a key benefit of surveys.

We will see surveys changing. We'll have shorter questionnaires, probably at more frequent intervals: rather than interviewing every year, as in the standard repeated cross-section model, you'll see more regular contact. Device-agnostic questionnaires, so you can answer a short questionnaire on your smartphone: maybe a questionnaire that asks only one question, or one that asks you five questions throughout the day. How are you feeling now? How are you feeling now? How are you feeling now? Once we're doing surveys in that way, all sorts of new possibilities open up. We don't have to do a 45-minute interview; we can do a 10-second interview, ask people to take pictures, and so on. And we'll potentially see more passive data collection: rather than the National Travel Survey asking people how long it takes to get to the nearest bus stop, you just look that up on a map and don't ask the question in the first place. If you want to know where people actually travel, get them to agree to turn on their GPS and collect that data, get their Facebook feeds, and so on. Big problems with that, of course.

I'm going to finish very quickly with an example of why I think there are reasons to be cheerful about the future of even the random probability survey. This is my own experience of working with the Wellcome Trust on a survey of young people; Wellcome is interested in young people's views of science, whether they want to be scientists, study science at university, that sort of thing. For the first two waves, they spent an enormous amount of money doing it the standard way: it was part of an adult survey, a standard face-to-face random probability design with a CAPI interview, very expensive and very slow. In that adult sample, if I were sampled, my daughter would be eligible, she's 15, so they could interview her as part of a random sample of children; but there are not many eligible children in a sample of that size, so they had to do an additional screener on adjacent houses, which is very complicated and error-prone. What they ended up with was a sample of about 450 young people aged 14 to 18, a response rate of about 50%, really expensive, and, let's face it, not much use to anyone. That's why no one ever analyzed it.
So for the most recent wave, we thought about doing this differently, and actually Kantar should take the credit for this, not me. It was a totally different design: the sample was drawn from the National Pupil Database rather than from the Postcode Address File. That meant we had each young person's name, so we could send a letter directly to them, a postcard saying: go online, here's the URL and a password, do this short, interesting questionnaire, and we'll give you a tenner if you do it. We were hoping for a response rate of 20-odd percent. Lo and behold, we ended up with a response rate of 50% within about three weeks, and because of the lower cost, we were able to get 4,000 achieved interviews; 25% of the young people completed the questionnaire on a smartphone or tablet. This has been a really useful study: it has informed policy, including a recent initiative about ethnic minority kids in London and STEM. Part of that is because these respondents are digital natives, and maybe that is the kind of promise the future holds, rather than the apocalypse that many of us in survey research feared when all this started happening.
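As a footnote to those figures, here's a minimal back-of-envelope sketch of what they imply about the fieldwork. The assumption that the £10 incentive was paid only to completers is mine, not stated in the talk.

```python
# Rough fieldwork numbers implied by the figures quoted above.
# Assumption (not stated in the talk): the £10 incentive was paid
# only to young people who completed the questionnaire.
achieved_interviews = 4_000
response_rate = 0.50
incentive_per_complete = 10  # pounds

invitations_sent = achieved_interviews / response_rate          # ~8,000 letters
incentive_cost = achieved_interviews * incentive_per_complete   # ~£40,000

print(f"Invitations sent: ~{invitations_sent:,.0f}")
print(f"Incentive cost:   ~£{incentive_cost:,}")
```

Even with the incentive bill, that is a large achieved sample at a fraction of the per-interview cost of the earlier face-to-face design.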