I want to start by saying I genuinely welcome the excellent work the UK Data Service has been doing to promote and enable more and better use of data for public benefit (and more and better are two different things), and it's that public benefit use that I want to speak about. But I should also note that this does rest on, as Jane says, a lot of excellent work and a lot of money that the ESRC has given over a number of years to strengthen and make coherent a data infrastructure for social science and public policy. It's been a major investment in funding, coordination and support, and it includes not only support for the major longitudinal studies, the birth cohorts and Understanding Society, but also the more recent initiative to support the use of administrative data from central government departments and local government, and of commercial data under the big data rubric. So we have a more robust and coherent data infrastructure, though it's still very early days. But I do want to note too that this has meant a real cultural change on the part of those providing the support for that data infrastructure. So just as we've heard from the previous speakers, this means the Data Service, which used to be accurately called a data archive, does a lot more than archiving data, although that's important. It actually takes a much more proactive approach to training for data use, to tabulation services for certain studies where that can be done online, and to enabling people to use the more restricted data that many, not just in the academic community but in the private and commercial sector, want to use. So all of that I'm going to include under a discussion of a public benefit regime. But what I really want to do, which I think follows on very nicely from some of the previous comments, is turn the tables a bit on you.
You're interested enough to come on an early December evening, so let me ask: what are we missing in the wider community to take forward some of these things, and to appreciate not just the technical difficulties but the kinds of public arguments we need to be making? And I speak as someone who until recently oversaw major funding programs to use data for public benefit, funded by an endowed charitable trust. And so these data, or the analyses we funded, weren't simply for academic purposes, although we welcomed that. The aim was to look at the effect of policies in various areas, to see that they got objective scrutiny and that this wasn't just left to a political party or, dare I say it, a particular government or a local borough or a particular hospital, and to ask how we can understand some of the basic science and what implications it has for policy. So for instance, we used to fund a lot of work on using longitudinal data to capture the links between early childhood circumstances and long-term social and educational prospects. And we did that not just because it's good social science, but because it is part of being in a democratically accountable world. And one of the advantages of a truly data-rich society is that some of those data can be used for public benefit and public debate, and they need to be. So I start from that public benefit perspective. I think in fact all of us have had that in one way or another. And I want to talk about some of the challenges that we face as a community in ensuring that wider use; I'll just start up here. First, I think we've got to recognize we're still a long way from having the human capital, the capacity or capability, to do this work.
We simply have too few analysts, either as individuals or working in teams, who bring together substantive understanding of the areas we work in and real understanding of data collection and its challenges. Those challenges have in many ways only grown in a big data era, where you can think of some big data as thin but relatively universal, and of the thicker, richer data as being harder to get than it was when I was a child and started out doing survey research. So getting robust, representative data isn't just about the number of data points. And linking both the substantive knowledge and the skills for data collection with the skills for data analysis is rarer still. Those don't all need to be in the same person, but they need to be working in closer cooperation. Now while I was at the Nuffield Foundation I was lucky enough to spearhead, and work with a number of others on, the Q-Step initiative, of which the ESRC and HEFCE were great supporters, and not just in funding, and which continues still, and it's nice to see some people here from that. And the point of that is to reorientate some undergraduate teaching in social sciences, not just to do numbers but to bring together that substantive analysis, data collection and data analysis, and to give people more of an understanding of those challenges. But that is just the tiniest drop in the ocean, and it was 20 million pounds. That's the tiniest drop in the ocean of what we need. I think as a community we're not strong and vocal enough outside of the STEM context in saying that we need to be proselytizing in schools. We need to be concerned about the secondary school curriculum in particular, and how to widen the over-specialization which divorces numbers, statistics and research design from various substantive questions and issues. And I think further efforts are going to be needed at undergraduate level, in work placements, with paid internships, with that kind of outreach.
So I think as a community those of us that really feel passionately about data being used for public benefit need to recognize we've got a human skills shortage, and it is going to take concerted action and pressure on governments of all stripes, who hesitate to dislocate the current A-level curriculum. So that's number one. Secondly, I think it's clear we don't have the right public policies yet. And I say that as someone who also feels passionately about, and I think as all of us do understands, the issues of data protection, confidentiality, anonymity, and misuse of data, which is wider than those. If I may say so, I think much of the social science community has long had a more nuanced and genuinely robust regime for considering and adjudicating these issues. We've long had ethical review bodies, bodies that review the conditions for data use: under what circumstances can you merge, is it genuinely in the public benefit, how will you protect confidentiality, and so on. And although these arrangements have been deepened and strengthened, the actual architecture of those goes back decades. But I do think all of us should be more aware of the way the UK Data Service has recently crystallized these in their Five Safes principles. That is an attempt to get beyond talking about anonymization and pseudonymization and firewalls and all sorts of things that are really important but very complicated if we're trying to have an argument, a discussion, as Titmuss did about blood transfusions, about public benefit and why we give things. It's not simply a question of monetizing data. It's saying why we permit certain uses of our data, knowing it brings public benefit and that our confidentiality is protected. And to my mind, thinking about these Five Safes is really important. So first I'll just briefly recapitulate them. It's an articulation, I think a clever one, that as I said tries to take on board things we've all been worrying about for ages.
Safe projects: projects genuinely for the public good, with publication of results being an important indicator that it is public (in some form; I'm not saying publishing the individual data, I'm saying publishing results so that people can query them), and with some sort of proportionate and independent review of the project plan before it starts. This means you can allow general consent rather than specific consent, and I'll talk about that in a second. But for those of you who don't know, a truly specific consent regime would mean that every time an analyst wanted to do an analysis that you hadn't signed a form for, it couldn't be allowed unless they went back to you and got your consent. And I'm one of the geekiest people I know about this, and I know that kind of mail or email would just go into the bin. So you end up with very low response rates and non-representative data. So we really do need a model for public benefit research which safeguards confidentiality but takes that public benefit argument seriously. I think the ESRC's Administrative Data Network is a model of this sort, but as I said there have been such mechanisms in place for a long time. Secondly, there is the question of safe people, and that is really complicated. But it does mean, as we've all talked about, that there's some data you can make openly available at a general statistical level, at a regional level, but a lot of individual data you couldn't make open because it would give rise to confidentiality concerns. So you do need to have programs for training researchers, undertakings, legal protections, and institutions who take responsibility for anonymity and safe data handling. We've talked about safe settings, and again all of you will know about the variety of those, and about safe outputs, so that you think in advance about the level at which you can publish data. But in a sense, to my mind, the fifth safe, safe data, actually in a way follows from those. It's not quite the same thing.
I think we're all aware that in this day and age virtually all data, or many combinations of data, can present challenges to confidentiality, and those are not an inherent feature of the data themselves and certainly not of a datum itself. It is to do with the way they're released, and in conjunction with what else. And the excellent work that the ESRC supported in conjunction with ONS, carried out by MORI, on public attitudes towards the use of administrative data showed that if people can be persuaded to invest enough time and talk to you over a bit of time about this, they understand these safeguards and, as has been previously said, they're actually concerned that you aren't doing things for public benefit. There is a sense that, as long as they feel that data aren't being transmitted in ways that will breach confidentiality, snitch on them, or be used by big business or big government to monitor them, and as long as the data are being used for strategic understanding to bring about a public benefit, there's actually an incredulity that we're not doing more of it than we are now. So I think, controversially, the reports show that the public does have an altruistic streak about wanting to ensure wider use of, and better engagement with, evidence. And I say that also as someone who does take confidentiality seriously, as I said, seriously enough to surmise that big business is doing this anyway. We sign, without looking at them, the conditions saying you can use my cookies, you can do this. I mean, six pages the last time I booked an airline ticket online, and 65 different emails chased me up following that, even though I said no third-party emails. Big business is going to do this because they can say it's within an organisation. And actually, when government really wants to, it can do this too.
I mean, it did do it with the results of the 2011 riots, where it brought together various Home Office data and Department for Justice data. Different people have different views on whether those individual data, in being brought together, should or should not have been protected in terms of confidentiality. I'm sure that they went to great lengths to ensure there was no individual follow-up; it was to understand characteristics. But I do think we've missed a trick. If we haven't been able to make the case for public benefit research, research to provide public services, to scrutinise government and the effect of government policies, and to actually give a counterweight to those who would have access to bits of data and would make arguments from them, then we are missing a chance to say research can be for the public benefit. Now let me end my challenges with a couple of, or three, concrete observations about things that all of us, I think, as a community ought to be considering. First, we all need to be engaging, and it's only just in time, with continuing concerns over the proposed EU data protection regulation. I won't go through all the complications; it's very complicated how it's got to where it's got to. But as it stands now there is still a danger that it will require individual, specific consent for reanalysis. Think of all those hundreds of thousands of people who have taken part in the cohort studies, who have given consent to their data being linked to their child's educational record or their benefit data on the grounds that it's anonymous, it's protected, it's got data safeguards. They've gone through a very clear consent procedure, but they don't have to vet each individual analysis. They can't say, well, I like Professor X and the study of poverty but I don't like Professor Y and the study of healthcare usage.
That requirement for specific consent is a risk of the current legislation, and for those of you who don't know, the Wellcome Trust has been doing excellent work to try to get people to sign up to having a public benefit research exemption from specific consent. It is not an exemption from consent. It is not an exemption from anonymity, pseudonymization or confidentiality. Those safeguards are if anything strengthened by the notion that projects have to go through an independent process. It's not just a researcher who has to say so. But I do think this is a danger, and it will cut asymmetrically on public benefit research and commercial research. That's an empirical statement rather than a philosophical one. Secondly, while in the UK we've had general government support, and dare I say it increasing general, rhetorical government support, for secondary usage, for data linkage and for depositing data with the archive, we still lack a concrete and coherent framework that views public benefit research as a positive public good. And in the absence of such a framework, various central government departments often delay depositing data, or deposit it with seriously ratcheted-up restrictions on when it can be used, even when this poses no threat to data security. And I don't blame colleagues in government or the different health authorities, who each take a different line on this. It's the chilling effect of always being worried that you've not quite got your guidelines right in a very complicated field, which means it's just easier to say no or delay. And that's a particular issue where you want data for timely use. So I think it would be helpful (this is an on-balance judgment on my part, happy to talk about it) to have a very clear pathway for public benefit research, subject to the safeguards we've talked about, which would require, I think, a positive and concrete need, i.e. legislation or regulations, for government. The Cabinet Office in 2013 and 2014 held some useful discussions on this.
To my mind, I think they were very nervous about trying to say it's public benefit, and I appreciate the difficulty of defining that. I don't mean by that simply academic work, and I don't think it would need that sort of definition, but I do think it needs to be made separate from data linkage for government's own purposes, to uncover fraud and so on. All of those we might agree are helpful, but all of them we might also worry about in a different way: who guards the guardians, who's going to say this is really to check fraud, and what's going on here. So I think that's something where, after the EU data protection regulation is clearer, we need to be active as a social policy and social science community and say we need a framework which is both protective of individuals and enabling of the collectivity. Finally, we really need a clearer appeals procedure in the case of government departments or local authorities or local health authorities that are dragging their feet, and there may be a legislative need to do so. I'm not underestimating the difficulty of that, but I do think if we're really to make use of some of this data we need to break some log jams. Certainly in the realm of central government public policy data, of the sort I'm most familiar with, I would say the UK Statistics Authority is well placed to play this role. But it would mean them having an explicit aim of promoting public benefit use of data, not just publication of data they collect, and it would probably require some change in their standing vis-a-vis other central government departments. I don't underestimate the difficulty of that, both in small-p politics and in terms of the legal basis of the different departments, but I know the ESRC is considering this sort of strategic issue, and the role of what used to be the UK Data Forum and its like in helping to promote those discussions. So while I want to end on a really positive note and say we're in a world that even 10 years ago I don't think we thought we would have seen, with stable infrastructure for the big central cohorts and a much greater appreciation of the role of data linkage, admin data linking with commercial data and so on, I do think we as a community need to tackle those challenges, because if we don't address the human capability and the public framework we won't be able to reap the benefits that I would argue are genuinely for the public good.