Absolutely. At first glance you might think, where does transforming the census fit into this story? But I hope you're about to see that it fits really centrally into the story that we've just been talking about. So, a little bit of context as to why I'm doing this presentation. Although I'm involved in NCRM and the Administrative Data Research Centre, for a long time I was the coordinator of ESRC's census program, which doesn't exist now; it's part of the UK Data Service, which is why UK Data Service appears in some of these slides later on as well. And for about 25 years, in one way or another, I've worked with ONS on aspects of the census, and my main interest as a geographer is in georeferencing, small areas and zone design. This presentation is not about that. What I'm aiming to do is to give you an insight into how some of this administrative data linkage that we've just been talking about is going to make the 2021 census look extremely different to 2011, and in fact some of the things that are going on right now are likely to lead to new and different data becoming available within a general census frame that might be useful to you even before we get to 2021. I want to do a quick straw poll; I'm not going to let you doze off at this point. A quick straw poll, and I've got some questions for you as we go along. How many of you, somewhere in what you do, have contact with any kind of census data? Okay, about a third to a half of you. And some of you may discover that there's stuff in here that you might be using without realizing that's where it comes from. So that's the plan. I want to explain in a general sense how the census is changing. There's quite a lot of international context for this, and I'm going to bring it down to what's currently going on in England and Wales for the Office for National Statistics. And you'll see in the middle there that the administrative data is particularly key to this.
And so the way in which the England and Wales census is changing is an important part of how the landscape around the linkage of a lot of other datasets, relevant to what Laura and James have already talked about, might be changing as we go. And I've put discussion in brackets: depending on how responsive you are when I ask you questions, I might either ask you to do a little bit of conversation as we go and get some questions out, or we might save that for the panel at the end. So, in case you haven't been following this, and I can understand that, because not everybody gets up every morning and turns immediately to the census news page in their paper, not everybody is a census junkie like me: if you just look at what's going on globally with censuses, we are seeing several clear trends, very similar in different countries, all pushing the statistical agencies conducting censuses into some common problems, common challenges. They're not all coming up with the same solutions. And so we'll see that we've got common challenges and some rather diverse responses. So one obvious thing is increasing demand for really timely data. The challenge with censuses in England and Wales is that we only do them every 10 years. Ireland, Canada, Australia and New Zealand are doing them every five years. Even a five-year gap is quite a long time between censuses. And really, if you're wanting to maybe think about moving into linked administrative data, you want to have really good population denominators, or denominators for particular groups. If you're interested in why life expectancy is stalling, or what's happening to the very elderly, then waiting for 10-yearly data and trying to project it forward, which is kind of how our systems broadly work now, is not the smartest move. So lots of people say, actually, we need baseline demographic data much more frequently and much more up to date. So there's a big pressure. And actually doing the census is getting harder.
So actually going and getting census forms through people's letterboxes and getting them to respond, again by any means, is increasingly difficult, particularly in developed societies. We're seeing that not only because people are just less willing to respond, but because they actually lead very busy lives. Lots of households are unoccupied right through the working day. Lots of people spend some of the week in one place and some in another. There are just lots of practical circumstances of our lives that are making it difficult to do a traditional census enumeration. And the result of that is, if you left everything the same, the response rate falls. If you want to keep the response rate up, you've got to spend more and more money, go to more and more effort and have more and more complex designs. That drives cost, and it's very unpopular with government, understandably. And there's a second sort of block down the bottom there, which I call dissolving definitions, which is even more of a challenge but does touch on some of the things that we were talking about in the first session. And that is that the traditional way of collecting census data assumes stability of a set of core concepts. And actually, for lots of people, take the idea of usual residence: if you've got kids that split between two parents and spend their time on some kind of complex rota, sometimes at one house and sometimes at another, the interpretation of their usual residence is far from straightforward. We see a growth in those kinds of dilemma. Definition of a household: who are the household members? Lots of people, young adults, are buying or renting a house together, with kind of shared arrangements for some stuff and individual arrangements for others. The conventional definition of a household doesn't look particularly resilient either. What do people do for their main employment? Working in the gig economy, working in several different jobs, it changes rapidly.
Lots of the things behind, if you like, a census which was developed to give you a paper form and capture a reality, with a notion that that was the sort of thing that was stable over a long period of time, are almost all dissolving as we watch. And also we've got, as per this workshop, lots of interesting and exciting methods for linking data that are tantalisingly hanging there in front of census agencies around the world, who are thinking: could we do it differently? So we've got lots of drivers for change. But at the end of the day we still want to go to all these tricky places and work out who is there, what their personal and household characteristics and living arrangements are, some stuff about their housing, the places they travel to work, because we use that information in so many ways, deeply embedded in planning, education, health care, public services and business decisions. So our basic challenge hasn't changed, but it's getting harder to do and the ways we want to consume the data are different. And there you go, two original photos from the 2011 census campaign. The ONS one is the two purple buses that drove around the country in a campaign trying to get people to fill in their census forms, with a big publicity campaign. 'It's time to fill it in', it says on the bus stop. Interestingly, that's outside ESRC's headquarters in Swindon; I couldn't resist that particular photograph when I went there just at the time of the census, and there's the census campaign. So all of this speaks to what feels like quite an old-fashioned way of getting a lot of data. But it's very precious data and we don't want to let it go unless we've got a good alternative plan. So let's have a little look at how we might think about a typology of getting census-type data. My traditional census is down in the bottom left corner. You'll see this diagram a few times; I'm going to plot some dots on it and drag them around and do stuff.
And if you think about what happens in a really traditional census, the one that we've filled in all these times before, you could be really trendy and up to date and put the form on the internet. And that would take you in the direction of my top left box: basically the form that you know and love, the questions that you know and love, dressed up in an online kind of way, maybe with some drop-down boxes for answers that people find tricky. And then we'd get good data quality, because they wouldn't type in that their three-year-old daughter was a mechanical engineer, or any of the many other implausible circumstances which we see on actual census forms. However, if we went down the route that we've been on for the last couple of hours and we thought, actually, all this admin data, if you could link it together in a sensible way, we could probably reconstruct most of the stuff that's on the census form, we'd be heading over to my right-hand side over here. Traditional census, maybe use a bit of internet as well, but let's think about what we can link on the admin side, and we get lots of advantages that way. We'll have a look at those. Or we could do the all-singing, all-dancing version: why not keep a list of people, get some online survey to fill in the tricky characteristics and do the rest of it with admin data? Have this beautiful linked system. We would be up here in the top right-hand corner. And obviously this is a kind of play version. In my mind, each of these dots does represent a place, but I haven't labelled them individually because otherwise we could argue for the rest of the day. But in 2011, we see a whole host of censuses that are down here, and they're effectively doing something really traditional but edging up into putting their census forms online and thinking that would be a good way of collecting some data, and playing a little bit with maybe checking against administrative records. So that dot there, I'll let you into a secret, was Northern Ireland.
These are just a few I knew about that I dropped on there. Northern Ireland was the only one of the three UK censuses where looking at linked administrative data actually changed directly, in a processing way, some of the counts that were published. Whereas in England and Wales, we did a traditional census, we did a coverage survey, then we had a look at some administrative data as a quality check, but the numbers from the administrative data in 2011 didn't directly change, in a straight transfer of values, any of the published census data. So lots of people are moving up into that top left box. And what we've also got up here, because everybody always says this and it's true, is some broadly Scandinavian countries working with a population register, who've long since parted from asking those traditional questions at all, and they're effectively building the whole thing from an integrated system of linked data. But the key is, right back to what James was talking about, it's based on a population register with a single person identifier that the society is happy with government agencies holding and using, which means effectively, and it won't be perfect, you've got a single citizen number allowing you to connect health, education, housing, employment, taxation, benefits, the whole lot, and therefore we're looking at really high quality, but not perfect, linkage in a system which is mature. And a few countries, the Netherlands, Belgium, have got there in later decades, not as long ago as the Scandinavians, but are building things effectively from linked administrative records. Not all of them started in quite the same space as us, but generally they have some acceptance of a kind of single citizen identifier, which makes it possible to do the linkages.
And just to be different, there are a couple of nations in my frame here who have leapt off the page and done something really quirky and different, and I put the USA and France in that box. They've come up with unconventional designs which are not quite on that page, both effectively involving a very extensive rolling survey program, where you're not doing a full enumeration at any time, but you're doing a large program of stratified surveys year on year, building up a population picture. But it's really hard for researchers to use, because when you want some data for, say, a reasonably small area, you may find that you've got to pool the results over three years or five years, and the dataset you're looking at is actually based on some records that are a year old, some records that are three years old and some records that are five years old, and you have to have a completely different design head on when thinking about how you analyze the data. But if there's a general trend, it's kind of shifting from bottom left up to top right. So why does it matter? And this is a sort of England and Wales perspective, because I'm addressing an audience where mostly it's likely that you will encounter this in the context of England and Wales data. I should say that Scotland and Northern Ireland, who do their own separate censuses, are heading in the same broad direction, but are not always doing the same things as the Office for National Statistics. So they are independent censuses taking independent decisions about quite a few important methodological things, which is why they're certainly not the same dot in that diagram. If you stand where we are now, the stuff that really originates from the 2011 census is our single best, and only, source of really high quality small area population distributions with any covariates. So effectively, if you want to understand ethnicity by age for small areas, the census is the only reliable source you've got.
And if you want to understand anything more complex, like housing type, ethnicity, economic activity, there is no other source that will take it down to small area data, the kind of thing that's going to demarcate neighbourhoods and be useful to local planners and policy makers; something that's going to feed into deprivation indicators at really high spatial resolution. We've got surveys that might give you the big trend for the country, but none of them has anywhere near a sample size that's going to tell us about small areas. And that's because we get very high coverage from the census. If you've got very high coverage, you can make detailed local data without being disclosive. And this issue about having full coverage and high-resolution georeferencing just gives you this enormously rich data environment, which exists at the moment uniquely in the census frame. So we've got a unique combination of attribute detail and spatial resolution. And we need those data because they drive most of our national systems which need small area denominators. So I was at a meeting yesterday, the ONS population theme advisory board, with a whole array of fascinating stuff that they were reporting back on. And in every system, whether it is to do with a bit of the jigsaw for estimating migration, a bit of the jigsaw that's looking at life expectancy, the stuff that's to do with population projections, things that are to do with migration projections even, somewhere in that jigsaw one of the core pieces of gold is the fact you've got those reliable census estimates to start from, and therefore you're building them into systems which are actually allowing you to have some certainty at least about the denominator part. Also, and this is my one geography bullet point, I think, at the moment the census dataset is also the driver of the basic statistical geography. So, has anybody ever used the Index of Multiple Deprivation in something that they do, controlling for deprivation? Remarkable, not many.
Normally more people reckon they use deprivation than census. Super output areas, the indices of deprivation, middle layer, lower layer, the output areas themselves: they're all produced as an artifact of the census processing; they were not independently created. So we get all of those things done from our census machinery. It gives us a range of integrated data products. So if you are interested in accessing microdata samples, there are various versions; for controlled-access census microdata, anonymized records about actual individuals and households who filled in 2011 census forms, you can go into the ONS Virtual Microdata Laboratory and access those samples. If you want to understand change over time, we've got the ONS Longitudinal Study, which is the longitudinal link between a small sample of individuals across censuses from 1971 through to 2011, linked to birth and death registration and the cancer registry, and there are much more powerful longitudinal studies linked to the census in Northern Ireland and in Scotland. If you want the small area data, this is where it comes from; you want boundaries, this is where they come from; you want a travel-to-work matrix, this is where it comes from. We don't at the moment have really resilient, comprehensive alternatives to those things, and so the 'why does it matter' part of the question is that if we are to move on to doing something slightly more adventurous with our census-type data, we're going to have to be pretty certain that we're replacing the really important pieces with something equally resilient and sustainable, and not burn our bridges and discover we have no route forward for lots of these processes. Also worth saying, of course, that the beauty of the census is that it is a giant linked dataset, so the individuals linked to addresses, linked to households, are there in a database which we can't access but which underlies all the census output products, and you can request bespoke tabulations and other bespoke products which are based on that.
So we've got the big picture: small area aggregate data, boundaries, interactions, microdata, longitudinal studies, all in the one package. And my one UK Data Service slide, which justifies my UK Data Service part, is this: if you want to find any of that stuff, working from the academic community, and you think, well, I might need to know about one of those, go to the UK Data Service. There is a link on the front page to census outputs, which links through to all of those bits and pieces on the previous slide, and there are explanations and links for how to get them. So, back to the general census story. What we've got is both of these big shifts going on at the same time, and we see them in different mixtures in different countries. Taking the traditional census and putting it online: that is clearly part of the 2021 plan for England and Wales. But at the same time, taking the traditional census and spinning up an alternative system based on the admin data linkage, with the possible objective, if you like the current expectation, that by 2021 ONS will have got to a point with the admin data linkage where there is no further complete enumeration, and that the system which rolls forward is one based on admin data linkage. And that brings with it every caveat that James introduced you to at the very beginning, but also a whole host of opportunities around frequent data and the ability to answer new questions.
So there's two points in here. This is from the National Statistician's recommendation on 2021: increased use of administrative data and surveys, in order to enhance the statistics from 2021 and improve what happens thereafter, and an online census. So the goal here is that this is a successor to the traditional paper-based decennial census, with some caveats which I'll say something about. So we're going in both directions. We are effectively moving ourselves to the middle of the diagram, going up to the point where we're sitting there, certainly not in that top right corner, because there is no citizen identifier and there's no permanent linkage at the moment between any of those things. But what we're doing is moving up, if you like, into a position where, if the census is a four-legged thing, it's got a foot in each of the quadrants, and that is something which we see going on pretty generally. So New Zealand, Canada, Australia, Northern Ireland, England and Wales, censuses that tend to cross-reference one another quite a lot, are all going to about that same position. And also a few countries that are not pushing into the admin data are still nevertheless shifting out of here and putting their traditional census online, and they're all cross-referencing one another because they're all very anxious about one of the obvious things. What would be one of your most obvious concerns if I said I would just do the census online next year? What would possibly, as a researcher, concern you about that? Digitally excluded people, certainly; that would be one of my key ones. Anything else that might concern you if I said I was going to do it online? Security. Fraud. Public acceptability. I mean, public acceptability basically underpins all of those things, so if it's not trusted, if people aren't bought into the thing, it's not going to work. So let's do the administrative bit first and visit that, and then we'll briefly touch on how it interacts with the online, and we'll come back to
them and put them together. The commentary around this was there in 2011 as well. I think a lot of people don't realize how close we came to not having a 2011 census. So: financial crash, change of government two years before the census, Francis Maude, who was Minister for the Cabinet Office, a new government coming in and saying we need to cut all sorts of superfluous things, why are we spending all this money on this census thing? And the answer which meant it carried on was not really anything to do with the value of it, but the fact that we'd spent most of the money already and it would look silly not to. That was my analysis at the time and it is still my analysis of how close it was for 2011. We could have been looking at some kind of big survey or something that was much, much more scaled down, and all the data, the stuff that we've been talking about that we rely on, we would have been re-engineering fragments of that for the last eight years. But the narrative goes: we're collecting loads of administrative data. We've already talked about lots of it this morning: DWP, HMRC, the Student Loans Company, the TV Licensing Authority, the Driver and Vehicle Licensing Agency, the NHS, the Department for Education, the Higher Education Statistics Agency. They've all got these datasets. Surely, surely, surely, if the public purse is paying for them all, you could put them together and save, you know, the three quarters of a billion you're going to spend on a census, and you'd still have the same data. That is a very attractive high-level narrative. Other countries do it, although, let's face it, the ones that are doing it successfully are doing it based on a citizen number and a common population register. And there was quite a strong sense at the beginning of the decade that said, well, yeah, that's the way to go; how fast can we get there, can we do it, let's make the thing happen. But of course there are a lot of obstacles, and you spotted some: public acceptability, obviously, but actually the
technical and legal and practical issues are very substantial. What it would look like, if you like the current model of an administrative data census interpreted in a UK setting, is: select some of the key public data sources that already exist and then do the person-level linkages. So develop a kind of standard, routinized way of taking, maybe as the Netherlands do, an annual cut of those administrative data, putting them together and building, if you like, a statistical model of that population, which is done with the linkages at the person level. You then have to deal with all kinds of imputations and corrections for the things that are over- and under-counted in the different datasets, and you would produce from it something that looked like a census result and a set of census data products. Each of the sources that you bring in brings some extra variables. So you want to know about housing: well, we could get housing data from the Valuation Office Agency, so we don't need to ask people to estimate how many rooms are in their house. We want to ask people about vehicles available to a household, and that's a different kind of question, but we know about vehicle registrations. We get the demographics; that's in quite a lot of them. You want to get education; that's in quite a few, to a certain degree. So the idea is that each of these things replaces or enhances components of the existing system, with the potential of course that you could then rematch and re-update each time you do the thing, so you'd be able to have potentially annual datasets for some of those. So here's a little insight into where the thinking got to in 2011. I know you can't see the numbers; it doesn't matter. In the quality assurance work for the 2011 census, as I said, in England and Wales the other data sources that were available were not used in a computation to adjust the census counts, but several alternative datasets were looked at, and although you won't be able to see them, because this screen is not especially high resolution, we've got here
age groups, what the census counted, some estimation bands around that, and some estimates we got from the census coverage survey. And then we've got some other lines: from the NHS patient register data; for certain ages, what we know from who's in the school system; some electoral register data; and some benefits data for the young and the elderly, which effectively are all giving you, at the really simple level, estimates of how many people you thought there ought to be in an age band in each small area. And ONS used that data in the quality assurance, publishing a lot of local authority profiles a bit like this one. What's going on here? This is the City of Bristol. We have the line in the center of the purple band, which was what was finally published as the census estimate, but then the other lines are all the other sources and how they stack up over age. So the blue line at the bottom is the people the census actually counted. The census doesn't count everyone; that's why we do a big coverage survey and a lot of quite complex imputation for missing people and missing households, until we get back to what we feel should be the correct estimate. But also in there we've got the NHS patient register, which is the green line, tracking well above the census from around about age 20 through to the late 50s, but actually pretty close to the census at the very young and the very old, and we've got some of those other sources in there. And so, looking at this, the question which is being asked is effectively: if you can model the relationship between the lines, and you've got the other sources anyway, why spend all that money getting the middle line? And that effectively is what's underlying this. But that is a very univariate kind of view, because it's assuming that what you're after is some small area demographic estimates for age groups, and it gets very, very complex very rapidly once you start wanting to add all those other attributes which are important for specific studies. So
ONS had a look at that, and ran a project for two or three years called Beyond 2011. I'm going to make a prediction: there'll be a project in 2022 called Beyond 2021. The title of the project has changed a bit, but they said, well, look, here are a lot of census options. There's the traditional census up at the top. We could do a rolling census; that would be something a bit like what the US has been doing, or the French. We could do a short form, where we send a short questionnaire to everyone and then a detailed one to the others; that's what the US used to do. We could just do a head count and send out some big annual survey and ramp up some of our current surveys. Or here are some admin data options: basically what I just showed you for Bristol is the aggregate analysis that doesn't involve the linkage, it's just producing estimates of things from individual admin records; or we could do a bit of sample linkage; or we could create 100% linkage of everybody; or we could just do a big survey. And all of those options, if you like, started on the table, and it's from that that we have got to the current position which I stated. So in order to get there, it's not that these other things have not been considered, but they've been considered and crossed off along the way as not being workable ways of producing the data that we need. And one of the particular challenges is that a lot of those methods, especially anything survey-based, don't produce the small area data, and the debate around why the small area data really matters has evolved quite a lot. You can use the national survey data to get good estimates for all sorts of things, but you can't bring it down to the local level. So if you go now to the ONS website and you search on admin data census research, you will see the beginnings of some outputs. They're all experimental statistics, which are the results of making estimates of various things from admin sources, and you'll also see a lot of very interesting research reports which are the first attempts at
linking one or other of those datasets and understanding the admin data quality. But this is not at the moment a fully integrated linked system. One of the things that ONS has invested a lot of effort into is experimenting with quite a wide variety of matching methodologies, because of course we haven't got a direct citizen number. Even what Laura told us, where they happily discovered that the National Insurance number was on both the student loans data and the HMRC data, doesn't exist in most of these cases. So we are back at the level of taking in some kind of hashed and anonymized versions of name, date of birth and address data, and then using, if you like, automated matching tools that rank the weights and probabilities that we've got the same person, and building a lot of different tools and models for that, trying to refine towards one which would be able to routinely support the kind of thing that would be needed in this 2021 world. I throw a couple of these slides in because it's easy to think that it doesn't fundamentally change anything, and actually this way of thinking fundamentally changes loads of stuff that we take for granted. If you look, for example, at what the census currently tells us: people in families, grouped together in things we call households, and households occupy household spaces within dwellings, and those things usually have addresses. That is a hierarchy which people contest a bit, because local government doesn't have exactly the same definitions as individual departments, but you can get all of those things from the census data infrastructure because we have a kind of model for getting them. Anybody here interested in households? Yeah, a few flickers of maybe. Households are very interesting, because of course households are almost completely invisible in most of those administrative sources. In fact, I'd probably go as far as to say that there isn't any real way of completely linking the household, in the sense that we understand it from the census, by putting the admin
data together. Nearly all of the admin records are based on either the person, because we're matching the person, or one of the tiers of the address, which says here's a group of people associated with this address. But that doesn't necessarily mean we've matched it all right. We may have one of the previous residents and three of the current residents and go and think that they're a household, and we would get a very, very different answer to what we get from the census question. Actually, the only way we currently really understand who's in a household is by asking them, and for the last five decades all the census questions have asked something about the people you live with and have some kind of common housekeeping with. So you share a microwave; in fact, they used to use the phrase 'common housekeeping', which for me feels like there's a glass jar standing on the top of the fridge with the 50ps in, and somebody, probably mum, went out and did the shopping down at the Co-op. Our world is not like that anymore, but who people think they're living with, and what those living arrangements are, we only really know by asking. You cannot derive sophisticated understandings of something like that by linking the admin records. You'll end up with something, but it will be something different. And this is back again to what James and Laura were saying about understanding the purpose for which the thing was collected versus the purpose for which you want it. Our whole world kind of unravels in several directions, and we have to start thinking about stuff very differently. We may actually end up, if you did an administrative census, having this whole stack of constructs for ways that people live together, where you may often have connections between, usually, a mother and children, which you can trace from some of the administrative data. You have a fairly strong connection through some of the DWP benefits data, but it's not going to get every household, it's not going to get
absent parents, and indeed as households and family relationships change we may well not have a very good track of that at all. So lots of the things that were challenging for census are even more challenging in this particular world.
Let's just turn to the internet for a minute, because that was part of this jigsaw. It's not even as simple as solving the data linkage problem across loads and loads of different government records. It's much worse than that, because we're going to try and do an internet enumeration. In the 2010-11 censuses, as I said, for most of the ones in the bottom left of that diagram, internet was an option. So you could complete your 2011 England and Wales census form online. Oh yeah, there it is. I took a screenshot of it, knowing that one day it would be useful. So on the left is the form, actually that's a specimen one, but it's what the paper form looked like, and on the right was the internet version, and 16 percent of the forms were returned online. So effectively on the census form there was a URL and a code, and you could go to the URL, put in the code that was on your form and complete it online. What we get is really good data quality because of being able to control the structure. There may be some response bias in there as well: maybe people who would have filled the form in better anyway tended to do it online. So obviously, as you've pointed out, there's a big age bias in there, and in the latest tests for 2017, trialing bits of census, the 20-year-olds, 90-something percent of them responded online. In a small-scale trial in 2017, 50 percent of the over-85s did, and we're pretty certain that for most of that 50 percent that seem to be online it was somebody else doing the form for them. So we're changing the thing we're measuring quite a lot.
Have a look at what happens then post-2011, though. So we know the forms are of high quality, so all the census agencies around the world say, way to go, let's do this. New Zealand: 2011 census disrupted by the Christchurch earthquake, so they put it back two
years, pushed online heavily, got up to 34 percent, looked pretty good. Australia 2016, Canada, everybody who's doing a mid-decade census thinks, yeah, we're going to go for online first. Australia: disaster. Census night, a big campaign against it, a distributed denial-of-service attack on the census site, a bit of poor PR, 10 percent response when they'd been aiming for 65 percent. This is a really risky strategy. If it all goes wrong, this is worse than paper forms and a mail strike, because of course at least when the mail strike is over your paper forms might get there. You do it all online and everybody loses faith in the exercise, or somebody just puts the wrong message out, or maybe there's a data breach, or something else happens completely outside the control of the census agency the week beforehand. It's a very risky strategy. So actually, despite the fact that it looks good, and that the campaign is targeting 75 percent online response, online first, you still need a colossal infrastructure for trying to prop it up and be able to do it with a lot of paper forms and people whizzing around, and a complete communications architecture for dealing with things that might go wrong. Canada does look like they've got it right: 68 percent. So it can be done, you can actually get high coverage, and of course those internet responses are quite cheap. So once you've put the investment in, you've got the data, it's basically ready in the database. Do a bit of cleaning and checking, but fundamentally you are creating at source the census database from which everything else can be constructed.
What's difficult is the business of how we then get to all the people that will not respond online, and so that's what the 2017 test was experimenting with. Do you send an enumerator just straight in, in advance, and say these areas are going to be really hard, they've got poor internet connectivity or they've got populations that we know aren't going to respond online very much, and you plan to go with a paper form, or even digitally assisted,
somebody coming around with a tablet? But if you do that, we're going to introduce all kinds of response bias that we haven't previously had in our census data set. So none of these routes is plain sailing, and what we get in 2021 is intended to be online first, assisted for those that can't, and then lots of admin data integration being spun up, if you like, to the point where it's getting to the same speed as the census outputs, and we can say: oh, we got this data about size of households, numbers of rooms, car ownership, and we've got these other things, are they telling us the same answer? Because if we've got to about the right place, we don't do this again, we just carry on with the admin source.
I threw this one in, although it's just a couple of slides, because somebody will otherwise ask and it's been in the news lately, although it is not in any integral way a part of the 2021 plan, which is: what about big data? And I make an explicit distinction between admin data and big data. DWP's admin systems are pretty big, but they're not big like Google's or Vodafone's big data systems. I was about to pick up my device: this thing sitting here is connected to Vodafone and will have told them a couple of hundred times since we started that Dave's device is sitting here in this room. That's really big data, but it doesn't actually answer many of the census-type questions. It could be a really important part of the jigsaw for some. So data about things and events, that that device started at my home and came here but it usually goes to the University of Southampton, might be quite a useful way of extracting some travel-to-work information if you didn't have a census form that asked it, and you're not going to get that directly from the linked admin data set. You might make an association between me and the university, but that's not really going to tell you about local travel to work; that might. So locational stuff, certain types of activity, expenditure,
household energy: some really tantalising, potentially very powerful sources for using the big data. If you like, it's stuff that is sensed rather than part of an administrative system, or something that is volunteered. But we don't really have the means of doing a lot of linkage between them. They tend to give us a fascinating, noisy picture of a particular system, but the integration of that with the other data sets is really quite challenging.
Just to illustrate that this is not going to be plain sailing, and this is not even the internet census, this one is based on a little trial. 7th November, so only a couple of weeks ago: mobile phone tracking data could replace census questions, say the BBC, and the little bit under the picture you might not see is that thousands of people have had their movements tracked by the Office of National Statistics to see if they can find out where they live and work. Don't believe everything that you read. ONS acquired, for local authority level flows with more than 15 people in them, from a secondary analysis company, the work flows into three London local authorities from one phone provider for a four-week period, just to work out whether or not this might tell you anything that was useful. So the headline is a bit of a long way from the reality. Elsewhere around the world we do see people starting to get some useful mobility data in particular, but it's only replacing one little silo. Think about everything else we were talking about earlier in the morning: not much of that is going to get replaced directly, because we don't have the attributes on most of the big data. They're usually proxies for something that's quite a long way removed from the individuals. Fascinating futures, huge ethical questions about how you use it, what on earth is consented. So should I perhaps start looking at which devices spend lots of time together, and then I might have a new definition of practical living arrangements and households? All sorts of things that in your
working lives we will have to deal with. We're not there in the 2021 world.
So 2011 looked like that, in a nutshell. We had traditional enumeration for 84 percent of the population on the left, with 16 percent responding online, those data being cleaned and put together into a census database, then several big steps, including the coverage survey and checking against various administrative data sources, adjustment and estimation, to produce, if you like, an outputs cleaned database from which all the outputs that we know and love are available on the right-hand side. There's more boxes available, other boxes are available, I just haven't put them on this slide. But the 2021 model, watch the admin checks, is one that looks like this, where we've got hopefully a large amount of internet collection, a little bit of traditional, hopefully almost swapping those two over, and several admin sources coming in and providing variables: income quite possibly, maybe replacing some of the questions like number of rooms in the household or in the dwelling space, which we might get from the admin source that we could link, feeding actually into the census database, so that this thing then becomes the beginnings of a statistical population data set rather than what you got off your forms. Maybe still some admin checks from other sources, but I've kind of shifted it in cartoon form down to there, because that's really the 2021 thinking, and the coverage survey still being needed. The 2011 census coverage survey: you think of it as just an adjunct to the census, but it is still by far the largest household survey we've conducted in the decade. So 300,000 or so addresses, stratified, all local authorities, different area types, visited to work out how well you've done with your census, because otherwise you don't know how to do the correction. We're certainly still going to need that in there, and then we can still do the same set of outputs.
So I put this in at the point where I thought I would be after about 40 minutes, and it's not far
off there. I've got some questions for you. So just to seed the conversation we have later, I want you to chill out a bit and say something to somebody else on your table. Just pick something off there that looks vaguely interesting to you. Are there some census variables, stuff we've talked about, that's relevant to what you do? Maybe for some of you that's easy. What might they look like if we go down the route that I'm talking about, or worse, if you had to do them from an admin source? Just have a little ponder. Just think yourself three years down the line and think, well, I know how you do it now, but when we get the new data, well, might that be the same or not? What would be the biggest challenge? How would it affect what you do? Don't spend ages, I'll give you about three minutes, but what I want to do is seed a few excitements about the future, or possible anxieties about the future, that you might come back with, because then when I ask you for things in 10 minutes' time, when we're on a panel, you'll have lots of material to work from. Have a quick conversation.
There's your admin census: just a kind of further update to my processing model that says now I've got no standard enumeration on the left-hand side, I've just got a variety of linked admin data sets. We bring them all in, they get linked in that statistical population database, which interestingly is probabilistic, so it can have half a person at their term-time address and half a person at their home address, for example. In the way it's currently being constructed it's a floating-point calculation; it's not exactly here's a person and they're wholly in one place. And I can foresee the big data coming in at the point where the admin data was being used for the checks, possibly on its way to providing some variables over there. But importantly, with your point about validation, I don't see any route here that doesn't involve a coverage survey at least as big as the one in the 2011 model, because if you do any of this you will find exactly, if you
stop doing the census, how do you know when you've drifted off? The admin sources: governments will change policies, child benefit, things of this kind. We thought we'd got the admin data, but then it's changed, and the admin data doesn't look like the admin data used to look. Maybe it'll be the same with something to do with Universal Credit. You know, some future government will do it differently. So we have to have the concepts firm and know how we're measuring them, but it's got to be resilient against the fact that most of the data are there as a result of a particular policy. That way of doing stuff lasts about a decade and then we do it differently. So maybe we end up with this world, which is mostly administrative, some big data sources, a large official model of the population, and we do a big coverage survey every five years or 10 years, or a rolling one, that helps us to fill in where we think the reality lies. And that's possibly the answer to the validation question.
That didn't address your ethics question, and the legal is a bit separate from the ethics. The ethics debate, I think, is still needing to be had, and the security concerns about the online census, we've got lots of things still needing to be had. But there are, in a sense, in legal terms, quite a lot of exemptions for putting the data together for the purposes of creating the official statistics for the country, but I would really separate them out from, if you like, the broader ethical discussion. Just because it's legal, is it ethical? Discuss. Might want to come back to that one.
There we go: what future? So 2021, we're going to get a vaguely conventional census question, but mainly enumerated online, admin data integrated, some big risks, which ONS are working on with, in fairness, a lot of thought. But we see that really bringing a routinization of the admin data integration, and I would say that where the government departments and ONS get to, if you like, their comfort zone and success levels for 2021, that's the single
biggest driver. So where the 2021 census gets to is likely to set some standards and norms which will then affect what happens when any of us are talking to them, talking to departments who hold those data, about future linkages, certainly in the early part of the next decade, because it's all going to be based on: this worked within the census frame, these are the matching methodologies, these are the things that we understood and how it worked, we've got the validation. And it's going to be an enormously important marker that affects the projects we apply for now that might be ready in six years' time, shall we say. And beyond that, something really diverse, but I think needing to be calibrated with those big coverage surveys.
So effectively, last couple of slides. That's your 2021 model. We shrink it down a bit, we're going to wrap big data around the edge. A lot of those countries are going to, it's like somebody's hit the billiard ball: the cue ball comes in from bottom left and they scatter, they're all heading up in that direction. And my final comment, just for interest, is that there were a few like this who I call leapfroggers, who were down there and then they've gone there. Because most of this is a developed-world census narrative I've given you, and in lots of parts of the world where there have been no formal censuses for quite a long time period, the big data routes, which are saying we can use satellite imagery to find the houses, we've got a load of mobile phone data, really high mobile phone coverage, we've got mobility flows, it's some really basic demographics and counts, will stop even attempting to do what's bottom left and will jump immediately around to top right. So I should stop there.
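[Editorial note on the matching methodologies mentioned earlier in the talk.] The idea of taking hashed name, date of birth and address fields and ranking candidate pairs by weights and probabilities can be illustrated with a minimal sketch. This is not ONS's actual method. It is a generic Fellegi-Sunter-style scoring, and the field names, the salt, the m/u agreement probabilities and the prior are all hypothetical values chosen for illustration. Because the fields are hashed, only exact agreement after normalisation can be tested here; real systems typically need something more tolerant of typos.

```python
import hashlib
import math

def hash_field(value: str, salt: str = "demo-salt") -> str:
    """Normalise (lowercase, collapse whitespace) then hash, so raw values are never compared."""
    normalised = " ".join(value.lower().split())
    return hashlib.sha256((salt + normalised).encode("utf-8")).hexdigest()

def hashed_record(name: str, dob: str, postcode: str) -> dict:
    """Build a record holding only hashed fields."""
    return {
        "name": hash_field(name),
        "dob": hash_field(dob),
        "postcode": hash_field(postcode),
    }

# Hypothetical per-field probabilities (assumptions, not published figures):
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "postcode": (0.90, 0.05),
}

def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum of log2 likelihood ratios across fields: positive for agreement, negative otherwise."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def match_probability(weight: float, prior: float = 0.001) -> float:
    """Turn a match weight into a posterior probability, given an assumed prior match rate."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * (2 ** weight)
    return posterior_odds / (1 + posterior_odds)

def rank_candidates(target: dict, candidates: list) -> list:
    """Return (candidate_index, weight, probability) tuples, best match first."""
    scored = []
    for index, candidate in enumerate(candidates):
        weight = match_weight(target, candidate)
        scored.append((index, weight, match_probability(weight)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    target = hashed_record("Jane  Smith", "1980-01-02", "SO17 1BJ")
    candidates = [
        hashed_record("Jane Smith", "1980-01-02", "SO16 3QY"),   # moved house
        hashed_record("jane smith", "1980-01-02", "so17 1bj"),   # same person, different casing
        hashed_record("John Jones", "1975-06-30", "M1 1AA"),     # someone else
    ]
    for index, weight, probability in rank_candidates(target, candidates):
        print(index, round(weight, 2), round(probability, 4))
```

The middle candidate ranks first because all three hashed fields agree; the one with a changed postcode still scores as a plausible match, which is the kind of graded answer a deterministic key such as a citizen number would not need.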