Good morning, everyone. Just about; actually, it's afternoon, isn't it? Welcome. My name is David Martin and I'm one of the Deputy Directors of the UK Data Service, and I'm just going to introduce and provide some context for what we're going to be telling you over the next 25 minutes. In particular, I'd like to explain that the team who are going to be talking to you are all part of the UK Data Service Census Support team. From UKDS you can gain access to census data from 1961 right through to 2011, which is still the most recent census for which data are available. We will be talking to you about the data we've already got and also about what's going to be possible with data from the 2021 census, and we're doing this right now because this weekend is census weekend.

The team are all part of the UK Data Service but are spread across specialist units at a variety of universities, and you see them listed there. We work together to try to ensure that researchers have ready access to the census data and to provide the most flexible support that we can for users of census resources, hopefully in as friendly and intuitive a way as possible. Today's discussion is part of that process.

Very briefly, I want to set the context for the 2021 census that's going on now and the 2022 census in Scotland, which you may have heard a little about. It's important to understand that in the UK there are actually three separate censuses taking place, because the census legislation is devolved in Scotland and Northern Ireland. They are similar and coordinated, but they're not exactly the same exercise and they don't ask exactly the same questions.
So these are decennial censuses. There's a statutory duty for people to respond to the census questionnaire, and the data that come from it are an absolutely fundamental source of demographic and economic statistics. They underpin lots and lots of public and private decision making, and they're key to providing denominators and insights across a whole range of research. In England and Wales, where the census is run by the Office for National Statistics, and in Northern Ireland, where it is run by the Northern Ireland Statistics and Research Agency, census day this year is this Sunday, the 21st of March. In Scotland, where it is run by National Records of Scotland, a decision was taken partway through last year to postpone what was going to be their 2021 census to March 2022, and so there will not be a single census date across the UK products for this cycle.

Obviously COVID has posed a lot of challenges to the census operation. So there are adaptations to the guidance around the census questions, particularly questions that are going to be hard for people to interpret, about travel to work, usual place of work and students' places of residence. There are also modifications to the field operations, the ones that are going on right now, and clearly the delay to the census in Scotland. So we're going to be seeing a census that needs some special interpretation and awareness on the part of data users, in terms of the constraints that applied at the time the 2021-22 censuses were in the field.
On to the key features of the censuses that we're doing right now. Importantly, this is the first time that we've had a census operation that's been described by the agencies as digital first, in that the whole system was designed for online completion, and the target is that 75% of the population will complete the census online, usually by receiving an online access code. Some neighbourhoods will still receive paper, but it still has an access code on it, and those paper forms have been targeted to neighbourhoods where either internet connectivity is poor or testing has shown that they've got a demographic which is unlikely to respond directly online. But everybody has a chance to respond either online or on paper, so it's a mixed-mode enumeration. The actual core question set in these censuses is largely comparable to the question set in 2011, so there should be lots of opportunity to make intercensal comparisons from 2011 to 2021.

The new forms have got some additional questions. In England and Wales there are new questions on gender identity and past service in the armed forces; those are not present in Northern Ireland. Across England, Wales and Northern Ireland there's a new question on sexual orientation. There's also, for the first time, some explicit linkage of administrative data sources that will actually feed through into the census outputs. In Northern Ireland, where they're not asking the question on the census about past service in the armed forces, there is going to be an attempt to build a model of that data from administrative sources. And then, across the piece, there are new administrative data about numbers of rooms and income estimates. These are data which are being drawn from existing government admin sources but will be used to help build census outputs. So it's important to understand what happens next.
The census is in the field right now in England, Wales and Northern Ireland. The reference date for which people are invited to fill in the forms is the 21st, although many millions of forms have already been returned. Once the enumeration period is complete, and that lasts through to six weeks after census day, all of the paper responses will need to be returned in the post, scanned and merged into the single census database, which is effectively the record of all the responses completed by either channel. A coverage survey will then go out into the field in May and June. That is a carefully stratified sample to understand the extent to which we've reached all the different population groups in all the different local authorities. That intelligence then comes into the process to allow the census agencies to undertake quality assurance of the responses, edit missing and obviously incorrect answers, and effectively use the coverage survey to estimate missing persons and missing households. This is a fairly well-established process from the last two censuses.

When that's all been completed, there'll be an adjustment of the provisional output area boundaries so that data can be tabulated in a consistent way all the way through the system, and we should start to see outputs available to us from March 2023. Clearly that's not going to cover Scotland, and there won't be any whole-UK outputs. So what's going to happen now is that members of the team are going to talk you through the specific output areas, starting with Dave Rawnsley taking us through the aggregate data overview.

Thank you, Dave. I'm just going to share my screen now, so hopefully you can see that. So, the aggregate data. I guess that's what most people think of as the census: counts of people or households at a given geographic level. So what do we expect from the National Statistical Institutes? Hopefully we'll get some headline figures.
The NSIs have said that they plan to have an initial set of census results one year after census day and all outputs available within two years. So those initial outputs will be headline figures, and then we'll start to see outputs appearing slowly in the two years after that. Both ONS and NISRA have committed to a flexible dissemination system that will allow users to create their own outputs by choosing variables and geographies to suit their needs. This will incorporate a form of dynamic disclosure control to protect data confidentiality, and that will possibly be a mix of data blurring, record swapping or data withholding. So we might get to see lots of different populations depending on how that disclosure control works, or it may cache a query once it's created and reuse that, but we'll wait and see.

There will be a number of predefined tables, probably similar to the 2011 key statistics. The harmonisation work on a UK census will obviously be delayed by at least a year. More details will be passed on as we find out what they are. NISRA are also working with the Central Statistics Office in Ireland to produce, where possible, key statistics for the island of Ireland. There will also hopefully be API access into the NSIs' databases, so that we and other people can build interfaces that link directly to the NSIs' data. We've not got any details of that yet, but we will let you know as soon as we find out.

So what can you expect from us? The first thing we'll do is some gap analysis between what has been published and what our users use most, to identify whether anything is falling down that gap and whether we have to build any new tables. We will archive a full copy of all the data from all three NSIs, just in case (fingers crossed it doesn't happen) their websites change in the future, so that we've got access to original copies.
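As an aside on the dynamic disclosure control mentioned above: the NSIs have not yet said how it will work, so what follows is only a minimal sketch of one of the options listed, data withholding. The threshold, the function name and the counts are all hypothetical and are not taken from ONS or NISRA.

```python
from typing import Dict, Optional

# Hypothetical threshold: non-zero cells smaller than this are withheld.
# The NSIs have not published any such rule; this value is invented.
MIN_CELL_COUNT = 10


def withhold_small_cells(
    table: Dict[str, int], threshold: int = MIN_CELL_COUNT
) -> Dict[str, Optional[int]]:
    """Return a copy of the table with small non-zero counts replaced by None."""
    return {
        category: (count if count == 0 or count >= threshold else None)
        for category, count in table.items()
    }


# Invented counts for one small area, purely for illustration.
example = {"car": 120, "bus": 34, "bicycle": 3, "ferry": 0}
protected = withhold_small_cells(example)
print(protected)
```

In this sketch only the small non-zero count is withheld. A real system would also have to stop withheld cells being recovered by subtraction from totals, which is part of why the choice of method affects what users get to see.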
We will provide bulk access to the data via our DKAN interface, but we're also building a new interface to the data which will incorporate all the data from 1971 up to 2021. That will have API access as well, so that you can also build interfaces on top of it. We'll also be working on a UK data set that will expand on the harmonisation work that the NSIs are doing.

So we are currently working on two new interfaces. One, which we're calling UK CAS at the moment, will unfortunately get rid of Casweb, which has been going since 1998. It will also do away with InFuse, and it will provide an interface to data from 1971 through to 2021. We're also working on a new and improved GeoConvert. Both these interfaces will have accessibility built in and API access built in, and we're hoping that we'll be able to link from UK CAS straight through to GeoConvert, so that you can take data at census levels, convert it to other geographies and also add in further metadata. Both of these interfaces will be extremely fast and responsive, much more so than InFuse at the moment.
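To make the idea of converting census data to other geographies concrete, here is a minimal sketch of weight-based apportionment. It is not a description of how GeoConvert is implemented; the area codes, the weights and the function name are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def convert_counts(
    source_counts: Dict[str, float],
    lookup: Dict[str, List[Tuple[str, float]]],
) -> Dict[str, float]:
    """Apportion counts from source areas to target areas.

    lookup maps each source area to (target area, weight) pairs, where the
    weights for any one source area are assumed to sum to 1.
    """
    target_counts: Dict[str, float] = defaultdict(float)
    for source_area, count in source_counts.items():
        for target_area, weight in lookup[source_area]:
            target_counts[target_area] += count * weight
    return dict(target_counts)


# Invented example: two output areas, one of which straddles two wards.
oa_counts = {"OA_1": 300, "OA_2": 200}
oa_to_ward = {
    "OA_1": [("WARD_A", 1.0)],
    "OA_2": [("WARD_A", 0.25), ("WARD_B", 0.75)],
}
ward_counts = convert_counts(oa_counts, oa_to_ward)
print(ward_counts)
```

Because each source area's weights sum to 1, the overall total is preserved. The converted figures are estimates, though, and are only as good as the weights used.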
I have a couple of very early screenshots. This is UK CAS, in which you'll be able to add in your filters, add a location type and an extent, and then see outputs on screen before you download them. With GeoConvert we're a little bit behind UK CAS, but this is an early screenshot. We can add in various metadata groups, and we can do some wildcard selections with this now, which we couldn't do with GeoConvert before. So you can, for instance, say all postcodes for Manchester, or all postcodes for LL, etc. And I think that's all we have for aggregate data, so I shall hand you over to, I think it's James next. Sorry, Ollie.

Okay, so thanks, Dave, for that review of what's happening with aggregate data. I'm going to talk about the flow data, which are another of the census outputs. They're rather less well known than the aggregate data. Flow data, or origin-destination data, are data about people moving from one place to another. Various sets of those have been produced from the census over several iterations. Those include migration data, which are based on the question: where were you living one year ago? Was it the same as your current address, or was it somewhere different? There's a sub-question within that to ask whether or not your address one year ago was a student term-time address or a boarding school address, and that's used to generate a separate set of migration data for students. There's a journey-to-work set of data, based on the questions "where do you work?" and "how do you get to work?". And there's a set that was introduced for the first time in 2011 for England and Wales about second residences, based on the census question: do you have another address that you stay at for 30 days or more a year? The major categories in that second residence set of data are students with different term-time addresses and children who have custody shared between two parents, but
it also includes things like holiday homes, addresses people have for weekly commuting, and so on.

We've talked a little bit about the pandemic. The effects of the pandemic are likely to be quite pronounced for origin-destination data. The journey-to-work data are going to be fairly strongly affected. Obviously a large number of people are currently working at home, people have changed their jobs, and people are furloughed, so the overall labour market has some quite unusual features at the moment. We also know that student term-time location is going to be affected. Students are asked to fill in the census in terms of where they should be staying in term time, as long as they still have the contractual right to stay wherever they should have been, but we don't know how well that will be captured. One further area that's going to be quite unusual in the 2021-22 data is to do with cross-border flows, where people have moved between Scotland and the rest of the UK. There are going to be effects: some people will be captured twice if they've moved from England to Scotland over the coming year, and some people won't be captured at all if they move from Scotland to England. There are also going to be effects on journey to work for people who live on one side of the Scottish border and work on the other side. We're going to have to deal with that in the interpretation of those data.

Plans for what data will be produced are still not final, I'm afraid. Dave mentioned in the previous presentation the use of APIs, flexible table builders and so on. It's possible that ONS and the other census agencies will develop those for flow data, but we don't know that that's going to happen, and we don't think it's particularly likely. We expect there to be very similar outputs to those produced in 2011: four sets, covering migration, workplace, second residence and students. As before, they'll be divided by security level, so the easiest to
access will be open data; safeguarded data is accessible by UK academics and people in public sector roles; and the most detailed data will be secure data, for which you need to be an accredited researcher and have an approved project. The census agencies have spoken a little bit about that, and talked about moving some of the safeguarded data that was produced in 2011 into open or into secure, but we don't know the extent to which that will happen.

In 2011 the origin-destination data were published in the period 2014 to 2015. They are a highly specialised set and they take quite a long time to assemble. As we've heard, the general results should be published in the period from one year to two years after the census. We think the origin-destination data will come towards the end of that period, or possibly a little after, as a specialised set, but that will still make them considerably earlier than was the case last time. With Scotland, we expect all of those to be offset by a year. One of the complicating factors is that the flow data are published at a UK level, because they contain flows within and between the countries of the UK.

Our current delivery software is very old, so we're going to move that to a legacy state, and we're going to be creating a new back-end API that we'll use for data extraction and subsetting. For those of you who don't know APIs, they're something that you have to write a program to interact with. That's obviously not for everyone, so we're going to be creating a front end as well, for users to create their own queries, interactively extract flow data and download subsets of it. And so that brings me to our next presentation, which is James talking about boundary data, I think.

Hi everyone, hopefully you can hear me. I'll just put the presentation into full mode. I'm James Reid, University of Edinburgh, and I head up the geography part of the census. So why are we interested in geography, and what
role does that play? Well, essentially the census is intrinsically geographic. It actually starts with addresses, so it's not only in the outputs: the actual origin of the census starts with geography. On the screen you can see some standard statistical reports, all geographically based, and on the right a simple choropleth map, which is the geographic essence. And the boundary data that I'm going to talk about, digital boundary data, is quite complex in the UK. Those geographic outputs...

It's not displaying your presentation properly. That's it, yeah, good.

So, as I was saying, the geography of the census in the UK is quite complex. As Dave has already said, we have three different agencies, and they use slightly different nomenclature, but essentially at all levels we have a hierarchy built around small areas that are aggregated from the individual census and household responses. Those are things called output areas. Though in Scotland the higher-level geographies have slightly different names, essentially they all form a hierarchy. In England and Wales those are the super output areas: the lower super output areas and the middle super output areas. Hopefully you can see on the screen that as the aggregation increases, the number of units decreases, so these are larger areas. Essentially they're nested into the more familiar geographies, like local authorities. In the 2011 census we also had things like built-up areas and a new geography, the workplace zones, which record the location of the daytime population. Scotland is similar but again uses slightly different nomenclature.

Those output areas, as I say, form a nested hierarchy, from which the geographical outputs, the GIS and mapping data, were formed. On the table there you can see some of the thresholds that were used in the 2011 census. There's some variation in the size of those: the small
areas, the Scottish ones, tend to use smaller thresholds. In the 2021 census there are proposals that some of those output areas may have a UK harmonised output, but that's still in discussion.

So in terms of what the UK Data Service provides, we have a suite of online tools, some of which were already mentioned. They essentially allow you access to all those geographic boundaries for all three countries, dating back to 1971, and you can drill into those geographies and drill down to a single output area or an aggregate of those. There are variations of some of those geographic products which sometimes confuse lay users. There are essentially two core variants. One is a clipped version of the boundaries, which doesn't extend to what's known as the extent of the realm, which is essentially mean low water springs and is used in administrative contexts. If you look at the map at the bottom of the screen, you'll see the clipped and unclipped versions look slightly different, and most people would be more familiar with the clipped versions of the products. There's another version, which is a generalised version of the same boundary set, where essentially we throw away a significant proportion of the information in those boundary files to reduce both the file size and the complexity of the boundaries. There are generalised and super generalised versions of each of the boundaries, and they suit slightly different applications. For mapping, the generalised versions are probably fine for most things, but for detailed analysis and statistical work it's probably best to use the high-resolution variants.

So, coming to the 2021 plans. For those versions of the boundaries I've spoken about, the output areas and the nested lower and middle super output areas, the census agencies have already agreed that that's what they'll be providing. So you'll see on that table we have, we're
going to have super generalised versions and clipped versions, like the existing 2011 variants. ONS have also said they will be producing exactly the same sorts of geographies they did in 2011: area classifications, built-up areas and the workplace zones. One new departure, potentially, is the generation of a one-kilometre grid square product. All of these products will be released under the Open Government Licence, so it's broadly more of the same. The slight variation there is that Northern Ireland may possibly have a dual geography. They had an administrative update to their boundaries in 2015, which makes backward compatibility with 2011 slightly problematic, so they may produce outputs for the new administrative boundaries as well as for some of the older ones. How they're going to do that again remains to be seen. I will now hand you over, hopefully, to Rihab, who will say something about the census microdata.

Thanks, James. Good afternoon, everyone. I work in the UK Data Service, I'm based in Manchester and I lead the census microdata team. So what are census microdata? They are samples of census records. These types of data are very flexible and have a lot of socio-economic detail, but the data are anonymised to protect the confidentiality of the census respondents.

So why do we use them? As I have just said, the census microdata are flexible. They have large sample sizes, so you can combine different variables to make new data from them. They allow you to create sub-samples. You can also perform different types of statistical analysis, for example regression or multivariate analysis, and you can run multilevel modelling with them. And because the data were collected in different years, you can study changes over time using census microdata. But because they are sample data, we should keep in mind that the results will be estimates. Also, these data don't have detailed geography, because they have detailed socio-economic information, so again this is for confidentiality
reasons.

Here we have a list of some of the most important topics of census microdata that will be in the 2021-22 census. There's no need to go through it in detail, just to say that they will be similar to the topics that we have in the 2011 census, but with slight changes. As David mentioned earlier, there will be new questions in the new census; these are the ones in green. In red is a question that will not be asked in the new census, but as David said, users can still have information about the number of rooms from existing government data. And in black are the questions that were asked in the previous census and will also be in the new census.

Currently we have microdata from the 1961 to 2011 censuses. For the 2021 census we are aiming to have microdata files similar to the ones from the 2011 census. At the moment we have these files at three levels of access. We have the teaching files, which are based on a one percent sample of individuals. We also have the safeguarded files. They are larger: we have the regional and local authority samples, and both are based on a five percent sample of individuals. And we have the secure files for England and Wales. These are the individual and household controlled files. They are the largest and have more detail, and each file is based on a 10 percent sample.

Another new thing that we hope to get from the 2021-22 census is a safeguarded household file. At the moment ONS are working on creating this file, which was not available from the previous census. Of course we are urging ONS to create such a file, but for now, unfortunately, we cannot promise much in detail. At this stage it is good news that they are considering it.

Okay, this is my final slide. When we receive the new census products, we are planning to continue providing online access to census microdata through our online data exploration tool, Nesstar. Our users can also access the data from the data catalogue on our website. We expect to have a busy
help desk when the census outputs are released, so we will be doing our best to answer all users' queries. We will also provide documentation on the website to support our users, and we will certainly update our training materials and hold events using the new census data. Before I hand this to Jill, I just want to mention quickly that we have a new website, which will be launched soon. It will have a new look, it should be easier to navigate, and it will integrate census data and material, so our new website will not have a separate census site like the current website.