Good afternoon everybody. So as Jill said, my name is David Rosely. I work for JISC, which you probably won't have heard of, unless you work in UK HE IT. We are the people behind things such as the Janet network and a lot of the software used by the libraries, but we also have a data unit. We used to be part of the University of Manchester, but we've been with JISC now for about six years. I've been working with census data since 1995, so I like to think I know a little bit about it.
So aggregate data is data about populations, groups, regions or countries. It could be average life expectancy, it could be employment rate, GDP, greenhouse gas emissions, or census headcounts. And aggregate data can be time series or it can be a single point in time. Here we have some non-census aggregate data. This is reading performance according to the PISA data. So in this example we have girls identified by circles, boys identified by diamonds, and this is their average reading performance score by country.
So the Jedi left your thinking of founding a church for Jedi and would be worth a place in the UK. But what if the above diagram was your local area, for instance? And what if the numbers represented households where you can identify single person households, so people living alone? And then what if you add another characteristic, say that those people living alone are over the age of 60? Using that information we could build up a map of the country and easily find people who might be vulnerable to COVID, for example, and who are shielding and who might be lonely. We can start to target resources.
So, a census table. This one actually says England and Wales, although in the little box we can see there's an S, which means it also relates to Scotland. We have local authorities in the columns, and the highlighted number there would be people aged 16 to 24 who say that they are Muslim within that local authority.
Each census variable will have a unique identifier. In this case, it's a 1, 0, 3, 6, 5 for that particular variable. The geographies also have a label, but each geography has a geographic identifier as well. Built to deal with the 2001 and the 2011 census data, it actually removes the table format. We've seen all questionnaire papers are released, and there's a project ongoing to try and digitise the data we still have from 1961. There is a possibility that if that goes well, money will be forthcoming for a project to do earlier censuses.
The next census, they say, will be the last ever full census, although they said that about the 2011 census as well. So we'll see. They want to go to a system of using administrative data plus some sample surveys. However, and it's just a personal opinion, I don't think we are good enough at holding administrative data in this country to enable that. So we'll see. Some countries do this. The Dutch don't have a traditional census. Reasons for that are manifold, one of which is that the Nazis used the census data in the Second World War to identify Jewish people. So there was a lot of reaction to that and people weren't forthcoming with answers. Thankfully we don't have that issue in this country.
So the next census will be in 2021 for England, Wales and Northern Ireland, and that will be taking place on the 21st of March 2021. Into the New Year, you'll start to see a lot of advertising around census day. Rehearsals have taken place. They've got good results from those. They've done lots of work on the address registers, so they know where people live, and that might sound obvious. We deliver mail to people, so how can we not know where people live? But there are instances where people are living in back gardens, in parks, where dwellings have been subdivided and sublet and the authorities don't know about it. So a lot of work has gone into making sure that we have a good understanding of where people live and that the address registers are complete.
Scotland have announced that they won't be holding their census next year. They're going to delay it for a year, and that's because of Covid, obviously. I understand that they won't be changing any of the questions. They're just changing the date on which they are holding the census. This will cause a lot of problems, because when we come to create a UK census, the data will be a year apart. When we start to look at flow data, so looking at maybe migration or travelling to work, if you travel to work from Scotland to England, then that data is a year apart. If you migrated from England to Scotland, again you might have migrated in 2021 but you won't be enumerated until 2022. So you might appear in two censuses, the English and the Scottish.
So what can the census aggregate data tell us? It is the most complete source of information about demography and socio-economics in the UK. It's as near as possible 100% of the population. No other surveys come close to that. It's every 10 years, so we can get some longitudinal studies from that, and it covers a wide variety of topics. That's not an exhaustive list, and also there are many tables or variables where we're crossing these topics over.
What it can't tell us: it can't tell us anything about wealth or income. We have derived deprivation data, which we'll touch upon later. But no questions have ever been asked about how much money people earn or how much money they have in the bank. So far no questions have ever been asked on sexual orientation. That may change. I believe Scotland are considering that. I don't know if they're going to do that yet.
And nothing that would identify an individual is released for 100 years. All of the census agencies are very, very careful about this. To my knowledge, no personal identification has ever been able to take place from the census aggregate data. To do this, they blur the data, they obfuscate the data. They move the data around where you get very small numbers.
So where there's 1s, 2s and 3s, they may not be 1s, 2s and 3s. They might be 2s, 3s and 1s. Or they may move records around, so they may swap the data from two areas where there are very small numbers.
So how are the aggregate data produced? Well, traditionally there's a paper form. You might not see a paper form this year, sorry, next year. You might see some standard tables based on what? In the 2021 census they are hoping to create a system where you can create your own tables. There will be a set of standard tables as normal, but you will also be able to create tables on the fly using the variables you're interested in. And that will include some online data blurring where numbers are small and are potentially disclosive.
So from the question forms, a series of tabular outputs are designed at different levels. For the higher geographies, so countries, regions, counties, local authorities, there will be more detailed data. As you start to get lower down, down to output area, maybe 100 households, the data won't be as detailed. So once they've got these tables, they can start putting together the geographies. Now, countries, local authorities, counties, they stay the same; those are recognised administrative boundaries. The smaller geographies, such as output areas and super output areas, are designed once they have the data. And they are designed so that output areas describe a community that is homogeneous, similar. And they are bounded by recognised geographic elements. So major roads or coastline or rivers all play a part in shaping output areas. The output areas are the building blocks. From those we get lower super output areas and middle super output areas. The data is then subject to some statistical disclosure control, the data blurring, to restrict it if it's felt in any way to disclose information.
So there is an issue with census geography. As we said, the building block is the output area.
It's built after the census has been taken, and it's roughly 100 to 150 households, designed to be homogeneous, so to describe an area that has similar characteristics. It's constrained by the obvious boundaries such as roads and rivers. And in 2011 they were aligned to local authority boundaries as well, and I imagine that will be the case in 2021. And we use those to build up the higher geographies. Now they're called different names in different nations. In Scotland, they're called data zones. In England, super output areas. In Northern Ireland, they don't have middle super output areas, just lower super output areas. And none of those geographies are real. They don't describe anything that we understand as such. So it's not where you live. It's not a town. It's not a county. The Royal Mail couldn't deliver to your output area, for instance, whereas they can to your postcode. But above those census geographies are the local authority geographies: so the wards, which generally relate to electoral wards, and then on top of that local authorities.
There's a link there to Geoconvert, which we'll be talking about later. Geoconvert is our resource and application for dealing with the tricky issue of census geographies. Also, just to add in, there's one other little problem, in that output areas in England and Wales are a minimum of 40 households, and they're recommended to be about 125 households. In Scotland, it's a minimum of 20 households and a target of 50 households. So there are differences if you're going to be comparing England and Wales with Scotland.
As I said, there are differences. So each nation has its own census office: the Office for National Statistics in England and Wales, National Records of Scotland, and NISRA, the Northern Ireland Statistics and Research Agency. And they all create their own census questionnaire. They capture the data and deal with it completely separately. They talk to each other and they broadly agree on where they are going to be the same. But there are definite differences in 2011.
There are differences in the definition of long-term disabilities. There are differences in the definition of tenure. There are also differences on the same table sometimes in the population. I think Scotland has ages up to 60, whereas England and Wales have ages up to 65 in a lot of their population tables. They do aim for stability over time, so they attempt to have questions that you can compare with previous censuses. And they do produce a report each year on comparability over time and comparability with the other nations' censuses. I don't have a link there, unfortunately, but I can get a link for you at the end to the ONS report on comparability.
So, accessing the census data through the UK Data Service: we have three interfaces for the data. InFuse has data from 2001 and 2011. Casweb is a very old, creaking interface, but it has data from 1971 up to 2001. It also has integrated boundary data with the 1991 and 2001 censuses. Decan is our latest interface. It's our solution for delivering bulk data, so whole tables. It's currently only got 2011 data, but we are loading 2001 in now, and we plan to load all the data back to 1971.
So we've got an activity for you now. We're going to ask you to go, in a web browser, to InFuse at the web address there, and before you do that, if you could download the worksheet from the address here. This should take you about 10 minutes. So I'll stay online. If you've got any questions, pop them in the question box. So have a look at the worksheet, and actually, before we do that, what I might do is just go and show you InFuse. So I'll just change my screen.
Okay, so what I'm now showing is InFuse, and the topics available. If you pick topics first, it will limit the geography that's available to you. Or I can choose geography first. We could go in then and choose something completely different. So I could go in, click all the geographies in Bridgend, and then go on to topics.
So it will give me some other set of variables. And then I've got a tool here. So I can choose all the variables that might relate to how I get to my place of work, add them, and go on to the next sheet, which shows the variables we've asked to download, and then the data itself is in a comma-separated values file. And if I open that up in Excel, I'll just change that. You see that. So here's our data here: unique identifiers for the variables, and then variable names, and then the data. There's a worksheet you can download that will give you a step-by-step task to perform.
Okay, the answers to those questions. So how many one-person households, where the householder is 65 or over, are there in Bartley Green in Birmingham? The answer to that is 1,409. As a percentage, that's 13.1% of the number of households. And across all of Birmingham, about 11.6% of households are single person over 65, which is quite a lot. There are wards in Manchester where that is over 25%. So serious problems with isolation there, especially with shielding for COVID-19.
So, deprivation data. So we're not allowed, sorry, the census agencies aren't allowed to ask questions on income or wealth. So how can we then get information on deprivation? It's derived. They look at room occupancy, so how many people are living in a household compared to how many rooms there are; house ownership or tenancy, housing association; car availability, so how many cars or vans are in the household; employment status, et cetera. There is an issue with car availability, because it was identified that in the centre of London, people don't have cars. You might live in a huge mansion, but you just don't have a car, because you don't need one. So the equation has been designed to take that into account. There are a number of recipes for that. Carstairs and Townsend traditionally were used with 1981, 1991 and 2001.
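To make the idea of a derived measure concrete, here is a minimal sketch of a Townsend-style score: standardise each census indicator across areas, then add the results up, so that a higher total means more deprived. The four indicators match the ones just listed, but the area names, the percentages and the function names are all made up for illustration, and the published recipes may apply further adjustments that this sketch leaves out.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a list of numbers to mean 0 and standard deviation 1."""
    m = mean(values)
    sd = pstdev(values)  # assumes the values are not all identical
    return [(v - m) / sd for v in values]


def deprivation_scores(areas):
    """Sum the z-scores of each indicator for every area.

    `areas` maps an area name to a dict of percentage indicators.
    The indicator names below are hypothetical labels, not census codes.
    """
    indicators = ["unemployed", "no_car", "not_owner_occupied", "overcrowded"]
    names = list(areas)
    totals = {name: 0.0 for name in names}
    for indicator in indicators:
        standardised = z_scores([areas[name][indicator] for name in names])
        for name, z in zip(names, standardised):
            totals[name] += z
    return totals


# Made-up percentages for three imaginary areas.
areas = {
    "Area A": {"unemployed": 3.0, "no_car": 12.0, "not_owner_occupied": 20.0, "overcrowded": 2.0},
    "Area B": {"unemployed": 6.0, "no_car": 25.0, "not_owner_occupied": 35.0, "overcrowded": 5.0},
    "Area C": {"unemployed": 12.0, "no_car": 48.0, "not_owner_occupied": 60.0, "overcrowded": 11.0},
}

scores = deprivation_scores(areas)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:+.2f}")
```

Because every indicator is standardised before it is added, no single indicator dominates just because its percentages happen to be larger numbers.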
More recently, it's been the index of multiple deprivation, which is now called the indices of deprivation, and there's a very good website there from gov.uk, which will explain that and has links to other, earlier indices of deprivation.
Along with deprivation, we can talk about matching geographies, because the tool that we use for matching geographies we can also use for getting access to deprivation data. Geoconvert is a system we've been developing now for about 15 years. It uses address points, so where Royal Mail deliver to, and how many people are at that address, to calculate an area's population, and then it uses some clever proportioning of that data to understand how many people are in an area. You have to think about what you're attempting to do if you're going to use Geoconvert to convert or match geographies, because there are overlaps, and also geographies don't have homogeneous populations, so people are not spread nice and evenly over an area. So if you're going to split a district in two, right down the middle, everybody might be in the left-hand part of that rather than spread nice and evenly over both parts. Also, postcodes change. They get reused over time. They get discontinued as well. So when you're trying to match postcodes, it won't always be perfect. We do have information in there about when postcodes were introduced and when they were ended, but it's not 100%. For supporting documentation, there is a link there. We've got all sorts of information about geographies on there as well.
What we're going to do now is a little activity. I'm going to walk you through this one. I'm going to walk you through all the different parts of Geoconvert. We've got some sample data that you can download there, if you can download those two files, postcodes.csv and data.csv.
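The proportioning idea described above can be sketched roughly as follows. This is not Geoconvert's actual code, which isn't described in this talk; it's a minimal illustration of population-weighted apportionment, assuming we know for each address point which source area and which target area it falls in and how many people live there. The area codes, the counts and the function name `apportion` are all hypothetical.

```python
from collections import defaultdict


def apportion(source_counts, address_points):
    """Reallocate counts from source areas to target areas.

    source_counts: {source_area: count to be redistributed}
    address_points: iterable of (source_area, target_area, people) tuples,
        one per address point (a made-up structure for this sketch).
    """
    source_pop = defaultdict(float)   # people per source area
    overlap_pop = defaultdict(float)  # people per (source, target) overlap
    for source, target, people in address_points:
        source_pop[source] += people
        overlap_pop[(source, target)] += people

    target_counts = defaultdict(float)
    for (source, target), people in overlap_pop.items():
        if source in source_counts and source_pop[source] > 0:
            # Share of the source area's people who live in this overlap.
            weight = people / source_pop[source]
            target_counts[target] += source_counts[source] * weight
    return dict(target_counts)


# Source area S1 straddles targets T1 and T2, but most of its people
# live on the T1 side; S2 sits entirely inside T2.
address_points = [
    ("S1", "T1", 3),
    ("S1", "T1", 3),
    ("S1", "T2", 2),
    ("S2", "T2", 2),
]
source_counts = {"S1": 100, "S2": 10}

result = apportion(source_counts, address_points)
print(result)
```

In this made-up case three quarters of S1's people live in the T1 part, so three quarters of its count goes to T1. Splitting S1 by land area alone could give a very different answer, which is the point about people not being spread evenly.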