Hello, okay, we're going to start the webinar now, so many thanks for joining us. This webinar is to do with the 2011 census, specifically aggregate data. First of all, just some introductions. My name's Richard Weisland, and I'm with my colleague. Hello, my name's Justin Hayes, and we both work for the UK Data Service, based up at Manchester.
Okay, so just to let you know a bit about the structure. We're going to have roughly a 30 minute presentation, and then the remaining 20 minutes for questions and answers, so we're aiming to finish at 10 to 3. If you have any problems, use the chat facility. We are presenting at the same time, so we're not always going to be able to look at it straight away, but if you've got questions, we can look at them at the end of the webinar.
This is just an overview of what we're going to cover. I'll give you some brief information about the UK Data Service, and specifically census support. We'll give you some background to the census, tell you what aggregate data is, and talk about the geographies and other information about the census. Then we're going to do a demo of our application Infuse, where you can download census aggregate data. My colleague Justin will then talk about the data and geography model, and we'll talk about our next release and future plans.
So what is the UK Data Service? Well, we're funded by the ESRC, and we integrate several previous resources, such as the old ESDS and the census.ac.uk site. As well as the data, we provide support, training, and guidance, and all of our materials can be found on our website at ukdataservice.ac.uk. Census support is a specialist unit of the UK Data Service, so we have our own little microsite.
What we do is provide access to, and support for the use of, the data from the last five UK censuses, from 1971 up to the latest 2011 census, and we provide bespoke interfaces to make the data easier to find, understand, and use. You can see the URL there, and it's easily found from the main UK Data Service website. Talking about the website, here's a screen dump from it. You can see there's lots of information on there, and there are numerous ways to reach the census support part of the site, but you can click this link here on the right-hand side, and that'll take you to the main census support website. On that, we have the obvious things, like "get census data" at the top, and "use census data", which has links to the question forms and the definitions, for example. There's also the news and events part, so you can find out about the news and any events that might be coming up.
Right, now we have a question for you: are you a census user? I'm just going to start a poll, which I'll launch now, and simply, if you can let us know whether you use the census weekly, monthly, annually, or never. Okay, I can see people are answering, which is great, and I'll just give you five more seconds, so five, four, three, two, one. Okay, I'll just close that poll, and I'll share the results. Great, so I can see that it's a little bit of everything really. There are some people who've never used it, and some people who use it much more often, so that's interesting to know. Okay, I'll hide that, there we go.
Okay, so to carry on, just a little bit of background information about the census. The census takes place every 10 years, and it covers the entire UK population. It's been going since 1801, so it has a really long history, with the exception of 1941, due to the war. The questions are mainly about people and households.
The 2011 census is estimated to have cost around £500 million, but considering how much the census is used, it's a pretty good investment. The census provides primary evidence for government policy and spending, and it covers a wide range of demographic and socioeconomic characteristics. It covers detailed combinations of characteristics, which makes it quite unique. It goes down to small areas, which is very unusual, and everyone has been asked the same questions on the same day, so you can reliably compare different areas. Because it covers such a large population, you're not going to get that with any other source really, and it has a really long history. Although the primary purpose of the census is government policy and spending, it has a lot of use as a secondary source of information for other people, such as charities, academia, and the commercial sector, for instance. The good thing is that the data is all available under the Open Government Licence, so that makes the data much more usable for lots of people, and gets rid of the barriers.
Anyway, we're talking about aggregate data, and just to give you a definition, that's counts, usually of people and households, with particular combinations of characteristics for particular geographical areas. All these characteristics come from the questionnaire responses, or are derived from them. As we were saying earlier, the areas vary from very large, so UK and country level, right down to the smallest areas. There's some data down to postcode level, which just has counts of males and females, but the data for the larger areas gives us more detail. An example of aggregate data could be the number of females aged 16 to 74 in employment in associate professional and technical occupations, usually resident in wards in the county of Devon, for example.
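To make that definition concrete, here is a minimal sketch of how an aggregate count is formed: individual records are filtered on a combination of characteristics and then counted per area. The records, ward names, and the `aggregate` function are all invented for illustration; this is not how the census agencies or Infuse actually produce the data.

```python
from collections import Counter

# Hypothetical individual-level records (invented for illustration only).
records = [
    {"ward": "Ward A", "sex": "Female", "occupation": "Associate professional and technical"},
    {"ward": "Ward A", "sex": "Female", "occupation": "Associate professional and technical"},
    {"ward": "Ward A", "sex": "Male", "occupation": "Associate professional and technical"},
    {"ward": "Ward A", "sex": "Female", "occupation": "Managers"},
    {"ward": "Ward B", "sex": "Female", "occupation": "Associate professional and technical"},
    {"ward": "Ward B", "sex": "Male", "occupation": "Managers"},
]


def aggregate(rows, **wanted):
    """Count rows per ward whose characteristics match every key in `wanted`."""
    counts = Counter()
    for row in rows:
        if all(row.get(key) == value for key, value in wanted.items()):
            counts[row["ward"]] += 1
    return counts


# One aggregate statistic: females in a particular occupation group, by ward.
counts = aggregate(
    records, sex="Female", occupation="Associate professional and technical"
)
for ward, n in sorted(counts.items()):
    print(ward, n)
```

Only the counts per area are published, never the individual records, which is what makes this aggregate data.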
And just to show you an image of what aggregate data is: if you looked at the country and then decided to carve it up into lots of areas, like this, for example, these would be the boundaries, and then each area has a name, so I believe these are wards. Then you look at the people who live in each area with particular characteristics, so these could be my last example, females working in a particular occupation group, and you count them all up, and that's all aggregate data is. It tells you that there are 43 people with a certain attribute in one area, compared with some other number in another area.
If you download these data with our interface, Infuse, this is what the data will come out like. You can see that's the geography code, the geography label, and so on, and then you can see there are two columns here. One column is the number of females in associate professional and technical occupations, and the other column is the total of all categories of occupation, so that's all people essentially. So that's what the data would look like if you downloaded it from Infuse. You can also do some work and quickly convert that into a percentage, and if you do that and you have some GIS experience, you could create a map like this. So you can visualize the data, and this is a choropleth map for Devon for all the different wards.
The 2011 census took place on the 27th of March, 2011, and just to make you aware, the UK census is actually conducted by three different agencies. For England and Wales, that was the Office for National Statistics; for Scotland, the National Records of Scotland; and for Northern Ireland, that would be NISRA. For each census, new questions and variables are added, so there are some new ones for the 2011 census, which I'll tell you about in a minute.
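Going back to the downloaded table described a moment ago, the percentage conversion might be sketched as follows. The column names, area codes, and figures in `SAMPLE_CSV` are assumptions made up for this example; a real Infuse download will have its own headers, so check the accompanying metadata file before adapting this.

```python
import csv
import io

# Hypothetical extract in the shape described in the talk: a geography code,
# a geography label, one count column and one total column.
# Column names and values are invented for illustration.
SAMPLE_CSV = """GEO_CODE,GEO_LABEL,FEMALE_ASSOC_PROF_TECH,ALL_OCCUPATIONS
W001,Example Ward 1,43,500
W002,Example Ward 2,30,400
"""


def add_percentages(csv_text, count_col, total_col):
    """Return (code, label, percentage) tuples, one per area."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = int(row[total_col])
        # Guard against empty areas rather than dividing by zero.
        pct = 100 * int(row[count_col]) / total if total else None
        rows.append((row["GEO_CODE"], row["GEO_LABEL"], pct))
    return rows


percentages = add_percentages(
    SAMPLE_CSV, "FEMALE_ASSOC_PROF_TECH", "ALL_OCCUPATIONS"
)
for code, label, pct in percentages:
    print(code, label, pct)
```

A percentage column like this is the usual input for a choropleth map, since raw counts mostly reflect how many people live in each area.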
What was new for the 2011 census was that, for the first time, there was online completion of the census. Targeted enumeration took place as well, because there are always certain areas or certain types of people that are harder to find and count, and are under-represented. A lot of quality assurance took place on the data as well, to ensure it was all up to scratch.
Okay, so here's a screen dump from the 2011 census form. New questions this time around were questions such as national identity, passports held, ability in spoken English, languages other than English, month and year of arrival in the UK, intention to stay, and second homes. So that's what was new this time around.
And we've got another question for you. This is main language other than English, by number of speakers, in England and Wales. That's a graph, and you can see we've put the top ones there. But we're missing the largest non-English language spoken in England and Wales. I'll just start that poll, and you should be able to see you've got four choices: Somali, Polish, Turkish, or Tamil. So if you can just say what you think that missing language is. Okay, and I'll give you five more seconds. Five, four, three, two, one. Okay, I'll just close that poll and you should be able to see the results. Yes, so most people got that: Polish was the largest non-English language in England and Wales. That just gives you an example of the sort of data you get out of the census. And that's the graph completed.
Okay, so I'm now going to go and do a demo of Infuse. You can see that for Infuse we have 2011 and 2001 data, and I'll go and access the 2011 data there, so just click that link. As I was saying, it's all open access. It used to be the case that it was restricted to academia, but now anyone can access this data. Infuse has four different steps to it. There's step one, which we're on at the moment, which is the topics.
And you can see here there's a grid of all of the different topic combinations. On the left-hand side, you can see all the topics, and we can use those to filter the combinations down. So if I go and click on sex, for example, we can see that we now have 77 combinations. If I then go and click on age, we can see it filters down further. So there's age and sex there, for example, and lots of other combinations. I'm now going to click Occupation, and there we go. I'll pick this first one: age, economic activity, occupation, and sex. So this is the example that I showed earlier. I've now clicked that, and you can see all of the definitions for those different topics. Once you're happy that you know what you're looking at, you can click Next again.
At this point, you have the opportunity to select exactly what you want. So I'm going to just select one. There's only one option for age, 16 to 74. For economic activity, there's just in employment. So I've picked that one here, and Female, and I'll click Add. And you can see it's added it right down there at the bottom. Now we'll click Next.
At this point, we can select the geography we want. It's very flexible, so you can select multiple geographies, and Infuse is really good for that, whereas in other interfaces it's a little bit harder. For instance, in the example earlier, we selected all wards in Devon. If I click this little plus here, it'll expand all the counties. That takes a little bit of time to come up. Great. And then you can see Devon. So we could tick Devon, and I will tick Devon, and that will give us data just for the county of Devon. But I wanted all wards. So if I click that plus, which I've just done, and then tick it here, we've got wards and electoral divisions. There are 201 wards in Devon. And I could go down further if I wanted just one specific ward or a different geography.
But I'll stop there. I'll scroll down and just click Add. And there, that's just a confirmation: you've got your data for Devon, and you've got all wards in Devon as well. Click Next again. This is just the stage where we download the data. There's a little confirmation of what your file reference will be, and you can change that if you want. Something I didn't really mention, that I should have, is that there's a little bit of guidance through each step as well, which you can always have a look at. But anyway, I'm done. I just want to get the data, so I click the green button here. It comes up. We now need to click this red button and download the data.
What it does is it'll come in a zip file, which I'll show you. Just let me get my mouse to the right place. The zip file's got three different files in it. There's a citation file. It's really important to cite where you got the data from, so you have that information telling you exactly how to cite the data. There's data, which I'll open in a minute. And there's also metadata, which gives you all the descriptions, including the definitions of the topics that you picked, which can be important to refer to. So if I open the data, it's just a simple CSV file. You can see there are just three different columns. There's the geography code, the geography label, and then you can see the figures that we downloaded as well. Which is great.
OK, I think that's the end of the demo. Oh, no, I was going to show you one more thing, wasn't I? Another thing to say: if you want to start again at any time, then click Start Again here. OK, so we're back at the beginning. One very quick thing I wanted to show you is what happens if you pick a topic combination that doesn't exist, because not all topic combinations were provided by the census agencies. We wish they were, but sometimes that's because the data could breach confidentiality, which is very important.
So say I pick ethnic group, for example, and then decide to pick religion. That's available: you can have ethnic group by religion. But you can see all of these different topics that are grayed out, and that means that they're not available. So if you wanted car and van availability, for example, you wouldn't be able to access that, because the data isn't in Infuse. That's just an example, but it's a good example of why it's important that Infuse tells you what doesn't exist as well as what does exist. Anyway, I won't dwell on that any longer. OK, so that brings me to the end of the demo. At this point, I'm going to pass over to my colleague, Justin.
Thanks, Richard. I'm hoping everyone can hear me. I'm just going to take over from Richard and tell you a little bit about the characteristics, the data that is available from the census, and the way in which we process it into a form which allows us to use it in Infuse the way we do, and then the census geographies as well. And then a little bit of detail about the data that Infuse has at the moment and the data that we're going to be adding to it very shortly.
Right, so the census data is specified and produced, traditionally, as sets of tables. These tables range from very simple ones with one variable, like households with at least one usual resident. And the tables, oops, there we go. Some of the tables get larger and larger and more complex. You can see here that these are cross-tabulated tables with more than one variable, and each of the little cells here contains a count for a particular combination of categories. They get larger and larger, and some of the big ones contain several thousand different cells of information. The data that's produced by the census agencies comes in the form of multiple different packets of data, which are associated with particular sets of geographies, for each of these tables.
And so a large part of our work is tracking and bringing together all the data that's produced over a period of several years by the agencies, in a form which allows us to manage the whole lot and then provide an interface which allows end users such as yourselves to pick the little bits of information they want from within the vast swathes that are available within the census.
Just to give you an idea, the data that's already within Infuse, which is the data we've processed and which has been in since the middle of last year, is mainly what are called the local and detailed characteristics. There's also some UK data, so that's data across the UK, but only down to local authority level. That was supplied in 422 different variants of the kind of table layouts which I just showed you. Most of those were multivariate, quite complex tables. There are 31 different types of geographies, so things like wards, unitary authorities, districts, local government districts in Northern Ireland, council areas, et cetera. The whole lot came to us in about 11,000 different files, which we had to pull together. These were files of data and files of accompanying metadata, about 15 gigs of data, with a lot of the metadata in separate files which we had to open up and try to integrate.
So, the idea of Infuse. Traditional dissemination starts with trying to allow you to pick a table. It effectively reproduces all the tables online, allowing you to pick a table, look into that table, and see whether it has the data you want. If it doesn't, you come out of that table and jump into another table, and see whether other tables have the data you want. So it's sort of a per-table query. What we set out to do was to try to bring the data from all the different tables into a single model, which allowed us to query across all of the data at once.
This involved deconstructing, so pulling apart, all of those table frameworks, several hundreds if not thousands of them, looking at what variables and categories there were in each table, and trying to condense and rationalize them into a standard set of variables and categories. Oops, sorry, I've just got a strange error message coming up. We also tried to simplify the structure of the overall data set by integrating things like table universes. A lot of tables have variables specified on the axes, but there's also something called the universe: quite often tables are only produced for, say, people of working age, 16 to 74. Once we had all of these cleaned up and rationalized sets of variables, we were able to effectively reinsert the counts, which had been supplied in all of these thousands of different files, into this single descriptive model. The model then also provided us with a framework to attach all of the extended metadata about the data. So, for instance, information about the question text that was used: not only do you have the immediate description of the variables, but you can look at the questions which were used in the survey, the responses to which were then processed into the data. I think we're pressing the wrong buttons.
So from the original set of 11,000 different files, we rationalized those down to find that there were 97 variables, and those variables contained about 2,500 categories between them. The variables were found in about 280 different combinations, and within those variable combinations there are about 140,000 different category combinations. The important thing is that we had a model in which we could see all of these things and describe them at once. The data already available through Infuse has about four and a half billion different values. So that's the actual numbers that you see.
And in an idle moment I worked out that if you wrote each one on a sticky note and stacked them up, you'd get a stack approximately 460 kilometers high. So it would sort of reach between Manchester and Belfast if it fell over.
So that's the what of the census, the characteristics. I always think that the census tells us what it's like, where, and when. So that's the what of the census. The 2011 geographies are the where; they're the location. Really they consist of subdivisions of the UK into smaller areas. At the top level you can get overall aggregate figures for the entire UK, and then for smaller areas within that. We call these sets of similar areas geographies. So you get regions, counties, unitary authorities, districts, and they're each a separate geography. There tend to be hierarchies of these geographies, the most common being the statistical and the administrative hierarchies. We're expecting about a hundred different geographies overall, but we've done some work to try to simplify those, which I'll be showing you now, let me see.
So as I said, yes, there are hierarchies within these geographies. The administrative geography would start at the UK, then regions, counties, districts, wards. The statistical geography contains the smaller statistical areas, like the output areas, the super output areas, et cetera. And there are also several other geographies which are specific to different uses, such as health, electoral, and postcode geographies. This is just an example of the administrative hierarchy, if I can get my cursor on the right screen. Regions, then counties and unitary authorities. These are districts, then wards, and then going down to the very smallest, output areas. Output areas, I think, have a minimum population of 40 households and a hundred people, so they're actually very small areas, and there are about, I think, 150,000-ish of those in England and Wales. I have a warning message again; got rid of it.
So the first job really is to try to structure and record the raw relationships between these geographies. This is just a snapshot of our SQL Server database with the tables. Half of these tables are lists of areas, so there'll be a table which lists all of the different wards, and the other half are the relationships between them. So this is the way in which we capture all of the geographies in lists, and what the relationships are, for instance if there are hierarchical relationships. Luckily, you don't have to see any of this, because we've boiled it down into a much simplified form. We've taken the 31 geography types which the data so far has been supplied with and simplified those into 11 geography layers. For instance, we've tried to simplify selection across the UK by condensing all of the district-type areas into a single layer, so local government districts from Northern Ireland, council areas from Scotland, and unitary authorities and districts are all combined. We've also done work to try to simplify the rather over-complicated system of standard and merged geographies that ONS has used for wards and districts in England and Wales, by bringing the data together into a single layer. I think you'll have to ask me more questions about that if you want to know. But the idea is that it brings together the data and makes it easy for you to select them.
We've developed the interface to allow you to make complex selections of multiple areas from different geographies across the UK in a single operation, so you don't have to go in and select wards in one operation and then districts in another. You can do it all in one go. And the interface allows you to make what we call geography jumps. So, I think as Richard demonstrated earlier, you can select England and then specify that you want all of the wards in England without going through all of the intermediate geographies and drilling the whole way down.
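As a rough illustration of the structure just described, the sketch below keeps one lookup listing areas with their layer and one lookup recording child-to-parent relationships, and then performs a "geography jump" by collecting every area in a target layer that sits anywhere beneath a chosen area. The area codes, names, and function names are invented for the example, and the real database schema is not shown in the talk, so treat this purely as an assumption about one way it could work.

```python
# List of areas: area id -> (name, layer). All ids and names are hypothetical.
areas = {
    "E": ("England", "country"),
    "R1": ("Example Region", "region"),
    "C1": ("Example County", "county"),
    "D1": ("Example District", "district"),
    "W1": ("Example Ward 1", "ward"),
    "W2": ("Example Ward 2", "ward"),
    "S": ("Scotland", "country"),
    "CA1": ("Example Council Area", "district"),
    "W3": ("Example Ward 3", "ward"),
}

# Relationships between areas: child id -> parent id.
parent_of = {
    "R1": "E", "C1": "R1", "D1": "C1", "W1": "D1", "W2": "D1",
    "CA1": "S", "W3": "CA1",
}


def is_within(area_id, ancestor_id):
    """Walk up the hierarchy from area_id and report whether ancestor_id is reached."""
    current = parent_of.get(area_id)
    while current is not None:
        if current == ancestor_id:
            return True
        current = parent_of.get(current)
    return False


def geography_jump(ancestor_id, layer):
    """All areas in `layer` below `ancestor_id`, skipping the intermediate levels."""
    return sorted(
        area_id
        for area_id, (_name, area_layer) in areas.items()
        if area_layer == layer and is_within(area_id, ancestor_id)
    )


wards_in_england = geography_jump("E", "ward")
print(wards_in_england)
```

Note that the Scottish council area and the English district share the single "district" layer here, mirroring the condensed layers mentioned above.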
So this is what the simplified version of the geographies looks like in the interface. This is one of the information pages that goes along with the data. So once you have the what of the census, which is the characteristics, and the where, which is the geographies, there's then the interrelationship between those, because not all of the data that's produced or specified in the tables is available for all of the areas. This is primarily due to issues of confidentiality of personal information. It's a very fundamental aspect of the census that the agencies guarantee that no information will be traceable to particular individuals for 100 years following the census. I mean, we're now able to look at the actual individual returns from the 1911 census, but we won't be able to see the 2011 census until 2111.
The way in which this is implemented really is to avoid low counts being produced by the census. With low counts, maybe there are only one or two people with a particular combination of characteristics in an area. That leads to the danger of people who have some knowledge of the area being able to identify someone who they know has one or two characteristics, and then finding out a further characteristic which may have been produced in combination with those. So this results in a trade-off between the amount of detail in the characteristics and the geographical detail. Basically, the smaller the geographies you want information for, the less detailed the information you will be able to get for them. So, for instance, whereas you might be able to get data with single-year-of-age categories for wards, if you go down to output areas, you may only be able to get data for condensed categories of five-year age groups.
There are two main groups of census outputs. There is what we call the lower threshold data, which is the less detailed data, but that is available for all of the areas.
So that goes right down to the smallest output areas, and that's primarily what the agencies call their key statistics, quick statistics, and local characteristics. And then there's higher threshold data, which has a higher population threshold required for its release. So it's only available down to wards and middle super output areas, not down to the smallest areas. But that gives you more detail in the characteristics, more detailed categories, and that kind of thing.
So just to recap on what we've been saying about Infuse, the benefits of Infuse are primarily that it doesn't have this concept of tables. It doesn't have the intermediate step of you having to find a table and then look into the table. It provides a very fast and easy way of globally searching the entire set of census outputs, because we've structured those in a single data set, and you do this using the variable and category combinations that Richard demonstrated. I think Richard also demonstrated that if you're looking for a particular combination of variables and it doesn't exist, Infuse allows you to find out very quickly that there's no data that meets your requirements, whereas previously, until you'd looked in quite a number of tables, you wouldn't have been able to be reassured that your data actually didn't exist.
Infuse is designed to guide users to find data as quickly as possible. As soon as you select one variable, it will only allow you to make following selections of variables with which that variable has actually been combined in the data; all of the other variables are grayed out. So you can only make selections which will actually lead you to data. And once you've made your variable and category selections, you're only shown geographies for which data for those category combinations are available.
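The guided selection just described can be sketched as a lookup over the set of variable combinations for which data exists. This is only an assumption about one way such filtering could be implemented, not a description of Infuse's actual code, and the list of available combinations below is invented for the example.

```python
# Hypothetical set of variable combinations for which data exists.
AVAILABLE = [
    frozenset({"Age", "Sex"}),
    frozenset({"Age", "Economic activity", "Occupation", "Sex"}),
    frozenset({"Ethnic group", "Religion"}),
    frozenset({"Car or van availability", "Sex"}),
]

ALL_VARIABLES = set().union(*AVAILABLE)


def selectable(selected):
    """Return (variables still enabled, number of matching combinations).

    A variable stays enabled only if it appears in at least one available
    combination that also contains everything selected so far.
    """
    selected = frozenset(selected)
    matching = [combo for combo in AVAILABLE if selected <= combo]
    enabled = set().union(*matching) - selected
    return enabled, len(matching)


enabled, n_matching = selectable({"Ethnic group"})
grayed_out = ALL_VARIABLES - enabled - {"Ethnic group"}
print(sorted(enabled), n_matching)
print(sorted(grayed_out))
```

Because every enabled choice is backed by at least one real combination, a user following the enabled options can never end up with a selection that has no data, and an empty result tells them straight away that the combination does not exist.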
The way in which we've managed the data has also allowed us to spot gaps in it, and so we've produced additional aggregated counts for some of the areas which the agencies seem to have forgotten in their data production, and we've done work to make lower threshold and higher threshold data available for all the areas in the condensed geography layers that I was talking about. Because we can tie metadata in with the data in the same model, we're able to provide that metadata a lot more easily, so it allows users to understand the data a lot better and make appropriate use of it. All the data is now open access, via the Open Government Licence, which means we've been able to remove the authentication which we did have previously on the interfaces.
So just a quick recap on the data that Infuse currently contains. For England and Wales, it contains key stats, quick stats, local characteristics, and detailed characteristics, down to output area level. UK-wide, we've only got data down to local authority level, so these are the UK-wide harmonized outputs. We are planning a major release by the middle of February, which will more than double the amount of data in Infuse. This was supplied in about 837 tables and another 5,000 files, and it will bring us mostly up to date with the 2011 outputs that have been released so far. It'll also bring in some of the more interesting new data that's been produced in 2011 for alternative geographies, such as workplace zones, and we're also considering trying to get data for parliamentary constituencies in advance of the next election. We're also bringing ourselves mostly up to date with data for Scotland and Northern Ireland, so you'll be able to make selections across the UK of most of the data that's available from the 2011 census.
Future plans: well, obviously we're going to continue putting more data in from the 2011 census until we've got all of that in.
We're then working our way back through the previous UK census data that we hold. There is already some 2001 data in, and we're going to work back through 2001, 91, 81, and 71. We're looking at ways of integrating the delivery of flow and boundary data, the flow data being information on travel to work and on the flows between place of residence a year before the census and place of residence at the census. And then the model: we're looking at ways of trying to extend the model to include non-census data sets as well, and particularly working with the census agencies to look at how we might develop a model which would integrate a lot more of the national statistics data sets.
The interface itself, as Richard said, is a little bit clunky at the moment, because our focus has been on getting data out through it rather than developing the interface itself. But there's a lot that we can do to improve the usability and the functionality, things like potentially putting in some visualizations and allowing downloads in GIS formats, this kind of thing. We intend to do that by working with users to try to identify what the majority requirements are, and working on those as a priority.
If you want to find out more, we have a hands-on workshop on the 17th of February, which I think booking is open for, and if you keep an eye on the UK Data Service news pages, we'll be posting information about data releases and developments in the interface. We provide interactive support, so hands-on support for people. You can find this by going through either the Infuse application or the census support web pages.
I think we're now at the end of the presentation, so we will look at some of the questions which people have been submitting. If you have any further questions, if you can type them in the chat box, we'll see how many of those we can get through.