Hi everyone. People are still joining us, but what I'm going to cover today is ways you can get more information about census data, so you can hone down what you want to look at. There'll be some discussion about the potential geography. I'm not going to go as far as mapping, but I will show you how to arrive at a point where you've got data that can be loaded into a product like QGIS or ArcGIS for mapping purposes. Then some discussion about the sources of data and a recommendation on where to go. At the end of that session, which is giving out a lot of information, I'm going to have a break to answer any questions, so if you have any questions, pop them into the Q&A and I'll pick those up then. Then we'll move on to looking at how to get the data together: I'm going to take you through a demonstration of how to download the data and how to prepare it, and then it's over to you again for questions.

So first of all, sources of information. This is the census page. When you go into this page there is a useful set of information in the Census 2021 dictionary, which brings a lot of things together. It talks about the area type definitions, which I'll be covering to some extent in the presentation. It talks about the measurements, which again I'll be talking about: we hold information about individuals, about households and about dwelling spaces. And then there are details of each of the variables by topic, with the potential breakdowns of them. So it's a useful one-stop shop to check information.

Just to focus on what we've got: we had a response rate of 97% for the 2021 census, and more than 88% in all local authorities.
Now, though this is great, it does mean that the numbers that are produced are estimates, and that may have some impact on some of your smaller populations. For example, we know there's significant underrepresentation of the Gypsy and Traveller community, which does raise concerns about using the census data as the only method of understanding that community in the local area. There was a comprehensive quality assurance process, a new thing for 2021, and the majority of local authorities took part in that. There was then a coverage survey, as ever, used to estimate missing and double counting. One important thing, given that this was conducted during the pandemic, is that ONS attempted to count students at their term-time address whether or not they were there, but it's likely to exclude some overseas students, because they wouldn't have been resident in the UK and so wouldn't have been picked up at all. And one of the key concerns, in terms of funding for local authorities particularly, was this evidence of a temporary outflow from urban areas.

So there's a number of documents. I've just summarised them here, and I'm not going to go through them in lots of detail, except to say that what the ONS did was compare what they got from the census with a number of different sources, using admin-based population estimates, mid-year population estimates and the Census Coverage Survey, and at individual level. They identified particular issues with some groups and made some corrections to local authority estimates based on discussions about the real reasons for those, but probably not as much as some local authorities would have liked. So if you're interested in that kind of quality information, there are reports there, and a lot of them have used case studies of local authorities to demonstrate the things they found.
So the next phase is thinking about your geography, and this is an important part of deciding how you're going to do your area profiling. There are a lot of caveats around this that we'll come to later on. First of all we've got administrative geography. On the left we have a breakdown of Greater Manchester, a physical representation of its 10 boroughs. On the right we've got one of those boroughs, Trafford, broken down in two ways. In terms of governance they run area committees for north, west, south and central, and within those they have bounded wards. Those wards have an elected councillor, potentially budgets in some areas, and so on, and that physical representation of them is a useful starting point for those of you who will be thinking about an area profile for a local area.

The next one is just an example of the different ways we might look at this. On the left we have a representation of the UK general election results by constituency using physical geography, so it looks like the country of England is predominantly blue, the country of Scotland is predominantly yellow, and Northern Ireland is split between green and red, which I think are the unionist and Sinn Féin representations. The next map takes a different approach and makes each constituency a hexagon, so they're of equal size and equal weight, and that gives probably a more accurate representation of the results of that election. And then finally the one on the right uses what's called a cartogram, and what this does is distort the boundaries to reflect the population size. This may be particularly useful if you're in an area where you have quite dense urban parts and quite sparse rural parts. If we went back to that map of Trafford, the previous one, you can see on the bottom left Bowdon is a large area, but there's significant parkland in
that, so if you applied that cartogram measure, Bowdon would shrink significantly, whereas the more populated parts like Sale, Stretford and Old Trafford would grow proportionately. So those are different ways of doing that. I'm happy to talk about how you might do that, but it's not part of this course, so I can point you offline to sources where you can work on that.

And then we have statistical geography. This was introduced in 2001 and there were two real reasons. One was to produce a more homogeneous population by matching characteristics such as tenure and property type, and when you look at the root files for this, they have characteristics of the areas and they've broken them down by those. The second aim was to minimise changes over time, so there was more stability. For those of you in many local authorities, you will have seen with the 2021 data significant changes from the ward structure in 2011, and comparing at ward level will not really give you the answer on change over time, whereas these measures are more consistent, and ONS provide data on what's changed, what's stayed the same, and so on.

The building block is the output area. It has a minimum of 40 households and 100 residents, a maximum of 250 households and 625 people, and a target of around 125 households. Those are the building blocks, and that's where the matching is done to create the homogeneity. I would say all of these areas are bounded within local authorities, so though there might be lots of similarities across the boundaries between local authorities, neither output areas nor the larger super output areas are created across them. The next building block is the lower layer super output area: a minimum of 400 households and 1,000 people, a maximum of 1,200 households and 3,000 people. They're often used in public statistics such as the Index of Multiple Deprivation and recorded crime, so they're useful pointers that many of us will be familiar with, and
the larger areas, the middle layer super output areas, have a minimum of 2,000 households and 5,000 people and a maximum of 6,000 households and 15,000 people, so they're quite akin to wards. Again there are public statistics at this level, such as educational attainment, and when they were mapping COVID test results during the pandemic there was weekly data coming out at MSOA level. So it's a useful building block, probably one I would recommend.

Just to illustrate the differences this shows: on the left we have the private rented sector in Manchester, and the darker the shading, the higher the level of the private rented sector. If you look at the brown blob towards the bottom, that's located to the south of the university, where there's a significant private rented sector. When we move on to LSOA geography we get a more detailed breakdown of those areas and we can see the variance within them. And finally, on the right-hand side, we get output area geography, which gets really quite specific, and you can begin to see how that homogeneity has led to clustering of very high levels of private renting in and around the city centre, in the student corridor, and up on the borders with Salford, I believe that is.

The next thing to say, and this is probably the area where I feel most uncomfortable, is that I'm going to end up recommending a different source from the UK Data Service. For the individual variables, UK Data Service, Nomis and ONS all provide topic summaries. That's established; I think they are all available to pick from a long list, and you can map those comfortably down to output area level. When we move to multivariate data, the UK Data Service and Nomis currently present a defined set of tables, so they present the tables that have been released as defined tables by ONS, and having done that they also take the decision to present those tables with minimum suppression. So what
that means is that for some categories where we might be interested in a more detailed breakdown, such as ethnicity or household composition, we can't see it within the data provided by the UK Data Service and Nomis, whereas the ONS interface allows you to select variables to create a custom dataset and to select the number of categories. The interface is quite fast, and it informs you of the level of suppression due to statistical disclosure control.

So let's have a look at what they've got. In terms of data selection, Nomis and UK DS have a list of tables. In the UK Data Service site those tables have filters, so you can filter particular things in or out, whereas in Nomis there is currently a long list. That may well be developed, but when I checked it yesterday that was the way it was coming out. ONS again have a list of predefined tables with filters, but they also have the option to build a custom dataset. Nomis and the ONS allow you to select data for particular geographies, whereas UK DS at the moment, though it's intended to bring that in, only gives you the national dataset for the geographies you select, so you need to clear out what you don't need for yourself.

Then in terms of the output format, Nomis gives you a table or tables, and that's fine if the table you want is there and in the format you want. One of the things it will do when you go to multivariate data is potentially produce multiple tables for each variable. I'm not sure, and I've not played with the interface enough for 2021, whether you can get it in long format, but it does mean some processing. For example, I was looking at religion and ethnicity, and I ended up getting a table for each religion covering all of the ethnic groups, for the geography I selected, which was local authority. For ONS and UK Data Service there's individual data, so you get a data line which holds each category combination and the number
of observations. I'll be demonstrating that later on, but that is the offering from the three. For me, I've mainly used the ONS interface, as it supports the level of detail I need. I do quite a lot of work around ethnicity and I found I can't really get that data, and finding a table in ONS, Nomis or UK DS is quite time-consuming, so I tend to start with the build a custom dataset option. UK DS, I know, is undergoing development: there are plans for geographical selection, improved filtering and more detailed categories. The mechanism for doing the more detailed categories will be to generate custom datasets and load them into UK DS. To go down levels of geography as far as is possible without losing data, both ONS and UK DS data need to be prepared, and I'll demonstrate that later. As I said, I think Nomis provides data already formatted, but you may get multiple tables. So in this session the demonstration will use the ONS version.

The aim of the disclosure control is to protect confidentiality. What happens is that, whilst generating the data, they dynamically swap records between areas when there are low counts, mostly within local authorities, and they use a cell-key method for each table that changes the value by adding or subtracting one or two to the counts. It leaves the overall totals unchanged and gives you consistent results, so if you look at the data today and again in six months' time you should get the same results. Having done that, what they call cell-key perturbation, they have a set of disclosure rules for tables. They tested those with 50-odd users attempting to find information about individuals, to check for disclosure risk, and I think they made one amendment to those disclosure rules as a result. For me, surprisingly, I think they're quite generous, so you do get quite low cell counts, and their rationale is that you can't rely on those really being people in the same place, so
that's a feature of it. What that does mean, though, is that low-frequency characteristics across a large range of data may lead to empty cells. For example, I found when using ethnicity that the low number of Gypsy, Traveller and Roma people, being unevenly spread, means that if I try to take a countrywide approach at a lower geography I won't get the data, because there will be so many empty cells, and that increases the likelihood of suppression. To address that, I think you probably need to focus on the geography you want and, where areas are suppressed, go up a level to cover the gaps.

So I'm going to move on in a little bit to a demonstration, but this is the ONS front page. You can get standard tables, the flexible table builder and topic summaries. When you go into the standard defined tables, and I think this number has probably gone up, you get a long list and a series of filters down the left, so it's quite time-consuming. I'll demonstrate that later on, but it takes a while to filter down and you've still got quite a lot of scrolling through. When you move into the custom dataset you get different population types to select from: households, household reference persons, usual residents, and then usual residents in households and usual residents in communal establishments. In the example I've gone for usual residents in households, I think.

Just to have a look at where I arrived: I was looking at housing deprivation by household composition and tenure in Greater Manchester, and what I've got in this table is housing deprivation for different household types, as the percentage of the total number of that household type. If you look across the first line, for a single person under 66, you can see that levels of housing deprivation increase to 4% in the social rented sector and to 9% in the private
rented sector. As you move down that list, the one that always strikes me, and it struck me in 2011, is other households with dependent children. These are likely to be either extended family households or people living in multi-family households, and across all of the tenures there are very high rates of housing deprivation. Now, the biggest indicator of housing deprivation is overcrowding, so you might say that's not that surprising really, but it's 40% of those who own outright, 43% of those who own with a mortgage or a loan, 60% in the social rented sector and 62% in the private rented sector. So it's an interesting table for Greater Manchester to start digging into the data.

What I've done with the next one is pick the borough I live in, which is Stockport, and look at those counts and percentages on the same basis but without the breakdown of household types. What you can see from there is relatively high levels of housing deprivation for those in social rented tenures compared to other tenures, but in some areas quite small numbers.

Right, so I'm now going to run a demonstration of how to get the data. Hopefully you can see the ONS front page there. If I click on data and analysis from the census, I talked before about the Census 2021 dictionary, and that's down there on that page. There's a whole set of information about topics, and some interactive stuff around building a custom area profile, which I haven't explored, to be fair. If I go into get census data, this is the standard tables, and there are now 331. If I take out ageing, leave demography, and take out education, ethnic group, health and disability, I've taken out quite a lot of data. I'm just looking at housing and demography, and I've still got 67 results, which is quite a lot of tables. I could probably find the table I wanted here, but not necessarily. So I'm just going to go back to the start and, to create the same kind of output, I'm going to go in here. As I said before, you will
get these options. I'm looking at all usual residents in households, and it's defaulted to the area type of lower tier local authorities, so I'm going to change that to electoral wards and divisions. Then I'm going to change the coverage and select those within a larger area. For the earlier example I did it for 10 authorities, but here I'm just going to go with one authority, so I search for it and add it. Now if I continue, this tells me I've got 32 areas available, the area type is electoral wards and divisions, and the coverage is Manchester.

I can now go into adding a variable. If I browse the available variables, I'm going to pick up household composition, household deprived in the housing dimension, and tenure. So the table I demonstrated for Stockport we will be able to recreate here for Manchester. At the bottom here, when I click continue, it tells me how many areas are available, but it also tells me which household composition classification it's selected. Here it's gone for six categories. I'm actually interested in more detail, so I'm going to go for 15 categories, which breaks down into pensioner households, those with children, those without, and so on, in slightly more detail. If I continue, those areas are still all available. Household deprivation is simply two categories, and I'm happy with tenure. When you start to explore this you may well find some of your areas are being suppressed, so you're forced to make decisions either about the geography or about the number of categories you're looking at, and we'll talk a bit more about that later.

If I get the data, it then allows me to have it in different formats, so I'm going to take it in Excel format, download it and open that file. Okay, so the missing value here is minus eight. I'm just going to open some of these columns out so we can see them properly. In effect we've got the geography, the
household composition code and the household composition type, housing deprivation and the interpretation of that, and then tenure of household. First of all I'm going to look at the missing data, because they behave strangely. I put a filter on, take housing deprivation first, click on minus eight and check whether there are any observations. For many of the variables there won't be any observations, but for some, where there are people in your population who won't have a category, you might find there are counts. For example, when you look at occupational variables like social class, you will find that there are counts in those fields. So I'm just going to delete those rows, turn the filter back off, and then do the same for tenure: check whether there are any observations against the unknowns, and delete those. Take the filter back off, and then the same for household composition; there are none there, so I'm going to delete those. I've now got rid of the missing values without losing any data.

The next thing I might want to do is think about transforming these variables. If I look at this one here, we've got one-person households aged 66 and over, other one-person households, other related families and so on; let me extend that column. What you find when you go down this list is that you get a single family household, married couple with no children, as code four, and as code seven you get a cohabiting couple family with no children. For the analysis I want to do, those two are the same, so I combine them by using the filter to select four and seven, maybe reordering the categories in the way that I want. My preference in looking at this is to have those households who aren't pensioners and have no children, so singles, couples with no children, other families and other households, together as the first group. So I would create a new
variable, and I'll show you what this looks like in a bit. If I then look at tenure, I've got owns outright, and owns with a mortgage or loan or shared ownership. I've got two categories for private renting, one of which includes rent free, and two categories for social renting, so for my purposes I've combined each pair. I'm not going to go through doing that online, because I've prepared one earlier, so I'm just going to switch over to that.

So this is the Greater Manchester data. We've got the electoral wards and codes on the left, and the local authority, which I've added in myself. I've created a new value here for household composition and put the categories into an order: single person, couple with no children, other family structure, other household; couple with dependent children, lone parent with dependent children, other household with dependent children; similarly couple and lone parent with non-dependent children; and then single pensioner and pensioner family. Similarly with tenure, the new variable I've created is owns outright, owns with a mortgage or loan, social rented, and private rented or rent free.

So I've got three questions, and I'm just going to carry on here. These are individuals; I could have used the household one, so I'd be looking at households. Observation is the same as count, so it's the number of people, in this case, who would fall into that category. What I've done when it's nil is take it out, so I'm just going to take those out. I have got used to doing my recoding in Excel. You could do it in other software; I have wondered about doing that, but it's fairly quick for me in Excel. It would be an option, though.

Having created that data, the next stage is to insert a pivot table. To do that you go to Insert and PivotTable, and it opens a new screen, so I'm just going to show you that screen now for local authorities. Here I've got, for all local authorities, and this was the table I showed you, a set of data
for each household type. So for this household type I've got those who aren't housing deprived and those who are, and I've done some work to generate a field which gives the number of housing-deprived people and the percentage they are of the total of those two. You can see that gives a reasonable picture, but if, say, you were working in Greater Manchester and wanted to look at the breakdown by different authorities, you could pick a different authority and that would give you that picture. You might want to do some tidying up so you could actually read it, but let's have a look at this. Other households with dependent children are at that level: 36, 47, 58, 65. Let's try a different borough, let's try Oldham. If we look again across other households with dependent children, we can see much higher levels there: 54% who own outright, 55% with a mortgage, 72% social rented and 73% private rented. So creating this kind of pivot table structure with a filter has given us the ability to look at the whole picture, and then, with some work, you could compare authorities with each other.

Okay, Gemma, I'll come back to your question, and to the other questions, later when we've finished this demonstration.

The next one I developed, again by inserting a pivot table, created the table I showed you for the wards in Stockport. I've set a filter here, which I've left on. I could change that to another authority and have a look at the figures for that authority and its wards. I would probably need to adjust the layout depending on the number of wards: I did that for Stockport, which has a certain number of wards, Bolton has one less, and Manchester probably has more, so there might be a bit of playing around to do.

And then the last thing, for those of you who are going to get into mapping or who are thinking about it,
this kind of data is not necessarily in the format you want. So I've provided in this worksheet, and again this will be on the shared page, some data on map preparation. This was a pivot table by area for one of the variables. I could have done something more complex, but I would have had to create multiple rows; this is just an example, so I've gone with household composition by area. I've then worked on that: I've taken the data using formulae, using shorthand for the different types of families, and put them in the order I want, and then on the right-hand side I've generated percentages, which are the percentage of that group across all of the household types there. So if I wanted to map particular household types in Manchester, say to see where couples with dependent children were, I could do that, or if I wanted to combine families with dependent children I might combine all three of those, columns Q, R and S, and I would be able to generate a choropleth map using that. This example is in the worksheet that I will give to you.

I think there's a question here about what happens if that doesn't give you what you need. I haven't gone through the details of what happens when statistical disclosure control suppresses data, so there are two things you can think about. The first is to commission a dataset from ONS. My understanding is that if you ask for a dataset they will give you all of the details. One of the things you can't do with this dataset is say, well, I'm not interested in this category that has a low count; you just get the area suppressed on that basis. So what you can ask them to do is generate the category combinations that work for you. That might be particularly useful if you're exploring ethnicity at a lower geographical level, and I think one of the projects going on in Belfast has done that, generating data which was used for a paper looking at deprivation by ethnic group. And the second
thing you can do is explore the microdata. What the microdata does is give you combined local authority level data, but it does give you multiple variables, so you can begin to explore some of the combinations, for most urban areas at local authority level. I think there are a couple of exceptions where that isn't true, and when you get into smaller districts you may need to look at county level. But the microdata enables you to put in more variables than you can into this, and probably answers one of the questions that came up before.

Just some notes on the data that's there. The datasets on households, household reference persons and usual residents can go down to output area. For residents in communal establishments and usual residents in households, it only goes down to MSOA level, which is why I used the ward. It would have been easy to use LSOAs, but it needs a larger population size, because that's the decision that was made. So if I wanted to look at households at LSOA level, I would need to commission a dataset. And the obvious thing to say is that there will be more limitations on the number of categories available as you look at smaller areas. It's potentially significant where you have a number of categories: in preparing for this I did think about using ethnic categories at LSOA level and found quite high levels of suppression where I tried to include other variables. Household composition is only available through that interface at MSOA level. I suppose one of the ways to address this is to think about a phased approach to answering the question, so you can overcome that issue.

The last thing I would say before I come to the Q&A is just about the process of doing this. Plan out what you want to find out; open-ended exploring is interesting, but from personal experience you're more likely to waste time. Document the steps you take, particularly if this is something you might want to come
back to in six months' time. You won't necessarily remember, so you'd have to go back and work it out again. So write a short outline of the steps you've taken, and try to separate out your calculations so you don't overwrite the original data. In my case I use formulae in Excel; if you were using a programmatic solution you would just be generating another file. Thank you very much for coming.
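
Note on the programmatic route mentioned in the session: the Excel steps demonstrated (removing the minus-eight rows, recoding tenure into broader groups, then pivoting to a percentage housing deprived) could be sketched in Python with pandas roughly as follows. This is a minimal sketch, not the presenter's method. The column names, the tenure code numbers and the meaning of `deprived_code == 1` are all assumptions made for illustration; a real ONS download has longer column headers and its own category codes, which should be checked against the Census 2021 dictionary.

```python
import pandas as pd

# Hypothetical, shortened column names; a real ONS download has longer headers.
CODE_COLUMNS = ["hh_comp_code", "deprived_code", "tenure_code"]

# Illustrative recode of detailed tenure codes into broader groups.
# These code numbers are assumptions, not the ONS classification.
TENURE_GROUPS = {
    0: "Owns outright",
    1: "Owns with mortgage or loan",
    2: "Social rented",
    3: "Social rented",
    4: "Private rented or rent free",
    5: "Private rented or rent free",
}


def clean_and_recode(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop 'does not apply' rows (coded -8) and add a grouped tenure column."""
    keep = (raw[CODE_COLUMNS] != -8).all(axis=1)
    clean = raw.loc[keep].copy()  # work on a copy so the original data is untouched
    clean["tenure_group"] = clean["tenure_code"].map(TENURE_GROUPS)
    return clean


def percent_housing_deprived(clean: pd.DataFrame) -> pd.DataFrame:
    """Percentage of people in housing-deprived households, by area and tenure group."""
    keys = ["area", "tenure_group"]
    totals = clean.groupby(keys)["observation"].sum()
    # Assumption: deprived_code == 1 means "deprived in the housing dimension".
    deprived = (
        clean[clean["deprived_code"] == 1]
        .groupby(keys)["observation"]
        .sum()
        .reindex(totals.index, fill_value=0)
    )
    return (deprived * 100 / totals).round(1).unstack("tenure_group")


# Tiny made-up sample in long format: one row per category combination.
sample = pd.DataFrame(
    [
        ("Ward A", 1, 0, 0, 90),
        ("Ward A", 1, 1, 0, 10),
        ("Ward A", 1, 0, 2, 30),
        ("Ward A", 1, 1, 2, 10),
        ("Ward A", 1, 0, 3, 40),
        ("Ward A", 1, 1, 3, 20),
        ("Ward A", -8, -8, -8, 0),  # "does not apply" row, removed during cleaning
    ],
    columns=["area", "hh_comp_code", "deprived_code", "tenure_code", "observation"],
)

clean = clean_and_recode(sample)
table = percent_housing_deprived(clean)
print(table)
```

Writing `table` out with `table.to_csv(...)` to a new file, rather than editing the download, keeps the original data separate from the calculations. As in the Excel walkthrough, it is worth checking that the minus-eight rows really carry no observations before dropping them.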