Welcome if you're joining us now. I'll just give people a couple of minutes to get in. OK, so first of all a couple of caveats. Hello, welcome. My name is Nigel de Noronha. I work for the UK Data Service at the moment. I suppose a lot of the material here really comes from my experience of teaching on the graduate Geography [inaudible]. The materials have been through various incarnations and you'll get a chance to look at them afterwards, but they focus on using Excel. So it's a strange time really. I mean, we initially planned this workshop for November based on what we thought would be the availability of data. We moved it to now and we are still relatively short of data, so in the examples I'm going to use, we're using 2011 data. [inaudible] If you have questions, put them into the Q&A and we can stop at a convenient point and talk about them. So the aim is not for me to talk a massive amount. It is to tell you some things, to give you some time for some practical application of using Excel in some exercises, to come back and talk a bit more, then to have a look at the ways you can access data, and then to reflect, I think collectively, on the ways that this might be useful in our future, and maybe inform some of the work we will do on developing materials to support more use of this. First of all, just the context. There are three statistical authorities, so there are three censuses.
[inaudible] England and Wales have now nearly completed releasing all of the single variable tables, so the last one is due this week, and the next stage is to release some multivariate tables. Okay, so Jackie's asked the question about what tools we use. Maybe we could hold on to that one and come back to it later on, because I think it does depend a bit on what we're doing. I suppose I've framed this as a kind of basic statistical literacy for students, which means using Excel as a common denominator, but there's no reason you couldn't use other things. I mean, the currency of the downloaded data tends to be Excel, and it will be interesting to come back and see how other people might use this data, particularly when we get on to mapping, which is not part of the focus of this workshop, but clearly of interest to many of us. So once again, I suppose there's a debate about whether this is the last census and whether we're going to move to understanding our population from administrative data. Because of the pandemic, there was a significant amount of work done to validate the census estimates using other sources of data, and there is material from ONS on that for those who are interested. Okay, so what I'm going to cover is, first of all, the census outputs. Happy to be stopped here. I'm going to start with the obvious ones that are available, but as we move on, there are some nuances to what's coming. Then I'll explain the geographies that are available and have a look at the kind of data that we're going to get.
Then I'll talk a little bit about the tools we might use to understand it, and then move on to practical exercises, and finally some reflections on how you found that material. So in terms of census outputs, I've grouped them into four chunks. There are different ways of cutting this, but one clear one is the geographical data, so the boundaries and things we need in order to create maps of the data we're using. There's a set of univariate data summarised in topic summaries, which are currently being released, and area profiles, and there's also multivariate data. The first of those are due out next week, which will be sexuality and gender identity with sex and age, in five-year age breakdowns, the same for race, and the week after, religion. So those are some of the predefined multivariate data. The other thing we're being promised now, which is new, is flexible tables. This means we'll be able to cross-tabulate what we want, and the outcome that we get will be subject to statistical disclosure control, which I'll say a bit more about. Margaret's asked about the slides. The slides will be available on the event page. One thing I should have said is that, because we don't want to get tied up with asking for permission when we're discussing, we're going to stop the recording and transcription at that point. So the discussion is for our group, but the presentation and the practical exercises will be available, so all of the slides will be. And then the last parts are possibly the most complex, but I'll go over those a little bit more in the next slides. So first of all, the geographies. We have a fairly standard set of administrative geographies: local authority, health bodies, and so on. And those do differ between the countries. Most of my examples tend to focus on England and Wales, partly because that's where I've done my work.
Then we are also due fairly soon to get the electoral data on constituency and ward. And finally, we've got, I suppose, a statistical geography that was developed in 2001 to move away from the problems that arose from changes to boundaries. So if you looked at the census before 2001, the data at the lowest level would be collected on what was called the enumeration district, which was where the census collection team would operate in order to go and make sure the forms were completed. So they were fairly arbitrary geographies located around community hubs, so that there was a base where people could work from. In 2001, there was work on developing more homogeneous areas, output areas that have similar types of characteristics. That work developed a definition of an output area and then used those as building blocks to make two other areas. And that has been maintained for 2011 and 2021. Clearly, there have been some changes over that time, but the aim of that set of geographies was to try and minimise the change. And I think overall, the change each time has been less than 5%. So thinking about teaching particularly, the examples I use here are focused on country, region and local authority as a kind of national geography that enables you to see the high-level picture and then to explore more detail. Other people might work with constituencies or, if you're working on a particular place, and some of our new materials will be focusing on this, with a local authority and the neighbourhoods within it. An alternative geography that will emerge later on with the census releases is workplace geography, so that's the places where people work. So just to go through quickly, an output area has a threshold size: it has a minimum and a maximum and a target number of around 125 households. I said some of those.
Those are then built up into lower layer super output areas, typically five or six output areas each, and middle layer super output areas, which are akin to ward size, around 2,000 to 6,000 households. And all of those output area geography boundaries are constrained within local authorities. So I'll give you an example of the way those work, and this does use 2021 data. I've talked about the different geographical data, administrative, electoral and statistical, and then the ways that you might define the geographical scope. So most of the practical examples in Excel would use country, region and local authority. You could also look at constituency, and if you're interested in the area you're in, you might want to use neighbourhood data for one of the statistical areas within a local authority. And later on, we will get workplace data. So I'm not going to go over all of this, just to say there are three types: the output areas, the basic building block, are built up into lower layer super output areas and middle layer super output areas. And just to give you an example now, these are six London boroughs with 2021 data, and what these are showing is the proportion of households who have unrelated adults with no children. So this is more than two adults, so not couples or others, and it's a proxy measure for houses in multiple occupation. This is the highest level of statistical geography, the middle layer super output area, and you can see some concentration in parts of the boroughs in Tower Hamlets, Newham and Hackney. When we move down a level to the lower layer super output area, you can see that the detail becomes more granular. We can begin to see patterns of concentration. The largest area has 33% of households of this type. And then finally, when we move down to output area, that granularity becomes very detailed, and the area with the largest proportion actually has more than half, 56%.
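The way the highest share rises as the geography gets finer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the area codes and household counts below are invented, not census figures, and the function names are hypothetical. It sums the counts for each level of the hierarchy and reports the largest percentage at that level.

```python
from collections import defaultdict

# Hypothetical rows: (output area, LSOA, MSOA, households of this type, all households).
# The codes and numbers are invented for illustration; they are not census figures.
ROWS = [
    ("OA1", "LSOA_A", "MSOA_X", 70, 125),
    ("OA2", "LSOA_A", "MSOA_X", 10, 125),
    ("OA3", "LSOA_B", "MSOA_X", 20, 130),
    ("OA4", "LSOA_B", "MSOA_X", 15, 120),
]


def share_by_level(rows, level):
    """Return {area code: percentage of households of this type} at one level.

    level is 0 for output area, 1 for LSOA, 2 for MSOA.
    """
    target = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        code = row[level]
        target[code] += row[3]
        total[code] += row[4]
    return {code: 100 * target[code] / total[code] for code in total}


def highest_share(rows, level):
    """Return the largest percentage found among the areas at this level."""
    return max(share_by_level(rows, level).values())


for name, level in [("MSOA", 2), ("LSOA", 1), ("Output area", 0)]:
    print(f"{name}: highest share {highest_share(ROWS, level):.1f}%")
```

Because each larger area averages over the smaller ones inside it, a concentration in one output area is diluted as you move up to LSOA and MSOA, which is the pattern the three maps show.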
So that's demonstrating the geography. But I think, in terms of looking at this, when we're looking at a single variable, in most cases we can use output area. Once we move into two variables, because the numbers are so likely to be small, it's quite likely that a number of figures will be suppressed. So what happens with the tables that you get is, first of all, if there are small counts, they may be swapped between areas. And secondly, under the new statistical disclosure mechanism, if something is seen as disclosive of either individuals or a group, it will not be released, so you would get your requested table and it would say this isn't available at this geography, or this is not available in these areas. So there's always going to be some kind of trade-off between the detail of the geography and the detail of the variables you're looking at. Okay, so let's move on to the data products. So the reason I've used 2021 data for East London was just to illustrate the point, really. It's an arbitrary choice. It could have been Manchester, it could have been Birmingham, it could have been anywhere. You could do that on a national level, but when you look at the geography you need to be concentrating on one or a small number of local authorities to demonstrate the difference. So there is 2021 data for the country across all of these indicators now, I think, apart from the health ones, which are coming on Thursday. Hopefully that answers your question. Okay, so the topic summaries are here. There's a set around demography and migration, so these are things like household composition and whether people are living in communal establishments, and a set of data around migration, particularly around the year of arrival in the UK, age of arrival, etc. And those are the basis for future tables that will cross them with other variables, but at the moment those are single tables.
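The trade-off between geographic detail and variable detail can be shown with a toy example in Python. This is a deliberately simplified sketch, not the method the statistical authorities use: the threshold, the records and the function names are all assumptions made for illustration, and the real process also involves swapping records between areas. It counts the same small-area population by one variable and then by two, and withholds any cell below the threshold.

```python
from collections import Counter

# Hypothetical threshold: cells with a count below this value are withheld.
# This is an assumption for illustration, not the rule applied to census tables.
MIN_CELL = 10


def tabulate(records, variables):
    """Count records for each combination of the chosen variables."""
    return Counter(tuple(r[v] for v in variables) for r in records)


def suppress(table, min_cell=MIN_CELL):
    """Replace counts below the threshold with None to mark them as withheld."""
    return {cell: (n if n >= min_cell else None) for cell, n in table.items()}


def withheld_cells(table):
    """Return how many cells in the table have been withheld."""
    return sum(1 for n in table.values() if n is None)


# Invented records for one small area: 40 people described by two variables.
records = (
    [{"tenure": "owned", "age": "16-64"}] * 18
    + [{"tenure": "owned", "age": "65+"}] * 4
    + [{"tenure": "rented", "age": "16-64"}] * 15
    + [{"tenure": "rented", "age": "65+"}] * 3
)

one_way = suppress(tabulate(records, ["tenure"]))
two_way = suppress(tabulate(records, ["tenure", "age"]))
print(withheld_cells(one_way), "of", len(one_way), "cells withheld for tenure alone")
print(withheld_cells(two_way), "of", len(two_way), "cells withheld for tenure by age")
```

With one variable every cell is large enough to release; crossing it with a second variable splits the same 40 people into smaller cells, and some of them fall below the threshold.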
They are the ones that were used for the press releases around Albanians on the release of this data, which coincided with the events in an immigration centre as well. So on the one side there's a lot of stuff about migration, but there's also quite a lot about the way people live, the kind of households people are living in, and living arrangements, which is where my previous example came from. The next release was actually veterans. Okay, can I hold on to that one, Lawrence? Can you come back with that question later on, because I think that's getting into quite a complex area of analysis. So I'll just focus on what we've got with the data now and then talk a bit about how we might use it with other fields a bit later on. So the veterans data was released the day before Remembrance Sunday, I think, or the week before Remembrance Sunday, for some reason. There was a large group of questions around ethnicity and around national identity, that is, what identity people feel they have, linked perhaps to the migration ones which told us about passports, language spoken, and particularly competence in English and Welsh, and religion. I'm not going through these in the order they were released. We will have health, disability and unpaid care questions. There was a whole set of stuff about housing, which covered largely households at this stage, in terms of tenure, the type of property and so on, rather than individuals, and also provided some information on second homes and on car and van availability, so some of those semi-property-related things. There was a whole set around the labour market and travel to work, at which point it's probably worth saying there is an open question on that data because of the impact of COVID and the time the census was taken. So for those doing that kind of analysis, you may well be looking at alternative sources as well.
Education has just come out, and the last release, which got quite a lot of coverage, was around sexual orientation and gender identity. It's not an area of my study particularly, but I think it was quite refreshing, or quite welcome, that the participation rate in these questions was pretty high. So one of the concerns, I think, in asking a new question that was voluntary was: will people play, will people complete this, or will it end up being not very valid because a lot of people haven't filled it in? And the result was that it was quite well completed. Okay, so in terms of the multivariate data, what we're beginning to get, as I said just before, is a defined set of tables, and my understanding is that those will become available over the next couple of months, and that into March we will get what's going to be called the flexible table builder. This will dynamically calculate tables for you, but it will apply statistical disclosure control, which will therefore not show things that are seen as disclosive. Also, the variable categories are likely to vary between the univariate and multivariate data, and the point I made before about balancing geographical scale and the level of detail still applies. So these other types of data are all going to come a bit later on, but may well be interesting areas of study. In terms of alternative population bases, we will have a picture of workplace populations, workday populations, out-of-term populations and second address populations.
So if, for example, you were looking at the impact of students and where students came from to a university you were teaching in, you could use this data from the census. It's fairly contemporary, and in the census as completed in 2021 students were asked to give their term-time address whether they were living there or not, so it should be fairly reliable for at least UK-born students. It might not be so reliable for those born overseas who hadn't registered already, because they obviously wouldn't be completing the census. Then there is small population data, so ethnic group, country of birth, religion and national identity. Again, I mean, I've looked at the ethnic group data and it is really quite confusing trying to unpick it at a local area level, but it's basically a write-in category, so there is lots of detail there. There are some data quality issues in dealing with it, but there is lots of data there that might be of interest. Flow data is a new type of data for some. What it holds is an origin and a destination, and it is typically used for looking at migration flows and workplace flows, and this time it will also be looking at second address flows and student flows. So what it is based on, in terms of migration, is asking the question of where you lived 12 months ago. In terms of workplace, it's where do you live and where do you work. In terms of second address, it's do you have a second address and, if you spend more than 30 days there, what's the purpose: is it a work second address, are people going away to work somewhere, is it a holiday address, etc. So you can get some picture of those kinds of flows, and of students, largely focusing on HE, I suppose, in terms of the volume. So that data is quite interesting, and probably available from early summer onwards. Similarly the microdata, for those of you who want to engage in more complex analysis: it's a 5% sample at either regional or combined local authority geography, or a 1% household sample. So if, for example, you were teaching in a city and you wanted to look
at a multivariate analysis of the population in terms of the data available within the census materials, which is reasonably comprehensive, so you can look at some reasonable descriptive variables and reasonable dependent variables around deprivation, you could do that. One note, though, is that this is a safeguarded file, so students using it would need to register with the Data Service and have permission to use it. But the 2011 file had around 2.8 million records, and if you cut that down to a city size you would get a reasonable sample of data. When I was teaching in Nottingham I had a sample of around 15,000 to do an analysis. So again, a potentially interesting source if you want to come back to that later on. As I said, I'm focusing more on the early outputs from the census. So just to summarise now, what are the potential topics of interest? There's a whole set of things around demographics: population, migration, living arrangements, population pyramids, age and sex distributions, etc. And if you're in a university town it is maybe quite interesting to compare the age profile of the population there with the age profile of the larger surrounding area. As I said, I taught at Nottingham, and the distinction between Nottingham as a city and Nottinghamshire as the surrounding area was quite stark in terms of that younger population who tend to move to study or to work. Then there's a whole set of things around identity, so ethnicity, national identity, religion and language, and then work, education, housing and health. There are other topics there, but those are the ones that I picked out as having some resonance for teaching. Now, what I'm going to do is come back to Lawrence's question. I think the answer, Lawrence, is the other way around really. So if you take data from Understanding Society and other sources, there is a mechanism to link that to census data, and there are two ways of thinking about that. One is to link
it at individual level, which clearly requires use of a safe environment and approval from the statistical authorities. That was used quite a lot in the analysis of COVID cases, COVID mortality, etc., with pre-release census data, so there were individual links to find out information about people's occupations, etc., that were part of the explanation for why particular things were happening in particular ways. The other way to think about it is to link on an area basis, so you could take neighbourhood characteristics. It's not something we've talked about here, but in essence we could build up a profile of a neighbourhood by taking variables associated with that neighbourhood. I think some work has just been published today by Queen's Belfast on diversity and increasing diversity in neighbourhoods. So if we could characterise neighbourhoods as diverse, as lower class, as having particular class profiles, we could potentially use that as an explanatory factor, if we had geographical data in the survey data. So the link is more from the individual survey data to either the geography of census data or, potentially, individual census data. Does that answer your question, Lawrence? Let me know if that's okay. I mean, I think if you do want to pursue that, it's something you would need to talk to either the UK Data Service secure service or the ONS secure service about. Using that kind of linkage has clearly got issues, and you would need to go through some approval process to do it, and that's managed by both ourselves and ONS separately for those projects. Okay, so what we're going to do now is have a look at some basic exercises using 2011 data. This is also an opportunity for you to take a comfort break, to get yourself a drink and whatever. There are some materials that my colleague will provide a link to in the chat. There are four sets of materials. The first set is based on simple univariate data using regions, so you can see the data on one side.
I'm not suggesting you have to go through all of this; it's to give you an idea of the flavour of it. The second gets more complex, so it gets you to navigate a local authority set of data, the third one moves into multivariate data, particularly picking up on pivot tables in Excel, and the last one is about text formatting. So as I say, they are quite detailed exercises. You're free to take them away with you. We will be upgrading them to our new accessible templates and 2021 data at some point in the future, so if you want to wait for the more accessible versions that's fine. If you want to take them with you, I think you'll need to download them now and keep them, and you're free to use them in any way. They were paid for out of Q-Step monies initially and are kind of common. Okay, so Margaret's asked about individual respondent level. There's nothing specific about primary or secondary schools attended, so you can impute the area of attendance, but there is nothing held at that detailed level. If you had a primary school survey, I'm not sure whether they would let you link. You know, that's an open question, because the individual educational data is quite secure as well, and in putting that together with census data there might be nervousness. But that would be the way to do it, to pull together the PLASC data, or whatever the pupil census data is, with the census data if you wanted those additional variables. Okay, so what I'm going to suggest is we give this maybe 20 to 25 minutes to see how you get on, and I'll check in with you then to see what's happening. Meanwhile I'll keep watching questions, and if you need support with anything then just give us a shout. I'm going to go through the different ways of getting 2021 census data after this session of trying the materials, so if you hold on, I'll go through those afterwards, just to show you the three different ways to get them. Hi Margaret, that's an interesting question. I mean, I
suppose that the kind of work I've done with this has been around skills development, so it's around students answering simple questions and learning how to interpret data, which is one way it can be used. I think the other way I've seen students use it, and I would encourage them to, is to paint some contextual information about broader areas of study. So typically it's very powerful in a dissertation to be able to map some of these variables and show what's happening with them. In terms of detailed work with the census, that's kind of complex. I'm not sure how easy it is, particularly for undergraduate students, to find something original, and I think if it forms part of postgraduate study it's likely to form part of the methods used alongside other things. I hope that answers your question, but I'm happy for other people to contribute what they think about that. Hi Tecwet, I'm not sure what you mean by that. If you're just saying is there an archive of projects, probably the short answer is no. I mean, this data is becoming available now. It's widely used, and not just in the academy but also significantly by policy makers, health professionals, etc. So this is very open data that is used in all sorts of ways. When you look at briefings for parliament on the population, when you look at material used by local councils, by local health bodies, by researchers, etc., there will be masses of work being done. A lot of it will be, I suppose, not original research but interpreting things within a particular local context. You could search for census 2011 if you wanted to see what projects have been done using census 2011 data. I think we're quite early on with the census 2021 release to see more advanced forms of academic work going on. Hopefully that answers your question, but please come back to me if you want to clarify, or if anybody else wants to add to that. Okay, Mark. Hi,
in terms of all of the tables and the flexible table builder, I'd be pretty confident that will all be available by May 2023. I think the release of all the topics has finished, I think all of the predefined tables are likely to be available over the next few weeks, and the flexible table builder will come as a big bang following on from that. The other products and all the geography products are there, and everything is being adapted to fit the additional geographies. I think the ones that might come later are what they are calling phase three, which is the flow data, the microdata and some of the alternative population things. That's for England and Wales. Clearly the Scottish data is likely to be a year behind that, and if you want a combination of UK data, that's going to be a further year behind because there's some harmonisation work. So the short answer is that a lot of what we'd expect to see will be available by then, but for some of the more detailed products we might still be waiting a little bit. Okay, sorry, I've put up Mac versions as well. There are no real differences in the Mac versions of three and four. The big differences are in some of the keys you use. Those first two workbooks give you navigation keys, which differ with the Mac. In parts three and four we're really looking at pivot tables and text formatting, so we're much more into building up formulas than interacting directly key-wise. Hi Liz, I think that what you'll find is that the population is for all ages and the qualifications figures are for working-age groups, so you'll find there's a difference between those two. Margaret, I think with the Scottish release we're unlikely to see anything by May 2023. I mean, I think you could talk to National Records of Scotland, or you could put a call into our help desk and I'll refer it to my colleagues who liaise with them regularly to find out, but I think they thought there was a year
from closing the census to getting to first release data. And we've seen with England that there have been problems with that data as they've released it. We don't know that's going to be the same for Scotland, but clearly they've got a significant piece of estimation work to do before they get the general population data. So Scottish data won't be available by then, I'd be pretty confident to say, but if you want to contact me outside of this I'll refer it to a colleague to get a more detailed date. Yeah, I would say so. I would say that's a reasonable plan to go with, but like I say, if you're producing project proposals and thinking about what you're going to do in terms of that bit, then I'm happy to give you the information that my colleagues get from attending those liaison meetings with National Records of Scotland. Okay, Payan, you've asked about the methodology. I think I will put a link in the chat just to show you. Right, so this is an ONS page which has the original questionnaires. I'll put it into the chat rather than here. There are some differences between the countries as well, particularly around language, but there are some ordering differences around ethnicity as well, and others. And I think if you look at the Scottish questionnaire, which is held by National Records of Scotland, they ask a lot more detail about health. Hi, there's a question from Nigel about the multivariate flexible table option. I'm not sure when that would be available, because the workplace geographies are coming later on as alternative population geographies, but I would expect them to be available, as that sounds like it's going to be the main interface. Now, there are three ways of accessing the data and I think they are all in a state of change. So for those of you who have used 2011 data, you may have used Nomis or the UK Data Service. The ONS one is new. The ONS interface has changed
over time. I'm happy to go through it on the screen, but I'm going to try just with the slide and see how you get on, and if you're struggling with it I will demonstrate it as well, because it's not the slickest of things to do. So first of all, the ONS. Now, one thing to say is that none of these interfaces yet allows for multivariate data, and when the flexible table builder comes along it's going to sit within the ONS website, so the way that Nomis and the UK Data Service will access that is not really clear. So that's about ONS. Nomis looks a bit like it did before, and I presume it will get more like it, if you look at the 2011 interface where you could pick variables, select your geography, etc. And the UK Data Service is a completely new interface. It's under development, so there are areas where I've asked for changes based on using the material. Some are in there and some are on their way. So let's start off with ONS. First of all, from the front page you can go to the census and you'll get a page like that on the left. Currently what's available are topic summaries, and if you pick topic summaries you'll get the top screen, which shows you a topic summary. You can pick any of those topic summaries, so in this example I pick ethnic group. When you go down to the variables you get into this "get the data" section, so you can select your area type and you can select your coverage. What the coverage does is it now allows you to reduce the geographical scope. So for example, if you wanted to look at lower layer super output areas in Manchester, you could pick the area type as lower layer super output areas and then pick the coverage as an area. You would be asked to type that in, it will search and come up with it, you select that, and it will then restrict what you get back based on those two selections. So I'd suggest you have a go at the ONS one. If you're
struggling with it I'm happy to demonstrate, as I said, but it's not the slickest of things to do, and I'd like to leave a bit more time for the discussion. Okay, so again, to explain, if you go into Nomis, the address is nomisweb.co.uk, and there are different ways of accessing census data. There's a tab along the top that says census and you can pick 2021, there's the census statistic data catalogue on the right, and there's also the query tool. Whichever one you pick, you arrive at topic summaries and data downloads, and if you go into topic summaries you get each of the tables that are available. Within those you need to choose your geography. You can just select from a list, or you can select areas within, which would give you something analogous to the way we did the data for the ONS interface. So we could pick the lower layer super output areas: we select areas within, we select those lower layer super output areas, we say within local authorities, and then we navigate to the authority we're interested in.
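If a download arrives without that scoping, the same "areas within" restriction can be applied afterwards. The Python below is a minimal sketch under stated assumptions: the extract, the area codes and the column names (including a `local_authority` column) are invented for illustration, and a real download may be laid out differently. It keeps the rows for one authority and turns the counts into percentages of each area's total.

```python
import csv
import io

# Invented extract standing in for a downloaded table. The column names and
# the local_authority column are assumptions; a real download may look different.
SAMPLE = """area_code,area_name,local_authority,category,count
L001,Area 001A,Manchester,Asian,120
L001,Area 001A,Manchester,White,300
L002,Area 002B,Salford,Asian,40
L002,Area 002B,Salford,White,410
"""


def areas_within(csv_text, authority):
    """Keep only the rows whose area sits within the named local authority."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["local_authority"] == authority]


def shares(rows):
    """Return {(area_code, category): percentage of that area's total count}."""
    totals = {}
    for row in rows:
        totals[row["area_code"]] = totals.get(row["area_code"], 0) + int(row["count"])
    return {
        (row["area_code"], row["category"]): 100 * int(row["count"]) / totals[row["area_code"]]
        for row in rows
    }


manchester = areas_within(SAMPLE, "Manchester")
for key, pct in shares(manchester).items():
    print(key, f"{pct:.1f}%")
```

In Excel the equivalent would be a filter on the authority column followed by a pivot table, which is closer to what the exercises use.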
When you come to the ethnic group, the default is to have no categories, so you need to decide on the categories you want. I would typically tick the top box on the third column, which you might be able to see on the bottom right, to select all the categories and then download that data. So again, we'll just have five or ten minutes having a look at this. Yes, the question about geography. Because this is the first ONS access to census data, as in 2011 they just provided it to Nomis and the UK Data Service, they don't specify the different geographies, so that implicitly means the 2021 version. I think if you took the 2011 census question you would find that it would use the 2011 geography in Nomis as well. Those are the two I'm going to encourage you to use for the time being, and the reason is that the UK Data Service, though it will get there, doesn't have that geographic scoping, so when you run it, it will pull down all the data for the country for whatever geography you pick. That's likely to come in a subsequent version, but I'm not going to take you through that because it could take quite a lot of time. But broadly speaking, what happens if you go in there is that there's a lot more support material. Within the learning hub from the front page of the UK Data Service there's information about the different censuses, which you might find interesting. There are some explainers about different variables that have been released, so for example the new ones on gender identity and sexual orientation have some explainers about what you're likely to see, and you have access to aggregate and boundary data as well. So the top slide shows you the aggregate data and the bottom slide shows you the boundary data. Those at the moment are very much works in progress, so rather than saying go and have a look at these and putting you off, I'm going to suggest we skip that one, and we will publicise it once it's fully accessible and we'll be
using it in courses in future, but for now I would say, unless you're looking at national tables or national geography overall, it's probably not the best