Hello and welcome to this webinar on Geography and Longitudinal Data. I'm Oliver Duke-Williams and I'm joined by Dr Gundi Knies at the University of Essex, and we're going to be talking about geography in Understanding Society and in the ONS Longitudinal Study. Gundi is going to start the session by talking about issues to do with geography in Understanding Society, and after Gundi's presentation I'm going to talk about the Longitudinal Study. I just want to say thank you to Vassilis Routsis, who in the background is doing some of the work of controlling who sees what and who's presenting at what time. So thank you to Vassilis for that, and I'll now try to hand you over to Gundi, who is going to talk to you about Understanding Society.
Hello guys, I hope you can hear me. In the next 15 minutes I'm going to give you a quick overview of the Understanding Society study design and content, and I will then launch into a description of what geographical identifiers we provide with the study and how they might be used. The study is rather complex, so I can't really cover all of it in the 15 minutes that I have for this talk, but there will be some links for further information and, following Oliver's talk, there's also some time to ask us more questions if you have any. As you may know, the UK has a remarkable suite of longitudinal studies, and Understanding Society has a special place in this as it provides annual data that allow us to look at short-, medium- and long-term effects of social and economic change on individual well-being. Unlike other studies, Understanding Society is not rooted in a particular discipline such as demography, health or economics, but is designed to be useful for a whole range of disciplines. So it collects hard indicators such as income and marital status, but it also has soft well-being indicators such as relationship satisfaction and income expectations.
Now in terms of content, the bulk of the questions we ask is around the six key topics listed in the left-hand side box of the slide, but there is also a great deal of detail on contextual factors such as neighborhoods and social networks that may help explain key outcomes such as what education we do, what we work as and how healthy we are.
Gundi, I'm sorry, can I just interrupt you a minute? Can you just maximise your window, because the slides are appearing like...
Oh, okay, sorry about that. In addition to that there's also some room for questions on topics such as political behaviours and leisure activities, for instance party support and participation in culture, media and sports. So in 2012, for instance, at the time of the Olympics, we asked about whether people were following the Olympics or whether they attended. Now the best way to find out what information is available is to visit the study website or the content highlights section of the user guide for a quick overview. Another thing that is special in Understanding Society is that it has a household-focused design, so we started with randomly selected addresses and then at each wave we follow the sample members as they move and form new households, and each time we do not collect information just about one person but about all members of their household. Now in terms of who is in the sample, the Understanding Society study really has two core elements. The first element is the continuing British Household Panel Survey (BHPS) sample. The sample was originally drawn in 1991 in Britain and later on incorporated boost samples for Wales and Scotland as well as a sample for Northern Ireland from 2001 onwards. So since 2001 the study has been a UK-wide longitudinal study. The continuing sample was then integrated into Understanding Society in 2010.
Now the second element of Understanding Society is the new household longitudinal study element that started in 2009-10 under the Understanding Society study brand name. The sample was much larger than that of the BHPS and designed so that it is representative for all regions of the UK. It also included an ethnic minority boost sample, and since 2015 there's also a boost sample for immigrants to the UK. Now the bulk of our data is collected using face-to-face interviews with adults, so that's people aged 16 and over, and in self-completion interviews with children aged 10 to 15. On this slide and the next I show you how the responding sample developed over time. So what we can see here is that there were just over or just under 20,000 respondents in the first wave of the BHPS, and the sample size increased greatly when Understanding Society started as a study, which was in 2009. Now in 2010 the continuing BHPS sample then provided around 12,000 interviews as part of the Understanding Society sample, increasing the total number of interviews with adults to over 50,000. But of course there's also attrition and non-response, which reduce the sample size over time. When we look at the sample of youth respondents, the pattern is actually pretty similar. So we can see that there were around 700 to 1,200 youth respondents in the BHPS from wave 4 onwards, so that's 1995 I think. And when Understanding Society started this increased to around 5,000 young people, so that is roughly 1,000 interviews for children of each age in the 10 to 15 age group. So quite a large number. Now Understanding Society is a prospective survey with retrospective elements, and the questions are repeated, and that is actually what allows us to look at change over time. But not every question appears in every year or is asked of every person, so there are rotating modules and event- and age-triggered questions, for instance.
In addition to the data collected in personal interviews, however, we have also collected really cool biological specimens during a health assessment in waves 2 and 3, and we have also got linked administrative records, for instance from the National Pupil Database. And, as is the focus of today's presentation, we have linked to a great deal of spatial context data. Now because, as I said before, our interviews are mostly face-to-face, we know where each household lives at each wave of the survey. We can then use the postcode of that address to obtain further information about these places from the ONS Postcode Directory. The Postcode Directory provides a great deal of information, which you can read more about on the ONS geoportal website, and I've put the URL on this slide. For our study we extract more than a dozen key administrative unit identifiers and neighborhood classifications, and we then make these available for each household in the sample for each wave, and the data can be accessed by analysts such as yourselves via a download from the UK Data Service, who distribute all our data. So in this table I list the geographical information that is readily available with our data. If you want these data, you can replace the four dollar signs in the URL provided at the bottom of the slide with the study number that is listed in the second column of the table, and this links directly to the UK Data Service shopping basket. Then there is a quick registration and application process that you will be guided through by the UK Data Service. The letters in the third column of this table indicate the access rules under which the data are typically made available, which is either standard End User Licence, Special Licence or Secure Data Access. Now the main interview data from Understanding Society are available as study numbers 6614 and 6931, and these contain the region identifier and a coarse rural-urban indicator.
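The postcode-based linkage described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the household records, the postcodes, the field names (`hid`, `wave`, `region`, `urban`) and the lookup contents are all made up, and the real ONS Postcode Directory has many more fields and its own layout.

```python
# Hypothetical, made-up records: one row per household per wave.
households = [
    {"hid": 1, "wave": 1, "postcode": "AB1 2CD"},
    {"hid": 1, "wave": 2, "postcode": "EF3 4GH"},  # household moved between waves
    {"hid": 2, "wave": 1, "postcode": "ab12cd"},   # same place, different formatting
]

# Hypothetical extract of a postcode lookup: postcode -> area identifiers.
postcode_lookup = {
    "AB12CD": {"region": "Region A", "urban": True},
    "EF34GH": {"region": "Region B", "urban": False},
}


def normalise(postcode):
    """Strip spaces and upper-case so both sides share one key format."""
    return postcode.replace(" ", "").upper()


def attach_geography(rows, lookup):
    """Return copies of rows with area identifiers attached (None if unmatched)."""
    linked = []
    for row in rows:
        area = lookup.get(normalise(row["postcode"]))
        merged = dict(row)
        merged["region"] = area["region"] if area else None
        merged["urban"] = area["urban"] if area else None
        linked.append(merged)
    return linked


linked = attach_geography(households, postcode_lookup)
```

Keeping unmatched rows with `None` values, rather than dropping them, makes it easy to count how many postcodes failed to link.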
Now if you want to have access to more detailed indicators, you can also access these, but you would then have to choose between the 2001 and 2011 census versions, and there are also other versions. So we have the output area classification and then we have ACORN types, and what is also pretty cool is study number 7533, in which we have linked, for waves one to three of Understanding Society, data from the Department for Transport Accessibility Statistics. This gives you easy access to more than 600 unique pieces of information that have been longitudinally harmonized and provide information about key services that are or are not available to people in our sample. Now if this is not enough or not what you want, you also have the possibility to link your own data, and for this we provide a whole range of official geographical lookup data. The important thing to say here is that sometimes the official codes and boundaries change over time, so if you want to link your own data it is a good idea to try and link your external data with the ONSPD first to see whether your data are compatible with ours. So when you go to the download section of our data, that is, you click on the URL on the previous slide and replace the study number, you will find that we provide a geographical lookup file, and that tells you exactly which version of the ONSPD was used for which wave of Understanding Society and for which indicators, so it should be really easy to test it out. Now we often get asked about the smallest geographical area for which predictions can be made, and in this table I list some of the key UK geographies and show you how many of these are represented in the wave 1 sample. The smallest geographies are listed at the top and the largest at the bottom. You can see that we have a good number of cases in all regions.
These are the areas highlighted in green, and there may be enough cases within the local authorities and the travel to work areas represented in the study too. So this is highlighted in yellow. For smaller geographies we also have a very good representation in the study. For instance, there are overall more than 200,000 output areas in the UK and we represent more than 10% of all of these, but when we look at the number of cases within each output area that we observe, the number is very small. So ultimately what this means is that for smaller area characteristics, so those highlighted in red, you would probably want to link area characteristics from external sources rather than make out-of-sample predictions from Understanding Society. Now when you are analyzing neighborhood context data longitudinally, and this is true whichever scale you are looking at, it is important to consider that there are at least four possible sources of change in the neighborhood context. As analysts we are probably mostly interested in the change that occurs due to people moving and in the change that manifests itself in a particular place over time, and which could therefore help causally explain differences in individual outcomes. But change in the neighborhood context may ultimately also be observed because of changes in measurement. So, for instance, the method used to draw the spatial boundaries may have changed, or the categorization of the neighborhood types may have been revised. One of the typical examples here is that there was quite a large number of redefinitions of output areas between the 2001 and the 2011 census, and I believe Oliver is going to say a little bit more about that in his presentation later on. So let's look at an applied example from Understanding Society.
So here we basically compare levels of change in three different neighborhood classifications which we provide with the Understanding Society data, with the census-based classifications on the left-hand side. We need to worry a little bit about boundary changes that occurred over time, as we use different versions of the ONS Postcode Directory lookup file to extract these classifications, but we do not need to worry much about changes in the definition of output areas or rural-urban type because these are fixed to represent the census 2001 characteristics. Now for the ACORN typology on the right we use the 2015 version for all waves, so definitions and boundaries are identical across waves. In this setting, change over time in the neighborhood context that we see in the data, or in the linked data I should say, can only occur due to individuals moving, and only if they move from one type of neighborhood to another. The downside of this approach is of course that we cannot then look at how neighborhoods change for nonmovers, and this means that for the 90% of us who do not move from one year to the next we actually do not have information about how neighborhood change affects them. So how much change do we observe, then, wave on wave? Here in the slide we can see that the lowest level of change is observed using the rural-urban classification, which only considers urbanicity and settlement structures. This is because most moves are to and from settlements of the same type, or from rural areas to urban areas, and typically once you have lived in an urban area you do not move back to a rural area. Levels of change are somewhat higher using the output area and ACORN classifications, which consider sociodemographic and lifestyle profiles as well as settlement structure.
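As a rough illustration of what a wave-on-wave level of change means here, the sketch below computes the share of people observed in two waves whose neighborhood classification differs between them. The person identifiers and classification values are invented for the example, and this is not necessarily how the figures on the slide were produced.

```python
def share_changed(wave_a, wave_b):
    """Share of people observed in both waves whose classification differs."""
    common = wave_a.keys() & wave_b.keys()
    if not common:
        return 0.0
    changed = sum(1 for pid in common if wave_a[pid] != wave_b[pid])
    return changed / len(common)


# Made-up classifications per person at two waves.
wave1 = {"p1": "urban", "p2": "rural", "p3": "urban", "p4": "urban"}
wave2 = {"p1": "urban", "p2": "urban", "p3": "urban", "p5": "rural"}

# p4 and p5 are each seen in only one wave, so they are left out of the comparison.
rate = share_changed(wave1, wave2)
```

With fixed definitions and boundaries, a difference between waves can only come from a move to a different type of neighborhood, so a coarser classification will tend to show a lower rate.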
Now from an analysis point of view you will probably have more information about change to exploit when you use the output area classifications or ACORN, but really the key decision to make is probably in terms of information content. So when you analyze individual-level data from 2009 to 2011, which is the case when you look at data from wave 1 and wave 2 of Understanding Society, do you think it is more relevant what the neighborhood looked like up to 10 years before the respondent moved there, or is it more relevant what the neighborhood they moved to will look like up to 5 years after they moved there? Both of these context characteristics that we provide are approximate measures, but only you can decide which is the most appropriate choice in your particular analysis case. Now I'm kind of running out of time, so if you want to read more about this, and for the exact number of cases in the non-condensed classifications, for the output area classifications etc., you can follow the link I provide at the bottom of this slide. There are also some further links for you to follow if you want to learn more about Understanding Society and the different access routes: there is a link to the online documentation here and also to the Understanding Society data on the UK Data Service website. In the Understanding Society documentation there's also a link to Understanding Society user support, so that's also a place where you can ask further questions. But first I'm now handing over to Oliver, who will introduce us to geography in the ONS Longitudinal Study, and I'll speak to you again later. Bye for now.
Thank you, Gundi. As Gundi said, I'm going to talk about the ONS Longitudinal Study and about issues relating to geography in the LS. Understanding Society, which Gundi was just talking about, is a superb source and gives us annual observations of households.
The ONS Longitudinal Study offers decennial observations of the population of England and Wales, so it covers a much longer period of time overall, but it does so at wider intervals. The LS is based on four sample birth dates. There's a sampling rate of 4 out of 365, which gives about 1% of the population of England and Wales. An important aspect of the LS is that sample members don't know that they're in the sample. The four sample birth dates are not disclosed. I could be in the sample, you could be in the sample, none of us know. The data for England and Wales include both census data from 1971 through to 2011 and administrative data. Access to the data is controlled, and I'll give a little bit more information about how you can use it at the end of the talk. In the UK there are three longitudinal studies: one in England and Wales, one in Scotland and one in Northern Ireland. They all cover different time periods and they have different sample sizes. They also differ in the range and the amount of additional data that's linked to them. All of them have broadly similar secure access arrangements. This slide summarizes some of the differences between those three studies. The ONS Longitudinal Study, which is mostly what I'm talking about today, is based on four birth dates. The Scottish Longitudinal Study is based on 20 birth dates and the Northern Ireland Longitudinal Study is based on 104 birth dates. Very different sampling rates. One of the things to note is that the sample birth dates are nested across the studies. The four sample birth dates in the ONS LS are part of the 20 birth dates in Scotland, and those 20 birth dates are part of the 104 birth dates in Northern Ireland. What's in the LS?
Similar to census microdata or the SARs, with which some of you might be familiar, we have all of the variables from the census form for individuals, but we have more detail than in the safeguarded and in the open microdata samples. As in Understanding Society, we have observations of all people in the households. We only track the sample members, so as they move from household to household over 10, 20, 30 years we'll see other people in the household at the time of each census, but we don't follow those other people longitudinally. We've got an illustration of that here. This is a diagram representing a sample member who was born in the late 1960s and is observed in the LS in 1971, 81, 91 and so on. And we can compare two time points. In 1981, when that person is aged about 12 or 13, we'll probably see them in a household with a sibling and with parents. By the time we see that same person 30 years later in 2011, the other people in that household are likely to be that person's partner and that person's children. As I said, we have all of the regular questions from the census, all the responses to the questions in the census forms. And this slide just summarizes some of those. The right-hand side shows some of the variables that have been introduced more recently, compared to the others which have been in all the way through. In addition to census data, we have linked administrative data as well. So we have births of sample members. If a person is born on one of the four birth dates, they immediately enter the study at that point. We also know about all births to sample mothers. We know about widowhoods and widowerhoods. And we know about deaths of sample members. So for sample members who die, we have linked mortality data, so we know the cause of death. And a typical usage of longitudinal data is to compare that cause of death to things that occurred much earlier in the person's life.
This diagram comes from the ONS website and gives a summary of the total number of people who have entered and left the sample in various different ways. These slides, as well as Gundi's, will be shared with you after the webinar. Another couple of slides taken from ONS. This one tracks the 1971 sample members on the left-hand side and looks at what's happened to them over time. So the yellow dots, as we move across the columns, show the sample members who are still present through to 2011. The white dots show those sample members who've died. And the small number of green dots in the middle shows the people who are known to have migrated out of England and Wales. This slide looks from the other direction. It starts on the right-hand side with all the people in 2011 and looks at how they entered the sample. I apologize that this has red and green colouring, so I'll explain the diagram. The bottom part of the right-hand column is 233,000 people who entered the sample at birth. The middle section, 64,000, are people who've migrated into England and Wales. And the larger part, the 265,000, are people who entered at a particular census. The other columns show the people in the 2011 census as they can be observed in earlier censuses. A few developments since 2011: we did a series of beta test projects related to the 2011 data. We've introduced synthetic data, and I'm not going to talk about that today, but there are links about that on the websites that I'll mention later. We're involved with consultations towards the 2021 census and we've done various other presentations and roadshows and so on. So on to the main body of our talk today, which is geography in the LS. The LS consists of multiple files.
So we have a file for each census, we have files for sample members and for other persons in the households, and we have files for various other sorts of administrative data. When you use the LS, a support officer will create an extract linking across all the relevant files and produce one single file for your use. In this talk I'm going to concentrate just on the census files. The most detailed geographies are contained within a set of restricted access tables, also known as X files, which most researchers don't have permission to use or to see. So these most detailed geographies can't be used for standard analysis or for reporting results, but they can be used for linking other variables. When we talk about geography in the LS, it's important to remember that there are in fact lots of different types of geography that we might be thinking about. We have place of enumeration and place of usual residence. For most people those two are the same thing, but not always. We have place of second residence, relating to a question in the 2011 census. We have place of work. We have place of usual residence at some stage in the past: each census has asked where you lived one year ago, whether it was the same place you are living now or somewhere else. We have students' term-time addresses. We have country of birth. We have place of birth or place of enumeration in 1939. So for people in our study who are old enough, we have some data on where they were registered at the beginning of the war in 1939. I've mentioned that because it's quite an interesting variable, but in practice it's one that is not always complete and is quite hard to use. So it's there more for interest than for offering serious analytical worth. At this stage I want to launch another poll. As I mentioned, one of the items of geography we have is country of birth. So we've got a question here for people listening.
To the nearest thousand people, how many people do you think there were in the 2011 census in England and Wales who were born in Croatia? The sample answers that we've got, 12,000, 10,000, 8,000 or 6,000, are all based on the normal, if you like, aggregate census, the census that most people are familiar with, that contains all people. The LS, of course, is a sample. It will contain about 1% of these people. Okay. So we've got 19% of people saying 12,000, 24% saying 10,000, 27% saying 8,000, and 30% saying 6,000 people. The correct answer is the third option, 8,000. I said that's to the nearest thousand people. The actual number was around 8,200. And for those of you who are interested, of all the countries in the last 16 of the World Cup, Croatia was 15th largest in terms of number of people in England and Wales. Only Uruguay had fewer people. Of course, that was based on data collected in 2011, and the situation might have changed since then. And I asked that question not just to try and maintain interest in the webinar, but also to point out that we need to think a little bit about geography. Croatia was recognized by the UK and by other EU countries as a sovereign state in 1992. But of course, many of the people in the census are old enough that they were born before 1992. I haven't been able to look at the distribution of people born in Croatia by age, but I presume at least some of them were born before 1992. And this makes us think about how we relate to a field like country of birth. Is it the country that exists now? Is it a country recognized as a sovereign state? Or is it something else? Okay, I've now got a series of slides in which I want to look at each census in turn from 1971 through to 2011 and consider some of the types of geography that can be used in each census. So starting with 1971, our earliest census data, you can use a number of spatial fields.
Standard region, county and district, local authority, new towns, regional health authorities, area health authorities and health districts. There are two types of geography mentioned at the bottom, in red and in brackets: wards and grid references. I've marked those in red because they're in the restricted files. So you won't be able to see those identifiers, but you may be able to use them in some way in discussion with your support officer. The grid references in 1971 are a mixture of 100 meter and 1 kilometer levels of accuracy, depending on whether you're in an urban area or a rural area. 1971 is also a little bit messy in that we've got both pre- and post-1974 variables. 1974 was when the 1972 Local Government Act came into force. I know from looking at the institutions mentioned by people who signed up for the webinar that not all of you are from the UK. One of the things that is notable about the UK, as Gundi suggested earlier, is that we change our local geography an awful lot. Between every census there'll be changes in local geography, which makes it difficult to do any sort of analysis over time. So in 1971 we've got both pre- and post-1974 variables. There's a slight issue with that, and I'm hoping I'm getting this the right way around: the post-1974 variables are about place of enumeration, and the pre-1974 codings are about place of residence. As I said earlier, for most people place of enumeration and place of residence are the same thing, but that's not always true. Moving forward to 1981, we've got a broadly similar set of types of geography that can be used. We introduced another type in 1981, travel to work area, and we also have a new low-level observation, enumeration district, which was the smallest area unit available in 1981.
Again, the ones at the bottom, ward and enumeration district, are only available in restricted files, so you can provide data to link to them and you can use them in certain ways, but you won't be able to see those values and you won't be able to use them to report results. Moving on to 1991, again, broadly we have the same sort of set of headline geographies, although of course it's important to remember that some of these change over time in terms of detail. So we have districts in 1971, in 81 and 91, but those districts are not always exactly the same. In 1991 we also have an additional restricted observation of postcode. Moving on to 2001, there were quite a lot of changes. We no longer use standard region; we use government office region as our large regional identifier, and those two sets of regions are similar but not quite the same. Reflecting perhaps political interest and things that were important at the time, we no longer have new town as a coding. We've introduced national parks for people who live in national park areas. There are Westminster parliamentary constituency and European parliamentary constituency codings as well. We have a larger set of restricted area codings: wards, parishes, primary care group and primary care trust, and grid reference. In 1991 we had enumeration districts. In 2001 these had been replaced with output areas, which are the smallest unit with which most people who are used to doing census analysis will probably be familiar. Grid references in 2001 were more detailed than those used in previous censuses. Then finally, most recently, in looking at 2011, we have a similar set of headline areas: government office regions, counties, districts, and so on. Again, we have a much broader range of restricted-level geography that can be used for linking other data and so on. As well as output areas, we've got LSOAs, lower layer super output areas, and MSOAs, middle layer super output areas.
We have workplace zones. Workplace zones were a new geography introduced in 2011, and they were introduced because of a problem that had arisen in trying to look at workplace statistics. In the past, all data about workplace statistics, including place of work and journey to work, had been tabulated using residential geographies. In many parts of the country that's not terribly helpful. The easiest example that we always quote for this is the City of London. Thousands of people work in the City of London, but very few people live there. It makes sense to break the City of London down into multiple small units and not to try and use the residential geography, which wouldn't be very helpful. These small units are called workplace zones. In city centres they tend to be very detailed. In more residential areas they're larger than output areas, and they're designed to be large enough that we can tabulate data using them. For both 2001 and 2011 we have output areas. An implication of that is that if you want to use another sort of geography, then anything that can be built using output areas can be produced. Before moving on to the next slide, I just want to point out one thing I didn't mention there. In 2011 we have a set of 1991 district codes. As well as districts changing their boundaries, they also change their codes over time. 1991 districts are available almost all the way through our sequence. They're the easiest thing to use if you want to do long-term change over time observations. With a bit of fiddling you can use other sets of districts as well, but the 1991 districts are the easiest ones to use. And districts are the smallest areas for which we can normally tabulate results. In some cases districts might need to be joined together, typically the City of London and the City of Westminster, because so few people live in the City of London. As I mentioned, using output areas makes it relatively easy for data to be recoded.
If you want to do this, you can supply your support officer with a lookup table listing every output area and the way you want to recode that output area, and your support officer will be able to assist in recoding the data. As well as using OAs to recode data in that sort of way, we can also use them to attach contextual or area-level data to the rest of the data for the purposes of analysis. So again, users submit a data set with an observation for every OA, and it's possible for that to be attached to unit records. However, in doing that, it must not be possible for someone to deduce a location smaller than district size from the final results. The way that's normally suggested to get around this is to convert the raw values in your table of observations to deciles or a similarly transformed form of the data. And people have done a wide variety of projects using this approach of attaching data at small-area level. People have attached environmental data, such as weather data. We've done a small project attaching house price data. And essentially anything that can be produced at output area level can in principle be attached. If you want to use the LS, it's free at the point of use, and it can be used in one of two ways. Either you can use a secure setting at the ONS offices, in which case you'll be able to run your own code and see results, but you won't be able to take any results out of the secure setting unless they satisfy disclosure requirements. And the disclosure requirements are that you can't have any values in a crosstab or a similar set of results smaller than 10 persons. The ONS offices are limited to London, Titchfield and Newport, and for most people it's the London office that they attend. If you're unable to attend an ONS office, you can submit code remotely and that code will be run by support officers.
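The decile conversion mentioned above can be sketched as follows, using only the Python standard library. The output area codes and values are invented, and the cut-point method is whatever `statistics.quantiles` uses by default, which is an assumption of this sketch rather than a rule laid down by the LS support team.

```python
import bisect
import statistics


def to_deciles(values_by_oa):
    """Map each output area's raw value to a decile from 1 (lowest) to 10."""
    # Nine cut points split the distribution of values into ten groups.
    cuts = statistics.quantiles(values_by_oa.values(), n=10)
    return {
        oa: bisect.bisect_right(cuts, value) + 1
        for oa, value in values_by_oa.items()
    }


# Made-up area-level observations, one per hypothetical output area code.
raw = {f"OA{i:02d}": float(i) for i in range(1, 21)}

deciles = to_deciles(raw)
```

Only the decile, not the raw value, would then be attached to unit records, which makes it harder to work back from a value to a specific small area.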
Again, if the outputs of that code satisfy disclosure requirements, then those outputs can be returned to the researcher under the usual restrictions that those results can't be passed on to anyone else until final clearance has been granted. If you want more information, there are a couple of URLs on this slide: ucl.ac.uk/celsius for information about the ONS Longitudinal Study and about CeLSIUS, who support it, and calls.ac.uk, which supplies information about all three longitudinal studies. So it also provides information about the studies in Scotland and in Northern Ireland. Finally, because I'm running over a little bit: if you want to apply, you need to be an accredited researcher, and the website explains how to do that, and you need to have an approved project. With CeLSIUS, and with the other longitudinal studies as well, the support teams will assist you and advise on how to complete the form, etc.
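As a rough self-check before requesting clearance, the disclosure threshold mentioned earlier (no values smaller than 10 persons) could be tested along these lines. The crosstab is invented, and flagging zero counts as well is an assumption of this sketch; the actual clearance rules are applied by the support officers.

```python
THRESHOLD = 10  # from the talk: no cell smaller than 10 persons


def small_cells(table, threshold=THRESHOLD):
    """Return the cells whose count falls below the threshold."""
    # Assumption: zero counts are flagged too, since 0 is smaller than 10.
    return [cell for cell, count in table.items() if count < threshold]


# Made-up crosstab: (district, category) -> number of persons.
crosstab = {
    ("District A", "moved"): 42,
    ("District A", "stayed"): 310,
    ("District B", "moved"): 7,
    ("District B", "stayed"): 95,
}

flagged = small_cells(crosstab)
```

Any flagged cell would need to be merged with a neighbouring category or area before the table could be released.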