Hi, good afternoon and welcome to this webinar on the UK Census Longitudinal Studies. I'm Oliver Duke-Williams from CeLSIUS, the Centre for Longitudinal Study Information and User Support. With me is Rachel Stutterbury, one of our research officers, who will actually help you do your research with the LS. We're going to talk about the UK LSs, looking at both the Longitudinal Study in England and Wales and the other studies in Scotland and in Northern Ireland. First of all, we just want to check that everyone can hear what I'm saying. So we've now got a poll, which should ask whether or not you can hear us. Yes or no? We'll just have that open for a short while so people can respond. And we can see that almost everyone can hear us; possibly one person can't. Okay, if you can't hear us, we've got a couple of options. One is to dial in and listen over the phone. Another is that we'll be recording this webinar and we'll send a link to everyone who has signed up for it. But the most likely reason why you can't hear us is that you've got your speakers or headset unplugged or muted, or the volume turned down on your PC. If you have any problems hearing us, please do put something in the chat window. We'll give any advice we can as we go along. So in this webinar, we're going to talk for about 40 minutes about the Longitudinal Studies and then we'll have time for questions and answers. You can ask a question by typing it into the chat window. We'll keep an eye on those as we're going along, but we'll answer questions at the end of the session. You can also tweet us, either now or later on, using the Twitter handle at CeLSIUS News, and we'll keep an eye on that as well. As I said, my name is Oliver and with me is Rachel, and this is what we look like. If you want to use the LS, Rachel and our other support officers, Wei and Chris, are the people you'll see and meet when you come and use the data.
In this webinar, we'll be covering what the Longitudinal Studies actually are, what kinds of research they're useful for, how you can access them and what support is available to you. First of all, we want to gauge in some way the level of experience that people have with the Longitudinal Studies. So we want to ask you why you're attending this webinar: whether it's because you want to analyze quantitative data, whether you want to use longitudinal data in particular, whether you want to use the LS amongst all possible longitudinal data sets, or whether you're here for some other reason. And again, we'll open a poll to see the reasons why people are listening to the webinar, and we'll have that open for a short while and see what people say. Okay, so we'll close that in a few seconds. Okay, and we can see that there are small numbers for the first and last options. The main reasons are the middle two: either you want to use longitudinal data or you want to use the Longitudinal Studies in particular. And it might be that you're not entirely sure what the distinction is between those, and that's one of the things that we'll cover during this webinar. Hopefully we can encourage you in particular to use the Longitudinal Studies, either the ONS Longitudinal Study or one of the others, or we can get you interested in the sorts of questions that longitudinal data in general can answer. So at this point I'm going to hand over. No, I'm not, I'm doing this slide, I'm very sorry. I'm going to say something about the census data that are available for social scientists to do research. Some of this you'll probably already be familiar with. The aggregate data are data about places: how many people live in a particular location, a district or a ward or whatever.
And this is what most people mean when they talk about census data, especially people who've done quantitative degrees, such as geography or other subjects where they've used census data. We also have flow data, which are about movements between one place and another. That includes commuting flows and migration flows. There are also boundary data as part of the census outputs. They're digital map data that you can use in a geographic information system if you want to produce maps of your results. The fourth sort of census data we have available is microdata. Microdata are individual or household level records, and we have two forms of those with the census: the samples of anonymized records and the longitudinal studies. With the samples of anonymized records, we get the complete set of answers that a sample of individuals gave in the census, with some categories broad coded in order to prevent identification of people. Of course, directly identifying characteristics such as name and address aren't included in those records. With the longitudinal studies, we have samples of records from more than one census, and we can see how a given individual responded in the census at more than one point in time. And at this point I will hand over to Rachel, who's going to carry on.

Thanks. And if you want to know any more about these options, do go to the census web page at the UK Data Service site. The URL is at the bottom of the slide, which I'm about to move on from. So, the longitudinal studies or LSs. These are, as I expect you've gathered, large individual level data sets. Don't be put off by the name "studies". Each has a sample of the country's population. There's one for England and Wales, one for Scotland and one for Northern Ireland. I will cover sampling methods and sizes a bit later. But first, I want to talk about where the data come from. There are two core data sources, which are common to all three LSs. The first of these is the census.
For England and Wales, the earliest census that's included is 1971. For Northern Ireland, it's 1981. And for Scotland, 1991. And I should make it clear that what gets added to the LS record from a census form isn't just the individual data about the sample member, but the whole content of that form, apart, of course, from names and addresses. So that means responses about the household and the dwelling, and individual data about every person in that household. So we don't just know whether an LS sample member is married at census. Say he is: we know what job his wife is doing and for how many hours, we know the ages of all the children and how healthy they are, we know how many cars or vans the household has, all that. However, only the sample members are tracked over time. We only know about other household members cross-sectionally. So that's one core data source, and the other is vital registration. That's to say, certain life events have to be reported to the government by law, and the details from these registrations are available in the LSs. That would include, for example, the weight of the baby from a birth registration, and the place and cause of death from a death registration. These are called events data. In the England and Wales LS they include the birth of the sample member, the birth of a child to a female sample member, the death of the sample member's spouse (so widowhood), cancer diagnosis and treatment, and the death of the sample member. In the Scottish and Northern Irish LSs, all those events are there, but they also have marriage and the birth of a child to male sample members as well as female ones. Now, for the England and Wales LS, those are all the data sources we have. It's the longest running of the LSs, starting way back in 1971, but the data sources are limited.
Whereas in Scotland and Northern Ireland, there are all sorts of other exciting sources, either permanently linked to the LS or for which a special linkage will be made if you make a decent research case for it. I will describe a few of these in a moment. But first, perhaps just a reminder of the topics covered by the census. So we have geographies (various); housing tenure or type of communal establishment; sex; age; ethnicity, from 1991; country of birth, in all censuses; qualifications; marital status; family composition; economic activity; occupation, from which various socio-economic classifications are derived; migration; travel to work; and, from 1991 on, chronic illness. In 2001, some new topics were added: religion, self-rated general health, unpaid caring, and the year when the person last worked. And in 2011 came more additions: national identity, passports held (though this wasn't asked in Scotland), date of arrival in the UK, main language and fluency in English. In Scotland and Northern Ireland, there was a question about various chronic health conditions or limitations. And in Northern Ireland only, there was a question about voluntary work. Going back to the very first topic, geography: all the LSs contain a range of different geographic classifications, including census geographies and administrative boundaries, and also postcodes and grid references. This makes it perfectly possible to link in area characteristics from outside the LSs, because it's nearly always possible to construct a match between the geographic classifications used in the external data source and some set of geographic variables in the LSs. For example, many projects use measures of area deprivation, and the density of different ethnic groups in an area has been linked in more than once. There's a project in England that looked at the proximity of residences to power lines, and weather or pollution data could be used. In fact, the Scottish LS has just formalized a link between Met Office databases and the SLS.
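To make the idea of area-level linkage concrete, here is a minimal sketch in Python. It is not the procedure used by any of the support units; the member records, the `ward` field, the ward codes and the deprivation scores are all invented for illustration. It only shows the principle that an external area measure can be attached to each record through a shared geographic code.

```python
# Hypothetical extract of sample members, each carrying an area code.
# All identifiers and values below are invented for illustration.
members = [
    {"member_id": 1, "ward": "W01"},
    {"member_id": 2, "ward": "W02"},
    {"member_id": 3, "ward": "W99"},  # area missing from the external source
]

# Hypothetical external source: one deprivation score per ward.
deprivation_by_ward = {"W01": 12.4, "W02": 30.1}


def attach_area_measure(records, lookup, area_key, new_field):
    """Return copies of the records with an area-level value attached.

    Records whose area code is absent from the lookup get None, so
    unmatched areas stay visible rather than being silently dropped.
    """
    linked = []
    for record in records:
        row = dict(record)  # copy, so the input records are not modified
        row[new_field] = lookup.get(record[area_key])
        linked.append(row)
    return linked


linked = attach_area_measure(members, deprivation_by_ward, "ward", "deprivation")
for row in linked:
    print(row)
```

The match is on the area code only, so this kind of linkage adds characteristics of places, not of individuals or households.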
So this is an important opportunity to extend the range of variables considered. But linkage of external data at individual or household level is not possible. So, turning to other data sources, you can see on this slide that the top row of balloons, if I can call them that, are the three censuses included in the SLS, and the bottom row are the vital registrations or events data. But in the middle row are brief pointers to other sources. There are health data from the Scottish Information Services Division, including, among other things, hospital admissions and discharges, maternity records, drug misuse data and prescriptions. And then on the right hand side, there are school census records: attainments, attendances, exclusions. Although it's not mentioned on this slide, the SLS team has also traced older sample members in the 1939 National Register. Actually, this has been done for England and Wales as well, but I haven't often seen it used. And in Scotland, but not in England and Wales, those cases who were present in 1939 have been linked forward to the Scottish Mental Survey carried out in 1947, which offers measurements of what we now call cognitive ability. This is a very new addition; you heard it first here, folks. Another recent development is internal migration from year to year, based on postcode data from GP registration. So that's just a brief summary of what's available in the SLS. Turning to Northern Ireland, the possibilities include property characteristics, including capital value in 2005, what parking the property has available and the year it was built. As in Scotland, one can also study migration within Northern Ireland, as well as into and out of the province. And special linkages can be made from health and social care records, covering, for example, attendance at breast screening appointments, prescriptions for antibiotics, hospital visits, dental services, and much more. So, on to the question of the samples.
Each LS is based on a sample of the population, and the sample members are selected by their day and month of birth, irrespective of their year of birth. So it's an all-age sample. For England and Wales, the original sample was drawn from the 1971 census returns. There are four selected birthdays, so the sample is just over 1% of the population, and the number of cases in the sample at any one census is between 500,000 and 600,000 people. For Scotland, the study started in 1991 and there are 20 selected birthdays, so the sampling fraction is just over 5%. This means that there are about 270,000 sample members at each census. For Northern Ireland, the method is slightly different. The original sample was selected not from the census, but from the health registration system in 2001, and census data for 1981 and 1991 have been added retrospectively to the records for those and subsequent sample members. Because of the considerably smaller population in Northern Ireland, there are 104 sample birthdays and therefore NILS includes 28% of the population, giving similar numbers at each census to the England and Wales LS, 500,000 odd. All of these sample birthdays are kept very confidential (I mean, I've been in the job 10 years and I don't know them), so that no one will be able to match any sample member in the datasets with real people in the population. The confidentiality of census and vital registration data is absolutely paramount. This does mean that if a sample member moves between countries in the UK, they can't be followed from one LS to another, which is a pity. I should also draw your attention, at the bottom of the slide, to the Northern Ireland Mortality Study, which is a dataset that includes all the deaths in Northern Ireland from 1991 onwards and links them to census records, obviously from before the person died. The LS samples are dynamic. Every year there are exits and entries.
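The sampling fractions quoted above follow from simple arithmetic on the number of selected birthdays. The short Python sketch below reproduces it, under the simplifying assumption (mine, not the speakers') that births are spread evenly over the days of the year. The birthday counts are the ones given in the talk; the function name is purely illustrative.

```python
# Average length of a year in days, allowing for leap years.
DAYS_PER_YEAR = 365.25


def sampling_fraction(n_birthdays):
    """Approximate share of the population born on n selected birthdays,
    assuming births are evenly distributed across the calendar."""
    return n_birthdays / DAYS_PER_YEAR


# Number of selected birthdays per study, as stated in the talk.
selected_birthdays = {
    "England and Wales LS": 4,
    "SLS": 20,
    "NILS": 104,
}

for study, n in selected_birthdays.items():
    print(f"{study}: {n} birthdays, about {sampling_fraction(n):.1%} of the population")
```

Under that assumption the fractions come out at roughly 1.1%, 5.5% and 28.5%, in line with the "just over 1%", "just over 5%" and "28%" figures quoted.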
Entry is by being born on a sample birthday; or by registering with the health service, which is taken as an indication that you've recently arrived in the country, so it's labeled immigration; or by making a census return and not being identified within the existing LS membership. And of course, for any entry, the person is selected because he or she has an LS birthday. Exit from the study is by dying or by deregistering from the health service, which is taken to indicate that you're leaving the country, so it's called emigration. It's known, though, that many people leave to live abroad permanently or for a time and don't think of deregistering from the health service, so it is by no means a complete record of emigration. But exit from an LS doesn't mean that your record is archived; it stays in the database permanently for analysis. And if a sample member who has emigrated reappears at a subsequent census, well, the new data are added to their record and their current status is revised accordingly. The same is true for dying, although that is less common. So all records are retained permanently for analysis. Just as a visual reminder, this is an example of the data available in the LS for one imaginary woman in England or Wales. She enters the sample at the 1971 census, which included, for that census only, questions on marital history and childbirth history, which have been very useful for subsequent LS researchers, I can tell you. She has two birth records added, a daughter and then twin boys. The 1981 and 1991 census data for her household are successfully linked in, and we see that by 1991 the daughter, now in her teens, has left home. In 1997, the husband dies, so a widowhood record is added. She's present again at the 2001 census, now living alone because the children have all left home. There's a cancer registration. She appears at the 2011 census with a co-resident grown-up daughter, and she dies soon afterwards.
So, moving on to the types of research possible using the LSs. This rather daunting list shows some of the types of study design that are possible: cross-sectional, geographic, longitudinal from census to census, longitudinal combining census data with events data, cross-sequential and intergenerational. Cross-sequential means doing the same bit of longitudinal research twice or more, but starting at a different point in time. So this means you're studying a particular process or transition, as always in longitudinal research, and also how that transition itself is changing over time. I've got a few research examples which I hope will make the types of study design clearer. They're all from England and Wales, as it happens. So, an example of longitudinal census-to-census research: how stable were cohabiting partnerships compared to marriages between 1991 and 2001? For this we studied people, excuse me, aged 16 to 54 who were in partnerships in 1991 and who were enumerated 10 years later in 2001. Well, here's a question for you. Which of the age groups in 1991 would you think most likely to have no partner at all 10 years later? 16 to 24, 25 to 34, 35 to 44, or 45 to 54? Interesting results. By far the largest set of answers, about half of you, voted for the oldest age group, and the next largest set of you voted for the youngest age group. Okay, here are the results. Here we see on the left, in bold, the 1991 partnership headings, so married or cohabiting, and under those the 2001 outcome, divided into those with the same partner, those with a new partner, and those with no partner. And you can see that, for those who had been married in 1991, the likelihood of having no partner fell considerably with increasing age group. But for those who were cohabiting in 1991, age made no difference. Just over 20% were alone 10 years later.
So I think the people who said the youngest age group were the most likely to have no partner win, because there were more married people than cohabiting people in 1991, although I'm afraid you can't tell it from this table. Congratulations to those. I mean you, not the people who have got no partner, sorry. I suppose we also notice from this that for all age groups, the likelihood of being with the same partner after 10 years was higher for those who were married than for those who had been cohabiting. I suspect that if we did the same analysis between 2001 and 2011, the difference between marriage and cohabitation would be less marked. Time will tell. Moving on... oh sorry, I'm supposed to have given you some boxes, never mind. Moving on to another kind of study design. One of the most valuable features of the LS is the possibility of combining census data with vital events data. This is very often done using deaths data to study survival by characteristics that can be measured at census, such as occupation, marital status, you name it. But this is an example using birth records to investigate whether girls who become teenage mothers are disadvantaged in socio-economic terms. For this we took a sample of girls aged 5 to 9 years in the 1981 census and followed them up in 2001, when they were aged 25 to 29. That meant we could analyse them both by their characteristics in 1981, before any births, and by their characteristics 20 years later, after they had or had not experienced early motherhood. I haven't asked you a question here because I couldn't think of one. So here are some results. This table shows one characteristic from the 2001 census, which is the woman's highest educational qualification in 2001. And you can see that there was a very obvious relationship between giving birth before their 20th birthday, on the left, and having lower or no qualifications.
And vice versa: of the sample as a whole, you can see 29 percent got a degree, but only 3 percent of women who'd been teenage mothers got to that level, compared with 43 percent of women who hadn't given birth by their late 20s, which is a huge difference. And we could, and we should, follow those same women up in 2011 and see if any of the low achievers have caught up, perhaps gone back to college when their children got bigger. One more example. Another thing we can do with LS members is to sample them when they're children, as we did in that project. We pick them as children because they are very likely to be living with at least one parent, so we can get parental characteristics from the census form and see whether the characteristics of the parent generation are replicated in the child generation when it reaches adulthood. This analysis uses LS members who were present at the 1971 census, when they were aged 6 to 15 years, and again in 2001, and it was trying to assess social mobility. So here's a question for you. For men who are unemployed at the latest census point in 2001, when they are aged 36 to 45 years, what do you think the most common socioeconomic status will be for their highest ranking parent, usually the father, 30 years earlier when they were children? Working in a professional or managerial occupation, in a skilled non-manual occupation, in a skilled manual occupation, in a semi-skilled or unskilled manual occupation, or unemployed? Over to you. Right, I think everybody's answered who's going to. The largest set of answers, 46 percent, came out as working in a semi-skilled or unskilled manual occupation; 29 percent said unemployed and 23 percent said working in a skilled manual occupation; and nobody opted for most likely to have a parent in a professional or managerial occupation, and they're quite right not to. Well, this chart shows men only, and it excludes those who only lived with one parent in 1971, who were analyzed separately.
The colored bars show the status of the parent, and the bottom axis shows what the son was doing 30 years later. You can see that the cream colored bars are nearly always the highest, and the cream colored bar is also the highest for the sons who are unemployed in 2001, labeled U/E. Cream means the parental status was skilled manual worker, which is the most common overall, but it was the most common for all working-age men in 1971, so that's that. But very noticeably, for the group of sons who are in social classes one or two, the leftmost group of bars, the cream bar isn't the highest. It's the lavender bar which is highest; that's the percentage of parents in that same highest social class. So there's a strong association between high occupational status in the parent and the same in the son in 2001. On the other hand, I suppose you could say there are respectable proportions of lavender bars in every group of sons, even those with unemployed parents, sorry, I mean even those who are unemployed themselves. So the glass ceiling obviously wasn't impassable between 1971 and 2001.

Well, that's the end of the examples. Now the practical side: how do you go about accessing one or more of the LSs? Each LS has its own application procedure, but in fact the steps you have to follow and the conditions under which you use the data are very similar. Having thought about what you want to study, you formulate your research question. You contact the appropriate support unit; I'll mention them in a minute. You work with the support officers and with the support unit websites to decide what LS data you need, what your sample will be and what your variables will be. There are two application forms: one is a description of the proposed project and the other is an application to be an approved researcher. The latter must be applied for separately by every person named on your application, and if you're a student, your supervisor must be on there. A data set will
be extracted specially for your project, and it will be accessible to you in a safe setting on specific premises. You can't use it anywhere else. For the England and Wales LS these are in London, Hampshire and Newport; for the SLS it's in Edinburgh; and for NILS it's in Belfast. Once your project is approved, you attend a training session so that you understand your responsibilities when you're working within the safe setting. Another way of working, in England and Wales or Scotland but not in Northern Ireland, is to send code to the support officers. The code will be in Stata or SPSS or SAS or R, and we will run it for you, correct any mistakes if we can, and send you back the outputs, encrypted. This is very popular in England and Wales, and one user has sent us well over 300 code files so far. Whichever way you go about doing your analysis, the outputs that can be released from the safe setting are tables or models or aggregated data sets with a limited number of variables. These outputs must be scrutinized by a support officer for potentially disclosive elements, anything that might allow you to identify a person or a household, and they can only be shared with approved researchers on the same project. Then, when you've got something you want to make public, a presentation or an article, anything that contains data from your study, you will need to submit it for what's called final output clearance, and the standards for this are higher. It sounds complicated, but there are support units funded by the Economic and Social Research Council to help researchers use the LSs: CeLSIUS, SLS-DSU and NILS-RSU. The services are free to the user and are currently funded to the end of July 2017, so if you want to use us, don't leave it too long, just in case. We all offer support right through the application process and your analysis. Our respective websites are the main sources of information about the LSs, and they're certainly the place to start if you want to know more. There's also a central
enquiry point for all three, called the CALLS Hub. The CALLS Hub website has some resources which are really useful for anyone thinking of using more than one LS, and there are a number of researchers who've done that. And at this point I shall hand you back to Ollie.

Thank you, Rachel. Just following on from Rachel's point about the CALLS Hub and UK level analysis, there are a couple of things I'd like to say about that. At the CALLS Hub website we have a harmonized data dictionary. The three separate studies all have their own separate data dictionaries, but the one at the CALLS Hub includes entries for all three studies at the same time, and this helps with doing either comparative or inclusive UK level research. If you look for one particular question or topic in the data dictionary, you can see whether that question is asked in all three studies. You also get some guidance as to whether we think it's the same question with the same response categories; whether it's a question that's been asked in different ways, with response categories that aren't compatible; or whether it's somewhere in between, a question where, with a bit of playing around, you can get the response categories to be more or less compatible and then do comparative work. There's also information at the CALLS Hub site about something called eDatashield, and that's a process that allows you to run a model on more than one study at the same time. It's an iterative process by which you run a model on one data set, send the parameters off and have the model run on another data set, and go back and forth until the model converges and you get a set of model parameters that are fitted across both data sets. Whilst that can in principle happen automatically, we have to have a human gap for legal reasons, so it takes a bit of time to do that. But staff at CALLS and the three studies are perfectly happy to help you do that if you'd like to do a research project involving more than one of the studies. Well, that brings us to
the end of our webinar. At the beginning we saw that some of you were specifically interested in the census longitudinal studies, and others of you were interested in longitudinal analysis in general. Hopefully you can see that the examples Rachel gave apply not just to the census longitudinal studies, but also to ways that you can work in a more generic sense with longitudinal data, and to some of the sorts of comparisons and experimental designs that you can do using longitudinal data.
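For anyone curious about the back-and-forth model fitting across studies mentioned earlier, the Python sketch below illustrates the general idea only. It is not the software used by the studies, and everything in it is an assumption made for illustration: the choice of a logistic regression, the Newton-Raphson update, the function names and the tiny invented data sets. The point it shows is that each study returns only aggregate summaries for the current parameters, never individual records, and the combined summaries drive the next update until the parameters stop changing.

```python
import math


def site_summaries(data, beta):
    """Run inside one study: return only aggregate score and information
    sums for a logistic regression with an intercept and one covariate."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted probability
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit_across_sites(sites, tol=1e-10, max_rounds=50):
    """Coordinate the fit: send the current parameters to every site,
    add up the returned summaries, and take one Newton-Raphson step."""
    beta = (0.0, 0.0)
    for _ in range(max_rounds):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for data in sites:
            (a0, a1), (c00, c01, c11) = site_summaries(data, beta)
            g0, g1 = g0 + a0, g1 + a1
            h00, h01, h11 = h00 + c00, h01 + c01, h11 + c11
        det = h00 * h11 - h01 * h01  # solve the 2x2 system by hand
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        beta = (beta[0] + step0, beta[1] + step1)
        if max(abs(step0), abs(step1)) < tol:
            break
    return beta


# Invented (x, y) records held separately by two hypothetical studies.
study_a = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 0)]
study_b = [(0, 0), (1, 1), (1, 0), (0, 1), (1, 1), (0, 0)]

beta = fit_across_sites([study_a, study_b])
print(beta)
```

Because the summaries are simple sums, fitting across the two separate data sets gives the same parameters as fitting one pooled data set would, which is what makes this style of analysis attractive when records cannot leave their safe settings. In practice, as noted in the talk, each exchange passes through a human check rather than running automatically.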