Welcome to the UK Data Service workshop on large-scale social surveys. Your presenter today is Nigel, from the UK Data Service, based at the University of Manchester.
OK, well, thanks very much. What I'm going to cover today is a little bit about the UK Data Service, then survey data with some practical examples. I'm going to go through the documentation and tips, which are quite important, and give a tour of the website at the end. As I say, we are a small group, so we can have more interaction in this session if you want it. I'm quite happy for you to either ask verbal questions or use the Q&A.
OK, so what is the UK Data Service? Well, we're funded by the ESRC, which means all of the material we produce is free for non-commercial use. We provide a single point of access to a wide range of secondary social science data. This includes both qualitative and quantitative data, and today we're focusing particularly on survey data. We also provide support, training and guidance, and this is part of our training programme. There's lots of documentation, and I'll show you some of that on the website. We basically support new data as it comes in, so I might mention a few of the surveys we're expecting to see coming in shortly.
So we hold UK surveys and longitudinal data, so data from longitudinal surveys like Understanding Society, and also international aggregate data that comes from sources like the World Bank. We hold census data, and we're expecting to release Census 2021 data for England and Wales. There's already some data there, just simple local authority profiles by age and sex for England and Wales, and some more detailed data from Northern Ireland. We also host business data and, as I said before, qualitative data.
So I can see some people have joined since we started, so just to reiterate, we have had technical problems today.
So we have a smaller group than we expected, I suspect, and some people might come in a bit late. So if you've come in late and you feel like you've missed something that we've been talking about so far, please feel free to use the Q&A or just raise a question directly.
So what are surveys? Well, surveys hold data generally about individuals or households. They're often commissioned by government departments and conducted by organisations like the Office for National Statistics or the National Centre for Social Research. And the aim of most of the surveys is to generate large sample sizes so that they can be seen as nationally representative. So within them we have mechanisms that allow you to translate the results into ones that reflect the population of the area being surveyed. So, for example, the English Housing Survey has things in it to enable you to make claims about housing in England. Quite often they're repeated cross-sections, so they use the same or similar questions, but each time with a new sample of people. And they are repeated regularly.
Just to look at the mechanics underneath this, this is a couple of SPSS screenshots. And what they're showing is the way the data is held. So it's held in a kind of worksheet-type format, so if you're not familiar with SPSS, you will probably be familiar with Excel. The data is held as codes. So looking at that first question, the question is on self-rated health. The response is two in the first line, and that translates into good. In the fourth line the response is three, and that translates to fair. And one, which you can't quite see, translates to very good. Similarly with sex, age bands, marital status, highest education qualification and ethnicity. So if you use a package like SPSS, you can see what these categories are, and when you produce tables and so on, they will come out with the text version, not the number version.
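The way coded values map to labels can be sketched in a few lines of Python. This is only an illustration of the idea, not how SPSS stores its data: codes 1 to 3 follow the self-rated health example just described, while the name `HEALTH_LABELS`, the helper functions and the list of responses are hypothetical.

```python
from collections import Counter

# Value labels for the self-rated health question. Codes 1-3 follow the
# screenshot described in the talk; any other code is treated as unlabelled.
HEALTH_LABELS = {1: "Very good", 2: "Good", 3: "Fair"}

# Hypothetical coded responses, one per respondent (row 1 is 2, row 4 is 3).
responses = [2, 1, 2, 3, 1]


def label(code, labels):
    """Return the text label for a code, or flag the code as unlabelled."""
    return labels.get(code, f"Unlabelled ({code})")


def labelled_counts(codes, labels):
    """Tabulate responses by their text label rather than by numeric code."""
    return Counter(label(code, labels) for code in codes)


counts = labelled_counts(responses, HEALTH_LABELS)
for text, n in counts.items():
    print(text, n)
```

The data file itself only ever holds the numbers; the labels live alongside it in the metadata, which is why the documentation matters when you open a file in a tool that does not read labels automatically.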
And we provide data together with documentation in different formats. So SPSS, Stata and tab formats are the most common. We also provide an online interface, Nesstar, where you can browse the variables and metadata for some of the data sets that we hold. Nesstar allows you to do simple data analysis, to export tables and graphs, and to download subsets of data.
So the key topics we hold cover a wide range. In terms of employment and work, we have the Labour Force Survey, which is the main survey, and we'll talk a bit more about that later on. And aligned with that is a larger survey, the Annual Population Survey, which has less information in it. Other data sets, such as the European Working Conditions Survey, are also there. In terms of health, we have the Health Survey for England, the Scottish Health Survey and diet and nutrition surveys, amongst others. In terms of family finances, maybe a topic of great interest at the moment, we have the Family Resources Survey and the Living Costs and Food Survey. We hold the Crime Survey for England and Wales and the Scottish equivalent. And one of the examples we're going to use is around attitudes and opinions. We're going to look at the British Social Attitudes survey. There's the Northern Ireland Life and Times Survey, etc. And then in terms of housing and the local environment, we have the English Housing Survey. So there are lots of different sources.
So at that point, I think it's right to stop and ask if there are any questions, because that's the essence of what we're going to give examples of now, and we're going to get a bit more interactive as well. So before we move on, are there any questions about the kind of data you're looking for that we can pick up in the examples later?
OK, so this is from Neo. The ONS website and Nomis are different types of data archive that are much more linked to the official statistics.
We hold some of the ONS data that isn't available directly from other sources, and the detail behind it, but they both provide similar types of data.
And we've got a question from Apple on the level of disaggregation. So basically, our survey data sets are individual-level data, and some of them will have some geography in them. One of the issues with geography on surveys is that it might be disclosive. So for geography below regional level, for most surveys, you would find that you would need to go into secure access. We can talk about that later. But I think there is also a restriction, given sampling methodologies, on looking at things at any kind of fine-grained, granular level.
We actually hold all of that data. So we have information on expenditure and food in the Living Costs and Food Survey. In the English Housing Survey, half of the properties have a surveyor's report on housing conditions. In the Health Survey, there's a nurse visit for a substantial number of participants that has both a blood and a saliva test as well as other measurements like height. There's the National Survey of Sexual Attitudes and Lifestyles. And finally, in terms of Understanding Society, in the COVID waves we get test results, vaccination, infection, etc. So the answer is all of them, in different ways. And that's becoming more common, particularly around the biological samples. I think that's the right term. There's quite a lot of interest in actually collecting that kind of information within surveys, because it enables us to connect the fields of clinical research and social research. That's an area of particular interest, I think, as a result of COVID.
So here are some examples of surveys now. I think somebody mentioned the British Social Attitudes survey. This has been conducted almost every year since 1983. It covers public attitudes on a range of issues. Because it's been so long running, it helps to look at patterns of continuity and change. The sample is around 3,000.
It has core questions that identify things like your background, age, sex, ethnicity, etc. And then you have a number of modules on different aspects. Some are asked regularly, others less often, often led by media interest. So if you look at the 2010s, what you see is a focus on immigration, Brexit, etc. I suspect that we will be looking at issues around cost of living and precarity in future editions, and around COVID and so on in previous editions.
And just to look at a particular kind of question. So in the 1987 report, I think it was, there was a question about attitudes to gender roles, in particular thinking about women having the primary caring role for children when they're young. Again, we want to have a look at a poll here and see what you think had changed up to 2012. So I'm going to start this poll much quicker than last time.
OK, so I suppose the majority is somewhere around 23 to 33%. The correct answer is 33%, which I found a bit surprising. I suppose I thought the world had become more modern. But actually, when we move on to the repeat in 2017, the overall support for traditional gender roles had dropped from 48% to 8%, though there was still more of an expectation that women had the primary caring role in terms of children.
There was a series of questions asked here. Without getting the documentation up, we don't have it to hand. But the one around women having the primary caring role when children are young was phrased much like that: do you believe that women have the primary caring role when children are young? But the bank of questions on gender inequality was around a range of different things: attitudes to women in the workplace, attitudes to economic independence and other household issues. So to explore that you would need to look at the documentation.
One thing to say about the British Social Attitudes survey is that every year there is a report that comes alongside it, and it reports the key headlines from the survey.
There's also all of the documentation, in terms of the questions asked, alongside the data set. So it's quite a good data set to navigate in lots of ways. The information available with it supports a whole range of analysis quite effectively. And it's also one we have produced sub-versions of for those who might be interested in teaching about social attitudes. So we have one on politics and the environment, and one on poverty.
So we move on to the Labour Force Survey. This is the main source of data on the labour market. It measures a whole range of things around employment, unemployment and economic activity, together with topics around occupation, training, hours of work and personal characteristics. It's a household survey, carried out quarterly, with 60,000 people interviewed per quarter. I think this is the only one of the national surveys I've been contacted to do. And I ended up with five visits at some point in the 2000s from the researcher, who asked me questions about my characteristics and the rest of my household. So I was reporting second-hand on my wife's and children's economic position whilst they were living in the household.
In terms of the data sets, you can get a quarterly individual and household data set, and you can also get a five-quarter data set. It's a pretty complex survey, I would say. In terms of our help desk, which provides support, it's probably the biggest source of queries, and that is due to the way the documentation has evolved. The documentation has been built up over time, so if you're looking for particular variables, you've got quite a bank of documentation to go through. They are currently working through reproducing that guidance so that it's more coherent. So rather than adding in further guidance when something changes, which has been the pattern to date, they're trying to consolidate all that guidance into one set. And there are things that reflect that already there.
But when, for example, particular categories change over time, there are some queries that come in because people can't easily look at change over time with that category shift.
Just to refresh what I've said, we had technical problems this morning with the Zoom link and had to send out a new one. So if you've come in late and you've got questions about what's come before, then put them in the Q&A and we'll try and pick them up.
So the Health Survey for England is conducted annually again. It covers a whole range of things like general health, longstanding illness, smoking and alcohol.
Burns asked about the LFS. I think it does hold benefit claim information, but your better source for that might be something like the Family Resources Survey, which would tend to hold sources of income as well. If you have a query as you're going through that, then we do have the help desk, which can help guide you towards the best data sets to use. For me, that would be an offline task where I'd have to go and look at the documentation and see specifically what was held. But feel free to use the help desk. Once you're registered with us, you can email us through the help desk. One of our team would pick up the query and get back to you, and the turnaround is fairly quick, I think.
So there's a core questionnaire plus a focus topic for the year. It's an annual survey that has run since 1991. Currently, there are about 13,000 interviews a year. In terms of the question before about ethnicity, there is a release due which has an ethnic boost in it, so there will be better data in that version of the data set, which is due out, I believe, later this year. We will be highlighting that on our website. So if you're interested in ethnicity and health, then watch this space, and when the data is available, you'll be able to look in more detail at that. Alongside the questionnaire, there are a number of physical measurements and the analysis of blood samples, and there's a report as well.
So the picture on the right is an extract of some of the information from that report. It covers public health as well as treatment and clinical symptoms. And they use the infographic way of presenting data quite a lot. So it's quite an accessible report in terms of headline findings from each of the surveys.
So this is where we're going to do a bit more on how you get to grips with a survey. So you've got a data set that looks like nothing sensible at all. It's got a load of numbers in it and some labels that may not make that much sense. So the documentation is central to getting to grips with surveys. And let's have a look here. We're looking at the Understanding Society COVID-19 study. There were seven or eight waves of this study conducted during COVID to explore different areas of interest. The areas of interest shifted over time. There was a lot of interest in what was happening in education, so a lot of questions about school children within the household and what their experiences of education were. A lot of interest in the facilities available to people, so whether people had access to open space, whether they had access to exercise, etc. And a lot of interest in the clinical elements, so had people tested positive, were they vaccinated and so on. I used this data for analysis of housing and debt, which was quite interesting. And in terms of ethnicity, I was interested in breakdowns of that. So I used the data, which showed there were quite distinct differences in housing conditions and precarity between different groups. So the questions changed over time because the need for information changed over time.
And looking over here on the right-hand side at what we've got, we've got the study information. So if you use a survey, we have a recommended citation for that survey. We then have data dictionaries, which have all of the variables, and you can look down through those.
In Understanding Society, they tend to be grouped into blocks of similar things, so you can have a look at them and see what might fit. There are any recent notes, the original questionnaire and some information about changes.
Now, in terms of Understanding Society, that's run from the University of Essex, and they also have their own website. So you can go on their website and search for particular variables. So if you're looking at change over time in some aspect of social, physical or economic life, you could search for a variable and find out which waves it had been included in. And potentially, if you were getting into longitudinal analysis, which we're not today, you might be able to look at change over time for households, individuals, et cetera.
So it's a household survey. There are around 40,000 responses in the main survey. In the COVID-19 survey, which we're talking about, there are about 16,000 or 17,000. And it was conducted every two months from, I think, April 2020 through till autumn 2021. The frequency dipped towards the end, as things weren't changing as much. But for many people, it was a useful, open way to get into the data. Quite a lot of the work done on COVID was using pretty secure data, which was linked directly to health records. In terms of looking at the broader social context, Understanding Society was an important resource for researchers.
And here is an example of the questions asked. So, on the previous question about what the actual question asked was, you can find this documentation for most of the surveys. At the top, you can see some routing information. So if it's a question that is dependent on answers to previous questions, there will be information on who answered this question, which is quite important to understand in lots of ways. Then there is the variable name and the responses that you could put in here. We're looking at long COVID symptoms, and you can see there's a long list.
There's an "other" at the bottom, where you'd be prompted to write in what that was. So this came a bit later in that Understanding Society set of waves of COVID data, exploring long COVID, which I don't think we knew about at the beginning, but which became quite a significant area to understand as we moved on.
And then when we look at resources, we might see case studies that analyse key messages from the data set. So this should link to the Understanding Society publications. So if you end up citing this, then your publication could easily end up here. I'm not sure of our mechanics for getting that. And then related studies. And as it says here, in the other bit, you've got a link to the Understanding Society website. So these are the things behind the survey that are produced when a survey is deposited with us. Quite important stuff to understand.
Now, I talked before about how we make survey data nationally representative. And the way we do that is to weight the data. So for example, in the English Housing Survey, there is a selection base, which is based on the most recent information we have about the population. So in terms of housing, we think about different types of property, different tenures, different types of households, different geographical neighbourhoods. The survey is designed to gather a representative sample of those, but response rates may vary between those categories. And what weights do is allow an adjustment. So if a group is underrepresented, then the weighting will make the results for that group representative. So if you do simple unweighted counts, what you'd find is that the results might be incorrect. In all of the statistics packages, we simply have to say "use that weighting variable". And the details will be in the survey documentation.
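The effect of a weighting variable can be sketched with a toy example in Python. Everything here is hypothetical: the tenure categories, the weights and the function name `shares` are made up for illustration and do not come from any real survey, where the correct weight variable is named in the documentation.

```python
from collections import defaultdict

# Hypothetical respondents: (tenure category, survey weight).
# The single renter stands in for three households in the population,
# for example because renters responded at a lower rate.
records = [
    ("Owner", 1.0),
    ("Owner", 1.0),
    ("Owner", 1.0),
    ("Renter", 3.0),
]


def shares(rows, weighted):
    """Return each category's share, summing weights or counting records."""
    totals = defaultdict(float)
    for category, weight in rows:
        totals[category] += weight if weighted else 1.0
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


unweighted = shares(records, weighted=False)
weighted = shares(records, weighted=True)
print("Unweighted:", unweighted)
print("Weighted:", weighted)
```

In this toy data the unweighted count suggests a quarter of households rent, while the weighted estimate is a half. Statistical packages do the same thing internally once you tell them which weighting variable to use.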
So in the example of the English Housing Survey, there is a single individual weight, which is fairly easy to use. In the Labour Force Survey, there are both individual and household weights, and there's quite a lot of guidance about how to apply those. And we have guidance on what weighting is: some videos on what weights mean in social surveys and how to use them in Nesstar. I think our guidance for SPSS, Stata and R is probably a bit out of date for some. And we would cover it in workshops where we're exploring particular data sets. We also have some work going on developing data skills modules. So for example, there's one on the Crime Survey, and if you use that, it will explain what weights are and how to apply them. We are doing some similar work with Excel, but that will be coming along later on. It's really aimed more at teaching than analysis. Excel can be used, but it doesn't automatically understand weights, so you need to do something particular with it. You need to sum the weights rather than counting the number of records.
So pretty central to doing survey work is understanding how to transform the data you get out of it into something that lets you make claims about the population. And you would notice that in news reports when particular surveys are released. So I know that almost every time the English Housing Survey is released, there's a press release from Shelter which talks about the particular challenges facing groups within society.
Going back to the point about ethnicity, this is an area of concern, particularly around housing, which is one of my research interests. And there's quite a lot of work going on in different places to try and address that. But the sample sizes aren't big enough to have adequate representation of individual ethnic groups. So the categories used in them tend to be the white, black, Asian, mixed and other categories, which are not that useful when you're looking at inequality within those groups.
So the next thing to talk about is access conditions. We have three levels of access.
The first is open access. It's available with few restrictions. In terms of survey data, it's restricted to a small number of teaching data sets. Census aggregate data and geography are also available as open access.
Most of our survey data is safeguarded. It requires you to be registered with us and to sign an end user licence. You need to register and agree to the conditions. Most of the surveys are accessible once you've done that, but some may have additional conditions. So, for example, the English Housing Survey requires approval from the data owner, which is currently the ministry for levelling up, I think.
And then the final level is the secure level. I mentioned before that this has things like more specific geography, so there's a much greater risk of identification of individuals. So in order to access that, you need to become what's called an accredited researcher. We run training courses. If you look at the events page, there is secure training. And what that is, in effect, is a day's training and then a test to make sure you understand the principles. You then need to apply for use. And there is a test for the public good, which is OK. It would fit with most people's use of data. It's very unlikely you're going to be looking at secure data in order to investigate things that will harm the population. And a lot of business data fits within this level because of commercial confidentiality. So once your application has been processed, you then have a secure access agreement to use that data. And you would need to use it through a physical or virtual secure environment. Pre-pandemic, these used to be physical spaces. Increasingly, there is provision for both secure sites on a number of university campuses and also a virtual arrangement if you use a university-controlled laptop.
But the basic principle of that access is that you can't take anything in and you can't take anything out. So you have to put your code through a check. There is no external access once you're inside, and your outputs are checked before they're released to you to use. So there is a fairly onerous process to get at that data. I suppose you need to be fairly sure that that's what you want to do and that you've got the time to go through that process. I'm happy to answer questions about that again. But those are the broad access conditions we hold and release data under.
So what I'm going to do now is stop the share, go on to the website and help you navigate some of that. What you can see here is the front page, and at the top there is a login or register button. So if I log in, it remembers me, and it will take me through my university accreditation and then back to that front screen. Looking across the top bar, there's Find data, which is what we're going to focus on, the Learning Hub, and Training and events.
So first of all, let's have a quick look at the Learning Hub. If you're new to the UK Data Service, there are a number of different things here. There's a new to using data section, the data skills modules, and things about student access. I'm going to go into the survey data section, and what you can see here now is a set of information about survey data. There's software and tools, which tells you something about Stata, SPSS and R, so there's some guidance there. I know the SPSS guidance is relatively old, but the key thing here is to find out what your institution has if you haven't already got familiarity with software. There's Nesstar, which is the interactive tool, and you've got navigation bars alongside. There's information about weighting, as I said, and things about survey data. Somebody asked about geography. There are things there about the types of geographic information.
And you've also got a question bank resource. So if you're using survey data and then you want to investigate further for your own purposes, you might find the question banks useful, because they have the way that questions have been asked in surveys, which you can repeat. And then here there's a set of guidance on different data sets. So if you're using the Labour Force Survey and the Annual Population Survey, there's a video of the workshop introducing that. There's Understanding Society and its predecessor, the British Household Panel Survey, weighting in Understanding Society, etc. And then a link to the training events. And we also run conferences. So there's lots of information there about survey data, and that was through the Learning Hub.
You can see there are other things down there. If you're interested in teaching, then there's information there to help you think about how you teach with this kind of data. There's a link to teaching data sets as well, etc. So the Learning Hub is probably a good place to start, may well be a good reference point as you go along, and is being developed as we go through things.
The other area that might be of interest is training and events. So you've all heard about this event, but if you heard about it by word of mouth, you might find this useful. You can see here the events coming up. So today's event, and the event this afternoon, which has been cancelled because of the technical problems we've got. Information on copyright, census, which is a big focus, and anonymisation. As I said, the secure training, which is the Safe Researcher Training, etc. So a whole range of courses. We tend to operate a term-based programme, so this is the autumn programme. There will be a few more things coming into it. We advertise them as well, but the spring programme will be released later on and can provide useful information. So those are the support facilities.
I think the big thing here is how you find data. So you can browse, you can search and so on.
So I'm going to search for the Labour Force Survey. Now, I work on the help desk, so that's why this survey comes up top, because, as I said, it's the one we get most queries on. So if I go to the Labour Force Survey and go forward, what it will give me is all of the data sets linked to the Labour Force Survey. If I want to look at the most recent data, I can refine the date, and then what I've got is the quarterly Labour Force Survey data. I'm going to go into one of these in a minute, but I just want to see the others. So there's a longitudinal set, the Labour Force Survey household data sets, and then two-quarter longitudinal data sets, and there will be five-quarter longitudinal data sets as well.
So I'm going to go back up to the top of that. And if we look at the Labour Force Survey here, we've got the summary information about this data set, how to cite it, an abstract saying what it's about, and information about the coverage and methodology. So when was the fieldwork conducted, what spatial units were used, whether the observations are of individuals and households, how many cases there are, the way the data was collected, etc. So that's the front page of it. When we move into the documentation, I did warn you about this, but there is an awful lot of documentation in this particular survey. I'm going to pick another one afterwards to show you that they're not all quite like that. Now, looking at the case studies, there are a number of different studies that have been done that are reported here, for example the gender pay gap in Northern Ireland, etc. And then you've got the different studies that are linked to this, so all of the previous Labour Force Surveys.
So I'm going to go back to Find data again. And I'm going to look for British Social Attitudes, which we've also talked about. And let's have a look at the latest one, the 2020. So, as before, there's information, citation, copyright, abstract and coverage. There are nearly 4,000 cases.
So if we then look at the documentation, it's much more straightforward. There's a technical report, there's a user guide, there's a questionnaire, the variable list in PDF and Excel formats, citation information and the data dictionaries. So let's just have a quick look at the data dictionaries. That comes up as a zip file. I don't know, you'll have to let me know if this translates across when I open it. It's warning me about that. I'm not going to try and mess around with sharing screens any more. I'm just going to go back to where I was. OK, so I'll stop the share and go on to the documentation.
So this is the kind of dictionary you get. You get the name of the variable, the meaning of the variable, and the categories used for some of them. I'll go down to one that has meaningful categories. Here we've got marital status. So we have the value and the meaning of that value for marital status, and what happens with missing values. So there's a set of documentation there that enables you to see what's going on and follow it through.
Now, let me get back to that screen. So as I said, you can do quite complex searches here. Once you're familiar with the data set you're going to use, then you would click on it. So if you go back to that British Social Attitudes survey, there's a button on the right-hand side to access it. So having looked at the documentation and decided I want to use it, I then access the data and allocate it to a project. If I haven't got a project set up, I need to set one up. I can allocate data sets to it and then download them and use them for analysis purposes. And there's a record of everything that I have. So if I go into my account, I'll just show you an example. I've got six projects, and quite a lot of them are help desk ones. And these are some of the data sets I've been looking at for different purposes whilst I've been working here. The previous data sets I've used were under a different email address.
I wasn't at the University of Manchester then, so you can't see those, but I'd basically set up a project and then allocate data to it.
I think the last thing to show you on there is the help facility. So if you have problems registering or logging in, there's some guidance there. If you're not from an academic institution in the UK, you can register, but there is some additional information required. There's some stuff for new users, information about the Secure Lab, general information about downloading, and access to the Nesstar tool. And it goes on and on, doesn't it? At the bottom, you've got the contact for the help desk. So as I said, if you've got a query, you can go into here, complete a web form and send it through. Queries about data sets are likely to come to us, and for survey data sets, they come to our team.
There will be headline news as well. So we have the census about to be released, and that's one of our headline things. Browsing data, and training resources from events, are at the bottom.