All right. Good morning, everyone, and welcome to this online workshop for dissertation projects, which is an introduction to secondary analysis for qualitative and quantitative data. My name is Maureen Haker. I am a senior lecturer at the University of Suffolk, but I have also been working for over 10 years now with the UK Data Service on basically anything from ingesting data to digitizing data and reuse projects. And I'm here with Ali. Do you want to introduce yourself? Yeah, hi everyone. I'm Ali Bloom. I work with the UK Data Service user support and training team based at the University of Manchester. So creating user training resources, online videos you might have seen, content for our website, and delivering webinars like this to help you get to grips with the key data and how you can use it. So what we're planning to do today is to go over what secondary analysis is, and we'll look at some of the key methodological issues of reuse projects for qualitative and quantitative data. Before we get too deep into that, I'm going to give you a brief overview of the UK Data Service for those who have not yet had the pleasure of exploring the archive. And we've tried to pull in some case studies and specific examples so you can see the data in action. So I'll cover that first and then we'll move into my introduction to qualitative data, where I'll be talking about recontextualizing the data and sampling. And then I'll hand over to Ali, who's going to talk about quantitative data. He'll give you a bit more around selecting your data and understanding your data. And then we've got some signposting to further resources and some interactive exercises at the end to help get you started with secondary analysis, and we'll see if you have any questions from those exercises. Alright, so just bear with us for a moment while we swap screens, and I'll go ahead and get started. Alright. So hopefully you can all see my screen now. Excellent.
Okay, so before we get too far along, I just want to address a very simple point first: what is secondary analysis? In short, secondary analysis is a method which asks new questions of old data. It's analyzing data that you've not collected yourself. Usually researchers collect far more data than they actually need to answer their own research questions. You might think of national surveys, for example, which collect a lot of data on a representative national sample, or qualitative studies, which contain interviews that last at least one to two hours, sometimes more. Those data sets can answer a lot of different questions, and they can be analyzed using a lot of different techniques. So secondary analysis makes use of that data, and it basically does just that: it reuses data that's been collected by someone else. But there is a little bit of complicated nuance here around terminology. You may have heard other terms to describe the method: there's secondary data analysis, there's secondary analysis, and there's also reuse projects, and they all refer to the exact same method. So don't be too confused by that. There is an ongoing debate about what to call the method. In 2007, Libby Bishop wrote about the primary-secondary dualism, and she basically makes the argument that there's a perceived hierarchy that privileges going out and collecting data yourself, and that using terms like secondary analysis or secondary data reinforces that hierarchy. But actually, primary and secondary analysis are a lot more similar than different, if you fully consider the key methodological issues of secondary analysis as a method. You'll find there's an increasing use of the term data reuse, and that's the term that I'll probably use throughout the workshop. It takes into account the huge range of ways that data can be used and reused, and it also doesn't imply that any of those projects are secondary to the initial use of the data.
So you can use whatever term comes to mind first, but do be aware there are a few different ways of describing this process of reusing data. Okay, so now you know what secondary analysis is, but where would you find data that's already been collected? This is where the UK Data Service, which holds the largest collection of social science data in the UK, comes in. It's a comprehensive resource that is funded by the ESRC. Its main job is to be a single point of access to a wide range of secondary social science data. The purpose, then, is the collection, ingest and processing of that data, and the further dissemination of the data for other people to use. In addition to that core data archive infrastructure, we also have a service layer, which provides extensive support, training and guidance; that's the part that puts together these workshops. Who is it for? We like to think it's for anyone who has an interest in data. Traditionally, the main audience, and the people who probably both deposit and use the data the most, tend to be academic researchers and students. But there are a lot of other people who are well represented too, which includes government analysts, charities, foundations, businesses, research centers and think tanks. Given the importance of data and how it's used and disseminated, we're trying to reach out to and support a wide range of communities. What kind of data do we hold? The majority of the data, judging by the number of collections, is certainly quantitative data. I don't even know what the current count is, but it's at least over 8,000 collections, and we add collections daily, and at least about 7,000 of those are quantitative collections. There's a lot of variety in that data as well: there's survey data, both cross-sectional and longitudinal; there's aggregate statistics; and there's domestic and international macro data.
There's micro data, as well as a good collection of qualitative and mixed-methods data. Where does it come from? Again, that varies depending on the data type, and some of the sources that you see listed here, including agencies and statistical time series, are clearly the main sources for our quantitative data. Most of the qualitative and mixed-methods data comes through individual academics. So an academic would get a research grant, conduct the research, and then deposit the data that they've produced at the end of their project. And of course we do also hold some originally paper-based public records and historical sources, which includes things like the census. Where can you find information about all this? We have a website, ukdataservice.ac.uk, and that holds a lot of information. From there you can find our catalog, and we also have hundreds of pages which discuss methodological issues like gaining consent, anonymizing data and storing data. There are also some student-specific tutorials like our data skills modules, there are exercises and workbooks that are based on collections we hold, and there's a help page if you have specific questions about how to use the website. Okay, but getting back to dissertation projects: what kind of research projects can you do reusing data from an archive? Well, we really need to go back to the beginning to answer this and think about the research process. You're probably all familiar with this model. It starts with some kind of topic or general direction for your research. You do a bit of background research into the literature already on your topic, and from there hopefully you're inspired to ask a research question which builds on that body of research. And once you have your research question, you then have to decide the best way to answer that question and design your project.
Then you have to settle on your method, you then collect the data, analyze it, and then do the write-up, and that's what you submit for your dissertation. You might have a few extra steps there, or you might swap a couple of things in the process depending on your theoretical foundation, but generally speaking, this is normally how we think of the research process. When you're doing a project based on secondary analysis, this will look a little bit different. This model clearly shows how the research question is built from your chosen topic area, your preliminary research, and possibly the literature that you find. With secondary analysis, however, the research question is derived directly from the data. So you start with the topic that you're interested in, but instead of looking for literature, you look for data, and you start evaluating collections. And when you find a collection that intrigues you, you then ask a research question of that data. From there, you would then find out what literature exists on the question. And you wouldn't need to collect the data, you just need to access it, which is of course one of the key advantages of secondary analysis. The data is already collected; you just need to get your hands on it, either by downloading it from the catalog page, or you may have to actually go to the archive if it's only available in paper form. Increasingly that's less the case, as we tend to digitize things. But once you have it, you can then analyze it and write up your dissertation. So the key point I'm making here around reusing data for a dissertation project is about where your research question comes from. It would take a long time and a lot of inside knowledge about data within archives to be able to come up with a research question first and then search for the perfect data to answer it.
You'd be searching for the right data for a really long time unless, as I said before, you already have a really good working knowledge of the collections that are held by the archive. For a dissertation project, when your time and your resources are limited, you'll want to first look and see what data is out there, and then develop a research question which gives a new take on that data. You can of course spend the time looking for the right data, data that interests you. But for a dissertation project, you may want to just see what data exists on your general topic area before nailing down a specific research question. That being said, it's probably important to have some kind of idea about what kind of project you want to do. And while you might be exploring data without a specific question in mind, you may want to think about what kind of research design your project will follow. There are four types of reuse project that lend themselves well to dissertation projects: reanalysis, replication studies, comparative studies, and re-studies. Reanalysis is probably the one that comes to mind when thinking about secondary analysis. This involves thinking about the wide range of approaches you can take in the analysis of a data set. It usually means asking a different kind of research question from what the original researchers were trying to answer. So for example, Clive Seale and Jonathan Charteris-Black did a study using comparative keyword analysis of illness narratives. The original illness narratives had been looked at exclusively for health research; the interviews were meant to explore how diagnoses were made. When Seale and Charteris-Black came along to do the comparative keyword analysis, however, they were much more interested in an analysis of the discussions between patients and doctors rather than the actual health issues that came up in the interviews. So the question can be very different in that kind of way.
Or sometimes a question can be on a similar topic to the original research but have a slightly different focus. For example, Joanna Bornat looked at gerontology as a topic, and she found a couple of different data sets looking specifically at gerontology. However, Bornat's research question was on racism, which wasn't the focus of the original work, but the data set was rich enough to allow her to explore that theme within the existing data. If you want to use the exact same analysis strategy, that would be a replication study. Right now there's a real concern about the reproducibility of research, and replication studies can help reveal some of the messiness that's involved in working through data. One of the most famous, or rather infamous, examples of replication is from Thomas Herndon, a postgraduate student at the University of Massachusetts. He was assigned an assessment to replicate results from a published study, and he chose Reinhart and Rogoff's 2010 paper, Growth in a Time of Debt. Basically, the paper claimed to identify the proportion of GDP that your national debt can reach before you see negative economic growth. So Thomas Herndon pulled the OECD data that they cited within their study, and he re-ran the analysis as the paper had laid it out. But he got a completely different answer. The published paper said that debt cannot exceed 90% of GDP, but he calculated that debt can actually exceed your GDP, and even then there's only quite a minimal impact on economic growth. After contacting the original investigators, he found a flaw in their data set where they had miscopied some cells that they had downloaded from the OECD. The full story was published in 2013 in the New Yorker. A replication study won't always turn up those kinds of flaws in the original study, hopefully.
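If you want a feel for what a replication like Herndon's involves, here's a minimal sketch. To be clear, the figures, column names and debt bands below are all invented for illustration; this is not the actual Reinhart-Rogoff data or code, just the general shape of that kind of re-analysis.

```python
import pandas as pd

# Illustrative data only: country-year observations with a debt/GDP ratio
# and a real GDP growth rate (invented numbers, not the actual dataset).
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "debt_to_gdp": [35.0, 95.0, 60.0, 120.0, 85.0, 92.0],
    "growth": [3.1, 1.8, 2.4, 1.2, 2.0, 1.5],
})

# Assign each observation to a debt band, mirroring the kind of banded
# summary the original paper reported.
df["debt_band"] = pd.cut(
    df["debt_to_gdp"],
    bins=[0, 30, 60, 90, float("inf")],
    labels=["0-30%", "30-60%", "60-90%", "90%+"],
)

# Mean growth per band. A replication re-runs exactly this kind of summary;
# accidentally dropping rows (as with the miscopied spreadsheet cells)
# would silently change these averages.
summary = df.groupby("debt_band", observed=True)["growth"].mean()
print(summary)
```

The point of the exercise is that the whole result hangs on which rows make it into each band's average, which is why a copy-paste error in a spreadsheet could flip the paper's headline conclusion.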
But nonetheless, it's a study design that's worth considering. It helps you develop an appreciation for the research process, and you could even develop a project whereby you re-run a series of studies on the same topic, or you explore a complicated data set that's got missing data, variables that you need to transform, and so on. You can also do comparative work. So you might be looking at international comparisons between two countries, or comparing social subgroups of the population based on a shared social characteristic. Our key data pages for quantitative data sets outline some of those large national surveys that are held at the archive, and any of those would allow you to do some comparative work without having to collect two sets of data. You could compare samples across time, across geographic place, across gender or ethnicity; those characteristics are usually collected as standard for the larger surveys. And the final type of reuse is going to be exemplified by a case study that I'll go through with you. This case is a re-study, and that's where you replicate the methods of a study for the purpose of comparison. So it does a bit of secondary analysis on the original data, but it also allows you scope to collect a little bit of your own data. The example of this kind of reuse project is from the School Leavers Study collection. The original study was conducted by Ray Pahl in the late 70s as part of a much wider community study on the Isle of Sheppey. As part of that project, Pahl asked teachers to set a particular essay just before students were due to leave school, prompting them to imagine that they were reaching the end of their life and something made them think back to the time that they left school. They were then asked to write a short essay about what happened in their life over the next 30 to 40 years.
In 2009, Graham Crow and Dawn Lyon, and that's Graham Crow there with Ray Pahl, decided to reanalyze the data set and focus solely on student aspirations. Using the very same methodology, they conducted a re-study of school leavers on the Isle of Sheppey in 2009-2010. The prompt that was supplied to students in the later data collection was nearly the same: imagine that you're at the end of your life and reflect back on what you've done since leaving school. They then transcribed those essays and compared the themes from the new set of essays to the set collected in the 70s by Ray Pahl. You can see the wording of the prompt here, and there's a small snippet as well of one of the essays. The findings are fascinating. They show the difference in young people's aspirations after one generation, 40 years, has passed. But how exactly were they different? Well, in 1978, students expected much more grounded and arguably mundane sorts of jobs. Career progression was gradual and followed on from hard work, and there was sometimes talk of periods of unemployment, or even, quite morbidly, their own early death or the early death of someone they loved. You can see a few examples in the left column of some of those quotations from the essays, such as the one at the bottom: "I longed for something exciting and challenging, but yet again I had to settle for second best. And I began working in a large clothes factory." In 2010, however, students imagined well-paid and instantaneous jobs filled with choice, but also some uncertainty. Crow's research team also noted a clear influence of celebrity culture within those essays. So for example, you have the quote at the bottom from a girl who writes: "In my future, I want to become either a dance teacher, a hairdresser or a professional show-jumper horse rider. If I do become a dancer, my dream would be to dance for Beyonce or someone really famous."
The study was a much larger one than you'd take on for a dissertation project, and that shows in the research outputs. As part of that initiative they published the Living and Working on Sheppey website, which helps to create a shared history and memory among the community of what living on the Isle of Sheppey means. And while this would be an ambitious project, to say the least, for a dissertation, it's nevertheless a good example of how you can combine a bit of data collection and data reuse into one project. For those of you who are doing PhD work, this is certainly something you could consider for your projects; for others, you can design a much more feasible study with a smaller sample and different kinds of outputs. So hopefully you're now brimming with ideas of what you want to work on and what you might want to find in the archives. Since you're not collecting data yourself, you'll find reuse projects have comparatively few ethical considerations, and hopefully you won't hit too many snags with ethical review boards. However, that doesn't mean there aren't any ethical considerations. There are two key points that I want to make before diving into qualitative and quantitative data. The first of these starts with the access point: how do you get permission to use the data? If you are reusing data from an established archive like the UK Data Service, we've taken a lot of the pain out of negotiating licensing issues with the person who collected the data. This usually means that you need to sign what's called the End User License, and that is a legal document which states that you're going to do two really important things. One is not sharing the data onward, including with your supervisor. So if you need help with your analysis and your supervisor needs to see the data, then he or she will need to register and download the data themselves. The End User License stipulates that you cannot under any circumstances share the data or your login with anybody.
The second is that all of the data that we hold is anonymized, which is again likely to be the case if you're reusing data from an archive. However, just because something's been anonymized doesn't mean it's completely impossible to figure out the identities of participants. You can go to the National Centre for Research Methods, which has video tutorials about anonymization theory, and these in short make the argument that no anonymization strategy will ever be 100% effective. So consequently, should you inadvertently uncover the identity of any of the participants, the End User License stipulates that you won't reveal that identity to anyone. It's extremely unlikely that would happen, but this covers all bases in the event it does. So those are the key issues to recognize when signing our EUL, or End User License. And once you have sorted out access, the second point that I want to make is that you then need to ensure you cite the data. In short, citing archived data helps data creators track the impact of their study. It also supports reproducibility and makes it easier to find the data that you used for your project. The issue is so important that the UK Data Service has run campaigns and has information on the website which goes into further detail about data citation and helps explain why it's a really important ethical issue. With the UK Data Service, we make this easy by supplying you with the citation on our catalog pages. So when you're on the catalog page, you can go to the citation and copyright section, and you'll see that there is a citation there that you can literally just copy and paste into your reference list. You can even select the citation style and then copy and paste that into your work. So you've got access, you've sorted out citation; now comes actually doing the secondary analysis.
So first I'm going to talk through qualitative data, covering a couple of key issues in getting started with qualitative data, and then I'll pass over to Ali, who's going to talk about quantitative data. I'm going to cover orienting yourself to the collection and recontextualizing the data, and then I'm just going to make a couple of very small points about sampling. When you first download a qualitative data set, you'll get a zipped folder which looks a bit like this: some folders that are stuffed with files. Qualitative data is held as RTFs; that's the archive standard. So to find the data you need to go to the RTF folder, and an RTF file is just a word processing document. This folder, when opened, looks a bit like this. So here you go, all your data nicely organized, and clicking on one of those files would open up a file which looks a bit like this. Here's a snippet of one of the school leavers study essays, or rather what one looks like in its entirety. The RTF folder has over 100 of those sorts of files from that collection. But the files don't just have to be essays or interview transcripts. For example, you might find PDFs of handwritten notes like in the upper left corner here, or you might find ethnographic notes like those on the right. You might also have images or videos like those in the lower left-hand corner. Most likely you'll end up opening an interview transcript, like the one that's seen here. It should have clear turn-taking and speaker tags so you know who is talking. There are a lot of different data types available, so make sure you have a good look through the collection first and actually see what kind of data is there. So you'll need to orient yourself to the project. And I think the main point to make here is not to underestimate the amount of time that it takes to get acquainted with a data set. There may be multiple levels of context to get through in order to really understand the data.
What I mean by that is you may have more than just the data that was collected at the time of the interview. You also need to consider the basic social characteristics of the participants, the historical time period in which the data was collected, or perhaps where the data was collected. So really, the idea is that you need to understand the data set as a whole in order to really get at the root of what the data can convey. Every collection archived at the UK Data Service does have some documentation provided with the data, which is a really useful starting point for that recontextualization process. It often contains more information about the methodology, such as the interview schedule or call for participants that was used, and sometimes it includes segments from publications arising from the original study, or even funding applications. For qualitative data sets, this documentation is called the user guide. So here we have an example of a user guide. This one happens to have an interview guide for interviewers as well as the consent form and a sample profile, and it just gives further background information to help you understand how the data was collected. But what if you want to know more about the participants themselves? Well, every qualitative study also has what's called a data listing, and here is an example of one of these. It's basically an at-a-glance table which gives you a brief overview of all of the data that is in the collection. Each row usually represents a piece of data or a participant, and each column lists some kind of characteristic or attribute of that interview. So it's a quick way of getting to know who took part in the study. In addition to the context of the data, you may also have to think about your sample.
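Because a data listing is just a table, it can also do practical work for you when you come to sampling. Here's a minimal sketch of drawing a criteria-based subsample from a listing; the column names, criteria and sample size are invented for illustration, not taken from any real collection.

```python
import pandas as pd

# Illustrative stand-in for a collection's data listing: one row per
# interview, columns for participant characteristics (invented here).
listing = pd.DataFrame({
    "interview_id": [f"int{i:03d}" for i in range(1, 41)],
    "gender": ["female", "male"] * 20,
    "year_of_birth": [1890 + (i % 20) for i in range(40)],
})

# Say our research question only concerns women born before 1900.
eligible = listing[(listing["gender"] == "female") &
                   (listing["year_of_birth"] < 1900)]

# Draw a fixed-size random subsample we can realistically work through;
# a fixed seed keeps the selection reproducible for the write-up.
sample = eligible.sample(n=8, random_state=42)
print(sample["interview_id"].tolist())
```

The same idea works with the listing exported from the documentation: filter on the characteristics that matter to your question, then sample down to a number of interviews you can actually read.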
So for example, if the data set is large, you may need to take a subsample. Qualitative collections tend to be smaller studies anyway, but many of the archived data sets come from funded studies, and they can collect a considerable amount of data. If it's a smaller dissertation project, you'll want to be realistic and consider the amount of data you can actually work through. For example, the Edwardians collection, which was put together by Paul Thompson and is widely considered to be the first oral history of Britain, contains 453 interviews of 80-plus pages each. Most dissertation projects probably have an expectation of maybe six to ten interviews, so you would need a clear sampling strategy to help you choose which interviews to look at. Or you might be interested in a particular subgroup of the population, so again, you'll want to think about what criteria you're looking for. You may also want to combine data from different collections so that they complement each other. Now, remember, it would take a lot more time to sift through and find pieces of different data sets to pull together, but this is another possibility. If you feel that all of the data speaks to the same topic and research question, and you've done the work on recontextualizing to ensure that the interviews work together, that they complement each other, then certainly combining collections is another thing you can do. So that's it for qualitative data. I'm now going to hand over to Ali to talk about the key methodological challenges of quantitative data. Thanks, Maureen. So yeah, I'm just going to give a quick rundown of the key things you need to think about if you're doing quantitative secondary analysis. I'm going to cover the key considerations in two main areas: first of all, when you are selecting your data, and then when you are getting to grips with and understanding it. So, selecting your data.
Some of you might already know which topic area you want to explore for your dissertation, but if not, here's an idea of some of the data available. The UK Data Service holds quantitative data on a variety of topics, but just to give you an idea, there's data available that could allow you to look at the environment, workforce patterns, health care, family spending, attitudes to the police and criminal justice system, how people spent their time during the COVID-19 pandemic, people's attitudes, and their political opinions. And these are just some examples. So we're just going to do a quick Menti poll; I'm going to pop that up on the screen now. I want to know which topic you are interested in. So if you can navigate to Mentimeter and enter the code, which is in the chat and also on the screen there, just pop in any ideas you might have for topics. Right. Yeah, very interesting. Yeah, we do have data on those. We've got movement data, so migration, the census flow data. Wellbeing, yeah, loads of data on wellbeing. Education, employment, COVID and mental health, there are definitely some data sets on that. Employment, political opinions. Yeah, lots of our data sets contain information on socioeconomic status. Yeah, data on adolescent mental health. I'm seeing loads of really, really interesting topics here. So I'll just give everyone one more minute. You can see we have a lot of interest in migration and wellbeing. Belonging, modern slavery, learning in the classroom. Great. So some really interesting topics in there; keep those in mind as we go through the rest of the presentation. So now you all have a bit of an idea, or perhaps just an initial thought, about your topic. You might have already done some of that background research and reading that Maureen briefly mentioned at the start, or maybe you've taken the other approach, where you've decided to start looking for your data first and seeing what's out there.
But regardless, once you have a general idea of your topic area and you're starting to think about the data that you might use to explore it, a good place to start is by thinking about what you are trying to measure. This is key for a quantitative reuse project, because you need to think about the key concepts you want to measure and relate these to variables within a dataset. So for example, let's say we were interested in looking at the relationship between fear of crime and age. Our key concepts here are fear of crime and age, so we need to find some data which has variables that measure these concepts and allow us to formulate and/or answer a research question about them. Now, as Maureen said earlier, it may not be possible to find the perfect dataset, and you could spend a lot of time searching. So you might find it easier to start from data on a general topic and then derive your key concepts and questions from the existing or available variables. If you already have a question in mind, that's absolutely fine too; you might just have to be flexible and revise it based on what data is available. If you're looking for data on your key topics, there are a few key places you can start from at the UK Data Service. You can type keywords into the data catalog, and we'll do a little activity on that later. You can use our variable and question bank, which allows you to search for particular variables and tells you the datasets that they are contained in. So for example, if we search fear of crime, it should bring up the crime survey, because that's where that variable is contained. But please be aware that not all of the datasets are covered by the variable and question bank. Then we have the HASSET thesaurus, which lets you search key social science concepts, and again, it will tell you which datasets these topics are linked to. And we also have the theme pages, which allow you to search for datasets on a particular theme.
So things like crime, health, environment, et cetera. All of the links to these can be found in the find data section of the website, and we'll have a go at this in the practical too. Once you've found a dataset that you think might be suitable, you'll need to consult the catalog page. This will give you an overview of the key topics, background, and methodology, and you can also access the documentation from here, including any user guides, technical reports, lists of the variables included in the dataset, and any notes added by the data producers outlining changes that may have been made since the data was originally deposited. Quantitative documentation does look slightly different to the qualitative documentation that Maureen flagged earlier. For example, you may have these little notes about things like the data being reweighted, changes to variables, maybe a calculation error, et cetera; the documentation will tell you all about that. The most important thing, at least when you're planning your project, is to find the list of variables, which again can be found in the documentation. This will usually be in a user guide, a variable list, or the codebook or data dictionary, which most, if not all, quantitative datasets have. Here you can find information on what the variables measure and who they apply to, which I'll discuss a bit more later on. So back to our example: if we wanted data on crime, we might look at the Crime Survey for England and Wales. It's a large, important survey, and looking at the title, I think it might cover our key topics. It's a really important source of information about crime, with crime statistics that are independent from the records held by the police. It is a repeated cross-sectional survey; it's conducted every year, and it surveys 35,000 individuals aged 16-plus and a smaller sample of 3,000 individuals aged 10 to 15.
So back to our variables. Having a look in the codebook for the Crime Survey for England and Wales, I can see that it has two variables that might measure our key concepts. These are quality of life, which asks how much your own quality of life is affected by fear of crime on a scale of 1 to 10, and age, which measures the respondent's age. So we've found these variables, but now we need to think carefully about whether they're suitable, and an important step is to think about what they measure in a critical way. Does our quality of life variable actually measure fear of crime, or does it measure how much fear of crime affects an individual's life? With all of your variables, this is something you'll need to think about and consider: what is the question really asking? To understand this, you can again look at the documentation and have a look at the original question. As well as considering your variables and concepts, you also might want to consider the kind of analysis you want to do, or can do, with the data. So for example, if you're interested in looking at individuals at a particular time point, you might want to use a cross-sectional survey. If you want to look at individuals at multiple time points, you could use a repeated cross-sectional survey. Or if you want to look at the same individuals over time, longitudinal data might be suitable. If you're interested in small geographic areas, there's the census data and its flow data, which look at things like migration around the country and travel to work between different areas. And if you want to compare countries, there's the international time series data that Maureen mentioned earlier. It's also important to think about your population, that is, the group you want to measure. For example, this might be the population of the world, the UK, or perhaps a particular city or local authority area. And as well as this, you also need to think about your unit of analysis.
So are you interested in individual people, or are you interested in households? This will affect the data you use, as some datasets are only available for particular geographies or for certain units. Finally, it's important to remember that this process isn't linear. You might need to go back and forth and realign your question with the available data. You might need to compromise if the perfect dataset isn't available, or choose different variables and refine your research question. This is all just part of doing secondary analysis. So going back to our example, you can see that I've reworded the question here slightly to fit what our variables measure. Once you've chosen your dataset, there are a few final things you need to consider in terms of understanding your data. As we said, the documentation usually contains information on the variables, but it should also have information on the questionnaire used to collect the data. As I said earlier, it's very important to understand the questionnaire, and something that's particularly important is the routing: that is, who was asked which questions. This is because many questionnaires use something called computer-assisted interviewing, which sends respondents through the questionnaire by different routes depending on their previous answers. So for example, if you answer yes to "do you drink alcohol?", you would then be sent to a question asking how many units per week, whereas if you answer no, you wouldn't be asked the follow-up question on how many units you drink. Therefore, many questions in a survey may only apply to some of the sample, so again, please check the documentation to see who was asked the questions behind the variables you're interested in. Here is an example of a variable called FLEX10 from the Labour Force Survey, which relates to special working arrangements, and this is what it might look like in the documentation.
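Routing also shows up in the data itself: respondents who were never asked a question usually carry a special "not asked" or "not applicable" code that has to be treated as missing before you analyse anything. A minimal sketch of the alcohol example, with made-up column names and a made-up code of -9 (real surveys document their own codes):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "drinks":     ["yes", "no", "yes", "no", "yes"],
    # -9 here is an illustrative "not asked" code for routed-out respondents.
    "units_week": [10,    -9,   4,     -9,   16],
})

# Recode "not asked" to missing so routed-out respondents don't drag
# the average down as if they drank minus nine units a week.
df["units_week"] = df["units_week"].replace(-9, np.nan)

print(df["units_week"].mean())  # mean over those actually asked
```

The point is simply that the mean is taken over the people the question applied to, which is exactly what the routing information in the documentation tells you.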
You can see the exact wording of the question along with the range of answers and how they've been coded. Underneath this, it says that the question is asked if the respondent has said they are in work; again, that's the text underneath where it says "applies if in work during the reference week". This gives information on how the survey identified who was in work from their responses to other questions or variables. You will find that the documentation looks a bit different across different datasets, but this gives you a general idea of what to expect. You can also find information on how data has been processed after collection. So derived variables are variables created from the raw data, and here's an example of this. That FLEX10 variable, which we talked about on the last slide, has been used to derive a variable called FLEXW7. This shows the variable has been derived by taking into account the responses to the original variable: those who responded seven on the FLEX10 variable, meaning they had a zero-hours contract, are now coded as one on the new FLEXW7 variable. This new derived variable therefore indicates whether someone has a zero-hours contract or not. It can be a bit complicated to get your head around, so don't worry if this doesn't make sense right now. Once you get used to it, you'll find that you'll be able to read these flowcharts and get to grips with it, and our helpdesk is always there to help if you're struggling. Again, not all documentation is the same, so not all documentation contains these diagrams; in some surveys it will just be the SPSS or Stata syntax, which shows you how the variables have been derived. But again, I just want to stress that it's really important that you understand the origin of any of the variables you're using. You also need to think about sampling. Surveys and similar quantitative data sources are almost always based on samples.
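The derivation just described is, in code terms, a simple recode. Here is a hypothetical sketch loosely modelled on it: code 7 on the original variable becomes 1 on the derived flag, everything else 0. The names and codes are illustrative, not the exact Labour Force Survey coding frame, which you should always take from the documentation or the deposited syntax.

```python
import pandas as pd

# Illustrative raw responses to the original flexible-working question.
df = pd.DataFrame({"flex10": [1, 7, 3, 7, 2]})

# Derive a 0/1 indicator: 1 if the respondent reported a zero-hours
# contract (code 7 in this made-up scheme), 0 otherwise.
df["flexw7"] = (df["flex10"] == 7).astype(int)

print(df["flexw7"].tolist())  # [0, 1, 0, 1, 0]
```

Reading the deposited SPSS or Stata syntax is essentially reading lines like this, so it is worth tracing at least your key derived variables back to the raw questions this way.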
And one important question you need to ask about your data is: is the sample representative? You need to know who is included in the sample. Is it all adults? Is it only those at private addresses, which might miss out individuals in places like care homes or hospitals? You also need to think about what the response rate was, and whether there is any information about differential response across the population, which might tell you about any potential bias. You will also need to find out whether you need to use a survey weight in order to make the data representative. Most surveys will require you to do this, but again, all the information for this will be in the documentation, and more complex surveys tend to have a specific weighting guide section. A second and final question you need to ask is whether you have enough cases to make a precise estimate. For example, the Crime Survey for England and Wales has a large sample size, which should allow for precise estimates. However, if you are interested in a particular subpopulation, or you're using a smaller sample, there may be insufficient cases in the sample to make precise estimates. I've just given you a general overview of sampling considerations, but we do have more resources on the UK Data Service website about this; for more information, you can see our guide to survey weights and complex sampling. So just to summarize: with quantitative data, you need to think about your key concepts, what you're trying to measure and how these relate to variables in a dataset; you need to check the catalog and documentation to help you understand your data; and you need to make sure that you are considering sampling. I'm going to move on now to highlight some of our key dissertation resources. These are our dissertation pages on the UK Data Service website, and they can be found through the student pages. So if you go to the homepage, then the learning hub, the third little box along at the top is specifically for students.
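Going back to survey weights for a moment, a tiny sketch shows why they matter: a weighted estimate can differ noticeably from the raw one when some groups are over- or under-represented in the sample. The values and weights below are fabricated for illustration; a real analysis would use the weight variable named in the survey's weighting guide.

```python
import numpy as np

# 1 = respondent worried about crime, 0 = not (illustrative values).
values  = np.array([1.0, 0.0, 1.0, 1.0])
# Illustrative design/non-response weights: the second respondent
# represents an under-sampled group, so gets a larger weight.
weights = np.array([0.5, 2.0, 1.0, 0.5])

unweighted = values.mean()                      # simple sample proportion
weighted   = np.average(values, weights=weights)  # population estimate
print(unweighted, weighted)
```

Here the unweighted proportion is 0.75 but the weighted one is 0.5, so skipping the weight would overstate worry about crime in this toy population.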
And if you click on that, it'll take you to the student pages, which will give you information on what data you can access as a student, how you can find and access data, our UK Data Service Dissertation Award, and further resources and training. You can keep up to date with all of our UK Data Service dissertation content by looking at the hashtag UKDSDissertations on Twitter. I suppose, actually, I probably should have put the new logo and said X. And yeah, so that's all of the key resources. I think now we're going to move on to the practical part of today's workshop, so I'm just going to retrieve our links. What we're going to do is go through the worksheets; I've got three different tasks for you to do. I'll pop them in the chat, and then we'll come back together and go through the answers in Mentimeter. The first of these worksheets is a task on finding data. I'll put it in the chat and I'll also share it on the screen in just a sec. So if you can, go to the UK Data Service homepage and have a go at browsing the data catalog, having a bit of an explore, and then searching for data either on one of the topics that I've popped up there or on anything that might be relevant for your own research. If you have any questions, do just add them to the Q&A box, or if you're struggling, please do pop it in the chat. We'll have about 10 minutes for that now. Okay, so for those of you who have finished, I've popped up the Mentimeter code again if you want to join, and then I'd just like some quick feedback: if you'd like, enter what dataset you found, whether this was one of the ones on the task list or anything relevant for your research. We've still got a couple more minutes, so I'll just give everyone time to finish exploring, and then if you want to pop any datasets you've found into the Mentimeter, or if you don't have access to Mentimeter, you can pop them in the chat as well.
I'm going to feed back in Mentimeter on this task now. I can see we've got a few questions in the chat as well; we will be answering those after the final task too. So if you're happy to, navigate back to the code. The first question we had for the quantitative survey is: what is the observation unit? Is it individuals, households and families, or both? I'll give everyone a second to rejoin the Mentimeter and answer. Okay, so yes, the correct answer is both, and you can see here this can be found in the coverage and methodology section on the catalog page. Next: what country does this data cover? Is it England, the United Kingdom, or England and Wales? Yes, the correct answer is the United Kingdom, and again, you might have spotted it on the last slide; that's in the coverage and methodology section. So what topic examples did you find? I accidentally revealed the answer there. So what examples of topics did you find? Loneliness, health and well-being, poverty, COVID-19, homeschooling, volunteering, education. Great. You can see here there are lots of different topics: homeschooling, the background socio-demographic variables, social contact, neighborhood cohesion. And again, these topics can be found in the user guide if you scroll down to section 2.1 on the topics. Okay, now moving on to the qualitative data. What country do these data cover? So we've got some people saying UK, some people saying England, United Kingdom, England. Ooh, some varying answers. The answer is England, and again, this can be found in the coverage and methodology section. And what kind of data are these? Are they audio recordings, numerical data, or text? Yes, it seems pretty unanimous there: text is the correct answer, and again, that can be found in the coverage and methodology section. So we've got one final task, and then we'll answer some questions. This final task is exploring a download bundle.
So, exploring one of the bundles that you would download for the qualitative data. I'll add that to the chat, and I'll add the link as well, and then we'll have about five minutes on this one. There's no Mentimeter feedback for this one, because we're a little bit short of time and I want to make sure we have time to answer people's questions, so we'll have five or six minutes on it. And you'll have the link to the worksheet, so if you want to explore it more in your own time, please feel free. Here are some links if you want to get connected or get in contact with us: links to our Twitter, and how you can get help if you're struggling with any of this. And just to say thank you all very much for attending; we hope that you found this useful. Please do get in touch with us if you have any further questions. So hopefully this has been useful, and good luck with your dissertations.