Hello, everyone. Thank you for joining us. We're just going to wait a minute or two for everyone to filter into the virtual room. I think the numbers are starting to slow down now, so we'll go ahead and get started. Welcome, everyone, to this online workshop on doing your dissertation projects reusing data. We're going to be introducing secondary analysis for qualitative and quantitative data. My name is Maureen Haker. I've worked with the UK Data Service for about nine years now on everything from digitizing data to reuse projects. I also work at the University of Suffolk as a lecturer in their childhood and education department. Allie, did you want to introduce yourself?
Hi, everyone. I'm Allie Bloom. I work as part of the UK Data Service's user support and training team, based at the University of Manchester. I do a lot of work creating resources, including a lot of student resources.
So if this is your first online workshop with us, rest assured that you're all muted. And again, if you have any questions you'd like to ask as we go along, please feel free to pop those into the Q&A option. We're aiming to talk for around 40 minutes. And then we've got some exercises to help you get started, potentially with a research project, but mainly with the process of secondary analysis. Allie and I are going to be monitoring the questions throughout this workshop and trying to answer some of them as we go along. But we should also have some time at the end to talk through some frequently asked questions. To start us off, I'm just going to go through what we are doing today. What we're planning on is to go over what secondary analysis actually is. And I'll look closer at some of the key methodological issues of reuse projects for qualitative data. And then I'm going to hand over to Allie, who's going to go over quantitative data.
But before we get any deeper into that, I'm going to give you a very brief overview of the UK Data Service for those of you who've not had the pleasure yet of exploring the archive. I've tried to pull in a few case studies too, so you can actually see some of the data in action. And we're going to end with signposting you to some further resources which can help you if you're planning to reuse data for your dissertation project. This is an introductory workshop. So we've assumed that most of you will probably not have used the archive before, and that you may have had only introductory modules to research methods. Before we get too far along, I want to address a simple point first. What is secondary analysis? In short, secondary analysis is a method which asks new questions of old data. It's analyzing data that you haven't collected yourself. Usually researchers collect far more data than they actually need to answer their own research questions. So think of those big national surveys which collect a lot of data on a representative sample, or qualitative studies which have interviews that can last one or two hours or sometimes even more. Those data sets can answer a lot more research questions or be analyzed using different techniques. So secondary analysis basically makes use of that data and does just that: it reuses the data that's been collected by someone else. But there is a complicated nuance here around terminology. You may have heard other terms to describe this method. There's secondary data analysis, there's reuse projects. They all refer to basically the exact same method, so don't get too confused by that. But there is an ongoing debate about what to call this method. In 2007, Libby Bishop wrote about the primary/secondary dualism within research.
She was basically making the argument that there may be a privileging of methods where you go out to collect data yourself, and that using terms like secondary analysis or secondary data reinforces that hierarchy. But actually primary and secondary analysis are a lot more similar than different if you fully consider the key methodological issues of secondary analysis as a method. So consequently, you'll find an increasing use of a term like data reuse. And that's a term that I'll use throughout the workshop. It takes into account the huge range of ways that data can be used and reused. And it also doesn't imply that any of those projects are secondary to the initial use of the data. You can use whatever term comes to mind first, but do be aware there are a couple of different ways of describing this process of reusing data. Okay, so now you know what secondary analysis is. But where would you find data that's already been collected? This is where the UK Data Service, which holds the largest collection of social science data in the UK, comes in. We're a comprehensive resource that's funded by the ESRC. Our main job is to be a single point of access to a wide range of secondary social science data. So the main purpose is the collection, the ingest, and the processing of data, and then the further dissemination of that data for other people to use. In addition to that data infrastructure core, we also have a service layer, and that provides extensive support, training and guidance, like what we're doing today. Who is it for? Well, we like to think it's really for anyone that has an interest in data. Traditionally, our main audience, and the people who probably both deposit and use our data the most, tend to be academic researchers and students. There are other groups that are well represented as well, including government analysts, charities, foundations, businesses, research centers and think tanks, who all give us and use our data.
Given the importance of data, how it's used and how it's disseminated, we're trying to reach out and support a wide range of communities. What type of data do we hold? The majority of our data, at least judging by the number of collections, is definitely quantitative data. We hold over 8,000, maybe 9,000, collections, of which about 7,000 are what we'd call quantitative collections. We hold a wide variety of that data. There's survey data, which is both cross-sectional and longitudinal. There's aggregate statistics, there's domestic and international macro data. There's census data, there's micro data. And then of course we also have a sizable, but definitely smaller in comparison to quantitative data, amount of qualitative and mixed methods data as well. Where does it come from? Again, that varies depending on the data type. You can see some of the sources here, including agency and statistical time series. Those are clearly the main sources for quantitative data. Most of our qualitative and mixed methods data comes from individual academics. They may have gotten a research grant and then deposited the data with us after that research has ended. And then of course we also hold some originally paper-based public records and historical sources, including things like the census. And where can you find information about it? We've got a website, ukdataservice.ac.uk, and that holds a lot of information. From there you can find our data catalog, and we also have hundreds of pages which discuss methodological issues like gaining consent, anonymizing data and storing data. There's also some student-specific tutorials, like our data skills modules. We also have some workbooks and exercises, and those are all based on collections that we hold. And there are also help pages, so if you have specific questions about how to use the website, you can go to our help page. But back to dissertation projects. What kind of research projects can you do reusing data from an archive?
Really, we need to go back to the beginning to answer this and think about the research process. So hopefully you're familiar with this model. It starts with some kind of topic or general direction for your research. You do some background research on the literature that's already available on that topic. And from there hopefully you're inspired to ask a research question, which builds on that body of research. Once you have a research question, you then decide what's the best way to answer that question and you design your project. And once you've settled on your method you then collect the data, analyze it and then you begin to write up. And that's what you submit for your dissertation. And you might have, you know, a few extra steps or swap a couple steps depending on your theoretical foundation but generally speaking, this is normally how we think of the research process. When you're doing a project using secondary analysis, however, this process will look a little bit different. This model clearly shows how the research question is built from your chosen topic area, your preliminary search, and possibly the literature that you find. However, with secondary analysis, the research question is derived directly from the data. So you would start with a topic you're interested in, but instead of looking for literature, you look for data and you start evaluating collections. When you then find a collection that intrigues you, you then ask a question, a research question of the data. And from there, you would then find out what literature exists on that question. And then, of course, you wouldn't need to collect data. You just need to access it, which is of course one of the key advantages of secondary analysis. The data is already collected. You just need to get your hands on it, either by downloading it from the, from the catalog page, or you may need to go into the archive if it's only available in paper form. 
Once you have it, you then analyze it and write up your dissertation. So the key point I'm making here around reusing data for a dissertation project is where your research question comes from. It would take a lot of time and inside knowledge about data within the archives to be able to come up with a research question, and then search and find the perfect data for it. You'll be searching for the right data for a really long time, unless, as I said before, you already have a good working knowledge of the collections that are held by the archive. So for a dissertation project, when your time and your resources are limited, you'll want to look at the data first, see what's out there, and then from that develop a research question which gives a new take on that data. You can of course spend the time looking for the right data. That's not an issue in doing it that way. But for a dissertation project, you may want to first look and see what data exists on your general topic area before nailing down your specific research question. With that being said, it's probably important to have some kind of idea about what kind of project you want to do. While you might be exploring data without a specific question in mind, you may want to think about what kind of research design your project will follow. There are four types of reuse projects that will lend themselves well to a dissertation project. These are a reanalysis, a replication study, a comparative study, and a re-study. So reanalysis is probably the one that comes to mind when thinking about secondary analysis. And this involves thinking about the wide range of approaches you can take in the analysis of a data set. It usually means asking some kind of different question from what the original researchers were trying to do. So for example, Clive Seale and Jonathan Charteris-Black did a study using comparative keyword analysis of illness narratives.
So the original illness narratives had been looked at exclusively for health research. The interviews were meant to explore how diagnoses were made. When Seale and Charteris-Black came along to do the comparative keyword analysis, however, they were much more interested in the analysis of the discussions between patients and doctors, rather than the actual health issues that were coming up in the interviews. So the question can be very different in that sort of way. Or sometimes the question can be on a similar topic to the original research, but have a slightly different focus. So for example, Joanna Bornat looked at gerontology as a topic. And she found two different data sets looking specifically at this. However, Bornat's research question was on racism, which wasn't the focus of the original work, but the data set was rich enough to allow her to explore this theme within the existing data. If you want to use the exact same analysis strategy, this would be a replication study. And this is also possible. Right now there's a real concern about the reproducibility of research, and replication studies can reveal some of the messiness that's involved in working through data. One of the most infamous examples of replication is from Thomas Herndon, who was a postgraduate student at the University of Massachusetts. He was assigned an assessment to replicate results from a published study. And this is not an unusual assessment. When he looked through, he saw Reinhart and Rogoff's 2010 paper, Growth in a Time of Debt. And basically this paper comes up with the proportion of your GDP that your national debt can reach before you see negative economic growth. So Thomas Herndon pulled the OECD data to rerun their analysis as the paper had outlined. But he got a completely different answer. The published paper said that debt cannot exceed 90% of your GDP, otherwise it will negatively impact your economy.
However, he calculated that the debt can actually exceed your GDP. And even then it's only a minimal kind of negative impact on the economy. So, you know, he took it to his tutors, they said, this all, your work looks right, not sure what's going on, but you still have to do the assessment. So figure it out. So he contacted the original investigators, and they gave him the original data sets that they were working from. And basically what he found was a flaw in their data set. They had miscopied some of the cells from the OECD data sets into their kind of master data set they were working from. So the full story is published in 2013 in The New Yorker. A replication study hopefully won't always find those kind of flaws in the original study. But nonetheless, it's a study design that's worth considering and it helps you develop an appreciation for the research process. You could even develop a project whereby you rerun a series of studies on the same topic, or you explore a complicated data set with, you know, missing data, transforming variables, and so on. You can also do comparative work. So you might be looking at an international comparison between two countries or comparing social subgroups of the population based on a shared social characteristic. So we've got some key data pages for quantitative data sets which outline some of those large national surveys that are held at the archive, and any of those would allow you to do this kind of comparative work without having to go out and collect two sets of data. You could compare samples across time, across geographic place, across gender or ethnicity. So these characteristics are usually collected as standard for these larger surveys. The final type of reuse is going to be exemplified by a case study that I'm going to go through with you. And this is where you replicate the methods of a study for purposes of comparison. 
So it does a bit of secondary analysis, but it also allows you scope to go out and collect a little bit of your own data. The example of this kind of reuse project is from the School Leavers Study collection. The original study was conducted by Ray Pahl in the late 70s as part of a much wider community study on the Isle of Sheppey. As part of that project, Pahl stumbled upon teachers who had set a particular essay for their students, just before the students were due to leave school, prompting them to imagine they were reaching the end of their life and something made them think back to the time that they left school. And they were then asked to write a short essay on what happened in their life over the next 30 to 40 years. In 2009, Graham Crow and Dawn Lyon, and that's Graham Crow there pictured with Ray Pahl, decided to reanalyze the data set and focus solely on student aspirations. Using the very same methodology, they conducted a re-study of school leavers on the Isle of Sheppey in 2009 to 2010. And the prompt supplied to students for the later data collection was nearly the same: imagine you're at the end of your life and reflect back on what you've done since leaving school. They then transcribed those essays and compared the themes from the new set of essays to the set of essays that were collected by Ray Pahl. And you can see the wording of the prompt here and a snippet of one of those transcribed essays here. The findings are fascinating and really show the difference in young people's aspirations after one generation, 40 years of time has passed. But how exactly were they different? Well, in 1978, students expected much more grounded and arguably mundane sorts of jobs. Career progression was gradual and it followed on from very hard work. And sometimes there was talk of periods of unemployment or even, quite morbidly, death or the early death of a loved one.
And you can see a few examples in the left column of some of the quotations from those essays, such as the one on the bottom: I longed for something exciting and challenging, but yet again, I had to settle for second best. I began working in a large clothes factory. The later essays, however, showed students imagining well-paid and instantaneous jobs. They were filled with choice, but also a lot of uncertainty. And Crow and his research team also noted a clear influence of celebrity culture in those essays. So for example, you have the quote at the bottom of a girl who writes: in my future, I want to become either a dance teacher, a hairdresser, or a professional show jumper horse rider. If I do become a dancer, my dream would be to dance for Beyonce or someone really famous. Now, this study is larger than what might be realistic for a dissertation project. The goal was to engage the whole community alongside the research and find innovative ways of including participants in the research outputs. So as part of that initiative, they published the Living and Working on Sheppey website, and that helps to create a shared history and memory of what living on the Isle of Sheppey means for this community. This would be an ambitious project, certainly for an undergraduate project. But it's a good example of how you might combine a bit of data collection and a bit of data reuse into one project. And for those of you who might be doing research-based degrees like a PhD or an MPhil, this is certainly something to consider for your projects. For others, you can design a much more feasible project with a smaller sample and smaller outputs. So hopefully you are now budding with ideas about what you might want to look at in the archives or what kind of project you might be able to do reusing data.
And since you're not collecting data yourself, you'll find that reuse projects tend to have very few ethical considerations comparatively, and hopefully you wouldn't hit too many snags with any ethical review boards. But that doesn't mean that there aren't any ethical considerations. So there's two key points that I want to make before diving into qualitative and quantitative data. So the first of these starts at the access point. How do you get permission to use the data? So if you're reusing data in an established archive like the UK data service, we've taken a lot of the pain out of the process by negotiating licensing issues with the person who collected the data. And this usually means you need to sign what's called an end user license. And this is a legal document. It's about as exciting as any legal document that you would see. And it states that you're going to do two really important things. No onward sharing of the data, even with your supervisor. If you need help with the analysis and your supervisor needs to see the data, then he or she will need to register and download the data themselves. The end user license stipulates that you cannot, under any circumstance, share the data onward or share your login, which allows access to the data with anyone. The second is that all the data held at the UK data service has been anonymized. And this is likely to be the case if you're reusing data from a trusted repository or an established archive. However, because it's been anonymized does not necessarily mean that it's completely impossible to figure out the identities of participants. So there is a kind of thought around anonymization theory that makes the argument that no anonymization strategy will ever be 100% effective. So consequently, in the very unlikely event you inadvertently uncover an identity of one of those participants. You're signing that you will not reveal that identity to anyone. You'll continue to keep it anonymous. 
So those are the key issues to recognize when signing the end user license. Once you've sorted out the access, the second ethical issue that I wanted to bring up is that you need to ensure you cite the data. In short, citing archived data helps data creators track the impact of their study. It also supports reproducibility in research, and it makes it easier to find the data that you used for your project. This issue is so important that the UK Data Service has some information on its website which goes into a little bit more detail about data citation, and will help explain why this is an important ethical issue. With the UK Data Service, we also make this easy by supplying you with the citation that you need right on our data catalog pages. That citation is underneath the citation and copyright box on every catalog page, and all you need to do is copy and paste it into your reference list. You don't want to pass off the data as your own. That would be a very ethically bad thing to do. All right, so you've got access, you've sorted out the citation. Now comes actually doing secondary analysis. I'm going to talk through the qualitative data first, and there's a couple of key issues about getting started with qualitative data that I'll address. Then I'm going to pass over to Allie, who's going to talk about quantitative data. So the first thing I'm going to talk about is orienting yourself to the collection. Then I'll talk about recontextualizing the data. And finally, I'll just make a couple of small points about sampling. So when you first download a qualitative data set, you'll get a zipped folder which looks a bit like this. You've got some folders stuffed with files. Most qualitative data is held as what's called an RTF. It's a type of Word document. So to find your data, you'll need to go into the RTF folder. So this folder, when you open it, there it is, looks like this.
And here you are, all of your data nicely organized. Clicking on one of those files would open up a file which looks a bit like this. Okay, so this is a snippet of a school leaver's essay. And that's what it looks like in its entirety. The RTF folder has over 100 or so of those files. But the files don't have to just be essays or interview transcripts. You might, for example, find something that looks like this in those data files. So these are PDFs of handwritten notes in the upper left-hand corner. There's also ethnographic notes, like those on the right. And some collections also may have some images, possibly video, like those in the lower left-hand corner. We don't have as many of those with images and videos. Those are quite large files to store. But we are getting some of them, and we increasingly take that kind of data as well. Most likely, though, you'll probably end up opening an interview transcript, like the one that is seen here. It should have clear turn taking, which means that it starts a new line with every new speaker. And it should also have speaker tags. Here the speaker tags are A and Q. So Q is the interviewer and A is the interviewee. You should know who's talking just by glancing at the interview transcripts. And there's a lot of different types of data available. So make sure you have a good look through the collection first and see what's actually in there. Okay, the next thing that you want to do is orient yourself to the project. And I think the main point to make here is not to underestimate the amount of time it will take to get acquainted with the data sets. There might be multiple levels of context to get through in order to really understand the data. And what I mean by that is you may have more than just the data that's collected at the time of the interview or the data collection.
You may also have to consider, for example, basic social characteristics of the participants, the historical time period in which the data was collected, or where the data was collected. So really the idea is that you need to understand the data set as a whole in order to get out what the data can convey. Every collection at the UK data service has some documentation provided with the data set, and that would be a really useful starting point for that. It comes, you know, with information about the methodology, such as the interview schedule or it might have the call for participants. Or sometimes it includes segments from publications that are arising from the original study or funding applications even for qualitative data sets, this is called the user guide. Here's an example of what a user guide looks like. This one happens to have an interview guide for interviewers, as well as a blank consent form, a sample profile, and so on. So it's just further background information to help you understand how the data was collected. But what if you want to know more about the participants themselves. Every qualitative study also has what's called the data listing. And here's an example of one of these. It's a table which gives you a brief overview of all of the data in the collection. So each row represents a piece of data, or a participant, and each column has some sort of characteristic or attribute to that interview. So it's a quick way of getting to know who took part in the study. And it really is just an at a glance look at this at the participants the sample as a whole. In addition to the context of the data, you may also need to consider the sample. So for example, if the data set is too large you may need to take a sample. Qualitative collections tend to be smaller anyways, but many of the archived data sets are funded and they can collect a considerable amount of data when they are. 
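As a rough illustration of using a data listing to cut a large collection down to a manageable sub-sample, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the column names, the values, and the helper `select_subsample` are hypothetical and do not come from any real UK Data Service listing. It simply keeps the rows that match your chosen criteria and then draws a fixed number of them at random.

```python
import random

# Hypothetical data listing: one row per interview. The column names and
# values are invented for illustration and do not describe a real collection.
data_listing = [
    {
        "interview_id": f"int{i:03d}",
        "gender": "F" if i % 2 else "M",
        "region": "North" if i % 3 else "South",
    }
    for i in range(1, 13)
]


def select_subsample(listing, criteria, n, seed=0):
    """Keep rows matching every criterion, then draw up to n of them at random."""
    eligible = [
        row for row in listing
        if all(row.get(key) == value for key, value in criteria.items())
    ]
    if len(eligible) <= n:
        return eligible
    # A fixed seed means the same selection can be reported and repeated.
    rng = random.Random(seed)
    return rng.sample(eligible, n)


# Example: pick three interviews with women from the hypothetical listing.
subsample = select_subsample(data_listing, {"gender": "F"}, n=3)
for row in subsample:
    print(row["interview_id"], row["region"])
```

A random draw is only one possible strategy. A purposive selection, where you pick cases by hand against stated criteria, may suit a small qualitative project better; the point is to be able to say how the cases were chosen.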
In a small dissertation project you'll want to be realistic and decide if you need to limit the number of participants to a smaller sub-sample of that larger collection, and think about what the strategy for selecting those cases is. So for example the Edwardians collection, which was put together by Paul Thompson and is widely considered to be the first oral history of Britain, contains about 453 interviews of 80-plus pages each, so it's a huge collection. Conversely, most dissertation projects probably have an expectation of maybe six to 10 interviews, something like that. So you would need a clear sampling strategy to help you choose which of those interviews to look at. You might also be interested in a particular subgroup of the population. So again, you'll want to think about what kind of criteria you're looking for. You may also want to combine data from different collections to complement each other. Remember, it would take a lot of time to sift through and find the pieces from different data sets to pull together, but that is another possibility. So if you feel like all of that data speaks to the same topic and research question, and you've done the work of recontextualizing to ensure those interviews meld together and work together, then that's another option. All right, I'm going to hand over to Allie now to talk about the key methodological challenges of quantitative data.
Thanks, Maureen. Sorry, I'm just sorting out my screen share. So yeah, as Maureen says, I'm going to give a quick rundown of the key methodological things for quantitative analysis. I'm going to cover the key things you need to consider in two main areas: when you're selecting your data, and also when you're getting to grips with and understanding it. So we'll start with selecting your data. I just want to run another quick Mentimeter poll to have an idea of what kind of topics you're all interested in for your dissertation.
So I'll just pop the Mentimeter poll back up on the screen. So again, some of you might have an idea. You might already know your topic, or even if you don't, just add any general thoughts, ideas, or suggestions that you're interested in. I'll just give 30 seconds or so, so we can see what everyone's interested in. So yeah, public policy, voting, ideology. Minority stress there. I've never heard of that. Sounds very interesting. Land use, health, interpreting studies, health inequalities. Yeah, we've got the health survey data sets for that. Transport: there's some transport data sets as well. The future of work: the Labour Force Survey is very useful for that. And personal finance. We've got the family spending data sets. I can't remember what they're called; they're abbreviated to WAS. Energy use: yeah, there are data sets on that. Okay, I'll just give everyone two more minutes. It's just interesting to see what everyone's interested in. Great. Okay. Thank you all. I'm going to head back to the presentation now. So as we said before, some of you might already have an idea, and it looks like lots of you have really great ideas about your dissertation topics. But if not, here's an idea of some of the data topics that we have available. There's a whole range of data on a variety of topics, such as the environment, workforce patterns, health care, and family spending. I know that someone was interested in spending. There's attitudes to the police and the criminal justice system, how people spent their time during the COVID-19 pandemic, and attitudes, so political attitudes. I know we had some interest in politics there and political opinions as well. And these are just some examples. Let's say that you now have a general idea of your topic area and you're starting to think about the data you might use to explore it. A good place to start is thinking about what it is that you're trying to measure.
And the key for a quantitative reuse project is to think about your key concepts and how these relate to variables within a data set. So for example, let's say we're interested in looking at the relationship between the fear of crime and age. We need to find some data with variables which will allow us to measure these concepts and formulate or answer a research question about them. Now, as Maureen said earlier, it might not be possible to find the perfect data set, and you could spend a lot of time searching for this. So in general, it is easier to start from data on a general topic and then derive your key concepts and questions from the existing variables. That said, if you do already have a question in mind, perhaps from a research proposal assignment or a previous discussion with your supervisor, you might just have to be flexible and revise it based on what data is available. So let's say you're looking for data on your key topics. There's a few key places to start, and I know we had a question about this in the chat as well. Where I'd suggest you start is looking on our theme pages. These let you search data sets by theme. You can also type keywords into the data catalogue, and you can use the variable and question bank, though this isn't fully up to date with all of our data sets just yet. We also have the HASSET thesaurus, which lets you search by key concepts. The links to these can all be found on the find data pages of our website, and we will have a practice in the practical as well. So once you've found a data set that you think might be suitable, you'll need to consult its catalogue page. This will give you an overview of the key topics, the background, and a brief overview of the methodology. And you can also access the documentation, as Maureen mentioned earlier, including any user guides, technical reports and lists of variables included in the data set, and any notes that you might need to be aware of for the data use.
And I know we had another question earlier as well about how do you see the variables that are available in the data set before you download it? Well, this is also found in the documentation and for quantitative studies, it's usually either contained in the user guide, the variable list or we have what's called a code book or a data dictionary, which contains a list of all the variables available in that data set. So I'd encourage you to have a look there. So back to our example question. If we wanted data on crime, we might choose to look at something like the crime survey for England and Wales. And this is a large survey which might cover our key topics. So it provides crime statistics independent from police records. It's a repeated cross sectional survey. I'll talk about the different types of surveys in a bit. It's conducted every year and it has a large sample size and a smaller sample of those aged younger as well. So back to our variables, I can see that having a look in the code book, the crime survey for England and Wales has two variables that might measure our key concepts. So quality life, which is how is your quality of life affected by fear of crime and the age of the respondent. So we found these variables, but now we need to think carefully about whether they're suitable and an important step is to think critically about what these variables measure. So have a look at our quality life variable. Does this measure fear of crime or does it actually measure how much fear of crime is affecting an individual's life? And this is where you might start thinking about reframing your question based on what the variable is measuring and you'll need to think about this and consider this with all of your variables. And to do this, you can look at the original questionnaire in the documentation and get an idea of what information the question was really asking. 
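As a rough illustration of checking a code book or data dictionary before downloading anything, here is a minimal Python sketch. It assumes the variable list has already been copied into a small list of names and labels; the variable names (`qualife`, `age` and so on), the labels and the helper `find_variables` are hypothetical stand-ins, not the actual contents of the Crime Survey for England and Wales documentation.

```python
# Hypothetical extract of a data dictionary: names and labels are illustrative
# only and are NOT taken from any real survey documentation.
DATA_DICTIONARY = [
    {"name": "qualife", "label": "How much is your quality of life affected by fear of crime?"},
    {"name": "age", "label": "Age of respondent"},
    {"name": "sex", "label": "Sex of respondent"},
    {"name": "tenure", "label": "Housing tenure"},
]


def find_variables(dictionary, keyword):
    """Return the names of variables whose name or label contains the keyword.

    The match is a simple case-insensitive substring search, so it can return
    false positives; the question wording still has to be read by hand.
    """
    keyword = keyword.lower()
    return [
        entry["name"]
        for entry in dictionary
        if keyword in entry["name"].lower() or keyword in entry["label"].lower()
    ]


# Look for candidate variables for each key concept in the research question.
crime_vars = find_variables(DATA_DICTIONARY, "crime")
age_vars = find_variables(DATA_DICTIONARY, "age")
print(crime_vars, age_vars)
```

A keyword match like this only produces candidates. The critical step described above, reading the original question wording to see what the variable really measures, still has to be done in the questionnaire itself.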
As well as considering your variables and concepts, you might also want to consider the kind of analysis you want to do or can do with the data. So different data types can allow you to do different types of analysis. So cross sectional and repeated cross sectional allow you to look at particular points in time. Longitudinal data allows you to follow individuals over time. And if you want small geographic areas, you can use census data. And for comparing countries, which I know we had a question about as well, international time series data or international macro data can be useful. You also need to think about your population. So that's the group you want to measure. So that might be the whole of the UK, the world or a particular city or local authority area. And you'll also need to think about your unit of analysis. So do you want to measure individual people or households? And this will affect the data that you can use because some data sets are only available for particular geographic levels or certain units. And finally, it's important to remember that this process isn't linear. You may need to go back and forth and realign your question with the available data or compromise if the perfect data set isn't available or choose different variables. And this is all part of secondary analysis. So once you've chosen your data set, there are a few final things you need to consider when you're understanding your data. So I'm going to go through these relatively quickly and then we can move on to the practical. So as we said, the documentation usually contains information on the variables, but it should also have information on the questionnaire used to collect the data. And as I said earlier, to understand secondary data, it's also really important to understand the questionnaire. And something that's really important is the routing. So that's who was asked which questions. 
And this is because many questionnaires use something called computer aided interviewing, which will send respondents through the questionnaire by different routes depending on their previous answers. Therefore, many questions in the survey may only be applicable to some people in the sample. So it's a good idea to check the documentation and see who was asked about the variables you're interested in. So this is an example of a variable called FLEX10 from the Labour Force Survey, which relates to special working arrangements. It shows you the exact wording of the question, how the question was coded and that it only applies to those who were in work during the reference week. And underneath this, you can see how other variables have been used to derive the information on whether an individual belongs to that group and whether the question was asked to them. And you'll find that the documentation will look a bit different across different data sets, but this can give you a general idea of what to expect. You will also find information about how the data has been processed after it was collected. So derived variables are created from the raw data. And here's an example of this. So that FLEX10 variable that we were talking about on the previous slide has been used here to derive a variable called FLEXW7. And this flow chart shows how those who responded seven on the original variable, so those who said they had a zero hours contract, have been coded as one on the new FLEXW7 variable. So therefore this new variable indicates whether someone has a zero hours contract or not. Now, I know this can be a little bit confusing, but once you've got the names in your head and you understand the data, it should become a bit more clear. And again, not all documentation contains these diagrams. And in some surveys, it will just be the SPSS or the Stata syntax, which shows you how the variables have been derived.
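To make the routing and derivation logic concrete, here is a minimal Python sketch of the recode described above: answer 7 on FLEX10 becomes 1 on FLEXW7. Everything else in it is an assumption made for illustration: the function name `derive_flexw7`, the `in_work` flag standing in for the routing condition, the choice to code all other answers as 0, and the toy respondents. The real coding frame is in the Labour Force Survey documentation and may differ.

```python
ZERO_HOURS_CODE = 7  # the FLEX10 answer described in the talk as a zero hours contract


def derive_flexw7(flex10, in_work):
    """Derive a zero hours indicator from FLEX10.

    Returns 1 for a zero hours contract, 0 for any other answer (an assumption,
    the real variable may use different codes), and None when the respondent
    was not routed to the question or gave no answer.
    """
    if not in_work or flex10 is None:
        return None
    return 1 if flex10 == ZERO_HOURS_CODE else 0


# Toy respondents, invented for illustration only.
respondents = [
    {"id": 1, "in_work": True, "flex10": 7},
    {"id": 2, "in_work": True, "flex10": 3},
    {"id": 3, "in_work": False, "flex10": None},
]

for person in respondents:
    person["flexw7"] = derive_flexw7(person["flex10"], person["in_work"])

# Only people who were actually asked the question belong in the denominator.
asked = [p for p in respondents if p["flexw7"] is not None]
share_zero_hours = sum(p["flexw7"] for p in asked) / len(asked)
print(share_zero_hours)
```

Keeping "not asked" separate from "no" is the point of the sketch: if respondents who were routed past the question were counted as 0, the share on zero hours contracts would be understated.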
But again, really important to understand the origin of any variables that you're using. You also need to think about samples. So surveys and similar quantitative data sources are always, almost always based on samples. And one important question you need to ask about your data is, is the sample representative? So you need to know who is included. So some surveys are only asked to adults or those in private addresses. You need to know what the response rate is. So is there any information about differential response across the population? And this will tell you about any potential bias. And you also need to find out if you need to use a survey weight in order to make the data more representative. And you can again find this information in the documentation. A second and final question you need to ask is whether you have enough cases to make a precise estimate. So for example, the crime survey for England and Wales that we looked at earlier, this has a large sample size and should allow you to make precise estimates. However, if you have smaller samples or perhaps you're analyzing a particular subpopulation, there might not be enough cases in the sample for precise estimates. So I've just given a general overview of sampling considerations here. But there are a number of resources on the UK data service website and you can have a look at our survey weights and sampling guides as well. So in summary for quantitative data, you need to think about your key concepts you're trying to measure and how these relate to variables in a data set. You need to check the catalogue and the documentation to help you understand your data. And you need to consider your sampling and your sample sizes. So I'm just going to quickly highlight a few of our dissertation resources and then we'll move on to the practical part of the session. So we have our dedicated student pages on the UK data service website, which can be found under Home Learning Hub Students. 
And you can find information there on what data are available and how to find and access it (I know someone asked a question about that earlier as well), our dissertation award and further resources. And you can also explore our learning hub and data skills modules, which will help you get to grips with the basics of analysing data, particularly survey data as well. And finally, in particular, I want to draw your attention to the Finding and Accessing Data for your project pages, which you can access from the student page. These have dedicated worksheets, videos and guidance to help you identify your data needs, search for and evaluate data like I've been talking about today, and just think through your project. And you can also follow us at UKDS Dissertations on Twitter. If you follow that hashtag, that's where we put out all our new dissertation resources as well. So we're going to move on to the practical part, the activity part of today's workshop. So we've got some worksheets to help get you started. There are three worksheets. So we're going to start with the first one. Then we'll move on to the next one. And then we'll come back and we'll put the answers into Mentimeter. So we've got about 10 minutes to work through the first worksheet and that should be in the chat now as well. Great. Thank you, Jill. So that's in the chat. We'll give about 10 minutes for everyone to work through that. And then we'll go through the next ones as well. And again, if you have any questions, please do pop those in the chat too. I'm also going to share the worksheet on the screen in case anyone doesn't have access to the web page there. So you should all be able to see that as well. So our first task is to just go to the data catalogue and have a little bit of time exploring and finding and understanding the data. And then we'll come back on Mentimeter and see what data you found. And I can see we have a question in the chat. So I'll just answer that now while we're working on the activity.
So we have a question on how to get data on the environment or green bond data or climate change data. So if I go to our website (apologies to those working on it, I've taken the task off the screen, but I'll just go through this briefly while we're going through the task). So this is our website. If you go to the Find Data page, then Browse and Access Key Data. And if you scroll down, you can see that we have a theme page on environment and energy. If you go here, these are our key data sets on the environment and energy, linking to things like the OECD environment statistics, energy efficiency. And then you can also go to our catalogue page and, for example, search climate change. And we'll see if anything comes up for that as well. So you've got some data sets here on climate change, information on rural communities. And I think if you also go to Topic on the left-hand side here, there's a search topic term for natural environment, which might be useful to you as well. I hope that helps answer your question. I'll just pop the task back up now as well. Okay, so I can also see that someone's asked how to access data from the website. So this isn't something we particularly cover in this webinar, but I will just briefly give you an idea. So once you've found the data set that you want, so let's say we go to Aging and we pick an Aging data set. You're in here and then you will click Access Data. And then you can go through the process for accessing the data. I won't go through all of that today, but basically you will select the data set you want and then add it to your account. You will need to be logged in, again, to the UK Data Service. But if you're part of an academic institution, you will have access to that through your institutional login. And then you'll be able to add it to your account and follow the process.
If I go to our student pages, so under here, if we go to Resources, and then Finding and Accessing Data for your project, this page provides all the information you need. There's a video at the end on registration and access specifically for students as well. And if you're not a student or you want a bit more general information on access, not just as a student, you can watch the recording of our webinar on Finding and Accessing Data from the UK Data Service. That was a few weeks ago. It should be on the YouTube channel now or soon. And that will talk you through the access process as well. Okay, so I think that's enough time for everyone to have a go at finding some data. So if I just bring up Mentimeter, so here we go. So I want to know what data set you found. So when you were searching, which data set did you find? UK Children Go Online, yeah. Understanding Society, that's a very useful, widely used survey. Health Survey for England. Police Telephone Survey, British Social Attitudes Survey, the Labour Force Survey. Yeah, Welfare at a Social Distance, that's a very interesting COVID-19 data set as well. Real Estate Adaptation Innovation, What Is Governed in Cities, yeah, Labour Force Survey, the COVID-19 data sets. I'll just give two more minutes for everyone to have a go. The National Travel Survey, Police Public Attitudes, Crime Survey for England and Wales, the fraud-specific data sets. Great, I'll just leave that up for one more minute because I can see we have another question in the chat. So someone said, I can see that data sets are classified as being safeguarded, which requires more documentation and emailing for requests. So this is a common point of confusion that we get. So data sets on our website are categorized as open, which means anyone can access them and download them without needing to be logged in. Safeguarded data sets come in two forms. So those are data sets that come under our end user license.
So a safeguarded data set is actually one that you would most likely be able to access by being logged in to the UK Data Service unless it says special license. If it's a special license data set, you might need some additional agreements. Some of those data sets can't be used for commercial use or there may be additional agreements. But if it's just a safeguarded end user license data set, as long as you're logged in and you meet the end user license, you should be able to access that just by being logged in. Again, it varies slightly for some data sets. And then the data sets that we don't recommend you access as a student are the secure data sets, because they have to be accessed through Secure Lab, which is a very long process, and we say students can't really access those, especially given the length of time it takes compared with a dissertation. But safeguarded, we know, can be a confusing term. Usually things that say safeguarded are actually under our end user license and you should be able to access those. Yeah, that makes sense. Yeah, if I can just add, Allie, that it's something like 90% of our collections that are just safeguarded, which means that as long as you register with us, you should be able to access them. So for every 10 data sets in your research, in your searches rather, nine of them you probably should be able to just access once you register and sign our end user license. There may be one or two of them in there that are secure access, or embargoed is one of the other ones, where there's a pause on the release of that data. There's just a couple of other access types that are a bit more restricted in that way. It might also vary, though, depending on what the topic you're looking for is. It might just be that the topic you're looking for is quite sensitive.
So, you know, if you're finding more of them are are kind of secure access or say it might just be the topic that you're looking for is actually quite a sensitive one potentially disclosive where you'd be able to re identify participants. That's the key concern there when we put it in a more restricted access level. Yeah, that's great. Yeah, that that's a thank you Maureen and just just to add to that as well. And you know you might the first time you encounter a particular data set a lot of them. You know we'll have a safeguarded version and then a secure version that contains that more disclosive information that Maureen was talking about but it's always worth looking if you come across a safeguarded data set. Sorry, you come across a secure data set now I'm getting confused. There may well be a safeguarded version as well that might take out some of those disclosive variables but still contain the information that you need. So that's why it's useful to look in the documentation as well to find if the information you need is in there. So I think we'll move on to our next task now. So if we could just pop that in the chat please try to remember what I think our next task is about. Let's see if I can open the document. Yeah, so it's about one. Great. I just completely forgot for a second we've got three of them. So yeah, so our next one is getting to grips with the catalogue and the documentation. So we've asked you to go and have a look at some particular examples of surveys or data that you can access. The first one is our new COVID-19 understanding society teaching data set. And if you can just work through part one and part two there and then we'll come and feedback on Mentimeter again. So I can see that we've also had a question about how far back our data goes. I'm not 100% sure on the answer for this. But if I have a look, if I just share the screen again, you can use our date filter. 
This says 440 I think that that's just how it's set to make sure that it captures all of the data. But you can put particular dates in here. 1890 if you click refine date. And then if you have most recently released selected and go down to last, that should give you the oldest data set. Sometimes the catalogue search can, it might miss stuff out, but it should be. That should be the oldest one I've got. I'm not sure if we have any older ones. I do have a comment on this. So secondary analysis is a relatively new method, if you will. So even within the quantitative tradition, it was in about the 1960s when the first kind of archive, if you will, the Roper Center had opened and allowed data to be reused. It doesn't mean that there isn't historical sources that are available as collections. So I'm just trying to remember what our census data goes back to. But I think it's earlier than 65, to be honest, because it's, it's kind of like this, this government data that was always available sort of thing. So there's a couple of instances, but other than that, the actual kind of regular collection and deposit of data would have been from the 60s onward. And for qualitative data specifically, you're probably more likely to come across stuff from the 1990s going forward. We do have some what we call legacy collections where there were some researchers who had saved all of their data themselves, like in their house. And so when they retired, or when they passed away, they just put those, all of those collections with the UK data service. So we work to digitize those and make them available. But generally speaking, it's 60s onwards for quant, 90s onward for qual, with some historical sources kind of thrown in the mix that would be few and far between. Thanks, Maureen. And I've just had a look. You're absolutely right. We do have earlier census data supported by the ISM project. 
And that's all available on the census pages on the website, which will tell you how to explore and access it. And it will also give the information on census 2021. I'm not exactly sure on all of the release dates, but I know that some of the aggregate data and boundary data is available. So if you have a look through the census pages, we keep all of that updated, and the news pages as well, with when the micro data and different data sets will be out. Yeah, I think there's a couple of other longitudinal surveys that, for example, were started in the 50s and they later kind of logged those first waves. I think it's the NCDS that started then. Yeah, there are a couple of other examples of the earlier ones. But as a standard, where you'll see the influx of collections coming in would be probably 60s onward. Great. OK, so I think we'll come back together and answer these questions in Mentimeter. So hopefully you've had enough time to explore these catalogue and documentation tasks. So if I just go back to Mentimeter. OK, so we're going to start with part one, which is the Understanding Society COVID-19 study teaching data set. So I can see from the thumbs up that some of you have joined. So the first question was, what is the observation unit for this survey? So is it individuals, households, families or both? So I'll just give everyone a minute to have a go at answering that. OK, looks like everyone that's logged into Mentimeter has answered, and feel free to log in whenever you've finished having a look as well. Yeah, so the correct answer is both. And you can find this under the coverage and methodology section on the data catalogue page. So if you scroll down, you can see the observation units there. OK, so our next question is what country does this data cover? So is it England, the United Kingdom, or England and Wales? So the correct answer is the United Kingdom.
And again, that's under the coverage and methodology section. And you might have seen that on the last slide when I popped up the answer to the observation unit as well. So what topic examples did you find? So let's see. So yeah, social contact and neighborhood cohesion, volunteering, COVID-19 symptoms, homeschooling, socio demographic information. Yeah, great. And we can see, yeah, so a lot of you are right. Those were the topics covered. So social contact, neighborhood cohesion, and this little screenshot here is just an example of what it looks like in the user guide. There's a table in the user guide that gives information on the topics and also where they came from in the original survey, because the teaching data set was created from the original survey. OK, so now part two, this is the qualitative data. So the burden and impacting care homes, the mixed methods study. So first question again, what country do these data cover? Yeah. And as we can see, the answer is England, again in the coverage and methodology section. And what kind of data are these? So are they audio recordings, numerical data or text data? OK, yeah. Most of you are right. They are text data. Oh, and that's a question for later on. So we'll now move on to our final task, which is to have a go at exploring a download bundle. And Allie, do you mind just sharing your screen with the... Yes. ...worksheet on there. This last worksheet is looking at the files for what we call an open collection. So I think it's something like less than 1% of our collections that are open. And that basically means you do not have to register or sign the end user license in order to access it. There's been consent from participants, and of course the depositor, to have the collection open. So you should be able to just go to the data catalogue page that is on there and just go ahead and download that zip file.
And then from there, you'll just need to start trawling through and see if you can find some of these items in there. Great. So we'll give 5, 10 minutes or so for that. And then we'll come back. We've got a few more resources to show you and then we'll take a few extra questions at the end as well. Yes, I don't think we need to go through the answers for this unless anyone has any particular questions, or would you like to demo it, Maureen? Or are we happy to just leave everyone to explore it? Yeah, I just wanted to... do you mind if I just take the screen for a second? Not at all. Go for it. I'm just going to pull it up, just a couple of things to point out here. OK, so hopefully you can see my screen now. So if you follow the link, it would have taken you to the catalogue page. And then if you click access data, because this is an open collection, you can just download your file here. Right. And then you would get a zip file that looks like this. Right. So the mrdoc folder always has your documentation that we talked about. So this includes the user list, sorry, the data list, as well as any of the other kind of documentation like the interview guide that would be useful in getting to grips with the collection. And then the RTF folder is where all the interviews are held. Now, this particular collection has both transcripts and summaries. So the transcripts are the full transcript. So you would be able to see everything that was said, whereas the summaries kind of give a condensed view. So there's a few different reasons we use the summaries. So it might be because the interview transcripts can't be shared because they're a little too disclosive. So sometimes they might summarize them, take out those disclosive materials and then release those summaries. Sometimes it's just useful if you've got loads and loads and loads of interviews or really long interviews.
Sometimes it's nice to have a one to two page at-a-glance summary. It can help you identify the chronology of the interview transcript and help you find the information you're looking for a bit quicker. So they can be a useful tool as well in terms of navigating the collection. So yeah, that was just a kind of talk through the different elements there of the qualitative collection. So shall we just bring up... Yeah, shall I share my screen and then I'll go back to the PowerPoint. If I can. Sorry, I'm looking at all of the options for sharing PowerPoint, and there we are. So hopefully you guys are quite interested now in having a look to see what we've got. I just put together a quick slide of some of our recent acquisitions just to give you an idea of the kind of breadth of topics that we've got coming in. These are a mix of qualitative and quantitative collections. All of them are available under a safeguarded license. Yeah, if you're interested in some of the more recent issues like the pandemic, we've got collections on that. If you're interested in vulnerable populations, for example, that would be hard for you to do your own research with, such as those who are currently incarcerated or young people or even children, you know, we've got collections on that as well. Or things that are on quite sensitive topics that are difficult to talk about in an interview setting, such as bereavement, you know, we've got collections on that as well. So do have a look through and see what kind of interests you. So in terms of collections, there are quite a lot of collections that are published on a daily basis. And if you're looking for further resources on doing secondary analysis, we've put together some resources here. So the Timescapes series is available openly online. It's a series of essays that address different issues within secondary analysis.
There's also some video tutorials that we have on our pages, as well as the data skills modules. And then there's lots of tools and templates that might help you as well with managing your data. And then there's a couple of key texts like the secondary analysis of qualitative data that was published fairly recently, which can help you through the process as well. And we are on all sorts of platforms, so find us on JISCMail for regular newsletters. You can find us on Twitter and Facebook and YouTube. And then of course our main website, UK Data Service. You can always get in touch with us through the contact page there. If you're looking for the PowerPoint slides, those will be available on our website, following this recording. You can check our Twitter for more updates when it's posted. It usually gets sent out through our social media, so do have a look and get connected. And I will just say, that's my fault, there's an error on the slide. We'd normally put LinkedIn instead of our Facebook page, because we tend to use that less and we also post a lot of stuff on LinkedIn now too, so do follow our LinkedIn page as well. So if there are no more questions, thank you all very much for joining us today and taking part in all of our polls and activities. And again, you can find any information you need on our website. In particular, our student pages should be able to guide you with everything you need. Thank you all, have a good rest of your day.