Welcome to this workshop on getting started with secondary analysis. My name is Jen Buckley, and I'm here with my colleague, Maureen. We're both part of the UK Data Service user support and training team. I'm going to start with an overview. Today, we're going to look at some key issues to help you get started with secondary analysis. I'll look first at secondary analysis in general and some of the main issues that come up when we're using data. Then we'll look more specifically at examples and issues around quantitative data, and I'll do that, before I pass you over to Maureen to look at qualitative data. During these parts, there'll be some activities for you to do and then time for your own questions as well. We're in Zoom, and there's a question and answer button, so feel free to add any questions as we go along and we'll pick them up when there's a good moment.
So let's start by thinking a little bit about what secondary analysis is. Secondary analysis involves the use or reuse of existing data. We've got data collectors who collect and use data for their own purposes, and these data collectors are varied. There are government or official agencies, like the Office for National Statistics, and there are large research organisations; one example is NatCen, the National Centre for Social Research. There are also major academic-led projects. For instance, here we've got the logo for Understanding Society, which is a major longitudinal study of households in the UK. And there are also many individual researchers and research teams. We've got a picture on the screen of Peter Townsend, who did seminal research around poverty. Following data collection and some primary analysis, data is often archived, and then with data sharing there are options to reuse this data.
So like for everything, there are pros and cons to secondary analysis. Thinking first about the pros: there is high-quality data available to simply download and use. Data from the UK Data Service main collection undergoes checks for quality and clarity, and many of the collections come from trained researchers and methodologists from universities, government departments, statistics authorities and research organisations. So it makes sense, in terms of efficiency and value, to consider what suitable data is already available for your research. What's available also includes data that you could not collect yourself. For example, say you are interested in changes to working conditions over time: archived data from ongoing surveys, such as the Labour Force Survey, provide data that just could not be collected now. You can also get access to data at a scale you might not be able to collect yourself due to things like cost, time or opportunity. For instance, from the UK Data Service you can download data from social surveys with nationally representative samples that run into the thousands, which is very hard to collect in a research project. And many of the ethical issues around data collection are already dealt with. So there are many benefits, but there are cons too. Common disadvantages of secondary data analysis are, first, that the data you want just might not be available. Or there might be data available, but it's not perfect. For example, it might be older than you'd like, or the wording of a question in a survey or an interview might not be quite right for your needs. In these situations, the data may still be suitable for your research, but you might need to make some compromises or acknowledge some limitations in what you've managed to do with the data. And as you were not party to the data collection, you don't have some of the inside understanding of your data.
So you therefore need to make an effort to get to know the data, which in practice will mean consulting the data documentation very closely. You're therefore reliant on the quality of this documentation. Ethical issues do also apply, so issues around confidentiality and the quality of the data. You need to follow certain data access conditions that you will agree to at the point of accessing data. And there may also be limits on the data you can access. For example, there's often very limited information about areas and geography, as these potentially mean people could be identified. Or, to get that level of information, you may need to access data in a dedicated, safe environment. So some of the key issues around reusing data are that researchers need to understand the data access conditions at the point of getting data, and they need to work to understand the data. They may also need to be pragmatic about whether the data is good enough for what they wanted to do. Let's have a little look at what the research process might look like. An ideal process might start with a research question; then you look for data, you find something that looks good, you evaluate whether it's suitable, and then hopefully you go on to analyse the data. But in practice, it's not such a linear process. For example, data for a research question might not be found, so you may need to think again about the research question. You can find what seems like a suitable data set, but further evaluation of the data might mean you need to do some tweaks or redesigns of the research, or you might need to go back and look for different data to answer part or all of your research questions.
And then further issues can appear once you start analysis of the data and get that further understanding of what the data is like. So there are several steps in the process. One of the first steps is actually finding data. Later in this session, we'll talk a little bit about some of the data that is available, but for more information about the nuts and bolts of how you go about finding data, we've got several other resources that I want to highlight. We run a workshop on how to find and access data, and there are also video tutorials on how to find data. To look at what's already available, you can go to our YouTube channel, where we have our video tutorials and recordings of past workshops. And to look at what's coming up, you can go to the events page on our website and see what workshops we've got coming up. Once you've found data, you need to consider its suitability for your research. To do this, you need to understand what information was collected. Who was it collected from? When and where was it collected? And what changes have been made to the raw data before it was archived? The key to getting answers to these questions is the documentation that comes with the data. So we'll talk a lot about documentation. It's also sometimes called metadata, which is data about data. The documentation can include things like a catalogue record: all data collections in the UK Data Service data catalogue have a record with lots of useful information. And then we have documentation from the study. This might include user guides that have been put together by the data collectors, questionnaires, interview schedules, field notes and more. So you get all of this documentation that you need to familiarise yourself with. We'll look at some documentation as part of both our sections on quantitative data and qualitative data.
So I'm now going to start looking at quantitative data a bit more closely. I'm going to look at some of the types of data that are available and how they might be used, and go through a case study of some quantitative data that's been used in research. We'll have a demonstration and an activity around looking at the documentation that comes with some quantitative data collections. And then we'll look at some of the key issues that might come up when you go about trying to do secondary analysis of quantitative data. So I'm going to have a little look at what's available by looking at some of the different types of analysis you might want to do. For example, to examine individuals, families and households, or businesses, at one point in time, we have a wide range of cross-sectional survey data or census microdata. Census microdata is data from the census, but at the individual level. It comes from a sample of the census, so it's very much like survey data in the way that it comes out. If you then want to start looking at trends over time, we have repeated cross-sectional survey data. A lot of the major surveys are regularly repeated, and this creates repeated cross-sectional data that you can use to examine trends. One example is the Health Survey for England. It's a survey that takes place every year and asks a range of questions about health, and it can be used to monitor changes in population health over time. If you want to be able to follow individuals over time, there are several longitudinal studies. We mentioned Understanding Society before. This is a study that's been following around 40,000 households over time, and it covers a wide range of different topics. And then if you are interested in data about areas, rather than data that comes at the level of individuals, we have aggregated data from the UK census.
So this might be data at the level of local authorities or smaller geographical areas. And to compare regions or countries, there are also collections of international macro data, such as the World Bank indicators. So this is an insight into the variety of quantitative data that you can find from the UK Data Service. Now let's look at a case study. We're going to look at some secondary analysis of survey data, which is one of the largest parts of our quantitative data. This is some research that used the Crime Survey for England and Wales to examine violence against people with disability in England and Wales. So, a little bit about the Crime Survey for England and Wales. This is an ongoing annual survey. It was previously called the British Crime Survey. It's used primarily as an important source of information for crime statistics, because it collects data on crime that isn't dependent on reports to the police. It has a large sample of around 35,000 adults aged 16 and over, and there's also an additional sample of children. The survey asks whether someone was a victim of crime in the previous 12 months. It also asks varied questions around demographics and attitudes to things like the police and the criminal justice system. So though it provides important data for crime statistics, it can be used in a range of different ways to understand people's experiences of crime and views around crime-related issues. The data itself is stored as individual anonymised records. There's a little image on the screen showing rows of data, and each one of those is an individual who took part in the survey. The data comes with different levels of access. There is what we call a safeguarded version of the data, and this can be downloaded after registration with the UK Data Service.
But there's also a secure access version. This is a version of the data set that includes more sensitive or disclosive information, and to get this data, you need to make further applications. So that's the Crime Survey for England and Wales, and here we're going to look at how it's been used. This is one example of some research. They used the Crime Survey, or the British Crime Survey, as it was called at the point the data comes from. The research was published by Hind Khalifeh and colleagues in 2013, and it made use of the introduction of some new disability measures in the survey. This allowed them to identify people with a disability and also to distinguish by type of disability. To get this level of information at the time, they had to get a special licence version of the data. Overall, this gave them a sample of 46,000 adults, and from this just over 9,000 were identified as having a disability. What they found was that, adjusting for things like age, sex and other socioeconomic characteristics, disability increases the risk of experiencing violence. They found that levels of victimisation were highest amongst those with mental health problems. And at the time they estimated there were about 116,000 victims of violence attributable to disability. This research was done some time ago, but one of the beauties of these ongoing surveys is that there is the potential to repeat studies like this to see if things have changed over time. That is, if the survey itself hasn't changed too much and still has the same questions in it. So now I'm going to take you to have a little look at some of the documentation, so we can see what the data documentation and study details are like when you come to get the data. I'm going to use the Health Survey for England as an example. And here we are.
So this is the catalogue record for the Health Survey for England 2018. To start off, we can see the details of the study. It gives us the full title. There's information around the access conditions: it tells us the data is safeguarded, which means that you can download it straight away after registering with the UK Data Service. And there's information about who created the data here. If you scroll down, there's an abstract that provides detailed information about the study in general. This also provides notes about anything particular that has happened with that survey, so it's a really good source of information to start with. For example, these abstracts often included information about things that had happened to the survey during the pandemic. And then we can see a list of the main topics. Surveys like the Health Survey for England tend to have a range of core topics that they include each time the survey is run, and then there are additional topics that are rotated over the years. So if you want a quick idea of what was covered in that particular year, you can see that straight away on this page. Then if we scroll down, we can find details of the study design itself. We've got the fieldwork dates. We've got information around geography, and we can see things like the sample size and who's included in the population, and there are details about the sampling procedures there. So lots of information is immediately visible just by looking at this catalogue page. Then if the study that you've found seems suitable and you want to know more, you can go to the documentation tab at the top. With the documentation, we tend to get a mixture of different documents across different studies. It's not always the same mix, but there tend to be things like a user guide, and we can also see things like data set documentation.
So this provides information about what's actually in the data set, whereas the user guide will often contain details of things like the sample and response rates, and you might also find information about variables. And then what we find here is the UK Data Archive data dictionary. This is produced as part of the process of taking the data into the UK Data Service, and it provides a list of all the variables that are in the data set. So what I would like you to do now is an activity where you can have a little go at exploring some of this data documentation. I think Jill is going to pop a link into the chat. Here it is: worksheet one, catalogue and data documentation. So if you could go to this document and have a look. You'll be asked to look at the catalogue page and the documentation for a survey called the British Social Attitudes Survey, and there are some questions there to guide you. See if you can find the answers to these questions and make a note of your answers, because we'll have a little look at these at the end of the activity. We have about 15 minutes for this activity. If you have any questions in the meantime, pop them into the chat, and otherwise I'll look forward to seeing you back in 15 minutes and seeing what you've found.
Okay Jill, can we have a little look at the poll results? One of the questions that we asked in the worksheet was: which of the following topics did the BSA 2019 cover? The BSA, like lots of surveys, has core questions but also varies the questions across different years. So some years will cover some topics and others won't, and this allows it to pick up on topical issues like Brexit.
So the options were political party identification, Brexit, and crime, and most of you have selected the right options. We can see, if we look at the documentation, and we will do in a minute, that there's data on political party identification and also Brexit, but there's no mention of crime. So if I just click over to the catalogue page. This is the one for the British Social Attitudes Survey, and if we go down to the abstract, we find the details of the main topics here. You can obviously go and look at things like the questionnaire to get detailed information, but just having a look at the catalogue record instantly gives you an idea about what's there. Oh, can we share the results for that? Sorry, Jill, I think we didn't share the results. So let's just have a quick look at the poll. Yeah, most of you selected the two options, and it looks like there's no mention of crime in the survey. So if we stop sharing those now, okay. And if we continue looking at the catalogue record, we can find answers to the other questions. Here there are details of the fieldwork, so you can see exactly when the survey was conducted. For things like the British Social Attitudes Survey, which cover topical issues, that can be very useful information for seeing what was going on at the time. And we also have details of the number of cases here. The next question was: what does the variable with the name TV news measure? To get to this point and see further down into the data, we have to go and look at the documentation, and there are actually two options for us here. We've got the full questionnaire, so we could have a look through there, and we also have a variable list. This is a nice summary document that gives you an insight into what variables are there, and you can do things like search through them. So here we go, TV news, and it asks: how often do you watch all or part of a news programme on television?
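The keyword search through a variable list described above can be sketched in a few lines of Python. This is only an illustration: the list below is a toy stand-in, the spelling of the variable names and the third entry are assumptions, and the real variable list is a document you would search in a viewer or load from a file.

```python
# Toy variable list: (variable name, question wording).
# The first two wordings follow the workshop; the exact name spellings
# and the third entry are assumptions made for illustration only.
VARIABLE_LIST = [
    ("TVNews", "How often do you watch all or part of a news programme on television?"),
    ("Politics", "How much interest do you generally have in what is going on in politics?"),
    ("PartyID", "Which political party do you feel closest to?"),
]


def search_variables(term, variables=VARIABLE_LIST):
    """Return (name, wording) pairs whose name or wording contains the term."""
    needle = term.lower()
    return [
        (name, wording)
        for name, wording in variables
        if needle in name.lower() or needle in wording.lower()
    ]


# Print every variable that mentions the keyword, case-insensitively.
for name, wording in search_variables("politics"):
    print(f"{name}: {wording}")
```

Matching on both the name and the wording mirrors what happens in practice: sometimes the variable name itself is the keyword, and sometimes only the question text mentions it.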
So we can start to see what variables are there and what they're measuring. Now, we also asked: can you find a variable about interest in politics? So could you tell me in the chat whether you managed to find a variable about interest in politics? Brilliant. Yeah, I'm getting some responses. There are a lot of people just saying politics, and that's because, yes, there is a variable looking at it that is just called politics. So let's have a look. This time I'm going to look at the full questionnaire. Okay. I'm going to do a search, and this time I'm going to try searching for politics. I can see there are four references to politics. First, it says it's going to ask some questions about politics. And then if I look at the next mention, I can see there's a variable called politics, and it asks how much interest you generally have in what's going on in politics. You can see all of the response options, as well as how things like don't know and refusal are coded. Here the name in brackets indicates the variable name, so this is the variable that you'll find in your data set. And I've got a question here: what about the fieldwork dates and units from this study? There are different places you can look for this kind of information. In the catalogue record, under details, if you go right to the bottom, to coverage and methodology, we have the fieldwork dates. If you go into the user guide, there'll also be full details about the sampling, the response rate and the fieldwork. So for all of the information, you can go to the user guide. But this section in the catalogue record is quite useful for giving you that quick insight: how big is this data set? Does it come from a representative sample? So I'm now just going to finish off by looking at some of the key issues that come up.
So a lot of the quantitative data that's available comes from surveys, and a really big issue with surveys is sampling. One of the questions that will come up is: is the sample representative? Here we need to think about things such as: what were the sampling methods? Who was included in the sample? What was the response rate? And is there any information about non-response across particular groups in the population? Sometimes the data collectors can identify that certain groups in the population were less likely to respond. This is all information that you should find in the user guide. Another issue to consider around this is: are there weights to use? Many social surveys use weights to make the data better represent the population. The user guide provides details about any weights that have been made for use with the data. The UK Data Service also has some general resources about weighting, so if you're not familiar with issues around weighting and how to use weights, check those out to get that background information. A second issue to consider about samples is: are there enough cases to make a precise estimate? This question is especially important when you're interested in smaller population groups. For example, in relation to the disability research that we looked at before, the Crime Survey started off with a sample of around 46,000, but only around 9,000 within that were identified as having a limiting disability. So a similar analysis might not be possible if the survey had a smaller sample size to start off with. Also, for survey data, it's important to understand who was asked what questions. Computer-aided interviewing makes it easy to send respondents through the questionnaire by different routes, depending on their answers. So it might be that questions are only applicable to some of the sample.
So for instance, questions around jobs only make sense if they're asked of people who are in work, and this creates types of missing data in the data set: the people who weren't asked the question don't have a response for that variable. You can find out all of this information by looking in the documentation. So let's look at an example here. This is from the Labour Force Survey. There's a variable called FLEX10, which relates to special working arrangements, and the documentation includes the exact wording of the question along with the range of response options. Respondents were asked if they have any of these special arrangements for working, and underneath the question there is information about who this applied to. Here we get quite detailed information about all the previous questions that someone has to have been asked in order to be asked this particular question. We can also find information about what's been done with the data afterwards. For example, the LFS documentation includes information about derived variables. Derived variables are variables that have been created from one or more raw interview questions. For example, based on responses to that FLEX10 question about special arrangements, a new variable called FLEXW7 has been derived to indicate whether someone has a zero-hours contract or not. So the original question asks which special arrangements apply, and responses to that question have then been used to create a new variable that simply indicates whether someone has a zero-hours contract or not. In the LFS we have these flow diagrams to indicate how the variable was created. It's not presented like this in all survey documentation, but you'll usually see some information about how variables were derived.
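The points above about weights, routing and derived variables can be sketched together in a small Python example. Everything here is illustrative: the records, the weights, the response code for a zero-hours contract and the code for "not asked" are all assumptions, and the actual codes and derivation rules have to be taken from the survey documentation.

```python
NOT_ASKED = -9   # assumed code for respondents routed past the question
ZERO_HOURS = 7   # assumed response code for a zero-hours contract

# Toy respondents: raw multi-response answers (None means the question
# was not asked because of routing) and a survey weight. All values invented.
respondents = [
    {"id": 1, "flex10": [1, 7], "weight": 0.8},
    {"id": 2, "flex10": [3], "weight": 1.2},
    {"id": 3, "flex10": None, "weight": 1.0},
    {"id": 4, "flex10": [7], "weight": 0.5},
    {"id": 5, "flex10": [2], "weight": 1.5},
]


def derive_zero_hours(flex10):
    """Derive a yes/no indicator (1/0) from the raw answers, keeping 'not asked' distinct."""
    if flex10 is None:
        return NOT_ASKED
    return 1 if ZERO_HOURS in flex10 else 0


for r in respondents:
    r["zero_hours"] = derive_zero_hours(r["flex10"])

# Only people who were routed to the question belong in the denominator.
asked = [r for r in respondents if r["zero_hours"] != NOT_ASKED]

# Unweighted share: every respondent counts once.
unweighted = sum(1 for r in asked if r["zero_hours"] == 1) / len(asked)

# Weighted share: every respondent counts by their weight.
weighted = (
    sum(r["weight"] for r in asked if r["zero_hours"] == 1)
    / sum(r["weight"] for r in asked)
)

print("cases asked:", len(asked))
print("unweighted share:", unweighted)
print("weighted share:", round(weighted, 3))
```

The number of cases actually asked is worth printing before any estimate, because routing and subgroup selection can shrink a large sample quickly, and the gap between the weighted and unweighted shares shows why the weights described in the user guide matter.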
So those are some of the key issues that come up when we're getting familiar with a study and then working to understand the data that we find. And now I'm going to pass over to Maureen to show us how we get started with qualitative data.
All right. Thank you very much, Jen. I'm Maureen, and I'm going to talk about very similar things to Jen, but for qualitative data. First I'm going to go through a couple of different types of qualitative data reuse projects. I'll then walk you through a case study of one of those reuse projects, then do an overview of how to get started reusing data, including addressing a couple of key issues that arise when you're reusing qualitative data. And I'll also show you some special ways of finding qualitative data. Qualitative data reuse has been becoming more common in recent years, and the UK Data Service is certainly offering many more qualitative data sets in a much more accessible way than it was able to before. It used to be that if you wanted to reuse qualitative data, you'd have to actually come to the archive and sift through boxes of paper. But now a lot of the old paper-based collections are digitised, and we're also able to make some of this data searchable, which is really helpful. So there are many ways you can reuse qualitative data. You can quite simply give a description or understanding of a particular social or historical point in time. This is useful because you can see a lot more of the data than just what publications would reveal. You may not be able to see all of the data, depending on what's available in the archive, but you can certainly see more than just what was originally published. This means that you won't be limited to what researchers thought was salient for their research questions and topics. You can explore it further and see what would be of interest to your own questions.
Another way to reuse data is to analyse the methods used and look at lessons that might be gleaned about the most effective ways of, for example, sampling, collecting data or developing topic guides. One thing that's especially valuable to look at is how an interview was laid out before it was conducted, that is, what questions interviewers thought they were going to ask, and then to look at what was actually talked about in the interview. There may be many reasons why certain questions are or are not asked in those interviews. Some interview schedules are certainly designed to be a bit more flexible, and sometimes tangents come up and you just want to interrogate them further. In any case, it's an important researcher skill to have the intuition to know what to do, and you can't really see that unless you start comparing interview schedules with actual interview transcripts. Another reuse is called reanalysis, and reanalysis looks at the wide range of approaches you can take in the analysis of a data set. It usually means asking a different research question from what the original researchers were asking. For example, Clive Seale and Jonathan Charteris-Black did a study reusing illness narratives. The original illness narratives had been looked at exclusively for health research, so the researchers were really interested in some of the diagnostic decisions that were made. But when Seale and Charteris-Black came along to do a comparative keyword analysis, they were much more interested in the discussions between patients and doctors than in the actual health issues that came up in the interviews. So the questions can be very different in that kind of way, or sometimes a question could be similar to the original research but have a slightly different focus.
So for example, Joanna Bornat looked at gerontology as a topic, and she found two different data sets which looked specifically at this topic. But Bornat's research question was on racism, and that wasn't the focus of the original work for either study. Those interviews were rich enough to allow her to explore that theme within the existing data. The final type of reuse is going to be exemplified by a case study that I'm going to go through with you. This is a re-study, which is where you replicate the methods of a study for purposes of comparison. You might be looking at a historical comparison, which could help you demonstrate how society has changed over time, or you might be doing a comparison between key social characteristics, for example geography or social class, or a comparison on any other variable to show differences between subgroups. The example that I'm going to show you is from a reuse project called the School Leavers Study. The original study was conducted by Ray Pahl in the late 70s. It's part of a much wider community study on the Isle of Sheppey. The UK Data Service holds a number of collections related to that community study, but the School Leavers Study was a sub-collection that specifically looked at student aspirations. Pahl asked teachers to set a particular essay just before students were due to leave school, prompting them to imagine that they were reaching the end of their life and something made them think back to the time that they left school. They were then asked to write a short essay about what happened in their life over the next 30 to 40 years. In 2009, Graham Crow and Dawn Lyon decided to reanalyse the data set and focus solely on student aspirations. That is a picture of Graham Crow there with Ray Pahl, who is holding the book that he wrote out of that community study.
So using the same methodology, they conducted a re-study of school leavers on the Isle of Sheppey in 2009-2010. The prompt used during the data collection was nearly the same: imagine that you are at the end of your life and reflect back on what you've done since leaving school. They then transcribed the essays and compared the themes from the new set of essays with the set of essays that had been collected by Ray Pahl in the 1970s. You can see the wording of the prompt here and a small snippet of one of the essays there. There was a challenge to doing the re-study of this specific study. When Ray Pahl collected the data initially, he sort of stumbled upon teachers who were assigning an essay and asked them, can we use this? And it was the teachers who then gave the instruction. So there was perhaps a little less control over how the essay was presented to the students. The original essays also showed markup from the teachers, because those essays were actually graded. When Graham Crow did the re-study, the essays weren't marked and the research team had much more control over the essay prompt. Graham Crow goes into some detail about this in some of the publications, and he devised the prompt based on conversations with Ray Pahl about how the original study was conducted. This is a point Crow talks about a lot, and he comes to the conclusion that the overall picture painted by the essays as a collective still offers a valuable comparison. And the findings did show quite a shift in aspirations, as you might imagine. So here are some more details on what they received back. There is a slightly different gender divide, but similar amounts of data were received, and both data sets covered the general themes of health, education, career, and family and leisure, but they covered them in very different ways. So how were they different?
In 1978, students expected much more grounded and arguably mundane sorts of jobs. Career progression was gradual and it followed on from hard work, and sometimes there was talk of periods of unemployment or even death. And you can see a few of the examples in the left column of some of the quotations from those essays, such as the one at the bottom: I longed for something exciting and challenging, but yet again, I had to settle for second best. I began working in a large clothes factory. 2010, however, showed students who were imagining well-paid and instantaneous jobs, you know, filled with choice, but also a lot of uncertainty. And Crow and his research team also noted a clear influence of celebrity culture in those essays. So for example, you have the quote at the bottom of a girl who writes, in my future, I want to become either a dance teacher, a hairdresser or a professional show jumper, horse rider. If I do become a dancer, my dream would be to dance down say or some. And the impact of the study spans beyond just the interesting changes they've noted about young people's aspirations. The study was part of a much bigger community project on the past, present and future of the Isle of Sheppey. So the goal was to engage the community alongside the research and find innovative ways of including the participants in the research outputs. So as part of that initiative, they published the Living and Working on Sheppey website, and that has videos and artwork produced by the residents of the Isle of Sheppey, as well as ways for those who participated in the research, whether it was from the 70s or the current research, to stay in touch with each other and read about the history of their community. So they basically helped create a shared history and a shared memory among the community of what living on the Isle of Sheppey means.
So hopefully you are thinking about different types of projects you might do with qualitative data, but how might you go about finding qualitative data? And in terms of searching for data, qualitative data poses a bit of a challenge. Interview transcripts, essays and other types of qualitative data often hold far more information than just what an abstract or catalog page might say. So you might be missing out on a whole range of collections that could potentially touch on the topics that you're interested in, simply because no one has the time to sit and read all of the data for every collection out there. So we do have a tool which can help with this, and this is called Qualibank. Like the data catalog, you simply type in a keyword, but instead of searching through abstracts and catalog pages like the data catalog does, Qualibank actually searches through the data itself. So when you click on the search button in the data catalog, you'll see a Qualibank option appear underneath that search button, or you can just type in the address, which is ukdataservice.ac.uk forward slash Qualibank. And with this tool, you might be able to identify relevant interviews that are spread across different collections or find a collection where you didn't think the theme might come up. So in this example, I've typed in typhoid, and you can see that it's searched through and highlighted the data itself where that is mentioned. So the first couple of hits there are from the Morale and Home Intelligence Reports collections, but further down there are examples from the Edwardians interviews. And when you click on one of those search results in Qualibank, it brings you straight into the data, to the spot where that keyword is mentioned. And if you scroll to the top of the page, you can also see that there are external resources and collection documentation.
So if you click on those hyperlinks to the external resources, it would shift you down to the bottom of the page, which could include things like audio extracts of the transcripts, images related to that interview, or sometimes there are web resources. It's totally dependent on what's available, basically, that would complement that piece of data, but where something is available, we've tried to make sure that it's listed. Finally, one last feature of Qualibank. If you want to cite directly from an interview transcript or whatever piece of qualitative data you're looking at, you can simply click the create citation button, which is there in the left-hand menu right at the top, and then highlight the portions of text that you're interested in. And that create citation button will then turn into a retrieve citation button, which you can click, and you'll see a pop-up that looks just like this, and you can copy and paste this citation into whatever document you're working on. And it has a persistent identifier, which is the URL that you see at the end of the citation. So that would bring the readers of your work directly to the exact paragraph that you've highlighted in Qualibank. So it introduces a new layer of transparency to your work and it also helps you accurately cite the data that you're reusing. Okay, so we've covered the different types of reuse projects that you can do and how to find and access the data, but what about the process of actually analyzing the data? And the first thing you need to do, similarly, I think, to what Jen was talking about, is to orient yourself to the original research project. And I think the main point here is to not underestimate the amount of time it would take to get acquainted with the data sets. So there may be multiple levels of context to get through in order to really understand the data.
And what I mean by that is you may have more than just the data that's collected at the time of the interview, or whatever the data collection method is, but you may also need to consider the metadata of the participant, so what their social characteristics are, what the historical time period is in which the data was collected, or perhaps where the data was collected. So really the idea is that you need to understand the data set as a whole in order to get at the root of what the data can convey. So the documentation provided alongside the data set is really useful as a starting point for that. And it often contains more information about the methodology, so you might have something like an interview schedule or a call for participants, or sometimes there are segments from publications arising from the original study, or funding applications. And I've also seen some studies which have sections written up by the principal investigator about particular features of the data set. So for example, Annette Lawson conducted a study in the 1980s on adultery, and given the sensitivity of the topic at the time, the sampling became a primary focus for her and she ended up writing a 56-page document just on her sample. It was basically justifying why, you know, her sample came back as predominantly white middle-class women. So in my time working with qualitative data sets at the UK Data Service, I've also seen background contextual material that was taken from the area of research, such as meeting minutes, government pamphlets and letters from participants, and all of that can help paint a picture of what was going on around the study, and hopefully that would be included with the documentation. You may also need to consider the sample. So for example, if the data set is too large, you may need to take a sub-sample.
So this is perhaps less of an issue with qualitative research, since they're usually smaller studies anyway, but there are some collections which did get a large amount of funding and you may need to carefully consider what's feasible. So for example, the Edwardians collection that I mentioned earlier, which was put together by Paul Thompson, was widely considered to be the first oral history of Britain, and it contains 453 interviews of 80-plus pages each. So it would take a considerable amount of time to read and reread all of those. So you may need to take a sample of it. Conversely, you might find that the interviews from different data sets kind of complement each other and would make a new, larger data set that's useful to you if you combined them. So, as I mentioned, Joanna Bornat's work did this. You just need to be quite careful about making sure that the data sets are sort of harmonized, if you will, that they sort of match each other socially, politically and so on, or that you take into account whatever differences there are when you're analyzing them. Finally, you'll need to think through how you're going to approach the data. So you might use an inductive strategy, where you start with the data and then see what comes from that, or you could use a deductive strategy, where you have a firmer idea of what you're looking for within the data. And, you know, both are equally valid, but you need to consider what approach you're going to take as you get started. So this has been quite a brief overview of a couple of key points when you're getting started with qualitative data reuse. If you're looking for more guidance or discussion on these issues, then there are a few sources that I would highly recommend. The first and foremost is the SAGE handbook of qualitative secondary analysis, which just came out about 18 months ago.
So it's edited by Kahryn Hughes and Anna Tarrant, and it's a comprehensive guide to the issues around recontextualization, sampling and different types of reuse projects. So it's a really good kind of handbook to use. There's also a short single chapter out of Silverman's Qualitative Research, I think it's his most recent edition. Libby Bishop wrote this chapter specifically on reusing qualitative data, and it's filled with further examples of reuse and addresses some of these key issues I've mentioned in more depth. And if you have access to the book through your library, then I'd recommend starting with that chapter. And there's also the Timescapes Methods Guides Series, and those are available online. They're quite short. They're just a few pages. There's one from Sarah Irwin and Mandy Winterton that's pictured here on the slide, and that's another great way to sort of help you get started. And now we're going to do one more activity, exploring what we call a download bundle of data. So hopefully Jill is going to put into the chat the link for the next download activity. Excellent. And what I'm going to do is just share my screen of that. So here we are. So to have a look at the Pioneers of Social Research collection, you can either go to that link or go to our data catalog and type 6226 in, and it should bring it up. That's a sort of esoteric-looking number there, but that's a study number. It's a unique identifier for the collection. So you can just type it into the data catalog and it should bring it up. And this is an open data set. So you don't need to log in or anything to have a look through the download bundle. If you go to the access data button, the purple button in the corner, you should be able to scroll down and see an option that looks like this, and you can download a zipped file.
So what I'm going to ask you to do is have a little explore through some of the folders and some of the documents in there, and specifically find the data listing and the interview guide. Have a look for Stan Cohen's interview transcripts. So you'll need to find what his file name is, which you should be able to do on the data listing, and then just have a little flip through participant 20, Diana Leonard's interview summary and interview transcripts, just to look at a couple of different types of data there. So I'll give you about 10 minutes or so to have a look through and we'll just have a quick summary when you come back and see if there are any questions that have come up. Okay, I'm just going to pull up just a couple of things to look at. So I've already got the download bundle here, and you can see when you enter in there are two folders, the mrdoc folder, which is documentation, and the RTF folder, which in this case is the data. Sometimes the data is available as PDF, so that directory would be titled PDF. So if we go into the documentation, you have Excel and PDF, they're just the different formats of what's available. So you can see here there's what we call a U-list, and it gives you an at-a-glance view of the data set. So if we were looking at this, for example, and thinking about one of the other questions on Stan Cohen, we can have a scroll through and see who participated, and we see Stan Cohen's here. We've got some basic demographic details about him and then the text file names. So he's interviewee five. So we'll just keep that in mind for one of the other questions. Lauren, we can only see the exercise on the screen. I'm so sorry. So sorry. So is that better, Jen? Okay, so hopefully, just let me know if it's still stuck. Apologies about that.
So this is the data listing, and you can see the names of participants as well as some basic demographic details, and if we scroll down, we can see Stan Cohen is here and his file names start with 6226, which, remember, is the study number, and this is interview five. So his participant ID in this case is five, and you can see there are a couple of different files that are associated with Stan Cohen. You've also got, you know, how long those files are in pages. So his interview is 117 pages, and so on and so forth. Okay, so we've got the data listing there, but we can also look at the interview guide. And the interview guide gives you all of the questions that were planned to be asked within that interview. So it gives you what directions, basically, the interviewers were given. And then we need to find Stan Cohen's data. So if we go to the RTF folder, we can go into the RTFs, which open as Word documents, and we can see here interview five, which is hopefully Stan Cohen's interview. Yeah, so we can see Stan Cohen is written at the top here and this is the conversation that he had with his interviewer. And then finally, hopefully you will have seen that there are two different types of data in here available as RTF, and those are summaries and interview transcripts. So the interview summaries and the interview transcripts are similar in that the summaries are based off of those transcripts. But at the same time, you know, the summaries are much more condensed. So a summary gives you an at-a-glance view of what's going to be in that interview. So it might be helpful if you're working with large amounts of data. As in this case, where you've got interviews that are over 100 pages long, it's quite helpful to have those summaries, which are, you know, 10 to 20 pages each. So I'll just open up interview 20's summary so you can kind of see it almost gives us a sort of biography of what her life was, but not in an interview format, right?
They can be useful in analysis sometimes, depending on what you're trying to do, or they're just useful in terms of navigating large data sets or large interviews. A summary stays true to the kind of chronology, if you will, of the interview itself. We also see interview summaries come up where there may be reasons that the depositor can't share the interview itself. So if there are issues over data protection or concerns about the safety of the participant, they may opt to share interview summaries instead, so they can have a little bit more control over the level of detail that's offered.
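If you'd rather get that at-a-glance view of a download bundle with a script instead of clicking through the folders, here is a minimal Python sketch. It assumes you have already unzipped the bundle somewhere on your machine. The function names `list_bundle` and `find_files` are made up for this illustration, and the search token in the usage comment is hypothetical too, so check the data listing for the real file names before relying on it.

```python
from collections import defaultdict
from pathlib import Path


def list_bundle(root):
    """Group every file under `root` by its top-level folder.

    Files sitting directly in `root` are grouped under ".".
    Paths are returned relative to `root`, using forward slashes.
    """
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root)
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            groups[top].append(rel.as_posix())
    return dict(groups)


def find_files(root, token):
    """Return relative paths whose file name contains `token` (case-insensitive)."""
    root = Path(root)
    token = token.lower()
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and token in p.name.lower()
    )


# Example usage (the path and the token are placeholders, not real names):
# bundle = "path/to/unzipped/bundle"
# for folder, files in list_bundle(bundle).items():
#     print(folder, len(files), "files")
# print(find_files(bundle, "int005"))
```

This only lists and filters file names. It doesn't open or read any of the transcripts, so it's a way of navigating a large bundle, not of analyzing it.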