Okay. I think the number of people entering the room is slowing. So as it's time, I think we'll go ahead and get started. So welcome to this online workshop, Getting Started with Secondary Data Analysis. My name is Maureen and I'm here with Nigel. I'm a senior qualitative user support and training officer at the UK Data Service. Alongside this, I also teach at the University of Suffolk on the childhood and education routes. So a very big welcome, Nigel. Do you want to introduce yourself? Hi, I'm Nigel. I'm a research associate of the UK Data Service. My research interests are around housing, race and migration. So what we're going to cover today is talking about what secondary data is, thinking about how you reuse it, and going through some examples. So talking through key issues and the resources available. And there are some practical activities that we will ask you to take part in as we go along. You can put questions in the Q&A as we go along, but we will pick them up towards the end of the session. So the first question really is, have you reused existing data before? There's a poll coming up from Zoom. So if you just, okay, so it looks like there's a mixture really: around a third have reused quantitative data, a small number have reused qualitative data, and another third have used both. Around a third haven't used any. So that's great. Let's move on. So first of all, what is secondary data? And what we can see here, on the left hand side, is a data collector. So a data collector might be one of these three large bodies. So the Office for National Statistics, which collects major national surveys, census data, etc. And Understanding Society, who collect a longitudinal survey; we'll have a look at some of the documentation for a version of that later on. And NatCen Social Research. And also many of the ESRC research centers deposit their data with us.
This is primary research that has been collected and analyzed for a specific purpose, but is made available to other researchers to carry out secondary research. And that reanalysis may be for quite a different purpose to what the original data was collected for. So what are the advantages of reusing data? First of all, you can get data sets of a quality that is probably impossible to create with the amount of resource we have available as individual researchers. So it's cost effective. And all of those ethical issues about data collection are dealt with. You don't need to contact data subjects to ask about using the data. And it has been used by others, so you can make claims with it because it's got a certain validity. But there are some downsides which we need to work to overcome. So the first is, we don't have that insider understanding of the data and data collection. So there's some work required to get to know the data. There may be ethical issues about the way we use the data. And it may not directly match our research question. So if we had our own data and it didn't match our research question, we could maybe extend our data collection method. With secondary data, we don't have that option. So the first thing to say is that, you know, our task if we're going to reuse secondary data is to get to understand it and make a pragmatic decision about whether it's good enough to answer the research questions, or some of the research questions, we are posing. So just thinking about the kind of research process. We start off, you know, with a research question, we find data that will help us answer it, we evaluate it and we analyze it. But actually the process is really quite messy. So we might go around in various loops around that. So looking at the research question, we find data, and we find it doesn't quite match what we need. So we go back and either reframe the research question or look again for data.
Once we've evaluated it, similarly, we may well find it only answers part of what we want to look at. So you get the idea, and I'm sure many of you are familiar with the messiness of the research process. So we're going to jump quite quickly here to finding data. So what I'm going to do is put a link in the chat. So there's a link in the chat to the front page of the UK Data Service website. And in there, there's a welcome, and the first box you see is the one to search the data catalog. Now, to help you look for data, you can obviously, if you've got a research project yourself, use that. But if you haven't, there is also a worksheet, which the link will go in for now, with some suggestions for what you might look at. So we're going to just take five minutes to have a quick look at this catalog search. And just try a couple of keywords and you will get results coming up in a box of the things that are available. So the link's now in the chat. So if you want to just, we'll just give this five minutes and then we'll have a look at other ways that you might do it. And I will demo these as we go through. Okay, so you've had a chance now to have a look at the data. I'll now demonstrate it to you. So I'm just going to stop sharing that screen and start sharing the other one, which is the, so this was the screen I asked you to go to. Welcome to the UK Data Service, and the default is that you go in to the data catalog. So as an example, I said I was interested in housing research at the beginning. So I'm just going to type in housing and see what comes up. Now, if you look at the box on the right, you've got a set of surveys here: the English Housing Survey. It's got the housing stock data survey, which is a part of that survey. Now, if I was interested, for example, in refining this to look at more recent data, I could change the date. So here I've refined the date to 2021 to 2023. And as I look at the results, I get more data.
So some survey data on housing possession cases during COVID-19, for example. And some administrative data, the continuous recording of social housing lettings and sales. So there I've got a set of studies. If I look at the series, then I will get information here about the series: the English Housing Survey, the English House Condition Survey and the continuous recording of social housing lettings. So we saw all of those. So if I go into the English Housing Survey, it will give me some information about that. And when I go into accessing data, then it will give me data for the different years that are available. Doesn't look the best. So that's one way of finding data. The second way is to look through the Learning Hub. And here you will find other things as well as data. You will find information about particular types of data. So, for example, we're largely talking about survey data in the examples we use in the quantitative section. But if you're interested in international data like World Bank data and so on, you could go into this search area. And there's some supporting materials there. So it will take you through what is there, what kind of data we hold, and then a list of the different types of data sets. I'll leave the qualitative data because Maureen is going to go through that. Similarly with census. And if you're looking at data for teaching, we have a set of teaching data sets. So there's lots of different ways to find data. Other ways that we might think about that, and I'll go back to the slides now, is information from past webinars. So we do webinars on some of the major surveys. So coming up, we will be doing a webinar on a new data set that will be coming into the data service: the Evidence for Equality National Survey, which is a unique survey that has evidence for ethnic minorities that hasn't previously been available, new types of questions, and also sample sizes that make that easy to use.
So if you're interested in one that comes from a webinar, like the Labour Force Survey, the Family Resources Survey, or some of these new surveys, then you can find them from there. And we are investing in developing more materials, more asynchronous materials, videos that you can access on our YouTube channel to see what's going on. You might look at the theme pages that we just demonstrated. You might find material from previous webinars. But what you need to understand once you get at it is what was collected, who from, and when and where, and the kind of changes that may have been made. So I was going on to say that the UK Data Service simply holds data that's deposited by data owners. We don't make changes or anything else to it once it's been deposited with us. So there will be a number of things that have been done to that data, introducing different elements, derived data, for example, weights, etc., that you will need to understand. And those are, you know, reasonably simple for some data sets, but can get quite complex, particularly for longitudinal data sets. So the documentation will generally include a user guide, the original questionnaire and interview schedules, probably a data dictionary, and other resources. So what we're going to do now is to have a look at that documentation. So again, we've got a worksheet. So if you could have a look at that worksheet, which is catalog and document; Emma, if you could put the link into the chat. And what we're going to look at here is a study that was done by Understanding Society around COVID-19. So there are several waves of data collected during the COVID-19 pandemic to answer fairly immediate research questions. And based on that, we've developed a teaching data set, which is open access. So what I'd like you to do is to follow that link into Understanding Society and answer these questions. So the first couple are direct kinds of questions. So what's the observation unit for this survey and what country does it cover?
But then have a look in a bit more detail at what kind of topics there are in the data set. And finally, have a look at a study which used that data, which has been recorded in the resources, and then answer a couple of questions about that. Okay, so hopefully you had a look at the documentation and what kind of data is available. We're going to look at some types of data as well as some of the key issues you may face. So, first of all, in answering that series of questions, were you clear about what type of data was available from the Understanding Society data set? So, it was longitudinal data, so it's taken over time, and it's available for both households and individuals within households. What area did it cover? It was across the UK. So hopefully you all found that information from that worksheet. And exploring specific topics. I don't know. I mean, I looked actually at this data set in terms of debt, housing debt and other debt, but there are lots of different topics in there. Lots of research used it because it was a very powerful resource at that point, because most surveys take a bit longer to filter through. But in the second part, what we had was a study of care homes in England, and the kind of data being collected was text. So there are lots of different types of data here, but let's just pick out some of those key things. So cross sectional surveys are taken once, and they involve one set of respondents, whether people, families, households or businesses. So an example of this is the EVENS survey I talked about before, which is a cross sectional survey targeted at ethnic minorities in the UK. Repeated cross sections are used for some of our national surveys. So we looked at the English Housing Survey, which is carried out every year, but it's a repeated cross section, so it doesn't follow people over time, and it's targeted at individuals.
Longitudinal data is data like Understanding Society or the Labour Force Survey, where we follow individuals and households over time. And then we look at other types of data. So geographic data. So for any of you who have worked with census data, this will give us estimates of aspects of the population for different geographical areas. So you could look at breakdowns by local authority, by statistical census geographies, by wards, by regions or by countries. The last example on here is comparing countries over time. So a data set I've used quite a lot in teaching is the World Bank data set, which gives you information about countries over time. So you can see how populations have changed, and aspects of those countries in terms of economy, demographics, healthcare, etc. So here is an example of a repeated cross sectional survey. It was called the British Crime Survey. It's used quite widely for information about crime because it offers the ability to ask how people feel about crime. It identifies victims and asks them how they feel about different things. So it's different from sources like police records, which simply hold recorded crime information. It's carried out each year with around 35,000 individuals aged 16 plus and 3,000 young people. It identifies whether people have been a victim of a crime in the previous 12 months. So it's a random selection, and then people are asked whether they were a victim of a crime. And it also covers demographic characteristics like ethnicity, age, gender and social class, and attitudes to the police and the criminal justice system. So quite an important kind of research area for those interested in aspects of criminology. Data is stored as individual records that are anonymous. So you can't see any personal details. There are two different levels of access. So the standard end user license, which is available for people who are registered with the UK Data Service, gives a level of detail.
The secure access version has more sensitive information and is accessed through a secure agreement. You get it through a secure service, which both we and the ONS operate. You can find more information on that secure service on our website, but to summarize, in order to access data at that level of detail, you need to be accredited, which means you need to attend a training course and pass a test. Once you've done that, you then need to get approval for projects. So here's an example of what's held. So there's a number. There's sex, age. I've got things blocking part of my screen. And things about work. So it's quite a large survey. There are quite a lot of variables, but that's an example of the format. And here's a study that used it. So this study from 2013 looked at violence against people with disability in England and Wales. It used the British Crime Survey from 2009-10 because that survey had introduced disability measures. So there's a special license version of the data. There were 46,000 adults, and of those, 9,000 had at least one limiting disability. And key findings were that, having adjusted for age, sex and socioeconomic characteristics, disability increased the risk of experiencing violence. The levels of victimization were highest amongst those with mental health problems. And based on the fact that this sample is representative of the population, there are an estimated 116,000 victims of violence in England and Wales where the violence against them is attributable to disability. So an important kind of exploration around hate crime targeting a particular part of our population, which demonstrates the value of that survey, because there would be no other way to find that out from administrative records. What we do need to think about then is what that kind of method of surveying does, because we base it on samples. So first of all, is the sample representative? So who's included? Does it only include those who live in households?
So we might be missing out people in communal establishments. Does it only include adults, so we are missing younger people? And what's the response rate, and are there particular biases in that response rate? So are certain types of people, people with certain characteristics, less likely to complete the survey? One of the areas some colleagues of mine are particularly interested in is around elites and people with lots of money, and typically they don't respond well to surveys. So the English Housing Survey doesn't really reflect those extremely wealthy property owners in parts of London, in parts of the country, because their response rate to questionnaires and surveys is pretty minimal. And what we get from those different response rates is maybe the need to apply a weight to make the data representative. So a weight is a way of adjusting the count so that it is representative of the population. Let me give you an example here. If people from the Black African ethnic group are half as likely to reply to a questionnaire as the white British group, then in order to make those two equivalent and make similar claims about the population, we would need to double the count. So we use a weight of two for the Black African group. And these kinds of weights are derived from population statistics. So the Census 2021 is quite an important secondary source for developing weights for different types of surveys. So we start off with the basis that most of the surveys we talk about are randomly sampled. Having taken those random samples, we then say which groups are over and underrepresented, and use the census as a basis for that quite often. And then from that basis calculate different weights. I've got a quick question in there. Can we help researchers design weights? I think we have some training materials on weights that are being updated. But I think that's a question I probably need to take away.
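As a rough illustration of the weighting example above, where a group that is half as likely to respond gets a weight of two, here is a minimal Python sketch. The response rates and sample counts are made-up numbers, and the function names are hypothetical; real survey weights are derived by the data owners from population statistics and are documented in each survey's user guide, so this is not how any particular survey calculates them.

```python
# Hypothetical relative response rates (illustrative only):
# the Black African group is assumed to respond half as often.
relative_response_rate = {"White British": 1.0, "Black African": 0.5}

# Hypothetical unweighted counts of respondents in a sample.
sample_counts = {"White British": 800, "Black African": 50}


def response_weights(rates):
    """Weight each group by the inverse of its relative response rate."""
    return {group: 1.0 / rate for group, rate in rates.items()}


def weighted_counts(counts, weights):
    """Scale each group's raw count by its weight."""
    return {group: count * weights[group] for group, count in counts.items()}


weights = response_weights(relative_response_rate)
adjusted = weighted_counts(sample_counts, weights)

print(weights)   # weight of 2.0 for the under-responding group
print(adjusted)  # counts after adjusting for differential response
```

In practice you would not calculate weights like this yourself; you would apply the weight variable supplied in the released data set, following the guidance in the documentation.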
So it might be a good idea to email me at the end and I'll pick that up and come back to you outside of the session. Similar question there. So we have general information on our website about weights. They can get quite complex. So in the secondary data we use, there is generally guidance in the user guide on how weights have been calculated, how to use them, when to apply them. So if you look at a survey like the Labour Force Survey, there will be household weights and individual weights for the quarterly files, and there'll also be a different set of weights for the longitudinal five quarter files. We'll pick up anything later in questions. Okay, I think there's a few questions about weights. Maybe we can pick that up at the end. Do we have enough cases to make a precise estimate? And this is really important for small subpopulations. So for example, that COVID survey we talked about was used to make claims about vaccine hesitancy amongst different ethnic groups. But actually, within the sample, there weren't any people from the ethnic groups the claims were made about who were of the age to be eligible for vaccines. So the claim was generated from much younger people, whose views aren't necessarily the same as older people's. So I think underneath this, particularly where there are going to be small counts, you'll need to think about whether they're adequate for the kind of claims you want to make. When you use data, you should cite it. So for every record and data set, there's a citation within the documentation. And you can use the citation tool to copy and paste it into anything you're writing. So here is an example from the British Social Attitudes Survey of the citation you would put in. So this is kind of thinking about who was asked what. So one of the techniques that's quite widely used now is computer-aided interviewing.
And what that makes it easy to do is to take respondents through the questionnaire by different routes. So if you are older like me and you remember surveys that you might have done in the past, it would be: well, if you answered yes to question 13, then please proceed to question 18, and those kinds of directions that we had to follow ourselves for the paper surveys. What a computer-aided interview will do is actually do that automatic routing for you as you follow through. And many questions may only be applicable to some of the sample. So if you're asking questions about children, that clearly isn't applicable to people who don't have children. So those would be passed over in the kind of direct guiding through the survey. We also have some surveys like the British Social Attitudes Survey, where each respondent takes a portion of the questions. So it has a wider coverage of questions, but not everybody is asked the same ones. So there are different banks of questions in there. So if you do use that survey, you'll find the number of responses for some of the questions will be a quarter of the sample or half of the sample. And here's a specific question. So I'll let you read that, but it's kind of saying what's your arrangement? Do you have flexible working arrangements and what type are they? And underneath it, there's a set of logic about when it applies. And then I think that's the next stage. So once we've got that raw data that reflects the answers to the individual questions, we can do things to that data to manipulate it. The results are called derived variables. So when you look at the documentation, you may have the raw variables and derived variables identified separately. And this is the logic for whether a respondent works on a zero hours contract. So if you start, it's saying if the question answer is missing for one to three, then it's a missing value. If it's seven, then this person works on a zero hours contract.
And the no option is that the person doesn't work on a zero hours contract. So you can see that kind of flow generates an individual variable that may be more useful for your analysis. You don't have to repeat that bit of coding because those derived variables have been put into the data set that's released. Just a quick plug for census data. So we have access to the 2021 census data for England and Wales and Northern Ireland, and we'll have the 2022 census data for Scotland when they're available. So current availability: for England and Wales, aggregate data is there with univariate and multivariate defined tables. There is what they call a custom data set, and I'll show you an example of analysis done from that. Alongside that is boundary data, so that you can map that. Coming in the autumn is microdata, which is like survey data in that it's individual records with a number of variables. We'll hold a safeguarded version of that. It's a 5% sample either at local authority or combined local authority level, which has slightly less detail than the regional level data. There are also secure versions of those that will have 10% samples. So those are for individual data. There's also a 1% sample of households and a 10% secure sample. So those are safeguarded and secure. And there will be some open data that is available for teaching purposes. So it has about 20 variables and is a 1% sample. The other data coming in the autumn is flow data. So that is useful because it gives you the origin and destination, which can be used for understanding commuting. So it's fair to say that the 2021 census in England and Wales and Northern Ireland was during a period of lockdown. So that data is likely to be very different to what you might have seen in the past. But it has data on migration, student term time and home address. In the 2021 census, the data was worked so that whether students were at home or not, and whether addresses were term time addresses or not, are recorded in the data.
And also second addresses where people spend more than 30 days a year. So here's an example, from building a custom data set, of something that took me quite a long time to do for the 2011 census. So what this is showing is housing deprivation. So the percentage of individuals who are housing deprived by ethnicity and their year of arrival in the UK. And you can kind of see a general pattern that basically those who were born in the UK have a slightly higher level of housing deprivation, or a higher level. And the general pattern is increasing housing deprivation depending on how long ago you came to the UK. There's work to do on this, but I mean behind the scenes of this I suppose is the fact that housing deprivation is most likely to affect young people. So I think that might explain the difference in the way this chart is projected. But this took maybe an hour. I'm fairly confident in using census data to derive this data at national level. You can also look at it at regional and at some local authority levels, though some of the data becomes disclosive, particularly for the smaller ethnic groups. So at that point I'm going to stop and hand over to Maureen to talk about using secondary qualitative data. Thank you. Let me just switch screens here. The chart you shared is fascinating, especially, as you say, if they are younger people; depending on how much younger, if they're children, you'd expect some policies or something in place about the quality of housing, etc., so it's fascinating to see where that will take you. Alright, so I'm going to talk about very similar things as Nigel, but specifically for qualitative data. So it looked like there was a bit of use of qualitative data, unsurprisingly less than quant. I think there's more of a tradition within quantitative research to reuse existing data, especially from those larger representative surveys.
But qualitative is getting a little bit more common now, especially if you are using it for teaching and learning, so if you're using it for postgraduate research for example. So first I'm going to go through a couple of different types of qualitative data reuse projects, and then I'll walk you through a case study of one of those types of reuse. And then I'll do a quick overview on how to get started reusing the data, including addressing a couple of issues that arise when trying to reuse qualitative data. You'll see that they're kind of similar to what Nigel was talking about in terms of looking at the documentation and so on, but I'll give you the qualitative spin on it. And then I'll show you a couple of ways to find qualitative data. So yeah, it's definitely something that's becoming much more common in recent years. The UK Data Service is certainly offering more qualitative data sets in a much more accessible way than it's been able to before. So the UK Data Service, I think, and Nigel, correct me if you want to, is the largest qualitative collection of social science data in the world. Certainly within Europe, I think, but I think it actually has the most in the world as well, depending on how you think about qualitative collections and archives. It used to be that you'd need to actually go into an archive to sit down and sift through boxes in order to actually access and reuse qualitative data. But now it's much more downloadable, just like quant data; you can just click the download button and everything's available digitally. Even some of our older, more paper based historical collections are being digitized so that they're available for reuse that way. Okay, so there's a lot of different ways you can reuse qualitative data. You can quite simply just give a description or understanding of a particular social and historical point in time.
And why this is useful is because you can see more of the data than just what publications would reveal. So you might not be able to see all of the data depending on what's available in the archive, but you can certainly see more of the data than what was originally published. And this is useful because you won't be limited to what other researchers thought was salient for their research questions and topics. And instead you can explore it a bit further and see what would be of interest to your questions. Another way to reuse qualitative data is to consider analyzing the methods that are used and look at what lessons might be gleaned from the most effective ways of, for example, sampling or data collection methods, or developing topic guides. One thing that is especially valuable is to look at how an interview is laid out before the interview was conducted. So, for example, what questions the interviewers thought they were going to ask or what other preparation they had for the interview, and then look at what was actually talked about in the interviews. So there can be a lot of reasons why certain questions are, or are not asked in interviews, and some interview schedules are of course designed to be more flexible. Sometimes tangents just come up and you want to interrogate that further. But in any case, it's an important skill to be able to have the intuition to know what to do, and you can't really see that unless you start comparing the interview schedules to the actual interview transcripts. Another type of reuse is called reanalysis, which looks at the wide range of approaches you can take in the analysis of the data sets. So it usually means asking some kind of different research question from what the original researchers were trying to do. And this is probably, you know, the most kind of traditional or the kind of reanalysis that is typically thought about when we think of secondary analysis. 
So for example, Clive Seale and Charteris-Black did a study reusing some illness narratives. The original illness narratives had been looked at exclusively for health research. They were looking at developing the ways that they diagnose certain kinds of conditions. When Clive Seale and Charteris-Black came along to do the comparative keyword analysis, they were much more interested in analyzing the discussions between the doctors and patients, rather than the actual health issues that came up in the interviews. So the question can be very different in that kind of way. Or sometimes the question can look at a similar topic in a similar way as the original research, but have a slightly different focus. For example, Joanna Bornat looked at gerontology as a topic, and she found two different data sets that were specifically looking at that topic. But Bornat's research question was specifically on racism within medicine, which wasn't the focus of the original work, but the data sets were rich enough to allow her to explore that specific theme within the existing data. The final type of reuse is going to be exemplified by a case study that I'm going to go through with you. And this is a re-study, which is where you replicate the methods of a study for purposes of comparison. So you might be looking at a historical comparison, which would allow you to demonstrate how society has changed over time. Or it could be geographical, class, or any kind of comparison on any other variable to show differences between subgroups. So this example is from a reuse project. Well, the original project was called the school leavers study. And the original study was conducted by Ray Pahl in the late 70s as part of a much wider kind of community study on the Isle of Sheppey. So the 70s was kind of this popular time for these sorts of community studies, and researchers would go and kind of immerse themselves in the life and culture of an area.
So the school leavers study kind of arose from that. So there are a number of collections that are related to this specific study, but the school leavers study was looking at student aspirations. So Pahl found out that teachers at a local school were setting a particular kind of essay just before students were due to leave school. And it prompted them to imagine that they were reaching the end of their life, and something made them think back to the time that they left school. And so they were assigned to write a short essay on what happened in their life over the next 30 to 40 years. In 2009, Graham Crow and Dawn Lyon, and that's a picture of Graham Crow on the left there with Ray Pahl, decided to reanalyze this data set. And they wanted to focus solely on student aspirations. So they used the school leavers study portion of that. And they tried to set up the very same methodology, as best they could. And they conducted a re-study of school leavers on the Isle of Sheppey with students in the 2009-2010 academic year. So the prompt that was supplied to students in 2010 during their data collection, which you can see here, was nearly the same: you're at the end of your life, and you reflect back on what you've done since leaving school. And they transcribed the essays and compared the themes from the new set of essays to the set of essays that were collected by Ray Pahl. So you can see as well a snippet of one of those essays there. And there was a bit of a challenge to doing the re-study of this specific study. When Ray Pahl collected the data initially, he sort of stumbled into finding out teachers had assigned the essay. And they were able, at least at the time, before GDPR, etc., to share the essays with him. But he didn't have absolute control over how the essay was presented and how it was collected from students. And the originals also show some markup from teachers because they were graded.
When Graham Crow did the re-study, the essays were not marked, and the research team had more control over the essay prompt. Crow does go into some detail about this within his publications, and he devised the prompt based on conversations with Ray Pahl about his original study. So this is a point that they address: how valid is the comparison? And they come to the conclusion that the overall picture that's painted by the essays as a collective still offers a valuable comparison. So the findings do show a shift in aspirations, as you might imagine. And here's just a little bit more detail about what they received back: a slightly different gender divide, but a similar amount of data. Both sets of essays cover the same general themes of health, education, career, family and leisure, but they cover them in very different ways. So how exactly were they different? Well, in 1978 students expected much more grounded and arguably mundane sorts of jobs. Career progression was gradual and followed on from hard work. And sometimes there was talk of periods of unemployment or even, quite morbidly, early death or the death of a loved one. You can see a few examples in the left column there from some of the essays, such as the one at the bottom: "I longed for something exciting and challenging, but yet again I had to settle for second best. I began working in a large clothes factory." The essays collected in 2010, however, showed students imagining well-paid jobs obtained almost instantaneously. There was a lot of choice, but also a lot of uncertainty. Crow and his research team also noted a clear influence of celebrity culture within the essays. So for example, you have the quote at the bottom from a girl who writes: "In my future I want to become either a dance teacher, a hairdresser, or a professional show jumper, horse rider. If I do become a dancer, my dream would be to dance for Beyonce or someone really famous."
The impact of the study spans beyond just the interesting changes they've noted in young people's aspirations. There was a much bigger community project there on the past, present, and future of the Isle of Sheppey. The goal was to engage the community alongside the research and find innovative ways of including participants in research outputs. As part of this initiative, they published the Living and Working on Sheppey website, which has videos and artwork produced by residents of the Isle of Sheppey, as well as ways for those who participated in the research, both in the present and in the 70s, to stay in touch with each other and to read about the history of their community. So they helped to create a shared history and memory of what living on the Isle of Sheppey means among this community. So hopefully you are thinking about the different types of projects and what you might do with qualitative data. But how might you go about finding qualitative data? In terms of searching for data, qualitative data poses a bit of a challenge. Interview transcripts, essays, and other types of qualitative data often hold much more information than an abstract on the catalog page can convey. You can't just breeze through a variable list and figure out what an interview transcript can tell you. So you might be missing out on a whole range of collections that could potentially touch on the topics you're interested in, simply because nobody has the time to sit down and read all of the data for every collection. The tool which we've developed at the UK Data Service to help with this is called Qualibank. Like the data catalog, you simply type in a keyword, but instead of searching through abstracts and catalog pages, as our data catalog does, Qualibank actually searches through the data itself.
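To make that difference concrete, here is a minimal sketch in Python of searching abstracts versus searching the text of the transcripts themselves. It is only an illustration of the idea and not how Qualibank is actually implemented; the function name `search_transcripts`, the collection names, and the example texts are all made up for this sketch.

```python
import re


def search_transcripts(texts, keyword, window=40):
    """Return (name, snippet) pairs for every place the keyword occurs.

    texts: dict mapping a document name to its full text (hypothetical input).
    window: number of characters of context kept on each side of a hit.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in texts.items():
        for match in pattern.finditer(text):
            start = max(match.start() - window, 0)
            end = min(match.end() + window, len(text))
            hits.append((name, text[start:end].strip()))
    return hits


# Made-up example texts, not real archive data.
abstracts = {
    "collection_a": "Oral history interviews about family life and work.",
}
transcripts = {
    "collection_a/interview_01": "My sister caught typhoid that winter and we all had to move out.",
    "collection_a/interview_02": "We worked at the docks from the age of fourteen.",
}

print(search_transcripts(abstracts, "typhoid"))    # searching the abstract finds nothing
print(search_transcripts(transcripts, "typhoid"))  # searching the data finds interview_01
```

The abstract never mentions the keyword, so a catalog-style search misses the collection, while the full-text search returns the interview along with a little surrounding context.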
If you click on the search button in the data catalog, you'll see a link to Qualibank appear just below the search bar, or you can just type in ukdataservice.ac.uk/qualibank. With this tool, you might be able to identify relevant interviews that are spread across different collections, or you might find that a theme comes up in a collection where you didn't expect it. So in this example I've typed "typhoid" into the search, and you can see that it has searched through the data itself and highlighted where typhoid is mentioned. None of those collections have typhoid written anywhere in their catalog pages. The first couple of hits were from the Morale and Home Intelligence Reports collections, but further down there are also examples from the Edwardians interviews. When you click on one of those search results in Qualibank, as I did with one of the interviews that came up, it brings you straight to the spot in the interview where the keyword is mentioned. And if you scroll to the top of that page, you can see that there are links to external resources and collection documentation. If you click on, for example, the external resources, it brings you to the bottom of the page, which includes things like audio extracts of the transcripts, where available, images related to the interview, or sometimes some web resources as well. It's a little bit hit or miss whether there are external resources. Where we know about them and they're available, we link them in, but not every collection or piece of data necessarily has related external resources. All of them would have the collection documentation, though. You can also cite from Qualibank, which is the final feature.
So if you want to cite directly from an interview transcript, you simply click on the create citation button, which is in the upper left-hand menu, then highlight the portion of the transcript you're interested in. This works by utterance: every time you've got a speaker tag and an utterance, it highlights that whole piece. The create citation button will turn into a retrieve citation button, which you can click on, and you'll see a pop-up just like this. You can copy and paste this citation into the bibliography or reference list of your document. It has a persistent identifier, which is the URL that you see at the end of the citation. If one of the readers of your document were to click on that URL, it would bring them to the exact paragraph that you highlighted within Qualibank. So this introduces a new layer of transparency, of enhanced publication, potentially, to your work. And it also helps you accurately cite the data that you're reusing. Part of the appeal of qualitative research is the context that you get from doing that kind of research. But of course you lose that as soon as you start pulling extracts from your data. So this aims to help accomplish what qualitative research does so well, which is keeping that context in place. Okay, so we've covered different types of reuse projects that you can do with qualitative data. We've talked about finding and accessing the data. But what about the process of actually analyzing the data? The first thing that you'll need to do is to orient yourself to the original research project. And I think the main point here is not to underestimate the amount of time that it will take to get acquainted with the data set. There may be multiple levels of context to get through in order to really understand the data.
By that I mean you may need more than just the data that was collected at the time of the interview or data collection. You might also need to consider, for example, the metadata of the participants, the historical time period in which the data was collected, or where the data was collected. So really, the idea here is that you need to understand the data set as a whole in order to get at the root of what the data can convey. The documentation provided with the data set will be really useful as a starting point for that. It will often contain more information about the methodology. It might have things like the interview schedule or a call for participants, or sometimes it includes segments from publications arising from the original study. We've also had things like the funding applications that were submitted. I've also seen some studies which have sections written up by the principal investigator about particular features of the data set, such as the sample. So for example, Annette Lawson conducted a study in the 1980s on adultery. Given the sensitivity of the topic at the time, sampling became a primary focus for her: how do you recruit people to talk about a taboo subject? So she ended up writing a 56-page document just on her sample. In my time working with qualitative data sets at the UK Data Service, I've also seen background contextual material that was taken from an area of research: things like meeting minutes from the local council, government pamphlets, and letters from participants. All of this helps to paint a picture of what was going on around the study and would hopefully be included with the documentation. You may also need to consider the sample. This is, I think, perhaps a little bit simpler than the kind of sampling considerations that Nigel brought up for quantitative data. But for example, if the data set is too large, you may need to take a sub-sample of it.
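As a rough illustration of taking a sub-sample, the sketch below draws a simple random sample of interviews in Python. This is just one possible strategy, and a purposive sample may well suit a qualitative project better. The function name `draw_subsample`, the seed value, and the interview identifiers are hypothetical.

```python
import random


def draw_subsample(interview_ids, size, seed=2024):
    """Draw a reproducible simple random sample of interview identifiers."""
    if size > len(interview_ids):
        raise ValueError("sample size is larger than the collection")
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn later
    return sorted(rng.sample(list(interview_ids), size))


# Hypothetical identifiers standing in for the files in a large collection.
all_interviews = [f"int{n:03d}" for n in range(1, 201)]
subsample = draw_subsample(all_interviews, size=20)
print(len(subsample), subsample[:3])
```

Fixing the seed means you can document exactly which interviews you read and re-draw the same sample later, which helps when writing up how the sub-sample was chosen.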
Sub-sampling is usually not as much of an issue with qualitative research, since these are usually smaller studies anyway. But there are some collections which received a large amount of funding, and you'll have to carefully consider what's feasible. So for example, the Edwardians collection, which was put together by Paul Thompson and is widely considered to be the first oral history of Britain, contains 453 interviews of 80-plus pages each. This would take a considerable amount of time to read and reread, so you may need to take a sample. Conversely, you might find that interviews from different data sets complement each other, and that combining them would make a new, larger data set that's useful. Alternatively, you might be interested in a particular subgroup, say single mothers in Britain or something like that. So you'll need to think about who your sample is and what the strategy would be for working with the existing data to identify that sample. Finally, you'll need to think through how you will approach the data. You might use an inductive strategy, where you start with the data and see what comes from that. Or you might use more of a deductive strategy, where you have a firmer idea of what you're looking for within the data. Both are equally valid, but you'll need to consider what your approach is as you get started. So this was a very brief overview of a couple of key issues when getting started with qualitative data reuse. If you're looking for more guidance or discussion on these issues, there are a few sources that I would recommend. One is the Sage Handbook of Qualitative Secondary Analysis, which came out in 2021. It's edited by Kahryn Hughes and Anna Tarrant, and it's a comprehensive guide to issues around recontextualization, sampling, and the general reuse of qualitative data. There's also a short single chapter in Silverman's most recent edition of Qualitative Research.
Libby Bishop wrote that chapter specifically on reusing qualitative data. It's filled with further examples of reuse and addresses these key issues in a little bit more depth. If you have access to the book through your institution, I definitely recommend that particular chapter. There's also the Timescapes Methods Guide series, which is available online and openly accessible. The guides are really short, just a few pages, but there's one from Sarah Irwin and Mandy Winterton, guide number 19, which is another great guide to help get you started. So what we're going to do now is a practical activity of actually exploring a bit of data. The data set that we've selected for you is an open data set, so you don't necessarily need to have registered yet in order to access it. You should be able to just access the data from our catalog page. I think Emma hopefully can post our worksheet for this one in the chat. This is the third one, the download bundle. Thank you. Yeah, so the download bundle worksheet has just been posted in the chat, or you should have received it as well before this workshop. So, what time are we at? I'll give you, shall we say, about five minutes. Just get as far as you can. You can find the data based on the directions on that worksheet and make a start on exploring some of the data files that are in it. After about five minutes, I'll give you a very quick breeze through a couple of points to highlight about exploring the collection, and then we can take your questions at the end. I think I'm going to have to reshare each screen, which is a bit annoying, but we'll get there. So just to give you a quick overview: if you go to the catalog page, you just go to access data. Now, this data is available openly. Because of the nature of the project, explicit permission was sought for the data to be available as open. So you can just download it directly from here.
If it were safeguarded (something like 95% of our data sets have some layer of safeguarding), you would have to add it to your account, and so on, before downloading it. Most of those you can access as long as you're logged in, and there are a few that are more restricted, a little bit special. But this one's a little bit more straightforward. When you do download it, you should get a screen that looks like this. So let me just go to our download bundle. You can open up the bundle and there are a couple of folders here. One is mrdoc. It's a little bit of archivist terminology, but that's all your documentation: every collection that has documentation would have an mrdoc folder. In qualitative collections you usually have an RTF and/or a PDF folder, so after that it's by file type. If you go into the RTF folder, you'll see the interview transcripts, and there they are. If you go into the mrdoc folder, you can see that there's an Excel and a PDF folder. If you go into the Excel folder, you'll find the user list, or rather the data list, which is the first item that you were going to look at. So let me just show you one more thing, the data list, and then we can switch over to questions. This data list has all of the interview names, and you can see some of the key information that they chose to include: the year of birth, the place of birth, the gender of the participant, who did the interview, where the interview took place, how long the interview is, and what the data file name is. Right, so hopefully that gives you enough information to see why that data list is really useful. It's like the at-a-glance look at the data collection. And then from there, you can explore further.
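If you want to go from that at-a-glance view to a list of transcripts to read, a small Python sketch like the one below could help. It assumes the data list has been exported from Excel to CSV, and the column names (`interview_id`, `year_of_birth`, `gender`, `file_name`) and the rows are invented for illustration; a real data list will have its own headings, so check the file first.

```python
import csv
import io

# Made-up rows in the shape described above. A real data list is an Excel
# file, assumed here to have been exported to CSV with these column names.
DATA_LIST_CSV = """interview_id,year_of_birth,gender,file_name
int001,1931,F,int001.rtf
int002,1945,M,int002.rtf
int003,1938,F,int003.rtf
"""


def select_files(csv_text, gender=None, born_before=None):
    """Return the transcript file names for participants matching the filters."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if gender is not None and row["gender"] != gender:
            continue
        if born_before is not None and int(row["year_of_birth"]) >= born_before:
            continue
        selected.append(row["file_name"])
    return selected


# Women born before 1935 in the made-up list above.
print(select_files(DATA_LIST_CSV, gender="F", born_before=1935))
```

The returned file names can then be matched against the files in the RTF or PDF folder of the download bundle, so you only open the transcripts for the subgroup you are interested in.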
So this will be available, but here are some ways of getting in touch with us. You can reach the UK Data Service directly through JISCMail, through Twitter and Facebook, and we have a YouTube channel, which has a number of recordings and materials you may find useful. You can contact us by email, or you can submit a query through our website as well. So, yeah, this has been lovely. I hope we were able to answer all your questions, and I hope that you get started using our data. And yes, thanks very much.