Good morning, everyone. My name is Maureen Haaker and I'm going to be delivering this online workshop, Data in the Spotlight: Qualitative and Mixed Methods Data. I've worked with the UK Data Service for over 10 years now on everything from ingest to reuse projects, and I also teach research methods at the University of Suffolk. Here's an overview of what we're going to cover today. I'm going to say a little bit about the broader picture of the UK Data Service and where qualitative and mixed data fit within that bigger picture. I'll also give examples and details of what I think are some of our really good qualitative and mixed data collections, and I'll try to show you how they've been used in quite creative ways. I'll also cover how to find and access qualitative data specifically, which has a couple of different options from the general data that we hold, as well as some tips, resources and places to go for further help as you get started. Do be aware that we've got lots of online workshops and materials which go into further detail on things like finding and accessing data. So this is really just a quick overview, plus some of the more specifically qualitative features that we have. We've found that people want more of the primary topic but don't mind a little bit of this kind of review. So we've done a bit here, but if you want further information, we certainly have resources on that. So what is the UK Data Service? We're a comprehensive resource that's funded by the ESRC and the University of Essex. Our main job is to be the single point of access to a wide range of secondary social science data. So the main purpose is the collection of data, what we call ingest, the processing of that data, and then its further dissemination for other people to use. In addition to the data infrastructure core, we also have a service layer, which provides extensive support, training and guidance.
And who is it for? Well, we like to think it's really for anybody who's got an interest in data, but traditionally our main audience, and the people who probably both deposit and use our data the most, tend to be academic researchers and students. There are a lot of other groups that are well represented, including government analysts, charities, foundations, businesses, research centers and think tanks; all of them will give and use our data. Given the importance of data today, how it's used and how it's disseminated, we are trying to support a wide range of communities. And here's the front page of our website, which some of you will possibly have already seen when you were booking. This is the top half of the page, just to give you a bit of a flavor of what we do. There's a link to the main page there as well, but I'll go through where to find things that are specific to qualitative and mixed data a little bit later, because there are quite a lot of resources available through the website. Okay, and what kind of data do we hold? Well, this session is about qualitative and mixed data, but just so you know, the majority of the data, at least judging by the number of collections, is definitely quantitative. We hold, I think, over 7,000, possibly over 8,000 collections now, and at least two thirds of those are quantitative collections. And we hold a lot of variety in that kind of data. So there's survey data, both cross-sectional and longitudinal. We have some aggregate statistics. There's domestic and international macro data. There's census data, so that's aggregate data for 1971 to 2011 and microdata for some select years as well. But today I'm going to be focusing on qualitative and mixed data. So where does it come from? Again, that varies depending on the data type, and some of the sources that you see here, including agencies and statistical time series, are clearly our main sources for quantitative data.
Most of our qualitative data comes through individual academics. Some of them will be funded by research grants, and that's often the ESRC, but not exclusively. We also take data from other funders like the Wellcome Trust or Leverhulme, and there are others, as well as independently funded work. And some of the qualitative data was originally historical, paper-based material. So we do hold some public records, including things like the census. If you want a quick way to see all of the qualitative data we hold, we do have a listing for this. If you simply click Find Data at the top of our homepage, then click the middle option to browse and access data, and scroll down a little bit on the page, you'll see these sorts of boxes. These are the different types of data that we hold, and you can see qualitative and mixed data is one of the options. Right now we're getting close to about 2,000 collections, and they cover a huge range of topics, which you can also browse by theme. So this area collates a selection of high-quality data specifically related to some of these themes. Each of these themes will have different types of data within it, so it might list quantitative data as well, but there are qualitative examples for each of these themes, too. So let me tell you a little bit about some of the collections we hold, which hopefully will give you a sense of the range of qualitative data that we've got. The first study here demonstrates the quality and capacity of inclusive research with people who have learning disabilities. This is focus group-based research. Some of the focus groups are with people who have learning disabilities, and some are with researchers who themselves have learning disabilities and are doing research on learning disabilities.
So it's very much in the spirit of action research, of participatory research, and of course it's a really good example of focus group research. And just to note here as well that data reuse is a great way to access certain populations that are normally quite difficult to access, whether because they're physically difficult to reach, such as prisoners, where you might need to go through additional ethical review, which can be quite difficult to manage. Or in the case of disability, sometimes it just requires further resources to appropriately safeguard and address the needs of that group, so it may pose funding or ethical challenges. But reusing the data is a great way to access some of those populations. The second item here is Intergenerational Dynamics by Jennifer Mason, and it looks at whether people above the age of 50 consider themselves to be a generation, and whether they identify as a generation in contrast to other age groups, primarily younger age groups, or identify themselves differently than that. She's done a great deal of work on these kinds of aging and generational dynamics, and this is an example of interview-based data, and again very high-quality data. The next is Chronic Illness and Online Networking: Expectations, Assumptions and Everyday Realities. This is research on diabetes and the use of Facebook. One point to make with this particular one: we don't have loads of the data that has been considered for a while now to be new and novel, the big forms of data or social media data. We are getting some of it, though there are some ethical challenges to navigate. As we resolve some of those issues, we're able to take more of that kind of data. And this is an example of someone who used their Facebook contacts in order to study how people with diabetes form support groups and how they get information.
So it's quite creative. It's a different kind of data from the usual interview-based research, and it's a really good example of that. There are a couple of others as well that have used online tools. We have another one on shopping habits and eBay, for example. So we do see some of this kind of online-based research. Particularly with social media there are some ethical challenges, but increasingly we're able to work through that and work with researchers to make sure they're able to deposit at the end of their projects. The fourth one down is a bit of a mouthful. It's Coalitional Presidentialism in Comparative Perspective: Minority Executives in Multiparty Systems. This one's not quite as accessible as some of the others, but the point I wanted to make with it is that you don't just get individual-based data in qualitative and mixed studies. This study looks at presidential coalitions across a number of countries, so it's looking across Europe, Africa and Asia. And it just makes the point that even with qualitative research, you can do this kind of macro-scale comparative work. This study, I think, is a really good example of that. Family Life and Work Experience Before 1918 is our founding collection for the qualitative part of the archive. So this one is not new. But I always mention it, just because it is the reason that we have qualitative data as part of the UK Data Service. It comprises over 400 oral history interviews, and it's widely considered to be the first oral history project in Britain. It's very well used, so I don't really need to promote it, but I just want to signpost it as the typical kind of collection that many people probably think of when they think of qualitative data, though perhaps not this large.
So some of these interviews are over 80 pages, you know, several hours of interviewing with individuals. And these were done with people who were born during the Edwardian era. But yeah, it's an incredibly rich collection. So what I'm going to do now is give a quick whirlwind tour of three of the collections that we hold and tell you a little bit about what they did and how they've been used. The first of these is The Last Refuge. I've put in the SN here, which might seem a bit esoteric if you're new to the UK Data Service, but all of our collections have a study number associated with them. You can find our collections through lots of other tools, but if you do happen to run across a study number, it's a really quick way to find that specific collection. So it's really useful to know about; it's basically a unique ID for each of our collections. I'll talk about the data catalog in a bit, but if you just type the SN into our data catalog, it will bring that specific study up, rather than you having to search through names or keywords. So The Last Refuge was conducted by Peter Townsend, and it's a major investigation into long-stay institutional care for old people in Britain in the late 1950s. And again, a bit like Family Life and Work Experience Before 1918, this study was really exceptional for its breadth and its range, especially given the time that it was done. This collection has in-depth interviews with 67 local authority chief welfare officers and with serving staff and residents of 173 institutions. It's also an important collection because of the diversity of content that it has. It wasn't just interviews; there were also photographs, field notes and diaries. And I think another feature that is quite important to this collection is that the diaries were recorded by both members of staff and residents of the institutions.
So I think this is one of the few examples where there are voices of residents of long-stay institutions. And here we have some of the images, just so you can see the richness of what's available with the data collection. Having these, I think, helps you visualize a little bit more of the context of the collection. The reason that this collection is so influential is twofold. One is that it was a pioneering use of qualitative data in the area of old age, retirement, isolation, services and so forth. Some quantitative work had been done, but the use of qualitative data with this kind of direct policy relevance was new with Townsend's work. And the second reason is that this study had major policy implications, because Townsend's results changed thinking about whether long-stay institutions were really the best place for the elderly. I think part of the conclusion of the book is that for many elderly people, in some circumstances, it's not the right form of institution. But it also had practical implications, in that it recognized that people needed not just criticism but also some suggestions, and he came up with recommended improvements that the institutions could then adopt. The next study I'm going to tell you a little bit about is one of my favorites. It's called the School Leavers Study. In this case, the original data was collected in 1978 by a researcher named Ray Pahl, and this work was done with students on the Isle of Sheppey. Sheppey, for those who might be joining from further afield, is an island off the east part of the UK. Ray Pahl spent a long time studying a lot of different aspects of living and working on the Isle of Sheppey, which included things like employment, education, religion and family. So a huge range of issues. And somewhere in that process, he got quite interested in schools and the educational experience.
And he found out about a situation where teachers at one of the comprehensive schools were asked to set a particular kind of essay for pupils just before they were due to leave school. So this is roughly when they're 16 years old. And I think this is perhaps a difficult kind of essay. They were asked to imagine that they were nearing the end of their life and something made them think back to the time that they left school. They then had to write an imaginary account of how their life played out after leaving school at 16, looking back from some 30 or 40 years after. So this isn't just asking what you want to do in future. It's asking you to imagine a life that hasn't been lived yet. Despite it being what I think would be a difficult exercise to engage in, the students did a really impressive job of providing details, which allowed a quick glimpse into what their aspirations might be, or their expectations and the realities of the lives that they were facing. From that exercise we've got 141 of these essays; 89 of them are from boys and 52 from girls. And as you can see, briefly, there's still a lot to be done with these essays, so this is just highlighting a few key points. But some of the themes that came out were realism and possibly pessimism. Young people talked about mundane but quite stable and grounded jobs. They imagined a very gradual career progression, if they progressed at all. And there were also stories of stagnation, unemployment, and, somewhat morbidly, some predicted their own early death or the early death of one of their loved ones. The quotes there give you a little bit of a flavor of the tone. Not all of these essays necessarily covered those specific themes, but they were strong themes throughout the data set. So very realistic, very down-to-earth kinds of commentary.
So, for example, you have the one boy who said, it was hard finding a job, I failed a few chances, but eventually got what I wanted locally. Or another boy who said, I was on the dole for six months after leaving school, until I got a job in a garage. Or the girl at the bottom who said, I longed for something exciting and challenging, but yet again, I had to settle for second best. I began working in a large clothes factory. So it gives you an idea of the tone of those essays. And what's quite interesting is that contemporary researchers, Graham Crow and Dawn Lyon, went in during 2009 and 2010 and tried a similar exercise. It was aimed to be a replication, though it wasn't in every sense a strict replication. Ray Pahl had more or less stumbled upon this assignment that teachers had set, so there wasn't a lot of control over how the essays were put to students. But Graham Crow worked with Ray Pahl to try and replicate it as well as he could, and it's one of the points that I think he covers in some of his publications. So I think it's still a useful re-study. We now have this collection of new data, which offers a comparison with the present day. Like the first collection, it asks school leavers on Sheppey to imagine their lives, but we get a really different story. For Sheppey's young people in about 2010, we hear very different themes. So again, a similar sort of sample, 110 essays, a bit more evenly split between boys and girls. And generally, the present-day Sheppey students covered similar themes, but in a very different way. They included stories about being well paid and getting instantaneous jobs, including signing contracts for football or singing. There was a lot of choice, but also a lot of uncertainty. And we clearly see an influence of celebrity culture as well.
And again, you get a real feel for the re-study by looking at some of the quotes, to see how young people in 2010 had a very different view of the world from young people in the late 1970s. So one boy wrote that he had an amazing band, toured the world three times and sold 4 million records. Another thought he would like to own a three-bedroom luxury villa, a helicopter and a Bentley. And finally, the girl at the bottom wanted to be anything from a dancer to a hairdresser to a horse rider, and if a dancer, her dream would be to dance for Beyoncé or someone really famous. So quite a shift in possibilities for young people today. In terms of the impact of this particular study, it was a bit different from The Last Refuge. As part of the re-study, Graham Crow and Dawn Lyon engaged with the community on the Isle of Sheppey and actively sought to involve them in the process of data creation and analysis. As part of that initiative, they created a central website hub called Living and Working on the Isle of Sheppey, which you can see a screenshot of here. This included an archive, so that community members could see the original study as well as what was being added as part of the re-study. It also gave the community a bit of written and oral history and engaged them in exploring their community from different perspectives. So there are videos, there are art galleries, and there's audio, so people can really experience the lived memory of their community. And finally, the third study I'll look at is a very well-known mixed methods study called the National Child Development Study. The National Child Development Study has different waves, but originally it followed the lives of 17,000 people who were born in the same week of 1958. From that point, people were surveyed every several years.
So I think initially it was a little bit more intense, where they would have surveys every year, and then it broadened out to sometimes every five years. Participants have been surveyed, I think, 10 or 11 times now since the initial contact in 1958. They recently did a big push for data as participants turned 60 in 2018, asking them to supply new kinds of data. The first survey was done when participants were about seven years old. I should say, the first survey done with the participants themselves, and not their parents, was done when they were about seven years old. What they had collected prior to that was information about their physical development and education, the economic circumstances of the parents and family, the employment of their family, and the health, wellbeing and social participation of the family. So it's absolutely sweeping in terms of what's included there, and even through COVID they were continuing to collect information. I believe they have tissue samples, blood samples, that sort of thing as well, to support medical research. So from the standpoint of quantitative data, it's certainly a really, really good collection. But what makes it a really fantastic collection is that there is a qualitative addition to it, which allows for very rich mixed methods research to be done. In 1969, when the children were aged 11, they set an essay which is very similar to the school leavers essays, but a little bit different in that the children were asked to imagine they were 25. So they only had to project ahead about 15 years. They got over 500 essays from their participants, and the value of the collection, I think, is twofold. One is that the sample size is quite large for a qualitative collection. The essays themselves are quite short, but you have 500 of them.
And the second is that you have the quantitative data to compare with that, to see what actually happened to them at 25. So not only can you look at what they thought would happen, but you can then look at their lives at 25, and now also at 45 and 55, and see how things unfolded. There are a number of articles that have been written using this data collection. Jane Elliott, whose publication I've shown here, which looks at gender specifically, is probably one of the best-known people who've reused this data set. Okay, so hopefully, looking at some of these, I've persuaded you that using qualitative and mixed data is a great idea, and you might want to try and get your hands on some of it. So I'm just going to talk through how you can do that. You can do that through our main data catalog, which is aptly named. This slide goes through how you would conduct a search if you specifically wanted to filter for qualitative data. When you click Find Data at the top of our homepage, it will bring you to the data catalog. Just scroll down a little bit and you will see a very tempting search bar where you can begin your search. It works a bit like Google: you just type in some keywords, and it will search through the catalog pages and bring back anything that has that keyword in it. Once you have started a search, a left-hand menu will appear, and you've got some options there for filtering your search. Here I've set the filter for qualitative and mixed methods data, and I've specified the UK; you can specify other areas if you want to. I've just picked a general search term, food. And as you can see here, there are 39 collections, specifically from the UK, that mention food somewhere in their catalog record. This could be in the abstract, in the title, in the keywords, or in any of the subjects that are identified for that particular collection.
So you can see a couple of the examples here: there's consumer trust and traceability, and older people's perceptions and experiences of strengths and vulnerability in the UK's food system. So that's how you would search using the data catalog specifically for qualitative data. There are a few other filters as well. If you were interested in refining the date, for example to data from the past five or 10 years, you can do that too. There are separate webinars, tutorials and guides available if you want further information on using our data catalog. But we do have another unique way to search for qualitative material, and that is Qualibank. This is an online resource that allows you to search, browse and cite qualitative materials. You can get to it by simply clicking on Qualibank, which shows just below the search bar in our data catalog. Qualibank allows you to look specifically through some of our qualitative collections. There are three key issues that arise when you're trying to search for qualitative and mixed methods data. I'm going to talk through these issues as we go along, but in short, these are finding the right kind of data for your project, recontextualizing that data, and then citing the data. Qualibank is really meant to be a solution to some of these issues. If you want to search using Qualibank, it works the same way as the data catalog. Here I've tried searching for typhoid, and as you can see, again like the data catalog, I've come up with 20 results. But what's different about how Qualibank searches is that it actually searches through the data itself. So if you're looking for qualitative data that specifically has a keyword or concept in it, you can use Qualibank to search through the data itself rather than just the catalog record. And this is really a key issue in reusing qualitative data.
How do you find the right data for your project? The nature of qualitative data is such that it's rich, in-depth and detailed, and could potentially answer a lot of different research questions beyond just what the original investigators chose to look at. So taking the time to familiarize yourself with all of the data that's available would be extremely time-consuming. What Qualibank does is try to resolve this by searching through the data itself, so hopefully you'll have a better idea of whether the data is right for your project. Right now Qualibank has about 50 collections uploaded into its database. This is something that we hope to expand in future. You can browse through that data if you know what collection you want to look through, or if you have certain criteria for a participant's age, gender, socio-economic status or region, you can refine on those characteristics using the left-hand menu. You will need to hit Refine once you've filtered. That's a little bit different from the data catalog, but it means you can actually search through and filter specific participants if you want. So from the search on typhoid, I then chose to look at the interview with Mrs. Omissen, which was a little bit further down my search page. And you can see here that the interview transcript itself is laid out in the web page. You will need to register and log in to Qualibank in order to access it, because it actually pulls the data up. And what's really nice is that your search term will be highlighted within that transcript. So you can skip to the part where it's actually mentioned, and more quickly assess whether or not you're interested in that data. Qualibank also has links to external resources. Up at the top here, you can see that I've just circled that.
If you click on that link, it will take you to the bottom of the page, after the transcript finishes, and there will be a variety of resources related to this particular interview. That includes an audio extract of this interview and a book extract about the Edwardians. There are a couple of images that are related to the wider collection, and then there's some metadata about the participant as well. So it's a really useful tool to help you contextualize the qualitative data. And this added context, whether it's accompanying audio, expanded discussion of the data from the original investigators, or perhaps some other project which reused the data, will help you recontextualize and better understand that data. So you'll know what the limitations are and the opportunities within that data. And again, Qualibank gives you direct access to those through the additional resources. And finally, there's also the ability to cite using Qualibank. We all know how important it is to cite your work appropriately, and this is no different when you're reusing data. Qualibank automatically generates citations for you, which you can literally just copy and paste into your work. On the left-hand side, you'll see there's a Create Citation button. You would click on that and then highlight with your cursor the portion of the text that you want to cite. That Create Citation button will turn into Retrieve Citation, so you can click on that, and a small pop-up window, which looks a bit like this, will appear with your citation. This citation is very specific, though: it directs viewers to the specific passage that you've highlighted. So if someone looking at the citation were to click on the persistent identifier, that URL at the end of the citation, it would bring them specifically to the highlighted portion of that transcript. This, I think, addresses the third issue in reusing qualitative data, which is all about evidencing your conclusions.
So you don't just cite the general collection or even the data file; you can actually get a much finer-grained reference, so others can see the quotation in the full context of the interview or data set. This really helps, I think, to add weight to your conclusions, and it demonstrates transparency in your processes and your analysis. Qualibank as a tool is something we're still expanding at the archive. We're still working on support materials to help you use it effectively, and on expanding its database. So if you have any queries about it, please feel free to get in touch with us. We'd be really happy to hear about how you are using Qualibank, how you're finding it, and what other ideas you might have for how it can be developed. So now we're going to do a quick activity on finding data. Hopefully there is a... Yes, thank you, Gail. So Gail's just popped a link to our finding data worksheet into the chat. What I'm going to do now is give you 10 minutes to have a search through. There are a couple of guided tasks to get you looking through our data catalog, and then there are some directions about searching for some of your own topics and just seeing what's out there. So if you follow the directions on the worksheet, I'll give you 10 minutes or so. If you have any questions as you go along, please do feel free to pop those in the chat or the Q&A box and I'll try and sort through some of them. Okay, so I'm just going to set my timer now for 10 minutes and give you some time to work on that. So we've got just a couple of questions. Can you search for topics within, such as drawing and education, in qualitative data? So I think, do you mean searching for specific types of data, such as drawings? Or types of qualitative data, I should say. Because we don't tag data that way, you'd have to look through the methodology section of the collections.
Okay, so you could try searching to see if drawing comes up. I'll show you in just a moment. Let me get up my PowerPoint, I'm going to pull that up in just a moment and do a new share. So hopefully you can see this. If we just search, for example, drawing plus education, let's see what comes up. I haven't actually tried this before, so I'm not sure. You can then filter by data type. So we've got five collections here, and I think what you would probably need to do is look under the methodology section. I'm just going to click on the first one, and hopefully this has enough detail. Here is where you would need to see what kind of data they've got. So this is text data. They should usually expand a little bit on what they've done, so in the method of data collection they would say somewhere that they've collected drawings. So you'd have to search for it that way. My thought was to search for image or drawing, or perhaps even PDF, because those drawings might be PDFs. If you did something like that, I think it would pick up anything within that methodology section and hopefully bring that back. So hopefully that's useful. But we don't tag the specific kind of qualitative data, if that makes sense, so you wouldn't be able to search that way. And there was, I think, another question, which has just disappeared on me. If you do have a question, feel free to pop it in the Q&A. We've had another question come in that says, is the service free to use? Yes, the service is funded by the ESRC and the University of Essex, so between the two of them they fund this service so that it's free to use. The only thing is, if you want to access the data, which I'll talk about in just a little bit, you may need to register, but you should be able to use any university credentials to register quite easily.
Somebody has also asked about handwritten data. So the handwritten data, ideally, would be something that is transcribed so that it's sort of enhanced, if you will, through transcription so that you can use both the machine-readable format of the data, as well as potentially seeing the handwritten. So if there's no reason to preclude using the handwritten, like an image of the handwritten data, that would be made available as well. So the school leavers study, for example, is one of those collections where you can see an image of PDF of the handwritten data as well as the transcribed version, which is machine-readable. There might be instances where something is available handwritten and it's not supplied for whatever reason. It may be just the transcription that is available, but we do have at least a couple collections with some handwritten notes in there. So I'm just going to go back to my slides and resume share. Here we are. Okay, I think we're back. So hopefully you've been able to find a few things and just be able to explore the data catalog. That's what somebody was going to ask actually about. I now remember the question seems to disappear from the chat, but somebody asked about the filters. So make sure you distinguish between Qualibank filters and data catalog filters. The data catalog filters just have a couple of options where you can, where you can in the left-hand side do like menu type. You have to actually do a search first. So you'll see under find data, there's just a bar. So you have to actually search something and then you can filter the search. So Qualibank has different search options than the data catalog, and it searches differently because you're searching through the data. So the filters on Qualibank will search through participant metadata, basically. Okay. So, yeah, but somebody is, let me just, let me just go back to sharing my bear with me a moment. So somebody's just asked about how to find Qualibank. 
So unless it's changed on me in the last moment, if you just search something on our data catalog, you should be able to find the link right here under the search bar. It's just here in the corner. You can also get to it by going to our website: ukdataservice.ac.uk, forward slash Qualibank would be the URL for that. Right. I'm just going to navigate back again. There we are. Okay. So hopefully after searching you were able to find a few things. I just wanted to say that we always have new data coming in. These are just a few examples of some of the recent acquisitions to the qualitative collection. You can see here that there's lots of data on Brexit, COVID and LGBT rights, as well as collections on climate change, health and wellbeing. And a lot of the collections, increasingly to my eye, are done with vulnerable and hard-to-reach populations: refugees and children, for example, or those with disabilities. There are also a lot of international collections coming in. There's a lot of research that's been done in Africa and Asia, for example, and they are depositing those collections with us. So it's always worth checking back. Those collections are updated literally on a daily basis, so as people deposit data with us, we make it available. You can subscribe to our JISCMail list and we will send out notifications of which collections have been added that week. It'll be a mix of quant and qual data, but you would be able to see what's being added as it comes in. Most of the collections that we receive now, if not all of them, are born digital, so they're usually available just for standard safeguarded download. I'll go through some of our access conditions in a little bit, but hopefully this gives you an idea of what kind of collections we've had coming in.
So now I'll talk a little bit about access and some other tools and resources that will help you get started. Someone asked if the resources are free. Yes, everything is free. The data is freely available to anyone who registers with us. We do, however, have material that's held under different licenses. For example, we don't have a lot of qualitative collections like this, but we do have a couple that are open data, and with open data you basically just need to go to the catalog page and it will be available for you to download. We'll have an exercise in a bit where you can actually have a look at a download bundle, and that is from one of our open collections. But most of our data is held under a safeguarded license. This means that you have to register with us and tell us what you want to use the data for before you are able to access it. Any conditions of the data that you would need to meet are listed on the catalog record. There's an access section of the catalog record which will tell you if it's safeguarded or open. As long as you're looking to use the data for non-commercial purposes, the data are going to be free to use once you register with us. You may also see that under our safeguarded collections there are some additional restrictions. Every once in a while you might come across, for example, an embargoed study, which means that the data is not available until a specific date. Or it might be available as permission-only access, which means that the archive will need to liaise with the depositor before the data can be released. The depositor would review your project summary, and once they give permission for it to be released, we would then be able to release it. So most of the collections, probably about 95% I'd say, are available under standard safeguarded licenses, with just a few, definitely less than 1%, available as open.
There are all sorts of challenges with making data that open when it's qualitative. The nature of qualitative data is such that it tends to have a lot of indirect identifiers within the data, so we would need explicit permission from participants to consider making it open. In terms of the data format, the data is available in multiple formats. For qualitative data that is text, you'll usually get some sort of word processing document like an RTF, or perhaps a .doc format, or sometimes searchable PDFs. We do hold literally, I think, about two collections that have qualitative software packages, like an NVivo package, for example. But we've asked for the data to also be made available not just in the package, but as RTF as well. So the package would ideally be something that is in addition to those text documents. There's also some image and audio data, and a few collections with video as well. We don't have lots of collections with images, audio or video. Part of it is just the fact that that is personal data, which makes it more challenging to make it available and to safeguard participants in doing so. And part of it is that we're working with limited space. Some of those image, audio and especially video files are very, very large. So sometimes we get offered video, for example, and the space presents a bit of a problem for us. Perhaps not in future, as we might be switching to the cloud or something. We're looking at what some of our options are, but we're still working with physical servers that are located at the university, so there are space issues where video is concerned. If you are working with a UK university, you would just use your own username and password to get in. If not, you'll need to make an application, assuming that you meet the criteria and agree to the terms and conditions, which I'm going to talk about on the next slide.
Then you'll be issued with a UK data service specific username and password, which you can then use to get access to the data. Okay. In terms of registering, it would probably take me longer to read through this slide and explain registration than it would to actually do the registration. The thing that I do want to point out about this process is that when you are registering, you put in your username and password, and it takes you to a page where you have to fill in your information. There is a box there that you have to tick to say that you accept the terms and conditions of the end user license. The end user license is a legal document, and it's as exciting as any legal document you've seen, but there are two points in it that I think are really, really important, particularly for qualitative researchers. The first is that if you sign up to use data from the UK data service and you, even inadvertently, figure out the identity of one of the participants in the collection that you're looking at, this end user license stipulates that you won't disclose that identity. Most of the time, you wouldn't be able to uncover the identities of participants because the data has been pseudonymized and, where necessary, additional restrictions have been put into place. But even if you should, you're not permitted to disclose that identity. The other promise that you make with the end user license is that you won't share the data onward, and that means with anybody. So if you're working with a research team or with your students, again, you wouldn't be able to share the data with them. We've got lots of resources that are available for students and researchers, so they can always register separately. But those are the two key points: don't disclose identities and don't share the data onward. If anybody is interested in reusing the data for teaching, a lot of qualitative reuse is done through teaching.
You would need all of your students to sign the end user license as part of that process. You can get in touch with us, as there are a few different ways of doing that efficiently and effectively. But all of them would need to sign the end user license if you're using a safeguarded collection. So, downloading the data. Again, there's a short video which walks you through that process. You pick what data you want, and you have to write a short description of what the project is. I think there's a minimum character requirement on it, but otherwise you just write in what your research aims are and what you're planning to do with the data; it's just a few sentences. You then assign the data to that particular use and you should be able to use it from there. So it's a little bit like an online shopping experience, except with data. The tutorials are really useful, though, if you've not downloaded data before and you need a little bit of help. So we're going to do another exercise here, exploring a download bundle. Gail, if you don't mind just popping the next worksheet in the chat. It will take you to one of our open collections, the Pioneers of Social Research. If you haven't registered with us before, you don't need to worry about doing that right now. You should be able to just go to the access data page and see links there that will allow you to download the bundle. The activity basically walks you through finding a couple of things within that download bundle, but the idea is just to explore it a bit. I'll give you another 10 minutes or so to have an explore, and if you have any questions, please keep those coming and I'll try to answer them. I'll also walk you through the download bundle and make a few points afterward. So have a go there and see what some of the data that you can download actually looks like.
And somebody's just asked a question about the licenses on data: how do data under special license differ from safeguarded data and controlled data? The data in this exercise is open data, so you just access it. Safeguarded data means that you need to register and sign that end user license, and controlled data is available through our Secure Lab, but controlled data only applies to quantitative data at the moment. Controlled data uses our Five Safes framework, which basically means that everything about how the data is accessed is controlled, because it's highly disclosive. You would actually need to go to a secure environment in order to access that data, and then anything that you create while using the data would have to go through some output checks. There are staff at the UK data service who specialize in this. It usually takes them half a day to a full day to do the checks, which ensure that anything that you're looking to publish or share is not disclosive of an individual. It's usually geospatial or geographic data, the sort of thing that literally pinpoints where people live. That sort of data would be available through Secure Lab, and Secure Lab requires additional training. We have Secure Lab trainings that run quite regularly; usually every four to six weeks, I think, is what the Secure Lab team aim for. But yes, you'd need to pass the special training for it, and then you would need the appropriate setup in order to actually access the data. And again, everything would get checked over before you're allowed to use any of your outputs from that. We don't have an equivalent for qualitative data at the moment. With quantitative data, running a disclosure check is a little bit more straightforward than with qualitative data, where there's a lot of rich nuance, a lot of indirect identifiers, etc.
So it just makes it harder to ensure an effective disclosure analysis in that way. We have been offered qualitative collections in the past which we've had to turn down because we felt there was too much risk to mitigate in the event of a disclosure. We do have another online workshop that goes into some of that detail a little bit more, but where we feel that, should there be a disclosure, we can't mitigate the consequences of that, we haven't been able to take the data. For example, we were offered a collection of interviews with individuals who were recruited for jihad, and we felt the risk was too high, so we couldn't accept that data. We had another collection offered to us of women living in refugee camps who were talking through some of the experiences they had in their home countries and in the refugee camps. The nature of the data made it impossible to have an effective anonymization strategy, and the risk to the women themselves, if identified, could have been life threatening. So we didn't take that data. There are instances where we don't. But if we can mitigate the impact of disclosure and we can put in effective safeguards, we would take that data. That's usually under our standard safeguarded license, and then, if needed, we can always do something like permission-only access or an embargo. Sometimes data becomes less sensitive with time. So we can do other things with qualitative data. That was a long-winded answer to that question, but hopefully that covers it. So, this special license: if you mean special license as in controlled data, yes, that's available for quantitative data only. Some of the mixed method studies might have the quantitative part of the data under a special license as well. Thank you. All right. Just another minute or so exploring the download bundle.
And then I'm just going to talk through the bundle itself to give you an idea, because if they're curated through the UK data service, a lot of them will be curated with the same structure. So I'll just talk through that structure a little bit. Okay, so just as you're wrapping up, hopefully you should be able to see my screen here where I've got the download bundle. This mrdoc folder is the documentation of the collection. If you open up the mrdoc folder, which is not opening, hopefully, there we go, you'll see that it then goes into the format type. If you go into PDF, for example, you should see the consent form and interview guide. The user list is a data listing. This one also has little bios of each of the participants. So that's the mrdoc folder. This is my Zoom menu. Apologies, it keeps getting in my way. All right. You also have the RTF folder. RTF is a rich text document, and you can see there are interview summaries and transcripts, and then there's a PDF folder as well here. This one has thematic highlights, which is part of the summarization that I think was done by the original investigator, Paul Thompson. Sometimes, if there are images, they will be in a PDF folder. We also have a few collections in XML format as well, so if anybody is doing linguistic analysis or something like that, and they're using software that requires code, we do have some of our collections available as XML. So this is the collection, and then you've got your read me file as well, which says a little bit about the condition of the data, whether anything is missing, that sort of thing. But all of them will have a documentation folder, and all of them will then have another folder for whatever kind of data it is.
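As a rough illustration of the bundle structure just described, here is a minimal Python sketch that counts the files in an unzipped download bundle by top-level folder and format subfolder. It assumes the bundle has already been unzipped to a local directory; the function name `summarise_bundle` and the folder names used in the usage note are illustrative assumptions, not something the service provides.

```python
from collections import defaultdict
from pathlib import Path


def summarise_bundle(bundle_root):
    """Count files by (top-level folder, format subfolder) in an unzipped bundle.

    The layout is an assumption based on the walkthrough: a documentation
    folder with format subfolders (e.g. pdf), plus data folders such as rtf.
    """
    root = Path(bundle_root)
    counts = defaultdict(int)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # Files sitting directly in the bundle root (e.g. a read me file).
        top = parts[0] if len(parts) > 1 else "(root)"
        # A second-level folder, if present, is treated as the format type.
        fmt = parts[1] if len(parts) > 2 else "-"
        counts[(top, fmt)] += 1
    return dict(counts)
```

Calling `summarise_bundle("path/to/unzipped/bundle")` returns a dictionary such as `{("mrdoc", "pdf"): 4, ("rtf", "-"): 20}` for a hypothetical bundle, which is a quick way to see where the documentation sits and which data formats were supplied before opening anything.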
So hopefully you've been able to open up some of those. I should have just gone into one of the transcripts so you can see. You can see the listing here of all of the transcripts, and we follow the same file naming convention, so you'll see the study number coming back here. So there's the study number, then the kind of data that it is. It might be INT for an interview, DRY is for diary, ESS is for essay, that sort of thing. So we have a little three-letter code for the type of data that it is, and then the participant number. And if there are multiple interviews with participants, so this one has an A and B, this is interview one and interview two with them. All of them follow a similar format. Now, this one is curated by the UK data service, so it's quite structured in that way. But if it has come through our self-deposit system, it may be that they have a slightly different file naming system that they use. We do a QA check with all of the self deposits, just to try to make sure that there is a structure, that all of the data files are in one place and organized, and that all of the documentation files are in another place and organized. But they might follow slightly different file naming conventions if it is a self deposit. Okay, I know we're getting close to the close here, but I just wanted to cover a little bit about documentation as well. We've covered different examples of data we hold, how to find and access that data, and what the process is of actually getting to it. And I just wanted to talk about analyzing the data very quickly. You've hopefully had a look through some of the documents in the download bundle that make up part of the documentation. We often focus on the data itself and lose sight of the importance of the documentation that is available with that collection.
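The file naming convention described above (study number, a three-letter data type code, participant number, and an optional letter for repeat interviews) can be sketched as a small parser. This is only a sketch under stated assumptions: the exact separators and letter case used in real bundles are not given in the talk, so the pattern below accepts optional underscores or hyphens; reading "int" as interview is inferred from the walkthrough; and the example file names in the usage note are made up.

```python
import re
from pathlib import Path

# Three-letter codes mentioned in the talk; "int" -> interview is inferred.
TYPE_CODES = {"int": "interview", "dry": "diary", "ess": "essay"}

# Assumed shape: <study number><type code><participant number>[<letter>],
# with optional "_" or "-" between the parts.
PATTERN = re.compile(
    r"^(?P<study>\d+)[_-]?(?P<kind>[A-Za-z]{3})[_-]?"
    r"(?P<participant>\d+)(?P<part>[A-Za-z])?$"
)


def parse_data_filename(filename):
    """Split a curated-style file name into its parts, or return None."""
    match = PATTERN.match(Path(filename).stem)
    if match is None:
        return None  # e.g. a self deposit using a different convention
    kind = match.group("kind").lower()
    part = match.group("part")
    return {
        "study": match.group("study"),
        "type_code": kind,
        "type": TYPE_CODES.get(kind, "unknown"),
        "participant": match.group("participant"),
        # "a", "b", ... marks the first, second, ... interview, if present.
        "interview": part.lower() if part else None,
    }
```

For a made-up name such as `1234int001a.rtf`, `parse_data_filename` would return study `"1234"`, type `"interview"`, participant `"001"` and interview `"a"`. Returning `None` for names that don't fit is deliberate, since self-deposited collections may follow a different convention.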
So when you start out reusing the data, the first thing that you'll want to do is orient yourself to the original research project. I think the main point here is to not underestimate the amount of time it will take to get acquainted with the data set. There might be multiple levels of context to get through in order to really understand what the data is. What I mean by that is that you might have more than just the data that's collected at the time of the interview, or whatever the data collection method is. You might also have to consider the metadata about participants, the historical time period in which the data was collected, or where the data was collected. So really the idea is that you need to understand the data set as a whole in order to get at the root of what the data can convey, and the documentation that's provided with the data set is a really useful starting point for this. It contains more information about the methodology, so it might be interview schedules, calls for participants, segments from publications that arose from the original study, or funding applications. I've got a list here of some of the examples. I've also seen some studies which have sections written up by the principal investigator about particular features of the data set, such as the sample. For example, Annette Lawson conducted a study in the 80s on adultery, and at the time that topic was very, very taboo. So sampling really became a primary focus for her. It was hard to recruit participants, so she put an ad in a newspaper, but that ended up producing a highly white, middle-class, female sample. So she ended up writing a 56-page document justifying her sample. It's really interesting actually, and a really useful piece of documentation as well. In my time working with qualitative data sets at the UK data service,
I've also seen background contextual material that was taken from the area of research, such as meeting minutes and government pamphlets. There's been correspondence with participants, and all of that, I think, helps to paint a picture of what's going on around the study and would be included with the documentation. Where we have curated that documentation, it would be available as a user guide, a UGuide, and that's what these look like. It's a PDF document, and it would be held in the mrdoc folder in the download bundle. You'll see that those are bookmarked as well, so on the right-hand side, if you're using Adobe, you can navigate to whatever is in that user guide using the bookmarks. If it's a self deposit, they might just have a folder with a collection of documents. And here's the other standard piece of documentation we ask for. This is a data list, which is an at-a-glance look at the participants. Usually the first column is some sort of pseudonym or participant ID, and the following columns are some characteristics of the collection, not everything about it. The data list is meant to help you filter and find the kind of data that you want, so it will just have whatever the PI thought was most important for their particular collection. For example, April Galway did some research reusing the Millennium Memory Bank, which were interviews that were held at the LSE archives. She was exploring post-war motherhood, and she specifically wanted to find single mothers. Unfortunately, in her case, the collection wasn't as well documented, so she actually had to search through the interviews themselves to try to pick some of this out. But as part of her PhD project, she ended up constructing some of that documentation, so future re-users would be able to sift through the data more effectively. And just to reinforce the importance of documentation, I wanted to do one final activity with you.
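To make the point about data lists concrete, here is a minimal Python sketch of filtering a data list to find the participants you want, in the spirit of looking for single mothers in a collection. It assumes the data list has been saved as CSV with one row per participant; the column names and the three sample rows are invented purely for illustration and do not come from any real collection.

```python
import csv
import io


def filter_data_list(rows, **criteria):
    """Return the rows whose columns match every criterion, ignoring case."""
    matches = []
    for row in rows:
        if all(
            (row.get(column) or "").strip().lower() == str(value).lower()
            for column, value in criteria.items()
        ):
            matches.append(row)
    return matches


# Invented example rows: first column is a participant ID, the rest are
# whatever characteristics a depositor chose to record.
SAMPLE = """participant_id,gender,year_of_birth,marital_status
P001,female,1932,single
P002,male,1928,married
P003,female,1940,married
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
single_women = filter_data_list(rows, gender="female", marital_status="single")
print([row["participant_id"] for row in single_women])
```

The same function works on a real file by passing `csv.DictReader(open(path, newline=""))` rows instead of the sample text. If the characteristic you need is not a column in the data list, you are in the position described above and would have to go to the interviews themselves.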
So, Gail, if you don't mind just popping in that interview. So I'll give you just five minutes to have a read through of the it's just an interview excerpt. It's about a page long or a page and a half long. And as you're reading it, I just want you to think about and surmise who is the participant? Who do you think they are? Where might they be from? So this transcript is an example of a verbatim phonetic transcript. So you'll find hints just upon opening it. And it may take a little bit more time to understand some of the words because it's written phonetically. So take your time reading it. And once we're done here, this is the last activity. So we'll just close up with a few signposting to other resources. I'll give you just a few minutes to have a look at this. And then we'll make the documentation available as well. All right. We've got just a couple more minutes. Gail, do you have that? I've got the link for the documentation for this. Let me pop that. Yeah, I've just added the second worksheet. I've added it online. Okay. Perfect. If you don't mind popping that in when you have a chance. So we're going to share the documentation with you as well for this particular interview transcript. But I think it's a really interesting example that might challenge some of the assumptions we make. So for me, for example, I was really surprised at the age of the participants because it was clear they were a grandmother. And I didn't expect them to be not much older than me to be perfectly honest. I have a four-year-old daughter, not a grandchild. But also it might then make more sense when you take into account when the data collection took place. So this is an older study, so a different generation than me. And you can probably tell that this is from, I believe it's Scotland is where she's from. So you might be able to tell from some of the phonetic spellings there where she's from, or some of the phrases if you're familiar. 
But of course, if you're not familiar, it kind of makes you think, you know, you might have made assumptions about perhaps her education level or something like that if you weren't able to fully understand some of that context. So yeah, so the worksheet data in context two has some of that additional documentation that might help you make sense of some of that interview transcript. So hopefully that's just a kind of insightful exercise on your own to think through why some of that documentation is really important. We are running low on time, so I just wanted to close with just a few more resources. So we do have a lot of web pages for new users if you're interested. So if you want more help with finding data or registering, we've got pages on that on our website. You can also contact us and connect with us on social media. We've also got our YouTube channel, which has lots of videos, not only of our past events, but also, you know, just talking with re-users, for example, some of our impact videos as well, kind of go through some of our collections. So check out our YouTube channel, which might be quite useful. And we've got recurring workshops. So we're kind of nearing the end of this academic term, but we'll have another set of these events upcoming in the next term. So these are the ones that usually run on a termly basis. So do have a look at our upcoming events page because those will be updated with the next set, basically, events that we have. So please do join us if you're interested in any of these topics. So yeah, just a signpost that as well as our workshops, we also provide advice on research data management planning and preservation. So if you're researching on your own and are looking to deposit at the end, please do get in touch with us. We're really happy to help you with some of the data management planning. 
And those that we work with at the start of their project usually have a much easier time depositing at the end of their project as well. Or if you're looking to use data in a publication or classroom, please tell us about it. We're really happy to help support you with that, and if you would like us to broadcast some of that and publicize it for you, we can certainly do that as well. If you have any queries, you can go to our FAQ pages or pop us an email and we'll be happy to get in touch with you that way. So that's the end of this. I am hoping you have found this useful. I'm just going to double check in the chat here and make sure there are no other final questions. I know we're nearly at time here, but I do have a few extra minutes if anybody did have questions they wanted to cover. I did try during the exercises to respond to those who asked questions through the Q&A and the chat facility. I think there was one other question that came up around digitizing handwritten data, and I just wanted to share because I was part of the project that did the school leavers study. When we were transcribing the handwritten material, you can just do a transcription, but you can also train what's called OCR software. We used OmniPage. With the advances in AI, there's probably something that's come out in the past year or two that might be a little bit more advanced than what we were using at the time, because this is about 10 years ago now. But you'd be looking for OCR software, which can recognize characters, and you can train it to recognize certain letter formations that aren't machine readable. It's not like a font in your text document per se, so you can train it to learn those letters.
And so if you're doing a huge project, you know, if you've got a lot of data to work with, it might be worth looking at some of the OCR software and see if it can recognize some of the characters and if you can train it on that. We've done it as well with another, it was actually a survey, but they had handwritten in on the side notes. So we were transcribing those and we used OmniPage again there and trained it to the researchers handwriting so that it could automate a lot of it. It still needed editing, but it just kind of based on the volume we were working with. It just made it a bit quicker. All right. Thank you everyone for joining us today and yeah, let us know if you've got any questions. Thank you so much for the lovely comments in the webinar chat as well. I'm glad you guys have found this useful and I hope you find it as exciting as I do. I love our qualitative collections. I think there's so much in there. So we hope to see you using some of our data soon.