Hello, everyone. Thank you for joining us today for this online workshop on documenting quantitative and qualitative data. My name is Maureen Haker. I am a senior user support and training officer at the UK Data Service. I also teach at the University of Suffolk, and I'm a qualitative specialist. So I've worked on everything from ingesting and digitizing qualitative data to reusing qualitative data. I'm here with Anka. Anka, did you want to introduce yourself? Sure, really quickly. I'll just say hello, everyone. My name is Anka, and I think I can just say I focus on quantitative data. So we have Maureen for qualitative and me for quantitative, and we'll lead you through today's workshop. Welcome. So Anka and I are going to be doing a fair bit of talking, but we've also got some exercises scattered throughout to make this a bit more interactive. We'll try to monitor the questions throughout this workshop and answer some of them as we go along if we can. Otherwise, we should have time at the end to talk through some of the frequently asked questions. This is an introductory workshop, so we're assuming that you may have had some research methods training before. You may have a bit of experience using archival material, but really this is aimed at helping build a foundation of knowledge about research, collecting research and sharing research. So what we're going to do today, here's a bit of an overview. We'll start with the basics: what's documentation and why is it important? Then we'll move into the distinction between documentation for qualitative projects and documentation for quantitative projects. We'll also talk through what metadata is and why it's important. And finally, at the end, we've got a Mentimeter exercise and some signposting to further resources if that's helpful. Okay, so going back to basics. What is documentation?
Documentation is basically all of the stuff that you collect throughout research that sits alongside the data. It helps to explain and make sense of the data. Another way of thinking about this is that it's all the work that you do behind the scenes, the necessary paperwork and paper trail that help you complete your project, but that may not necessarily feature in your outputs. We'll be going through some more specific examples of documentation, but just to start out, there are two broad categories of documentation. One is project-level documentation. This is all the things that you collect that tell you about your research project as a whole: what kind of methodology was used, who the research team were, what the keywords to describe your research are, where the research took place. But there's also data-level documentation, which can give you more information about specific parts of your project or parts of the data. It doesn't necessarily reflect the whole collection, but just a specific element of it. This might be something like metadata about individual participants, so their gender, their occupation, their age, or any other kind of demographic detail, compiled in a way that's quite easy to access. It could also be about specific variables that are collected. Now, you don't necessarily need to think about whether your documentation is project level or data level, but it's just to point out that there are layers to a research project. And all of those layers of context are important to document and consider when you're assessing and evaluating your data and your overall research findings. So hopefully you're already thinking about some examples of what might be documentation across all levels. This could be things like the interview guide that you use to interview your participants, your variable lists, your blank consent forms, your information sheets.
It could be analysis memos, syntax, those sorts of things. Anything that is created as part of the research project but is not data is probably going to be some kind of documentation. Now, to make things a little more nuanced, all this material from your research is called different things by different bodies and different policies. Archives like the UK Data Service will call it documentation, and under documentation we have specific kinds: user guides, data lists, data dictionaries. All of these we'll talk about in this workshop. But there are also things like readme files, which you may have been asked to write if you've ever deposited data before. Some policies, like the data policy of the ESRC, the Economic and Social Research Council in the UK, will refer to research materials, which starts to get a little bit messy: what counts as a research material, what counts as data, what doesn't? It also refers to data assets, which is the system that's used to hold the data. And then there's also metadata. I think the best description I've heard of metadata is that it's data about data. And there's a specific brand name, if you like, of metadata called DDI. While the concept of documentation is very simple, the deeper you go, the more nuanced some of those ideas become. So if you're ever interested in exploring more about the terminology, CODATA has a working group dedicated to terminology and guidance on terminology. Hopefully some of the practical examples of the different types of documentation that we'll run through will give you enough flavor to dive into any areas that you want to know a little bit more about. So now you know what documentation is and the different levels of it, and hopefully you're starting to see how embedded these are in doing and disseminating research. So why do we have documentation? Why do we compile it?
From the point of view of an archive, documentation maximizes the value of the data. If you're reusing data, it's essential to review and publish that documentation alongside it; you can't understand the data without the documentation. If you just dropped an interview transcript in the middle of the street and someone picked it up, they couldn't really understand it without understanding the context in which it was gathered. There's also a historical value to the data, so you build up a bit of provenance for it by assembling your documentation. Documentation also allows you to expand on the methods and the processes that might not normally get covered in a publication. There's no limited space for documentation; you can publish as much as you want to, so whatever you think is useful for understanding that data, you can include. Doing so also helps enhance your research outputs: documentation becomes an output in itself that you can publish, and that documentation can also be reused, which I'm going to touch on a little later. It also adds a level of transparency to the research. As part of the peer review process, reviewers can better understand the work and the data, and re-users can more accurately and efficiently reuse the data. And finally, it aids in the creation of FAIR research. I'll talk a little bit more about FAIR data in just a moment, but first I want to do a quick exercise, which hopefully will highlight some of these points about why documentation is important. I think Gail is going to pop a link in the chat. Thank you. So we want worksheet one, which is the very first link. And if you can hold off before clicking worksheet two, I'm going to give you about five minutes to read through an interview extract from worksheet one. There are some directions along the top there.
But as you're reading it, just gather what your initial assumptions are or what you think about that particular participant. Okay, so I'll give you five minutes, and then I'm going to add some context to it after that. We've got somebody who has already said that the participant was much younger, age 43. I think that really surprised me as well; 43 with a four- or five-year-old grandchild is quite young for a grandmother, I think, even of that generation. And somebody else mentioned the rural versus urban location, or just the location in general. You probably could tell from just reading the data that there was phonetic spelling, so clearly there was a kind of accent, but depending on what you were expecting, how that read out in your head might have changed if you actually knew the location of the participants. And of course, some of the phrases that are used might be specific to the locale. So it's useful to know some of that because it helps you better understand what that participant is trying to say. Yeah. Okay, so a couple of you actually are quite surprised by the age. Yeah, and some of you are pointing out the kind of deprivation that's described within the interviewer notes as well, which again might change the way you feel about what that person is saying. I think the other thing that I also found surprising is where they described a couple of the children as looking quite gaunt or potentially ill. And it might be useful to know some more about the background of the interviewers themselves and what they base those judgments on. So if you read further down the context, it describes the research team a little bit more too, which might be useful. Yeah, the date of the interview, key piece of information. Excellent. Okay, so this is a little bit of an older study.
So all of those dates and the social and historical context are actually really important for understanding and appreciating what the data has to say, what the opportunities are, and what the limitations are of that data. So hopefully this starts to drive home why documentation is so important to sit alongside data. And I mentioned earlier how documentation helps to build up FAIR research, FAIR data. I don't want to go into a lot of detail about this, but I do just want to point out that sharing documentation and documenting your research is one of the key underpinning principles of FAIR data. The FAIR principles are relatively new guidelines, or goals, of research which aim to make research more transparent, more collaborative, more constructive. Since the early 2000s, technology has had a massive impact on how research is done. We can collect more data, we can collect more complex data, and we can share it with others much quicker than we've ever been able to do before. However, despite collecting so much data all the time, we still have challenges in processing that data. Just think about any kind of organization where data is not shared between departments, and you have to constantly re-enter or re-ask the same information. So how do we solve this issue? We have this data; how do we share it better? In 2016, Wilkinson et al. published the FAIR principles, which outlined what good data management looked like that would enable the sharing and reuse of data. And the key point to make here, I think, is that the data needed to be machine actionable: it needed to be able to be read and used by computers, so as to make use of the technology that so massively changed how research is done. And the guidelines were so influential that an international collaboration established the GO FAIR International Support and Coordination Office just a year later.
And these FAIR principles continue to influence policies. You'll see FAIR referenced in the data policies of the Research Data Alliance, the Association of European Research Libraries, and the UKRI and all of its research councils within the UK. And if you receive grants or taxpayer money to complete research, chances are you'll be asked to share the data and the documentation at the point of completion of that project. Many publishers are also now requiring the sharing of data and research materials before publication in the name of transparency and research rigor. The FAIR principles basically state that data should be findable, accessible, interoperable and reusable. So research is not just something that you complete in the solitude of an academic office, but something which is completed in collaboration with others and shared for further reuse. To make data FAIR, you need to also document that data. Making data findable and accessible requires clear metadata; equally, making the data reusable means documenting the provenance of that data. I won't go further into the FAIR principles here, but please do send through any questions you might have about them through our Q&A, and I'm happy to go into a little bit more detail about FAIR research and FAIR data if that's something you're interested in. From here I am going to go into a little further detail about documentation for qualitative data before handing over to Anka to talk about documentation for quantitative data and metadata. So in talking about qualitative collections, I'll show you some good examples of documentation, but I'll also talk about how you might utilize changing technology to help make documenting your project a little bit easier. And then I'll talk just a little bit about reusing documentation.
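To make the "machine actionable" point about FAIR metadata concrete, here is a minimal sketch. All the field names and values are invented for illustration (they are loosely modelled on common catalogue fields, not the DDI standard or any specific archive's schema); the point is simply that structured metadata can be filtered and searched by a program, where prose in a PDF cannot.

```python
import json

# A hypothetical, minimal metadata record. Structured fields like these are
# what makes data "findable" and "accessible" to machines: a catalogue (or a
# one-line script) can parse and filter them directly.
record = {
    "title": "Example Interview Study, 2024",          # findable: descriptive title
    "identifier": "doi:10.0000/example",               # findable: persistent ID (placeholder)
    "creator": ["Research Team, Example University"],  # provenance for reuse
    "keywords": ["interviews", "qualitative", "documentation"],
    "access": "safeguarded",                           # accessible: stated conditions
    "format": "UTF-8 plain text",                      # interoperable: open format
    "methodology": "Semi-structured interviews; see user guide for topic list",
}

# Serialise to JSON so any tool in any language can read it.
serialised = json.dumps(record, indent=2)

# Because the record is structured, a machine can act on it without a human
# reading the documentation first:
matches = "qualitative" in record["keywords"]
print(matches)
```

A real deposit would use the archive's own metadata schema rather than ad hoc keys, but the principle is the same: fields, not prose.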
So here are some examples of documentation for qualitative work: basically all the information that you probably create along the way of doing the research, but that is never seen outside the research team and maybe the participants. This can include any kind of interview preparation, including instructions to interviewers, prompts and topic guides. You might include blank consent forms, information sheets, or any other materials that participants received prior to taking part in the research. It could also be text written by you expanding on methodology or sampling, including, where it's permissible under copyright, perhaps extracts of publications or draft work prior to publication. What we don't see very often, but can also be really useful, is things like research meeting minutes, research diaries or field notes, and documentation from analysis, including things like memos or code books or initial analysis write-ups. At the UK Data Service we've got just over 1,500 collections of qualitative research, and we don't usually see things on analysis; I think there's probably just a handful, about half a dozen, that have any kind of analytical documentation included with the data. And I'm not sure why that's not typically part of documentation for qualitative projects, because it's a pretty standard part of quantitative collections, which I think Anka will go through later. But that analytical transparency really helps to validate research findings, and it can help re-users better understand the decisions that were made, or perhaps not made, about the cleaning and processing of the data. And that kind of context is actually really, really important to qualitative work; this is part of the argument for why to do things qualitatively: you better understand the context. So hopefully all of those materials are being actively considered by the research team throughout the data collection and analysis.
So collating them and making them available as documentation can really help achieve what qualitative work is aiming to achieve. In collections that are curated by the UK Data Archive, all of those materials would be collated into a user guide, and the user guide is then bookmarked, so you can see in the upper right corner of the screenshot those bookmarks, which tell you what materials are available within that user guide. Those tend to encompass just project-level documentation, not data-level documentation. In addition to the user guide, the UK Data Archive also curates a data list, which is a sort of at-a-glance look at the participants. This would probably be considered more data-level documentation, where you have a bit of metadata, those basic demographic details about each participant, and the file name where you can find the relevant data. The details on a data list are not necessarily standardized or all-encompassing; they're the details that were relevant to the project itself. These can take a little bit of time to compile, but they are really useful organizational tools during research as well, so it's generally just good practice to create a data list as you go along. And while the collections that are curated by the UK Data Archive have user guides, you're more likely to see just a folder of documents or separate files in collections that are deposited by researchers themselves. In this collection on political dissatisfaction, for example, you can see each file that was included with the data. There's an end-of-award report, and there is the information that was given to participants. This was a multi-method project, so all of that documentation makes it a lot easier for re-users to piece that project together again. Other types of documentation include observations that are written by researchers in the moment of data gathering.
So some types of methods obviously dictate that kind of self-reflection as part of the method, and that may even be used as data. For example, I've done research using the biographical narrative interview method, and that requires you to sit down for an hour after an interview and just freely write. Those sorts of things might be considered during the analysis and might be considered data. But others actually recommend this as just regular good practice for providing context. The comments in this collection are just a few sentences written after every interview by the interviewer. This is from a collection called the Affluent Worker, which was research that was used to create the ONS categories of social class. They help contextualize the relationships between the researcher and the participants, and they are actually really, really helpful data-level documentation. This is a little bit unusual, so I'm not sure if most qualitative interview methods just don't dictate this kind of quick reflection afterward, or if it's something that's not often shared. But it really helps to rebuild ideas about what the power dynamics of that interview were, and they are really helpful. We have some examples like this, but they're not something that we normally see as standard alongside qualitative work. And perhaps a step further than those sorts of observations are draft works of analysis. Dennis Marsden's Mothers Alone collection has a piece called 'Felt Poverty' that was written, I think, as a piece of draft work, but it never actually made it to publication; I think it was further developed from there, but it was included in the documentation for this collection. And it is a really interesting collection. The research was led by a white, educated male who was interviewing single mothers living on welfare.
But this piece that he wrote up really provides the context to show the kind of sympathy that the research team felt toward the participants, and how they actually took some steps to be reflective about their own position. And the Lawson study on adultery, an analysis of love and betrayal, was a project conducted in the 1980s which aimed to explore the, for the time anyway, extremely taboo topic of cheating. As such, it was really hard to recruit participants, because people wouldn't necessarily own up to this. So Lawson chose to put out a call for participants in a newspaper, but it created an arguably biased sample: mostly white, mostly middle-class women, the people who read the newspaper and responded to her call for participants. So Lawson became a bit preoccupied with the sample, and she wrote a 54-page defense of it. It starts with a discussion of some of the ethical conundrums that arose from her sampling strategy, including jealous partners who were sending in information on their married partners to participate, and one man in particular who called from a psychiatric ward. Then she goes into an extensive comparison of her sample to the national population, exploring where there was a significant difference from the national population and whether or not that would actually impact her data. And finally, she comes to some very interesting conclusions about sampling strategies more broadly, including the point that sampling needs to match the context of the study, and that exploratory studies, particularly of sensitive topics, benefit from a greater focus on participants' ability to talk about the topic in detail, rather than a focus on the participants themselves.
Earlier I had the example of the interviewer notes from the Affluent Worker, but more extensive field notes are another example of quite detailed documentation. Now, field notes are a little bit different in that they occupy a bit of gray space between being both data and documentation at different times. So it's worth pointing out that documentation is typically something that would normally be available openly, whereas field notes and other reflections, depending on the level of detail, may need to be put under similar access restrictions as the data. We've only got about half a dozen collections with examples of field notes, and almost all of those are ethnographic studies, so it's not something that we often see included in a user guide as such. But this is really important context to sit alongside data. And there are some new possibilities with changing technology. NVivo and other computer-assisted qualitative data analysis software allow you to download your code books, your memos, your mind maps. Here you can see a list of nodes, which is what NVivo calls codes, and their descriptions. All of this is very easily exported into a Word document. When we received, for example, the Edwardians collection, which contained 453 interviews of 80-plus pages each with British residents who were born during the Edwardian period, it was accompanied by all the analytical work that was done by hand. This ended up taking 16 shelves to hold just the coding of the transcripts into those key themes. Now all of this can be downloaded into a single file at any point during or after the actual coding, so it's much more accessible now if that's something that you would choose to include with your data. Research teams can also use blogs and websites to keep in touch with participants.
So a research blog might update others on progress, post information for participants, or even be used to send out calls for participants. Then once that project is done, the site can sit alongside the project as a related resource and again provides all of that additional documentation. We are also now seeing creative documentation too, such as this photo story. Again, this is the sort of thing that might be classed as either documentation or potentially data, but there is scope to use video, audio or image files to accompany the data as documentation. And finally, changing technology is not just about how research is done, but also how we archive. The UK Data Service has created QualiBank, which is an online tool for searching, browsing and citing qualitative data. As part of that tool, you can search and view qualitative data online, and it also links documentation to each piece of data. That documentation can be related to the specific text that you're looking at, or it could be related to the collection as a whole. But it does allow for this distinction between project-level and data-level documentation, where that's available and where it's accessible. And finally, just one more point about reusing documentation. Often we think about the reuse value of data specifically, but less about how documentation has its own value. Documentation can serve as an inspiration for good practices, such as adapting consent forms and information sheets which already exist rather than writing them from scratch. We have literally hundreds of consent forms documented within our collections, including snippets of the ones you see here, which explain what data sharing means to participants. So you can explore those and learn about good research practices from them. You can also examine how to do research with children or vulnerable groups, which can be particularly challenging.
So writing up information letters and consent forms for very young children, for example, is something that can be quite difficult, but we do have collections which provide good examples of how this has been done in the past. And you can do the same with data collection. One collection, Foot and Mouth Disease in North Cumbria, was deposited with its interview guides, and those interview guides were then reused by medical students to better understand how doctors and patients can have a conversation. I've also referred dissertation students to find similar interview guides in their subject area before setting out and making one themselves, to help them see what's important to ask, how to structure an interview, and perhaps what you might not ask. Okay, so that's my bit on qualitative documentation. I'm going to hand over to Anka now; I think she's going to share her screen. So give us just a moment to switch over, and she'll talk through quantitative data. Thank you, Maureen. I'm just going to share my screen. Okay. Hello everyone. My name is Anka Vlad, and I'll be talking through the second part of today's presentation. We're going to look at quantitative data, metadata and some closing slides with some further resources and links for you to use. Okay, so first of all, documentation for quantitative data. What we're going to talk about is what documentation should accompany quantitative data, if that is what you're working with and planning to share. We're going to talk about embedded documentation, and then we're going to look in more detail at what good documentation looks like. And of course we're going to have some examples. So just a short summary; we're going to go into more detail about this, but for documentation for quantitative data, these are the most common types. Of course it would depend on your project and what type of data you're collecting, but these are the most common that we usually receive at the UK Data Archive.
There will be a questionnaire, a code book embedded in or separate from the data file, a data dictionary, a user guide, an experiment protocol, and a readme file. You'll notice that some of them have also been mentioned by Maureen, such as the user guide, so they're not necessarily specific to quantitative collections, but they are very useful to have as well. Okay, so moving on and looking at this in more detail. For quantitative data, by which we mean structured tabular data, documentation can be embedded within the data file itself. So we can have variable and code descriptions within the database. Most data analysis software packages have facilities for data annotation and description, so we can have variable attributes, data type definitions, etc. embedded in the data file. Alternatively, this information about the data items can be recorded separately, in a document such as a code book or data dictionary. The golden question here, as I refer to it, is what to include. When you're planning to share the quantitative data that you produced, and in general I think this applies to any data that we're working with, we need to ask ourselves: we know the ins and outs of our research projects, everything there is to know about them, but for someone with no prior knowledge who were to use this data in, say, five or ten years from now, what information would they need to be able to understand it, understand the context and everything about it, and most importantly, use the data correctly in their own research? Okay, so we talked about embedded documentation. This is an example of embedded documentation in SPSS files: the variable view, which those of you that have worked with SPSS will know very well. As you can see, we have the names of the variables. We have the labels.
We have the values and information about missing values. This is what we call documentation embedded in the data file. And one point to make about this, and I think it's important to point out, is that data files can be put under access restrictions, and because this documentation is embedded in the data file, it will also be behind that access restriction. So if the data file needs to be, for example, safeguarded, or needs to be under some sort of special license, this documentation will not be available to users. That's why we sometimes also need to produce other documentation, such as a user guide or a data dictionary, which would be available under open access for people to be able to see. So that is just something that I wanted to point out about embedded documentation in data files. Okay, Maureen already talked about user guides. Of course, we also have them for quantitative data, and they include a variety of information and documents to provide valuable context; examples are listed here on the screen, such as methods and fieldwork, and of course everything that would apply to your project. And then I included here three examples. We don't have an exercise per se in this section on quantitative data, but I was just going to use this first example and come out of the presentation for a moment. I'm going to share my screen again in a second; I just want to open this in my browser and show you this user guide. It will also be useful to see where to find all the documentation for a data collection. Okay, let me share my screen again. So here we have the first study that was mentioned in the slide: Understanding Society, the COVID-19 study. As we can see, the data is safeguarded.
So as I mentioned earlier about embedded documentation, that would be behind the safeguarded wall. However, we have documentation here, and we can see this is very well documented: we have a readme file, we have data dictionaries, citation files, et cetera. And at the very bottom we have a user guide. Okay, so I'm just going to open this. User guides will look different across collections. There isn't a template that we have; perhaps that is something that we can potentially look into. But they're usually created by depositors, so of course they vary very much across collections, depending on what information and what data is collected, et cetera. But if we just look at the contents, we have information on the sample and following rules, and on fieldwork. Further down we have questionnaire content, how to read the questionnaire, data structure, data files, file naming conventions. So this contains all of this information put together. And it's very useful. It's open access, as you saw, so I didn't have to register; this was not under any access restriction. So I can look at this and perhaps, without even looking at the data, decide whether this is something that I can use in my own project. Okay, let me go back to the presentation. Okay, so that was just the first collection. I added two more for you, and you'll have access to the slides so you can have a browse. You'll see they look very different depending on the project, of course. And those are examples of user guides. Moving on to code books and data dictionaries. In previous slides we saw that this information can be embedded in the data files, but we can also have it recorded in a structured document separate from the data file, as we just saw with the user guide.
So codebooks and data dictionaries should contain detailed and sufficient information about all the data items. This includes all variables, new and derived, including frequencies, the command files used to create derived variables, et cetera. We also have some codebook creation tools. There's the DDI editor, which is aimed at data processing for curation purposes, before we publish data in a data archive, and we also have Nesstar Publisher. I've included a link for Nesstar Publisher, and for the DDI editor there's quite a lot of information out there, so if you just pop that into Google you'll find plenty on it. And as I said earlier, information embedded in the data file may sit under some sort of access restriction, so it's important to also have an alternative to that in a separate file. Okay, examples of codebooks and data dictionaries. I've included the citations here, with the DOIs included. I'm going to let you go through these on your own, in your own time, just because we still have quite a lot to cover in this presentation, but these are very good examples of codebooks and data dictionaries available from our data catalogue. Okay, moving on to data-level documentation, where we look in a bit more detail at what information should be present for quantitative data. First, variable names. Ideally these should follow a question-number system that matches the questions in the questionnaire, or it should at least be clear what kind of numbering system has been used; the connection between the questionnaire and the variable names should be very clear. We should also use meaningful abbreviations.
So, for example, for government office region we would have GOR, and for mother's occupation we have MOCC. Naming conventions should also be consistent across files, especially if you're working on a larger project where different datasets are being created, so it's important to speak to colleagues to make sure that's in place. And for interoperability across platforms, variable names should be no longer than eight characters and should avoid spaces. Similar principles apply to variable labels: they should be brief and concise, kept to a maximum of around 80 characters, include units of measurement where appropriate, and mention any coding or classification schemes that were used; I have some examples there. Make sure you reference the question number of the survey or questionnaire, and I've included an example of a variable and its label, so it's clear which question in the questionnaire that particular variable relates to. For value labels, ensure there are no out-of-range values for categorical variables; for example, avoid having blanks, system missing, or stray zeros. Ensure that missing values are coded; there's an example here on the screen. You don't have to use 99 or 98, this is just an example, but this is important information for future users of the data, so they can distinguish between types of missing values in their analysis. Okay, here we have an example from SPSS, which you may have seen before: a variable information output showing the first four variables in this dataset, with the label, measurement level, and missing values. On the next slide we have the variable values, so for the first three variables the labels make clear what each code represents. Okay.
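The conventions just described can be made concrete with a small sketch. This is not UKDS tooling, just an illustration: one check for the eight-character, no-spaces naming rule, and one that flags codes appearing in the data without a value label or a declared missing code. All variable names, labels, and codes below are invented examples.

```python
# Sketch (not UKDS tooling) of the naming and value-label conventions:
# names of at most eight characters with no spaces, and explicit codes
# for missing values. Names and codes here are invented.

def valid_variable_name(name):
    """Check a name against the eight-character, no-spaces convention."""
    return len(name) <= 8 and " " not in name

def find_unlabelled_values(observed_codes, value_labels, missing_codes):
    """Return codes that appear in the data but have no label and are not
    declared as missing -- the 'out of range' values to avoid."""
    labelled = set(value_labels) | set(missing_codes)
    return sorted(set(observed_codes) - labelled)

# Example: GOR (government office region) with 98/99 coded as missing.
labels = {1: "North East", 2: "North West"}
missing = [98, 99]
print(valid_variable_name("GOR"))                              # True
print(find_unlabelled_values([1, 2, 7, 99], labels, missing))  # [7]
```

The point of the second check is exactly what a codebook gives a reuser: a documented meaning for every code that can occur in the file.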
Because we spoke about data dictionaries: this is something that archives would typically export and make available during the curation process, and we do that as well at the UK Data Archive. In this case we create a data dictionary, export it, and make it available separately, so this is not something the depositor has to do. I've taken some snapshots of two variables; this is part of a data dictionary, which is a document listing information about each individual variable. You can see it contains the variable name, the label, the type of variable (numeric, string, et cetera), the measurement level, and the value labels. Okay, moving on to metadata. Maureen has already mentioned it; we're going to look at it in more detail now: what it is, what qualifies as good metadata, some metadata standards, and how metadata is produced. So, what is it? It's an essential subset of the core data documentation, and it provides standardized, structured information. It is intended for machine reading, and this is probably one of the most important things to remember: it is machine-readable. This matters for indexing in a data catalogue: an archive uses metadata to index the collection in its catalogue, and it's important for citing, discovering, and retrieving the data collection. When we land on a data catalogue and search by keyword, as you can at the UK Data Service, the search engine uses that keyword to find data collections; the keyword is one piece of metadata used to retrieve information from the catalogue. Some examples.
So we have the abstract, keywords, topics, dates of fieldwork, country, et cetera. Of course, it will vary for each collection, but these are the standard fields you would probably need to fill in so that the archive or repository is able to process and index your data collection. What is good metadata? Maureen already spoke about the FAIR principles, so I'm not going to go into detail; I've used the same slide, and I notice something has happened with the formatting, it's cutting off some text, but hopefully it's still readable. We can see that all of the principles except three refer to metadata. I'm not going to go through each one individually, just a couple. The first, under Findable, is F1: metadata are assigned a globally unique and persistent identifier. In the previous slides where I cited the collection, I can actually go back and show you, we saw the DOI. At the UK Data Archive we mint a DOI for each data collection we publish. It looks like a URL, but it's different from a URL in that it's permanent: it will always exist, for the purposes of citing. If someone uses that data and cites it, whoever follows the citation in the future will be able to find the data, or at least its metadata. Even if the data has to be taken down or removed at some point in the future, the metadata should always exist, and the DOI should always land on that metadata. That is why it's a persistent and unique identifier. Okay, I think I'll leave it at that. You can see there are quite a few other metadata principles here. Should I pick another one? Time-wise, I'm looking at the clock.
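To make the machine-readability point concrete, here is a small sketch of how a catalogue might use structured metadata fields, including a keyword list and a persistent identifier, to retrieve collections. The records, field names, and DOIs below are invented for illustration; they are not real UKDS catalogue records.

```python
# Sketch: keyword retrieval over structured metadata records. The records,
# field names, and DOIs are invented examples, not real catalogue data.

catalogue = [
    {"title": "Example Household Study",
     "doi": "10.1234/example-0001",
     "keywords": ["households", "employment", "covid-19"]},
    {"title": "Example Income Survey",
     "doi": "10.1234/example-0002",
     "keywords": ["income", "households"]},
]

def search(catalogue, keyword):
    """Return (title, doi) pairs for records indexed under the keyword."""
    kw = keyword.lower()
    return [(rec["title"], rec["doi"])
            for rec in catalogue if kw in rec["keywords"]]

for title, doi in search(catalogue, "households"):
    print(f"{title} -- https://doi.org/{doi}")
```

Because the keyword field is structured rather than buried in prose, a machine can match it exactly, which is what the FAIR emphasis on machine-actionable metadata is getting at.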
I'm just going to let you read through these, and if there are any questions, do add them in the Q&A. Okay, producing good metadata: how do we produce it? The good news is that you don't have to. The data repository or archive you choose to publish your data with will take care of this. They collect the information using a data deposit form, which is basically a form where you add all the information about your research project; this is then enhanced and used to create the metadata record. So the Understanding Society collection we saw earlier, where we looked at the user guide, is a data collection whose metadata was first provided by the depositor in a data deposit form; we then enhanced it and published it, and it's now available for reuse. There are of course metadata standards used across data archives to enhance discoverability, interoperability, and reusability. When you submit your dataset to a trusted data repository, these will be applied automatically; as I said, the repository takes care of producing good metadata. At the UK Data Service we use DDI to structure catalogue records. DDI stands for the Data Documentation Initiative, and it is one of the standards used for social science data, which is what we focus on at the UK Data Service; it is mostly used by social science data archives around the world. What it does is record mandatory and optional metadata elements relating to the study description, the data file description, and the variable description. There are of course other metadata standards, which usually vary by discipline, and I've linked them here if you're interested in reading more. There are also controlled vocabularies.
A controlled vocabulary is a consistent and organized way of describing data, and it's essential for making data findable in a data catalogue and shareable within research communities. Examples of controlled vocabularies are thesauri, ontologies, and taxonomies. At the UK Data Archive we use HASSET, which we developed in the 1970s, and we are also a curator of its multilingual sister ELSST, which is available in 16 different languages and whose guardian is CESSDA, the Consortium of European Social Science Data Archives. Oh, I don't know why my slide won't move on; I'm pressing but nothing's happening. Let me stop sharing for a second and try sharing again. There we go, I don't know what happened. And I think this is you, Maureen. Yes, so I put this slide in just to say that when we're talking about metadata and describing the collection as a whole, we also create persistent identifiers. Within QualiBank, which the UK Data Service developed about 10 years ago, you can do this for a specific part of the data, which is quite unique. If you want to cite a specific part of a data file or interview, you can click the 'create citation' button in QualiBank, and it creates a URL that is a persistent identifier; a DOI is basically the brand name of persistent identifiers, and this works the same way. On our catalogue pages, DOIs are minted for every collection as well, so you get a persistent identifier for anything you deposit with us at the UK Data Service.
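Going back to the controlled-vocabulary idea from a moment ago: one way to picture what a thesaurus like HASSET does for findability is as a mapping from free-text keywords to preferred terms, so that different depositors' wording lands on the same searchable term. The mini-vocabulary below is entirely invented, not taken from HASSET.

```python
# Sketch: normalising free-text keywords to a controlled vocabulary's
# preferred terms. This mini-vocabulary is invented, not taken from HASSET.

vocabulary = {
    "jobs": "EMPLOYMENT",
    "work": "EMPLOYMENT",
    "housing": "HOUSING",
    "homes": "HOUSING",
}

def to_preferred_terms(keywords):
    """Map depositor keywords to preferred terms, dropping unknown ones."""
    terms = {vocabulary[k.lower()] for k in keywords if k.lower() in vocabulary}
    return sorted(terms)

print(to_preferred_terms(["Jobs", "homes", "gardening"]))  # ['EMPLOYMENT', 'HOUSING']
```

Whether a depositor writes "jobs" or "work", both searches now resolve to the same preferred term, which is the point of using a shared vocabulary across a research community.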
Thank you, Maureen. Okay, we're reaching the final slides now. To conclude, here's a data sharing checklist in terms of documentation. A few steps: check the guidelines with the archive or repository where you're planning to share your data, and create the necessary documentation files depending on what data you're sharing. There may already be templates available that you can simply download and adapt; we have such templates as well, and we've listed some in the final slides of this presentation. Fill in the data deposit form, as we saw earlier, with as much information as possible; that then allows the archive or repository to create the machine-readable metadata. And ensure that your files also contain data-level documentation, that embedded documentation that helps people understand the data; this depends on what software you're using, but it's important to have it if possible. Okay, in terms of accessing data: as I mentioned, embedded documentation would sit under some sort of access restriction, depending on the disclosure risk in the data file. At the UK Data Service we have three access levels: open, safeguarded, and controlled. Open means you don't need to register; the data is available without any registration, under some sort of CC license. With safeguarded data, you have to register with us and agree to our end user license, and there might be special arrangements for each collection: for example, depositor permission might be needed, or there might be an embargo on the data. And then we have the controlled level, which is available via remote or safe room access; this is equivalent to, for example, the ONS Secure Research Service.
This is pretty much a TRE, a trusted research environment, if you've encountered them before. Anyone who wants to use this type of data needs to undergo some training first and pass a test, because this is data with a higher disclosure risk than safeguarded data, so more safeguards need to be in place before someone can access it. In terms of documentation, though, documentation files are always under open access: user guides, questionnaires, blank consent forms, topic guides, pretty much everything we spoke about today will be available under open access, except for any documentation embedded in the data file. UKDS data management guidance: as I said, these are the final slides, and we've included some more resources here for you. We have more information on our website about how to manage data, and we also have a book on managing and sharing research data. For training and events, it's the same page where you probably registered and found this workshop, but we have many other events, webinars, and workshops you can attend. Because we spoke about access restrictions, here's an example of a data collection that has data under different access restrictions; we've included the link at the bottom if you'd like to look at access conditions and the differences between them. Okay, some tools and templates: as I mentioned, we have a model consent form that you can download, adapt, and use for your project, a transcription template, and the data list template that Maureen talked about earlier. Again, some further resources; we've put these in here for you to read later if you're interested. Of course, get connected: if you have any other questions, we realize that we can only answer so many in these workshops.
But if you have any follow-up or other questions, use the link at the top; they will come in to our help desk, be assigned to one of us, and we'll get in touch with you. We have a YouTube page as well, and Twitter, so you know what to do with those. Upcoming events: these are some upcoming workshops, covering consent issues in data sharing, how to anonymize qualitative and quantitative data, and the Family Finances Surveys user conference. For more information, we've linked the events page here, and here are our email addresses as well; if there's something we didn't cover in enough detail, or you have any questions, do get in touch. And now we have a Menti exercise, which I'm just going to leave on the screen. Go to menti.com, in a separate tab on your computer or on a phone or tablet; you'll be prompted to insert a code, and the code you need is on the screen. I'm also going to put it in the chat. I've already done it. Thank you, Gail. Right, let me share the Menti screen now. We have just six quick questions. Your answers will be anonymous, so no need to worry about that; even though it says quiz, it's not there to quiz you, it's really to raise some of the issues we talked about today, have a conversation around them, and refresh what we just went through in the presentation. You should still have the code at the top: it's 4608 0763. Because we're slightly behind schedule, I'm going to make a start; I can see people joining, we have 28 already, and if you join later you'll catch up. You can also see it on your screen. So the first question is: what type of data are you working with, or planning to share in the future? We have quantitative, qualitative, mixed, or not sure.
This also helps us know who our audience is for future similar events. Okay, I see the majority is mixed; thank you very much, we'll make a note of that. Next question: why is it important to produce documentation? Please select all that apply. The options are: to maximize reuse value; to demonstrate transparency and enable replication; it aids the creation of metadata for FAIR research; it enhances research outputs; it allows data to be used correctly in future research projects; it provides provenance for future historical use of data; or all of the above. The answer we were going for is all of the above; Maureen mentioned all the reasons why it's important to produce documentation, but we just wanted to refresh that. Next question: what do the guidelines for the FAIR principles emphasize? Is it machine-actionability (sorry, that's a difficult one for me to say), concise metadata, open data, or not sure? I see the most popular answer is machine-actionability, and that is the correct answer, yes. That is what the FAIR principles emphasize, as we saw when Maureen talked through them: being readable by machines is very important for metadata, to be able to find the data, reuse it, and retrieve it in data catalogues. Let's move on to the next question: can you think of examples of metadata? There was a slide we looked at that mentioned examples of metadata. Okay, I see codebooks, data dictionaries, and a readme file; those are data documentation files, which are of course important, and the metadata would be created using information in them. But what we were going for are things like keywords and population, and I see very good answers on the screen as well: data description, location, topics, very good. They move so quickly I can't read them all. Date of collection, yes, very good. Thank you very much.
We can now move on to the next question. Oh, they just keep coming; thank you very much. Okay, let's move on. Can you think of documentation for qualitative data? Maureen talked us through this, so think back. Maureen, would you like to talk through this one? I mean, I'm happy to, if you'd like. Yeah, no problem. So we've got user guide, blank consent form, interview guide. Not so much the data dictionary; that's more of a quants piece of documentation. But a number of these are looking quite good: field notes, plans, interview guides, excellent. A lot of people are putting in codebook, which is great; I hope you always deposit some of your analytical documentation alongside, that would be great. There are a couple, like data collection method and time of data collection, which might be considered metadata, but you can also provide fuller methodology and sampling information, longer pieces you've written about the project, which would be good. It looks great. Okay, let's move on to the next question: can you think of documentation for quantitative data this time? Data dictionary, yes, very good, thank you. Codebook, yes. Questionnaire, yes. Readme file, yes. This looks very good: user guide, a listing of variables, and a codebook or data dictionary. Thank you very much. Okay, I think that was the final question. Thank you very much. So hopefully you've found this useful for thinking about your own research: how you might organize it, what kinds of materials you're collecting, and how you're using them. And if you're depositing data with us, hopefully you now know what kind of documentation you can collect along the way. Do come along to any of our future workshops; we cover a lot of data management basics.
And yeah, that's all from us today. So thank you so much for coming. And we hope to see you guys again soon.