All right, I think we've slowed down enough. Don't you think, Anka? Shall we go ahead and get started? Thank you everyone for joining us today for this online workshop on how to document quantitative and qualitative data. My name is Maureen Haker, and I'm going to be presenting on the qualitative data bit. I've worked with the UK Data Service for over 10 years now, doing everything from ingesting collections to reuse projects, and I also lecture at the University of Suffolk. And Anka, did you want to introduce yourself? Sure, thank you. Hello everyone, my name is Anka, and I'll be doing the quantitative part of today's session. I've also worked in ingest for almost nine years now at the UK Data Service, and I also work at Cancer Research UK, where I help manage their trusted research environments, or TREs (SDEs, as you may have heard them referred to before). But yeah, that's me, back to you Maureen. So what we're going to do today, we've got just a little bit of an overview here. We'll start with the basics around what documentation is and why it's important, and then we'll move into the distinction between documentation for qualitative projects and documentation for quantitative projects. We're also going to talk through what metadata is and why that's important. And at the very end, we'll have some further resources, some signposting, and time for questions. But going back to basics now: what is documentation? Documentation is all the material you collect that sits alongside your data and helps to explain and make sense of it. Another way of thinking about it is all of the work you do behind the scenes, the necessary paperwork and the paper trail that helps you complete your project, but which may not necessarily feature in all of your outputs. We'll go through some more specific examples of documentation, but to start out, there are two broad categories of documentation.
One of these is project-level documentation. This is everything you collect that tells you about your research project as a whole: what the methodology was, who the research team were, what keywords might be used to describe your research, or where your research took place. But there's also data-level documentation, which gives more information about specific parts of the data, not necessarily reflecting the whole collection but just a specific element of it. This might be metadata about an individual participant, so whatever their gender, occupation, age or other demographic details are, compiled in a way that's easy to access. It could also be about specific variables that were collected. You probably don't need to think too hard about whether your documentation is project level or data level, but it's worth pointing out that there are different layers to a research project, and all of those layers of context are important to document and consider when assessing and evaluating your data and the overall research findings. So hopefully you're already thinking of examples of what documentation, broadly speaking, is across all levels. This might be your interview guide, your variable list, your blank consent forms, your information sheets. Some of you, if you do qualitative work, might take memos of your analysis; if you're doing quantitative work, it could be syntax. Basically, anything that relates to the paperwork you create as part of your project, that's not exactly data, is probably going to be documentation. And just to make matters a little more nuanced, all of this material from your research is called different things by different policies. Archives like the UK Data Service will call it documentation, and under documentation we have specific kinds of documentation.
So there are user guides, data lists and data dictionaries, all of which we'll talk about in this workshop. There are also readme files, which you might have been asked to write for us if you've ever tried depositing data with us. But some policies, like the ESRC's data policy, refer to research materials, which starts to get a little bit messy as to what counts as a research material and what doesn't. It also refers to data assets, covering the systems used to hold data. And there's also metadata, which you've probably heard described as data about data, and specific named standards like DDI would also be metadata. So while the concept of documentation is simple, the deeper you go, the more nuanced it gets. If you're interested in exploring some of this terminology further, CODATA has a working group dedicated to terminology and guidance on terminology, just as a side note. Hopefully some of the practical examples we have of different types of documentation will give you enough of a flavor to dive into any area you might want to know more about. So now you know what documentation is and the different levels, and hopefully you're starting to see how embedded these are in doing and disseminating research. But why do we do it? Documentation, from the point of view of the archive, maximizes the reuse value of data and is also essential for us to review and publish it. You can't really understand data without documentation. For example, if you dropped an interview transcript in the middle of the street and someone picked it up, they couldn't really understand the data without understanding the context in which it was gathered. Documentation also adds to the historical value of the data, as it builds up a kind of provenance for it. It also allows you to expand on the methods and processes that might not normally get covered in a publication.
So there's less space for this material if you're trying to publish, but if you feel that it's useful for understanding the data, you can make it available as well. Doing so will also help you enhance your research outputs: documentation in and of itself becomes an output, and documentation can be reused, which we'll talk about a little later too. It also adds a level of transparency to your research. As part of the peer review process, reviewers can better understand the work and the data, and re-users can more accurately and efficiently reuse that data. And finally, it aids in the creation of FAIR research. I'll talk a little more about FAIR data in just a moment, but first I want to do a quick exercise which hopefully highlights some of these points for you. So if I could have Emma please drop worksheet number one into the chat box. You should get a link in just a moment. There it is. Let me just see if I can put this on my screen as well. Give me just a moment and I'll put it up. I'm going to give you about five minutes to read through this worksheet. You should see that it is an interview excerpt. And what I'd like you to do is... here we go. Resume, share. Lovely. I'd like you to read through it and think about your initial impressions of this particular interviewee. Where do you think she lives? How old do you think she is? What might some of her hobbies be? Who might some of her friends be? And consider whether you'd need any other information in order to analyze the data. It's not very long, about a page, but I'll give you about five minutes or so to give it a good read-through and see what you can extrapolate from the data that's there. All of you should be able to access this worksheet: just click on the link in the chat and you should be able to bring it up on your screen.
We've already got... I'll give you another minute. We've already got a couple of comments about the difficulty in reading it, which I think even for native English speakers can be real, because it's written phonetically. I think that's all I'll say about it for now, but we will address that point in just a moment. All right, if anyone feels like it, just pop your thoughts in the chat. We've already got some people typing things in. So: from Scotland, a grandmother, excellent. Somebody has guessed maybe in her 80s. Ooh, even guessing the particular region of Scotland as well. So some of you with a little more local context are actually able to guess the area. We've got some people saying, what is it, Ashington, versus others who are saying maybe Glasgow. Excellent, great work, yeah. And we've got a few people now guessing she's in her 80s, definitely a grandmother, elderly, excellent, or Aberdeen, okay. And on the age, someone has just written in that it's partly because of some of the context she gives, talking about corporal punishment at schools and so on, excellent. Okay, so what I'd like to do now: Emma, if you could just pop in worksheet two. Let me see if it'll let me switch screens. Yes, it is letting me. We have a second worksheet here, which is the context for this interview transcript. So what I'd like you to do is just take a couple of minutes to read through this context, and if you don't mind, share again in the chat what you found most surprising about our interviewee. All right, so a number of you are focusing in on the age. It probably is useful to know when the interview actually took place, because some of you may have guessed the time period she might have been born in, but perhaps not her actual age.
I think being a grandmother at 43 is probably a very different kind of norm today than it would have been in years gone by. Yeah, so somebody's pointed out that, interestingly, we guessed the age based on the present day rather than the time the interview took place. Is there anything else anyone wanted to add? I'll just give you a moment to type that in while I switch my screens back to the slides, so just bear with me a moment. Okay, so I think some of you perhaps found the additional context surprising, and there were perhaps details supplied by it that you hadn't thought about before, such as when the interview took place, which is actually quite important. So hopefully that gives you a clear demonstration of how context can matter and how it can change your perception of the data. But why share documentation and data with other people? This particular point is one of the key underpinning principles of FAIR data. The FAIR principles are relatively new guidelines, or you might call them goals, of research, which aim to make research more transparent, collaborative and constructive. Since the early 2000s, technology has had a massive impact on how research is done. We collect more data, which is much more complex, and we share it much more quickly than we've ever been able to before. However, despite collecting so much data all the time, we still have challenges in processing it. Just think of any organization where data is not shared between departments and you have to constantly re-enter the same information again and again. So how do we solve this? Well, in 2016, Wilkinson and others published the FAIR principles, which outlined what good data management looks like in order to enable the sharing and reuse of data. And the key here is to make the data machine readable, to be able to use the technology that has so massively changed the way research is done.
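To make that "machine readable" point a bit more concrete, here is a minimal sketch of what study-level metadata looks like when it's structured for software as well as for people. The field names and values here are purely illustrative, not any particular standard (real archives use standards such as DDI or Dublin Core):

```python
import json

# A hypothetical study-level metadata record. The field names are
# illustrative only (real archives use standards such as DDI or
# Dublin Core), but the idea is the same: structured key/value pairs
# that software, not just humans, can search, index and harvest.
study_metadata = {
    "title": "Example Interview Study",
    "creator": "A. Researcher",
    "date_of_fieldwork": "2016-03",
    "methodology": "semi-structured interviews",
    "keywords": ["documentation", "data sharing", "FAIR"],
    "geographic_coverage": "United Kingdom",
}

# Serialising the record to JSON makes it machine readable: a
# catalogue or search tool can pick out the title, keywords and
# coverage without a person having to read a user guide.
record = json.dumps(study_metadata, indent=2)
print(record)
```

A record like this is what makes data findable: harvesters and catalogue search tools work on these structured fields rather than on free text.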
And these guidelines were so influential that an international collaboration, the GO FAIR International Support and Coordination Office, was established just a year later, and the FAIR principles have continued to influence policies. You'll see FAIR referenced in the data policies of the Research Data Alliance and the Association of European Research Libraries. For those of you based in the UK, UKRI and all of its research councils also refer to FAIR data. And if you receive any grants or taxpayer money to complete research, chances are, if you're anywhere within Europe, you will be asked at some point to share the data and the documentation at the completion of the project. Many publishers are also now requiring the sharing of data and research materials before the publication of your work. All of this is done in the name of transparency and research rigor. The FAIR principles state that data should be findable, accessible, interoperable and reusable. Research is not just something you complete in the solitude of your academic office, but something completed in collaboration with others and then shared for future reuse. To make your data FAIR, however, also means documenting the data. Making data findable and accessible requires clear metadata. Equally, making data reusable means documenting its provenance. I won't go further into the FAIR principles; they are listed here, but please do send through any questions you might have about them and we should be able to get to them in the Q&A. We're happy to give a little more detail if you're interested in FAIR, but we just wanted to give you a bit of background and context on why we document data and why it's important to share it. From here, I'm going to go a little further into documentation for qualitative data, and then I'll pass over to Anka to talk about documentation for quantitative data and metadata.
In talking about qualitative collections, I'll show you some good examples of documentation, but also talk about how you might use changing technology to make documenting your project a little easier. And then finally, I'll talk a little about reusing documentation. All right, so here are some examples of documentation for qualitative work: basically, all the information you probably create along the way of doing research, but which never really sees the outside of your research team, or perhaps some of your participants. This can include interview preparation: instructions to interviewers, prompts, topic guides, blank consent forms, information sheets, or any other materials that participants receive prior to taking part. It could also be text written by you expanding on the methodology or sampling, including, where it might be permissible under copyright, extracts of publications or draft work. What we don't often see, but what can be really useful, are things like research meeting minutes, research diaries, field notes and documentation from analysis, which can include memos, code books or even initial analysis write-ups. We've got about 1,500 or so qualitative collections, and we don't often see things on analysis, which is really interesting. I'm not sure why it's not typically part of the documentation for qualitative collections, because it's pretty standard in quantitative collections. There's a layer of analytic transparency, if you will, which really helps to validate research findings, and it can really help re-users better understand some of the methodological decisions that were made, or perhaps not made, in the process of cleaning up the data. And this kind of context is really, really important to qualitative work; it's part of the argument for why you do things qualitatively.
Some of that work can be embedded within the analytical documentation. Hopefully those materials would be actively considered by the research team during the data collection and analysis, and collating them and making them available as documentation then really helps to achieve what the qualitative work set out to achieve. Within collections curated by the UK Data Archive, these materials would be collated into what we call a user guide. The user guide is bookmarked, so you can see in the image here, in the upper right-hand corner of the screenshot, little bookmarks which tell you which materials are available in that user guide, and those tend to encompass the project-level documentation. In addition to the user guide, the UK Data Archive also curates a data list, which is basically an at-a-glance look at participants. This would probably be considered more like data-level documentation, where you might have metadata like basic demographic details about your participants, or even just the file name where you can find the relevant data. The details on the data list are not necessarily standardized or all-encompassing; they're meant to be the details that were relevant to the project itself. Those can sometimes take a little bit of time to assemble, but they can be really useful organizational tools during the research as well, so it's generally good practice to create one as you go along. While the collections curated by the UK Data Archive will have user guides, you may also see a folder of documents, or just separate files, on collections deposited by researchers themselves. Part of the archive is a self-deposit system, and in the self-deposit system you'll often find a folder like this one, listing out different kinds of documentation.
So in this collection on political dissatisfaction, you can see each file that was included with the data, from an end-of-award report to the information given to participants. This particular study is a multi-method project, so all of the documentation definitely makes it easier for re-users to piece the project together again and understand what was going on. Other types of documentation can include observations written by researchers in the moment of data gathering. This is a little bit interesting, because some types of documentation occupy a gray area between documentation and data, and this is probably a good example of one of those. Some people might use something like interviewer comments or field notes as data itself, but they can also be considered documentation. Some methods will actually dictate taking time for self-reflection as part of the method, so this may be taken as standard and, again, probably would be used as data; other methods simply recommend it as good practice. These particular comments are just a few sentences that were written after every interview by the interviewers of the Affluent Worker study, the research that was eventually used to create our ONS categories of class. They help to contextualize the relationship between the researchers and the participants, and they're actually really helpful data-level documentation. Again, this is a little unusual. I'm not sure if it's that methods don't often dictate this kind of quick reflection after the interview, or at least don't dictate writing it down, or whether it's just something that's not often shared. But again, this really helps to rebuild what the power dynamics of the interview were. A step further, perhaps, than field notes is the draft work of analysis. This is taken from Dennis Marsden's Mothers Alone collection, and he's got this piece on felt poverty.
It was written but never actually made it to publication, and it was included in the documentation for the collection. And it's a really interesting collection: it was led by white, educated males who were interviewing single mothers living on welfare, and it took place in the seventies, so there were some really interesting gender dynamics going on, as well as class dynamics. This piece, I think, really helps to provide context and to show the kind of sympathy the research team felt toward their participants, despite the time period and the kinds of attitudes you would think would normally persist, even within research. Annette Lawson's study, called Adultery: An Analysis of Love and Betrayal, was a project conducted in the 1980s aiming to explore the, at least at the time, extremely taboo topic of adultery. As such, it was really, really hard for her to recruit participants. So Lawson chose to put out a call for participants in a newspaper, but it created an arguably biased sample: most of her participants were white, mostly middle-class women, the ones who self-selected to answer her call for participants. Lawson was a little preoccupied with her sample, so she wrote a 54-page defense of it. She starts, as you can see, with a couple of pages discussing some of the ethical conundrums that arose from her sampling strategy, which included, for example, jealous partners sending in the details of their married partners to participate, or a man who seemingly called in from a psychiatric ward. She also made extensive comparisons between her sample and the national population.
So she explored what was said to be a significant difference from the national population and expanded a little more on this, asking whether the areas in which it differed from the national population actually mattered very much, and setting out in great detail what the differences were, until she finally comes to some interesting conclusions about sampling strategies more broadly, including the point that sampling needs to match the context of the study, and that exploratory studies benefit from a greater focus on participants' ability to talk about the topic in detail rather than on who the participants are. Earlier I had an example of interviewer notes, but field notes are another great example of very detailed documentation. Field notes are a little different in that, again, they occupy that gray space between being data and being documentation at different times. So it's worth pointing out that documentation is normally openly available within an archive, and things like field notes and other reflections, depending on the level of detail, may need to be put under similar access restrictions to data. There are probably only about half a dozen collections with examples of field notes like this, and almost all of them are from ethnographic studies. But, you know, who knows, maybe we'll see a little more of that kind of detailed documentation in our collections in future. And there are also new possibilities with changing technology. For example, NVivo and other computer-assisted qualitative data analysis software allow you to download the code books, memos and mind maps you made along the way of your analysis. Here you can see a list of what NVivo calls nodes, with their descriptions, and this is all very easily downloadable.
The Edwardians collection, for example, which is our founding collection, deposited in the 90s, contains 453 interviews of 80-plus pages each with British residents born during the Edwardian period, and it was accompanied by all of the documentation of the work that was done by hand: there were 16 shelves dedicated to holding the coding of those transcripts into key themes. But of course, now we can receive a single project file if you are using something like NVivo, ATLAS.ti or MAXQDA, and all of that can be downloaded at any point after coding and turned into a PDF. Research teams can also use blogs and websites to keep in touch with participants. They might keep a research blog which updates others on the progress of the project, posts information for participants, or even sends out calls for participants. And then once the project is done, that site can sit alongside the project as a related resource, providing additional documentation about the research process as a whole. We also see more creative documentation, such as this photo story. Again, this is the sort of thing that can be classed as either documentation or potentially data, and because it contains images of people, you would want to make sure that appropriate consent is in place to use it; but there is scope to use video and audio files to accompany data. We have also interviewed PIs about their projects, for them to expand on the methods and the sorts of things that came up during the research. Finally, changing technology affects not just how research is done, but also how we archive. The UK Data Service has created QualiBank, an online tool for searching, browsing and citing qualitative data. As part of that tool, you can search and view qualitative data online, but you can also view the linked documentation.
This documentation can relate to a specific piece of data or to the collection as a whole, so it preserves the distinction between project-level and data-level documentation, and it sits right next to the data within our user interface. And finally, I just want to make one last point, about reusing documentation. We often think about the reuse value of the data specifically, but less about how documentation has its own value. Documentation can serve as inspiration for good practice: adapting consent forms, for example, or information sheets, rather than writing them from scratch. We have hundreds of consent forms documented within our collections, including snippets like the one you see here, which explains what data sharing means. So feel free to explore those, learn from the good practices of others, and reuse them in your own research. You can also examine how to do research with, for example, children or vulnerable groups, or other really challenging kinds of research. Things like writing information letters and consent forms for children can be especially difficult, and we have some really good collections which provide good examples of how this has been done in the past. You can do the same with data collection instruments. One collection, Foot and Mouth Disease in North Cumbria, deposited its interview guides, which were then reused by medical students to better understand doctor-patient dynamics. I also refer dissertation students to similar interview guides on their topics before they set out to make their own, just to help them see what's important to ask, what they might not want to ask, and basically how those decisions are made. All right, I think we're over to Anka now, so give us a moment while we switch screens. Thank you, Maureen. I'll just share my screen now. I'm also going to go through the chat and see if there are any questions coming up.
We've got a few in the Q&A, but if we miss any from the chat, feel free to pop them in the Q&A box and we can get to those at the end, or I can try to type in some answers as well. Okay, so in this second half of the presentation we're going to talk about quantitative data, documenting quantitative data, as well as metadata. In this section on documentation for quantitative data, we're going to look at exactly what documentation should accompany quantitative data. We're going to have a quick look at embedded documentation, as well as what good documentation looks like, and of course we're going to have quite a few examples to look at, and also quite a few links for you to come back to at a later point, once you have the slides. Okay, so just a quick summary in terms of documentation for quantitative data. We have on the screen a few examples; as Maureen said, you've probably already heard these mentioned, if not at the point of deposit then when producing them in your own research, when using them as research instruments. So we have questionnaires and code books (either an embedded code book or a separate file to the data file), data dictionaries, user guides, experiment protocols and readme files. We're going to look at each one in detail in a second, but I just wanted to give a summary of the documentation that is specific to quantitative data, which is really what we're going to touch on in this second part of the presentation. Okay, so diving into it. In terms of documentation for quant data, I think the first sentence there also applies to qualitative data, and to documentation in general.
So in terms of what we need to include when we share our data in an archive or repository, really we need to ask ourselves: what would someone with no prior knowledge of the project or the data (perhaps someone who wants to use it, say, five years from now) need to be able to understand and use the data we are sharing correctly in their own research project? Okay, so for quantitative data, and to be a bit more clear, we're referring here to structured tabular data, documentation can be embedded in the data file: you can have variable and code descriptions in the data file itself, or you can have them in a separate file that you then upload alongside the data file. As I'm sure you're aware, most data analysis software packages have this option to add data annotation and descriptions, variable attributes, data types and so on, and that can easily be exported and uploaded alongside the data file, and of course it can be left in the data file as well. We're going to see later on why having it only embedded in the data file might be an issue, but I won't touch on that yet. Alternatively, as I said, if that information is not embedded in the data file, it can be exported or recorded separately in a document such as a code book or a data dictionary. Okay, so this is an example of embedded documentation. Those of you that have used SPSS are very familiar with this view, this screenshot. In the variable view section we have all the information about the variables, and if you're planning on sharing an SPSS file, this is really where we find quite a few issues in terms of documentation when this type of file is deposited, especially in the values section.
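As a rough illustration of the kind of variable-level information that sits in that variable view, here is a sketch with entirely made-up variables and codes; the point is that labels, value labels and missing codes are all declared explicitly:

```python
# Hypothetical variable-level metadata of the kind a package like SPSS
# keeps in its variable view: a human-readable label, declared value
# labels for each categorical code, and explicitly declared missing
# codes. All names and codes here are made up for illustration.
codebook = {
    "q1_sex": {
        "label": "Q1: Sex of respondent",
        "values": {1: "Male", 2: "Female"},
        "missing": [-9],          # -9 = refused, declared rather than left blank
    },
    "q2_age": {
        "label": "Q2: Age at last birthday (years)",
        "values": {},             # continuous variable, so no value labels
        "missing": [-9, -8],      # -9 = refused, -8 = don't know
    },
}

# Every code that can appear in the data should be declared somewhere,
# either as a substantive value label or as a missing code:
declared_codes = set(codebook["q1_sex"]["values"]) | set(codebook["q1_sex"]["missing"])
print(sorted(declared_codes))  # -> [-9, 1, 2]
```

When any of these pieces is missing, a re-user is left guessing, which is exactly the problem described next.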
So not all the values will be declared, for example; to be more specific, for categorical variables it wouldn't be clear what zero is, what one is, what two is, because there wouldn't be any value labels. The labels should also be quite intuitive and easily understandable, and it should be clear which question in the questionnaire the data refers to. The missing values should also be declared, but we're going to look at that at a later stage. So this is an example of embedded documentation in the data file; under data view, of course, we have the data itself, and it would be deposited as one file in the archive. Okay, I mentioned user guides, so what should a user guide contain? This will differ across projects; there isn't necessarily a gold standard for what these should look like. They vary, but in terms of content they include data collection methods, fieldwork information, consent procedures, interview schedules, et cetera. As I said, this will depend on the project, and I've added here three examples of data collections whose user guides you can have a look at; they're very good and very comprehensive, just to have as examples, and you can of course adapt them to your own project. Code books and data dictionaries. So, as I said, a code book can be the same information that could be embedded in data files, and I said earlier that embedding could potentially be an issue because of access restrictions. If we deposit a data file that has embedded documentation, that data file will potentially be under some sort of access restriction (unless it's published openly, of course), and if it's under access restriction, then the embedded documentation will be under access restriction too, whereas at the Data Service, and in general for all repositories and data archives, documentation should be openly available.
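One way around that is to export the embedded labels into a separate plain-text data dictionary that can be published openly even when the data file itself is restricted. A minimal sketch, again with hypothetical variables, writing a simple CSV:

```python
import csv
import io

# Hypothetical embedded metadata: variable name -> (label, value labels).
variables = {
    "q1_sex": ("Q1: Sex of respondent", {1: "Male", 2: "Female", -9: "Refused"}),
    "q3_educ": ("Q3: Highest qualification", {1: "Degree", 2: "A-level", 3: "GCSE"}),
}

# Flatten the embedded labels into a simple CSV data dictionary, one
# row per value label. A file like this can be published openly even
# when the data file itself sits under access restrictions.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["variable", "label", "code", "code_label"])
for name, (label, values) in variables.items():
    for code, code_label in values.items():
        writer.writerow([name, label, code, code_label])

print(buf.getvalue())
```

In practice you would read the labels out of the real data file with your analysis software's export facility; the shape of the resulting open file is the point here.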
So this is why it's preferable, if the documentation is embedded, for that to be exported from the data file so that it can also be uploaded separately as a documentation file, or for a code book to be created separately. If the data file you are uploading doesn't have the facility for embedded documentation, then you would just need to create that separate documentation file to accompany it. I'm sorry, my voice is not the best today. So in terms of what a code book or data dictionary should include: this is, as I said, detailed and sufficient information about all the data items. Any variables that are new, that have been produced or derived, ideally including the frequencies as well, and the command files that were used to create them, just so that whoever uses them in the future is aware of that information and how they were produced, based on what variables, et cetera. In terms of code book creation tools, there are a couple mentioned on the screen, so we have the DDI Editor and Nesstar Publisher. Again, these are aimed at curation purposes, so they are used in general by data archives, not as much by individual researchers, but I just wanted to point them out, put them up for you, in case this is something that interests you and you'd like to look at a more automated way to produce documentation files and metadata. Okay, a couple more examples on code books and data dictionaries. As I said, I wanted to give plenty of examples so that you can go to the data collection and have a look at exactly what we mean by good code books and good data dictionaries. So I was just going to leave you to have a look at this in your own time, obviously when you have the slides. So moving on to data level documentation. Maureen's made the distinction between study level and data level documentation; for quantitative data, data level documentation includes adequate variable names as well as variable and value labels.
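To make the idea of a separate data dictionary concrete, here is a minimal sketch in Python, using entirely hypothetical variable definitions, that writes a simple data dictionary out as CSV. It is purely illustrative; dedicated tools like those just mentioned produce far richer, standards-compliant output.

```python
import csv
import io

# Hypothetical variable definitions, as they might appear in a data dictionary.
# Each entry: variable name, label, type, value labels, declared missing values.
variables = [
    {"name": "Q1_AGE", "label": "Q1: Age of respondent (years)",
     "type": "numeric", "values": "", "missing": "-9=Refused; -8=Don't know"},
    {"name": "Q2_SEX", "label": "Q2: Sex of respondent",
     "type": "numeric", "values": "1=Male; 2=Female", "missing": "-9=Refused"},
]

def write_data_dictionary(variables, stream):
    """Write one row per variable: name, label, type, value labels, missings."""
    writer = csv.DictWriter(
        stream, fieldnames=["name", "label", "type", "values", "missing"])
    writer.writeheader()
    writer.writerows(variables)

buffer = io.StringIO()
write_data_dictionary(variables, buffer)
print(buffer.getvalue())
```

The same information could just as easily be written as a plain table in a document; the point is that every variable gets a label, a type, its value labels and its declared missing values, recorded somewhere outside the data file itself.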
I've already touched on this previously on the SPSS slide, but in terms of variable names: they should mention the question number in the questionnaire, so they should relate to that, so it's clear what question they correspond to. They should have a numerical order system. They should be meaningful abbreviations or combinations of abbreviations, and they should of course be consistent in naming conventions across the entire project. So if there's more than one dataset, produced perhaps by collaborators at different locations, then make sure that you're using the same naming conventions throughout. And of course, for interoperability, I always have issues saying that, across platforms these should not be longer than eight characters, and please avoid using any spaces; instead of a space, use an underscore. That's what I usually do. Data level documentation, so continuing with that. We have similar principles for variable labels. So these should be, as I mentioned earlier for the SPSS file, as brief and concise as possible. Intuitive, but make sure you keep them short. Make sure to include the unit of measurement where that is appropriate. Make sure that you're aware of any coding or classification schemes that are used. And of course, as I said, reference the question number of the survey or questionnaire that was used. And there's an example there of a variable name and label; I won't go through that, but that is exactly how we got to that variable name. Okay. For value labels, some of the same. So ensure there are no out-of-bounds values for categorical variables. I mentioned this on the SPSS slide already: make sure that all the values are described and it's clear what category corresponds to what number. Avoid having blanks as well, system missings or zeros; make sure you declare those. Yeah, and make sure that it is clear, to make that distinction between different types of missing.
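The naming conventions just listed are mechanical enough to check automatically. This is a small, purely illustrative Python sketch (the rules encoded here are the ones described above, not an official standard) that flags names which break them:

```python
import re

def check_variable_name(name):
    """Return a list of problems with a variable name, following the
    conventions described above: at most eight characters, no spaces
    (use underscores), and only letters, digits and underscores."""
    problems = []
    if len(name) > 8:
        problems.append("longer than eight characters")
    if " " in name:
        problems.append("contains a space (use an underscore instead)")
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        problems.append("uses characters other than letters, digits, underscore")
    return problems

print(check_variable_name("Q3_INC"))    # → [] (conforms)
print(check_variable_name("Q3 income"))  # flags both the length and the space
```

Running something like this over a variable list before deposit is a quick way to catch inconsistencies across datasets produced by different collaborators.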
So whether it's not recorded or skipped or, yeah, et cetera. This is an example of variable information for different variables. As we see, we have the label, missing values, the measurement level. So all this is very important and it can easily be produced; this is an SPSS output, but you would be able to obtain the equivalent in other software as well. Variable values as well. So as we can see, this is clearly labelled, in this case for three different variables. Again, this is also an export that we can obtain, as is the next one. We have two different variables here and we have the label, we have whether it's numeric or string, and then we have the different values as well as the labels. So it's very clear what is what. Okay, moving on to metadata. So we are going to firstly discuss very quickly what it is, although Maureen has already touched on that, and what qualifies as good metadata. We're also going to have a quick look at metadata standards and how we go about producing metadata. Okay, so what is it? It's basically a different type of data documentation. It's a subset of core data documentation, but it's a type of documentation that provides standardized and structured information. And as opposed to a documentation file that we would produce, it's intended for machine reading. Because it is indexed, that is the word that we use, it is possible for it to be read by machines. It allows data to be found in data catalogs. So that's why it is so important that it is machine readable. It's important for the purposes of cataloging, citing, and discovering data collections in a data catalog.
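The out-of-bounds check mentioned a moment ago can also be sketched in a few lines. Here is an illustrative Python snippet (hypothetical variable and codes) that finds observed codes which have neither a value label nor a declared missing value:

```python
def undeclared_codes(observed, value_labels, declared_missing=()):
    """Return observed codes that have neither a value label nor a
    declared missing value: the 'out of bounds' values described above."""
    declared = set(value_labels) | set(declared_missing)
    return sorted(set(observed) - declared)

# Hypothetical categorical variable: 1=Yes, 2=No, with -9 declared missing.
labels = {1: "Yes", 2: "No"}
data = [1, 2, 2, 3, -9, 1]

print(undeclared_codes(data, labels, declared_missing={-9}))  # → [3]
```

Here the stray code 3 would be flagged for either a value label, a missing-value declaration, or a correction, which is exactly the kind of issue curators pick up when checking deposited files.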
So if we don't have good quality metadata, of course, we will have data collections that are not just impossible to find in a data catalog, but that would also have limited value, because users, first of all, wouldn't be able to find them, but also wouldn't be able to use them correctly or decide whether a collection is suitable for their project or not. And of course, at the end, they wouldn't be able to cite that data collection once they've used it. Some examples of metadata are listed here on the screen: we have abstract, keywords, topics. Yeah, I won't go through all of them. Metadata in general refers to data about data, but it's usually study level documentation. So we don't necessarily refer to a variable label as metadata; metadata mostly applies to study level documentation. So, collections that would appear in the data catalog, and the information that you see when you land on a catalog page. And that is exactly what we have on the screen here. I think there was a question about metadata: how do we keep it from revealing information that we cannot reveal about research participants? And I think a key aspect there is that this is study level documentation, so it is general information about the whole study, not about individuals in the dataset. Also, good metadata. So Maureen already touched on the FAIR principles and, as we can see on the screen, although some of the rectangles are not exactly where they should be, we can see how many of them relate to metadata. So all of them except for three. And I'm not going to go through each individual one, because you can find this and read it in your own time. But this is how important metadata is when trying to comply with the FAIR principles. And if there are any questions in more detail about this, or maybe it's not exactly clear what is required in terms of metadata for a particular principle, do add a question, or, if you're happy to contact us, we're always happy to talk about this.
Okay, producing good metadata. So how do we go about producing this? Archives or repositories will collect this information at the point of deposit: you'll probably have to fill in a data deposit form or data offer form, it depends; different repositories and archives use different terminology, but they are all intended for the same purpose, collecting that information so that the archive can then use metadata standards to index it, so that collections are discoverable in their data catalog. They also promote interoperability across platforms, and of course allow data to be reusable once found. So when you submit your dataset to a trusted data repository or an archive, these standards are automatically applied. So producing good metadata is not something that you need to worry about; as long as you pick an accredited, FAIR-compliant repository or archive, they will do this themselves. At the UK Data Service, we use the DDI to structure our catalog records, and that stands for the Data Documentation Initiative. And again, we don't have the time to go into this, but we included a link here and also at the end of the presentation for you to read more if you'd like. But what it basically is, is a detailed metadata standard. It was originally designed for describing social and economic science data, and it is still used by most social science data archives in the world. And of course it contains both mandatory and optional metadata elements. When you fill in that data offer form I mentioned earlier at the point of deposit, there will probably be some fields that are marked with an asterisk, meaning they are mandatory, so they are absolutely needed for us to build that catalog record in our data catalog. And there are of course other metadata standards as well; this will usually vary by discipline, and I included a link there for you if you'd like to read more about this.
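The idea of mandatory versus optional fields on a deposit form can be illustrated with a tiny sketch. The field names below are hypothetical, loosely inspired by common study-level elements; the actual mandatory list depends on the archive's own form:

```python
# Hypothetical mandatory fields; the real list is defined by the
# archive's deposit form and its metadata standard (e.g. DDI).
MANDATORY = ["title", "creator", "abstract", "time_period", "geography"]

def missing_mandatory(record, mandatory=MANDATORY):
    """Return the mandatory fields that are absent or empty in a
    deposit-form record, represented here as a plain dict."""
    return [field for field in mandatory if not record.get(field)]

record = {"title": "Example Survey, 2024", "creator": "J. Researcher",
          "abstract": "", "geography": "United Kingdom"}
print(missing_mandatory(record))  # → ['abstract', 'time_period']
```

In practice the deposit system enforces this for you with the asterisked fields; the sketch just shows why an empty abstract blocks the catalog record as surely as a missing one.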
We also have controlled vocabularies. A controlled vocabulary is a consistent and organized way of describing data, essentially to, again, make it more findable and shareable across research communities. So examples here are subject headings, thesauri, ontologies, and taxonomies. At the UK Data Archive we use HASSET, which is the Humanities and Social Science Electronic Thesaurus, and we are also curating its multilingual sister, which is available across, I think it's now 16 different languages, called ELSST, or the European Language Social Science Thesaurus. So in terms of reviewing a dataset, you can find this online, and I included the link at the bottom of the screen. This is for ReShare, which is our self-deposit repository. So you can find available online what checks are done to the data once you submit it for review. So when you want to publish your dataset, it will come to us, we will check it, and then we will probably come back to you if there are any further edits needed. But if you want to speed up that process and you want to deposit a great project that wouldn't necessarily need any edits, then you can have a look at the type of checks that we do and work on those in advance, to make sure that the dataset you are submitting for review passes the first time around and can be published. And of course there are project level checks and file level checks; I'm not going to go into detail, but of course we check all the files that are uploaded, and we check the data in terms of disclosure risk.
So someone asked about anonymisation, and how we make sure that what we publish is in line with what the respondents have agreed to. That is where we look, potentially, at the blank consent forms, and we make sure that they align with the access level that we choose for the data and the level of detail that we keep in the data. And if you're interested in more on anonymisation, we also have a workshop on that, which we ran a couple of weeks ago, and the recording should be available, and the slides as well, on our website. Okay, so we're reaching the final slides here. In terms of a data sharing checklist with documentation in mind, our advice would be to check with the archive or repository where you're planning to share your data, in terms of their guidelines, when creating the type of documentation that they will be needing. Perhaps they have templates that they recommend using, that you can just download and adapt to your project, and we have those as well, in terms of the data list and the readme file. And then of course you'll have to fill in the data deposit form, as I mentioned earlier, but please make sure that you do this in as much detail as possible, because that will allow the archive or repository to create that valuable machine readable metadata, and the richer the better, for all the purposes that we saw earlier. And of course, ensure that your data files also contain data level documentation, so all that important information that we already saw; it's important from a data quality perspective for that to be present for others to reuse. Finally, in terms of accessing data: I mentioned that documentation files are usually made available under open access, but these are the other access options that we have at the UK Data Archive. So we have open, safeguarded and controlled. Open is available without any registration. For safeguarded, the user would need to register and sign our end user licence. There are also special agreements there, in terms of asking
permission from the depositor to use it, or the option to place data under embargo for a fixed time period. And then we have controlled access; this is for our Secure Lab, our version of a trusted research environment, and this would be data that is quite sensitive and quite detailed, so there will be some extra steps there in terms of training the users, et cetera, and testing them as well; there's an assessment there. So yeah, these are the different access options, and they usually apply to data; documentation, as I said, is made available openly, except, I know there have been situations where a data list, for example, has been placed under safeguarded access because there was some individual level information there, but that is just decided on a case by case basis, and I know there were some questions about this earlier in the chat. But usually, in terms of data dictionaries and code books, especially for quantitative data, those are made openly available. And as I mentioned, we included some links for the resources at the end of the presentation here for you; we have a book as well, which I think last year we produced a second edition of, and of course we have quite a few training events, and I included there a link to our training page on our website. In practice, and I think this slide should have come right after the different access levels, somehow it skipped one: in practice we have, of course, collections that have files under different access levels. So we don't apply access restrictions at study level; one data collection, as you access it in the catalog, could have a few data files, and those data files could be under different access levels. An example here: we have the foot and mouth disease in North Cumbria collection, where we have interviews and written diaries; we also have the transcripts, and those are available for registered users, but the written diaries, well, they were embargoed until 2015, and then the audio files as well were only available with
permission only. So yeah, this is an example just to flag that the possibility is there to apply different access levels to different data files; it doesn't have to be the same access level, especially if you have a mixed methods collection, for example. Yeah, the option is there to have different access levels. Some tools and templates, as I said, other archives and repositories might have these as well; we also have them to make it easier for researchers and depositors, and you can just download these and adapt them to your project. Some further resources as well for you to have a look at. Of course, get connected, let us know if you have any questions, or, yeah, there are recordings of past webinars and workshops on our YouTube channel as well. I mentioned the anonymisation webinar we ran a couple of weeks ago; that should be there now. And of course we run recurring workshops on different data management topics, if that's something that interests you, and I linked the events page where you can register. And we added our contact information here again; if you have questions that we're not able to answer now at the end of this session, or there was something that wasn't clear, please do get in touch with us. Okay, I think we're almost on time, so thank you everyone. Hopefully we answered all the questions; well, I know I haven't answered a couple, and I'm happy to do that if we missed anything. And yeah, thank you very much, I hope you have a nice rest of the day, and hopefully we'll see you in another workshop sometime soon. Bye everyone.