G'day everyone. Hello. Welcome to webinar two of our findability module in this FAIR 101 course. My name is Liz Stokes. I'm from the ARDC, and I would like to acknowledge the Gadigal people of the Eora Nation, who are the traditional custodians of the land on which I'm standing. I would like to pay my respects to elders past and present, and acknowledge any First Nations people who are joining us here today. So welcome, and thank you very much for your patience as I made my way through some incredibly poorly timed tech snafus today. So thank you again.

I would like to introduce you to the team that is bringing you this course. We're almost all here: we have Andrew White and Nicola Burton from our engagements team. I'd like to wave to our people. Matthias and myself are from the skilled workforce team at ARDC, and we have webinar maestro Susanna from our communications team, in all her headset glory, keeping this webinar ship running today. I just wanted to make sure that everyone here was able to see the people behind the course, and that it's not just the presenting. There's a lot of stuff that goes on behind the scenes. We are like an iceberg, aren't we?

Okay, on with today's show. I'm going to concentrate on the role that metadata plays in facilitating FAIR data. Thank you for your enthusiasm so far. It's been great to see so many introductions on the Slack channel, and thank you for your lovely feedback from our first webinar. A few participants have asked some really interesting questions after that, so we'll respond to these, hopefully by the end of this week, and we'll put answers to those in the Slack general channel. Also, for the links that I refer to in today's presentation, Matthias will share these with you in the chat as we go along, so as an optional activity you can follow along with me if you like. If you do have any questions for today, please pop them into the question window as we go along.
You'll notice also that there might be a chat window as well, and you can have a go at that one too, but just to prepare you, we will probably put all the links into the chat window. Hopefully that appears smoothly for you. And of course, if you're the tweeting type, feel free to use the hashtag #fair101 as we go along. Okay, so let's move it along.

So welcome. There's the Slack information and the Twitter, so if you haven't joined our Slack yet, please do so. Find your equivalent at another institution in the introductions channel, and please use the general channel for questions or comments for everyone. There's also a link to our code of conduct. If you haven't had a chance to go and look at that, please have a go.

Now, today's overview. We're going to get into metadata. Our big focus there is looking at what rich metadata for research datasets looks like. We'll have a little poke around Research Data Australia, and I'll also highlight a few recommended discovery platforms, and wrap up with the activities, the quiz, and some preparation for the community discussion next week. So if you are metadata-inclined today, I hope this webinar meets your expectations.

Whoa, running ahead there. The FAIR principles we're going to cover today are that data are described with rich metadata (no, I don't mean bling), that metadata do not hide or disguise the identifier of the data that they describe, and that both metadata and data are registered in a searchable resource. All of this speaks to how metadata helps make things findable at both the item level and the repository level.

So, metadata. Let's get meta. My garden-variety definition is that metadata are structured data about data. These are standardized methods of describing research data so that humans and machines can understand what the data are about.
Metadata are the primary tool for finding and retrieving data about almost anything, and the more thoroughly something is described in metadata, the easier it is to find. They are also organized in schemas, which also have their own metadata standards. It's really possible that I might go into a rabbit hole at any point because of this metadata topic, but I will do my level best not to fall down quite so early. The main point of having structured data about data is that it is machine readable and human readable. If you're a beginner at this you may question the human readability of it, but once you get used to it, it's really nice.

So, types of metadata. Broadly speaking, there are a few different types here, and I've described these on the slide. It's useful to know the purpose of different types of metadata so that you have a rough guide for assessing the quality and completeness of the metadata, and ultimately the findability and FAIRness of a research dataset or collection. Here we have descriptive metadata, describing the content and context, helping people make value judgments about the research data. Structural metadata contains information on the relationships inherent to the dataset: how it's assembled, and versions, for example. Administrative metadata is for the people who are managing and curating the data, so potentially quite a large portion of our participants today. It's quite a large category, and for the FAIR principles I just wanted to highlight the following subcategories. Technical metadata is useful for systems, software and services: is the data in a compatible file format, for example. Access and rights metadata tells us who is allowed to access the data and under what conditions. And preservation metadata keeps a record of actions taken to preserve the data and metadata into the future. So let's take a look at some Crossref metadata in action.
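As a rough illustration of the metadata types just described, here is a minimal Python sketch of one record grouped by type. It does not follow any real schema: the field names, the dataset and every value are invented for illustration, and `missing_sections` is a hypothetical helper that simply shows how a machine could check a record for completeness.

```python
import json

# Hypothetical record for an imaginary dataset. Field names and values are
# invented for illustration and do not follow any published metadata schema.
record = {
    "descriptive": {
        "title": "Example mooring temperature observations",
        "description": "Hourly water temperature from a coastal mooring.",
        "keywords": ["oceanography", "temperature"],
    },
    "structural": {
        "version": "2.0",
        "is_new_version_of": "example-dataset-v1",
        "files": ["readings.csv", "readme.txt"],
    },
    "administrative": {
        "technical": {"file_format": "text/csv"},
        "access_and_rights": {"licence": "CC-BY-4.0", "access": "open"},
        "preservation": [
            {"date": "2020-01-15", "action": "checksum verified"},
        ],
    },
}


def missing_sections(rec):
    """Return the expected top-level metadata sections that are absent or empty."""
    expected = ["descriptive", "structural", "administrative"]
    return [name for name in expected if not rec.get(name)]


# Human readable (pretty-printed) and machine readable (parseable) at once.
print(json.dumps(record, indent=2))
print("Missing sections:", missing_sections(record))
```

Real schemas such as DataCite or Dublin Core define their own element names and rules, so a check like this would need to be written against the schema actually in use.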
Okay, so this diagram shows how, when publishers register with Crossref, the metadata about their resources and research outputs, including the all-important persistent identifier (as we learned from Matthias, the DOI), is exchanged with Crossref and all the systems which use metadata to credit and cite the work, report the impact of funding, and track outcomes. Crossref has just released some new educational materials and documentation on their website, and I recommend you check it out. It's really nice looking and quite clear.

Okay, let's move on. Here are a few examples of metadata schemas, and this is one of my techniques for not going down into a rabbit hole: I'm just going to give you a couple of examples. We've got Dublin Core, which many of you may be familiar with. It's a very common metadata schema, used for describing resources on the web. Schema.org is what major search engines use, and it is growing in its use across commercial applications. DataCite and Crossref both describe research data and research outputs, and these two are the kinds of metadata schemas that I would encourage you to take a deeper look at in your own time, if you're interested in taking this a little further. And of course, here's a little hat tip to a couple of disciplinary metadata schemas: the Data Documentation Initiative is one that has been developed for the social sciences, and Darwin Core is one that is used in the biological sciences.

But I should give you a word of warning, because metadata reflects what humans and machines thought was important to know at that time. As technology and services evolve, so too do the standards we have for sharing and storing data. So when you're searching for data, or you're creating metadata records for data, keep in mind that there isn't really one single standard practice, or universal standard or vocabulary, to rule them all. The methods for
searching and the terms that people use are driven by why people are looking and how that data is stored.

Okay, so let's look at some examples. I'm going to show you a couple of excerpts of metadata records in XML format, extensible markup language format I should say. Then I'll take you over to Research Data Australia and look at the metadata of another research dataset. So what we have here, and I hope this doesn't freak you out too much (I've certainly been there), is an excerpt of a metadata record for some software, actually. You can see at the top, under the resource element, the metadata schema that they used. They're using an XML namespace, and the schema location is cut off, but we get to see schema.datacite.org, and there's the metadata schema that they are using. So the metadata identifies the schema, and it also identifies the identifier. You can see in this element here, under identifier, it tells us what type of identifier it is, a DOI, and then we get the content, or the value, of that metadata, and here it is: there's the DOI.

I'll just draw your attention down here to the metadata about the creators. We can see there are a few different creators, but this first one is pretty special, because AT Zalinsky has also got a name identifier attached to them. We can see the scheme URI (uniform resource identifier) is ORCID, and we can see the name of that identifier scheme is ORCID, so it also tells the machines that. And here we can see the 16-digit ORCID identifier in the value component there. So this is an example of how metadata has its own standards for being set out, and there are a few conventions, which I'm not going to go deep into right now, that help the metadata and the data be readable by humans and machines.

Let's look at another example here. For this one, I apologize for the text being so small. I'm going to draw your attention to this: this is another
research dataset metadata record, and the title of this is "Identification of putative novel specific targets of miR-210 in A549 human adenocarcinoma cells". What I want to draw your attention to is the subject element. In the subject here we see not just keywords or subjects, which tell us what this research dataset is about, but also some attributes that are specified. This first one might be familiar to those of you who work in the health sciences and are familiar with MeSH terms. MeSH stands for Medical Subject Headings, and it's a vocabulary of medical terminology that is used in many medical and health databases. You can see there they're not only using MeSH terms, they're also using a few other vocabularies. We'll come back to vocabularies, controlled vocabularies and some ARDC vocabulary services later on in this webinar series, but for now I'm just highlighting that.

Now I think it's time to have a little look at Research Data Australia. You can see the URL up the top there, researchdata.ands.org.au, and I'm going to invite you to hop along and see if you can jump over there. So here we are in Research Data Australia. I wanted to introduce you to this because it's a metadata aggregator that we run. You could also call it a repository of metadata, so it is a repository of sorts, but what we do is aggregate the metadata from lots and lots of different research datasets from research repositories around the country.

Let's see: Two Rocks mooring. What I wanted to do here was take you over to one of these metadata records (here we go, Two Rocks mooring) and show you that this record has been contributed to us by CSIRO from their Data Access Portal. If I scroll down, you'll be able to see there's some brief information, there's a brief description, and we get some important information such as the licensing and the access
details, who it's related to, and there's some time period and geographic location information. Down here there's a little section for the DOIs, and we can see there's a local DOI identifier, there's also the official DOI identifier here, and look, there's a little DataCite logo.

What I'm going to do, though, is scroll right down to the bottom and click over here on the bottom right-hand corner, where it says registry view. This allows us to see the same metadata record, but now we get to see what metadata is being used and what attributes. Again, if this is the first time that you've seen a metadata record in nested table format, breathe deeply along with me and we'll just have a look and see what we can see. Up here we've got some basic information about who it came from and how, and here we see the name information, and you get to see how this view separates the values from the attributes and the metadata elements. If I highlight the identifier field here, this is metadata for identifiers. We've got two of them: one of them is a local identifier, one of them is a DOI, and here are the separate values.

Another good example for showing the conventions of setting up metadata is the date. Within coverage, and temporal coverage (so, time), here we have a date, and this is the date format, W3C date-time format. It tells us what type of date it is, the start date or date from, and here is the value, and you can see that it's written in that date-time format, YYYY-MM-DD.

Okay, I can feel I've gone down into a data librarian rabbit hole, so I'm going to pull back out of that and see if I can switch over to come back to my main screen.

Okay, so coming back to discovery platforms. You may be wondering, well, where can I find out about other metadata discovery platforms, and how do I know how to find other research data repositories that may exist? Well,
the answer here is provided by DataCite, and they have a site called re3data. It's called re3 because it's the Registry of Research Data Repositories, and it indexes all of the research data repositories that exist in the world. You can browse this registry by subject or country, so it's really useful if you're looking for things in your own backyard, or if you're looking for repositories according to a particular subject or discipline. The metadata that re3data uses also covers terms, standards and licenses for those repositories, so it's actually quite useful if you're looking to compare different research data repositories for your purposes.

Okay, let's move on. Zenodo is a multidisciplinary data repository which is hosted by CERN. I'm highlighting it to you because it's very big and it's quite open, so you could put almost anything on there. They have a wide variety of data types and content types, and it's a very easy way to get a DOI for a resource and link it to your ORCID. As you can see over here, we've got an ORCID for one of the contributors, and you can probably get into that a little later. I would like to highlight that Zenodo has a communities function, which is essentially a grouping tool that allows you to collect certain resources across the Zenodo repository and put them together in a short list, or in a community. There's actually a great community that the Digital Curation Centre curates for research data management resources, so I recommend you have a look there and explore that community for their resources.

Moving over to Data Dryad. Data Dryad was one of the first really popular research data repositories. It used to be based in the life sciences, but now it is much broader, and they've recently launched a service for institutions to host research data. They currently host data for a lot of journals, and
Data Dryad is a service that is provided by the California Digital Library. They have a data processing charge, but they do attempt to preserve data indefinitely, though of course it's up to the researcher or the contributing people to prepare the context for that. One thing I would like to note is that all content in Data Dryad has a CC0 license, which puts it in the public domain. They also have a great FAIR guide already, and there's a link to their best practices in making data FAIR in the links that I trust Matthias is sharing with you right now.

Another good (excuse me) repository is ICPSR, which stands for the Inter-university Consortium for Political and Social Research. With a name like that, you'd want to learn the acronym. I'm highlighting ICPSR here because they have been awarded the CoreTrustSeal. The CoreTrustSeal is a process of certification for repositories, and it's a really solid step towards formalizing your organization's commitment to facilitating the creation and use of FAIR data. So if you are interested in taking that further, I would recommend you have a look at CoreTrustSeal certification. The other reason that I wanted to highlight ICPSR is because they use the DDI schema that I mentioned earlier, the Data Documentation Initiative. This really helps researchers produce material that is findable and well structured into the future, because all contributions to this repository have to comply with the minimum standards for that metadata schema. They also provide some really great guidance on preservation and archiving of social science data as well.

Let's wrap up our hit list of recommended repositories with a hat tip to the Australian Data Archive. I'm sharing this example here because they have also achieved CoreTrustSeal certification, but also because not all data that are available in the Australian Data Archive are
actually open. They do provide mediated access to some data, but certainly the metadata is open and available, and they use the Harvard Dataverse infrastructure, if you are wondering how they do that.

Okay, so it's nearly time for me to wrap up, so let's come back to metadata and ask ourselves: what is it that can take us to providing the metadata that will enable our research data to be findable? In other words, how do we get metadata into a shiny status of "metal data"? At this point I really have to thank the ARDC comms team for making these beautiful graphics for me. Puns are one of the main methods of learning for me personally. But back to the content of the actual course, sorry.

In summary, to make data findable we want to be using persistent identifiers, we want to be describing data with rich metadata, and we want to be ensuring that that metadata are indexed in a searchable resource, like some of the examples that I showed you. That brings us to the end of the findable module, so now I'm going to give you a little heads up on the activities, quiz and community discussions, and then we can get into any questions that you may have.

The findable activities are to do one or more of the following. Number one: read the article that started it all, or you could go and browse the GO FAIR website, which is quite helpful actually. Number two: create and link to your ORCID profile. Number three: explore some repositories for research data, which will give you an opportunity to go a little deeper into some of the repositories that I shared with you today. There will be a link to the activity worksheet for you right about now in the chat window, I expect.

Moving over here to the quiz. Just a note on this quiz: it is for testing your knowledge and what you've hopefully gained from our webinars today and on Monday. You will be required to do the quiz for each module to get the certificate at the end, but it will be open until the
very end of the course, and you can do it as many times as you like. We would love to know how you go and how this format is going for you.

A few notes about the community discussion. Check your calendar invitation. If you haven't received one, or you haven't signed up, please take the opportunity to do that now, or Nicola will assign you to a group by the end of this week. We're going to run these community discussions on Zoom, so I would encourage you to try for a hard cable or ethernet connection to your router if you can, or perhaps use your mobile network. I know sometimes our connection is patchy, and NBN does service updates all the time, it seems. If you have any issues, please contact Nicola about the community discussions. I'll put that information in the chat, and you can stay in touch with us via the #fair101 hashtag on Twitter, or of course on our Slack community.

Now, is that all of us? Oh, and one little reminder for the feedback. Thank you so much for your comments after our first webinar. You'll have another opportunity to give us some feedback about today's webinar. We do read these comments, so we appreciate what you say to us, which helps us make these activities and this course all the better for next time. Now I believe it's probably time for... ah, Matthias, hello. Look at you, appearing at the right moment.

Hello. Thank you very much for that, Liz. Two points I'd like to raise. One is I'm quite jealous of your tie, especially given you've managed to colour match it to the little "a" of the ARDC logo. I think I need to step up my sartorial game. And also, we have had appreciation of your puns, or puns in general, so please keep them up. Now, some of the questions that have been asked are sort of organizational in nature, about the course. Somebody asked whether all the links will be sent in an email, say, the link to the quiz, the link to
the activities, so that will be emailed out as well, for people who haven't been able to get on Slack just yet. And if you are having trouble getting on to Slack, please get in touch with Nicola to help troubleshoot that issue. Now, I accidentally posted the wrong link to SurveyMonkey in the GoToWebinar chat window. Please disregard that link. Everybody will be emailed a personalized link that will let us know who it is that's answering the questions. If you click on the link that I already posted in the GoToWebinar chat, the results will not be associated with you. I'm sorry.

Okay, now, actual content questions. I do encourage everybody to use the question module to post questions, and we have one question here asking if it would be possible to explain the difference between XML and JSON.

Oh, well, XML, extensible markup language, is a format, or a language I guess, that I am somewhat familiar with. JSON is something I'm not as familiar with, but I know that it exists. I hope this answer will cheer the librarians in our community, but it's something I will take on notice.

JSON is JavaScript something notation, object... and it might actually...

Object, Object Notation, right. Okay, so perhaps I can offer an answer. I hate doing this because I'm almost making it up, but if you have data available in JSON, it enables the machines, and any coding or scripts, to do a little more activity on the data than XML. But I will take that on notice and deliver a full report on the Slack channel by the end of the week.

Okay. Now, we haven't actually received any other questions about metadata and repositories, I'm afraid. Waiting for any more to trickle in, I'll give that some seconds. Although somebody has suggested a URL (I'll work out a way to share that one) for a web page that explains the key differences between JSON and XML. Okay, no more questions seem to be coming in,
which is actually rather convenient, because despite the slight delay to the start, we've actually managed to finish on time. You possibly could have gone down some more rabbit holes, Liz, but oh well, next time perhaps. All right, so I'll hand back to you, Liz.

All right, I don't have anything more to say. Thank you very much for bearing with us, and again, thank you for your patience this afternoon. I look forward to seeing you all on the Slack and at community discussions next week. Okay.
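On the question raised above about XML and JSON: both are plain-text formats for structured data. XML nests tagged elements that can carry attributes, while JSON (JavaScript Object Notation) nests objects, arrays and simple values. The Python sketch below, using only the standard library, reads a small DataCite-style XML excerpt like the one described in the webinar and re-expresses the same information as JSON. The DOI and creator name are invented placeholders, the ORCID iD is a commonly used sample value, and the JSON layout is an assumption made for this illustration, not an official DataCite JSON format.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative excerpt only: the DOI and creator name are placeholders.
XML_RECORD = """
<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.1234/example-doi</identifier>
  <creators>
    <creator>
      <creatorName>Example, Researcher</creatorName>
      <nameIdentifier schemeURI="https://orcid.org"
                      nameIdentifierScheme="ORCID">0000-0002-1825-0097</nameIdentifier>
    </creator>
  </creators>
</resource>
"""

# Prefix "d" is an arbitrary local alias for the namespace declared in the XML.
NS = {"d": "http://datacite.org/schema/kernel-4"}


def xml_to_dict(xml_text):
    """Pull the identifier and creators out of the XML into plain Python data."""
    root = ET.fromstring(xml_text.strip())
    identifier = root.find("d:identifier", NS)
    creators = []
    for creator in root.findall("d:creators/d:creator", NS):
        name_id = creator.find("d:nameIdentifier", NS)
        creators.append({
            "name": creator.findtext("d:creatorName", namespaces=NS),
            "nameIdentifier": None if name_id is None else {
                "scheme": name_id.get("nameIdentifierScheme"),
                "value": name_id.text,
            },
        })
    return {
        "identifier": {
            "type": identifier.get("identifierType"),
            "value": identifier.text,
        },
        "creators": creators,
    }


data = xml_to_dict(XML_RECORD)
# The same metadata, now serialized as JSON text.
print(json.dumps(data, indent=2))
```

Note how the XML attribute `identifierType="DOI"` becomes an ordinary key in the JSON object: the information is the same, only the way it is written down differs.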