Hello everybody, and thank you for your patience. I had some interesting moments there connecting and being confounded by myself and systems, so thanks a lot. Let's get into interoperability today. My name is Liz Stokes and I would like to acknowledge the traditional owners of the land on which we're meeting. I'm coming to you from Sydney, Australia, which is where I am based at the moment, on the traditional lands of the Gadigal people of the Eora nation. I'd like to pay my respects to Elders past and present and extend that respect to any First Nations people who are with us today. So let's get into interoperability without all of that faff. Matthias talked on Tuesday about the reasoning behind interoperability and its place amongst the FAIR data principles, showing some examples of what goes wrong when you don't have common, agreed-upon standards, and talking about the utility of having the data schema with the data, which brings everything together. So today I'm going to pretty much go over this same stuff and hopefully go a little deeper into what we mean when we're talking about vocabularies, and share a couple of tools and some resources to help you get around interoperability. As you know, you can always join us on the Slack channel for any questions that come up. And if you've got any questions from what I'm saying today, please throw them into either the questions tab or the chat, and we will have a Q&A in about half an hour. I'll try and speed up somewhere. So interoperability is what I think we're going to cover today. What does it even mean? Let's just get into it. So to be interoperable, which basically means data that is interpretable by a computer so that it can be combined with other data, the data will need to use community-agreed formats, languages and vocabularies. The metadata will also need to use community-agreed standards and vocabularies. 
They contain links to related information using identifiers, which is essentially what that I3 point there means: metadata include qualified references to other metadata. So let's have a look at all of those principles in another way. So: data that is interpretable by computers, using community-agreed formats, languages and vocabularies. And the metadata itself also has some of these FAIR principles. Okay. So we also use community-agreed standards, and there are links to related information using identifiers. So this is my first warning here. We might just get into the weeds, which is good. Okay. Because when we're talking about vocabularies and classification systems, determining community standards will inevitably get into value judgments as to what is a weed, what is grass, and what distinctions are helpful now and in the future. Being humans, we don't always get it right. So it's a process and we all learn to trust in the process. And I hope that by going through some FAIR-looking vocabularies and a couple of metadata schemas, we'll even get over to linked data as a way of coming back and having another perspective on what interoperability means. So hopefully I can get to ontologies, but if we don't, I'll just send you off to some awesome things. Okay. So, community standards. Let's focus now on community standards around research data. What kinds of scenarios might people be thinking of when we think about what community-agreed standards might help us solve? So here are a few from the fairsharing.org website. Here we have the researchers saying: my funder's data policy recommends the use of established standards, but which are the widely endorsed and applicable standards for my crop data? Funders and journal editors may be asking: what are the mature standards and standards-compliant databases that we should be recommending to our authors? 
Maybe the journal editors are looking for a repository that they can use to host the data related to the publications that they publish. Down in the bottom right corner we have librarians and data managers who are looking at genomic rights data in a particular format which has now been deprecated. So they're looking to find out what the new format is and what they can do to migrate their legacy data into a format that might be more widely used or meets current standards. And then finally we have curators and developers who are looking at sharing social science data. So their question is: we need a standard for doing this, but who should we talk to and what options are out there? I do recommend FAIRsharing. As you can see in that little circle in the middle of this infographic, it's a collection of standards, policies and databases. So it indexes databases, policies and standards that you can all use in an effort to facilitate the verification of a repository. So this is my "things can all go pear-shaped and humans don't always get it right" slide. What you can see here is a picture of Copernicus's heliocentric solar system, where you've got the sun, or Sol, in the middle and then these concentric circles showing the orbits of the planets around the sun. So it took us humans a while to figure out where we were in the solar system, and not just to figure it out but also to accept it and incorporate it into a socially acceptable world view. It's interesting that we know now that Copernicus drew on the science of many Islamic astronomers who were leading the charge there, and he even delayed publication of this model for years and years because he didn't have enough data to really prove his theories. And then 50 years later Galileo used the telescope to prove this. 
Well, his purposes were more... he was actually trying to change community standards, and unfortunately he was tried by the Inquisition and then placed under house arrest until he died. So hopefully community development of metadata standards is not necessarily going to be that difficult, but I do want to reassure anyone who's getting fired up about this that, well, humans are funny and cultural change is hard, and you might as well ask any data librarian who's had to promote data management plans how that goes. So let's move on to the things that make it easier to aid interoperability. I'm going to talk a bit about vocabularies and why we would want to have vocabularies in our research data. A vocabulary is a standard way of setting out the common language that a discipline has agreed to use to refer to concepts of interest in that discipline. Researchers planning observations or surveys need to define their data items clearly, and an agreed vocabulary standard makes a good starting point for translating concepts into other vocabularies so that collaboration can occur. So we've got vocabularies happening at the data description level and we also have vocabularies working at the metadata description level. I'm going to start off by talking about the metadata schema for aggregating Australian research data, and I think I've made some mention of this previously. Now I'm going to get really into it. So it's the Registry Interchange Format for Collections and Services, affectionately known as RIF-CS. It's based on an ISO standard and yet, unlike the ISO standard which it's based on, it is free to access, and you can check out the vocabularies that RIF-CS uses and even repurpose it yourself. Its main purpose is aggregating Australian research data so that it can be displayed in Research Data Australia, the platform that the ARDC provides. 
And the two important things that I wanted to focus on in terms of interoperability are that this metadata schema establishes relationships between objects using qualified references, and that it also uses a bunch of controlled vocabularies. Now let's have a look at a... well, this is a little like an ontology model, in that this diagram is showing the relationships between different objects within the metadata schema. So RIF-CS has four types of objects in its registry. There are parties, which represent people or groups, and collections, which are an aggregation of physical or digital objects. So a collection object is what we use for describing a dataset or a collection; they're both types of collection. Then we've got activity as the third object there, which usually translates to a research project but could be another activity. And so these four objects, with service being the fourth, come straight from the ISO standard. And you can see there, between these different objects, little arrows pointing to things that can happen between these objects and how they're related. And those of you who have fond memories of our webinars on protocols should hopefully be gratified to see down here that there's a line out to services, which are delivered through protocols. And that's going to aid interoperability. And then over here on the left, the access policy. So you could imagine... actually, we've covered access and interoperability previously. Now we're moving on to looking at how the metadata might identify these relationships. So actually I'm going to jump over to some of the vocabularies that we have. Let's see if that works. Oh, pretty good I think. Okay. Hopefully you can see... oh yes. So what I've just linked to is a list of the vocabularies that RIF-CS uses. You can see at the top a little bit of the context of the different vocabularies. And then down here we get some definitions of what they are. 
So what I would like to highlight here, for example, is the access rights type. You can see there are three access types: open, conditional or restricted, and there's some information about when you might want to use those particular types of access rights. Another interesting vocabulary (and another way of thinking about these vocabularies is that they're a list of options, okay) is the date type. So I'm going to scroll down a little, close your eyes if this makes you dizzy, down to the date type. Okay. So as you can see here, there's the dates type. All of the dates types have, if you notice, a little DC prefix. These are reusing elements from the Dublin Core metadata schema. In fact, Dublin Core terms. And we can see that there are different kinds of dates that pertain to a collection or a data set: when it was made available, when it was created, maybe when it was accepted or submitted, or a particular date for which that resource or thing is valid. Okay. And I'm going to scroll a little more again, down to the identifier type. Here's a very useful vocabulary list of persistent identifiers. And as you can see there, they all have down the side their prefixes, acronyms, and what they stand for. So that's a fun time in the vocabularies that we use. And in fact I might just jump over to Research Data Australia. Let me make that a bit bigger. So back over to Research Data Australia. These controlled vocabularies come into play when we're using search and find. So actually, if I go over and choose the "publicly accessible online" checkbox there, this will return results (I'll just click on the first one there) which have open access metadata, open access research data. As you can see, when we look down at the access part we can see that it's open, and we can go through the metadata to the correct record. I'm going to scroll down a little here and just take you into the relationships there. 
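To make the idea of a controlled vocabulary as "a list of options" concrete, here is a minimal sketch that checks metadata values against such lists. The three access rights terms are the ones named in the talk; the exact `dc.`-prefixed spellings of the date types and the field names `accessRights` and `dates` are illustrative assumptions, not the RIF-CS schema itself.

```python
# Illustrative controlled vocabularies. The access rights terms come from the
# talk; the date type spellings are assumed for the sake of the example.
ACCESS_RIGHTS_TYPES = {"open", "conditional", "restricted"}
DATE_TYPES = {
    "dc.available",
    "dc.created",
    "dc.dateAccepted",
    "dc.dateSubmitted",
    "dc.valid",
}


def validate_record(record):
    """Return a list of problems; an empty list means every value is in its vocabulary."""
    problems = []
    access = record.get("accessRights")
    if access not in ACCESS_RIGHTS_TYPES:
        problems.append(f"accessRights {access!r} is not in the vocabulary")
    for date in record.get("dates", []):
        if date["type"] not in DATE_TYPES:
            problems.append(f"date type {date['type']!r} is not in the vocabulary")
    return problems


# Hypothetical records: one that uses the agreed terms, one that does not.
good_record = {
    "accessRights": "open",
    "dates": [{"type": "dc.created", "value": "2019-03-01"}],
}
bad_record = {
    "accessRights": "public",  # a free-text value, not one of the agreed terms
    "dates": [{"type": "made", "value": "2019-03-01"}],
}

print(validate_record(good_record))
print(validate_record(bad_record))
```

Because every provider picks from the same short list, a search filter such as the open access checkbox only has to match one agreed term rather than guess at free-text variants.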
So here is a little graph. Think back to when Matthias was talking to you about linked open data, about having different subjects and objects and the relationships, or the predicates, that identify the relationship between them. Here is a nice graph that shows you how these things are related to each other. So you can see that these little green circles represent people, or in RIF-CS these would be party records. So a person is a party according to the RIF-CS metadata schema, and this particular one, Lisa Barrow, Professor, is collector of and also principal investigator of this particular data set here, which is also associated with these other four data sets in this little graph. So this graph shows us the relationships between these different objects. We can also see that there's a thing, a website, here that's also listed, and that is related to one of these data sets as well. If I go down to the bottom, to the registry view, this will help us. So now we see RIF-CS in all its glory, and we get to see the metadata elements involved here down the side, and then the values of the metadata that have been provided. And if I choose a related object, this is really the qualified part: the metadata provides qualified links to other metadata. We can see that this record is a collection, and its type is data set, and it is related to these other objects which are party records, and I think we can see it's also related to these other objects which are collections. Okay, now let's move on from there, shall we? Oh, this is where we get into the weeds, everyone, okay? So you could check out the RIF-CS schema documentation, and that's what we're looking at now in the components, right? 
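The graph of parties and collections described above can be sketched as a handful of subject, predicate, object statements. This is only an illustration, assuming made-up keys such as `party:person-1` and relation names written in the style of the ones read out in the demo; it is not how Research Data Australia stores or draws its graph.

```python
from collections import namedtuple

# One qualified reference: who is related to what, and how.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical keys and relation names, for illustration only.
triples = [
    Triple("party:person-1", "isCollectorOf", "collection:dataset-1"),
    Triple("party:person-1", "isPrincipalInvestigatorOf", "collection:dataset-1"),
    Triple("collection:dataset-1", "isAssociatedWith", "collection:dataset-2"),
]


def links_for(key, statements):
    """Return (predicate, other_key, direction) for every statement that touches key."""
    links = []
    for t in statements:
        if t.subject == key:
            links.append((t.predicate, t.obj, "outgoing"))
        elif t.obj == key:
            links.append((t.predicate, t.subject, "incoming"))
    return links


# The data set record never states its own link to the person, yet the link
# can still be found from the data set's side by reading the statements backwards.
for link in links_for("collection:dataset-1", triples):
    print(link)
```

The predicate is what makes the reference "qualified": the statement does not just say the two records are linked, it says what kind of link it is.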
Okay, for example we could go down to what a collection looks like, and this kind of documentation is telling us, or rather telling machines, and I guess data librarians, how to understand the elements and how they relate to each other in this metadata schema. So this is quite a formal way of describing the data model that is used for the records that are ingested and displayed in Research Data Australia. I'm going to close some of these now, but know that you can go in there and explore it any time. Can you still hear me? I'm going to trust that you can. Yes, we can still hear you, Liz. Thank you. Okay, now let's go back to the webinar, right, okay. So remember the magpie that Matthias was talking about? Actually, what I really want to talk about is another metadata schema called Darwin Core. It's very similar to Dublin Core. And the reason I wanted to talk about this is because it is an example of a metadata schema that has undergone some change. So this diagram, you don't have to learn it really, and in fact this was an early representation of Darwin Core. It has become a widely used open standard for biodiversity data. It was developed to provide a simple way to document and share information about species occurrences, whether that was in the field or in a museum collection. It's been used to integrate hundreds of millions of records through the Global Biodiversity Information Facility, and because it's so widely used it has benefits for bringing together lots of different kinds of contributions. So what I wanted to show you here, as you can see in the middle, is Darwin Core, this representation of how it contains location, organism data, geological context and taxon data. Okay. And you can see down the bottom here that there are a few metadata standards that contribute to Darwin Core. So these are being reused. 
So there are some Dublin Core elements that are being reused by Darwin Core. And there are also links out to other extensions. Okay. I'll take you up to Apple Core, which is an extension of Darwin Core that focuses on herbaria and plant stuff. Okay. Now, this was what Darwin Core looked like in 2012. More recently there have been additions to Darwin Core that support the aggregation of sampling event data sets. So there's a new Event Core component of Darwin Core that places the sampling event at the center of the simplified data set. And so it links its protocol (and when I say protocol I mean how they do the science, not how they transfer the data), efforts and measurements to the species occurrences that are derived from the sampling event. Okay. So as a result researchers can tap into more complex and quantitatively richer records for analysis and combine them alongside others which are focused on single organisms or individual taxa. Okay. So these changes could lead to improvements in the quality and usefulness of data sets that are already published on, for example, the Atlas of Living Australia and other biodiversity data repositories. So this is where I want to get into linked open data. Okay. Wait a minute, let me try and back that up with a picture. So coming back to the magpies here, okay. We've got all these occurrence records here, and I'm just taking you down to show how the Darwin Core metadata schema links out to these other online resources. And so by being able to link this record to other data out there, this is where we're getting to the utility of linked data and FAIR data: we're using persistent identifiers to link between different repositories, and when we have well-described data models, the data can be reused more efficiently by humans and also by the systems and machines that we create. Okay. 
Here is a nice example of mapping different kinds of data to Darwin Core terms. What I'm sharing with you now is a page from the ALA blog, and you can see here raw data which has been collected. This is structured data in a table which has been designed to serve the purposes of data collection. So it's easy to see: you've got your vernacular names down here, the purple swamphen, and then we have different columns for the different localities. But this is not how Darwin Core is laid out. Darwin Core is laid out in a different format, and that's the second table here. You can see how, instead of having the dates up here in the top left corner, we have date information coming through under event date, and then we have the vernacular name and locality indicating different components of the data. This is a really good blog and I reckon you should have a read of it, but right now it's time for me to get back to linked open data. I wanted to get into this because it illustrates the five stars of linked data that have been defined by Tim Berners-Lee, and I think it's a really interesting way of looking back at interoperability and the FAIR data principles, because it has a lot in common with how we talk about these principles. But it's different, so it's not the same, and I hope by offering it up I can give you something to think about. These stars indicate levels of things that you can do to publish data and make it available to others. So the first thing is step one here, and you get one star for publishing data on the web. Making it available, and in this case they've used a PDF to publish it, put it up there, and they've also put a license on it. That's the first kind of step, so that people know that they can reuse it. Step two is making that data available as structured data. So we can see we've moved from tables in a PDF to actually tables in an Excel spreadsheet. Good step. 
That means that people can more easily grab that data and do other things with it, plug it into some analysis software or other things. Except we're using a proprietary format here, where we're reliant on being able to read an Excel spreadsheet. So the third star is to use an open format, and here we have CSV, the comma-separated values format of the data, which can be read by anything. CSV is an open format. Step four, and here we have the beginning of the Resource Description Framework, or RDF, is to use URIs, or uniform resource identifiers, to point to things so that other people can point to your stuff. So using URIs to point to concepts in your vocabulary, for example, or in your metadata schema or in your ontology. And step five, the five stars of linked open data, is where you get to linking your data to other data so that you can provide context. Okay. And that's really where we get into that heady world of linked open data and querying across many things. So that's the future, right? Okay. And I should caveat that it's not necessarily so that all of that must happen for all data, but it is a significant driver. One thing I also wanted to highlight is that the Australian Government has a Digital Continuity 2020 Policy, and I wanted to share this with you because it's another facet of how interoperability is driving change in how we manage data at the government level. This here is a component from that policy: agencies will have interoperable information systems and processes that meet standards for short and long term management, improve information quality, and enable information to be found, managed, shared and reused easily and efficiently. And it's a brave thing. So the National Archives have some good resources and scenario mapping for government agencies who are building interoperability into their systems and processes. 
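Going back to stars three and four for a moment, here is a minimal sketch of the move from an open CSV file to statements that use URIs. It assumes a tiny, made-up table whose column headings are Darwin Core term names (`vernacularName`, `eventDate`, `locality`); the record URIs under `https://example.org/` are hypothetical, and the output is only an N-Triples-style illustration with no escaping of special characters in the values.

```python
import csv
import io

# Darwin Core term namespace; the column headings below are Darwin Core term names.
DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical base URI for the records themselves.
BASE = "https://example.org/occurrence/"

# Made-up sample rows, laid out one occurrence per row.
CSV_TEXT = """occurrenceID,vernacularName,eventDate,locality
1,Purple Swamphen,2019-03-01,Site A
2,Purple Swamphen,2019-03-02,Site B
"""


def rows_to_triples(csv_text):
    """Turn each CSV cell into a '<subject> <predicate> "value" .' statement."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['occurrenceID']}>"
        for column, value in row.items():
            if column == "occurrenceID":
                continue  # already used to build the subject URI
            statements.append(f'{subject} <{DWC}{column}> "{value}" .')
    return statements


for line in rows_to_triples(CSV_TEXT):
    print(line)
```

The fifth star would go one step further and replace some of the quoted values with URIs that somebody else has published, for example a URI for the species, so that the record points out to other data.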
So I just wanted to highlight some of those scenarios for things that you might do if you wanted to take this interoperability a little further. That might be looking at streamlining business processes; you might undertake a legacy data migration to move data or to upgrade your metadata; or maybe it's looking at a data exchange activity with stakeholders, or reviewing the standards that you have for data publication and sharing. Anyhow, you'll have these slides and you can go and see those scenarios, which are in friendly storyboard maps. Okay. I'm going to kind of skip over ontologies at the moment, but here are a few nice little ontological tools if you like, and actually, if I do this again I would probably start off with a comic of Ada Lovelace and Charles Babbage and then have a look at how someone has created an ontology of that comic. And here, let's finish up with Research Vocabularies Australia, okay, because here at the ARDC we've got some tools that can help you develop your own vocabularies and publish them so you can make them FAIR. Vocabulary services: well, a vocab service indexes vocabularies, and it can tag items in catalogs and search profiles, and that can help you provide keywords and other search aids. The services can also be machine-to-machine services which support activities like creating, managing and delivering vocabularies. So in the ARDC vocabulary services suite we've got a vocab editor, which is called PoolParty. We contract the software PoolParty, where you can create and manage vocabs, collaborate with others and browse concepts using the built-in visualization tool, and you can also query your vocabs as well, which is very handy. 
We also have a repository, which is where any vocabs you create can be stored, and that's where you can publish them, so your vocabulary could be accessed via the portal at Research Vocabularies Australia. And here are a few things that you can do if you are using the repository on one side and the portal on the other: it will enable other people to find your vocabularies as well. So I feel like we've been going all around the world a little bit in interoperability, and there are a lot of different directions that we can go in, but in tying this up together I'd like to finish with this idea. If you want interoperability to work for you in making data FAIR, then first, it's important that the data model you're using for your metadata is well defined and well structured, so it uses standards such as the Resource Description Framework, RDF, or other standards for describing structured data, so that machines can parse that information and humans can also identify it. Secondly, use controlled vocabularies which are well documented, which people can access, and which are resolvable by using PIDs. That's another way of enabling your data to be FAIR, because the vocabularies and the metadata that you're using to describe that data are also FAIR: they're findable, accessible, interoperable and reusable. And finally, one part of that is that metadata includes cross references which provide contextual meaning to the data, and that's the beauty of linked data and what we hope to enable. So before we get into questions and answers, I'm going to remind you: don't forget to fill in the feedback at the end of this webinar, because it's really helpful for us to know how we're going. And sometimes the end of a long webinar is when the questions come up, so just in case you don't feel like you've got any questions now, I'm just giving you an out there. So, Matthias, that's enough from me. Does anyone have any questions about all of this interoperability?
So we have one question so far and that is about 
RIF-CS and RDA: the relationship of parties linked to a data set. How is that connection made? Is that manually entered in the related objects field in RDA?
Yeah, so, well, the related objects... sorry, can you hear me now?
Yes, good.
So the related objects in RDA tend to be... well...
While Liz is doing that, I would like to remind everybody that if you do have a question, please enter it into the questions module.
The question was relating parties to each other, or parties to collections?
Literally, the question was relating parties to a data set.
Okay. So you don't necessarily need to supply party records alongside collection records. Most of the time you're actually sending the collection records that say the collection is related to a party, and because that link has been established, RDA will infer the relationship to the party... no, to the collection from the party. I guess this is going to sound really confusing, but when you send records to RDA, it's not like you're sending all the collection ones with their relationships plus the parties with their own relationships as well, so that you have these sets of relationships in parallel. They link up with each other because of the persistent identifiers that you have included. So the relationship doesn't have to happen on both sides in the metadata; it can just be on one, because the bi-directional link is inferred. I'm not sure that that's actually very clear in my description, I apologise.
That's clear enough. Okay, we do not have any more questions in, but we are getting close to the hour, so I will hand back to you, Liz, to wrap up.
Okay, well, thank you everybody. You've been with me quietly today, that's been nice, and I hope to see you sometime on the Slack in our community discussions next week. Thanks very much.