 in the right. So I'm going to talk about RO-Crate, which is Research Object Crate, a research data packaging specification. I'll go through what RO-Crate does first, and then I'll talk about one particular issue: what we do when we run into people who need to use a vocabulary that's not defined anywhere. So we're working at a very different kind of level from the last presentation. There's a real contrast. We're picking up random people from random disciplines and trying to help them move into using linked data.
So RO-Crate is an international effort now, which is about using linked data principles to describe research data. Its big focus is on packaging data, so that if you grab a zip file from somewhere and you open it up, you have good-quality linked data metadata in there, which describes at least the who, what, where and why of a data set, but it also has an HTML file in there which exposes that metadata so you can actually read it. So it's in there as JSON-LD, but also as an HTML file.
RO-Crate came about as a merger of two standards efforts. There was some work led by my team at UTS, the University of Technology Sydney, where we had started doing this and called it DataCrate. We were looking at being able to package all sorts of different kinds of data. We've got everything that you might run across in a university: satellite imagery that's been processed for vegetation indexes; this one's a history data set; we've got engineering data about bridges and engineering data about time codes on networks. It's a comprehensive repository that we're running. And so we've been looking for ways to describe data that are not domain dependent.
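As a rough illustration of the packaging idea described above, here is a minimal sketch of the JSON-LD metadata that might sit inside a crate. It assumes the layout of the published RO-Crate 1.1 specification (a `ro-crate-metadata.json` descriptor pointing at a root data set `./`); the version in use at the time of this talk may have differed, and the names `minimal_crate`, `root_dataset`, `#author` and all the example values are made up for illustration.

```python
import json


def minimal_crate(name, description, author_name):
    """Build a minimal RO-Crate-style metadata document as a plain dict.

    Assumes the RO-Crate 1.1 conventions: a metadata descriptor entity
    whose 'about' points at the root data set './'.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # The descriptor says which entity is the root data set.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # The root data set carries the who/what of the package,
                # using plain Schema.org property names.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "author": {"@id": "#author"},  # hypothetical local id
            },
            {"@id": "#author", "@type": "Person", "name": author_name},
        ],
    }


def root_dataset(crate):
    """Find the root data set by following the descriptor's 'about' link."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    descriptor = by_id["ro-crate-metadata.json"]
    return by_id[descriptor["about"]["@id"]]


# Placeholder values, not taken from any real data set.
crate = minimal_crate("Example data set", "A placeholder description.", "Example Author")
metadata_json = json.dumps(crate, indent=2)  # what would be written into the zip
```

The HTML file mentioned in the talk would be generated from the same graph, so the human-readable and machine-readable views stay in step.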
And so we were working on that, and we met some people from Research Object, who are based out of Manchester in the UK. Their team had been looking at modelling this with probably a bit more of a scholarly purpose, about being able to describe Research Objects and their provenance. I think I've got my slides a bit out of order. So this is the RO-Crate page. One of the things I'd like to do is just make people aware of this. If it's of interest, we want to talk to repository owners and people dealing with data about adopting this format, and to get feedback about why not if they don't want to.
Because it uses Schema.org as the core metadata standard, it's actually compatible with the thing that Google released fairly recently, Google Dataset Search, which looks for Schema.org metadata packaged as JSON-LD, you know, hiding in web pages, and can build catalogs from it.
When we started this work at UTS, we did a fairly comprehensive survey of all the different standards that were out there and had a look at what we might base a research data packaging standard on. So we looked at, obviously, building up from Dublin Core and the collections work. These are DC collections. And we compared all the different approaches, including other things like the Frictionless Data effort. But when we actually mapped it all out, in terms of describing a generic research data object where you want to know at the top level who created it, who published it, how it was funded, what people were involved, you can do all of that with Schema.org. It had by far the best coverage of any published vocabulary that we could find. So we settled on that. I was expecting maybe a little bit more pushback from the community about that because, you know, Schema.org came from a sort of commercial... I can see some nodding there.
Schema.org came from a commercial place and was run by, you know, big corporates. But it seems like this is the way that research data is going anyway. RDA, as in the ARDC's Research Data Australia, has been publishing data in this format. And there are several Research Data Alliance working groups that are also clustering around using it. So it seems to be fairly mainstream.
The thing that I had up before is just showing you what it looks like if you download one of these crates, say as a zip file. There's an HTML file in there, and if you open it up, you get a human-readable thing like this. And you can actually click around. I'm not going to give a demo because it's such a short talk, but you can click around and look at what's inside, down to descriptions of files and how they were created. So I talked a bit about that.
So we use Schema.org. One of the nice things about Schema.org is that it has a page that actually resolves when you click on a term. Now, this is not true of all vocabularies. If you're trying to play with linked data, I'm sure people in this group have run into finding something that either doesn't resolve at all anymore or takes you to an OWL file or something. I mean, maybe people in this group love that. But I don't, and I don't think it's a good thing to subject users to. So one of the nice things about Schema.org is that it has these pages. I pinched this slide out of another presentation, which was talking about why it's good for something to resolve to a page that explains what it is, because this one is showing that in Schema.org, title means something quite specific, which is to do with job titles. And the thing you're probably looking for if you're trying to do title in the Dublin Core sense is name. So at least, if you speak English anyway, you can see what something is. So here's another snapshot of one of our sample files.
And it shows the little question marks here that point off to definitions of things. So that example is pointing off to name. And you can see on this page some of the kinds of data that you could include if you want to. This one's got EXIF data. And it's also got some provenance information in here about how we model the creation of a file. This is just showing that we're shipping both human-readable data, that's a human there on the right, and machine-readable data.
And this is just to give you a bit of a flavor of the kind of representation that you can do that works. Schema.org, again, has enough to do very basic provenance without having to go to other ontologies. So we have me up the top there, and we have these CreateActions that talk about how the image object that we were just looking at was created. So that's the CreateAction "photo one", which created that picture. And you can have, you know, places and products and so on. It's nowhere near as sophisticated as what Natalia was talking about with describing individual pieces of equipment. But you can do it.
Okay, so that's a really quick intro to what we're doing. I hope that made sense. We'll find out in the questions. I just want to pick one thing to talk about that's been a real challenge, which is that when we're dealing with people coming from different disciplines, they often have their own local context, right? So, well, of course, they don't have the discipline context. If we were talking to people doing the marine science observation work, then it would be quite easy to use the linked data principles. You could draw on those vocabularies we were just hearing about, and that would slot straight into this framework.
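To make the basic provenance idea above a bit more concrete, here is a small sketch of how a Schema.org `CreateAction` could tie a person and an instrument to the file they produced. The property names `agent`, `instrument` and `result` are Schema.org terms; the entity ids, the example names and the helper `how_created` are invented for illustration and are not taken from the sample crate shown in the talk.

```python
# A tiny flattened JSON-LD graph, written as Python dicts.
# All ids and names below are placeholders.
graph = [
    {"@id": "#person1", "@type": "Person", "name": "Example Photographer"},
    {"@id": "#camera1", "@type": "IndividualProduct", "name": "Example camera"},
    {"@id": "pics/photo1.jpg", "@type": ["File", "ImageObject"], "name": "Photo 1"},
    {
        # The action links who did it, with what, and what came out.
        "@id": "#photo1",
        "@type": "CreateAction",
        "name": "Photo 1 taken",
        "agent": {"@id": "#person1"},
        "instrument": {"@id": "#camera1"},
        "result": {"@id": "pics/photo1.jpg"},
    },
]


def how_created(graph, file_id):
    """Return a small summary of the CreateAction that produced file_id."""
    by_id = {entity["@id"]: entity for entity in graph}
    for entity in graph:
        if entity.get("@type") != "CreateAction":
            continue
        if entity.get("result", {}).get("@id") == file_id:
            return {
                "action": entity["name"],
                "agent": by_id[entity["agent"]["@id"]]["name"],
                "instrument": by_id[entity["instrument"]["@id"]]["name"],
            }
    return None  # no provenance recorded for this file


summary = how_created(graph, "pics/photo1.jpg")
```

A viewer that walks the graph this way can show "created by" information next to a file without needing any ontology beyond Schema.org.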
But we do run into problems. Say, we were talking to people from one of our microscopy facilities, who've got lots of microscope images, and they're using the Open Microscopy Environment and OMERO, which is a repository. But when we actually looked for ontologies and vocabularies at that stage, and this was a couple of years ago that we started the conversation, there were XML schemas, but there were no defined vocabularies that went with them. And describing things like instrumentation was a real problem. So the lab tech we worked with in that context started setting up wiki pages for each piece of microscopy equipment, down to the lenses and filters and so on, which hadn't been properly documented before. And that gave him URIs that he could use, which at least uniquely identified things, and that we could start to use in a linked data context. Not particularly sophisticated, but it was a way to get going.
And I want to talk about another sort of case, where we're working with some people from the humanities. I've just thrown this one in here: this is a work in progress from the PARADISEC group, which is a language and cultural archive. They're in the process of porting their data so that it's all described using RO-Crates, with all the collections and items described using this same technology, because they want to make things interoperate, not so much on an item level but on a technology level. They've been having to maintain their own stack, their own repository application, for many years, and they're interested in converging on being able to reuse tools. So this is something that Marco La Rosa in Melbourne has put together. This is just a sort of placeholder to remind me to talk about PARADISEC.
But I'm going to talk about this data set, which has been collected by Alana Piper, who's a historian, I was going to call it a criminal historian, but I think maybe a historian of criminology, who has been digitising prison records for a big chunk of time, from the mid-1800s into the next century, and organising crowdsourced transcription, which, as an aside, has worked really well in lockdown. There are lots of people all around the world who've gone into the crowdsourced transcription and finished the job off way ahead of time. So that was really good. We're going to have about 50,000 of these.
And there's a lot of data in Alana's data set. When we're describing people, we have birth date and birth place. These are things that come straight out of Schema.org, and they're not a problem. But there are some specific things that she's got, like education codes, which are more specific because they were particular to the Victorian prison system and sentencing. So: how to deal with custom properties?
The thing I'm showing you here was part of a really stupid experiment that I did to try and work out how we could have an ad hoc vocabulary, right? If you're not in collaboration with a big institution, or you don't actually have the resources to get a vocabulary online properly, how could you at least ship some kind of vocabulary with a linked data data set? And this was a very stupid thing that I did here, but it was fun: I actually encoded the description of an RDF property into a URL, and then set up a website that would display it back to you. The idea was that you get something like that Schema.org page that I just showed, and this is Alana's description of what education means in her context.
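The experiment described above, packing a property's description into the URL itself so that a website can echo it back, can be sketched in a few lines. This is only a guess at the general shape of the idea, not the actual implementation from the blog: the base address `https://example.org/vocab`, the query parameter names and the sample wording are all assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of a page that echoes back whatever is in the query.
BASE = "https://example.org/vocab"


def property_url(name, label, comment):
    """Encode a property description into a single self-describing URL."""
    query = urlencode({"name": name, "label": label, "comment": comment})
    return f"{BASE}?{query}"


def describe(url):
    """Recover the description from the URL, as the echo page would."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in params.items()}


# Placeholder wording, not the definition used in the real data set.
url = property_url("education", "education", "Placeholder description of the education code.")
recovered = describe(url)
```

The drawback is visible straight away: the identifier changes whenever the wording of the description changes, which is one reason an approach like this is hard to recommend for real data.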
So this is obviously not a very sensible thing to do, but what we're aiming for is being able to show a page like this that goes along with the data set, so that when you're looking at the HTML file, or if you're trying to reprocess the data, you've actually got access to the definition of the property. This is my blog, where I did that stupid experiment.
And I'm just going to finish up now with the solution that we've come up with. I'm assuming reasonable familiarity with concepts like JSON-LD in this group, but I can explain further if it's required. JSON-LD works using contexts, so that you can let developers who are creating metadata records just work with things like name and description and so on, and treat it like plain JSON, but there are definitions behind those terms. So this is, sorry, this is the definition for sentence. After talking to some more sensible people from the rest of the RO-Crate community, our current thinking on this is to define UUID URNs, which are unique, so you could reuse them if you want to, but they don't resolve to something on the web. And because the URN doesn't resolve to something on the web, you actually ship the property definition with the data set, and the HTML viewing software will be updated so that if you click on a term, it will show you this definition, "the sentence is the penalty imposed by a court", with the complete definition in there.
So that's where we've got to with this idea. I'm going to have a couple of examples out fairly soon from history and literature, and we'll start looking at this with people who bring us data sets where we can't find good ontologies and can't wait. So that's that. I'm just advertising RO-Crate: you can join.
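Here is a minimal sketch of the UUID URN approach described above: mint a `urn:uuid:` identifier for a custom term, map the term to it in a local JSON-LD context, and ship the definition as an `rdf:Property` entity inside the same metadata. The helper names `define_property` and `lookup_definition`, the record `#record1`, and the exact definition wording are assumptions for illustration; the `rdf:` and `rdfs:` prefixes are declared locally so the sketch does not rely on what any shared context provides.

```python
import uuid


def define_property(term, label, comment):
    """Mint a UUID URN for a custom term and build its shipped definition."""
    prop_id = uuid.uuid4().urn  # e.g. 'urn:uuid:...', unique but not resolvable
    context_entry = {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        term: prop_id,  # lets records use the short term like ordinary JSON
    }
    definition = {
        "@id": prop_id,
        "@type": "rdf:Property",
        "rdfs:label": label,
        "rdfs:comment": comment,
    }
    return context_entry, definition


def lookup_definition(crate, term):
    """What a viewer could do on click: term -> URN -> shipped definition."""
    local_terms = {}
    for part in crate["@context"]:
        if isinstance(part, dict):
            local_terms.update(part)
    prop_id = local_terms.get(term)
    if prop_id is None:
        return None
    for entity in crate["@graph"]:
        if entity.get("@id") == prop_id:
            return entity.get("rdfs:comment")
    return None


# Definition wording paraphrased from the talk; the record is a placeholder.
ctx, sentence_def = define_property("sentence", "sentence", "The penalty imposed by a court.")
crate = {
    "@context": ["https://w3id.org/ro/crate/1.1/context", ctx],
    "@graph": [
        sentence_def,
        {"@id": "#record1", "@type": "Person", "sentence": "placeholder value"},
    ],
}
```

Because the definition travels inside the package, the HTML view can display it even though the URN itself points nowhere on the web.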
If anybody's interested in coming along to our meetings, they're at 6 a.m. East Coast time once a month on a Friday. And there are a few tools out there for processing crates. I'll send through this presentation, but there are a few tools. There's a thing that Marco La Rosa has written called Describo, which is a cross-platform desktop app that lets you build up descriptions. And he's done some work on loading vocabularies into that. So for things like language codes, you can load in all the language codes if you're working on a repository like PARADISEC. There's a lot more work to do there on making vocabularies easy to use, and there are a couple of other things on that page. So that's me.