This is a piece of work that I did with some colleagues, actually all based in Europe. That works for me. Thank you. Sorry, I think it was Tina; somebody's having a chat. So this is the outcome of work that was initiated at a meeting two years ago now, a meeting around metadata standardization. One of the discoveries, not that surprising, is that alongside or underneath metadata standards you almost always find a set of terminologies, codelists or vocabularies that are needed to provide the values that go into the fields in the metadata record. There is a rich history, particularly in the sciences, of developing these categories. But too many of them reside in places that are not very useful or not very interoperable, like books, appendices to books, or sometimes just graphics. Even organizations that you would imagine have their act together, like the International Bureau of Weights and Measures, publish in what essentially comes down to us as a book: the SI is published in what they refer to as the SI Brochure. Yes, it's on the web, but it's just web pages. So in this piece of work we basically looked at what we would want to do to make these kinds of legacy vocabularies, which are presented in the traditional way, more useful in a distributed, web-centric context. We refer to the result as a FAIR vocabulary, using the terminology of Findable, Accessible, Interoperable and Reusable. And there's another slide I wanted to use for that, which means I'm going to have to swap the screen that I'm sharing and share another one briefly.
So this table is an interpretation, not a very strict one, of how you would apply the FAIR principles, or at least the FAIR letters, to publishing a vocabulary. We can go through them. To make a vocabulary findable, it must be available from at least one repository recognized by the community, and it must be possible to search for a term in the vocabulary and get the identifier for it. So this is a linked data viewpoint, where we focus very strongly on the idea that every item in a vocabulary should have a web identifier, and the vocabulary as a whole should have a web address at the top. Accessibility is addressed by saying that when you dereference that identifier, you are sure to get something useful. Interoperability is addressed by saying that, to be useful, the representation must conform to some standard; it is also maximally interoperable if you map relations to other vocabularies. Finally, reusability is managed through the licensing metadata and through the definitions being sufficient. So, hopping back to my other screen again (you can see I'm reusing this material, but it's just to motivate us): we have plenty of precedents of FAIR vocabularies, some of which are hosted in services and some of which we host ourselves. ARDC has Research Vocabularies Australia, and they host hundreds of vocabularies in a form which satisfies pretty much all of the criteria I just went through. So the paper that we wrote is almost a step-by-step guide. It can't be totally step by step, because each vocabulary has a slightly different profile, but it is a set of concerns that need to be addressed in order to turn a legacy vocabulary into something that you might reasonably call FAIR.
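As a rough illustration, the table's criteria can be read as a checklist that a publisher could run against a description of their vocabulary. This is a minimal sketch, not the paper's method: the record fields (`repositories`, `term_uris`, `dereferenceable` and so on) and the set of accepted representations are assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: which representations count as "conforming to a standard".
STANDARD_REPRESENTATIONS = {"SKOS", "OWL"}


@dataclass
class VocabularyRecord:
    """Hypothetical description of a published vocabulary (fields are illustrative)."""
    uri: str                       # web address of the vocabulary as a whole
    repositories: list[str]        # community-recognised repositories listing it
    term_uris: dict[str, str]      # term label -> web identifier
    dereferenceable: bool          # does resolving an identifier return something useful?
    representation: str            # e.g. "SKOS"
    mappings: list[str] = field(default_factory=list)  # links to other vocabularies
    licence: str | None = None
    definitions_sufficient: bool = False


def fair_gaps(v: VocabularyRecord) -> list[str]:
    """List the criteria from the table that this record does not yet meet."""
    gaps = []
    # Findable: in a recognised repository, and every term has a web identifier.
    if not v.repositories:
        gaps.append("F: not available from any recognised repository")
    if not v.term_uris or not all(
        u.startswith(("http://", "https://")) for u in v.term_uris.values()
    ):
        gaps.append("F: some terms lack a web identifier")
    # Accessible: dereferencing an identifier yields something useful.
    if not v.dereferenceable:
        gaps.append("A: identifiers do not dereference to anything useful")
    # Interoperable: standard representation plus mappings to other vocabularies.
    if v.representation not in STANDARD_REPRESENTATIONS:
        gaps.append("I: representation does not conform to a standard")
    if not v.mappings:
        gaps.append("I: no mappings to other vocabularies")
    # Reusable: licensing metadata and sufficient definitions.
    if not v.licence:
        gaps.append("R: no licence metadata")
    if not v.definitions_sufficient:
        gaps.append("R: definitions missing or insufficient")
    return gaps
```

An empty list means every criterion in this simplified reading is met; in practice several of them (for example, "definitions are sufficient") are judgements made by people, not flags a tool can compute.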
In contrast to some earlier or parallel work, particularly coming out of Europe, which has focused very much on the technology side, with national ontologies and these kinds of things, we've buried almost all of that in item number six. Everything else in this set of rules is almost all about social processes: getting agreements, dealing with the vocabulary owners and custodians, thinking about the licensing arrangements, and taking care of the persistence of the identifiers. And finally, recognizing that when you've published your FAIR vocabulary you have to maintain it, and that maintenance goes in a couple of directions: one is making sure it stays alive, and the other is making sure it stays up to date. So, Nick, you threw down the gauntlet and said, "start us off with some theory, Simon", and that's kind of what we did. However, I would emphasize that the rules we pulled together were based on experience, and in particular on a couple of experiences I had that shaped the design of the rules. One was the vocabularies from the Australian Soil and Land Survey Field Handbook, which is one of these classic legacy vocabularies where the reference version is in print form in a book. We have most of the terminologies from that field handbook available in a FAIR way, currently largely published through CSIRO, but we carefully made sure that the reference URIs aren't CSIRO ones. So the whole vocabulary could easily be re-homed in Research Vocabularies Australia, for example, and that is the direction we plan to go. And there's a bit of a checklist here about how we satisfied those ten simple rules.
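The point about keeping the reference URIs independent of CSIRO is what makes re-homing cheap: the identifiers stay fixed and only a lookup table behind a resolver changes. Here is a minimal sketch of that idea, assuming a hypothetical persistent namespace and hypothetical hosts (none of these URLs are the real ones):

```python
# Hypothetical persistent namespace, deliberately not tied to any hosting organization.
PERSISTENT_BASE = "https://pid.example.org/def/"

# Where each vocabulary is currently served (hypothetical hosts). Re-homing a
# vocabulary means editing this table; the persistent identifiers never change.
CURRENT_HOSTS = {
    "soil-texture": "https://vocabs.host-a.example.org/soil-texture/",
}


def resolve(pid: str) -> str:
    """Return the current location for a persistent term identifier.

    A resolver service would answer a request for `pid` with an HTTP redirect
    (e.g. 303 See Other) to this location.
    """
    if not pid.startswith(PERSISTENT_BASE):
        raise ValueError(f"not a persistent identifier: {pid}")
    vocab, _, term = pid[len(PERSISTENT_BASE):].partition("/")
    if vocab not in CURRENT_HOSTS:
        raise LookupError(f"no current host registered for {vocab!r}")
    return CURRENT_HOSTS[vocab] + term
```

For example, `resolve("https://pid.example.org/def/soil-texture/clay-loam")` returns the host-A location; after changing the `"soil-texture"` entry to a host-B URL, the same identifier resolves there instead, and nobody who cited it has to change anything.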
Another one which I've been working on over the years is the geologic timescale, the classic version of which is published as a colored chart. We turned it into something which is fully FAIR using the linked data and semantic web technology stack, so that it's fully accessible. Now, the step in the middle there, creating the machine-readable representations, means turning something which may be tabulated on a page into something in SKOS, RDF, OWL, JSON or whatever. We did talk about that a little in the paper, but there are a number of different pathways, and you just have to recognize that; the tooling is a number of different toolsets which are already available. So rolling up the sleeves and turning that theory into reality might be something we should pick up on next. I'll hand over the screen. Okay, Rowan, do you want to say anything?
So do you want me to just go ahead?
Please, please go ahead.
Okay. So, like Simon, I have done a little bit of work on the persistence of things and whatnot. Let me share my screen; this is a mini sideline, but it's related. My screen should be coming up now. This is a paper from a few years ago, "The challenges of ensuring persistency of identifier systems in the world of ever-changing technology". It's about the persistence of the identifier system itself, in order to ensure the persistence of the identifiers that the system manages, and it's relevant to both vocabs and other things. I and a couple of people at CSIRO at the time published it. This paper is somewhat similar to the ten simple rules for vocabularies, but for identifiers: it says you should manage the governance of the system as well as the identifiers themselves, and so on.
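Simon's step of turning a vocabulary tabulated on a page into SKOS/RDF can be sketched in a few lines. This is a minimal stdlib-only sketch, assuming a hypothetical table with `notation`, `label`, `definition` and `broader` columns and a made-up scheme URI; real projects would normally use an RDF library or one of the existing toolsets Simon mentions.

```python
import csv
import io
from typing import Optional
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"


def turtle_literal(text: str, lang: Optional[str] = "en") -> str:
    """Quote text as a Turtle string literal, optionally with a language tag."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def table_to_skos(table_csv: str, scheme_uri: str, scheme_label: str) -> str:
    """Turn rows of notation,label,definition,broader into a SKOS concept scheme (Turtle)."""
    out = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme ;\n"
        f"    skos:prefLabel {turtle_literal(scheme_label)} .",
    ]
    for row in csv.DictReader(io.StringIO(table_csv)):
        notation = row["notation"].strip()
        props = [
            f"skos:inScheme <{scheme_uri}>",
            f"skos:notation {turtle_literal(notation, lang=None)}",
            f"skos:prefLabel {turtle_literal(row['label'].strip())}",
            f"skos:definition {turtle_literal(row['definition'].strip())}",
        ]
        broader = (row.get("broader") or "").strip()
        if broader:
            props.append(f"skos:broader <{scheme_uri}/{quote(broader)}>")
        else:
            # Rows with no parent become top concepts of the scheme.
            props.append(f"skos:topConceptOf <{scheme_uri}>")
        out.append(f"\n<{scheme_uri}/{quote(notation)}> a skos:Concept ;\n    "
                   + " ;\n    ".join(props) + " .")
    return "\n".join(out) + "\n"


# Illustrative two-row table in the spirit of the geologic timescale.
TABLE = """notation,label,definition,broader
Mesozoic,Mesozoic,Era of the Phanerozoic Eon,
Jurassic,Jurassic,Period of the Mesozoic Era,Mesozoic
"""

if __name__ == "__main__":
    print(table_to_skos(TABLE, "https://example.org/def/timescale", "Example timescale"))
```

The hard part in practice is not this mechanical step but the decisions around it: choosing stable notations for the URIs, and agreeing on definitions with the vocabulary's custodians.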
So that's just to show that this kind of work goes on and on. But what I wanted to add on top of what Simon's just presented is some of the specific technical arrangements we're making for vocabulary publication. I've mentioned these many times before in this forum, but I'll show you a couple of things which will hopefully paint a picture of where we're up to. A few months ago, we did some work for Geoscience Australia based on this workflow. It's a vocabulary creation and publication workflow. We don't need to consider all the details, but essentially there are several ways in which vocabularies might be created inside Geoscience Australia: there might be an expert who can type out a semantic vocabulary file by hand, they might use a tool to make the vocabulary, or it might come out of a database, and so on. We also have a couple of instances of organizations that have pretty good vocabulary review and release procedures. So however you actually make the vocab, you then go through this review and release procedure, and once you've agreed to release the vocabulary, you want to go into what we think of as best-practice, or at least pretty good, publication. So what does that look like? This workflow has elements within it that try to do as many of the things we think need to be done as possible, with as little effort as possible from the people doing it. I'm going to use specific examples here: let's look at the vocabularies managed by the Geological Survey of Queensland.
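The create, review, release and publish sequence Nick describes behaves like a small state machine in which some transitions trigger automatic work. This is a minimal sketch only; the stage names, the allowed transitions and the automatic actions are assumptions for illustration, not the actual Geoscience Australia workflow.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    RELEASED = "released"
    PUBLISHED = "published"


# Assumed transitions: review can send a vocabulary back to draft, and a new
# version of a published vocabulary starts again as a draft.
TRANSITIONS = {
    Stage.DRAFT: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.DRAFT, Stage.RELEASED},
    Stage.RELEASED: {Stage.PUBLISHED},
    Stage.PUBLISHED: {Stage.DRAFT},
}

# Hypothetical automatic steps, so the people doing the work need not run them by hand.
AUTOMATIC_ACTIONS = {
    Stage.RELEASED: ["validate against profile", "render web pages"],
    Stage.PUBLISHED: ["update delivery tool", "notify aggregator"],
}


def advance(current: Stage, target: Stage) -> tuple[Stage, list[str]]:
    """Move a vocabulary to `target`, returning the automatic actions to trigger."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target, list(AUTOMATIC_ACTIONS.get(target, []))
```

Making illegal jumps (for example, draft straight to published) raise an error is one way to encode the organization's review and release procedure so that tooling cannot bypass it.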
These vocabularies are SKOS vocabularies, and we'll see the system in a second. You can manage vocabularies as graphs, in graph systems and graph databases and so on, but for an organization, managing vocabularies is like managing a set of anything. The fact that they're semantic is exciting, but they can be managed as a series of documents, and the Geological Survey is quite able to manage collections of files in a version control system. This system that you're seeing here has a collection of vocabularies in it, 80 or so (that's them all there), but it also has user acceptance testing vocabularies, which are not exactly the same set. It also has access control rules for the repository, governing who is able to put certain content in which place; it has roles assigned to the people who do that; and it has automated tooling, so that when certain actions are performed, other things automatically happen. So this is actually the management system for the vocabularies, and it's just ordinary old GitHub. It's used for lots of stuff, and as far as we can tell it's working just fine for the Geological Survey's vocabularies. The reason I'm going on about this is that we have very interesting content, and all these different ways it might be produced, but the actual authoritative version control and release system is just ordinary old GitHub. That's possible partly because the vocabularies we deal with can, at the end of the day, be represented as text files. We might want them in some other system as well, but we can still manage them like any other version-controlled text file. So this is just from experience: we've tried managing vocabularies in specialised vocabulary management systems, but we have a couple of examples of organizations that have no problem managing them here.
Now, a counterexample is the British Oceanographic Data Centre. They manage several hundred vocabularies in an Oracle database system, and they have procedures, roles, access and release processes all around that. But again, their content is managed in a non-semantic-web, non-vocabulary-specialised system; it's just an Oracle database, and the content is then used as semantic vocabularies somewhere else. So again, very standard tooling. That's one thing. So in this workflow of vocabulary creation, management, release and publication, the management part can use ordinary systems. But when we look at how best to publish the vocabularies, I don't think it's okay just to leave them in a folder in GitHub and say, "there are our vocabularies". A lot of places have done this: they say, here are our vocabularies, we might have tooling that uses them elsewhere, and if you want to use them, here they are. But because of the machine-readable and human-readable possibilities of semantic web technology, I think we can do much better than that. This particular set of vocabs comes out in the Geological Survey's vocab delivery tool. Here's the tool: you see those same vocabularies, and we can look inside them, browse them, and so on. But this tool is entirely a slave to that GitHub repository. It doesn't do any management or version control; it just presents the content of the GitHub repository you saw before. So that's that aspect. Now, back to that workflow. I won't go through all the details, but essentially there's one big block about creation (we haven't talked much about creation tools yet): you create vocabs in several ways, and you get them into wherever they're managed. You get them into this... let me click through my link, sorry. Simon?
Sorry, you asked me to make my comment.
You raised a hand.
So I did raise a hand, yeah. That was just...
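One common way a delivery tool exploits those machine-readable and human-readable possibilities is HTTP content negotiation: the same concept URI returns an HTML page to a browser and Turtle or JSON-LD to software, based on the request's `Accept` header. The sketch below shows the selection logic only; the offered media types are an assumption, and the `Accept` parsing is simplified rather than a complete implementation of the HTTP specification.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def _specificity(media_range: str, media_type: str) -> int:
    """2 = exact match, 1 = type/* match, 0 = */* match, -1 = no match."""
    if media_range == media_type:
        return 2
    if media_range == media_type.split("/")[0] + "/*":
        return 1
    if media_range == "*/*":
        return 0
    return -1


def negotiate(accept_header: str, offered: list[str]) -> str | None:
    """Choose the offered media type the client prefers; ties go to offer order."""
    ranges = parse_accept(accept_header or "*/*")
    best, best_q = None, 0.0
    for media_type in offered:
        matches = [(_specificity(r, media_type.lower()), q) for r, q in ranges]
        matches = [m for m in matches if m[0] >= 0]
        if not matches:
            continue
        _, q = max(matches)  # the most specific matching range sets the quality
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With `offered = ["text/html", "text/turtle", "application/ld+json"]`, a typical browser header selects `text/html`, while a client sending `Accept: text/turtle` gets Turtle from the same identifier; `None` would map to a 406 Not Acceptable response.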
You know, it's interesting what you said, Nick, that they're perfectly capable of managing their content in a version control system. And since then we've been looking at the web interface to GitHub. I work with this every day, but I'm a little bit interested to know whether, for the rest of the audience here, those things join together as smoothly as we're assuming.
That's a good point. And I should say, even before anyone else jumps in to answer, that the Geological Survey didn't use GitHub until the vocabulary project came along; in fact, they had never used version control systems. The vocabulary system was actually the first one. They now use GitHub for lots of stuff, but they had not used version control for anything until I came along three years ago and said, well, you're going to have to manage all this content; how do you do this? They've since used it for images, documents, software, all kinds of stuff, but this system was the first one. But yes, please, if anybody else has thoughts.
You know, coming at it from an engineering viewpoint, if you like, it's sometimes horrifying to imagine that people aren't storing stuff in a version control system, in the cloud and backed up, but that may be a strong assumption which isn't shared everywhere.
I mean, we know that the next unit we want to work with, the Geological Survey of WA, has no such management systems in place yet.
Yeah, that's definitely true.
Could I comment, Nick and Simon, if that's alright? As a scientist, an ecologist, I hadn't used version control at all, so working with vocabularies has been an opportunity to get into it, and I really marvel at what it can do: access controls, the rights and responsibilities you're assigning, and so on. But the ability to work with people instead of sending Excel files backwards and forwards was just, oh my God.
This is so much better. There's just too much risk in having things stored locally; even having them backed up to a server didn't make me feel comfortable, and then there's not knowing whether you're working on the right version. So it's been a real learning curve for me, but it's doable. There's plenty of material on the web to get a handle on GitHub; working with Git and being able to push up the files, in particular, is something I need to keep working on. So it's a learning curve for domain experts, if you're expecting domain experts to manage the vocabularies in this way, but I think it's certainly doable.
Yeah, I think what Nick's quite nicely illustrating to us is that on the one hand you've got the point of truth, if you like, which is these files in a version control system, but then you need to make the access methods easier. The fact that he can tell us that, behind the scenes, this is completely synced up with the point of truth is a necessary condition. And frankly, the interface that Nick's showing us at the moment is way, way better than what you would find if you opened one of those files, but it's still pretty geeky. For normal users we need to be looking beyond this kind of view, to something which resembles as much as possible the things they have printed in their books at the moment.
Just one other thought on version control. I'm not sure if everyone on this call is aware, but the version control system used by GitHub is Git. The origins of that system were with the Linux kernel developers: the people who actually build the Linux operating system kernel were unhappy with their version control system, so they went and wrote their own.
And now the world kind of uses it, but the way they use it is orders of magnitude more complex than the way we use it: they use it to integrate and do all sorts of things with software. So managing text files and tracking their changes, the way we are, is trivial work for Git. It really is no problem for that system, and that's why we find it easy. However, Git still uses a diff mechanism that's not 100% native to the way our semantic content works. It can tell you very precisely which characters changed in which lines, and which lines changed over time; it's very good at that. But it doesn't do a very nice job when, say, someone takes a vocabulary and makes an isomorphic change, by which I mean a change that doesn't actually affect the content but does affect the way the content is written out in the file. It wouldn't affect the display that you see here, but in source control the file would look like a completely different version, even though the content hasn't actually changed. For that kind of change you really do have to look at specialized graph tools to work out that this graph is actually isomorphic to that graph. That's kind of out of scope here, and I haven't yet come across an organization, other than my own, that cares enough about diffing vocabs to do that. But it's possible to have two vocabs that look completely different in version control but are actually the same, or two formats of the same vocab. What do you do about all of those problems? There are version control tools for semantic graphs, but I've found it's better to get everyone using good version control and point-of-truth storage well before we worry about graph diffing issues.
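To make the "isomorphic change" point concrete: if a vocabulary is serialized as N-Triples (one triple per line), comparing the files as sets of triples rather than as ordered lines ignores pure reordering. This is a minimal sketch that assumes no blank nodes and consistent single-space formatting; graphs with blank nodes need a real isomorphism check, such as the `isomorphic` and `graph_diff` functions in rdflib's `rdflib.compare` module.

```python
from __future__ import annotations


def triples(ntriples: str) -> set[str]:
    """Collect the triples of an N-Triples document as a set of strings.

    Assumes one triple per line, single-space separated (as serialisers
    commonly write it) and no blank nodes; comments and empty lines are skipped.
    """
    found = set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("."):
            line = line[:-1].rstrip()  # drop the statement terminator
        found.add(line)
    return found


def same_content(a: str, b: str) -> bool:
    """True if two serialisations hold the same triples, whatever their order."""
    return triples(a) == triples(b)


def triple_changes(old: str, new: str) -> tuple[list[str], list[str]]:
    """Triples removed and added between two versions, ignoring line order."""
    before, after = triples(old), triples(new)
    return sorted(before - after), sorted(after - before)
```

A line-based `git diff` would report every moved line in a reordered file as changed, whereas `same_content` reports it as unchanged and `triple_changes` lists only the triples whose content actually differs.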
That's a maturity thing, an organizational maturity thing. Okay, so that's the vocabs: we've got them in a normal version control system, or an Oracle database, or whatever the organization wants to use. Now we want to do stuff with the vocabs: we want to publish them, and, in general, we also want to find other vocabs for use, and so on. So I'm going to show you a project quote that my company has done recently for the Geological Survey of Western Australia. I'm just seeing if I can zoom in. Don't think I can. Maybe I can, hang on. There we go. What this quote says is that there are a bunch of vocabulary publication tools out there. Some of them are single-organization tools, like the Geological Survey of Queensland's, but there's also Research Vocabularies Australia, which Simon mentioned, which is an aggregator. This contract says: let's build a couple of extensions to the search mechanisms in our vocabulary publication tool, so that we can also use that tool for cross-system searching. The reason we would do this is that if I'm an organization that has published a bunch of vocabs and I want to publish another one tomorrow, then, optimally from a semantic web point of view, I would reuse terms from elsewhere; I wouldn't just slavishly implement them all from scratch. There's a search function in these tools where, if I search for the word "coal", it'll find me all of the vocabs, or all of the terms within vocabs, within this organization's collection that contain "coal" in a particular way. But I generally don't want to do just that; I want to search across all of the vocabularies out there. Now, RVA has done a good job of aggregating many hundreds of vocabs, and I can search across all of them. But what if I want to search across my collection of vocabs, RVA, and another one? How do I do that?
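The cross-system search in the quote boils down to fanning one query out to several systems and tagging each result with where it came from. In this minimal sketch the systems are hypothetical in-memory indexes with made-up URIs; a real tool would call each system's own search interface, which is exactly where differences between systems make things hard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    system: str   # which publication system returned the match
    vocab: str    # vocabulary (concept scheme) URI
    concept: str  # concept URI
    label: str


# Hypothetical stand-ins for each system's index: system -> vocab -> concept -> label.
SYSTEMS = {
    "GSQ": {
        "https://example.org/gsq/lithology": {
            "https://example.org/gsq/lithology/coal": "Coal",
            "https://example.org/gsq/lithology/sandstone": "Sandstone",
        },
    },
    "RVA": {
        "https://example.org/rva/commodity": {
            "https://example.org/rva/commodity/coking-coal": "Coking coal",
        },
    },
    "GSWA": {
        "https://example.org/gswa/rock-type": {
            "https://example.org/gswa/rock-type/shale": "Shale",
        },
    },
}


def search(term: str, systems: dict, selected: list[str] | None = None) -> list[Hit]:
    """Case-insensitive label search across the selected systems (all by default)."""
    needle = term.casefold()
    hits = []
    for name in (selected if selected is not None else systems):
        for vocab, concepts in systems[name].items():
            for concept, label in concepts.items():
                if needle in label.casefold():
                    hits.append(Hit(name, vocab, concept, label))
    return hits
```

Searching for "coal" across all three systems returns the GSQ "Coal" concept and the RVA "Coking coal" concept, each labelled with its source; passing `selected=["GSQ"]` restricts the search, mirroring the "which systems do we want to search across" choice in the wireframes.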
So the contract that we're working on is an extension to the publication tool to allow for cross-system searching. I think I've got a picture here. These are obviously wireframes; we haven't built this yet. Instead of just searching for a term in a vocab, we would now search for a term and choose which systems we want to search across. Rather than what I've just shown you, the results would come back something like this: I can see "coal" in this vocab, I can see "coal" in this other vocab, and this vocab is at some other organization. Now, there have been several attempts in the community that I'm aware of to build this kind of multi-system search, but it's really difficult, because the way people constitute their vocabularies can be quite different, so searches sometimes work and sometimes don't, and so on. So in terms of publication and management, I think we have to start with a fairly strict regime of different organizations publishing vocabs in a particular way, show that search works across those, and then, over time, loosen how you may represent your vocabularies. From a multi-organization point of view, if we want search to work across organizations and we want adoption and reuse of vocabularies, we have to be very, very strict about how we constitute our vocabs. The temptation here is that, because the semantic web lets you do, quote, anything, you might get very different styles of vocabs, and that just makes the tool-making task basically impossible. If we don't have very strong guarantees about what a vocabulary looks like (not the content, but how it's structured), we really can't operate. So this is the forefront of where we're up to: making cross-system search work. And where does this go for management and publication? Well, say I want to make a vocab tomorrow.
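The "very strict about how we constitute our vocabs" point amounts to a structural profile that every concept must satisfy, independent of its content. The rules below are a hypothetical minimal profile written as a plain Python check; in practice such profiles are usually expressed in SHACL and checked with a SHACL validator.

```python
from __future__ import annotations

# Hypothetical profile: predicate -> (minimum, maximum) occurrences per concept,
# where a maximum of None means unbounded.
PROFILE = {
    "skos:prefLabel": (1, 1),
    "skos:definition": (1, None),
    "skos:inScheme": (1, 1),
}


def validate(concepts: dict[str, dict[str, list[str]]]) -> list[str]:
    """Report every way the concepts break the profile (empty list = conforms)."""
    problems = []
    for uri, props in sorted(concepts.items()):
        for predicate, (low, high) in PROFILE.items():
            count = len(props.get(predicate, []))
            if count < low or (high is not None and count > high):
                problems.append(f"{uri}: {predicate} occurs {count} time(s)")
        # Every concept must sit somewhere in the hierarchy.
        if not props.get("skos:broader") and not props.get("skos:topConceptOf"):
            problems.append(f"{uri}: neither skos:broader nor skos:topConceptOf")
    return problems
```

Because the check looks only at structure (how many labels, definitions and scheme links a concept has), two organizations can publish completely different content and still guarantee that a shared search or display tool will work on both.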
The best thing I could do would be to ensure that I'm either directly reusing existing vocabs in the space, or choosing not to use them and stating why, putting the terms I want in there and positioning them relative to the other terms out there. Once I've done all that, back to that workflow, the last few things I want to do are to publish the vocabulary in my system and/or in an aggregated system. So the last bit of this contract is about any vocab that's published in this way. You see this: this is the borehole purpose vocabulary, published here by the Geological Survey of Queensland. Unless they say otherwise, there'll be an automatic process to also publish or update this vocabulary in Research Vocabularies Australia. So we treat Research Vocabularies Australia as a national aggregator, and the vocabulary also gets published there. This may also occur elsewhere internationally: you might say that this organization publishes this point-of-truth vocabulary here, and it sends either a copy over to Research Vocabularies Australia or a reference, saying, "I'm not going to send you a copy, but I'm telling you that I've got a vocab you may be interested in." It might also contribute the vocabulary to some international set of vocabularies. So, back to what I said at the beginning, we're trying to do as many of these things as possible behind the scenes for the organization that actually creates and publishes the vocabs. You have your ways of creating them and your ways of managing them, and from there, as much as possible, it's all automated: you create it, you do your curation, and you publish it. And then, in the best case, everything else, like publishing here and listing there and everywhere, also happens automatically.
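The "send a copy or send a reference" distinction can be captured as one message per configured target, built automatically when a vocabulary is published. This is a minimal sketch: the JSON message shape, the `Release` fields and the target names are all hypothetical, and it deliberately does not reproduce Research Vocabularies Australia's actual API or the HTTP call that would deliver each message.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Release:
    vocab_uri: str
    version: str
    turtle: str  # the released point-of-truth content


def aggregator_message(release: Release, mode: str) -> str:
    """Build a hypothetical JSON message announcing a release to one target.

    Mode "copy" sends the content itself; mode "reference" only says where the
    point-of-truth vocabulary lives.
    """
    if mode not in ("copy", "reference"):
        raise ValueError(f"unknown mode: {mode!r}")
    body = {"vocabulary": release.vocab_uri, "version": release.version, "mode": mode}
    if mode == "copy":
        body["format"] = "text/turtle"
        body["content"] = release.turtle
    return json.dumps(body, sort_keys=True)


def on_publish(release: Release, targets: dict[str, str]) -> dict[str, str]:
    """Messages to send to each configured target once publication succeeds."""
    return {name: aggregator_message(release, mode) for name, mode in targets.items()}
```

A configuration such as `{"RVA": "copy", "international": "reference"}` would push the full content to the national aggregator while only registering the vocabulary's location with an international listing.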
And we're trying to do as much with automated tooling as we can. Research Vocabularies Australia has an interface that allows us to interact with it automatically, so we know this kind of pushing of content will work; we've tested that out already. So this is where we're up to. How many of the points in Simon's paper will it tick off? A few of them. It'll handle some of the FAIR formats and this and that. But there is no guarantee that what we're doing will ensure the long-term governance of an organization's vocabs. It will certainly allow them to be listed in certain places, and we can assign identifiers like this one to them under certain regimes that give very good identifiers, but if the organization that actually owns the vocabulary loses interest in maintaining it, this will crumble. That said, we are dealing with organizations, at least a few of them, that have been around for 100-plus years and have been creating and delivering content for a long time, so I think there's a good chance that the geological surveys and others, once up and running, will continue to deliver vocab content for a very long time. So that's my view of this stuff. Part of the discussion here is to bring in other people's views: do they use Git or not, or do they have fundamentally different ways of viewing vocabularies and publication? I'm interested to learn what other people do, and obviously what I'm doing can't be the only way to do it. Okay, I'll see if I can stop sharing my screen.