Okay, so there are just two cases that I want to talk through, about vocabs that have become orphaned and are in the process of being rescued, and then I want to draw some lessons from those. The first case is the International Organization for Standardization (ISO) vocabularies, and a first point is that they don't have a semantic representation but they're still orphaned in a way, and that has to be dealt with. The second is the Bureau of Meteorology vocabularies, which have, or have had, a semantic representation that is fading from view.
So to the first case, the ISO vocabs. Many of you will be familiar with ISO code lists. The particular set that I'm familiar with comes from the ISO 19100 series of spatial standards. Many of you will know these, but there would be correlates in other, non-spatial areas too, where you have documents presented in PDF form. You can see over on the right there's a role code list, a very well-known list of roles. There are notes beyond this table about what these role codes actually mean. Sometimes they're a bit trivial: for author, it says the party that authored the resource. That's not that helpful if you don't know what the word author means, but some of them have a bit more information, and certainly there have been 20 or 30 years of use of these, so they're common, they're understood and they are essentially known. But why are they orphans? Well, they're orphans because sometimes these standards don't do a fantastic job of getting them out there. They're just PDF documents and you have to purchase them. Sometimes you see them used and you don't know where they come from, and sometimes you may know where they come from but don't have access to them. So they're perhaps not orphans in that way, but they're difficult to work with. There is an orphaning stage that can occur, though, where a standard is made obsolete by another one and so on.
So because of the way these kinds of standards work, with five-yearly, document-based releases, they can become bypassed or obsolete, and they're in some way dead and yet you may still need to use them. So what has the community done about this over the last 15 or so years? Well, one other thing first: that PDF document that I showed previously is one representation, but many of these standards do have a machine-readable version of the vocabulary, usually in XML. So what does the community do when some or all of that content is either badly available, or not available, or becomes in some way unavailable? What they do is cheeky third-party representations of these vocabularies. So here is ESIP, a federation of geospatial communities across American institutions, which has, I'm assuming without direct permission, gone and republished these vocabularies. And they've done a good job of it. They've presented them nicely online, in human- and machine-readable form, as in the original XML and also HTML web pages. And for many, many years people have used this representation of the vocabularies, but there are lots of problems. It's not authoritative, so if the ISO decided to publish the vocabularies elsewhere, then we might have a problem. It's probably against the licensing conditions of the vocabularies, and many other things. And these vocabularies could have a life of their own. If people weren't careful, they could evolve separately from the ones in the standard, and people's expectations about the vocabulary might be challenged. So there's been an attempt to make a first-party representation: take those vocabularies from the ISO and actually get the ISO to publish them. And I've spent years and years trying to do this.
Five, six, seven years, something like that. So we've suggested to the ISO how to do this. We've shown them how to do it technically. We've costed what it would take for them to do it for these vocabularies. And I know that there are similar conversations going on, not in the ISO geospatial community, but in other communities in the ISO. This is a very slow-moving thing. It could happen, but the reality is these vocabularies have not yet landed, despite active engagement, and despite, in some cases, offers of free or reduced-cost hosting of their vocabularies. So rather than third-party hosting, we would be doing it for the ISO, and charging them very little money or no money for it. And none of those things have landed, because of the way that the ISO perceives the value that they and their clients get from vocabularies. They don't see them as a primary thing to be doing. They want to sell PDF documents, still, at this stage. So we have this ongoing issue, and in this case the long-term preservation and the technical delivery of the vocabularies is really in fallback mode. What's happened here is that certain peer organizations of the ISO, or organizations that wish they were peers of the ISO, like the Open Geospatial Consortium, have published these vocabularies technically. You can see the 19100-series definitions on the OGC's vocabulary server right there. So they've just gone and done it again. I'm not sure they're allowed to, but they've done it anyway. What are you going to do about it? People use it. And then, as you can see over on the right, I've resisted the temptation to publish ISO vocabularies over the years. I've tried to divert people to pressuring the ISO. I've tried to get the ISO to do it, all of that stuff.
But in the end, when we have projects that require these vocabularies, because they are meant to use these standards and they cannot get them in the way they need them, and they don't even know if they're going to be around forever, we end up doing the same thing that the third-party publications have done before. Technically, maybe a bit better, but you can see over on the right, this is a thing published by me just the other day through the Linked Data Working Group. It's that same role codes vocabulary. I hate myself for doing it, but what other choice have you got? So this is all bad, really. We're trying to avoid this, but this is where we're at for this class of vocabulary.
Okay, on to a perhaps better story. The Bureau of Meteorology, through an initiative that they were funded to implement a few years ago, the National Environmental Information Infrastructure (NEII), developed a series of reference vocabularies, datasets and tools for environmental information in Australia. This went beyond the day-to-day operations of the Bureau, so it was an initiative, not core business. And as you can see from this web page, that initiative has been decommissioned. It ran for a few years, built up some infrastructure, built up some datasets and so on. And that has now all been unwound, to the point where the original NEII web pages, which used to say decommissioned, no longer resolve. So none of their assets resolve online, including all of their vocabularies in the way that they established them. And there are vocabularies there, for different kinds of monitoring sites and types of tsunami waves and so on, that people actually use. I've received requests: hey, Nick, do you know how I can get hold of that vocabulary that used to be online? So that's bad. It's bad that an initiative has come and gone and hasn't maintained its assets.
It's likely, and this is speaking with a sort of insider's hat on in the Department of Environment, that there will be other initiatives nationally that will do something similar to the National Environmental Information Infrastructure, and they could pick up all of those assets that were developed and see if any of them are worth preserving and reusing. The problem with this, of course, is that it is just crazy initiative-based effort and funding, and there's no long-term sense of preservation, really. I'll come to a point about the way we act as a community with these kinds of initiatives, and the kind of funding that we get, to overcome these things. Nevertheless, there are some good things happening, and the good thing in this case is that the Research Vocabularies Australia (RVA) system continues on whether the party that submitted the resources exists or not. So if you search for NEII vocabularies, you'll find loads in that system, and you can see there's licensing and observed property, et cetera. Again, these are used, they're valued. Some people like them. Even if they're not used directly, they certainly have a value as a reference for someone making another observed properties vocabulary or another licensing vocabulary, either to draw from this material or to do things differently. So I think there's quite a bit of value there, and they function. You can go to those vocabularies and browse them. Now, the issue with the vocabularies in this system is that they are, in some sense, not a point of truth. The actual identifiers and the links in the vocabularies now resolve to nowhere. They don't resolve here. If you click on a concept in this vocabulary, it will attempt to resolve to a now-gone domain name, neii.gov.au.
And so users can browse the vocabulary and interact with it within this system, but they can't, as easily as we would like, reference the vocabulary terms elsewhere and have people click on them and actually get to the vocabulary term, because the actual web addresses don't work anymore. So what the Department of Climate Change has allowed me to do is to take on ownership of those vocabularies on behalf of the department, and to reimplement the identifiers used in the vocabulary so that they do resolve. The content is still in the system and it's still browsable, but rather than the web addresses going off somewhere and not working, they'll actually resolve back to the system. It could be that the Department of Climate Change decides to host these vocabularies elsewhere while retaining a copy in RVA. Maybe, sure. That's not on the table at the moment. We just want them to work as best they can. And so what we have is this. I've just zoomed in to the first vocabulary that I've remediated, and you can see this is the concept of bathymetry in the tsunami glossary of terms. You can see the web address here is linked.data.gov.au/def/ and then the tsunami glossary of terms. So what I've done is create a persistent identifier namespace for this vocabulary that is as similar to the NEII-based one as possible. The original NEII one read something like http://neii.gov.au/def/ and then the tsunami glossary of terms, or something similar. So I've had to remediate the vocabulary, as in rewrite its content to use this identifier. This identifier doesn't yet resolve, but shortly, next week or so, it will resolve back to this actual RVA web page. At that point, the vocabulary is working essentially as well as it has ever done. The vocabulary is not perfectly preserved, in the sense that I don't really know anything about tsunamis. If someone said to me, can we add these terms to the vocabulary, I'd probably feel uncomfortable doing that.
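The remediation step described above, rewriting a vocabulary's content so that its identifiers move from the dead namespace to a persistent one, might be sketched roughly as follows. This is only an illustration under stated assumptions: the two namespace strings are hypothetical (the talk gives the addresses only approximately), `remediate` is a made-up helper name, and a real remediation would more likely use an RDF library than plain text substitution.

```python
import re

# Hypothetical namespaces, for illustration only: the talk gives the old and
# new web addresses only approximately.
OLD_NS = "http://neii.gov.au/def/tsunami-glossary/"
NEW_NS = "https://linked.data.gov.au/def/tsunami-glossary/"


def remediate(text, old_ns=OLD_NS, new_ns=NEW_NS):
    """Rewrite every identifier under old_ns so that it sits under new_ns.

    Returns the rewritten text plus an old -> new mapping, which can be kept
    as a record of what was changed.
    """
    # Match the old namespace followed by the rest of the identifier, stopping
    # at whitespace, angle brackets or quotes (the usual IRI delimiters).
    pattern = re.compile(re.escape(old_ns) + r'[^\s<>"]*')
    mapping = {}

    def swap(match):
        old_iri = match.group(0)
        new_iri = new_ns + old_iri[len(old_ns):]
        mapping[old_iri] = new_iri
        return new_iri

    return pattern.sub(swap, text), mapping


if __name__ == "__main__":
    sample = "<http://neii.gov.au/def/tsunami-glossary/bathymetry> a skos:Concept ."
    rewritten, changes = remediate(sample)
    print(rewritten)
    for old, new in changes.items():
        print(old, "->", new)
```

Keeping the old-to-new mapping alongside the rewritten content means anyone still holding an old address can at least be told what it became, even though the old domain itself no longer resolves.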
I'm not a domain person. So I can technically make sure that the thing works, but I don't have a domain custodian to actually take over this vocabulary. So the ultimate saving from orphaning, whatever the term is, is to find a technical preservation mechanism, which we can implement here, but then also to find a person or a group who is willing to be the actual content custodian of these vocabularies, and to determine either: no, it's absolutely locked down, on ice, it's there for historical reference only; or: actually, it's a very good vocabulary, we're going to keep it live, and we can add new terms to it. Now, technically, we have the mechanisms to add new terms. We can indicate that a term is new and has been added in this way, that it was not part of the original one, et cetera. We can do all of those things, but we do need someone or a group to actually look at terms like bathymetry, or, if someone wanted to add another term, to actually review the content of that addition and make a call on it. This is also happening for international geological vocabularies, same kind of thing. Initiatives make them, they sit there, they're used, the initiatives kind of crumble, and then some other initiative or some other group has to take them over and deal with them.
So, thinking about those things in these two cases, I wanted to go through a few learnings, which I know is not great grammar, but learnings, and then to talk about, as we have done on many occasions, what we can do to do this better: how we can better preserve things, how we can prevent them from becoming orphans in the first place, and how we can deal with orphans or potential orphans.
So the first point is that personal commitment from people still matters, sadly. It shouldn't be this way. It should be that the machine of vocabulary publication and use is impersonal and exists in some fanciful way without commitment from individual people, but that's not the case.
You still have to have people like the scientists who want to use a vocabulary to say that they want to do this and contact someone, me, and then you need people like me who, for reasons that evade me, are particularly interested in vocabularies and want them to continue whether I use them or not. So we still have several different roles that are often satisfied by personal commitment. Again, it shouldn't be the case. It should be the case that the Australian government understands that this initiative is of value and will resource it or deal with it or something, so that when the initiative goes, there'll be a fallback mechanism. And there really wasn't. In the case of the NEII vocabularies, we had to search around to find the person at the Bureau who would be the person to authorize someone else taking them over. And there was no indication of who the Bureau would prefer to have take them over. The Department of Climate Change seems reasonable, but there's no real organizational arrangement there. It was literally that the Bureau doesn't really have any ongoing role here, and if some other Bureau-like organization, like the Department of Climate Change, is interested, then sort of great. So very, very informal. In the ISO's case, it's been possible to find out who the information custodians of the code lists are. We know that because they wrote the standards. But it's impossible to find people with the authority to maintain the content, and certainly impossible to find people with the institutional go-ahead to actually publish those vocabularies on behalf of the ISO. We may think it's a particular committee, but the committee that is the domain committee might be prevented from doing this by the ISO central organization, which just does not allow them to do that.
A second lesson is that organizational conglomerates are needed.
So we have organizations whose interest in the space kind of waxes and wanes, and it seems to be possible to preserve long-term information assets here only if we have conglomerates working. In the case of the Bureau vocabularies, it's the Bureau plus the Department of Climate Change plus ARDC. Without ARDC, we actually wouldn't have the content to continue to work with. In the case of the ISO vocabularies, it's the ISO plus the Australian Government Linked Data Working Group and the Open Geospatial Consortium, which are propping each other up in different ways. So with these conglomerates, basically, you hope that at all times at least one of the members is interested, in order to preserve these things on an ongoing basis.
The next issue is that persistent identifiers really matter. I strongly advised the NEII initiative, seven or eight years ago, or maybe it was five years ago, not to use an organization or initiative name in their persistent identifier, so neii.gov.au. I said you can use it for your website, but don't use it in long-term information assets. They did, and eventually it died, as expected. So we've now moved from that persistent identifier to linked.data.gov.au, which is better in every single way. So they really do matter, these PIDs. If you think you've got one that's working fine now, just cast your mind ahead: eventually it will not work, when the organization changes. Think of the most stable government web address out there. We're thinking of things like the ABS. abs.gov.au has been around for a while. Yes, until some government decides to call it the Department of Statistics, and then it goes. So PIDs really do matter and we have to have them working. Whether the vocabulary is Semantic Web or just an XML code list or whatever, we really do need persistent references to them that can persist over system change and organization change.
And the last point here is that Australia is ahead.
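Going back to the persistent identifier point for a moment, the idea of a reference that survives system and organization change can be illustrated with a small resolver sketch. The assumption here is that the persistent namespace is owned independently of any hosting system, and a routing table, which is the only thing that changes when content moves, maps it to wherever the vocabulary currently lives. The `resolve` function, the routing table and the example.org addresses are all hypothetical; this is not a description of how linked.data.gov.au is actually implemented.

```python
from typing import Dict, Optional

# Hypothetical routing table: persistent namespace -> wherever the content
# happens to be hosted today. The target addresses are placeholders.
ROUTES: Dict[str, str] = {
    "https://linked.data.gov.au/def/tsunami-glossary": "https://vocabs.example.org/tsunami",
}


def resolve(pid: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the current location for a persistent identifier, or None.

    Longest namespace first, so a more specific route wins over a general one.
    """
    for prefix in sorted(routes, key=len, reverse=True):
        if pid == prefix or pid.startswith(prefix + "/"):
            # Keep whatever follows the namespace (the concept's local name).
            return routes[prefix] + pid[len(prefix):]
    return None


if __name__ == "__main__":
    pid = "https://linked.data.gov.au/def/tsunami-glossary/bathymetry"
    print(resolve(pid, ROUTES))

    # The hosting system changes; the identifier that people cite does not.
    moved = {"https://linked.data.gov.au/def/tsunami-glossary": "https://new-host.example.org/t"}
    print(resolve(pid, moved))
```

The design choice being illustrated is that the cited identifier never names the system or initiative that hosts the content, so a wind-down only requires updating the routing table.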
Australia is doing this stuff with our own vocabularies, but we are also influencing and being involved in these international ISO vocabularies, OGC vocabularies, British Oceanographic vocabularies, all kinds of things that Australia is involved in and positively influencing. It's hard to know why this is the case. I suspect that it comes down to vocabularies being a small field, and there are not that many people involved. If Australia has one or two more people than the next unit down the road, the next country, then we are double their size in terms of vocabulary involvement. So it's probably a law of small numbers issue, or whatever it's called. If every country were doing more, then Australia would be less significant. So that's not a good thing to know, actually, but it is what it is. So we are, as our pollies love to say, punching above our weight in terms of vocabularies.
So, thinking beyond those stated points, what could we do differently? I think the people in this call, when encountering a new initiative from government, or within a group, or even in academia, can do what I tried to do a few years ago and just try to set the ground rules for that initiative's vocabulary creation and use, just shift it along a little bit. Yes, use persistent identifiers. Have a relationship with your peer organizations, so that if you go away, someone else might know what to do about this. You've nominated a successor, like all Kings of France should have done. And you have a link to the ARDC. You understand this system. You know the relation between what you're doing and what they're doing. So I think all of those things that we can do when our initiatives try to create vocabularies will make a 25% difference, whatever that means, to the scenario. I'm faced with making system vocabularies within the Department of Climate Change. We're making a bunch now.
And when I do, I hope that they will be better set up for long-term preservation beyond the initiative I'm working on than the NEII ones and so on. So we thank the NEII for giving us this learning opportunity, to see the full lifecycle of initiative creation, vocabulary creation, initiative wind-down, vocabularies on life support, and ultimately the vocabularies disappearing, to then be resurrected in some form. We need to have seen that so that, when we have the conversations with the initiatives and the bosses and so on, we can say this isn't just speculation on our behalf, this is the lifecycle that we've seen play out. And it's sort of good that we've had a short-term lifecycle actually go through this now. That's really good, rather than only knowing this in 25 years' time and then having a problem in 30 years' time. So those are my points. I'm very keen to talk about any of these particular vocabularies more. There are other orphan ones that I can think of, but I would be repeating the main points to talk about them, so I will stop presenting now and hand it back to the chairs and others to guide us.