So this presentation follows on naturally from one given in the last session, a quarter ago, three months ago. That previous presentation was about the impending new governance regime around a series of Australian vocabularies that deal with addressing information. The reason there's a governance challenge is that contributions to the vocabularies come from, let's call it, company A, who discover term use in the wild and bring it into their database, but the actual authority over the codelists is conferred by another agency, a government agency set up to manage spatial information. So there's a pattern here of information arriving unverified and then ultimately needing verification, massaging and so on, and a new governance regime is being set up to deal with that. The codelists we're talking about today are those codelists, but we're not really going to talk about governance in this presentation. This is a technical presentation, essentially.

Okay, so as it says there, we're talking about the vocabularies associated with the Geocoded National Address File, the G-NAF, which I'll define in a second, and about a current technical SKOS vocabulary publication approach. I'm going to say this now and I'll say it again later: this is not the way to do everything for everyone. This is one method we've implemented for this project for publishing vocabularies. It's definitely not appropriate in some of the scenarios John and others are going to talk about, but it's one we've found useful for a number of reasons.

Okay, outline. What is the G-NAF? I'll talk about that. I'll talk about a Linked Data version of it, a little about the ontology and the vocabularies we're using, some initial directions we undertook, and the final direction we're implementing now. I'll talk about the publication mechanics and about future production. I've put a slide in there about governance, but it just says "See Joe". Joe, my collaborator, will tell you all about the governance.

Okay, so the G-NAF. What is the G-NAF? As it says there, PSMA is a government-owned corporation whose job is to collect and distribute addressing information in Australia, and the G-NAF contains all physical addresses in Australia. In PSMA's own words, it's "the most trusted source of foundational geocoded addresses for Australian business and government". They've maintained this product, which they assemble from states, local councils and others, for a few years, and recently the Open G-NAF has been delivered: the Australian government has essentially paid the company to make the data freely available in an open data initiative. They've put that collection of data on the web and they update it every quarter.

Now, the data is typically used as a giant Postgres database. It's got 13 and a half million addresses in it, and each address has in the order of 20 statements of metadata about it. It's got hundreds of thousands of things that are not addresses, like street localities, localities, the Australian states, and so on. And it's got a whole pile of lookup tables. The example there is the lookup table that tells you what type of address alias something is. So you've got two addresses, one is the alias of the other. How is it an alias? Well, it might be a synonym.
It might be a so-called contributor-defined alias, an alternative locality, et cetera. There are also ranged addresses, where you've got a principal address with sub-addresses, so units in a block, that sort of thing.

Okay. Now, many of these lookup tables are aligned to a standard, and that's the standard that needs to be governed, the one I mentioned at the beginning of this talk. But not all of the lookup tables are aligned, and they may see use outside of this product. Not so much the aliases one I showed, but address type and that sort of thing could be used elsewhere. It's for this reason that we're publishing the vocabularies as something distinct from the product itself, from the actual 13 and a half million addresses.

Okay. So in the last six months, Joe, who is the CTO of the company that produces the product, and I have worked on a Linked Data version of this database. It's online at the moment at gnafld.net, and it's soon to become an Australian Government Linked Data Working Group anointed, or allocated, dataset. What that will mean, we're not entirely sure, but almost certainly the URIs we use for all of the components in the dataset will be official government URIs, and there'll be some registering of the dataset and various other elements of it in Australian Government Linked Data Working Group systems. So you'll be able to discover that there is such a dataset as the G-NAF Linked Data version.

So, ontologies and vocabularies. The G-NAF Linked Data version actually delivers a couple of different data models, but the primary one is defined in an ontology which very closely matches the database, surprise, surprise. What we've done is taken the database, asked what an ontology version of it would look like, generated that, and built an API that actually delivers the content. What you're seeing in that diagram there is just a very high-level, main-class view of the data model. You can see from it that there's a class called Address, and an address can have an alias, it can have a geometry, and addresses can be within localities, et cetera. Not all the relationships in the ontology are listed there, but it gives you a feel for the kinds of things that are in this dataset.

Just to recap, then: we've got the ontology that describes the data being delivered, we've got an API that actually delivers the data, and we've also got codelists, expressed as vocabularies, delivered alongside. Those codelist terms, the vocabulary terms, are both used in the dataset and available for use outside it. And I've put a couple of arrows in there to show you that if you go to the website, you can see exactly where these things are, and we'll probably click on those in just a minute.

Okay. Oh, I made a spelling mistake there, but this implementation is similar to other implementations where we have an ontology that describes something, in this case everything you need to know about the G-NAF's version of addresses, and then a series of codelists that are used with the ontology for various classification purposes. The vocabularies themselves have a namespace that's an extension of the namespace of the ontology: it's got the word "code" at the end.
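As a rough illustration of how those pieces hang together, here's a minimal rdflib sketch. All the URIs, class and property names below are stand-ins (hence example.org), not the published G-NAF terms; the point is the pattern: an ontology namespace, a codelist namespace that extends it with "code", and data that uses both.

```python
from rdflib import Graph, Namespace, RDF

GNAF = Namespace("http://example.org/def/gnaf#")                         # stand-in ontology namespace
ADDR_TYPE = Namespace("http://example.org/def/gnaf/code/addressType#")   # stand-in codelist namespace
DATA = Namespace("http://example.org/dataset/gnaf/")                     # stand-in data namespace

g = Graph()
g.bind("gnaf", GNAF)

addr = DATA["address/GA123456789"]                          # a made-up address identifier
g.add((addr, RDF.type, GNAF.Address))                       # class from the ontology
g.add((addr, GNAF.hasAlias, DATA["address/GA987654321"]))   # one address aliasing another
g.add((addr, GNAF.hasGeometry, DATA["geometry/1"]))         # the address's geocode
g.add((addr, GNAF.within, DATA["locality/123"]))            # address within a locality
g.add((addr, GNAF.hasAddressType, ADDR_TYPE.ruralUnit))     # term drawn from a codelist

print(g.serialize(format="turtle"))
```

Each of the 13 and a half million addresses gets statements of roughly this shape, with the classification properties pointing at codelist terms.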
And you can see there, the codelist register has, one, two, three, four, five, six, seven, eight, nine, ten vocabularies. And by the by, the index you're looking at there, which is a web page, is itself available in RDF, as a register, and you can actually navigate to all the codelists from the top-level entry point of the API. That's just for fun.

Now, looking in one of the vocabularies. The initial direction was to say: well, we've got an ontology that talks about all the classes of things we're interested in, and the vocabularies are essentially just subclasses. For instance, address type: we've got an Address class, so let's just have a whole pile of different subclasses of Address (rural, rural block, rural cabin, urban, urban house, et cetera). And ontologically, this is fine. It was very easy for me, because I understand what an address is, or certainly what the data model thinks an address is, and I look at the codelist it uses for types of addresses and say these are all the subclasses of Address, easy enough. And then we allocate those 13 and a half million individuals against one of those subclasses. But it's very difficult for other people to use this, because there's just not that much familiarity with OWL out there, and there's little tooling that can really easily handle big collections of subclasses of things. And, I'll come back to this, there's also no practical reason not to use a simpler vocabulary, something we discovered later on.

So Rowan, who of course is the secretary of this group, and I had some discussion about how we were doing something here that wasn't your "normal" (and these are air-quoted) vocabularies, but in fact an ontology which happens to include an awful lot of subclasses in a hierarchy, like you see over on the left there. And we moved away from that, and I'll describe that in a second. I'll just mention, though, that some of the codelists were lists of individuals, not classification classes. For instance, the list of Australian states: there are six states in Australia, a couple of territories, and a few other bits and pieces like a catch-all for all the addresses in Australia that are not in a state or a territory, the so-called "other territories". Those things are not classes, they are individuals. There is the state of New South Wales, not the class of things "New South Wales". So that's just one distinction.

So the later direction is to say: actually, these collections have the same structure you see over on the left there, of addresses and different types of addresses. We can just call that a SKOS vocabulary and use it quite happily. We can use it in conjunction with an OWL ontology that tells you where to use the classification, but we can also publish it just as an intellectual piece of work that says these are the kinds of addresses we know about in Australia. And tooling that supports SKOS is much more readily available than tooling that supports generic OWL, which is one of the reasons for moving to it. These things are not mutually exclusive: we can have the terms as both OWL subclasses and SKOS concepts, and in fact we do. But the important thing here is that, despite me trying to do something difficult, I have gone back to fairly standard SKOS, which would keep Rob Atkinson happy, for those who know him.
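Here's a sketch of the two directions side by side, again with assumed URIs and an assumed term name, ruralUnit; the real vocabularies carry more metadata than this.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS, SKOS

GNAF = Namespace("http://example.org/def/gnaf#")                 # assumed ontology namespace
SCHEME = URIRef("http://example.org/def/gnaf/code/addressType")  # assumed vocabulary URI
AT = Namespace(str(SCHEME) + "#")                                # hash URIs for the terms

g = Graph()

# Initial direction: every address type is an OWL subclass of Address.
g.add((AT.ruralUnit, RDF.type, OWL.Class))
g.add((AT.ruralUnit, RDFS.subClassOf, GNAF.Address))

# Later direction: the same term as a SKOS concept in a concept scheme.
g.add((SCHEME, RDF.type, SKOS.ConceptScheme))
g.add((AT.ruralUnit, RDF.type, SKOS.Concept))
g.add((AT.ruralUnit, SKOS.prefLabel, Literal("Rural Unit", lang="en")))
g.add((AT.ruralUnit, SKOS.inScheme, SCHEME))

# The two are not mutually exclusive: the term ends up typed as both an
# OWL class and a SKOS concept, which is what we actually do.
```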
Okay, now this is a bit of a horrendogram, but it is the total publication process, and I'll just talk you through it, because this is what actually happens and is really where I wanted to go with this presentation.

So over on the left we have a database containing vocab terms. The company that acquires the information from the states encounters new vocab terms every now and then, puts them in the database, and then publishes that database. Then I run a Python script which takes chunks of vocabulary metadata (the actual name of the vocabulary and so on), reads the database, and makes a local vocabulary RDF file, a SKOS file, from that information.

From there, three things happen. First, I take that RDF file and manually publish it in the ANDS (Australian National Data Service) Research Vocabularies Australia portal. I just upload a new version whenever I run the script, and that automatically generates a SISSVoc version of the vocabulary. Second, I commit the local RDF file to a remote GitHub repository for safekeeping, and I'll come back to that in a second. Third, I use another Python script with a bunch of HTML templates to assemble, per vocabulary, so ten of them, an HTML file that presents the vocabulary in a fairly easy-to-use web way, and I put that in GitHub too. And then finally, on the server that delivers gnafld.net (the dataset, the API, the ontology), I pull that information down as static files of HTML and RDF.

So that's the total publication process. We end up with the latest version of a copy of these things in the ANDS vocabulary portal, but the actual visible delivery of the vocabs, even the RDF versions, is all made via this single web server; they are not contributed to a pool of vocabularies elsewhere.

Okay, so publication mechanics. The G-NAF ontology and the vocabularies are actually just a GitHub repository, and you can see folders there called "code" and "docs". I think we might be missing one. No, no, we're not: "code" itself has all of the vocabularies. If I clicked on the code folder, you would see 20 files, 10 RDF files and 10 HTML files, all just sitting there. So this is, you know, the management point, but the actual publication point is elsewhere.

Okay, so lastly, we're getting towards the end, folks. The vocabularies are published and each has its own URI. Now, these URIs will change, because we're going to be using Australian government ones soon enough, but the mechanics are going to be the same. Where we have /def/gnaf for the ontology, we have /def/gnaf/code/something-or-other for the various vocabularies; you can see address types highlighted there. Each vocabulary then uses so-called hash URIs for the individual terms. You can see the example there: address types has a rural unit within it, and you can get to that by just clicking on that address there with the hash. What this means, of course, is that whenever you resolve an individual vocabulary term, you get the entire vocabulary. Not super helpful for some applications, but reasonable enough for other uses, and we'll see that in a second.
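Pulling that together, here's a condensed sketch of the extraction step from the horrendogram: read a lookup table from the database and write one of those SKOS files, with a hash URI per term. The table and column names (address_type_aut, code, name) are hypothetical, and I'm using SQLite as a stand-in for the real Postgres connection; the real script also merges in per-vocabulary metadata and drives the HTML templating.

```python
import sqlite3  # stand-in for the real Postgres connection
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS

SCHEME = URIRef("http://example.org/def/gnaf/code/addressType")  # assumed vocabulary URI

g = Graph()
g.bind("skos", SKOS)
g.add((SCHEME, RDF.type, SKOS.ConceptScheme))
g.add((SCHEME, SKOS.prefLabel, Literal("Address Type", lang="en")))

conn = sqlite3.connect("gnaf.db")  # hypothetical database file
for code, label in conn.execute("SELECT code, name FROM address_type_aut"):
    term = URIRef(f"{SCHEME}#{code}")  # a hash URI per individual term
    g.add((term, RDF.type, SKOS.Concept))
    g.add((term, SKOS.prefLabel, Literal(label, lang="en")))
    g.add((term, SKOS.inScheme, SCHEME))

# this file is what gets uploaded to RVA, committed to GitHub,
# and run through the HTML templates
g.serialize(destination="addressType.ttl", format="turtle")
```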
Yes, so for the whole vocabulary, you can get either HTML or RDF, and there's some content negotiation you can do, or a link you can click, to determine whether you want the human-readable or the machine-readable version. And finally, all the vocabularies are also cached in a triple store for cross-vocabulary searching.

Okay, so, future production. We're going to automate some of the steps in that horrendogram I showed you, but not all of them. The actual publication of vocabulary versions is relatively infrequent, and there's no real need to automatically push versions to ANDS whenever we get a new one; it happens on a monthly sort of time step at the most. We will update some of the git pushes so that when you commit files locally, they'll just appear in the authoritative, backed-up repositories and so on.

Now, for governance, ask Joe. This is my last slide, but I just wanted to quickly show you what this actually looks like online, so I'm going to share my screen again, and hopefully you will see a web browser. Okay, so you should see the meeting agenda again. Can someone nod if you can? Yes. Great. Okay, here we go.

So we've got gnafld.net. There's the web page I showed you. The ontology is a very standard documented ontology with the diagram that I showed you. Going back one: we go to the G-NAF codes. Here's the code index. Here's address types. And what you're seeing here is a very, very simple templated web page with a little bit of information about the concept scheme, a bit of information about the collections that it contains, and then a little chunk of information for each term. It tells you that it's a concept, what the label of the concept is, the URI, alternative labels, and what the source is; most of the terms come from the G-NAF product description, some come from elsewhere. And the contributor, which is just my ORCID in there, is not the contributor of the term but of this representation of it. We've got some arguments about who exactly it is sensible to put down as a contributor, but for the moment it's just all me, until we come up with a better bit of metadata there.

But the important thing is that this HTML file works nicely per term. Let me just scroll down, choose this term here, and run that as a web address. Oops, I'll go back up to the top. Hopefully you can see that what I'm putting in is the web address of the vocabulary with the term's hash fragment at the end, and we go straight to that term. So this is very easy for people to use. It's not rocket science stuff, but if you want to resolve the term as a human, you put it in the web browser and you get straight to the term. If you do this as a machine and you ask for RDF, it's the same thing as clicking on (where is it? up the top here) "get me this vocabulary in RDF", which, if you click on it, will download an RDF file. So you can't zoom in to the individual term, but you can download the vocabulary within which the term sits.
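For a feel of how that resolution can work, here's a purely illustrative sketch using Flask. This is not the actual gnafld.net server code: the route, the _format query argument, and the file layout are all assumptions.

```python
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/def/gnaf/code/<vocab>")
def vocabulary(vocab):
    # very rough Accept-header check; a real service would parse q-values
    wants_rdf = ("text/turtle" in request.headers.get("Accept", "")
                 or request.args.get("_format") == "ttl")
    if wants_rdf:
        # machine readable: the whole vocabulary as a static RDF file
        return send_file(f"code/{vocab}.ttl", mimetype="text/turtle")
    # human readable: the pre-built HTML page; a #term fragment in the
    # URI just scrolls the browser to that concept's anchor
    return send_file(f"code/{vocab}.html", mimetype="text/html")
```

Because the terms use hash URIs, the server never even sees the fragment: the browser requests the vocabulary page and handles the #term part itself, which is why resolving a term always gives you the whole vocabulary.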
So that's it. It's as simple as can be. And as I've said, and I promised I would say it again, this is certainly not what you would do for all vocabularies, but there's a reason that we're doing this here. The reason is partly to do with the way terms are contributed. And, at this point at least, with the branding of the vocabularies, they're all bundled up with the dataset. They are themed in a very simple way; they are simple vocabularies. There are hierarchies and so on in there, but really each vocabulary is set up exactly the same: a single concept scheme, one or more collections, and then a set of concepts. And we're not using any fancy tooling. Management of the vocabularies is from a database plus template files and a bit of scripting; there are no fancy vocabulary management tools there.

The last thing I'll mention is that we are starting to get clients reading the vocabularies and doing other things with them. So in another database that's been set up to use some of this information, we're actually reading the vocab files and making database tables from them, so that we can do vocabulary lookups against the database, which is much quicker than doing it against text files.
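That client-side pattern might look something like the following sketch, with illustrative file and table names: parse the published Turtle file once, then do term lookups with plain SQL.

```python
import sqlite3
from rdflib import Graph
from rdflib.namespace import RDF, SKOS

g = Graph().parse("addressType.ttl", format="turtle")  # a downloaded vocab file

conn = sqlite3.connect("lookups.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS address_type (uri TEXT PRIMARY KEY, label TEXT)"
)

for concept in g.subjects(RDF.type, SKOS.Concept):
    label = g.value(concept, SKOS.prefLabel)
    conn.execute(
        "INSERT OR REPLACE INTO address_type VALUES (?, ?)",
        (str(concept), str(label)),
    )
conn.commit()

# a vocabulary lookup is now an indexed SQL query instead of a file scan
```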