All right, so this presentation is about the way in which my company and I are helping several organisations do vocabulary management. It's definitely not the only way you could do vocabulary management, and it's not the only way that we do it, but it is a style of vocabulary management that has its place, so you can make your own judgments as to whether this kind of management is relevant to your situation and whether other places might benefit from it.

The kind of management we're talking about, before I start showing the systems, is where you've got an organisation that's comfortable with creating vocabulary artefacts and has some people around who can get into depth about the technical artefacts if they need to, so they can understand vocabulary source files and fix issues with them. If the organisation doesn't have such people around, then this style is not for you; you would need to use a tool to do all of that for you. But if you do have such people, then this might be for you.

Now, the particular vocab source files that I'm talking about here are all RDF SKOS vocabularies. In theory, everything that I'm doing here could be done with other kinds of technical vocabularies. I haven't encountered any recently, but if you had some kind of XML vocabulary format, you could in principle do everything we're doing here.

Okay, so what you're seeing on the screen is a test vocabulary server for ICSM, the Intergovernmental Committee on Surveying and Mapping. It's a cross-government body, so state and federal. This is just a test system: it's hosted on our infrastructure, it's not yet hosted by ICSM, and you can see they've got three vocabularies hosted there.
These vocabularies are pretty well known in some government sectors, the spatial data space I suppose, and some of them have been around for quite some time. This one here, Address Types, is just a very simple list of types of addresses as used in the Geocoded National Address File, and you can see we've got a rural type of address, a rural flat, etc. It's a very simple vocab, and one of the issues that has been lurking around it is the ability for updates to be absorbed by all of the people who actually need to use it. One way to do it is to just create addresses with these different types and, as new types emerge, shove the types into the geocoded national address database, where people suddenly encounter them. Nobody really thinks that's a good idea; that's what's happening, but it's not a good idea. Better is that you have a published vocabulary where people can see the types, and you could even have control committees who argue about whether a new type of address should be accepted or not. So, a pretty normal vocabulary kind of situation.

Until recently these address types were all in a lookup table in a Postgres database, or a CSV file, or something like that. Recently, well, actually in 2018, we took this Address Types vocabulary and put it into a SKOS vocabulary format; you can see that by the created date here, 2018. I'll talk in a second about the actual vocab format that's used, but nevertheless we've got this vocabulary artefact, and it's now being presented in this vocab system for ICSM, with the goal being that they actually manage the vocabulary here and all other systems downstream go and use it from this point, including the Geocoded National Address File, which would then become a client of this vocabulary.

A couple of other words about the rest of the vocabularies in here. That was Address Types, and that comes initially from an Australian
standard, and I might get the number wrong, but I think it's AS4189. Then there are other vocabularies here, for instance Place Name Categories, which categorises place names of Australia. Again, it's a vocabulary that is currently managed in Excel by a committee, and my hope, and I think their hope, is that it becomes managed by ICSM in this vocabulary system. It's a straightforward vocabulary: it's got categories and subcategories and so on.

Okay, so the result of this vocab management regime is vocabularies in this system. Here you see three. I expect by the end of July we'll have something like 25 or 30 vocabularies in here, still in demonstration mode, but at that number we should see serious amounts of vocabularies, and people within the spatial government community could look at this and go, "Oh yeah, those are some of the spatial vocabularies that matter to me."

So the result of management should be vocabularies in the system, but the actual management of the artefact does not happen in the system. The management happens elsewhere, and I'll talk about that in a second. The system you're seeing here, VocPrez, just displays the vocabulary. It can display status notes, different ownership and all those sorts of things if you want, but in general the system assumes that the vocabulary it's displaying is managed; it doesn't do any management for you. So it's very different to systems like PoolParty or TopBraid Enterprise Vocabulary Net, where you can actually manage the vocabulary in place. This system is not smart enough for that.

Just as an example of the same system in use elsewhere, here's the Geological Survey of Queensland's vocabulary system. They've got about 80 vocabularies now, so you can pick any of them, Geo Admin Features for instance, and here we go: block, map series, whatever, lots of different things. I don't know what these are, but they also manage their vocabularies somewhere else, and then VocPrez really just
displays them.

Okay, so the whole point of this presentation is now coming up, which is the automated validation and management of the actual vocab content. I'll just reopen this tab. For these three vocabularies here, how and where are they managed, and how is that process, if at all, automated?

Well, the vocabularies are managed in a GitHub account as files. Here's the GitHub account; you can see ICSM vocabularies, and if I wasn't using dark mode you would see the ICSM logo above that, but it's black on black, so a bit hard to see. Those three vocabularies really are these three files here: one, two and three. The VocPrez system just reads that file; it puts it into a database and reads it from there, I should say, but nevertheless whatever you see here for one of these vocabularies is in this file. There's no other information. It's not like the system reads that file and adds additional notes to it or maintains additional statuses or anything. It's very straightforward: what you see here is what you get there, and the VocPrez system is just adding niceties to the presentation.

So this GitHub repository is actually the data point of truth, and whatever changes we make to vocabularies here will be reflected here. There are absolutely no filters, in the sense that if we were to delete one of these vocabularies, within 30 seconds or so the vocabulary would disappear from here, and if we add a new one in here it would also appear there.

Now, that's deliberate. We've not done such a tight coupling in the past. Take, for instance, the Geological Survey of Queensland's vocabularies: they also have a GitHub repository with their vocabularies in it, but they chose to somewhat decouple them. The vocabularies in their GitHub system are what goes into VocPrez, but there's a bunch of human steps to initialise a vocabulary system refresh: the vocabularies go from GitHub into a database, and then VocPrez picks them up from there. So that's
human-initiated and, I suppose, slightly decoupled, whereas in this ICSM system's case it's automatically coupled. Basically, if a new vocabulary appears here and it's correctly formulated, it will automatically appear there.

Okay, so what does correct formulation of a vocabulary look like, and how would ICSM, or really Geoscience Australia managing this for ICSM, get new vocabularies in here and check that they're all correct?

This repository that you're looking at has two so-called GitHub Actions, which are just scripts that are initiated on certain events in the repository. The actions are listed here; actually, let me show you the source code for them. The two actions are "validate PR" and "update on merge". These little script files are configured to run when certain things occur, and I'm just going to explain what's going on here.

When a pull request, that is, a request to merge content, is made on this repository, for instance adding a new vocabulary, this script will run. It's deceptively simple. It is a small script, but what it's actually doing is running another script, and that other script then validates all the vocabularies in this repository. So if I was to add a new vocabulary called vocabulary X, the system would validate vocabulary X, and I'll describe the validation in a second, and it would also validate all these other vocabularies in here. That's for addition, but it's actually for any action: if I was to delete a vocabulary, it would validate the remaining vocabularies, and if I was to update a vocabulary, again it would validate everything.

You might think that sounds like rather a lot of validation. Well, even if we had a thousand vocabularies in here, the effort to validate is not huge, and in fact the cost to us of validating and running these scripts is zero, because of the way the GitHub system works: the kinds of
scripts that we've seen here are run, from our point of view, for free. Obviously there are all those caveats about how, if you're not paying for it, then you're paying for it in some other way. Well, we're paying for it by using GitHub and becoming, I guess, fans of that system, so that we're incentivised to use GitHub for paid services. But in the immediate term this script is run for free, and it will validate all the vocabularies.

Okay, so before I get into the details of validation, I'll just show you the other script. One script says that any actions that occur to the vocabularies trigger validation. The other script is "update on merge". What this does is say: for any update, deletion or addition to a vocabulary in this folder, push those values over to the backend database that this system uses, and trigger the system to restart itself. So basically, if a vocabulary is successfully deleted, updated or added to the repository, then within 30 seconds, a minute or so, this system that you're looking at will tick over. It won't reload all the vocabs, it'll just load the changed vocabularies, and it will restart itself, and then you'll see the vocabulary appear here. So those are the two scripts.

Now, a little bit more about the validation. I'll just show you that validation script again. What this validation script does is say: when a pull request, so a merge request against the master branch of this repository, is triggered, it goes and creates a little virtual server somewhere magically in the cloud, and it runs this "validate vocabs" script against it. So let's have a look at that script. All of this is public in a repository, so you can see all these things, and I'll put the links to them in the chat in just a minute.

So, "validate vocabs". This little script just iterates through all of the vocabs in this repository and runs the same validator against all of them, and it will basically print out an error to
say if any of the vocabs are invalid, and it will prevent the action that the user is requesting, so merging of the vocab; it'll prevent that action from occurring. I'll show you that in a second.

What does it actually do to validate? Well, this script does a web request for a particular validator file, and then it uses that validator file to validate the vocabulary contents. So what is that validator file? Technically, the validator file is a SHACL shapes asset, so it's a Shapes Constraint Language template, which is used to check that the vocab contains certain information. Here is that asset. Oops, there. I'm not going to go through all of it, but it's a very simple validator.

Let's just pick one of the things here. Here is one of the criteria within this script: each vocabulary must have one and only one modified date, blah blah blah. What the script does is look for the modified date property and check that there's a minimum of one of them and a maximum of one of them, so exactly one, and that the type of that modified value is either a dateTime, a date or a dateTimeStamp, so really just three forms of date. If the vocabulary doesn't contain that, then this script will say it's invalid.

Okay, so that's the technical validator tool. Where's the actual specification for validation? Well, it's this thing, the VocPub profile. The VocPub profile is something I made up; it's not rocket science by any means. It is a schedule of requirements for vocabularies which are in some sense quite obvious. Let's have a look at some of those. Each vocabulary must be identified by a URI. These are fairly rock-bottom sorts of things for SKOS vocabularies. Now, the one we looked at before, the modified date, we can find here: "Each vocabulary must have one and only one...", oh, that's created date. There's a typo there; it should say modified date there somewhere. And then there's a bunch of other
criteria. Let's find another one. So that's for a vocabulary; what about for a concept within the vocabulary? Each concept must have one and only one title, indicated by using the prefLabel property, etc. So there's the whole schedule of requirements, and the validation script echoes the written requirements here and in fact contains the same text, so you can tell whether a vocabulary has passed its validation, and if it hasn't passed, you can tell which element has failed.

In general, this VocPub profile basically says you've got to have one vocabulary per concept scheme in SKOS. You can have as many concepts as you like, though you've got to have at least one, so one or more. You can have zero or more collections. If you do have any concepts and collections, they must have certain properties like titles and definitions, very basic stuff. The vocabulary must have certain metadata, created dates etc., and the vocabulary must indicate the top-level concepts. All concepts must be within the vocabulary or within some other vocabulary, and if they're in another vocabulary but reused in this one, they must indicate that. So there are a few rules about that.

The net result is that vocabularies that are valid according to this system end up being files, well, they don't have to be files, but they end up looking something like this. Here is the Address Types one you saw before, and what does it say? It says there's a SKOS concept scheme with a prefLabel, a publisher indicated, a created date etc., a definition and top concepts, and then those top concepts appear somewhere down here in the vocabulary schedule. This vocabulary has also got a few collections, so there's the SKOS collection for some purpose. What's this purpose? "All concepts collection", there you go, so it's a flat listing of all the concepts in this vocabulary.

Anyway, it's a very simple set of requirements to match, but if a vocabulary does match
these requirements then it can be displayed quite happily in a system like VocPrez. Also, this really ensures that each vocabulary is essentially managed on its own: they may be cross-linked to other vocabularies, but each vocabulary is really managed as a file in some kind of file system, rather than 20 being in one.

So the validation script, once again, picks up that validator, which it downloads off the internet. The reason it does that is that if I improve the spelling and the requirements in that validator, then whenever this system runs it will pick up the latest copy. Then again it runs against every vocab in the system. If all vocabularies pass, then this request to insert, update or delete a vocabulary is deemed to be valid, technically valid. Whether it's valid in terms of the substance of the material being contributed is not for the script to determine; that's for the vocabulary manager to determine. But nevertheless it would be technically valid if it all passes, and the vocabulary manager would be presented with an interface that says there's this request to change a vocabulary in some way and it's technically valid, and then they can decide to reject or accept that change.

If they accept the change, then the next script is run, this "update vocabs" script. It's a few lines of Python code. It takes those vocabularies, or the changes to vocabularies, and posts them off to a database, a triple store. This uses a particular triple store that we use and that Geoscience Australia uses, but in fact this script could easily be adapted to pretty much any of the standard triple stores out there in, you know, 20 minutes. It's a very basic "take a local RDF file, post it to a remote triple store", a very easy script.

So let's have a look at the log of actions that have occurred for ICSM. We'll go back. The validation attempts and so on are all listed under that repository's log of events, and if
there's a very loud bird here, sorry. We can see over time that there are some of these actions which have got a big red cross next to them, and then some that have got green ticks. This one here, Place Name Categories, was an addition of that Place Name Categories vocabulary. I made it, I put it in here, it was valid, and then you can see the next action here: merged. So it was proposed and it was merged. There have been some updates, "add place name categories" etc., and then there have been some rejections. I did a test addition of a broken vocabulary and it was invalid, so we got this red cross here; that was just in a demonstration a few days ago. Shortly after, I committed the actual correct vocabulary, and that was merged in.

So this system of having a target display system like VocPrez, which is quite dumb, and then some kind of repository that contains the contents in that system, has been replicated about four times. There's ICSM; GSQ, which is not fully automated but hopes to be soon; and there are several other servers at Geoscience Australia, for international and their own vocabularies and so on, that use a similar system. From GA's point of view, they're going to have four repositories and four VocPrez servers that all operate in exactly the same way. The vocabularies are stored in different places and have different management or governance regimes over them, but technically they operate in the same way: someone maintains the data store of vocabularies here, and they've got all of these scripts to assist them in doing that. If things run correctly and are valid, then the results appear fairly instantaneously in the target system, and the person managing the vocabulary content can always come in here and just delete a vocabulary or directly manage it, and again the changes will occur here. We can stop this system instantly picking up the vocabularies from here if we like; we just turn off the scripts here that
actually push content or push changes.

So it should run. These vocabularies are not expected to be updated every five minutes; we expect new vocabularies every now and then. So far, seemingly, on most actions changes are made, they're put through the system and it just runs through. We'll run this for a couple more months before fully handing over to Geoscience Australia, and as far as we can tell the staff there don't really have any problems running these things. GSQ is about the same: they've been running, as I say, a not-so-automated process, but they are looking forward to more automation.

Back to what I said at the beginning: this kind of process is really only sensible for organisations that would be able to look inside a vocabulary file, let's just choose another one, FSDF Themes, and know enough about the SKOS format and so on to realise when there's a syntactic or some other kind of error in this vocab file and actually fix it. So if someone committed a changed version of this vocabulary and they'd missed this comma, the vocabulary validator would say this is invalid. The vocabulary manager would have to know enough about the vocabulary to look at that message and realise, "Oh, I see why it's invalid: there's a syntactic error in the RDF file," and they would be able to fix that and recommit the file, and that would be pushed through. So it's as simple as that.

We think that the VocPrez system here looks quite nice. It looks like ICSM's website, and the way it presents the vocabularies is probably as good as it gets for these kinds of fairly simple vocabularies. You can look at individual terms. Oh, that's not a good one, let's choose another term. Nope, we've got some issues there. All right, let's try another vocabulary. Thank you, dev systems. Let's try Address Types. No, we've got errors all around. Okay, something we looked at there... well, we'll try the GSQ one. I think GSQ's address type is, yeah, it's a very different address type, it's
for postage, but you know, we've got a very simple vocabulary presentation going on here, so there's no real controversy about how this is being presented, and the management at that file level, per vocabulary, is about as simple as it gets.

These systems are aggregating all the vocabularies, and you can do things like cross-query all the content of all the vocabularies in place. So even though the primary storage of the vocabularies, the point of truth, is some kind of flat file listing like you see here, the systems present all of the vocabulary content in such a way that it can be cross-queried. Coming up shortly, in the next month or two, I hope to make a web page that allows you to query not just the vocabs in one vocab repository but in fact all of these independent VocPrez systems, such as this GSQ one, the ICSM one if it works, the GA one and so on.
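
The validation behaviour described in this talk can be sketched roughly as follows. This is a plain-Python stand-in, not the actual SHACL validator or the repository's own script: vocabularies here are hand-written lists of triples rather than RDF files, and every name (`validate_vocab`, `validate_all`, the `ex:` identifiers, the file names, the sample date) is made up for illustration. It only shows the shape of the idea: check every vocabulary against the same few rules on every change, and refuse the merge if any of them fails.

```python
from dataclasses import dataclass

# Assumption: simplified string identifiers instead of real RDF IRIs.
RDF_TYPE = "rdf:type"
DCT_MODIFIED = "dcterms:modified"
SKOS_PREF_LABEL = "skos:prefLabel"
# The three forms of date mentioned in the talk.
ALLOWED_DATE_TYPES = {"xsd:date", "xsd:dateTime", "xsd:dateTimeStamp"}


@dataclass(frozen=True)
class Literal:
    """A literal value with a datatype (hypothetical helper, not an RDF library class)."""
    value: str
    datatype: str = "xsd:string"


def subjects_of_type(triples, rdf_class):
    """Return the sorted subjects declared to be of the given class."""
    return sorted({s for s, p, o in triples if p == RDF_TYPE and o == rdf_class})


def values(triples, subject, predicate):
    """Return all objects for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def validate_vocab(triples):
    """Return a list of rule violations; an empty list means the vocabulary passes."""
    errors = []
    schemes = subjects_of_type(triples, "skos:ConceptScheme")
    if len(schemes) != 1:
        errors.append(f"expected exactly one concept scheme, found {len(schemes)}")
    for scheme in schemes:
        modified = values(triples, scheme, DCT_MODIFIED)
        if len(modified) != 1:
            errors.append(f"{scheme}: must have one and only one modified date, found {len(modified)}")
        for m in modified:
            if not isinstance(m, Literal) or m.datatype not in ALLOWED_DATE_TYPES:
                errors.append(f"{scheme}: modified date must be a date, dateTime or dateTimeStamp")
    concepts = subjects_of_type(triples, "skos:Concept")
    if not concepts:
        errors.append("a vocabulary must contain at least one concept")
    for concept in concepts:
        labels = values(triples, concept, SKOS_PREF_LABEL)
        if len(labels) != 1:
            errors.append(f"{concept}: must have one and only one prefLabel, found {len(labels)}")
    return errors


def validate_all(vocabs):
    """Validate every vocabulary and return only the ones that failed."""
    results = {name: validate_vocab(triples) for name, triples in vocabs.items()}
    return {name: errs for name, errs in results.items() if errs}


# Made-up sample data: one vocabulary that passes and one that is broken.
GOOD_VOCAB = [
    ("ex:vocab-x", RDF_TYPE, "skos:ConceptScheme"),
    ("ex:vocab-x", DCT_MODIFIED, Literal("2020-01-01", "xsd:date")),
    ("ex:rural", RDF_TYPE, "skos:Concept"),
    ("ex:rural", SKOS_PREF_LABEL, Literal("Rural")),
]
BROKEN_VOCAB = [
    ("ex:broken", RDF_TYPE, "skos:ConceptScheme"),  # no modified date
    ("ex:thing", RDF_TYPE, "skos:Concept"),
    ("ex:thing", SKOS_PREF_LABEL, Literal("Thing")),
    ("ex:thing", SKOS_PREF_LABEL, Literal("Item")),  # second prefLabel
]

failures = validate_all({"vocab-x.ttl": GOOD_VOCAB, "broken.ttl": BROKEN_VOCAB})
for name, errs in failures.items():
    for err in errs:
        print(f"INVALID {name}: {err}")

# A CI job would exit with this code; non-zero is what blocks the merge.
exit_code = 1 if failures else 0
print("exit code:", exit_code)
```

In the real setup the rules live in the SHACL validator file that the script downloads each time it runs, so improving the rules does not require touching the script. The sketch hard-codes three of them only to keep the example self-contained.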