We can see your screen. See a slide. Yep. We see slides. Very good.

Thanks so much, Megan. My name's Liz. I'm presenting from Wurundjeri country. Always was, always will be Wurundjeri country. I'm talking about a taxonomy, or vocabulary, project that I led not too many years ago now at the Analysis and Policy Observatory. I barely have time to explain what apo.org.au is, but do visit it sometime. It's a digital library of policy information, and it's got a great search interface. I'm going to be talking about a vocabulary project there as a case study, and then I'm going to offer some reflections about things like the tools, governance and techniques that we encountered and used during the process.

Just quickly, the short answer to the question of how the public policy taxonomy was made is really in that photograph there. That was the team when I was there. And a quick reflection is that vocabularies and vocabulary projects really do impact every corner of an organization. You can probably imagine how vocabs, and changes to vocabs, impact the website development area. It's also editors and catalogers, the people who create the value in these digital libraries. It's also leadership. I think Amanda might even be here today. You really need to get leaders involved in defining the scope and the purpose of the vocabulary. It's important to have that sort of buy-in. But even marketing as well: we'll see that the vocabularies at APO were used to organize marketing products. So everyone was impacted and everyone helped me with this project, and it was a wonderful, wonderful experience.

The reason that we were doing the project was a project called the Linked Semantic Platforms, which was an ARC-funded project involving a few agencies. One of the objectives of that project was to make APO taxonomies shareable. In fact, I think it was actually taxonomy in the singular.
But we'll see how interesting that situation was about the plurality of the vocabularies.

I'm going to dive straight into this slide. This is not a screenshot. It's actually a mock-up that I've created, but it is a true representation of something I encountered when I first started working at APO. So imagine that you're looking at some metadata, a description of an article. You like the look of what you see and you want to explore and find some other things that are similar. On the right-hand side, you've got some links. But of course, you find that you've got the word economics three times under different headings. When I encountered this situation, my first thought was, well, there are some significant usability issues here, no matter what's going on behind these fields. But the other thing that occurred to me was, I'm sure there's an interesting story behind each of these fields. And part of my job was to uncover and document that story and to get requirements out of that investigation. It did remind me slightly of the Monty Hall problem, but that's a digression.

So excuse me, we skipped ahead. First of all, subject. What did subject mean? Well, it was a subset of the Faceted Application of Subject Terminology, or FAST. It was really there in the APO database because it had been adopted after a metadata harvest. Luckily, the FAST IDs had been retained, and that's an important factoid for this presentation. Also, APO had a seat in the FAST working group, so APO was contributing directly to the development of FAST.

Secondly, broad subjects. Now, the broad subjects was a short list, 15 or actually 13 topics at the time. It was really there to address a system limitation where it was difficult to pull out a top level from the subjects, I think mainly because the subjects was itself a subset of another vocabulary.
I noticed that some alignment work had been done to align the broad subjects with the Fields of Research, which is an Australian research classification system, and I had to continue some of that alignment work as well. The broad subjects were very important in driving marketing, such as newsletters, certain parts of the website, the generation of icons and things like this.

And last, keyword. So what was the keyword? Essentially, it was a term that was not in FAST, but we needed the term. That's not entirely true: some of the terms that were in the keywords were also in FAST. But they were terms that the editors came up with, so the people doing the cataloging came up with these terms and they stored them in the database as well. I thought it was quite interesting that the keywords were referred to internally as an uncontrolled vocabulary. So the FAST terms were regarded as controlled and the keywords were regarded as uncontrolled. I took the editors on a bit of a journey there, exploring the idea that maybe there were different types of control being applied. There were certainly rules involved in how the editors decided what was a keyword and what wasn't. And I led the move to document those rules and to refine them as well, taking the team away from this idea that there was a controlled and an uncontrolled vocabulary.

So to make a taxonomy shareable, there were a few things I needed to do, and I've got this in five steps. I just want to quickly mention that when I created this slide, I asked Microsoft PowerPoint to come up with some icons and it came up with the dinosaur for the third step. I can only imagine that's because of the Thesaurus Rex cartoon that probably many of you are familiar with. I thought I'd leave the dinosaur in because I thought it was funny. I'll take you through these one at a time. First of all, improve the keywords taxonomy.
The reason I wanted to start with the keywords is that this is basically where the value was that APO had created. APO created this word stock by keeping these keywords, and this was over several years. So this was APO's intellectual property, if you like. The only problem with the keywords was that there was a very long tail of low-use terms, so many terms that had only been used once or twice. That's not surprising, because synonyms were not being controlled and hierarchy was not being established. And the tools for looking up the terms in the database were a bit clunky, so it was very difficult to keep track of what had been added and where there were duplications, whether in labels or in meaning. So we worked with the data scientists at Swinburne, including Yongbin Kang, a very clever guy who used something called a subsumption method. This was an artificial intelligence method to identify co-occurrence between terms. We had the output checked by APO editors, so there was manual checking of the artificial intelligence and training of the AI. This helped me to clean up the tail. It helped me to find many synonyms and to assert those synonym relationships.

Wrong way. Sorry. Secondly, once that cleaning up of the long tail was done, it was time to merge the vocabularies. There was a little bit of hesitation around the team, I have to say, about this idea of merging the so-called subjects with the keywords. I think I mitigated those concerns by saying we're keeping the FAST IDs on the terms that had FAST IDs, so we'll always know that these were taken from FAST. I also brought in some more metadata from FAST, like some of the alternative labels that were stored there and hadn't been imported before, because we were starting to store synonyms for the first time at APO.
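As a rough illustration of the co-occurrence idea behind a subsumption method, here is a minimal Python sketch. The talk does not give the details of the method used at Swinburne, so this should not be read as that method. It follows one common formulation, in which term x is a candidate broader term for y when x appears on most records tagged with y, but not the other way round. The function name, the 0.8 threshold and the sample records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def subsumption_pairs(records, threshold=0.8):
    """Return candidate (broader, narrower) pairs from keyword co-occurrence.

    x is proposed as broader than y when P(x | y) >= threshold and
    P(y | x) < 1. The threshold value is an illustrative assumption.
    """
    term_count = Counter()   # number of records each term appears on
    pair_count = Counter()   # number of records each pair shares
    for terms in records:
        unique = sorted(set(terms))
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))

    candidates = []
    for (a, b), both in pair_count.items():
        # Test the pair in both directions.
        for x, y in ((a, b), (b, a)):
            p_x_given_y = both / term_count[y]
            p_y_given_x = both / term_count[x]
            if p_x_given_y >= threshold and p_y_given_x < 1.0:
                candidates.append((x, y))
    return sorted(candidates)


# Made-up keyword assignments, one list per catalogued item.
records = [
    ["economics", "inflation"],
    ["economics", "inflation"],
    ["economics", "housing"],
    ["economics"],
    ["housing", "homelessness"],
]

for broader, narrower in subsumption_pairs(records):
    print(f"{broader} > {narrower}")
```

Pairs produced this way are only suggestions. As in the project described here, they would still need checking by editors before any hierarchy or synonym relationship is asserted.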
And once that merge had been completed, I started applying more thesaurus conventions: making sure that there was a full hierarchy, adding associations, and cleaning up orphans, which are terms that had no relationships. There were more synonyms from this process, which was great. We implemented those in the search engine. There was more updating of catalogs, cataloging forms, guides and so on.

I'm going to go through this one a little bit more slowly if I've got time. To enrich the one taxonomy that we had now, I used the term analysis to find where we had high-use terms. We targeted those terms to make more matches with FAST, and we used a SKOS exactMatch relationship to those FAST terms, which we were able to do because we had retained the IDs.

There's also another vocabulary that was relevant to this project: APAIS, or the Australian Public Affairs Information Service thesaurus. It's a vocabulary that is really only officially published, if you like, in print format. It has not been maintained for some time. But one of our partners in the project, the Australian Data Archive, which you may have heard of, was using APAIS in its database. So it made sense to us to do some mapping between our taxonomy and APAIS. This was an interesting story, because I wanted a copy of APAIS to import into the Research Vocabularies Australia editor, and I had a bit of trouble finding a machine-readable copy. I did approach the National Library of Australia, who, from what I could work out, owned APAIS. They told me that I should go and talk to RMIT Training because they were managing APAIS now. I said, okay, fine. So I went to RMIT Training and they told me I'd better go back to the National Library of Australia to get a copy. So I ended up using an unofficial version that I found on the web. I pulled that into the Research Vocabularies Australia editor and I used that to do the mapping.
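To picture what an exact-match link from a local term to FAST looks like once the FAST IDs have been retained, here is a small Python sketch that writes the links as N-Triples lines. The predicate is the real `skos:exactMatch` property. Everything else is an assumption for illustration: the local base URI is hypothetical (the talk notes later that proper URIs were never established), the FAST URI pattern is assumed, and the IDs in the sample data are placeholders rather than real FAST identifiers.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
FAST_BASE = "http://id.worldcat.org/fast/"     # assumed FAST URI pattern
LOCAL_BASE = "https://example.org/taxonomy/"   # hypothetical local namespace


def exact_match_triples(terms):
    """Build one N-Triples line per term that has a retained FAST ID."""
    lines = []
    for term in terms:
        fast_id = term.get("fast_id")
        if not fast_id:
            continue  # former keywords with no FAST ID get no link
        subject = LOCAL_BASE + term["slug"]
        lines.append(f"<{subject}> <{SKOS_EXACT_MATCH}> <{FAST_BASE}{fast_id}> .")
    return lines


# Placeholder data: the FAST ID below is invented, not a real identifier.
terms = [
    {"slug": "economics", "fast_id": "0000001"},
    {"slug": "wellbeing-economy", "fast_id": None},
]

for line in exact_match_triples(terms):
    print(line)
```

Keeping the identifier on each term is what makes this step mechanical: the link is built from stored data rather than re-discovered by comparing labels.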
I'll show you a bit more about how the mapping was done. We also matched terms from our vocab with AGIFT, the Australian Governments' Interactive Functions Thesaurus, which made sense to APO because probably half of our audience was government, with the other half being a research or academic audience. AGIFT was at the time hosted by the National Archives. I understand there's been a bit of movement as to where AGIFT sits and how it's being managed, so some of those links have probably broken now.

We also matched with the Fields of Research codes, the research classifications that we use in Australia. So instead of just having the top level of the terms aligned with the Fields of Research, I actually made machine-readable links to those research codes for all the high-use terms in the thesaurus. All this work resulted in more synonyms, which was wonderful and really enriched the thesaurus structure, so there was lots of synonym control. I had pulled a couple of these external vocabularies into the Research Vocabularies Australia editor, and it has a nice batch linking feature, which means that it can automatically detect where there are matches between vocabularies, and it makes that detection using alternative labels as well as preferred labels. So that's a really neat little feature there.

Step number five, publish. The public policy taxonomy is published at Research Vocabularies Australia in RDF and all the formats that you would expect to find, I suppose. We set up a bit of an interesting situation where the thesaurus, or taxonomy, is managed there in PoolParty. It pushes a version back to Drupal, and when Drupal creates a new taxonomy page, the token that it creates is then copied back into Research Vocabularies Australia, which is a slightly messy manual step there at the end, but it works. The taxonomy is still being maintained, which is really good. And it's available under a slightly restrictive Creative Commons license.
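The batch linking mentioned above, where matches between two vocabularies are detected from alternative as well as preferred labels, can be sketched in a few lines of Python. This is only an illustration of the general technique, not the editor's actual implementation, which the talk does not describe. The data layout (`id`, `pref`, `alt`), the function names and the sample concepts are assumptions.

```python
def normalise(label):
    """Fold case and collapse whitespace so trivially different labels match."""
    return " ".join(label.casefold().split())


def all_labels(concept):
    """Preferred label followed by any alternative labels."""
    return [concept["pref"], *concept.get("alt", [])]


def candidate_matches(source, target):
    """Return sorted (source_id, target_id) pairs that share any label."""
    # Index every label of the target vocabulary by its normalised form.
    index = {}
    for concept in target:
        for label in all_labels(concept):
            index.setdefault(normalise(label), set()).add(concept["id"])

    matches = set()
    for concept in source:
        for label in all_labels(concept):
            for target_id in index.get(normalise(label), ()):
                matches.add((concept["id"], target_id))
    return sorted(matches)


# Made-up concepts from two vocabularies.
local_vocab = [{"id": "local:1", "pref": "Aged care", "alt": ["Elder care"]}]
external_vocab = [
    {"id": "ext:9", "pref": "Elder care"},
    {"id": "ext:10", "pref": "Child care"},
]

print(candidate_matches(local_vocab, external_vocab))
```

Matching on alternative labels is what finds the pair here, since the two preferred labels differ. Pairs found by label matching alone are candidates for review rather than confirmed mappings.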
Just some quick reflections. I found it difficult to establish a community of practice for feedback into the public policy taxonomy. Policy is certainly a very multi-domain area. It's not the whole of knowledge, but it's an awful lot of things. It's anything that could be governed, I suppose. And while we had some special interest groups, for example, we had a project on decolonizing the taxonomy, we never really established a good ongoing source of feedback from the policy community. It's probably something that I would like to have done if I'd stayed in the project longer.

15 minutes, Liz, 15 minutes.

Rules of engagement were unclear, as I think I've indicated, with things like approaching the National Archives or the National Library or whoever about things like APAIS and AGIFT, and trying to understand: could we, should we work with these vocabularies? How should we? What are the rules? They seemed to be a little unclear at times.

We never actually established proper URIs for the public policy taxonomy. We only used the ones that were generated by the software there. I was advised, thank you, Rowan, by you, that I should approach the Australian Government Linked Data Working Group to get that sorted out. I never got that far, although I will say that at the time it seemed an odd idea to approach a working group about something that would seem to be a service. I've already learned today that there's been some movement in this area, so I'm really interested to find out what happens with providing a URI service to people working in that environment.

And also, yes, consuming the vocabulary back into Drupal: we weren't doing that using any sort of open data approach.
We had an application-to-application connection, which is a little bit of a non-standard thing to do. I suppose in a sense it's not great, because if I was ever asked how to integrate this with a system, I didn't have a very good example to point to. Okay. Thanks, Rowan. I'm finishing.