Today I'll discuss an activity that I undertook in 2018 to extend one vocabulary by incorporating another: an extension of the ANZSRC classification by linking it with the GCMD Earth Science keywords. The purpose was to illustrate how a browse path through one vocabulary may be extended by incorporating another. And the reason that I mention 2018 is to ground the activity in that context, as I was using the tools that I was aware of at that time.

These are the areas that I'll cover. Firstly, an introduction to the vocabularies and how ANZSRC provides a browse path through the ARDC metadata portal. Then the approach that I took, focusing on the basis for making linkage decisions, the tools that were used and how to access the outputs. And finally, some notes on limitations and caveats associated with this activity.

The ANZSRC Fields of Research classifies 22 areas of research undertaken in Australia and New Zealand, a broad classification covering many research domains. In contrast, GCMD focuses upon earth sciences and is more specific in its coverage of that one domain than ANZSRC is of its 22 domains. GCMD comprises at least 14 component sets of keywords. For this exercise, I used one of these keyword sets, the GCMD Earth Science keywords. And I chose the Earth Science keyword set as it happens to be used by some of the providers of records to the ARDC metadata portal.

Research Data Australia is a collection-level metadata portal. RDA's scope is broad and may include any area of research. Most of the submitted collection records are tagged with at least one ANZSRC subject. And so RDA uses ANZSRC to provide subject access to the metadata records describing the data collections. ANZSRC provides a browse path through RDA. By clicking on a top-level ANZSRC division, I am shown the subject pathway within that division. And at each level of the path, the display of metadata records refreshes to display those associated with the selected subject term.
Clicking on Earth Sciences refreshes the display, showing records tagged with Earth Sciences and with narrower terms of Earth Sciences. And if I click on a narrower term such as Atmospheric Sciences, the display refreshes again. And so we can see how the structure of ANZSRC is implemented within RDA to provide a point-and-click browse path through the metadata records.

However, although I may browse through the ANZSRC coverage of earth sciences, there are no links between ANZSRC and GCMD. Although GCMD provides more specific coverage of earth sciences, I cannot, for example, browse from a broader ANZSRC term into a more specific GCMD term.

These are the areas that I'll cover next. Firstly, an illustration of the difference between the treatment of earth sciences within ANZSRC and within GCMD. Then an overview of the approach that I took to making links, how I decided which links to make, and then a discussion of tools and outputs.

Both of the vocabularies are hosted by the ARDC Vocabulary Service. They both appear in the ARDC Vocabulary Publishing Portal. Each vocabulary in the portal has a landing page. This is the landing page for ANZSRC, including a description and access points. A landing page also includes a browsable tree visualization, a point-and-click approach to navigating the vocabulary. And in the next slides, I'll be talking about some differences in each vocabulary's treatment of earth sciences. And I'll be using screenshots from the tree visualization for ANZSRC and GCMD, respectively.

GCMD has 14 subdivisions of earth sciences, while ANZSRC has seven. At first glance, does this difference in the number of subdivisions itself indicate that GCMD may cover more discrete earth science subject areas than ANZSRC? Maybe, or maybe not. It could be the case that ANZSRC covers all of the 14 GCMD areas and that they are simply organized differently. By scanning these subdivisions, I may start to see where some areas of GCMD may relate to ANZSRC.
Is there a relationship between Atmosphere in GCMD and Atmospheric Sciences in ANZSRC, between Solid Earth and Geology, between Oceans and Oceanography? And what about Agriculture and Biological Classification? They sit within earth sciences on the GCMD side. Where are they within ANZSRC?

These vocabularies are organized quite differently to each other. There are a number of areas within the GCMD earth sciences that are not found within the ANZSRC division of Earth Sciences. And they are perhaps more at home in other divisions of ANZSRC. For example, Agricultural and Veterinary Sciences, Biological Sciences, Studies in Human Society, Environmental Sciences, Medical and Health Sciences, and Engineering. So there is quite a range of areas within GCMD earth sciences that are not included in the ANZSRC coverage of earth sciences.

So how did I decide which part of GCMD to link to which part of ANZSRC? And on what basis was I making these connections? I made selections based on labels, definitions and context. And here I mean context in terms of a concept's location within a vocabulary and its relationship to other concepts within that vocabulary. Where is the concept? What are its siblings? What are its broader or narrower relationships? And for the most part, I didn't look at context that might otherwise be provided by external sources. For example, I didn't undertake an analysis of metadata records or full-text documents that have been tagged with the concepts, where such documents may be judged to reflect the meaning of a concept.

If I found a label identical in both GCMD and ANZSRC, then that provided the starting point for examining whether the concepts denoted by the label may share a common meaning. If the labels are the same, can something be said about the possibility of some level of similarity of meaning? For example, does the GCMD concept for mortality relate to that of the ANZSRC concept? Or, if a label was similar in both, how much is animal breeding like animal breeding and genetics?
How much is agriculture like agriculture, land and farm management?

Concept definitions and scope notes can also be used in considering relationships between concepts from separate vocabularies. By examining the definitions, I may try to understand the degree of similarity of meaning. And by examining scope notes, I may try to understand whether the way the concept is used in one vocabulary is the same as the way it's used in another. Here is an example of a definition in GCMD, helpfully indicating that food science relates to products for human consumption, and so is not to be confused with food products, which appears just above food science in the tree. GCMD contains concept definitions, but ANZSRC does not. So I wasn't able to compare a GCMD definition to an ANZSRC one. But in the absence of definitions, I could at least examine context.

On context, I could look at the placement of the concept within GCMD. Where does it fit and what is around it? In GCMD, food science is within an agricultural context. And in ANZSRC, the same concept is within an engineering context. If I examine the narrower terms of food science in both vocabularies, I see some commonality. For example, food packaging and food processing appear in both.

And in working out these links, I tried to put myself in the position of someone using the browse tree, if, for example, they were navigating a metadata portal. I'd ask myself questions like: if I were to merge a term from GCMD with one from ANZSRC, and in so doing combine what I've determined to be associated subject areas, would the resulting aggregation of metadata records be helpful to the user?

And as I don't have a background in earth sciences, I found it helpful to read some descriptions of the domain. What is it? What are its subdomains? How do they relate to each other? And this helped to give me some level of orientation independent of the domain representations within ANZSRC and GCMD. So that's how I went about considering which links to make.
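The label and context checks described above were made by eye in the talk, but the idea can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the function names are hypothetical, the term lists are made up rather than copied from either vocabulary, and the Jaccard overlap score is a stand-in for a human judgement of "some commonality".

```python
def normalise(label):
    """Lower-case a label and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(label.lower().split())


def shared_narrower(narrower_a, narrower_b):
    """Return the normalised labels that appear under both concepts."""
    return {normalise(a) for a in narrower_a} & {normalise(b) for b in narrower_b}


def overlap_ratio(narrower_a, narrower_b):
    """Jaccard overlap of the two narrower-term sets (0.0 = nothing shared, 1.0 = identical)."""
    a = {normalise(x) for x in narrower_a}
    b = {normalise(x) for x in narrower_b}
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical narrower terms of "food science" in each vocabulary (illustrative only).
gcmd_food_science = ["Food Packaging", "Food Processing", "Food Storage"]
anzsrc_food_science = ["food processing", "Food  Packaging", "Food Engineering"]

common = shared_narrower(gcmd_food_science, anzsrc_food_science)
ratio = overlap_ratio(gcmd_food_science, anzsrc_food_science)
print(sorted(common), ratio)
```

A shared label is only a starting point for examining meaning, as the talk stresses, so a score like this could at most suggest candidate pairs for a person to review.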
I looked at labels, definitions, context and domain descriptions. And I used tools to express the relationships and to produce a browsable output.

Although I tried a couple of tools aimed at automating the application of concept relationships, they had only limited success. Maybe it would have helped if ANZSRC included definitions and alternate labels, providing more semantic hooks for the tools to work with. It may perhaps also have helped to have a set of gold-standard metadata records representative of each concept to feed into the analysis tools. But I didn't have ANZSRC concept definitions and I didn't wish to create sets of metadata records for each concept. And so I instead selected a tool that offers a semi-automated approach to expressing relationships and which outputs an RDF link set. And on the processing side, I used a Python library to provide a straightforward approach to working with RDF to produce a navigable SKOS file.

The alignment tool suggests mappings, which may be accepted or rejected. Having both vocabularies on screen helped me to navigate, looking at labels, definitions, and context. And the tool provided a straightforward approach to expressing relationships between concepts. At the bottom of the screen, I could scroll through previously expressed links. And the tool outputs an RDF representation of the mappings that have been expressed, a link set. The link set is a separate file that expresses relationships between concepts.

And the main relationships that I used in this exercise were close match, broad match, narrow match, and exact match. For example, for this exercise, to support an aggregation of possibly disconnected but semantically related metadata records, GCMD Atmosphere is close enough to ANZSRC Atmospheric Sciences. The narrower concepts of GCMD Atmosphere become narrower concepts of the ANZSRC concept Atmospheric Sciences.
In this case, if the structure of a linked ANZSRC and GCMD were implemented as a browse tree within a metadata portal, then metadata records associated with either the GCMD Atmosphere or the ANZSRC Atmospheric Sciences concepts would be gathered together for the user.

With broad match, the GCMD concept becomes a narrower concept of the ANZSRC concept. For example, GCMD Biological Classification becomes a narrower concept of ANZSRC Biological Sciences and itself provides a whole lot of more specific areas for browsing.

And there was one unusual case. Most of the mappable GCMD concepts could either be merged with an ANZSRC equivalent or become a narrower concept. But there's no close or exact equivalent of the concept Solid Earth in ANZSRC. From reading third-party domain descriptions, Solid Earth should appear above Geology in the hierarchy. And so in this case, the GCMD concept became broader than an ANZSRC concept: ANZSRC Geology became a narrower concept of GCMD Solid Earth.

And I then defined some rules to use in the processing step. If a concept has a broad match relationship to another concept, then do this. But if a concept has a narrow match relationship, then do something else. The processing step takes the link set, along with the two vocabularies and the rules that I defined for each relationship, and produces a navigable SKOS RDF file. The library that I used is called RDFLib, and it provides a straightforward approach to working with RDF to create a graph that contains resource descriptions that express SKOS concepts.

If you'd like to explore the combined output, the browse tree is available on the demo vocabulary publishing portal and the link set file is available for download. I think that the resulting output would be suitable for the purpose of improving the gathering of related records to assist browsing in a metadata portal. But there are some limitations and caveats, and these are the ones that I thought of along the way.
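The rule-based processing step described above can be pictured with a minimal Python sketch. It is not the script used in the exercise: the talk used RDFLib over RDF files, whereas this sketch keeps to plain tuples so that it runs without dependencies. The concept identifiers, the `apply_rules` function and the exact behaviour of each rule are assumptions inferred from the three examples given in the talk.

```python
# SKOS property names as plain strings (a real script would use RDF URIs).
BROADER = "skos:broader"
CLOSE, EXACT = "skos:closeMatch", "skos:exactMatch"
BROAD, NARROW = "skos:broadMatch", "skos:narrowMatch"


def apply_rules(hierarchy, linkset):
    """Combine two hierarchies into one browse tree.

    hierarchy: set of (child, BROADER, parent) triples from both vocabularies.
    linkset:   iterable of (gcmd_concept, mapping_relation, anzsrc_concept).
    """
    merged = set(hierarchy)
    for gcmd, relation, anzsrc in linkset:
        if relation in (CLOSE, EXACT):
            # Merge rule: fold the GCMD concept into its ANZSRC counterpart.
            for child, _, parent in hierarchy:
                if parent == gcmd:
                    # Children of the GCMD concept are re-parented.
                    merged.discard((child, BROADER, gcmd))
                    merged.add((child, BROADER, anzsrc))
                elif child == gcmd:
                    # The GCMD concept's own place in its tree is dropped.
                    merged.discard((child, BROADER, parent))
        elif relation == BROAD:
            # The GCMD concept becomes a narrower concept of the ANZSRC one.
            merged.add((gcmd, BROADER, anzsrc))
        elif relation == NARROW:
            # The ANZSRC concept becomes a narrower concept of the GCMD one.
            merged.add((anzsrc, BROADER, gcmd))
    return merged


# Hypothetical identifiers, loosely based on the examples in the talk.
hierarchy = {
    ("gcmd:Aerosols", BROADER, "gcmd:Atmosphere"),
    ("gcmd:Atmosphere", BROADER, "gcmd:EarthScience"),
    ("anzsrc:AtmosphericSciences", BROADER, "anzsrc:EarthSciences"),
    ("anzsrc:Geology", BROADER, "anzsrc:EarthSciences"),
}
linkset = [
    ("gcmd:Atmosphere", CLOSE, "anzsrc:AtmosphericSciences"),
    ("gcmd:BiologicalClassification", BROAD, "anzsrc:BiologicalSciences"),
    ("gcmd:SolidEarth", NARROW, "anzsrc:Geology"),
]

tree = apply_rules(hierarchy, linkset)
for triple in sorted(tree):
    print(triple)
```

The sketch leaves open questions that a real run would have to settle, such as where a concept like Solid Earth attaches once it sits above an ANZSRC concept, and what happens when a re-parented concept is itself the subject of another link.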
The resulting browsable SKOS file is an output of an exercise. It isn't a published authoritative artifact and I haven't implemented it within a metadata portal. So although I may browse through the file itself, I can't check how it may work in presenting an aggregation of metadata records.

The primary browse path remains ANZSRC, with the GCMD concepts and sub-trees fitting within the ANZSRC structure. And this is not the same as browsing GCMD alone. I'm always browsing an extended version of ANZSRC.

If you're at all familiar with ANZSRC, there's the problem of the "not elsewhere classified" tags that appear throughout it. And I wonder, are these tags used? And if they are used, what grab bag of subject matter might be covered? These records have been classified, but not in a way that will be of much assistance to the user.

And this relates to the next problem: the problem of records classified at a higher level in ANZSRC because the more specific GCMD classifications would not have been available at the time of record creation. People are classifying using ANZSRC and have no access to GCMD.

And of course, there's the ongoing maintenance. And as well, this was one approach, but there are other upper-level ontologies that could replace or complement ANZSRC, and into which vocabularies such as GCMD and others might slot.

Here are some links to the alignment tool, the resulting link set, the RDF library and the browsable version of the end result. And that concludes the presentation. Thanks. Back to you, Simon.