Vocabularies. How do they help research? How do you find them? If one doesn't exist, how do you build a vocabulary from scratch? And how can the ARDC Vocabulary Service help?

First and foremost, how do vocabularies help research? Here are two researchers discussing research data that was created by a third party. They don't agree on the meaning of the data. Is it a number? Is it the number nine? Is it the number six? Is it neither? Is it not a number but something else? Is it the letter G? What is it? If only the original creator of the research data had annotated it with vocabulary concepts, the data would express its own meaning and communicate what it is. The researchers would then understand what it is and could discuss it on the basis of that shared understanding, or at the very least they would share an understanding of what the creator of the data believed it to be. If I want to find data, analyse it, or aggregate it with other data, my task is made more difficult if I can't be sure what the data is and what it means.

Vocabularies help research by providing a way of formalising meaning. These formalisations are published and can then be used and reused to annotate data. Data annotated with meaning helps discovery, analysis, integration and reuse.

Here's a classic research data semantics problem, one that is addressed by using a vocabulary. In this example, the same label has different meanings: the label alone does not sufficiently convey meaning. In Victoria and Western Australia there's a beer measure called a pot. Same label, but a different measurement: 285 mL in Victoria and 575 mL in Western Australia. If I come from Western Australia, visit Victoria and ask for a pot of beer expecting a WA measure, I'll get a smaller beer than expected. No great loss. But what if I'm working in a project team where precision in measurement is very important?
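The arithmetic behind the pot problem is worth spelling out. The following is a hypothetical Python illustration that uses only the two volumes quoted above: a lookup keyed by label alone silently keeps just one meaning of "pot", and misreading a record of 20 pots gives a large error. The variable names and the 20-glass record are assumptions made for illustration.

```python
# Hypothetical illustration: data keyed by label alone.
# Volumes (mL) are the figures quoted in the talk.
pot_volumes_by_label = {}
pot_volumes_by_label["pot"] = 285  # Victorian pot
pot_volumes_by_label["pot"] = 575  # WA pot silently overwrites the Victorian entry


def total_volume_ml(glass_count: int, volume_ml: int) -> int:
    """Total volume in millilitres for a number of identical glasses."""
    return glass_count * volume_ml


# A Victorian record of "20 pots", read by someone assuming the WA meaning.
intended = total_volume_ml(20, 285)  # what was actually recorded
assumed = total_volume_ml(20, 575)   # what the reader believes was recorded
print(assumed - intended)  # 5800
```

The label-keyed dictionary ends up with a single entry, so nothing in the data itself warns the reader that two different measures share the name.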
What if we don't have a shared language for expressing measurement attributes? The resulting miscalculation could be quite detrimental to the project.

Here's a related problem that can also be addressed by using a vocabulary. This example shows extracts from a land and water research publication. Do these labels have the same meaning or different meanings? Does "dissolved nitrogen" mean the same as "dissolved total and organic nitrogen concentrations in the water column"? Are these labels all about nitrogen? Do they look different but mean exactly the same thing?

Back to the beer glass example. We can take this information and give it the structure of a reusable vocabulary. We can then annotate our research data with vocabulary concepts and so assert what we believe the data means. In doing so, we remove ambiguity and clearly differentiate between labels that are the same but have different meanings. To address this problem, we have a vocabulary of beverage glasses, made up of concepts that have descriptive properties. Here are two concepts that have the same label, "pot", but different definitions, different sizes and different jurisdictions. The meaning of these concepts is formalised and encoded, and these formalisations are published on the web, each at its own unique address. I can therefore differentiate between the WA pot and the Victorian pot. If my research concerns Victorian beer glasses, the vocabulary helps me to communicate that fact and to avoid ambiguity: I annotate my research data with the concept for the Victorian pot. In this example there are two data sets, one about the Victorian pot and the other about the WA pot. Each data set is correctly annotated with the URL of the concept that expresses what it's about, and those URLs can be followed to get all of the concept's other properties: label, definition, jurisdiction and source.
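Here is a minimal Python sketch of the beverage-glass vocabulary just described, written under stated assumptions. Each concept is identified by a unique URL rather than by its label, and each data set records that URL. The `example.org` URLs, the definition wording and the `resolve` function are hypothetical. `resolve` stands in for following the URL to the vocabulary service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str           # unique identifier for the concept (hypothetical URL)
    pref_label: str    # human-readable label; NOT unique on its own
    definition: str
    volume_ml: int
    jurisdiction: str


# Hypothetical base URL; a real vocabulary publishes its own addresses.
VOCAB_BASE = "https://example.org/vocab/beverage-glasses/"
VIC_POT = Concept(VOCAB_BASE + "vic-pot", "pot",
                  "Beer glass measure used in Victoria", 285, "Victoria")
WA_POT = Concept(VOCAB_BASE + "wa-pot", "pot",
                 "Beer glass measure used in Western Australia", 575,
                 "Western Australia")

# Indexed by URI, so the identical labels no longer collide.
vocabulary = {c.uri: c for c in (VIC_POT, WA_POT)}

# Each data set is annotated with the URI of the concept it is about.
datasets = [
    {"name": "vic-survey", "glass": VIC_POT.uri, "count": 20},
    {"name": "wa-survey", "glass": WA_POT.uri, "count": 20},
]


def resolve(uri: str) -> Concept:
    """Stand-in for following the URL to get the concept's other properties."""
    return vocabulary[uri]


for ds in datasets:
    concept = resolve(ds["glass"])
    print(ds["name"], concept.pref_label, concept.jurisdiction,
          ds["count"] * concept.volume_ml, "mL")
```

Both concepts carry the label "pot", but because the data points at a URI, the jurisdiction and volume are always recoverable.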
So a vocabulary concept is an identifier with metadata. In my local system, I record the concept identifier and cache any other properties I need. Because I can trust the service that provides the vocabulary, I don't have to maintain the vocabulary myself. Instead, I can use the identifier to retrieve a copy, or a revised version as it becomes available.

Now to some examples of vocabulary use. Vocabularies are used in many research areas. Here's an example from the Atlas of Living Australia (ALA), classifying the galah, an Australian parrot. It illustrates a hierarchical vocabulary, each level more specific than the level above, from kingdom down through order, family and genus to species. This is a galah, and this is where the galah fits in the taxonomy. It's also a great illustration of using vocabularies to aggregate data: when different organisations agree to use shared vocabularies, different types of research can be brought together. For this species, shared vocabularies help to aggregate images, scientific names, occurrence records, taxonomic literature, genetic sequences and data providers. This vocabulary is used by the ALA but is not created by the ALA. Rather, vocabularies of this kind are collaboratively managed. In this case, the Australian Biological Resources Study works in partnership with Australian organisations covering flora, fauna and biology. These vocabularies are living resources that change over time, reflecting changes in knowledge through discussion, research, argument and agreement. The collaborative process manages the outcomes of those discussions, organising and reorganising the classifications through community consensus and agreement on meaning.
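The hierarchy and the aggregation idea can be sketched in Python using "broader" links between concepts. The classification chain below is the standard one for the galah (Eolophus roseicapilla, family Cacatuidae, order Psittaciformes). The dictionary structure, the function names and the occurrence records are hypothetical. They do not represent the ALA's data model or API.

```python
# Hypothetical "broader" links, keyed by scientific name (species up to kingdom).
BROADER = {
    "Eolophus roseicapilla": "Eolophus",  # species -> genus
    "Eolophus": "Cacatuidae",             # genus -> family
    "Cacatuidae": "Psittaciformes",       # family -> order
    "Psittaciformes": "Aves",             # order -> class
    "Aves": "Chordata",                   # class -> phylum
    "Chordata": "Animalia",               # phylum -> kingdom
}


def lineage(name: str) -> list[str]:
    """Walk broader links from a concept up to the root of the hierarchy."""
    path = [name]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return path


def within(name: str, ancestor: str) -> bool:
    """True if `name` is `ancestor` or sits anywhere beneath it."""
    return ancestor in lineage(name)


# Hypothetical occurrence records from different providers, each annotated
# with a concept from the shared taxonomy.
records = [
    ("provider-a", "Eolophus roseicapilla"),
    ("provider-b", "Eolophus roseicapilla"),
    ("provider-c", "Cacatuidae"),  # identified only to family level
]

cockatoo_family = [r for r in records if within(r[1], "Cacatuidae")]
galahs = [r for r in records if within(r[1], "Eolophus roseicapilla")]

print(" > ".join(reversed(lineage("Eolophus roseicapilla"))))
# Animalia > Chordata > Aves > Psittaciformes > Cacatuidae > Eolophus > Eolophus roseicapilla
```

Providers that share the taxonomy can have their records pooled at whatever level of the hierarchy a question needs: all three records here fall within the family, and two of them are galahs.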
Vocabularies can be multilingual, which assists data sharing globally. International organisations use vocabularies to standardise terms and translations in international affairs. For example, UN terminology is translated into the six official UN languages to eliminate ambiguity.

What if your research covers a variety of subject areas and you want to relate those domains to each other, or you're using a number of separate vocabularies? Vocabularies can be related to each other to assist data integration across subject areas. The Backbone Thesaurus is an example of a meta-vocabulary: a high-level, overarching vocabulary for more domain-specific vocabularies in the humanities. Its aim is to make it easier to connect local controlled vocabularies and establish relationships among them through common high-level concepts. The Unified Medical Language System (UMLS) brings together many health and medical vocabularies, enabling interoperable biomedical information systems.

Vocabularies are also used with machine learning, text analysis and other computational research techniques. Research on sentiment analysis for opinion mining uses computational analysis to identify, extract and study subjective information, with vocabulary concepts used to classify sentiments and opinions. In marine science, the Understanding Marine Imagery project is using machine learning with image analysis to apply vocabulary concepts to marine images automatically.

If you've decided that your research would benefit from a vocabulary, how do you find one? There are directories of vocabularies such as BARTOC, where you can search for vocabularies across many subject areas. There are also registries of vocabularies that focus on particular subject areas: for example, AgroPortal is a registry of agricultural vocabularies, CESTA covers marine sciences, and the ARDC hosts a number of research vocabularies. There are other ways of discovering vocabularies to support your research, too.
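Two of the ideas above can be made concrete in a small Python sketch. The first is a concept that carries a preferred label in each of the six official UN languages, with a fallback when a language is missing. The second is mappings from two local vocabularies to one shared high-level concept, in the spirit of a meta-vocabulary such as the Backbone Thesaurus. All URLs and mappings are hypothetical, and the code does not reflect the Backbone Thesaurus's actual identifiers or structure.

```python
# Hypothetical concept "water": preferred labels keyed by language tag,
# covering the six official UN languages.
WATER_LABELS = {
    "ar": "ماء", "zh": "水", "en": "water",
    "fr": "eau", "ru": "вода", "es": "agua",
}


def label_for(labels: dict[str, str], lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, else the fallback language."""
    return labels.get(lang, labels[fallback])


# Hypothetical mappings from concepts in two local vocabularies to
# concepts in a shared, high-level meta-vocabulary.
SHARED_CONTAINERS = "https://example.org/meta/containers"
MAPPINGS = {
    "https://example.org/museum-a/pottery-vessel": SHARED_CONTAINERS,
    "https://example.org/archive-b/ceramic-container": SHARED_CONTAINERS,
    "https://example.org/archive-b/letter": "https://example.org/meta/documents",
}


def local_concepts_under(shared_uri: str) -> list[str]:
    """All local concepts, from any vocabulary, mapped to one shared concept."""
    return sorted(local for local, shared in MAPPINGS.items() if shared == shared_uri)


print(label_for(WATER_LABELS, "fr"))  # eau
print(label_for(WATER_LABELS, "de"))  # water (no German label, so English is used)
print(local_concepts_under(SHARED_CONTAINERS))
```

The local vocabularies keep their own terms. Integration happens through the shared concept, so a search on "containers" can reach records from both collections.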
Are there informatics people in your research domain? Are you using a data model or metadata scheme that recommends or requires particular vocabularies? Are related data infrastructure projects going on overseas? Are there professional associations that take responsibility for terminology development? Does your research literature discuss informatics initiatives?

If you find some vocabularies that may be suitable, how do you make a selection? Does the vocabulary suit your purpose? Do you need something to enable broad discovery, or do you need to annotate data properties? Is there a target research community you want to work with, or a jurisdiction you are required to fit into? What vocabularies do they use? Is the vocabulary sufficiently expressive to convey the properties of the research data you're working with? Is the vocabulary in a format you can use? Is it sustained and maintained? Is it available from a service that is reliable, governed and sustained?

What happens if the vocabulary you want doesn't exist? How do you approach building a vocabulary from scratch? Developing a vocabulary and maintaining it over time can be quite time-consuming. So wherever possible, the ARDC encourages reusing, extending or subsetting existing vocabularies rather than starting from scratch. Otherwise, vocabulary creation involves identifying concepts, defining their meaning, providing identifiers for them, publishing the vocabulary and maintaining it over time. If you intend to create a new vocabulary, the ARDC recommends that you follow the FAIR principles and consider the guidance in the reference paper "Ten Simple Rules for Making a Vocabulary FAIR". The ARDC also provides links to resources on vocabulary design.

Finally, what is the ARDC Vocabulary Service and how can it help your research?
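The creation steps listed above (identify concepts, define their meaning, give them identifiers, publish) can be sketched as a small Python script. The script mints URIs and writes the concepts out as SKOS in Turtle syntax. SKOS is a widely used W3C standard for vocabularies, but choosing it here, along with the slug-based identifier rule, the scheme URL and the definitions, is an assumption for illustration. This is not the ARDC's editor or its output.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"


def slugify(text: str) -> str:
    """Hypothetical identifier rule: lowercase, runs of other characters -> '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def turtle_literal(text: str, lang: str = "en") -> str:
    """Quote a string as a Turtle language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def to_turtle(scheme_uri: str, scheme_label: str, concepts: list[dict]) -> str:
    """Serialise a concept scheme and its concepts as SKOS in Turtle."""
    lines = [f"@prefix skos: <{SKOS}> .", "",
             f"<{scheme_uri}> a skos:ConceptScheme ;",
             f"    skos:prefLabel {turtle_literal(scheme_label)} .", ""]
    for c in concepts:
        uri = f"{scheme_uri}/{slugify(c['id_hint'])}"  # mint an identifier
        lines += [f"<{uri}> a skos:Concept ;",
                  f"    skos:inScheme <{scheme_uri}> ;",
                  f"    skos:prefLabel {turtle_literal(c['label'])} ;",
                  f"    skos:definition {turtle_literal(c['definition'])} .", ""]
    return "\n".join(lines)


concepts = [
    {"id_hint": "Victorian pot", "label": "pot",
     "definition": "A 285 mL beer glass measure used in Victoria."},
    {"id_hint": "WA pot", "label": "pot",
     "definition": "A 575 mL beer glass measure used in Western Australia."},
]
ttl = to_turtle("https://example.org/vocab/beverage-glasses", "Beverage glasses", concepts)
print(ttl)
```

A real workflow would validate the output with an RDF toolkit before publication and would keep identifiers stable across versions. That stability is what makes annotations made today resolvable later.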
The ARDC Vocabulary Service is Research Vocabularies Australia (RVA). If you need to create or manage a vocabulary, RVA provides vocabulary editing software. The editor is easy to use and outputs vocabularies in a standard format. Vocabularies can be published in formats that are usable by both people and machines, browsed through a web user interface, and integrated into local systems.

There's also a means of publishing vocabularies. This is a view of the publishing portal showing a vocabulary created for the Australian Ocean Data Network (AODN): the AODN platform vocabulary. Each published vocabulary has a landing page that shows users what the vocabulary is about, the licence conditions for reuse, how it may be accessed, and its previous versions. The portal also makes vocabularies more findable and visible, because metadata about the vocabularies is harvested by Google.

Here's an example of how RVA services can be integrated with your local systems. The AODN has a portal that provides access to marine and climate science data, and that portal uses AODN vocabularies. The vocabularies are hosted by the ARDC, so how do they end up in the system that drives the AODN portal? The answer is that RVA provides technical means to draw vocabularies into local systems: access can be provided through an application programming interface (API), a SPARQL endpoint or a widget. AODN edits and publishes its vocabularies using RVA services. When a new version of a vocabulary is published, the AODN portal uses the RVA machine interfaces to retrieve it.

Here are some links to information about the service, a contact point, and a link to the vocabulary interest group, which you are welcome to join.

In summary, vocabularies help research by providing a way of formalising meaning. These formalisations are published and can then be used and reused to annotate data. Data annotated with meaning helps discovery, analysis, integration and reuse.
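To make the SPARQL-endpoint route to a local system more concrete, here is a minimal Python sketch of a client that follows the standard SPARQL 1.1 Protocol and Query Results JSON Format. It asks an endpoint for every SKOS concept and its preferred label, then turns the JSON result into a local lookup table. The endpoint URL is hypothetical. RVA's actual endpoint addresses and API are listed on each vocabulary's landing page and are not reproduced here. The example parses a sample response offline, so no network call is needed to try it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL: substitute the one listed for the vocabulary version you use.
ENDPOINT = "https://example.org/sparql/platform-vocabulary"

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
}
ORDER BY ?label
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_results(body: str) -> dict[str, str]:
    """Map concept URI -> label from a SPARQL 1.1 JSON results document."""
    data = json.loads(body)
    return {b["concept"]["value"]: b["label"]["value"]
            for b in data["results"]["bindings"]}


def fetch_labels(endpoint: str = ENDPOINT) -> dict[str, str]:
    """Network call: re-run when a new vocabulary version is published, then cache."""
    with urllib.request.urlopen(build_request(endpoint, QUERY), timeout=30) as resp:
        return parse_results(resp.read().decode("utf-8"))


# Offline demonstration with a response in the standard JSON results shape.
SAMPLE = json.dumps({
    "head": {"vars": ["concept", "label"]},
    "results": {"bindings": [
        {"concept": {"type": "uri", "value": "https://example.org/vocab/platform/p1"},
         "label": {"type": "literal", "value": "Research vessel", "xml:lang": "en"}},
    ]},
})
print(parse_results(SAMPLE))
```

Storing only the URI → label table locally matches the earlier point that a local system records concept identifiers and caches the properties it needs, rather than maintaining the vocabulary itself.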
ARDC provides services for vocabulary creation, management, publishing, discovery and use, services that can be integrated with your local systems. That concludes the presentation. Are there any questions or comments?