So thank you Robin for the introduction, and welcome everybody to this webinar dedicated to the new vocabulary server. The vocabulary server can be accessed using this link. Our webinar will be in two parts. The first part will be done by myself and will be an overview of what the NVS is and why vocabularies are important. Then in the second part Alexandra will take over for a technical walkthrough of some of the technical basis of the NVS. So what is the NVS? It is a semantic repository for standardized terminologies that are used for the management of data in marine and related domains. It stores and serves terms, and the relationships between terms, in a human- and machine-readable format. And here's a screenshot of one of the pages of our vocab server. So standardized terminologies are knowledge organization systems, also known as KOS. They range from simple lists of terms to full ontologies. Simple lists of terms could be things like glossaries and dictionaries. They are used to organize information and to provide terminologies to catalogue and retrieve information by humans and machines. The diagram on the right, from Zeng, shows that knowledge organization systems can be aligned along a gradient of increasing structural complexity and range of functionalities. The main functionalities we are trying to address are eliminating ambiguity in language, managing synonyms, establishing hierarchical relationships like "term A is broader than or narrower than term B", or associative relationships like "term A is related to term B". And at the most complex level, attributing properties to objects; this is what ontologies do, like "term A has name term B". And the NVS provides access to lists of controlled vocabularies and thesauri and publishes them using standards and ontologies, including RDF and SKOS. I won't go into the detail of RDF and SKOS because this will be covered by Alex.
So the NVS in operation: what does it look like? First, some background information. It was launched in 2005 as part of the NERC DataGrid project, which was a first attempt to improve data discovery and access across distributed data sources, mainly for oceanographic and atmospheric data. The NVS was then further developed thanks to European funding through the Open Service Network for Marine Environmental Data, also known as NETMAR, as well as the SeaDataNet and SeaDataCloud projects. It is used globally, as you can see on the map on the right-hand side there, showing the geographic distribution by monthly number of sessions over the last 12 months up to October 2021, with the largest usage being in the US, in Australia and in Europe. So the NVS underpins elements of large international observing and data networks, including of course the SeaDataNet and EMODnet data infrastructures in Europe. SeaDataNet is a distributed marine data infrastructure for the management of data derived from in situ measurements in seas and oceans, operated by a pan-European network of marine data centres. Recently, the Argo float programme adopted the NVS to host its collections of controlled vocabularies. Argo is an important international programme that has deployed some 4,000 robotic floats over the last two decades in order to measure water properties across the world ocean. Then the Ocean Biodiversity Information System (OBIS), an infrastructure for marine biodiversity data now managed under the auspices of the IOC, also uses some of the controlled vocabularies developed as part of SeaDataNet to underpin its management of environmental data, and as a consequence has adopted the NVS as well. Another example, completely outside the marine domain, is Movebank, a database for animal tracking data, which has also selected the NVS to host and manage its own vocabulary collections, supporting its data transfer system. So, what is the NVS useful for?
Well, it is useful for two main purposes. One is semantic annotation and the other is semantic alignment and mapping. Semantic annotation of datasets with machine-accessible terms is done either because those terms comply with the user's chosen metadata standards, or simply to harmonize information by ensuring consistency in the terms used for given concepts. This helps with the cataloguing of the information and also with the creation of reproducible validation workflows, for example. It also improves the accuracy and the selectivity of search tools, and the efficiency of automated and semi-automated processing routines. Semantic alignment and mapping is used for improving interoperability and data integration, within domains and across domains. Here on the right is an example: a screenshot of the SeaDataNet search interface, which enables users to search the database on a wide range of criteria, thanks to the standardized vocabularies used to annotate the files. So who uses the NVS? It is used by people who create and curate data, database managers, and data workflow engineers in the marine and related scientific fields. A large proportion of current users are actually scientists, data managers and engineers involved in sensor-based observing networks, because of the growth in autonomous and semi-autonomous observations of the environment and the increasing demand for controlled vocabularies that comes with that. And more and more, the NVS is also accessed by tools developed by API and web interface developers. The graph at the bottom of the slide shows the monthly number of users from October 2020 to October 2021, and you can see that for the second part of that year, we had steady values between 5,000 and 8,000 users per month. So why is a dedicated vocab service so important? Well, it enables us to have a central repository of shared vocabularies, shared terminologies.
The content governance associated with it, and the gatekeeping, ensure harmonious growth based on consistency in decisions and adherence to vocabulary management best practices. The human gatekeeping element also helps improve content quality by ensuring that no obvious errors are introduced, and enables a quick response to user feedback. The technical governance ensures rigorous version tracking, reliability of services and up-to-date technology. Having such a dedicated service also nurtures in-house human expertise and skills. Finally, it ensures a long-term continuity and stability that fosters international and cross-domain collaborations. So what are the challenges? Well, the first challenges are related to language itself, of course. You can have similar items, similar objects, described with different words. There are, of course, the well-known differences between British and American English. But even in the scientific field, conventions are many and adherence to a common language is difficult. Take this example on the right from ChEBI, which is a well-known repository for information related to chemical entities of biological interest. You can see at least 20 different synonyms for a simple molecule commonly known as ethanolamine. And many, many chemical substances, especially contaminants, have a wide range of synonyms, which makes the transfer of information about those chemicals very difficult. Then, the same word can have different meanings. Here there's an example of the usage of "pint" in Australia: depending on where you are in Australia, you will get less beer for your money than in other states. So this emphasizes the need for standards (in this case, it would be the metric system), but also the need to have information about the context around the term to avoid misunderstandings.
So the human brain is very good at understanding context implicitly, but machines will always need context to be explicit. In this example, just writing "temperature" in the header of a column of a CSV file is not enough to make the data readily understandable by automated software. We need to capture both the term "temperature" and its context in a machine-readable format: where the temperature was measured, which temperature (of which water body, measured by which sensor in the sea), and other kinds of contextual information. But standardization is hard, and this is a massive challenge, although one that can be helped by using semantic resources. This is partly because it costs effort to change our ways. However, the consequences of not standardizing are sometimes very costly. Take the famous example of the Mars Climate Orbiter, which crashed to pieces in 1999 because somebody in the processing chain assumed that the data they received had already been converted to metric units, while in fact they were still in imperial (English) units. This costly disaster might have been avoided if controlled vocabularies and metadata standards had been used. By having a mandatory field, for example, for units in a machine- and human-readable format in the data being transferred to NASA, a program could have been written to validate the data. So, another challenge is to keep pace with new terminologies: new terms invented, new discoveries made, new concepts created. Think about what we've all experienced over the last couple of years with communication about COVID and its associated concepts: the appearance of new terms, and the need for these to be defined and understood by a large number of people from disparate backgrounds, is a good example.
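The kind of validation described above, checking that a mandatory units field resolves to a controlled vocabulary before data enters a processing chain, can be sketched in a few lines of Python. This is a hypothetical illustration: the vocabulary entries and record fields below are invented, not real NVS or NASA content.

```python
# Minimal sketch: reject records whose "units" field is not a term in a
# controlled vocabulary, so unit mismatches are caught before processing.
# The vocabulary here is an invented in-memory stand-in for illustration.

CONTROLLED_UNITS = {
    "newton-second": "N s",      # metric impulse unit
    "pound-force-second": "lbf s",  # imperial impulse unit, explicitly labelled
}

def invalid_records(records):
    """Return the records whose units are missing or not in the vocabulary."""
    return [r for r in records if r.get("units") not in CONTROLLED_UNITS]

records = [
    {"value": 1.23, "units": "newton-second"},  # accepted: known term
    {"value": 4.56, "units": "lbf*s"},          # flagged: unmapped spelling
    {"value": 7.89},                            # flagged: units missing
]

bad = invalid_records(records)
print(len(bad))  # -> 2
```

With such a check in place, a record carrying an ambiguous or missing unit is flagged at ingestion rather than silently misinterpreted downstream.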
Quickly, we realized that in order to make sense of what was happening around us, we needed to agree on a common language; hence specialized vocabularies dedicated to COVID appeared, created to support communication. So it's not only about keeping pace with concepts and terms, but also about keeping pace with new connections. Against the background of the COVID pandemic, a whole range of activities also started establishing new connections to enable the fast processing of complex information, in the form of ontologies and mappings between data resources to bridge across disciplines. Because as we know, COVID didn't just have an impact on human health; it also had an impact on the environment and on people's lives and livelihoods. And it was important to be able to process this information very quickly. This has been reflected and supported by the rapid development of, or extensions to, existing semantic resources and also the creation of new ones. So another challenge is the proliferation of terminologies, often driven by disconnection of activities, regional variations, and pressure to deliver. Here on the right-hand side is an example of beach litter guidelines: there are many guidelines according to different regions, and they don't fully overlap, they don't fully align. The objects monitored are not the same everywhere in the world, so it's very difficult to have a global assessment of the impact of beach litter. This can be resolved by the establishment of a global partnership that can help align existing content and ensure that future growth remains aligned, so that global impact can be assessed. Another aspect is the importance of data quality. Data annotation is a facilitator in the journey of building reliable artificial intelligence and machine learning. However, the output is only ever as good as the input.
Data outputs sometimes suffer from the GIGO principle: garbage in, garbage out. This was illustrated recently by the publication of a study by MIT, which discovered substantial errors in datasets used for machine learning benchmarks, where images were mislabelled. So it's important to use trusted datasets that have been validated by humans in order to trust the output of processing. Semantic artefacts are often referred to as the glue that brings distributed information systems together and enables greater capacity for artificial intelligence and machine learning applications, including data discovery, analysis and integration tools, and building bridges across silos of information. And as often with glue, if you have done a good job, it should be invisible: the complexity should be hidden from the user. However, without it, even the best-designed tools and the best-presented data will fail to deliver what we need in order to address the urgent environmental and societal challenges we are facing. So it is important to annotate data with standardized FAIR terminologies in order to improve findability, facilitate interoperability and optimize reusability. And it is not sufficient to have open data; we need linked data in order to connect data sources and services in the digital world. Sir Tim Berners-Lee challenged us to envisage the World Wide Web as a semantic web, a web that is not just about putting data out there, but also about making links, so that a person or a machine can explore the web of data: with linked data, when you have some of it, you can find other related data. So I will pass on to Alexandra for the technical part. Thank you Gwen, that was brilliant. So, in this technical walkthrough, I will explain the building blocks of the technical infrastructure of the NVS.
I would like to talk about RDF and SKOS, how the NVS is aligned with RDF and SKOS, about SPARQL, and how the NVS is aligned with linked data. By the end of this technical walkthrough, I would like you to be more acquainted with the NVS technical infrastructure, and I hope you will feel that all these acronyms are not that hard to understand and use. So RDF stands for the Resource Description Framework and is a language to describe resources. A resource is anything in this world: a book, a movie, a web page or a term. In RDF, this thing has to be identified by a universally unique identifier, or URI. What you can do with this URI is unambiguously describe a concept, resource or thing; you can specify how your resources are related to each other; and you can do some basic inferencing. In RDF, everything is expressed as statements that are called triples, and they have a subject, a predicate and an object. For example, you can make the statement that Bob knows Alice, where Bob is the subject, "knows" is the predicate and Alice is the object. URIs can appear in all positions of a triple. They are important because they are global identifiers that enable other people to reuse them to identify the same thing. For example, the URI foaf:knows is used by people in the semantic web to state an acquaintance relationship between two people. FOAF is a machine-readable ontology that describes people, activities and relationships between them, and "knows" is the property of this ontology that describes acquaintance. In the same way as FOAF describes people in RDF, SKOS, the Simple Knowledge Organization System, provides a standard way to represent knowledge organization systems in RDF. For SKOS, the scope of knowledge organization systems includes thesauri, controlled vocabularies, taxonomies, classification schemes and subject heading systems. When you encode your KOS with SKOS, you make your KOS machine-readable and interoperable.
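The "Bob knows Alice" triple can be made concrete with a minimal Python sketch, representing each triple as a (subject, predicate, object) tuple. The foaf:knows URI is the real FOAF property; the person URIs use the reserved example.org domain and are invented for illustration.

```python
# A triple is just (subject, predicate, object). URIs act as global names:
# anyone who uses http://xmlns.com/foaf/0.1/knows means the same relationship.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

triples = [
    ("http://example.org/people/bob", FOAF_KNOWS, "http://example.org/people/alice"),
    ("http://example.org/people/alice", FOAF_KNOWS, "http://example.org/people/carol"),
]

def objects(triples, subject, predicate):
    """All objects of triples matching a (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(triples, "http://example.org/people/bob", FOAF_KNOWS))
# -> ['http://example.org/people/alice']
```

Real RDF stores and libraries work on exactly this shape of data; the tuple list here just makes the structure visible without any library.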
Some things that you can define in SKOS are concepts, collections, mappings and thesauri. A concept is an idea, a notion or a term, like sneezing or coughing. A collection is an ordered group of SKOS concepts; for example, the collection of symptoms can include concepts like sneezing and coughing. A mapping is a link between two SKOS concepts: cough and respiratory disorders can be associated using a link, or property, called "narrower", and you can say that cough is narrower than respiratory disorders. And finally, a thesaurus is an aggregation of one or more SKOS concepts. Not only can you list things with SKOS, you can actually add properties to describe them in more detail. Here you can see a thing, and using the rdf:type property you can state that this thing is a SKOS concept, and that it has a preferred label, "cough", written in the English language. But you can also add another preferred label in a different language, "toux" in French, and then you can use several other properties to further define your thing. The things that we describe in the NVS are SKOS concepts like high-volume air samplers, the CAE weather station, metres, moles per second, etc. These SKOS concepts are grouped into collections like the device catalogue L22, the units of measure P06, or the device categories L05. Currently, the NVS counts 290 collections and over 325,000 concepts. This is an RDF graph of what things in the NVS look like. For example, if you read this RDF graph, you see that this thing called L22 is a SKOS collection, it has a certain description and an alternative label, and it has two members, both of which are concepts with their own preferred labels. So, up until now we have talked about things in SKOS and about their properties in SKOS. SKOS also enables the creation of semantic relations, which can be either hierarchical or associative.
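The multilingual prefLabel example above can be sketched the same way, with language-tagged literals represented as (value, language) pairs. The SKOS and RDF namespace URIs are the real ones; the concept URIs are invented for illustration.

```python
# Sketch of a SKOS concept with typed and language-tagged properties.
# Literals are modelled as (value, language) tuples; URIs as plain strings.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

concept = "http://example.org/concepts/cough"
triples = [
    (concept, RDF_TYPE, SKOS + "Concept"),            # this thing is a SKOS concept
    (concept, SKOS + "prefLabel", ("cough", "en")),   # preferred label in English
    (concept, SKOS + "prefLabel", ("toux", "fr")),    # preferred label in French
]

def pref_label(triples, subject, lang):
    """The preferred label of a subject in a given language, if any."""
    for s, p, o in triples:
        if s == subject and p == SKOS + "prefLabel" and isinstance(o, tuple) and o[1] == lang:
            return o[0]
    return None

print(pref_label(triples, concept, "fr"))  # -> toux
```

The one concept keeps a single URI while carrying as many language-specific labels as needed, which is exactly what makes a SKOS-encoded KOS multilingual.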
For example, we can say that meteorological packages are broader than the CAE weather station, and here we establish a hierarchical relationship. But we can also say that the CAE weather station is related to its manufacturer, which is CAE Italy, and that's an associative relationship. We can also establish external mappings, which means that we can associate NVS concepts not only with other concepts from the NVS, but also with concepts from other vocabularies, like this concept "metre" from our P06 vocabulary, which we have established is the same as the metre from DBpedia or the metre from QUDT. And this is a high-level view of how one collection is related to several other collections inside the NVS. You can create more of these diagrams and maps at the link on the top right. As we said, in the NVS we list collections, and each one of them has a unique URI that follows a certain pattern. This is the pattern, and here we show that our collections all have a three-character identifier; for example, this is the URL of the L22 vocabulary. Concepts in the NVS have a unique URI too, and the pattern they follow is that they belong to a certain collection, followed by their own unique identifier. Concepts also have versions, so you can track how they evolved through time; the version is a number at the end of the concept's URI. And finally, we have unique URIs for mappings as well; we divide our mappings into internal and external ones, and this is an example of a unique URI for a mapping. Now, we have talked about RDF, SKOS and URIs. All of these can be stored in a triple store, which is a specialized graph database, a database where you store RDF triples. These triples can then be queried online using the SPARQL query language, which is the equivalent of the SQL language for relational databases. Several of our tools in the NVS are built on our SPARQL endpoint.
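The collection and concept URI patterns described above can be sketched as simple string templates. The base and path layout follow the pattern described in the talk (collection identifier, then concept identifier, then an optional version number); the concept identifier EXAMPLE01 is invented for illustration.

```python
# Sketch of the NVS URI patterns: collections, concepts and concept versions.
# The layout follows the pattern described in the talk; EXAMPLE01 is made up.
BASE = "http://vocab.nerc.ac.uk"

def collection_uri(collection_id):
    """URI of a collection, identified by its three-character string."""
    return f"{BASE}/collection/{collection_id}/current/"

def concept_uri(collection_id, concept_id, version=None):
    """URI of a concept inside a collection, optionally pinned to a version."""
    uri = f"{BASE}/collection/{collection_id}/current/{concept_id}/"
    if version is not None:
        uri += f"{version}/"  # version numbers let you track how a concept evolved
    return uri

print(collection_uri("L22"))
print(concept_uri("P06", "EXAMPLE01", version=2))
```

Because the pattern is deterministic, any client that knows a collection and concept identifier can construct the URI and look the term up without a search step.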
And since we've talked about all these technologies, it's about time to show you how the NVS is aligned with linked data. Linked data is a set of principles on how to publish structured data on the web, coined by Tim Berners-Lee in the design issue note "Linked Data" in 2006. Tim Berners-Lee said: if you want to publish structured data on the web, you have to use URIs as names for things and use HTTP URIs so that people can look up those names. So, of course, we use unique URIs for each of the things that the NVS is listing. When you click on one of these URIs, you will get a user interface that will tell you more about what this URI is about. The next principle is that when somebody looks up a URI, you provide useful information using the standards RDF and SPARQL. When you click this URL, you can get RDF; this is what RDF looks like. We use content negotiation to provide either the user interface in HTML or the RDF. But you can also go to our SPARQL endpoint and use the SPARQL standard to describe this particular URI. And the next principle is to include links to other URIs so that people can discover more things. So here is a link that takes you from this particular URI to a different URI that states equivalence, and it will take you to the QUDT vocabulary. This is a diagram to show you how you can access the NVS, either as a human or as a machine. As a human, you can access the NVS user interface, or you can use the NVS search to search for terms, collections and mappings. Or, if you are an authorized vocabulary editor, you can use the NVS editor to create new terms and mappings. For machines, we provide three web services: a SPARQL endpoint where SPARQL queries can be expressed; a RESTful API publishing the NVS as linked data and providing different serializations like RDF/XML, JSON-LD, Turtle and many more; and finally a SOAP web service.
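Content negotiation, serving HTML to a browser and an RDF serialization to a machine from the same URI, can be sketched as a simple dispatch on the HTTP Accept header. This is a toy version: the media types are the standard ones, but the dispatch logic is simplified (it ignores q-value preferences, which a real server would honour).

```python
# Toy content negotiation: pick a serialization from the HTTP Accept header.
# A real server would also weigh q-values; this sketch takes the first match.
SUPPORTED = {
    "text/html": "html",             # human-facing user interface
    "application/rdf+xml": "rdf/xml",
    "text/turtle": "turtle",
    "application/ld+json": "json-ld",
}

def negotiate(accept_header, default="html"):
    """Return the first supported format listed in the Accept header."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()  # drop any ;q=... parameters
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
    return default

print(negotiate("text/turtle"))                      # a machine asking for RDF
print(negotiate("text/html,application/xhtml+xml"))  # a typical browser
```

The same URI thus stays the single global name for the thing, while each client receives the representation it can actually use.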
In the paper "Ten Simple Rules for making a vocabulary FAIR", shown up here, the governance of a terminology is highlighted as a very important requirement. All the background work to keep the NVS alive and operational is performed by a team of experts, the vocabulary management group, which performs the different tasks shown here. When new requests are received by the team, they perform the gatekeeping of these requests. They provide technical support to external users and groups. They provide advice and consultation. They perform research, and they liaise and contribute to external groups. They create collections, concepts and mappings and publish them; they update and maintain existing content and its metadata; and they map and align concepts with internal and external concepts as well. And here is the procedure if you want to request new terms. For end users who want to request new terms, there are three different ways to go. The first way is to visit our GitHub account and go to the required vocabulary following this URL pattern; when you are in that repository, you can create issues to request new terms or ask questions. Users can also send emails to vocab.services at bodc.ac.uk. And if users are authorized vocabulary editors, they can directly use the vocab editor to create new terms and mappings. So that's the final slide: here is the NVS, operational, and having revealed some of the ingredients of its successful operation alongside why it is important, we are ready to receive questions, and we hope that you learned a little bit more about the NVS. Thank you very much Gwen and Alexandra. That was a really interesting and illuminating talk. I'm sure we have lots of questions and we do have time to take some, so if people have any questions please feel free to pose them in the chat or the Q&A section. I did have some come in during the talk that we can start with. So I'll start with this one.
How easy would it be to extend the NVS to incorporate vocabularies and terms from my project? There are two ways to answer that. Technically, it would be easy. In terms of human resources, it would be more difficult: it would need a source of funding to create the vocabulary and populate it with the terminology that is submitted. I should add as well that we don't want to publish just any kind of vocabulary in the NVS, so it would have to go through an evaluation process, because we want to maintain the quality of the information that is served by the NVS. In the past we have been approached by people working on a project, and that's what happened with Movebank, saying: I've got this terminology, would you be able to serve it through the NVS? Basically, what we did is look at the terminology, and it was ticking all the boxes, so in that case there was absolutely no problem. We created a collection, put the terms there, and the people in charge are now using the vocab editor to maintain their own vocabulary resource. Once it's set up, we almost don't need to do much more except the basic gatekeeping, to make sure that there are no big editing or structural errors in what is being submitted. But in terms of content, it's fully the responsibility of the people who are the designated editors representing the governance of that particular vocabulary. Okay, thank you. Another question has come in. This is from Anne Gledson, and she's asking: does something like this exist for the health sector? Of course. There is MeSH, the Medical Subject Headings, which exists for the medical domain. There's also BioPortal, a portal that brings together different ontologies and vocabularies. I would suggest BioPortal as a first entry point to discover what is in there and then go deeper. Okay, thank you.
Another question is: how does the NVS align with similar international initiatives? How are overlaps and differences handled between those initiatives and the NVS in terms of vocabularies, in terms of content alignment? Well, we are connected with many groups around the world, so we work in strong collaboration with others. Through the Research Data Alliance, we have a forum for semantic interoperability that enables us to find ways of aligning with others. I can mention the Rosetta Stone project: because the US, Australia and Europe were using different vocabularies to say the same thing, we created a proof of concept where we mapped platforms, sensors and parameters between Australia, Europe and the US. We created translations, and then software could sit on top of those translations and discover datasets that existed for a certain keyword across all of these different repositories around the world. So it can work really well. And if you have annotated your datasets with a vocabulary, that's the most important thing, because the next step is to create the mappings and create these translations. Okay, thank you. We've got another question here. This is from Rachel Heaven, and she says: thanks Gwen and Alex, is your vocabulary editor tool something you've written yourself, or is it an off-the-shelf or open source tool? So the editor is an ad hoc tool. It was written at BODC, I would say years ago, and we've recently revamped it. It's not very open at the moment, so there are not many authorized users that we have allowed to use it, but we intend to open it up more so that people can actually self-manage their vocabularies. Okay, thank you. I've got another question, which is: we hear a lot about the FAIR data principles, so what are the ways that approaches such as the NVS can help move towards being FAIR?
From the FAIR principles, the I2 principle says that you have to annotate your data with vocabularies that are themselves FAIR in order to achieve interoperability. That's why we say that it is important to use vocabularies to make your data FAIR, but it's also important to use FAIR vocabularies to make your data FAIR. It is one of the requirements of the FAIR movement to annotate your data with vocabularies. Basically, data can't be interoperable unless they're annotated with machine-readable vocabularies; otherwise they don't comply with that element of interoperability. Yeah, it's one of the building blocks, and it is I2 if somebody wants to look it up. Yeah. This is a question I had. I mean, I've worked in BODC for many years, and our data managers take data in and we use vocabularies from the NVS to mark up those data so that they can be interoperable and more easily reused. How do we get it so that the scientists or the data collectors are using these vocabs from the off, rather than it being something that happens further down the chain? That's a good question, Robin. It's very important, and I think it's a priority for the near future to make controlled vocabularies less daunting for the scientists who create the data. This is something we really want to get involved in: to work closely with scientists so that they can still use their own terminology, or pet names, for their variables, but provided they map those pet names to the standards that we've got in the NVS, then without very much effort on their side, when they create those files, the files will be automatically linked to a standard controlled vocabulary in the NVS.
And it's very important for us as well: as a data centre and as data managers, we would have much more efficient workflows if we didn't have to retrospectively try to understand what the scientific variable was, what the methodology was or what the instrument was, sometimes maybe three years after the data was collected. If all that came to us already quality controlled and approved by the scientists who were actually involved in collecting the data, it would make a huge difference. So it's a really important priority, I think, for our area of activities. And I'm aware that some of the instrument manufacturers are now starting to come on board with controlled vocabularies, so if we can get them, from the off, outputting the parameters already using the vocabularies, then we're off to a flying start, I think. Yeah. Okay. So, I suppose another question is: what are the big challenges facing the NVS in the next three to five years, do you think? One of the challenges is sustainability of funding, because there's a lot of work happening in the background that people don't always realize. It's quite cheap to create one term, but when you have to create hundreds of terms per month, it starts taking time and resources away from core activities, so it's about trying to develop a funding model that enables us to be financially sustainable. And in terms of technical challenges, I would say being at the forefront and being able to align with what is happening in the world. We keep up by participating in European projects that are at the forefront of the technology, but being in these RDA groups is also really important for us, to stay updated but also to have a platform for discussing issues that come up within the semantic web community. Okay.
So, another question: much of contemporary science is by nature interdisciplinary or cross-disciplinary. How can SKOS approaches be used for integrating ontological thinking across disciplines, for example environmental science and social science or engineering? An example might be modelling the impact of the environment on built infrastructure. And how can we handle such interdisciplinary contexts? Can I just start by differentiating between ontologies and vocabularies, because they are different things. With an ontology you can model the world, whereas a vocabulary might simply populate a drop-down list in a user interface, or be used inside an ontology. So they are slightly different, and they have different purposes to fulfil for cross-disciplinary and interdisciplinary work. Gwen, talk about I-ADOPT. Well, yes, in order to create this interdisciplinary narrative, we need to work collaboratively with others, of course, and we do so as part of the Research Data Alliance I-ADOPT Working Group, which focuses on how we label the things we measure or predict, the names we give to variables. For the last two years, we've been working on developing an interoperability framework for agreeing on the common elements that need to go into the definition of those variable names or variable labels, because they are sometimes not just one word but a complex description of what is being measured, and in environmental science it's very often important to know precisely what was measured. A lot of properties or observations are made by proxy as well, and one thing can be measured with different methods, so you need to be able to distinguish between those.
So, for the last two years we've been working on that, and we have released our first recommendation document, as well as an ontology that enables cross-disciplinary mapping of variable vocabularies. But this is just one element of the whole data landscape; this needs to happen at multiple levels. For things like instruments and sensors, it might be easier, because a sensor model is the same whether it is deployed in the sea or elsewhere; sensor models can all be part of the same collection, so it's rare that you need overlapping terms for sensor models, for example. It's good to just have one sensor model registry, and then everybody uses that. But for other terminologies, it's good to have discipline-specific vocabularies that are developed in concert, ensuring that what is common to those different vocabularies is aligned, and then the rest can specialize for the different fields. So here, apart from linked data across disciplines, we also need linked people: we need agreements inside domains and across domains, because you definitely need to establish a common language to understand what each domain is trying to say, but also to agree on how to say all these things. So apart from linked data, it's about linked people as well. And it's really, really important. Okay, thank you. I think that's all we've got time for today. So I'd like to thank again our speakers Gwen and Alexandra for that great presentation and discussion.