Thank you, everyone; thank you for having me. It's great to be here. Just to build on that introduction, thank you very much for it: I've worked in cultural institutions for the past 15 years, in galleries, museums and now in an archive context, primarily in collection data management roles as a data steward and cataloger, really focusing on collection data and collection data management. This year I moved from the National Gallery, where I'd been for 10 years, to the National Film and Sound Archive. So this is my first experience working in an archive, and as mentioned, I manage the Data Integrity and Analytics Unit. National institutions share the same role: to collect, preserve and share a national collection, each honing in on their specialized domain of cultural objects and context. I'm going to focus primarily on the National Film and Sound Archive, but bring in some interesting concepts and information about other national institutions. Today (my slide numbers are a bit wacky, sorry about that), I will firstly give an overview of the National Film and Sound Archive and its data landscape, examine the present condition of vocabulary use within cultural heritage, discuss the new drivers prompting a shift in the landscape, and then pivot to next steps for the NFSA's data revolution. I'm going to concentrate on the future of the data standards, data models and vocabularies that will chart the course for our institution's advancement.

Australians were early adopters of film and sound technologies, and the appetite to make, enjoy and discuss audiovisual culture remains strong. Australians have a long-standing connection with film and sound dating back to the 1890s, and the National Film and Sound Archive's collection origins trace back to the 1930s. It was initially part of the National Library of Australia before becoming independent in 1984 and taking on the building not too far from here. The collection is diverse, with more than four million holdings spanning news, film, radio, home movies, oral histories and, more recently, video games and podcasts. Beyond media, it archives Australia's audiovisual history through costumes, scripts, props and relics like megaphones, radios and this cool Game Boy you can see here, the Aussie Game Boy, as well as MP3 players and far more. As well as preserving these items for future generations through digitisation and safe storage practices, the NFSA continues to grow the collection, ensuring it provides an unbroken record of life in Australia and of Australian creativity. Unlike galleries and museums with smaller, targeted collecting practices, the NFSA has a high volume of material brought in each year, sourced from industry contributions, public donations and strategic collecting; this was a new concept to me, just the sheer amount we collect each year. The NFSA provides access to the collection via public exhibitions, screenings and online platforms. It facilitates private collection and data access requests, collaborates with artists and the film, TV and audio industries, and hosts researchers. With 121 million viewings in 2021, the NFSA stands as a critical cultural resource keeping Australia's collective memory alive and accessible. The National Film and Sound Archive, like many cultural institutions, has evolved its cataloging methods to keep pace with technological advancements.
Initially, the meticulous process of cataloging involved manually writing out details on cards, which were then filed in cabinets accessible to those who could navigate the system. As digital advancements took hold, these card systems were replicated electronically, capturing the essential details like identification, location and description in a digital format. This shift has made the archive's wealth of audiovisual history far more accessible and has set the stage for future innovation in how we preserve, explore and interact with our cultural heritage. Over the past 15 to 20 years, media research has thoroughly analyzed a wide array of content types, ranging from conventional news outlets to the transient trends of TikTok, and spanning the gamut from mainstream entertainment to the gaming industry. This diverse assortment of media not only showcases esteemed artists' work but also captures the everyday interactions that weave together the tapestry of modern social life.

So, our cataloging landscape at the NFSA is shaped by a diverse range of influences (but not that kind of influencer), and I'll speak to those influences now. At the NFSA our cataloging practices have largely been dictated by the data model and the fields in the systems we use, with some modifications to accommodate our collection's distinctive elements. Adoption of international standards has ensured we speak the same archival language as our global counterparts. The varied nature of the collection itself demands specific data treatment, differentiating how we handle technical data for analog versus digital material, and varying cataloging methods for how we document documents versus moving image and audio material. Integrating and updating legacy data to align with contemporary cataloging norms is an ongoing process, and I guess a challenge for us. We also incorporate industry-specific vocabularies and metadata that are standards within the audiovisual sector.

In the cultural heritage sector, cohesive language is not just a convenience, it's a necessity for interoperability and knowledge sharing. It enables professionals from different institutions and disciplines to communicate effectively, ensuring that information is accurately exchanged and understood. Shared vocabularies allow for consistent cataloging, indexing and retrieval of information. This consistency is critical for research, as it provides a reliable framework for scholars to locate and reference material across various collections and platforms. Moreover, a shared language facilitates collaborative projects between institutions, such as traveling exhibitions or digital archives, where multiple organizations contribute to a singular narrative. Without cohesive language, the risk of misinterpretation and data inconsistency increases, potentially leading to a fragmented user experience. A unified language is also pivotal for public engagement with our collection. It demystifies academic or technical jargon, making the treasures within our collection accessible and relatable to the wider community. When the public can easily understand and interpret the information presented, their connection to the material deepens, fostering educational enrichment and cultural appreciation. After all, it is the nation's collection. The integration of FAIR (findable, accessible, interoperable, reusable) vocabulary concepts enhances the accessibility of cultural and historical resources.
It allows the public to effectively search and locate items of information irrespective of the institution that holds them. Emphasizing the FAIR principles will ensure that these resources are not only easily navigable but also meaningful, fostering interoperability and reuse; thus a cohesive language adhering to the FAIR concepts is critical for cultural heritage. (Sorry, there was my arrow, pointing to vocabularies working for our audiences.)

The effectiveness of vocabularies in the cultural sector relies heavily on diligence and ongoing maintenance, as you would all be aware. This principle is demonstrated by the National Film and Sound Archive with its preservation glossary, which is available online. In shaping our cataloging methodologies, the NFSA draws upon a wealth of international vocabularies, ensuring compliance with global standards and fostering interoperability. This includes the International Federation of Film Archives moving image cataloguing manual, and Resource Description and Access (RDA) for global cataloging norms, which I will talk about shortly. We align with the International Federation of Library Associations and Institutions cataloguing principles and make use of the Library of Congress Genre/Form Terms. The International Association of Sound and Audiovisual Archives cataloguing rules are also a reference point for us.

Now, nationally (oh, sorry, I didn't show you them all; I can't see the screen as easily if I move like this), some examples. The Museum of Applied Arts and Sciences, known as the Powerhouse, has developed an object thesaurus in line with the Art and Architecture Thesaurus from the Getty Research Institute, tailored to reflect Australian perspectives. The platform encourages contributions and updates from the community, representing a model of interactive vocabulary management, and whilst working at the National Museum of Australia we utilized the Powerhouse's thesaurus for our object descriptors. The National Library's union catalog demonstrates the benefits of sustained enhancement and careful management, evolving from a basic catalog to an expansive resource. It underscores the enduring importance of these vocabularies for enabling research and public engagement with our collective cultural assets. The Getty Thesaurus of Geographic Names and the Gazetteer of Australia are key resources for cultural institutions. These provide comprehensive databases for geographic and Australian place names respectively, and are continuously updated with contributions from various projects and institutions. Integration of these at the NFSA could significantly improve the way we manage place data.

So, the new drivers. When transitioning from its traditional card catalog, the NFSA initially adopted a similar approach for its electronic system, emphasizing precise item description and identification over contextual information, and this is not unique to the NFSA. However, there have been recent shifts towards emphasizing the contextual and storytelling aspects of collections. This change parallels discussions from Reconsidering Museums, a Canadian national project aimed at understanding the value of museums to Canadians. A notable contributor to this project, Wendy Fitch, shared insight on this topic in a podcast exploring why museums matter: "It's no longer simply training people how to catalog an artifact or how to physically take care of an artifact.
It's about collecting stories and sharing those stories, and encouraging museums to let go of control, encouraging them to let those marginalised communities tell their own stories and not try to control what those stories are."

Currently the NFSA faces a significant challenge. While the cataloging database serves specialists well, it creates a barrier for the general public, with jargon and complex structures impeding easy access to and understanding of the collection itself. To bridge this gap, a data transformation project is critical, aiming to make the collection more intuitive and discoverable for everyone, fostering broader insight and engagement. However, as this data is manually entered and stored in an uncontrolled state, we are hitting hurdles when analysing, relating and publishing it on our Search the Collection website. The demand for pinpoint accuracy in navigating our collection is surging, driven by diverse use cases: television networks scouring for specific footage to use in their news pieces, individuals on a quest to uncover clips featuring their relatives, an artist in search of the perfect visual element. This underscores the essential need for precision in our archival search capabilities. Galleries and museums who have gone before us in contextualising their data and adopting thesaurus-based vocabularies, such as the Getty Art and Architecture Thesaurus, have increased their collections' discoverability. However, to quote a paper (sorry, oh my gosh, I'm on the wrong side) on AI and changing roles in television archives, "it has become increasingly impractical for television archives to catalogue their collection without automatic processing and artificial intelligence technologies". To put it simply, due to the sheer volume of backlog and the influx of new media, it's not feasible to rely solely on manual cataloging. Consequently, we're exploring alternative methods, including leveraging large language models and other AI tools, to manage our growing needs effectively. Essentially, we're looking for technological alternatives that replace, as far as possible, part of the indexing and documentary description work that has been carried out manually in archives. We aim to augment the expertise of curators, exploring technologies to shoulder the more repetitive tasks, thus freeing our human experts to apply their irreplaceable judgment and creativity where it matters most.

Now, as we peer under the hood (you've already seen my slide, so I'm a bit disappointed with my timing of this), as we peer under the hood of the data engine, we are confronted with a compelling need to evolve our current system. Akin to a vintage car, it has served us well in the past but now requires a significant overhaul to keep pace with the digital era's demands; despite best efforts to adopt international standards, other influences have stood in our way. In navigating our data strategy, we're honing in on three pivotal areas to modernize and strengthen our approach: we aim to merge our narrative-driven intellectual data with our granular technical data in a way that caters for both professionals and the public; we're moving beyond simply counting physical and digital holdings to develop richer metrics; and we'll adopt a knowledge graph informed by the FRBR principles, which will revolutionize discoverability and connectivity.
Our goal is to impose control and structure on our metadata. The NFSA's current approach to vocabularies is split into two separate paths. First, intellectual data: this encompasses titles, summaries, credits, the narrative elements that convey the story and the cultural context. Then we have our technical data: here we delve into the specifics, time codes, usage rights, file formats and more, focusing on the operational aspects of the collection. The challenge lies in merging these two streams into a unified cataloging system that is both comprehensive for professionals and accessible to the public.

Today we have integrated Resource Description and Access, or RDA, a cataloging standard created for digital environments. RDA's forward-thinking framework guides us to create detailed and user-friendly metadata for our collection. So why RDA? Well, it's simple: RDA aligns with our commitment to preserve and share Australia's rich audiovisual heritage. The standard is based on the AACR, the Anglo-American Cataloguing Rules, but designed with a more international scope. With RDA we align in separating the recording of data from the presentation of data; the major focus of RDA is providing guidelines and instructions on recording data to reflect the attributes of, and relationships between, the entities defined in FRBR and FRAD. The standard supports our aim to make collections as accessible and discoverable as possible. RDA is tailored to accommodate a wide range of formats, both digital and physical, reflecting the diverse types of media found in our collection. It places a high emphasis on the needs of the user, ensuring the information provided helps them find, identify, select and obtain the resources they're looking for. RDA is grounded in internationally agreed principles, making it applicable across different countries and types of institutions. And lastly, RDA's guidelines are structured to work well with linked data technologies, which allows the NFSA to share metadata both easily and effectively on the web.

Now, a bit more about our data specifically, and data models. The collection is currently measured at a holding level. We count our physical material by carrier, with its independent items such as film reels, tapes, objects, documents and photographs, and we count our digital material by file; this includes supplied and made files, from high preservation quality to our browse-quality versions, but does not include any backups. So what does this mean for an individual title? Taking Mystery Road, a feature film in the collection: we have three physical carriers and 137 unique files, so for this one feature film we have 140 holdings added to the four-million holdings count. We are starting to unpack this count and looking to deliver new baselines that offer deeper insight into the collection. This involves not only technical change but also a conceptual shift in how items in the collection are viewed: not as isolated records but as part of a larger narrative web. This will require a transformation of existing data, where relationships are defined, vocabularies are standardized, and data is enriched to fit the new model. We are looking at integrating concepts, obviously making them fit for our purpose, but taking from what is already out there. We've been looking at the Functional Requirements for Bibliographic Records (FRBR) model: a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions.
It comprises three groups of entities. On screen, group one: work, expression, manifestation and item, representing the products of intellectual or artistic endeavour. Group two is the entities, people, families and corporate bodies, responsible for the custodianship of group one's intellectual and artistic endeavours. And then group three (sorry, I had group three there already), the yellow entities, are the subjects of group one and two's intellectual endeavours, and include concepts, objects, events and places. While the current model we use for our collection loosely follows the FRBR group one and two entities, it inadvertently conflates entities of different natures, like a feature film with a prop, under the work categorization. Recognizing this, we're committed to clarifying and refining these categories, and we're also acknowledging the absence of defined structures for group three, the subjects. Our goal is to enhance our model to distinctly represent all relevant entities and their relationships, ensuring a more coherent and functional classification that aligns with our evolving needs.

The knowledge graph, which will hinge off our collection data model, is our blueprint for modernizing our data into a dynamic, interconnected network. This network comprises nodes, representing entities such as the film, the director, the soundtrack, and edges, which are the relationships between these entities. It is the network that creates a more holistic understanding of our collection, showing not just individual items but a rich tapestry of connections between them. Edges are fundamental in illustrating the interconnectivity between various entities in the archive: for example, an edge may link a film node to a director node, symbolizing the "directed by" relationship; another edge might connect a soundtrack to a composer, indicating the "composed by" relationship. Transitioning to this knowledge graph paradigm marks a profound shift from cataloging single items to mapping a complex web of cultural narratives and relationships, allowing for a nuanced and contextual discovery experience. Along with relating the other entity types, such as people, places and events, the goal is to provide a comprehensive view that connects a work's original concept to all of its subsequent expressions and tangible items within the archive. The knowledge graph approach allows these connections to be easily made and understood, providing a richer and more contextual experience for researchers, historians and the general public.
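To make the node-and-edge idea concrete, here is a minimal sketch in Python with rdflib. The namespace, identifiers and property names are invented for illustration (including the placeholder TGN ID); this is not the NFSA's actual data model.

```python
# A minimal sketch of a collection knowledge graph: items and agents become
# nodes, relationships become edges, and a controlled vocabulary (here a
# placeholder Getty TGN URI) anchors place data.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, RDFS

NFSA = Namespace("https://example.org/nfsa/")  # hypothetical namespace
g = Graph()

film = NFSA["work/mystery-road"]                      # node: the work
director = NFSA["agent/ivan-sen"]                     # node: a person
place = URIRef("http://vocab.getty.edu/tgn/0000000")  # placeholder TGN ID

g.add((film, RDF.type, NFSA.Work))
g.add((film, RDFS.label, Literal("Mystery Road")))
g.add((film, NFSA.directedBy, director))  # edge: "directed by"
g.add((film, NFSA.filmedIn, place))       # edge into a controlled place vocabulary

# Traversing the edges answers contextual questions: everything connected
# to this work, not just the work's own record.
for predicate, obj in g.predicate_objects(film):
    print(predicate, obj)
```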
Vocabularies will also play a pivotal role in our data transformation, as they act like translation tools, transforming raw data into structured, meaningful information. The current state of our metadata presents challenges for us, and a feature of our discovery work has been geographic information. The NFSA's database contains place data in various fields, a lot of them free-text fields, and this unstructured, uncontrolled data is creating limitations in collection navigation and use. On this challenge, we are looking to leading institutions who have integrated tools such as the Getty Thesaurus of Geographic Names (TGN) and the Gazetteer of Australia, and we will also look at leveraging projects that map collection data to Wikidata for greater accessibility and sharing potential. The integration of a structured place-name vocabulary into our cataloging system would significantly enhance the precision, consistency and interconnectedness of our collection metadata concerning geographic locations. The TGN, for example, provides authoritative names along with variants in different languages and historical periods, ensuring that all records refer to locations consistently. A controlled vocabulary of place names would enhance the user's ability to find all relevant items associated with a particular location, regardless of variations in data entry. The TGN is widely used and recognized, which would enable us to link our data with other collections and resources that also use it, creating a more connected and enriched web of information. The structured information in the TGN includes hierarchical relationships between places, providing a more detailed and nuanced representation of geographic data in our collection.

For the journey ahead, as we dig into, structure and rebuild our data engine, vocabularies are at the forefront of our strategy. Our primary audience remains in house: our internal teams dedicated to harnessing insight and crafting strategies for data. Yet our reach extends far beyond our walls. We have a long-standing tradition of providing access to researchers in the humanities, arts and social science fields, offering our catalog as a resource for the generic user tasks of finding, identifying, selecting, obtaining and exploring. With collections-as-data approaches gaining traction, appealing to those who seek to delve deeper into the data and explore the social narratives that lie within, both descriptive cataloging data and the containers themselves can be utilized for derived insight, and vocabularies once concealed are shared. We're expanding the possibilities of new data applications, from mining raw transcripts into structured entities, while being cognizant of the bias that has crept into our collection. Similarly, our catalog data, when interlinked with new raw data, serves as a connector, unlocking the archive to greater audiences. Our vision is not to view our collection merely as a series of items, but to embrace it as an integrated body of data, ripe with semantic richness. This approach caters not just to the traditional in-house researcher but also to those engaged in innovative interdisciplinary research. We're mindful of the complex rights that encircle some of our material as we build our solutions; our commitment is to serve both general and advanced researchers, ensuring that our collections are as informative as they are inspiring.

Visual analytics in culture offers us ways to discern patterns and identify topics. This was explored through a recent artist acquisition: the NFSA collaborated with artist Jazz Money in creating a motion picture solely from footage in the collection (find me in the break if you want to know more about this project; I'll give a brief overview now). Her work exemplifies the potential of a less manual, more intuitive process, allowing us to glimpse the unexpected within our collection. Her use of the footage, for me personally, was both educational and captivating. So as we move forward, it is the constant evolution of our vocabularies that will ensure our cultural heritage institutions remain relevant, and an avenue for all Australians and the world to engage with culture. That's it from me, so thank you for listening, and I'm happy to take any questions. I also have with me two colleagues, Basil and Paul from the NFSA, so for anything relating specifically to the NFSA (I've only been there since March) I may deflect to them, or to the panel discussion at the end.
So we'll kick off with Rob Atkinson from OGC Australia.

Thank you. Hi there, yeah, Rob Atkinson, OGC, not so much Australia only, it's international, sorry about that. Though I do live in Wollongong, Dharawal land, and I pay my respects to the elders. I'd like to riff on some of the themes we've discussed and extend them a little by thinking about the R in FAIR, reuse. As Leslie pointed out before, we've done a lot of F and A around the FAIR principles, but ultimately it's in reuse where the money hits the road, where you actually get some value out of things; everything else is just the means to an end. Going back to what Arofan Gregory was saying in the keynote yesterday: the goal here is that we really need to achieve machine readability of metadata, and part of that is not putting something at the end of the process trying to describe what we made, but actually surfacing up the actual details. Creating machine-actionable metadata is even more important than machine-readable, and we need to do that by design: we need to surface the design and the operational status of our systems into metadata automatically, not attempt to retrofit it, because we'll never ever get it right and we'll never really understand what we need. Machine-actionable metadata is always going to be a lot richer; we saw things like SSSOM and mappings and that sort of stuff. On the slide I've got a little diagram, which is available in the links I'll leave with the organizers; it's a work in progress, which is why I didn't do slides. I'm going to show you some live examples of how we are approaching thinking about the reuse of vocabularies in applications.

The first thing I want to do is unpack the layers of that. There's the European Interoperability Framework, which is kind of nice: it breaks things down into legal, organizational, semantic and technical interoperability layers. But when you actually start breaking down individual implementations, you end up with something a little like what's on screen, where you have a layer in the middle where systems and infrastructures basically identify the common metadata standards, if you like, that we need for our components to work together; individual systems and applications within that then have to specialize them for their own particular application use. If we think of the digital twins we heard Oren talking about (the examples I'm going to show you are from a European project we're involved in, Iliad, digital twins of the ocean, which is feeding practices into the UN digital twins of the ocean work), the digital twin is sort of like an infrastructure: it's a platform by which you can plug in data sources and models, and all the different applications of the twin will have their own specializations, their own sets of vocabularies and extensions on top of those cores. The infrastructures are really designed to talk to other infrastructures; this is the system-of-systems interoperability space, so they're relatively general patterns. For example, we might say that when we have an observation with an observed property, that observed property is going to be a URI, not a text string, and then there's a requirement on the infrastructure that it is actually a vocabulary that's online somewhere.
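As a concrete illustration of that infrastructure-level rule, here is a minimal rdflib sketch of an observation whose observed property is a URI rather than free text. The SOSA namespace is the real W3C/OGC one; the property URI itself is a placeholder, since which vocabulary it comes from is bound later by the application profile.

```python
# A minimal sketch: the infrastructure only requires "observedProperty is a
# URI into an online vocabulary"; the choice of vocabulary is a later,
# application-level specialization.
from rdflib import Graph, Namespace, URIRef, Literal, BNode
from rdflib.namespace import RDF

SOSA = Namespace("http://www.w3.org/ns/sosa/")  # W3C/OGC observation model

g = Graph()
obs = BNode()
g.add((obs, RDF.type, SOSA.Observation))

# Weak (free text): nothing to resolve or validate.
#   g.add((obs, SOSA.observedProperty, Literal("jellyfish abundance")))

# Strong (a resolvable URI; placeholder address for illustration):
prop = URIRef("https://example.org/vocab/jellyfish-abundance")
g.add((obs, SOSA.observedProperty, prop))
g.add((obs, SOSA.hasSimpleResult, Literal(12)))

print(g.serialize(format="turtle"))
```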
We may have further constraints that the vocabulary actually resolves to a real machine-readable resource, or we may not. So the point is that each of the systems will create constraints that use these more underlying, basic standards, things like what the OGC and the W3C create. The OGC is very much into creating APIs for accessing features and coverages and imagery, catalogs and all those sorts of spatio-temporal things, and then we have fundamental models underneath that. In the W3C space, where we're also active, we have the cross-domain observations and measurements model (which is called SOSA there), the provenance model, DCAT for cataloging, the profiles vocabulary, that sort of stuff, and we have implementation patterns like the general features APIs. So we build up these layers of specialization, constraints and interoperability domains, and the vocabularies tend to get linked in largely at the very end of that process. That's where I specialize and say: in my particular application I want to use this particular vocabulary, and I may have a number of vocabularies that are required to classify things within the context of the system. My digital twin wants me to say, OK, are you about biodiversity or urban health or psychology or cats or whatever it is, but my individual application needs to worry about what the actual variables are that I'm measuring, and so on. Now the digital twin doesn't want to constrain that, so a lot of the vocabulary binding appears at the end of this process, and that's where things have fallen down in the past: we've tried to conflate all those things into one interoperability standard, and we always end up in a situation where how the vocabulary is attached at the end is loosey-goosey. No one has really done it in a standardized way; there's no commonly used ontology for describing how a vocabulary gets attached to a schema (there is one, the RDF Data Cube, which is reasonably useful, but it's not commonly used for this purpose), and there are no offline validation tools for checking that my data is using that vocabulary. So just think about that: I'll do my vocabularies, and yet our tools really don't support it.

Anyway, I'm going to dive down now into some practical details, this concept of emergent vocabulary: how do we make the underlying designs of our systems more machine readable, so that metadata that includes the reuse of vocabularies can emerge up to the top and become actionable? As I said, I didn't prepare slides because this is actually quite dynamic; we're working on a range of projects, and I'm going to show two examples, one from the digital twins of the ocean and one from the cadastral survey data exchange project I'm working on with ICSM in Australia and New Zealand. These are online repositories; the first one is publicly accessible and you can link to it, and it's got a lot of links to the underlying moving parts the OGC is building, testing and supporting. The other one will come online sometime in the next few months in terms of being public, so sometime in the new year it will be more visible. I'm just going to dive backwards: this came from a general discussion about the nature of profiles, which is in this particular repository, and I might point out a couple of things there. These are machine-readable metadata components.
We've got the various project metadata, your typical metadata describing what the nature of it is, but then we've basically got our encodings. We have examples of these things; we have JSON schemas, which give the structure (and that's a bit tricky, because there are lots of different flavors of the JSON Schema language); and we also have SHACL, the Shapes Constraint Language, with which we can express a whole bunch of useful constraints using RDF knowledge graph technology. So the trick is: how do I go from a JSON schema, which has no usefully accessible semantic information, to something where I can actually create some semantic rules? I'm going to very quickly pop the hood and show you that. The second part of it is that it's all about testing, about doing regression testing, because as you build complex applications it's very easy to break things.

Before I dive into the code, just a little anatomy of the profiles. In this particular case we've got a citizen science project for monitoring jellyfish swarms, which sits on top of a generic citizen science profile, which is on top of the generic Iliad digital twin of the ocean model, which sits on top of the OGC API and SOSA (the sensor, observation, sampling model), which sits on top of the lower level of GeoJSON. So this profile stuff is being exercised, and they're separate components that are reusable. Diving into the code, just to show you very quickly what lives in one of these things: we have some metadata, which is not particularly interesting (we'll be working on standardizing that, and I'm thinking about DOIs and the requirements we might want to introduce there). We have a schema, which tells us what the pieces are; this particular schema basically says: I inherit everything about the observable properties requirements from the ocean information model, which is one of the other objects, and my expectation is that I have a particular set of results with a particular set of properties, quantity of jellyfish. That's pretty much all it has to do; it just has to specialize that one last little bit, and that's important, as I'll show you in a second. Then we have a context document, which is the JSON-LD part, which says: our result, density of jellyfish, has this particular URI, and it has a base URI which allows us to turn the values into linked data values. I wanted to highlight this because we're not doing linked data by saying you've got to turn your data into linked data: you don't have to put all your URIs into your data. You can actually identify the namespaces for the codes you use, and you can retrofit it.
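A small sketch of that retrofit idea, using the pyld library: a JSON-LD context is applied to plain JSON so that a short code expands into a vocabulary URI, without the data file itself ever containing URIs. All the example.org addresses are placeholders, not the actual Iliad vocabularies.

```python
# A minimal sketch: plain JSON stays plain; the context maps its codes to
# vocabulary URIs at processing time.
from pyld import jsonld

plain = {"observedProperty": "JellyfishAbundance", "result": 42}

context = {
    "@context": {
        "sosa": "http://www.w3.org/ns/sosa/",
        # "@type": "@vocab" tells the processor to expand string values
        # against the terms defined in this context.
        "observedProperty": {"@id": "sosa:observedProperty", "@type": "@vocab"},
        "JellyfishAbundance": "https://example.org/vocab/jellyfish-abundance",
        "result": "sosa:hasSimpleResult",
    }
}

expanded = jsonld.expand({**context, **plain})
print(expanded)
# "JellyfishAbundance" now expands to the resolvable URI
# https://example.org/vocab/jellyfish-abundance
```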
OK, well, I've only got two minutes left, so I'll skip the 3D cadastre example, which is about handling this for several hundred vocabularies, one of those Git-processing things we can talk about at another date. I'm just going to dive, finally, into what this means in practice when I compile and test this. (It's just a static HTML file; it should not take this long. OK, I don't know why that HTML page took so long.) So this is just the machine-readable compilation, but this is where the fun stuff happens. Here's my plain old JSON, and you see I've got a term here, that the property is this jellyfish abundance property. When I apply that context document to create the semantic version, that's now a URI, and if I go to that URI it links through to the controlled vocabulary on the vocabulary server, which is where you guys kind of know all about all that sort of stuff. So I just wanted to very quickly leave you with the idea that this machine readability of design, connecting up all the dots (schemas, APIs, vocabularies), can be done. And the final piece is that it can be done in a very systematic way: we actually generate validation reports, where we do schema validations and we inherit shapes from all the other profiles we're profiling. So the validation, the constraints, the vocabulary uses, all that stuff is actually inherited from all those underlying pieces: 95 percent of this little jellyfish pilot is inherited. So the complexity of making it machine readable is actually manageable, because I'm not starting from scratch and producing a 2000-page document specifying the semantics of my data model; I'm inheriting 95 percent of that, and I can detect that I'm interoperable with all those underlying component pieces.
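For a feel of what those inherited validation reports involve, here is a minimal sketch with pyshacl: a SHACL shape (standing in for a profile's inherited constraints) checks that every observation's observed property is an IRI. The shapes and data are illustrative, not the actual Iliad profiles.

```python
# A minimal sketch of profile validation: the shape enforces the
# infrastructure rule "observedProperty must be an IRI, not text".
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ex:   <https://example.org/shapes/> .

ex:ObservationShape
    a sh:NodeShape ;
    sh:targetClass sosa:Observation ;
    sh:property [
        sh:path sosa:observedProperty ;
        sh:nodeKind sh:IRI ;   # must be a URI, not a text string
        sh:minCount 1 ;
    ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ex:   <https://example.org/data/> .

ex:obs1 a sosa:Observation ;
    sosa:observedProperty <https://example.org/vocab/jellyfish-abundance> .
""", format="turtle")

conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # True: the data meets the inherited constraint
print(report_text)
```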
So I'll leave it there, just to show that it is actually possible to have our cake and eat it too with this highfalutin machine-readable, actionable metadata, but you can only do it by very, very systematic design, and by testing everything as you go. Thank you.

Hi, my name is Hau Johns, and this presentation is on a proposed standard API for vocab systems. I'll be presenting this on behalf of Nick Car from Kurrawong AI. To begin with, I'll give you a quick outline of what's going to be covered in this presentation. I'll start by looking at the motivation for establishing a standard API for vocabs; I'll cover the history around common vocab systems and give you a view into the APIs they currently provide, which I think will highlight the lack of standards across the vocab ecosystem; then I'll go over what our proposal is and what an implementation might look like; and we'll also cover, of course, the timeframe around the proposal.

To begin with, motivations: there are several reasons for wanting to establish a standard. To start with, vocabularies are very niche and quite often only well understood in their own domains, and even though each vocab system is dealing with similar data models, there is no normal or standard API other systems can use when they're looking to integrate with vocab systems. This is primarily because each vocab system implements a different set of vocab APIs. Finally, searching vocabularies can be performed a number of different ways, and this can depend on the size of the vocab, which languages it uses, and specific tailoring that might be required by a vocab, so that's something else we think should be provided as part of a standard.

When it comes to the current landscape for vocab systems and APIs, if we look at the commonly used vocab systems, there is no standard that has been widely used for vocabs, with the exception of constructing a general SPARQL query. There have been efforts to establish standards, from the RDA vocabulary services interest group, but we're still in a situation where we have no agreed standard. Now I'll just quickly go over some of the current APIs used across these different vocab systems, and I'd like to start with PoolParty. You'll notice that they have a number of web services covering a range of different functionality, and if we look at their vocab-related APIs, you'll see that the types of web services and the naming they use for the vocab-specific functions follow a pattern where you can request a concept scheme and its top concepts, or the child concepts within a concept scheme, or you can request a concept subtree; that gives you a feel for the types of APIs provided by PoolParty. If we compare this to Skosmos, you'll notice many differences. They provide a set of RESTful APIs that give you access to vocabs: you can query a list of vocabularies, view general information about a vocab, or query the broader or narrower concepts in a vocabulary; but already you can see how different the set of APIs is to PoolParty's. Finally, moving on to our VocPrez system: it currently provides a simple, specialized, read-only view of vocabs and their concepts. Again, the differences are clearly visible between the vocab-related APIs of just these three vocab systems. So, in review: we have RESTful SKOS APIs and a range of custom vocabulary APIs, all implemented differently, and of course we have a SPARQL interface into the knowledge graph, but that doesn't give you a simplified, standard API for other applications to use. In addition to the lack of a standard API, we also don't have a standard search mechanism that allows the type of flexibility often required when searching different vocabularies.

On to our proposal. To move forward towards a standard, we see the need for a vocab-specific API that can handle the varying, specialized search methods needed by vocabs. Ideally we'd like to use an existing open standard that is already established in the community. The standard should also allow for human-readable presentation when viewing the API through a browser, and of course we're hoping the API has a chance of one day becoming a standard, so it can simplify integrations with vocabs and encourage wider adoption. To take our proposal a step further: after our experience using APIs from the Open Geospatial Consortium, or OGC (we're in the process of implementing OGC's APIs for geospatial datasets and catalogs in our own Prez system), even though their APIs are designed to handle geospatial data, they're also well suited to handling simpler, non-geospatial data. OGC's Records API for catalogs is an API standard that we could use as the basis for proposing a new OGC terminology API that could serve as a standard API for vocabs. It's important to note that OGC's APIs already support CQL and extensible search mechanisms, which provide the type of flexibility we're looking for with vocabs. As for the complexity of implementing the terminology API, we see it as being more complex than OGC's existing Records API, due to the additional concept relationships, but simpler than OGC's Features API, which handles the more complex types of geospatial relationships. For our Prez system we're implementing the OGC Features API, the Records API and their CQL interface; we see the set of OGC APIs as an important set of standards that we can utilize in our own Prez system. If you look at the API structure defined by the OGC APIs, you'll see that they follow a consistent pattern for their Records and Features APIs, as shown there; these APIs provide access to catalogs, feature collections and their items. For the terminology API, this would follow a similar pattern, where we pass the concept scheme and the concept ID that we're interested in.
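To make the proposed pattern concrete, here is a hypothetical sketch of requests against such a terminology API, assuming it mirrors the OGC Records/Features URL structure described above. None of these endpoints exist yet; the host, paths and parameters are assumptions for illustration.

```python
# A hypothetical sketch of the *proposed* OGC terminology API pattern.
import requests

BASE = "https://example.org/ogcapi"  # placeholder host

# List the concept schemes (vocabularies) in a catalog.
schemes = requests.get(f"{BASE}/catalogs/my-catalog/collections").json()

# Retrieve one concept by concept scheme and concept ID.
concept = requests.get(
    f"{BASE}/catalogs/my-catalog/collections/my-scheme/items/concept-123"
).json()

# A search scoped to one scheme, with a pluggable strategy parameter,
# as floated in the proposal.
hits = requests.get(
    f"{BASE}/catalogs/my-catalog/collections/my-scheme/items",
    params={"q": "silver", "strategy": "prefix"},
).json()
```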
A further useful extension we see is the introduction of an overarching catalog, where we can further arrange vocabularies into catalogs; this would similarly apply to having catalogs of catalogs. Extending this approach again, it can be applied across the OGC APIs to include the feature collections as well, so the addition of the catalog level allows us to use a catalog for any type of collection of information we're looking to organize. For concepts that exist in a vocab, or concepts that exist as part of a collection, the following type of pattern could be introduced, where we include an additional path to support retrieving collections of concepts differently to concepts in a vocab; these types of design considerations are what we're working through at the moment. For search support, one pattern is to apply the search parameters at different points in the API: with this approach, the API scope defines the items which should be considered when performing the search, giving us a simple, logical method for providing a starting point. When it comes to catering for different search strategies, a strategy parameter can be provided along with the search parameters, indicating the method to use when conducting a search; this gives us a very extensible approach to supporting different search strategies. Lastly, our timeline. In our Prez system, for the adoption of OGC APIs, we expect to complete support for the OGC Records API this calendar year, extend our support to the OGC Features API and the CQL implementation in Q1 2024, and finally, the new terminology API implementation is targeted for Q2 2024. So that concludes the presentation; please feel free to send any questions to the email addresses on the slide, or visit our websites for more information about Kurrawong AI or any of our Prez-related tools. Thanks for your time.

Hi everyone, my name is Kailin, I'm from CSIRO, and I work with Michael. I'll be presenting our tool Snapper GoGo, which is really about facilitating collaborative mapping workflows. As part of the Australian Research Data Commons (ARDC) Rosetta project, there was a need for a tool that could assist researchers to map local terminologies to international standard vocabularies and terminologies, so that's really about interoperability, as well as reusable, quality mapping development. Rosetta's objective is to deliver a nationally recognized and utilized shared terminology mapping service, focused on the needs of information systems delivery, researchers and research in Australia. Snap2SNOMED was a tool developed by CSIRO, and it was identified as a potential starting point, but it did need to support mapping to terminologies other than SNOMED CT, so it required a little bit of enhancement.

I might just start off with a little background on the Snap2SNOMED tool. Snap2SNOMED was developed for SNOMED International as part of their commitment to supporting the adoption of SNOMED in national, regional and local implementations. They recognized the use case that many new implementations required the migration of existing codes to be mapped to SNOMED CT, so it could be used natively in new software.
To support these implementers, SNOMED International commissioned us to develop an open-source tool for mapping local terminologies to SNOMED CT, and that is how Snap2SNOMED came about. Their use case was really that requirement to map existing legacy or proprietary code sets to SNOMED, for migration of systems, for interoperability, for reporting, for clinical decision support, and other similar use cases. What we wanted was an open-source tool that supports implementation by allowing users to collaboratively create and maintain simple maps to SNOMED, using a guided approach to help users. Their vision was really a hosted tool for members and their stakeholders, so that they could create those maps. There was a group of different stakeholders (SNOMED International, national release centres, and then organizations within those countries), so there were a few different needs there, and a large number of product features that I think are standard for anyone creating any kind of map.

The development was run as a collaborative, iterative project; it had many different stakeholders with many different requirements. There was an international user group with many of the SNOMED member countries, and that user group talked about what their mapping requirements were, and I think they extend out to all sorts of groups that do mapping. They had all sorts of different users within their mapping teams: sometimes BAs, sometimes clinicians, sometimes researchers, a really wide variety. They had different-sized teams: sometimes only one resource, sometimes up to 10 or 15. They all had different workflows and different business rules they needed to adhere to. The development of this tool was really about trying to balance all those different requirements, and to do that we needed to offer a flexible workflow while still having guide rails, and provide an intuitive design that allows different types of users to use it without too much trouble.

Snap2SNOMED's key features were really the ability to create simple maps to SNOMED CT. It needed to be online and easily accessible; SNOMED International wanted to keep the user list, so you do need a login to their system; and it has an intuitive UI to make browsing and mapping of SNOMED CT easy and efficient. The other key feature I want to talk about today is the collaborative workflow component. At the moment a lot of people use things like Excel, and there are a few proprietary tools out there (we have one called Snapper), but none of those easily allow collaboration, teams of users working together to create a map, rather than sending spreadsheets or lists back and forth for author review. Our tool allows for single-author mapping and dual-author mapping, and I'll go into those workflows shortly. We also allow for an optional reviewer; it uses task-based work, where work is assigned to people or can be self-assigned; and there's user role-based control per map: within the tool a person can be assigned a type of membership, and that membership will give them access to the map but only allow them to do certain things in there. So there's a little bit of control there, which is often quite important when you're mapping health data.
The single-author workflow is where you have a set of source codes to be mapped; each source code can be assigned to a single author, or mapper, and once that mapper has authored the row, it can then optionally be assigned to a single reviewer to review it. This allows at least two people to work on each source code: you could have a group of mappers working across the map, where each row is mapped by one mapper and then one reviewer if you need it. The other type of workflow is the dual-author workflow. This is where two authors map a single source code independently, the results of their mapping get compared, and then, depending on whether there's a conflict or not, it may go to another person. So you can see here: the source code is mapped by mapper one and mapper two, blinded between them, and only after they both finish mapping is it unblinded; there's a comparison to determine whether there are any conflicts. If there are no conflicts, it can be completed there, or it can optionally go to a reviewer. If there is a conflict, it goes to a third person, the reconciler, who can adjudicate and decide which is the appropriate mapping. This is sometimes useful if you have a clinician who is only available part of the time and can adjudicate the difficult things to map; it's a way to help manage your resources. After the reconciler does their process, you can optionally also add a review step, depending on your business rules.
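As a rough sketch of the dual-author state machine just described (an assumed structure for illustration, not Snap2SNOMED's actual code):

```python
# A minimal sketch: two blinded mappings per source code are compared, and
# only conflicting rows are escalated to a reconciler.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DualMapRow:
    source_code: str
    mapper_one_target: Optional[str] = None   # blinded from mapper two
    mapper_two_target: Optional[str] = None   # blinded from mapper one
    reconciled_target: Optional[str] = None

    def needs_reconciliation(self) -> bool:
        # Both authors must finish before the row is unblinded and compared.
        both_done = self.mapper_one_target and self.mapper_two_target
        return bool(both_done) and self.mapper_one_target != self.mapper_two_target

    def final_target(self) -> Optional[str]:
        if self.reconciled_target:
            return self.reconciled_target      # reconciler adjudicated
        if self.mapper_one_target == self.mapper_two_target:
            return self.mapper_one_target      # agreement: complete (or optional review)
        return None                            # still in conflict

row = DualMapRow("LOCAL-001", "73211009", "44054006")
assert row.needs_reconciliation()              # conflict: goes to a third person
row.reconciled_target = "44054006"
print(row.final_target())
```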
Another key feature of the Snap2SNOMED tool is the ability to create new map versions, with an additional ability to do map maintenance as well. Where there may be targets that are no longer in scope, because they've been made inactive or are no longer part of your original target code scope, it can help you manage and identify those, and then provide suggestions on replacements. Snap2SNOMED was able to leverage SNOMED CT's terminology features by suggesting replacements using a historical association reference set. This diagram quickly shows that maintenance workflow: you have a map originally mapped to the July 2019 edition; when you migrate it to the September 2023 edition, one of the targets is identified as out of scope, and SNOMED's historical association reference set proposes replacements for the map target, which the tool displays to the user in a simple fashion, so it's easy to decide what action to take. Other features of the mapping tool include automated mapping suggestions. These can be applied as a bulk operation across a whole map, for a selection, or for single source terms. At the moment this leverages the FHIR terminology service (CSIRO's Ontoserver) as well as SNOMED CT's terminology features; we don't have any custom algorithms in it at the moment, but we are looking at exploring what could be done there. It does, obviously, allow you to import your own code sets, but it also allows you to import any maps you already have, so you can maintain them moving forward, and it allows you to export to JSON, CSV, TSV and Excel formats. If you are interested in using Snap2SNOMED, its scope is to map to SNOMED only, but it is available free for users from SNOMED member countries, and Australia is one. There's a hosted service available at this address here, but it is also available as open source here on GitHub, with documentation associated with it. The only thing with the hosted service is that you need a SNOMED International Confluence account; it's free, and you just need to apply to SNOMED International.

And that brings us to Snapper GoGo. (Oh, just a two-minute warning, thank you.) We've taken Snap2SNOMED and enhanced it for the ARDC; it's essentially a fork of the SNOMED International instance. It is still open source; it can now map any code to any code; and it uses AAF authentication, so it taps into existing user authentication systems. We're still working through what the requirements are from the ARDC stakeholders, beyond what is there with Snap2SNOMED and the extensions we've already done. We are also only supporting the single-author workflow at the moment, and we want to see who's interested in having that dual authoring, because it's a really big requirement in the clinical space, but it may not be so outside of that. And here's just a screenshot of what it looks like: you can see the target and the scope in the top left corner; you can add multiple versions; we have a tasking workflow on the right; down the table are your imported source codes; and on the right is your target. That's all I wanted to show, thank you.

Hello everyone, checking in: can people hear me, can people see my screen? I see thumbs up, wonderful. All right, thank you so much for having me. Yes, my name is Keo Paulson, and I'm going to be talking about advancing HIVE-4-MAT's capacity to leverage multiple vocabularies. If you want to talk about me or this presentation in the future, I use they/them pronouns. I worked on this presentation with my colleagues Jane Greenberg and Scott McClellan at the Metadata Research Center at Drexel University. What I'm going to be talking about is why we called this platform HIVE, then what HIVE-4-MAT, an iteration of HIVE, is. We'll talk about how HIVE-4-MAT works: navigating vocabularies, searching within vocabularies, and indexing articles according to vocabularies. Folks may notice that it appears very similar to some of the vocabulary systems mentioned by the previous presenters, or to BioPortal or MatPortal, which are the ones I'm more familiar with here in the US. And I'll talk about some of the recent advancements I have been working on specifically, and then what we hope to do next with this little platform of ours.

First, why HIVE? HIVE stands for Helping Interdisciplinary Vocabulary Engineering. What that ends up meaning is that instead of being a repository that maintainers of vocabularies have to sign into and give their vocabularies to, our strategy is to go and find lots of vocabularies, and those can be in different places. So here, for me in the US: for the Library of Congress Subject Headings the RDF/XML is here; for the Asthma Ontology the RDF is in this location; for the USGS you can find the RDF on a different US government website; and the Unified Astronomy Thesaurus has its RDF on GitHub. Instead of having to find and explore all of those on their respective websites, we've downloaded them and put them all in one place.
So you can see the Asthma Ontology, the Library of Congress Subject Headings, USGS and the Unified Astronomy Thesaurus: not a platform for building vocabularies, but truly for crawling for vocabularies and putting them all in one place. There are pros and cons to this approach. The pros are that it allows some flexibility to pick and choose the vocabularies you want to look through; because we are crawling for these vocabularies, we don't need as much marketing to ask people to submit their vocabularies in order to be helpful for searching across them; and, based on the software we have, we can pick different collections of vocabularies for different purposes, which we'll see with HIVE-4-MAT, a HIVE for material sciences. The cons are that, at the moment (as people might imagine, having heard some of our previous presenters on how there's no common API), downloading each of those and normalizing them is a very manual process; and since they come from different sources, the schemas of the various RDF files can be ever so slightly different to normalize into SKOS. And because we don't do as much marketing, we're not as well known in the academic space or internationally, which we're trying to fix. (Hello, thanks for having me.)

So then, what is HIVE-4-MAT? HIVE-4-MAT uses this HIVE technology, but for material sciences, or vocabularies related to material science. As a rundown: in material science, the vocabularies in that space have four major facets they can be about: the structure of a material, the properties of that material, the performance of that material under various tests, and how the material is processed to obtain it, all defined through characterization. This seemed like a helpful new iteration of HIVE because, when we talked with our colleagues in the material science domain, they let us know that the literature review process was very time consuming. With this little platform, people are able to see which vocabularies and which terms might be relevant, use the hierarchies to find other adjacent papers, and take their own paper and figure out which vocabularies might already exist and how to use keywords and those kinds of things for their own papers.

So let's take a look at the functionality. We have navigation: exploring the SKOS hierarchies, looking at the notes and the URI, and viewing all of that in different metadata formats. This is the vanilla version of HIVE, and here is HIVE-4-MAT, which looks very similar. So, Common Core Ontologies: entity, continuant, independent continuant, material entity; let's do object, organism; oh, an animal, a person, perfect, I'm a person. You can look at the URI here (I was searching; our presenters are very cool), and you can see that the URI takes us to the website, which in this case is just the raw XML. You can see the subclass and the superclass, where we see the broader of "animal" and the narrower of these different things, and you can see the different formats (JSON-LD, linked data, SKOS RDF, Dublin Core XML) and copy those fairly easily as well.
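For a sense of what sits behind that navigation, here is a minimal rdflib sketch of reading a crawled vocabulary as SKOS and walking its labels and broader/narrower links. The file name is a placeholder, and this is an illustration of the general approach, not HIVE's actual ingest code.

```python
# A minimal sketch: load crawled RDF and walk its SKOS structure.
from rdflib import Graph
from rdflib.namespace import SKOS

g = Graph()
g.parse("vocabulary.rdf")  # crawled RDF/XML or Turtle; rdflib guesses the format

for concept, label in g.subject_objects(SKOS.prefLabel):
    print(concept, label)
    for broader in g.objects(concept, SKOS.broader):
        print("  broader:", broader)
    for narrower in g.objects(concept, SKOS.narrower):
        print("  narrower:", narrower)
```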
Then there is vocabulary search: we can go into the search, pick from the different vocabularies, and choose which ones we want to search across. The example I have here is silver, and you can see, oh, BWMD has silver, there is silver atom from USGS, and you can see the alternative labels, the notes, what it is a subclass of, and again all of these different metadata formats.

We also have article indexing. If I take the indexer here, I'm going to use this Wikipedia article about silver; I take the URL, and you can also do this for a text file or even a PDF, online or on your desktop. I choose the same vocabularies as before, use the RAKE algorithm (rapid automatic keyword extraction) to do keyword extraction from the article, index it, and within a second we can see things that we recognize, silver, silver atom, but also single crystal and dissolved metals, and again we can see the notes about all those things.

So how is this different from other repositories like BioPortal, or MatPortal, which is the equivalent for materials science? I'll go here to MatPortal, which has this annotator function that does a similar thing to the indexing. This is the first paragraph from that Wikipedia article on silver, and in this first paragraph the word silver shows up ten times, not including silverware. The annotator found silver from the MAT ontology, like HIVE did, but because silver showed up ten different times in the article, it repeats that silver match ten times, and then ten times for BWMD-domain, and then again for BWMD-mid. So it's hard to see which terms were clearly found, and it's difficult to navigate the way it is presented there. And because the annotator finds each and every single match, it is limited to 500 words maximum, whereas HIVE, because we're doing keyword extraction, can handle basically any size; of course, if an article got really big, like 50 or 100 pages, it would slow down, but theoretically it can do any size.

Some of the recent advancements that I've been working on, and am very excited about: I worked on the keyword alignment algorithm that aligns terms from an article to keywords in the various vocabularies. Before the algorithm, we were having trouble pulling any keywords for a corpus we were using related to materials science; afterwards, we were able to pull keywords with 53 percent relevance, which was an improvement. It is also faster: now, even if I include LCSH, which has over 400,000 terms, and start indexing, it should resolve within ten seconds, as I'm doing here. There we go; you can see here we have all of these things from silver, including "silver in literature". And that brings me to the next point: the phrase "silver in literature" wasn't mentioned verbatim in the silver Wikipedia article, but the algorithm is a little more tolerant. It can accept both "x-ray" and "x ray" in an article, even if words are switched around, like "diffusion x-ray" versus "x-ray diffusion"; it uses the concept of string distance, or string similarity, to still find things that are close enough.
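To picture that tolerance, here is a toy version of order- and hyphen-insensitive matching using only Python's standard library. It illustrates the idea of string similarity; it is not HIVE-4-MAT's actual alignment algorithm:

    # Toy illustration of order- and hyphen-tolerant term matching.
    # Not the actual HIVE-4-MAT alignment algorithm.
    from difflib import SequenceMatcher

    def normalize(term: str) -> str:
        # lower-case, treat hyphens as spaces, sort tokens so word order doesn't matter
        tokens = term.lower().replace("-", " ").split()
        return " ".join(sorted(tokens))

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

    print(similarity("x-ray diffraction", "diffraction x ray"))  # 1.0 after normalization
    print(similarity("x-ray", "xray"))                           # close, but below 1.0
    MATCH_THRESHOLD = 0.85   # hypothetical cut-off for "close enough"
    print(similarity("silver atom", "silver atoms") >= MATCH_THRESHOLD)  # True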
And some ambitions for the future of what we're hoping to do. We're hoping to make it faster: I mentioned before that it's a very manual process, and we hope to automate the process of updating vocabularies and make it easier to plug and play a new vocabulary. The 53 percent relevance is great, but being able to optimize that even more, based on some recent results, would be great too. We'd also like to have more collections of vocabularies, say HIVE for museums, or HIVE for literature, or HIVE for insert-your-discipline-here. And as we work on this, we will likely end up building some sort of library to ingest a vocabulary, thesaurus, ontology or taxonomy and parse it into a more normalized SKOS with minimal input, which seems like something that would be very helpful in this space. So thank you very much, that's everything I had; if you have more questions beyond right now, send them to mrc.metadata@drexel.edu. Yes, this work is supported by the NSF. Thanks so much, everyone.

My name is Gonzaga Siravahenda, and I'm currently a PhD student at Babeș-Bolyai University in Cluj-Napoca, Romania. The subject I want to share with you today is knowledge-based economy vocabularies: a reflection on graduate employability. Among today's mega-trends, there is a rising usage of concepts such as employability, skills, capitals, and decent and precarious work in a rapidly changing world of work, while the context of employability in a globally competitive knowledge economy is well recognized. The Romanian labor market is now home to many outsourcing companies, and there are many complex issues that young university graduates continue to face when transitioning from university to work. Employability, frankly speaking, is a complex, multi-dimensional process, which is why some consider the concept as sometimes lacking clarity and specificity of meaning. Most vocabularies used to describe employability focus on terms related to job skills, qualifications, industry-specific knowledge, and personal attributes valued in the job market.

The aim of this presentation is to demystify knowledge-based economy vocabularies, mostly employability and skills. This work seeks to understand and explain what employability means to the main social actors; by social actors I mean employees in managerial positions, such as team leads and HR professionals, and front-line employees working as customer support representatives in Romania. I also want to shed light on how those social actors describe the concept of employability.

The methodology I used to carry out this research is a qualitative design with reflexive thematic analysis and a critical realism approach, based on a two-year ethnographic study which I undertook through 2022. I used participant observation and semi-structured and unstructured interview methods to collect the data, which was then analyzed with NVivo. As mentioned, the study is located in Cluj-Napoca, Romania; Romania is a country in Eastern Europe, and Cluj-Napoca is the second biggest city in Romania. For short demographic data on the participants: I had 21 customer support representatives, young people working as front-line employees, and eight team leads or managers, all working in outsourcing corporations.

So, the key findings. Participants described employability as owning the knowledge, skills and attributes required in the world of work.
The second most common description rated employability as knowing how to accomplish job responsibilities. The last one referred to employability as the ability and the willingness to search for and maintain employment.

I have chosen some specific quotes from the interviews I did, where participants described employability as the essential skills that a graduate has and the capability to use them, such as communication, organization, commercial awareness and personal skills. Tina, 25 years old, a customer support representative, described employability as the competence to maintain an employed position and get the job you might like: employability is the possession of aptitudes that help acquire a job and remain productive in a world of significant change. Employability is also the possession of the proper mindset to learn and, guess what, to improve and to do the best you can. Employability encompasses the graduate knowledge, skills and understanding that indicate their value.

Turning to a discussion of the results, we can start by saying that employability is about possessing the necessary skills, attributes and knowledge. Employability is also a process: it is something dynamic that is elaborated over time. And employability depends on your position in the labor market: if there are many graduates and few jobs, the competition will be tough, so your position will depend on your level of studies, your experience, your possessions and so on. We have seen that employability rests on employee and employer expectations, with conceptualizations of employability considering the employable graduate who might fulfil prior employer demand. These types of descriptions coincide with employability seen through a political lens: employability is also a policy concept. Participants described employability interchangeably with graduate work readiness and graduate attributes; especially in the outsourcing companies, soft skills remain more important for describing an employable graduate. Employability is extensively described at the micro, individual level and much less at the macro level of labor market supply and demand characteristics. Here we see that whenever we discuss employability, many people tend to base it only on the knowledge and skills possessed by an employee, but few discuss macro characteristics such as the composition of the labor market, the competition within it, and the policies available for integrating young graduates into the labor market.

We found that the employability concept is a multi-dimensional phenomenon, simultaneously confusing and misunderstood. Why? Because it is theoretically a good concept, but it is ill-defined, sometimes exaggerated by the press and latched onto by employers and numerous policy makers. For instance, some equate employability with employment, but it is certainly possible to be employable yet unemployed or underemployed. This sporadic interpretation of employability has practically impacted employability practice. As concluding remarks: employability is widely regarded as the ability to obtain, do and search for employment; being employable may require skills, attributes and knowledge, but, as I mentioned, it is possible to be employable yet unemployed. Employability is among the most misused concepts in the knowledge-based economy.
As I mentioned before, employability is widely regarded as a policy concept, and the desire to have a knowledgeable, positive society makes the concept of employability politically motivated. Employability terminology tends to be strongly defined by the nature and context of employment: job markets and specific industries have their own terminology, education and training have their own terminology, and so does entrepreneurship. Employability terminology can also be defined at micro, meso and macro levels. Employability is a complex notion that requires continuous redefinition, and it is a lifelong career process that requires the interconnection of different stakeholders for its usefulness. Thank you for having me; as I mentioned, I'm Gonzaga Siravahenda, and I would welcome your questions and recommendations. Thank you.

Okay, thanks very much Megan, and thanks everyone. In some ways I wish I could be there in person, given it's a big event for me personally as well, but I'm joining you from the lovely South Island of New Zealand, in fact. I'm talking today about a continuation of work from last year's symposium, picking up on work we've been doing since then, really looking at how we can drive forward an agenda for vocabularies and the vocabulary ecosystem in Australia. I'm speaking here on behalf of many people, including a number of your hosts over the last couple of days: Natalia, Megan, Dougie Boyle, Leslie, Kieran and a host of others. This is certainly a group effort, and a community effort that we're looking to engage everyone with. I want to bring everyone up to speed on where we've come to with the roadmap itself and its imminent release.

So here is what I want to cover today; I've got more slides than I intend to present, but I'm going to give an overview of what we're trying to achieve with the roadmap: why we want a roadmap, our idea of the vocabulary ecosystem in Australia and how it works (I might just turn my video off, I'm getting an unstable signal, so I'll leave video off), the priorities for the roadmap, the next steps, and the participation and involvement we'd like from you going forward.

Okay. We've included a definition of vocabularies, broadly, in the roadmap document. You may agree or disagree with it; what we're trying to do is cover the gamut of what we might consider vocabularies: ontologies, thesauri, controlled lists and so on, really trying to think about the range of things that might sit within a vocabulary ecosystem. This definition actually comes from the Getty, who have developed one of the major arts and humanities vocabularies, but it's a working definition for us. The point is that what we're interested in is how we can coordinate the community of vocabulary users and infrastructure providers here in Australia and internationally, and for this we're drawing on the idea of developing a roadmap for a whole vocabulary ecosystem:
not what anyone should do with one specific vocabulary, but how we might coordinate our ecosystem more generally. We have the challenge here in Australia that our ecosystem is diverse and largely uncoordinated; we don't have a particularly clear mechanism for coordinating activities between folks doing different things, and that includes coordinating with our international colleagues, a number of whom are presenting in this session and throughout this workshop. That is the challenge the roadmap itself is really trying to address. So we're trying to frame up what a functional and effective vocabulary ecosystem might actually look like, what its constituent parts are, and what goals we can set in the short, medium and longer term that we might want to achieve as a vocabularies community, to better coordinate and progress our ecosystem of vocabularies and vocabulary services. As much as this is oriented particularly around FAIR vocabularies, we have difficulty even finding many of the vocabularies we might want to use, and knowing what makes for a good vocabulary, let alone thinking about accessibility, interoperability and so forth. So it's this diversity and lack of coordination that we're really trying to emphasize.

So why then a roadmap? As I mentioned, this came out of last year's vocabularies symposium; that's where the work started. Following the symposium, which was the public event, we held a workshop of about 30 colleagues with strong expertise and interest in vocabularies, including vocabulary developers, users, infrastructure providers and the like, to explore how the ecosystem might be made more sustainable and more broadly used; they ranged from government to academic to some of the consultant and developer communities here in Australia. A subset of this group has met on a regular basis since that foundation workshop, with the aim of establishing a roadmap for the future of vocabularies in Australia and, more broadly, internationally, because many of us are tied into international communities and need to be able to facilitate access to those communities effectively. So the roadmap is intended to be our shared view of what a vocabulary ecosystem might look like, the key elements of that ecosystem, and the means for achieving that future state; it's a shared vision we're looking to bring others on board with as well.

So we take the idea of an ecosystem and ask how we coordinate the ecosystem as a whole. A vocabulary in and of itself is an inherently useful thing because it helps us organize knowledge and terminology, and a lot of our users and vocabulary developers are actually not that interested in the technology provision itself; it's the coordination, management, expression and instantiation of knowledge that drives their interest, and the use of that knowledge in a consistent way between users, organizations and systems. The vocabularies, however, exist within a broader ecosystem of usage, which we can think of as ranging from a simple controlled list within a single organization right through to a commonly held and used terminology across an increasingly broad stakeholder group.
We take this idea from colleagues who have been working in the environmental sciences and, before that, from some work done by McCreary. More recently, there is a paper about to be published in Scientific Data which looks, from the perspective of the semantic ladder, at the types of vocabularies, ontologies and the like that we'd be interested in coordinating. We can start right down at the bottom, with the simple controlled vocabulary, and move up through the complex forms of ontologies that get used in some domains; ocean sciences, for example, is an area where they've been heavily used. They might be more or less expressive, more or less complex, more or less heavily governed in their operations. We need to understand the gamut of those vocabularies and allow different vocabulary providers to situate themselves within the ecosystem as a whole.

So what do we think is within a vocabulary ecosystem? NISO, the national information standards organization in the US, has been thinking about this as well, so we're tapping into a coordinated international activity here. They have a nice way of describing it: a broadly distributed ecosystem of vocabulary creation, maintenance and use, based on a commonly agreed URI infrastructure to support distribution of terms to consumers based on their explicit preferences. So they've been thinking about what's required and about some of the elements of the ecosystem. We've picked this up and tried to articulate a little more explicitly what we think needs to be coordinated here in Australia: we articulate half a dozen elements of what we see as the major parts of the ecosystem that we'd like to bring together, so we can compare and coordinate activities within the community.

(Steve.) Thanks Megan, yep. So I'm going to work my way quickly through these and give you a highlight of where we're heading next. The six elements we've identified, and I'll summarize them quickly, are stakeholders, governance, skills, standards, infrastructure and tools, and policy. What we're saying is that there is a maturity that comes as you develop a vocabulary, and your reliance upon, and integration into, the ecosystem increases as you progress through that development. Early on, it's really about instantiating and representing knowledge effectively, so the priority is subject matter skills; but as you grow, first in the use and then in the development, governance and maintenance of your vocabulary, you increasingly engage with the different parts of the ecosystem in more and more complex ways, and the demands placed on how people access, use and seek to integrate with your vocabulary become increasingly complex as well.

In the paper we've been developing, which we hope to release soon, we go through and articulate firstly a vision and a mission, focusing on the next ten years: the utility of vocabularies is well understood, and a burgeoning ecosystem of vocabularies and services forms a key foundation of the data assets we use here in Australia.
Our mission is to focus on furthering the consistent and sustainable development, implementation and use of vocabularies across domains, to allow us to solve real-world problems. We also provide some basic principles in the paper itself, with some examples up on the screen here: focusing on vocabularies as a representation of knowledge; providing the means for people to support the transmission of vocabularies without necessarily having to be expert in the technologies and the management and use of vocabularies themselves. We're looking increasingly to move from human-oriented to machine-interoperable services: humans are not the immediate consumer here, it's actually machines, and the question is how we facilitate transmission between machines effectively, so that humans become the end consumer but not the point of contact for the vocabularies themselves. There are about nine principles; the last one, which you can see down the bottom there, is about providing an ecosystem that builds and maintains trust in the vocabularies people want to use, so they can find them effectively, access them and interoperate with them in a suitable way.

We then provide definitions of each of the components of the ecosystem; I've got a summary, and the slides are online so you can flick through what those ecosystem elements are. As I said: standards, vocabularies, governance, and infrastructure and technology supporting the sets of demands that organizations and users place upon the system; the stakeholders who have those demands; and then policy, which provides the framework under which vocabularies can be used. We conclude with a series of directions, setting short-, medium- and long-term goals for each of those components of the ecosystem, and this is where we're really looking for input from the community. What we've tried to articulate, from each of the subgroups we've had, is a set of short-, medium- and long-term goals; we'd like your input on those, to get a sense of how you feel about them and whether we've captured the right information. You'll see there are three or four sets of goals for each of those timeframes.

Looking to the future: we're hoping to publish the first draft of the roadmap probably in the next month. We'd hoped to have it ready for this workshop; we're not quite there, but we're pretty close. When we publish it, we welcome feedback from the community: have we covered the right ground in terms of what you think the constituents of our ecosystem are; do you need clarity on any of the definitions; do you have any particular priorities for the vocabulary ecosystem that you'd like to see furthered; and who do you think this roadmap ought to be directed towards, looking to the future? So that's an update on where we stand; you can look forward to seeing the roadmap in the very near future, and we'll circulate it to this community, the attendees of this conference and of last year's symposium, for your feedback. And that's it from me, Megan.

Thank you for the opportunity to present
our thoughts on ontologies in the age of large language models, and on navigating the political dimensions in industry applications. Before we dive into the subject matter, we would like to acknowledge the Whadjuk people as the traditional owners and custodians of the area of Boorloo (Perth), on which the Curtin and UWA Perth campuses are located, and pay our respects to their elders and senior knowledge holders, past, present and those following in their footsteps.

Overcoming the ambiguity of language is a centuries-old quest. The flood of information and the promise of machine-actionable data and services have made this quest even more urgent and attractive; however, attempts at the semantic web have only seen low levels of adoption. The vision of the semantic web was formulated early in the internet age, once the web became too large to be mapped in guidebooks. The W3C developed a technical framework for the semantic web, but applications were lagging and some implementations never materialized. The semantic applications we see today were driven mainly by search engine operators. But when you look at the right-hand end of the Google Trends graph at the bottom of this slide, you can see an uptick of interest in ontologies, marked in blue, and knowledge graphs, marked in red. Why is there this sudden renewed interest in knowledge organization?

Industrial facility assets are documented in huge volumes of documents. How can they be consumed and organized? By relentless reading, building business logic manually, constructing knowledge graphs? Companies have worked on enterprise data systems, but controlled and uncontrolled systems continue to exist in parallel. Centralization of data in the cloud made data more available, but it did not add any further degree of order: current systems still suffer from a limited use of agreed semantic elements, and systems are commonly bespoke, closed and company-specific.

Going back to the internet: the internet is full of unstructured text, multilingual and distributed, yet the internet works; we can find things on the internet. Key elements that help us are the Resource Description Framework and schema.org, which provide context to the data in web pages. Other factors driving adoption are that the technology should be easy to implement and that data are available. As an example, services based on location data had a difficult start: government survey agencies were asked to sell data for profit, and implementing services based on geolocated data was error-prone. A small but elegant simplification in the encoding made working with geolocated data much easier; at the same time, governments realized that there would be more economic benefit from open geolocated data than from any attempt at selling it. These changes opened the doors to the large variety of location-based services we use today.
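The talk does not name that encoding simplification, but as a hypothetical illustration of what a simple, self-describing encoding buys you, here is a GeoJSON-style record handled with nothing more than Python's standard library:

    # Hypothetical illustration: a minimal, self-describing location record
    # (GeoJSON-style; not a specific standard named by the speakers).
    import json

    place = {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [115.86, -31.95]},  # lon, lat (Perth)
        "properties": {"name": "Perth"},
    }

    encoded = json.dumps(place)            # trivially serialized for any service
    decoded = json.loads(encoded)          # trivially parsed by any consumer
    lon, lat = decoded["geometry"]["coordinates"]
    print(f"{decoded['properties']['name']}: lat={lat}, lon={lon}")

The point is that when the encoding is this easy to produce and consume, the barrier to building services on top of the data largely disappears.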
I've shown this graph before and pointed out the recent rise in interest in ontologies and knowledge graphs. Search engines have shown the usefulness of knowledge graphs for providing context to search queries. Knowledge graphs are now also seen as a way to improve the outputs of generative AI by acting as guardrails; at the same time, large language models can be used to extract ontologies from large text corpora.

The elegant hack for using knowledge models in industrial applications is to recognize layers of scope and delegated responsibility. These graphics show how IKEA delegates the responsibility for providing and curating parts of their knowledge graph: only the top of the pyramid, the high-level concepts, is centrally curated; all further, more detailed categories are delegated to agents in the supply chain. The key aspects here are: leave existing data where it is; treat relations as first-class citizens; give each data item a clickable address as a URL; and adhere to the same abstract concepts as used on the web, plus business-specific concepts, using ontologies.

Standardization can open the door to new solutions based on a common standard: early proprietary internet protocols were soon replaced by common open protocols, and the W3C and the Open Geospatial Consortium are examples of organizations working on common open protocols that enable an entire online economy. But standardization can also be a tool for exclusion. It can close the door to alternative solutions based on a competing standard; it can build a moat around a company to protect its competitive advantage. As an example, we have competing standards separating the Android and iOS worlds. Similarly, information structures can be used as tools of exclusion: the standardization of high-level ontologies for industrial applications has the potential to exclude alternative models, and incompatible structures can lead to exclusionary structures. We already see manufacturers in the agricultural sector trying to build moats around their products. There are lessons we can learn from the second wave of the semantic web: as the analogy of geolocated data shows, the political push towards open data and key technical simplifications of geospatial data protocols opened the doors to widespread use of geolocated data and created a vast array of new opportunities for geolocated services.

To conclude: the current interest by industry in ontologies is driven by the need to extract knowledge from vast troves of unstructured data. Ontologies can act as guardrails for generative AI to improve the quality of results, and large language models can in turn be used to extract ontologies from large text corpora. Some industry sectors are starting to push for standardization of high-level ontologies; the risk of these standardization efforts, at this stage, is that they could act as barriers to alternative solutions. The example of geolocated data shows how policy and technology development can come together to open new opportunities for products and services through open data and open standards. Thank you for your interest in ontologies in the age of large language models and navigating the political dimensions in industry applications. For more information about technical language processing, please see our website at www.maintenance.org.au.
Hello everyone, can you confirm that you see my screen? (Yes, we can see you.) Okay, let's go. My name is Clément Jonquet; I'm from the University of Montpellier and INRAE in France, and I've been working with ontology repositories and semantic web technologies for a few years now. I'm presenting a paper about ontology repositories, which we also call semantic artifact catalogues, an expression coming up more and more in Europe to encompass all the terms behind vocabularies, terminologies and thesauri; this term "semantic artifact" has been proposed in the context of EOSC, one of the funding programs for this work. I had the chance to present this paper at the International Semantic Web Conference about a week ago, and I'm giving you a version of that presentation with the same ideas.

I wanted to say a word about the funding of this project, just to give you the names of the projects behind it; then I'll include a few elements on ontology repositories, present the OntoPortal Alliance, the group and community developing this work on ontology repositories, and then I have a few more slides about the portal we developed, called AgroPortal. Part of this work is funded by a national project we have in France called D2KAB, for Data to Knowledge in Agronomy and Biodiversity, in which we are developing the AgroPortal platform, an instance of OntoPortal for agri-food, as well as different knowledge graphs in the domains of agronomy and biodiversity, with multiple partners in France. We are also funding part of this work in the context of a project called FAIR-IMPACT, a project on the European Open Science Cloud roadmap, in which we have a dedicated work package on metadata and ontologies. There we try to improve or create a framework in which semantic artifacts can be used by different communities, along with the tools, methods and governance for relying on those vocabularies and ontologies; the term "semantic artifact" is also being established there. The European Open Science Cloud is a very big program from the European Commission; at the core of this vision are the notion of open science and the adoption of the FAIR principles, and of course, to adopt the FAIR principles, semantic vocabularies, semantic artifacts, are very important.

Let me give you a few elements related to ontology repositories. You all know that ontologies are spread out: they come in different formats, different sizes, different structures; there are increasing numbers of them for overlapping domains; they come in different representation languages, with different semantics, and of course they overlap. This is an illustration that really motivates us: building ontology repositories, or semantic artifact catalogues as we also call them, really helps us address the FAIR principles. The previous talks mentioned that vision of FAIR vocabularies, and the symposium is very much about that. For a few years we have argued that you don't have FAIR data if you don't have FAIR vocabularies, and developing ontology repositories is one of the key steps in making those vocabularies FAIR: you can find them, access them, make them interoperable and reuse them. The screenshots you see on these slides are taken from the platform we develop based on the OntoPortal technology I'm mentioning here.

Originally, and I'm sure many of you are very familiar with BioPortal, we had that platform, nicknamed the one-stop shop for biomedical ontologies, about 15 years ago now, a little more. The idea was to build a repository for biomedical ontologies, make them findable and accessible, and serve them: not only list their metadata in a catalogue, but really offer services for their content. So we indexed the content and provided a search service; people can browse the ontologies, discover their content, and see the alignments between them, and so on.
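To give a concrete feel for "services for their content", here is a sketch of a term search against an OntoPortal-style REST API. The endpoint and parameter names follow BioPortal's public conventions, but treat the details as assumptions and check the target portal's documentation before relying on them:

    # Sketch: searching for a term across ontologies via an OntoPortal-style REST API.
    # Endpoint and parameter names follow BioPortal's conventions; verify against
    # the documentation of the specific portal (AgroPortal has an equivalent service).
    import requests

    API_ROOT = "https://data.bioontology.org"
    API_KEY = "YOUR-API-KEY"   # free on registration

    resp = requests.get(
        f"{API_ROOT}/search",
        params={"q": "wheat", "apikey": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()

    for result in resp.json().get("collection", [])[:5]:
        # each hit carries its label and the URI of the matched class or concept
        print(result.get("prefLabel"), "->", result.get("@id"))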
The impact that BioPortal had on the community was quite big. Some of you may remember the Linked Open Data cloud; in the version from around 2015, all of the resources published and produced by BioPortal ended up in the cloud, and we literally changed the familiar face of that Linked Open Data cloud. That is just to illustrate the impact that working on such a project for years, gathering all of the vocabularies and ontologies in a domain, can have on the web of shared linked data.

So we built an architecture for a generic OntoPortal system: a system to really serve ontologies. We mostly use the term "ontology" because historically ontologies were the key element, but more and more vocabularies have been added, including SKOS vocabularies, and we now have full support in AgroPortal for the SKOS format. I'm not asking you to memorize this architecture, but I wanted to mention the components: at the top, the application, including the UI, the web application and the web services layer; then the business logic, implemented mostly in Ruby; and in the back end, a triple store. We think it is super important to continue basing all of our technology on a triple store that holds the ontologies in the back. All of those components, quite complex components, have been available on GitHub for years, but about nine or ten years ago they were packaged into something called the NCBO Virtual Appliance, and eventually the OntoPortal Virtual Appliance. The appliance is the package of the technology you can use to reuse the OntoPortal technology and develop your own software, and that is more or less what we are doing now with the Alliance.

The Alliance is the gathering of the different groups working on the underlying technology, and we try to promote semantic services based on that shared technology. BioPortal was generalized, and we eventually started working with it in different domains: for example, still the biomedical domain but with a focus on the French language, which was one of the projects we had in Montpellier about ten years ago; and then other areas like agronomy and agri-food, with ecology and biodiversity joining after that. I'll show you the slide on that in a moment, but that generalization of BioPortal was the beginning of the history of the Alliance, which evolves as partners join and new portals are developed. On the right, we now have an established little group that works together; we organize workshops, and we really see something happening in developing a shared project together.

The motivation of the Alliance is to mutualize the research and development efforts. We want to maximize the OntoPortal value and the portfolio of services we offer; we want to consolidate the software and manage it with several people; and we want to increase semantic uptake, because every time we have a new partner or group, we know we are reaching out to people who were not necessarily in the realm of semantics before.
We are also thinking in terms of finding a model for the long-term operational and financial health of our ecosystem. You can see that different organizations have adopted the technology to develop portals in different domains, from biodiversity and ecology to the biomedical domain of course, but also industry, materials science and, more recently, earth sciences. Those are the adopters of the technology providing what we call public, domain-specific ontology repositories; as I'll show on a slide at the end, they are not our only users, but they are typically the members of the Alliance.

Let me give you two illustrations. AgroPortal is the ontology repository we develop in Montpellier with different partners. It allows you to publish, search, download and browse ontologies; there are peer-review mechanisms and versioning; you can use an annotation service and a recommender service; there is a mapping repository, a notes mechanism and a feedback mechanism; and you can register projects using the ontologies, to demonstrate that they are alive, and things like that. Our focus in AgroPortal is on agri-food. I'm also illustrating here another portal, from the industry domain, developed by colleagues in a European project called OntoCommons, who focus on that domain. I'm showing that we have been developing at a different pace in AgroPortal: we always stay in line with the OntoPortal technology, but we have added many features, progressing on new propositions from the metadata model to enhancements of the annotator service, the FAIRness evaluation and so on, and we try to maintain those and eventually offer the new contributions back to the group.

To give one example of the things we are doing in AgroPortal, and it is a good illustration of how we pass things from one community to the others: we now have something called O'FAIRe, an Ontology FAIRness Evaluator, inside AgroPortal. You've seen the first slide where I argued that ontology repositories help make ontologies FAIR, so we wanted to demonstrate that. We developed a method, which is published, with a set of questions and tests to determine how the FAIR principles apply to vocabularies and ontologies, and we have implemented that method almost completely in AgroPortal in the O'FAIRe tool. If you go to the portal now, you'll see the mechanism, with a lot of explanation: we take the metadata record we have in AgroPortal and use it to calculate a score that we call a FAIRness score. On the right you see all of the adopters in the Alliance that have taken up this code, interacting with us. This is not trivial; it does not happen with one click, but we are working to share the code and make sure that our metadata models are aligned and can be reused by others.
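O'FAIRe's real methodology is published and question-based; purely to illustrate the idea of deriving a FAIRness score from a metadata record, a toy scorer might look like this (the checks and weights here are invented, not O'FAIRe's):

    # Toy FAIRness scorer over an ontology metadata record.
    # The questions and weights are invented for illustration; O'FAIRe's
    # published methodology is far more detailed.
    CHECKS = [
        ("has_uri",         2, lambda m: bool(m.get("URI"))),
        ("has_license",     2, lambda m: bool(m.get("license"))),
        ("has_format",      1, lambda m: m.get("format") in {"OWL", "SKOS", "RDFS"}),
        ("has_description", 1, lambda m: len(m.get("description", "")) > 20),
    ]

    def fairness_score(metadata: dict) -> float:
        earned = sum(w for _, w, test in CHECKS if test(metadata))
        total = sum(w for _, w, _ in CHECKS)
        return 100 * earned / total

    record = {                                  # placeholder record
        "URI": "https://example.org/onto",
        "license": "CC-BY-4.0",
        "format": "OWL",
        "description": "A demonstration ontology metadata record.",
    }
    print(f"FAIRness score: {fairness_score(record):.0f}/100")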
(One minute.) So, in a few words: we are really working on making OntoPortal a true open source project, adopting the code and principles of open source projects. Everything is on GitHub, from the documentation, which everyone can contribute to, to the whole codebase, and we are trying to consolidate that vision of having everyone contribute. The membership is really increasing. I wanted to mention that we have nine public repositories, the ones I've illustrated, but we also have users applying the technology in different contexts: for example, hospitals taking the technology to run services like the annotator, annotating text data in their own resources, running the service in house because the data cannot go out over the network. We have implemented some features that let deployed boxes tell us they are alive, and we know we have at least around 60 to 80 platforms running everywhere on earth. Finally, we've set up the OntoPortal Alliance workshops, starting in 2022 and continuing in 2023; there are more and more people each year, and we are really developing a group and starting an activity. So this is also a call to look at what we are doing and eventually join us. You'll find more information on the website about the technology, the product itself, the Alliance pushing that product, and the International Semantic Web Conference paper I mentioned, which was published last week.

In conclusion, we really welcome feedback, so feel free to send us your comments, possible interest, or word of any community you know that could be interested in adopting the technology. Of course I'm super happy to participate in the symposium in Australia today, because I know there are a lot of things happening here; the two previous presentations I mentioned and showed, and this whole workshop, illustrate it. I'm also very eager to propose OntoPortal as a tool to be considered in the different ontology and vocabulary projects in Australia. Our overall goal here in Europe, at least in the context of EOSC, is to make OntoPortal deployable at the click of a mouse for a project or a community: you come and say, we want an ontology repository, and in a few clicks you get it deployed on a certain infrastructure and can use the ontologies you want inside. Every new community brings new ideas, and we like that, so it is really a call to participate, because every time we have new people and new communities, we get a new vision on the way to use vocabularies. A federated portal? Yes, we're working on it, and exchanges with other communities are also coming. And I think I'm... (Okay, we need to wrap up here. Thank you, Clément.)

Okay, cool. So I would like to show you the software we have been working on, that we started to develop recently, which is supposed to help the everyday researcher create FAIR data from scratch. Let me start with my background to this. I'm working for the Helmholtz Metadata Collaboration, a project of the Helmholtz Association of German Research Centres, and overall we're trying to make the research data in Helmholtz more FAIR. I started two years ago as a metadata developer, that was what the position was called, and I started off learning about RDF and graph data. I was very intrigued by this technological opportunity to connect data and metadata in graphs, and I liked that thought. Next I learned about ontologies, and I saw those as the tools to standardize and document terminology, because for each term you can store and connect definitions, descriptions, examples, synonyms and so on in a structured manner.
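What makes an ontology term more than a bare string is exactly that bundled documentation. As a hypothetical illustration, with invented values, a single term record could be represented like this:

    # Hypothetical ontology term record: an identifier plus its documentation,
    # rather than a bare column-header string. All values are invented.
    term = {
        "id": "https://example.org/onto/Duration",   # resolvable IRI, not just "duration"
        "label": "duration",
        "definition": "The length of time during which something occurs.",
        "synonyms": ["time span", "elapsed time"],
        "examples": ["measurement duration in seconds"],
    }

    # Anything that references the IRI shares the same, documented meaning.
    print(term["id"], "->", term["label"], "/", ", ".join(term["synonyms"]))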
That looked really great. But then, as time went on, I noticed a lot of people were talking about data publishing, about FAIR data publishing, but not so much about creating data in a FAIR manner; it was more like, now we want to publish our data, let's make it FAIR, somewhat after the fact. I was also expecting to see terms from ontologies or vocabularies directly in the data, for example an ontology term in the column header of a spreadsheet, but that was really rarely the case, at least in what I was able to see where I was. So we started to develop an idea for software that would deal with these kinds of issues.

To see how we can standardize data with software, let's start from what a contemporary dataset looks like. Right now, in a very simple scenario, we have a table here on the left side with some columns: an ID column; an intensity, as an example of some property; and a duration, maybe with units, for example seconds for the duration. Then we might have, hopefully, a metadata file; in this case it is a text file with a little free text and some structured metadata: the device, an operator, and the project that the data in the spreadsheet should belong to. But as you can already hear, the link between the two is not super strong, and there is no standardization here whatsoever; this is all just text.

A first step would be to use terminology from ontologies or vocabularies for the properties listed in the table, for example intensity and duration, and a vocabulary for the unit would be great too. The same goes for the information in the free text, where some properties could correspond to ontology terms, and for the properties in the structured metadata, like the device, operator and project: those could all be terms from an ontology, each with an ID, a label, a description and so on. If we could even just have software that lets us create a table and a metadata file in this manner, that would already be great: you would type in your column name, select from a list of ontology terms, click on the right term, and it would be in there, and the same for the metadata; that would already help a lot. As another element of standardization, I also show these IDs here: you can see a researcher called Mel who has some kind of ORCID, so standardizing the identification of entities is also an aspect of this, but I won't focus on that now.

All right, we are still missing some things. Just selecting terms from ontologies is only partial standardization. The data structure, that is, which columns to choose and which metadata fields to use, is not yet covered; that is still left to whoever designs the data. The connection between data and metadata is not yet fully explicit. And last but not least, having enough of the right columns and metadata fields, that is, data richness: there is no instance to control or report on that yet.
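A minimal sketch of that first step, mapping column headers and metadata fields to ontology terms and units; all IRIs here are placeholders rather than terms from any particular ontology:

    # Sketch: annotate plain column headers with ontology terms and units.
    # Every IRI below is a placeholder, not a term from a specific ontology.
    columns = {
        "intensity": {
            "term": "https://example.org/onto/Intensity",
            "unit": "https://example.org/units/Candela",
        },
        "duration": {
            "term": "https://example.org/onto/Duration",
            "unit": "https://example.org/units/Second",
        },
    }

    metadata = {
        "device":   "https://example.org/onto/Device",
        "operator": "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
        "project":  "https://example.org/projects/demo",
    }

    # A reader, human or machine, can now resolve each header to a shared definition.
    for header, annotation in columns.items():
        print(f"{header}: term={annotation['term']} unit={annotation['unit']}")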
So what we came up with is software where the user interface looks like this: you design your data structure as a graph. You can have tables in such a graph, since a table can, in the background, still be graph data, that's no problem, but you add a class to the table: you say what class the entities listed in this table belong to. You can also have single entities, like this project here or this device here, and you add a class to every entity to say what kind of entity it is, and the same with this unit, second, here. Then you can have terms from authorities in this graph, which is of course a very good fit given the nature of the graph, and you can incorporate the IDs very easily too, entity IDs like an ORCID for example. This makes it much easier to link data and metadata, because we're already in a graph and the structure is quite explicit. And if you create a data structure like this, you can easily extract only the structural elements by omitting the value nodes: for example, in this table you would maybe have some numbers in these fields and a string in those, and you can just say, to this term here belongs such-and-such a data type and unit; that is the structural part, and you can extract just that from the graph.

(Four minutes, thank you Leon.) Thanks, yes. Right, and then how do you harmonize that with others? We want to embed this way of working on graph datasets in a collaborative software structure. You've probably heard of Git or GitHub before; you can use that to collaboratively work on multiple graph datasets, one graph dataset here, a second there, with provenance tracking automatically included. You can then have a graph database, for example a Neo4j graph database, and some software that watches all these graph datasets and puts them together into one graph database. Within the editor, you can then get suggestions on which terms to use. So imagine you're designing your table here and start to type some column name, just from what you have at the top of your head, and it will be able to suggest a meaningful term that others have already used: based on usage, that is, how often other people have already used the term, or on the context in the graph, which other columns are already in the table. The big graph will be able to tell that other people have used this combination of columns, so maybe this person can use a similar combination. This kind of suggestion algorithm will be a complex endeavor, but quite an interesting and, I think, useful one.
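As a toy illustration of the usage-based half of such a suggester (the graph-context half, and any Neo4j specifics, are left out, and the counts are invented):

    # Toy usage-based term suggester: rank candidate terms for a typed prefix
    # by how often other datasets have used them. Counts are invented.
    from collections import Counter

    term_usage = Counter({
        "duration": 42,
        "duration of exposure": 7,
        "durability": 3,
        "intensity": 55,
    })

    def suggest(prefix: str, k: int = 3) -> list[str]:
        candidates = [t for t in term_usage if t.startswith(prefix.lower())]
        # most-used terms first, so popular community choices surface on top
        return sorted(candidates, key=term_usage.__getitem__, reverse=True)[:k]

    print(suggest("dur"))   # ['duration', 'duration of exposure', 'durability']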
So that is what we are trying to build right now, and this is the tech stack for the prototype we have. We're doing this with Docker, so that everyone can deploy and try the software on their own machine without too much hassle; it is all supposed to be web-based; the graph datasets will have JSON as their format, since that fits well with the web space; we use Git, as I just showed you, a Neo4j database, and for the web front end the JavaScript framework Svelte with SvelteKit. So thank you very much for your attention; I'm looking forward to your questions.

I want to present to you the AGROVOC thesaurus, maintained by FAO, but not so much from a technical perspective; rather from the perspective of community involvement and how curation works. As for myself, I'm Daniel Martini; I'm from the Association for Technologies and Structures in Agriculture (KTBL) in Germany, where we work mainly on knowledge transfer in agriculture, but regarding AGROVOC we have a strong cooperation with FAO to support this work. The presentation I'm giving is actually the teamwork of mainly six people, whom you can see here: Emma is the team lead at FAO; Andrea deals with all the technical matters, maintaining the infrastructure and technology behind it; the main curators are Christine Koltzwuss and Esther Meech from my team at KTBL; Veronica Vintorini deals mainly with community involvement, public relations and communications; and I bring in a certain semantic web expertise but, as an agricultural engineer, also content aspects.

To give you a little context: FAO is probably known to most of you; it is an organization of the United Nations, and its main goal is achieving food security all around the world, dealing with topics such as people having enough to eat and food being produced in a sustainable manner. As is usual in the United Nations, most countries are involved or represented at FAO. AGROVOC is a multilingual thesaurus covering concepts and terminology across all of FAO's areas of interest: that includes food, but also nutrition, partly health as far as it relates to nutrition, forestry, and aquaculture; it is a broad coverage of topics. The coordination of maintenance is done by FAO, but the curation of terms and concepts is actually done by 34 organizations from 24 countries. AGROVOC currently contains over 40,000 concepts with almost a million labels, that is, translations, and 42 languages are currently covered. It is released monthly as a linked open data set, relying on linked open data standards. There are also technical services provided to deliver the data: there is a linked data server, you can query AGROVOC through a SPARQL endpoint, we use the Skosmos tool for browsing and navigating the term hierarchy, and curation is done through a tool called VocBench. All of these tools are hosted by the University of Rome Tor Vergata in Italy; as mentioned, the technical infrastructure is mainly in the hands of Andrea Turbati. AGROVOC is indexed by several semantic catalogues and registries, for example the BARTOC registry, and AgroPortal, which Clément presented in the earlier talk, contains the metadata on AGROVOC.
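As a sketch of what querying that SPARQL endpoint looks like, here is how you might fetch a concept's multilingual preferred labels. The endpoint URL is given from memory and the concept id is a placeholder, so verify both against the AGROVOC website:

    # Sketch: fetching a concept's multilingual labels from the AGROVOC endpoint.
    # The endpoint URL is an assumption to verify; c_XXXX is a placeholder id.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://agrovoc.fao.org/sparql")
    sparql.setQuery("""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT ?label (LANG(?label) AS ?lang)
        WHERE {
            <http://aims.fao.org/aos/agrovoc/c_XXXX> skos:prefLabel ?label .
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    # one row per language version of the preferred label
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["lang"]["value"], row["label"]["value"])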
AGROVOC is actually a controlled vocabulary, in the sense Steven presented earlier, according to the definition by Harpring: an organized collection of terms used to index resources and make them more findable. It is based on the SKOS standard by the W3C; it is not an ontology, but it contains to some extent ontological relations too: the Agrontology accompanies AGROVOC and provides richer means of describing relations between concepts, for example which fruits are produced by which species, which pests apply to certain crops, and so on.

So how do we actually build a sustainable community and keep things going? Well, FAO carries the responsibility for the six main FAO languages, which are English, French, Spanish, Arabic, Chinese and Russian, and it coordinates the editorial activities. Technical maintenance is also facilitated by FAO, including the publication as a linked open data resource. However, we are really talking about a collaborative effort, and different institutions are responsible either for different language versions of AGROVOC or for different domains that are covered; I'll come back to this on another slide in a minute. The work is done on a volunteer basis, so knowledge sharing with the AGROVOC team is rather important; we do this within the FAO and KTBL team, but also with the editors, and I'll come back to this aspect a few slides later as well. Generally we follow agreed guidelines and standards. To dive into this a little more, you can see on the right-hand side of the slide the AGROVOC editorial guidelines, one example of the standards we follow: these guidelines describe how we organize the terms and concepts in the hierarchy, how we deal with translations of terms, and so on. AGROVOC is curated by the editorial community through a collaborative approach, and we really try to involve a broad user community, including domain experts, researchers and practitioners from the agricultural domain, not only people indexing from the library sciences; a big part is involving the agricultural experts, to ensure the vocabulary's relevance. For such a large multilingual thesaurus, resolving ambiguity is a key aspect: in many cases it is not just about translating terms, we also have to localize the terminology.

(Four minutes, thank you Daniel.) Yeah, thank you. Then we have another approach that is maybe interesting to show: AGROVOC supports the technical integration of different domain-specific vocabularies. Technically this is done by modeling them as separate concept schemes that reuse the already given URIs from AGROVOC. Currently there are five vocabularies integrated into AGROVOC: the vocabulary on land governance, provided and coordinated by the Land Portal Foundation; the Aquatic Sciences and Fisheries Abstracts (ASFA), which has a separate team at FAO maintaining it; legislative and policy concepts integrated from FAOLEX, coordinated by the FAO Legal Office; a vocabulary from CGIAR, an organization quite active in agricultural research, coordinated by the CGIAR-FAO task force; and topics concerning indigenous peoples, contributed by the FAO Indigenous Peoples initiative.

Promoting the use of the vocabulary is a really important issue for us, so we also try to demonstrate the benefits, socializing the value of vocabularies. As you all know, using standardized terminology facilitates effective communication and knowledge sharing, and we think that by using semantic technologies we can really help leverage and safeguard the content work in a portable and long-term, technically sustainable way.
Regarding outreach, last year there were special outreach activities, mainly in Latin America, but we do this all around the world, highlighting dedicated pieces of AGROVOC content. There is also outreach material available, we are present on social media, and we work through a collaboration with the FAO country and regional offices. This is just the example of Latin America, where there have been quite some activities recently, including webinars and ongoing user community involvement.

(One minute, thank you Daniel.) Yeah, thank you. We also share the new concepts that have been added with each release; as mentioned, there are monthly releases, and we announce the new concepts added in each one. There is also something we call the concept of the month, which highlights an important concept and shows its definition; this is presented and shared on the AGROVOC website. We are trying to reach different audiences, so not only is the website important, but there are also collaborative efforts on publications, like this example together with the Land Portal Foundation on the role of metadata and open data in the innovation cycle of land administration. Finally, I want to highlight the AGROVOC online course, which will be published this month. It covers all aspects, from very basic foundations, like information on data sharing, through accessing and using AGROVOC, to hints for editors and for curation; it covers the whole range of working with AGROVOC, so let me share this with you. And with that I'm actually finished with the talk. You can reach the AGROVOC team through the email address agrovoc@fao.org, so if you have questions later on, feel free to send them there; I'm also available here now for answering questions and for the panel. Thank you.