So there's one thing on the agenda for today, which is that we are giving a briefing on a report that we prepared for the RVA review. I'm going to do this fairly simply, just by looking through the report (I reminded myself of it today; we completed it about two and a half weeks ago) and basically give an overview of what we found and what we wrote up in there. We were working with ARDC. If I didn't introduce myself, I realise: Simon Cox from CSIRO. I don't know everybody who's on the call, so people I don't know, hi. Hi, John; hi, Sharnjit; I'm not sure which Andy that is; most of the rest of the people I've met in the past. John's talking, but he's muted, so I can't hear him.
Anyway, I'm going to share my screen. Hopefully you can now see the cover page of the report, the link to which has been, I believe, circulated to everyone on this list by Cell. I discovered today that there's been some feedback received, which Cell has been dutifully pasting into a second copy of this document that I've only just discovered the location of. So at the moment, the report that I'm going to brief on is the original copy and has not yet incorporated responses to a small amount of feedback that we've received from Natalia and a couple of other people.
So I hope everyone can see on the screen the cover page of the report. I'll just scroll through and reflect on how the report is structured. We started out with a summary of what the intentions are around doing this review of Research Vocabularies Australia; a little overview of what Research Vocabularies Australia is; a brief summary of the stakeholder survey that we ran back in June, with feedback from there gathered under two headings, potential enhancements and limitations of the current service.
Then an account of the workshop that we ran in late June. A separate chapter, which is additional information that CSIRO, as the people carrying out this review, have fed into the review based on our expert knowledge, if you like, and also our survey of other research vocabulary services out there in the world. And finally, some recommendations, which are tabulated under five topic headings. So it'll probably be most interesting to sprint through to the recommendations at the end. But first, I'll just give a little bit of a survey of what else we, the reviewers, have undertaken and what we've documented in this report.
So it was compiled by CSIRO in collaboration with ARDC and in response to the workshop and the survey. The overall headline output is that RVA is meeting a clear need and provides a suite of capabilities valued by the community. There are some ideas about potential future improvements.
ARDC, picking up on the work that ANDS had done over a number of years, are in a process of reviewing the infrastructure services that they actually host, as opposed to those services which they've been working on with other collaborators. In particular, those services are Research Data Australia, the Federated Index of Research Data Repositories and Research Vocabularies Australia. So this first document is looking at the last of those, Research Vocabularies Australia. And the intention is to identify future directions for the ARDC data publication services.
I've already given you an indication as to how the review was conducted, with the survey, the workshop and then some additional input, interviews with ARDC staff, and then comments based on this feedback round, which this briefing, I guess, is a continuation of. This part of the report just gives a little overview of the general background to the service that Research Vocabularies Australia provides. I won't go into detail on that at the moment.
And then a summary of what the current state of Research Vocabularies Australia is. It comprises a vocabulary editor, which is based on PoolParty, which is a commercial product, but which ARDC hosts on behalf of the Australian research community. So they're wearing the cost of the license on behalf of the whole community. There's a vocabulary repository, which is based on an RDF data store with, sitting in front of it, some open source software which provides a linked data view of the controlled vocabularies there. There's a vocabulary registry, which is the first thing that you encounter when you go into the RVA website. Sorry, the registry is the back end that ARDC host for updating and versioning vocabularies. The vocabulary portal, I jumped ahead of myself there, is the first thing you see when you go to Research Vocabularies Australia, which is the initial discovery stage and the high-level summary of vocabularies, as well as user support documentation.
This diagram you've probably all seen because it's part of the basic offering on the RVA website. So there's a browser user interface for discovery; a browser user interface for vocabulary providers, for registration, upload and versioning; a registration API for vocabulary maintainers, so you can update vocabularies, search and download; and a novel piece of the RVA kit is the vocabulary widget, which can be dropped into websites based on content hosted at RVA.
As of the end of July, there were 217 controlled vocabularies in the portal, of which about a third, 80 of them, are just a link to an externally hosted web page, and the rest are a mixture of ones maintained and uploaded from PoolParty and those prepared externally. So there's a diagram there summarising that. The vocabularies hosted by RVA follow contemporary web best practice. Every term has a URI, and you get a web page or a machine-readable version of that depending on how you ask.
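Getting "a web page or a machine-readable version depending on how you ask" is HTTP content negotiation: the client states its preferred formats in an `Accept` header and the server picks a matching representation. The following is a minimal sketch of that idea, not RVA's implementation. The list of available formats, the function names and the example.org term URI are all assumptions made for illustration, and the matching is simplified: it honours q-values but not the full precedence rules of the HTTP specification.

```python
from urllib.request import Request

# Hypothetical set of representations a vocabulary server might offer for a term URI.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml", "application/json"]


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header, available=AVAILABLE):
    """Return the media type to serve, or None if nothing acceptable is offered."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None


def build_term_request(term_uri, media_type):
    """Client side: build (but do not send) a request asking for one format."""
    return Request(term_uri, headers={"Accept": media_type})
```

With these assumptions, a browser-style header such as `text/html,application/xhtml+xml;q=0.9,*/*;q=0.8` selects the web page, while a client sending `text/turtle` to the same URI gets the RDF version.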
So there was a survey distributed at the beginning of June, which I think was open for a couple of weeks. It only got a relatively small response, but allowed some insight as to what the community thought was important. Particularly interesting were the narrative versions of limitations; a couple of providers gave quite a lot of detail as to what they thought was good and particularly some of the limitations of Research Vocabularies Australia. Top of the list there is that some communities don't need vocabularies to be hosted in RVA. Possibly they could be registered there, but they don't need to be hosted because they rely on a discipline or community hosted service which already exists. And then there were some issues around the difficulty of searching for terms across vocabularies, and some opportunities for maybe improving the way in which RVA vocabularies can be integrated with enterprise tools and processes.
A couple of issues around governance, particularly that the level of community or domain endorsement of a vocabulary is unclear. RVA has been run very much as a bottom-up process where anyone who is eligible to have an account at ARDC can register any vocabulary they like. That sometimes means that there's more than one vocabulary covering a similar scope. In that context, it's important for users to be able to make some kind of a judgment as to which one they prefer to use, and that's not always easy.
We had a workshop with this set of attendees at the end of June. It was organised by CSIRO and ARDC. So there was good representation from across some of the other NCRIS facilities, but also from National Archives, ABARES, AIATSIS and Geoscience Australia. The program was very interactive.
There was a bit of an intro session, particularly focusing on some presentations from three exemplary users of controlled vocabularies, but then most of the work of the day was done in breakouts around tables to get maximum engagement. In the morning these addressed the questions of what's working and what are the gaps, then there was a report back, and then in the afternoon, sorting those and sorting priorities. So we have some captures of the outputs of the group sessions, which are included in the report here.
Something which came out quite strongly, particularly during the morning session, was the extent to which the community engagement activity provided by ARDC is valued. Essentially this is the support that Rowan and Richard provide to new players who come along, new vocabulary providers in particular, wishing to take advantage of RVA for publishing their vocabularies. There's plenty of hand-holding, and that turns out to be very much appreciated. There were some suggestions about changes to features, the details of which I'll probably leave until we're looking at the tabulations at the end, and some suggestions about brand new features.
We attempted in the afternoon session to get these sorted along a number of axes. There were general comments and more specific, detailed comments, and we sorted them into quadrants according to level of importance and level of urgency, and we ignored the quadrants that said not important. So we just focused on "important and urgent" and "important, but we can wait a while". That's what the tabulation here goes to. More or less all of these topics are picked up in the recommendations at the end, so I'll leave discussion of those to that. Finally, there was a subset of the people who were involved in the workshop who were not current RVA users.
So at the stage where we were doing a retrospective on people's experience and suggestions around the way RVA works at the moment, since they had no retrospective viewpoint to provide, they instead took on the job of brainstorming about what an ideal vocabulary service would be. Some of those topics are already present, some are fantastic ideas but might take more than ARDC and the community are currently capable of doing, and other ones we were able to feed back into the recommendations.
So then, as I say, we did some additional work on an external scan of related work, in terms of a tabulation of some other initiatives and services. This is the content-oriented part of the other work. So there's a wide variety of different services, and this can't possibly be comprehensive, by the way, folks, but it's the ones which we're aware of or have encountered: vocabulary services for some research communities. The one at the top there is very general purpose. This comes out of the library area. The DDI Alliance do a very good job of managing their controlled vocabularies in social sciences, official statistics, public health, those kinds of things. And then a series of natural-science-oriented vocabulary services with links, and also an indication not only of the scope, but some comments on the status. In some cases they're very stable; in some cases, I'm not sure if you'd call continuous maintenance stable. It means things actually do change regularly, but the service itself is stable.
Some are wide open, like BioPortal. Anyone, a bit like RVA, can submit a vocabulary as long as it conforms to OWL, and get in. It means there are a lot of vocabularies there, but other ones are much more curated. Kicking the tires a couple of years ago, I submitted some vocabularies to AgriVoc and got a polite email overnight saying, I don't think these are agricultural vocabularies, so please remove them.
Which was interesting and useful to discover. So it's quite a reasonable list here, and the community should be aware of these, and there are a couple of emerging or new initiatives down at the bottom there.
Some comments on governance patterns, because governance was one of the things which the participants in the workshop seemed to be particularly concerned about. There's not as much guidance as people would like, and perhaps not as much information as people are expecting about the governance status of some of the vocabularies. Looking through what's in RVA, to a certain extent that's just down to what's contributed by the people registering vocabularies in RVA. I've even in the last week or so picked up on some of the vocabularies that ANDS, now ARDC, themselves submitted into RVA and said, hold on a moment, the description in the metadata on the top page there is incomplete and makes it a little difficult to assess why you'd use this one versus another one. So the flaws go all the way up and down. I guess there are some questions about whether there needs to be more rigorous checking and whether the wide open nature of submitting them is meeting community requirements.
As a comparison, we did a brief description of some governance arrangements for some other vocabularies and vocabulary services which I was particularly familiar with. Increasingly, there's a lot of use of issue trackers and ticketing systems, particularly around GitHub. So that's described in here. There's a little reference out to standard governance models as described in one of the standards stacks that some in our community use.
This comes from the geographic information standards, with the idea that you separate out the management of the content from a technical point of view from the management from a conceptual point of view, and suchlike. But basically this is just drawing attention to the fact that there are quite rigorous models for how content would be maintained, including some sequence diagrams, flow diagrams and decision trees which are in these standards.
Finally, looking externally, a bit of a technical view here about some of the platforms which are available. In RVA, the linked data API to vocabularies is primarily based on the ELDA/SISSVoc stack. Jonathan and I were actually involved in developing SISSVoc quite a few years ago. When RVA was establishing its offer in the first place, there was a small working group that evaluated a number of options and decided to go with that, primarily because of its open source nature and its compatibility with web standards around machine-readable and also human-readable interfaces. But there are a number of other offers out there. Sorry, you're not going to see much because I'm scrolling up and down. As I mentioned, that part of the RVA tech stack is arguably getting quite long in the tooth now, although it still functions perfectly well. But there are alternatives, which are listed in here, and on and on.
The bottom ones on that page there are two proprietary tools. PoolParty is already installed, and there's an alternative from TopQuadrant, both of which come with moderately hefty price tags. But obviously ARDC has been able to absorb that cost on behalf of the whole community through their PoolParty offer.
So okay, finally, getting into the recommendations. We sorted these into, as I say, five topic areas: engagement, training and practice; governance; tools and technology; content; and impact and analytics tracking. The recommendations are numbered, and they've also got a little colour-coded icon to indicate what the source of the particular recommendation was, divided between things that ARDC themselves contributed as part of the consultation, and two kinds of contributions from stakeholders, from the workshop and from the survey. And the expert feedback was stuff that CSIRO shoved in as part of our role in the review.
So in engagement, training and practice, big ticks to the current help desk service from Rowan and also Cell (sorry, Cell, I should have mentioned you before), including the facilities for notifications. Meetings of the Australian vocabulary services interest group, as in this session, are all appreciated and should continue. That was coming from the stakeholder feedback.
And then there are a number of ideas about improvements in the engagement. A particular one that was coming through was the idea that there's not enough sharing of user stories about how different communities are able to effectively use RVA for management and publication. The challenge there is that different communities have their own internal processes for how they manage their vocabularies and how they want to publish them. In fact, there's a lot of flexibility in the RVA process, which means that people need a bit of guidance on having a map to get through that, and on how to effectively reflect their community's requirements on the technical platform that we're using. And the idea is that perhaps the best way of doing that is through some user stories, which allow you to say, oh, does my community operate in the way that this other community is already operating?
This is how they went their way through.
In the governance space, there is a recognition that the metadata is a bit messy. Because early on some of the metadata fields were free text, it means that sometimes you've got multiple entries for the same organisation or the same person. I know my name is in there several times; CSIRO's name is in there several times. And that means that sometimes it's hard to keep track or get a proper evaluation of the provenance or source of the vocabularies, because it's a bit tangled up. Some of that's because the means to do that was not always available. ORCID and ROR have only come along since RVA was established. ROR is the registry of research organisations, and so that's a way of giving identifiers to research organisations. Until a year ago, that was a bit of a black art, how to do that. But as the web of science, or web of research, matures, RVA needs to reflect that. Now that probably means there needs to be some remediation of the existing offer in there, which means someone at ARDC would have to roll up their sleeves and do probably some manual work to remediate that, if that was to go ahead.
Oh, I didn't say at the beginning: as well as those five categories, we've also got the recommendations sorted into "continue", improvements that can be made on a relatively short timeline, one year, and longer-term improvements, say a three-year timeline. So anyway, documenting the user stories and patterns of governance again is seen as something which is likely to spill out beyond the one-year timeline.
Looking at tools and technology, the PoolParty editor is very much appreciated. I mean, it looks like it's fairly heavily used. It's used for about half the total vocabularies and for a significant majority of the ones which are actually hosted by RVA, as opposed to just linked to.
So PoolParty is appreciated. The machine-readable API, the ELDA/SISSVoc thing, is also seen as good. And the top-level interfaces as well as the widgets are appreciated. A number of users commented that the term search is difficult to use, because you get multiple hits from different vocabularies, but then to look at the details you have to burrow down into the individual vocabularies. And so cross-vocabulary term searching needs an upgrade. Vocabulary mappings is an area that's a never-ending project, of course, but people are certainly interested in it. There are some possible other issues about updating or improving API access into other applications.
Looking a bit further out, there's some richer functionality that potentially can be provided beyond just term search, including, for example, a recommender. BioPortal runs one of those, where you submit a bunch of text and then it says, ah, it looks like you're working in this discipline, and based on the key terms in here, here is the vocabulary or ontology which covers most of it. So it might be the one that you want to use. And also there is the question that, you know, we don't want to just sit on our laurels. If some of the other technology options would provide some improvements, then we need to keep our eyes open about adopting those.
In terms of content, although RVA is primarily based around SKOS-type terms and definitions, it's not limited to that. Certainly a couple of vocabularies which I maintain in there go way beyond SKOS, and that's seen as a feature to be maintained. One of the issues that comes up from time to time is the extent to which the sort of passive, bottom-up approach is enough in terms of content, or whether ARDC could be more proactive in searching out and acting as a curator or adding additional content. And then there's vocabulary mappings, again, which is a content issue.
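The cross-vocabulary term search problem described under tools and technology (multiple hits, with details only visible after opening each vocabulary) can be illustrated with a small sketch. This is one possible shape for such a search, assuming a single in-memory index whose hits carry the vocabulary name and definition alongside the label. The sample vocabularies, URIs and function names are invented for illustration and do not describe how RVA's search works.

```python
from collections import defaultdict

# Hypothetical sample data: vocabulary name -> list of (term URI, label, definition).
VOCABULARIES = {
    "rock-types": [
        ("http://example.org/rock/granite", "Granite", "A coarse-grained igneous rock."),
        ("http://example.org/rock/basalt", "Basalt", "A fine-grained volcanic rock."),
    ],
    "building-materials": [
        ("http://example.org/bm/granite", "Granite", "Stone used for benchtops and paving."),
    ],
}


def build_index(vocabularies):
    """Index every term by its case-folded label, across all vocabularies."""
    index = defaultdict(list)
    for vocab_name, terms in vocabularies.items():
        for uri, label, definition in terms:
            index[label.casefold()].append(
                {
                    "vocabulary": vocab_name,
                    "uri": uri,
                    "label": label,
                    "definition": definition,
                }
            )
    return index


def search(index, query):
    """Return all terms whose label contains the query, with enough detail
    (vocabulary and definition) to compare hits without opening each vocabulary."""
    needle = query.casefold()
    hits = []
    for label, entries in index.items():
        if needle in label:
            hits.extend(entries)
    return sorted(hits, key=lambda h: (h["label"].casefold(), h["vocabulary"]))
```

Searching this sample index for "granite" returns two hits, one per vocabulary, each with its own definition, so a user can judge which vocabulary suits them from the result list itself.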
And finally, on the impact and analytics tracking, the idea that vocabulary publishers need to be able to get feedback about how much use there is of their vocabulary and perhaps some information about who the users are. But that is kicked down the road a bit into the three years and later stage of the recommendations.