Hi, and welcome to today's webinar on the topic of RAIDs, Research Activity Identifiers. So let's get started. My name's Natasha Simons from the Australian National Data Service, or ANDS, and I'm your host for the webinar today. My colleague, Cezanne Sabine, is behind the scenes co-hosting the webinar with me. This webinar will introduce you to RAID, a new addition to the world of persistent identifiers, or PIDs. RAID stands for Research Activity Identifier, and is an ID for research activities which places the project at the centre of the data management life cycle. Our two speakers for today, Dr Andrew Janke and Siobhan McCafferty, are from the Data Life Cycle Framework project, or DLCF for short, and RAID is the first of the DLCF enabling technologies to be developed. This webinar is the fourth in a series examining persistent identifiers and their use in research. The first webinar looked at citing grey literature using DOIs, the second introduced the International Geo Sample Number for physical samples, and the third explored the global initiative called Scholix, which is linking data and publications. You can find the recordings of these webinars on the ANDS YouTube channel. I'd also like to acknowledge the Commonwealth Government for their support of ANDS, Research Data Services and Nectar under the NCRIS program. And now to introduce our speakers for today: Dr Andrew Janke, who is a software architect for the DLCF project and Research Data Services, based at the University of Queensland, and Siobhan McCafferty, who is the project manager for the DLCF project and works for the Australian Access Federation in Brisbane. So I'll now hand over to Andrew and Siobhan. I'm going to start with a few horror stories because that's always fun. One of the things the DLCF project asked when we talked to universities at the start was, what are some of the problems you are having?
And people expressed multiple things. A lot of them were about lost data, or people who couldn't access data, and what it came down to in the end is that universities had no great way of identifying what a project was. There was a good notion of what a project was within an institution, but the best research is done collaboratively. So people wanted to be able to get around this problem somehow, and what came out of that is that we knew we needed a project ID. We looked around to see what was available internationally and we couldn't find much. We talked to the Auckland group and their advice to us was have a go, so we did. And we came up with the notion of RAID, which is a research activity ID. We actually started with project IDs, but what was semantically associated with the word "project" was a problem, so we backed off to research activity IDs. One of the things we also knew from the start is that researchers do a lot of things. Some of those things are on the screen there, and unfortunately doing research is often the small component. We also knew from their feedback that researchers had a lot of what they call black holes. They were already reporting into a lot of systems, for grants and finance and publications, and the last thing we wanted to do was develop yet another system around RAIDs where researchers had to report information into it without getting any benefit. We know there are a number of systems around already for recording what projects are about, typically around the outputs only. And we took the view that what we tend to publish is the nice shiny version, and that it is probably only half the story: the working data, what happens in the basement, what actually goes on underneath, is the rest. Often, who really knows what goes on in a project? So we knew we had to keep all these things in mind.
I'm going to get up on my soapbox now and say that from the start of this project we were very keen to say that research data is associated with projects and not people. This is because institutions are responsible for the outputs of research beyond the time when the researchers have left, so we knew we had to structure the RAID identifier around that idea. If I look at my very simple diagram, the way we all tend to draw how the research lifecycle works is that we have it in our heads that people are going to fill out a data management plan at the start of their research, and then they're going to do something, they're going to put some data somewhere, and they're going to produce outputs. We know this doesn't necessarily work. If you look at the existing tools out there like DMPTool, Figshare and DMPonline, which we could have used for a project ID, we recognized that these tools were really starting to work on the data towards the publication and output stage. They're trying to change this now, with a lot of these tools starting to interact with researchers earlier, but for the moment I would put them on the output side. The problem is that research probably looks more like this, and it meant that we weren't really covering or looking for IDs and issues around the research data that happens at the start of the process, so I've drawn a little black cloud over here for the part we're probably not capturing well. We knew we had to change the language a little bit when we go out and talk to people about research activity IDs. We're not talking about a publishing repository or a repository software tool like Figshare or one of these. We're talking about research lifecycle management, and we're talking about attaching identifiers throughout this process.
It's about attaching a RAID identifier at the start, when a researcher has an idea or wants to get some storage in their institution and they use a DMR system, a data management record system, to provision some storage for their project or the project they're associated with. And the idea is that if we can get the RAID identifier attached to all parts of the lifecycle, we can't necessarily join everything up, but at least we can find it. So we're about attacking the F and the A parts of FAIR. I think I and R will come in time, but if we can just attack findable and accessible, I think we'll have done very well. Another thing we knew we had to do is change some of the language around what people say a project is. The existing view in a lot of systems like ORCID is that a project is a noun: it's a thing describing something that has happened. For us in RAID and in DLCF, project is a verb: it's a continually evolving thing, and the RAID continues to be associated with it and with the various touch points where researchers interact with the things that help them get their research done. So we knew we had to build a system that was essentially transparent to researchers and that would just happen via the infrastructure in the background. So I'll now let Siobhan take that up. So, following from identifying a lot of these issues, the Data Life Cycle Framework started to come up with what we call the enabling technologies. A little bit of background on the DLCF: it's a national strategy to connect research infrastructure through some of the NCRIS e-science capabilities. We've got five stakeholders who'll be easily recognizable, RDS, AAF, AARNet, Nectar and ANDS, and all of these groups do different things. So those capacities influenced what the DLCF decided to focus on. Our enabling technologies are cloud-based connectors and identifiers for research data tools, storage and outputs.
The first of those was identified as a project ID, which became RAID, the second a group ID and group management service, and the third a metadata store and APIs. Today we're talking mainly about RAID, and the need for these persistent identifiers was because we needed a really simple tool. We needed something that connected things up without being heavy software or being very taxing for researchers or for the institutions themselves, and it had to be something that could be quite responsive and semi-automated. So we wanted a PID, and the benefits are standardized metadata, a mechanism for creating a persistent chain of provenance, support for the FAIR principles, a lightweight flexible tool, and rich data linkage. RAID is, simply put, a handle that is minted via ANDS, and attached to it is what we call the DMR, which is the RAID metadata manifest. The manifest contains other persistent IDs related to the project: DOIs, ORCIDs, other RAIDs, potentially group IDs, tool and service IDs, and any other rich metadata that the service provider, at the point at which the RAID is created, wants to include in there. So I'll show you what a RAID looks like on a conceptual level. You can see on the left here we've got the RAID number, and it's really simple: the service point and the day, month and year that it was minted. Attached to it we have these potential spaces: space for the group ID, which can include ORCIDs or emails, and space for DOIs, GRIDs or ISNIs, tools, services and other RAIDs. Further abstracted out, you can see what could go in there and what they can do with it, and here's an example. So the RAID, being a simple handle, actually has a lot of information attached to it. It doesn't hold any of the information itself. We don't hold any data. It's just metadata, a bag of pointers that gives you a timeline of what's happened during the project's process.
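To make the "bag of pointers" idea concrete, here is a minimal sketch in Python of how a RAID and its manifest could be modelled. It is an illustration only, not the RAID service's actual schema: the class names, field names and the placeholder handle value are all assumptions. The one property it takes from the talk is that a RAID carries a service point and a mint date, and accumulates dated pointers to other identifiers rather than holding any research data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Pointer:
    """One entry in the manifest (hypothetical structure)."""
    kind: str       # e.g. "doi", "orcid", "raid", "group", "tool", "service"
    value: str      # the identifier itself; no research data is stored
    added_on: date  # when the pointer was attached to the RAID


@dataclass
class Raid:
    """A RAID as a handle plus a growing manifest of pointers (assumed model)."""
    handle: str         # placeholder value below, not a real handle
    service_point: str  # the platform that minted the RAID
    minted_on: date
    pointers: List[Pointer] = field(default_factory=list)

    def attach(self, kind: str, value: str, added_on: date) -> None:
        """Record a new pointer against the project."""
        self.pointers.append(Pointer(kind, value, added_on))

    def timeline(self) -> List[Pointer]:
        """Return the pointers in date order, oldest first."""
        return sorted(self.pointers, key=lambda p: p.added_on)

    def of_kind(self, kind: str) -> List[str]:
        """Return every identifier of one kind, in the order attached."""
        return [p.value for p in self.pointers if p.kind == kind]


if __name__ == "__main__":
    raid = Raid("EXAMPLE/0001", "example-dmr", date(2017, 4, 3))
    raid.attach("doi", "10.1234/example", date(2017, 9, 1))
    raid.attach("orcid", "0000-0002-1825-0097", date(2017, 4, 3))
    for pointer in raid.timeline():
        print(pointer.added_on, pointer.kind, pointer.value)
```

Keeping the pointers as a flat dated list, rather than nesting them by type, is one way to get the timeline almost for free; a real manifest may well be organised differently.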
And what we hope is that, by the use of simple things like persistent IDs, we can link up all of this pre-existing infrastructure, semi-automate things, make research easier and make tracking research easier. For institutions, that gives you improved visibility of research activity components, so the people, tools, processes, storage and outputs involved in each project. It gives you an audit trail for research processes and improves the visibility of collaborations across institutions. For infrastructure, you get metrics for use and connections, you can automate storage provisioning, you can also automate allocations of storage and compute, and you can automate access and authorization. Cool, so that's me. Thanks very much. So I'll just ask: you both have been talking a lot about RAIDs at conferences and events for a little while now. What sort of feedback have you had from people on RAIDs? What sort of questions do you get asked? Yeah, so I can answer that from a technical point of view. People are interested in how long it takes to integrate with the RAID system, because it obviously means changes in how they do policy at their institution and it needs institutional uptake. When it gets down to the actual technical implementation, because we've designed RAID to be machine to machine rather than a manual service, the feedback we get is that the integration time is very short. It's not a hard thing to do. What is always hard is getting the policy and the agreement in place that this is a good thing. With respect to the feedback we've got from institutions on whether RAID is a good idea, I think the answer is a resounding yes, in that they know they need something to identify projects across institutions, but again there is implementation and all the things around the inertia of large organisations. It's actually a bit like being a prophet in your own country.
We've had a lot better uptake and feedback in some ways from New Zealand than we have from Australia. This is probably because of their different approach to things. Many may know that in New Zealand the uptake of ORCID was driven by the department rather than by the institutions. The department, MBIE, said this is a good idea and we will help all our institutions go forward with this. We're looking at whether we can do the same thing around RAID. We know that the New Zealand Department of Education is interested in developing a project ID across organisations, and they see this as perhaps a good way to do that. We get many and varied kinds of feedback. We even get some researchers, I wouldn't say the clever ones, but the ones who are the converts, who want to do this stuff and say, how do we integrate? For the moment we really can't help them much, because we're not aiming the service at researchers right now. This is something that should happen in the background without them having to put any effort into it, and it should just make their life easier. That was the goal of it, but it means that in order to provide a level of trust around the RAID providers and the RAIDs, we must have agreements in place with the service providers who are linking to RAID. That doesn't exclude researchers, but it gives them a different level of access to what they might have expected. Well, a follow-up question. I think you've probably answered this, but are RAIDs only for projects at Australian institutions? No. At the moment we're starting the ISO process for RAIDs. It will be an Australian and New Zealand standard and then an international standard. It was begun with the intent that it be international, because researchers work internationally; they're not just working in Australia. So yeah, it's international. Okay, thank you. The next question: accessing the RAID API, will it be publicly available?
That's the first part of the question. Yes, definitely. If there's a chat thing in here I'll type it in right now so you can go and have a look. It's api.raid.org.au. I think people can get that. And we can also send that link around anyway with the follow-up recording. If you are interested in looking at that a bit more, if you add ui, so ui.api.raid.org.au, there's a pretty version of the API which allows you to interact with it and see interactively what you're doing. Now, to get beyond the initial part, you will need a key. If you need that, contact us through the RAID website, and we'll provide you with a trial key, which doesn't require anything beyond that because it's a trial system. Okay, so at the moment people don't need to register to have a look at that. They can actually access the API as it is. Yes, and for a key they can contact you. Yeah. Okay, so the next part of that question is, are you aware of how many systems have integrated with RAID so far? And do you have any exemplars? We're very aware of that. RAID was launched in April and it has a small number of people who have integrated with it so far. Probably the best example is UQ's new RDM system, which Andrew is involved with. So the DMP kind of tool for UQ has been updated and has integrated RAID, and storage provisioning comes out of there. So that's a pretty good example. Also probably ReDBox RAP. ReDBox have made for us a very lightweight version of ReDBox, the Research Activity Portal, which has integrated RAID capability, and that's available. Anyone can go. Anyone with an AAF credential can use ReDBox RAP. It's a minimal DMR system that will allow you to put in the minimal information, so a project title and maybe an institution and a few people, and it will generate a RAID for your project around that DMR, so that's accessible as well.
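As a rough illustration of the machine-to-machine integration described above, the sketch below builds (but does not send) an HTTP request that a DMR system might issue to mint a RAID from the minimal information mentioned: a project title, an institution and a few people. Everything specific here is an assumption made for illustration: the `/raid` path, the bearer-token header, the JSON field names and the `api.example.org` host are hypothetical and are not taken from the RAID API documentation, which should be consulted for the real interface.

```python
import json
from typing import List
from urllib.request import Request


def build_mint_request(base_url: str, api_key: str, title: str,
                       institution: str, people: List[str]) -> Request:
    """Build a POST request carrying a minimal DMR (hypothetical API shape)."""
    # Minimal record from the talk: title, institution, a few people.
    payload = {"title": title, "institution": institution, "people": people}
    return Request(
        url=base_url.rstrip("/") + "/raid",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and key; substitute the real API host and a trial key.
    req = build_mint_request(
        "https://api.example.org", "TRIAL-KEY",
        "Example project", "Example University", ["A. Researcher"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable without network access or a key; a real integration would pass the request to `urllib.request.urlopen` (or an HTTP client of choice) and store the returned handle in the DMR.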
Okay, so if somebody wanted now to mint a RAID or start minting RAIDs, how would they go about doing that? Well, if you're an individual, it's something that your institution or research group should do for you. We talk about service providers or service points, and that can mean any platform with integrated RAID capability. So if I as a researcher saw that it was useful to have a RAID, I would go and hopefully talk to my data management planning team, or talk to whoever runs my synchrotron, and have them integrate RAID into their management processes and have a RAID minted via that. It's not something you do yourself. It should be through your institution, because we need that really rich metadata that comes out of: I'm a researcher from here, I'm using this piece of machinery, these are the people involved with my projects. RAIDs are not intended to say this is my project. They're intended to say this is a project I've worked on. Okay, and so RAID at its basis is a handle. So when you click on the handle, how does that resolve? What page does that resolve to? I guess the magical content provider string is required by ANDS for people who need a handle. Yes, so the handles will resolve. For now they resolve to a page on the RAID website which essentially says this RAID is valid or not. Now, it's a decision for the service providers or the institutions, but we modelled the underlying data structure of the information about a project on the RIF-CS activity, the RIF-CS schema, sorry. This means that in time, if an institution chooses to make information about a project public, then that handle will resolve to the service record within RDA and we will update it. And obviously we'll migrate the information into the service record in the public section of Research Data Australia. Okay, is there anything further you'd like to add? We've come to the end of questions. Is there anything?
No, but we're keen to hear from anyone who'd like to have a go at integrating the RAID API into their institution's or research group's workflows. The more people that use it, the more useful it becomes. So we're looking at the moment at improving uptake in Australia and in the rest of the world. If you're interested, drop us a line, at no cost to you. In fact, we sometimes give people money, because we want more people to use it. It's federally backed. It's not going anywhere. It's a really useful little thing. Have a go. Okay, and it's raid.org.au. Is that the website address? Yes. Okay, fantastic. Well, thank you everyone for attending today's webinar, and just a reminder that this is one part of the Persistent Identifiers webinar series. You can look out in the ANDS online news or on our website for an announcement of the next webinar in the series, but that will not be until next year now. And thank you again to Andrew and Siobhan for making the time to give their webinar today. Thanks.