Toby O'Hara is the implementations manager at the University of Western Sydney. He's got his finger on the pulse of the metadata store, and he's going to show us a bit about what's been going on under the bonnet. I'm hoping he'll also share his thoughtful insights into the pros and cons of working collaboratively, and anything else he wants to say. I'm now going to hand over to him.

Let's see. So I've been doing this for about a year, and I think most of the folks on this call have also been working on metadata stores and wrestling with how to meet not only the metadata stores project requirements, but also how to really embed some of these new things into the university. I've had the benefit of working with Peter Sefton, so he's obviously very clued in to all the issues and where things need to go, and through him we also have the benefit of being intimately familiar with ReDBox and Mint.

The first thing that we did — and I think this goes back to day one of the metadata stores contracts, when we were putting up our hands, before I arrived; I was brought on board right at the very beginning — in all of those planning stages, the university internally, and I think collaboratively as well between Peter and his contacts, made a distinction, or maybe a rebranding, of the term metadata store. Rather than talking about putting metadata into a storage of metadata at the university, we rebranded it as a research data catalog. For me that brings to mind going into the library as a kid: near the front door there would be these little sliding catalog drawers. You slide out, say, A through C, looking for an author, and you look through the little cards, and the card tells you where the book you want is, or what books that author has written. That's the way UWS think about the metadata store: it's a catalog with entries.

The picture I grabbed for this is basically that you've got a pile of data, and whatever the data set is, it gets a catalog entry or a record. Or, thinking metaphorically back to the olden days, it gets a new card that represents whatever it is out in the library space, and that card allows me to go and find that book, or that publication, or in this case the data set.

For our particular metadata stores project we were building on work that was already done by the library — or really done in tandem — the work to Seed the Commons. The Seeding the Commons project started just before, and through their outreach efforts, as the name implies, it brought in a few data sets that could then be used as examples. The next logical step is to strengthen the flow of how a new entry is constructed. So this is — I'm not the expert — my very basic, high-level understanding of the anatomy of a catalog entry. Actually, Simon mentioned earlier that I should double-check whether the slides are changing. Are the slides changing okay? That's fine. Is everyone seeing a puzzle? Okay, excellent. A single entry in the data catalog, to my mind, is constructed of pieces, and you can see that in the ReDBox application. I know not everyone in the ANDS metadata stores projects is using ReDBox, but ReDBox is our example.
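[Editor's note: as a rough illustration of the "catalog card" idea described above — a record that holds descriptive pieces plus a pointer to the data, rather than the data itself — here is a minimal sketch in Python. The field names and values are hypothetical; this is not ReDBox's actual schema.]

```python
# Minimal sketch (not ReDBox's actual schema): the catalog entry is the "card",
# holding descriptive pieces plus a pointer to wherever the data set lives.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogEntry:
    title: str
    description: str
    data_location: str                                    # URL or storage path of the data set
    people: List[str] = field(default_factory=list)       # drawn from the name authority
    activities: List[str] = field(default_factory=list)   # grants, projects
    for_codes: List[str] = field(default_factory=list)    # field-of-research codes

entry = CatalogEntry(
    title="Rainfall observations 2010-2012",
    description="Daily rainfall readings from the campus weather station",
    data_location="https://data.example.edu.au/datasets/rainfall-2010-2012",
    people=["Dr Jane Example"],
    for_codes=["0401"],
)
print(entry.title, "->", entry.data_location)
```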
But it's tabulated, so you can grab the different pieces of what actually makes up the record. I think one of the key benefits of using ReDBox in our case is that the name authority provides some of those components in a reusable and consistent way. So whenever a record — a new entry — is created about a data set, you can grab a person, and when that person is attached to the record, anything else that person is associated with will display the same way; that person will show up the same way. So there are a few components on the screen.

For metadata stores specifically, over and above our Seeding the Commons efforts, we worked to get as much data into that name authority as we could. We set up the process and procedure by which information comes out of our research management system — which is a home-grown system; we're not actually using ResearchMaster — and we've got that process documented and refined so that we know how to get information from that system into the Mint, which ReDBox can then use. We did the same for activities, and we also spelled out how other activity-type information, such as FOR codes and national grant information, can be put into the Mint ready to go, so that when it comes time to create a new record about a data set, the plan is that these bits can be easily grabbed and attached.

The other component — and this was a requirement of the metadata stores program for us, and I think for most — was the ANDS party infrastructure, which means linking out to the National Library of Australia. So again we defined how that was going to work, documenting the procedure for ourselves, borrowing documentation that ANDS had already created around performing matching within the National Library of Australia's Trove application, and doing the checks inside the Mint to make sure that that link, that extra puzzle piece, comes into the collection record. Having all of that ready to go in the Mint is intended to reduce the amount of time and processing required when a new data set comes in and a collection record is needed. You can see there I've got a little asterisk, just a disclaimer: this is a very simplified diagram, but I think it at least gives you some idea of grabbing different pieces, sticking them onto a record, and then putting that out.

So just to reiterate, our metadata stores work was really about the mechanics of the system — improving the gears, improving the flow and the process. In my conversations with other collaborators, especially those universities that have adopted ReDBox, I think it really is about the implementation of the mechanics and of the processes and procedures, so that the team interacting with the system have a clear sense of where the process starts, what to do, and where the process finishes. And then, of course, in each case the integrations with various systems have to occur so that the process works. As you're crafting the process, you'll notice that you want to be able to do the next step, but at the same time, for that step to occur, some other problem has to be solved: some other system has to be adapted, or if that system can't be adapted, then some adaptation needs to be made in-house.
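[Editor's note: the export-and-ingest step described above can be sketched roughly as below, assuming a research management system reachable as a SQLite database and a CSV-style harvest of the kind Mint can consume. The table, column names, and the NLA party identifier field are illustrative only, not UWS's actual export.]

```python
# Sketch of the export step: pull researcher records out of a (hypothetical)
# research management system and write a CSV that a Mint-style name authority
# could harvest. Column names and the NLA identifier field are illustrative.
import csv
import sqlite3

def export_people_for_mint(rms_db_path: str, out_csv: str) -> int:
    conn = sqlite3.connect(rms_db_path)
    rows = conn.execute(
        "SELECT staff_id, given_name, family_name, nla_party_id FROM researchers"
    ).fetchall()
    conn.close()

    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["ID", "Given_Name", "Family_Name", "NLA_Party_Identifier"])
        writer.writerows(rows)
    return len(rows)

# Example (hypothetical paths):
# count = export_people_for_mint("rms.sqlite", "people_for_mint.csv")
```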
The other thing — and this is tied again to getting information into the Mint component of ReDBox, the name authority — is that we developed in-house (this isn't on the slide) a way to view the ingested information. Each time an ingest comes into the Mint, a new entry is created in a new view, so we created a view that lists each ingest that has been brought into the system. We had an in-house developer — you might have seen his comments and responses on the ReDBox distribution list, that's Lloyd Harischandra — and he's done a great job. This is actually available on GitHub as well, if anyone wants to take a look at it preemptively, pull it down and patch it in. We have initiated a pull request so that hopefully it gets into the trunk functionality of ReDBox. As I said, this was a new feature, and I thought it was quite key to be able to see what and how many records are coming into the Mint and what updates are occurring, so that if anything isn't registering, or any new information doesn't make it in, we can look at the report and see whether those records are being updated. So that's the mechanical work that we've done just on the metadata stores.

The next thing I'd like to talk about is the broader context. At UWS, as you know, Peter is very collaborative in his approach, and we also took a somewhat holistic approach with the metadata stores project. I was actually brought in to implement the research data repository, and inside the large head of the repository was this smaller head of the metadata stores. So I had to get my head around both, and understand the distinction: the deliverables associated with the ANDS requirements, and then the larger deliverables of the university. I think that was actually a very helpful approach. Not only was the metadata store integrated with the larger agenda, it was also part of the technology needs of the components that would be feeding in — or components that maybe don't directly feed into the data catalog but, as a system, will enhance, or you might say increase, the possibility that we would see more data sets, see better data management planning, and ultimately see the reuse and the integrity of the data go up.

Also part of the setup: within UWS we created a steering committee. That was not through my efforts; that was largely through the efforts of Peter Bugeia — he's the Intersect liaison here at UWS. Essentially it brings together the major stakeholders and gets them in a room on a regular basis to report on the progress of the RDR, the research data repository, on the metadata stores, on Seeding the Commons, and also on our data capture project, as well as some other projects. Just having those people in the room interacting with one another, understanding where any difficulties might be, being able not only to escalate issues but also to communicate with the group when there is a success — those successes can then be carried back to their various organizations. In forming the steering committee there was a lot of thought about what these different players would actually get from the various e-research projects, essentially from the research data repository and from the metadata stores. And we spelled out — actually Peter spelled out — some of those key benefits to stakeholders, and it's on our e-research blog. Did the screen just change?
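[Editor's note: the ingest-report idea described above — a view showing what each harvest brought into the Mint — could be sketched as below. This is an illustration only, not the actual GitHub patch; the log format and field names are hypothetical.]

```python
# Sketch of an ingest report: log each harvest run into the Mint with how many
# records were created or updated, so a missing or stalled ingest shows up.
import json
from datetime import datetime, timezone

INGEST_LOG = "ingest_log.jsonl"  # hypothetical log file, one JSON object per run

def log_ingest(source: str, created: int, updated: int, failed: int = 0) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,          # e.g. "rms-people" or "rms-activities"
        "created": created,
        "updated": updated,
        "failed": failed,
    }
    with open(INGEST_LOG, "a") as fh:
        fh.write(json.dumps(record) + "\n")

def report() -> None:
    with open(INGEST_LOG) as fh:
        for line in fh:
            r = json.loads(line)
            print(f"{r['timestamp']}  {r['source']}: "
                  f"{r['created']} new, {r['updated']} updated, {r['failed']} failed")
```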
Yes, yep. Okay, good. So anyone can go to this — it's a public blog — and you can get updates about what's going on with us. This particular entry is an early one regarding the RDR, and as I mentioned, the metadata stores work sits inside of that. We talked about the stakeholders and what those stakeholders would receive, or what we could do for them. Obviously at the top of the list is researchers; they're really the reason we're doing all of this. We want to create a safe place to put working data — that's the storage component of what I do. We want a platform for collaboration around the data; that's also very critical. And specifically on the metadata stores — I think this is well understood, but I'll go ahead and say it — collaboratively reusing data, being able to find the data so that it can be handed to someone who's interested. I've heard stories where the data is quite old and it's been placed somewhere — it's on a thumb drive somewhere, but no one is sure where — and despite the best efforts and goodwill of the researcher who created the data, if it can't be found, it's much more difficult to share.

Archive storage: you'll notice the top bullet talks about working data, and the third bullet talks about archive storage. So we've got working storage and we've got archive storage, and we're really pushing forward on this concept of working with the data, but then at reasonable intervals, or at key stages, moving it into archive. A platform for describing and advertising data sets — that's the data catalog right there. A platform for publishing open access data: one of the strengths of ReDBox is that the data can literally be uploaded while the collection record is being created, so electronically it can simply be attached to the record and marked as open access. A platform for protecting confidential and trade-secret data: on the flip side — and I think this is going to be an ongoing PR campaign with researchers — when a collection record is created, it at least gives everyone the opportunity to ask about the data set, but it doesn't necessarily release the data to them. So we can protect anything that needs to be protected. Direct connection to scalable computing resources: this is within the data repository component, but the computing resources are obviously key to generating answers to big questions. And then the most difficult one — I'm saving the most difficult bullet for last — minimal extra work: trying to get researchers to change what they do, or change what they're trained on, to something better, without creating more work and more hassle.

I think reading this, being able to see it on the web, anyone can realize that UWS are biting off a big chunk and that we're taking seriously all of the different components that need to come together — not just metadata stores, but a variety of components that need to come together to create a solution and a service that researchers will naturally gravitate towards. So not only have we integrated the metadata stores project-wise, we've also brought different pieces of the university together in a steering committee, and we've tried to address how those different pieces are really going to benefit from what we're doing. I'm marching right through this — I hope everyone's following okay.
I don't know if there are any questions at this point, but I think what Simon said about collaboration — this is where collaboration was really key. Not only at the steering committee level, but also at a technical level, because within the data repository we had a lot of integration with the library, with information technology services, and with the Office of Research Services, and we asked them to do things. We asked them to create new services, make changes to their forms, create new processes and new procedures. And as that happened, that's where we saw success.

Okay, moving on. So then, not only do we think about how we can collaboratively introduce new processes, new tools, new collaboration opportunities, and new ways of obtaining data sets to put into the catalog — we've also given some thought to how we can tap into the research lifecycle. Where are some natural breakpoints? Where are some triggers that would incentivize the researcher to start taking advantage of all of these great things that are coming? I'm just going to flip to the blog once again really quickly. You can read through the words at your leisure, and there are some pretty pictures in PlantUML to make it even clearer.

The first one is a library-initiated data deposit. Again, in the Seeding the Commons project there was quite a bit of work from our library staff — Susan, who you know, Yui, and also Maura and Amir, who you might have seen on the ReDBox distribution list. This one's a bit more ad hoc; it's agnostic of any other trigger. We're starting with a very spontaneous scenario or use case, and it might even equate to a cold call in some cases. I'm going to borrow Susan's and Yui's story at this point — this is really more of a Seeding the Commons story, and it has to do with the link to our steering committee. My understanding is that in the early days of Seeding the Commons, reaching out to individual researchers and saying, hey, would you like to try this cool new thing, didn't have a lot of success. But being able to go back through the steering committee and through the PVC(R), the Pro Vice-Chancellor (Research), and ask them to nominate some people — and maybe butter them up a little and say, well, your research is seen as important, and we'd love to work with you and preserve this extremely valuable data set — being able to send that message from the PVC(R) really beat a path to the researchers in a much more effective way. Susan was able to get more engagement that way, so we were able to get data sets, and the metadata stores also definitely benefited from that. The same engagement continued, and some of those data sets were pushed through under Seeding the Commons; Seeding the Commons finished, and there were still data sets coming in. So we've definitely seen success with what you might call a library-initiated, or even an ad hoc university-initiated, approach of nominating individuals and asking them for their data sets.

Just to quickly walk through: let's say we search for candidates in the university systems — in our research tracking, research management system — find who those researchers are, and ask the researcher directly: is there any data that you would need later, or need to archive, or need to cite — any sort of marketing-type message we can potentially send to the researcher to entice them to participate.
The researcher, of course, instantly obliges, is completely besotted with the idea, and sends their data set. Then the information around the data — the catalog entry — is created, and the data plus the entry goes into our research data repository, or specifically into the catalog. By representing the repository as one thing, it masks all the mechanics and the gears: this bit is the storage, this bit is the catalog, this bit is the working area, and so on.

Then we looked at different ways that, on a more procedural or ongoing basis, we might start to see data sets and new catalog entries come through. In this case, it's automated data capture. We do have a data capture project at UWS — that's DC21, which is going really well — and there may be other data capture. Most of you might be aware that we are cooking up a hybrid data capture and ad hoc packaging service, formerly known as the Claw, now known as Cr8it. Cr8it is going to combine the benefits of data capture with the benefits of being able to do packaging at basically any time. So at any stage in the lifecycle this packaging can occur: a data set can be packaged up and transferred into the catalog and the data store. But this scenario assumes that a data capture application is in place. And obviously we're not the first — data capture has been in place at Newcastle for a while; they have research centres that send data collection records on a regular basis — and there are other universities around the country. This is not as ad hoc as cold-calling a researcher and asking them for their data. It requires a bit more planning in advance, and it also assumes that during that planning the lifecycle is considered: the type of data we're generating, the frequency of the data, the volume of the data — all of those things. Then there's a setup where the data capture is implemented, the switches are configured properly, and data simply flows in nice, neat, tidy packages. So again, it's not tied to any particular part of the lifecycle, except that generally it would be at the beginning of the research.

Then we've also had scenarios where the publisher — a journal or any publishing activity — says, you've got to have your data in a place where you can cite it, and it's persistent, it's permanent. That's pretty self-explanatory, and typically it would come towards the end of a research cycle: a paper's been written, or potentially the researcher already understands that the publisher is going to require this — the journal they're targeting has already been very public about needing proper data deposit and citation — and potentially that researcher plans ahead. But the actual bundling of the data, creating a data set, making the deposit, would probably occur at the end. So again, we're tying it to the lifecycle, trying to plug into a natural spot where researchers would naturally want to provide a data set and make it available.

And then, again as part of the lifecycle, applying for a grant, or at least commencing the grant. I have talked to researchers where the grant application process can be quite long and bumpy, with no guarantee that anything is going to come through, so we do have to work with the possibility that there are long gaps of time.
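[Editor's note: the "package at any stage" idea mentioned above could look roughly like the sketch below — bundle a working directory, record a checksum, and emit a small manifest that a catalog entry could point at. Paths, manifest fields, and the function name are illustrative, not the actual Cr8it or DC21 implementation.]

```python
# Sketch: package a working data directory into a zip archive with a checksum
# and a small JSON manifest, ready to be deposited alongside a catalog entry.
import hashlib
import json
import shutil
from pathlib import Path

def package_dataset(src_dir: str, out_dir: str, title: str) -> Path:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Create <out_dir>/<dirname>.zip from the working directory.
    archive = Path(shutil.make_archive(str(out / Path(src_dir).name), "zip", src_dir))

    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    manifest = {"title": title, "archive": archive.name, "sha256": digest}
    manifest_path = archive.with_name(archive.stem + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return archive

# Example (hypothetical paths):
# package_dataset("working_data/rainfall", "deposits", "Rainfall observations 2010-2012")
```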
There's some damage control when it comes to how much effort am I really going to put into this before I know it's going to happen. But all of that aside, it makes sense that if I'm a researcher writing a grant, I strengthen my application by saying: I know exactly what to do with my data; I've already planned for where it's going to be deposited, how it's going to be deposited, who's going to take care of it, and how it's going to be cited. So this scenario is something we can present and say: here, just copy and paste this into your application and we will help you execute it. Also tied to grants: obviously, when the grant is finished, the actual data deposit needs to occur. So, assuming they've done all their planning and signed up to this, a data deposit occurs when the project is finished.

Now, the reason I bring up the lifecycle as part of our metadata stores — actually, there's one more scenario before I get to that reason. Reporting-driven: if the university needs to report that it has generated this much data, that the university is playing well with others, that the data is available — or even wants to build a reputation for really taking care of researchers — we want to be able to report on that. So if publications need to have an entry, or if we want to put this into the research management system, for example into our ERA portfolio, then this is another trigger we've identified that would encourage the deposit of data and data citations. So, as I say, read through this whenever you have the time, and we hope it generates more discussion so that these triggers can be recognized, planned for, and adopted.

The reason I bring this into a metadata stores discussion is that, as I've said before, we're taking a holistic approach. We need to see metadata stores as a project in itself — we need to see it succeed and we want to fulfill our obligations — but also, to see the larger projects succeed and to accomplish the ultimate goal of naturally plugging into what the researcher does, we need to recognize where those intersection points are, where the natural triggers are.

Would you like to talk about the pros and cons from your perspective — some of your reflections on those collaborations?

Let's see. So let's start with the pros. I am naturally a glass-half-full type of person, and it's been a massive, massive benefit to have the collaboration on the software components specifically. It would be really difficult, in my mind, to separate the ANDS objectives, deliverables, and requirements from this solution that we've implemented, which is ReDBox. It really gives our university, and other universities like Newcastle, something concrete that we can point to and say: we need this to work well. And it at least gives a point of discussion that says, okay, I'm trying to implement this process, and if there's a feature that's already been put into the software, then it's fairly easy to just point to it and say, implement this and you've got your problem solved. As I mentioned earlier in the presentation, it's not only the software and it's not just the mechanics; it's also the procedures and the — we don't quite have a policy, and I think that's on purpose; you can discuss that with Peter Sefton.
But it's the policies and procedures and the organizational change at the university that seem more important. And while every university is different and each has its own challenges, I think there's still room for collaboration, and for collaborative discussion around the challenges of reaching out to the right people, the challenges of making headway, essentially, and what kinds of messages might fly better than others. So there's that. I would also say that because we had somebody who's already been involved — we've got Peter, who is also technically savvy, and we brought our own programmer — I don't think it would have succeeded as well without Peter's technical background.

Toby, would you like to talk specifically about the collaboration that you've been engaged with? What have you been trying to do?

Yes, certainly. One of the features that was developed by the University of Newcastle was the reporting — having some better reporting inside the ReDBox software. We went out with the intention to improve the reporting; as I mentioned before, I wanted to be able to see when data had been ingested into the Mint. And Vicki, at the University of Newcastle, included that as a feature. We actually talked it through, and I thought that was great — a classic case of many hands making light work. We could then take our resources and invest in something else while that was being developed. So that was very positive, and it was great to work together on that.

Then — not in a negative way — it just so happened that the list of ingests we were after didn't quite make it into the upgrade that Newcastle had put together. So it was just as well that we were still proceeding with our own version. Collaboratively speaking, if we had had a real dependency, I would have had to make changes to adapt, because that feature didn't actually make it into the reporting that the University of Newcastle created. Our list of ingested data has a different look and feel, and it's not really inside the new reporting module that's in ReDBox version 1.6. The communication was very good, though: as soon as something changed, Vicki called me immediately, explained the situation, and we worked with that.

Collaboratively speaking, however, we were still in a good place, because one of the optional deliverables on our metadata stores subcontract was to provide reporting, and we basically got that deliverable for free. It was reporting about what's in ReDBox, versus what we were doing, which was a report related to ingested information, ingested metadata. So there was definitely a massive benefit in that we didn't have to go to a lot of effort to get the deliverable. And then there were just the changes as well: you're at the mercy of changes to the planned functionality. If there are changes going on at the other organization and it's beyond your control, that creates a follow-on effect — if we hadn't had something we could go to, we would have been stuck having to create that feature ourselves.

So, your advice to others engaged in collaborations, from your experience? What's the key to making it work? Toby, I've lost your sound.
I think I bumped something. You're back, but we missed your answer. Oh, yeah, sorry. Let's see. My advice — I'm not one for handing out advice, but what I learned was that, first of all, there was real trust, and it wasn't necessary to formalize anything. We didn't have anything formal, and I don't think anyone in the community has anything formal written down that says: we agree that this feature is being developed by you, and we agree that there will be massive problems if it doesn't happen. So nothing formalized, and I think that was not a bad thing. On the other hand, by not having it spelled out, you're definitely in a trust scenario. From my perspective — and this is more about style — I prefer to just roll with it, and if we hadn't had a backup plan, I would have created one. So as you can tell, I don't really have advice for collaborative development, because I simply enjoyed the benefit without having to work through major difficulties, and the difficulties I did have were not insurmountable.

Perfect. Well, hopefully there will be some questions coming in. You can either use the chat box — Amanda, it would be great to hear from you — or unmute yourselves and ask Toby. Toby, while we're just waiting to see if any questions come in: the project at UWS is beginning to wrap up now. What are your thoughts on it as an instrument of change at UWS? I know that it's part of a bigger picture, and you've covered that, but we spend a lot of time reflecting on just how these metadata stores projects are having an impact on the universities. Is there anything more to add to what you've already said?

Well, maybe I could just build out the concept of making it easy for the researcher, and also, I would say, making it easy for the supporting environment. Within our library team, being clear about how and when to address research data — that kind of clarity, I think, is invaluable. I would strongly urge any university, any e-research team or library team that is implementing their metadata store: one of the key success factors, I think, is to make it very clear how things happen. Create that recipe, if you will, so that things proceed the same way each time. As that clarity emerges, it becomes so much easier to explain it to other university stakeholders: this is how it works. With that clarity, all of a sudden it's not so difficult; all of a sudden it's very doable. And with that clarity, it's also easier to explain to researchers.

So, to get back to your question: the project itself was about greasing the gears, making the clarity more apparent, making explicit as many things as we could, so that the path is that much smoother and easier to follow the next time and the time after that. We really are taking a long view. We're basically saying this is going to happen over the long term, and by setting it up in a clear and consistent manner we can then start to make improvements. The core capability is set up, and it's as smooth and as easy to do as possible, and from there it can only get better and improve — with, say, more automation, or more flexible input, or any number of other things.

Thanks, Toby. There's a question from — I'm pretty sure it's Mary White, who appears under the name Maureen McCarthy. Mary, if you've got a microphone, would you like to unmute and ask Toby? If not, I'll ask it for you.
I'll pause for a moment. I don't think Mary has a microphone, but her question is — and I'm going to put a couple of words in front of it — what are the benefits of MS23, your metadata stores project? She's written: the benefits of MS23 being incorporated in the institutional repository project. So I think it's: what are the benefits there?

That's a very good question. First of all, infrastructure-wise, holistically speaking, we looked at the infrastructure needs of the metadata store, of ReDBox. So when we were planning to upgrade the storage, and planning for some of the appliances that go around the storage, we were able to incorporate the needs of the data catalog. At the same time, looking at where data is ultimately going to go, we were making space and making sure that certain technology was available to plug into. And that's just the infrastructure component.

The other component: we actually moved the IT services piece into the overall project. So anything that we needed for the metadata store from an IT services perspective, we could then use the services component of the overall project, and that component was to create a new service where needed and basically insert it. This ties more to storage: for example, we created a new service around the storage to give working storage to researchers, and as an integrated part of doing that, we wanted to give the data catalog visibility of that storage as well. That then makes it easy to grab the data and import it into the catalog, assuming the data size is reasonable — and most research data is not that copious. So that's creating a new service, not necessarily required by the ANDS metadata stores, but because it was under the overall umbrella we could integrate it any time we were creating a new service.

Having the metadata store as part of the overall RDR project also gave context to our steering committee and to the leadership: they could see that it was part of a larger thing. They could see that this isn't just an exercise to tick boxes; it's really an exercise to roll out a new way of doing things, and a new, I'd say, host of services that make the university better able to work with researchers.

Thanks, Toby. We're getting very short on time, and there are two questions you might quickly cover. One from Sharon at UTS: what platform is the repository based on? That's the repository side. And Amanda Nixon has asked: Toby, could you please talk a little about how you work with your research office, especially when you're thinking of dealing with reporting? So you haven't got much time — see if you can cover those two.

Well, the first one is easier: it's running on Sun Solaris. Any other questions there? The next one: our research office have been fantastic, in the sense that whenever I talk with them about processes — data management especially — and as a concrete example, we asked them what forms and grant applications we use for our internal grants, and also what forms are used to set up a new research project. A researcher here would come to the Office of Research Services and say, I need to get started and I need to write a grant. They were able to bring out those forms, and they were able to say: absolutely, let's work together on inserting some language around data management. So that was quite good, and they've just been very cooperative.
I can only say that any walls or obstructions must have been beaten down before I ever arrived, because they're very cooperative. Another concrete example: within the research management system, we've got a systems engineer who has made suggestions that, essentially, we can implement triggers, reminders, and other things in the system that would prompt researchers to participate, get involved, and think about their data. So there's just been a real willingness to discuss. I don't know if that's just because the relationship must have been very good to begin with, and it could also be because of the link to the steering committee again: with all of these senior people eyeballing each other and daring one another to blink first, they can essentially send that message, that impression of cooperation, through to their teams. So anyone I talk to knows that the head of the Office of Research Services is on this committee and agrees with what we're doing, so we get that cooperation.

Thanks, Toby. Look, we're just running slightly over time now. I just want to thank you very much for your time and that presentation.