Welcome to the talk on the Data Curation Network. My name is Lisa Johnston and I work at the University of Minnesota. I'm our data management and curation lead there, and I am the PI for the Data Curation Network grant project. Let me introduce my co-presenter. Hi, I'm Cynthia Hudson Vitale. I'm the head of research informatics and publishing at Penn State University, and I'm the PI on the Data Curation Education Initiative. Unfortunately, Tim could not join us today, so it's just the two of us presenting the work of the DCN. Because we're such a small group here, I'd really love to keep this informal if that's possible, and I'd love to hear your questions as we're going along. We are recording this session as well, just FYI, so you can take a look at the slides later. But first, I just want to get a quick poll of the room. How many people work at an institution that has a data repository? So, a good majority of the room. Okay. Is anybody in the room a data curator themselves? Okay, great. So I'm going to continue by talking a little bit about what data curation is, why we're doing it, and why we're talking about it. Both Cynthia and I, and all the members of the Data Curation Network, work at institutions that do have data repositories. And the challenge we're trying to help researchers address is this need to share their data. Sharing data presents its own unique set of challenges. Data are not naturally self-describing. When you open up a data set, you don't automatically know who created it, why they created it, how they created it, or what in the world it means. So there are several more challenges for us as we run our digital repositories and ingest these much more complex digital objects than, say, a PDF. These are some of the issues we run into, and data curation really helps address these challenges.
So a data curator really works in partnership, and that's a really key component of this. We work in partnership with the data authors to publish their data, to really provide that FAIR access — the findable, accessible, interoperable, reusable components — to their data in the publication process. When we did a survey of all of the data sets that came into our initial six planning-phase institutions in this project, we found that 47% came with no documentation whatsoever. So these are files coming to us in Excel, in MATLAB, in a wide range of formats, with really no explanation of what they are. The only way for somebody to get to that findable, FAIR component is by contacting the original author and asking them all of the questions that should have been answered in the documentation. And that's not a great long-term solution. Here's an example of a data set that came into the data repository at the University of Minnesota. Not only did this data set look exactly like this, but it presented a number of additional challenges that you're not seeing here, just from the Excel file. It turns out each one of these columns represents a unique participant in a study of seasickness: people were standing on a ship, on a machine that detected how much they swayed. So pretty basic. The other challenge, though — besides the fact, as you've noticed already, that there are zero headers here — is that you really don't know what's the difference between column A and column D. In fact, it's a different participant doing a different part of the study, it turns out. But we also needed to know: did the participants agree to have their data shared via public access? In many cases, they didn't. So asking additional questions of the researcher in this curation process turns out to be vitally important to the data sharing process. We really do detect some of these errors.
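The repair for a file like this is mechanically trivial once the author supplies the missing context — the hard part is the conversation. As a small illustrative sketch (the values and column names here are invented for illustration, not taken from the actual study), restoring headers with pandas might look like:

```python
import pandas as pd

# Hypothetical stand-in for the deposited spreadsheet: no header row,
# and each column is actually a different participant's sway readings.
raw = pd.DataFrame([[1.2, 3.4], [0.8, 2.9], [1.5, 3.1]])

# After asking the data author what the columns mean, restore
# self-describing labels so a first-time user can tell them apart.
raw.columns = [f"participant_{i + 1}_sway_mm" for i in range(raw.shape[1])]

print(list(raw.columns))
```

The point of the sketch is that the labels encode exactly the knowledge that only the data author had — which is why curation starts with asking, not scripting.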
Oh, and this is the corrected version, after we sat down with the researcher and actually filled in the headers. It was a very simple mistake. It's not as if the researcher didn't know what the headers were — they removed them to do the processing in R. That's a pretty common practice you'll run into. So we work with our researchers in the data curation process to review the data set as a user would. We ask all the questions that a first user of the data set would ask, and we ask them of the researcher once, and then complete that record so that future users of the data don't have to go and ask them — hopefully. So we do things like adding missing documentation; I just gave you a full example there. We screen for privacy and disclosure risks. So we're not only looking for identifiable information, which we didn't see there, but we're also making sure the data are shared according to the participant agreement and the ethical obligations the researcher has. Fixing code issues — that's another big one. We'll run the code and detect issues that come up; if you move code from one operating system to another, you're dealing with a different set of packages. We transform file formats. You saw that CSV earlier — we might get an Excel file and transform it to CSV for preservation purposes, retaining both because they both have value. We'll arrange all of those files. If we get a package of data that comes with 100, 200, sometimes a million files, we do have to think about the arrangement of the files so somebody could actually use them. And then, of course, we're reviewing and augmenting the metadata to make sure these data sets are contextualized — that they're linking back to the publication they might relate to, or to other versions of that data set. So that's a brief understanding of what data curation is. But why do we do it? It's not just for us in archives and libraries.
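To give a concrete flavor of the "arrange the files" activity, here is a minimal sketch — not the DCN's actual tooling, and all file names and contents below are invented — of generating a manifest with sizes and checksums, so a deposited package of files stays navigable and verifiable over time:

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Fake deposit directory standing in for a submitted data package.
deposit = Path(tempfile.mkdtemp())
(deposit / "readings.csv").write_text("site,count\nA,12\nB,7\n")
(deposit / "README.txt").write_text("Field counts collected in 2018.\n")

# Build a manifest: one entry per file, with size and checksum, so
# future users can see what is there and verify nothing has changed.
manifest = {
    f.name: {
        "bytes": f.stat().st_size,
        "md5": hashlib.md5(f.read_bytes()).hexdigest(),
    }
    for f in sorted(deposit.iterdir())
}
print(json.dumps(manifest, indent=2))
```

A manifest like this scales from a handful of files to the million-file packages mentioned above, and it doubles as fixity documentation for preservation.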
It's not just about making sure all of our records are neat and tidy. It's really for that top level there, that public piece. Many of us are doing this as part of a public-access data repository, to ensure that those data sets can be used. We also see great value in working with the researchers. A data curation interaction is not usually just a one-time service — okay, I'll never talk to you again. It's actually building a relationship. We talk with them about this particular data publication, and then when they submit again, their data set comes to us better documented. They invite us to come and talk to their labs. They invite us to work on their next grant submission, to build out their data management plan. So we really demonstrate not only the usefulness of the data, but also the value that data curators can add to the research process. The value to repositories is that our repositories are more trustworthy. We are putting our best data sets forward. We're not letting data fall through the cracks. We're not publishing data that can't be used. And then the curators themselves really benefit from this work, as I mentioned, through that relationship with the researchers — and it's just part of our profession to increase the value of the objects we are curating. We also did a survey, an ARL SPEC Kit in 2017, where we asked ARL institutions if they were curating data, and if not, what were some of the barriers to curating research data. One of the main barriers is not having expertise in the domain. Many of us might be working for an institution that has lots of different domains, and the data you collect comes in a wide variety. Scaling with the increased demand is also a serious concern. Right now I think many of our data repositories are seeing just the tip of the iceberg of the data sets that are being generated and that could be better curated and better managed.
And then the third one — I won't go through them all — is the retooling of existing staff. How do you actually train a data curator? That's a major concern. So these are all issues that we knew, going into the Data Curation Network, that we wanted to address. Let me now shift to what the Data Curation Network is. The DCN, the Data Curation Network, is a shared staffing model for curating research data. Our mission is really to enable researchers to better and more ethically share their research data. Right now we have 10 institutions involved in the Data Curation Network. Nine of these are academic institutions with institutional repositories that accept and curate data. The 10th is the Dryad data repository, a general data repository open not only to institutional affiliates. Right now we've got 28 data curators who bring with them expertise in 43 domains and 26 specialty file types. These are some examples of what I mean by the domains and file types. I intentionally split those out because of the research we've been doing in curation: even if you have really deep domain expertise in a particular area, if you're not familiar with the format the data comes to you in, you're really going to have a hard time — or maybe just a less enjoyable time — curating that data set. So as an example, we're seeing health sciences 3D image data files that come to us in BFF and XLS file formats. We're seeing microbiology images and code — the code is in Java in this case. Those combinations are really important, because once we get a data set into the Data Curation Network, we evaluate it for those key attributes, and that's what we use to assign it to the right person in the network, someone who hopefully has both the domain and the file type expertise. If we have to prioritize, we often prioritize by file type, and we've done a lot of research to show why that is. But those are some examples of what's coming through.
So what we're doing in the DCN is, instead of just your local curators trying to handle all of these different file types, we are sharing the burden across all of these institutions. We've been working since 2016 — that's hard to believe. We had an initial planning phase, funded by the Alfred P. Sloan Foundation, that allowed us to really conceptualize what it would look like to share expertise across institutions. And then we went into implementation mode in 2018. Right now we're in 2019, and we have actually been up and running since the beginning of January of this year. So far we've curated 70 data sets collectively — and that's only a portion of the data sets curated across all of our institutions, but 70 data sets have come into one institution and been sent to somebody else in the network. So I'm here to say that, if you don't get anything else out of this session: it's working. We're actually able to pull this off. The future, though, of course holds a lot of questions for us about how we actually grow this project beyond a very tight-knit group of people who have really come to trust each other and have built up this pretty intensive process for analyzing and curating data sets. How do we scale? So here's our vision, and we've got a couple of ideas on how to do that. The curation itself is the actual doing: we're doing the work of curating each other's data sets, and I'll talk about our workflow. We're also sharing what we know — actually, not just sharing what we know, but harnessing the knowledge of the community to build better data curation practices, and Cynthia will be talking a lot about that. Through that work in the education piece, and through some of our other work in our subgroups, we're developing new research and new curation best practices. And then finally, I'll conclude with some of our sustainability plans.
We've actually been working with Lyrasis on some consulting work, gathering stakeholder feedback about how we can really make this scale. But first I'll talk about how we actually do it. The curation work involves training all of our curators on similar best practices — we want to have some kind of minimum level of curation. We also have a coordinator, a person in charge of assigning the data sets, assessing the data sets, moving them all around, and monitoring the turnaround time; that's another really big, important piece for us. And then there's the technical infrastructure to make all of this happen. All of this we pulled together in about a six-month time period. What I'm showing you right now are our CURATE steps: the workflow, the baseline understanding every curator needs of how to curate data and what we're expecting, before they apply their own unique domain and file format knowledge. We train all of our data curators on these CURATE steps, and we hold an annual training where everybody comes together. This is not a sequential list of steps, even though we made it spell CURATE, which is great. But it does really work as a framework, or scaffolding, for deeper-dive checklists, like you see here in each one of those steps. And each of these steps can differ for different types of data — we treat code differently than we treat our Excel file formats. So this is just the workflow, the kind of conceptual model, of what happens when a local institution like Minnesota or Penn State gets a data set. The local institution is responsible for the data — that's something I need to make very clear. The Data Curation Network itself is not a repository. It is not a repository service, at this time at least. The local institution does the appraisal; they accept it or not.
They need to make that storage location available for the data. And then they can choose: is this something my local curators would be best suited for, or should I send this off to the network? If so, then our coordinator will review it and assign it to the appropriate curator for that particular data type. The curator will then go through the CURATE steps, go through the workflow. We have set up JIRA as a ticketing system to help us manage all of this, and as a place for our curators to enter all of their recommendations on how the data should be treated. Now, that's also important, because our data curators are making recommendations back to the local institution, and we did that very deliberately. It's not as if that reduces time at all — it actually adds time. But we didn't want to interrupt that really important relationship between the local curator and the researcher. And I think that's definitely up for debate — I'd love to hear people talk about that, although I'm seeing maybe one of our curators nod. But it is really important, and it's something we started out with: recognizing that curators wanted to retain that relationship. So the DCN never really interjects itself on your campus; we're not going and talking to your researchers directly. All of that information gets back to the local repository, where they will then make the data set available and preserve it for the long term. The curation work itself, in this case, is really about all of that initial review and understanding of what else we need for this data set to make it useful. I mentioned JIRA as the tool we're using, along with time-tracking software. We also do surveys to capture the curator's experience: was this appropriately assigned? How comfortable did you feel working on this data set? And I mentioned the annual training.
We also have a couple of different ways that all of our curators communicate as a network. They have a listserv, and we work together on Slack very often. So there's a lot of interaction, and that's another really key aspect — not only for our project, I think, but for any network or shared staffing model: bringing people together to really build those relationships and build trust. As I mentioned, we really went live — we turned everything on — January 1st of this year. So far 70 data sets have gone through the network. And you'll notice in this mapping, where the orange is the submitted data sets and the blue is the expertise we have in the network, that we're seeing a lot more code, I'd say, than we have expertise for. I know in particular that one of our curators at Penn State, who's got a lot of coding expertise, is seeing a lot more data sets than some of our other curators, like myself. I think that's really an important lesson learned so far — it's where we're seeing the most need for support. Institutions aren't sending us their easy data. They're not sending us their Excel files. Yeah — question. "Just to clarify: is that deposits of code, or deposits of data that come with code?" Both. Yeah, definitely both. I don't like to make too much of a distinction between code and data, because I think that's a difficult thing to tear apart. We're seeing code coming to us that could just be a compiler, with really no data to go with it — somebody just wrote a compiler. But for a lot of the data sets we're seeing that have code, it's more for the reproducibility piece. This graphic, by the way, is not going to work for us in the long term, because obviously the number of data sets is going to keep going up, and our expertise is going to grow, but probably a lot slower. So we probably won't keep this graphic.
But I do like it, because it gives us an example of what we thought we would need and what we're actually seeing. Turnaround time is an important piece for us. Researchers are often depositing when they need the data now. A lot of our deposits come to us due to journal requirements — not necessarily funder requirements, but journal requirements for data sharing — and they're looking to get their DOI and resolve all their issues with the publisher. So turnaround time is important. We do often set a deadline of about five days — five working business days — for a curator. It can take a few days for a curator to really dive into the data, and that back-and-forth communication is really important. But we are seeing a median turnaround time of three days. And then once the data set has gone through the network, once the curator has worked with it and it's gone back home and been published, we want to let people know about it. So we do have a website with all of our data curators' profiles, and we keep track of the data sets they curate. In this case, Ashley from the University of Illinois worked on a forestry database for the University of Minnesota, and we point back out to that data set where it lives, at its home institution. We're also noting any curation actions that were taken that were particular to this data set — creating documentation, adding additional metadata, or maybe the data set had some particular quality control issues that were alleviated by this work. So we're trying to really measure what was done for each data set that goes through the network. And then I guess I will turn it over to Cynthia to talk a little bit more about our education work. Great — I'm wondering if we can just take any questions first. Okay, so now I'll quickly cover some of the other initiatives the Data Curation Network is involved in.
One of the big ones is the DCN Education Initiative, which grew out of an IMLS grant. The goal really is a peer-to-peer training model for curating research data. These are intense one-and-a-half-day curation workshops that we've run throughout the United States over the last two years — and Canada; sorry, that was just two weeks ago, which is kind of exciting — to really share expertise for curating different types of data sets. These workshops have been quite popular. I think we tallied it up, and over 175 people have been trained, with 98 institutions and organizations represented. They run the gamut: R1s, liberal arts colleges, even U.S. state and federal organizations — USAID has sent people, ICPSR, others. So it's been a really good initiative. These are just some high-level learning outcomes. We're really interested in increasing the understanding of data curation practices and sharing expertise — this is integral. The community of data curators is vast, and it's really important that we all build a community together. And then finally, it's all about networking too. It's really interesting — at our last workshop, which we just had at WashU in November (sorry, I'm looking at Mara; I know you weren't there), we actually had tweets where people were like, "Oh, it's really sad that this is only one and a half days. I wish it could be longer." And I mean, I've never gone to a workshop and said that. So that's super positive feedback, in my opinion. Fun stuff. This is just an example of how the days run. We really let the individuals who attend get hands-on experience with the curation workflow, and I'll show you an example in just a minute of how we do that. But really, we do deep dives into each of the CURATE steps. And you'll notice, over here where it says 12:15, that we have added a step as well.
So while Lisa talked about CURATE, we actually have CURATED, where the D stands for Document, because that's so important. That's sort of inherent in the DCN model through the work happening in JIRA, but if you're not part of the DCN and you're curating research data, it's really important that you document your workflows, what you've done to the data, and how things have changed over time. So documentation is super important. As I said, we have people really open up a sample data set — we have five sample data sets that people play with. They break into groups and spend time talking about what they'd do at each of these steps. So they check the files: they'll open them, try to understand them, try to figure out what's missing and what they think would enhance them. It's really people sitting around a table talking about, well, it's missing headers, or this geospatial data is missing some important contextual information, or this code — I can't get it to run, and it's not well commented. Things like that. Then we have them draft a request for information. We provide examples of the email templates we use at all of our different institutions for how we reach out to researchers. It's almost formulaic, in many ways: we thank the researcher for their deposit, we tell them how the data can be improved with just a few small things, and we prioritize what those are. If there are many things that need to happen, it's really important early on to just highlight a few and then move forward. If there's an opportunity to go back to them multiple times, then: if you could just do this other little thing, it will improve the reuse of the data. We also talk about transforming the formats, and whether or not you even want to do that. There's actually a lot of discussion about how far, and to what extent, you transform data. Some institutions don't want to transform data, right?
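The "check the files" step above is mostly human judgment, but small parts of it can be scripted. As a rough illustrative heuristic — a sketch of the kind of check a curator might write, not a DCN tool — if every cell in the first row of a CSV parses as a number, the file probably lacks a header row:

```python
import csv
import io

def looks_headerless(csv_text: str) -> bool:
    """Rough heuristic: if every cell in the first row parses as a
    number, the file probably has no header row (real data sets need
    a human to confirm, since headers can themselves be numeric)."""
    first_row = next(csv.reader(io.StringIO(csv_text)))

    def is_number(cell: str) -> bool:
        try:
            float(cell)
            return True
        except ValueError:
            return False

    return all(is_number(cell) for cell in first_row)

print(looks_headerless("1.2,3.4\n0.8,2.9\n"))       # all-numeric first row
print(looks_headerless("site,count\nA,12\nB,7\n"))  # has a header row
```

A check like this can flag files for the curator's attention; it can't replace the conversation with the data author about what the columns actually mean.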
Because it could potentially change it. It elicits a lot of amazing conversations. One of the things we do as well is evaluate data sets for FAIRness. We look across the FAIR principles and have the attendees look at the final data sets where they're published — because they're actually using real data sets from each of our institutions — and see: okay, from this final data set, how has it actually been made FAIR? And then finally, there's the documentation throughout. So this is just an example; these are some of the data sets we work with. What's this? This is not your institution. No, the Big Red Bear is from Cornell. So we have five data sets that we ask the attendees to divide up among: they can choose survey data, tabular data, code, image data, and geospatial data. Some of these have very easily identifiable issues, and some are more complex. Oh, and there's Goldie. Thank you. So, oral histories. And this is the great Penn State Nittany Lion. So this is just a quick overview: through IMLS we've been able to host three of these workshops. We actually had one prior to this that was funded by IASSIST, and then most recently a workshop that we conducted with the Canadian Data Curation Forum. Out of these, we have the development of something called a data curation primer — or "primmer," depending on how you want to pronounce it. Sorry, it's an ongoing pronunciation debate; we have a divide in the DCN. Susan Borda would insist. Sorry. It's actually been really surprising to me how much the people who attend our workshops are into these primers. They want to give something back to the community. They want to talk about things they know. They want to build this resource that other people can use. And I'll show an example of what these primers (I have problems with the pronunciation as well) are in just a minute.
Each of these little spokes, you can see, is one of the primers that were developed. We ask the attendees to form small groups and to focus on developing these over the six-month period after the workshop. They meet monthly, they have mentors they work with, and they produce these amazing documents. So the data curation primers — this is really an opportunity for curators, or other people like subject liaisons (we've had attendees like that), to share their expertise about file formats. And as you saw from some of the examples, it's not just file formats but also things like oral histories or ISO images. It's interesting to see the extent to which these are being developed and explored. This gets into another one of our initiatives, the DCN resources. These include data curation checklists — format- or subject-specific curation checklists that abide by the CURATED model — as well as the primers. The primers are really meant to jumpstart the curation process for a variety of file formats, domains, or disciplines, and they're meant to be incredibly actionable. So say you get a .CZI file in your repository: you can go and look up the CZI file primer, and you can see, well, what kind of software do I need to use to open it? What kind of curatorial considerations do I need to take into account to understand whether all the materials are there for somebody else to reuse it? It's really meant to be a rich resource to facilitate curation and reuse. This is just an overview of the primer creation process. Attendees typically work in groups of two to three, though we've had people work one on one — I think our WordPress primer was just one individual, which is really amazing. At the workshop itself, they spend time developing a roadmap for the next six months and an outline of what they hope to accomplish, and then they're assigned a DCN mentor.
This is somebody within the Education Initiative who has agreed to facilitate the primer development process. And then over the following months, there's also a peer review process we've put in place, to ensure that what's being developed is not just somebody's anecdotal thoughts about how something should be curated, but really is reviewed and more fully developed. (Do you want to mention the interviews? Oh yeah.) So as part of the primer development process, beyond the peer review, we also encourage these groups to do interviews with researchers. We offer $25 gift cards if somebody wants to interview a researcher, so that the primer really is a little more evidence-based in how somebody might reuse a specific file format. For the peer review process, at about the halfway point, drafts are sent in to the group. A really wonderful project manager facilitates the peer review of all the primers: we reach out to the entire DCN, and actually to former attendees of the workshops, to do some of these reviews. We've had really great involvement from former attendees as well — it's really amazing. We basically ask each reviewer to find a data set in that specific file format to test the primer against, which can be really useful — and we find it useful. Then, publication: all of these are published on GitHub. The idea is that anybody can go in and suggest modifications to a primer; these can be living, actionable documents that can be built upon. A lot of times a primer will come back and it might not get all the way through all of the CURATED steps, but somebody can then build upon it.
And we're seeing in workshop three that some things are being built upon from previous versions of the primers, which is really great — the community may suggest changes. Yeah. So these are just some of the examples that came out of the first workshop. SPSS was one of the wonderful ones that came out. (Sophia, were you a mentor? Yes, you were.) Basically — I wonder if I have an example in here; I think so — we asked them to identify the software that's used for the format, the key curatorial considerations, and preservation actions, and then it highlights a bunch of useful information: tutorials, bibliographies, that kind of stuff. Microsoft Access — we've actually seen this one being used in the community, which is really nice. We all get a lot of proprietary file formats in our repositories, so mechanisms to evaluate what you need to do to make those reusable in the long term are really helpful. Excel we get a lot of as well. Jupyter notebooks have been really popular. One of the individuals who wrote that one is super adamant that code is not data, even though it often comes with data in most cases. And Jupyter notebooks are one of those weird formats that not everybody knows how to handle yet; they're not as ubiquitous as Excel and those kinds of things. So they do have different metadata for different situations. (And I think Lee is here, right? Okay.) Coming in January 2020, we have another group of primers coming out. These include some qualitative formats and files, STL files, Tableau, Google Docs — which is going to be interesting — point clouds, and text and character encoding. So as I said, this really is a community initiative.
We're seeing people even outside of the DCN, who haven't attended our workshops, contribute to the GitHub repositories, which is really amazing. There is a community contribution guide there, so people know what's acceptable and what's not as far as contributions go. So quickly, I'm just going to cover some data curation R&D — our research and development arm, where we're headed. One of the big things we're focusing on now is trying to assess the value of curation. As you can see from everything described here, there's a lot of human effort put into curating data. And while we all use automation and technology to do some of this work, it's still a lot of effort. So it's really important for us to understand: what does all this effort get us? What is the real value? Comparing curated versus uncurated data, are we seeing differences? Differences in use? Differences in downloads? All of these aspects. The research component of this is complicated. One way we're approaching it is to survey the library repository field about the frequency with which curation activities are undertaken, and that will help us establish a model of data curation activities. And then we're going to do some evaluation of the data we'll be looking at. There's also a group focused on big data. This, I think, is led by Duke — Susan? Okay, the University of Michigan, excuse me. The idea here is that more and more we're being asked to archive large data sets. I know that at Penn State we're seeing this more and more. It's not just uploading them, but then also downloading them — and we hit ISP or other kinds of restrictions.
So this initiative is really focused on figuring out how to integrate with Globus or other research networking tools that are out there to facilitate big data archiving. We've also seen a big request from others for support for institutional outreach and advocacy: sharing stories across institutions about why data curation is important and why this is a useful opportunity and role for libraries. And the sharing component is really about sharing experiences, so we can have a collective discussion about what's working at each of our institutions. Finally, there's some work being done around human subjects data: is the institutional repository, the data repository, an appropriate place for human subjects data? How can we make it happen? De-identification methods, consent form reviews. It's a complicated space, I would say. Okay, so I'm going to turn this back over to Lisa to talk sustainability, unless there are questions about the things I just quickly covered. No? Okay. All right. So, as I mentioned at the beginning, this is a grant-funded project, currently in our implementation phase, and we're trying to figure out where we go next. How do we continue to add new partners to this project? We started with six, then grew to eight, and now we're at ten. That certainly allows us to grow the number of curators in the program, but it also adds more of a burden in the number of data sets coming through the program. So how do we create an effective balance there? What is the effective balance? We've been working on a couple of different paths. The first, as I just mentioned, was our original plan going into the grant.
As I mentioned, we wanted to grow slowly and incrementally, only adding two new partners per year. And I will say, since we've done this in two iterations so far, we've really prioritized partners that, for better or worse, look like us. They are institutions that have a data repository, that do data curation, and that have data curators. That was the key thing: they have staff to bring to this program. But we also recognize that we don't want to be very exclusive. We really want this to be available to others. We want to be able to scale up so that curation services might be available to those that don't have a data curator, or who are going through a transition, maybe because their data curator left, which has happened at one of our institutions. So that's really what we're trying to account for. We've already seen turnover; we've had staff turn over in the last four years. We're not always going to have the same pool of expertise, so how do we make our program future-proof? So this is the path we've been following. We engaged a really excellent advisory panel, and thank you to those of you from the panel who are here, to advise us on how to grow this thoughtfully and how to think about growing the data curation network with a variety of different stakeholders in mind. As I mentioned, we've got a lot of institutional academic perspectives, but we wanted to reach out to folks that have worked in smaller liberal arts colleges, or that are doing data curation in domain repositories or national labs. So we're really trying to understand what that broader landscape is and how we all fit together.
And then through our process, we actually engaged Lyrasis as a consultant on this very important question. It's not a unique question, how to sustain a project, but certainly: how do we continue to grow a network of people that is relying on each other quite heavily? It's truly radical collaboration right now. We are essentially relying on people that don't work for our institutions to do the work of our institutions. How do we grow that? So, working with Lyrasis, we did a couple of things. We did some interviews with our advisory panel, and we also did interviews with a wider variety of stakeholders to really gain their perspectives. And right now we're in the phase of trying to understand some of the administrative and financial models that the data curation network might look to, initially and then further down the road. I really do expect it to be a hybrid model that we finally come up with. So just for the sake of chatting, we'll show you the three models that we're looking at right now. These are going to be familiar to all of you, because you've all gone through something like this before, no doubt. We're looking at a partnership model, a stakeholder model, and a fee-for-service model. And again, I think some combination of the three is what's going to happen. So, briefly, the partnership model is really a tiered structure. We could allow members to pay some kind of membership fee and gain access to the network. Maybe they are also contributing staff, so maybe that's tier one, where they contribute staff and there's a certain price tag associated with that.
If they're an institution that doesn't have curation staff, or maybe they don't anticipate using the data curation network very often, maybe that's a lower tier and a lower price tag. So a pretty standard tiered membership model. There are many pros and cons to that. The pro is that it's very adaptable, and it would be much more welcoming, I think, to a wider range of people. The con, of course, is membership fatigue: it's a difficult thing to keep asking institutions like ours to contribute money year after year. And we also anticipate that, at least early on, there aren't going to be enough memberships to cover all of the costs that we currently have. We're also looking at the stakeholder approach. The stakeholder approach allows us to grow very incrementally. We have a really tight group of core members right now; we add on some additional core members, keep the team really small, keep it very stakeholder driven. This would allow for that preservation of trust that I mentioned. As I said, we come together every year and we're building up that trust, which I think is vitally important for the shared staffing component of what we're doing. The problem, of course, is the high level of responsibility tied to each stakeholder. If one person leaves, what happens? It's not just that we'd personally be upset; the whole apple cart could be upset. It also has a lot of instability to it. That exclusive aspect is not going to be financially stable for the long term. And then the final one is this fee-for-service idea: really thinking about curation as a service. Can you put a price tag on that? Is there a way to enable others to gain access to curation services on some kind of fee-for-service model?
Also, is there a way to diversify the types of services that the data curation network can provide? Maybe we are consultants who come in and help your institution set up your data curation service in the first place and get it up and running, or come in and train your data curators on how to curate data. That way we're not only growing the network, but also adding to the value that we see in curation across institutions. So this is really following that it-takes-a-village model and building up this profession that is still developing. It's also perhaps very difficult to predict what income you're going to generate; how do we sustain the down years? So probably nothing too unique or amazing in these three concepts, but we really wanted to explore them with our potential stakeholders, with all of you who may have these same problems and may see value in banding together and helping each other address this with quality curation, producing quality data sets, because that's something I think we can all thoroughly believe in. As I said, likely hybrid models. So right now we're moving into a testing phase, where we'll be approaching our different stakeholders, namely our deans and administrations at each of our institutions, to talk through these models: what is most palatable in the short term, what can we do in the long term, and to think through all of the different pros and cons that I just very briefly mentioned. And further down the road, or maybe even closer, because we've actually been dealing with these from day one, a lot of other questions and opportunities continue to arise for a project like ours. So these are just a few of the things that we've been talking about.
Advocacy for data curation is such an important thing that all of us need to be doing. How can we do that more formally? How can we approach our funders to explain that it's not enough just to share; we really do need that curation work behind the sharing to ensure that the data are usable. International DCN spinoffs: we've had a number of people from different countries talk to us about the data curation network project and ask, you know, are we the data curation network? Can there be spinoffs? How would they relate to one another? So we're actually trying to start that conversation at RDA and talk to others who are in similar situations to see what that might look like. Domain repositories, I think, are really critical: we need to figure out the relationship between general multidisciplinary data repositories and domain repositories. Remember, domain repositories might have a core of curators that work for them, but they're very specific to the types of data that they're working with, which would be of huge value to us. I think we need to figure out the reciprocal, what value we would bring to them, and figure out what that relationship would look like. We definitely want to make sure that we're creating a network that is going to work for all of us. Automating curation: Cynthia mentioned this earlier. I really think there are so many tools and so many things that we could be automating in our process and our workflow, and how can we incentivize that? Could we host an award, or put out a call, an RFP, for some of the work that we think could be really well automated, and work with developers to develop some of those tools and make that happen? And then building our community, building this work that we're doing as a professional community of data curators. What does that look like? Where does that live?
And, you know, just being able to come together in the workshops and in the project itself. I think one of the most valuable things that I've heard is, "Finally I have good people I can talk to about this work, and it's not just me trying to figure this out alone." That's been, for us, I think, one of the most important things, so we will continue to do that. All right. And with that, I will end there. I most particularly would love to hear your comments about what we're doing, whether you think it's valuable, and how we might sustain it, because I know you all have a lot of expertise in this room as well. So thank you.