Hi everybody, and thank you for coming to our session on prioritizing researcher perspectives and driving adoption for research data management. We'll be talking about talking to researchers about their data. Just to introduce ourselves: my name is John Borghi. I'm a CLIR postdoctoral fellow at the California Digital Library, where I work with the UC3 team on a number of research and outreach projects. I mention that mostly because I'll be talking today about a data-driven approach to communicating with researchers, so you have some idea where I'm coming from. And I'm Daniella Lowenberg. I'm a research data specialist and a product manager, splitting half my time between the data publication platform for the UCs and half my time as the project lead for Make Data Count, a data usage metrics project with DataCite and DataONE. We'll start with John's half. Okay. So I'm going to talk about a project we're undertaking on the UC3 team: developing data management tools for researchers and service providers. Said another way, we're looking to build a tool to help researchers and librarians talk to each other about research data. And I think the difficulty underlying this project is the fact that talking about research data is very difficult, for a number of reasons. There are lots of angles on this problem, but one big reason is that there are very different stakeholders involved in research data. We have the research community, institutions, libraries, funders, publishers. And not only are there these different stakeholders, but these groups are not homogeneous within themselves.
So the research community is incredibly heterogeneous in how it thinks and talks about research data, and all these other stakeholders are as well. And there are overlaps here: people in the research community also work with funders and publishers. The perceptions and priorities are messy. Another big difficulty is jargon. I'm a data curation postdoc; that's part of my job description. And when I talk to researchers, I have a real hard time conveying what that actually means. There's a lot of jargon in this space, and even relatively simple terms, or terms that are very common, mean different things to different people. These stakeholders may use the same words differently, or use different words for the same thing. The one I run into all the time is "repository." When I'm in a room like this, repository means a very specific thing. When I talk to researchers, they often think I'm talking about a GitHub repository. And "metadata" is a scary word for every researcher, including me. At CDL, I've actually done a little bit of informal research on this jargon problem. We ran a very informal survey where we asked researchers about the words they use to describe the various parts of their research process. We asked them: what terms do you use to describe the stage of your research that involves preparing or outlining procedures for managing data? And they gave us a wide variety of responses. There are a couple on here, "data management planning" and "research data management," that I suspect came from librarians or people in this space rather than from, say, a biochemistry researcher. But there's a huge variety in how people describe that phase of their research project. And I think looking at this reveals not only terminology differences, but also some differences in perspective about what's important.
So a lot of experimental researchers just define everything in terms of getting ready to do the actual experiment: "I don't have a separate phase for this." This is my favorite one, at the end: "I don't have a name for this. It's just a thing that I do." And this kind of terminology difference, this complication, has some real consequences. I'm a neuroscientist by training, and one of the projects I'm undertaking at CDL is investigating how brain imaging researchers manage their data through the course of an entire project. While we were designing a survey of those researchers, we thought it would be a good idea to ask them about their interaction with the library. I'm not going to read all three of these, but basically we asked about their interaction with data-related library services in three different ways: one focusing on technical infrastructure, one on something approximating research data management, and one on something closer to scholarly communication. In every single case, the majority of researchers said either that those services were not available to them, that they were not sure if they were available, or that the services were available but they had not taken advantage of them. People most commonly took advantage of IT-related services, but you can see there's a steep decline from there. For the people who said there are no services available to them, we're actually checking back with their institutions right now to see whether that's accurate. But what this demonstrates is that there are large swaths of the population on a campus who are not interacting with library data services.
I actually presume that most of the people who said no here are incorrect, because most of the people in our sample come from very large research institutions in the United States that probably have at least some data services around. Okay, so the problem: researchers have data needs, and they have many particular ways of talking about their data. We in the library world would like to talk to them about their data, but that is very difficult. There are some tools this community has devised and is using to make that communication a little easier, or at least to talk about data. I'm not going to go through these in depth, but I will cover some of them because they have informed how we've thought about our own tool. First: it would not be a conference involving libraries and data without a research data lifecycle. This is probably the most common one, from DataONE. I think it's a really nice way of showing that research data management is something that happens throughout the course of a research project, rather than just at the beginning or the end. Unfortunately, when you show this to researchers, they tend to assume it doesn't really apply to them. Most researchers I talk to say: my research process is not nearly this linear, it looks nothing like this, it's a crazy mess of things going all around; and also, these words are not what I use to describe these phases, and maybe I'm not even thinking about things like preservation and discovery. Then we have the Data Curation Profiles, which use some of the data lifecycle terminology and structure to get a very comprehensive view of the data being passed around, developed, and devised within a particular research group or project. I think this is really great for understanding what data is around.
Unfortunately, it takes a long time to sit down and complete one of these, and it does contain some jargon: if you just handed this to a researcher, they might not know what you're talking about. There are also a lot of maturity-based frameworks now that allow for assessment of RDM activities or data management services at an institutional level, as a kind of benchmarking tool. This one, from ANDS, lets institutions grade their data management services on a scale from "initial" to "optimized." The idea is that the higher your level, the more organized, managed, and defined things are: rather than developing new things in response to each need, you're optimizing existing services as things develop. This particular framework splits things into policies and procedures, infrastructure, support services, and metadata, although there are many different versions now that take different approaches. And there are tools that combine all of these things together. This is the DMVitals tool, where a researcher and a data curation person or librarian can sit down, enter responses to a number of prompts, and get automatic recommendations about where that researcher should take their data management activities next. It's not being used very widely, and I don't think a researcher would be able to use it on their own, although it was devised for that purpose. Taking a step back and looking at the research community itself, it's interesting to note that researchers have adopted similar models in how they talk about research data.
The lifecycle figure on the right is from a paper about fostering reproducible science. It talks about ways that science can go awry and become non-reproducible, and there are some data-related things in here. It looks a bit like a data lifecycle, but you can see the emphasis is all on statistics, methodology, and experimental design. And these rubrics for grading journal policies in terms of openness and transparency, the TOP Guidelines, have a structure similar to the maturity-based tools: they assess things at a number of levels across a number of different criteria. So there are converging models, but we are using different terminology and different perspectives. Here, then, are the problems we are addressing with the tool I'm going to talk about. Researchers are faced with constantly evolving expectations about how they should manage and share their data; Daniella is going to talk more about that. Data stakeholders have different perspectives and use different terminology. And existing tools, while excellent for the purposes they were devised for, are not always particularly user-friendly or researcher-focused, which, again, they were not always devised to be, so that's okay. So at CDL, and UC3 specifically, we are developing an RDM guide for researchers. It will probably have a pithier name when it's complete, but for now this is what we're calling it. One thing we think a lot about: we can't call a thing that's all about getting away from jargon an "RDM guide" when RDM is such a hard thing to define. In this room it's okay to call it that, but when we're done it will have a different name. The characteristics of this guide: it is intended to help researchers and data service providers speak the same language.
It builds on previous efforts and on our own research. It emphasizes accessibility, usability, and adoption: we want to build a thing that people actually use. And it emphasizes flexibility and adaptability. Something we think a lot about is the fact that researchers in different disciplines, or even at different career stages, have different data management needs, and we don't necessarily want to tell them that there is one universal ideal state they should all be striving towards. Different local institutions might also have different services we want researchers directed towards, so we want to build something that we can adjust and that others can adapt. The tool consists of two main parts, and for a project that's going to be accessible and usable there will eventually be some design elements; right now it just looks like a grid, but I'll walk you through it. Central to the tool is an RDM maturity rubric. Again, it will have a pithier name, probably one that doesn't use the word "maturity," by the time it's complete, because talking to researchers, it's hard to tell them that their practices might be immature. They don't really like that. So we're thinking through the terminology.
To walk you through the guide: vertically, it's organized in a research data lifecycle fashion, covering different activities a researcher might undertake during the course of their research, such as planning a project, organizing data, saving data, getting data ready for analysis, analyzing data, and publishing data. Horizontally, it's organized in terms of something like RDM maturity, although we have labeled the levels ad hoc, one-time, active and informative, and optimized for reuse. The idea is that someone who plans on an ad hoc basis might have a way of doing things when it comes to their data, but it isn't written down anywhere, standardized in any fashion, or particularly documented. At the one-time level, they have a plan, maybe, but they never look at it again. If their planning is active and informative, they're updating the plan as the project goes along. And if it's optimized for reuse, that's exactly what it sounds like: they're optimizing all of their planning activities with the idea that their data might be reused in the future, either by another researcher or by themselves. The same progression holds at all the levels. Two things to note. First, we have declarative statements at every level that put what we're talking about in a researcher's own terminology, as much as we can. We try to get away from anything resembling jargon, although if you're looking at this and see some jargon, please let me know and we'll try to take it out. These are active, first-person sentences, like "I decide what data is important to me while I am working on it and typically save it in a single location."
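To make that structure concrete, the rubric is essentially a table of lifecycle stages by maturity levels, with a first-person statement in each cell. Here is a minimal sketch of that structure; the stage and level names are from the talk, and only the one quoted "saving data" statement is real. The other cell statements and the `self_assess` helper are invented placeholders, not the actual guide's content.

```python
# Hypothetical sketch of the rubric: stages (rows) x maturity levels (columns),
# each cell holding a first-person declarative statement a researcher can
# recognize themselves in. Only the "Saving data" / "Ad hoc" statement is
# quoted from the talk; the rest are invented for illustration.

LEVELS = ["Ad hoc", "One-time", "Active and informative", "Optimized for reuse"]

RUBRIC = {
    "Planning a project": {
        "Ad hoc": "I have a way of doing things, but it isn't written down.",
        "One-time": "I write a data management plan but never revisit it.",
        "Active and informative": "I update my plan as the project evolves.",
        "Optimized for reuse": "I plan with future reuse of my data in mind.",
    },
    "Saving data": {
        "Ad hoc": ("I decide what data is important to me while I am working "
                   "on it and typically save it in a single location."),
        # ... remaining cells omitted for brevity
    },
}

def self_assess(responses):
    """Given the level index a researcher picked for each stage, list the
    stages sitting below the top level -- mirroring the quick self-assessment
    the rubric is meant to enable (not a grade; reuse may not be their goal)."""
    top = len(LEVELS) - 1  # index of "Optimized for reuse"
    return {stage: LEVELS[i] for stage, i in responses.items() if i < top}

# Example: a researcher who plans ad hoc and saves data on a one-time basis
print(self_assess({"Planning a project": 0, "Saving data": 1}))
# -> {'Planning a project': 'Ad hoc', 'Saving data': 'One-time'}
```

The point of modeling it this way is that stages are just dictionary keys: they can be added, removed, or reordered per discipline, which matches the flexibility goal described above.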
We think this organization allows a researcher to assess themselves very quickly and easily on these criteria. And we're working under the assumption that not every researcher necessarily wants to be optimized for reuse. We want them to maybe start thinking in that direction, but if they assess themselves, find they're doing one-time things across the board, and that's okay for their current needs, that's just part of the communication that happens, rather than a grading exercise or anything like that. The other thing to note is that the stages along the side are open to being rearranged, added to, or subtracted from depending on what kind of work the researcher is actually doing. I should also mention that when I say "researcher" I mean any kind of scholar. I'm an experimental researcher by training, so I default to that mindset, but we're trying to build something as inclusive as possible for anyone doing any kind of scholarship that involves research data management. Tied to each of these levels is a one-page guide, so once researchers have assessed where they are, they can open one of these guides and learn how to advance their practice to the degree that they would like to or need to. Again, there will be more conscious design work on these eventually; we've been focusing on getting the content together. The guides let whoever is reading them understand what we mean by each stage, say planning, and what that means in practice; they list requirements, so if there are actual requirements from funders, publishers, or institutions about what needs to happen at a stage, those are in the guides; and they offer some general points to think about.
For example, for a researcher planning their data management, one of the things we have in the guide is that planning is not a one-time activity: writing a data management plan for your grant does not mean you're done with planning and never need to refer back to your plan. It's an iterative process. These guides are all designed to be localized as much as possible, so there's space in each of them for a local RDM team to put their contact information, and if there's a data retention policy on campus, that would obviously be incorporated as well. We're thinking about working with disciplinary communities to develop more specific versions. Right now these are intended to be very general, but in the future we would like researchers in a given field to be able to find the particular repositories, databases, and metadata schemes that apply to them. We'd like to have that in there if possible. So again: flexibility and adaptability. In terms of outputs, we are working right now to design the physical collateral that will go along with this. There's going to be a postcard with a much fancier version of the guide on it, with some information on the back, that a librarian can hand to a researcher to facilitate a conversation, and brochures including the one-page guide material. Pretty soon I'll start writing a publication describing the development of this project, so the community of librarians and data curation folks can read it. And, like I said, we're developing tools for building discipline- or institution-specific versions. Right now, if someone wants to create a local version, they have to come through us, but we would like them to be able to download our templates and add in their own material.
I'm also trying to be as transparent as possible as this project develops. There's a blog post coming soon, and there have been blog posts throughout with project updates, keeping people apprised of what's going on and allowing for feedback at every stage. We've had a lot of conversation through blog posts about, for example, the difference between data sharing and data publishing, terminology it's important to have some operational definition of moving forward. And I'd like to end by noting that we're always seeking input and collaboration, so I'm happy to sit down and talk more about this with anybody who wants to; my email address will be at the end. Okay, thanks.

Great. So, as I mentioned, part of my job is promoting data publishing for the UCs and globally. I came into this position about ten months ago after working for four years at PLOS, starting the open data policy there. So I come from a position of not a lot of pushback, because there you have to open your data to publish the paper, into a job where I'm overseeing a data publication platform and I'm not seeing any adoption. And why am I not seeing adoption? It doesn't have to do with the technology. We talk a lot about the sticks and carrots of open data publishing: funder and publisher mandates, doing the right thing, reproducibility and transparency. But there's something we're not talking about when we talk about sticks and carrots. Let's start with sticks. The PI is the one submitting the DMP. The grad student or postdoc is the one doing the research and responsible for publishing the data. So the grad student might not know that the PI put in the DMP whether the data is going to be published or never released.
Then when the grad student goes to publish and says to their PI, "I really need to open up this data set so we can get our paper published," or "I want to open this up because I think it's the right thing," the PI could say, "I had no idea, and our lab doesn't do that." So there's this huge disconnect around the sticks. As for doing the right thing: we keep talking about how we need to open up data, how people need to reuse it, how we should be doing data citations and giving credit. Absolutely true. But right now, tenure committees aren't looking at data citations or at whether you opened up your data. So we're saying "do the right thing," but researchers aren't being rewarded for it yet. We say that open data is a huge success right now; we talk about all the publishers that have opened up thousands of data sets and how many are out there. But as I know from being at a publisher, if the researcher wasn't prepared to do this, they could just submit a figure, put it up on Figshare as an SI file, and we'd say that's open data. How do we know that's the data? Similarly, we talk about how many submissions there are in Figshare and the Open Science Framework and all these places. But as an example, we put all of our presentations from CNI into the Open Science Framework. So how many of those submissions are actual data sets? If we look through these repositories, how many data sets are there, and how much is actually reusable, FAIR data, not just things put up to fulfill a requirement? And we know these resources are not adoption-focused. There's a lot of jargon, like John was talking about, but we're really focusing on tools and technology. Even just now I was going through Twitter and seeing a lot of talk at CNI about technology back ends. But where are the researchers in the room talking about this? The researchers don't know anything about the back end. And we've come up with so many tools to support these policies and practices.
But they haven't been user-tested. Researchers don't care about specific features; one thing you learn in product management is that only a fraction of a product's features are ever used at any one time. So we're investing so much resource into building out these technologies, but we're not talking to the researchers. Even in this room, it's great to all be on the same page, but we have already bought in; we agree these things should be open. So where are we including the researchers? I've run into this issue myself. I go to the UCs and do boot camps. Here's a series of them: UCSF had 56 people sign up, and 20 postdocs showed up, which was great. No one showed up at Santa Cruz. Almost no one showed up at BIDS; I think two people did. And when we asked the people at UCSF why they showed up, they said, "My PI made me." Which is great: they showed up, and they did it. But we've tried everything: we'll give you free food, your funder requires that you come to this, the NIH tells you to come. And if that's not enticing, if even free food isn't enticing grad students, what is? What is the language? How are we going to get these people to pay attention? There was a great talk this morning from SPARC about investing in people, investing in the new wave of grad students coming in. But how are we even going to entice them? So we have a disconnect: the language we're using isn't connecting, and the incentives are not apparent enough to researchers. To tackle this, I decided: all right, I'm not going to sit in my cube. I've been going out to the campuses, buying coffee for any researcher who will sit down and talk to me. I have no idea going in whether they know anything about research data or publishing practices; they could be at any level. Since this was not IRB-approved, everything that follows is just unattributed quotes.
But I promise they're real quotes. So I started by asking, "What terminology resonates with you?" A postdoc said, "Give me credit for my research." I asked what terminology makes sense when I say "repository," and a Berkeley PI said, "Just use the word archive," which we all think of as an old term. That PI went on: "That's all I want to hear. Don't use the word preservation. Don't use the word repository. If you tell me to archive it, I'll put it there." We asked, "How would you describe your lab's data practices?" and this is what I got. I think both answers are equally ridiculous, which shows how crazy the situation we're in right now is. What motivates you? A postdoc said, "Well, nothing. Or Nature papers." That came out when I said: what if you could get a data citation when you open up your data, and you'd get 500 citations on it, and you'd know there were 40 Nature papers that cited it; doesn't that look so much better? And they said, "Nope. I just need that one Nature paper. It may take me two years, but I'm not going to open up my data in the meantime." I asked, "Would you publish your data?" A grad student said, "I'm not giving away three possible first-author papers," in this publish-or-perish world we're all living in. These grad students need those papers, and that's really what's holding them back. A postdoc said, "How is this any different from SI files?" And that's fair: what do we even mean by "publishing your data"? Do people know what we mean by this? And a PI who publishes their data a lot said, "I'm hesitant to do it before publication, because I'm still not convinced a DOI is recognized as showing I'm the first person who published that work. Until people start citing data, I'm going to hold on to mine until I publish my papers." So what did we learn?
We need to include researchers in these conversations, because there's a massive disconnect between all of these resources we're putting energy into, all these amazing discussions we're having, and the answers we get when we actually ask the end user. So, taking it back to what I do day to day: Dash, a data publication platform that is really focused on adoption. That being said, we don't have massive adoption right now, because of everything I just showed you. So I'm spending more time talking to researchers, going to individual labs, and trying to integrate data publishing into their workflows than talking about the tool itself. I really focus on pushing for data publishing in general, rather than saying "just use Dash, I need people to use Dash." If Dash doesn't work out, that's okay. What we're really trying to push for is a change in the practices we're seeing right now, and if that means people go to some other repository, that's great. The more we see people actually putting their research data out there, the better. Our goals have been to have researcher needs drive development and to integrate into researcher workflows. In these interviews, I go out to researchers, show them Dash, and ask: what's one thing that would get you to use it? We put that in the backlog, see how many people are saying the same thing, and that's how we prioritize what happens next. We are integrating into researcher workflows because we know the only way people are going to start publishing their data is if it's common practice. That's the only reason we've seen such an uptake in specific fields: it's a standard, it's common practice, it's a "you have to," they're used to it. So how are we doing this?
As I mentioned, by talking with as many people as possible. Some ways we've been able to integrate: we put in what we're calling a manifest upload. Instead of drag and drop, you just put in the URL for where your data are, in the cloud or on a server, and Dash goes and grabs the data in the back end and publishes it for you. We're building out a submission API, which means there will be more technological ways to get data in. We're working on integrations with the Open Science Framework, as well as Jupyter and online lab notebooks, so that you can stay in your normal practice: right-click to publish your data, right-click to version your data, in the ways researchers actually work. Working with Box is really great, but are our researchers even using Box? Where is the data right now? It's about finding the best source and the easiest way for researchers to get their data in. We're also talking about UI integrations with publishers, working with UC Press right now, so that when you hit the data availability statement you could choose "publish with Dash," and one click sends over all your metadata. We're removing those additional barriers so that publishing your data becomes part of the practice of publishing your article. So communication with researchers is essential in this process. It's not always fun; they might tell you they don't care about anything you're talking about. But it can be really fun if you take it as a challenge: okay, we're going to make this happen. If we're going to build these services and tools in this community, we have to include researchers, and we have to be able to hear what they're saying, even if what they say is, "I don't like anything about this. I don't feel comfortable with this."
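The manifest-upload idea above is simple to picture as code: the client sends a pointer to where the data already live, plus minimal metadata, and the platform fetches and publishes in the background. Here is a minimal sketch of what a client call to such a submission API might look like. The endpoint path, field names, and authentication scheme are all invented for illustration; this is not Dash's real API.

```python
import json
import urllib.request

# Hypothetical sketch of a "manifest upload" deposit: rather than uploading
# files, the researcher hands the platform a URL where the data already live,
# and the platform fetches and publishes them itself. All names below
# (endpoint, payload fields) are assumptions, not a real service's API.

def build_submission(data_url, title, authors):
    """Assemble the JSON payload for a manifest-style deposit."""
    return {
        "manifest": [{"url": data_url}],   # the platform fetches this itself
        "title": title,
        "authors": authors,
        "publish": True,                   # ask for a DOI to be minted on ingest
    }

def submit(api_base, token, payload):
    """POST the deposit to the (hypothetical) submission endpoint."""
    req = urllib.request.Request(
        url=f"{api_base}/submissions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but don't send) an example deposit for data sitting on a lab server
payload = build_submission(
    "https://example.org/lab-server/experiment-42/",
    "Experiment 42 raw imaging data",
    ["A. Researcher"],
)
print(payload["manifest"][0]["url"])
```

The design point is the shape of the payload, not the transport: because the deposit is just a URL plus metadata, the same call can be made from a lab notebook plugin, a publisher's submission form, or a script, which is what makes the workflow integrations described above possible.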
And we need to iterate based on that researcher input, not just work within our own communities, where we say: this is a technological best practice, this is a standard, this is how GitHub works. Is that something a researcher actually values? That's what should be driving these conversations. I look forward to seeing how this continues and what we see over the next year as more of us get into this work. So we'd love to hear about your experience, answer any questions, or talk about why you disagree with us.