My name is Grace Agnew, and I manage strategic initiatives at the Rutgers University Libraries. I want to talk to you today about a research project we're engaged in with one of our unfunded-mandate departments in computer science, RDI2, the Rutgers Discovery Informatics Institute. I'm the lead of our team and co-PI on a grant, working with two other people, Ron Janssen and Ryan Womack, our data librarian at Rutgers, and with two other institutions: Annie Johnson, who manages scholarly publications at Temple, and Robert K. Allendorf, who manages data for Penn State. The Virtual Data Collaboratory is a four-million-dollar, four-year grant awarded by the NSF, and the intent is to develop research data building blocks. We're building a prototype of a multi-state research data portal focused on interdisciplinary data. The role of the participating libraries, Rutgers, Penn State, and Temple, is to develop the data services layer, which is basically the user interface for storing and accessing research data placed in the virtual data repository. We're utilizing Samvera, a Fedora 4.x application (I think it's up to 4.7), which uses linked data to build its metadata architecture. What we wanted to do is understand a couple of things: what faculty and graduate students thought interdisciplinary data was, how they found it, how they used it, and what their needs were for storing and accessing it, because we found there wasn't a whole lot of literature out there, and we really didn't know ourselves what was going on.
What was very interesting is that we really did plumb the depths of our ignorance, and our ignorance turned out to be quite profound. For people who work with faculty and students and see them on a daily basis, it was striking to me how much they've changed without our noticing, frankly because the clothing hasn't changed all that much and the phones look pretty much the same. To give you a quick overview: when we did some background research on interdisciplinary data, and this did prove to be true, interdisciplinary research cuts across disciplinary boundaries, so it can be impeded by current academic structures. At Rutgers, our academic structures have solidified around responsibility-centered management, which really impedes interdisciplinary work, because it gives money to bricks and mortar, and interdisciplinary data is not bricks and mortar. It is characterized by shifting boundaries, intersecting domains, and collaborating specialists; in other words, it's pretty much anything. What we found, in the couple of years we've been circling it, poking at it with a stick, trying to figure out who's doing it and what it means, is that it has basically become the way everybody at Rutgers, Temple, and Penn State now does business. We just didn't realize it was no longer something people were thinking about; it was simply business as usual for them. I can say with authority that we at Rutgers are not prepared for the fact that research has now become pretty much interdisciplinary on its face. It involves exchanges of factual knowledge, new ideas, socialization, new technology, and networks of contacts. In other words, it's very dynamic, very engaging, very complicated, and very bewildering.
It's influenced by personal compatibility, which basically means that if you're a researcher who likes to work alone, maybe with one graduate student who needs a good grade but basically alone, you're really out of luck these days. You're going to have to get on with people; you're going to have to understand them, and they're going to have to understand you. As we discovered, that works for some people and not so much for others. Interdisciplinarity is still a vague concept, even though everybody now seems to be doing it. What we did is conduct one-on-one interviews, about an hour in length, with 14 faculty members whom our partners at Temple, Penn State, and Rutgers knew to be working in interdisciplinary areas. They represent the spectrum: there are tenure-track young professors, and there are people approaching retirement who are the éminences grises of their fields. A couple of humanists slipped in as well, which was interesting, because the group was primarily STEM faculty. Then, and this was really smart of us, we did two seminars for graduate students, offering a half-hour of guidance on how to manage your data. Afterwards we asked them to stay, having told them upfront, per IRB requirements, that they would be participating in a focus group, and we didn't serve the pizza until the focus group started. We got really good turnout. If you want to talk to graduate students and get good turnout, that really works: give them something they want, then feed them afterward. They were very generous and expansive with their ideas. Then we designed the prototype interface, which we're currently programming, for our Virtual Data Collaboratory.
We put the design out in a survey, sent it to all the participants in the focus groups and all the faculty, and asked whether we had met the needs they had articulated. Somewhat disappointingly, we didn't get a great response rate from the graduate students, but almost all of the faculty responded, which was interesting and a little unexpected. Here are some of our findings, and they were very surprising to me; they really turned on their head all the assumptions I had made. CNI is of course a wonderful conference, Cliff's introduction was really good, and I've been to several sessions that would lead you to believe the world of research has been turned on its head, that the faculty would just blow us away with what they're up to and we're going to have to struggle to catch up. What was really interesting is that interdisciplinary research, while it has become the norm, remains extremely manual and labor-intensive. We have bombed them back to the dark ages with the way we organize information; we literally have done that. When we asked, "How do you get started in another discipline? Do you just rely on the person you're working with in that discipline?", which is a very relevant question, they said no, I try to learn something about that discipline. How do you do that? Wikipedia was a very popular response; Wikipedia was about what everybody did: "I go to Wikipedia and read up." But as one faculty member said, "I dug up my high school textbook to reintroduce myself to math." Another thing they do is talk to the leading expert in their department, in their university: "What can you tell me about sociology?"
"Can you give it to me in a nutshell in about half an hour, because I have a class?", or however they broach it. I'm sure that's a real popular meeting: "Can you tell me all about your job in five minutes?" But they don't turn to the libraries at all. We were not an answer in any way, shape, or form. They're not using our resources. When I presented this to our library liaisons, they said, "We spend a fortune on online encyclopedias." Now, I'm the first to say Wikipedia has gotten a lot better, but nobody is using our resources for their general reading; they're using Wikipedia, digging up old books, or talking to people. And what came across very clearly is that it's a struggle. They're reinventing it for themselves; nobody's giving them guidance; they're figuring it out, and they're not figuring it out well. What was also interesting is that data is all about the person who created it. That came through loud and clear in both focus groups and in all 14 interviews: it's not what, but who. Ideally, they want to find somebody local, because whatever they're going to do, there's going to be a lot of conversation and a lot of collaboration. They said FaceTime, Skype, etc. don't really cut it; you want to be in the same room if at all possible. If you can't, you can't, but if you can find someone local, that's what you prefer. They also said: if I need to know something, I want to ask somebody. I want them to tell me I'm off base, tell me I'm not getting it right, tell me who the thought leaders are and what I should read. I need to cut to the chase, and the way you cut to the chase is to talk to an expert. That was a little disconcerting and a little disappointing, because at Rutgers we spend about 11 million dollars on digital resources.
At the same time, we're trying to pay attention to what they're telling us and really understand it, and apparently the person is king in the world of interdisciplinary research. That's good to know. The graduate students differed a little from the faculty, but they were actually very similar. Mind you, these are people recently inducted into their fields; we're not talking about anybody on the road to retirement. We asked them what their discipline was. That was really interesting at Temple, where we had about 30 or 35 attendees and there was dead silence. We saw people looking at each other, trying to figure out what their discipline was. It just wasn't a concept familiar to them; they didn't think in terms of discipline. Somebody finally volunteered, "Well, I'm a math person, but I'm working on cancer research, so I'm not sure which of those two you would call my discipline." Someone else then volunteered, "I'm physics, and I'm working on cancer also, but from the point of view of blood." They were coming at it through different lenses, but the two of them, one in physics and one in math, really identified their discipline as cancer research. Well, we don't catalog materials that way. We don't organize them that way, we don't organize our collection budget that way, and we don't organize our liaisons that way. We really aren't organized to acknowledge that cancer research is apparently a major discipline at both Temple and Rutgers. That's something we're really going to have to think about. We also asked whether they used research data they didn't create. Graduate students, not so much.
What faculty told us is: we're constantly looking for data; we wish you would just alert us to data and tell us where we can find it. We really don't want to jump from disciplinary portal to disciplinary portal, but one of the things we do is ask an expert where they get their data. The students, though, told us that either their professor provided it or, as they said, it comes with the articles and there's always a DOI you can click. Increasingly, students are finding their data via DOIs attached to articles. It was interesting, when we talked to them later about trust, to hear them say they trusted data that had a DOI because that meant it was peer reviewed. Well, that doesn't always mean it's peer reviewed. The article is peer reviewed; the data may or may not have been, and a lot of data is not. The data is also not stored in the same place as the article, and the article and the data may have very different guidelines for preservation and long-term maintenance. So it was interesting that all of the graduate students seemed to conflate the data used to create the article with the intensive peer review process applied to the article itself. We did ask what their challenges were in using interdisciplinary data and doing interdisciplinary research, and not surprisingly, everybody had a lot of challenges. One thing that came through loud and clear, with pretty much all the respondents using the same terminology, is: "I have to learn different languages." A few actually did mean natural languages: "I'm working with people in Portugal, I don't speak Portuguese, and that can be a problem; Google Translate only takes you so far."
But what most of them meant was: I'm working with people who are looking at the same problem through a different lens, and I need to understand that lens. The lenses consisted of methodology: they're doing qualitative, I'm doing quantitative; I'm doing experimental, they're doing analytical or theoretical. Methodology was a huge barrier. One very interesting example: the physicist doing blood work commented that she was working with a group of health scientists, and she would say things like, "Why didn't you ask this question?", "Why wasn't this data created?", or "Why can't I cite this data in this paper?", and she was told, "We didn't get IRB permission for that." She said it took her a while to work up the nerve to ask what IRB was, exactly. The health sciences, of course, have tremendous institutional review board requirements to meet; physicists, not so much. That was a new world for her. Terminology, of course, is different. They often use the same terms in different ways, so you can't count on a term like "variable" meaning the same thing in a different methodology; the same terms are reused with different meanings. Terminology can be a really huge issue in translating the language across different lenses. And then, as I said, there are actual language differences. It was interesting to have everybody say: I have to learn their lingo, their world view, how they see things. As for needs: they were not fans of disciplinary portals, which I found very interesting given that Cliff was saying disciplinary portals are going to take over from interdisciplinary ones, because what they found was that it was hard to find resources in disciplinary portals. Again, there's a language difference.
You need to understand what organization makes sense to the portal's community. For some people it makes a lot of sense to do georeferencing; for others, geographic distribution is not what matters, so drilling down to the data for Albania is not really what they wanted to do. They don't necessarily understand the ways you can filter, search, et cetera. So resources in the disciplinary portals were hard to find, and the portals could also be a translation hurdle. They also said that not enough is in the cloud, and that when data is in the cloud, you can't always use it with tools like HPC. They said the tools have to be in the cloud as well as the data, because it's a real barrier to have to pull the information down. Some of the graduate students talked about starting their graduate work at one school with really good tools and really fast networking, then going to another school with no storage space and not the same tools. They said they would really like the level playing field of having everything in the cloud. They would also like better tools, which don't yet exist to any great extent, to create subsets of data and to abstract variables and reuse them, so that, particularly with massive amounts of data, you're not having to use the entire set. They also said that the tools they use most heavily, and would like more of, are collaboration tools, because all of their research is collaborative now, not all of that collaboration happens at the same university, and not everybody is in the same room at the same time, with people working from home, et cetera.
GitHub is heavily used by everyone. What people said they liked about it was the ability to track accountability, to track everybody's contribution, and to roll back several versions if you don't want someone's contribution. Being able to separate out your own workspace but then come together, along with versioning and accountability, is what really appealed to them about GitHub. They also said they need to trust that the data is curated and managed for the long term. Here they did seem very aware that this is a library responsibility, something that we do, and they were very curious to know how we were going to be involved. Many of them said, "I don't really know where to put my data. What do I do with the data I'm generating right now?" A lot of them want their data to be private until they've finished their research and edited the data to their comfort level. They really liked the concept that when the data is publishable and you assign a DOI, that's when it goes public: "I don't want my work product public, except to my project team, until we're ready to share it." That makes a lot of sense, and I think we can all understand it. Then, trusting research data. What we asked the students was: if getting your PhD depends on picking the best data out of several data sets, what would enable you to trust that you picked the right one? Bearing in mind the stakes are really high, you don't get your degree if you pick the wrong data, what are you looking for in data you really trust? What we asked faculty was: when you're looking for data to use in your research, what are you looking for? And they all basically said the same thing: the person who created it is the primary way we decide that we trust this data.
They want to know the reputation of the PI, group, or lab. The faculty in particular wanted to know if somebody was a thought leader; that's one reason they contact their colleagues: tell me who's the best in your field and what data they're producing. They want to cut to the chase and find the person they think is authoritative. We did ask: is data from a grant more trustworthy? We assumed this was a no-brainer, and for the faculty it was. The faculty said yes: we would like to know upfront what grant funded this data, because that means the hypothesis and the process were approved by a peer group. The graduate students at Rutgers didn't seem to feel it mattered one way or the other, and the Temple graduate students were adamant that it would actually make things worse. They said granting agencies have agendas, particularly if they're commercial, so they have a bias; they only want the data they want to see out in the world. That was an interesting perspective. To some extent I put that down to graduate students not usually being part of the grant review process; they don't know how rigorous grant review is. They often participate in grants, but they usually didn't write them, so they aren't far enough along to know the review process. Still, it was an interesting conviction, and we all know, of course, that some commercial grantors are looking for results they feel will further their profit margin. The other thing they said was really critical: has anybody else used it? That was across the board. Faculty and students both feel more comfortable using data somebody else has used. As one graduate student said, "If the data is wrong, at least I'm in good company." They all said it would be nice to know not just that others used it, but who used it and why.
They also asked: can we know that the creation was trustworthy? Was it created with the appropriate equipment, the latest version of the equipment or application, so we know the data is trustworthy? And is there a codebook, or are the variables explained in some way? One thing people said across the board, and I think this goes back to that lens or language concept, is that knowing the hypothesis behind the data really helps them evaluate it for reuse. For one thing, it tells them: it's interesting that they're doing this blood work with cancer, but they didn't look at the age of the people who donated the blood. Well, maybe that wasn't in the hypothesis; maybe they're looking at how sex or gender influences cancer, but not age. Knowing the hypothesis helps you understand why they left out a variable you would have liked to see or thought was important, and maybe it tells you this data set isn't going to work for you. Everybody said they really wanted to know this, and they said it's one reason they want to talk to the creator: "I want to know what they were trying to find or prove by creating that data." This is what we sent to them in the follow-up Qualtrics survey; you can't really read it, I know. What we've done in Samvera is create a Portland Common Data Model object for the person, the creator of the data.
When someone applies to be a member of the VDC and fills out the application form, that gets fed into actual metadata, which we then associate both with the resources they deposit and with the resources they reuse. When we display a resource, we show the name of the person, their ORCID ID, the university they're affiliated with, their email address, their position, and their discipline. We try to show enough about the data set that people can decide whether it's worth pursuing further. We asked whether it was important to be able to contact the creator right from the metadata, since it cuts out a step: they don't have to click through to a directory, the email is right there. They said yes, so we included that. The other thing we did is add use. We show who used the data, with their ORCID ID; you can also click on that person (me, in this case) and get the same information: affiliation, position, discipline. Then, based on the uses they told us they make of data, we created a set of uses; this one says "modify for use in original research," meaning I would modify somebody else's data set to use in my own original research. We took some of the things they said they had done and made them into a dropdown menu. When somebody downloads data, if they're a VDC member (we can only do this for members), we ask them what use they're making of the data. One of the choices is "I prefer not to say," and that's fine, but we are asking. So somebody else finding this data will be able to see everybody who used it and the use that was made, and contact that person to ask them about the use; we're taking advantage of the inherent linked data in Samvera to do this.
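The creator record and use tracking described above can be sketched in miniature. This is an illustration only: the class and field names are hypothetical, the use vocabulary is an approximation of the dropdown described in the talk, and the actual VDC models all of this as linked data (PCDM) in Samvera rather than as Python objects.

```python
from dataclasses import dataclass, field

@dataclass
class Creator:
    """Displayed alongside each data set so a visitor can judge, and
    contact, the person behind it (hypothetical field names)."""
    name: str
    orcid: str          # ORCID ID, e.g. "0000-0002-1825-0097"
    university: str
    email: str
    position: str
    discipline: str

# Approximate controlled vocabulary for the download dropdown,
# drawn from uses participants reported.
USE_TYPES = [
    "Cite the data in a paper",
    "Computational analysis",
    "Modify for use in original research",
    "Use for teaching a course",
    "Use unchanged",
    "Prefer not to say",
]

@dataclass
class Use:
    user: Creator
    use_type: str

@dataclass
class DataSet:
    title: str
    creator: Creator
    uses: list = field(default_factory=list)

    def record_use(self, user: Creator, use_type: str) -> None:
        # Only VDC members are asked; a stated use must come from the menu.
        if use_type not in USE_TYPES:
            raise ValueError(f"unknown use type: {use_type}")
        self.uses.append(Use(user, use_type))

    def who_used_it(self):
        # Later visitors can see everyone who used the data, and why.
        return [(u.user.name, u.use_type) for u in self.uses]
```

The point of the sketch is the linkage: each recorded use points back to a full creator record, so "who used it and why" is one hop away from the data set itself.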
As you can see, the survey was largely taken by faculty, but we did have graduate students, and I believe the staff member who took it is also a graduate student who identified as staff because they're a part-time lab manager. Some of the survey questions we asked: in finding trustworthy data, what is most important to you? Availability of the abstract was number one, but second was the research question or hypothesis. That is clearly something people want to know. So, for those of you doing metadata for research data: that is a question we ask people when they upload data. If they're uploading it as private, all they have to give us is the names of the creators and a title, but when they go to publish it, they do have to tell us their research hypothesis or abstract, because that is something people really do seem to want to know. What's interesting is that the granting agency was not considered important. We asked people to rate each element from one to five, and the least important for everyone was granting agency; despite faculty saying that was something they would select on, it rated least important. We asked how important it was to be able to contact the data creator; they said it was very important. We only give email addresses, because I know it's very annoying to be contacted synchronously, so we didn't make that an option. Knowing that other researchers have used the data was rated very important or important; nobody said it was not important. I will say the graduate students wanted that more than the faculty, but the faculty wanted it as well. And here are the primary reasons for data use. The primary reason was to cite the data in a paper you're writing; 10 of our 14 respondents, I think, said they've done this. Computational analysis, not surprisingly, is another biggie.
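The deposit rule just described, minimal metadata for a private upload, hypothesis or abstract required at publication, could be sketched as a small validation step. Field names here are illustrative, not the VDC's actual schema.

```python
def missing_fields(record: dict, publish: bool) -> list:
    """Return the names of required deposit fields that are absent or empty.

    A private upload needs only creator names and a title; publishing
    additionally requires the research hypothesis or abstract (and, per
    the talk, a DOI is assigned at publication time).
    """
    required = ["creators", "title"]
    if publish:
        required.append("hypothesis_or_abstract")
    return [f for f in required if not record.get(f)]

# A record that is fine as a private deposit but not yet publishable:
draft = {"creators": ["J. Doe"], "title": "Blood sample measurements"}
private_gaps = missing_fields(draft, publish=False)   # []
publish_gaps = missing_fields(draft, publish=True)    # ["hypothesis_or_abstract"]
```

The asymmetry is deliberate: it keeps the barrier to depositing work-in-progress low, while guaranteeing that anything public carries the hypothesis that respondents said they need in order to evaluate reuse.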
Modifying for use in original research was up there, along with use for teaching a course and use unchanged. One person, an environmental engineer, added that she used data for government or institutional policy recommendations, and I do know she does a lot with New Jersey environmental regulations. How important is the information about the data creator? The name of the creator is obviously the most important, but the discipline was surprisingly important. Again, people didn't really identify with a discipline, "I'm a mathematician," "I'm a physicist," and this was true for the faculty as well. When we asked them what they did, they said things like "I do sustainable crop research," and it turned out he was a climatologist. He didn't say "I'm a climatologist"; he said "I do sustainable crop research." They really identified with the problem they're trying to solve. But they do want to know the discipline. Why is that? What do you think? Absolutely, that's what I thought: they're looking for that lens. So what we hypothesize is that the problem is now the meta-discipline, and the formal discipline is actually the lens. We don't have a straightforward disciplinary approach to anything anymore; we have a meta-disciplinary approach with lenses placed on it. We presented this to our liaison librarians to see what they thought, and it engendered a very lively discussion. There was talk about figuring out how we can mimic, in our support, the approach they take to studying the problem. But we're tightly staffed, like most libraries; we can't field five librarians to go talk to researchers.
But what we could do is start triaging with LibAnswers and sharing the research questions we get: "I'm going to give you some immediate help, but I'm going to get back to you after I talk to the sociology librarian, since you're working with sociologists on the socioeconomic issues involved in cancer research." We've just started talking about how we can change some of our services, but it was a very lively discussion; we went long over the time we had planned. I think we also have to recognize that what starts with the faculty and the graduate students trickles down at some point, and faster than we know, because I really thought this was still emerging, and I didn't find anybody who said, "I'm a mathematician, by gum, and I do math research." That's not how they put it, and they all worked in teams. It was really interesting that the change seems complete, not just on the verge, which is what I was still expecting. So I think we can expect teaching, and everything else at the undergraduate level, to change as well, and it's a good idea to start thinking about how the dynamics of learning, research, and what people are studying are changing, and how that will impact how we tell them about the resources we have. We're obviously not marketing well: we spend over $100,000 on general-purpose encyclopedias, and people are digging out their high school textbooks or turning strictly to Wikipedia. And maybe Wikipedia is better and we don't need to spend that money. I'm not someone who says Wikipedia is the devil; we need to take another look at it. Really smart scientists think it's fine, so maybe we're wasting a lot of money on Encyclopaedia Britannica. Lots of interesting questions were raised by this study, and I think it's going to benefit us long after this grant.
So any questions or thoughts from what you've done? Thank you very much. Thank you.