All right. Well, welcome back to our closing plenary session. I'm very pleased to see how many people were able to stay on till this closing session. I think it's been quite a busy and rich couple of days. I'm looking forward to watching a few videos of things that I wasn't able to see, and I suspect that many others here are in that same boat. So we will try and get those videos and other materials from the sessions here up on our website as soon as possible, and as always, we'll put out announcements to CNI-Announce as that material becomes available. I'm going to be very, very brief in this closing. There are a few folks I want to thank before I introduce our closing speaker. I also just want to remind you that in the packet you've got dates for a number of upcoming events that may be of interest, including our December member meeting in Washington, D.C., where I hope I will see many of you who are here.

So, three sets of thanks this time. First, I do want to, as always, recognize the contributions that our breakout speakers have made to this meeting. We had a lot of superb presentations. They represent a lot of hard thinking, hard work, good preparation, and opportunities to learn a tremendous amount. And it's those sessions that are the backbone of the meeting. So I'd like to call for a good round of applause for all of our breakout speakers. Thank you.

Second, you may have noticed that in the last couple of years we have increasingly been doing more and more capture of sessions. We used to just take a few videos of the plenaries and maybe one or two other things in the plenary room, and now we have extended that, as you may have noticed, to quite a few other sessions. But to pull that off, we can't do it just with our staff, and so we usually seek a bit of help from some of our colleagues in the local area.
And at this meeting, the University of Washington's iSchool has been very, very helpful in finding us a number of folks to come over and help out with the AV capture that hopefully you will be enjoying in months to come. And hopefully also, as I was saying when talking with Carol, students at places like the University of Washington iSchool will have an opportunity to learn from these kinds of materials that we capture as well. So I'd just like to express my thanks and our thanks to that team of people for coming in and giving us a hand. Thank you.

And finally, I just want to thank the CNI staff. There's a lot of logistics to this, and when the meetings go smoothly, and I really feel like this one went very smoothly, that's because these people are working frantically to make it happen. And I really appreciate all of their hard work. Thank you.

So now let me move on to our closing plenary speaker. I'm delighted that Carol Palmer is with us. She has just come this year to the School of Information at Washington after a long stint at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. You probably, if you didn't know all about her already, have read a long factual bio on the web or elsewhere. So I'm just going to say a couple of things about her and why I thought at this juncture it would be so helpful for us to hear some of her thinking. Carol and I go back probably 10 years or so now, and I feel like between us we've watched the emergence and at least preliminary definition of data curation as a profession or a sub-profession. And she as much as anyone I can think of has been deeply involved in a set of inquiries about: what do people need to do to do data curation? What do they need to know to do those things? How do we teach them about those things? How do we balance tensions between disciplinary knowledge and more general knowledge of curation?
How do we deal with the sudden spike in demand for people with various of these qualifications? The programs that she's led have really been integral to sorting those out: very much practice-based in the early days, involving internships and those kinds of mechanisms, and then moving from that into a broader definition of this. And of course, it doesn't stop there. She's thinking about things like: what does the next generation of people need to know? What happens as we go forward with this kind of data-intensive scholarship? What is that going to imply down the road apiece? In recent years she's been engaged in things like the Research Data Alliance, which is looking at some of these challenges on an international level. And I think she's going to share experience and thinking broadly about all of the challenges around research data, and bring a particular sort of look to it that's not about how we do it on the front lines of an institution, but how do we train a generation to do it? And what does that generation need to know today and tomorrow? So I'm delighted that Carol is here and was able to work her schedule to share her knowledge, experience, and insights with us. Welcome.

Well, it's great to see you all in Seattle. And I'm very happy to be a new resident of Seattle. And I'm thankful that CNI came here, because it allowed me to meet some of my students face-to-face for the first time, students I'd had in an online class, and to really engage and see everything that's going on with CNI. This is really important for me, to get the exposure I need to continue to do the work I do both in research and education. So it was interesting to me to read Cliff's roadmap. Does he do that every year for you? Isn't that something? That's a nice piece of work. So I read his roadmap and I read what he said about my perspective.
And so I put a little different framing on what it was I was going to talk about today. After reading his roadmap, I thought, well, I want to try to interrogate these things a little bit. Are we the experts we need to be? What are the exemplars for data resources and services? Can we learn and lead at the scale and pace needed? And then, of course, I had to go back and look at the last time I did something CNI-ish in a formal way, and I found the ARL-CNI forum meeting in 2008, which was "Reinventing Science Librarianship: Models for the Future." And this is the presentation I gave there: "Preparing eScience Information Specialists: New Programs and Professionals." So this was 2008. Our biological information specialist program was in place and going pretty strong. Martin was on our board. And interestingly enough, that program is not thriving like some of the others I'm going to talk about, but there are interesting dynamics there, I think having to do with how much informatics has grown in our domain fields. I talked about our specialization in data curation, which was just sort of taking off; the first graduates came through that program at about that time. We'd been running summer institutes in data curation, having actually just launched those a couple of years before that, and I think I met some of you for the first time at some of the institutes. And I think the story I told was very optimistic, talking about the positioning and the potential of our profession and how we were going to be part of all of the data-intensive initiatives. But I did close with some words of warning about the danger of us relying too heavily on our knowledge base that is so closely tied to the bibliographic universe, and feeling like this was something we wanted to really examine and not put too much emphasis on.
And looking back at that talk, just the other day when I was preparing this particular slide, I realized that in fact I was also underestimating what it is we needed to do and how we were going to get there. So this was a great opportunity. Cliff, thank you for getting me to revisit my thoughts back then.

So in 2008, and even before that, we kept hearing about the deluge of data. But in fact, those of us who teach and do research in this area were in a deluge of discourse and directives. In 2003 alone, there were at least 11 reports that came out from the National Science Board, the National Research Council, and NSF on cyberinfrastructure, harnessing data, really making research a different kind of operation than it used to be. We had leaders in the iSchools who were heading up some of these initiatives. So the Atkins Report is now canonical and referred to as the Atkins Report. We had Ron Larsen, who was also involved with a group on cyberscholarship. And John Unsworth, who was trying to balance the scales with Atkins: the sciences had gotten their say, and it was time for the humanities and social sciences, so he led the Our Cultural Commonwealth report. So this was the reason that people like me, faculty who were educating librarians, people who were going into research libraries in particular, jumped in full force and sort of frantically. So that's where we were, and the reports are still coming out, but not quite at the same pace. There were dozens over the next dozen years, and they continue to come out.

But now if we fast forward a little bit, we'll see that there's not only the deluge of directives and discourse; we now have more and more actual repositories functioning for our data. So I take a snapshot of Re3data (or what used to be DataBib; they've now merged) every year for a slide in my class, and I've watched the numbers tick up. So we're now at about a thousand research data repositories, a big number.
Last year, well, maybe two years ago, it was more like 600. So it's going fast. We have other initiatives that are similar, and in fact there's some overlap, but not completely. So BioSharing is now noting 665 of what they call databases, but which in fact are repositories of a certain kind, and 584 standards. If you haven't looked at their standards, it's a very interesting set. Many of them are about minimum information: what's the minimum information needed in this particular type of data for this kind of research, in terms of metadata and description, for the kind of sharing that we want to have happen with data. So there's that, and then the Marine Metadata Interoperability Initiative. Previously I think it was called the Marine Metadata Initiative; it's now focusing primarily on interoperability, with 215-plus standards, many of which are different from what we see in BioSharing. So again, trying to keep up and really understand the pace and the scale is hard. Trying to get our students as immersed and exposed as possible is also a challenge, and I'm sure all of you are challenged by this as well.

I'm also, as Cliff mentioned, active in RDA. I was elected to the Technical Advisory Board this last year. And how many of you have been to RDA or are sort of involved? Great. Okay, so RDA is a bit of a phenomenon for me. Being someone who studies the activities of researchers as well as participating, it's hard not to go in there with just my researcher persona on. But I'm trying very hard to be the kind of participant they need. The Technical Advisory Board is really about trying to move things forward, keep a lot of overlap from happening, and look for gaps in the initiative. So RDA, if you haven't heard about them: the primary aims, idealized, are to reduce barriers to data sharing and accelerate coordinated global data infrastructure. RDA US is what I participate in, but there is also RDA Global. And in RDA Global, 95 countries are participating.
50% of those are in Europe, 37% in the US, and the rest from other parts of the world. We're working hard to try to push that so that we have more from the Southern Hemisphere. But it is growing. Individual membership is also growing very quickly: 2,783 community members, with 138 joining from February through March, prior to the last plenary in San Diego. So things are also picking up very quickly with RDA.

So in my role with TAB, we went through an exercise these last three or four months to try to bring some coherence to what was happening, because there's so much growth and so much activity and it's scaling so fast. Currently there are 56 working and interest groups. How many of you who go to RDA are actually part of a group? All right, so we've got a few. So 56 working and interest groups, that's a lot of groups, and the charters are coming in and we're reviewing them. And like I said, our job is to watch for overlap and such. Coordination is very difficult, but this was an effort to try to map the groups and the nature of the groups, and to get them, in fact, to place themselves in this matrix: on the solutions axis, somewhere between social and technical solutions, and on the other axis, the data provider, data consumer, beneficiary dimension. So 56 groups are all sort of speaking up and saying where they fit. This is not an exact science by any means. In some ways we wanted this mapping more to help new members navigate what's going on with this quickly growing organization. So we're not sure how many purposes it's really going to serve, but again, for me personally, as someone who's involved with libraries but also seeing a growing number of our students going to research centers, data centers, and the corporate sector, it's a way of really trying to understand the nature of the enterprise at large, globally.
So one of the observations I made, and that was then sort of verified by the leads of the libraries group and the data repositories group, is that like no other groups, they were struggling to put themselves anywhere but right in the middle, of the social and technical, and of the data providers and data consumers. So I don't know exactly which quadrant they will assign themselves to in the end, but I think we should pay attention to this, because more and more there is strong alignment between these kinds of organizations, and I think it's sort of an indicator to us in research libraries of where some of our other peers, and people we can learn from, might be.

Another part of RDA, at the next level: we have the individual members, the organizational members, we have these working groups, but many of the national data services are also involved, and again that's a very important layer of activity. I think this last meeting was the first time that the national data services sat down in a room together and started to coordinate. So that alone, if that's an accomplishment of RDA, we have to applaud. That's a tremendous thing. The Australian National Data Service has been really active for a long, long time. They've been very strong supporters of RDA, and I think they're working very hard now to move that coordination out to many other efforts. I'm also, just as of last week, on the steering committee for the National Data Service, and one of the roles that I'll play is as a liaison between RDA and NDS. So there's a lot for me to watch here, and for all of you to watch: this big picture, the goals of global interoperability. But at the same time we need to think about how this relates to what's happening on our campuses, and how we as the information professionals and research librarians are really going to play our roles in making this global cyberinfrastructure work.
So we also have an abundance of things happening on our campuses, and I just picked a few. I'm sorry there are so many from California; I didn't really mean it to happen that way. How many of you have data science initiatives on your campuses? Okay, now how many of you think they're going to come in the next year? Okay, so not as many as I would have thought. But if you go out and start looking, they really are everywhere, and some of them are associated with libraries and research computing, and some of them aren't; some of them are just about scientists doing new and interesting things with data. But again, it's a trend, and an effort that's not going to go away. This is something that's happening both in the research and in the academic sector. In fact, these efforts around data science were part of what drew me to the University of Washington.

So we have all of this activity on these various campuses, and then we have all these people on these campuses stepping up, working, claiming: we are data scientists, we work with data in new ways, we're interested in coordinating with the rest of you. And at the University of Washington, there was what I called (I don't know if they called it this; I wasn't here yet, I was just watching from afar) a celebration of data-intensive research. So the eScience data science effort there had a kickoff session: 137 posters from faculty, senior researchers, and others around the campus, coming together to demonstrate, expose, and discuss their work, from 30 different departments and units across the campus. So this is sort of a snapshot of that. This actually took place in the building I'm in, where the Information School is, not the eScience space, which I will show you in a minute.
So here we have all those directives, and then we have all those repositories, and we have all those standards, and we have the RDA and all 2,000-plus members, and then we have the national data services and all their efforts, and then we have all these people on all these campuses. So there's a lot going on. And the problems associated with this kind of trend and this kind of phenomenon have been recognized in a lot of these reports and directives. One that's taken very seriously by the RDA was the 2007 report by Edwards et al., Understanding Infrastructure. It's like a Bible in my classes for data curation. It has a nice historical perspective, and it talks a lot about things we even heard Brewster Kahle talk about yesterday: how we go from centralized, local, more homogeneous collections, efforts, and services to heterogeneous, distributed, coordinated activities, and how that really hinges on a lot of consolidation and certain kinds of gateways for interoperation. And so you often hear this debate or tension in RDA: are we really about data sharing, which is the tagline, or are we really about interoperability, because that's what's going to make it work? Well, they're still about both. But in one of their more recent papers, Mark Parsons and Fran Berman, who are two of the leads, talk about this being a make-or-break phase of this consolidation, or lack of consolidation, and say that these early choices are going to constrain our options later on. And so again, they are thinking about it from RDA US and RDA Global, but we too need to be thinking about it, even on our campuses. At the University of Washington, it's 30 units all participating in something that has some cohesion, or could have some cohesion. So let's think about institutions; I really enjoy thinking about institutions and disciplines.
And I've done a lot of work, in fact, on the evolution of disciplines within institutions. And I was thinking recently about a scholar I haven't thought about in years, not since my dissertation work: Timothy Lenoir, who talks about, and he doesn't use these words exactly, this is my interpretation, institutions as the intellectual habitat. There's a really important distinction between what researchers do with their programs of research and what institutions do as what he would call builders of disciplines. But in the case of what's happening now, maybe we don't want to be thinking about building disciplines, but still about how it is we build these habitats where researchers can build their programs of research. Their goal and their need is to extend and legitimate the products of their work, and in more and more cases that's some kind of data-intensive work. And they also have to concentrate on their careers, dominating the cycles of credit and resources. And we in the institutions and the support structures get to do all the rest, more or less. But the part that really struck me as essential to think about again now, at this point in time, with all these networks emerging, is that for these research programs to really gel and to be productive and important, there needs to be a place where new routines can take hold and settle long enough for distinctive types of work, new types of work, to emerge. And I think that's what we're trying to do at the University of Washington. I think that's what a lot of institutions are trying to do, but they're not that self-conscious about it. They're not thinking, oh, we need to be a place where we can solidify these routines long enough for them to get traction. But in a sense, that's what we're doing. So we're trying to establish the new kinds of service roles around data-intensive scholarship.
We're trying to facilitate links with other disciplines and enable the transmission of techniques and information among those groups and outward. And these were things that Lenoir was recognizing decades ago, when it wasn't about this kind of distributed effort needing to consolidate, but just about a typical institution. But I do think it applies, and that we can see it taking different shapes.

So of course I'm new at the University of Washington. I am still in my phase where everything's magnificent. You walk down the street and everything's magnificent. You go to your school and everything is like, wow, they do everything better here. Well, I'll come to my senses before too long, but it still looks that way to me. And so I thought, well, I'll just talk a little bit about the eScience effort that was so attractive to me. Yes, it had a lot of resources behind it, so let's of course accept that at face value. They got a very large Moore-Sloan grant: a $37 million cross-institutional collaboration among NYU, Berkeley, and the University of Washington. Together they are creating on their campuses various kinds of data science environments. And the Moore and Sloan foundations are very interested in this institutional approach to what's happening with eScience and data science. So first you have all those people you saw in the room coming together for the poster session. There are all the PIs, and the PIs themselves for this effort are distributed across campus in many different units. Then we have the eScience Institute Steering Committee, the actual governance of the group, out in other parts of the campus. And then we have all the participants in that February meeting that I showed you, the campus-wide meeting. So they're spread all over the campus. Again, there's this distributed network of activity. So I have several more slides.
I want to at this point just acknowledge Bill Howe, the associate director of eScience, who gave me some wonderful slide decks, which I cut and pasted and put together. And so I'll fess up later and show him what I've done to his slides.

One of the other things that happened, in addition to the Moore-Sloan grant, was an effort to, again, build something on the campus, something very physical. With resources from the Washington Research Foundation, they invested in what they're calling the Data Science Studio, on the sixth floor of the Physics/Astronomy Building. There it is, the whole floor. But the magnificent thing about this was that it was a partnership among the Provost's Office, the UW Libraries, Physics and Astronomy and the Arts and Sciences Division, and the eScience Institute, all coming together to try to re-envision what was actually a library space. So, Betsy, I didn't tell you I was going to show you these pictures of your library. And of course Bill made it all dark and dingy, right? I don't think it really looked this bleak. So: re-envisioning the library to focus on working space and culture. And I actually scrounged up a few of the resources that were distributed, showing the Physics-Astronomy Reading Room's transition to the Data Science Studio, and also where all the services were going to be placed. So there was lots of communication. I don't know the details of how all this happened, but it did. You can talk to Betsy or Cynthia or Stephanie about that. But it went from what many of us have, reading rooms and services that are distributed on our campus, to a different kind of space. So this was the plan, and this is pretty much how it worked out. But they were very focused, from what I noticed in his slides, on what they were calling the tyranny of the open office plan: the idea that everybody thought open offices were the way to go, and then once we all found ourselves in open offices, we weren't getting anything done.
So that's a simplification, of course, of the dynamic. But they worked very hard in the design of the studio to have many kinds of spaces and places where working and learning could happen: casual open spaces, closed seminar and small group spaces, quiet work spaces, and then lots of informal places to meet up. So this is what it looks like now. Note this isn't in black and white, so we don't get the true comparison. But it really is a vibrant learning and working environment. I've been over there a couple of times, and I always sort of wonder if, when I get off the elevator, it's going to be just this hush of quietness. But it's not. There are a lot of interesting things going on, a lot of activity, and students of all ages, but also researchers, staff, etc.

So they have an incubation program. And the reason I wanted to mention just this one slate of projects is that one of the requirements for being involved in one of the incubation projects is that you must, as a lead, physically co-locate and work with the incubator staff at the Data Science Studio. So getting these faculty to actually pick up and go and work with others is really a great thing. And it is, again, a way of working towards this consolidation and everybody buying into this idea. There's a resident data science team, some permanent staff of data scientists who help with these new incubator projects. There's studio office hours space and then many drop-in spaces, along with the incubator programs. There are seminars and lunches and workshops and boot camps. But the library's in the mix. The library didn't go away. They weren't pushed out of the space. They were integral to the design of what happened, and they're still there. They conduct office hours; I think Stephanie said they're there as much as eight hours a week. You can talk to her and Cynthia to find out more about the details.
They're very active in some of the working groups, including the reproducibility and open science group. They've been part of the site visits from the funders, etc. So they're still very much in the mix of what's going on. And just the other day I was talking to Bill, when I was getting his slides, and asking him how all of this is going. And the statement I remember so vividly is: "I don't see how you do this without the library." So for the associate director of this effort, the library is certainly part of the vibrancy. And so again, this is a model for an institution coming up with a way to support these new distinctive routines of work long enough, at least, for some new types of work to emerge. And that's what's happening. That's what's being supported here.

So this is Bill's slide. He says data science is the rising tide that lifts all boats. Well, yes, I suppose it is, but it's also not without its challenges, and I think we want to be thinking about the challenges as well. I do think data science is important and moving ahead quickly, and there's this very nice image of it and its distribution. The biggest problem that Bill notes when you talk to him is captured in this statement. When they ask the scientists, and again this is not formal (they're doing a formal survey now), but in his engaging with the people in the incubator projects and others, he asks them this question: how much time do you spend handling data as opposed to doing science? And he said the most frequent answer is 90%. So this is a problem that eScience wants to solve; it's part of what's happening in the space that they're developing and why they've got all these personnel. But I actually found this fascinating. I flipped it completely around and said, wow, what does this mean about the quality and the value of the data that's being produced, if they're putting this much time and effort into it? It must be pretty darn good.
So I thought that was interesting. We often fuss; even Sayeed and I, when we were working on the Data Conservancy, would use the line: we want to get scientists back to science. Right? We do. But I hadn't taken the time to also think about what was happening with these scientists putting all their expertise and effort into their data, and what that means.

So the problems that I worry about, as opposed to Bill, are about what qualifies as reusable data. If we're really moving towards open data, where we're putting it all out there for others to use, we want to know what's reusable, not just what's releasable. So we've got releasable: what are scientists, or scholars of any kind, willing to let go of? And then we have reusable. But releasable is hard enough. This is a hard problem. And one of the dynamics that I really see at play was written about by Harry Collins in 1998, in a paper on what he called evidential cultures. There are some cultures, particularly in Europe, where there is a sense of collectivism with data and results, where they're more willing to let their scholarship out a little earlier in the cycle because they feel it's the community's responsibility to assess it, evaluate it, and decide whether or not it's of value to them. Whereas in many other sciences, even within the same field but maybe in another country, you have a more individualistic approach, where it's my responsibility as a researcher to make sure it's meaningful, valid, and ready for someone else to consume. But I think this is a bit of an open question now: where do we really want to sit, and where do we as information professionals, and these institutions that are supporting these new kinds of work, land on this evidential culture for our institutions? I think there's going to be some difference, perhaps, in who's going to take responsibility for the validity and meaning. Ultimately, that's what it's about.
Researchers want to make sure that validity and meaning are associated with the products that they release. So let's think about it as institutions, as libraries, as research services, as research computing on our campuses: why are we investing in data? Maybe not everyone here is, but most of you are probably seeing it happen in your library or on your campus. Well, we know about the open data requirements, and where there aren't requirements, there are expectations. We hear a lot about reproducibility, replication, and the many other Rs. I mean, there are so many Rs that you can't even fit them on a slide anymore. Carole Goble has this slide with about 50 different Rs: repeatability, rerun, regenerate, recreate, reexamine, restore, all these re-things that we want to do with data. But that's important, and there's no doubt that, especially in the computational sciences, there's really been a void there. The institutions that we're involved with are often concerned about being stewards of the common good, about what it means to secure the scholarly record and make it part of that common good. But in thinking about this institutional role, where we're trying to stabilize, I also think that, as much as anything, we want competitive, innovative research, and I think our data resources are going to be the key to that. Maybe not forever, maybe for the next 10 years, maybe for the next 20 years, but I think it is going to be something to be proud of. I think it is going to be something to make a claim about: that you are an exemplar of open research, and that you have a center of excellence, perhaps, for data around areas where you have research prominence. But that's going to mean that those of us who are in, not the research programs, but the institutional programs making all this happen, need to figure out what it means to have a center of excellence around data.
And that's where it gets hard, where there's a lot of research to be done and a lot of the more demanding work for our initiatives. Because optimizing data for reuse is very different from just being able to release it. It has different objectives associated with it, and it takes different expertise than preserving the record of research or providing access and transparency. It's much more resource intensive. What I like to say now is that what we want are rich, deep, functional data resources. It doesn't have to be big data, but this is what we want, and this is what we want in our centers of excellence around the places where we have research prominence. And of course we've always had these: lots of data being generated, lots of products being generated. I wrote a chapter in the Companion to Digital Humanities back in 2004 that was really about the kinds of thematic research collections that humanists create, and why our libraries are falling down on their responsibilities by not bringing them into their collections in formal ways and making them first-class objects along with all the other publications. Our scientists, at the same time, are creating very complex compilations of data that have tremendous value. They may not be the same kind of product the humanists are creating, but again, they could be part of these centers of excellence, part of what we share and show as our expertise. I'll get to the teaching in a little bit, but I've also been doing a lot of work on the research end, partly to help me educate. We don't really have any principles yet about how we work in our organizations to create reusable, or even releasable, data.
So we've been trying to come up with empirically derived principles, and we've done this through several projects: the Data Curation Profiles project, where we were partners with Purdue; the Data Conservancy, which Sayeed Choudhury led at Johns Hopkins; and now the Site-Based Data Curation project that I'm leading from Washington. While I wouldn't say these are necessarily proven yet, they are principles that I'm starting to take into the classroom, thinking about the next generation of professionals who are going to be working in what we hope are centers of excellence around data. The first one I've already pushed on maybe as hard as I should: releasable is not reusable. That's tightly associated with the fact that producers of data have very strong beliefs about the meaning and validity I just talked about, and about what that data needs to look like for it to be released. On the other hand, we've been trying to investigate what's of value to consumers. It's hard to predict, and we don't know a lot about it yet, but we do know that it's not exactly what those researchers want to let go of. It's often some subset, which means that in your organization you may be thinking about a different kind of priority for curation. In the science I showed you on the last slide, stuck in the middle there is a spreadsheet of averaged water chemistry samples, and we've come to learn that that data actually has somewhere it could go, where it could be aggregated with lots of other data. Yet that scientist would not let go of it unless it was captured and kept together with the very complex data. So this push and pull, between what people will let go of and what really can be used in interesting ways, is a hard thing for us to prioritize.
But through the course of these projects, and lots of projects that other researchers are working on, we are starting to see some indicators of reuse value. Again, it's still very early, and we need to be driven by our research communities, but we can also step aside. Elliott Shore said something yesterday along these lines: we have expertise. We have ways of seeing what's valuable and important, and maybe we can convince our researchers of it. I do think that's true. Looking at these indicators of reuse value is what pushed us to the Site-Based Data Curation project, because we were finding that certain deep collections of data had tremendous value for communities. Data that was longitudinal in nature, especially if it came from the same site over a long period of time. Data from sites that are hard to get to is, of course, very valuable, as is data from sites you need a permit to collect from. So: understanding what really is unique to our organization that may have very high value for others. This came out of a study of interdisciplinary geobiologists, this idea of geospatial coverage, spatial and longitudinal. And things like reputation, of course, were also prominent as an indicator, which shouldn't surprise anybody. The other really strong principle we've come to acknowledge is the primacy of method. In order for these researchers to let go of their data, to do the release part, much of what they're concerned about is that people understand the process, the protocols, what's happened. At the same time, on the consumer end, it's equally important. This really came out in our work with the geobiologists, and I'll say a little more about what I mean by that. So we really tried to get into what true reusability is for these site-based data.
We wanted to find a way to retain the value and promote the reuse of data from scientifically significant sites. For us, the model site for building this framework was Yellowstone National Park and its geobiology: a place where amazing research is going on, from the origins of life on Earth up through life on other planets, because of the extremophile environments, the really unique conditions, the very unique data. Getting leaders from many of the different disciplines who are collecting data into the room, Sayeed was there, was a great experience. And through the process of engaging with these constituencies, we learned more and more about this reuse being dependent on the sampling procedures, often something that gets lost along the way when we push data out. But the really hard part, in fact the fascinating part, the part where again maybe we have expertise that nobody else has, is what we learned about the most interesting reuse, the reuse that really pushes across boundaries. For medical researchers who need to know what's happening with bacteria in those areas to use the microbial data the geobiologists are producing, the data couldn't be used at all unless those researchers were actually taking different measurements than what they were taking out in the field. We thought it was hard before, just trying to get researchers to give us some metadata, to describe what it is they've collected. But if we want to promote reusability and really interesting interdisciplinary research, we have to work with these communities to convince them to measure the center of the vent where they're taking their water samples. Well, this is hard to do, but not impossible, right? And it's fascinating. Then we talked with more people, for example the people who study coral reefs.
What we need is a measurement of the fascia back on the shore in order to calibrate over time what's happening with these moving, changing environments, to make the data reusable in the future. This seems impossible, right? But it's not. We need to get our eye on it. If we want to have the most prominent and most valuable set of coral reef data, we make sure our researchers out there on the coral reefs are taking that one extra measurement. And then you can claim that. That's a pretty cool thing, but it's not going to come from those researchers, because they're not thinking about the other people who are going to use their data. It'll come from people like us who are out there studying it. I was really pleased yesterday: I heard a lot about collections. In the flood, the deluge of all those reports, only one takes collections seriously, the way we take collections seriously in our institutions. That was the Long-Lived Digital Data Collections report from the National Science Board in 2005, and it's really the centerpiece of what they explain. There's a very simple functional framework in that report: research collections, the things that sit on the hard drives of individual researchers or in their labs; resource collections in the middle, where communities are starting to come together to do something important, really sharing and beginning to come up with standards; and reference collections, the big league, the Protein Data Bank, GenBank, the things that are very well supported and have lots of activity around them. And we have the crisis in the resource collection area that they more or less predicted. We know of the dozens and dozens of growing, really rich, deep, maybe not quite functional yet, resources like this that are out there, and there's really no programmatic way of dealing with them.
Again, this is something I think institutions are going to need to claim and promote for themselves if we're to get a solution. If we think back to the Atkins report, it really was about interdisciplinary, value-added collections. Unsworth talks about collections. They all talk about it, but the NSB report took it very seriously. I bring collections into a lot of the writing and thinking that I do. In fact, in the chapter I did for the Handbook of Interdisciplinarity, which I'm revising right now, I close with the thought that the greatest challenges we have are not in the ability to move across disciplines, but in maintaining the increasingly long and mutable intellectual paths to the disciplinary past. And again, that is what we can do in our institutions with the collections we build and promote in these stable environments. Okay, so what's going on with the workforce? How do we get them into these important roles? I don't know how many of you have seen the recent output from Carol Tenopir and her colleagues, I think it was associated with the DataONE work, showing a decrease in research libraries, or academic libraries more generally, in all kinds of services around data. I'm not quite sure what to make of this, because it doesn't quite line up with what we're seeing in our programs, but it may be true. How many of you think this is true, that there's a decrease? How many of you think there's an increase? Okay. They're looking further at what's going on. I'm not sure what's going on, but something is. One of my theories is that it's dissipating: it's becoming more ingrained in the units we already have and the expertise of people we already have. Ron Larsen, and by the way, there will be a report coming out very soon from the National Academies committee that we were both involved in.
He showed a slide at the International Digital Curation Conference last year of some analysis he did on Indeed.com: which workforce roles were trending up in demand, and which were trending down. The only one trending down was librarian. Trending up: information steward, data steward, digital repository, digital preservation, curation science, digital curator, data curator. So there's a lot going on there in terms of new roles, or new ways of thinking about our expertise. At Illinois, we placed probably 100 students who came out of the data curation specialization there, and 40% of those went into academic settings. A quarter of those were in academic settings but not in the research library: in research computing, or more likely a research institute, where these kinds of unique collections could be being built. Many of those roles are focused on metadata and technology; many of them probably didn't exist five years ago. And then there's a whole list of things our students are doing in non-academic positions. That's a little worrisome to me, because we may be losing some of our best people. We've been working hard to get more expertise to our students as fast as we can. The DCERC project, funded by the Institute of Museum and Library Services, has been our next effort, and it has really been successful in getting more expertise into the classroom. We bring experts from the sciences, from cyberinfrastructure R&D, and from international data sharing, funding, and policy agencies into the classroom. But we're also putting our students into internships, giving them field experiences with multiple mentors, both data management specialists and the scientists themselves, stationed at the National Center for Atmospheric Research.
Lots of great projects are coming out of this, and the best part is that NCAR will tell you, every chance they get, that there's tremendous reciprocity here. Our students are bringing a perspective and a level of expertise they did not have before. We expected, in this very premier data center, to be the ones gaining everything, but we're not. A lot of materials are getting out; they're getting described in ways they weren't before; there's more functionality. These rich, rich resources are being released. We've also learned more from this effort about what I might now call our peer or sister institutions, these data centers. Remember, I showed you the RDA picture with libraries and data centers in the middle. When we went out past NCAR to 20 other national facilities to see what they needed from our students, these are the kinds of expertise they're looking for. We don't really have time to look at it in detail. Interestingly, though, they said they're expecting some growth, but more modest or moderate growth in these positions, and they will have fewer specialized roles. From now on it won't be someone who just does data transformation or data cleaning; they'll be doing metadata and front end and back end and everything. They have tremendous need for people with information science backgrounds, but their internship programs tend to be restricted: they tend to bring in the domain scientists, they privilege the domain scientists, and occasionally computer scientists. It's very hard to get our students in the door, so we have to make special efforts to do this. Okay, I'm going to close up; I'm probably running a little long here. If you looked at my abstract, I said something about how we have this tremendous data science going on, and it's science with data.
We don't really have what I would call data science in the sense of a science of data, in the same way that information science has been the foundation for what we do in our fields. I feel really bad that they've hijacked our words this way, but I think we can still do it. We can still have a science of data as a first-class citizen. We've got too much to lose not to get it right. My favorite line now, when anybody comes to talk to me, is: your analytics are only as good as your curation. Data science isn't going to succeed unless it has this behind it. And we do that by marshaling our strengths in LIS, leveraging all the progress in the informatics disciplines, but also the disciplines that are picking up their samples out in the field, in those hot springs in Yellowstone National Park, and leveraging what they understand about their methods and their data, because that's the most important thing. And then we build a new foundation in the science of data. Okay, thank you for your attention. We're done. You are exactly the people I need to have a conversation about this with, so I hope you have some comments to share or questions. Hi, I'm Jodi from Alabama. You said that we were losing, or potentially losing, our best students because they're not going into academia. Isn't it more likely a challenge of how we can best meet their needs without restricting ourselves to just librarians? The students we're losing to non-academic positions will continue to need organizations like this. I hope so. Shouldn't we expand our vision of who our students are, rather than consider that we're losing them? Yes. So as a research library, you will say these are your partners, right?
When we used to teach the summer institutes, I would tell the people coming in, usually at the associate director level, ADs in libraries, that you can't do this without the partner, the data person in the organization helping to feed the data and prepare it and make it what it needs to be before it comes to the repositories or the libraries. So I do think it's a partnership, but I also think you should be very cognizant of the fact that, to do what we need to do, you've got to be able to do all this. These are the people you need to have the credibility and the capacity to do it, and you need to make your institutions places where they want to come. I think we educate people in a way that they find it very exciting to go to libraries, because of the things we bring to the problems that no one else does. But I'm also seeing a trend: for a while it was half and half, 50% libraries, 50% other, but it's shifting the other way a little now. I'm just concerned that we aren't expanding our vision of information scientists, that we're restricting ourselves to thinking of our students as librarians and ourselves as librarians. I agree with you; I think that's a mistake. Yeah, I agree with you. Tim Lance from NYSERNet in New York. This is just a reflection on that 90%/10% comment, which I think was wonderful, but it depends on what they mean by it. Handling data is something you can do and then stop doing, and you know when you're not handling data. But a researcher doesn't have an off switch; they're running all the time, though they'll admit to it only when they're making progress. So one of the real advantages of handling data is that it takes your mind off the things where you've lost your perspective. Two quick examples. Clark Ray got one of his great breakthroughs as he was stepping off a streetcar.
Or Bob Schrieffer, who shared the Nobel Prize for superconductivity: the insight came while riding the New York City subway, which is known to rock, and quantum waves do too, which probably helped. It was off the beaten track that they dropped the prejudice that was keeping them from seeing. So it's interesting that they all said 90/10. I think I would have given the same answer, but I wouldn't have been honest, really: it's 100, and then 90% of that is handling data. I think that's a really good point. We often talk about how, especially for the humanities scholars we study, most of their projects go on their entire career. They never stop studying what they're studying; they're continually collecting and analyzing, and it just goes on and on. In some of the other fields, at least projects end, but you're right, the next project is just overlaid on that, and it builds. And it's a very anecdotal number. But Bill sees that as a problem to solve, and I just thought, well, isn't that interesting? What does that say about the data? It might say that that's how science really is. But we also hear a lot of complaints that researchers do want a division of labor; they want to be able to move some of the work away from their day to day. So I think this is all emerging: how these divisions will end up. And I don't think it will be permanent. I would even be surprised, if I were still at Illinois in 20 years, to be graduating the same numbers. We might even reduce the number of people graduating into these kinds of institutions, with people going somewhere else to do another layer of data work. The interoperability work, for instance, will be big. You're going to release everybody? Not quite yet. I've got a couple of questions I want to throw out; two, in fact.
One thing I would really love you to give us a view on: I've heard some, not all, of the data curation challenge, but a piece of it, characterized as a somewhat transient phenomenon, in the sense that the next generation of scientists will, as graduate students, learn at least some of this, and so it will just be part of your basic knowledge of methods in your field. It's a little like the challenges that came with information technology, where I would say that within a generation many scholars became more self-sufficient and less reliant on outside expertise. Do you think that's actually going to happen, and how do you see it changing the picture over, say, the next 10-year horizon? Yeah, I do think it's going to happen. We're even involved in projects and proposals where we're trying to move data curation out into MOOCs to get greater exposure, even for younger constituencies. There's no doubt we need that to happen, because if we want to do all these interesting things and have data ready for them, the pipeline isn't going to work unless a lot of it happens early on, especially the things that make data rich and deep and functional. But we're going to have to do that in partnership. I don't think it's all going to get absorbed, and hopefully a lot of it will be automated. Right now there's a lot automated in the workflows and the computational work, but this field work we've been studying is very, very slow and hard. Still, there are a lot of advocates, and they're ready: we're moving the site-based framework into the curriculum for geoscientists right now. So undergraduates who are going out to collect data are now being, not told, but educated about what it means to have certain kinds of features, actually that extra measurement taken, and what it does for their data. So I think we'll see it. And that's what I said:
I think some of our roles are going to change, and some of this will be pushed back. But what we want to keep our eye on is the interoperability layer, the new interesting things, the things we can claim and promote, the things only we will look at because we are information scientists, and the things our institutions can grow and be proud of and promote as fabulous research resources that are the envy of all our peers. I think we'll see more people like us contributing to those. All right. Just checking that I'm not hogging the mic too much. My second question, or comment: I love the slide near the very end, that your data science is only as good as the underlying data curation. That is one of the most succinct expressions I've seen of the difference between the two and the interdependence between them. It's absolutely wonderful. I see a lot of data science today where I'm really worried and suspicious about the underlying quality of the data. I don't think it's been well curated, and yet you feed it into these analytic machines and the most amazing things come out. Beautiful things, right? So I'm wondering if you have thoughts on how to really underscore that in data science education. This is a wonderful thing to tell people who are fundamentally focused on curation, but how do we remind the data scientists that they need to have this quote on their bulletin board as well? Well, again, hopefully we can start by working in partnership. Like I said, there's all this escalating data science, exactly what you're talking about: really exciting work going on, and probably less consciousness about the data. So going back to the idea I talked about, really focusing on the meaning and validity of data is going to be critical.
And where that happens, again, is different for different communities, the individualists versus the communalists and so on. The information professional who wants to make a claim about this magnificent data resource we have is now going to have to take up some of that responsibility, or at least educate the data providers to do it. But I'm completely with you. The interesting thing is, when I say that now, the people in industry and corporate settings, and I get a lot more of them coming to my door wanting to talk about how to write a job description to get the best bloody data curator they can for their data lake, they get it immediately. For them it's more internal, right? We want to be competitive. We get it. We know we've been messy and sloppy; we want everything to be in order. It's the open data layer that's harder, and surprisingly, I think you're right: they're ready to jump on board with the idea of open data and transparency and reproducibility, but often just with the data that's associated with a paper, and that isn't where these deep, rich resources are going to come from. They're going to come from another kind of activity. So it is partly education, but it's also partly having someone whose central concern is the data. We get excited about the data; we get excited about the information. For us, making it as functional as possible is the heart of what we care about. So maybe we'll just be a burr in everyone's side from here on out. But again, that's why you show up at the door with these capabilities, right? You show up with these capabilities, and then you can say: because I have all these capabilities, I can talk about your data, about the validity of your data. No, it's big. It's huge. And I think part of our responsibility is going to be doing what we can to make it clear that that's responsible science.
It's responsible research. I don't see anybody else with a question. And I'm thinking we're about at our time. So I think rather than climbing back up there, I'm going to just stand here and say thank you so much, Carol. Thank you for the invitation. And with that, I wish you all safe travels. Thank you for joining us.