Good afternoon. My colleagues and I are going to talk to you a little bit today about the organizational implications of data science environments in education, research, and research management in libraries. My name is Jenny Muilenburg, and I'm the data management librarian and data services coordinator at the University of Washington, and my colleagues here will introduce themselves shortly.
Hi, I'm Vicki Steeves. I'm the research data management and reproducibility librarian at New York University.
And I'm Erik Mitchell, associate university librarian at UC Berkeley and director of the Northern Regional Library Facility.
So we're going to talk about four broad topics today. I'm going to introduce the data science environment and tell you a little bit about how that came about. We'll spend a little bit of time exploring perspectives around the data science impact in libraries, and then also talk through potential positions and roles for data scientists in libraries. And hopefully then we'll have some time for a conversation about next steps. Our main focus in doing this was to think through how to leverage the data science environment to have an impact in our libraries. So a little bit about the project. In 2013, the Alfred P. Sloan and Gordon and Betty Moore Foundations established a grant. The three participants were New York University, University of Washington, and UC Berkeley. The goal of the project was really to harness the data science environment and see how big data and data science impact research as well as education. It's a cross-institutional effort, and the interdisciplinary nature of the project is really one of its unique factors. I'm going to turn this over to Vicki and she'll talk a little bit about some of the specifics of the program.
Thanks, Jenny. So the MSDSE, say that five times fast, has essentially three main goals in the midst of a lot of moving parts.
So the first goal is to develop meaningful and sustained interactions. This really focuses on the ability of researchers to liaise with folks outside of their own fields. So for instance, at NYU, we have a lot of interaction between social scientists, computer scientists, et cetera. And it's not only a mix of research ideas, but also research methodologies, which makes data science, and in particular the Moore-Sloan Data Science Environment, a really interesting place to be. The second is to establish career paths for data scientists. So this is kind of where our presentation comes in, where we talk about long-term, sustainable careers for people who can leverage big data in a lot of new and interesting ways. The third is to create an ecosystem of analytical tools and research practices. Again, we're focusing on sustainability, reusability, extensibility, and the ability to translate across many research areas. And these goals essentially turn into some working groups that have local instances and cross-institutional collaborative objectives. So these are some of the working groups here, and the titles of them are highlighted. And I'm going to hand it back to Jenny and she's going to talk you through the first two. I'll talk you through the second two and then Erik will end with the last two.
Okay, so for tools and software, there are clearly a lot of people who are creating tools and software to do their research. So each of the environments at each university presents seminars and workshops where researchers present the tools that they've written or the procedures that they've used in order to do their research. They're also highlighting particular problems they've encountered during their work and how they solved those problems. Also, as tools and analysis protocols are built, they're then shared out to the university.
Anything that the data science environment does is also shared out to the greater University of Washington community, and the same holds true at the other institutions as well. Another thing they do to try to share this kind of effort is they hold office hours, and I'll talk a little bit more about that in a couple of slides. For careers, each institution has several different groups of people they employ, tenure-track faculty being one of them, also research scientists and affiliates. And what's nice about affiliates is that they are housed predominantly, or potentially halfway, with another department but are then also affiliated with the Data Science Institute, which helps broaden the impact of the Data Science Institute in particular and then also brings data science methodologies into another discipline.
For education and training, each institution is committed to developing pedagogical approaches to data science. Again, the approach has to be flexible enough to cater to the interdisciplinary nature of data science, but we also want to build a common core of skills, knowledge, language and community. So this includes building data science curricula with wide appeal to the heterogeneous data science community as well as holding continuing ed such as seminars, boot camps, hackathons and online courses to offer a wide variety of coverage on data science topics. The reproducibility working group, of which I am a member, focuses on developing infrastructure and a suite of tools that support the process of making scientific experiments reproducible. Additional work here includes dissemination of information related to reproducibility through office hours, short courses, tutorials and workshops. Essentially the reproducibility working group wants to make it really easy for data scientists to express their needs, to use reproducibility tools, and to discuss ideas.
The other two working groups: first is physical and intellectual spaces.
This working group, I think, was really established at the beginning of the project to help the three universities think through how they would bring data scientists in the larger data science community together at each institution and really deliberately, or intentionally, plan for collaboration in an open space. And so at UC Berkeley and at University of Washington these groups were very active in the first year. In fact this largely consumed, I'd say, the first six months at UC Berkeley. I understand at NYU this process will be kicking up soon because they're moving into a new facility. The ethnography working group I almost look at as a meta working group. The intention of the ethnography working group is to study how data scientists interact amongst themselves and within their community. We see these ethnographers across all of our different working groups. They're often active participants, many of them come from some part of the data science field, and they do what ethnographers do. And I think initially my view of this was that the ethnography working group was in part tasked with assessment, but I think assessment, as the funders found after the first year, was really a much larger effort, and we needed to think broadly about that as we do with other areas. I suppose it's worth saying that the six working groups collaborate locally at each institution and then across the entire project. And annually we have a summit where we get together and we talk through issues raised in the groups, and actually the three of us formed a library working group at this past fall's summit based on some of the input from the community and the issues we had raised. So these are the set ones, but there's also a bigger sphere of activities there. So now we're going to take you through a bit of what each data science center or community looks like at each institution.
So to start with NYU in particular, unlike UCB and UW, the Center for Data Science at NYU is housed outside of its library.
So it's got its own floor at 726 Broadway, but as Erik was saying, we're soon to be moving next year. We're very excited. So the Center for Data Science really functions, again, as a place where multidisciplinary researchers can come to benefit from regular meetings with each other. So in my earlier example I mentioned sociologists and political scientists getting together with astrophysicists, computer scientists, etc. So the Center for Data Science is really a great incubation center for working with big data in a multidisciplinary setting. At NYU we do a lot of outreach and dissemination of these ideas through data science lunch seminars, and to appeal to social scientists we do a text-as-data series because, as I'm sure you know if you've worked with researchers, sometimes they think their data is not data, and we're trying to really combat that kind of cultural bias. Additionally, NYU has a data science showcase. It happens about twice or so a semester. It's basically an evening where we explore one faculty member's research. In terms of education, our Center for Data Science has a master's degree program, but NYU also offers specializations in data science through other degree programs, which we'll discuss in a moment. And we also offer office hours for consultations. I myself have office hours in the CDS where I offer support for research data management and reproducibility. But we have other faculty members who also work in their domain.
So at University of Washington? Yeah. Jenny can talk about that. Want me to come up?
Okay. So at the University of Washington we actually already had an existing eScience Institute that then became the affiliate for the data science environment. What we didn't have was a space, so as Erik mentioned, the first several months were occupied with creating a space for what became the data science studio, and it's housed in what was our physics astronomy library.
So that took some time to set up, and now that is used as the primary location for all the data science work and seminars, office hours, etc. Projects from multiple departments meet there. People involved in various research projects can meet there. It's also a lab that's open to the greater university community, so students can come in and use computers or come to office hours. I mentioned speakers, as did Vicki. There are regular seminars where university researchers from our campus present their work, and then they also bring in people from outside campus. Our postdocs present their work as well. For education, the committee for education that's involved with the DSE has started what's called a transcriptable option for undergraduates, which means undergraduate students in various disciplines can get a notation that goes on their transcript saying they have kind of a concentration in data science. There are also master's and PhD programs being formed that allow people to have the same kind of background degree. The office hours that are held at the data science studio include not just the statisticians and the data scientists from the data science studio but other statistical consultants from on campus. There are Amazon Web Services and Google Cloud people that are there, and the physics astronomy librarian and then myself as the data librarian hold office hours there as well. The UW program has also included what they're calling a data science incubator program, wherein people from departments around campus can apply to have their projects incubated at the data science studio. And if the project is chosen, what happens is that the researcher commits to a certain number of hours per week to be spent for winter and spring quarters in the data science studio, and in that time they work with the data scientists in order to further the analysis part of their project along. And that's actually affiliated with the next slide.
The University of Washington has also instituted a data science for social good program modeled after the programs that are at the University of Chicago and Georgia Tech. The goal of the program is to really bring people from different disciplines and the data scientists together to work on projects that benefit social good and potentially have implications for policy. The theme for 2015 at UW this year was urban science, and our projects centered around transportation options, crime prevention, education, sustainable urban planning, etc. And it's a program that was very successful, and they're just starting to accept applications for the 2016 year.
So at UC Berkeley we have done a lot that's similar to what the University of Washington has. And I think it's actually interesting to note that one of the early influences I believe the libraries had in this project was helping shape the notion of how that space would be used. One of the early discussions at UC Berkeley was whether the space was about having offices for the co-PIs and the data science fellows or whether it was about public and collaborative space. As the project has gone on we've seen them reinvent other library-like services, like consulting services and reference desk services and instruction. And so I think it's actually really useful to have them housed in both of our libraries and more closely able to work with some of our colleagues. At UC Berkeley we had a really interesting discussion early on in the process about what the mission of the particular data science environment in our location was. We focused a lot on software development, and so we were really interested at first in finding the best software developers for open science projects out there and making sure they came and joined our community.
At the same time we had a lot of advocates for educational initiatives, and in the end I think both of those identities and those themes play out in the UC Berkeley space and in the community. We have a really strong outreach activity, but we also have some very large programs like rOpenSci and IPython that are kind of logistically housed within our organization. We haven't quite talked about software yet, but I think software is an interesting area to mention. It's something that the data science community at Berkeley connects with and resonates with. And I noticed that NYU has ReproZip and other strong software development, and UW has VizioMetrics, so this notion of using the data science community or the environment to build platforms that we then advocate for adoption is, I think, very strong. So as we transition to the exploratory part of our session, we wanted to look a little bit at how our universities are training data scientists and then ask the broader question of how libraries might think about leveraging data science expertise. I think it's interesting to note that in the term of this initial data science environment program, which we're in year two of, right, each of the institutions really has launched or magnified educational efforts. So at UC Berkeley we launched an undergraduate program, the vision of which is to give every undergraduate student who comes to our university training in data science, as well as to magnify the connector courses or the cross-disciplinary efforts around data science so that those students have gateways into their core discipline in the future. NYU, I'm curious, Vicki.
Yeah, so one program always sticks out to me in particular. We have a master's degree program through the Center for Data Science, as I mentioned, that gives an MS in data science. We also have several other programs that offer specialization or certification in data science.
One that I really love is our master's of sociology program, which has a specialization in applied quantitative research, and that is heavily influenced by data science methodologies and classes.
And UW, it's similar. I mentioned before that the eScience Institute at UW kind of existed prior to the DSE, and we had I think one or two master's degrees that you could get with that data science focus. Since then they've added quite a few more. You can get a cross-disciplinary PhD that includes data science, and then as I mentioned the transcript, I can't say the word, transcriptable option for undergraduates is really going to broaden the core curriculum. Researchers from many different departments have worked together to ensure that that option has the same set of core classes across the disciplines, so that students coming out with that have the same set of skills.
Yeah, I think at Berkeley we have a strong software developer culture, a hacker culture, and we've really seen that community start to shift over to the BIDS space, the Berkeley Institute for Data Science space. I think that's actually really exciting for us. So the balance of our time we'd actually like to spend talking about the so-what for libraries. The data science environment is very much an effort focused on advancing the impact of data science in academia and across other fields. The three of us are involved particularly because of the library connection or the information field connection. As we were talking about what first topic we would like to take on as a working group, we really went down the career path avenue. This in part was inspired by a conversation I had with one of the data science fellows at Berkeley, where he looked at me with kind of disbelief when I suggested that maybe a stable career path was in libraries, and that we cared about these issues and development opportunities.
And so I think that sparked the question that we all had, which is actually how would that play out for this person? Because I had a hard time saying to him at that meeting, well, you would do this, or you'd get that type of job, or here's what your career would look like in 10 years. So with that objective, we want to focus on two questions. We want to, one, think a little bit about the landscape of data science from the library perspective or the library and information science perspective. And two, we actually engaged in a small thought experiment to think about career paths in libraries. So thinking about example programs and career paths: I know we're all familiar with the CLIR postdoctoral fellowship program. Many of us have probably already had a fellow either as part of our libraries or engaged with us in some way. From my perspective, that gives us a good foundation to actually think about how we make use of postdoctoral researchers in a useful way in our library so that they accomplish some of their career goals while we accomplish whatever objectives we have. There's an interesting report that came out of CNI last year that talks about digital scholarship centers. And I noticed there was a session earlier today that talked about this as well. We're seeing increasingly that these scholarship centers find people with different skill sets, folks who want to pursue alt-academic careers, and bring them into the institution. One I know really well is MITH at the University of Maryland, where they have a really diverse set of researchers and developers who can help build a community of scholarship there. And a local example in the University of California system is actually the University of California, San Francisco, where they've brought in somebody with this expertise to serve more as a liaison or a bridge role with academic departments.
In thinking through these three programs, I'm also actually kind of inspired by how some of the PIs on the data science environment came into this program. So Saul Perlmutter, the PI at UC Berkeley, had a long career in astrophysics and found data science, I think, you know, comparatively late in his career. We're seeing folks come in now who make data science their initial focus. Yann LeCun is at NYU, but he's got this very heavy applied focus at Facebook. And so I think we're seeing in the data science world people with different career paths and maybe different objectives, either for research or industry or other applied methods.
So we wanted to see whether there is any training for librarians in data science before we started to really dig into, well, what is there for data science in LIS? There are two programs that stick out as really exemplary and a really great representation of this existing connection. The first is at Indiana University in Bloomington. There's a data science specialization for an MLS program. So you can see there from the description, academic libraries are hungry for librarians who can work with and manage big data projects. So this talks about supporting the work of academic data scientists more than, say, applying big data techniques yourself. The other one is at UC Berkeley in BIDS. It's the Master of Information and Data Science. It's MIDS at BIDS. And it's a part-time online program that essentially wants to train data-savvy professionals and managers. It's distinguished by its disciplinary breadth. It's not simply just math or modeling. It actually does social science and policy research as well as computer science and engineering. So these are two, but, you know, LIS programs are not necessarily turning out data science experts. So libraries maybe have to look elsewhere for this type of expertise.
And so in thinking through that question, how can libraries make more use of data science expertise and data scientists, the three of us decided to approach how each of our libraries might design and post a position: what's the skill set, what's the expected role, impact, and potential career paths in libraries that align or maybe do not align with data science career paths. So our assumption here is that libraries provide sort of an alt-academic path that's kind of underutilized and underappreciated in the data science community. A lot of what we've heard from our data science fellows is that we lose them to the private sector. So there's a lot of, you know, marketing research, etc. in the private sector that kind of takes a lot of graduates from these programs, and we're wondering, well, how can we keep them in academia? How can we grow this even further? So we each selected a different position type. You can see the three listed, staff appointment, dual appointment, and library academic appointment, for our little thought experiment. And we drew from position descriptions in libraries as well as across academia to form these conversations. So I'm actually a dual appointment, which is why I wanted to take this one. I'm a faculty member in the division of libraries, non-tenure, and I'm professional staff in the Center for Data Science. So this was kind of close to home for me. Obviously when you think about someone who's a dual appointment, they have to have MLIS skills as well as some specialization in data science. The degree to which that specialization is necessary is something maybe we could get your thoughts on after the presentation, but I would expect at least a master's, or perhaps a computer science degree with an undergraduate specialization in data science. Some of the things that we thought about for potential impact for these dual appointments: developing programs in libraries for data science folks, because currently within libraries we're lacking them.
Additionally, library services could be made better through the use of data science techniques and analytics. One example I love: I was in a conversation with a colleague the other week, telling her about this presentation. She said, oh, there's this great project I've been thinking about for a while, and it seems like it's up big data's alley. So what is it? She said, well, I would love to see if we could measure at any given moment how many seats are occupied in the library and maybe visualize where those seats are taken and where open seats exist for students to utilize. We do this in a lot of other ways, right? So for laundry, I know at my undergraduate university we could see what laundry machines were taken and what weren't. So it's just an interesting little library service that was conjured up through this conversation about integrating big data techniques into the library. Additionally, a potential impact for someone who's a dual appointment would be to develop research infrastructure sophisticated enough to handle interdisciplinary big data. This could take a number of forms, through integrating HPC, high-performance computing, with repository systems, digital archiving and preservation. This just kind of folds into some potential career paths and roles. So there are a few of the obvious ones, right? So library and ITS services, where someone could make a big impact in terms of building infrastructure, maybe as a research infrastructure manager. There is me, an RDM librarian. So a reproducibility advocate. They do active outreach to the data science community. There's a data science subject specialist. So that's like a library liaison to the data science school, perhaps doing collection around data sets. And again, integration of HPC and digital preservation for data sets through becoming an infrastructure manager.
So I took on the staff appointment, and I'll say up front I view it as a professional staff appointment, as opposed to the academic librarian appointment, which is how we categorize those. You know, as I looked through the position descriptions, I wound up finding myself looking at information technology roles and information technology domains. And I don't know that that's necessarily the case, although you can see in the career paths that I laid out, advancement through information technology or business process ownership, that that seems like a natural fit. I was struck, as I was doing a little bit of research on folks who have pursued data science career paths, by how much they talk about applied information technology. And there's an interesting book called Data Scientists at Work where every chapter profiles a leader in this field, and almost to a fault they all said, well, I do all these great leadership things or these strategic things, and then they would talk specifically about the tools or the application or the problem they're trying to solve at the moment. And so I think this notion of being applied really resonates with this community. As far as career advancement, certainly moving up through the information technology field, through an operations support field, makes sense. I believe there's actually a lot of opportunity for some form of business process ownership as well. You know, actually running a data analytics service or publishing some sort of platform for people to use. And we see a few cases already where folks with a data science emphasis are being put into strategic leadership opportunities. Chris Wiggins is the chief data scientist, for example, at The New York Times. And he talks a lot about the role of data science in helping them refactor their platforms so that they make sure they're addressing their community needs, but also in helping the newspaper itself rethink the sorts of journalism it does.
And so he kind of gets to that notion of how strategically important this field is.
I took on the academic appointment. And what was interesting about this is there are already a lot of positions out there that are described as data librarians. And so I tried to look at this from maybe a slightly different tack to see what else we could do to have data science actually practiced in a library. The qualifications are similar to what Vicki was saying, where there's an MLIS involved, but also then some sort of certification or a degree in either a data-intensive research field or in data science itself. And expected roles would be to promote best practices in data management on campus, as well as ensuring the university's research outputs are archived and curated in perpetuity. Potential impact, like I said, would be the curriculum and best practices around research data management, as well as potentially customizing existing tools or developing new tools to promote those best practices and help people out. It's possible that that position could then also work within a grant writing process at a university level to help with that archiving piece that's become more standard with the various data management plans that are required by federal agencies. Vicki had mentioned assessment and planning with library data: looking at the use data and statistics that so many of us are already collecting in great volume, but looking at them potentially in new and different ways to try to figure out different information about the library itself.
The career path would be, again, something like a research data librarian who's also a domain expert for data science, and that would be that more traditional liaison role. With the increase in programming at the undergraduate and graduate level for programs that involve data science, that means just having a liaison who's helping in that more traditional academic library role, with those kinds of duties. It could also potentially be the grants and data manager that I mentioned, someone who could help with that part of the research process. And then at some universities, there's someone who works as an embedded librarian, who actually gets appointed to work on projects and serve as that information management person throughout the lifetime of a project, and that's also a potential role for someone with these qualifications.
Okay, so in thinking about these themes and what roles and skills and impacts these kinds of things would have, we came up with a few ideas. One of the things that really came up was thinking about what degrees are required for a data scientist to work in a library. There's the traditional expectation that an MLIS is required, and potentially data science becomes kind of a secondary degree or an undergraduate qualification. We didn't come up with any answers, but it's an interesting conversation to have. At what level do you require those two things to be equal, or maybe data science wins out over an MLIS? Again, we don't have any answers, but it would be interesting to hear other people's input on that.
I think the source of the degree is actually an interesting area of discussion.
So the MIDS program at UC Berkeley is actually housed in the School of Information, and they come out with a policy and high-level strategic focus, maybe not necessarily a high computational focus. There have been some very interesting conversations at our university around where the home department of data science is, and obviously there's not a single home department, but I think it shows the kind of cross-domain thinking that's there, and it certainly raises the question of how the value of librarianship, or how these skills in librarianship, fit into this. On the career instability track, something I've seen from working with the fellows in BIDS is that they're very entrepreneurial. Many of them have a considerable amount of soft funding that has come from them having to stand up their own research projects and find support for them over time. They tend to be frustrated with the lack of a clear career path, so you can imagine the cynicism that might come from being able to raise a lot of soft funding for a university you work for, but not having a guaranteed job over the long term despite that. And I think they're often not necessarily academic in nature or focused on becoming academics. The folks who are probably stuck at BIDS have that in their portfolio or kind of vision in their future. So I think one thing that the academic career path in libraries lays out pretty clearly is that notion of a progressive career path and potentially funding stability, if they can find the right fit. We recognize that the notion of an academic librarian is pretty rigid in our field. We talked a little bit about how a dual appointment might actually give a mix of representation in the domain that somebody might come from, or actually give them the chance to have impact in an area outside of librarianship, so that they're more successful as an academic.
And certainly in the staff domain, the professional staff domain, I have met many people who care very much about a specific tool or have really devoted themselves to some sort of application. And as we talk about where they might fit, that notion that they get to focus on that as part of their career path seems to make a lot of sense to them.
A common thread within libraries is perhaps, specifically, analytics. There's often a great impetus to look at tools and services external to the library. However, I think business analytics is a great example where data science techniques can be pulled into a library setting, where we could be making better use of our traditional assessment methods and knowledge through incorporating these business analytics. We get great statistics; no one's questioning that. But I think in terms of using existing toolkits from other domains, data science is somewhere where libraries could really be making a huge impact. So, in this thought experiment, we pretty much identified some potential roles, but we're not really sure of the perspective of our data science colleagues. So, our next step is to survey them. We've designed the survey, and this is just a rough outline of what we want to capture to get input from the data science environment community, hoping to launch in December or January. We'd really love to come back and share those results with you. Essentially, what we're hoping to capture is perhaps some cultural understanding of what is a librarian, what is a data scientist? Do you believe that there are roles for data science in a library, and if so, what might that look like? Or what kind of position would you really enjoy having in a library? What would satisfy your career goals? And could a library fill that gap?
You know, I think conversely, we're also trying to figure out what value libraries actually have in the data science domain.
It could be that the real connection here quite simply is that libraries are there to support this new form of scholarship. I don't necessarily believe that, but I know that in my own experience we've had trouble translating the expertise we have in the building into expertise solving specific problems that we have. And so, we're really interested in getting perspectives from both of these sides, from talking with our data science fellows at our institutions, as well as the folks in the library who might not quite have that same view of the world. So, we have one final slide where we have just a few questions. And we actually wanted to take the balance of our time and talk with you a little bit about this. We're curious to hear more about how you would view data science in your own libraries and what sort of positions you might recruit for with it. I'm curious to hear if you've actually got somebody in your organization already filling this role who might suggest a useful career path for our fellows as we help mentor them and figure things out. And certainly if you've got ideas about where the right connection is for how libraries actually use data science techniques, I think that's an area that we haven't quite explored sufficiently. Certainly, we tend to use the technology tools that we're well versed with in our organizations, and we don't necessarily know to branch out. So, with that, why don't we turn to the floor, and we're glad to answer questions or coordinate a discussion.