I'd like to start by thanking the IMLS for funding our project to study data science in libraries. It's been a labor of love, not only for the people listed here but for the many other people who participated in this study on training librarians in data science and really gave us exposure to this. One of the things that came out of our funded grant was a workshop. We're very close to done with the report; we actually wanted to share it at this talk, but at least we can present some of our early findings so you can follow along.

We initially framed the workshop around the national digital platform. One of the things we thought was that librarians really need to know how to manipulate, analyze, and manage data. It's crucial for where we're headed. We focused on two challenges. The first was the skills gap: we were finding many librarians doing ad hoc training, through various programs that are out there, but in an unstructured way. The other challenge was the management gap: management not necessarily having the stories or the background to understand what was required for their librarians to employ data science in their libraries.

As I mentioned, we held a workshop with roughly 45 people from various backgrounds, from public libraries to industry to librarians in different roles, you name it. At the end of the day I think we had a very diverse group. We put a lot of effort into determining who would come. We met over two days discussing this challenge, and one of the things that stood out was that "data science" didn't necessarily resonate with many of the people there, but "data savvy" did. People felt more comfortable with it, and it explained the situation better: we really work across a spectrum, from the deep statistical and software engineering skills to advocacy, policy development, and data management planning.
And then I really emphasized, on these roles, that not all of us can be a unicorn, but we have shared roles from different perspectives and we're all in this game together. That was something that shone through in the workshop; everyone felt that way.

We did an environmental scan. We brought together some of the bigger training initiatives out there, from Data Scientist Training for Librarians to Library Carpentry to the Data Science and Visualization Institute for Librarians, which is at NC State, and ANDS, which is the Australian National Data Service. They're interesting: they ran a national program to help train librarians, and the name of the program was 23 (research data) Things. On the other side we had use cases. For instance, from the University of Houston we had one librarian talk about how they were trying to use data science in collections development, in their analysis for collections.

One of the things we came upon after the workshop was this multifaceted framework. After the damage was done, we looked at all of our notes and tried to provide a structure, and we came upon these four facets: the organizational and managerial structures; the stakeholders (researchers, IT, students, administrators, the public; the list of stakeholders can go on a bit); professional and informal skills training; and data savvy services, or data science services. We grouped them together, so structures and skills training was one section that I worked on with my colleague Matt Burton.

Just to briefly go over some of the things we saw under those facets, starting with the drivers. One thing that was apparent from Cliff's talk is that we're living in an increasingly computational environment. Programming, analysis, and other things like that are becoming de facto in the sciences (well, they already were), but in other fields it's just becoming the trend. And so we had some people there talking to us about
data pipelines and Jupyter notebooks. That particular example was a contractor working for the Library of Congress, saying that librarians need to account for this. Then there's the recognition of the library as a resource for data management. That's come on strongly with all the funder mandates; it's something we've been living with for a while, becoming seen as a resource to our researchers and other people in the academic setting. And then there's this belief, I think, that libraries are not just a service provider but a collaborator: that we can be collaborators on these research products and be more embedded. One more thing that shone through is the shortage of data savvy employees in the library workforce. That again points to the fact that librarians are trying to take some of these informal training programs. Another thing, too, is that we're competing with the overall workforce, because these skills are in demand.

We found some barriers. One is our branding: again, the traditional library versus being seen as a data science resource. Another is formal LIS education. I guess the key is that it's formal, that it's not adaptable. Some are actually already responding to this, but it shone through: a lot of people commented that library schools really need to gear up and develop these skills in data science. Then the incentive structure. Again, librarians are looking to their management for the purpose: what are the goals behind this? And maybe also for the resources, because this takes a lot of effort to train. Then information overload, so too many tools. That's something we actually work with often in libraries, so it's something we shouldn't necessarily be overwhelmed by. And then the drive-by workshops, or boot camps, happen often, but librarians come back to their organizations and can't really implement or practice what they've learned, or they're speaking another language that the
other librarians can't necessarily understand; they don't have a shared experience. And then again, leadership: leadership needs stories to understand why we need to do these things. But I want to emphasize that the brick wall is right here. We often hear from people who go to these training programs that they come back and can't really implement what they want to do. They hit that brick wall. And I think this quote speaks volumes: "I want to learn new skills, but I still need to do my old job." That's one of the tensions we still face.

We had some exemplars of what we thought were good examples. For training programs, I mentioned 23 (research data) Things. There's Software Carpentry and Library Carpentry, which a lot of libraries are starting to adopt, and Data Scientist Training for Librarians, which is being run in Copenhagen now but started at Harvard. And then there's our local NC State Data Science and Visualization Institute for Librarians. There's actually an announcement going out tomorrow about that training program. I think those last two programs especially follow along the research life cycle, so they really give librarians exposure and experience in these spaces.

And then I'll mention some people. Lauren Di Monte, who was at NC State and is now at Rochester, presented at the meeting and talked about her work with graduate students, teaching Python data analysis in the makerspaces. It was a very compelling case for how we should be working together, so she's definitely an exemplar. And Victoria Steeves was mentioned by Josh Greenberg from Sloan, who was there with us, as another example of a librarian who's embedded and working with researchers on reproducibility. I've never met Vicki, but I follow her on Twitter and she's tweeting amazing things. So I will move on to my
colleague Bonnie.

Hi, everyone. So Chris covered the structures and skills portion of our workshop output, and I'm going to talk a bit about services and stakeholders. As Chris explained, these are services we can provide both internally, in terms of data we have within the library that we can use to improve our libraries, and externally, the services we can provide for the library community or for the community of our users. And then also the stakeholders who will benefit from those areas.

In terms of drivers, meaning what the participants said is driving us to want to explore being more data savvy libraries: the fact that libraries are inclusive, cost-neutral, shared physical spaces is something we see as a pretty big driver for the library's role here. The library acts as a shared space on campus. It can be viewed as the literal heart of campus, both intellectually and geographically. It can be viewed as a place for exposure to innovation and creativity, and it provides proximity to people, community members, and collaborations. So the library can play a significant role in building community around data science and can catalyze new partnerships. The library is also representative of our broader community, so it's about creating an environment where people feel comfortable asking questions. That's this idea of inclusivity: someone may not be comfortable going into a very technical environment to learn data science, so a place like the library, where they're already coming for other resources and other reasons, may be a more comfortable place to learn these skills and play around with tools and technology.

A few drivers that I clumped into one come down to this idea of data for informed planning and problem-solving. In order to make informed decisions in annual planning, senior library managers need tangible evidence to validate operational decision-making, future investment, and staff deployment. This evidence can be
gathered through data collection, data analysis, and insight. And if data is collected for a while, there's interesting longitudinal data analysis that could be valuable. Data science approaches may shed light on otherwise hard to see or hard to understand problems in the library. For instance, even something as simple as gate counts can shed light on difficult decisions about whether to resource certain desks, hours of operation, things like that.

Another driver is the campus use of metrics to demonstrate impact. Institutional administrators are collecting metrics, performance indicators, and other evidence to demonstrate impact, raise the institution's profile, and influence national and international rankings. So data science is being used throughout the campus, and what role could the library play in supporting that or working with the institution?

Another area is the increased use of data science in the classroom. Data science has pervaded almost every discipline, and data savvy skills are used in many professions to increase efficiencies and gain insights. This means researchers and students on campus need support as they learn these skills and use them, whether in educational programs, research projects, or even personal exploration. Keeping up with the evolving needs of our users is obviously important for libraries, so equipping librarians with data savvy skills increases their ability to support this data-focused area.

Another driver mentioned at the workshop is something I'm sure we're all familiar with, which is that libraries play a role in facilitating collaboration across disciplinary areas on campus. Data science has become a very interdisciplinary space, so that's another area where libraries could be providing support.

In terms of barriers, what we saw over and over again, though I think this is changing based on the type of library, was this idea of the silo effect,
being seen as staying within the library. How could we rebrand as organizations that are data savvy when at the moment we may be seen as siloed, or as sitting within a fortress? I do know many libraries that are not like that at this point.

There was also a concern about scalability. What we see with research data management and digital projects is that the work is often one-on-one, with a researcher or a research team. So our workshop participants were concerned: if we offer data science services, how do we actually scale them? Connected to that is resources. I think this is a pretty common concern; many libraries ask, what should we give up or do less of in order to take on this area?

Another barrier that came up in the workshop was around credibility and image. People may not have an image of librarians as part of the data science or data savvy teams on a campus, though I do see this as potentially a great opportunity to rebrand the library in this capacity. Folks also mentioned experience: if librarians are not doing this research themselves, firsthand, how are they able to help with the research process? Again, this goes back to credibility and image. And another thing brought up was library culture, and how doing some of these projects and doing data science might mean having to break some of the traditional rules.

I also have a few exemplars, like Chris did. For time reasons I'll just talk about one of them, though there are many, and in our report, which we will have out in January or early February, we talk through several of these exemplars and our case studies. At the Carnegie Library of Pittsburgh, data librarians have been helping the public use and reuse open government and civic data, which are accessible from the Western Pennsylvania Regional
Data Center. The library has run a Data Day with the aims of changing the public perception from open data to public data and enhancing public data literacy. They also run a speaker series with some really interesting speakers, data scientists and data artists, to make citizens and other interested people aware of what's happening in this space.

As Chris mentioned, we have a report and a roadmap. Currently our report is with the other participants of the workshop; we wanted them to have time to comment and get back to us before we packaged it all up and gave it to the public. But this is our close-to-final draft of the roadmap. One of the things we wanted to do was take some of the actions and recommendations that came out of the workshop and turn them into a roadmap, looking at short-term, medium-term, and long-term things that the library profession, and the library education part of our profession, can do to make libraries more data savvy. The roadmap covers the four facets we talked about: structures and skills, services and stakeholders. We also have an area called "scan," because there were several actions that were more of an environmental scan type of thing. As you can see, it ranges from short term to medium term to long term, everywhere from just highlighting success stories, as an easy way to talk about what is already happening in libraries, to repositioning the MLIS as a long-term kind of thing. We're trying to look at all the different things that need to be put into place to make this happen.

We also noticed some common themes that we want to reflect on, and do reflect on in the report. All of these areas were pervasive and important, so we created sections for each of them. Whether spoken or unspoken at the workshop, each of these areas covered discussion points that just sort of
cut across all parts of the workshop. Again, I don't have a ton of time to dive into each of these areas, but I'll talk a bit about the ethics and values area, because that's something we saw pervade a lot of areas, and I think it's especially interesting to Chris and me.

When exploring the needs of researchers doing data science, we learned that there are many unanswered ethical questions and concerns that no single department is tasked to triage, or to support researchers on, on most university campuses. If anybody heard me speak last year at CNI, this was the topic of my presentation there. New and complex data sets raise challenging ethical questions about risk to individuals that are not sufficiently covered by data science training, by ethics codes, or by institutional review boards. The use of publicly available data, corporate data, and government data sets in research projects may reveal human practices, behaviors, and interactions in incidental or unintended ways, creating the need for new kinds of ethical support. Researchers and students using these data are navigating issues and making ethical decisions in ways that are not really taught in their discipline. Many have only their peers to turn to for difficult questions that have long-term impacts on their research and potentially their reputation.

We see research librarians as one set of actors within a support network that university researchers rely on during their research. The values of privacy, ethics, and equitable access to information are core to librarianship, making librarians a unique partner for researchers and for others who are part of the campus support network. At the same time, there is an international conversation about ethical use of data that the field should be participating in. There's an opportunity for librarians to find their role in supporting researchers navigating emerging ethical issues in
their research. A couple of tangible ways we talk about that: one is offering triage services, where librarians take on the role of providing triage for researchers who are unclear where to turn. In a research project I did last year, when we interviewed researchers who were dealing with these emerging ethical issues, it was often that they just didn't know where to turn. There wasn't really a roadmap yet for these issues. So librarians can offer a network of support and provide background legwork to save researchers time. They could expand their role to help with privacy and ethical decision-making as it comes up in the research.

A maybe even simpler role that libraries can play in this area is as amplifiers. Libraries can host a series on ethical topics in research, or partner with campus organizations such as ethics institutes, like Carnegie Mellon Center for Ethics and Policy Technology and Society, or research institutes like University of California Berkeley's Center for Science, Technology, Medicine and Society. There are also a lot of cybersecurity initiatives, with concerns around privacy, happening on university campuses. We see that as another potential role for a data savvy library: to be able to participate in those conversations, put on events, and things like that.

Our next steps for the project are getting out our final report and going on a roadshow to discuss the project and gather interest. We're kind of kicking off that roadshow by being here. If you join our mailing list at datascienceinlibraries.org, you'll know where we're going to be talking, and we'll be sending out the final report initially through that list. We hope to convene future meetings on data science in libraries where we bring together not just the workshop participants. We saw those participants as
people who are really leading, at the leading edge of work in this area, but we would like to hold an actual convening of those who are interested in data science in libraries and those who can showcase some of their interesting data science projects. We're looking at potentially an annual, rotating meeting. We had talked about it being in line with some other existing conference, like ARNL or something else that is out there and established and could help get it going.

We're also exploring opportunities for improving discovery of data science educational resources: trying to centralize them and make them discoverable so that people are not reinventing the wheel. It's kind of like putting together a repository of a lot of the educational resources.

We also think it's really important to address leadership, because we're not just talking about training interested frontline library staff. We're also talking about the management gap we mentioned at the beginning. We want to ensure that leaders in libraries think about how they can be data savvy libraries and how they can then use the librarians and staff they have who will now learn these skills. So we'd like to find ways to share our report, and maybe create modules with some of the leadership institutes that exist in the profession, to help us think about how we as a profession could create data savvy libraries. We're also looking at gathering training programs and discussing shared and community programs, so again, bringing together the programs we are aware of and making sure that they're tailored specifically to the needs of libraries.

We have a lot of people to thank. You don't have to read them all, but you can check that out later. These are a lot of the participants at our workshop, and also those who contributed some pretty amazing case studies.
Again, the case studies will also be wrapped up in our report, so you'll be able to see some real-world examples as well as what we see as next steps.

Okay, so again I want to thank IMLS. There's the URL. Liz and Matthew couldn't join us, but I wanted to make sure you knew the other participants who were leading this initiative. We have a little time for questions, if anybody has any.