Welcome, everyone. We'd like to start the session on radical partnerships, expanding academic collaboration in data and computational sciences. We're happy to be with you here in person. I was thinking this morning, it's been probably three years since we've had in-person sessions for many of us in the data community, so we are absolutely delighted to see you, and very happy we don't have to say, could you please unmute yourself. We do intend for this to be very interactive and hope that you will ask questions throughout the session. Don't feel like you have to wait until the end. But to start, let me introduce my colleagues. I'm Joel Herndon. I'm the director of the Center for Data and Visualization Sciences at Duke University Libraries. To my right is Stephanie Labou from the University of California, San Diego. Next, we have Tim Dennis from UCLA Library, and finally, Harrison Dekker from the University of Rhode Island. Today, we're going to be talking a little bit about data science in our libraries, how it's evolved, some of the successes that we've seen, some of the challenges we've faced, and the role of partnerships in advancing data science at each of our respective universities. This is an area that has seen rapid growth on all of our campuses, and for the last couple of months, we've engaged in a discussion about how different data science can be depending on the institutional circumstances where it starts. And we hope we can expand that conversation with you, the audience today, to talk about what you're seeing at your particular institution, and we can explore this space together. If you do have a question or comment, if you could identify yourself and the institution that you are part of, we'd love to hear a little bit more about the context that you're bringing to the discussion.
But to begin, we'd like to start with a short introduction followed by a little bit of background about each of our institutions, and then we're going to open to a longer discussion amongst the panel and the audience about some of these issues of the rise of data science. I think the first thing that we want to acknowledge to the audience is that data science is a space that is challenging to define, or at least is a contested term in a lot of spaces. Generally, there are some aspects of computer science or computing, mathematics and statistics, and domain expertise inside of the space. But the particulars vary a great deal when you're talking with different researchers or different administrators about what actually is included. These different word clouds from across Google give you some idea that many things are considered inside of the space and some are outside. We're going to take a fairly expansive approach today, being very inclusive about what may be inside. Second, we would like to take note of a couple of different things about the state of computational and data science at universities. First, it's increasingly common at most universities across the US. Unlike a few years ago, when you might hear some skepticism about data science, increasingly this is a fairly standard thing. People know the terms data science and computational science. We're seeing more departments, institutes and programs over time. I think the second thing that we wanted to note is that the library has a very special role in this space. There are different domain particularities when you're talking about data and computational science, but as a domain-agnostic space on campus that tends to be very inclusive, libraries have a special opportunity to reach out and to try to partner with other groups across the campus. This makes it a fairly special space.
Finally, we want to go ahead and assert, and indeed we'll come back to this through the presentation, that the on-the-ground details for data science are going to vary a great deal based on your university's size and the focus of the university. It's going to vary a lot with the size of the library and the role that the library plays on campus. And there's a little bit of path dependency, or kind of how data science developed on your campus, that's going to change how this plays out day to day. I think the final thing I want to say before we talk a little bit about each of our institutions is that we are going to make the assertion that there is a shared opportunity for libraries and data science. There's a long history in research libraries across the country of involvement in data science training and data training in general. So that is a well established fact. But in addition to that, there's a long history of collaborative efforts in data science, data curation and open science spaces on campus. Not only on campus with research computing, IT groups and faculty members, but also across institutions as well. We have groups such as the Carpentries, which has significantly expanded data science training across the US; the Data Curation Network, which is at the intersection of some of these data science tools and methods and data curation and open science work; and other organizations. And beyond all of this, there are the strong networks of data professionals at all of your institutions who work closely together to share our challenges and information. With that, I'll let Stephanie talk a little bit about working at UC San Diego.
So we wanted to start by giving an overview of our campuses because, spoiler alert, we do not agree on a bunch of the things that we are going to talk about, which should make for an interesting panel. And part of this is because we're coming from different organizations, with different groups we collaborate with and different size institutions.
So UC San Diego is a really big institution. I usually say when I'm talking at conferences like this or with students, like, there's 40,000 of you and there's one of me. Now it's not true that I'm doing all of this alone. That's not even remotely true, but it does kind of make the point that if we have 40,000 students, that's a lot of people to support, especially when data is such a hot topic. And we have an increasing number of majors that are taking that sort of data and computational bent. So we have our data science undergrad. We're going to have a new master's program and a new PhD program in data science. We have political science specializations that are focusing on really computational methods. And it seems like every quarter we're getting some new announcement of a group on campus that has decided, yes, we are going to form some new alliance or initiative that's really focused on data or computational methods. And so we're fortunate in that we have a lot of stakeholders on campus. So we do have the Halıcıoğlu Data Science Institute and SDSC, but just because we have a bunch of different resources on campus doesn't mean that it's simple to navigate. It just means that there are more places people have to go and that we have more silos. So a lot of what we focus on for collaboration is how we can take the resources that we have and make them easier for a student to access. I've been at UC San Diego for about four years now and I still sometimes don't know who to talk to to get questions answered. And so how is a student supposed to navigate all these different components? If they're in social science, there's a computing cluster in social science; it doesn't really talk to SDSC, but maybe it does depending on the PI. So we've got a lot of different moving parts at UCSD. In the library, sorry, I have to look back and check my own slides. There's me, I'm the Data Science Librarian. Thank you, Joel.
We have our GIS Librarian, and we have a Data and GIS Lab. We have a research data curation program that focuses on trainings. We have our fantastic IT group, which is only not listed here because I ran out of space on the slide, but they're really crucial for a lot of what we're working on. So all this is to say, okay, if we have a limited amount of resources, which we do, and we have this huge campus and this huge appetite for data and data science, how can we try to be most effective without burning out everyone in the library trying to support every one of our 40,000 students, who all have various needs? And so if we look at the student breakdown, we do have a very large undergrad population, and among our undergrads, social sciences are quite popular. So we do have a medical school, we do have oceanography. A lot of people think of UCSD as a very STEM campus, and it is, but it turns out that our social science students are a large part of our student body and they are really getting into data and computational methods. And so, as we see differences between domains, we're also seeing differences between undergrads and grad students. Depending on what program the undergrads are in, they may get a little bit of data science, broadly defined. This might be taking a statistics class that happens to use R. So you'll learn how to run a regression in R, but you're never going to be taught how to organize your data or wrangle your data or rename your data or clean your data or all those things that are part of data science. And there might be curricular gaps in undergrad. What seems to shake out, though, is we have this pool of undergrads who are getting a little bit of information in their courses and they're wanting to build off of that. Conversely, we have this graduate population who comes into grad school and they're told, great, welcome, you should already know how to do this.
But they've actually never been taught how to do a lot of these methodologies in their undergrad program. So I work a lot with our School of Global Policy and Strategy, our Master of Public Health and these other master's programs where they have two years, which is not a lot of time to learn these methodologies, but they're being told, okay, welcome, welcome to UCSD, welcome to your master's program. By Thursday, please have started to analyze your research using R. And they kind of panic. And that tends to be my bread and butter for the grad students. They are the ones who need some sort of support because there isn't an option at the moment in their courses to have another place to go. And so they find me and then word of mouth takes care of the rest. So I'm coming at it from this place of, we've got people who are trying to build on their skills and then people who really just need this foundational knowledge. Okay, all of this is to say we do a lot. We can't get every student, but we try to have a range of activities to support the students. And so we do offer a quarter-long data skills course for GPS; it's based off the Carpentries curriculum. We run a lot of Carpentries workshops. They're very popular. We work with our research IT. They're a really valuable collaborative partner that we have. UC Love Data Week is where we collaborate with the other UC campuses and their libraries, because I could organize a week of events to reach my affiliates, or I could work with Tim at UCLA and the other UCs and we could all do one event, and this past year we reached 1,000 people. We had 1,000 people register and attend, which is fantastic. And then of course there are on-demand consultations and class visits for supporting faculty who are interested in this, because it's not just students.
We have a class that we work with where they are taking their student data science projects, and we work with them to create metadata and documentation, and then we ingest it into the library's digital collections. It's their first real experience of that end of the data science life cycle, which is, you did something cool and then they graduate and it's in a Google Drive somewhere at best. So we didn't love that. So we take it into our collections and they learn a little bit about how data science lives on. And then of course a plug for this afternoon. We also hosted some leading fellows and they're gonna be talking this afternoon. And so that's the brief overview for UCSD, to give you a sense of where I'm coming from when I probably disagree with Harrison about some of the things we're going to talk about, because that's what came up in our pre-planning sessions.
So our second institution is Duke University. We are a private, medium-sized university of roughly 17,000 undergraduate and graduate students with multiple professional schools, and our data science landscape is actually one of multiple data science efforts on campus, with multiple schools, institutes, departments and, actually, individual faculty engaging in different data science experiments at different levels of formality. And I'll talk a little bit more about how that has been shaped over the years in a moment. But at the moment, things have gotten a little more formal. There is a large science and technology initiative being led at the university level with a focus on computational thinking and its role both in research and inside of the curriculum. Increasingly, many of the groups that have worked separately on many of these efforts are coming together at this point and discussing, how do we do this together? How are we providing these services? So there's a lot of hope that that will generate some synergies across the different groups as well as new ideas and new strategies.
As I mentioned earlier, we have a wide range of different data science and computational initiatives, everything ranging from a formal master's program and an undergraduate certificate to another initiative, the Rhodes Information Initiative, which brings together researchers interested in the space, some undergraduate groups and clubs, as well as some machine learning initiatives on both the college and the medical center side. There's always been a sense of collaboration, but increasingly it's more of a formal collaboration, as we've launched a new group called the Duke Center for Computational Thinking. This is bringing us together increasingly to talk about how these efforts flow together. What can we learn from each other and how can we make them work even better? In Duke Libraries, we've been involved in data science training for roughly a decade at this point. We may not have called it data science when we started, but as we had more and more students requesting assistance working with open source analytics tools, data visualization, digital mapping, computation, you name it, we increased our staffing in all of those spaces as well as looked for faculty members who would partner with us as we worked on those projects. Each semester we work with a wide range of classes on campus, trying to bring those topics into the individual faculty member's classroom, and increasingly we have moved more to providing workshops on these topics. Roughly 30 workshops each semester outside of the pandemic, far fewer inside of the pandemic. If you look at the graph on the left-hand side for the last few years, you'll notice that the average registration in those workshops, and Zoom did help, I will admit, has gone steadily higher and higher as more and more students self-identify as needing these skills. And as my colleague Stephanie here just noted, they're needing these skills usually just in time.
These are a lot of students who are not getting this training as part of their degree program, or who need a refresher in time for a particular project they're working on. So we see that as kind of a core area of some of the support we're providing on campus. The pandemic has really forced us to reconsider how we provide instruction. We've done a lot of experiments on putting more content online, and we are not alone inside of that space. Several of these slides, which are in the documentation so that you can go back and look at individual ones, show the library's instruction sessions. We've done everything from synchronous to asynchronous training online. Our colleagues in the center are producing more and more content and putting it out on YouTube for students to discover around the country. Generally, we're trying to make this as open as possible, not just for the Duke community but for anyone who might be interested, and trying to make it as flexible as we can. I think this is one of the wonderful things about working with the Center for Computational Thinking, in that it allows us to kind of do a large range of experiments at the same time and compare and contrast: well, how well did that work? Is this actually reaching the people? Did you see a change in what you were trying to achieve with a particular instruction session? I would be remiss if I didn't, once again, reinforce this concept of partnerships in developing our program. We are a fairly large data unit in the library. There's seven of us with different skill areas, and that may be part of our discussion as well. We offer a wide range of data science topics because we have several individuals who have different skill sets in the group, and we're not all providing the same areas of data science. But I think the thing that's really helped us are some of these outreach initiatives.
For many years, going back to 2015, we've been a core partner with the American Statistical Association's DataFest competition on campus. We spend a weekend working with undergraduate students and some high school students on a data competition that allows them to use all of these data science skills as well as present those things back to a general audience at the end of the competition. We work on longer-form sessions like Data+ at the university, which brings together undergraduate students to work on research projects for about nine weeks every summer, providing instruction and partnering with them. And indeed, we're trying to reach out to the community as well, partnering with data science groups like R-Ladies, R being a statistical programming language, bringing them into the library, allowing them to have their meetings and bringing in speakers, which benefits not only R-Ladies but also the wider community inside the Triangle. So we're very optimistic as data science continues to grow, and hope that we continue expanding this outreach, expanding the partnerships both on campus and beyond. And with that, I'll turn it to Tim.
Right, hello. UCLA, yeah, just a few of the facts of the university. It's also a really large campus and also a very diverse campus with a lot of different perspectives and backgrounds. So we're very proud that 31% of undergraduates are first generation in their families, and we also have a lot of international students. And with that, we also have a lot of research funding, so there are always interesting, increasingly data- and computationally centric projects happening on campus that we can plug into and play with and support. We have 34 departments, 128 majors. I looked that up, and I don't have the exact numbers from our workshops, but we've reached over 100 of them through people who have attended our workshops, and 50 graduate programs.
Support on campus is a bit, I always love the term, federated or distributed rather than ad hoc. So computational data and digital scholarship support happens in our research computing group. It happens in the medical school. They have their own kind of unit that supports data science, have their own virtual enclave, and do a lot of cool stuff with Jupyter notebooks in their own little sandbox. We also have a quantitative biology group that teaches workshops and does consulting. And then within each division, like social science, there's a social science computing group that has a little data science group within it consulting with faculty there. For humanities and technology, there's a HumTech group that also provides support. So it's a big array of a campus and, like Stephanie articulated, people get lost along the way trying to find who the right people to talk to are. We don't have a really effective way to congeal and connect all those services together. We're trying to work on that more. But in the library, I run the Data Science Center and we have five FTE now. Two technical data roles. One is a data scientist, technical data scientist, which is a relatively new classification in UC; for those who are in the UC system, I encourage you to use it. Another is a data management type role. It's also technical. And there's a spatial data science librarian and emerging technologies, all within our unit. We also have other contributors to our service. We've been really fortunate. My unit is in the digital initiatives IT group, and we've recruited a senior software developer to also contribute time; he has a background in machine learning and data science, so it's kind of a natural fit. He can give us a few hours a week and he teaches with us as well. We have a couple of other people in our operations group. One in particular has a GIS background and also contributes some consulting hours and instruction.
And then beyond that is our user engagement group, which is more like the liaison librarians. There are two data roles there, a science data role and also a data literacy role. So we're kind of knitting this together. To give you a sense, these are just some programs on campus, and we've actually worked with a number of these groups through the Data Science Center. Million Dollar Hoods is a center that was funded recently. It's done a lot of work to get incarceration data out of the LAPD. They teach a class that we co-taught. For half of the class we taught R and geocomputation and also mapping and GIS. So that was done in the African-American studies history department, right? So it's teaching this non-traditional crowd how to do these computational tasks, and it's project based over a term, over two terms actually. So that's a really cool project. As for the other programs that have popped up, there's a master's in data science engineering. I still have not really liaised well with this data theory degree, which is kind of cool, data theory. We have worked really well with the Anderson management school, where they have a master's in business analytics, and we've done a datathon with them where we taught R the week before the datathon, and then the datathon happens and we've become judges for part of the day. That has worked pretty well. We've also worked really effectively with the social science division, because my unit was previously in the social science division. So we've taught workshops and worked with the urban studies group and also a master's of social science program. Kind of like Stephanie mentioned, you have these terminal programs where students just get dropped in; they don't have any of that background and haven't learned the skills. So they're kind of like, oh wow, how do I learn this? I wanna be marketable too.
So, just to give you a sense of my unit, it really started as a social science data archive that was in the social science division, and it has existed since the 70s. We're still the caretaker of that data archive. In 2018, we rebranded into the Data Science Center, partly because there was an acknowledgment, when they moved the archive into the library, that the library really didn't have data services. Additionally, there was the question of what data science in libraries is. And so we rebranded into the Data Science Center, partly to reflect the change in research, research innovation, all these computational methods, and then what the library really wanted to support. We focus really on democratizing these skills. So we have researchers who come in with machine learning or all these buzzwords or jargon-y stuff, saying, I wanna use this. And we kind of help guide them through: well, do you really wanna use this? Let's look at your data, let's see what the best methods are. And we're really trying to teach people how to fish. We wanna help people learn these skills and make them part of their research. Because you really don't wanna publish your own research when you don't really know what's going on inside of it. It's like a black box, something that happens inside of a machine learning algorithm. You kinda need to know more, so that's our perspective. So we wanna make it welcoming, warm, and help people into this domain. We also teach the gaps in those skills in numerous workshops. And so we heavily use the Carpentries; we're a Carpentries member. We started out with maybe one Carpentries instructor and now we have like 20 over the last four years. You can see the dramatic growth in our workshops, and part of that is that second part that Stephanie mentioned, and Stephanie was kind of a leader in spearheading UC Love Data Week. So we started to teach collaboratively during COVID with other instructors in the UC system. And that's been really successful.
There's also a UC GIS Week, which is also really successful. And that last workshop there is something we teach to go with these Carpentries workshops, more boot camp style. I think we had like 15 instructors. Highly successful, but it's also online. We'll see how it evolves over time when we move back into kind of face-to-face. And then the last thing: we do a lot of consulting. The one thing I just wanna call out is that we've started partnering with other universities. There's this model of the data squad that started at Carleton College, with a colleague of all of us that we know through association. And the idea is it's a model to hire undergrad, higher-level students, and then they do consulting. It's not a peer model; they're working with graduate students, they're working with the faculty, and all our data squad kids are in the upper-level stats courses. They're doing really exciting work. And the idea behind this is we wanna develop a kind of international model that people from other institutions can adopt. So you can have your own data squad. And the other big component of this is that it would help professionalize students. They can get their name out for themselves and add it to the resume; we wanna ask them to write blog entries and kind of have this profile. I think that's me, I'm sorry. And I'll hand it to Harrison.
Okay, so I'm a bit of an outlier here since I work at a kind of typical medium-sized public university. We're an R3, but a somewhat unusual R3 in that we have a small number of programs that are doing more R1-level work and bringing in that type of funding. Very small library staff. And then one faculty data librarian, which would be me. We also share a position with the Computer Science Department. That position does not have a public-facing role in the library. He's more involved in writing grants that could be potentially beneficial to the library data programs. So bringing in money to try to fund new programs.
That's all sort of in this kind of entrepreneurial spirit that we have at the administrative level in the library. Now one kind of unique thing that sets us apart is our AI lab in the library, which was funded through a local nonprofit that funds a variety of educational initiatives, usually just equipment-type and infrastructure grants. But because of that, we were able to acquire, in addition to some robotics equipment and things like that, some pretty advanced technology. So we have GPU workstations, and we have a high performance computer. And through that program we were also able to establish another connection with the computer science department, where we hire one of their lecturers to run our educational programs in the AI lab. And she also runs internship programs. So we have a steady stream of students, usually computer science students, coming in to work on projects, lead workshops and so forth through our AI lab program. In addition, the library has fairly conventional consortial memberships, like the ICPSR consortium and Dryad for the Dryad data repository. Now, as for other data services outside the library, there's not a ton. So we have a just recently established research computing service with four staff, I think they're in the process of hiring a fifth person, and a fairly recently hired director of research computing. We have an environmental data center for GIS, but everything they do they charge for, including workshops. So it's not something a student could just drop into to learn GIS. And then internally within the psych department, they have a quantitative consultation center. And I should add to that that we are working on improving this: we've submitted a proposal to the university to create a data science consulting service that would be based in the library but would be staffed by people from a variety of departments.
So on the next slide, this is for context that maybe will help in our conversation as we move forward in this panel. I thought it would be important to differentiate what we're talking about with data science from what I've put in quotes as conventional library data services. And I really see libraries' involvement in data science being an outgrowth from these sorts of data services that have been going on, well, probably for larger research universities for maybe 40 years, with a lot of growth in the last 20 years in a wider range of libraries, where you have that orange circle being, let's call it, data reference and data collection development, where you start helping students and researchers find data resources. You might devote some of your collection money to acquiring either data sets or databases that provide data in order to support more quantitative approaches in education and research. Then that evolved into or spun off research data management support. And as we filled positions to deliver these sorts of services, I think libraries began to hire more staff with programming and disciplinary expertise. I won't go into all these little notes I have about organizational impact and mission impact, but I think the takeaway from here is that as you move to that lower dark blue circle, it starts having a bigger organizational impact because the types of skills that you're hiring for are more technical and less from the traditional library preparation. In fact, three of the panelists here do not have library degrees. I'll let you guess which three. There'll be a prize. And so organizational impact is basically just on the side of the library. And then mission impact would be the broader mission impact of the library within the institution.
And all these are of course, you know, my subjective opinions, but again, that's not really the point I want to make right here, except for that organizational impact, which I suspect we'll come back to. So before starting my job, I spent 15 years at a large research university, UC Berkeley, delivering these conventional library data services mainly in the social sciences, but built up a skill set that became more consistent with data science, and also was, you know, part of this growth in data science at UC Berkeley, where they have the Institute for Data Science and the development of the data science program, and I was the library liaison to that. And so for my last slide, I just wanna show you how I've translated that into the things that I've developed at URI, the sort of novel data science-y things. So I've developed two courses: I got Senate approval for an undergraduate introduction to data science course and then a similar course at the graduate level in an online asynchronous graduate certificate program. And then I've co-taught special topics seminars with faculty from marine affairs, biology and computer science. We're fortunate in that we have several faculty who are active in the Data Carpentry movement, and we do a couple of those workshops per year. We've done instructor training sort of irregularly. Through the AI lab, we've done ad hoc training at a fairly high level. You'll notice here a conspicuous absence of things that are more directed towards, say, an undergraduate audience, apart from my DSP 110 course. That's somewhat intentional on my part. Our traditional public services in the library do a really good job at the undergraduate level, maybe not so much focused on data, but they can handle a lot of the basic questions, and then as more technical things come up, they may or may not refer things to me.
And then the last bullet point, sort of significantly, and this has a lot to do with just sort of the lack of expertise on campus, wait, that doesn't sound right. There's not a lot of sort of extra faculty members sort of floating around who can get involved in funded research. So I'm currently serving as a co-PI on a supplement to a multi-year, multi-institutional grant, it's an environmental health sciences grant, and then we're waiting on a decision on a renewal to that grant, in which I would lead the data management and analysis core of the grant. So we've been able to kind of translate my background and some of the opportunities that present themselves at this sort of smaller university into, you know, embedding the library into some activities like course development and teaching and funded research that you might not necessarily see at, like, an R1 university. So that's my little summary, University of Rhode Island. Feel free to ask questions about any of our institutional stuff that's going on, but we are now going to just sort of set things up with some shared challenges that we have identified. And I'm not quite sure how this conversation will progress, but these are the three areas. So one challenge is that the demand for support outpaces capacity. I'll just read them all out and then we can just start talking. And then the pace of change. So rapidly evolving data science tools and methods. You know, how do we keep up? Although I would sort of argue that this is one of the niches for the library, because we often have more time on our hands to kind of dabble in new things and to keep up with trends than some other researchers, who really have to keep focused on their particular field and the traditional tools. And then building cross-department collaborations in order to meet demand. How do we do that? How do we identify potential partners to help us solve these challenges, essentially? So with that, who would like to go first? Or I guess, right? 
Maybe we should stop, and does anyone have their hand up with a question? There's a mic right there. There is a mic in the center. As you can tell, we're more than happy and able to riff with each other, but as people have questions, we're more than happy to make it really interactive. If there's anything that has sparked an idea for your institution based on some of these shared challenges, I know that it's not just our four institutions where the demand outpaces capacity. And if your institution is not having that problem, please stand up and tell us how you've solved it, because this is a real topic of conversation. Or share your resources. Please share. Don't be shy. Question? I think they're recording it. So you might, if you wanna be reflected. Hi, I'm Patrick Yott from Northeastern. I'm just wondering if you've experienced what we're experiencing, which is our rising freshmen, during the summer before they come to school, now reach out to my data science team for R training and Python training. Thanks. I would say, I'm thinking there are kind of two questions there. And I think Stephanie mentioned this as well. We do see in the library a lot of students who feel like they need the base-level training in a lot of data science tools in order to actually engage in the full semester course or the full quarter course that they're taking. I have not seen as many high school students coming in, although I had an email on the way here that was suggesting that may be happening this summer, so maybe that is the data future for Duke and the president. But we do see a lot of requests for bootcamp training, which seems very similar, where we have graduate students who are coming into programs where there's an expectation that on day one of that program, you will know how to program in Python or R or fill in the blank. And we do try to support those efforts, generally in the summertime because we're a semester-based institution. 
I think there's a question about prioritizing topics and groups, which differs depending on how many staff and how much bandwidth you actually have, and the timing on a lot of these questions. There has been some literature about bootcamps, and different individuals here might have different opinions about the impact and the effectiveness of two weeks of training or two hours of training, if you're doing short-form instruction or a series of YouTube videos. I would probably say we don't have as much evidence as we would like to actually have the answer for a student who's trying to get up to speed very quickly. Most likely it differs, I would suspect, depending on the student, their background, their degree program and what they're trying to achieve. But that's a very long way of saying, like, yes. There are a lot of people coming to the library as the, I don't use the term institution of last resort, but it's seen as the shared space where this training is available, and the library is often willing. And I think, so Tim mentioned we run kind of this boot camp that's maybe three or four UCs in the summer, right before the quarter starts, and that's mostly for grad students. I can say I do get a lot of undergrads in the first week of their freshman year who come in in a panic, because they took their very first day of their very first intro data science course and looked around and everyone seemed to be doing deep learning. They were coming from high schools where, like, they got to take a Java class, and the library, however they come to it, seems to be more of like a safe space where they can come and say, yeah, hi, like, I've been told that you're my librarian. I'm too embarrassed to ask, but like, what is data science? What is machine learning? And how can I get caught up, because everyone else seems to be coming into college with like all this crazy coding skill and I'm not? 
So the freshmen I see tend to be actually from our data science programs, who are very, they should be starting from scratch. That's the point of going to college. That's the point of getting this degree. They're not expected to know that, but there's a bit of an arms race in data science, where like by the time I see them when they're juniors, they're doing neural network stuff that I can't even remotely keep up with. And it tends to kind of panic some of the other undergrads, much less someone coming from social science saying, I wanna learn how to use R, but everything I found seems incredibly intimidating. So that's a niche that I think the library can fill, because it's pointing to existing resources in sort of a non-domain way. And we're okay saying, I'm not an expert either, but if I can learn to do it, and we can sit down and learn to do it together, we can make an environment that's, I think, a little less intimidating than going to the professor in like a 500-person class and saying, hey, not to bother you, but like, I've never heard of object-oriented programming. And so the freshmen that I get tend to be, like you're saying, the incoming ones, but they're really looking for that extra boost because they feel like they're falling behind. Ironically, they all feel like they're falling behind, like every single one has imposter syndrome, but that's where we can help, even just directing them to, like, what are good tutorials online? Because if you search data science tutorials, there are literally millions. So if I can just go through and have a curated list of, like, these are legit, if you go through these I promise that you'll have a basic understanding. And then our grad students are kind of like a different... Are there questions? Wait, I had a response. So all the Rhode Island high school seniors that ask about our training, I refer them to Northeastern. No. 
No, we do get, I think it might reflect what the students, yeah, we don't get any inquiries about our Python training, but we don't really offer a lot of that internally within the library. Because we show off things like the AI lab on the campus tours, we do get some questions about that. Now COVID has kind of skewed everything, but we do offer some summer camp activities in our AI lab. But no, we're mostly focused on graduate students, graduate student training. I believe we will be doing more sort of camp-style training focused on graduate students, and it might reflect what the most popular majors are on campus, which usually have some quantitative component, but it's not like a major emphasis. Like, I think nursing is the biggest, nursing and business are pretty big. Yeah, we focus mostly on grad students as well. We do get undergrads who come to our workshops, but they're mostly probably not freshmen. But that said, like Stephanie was saying, we're talking about the Carpentries curriculum. It is geared at novices. It has been taught to high school students. It's not assuming any prior knowledge. So I would recommend you could use that curriculum if you did target that audience. And it is designed to kind of build confidence. You go slow, you're not gonna throw a bunch of jargon at the students. So that's really the focus. You wanna build confidence, so if you target high school students in summer, I think it would be an effective way to use that curriculum. And that's one collaboration that we've been really focusing on. We've got like our internal library, our internal campus between different departments, and even between campuses, but at least for us. And I can only say for like me as like the data services. 
We don't really have any close collaborations with our local community, but it's something that, you know, with extra capacity, is an area where it would be great to be more plugged in to our local community and having a pipeline. We are a land-grant institution, so to really work with our local community and give back and be more engaged in that collaborative nature. So we're slowly spreading out, right? Like we have multiple departments in the library. We have so many departments on campus. We're stretching out across the UCs and across different institutions. And so this actually didn't come up in any of our planning conversations, but it does make me think. We don't currently have any collaborations with our local communities, but it's something that might be somewhere that we end up going. And the role of, like, universities, if there's this interest in high schools to get students into some of these programs and train up data science skills for people that are gonna stay in the community, as opposed to training up students that then leave. So a question from the audience. Hi, I'm Curt Hillegas. I'm the associate CIO for research computing at Princeton University. What we've been finding when teaching these things, whether it's in coursework where professors are trying to teach it themselves or whether it's for our training programs, is that a huge impediment, something that gets in the way of the first day or the first few days of teaching this, is getting the tools onto the laptops of the students. 90% come to class and it's ready to go. There's 10% that take up 90% of that first day trying to get it working. We've moved towards, and this is both for Python and for R, using a virtual desktop environment, which has its own drawbacks. There's some latency issues. Some people don't like the responsiveness. It's confusing getting there, trying to get shared files there. 
What tools are you using to make this easier for the students to be ready for that first day of class, so that they're learning about data science and using the tools rather than, how do I get the stupid tool onto my laptop? Yeah, I can tackle that as someone who's actively teaching data science. So one of the first things that our research computing unit did is to make it easy to spin up RStudio Server instances and JupyterHub instances. So it really took maybe one email and maybe an hour for me to get 30 logins to our RStudio Server that's locally hosted. I mean, I have used cloud-based services like RStudio.io, but it's very nice to have one that's locally hosted. And so what I'm talking about here is, RStudio is the common user interface for using the R programming language. You can run it on the desktop, but it has a virtually identical interface that's browser-based. The only difference is that you can't access your local file system, so for any sort of project that involves moving files around, there's this additional step of moving things from your desktop to the web. But most students can figure that out; that really hasn't been a source of problems. So I think that's really important. And I know, we already talked about this over lunch, but back at Berkeley, one of the reasons they're able to have these large classes, like 1700 students taking data science at once, is because they have a very robust JupyterHub environment, which means you can program in Python in the web browser, and everyone in class is using that same tool. And they also have some other sort of value-added services like problem graders. So the instructor can create a programming problem for the students to work on within the Jupyter notebook. And I know we're tossing around a lot of terms, and if you're unfamiliar, you can look them up, or you can track us down, and we can talk more about these. 
But those two tools are tremendous time savers, because as all of us who've taught workshops know, you can easily spend 45 minutes figuring out why that one student's computer will not load a certain R package. Yeah, so at UCLA, our research computing does run JupyterHub for classes, so I can request it and then have people log in, just like what Harrison's described. There is the caveat though, I mean, for researchers, I think 80, 90% of your work is gonna be done right on this laptop thing here. So there is a pedagogical reason why you'd wanna teach people how to use their laptops. I mean, I think when moving into a cloud kind of computing environment, that may be less important over time if that works effectively, but still there's worth in teaching people how to use their own machine, I think there's value in that. That said, yes, it can blow up a whole workshop. You can go in the ditch pretty quick. I mean, we've tried to have pre-installation sessions, because I'm actually ride or die for installing locally. Part of working with these platforms is you have to know how to work with it locally. So there's a collaboration at UCSD, we have our data science machine learning hub, and it's essentially a JupyterHub, but it's class-specific. So if you enroll in a class, you can use it, and then once the quarter's over, too bad. We have these fantastic virtual machines that the library has available for students that do have R and Python installed, which is great. However, when they graduate and go out into the world and work with it, they no longer have access. And so I totally agree that it totally blows up a whole day of a workshop. It's fantastic to be able to have it. I tend to be like old school, like you have to be able to run it on your local machine, otherwise what's the point? 
And so we do offer, I have had students come in with all kinds of like weird operating systems, and it's running in like a different language, and I've never even heard of it before, but it's worth that effort because then it's installed for them forever, unless they get a new laptop. So we are fortunate to have these collaborations to enable us to offer the kind of cloud-based things, but I tend to fall on the pedagogical side of, you have to be able to have it locally. But your mileage may vary. This is a continual conversation in the Carpentries community, like this is a global community, and this is like, oh, do we teach people how to set up their own machine, or do we move to kind of cloud-based environments? It's like a virtual environment. I will offer a different perspective. I agree with everything that's been said. I think cloud environments are really convenient for the reasons that were stated about trying to reduce that barrier to easy access, but if you're trying to be inclusive and including people outside of your institution, we've bumped into issues where we had people who are not Duke affiliates who are with us in the class. That's a standard policy for us. Everyone is welcome. We couldn't use the Duke virtual computing resources that we had, and so we vary that around a little bit now, and we have some instructors in the group who feel very powerfully that teaching people how to set up the installation on their laptop, their own machine, is getting them ready for doing research and the class work. I don't think we've settled that debate. There are a lot of different factors to consider. Yeah, I'm adamantly opposed to workshop installs. Sorry. She promised a controversy slide. I'll just flat out say that. But we worked it out on something else before. But we have offered, with our GPU laptops, we use something called Apache Guacamole that allows you to do something sort of like running a virtual machine. 
It's a web interface that allows them to log into a workshop. That was something we launched during COVID. So that is sort of a halfway in-between type thing. That's kind of a workaround. But yeah, I mean, my experience is that a lot of local install problems have to do with idiosyncrasies, like a wide range of idiosyncrasies, rather than one sort of set of distinct skills that you can teach. And more often than not, the problems occur when students come with the software already installed, and it's the wrong version. Like, I mean, if someone comes and wants help installing something, I'd give them help installing it. In terms of classes, I don't think it's the best use of our time. That's me. I think it's... Maybe one or two. Exactly. That's important because you could work that out as well. Making the space available, and if people choose not to take advantage of it. We do have like 10 minutes at the beginning, but then say like, well, if it's not installed now, I guess you should have really showed up or emailed. And we're gonna move on. And we might do a breakout room in Zoom to try to get the person up to speed, or say, when it was in person, it was nice because we would just say like, pair up, right? Work with your neighbor, and we will deal with your weird computer issue. As people are thinking about their next question, I wanted to touch on something that Harrison just mentioned about the importance of infrastructure collaborations. So we talk a lot about collaborations between people and between departments and information sharing and having support. But a lot of data science and computational science in the end really boils down to infrastructure collaboration. So like, who is going to pay for the GPU machines? Who is going to service those? Is it going to be library IT? Is it campus IT? Is it ed tech? Who is going to take ownership of that when it's so distributed? 
And we have a really complicated system that I'm not gonna get into here because we only have limited time. But I think it's something that we're all kind of facing in terms of collaboration. And there are different needs, and it means that there are a lot of people at the table in terms of security and cyberinfrastructure and giving people access. I know that we have about 150 people who have access to our library virtual machine at the moment, and it's giving IT heart palpitations, because that's so many people who have access to virtual machines, even though they're all students. So I don't have an answer for that, but if anyone has other takes on infrastructure collaboration or anyone in the audience has a novel partnership for infrastructure, that tends to be the main reason. So we are close to time at the moment, and so what I would like to do is encourage you: if you have questions about collaborations at any of our campuses, we are more than happy to meet with you and talk this afternoon outside of the sessions about things we've seen on our campuses. If you have particular questions about what data science looks like, how we're staffing those roles, and kind of what the role of the library can be inside of the space, please do pull us aside. We're happy to explore that. So I'd like to thank the panel for a wonderful set of presentations. Thank you. Thank you, Joel.