All right, I'm Mikaela Parker, and I'm the program coordinator for the Moore Sloan Data Science Environments. This is a partnership between two funding agencies, the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation, together with three universities: New York University, the University of Washington in Seattle, and the University of California, Berkeley. This partnership sought to bring in and advance data science in academia, and it started in 2013. So I'm going to cover the highlights from the Moore Sloan Data Science Environments (you'll hear me refer to it as the MSDSE) over the last five years. The funding from the two foundations largely went to support three data science centers, one at each of the universities, and I've highlighted them with their logos on the slide. At Berkeley, it's the Berkeley Institute for Data Science; at NYU, the Center for Data Science; and at the University of Washington, it's the eScience Institute.

But before I dive into some of the highlights of this partnership, I wanted to set the stage with something that I think all of you are familiar with, and that is that data are being collected and used everywhere in our everyday lives. We are living in an age of smart homes, smart cars, smart health, smart cities. We heard in the plenary about the Internet of Things. But importantly for academia, we're also talking about smart discovery. Every field of inquiry on a university campus is transitioning from data poor to data rich. And that's not just the traditionally data-heavy sciences like astronomy. Since the advent of the World Wide Web, we're seeing lots of data in the digital humanities, sociology, economics; it's really unlimited. But importantly, I think we need to remember that data science is not just about volume of data. Now, this figure is from a paper that's five years old, but I think it largely still applies. When university researchers are asked how much data they work with, we're talking on the order of gigabytes, typically.
So this is not necessarily a lot of data, but it's also about the variety, the heterogeneity, the messiness of the data that university researchers are struggling with, and the velocity at which it's coming at them, with new instrumentation, new algorithms, and new models that are generating more and more types of data at increasing velocity and volume. So it's really this challenge of bridging the gap between the university domain researcher and data science practice that was at the heart of this partnership. That is, as data increase in all forms and in all fields, even some of the very best researchers on university campuses are struggling to generate knowledge and insight from these data. So the MSDSE partnership really sought to build bridges where new advances in methods and data science practice are brought to university researchers and enable them to generate new discoveries. And those new discoveries open up a whole new world of different questions researchers want to answer, which in turn spurs new developments in data science practice. So the goal here is really a feedback loop.

And this feedback loop was the inspiration for the structure of the Moore Sloan Data Science Environments. The efforts of this partnership were organized into what were identified as challenge areas, or working groups. These working groups were the bridges between the scientific disciplines on the left-hand side and the data science methodologies, and as you can see, there's this feedback loop again. Those bridges include career paths and alternative metrics; education and training; software tools, environments, and support; reproducibility and open science; working spaces and culture; and ethnography and evaluation. That last working group was recently renamed to data science studies, and I'll come back and explain that a bit. So, just a high-level outline of my talk: it's mostly in two parts.
As I mentioned, I'll be highlighting some of the successes and outcomes from this five-year collaboration. I'm going to highlight cross-university collaborative outcomes as well as touch on some individual university achievements. Then I'm going to transition to a few key takeaways and institutional challenges. For this section, I'm going to draw not just on the Moore Sloan Data Science Environments and those lessons learned, but also on a recent landscape survey, conducted by Abt Associates, of 20 data science centers nationwide that is now available. I don't have it with me, but if you're interested, I can get you a copy. I'll also touch on the final evaluation of the Moore Sloan Data Science Environments, also conducted by Abt Associates. And earlier this year, there was an inaugural Data Science Leadership Summit, and I'll touch on some of the outcomes and lessons learned from there as well.

But first I want to give a nod to a few unsung heroes, and that is the ethnography and evaluation working group I mentioned, now called data science studies, because these folks are, I believe, at the heart of some of the successes of this program. This working group is tasked with understanding the complex landscape within which data science is situated and identifying and evaluating best practices. In a sense, they're doing the data science of data science. It's basically ethnography meets reflective and reflexive self-evaluation. What I mean by that is that our ethnographers were embedded in our programs to provide immediate feedback on our programs and activities, and that allowed these data science environments to be very flexible and adaptable. The working group is also tasked with raising awareness of ethical issues and surfacing best practices to the larger community. And they have their own scholarly work as well.
They publish on computational, HCI, historical, and ethnographic approaches to studying the best practices and tools of the data science culture.

Okay, so now I'm going to move to some highlights and successes. One that I think is really appropriate for this audience and this meeting is reproducible and open science, and New York University has made major strides in this area. They've always partnered really closely with the New York University Libraries, and this year they were successful at hiring their first-ever reproducibility librarian into a tenure-track position. We think this is the first tenure-track position for a reproducibility librarian. They've also put a lot of time and effort into a tool called ReproZip. This is a package that allows you to essentially pack up all of your research, along with any necessary data files, environment variables, and options, and allows other people to reproduce your findings without having to go and dig up all the necessary dependencies; they can even run it on a completely different operating system.

The next point in the reproducible and open science category is a collaborative effort: a case studies book. It's available on Amazon, but it's also available completely free online. I neglected to put the URL on there, but the title is something like "The Practice of Reproducible Research." This is a volume of reproducible research workflows; it includes tools, ideas, and practices for real-world research projects. The contributors to this book were all from the Moore Sloan Data Science Environments, and individual postdoctoral fellows or researchers each contributed a different chapter. It has a real emphasis on the practical aspects of making research reproducible, so it's a real practical handbook.

Moving over to software and education, I chose an example at the intersection of the two at the University of California, Berkeley.
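To make the ReproZip workflow a bit more concrete, here is a rough sketch of the command-line flow based on ReproZip's documentation. The experiment script `my_analysis.py` is a hypothetical placeholder, and unpacking into Docker assumes the `reprounzip-docker` plugin is installed:

```shell
# Trace an experiment run: ReproZip records the files, libraries,
# and environment variables the command actually touches.
reprozip trace python my_analysis.py

# Pack everything the trace found into a single .rpz bundle.
reprozip pack my_analysis.rpz

# On another machine (even a different OS), a colleague unpacks
# the bundle and re-runs the experiment, e.g. in a Docker container.
reprounzip docker setup my_analysis.rpz ./my_analysis
reprounzip docker run ./my_analysis
```

The key point is that the person reproducing the work never has to reconstruct the dependency environment by hand; the trace step captured it automatically.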
They have made a lot of strides in undergraduate data science education. They launched an intro to data science course called Foundations of Data Science, or Data 8, and in the very first offering of this course, there were students literally sitting in the aisles and on the stairways. So there's a huge demand for foundational data science at the undergraduate level, and the course continues to see enrollment of over 1,000 students per offering. It's the fastest-growing class in campus history.

At the same time, the University of California, Berkeley, through their Berkeley Institute for Data Science, supports the Jupyter team. If you're not familiar with Jupyter notebooks, Jupyter is an open-source web application for creating and sharing documents that contain live code, narrative, and visualizations, all packaged neatly together for sharing. And they've developed a multi-user version of this called JupyterHub. JupyterHub is great for classrooms. The way this works is that JupyterHub allows you to create and deploy identical installations of all software for each user's notebook. So each student has a notebook, and it looks very much like the left-hand side of this slide, where the student names are just in folder format; it's a very familiar, directory-oriented layout. The instructor can manage authentication and users. Essentially, it allows easy scalability and identical installations across all of the learning tools in that classroom. And the cool thing is that it was so successful and so widely adopted at Berkeley that the University of Washington, following Berkeley's lead, picked up JupyterHub in 2017 and is now using it in at least two of its courses.

I'm going to switch now to the University of Washington eScience Institute and focus on research support. At the University of Washington, they have something called the Data Science Incubator program.
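As a minimal sketch of what that classroom setup can look like, here is a hypothetical `jupyterhub_config.py` fragment. The option names follow JupyterHub's configuration reference, but the course name, usernames, and directory are invented for illustration:

```python
# jupyterhub_config.py -- a minimal, hypothetical classroom setup.
c = get_config()  # noqa: F821 -- injected by JupyterHub at load time

# Which students may log in, and which account administers the hub.
c.Authenticator.allowed_users = {"student01", "student02", "student03"}
c.Authenticator.admin_users = {"instructor"}

# Every student gets an identical environment: the same working
# directory layout and the same default notebook interface.
c.Spawner.notebook_dir = "~/data8-notebooks"
c.Spawner.default_url = "/tree"
```

With a spawner that launches each user's server from the same image, every student lands in an identical installation, which is exactly what makes the model attractive for large courses.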
You can think of this as the space between office hours and a grant proposal. This institute has been around for longer than the Moore Sloan Data Science Environments, so they already had a pretty good handle on the data science needs of their university, and they came to find that for most researchers, their data science challenges don't fit in a one- or two-hour consultation, but they're not so big that they need to write a data scientist into a grant proposal on the scale of two or three years. It's somewhere between those two engagements that most of the data science challenges at a university lie. And so they spun up what they call the Data Science Incubator program. It's an intensive data science consultation to advance research. But importantly, it's not a throw-your-data-over-the-fence-and-have-somebody-work-on-your-problem arrangement; it's mostly a teach-a-person-to-fish approach. It provides a shared environment where researchers can learn from an in-house team of data scientists, as well as from external mentors and, importantly, from each other.

Just as an example of the kind of cool cross-pollination that can happen in the incubator program: a political science faculty member, John Wilkerson, ended up using a sequence alignment algorithm from biology for text analytics, to trace the flow of ideas through legislation. In another example, an image processing algorithm used in astronomy was applied to analyze MRI data in the brain. So you can see where tools can really have cross-disciplinary applications, and it's in an environment like this that you get that kind of cross-pollination. This winter incubator program (we're on the quarter system at the University of Washington) is 10 weeks long. But importantly, it requires in-person engagement two days per week.
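To give a flavor of the sequence alignment idea, here is a minimal sketch in Python. It uses the standard library's `difflib` rather than the biosequence alignment tools the actual project would have used, and the two bill texts are invented for illustration:

```python
import difflib

# Two invented snippets of legislative text (illustrative only).
bill_a = "The agency shall establish a grant program to support rural broadband deployment."
bill_b = "To support rural broadband deployment, the Secretary shall establish a grant program."


def tokenize(text):
    """Treat a bill as a sequence of word tokens, the same way a
    biosequence aligner treats DNA as a sequence of bases."""
    return [w.strip(".,").lower() for w in text.split()]


a, b = tokenize(bill_a), tokenize(bill_b)
matcher = difflib.SequenceMatcher(None, a, b)

# The longest shared run of words is evidence of text reuse
# between the two bills.
m = matcher.find_longest_match(0, len(a), 0, len(b))
shared = " ".join(a[m.a:m.a + m.size])
print(shared)  # -> "shall establish a grant program"
```

Scaled up to thousands of bills, this kind of alignment lets you trace where a policy idea first appeared and how its wording propagated through later legislation.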
The person with the data science challenge proposes a project to our team (I say "our" because I also work at the University of Washington eScience Institute at the moment), and they sit side by side with a data scientist, committing two days per week to the project. It's that kind of side-by-side, intensive, collaborative work that really advances the project. They've had participation from faculty, graduate students, and staff. And as I mentioned, the network effects here are really important; there are my two examples from biology and astronomy. Just to give you some flavor of the types of projects handled in the winter incubator, you can see they touch on hydrology, prostate cancer, and economics, with simulating the airline industry. So really, the sky is the limit here.

This program was so successful that the university then decided to launch a similar program in the summer called Data Science for Social Good. In this case, the projects are specifically aimed at societal-good challenges, and they also bring in a team of four students to work on each project. So now you're talking about a project lead who pitches the project, a data scientist, and four students. The students come from a nationwide call, and it's a very competitive program: without very much advertising, they managed to attract 250 applicants for just 16 spots. These are students who come from mechanical engineering, physics, any kind of discipline you can imagine. But they are all drawn to the idea of addressing challenges that make them feel good, that are a benefit to society. The dual goals of this program are student training (the students who arrive already have a lot of data science expertise, and they get to learn new tools or new ways of applying the tools they already know) and, of course, creating tools that facilitate responsible, data-driven decision making.
And just a quick highlight of some of the projects they have hosted: everything from permanent housing for homeless families, to looking at unsafe food products, to transit system questions. In the Data Science for Social Good program, many of the project proposals come from outside the university; city government is a really popular source.

Okay, that's great, but these incubator programs, while very intensive and very successful, don't scale very well. An engagement where you're working one-on-one with a data scientist for a quarter is only going to help a handful of labs at a time. So I'm going to turn now to an option for scalable research impact. This is a program that all three of the Moore Sloan Data Science Environments worked on together. It's an opportunity for community learning within domains, and they're called hack weeks. Many of you are probably familiar with the hackathon model. This is a flavor of hackathon, in a way, but it has a very strong emphasis on community building and tutorials, a lot of learning. It places a strong focus on building a culture of practice and developing resources within an existing domain-specific community. And importantly, it brings together researchers from many different universities who are all from one domain. These researchers already have a shared language and, in some cases, shared scientific objectives, so it's very easy to gel a community and start immediately with the learning in this kind of approach. And as I mentioned, there's a lot of peer teaching and peer learning that happens. In exit surveys from these hack weeks, 70 to 80% of participants say that they taught someone else during the hack week. So it's really powerful for fostering collaborations, for learning among people at various stages of their careers and technical abilities, and for catalyzing a community through a shared interest in solving computational challenges within a field.
The success of these hack weeks has been so apparent that we are now seeing many others spring up. Astro Hack Week was the inaugural hack week, from 2014. It's now in its fifth year, and it was taken international because there was such demand from international participants for easier travel to the event; it took place this year in Leiden, in the Netherlands. NIH recognized the power of Neuro Hack Week and awarded that group funding to develop it into Neuro Hack Academy, which is now a two-week program, and they just had their first iteration this summer. There are two new hack weeks: Water Hack Week, which focuses more on freshwater, and Ocean Hack Week. Personally, I would really love to see some of the social sciences get on board with this, because I think it would be a great tool for them as well; I put "socio hack" in there with a question mark because I also think that's a terrible name. Some of the outcomes include papers, software, and of course results.

Now, another version of this kind of broadening, scalable research impact is community learning across domains. Hack weeks are within domains; the across-domains version is known as XD working groups, XD for "across domains." XD working groups host workshops, and they're centered more around methods. So it's an opportunity for scientists working on similar problems, but in different domains, to strengthen ties across disciplinary boundaries and work together to identify common principles, algorithms, and tools. It includes a make session, which is an opportunity to work together to craft solutions to shared problems or investigate research questions of shared interest. The inaugural XD workshop was ImageXD, held in 2016, with 50 researchers from 14 different institutions. I have listed some of the domains these researchers come from.
So you can see it's really broad, including computer vision, earth science, neuroscience, astronomy, and more. These XD workshops are newer than the hack weeks, but they're already gaining traction; we've now seen the launch of TextXD and GraphXD. Some of the outcomes from these shorter workshops include blueprints for open-source image processing, in the case of ImageXD, and training sets for machine learning applications. So my key takeaway here is that informal, intensive, community-driven learning opportunities like hack weeks and XD workshops can quickly and effectively bring data science to campus researchers.

You might say, well, that's great, but who does all the work? And I put in there, where does the magic happen? Because that's the easier question to answer first. Here (unfortunately very washed out) is a view into the institutes at those three universities. All of them developed some kind of open research space with flexible furniture and adaptable configurations, with a lot of emphasis on the ability to just drop in and attend a meeting or work together side by side. A crucial part of the development of these spaces is the idea of a neutral space. To emphasize that, I'm showing here a quote from an interview with Abt Associates during our final evaluation. I'll read it real quick: "One thing that I think we talk a lot about, and I think has been verified, is that having a neutral space on campus is important. We're not viewed as part of the computer science department or another department in particular. There's this sort of Switzerland effect. You're outside of the departmental silos. People come here and are more likely to collaborate across disciplines than they might otherwise be if they were all going to somebody's particular department." And I think this is really a major asset of all three institutes.
But I think one lesson learned that others can take up as well is the idea of having a neutral space on campus. So when we think about other universities, where's a good neutral space on campus? The libraries, of course. So a partnership with the libraries is one of the best options, I think, for spinning up an area for data science to happen. In the case of the Moore Sloan Data Science Environments, two of the three were using renovated space on loan from the libraries. The design, as I mentioned, was around very flexible, collaborative spaces, which are also one of the key elements of libraries. They also tried to think about how to make the space transformable and flexible: small and large meeting rooms, hot desks for people to drop in, casual seating. Writable surfaces were a really important feature that a lot of people wanted, and I think the eScience Institute probably took that to the extreme. They made most of their walls writable, so people just stand up, start having a conversation, and start writing on the wall. That's actually a really effective collaboration tool; people think really well when they're standing up with a pen in hand trying to explain something.

Okay, now I'm going to get to the harder challenge: who does the work? Career paths for academic data scientists. I think many of you have probably heard the term "data science as a team sport," and I firmly believe that; there is a lot of expertise that goes into data science challenges. But unfortunately, I think it's more like a team of X-Men than a team sport, and the reason I say that is that the challenges facing data scientists are ever-evolving, and they themselves are ever-evolving. But unfortunately, they're also, in many cases, viewed as second-class citizens at the university. What do I mean by that? Here's another quote. This is from a research scientist at one of the Moore Sloan Data Science Environments.
"I'm doing all these projects, and the university is very happy to point at my work and say, isn't this really cool work? But I don't have that first-class status of a faculty member that would just grease the wheels and make everything a bit easier, including getting grants. I know that if I was an assistant professor somewhere, a lot of these doubts would just go away, based on the title alone." So this is a challenge, right? We need to find a way to have prestigious careers at a university for data science talent, or we're not going to be able to keep them. It's a common theme that was also expressed in the landscape survey of 20 universities: academic labs are struggling to obtain the computational support they need. In some cases, the reason is obvious; this is a salary issue. Really talented data scientists can earn probably three times as much in industry. But on top of that, most academic data science positions, especially within labs, are contingent on grant funding. So you're saying to somebody who has all these data science skills: come work at a university, make less money, and we can't guarantee that money is going to be around for more than a couple of years. That's not very attractive. And on top of that, data science positions themselves are difficult to create at a university, because many universities don't have the prestigious titling and career path that match what faculty have.

So the question is, how can we make data science more attractive in academia? These are just some ideas I'm throwing out. One: PI status. There are universities where staff members cannot have PI status. In the case of a data scientist, that's silly. These are really talented folks, and they should be allowed to bring in their own money; this will help with the sustainability problem. Of course, we can all highlight the advantages of a university: being a more intellectual environment, and having opportunities to mentor and teach.
These are actually really important to some data scientists who want to stay in academia, but then we also need to make it possible for them to mentor and teach. Competitive salaries; I put that in quotes. Of course, academia can't compete with the Googles and the Facebooks of the world, but we can certainly change our HR structures so that data scientists don't fall into the same payroll schema as somebody who washes pipettes in a lab. We need to think differently about those pay scales. And the titling. In one of the interviews from the Abt Associates evaluation, there was this feeling that having a title like "fellow," like data science fellow, felt very transient and also very junior. That only works when you're first trying to attract data science talent to the university; thinking longer term, if these are folks who want to stick around, there should be some other titling for them, maybe something like professor of practice. And I put early-career mentorship here because this was a lesson learned for the Moore Sloan Data Science Environments. We actually successfully created some of these data scientist positions, and believe it or not, they're still with us after five years, because they really enjoy the work. But in confidential interviews, we found that we were not doing a good job with mentorship. The first quote is about the titling; this person is saying: "I think there's a degree of structural change going on in the academy, but I think that's happening very slowly. Do these kinds of positions of leadership that are not tenure-track faculty get created? If not, I'll probably end up going to work for some other nonprofit, open source type of place." So again, titling is important.
And on mentorship: "Mentoring of the data scientists and research scientists, to help them figure out what to do strategically for themselves and their careers, isn't something that's really addressed now. And it's hard because these are new jobs in academic research, which means we need more mentoring, not less." So again, a lesson learned, but hopefully we all can take a good cue from that.

I spent a fair bit of time talking about career paths; I'm going to now just touch on a few challenges and lessons learned that we could then use for discussion. I could dive into any one of these in excruciating detail, but I know you don't want to hear that, so let me just gloss over a few more that arose from this five-year institutional experiment. The first challenge came from the landscape study, and it's about establishing a center. It was noted in that survey that the greatest challenge is, not surprisingly, navigating the university's political landscape and persuading faculty that a data science center would be a benefit. So the important lesson learned is to engage the university community as much and as often as possible in the design process. Some of the foundational elements that we found in common across data science centers nationwide are a dedicated space (I mentioned how important that is) and a strong emphasis on collaboration, interdisciplinarity, and community building. And to note, virtually all entities in that landscape survey are administratively based outside of any one department or school. Faculty involvement is really a big challenge. Faculty already have a lot on their plates with service to their departments, teaching, and their own research. So how do we balance engagement in a new data science center or a new data science program against all of these other demands? One idea: provide teaching releases; that worked in at least one case in the Moore Sloan Data Science Environments.
However, as great as that sounds, a lot of departments are very reluctant to release a professor from an instruction obligation. So another idea might be to provide some access to discretionary funding to support some of their research while they're helping to run the data science center. Credit for software: I think this does not get discussed enough, and it's something that touches faculty as well as data scientists. On the university campus, in hiring committees and in tenure reviews, the time, effort, and outputs that go into developing software, workflows, and tools that help an entire community advance science are not recognized at the same level as a high-impact publication. And that, frankly, needs to change, because when somebody spends two years developing a tool that advances everybody's research, that needs to be recognized at least as much as a high-impact publication. And paths to sustainability. This is going to be a challenging one, and I think something the plenary speaker spoke of as well. It's going to be interesting to see where data science goes on university campuses. Right now, there are, I would guess, probably more than 50 new data science centers and programs nationwide, but many of them are getting spun up with temporary startup funds, either from a foundation or from industry. What is going to happen to these initiatives once that funding runs out? Who is going to continue holding up data science in the academy? I think it's important for universities to start thinking hard about data science as a service and as a core component of university research across all fields. And if that means data science becomes part of the libraries, or part of research IT, or maybe, as the plenary speaker says, we don't need to silo it and we just say "support for researchers" and we all go to the same place, that's great. But it is on the universities to make that happen.
So, moving forward, how do we start addressing these challenges? I think it's important to continue the community building for institutions. This is a shot from our last Moore Sloan Data Science Environments annual meeting, but as I mentioned earlier, there is a new annual meeting for data science leadership, the Data Science Leadership Summit. It's an opportunity for thought leaders to discuss the challenges and lessons learned as academia adapts to the data science revolution. Continuing to meet regularly to discuss these challenges, and to come up with ways to advance data science on our campuses, is an important step. I also want to mention the Moore Sloan Data Science Environments annual summit: while it has been exclusive to those three universities, they are looking to broaden it and have other universities start participating, and it's a really cool opportunity for data-savvy researchers to share and learn tools and methods outside of their domains. I think in some ways it's a really unique kind of meeting, because it's not an astronomy meeting or a social sciences meeting; it's a meeting of anyone who works with data science tools, is developing methods, and wants to share and learn from others.

Okay, I think I have talked long enough, so I'm going to stop there and see if there are any questions. One thing to note: there were a lot of lessons learned coming out of five years of experimentation, as you can imagine. So I want to draw your attention to a white paper that all three universities co-authored, called "Creating Institutional Change in Data Science." It is available on the MSDSE website, at the bottom of the first page. There's a highlight article in the Chronicle of Higher Education, but the full paper is online, and hopefully you can find some more nuggets of information there. If there's anything in this talk that you have questions about, also feel free to reach out to me. My email address is right there, and I'll take any questions.
Otherwise, all right, thanks so much. Thank you.