All right, welcome everybody. This is a talk on open source science and OSPO collaboration, here at OSPOCon. My name is Alexy Khrabrov. I'm the open source science director at IBM Research, and Tim is my colleague.
Yes, my name is Tim Bonnemann. I'm the community lead for open source science. And welcome to everybody online.
So what is open source science? Open Source Science is a new initiative that we started at IBM Research last year. The initiative itself lives at NumFOCUS. NumFOCUS is the non-profit home of Jupyter, pandas, NumPy, SciPy, scikit-learn and several dozen other projects, mostly doing data science, and a lot of them originated in science and serve science. So it seemed a very good fit for us at IBM Research to start this initiative fully in the community space. It's a volunteer initiative. As IBM, we support it, we run it, we work on it, and we help NumFOCUS scale it, but it fully belongs to the community.
So what are the three pillars of this? The main idea is that we bring together open source developers and scientists, and we want to accelerate science. We want to help scientists do better with open source. This is actually a big difference from a lot of software engineering projects, where one kind of software wants to work with another kind of software. Typically, if you have two kinds of software, then inevitably there will be a third project which connects the two. It is very important that we consider scientists our customers, and for them, improving software is very often not the goal. It's not what they do best. They're focused on the end goal of scientific discoveries and scientific collaborations: new materials, new molecules, new drugs, new knowledge. So it's not easy for software engineers, developers at heart, to work with scientists and help them achieve this. So how do we approach this? Three things.
First of all, we stand up Open Source Science as a kind of thin organization. We need a structure where we convene communities of discovery: scientists, and open source developers who want to help them. This is, I think, very fundamental. If you look at how great open source software spreads, usually there are developer champions. If you saw the rise of Spark, you found that the people who picked up Spark were really enthusiastic about having big data at their fingertips, being able to have a terminal connected to a giant cluster and do interactive computing. So there is this initial impulse where developers get excited about something, and that is at the root of open source. And the people who do scientific open source usually are scientists, for instance in brain imaging or in astronomy. If you look at some of the biggest projects at NumFOCUS, astronomy takes half of them. So if you're an alien and you just land at a SciPy conference and wonder what the earthlings are doing with software, you'd think half of them are doing astronomy, a quarter are doing particle physics with CERN and others, and a very small fraction is doing healthcare and biology.
So the natural areas where open source started in science were started by scientific enthusiasts. I think it was organically driven by the fact that in astronomy, for instance, the data is plentiful and commercial use of the data is non-existent. You cannot sell galaxy data for money. So this area is actually ahead. If you take these as examples, then just as we have developer champions for Spark or Kafka and so forth, we need the same kind of people from scientific verticals. We need biologists who found that modeling drugs with open source tools let them achieve their scientific goals, or chemists who find new materials, or climate change scientists who look at space images.
So you need to find allies. You need to find the scientists who understand open source. And likewise, there are a lot of software engineers who basically like science. They may be doing science now or not. Some of them are stuck in jobs supporting an ad stack and selling ads. But, as David Patterson at Berkeley famously said, instead of selling ads, why wouldn't you help us solve cancer? There is this aspiration which is driving a lot of people to do science, and there is a big overlap between these people and open source enthusiasts. There is a lot of synergy. So that's the first pillar: we need to find these people and put them together.
As I said, scientists see their goals differently, so there is a huge language translation problem. If you want to work with chemists, it's not like reading a readme.txt. There is no readme.txt for chemistry. You can't just clone chemistry, read the file and start doing it. Same with biology. If you want to understand cell biology, you need to read an enormously heavy book which you cannot even carry in a backpack. So the way these people share knowledge is actually very tricky. We did a human-computer interaction study at Almaden that looks at boundary objects, for example machine learning models as boundary objects, which share knowledge between these groups of people. So even the way the knowledge is spread needs to be treated properly.
Now the second pillar. Once we have the communities, we can systematically identify the needs and gaps. Usually scientists are not aware of a lot of open source projects. For some reason they do not share this information the way computer scientists do, so the communication channels in science are different. And what often happens is that scientists just go and start hacking. They don't follow the DRY principle.
They don't try to reuse existing projects. So it's very important that we enumerate what is there, make people aware of everything, and then prioritize the work which is needed, so we can understand: do they need a new project? Do they need features in an existing project? Very often they're not able to articulate what they need. They know they want to use AI. They're skeptical of it at the same time, because their knowledge is very specific. So computer scientists need to work with scientists to share what is available, and together they can identify some of these needs. The needs and gaps themselves are not a given. They are a collaboratively defined option space.
And finally, the third pillar: once we know where the effort needs to go, we can resource it. Currently in the open source space there are, I think, very mature ways to find funding. There is NSF funding, NIH funding, and specific open source funding from NSF. There is a growing understanding that you need to enable science with open source, and the funders are thinking about how to do this. So you can actually connect resources to this work. Those are the three pillars.
We launched at SciPy in Austin with NumFOCUS, and we had several organizations which support us and share the same goals as launch partners. It's not a formal structure. We don't have membership fees or anything like this. We currently run as a purely volunteer organization. A very close community is Scientific Python. It's one of the communities at NumFOCUS and it operates as a separate project of NumFOCUS. All of these folks develop Python for science, which overlaps with the goals of Open Source Science. Another finding is that there are a lot of organizations who want to advance science through open source in different ways. So we want to be a glue, the place where all of these forces can join.
And I think being at NumFOCUS gives us a very good vantage point. Pangeo is another example. And now I'll hand over to Tim to talk about the interest groups.
All right. So this was launched in July of last year at SciPy, and we launched the first couple of interest groups in November of last year. We have three verticals: chemistry and materials, life sciences and healthcare, and climate and sustainability. We also have a couple of horizontals. One, launching in June, is on reproducibility. The other is about the idea of creating a map of science: an interface that will allow you, in your area of scientific interest, to find the open source tools that are being used, the research that has made use of those tools, and the people behind both. So it gives you an instant starting point to connect with the ecosystem.
People have been gathering and getting to know each other. A big part of what we do is build community, providing a space for people to exchange experiences and identify pain points. We're hoping to move to an actionable agenda, an action plan, over the summer with these early groups, and life sciences and reproducibility are launching within the next few weeks. We have done the basic setup. There's a Slack for the first four dozen collaborators. We're launching a newsletter this month. There might be some kind of forum or Discord coming for the wider community of people who are interested in this initiative. Structure-wise, there is a steering committee forming, with some luminaries in that group, that is going to provide guidance, and it is set up to also support additional verticals and additional horizontals in the future.
We have about 40 people signed up who want to either start or join an interest group, and there are some very interesting potential topics bubbling up where people want to get engaged. So we're starting to vet that list and find opportunities for people to plug in. In principle, once we're a little bit more organized, it's supposed to be very open. Anyone can attend a call, maybe join a group to collaborate. We want to share everything publicly and work in the open. That's the approach.
So, partnerships. What Tim described is bottom up. Bottom up, we stand up interest groups in verticals by science, and horizontals by concern, like reproducibility, which everybody cares about. You have two scientists, you have five ideas of reproducibility. As computer scientists we know how to solve it, but it's a socio-technical problem. You have to convince scientists, you have to find what works for them. It's not just having immutable infrastructure. That is not what they like to use.
Now, top down, we want to do partnerships with as many allies as possible. Some specific ones. We're at NumFOCUS. NumFOCUS itself is a 10-year-old established non-profit in this space. Everybody knows its tools, but not everybody knows NumFOCUS. Then there is the Scientific Python community, which has really deep roots; Jupyter comes from it. CZI, the Chan Zuckerberg Initiative, is also very active. Again, this is something you see once you start working in the space. Their mandate is generally to fund open source software for healthcare, but because OSS for healthcare sits on top of the whole data stack, they effectively support the whole data science stack. And they give specific open source grants. Anybody can apply for a grant to do basic work on NumPy and on other things like image viewers and so forth.
They also do a lot of interesting work where they look at citations and the open source tools used for papers, and we'll talk about it later.
We also collaborate with OSPOs, open source program offices. Specifically, the Sloan Foundation funded six academic OSPOs. One of them is at the University of California, Santa Cruz, and it happens to be near us. They're actually one of the leading OSPOs because they stand on top of CROSS. Emily Lovell gave a talk about it: the Center for Research in Open Source Software. CROSS itself was founded because Ceph started at UCSC; they donated to UCSC and established CROSS. It's a 10-year-old center. They have a very effective program where they show how open source in academia brings money to the university. If you look at technology transfer, they demonstrate that the IP developed for technology transfer with open source actually brings money to the university. So they very systematically prove the viability and importance of open source in academia. I think it's a very good example of how an open source center at a university can coexist with all the university bureaucracies and succeed and thrive by being very focused on what it does and showing the impact of open source.
So we want to partner with OSPOs to look at how open source science works in the small. The university is a kind of time machine. What is happening at the university today will be happening in the real world in 10 years. It's a lab. The research being incubated and spread there will be commercialized later. It's also important to understand how different sciences interact, and the university itself is very siloed. A very interesting finding is that when they place an OSPO in a university, they don't even know where to put it. Each department has its own thing. You cannot put an OSPO in a department. So, where do they put an OSPO in a university?
Any guesses? Yeah. Perfect. You got it. The library. What else serves the whole university the way a library does? So in some universities, like Carnegie Mellon and Johns Hopkins, they put it in the library. That's just an example of how hard it is to organize a place like a university, and it's an example of what happens in science. I think open source is actually breaking all these barriers. A lot of them are communication barriers. And obviously, you need to do a lot of field work. You need to engage people and talk to them. You literally want to go door to door and knock on the doors.
But let's say you just want to study where they use open source. You want to go to GitHub and find all the repos associated with the university. It's actually not easy. It's a research project in itself, because people commit with their own Gmail addresses. You cannot readily use the API to query by email, and even if you can, you will not find them by domain, because people use their own emails. So even a simple ask like "give me the open source for this university" is not easy to do bottom up. We find that a lot of this is not a data mining question. You can slurp up the whole of GitHub and put it in Sourcegraph or something like that, and you will find 100,000 abandoned projects and not know what to do with them. The real question is who is actively using open source for some current science, and you need to answer it by starting from the top projects, and you need to talk to people to understand what the top projects are.
We have a bunch of other partnerships. One other thing I want to mention is the Amsterdam Declaration on Sustainable Research Software. We learned about it through CZI and NASA folks going there. At first I thought this was going to be just some kind of policy meeting, but it's actually extremely useful.
It was several dozen organizations who give money to open source, including essentially the ministries of innovation of different countries, such as Japan, European countries, Brazil, and Indonesia, and cross-EU consortia which give money to innovation. What is happening is that these people want to foster open source based innovation, but they don't know how to do it. So the organizers, the Netherlands eScience Center and the Research Software Alliance, essentially said: let's put a set of principles down to educate the funders. One of the things we got in there is that if you want to have sustainable research software, you need to ensure it has a viable and thriving community around it. To a lot of funders, that is news. So we wanted it codified in this declaration, and then the declaration issues a set of recommendations on what to fund. So now you should be able to ask for a community manager in your grant application, and it should not be looked down upon. It should actually be encouraged by the funders themselves. They should encourage applicants to include things like community managers.
We hope that once this is adopted and signed by Dario at IBM and other top folks, hopefully ministers of innovation in different countries, it will become an established best practice. You will not have to fight for these things; you will be able to point at it. It will also be very easy to explain what we're doing by pointing at the Amsterdam Declaration. So this is a very useful tool. We're at the table drafting it, and we're going to use it very actively. I think it will help us explain what OSPOs do and what open source organizations do. And the funders should be encouraged to use it to give money to people in our community to do their work. And now Tim will talk about our meetups.
Yeah, just really quick.
We had a first meetup at UC Santa Cruz in February, and we're starting monthly meetups this summer, probably next month, to give people a chance to learn about open source science, to learn what people are interested in and what they are already doing or working on, and to get them involved in this initiative. Those will be hybrid if we are traveling and virtual if we're not, so that people can tune in from wherever they want.
Yeah, and the final project I want to mention is the Map of Science. This is something we want to build. We want to get to a world where scientists use open source to solve the hard problems facing humanity. But you cannot go somewhere unless you know where you are. If the question is where we are with open source in science, we cannot answer it. There is no easy way to say that these areas have a lot of open source and are thriving and these areas are not. We have anecdotal evidence, like astronomy and biology, but we cannot quantify it. So we want to build a map of science, which will essentially have an ontology of all of science and all of the open source ever in existence on GitHub, GitLab, PyPI, anywhere. We want to be able to click on an area of science like lung cancer and find all the open source used in this space, all the papers from the science side citing this open source, the data sets, the machine learning models, and the groups of people. If you're a scientist, you want to ask basic questions. Which open source is my group using at the university? My colleagues at another university: what open source are they using which I'm not yet using? And let's look at scientific papers: give me all the OSS used by these people.
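As a rough illustration of the questions above (not the project's actual design, which the talk describes as still to be built), a map of science can be thought of as a set of links between areas, papers, groups, and tools. The sketch below assumes a toy in-memory structure; the class `MapOfScience`, its method names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class MapOfScience:
    """Toy in-memory map of science. Every name and record is hypothetical."""

    # area of science -> open source tools used in that area
    tools_by_area: Dict[str, Set[str]] = field(default_factory=dict)
    # paper identifier -> tools the paper cites or uses
    tools_by_paper: Dict[str, Set[str]] = field(default_factory=dict)
    # research group -> tools the group uses
    tools_by_group: Dict[str, Set[str]] = field(default_factory=dict)

    def add_paper(self, paper: str, area: str, group: str,
                  tools: Iterable[str]) -> None:
        """Record one paper and link its tools to its area and group."""
        tool_set = set(tools)
        self.tools_by_paper.setdefault(paper, set()).update(tool_set)
        self.tools_by_area.setdefault(area, set()).update(tool_set)
        self.tools_by_group.setdefault(group, set()).update(tool_set)

    def tools_in_area(self, area: str) -> List[str]:
        """All open source used in one area of science, sorted by name."""
        return sorted(self.tools_by_area.get(area, set()))

    def tools_i_am_missing(self, my_group: str, other_group: str) -> List[str]:
        """Tools the other group uses that my group does not use yet."""
        mine = self.tools_by_group.get(my_group, set())
        theirs = self.tools_by_group.get(other_group, set())
        return sorted(theirs - mine)


# Made-up sample data, for illustration only.
science_map = MapOfScience()
science_map.add_paper("paper-1", "lung cancer", "group-a",
                      ["numpy", "scikit-learn"])
science_map.add_paper("paper-2", "lung cancer", "group-b",
                      ["numpy", "napari"])

print(science_map.tools_in_area("lung cancer"))
print(science_map.tools_i_am_missing("group-a", "group-b"))
```

The hard part the talk points at is not this lookup but filling the dictionaries: naming the areas the way each discipline accepts, and linking papers to the software they actually used.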
So these are very simple questions you would want to ask. If you go to arXiv.org and click on a paper, you want to say: where's the Slack where I can ask about the open source used in this paper? I want to install it. I want to ask some questions. This is very hard. You cannot do this right now at any scale. And I think the reason it does not exist is, again, that there is no social network of scientists like Twitter or LinkedIn. It's very fragmented. So we'd like to build this. CZI is going to fund the first workshop for this in October, during the Linux Foundation member meeting in Monterey. It will be top folks in bibliography, meta-research, graph mining and so forth. We're looking for people who can contribute knowledge, and also infrastructure resources: folks like Sourcegraph, or folks doing supply chain management. They have a lot of tools for this, like following dependency graphs of software. So we are actively looking for folks who can contribute to this idea. If you know anybody, if you have ideas, share them with us. It's something we want to build and have as a tool at NumFOCUS. Everybody should be able to go there and find where they are in science and open source, find other people working on the same thing, ask questions, and add their own projects.
Cool. Yeah, we have a growing list of people coming into our network, either as collaborators or as people who have expressed support for the mission. This is just an excerpt of some of the organizations that are popping up on our radar. We're very eager to connect with existing people, organizations, and communities that work on related efforts. So again, if you have leads on who this might resonate with, please point them our way or make an introduction.
And how you can stay involved, last but not least. At the least, follow us. We're on Medium.
There's a newsletter that's launching this month. And if you want to get involved, talk to us. The interest groups are generally open. We're still setting up and need to get a little bit more organized, but in principle they're open. If you're a scientist, or you have scientist friends, you can also start by looking at what open source is being used in your space and how that might be improved going forward. At this point we're just growing our network. So it was very helpful to come here to Vancouver and meet a lot of people who are very well networked and who will hopefully give us some good introductions to some very cool neighboring networks. We're at opensource.science, Alexy and Tim. Email us, contact us. And now we're going to open it up to questions and feedback.
Okay, talk to me about how you're working with LF in spaces like LF Energy. They're doing a lot of work there on some basic science about energy transmission, distribution, and generation. I was in a presentation yesterday about Sandia Labs and lithium-ion batteries. They have a huge data set. It's the biggest data set that describes battery degradation across different battery types, and they help us. So are you connecting?
Yes, we're starting. I met a few people from the Linux Foundation research team. I'm hoping to still meet someone from OS-Climate. If not, then we're going to follow up. We connected before on LinkedIn.
Good for you.
And specifically for Sandia, I can add: yes, I have met folks from Sandia. Again, this is the reason we come here. We're just starting out. We do not have a formal collaboration. What I'd like to figure out is how NumFOCUS and the Linux Foundation work together. We want to be the glue between LF and NumFOCUS in this specific area. There is no reason for artificial fragmentation. Most of the Python data stack is at NumFOCUS; the PyTorch Foundation is at LF.
There is no reason for us not to treat it as a whole. Scientists do not care where each project is homed. As for the data sets specifically, we need to talk more and establish where things are. Now that I've learned about the battery data set: at IBM Research there are folks who do exactly this, and they want to run hackathons on cleaning up people's data sets. And Sandia has tools to do it. So I think this is a perfect match. I'm going to take it back to Zurich and tell people. So this is one example. But what should happen organically is that we set up an interest group in open source science for this, if you know folks who can lead it. And again, the group does not have to belong to one organization or the other. I'd like open source science to be a glue. Once people know that there is a monthly call of this group, this is where you learn about data sets, this is where you learn about open source. I hope that will become a discovery mechanism. And once we have this map of science, you can put it in there. So hopefully we'll increase the discoverability of this stuff.
So there are a few other projects that I've seen that aim to make an ontology or a map of science. I don't think any of them have that bi-directional integration with the open source aspect of the science that exists. Do you plan to reference or collaborate with any of those projects out there? I could try to name some, unless you already have them.
Absolutely. Yes. We definitely want to do that. I think having the specific goal of finding open source tools, data sets, and communities is a very good focusing function for us. Because, again, you have three ontologists, you have 10 ontologies. You can argue forever. Do you even need an ontology?
Some people say we don't need an ontology at all, because Yahoo lost to Google. You have a search box. If you know what to look for, you can search for it. So some people question whether they need an ontology. I think we do need an ontology, or at least some representation, because if you're an administrator at a university or an executive at the Linux Foundation, you want to know what's going on across the whole thing. You need the bird's-eye view. You need to see which sector is growing in the heat map. So you do need some visual organization of the whole thing that you can drill down on. How it is done should be decided functionally. But it's very important that it's done correctly. For instance, I learned that in chemistry there is materials science. You would think everything is a material, so materials science probably includes drugs. No. If you think that materials science includes pharma, you're wrong. And if you misname it, you will lose the trust of people in each of these fields. So the ontology should be correct. Scientists should find themselves in the ontology by the names they accept. It's very important that the ontology comes from the scientific world. That's why we need the interest groups and verticals, so they can name their things properly. Once they name their things, we can go and find all the open source in them. It has to go hand in hand. But I'd like to know about all of these projects.
So here at the Linux Foundation, we are building an infrastructure to look through all of open source and understand where our collaborators are and all of the different things that you've described as what you need to find. What we don't have is that overlay of science on top of it. So at some point, a collaboration between us should probably happen, just to figure out how to overlay that.
Because we will have all the data pulled together and be able to make use of it. So I think in October or November, come see me.
Okay, sounds awesome. We saw the LF AI landscape in the Linux Foundation, so we definitely want to overlay with this.
Yeah, I'm talking about down at the developer level, across all the open source projects. Understanding completely what's happening out there and being able to tie that back through everything is what we're building behind the scenes. That kind of telemetry is what all of our community members are looking for in regard to their projects, to help guide them.
Sounds great. Science can be one use of it, and people can ask different questions. So yes, we would like to be one of the layers of all that.
All right, come talk to us.
Sounds good.
First off, I think this is such a cool project. I know our office, the UC Santa Cruz OSPO, is really excited to be working with you guys, so thank you for the kind shout-out as well. My question is about the people aspect of everything. I know you've been talking to lots of different scientists already. I'm curious, very anecdotally and by discipline: who's really excited about this, and who sees obstacles to this in their discipline? My experience in research is that there are some fields where open source is already the norm, and other fields where it's something to aspire to, but they have no idea how to get there. So I'm just curious about your experience with that so far.
Tim, do you want to take this question?
Yeah, I can start. We have climate and sustainability and chemistry and materials going. I wouldn't quite say up and running, but they've met a few times. I think we're still figuring out what we're going to tackle first.
Generally, the energy is very good. People are coming to this with lots of ideas for where things could be improved. It's like they're scratching their own itch. You don't have to push. It's more about channeling it now and coming to a concrete, actionable agenda. Life sciences we don't know yet, because that's about to launch. We'll see how that works out. I think in principle, again, people are interested, but they might have other restrictions in terms of what can be shared and what is proprietary. So we'll figure that out. Alexy?
Yeah, I can add a few things. It's very interesting. If you look at every individual scientist, and I'm generalizing, they're super smart, maybe smarter than an individual developer, but collectively they're behind. Again, pardon my exaggeration, but why is that? It's a baffling question. Scientists are very smart. Why are they behind in open source? The reason, at the macro level, is that science is a church. Science is a trust network where trust is bestowed by people on other people. Open source is a meritocracy. I don't care if you're a penguin in Antarctica: if you're committing good code and everybody likes it, you're good, you're accepted. So a lot of this is actually about communication. If you look at who the people are who succeed in introducing open source to science, we want to find them. You are an example of this. You guys are in the lead. Usually you find that this is a very interesting group of people. They're able to navigate the bureaucracy of a university, which is an enormous minefield, and they're able to bring open source there. So you need to learn from them. An example: chemistry. Chemistry is extremely fragmented.
I spent a whole year looking at it, and I'm probably still behind where I started. They have a lot of machines. They're very focused. This is a very typical example. They're very narrowly focused on finding new materials and new drugs. They don't care about software. They don't care about big data or anything like that. They want to find the material. Everything is a means to an end. They're also very niche. Colleagues in chemistry may provide data to them, and they feed data to others, but they don't want to build data pipelines. So for the people who are actually successful at this, the way communication works is that they pose a problem as a computer science problem to their friends who are professors in computer science, and those people popularize it among computer science students. So there is an affinity. Without that, they reinvent the wheel in chemistry because they don't know what to do. In the departments where it works, there are these lateral connections, and there are usually very active young faculty who have overcome a bunch of adversity and know how to fight this. So I think our goal as an initiative is to find these people and elevate them, invite them to speak and be examples in their field. And through the interest groups, to assemble them together and create a network effect. I think there is no other way but to painstakingly find them, discipline by discipline. It's field work. It's a people initiative. You can't just write blog posts and hope they will be accepted. And then once you find these people, you learn a lot. We just connected with one of the folks leading a lab at UBC this morning. Super inspiring, lots of learning. So that's what we should be doing.
Hi, it's so nice to see you here. This sounds really cool.
I have designed and facilitated interactive events for professional, technical, and scientific communities. I am hoping that, as you start convening and building a new thing, you will avoid one-to-many talking heads as your predominant event format, and that you will explore using Open Space Technology as your format when you gather. I've spent 18 years helping the digital identity space come together, and the reason we've succeeded is that nobody sets the agenda at the event except the people who show up that day. So it's just a friendly suggestion, offer, and encouragement.
Yeah, we're not doing those kinds of events yet, but we will. We will have larger gatherings and probably work meetings and things like that.
Absolutely agree. One thing we want to promote is that we want scientists to show us a Jupyter notebook so we can see how the science is working, or show us a robot. Today we saw a robot controlled by software. We want to see stuff. We want to see science done with open source, not only talked about. The best example is if you can show something which is working. If we can structure these meetings so we can demonstrate things and learn from each other, that will be the best way to do it. And we're definitely fans of the unconference format, where people show up and do the things which are interesting to them. Absolutely. You're welcome to help us facilitate it.
Any other questions? Thank you so much. Please connect with us, and we hope to see you at all our open source events.