The speaker is Fernando Pérez. He's at the University of California, Berkeley, and he's going to talk about Jupyter technology. So, Fernando, if you can share.

Well, thank you, Albert, for the invitation, and thanks to the organizers, Greg and the rest of the team, for inviting me. I'd like to talk a little bit about Project Jupyter, the computational tools that we build, and the project itself, from an earth science perspective.

I'll start with a little bit of a connection to CU Boulder. I actually am a graduate of CU: I did my PhD in physics there, and I did my postdoc there, so I have a lot of connections to Boulder and I'm delighted to be here. My co-director in the project, Brian Granger, was also a graduate of CU physics, so a lot of us are connected to the university. I also spent a lot of time procrastinating during my PhD, part of it in the mountains like everyone else in Boulder, and part of it writing code. In fact, some of the things I'll be talking about started at CU Boulder. The screenshot on the left is the very first release of IPython when I first put it out, and you can still make out my colorado.edu email address from the physics server. This was a little 259-line script that I put out in 2001 to let me use Python interactively: I wanted to be able to explore my code and my data interactively as part of my PhD, in a way similar to, say, Matlab, but with open source tools.

And I want to emphasize that from the very beginning these tools were a collaborative effort. When I put this code out, I quickly found that two other scientists had built similar things.
An oceanographer in Germany and a computer science graduate student at Caltech. They allowed me to merge what I was doing with their code and put out the first real working version of IPython that wasn't a toy, as the union of three open source tools. And that's how it started: with me telling my advisor that I was going to take an afternoon to hack a little bit, and that I would be back the next day to finish my dissertation.

Years later, we see announcements from teams as large as Microsoft saying they're building the Planetary Computer for studying environmental sustainability, and underneath it's largely Jupyter technology. So it's definitely been an interesting path, and some of that is what I want to discuss here.

One perspective, a little tongue in cheek but not completely, is to think of Jupyter as the operating system for interactive data science: for when humans are in the loop, trying to think about their data and their problems. I love the way the documentation team at the Turing Institute presented Jupyter to the community: it's a project that contains a human community, which is critically important, and that produces tools we're going to see a little more of in a minute. I think it's important to frame these problems this way, because a lot of the issues we have with scientific software, some of which will probably be discussed today, really go beyond the software itself; open source software is a lot more than code on GitHub. In Jupyter we've taken the time to understand that structure.
And I think it's useful to reason about these as separate layers. There are services and content: many people come to our tools because of what the tools enable them to do, not because they're interested in the tools themselves. Some of us do enjoy building the tools, but not everyone does. In Jupyter, for example, the software supports sharing notebooks online, sharing entire computational environments, and deploying Jupyter as a service for others in your community. All of these things are not the software itself, but rather the domain-specific content (data analysis, education, research, and so on) that people care about, and it takes effort to develop that layer of infrastructure to have impact in a community.

In Jupyter we've also gone in the opposite direction: not just providing services, but abstracting the ideas in our code into openly formalized and documented standards and protocols. The way the internals of Jupyter work was documented as a protocol and then implemented in languages like Julia and R, and by making these open standards, over 100 different communities have developed implementations of the Jupyter machinery in their own programming languages, so that they can reuse the ecosystem and grow in it with their choice of language.

And finally, the bottom layer of these ideas is neither software implementations nor abstract standards: it's the people who make all of this happen.
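To make the "internals documented as a protocol" point concrete, here is a minimal sketch of the shape of a Jupyter protocol message, following the publicly documented Jupyter messaging specification. The field values (username, protocol version) are illustrative; a real kernel client such as `jupyter_client` builds and signs these messages for you.

```python
# Sketch of an execute_request message per the Jupyter messaging spec.
# Any language that can produce and consume this JSON shape can join the
# Jupyter ecosystem, which is how 100+ kernels exist today.
import json
import uuid
from datetime import datetime, timezone

def make_execute_request(code):
    """Build an execute_request message in the documented wire shape."""
    return {
        "header": {
            "msg_id": str(uuid.uuid4()),
            "msg_type": "execute_request",
            "session": str(uuid.uuid4()),
            "username": "user",        # illustrative value
            "date": datetime.now(timezone.utc).isoformat(),
            "version": "5.3",          # messaging protocol version
        },
        "parent_header": {},           # filled in on replies, not requests
        "metadata": {},
        "content": {
            "code": code,              # the source the kernel should run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
        },
    }

msg = make_execute_request("1 + 1")
print(json.dumps(msg["content"], indent=2))
```

The key design point is that nothing here is Python-specific: the message is plain JSON, so a Julia or R kernel only needs to speak this shape to plug into notebooks, JupyterLab, and the rest of the stack.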
And it turns out that we as scientists aren't exactly trained in large-team management or fundraising. I mean, we know how to write regular grants, but not large-scale fundraising across industry, or how to maintain, for example, the balance of power between corporations as large as Google or Amazon and single individuals contributing as personal volunteers. It takes a lot of effort to construct the infrastructure for governing and managing these kinds of efforts in a healthy way; there was no course on large-scale distributed community management in my particle physics PhD at CU, for one thing. But it's important to recognize the value of that community, and that, by the way, is where all the credit goes: anything I show today is really not my work but the work of these and many, many more people who contribute to the project.

When we think of these layers, it may be helpful to realize that the services and content built on top of your tools are where the impact happens. By generalizing the software into more abstract ideas, you allow the growth of an interoperable ecosystem of third parties. Yes, that perhaps takes away a little of your central power, and unfortunately the incentives of science tend to force projects into being overly self-branding. But I think that by encouraging the growth of an open, interoperable ecosystem, in the long run we're all better off, and hopefully we can change that culture a little. And finally, the existence of those people and that community gives you new blood, new ideas, innovation, resiliency, and the ability to adapt and evolve to meet diverse use cases you might not have thought of.

So from this perspective, Jupyter offers a set of tools, and let's now focus on the tools a little, because obviously they matter.
These tools help you tackle what I hope is a rough cartoon of the life cycle of a research idea. We explore data, we explore an idea, we explore an algorithm. Typically we start on our own; eventually we end up doing teamwork, since single-author papers are mostly a thing of the past, at least in the kinds of fields I think are relevant to this audience. Eventually, if that little idea is really going to work, you probably need at some point to do large-scale runs on HPC and the cloud: on supercomputers, in large cloud-based analyses. And then you publish, communicate, and teach with your results. This is the rinse-and-repeat cycle of science in a very idealized form, and hopefully Jupyter, this operating system, offers little bits and pieces at all of these points that you can build with.

I imagine at this point most of you have seen the Jupyter notebook. It's a web-based tool that lets you combine code, and the results of that code, with narrative in Markdown and mathematics, into documents that can be shared and reproduced. But that interactivity, which for example lets you write just one line of code and get a little widget with controls to animate the parameters of a model, is exactly the same infrastructure that can be used to build richer graphical user interfaces oriented towards research use cases. This is an example of a project led by folks at CU and others, presented at the EarthCube meeting last year: using those same interactive widget tools, they built a user interface for a project called BALTO that scientists can use immediately in their own workflow. It's not a separate standalone application; it's a modular library call that produces the right UI for you to continue your analysis.
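The "one line of code gives you a widget" idea can be illustrated with a simplified, pure-Python sketch of the inference step behind `ipywidgets`' `interact`: look at a function's keyword defaults and abbreviations like `(min, max)` tuples, and map each one to a control. This is an assumption-laden toy, not the library's implementation; the real `interact` builds live browser widgets from the same kind of inference.

```python
# Simplified sketch of interact-style widget inference: each keyword
# argument (or abbreviation passed by the caller) is mapped to a control
# description. The real ipywidgets library renders these as live controls.
import inspect

def describe_controls(func, **abbreviations):
    """Map each parameter of `func` to a widget description."""
    controls = {}
    for name, param in inspect.signature(func).parameters.items():
        spec = abbreviations.get(name, param.default)
        if isinstance(spec, tuple) and len(spec) == 2:
            # a (min, max) tuple becomes a slider over that range
            controls[name] = {"widget": "slider", "min": spec[0], "max": spec[1]}
        elif isinstance(spec, bool):               # check bool before numbers
            controls[name] = {"widget": "checkbox", "value": spec}
        elif isinstance(spec, (int, float)):
            controls[name] = {"widget": "slider", "min": 0, "max": 2 * spec}
        else:
            controls[name] = {"widget": "text", "value": spec}
    return controls

def model(frequency=1.0, damped=True):
    ...  # in a notebook this would compute and plot something

print(describe_controls(model, frequency=(0.1, 5.0)))
# {'frequency': {'widget': 'slider', 'min': 0.1, 'max': 5.0},
#  'damped': {'widget': 'checkbox', 'value': True}}
```

In a real notebook, `from ipywidgets import interact; interact(model, frequency=(0.1, 5.0))` is the actual one-liner: the sliders and checkbox appear automatically and re-run the function on every change.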
At the other end of the spectrum is the use of these tools not for a single direct local analysis, but for very large-scale cloud-based analysis of one of the biggest datasets we have these days, the CMIP6 dataset. This is an example taken from Pangeo, a project that I'll mention a little later, also presented at last year's EarthCube meeting. There are notebooks that any of you can run: if you go to that URL, you'll find links to execute them on cloud infrastructure, access the CMIP6 data, and analyze it interactively. Part of the beauty of it is that we have generic protocols to represent data interactively in the richest, most informative way possible. For example, when you call up a dataset, what you get is not some static printout; you get a rich, clickable HTML view with a ton of informative detail about the structure of the dataset.

These are some examples of the kinds of extensions that have been built on top of our basic machinery. Today, the user interface the project is most actively developing is called JupyterLab. It takes the lessons of these interactive workflows and tries to give scientists a highly extensible platform for building custom tools that are not just notebook-based but incorporate other things. Data, for example, becomes a first-class citizen: it's not only about having notebooks, but about having potentially multiple views of data right there in the same interface. As you look around your data you may find an image, and you can open the image; data tables are viewable directly as data tables. And here is an example of something we've thought through, which is how to access a dataset in more than one way.
For example, if you open a particular JSON file that happens to encode geospatial information as GeoJSON, you can of course view it as text; GeoJSON is text. But if you have the right GeoJSON viewer, that same file can be viewed as shown below: instead of being seen as text, it's rendered with Leaflet, a JavaScript mapping library, so that you can view the data in what is probably a more informative representation.

These tools can extend JupyterLab for many new use cases. Here is an example of high-end 3D visualization being done right in the browser, developed by the team at QuantStack, a startup in France, and another example from the same company: a workbench for robotics, for interactive control and modeling of robots using ROS, the Robot Operating System. These have been put directly into JupyterLab.

The following is a short video I'll show for a few seconds of a tool called FlyBrainLab, developed in Aurel Lazar's lab at Columbia in New York. What you see here, once the video begins playing (there we go), is JupyterLab, but at the bottom is a 3D view of a fruit fly brain, and the neuronal circuits indicated there are actually simulated on the right as an electrical circuit. When the user clicks, those models can run on a dedicated, specialized cluster. And on the right, a browser of genomics and other relevant biological data has appeared, connected to the same models and the same datasets.
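To make the earlier GeoJSON point concrete: GeoJSON is ordinary JSON following a convention (standardized as RFC 7946), which is exactly why the same bytes can be shown as text in an editor or rendered on a map by a viewer that knows the convention. The coordinates below are illustrative.

```python
# GeoJSON is just JSON with agreed-upon keys: a "Feature" with a geometry
# and properties. A text editor shows the JSON; a GeoJSON-aware viewer
# (like the Leaflet-based one in JupyterLab) parses the same bytes and
# draws them on a map. Coordinates here are illustrative (Boulder, CO).
import json

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-105.2705, 40.0150],  # GeoJSON order: [lon, lat]
    },
    "properties": {"name": "Boulder, CO"},
}

text = json.dumps(feature)    # the "view it as text" representation
parsed = json.loads(text)     # what a map viewer does with the same bytes
print(parsed["geometry"]["type"])
```

The design lesson is the one from the talk: because the format is an open convention rather than a tool-specific one, any number of independent viewers can offer richer representations of the same file.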
So all of those are very domain-specific tools. Obviously the Jupyter team has no business developing a 3D viewer for fruit fly brain data, but by making this infrastructure available, those domain scientists can add their own tools, and they still have the rest of Jupyter: in the top center there is still a normal notebook that they can use for the rest of the workflow as usual. We aim to provide precisely that kind of platform for exploration and collaboration.

And here is something fresh out of the oven: a quick demo, in the next version of JupyterLab, of real-time collaborative editing and collaborative work with live notebooks and computation. Here, the two views are actually two separate browsers. They happen to be on the same machine, but they could be browsers accessing a remote machine, say at NCAR. You can see that the outputs are synchronized: code executes and the results appear in both; Markdown is synchronized as it's typed, text and math alike. And you can see here (I probably should have typed a little faster yesterday when I recorded this) how edits can be made in both places: down below, we're now editing in the second view and modifying the math, and the real-time preview in the first view quickly updates. If I change the parameters of the plot here and re-execute, the other view shows the updated plot. So it's a fully synchronized, real-time collaborative environment for science. It's coming, and you can play with it right now if you go to that URL.

Continuing this line of ideas: okay, after collaboration you want to run in production.
If any of you work at NCAR, you can go to jupyterhub.ncar.ucar.edu, where NCAR offers a JupyterHub that lets you log in to those resources. This one hits very close to home: part of my appointments and affiliations are with LBNL. A few days ago, Rollin Thomas, one of the scientists at NERSC, posted this first notebook on the system, and if you look here, he's printing the hostname of the machine. This is Perlmutter, the next-generation machine being commissioned at NERSC that's going to replace Cori. It's coming online; I think I got a calendar invite for the dedication of the machine a couple of days ago. So from the get-go, one of the biggest supercomputers in the nation will be outfitted with the Jupyter machinery, so that you can have all of this fluidity of interactive usage with the tools we've been discussing, but sitting on really high-end production hardware for large-scale simulation.

And finally, if that project where you had a good idea, collaborated, and ran in production gets somewhere, you're going to want to publish your work and probably teach it to your students, and these tools support that too. There's a tool in our stack called Jupyter Book that makes it very easy to take a collection of notebooks and turn them into a book. These are a couple of examples of the textbooks for our data science courses at UC Berkeley. When a student clicks on one of these books, it looks like a rendered web page with text and math and figures, but they can click on the button that says "open on the DataHub", and when they do, without installing anything, they land immediately on a hosted JupyterHub like the NCAR and NERSC ones we were showing earlier, but in this case in the cloud, not on HPC, and oriented towards students.
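A Jupyter Book like the ones described above is driven by a small `_config.yml` file. The sketch below follows Jupyter Book's documented configuration structure, but the repository URL and hub URL are placeholders, not the actual Berkeley deployment; the `launch_buttons` section is what produces the "open on the hub" button students click.

```yaml
# _config.yml — sketch of a Jupyter Book configuration.
# Repository and hub URLs are illustrative placeholders.
title: Example Data Science Course
author: Course Staff

execute:
  execute_notebooks: cache        # run notebooks at build time, cache results

repository:
  url: https://github.com/example-org/example-course
  branch: main

launch_buttons:
  jupyterhub_url: https://datahub.example.edu   # the "open on the hub" button
```

With a file like this and a folder of notebooks, `jupyter-book build .` produces the rendered website, and every page carries a launch button that drops the reader into a live JupyterHub session with no local installation.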
Having that kind of infrastructure allows us to teach very large-scale data science courses at Berkeley in a way that would be unthinkable if we were doing local tech support. These are pictures from a few years after we launched these courses, when we could still teach them in person. I finished teaching this last fall, and we had about 1,200 students in Data 100, obviously virtual. With these courses we're reaching about half of the campus, give or take; they're growing very rapidly, we're reaching a lot of people on campus, and it's pretty remarkable that we can do that.

These tools are also being used extensively in the earth sciences community. Here are two examples of textbooks on earth and environmental data science, from Ryan Abernathey and from Brian Rose at UAlbany. And again there are local connections to CU: Leah Wasser, whom I imagine many of you know, leads at Earth Lab the development of open educational materials for earth sciences in Python that use all of this machinery. And our colleagues at NCAR have just teamed up with Brian Rose, precisely this person, to build a new effort, Project Pythia, that will develop educational materials for the earth sciences as well, with this stack. The example on the right is from a colleague at UC Berkeley who's a paleomagnetism expert.

Finally, if all of this works, we have a beautiful combination of tools and ideas that hopefully can serve your research needs. But one last point that I think is becoming very apparent is the importance of the infrastructure that all of this runs on.
Pangeo is a project that I'm sure many of you know. This is a quick interactive example of what Pangeo enables: a scientist is zooming into an image; something happens, it gets fuzzy, some colored bars flicker for a second, and then after a few seconds the image sharpens in the notebook. What's the big deal? Well, the big deal is that this is about 100 gigabytes of Landsat imagery of the state of Washington, and the little bars at the bottom are a whole distributed cluster doing the analysis behind that zoom: you have to run a lot of code at scale. Pangeo takes Jupyter, Dask, and xarray and assembles them into a stack that lets scientists concentrate immediately on their science rather than becoming cloud engineers at Amazon or Google or Microsoft.

In Europe, the European Commission at the JRC, the Joint Research Centre in Italy, has developed a large-scale JupyterLab-based platform for interactive analysis of geospatial data that is absolutely fascinating, called something like JEODPP; a really interesting development precisely along these lines.

We also have a project funded by EarthCube that tries to connect scientific use cases in the geosciences, on the Jupyter and Pangeo stack, with the development of new ideas. Anderson Banihirwe and Kevin Paul are both from NCAR, and so is Joe Hamman, though he's on leave now. So this is a collaboration with folks from NCAR, and we invite you to connect with us if you're interested.

And finally, I want to mention a new effort that we launched last year together with Ryan Abernathey from Pangeo, as well as folks from UBC, called 2i2c. It's a nonprofit organization that aims to offer precisely these kinds of tools as managed infrastructure, in a way that universities are not well suited for, but that is also not strictly a business.
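The core trick behind the Pangeo zoom demo described above is chunked computation: Dask splits a large array into chunks, computes a partial result for each chunk in parallel across the cluster, and combines the partials. Here is a pure-Python sketch of that idea (a chunked mean); the real stack does this with xarray and Dask over cloud object storage, not with Python lists.

```python
# Pure-Python sketch of Dask-style chunked reduction: split the data into
# chunks, reduce each chunk independently (in the real stack, each chunk
# runs on a different worker in the cluster), then combine partial results.
def chunked_mean(values, chunk_size):
    """Mean of `values` computed from per-chunk (sum, count) partials."""
    partials = []  # each chunk's work is independent: embarrassingly parallel
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        partials.append((sum(chunk), len(chunk)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

data = list(range(1, 101))                # stand-in for gigabytes of imagery
print(chunked_mean(data, chunk_size=7))   # → 50.5, same as the global mean
```

Because each chunk's partial result is computed independently, no single machine ever needs the whole dataset in memory, which is why a 100 GB zoom can be interactive: only the partial reductions and the final combine travel over the network.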
Industry is great, and we collaborate closely with many in industry, but we don't want this to be strictly a matter for industry and startups. So if you're interested, take a look at 2i2c.org and let us know if any of this can be of help to you. And I'll try to stop here, because I know you have a tight schedule today. Thank you so much for your attention.

Thank you, Fernando. Wow, and I thought I knew Jupyter notebooks and what they were capable of; you showed quite a bit of new technology there. We have time for one quick question, and I see one in the chat, which I'm going to read for you, Fernando, if that's okay: Google has a less-developed fork, Colab. What is the relation with the Jupyter projects?

So Colab is a fork. We met with that team before it was even called Colab. Colab is a fork of an early version of IPython, back when the name Jupyter didn't even exist and it was the IPython notebook. This was early in the days, and it implements many of the same ideas, but by this point the problem is that Colab has moved so far into the Google infrastructure and architecture that it really is impossible to share anything. So I think it's great that there is an implementation of these ideas from Google, and it is based originally on our code, but at this point I think of the Google cloud infrastructure, the Microsoft Azure infrastructure, and the Amazon infrastructure each as an operating system, as a proprietary computer. They've essentially ported IPython onto the Google APIs: it runs on their authentication system, data storage, and so on. So it isn't particularly easy to get anything back out of it. It enhances the visibility of these ideas in the community, but it isn't something we can directly benefit from.