We're going to have a little bit of a quick blitz of lightning talks. My name is Fernando Pérez; I'm one of the founding PIs of this institute, and I'm also a staff scientist up at Lawrence Berkeley National Lab. Jessica Hamrick, who's here with me, is a graduate student here at UC Berkeley; she's a core developer on the project and a member of its steering council. Damián Avila works at Continuum Analytics and is also a member of the steering council. Sylvain Corlay, who will speak afterwards, works at Bloomberg. And finally we have Peter Parente from IBM, who will give the fifth talk. We're going to try to give you folks five very fast ten-minute talks. We'll take only one question in between talks while we change speakers, but at the end we'll have time for more discussion and questions. So I want to introduce a little bit of the context of the project, but I'll leave it to them to really showcase what it is that we're doing. Jupyter is a project that tries to tackle this problem: the fact that in data science and in scientific research, if we think very, very schematically, the process at some point or another involves these separate stages. Exploration of an idea: you have some code, you have some data, you're playing with it. At some point you need to collaborate with colleagues. You often have to run your ideas, if it looks like they're working out, on large-scale resources, whether that's in the cloud or an HPC environment for those of us at places like the national labs. If you're successful, you end up communicating your work, whether that communication is in the form of a peer-reviewed publication as an academic, or a blog post or report for your customers. You still have to communicate your results, and hopefully you're doing that in a way that others can reproduce and build on.
And those of us in the academic environment teach students, but education actually happens in many, many contexts, not only in academia. Often this is a cycle we try to repeat, but where each stage is a separate problem. You toss scripts back and forth, you email figures back and forth, you write things by copying and pasting, you replicate a MATLAB prototype in C to run in the cloud. The question that we've posed ourselves is: should this be considered a single problem? Can we treat it in a unified manner, so that we build tools and environments that actually help all of this process be a single continuum where you can move back and forth fluidly? The project didn't start with such grand ambitions. In 2001, I was a grad student looking for any excuse to procrastinate on a particle physics dissertation, and I began writing basically a fancy interactive shell to run my code in Python, look at my data, and plot my figures. Skipping ahead very rapidly to 2016: IPython has evolved into a project that we now call Jupyter. IPython continues to exist and still works for Python, but the project also works for Julia, for R, and for many, many other languages. The architecture that we've built provides that same interactive, exploratory experience across, I think, 66 programming languages and counting, last we checked. And obviously none of this work is mine at this point; the credit entirely goes to the team, and we'll have a sampling of a few of them. This picture was taken yesterday at the lab, so most of the people in this photo are in the room if you want to meet them, talk to them, or ask them questions; in most areas of the project they know more than I do at this point. I'm now decoration on the wall. We also have a lot of other people who have contributed; we have a healthy open source community that contributes, and we try to engage.
We hope some of you will be the next contributors to the project. This could not happen without financial support, funding, and partnerships. Over the years we have received a combination of funding from foundations. Right now it's particularly important to acknowledge the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation, who paid for this place and who are also paying for a lot of our development, and the Helmsley Charitable Trust. But we've also received funds from other foundations and from companies that collaborate with us in various ways, whether by contributing people who, like some of the speakers you'll see later, actively participate in the project, or by contributing financially. And I very quickly want to show just two quick vignettes of what motivates us to do this. First of all, science. Science matters. A very long time ago, in a galaxy far, far away, two black holes met each other. Almost exactly 100 years ago, Einstein predicted that when two black holes merged into one another, the effects on spacetime would be such that spacetime itself would ripple, and that we could measure that if we were smart enough. Now, measuring a gravitational wave is extremely tricky. These are ripples in spacetime that propagate just like light propagates as a fluctuation of the electric and magnetic fields, except in this case what fluctuates is the fabric of spacetime itself. You have to measure something with a sensitivity of about one part in 10^21, that is, 21 orders of magnitude. To give you a sense of scale, that is roughly equivalent to trying to find a person, not in a photograph of Earth, not in a picture of the solar system, but one person in the Milky Way. That's roughly the scale of the problem you're looking at.
So what scientists did is build two Michelson interferometers. These are L-shaped instruments with tunnels that are four kilometers long each, at 90-degree angles. One of them is in Hanford, Washington; the other one is in Livingston, Louisiana. Because they have these arms at 90 degrees to each other, with laser beams shooting up and down them, if a ripple goes through spacetime and spacetime itself stretches, the arms will stretch unequally, and you can detect that in the interference pattern of the laser light bouncing back and forth between the mirrors at the ends. Why do you have two of them? Because at this level of sensitivity, pretty much anything else produces a much stronger signal than the thing you're looking for. Somebody walks nearby, somebody drops a hammer, a truck drives by on the highway ten miles away, a small earthquake somewhere. So you put up two of them so that you can at least say the same hammer didn't fall in exactly the same way halfway across the country. And they did it. On September 14, 2015, before they had even officially turned on the instrument for its new phase of data acquisition (it was technically an engineering run), they detected a signal that was actually the ripple. This is the Hanford detector and this is the Louisiana detector, and this ripple is what two black holes do when they collide. It's an amazing feat of science: the light gray shading here is actually the numerical solution, computed on a supercomputer, of the general relativity equations for that phenomenon. Down here is the spectrogram of the signal; this is in the time domain and this is the time-frequency decomposition of that signal. If you look on the walls, you'll see pictures of a bunch of color maps.
It turns out that this plot is not only scientifically interesting, but it also ties back to BIDS, because all of the figures here were made with open source tools, and in particular all of these plots were made with a tool called Matplotlib that is part of the scientific Python community. And the new color map used here was actually developed at BIDS by two of our fellows; these are the prototypes of that color map. This is probably going to be one of the most cited images in the scientific literature in the next decade. Those were the sketches from the development of the color map, which is based on a very good perceptual model of human vision. A few days ago, one of the lead scientists on this project presented this work at LBL; it was probably the most fascinating and electrifying scientific talk I've ever seen. And the LIGO team actually published all of their work for you to play with and replicate, in the form of Jupyter notebooks. You can go to their Open Science Center and execute, as a Jupyter notebook, the exact analysis that shows this data. I should have plugged this in. Do we have audio? There's a notebook where this signal can actually be converted to audio. Did you hear that? That is the actual sound of two black holes converging into each other 1.3 billion light years away, created in a Jupyter notebook. Second, education. Because we're not only scientists; we're also at a university, and we care about education. Jess is going to talk a little bit more about education in the context of the university, so I'm going to bring up a little vignette of something new that was released just a few days ago by O'Reilly: a team of educators and technologists there released a new platform called O'Reilly Oriole, which is basically a way of presenting tutorial and computational material online for education.
Peter Norvig, who's the head of research at Google and a very famous researcher in artificial intelligence, is famous for writing these amazingly interesting notebooks that describe computational problems and walk you through the analysis and the construction of the solution. You can download them; they generate the most traffic on our servers. But what became available last week is a platform O'Reilly developed where that same content can be tied to a video that narrates the original notebook, in synchronization with the code execution, together with a video of the author. So it's basically a way of delivering a live, executable lecture, with content and a video that is automatically synchronized to the actual text. There's a table of contents here, and as you move up and down, the video automatically synchronizes with the narrative. And you can execute the computation: this is on the O'Reilly website, but you can click each code fragment here and actually execute it, and you can modify it. So you basically have the delivery of the lecture together with the text and the computation, in a way that you can execute by just opening a web page. So those are two quick vignettes and examples of why we're trying to build what we're building, and hopefully, by having a quick look at a collage of other ideas and other uses, you'll see the kind of perspective that we're trying to bring to data science. Ask us questions; we'll be here for a while. Thank you, and I'll pass it on to Jess now. The name is not an acronym, but it's inspired by open languages used in data science: Julia, Python, and R. It also helped that by misspelling the name of the planet, the search space and the domain names and everything were available. It took us a long time to find a good name, and eventually we converged on it for those reasons.
And the "py", even though the architecture now applies to all programming languages, is a nod to Python, the main language of the project and the one we built first. Is this loud enough? Do people hear me in the back? Great. OK, so hi, I'm Jess. As Fernando mentioned, I'm a grad student here at UC Berkeley in the psychology department, in a lab called the Computational Cognitive Science Lab. What we do in this lab is construct computational models of human behavior, run behavioral experiments (frequently, though not always, online), and then compare the results of those models to the results that we get from our experiments, as hypotheses for why people act the way that they do. What I'm going to show you is just a few examples of how I specifically use the Jupyter ecosystem in my research, and how it plays into the things I need to do in terms of creating models, running simulations, doing data analysis, and so on, plus a little bit about education at the end. So I'm going to give you a really quick outline of the topics I'm going to cover; it's going to be a bit of a whirlwind tour. The first example I'm going to show you is how I use the notebook to prototype my simulation code and my model code as I'm developing my model. The second is how I use the notebook for data analysis, after I've run my model simulations and collected all my data online. I'm going to show you an example of how I use a tool called nbconvert for my data analysis as well. nbconvert allows you to convert notebooks into other formats such as HTML; it also allows you to run notebooks in place, so it provides a method for automatically executing the analyses that you need to do. I'm also going to show you how I use IPython Parallel to do a little bit of parallelization.
I don't do a lot of big data stuff, but I do use it a little bit. Then I'm going to show you not just how I do my data analysis, but other aspects of research as well. For example, there's a reading group I'm in this semester, and I've been coding up small demos in the notebook of the models that we're talking about. I've also been using the notebook to actually communicate some of the research that I've been doing. And then finally, I'll look at notebooks and interactive widgets as a tool for education, and in particular why I think the notebook format is really ideal for creating assignments and conveying knowledge, especially about data science, to students. So that's the really broad overview; I'll now show you some more specific things. The first thing I mentioned was how I use the notebook for prototyping. So here... oh, that is not sized well, sorry about that; let me see if I can resize this. Here I have a notebook for a project that I'm currently working on. If any of you do Bayesian statistics or Bayesian modeling: I have a Bayesian model here that is trying to capture both people's responses to a task and their response times. I have all my modeling code defined elsewhere, in a separate file that I work on, not actually in the notebook. But in the notebook I can run the model and take some samples (that will take a while to run, so we won't wait for it), and this graph here is what you see when you output it. I can change the parameters directly here in the notebook, run different versions of the same model, and compare them. This just gives me a really nice way to interactively play around with the different decisions that I might make in my modeling choices and instantly visualize the results.
So that's the method I take when I'm creating these models and setting up my simulations. Once I've actually run the simulations and collected data online, I also use the notebook to do the analysis. This is for a different project, in which we've been looking at how humans and robots can collaborate. I have all the analysis for this project in this notebook, and I can run the cells; a few analyses are defined elsewhere in separate files, but some of them, like some figures, I can run directly here in the notebook. These are the same figures that go in the publication, and the notebook is really an ideal place for creating these types of figures, because it allows you to iterate so rapidly and make the smallest tweaks that you need to. You don't have to rerun your entire script; you just rerun the same cell. Furthermore, while I do most of my analysis in Python, for some more advanced statistical analyses it may be more appropriate to use R, and you can use R directly in the notebook as well using the R magic, and you get the results of that analysis there too. So that's one particular way in which I might do data analysis. I might take it even a step further and try to make it a little bit more reproducible. Whereas this is one big giant file with all of the analyses for the paper, for another paper I'm working on, in which I have even more stuff, I have a whole directory full of notebooks. Here I have maybe 30 notebooks, something like that; each one contains a different analysis and creates a different type of plot or a different type of statistic. I can run each of these individually, or, as I mentioned, I can use a build system to run them all.
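The R magic she mentions looks roughly like this inside a notebook. This is a sketch: it requires the rpy2 package, the variable names (`df`, `rt`, `condition`) are hypothetical, and the `%%R` cell magic must sit in its own cell.

```
%load_ext rpy2.ipython

# ...then, in a separate cell:
%%R -i df -o fit
# `df` comes in from Python as an R data frame; `fit` is passed back out to Python
fit <- lm(rt ~ condition, data = df)
print(summary(fit))
```

The `-i` and `-o` flags are what move data frames between pandas and R in both directions.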
Here in this analysis folder, with all the same notebooks I was just showing you, I have a build system set up. It's called SCons; it's similar to Make, but it's written in Python. I have it set up to run nbconvert in the backend and execute all of my notebooks in the order of the dependencies that I've defined. So I can completely reproduce my entire set of analyses using this build system, using the Jupyter ecosystem. Beyond just running analyses in the notebook, some of the other tools that I use include, for example, IPython Parallel. As I mentioned, I don't really do very heavy parallelization or HPC stuff, but I do occasionally have some analyses that are too slow to run on a single core, which I might want to run on, for example, a 16-core machine that we have in the lab. Very easily, I can import from IPython Parallel, create a connection to the client, define the dependencies that my analysis functions need to run, and call those functions, and it will automatically parallelize them across cores. I won't go into the details of exactly how that works, but the point is that IPython Parallel is incredibly easy to use, and it makes it really easy for me to scale up my analyses in this way without putting a lot of thought into it. I also mentioned that I use the notebook not just for doing analysis, but in other research contexts as well. I have this reading group in my lab this semester in which we are going through a few different deep learning models and trying to understand the details of how they work. The most recent week we covered autoencoders, so I put together a quick demo of an autoencoder, whose goal is basically to take some input, like these images, and learn to reconstruct it.
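The IPython Parallel pattern she describes can be sketched roughly like this. The analysis function is a made-up stand-in for real per-trial work, and the cluster must already be running (for example via `ipcluster start -n 16`), which is why the parallel part is kept inside a function here.

```python
def analyze(trial):
    # Hypothetical stand-in for a real per-trial analysis (CPU-bound work)
    return sum(i * i for i in range(trial))

def run_all(n_trials=16):
    # Connect to a running IPython Parallel cluster and farm the work out
    import ipyparallel as ipp
    client = ipp.Client()               # connect to the controller
    view = client.load_balanced_view()  # schedule tasks across all engines
    return view.map_sync(analyze, range(n_trials))
```

Calling `run_all()` returns the same list that `[analyze(t) for t in range(16)]` would, just computed across cores.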
Down here I have a demo, which doesn't quite fit on the screen, but here at the top the autoencoder is getting noisy versions of the input (they're being corrupted), and down here at the bottom it's learning to reconstruct them. This is an animation that's running live in the notebook, and being able to prototype these types of demos this way makes it easier, for me at least, to learn these things and to get a better sense of how these models actually work. Similarly, I just did my quals in January, and part of preparing for my qualifying exams involved reading about 100 papers. I wrote a blog post for each paper that I read; some of them were long or particularly interesting posts, but it was a way for me to synthesize and understand the contents of the papers in detail. For a few of them I again went through and created a demo of the thing the paper was doing; this one is just a simple Bayesian model of a perceptual illusion. I just wanted to get a sense of why it worked the way it did, but then you can use nbconvert again to convert it to HTML, and now I have a nice blog post, with the demo in it, directly from the notebook that I wrote. And then finally, I promised I would talk about education a little bit.
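The nbconvert steps she mentions (executing a notebook in place, then turning it into an HTML page for a blog post) are one-liners at the command line. As a self-contained sketch, this first fabricates a minimal one-cell notebook (`demo.ipynb` is a placeholder name) so the commands have something to act on:

```shell
# Create a minimal one-cell notebook to demonstrate on
cat > demo.ipynb <<'EOF'
{"cells":[{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["print(2 + 2)"]}],"metadata":{"kernelspec":{"name":"python3","display_name":"Python 3"}},"nbformat":4,"nbformat_minor":4}
EOF
# Run the notebook in place, refreshing its stored outputs
jupyter nbconvert --to notebook --execute --inplace demo.ipynb
# Convert the executed notebook to a standalone HTML page (e.g. for a blog post)
jupyter nbconvert --to html demo.ipynb
```

In her setup, SCons simply invokes the `--execute` form on each notebook in dependency order.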
Education is something that's very near and dear to my heart, because I think this notebook format, which lets you have written prose inline with math, figures, code, and interactivity, is really, really ideal for assignments. You can give students instructions for what they need to do, maybe some sample code, and then an actual coding exercise, and you can make these coding exercises interactive. This is an interactive widget (I think you'll see a little bit more about these in some of the other talks): I can change these sliders and ask students to reason about why the model's behavior changes as I change this parameter, and ask them to give a response and actually interpret that in their own words. This type of document format is really ideal for these types of assignments in data science and in these computational fields, because it gives students the ability to actually explore these things and come to the intuitions on their own. So that's it for my tour of how I use the notebook in my research, and I'll take a question while the next person sets up. The R magic allows you to specify input variables and output variables. So you can take a pandas data frame from Python, pass it to R, and it becomes an R data frame; and then you can say, I want to take this R data frame out and turn it back into a pandas data frame. So yeah, it's really nice, because Python is sort of my preferred language, the one I'm most comfortable in, but sometimes I really do want to use R, so it's really nice to be able to seamlessly go between them like that. Okay, do you hear me well? Okay, nice. Well, my name is Damián Avila. I will be talking a little bit about the involvement of Continuum in the Jupyter ecosystem. I started working at Continuum two years ago, and it has been a great experience, because I am working essentially on the thing I like most, which is IPython and Jupyter.
I am also a core contributor, and I will talk a little bit about these things. This is my Twitter handle if you have some questions, or you can reach me when we finish the talk. Essentially, Continuum is interested in providing data analysts with tools that enable them to analyze and process data and build interactive things, and it does that not only by building open source technology but also by participating in several open source projects; in this case I will talk about the involvement in the Jupyter ecosystem. The first thing I want to highlight is the people, because the people are the most important thing, I believe. From the Continuum side we have a steering council member, we have core developers on the projects, and we have seasoned contributors. That is very interesting because we have a lot of projects, and these projects have interactions with the notebook in several ways, so these people are actively working on them and making the interaction with the Jupyter ecosystem deeper, and that's great. We also have occasional contributors to the project. So it's not only funding, it's people's time, which is very interesting, and one of the reasons I am working at Continuum. This slide is a little bit tiny, but essentially I want to highlight some of the more important projects inside Continuum, or that Continuum supports in some way. The JupyterLab thing is upcoming, and you will see it soon on master and probably in an upcoming release, I hope. We have been discussing a lot of things about that, but essentially, currently in the notebook you have a text editor and you have a console as well.
We started with the notebook view and then we added some other things, and now we propose, and are moving toward, an evolved way to make all these things interact: a richer but also simpler user interface from the user's perspective, that lets you have a file browser, the notebook, and an editor in an integrated way, so you can develop your workflow and your analysis. That is really interesting. There is a team working on that, led by Chris Colbert, and Continuum is supporting that effort. The other thing is the Anaconda Enterprise Notebook, formerly known as Wakari Enterprise. This was our solution for bringing users a multi-user setup for the notebook. The project started two or three years ago, when the notebook was just the notebook, so we first provided people with a project concept, where several users can share a space, and the concept of teams; then we also added a terminal and an editor. Then the Jupyter project evolved, and now Jupyter has those tools, but at that time it didn't, so that was our solution. Right now we have some further evolution of this product, essentially adding search capabilities and some social capabilities related to tags, ratings and so on, and we are also evolving this into something else, trying to catch up with the evolution of the whole Jupyter ecosystem. This is an interesting model; we can talk more if you are interested. The other interesting thing we have been working on in the last months was essentially some notebook extensions and server extensions that let the Jupyter notebook, from the Jupyter user interface, interact with conda and the Anaconda ecosystem. I guess people know about conda, but essentially conda is a package manager, and you can easily use it to install packages but also to create environments with defined packages, and that is really interesting.
So we first developed nb_conda_kernels, which is essentially an extension that brings a conda-aware kernel spec manager: it reads all your conda environments and creates, on the fly, a kernel spec for each of those environments, so you start just one server and then you can switch the kernel to the environment you are interested in. That's an interesting thing; maybe I can show a little bit of this. For example, let me see: here you can start a notebook with the kernel pointing to a specific environment, and execute Python and use all the packages that are in that specific, isolated environment, and that is interesting. Then we have nb_conda, which essentially brings the capability to install, update, and so on, packages from inside the notebook. I don't think we can show too much, because of the time and because I lost the mouse, but essentially here we add another view in the file tree which allows you to see which environments you have and to interact with the specific packages in those environments. That is really interesting. Then we have nb_anacondacloud, which essentially lets you upload your notebook to Anaconda Cloud. Okay, I will go a little bit faster. Then there is nbpresent, which is essentially the thing you are seeing right now: this is a notebook, this is a notebook view, but you can create presentations from it. This is funny, because I started with the slideshow machinery inside nbconvert, which is backed by Reveal.js; then I wanted to have a live slideshow, so I developed RISE; and then another great guy, who is here, his name is Nick Bollweg, took that idea and evolved it into a better presentation tool, which is this one, nbpresent, and it allows you to create slides on the fly. This is executable code; this is a cell, like you have a cell in your notebook, but you can show it in a slideshow way. This is great work from Nick.
Well, I think you probably know about the Anaconda distribution, which essentially gives you a lot of packages, probably more than 300, and the notebook and the Jupyter ecosystem are first-class citizens in that distribution. And then there are a lot of other projects where we have a deep integration between the tools in those projects, for example Bokeh, and how that project is executed or used inside a notebook. I don't have time for more, I think; thank you very much, and this is the link for Continuum if you have some questions. Thank you. So, can you hear me? I hope you understand my Americanized French. I'd like to thank Fernando for having me here, and Kevin Koy for this amazing space; every time we've been here, at the Sage Days and the Jupyter dev meetings, we had an amazing time working with you guys. My name is Sylvain Corlay. I'm a quant researcher at Bloomberg and also an adjunct faculty member at Columbia University and NYU, where I teach a course on mathematical finance, and in that context I use the notebook a lot. But the first reason why I started using and contributing to Jupyter is my job at Bloomberg. As you probably know, Google and Facebook are not the only companies doing data science. Bloomberg has been an important provider of financial data since the 80s, and we've been providing analytics and tools for analysis to our clients for a long time, so naturally, as quants at Bloomberg, the notebook became a very appealing tool as soon as we became aware of it. We have increasingly been contributing to it, and now there are two people at Bloomberg working on or contributing to Jupyter a significant proportion of the time: myself and Jason Grout. I personally have mostly been contributing to the Jupyter interactive widgets, to traitlets, and to other infrastructural components of the platform. Jess presented some examples with interactive widgets, but I wanted to go a bit more into depth on this subject.
I should probably zoom in; the keymap has changed. Anyways. So, in addition to providing the ability to interleave prose, code cells, and rich content, the notebook also offers the possibility to create simple GUIs using the Jupyter interactive widgets. In this example here I created a FloatSlider, which is a simple slider that you may interactively move in the notebook, or modify and interact with from your code. You can read the current value as you move the slider, and interactive widgets also implement a sort of observer pattern that allows you to register callbacks on changes to widget attributes, so that you may build more complex applications and custom behavior using whatever numerical functions you want. For example, in this case, in addition to the slider I create another widget that is a FloatText, and I link the slider's value with the text's value, so as I move the slider, the text widget also updates. Or, in addition to simply linking values, you can also add a transformation to the values, so that in this case, for example, I compute the square of the value input via the slider and display it in another text widget. The widgets are also based on an MVC architecture, where you may display multiple views of the same widget. The way it works is that you have a single model that is being synchronized between the Python FloatSlider object that I instantiated here and a JavaScript counterpart to that model, which can be displayed multiple times; every time you display it, you get a different view of that model in the notebook. In addition to the slider and the text, ipywidgets also provides a number of other interactive widgets, such as dropdowns, buttons, selection sliders, toggle buttons, color pickers, et cetera, et cetera.
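A minimal sketch of the linking and observer pattern he describes, assuming ipywidgets is installed (in a notebook the two widgets would also be displayed; the transformation here is the squaring example from the talk):

```python
import ipywidgets as widgets

slider = widgets.FloatSlider(value=2.0, min=0.0, max=10.0, description="x")
squared = widgets.FloatText(description="x squared")

def on_change(change):
    # Observer pattern: fires whenever the slider's `value` trait changes,
    # applying a transformation before writing into the text widget
    squared.value = change["new"] ** 2

slider.observe(on_change, names="value")
on_change({"new": slider.value})  # initialize the text to match the slider
```

For plain one-to-one linking without a transformation, `widgets.jslink((slider, "value"), (text, "value"))` does the same job entirely on the browser side.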
So it's a pretty rich set of widgets, but it's not enough, because it doesn't yet provide any ability to visualize data. In the example that Jess gave earlier, she was linking a slider with a plot that came from Matplotlib. ipywidgets is not only a set of widgets; it's also a framework upon which you can build other interactive GUI components. At Bloomberg we started a project called bqplot, which is open source and available on GitHub, and which is a complete plotting library entirely based on the ipywidgets (Jupyter widgets) infrastructure. It's a plotting system that implements the abstractions of the grammar of graphics as interactive Jupyter widgets, and it provides two APIs: one meant for high-level procedures, with reasonable defaults for common chart types, so that you can just call plot or hist and get what you expect from a plotting library; and also a lower-level description of data visualizations, for more complex interactive visualizations and dashboards. So let me give you some examples. In this case, using an API very similar to Matplotlib's pyplot, I call plt.plot with x and y, and also provide some options regarding the axes and the grid lines, and I get what you would naturally expect: an inline plot in the notebook, with a toolbar to perform panning and zooming, and a button to save the figure as a PNG image. We also implement similar things for other chart types, such as histograms, scatter plots, and many others. But more interestingly, every single item in these charts is actually a Jupyter widget that can be used independently of bqplot, or shared between different figures. For example, if you want to use the same range for the x-axis of two different figures, you can do this by just sharing the same axis and scale objects.
So in this example here, what I'm doing is describing a visualization in a more atomic fashion. I create a number of scales, scales being objects that describe a mapping between data and a visual quantity such as color, pixel position, or even the size of markers in a scatter. Then I create axes, which are representations of those mappings, and finally I describe the different elements that I want to put on the figure, which is my canvas, and eventually display the figure, where I have a number of elements, such as a scatter, lines, et cetera, that are overlaid. So every single component of the figure is a widget, and when I update the corresponding data, in the same way as you could update the data for a slider, you see the widget update live. And the same thing holds for any attribute of any component of the figure. For example, in this case of a scatter, I may decide that the x scale minimum is going to be four; while it was determined from the data earlier, now it's four, and if I switch it back to none, it's going to be determined from the data again. The same holds for any attribute, such as the axis labels, et cetera. So you may wire bqplot widgets with other widgets, because it's native to ipywidgets, and here I have a slider driving the rotation attribute of a scatter plot. And we have other, more complicated capabilities in bqplot, such as heat maps, where you can also do tooltips containing other figures and other bqplot elements, such as this example where I display all the countries of the world in a heat map where the colors encode the current GDP per capita, and where, when I hover on a cell, I see the historical GDP for the corresponding country with the name of the country. And I built that in about 15 lines of code here, and you don't even need to know any JavaScript to do this. So you can also use bqplot figures as input widgets.
For example, in this example, I created an interaction, a selection widget that is an interval selector, where the height of the pointer determines the width of the interval and the x position of the pointer determines the position of its center. So with this sort of interaction, you can very quickly select different sections of your chart and wire the selection, or the selected items, to other widgets or to other, completely ad hoc computation that you may have written in Python. So it's a really powerful tool if you want to zoom through your data with bqplot. Here I read the selected items on the lines, or you can change the interaction on the fly: in this example, I created a hand-drawn interaction where I can very simply change the data on the fly, and if you want to see what the result of your analysis would be, by cheating, when you just modify a point, or by seeing what it would be if the data were even worse than what you see, you can do this really quickly just using bqplot. Now, one thing that's really frustrating with classical widget interactions is that with a pointer you can only do one thing at a time. You can click at only one point on the screen, so you have at most x and y, a two-dimensional control. But video gamers know that there are other types of interactions you can have with a computer, and it happens that the browser has an API for gamepads, which we exposed in ipywidgets as a controller widget. So in this example here, I instantiate a controller, I'm prompted with a message asking me to connect my gamepad and press a button, and right away I see the state of the different axes and buttons of my gamepad displayed, and I can easily wire this to whatever data analysis I may have done. So now, you know, as data scientists you get to play with a gamepad.
And so, just for the fun of it, since I really have to hurry: there is another visualization library for the notebook, called pythreejs, that uses WebGL to perform 3D graphics in the notebook. It works really well, super fluid. And for example, we can use it if we want to visualize terrain elevation data. So here I use GDAL to download some terrain elevation data for the Grand Canyon and display it in pythreejs. So there you go. Unfortunately, the camera is not very well positioned. So rather than trying to iterate by changing the position of the camera manually in the code, what I can do is just control the camera position with the gamepad. So here, in like 10 lines of code, I wired it to my gamepad, which I can plug in. Sorry, is it here? Yeah, there you go. And now we almost have a flight simulator in the Jupyter Notebook, which is pretty awesome. And what I find amazing here is that this entire example, the toy flight simulator that we wrote here, takes about 20 lines of code. So you can do really non-trivial things using interactive widgets in the notebook, using bqplot, traitlets, and pythreejs. Thank you. So I'll take one question. So the question was, is this notebook available somewhere? Yes, it is. It's on Binder, so you can play with it live without having to install anything. Yeah, I'll send you the link. Evening. I'll give the disclaimer right up front: I have nothing as cool as that. I'm sorry, I have no gamepad. I'm Peter Parente. I work at IBM in the Emerging Technology Group. I'm gonna talk a little bit about some of the things we've been working on in the Jupyter community. We've been longtime Jupyter users. We actually use it with a bunch of customers, analyzing their data. We did a bunch of Watson engagements, analyzing the output of Watson experiments when the machine actually played Jeopardy!.
But more recently, we've gotten involved in the community, trying to give back, because at IBM we think Jupyter is important to us and to the broader community. So I'm gonna call this 10 new reasons to use Jupyter. If this is the first time you're seeing Jupyter, I guess it'll be 10 reasons to try Jupyter. I'm just gonna whet your appetite with a bunch of things that might be cool, might benefit your analytics workflow. And that's actually the first one I'm gonna talk about: why IBM thinks this is important. It's really about time to value; that's the way we like to approach it with customers and the customers talking to us. We, and our customers, are finding that interactive computing helps reduce the accidental complexity in the data science workflow. And a lot of the things I'm gonna show you are in that vein. I can sit in a consistent environment, use notebooks, use my language of choice, mix languages, not have to worry about swivel-chairing between tools, and get to deployment, dissemination, delivering the data product via a dashboard, a notebook itself, a slideshow, some insight, right? Jupyter, IBM thinks, our team thinks, is a nice platform that's headed in that direction, to help our customers get there faster and realize that future. So, some of the things we're directly involved with, and this is by no means the complete ecosystem of things going on around Jupyter. One of them, number two here, is Docker Stacks. It's actually a project under Jupyter: a bunch of highly opinionated, pre-built Docker images. Does everyone know what Docker is? Can I see a show of hands? Good amount; if you don't, afterwards talk to someone who put their hand up for me. But you can go here, you can docker pull an image like you just saw in the short video there, and you're up and running with the Jupyter notebook server with R, Python, Spark, Scala. There's a bunch of different flavors that we have out there. We're looking for new contributions.
We're really trying to decrease the barrier to entry to getting these things running. One of the kernels that IBM's working on: IBM has a big investment in Spark for big data analytics, and right here at Berkeley is where Spark was born. Apache Toree is its new name; it's actually an Apache project. It's a Jupyter kernel that combines Scala, Python, and R, and works with the Jupyter notebook. There's actually Scala code being written right there. And they're working right now on sharing data frames and data structures across those three languages. So you can mix all three in a single notebook, work in Toree, and use Spark. There's actually a Docker Stacks image, I should say, with that pre-configured. So if you're into Spark, using Spark at all, check it out. It's got all three of those languages pre-installed with the kernel working in the notebook. Another thing we found when using Jupyter was the proliferation of notebooks. Working with a team, you start doing ad hoc analysis left and right, and sooner or later it's like, where did I do that? Where was that code that I'd like to copy-paste? So we built a little extension; it's out in the Jupyter incubator. There are links for all of these at the end of the talk, by the way. It does full-text search across your notebooks. So you can actually search across all your notebooks; it will show you the name of the notebook, and it'll search inside the notebook too. And there are more capabilities I'll show in a minute, where you can actually reuse the code from one notebook in another, either importing it or actually inserting a snippet, and so on. Another big request from the people we worked with in our engagements is: it's great that I did all this work in a notebook, but now I want to stand up a standalone, or create a dashboard, let's say, a layout, for users that don't really care about the code that was used to arrive at the insights or do the analysis; they don't want to see that.
They want to hide it and get something like this, where there are plots, there are interactive widgets, ipywidgets or bqplot or whatever, lay those out in a dashboard layout, and now be able to share that notebook with other users so they can view the dashboard and just interact with the widgets without having to scroll through all the code. We have an extension out in the incubator that enables this. Actually, I'll come back to this one in a minute, because it does a little bit more than that, but just think of it as a dashboard layout. To go along with that, the topic of the night is lots of interactive plotting and widgets and things. We're actually exploring another area: can we work with other ecosystems of widgets? Has anyone heard of web components or Polymer? A few people, not as many as I expected, right? So it's a rising HTML standard, let's say, that we're kind of betting on; we're emerging tech, we're supposed to push the envelope. We think there will be an ecosystem of these web components that we want to be able to put into the Jupyter Notebook and actually have them inline. And you can see here, this is actually HTML, so it's sort of language-agnostic in a sense. This will work in R, Scala, and Python today. You write these declarative tags and you can pull in Polymer components, like this WebGL globe, and actually bind them to data. So you can bind it to a data frame; you can bind it to really anything. You've got to match the spec of the widget, obviously. But this is a core component: if you're gonna start thinking about dashboards from notebooks, well, you need interactivity, you need these types of capabilities; otherwise you've just got static images and plots and whatnot. So: dashboards within notebooks, within Jupyter; interactive widgets within notebooks, within Jupyter. Another part of that puzzle is, well, a lot of the time the ultimate deliverable is a standalone web app.
It's great to be able to ship someone a notebook and say, okay, you run Jupyter, install all this stuff, you can reproduce what I did. But for a lot of users, that's too high a bar for entry. It's great for us data scientists, not for the general public. So can we take this a step further? Can I take my notebook, do the layout (here's a very simple one), and then deploy it, which is what we're working on, to a standalone web server, and have it run as a standalone web application? So that's what this is demoing here. In a moment, I think it shows a more complex one, I hope, because Hello World's nice, but you'd like to see really fun things. I think it's demoing: if we make a change to the notebook, you can think about the workflow. Someone comes back and says, well, I want to look at it this way; okay, change a little bit of code, redeploy. It's running as a standalone web app again, right? And I think, yeah. Here's a much more complex example. It pulls in a WebGL globe; it pulls in some D3 visualization. We can also deploy full web applications, right? We're really pushing the envelope on what we can do there. Another customer request we get often: it's not only that I want to publish a web application, which is fine for certain use cases where applications have a short shelf life. People are going to interact with it, say, oh yeah, I understand what the data's telling me now, and then it kind of gets thrown away. But if you're really trying to put a full-on dashboard or web app into production, a lot of companies have dedicated software dev teams to do that. But what they're finding is, the work happens in notebooks: a notebook will be handed off to the software dev teams, who are told, go reimplement this, build REST APIs, build web APIs from this. So we're also trying to enable, and this is the worst demo ever, right? It's exercising a JSON web API through my web browser, so you just see JSON blobs as you go.
But we're trying to enable the ability to annotate notebooks, as I'm doing here, with a little comment syntax that works with R, Scala, Python, we've tried a bunch of different languages, to say: these cells are a web API, right? And then someone could code a web application and call this, right? So: notebook-defined web APIs, microservices. Code reuse, again, is one of the big problems we find when we're collaborating. Again, playing on this idea of annotations in the notebook: can we turn notebooks into reusable code modules? Can I import a notebook as a Python module? This one only works in Python today. And actually invoke functions in that notebook, right? So I could start sharing notebooks without having to think about it. First I'd have to turn it into a .py file, put it in a package, commit it to Git, and have my team pull from there; instead, we can actually just ship around notebooks and invoke functions that are defined in them. That's sort of what this is showing, my very slow typing here, but in a moment, hopefully, this will show something popping up with ipywidgets right inline, where that code is not from a Python module, but from another notebook. I'll just skip ahead so you don't have to wait. Oh, that's terrible when I skip ahead. Let's not do that; we'll go on to the next one. Anyway, IBM, clearly, you know, great, we're contributing to open source. We do have some hosted solutions in IBM Cloud that are pulling in Jupyter and beginning to embed Jupyter. One of them is called the IBM Data Scientist Workbench, very novel naming, right? It's a combination of Jupyter, Zeppelin, OpenRefine, and RStudio, all hosted together and kind of working together. It's used in IBM's Big Data University endeavor, with a lot of data science training and classes online. Another, this screenshot, is IBM Bluemix, which is sort of a cloud PaaS, a platform as a service, where they're starting to surface a notebook user experience. It is Jupyter with Spark behind it.
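The annotation idea described above can be sketched as a single notebook cell; this follows the comment convention used by Jupyter Kernel Gateway's notebook-http mode, where a cell's printed output becomes the HTTP response body (the endpoint and payload here are invented for illustration):

```python
# GET /temperature
# The comment annotation above marks this cell as the handler for
# GET /temperature when the notebook is served by a kernel gateway in
# notebook-http mode; whatever the cell prints is returned as the response.
import json

print(json.dumps({'temperature': 22.5, 'units': 'C'}))
```

Because the annotation is just a comment, the notebook still runs normally for interactive analysis; the web-API behavior only kicks in when the notebook is deployed behind the gateway.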
So if you want a big Spark cluster, you don't have to run it all yourself. If you can put your data in the cloud, you can pay IBM, you know, whatever the rate is, to get everything hosted for you, right? That was 10, but I've got one bonus, which is, like I said at the beginning, we're a tiny piece of the Jupyter ecosystem, right? I'm talking about things I have direct knowledge of because I'm involved in them, but people are doing amazing things with Jupyter. We very recently published a survey, this was last week; a colleague of mine wrote it up with input from other people, about the Jupyter project ecosystem, all the wonderful things going on, both in the core dev team, at companies, in academia, in industry, and so on. So check it out. I think there's something like 60-some-odd links at the end, the different things to look at around Jupyter. It was pretty cool; it was fun to write. And there are all the links. But you don't have to copy all those down, because this presentation is a notebook; it's a gist. Somehow I will make sure it gets there so you can just go through it and look at all of this at the end. Thank you.