Our next speaker is the Director of Research in the Physical Sciences at the University of Washington's eScience Institute, where his research is primarily in the area of data-driven astronomy and astrophysics. He is also incredibly famous and well-known for being a maintainer and contributor to many open-source Python projects, including scikit-learn, SciPy, AstroML, mpld3 and others. Please welcome Jake VanderPlas.

Thanks. It's really great to be here. As you said, I'm at the University of Washington. I'm an astronomer by training, and I've found myself in the last six months or so doing a bit of visualization stuff, which I'll tell you about. But today I'm going to be talking about a really, really nice open-source tool called the IPython Notebook. I'm used to speaking to crowds where everyone knows this and uses this all the time, so can I just get a show of hands for how many people actually use the IPython Notebook? Okay, so maybe a third of the room is familiar with this and uses it. That's good. I'm going to tell you why the rest of you should all start using this right now.

So just as a talk outline, I'm going to talk about the IPython Notebook itself for a little while, why I think it's important and why I think all of you should be using it. Then I'm going to go a little bit into the current world of Python visualization and talk about that a bit. And then IPython 2.0 was released just a few weeks ago and includes this new feature called the interactive widgets, and this is kind of a cool way to bring some interaction to your data visualization. So I'm going to talk about all those things, but we'll start out with just something about the Notebook itself.

So I say the IPython Notebook is visualization in context. I want to start out by showing you something; we've all seen this sort of plot before. This is the type of plot that's easy to pick on, right? It's a USA Today graphic.
I hope there are no USA Today people. But basically when a graphic lacks context, when we don't know where the zero is on the y-axis, it can be misleading. And there are all sorts of examples of this. On my Facebook and my Twitter, these sorts of things come across all the time. I guess people can really naturally tell the area of triangles and compare them in their mind. So this is a great way to visualize revenues and comparisons. But these are kind of cheap jabs, right? We can sit here and pick apart little graphics like that all day.

But I'm going to do something right now. I'm going to pick apart something that's a little more subtle. So this is something that was published just a couple of weeks ago on the 538 blog. I'm not going to pick on them too much because I think they do awesome work. This is talking about the Bechdel test, which is a test of basically looking for gender bias in movies. And it's a really simple test. It's really nice. For the Bechdel test, you have to meet some criteria. The film has to have at least two women. Those women must have names. The women must have a conversation at some point. That conversation has to be with each other. And it can't be about men, right? So this seems to be a really low bar to hit for movies, right? But if you look at this, let me go back one. If you look at this, even for current-day movies, barely 50% of them pass this Bechdel test, right?

So what 538 did is they did this interesting thing where they explored the Bechdel test and correlated it with the results as far as movie profits, right? And they found that actually, surprisingly, movies which pass the Bechdel test make more money than movies that don't. So I guess that's a good thing. We can be happy about that. The films which don't display a gender bias are basically better received by audiences. So this whole thing is an interesting study.
But why do I bring this up in the same list with the Fox News bar plot, right? Well, the thing is we don't know if there's anything wrong with this, right? I wouldn't say there's anything wrong with this visualization. But because we don't have access to the raw data, we can't say anything about that. We can't prove that to ourselves. If someone is interested in exploring the deeper correlations in this, the data is not available.

And this is exactly the point that a guy named Brian Keegan made in response to this article. He wrote this post called The Need for Openness in Data Journalism. And what he said, and this really resonates with me, is that journalists should subject themselves to the same reproducibility and openness standards as scientists. So we as scientists, I know we fail on this all the time, but we are doing our best to make sure that the data we're using, the processing we're using, the code we're writing is all available so that people can reproduce our results. And Keegan echoes something that I think a lot of us have felt from 538 and other data-centric news outlets, that this new brand of data journalism is disappointing in some senses because it's trying to do science without any of the peer review that holds people to scientific norms.

So this article was really nice. And he wasn't just sitting there and complaining. He actually wrote a response to this. And this is awesome. You can go to this site. He wrote an IPython notebook that summarizes all this. And then he actually goes and he says, start your kernels. The IPython notebook allows you to do some parallel data processing. He goes and he finds the data. He gives you scripts to download it. He analyzes it. He produces the results. It's just this huge long thing, I mean, there's no way I could even talk about all this. But he has all that stuff in there that just shows you exactly what's happening in this. And fortunately, he confirms what 538 was saying.
So there were some subtleties that he disagreed on. But basically, the overall message was correct. So that's good. That's encouraging. But he shouldn't have to go out there and dig out all that data himself. And to their credit, I lost one thing. Where is it? Oh, well. Yeah, so to their credit, anyway, 538 responded and put their data on GitHub. So now you can go to their GitHub page. You can download the data. You can download their scripts that do these things. And the hope would be that they learn from this and do that from the beginning next time, especially when they're talking about things like elections and climate change and stuff that people really get argumentative about. It's important to have the data out there.

So I showed you that example of the IPython notebook. And I called this IPython notebook putting visualization in context. And that's really what this does. The IPython notebook allows you to put explanation, code, data, visualizations, and much more all in one place. So for example, right here, this is an IPython notebook. You can click here and see that this is just written in Markdown, and it renders in the browser. You can put in code. And you can actually execute the code here. This is just a Python script that computes the Fibonacci numbers and then prints a whole bunch of them. You can do things like write paragraphs of text. You can use markup to do mathematical equations if you're into that kind of thing. You can write lists. You can embed static figures like we saw above. So it's really a powerful tool to put all these different media sources together. And as I showed earlier, you can even embed iframes with other web pages.

And more importantly, you can actually put code that generates figures dynamically in here. So let's say I want to plot 1,000 points instead of 100 points. I can add a lot more to the plot. So this is an executable document where a reader can download this and tweak the numbers and see what happens.
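
The Fibonacci cell mentioned above is not reproduced in this transcript. A minimal sketch of what such a notebook cell could look like follows; the function name `fibonacci` and the choice to start the sequence at 1, 1 are assumptions made for illustration.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 1, 1."""
    numbers = []
    a, b = 1, 1
    for _ in range(count):
        numbers.append(a)
        # Advance the pair: the next number is the sum of the previous two.
        a, b = b, a + b
    return numbers


# A reader can change this argument and re-run the cell to see more terms.
print(fibonacci(10))
```

Changing the argument and re-running is the same "tweak the numbers and see what happens" workflow described for the plot.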
It's no longer just static data visualization or a static web page. And the core visualization tool is this Matplotlib tool. I'm going to go ahead and open this. You can open the Matplotlib gallery, and there's just a ton of different graph types in here. I'll talk a little bit more about those later. But you can do things like load one of these examples. This is just copied from the website. And once we load that and run it down here, you get various visualizations. So it's an easy way to explore how to do all these things if you don't really have the chops to do it from scratch. There are a lot of resources out there.

The other thing is notebooks can be shared online. And this has been really, really important. There's a site called NBViewer, nbviewer.ipython.org. And it basically takes any IPython notebook. A notebook is just a JSON format. You can save it on GitHub. You can put it anywhere you want. NBViewer allows you to render those basically in the cloud and share them with anyone. So if you go to the NBViewer site, there's all sorts of stuff. There are different programming languages. There are whole textbooks that have been written in IPython notebooks. So these are executable textbooks, which is super cool. There's all sorts of other things in there.

And people are doing other things too, like, for example, blogging. I write this blog called Pythonic Perambulations because I like to think about Python a lot. And almost all of my blog posts are written in IPython notebooks. So this is one about understanding the fast Fourier transform. There's math in there. There's code. There's all sorts of stuff. And it's an executable document. So if anyone's curious about it, they can download the notebook that's the source of the blog post and go from there. It's really a cool format. And of course notebooks can even be viewed as slide shows. So that's what I'm using right here, the IPython notebook slide show.
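
To make "a notebook is just a JSON format" concrete, here is a small sketch that builds a notebook by hand with the standard library and round-trips it through JSON. It follows the later nbformat 4 layout, which is an assumption: the notebooks shown in the talk would have used an earlier version of the format, and the cell contents are invented for illustration.

```python
import json

# A hand-built notebook in the nbformat 4 layout (assumed here; the format
# in use at the time of the talk was an earlier version).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Fibonacci numbers\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serializing gives plain text that can be saved as an .ipynb file,
# committed to a repository, or handed to a rendering service.
text = json.dumps(notebook, indent=1)

# Reading it back recovers the same structure.
round_tripped = json.loads(text)
cell_types = [cell["cell_type"] for cell in round_tripped["cells"]]
print(cell_types)
```

Because the file is ordinary JSON, any tool that can read JSON can inspect or transform a notebook without running it.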
So anyway, what is IPython? The summary here is that IPython is tools for the entire life cycle of a scientific idea. It's a tool that helps you with exploration, that helps you with collaboration and sharing of your data. It helps you with publication and reproduction of results. So for any of the research that I do now in astronomy, I tend to write up summaries in IPython notebooks and share them with people. And it's a way to really make the scientific process more dynamic and more streamlined. And there are all sorts of other things there. Skipper Seabold, he's a stats guy who basically re-ran 538's famous 2012 election coverage in an IPython notebook. So you can find that there. And there's this whole gallery of interesting IPython notebooks that you can find online. So there's a ton of stuff out there that people have done with this.

The IPython architecture is something really interesting. When you launch this notebook, you have this kernel running on your system and that's running the Python, or running the code. And then that kernel has to interact and interface with your web browser, because the notebook is a browser-based interface. I'm just using Google Chrome right now. And that happens via this ZeroMQ messaging protocol. But the cool thing about this is that it leaves a lot of flexibility. So for example, right now I'm running a Python kernel for my IPython notebook. But there are kernels available in many other languages. Julia, R, Erlang, I have a whole list of them somewhere.

The other cool thing is that the client is browser-based, and we know in here how powerful the browser is. Lots of people spend a lot of time on getting interesting visualizations in the browser. So it means that you can do things like define some Python code that creates some JavaScript. So here's a little Python script that creates an interactive widget and it allows us to visualize factors.
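
The widget's source is not shown in this transcript. As a rough sketch of the Python half only, the kernel could compute the factors and serialize them as JSON for the browser to draw; the function name `factors` and the payload shape are hypothetical, and the D3 rendering side is not shown.

```python
import json


def factors(n):
    """Return all positive divisors of n in ascending order."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            # Record the paired divisor unless d is the exact square root.
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]


# Hypothetical payload that a browser-side script could receive and render.
payload = json.dumps({"n": 72, "factors": factors(72)})
print(payload)
```
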
So I'm gonna go to 72 because 72 has a lot of factors. And what this is doing is it's calculating the factors in Python, sending them to the browser and then rendering the result using D3. So this means that you can really start tying together all sorts of languages, all sorts of different tools in one place.

And I wanna show you a brief example of that. You might have heard of the Julia programming language. This is a new language out of MIT that a lot of folks are really excited about because it sort of combines the best of both worlds of compiled languages and dynamic languages. IPython allows you to do this thing called the Julia magic. So basically you use these percent signs and these denote magic functions. So this is going to call into Julia, and it's going to import some Python packages in Julia after starting the Julia kernel. And then we put this double-percent %%julia and this says all the code down here is actually Julia code, not Python code. So if you know the Julia language, this is the Julia format for defining an array and taking the sine of that array. And then we're gonna do this strange thing where we actually call the Python plotting library in Julia and plot the Julia variables. I'm just gonna execute all this 'cause it's fun to see. You plot all the Julia variables and then it returns the Julia version of the Python figure to Python, which then renders it in JavaScript. So, you know, it's really simple. It's easy for anyone to understand.

And there are other cool things too. So this is this Julia figure that we created. We can now pull that out of Julia and work with it in Python. So now I have that figure object that was created in Julia. And you know, you can really start to do some crazy things. This is one of my favorite examples.
This is a dual recursive algorithm for the Fibonacci numbers where the Julia version calls the Python version and the Python version calls the Julia version. And you go back and forth until you've computed, you know, the sixth Fibonacci number. You can see here that Julia is calling Python, and then we're passing this fib, which is the Julia version of that. So it's multi-language interoperability in the IPython notebook. So if you take anything from this talk, I want you to remember that IPython is not about Python. It's about much, much more. That's definitely where it started, but there are some great things going on with that. So if you're interested in other languages, these are the kernels that are currently available in IPython: Julia, Haskell, F#, Ruby, Go, Scala, all these other things that I honestly haven't heard of, Mathics. I don't even know what that is. But someone wrote a kernel. So you can use this with your own favorite language and make these sorts of executable, reproducible documents.

So let's switch gears a little bit now and start talking about the modern world of Python visualization. Now visualization in Python sort of gets a bad rap, and there's good reason for it to get a bad rap. Matplotlib is the standard scientific visualization library and it was written over the course of the last 15 or 18 years by a bunch of academics. It was actually written by some of the same people that Rob complained about, who use the wrong color charts and think they're doing great stuff. So if you do a scatter plot, it looks like this, and that's a fine scientific scatter plot. It's funny though, it kind of reminds me of Excel a little bit. And when I was first trying to convince people in UW astronomy to switch to Python from this system called IDL, which is a proprietary language, I heard this a lot: hey, that looks a lot like Excel, you know, as this derogatory thing. Yeah, Excel's good.
I try not to make fun of it too much. Matplotlib is also horrible in terms of default color schemes. If you just pull out a color bar, you know, it's your favorite rainbow color scheme that Rob railed against earlier today. And there are things like the fact that it's not interactive. If you make this plot and you say, hey, I want to center on that circle right there, it's just an embedded PNG in the browser, right? You can't do any of this stuff by clicking and dragging. So these are all the weaknesses of Matplotlib that people get annoyed with.

But you know, it's true that with enough effort, you can do some pretty interesting visualizations. And this is something from a project I worked on, this is made with Matplotlib, where we're visualizing asteroids. And I want to take a brief aside and tell you about this because I think it's pretty cool. Basically on the left plot, we're plotting the optical color versus the infrared color. So this is kind of the observed colors of asteroids. And the way we chose the color of each of those individual points is just an arbitrary mapping of that space on the left plot. And then what we do is we take those same colors, based on the location on the left plot, and plot them on the right plot. And here we're plotting orbital dynamics. So the x-axis is the semi-major axis, basically the distance from the sun. The y-axis is the inclination, so how the orbit is inclined, right? And using the same color scheme from the left plot allows us to see these relationships between four different dimensions.

And I like how this turns out because what you can see here is that there are clumps in the orbital space that are all the same color. This clump right here is all green. And those colors are basically correlated with the chemistry of the asteroids. So these clumps here are in the same place in orbital space and dynamical space, but they're also chemically similar.
And this is evidence that basically these asteroid families started out as a big asteroid, and then two of them collided and it turned into a whole bunch of little asteroids. So we can deduce that from this data right here. And this is in Matplotlib. So you can do decent stuff in Matplotlib. This is the code that actually generated that plot. So you can see that doing decent stuff in Matplotlib takes a little bit of work. You have to add a lot of stuff in there.

And incidentally, I'm gonna do a shameless plug. This is from our textbook. It's called Statistics, Data Mining, and Machine Learning in Astronomy. It came out in January, and we have this AstroML website where, for basically every single figure in the textbook, you can click on it and see the source code that generated it. So I'm pretty proud of that. It took a long time. Anyway, if any of you are into graduate-level astrostatistics, this would be the book for you, I'd recommend it. It's selling like hotcakes.

So anyway, Matplotlib is old, it's static, it has crappy defaults, so why do we use it, right? Well, people are using it because it's a really well-established tool. It's really, really battle-tested. It got a big boost back in the mid-90s from the Space Telescope Science Institute. They kind of threw a whole bunch of their postdocs behind it to make it really good. Matplotlib, like most graphics frameworks, I think of as a three-component thing. It has an API, it has some abstraction of the graphics, and then it has the output, which might be PNG or PDF or web or whatever you want to do. And Matplotlib does all of these, right? It actually has two different APIs you can use. It has this abstraction that's essentially a Python abstraction of scalable vector graphics. And for the output, it can do all sorts of stuff. Scientists need PDFs, we need PNGs, we need SVGs, EPS, PS, because all the journals accept different things.
And they say, we only take EPS and not PS. So you need Matplotlib to do all that stuff. And there are all sorts of graphical GUI backends as well for all different systems. So Matplotlib does all of this really well. But there are other parts that it doesn't do well. I showed you the API. And this is where a few add-ons come in; just in the last six months or a year, people have been doing a lot of interesting stuff with this. So at the API level, there are three projects in particular that I really like. There's Seaborn, which I'll show you, prettyplotlib and ggplot. Basically, they try to replace the Matplotlib API and put in something that's a little more reasonable, and then they use all the rest of the Matplotlib architecture to do the rendering and everything like that.

So an example, Seaborn here, this is by Michael Waskom, and he's a neuroscientist at Stanford. And essentially the goal of Seaborn is to do, why isn't this working? Oh, I know, that's not supposed to show anything. The goal of Seaborn is to do really nice plots really tersely. So these are statistical plots, things that you would do in statistical data mining and data exploration. And so for example, there's this jointplot right here that basically calls several hundred lines of the Matplotlib API, but gives you a nice wrapper to do a type of plot that you might want to do a lot. It uses nice color schemes and it does things like compute kernel density estimation automatically. And like I said, this is using Matplotlib, it's using that battle-tested core, but it's giving it a new API. And it uses good color schemes. I think this actually draws from ColorBrewer and projects like that. So this is just to show you that if you hate the Matplotlib API, that shouldn't stop you from using this tool. And for the sake of time, I'm gonna skip over showing you prettyplotlib and ggplot, which are a similar idea to that, but they're great projects.
One that I've worked on a little bit is this mpld3 project. And if I click over here, this is just a quick demo of mpld3. So what mpld3 does is it takes Matplotlib, MPL, graphics and converts them to D3. And it's pretty fun. I showed you a little while ago that the outputs of these plots are basically static plots, right? This is a PNG, I can't do anything interactive here. But if you call a little piece of code here that I wrote, mpld3.enable_notebook, and show the figure, then all of a sudden you get a figure that's dynamic, and you can zoom around and scale and click home and explore things more dynamically in the notebook.

And what this is actually doing, it started out as a hack and turned into a gigantic hack. It uses this package called mplexporter to basically scrape the Matplotlib object and construct a JSON that's similar in spirit to projects like Vega, but there are a couple of reasons I didn't just use Vega. So it has a JSON that basically tells you everything that's in the plot. And then once it has that, it sends it to this JavaScript library that's about 1,500 lines of JavaScript. And this is where I realized that there are scientific Python users, who like Python so much that they spend all day writing Python and debugging Python. And then there are the scientific Python developers, who love Python so much that they spend all day writing JavaScript. So that's what I've been doing with my last few months, debugging JavaScript. Oh boy. And it produces this pure client-side view. So you can essentially embed this in web pages, you can share your data that way, you can let people interact. It's just D3 in the browser.

The other cool thing that I added is this ability to add plugins. So you can do something like this tooltip plugin, and you connect that to the figure, and then all of a sudden when you hover over points, it tells you what those points are. These aren't very helpful labels, they're just point 0 through point 100.
But you can put in whatever labels you want, this is just a list of strings for those labels. Or we can do things a little more sophisticated. So here's a big Python script to just generate a grid of plots. And then down here, we just connect a linked brush plugin. And you can probably guess what's gonna happen. If you then brush on the points, you can explore and see the relationship between all the different dimensions. That's pretty fun. And if you're really masochistic, you can even start writing your own plugins. It's basically a bit of Python plus a bit of JavaScript, and the JavaScript is defined in a string in the Python. And then, yeah, I told you, massive hack. And then you connect it to the figure and it renders it and it actually puts that JavaScript in there. So this is a custom plugin to make lines highlight as you go over them. It's more proof of concept than anything, just to show you what's possible with this. So all of a sudden you can start using your Matplotlib plots interactively. Here's another silly one I did that's just exploring SVG paths. So you can click and drag the path and, you know, funny things like that.

Anyway, so that's mpld3. If you wanna see more, there's that website right there. I just released a new version about five days ago to get it ready for this conference. And I'm pretty excited about it. There's a lot of good work going on. And I think it's paving the way for some interesting things. The other projects that are out there for visualization actually sidestep Matplotlib altogether. There's this Bokeh project. There's Vincent, Bearcart, Plotly. I have links to all those. I'm gonna skip showing you examples of those because in the last few minutes, I wanna get to the IPython interactive stuff. So this is the new stuff in the IPython notebook that actually allows you to interact with the kernel via JavaScript. So this is pretty cool stuff.
Like I said, this is about three or four weeks old. I think I first saw the prototype demoed maybe at the end of last fall by one of the IPython core folks. So basically, we've heard this week that the process of visualization is kind of an iterative thing, right? We heard from some of the speakers yesterday that despite what we might think, they don't feel like they're amazing geniuses, and they actually have to sit there and try things and do different stuff and eventually arrive where they want. And what the interactive widgets try to do is put that process in the notebook a little more seamlessly.

So for example, let's say I have some data here and I have a function that draws a scatter plot, and this is the result. I'm not very happy with that because the points are too small, the default color scheme is ugly, stuff like that. So what we can do is use this interact framework, and it's just a couple of lines, right? We import interact from IPython, then we call interact on this draw scatter function and we give it a bunch of values that we want ranges for. And now all of a sudden we have these JavaScript widgets that, when you move them, are calling back into Python. So this is calling from the browser into your Python kernel, regenerating this plot, and we can actually explore here that there are some nice-ish color maps in here. The cubehelix one is one that I like a lot because that's the one that's spiraling through color space while incrementing the brightness, basically. And we might want to make this a little transparent. So this is the kind of stuff that two months ago you would have to do by actually editing by hand the values in the code. And now we have these nice interactive sliders to do that process quickly.

And we can do a lot of stuff with this. Here's an example of a bit of code that generates some random graphs using the NetworkX library.
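
The notebook's own graph code is not reproduced in this transcript. As a dependency-free stand-in (not NetworkX itself), the sketch below generates one common kind of random graph, in which each possible edge is included independently with probability `p`. The function name `random_graph` and its parameters are assumptions chosen for illustration; they are the kind of values a slider could drive.

```python
import itertools
import random


def random_graph(n, p, seed=None):
    """Return the edge list of a random graph on nodes 0..n-1.

    Each of the n*(n-1)/2 possible undirected edges is kept
    independently with probability p.
    """
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


# Re-running with different n and p shows how density changes with p.
edges = random_graph(n=10, p=0.3, seed=0)
print(len(edges), "edges out of a possible", 10 * 9 // 2)
```
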
And then we can explore what these random graphs look like as you adjust the parameters that are used to create them. And I honestly don't know much about this, so it's just like clicking buttons and sometimes stuff looks cool. But the guy who wrote this knows a lot about graphs. Which you would hope.

And this is one that I did that I really liked. This is exploring the Lorenz system. So the Lorenz system is a system of differential equations that has really interesting solutions depending on how you set the parameters. So this is actually code that solves the Lorenz system, the differential equation. See, we just define the derivative here and then, where's the solution happening? Oh yeah, here it is. We're doing the ordinary differential equation integration to find the solution, then plotting it. And if we call solve Lorenz right here, we get this nice little static view of what the Lorenz system paths look like. We can do the same thing with interactive, and now all of a sudden it gets interesting. We can adjust the angle and see in 3D what's going on. We can see the effect of sigma on the paths. What does sigma do to this? What does the beta variable do to this? And each time I move one of these things, it's actually in real time calling back to Python, re-solving the differential equation for all the paths, plotting them and then showing it here.

So I'm working with some grad students at UW right now who are basically using this to develop a series of notebooks to teach applied math. There's a professor named Randy LeVeque, who some of you might know, and he teaches a lot of intro applied math stuff, and his grad students are working with the two of us to develop these sorts of tools to help students learn about this stuff. It's the same sort of thing we saw yesterday with the D3 visualizations. When you can start interacting with things, when you can start seeing it dynamically, then learning really takes a leap.
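
The Lorenz notebook itself is not reproduced in this transcript. The sketch below shows the same two steps, defining the derivative and integrating it, using only the standard library. Two assumptions are worth flagging: a hand-written fourth-order Runge-Kutta step is swapped in for the library ODE integrator used in the talk, and the function names, default parameter values (sigma = 10, beta = 8/3, rho = 28) and step size are illustrative choices rather than the notebook's own.

```python
def lorenz_derivative(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """Right-hand side of the Lorenz system at the point (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt, **params):
    """Advance `state` by one classical Runge-Kutta step of size dt."""
    k1 = f(state, **params)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), **params)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), **params)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), **params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def solve_lorenz(initial=(1.0, 1.0, 1.0), steps=1000, dt=0.01, **params):
    """Return the path as a list of (x, y, z) points, including the start."""
    path = [initial]
    for _ in range(steps):
        path.append(rk4_step(lorenz_derivative, path[-1], dt, **params))
    return path


# Each slider movement in an interactive version would amount to a call
# like this one with new parameter values, followed by a re-plot.
path = solve_lorenz(sigma=10.0, beta=8.0 / 3.0)
print(len(path), "points; final point:", path[-1])
```

Keeping the solver as a plain function of its parameters is what makes it easy to hook up to sliders: the widget layer only needs to call it again with new arguments.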
So that's the end of the talk outline, and just in conclusion, I wanna hammer down these couple of points. I think the IPython notebook is an incredibly powerful platform to accomplish the types of things we wanna do in science and, I would say, the types of things that you folks want to do in journalism. For those of you who are working in media, I think this push towards open data, towards reproducibility, towards dynamic documents like the IPython notebook is really, really important. IPython gives you access to the code, to the data, to the visualizations and everything in context. It allows you to reproduce analyses and graphics kind of in real time or at your leisure. The user can go back and download the documents and do the analysis themselves. It's multi-language, so these kernel back ends allow you to work in whatever you're comfortable in. I even heard rumors that someone's doing a MATLAB back end. I don't know if that excites anyone in here, but in the right audience there would be cheers. And it has this browser front end where really the possibilities are limitless. Anything you guys are doing in the browser, you could write a Python script, a meta script, to generate that code and make it happen, make it reproducible. Not that you'd want to do that necessarily, but there are some interesting possibilities for that.

So anyway, openness and reproducibility are a hallmark of good science, journalism and visualization. And I hope that I've convinced all of you that you should use the IPython notebook in the future. So you can find my contact information there, and thanks very much.