It's a sponsor talk, so hopefully I won't sound too sponsor-ish. My intent is actually to talk about some of the technologies we're working on that are open source. I'll give you a brief insight into what we do as a company, but mostly I'm going to talk about the open source tools we're building, which really derive from my experience in the NumPy and SciPy communities. We are basically a team of scientists, engineers, and data scientists trying to build tools for other scientists, engineers, and data scientists. We feel like in the wider ecosystem of computer science and computer technology, that category of people, the domain experts, the domain scientists, tend to get left behind as people focus on developer tools only. So we tend to be developers who focus on scientist-specific tools, and there's a lot of need for this: the real essence of the big data movement is getting insight from data, and that insight requires models, scientific models typically. I'm Travis Oliphant; my background is in NumPy and SciPy. I'm actually a PSF director currently, as of June. I started the NumFOCUS Foundation, and we'll talk a little about that later. I've been a professor at BYU, and I've been a scientist myself. My roots are as a scientist, but we created a company really to allow other people to build open source software. We love open source software. Peter Wang is my co-founder; two and a half years ago we built Continuum. Our whole purpose is really to allow other people to help us build open source and deliver it to the enterprise, and really make it a part of everybody's enterprise experience. So that's what we're about. We love open source, it's part of our DNA. I've been contributing to open source since 1998 when I first found Python, and I've been a Linux user; a lot of us do a lot with open source.
Now we've got 50 people worldwide. We have remote developers; depending on the project, remote development can work really well, though sometimes it can be difficult. We try to find those projects where remote developers can work really well. We have major contributors to NumPy, SciPy, Pandas, SymPy, and IPython, and we'd love more. We love new open source projects as well. We think that open source can be more than just a hobby; our desire is to grow the community. That's why we started the NumFOCUS Foundation two and a half years ago as well. This foundation's whole purpose is to promote accessible computing in the sciences and to back NumPy, SciPy, Pandas, and SymPy. A lot of these are emergent open source projects with just a loosely affiliated community and not much money to help them. So NumFOCUS's purpose is to gather money from enterprises and drive it towards sprint development, towards scholarships, towards diversity training and diversity events. NumFOCUS also sponsors and promotes, and actually receives any residual income from, the PyData conference series. We're having one as an affiliated event to this event, so please come to the PyData conference. You'll hear all about the great scientific tools and data analysis tools that are emerging. Now, as a company, what we sell is enterprise consulting and solutions, from optimizing performance, to managing DevOps and the big data pipeline, to building native applications on the web or on the desktop. We also provide training: Python for Science, Python for Finance, as well as practical Python through our partners David Beazley and Raymond Hettinger. And then we are building the Continuum Platform, a product spanning the desktop to the data center and back, that allows people to deploy data analysis applications and dashboards. So our products are all centered around that platform.
They take the form of Anaconda Add-ons and Anaconda Server, along with Wakari Enterprise. I'll show you those briefly. The key behind these products is to really give experts and scientists what they're actually asking for. I've spent a lot of time as a scientist myself, so I understand the workflows they desire, and we're trying to bring that to large organizations and large companies. So this is a picture I show of the Continuum Platform. You can see that it rests on an open source base, an open source base that we contribute to greatly and continue to contribute to: IPython, SymPy, SciPy, NumPy, Pandas, that basic baseline. And we have additional open source projects that we're writing and growing: Numba, Bokeh, Blaze, DyND, Conda, llvmpy, PyParallel. All these things are trying to bring high-level scientific applications forward: make them easier to write, make them faster, make them take advantage of the hardware that's changing today, GPUs and multicore. I wrote NumPy six years ago, and I still know all the bad places where it's not optimized. There are many, many such places, and it's not optimized because it can't take advantage of multiple cores or GPUs. On top of that, we deliver Anaconda, and above that are some of the proprietary applications that we provide, all about creating applications that can be deployed in the enterprise very quickly and really empower the domain experts that exist in every organization. Why Python? We love it because it provides a spectrum. What you'll see in the Python community is different categories of people. You have some people who are web developers, and they love that. Some people are DevOps folks in system administration, and they love that.
And then there's the camp I'm in: data scientists and scientists. Sometimes it can be challenging because we don't all speak the same language; we use different words, different terms, different libraries. But one thing that's great about the Python community is that it really is a community, and people for the most part listen to each other and try to work toward solutions that help everybody. In particular, some of the people in the Python community aren't even developers. They're what I call occasional developers, the cut-and-paste programmers: "I have an idea. I want to put a few things together, and Python fits my brain. It partially leverages my English language center, so I can understand what it's saying without having to be a developer to use it, and I can build things very quickly." Python does that. It's very unique, actually, among all programming languages. Now, NumPy plays a central role in the kinds of tools that we build. It's at the center of a large stack of data analytics libraries. There are a lot of users of NumPy, actually, I think about three and a half million. It's hard to tell, because they don't ever tell me; they don't write home and send me a postcard. Sometimes that would be nice; you could actually get a sense of who used it. So that's what we build on. But as a company, we ship Anaconda. Anaconda is a free, easy-to-install distribution of Python plus 100 libraries. One thing that's challenging about the NumPy stack is that it uses extension modules. It uses C, and sometimes Fortran for SciPy. How do you get that installed? It's not enough to just have a source install solution; we have to have a binary install solution, and so we invented Conda. And we work with the Python Packaging Authority to promote Conda and to help understand how it fits into the overall packaging story in Python.
Essentially, it's like yum and apt-get for Linux, except it's for all platforms: Linux, Mac, and Windows. It's a fantastic distribution that people rave about; they love it when they use it. Why do they love it? I think Conda is a big reason. Conda is a cross-platform package manager. It helps you manage a package and all its binary dependencies. Anaconda is an easy-to-install distribution that supports both Python 2.7 and Python 3.3. You can install Anaconda for 2.7, then create environments. I just heard a talk by Red Hat; they call these software collections in the Linux space. We call them environments. They're system-level environments that are more than just Python; they support anything. So you can run Python 3.3 in a separate environment on a Python 2.7 base. You can also do the reverse: get a Python 3.3 base and run Python 2.7 separately as a compatibility-testing development environment. It's a fantastic solution for bridging the gap between Python 3 and Python 2. Then there are over 200 packages available, scikit-learn, scikit-image, the IPython notebook, just at your fingertips. `conda install` gets them and you're off and running. No more compiling dependencies or trying to figure out how to install them. And this is all completely free. You can even redistribute the binaries we make. So that's Anaconda. Its purpose is to make Python ubiquitous in data science; there should be no excuse for anybody in the world not being able to use Python to solve their data analysis needs. And that's why we make Anaconda. Get it at continuum.io/downloads. It's free to download and free to distribute as well. And we do sell some things on top of that. As a company we have to stay in business; we have to sell something. Part of that is Anaconda Server. It's a commercially supported Anaconda. It provides support, indemnification, and licensing. It also provides a package mirror and a management tool.
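The environment workflow described here can be sketched as a short command-line session. The environment name `py33` is made up for illustration, and `source activate` is the activation syntax conda used in that era:

```shell
# Create a separate Python 3.3 environment on a Python 2.7 Anaconda base
conda create -n py33 python=3.3 numpy scipy

# Switch into it, check the interpreter, then switch back
source activate py33
python --version
source deactivate

# Install a package, with all its binary dependencies, into the current env
conda install scikit-learn
```

Because environments are system-level directory trees rather than Python-only virtualenvs, the same mechanism works for non-Python dependencies too.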
If you're interested in that, I can talk more about it later; come see me. At binstar.org you can see what Anaconda Server might look like as an on-premise installation. Go to binstar.org, sign up for a free account, and you can upload any package you like. There's a three-gigabyte limit, so don't just upload all your movies as content packages. But you can put up any binary package you like and share it with somebody else so they can easily install your solution. And as long as it's public, as long as anybody can download it, it's completely free. Wakari is our hosted analytics environment solution. It's a fantastic way to quickly and easily get running with the IPython notebook. You can sign up, and instantly you're in an IPython notebook running code. Now, the free version gives you a node with only a little bit of memory and only a tiny bit of computational power, but it's great for teaching, for showing, for demonstrating. If you want more power, you can easily upgrade to get as powerful a node as you like. Then Wakari Enterprise is the on-premise version of that cloud story. The UI has been adapted to allow LDAP integration for installs on internal servers. It has a notion of projects and teams, and it lets people instantly collaborate on a large-scale project and then easily share the results of their workflows with others. So from desktop to data center is our platform story: Anaconda on the desktop, Wakari in the data center, and a seamless connection between the two, so you can go from writing code on your desktop to deployed applications on the cloud or on-premise in the data center. So that's our solution. That's the thing we are building together as a company that helps enterprises everywhere. But the part I like best is the open source tools that we're building as part of this.
We feel it's critically important to continue to build open source technology. So we have key open source technology that builds on top of NumPy, SciPy, Pandas, and the rest: Blaze, Bokeh, Numba, Conda. I don't really have time to explain all of these in the brief time I have. Tomorrow in my keynote I'll talk a little about these technologies; I'll mention Numba, but probably mostly Blaze and how I see it as part of the story for the future of big data analytics. We do have some add-ons; I've talked about those before. So I'm going to briefly talk about these technologies and hopefully get you excited, because we're looking for help. We're looking for developers who can help us with each of these, and these are paid positions. The first is Numba. Numba is really a technology for taking the CPython stack and adding compilation to it. PyPy is a fantastic project, but it doesn't integrate well with the NumPy stack: NumPy, matplotlib, SciPy, SymPy. So we took the LLVM technology stack, and with decorators we can take a function, compile it to machine code, and integrate it with the rest of the NumPy stack very easily. The way it works is basically to translate a decorated function into intermediate code via the translator, and then LLVM takes that code and runs it on your platform. It can do amazing things. I think it changes the game. It lets Python essentially be like a compiled language. It handles a subset of Python, and we can go into the details later if you like, but you can now write in Python syntax and get compiled performance just as if you'd written C or C++. We have numerous examples of that. It's a very, very easy way to get optimized performance out of your Python code. Here's a simple example of Mandelbrot generation. You've got to have your Mandelbrot generation example.
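A minimal sketch of that decorator workflow, in the spirit of the Mandelbrot example. The function name and structure here are illustrative, not the code from the slide, and the block falls back to a no-op decorator when Numba isn't installed, so it runs either way (just without the speedup):

```python
try:
    from numba import jit          # compiles the function to machine code via LLVM
except ImportError:                # assumption: plain Python fallback if Numba is absent
    def jit(func):
        return func

@jit
def mandel_escape(creal, cimag, max_iters):
    """Return the iteration at which z = z*z + c escapes |z| > 2."""
    real, imag = creal, cimag
    for i in range(max_iters):
        real2, imag2 = real * real, imag * imag
        if real2 + imag2 > 4.0:
            return i
        # z = z*z + c, using the values from before this step
        real, imag = real2 - imag2 + creal, 2.0 * real * imag + cimag
    return max_iters

print(mandel_escape(0.0, 0.0, 50))   # c = 0 never escapes -> 50
print(mandel_escape(2.0, 2.0, 50))   # |c| > 2 escapes immediately -> 0
```

The decorated function is ordinary Python; looping over every pixel of an image with it is where the compiled version pays off.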
It illustrates the ability to call functions and have them bypass the Python runtime and essentially run as low-level machine code. So this is one way to bypass the GIL: use Numba to add a JIT, and now your function is no longer in the Python runtime; it's actually compiled code, and you can release the GIL and execute it. So that's Numba. Blaze is about connecting data to code seamlessly. The fundamental problem Blaze tries to solve is this: you have data, let's say in HDFS, while somebody else on your team thinks you should have it in Postgres with Greenplum, or maybe in Netezza, or maybe just as a bunch of HDF5 files. That decision of how you store your data ends up determining how you write code: how you write your queries, how you write your solution in Python. It shouldn't be that way. There ought to be a way to write expressive, table-oriented code where you just plug in whatever data you have, and even cross different tables and have the same expression work across all of them. So Blaze is a foundation for large-scale array computing that leverages the technologies already out there. There are many, many kinds of data formats, and big data pipelines are constantly changing; it can be difficult to reuse code in that environment. The Blaze architecture has an API with some fundamental pieces: a deferred expression, a pluggable compute infrastructure, and a pluggable data infrastructure. So it's a flexible architecture that can scale across multiple use cases. Data, for example, can be stored as CSV files, a collection of JSON files, HDFS, HDF5, or just in SQL, and you can add your own custom data type. A simple API lets you add it, but your Python-level expression stays the same. It's NumPy-like: you can slice it, you can dice it, you can grab pieces of it.
And then you can write a compute graph that refers to part of that data. So this is a compute abstraction that can sit on top of multiple backend libraries, things like Pandas. DyND is a next-generation NumPy equivalent. It's a C++ library that does the same things as NumPy but is more general: it allows things like variable-length strings, ragged arrays, and categorical data types, which are missing from NumPy. Blaze can also sit on top of Spark, which is part of the Hadoop ecosystem, and PyTables, from our friend Francesc who's sitting in the back. With this Blaze expression graph, you can write a single expression, have it attached to multiple data sources, and pull it all together in a single application. Here's a simple example. We have a generalized data declaration format called DataShape, which generalizes NumPy's dtype, and DataShape allows you to describe data universally in a way that can sit on top of multiple data formats. So here I'm creating a symbolic table, and against this symbolic table I can write an expression involving joins, group-bys, and aggregations. That creates a deferred expression. Then there are different implementations of load-data, depending on whether my data is in SQL or in Spark. I simply map the elements of what I've loaded to a dictionary representation of the namespace that the compute is going to evaluate in, and then the compute maps the expression graph to the actual backend calculations that are needed. So whether it's Pandas in memory or Spark on a 100-node cluster, the same code can be executed. This slide shows the load-data difference between Spark and Pandas. I'll talk more about this tomorrow, because I think it really sets the stage for reusable computing and reusable expressions, and for helping people make sense of the diverse and changing world of big data and large-scale array-oriented computing.
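The deferred-expression pattern being described can be sketched in a few lines of plain Python. This toy is not the real Blaze API; all the class and function names are invented here to show the shape of the idea: build an expression against a symbolic table that holds no data, then dispatch `compute` on whichever backend actually holds the data:

```python
class Table:
    """Symbolic table: records field names, but holds no data."""
    def __init__(self, name, fields):
        self.name, self.fields = name, fields
    def __getitem__(self, field):
        return Column(self, field)

class Column:
    def __init__(self, table, field):
        self.table, self.field = table, field
    def __gt__(self, value):
        return Filter(self, value)      # builds a node, evaluates nothing

class Filter:
    """Deferred expression: 'rows where column > value'."""
    def __init__(self, column, value):
        self.column, self.value = column, value

def compute(expr, data):
    # Dispatch on the backend type: a list of dicts stands in for an
    # in-memory backend like Pandas; SQL or Spark could plug in here.
    if isinstance(data, list):
        return [row for row in data if row[expr.column.field] > expr.value]
    raise TypeError("no backend registered for %r" % type(data))

t = Table('accounts', ['name', 'amount'])
expr = t['amount'] > 100            # still symbolic: nothing has run yet
rows = [{'name': 'a', 'amount': 50}, {'name': 'b', 'amount': 150}]
print(compute(expr, rows))          # -> [{'name': 'b', 'amount': 150}]
```

The point of the design is that `expr` never mentions the storage format; only `compute` knows how to execute it against a given backend.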
So, the last technology. I didn't tell you a lot about Conda, because there are a lot of videos out there; if you want to hear about Conda, there are actually some jokes about me constantly talking about it because I love it so much, and you can find videos about it on the web. I'm going to talk about Bokeh, which is our visualization library. I'm really excited about it, and a lot of people are as well. It basically allows you to do interactive plotting in the web browser without writing JavaScript. So as a Python developer, you can write interactive visualizations in the same spirit as D3, but using Python. Now, it's still in development, but quite a bit can be done already. You can have novel graphics. Actually, that violin plot came from the Seaborn library, via the matplotlib compatibility of Bokeh: you build a matplotlib plot and then essentially render it with Bokeh to provide the interactivity and the JavaScript rendering. Lots of different kinds of graphics can be built; there's even support for streaming and dynamic data. I have a simple demo here I'd like to show; it's running in the background. This is just my basic computer, it's been running for a while, and it's sampling the microphone. What I'm doing is using the NumPy stack to do a Fourier transform on the audio coming from the microphone and show the spectrogram in a couple of different ways. So I can see the time series and the frequency: here's the time series, here's the frequency spectrum, and then here's an image map of the frequency spectrum. Take this line, rotate it, stick it in an image, and it moves across, so I get a spectrogram image over time. And then here's just a radial plot, just for fun. So you can see that this is sampling the microphone. I can't whistle that high. Anyway, there are games you can play with it. So this is a JavaScript library, and you can actually drive it from Python.
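The NumPy side of that demo, one "column" of the spectrogram, can be sketched with NumPy alone. A synthetic 440 Hz tone stands in for the microphone samples, and the variable names are my own:

```python
import numpy as np

# One spectrogram frame: window a chunk of samples, take the FFT,
# keep the magnitude of the positive frequencies.
rate = 44100                       # samples per second
n = 1024                           # samples per FFT frame
t = np.arange(n) / rate
chunk = np.sin(2 * np.pi * 440.0 * t)   # stand-in for microphone input

window = np.hanning(n)             # taper the frame to reduce spectral leakage
spectrum = np.abs(np.fft.rfft(chunk * window))
freqs = np.fft.rfftfreq(n, d=1.0 / rate)

peak = freqs[np.argmax(spectrum)]
print(peak)   # within one bin (~43 Hz) of 440 Hz
```

Stacking successive frames as image columns gives the scrolling spectrogram from the demo; Bokeh's job is only the interactive rendering of those arrays in the browser.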
Currently this demo is written taking advantage of the Bokeh.js backend, but it's being rewritten in Python, to show how to do this sort of thing in Python and create these kinds of visual apps. It illustrates many things about what I think is the new platform for visualization, which is the web browser. So this is what we're doing, and the kinds of things you can do if you come work with us. Let me go back to my presentation, not the Twitter feed. Although, maybe some of you tweeted. The other aspect of dynamic interaction is that there's a web socket communication and an object model. Bokeh creates a scene graph in the web browser and an object model that can be reflected on the Python side. So you can write an object model in Python that gets reflected to the browser, and you can have server-side control. You can also just have all that logic in the browser: a static web page with all the interactive logic client-side. So this is an example of the web service updating the plot, and then the backend server updating the plot in Python and having the web display change. It's a great way to handle streaming data and all kinds of different interactions. You can also do big-data kinds of analysis pretty easily with this kind of setup. This one is actually running in the US. These are time series stored on a server, and I have the ability to zoom in. You can see it actually updates: it zooms in initially with the data it has, and then it goes back to the server and updates with a higher-resolution version. And these different plots are all linked. So it's just a simple example of resampling: I can reset the view, and it expands out after it grabs the data. This server is actually back in the US, so there's a little bit of latency.
Here I have an example where I'm looking at the whole world. You can see I've zoomed into a particular slice. This is a world view: a three-dimensional time series, about four gigabytes of data we got from JPL at NASA. It shows the ocean view over time. So I'm seeing a 2D projection of the world, but this slider changes the time view, and it takes a little while to bring back all that data. If I zoom in to a particularly interesting area of the world, I can see that it updates from the server; it gives me back a higher-resolution view. And then I have projections that show the period through time, and I can change which slice it shows. You can see it's updating down here. So that's just an example of an application built with the visualization library, and the kind of thing you can do very quickly and then deploy in a web browser across your organization. There are also little widgets you can provide. This is just an example of a simple widget with some dummy data about downloads, and I can adjust it as I slide through. These are the kinds of things you can do from Python, without writing JavaScript, using Bokeh and its application technology. So that's the gist of what we're trying to do with the platform: go from data to visualization and beyond, and make it easy for people to do it at a high level, so they don't have to be expert developers. They don't have to know everything about SQL and JavaScript and development operations in order to get solutions that take advantage of multiple kinds of hardware, multiple kinds of data sets, and high-level ideas. So, no JavaScript; that's just a little more of an example of the kinds of plots you can do. There's actually going to be a Bokeh tutorial at PyData. I invite you all to come to PyData for the tutorial given by the principal author of Bokeh, Bryan Van de Ven, who will be here.
There's also a great website that explains Bokeh, bokeh.pydata.org. It's got a gallery; you can go in and look at the code. It's still a work in progress: version 0.5 just came out, the widgets just came out, and it's making rapid progress. But it's usable today. If you find something you want and it's not there, let us know; I'm sure it's either on the roadmap or it will be added if you tell us about your particular needs. Okay, so that's a quick run-through of the technologies we've built and the kinds of things we do. I'll end by talking about the openings that we have. There are many openings: for the Numba team, for the Blaze team, the Bokeh team, embedded consultants. If you want to live in New York, come talk to me; I have great opportunities for you in New York City. And these are opportunities not only to work with a client, but to work with the rest of our team in helping us build this platform, based on open source technology, that can benefit large and small organizations around the world. We're really excited about what we're doing. We think we have ideas that can really help and transform the way people write code for high-level data analysis. And we'd love to have you join us. So with that, I'll ask for questions or anything else you want to know about. Do we have any questions? [Audience] Thanks for the talk. I have two questions regarding the Python part of Bokeh. First of all, I remember that at the beginning Bokeh was trying to implement the grammar of graphics for Python, but recently I saw that there's no mention of the grammar of graphics in the documentation. Are you still using the same kind of interface, or...? [Travis] I would say it's not the grammar of graphics. Well, I know that some of the developers see the grammar of graphics as a good direction, but not necessarily complete.
Bokeh.js itself uses concepts from the grammar of graphics in its architecture. The interface is something that can be added on top. So for example ggplot, which currently has a matplotlib backend, can easily be retargeted to Bokeh.js; in fact, we have examples of doing that through matplotlib's interface. So let's say the grammar of graphics discussion sits at a higher level than Bokeh and Bokeh.js. [Audience] And the second question is regarding the widgets and interactivity of the plots. As I understand it, widgets are something you can play with to, for example, select some data points and get information about particular data. Is that something you have to implement in JavaScript, or can you just use Python code to define the widgets? [Travis] For which part? [Audience] For example, if I want to select some data points and print a tooltip. [Travis] Right, selecting data points and printing them. I believe that's on the roadmap to be done from Python. Currently there's a nascent Python interface to that, so if it works for you, it might be enough, but it's possible that API is still not quite complete. The idea is that you won't have to use JavaScript; I'm just not sure we're completely finished with that API on the selection-of-points side. [Host] Any more questions? Bryan will be here later today, and he can give you a lot more explanation of Bokeh. Anyone else? No? Okay, thank you very much, Travis. [Travis] All right, thank you. Thank you.