Hey. Okay, so I know a lot of you work a lot with notebooks, which is good. But as you probably know, the work can be completely frustrating and monotonous and just crappy before you get a result. So here's an example (I hate to stand this way). Here's an example of what you need to do to create that ugly 1980s bitmap thing in the middle. You import a bunch of libraries. We're going to use Matplotlib here, use some obscure style, define your figure, your axes, and all that stuff, and then set all your colors. And you probably don't just write this code. You probably take hours to figure out exactly what you want, and this is probably the simple version of hours or days' worth of work. And at the end of it you get this thing, which is, you know, not really Bootstrap-type, 2017 web design.

So wouldn't this be nicer? PixieDust is an extension mainly written by a colleague of mine, another developer advocate in my group. I just implemented the mapping part. He did all the tough part about reverse engineering Jupyter notebooks and figuring out how to write new Python code inside the kernel and extend it in amazing ways. So with PixieDust you get one command, display, and it takes your data frame, either a Spark data frame or a pandas data frame, and gives you a GUI to work with it.
You can make it a table; you can do bar charts and line charts and scatter plots and histograms. We're adding support to stash it to different databases (we support our favorite IBM databases on the cloud), and you can also do mapping, and that's the focus of what I'm going to talk about.

So how many people do mapping on a daily basis, geospatial work? Just a few. Good. Obviously, when you're going to do really high-end stuff, you're going to get in and customize it yourself, and you're going to want the kind of control you can only get by coding it yourself. But if you're kind of new to it, and you have a latitude and longitude field in your data and you want to see what it looks like, or you want to do a lot of exploratory data analysis before you spend all that time on your final product, the PixieDust extension can be a great way to just play around before you devote a lot of time to the custom code-based work. And that applies to the charting, the bar charting and the line charting, as well.

So I think of PixieDust as a tool that lets you devote just as much time as you want to at the right stage of the project. At the beginning of a project, when you just want to get a feel for your data and look at a few things, you don't want to have to stop, switch context, and go read books for three hours figuring out all the options in the plt command. You want to play with your data. You want to be focused on your subject matter and understanding that. And then when you get overlapping text on your graphs and you want to fix that and really make it nice, you can save that all to the end. When you're just thinking and brainstorming, you want some easy-to-use tool. And this actually looks pretty good right out of the box. So I'm going to
talk a little bit. So that's the problem, and I think this is a great solution to that. PixieDust is an open source extension. We're looking for lots of help with making it better, and you don't have to run it on IBM Cloud. I'll talk a little bit about that. It can just run on its own in Jupyter: you can download Jupyter notebooks, set everything up yourself, and use it.

But let me talk a little bit about what we do. I'm a developer advocate for the Watson Data Platform. Last year we were Cloud Data Services. We're responsible for all the online managed databases that IBM offers: DB2 on Cloud, and Cloudant, a great NoSQL database, and dashDB, a columnar data store, things like that. This year we kind of merged with Cloud and Watson a little bit, and our responsibility is now more than databases and analytics; we're talking more about some of the cognitive stuff from Watson. You can't really read this back there, but the core pieces of what we do are data engineering, data science and analysis, and app development, and we try to bring those all together. So we eliminate a lot of the friction of doing all that stuff yourself and have it all in one managed service offering. That's the whole big picture. And then we have data services that we're adding. We bought The Weather Company last year, and, you know, we're making it, as big companies do, a one-stop shop.

But today I'm just going to talk about data science, really, and the flagship offering there is... oh, I forgot to do my animations. So yeah, we got a lot of weather data from buying The Weather Company, and there's the Twitter partnership, and we're adding Watson machine learning stuff, and geo is my specialty. Our flagship offering in data science is called the Data Science Experience (not very creative), and we call it DSX. But all DSX really is is a real convenience tool. It gives you a web-based interface to a
Jupyter notebook, and it fires up Spark instances automatically, along with object storage, which is kind of like Amazon S3. So as soon as you start up a project in DSX, you have Spark, a place to store files, and a Jupyter notebook environment running right away, which is a really nice thing. It sounds like most of you have probably set that up on your own, and that can be no fun.

So for those of you who don't know much about notebooks: a Jupyter notebook is not just a coding environment, but also a place where you can combine code with Markdown text and interactive HTML to really create a final presentation product around your data engineering and data science work. The idea started off, you know, hundreds of years ago. Scientists have been documenting their work for a long time, and they'd write notebooks. That's where the name came from. So you'd have all these scientific notebooks where people are writing down their experimental results and drawing little graphs and things like that. And then, I forget who invented MATLAB, I can't remember his name, but he's an amazing guy. We owe a lot to MATLAB. It took the idea of these analog notebooks and brought it into the digital world back in maybe the 80s, but I think the 90s, and the MATLAB notebook is probably the direct heritage of Jupyter notebooks. You could do a lot of coding in there and get results right away in a graph and a visualization, put in pictures and helper things. MATLAB is still an amazing product. And out of that grew a whole family of data science notebook types. There's not just Jupyter; there's Zeppelin and some other things, but our company focused on supporting Jupyter.
That's what we built PixieDust for. I'm talking about visualizations, and particularly map visualization, but there are a lot of other pieces to PixieDust which make your Jupyter notebook work a little bit more pleasant. I'm going to talk about all these very quickly because I'm already almost halfway through: package management, visualization, cloud integration, a Scala bridge, which is really cool, extensibility, and embedded apps.

The package manager can install Spark packages or JARs without modifying a config file. That's really nice. There's one simple API for display, which I'm going to talk more about. There's data export into files (CSV, JSON, XML) or your favorite online IBM cloud database. The Scala bridge: I don't do Scala, so I don't know how amazing this is, but I hear that sharing variables between Python and Scala is an amazing feature. If you want to use a really good Scala library but you're working in Python, you can go back and forth. (I'm trying to make this animate quickly so I can go through it quickly.) There's extensibility with HTML, CSS, and JavaScript. And embedded apps: this is something we're still building out right now. Once you're able to programmatically control your data frame and build some of these graphics, why not construct an app? Why not really take advantage of the DOM and build out a whole?
Basically, a whole web application within your Jupyter notebook, which is fed live with dynamic data from data frames. It's all based on the fact that you have these data frames, these variable objects stored in your Jupyter notebook, that can be exposed and operated on computationally and then rendered with anything you can do in HTML, JavaScript, and CSS.

So now I'm going to switch to a demo. Let's hope everything works well. Sorry, this is a little bit blurry in the background. So this is the Data Science Experience, and you'll see the fact that it's Jupyter isn't hidden very much. Everything below here is pretty much like the Jupyter notebook you'll see on your desktop. I have some code here. You can install it from PyPI, so I commented out pip install pixiedust. Then import pixiedust, so we bring in this library.

You've got two mapping libraries in here, Google Maps and Mapbox, and that's what I'm going to talk about mainly. Google Maps has a nice little developer API for letting you make maps based on named fields. So let's say you have a bunch of international data, and you have country names and you have values in another field. That's what Google Maps is good at. Or you have US data, and you have state or county names and values you want to map. That's what Google Maps is good at. Mapbox is better at mapping data with geographic coordinates.

Whoops, I forgot to talk a little bit about this. The first thing we do is import pixiedust. PixieDust comes with a few sample data sets built in, which I'm going to take advantage of. If you run the sampleData command with no parameter, it returns a list showing you the data sets available. I'm going to grab the total population by country data set, from the UN or from the World Bank, that I have in here. And that was number three.
So I run sampleData again with the value of three. It creates a Spark data frame. And then I run the display command with the data frame as the single parameter, and you'll see this sweet-looking map with no code. And not just a map: as you hover over it, it pops up the field name and the value. This is a nice example. Obviously, I use good sample data. So in this case it pops up the name of the country and the population value, which you probably can't see back there. And as you move over here, you'll see this is all directly from the Google Maps API. So all we're really doing here is feeding the Google Maps JavaScript library, which is all in this cell, with the data frame.

And I'll show you: if you go in here, there are options. I could have given the map a title, so, Global Population. And this is where I picked out my fields. This is a lot more fun than writing code; you can just drag and drop these guys here. And here's an important thing: you can choose what sampling of your data to use. All this has to happen in memory, so you don't want to use a terabyte of data in your browser. So you can pick the number of rows to display. I'm using one thousand right now, which covers all the countries in the world, so that actually grabs all the data. But if you had millions of records, you'd be guaranteed to only get back a thousand in your browser, in memory. You could up that to ten thousand, which would probably be fine, but I know this works for this. And one thing to note about that.
The number of rows that are going to come back is not the number of rows in your data set; it's the number of rows after you do this aggregation command, sum or count or something like that. So you can actually operate on the whole data set, as long as you know that fewer than this number of rows is going to be the result.

So that's cool, but if you're doing real work, you probably have data with latitudes and longitudes in it. I pulled down a few months' worth of home sales data from Redfin one day and used that to build another sample data set we put in here. So I'm going to pull that into a Spark data frame really quick, and I'm going to run this display command again. (How much time do I have?) And, ooh, we get a nice map. PixieDust grabs the Mapbox client-side JavaScript library and takes all the data and transforms it into GeoJSON, as you heard Mike Bostock talk a little bit about, because D3 uses GeoJSON as a spatial data format, and so does Mapbox's client-side library. It translates the data into GeoJSON and adds some basic styling: thematic styling, cartographic styling, whatever you want to call it. It's a JSON file that describes how to cartographically style the points. And then it also uses Mapbox to pull in this base street layer underneath for some context. So instead of just getting a map with your data back, you actually get all the streets for free.

It's doing a little bit of deconfliction to give you a clean map, and it's clustering points if they overlap. So as you move out, you'll see all these numbers, which are how many points are clustered there. But as you zoom in, you'll get individual points, and then you can see the price of the home sale. These are all home sales over a million dollars in northeastern Massachusetts over a period of a few months, so you can start to see patterns. What you can also do here: we have a few options.
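As an aside on the GeoJSON step mentioned above, here is a minimal sketch of what turning rows with latitude and longitude fields into GeoJSON points can look like. It is not PixieDust's actual code: the function name `rows_to_geojson`, the field names, and the sample rows are all assumptions made for illustration, and it works on plain Python dicts rather than a Spark data frame so that it runs on its own.

```python
import json


def rows_to_geojson(rows, lat_field="latitude", lon_field="longitude"):
    """Convert a list of dict rows into a GeoJSON FeatureCollection of points.

    Hypothetical helper for illustration; field names are assumptions.
    """
    features = []
    for row in rows:
        lat, lon = row.get(lat_field), row.get(lon_field)
        if lat is None or lon is None:
            continue  # skip rows without usable coordinates
        # Every non-coordinate column becomes a feature property.
        properties = {
            key: value
            for key, value in row.items()
            if key not in (lat_field, lon_field)
        }
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical rows standing in for the home sales sample data.
sample_rows = [
    {"latitude": 42.36, "longitude": -71.06, "price": 1250000},
    {"latitude": 42.50, "longitude": -70.90, "price": 1100000},
    {"latitude": None, "longitude": -71.00, "price": 999999},
]
geojson = rows_to_geojson(sample_rows)
print(json.dumps(geojson, indent=2))
```

The one detail that tends to trip people up is the coordinate order: GeoJSON puts longitude first, which is the reverse of how most people say "lat, long" out loud.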
You can just see all the points, not thematically, and just hover over to get the values. I showed you that choropleth map, which is thematic. Or you can do more of a heat map style, so that spatial patterns jump out more visibly. All you have to do is have a latitude and a longitude field in your data. You drag those over here, and then you put some numeric field as your value to style on. And just like anything else, you get aggregation there.

Oh, I forgot to show you: you'll need a free access token from Mapbox to make this work. You can get that on their website. The help button explains all that.

Let me just quickly show you: in addition to the mapping, which is sort of the most complex thing, you can just click on the table button and see the data as a table, and you can go back and forth. If you've ever used notebooks, the tabular output is kind of crappy, so this is a much nicer way to see your data in a table. And then you can go in here, and the idea is that at any point in time you might want to see your data as any one of these types of visualizations. I don't think this one will make sense, really. Yeah. But it's kind of fun to play with this. You see, like, the relationship between home price and number of bathrooms, things like that. And go back to the map.

So that is the demo, and it all worked. For programmers, I'd like to court your participation on the project. As I mentioned earlier, this all works off the renderers.
We're calling all these different visualization styles renderers. They all work off of a Spark data frame, which is exposed to the code. In my case, in mapping, I need to translate that Spark data frame into GeoJSON, because that's sort of the lingua franca of web-based mapping. And if you chose a thematic choropleth-style map, I need to quickly generate five quantiles. I just chose that because that's the most common thing you do, and it's obviously a really easy thing for Python to do. So the data is in GeoJSON, and the cartographic styling is in another style type of JSON file. We create that, and then we combine it all into a Jinja2 template. PixieDust adopted Jinja2 as a templating engine, so you can use variables. You don't just have to spit out HTML; you can reference variables from your Python code within the HTML to bring data in. And then we show it all in the output of a notebook cell.

The magic of notebook cells is that you can embed any HTML, JavaScript, and CSS inside them. So you can do amazing things. You can take that as far as you want, and we're pushing it pretty much as hard as you can. In our case, only for the mapping, for some reason Mapbox's library doesn't like being embedded in a div, so we had to put it in an iframe. So we have an iframe that gets shown before we embed the mapView.html inside that iframe. And then, as I mentioned before, we call the base mapping service from Mapbox to show a really pretty street map underneath.

The future of this particular part of PixieDust is probably adding more support for other companies, maybe Esri support for mapping, in addition to Mapbox and Google. Right now, we're just doing points.
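To make the "five quantiles" step described above a bit more concrete, here is a minimal sketch of computing quantile class breaks and assigning each value a color class, using only the Python standard library. This is an illustration under stated assumptions, not the renderer's actual implementation: the function names `quantile_breaks` and `classify`, the prices, and the color palette are all hypothetical.

```python
import bisect
import statistics


def quantile_breaks(values, classes=5):
    """Return the (classes - 1) cut points that split values into equal-count bins."""
    return statistics.quantiles(values, n=classes)


def classify(value, breaks):
    """Return the 0-based class index of value, given sorted cut points."""
    return bisect.bisect_right(breaks, value)


# Hypothetical sale prices, not taken from the talk's data set.
prices = [
    1_000_000, 1_050_000, 1_100_000, 1_200_000, 1_350_000,
    1_500_000, 1_750_000, 2_000_000, 2_600_000, 4_000_000,
]
breaks = quantile_breaks(prices)

# One color per class, light to dark; the palette is an arbitrary choice.
palette = ["#ffffcc", "#a1dab4", "#41b6c4", "#2c7fb8", "#253494"]
colors = [palette[classify(price, breaks)] for price in prices]

for price, color in zip(prices, colors):
    print(price, color)
```

Quantile breaks put roughly the same number of points in each class, which keeps a handful of very expensive sales from washing out the rest of the map the way equal-width intervals would.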
It would be nice to support polygons, and then to have more cartographic visualization options. A lot of naturally occurring data, like weather patterns or hurricanes or erosion likelihood, is more suited to a hexbin-type view rather than a hard-line view. Points, lines, and polygons are really meant for man-made features, not really environmental things like rainfall patterns. So having some support for continuously changing data visualizations is in the future. And then animated visualization: if you want to see plume dispersion (that's something pretty big now, with all the things happening in subways) or flooding, the way water disperses across the surface, those are the kind of scientific areas where you'd want an animated spatiotemporal visualization. That's a lot harder than what's happened so far, so please help.

And that's what I've got. So go out and use PixieDust. Install it, try out Data Science Experience, or try it on your own in your local notebooks, and let me know how you like it, or if you want to get started helping us write some code. Thanks.