Sure, even though most of the best programmers I know are generalists. And if you become a specialist for long enough you become a guru, and then you just turn up and they give you money, and it's great, I hear. Anyway, so I'm on my way to gurudom. I gave a few talks in London and talked to a few people, and O'Reilly got in touch. Right now Python is obviously amazingly hot in the area of data processing. JavaScript is amazingly hot just because it's JavaScript, and the web isn't going to get any smaller anytime soon. And it also just happens to have some fantastic visualization libraries. So that suggested something. And if you buy me a pint I'll tell you how I managed to get bees on my cover, because you can't just request your animal; you have to give them a story. I worked with bees as a research scientist for a while, so I was rather chuffed about that.

The great thing about writing a book is that editors ask you questions like "why?", which is always a hard one. They also talk about pain points, which is a really good way to focus: what are the pain points this kind of book, or this area, is trying to solve? And for me:
I was in science. I did lots of GUIs in my time: wxPython, PyQt, absolutely everything. They're all lying around, beautiful, and I'm very impressed by them, but I'll never show them to any of you here because it's just too much trouble. If I'd done them as web apps, if I'd made the web GUI the focus initially, I'd be able to press a button and share them with the world. So there's an integral pressure there, just because of the distribution: I press the button and everyone sees it.

So there's only one web language, and it isn't Python. Now, what do we do to get onto the web? There are some fantastic Python initiatives. I'm really impressed by Bokeh, Plotly, Vega: very clever programming gymnastics. But there's an elephant in the living room, and it's JavaScript, if you hadn't worked it out. I paid for a license for this cartoon, so I'd like a couple of laughs. It's rather good.

Anyway, people tend to tiptoe around that fact. We're programmers. We don't like to be told something's going to be compiled into our language. That's a horrible thought for Pythonistas. It's also a horrible thought for people who use JavaScript, actually, because some of us quite like it. So it's great, compiling into JavaScript. You can do things with it. But you're going to be faced with the same problem you face if you compile to anything else: debugging. Debugging takes place in the browser, and it's much better in the browser. If you have to work out mapping files and everything else, well, in my experience it gets pretty horrible. CoffeeScript had a kind of an idea there, but it didn't take.

And the thing is, JavaScript isn't a bad language, whatever you might have heard. It's rather good. Here's a completely random polemical bullet point which I will refuse to defend if asked.
I just stuck it there randomly. You can talk about it or not. But I do think that, whatever happens these days, you're probably going to run up against JavaScript. And yes, the standards committee, they're working on it. They'll get there in about 100 years. If you're prepared to wait that long, you'll be fine.

So it turns out, and the whole point of this book is, that JavaScript and Python interoperate pretty well. They're both scripting languages. They're both very simple. A simple cheat sheet, quite frankly, gets you most of the way there. It really is not a difficult language. It has its quirks (which language doesn't?). You learn three or four gotchas and you just move on and start programming. We're all programmers, so that's easy, right? It's not C++. You don't have to spend three years communing with a guru to understand how to do the simplest thing. It's very simple.

And it's not all one way. I love coding in Python. I think there's a reason why there are so many great Python libraries: the Python programming experience is so pleasant, and it's so lovely to read other people's code. But JavaScript has first-class functional methods on arrays: you can do filter, map, reduce, things like this. There are anonymous functions being used as well, which Python only has in the limited form of single-expression lambdas. So sometimes it's actually a more pleasant programming environment than Python, and I never, ever thought I would say that.

It smokes Python for speed. With Google involved, there's an arms race on in JavaScript right now. These are taken off a famous benchmarking site: spectral-norm, n-body. We're talking 20 to 60, sometimes a hundred times faster. They've got JavaScript running almost native in places. They're working on it; huge forces are involved in making it a very, very efficient, powerful language.

But the other thing is, you can get obsessed with syntax.
It's all stupid, I mean, for people who worry about it. I love Python's whitespace. I'll defend it, because you spend most of your time reading code. So that's great, but syntax isn't the issue, for data visualization certainly, and for a lot of other things. I think declarative and functional paradigms are much more significant than any particular language. D3, the library, uses a very particular type of programming convention. It is declarative and functional, and you would be faced with the same problems using it in any language. You would still have to maintain these abstractions.

So yes, we should be in a good place here. They should be a perfect complement, right? I know Node is making forays into the world of the server, but as I'm sure a lot of you are aware, it hasn't got the libraries yet. If you're doing data visualization, Python has amazing data processing stacks, and they're getting better all the time, but it hits this brick wall when it wants to express itself on the web. As I said, there are various solutions to that. My feeling, having tried and played with them a lot, is that you actually want to be doing programming in the browser, and there's only one way to do that.

So why is there this pushback? I've talked to Pythonistas; I tried to get their feel for it. I think a lot of people are scared, and I think they've got every reason to be scared, because web dev is horrible. It's associated with all sorts of cruft, fanboyism, frameworks, all this rubbish: "I need to use an IDE", "I don't need to use an IDE". So there's this big space between Py-land and JS-land in people's mindset, and my book, if anything, is to suggest that that is figmentary. Actually, you can just pass JSON between the two of them. You create a little shim that passes the data. It's all about the data.
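As a rough illustration of the "little shim" idea, here is a minimal Python sketch of the hand-off. The records and the `to_payload` helper are hypothetical, invented for this example; the talk does not show this code.

```python
import json

# Hypothetical records standing in for cleaned Nobel winners data.
winners = [
    {"name": "Marie Curie", "category": "Physics", "year": 1903},
    {"name": "Maria Goeppert Mayer", "category": "Physics", "year": 1963},
]


def to_payload(records):
    """Serialize a list of dicts to a JSON string that a browser can parse."""
    # ensure_ascii=False keeps any accented names readable in the output.
    return json.dumps(records, ensure_ascii=False)


payload = to_payload(winners)

# Parsing the string back shows that nothing is lost in the hand-off.
round_trip = json.loads(payload)
```

On the browser side, `JSON.parse` (or a loader such as `d3.json`) turns the same string back into an array of plain objects, which is all the visualization code needs.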
Just follow the data. At which point you're just doing programming. You've created a very, very small amount of conventional web dev, which is getting smaller all the time. So my proposition is a little HTML skeleton. To be honest, it's not much more onerous than a lot of the tags that we used to have to write for Qt, or the XHTML tags I had to write for wxPython at one point. That was more horrible than conventional HTML5, quite frankly.

JSON is the obvious delivery format. It's pretty good. You can do pretty much anything with it, and Python of course handles it fantastically. And the great thing about Python, compared to, say, R or any of the MATLAB-y mathematical environments, is that you can just roll a server in Python in a few lines. That's amazing. You can stay in the Python ecosystem to do everything: all the processing, all the delivery. You only need to leave it when you hand off your data to the web, at which point maintaining control of it, certainly in the visualization context anyway, doesn't work for me.

You can call it a one-page app if you like. You can think of it as just a canvas on which you're programming. You can think of it as being desktop-based; I do. You can even use various ways to make desktop-y apps, and then you press a button and they're suddenly on the web, which is incredible.

So I thought I should do an amazing visualization for this talk. I didn't have much time, but I'm still very proud of it, because I've ordered the numbers here in such a way that you can see patterns in the data. I've ordered the rows. I've done a lot here to help you visually understand what's going on. Because this was the original, right? Not as helpful. So there's a data visualization: we've taken the data and made it easy to consume. Of course, some people would say that's a better way. I don't know. You choose, you choose.
It's entirely up to you. Now, I did this in about two minutes. Mike Bostock, the creator of D3, would probably have done it in 30 seconds. But the point is that when you start using D3 you start just thinking of something and almost doing it. It forces you to adapt your abstractions. But once you've done that, and if you're taught well, and I intend to teach you well, then you can do things.

And I just want to show you: that's all the web dev involved in that. Now, obviously, it's not a particularly impressive visualization, but that's the HTML file. That's the index. That's it, and everything else is programming. This is JavaScript. I won't go through it now, but I think you'll agree it does look like programming. And it is programming, and it's pretty easy to do as well. It's not much more difficult than Python. It's as powerful, as expressive. And the point is to minimize all of that horrible cruft. You don't need Less. You don't need IDEs. You don't need any gulps or grunts or any of that rubbish. You can program with a bit of CSS, a bit of JavaScript and a bit of HTML as a backbone, and the rest of it is just programming, which is the point.

Just one shout-out to D3. You've probably heard of it. It's not the only game in the park, but it's so much better than everything else that it sort of overwhelms data visualization in the browser. And the point, I think, has to be made: I remember when it was Protovis. I remember following it when Mike Bostock was first developing it, and everyone was asking: why in JavaScript?
That's insane. This is a powerful visualization choice, one of the best implementations of a very profound book called The Grammar of Graphics, which is the basis of ggplot2, which I know a lot of Pythonistas have terrible envy problems with. I do, having seen it work. But it has a solid theoretical core, and he did it in JavaScript. He was going against the grain then. In fact, he was using SVG, which was about to be dumped, right? It's easy to forget that Scalable Vector Graphics were on their way out, and now it's unthinkable that any browser wouldn't support pretty much the whole caboodle.

It's not a charting library. In fact, I think the fun thing about this is getting away from charts, getting away from conventional data expressions, the sort of thing that we've been obliged to use because we couldn't do anything else, because the software didn't let us change anything. And as programmers, that's pretty frustrating. So the idea is to build. You can build chart libraries with it. There are lots of them out there, and I suggest using them instead of rolling your own. I will teach you how to build a bar chart. It's a very good learning experience, a very good way to absorb the fundamentals, but there's a point where you just want to pass all that off. I think the innovative use of data, the innovative visualization of data, is what D3 and libraries like it are for. I mean, there will be other libraries. At the moment D3 is predominant because it's so mature. It's a 10-15 year project, and it shows.

So, the idea of the book. Harking back to that,
I was making the point with that visualization transformation: all data visualization is essentially transformative. And the point is transforming data into things that the primary hominid visual cortex can easily absorb. That's what you do when you communicate with a picture. So I thought I would try and base the book around a transformation: to transform Wikipedia's Nobel Prize page, which is fairly dry, into a modern interactive visualization, and to teach the whole process, and also to use all the amazing Python stack available in a not completely contrived way.

So Scrapy will scrape the pages for you. I know there's been a talk here, and I've seen the team. Scrapy seems to be coming on in leaps and bounds, and once you get into it (and it's not a huge learning curve) you can do amazing things. Cleaning: we all know that pandas is great. It's a great way to clean data, and the data is always dirty, always. Pandas and Matplotlib, with seaborn and others, are a great way to explore it. And then you can roll a server, using Flask, in a few lines. You can roll a RESTful API that your web-based JavaScript can use. That's amazing. You cannot get that in any other language that I'm aware of. I've seen web servers in R, and the goggles do nothing.

So this is your starting point. As you see, this is how it looks in Wikipedia. People have lovingly entered these names by hand. Your alarm bells should be ringing at this point: human-entered data. So that's going to be an interesting challenge. Winners by country, and these link out.
I wasn't prepared to risk the Wi-Fi, but these will link out to the individual winners. The idea was to scrape this page, then use this page to get the individual winners, and then scrape them: get all the biographical data about the categories and the people, and turn it into something a bit more digestible, I think. Unfortunately I'm limited by the resolution of this screen. I figured that it would be huge, but this is a 768, so it almost fits. So this is the sort of thing where you can ask questions of the data. You can kind of quiz it. And the point is discovering your own stories, discovering your own narratives, and not being obliged to absorb other people's, because generally not much has happened since the static visualizations of Victorian times. The Times newspaper did amazing visualizations in ink, and these became visualizations in pixels, and now we're reaching a point where we can actually play with them. We can interact. That's a huge thing. This is an enormous development, I think, in the capacity of human beings to communicate. It's really big, honestly.

Right, so first we scrape. I'm not going to go into any great detail; I'll just give you a kind of feel for it. Scrapy, as I said, has a learning curve, but you get used to using it. As you see, I'm using Chrome's explorer. The other great thing about programming JavaScript, if you don't know, is that there are some very powerful debugging and exploratory tools built into modern browsers. The Chrome kit is arguably the best. And you might well be surprised at how much you can do. In fact, the debugging environment is vastly better than anything Python has. You can do almost anything, and there's a performance profiler.
It's all built in. In this case I'm using it to explore the structure of the page. I get what's called the XPath, which is the syntactical identification of the bits I want. In this case it's the biographical detail of the Nobel Prize winners: I want their little mini-bio and I want their picture. I create what's called a spider within Scrapy, and I send that through. It deals with all the asynchronous load balancing. It deals with all the cleverness that you don't want to deal with yourself. For example, if you do manual scraping you'll probably get banned within 20 minutes, because you'll be hitting the server too hard. Then you want to do manual throttling, and that's a bit of an art form. The nice thing about Scrapy is that it just does all that for you. It'll even do anonymized GETs and various other things. Then you send it to a pipeline, to the finishing zone, in this case to consume the image. So at the end of this I'm left with a nice array of JSON objects, almost certainly dirty, and the job is to identify and remove the anomalous fields, clean it up as best we can, and then use pandas to explore it with Matplotlib.

So we've got our data. We've got an array of JSON objects; that's what you get left with. In our case we also have little local links to the image data and anything else that we might have been interested in. All that is sensibly hashed and efficiently done, and Scrapy does all that for you, and it's lovely.

Cleaning is only so interesting. But you load it into the DataFrame and do a quick recce. Are there obvious missing fields? There's an obvious missing field in place of death. You use these built-in pandas methods: you describe your data, and you can see that there are duplicates in the name field. Here we've got a frequency of two for this guy.
That's a flag. And you see other things: 59 countries in total for nationality, and other stuff. This is a very nice way of summarizing what you're looking at and also directing you to the attention you probably need to give it. Here's one example of how easy it is with pandas to clean stuff. The date of death was recorded by a human being in Wikipedia, and we have the Johannes Diderik van der Waals field, and his date of death is "Diederik Korteweg". Which is what happens when you get human beings to fill in data. That's a category error in philosophy; I remember that much of my philosophy degree. But you can see how, in three lines, you can find it, you can tag it, you can fix it, you can throw it away, do what you like. Pandas makes this sort of thing very easy.

And at the end of it you have got rid of the worst stains. There'll be gotchas. There are always going to be gotchas, because that's the nature of the beast, but you've cleaned it pretty well. We have our 858 winners. Those are the only ones that Wikipedia has recorded. It might not be all the winners; it probably won't be. There'll be somebody in, usually, a small country who's missed the fact that they have a Nobel Prize winner in their country, and no one else has bothered. So you miss two or three. That's the nature of Wikipedia, but equally that's the joy of it, I guess.

You then want to explore. This is much more fun. You want to explore your data with pandas. You're looking for stories to tell; you're looking for correlations. I mean, everything in data visualization is really narrative. Even if it's a dashboard, and you're trying to explain your accounts to someone, or you're trying to explain anything, you're communicating a story.
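The cleaning pass described above (describe the data, spot the duplicate name, repair the hand-entered date of death) might look roughly like the following. This is a sketch under assumptions: the records and the column names `name`, `country` and `date_of_death` are invented for illustration and are not the speaker's actual dataset or code.

```python
import pandas as pd

# Hypothetical scraped records: one duplicated row and one hand-entered
# "date" that is actually a person's name.
records = [
    {"name": "Marie Curie", "country": "France",
     "date_of_death": "1934-07-04"},
    {"name": "Marie Curie", "country": "France",
     "date_of_death": "1934-07-04"},
    {"name": "Johannes Diderik van der Waals", "country": "Netherlands",
     "date_of_death": "Diederik Korteweg"},
    {"name": "Maria Goeppert Mayer", "country": "United States",
     "date_of_death": "1972-02-20"},
]
df = pd.DataFrame(records)

# Quick recce: for text columns, describe() reports count, unique, top, freq.
summary = df.describe(include="all")

# A top frequency above one in the name column flags duplicates.
duplicated_names = df[df.duplicated("name", keep=False)]["name"].unique().tolist()
df = df.drop_duplicates("name").reset_index(drop=True)

# Find it: anything that cannot be parsed as a date becomes NaT.
parsed = pd.to_datetime(df["date_of_death"], errors="coerce")
bad_rows = df[parsed.isna()]

# Fix it: keep the parsed dates and patch the one bad entry by hand
# (van der Waals died on 8 March 1923).
df["date_of_death"] = parsed
is_vdw = df["name"] == "Johannes Diderik van der Waals"
df.loc[is_vdw, "date_of_death"] = pd.Timestamp("1923-03-08")
```

Coercing unparseable values to `NaT` rather than raising is a deliberate choice here: it lets you list every bad row in one pass and then decide whether to fix, tag or drop each one.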
That's what human beings respond to, I think. So you're looking for narratives that you can tell, and the pandas exploration will suggest those. Here's a really quite dull narrative, probably quite predictable, but it's a story: the United States has a huge number of Nobel Prizes relative to many others. Although, as you'll see, that's not a per capita index; that's when it gets a bit more interesting. But you can get a more interesting story with a few more lines of pandas. This is breaking the countries into regions. Up here we just create North America, Europe, Asia; we just take the three biggest. And we just plot. You see how easy it is, just interacting. And you can see the blue chart, which is America's Nobel Prize haul, passes the European chart around about 1980-odd, and that's a story. So America's shooting off, and this is all post-war investment in American science. It was a huge thing: after the atom bomb, the Manhattan Project and various other things, scientists got a lot of money and a lot of support.

Here's a huge story in two lines of pandas: gender disparities in the Nobel Prizes. That's pretty shocking, right? You can look at individual things like the distribution of the age of winners. Quite interesting: if you haven't got a Nobel Prize yet and you're 95, it's very unlikely. If you're around about 60, that's your sweet spot. So if you're expecting a letter (it's probably still a letter) from Sweden, then there you go, it's a good time right now.

This is life expectancy. This is even more interesting. They're incredibly long-lived. Now, of course, if they're selecting at 60, there's already a selection pressure: if they're getting prizes at 60, they've already got rid of everyone who died before that. But what's nice is I'm using what's called a violin plot.
This is seaborn's violin plot. It gives me a kind of extension of a box plot; it gives me the distribution as well, and it's very, very easy to do in a few lines. And the longevity of the Nobel Prize winners is astonishing. And you can do something a bit more interesting: you can plot the longevity against the year in which they won it, and you see this is kind of like doing a little bit of population demographics. It changes over time; people are getting longer-lived. So here we have a linear regression with a confidence interval, in a few lines, using seaborn's lmplot, which is very nice. Seaborn is lovely. It has statistical extensions to Matplotlib. I'd thoroughly recommend it.

And here's another story which just dropped out. It's called the Nobel diaspora. This is plotting the country in which a Nobel Prize winner was born against the country in which they won their prize. Wikipedia has that data, or a fair amount of that data, and you just do that in a little heat map in a few lines. These black spots here represent the exodus of essentially Jewish scientists from Europe in World War One and World War Two, following periods of anti-Semitism. There's a story in there. There's also a story with some others: a lot of Canadians moved to America. There's probably a reason for that; I'll have to ask some Canadians. But yeah, the Austro-Hungarian Empire breakup shows as a clear signal, and the breakup of the Third Reich as well, in the little heat map.

So once you've done all that, you want to imagine a visualization. Less is more, almost certainly. You want to create a context in which people can find their own stories. That's a big idea of modern data vis.
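The "few lines of pandas" behind the gender split and the born-in versus won-in heat map could be sketched as below. The rows and the column names `born_in`, `country` and `gender` are assumptions made up for this example; only the general approach (a value count and a cross-tabulation) is taken from the talk.

```python
import pandas as pd

# Hypothetical rows; born_in, country and gender are assumed column names.
df = pd.DataFrame(
    {
        "name": ["winner_1", "winner_2", "winner_3", "winner_4", "winner_5"],
        "born_in": ["Germany", "Germany", "Hungary", "Canada", "United States"],
        "country": ["United States", "Germany", "United States",
                    "United States", "United States"],
        "gender": ["male", "female", "male", "male", "male"],
    }
)

# Gender split in one line.
by_gender = df["gender"].value_counts()

# Keep only winners whose prize country differs from their birth country,
# then count each (born_in, country) pair.
moved = df[df["born_in"] != df["country"]]
diaspora = pd.crosstab(moved["born_in"], moved["country"])

# The resulting grid is what a heat map would draw, for example with
# seaborn.heatmap(diaspora), which is left out here to keep the sketch
# free of plotting dependencies.
```

Filtering out winners who stayed put before cross-tabulating keeps the large on-diagonal counts from drowning the migration signal.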
I mean, obviously sometimes you want editorializing, but I think even the strongest editorializing should allow for some alternative perspectives, because otherwise we're just saying, "Here it is. This is what you were interested in, wasn't it?" So if you create something which guides people, puts them in the zone, you can tell the story that you want to tell, but equally you can allow them to forage. And that's, I guess, the idea of the visualization here.

Before you can do it, you need to deliver your data. As I mentioned, Flask is a fantastic way to do this. And this is a Flask RESTful API. That's the enormous amount of work required if you have your data in MongoDB. It's not a lot longer for a standard SQL implementation. Flask-Restless, which I explain in the book, is just as good. But this is Eve. It's a lovely new NoSQL RESTful API roller, and as you can see it's five or six lines, at which point you can consume your data from a JavaScript app straight out of the Mongo database. And of course you can tailor this to your heart's content. Eve is very well built. It's very powerful. It's got lots of bells and whistles, but this is the basic implementation. As you see, you test it from the command line with curl, and it produces lovely JSON files, which you can then use D3 to turn into something nice.

And this is the transformative phase. You pass it off to the browser. I know you can do stuff that doesn't involve essentially passing data to the browser; you pass a pre-compiled, I guess, JavaScript-y thing. Obviously that has its place. But what do you do when it goes wrong? I mean, the browser has this great debugging set, and I have this computer-generated code. And I'm sure a lot of you have nightmares about computer-generated code, because I can recall using Borland Delphi and a number of other things, and billions of lines that you'll never understand just to do the simplest thing, and that's funny.
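To give a feel for the data delivery step without reproducing the Eve or Flask code from the slides, here is a stand-in sketch that uses only Python's standard library `http.server`. It is not the speaker's implementation: the `/api/winners` route, the `WINNERS` records and the `query_winners` and `serve` helpers are all hypothetical, and an in-memory list replaces MongoDB.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory data standing in for the Mongo collection.
WINNERS = [
    {"name": "Marie Curie", "category": "Physics", "year": 1903},
    {"name": "Maria Goeppert Mayer", "category": "Physics", "year": 1963},
    {"name": "Derek Walcott", "category": "Literature", "year": 1992},
]


def query_winners(query_string):
    """Return winners matching every field=value pair in the query string."""
    filters = {key: values[0] for key, values in parse_qs(query_string).items()}
    return [
        winner for winner in WINNERS
        if all(str(winner.get(key)) == value for key, value in filters.items())
    ]


class WinnersHandler(BaseHTTPRequestHandler):
    """Serve GET /api/winners as JSON, with optional field filters."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/api/winners":
            self.send_error(404)
            return
        body = json.dumps(query_winners(url.query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the server until interrupted (not called automatically)."""
    HTTPServer(("localhost", port), WinnersHandler).serve_forever()
```

After calling `serve()`, a command such as `curl "http://localhost:8000/api/winners?category=Physics"` should return the matching records as a JSON array. A framework like Flask or Eve adds routing, pagination and database access on top of the same basic idea.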
That's not my shtick anymore. We're programmers. We want to control things at a low enough level, and an expressive enough level, and it's hard to do that with indirection. That's my kind of feeling, having, as I said, tried them. But that's not to say that other alternatives don't have a place. It's just my perspective.

So you build your visualization. I'm not going to go through all of that now, but with D3 all of these are completely built from scratch. And that's the other thing with D3: of course you can use bar chart plugins, but it's just a real thrill to build your first bar chart, controlling all of the different elements. Because I'm sure you've had that feeling: you get a piece of software that is great as long as you're doing things its way, and then you want to change something, and then you go on to another piece of software, and you just keep doing this in a cycle. With D3, you just change it.

So right, let's see a couple of stories to end on. Has anyone got an estimate of the number of female physics prize winners? Anyone know? Two. Who said two? All right, you're good. You know who the other one was? I presume everyone knows the first one. Yeah, everyone knows the first one. Who was the other one? I don't know. I don't think so, not in physics. She might have won in chemistry. No, I don't think so. Anyway, I think it's fair to say two female physics prize winners. You'd have thought somebody would know the second one.

Two, by the way. But I can show you. First off, here's the big story. Here we go. So that's female Nobel Prize winners, which as you can see is kind of smaller. It's only a bit, right? Let me... come on. And let's do physics. Here we go. Oops. There we go: Maria Goeppert Mayer. Why doesn't anyone know her name? I mean, other than Marie Curie...
She's the only other female physics prize winner. It's astonishing, right? There you go. So there's a story told.

Let's pull this back. Let's tell another story: per capita. Let's change the winning metric a little bit. This big thing here is Saint Lucia: Derek Walcott, the poet. This is an island with a population of under 200,000, so you can imagine a per capita rating is going to skew a bit. But let's do physics, because that involves money and stuff. You can't do that in Saint Lucia with their research budget. Anyone from the Netherlands? OK, well, the Netherlands, the Scandinavians, Denmark and the Swiss all do incredibly well on a per capita index, which I think is probably fairer than most others I can think of.

And another story, I guess, would be... let's see. Let's do economics prizes. So is that a post-war neo-libertarian consensus I see? Possibly.

So those are stories, tracks in the data. You can find them yourself. I would direct you to them if I did this. As I said, the idea is to learn how to build this, but the main thing is to allow people to find their own stories, and to balance that a bit: you don't leave them completely in the dark.

Well, let's just have a very quick look. This is what we started with. As you can see, it's slightly less easy to find stories about individuals. But that's the original, once converted.

This is all the HTML that it uses. Those are tags. They're essentially a backbone that you will flesh out programmatically, using D3 in this case. As you can see, it's not a lot. There are far more sophisticated visualizations, but that's multiple elements, and they're all interactive, and various other things. This is not too onerous, I think, in terms of your web dev. That's pretty much the whole file.
Give or take. The rest is just importing scripts. As you know, JavaScript has not fixed importing. ECMAScript 6, which is pretty much out now, has made big strides there. JavaScript is moving quite fast. It is improving. Some think it's going backwards in terms of classes, but that's another issue. But yeah, as you can see, there are my script files, and each one controls a component. And you've got to get the order right, but then you're never going to get away from that in programming, right? I mean, circular imports are always a bit of a problem. And that's loading your D3, which you can load off the web using CDNs, which is pretty efficient.

And to summarize: mediated by JSON, Python and JavaScript are a great complement. I really think they work very well together. Also, something's got to work with JavaScript, and it should be Python, right? Because the alternative is worse: JavaScript just goes and does it on its own. And there's probably enough energy and money and everything for JavaScript to roll a data processing library. It'll take a while to reach Python's heights, but... I think the thing is, make friends with the elephant. Very little web dev needed, as I hope I showed.

These are exciting times. This is a very exciting time for data vis. And data is everything, right? And visualizing data is communicating data, because doing it any other way is usually pretty bad. So this is an exciting time. A guy called Minard did the famous visualization (I should have shown it here) of Napoleon's army, the death march to Moscow.
This was an amazing visualization that captured multi-dimensional data in a single frame. The big challenge now, I think, is capturing multi-dimensional data. That's what we need to do, and we now have the tools to hand.

There will be tweets. I'll be tweeting around the book and stuff, and there's a mailing list if you want to hear things about it. I'm done.

I've got one question. Yeah, hi, brilliant talk. I started learning Python and trying to do my own little data vis project. That was the first thing I did to learn Python, so basically I'm trying to do exactly what you're doing. So my question is really practical: have you finished your book? Have you got something on GitHub I could look at?

It's a few months away. I'm in the middle of it.

Can I talk to you after the session?

Absolutely. But yeah, the book's probably three months away. My editors are on my case, with deadlines whizzing past me. But it's mostly done. I'm about to receive feedback, at which point I will probably just scuttle into a hole and die. But then I'll finish it. Anyway, thanks.

Do you have a favorite set of GIS libraries that you work with? Mapping libraries, geospatial?

Yeah, I use... my brain's gone blank. I do work with shapefiles, and there's a fairly decent open source one. I can't remember its name right now, but I use it. But generally, I don't do a huge amount of mapping work. TopoJSON is the format for D3, and you can normally find the map you need to start with. But yeah, I can tell you.

Thanks for the nice talk. Just a quick question: do you recommend any way or method we can use to export the report to PDF or something like that?
So, export the charts or the results into a PDF format or something like that.

Well, I mean, that would obviously be static. I don't think you can do interactive charts in PDFs. But yeah, you can do HTML to PDF. That's pretty trivial, I think.

Yeah, one question. You spoke about Eve and MongoDB. I think it's a very good choice. Have you also considered using MongoEngine, which provides a more natural way to query Mongo from Python? And there is also an extension, Eve-MongoEngine, which has started. Maybe you could consider it.

Yeah, I've seen MongoEngine. There's so much good stuff to talk about, especially in that area. But yeah, no, absolutely. I know MongoEngine and I'd recommend it.

And also for the GIS, just as a complement, Leaflet is excellent for JavaScript.

Sorry? Leaflet. Leaflet, yes. Yeah, I cover Leaflet in the book. There's one example. Leaflet's a great way to do mapping without having to create everything yourself with D3. And yeah, it should be said, there are some very nice high-level JS libraries.

Any more questions? So, thanks.