Our next speaker, whom I'm also thrilled to present, is an assistant professor of operations management in the Desautels Faculty of Management at McGill University. His research focuses on the application of predictive analytics and optimization methodologies to improve operational decisions in retailing. A self-professed R addict, he has developed two really important packages that he's actually going to show us today. Please help me welcome Ramnath Vaidyanathan.

Thanks so much for having me here. This is a good talk to have right after the IPython talk, because many of you would have followed all the nice things that people in Python say about R and people in R say about Python. So Jake's kind of really upped the ante for me, but I'll try to do my best. So, a quick thing. As Aideen said, I'm a professor in Montreal, and my research essentially focuses on retailing, mainly product variety and stuff like that. The reason I put this up: the very first time that I gave a talk on data visualization, I was taking a bus down to New York. And at the border, immigration asked me, so what do you do? I said, I'm a professor. I do operations management and statistics. What are you going to talk about? Well, I'm going to talk about interactive visualizations, the kind that you see in the New York Times. And he's like, does this make sense? So basically, I got grilled for like half an hour. He was a good guy, and I basically spent time explaining it to him, and then he said, oh, now this makes sense. OK, good, good. Have a good talk. Since then, I think I've gotten better. Every time at the border, I make sure that I convey just the right amount of information, and I'm good.

So what's my talk about? My talk is going to be focused on an R package that I wrote called rCharts.
And the whole idea of this package is to take what people typically do with R, in terms of how they create plots, but instead of giving you static plots, give you interactive plots. In very simple terms, I think Jake's made my job really easy. What Jake's done with MPL, kind of going to D3, I've done the same thing with rCharts. With one difference, and I want to point out the difference right up front: Jake has done a really awesome job writing the JavaScript library that plugs in and ultimately does D3. I have kind of taken the easy way out, because when I started doing D3, I was not really comfortable coding that much JavaScript. So I said, hey, people have already built wrappers on top of D3, like NVD3 and a whole bunch of others. So I essentially piggyback on those libraries, and I don't write a lot of JavaScript code. It's essentially interpreting things that are already there.

So let me quickly get you comfortable with what rCharts can do. The main difference between how people think about plots in R versus how people think about plots in JavaScript, at least in my mind, is that the primitives are very different. In D3, in JavaScript, you build a plot pretty much point by point, line by line, axis by axis. But when you do stuff in R, well, how many people here have used R? Just a quick show of hands. OK, good. So as you know, in R you're very used to plotting by just saying, OK, here is x, here is y, here is my data set, give me the plot I need. I'm not really bothered too much about the other things. So here is one of the libraries that I plugged into rCharts. It's a library called Polycharts, which is based on the grammar of graphics. Too bad that it's not open source. It's free for non-commercial use.
So the idea is to say, OK, this is a scatter plot where I have a data set called mtcars, which essentially has mileage and weights of cars. The notation there is saying, give me a plot of miles per gallon versus weight, and you will find these variables in the data set called mtcars. And by the way, I want a plot that has points as the basic type. Once you do that, you see this plot. I'm not sure if you can see it, but at least you can see some kind of a hover. It's very small. But the idea is that this is an interactive plot. There is JavaScript code that is generating this plot on the fly.

Now, once I did this with one library, one of the things I found, and this is something I always puzzle over, is that every library takes its own tack in terms of data structures: whether it wants an array of objects or an object of arrays, wide data or long data. So one of the things I wanted to do was to say, OK, as an R user, I don't want to be exposed to the finer details. I just want my plot. So I used the same approach and plugged in a few other libraries. This is an example from a library called Highcharts. Again, the same idea. I want a plot of pulse versus height, and the data set is again a built-in data set in R. But this time, I want to size the points based on age and group them based on exercise. The type of chart I want is a bubble chart, and these are the title and the subtitle. Once you do this, you get this nice Highcharts plot. As you can see, for some reason reveal.js tends to rescale things, so my plots are much smaller than what I expected them to be, but it's all right. You can see that Highcharts has a lot more functionality in terms of what you can do. You can zoom in and zoom out and all that stuff. And all this you get for just writing a few lines of code.
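To make the "array of objects versus object of arrays" point concrete, here is a minimal Python sketch of the two shapes and the conversion between them. It is not rCharts code; the function names are made up for illustration, and the three rows are just the first few `wt` and `mpg` values from mtcars.

```python
import json

def columns_to_records(columns):
    """Convert an 'object of arrays' (one list per column) into an 'array of objects'."""
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]

def records_to_columns(records):
    """Convert an 'array of objects' (one dict per row) back into an 'object of arrays'."""
    names = list(records[0]) if records else []
    return {name: [record[name] for record in records] for name in names}

# A data-frame-like table stored column by column (illustrative sample rows).
cars = {"wt": [2.62, 2.875, 2.32], "mpg": [21.0, 21.0, 22.8]}

# Some charting libraries want one object per row instead.
records = columns_to_records(cars)
print(json.dumps(records))

# Others want the column-wise form, so the round trip matters too.
print(json.dumps(records_to_columns(records)))
```

A wrapper that hides this step from the user only needs to know which of the two shapes each JavaScript library expects.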
The Highcharts binding was actually written by my co-author, Thomas, who is in Sweden. So the basic idea is, as I said, think in terms of data. Data is the primitive that people working in R usually think about. And then you want an interactive plot with the same lines of code that you would typically use for a static plot. I could go on and on about this. Initially, this project started as a hobby project last summer, when I was teaching statistics and wanted to impress my students with some interactive graphs. Statistics is not a subject that a lot of people really like. Then once I found that there was a pattern, it became pretty much an academic quest to say, hey, can I find a library? Can I plug it in? As of today, I think I've plugged in close to 10 or 15 libraries: JavaScript wrapper libraries, not D3 itself. So the idea is that now I can use any one of those libraries to get the plot that I want.

This is an example of another library called Morris.js, which does pretty neat time series plots. These JavaScript libraries allow you more interactivity, like click and other things. But with a few lines of code, you pretty much get only the hover and maybe some basic interactive behavior. So this is still something that I'm working on. Another library is called DimpleJS, which is very powerful. The name is a little unfortunate, but I think it's a very powerful library built on top of D3. If you go to the rCharts website, you will see all the libraries that have been integrated. We're in the process of integrating more. The objective is to let the R user take advantage of all the wonderful work done by JavaScript developers. And interestingly enough, it also provides a community for the people developing JavaScript libraries: another community which uses them.
In fact, for some of these libraries, like Polycharts, I think R users filed more GitHub issues than the JavaScript users. So it's interesting to me that we are not just taking, we're also contributing back to the JavaScript community.

Now, motivated by that, I wrote another package called rMaps, and, no guesses here, it takes the same idea. Instead of charts, can I do interactive maps? Here again, I didn't want to get into primitives. So I said, OK, look at the libraries that already exist. There is Leaflet, which does really neat maps. I'm just going to quickly rush to the code here. This is just initializing a Leaflet object, setting a lat-long, and choosing a provider. I like Stamen's watercolor maps. Then I want a marker, and finally I'm saying print the map. And then you get a map like this, and you can point on it. rMaps tries to expose people to pretty much everything that the Leaflet API exposes.

Another one of my favorite examples. Many of these examples came about because of fishing expeditions. For example, I was looking at a library called Crosslet, which is a combination of Crossfilter and Leaflet and does some really nifty things. So I said, ah, this is a really nice JavaScript library that can be used to visualize spatial data. The data set I'm using here is from visualizing.org. It has web indices on multiple dimensions across countries. What I'm saying here is, look, I always tend to think of pretty much any plot as an x and a y. I know that it's probably not the best way to do it, but it simplifies my life.
So here I'm saying, OK, the x variable, I want you to think about it as the country; the y variable, I want these four variables to be plotted; the data set where you'll find these variables is web index; and I want you to render it using Crosslet. For this particular one, I think it's more fun executing it live. So let me just go down to this. OK, I'm just going to execute this, and, OK, there you go. Here you have a choropleth, and of course there's a lot of data missing. The nice thing is that you have all these numerical variables on the top where you can filter by multiple variables, and you can see which countries have indices in these ranges. The bulk of the work is done by Crosslet. As I said, I don't write a lot of JavaScript code. I study the JavaScript code written by other people and provide an interface from R, where I try to think about how somebody dealing with data would think about this plot. The way I would think about this plot is, OK, there are some numerical variables, there is a country variable, and that's basically what's going to give me the plot. So this is Crosslet. You can also do custom visualizations. I'll get into this a little later.

Now, just creating visualizations is not sufficient. Being able to share visualizations is extremely critical. Here's my rationale: whenever you want a tool to be used by people, you want to make sure that it doesn't change their workflows too much. That's the best way to get people to use a tool. If I think about how people typically use R, when they do static plots, they don't have to jump through hoops to share things, because static images are easy to include in a document. But when it comes to interactive visualizations, it's not the same, right?
There are a lot of dependencies. You have JavaScript assets, you have CSS assets, you have data, all these kinds of things. Now, it's not terribly complicated to ship everything when you share, but the easier you make it, the more likely people are to use it. That's the design principle I always use when I design packages of this kind: I really want to make sure that it doesn't deviate from people's workflows. So here is again a simple chart. It's a multi-bar chart, and I'm just writing the code to create it. Now, there are many mechanisms by which people want to share their visualizations. One is just a simple save. Here the idea is that I'm taking the plot I created and saying, OK, save it. You will see there is an argument called cdn equals TRUE, which is saying, don't link to the local library on my machine, but link all the JavaScript and CSS assets from an online repository. That's essentially what it does.

Now, I thought that this would solve the issue, because these assets are online, so when somebody shares it, it's going to be fine. But there is a problem. A lot of times I link to something online, and, for example, in this case, NVD3 decided to change the links to some of their JavaScript assets. Somebody who was using rCharts was giving a talk. One day the chart was working, the very next day the chart wasn't working, and nothing had changed on their side. So this prompted me to do something a little different, where I added another option called standalone. The idea is that the entire chart, the JavaScript assets, the CSS assets, everything is going to be bundled into one single standalone page. If you look at the HTML, it's really ugly, because what it does is convert everything to data URIs.
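The standalone idea, every asset inlined into a single HTML file as a data URI, can be sketched in a few lines of Python. This is only an illustration of the technique, not how rCharts implements it; `to_data_uri` and `standalone_page` are hypothetical names, and the asset contents are placeholders.

```python
import base64

def to_data_uri(content: bytes, mime_type: str) -> str:
    """Encode raw asset bytes as a base64 data URI."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def standalone_page(chart_html: str, scripts: list, styles: list) -> str:
    """Build one HTML page in which every CSS and JS asset is inlined as a data URI."""
    head = []
    for css in styles:
        uri = to_data_uri(css, "text/css")
        head.append(f'<link rel="stylesheet" href="{uri}">')
    for js in scripts:
        uri = to_data_uri(js, "text/javascript")
        head.append(f'<script src="{uri}"></script>')
    return (
        "<!doctype html>\n<html>\n<head>\n"
        + "\n".join(head)
        + "\n</head>\n<body>\n"
        + chart_html
        + "\n</body>\n</html>"
    )

# Placeholder asset contents; a real page would read the library's files from disk.
page = standalone_page(
    '<div id="chart"></div>',
    scripts=[b"console.log('chart code goes here');"],
    styles=[b"#chart { width: 600px; }"],
)
print(page)
```

The resulting file is larger and unpleasant to read, which matches the "really ugly" HTML described above, but it no longer depends on any external link staying alive.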
But the nice thing is that, with this HTML, people can share without really having to worry about things breaking for anybody.

Now, one of the things that Jake was talking about was this whole notion of reproducibility, and I think reproducibility is really critical. A lot of times I'm not just interested in sharing the chart. I'm also interested in sharing the code that came with the chart, and I want to keep things in sync. There are two ways to keep things in sync. One way is what Jake talked about, which is the good old IPython notebook. The other way, for those of you who are more comfortable with R: you would know that RStudio has a format called R Markdown. Let me just increase the font size so that you can see things better. What R Markdown does is allow you to mix code with the end product. So it's just like IPython notebooks. In fact, there are a lot of similarities between the two, to the extent that it's actually possible to automatically convert an R Markdown notebook to an IPython notebook, and I do that quite a lot. For example, in the slide deck that you just saw, where I showed you the Highcharts plot, this code is live in the document. Which means that if I change this code and execute it, my slides will change automatically. You can do the same thing with IPython notebooks too.

And of course, you want to make things really simple for people to use. As I said, that's the objective. For example, if I wanted to show the same plot in an R notebook, I would use the show method, embed it as an iframe, and choose for the assets to be delivered from a CDN. So these are a couple of other ways to share the same plot. You could inline it, use an iframe, a whole bunch of things.
Now, sometimes you just want to share a quick prototype. You don't want to go through the process of saving and then shipping and emailing. Here, again, I'm very inspired by people who built other packages. So Mike, I think one of the very good things about D3 is this whole thing where you could publish to gists and view things really nicely. I think it really triggered a movement where people share examples and can easily take advantage of things. So I wanted to do the same thing with rCharts: provide people with a function that would allow them to publish their charts online. Let me again do a quick demo of this. Let me just make sure that I have this code here. All that this code does is create a chart. So I'm going to run this code. OK, my plot is good. Now, if I want to publish this, I'm just going to invoke the publish method and say, OK, hair versus eye color. What it's doing in the back end is using an R package called httr. It saves the HTML, pushes it to a gist, and returns a link to a viewer that I borrowed from open source code online. And so now you have the chart show up here.

Now, in this case, I only shipped the plot. I didn't ship the code with the plot. I could do a live demo of that, but I want to make sure that I wrap up my talk in time. It's very easy to ship it with code. If you ship it with code, then the viewer allows other people to play with the code online. So this is a model very similar to Mike's bl.ocks. The only difference is that in the case of bl.ocks, people want to see the HTML code. In the case of rCharts, I believe people want to see the R code, because that's what's generating everything, right?
So I built that functionality. You can't see things here, but that's all right. What it's doing is opening this up in a web app. It's an R-based web app, which I will talk about in a few minutes. The nice thing is that it allows people to play with code without having rCharts installed on their computer. In this case, for example, if everything is good, it should show the plot. Yeah. So you see the very same plot being built online as what you saw there. The whole idea is that there's an ecosystem where people can publish their code with their plots, and somebody can go there directly and either copy the code to their R session or, if they don't want to do that and just want to play with things, hit the edit button, and it takes them to that. The main point I want to make here is that when building tools of this kind, being able to create is one thing, but being able to share, and making it really easy to share, especially for people who are used to sharing static assets, is really key. The simpler you make it, the more people are going to use it.

So this is as far as embedding goes. I will talk more about this in just a bit, but you can also do the same thing in IPython notebooks. I'll talk about the IPython R kernel, which was built recently and provides you another way to build self-contained, fully reproducible, dynamic, interactive documents. But again, these are only the means. Both IPython notebooks and R Markdown are the means to do it. They provide you a very nice way to tie things up.

Now, we just talked about creating charts. What about customizing them? You want to do more with the same chart. Here's an example of a scatter plot that I had shown you before, and in this case the only difference is that I'm coloring the points based on the number of gears.
Now suppose I want to add an interactive control to this chart, where I let the user dynamically pick the x-axis variable. There are multiple ways to do this. One way was shown yesterday, when Sam talked about MVC frameworks and how you could use Angular, Ember, or Backbone to create this. So again, I'm going to do a live demo to show you what can be done. Here all that I'm doing is creating the chart, and so you have the chart here. Now, suppose I want to add an interactive control, and here is what I want to do with it. I want the control to control the x variable. I want to initialize the x variable to weight, and I want the control to be populated with all the column names from this data set called mtcars. Let me run this, and now if I print this plot, you will see that it has added an interactive control on the left side. This is completely client-side, which means that I could just take the HTML and ship it off, just making sure that the assets are all linked online. What's happening in the back end is that it's actually writing AngularJS code. I will just show you the source. It's probably the most unidiomatic JavaScript, so please don't judge me on this. It's what happens when an R programmer writes JavaScript. You can see all these controllers over here.

So what I did with rCharts, I basically said, OK, when you think about a control, I want to specify the bare minimum set of things. And what's the bare minimum set of things I need to specify? I just need to say, here is the variable I want to control, here is the value I want to put in that variable, and here are the options I want you to add to it. Now, you can do more with this. Let me actually show you an example with this here.
Let me just change this, give me a second. Sometimes you also want to be able to do data manipulation in the front end. In this case, for example, I have a bar chart, and I want to add a filter. What the filter does is allow me to switch between males and females in the data set. If I run this whole plot, you will see that it's added a box here. Of course, there are overlapping bars here, and I haven't figured out a way to summarize it automatically, but if I switch to female or male, you will see that the data is actually getting filtered. All this is happening on the client side. What's happening is that R is writing Lo-Dash code, which does all this stuff. Now, before you think that this is a clean translation from R to JavaScript, which I hope you're not thinking, this is very hacky. The whole idea is to identify the primitives, as I said, and there's a lot of templating involved here.

So let me now quickly show you how rCharts works. I think this is important because it'll show you what happens in the background and why I believe that this approach can be ported to other languages, so that, I mean, R, Python, everybody can coexist peacefully, which is what I always want. Here is a simple example of a library called uvCharts, which is again a wrapper on top of D3. And this is the actual JavaScript chart. Now, if you break this JavaScript plot into multiple pieces, I'm sure you can't read that over here, but essentially what it's saying is that these are the assets being used by the plot. So what I do is I have a config file in rCharts, where the config file says, OK, these are the files I want to use for my JavaScript and CSS.
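The config-file idea can be pictured as a small per-library asset list that gets turned into `<link>` and `<script>` tags, with a switch between local files and a CDN. The Python sketch below is only an assumption about the general shape, not the actual rCharts config format; the library name, file names, and the `cdn.example.com` host are all placeholders.

```python
# Hypothetical per-library asset config; every name and URL here is a placeholder.
LIBRARY_CONFIG = {
    "mylib": {
        "css": ["css/mylib.css"],
        "js": ["js/d3.min.js", "js/mylib.min.js"],
        "local_base": "libraries/mylib",
        "cdn_base": "https://cdn.example.com/mylib",
    }
}

def asset_tags(library, cdn=False):
    """Return the HTML tags that load a library's CSS and JS, locally or from a CDN."""
    cfg = LIBRARY_CONFIG[library]
    base = cfg["cdn_base"] if cdn else cfg["local_base"]
    tags = [f'<link rel="stylesheet" href="{base}/{path}">' for path in cfg["css"]]
    tags += [f'<script src="{base}/{path}"></script>' for path in cfg["js"]]
    return tags

# Local links for use on the author's machine, CDN links for sharing.
for tag in asset_tags("mylib", cdn=True):
    print(tag)
```

Keeping the asset list in data rather than in code means adding a new JavaScript library is mostly a matter of writing another config entry.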
Now going to the actual JavaScript code, this is the code that writes the plot. When I think about JavaScript code like this, I tend to think about it in two parts: what's the data, and what's boilerplate. And you don't want the user to worry about the boilerplate. So what I've done here is abstract it out into a template. I use Mustache templates. My reason for choosing Mustache is that it's a logic-less templating framework, and pretty much every language has support for Mustache. This is a choice I made consciously, because I really wanted to be able to do this in Python, Ruby, and other languages.

Now, this is the data set, and here is the pain point. In fact, when I look back at all the code I wrote, most of it is there to translate between a data frame, which is the standard in R, and the choice of JSON that the developer of a particular library makes. And you'll be amazed at the range. People choose all different kinds of data structures: wide, long, array of objects, named. It's a whole gamut of things, and I believe that standardizing some of that will really make things simpler. For example, just assuming that people are going to use d3.csv or d3.json to read in the data will, I think, simplify things a bit. So there is the code here, and what it does is the transformation from the data to the JSON.

Once I do that, all that I have to do is wrap this code into an interface, and I'll quickly go over the interface. What I'm doing here is instantiating a new object of the rCharts class. I'm setting the library to uvCharts, which is actually a folder that contains all of this, and then I'm setting a few things. I want a bar chart. I want my categories to be the names in the data set.
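The split described above, boilerplate kept in a logic-less template and data converted to JSON, can be sketched in Python as follows. This is not rCharts code and not a real Mustache engine: `render` is a tiny stand-in that only substitutes `{{name}}` placeholders without any escaping, and `drawChart` is a hypothetical JavaScript function standing in for whatever a given charting library actually calls.

```python
import json
import re

# Boilerplate lives in the template; only the placeholders vary per chart.
# drawChart is a hypothetical JS function, not part of any real library.
CHART_TEMPLATE = """<div id="{{dom}}"></div>
<script>
  var config = {{config}};
  var data = {{data}};
  drawChart("{{dom}}", config, data);
</script>"""

def render(template, context):
    """Replace each {{name}} with context[name]; a minimal stand-in for Mustache."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(context[m.group(1)]), template)

def chart_html(records, chart_type, dom):
    """Serialize the data and options to JSON and drop them into the template."""
    context = {
        "dom": dom,
        "config": json.dumps({"type": chart_type}),
        "data": json.dumps(records),
    }
    return render(CHART_TEMPLATE, context)

# Made-up rows, just to show the output shape.
html = chart_html([{"name": "a", "value": 1}, {"name": "b", "value": 2}], "Bar", "chart1")
print(html)
```

Because the template has no logic in it, the same file could be filled in from R, Python, or Ruby, which is the portability argument made above.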
This is the data set, and I want you to put it in the DOM element called chart one. Once I do this, I have this plot. Now, I can make things even simpler by wrapping it into a function called UV plot, where the user doesn't have to worry about any of the conversions internally. As far as the user goes, switching from one library to another is seamless. Now, why would anyone want to do that? Well, what I've seen is that I always find features in one library that I like which I don't find in another library. So I keep switching between libraries for the kind of plots that I want. So now you have the UV plot functionality, and people just have to plug in their x, y, and other things.

OK, so far I've shown you simple plots. You can also do pretty complicated plots using all this. I debated between doing this in R Markdown versus the IPython notebook, and finally decided to do it in the IPython notebook. This is actually a recreation of a chart the New York Times had on strikeouts across the years. Let me just execute all the cells here. Hopefully nothing fails. One of the things you can see here is that I'm using a data set that's available in R called Lahman, which has all this baseball data. So the data is built in, and I'm using the grammar of graphics available in Polycharts. I will publish these IPython notebooks so you can take a deeper look. For example, what I'm doing here is creating a first layer consisting of points, strikeouts per game versus year, with a tooltip. Now I want to add a line. If you remember the New York Times interactive, there is a line that shows the mean for every single year. That's pretty easy to do here.
All that I'm doing here is specifying another layer, and, just to show you that this is dynamic, you can see that it's actually drawing on top of the previous layer. So I'm drawing a line, and everything else is copied from the previous layer. Now finally, let's say the team, actually I had it as the Boston Red Sox and changed it at the last minute, so let me just change it here again. Although I might have to re-execute the entire notebook. Just give me one second, because of the layers. OK, there you go. So now you have another line added, which shows you only the data for the Boston Red Sox.

Now, if you remember the New York Times interactive, it was not just showing you one particular team. You could choose what teams you wanted. Jake spent a lot of time talking about the IPython interactive features. For R, there is an amazing web framework built by the guys from RStudio called Shiny that allows you to do pretty much the same thing. Actually, it allows you to do a lot more, at least in terms of the features I've seen. All that I've done here is wrap this whole thing up into a function called plot team, which takes the team's name and creates the plot that you want. Now, all that I want to do to make this a web app is add a dropdown menu that has the team names, and when I switch the name there, it should give me the correct plot. To make sure that I'm not putting everything in the dropdown, I'm selecting a threshold where I want the teams to have appeared at least 30 times. It's an arbitrary threshold; you can change it. And now this is essentially Shiny. What Shiny does is allow you to specify a UI and a server. In this case the UI is written completely in R, where I just want to add an interactive dropdown menu.
So this is the UI, and let me just write this piece of code here. What I want on the output side is to plot the team. But I don't want the team to be hard-coded. I want the team name to be picked up from the input side. So in this case I just want plot team of input$team. Essentially, Shiny is a reactive framework just like Angular. The only difference is that it is completely R-based. So now, let me just make sure this is good. OK, if everything goes right, I should get the app. So now you see, this is actually a Shiny R application, and if I select a different team here, Chicago Cubs, you can see that the plot changes. The main difference I want to point out in terms of interactivity is that this one is happening on the server side. The previous thing I showed you was happening completely on the client side. Here, whenever somebody clicks on the dropdown and picks a different team, it's actually running back to the R server and saying, hey, the guy's changed the team, so just give me another plot. So we are actually recreating multiple plots, pretty much like what the IPython notebook does.

If you want to learn more, there is a lot of stuff. There is a gallery of rCharts here, and I'm constantly looking for ideas for libraries to incorporate. This is, again, thanks to Christophe Viau, who built this website for D3 and allowed me to share the same code base for my gallery. So I think I'm out of time. Thank you so much. If you have questions, I'll be happy to take them, and Irene, feel free to stop me when I need to leave the stage.