Right, so I lead the data visualization team at Twitter. We build tools that help the company explore, analyze, and visualize all our large data sets, which mostly contain people saying that they are eating sandwiches, stuff like that. But there's some interesting stuff happening there, too. Part of my job is to design and develop data visualizations that run on the web. To do this, we've used several of the web graphics APIs that we all have available. So today I'm going to give you a brief overview of these technologies, show you a couple of examples that we developed at Twitter, and then, based on those examples, give you some basic guidelines on when to use or avoid each of them.

I figured that the best way to start a talk about web graphics APIs was to use them. So here I'm just drawing a set of circles using four different technologies. In the top left, I'm using HTML and CSS. In the top right, I'm using SVG, which stands for Scalable Vector Graphics; it's a vector image format that supports interaction and animation, and it's widely supported by modern web browsers. In the bottom left, I'm using Canvas, an element introduced with HTML5 that allows scriptable rendering of 2D shapes and bitmap images. And in the bottom right, I'm using WebGL, which is a JavaScript API for rendering 2D and 3D graphics. These technologies are very complex, and I'm not going to explain them in depth; I'm just going to show some concepts that I believe are important for understanding the examples I'm going to show you. To us, these circles all look the same, but to the web browser they look quite different. I'm going to use a tool called Tilt for Firefox, which lets anyone visualize any website in 3D, and here you can see the first difference between these technologies: the mode in which we are rendering the circles.
The ones on the top, HTML and SVG, use what's called retained mode, in which you're basically saying: I have a circle, I'm going to put it there, and that circle has some properties and attributes that you can retrieve and change later. The ones on the bottom look like an image right now. It's like you're just stamping a circle there and forgetting about it. I like to compare the ones on the top, HTML and SVG, to the shape puzzles we played with as kids: you have shapes that you can touch and move around, they have a color and other properties, and theoretically you could change them by breaking them or painting them or something like that. Immediate-mode drawing, on the other hand, which is what Canvas and WebGL use, I like to compare to a physical canvas where JavaScript is your brush: you're just painting, and it's up to you to keep references to what you drew in order to change it later. These metaphors are not perfect, especially because, unlike a physical canvas, a Canvas element can be redrawn really fast, in milliseconds, which allows effects that are impossible with paint and a brush.

Now, I have a very simple example of a graphic on the web using both SVG and Canvas. This is how you draw a basic circle using SVG. You open an svg tag (it's an XML-based language), and inside it you say: I have a circle here, I'm going to put it at these coordinates, cx and cy; I want it to have this radius, r; and I want to fill it with this color, which goes in the style attribute, where you can put CSS rules. If you're building interactive visualizations, you're probably not going to write SVG by hand. You're probably going to use JavaScript, so you can add elements dynamically, remove them, move them around, and so on. On top of that, you're probably using a library.
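To make the retained-mode idea concrete, here is a minimal TypeScript sketch that produces the kind of SVG markup described above as a string. The helper name `svgCircle`, the 200 by 200 size, and the `steelblue` fill are illustrative assumptions; in a real page you would normally create these elements through the DOM or a library rather than concatenating strings. What matters is that the output declares a circle object with attributes (`cx`, `cy`, `r`, `style`) that the browser keeps around, so they can be queried and changed later.

```typescript
// Hypothetical helper that builds the same markup as the hand-written SVG
// example: a circle positioned by cx/cy, sized by r, and styled with CSS.
interface CircleSpec {
  cx: number;
  cy: number;
  r: number;
  fill: string; // any CSS color, e.g. "steelblue"
}

function svgCircle(width: number, height: number, c: CircleSpec): string {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<circle cx="${c.cx}" cy="${c.cy}" r="${c.r}" style="fill: ${c.fill};"/>` +
    `</svg>`
  );
}

const markup = svgCircle(200, 200, { cx: 100, cy: 100, r: 100, fill: "steelblue" });
console.log(markup);
```

Because the circle is declared rather than painted, a script (or a library) can later find that element and change its `r` attribute, which is exactly what the retained-mode description relies on.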
In my case, I like to use D3, which is a relatively new but very well-known library for manipulating documents based on data. What's happening here is basically the same thing, just done in a more programmatic way. We create an SVG element and append a circle to it. Note that we have a variable called circle here, where we store a reference to the object we created. Then we can apply properties to the circle. For example, here I'm saying I want this circle to have a radius of 10 pixels instead of 100, and I want that change to happen over five seconds. You'll see how the circle just shrinks.

Now I'm going to draw the same circle using Canvas. I'm not going to go through the code, but it's a bit different. In Canvas, you get a drawing context out of the canvas element and you perform operations on that context. You don't say, I have a circle here and I'm going to put it in the canvas. You say, I'm going to draw something that looks like a circle, and that's it. That doesn't mean you can't animate the circle. You could rapidly redraw the canvas, store the radius and the other properties of the circle yourself, and reproduce the same effect I showed you with the SVG example.

So, to recap: SVG and HTML are retained mode. You have objects with properties that you can retrieve and change later. In Canvas, it's up to you. You draw and forget, or, as a developer, you can keep objects for whatever you're drawing and revisit them later if you want to change them. Now I'm going to move on to some real-life examples of data visualizations on the web that we've done at Twitter. We'll start with an interactive visualization that we published last fall, before the 2012 US election.
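The Canvas approach described above (keep the circle's properties yourself and redraw quickly) can be sketched without a browser. This is a minimal illustration, not the talk's code: `Surface`, `RecordingSurface`, and `radiusAt` are hypothetical names, and the recording surface stands in for a real 2D context, where `clear` and `fillCircle` would map to `clearRect` and `beginPath`/`arc`/`fill`, with frames driven by `requestAnimationFrame`. It interpolates linearly; D3 transitions apply an easing curve by default, so the in-between sizes would differ slightly.

```typescript
// Immediate mode: the surface remembers nothing, so we keep the circle's
// state ourselves and redraw the whole frame on every tick.
interface Surface {
  clear(): void;
  fillCircle(x: number, y: number, r: number): void;
}

// Stand-in surface that records calls so the sketch runs outside a browser.
class RecordingSurface implements Surface {
  calls: string[] = [];
  clear(): void {
    this.calls.push("clear");
  }
  fillCircle(x: number, y: number, r: number): void {
    this.calls.push(`circle(${x},${y},${r.toFixed(1)})`);
  }
}

interface CircleState {
  x: number;
  y: number;
  r: number;
}

// Radius after elapsedMs of a linear transition from r0 to r1.
function radiusAt(elapsedMs: number, r0: number, r1: number, durationMs: number): number {
  const t = Math.min(Math.max(elapsedMs / durationMs, 0), 1); // clamp to [0, 1]
  return r0 + (r1 - r0) * t;
}

function renderFrame(surface: Surface, c: CircleState): void {
  surface.clear();                   // wipe the previous frame
  surface.fillCircle(c.x, c.y, c.r); // stamp the circle again at its current size
}

const state: CircleState = { x: 100, y: 100, r: 100 };
const surface = new RecordingSurface();

// Simulate three frames of a five-second shrink from r = 100 to r = 10.
for (const elapsed of [0, 2500, 5000]) {
  state.r = radiusAt(elapsed, 100, 10, 5000);
  renderFrame(surface, state);
}
console.log(surface.calls.join(" "));
```

Running it logs `clear circle(100,100,100.0) clear circle(100,100,55.0) clear circle(100,100,10.0)`: the "circle" only exists in our `state` object, and each frame is painted from scratch.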
For this visualization, we took all the tweets the candidates posted during the campaign, before the election, and tried to see if there was something interesting going on with them. One way to look at it was to take all the responses to those tweets, especially the positive ones, which in our case were favorites and retweets. Those mostly mean "I agree with this" or "I want to share this," so those users probably agreed with what the candidate was saying. We broke the responses down by state, across all 50 states, and we also tagged the tweets with different topics. I'm going to show you the visualization, and you can actually access it right now at elections.twitter.com/map.

Let me explain what's happening. On the left and right, we have the tweets of the candidates represented as bar charts. The size of each bar is the total engagement, measured using favorites and retweets and normalized by the Twitter population in the different states. In the center, you have the tweet that you're looking at. You can search for interesting tweets, like tweets from Romney about coal, which got a lot of engagement in states like Kentucky and West Virginia, where coal production is very high. Or you could search for Pell Grants, where there are tweets with unusually high engagement in a cluster of southeastern states. By the way, what we are coloring on the map is the amount of engagement that the tweet received in each state. You can click the tweets, as I showed you, and you can click the map. When you click a state, a summary chart at the bottom shows the topics of the tweets that users in that state engaged with the most. If you click on Texas or California, that topic is immigration. If you click on New York, it's abortion. Washington State is gay rights, and Nevada is taxes.
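The talk says engagement is favorites plus retweets, normalized by each state's Twitter population, but it doesn't give the exact formula. A minimal sketch, assuming a simple rate per 1,000 Twitter users; the record shape and the numbers below are made up for illustration.

```typescript
// Hypothetical per-state engagement record for one tweet.
interface StateEngagement {
  state: string;        // two-letter state code
  favorites: number;
  retweets: number;
  twitterUsers: number; // assumed size of the state's Twitter population
}

// Engagement (favorites + retweets) per 1,000 Twitter users in the state,
// so that populous states don't dominate just by being large.
function engagementPer1000(e: StateEngagement): number {
  if (e.twitterUsers <= 0) return 0;
  return ((e.favorites + e.retweets) * 1000) / e.twitterUsers;
}

// Made-up numbers, for illustration only.
const rows: StateEngagement[] = [
  { state: "CA", favorites: 300, retweets: 200, twitterUsers: 500000 },
  { state: "WV", favorites: 50, retweets: 50, twitterUsers: 20000 },
];

const ranked = rows
  .map(r => ({ state: r.state, score: engagementPer1000(r) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked);
```

With these numbers the smaller state ranks first (5 vs. 1 per 1,000 users) even though it has fewer raw favorites and retweets, which is the effect normalization is meant to expose.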
So tweets get very different responses from users in different states of the US. Now let's open Tilt again to show how the visualization was built. I'm only going to show the map, because the bar charts are very simple: they're just HTML elements sized by their CSS width. For the map, since we wanted something highly interactive, we used SVG to draw all 50 states, and the main reason we used SVG is the interactivity it provides. You can click on a particular state, and you can also attach data to the objects. Here I'm highlighting the object representing California, and in its id attribute we have the two-letter ID for the state. That lets us do things like this: whenever I click on a tweet and need to color the states in some way, we can query a state directly and say, I want California to have this shade of blue. It also helps with event handling: whenever you click on a state, we can update the visualization based on the particular state you clicked.

One last thing: you don't see it here, but behind the SVG map we have a static image that we generated for every tweet. This was to support browsers that have no SVG support. That image represents the distribution of engagement for that particular tweet, so even though you can't click on it, you see something instead of an error message or an empty screen.

I have a couple of attributes here that I like to consider whenever we start a visualization on the web, by the way. For this visualization, we have a low number of elements. The elements are complex, especially the shapes of the states. It's an interactive visualization: you can click on the map, you can click on the different tweets, and you can search, and that updates the UI. There's no animation going on.
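The coloring step described above (look up a state by its id and give it a shade of blue based on engagement) boils down to a sequential color scale. A library such as D3 provides scales for this; the sketch below hand-rolls a linear one so it stays self-contained. The endpoint colors and the names `blueShade` and `fillsByState` are assumptions for illustration, not the elections map's actual palette or code.

```typescript
type RGB = [number, number, number];

// Assumed endpoints of the blue ramp (light for low engagement, dark for high).
const LIGHT_BLUE: RGB = [222, 235, 247];
const DARK_BLUE: RGB = [8, 48, 107];

function toHex(c: RGB): string {
  return (
    "#" +
    c
      .map(v => {
        const h = Math.round(v).toString(16);
        return h.length === 1 ? "0" + h : h; // pad to two hex digits
      })
      .join("")
  );
}

// Linear sequential scale: value in [min, max] -> shade between the endpoints.
function blueShade(value: number, min: number, max: number): string {
  const t = max === min ? 0 : Math.min(Math.max((value - min) / (max - min), 0), 1);
  const mixed: RGB = [
    LIGHT_BLUE[0] + (DARK_BLUE[0] - LIGHT_BLUE[0]) * t,
    LIGHT_BLUE[1] + (DARK_BLUE[1] - LIGHT_BLUE[1]) * t,
    LIGHT_BLUE[2] + (DARK_BLUE[2] - LIGHT_BLUE[2]) * t,
  ];
  return toHex(mixed);
}

// Fill color keyed by two-letter state id. In the browser, each entry would be
// applied to the SVG path whose id attribute matches the key.
function fillsByState(values: { [id: string]: number }): { [id: string]: string } {
  const ids = Object.keys(values);
  const nums = ids.map(id => values[id]);
  const min = Math.min(...nums);
  const max = Math.max(...nums);
  const fills: { [id: string]: string } = {};
  ids.forEach(id => {
    fills[id] = blueShade(values[id], min, max);
  });
  return fills;
}

console.log(fillsByState({ CA: 1, WV: 5, TX: 3 }));
```

Keying the result by the same two-letter ids used in the SVG is what makes the "query the state directly" step cheap: the data and the drawn shapes share an identifier.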
We had to support IE7 and IE8, because a lot of our users were still on them, and they don't support SVG; that's why we added the static image, so those users would see something. And we used D3. There are a lot of examples of how to draw interactive maps using D3, so that was very helpful. So for this example, again, we used SVG and HTML. They make it very easy to add interaction and to draw complex shapes, like the shapes of countries, or states in this case, and we got a lot of the interactivity for free: event handling, as I showed you, attaching data to elements, and so on.

The second example is a visualization of the flow of tweets that we did after the earthquake and tsunami in Japan in 2011. Whenever something happens, even a very small earthquake in LA, people tweet about it a lot, and you see these massive spikes. That's normal by now. But with Japan, not only was the spike unprecedented and huge; we also got reports from users telling us that after the earthquake they came to Twitter to share news and to let friends know they were okay, because the normal channels of communication were not working very well. So we wanted to take a look at this. The visualization was a complement to a larger story that we shared on our Stories website, and I have a video of it here that I want to show you. It has sound, but the sound doesn't matter much. The quality is not that good, but what you're looking at are white lines representing tweets that Japanese users sent mentioning users in different parts of the world, and blue lines representing the other way around: users from all over the world mentioning users in Japan. That's before the earthquake, and this is after it happened. There's a massive amount of tweets flowing in and out of Japan. Then there's a second visualization that really shouldn't be blended in here.
By the end, you're just going to see a mess, but these were actually two different visualizations. Let me explain the second one, which is the yellow and green lines. Yellow lines are tweets that were sent from Japan and retweeted by people in different parts of the world. So if I'm in California and I retweet a tweet from Japan about the earthquake, there's a yellow line going from Japan to California. When you retweet a tweet, you propagate it to your network. In my case, there are probably a lot of people from Puerto Rico, from Latin America, and from other parts of the world reading it, and that's what the green lines represent.

In basic terms, this visualization is circles moving from A to B along some path, and that's how we started developing it. We used SVG: we created a lot of circles and started moving them around. The first problem was that after a circle got to its destination, there was no way to tell that something had happened between A and B. So instead of moving a single circle, we drew a circle, moved a bit along the path, drew another circle, and so on. That way you can see something happening from A to B, and after it happens, you know it happened, because the trail is still there. When we tried to do this with SVG, the animation ran for about three seconds and then became a resource hog in the browser. The main reason is that all of these circles are objects. By the way, here SVG is on the left and the Canvas version is on the right, and I'm only drawing ten very short lines. In the visualization we were drawing thousands, so there were probably going to be millions of circles drawn on the screen. With Canvas, again, we just draw and forget about the circle; we're just saying, I want these pixels to be this color.
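The "draw a circle, move a bit, draw another" technique described above is cheap in immediate mode because each stamp only changes pixels; no circle object survives. Here is a minimal sketch of that idea, with a `Uint8Array` standing in for the canvas bitmap (one byte per pixel) so it runs outside a browser. The class and function names are hypothetical, and the path is a straight line for simplicity; the talk doesn't show the actual paths or the Processing.js code.

```typescript
// A tiny immediate-mode raster: stamping a circle only changes pixel values.
// No circle objects are kept, so earlier stamps remain visible as a trail.
class PixelBuffer {
  readonly width: number;
  readonly height: number;
  readonly data: Uint8Array; // one byte per pixel: 0 = empty, otherwise a color index

  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
    this.data = new Uint8Array(width * height);
  }

  stampCircle(cx: number, cy: number, r: number, color: number): void {
    const x0 = Math.max(0, Math.floor(cx - r));
    const x1 = Math.min(this.width - 1, Math.ceil(cx + r));
    const y0 = Math.max(0, Math.floor(cy - r));
    const y1 = Math.min(this.height - 1, Math.ceil(cy + r));
    for (let y = y0; y <= y1; y++) {
      for (let x = x0; x <= x1; x++) {
        const dx = x - cx;
        const dy = y - cy;
        if (dx * dx + dy * dy <= r * r) this.data[y * this.width + x] = color;
      }
    }
  }
}

interface Point {
  x: number;
  y: number;
}

// Position at fraction t of a straight path from a to b.
function along(a: Point, b: Point, t: number): Point {
  return { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t };
}

// One "flow": stamp a small circle at each step along the path. Nothing is
// erased between steps, which is what leaves the trail from A to B.
function drawFlow(buf: PixelBuffer, a: Point, b: Point, steps: number, color: number): void {
  for (let i = 0; i <= steps; i++) {
    const p = along(a, b, i / steps);
    buf.stampCircle(p.x, p.y, 1, color);
  }
}

const buf = new PixelBuffer(20, 10);
drawFlow(buf, { x: 2, y: 5 }, { x: 17, y: 5 }, 15, 1);
```

Memory use here is fixed by the buffer size no matter how many circles are stamped, which is the contrast with SVG, where every stamped circle would become another live DOM node.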
In this case that was very helpful, because we didn't care about the circles after we created them. We didn't care if you clicked on them, we were never going to remove them, and we were never going to change their color. For us it was just draw and forget, which made this a very good case for using Canvas instead of SVG. I have to mention, though, that the visualization wasn't as smooth as the video shows. Even with Canvas we had to speed up the video, because it was a bit slow; there was a lot of data moving around. So in this case we had a visualization with a very large number of elements. They were just circles, not complex at all. There was no interaction; we were just drawing and moving on. And it was an animation, basically an animation that we used to complement a blog post and a story on our Stories website. We didn't care about browser support. We only ran it internally, in an internal showcase of visualizations, and we advise employees to use modern browsers anyway. And we used Processing.js, which makes it very simple to do animations on Canvas. So Canvas was very helpful here: we were drawing a lot of elements and we had animations, but we didn't care about interaction or about the other things we cared about in the first example, like attaching data to every element.

The third and last example we did at Twitter was a visualization of tweets about Neil Armstrong after he passed away last August. Again, as with the earthquake, when important things happen, we can see them on Twitter; we can see how people start talking about them. This case was special because a lot of people were sharing quotes, images, and stories about Neil Armstrong, and we wanted to gather all these tweets and make some kind of tribute, not a tribute by Twitter but by Twitter users. The first thing we did was grab all the tweets, and that's where the first problem showed up.
We had 1.4 million tweets about Neil Armstrong on the day he died. That was a problem, because we needed to do something fast, and we didn't know whether we wanted to aggregate the data or summarize it by country, by keyword, and so on. So Nicolas Belmonte from the data visualization team at Twitter had the idea of using WebGL to just try rendering all the tweets. Even though we probably didn't have enough resolution to show all 1.4 million tweets, he wanted to see what happened when he put them all there and tried to visualize them. So he used WebGL. It takes a little while to load. The idea was to take every tweet that was sent and represent it as something like the light of a candle on Earth, and you can see the Earth rotating. The main idea was to have the view zoom out from the Earth while the tweets came in, until you reach the point where you can see the surface of the moon. So it was nice (thanks to Nicolas, who did this, not me). I mean, it's no more insightful than any map where the points mostly cluster in metropolitan areas; those are just maps of where things happen. But in this case we thought it was special, because each of those points represents someone saying something about Neil Armstrong.

Even though I'm showing you this as an animation, it's actually an interactive visualization. You can zoom in on the map, you can move it around, and you can even change the time in the animation; for example, I can show you daylight when it's eight or nine on the East Coast. All of this is happening while we're loading over a million tweets. Again, we're not rendering every one of them, because we probably don't have enough resolution, but the data set is the raw set of tweets we have for that day.
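The talk doesn't show the globe's code, but one common first step in placing geotagged points on a WebGL globe is converting latitude and longitude into 3D positions on a sphere and packing them into one flat typed array, which is what you would upload with `gl.bufferData` so all points can be drawn in a single call. A minimal sketch; the axis convention (north pole on +y, longitude 0 on +x) is an assumption, since engines and libraries differ.

```typescript
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

const DEG = Math.PI / 180;

// Converts latitude/longitude in degrees to a point on a sphere of the given
// radius. Assumed convention: y is "up" (north pole at +y), longitude 0 on +x.
function latLonToSphere(latDeg: number, lonDeg: number, radius: number): Vec3 {
  const lat = latDeg * DEG;
  const lon = lonDeg * DEG;
  return {
    x: radius * Math.cos(lat) * Math.cos(lon),
    y: radius * Math.sin(lat),
    z: radius * Math.cos(lat) * Math.sin(lon),
  };
}

// Packs many points into one flat Float32Array (x, y, z per point), the kind
// of buffer you'd hand to WebGL so every tweet is drawn in a single call.
function packPositions(coords: Array<[number, number]>, radius: number): Float32Array {
  const out = new Float32Array(coords.length * 3);
  coords.forEach(([lat, lon], i) => {
    const p = latLonToSphere(lat, lon, radius);
    out[i * 3] = p.x;
    out[i * 3 + 1] = p.y;
    out[i * 3 + 2] = p.z;
  });
  return out;
}

// Two example points (approximately Washington, DC and New York).
const positions = packPositions([[38.9, -77.0], [40.7, -74.0]], 1);
console.log(positions);
```

The "candle" look described in the talk (points with a halo) would live in the shaders that draw these positions; that part isn't sketched here.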
In this visualization we're not only visualizing the tweets; there's a lot of other information: the sphere of the Earth with the terrain and the water, and the whole sense of space with the moon in the background. For this case, I think the only way to go was WebGL, especially because of the 3D component. So, going through the attributes: a large number of elements. The complexity of the elements is high, because even the points that represent the tweets are not just circles; they're circles with a halo, which gives the effect of a light or a candle. It's an interactive visualization, even though we shared it as a video, again as part of a larger story. It was interactive and animated. Browser support: if you don't have WebGL, don't even think about it. And one of the problems I personally have with WebGL is that it's very hard to get something simple done. For that, there are a couple of libraries: there's three.js, and there's PhiloGL, which was written by Nicolas and used for this visualization, by the way. They give you simple shaders that you don't have to write and other common things you don't have to build yourself, which is very good. So, again, WebGL was really good for a very large number of elements, for a three-dimensional visualization, and even for a visualization with interaction.

I showed you three tables with some attributes of each visualization, even though I had already mentioned most of those points, because I think those questions are probably where you should start if you're developing a data visualization for the web, you have a specific use case, and you probably already have your data. So first, ask yourself how many elements the visualization has. Do you have a visualization with 50 or 100 elements, like the elections map? Or do you need to draw thousands, potentially millions, of elements on the screen? How complex are these elements?
Sometimes you just need to draw circles; even when something looks complex, you're often basically drawing circles, lines, or text. In other cases, you need to draw complex shapes, like when you're drawing maps. Do you need interaction? As I showed you in the first example, if you don't have millions of elements, SVG is a really good way to get interaction into your visualization easily, because you can attach data to the elements, handle events, and so on. Will you have animations or transformations? As I showed you in the Japan earthquake example, we tried to use SVG, appending circles all the time in an animation that was trying to run at several frames per second, and that's very hard. If you have a lot of elements and you need to query them later, it's very hard. With Canvas, on the other hand, you just have a set of pixels that you draw on, and it's easier. Do you need to support legacy web browsers? As I showed you at the beginning, this is an important question, because if you do, you need to think about your fallback strategy. In our case it was just an image, and in some cases that works. If you have an animation, you probably want to show a video or an animated GIF instead, which, believe it or not, has worked for us, at least internally. And I think the most important question is: are there related open source resources that you can use to build your visualization on the web? I'm actually going to answer this one for you: the answer is yes. Whatever you're trying to do, you're not going to have to start from scratch. There are libraries, there are frameworks, there's a lot of example code, and there are a lot of resources online, like Stack Overflow answers and Google Groups threads, with a lot of information and insight. So instead of starting from scratch, start by doing some searching.
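The questions above can be encoded as a rough rule of thumb, which also shows how the talk's three examples line up with the three technologies. This is a hypothetical sketch: the `MANY_ELEMENTS` cutoff of 1,000 is an assumed number (the talk only contrasts "50 or 100" with "thousands, potentially millions"), the field names are invented, and the element counts in the usage are approximations taken from the talk.

```typescript
// The talk's questions, encoded as a rough rule of thumb. The cutoff for
// "many elements" is a hypothetical number, not one given in the talk.
const MANY_ELEMENTS = 1000;

interface VizNeeds {
  elementCount: number;
  complexShapes: boolean;
  interactive: boolean;
  animated: boolean;
  threeD: boolean;
  legacyBrowsers: boolean;
}

type Tech = "HTML/SVG" | "Canvas" | "WebGL";

function suggestTech(n: VizNeeds): { tech: Tech; notes: string[] } {
  const notes: string[] = [];
  let tech: Tech;
  if (n.threeD) {
    tech = "WebGL";
    notes.push("Consider a helper library; raw WebGL makes simple things hard.");
  } else if (n.elementCount >= MANY_ELEMENTS) {
    tech = "Canvas";
    if (n.interactive) notes.push("Keep your own data model and do hit testing yourself.");
  } else {
    tech = "HTML/SVG";
    if (n.animated) notes.push("Watch the element count if the animation keeps adding nodes.");
  }
  if (n.complexShapes && tech !== "HTML/SVG") {
    notes.push("Complex shapes are easiest in SVG; here you draw and hit-test them yourself.");
  }
  if (n.legacyBrowsers) {
    notes.push(n.animated ? "Fallback: a video or animated GIF." : "Fallback: a static image.");
  }
  return { tech, notes };
}

const electionsMap = suggestTech({ elementCount: 50, complexShapes: true, interactive: true, animated: false, threeD: false, legacyBrowsers: true });
const japanFlows = suggestTech({ elementCount: 1000000, complexShapes: false, interactive: false, animated: true, threeD: false, legacyBrowsers: false });
const armstrongGlobe = suggestTech({ elementCount: 1400000, complexShapes: true, interactive: true, animated: true, threeD: true, legacyBrowsers: false });
console.log(electionsMap.tech, japanFlows.tech, armstrongGlobe.tech);
```

This prints `HTML/SVG Canvas WebGL`. Treat it as a starting checklist only; the graph example that follows shows a case (many elements plus rich interaction) where the rules blur.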
Look for examples that can guide you whenever you're building a data visualization for the web. I want to show you one last example. This one wasn't done by Twitter, but it uses Twitter data. It was done by Santiago, who's going to be speaking here tomorrow. It's a really nice, beautifully done graph of conversations between Twitter employees. I chose it because it has characteristics of several of the examples I showed you; it isn't purely like the Canvas one or the SVG one. For example, you have elements with interactivity, and the elements have data attached. You can hover over the different elements and see... I'm trying to find myself, but I can't find myself right now. Anyway. You can move it around, and whenever you hover over an element, you not only get data about the node, but its edges get colored (the edges represent conversations between employees, by the way). And it's very smooth. You can even click on something, and it moves the animation around, which is very nice. Again, there's a large number of elements on the screen: at least 1,000 nodes and at least 10,000 edges. But there's also a lot of interaction going on. And this visualization was actually done in Canvas. So even though one of the guidelines I mentioned is that SVG is a very good way to get interactivity, that doesn't mean you can't do it in Canvas. Actually, this is probably a good use of Canvas, because you have a lot of elements and you need to query all of them whenever you hover over them or click on them, and the animation is still very smooth. I showed you this example to make my last point of this presentation.
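Since Canvas has no per-element events, hovering over a node in a graph like this means finding the node under the mouse yourself. The talk doesn't say how this example does it; one common approach is a uniform grid ("spatial hash") so each hover checks only nearby nodes instead of all of them. A minimal sketch with hypothetical names (`GraphNode`, `GridIndex`); in a browser you'd convert the mouse event to canvas coordinates, call `hit`, and redraw with that node's edges highlighted.

```typescript
interface GraphNode {
  id: string;
  x: number;
  y: number;
  r: number; // radius in canvas pixels
}

// Uniform grid: buckets nodes by cell so a hover checks only nearby nodes.
// Correct as long as no node's radius exceeds cellSize, since a hit node's
// center is then at most one cell away from the pointer's cell.
class GridIndex {
  private readonly cellSize: number;
  private readonly cells: { [key: string]: GraphNode[] } = {};

  constructor(cellSize: number) {
    this.cellSize = cellSize;
  }

  private key(col: number, row: number): string {
    return col + "," + row;
  }

  insert(n: GraphNode): void {
    const k = this.key(Math.floor(n.x / this.cellSize), Math.floor(n.y / this.cellSize));
    (this.cells[k] = this.cells[k] || []).push(n);
  }

  // Closest node whose circle contains (px, py), or null if none does.
  hit(px: number, py: number): GraphNode | null {
    const col = Math.floor(px / this.cellSize);
    const row = Math.floor(py / this.cellSize);
    let best: GraphNode | null = null;
    let bestD2 = Infinity;
    for (let dc = -1; dc <= 1; dc++) {
      for (let dr = -1; dr <= 1; dr++) {
        const bucket = this.cells[this.key(col + dc, row + dr)];
        if (!bucket) continue;
        for (const n of bucket) {
          const d2 = (n.x - px) * (n.x - px) + (n.y - py) * (n.y - py);
          if (d2 <= n.r * n.r && d2 < bestD2) {
            best = n;
            bestD2 = d2;
          }
        }
      }
    }
    return best;
  }
}

const index = new GridIndex(10);
[
  { id: "a", x: 5, y: 5, r: 3 },
  { id: "b", x: 25, y: 5, r: 3 },
  { id: "c", x: 12, y: 5, r: 4 },
].forEach(n => index.insert(n));
```

With around 1,000 nodes a plain linear scan per mouse move would also work; the grid matters more as node counts grow or when hit tests run every frame.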
My last point is this: at the beginning, when you start working on data visualization for the web, it's good to follow these guidelines: start from the basics, learn how things work under the hood, learn the libraries, use them, and so on. But you'll reach a point where you're very comfortable with one or two of these technologies, and at that point you'll find yourself not using these guidelines at all. That's perfectly fine. I think the most important part of building data visualizations, on the web and in general, is to be creative and not let frameworks, standards, or guidelines limit whatever you're building. Just have fun and be creative. That's it.

Finally, I've open-sourced this talk. You can go to my GitHub profile (it's on my areas) and see the code examples there. If I got something wrong, or you think something could be better, just open an issue. Or better yet, if there's something you can contribute, submit a pull request. I'd be happy for it to be useful not only for you, but for anyone who wants to start working on data visualization for the web. Thanks. Do we have time for questions? Yeah. All right.

So there's an exploration phase. I think the first step is usually to go to our data centers and get the data, in this case from our analytics data store, which is where we dump all the production data. For example, if I want to work with tweets, I don't go to the production databases and try to run a query there. We have a whole system that dumps all this data into Hadoop, or Vertica in our case, and then we can just query it there. So the first step is to get some aggregations from Hadoop using Pig, Scalding, or any other MapReduce abstraction.
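The aggregation step just described would be a Pig or Scalding job running on Hadoop; the sketch below only illustrates the shape of that map-and-reduce work in memory, turning raw engagement events into per-tweet, per-state counts small enough to load elsewhere. The event and aggregate shapes are hypothetical.

```typescript
// Hypothetical raw event: one favorite or retweet of a candidate's tweet,
// already tagged with the engaging user's state.
interface EngagementEvent {
  tweetId: string;
  state: string;
  kind: "favorite" | "retweet";
}

interface Aggregate {
  tweetId: string;
  state: string;
  favorites: number;
  retweets: number;
}

// Map step: turn one event into a (key, partial count) pair.
function mapEvent(e: EngagementEvent): [string, Aggregate] {
  const partial: Aggregate = {
    tweetId: e.tweetId,
    state: e.state,
    favorites: e.kind === "favorite" ? 1 : 0,
    retweets: e.kind === "retweet" ? 1 : 0,
  };
  return [e.tweetId + "|" + e.state, partial];
}

// Reduce step: merge two partial counts that share a key.
function reduceCounts(a: Aggregate, b: Aggregate): Aggregate {
  return {
    tweetId: a.tweetId,
    state: a.state,
    favorites: a.favorites + b.favorites,
    retweets: a.retweets + b.retweets,
  };
}

// In-memory stand-in for the cluster job: group by key, then reduce.
function aggregate(events: EngagementEvent[]): Aggregate[] {
  const groups: { [key: string]: Aggregate } = {};
  for (const e of events) {
    const [key, partial] = mapEvent(e);
    const existing = groups[key];
    groups[key] = existing ? reduceCounts(existing, partial) : partial;
  }
  return Object.keys(groups).sort().map(k => groups[k]);
}

const summary = aggregate([
  { tweetId: "t1", state: "CA", kind: "favorite" },
  { tweetId: "t1", state: "CA", kind: "retweet" },
  { tweetId: "t1", state: "CA", kind: "favorite" },
  { tweetId: "t1", state: "TX", kind: "retweet" },
  { tweetId: "t2", state: "CA", kind: "favorite" },
]);
console.log(summary);
```

Because the reduce step is associative, a cluster can merge partial counts in any order across machines, which is why this kind of aggregation scales to the full event stream while the output stays small.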
Then we reduce the data enough that we can load it into something like R, and in R we explore and see what's interesting. For the elections example, we had a lot of parameters about users and we looked at different ones, mobile clients versus desktop clients, for example. The one that looked interesting was when we made a grid of small maps colored by state, like the ones I showed you, with one map per tweet, for all the tweets. We looked at it and said, this looks completely different: some tweets have higher engagement in different places, and some topics are predominant in some states and not in others. After that, we moved on with the data already aggregated, cleaned it up a little, and got just the right amount of data to load in the browser. Sometimes there are extra parameters and so on; we just get rid of them. Then, in the browser, we start iterating on the visualization. So it goes from Hadoop, to something that fits in memory, to the minimal amount of data we need to show the visualization in the web browser.

Sorry, the question is what frameworks we use to visualize the data before we make the final version? Yeah, we visualize it in R, mostly because it's very flexible. With just five lines of code you get a grid of something like 100 visualizations that you can look at really quickly, and then identify the outliers or change the parameters you want to color by, and so on. Sometimes we use something like Tableau, where you can just drag and drop fields, change filters, and things like that.

With these examples, none, which is sad. The elections map might work on mobile, but I don't think it's very useful there.
On the map, when you want to tap a state, you're probably going to end up tapping a different one. It's a very interesting topic: you have fewer pixels and a lot less space to interact with. But in these examples, we didn't think about mobile at all.

Yeah, that's a hard question. It's hard for me because before I started working at Twitter, I was in that position, too. I did a lot of data visualization based on Twitter data, and I remember one spring break when I had my laptop running the streaming API, grabbing every tweet I could for about a week, so I could get 100,000 tweets or so to visualize. What we usually advise, and this is for students, not for everyone, obviously, is to come and do an internship for two or three months. When they come to the company, they have access to the data sets. They usually have some research based on Twitter data, and they can expand it here using the entire data set and all the signals that we don't share publicly. There are a couple of reasons we don't release that data. I think the main one right now is privacy. I mean, tweets are public anyway, but there are a lot of problems when you just take a big dump of data and throw it out there. People start drawing conclusions about the business that may be misguided. For example, if your particular dump shows tweets per day declining 10% every day, and your sample wasn't evenly distributed by day, you'll probably conclude that Twitter is going down or something like that, and that's not good. The other thing, as I mentioned, is privacy: people delete tweets. If we officially release a dump with a lot of tweets and then someone wants to delete their tweets, how do we go to each person or researcher and say, hey, you need to delete these tweets from your dump? So it's very hard.
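The sampling pitfall mentioned above is easy to reproduce with a few numbers. A minimal sketch, assuming the per-day sampling rate is known (in a real dump it often isn't, which is exactly the problem): raw counts that fall 10% a day can come entirely from a sampling rate that also fell 10% a day, and dividing each day's count by its rate removes the fake decline. The data and field names are made up.

```typescript
// Hypothetical daily sample: how many tweets landed in the dump that day and
// what fraction of that day's tweets the sample captured.
interface DailySample {
  day: string;
  sampledTweets: number;
  samplingRate: number; // e.g. 0.01 = 1% of that day's tweets
}

// Estimate each day's full volume by undoing that day's sampling rate.
function estimateDailyVolume(samples: DailySample[]): { day: string; estimated: number }[] {
  return samples.map(s => ({ day: s.day, estimated: s.sampledTweets / s.samplingRate }));
}

// Day-over-day change as a fraction (-0.1 means a 10% drop).
function dailyChange(values: number[]): number[] {
  const out: number[] = [];
  for (let i = 1; i < values.length; i++) out.push(values[i] / values[i - 1] - 1);
  return out;
}

// Made-up dump whose raw counts fall 10% a day only because the
// sampling rate also fell 10% a day.
const dump: DailySample[] = [
  { day: "day 1", sampledTweets: 1000, samplingRate: 0.01 },
  { day: "day 2", sampledTweets: 900, samplingRate: 0.009 },
  { day: "day 3", sampledTweets: 810, samplingRate: 0.0081 },
];

const rawChange = dailyChange(dump.map(s => s.sampledTweets));
const trueChange = dailyChange(estimateDailyVolume(dump).map(e => e.estimated));
console.log(rawChange, trueChange);
```

The raw series shows roughly -10% per day while the corrected series is flat, up to floating-point rounding.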
Yeah, no, I agree, and it's something that's not in my hands, sadly. But there are tools; for example, I know NodeXL, an Excel plugin that you can use to get Twitter data. But again, it's constrained by the API, too. So I don't have a full answer for that.

No. I mean, at least in my examples, this is where we spend maybe 5% of our time. When we release something publicly, it's usually something we do really quickly for a specific purpose. The elections map, for example, we released about a week before the election, and we built it in a week or so. Now, even in the product, we're getting more serious about accessibility, trying to make the clients more accessible. If we release a data visualization publicly in the future, it's something we're probably going to be thinking about. All right, thank you.