Am I audible at the back? Clear towards the back? Let's start off then. We're gonna be talking about visualizations using D3, the library. This was originally planned to be a workshop where I talk about how you could create visualizations with D3. In the one hour that we have, I don't think we're gonna be able to do that. So instead I'm gonna make a compromise. I'm gonna show you stuff that can be done with D3, just to excite you about the possibilities of what this library can do. But it's not just about the library. It's also about how one can conceive of visualizations and create some fairly interesting things out of what might be potentially mundane data. And after that, I'll try and show you some simple things, the basic model of how one creates stuff using D3. To begin with, I'll start with a very, very simple picture. Some of you might know of this thing called the National Rural Employment Guarantee Act, or NREGA as it's called. There's a fair bit of political overtones around it. The full name is the Mahatma Gandhi National Rural Employment Guarantee Act. It's sort of like a social security act. It's primarily center funded but also, to a certain extent, state funded, and there are various rules around that. Each state gets a certain amount of money which they use to give to unemployed people directly or indirectly. There are two ways in which they can spend it. They can either spend it on labor or they can spend it on raw materials. It's curious to see how spending on raw materials would actually help unemployed people, but it is part of the act. And the funding can either come from the central government or it can come from the state. What we have here is a very simple picture that shows, on the vertical axis, what percentage is spent on labor at the top versus, right at the bottom, what percentage is spent on raw materials.
So each bubble is a state, and we have here Tamil Nadu, for instance, spending a fair bit on labor as opposed to, for example, Madhya Pradesh, which is spending a lot less on labor and much more on raw materials. I haven't put in the scale, but the bottom is roughly 50%. So Madhya Pradesh is spending a little over 50% on labor whereas Tamil Nadu is spending close to 100% on labor. Now, you'll notice that most of the states are on this side. The size of the bubble is the total amount that the state is spending on the National Rural Employment Guarantee Act. So Madhya Pradesh is clearly spending a lot, whereas this tiny bubble, which is Jammu and Kashmir, is not spending that much. Andhra Pradesh is spending a lot. Maharashtra is spending a whole lot less. But you'll also see on the horizontal axis that there are two states on the left extreme. These are primarily state funded, meaning the bulk of the money that's going into this act is coming in from the state itself. One state is Karnataka. The other state is Orissa. There is a fairly marked divide between these two states and the rest, and I'm not making any comments on the center being Congress and these two governments being BJP or anything like that. It may have nothing to do with this at all. Now, this is a very simple visualization, a scatterplot that's done in D3, and towards the second half I'll be showing you how exactly this visualization is created and walking you through it. But in the meantime, let's walk through some more interesting visualizations. Since we are at a JavaScript conference, let's take JSLint. What we did was crawl GitHub, about 10,000 users or so, and take all of their repositories (these are JavaScript users and their repositories) and ask: what kind of JSLint errors do they make often? How many of you use JSLint? Practically everybody. For those who do not, JSLint is a sort of error checker for JavaScript. If you get a JSLint error, you might want to correct it.
It checks for things like white space and good style practices. It also checks for potential security errors and so on. Each bubble is one JSLint error and the size represents, well, there are two ways in which we can see the size. For now the size represents the total number of occurrences of an error. So let's look at this large bubble. This one is "expected something at column B, not column C". To give you an example of what this sort of error looks like, or what kind of code generates it, let me open a sample set of errors. I ran JSLint on jQuery. Now, the error that you saw there, the most frequent one, was "expected something at column B, not column C". So let's search for "at column" and you've got all of these. So, expected "var" at column five, not column one. Expected "window" at column 13, not column five. Expected, I mean, "undefined" at column nine, not column five. Notice that all of these are almost always four columns away from where they should be. These are classic cases of indentation. Let's in fact look at these. I think I have a version of jQuery 1.3.2 out here. Yeah, so let's start with line 13, which is here. Now, it's saying that this var, which is starting at column one, should in fact be starting at column five. In other words, make sure that your JavaScript is indented, and that is the most violated rule of JSLint across the board. Which leads one to question: look, should we even have this rule? I mean, if so many people are violating it, there's probably a good argument to say it's okay not to use it. The second most violated rule is "missing space between something and something". What kinds of missing spaces? Let's see. Missing space between a close bracket and an open curly bracket. That's typically when you're defining an anonymous function. You just say function, open bracket, close bracket and immediately put your curly brackets. That's not good practice, according to JSLint. Then there's "expected exactly one space between something and something".
When do we want exactly one space? Exactly one space between "function" and the open bracket. So if you look at jQuery here, between this function and this open bracket, there should be exactly one space, but no space has been provided. I didn't even know that was a rule. But these are among the most violated rules. And then there are lots of smaller ones. Let's take something really small. "Weird assignment." That is weird. "Strange loop." I'm not picking these intentionally, I'm just picking them up. "Variable was not declared correctly." What's the least? "Unnecessary use strict." So there are people that are using "use strict" when they don't need to. Good for them. Now, remember I said this is sized by the total number of errors. There are two ways in which we can count errors. You can say there are 10,000 files and if an error occurs in a file, I'll count it once. Or I can say if an error occurs 10 times in a file, I'll count it 10 times. You can choose to count the presence of an error in a file or you can choose to count the number of occurrences of the error in the file. And this is counting the number of occurrences of the error in the file. What would happen if we just counted the files? Now that changes a bit. What we had earlier as the most prominent one, which was "expected something at column", the indentation error really, is no longer significantly more prominent than the others when you switch. In fact, this one, which is "A was used before it was defined", so some variable being used before it was defined, is the error that occurs in the most files. It doesn't occur the most number of times, but that's the error that occurs in the most number of files. This error, "missing use strict statement", again occurs in a large number of files. This is one of those errors that can at most occur only once in a file. If you've got "use strict", you've used it. And if not, it'll be missing.
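The two ways of counting described above can be sketched in a few lines of plain JavaScript. This is only an illustration, not the code behind the visualization: the `records` array, its `file` and `rule` fields, and the short rule names are all made up for the example.

```javascript
// Hypothetical input: one record per JSLint error, tagged with the file it was found in.
const records = [
  { file: "a.js", rule: "indentation" },
  { file: "a.js", rule: "indentation" },
  { file: "a.js", rule: "indentation" },
  { file: "a.js", rule: "indentation" },
  { file: "a.js", rule: "used_before_defined" },
  { file: "a.js", rule: "missing_use_strict" },
  { file: "b.js", rule: "used_before_defined" },
  { file: "b.js", rule: "missing_use_strict" },
  { file: "c.js", rule: "missing_use_strict" },
];

// Returns two Maps keyed by rule:
//   occurrences: how many times the error appears in total
//   files:       how many distinct files contain the error at least once
function countErrors(records) {
  const occurrences = new Map();
  const fileSets = new Map();
  for (const { file, rule } of records) {
    occurrences.set(rule, (occurrences.get(rule) || 0) + 1);
    if (!fileSets.has(rule)) fileSets.set(rule, new Set());
    fileSets.get(rule).add(file);
  }
  const files = new Map();
  for (const [rule, set] of fileSets) files.set(rule, set.size);
  return { occurrences, files };
}

const counts = countErrors(records);
console.log("by occurrences:", [...counts.occurrences]);
console.log("by files:", [...counts.files]);
```

In this toy data the indentation rule dominates when counting occurrences but shrinks when counting files, while the once-per-file "missing use strict" rule does the opposite, which is the effect seen when the bubble sizes are switched.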
So when you move this back to the total number of errors, it becomes relatively small. So there are a number of errors like this that can occur only once in a file and tend to misrepresent the total size. Now, this was again created using D3, among other things. It tells us what kinds of errors we are most likely to make, and the single largest error is indentation, followed by missing white space, followed by wrong number of white spaces, followed by unexpected spaces, followed by "expected one thing, saw another thing". To give you a sense of what this might mean, let's search for some other words: "and instead saw". Okay. So in line 17, we expected an identifier and instead saw "undefined". Now that's because jQuery is doing something weird. In this line, it's just defining a blank variable called undefined. Why? Because that means that undefined now becomes a variable instead of a keyword, and you can minify it. Also, somebody may have redefined undefined in the JavaScript environment, which you can do, and that can lead to some security issues, among other things. This will still take care of it, because if you don't give a variable a value, by default it always gets the value undefined. So it's doing a rather clever trick, but JSLint is obviously saying, look, you shouldn't be doing stuff like this. So that's on JavaScript errors. Let's pick something else. Let's see what people have been tweeting, or when people have been tweeting, about JSFoo. Every dot here is a tweet that has the hashtag JSFoo. And what we're doing is seeing this over time, starting from Thursday afternoon, where a few people have tweeted, and time's flying along. Black dots are original tweets. The red dots are retweets and blue dots are replies. And if a reply is to some tweet that exists in this list, then it's connected to that particular tweet. So early in the morning, there aren't too many tweets, thankfully.
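The colour coding and the reply links described above can be sketched roughly as follows. This is an assumption-laden illustration in plain JavaScript, not the talk's actual code: the tweet objects and their `id`, `retweetOf` and `inReplyTo` fields are hypothetical and do not follow any particular Twitter API format.

```javascript
// Hypothetical tweets: retweetOf / inReplyTo hold the id of another tweet, or null.
const tweets = [
  { id: 1, retweetOf: null, inReplyTo: null },
  { id: 2, retweetOf: 1,    inReplyTo: null },
  { id: 3, retweetOf: null, inReplyTo: 1 },
  { id: 4, retweetOf: null, inReplyTo: 99 }, // reply to a tweet outside this list
];

// Black = original tweet, red = retweet, blue = reply.
// A link is drawn only when the tweet being replied to is also in the list.
function classifyTweets(tweets) {
  const ids = new Set(tweets.map((t) => t.id));
  const dots = tweets.map((t) => {
    if (t.retweetOf != null) return { id: t.id, kind: "retweet", color: "red" };
    if (t.inReplyTo != null) return { id: t.id, kind: "reply", color: "blue" };
    return { id: t.id, kind: "original", color: "black" };
  });
  const links = tweets
    .filter((t) => t.inReplyTo != null && ids.has(t.inReplyTo))
    .map((t) => ({ source: t.inReplyTo, target: t.id }));
  return { dots, links };
}

const result = classifyTweets(tweets);
console.log(result.dots.map((d) => d.color));
console.log(result.links);
```

The `dots` array would feed the circles and the `links` array the connecting lines; how they are positioned over time is a separate layout step not shown here.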
3 a.m., people are still asleep. 4 a.m., 5 a.m., people have not yet woken up. And let's see when they wake up. 7 o'clock. So we woke up at 7 o'clock yesterday. 8 o'clock. A lot of tweets, people are coming over to JSFoo and then JSFoo starts. So people are tweeting through the day, the event becomes richer and richer. Around lunchtime, things die down. We're obviously, you know. In fact, towards the afternoon, things get a little slower. Sleepy after the food or what? I don't know. But towards the evening, things get somewhat quiet. 8 p.m., but people are still tweeting or, in many cases, retweeting. Ah, okay. That's weird.js. Okay. But it seems to be going on till almost 1 o'clock. Did it go on till 1 o'clock yesterday? Wow, okay. And tweeted. Okay, so you were probably the last dot. Hey, but there was something at 5 o'clock. Somebody's waking up really early. 7:50 this morning, 8 o'clock. People are slowly starting to tweet. 10 o'clock. Now's when the tweets should start. Actually, not that much. It's around 12 o'clock that they really picked up. And that's 1:57, just before this talk, which is where I stopped. So today, there haven't been as many tweets as there were yesterday. Let's see what the nature of these tweets is. So like I said, the red dots are the retweets. The black dots are the original tweets and the blue dots are replies. Not too much conversation happening here. I don't see that many blue dots. Let's pick a cluster that has got a reasonable number of replies. So we've got this one, okay. That's at Shamir C saying, I'm kind of interested to know how or where all of you use Backbone.js. Links much appreciated. And that's gotten 1, 2, 3, 4, 5 replies. So somebody said, we use it for building single page or highly interactive apps. Somebody said, it's a great addition for single page apps. If you'd like a walkthrough of some Backbone.js code and some examples, feel free to drop by at our table, blah, blah, blah. There's also a chain here.
So somebody's replied to somebody who's replied to somebody who's replied to somebody. So what's the original thread here? That's Shari P: just came back from JSFoo, didn't feel that it was that interesting, hoping for better tomorrow as Karthik Dot is giving a talk. Did Karthik give a talk? Oh, okay. Looks like we two catching up is gonna have to wait. Won't be able to make it earlier. Okay. And Shari replied, what? No, was relying on your talk to improve the conference, but it's an emergency. Okay. And not coming to Bangalore. I'm sure tomorrow's talks will be awesome. Let's hope. So that was a long chain. Here's something with two replies. Let's see what that is. So that's Zainab. Do the quotes mean she's retweeting something, potentially? Oh, okay. And what are the replies to that? Oh, okay. The reply was actually a retweet of the same thing. And you know, the sheer number of reds here leads me to think that JSFoo isn't the kind of crowd that tends to talk originally as much as it tends to pass on stuff. Let's compare it with PyCon, which happened on September 29th. So in the morning, a bunch of original tweets coming in, new tweets, probably people getting into PyCon, and it starts a little later. And now's where the event starts: a few retweets coming in, a few replies coming in, but still mostly original tweets happening out there. Afternoon, a bit more. Not as many as there were in JSFoo. So clearly it's not a tweeting crowd, but the few that do are putting in their own messages. There still aren't that many retweets. It goes on till the evening; around eight o'clock, things start dying down. And then it picks up, picks up after dinner maybe. This is a slightly late night crowd. They are tweeting all the way till midnight, 1 a.m. And phew, finally, okay, they do get some sleep. Nothing in the early, oh no, four o'clock, five o'clock. Between four and five, there are some people. In fact, somebody's actually replying to something.
There is a chance that this might be someone in a different geography, we'll have to check. And then nine o'clock, that's when things begin on Sunday. Lot more tweets pouring in that time around afternoon bit more of activity and then slowly it starts dying down. Not too much activity, which seems to be a common pattern. There doesn't seem to be too much activity post lunch on Twitter. And around four o'clock is when I stopped getting the tweets. But look at the pattern there. Far more red dots as a percentage than, sorry, far more black dots as a percentage than red dots. Let's take Paris Web, which has been happening over the last couple of days at Paris. This will be a bit slow. But yeah, Paris Web is a much bigger event, at least in the Twitter universe. It's quite slow here. I wish I could show it to you on my laptop. It comes on like an explosion. This is over a very short span, less than a few hours. It was from 4 a.m. to 8.30, 7 a.m. or something. And clearly you see it struggling. Now, if you look at the percentage of reds versus blacks, it's about the same. Meaning roughly as many reds as there are blacks perhaps. So that seems to be a uniform mix. But the number of blues is certainly higher. Not very high. So clearly Twitter's not being used as a conversation medium. By and large it's used to say stuff that you're doing, to pass on stuff that other people are doing, not much to reply. But at least in Paris Web, there seems to be a lot more activity. I was looking at the Twitter trends this morning, and there was SOTY, which seemed to be a big trend. Anyone knows what SOTY is? Karan Johar, is it? Okay. So I just pulled the tweets. I haven't seen what it looks like. So three in the morning people are, oh, wow. Okay. They are clearly very excited about it at three in the morning. And goes on 6 a.m. Well, far better activity there this morning. Combination of retweets replies and so on. I stopped it around half past eight. Or that's when I pulled the tweets. 
But yeah, that's certainly a trending topic. And it started at three in the morning. So people must have been either waiting for it to start and push the messages, or maybe there's some marketing campaign that's going on. Anyway, let's look at something else. Let's look at shopping. So if you walk into a store, and this happens to be a store in the UK, you'd normally come in from the outside and make your way to one of the aisles. Each of these bars is an aisle. And the thickness of the line tells you how many people make their way in that direction. So people start outside. A reasonable number of people go to the general merchandising section. But the vast majority of them come to the fresh produce section. By the way, how do we know this? Because in this shop, we had a bunch of carts which had an internal GPS kind of a system. So as people push the carts along the aisles, this thing would track where exactly the cart went. So we have a rough sense of which aisle they hit. So for a given cart, we know that it hit aisle A first, then aisle C, then aisle D, then aisle X, and so on. We don't know how fast they moved and stuff like that. That wasn't, I don't have that data. But at least we know the direction in which people went. So they start from the outside, they go on mainly to fresh produce. So after going to fresh produce, where do they go? They go to buy meat, fish, and poultry primarily. But a few of them also go outside. Few of them go outside. Now remember, this does not mean that they came in and went out immediately. All it means is that of the people that came to fresh produce, and they may have come after finishing their full shopping round, they're exiting from here. So it's a relatively small fraction that are exiting from this location. They primarily go to meat or fish or poultry. Few go to ready meals, few go to dairy, very few go to general merchandising. Almost no one goes to any of these directly. 
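The line thicknesses described above come down to counting, for each aisle, how many carts went from it to each other aisle next. A minimal sketch in plain JavaScript, assuming each cart's path is available as an ordered list of aisle names (the `cartPaths` data below is invented for illustration, not the store's data):

```javascript
// Hypothetical cart paths: the ordered list of aisles each cart visited.
const cartPaths = [
  ["Outside", "Fresh produce", "Meat, fish, poultry", "Ready meals", "Outside"],
  ["Outside", "Fresh produce", "Meat, fish, poultry", "Outside"],
  ["Outside", "General merchandise", "Fresh produce", "Outside"],
];

// Builds a nested Map: from-aisle -> (to-aisle -> number of carts making that move).
function countTransitions(paths) {
  const flows = new Map();
  for (const path of paths) {
    for (let i = 0; i + 1 < path.length; i++) {
      const from = path[i];
      const to = path[i + 1];
      if (!flows.has(from)) flows.set(from, new Map());
      const row = flows.get(from);
      row.set(to, (row.get(to) || 0) + 1);
    }
  }
  return flows;
}

const flows = countTransitions(cartPaths);
for (const [from, row] of flows) {
  for (const [to, n] of row) console.log(`${from} -> ${to}: ${n}`);
}
```

Each count could then be mapped to a stroke width for the line between the two aisles. Note that, as in the talk, this only records the order of aisles, not how long a cart spent in each.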
So there certainly seems to be a strong proximity effect. There aren't that many people that go straight to a section, buy stuff from there and head out. But what about those people that went to general merchandise? They primarily come to fresh produce or entertainment. A lot of people also go to general merchandise before going out. They also go to apparel. So there are two sections on the left, apparel and cafe, which are actually on a different floor. So let's not count them. People come in from outside, go near the gate, climb up the stairs, and then go to those sections. Okay, so from fresh produce, they primarily go to meat, fish or poultry. From there, they go to the next section, which is ready meals. From there, they go to the next section, dairy. From there, they go to the next section, yogurts and desserts. From there, now there's a breakup. Up to here, it seems almost a linear flow. And there's a reason why these sections are arranged that way. From fresh produce to meat, fish, poultry, to ready meals, to dairy. These are the most common purchases that people make. And it's so easy for them to just go to one, get the next, get the next, get the next. But from then on, when you get to yogurts and desserts, that's not something people buy as often. World foods, certainly not as often. So from dairy, oh sorry, yogurts and desserts, I got that wrong, yogurts and desserts is essential. And all the people buy that. But from there, a few go to world foods, and a few go to canned foods, rice and pasta, bread, cakes, rolls. And across all of these, there's also a little bit of migration towards the opposite aisle. So people cross the aisle. But notice how few people are crossing the aisles. Most people are going along one track. They just say, I'm staying on this side, finishing all my shopping, going right to the end and coming back. Which almost means that you can have lanes in this store.
One lane for going all this way and another lane for going all that way. Oh, it's a half an hour session, is it? I thought it was a one hour session. If it is, then I'll wrap up. Thank you. It is a one hour session. We'll wait for a reconfirmation. It is, right? So, yogurts and desserts, blah, blah, blah. So notice that they stay on this side. From canned foods, they go to the next section. From rice and pasta, they go to the next section. Understandable, there's not that much jumping, but from all of these sections, there's always a small probability of them going outside. Let's go to the last section. From there, there's clearly a transition to the opposite side. And then they start moving backwards. Now it could have happened the other way as well. It's just as possible that people start from fresh produce and then go to baby, health, pharmacy, but no, they don't do that. They go to the opposite aisle and then come back here. A lot of people seem to be exiting from this section, which I would worry about. What's this section got? I don't know what BWS is, by the way, but whatever it is, either that's the last thing people are shopping for and therefore they are not interested in buying anything more, or it's got such bad, sorry? How many people go there straight? No one. Where do people come to beer, wine and spirits from? Some from nuts and snacks. You know, this never hit me. Not many from sugar, tea and coffee, nor from eggs and breads. You know, what is this thing about beer bottles being stacked next to diapers? Let's test that. Baby foods, baby healthcare and pharmacy. No. At least in this store, that doesn't seem to be the case. But from beer, yeah, okay. From beer, okay. They've got their priorities right. Buy the beer first, then we'll worry about the diapers. Then it sort of makes sense. So that's one way of exploring this sort of data. Let's take some other data. Let's take weather data.
Now some of you may have seen this visualization of mine before, and it's a somewhat slow one. This is a map of India in which every district is highlighted. I've got a few gaps there because early this morning I didn't have the patience to cleanse the data. There were a few district names that were misspelled and that's what's causing those gaps. But the color here that you see is the average temperature of the district, and every flash is one month's worth of temperature data, starting from 1900 all the way to 2000. So we are probably somewhere in the early 1900s right now. Now the darker the red, the hotter it is. The lighter the green, the cooler it is. And you can see that J&K stays cool throughout the year. The northeast is reasonably cool most of the time. The west coast is surprisingly cool almost right through, and so is Porbandar. That's Porbandar. So if you want to retire, barring the rain, you now know where you ought to be. Okay, this flashing spot, if you notice, has a slightly offset weather pattern: relatively hot when the surrounding areas are cool, and cool when the surrounding areas are hot. That's Bilaspur in Chhattisgarh. It's a valley. You will also notice a somewhat similar pattern in Shimoga. It's not coming through very clearly here, and it becomes starker over the second half of the century. But Shimoga also has a countercyclical weather pattern, which means hot when the surrounding areas are cool and cool when the surrounding areas are hot. They're both valleys, but there are many other valleys. We don't quite know why this is happening, but it's one of those things that seems to be happening. You've seen word clouds, I'm sure, but what if the word cloud were in Sanskrit? Why do we have all of these word clouds just in English? So we took the Ramayana and said let's plot all the words there.
For those of you who know Sanskrit, I guess most of it will make sense, but for those who don't, the epic is mostly about Raja, king, and Ramaha. Now here's where you start getting into a big problem in Sanskrit or many Indic languages that you wouldn't get into if you were dealing with, say, English. The fact is, in English by and large most words are discrete. If you have a word like "has", it's usually spelled "has" and there aren't too many variants. "Hasn't", for instance, is a variant, and you can detect many such variants. It's not trivial, but you can get the vast majority of those variants easily enough, and if you simply split based on spaces, just take any corpus, split based on the spaces in between and say those are my words, you wouldn't be wrong too often. You wouldn't get too many overlaps and you'd be right, say, 80 to 90% of the time. In Sanskrit you have problems like, okay, so there's Ramaha here, which is one variant of Rama. Then you have Ramam, which is Rama as the object. Then you have Ramo, which is exactly the same as Ramaha, just that if you have an "aha" at the end you can also use "o" if there's a vowel succeeding it. Where else do we have variants of Rama? Rama, okay, that's just plain Rama. Ramam, we saw Ramam somewhere, right? Okay, you're probably able to spot it a lot better than I am this close. So the thing is, there are all of these variants, Ramaha, Ramam, Ramena, Ramaya, Ramasya, Ramat, all of these variations which are effectively "from Rama", "by Rama", "to Rama", which ought to get combined into one word, and segregating that out is a nightmare. All I've done is remove the stop words, which is things like "and" (cha), "also" (api), stuff like that. And it's not complete, and there are a huge number of stop words still left in there. But one gets a sense of what the epic is about, and incidentally the Ramayana is a lot simpler an epic. So there's a lot about kings, Vakya, people saying stuff, Dadasa, showing stuff, Vachanam, saying stuff.
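The naive approach described above, splitting on spaces and dropping stop words, can be sketched in plain JavaScript as below. The sample text is a made-up string of romanized words, not a verse from the Ramayana, and the stop-word list is a tiny hypothetical one.

```javascript
// Counts words by splitting on whitespace and skipping stop words.
// No stemming is done, so inflected variants are counted as separate words.
function wordCounts(text, stopWords) {
  const counts = new Map();
  for (const word of text.toLowerCase().split(/\s+/)) {
    if (word === "" || stopWords.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Made-up sample and stop words, for illustration only.
const sample = "ramah cha sita cha ramam raja ramah";
const stops = new Set(["cha", "api"]);

const freq = wordCounts(sample, stops);
console.log([...freq]);
```

Here "ramah" and "ramam" end up as two separate entries, which is exactly the variant problem the talk describes: merging them would need language-specific stemming that this sketch does not attempt.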
Now, as the main character, you see Rama in a number of places. You'll probably see Ravana in a few places. You'll see Sita in a few places, with the variants. Do you see Lakshmana? You're able to see. Above Raja, okay, Lakshmana. And you'll probably see, okay, I see Sugriva there. But the thing is, you don't see as many characters as you would in the Mahabharata, and that's also the nature of the epic. This is a somewhat simpler, more linear epic. Fewer characters. It's still got hundreds of characters, but it's more like a, what shall I say, Telugu masala movie, if you will. A relatively simple plot as compared to the Mahabharata, which is more like a soap opera. Incidentally, let's do this for the JSFoo tweets. Okay, so it's primarily about HasGeek and Kiran and Zainab. There's a lot of talk about live streaming, talking, watching live tomorrow. Why are people talking so much about tomorrow and not today? A lot of talk about Hack Night, Raspberry (the Raspberry Pi talk, I'm sure, has been quite interesting), awesome, mountainous, Bangalore, JS, JavaScript, blah, blah, blah. That's roughly the pattern. Let me show you one last, or maybe two last visualizations. Let me show you this salary bit. This isn't something that I've created. This is on wealthfront.com, and what it shows you is a visualization of, on the horizontal axis, what percentage of equity different kinds of professions are given, and what is the average cash that they are given. So, for example, if you are in business development, or a sales engineer, or in product management, you'd be getting a fair bit of equity and a fair bit of salary as well, whereas if you're in administration, it's a different story altogether, and one can start breaking that up. So let's, okay, let's take software engineering. That's scientific research. That's not quite what we wanted. Let's get to software engineer.
So as a software engineer, you might earn as much as 180 to 200,000 if you were a director, and you'd get a lot of equity, 30%. So anywhere between the 25th and 75th percentile is between 8.5% and 30%, which is quite good. As an architect, you might even get more equity, though at slightly lower pay, and then there are the lower levels, managers, reasonably fine. As an architect, let's see. Oh, yeah, of course, yeah, sorry. By director, you would be a product director, for example, or the director of a business unit, et cetera. Not necessarily the owner of the company, and obviously if you were the owner of the company, your stake is going to be much higher, which actually leads one to wonder, is there potentially some confusion even there between director as the owner and director as a role, because that equity, okay, fair enough. Fair enough, fair enough. Let's look at some other profession. Let's look at what you would make if you were in design or UX. Design or UX? Not as much. So in small companies, you'd get a lot more equity, but there's more variability around the pay. In large companies, there's less variability in the pay, less equity, and you'd make around 100K or so. Well, those are all examples of stuff that can be done using D3, and I'm gonna run you past a show reel that was prepared by Mike Bostock that gives you an idea of the kind of animated transitions D3 can make. So this takes four company stocks, Apple, Amazon, IBM, and Microsoft, shows the same data in various forms, and gives you a sense of what kinds of charts are possible and how these can be transformed into each other. So that's D3, what D3 can do. And in the next 15 to 20 minutes, what I'm hoping to do is explain how you can use D3 to create some visualizations. Before I jump into that, any questions? Yes. Your first chart was about, the bubble chart was about. NREGA. NREGA. Yeah, the NREGA one. So how does it make sense to have it as bubbles?
Okay, I think that's the second chart, where you were showing the JavaScript errors. Yes, Lint. How does it make sense to show it as bubbles instead of having it in a graph form, where it is easier to make sense of which error is made more? Because with circles, you don't really know which radius is bigger. Very true. Now, the reason why I put it in bubbles is not because it would be a better data visualization. It is simply because I wanted to show circles and it looked cool. If I were doing this for a client, I would have certainly put it in a different way. We tend to avoid circles. For those of you who are unaware of this point, it's a lot tougher to compare areas of circles, and a lot tougher to compare the radii of circles, than it is to compare horizontally across bars. Areas are slightly tougher to compare than distances, especially when they're laid out next to each other. And angles are among the worst things that you can use to compare. So in general, circular visualizations tend to be more wasteful of space and less effective from an Edward Tufte perspective of data-to-ink ratio. But in a lot of cases, if done well, they look more visually appealing. So that's your answer. For the jing-bang effect, not for accuracy. So as a follow-up question: I've used D3 and we have a lot of options for doing various visualizations. But understanding which visualization makes sense for which type of data is the most difficult task, not the programming part. So how do you tend to get around that? I will give you a very, very brief answer to that, but first begin by saying that knowing what visualization to use when is by far the tougher problem, by several orders of magnitude. Okay, let me give you two rules of thumb that would probably be the most effective.
First, if you've seen a similar visualization that you like, copy it. That is probably the best way of figuring out what the right representation is, which therefore means you need to be on the lookout for various kinds of visualizations. The second thing is, all other things being equal, follow this hierarchy. If you want to compare something, make sure that you're comparing it via distances, x-axis, y-axis, specifically position. We are excellent at comparing position. We are slightly worse at comparing sizes. We're slightly worse than that when comparing colors, especially when they're close to each other. In fact, colors you should just keep for discrete categories, and never have more than 20 colors, ever. And lastly, angles: avoid angles at all costs. Using this hierarchy, you can choose a visualization where you say, these are my most important metrics, and therefore my most important metrics should lie along the position axes. Then if I have a secondary dimension that I need to compare and it is possible to do it using size, do it using size. If it's possible to do it using color, do it using color. If not, just put that into a separate visualization. One last part I just want to add on. You may benefit from two things that visualizations on a computer can do. One is you can start putting it in a drill-down manner. You can say I'll start with the top level visualization and then click to get to the next level, and use that to break up a visualization. Or you can use the principle of small multiples, which is, I don't have to have one graph. So let's say I want to see India's weather and I want to see it on paper, or that animation was too complex for me. What I could do is take a grid: January, February, March, April, all the way to December, against 1900, 1910, 1920, all the decades, and I have a grid. In each grid cell put a map of India, and that's the principle of small multiples.
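The small-multiples grid described above is mostly a layout calculation: one cell per month and decade, each holding the same small map. A rough sketch in plain JavaScript, where the cell size, the ten decades from 1900 to 1990 and the function name are all assumptions made for the example:

```javascript
// Computes the top-left corner of every cell in a months-by-decades grid.
// Each cell would hold one small map of India drawn at that offset.
function smallMultiplesLayout(columns, rows, cellWidth, cellHeight) {
  const cells = [];
  rows.forEach((row, r) => {
    columns.forEach((column, c) => {
      cells.push({ row, column, x: c * cellWidth, y: r * cellHeight });
    });
  });
  return cells;
}

const months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
// Assumed decades: 1900, 1910, ..., 1990.
const decades = Array.from({ length: 10 }, (_, i) => 1900 + 10 * i);

const cells = smallMultiplesLayout(months, decades, 100, 120);
console.log(cells.length, cells[0], cells[cells.length - 1]);
```

Because every cell uses the same map, scale and color coding, the eye compares by position in the grid, which fits the hierarchy above of preferring position over other encodings.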
So that's one way you can break up visualizations.
What is the browser support for D3?
Browser support for D3 is pretty much everything except IE. I should clarify: IE9 works okay, IE10 works well.
Yeah, you've shown quite a lot of interesting data out there. So is there public data available, like the temperature data that you showed? What are the data sources that you primarily use? I know Twitter is something we can access, but what other things are publicly accessible?
So, data sources. Let me tell you that the easiest way this audience will probably find data is by just asking people, okay? I'm sure most of you are experts at scraping, and as an audience, you know, a programmer crowd is much better at scraping than at asking for help. So the best way is to ask for help. It took me many years to get to a point where I, A, have a large collection of data sets and, B, have a large number of people I know whom I can reach out to for data sets. It does take time as well. Anything more detailed than that is not for this session, I would say.
Just to follow up on the guidelines, the rule of thumb you gave on deciding: I am wondering whether you have it documented somewhere, because your words were very quick and...
Okay, so I'll give you the hierarchy again. It's position...
No, it's okay, I might forget it still. So I'm just asking.
No, I might forget it as well. I'm remembering it from, well, mostly experience, but I have seen this somewhere. I honestly can't think where.
Okay, sure, I'll find it.
I'll try and tweet it.
So the other question was generally about D3. The examples and other things are really overwhelming. If you go to the site, the possibilities are really overwhelming and you actually forget what you wanted to visualize. So do you have any advice on what to look for when you look at the charts, which could be helpful for general usage?
I'd suggest doing it in a two-phase fashion.
Spend some time looking at all the charts so that it goes into the back of your head, and sleep over it, and then use a different time slot to think about what you want to create. When you say, I'm gonna create something, this is a problem I have, let me see what I can find, you will easily get lost in the visualizations. Just split it out. Turning the internet off might help. Okay, that'll probably be the last question before we get into the actual creation.
Google data is a pretty good source of public data. And there's a book called Visualize This by Nathan Yau. He covers everything you need to know about visualization.
And Nathan Yau has also published a small compilation of the various sources of data. So has a guy called Pete Warden. W-A-R-D-E-N, or W-E-R-D-E-N, I can't quite remember. Both of them have a collection of a number of good data sources that you can pull out and play with.
Let's do this. What I'm gonna do is show you the code for the NREGA visualization, if I can find it. Let's do a view source. And I'm gonna talk you through this. I'll make it a bit bigger. Hope that it's visible from the back. Okay, yeah, that size is probably all I can risk. So this is where the interesting stuff is happening, but I'll just, for formality's sake, cover the rest of it. We start with a regular HTML5 doctype. We put in a title. I've got a style sheet, common.css, which has very little. I'm using Bootstrap simply out of habit. I'll show you what common.css has. I'm using Bootstrap, like I said, out of habit, and I'm saying define the body's width, and for SVG blocks just make sure that it looks a little pretty. So all it does is draw a shaded border around it, nothing more. So that's all common.css has. Let's give all of this. Then we get into the heading and the paragraph, and we've defined an SVG tag. Inside the SVG tag, we've got a bunch of labels. So you see this: spend on labor at the top left, spend on raw material, state funded, blah, blah, blah.
Very crude way of labeling it, and all we've done is put text there. One of the prerequisites of getting the most out of D3 is that you know some amount of SVG. You don't have to. D3 can work with divs as well, but you'd get a little more out of D3, a lot more out of D3, if you knew some SVG. Then we include D3. It's a mid-sized, well, maybe even large library. Unzipped, unminified, it's about 200k, but for the kind of visualizations that you use it for, the alternative is to use an image, which will invariably be larger anyway. So most people don't worry about the size of the payload. Now, what we do then is d3.csv, which is an Ajax call to pull in a file, and then we get the data, and then we do a bunch of stuff. Now, before I get into this bunch of stuff, I've got to explain some basic concepts of D3, which is what I'll do over the next few minutes. What I'm going to do here is reduce the size of this and get into the, might be a bit of a challenge, let's see.
Okay, first thing I'm going to do is show you how D3 is like jQuery, and then I'll show you how D3 is not like jQuery. So to begin with, D3 has selectors similar to jQuery. So you say d3.select, and what that does is select one element. Unlike jQuery's dollar, which selects all the elements that match a selector, this selects one and only one element, and that can trip you up the first few times you use it. So if I say select svg, that returns one element. If I want to remove everything under that, what I've got to do is select svg star, but I've got to make sure that I say selectAll, because otherwise it would only select one element. So now, selectAll of svg space star, to pick everything under the SVG element. That's six elements, mostly the states plus four labels, and I say remove, which clears up the canvas. Now I can start drawing stuff. I get the SVG element first, d3.select of svg. Now that I've got this SVG element, I can append to it. That's the next thing: we add another element.
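The difference between selecting one element and selecting all of them, as described above, can be sketched roughly as follows. The helper name `clearCanvas` is made up for this example, and the D3 object is passed in as a parameter so the same function can be exercised without a browser; only `select`, `selectAll`, and `remove` are real D3 calls.

```javascript
// Hypothetical helper: clear everything under the <svg> element.
// "lib" is expected to be the d3 object.
function clearCanvas(lib) {
  // select returns only the FIRST element matching the selector.
  var svg = lib.select("svg");

  // selectAll returns EVERY match; "svg *" means all descendants of
  // <svg>. Using select here would remove only one child.
  lib.selectAll("svg *").remove();

  // Hand back the (now empty) svg selection so drawing can start.
  return svg;
}

// Only touch the page when D3 and a document are actually present.
if (typeof d3 !== "undefined" && typeof document !== "undefined") {
  clearCanvas(d3);
}
```

Returning the `svg` selection is a convenience choice for this sketch, so that the next step can append to it directly.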
Let's say I'm gonna append a rectangle to this, and I can set various attributes on it. Now in order for a rectangle to be visible, at the very least I need an x, y, width, height, and a color. Color is optional, but I'll still put it in. And you can specify those using attr. x, I'm gonna say 100, y is 100, width is 100, and height is 100. Notice that it has jQuery-like chaining, but also notice that I did not pass a dictionary of attributes here, the way you could in jQuery. And that's because it isn't yet supported. And I'm gonna add another attribute, which is fill, which I'm gonna give the color red. And I have a red rectangle. Straightforward so far. So this is how D3 is like jQuery. You can do stuff that jQuery can do, not as easily as jQuery can in some cases, but the facility is there for most cases.
The power of D3 comes when you start taking data and tying it to the elements of the document. Now, D3 stands for data-driven documents. This is a very powerful concept. What it says is, you've got data, and you map that data onto elements of the document. For example, let's take NREGA again. I've got a list of states. I want each state to map to a circle. That is the concept, which means that every attribute of the state can map to an attribute of the circle as well. Or at least I'll have the ability to do that. If an attribute of the state changes, let's say the percentage of center funding changes, I can make the circle move. I can map the percentage of center funding, which is a variable of that object, to a position, which is an aspect of the DOM element. Or you can map it to any other thing, for that matter.
So how does one do that? What we do is, firstly, I'm gonna remove this rectangle. Actually, I'm gonna let it stay, it's fine. d3.select of svg. Now at this point, I have selected an element that exists, which is the SVG element. And then I'm gonna do something weird, which you wouldn't do in jQuery, which is selectAll of circle. Now here is the first place where D3 differs from jQuery.
selectAll does not necessarily only select stuff that exists. It also selects stuff that could exist in the future, at least from a certain perspective. So when you say selectAll, what you're really saying is, take all the stuff that is and could be inside this as circles. Now the second thing that D3 lets you do is bind data to a selection. So at this point, you can say dot data of, and you can put in an array. I'm gonna say just one, two, three, four, five. Now, at this point, what we're saying is, inside SVG, there may or may not be circles. But there ought to be five circles, one corresponding to each element here. I've got one, two, three, four, five. It doesn't have to be one, two, three, four, five. In fact, it typically is an array of objects. And the attributes of the object then start becoming the attributes of the element. Now with this, I have defined what is called, I'm just gonna call it nodes. Let's not get into the technical terminology. Now I've just defined this. I've got to tell it to draw it and how to draw it.
Now, the third important concept with D3 is knowing the update cycle. There are three possibilities. You have a set of elements that exist and there's more data than there are elements. Or you have as many elements as there are data. Or you have more elements than you have data. All three are possible. The total number of elements could be less than, equal to, or greater than the total size of the data. And what D3 lets you do is say, if there's more data, how should we add it? If there's less data, then how should we remove the stuff? And if there's the same amount, then how should I change the existing stuff to map it? Which is effectively the create, update and delete operations. And in order to specify the create operations: right now we know that there are exactly five elements in the data. And we know that there are no circles.
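The three cases above can be modelled with a few lines of plain JavaScript. This is only a conceptual sketch of the counting, not how D3 implements its data join, and the function name `joinCounts` is made up for this example.

```javascript
// Conceptual model of the create / update / delete split.
// elementCount: how many matching elements already exist on the page
// dataCount:    how many items are in the bound data array
function joinCounts(elementCount, dataCount) {
  return {
    // data items with no element yet: these need to be created
    enter: Math.max(dataCount - elementCount, 0),
    // elements that pair up with a data item: changed in place
    update: Math.min(elementCount, dataCount),
    // elements left over with no data item: these get removed
    exit: Math.max(elementCount - dataCount, 0)
  };
}

// The situation in the talk: five data items, no circles yet.
var firstDraw = joinCounts(0, 5);
```

With five data items and no existing circles, every item falls into the create bucket. D3 names the create and delete buckets the enter and exit selections.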
So what we want to do is create it. So we say nodes.enter. When you say nodes.enter, it tells D3, you've got to start doing some creation work. The creation begins with appending an object. So we say, even though we've done selectAll of circle, any time you find more data than you've got elements, start appending, start putting in a circle for each data element, and then start setting the attributes of these. So what I'm gonna do is set the attribute x to be... Now, here's another interesting feature of D3, which is that you can set this to be a function. Oops. And in that function, you're by default passed the data element. So I'm gonna return, you've got one, two, three, four, five, so I'm gonna return that element times 100. And I'm also gonna put in a y, which is function of d, and I'm gonna return the same value, d times 100. So it's gonna draw x and y as the same. So it'll go 100, 200, 300, 400, 500. And I'm gonna put in a radius of, let's say, function of d, return d times 20. And I'm gonna set a color. Let's just put green for now. Let's hope this works.
Okay, not quite. Looks like they're all sitting at the top left. Now let me show you the command again, and let's see if you can spot the error. For those of you who are familiar with SVG... Sorry? Coordinates are negative? d is always zero? Descending order? The last circle is sitting on top? y should be negative? Okay, none of these so far. No, not the radius. No, the position of a circle is determined not by x... sorry? Yes, but it shouldn't be, right? We said x is 100, 200, 300, and so on. The name of the attribute is wrong. It's actually cx and cy, not x and y.
So what I'm gonna do is delete all of these. So d3.selectAll of svg star, dot remove. And create the nodes again. Now, instead of x and y, let's try putting in cx and cy. And you've got the circles where we want them to be. What we've done here is mapped a data set onto objects.
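Put together, the corrected demo looks roughly like the sketch below. The numbers come from the talk; the function names are assumptions for this example, and the D3 calls only run when D3 and a page with an svg element are present.

```javascript
// The data from the demo.
var data = [1, 2, 3, 4, 5];

// Accessors: D3 calls each one with the bound data item d.
function cx(d) { return d * 100; }     // 100, 200, ... 500
function cy(d) { return d * 100; }     // same as cx, so a diagonal
function radius(d) { return d * 20; }  // 20, 40, ... 100

if (typeof d3 !== "undefined" && typeof document !== "undefined") {
  // Start from an empty canvas.
  d3.selectAll("svg *").remove();

  d3.select("svg").selectAll("circle")  // circles that do or could exist
      .data(data)                       // bind one data item per circle
    .enter().append("circle")           // create a circle per unmatched item
      .attr("cx", cx)                   // a circle is positioned by cx/cy,
      .attr("cy", cy)                   // not x/y (the bug in the demo)
      .attr("r", radius)
      .attr("fill", "green");
}
```

Setting `x` and `y` on a circle is not an error in SVG; the attributes are simply ignored, which is why all five circles ended up at the top left.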
And this is the exact same principle that has been used in every one of the visualizations that you've seen so far. Let's take an example. Let's take tweets. What we're doing here is periodically adding an item to the data, and then saying data, enter, and then putting it at a certain position. Now, incidentally, D3 has a layout that automatically moves the positions around. Let's not get into it. Actually, I probably took the wrong example to show you how this works. Let me pick something simpler. Actually, let me just show you the NREGA thing. That's the simplest of the lot.
So let's take a look at the code. What we're doing here is d3.csv gets this file, nrega.csv. And what nrega.csv has is, each row is a state: A, the state; B, the percentage of center funding; and C, the percentage spent on labor. We say d3.select of svg. That takes the top-level SVG element. We select all the non-existent circles, and then we bind the data. We say the data that I want to use for all of these circles is in this data. Then enter, which means now we want to start telling D3 what to do when it finds more data than it has circles. First, append a circle, and put in an attribute cx. Now here, we're putting in a function which returns d.center, which is the percentage of center funding, multiplied by 940. That's just to position it on the scale, because the width is 940, so zero to 100 percent maps onto that width. cy, we're gonna do some transformation so that the bottom is zero, or the bottom is something else. And again, a function. Lastly, we take the radius r, and that is the square root of the total NREGA funding. Now, why do we use square root? Because if we used the actual value, then the radius would be proportional to the value. Usually we perceive size as how important something is, and size is proportional to the square of the radius. So we take the square root of the size, which is the funding, and use that as the radius.
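The position and radius mappings described above can be sketched as three small functions. Only `d.center` and the width of 940 come from the talk. The other field names, the height, the two rows of placeholder data, and the assumption that `center` and `labor` are stored as fractions between 0 and 1 are all invented for this example.

```javascript
var WIDTH = 940;   // from the talk
var HEIGHT = 500;  // assumed; the talk does not give the height

// Placeholder rows with made-up values, standing in for the CSV rows.
var rows = [
  { state: "State A", center: 0.75, labor: 1.0, total: 400 },
  { state: "State B", center: 0.25, labor: 0.5, total: 100 }
];

// Horizontal position: share of center funding across the width.
function cx(d) { return d.center * WIDTH; }

// Vertical position: SVG y grows downward, so flip it.
// Assumed scale: 50% labor at the bottom edge, 100% at the top.
function cy(d) { return HEIGHT - ((d.labor - 0.5) / 0.5) * HEIGHT; }

// Radius: square root of funding, so that the circle's AREA
// (proportional to radius squared) tracks the funding.
function r(d) { return Math.sqrt(d.total); }

// State A has 4x the funding of State B, so 2x the radius, 4x the area.
var areaRatio = (r(rows[0]) * r(rows[0])) / (r(rows[1]) * r(rows[1]));
```

If the radius were set to the raw funding instead, State A's circle would have sixteen times the area of State B's for only four times the money.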
And then we put in a color. We return an HSLA-based color, which is just taking various hues on the edge of the color circle and putting in what is close enough to a random color. And then inside it, we append a title and put in the text, which is the name of the state. So with those 30 lines, you get a scatter plot. You might argue that 30 lines is quite a bit for a scatter plot. But remember that every single one of these visualizations, without exception, has been created in less than 100 lines. That includes all of the HTML and so on. So that is where I want to end before we take questions. This was a sample of what you can do with D3. It's a fairly powerful visualization library. The crux of it is that you can map data to visual elements. Thank you for your time. Happy to take questions.
In the tweets example that you had shown us, does D3 handle the animation aspect of figuring out that the objects should not get overlaid over each other?
Yeah, the question was, does D3 handle the animation in the tweets example? The answer is absolutely yes.
And can you just kind of show me the API?
Sorry, there?
Can you just show me a part of the API, like a snippet?
Sure, so let me show you what's not relevant. There's this entire section, which I've very crudely coded, the sole purpose of which is to update this timer. The real code is here. And the important pieces are these calls, force.drag and force.start. So we create a force-directed layout, which is the visualization that's being used here, and we give it some basic parameters. And then we say, every time there is a tick, which means there is a movement, this is how I want you to update the lines and the circles. And force.start keeps calling those ticks every now and then. force.drag lets me move each of these to wherever I want. All of this is built in. And like I said, this code is still under 100 lines.
Thanks, thanks.
Hello.
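For readers who want to see the shape of the force layout calls mentioned in that answer, here is a minimal sketch. It assumes the older D3 API in which `d3.layout.force` exists, as named in the talk, and it is not the speaker's actual code: the three nodes, two links, sizes, and styling are placeholders, and `linkEnds` is a helper made up for this example.

```javascript
// Placeholder graph: links refer to nodes by index.
var nodes = [{ name: "a" }, { name: "b" }, { name: "c" }];
var links = [{ source: 0, target: 1 }, { source: 1, target: 2 }];

// Hypothetical helper: once the layout is running, each link's
// source and target are node objects carrying x and y positions.
function linkEnds(link) {
  return { x1: link.source.x, y1: link.source.y,
           x2: link.target.x, y2: link.target.y };
}

// Runs only in a browser with a D3 build that has d3.layout.force.
if (typeof d3 !== "undefined" && d3.layout && typeof document !== "undefined") {
  var svg = d3.select("svg");
  var force = d3.layout.force()
      .nodes(nodes)
      .links(links)
      .size([940, 500]);          // assumed canvas size

  var line = svg.selectAll("line").data(links)
    .enter().append("line").attr("stroke", "gray");

  var circle = svg.selectAll("circle").data(nodes)
    .enter().append("circle").attr("r", 5)
      .call(force.drag);          // built-in dragging of nodes

  // On every tick, copy the layout's positions onto the elements.
  force.on("tick", function () {
    line.attr("x1", function (d) { return linkEnds(d).x1; })
        .attr("y1", function (d) { return linkEnds(d).y1; })
        .attr("x2", function (d) { return linkEnds(d).x2; })
        .attr("y2", function (d) { return linkEnds(d).y2; });
    circle.attr("cx", function (d) { return d.x; })
          .attr("cy", function (d) { return d.y; });
  });

  force.start();                  // begins the simulation and its ticks
}
```

The layout only computes positions; drawing the lines and circles, and moving them on each tick, is still ordinary selection code of the kind shown earlier in the talk.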
So could you shed some light on the different extensions and plugins that are used along with D3 to make useful visualizations? Something like databases or other storage engines that are used with D3.
Okay, the question is what other libraries could...
What are the tools that are normally used along with D3 to get the data from databases or, let's say, Elastic, or some other similar database engines?
Those are usually server-side tools. I would rather not take that now. Maybe we can talk about it offline, but I guess there were many presentations focusing on how to get data from a database. Remember, though, that this is an entirely different problem. D3 can natively consume XML, CSV, and JSON. And you've got to pass it data in one of these formats, or you can pass it in any other text or binary format that JavaScript can parse, but then you've got to build your own parser. What you're asking is the problem of what libraries can I use to create that data from a database and pass it to D3.
Extract the data, yeah.
Sorry?
Extract the data from the database.
Extract the data from the database, which would be a server-side problem. And I think that is completely independent of D3, meaning it doesn't matter whether you're using D3 or FusionCharts from before, or Raphael, or any of those visualization libraries. That would be common. And I would, if you don't mind, go into that offline.
And thank you for the wonderful presentation. I have a small doubt about the errors visualization. Did you use any sort of force-directed layout for placing the spheres?
No, not quite a force-directed layout. It has what is called a pack layout. And let me show you what that looks like. So if you look at the code here, let me make it slightly bigger, there is a different layout called d3.layout.pack instead of d3.layout.force, and what it does is take a bunch of shapes and put them in the most compact representation possible. Not shapes, circles, really.
Yeah.
So I mean, I can see a...
Yeah.
So does the layout take care of the spiral, or do you need to manage that?
It isn't actually creating a spiral intentionally. What it's doing is taking the objects one by one. So it's taking the first object. And since I've sorted it in descending order, the first object is the biggest, and it puts it somewhere. Then it takes the next largest and puts it as close as possible so that it's all packed. Takes the next, packs it as close as possible. And so on. And it isn't doing it very efficiently, if you notice, because this circle would have fitted here.
Exactly.
So it's not necessarily doing the most optimal packing. What it is doing is taking a good trade-off between speed and size. So it just lays it out and hopes for the best. If you were to randomly sort it, you might get, not might, quite often will get a better packing than if you were to sort it. But if you sort it, it tends to look nicer sometimes.
Cool. Thank you.
If you sort it ascending, it looks almost exactly like a spiral. In fact, let's see if I, no, I won't have the time to show it to you, but all of this code is on GitHub. You can just pull this in and sort in ascending order. It'll be a, you know, near perfect spiral. Questions?
Yeah, I missed the first five minutes of the presentation. So maybe you mentioned this, but what other visualization libraries do you use at Gramener, and how do you choose between them, like when you start a project, which one to pick?
Okay. The question was, what do we use at Gramener? I don't think I introduced myself at all. My name is Anand. I'm in this company called Gramener, and we do data visualization. Sorry about the lack of introduction. As for what we use at Gramener, we do not use JavaScript at all. If we do, we use D3, and that is sparingly. If for any reason we could not, and I'll tell you why we cannot use D3, and that has happened only once so far, we use Raphael, Raphael.js.
If not, we just try and stay away from visualizations. The only problem that we have with D3 is that it does not work on lower versions of Internet Explorer. Other than that, it is fit for purpose, sorry, no, that is the primary problem. Other than that, it is fit for purpose for almost every other visualization. The secondary problem that D3 has is the slightly higher learning curve. It isn't as easy to learn. For those of you who don't already know D3, the 15 minutes of D3 overview that I gave you, I assume, will make no sense whatsoever. It's just a starting point. The learning curve is steep, and that is the other problem. So that is one of the few occasions where we use another library, but that is always when we don't have a choice. So short answer: always D3, perhaps Raphael.
So I have a question regarding the weather map, the India map that you had. Was that also created in under 100 lines of JavaScript?
Yes. See, the good part about it is, while it's 100 lines of JavaScript, the SVG file behind it is more than a megabit.
Okay.
I don't know if that's cheating or not, but that's data.
You imported an SVG file and directly did it.
Yeah, absolutely. I mean, I wouldn't use a hundred lines of JavaScript for a map, or at least not this detailed a map.
And unfortunately, we're out of time. So I guess all questions are going to be offline from now on. Where's the next speaker? Is it your setup? Thanks, Anand, that was great. Thanks everyone.