Wow, that's a lot of people. Anyhow, my name is John Alexis Guerra Gomez, yes, the full thing, and I come from a country called Colombia. And it's Colombia, not Columbia, by the way. It's the place where I get to work in this beautiful place. This is Los Andes University. That is what I call work. And it's also the place that makes these beautiful mochilas arhuacas, and I wonder what the live transcriber is going to write in there. Anyhow, it's a beautiful country, you should come and visit.

But that's not why I'm here. I'm actually here just because I had a network. Well, actually, no, I didn't have a network. I had an idea, a question. The thing is that there was OpenVis 2014 and 2015: I didn't know about the conference the first time, and then I realized this thing existed and I couldn't come. I applied like three or four times until I bribed Irene, I'm sorry, to get accepted. But I could follow online, and when I was doing that I realized some things. The first thing is that I definitely needed to go, so I actually ended up paying out of my own pocket last year. The other thing I realized is that, thanks to these amazing bloggers and journalists, we get to follow everything through Twitter. So that was quite interesting, and I was getting so much information in there.

But then I asked: am I getting it all? Am I actually following all the right people in data vis, the ones who are actually talking about this? So I started writing some scripts to see if I could find out. I started saying, well, what if I check all the people that are actually talking about OpenVis? And I found more than 3,000 tweets, and this is from last year, by the way.
This year we are around 2,500 right now, so we are kind of doubling that. And there were almost 800 accounts of people tweeting about the conference, and that was interesting. But then I had another question: what about the interesting people who talk about interesting data vis but didn't really interact with the OpenVis conference? So, for instance, where is Mike? There you are. So, Mike Bostock: he didn't tweet about OpenVis last year. I mean, what were you doing, man? Were you writing version 4 of D3, d3.express, one of these things? No, but seriously, man, thank you very much. Thank you for all the work you do for the community. That is Chakita cheese, Colombian cheese, not from Madrid. Sorry, Spaniards, cheese is ours. But yes, thank you very much, Mike, for all the great work you do.

He didn't tweet about OpenVis, but we definitely need to be following him. Otherwise we would have missed all of this wonderful stuff about d3.express. So he wasn't there. That's when I had a realization: what if I follow all the people that you guys follow the most? That was the click, and that's how we celebrate in Colombia. Then I said, well, that's it, I just have to follow the people that you guys follow the most, and that's how I came up with this list. Of course Mike is at the top of the list, apart from the OpenVis conference itself, and then we have all the big figures in there, and we have our lovely organizers. And then I knew, okay, that is an insight, that's something that I really care about. However, I love networks, I come from Maryland and I've done a lot of work on that, and we also have the beautiful D3 network visualizations.
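
The ranking idea described above (follow whoever the conference crowd follows the most) can be sketched in a few lines. This is a minimal Python illustration, not the scripts written for the talk; the `follows` mapping, the account names, and the function name `rank_by_community_followers` are all hypothetical.

```python
from collections import Counter


def rank_by_community_followers(follows, community):
    """Rank accounts by how many community members follow them.

    follows:   dict mapping an account to the set of accounts it follows.
    community: accounts that tweeted about the conference.
    """
    counts = Counter()
    for follower in community:
        for followed in follows.get(follower, set()):
            if followed != follower:
                counts[followed] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: three people tweeted about the conference.
follows = {
    "ana": {"mike", "openvis"},
    "ben": {"mike", "ana"},
    "cat": {"mike", "openvis", "ben"},
}
community = {"ana", "ben", "cat"}

ranking = rank_by_community_followers(follows, community)
for account, in_community_followers in ranking:
    print(account, in_community_followers)
```

In this toy data `mike` tops the list even though he is not in `community`, which is exactly the kind of account the approach is meant to surface.
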
So I said, let's visualize that, it will be even more clear. And when I started doing that, this happened. Nadia already showed us some examples of that, but that is what is called a hairball, and that's what this talk is about: how the heck do we get that thing to actually show insights? That's the important thing when doing data visualization.

So I said, well, we can go from those 700 nodes and show only, say, the top 300, and what do we get? A hairball. Let's say we take only the most influential ones out of those 300, only the ones that were actually tweeting about the OpenVis conference, and what do we get? Well, a hairball. So I said, let's find an algorithm that computes the communities in the network and tries to separate the different clusters, and go all data science and machine learning and blah blah blah that's going to solve the world, and what did I get? A hairball.

In that moment I said, well, let's go back to my question. What was I asking at the beginning? What I was asking is: who are the most important people that I should be following? So what I did was create a scatter plot. I fixed the positions: on the x-axis you have how many people follow them overall, and on the other axis how many people follow them inside the conference. By the way, there is an error in here, but Mike Bostock is around here somewhere. And the interesting thing about this, oh, sorry, is that he wasn't talking about OpenVis.
Sorry, Mike. But in any case, what you can find in here is that I was interested in following people around this area. I wanted to find people that I wasn't following and that were in here. And then you can draw your own conclusions and try to find the ones that you weren't following, with a good balance of how many people follow them overall. To improve this I actually removed the links, and that's the visualization I created. Then you get the big figures: Robert, that's you in there, Jeff, Alberto Cairo, and all of the nice InfoVis people.

The other thing is that with this representation I could actually add back the nodes, and in that case add Mike back, and then here you have interesting stuff. One of the most interesting ones is this one. You know who this guy is? Oh, I'm sorry. By the way, Mike, I need your help getting this thing to work, so I hope you still like me. Please don't stop following me. So who's that guy? Who knows? That is Edward Tufte. Edward Tufte has around 82,000 followers, but look at how many people are actually following him in the conference. That was very interesting, because if you know his relationship with the community on Twitter, it's not the best with many people. So it was actually amazing to see that reflected in there. But anyhow, you can draw your own conclusions. Please, please don't stop funding the conference.

Anyhow, after doing that I said, well, what if we look at how many tweets they posted during the conference? That is where we have our beautiful bloggers and the people that are actually helping us know what is happening at the conference, and you can see all the nice people that were doing that. So this is the type of thing that we need to be doing when doing network visualizations. There are a bunch of tools and techniques that we can use to actually untangle the hair, so let's talk about that. And for that, let's go back to the basics.
That's me on graduation day. That is Ben and Catherine, my two beautiful advisors, Ben Shneiderman and Catherine Plaisant, from the wonderful HCIL laboratory at the University of Maryland. Every time you talk to them and start a new project with them, they will always say: what are your tasks? Who are your users? That really stuck with me, and when you're doing a network visualization that's what you need to be asking yourself: what is it that you actually want to answer with that visualization?

For that, I have to recommend this book to you very strongly. I'm using it to teach the two classes I'm teaching on data vis, both in Los Andes and in Berkeley. Well, I'm actually trying to push the people in Berkeley to use it. This is Tamara Munzner's book, Visualization Analysis and Design. It's a wonderful book because it creates this amazing framework that helps you define what the data you're analyzing is, what the tasks you're doing are, and then how you can actually visualize that.
It's a beautiful framework in which you can follow simple rules, and you will get better visualizations just by framing things that way, as you can see in here. I could give you a whole hour's lecture on all the different types of tasks, but the most important thing I have identified when doing network visualizations is that there are two types of tasks. One is overviews, when you want to get a whole idea of the network. The other is when you want to query a specific node, and those are the query tasks.

So let's start with the overview tasks. For the overview I have already shown you some things, and this, by the way, is the IEEE VIS citations network, a beautiful hairball. This is data up to 2015. The first thing I recommend is that, instead of trying to show the whole network, you try to select the most important nodes out of the network. How do you get those most important nodes? Well, think about the task you have at hand. In my case, when I was looking at the OpenVis people, I wanted to find the people with the biggest number of followers that actually tweeted about OpenVis, things like that. When I was finding fishy doctors at Xerox PARC, I was trying to find doctors that had very suspicious behavior. So you can find different matrix, sorry, different metrics that you can use to rank those nodes, and then you select the most important ones.
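
The "rank by a task-specific metric, then keep the top nodes" step can be sketched as follows. This is only an illustration in Python: the `scores` values, the default fraction, and the function name `select_core` are assumptions, and the right metric depends entirely on the task at hand.

```python
import math


def select_core(scores, fraction=0.1):
    """Return the ids of the top `fraction` of nodes by score (at least one).

    scores: dict mapping node id to a task-specific importance score,
            e.g. number of followers or a "suspiciousness" measure.
    """
    if not scores:
        return []
    keep = max(1, math.ceil(len(scores) * fraction))
    # Highest score first; ties broken by node id for a stable order.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return ranked[:keep]


# Hypothetical scores for ten nodes.
scores = {"a": 50, "b": 7, "c": 120, "d": 3, "e": 64,
          "f": 1, "g": 9, "h": 2, "i": 30, "j": 4}

core = select_core(scores, fraction=0.2)
print(core)
```

Swapping the metric only means changing how `scores` is computed; the selection itself stays the same.
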
However, if you have a very big network, that network most probably will not be very well connected. So one of the simple tricks I did is that, once you select that core of 10% of the nodes or so, you go and get the neighborhood of that 10%. It is a very simple trick, and then you have a small number of nodes that creates a representation of the network. If you allow the user to interact with that and select different filters, then they can actually gather insights from it.

The next thing is that you can use community detection. That was something very common, and I really hated having to go to R every time, run the queries in there, and cluster in there. So what I did, with the help of some code that I found online, was create a library called netClustering.js. netClustering.js is the library that allows you to go from this to this. The most interesting thing is that it actually runs in the browser, and the beautiful thing about that is that you can let the user create a lot of filters, select the nodes they care about, and then cluster. It's acceptably fast if you have a good enough network. By the way, if you try to show more than a thousand nodes in a browser there's not much that you can actually see, but even on a network like that it will take only one or two seconds to cluster.

Now, even showing something like that, you can see that it's like, yeah, you see the colors and everything, but you don't really see the actual clusters. So, using an algorithm called Group-in-a-box that places things on a treemap (of course it had to be a treemap, I come from Maryland), it separates the nodes into their own boxes.
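
Going back to the neighborhood trick mentioned a moment ago (take the core nodes, then pull in whatever they connect to), here is a minimal Python sketch. It assumes an undirected edge list and a core that has already been chosen; the function name `core_with_neighborhood` and the sample data are hypothetical.

```python
def core_with_neighborhood(edges, core):
    """Keep the core nodes plus every node one step away from the core.

    edges: list of (source, target) pairs, treated as undirected.
    core:  iterable of node ids already selected as most important.
    Returns the set of kept nodes and the edges among them.
    """
    core = set(core)
    keep = set(core)
    for source, target in edges:
        if source in core:
            keep.add(target)
        if target in core:
            keep.add(source)
    # Only edges with both endpoints kept are part of the reduced network.
    kept_edges = [(s, t) for s, t in edges if s in keep and t in keep]
    return keep, kept_edges


# Hypothetical ring of five nodes with "a" as the only core node.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
nodes, sub_edges = core_with_neighborhood(edges, core=["a"])
print(sorted(nodes), sub_edges)
```
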
That is what is called Group-in-a-box. It uses an algorithm created by Cody Dunne, a very good friend of ours, and Ben Shneiderman and many other folks, and what I did was create an implementation of that on D3. I did it for D3 version 3, and then Mike changed the whole force simulation thingy. But actually, with the new system, I don't know if you have done networks in D3 version 4, it's amazing, because right now it's just a matter of creating a force that draws the nodes to a certain position. So today I'm releasing forceInABox for D3 version 4, and it's as simple as adding a new force to your simulation and then setting the different link strengths and things like that, and you can get this.

The other thing I created is for when you don't want to see it on a treemap: a force-directed meta-layout for a force-directed layout. What this does is create clusters, and each one of those clusters is treated as a node, and then that node is used to create the foci toward which each one of the nodes actually moves. If you see here, I could spend the remaining 15 minutes of my talk just doing this. It will adapt to whatever resolution you have in there, and all of that is thanks to master Mike. So thank you very much, and you can actually use it.
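
The idea of "a force that draws the nodes to a certain position" can be illustrated without D3. The following Python sketch is not the forceInABox API and not D3 code; it assumes each node already carries a cluster id and each cluster has a fixed focus point (hypothetical coordinates here), and it only loosely imitates a velocity-based simulation tick.

```python
def pull_to_foci(nodes, foci, strength=0.1, alpha=1.0):
    """Nudge each node's velocity toward the focus point of its cluster."""
    for node in nodes:
        focus_x, focus_y = foci[node["cluster"]]
        node["vx"] += (focus_x - node["x"]) * strength * alpha
        node["vy"] += (focus_y - node["y"]) * strength * alpha


def tick(nodes, foci, velocity_decay=0.6):
    """One simulation step: apply the force, damp velocities, move nodes."""
    pull_to_foci(nodes, foci)
    for node in nodes:
        node["vx"] *= velocity_decay
        node["vy"] *= velocity_decay
        node["x"] += node["vx"]
        node["y"] += node["vy"]


# Hypothetical data: two clusters, each with its own focus
# (for example, the centre of a treemap box).
foci = {"red": (100.0, 100.0), "blue": (300.0, 100.0)}
nodes = [
    {"id": "a", "cluster": "red", "x": 0.0, "y": 0.0, "vx": 0.0, "vy": 0.0},
    {"id": "b", "cluster": "blue", "x": 0.0, "y": 0.0, "vx": 0.0, "vy": 0.0},
]

for _ in range(200):
    tick(nodes, foci)

for node in nodes:
    print(node["id"], round(node["x"], 3), round(node["y"], 3))
```

With this force alone, every node in a cluster ends up on top of its focus; in a real layout other forces (links, repulsion, collision) would act at the same time and spread the nodes out around it.
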
It's open source and it's super easy to use. Please send feedback. Now, the other thing that was very useful: every time I showed my users these things, the next thing they wanted to know was, "Oh, that cluster at the top, that one looks suspicious," or "I want to see who the people in there are." That is what I call jumping into a cluster. You can create interactions so that you click on this, and once you do, it goes and filters down to only that node. And since we have forceInABox, which can run in the browser, guess what: you can recluster, you can redistribute on the screen, and you can keep repeating that as many times as you want. If you're doing that, you have to let the user know how deep they are into the rabbit hole. But other than that, it's a great way of filtering down into the details on demand. That is the whole idea of drawing insights.

Now, having said that, another tip that could be really useful for doing visualization, and it's what I did with the OpenVis visualization, is just to have a fixed layout. In this case, I was very jealous because you Americans had all of these beautiful grid maps, and Chris, master, once posted about how he created his own representation, in which you can see the different states with the same size. You know which ones I'm talking about, the ones that you see all the time with political results. But in Colombia we didn't have that, and Chris actually wrote a beautiful article explaining how you pretty much have to do those manually. I didn't have the time to do that, so what I did was create a force layout that represented the states in Colombia, and then just by moving those states I came up with this one. One of my students in Berkeley created a new version of this for the US, and it's actually quite useful.
It's not really truly a network, but it kind of explains the idea of why fixing things into certain positions can be useful. Now, having said that, the other type of thing that you can do on networks is queries. By querying, what I mean is that you choose one node and, out of that node, you start creating something that I call egocentric views. The type of queries that you can do with egocentric views is the same type of thing that we saw early this morning: you select one node and then you see which nodes are connected to it, the ones that are one or two or three steps away from it. If you can provide interactions that allow people to select one node and then expand it on demand, that is marvelous for them. But I don't have much time for this, so I'm going to jump through it very quickly.

All of that is the different things I have been doing in D3 that helped me untangle networks. But I also wanted to share something, and this is for the first time in the world: we are talking about NetworkCube. NetworkCube is a project initiated by Jean-Daniel Fekete, Benjamin Bach, Nathalie, Paula, and Emmanoulis, or however you pronounce his name. They have created this open source application that is actually a framework where you can go and upload your own data. The nicest thing about it is that it has the basic network visualizations, but apart from that it also has different representations that go beyond the node-link representation. If you click on the node-link one, it will scale very well, and it's also for dynamic networks, so it will work if your network changes over time. Apart from that, you can use matrix representations, you can use timelines (we already heard about them earlier today), or you can even use mixed approaches. So it's free to use.
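
Coming back to the egocentric views described a little earlier: selecting one node and collecting everything within one, two, or three steps is a breadth-first search with a depth limit. This Python sketch is an illustration only; the adjacency data and the name `ego_network` are hypothetical, and it is not how NetworkCube or any specific tool implements the query.

```python
from collections import deque


def ego_network(adjacency, ego, max_steps=2):
    """Return {node: distance} for every node within max_steps of `ego`.

    adjacency: dict mapping a node id to a list of its neighbours.
    """
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # Do not expand beyond the requested number of steps.
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical network: a - b - d - e, plus a - c.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}

ego = ego_network(adjacency, "a", max_steps=2)
print(ego)
```

Expanding on demand then amounts to calling the same function again with a larger `max_steps`, or with a newly selected node as the ego.
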
It's not commercial. It's just an initiative that we have been creating to allow people to untangle more networks. So please feel free to use it, go and visit the website, and let us know if you have any feedback on that.

So with that, I would like to give you a bunch of recommendations on the type of things that you can do for your network visualizations. Always remember to define your tasks. Always remember that if you filter, you could actually be showing more: sometimes less is more. You have to define whether your tasks are for overview or for query. And there are many other alternatives beyond the node-link that you can use. By the way, that should be updating live with the tweets that you guys are posting. I have been creating the hairball for this year, so there you go, there a new one appears. So with that, thank you very much, and I welcome any questions you may have.