Thank you very much, Irene. Yes, cool. Good morning, everyone. My name is Shirley and this is Nadieh, and we are super excited to be back at OpenVis Conf this year to talk to you about our project, Data Sketches.

So let's start from the very beginning. We actually met online, virtually, in the data-visualization-centric Slack in the fall of 2015, and didn't meet each other in person until OpenVis Conf in Boston last year. Those are our happy faces. At OpenVis Conf we had the pleasure of giving talks and hanging out for three whole days, where we hit it off super well. So two months later, when Nadieh put up her SVG tutorials from her OpenVis talk, I dove into them with vigor and started chatting with Nadieh about all of the questions I had about her tutorials. Somehow that whole conversation led to us lamenting the fact that we hadn't finished as many full-blown visualization projects as we would like. And then I had an idea, and I was like, "Hey Nadieh," fully expecting to be rejected, "do you want to collaborate on something?"
And then she said yes, and that's how Data Sketches was born.

So in the following week we figured out that we both liked the idea of each creating a visualization every month around the same topic, and doing that for a year, to see how two people would create a visualization starting from the same seed, the topic, but then diverging onto different paths based on our own interests and history. Besides sharing the end result, we also wanted to write about the creation process, and we split this up into the three pillars that we find most important: data, sketching, and coding. Initially we thought we could pull Data Sketches off at about five to six hours a week. But you know, real life usually doesn't agree with plans, especially coding plans. So since starting in July 2016 we've clocked many, many hours creating a visualization each month. During this talk we'd like to take you through some of the lessons that we learned, the challenges that we faced and how we tackled them, and the insights that we gathered along the way.

So let's start with the data. We often get asked this question: do you get the data first and then come up with your ideas, or do you get the idea first and go find the data from that? For us the answer is always, always idea first. For example, for my November Data Sketches I wanted to visualize every single line in the musical Hamilton, and I built this filter tool where I can filter by any set of characters as well as their conversations and any themes, and then dig into the set of lines from the songs that are left over. The idea came from two questions: how do the relationships between the characters change throughout the whole musical, and what are the recurring phrases and who are they associated with?
Now, as you can imagine, this data set is not available anywhere online, save for the lyrics themselves. So I had to go through all of the lyrics and note all of the recurring phrases that appear across more than one song, group them into broader themes, go through the lyrics manually again so that I could enter them into the computer, associating them with the right song and line numbers, do the same thing for the characters and conversations, and write a script to aggregate all of that information together to get my final data set.

And a more extreme example: for my October Data Sketches I wanted to put emojis on the, well, former President Mr. Obama's and Mrs. Obama's faces. I built this tool for exploration where you can go through any of the videos I found of their late night talk show interviews, and then go through the entire YouTube video to look at all of the emojis that I put on their faces. This idea really came from a conversation with Eric Cunningham, right there, where he was like, "Wouldn't it be cool if you could just run facial detection on the videos and correlate their emotions with what they're saying?" And I was like, "Hey Eric, you realize that I only have one month for this project?" And then I was like, challenge accepted.

So I started by manually gathering all of the late night talk show appearances from IMDb. I then went and found all of the videos corresponding to those talk show interviews on the hosts' channels, used a Node package to download all of the videos and their captions, and got the timestamp from each of those captions so that I could take a screenshot for every single time somebody talked. I uploaded those to the Google Vision API, because it gives me information about faces and their boundaries and emotions: how happy they are, or angry, or if they're wearing a hat. Then I took that data and aggregated it with my caption data to get my final data set. What these
two months taught me was that if I have a curiosity, there is some way that I'm going to be able to get my hands on that data set, whether it's to go through and enter the data manually or to write a script to automate it (which, by the way, I think there is a Node package for literally anything I can imagine under the sun), as long as I do all of this responsibly and legally.

Thankfully, not every month is as data intensive as that. For August the obvious theme was the Olympics, especially since we're both big fans, and I decided to visualize all 5,000 gold medal winners since the very first Olympic Games in 1896. Each of these circles represents a group of similar sports, like water sports or ball sports, and each slice, or feather, within a circle represents one sport, like athletics. We have the first edition over here, going outward to 2016. The reddish backgrounds are female events and the bluish backgrounds are male events. And finally, each medal is given the color of the continent of the country that won the medal. So, for example, the Americas are red, Europe is blue, and so on. We can see here that there were actually no female events in the very first editions of the Olympics, but we've been catching up since then.

I actually found the data for this piece in two articles published by the Guardian for the 2012 Games in London. However, after getting a rough shape of my visual on the screen, I noticed some very obvious medals were missing, like hockey from 2012. So my confidence in this data set suddenly dropped drastically, even coming from such a respectable source as the Guardian, and I therefore had to get a sense of the accuracy of the complete data set. But I didn't want to have to go manually through all 5,000 medals.
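A quick way to get that sense of accuracy without reading every row is to compare simple counts against an independent tally. The following is a minimal sketch in JavaScript, not the check that was actually run for the project: the rows, the reference counts, and the names `medals`, `expectedEvents`, `countByEdition`, and `findDiscrepancies` are all made up for illustration.

```javascript
// Made-up example rows; the real data set has about 5,000 gold medals.
const medals = [
  { edition: 2008, sport: "Hockey", winner: "Germany" },
  { edition: 2008, sport: "Hockey", winner: "Netherlands" },
  { edition: 2012, sport: "Athletics", winner: "Jamaica" },
];

// Hypothetical reference counts from a second source (events per edition).
const expectedEvents = { 2008: 2, 2012: 2 };

// Count the rows per edition in the data set.
function countByEdition(rows) {
  const counts = {};
  for (const row of rows) {
    counts[row.edition] = (counts[row.edition] || 0) + 1;
  }
  return counts;
}

// Return the editions whose row count disagrees with the reference count.
// Editions that are absent from the reference are not checked.
function findDiscrepancies(rows, expected) {
  const actual = countByEdition(rows);
  return Object.keys(expected)
    .map(Number)
    .filter((edition) => (actual[edition] || 0) !== expected[edition])
    .map((edition) => ({
      edition,
      expected: expected[edition],
      actual: actual[edition] || 0,
    }));
}

const discrepancies = findDiscrepancies(medals, expectedEvents);
console.log(discrepancies);
```

Any edition that shows up in the result is a place to dig deeper by hand. Editions that match are not proven correct, only not obviously wrong.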
Maybe Shirley would. So I found a proxy instead. On Wikipedia I could find the number of events that happened during each of the editions, and I compared that to the number of gold medals I had in my data set. If there was a discrepancy, I dug deeper to figure out where and how. That's how I found out that for some of the years the horses were also in the data set, winning gold, which makes for an interesting read, to see Princess and Sissy and Lady Mirka as women winning gold in the Olympics. Eventually I ended up figuring out each discrepancy and making adjustments to my data to get it to the point where I trusted it again. So my lesson here was that even if I have data from a very respectable source, I really need to get a sense of its accuracy and completeness. Missing data can be harder to find than wrong data. And you don't have to check every separate value; think about taking sums and counts and averages and comparing those to just plain common sense or, even better, a different data source.

So, many people dive straight from data to a final visual. But take some time and actually sit back and sketch out your ideas on paper. We've filled many pages of our notebooks since starting, because it helps us think and lay out ideas beforehand. But my sketches are often very simple, focusing only on the main abstract shape that I want to fit my data into. Colors and layout and design are things I only vaguely think about, and don't act on until I have the data on my screen. There's just no use for me in thinking about these things until I've figured out that the data actually works once I've morphed it into the main shape. So for the Olympics, for example, I had this idea of feathers placing emphasis on the more recent editions, but I had no idea if that would look all right once I finally placed all 5,000 medals together. So I had to see if the general shape would work before moving on. Well, I started laying out the feathers, but it took a few steps until
I saw that, luckily, it did show potential with the actual data.

But sometimes there's no use in even starting to sketch on paper. I will say that's very rare for me, but networks are an exception. For our October month I decided to dive into royalty. I've always been intrigued by how intermarried the royal families really are. Are they all cousins twice removed? Luckily, I found a genealogy data set that contained a gigantic family tree of the European royal houses. It was from 1992, so I had to add one or two more generations in the main line of succession, which was a fun night on Wikipedia. Not. So here we have the final result of all 3,000 people in my family tree. The bigger circles are the 10 current royal leaders, and everybody is connected to their parents, their partner, and their children. To highlight the interconnections, you can hover over anybody to see their six degrees of separation light up and how far they reach into the web. But you can also click on any person, and, let's see if I can get it to work here, any other person, to see the shortest path between these two people, because the entire web is connected. They're all family in one way or another.

But when I started out with this data set, I had no idea what it contained. So I just plotted everybody using the most basic network settings, and then this happened: an explosion of points and lines going off my screen. So I pulled in gravity a bit.
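The talk does not say how the shortest path or the degrees-of-separation highlight are computed. One standard way to do both on an unweighted, connected network is a breadth-first search. The sketch below is an illustration under that assumption, with made-up names and links; `buildGraph`, `shortestPath`, and `withinDegrees` are hypothetical helpers, not code from the project.

```javascript
// Made-up family links; each pair connects a person to a parent, partner, or child.
const links = [
  ["Anna", "Bram"],
  ["Bram", "Clara"],
  ["Clara", "Dirk"],
  ["Anna", "Eva"],
  ["Eva", "Dirk"],
  ["Dirk", "Fien"],
];

// Build an undirected adjacency list from the link pairs.
function buildGraph(pairs) {
  const graph = new Map();
  for (const [a, b] of pairs) {
    if (!graph.has(a)) graph.set(a, []);
    if (!graph.has(b)) graph.set(b, []);
    graph.get(a).push(b);
    graph.get(b).push(a);
  }
  return graph;
}

// Breadth-first search: returns the shortest chain of people from start to goal,
// or null if the two are not connected.
function shortestPath(graph, start, goal) {
  if (!graph.has(start) || !graph.has(goal)) return null;
  const previous = new Map([[start, null]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === goal) {
      // Walk back from the goal to the start to rebuild the path.
      const path = [];
      for (let node = goal; node !== null; node = previous.get(node)) {
        path.unshift(node);
      }
      return path;
    }
    for (const neighbor of graph.get(current)) {
      if (!previous.has(neighbor)) {
        previous.set(neighbor, current);
        queue.push(neighbor);
      }
    }
  }
  return null;
}

// Everyone within maxSteps links of a person, with their distance in links.
function withinDegrees(graph, start, maxSteps) {
  const distance = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (distance.get(current) === maxSteps) continue; // do not expand further
    for (const neighbor of graph.get(current)) {
      if (!distance.has(neighbor)) {
        distance.set(neighbor, distance.get(current) + 1);
        queue.push(neighbor);
      }
    }
  }
  return distance;
}

const graph = buildGraph(links);
const path = shortestPath(graph, "Anna", "Fien");
const nearby = withinDegrees(graph, "Anna", 2);
```

The same search serves both interactions: stopping at a goal gives the path between two clicked people, and stopping at a maximum depth (six in the visualization) gives the set to light up on hover.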
I could have used d3.express during this part. But then I ended up with a useless hairball. So I colored the points by year of birth. Well, still not helping. Thankfully, in D3 version 4 you can easily have gravity depend on variables, so I pulled the web apart by year of birth as well, which was better, but it was still a rather uninsightful bundle. At this point I'd already invested several hours in playing with the network settings, playing with different types of connections, and adjusting my data, and I was really at the point where I thought about giving up. Maybe I could try a different angle, like how much they're spending each year or something. But I gave it one last shot, and that's when I decided to focus on the current royal leaders. I placed these in a line and let the vertical gravity depend on which of these royal leaders you were most closely related to, and that's when I finally saw insights. For example, that the Queen of Denmark over here is very central in the web, but that we have the Prince of Monaco, whose line separated from the rest more than 200 years ago.

It was around this time that I also started thinking more about the general design aspects. Networks often remind me of constellations, and with my astronomy background I have a bias for all things space, so I turned it into a starry night. But I could never have designed this visualization beforehand and illustrated or sketched it. I had to go hand in hand with the actual data and apply my design choices to all of the data simultaneously, so that I could see if the results were both interesting and engaging.

So Nadieh gave us a really great example of when you land on the right visualization early on. But that's not necessarily always the case. For our March Data Sketches we had the incredible opportunity to work with Google News Lab and their search data dating back to 2004, which, by the way, launched this morning, so please check it out.
Yay. So with access to all of that data, I wanted to look into what people are searching for, and specifically what people in a country search for around the world. Each of these blocks is a topic that the US has searched for in spring. I can toggle between all of the different seasons and see the top topics, as well as dig into a specific country and look at the top places there. I can also expand a topic so that I can see the search interest and seasonality for that particular topic.

The question I really started out with was: what are the top searched countries? Which turned out to be Brazil. And then I was like, who's searching for Brazil? Can I actually see what kinds of topics are being searched for around the world in each country, including Brazil? And can I see the distribution of those searches across the years? And then let me get a little bit greedy, because I want to see the search interest also. Okay, maybe not.

Okay, so let's step back and try circles for each of the topics, sized according to the search interest, and maybe I can show the countries searching for each of those topics with overlapping circles. This is kind of pretty and bubbly, so I kind of like it. But does geography play a part in who searches for a country's topics? So let's try sorting all of the topics by distance for a year. And that doesn't look really great. So maybe I can just concentrate on one topic across all of the years, and that actually kind of looks like it might lend itself well to a heat map, maybe. Maybe not. This is really not going anywhere. I need to step back.

But I did notice in an earlier exploration that seasonality is quite common in a lot of the topics. So maybe I can keep the sorting by distance, because the geography is actually still quite interesting, but this time around I can try to filter and group by the seasons. And that sounds quite promising, right?
Nope. It turns out that all of the topics are searched for across all of the seasons, making all of the bar graphs look exactly the same. That was pretty sad. But wait, then I realized that because there is seasonality, the search interest is actually different across the seasons. So if I just size the heights of the blocks by the search interest, I actually start to get very interesting insights, like this one: that the U.S. searches for travel more often in the spring than in the fall. And finally, finally, ta-da, I have my final visualization form that I'm quite happy with, where each of the topics is grouped by the country that it belongs to. There are interesting insights. For example, if we just look at the topics for summer, Mexico is actually not searched for that often, but Canada is, and the hotter countries around the world, like Thailand and the Philippines, aren't searched for that often. But if we go instead to winter, the opposite is true: Mexico peaks and Canada drops, and Belize and Thailand and the Philippines, the hotter countries, actually go up in search interest. So the lesson here was: sometimes we don't get the right visual from our first sketch, or a second or even third try. But be patient and go back and forth between the sketching and the code, because that will help figure out what works and, more importantly, what doesn't work, so that we can go on to our next step.

So, as expected, most of our hours are actually spent on getting the data onto the screen, and here are some of our maybe less obvious coding lessons. In the very first month the topic was movies, and it was therefore immediately clear to me that I wanted to do something with The Lord of the Rings. I found this super interesting data set that contained the number of words spoken by each character in each scene of all three extended editions of The Lord of the Rings. Amazing. So I decided to focus on the members of the Fellowship, whom we have here
in the center, and to see how many words they spoke at each of the locations around here on the circle. Well, no surprise, maybe, that Gandalf speaks the most. But my favorite insight is actually that Boromir, who is really only alive during the first movie, manages to speak more than Legolas does in three.

Anyway, when I looked at the sketch for this project, I found that it was very similar to a chord diagram, so I thought I could start from there and then slowly transform the chord diagram into my sketch. The most basic thing to me was whether I could figure out how to get these chords to flow inward, and that actually took less time than anticipated, which is very rare for me in coding. But yeah, it worked. Then getting rid of the excess space, and now it's ready to handle the Lord of the Rings data, and some more appropriate colors. We have nine members of the Fellowship, so I made sure that the centers end up at the right vertical location. But this was looking very squished, so I used the same setup that I used in my batplot and pulled the two halves apart. And finally, as you can see, these chords are now looking rather unnatural, and I therefore decided to dive into learning SVG paths. That was the thing that took the longest in this project: figuring out how to make these paths look more natural. And that's how this sort of new D3 layout came into existence, mutated from D3's chord diagram. Many people have done wonderful work that you can use. So even if you think you are creating something new, you don't always have to start from scratch. Just pick the thing that lies closest to your design or idea and start adjusting that. Remix what's out there already.

Sometimes we dream up visuals that are unique enough that there's no base for us to remix off of. For that same movie month, I wanted to take a look at the top summer blockbusters of the last two decades and reimagine them as flowers. So each of the colors is associated with a genre, and the size and number of petals for the flowers
reflect their IMDb ratings. There are some really, really beautiful flowers in here, I think, like The Dark Knight Rises and Slumdog Millionaire, and my absolute favorite is the 1997 Batman & Robin, which is this teeny tiny little thing that I think is super cute. I've gotten questions like, "Hey, how did you make this? How long did it take?" And the answer is that it's actually really quite simple. It just takes a good grasp of SVG paths, and in particular the cubic Bézier curve command. How that works is that we start out with a starting point of, in my case, (0, 0), and the way that I like to think about it is that we draw a line between that starting point and the end point, in purple, and then we take the two anchor points, the blue and the green, and nudge them out until we get the curve that we want. Then I drew some of the lines, made the curve on the other side, rotated the petals out, and added the colors with some motion blur. And that's it. That's all that it took. So the lesson here really was that when we're creating things, we should really understand the tools that we're using, because that's how we can go beyond the prescribed examples. In particular, our favorite tool is SVG paths, because with that under our belt we can make any shape that we can imagine, for truly unique results.

So now we've seen two examples of adjusting paths. What about their positions? Going back to the Olympic feathers, all of these circles and slices depended on each other, but they were very structured; they all follow the same concept. At first I tried to calculate all of the rotations of these circles and slices in JavaScript. But I had written some 30 lines of code and still had not achieved something I knew I could do in R in two lines.
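The bookkeeping involved is mostly running totals: a slice can only be placed once the widths of the slices before it are known. As a minimal sketch of that kind of pre-calculation (in JavaScript here, with made-up slice widths; `slices` and `addOffsets` are hypothetical names, and the real project's geometry is not shown in the talk):

```javascript
// Made-up angular widths (in degrees) for the slices of one circle, in drawing order.
const slices = [
  { sport: "Swimming", width: 120 },
  { sport: "Diving", width: 90 },
  { sport: "Water polo", width: 60 },
];

// Attach a start angle to each slice: the running total of the widths before it.
function addOffsets(rows) {
  let runningTotal = 0;
  return rows.map((row) => {
    const withOffset = { ...row, startAngle: runningTotal };
    runningTotal += row.width;
    return withOffset;
  });
}

const prepared = addOffsets(slices);
console.log(prepared);
```

Run once ahead of time, the result can be saved alongside the data, so the code that draws the visual only has to read `startAngle` instead of recomputing it.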
So I just pulled all of these preparations into R as well, even if they were visual variables: they have nothing to do with the data, only with how it will be laid out on the screen. For example, I pre-calculated the initial rotation that each of these circles needed to have so that eventually the center would end up at the bottom. I pre-calculated the offset that each of these slices would need to have based on their predecessors. The only placement variable that I kept in JavaScript, to keep it dynamic, was the year scale from the center outward, because then I could scale the entire circle based on the screen size. Even the medal offset is something that I pre-calculated in R. And you can use the same idea of visual variables in networks that are static and fixed as well: just download the final x and y locations and the next time place the nodes there immediately, saving your viewers from having to run and wait on a heavy force algorithm. So even though they have nothing to do with the data, it's perfectly fine to pre-calculate visual variables and attach these to your data set, and that's more often the case for fixed data sets than you may think. Sometimes it's just way easier to calculate some things outside of JavaScript, or it can save you a lot of browser calculations, making your visual faster to load. And as an added bonus, it'll make your JavaScript file a lot more readable as well.

So far we've talked about what I like to call the initial 80%: the data preparation, the ideation, the visualization itself. But I like to think that the last 20% is extremely important as well. When I started thinking about the story for Hamilton, I wanted to reach a wider audience than usual, and that meant that I wanted to make sure that they were engaged enough that they would keep scrolling down the screen. So the first thing I did was have the dots fly into the center as the page loaded to form the Hamilton logo, and as the user
scrolls down, the dots fly apart and dance in the background and then come back together, so that I can tell the reader, hey, each of these dots is actually part of the lyrics. As they go into the first section of analysis, each of these sections actually corresponds with a song, so I highlight the correct song to tell them what the song is for each section. One of the other small things I do is, if the user decides they want to click on a song... Ooh, that worked. Cool. I didn't expect the sound to work. You can see I put a progress bar on the right-hand visualization. Oh, and you can't really see much else. Okay, but it gives users the context of where the song is relative to the musical itself. Another smaller example is our March Data Sketches, where I use animations to explain how to read parts of the project. But I don't actually trigger these animations until the user has scrolled into that section, so that they can always start from the beginning of the explanation, no matter where, how, and how fast they're navigating. It's this small attention to detail and these attempts at delight that really make a piece for me, because they tell the reader that we really care about their experience.

And some more examples of what you can do with delight. While on a flight back to Amsterdam I was without Wi-Fi, so I couldn't do anything essential, and I therefore decided to animate the legend I had for my visualization about fantasy books, just for fun. Other non-essential things that I did were adding animated GIFs of the most memorable moments of Dragon Ball Z (it took like two hours to go through all of the GIFs), adding a hover with the most detailed information I had for music nerds in my December music visual, turning the top ten songs into tiny vinyls, or having annotations about weird and silly events that happened in the history of the Olympic Games, like Henry Pearce having to stop for ducks in
the rowing event, but still managing to win gold. So although getting your data on the screen in such a manner as to make it insightful is key, it's the other things you add, such as animations, annotations, weird legends, GIFs, and more, that can make it truly unique and special and even more of a delight to investigate. So take some time to think about these aspects as well.

And then we get to the soft stuff, or what I like to call the best stuff. When Nadieh and I started thinking and talking about Data Sketches, we really didn't expect the kind of reception that we've had. We thought that if we could just have fun, maybe learn some things, and if some of our friends would enjoy the project, that'd be really cool. But we've gotten the most amazing responses to both our visualizations and the value of our write-ups, we've gotten to meet and talk to incredible people that we would never have had the opportunity to otherwise, and we've gained an amazing friendship with each other that we didn't expect at the beginning. We just wanted to have fun. But when we stepped back and really thought about the transferable lessons, we agreed that the most important was this: if you're about to take on an ambitious project, make sure to bring somebody along for the ride with you. And make sure that, if you're not too responsible, like me, the person that you bring along is very responsible, so that you can keep each other accountable and relatively on track for your project. Make sure that it's someone that you really respect, and hopefully that respect is mutual. And most importantly, that it's someone that you trust or can grow to trust, because that's absolutely crucial to receiving and giving feedback. Finally, if you're about to do something ambitious, like make a visualization from scratch every single month, know that it'll be hard. There have been months where we've been creatively drained and didn't know how to go on. But remember, you learn
as you struggle, and it's absolutely amazing how much we've learned, both technically and personally. It's actually been worth the time.

So over the last ten months we've learned to find data in the weirdest places; that it's not blasphemy to pre-calculate visual variables; that sketching can help weed out thinking errors, but that you can also sketch with code; that SVG paths are amazing, and math is too (but we already knew that, of course); and that surprisingly small things can add a sense of delight for your audience. We didn't set out to learn or be confronted by these things. Like Shirley said, we set out to have fun, and in that we definitely succeeded. Sure, it's been intense. There were times when we were coding into the night when we would rather have binged a TV show. But it has opened up paths and opportunities that we weren't even looking for, but are happy to have gotten.

So two more months to go, or maybe three, and then what? Well, I can assure you we will not keep on creating visualizations in our spare time at the same breakneck pace. But we do want to share Data Sketches with everybody, and we've had so many wonderful reactions, especially about the write-ups. So we've decided to create a Data Sketches publication on Medium, and anybody who has made a visualization and wants to share his or her writing about the creation process can do that here. You can do a full Data Sketches month and collaborate with others on the same topic, but it's also fine if it's a single standalone project. The main point is to show how your final visualization is a product of iterations, mistakes, and improvements. So please let us know if you ever have anything that you'd like to contribute.

Well, we hope that you'll join us in our final two months of data munching, sketching, and coding up our topics, and our often weird, fun, and overly elaborate visualizations. Thank you very much.