Okay, up next we have Asha Hamerli. She's gonna talk about easy data visualization with graph. And then we'll eat some food. So, Asha.
So, I'm gonna talk today about getting the most out of your data. So, a little background. I work for a company in Bellevue called Dreambox Learning. We do adaptive software that teaches children math. As part of that, we currently have over 700 individual activities to teach specific skills to children. And we're manually specifying the dependencies. We're manually specifying things like you need to do counting before you can do addition. You need to do addition before you can do multiplication. And I started down this project because we needed to find the bugs and the patterns in that data. When you're manually specifying stuff, you make mistakes. And we need to find those mistakes before they go out into production. I'm gonna pull examples from my work experience, but this stuff applies to any project where you have data that has relationships inherent in it. So, if you have subscribers and you have a pyramid scheme where one person invites five other people who each invite five other people, you can model that with these tools. There's all sorts of things you can do with this tool and it's really easy, which is the best part of it. So, let's start with a quick example. Which is easier to comprehend: this, which is an XML file that we use to specify 10 or 15 lessons' worth of data, or this? That's a picture drawn with graph from that data. In that XML file, you couldn't tell what was going on. In this picture, you can see that down the left side and on the right side, there's a long string of stuff that's dependent on each other, one after another, in this pattern. There's some other stuff over on the left. You can see that the blue nodes are things that don't have any outgoing edges, which means that they're endpoints in the graph, the leaf nodes. And it's a lot easier to see what's going on here.
I can see that that node at the very top is not connected to anything and that's probably a bug. I couldn't have told you that from the XML file, there was just too much data there. And it wasn't in a format that humans can scan. Making pictures by hand is easy. We realized we had a problem very early on and some of our non-technical folks were drawing out graphs in Visio and on graph paper to model this data. It worked well, everyone got handed up for an ounce. Problem is, it doesn't scale. We're a startup, we iterate, we iterate a lot. We add data on a regular basis, we change things up. And if you have to, every time, go back and redraw your hand graphs, you're gonna be slow. So some of the disadvantages to doing things by hand: as I said, it's time consuming. If the underlying data changes frequently, it gets even more time consuming. And the other thing that we realized very quickly is that different people need different views of the data. Our customer service and sales folks need a kind of high level, 10,000 foot view. They need to know that counting comes before addition, they don't really need to know how many activities there are in counting. The testers, however, want to see each and every one of those dependencies. They want to see the whole graph in detail so they can find those bugs. That 10,000 foot view is not especially useful for them. So in addition to having to recreate the graphs every time we added content, we had to recreate multiple graphs every time we added content. And it just made it even more time consuming. So I brought this problem to Seattle.rb and someone goes, why don't you use Graphviz? And I'm like, what's Graphviz? Graphviz is a program that uses the dot language. The dot language is a language designed for specifying graphs. And by graphs, I'm talking graphs as in graph theory, not graphs as in the bar graphs you created in third grade.
Graphs are a combination of nodes and edges, and Graphviz allows you to set attributes on a node. So you can change their shape, you can change their color, you can change their fill style so that you can highlight specific kinds of data. So we'll do a quick example. This is a very simple example. We have a digraph, which is a directed graph. We have an edge going from A to B and an edge going from B to C. A has the shape box, B has the color red. And that little bit of code makes that image. Really straightforward, really easy to comprehend. So there are some tools for viewing dot files. Graphviz is the most common. There is a WinGraphviz for the like three people in the audience I saw yesterday who were using Windows machines. The Mac version is pretty great. It occasionally crashes, but it mostly crashes when you change the underlying file underneath it, like if you're editing the file in Emacs and you're viewing the picture in Graphviz. It works really well. It doesn't do so well with large graphs. I was at RubyConf last year and someone had a graph with like 30,000 nodes and Graphviz just choked. And for that there's Tulip. It's kind of the grown up version of Graphviz. It's got a lot of options and a lot of power but that also means it's hard to use. So for almost any purpose, Graphviz is just fine. This is Cascadia Ruby, though, so we're gonna use Ruby. All you need to do to get access to the graph gem is sudo gem install graph. Yay Ruby, yay RubyGems. So let's do a simple graph. So you wanna draw a single node. We have digraph, which takes a block, so digraph do. We have a node with the name B and we give it a label of B. And talking last night to Ryan Davis, who wrote the graph gem, he insisted that I say that this is a bug that I haven't reported. It should just be node with the name B. So I will report that bug and it'll get fixed. So a graph with a single node is not all that useful. So let's look at a graph with an edge.
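The slide code is not part of this transcript, so the following is only a minimal sketch of the three-node example described above (edges A to B and B to C, A drawn as a box, B in red). It deliberately does not use the graph gem; `to_dot` is a hypothetical helper written in plain Ruby that prints the equivalent dot source, which Graphviz can then render.

```ruby
# Hypothetical helper (not part of the graph gem): render a directed graph
# as dot text from a list of edges and a hash of per-node attributes.
def to_dot(edges, node_attrs = {})
  lines = ["digraph {"]
  node_attrs.each do |name, attrs|
    attr_text = attrs.map { |key, value| "#{key}=#{value}" }.join(", ")
    lines << "  #{name} [#{attr_text}];"
  end
  # Nodes that only appear in an edge are created implicitly by Graphviz.
  edges.each { |from, to| lines << "  #{from} -> #{to};" }
  lines << "}"
  lines.join("\n")
end

puts to_dot([%w[A B], %w[B C]], { "A" => { shape: "box" }, "B" => { color: "red" } })
```

Saving the printed text to a file ending in .dot gives something a Graphviz viewer can open.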
So another digraph block: edge, A goes to B. The direction is implicit from the order of the elements. And the other thing that's kind of subtle is that the nodes are implicit. You see that there's no specification of the nodes here. I say there's an edge between A and B so obviously there must be nodes A and B. You don't have to tell it any more information than that. It figures that out on its own. So you can draw these but that's not especially useful. We need to save them. The graph gem is Ruby so it's straightforward and simple. Save, file name. This creates a .dot file which you can then open in Graphviz. There are other applications that'll open .dot files. Often they'll open it for import but you can't edit them and save them back out; OmniGraffle is one of them. This is a simple cycle going from A to B, B to C, C to A. So saving them as .dot files is great because you can then edit them if you want to, but if you want to share them with people you need to export. The dot language has easily 30, probably closer to 50, possible export formats. To set the export format: save, file name, file type. So PNG, JPEG. There's a full list of output formats on the Graphviz website. When the slides go up online, and you can watch the talk from Confreaks, you can check them out and figure out what you want. PDF is also included, for the Windows users bitmap is included, they're all there. It's kind of freaky how many different output formats are available natively. So with just those basic tools you can build this. This is a graph that's an early version of our curriculum at kind of the 10,000 foot level. We've got comparisons and we've got building numbers up to 10, ordering numbers up to 100. There's some stuff at the bottom: place value, addition and subtraction, and work with the concept of inequality. And being able to pull that out from one of those XML files was awesome.
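As a rough illustration of the save-then-export step described above, here is a sketch that writes the A, B, C cycle to a .dot file and builds the Graphviz command line for exporting it. The helper names `write_dot` and `export_command` are made up for this example and are not the graph gem's API; the only real interface assumed is the `dot -T<format> input -o output` command that ships with Graphviz.

```ruby
require "tmpdir"

# Hypothetical helper: write dot source to disk and return the path.
def write_dot(path, source)
  File.write(path, source)
  path
end

# Hypothetical helper: build the Graphviz command for a given output format.
def export_command(dot_path, format)
  output_path = dot_path.sub(/\.dot\z/, ".#{format}")
  ["dot", "-T#{format}", dot_path, "-o", output_path]
end

# The simple cycle from the talk: A to B, B to C, C to A.
cycle = "digraph {\n  A -> B;\n  B -> C;\n  C -> A;\n}\n"

Dir.mktmpdir do |dir|
  dot_path = write_dot(File.join(dir, "cycle.dot"), cycle)
  command = export_command(dot_path, "png")
  puts command.join(" ")
  # Only attempt the export when Graphviz is actually installed.
  system(*command) if system("dot", "-V", out: File::NULL, err: File::NULL)
end
```

Swapping "png" for another format name such as "pdf" changes both the `-T` flag and the output file extension.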
I also could pull out a different view, which is this. This is a deep dive into one of those circular or oval nodes, and this shows the actual activities that would be in one of those. And so there's some stuff on subtraction and there's some counting to 500 by 10s. A lot of different activities, and from that one XML file we're able to pull out all of these views. That's pretty boring stuff, though. They're black and white, it's boxes or ovals and lines. And the real power of graph is what else you can do with it, the ways you can highlight and emphasize specific aspects of your data. So let's talk about shapes. Maybe you decide that you really want all your nodes to be triangles. Great, you say that the node attributes have triangle on that top line. Poof, magic, all your nodes are triangles. It's just that simple. Boxes are special. Oftentimes you want to use boxes because they take up the least space. They're very tight around the label, you don't have the extra swoop of the ovals. So boxes are really common in graphs. So you can just put in the word boxes. You don't even have to add it to the node attributes. Automatically all of your nodes are boxes. But maybe you want to be more creative and use many shapes. Maybe you want to do an old school flow chart, or you want all of your initial nodes to be inverted triangles and all of your end nodes to be regular triangles so that you can see where stuff comes in and stuff goes out. That's easy too. You paint node A with triangle. You paint node B with circle. You paint node C with diamond. The other interesting thing that's called out in this particular example is that you can have multiple edges in a single line. So we've got an edge from A to B. We've got another edge from B to C. And we can put those all on one line. You can do 10 edges on one line and it just knows that you take them each pair-wise and draw the edge in between them. So shapes are cool, but oftentimes people want to use colors.
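The per-node shapes and the pair-wise edge expansion described above can be sketched in plain Ruby as follows. This is an illustration only: `chained_edges` and `shapes_dot` are hypothetical helpers that emit dot text, not calls from the graph gem. The shape names triangle, circle, and diamond are standard Graphviz node shapes.

```ruby
# Hypothetical helper: expand a list of node names into pair-wise edges,
# so A, B, C becomes A -> B and B -> C.
def chained_edges(*names)
  names.each_cons(2).map { |from, to| "#{from} -> #{to};" }
end

# Hypothetical helper: give each node its own shape, then add the edges.
def shapes_dot(shapes, path)
  lines = ["digraph {"]
  shapes.each { |name, shape| lines << "  #{name} [shape=#{shape}];" }
  chained_edges(*path).each { |edge| lines << "  #{edge}" }
  lines << "}"
  lines.join("\n")
end

puts shapes_dot({ "A" => "triangle", "B" => "circle", "C" => "diamond" }, %w[A B C])
```

To make every node the same shape instead, dot accepts a single default line such as `node [shape=triangle];` at the top of the graph, and it also accepts a chain written directly as `A -> B -> C;`.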
They want to color code things. I work with a bunch of teachers and they color code everything. So this is where you can do coloring. You can give red as a node attribute and all of your nodes will be red. You can give blue as an edge attribute and all of your edges will be blue. Maybe you want to use lots of colors. Maybe you want to have specific things have specific colors. So we can say node attributes, and we can paint a node with green. We can also specify that our node attributes are filled, which means that we're gonna fill in those bubbles and so the colors are even more obvious. So here's an example. It's the same graph we looked at earlier, except this time we've added color to call out some things. So those two hideous bright red boxes at the top basically scream error, there's a bug here. Those are islands. Those are orphans in the graph. They're not connected to anything. And when we're testing and when we're trying to get an idea of what's going on with that giant XML file, finding those places that aren't connected is vitally important, because that means we've put effort into building that stuff and nobody's ever gonna see it because it's not in the graph. I've also colored the leaves of the graph blue so I can see where we need to hook stuff together, the places that are dead ends in the graph, where if a student goes down that path you may have a problem. So as you can tell from the previous slides, I am not a designer. I am fairly design-impaired. I think many people in the audience probably also are. Luckily Graphviz has color schemes. It uses something called the Brewer color schemes. These are predefined color schemes that someone who knows a lot more about design than I do designed. They've set them up to be helpful to specific audiences or to convey specific meanings. So there's color schemes for sequential data.
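The orphan and leaf highlighting described above comes down to two checks per node: does it appear in any edge at all, and does it have any outgoing edge. The sketch below is one plain-Ruby way to do that while emitting dot text. `highlight_dot` and the sample node names are invented for illustration, and lightblue is used instead of a solid blue only so labels stay readable.

```ruby
require "set"

# Hypothetical helper: fill orphans (no edges at all) red and leaves
# (no outgoing edges) light blue, leaving every other node with default fill.
def highlight_dot(nodes, edges)
  has_outgoing = edges.map(&:first).to_set
  connected = edges.flatten.to_set
  lines = ["digraph {", "  node [style=filled];"]
  nodes.each do |name|
    if !connected.include?(name)
      lines << "  #{name} [fillcolor=red];"        # orphan: probably a bug
    elsif !has_outgoing.include?(name)
      lines << "  #{name} [fillcolor=lightblue];"  # leaf: a dead end
    else
      lines << "  #{name};"
    end
  end
  edges.each { |from, to| lines << "  #{from} -> #{to};" }
  lines << "}"
  lines.join("\n")
end

# Sample data: D is not connected to anything, C has no outgoing edges.
puts highlight_dot(%w[A B C D], [%w[A B], %w[B C]])
```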
A good example of sequential data that a lot of people have seen is election night return maps, where they have the map of the US and the counties are red or blue or some shade of purple showing how far Republican or Democratic they went. And that's an example of sequential data. You can see the range because it goes from red to blue. They also have schemes that indicate that all of your data is categorical. There is no implied sequence. They have color schemes that are colorblind friendly, so if you want to make sure that a colorblind person can still understand your color coding, you can pick out color schemes that are colorblind friendly. They have color schemes that are photocopier friendly. We photocopy this stuff a lot and being able to make sure that I've got a photocopier friendly color scheme is fantastic. So if you want to explore the color schemes, I recommend colorbrewer.org. It's got a nice UI for checking specific attributes that are important to you and telling it how many different kinds of data you have. Maybe you've got eight categories. And then it shows you examples of the different color schemes that you have access to. And then for mapping those color schemes into the dot language and into graph, the Graphviz website lists them all out and has got cute little tables where you can see all the different color schemes and how they look in Graphviz. It's the same color schemes; the naming is slightly different though, so having that resource available is really helpful. So here's an example of how you use color schemes. So I'm going to set the node attributes to be filled again and then I'm going to set the color scheme to set1 with four colors. So it's set14, the way it looks, but that actually is set1 with four colors. And then I'm going to say that the attribute of node A is fill color one. Attribute of node B, fill color two. Same really self-explanatory thing. We draw an edge and what you get out is this.
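In dot itself, the color scheme example above corresponds to a `colorscheme` attribute whose value joins the Brewer scheme name and the number of colors (so set1 with four colors is written set14), with fill colors then given as indexes into that scheme. The sketch below prints that dot source from plain Ruby. `scheme_dot` is a hypothetical helper, not the graph gem's own color scheme call.

```ruby
# Hypothetical helper: emit a filled graph that picks fill colors by index
# from a Brewer color scheme such as set1 with four colors ("set14").
def scheme_dot(scheme, color_count, fills, edges)
  lines = ["digraph {"]
  lines << "  node [style=filled, colorscheme=#{scheme}#{color_count}];"
  fills.each { |name, index| lines << "  #{name} [fillcolor=#{index}];" }
  edges.each { |from, to| lines << "  #{from} -> #{to};" }
  lines << "}"
  lines.join("\n")
end

puts scheme_dot("set1", 4, { "A" => 1, "B" => 2 }, [%w[A B]])
```

Because the scheme is set in one place, changing the first argument (for example to "pastel1") recolors the whole graph without touching the per-node lines.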
So the example we just gave is the middle one. You can switch to a pastel color scheme by changing that one color scheme line to get the top example. And the bottom is an example of a sequential color scheme. This sequential color scheme is going from orange to purple. And so the stuff that is the most orange is at one end of the spectrum and the stuff that is the most purple is at the other end. And that's a really good way of indicating data that has a sequential nature. So color schemes are awesome. I use them all the time. They make me look really great in terms of a design sense. When I first showed someone something with a color scheme they're like, that's so pretty. How did you pick those colors? And I was like, I didn't. The program picked them for me. I picked one that looked kind of good, that I knew people would be able to see, and I put it up there. It's awesome. It's really fantastic. So here's another example of using color schemes. So the teal-ish nodes are kindergarten curriculum. This is that same curriculum graph I showed earlier. The ones that look kind of grayish on the screen there, those are first grade curriculum, and that light blue one is second grade curriculum. So what you can see from this, and it became really obvious when we started applying the color schemes to the data, is that we've got a kindergarten unit over here, some kindergarten content that you have to play through part of first grade to get to. And that's a bug. And it wasn't obvious before, but now that we've got the color coding in the data we're showing that off very clearly and anyone can see that. It's really obvious and we can fix those issues. So coloring is great, and clustering is really powerful too. So this is what you use when you know that your data has some internal clusters. You know that there are groups that you want to think about.
You want to force Graphviz to lay it out in a way that shows those groups to you, because otherwise it just lays it out in the way that makes the most sense to it, which usually minimizes the edge lengths and edge crossings. So this slide that I showed earlier is kind of a mess. It's better than the XML file, but I happen to know, because I know the data really well, that there are two distinct subsets of data inside that graph. So with clustering I can turn it from that into this. And from this you can see really clearly that the group on the left has a lot more content than the group on the right. So when we're doing a content review and trying to figure out which of our backlog items we're gonna take on next, the stuff that's gonna have us build more content for that right group all of a sudden has a higher priority, because it's really clear that we need to flesh that out some more. So how do you do clusters? Well, it's easy. So we've got our normal digraph block and inside that we've just got a subgraph block. So I've got a subgraph, we give it a name of cluster one. We put a label on it and then we put an edge from A to B, and because of the implicit nodes that come from edges we get nodes A and B. Do the same thing for cluster two. Name it cluster two, put a label of cluster two, edge from C to D. And then you can put further stuff outside the clusters and everything will be connected up automatically. So a really important note, and this bites me every single time I try to do clustering after I've taken a couple of weeks off: if you want that gorgeous box around it that very clearly defines it, then in that subgraph block line you need to give it a name that starts with the word cluster. If you call it foo7, it'll be totally happy. It'll use it for layout, but it won't put that box around it. And a lot of the power in clustering is visually putting that box with a label around it so that people can see that this is very clearly a subgraph.
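The two-cluster example above can be written out in dot roughly as follows. This sketch generates the dot text from plain Ruby rather than through the graph gem, and `cluster_dot` is a hypothetical helper. It always names each subgraph with the cluster prefix, since that prefix is what makes Graphviz draw the labeled box.

```ruby
# Hypothetical helper: wrap each group of edges in a dot subgraph whose name
# starts with "cluster", then add any edges that live outside the clusters.
def cluster_dot(clusters, outside_edges = [])
  lines = ["digraph {"]
  clusters.each_with_index do |(label, edges), index|
    lines << "  subgraph cluster_#{index + 1} {"
    lines << %(    label="#{label}";)
    edges.each { |from, to| lines << "    #{from} -> #{to};" }
    lines << "  }"
  end
  outside_edges.each { |from, to| lines << "  #{from} -> #{to};" }
  lines << "}"
  lines.join("\n")
end

puts cluster_dot({ "cluster one" => [%w[A B]], "cluster two" => [%w[C D]] }, [%w[B C]])
```

The edge from B to C sits outside both subgraph blocks, which is how content in different clusters gets connected.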
So that's all great, but I actually haven't shown you how I pulled data out of that XML file yet. This is all good stuff. It's like, yeah, cool, I can draw pretty pictures in Ruby, yay. So let's actually talk about pulling data out of that giant XML file. So here's an example XML file. It's much shorter than the one I showed. It's a hokey example, but it'll get the point across. We've got four lessons. The lessons have an ID and a name, and these are the activities I was talking about. And then there's some sequencing for lessons. I'm sure that people in high school or college had to deal with prereqs. Specific classes had prereqs that you had to pass before you could take the next class in the sequence. Same thing here. So lesson ID two has a prereq of lesson ID one. So on. Four prereqs, four lessons. So first we have to extract the data. I like using Nokogiri. So we open up the file in Nokogiri. We pull the lessons out with XPath and put them into a lessons collection. We pull the sequence elements out with XPath and put them into a sequences collection. And then we call a function we're gonna write called drawGraph. So drawGraph. First thing it does is create a digraph block. It iterates over all of the lessons and puts a node on for each lesson. And I'm giving the node an ID of the ID and then using the name as the label. Graph and dot are really smart. They understand that when you're writing code you often wanna have an ID and a name that are separate. You can use the name or you can use the ID depending on what makes more sense in the context. But each of those nodes actually has both of those attributes, which is really fantastic. So then I go through all of my sequences and I draw an edge from the prereq to the lesson ID. That's it. That's all we need to do. You'd put a save line in here if you were doing it for yourself and then you get this graph out.
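The extract-then-draw flow described above can be sketched end to end as follows, with several loud assumptions. The real XML schema is not shown in the transcript, so the element and attribute names here (`lesson`, `sequence`, `id`, `name`, `lesson_id`, `prereq`) and the sample lessons are invented. The talk uses Nokogiri and the graph gem; this sketch swaps in REXML from Ruby's standard distribution and prints dot text directly, so it shows the shape of the technique rather than the speaker's code.

```ruby
require "rexml/document"

# Invented sample data; the real file's schema is not shown in the talk.
SAMPLE_XML = <<~XML
  <curriculum>
    <lesson id="1" name="Counting"/>
    <lesson id="2" name="Addition"/>
    <lesson id="3" name="Subtraction"/>
    <lesson id="4" name="Multiplication"/>
    <sequence lesson_id="2" prereq="1"/>
    <sequence lesson_id="3" prereq="2"/>
    <sequence lesson_id="4" prereq="2"/>
  </curriculum>
XML

# One node per lesson (ID as the node name, name as the label),
# then one edge per sequence, pointing from the prereq to the lesson.
def draw_graph(lessons, sequences)
  lines = ["digraph {"]
  lessons.each do |lesson|
    lines << %(  "#{lesson.attributes["id"]}" [label="#{lesson.attributes["name"]}"];)
  end
  sequences.each do |seq|
    lines << %(  "#{seq.attributes["prereq"]}" -> "#{seq.attributes["lesson_id"]}";)
  end
  lines << "}"
  lines.join("\n")
end

doc = REXML::Document.new(SAMPLE_XML)
lessons = REXML::XPath.match(doc, "//lesson")
sequences = REXML::XPath.match(doc, "//sequence")
puts draw_graph(lessons, sequences)
```

Producing a different view is then mostly a matter of filtering or regrouping the two collections before they are passed to `draw_graph`.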
And the best part is that exact same code would work for a file that was seven or 800 times longer than the one I have. The graph would be more complicated but the basic technique is just as simple. It is absolutely great and easy. If you want to create different graphs, different views, you slightly change how you use those Nokogiri collections that you get out with the elements. It's really, really awesome. So if we put that all together, we get this. This is kind of insane but this is three quarters of the curriculum that we're currently working with. It's actually about six months old. And it's color coded. And so the kind of teal-y stuff is for kindergarten, orange is first grade, blue is second grade, purple is third grade. And it's kind of insane projected like this but that's why there are plotters in the world. You can print this out on the plotter, you can tape it up to the wall, and that's exactly what we did. It's taped up in a common development area. When we need to reference it as a group, we can put it out on the table and we can all pore over it. When we've been talking about making changes or modifications, people throw sticky notes on it and see how moving stuff around would look. I was hanging out in the hallway the other day and someone came out of a meeting real quick, ran to it, traced their finger down a specific line and said, yes, that'll work. Ran back into the meeting. It's fantastic to have something very visual and tangible that people can look at and reference. The other thing that we can tell from something that's color coded like this is that, so if I can get the mouse, oh the mouse will come on there, that's awesome. So this here, you probably can't see the lines but I can tell you that this is kind of a mess, and it's kind of a mess because it actually is kind of a mess. That particular part of the curriculum is really, really complicated. A lot of really strange dependencies and we're working on cleaning it up.
This is some of the first code we wrote. You can see down here, this stuff looks a little more similar. This is some of the later stuff we wrote. We learned as we went along. We were showing this to someone and they said, well, what's going on over here? Well, I actually looked it up. That's an email that we send based on specific skills that have been mastered. And yeah, it's a choke point. There's a whole bunch of lines coming in because once you've mastered all the skills that are represented by all of these lines, you get the email and then a whole bunch of other content opens up. Which means that if this email is somehow not working, everyone's gonna get stuck, and knowing where those choke points are is really, really helpful when we're debugging. So this is what you can build with this. And we have five or six different versions with different color codes depending on their purpose. It's all the same underlying code with just slightly different attributes changed in the files that we set. And so I'd like to do some thank yous for this. So I wanna thank Ryan Davis for writing graph and then rewriting it after I showed him clustering so that it natively supports clustering now, because I was hacking it in beforehand. I wanna thank Aaron Patterson for Nokogiri. It's saved my bacon more times than I can count when we try to deal with these giant XML files we use for storing static content. Ben and Shane for organizing Cascadia Ruby Conf. It has been fantastic to sleep in my own bed at a conference. I'm a really big fan of that. I also thank Dreambox for letting me speak here and letting me share this stuff with you. They tend to be really secretive because this is our secret sauce, but they knew that a lot of folks would get stuff out of this so they were into it. And we're hiring. Please come find me. We're specifically looking for developers, testers and a whole bunch of other entry-level, non-technical positions. So if you're interested, come find me. We can talk.
We're in C and we're in Bellevue. So we're local. And at this point I want to thank you guys for listening and take any questions that you have.