Yeah, so I guess first of all, this is my first regional talk, and this is a subject I'm really excited about. I work at a company called [unclear].

I've been getting interested in data visualization, which is kind of a hot topic in technology right now. There's a lot of data out there: data.gov is a huge repository of data for all things government, and Sunlight Labs also has a lot of tools and data themselves. Infochimps is, I guess, a commercial offering, but they have some open source data too. I gave a precursor to this talk, sort of a testing-things-out talk, with the user group, and they liked the slides I was doing on code repositories. I'm looking at code repository information as data, and I think that's pretty cool.

The means by which I was accessing the code from my repositories, and the part I've been most interested in, was grit. I think most of you remember grit. It was kind of the first library that came out when they were launching GitHub. It was really cool because it was a Ruby library that would interface with your repositories and give you ways to interact with them. The reason I like using it for data visualization is that the data is really complete. One of the things I found with some of the other sources of data was that the data would be really sparse, or it'd be an Excel file or a CSV file. One of the data sets I pulled down from data.gov was the energy data, and the energy data was really interesting.
I thought it'd be cool to work with something topical and that sort of thing, but what I found really quickly was that there were always really weird formats, data wasn't where it was supposed to be, and you had to use six-digit codes to figure out which energy source and which units were being used. So it got really complicated really quickly.

Grit also provides some really nice basic built-in stats for commits and that sort of thing. It's also nice because with grit you have access to anything on GitHub as far as the code repositories go. I also wanted the talk to have a subject that's engaging to the audience, which is pretty obvious.

The first thing I was worried about with the data was having some sort of persistence layer. Grit is this interface for asking about commits and stuff, and I wanted that in some sort of persistence layer. I chose MongoDB for that, and I thought that'd be a nice way to access the data and have it persisted. One thing that happened as I was getting into the MongoDB stuff is that I was screwing up the data a lot. I would run over some existing data, I'd have duplicate data, and that sort of thing.
So I thought that exporting plain text would be really nice. I export out to JSON, and that gives you the import into MongoDB and also sort of a backup for the persisted data in case I mess anything up, like I'd been doing. I also wanted to do some map-reduce, and Mongo has a pretty nice way of doing that. We'll get into that in a minute.

So this is the short script I used for importing data. For this first one I used the Ruby repo, MRI Ruby, and I'm pulling down the repo. The commit count is just the total commits, and I use that to go commit by commit and import the data about each of those commits. At this point, I'm just writing that to a JSON file. Again, this is the thing I was telling you guys about, where I wanted to have a flat file representation that I could easily re-import. I thought this was a really nice way to do it, and I'm finding that I really like persisting with JSON, especially with some of the database options available now, like Couch and all, where it's really easy to go back and forth from JSON.

After I wrote out that file, all we have to do is run mongoimport. You specify the database, the collection, and the file, and it just takes a line-by-line JSON file and imports it into MongoDB. So now we have access to all these commits.

This was kind of the first visualization, starting simple. At this point it's just some data I was getting out of the repo. I show this slide just because it's so simple to do this sort of thing, especially with all the data heads here.
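The export step described above, one JSON document per commit per line, can be sketched roughly as follows. This is only an illustration: the commit records and their field names are made up here, and in the talk the data came from walking the repository with grit rather than from a hard-coded array.

```ruby
require "json"
require "stringio"

# Hypothetical commit records; the field names are illustrative assumptions,
# not the fields the talk's script actually stored.
COMMITS = [
  { "sha" => "a1b2c3", "author" => "alice", "date" => "2010-01-05", "additions" => 120, "deletions" => 30 },
  { "sha" => "d4e5f6", "author" => "bob",   "date" => "2010-01-06", "additions" => 15,  "deletions" => 400 }
].freeze

# Write one JSON document per line, the line-by-line layout that
# mongoimport reads by default.
def export_commits(commits, io)
  commits.each { |commit| io.puts(JSON.generate(commit)) }
end

# Read the flat file back, e.g. to rebuild the data after a bad import.
def load_commits(io)
  io.each_line.map { |line| JSON.parse(line) }
end

# A StringIO stands in for the file; File.open("commits.json", "w") would
# take the same calls.
buffer = StringIO.new
export_commits(COMMITS, buffer)
buffer.rewind
puts load_commits(buffer).length
```

A file written this way could then be loaded with something like `mongoimport --db repos --collection commits --file commits.json`, where the database and collection names are placeholders.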
It seems like we don't spend enough time looking at just the factual information about a repository. Did anyone have any idea how many commits the Ruby repo might have? It's about 22,000. Were people thinking more, or less? Half a million? Yeah.

This is just some of that information, and I'm showing how I got it out of there. I wanted to show these one-liners because I was just messing around in the console and pulling information up, to show how easy it is to do this sort of thing. You can see the queries are really simple, even if you're not much of a Mongo person. And really, Mongo isn't super significant to the application here. It's just the tool I happened to use for this case; you could use some sort of SQL solution. The only thing that's a little bit funny if you aren't used to Mongo is the sorting: you can see date: -1 is descending and date: 1 is ascending.

So, yeah, I think when you're approaching this stuff, just start simple. Start by making little queries so you can get this sort of information out.

Picking your visualization library is the next step I went down. There are tons of JavaScript visualization libraries, Raphaël for one, and a lot of decent stuff. I really like Protovis, by Mike Bostock. It's a library that produces SVG visualizations, and I thought that was really cool. Great syntax.
It's really clear, with the chaining, and there are lots of examples and good documentation. It also comes out of the Stanford Visualization Group, so it's definitely got some chops.

Here's a little example, and they show this on the Protovis website. You can see how appealing this style of syntax is, being able to call out these properties as method calls and chaining everything all the way along. It's really just one long chain of method calls to produce that. I thought that was a really handy, really interesting way to work with visualizations, because a lot of the time we're adding stuff to a sort of imaginary canvas. In this case the panel is our canvas, and we're adding bars to it from a little array of data. At first glance it's pretty straightforward, even if you haven't seen it before. The only thing that's a little tricky, and it turns out to be a really nice trick, is that function(d) d * 80. When you're in the chain, for each property you can use a closure there with the function and get d, and d is the current datum we're dealing with in the array. So it applies times 80 to each of those numbers in the array, and that creates the height. So yeah, this is a very simple example.

This one is a little bit more complicated. This is the Ruby code base, from the epoch on the left towards the current day. It's actually taking the delta for each commit and adding it up, so this is the size of the code over time.
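The "delta for each commit, added up" idea can be sketched in a few lines of plain Ruby. The commit records and the `:additions` / `:deletions` keys below are assumptions for illustration, not the talk's actual data or code.

```ruby
# Hypothetical per-commit stats, oldest first.
commits = [
  { additions: 500, deletions: 0 },
  { additions: 40,  deletions: 10 },
  { additions: 5,   deletions: 120 }
]

# Return the cumulative code size after each commit: every commit
# contributes (additions - deletions) to a running total.
def running_size(commits)
  total = 0
  commits.map { |c| total += c[:additions] - c[:deletions] }
end

puts running_size(commits).inspect
```

Each element of the returned array is the approximate size of the code base right after that commit, which is the series the chart plots from left to right.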
This is how the Ruby library, MRI, has grown over the years. I have dates in there, but they didn't show up very well. The black spots towards the bottom of the ramp are the year separations. The interesting thing is you can see some jumps up and to the right, so you can see times when there's been more aggressive development versus times when it's kind of smooth and not very active. So I'll show how I got here.

What's the first year? Someone wasn't paying attention. The first date is 1998, at least in what I captured.

The way I did this is I wrote a little Sinatra app to access the data from the Mongo import we did, and you can see it's very simple. We're just including Mongo and setting up a route, which is just repo/commits. I thought at some point it'd be cool to have a little web service that you point at your repo and it shows you this sort of information automatically.

Anyway, there's one little thing in there, and it's that $gt hash. That's getting all the commits that are bigger than 300 lines changed. So it's kind of a filter to only include the bigger commits. When that cutoff was smaller, my graph was really huge but showed similar things, so I just set that as a cutoff. If we back up, these are only commits over 300 lines. I realize that dropping data like that isn't really cool, but it got the graph onto a slide, and the filter doesn't change the picture much.
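To make the `$gt` cutoff concrete, here is a tiny in-memory stand-in for that kind of query. It is not the Mongo driver, it only understands exact matches and `$gt`, and the `lines_changed` field name is a hypothetical one chosen for the example.

```ruby
# Check one document against a MongoDB-style query hash.
# Supports only exact matches and the "$gt" (strictly greater than) operator.
def matches?(doc, query)
  query.all? do |field, condition|
    if condition.is_a?(Hash) && condition.key?("$gt")
      !doc[field].nil? && doc[field] > condition["$gt"]
    else
      doc[field] == condition
    end
  end
end

commits = [
  { "sha" => "a1", "lines_changed" => 12 },
  { "sha" => "b2", "lines_changed" => 950 },
  { "sha" => "c3", "lines_changed" => 300 }
]

# Same shape as the filter in the talk: keep commits over 300 lines changed.
big_commit_query = { "lines_changed" => { "$gt" => 300 } }
big_commits = commits.select { |c| matches?(c, big_commit_query) }
puts big_commits.map { |c| c["sha"] }.inspect
```

Because `$gt` is strict, a commit with exactly 300 changed lines is dropped along with the smaller ones.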
So I just went with it. The next thing is creating a hash out of the fields we want access to. This is even more than we need because, as I mentioned, it just uses the additions and deletions from the commits.

You can see here, this is the actual view. At the top you can see that url equals localhost, and that's just interfacing with that little Sinatra app we created to grab the data out. I'm showing this in sections because I didn't want to show all the code at once. Even at this size it's kind of a lot to get down, but I'll show you guys that it's not super difficult to create some sort of interesting view of the change in lines of code.

The sums part is where it gets interesting. You can see we're taking the additions minus the deletions and keeping a running delta of the code that's changed. I'm just pushing that into an array and then returning that big array. Another thing that's really nice is having a small package of data tools that allow you to scale your data. One of the things I found with any sort of visualization is that what you draw comes out too small or too huge. So the scale takes that deltas array and makes sure everything is between zero and two hundred, sort of normalizing the data into that range.

This next part is the tail of it, and all of this is just adding those bars. It's actually a bar chart, but I made the bars really thin so that you can see the progression over time. There's nothing super interesting about it. That was the totality of the code to create something like that.

One of the things I didn't like about this approach is that for each visualization I wanted to do, I needed to create some sort of endpoint so the page could access the data. For a different sort of data I'd have to add another route, or make another app, or something like that. I thought it was getting a little bit arduous.

So I was searching around, and there's actually a library called Rubyvis, which is a port of Protovis to Ruby. It lets us generate SVG server side, with I think even better syntax. If you remember the first example code I showed, this is very similar, but you can see it in a Ruby style. It's very clean and very concise, it leverages the blocks and procs we have available in Ruby, and it produces the same result. I was really excited about this because I wanted to be able to interface directly with the data and produce an SVG graph without having to write a lot of services that take in data and serve it up and that sort of thing.

At the very bottom is kind of the kicker for Rubyvis. It's got a method called to_svg, and to_svg is kind of the terminal call for producing the visualization. You can do whatever manipulation you want up until you call to_svg, and when you call to_svg it outputs SVG syntax, which is just markup. You can pipe that to a file and create nice server-side visualizations.

Another thing I forgot to mention about the Protovis visualization is that every time I was rendering it, it has to do the query and all the Mongo stuff. With the Rubyvis examples, you just do it once, and then you have a flat file of your SVG that you can access anywhere, and you've already paid the rendering cost.

The last 20 commits: this is another little example of using Rubyvis with our same Ruby library data. It's green and red, though it's not so green on the screen there. It's similar to what you see in the console output for your last few commits.
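The "render once, keep a static SVG file" idea can be illustrated without any library at all. The sketch below builds the SVG markup by hand; it does not use Rubyvis or its to_svg method, and the function name, colors, and sizes are all assumptions made for the example.

```ruby
# Render non-negative values as thin vertical bars in a static SVG string.
# Bars are scaled so the largest value fills the full chart height.
def bars_to_svg(values, width: 200, height: 100, bar_width: 2)
  max = values.max.to_f
  bars = values.each_with_index.map do |value, index|
    bar_height = max.zero? ? 0 : (value / max * height).round(2)
    x = index * (bar_width + 1)              # one pixel of spacing between bars
    y = (height - bar_height).round(2)       # SVG's y axis grows downward
    %(<rect x="#{x}" y="#{y}" width="#{bar_width}" height="#{bar_height}" fill="steelblue"/>)
  end

  <<~SVG
    <svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">
    #{bars.join("\n")}
    </svg>
  SVG
end

svg = bars_to_svg([500, 530, 415, 800])
puts svg
# File.write("code_size.svg", svg) would save the chart as a flat file.
```

Once the string is written to disk, serving the chart needs no database query and no per-request rendering, which is the cost saving described above.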
I've got another few slices of code here. We just access our same collection, and basically the first 10 lines or so are sort of boilerplate for our examples. We're just opening the connection to Mongo. Then you can see the additions query about two thirds of the way down, and it's a really simple query. We're just grabbing the additions for each of the last 20 commits. You can see that I have a limit of 20 on there, and that's just for the sake of screen real estate. Same for the deletions too.

That scale is something I mentioned earlier. When you don't want the data to be wildly huge or small, the scale gets it all into a manageable size. The nice thing is you can set your exact range, from zero to whatever y height you want. In this case we were graphing positive on the y for additions and negative along the y for deletions, and we're using similar scales for both. I have to call that scale twice because it builds a scale based on the data in that collection.

And here's the bar style that we're doing. We get a Rubyvis panel, and we use those nice blocks and procs. I don't know if you guys appreciate this as much as I do, but I love that the syntax is so clean and so tight that even seeing this for the first time, very cursorily, it's almost obvious what's going on.
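Why the scale gets built twice is easier to see in a small sketch. The helper below is a plain-Ruby assumption of what a data-driven linear scale does, not the Rubyvis scale API, and the sample numbers are invented.

```ruby
# Build a scale from the data itself: it maps 0..max(data) onto 0..out_max.
# Because the scale depends on the data, each series needs its own.
def linear_scale(data, out_max)
  top = data.max.to_f
  ->(value) { top.zero? ? 0.0 : value / top * out_max }
end

additions = [120, 15, 60]
deletions = [30, 400, 10]

add_scale = linear_scale(additions, 100)   # one scale per collection
del_scale = linear_scale(deletions, 100)

bars_up   = additions.map { |v| add_scale.call(v) }    # drawn above the axis
bars_down = deletions.map { |v| -del_scale.call(v) }   # drawn below the axis

puts bars_up.inspect
puts bars_down.inspect
```

With separate scales, the largest addition and the largest deletion each reach the full bar height, even though their raw magnitudes differ.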
And this is just putting it out to a file. You can see I'm not being very sophisticated about the HTML output, but I just want to dump this out with to_svg, and it generates the markup for us.

This was the last visualization I did, and this is committers by the amount of code they changed. The labels are a little bit janky there, but you can see the big purple ball at the bottom. You can even see tenderlove, the green one over there, and _why down at the bottom. These circles are a really nice way to jam a lot of data in really densely. If we rendered this in a linear fashion it would be really wide and hard to fit on a screen. This way you guys can grasp it very quickly, especially the difference in code between Matz's circle and that one next to _why's. So I thought this was a really fun way to look at the Ruby repository. You can see that there's a lot of variation in sizes and in what people have done there. So I'll just walk you through the code for creating that.

For this I used a little map-reduce. The reason I did that was so that I could sum the data for each author. We want to go through all the commits and add that up, so I thought map-reduce would be a very nice way to do this. You can see the map-reduce for the Mongo driver is a little bit crusty, because you just have to give it a string of JavaScript for the map and the reduce. With that, the map emits this author, and in the reduce you can see I'm summing up all the values for each author.

Then we create an empty array for all the nodes and pop each result into it. So we take this big empty box of nodes and pop in the data for the commits by each author. This layout pack is that ball style, and it just gives us a little bit of the infrastructure for adding those dots. The fill and stroke styles are just built-in color libraries that are provided through Rubyvis.

In here I'm just rejecting one committer called svn. I think it was just an artifact, probably from the SVN import, which is why it existed, so I just got that out of there. Then the last thing is adding the labels for each of the authors. And then again, our little to_svg. This is really, for me, the crux of my love for Rubyvis: just being able to call to_svg and having a static representation.

I haven't looked into a lot of the browser issues with SVG. I know there's a really good site that is specifically just for SVG compatibility. Every week it does a test over all the browsers to see what supports it and what doesn't. If I was doing this for a job or something, I'd be going crazy about compatibility. This is just a fun thing for me, and I was just using whatever worked.

One last thing about this is how I got into doing a visualization for [unclear] Foundation.
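The per-author map-reduce described earlier in the talk can be sketched in plain Ruby. This is not the JavaScript that was handed to the Mongo driver; the record fields, the use of additions plus deletions as "amount of code changed", and the helper names are all assumptions for illustration.

```ruby
# Map step: emit an [author, lines_changed] pair for one commit.
def map_commit(commit)
  [commit[:author], commit[:additions] + commit[:deletions]]
end

# Reduce step: sum every value emitted for a single author.
def reduce_values(values)
  values.sum
end

# Run map, group by key, reduce, then drop unwanted pseudo-authors.
def lines_by_author(commits, skip: [])
  emitted = commits.map { |commit| map_commit(commit) }
  grouped = emitted.group_by(&:first)
  totals  = grouped.transform_values { |pairs| reduce_values(pairs.map(&:last)) }
  totals.reject { |author, _total| skip.include?(author) }
end

commits = [
  { author: "alice", additions: 100, deletions: 20 },
  { author: "bob",   additions: 5,   deletions: 1 },
  { author: "alice", additions: 30,  deletions: 0 },
  { author: "svn",   additions: 999, deletions: 999 }
]

puts lines_by_author(commits, skip: ["svn"]).inspect
```

Each resulting author and total pair would become one node (one circle) in the packed layout, sized by the total.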