Good morning. Welcome to Tokyo. Hopefully the summit's going well for everybody. My name is John Stanford. I am the VP of Software Development at Solinea. This is Alex Jacobs. He's on my team as a front-end developer. We're going to talk a little bit about data visualization and the process that we've been going through as we develop our product called Goldstone. The title is maybe a little deceptive: we're not going to talk about Horizon much at all. It's a tool that you're probably all familiar with, and our goal is to talk a little bit more about other things that go beyond Horizon when we talk about visualizing data coming out of clouds.
So, a little bit about our goals for today. One is just to familiarize ourselves with you and you with us. We're all OpenStack people, we're all working in this space, and I'll give you a little intro about who I am and what our company does and all that stuff. We want to share some of our experience, mostly about the process of how we look at data and develop visualizations. And we want to learn from you, so towards the end of the presentation we'll leave time for questions and answers, and hopefully we can all interact a little bit, share some of your experiences, and see how this all resonates.
So as I said, I'm the VP of development at Solinea, but I'm told we can't work all the time, so I do some other things. I like to hike around in the forest outside of Oakland. I have a beehive out in my yard and pull some honey out of it once in a while, and I have a little rack of equipment that I use for experimentation with stuff that doesn't necessarily fall into my day job.
The Solinea story is that we're a consulting and product company. We have consulting services in the areas of conceiving and working with cloud strategy at the high level in organizations. We architect and design clouds in a distro-agnostic way, so if you're in a state where you don't exactly have a distro choice made, we can help you with those kinds of things.
We do cloud integration and implementation services, as well as helping organizations adopt cloud technology. We also offer training services. Those are in the realms of OpenStack, both core and specialty, and also in Docker and CI/CD-type services as well.
Our area is the project called Goldstone. Goldstone is a tool to help ingest cloud data (logs, events, API usage metrics and that type of thing), analyze that data, and give you some reporting tools that help you operate the clouds that you want. The tools that we use to do all this are the typical litany of things you might see in a data pipeline. We're heavily based on Python and Django. We're pulling data out of OpenStack and Docker. For the front end, which this talk is about, we use a lot of Backbone and D3 and JavaScript and things like that. On the back end, we're using Logstash to pull in data. All that's fairly flexible and fluid at this time; as we grow, we'll change the stack as necessary.
On the topic of data visualization, it's really all about the context of your data. What kind of environment are you working in? Are you in a dev, test, or production environment? What is the timing of the events happening in the cloud? Are you seeing unusual things at a time when you're supposed to be in low demand? Are you seeing things in high demand that don't look right? Then there are the relationships of events to other things around you in the cloud, and the dimensions of the data that you need to get a good sense of what's happening. Part of the challenge of visualization at scale is that there are so many metrics that you need to see to get a real picture of the meaningful elements of your data, and part of the challenge is getting those key dimensions in a way that you can make sense of without too much distraction. Ultimately, the movement of data over time is challenging to keep up with.
I'm going to turn it over to Alex, and he can talk about some of the theory and practice around cloud development visualization.
Thanks, John. I want to say good morning, and I'm happy to see so many people here. I generally like to describe myself as a musician with a day job. The reason I want to mention that is that I made a fairly late-life career change over into tech after being a professional musician in healthcare, playing at the bedside for people. As I was in that process, many people would say to me, "Musicians make good programmers," and I would just shrug my shoulders and say okay. But I'm starting to really understand what that means: whether we're coders or musicians, or I think novelists as well, we are people who are holding the intangible, creating patterns from things that don't actually exist. The only reason I wanted to mention all this is that it's also a fascination of mine to come to a conference like this and learn more about who the people are that are actually making up OpenStack and making up this community, and what life experiences beyond this shape the way we see the world, because ultimately that's what we are creating in terms of software.
So that's just a brief intro, and now I'm going to talk a little bit about theory. These are some of the things that inform where this world of data visualization comes from. Obviously, nothing started in a vacuum; we're looking back to principles and a design grammar that have been around for quite a long time.
On the slide that is showing now, the central image is something that I snapped a photo of. What I'm showing in general on this slide are data visualizations that are not necessarily that compelling, or scalable for large or changing sets of data. What is generally represented in a chart like this is small data sets, unchanging data sets, or displays for very particular presentations. I'll also show some better examples of typical techniques. It was interesting to me that there have been many signs around the Tokyo Metro and train stations where, even though I was having trouble with the Japanese language, I was still able to understand very clearly what was being presented, and that is what we are going for as a goal in data visualization. But the one on this slide is an example of one that I really could not figure out whatsoever. And the image on the left, which I got from a site called viz.wtf, they called a typical government slide. It presents something, there's data behind it, but what is it actually communicating? If you're sitting there scratching your head, so am I. We want to also show examples of things that don't work, so we can see a little mix and match.
Again, I mentioned the grammar or the vocabulary of data visualization. Looking back, a lot of what we are using today comes from people who had mastered these arts in the world of static data, literally statistics. So here are just four, in brief, that I want to mention, who have informed a lot of what we do today. First, The Grammar of Graphics by Leland Wilkinson. When we look at our Microsoft Excel and say make me a
pie chart, make me a bar chart, or whatever it is we are working at, there was a point when someone had to say: what are the elements of these graphs, how do we abstract this, and how can we break this down so that we can actually create these visualizations? Wilkinson was one of those people who contributed heavily to those abstractions, and a visualization package called ggplot (the "gg" is for grammar of graphics) was based on his research and his design philosophies. This became a foundation for scientific journals, newspapers, statistical packages, and other data visualization systems. So when we see things like radar charts, maps, and plots, this is the person who offered up a lot of the theory behind that.
Looking at another master collection: The Visual Display of Quantitative Information by Tufte, which is a name that you'll probably hear a lot if you look into data viz. From this book I also highlighted what is often considered the greatest data visualization of all time. I forget the exact year that this chart is from, but there's a clear representation of six different variables going on here. This is Napoleon's march to and from Moscow, and what is being shown are the size of the army, the location, the direction of movement, the temperature, and the location of major river crossings. To pack that much information into something so concise that you can sit here, study it, and actually figure out the story from it is something that becomes a path forward for all of us trying to figure out how to do this.
Let's see: Exploratory Data Analysis by John Tukey. Again, this was before we had MacBooks crunching numbers at blazing speed. This was someone who was coming up with pencil-and-paper techniques for how to take data sets and come out with summaries or theories about what was being represented by them. So it's, I think, tempting or easy to forget how far we've come in such a short amount of time with the increase of personal computer power. The
Elements of Graphing Data by William Cleveland is another explanation of methodology and resources for people doing scientific research.
So that's where we've come from. Now that we're moving into highly dynamic, large, changing data sets, how do we take those original grammars of visualization and make them something that suits where we are now? This image at the top was taken from perceptualedge.com, which is Stephen Few's blog, and he's well known as someone who solves some of these problems. 3D pie charts are often the classic bad example of taking an old style of representing graphics. So here's a perfect example where he says, okay, here's what we're going to do with that: we're going to put things into a stacked bar chart. Look how easy it is to come up with some sort of quick visual understanding of the data held behind it, whereas here you have to look at the color, you have to look up the key, and it's just a mess.
Over here in the lower left corner, Dona Wong is someone who said, I'm going to take all these different examples and, one by one, give you a detailed description of the right way and the wrong way: don't do your pie chart like this, do it like this. That goes forward through all of the different styles of representing data that you would see in something like the Wall Street Journal. Someone has already come up with this and written down the list of rules: this is what you do, this is what you don't do. So again, this is just furthering the grammar as we move forward in time and represent more and more complicated data.
And then in the lower right corner is one of the most significant contributions in terms of what we're working on together in our company today: Mike Bostock, who was one of the main authors of D3.js. The three D's stand for Data-Driven Documents, and I'll be talking more about that and giving a brief demo of it as well.
My first day in Tokyo I was
out visiting a design-oriented bookstore and ended up taking this photo, because I realized it was a perfect example of data visualization. So here are the pictures that I took, now in our slideshow. Once I give you a brief explanation, I think hopefully you'll agree with me that this is a quick, intuitive, narrative, and scalable way of presenting data.
The artist made up this series of life stripes. They queried, it looks like, maybe a couple hundred different artists and designers, and they said: I want you to record what you do for one 24-hour period, and I'm going to key it by color and lay it out. Now, I'm not saying that I can look at this and understand exactly everything this person did all day long, but what it does give me is immediate access to a story that I can relate to. I can think of the way that I spend my own days and how that has similarities and overlap with everyone else's, and also great differences. All of that is contained here in the larger view, where I'm looking at hundreds of people's lives and instantly making a narrative. I'm going into the data, making a story, and making inferences. And if there's any one particular pattern that interests me, I can zoom into it, or in this case walk over to the piece of art, look at it, and figure out stories about the person that I was interested in. I found that to be extremely effective and inspiring.
So, again, we're working with principles that are as old as Leonardo da Vinci and older, but I just wanted to highlight three of his classic and beautiful images here. What I see in a static two-dimensional drawing is movement, layers, and again narrative. These are just really, really inspiring ways of seeing how data is represented.
Now we're going to go from the very old to the current and the modern. This, again, was on the Tokyo Metro. I was lucky enough to get here a full day and a half
before the conference to have time to tour around, and I saw so many inspiring things around the city, including some that were relevant to this talk. This is something that I had read about from Mike Bostock, who, again, was the inventor of D3: he had talked about train timetable layouts as being a perfect stem-and-leaf chart. Even without reading too far into these charts, and even without feeling overwhelmed by the fact that the chart is mainly in Japanese with some transliterations or translations in some cases, I instantly saw a story happening here. Then, where I wanted to zoom in and find out more, I could just read the translations, and I noticed that this is a breakdown of weekend and weekday train times for the first trains and the last trains. You can see the shape being made here is a tapering up and a tapering down, which, in the back of my mind, tells me: all right, around four things begin, around midnight they end, and if I'm at the station at 1 a.m. I'm in a capsule hotel or taking a very expensive cab. I thought that was a great use of layout, grouping things together with color, and conjunction of related data. Over here we see motion, and these bullet points are just some of the grammar or the vocabulary of design being applied here.
This was, I think, the Yamanote line, which is the ring line, and so much is communicated here: color instantly drawing my eye to the location on the ring that I'm at, and numbers easily decodable for how many minutes it is to all the stops coming up. I just thought these were excellent examples to share. So I'm going to turn it back over to John, who's going to talk a little bit about the practice of these elements in our research.
Thanks, Alex. We started working on Goldstone a while ago, and the goal of Goldstone was to take in data from OpenStack clouds and present it in concise, novel ways that give you an overview of your cloud holistically, and not
necessarily in a project-by-project sense, but across all the projects of OpenStack. It's a work in progress, and we continually look at evolving some of the visualizations of clouds as we get a better sense of what the data is and the scale of the clouds that our customers start to look at.
On the left you can quickly see that we've got a visualization of a region in a cloud with all the key projects (Nova, Neutron, Cinder, Glance, Keystone) and all the resources that are within those projects, and you can quickly navigate to the configuration of all those things and see what's going on. As a developer of the product I look at this and go, oh boy, I get to a table at the end and it's not really scalable. But it resonates with people. People can get a quick sense of the structure of OpenStack by looking at this little tree.
We've got some other things. One is just a log flow that we use a lot, just to see if logs are being generated anomalously fast or slow, and then you can quickly look at logs and search and sort those things. And one that I like particularly, even though my math background tells me that it's not really valuable, is the little API performance graph. We're presenting the average, the min, and the max of API calls for a particular type of call across the cloud, and you can quickly see whether things are performing fairly consistently, whether they're trending toward slower, or whether you've got spikes of problems.
As we go through, we look at this and go: okay, what would I do with it next? Well, one, I might want to condense it and give you some different views of these things, to show how this current data, the 15-minute view or hour view, compares to a monthly view, for instance. Or on a day scale: is this Tuesday just like last Tuesday, or the Tuesday before? Do I see seasonality kinds of patterns in the data? So one day we might have something that looks more like this, where we take advantage of these scaled-up, complex views.
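The min/average/max summary behind an API performance graph like the one described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_api_calls`, the 60-second bucket width, and the sample response times are all hypothetical, and this is not Goldstone's actual code.

```python
from collections import defaultdict
from statistics import mean


def summarize_api_calls(samples, bucket_seconds=60):
    """Group (timestamp, response_time) pairs into fixed-width time buckets
    and return {bucket_start: (min, average, max)} in time order."""
    buckets = defaultdict(list)
    for timestamp, response_time in samples:
        # Round each timestamp down to the start of its bucket.
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(response_time)
    return {
        start: (min(times), mean(times), max(times))
        for start, times in sorted(buckets.items())
    }


# Hypothetical response times (in seconds) for one type of API call.
samples = [(0, 0.20), (15, 0.40), (45, 0.30), (70, 0.25), (110, 1.75)]
summary = summarize_api_calls(samples)
for start, (low, avg, high) in summary.items():
    print(f"t={start:>4}s  min={low:.2f}  avg={avg:.2f}  max={high:.2f}")
```

Widening `bucket_seconds` (to an hour or a day, say) is one simple way to get the condensed views mentioned above, since the same three numbers are then computed over a longer window.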
So we talk about these things, we have the data, and we'll go model one of these. Alex is going to show us an example of some of the stuff that we've done with one of these and how we started to think about it. And this, again, wouldn't be possible very easily without the D3 package. It's really amazing how a community of folks who like data visualization have come together and just put out a great set of potential visualizations for different purposes that we can draw from and ask: does this meet our needs for presenting the data concisely at scale, and what are its limits along the way?
First of all, how many folks operate clouds here? So we've got cloud operators. And then on the other side of the coin, how many folks are software vendors providing software for clouds? Okay. For those of us on the software vendor side, we typically don't have giant clouds in our environment to work with, so one of the things we have to do is think about what real-world data looks like and try to model that data.
Our process kind of looks like this: we have these metrics, and we think we want to tell a story. The first thing I'll try to do is understand the shape of this data and use statistical distributions to represent that shape. For example, a Poisson distribution (by the way, my math background is so old that I do a lot of research to figure out what this all means) is good for modeling the flow of events over time. Say that on average I have a hundred events that occur in an hour. If I want to do a probabilistic projection from that Poisson distribution, I say that I have an average of 100, I get a range that comes out, and I get a unique pattern of data every time I run the simulation. If I have a set of resources that I can look at in my cloud and see patterns of action, initiators and targets of API connections or something like that, and I happen to have 25 users in my cloud but I want to create a visualization for 150 users, I can look at the
distribution of those resources in my real data and then scale that up using one of these statistical models. So defining your target scale is important, and then it's pretty straightforward to use tools like NumPy and pandas and things like this, because they have all those statistics built in. With a little web browsing about the API usage, you can quickly generate a set of data that's fairly realistic for a scaled-up cloud, as a proxy for actually having a cloud of that scale to work with. Then you can generate that data, visualize it, and see how your visualization performs at that scale. If it works, great, you hit it on try number one, fantastic. If it doesn't, then you iterate and figure out how to modify things to get a better visualization. So I'm going to turn it back over to Alex for a little demo of some of this stuff, and then we'll wrap it up and leave you guys some time for questions and answers.
Okay, thanks. All right, I just have to turn on mirroring, and let's make sure... okay. D3.js in five minutes or less; we'll see how long this takes. I just wanted to apologize for speaking above or below anybody here. I'm going to try to make this narrative, so that it doesn't matter whether you're a programmer or not, and just give you, block by block, what's going on in this relatively simple D3 visualization. You've already seen some D3 images in the slides, and we're going to show it to you in action in Goldstone.
So here's just some basic code for telling your internet browser that you are about to give it some instructions. Here are some scripts that say, okay, here are some basic variables for margins, widths, and heights. And this I want to highlight: data. We're just starting with data. It's just a container, nothing in it at all, and then all of a sudden I'm putting one thing in there, and that is what you're seeing here on the screen over on the right. It's that one thing that I put in there, right from the start.
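Stepping back to the data-modeling process John described before the demo, here is a minimal sketch of generating synthetic event counts from a Poisson distribution and scaling a small user population up to a target size. It uses only the Python standard library so that it runs anywhere; NumPy's `numpy.random.Generator.poisson` performs the same kind of draw in a single call. The helper names, the mean of 100 events per hour, and the observed per-user counts are hypothetical, and the scale-up step uses plain resampling with replacement as a stand-in for fitting a statistical model to the real data.

```python
import math
import random


def poisson_sample(mean_events, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean_events)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_hourly_events(mean_per_hour, hours, seed=None):
    """Return one simulated event count per hour."""
    rng = random.Random(seed)
    return [poisson_sample(mean_per_hour, rng) for _ in range(hours)]


def scale_up_users(observed_calls_per_user, target_users, seed=None):
    """Build a larger synthetic population by resampling the observed
    per-user call counts with replacement (an assumed, simple approach)."""
    rng = random.Random(seed)
    return rng.choices(observed_calls_per_user, k=target_users)


# A different seed (or no seed) gives a different pattern on each run.
hourly = simulate_hourly_events(mean_per_hour=100, hours=24, seed=42)
print(hourly)

# Hypothetical API call counts seen for a handful of real users.
observed = [3, 5, 8, 13, 40]
synthetic = scale_up_users(observed, target_users=150, seed=42)
print(len(synthetic), "synthetic users")
```

Feeding output like this into a candidate chart is a cheap way to see whether the layout still reads well at the target scale before a cloud of that size is available.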
And now, when it comes to D3, I'm setting up some graphics that work right out of the box with the browser. This is called SVG; I know many people here are already familiar with it. I'm putting a group on that SVG, and then I'm using D3, all of a sudden for the first time, saying: set me up some scales here, so that if I give you one number in, you give me a number out. I want the output to be within this range, and the numbers I'm going to give you are going to be within this range. So for example, if we were going to convert yen to dollars, roughly, we're going to say if I give you a thousand, give me back one. Okay, I'm sorry, ten. You want to do business with me, right? So again, for the X and Y scales we're saying give me back things within a certain range. Here's a sample: if I set up a linear D3 scale with a range of 0 to 100 and a domain of 0 to 1000 and then run values through it, 1000 gives me 100, and 250 turns into 25. Feel free to just follow along and keep these rough ideas in your mind.
Now I am putting in some axes, the X and Y axes, and then here is where I put the first rectangle on the screen. Then I add a listener, because I put a button here, and every time I click that I want to put a new random data point in that container of data I mentioned. Then I update that part of the scale. Remember when I told you to give me back numbers between here and here, and I said, well, here are the numbers that are going in? That's changed now, because I've added some new data. Then I want you to add another bar, and then there are some D3-specific things that are going to happen.
Anyway, to see that in action: down here in the console, I'm asking the browser to let me know what data is being inserted every time I hit this button. So now I've got 2 and 66, and lo and behold, where we originally saw only the 2 filling up the entire box, D3 has automatically computed for me: well, now I'd better be able to represent the entire data set, so I'd better scale things. Press it again, and now
I've got (let's see, I'm having a little trouble with the mirroring, apologies) a 15. That was less than my maximum, so the scale doesn't need to change. 67: oh, that's a little bit higher, so now the 66 has moved down slightly, and so on and so forth. It's updating the scales to fit the data that's going in. Down here at the bottom it's also showing how many elements are in there. Each time I click, we see a new element being entered, and I can keep going if I want.
Now, that would be what we consider a very unscalable system of representing data, so I also included something in here to prune anything out of the data set if it gets larger than 6. Let's just do that again quickly. Now we only have the last 6 data points that were generated, and if you look to the left and watch the Y axis, you'll see it adjusting itself to accommodate the largest data point.
Now, this is very jumpy. This is basically the next step up from a static display of data, and there's nothing all that exciting about it. How do we use all the tools at our disposal to actually tell a story about how this data is changing, rather than just see one thing jumping to another? Well, that's where D3 comes in. If I uncomment what's known as a transition, all of a sudden we put a time element into that change. It's defaulting to, I think, 250 milliseconds for each of those changes. And something that was previously hidden is that when that sixth element is pruned off as the new one comes in, we see there's something that's turning red and flipping off the screen. So what if I want to make just that part take a little longer? Well, I go into what's called the exit selection within D3, and I can very specifically change things just in there. These are milliseconds here. What if I tell it, well, take a second to get rid of that last element? Now it's sort of creeping off the screen in a second. Anyway, that's just to show you the flexibility you have with data that's coming in, data that's already on
the screen, and data that's leaving the screen.
What I want to show you now is a quick example of that working in Goldstone, when we are bringing in cloud data. Right now I'm showing event counts for the last 15 minutes, with API calls across our cloud. If I change the lookback to an hour, we see things smoothly change, and if I look back six hours, we see that I've only been running this cloud on my laptop since 10 this morning, so that's where the data ends. We can zoom in by changing it to an hour, and even further with the last 15 minutes, and over here the x and y scales have been updating themselves accordingly. So that's just a little bit of an explanation behind what is not actually magic, but is a system of declarative or functional programming that Mike Bostock created to abstract away everything that's going on underneath the hood for us.
The next piece of our demo is what's called a chord diagram. As John was saying about creating appropriate data sets in order to model real-life cloud applications, he used some of the Python data generation tools that he mentioned, as well as some of the distributions, to simulate this: what if we had a number of users in a cloud and we needed to figure out how many of them were potential hackers who had gotten into our system and were making requests that were mainly failing? So on the front end I coded up something that would say: not only do I want you to draw me all the connections between users and the endpoints, but if there was a failure rate of greater than 80%, make them stand out, all in red. I can't read their names, because in order to fit all of these on here I had to make the text too small, but these two individuals here (oh, sorry, it's Lorraine and Jules) have been busted by the system. This is really just needle-in-a-haystack kind of data until you start to make pictures of it, and then we realize that we can create narratives and stories and look deeper into data that is otherwise
literally incomprehensible.
Another example I wanted to show you briefly is not something that I created. For this one I also wanted to give credit to Mike Bostock, the creator of D3; I have merely adapted his visualizations to our data. D3 is one of the most amazingly documented systems, with tons of examples, so there are lots of great starting places which you can then adapt to your own data sets. With this co-occurrence matrix, I just want to show you a transition within D3 in play. The problem with it is that, no matter how long I've been looking at it, I still haven't been able to figure out what it's actually saying. What it represents is a co-occurrence of the characters in the novel Les Misérables. So this is where we have to be careful about the difference between bells and whistles and actually clear narratives. If you feel so moved to figure this out, you can go to d3js.org and look at the examples page, and you'll see this amongst many other examples. Then when you've got it sorted, just let us know: e-mail us, reach out to us on Twitter, and let us know how you interpreted it.
So I'm going to turn it back over to John, after I turn off the mirroring. Excuse me. Okay, and now I just have to start the slide show again. There you go.
Time check. Okay, so just to wrap things up here and bring together some of the recurring themes or thoughts throughout. One is that in order to do good data visualization, you need to have a good understanding of the data you're visualizing, as well as the audience that you want to present that data to. The Les Mis thing is a good example: hey, we've got an awesome data set, but clearly I'm not a literature expert, so I can't understand what it means just by looking at it without probably doing some deeper research into the data on my own. That is maybe not a great place to start if you're trying to create operational tools for folks who aren't software developers or statisticians or mathematicians. So we spend a good deal of
time trying to grind through that and make visualizations that actually solve operational problems.
Another good point is that there is a wealth of tools out there, especially in the D3 space, that have already been created, and that we can accelerate development by using those things and not reinventing the wheel. The framework is there; we just need to take our data, apply it, figure out the story to tell, and then create that story. And it's quick enough, with data generation or even real cloud-generated data, to do that in an agile way. We should also keep a healthy curiosity about the environments that our products go into and that the data comes out of, and be patient. It's not quite needle-in-a-haystack hunting, but it takes a while to get a good sense of these things so that you can create real, meaningful, and useful visualizations that someone can look at and go: this is important; not only do I get it, it's useful to me right now. That's what we're striving for. It's very evolutionary, in that we'll continue to work on evolving these things as we learn more, including about how customers use data.
So with that, I'll just say thanks on behalf of myself and Alex for your time. There are a few minutes for questions, so if anybody has some... here's a question over here. And then I'll put a shameless plug for Solinea hiring people up here. We're looking for engineers, we're looking for consultants, we're looking for all kinds of folks that are in the OpenStack and open infrastructure space.
A couple of questions. D3: what's the licensing there? Is it an Apache license?
It's either Apache or MIT, I'd have to look, but it's open, definitely, and it was friendly to Apache.
And I heard you mention SVG. Is it 100% SVG based, or does it also use some HTML5-specific graphics?
I believe it is agnostic, and you just have to figure out how to do the data joins within D3. I've only been working with it in SVG, but I understand people are using it for
canvas as well.
So what D3 does is take that data and bind it to your DOM, whatever your DOM is. If you're using SVG elements in your DOM, you can bind data to SVG elements. If you're using canvas elements, you can do that. If you're using HTML5 elements, you could bind that data into those elements as well.
Thank you. Do you plan eventually to have your work be used for data, for example, do you show alerts, which they do, or Nova performance? Is your product totally separate from the Horizon UI or from the Ceilometer UI, or do you plan to merge at one point?
We have a standalone user interface that leverages some of that data, as well as data that we get from the notification bus and from other places. We've done some experiments with SysPanel and making panels in Horizon. We haven't taken it very far yet, but I wouldn't rule it out in the future. Other questions?
In your GitHub project, Goldstone, where is most of the data sourced from? Is it already from Ceilometer? Can you shed some light on that? And does it introduce any kind of overhead into the default configuration or operation of OpenStack?
Yeah, so we do a few things. We adjust settings for the notification buses in the various projects so that they're actually turned on, so event data gets pushed to the notification bus and we get that. We also use Ceilometer data coming directly out of Ceilometer, and we use syslog, pointing the syslog data at our server for ingestion, basically at our Logstash interface to ingest that data. Of those things, I suppose the syslog one might have some impact, because you're setting up a remote syslog setting rather than dumping to log files on the machine. To be quite candid, I haven't measured the impact of that stuff yet, so I can't tell you how significant it is, but it's probably not that significant. Other questions?
Okay, I think we're about one minute over our time, so thank you again for your time. I hope you enjoy the rest of the video.
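For readers following the demo without a browser, the two mechanics it leaned on (a linear scale that maps an input domain onto an output range, and pruning the data set to its last six points before rescaling) can be sketched in plain Python. This mirrors the idea only: `linear_scale` and `push_and_prune` are hypothetical helper names rather than D3's API, and the 200-pixel chart height is an assumption.

```python
def linear_scale(domain, output_range):
    """Return a function that maps a value in `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)

    return scale


def push_and_prune(data, new_value, max_points=6):
    """Append a value, then keep only the most recent `max_points` entries
    (`max_points` is assumed to be a positive integer)."""
    data.append(new_value)
    del data[:-max_points]
    return data


# Domain 0..1000 mapped onto range 0..100, as in the scale example.
to_range = linear_scale(domain=(0, 1000), output_range=(0, 100))
print(to_range(1000))  # 100.0
print(to_range(250))   # 25.0

# Start with one data point, add more, and rescale to the current maximum.
data = [2]
for value in [66, 15, 67, 30, 41, 9]:
    push_and_prune(data, value)
    y = linear_scale(domain=(0, max(data)), output_range=(0, 200))
    bar_heights = [round(y(v), 1) for v in data]

print(data)  # [66, 15, 67, 30, 41, 9]
print(bar_heights)
```

Rebuilding the scale from `max(data)` after every change is what makes existing bars shrink when a larger value arrives, which is the behavior the 66 showed when 67 was added.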