Welcome back to theCUBE, it's Jeff Frick here at the O'Reilly Fluent Conference at the Hilton Hotel in San Francisco. You're in theCUBE, where we go out to the events, we extract the signal from the noise, we try to find the smartest people that we can in the room and ask them the questions that you would ask if you were here. We invite you to join the conversation. The hashtag for the event is FluentConf, hashtag FluentConf. So without further ado, I'd like to welcome Irene Ros to theCUBE. Welcome, Irene. Thank you, it's great to be here. So Irene had a portion of the keynote, there were a lot of keynotes this morning. There were, indeed. And you're from Bocoup, so before we jump in, why don't you tell us a little bit about what you do at Bocoup? Sure, so Bocoup is an open web technology company, and that means we do three things. We do consulting and training and community development, all really in the name of making open source a viable alternative to closed source software. So we believe that if open source becomes stable and widespread, and if people use it and start incorporating it into their process and contributing back, that's really how you give it a viable future. So we've been working really hard with our clients, but also with a lot of open source contributions from the Bocoup staff, to just kind of really promote that mission. Okay, so your keynote was titled the ABCs of visualization, but I think you changed it during the keynote to make better charts, right? Architect better charts. Architecting better charts. That's right, which is still ABC, so I feel like it kind of fit in. Yeah, it was a bit of a decoy. Well, you know, because we were announcing a new library and I didn't want to put that in the title, obviously, and you know, they were still very much related, because I think part of building the ABCs of data visualization is making charts and thinking about how to do that, right? I think that's a really big question. Right.
So let's talk a little bit about kind of architecturally, in terms of visualizing data, what are the objectives? Because on one hand you think, wow, it's super obvious, a picture is worth a thousand words and I've got this massive sea of stuff over here, and then of course everybody's talking about big data and the internet of, you know, the industrial internet and the massive information coming off a jet engine flying to Tokyo, and we've all heard the things, right? And I say, well, that's interesting, but there's so much data in there. How do you put that into a visual perspective? So kind of philosophically, when you talk to clients or you talk to people that want to get a better handle on their data through visualization, what are some of the key frameworks and ideas that help drive that process? Sure, there's a bunch of questions in there. You can pick a couple of your favorites. Sure, so in terms of working with clients when we do data visualization work, what we found is that we very much follow a certain kind of pipeline. Okay. And, you know, often they'll come to us with either data or questions. Sometimes there isn't even data yet, and then we spend, you know, a decent amount of time going through that data and researching what that data can really tell us. So sometimes there's a hypothesis that we try to confirm or not, and, you know, sometimes we can't, and sometimes we have to go harvest more data, or we have to transform the data or analyze it or reduce it down. I mean, you know, a big question of big data is, how do you visualize it? Well, you never really visualize big data. You first reduce it down to a meaningful subset and then you visualize that, and at that point it's already, you know, of the appropriate size to visualize. So there's definitely kind of a cycle of, you know, cleaning up and transforming and analyzing and then visualizing. Okay.
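The "reduce it down to a meaningful subset, then visualize" step described above can be sketched roughly as follows. This is only an illustration in Python, not anything from the interview: the function name `reduce_by_bucket`, the per-second sensor readings, and the choice of an hourly mean as the reduction are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def reduce_by_bucket(readings, bucket_seconds=3600):
    """Collapse (timestamp_seconds, value) pairs into one mean per time bucket.

    Readings whose value is None are dropped first (the "cleaning up" step).
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        if value is None:
            continue
        # Round the timestamp down to the start of its bucket.
        bucket_start = timestamp // bucket_seconds * bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical input: one reading per second for a full day.
raw = [(t, float(t % 60)) for t in range(24 * 3600)]
hourly = reduce_by_bucket(raw)
print(len(raw), "readings reduced to", len(hourly), "points")
```

The chart would then be drawn from `hourly` rather than from `raw`. Which reduction is meaningful (mean, maximum, a sample, a filter) depends on the question being asked of the data.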
You know, we have on staff an economist, a statistician, who helps us a lot with that process, Adam Hyland, and it's been really great to have him on board. And so that's a big part of it, and then when we do actually get to the point of needing to visualize it for the web, that's when a lot of these things come in. Because, you know, you think about tools like jQuery, there's a million ways to accomplish the same thing, and some of those are going to be more organized and cleaner than others. And so what ends up happening is that, you know, we do see a lot of people making more and more data visualization, but we don't talk as much quite yet about the patterns of how it is that we do that right. I think we've done a really good job of that in just general JavaScript land, with things like Backbone and Ember.js and Knockout and Angular. We're really, really pushing the envelope forward and we see frameworks, and I think we're starting to do that also in data visualization, which is really what I talked about today, which is the question of what does it mean to architect charts, and from our perspective it's how do you make charts that are reusable? Right, right. Which really, you know, covers four concepts. One, repeatability. Repeatability, okay. Tough word there. Get a sip of water. Being able to instantiate multiple versions of your chart. Then configurability, being able to give your users an API so that they have some control over how your chart looks. Extensibility, being able to build on top of the charts you've already built without really just modifying that code. And then composability, which is really kind of the pinnacle of data vis, which is when you start composing charts from other smaller charts and kind of making, you know, new things from the old. So that's really what d3.chart is kind of about.
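The four concepts named above (repeatability, configurability, extensibility, composability) can be illustrated with a small sketch. To be clear, this is not the d3.chart API and not JavaScript: it is a hypothetical Python stand-in that "renders" bars as text, and every class and method name in it is an assumption made for the example.

```python
class BarChart:
    """A toy chart that renders numbers as rows of text bars."""

    def __init__(self):
        self._width = 20
        self._symbol = "#"

    # Configurability: a small chainable API instead of editing the chart code.
    def width(self, value):
        self._width = value
        return self

    def symbol(self, value):
        self._symbol = value
        return self

    def render(self, data):
        if not data:
            return []
        top = max(data) or 1  # avoid dividing by zero when all values are 0
        return [self._symbol * round(v / top * self._width) for v in data]


class LabeledBarChart(BarChart):
    """Extensibility: builds on BarChart without modifying it."""

    def render(self, data):
        return [f"{bar} {v}" for bar, v in zip(super().render(data), data)]


class Dashboard:
    """Composability: a larger chart assembled from smaller named charts."""

    def __init__(self, **charts):
        self._charts = charts

    def render(self, datasets):
        lines = []
        for name, chart in self._charts.items():
            lines.append(name)
            lines.extend(chart.render(datasets[name]))
        return lines


# Repeatability: two independent instances with different configuration.
small = BarChart().width(4)
large = LabeledBarChart().width(8).symbol("*")
board = Dashboard(visits=small, sales=large)
print("\n".join(board.render({"visits": [1, 2, 4], "sales": [1, 2]})))
```

Each instance keeps its own settings, the subclass adds labels without touching the base class, and the dashboard only relies on each part having a `render` method.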
So talk a little bit about kind of hypothesis-driven visualization versus, you know, the proverbial, again, you hear these great examples. I'm gonna throw all this data in and I'm gonna have the visualization and I'm gonna find the needle in the haystack, right? I'm gonna find some correlation, some relationship that I had no idea about, and it's gonna solve all the world's problems and get me promoted. So talk a little bit about when you're talking to clients, again, using visualization as a tool. You know, what is kind of the hypothesis versus the needle in the haystack? I'm just gonna throw it in and scramble it up. Yeah, I think everyone wants that beautiful moment of running an average and suddenly something standing out. I mean, that's obviously the most basic metric you can compute and is really never sufficient, but you know, rarely does that happen. A lot of the time, you know, the patterns you think are there aren't, because they're driven by our everyday interaction and so on, and you know, it's really easy to mistake correlation for causation. Yes, yes. We see that in the media every day, right? Exactly, that's a really common pattern. And so, you know, a lot of the time they'll come in and they'll say, oh, we want to visualize this particular type of data and we want to use this particular type of chart, and that very rarely works, so we have to work with them to really explore, well, what are we really saying? Because, you know, I love data visualization, but I also believe that everything is data visualization. Your Facebook feed is a data visualization.
It's just, you know, every single data element is a status and they happen to be in a list. So the range of what you can build and still call data visualization is really vast, and so the most important part is to do right by the data and the narrative that we're trying to tell. So, you know, there's this whole big process that has to do with just visualizing the data in preparation for visualizing the data, which is really, you know, it's funny, you can make 100 scatter plots and line charts and look at distributions and things, and you're really just doing that to get kind of a mental picture of what the data looks like. And you know, when that's all said and done you have a better idea of what you may want to highlight or not, but you really can't get to that point until you really understand what your data looks like. Right, well it's interesting, you speak of the narrative, the narrative that's in the data, both that which you want to get out of it as well as that which is in there that you're trying to extract. Yeah, yeah, oh it's a really... It's a story, right? It's a story, and I used to include this slide in various talks that would say that data visualization is not objective, because we think that data and visualizing it will result in this objective picture, but the reality is every choice you make is somehow going to impact the viewer. If you think about just taking a number and coloring it, if you color it blue it may be interpreted as a temperature that's cold, if you color it red it may be interpreted as something hot, right? And that's a really tiny, tiny thing. And so it's really easy sometimes, if you just look at different visualizations of the same data, to see how different the narrative is because of the various color choices and the really, really small things, and so it's really important to balance those things. It's really important for us to stay very true to the data.
We definitely would never create any visualization that lies, quote unquote. Right, we used to call that in college how to lie with statistics. I mean, the classic case is your Y axis on a simple two-axis chart. You have a big Y axis, the deltas look very, very small. You have a very compressed Y axis and, oh my God, look at the changes. Definitely, I mean we've seen pie charts where they've shown 130%, but I've never... I'd love to have 102% of a pie. So talk a little bit about, again, as you're helping people figure out the method to most effectively get the information out of their data via visualization. What are some of the tips and tricks, kind of high level things that you lay out for them, or do's and don'ts that really should go into that first cut of organization and really planning to get to the end state? That's a really good question. I mean, I think a lot of it has to do with your data and the quality of your data, and are you capturing everything you should be capturing. You know, when we work with folks helping them make APIs, we try to help them do that in a way that will allow them to then be able to analyze it. So things like time series are a really important form of data, because they really let you look at patterns over time. So that's a pretty common dimension that we like to capture, and you know, I also think that it's important to get to a point where you're comfortable just looking through your data. One thing that I think became a pattern is people really jumping in on real time analytics, and that's a really dangerous kind of territory to enter, because it may cause us to impulsively react when we don't really need to. It's really important to understand the overall patterns of our systems and of our data and then look for potential anomalies, and so that's been one thing that we've tried to also instill when real time data is involved.
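One way to picture "understand the overall patterns first, then look for potential anomalies" is to compare each new reading against a baseline built from history, and only react when it falls well outside the usual spread. The following is a minimal Python sketch under assumptions of my own: the function name `is_anomaly`, the sample numbers, and the three-standard-deviation threshold are illustrative and do not come from the interview.

```python
from statistics import mean, stdev


def is_anomaly(history, latest, threshold=3.0):
    """Return True if `latest` is far outside the pattern seen in `history`.

    "Far" means more than `threshold` sample standard deviations from the
    historical mean. With fewer than two historical points there is no
    baseline yet, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is unusual.
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


# Hypothetical hourly request counts observed so far.
history = [100, 102, 98, 101, 99, 100, 103, 97]

print(is_anomaly(history, 104))  # a small wobble inside the normal range
print(is_anomaly(history, 120))  # a jump well outside the normal range
```

A fixed threshold on a plain mean is a simplification; data with daily or weekly cycles would need a baseline that accounts for them.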
But you know, I want to say that there are consistent patterns just in that process, but I don't really think so. I think it's the variety of data, and I think that's in part why data science and data visualization are becoming such an important field and area of work, because there are just so many different applications that are coming into play, which is why, you know, we're starting to come up with languages and separate them by size and purpose and so on, but I think there's a lot of overlap between them. But that's an interesting take that you bring up with the real time, especially as, again, we're talking about the internet of things and the industrial internet and there's going to be so much of this stuff, and to not necessarily react in real time to real time data unless it's appropriate, because it may be masking something that's part of a much bigger trend, or maybe an appropriate trend or the right trend. Kind of like the "what are you managing to" question, right? Yeah, absolutely, and in some situations it's incredibly important, right? I mean, if you're in a medical field or you're flying airplanes, it's very important to react to real time data, so I definitely don't want to take away from that, but I think we've started bringing some of those practices into places where it's really less urgent and, you know, there's a danger of kind of overreacting. But context is always key. Context is always key, absolutely. Okay, so we're about out of time. I want to ask you, there's got to be one great story that you have of working with a client and going through this process where maybe you found the golden nugget, or there was the aha moment, or everybody was caught by surprise. Do you have any fun stories to share with us?
I think probably one of my favorite pieces that we did last year was actually while we worked with the Guardian Interactive team on the Miso Project, and one of the interactives we built was looking at donation data for the Somalia famine. And it was really interesting, because we also started analyzing the media conversation around it and really looking at whether there was a correlation between donation behavior and media publicity and so on, and there really wasn't that tight an integration between the two, which is really interesting, you know, considering how important it is for NGOs to reach out. And granted, we did not have data directly from NGOs. This was really much more high level and so on, but it was just really interesting to look at these two completely different pieces of information and try to figure out how to tell that story. Right, great. Well, thanks for coming on theCUBE. Thank you, yeah, I appreciate that. So Irene Ros from Bocoup. She's a data visualization specialist. I don't know if your keynote is online, hopefully they'll put the keynote up somewhere. But again, thanks for coming on theCUBE. We are here at the O'Reilly Fluent Conference at the Hilton Hotel. We'll be here all day today, all day tomorrow. Again, getting to the smartest people we can find in the room, asking them the questions that you wish you could ask them if you were here, and giving that information to you. So again, Jeff Frick, we've got our next guest in just a few minutes on theCUBE. Be right back.