Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity and Cisco. Now your hosts, John Furrier and Jeff Frick.

Okay, welcome back everyone. We are here live in Silicon Valley for Hadoop Summit 2015. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier. My co-host for this segment is Jeff Frick, and our next guests are Ajay Anand, VP of Products at Kyvos Insights, and Praveen Kankariya, founder and CEO of Impetus, who was here earlier. Welcome back to theCUBE, and welcome to theCUBE.

Thank you.

Okay, so let's talk about the technology. We had a great conversation around the services and the big Fortune 50 companies, these large deployments. Services-led, that's great, but now they have to execute on the technology. Ajay, tell us what's going on with your technology and how it relates to the challenges the large enterprises have, because you guys have a unique, scalable product. Tell us about that.

Yeah, so we're here at the Hadoop Summit, and what Hadoop enables enterprises to do is collect all sorts of data at tremendous volume and bring it all together into the Hadoop, quote unquote, data lake. But the challenge then is, how do we make that accessible and interactive for the business user? How do they get their hands on their data and get insights from it? That's what we enable them to do. We enable a business user to interact directly with the data on Hadoop, visualize it, slice and dice it with interactive, instant response, even though the data can be at massive scale with massive granularity.

So I want you to explain something for me, because I'm getting the big picture and I want to do a drill-down. Praveen and I were talking in a previous segment about how customers just want solutions and outcomes, right? They don't talk in the speeds-and-feeds language of the industry. But when they come here, they say: okay, the old way, I've seen this movie before. I have my big data warehouses, and I know the costs I've inherited, the cost of loading and everything else. I don't want that anymore. I want the new way. I want to be open, I want the advantages of open source, and I want to be able to work with data, build apps, deliver some value, and then incrementally transform my enterprise. But I don't want a situation where I'm locked into a platform, or a tool for that matter. So what do they do?

Yeah, so what a business user is looking for is self-service interactive analytics on data, regardless of its size and scale, and without having to depend on somebody else to do the programming and create the reports and so forth. They're used to doing this in an enterprise data warehouse, using OLAP tools for that interactive analytics. And OLAP is entrenched in pretty much every enterprise out there and has been for years, right? Now, as the new age of big data comes in and you've got data with tremendous volume and variety, trying to fit that into an existing enterprise data warehouse doesn't work, because enterprise data warehouses are very structured and inflexible. Making a change to an enterprise data warehouse pretty much takes an act of Congress, with multiple months of effort.
Hadoop is a much more cost-effective, flexible, and forgiving environment to bring all of that data into. But they still want to be able to deal with things the way they're comfortable with, the OLAP way of doing analysis. And that's what we do: we marry the benefits of OLAP with the scalability and flexibility of Hadoop.

So I've got to ask you a question, because Amazon has proven it with Amazon Web Services, and we do all their shows with theCUBE. Some interesting use cases have come out of that: some of these guys can literally spin up supercomputers to do massive analysis, whether it's regression analysis, Black-Scholes, or some modeling they could never do before, and that highlights the benefits of spinning up massive compute. Okay, good job. Now what we're seeing is, I want to do essentially the equivalent: spin up a massive data warehouse, but I'm not sure I'm ready to double down. I need to see some data, but I need the benefits of scale instantly, in a short period of time, so I can see and analyze whether I have the right connections and formulas and data, look at the results, and then iterate again. So this is agile meets data warehousing. Is that what you guys do? Can you deliver that Holy Grail dream?

Yeah, absolutely. As you're bringing the data in, you don't have to pre-commit to a structure or a schema. You can interact with the data, build the OLAP cubes, get the results right away, then explore the results, look at different dimensions and aspects of the data, and ask: is this the right cube? Do I want to change it? Do I want to create a new one? You have the flexibility to create as many cubes as you want and expose them to different kinds of business users or business use cases. So it's a much more flexible environment than existed with the traditional enterprise data warehouse.

I love how OLAP cubes and theCUBE are all in the same sentence. It's good for our tweeting mojo. Get theCUBE out there. No, but the OLAP cube really signifies performance. Talk about that use case. What is the OLAP cube, for the folks out there, so they understand the magnitude of this scale?

Yeah, you know, if you think about Hadoop, there are two ways of doing things on it. One is running machine learning algorithms and so forth, where you submit those algorithms, come back after a while, and get some kind of result. The other is the human aspect, where the business user wants to interact with the data and follow a train of thought as he's exploring it, without submitting a query and then going off to get a cup of coffee, right? So what we do is give you that instant response time. Even though you're dealing with data at massive scale, this could be billions of records. And that's literally true: we've got customers with hundreds of billions of records, with dimensions whose cardinality is in the 100-million-plus, 200-million-plus range. And you still get the response time within a couple of seconds. That's what we deliver, and nobody else can deliver on that.
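To make that concrete, here is a minimal sketch in Python with pandas. This is not Kyvos's engine or API, just the OLAP-cube concept in miniature: pay the aggregation cost once in a batch cube build, and interactive slice-and-dice becomes a cheap lookup instead of a scan over billions of raw records.

```python
# Toy illustration of OLAP-on-Hadoop (not Kyvos's actual implementation):
# pre-aggregate raw facts into a cube keyed by dimension values, so that
# interactive queries hit the small cube instead of the huge fact table.
import pandas as pd

# Raw fact table; in production this would be billions of rows in HDFS.
facts = pd.DataFrame({
    "state":   ["CA", "CA", "TX", "TX", "CA"],
    "product": ["soap", "shampoo", "soap", "soap", "soap"],
    "revenue": [10.0, 15.0, 8.0, 12.0, 9.0],
})

# "Cube build": one expensive batch pass, done up front. On Hadoop, this is
# the distributed job that runs across the cheap cluster.
cube = facts.groupby(["state", "product"])["revenue"].agg(["sum", "count"])

# Interactive slice-and-dice: lookups and small rollups on the cube, which is
# why response time stays in seconds regardless of raw data volume.
print(cube.loc[("CA", "soap")])            # slice: CA soap revenue and count
print(cube.groupby(level="state").sum())   # roll up product -> state level
```

The point of the sketch is only that query-time work is proportional to the size of the cube, not the raw data.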
So talk about the business impact, though. That analyst can keep drilling down within seconds, so in terms of following a train of thought, really exploring a hypothesis, and changing direction, it's got to be a completely different animal.

Right, so productivity can dramatically increase in terms of what you can do with the data. We've got users who were taking weeks to produce reports. They started out with Hadoop and said, okay, we'll put a Hive structure in place and see if that reduces the time. So they went from multiple weeks to a few days, but it would still take a whole day to process the reports and get the results they were looking for. With us, in the same environment, we were able to put a cube in place within a couple of weeks, and now they can interactively create those charts and dashboards, make modifications as necessary, iterate on them, and get real value from it instantly.

So I'll give you a simple example, John. Let's say I'm an analyst at a multinational consumer company selling, let's say, soap bars, shampoos, whatnot. I want to analyze...

Unilever, Lever Brothers, Procter & Gamble?

That's only two. That's it, that's it: Lever Brothers, Procter & Gamble. Now, almost every resident of the US is a customer or a potential customer today. And if somehow I'm profiling and tracking who bought what, using their credit card histories and purchase histories and whatnot, I can actually go in and build a cube for the entire US population in one place, with absolutely fine-grained information in there. Today, what we've discovered is that most enterprises, if they do it at a national level, do it at a very coarse granularity. They cannot go down to a single household or an individual in the same cube. Or they'll do it at a state level, or a county level, or a metro-area level. So they've got these disparate systems. Now, with what Ajay just described, you can build one cube: you drag and drop, fire it off on a cheap Hadoop cluster, you know, 300 nodes, 400 nodes, it quickly builds out, and then you can start playing with it. I can zoom up to a national level and I can zoom down to a household level.
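To illustrate that one-cube, many-zoom-levels point, here is a small hedged sketch, again in pandas, with hypothetical columns and data: keep the finest grain (household) in the cube and derive every coarser level by rolling up, rather than maintaining separate national, state, and county systems.

```python
# Hypothetical fine-grained cube: one row per household, with a geographic
# hierarchy (state -> county -> household) as the dimension levels.
import pandas as pd

purchases = pd.DataFrame({
    "state":     ["CA", "CA", "CA", "TX"],
    "county":    ["Santa Clara", "Santa Clara", "Alameda", "Travis"],
    "household": ["h001", "h002", "h003", "h004"],
    "spend":     [120.0, 80.0, 60.0, 95.0],
})
cube = purchases.set_index(["state", "county", "household"])["spend"]

# Zoom anywhere in the hierarchy from the same cube:
print(cube.sum())                                     # national total
print(cube.groupby(level="state").sum())              # drill down to states
print(cube.groupby(level=["state", "county"]).sum())  # down to counties
print(cube.loc[("CA", "Santa Clara")])                # individual households
```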
You're hitting on a really core thing, which is that the old way was limited by technology and resources, not creative thinking. Certainly individuals are building the cubes, right? There's a human capital element here. Now, with Hadoop, cloud, and these technologies, with this kind of performance, the scale and resource constraints go away. One, two, now the human role changes. Those outliers where you'd say, wow, drilling down on that outlier is going to cost all this money? That outlier could actually be the answer, right? So this is where we get into the weirdness and nuances of data science. So what impact do you guys see? One, do you believe that? And two, where does this go from here? What happens next? Is this transformative?

Let me answer this question by asking you a question, John. Let's switch our roles.

Yeah, I love this.

So between the cost of computing, the cost of storage, the cost of memory, and the cost of human time, which one is increasing?

Increasing cost? Talent, of course. I would say talent. Human talent.

Human talent, right? Absolutely. Everything else is falling. So the way we have designed this is to leverage the falling cost of the other three, but make sure the human being is not made to wait. The human being who is waiting there to know his or her data doesn't have to wait.

You know, that's another good point. This is a good conversation, how we're riffing on this in real time. But the human capital piece, the most expensive component, is really where the ROI should come in. Everyone looks at ROI as justification, as we mentioned: how do you sell stuff? But when you look at ROI, whether it's an expensive salesman in the right sales situation with the right person, or using the right resource for the linguistic ontology or however things get done, that's the ROI piece. So is it even on the table yet to do ROI on this? Or is the ROI still elusive because it's still developing? Most people look at ROI like, oh, we invested X and an artifact popped out and we have a sale or generated revenue, because that's revenue-focused. But on the human capital side, is there an ROI equation at this point? Because that's the interesting question: okay, good, what do we do?

So actually, that's a very good question. There is no direct ROI of the form "our analysts saved so much," and that alone would not be enough of a value-add for us as a company. But I think the real benefit is that the analyst, before he or she lost interest in pursuing a certain path, a certain hunch, was able to arrive at that insight, validate or invalidate it, and act on it.

Well, and the other thing you touched on is the concept of sampling, right? It's less and less relevant when you can drill up or drill down.

Absolutely, you don't have to sample.

Right. So we actually have a customer, Entravision, and that is precisely their use case. They're trying to analyze how the Latino population is using different media channels, what the viewing habits are, and what the purchasing behavior is. The way they used to do it before was with self-reported diaries, and you know how accurate those can be, and surveys and samples. So they really wanted to move away from that, to empirical data with real transactional numbers behind it and much more precise measurement of what these people were actually doing. And that completely transformed the value they could provide to their customers, who are targeting these channels, because now they know precisely where to target them, what the profiles of those customers are, and what the best marketing campaign would be for them.

Right, right. So they went from monitoring 40,000 users, a sample of 40,000. Out of how many?

Well, their population base covers about 20 million subscribers. So they were sampling 40,000 out of 20 million. Now they can get to every single user of that population.
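Some back-of-the-envelope arithmetic on those numbers, purely as an illustration of why full-population data matters for narrow segments:

```python
# A 40,000-person panel out of ~20 million subscribers is a 0.2% sample.
sample_size = 40_000
population = 20_000_000
print(f"sampling rate: {sample_size / population:.2%}")      # 0.20%

# A segment that is 0.1% of the population (~20,000 real people) shows up
# as only ~40 panelists; sqrt(40) noise means roughly +/-16% relative error
# before any self-reporting bias. Full-population data sidesteps all of this.
segment_share = 0.001
expected_panelists = sample_size * segment_share             # ~40
relative_error = expected_panelists ** -0.5                  # ~0.16
print(f"panelists in segment: {expected_panelists:.0f}")
print(f"approx. relative error: {relative_error:.0%}")
```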
It reminds me, too, John, of the all-flash discussion, right? There's the unit-cost change in going from spinning disk to flash, but it's the second-order impacts, freeing up time, making everyone's processes quicker, better replication of the data, that really open up a completely different ROI than just justifying it with one high-value application that needs low latency. It's these second-order impacts you can't even begin to anticipate: how analyst behavior is going to change, the discoveries they're going to make, the ROI of business cases you haven't even thought of, when you move away from sampling to this truly holistic view. And I think the point you were making is that instead of being constrained by the systems and memory and so forth, you completely free up the analyst to explore whatever he wants to do.

Yeah, and I'm a decision-maker too. It's a great point. To me, it's all about the data, right? The outliers, which you couldn't get to before because of cost and friction, you can now look at, and at least say yay or nay, I want that, or drill down on it. Kind of like that movie Contact with Jodie Foster: in that small piece of space there's a big puzzle piece behind it. That's analytics.

Yeah, and actually that's exactly what OLAP cubes are meant for. You can look at a certain aspect of your data and start drilling down to whatever level of detail you want and really figure out what's going on.

So I've got to tie this into some real-time relevance, and I want to get your perspective on how you guys are thinking about it. The use case for real-time is huge. We were talking earlier about streaming and the modernization of data warehousing. The way I look at it, data comes in two forms: passive data that's stored, that I'm monitoring, pulling, and putting into a pile, a corpus, whatever you want to call it, and active data in real time. Data is changing shape every day. So if I analyze a corpus of data and more data keeps piling in, I have to constantly be on a real-time stream, because my analysis changes every time new data enters it. This is a challenge I call the data ocean, which is not a data lake problem. It's more currents, more streaming, more stuff. How do you think about that, and how do we get the interactivity to the linguists, the experts, the humans, to solve this? Is it going to come from this kind of system? Because this seems to be the cutting-edge problem: I analyze X, but it changes because of new data.

So there are two aspects to that. One is responding in real time to new information coming in. There, new information comes in, but in order to figure out whether it's an anomaly or not, you need the historical analysis and the profiles built up that tell you it's an anomaly, right?

And by the way, that's got to be low-latency performance.

Absolutely. And the second aspect is incorporating new data into the cube. There, we have the ability to do incremental builds. So as new data comes in, it can be incorporated into the cube itself within very short periods of time.

Awesome.

And since you're not sacrificing granularity, that includes the dimension of time as well. So you can have a very fine-grained mapping of any trend across multiple dimensions, and you can compare what happened in this five-second interval versus the last one year.
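Here is a rough sketch of that incremental-build idea, assuming additive measures and a hypothetical (channel, five-second bucket) key; it shows the general technique, not Kyvos's implementation: fold each new batch into running aggregates instead of recomputing the whole cube.

```python
# Incremental cube maintenance in miniature: cube[(channel, bucket)] holds
# [sum, count]. Additive measures make merging new data a single cheap pass.
from collections import defaultdict

cube = defaultdict(lambda: [0.0, 0])

def incremental_build(cube, new_records):
    """Fold a fresh micro-batch of (channel, timestamp, value) into the cube."""
    for channel, timestamp, value in new_records:
        bucket = timestamp - (timestamp % 5)   # keep a 5-second time grain
        cell = cube[(channel, bucket)]
        cell[0] += value
        cell[1] += 1

# New data streams in as micro-batches; only the touched cells are updated.
incremental_build(cube, [("sports", 1001, 3.0), ("news", 1003, 1.0)])
incremental_build(cube, [("sports", 1004, 2.0)])

# Fine-grained time is preserved, so the same cube answers both a five-second
# window and a one-year rollup (by summing buckets).
print(cube[("sports", 1000)])                  # [5.0, 2] for that window
```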
So Ajay, we're getting the hook here, but I want to get your thoughts. What does this mean, and where is it going to go? This is cutting-edge and really great at the high end, but it's self-service BI, so it can be commoditized to the point where citizens, normal people, could use it. So where's it going?

So we're looking at transforming the way business users think of big data. Right now it's thought of as a very complex environment; they really don't know how to deal with it, and they need a data scientist to figure out how to analyze that data. We want to transform that and make it accessible to the business users, so they can do self-service interactive analytics right there.

So we should contact you when we build our OLAP cube for the cube platform we're building. That's going to be fully cube, cube squared. We can call it some cube tie-in for our crowd data. We'll call you guys up. And how do we contact you? For customers out there thinking, hey, I like this conversation, where do they go to join the conversation? Where do they find information?

Sure, kyvosinsights.com is our website.

Great. All right guys, thanks so much for the insight here on theCUBE, SiliconANGLE's theCUBE. We're talking about OLAP cubes, bringing the data, making the data work in real time, passive analysis, and low-latency performance. I'm John Furrier with Jeff Frick. We'll be right back after this short break.