Live from Orlando, Florida, extracting a signal from the noise, it's theCUBE, covering Pentaho World 2015. Now your hosts, Dave Vellante and George Gilbert.
So congratulations on the high energy and covering a lot of ground. How do you feel?
I'm really happy to be here for a second year in a row. Last year was the first one, as you pointed out. A lot of energy, both developer energy but also business energy. So it's a great comment.
So you set up sort of the spectrum of different types of analytics, and I want to talk a little bit about them. But before we get into that, what do you see in your customer base? You know, Forrester does a great job dealing with both the business and the technologists. What are you seeing in terms of analytics in general, its ability to affect business outcomes? You made sort of a tongue-in-cheek comment about data being the new oil, saying the sun is the better analogy. Is data analytics changing the businesses in your client base the way that everybody expected it to?
We're starting with the business priorities. So our survey data shows the number one business priority is customer experience. You know, the reasons are obvious. Customers pay you, you want them loyal, and you want to get new customers. So that's sort of the backdrop that drives analytics and data analytics. You want to know, what can I do? That was sort of the aspiration itself. Okay, the way we can do that is we have all the data. The conversation today is about analytics.
So some businesses saw this opportunity and said, okay, we're already in the data business. We can now accelerate time to insights, or we can make that data more readily available to our customers. In other situations, you have customers that don't really understand how to make a data-driven organization. They're trying to figure out new business models. And George, we've seen this as well when we talk to practitioners. How do they go about monetizing data?
What are you seeing in that regard?
Well, who decides what's on that report? They decide, I don't want to see this, I want to see this. But it's usually to confirm their beliefs about how the business is working and which KPIs matter. So a lot of the data-driven business intelligence actually is not data-driven. It's human-driven with data supporting. So for an organization that is truly data-driven, they have to actually use the data to find answers and not to confirm their bias.
So you had the spectrum of different types of analytics, from descriptive all the way to prescriptive and a couple in between. One's looking back, sort of the...
Well, there's a couple in between, but they're very important.
Yeah, so let's talk about those.
Descriptive: the best way to think about that is traditional business intelligence. It's a report, it's a dashboard, it's a product model.
What happened?
Right, what happened. It's always based upon the historical data. The second one, and these aren't necessarily in order, is predictive. Building a predictive model. So think of a risk score. Your credit score is a predictive model. A customer's likelihood to buy a product is a predictive model. A recommendation, like... I recommend that you might like that or call Solve. A recommended action is predictive. So predictive analytics. The third one is streaming analytics, which is real time. It's detecting what's happening in real time and analytics of what's happening. And then finally it's prescriptive analytics, which is, what should we do in the moment? What's the next best action to take? Each one of those four analytics is a very distinct set of technologies. They all ultimately need to work together, but they're very distinct technologies.
And it seems like prescriptive, being able to actually take action, turn insights into action, seems to be the one where organizations see great potential, but as a consumer I feel as though it's the one that has sort of the biggest challenge associated with it. I don't know if you would agree with that or what the data shows. I mean, you had some excellent survey data. You guys do a big survey each year. One of the surveys you did was, I want to say, over 3,000 global individuals, practitioners, business technology practitioners. Obviously there are other business people too, because Forrester has those in a big constituency. And then you had a number of smaller surveys. So I'm wondering what you are seeing in terms of that fourth category, that prescriptive analytics. Is that really nirvana or is it actually happening today?
Prescriptive analytics kind of struggles to be understood as to what it is, because it's actually three things. It's business rules. So it's what you do about something. So for example, the Obama campaign famously used predictive and prescriptive analytics in the 2012 campaign. So you want a predictive model to predict who a swing voter is. And then you want a prescriptive model of what to do to influence that voter. So do I talk about healthcare? Do I knock on their door? Do I call them up? Do I send them an email? Do I send them a letter? What action for that individual is most likely to influence them? Now, how you implement prescriptive analytics, you can do it in three ways. You can do it with business rules, which is basically a human saying, all right, if these factors, then do this. You can do it with a second predictive model, like predicting, based upon these factors, I should knock on this person's door and talk about education. That's the most likely thing to influence them. And then finally, there's some numerical methods using linear algebra to figure out what to do as well.
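As a rough illustration of the first two approaches described above (a hand-written business rule, and a second model that scores each possible action), here is a minimal Python sketch. Everything in it is hypothetical: the `Voter` fields, the action names, and the `UPLIFT` numbers are made up for the example and do not come from the campaign or from any real system.

```python
from dataclasses import dataclass

# Hypothetical actions a campaign could take for one voter.
ACTIONS = ["knock_on_door", "phone_call", "email", "letter"]


@dataclass
class Voter:
    swing_score: float  # assumed output of an upstream predictive model, 0..1
    top_issue: str      # e.g. "education" or "healthcare"


def rule_based_action(voter: Voter) -> str:
    """Way 1: a human-written business rule ("if these factors, then do this")."""
    if voter.swing_score < 0.5:
        return "letter"  # unlikely swing voter: use the cheapest contact
    if voter.top_issue == "education":
        return "knock_on_door"
    return "phone_call"


def model_based_action(voter: Voter, uplift: dict) -> str:
    """Way 2: a second model scores each action; pick the highest score.

    `uplift` stands in for that model. It maps (issue, action) to an
    estimated chance of influencing the voter; missing pairs score 0.
    """
    scores = {a: uplift.get((voter.top_issue, a), 0.0) for a in ACTIONS}
    return max(scores, key=scores.get)


# Made-up scores, for illustration only.
UPLIFT = {
    ("education", "knock_on_door"): 0.12,
    ("education", "email"): 0.03,
    ("healthcare", "phone_call"): 0.09,
    ("healthcare", "letter"): 0.04,
}

voter = Voter(swing_score=0.8, top_issue="education")
print(rule_based_action(voter))
print(model_based_action(voter, UPLIFT))
```

The third approach mentioned, numerical methods, would typically replace the simple "pick the highest score" step with an optimization over many voters and a limited budget; that is left out of this sketch.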
So the prescriptive category is the hardest to understand and it's the least understood.
So the Obama campaign is a good example. We've had some folks on theCUBE before from the Obama campaign. And those are kind of early days of big data. One of the practitioners in the Obama camp just raised $58 million. I don't know if you saw that.
Really?
Yes. I wonder how many of the presidential candidates now are sort of using these forms of analytics.
Right, I mean, you wonder if it's like Moneyball and they all get it and sort of hop on.
Yeah, they hop on. And then if they're like the Red Sox, they break the pattern because somehow they won three World Series and they figure out, we'll try something different.
It takes discipline, doesn't it?
It does. And I think there also has to be a realization that it's not a silver bullet either. Prediction is about probabilities. It can't work miracles, but it can give you an edge. Just like at the casino, if the casino has a one, two percent edge, they can build those tall, fancy buildings. So sometimes you don't need much to just give you an edge.
So were you at Strata two weeks ago?
No, I wasn't.
A couple of big themes there. One was sort of this data in motion and move to real time. There was a lot of talk about storage, obviously. Kudu, you know, database and complexity. George and I are interested in your thoughts on Spark. George, you're enthused. You know, you're excited about what's going on with Databricks and sort of the next gen platform.
Well, and we were talking about this just, you know, it's not as mature and hardened as the Hadoop ecosystem.
Right.
But that it is more cohesive and easier to run as a service. It appears that as we go from sort of a service rich offering early in the market to more mainstream product rich for customers who don't have all those fancy skill sets, it would seem like Spark is going to have an important role to play as a compute framework within the Hadoop ecosystem.
Yeah, yeah, no, I mean, you know, most people will say, and I would agree with this, that Spark has taken over for MapReduce on Hadoop. Now that doesn't mean that it's taken over for Hadoop. Hadoop adds, I mean, you know, we can chat for a long time on all the additional value propositions, but Spark essentially, there's two things about it as to why it's taken over for MapReduce. One is it's got an in-memory model, but the second is the DAG engine, which is a directed acyclic graph. It's a way to parallelize work and it's an easier programming model for developers. So while you still need to be a programmer to really use Spark, if you are a programmer, it's easier to use that programming model than MapReduce, and it's faster.
And they're actually putting layers on top of it, whether it's taking a Python notebook, which is notoriously kind of single processor, and spreading it across the cluster, or these other interactive notebooks.
Yeah, I mean, you've got to be careful of that because there's also SparkR, right? And there's Python notebooks, but you can't take any arbitrary programming language, whether it's functional or whether it's object-oriented, any arbitrary program that you and I, we could just sit here and start writing some code, and then just magically say, I want to distribute this, right? So that's where people got in trouble with R. They thought that you could just distribute a program. You can't, so what they've done in Python...
With R, but can you not do that with R now?
Well, what they've done is they say, all right, well, if you have an R script and now you want to run this logistic regression algorithm, call this function, and that function will then call out to a parallelized implementation of it. So I just wanted to make sure that people understood that it's not a perfect parallelization.
But let's back up just a bit because we've gotten down in the weeds.
In the interest of sort of looking at this class of applications that we're aspiring to with Hadoop, beyond the data lake to get to predictive or prescriptive, how much easier will it start to get when we have Spark as sort of the core compute framework and then all the sort of administration and governance capabilities surrounding it from the Hadoop ecosystem?
Yeah, no, I think that's a great point that you make, that that's one of the values that Hadoop is bringing. The governance, the security model, the resource management, and Spark was designed to run on YARN. So I think it will get easier. But we still need higher level tools on top of Spark. It has to be a programming model, but it can't just be a programming model.
So we're tight on time here, Mike. I apologize for that. But I want to get your take on Pentaho. It's your second year keynoting. So you spend your time down here. So obviously you like what's going on. But what's your take on how they're doing? It's interesting. They're 11 years old. Hadoop's 10 years old. I want to talk a little bit about the Hitachi acquisition real quick. But what's your take on Pentaho? How they're doing? Sort of where they fit?
Number one, open source. And I think sometimes people forget that Pentaho is actually from the open source community. Because I think for people in the community, that's just a given. But I think people being introduced to Pentaho might not always realize that. So open source innovation model. Pentaho is often difficult to understand for a newcomer, because they actually have multiple products. They have the data integration product. And they also have the analytics, the front-end development as well, the visualization. So in the past, these things have really been like separate markets. You get this and then you get this. They sort of had the vision to put those together.
And I think that the market is catching up with the idea that the data processing and the analytics have to be closer. And the visualization has to be closer together. So I think they're in a good position there. You mentioned the Hitachi acquisition, which is new. That certainly gives them tremendous resources that Hitachi can bring to bear. Globally, especially I think in the industrial, IoT, and many of the initiatives that Hitachi has and just, I think what they call it, don't they call it social innovation?
Social innovation, yeah.
Social innovation applications, yeah.
I mean, on paper, you've got an $80 billion diversified business. It looks like a huge potential win to put Pentaho in there. Now how do they go to market?
With Hitachi, I think it's too new to really say. There's lots of things that happen in any acquisition. They have to play out. So, but I think you're right. I think on paper it looks great.
Yeah. All right, sorry, we're out of time. We've got to leave it there. Mike, thanks very much for coming on theCUBE. Great job today. It was really a pleasure having you on.
All right, thank you.
All right, keep it right there, we'll be back with our next guest. This is theCUBE, we're live from Pentaho World 2015. Right back.