Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks.

Welcome back to theCUBE. We are live, day two of the DataWorks Summit in the heart of Silicon Valley. We've been learning a ton from the many guests who have been on the show over the last day and a half. I'm Lisa Martin with my co-host, George Gilbert, and we are next joined by the Chief Product Officer of Pentaho, Donna Prlich. Welcome back to theCUBE, Cube alumna.

Thank you, thank you, it's great to be here.

And you gave a great keynote this morning that we had to miss because we were here, sorry. One of the really interesting things you talked about in your keynote was that a lot of machine-generated data today is not being fully exploited or used. Tell us about that. What opportunities are businesses missing, and how does Pentaho help them capitalize on and monetize that data?

Yeah, so it's true. We're hearing so much about IoT, and we had a press release out in the last couple of days around the growth in bookings we've seen from customers who are leveraging machine-generated data, Caterpillar and Beontra and a couple of others. What we're seeing is that there's this major opportunity out there for organizations to get value from that data. But the most interesting thing we're seeing in these use cases is the ability to bring in all the work that we've done over the last few years, all of us, in big data. When that starts to come into the picture, that's where you really start to drive some interesting insights. So the idea that this data isn't being exploited is kind of the next step up from big data, and sort of what we're all going to prepare for here. But much like big data, we have to focus on what those business outcomes are in order to achieve that.

So what's truly necessary is the convergence of big data with machine learning and IoT.
Our customers, like you mentioned, Caterpillar, for example: give us an example of where they were. Did they understand, "We've got tons of sensors out there, but we're not able to connect the dots"? Where did they come to Pentaho for help to really maximize their investments?

So Caterpillar Marine Asset Intelligence was a customer of Pentaho's for a long time. They leveraged our platform to collect the sensor data off the ships at sea, and they've managed fleets of ships for the US Navy for many years. They could always do predictive maintenance; that was the thing they did really, really well. They'd capture that machine-generated data, take the operational data, data about parts, location data of where the ships were, and they could say, okay, somebody should ship a part out to that ship in the Indian Ocean so it's not sitting out there.

Then, as that sensor data became more valuable, they said, gee, I wonder if we could look at maintenance costs. They started to look at the fact that they were spending a pretty sizable amount of money maintaining ships and cleaning the hulls twice a year. What was interesting is that once they took that sensor data and then brought in a lot of the other kinds of big data, fuel consumption, the drag of the ship, propulsion, all this other data that we might think of as big data, they realized, based on the barnacles that accumulate on the bottom of the boat (I've learned a lot about shipping), that they should actually be cleaning the hulls more often. They saved themselves $10 million over two years for a fleet of eight ships. If you do the math out to 2,000 ships, that's amazing. So that context from the big data is really what drove those business outcomes, which are obviously driving cost reduction and innovation.

Who knew barnacles could influence big data outcomes?
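The hull-cleaning trade-off described above can be sketched as a simple cost model: more frequent cleanings cost more in maintenance but reduce the fuel burned against barnacle drag. All numbers and parameter names here are invented for illustration, not Caterpillar's actual figures.

```python
# Hypothetical sketch of the hull-fouling cost trade-off.
# All dollar figures and drag rates are illustrative assumptions.

def annual_cost(cleanings_per_year: int,
                cleaning_cost: float = 50_000.0,
                base_fuel_cost: float = 2_000_000.0,
                drag_penalty_per_month: float = 0.04) -> float:
    """Yearly cost for one ship: hull cleanings plus extra fuel burned
    due to barnacle drag that accumulates between cleanings."""
    months_between = 12 / cleanings_per_year
    # Fuel overhead grows with average fouling between cleanings.
    avg_drag_overhead = drag_penalty_per_month * months_between / 2
    return (cleanings_per_year * cleaning_cost
            + base_fuel_cost * (1 + avg_drag_overhead))

# Compare the historical schedule (twice a year) with quarterly cleaning.
print(f"2x/yr: ${annual_cost(2):,.0f}")
print(f"4x/yr: ${annual_cost(4):,.0f}")
```

Under these assumed parameters, quarterly cleaning comes out cheaper overall, which is the shape of the insight the sensor data surfaced: the cleaning cost is outweighed by the fuel saved.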
Well, I know, it's amazing, and probably my terrible driving patterns can influence that as well.

Wow, that's incredible. So you've given us some interesting use cases. Tell us how being able to manage data coming in from the edge all the way to a business outcome works, and why you have to have that span to make a big impact.

Yeah, so a really key thing, and I mentioned this before, is that a lot of the research says the winners will be those who focus on the business outcome. It's much like the Warriors. You missed my presentation this morning, but we talked about the Warriors and the Cavs, sorry to anybody who's a LeBron fan. The winners will be those who focus on those business outcomes; the losers will be those who just think about technology for technology's sake. We saw that in the early days with Hadoop: a lot of science experiments, people failing, asking why are we doing this?

So when we look at IoT and think about the edge to the outcome, it forces the pieces that Pentaho has done really well over the years, the insights and the outcomes, to drive what the business is going to see as the result. The edge piece is really interesting because not only will we have all the usual data sources and the big data, but we'll have to connect to those devices at the edge. We'll have to figure out how to register them and bring them in, stream them, or maybe bring them in in a micro-batch way, into that world of the rest of the data. If you think about that as a continuum, it forces the business outcome to be front and center. And Hitachi is really where Pentaho can extend our platform into that world of the edge to the outcomes.
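The micro-batch ingestion pattern mentioned above can be sketched in a few lines: buffer readings per device and emit an aggregate once a batch fills, so edge data arrives in the pipeline in digestible chunks rather than one event at a time. The device names, fields, and batch size are hypothetical, not any Pentaho or Hitachi API.

```python
# Minimal sketch of micro-batching edge sensor readings before they
# join the rest of the data pipeline. Names are illustrative only.
from collections import defaultdict

def micro_batch(readings, batch_size=3):
    """Group a stream of (device_id, value) readings into per-device
    batches and emit an aggregate each time a batch fills up."""
    buffers = defaultdict(list)
    batches = []
    for device_id, value in readings:
        buffers[device_id].append(value)
        if len(buffers[device_id]) >= batch_size:
            batch = buffers.pop(device_id)
            batches.append({"device": device_id,
                            "avg": sum(batch) / len(batch),
                            "n": len(batch)})
    return batches

stream = [("engine-1", 98.0), ("engine-2", 71.0), ("engine-1", 99.5),
          ("engine-1", 101.5), ("engine-2", 70.0), ("engine-2", 72.0)]
print(micro_batch(stream))
```

In a real deployment the buffers would also flush on a time interval, so a slow device doesn't hold readings back indefinitely; the fixed-count trigger here just keeps the sketch short.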
Maybe expand on that a little bit. We've seen GE and IBM take slightly different approaches. In IBM's case, they take the expertise they gather from their customers in joint development, plus their own data science expertise, and build these sort of rich models that, it sounds like, you would then deploy. But with Hitachi, you're doing the same thing, with Hitachi being a 90-billion-dollar company, the same size as IBM. You're complementing their data science chops and their domain expertise and building semi-custom applications, essentially, for other industrial companies. Tell us how that makes a big impact and how it accelerates the process.

Right. If we think about what Hitachi brings to the table in that mix, they've got domain expertise, and they've got a lab here, for instance, right down the road, with 50 data scientists. In these early big data use cases that are really transformative, you need to have the domain expertise and the data science. Hitachi can bring that to the table, and then Pentaho brings all of the data orchestration and the orchestration of those machine learning models that get built for the business outcome. So we now have the edge to the outcome, with Hitachi providing the connection to the devices, registering them, bringing them into the world of data. But we also have the consulting expertise of Hitachi Labs to provide the actual domain expertise for the machine learning. We were talking earlier: if you have a turbine, for instance, that's a very specific type of machine learning that has to get developed, and that's where Hitachi can help us. In the early use cases, we're learning so much, and they're learning so much about how to help other customers too. It's a great combination.
So one of the things you mentioned was what's happening now that's forcing business outcomes to be front and center. Yesterday, when Rob Bearden announced the expansion of their technology and strategic partnership with IBM, he talked about the four meta trends: cloud, IoT, streaming data, and data science. What's interesting is that whether a company is, say, a traditional network trying to compete with Netflix or Hulu, saying we have to monetize our content, today the conversation was really around the need to monetize the insights, the outcomes. So when you look at the role of the data scientist, and we've all known there's not a ton of them, how do you help? You mentioned consulting. How does Pentaho help business users understand what outcomes they would want to see and what variables they would want to examine, now that you're seeing this forcing function of business outcomes being front and center? What's that conversation like?

Yeah, so data scientists obviously are very rare, right, and important resources. Over the last couple of years, we've really been able to help with the preparation of data, which we all know is like 80% of the job. So we remove the time it takes just to get the data prepared, and then to tune and test those models, making that iteration process faster. Pentaho can really help with that. And the best part is, once those models are actually created and you've got all that value, you can simply bring them into the transformation flows you've already built with Pentaho, and we can execute those models to achieve the business outcome. So we're definitely allowing you to keep the resources that are focused on the really hard challenges creating the transformation.
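The pattern described above, where data preparation and an already-trained model execute inside one transformation flow, can be sketched roughly as follows. The `prepare`, `model`, and `transformation_flow` names and the threshold rule are stand-ins invented for this sketch, not Pentaho's actual API.

```python
# Hedged sketch: data prep (the "80% of the job") plus execution of a
# pre-built predictive model inside a single transformation flow.

def prepare(record):
    """Clean and normalize one raw record into model-ready features."""
    return {"hours": float(record.get("engine_hours", 0)),
            "temp": float(record.get("temp_c", 0))}

def model(features):
    """Stand-in for a trained predictive-maintenance model."""
    needs_service = features["hours"] > 5000 or features["temp"] > 95
    return "service" if needs_service else "ok"

def transformation_flow(raw_records):
    """Prepare each record, score it, and attach the outcome."""
    results = []
    for r in raw_records:
        features = prepare(r)
        features["prediction"] = model(features)
        results.append(features)
    return results

raw = [{"engine_hours": "6200", "temp_c": "88"},
       {"engine_hours": "1200", "temp_c": "70"}]
print(transformation_flow(raw))
```

The point of the shape is the division of labor: a data scientist builds `model` once, and the transformation flow handles everything around it, so the scarce resource stays focused on the hard modeling problem.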
So whether it's predictive modeling or sentiment analysis, we're helping with that. And then when you execute those models, obviously that's what drives the business outcome.

Excellent.

When is it appropriate to run the models just within Pentaho, orchestrating the predictions and prescriptions it provides, and when do you find the need to integrate them with existing applications, to inform a system of record making a decision in the form of a transaction?

Right, well, today we often do that. So there's the Caterpillar example I mentioned, and IMS, which does usage-based insurance, as I was mentioning about the bad driving. They take that telemetry data and bring it together with customer data to be able to manage risk, retain customers, offer better policies, and so on. Those are areas where, today, customers simply create all their data transformations from the different data sources with Pentaho and then execute the models as part of those flows. Beyond that, if you think about IoT, I might want to actually kick off a process and workflow integration, and that's an area where we're working on the IoT side with Hitachi to extend that edge to outcome. We know that it's edge to outcome, but eventually that outcome needs to trigger that machine learning model, to your point, George, and say, maybe I need to trigger my ERP system to send something out to this particular provider. So the work we're doing to help with that initial orchestration and getting that value, and then being able to integrate from that outcome back to the edge, is where there's going to be a lot of value.

And where are you in that journey? Is it technical immaturity at this point, or is it more that customers still need to wrap their heads around a new approach to thinking about closed-loop analytics?
Yeah, I think it's mostly the latter. What we've really found is that what helps us decide what to build is going at pace with the market, seeing what's coming, and following the use cases, because the beginning is always so uncertain. I remember five years ago with big data, everybody was trying to figure out: what are those patterns? So on the closed-loop piece, what we're seeing is that a lot of our customers aren't quite there yet, even with real time. Some of it is micro-batch, some of it is batch. As the IoT market matures, I think we'll see that orchestration technology improve. And a lot of it will be open, right? A lot of it's going to depend on APIs from the edge, from those devices. So there's still a lot to shake out there over the next few years, I think.

One of the things you've just done is come out with a message about adaptive execution, as this world is obviously inherently complex. Kind of a last question for you: describe what adaptive execution means in the context of some of your customers.

Yeah, that's a great question. Adaptive execution really applies to what you just mentioned, George, about the closed loop and how we're going to get there. A lot of these things we honestly don't know until we see what emerges, and then we start to build for it and look for those repeatable patterns. Adaptive execution was a similar one. There was a lot of excitement around Spark and around how we were going to create tools to do drag and drop and generate code, so developers can move faster, because there aren't enough resources. What adaptive execution allows customers to do is say, you know what, I'm going to create this transformation once, but I might want to run it in Spark, I might run it in MapReduce, I might want to run it on whatever's coming down the path that we haven't seen yet.
I'm going to be able to balance the workload against the business outcome I'm trying to achieve and preserve my resources. So adaptive execution, for Pentaho, says: build your data transformations once, and then execute them on whatever engine is appropriate for the workload. That's something we're really excited about, and our customers are super excited about. We announced it a couple of months ago and got great feedback from our customers.

Well, speaking of great feedback, thank you so much, Donna, for stopping by theCUBE and talking with us. What a great conversation: barnacles, big data, and from the edge to business outcomes. Thank you so much for joining George and me. It's been great to have you here.

Thank you, thanks for having me.

And we should remind everyone that we'll be at Pentaho World in a couple of months.

That's right.

And so we'll have wall-to-wall coverage of all the cool things you're doing. Pentaho World, October 25th, Orlando, Florida. Don't miss it. You heard it here.

All right. For my co-host, George Gilbert, I'm Lisa Martin. You've been watching theCUBE live from day two of the DataWorks Summit. Stick around, we've got great content coming right back at you.