Live from Orlando, Florida, it's theCUBE. Covering Pentaho World 2017. Brought to you by Hitachi Vantara.

Welcome back to Orlando, everybody. This is Pentaho World, hashtag PWorld17, and this is theCUBE, the leader in live tech coverage. My name is Dave Vellante, and I'm here with my co-host, Jim Kobielus. Donna Prlich is here. She's the Chief Product Officer of Pentaho and a many-time CUBE guest. It's great to see you again. Thanks for coming on.

Thank you for all of them. Happy to be here.

So I'm thrilled that you guys decided to reinitiate this event. You took a year off, but we were here in 2015, and we learned a lot about Pentaho, and especially about your customers and how they're applying this end-to-end data pipeline platform that you've developed over a decade-plus. But that was right after the acquisition by Hitachi, so let's start there. How has that gone? They brought you in and kind of left you alone for a while, but what's going on? What's the update?

Yeah, it's funny, because 2015 was the second Pentaho World, and we were sort of like, wow, okay, we're part of this new company, which was great. For the first year, we were really just driving against our core big data integration and analytics business and capturing a lot of that early big data market. Then, probably in the last six months, came the initiation of Hitachi Vantara, which is less about Pentaho being merged into a company. I think Brian covered it in the keynote: we're becoming a brand new entity, and Hitachi Vantara is now a new company focused on software. Obviously they acquired us for all that big data orchestration and analytics capability, and now, as part of that bigger organization, we're really at the center of it in terms of moving from edge to outcomes, as Brian talked about, and how we focus on data.
Digital transformation, and then achieving the outcome. So that's where we're at right now, which is exciting. Now we're part of this bigger portfolio of products that we have access to in some ways.

I should point out that Dave called you the CPO of Pentaho. But in fact, you're the CPO of Hitachi Vantara, right?

No, I am not. I am the CPO for the Pentaho product line. It's a good point, though, because the Pentaho product brand stays the same; obviously we have 1,800 customers, and a whole bunch of them are around here. So I cover that product line for Hitachi Vantara.

Well, and there's a diverse set of products in the portfolio, so I'm actually not sure it makes sense to have a chief product officer for all of Hitachi Vantara; maybe for different divisions it makes sense. But I've got to ask you: before the acquisition, how much were you guys thinking about IoT and industrial IoT? It must have been on your mind back in 2015. Certainly it was a discussion point, and GE was pushing all this stuff out there with ads and things like that. But how much was Pentaho thinking about it, and how has that accelerated since the acquisition?

Yeah, so at that time, in my role, I had product marketing, and I think I'd just taken on product management. What we were seeing was all these customers starting to leverage machine-generated data, and we were thinking, well, this is IoT. I remember going to a couple of our friendly analyst folks, and they were like, yeah, that's IoT. It was interesting; it was right before we were acquired. We'd always focused on these blueprints, the repeatable patterns, whether it's customer 360 and big data. And we said, well, there's some kind of emerging pattern here of people leveraging sensor data to get a 360 of something, whether it's a customer or a ship at sea.
So we started looking at that and going, we should start going after this opportunity. In fact, some of the customers we'd had for a long time, like IMS, who spoke today all around connected cars, were among the early ones. And in the last year we've probably seen more than 100% growth in customers who, purely from a Pentaho perspective, are leveraging machine-generated data with some other type of data for context to get to the outcome. So yes, we were seeing it then. And when we were acquired, it was kind of like, oh, this is cool, now we're part of this bigger company that's going after IoT. So absolutely, we were looking at it and starting to see those early use cases.

Now, a decade or more ago, Pentaho became very much a pioneer in open source analytics. You incorporated Weka, the open source code base for machine learning and data mining, into the core of your platform. Today here at the conference, you've announced Pentaho 8.0, which, from what I could see, is an interesting release because it brings stronger integration with the way the open source analytics stack has evolved: there's some Spark Streaming integration, there's some Kafka, some Hadoop, and so forth. Can you give us a sense of the main points of 8.0, the differentiators for that release, and how it relates to where Pentaho has been and where you're going as a product group within Hitachi Vantara?

Yeah, so starting with where we've been and where we're going: as Anthony DeShazor, our head of customer success, said today, it's 13 years, I think on Friday, since Pentaho started with a bunch of guys who said, hey, we can figure out this BI thing, solve all the data problems, and deliver the analytics in an open source environment. So that's absolutely where we came from.
Obviously, over the years, with big data emerging, we focused heavily on big data integration and then delivering the analytics. So 8.0 comes at a perfect spot for us, because if we look at IoT and the amount of data being generated, and then the need to address streaming data, data that's moving faster, this is a great way for us to pull in a lot of the capabilities needed to go after those kinds of opportunities and solve those kinds of challenges.

The first one is really about how we can connect better to streaming data. As you mentioned, it's Spark Streaming, it's connecting to Kafka Streams, it's connecting to the Knox Gateway, all things that are about streaming data.

Then, on scale-up and scale-out, it's about how we better maximize processing resources. In 7.1, and I think we talked to you about it, we announced the adaptive execution layer: the idea that you can choose the execution engine you want based on the processing you need. You can choose the PDI engine, you can choose Spark, and hopefully over time we'll see other engines emerge. We made that easier, and we added Hortonworks support for it. That's the scale-up side. On the scale-out side, sometimes you want to distribute processing across nodes: maybe you run out of capacity on a Pentaho server, and now you can add nodes and then release that capacity again. For this concept of worker nodes, to your point earlier about the Hitachi portfolio, we use some of the services in the foundry layer that Hitachi has been building as a platform.

As a load balancer.

As part of that, yes. So we could leverage what they had done, because if you think about it, Hitachi is really good at storage, infrastructure, and a lot of things that Pentaho doesn't have experience in.
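As an aside, the scale-out pattern described here, adding workers when one server runs out of capacity and releasing them when the work is done, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Pentaho's actual worker-node implementation, and the squaring "transformation" is a made-up stand-in:

```python
import queue
import threading

def run_with_workers(jobs, num_workers):
    """Distribute jobs across a pool of workers pulling from a shared queue,
    a toy version of adding worker nodes when one server runs out of capacity."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # no work left: the worker "releases" its capacity
            out = job * job  # stand-in for a real transformation step
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Same jobs, more workers: the answer is identical, only the capacity changes.
jobs = list(range(10))
assert sorted(run_with_workers(jobs, 2)) == sorted(run_with_workers(jobs, 5))
```

The point of the sketch is that capacity is an operational knob, not part of the logic: the transformation never knows how many workers ran it.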
So we said, well, why are we trying to do this ourselves? Why don't we see what these guys are doing? And we leveraged that as part of the Pentaho platform. That's the first time we've brought some of their technology into the mix with the Pentaho platform, and I think we're going to see more of that. And then, lastly, there's visual data prep: how can we keep building out that experience to make data prep faster and easier?

So can I ask you a really Columbo question on that load balancing capability?

Nice-looking trench coat you're wearing.

I mean, cigar. So is that the equivalent of a resource negotiator? Do I think of that as sort of your own YARN?

You know, I knew you were going to ask me about that.

Is that unfair to position it that way?

Conceptually, right, it's going to help you better manage resources. But if you think about Mesos and some of the capabilities out there that folks are using to do that, that's what we're leveraging. So it's really more about: sometimes I just need more capacity for the Pentaho server, but I don't need it all the time. Not every customer is going to get to the scale where they need that. So it's a really easy way to keep bringing in as much capacity as you need and have it available. It's really efficient, sort of low-level kind of stuff.

Oh, okay, cool. When you talk about distributed execution, there's the whole push of more and more processing to the edge. And of course, Brian gave a great talk about edge to outcome. You and I were on a panel with Mark Hall and Ella Hilal about the so-called power of three, and you did a really good blog post on that: the power of IoT, big data, and the third being predictive analytics or machine learning.
Can you give us a quick sense, for our viewers, of what you mean by the power of three, how it relates to pushing more workloads to the edge, and where Hitachi Vantara is really going in terms of your roadmap in that direction for customers?

Well, it's interesting, because I wish I could shrink down that conversation; maybe we have a recording of it, because it was a great conversation and we covered a lot of ground. But essentially, the power of three started with big data. As we could capture more data, we could store it, and that gave us the ability to train and tune models much more easily than before, because it was always a challenge: how do I get enough data to make my model more accurate? Then, over time, everybody has kind of become a data scientist with the emergence of R, and it's become a little easier for people to take advantage of those kinds of tools, so we saw more of that. And then you think about IoT: IoT is now generating even more data. And as you said, you're not going to be able to process all of that; you're not going to be able to bring all of it in and store it. It's not really efficient. So that's creating this need: we might need the machine learning there at the edge, and we definitely need it in the data store, to keep training and tuning those models. If you think about IMS, they've captured all that data, and they can use predictive algorithms to do the associations between customer information and the sensor data about driving habits, and bring that together. So it's sort of this perfect storm: the amount of data coming in from IoT, the availability of machine learning, and the data itself is really what's driving all of it.
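One concrete way to picture "machine learning at the edge" is a device that maintains a compact, incrementally updated summary of its sensor stream instead of shipping every raw reading upstream. Here is a minimal illustrative sketch in plain Python using Welford's online algorithm; it is not Pentaho code, and the sensor values are hypothetical:

```python
class RunningStats:
    """Incrementally maintained mean and variance (Welford's algorithm).
    An edge device can keep this compact summary and transmit only a few
    numbers, rather than sending every raw reading to the data store."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0

stats = RunningStats()
for reading in [10.0, 12.0, 11.0, 13.0]:  # hypothetical sensor readings
    stats.update(reading)
print(stats.n, stats.mean, stats.variance)  # -> 4 11.5 1.25
```

Each update is constant time and constant memory, which is exactly the property that matters on a constrained edge device; the richer model training still happens centrally, on the accumulated data.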
And I think Mark Hall, on our panel, who's a really well-known data mining expert, was like, yeah, it all started because we had enough data to be able to do it.

So I want to ask you, again, a sort of product and maybe philosophy question. We've talked on theCUBE a lot about the cornucopia of tooling that's out there and the people who try to roll their own. The big internet companies and the big banks have the resources to do it, but most organizations need companies like you. When we talk to your customers, they love the fact that there's an integrated data pipeline and you've made their lives simple. In 8.0, I saw Spark probably replacing MapReduce and making life simpler. So you've curated a lot of these tools. But at the same time, you don't own your own cloud or your own database, et cetera. So what's the philosophy of how you future-proof your platform? When you know there are new projects in Apache and new tooling coming out, what's the secret sauce behind that?

Yeah, well, the first one is the open source core, because that gives us the ability to have APIs, to extend, to build plugins, all of that, and a community that does quite a bit of it. In fact, the Kafka support started with a customer that built a step initially. We've now brought that into the product and made it part of the platform, but those are the kinds of things that, in an early market, a customer can do first. We can see what emerges around it, offer it to our customers as a step, and then say, okay, now we're ready to productize it. So that's the first thing. The second one is really about what happens when you see something like Spark emerge. We were also focused on MapReduce and how we were going to make it easier, so let's create tools to do that. And we did, but then it was sort of like, well, MapReduce is going to go away.
Well, there's still a lot of MapReduce out there. We know that, so it was like, okay, we could see that MapReduce is going to be here, and I think the numbers are around 50-50. You guys probably know better than I do where Spark is versus MapReduce. I might be off, but...

We had George Gilbert; he'd know.

Okay.

It's about right, about 50-50.

So you can't just abandon that, because there's MapReduce out there. So it was, well, what are we going to do? What we did in the Hadoop distro days was create an adaptive big data layer: let's abstract a layer so that when we have to support a new distribution of Hadoop, we don't have to go back to the drawing board. It was the same thing with the execution engines: okay, let's build this adaptive execution layer so that we're prepared to deal with other types of engines. I can build the transformation once and execute it anywhere. That philosophy of stepping back works because, if you have that open platform, you can create those layers to remove all that complexity. If you try to one-off and take on each of those technologies, whether it's Spark or Flink or whatever's coming, as a product, a product management organization, and a company, that's really difficult. So the community helps a ton on that, too.

Donna, when you talk to customers... you gave a great talk on the roadmap today, a glimpse of where you're headed, your basic philosophy, your architecture. What are they pushing you for? Where are they trying to take you, or where are you trying to take them?

Well, hopefully a little bit of both, right? I think it's being able to take advantage of the kinds of technologies you mentioned that are emerging, when they need them. But they also want us to make sure that all of that is really enterprise-ready.
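The "build the transformation once, execute it anywhere" idea mentioned above is essentially a pluggable-engine pattern. A minimal sketch in plain Python follows; it is illustrative only, the function names are invented, and it is not the actual adaptive execution layer API:

```python
from multiprocessing.dummy import Pool  # stdlib thread-backed pool

# The transformation is defined once, independent of any engine.
def transform(row):
    return row * 2 + 1

def run_local(fn, rows):
    """A simple single-threaded engine (think: the native PDI engine)."""
    return [fn(r) for r in rows]

def run_parallel(fn, rows, workers=4):
    """A stand-in for a distributed engine (think: Spark): the same
    transformation definition, a different execution strategy."""
    with Pool(workers) as pool:
        return pool.map(fn, rows)  # map preserves input order

# Swapping engines changes how the work runs, never what it computes.
rows = list(range(5))
assert run_local(transform, rows) == run_parallel(transform, rows) == [1, 3, 5, 7, 9]
```

The design payoff is the one described in the conversation: when a new engine appears, only a new `run_*` adapter is needed, and every existing transformation keeps working unchanged.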
You're making it solid, because we know from history and big data that a lot of those technologies are early. Somebody has to skin their knees with the first one. So they're really counting on us to make it solid, with quality, and to take care of all the intricacies of taking it beyond just open source and delivering a real commercial product. So I think that's one thing. The second piece we're seeing a lot more of as part of Hitachi, as we've moved up into the enterprise, is that we also need to think a lot more about monitoring, administration, security, all the things that go at the base of a pipeline. That's an area where they want us to focus. The great thing is, as part of Hitachi Vantara now, those aren't areas where we always had a lot of expertise, but Hitachi does, because those are kind of infrastructure-ish technologies. So the push to do that is really strong, and now we'll actually be able to do more of it because we've got access to the portfolio.

I don't know if this is a fair question for you, but I'm going to ask it anyway, because you just talked about some of the things that Hitachi brings that you can leverage, and there are obviously a lot of things that Pentaho brings to Hitachi, the family. One of the things that's not talked about a lot is go-to-market. Hitachi Data Systems traditionally doesn't have a lot of expertise in going to market with developers as the first step, which is kind of where, in your world, you start. Has Pentaho been able to bring that cultural aspect to the new entity?

Yeah, so for us, even though we come from the open source world, it's actually less of a developer and more of an architect, or a CIO, or somebody who's looking at it at that layer. And more and more, it's the Chief Data Officer and that type of persona.
I think that now that we are a brand new entity, a software-oriented company, we're absolutely going to play a way bigger role in that, because we've brought software to market for 13 years. We've had early wins, places where we were able to help in an account. For instance, if Hitachi is in the data center, and you start to build that partnership, we can start to draw the lines to: okay, who are the people now looking at the big data strategy? What's the IoT strategy? Where is the CDO? That's where we've had a much better opportunity to get to bigger sales in the enterprise, in those global accounts, and I think we'll see more of that. There's also the whole transformation of Hitachi itself, so there will just be a need for much more of that software experience. And Hitachi has hired two new executives, one on the sales side from SAP, and one who's now my boss, Brad Surak, from GE Digital. So I think there's a lot of good, strong leadership around the software side, plus all the expertise that the folks at Pentaho have.

That's interesting, that the Chief Data Officer role is emerging as a target for you. We were at an event on Tuesday in Boston with about 200 Chief Data Officers. I think about 25% had a robotic process automation initiative going on, and they weren't even asked about IoT; that's just one little piece of it. And then, Jim, data scientists and that whole world is now your world.

Okay, great. Donna Prlich, thanks very much for coming on theCUBE. Always a pleasure to see you.

Yeah, thank you.

Okay, Dave Vellante for Jim Kobielus. Keep right there. This is theCUBE, live from Pentaho World 2017, hashtag PWorld17, brought to you by Hitachi Vantara. We'll be right back.