Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017.

Okay, welcome back, everyone. Here live in Silicon Valley, this is theCUBE. I'm John Furrier, covering our Big Data SV event, hashtag BigDataSV, our companion event to Big Data NYC, all in conjunction with Strata Hadoop. The big data world comes together, and it's great to have a guest come by: Donna Prlich, who's the senior VP of products and solutions at Pentaho, a Hitachi company. We've been following you since before Hitachi acquired you guys, but you're unique in the sense that you're a company within Hitachi, left alone after the acquisition, and you're now running all the products. Congratulations, welcome back. Great to see you.

Thank you, good to be back. It's been a little while, but I think you've had some of our other friends on here as well.

And we'll be at Pentaho World, which you have in Orlando. I think it's October.

Yeah, October timeframe. So, excited about that too.

I'm sure the agenda's not yet baked for that because it's early in the year, but what's going on with Hitachi? Give us the update, because you've now got your purview into the product roadmap. In the big data world, you guys have been very, very successful taking an approach to big data that's been different and unique compared to others. What's the update?

Yeah, so very exciting, actually. We've seen, especially at the show, that the big data world, we all know that it's here, it's monetizable. It's where we shifted five years ago, and it's been a lot of what Pentaho's success has been based on. And then we're excited because the Hitachi acquisition, as you mentioned, sets us up for the next big thing, which is IoT. And I've been hearing nonstop about machine learning, but that's the other component of it that's exciting for us. So, yeah, Hitachi, we're still running standalone.

You guys doing a lot of machine learning?

A lot of machine learning.
So, we announced our own kind of orchestration capabilities that really target, it's less about building models and more about how you enable the data scientists and data preparers to leverage the actual intellectual property that companies have in those models they've built to transform their business. So, we have our own, and then the other exciting piece on the Hitachi side is, on the products, we're now at the point where we're running as Pentaho, but we have access to these amazing labs. There's about 25 to 50, depending on where you are, whether you're here or in Japan, and those data scientists are working on really interesting things on the R&D side. When you apply those to the kind of use cases that we're solving for, it's just like being a kid in a candy store with technology. So, that's been great.

You've got a built-in customer there. But before I get into the product, where I have some questions on what's uniquely happening with you guys, especially with machine learning and AI as it starts to really get some great momentum, I want to get your take on what you see happening in the marketplace. You've seen the early days, and it's now hitting a whole other step function as we approach machine learning and AI, autonomous vehicles, sensors, everything's coming. How are enterprises and these new businesses, whether they're people supporting smart cities or smart homes or autonomous vehicles, what are the trends that you're seeing that are really hitting the pavement here?

Yeah, I think what we're seeing is, and it's been kind of Pentaho's focus for a long time now, that it's always about the data. You know, what's the data challenge?
And some of it is the amounts of data, which everybody talks about from IoT. And then what's interesting is it's not about the concepts around AI, which have been around forever, but about when you start to apply some of those AI concepts to a data pipeline, for instance. We always talk about that data pipeline. The reason that's important is because you're really bringing together the data and the analytics. You can't separate those two things, and that's been not only a Pentaho-specific bent that I've had for years, but a personal one as well. When you start separating them, it makes it really hard to get to any kind of value. So I think what we're doing, and what we're going to be seeing going forward, is applying AI to some of the things that in a way will close the gaps between the process and the people and the data and the analytics that have been around for years. We see those gaps closing with some of the tools that are emerging around preparing data, but really, when you start to bring some of that machine learning into that picture and you start applying math to preparing data, that's where it gets really interesting, and I think we'll see some of that automation start to happen.

So I've got to ask you, what is unique about Pentaho? Take a minute to share with the audience some of the unique things that you guys are doing that are different in the sea of people trying to figure out big data. You guys are doing well, and you wrote a blog post that I referenced earlier yesterday around these gaps. What's unique about Pentaho, and what are you guys doing, with some examples that you could share?

Yeah, so I think the big thing about Pentaho that's unique is that it's solving that analytics workflow from the data side, always from the data. We've always believed that those two things go together.
So when you build a platform that's really flexible, that's based on open source technology, and you go into a world where a customer says, I not only want to manage and have a data lake available, for instance, I want to be able to have that thing extend over the years to support different groups of users. I don't want to deliver it to a tool, I want to deliver it to an application, and I want to embed analytics. That's where having a complete end-to-end platform that can orchestrate the data and the analytics across the board is really unique. And what's happened is, it's like the time has come, where all we're hearing is, hey, I used to think it was throw some data over and here you go, here are the tools. The tools are really easy, so that's great, now we have all kinds of people who can do analytics, but who's minding the data? And so with that end-to-end platform, we've always been able to solve for that. And when you add in the open source piece, that just makes it much easier when things like Spark emerge, right? Spark's amazing, right? But we know there are other things on the horizon, Flink, Beam. How are you going to deal with that without being open source?

You guys made a good bet there, and your blog post got my attention because of the title, and it wasn't clickbait either. It's actually a great article, and I just shared it on Twitter. The holy grail of analytics is the value between data and insight, and this is interesting: it's about the data, it's in bold. Data, data, data. Data is the hardest part, I get that. But I've got to ask you, with cloud computing, you can see the trends of commoditization. You're renting compute, you're renting stuff, and you've got tools like Kinesis and Redshift on Amazon, and Azure has got tools, so you don't really own that. But the data, you own, right? So you've got to...

That's your intellectual property, right? That's your organization.

But that's the heart of your piece here, isn't it?

Yes, it is.
The holy grail, what is that holy grail?

Yeah, that holy grail is when you can bring those two things together, the analytics and the data, and you've got some governance, you've got the control, but you're allowing the access that lets the business derive value. So for instance, we just had a customer, I think Eric might have mentioned them, and they're a really interesting customer. They're one of the largest community colleges in the country, Ivy Tech, and they actually won an award for their data excellence. What was interesting about them is, they said, we're going to create a data democracy. We want data to be available because we see students dropping out. We can't be efficient. People can't get the data that they need. We have old-school reporting. So they took Pentaho and they really transformed the way they think about running their organization and their community colleges. Now they're adding predictive to that. So they've got this data democracy, but now they're looking at things like, okay, we can see where certain classes are over capacity. But what if we could predict for next year not only which classes are over capacity, but what's the tendency of a particular student to drop out? What could we do to intervene? That's where the cool machine learning starts to apply. Well, Pentaho is what enables that data democracy across the board. And so I think, when I look at it from a customer perspective, it's only going to get more interesting.

And with RFID and smartphones, you could have attendance tracking too. And say, yeah, who's not showing up?

Absolutely. And when you bring Hitachi into the picture and you think about it, for instance, from an IoT perspective, you might be capturing data from devices and you've got a digital twin, right?
And then you bring that data in with data that might be in a data lake, and you can set a threshold and say, okay, not only do we want to be able to know where that student is or whatever, we want to trigger something back to that device and say, hey, here's a workshop for you to go log into right away, so that you don't end up not passing the class or whatever it is. I mean, it's a simplistic model, but you can imagine where that starts to really become transformative.

So I asked Eric a question yesterday. It was from Dave Vellante, who's in Boston, stuck in the snowstorm, but he was watching, and I'll ask you to see how it matches. He worded it differently on CrowdChat, which was public, but this is from my chat. HDS is known for mainframes historically, and storage, but Hitachi is an industrial giant. How is Pentaho leveraging the Hitachi monster?

Yes, that's a great way to put it.

Or Godzilla, because it's Japan.

We were just comparing notes, we were like, well, is it an $88 billion company or $90 billion? According to the yen today, it's 88. We usually say 90, but close enough, right? But yeah, it's a huge company, in every industry, making all kinds of things. Pretty much they've got the OT of the world under their belt. So how we're leveraging it is, number one, what that brings to the table in terms of the transformations from a software perspective, and the data and the expertise that we can bring to the table. The other piece is we've got a huge opportunity via the Hitachi channel, which is what's driven the growth that we've had over the last couple of years, which has been really significant since we were acquired. And then the next piece is, how do we become part of that bigger Hitachi IoT strategy?
And what's been starting to happen there is, as I mentioned before, you can probably put the math together without me giving anything away, but you think about being able to capture device data, being able to bring it into the digital twin, all of that, and then you think, okay, what if I added Pentaho to the mix? That's pretty exciting. You bring those things together, and then you add a whole bunch of expertise and machine learning, and you can start to see that the IoT piece of it is where we're really going to have to be.

IoT is a forcing function, would you agree?

Absolutely.

It's really forcing IT to go, whoa, this is coming down fast. And AI and machine learning and cloud are just forcing everyone.

Yeah, exactly. And when we came into the big data market, whatever it was, five years ago, in the early market, it's always hard to get in there. But one of the things we were able to do, when people were still just talking about BI, was say, have you heard about this stuff called big data? It's going to be hard. You are going to have to take advantage of this. And the same thing is happening with IoT. So the fact that we can be in these environments where customers are starting to see the value of the machine-generated data, that's going to be the end.

And it's transformative for the business, like the community college example.

Totally transformative, yeah. The other one was, I think Eric might have mentioned it, IMS, where all of a sudden you're transforming the insurance industry, which was always looking at charts of, I'm a 17-year-old kid, okay, your rate should be this because you're a 17-year-old boy. And now they're starting to track the driving and say, well, actually, maybe not. Maybe you get a discount.

Time for the self-driving car. I'm transforming, okay? Well, Donna, I appreciate it. Give us a quick tease here on Pentaho World coming in October.
I know it's super early, but you have a roadmap on the product side, so you can see a little bit around the corner.

Yeah.

What is coming down the pike for Pentaho? What are the things that you guys are beavering away at inside the product?

Yeah, I think you're going to see some really cool innovations we're doing, and I won't, on the Spark side, but with execution engines in general, we're going to have some really interesting, innovative stuff coming, and more on the machine learning side coming out. And if you think about it, if data is the hard part, just think about applying machine learning to the data, and I think you can think of some really cool things.

We're going to need algorithms for the algorithms, machine learning for the machine learning, and of course humans to be smarter. Donna, thanks so much for sharing here inside theCUBE, appreciate it. Pentaho, check them out. We're going to be at Pentaho World in October as well on theCUBE, and hopefully we can get some more deep dives with our analyst group on what's going on with the engines of innovation there. More CUBE coverage live from Silicon Valley for Big Data SV, in conjunction with Strata Hadoop. I'm John Furrier, I'll be right back with more after this short break.