From Orlando, Florida, extracting the signal from the noise, it's theCUBE, covering Pentaho World 2015. Now, your hosts, Dave Vellante and George Gilbert.

Welcome back to Pentaho World, everybody. This is theCUBE; we go out to all the events. Mike Olson is here, the Chief Strategy Officer, Founder, and Chairman of Cloudera. Mike, you've been on theCUBE probably a hundred times. So welcome back, great to see you.

Good to see you, Dave. George, great to see you. Glad to be here, guys, thanks.

I tweeted that I'd dress up today, you know? No hoodie. And somehow I'm still underdressed next to you.

You look good, it's a pretty tie, I like it.

Well, you know, East Coast guys, right? Anyway, great keynote today, picking up on some of the themes from Strata + Hadoop World: making Hadoop, you said, disappear. I like to say invisible, because it's still there, you just can't see it. That's been pretty important, all this complexity. At Strata + Hadoop World we heard about complexity, about data in motion, and a lot about storage and databases. You guys are evolving the platform, so give us the update on what's happening in your world.

Well, listen, a few things. First of all, notwithstanding your use of the term invisible, that kind of disappearing absolutely needs to happen. People have to stop focusing on the technology and start thinking about the business problems that are getting solved. Notwithstanding that, we've got to advance the state of the art of the platform. It has to be better, more stable, easier to use. It's got to be faster, it's got to onboard more workloads. And we've been driving very hard to make that happen.

You've heard about the big investment we're making in Apache Spark, our One Platform initiative, integrated with all of the rest of the Hadoop ecosystem: secure, governed, manageable, and so on. We've innovated in a couple of ways on security with our RecordService offering, a consistent security layer that makes it much easier for the processing frameworks and analytic engines, for Spark and MapReduce and Impala and so on, to have a consistent, secure view of the data. RecordService provides that without changing those engines the way you used to have to. And then finally, our new Kudu project, a new storage engine for the Apache Hadoop ecosystem, a complement to HDFS, a complement to HBase, lets us handle workloads that neither of those systems could, and we think it's going to unlock new analytic value in applications that you couldn't onboard onto a cluster before because of performance or complexity.

We're super excited about Spark, and I know, George, you want to talk about that. But there have been a few milestones here; first of all, happy birthday to Hadoop. Ten years, ten years. Pentaho's 11 years old, I think, so there are interesting parallels there. Hadoop, you guys got it all started. I mean, I remember when you had no competitors, and then people realized what the opportunity was. Impala changed the world by bringing SQL to Hadoop, and now you're advancing the platform with Kudu. I heard a lot about storage, so help us put Kudu in context. People say it's a replacement for HDFS or HBase; you said in your keynote a couple of weeks ago that it's a complement. Help us sort through that.

So when Google created this technology, the key workload they had, the thing they needed to do, was very large sequential transfers, right? Slurp up a huge swath of the internet and write it out in order, or observe what users were doing on their website, capture the web logs, and write them out in order. Scan through it in order. You never go update a web log; a thing that happened, happened. HDFS was patterned on the Google File System and delivers those same services, and it is best in class at them. You can't find a better large-scale log processing storage framework than HDFS.

It didn't handle an important use case that the community cared about, though. There was no NoSQL capability. You couldn't do very fast record put/get. You couldn't do real-time data serving to an internet app. HBase was created to complement HDFS and provide that service.

Neither one of those storage engines supports very well a very important analytic workload, which is: I'm landing data in time order, right? I've got sensor data streaming in from an IoT app. I've got stock trades coming in. I've got time series data. I'd like to capture that data in order. Sometimes we do want to go update that stuff, so we want to support updates. But then my analytic access pattern is to scan through slices of it and compute aggregates, like the average or the max trade, for example. Neither HDFS nor HBase is good at both random access and scans. Kudu solves that problem; it does that workload very well. HDFS will be here forever; there's nothing you would use for log processing other than HDFS, because that's what it was built to do. HBase, for NoSQL serving, is always going to be a winner. We think we've filled a hole, and we think we're going to attract new workloads with Kudu.
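To make that mixed access pattern concrete, here is a minimal sketch of the workload Olson describes: time-ordered ingest, occasional updates by key, and aggregate scans over a range. It deliberately uses Python's built-in sqlite3 as a stand-in storage engine rather than Kudu's own client API, and the table layout, column names, and numbers are illustrative assumptions, not Cloudera code.

```python
import sqlite3

# Stand-in storage engine; in the workload Olson describes, this role
# would be played by Kudu rather than SQLite.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE trades (
        symbol TEXT,
        ts     INTEGER,          -- event time; data lands in time order
        price  REAL,
        PRIMARY KEY (symbol, ts) -- keyed access enables fast updates
    )
""")

# 1. Time-ordered ingest: sensor readings or stock trades stream in.
ticks = [("CLDR", 1, 14.10), ("CLDR", 2, 14.25), ("CLDR", 3, 14.05)]
db.executemany("INSERT INTO trades VALUES (?, ?, ?)", ticks)

# 2. Occasional update by key: something HDFS's append-only files
#    can't do, which is why pure log storage doesn't fit here.
db.execute("UPDATE trades SET price = 14.30 WHERE symbol = ? AND ts = ?",
           ("CLDR", 2))

# 3. Aggregate scan over a slice: the pattern HBase's random-access
#    layout serves poorly but a scan-oriented engine serves well.
row = db.execute("""
    SELECT AVG(price), MAX(price) FROM trades
    WHERE symbol = ? AND ts BETWEEN ? AND ?
""", ("CLDR", 1, 3)).fetchone()
print("avg=%.2f max=%.2f" % row)  # avg=14.15 max=14.30
```

The point is that neither HDFS (step 2 fails, since its files are append-only) nor HBase (step 3 is slow, since its layout isn't built for fast scans and aggregates) serves all three steps well on a single copy of the data; Kudu is aimed at exactly that gap.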
So I always get nostalgic when you come on theCUBE, because it was through you and the folks at Cloudera that we were able to get into this space early, spending time in your offices. In those early days we had the opportunity to interview guys like Jeff Hammerbacher, who's of course famous for lamenting that the best minds of his generation were working on getting people to click on ads. The industry has evolved so much, and we're solving so many new problems besides clicking on ads.

Clicking on ads: still booming.

Opower was a customer of yours that was mentioned today. You're hearing about innovations in healthcare, Hitachi talking about IoT. You mentioned the shift toward business value. Can you talk about the way this industry has transformed in the last five years?

Yeah. So the original platform was built really to do log processing for Google, and it changed the world by being able to do that better than anything else. But there were a lot of workloads that weren't log processing, right? The kind of IoT data ingest that Opower does, sucking multiple readings a minute off of every single smart meter in Berkeley, California or in Iowa. The analytics that you want to run over that, the different frameworks, machine learning: I want to examine vast amounts of data for patterns, then begin to make real-time predictions and score new events against what I've learned. Those capabilities are now part of the platform. That's why Hadoop is such a big deal. It's really these advanced analytics workloads that are driving massive new value out of data.
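As a rough illustration of that train-then-score pattern, here is a minimal PySpark sketch using the Spark 1.x-era MLlib APIs Cloudera was shipping at the time. The HDFS path, the CSV layout (a label followed by two features), and the smart-meter framing are all hypothetical, for illustration only.

```python
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithSGD

sc = SparkContext(appName="meter-scoring")

# Historical readings, one "label,feature1,feature2" line each
# (hypothetical path and layout).
def parse(line):
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], values[1:])

history = sc.textFile("hdfs:///meters/history.csv").map(parse).cache()

# Examine vast amounts of data for patterns: batch model training.
model = LogisticRegressionWithSGD.train(history, iterations=100)

# Score new events against what was learned. In production this would
# run inside a streaming job; a small RDD keeps the sketch short.
new_events = sc.parallelize([[3.1, 0.7], [0.2, 9.4]])
print(model.predict(new_events).collect())
```

The same cluster that stores the history trains the model and scores new events, which is the "onboard more workloads" story in miniature.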
It's always going to be good to lower your costs, to do more scalable, big ETL faster, to carve down your spend in the data center by going scale-out, a la Hadoop, a la Cloudera. But you're going to drive new revenue, you're going to drive new profit, if you're able to analyze and understand your data in new ways and engage with your customers in new ways, and that's when data becomes strategic.

So we know that YARN and HDFS, sort of as Hadoop 2.0, made a lot of these new capabilities possible, and there's that race to shrink the gap between getting the data, analyzing it, and operationalizing it. What are some of the outlines of Big Data 3.0, or Hadoop 3.0, that you see coalescing?

You know, I talked about it a little bit in my keynote, and we've touched on it here. We need a robust ecosystem of applications and solutions that run on top of the platform. If you're a banker, if you're a hospital administrator, you actually don't want to know about Hadoop under the covers, right? What you want to know is: hey, I've got a bunch of patients coming into my hospital; I want to deliver the best possible care, give them the best outcomes as reliably as I can, and reduce the frequency with which they come back. That's an analytics application, and you want to buy that application. If you're a banker, you care about risk in your equity portfolio, you care about fraud, you care about cybersecurity. We need to move from selling the platform to selling those applications. That's a big part of 3.0.

And that's a big change in the go-to-market and how you sell. I realize the emphasis in what you're talking about to the market changes, but my question is almost an ISV-centric one: what does the ISV need in order to deliver those solutions?

So one answer is good middleware and an easy-to-use development interface on top of the platform. And actually, services like RecordService are meant to make that easier: less customization, and you just benefit from security because it's built in at the platform layer. We made a strategic investment in a company called Cask last year; those guys are developing a consistent set of APIs across the entire platform, designed to make it much easier to build these applications.

But the 3.0 question also opens the door to talk about the new processing frameworks, the new analytic engines that are young today and are going to grow up. And we very much view Spark in that way. It's fantastic, it's transformative. But as born at Berkeley, it had no security model, it didn't have any sort of enterprise-grade storage story, there was no governance, no operations. All that stuff needs to be added. As a vendor, we need to make that happen in order to deliver the analytic capability to the application developer.
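To see why "built in at the platform layer" matters architecturally, here is a deliberately hypothetical Python sketch of record-level security enforced once, beneath multiple engines. Every name in it (POLICIES, secure_scan, the toy engines) is invented for illustration; this is not the RecordService API, just the shape of the idea.

```python
# Hypothetical sketch: one shared enforcement point for record-level
# security, so each engine doesn't re-implement its own.

POLICIES = {
    # user -> readable columns, plus a row-level region filter
    "analyst": {"columns": {"ts", "region", "amount"}, "region": "US"},
    "intern":  {"columns": {"ts", "region"}, "region": "US"},
}

RECORDS = [
    {"ts": 1, "region": "US", "amount": 120.0, "ssn": "123-45-6789"},
    {"ts": 2, "region": "EU", "amount": 80.0,  "ssn": "987-65-4321"},
]

def secure_scan(user, records):
    """Single security layer every engine reads through identically."""
    policy = POLICIES[user]
    for rec in records:
        if rec["region"] != policy["region"]:
            continue  # row-level filtering
        yield {k: v for k, v in rec.items() if k in policy["columns"]}

# Two different "engines" consume the same governed stream, so neither
# carries its own copy of the security logic.
def sql_like_engine(rows):   # e.g., an aggregate query
    return sum(r.get("amount", 0.0) for r in rows)

def ml_like_engine(rows):    # e.g., feature extraction
    return [r["ts"] for r in rows]

print(sql_like_engine(secure_scan("analyst", RECORDS)))  # 120.0
print(ml_like_engine(secure_scan("intern", RECORDS)))    # [1]
```

The design choice this illustrates: the engines stay unchanged while policy is defined and enforced in one place, which is what makes "less customization" possible for the application builder.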
One other question. You know, there's the arc from Geoffrey Moore's Crossing the Chasm and Inside the Tornado: as projects become solutions and solutions become products over time, the services-to-license mix changes; we're not dealing with licenses now, but you get something that's more repeatable. And Amazon, Google, and Azure are building their own native services that are not necessarily open source. They're not always Hadoop; they might have those as well, but they fit together really well. To what extent are those competitors, now and in the future?

That's a good question. So first of all, public cloud deployment, public cloud infrastructure, is a fact of life that every single technology vendor just has to flat-out acknowledge, right? We've got to deliver, and we do deliver today, our platform on Azure, on Amazon Web Services, on Google Compute Engine, and we'll continue to do that, along with a bunch of smaller enterprise clouds as well. We go to market with an integrated platform, MapReduce and Spark and Impala and our search engine, with a storage framework and security, in much the way that there are composable services at the big web service providers. We think we've got a coherent platform that you can pick and choose the pieces of, and we do, and will continue to, deliver that on these public clouds. I think our focus on large enterprises and their use cases lets us go after a valuable segment of the market that we're deeply expert in servicing. Our ability to run on the fantastic infrastructure that both Amazon and Microsoft provide, the object stores, the consistent compute layer, makes our lives and our customers' lives easier, and innovation in the public cloud is going to continue to drive prices down. It's actually why we're so bullish on the big providers.

Mike, I've got to ask you about Pentaho, Pentaho World, the relationship. Hadoop and Pentaho, both about a decade old, very impressive presentations this morning about the depth of the platform. Talk about the relationship a little bit.

We've been close to Pentaho almost since the day we opened our doors. They were one of the earliest companies to embrace big data and Hadoop as a processing framework. The relationship has been cordial for a very long time, but when Quentin Gallivan assumed the CEO role some years back, I would say he really turbocharged it. First of all, Quentin is a quality guy. He's salt of the earth; if he promises you he's going to do something, by God he lives up to it. High energy, too. A fantastic partner for us. Under his leadership, and under the great product and strategy teams he's put together, Pentaho has been among the very earliest vendors to embrace, for example, Impala and Spark. How do you build the best analytic application? You take advantage of the most powerful analytic frameworks, and Pentaho has been aggressive in doing just that. We've got a huge number of joint customers. Pentaho is, to me, the kind of solution I was talking about. I sell a platform to IT guys; I need Pentaho to walk in and talk to the line-of-business owner at a financial services institution about value-at-risk calculations, which they actually can deliver. That'll drag the platform into deployment. Pentaho is a big forward bet by Cloudera on our continued growth, and with them I think we can do great things.

So I've got to ask you, you're seeing all these trends: Dell went private, EMC now looks like it's going private, BMC, Informatica. You saw a recent IPO, Scott Dietzen's company just went public. You guys had a huge funding event a little while back. Is life as a private company good? Do you need to be public to get more awareness? What's your position on that?

You know, I don't think our challenge right now is awareness of what we're doing. For sure, the strategic partnership with Intel, and the really substantial cash investment they made in the business, has bought us options, alternatives that are unique. We've got the resources we need to pursue strategic acquisitions and to grow organically in the ways that we like. Now, as a founder, you know, I've told you since the very beginning: we intend one day to be a publicly traded company. We're able, given the resources we got from Intel, to make the decision about when that will be.
So I'm absolutely confident we're going to be a publicly traded business, but we've got the luxury of doing it on our own timeframe.

That's great, do it on your own terms. Mike, great to see you again. Thanks so much for coming on theCUBE. I know you've got to run; always a pleasure.

Dave, thank you. George, a real pleasure, thanks.

Keep right there, everybody. We'll be back with our next guest. This is theCUBE; we're live from Pentaho World 2015. We'll be right back.