Here live, this is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise, and this is our exclusive coverage of the Stanford Accel symposium. Accel and the enterprise is the big focus. Accel's a venture capital firm, and it's exciting. They're here at Stanford, bringing out all their top executives, people in industry, and also obviously industry thought leaders. I'm here with Jeff Kelly from Wikibon, my co-host, and our guest is Mike Olson, the CEO of Cloudera, an Accel-funded startup, well, not anymore, did a series of... Mike, welcome back to theCUBE.

John, Jeff, thanks very much for having us.

You know, we go back to when we started SiliconANGLE, when we were just a small little project in your office in Palo Alto, when you guys moved down and started growing. You were still 35 people then; now you're hundreds, with a lot of expansion. You guys were the only company in the space, and as the CEO you've grown a great team. You guys have grown, you've changed. It's Cloudera 2.0: new folks have come in, we've grown, the market's grown, you have competition. How are you doing? Very personally, you personally, Mike Olson.

So the answer's very well. You know, as you say, the company is different today from what it was three, four years ago when we first started spending time with you folks, five years ago when we started. Of course, the job and the demands on all of our time, the work that's required of us, and the way we engage day to day have changed as a result, but it's been tremendous to watch this business grow apace with the market around it. You know, we really feel like we're in the middle of a whirlwind right now. There's lots of demands, lots of pressure, lots of opportunity, but it's a lot of fun. Anything calmer would not be nearly as good, I think.

What do you think about the industry? Because when we were first talking to you guys, your video was number one.
We've been doing cubes and, you know, we remember the second Hadoop World, years ago. You did the Mike Olson Hadoop 101. So are we in a 301 now? What class are we in, and what are you guys doing right now at the current level of complexity? First it was evangelizing Hadoop. Okay, Hadoop has changed to big data and analytics, and the apps are starting to proliferate. You're seeing a software explosion, commodity scale-out and open source fundamentally changing the infrastructure, data complexity, and the enterprise adopting it heavily. Your competition is the IBMs, the EMCs, and all those guys. It's changed. What is the 301 for Cloudera?

Well, you know, a couple of things. When we started Cloudera, we were the only company talking about big data, for sure the only company concentrating on delivering Hadoop as a platform to traditional enterprises. Obviously, that's no longer the case. There are many, many vendors that have entered the space with big data solutions, most of them trying to build on and, in some instances, to some extent co-opt Hadoop. But Hadoop the platform has changed dramatically as well. When it was first created, when Google invented this scale-out storage infrastructure, it had a single way of getting at your data: MapReduce. It was this batch data processing engine. It was transformative and powerful and changed the way that organizations could attack their data. But it was really just one way of getting at the data. Over the years, we've seen other ways of getting at the data added to that platform. One big shared store, lots of data from lots of sources and in lots of different formats, all in a single repository. Yeah, you can MapReduce that. You can use machine learning and natural language processing in batch mode. But you can also serve data out in real time. If you've got web applications and need to be able to fetch records interactively, well, HBase is tremendous for that.
Last year, we introduced Impala, our scale-out, interactive-speed SQL engine that runs on that same iron, yet another engine for getting at your data. Just today, we've announced general availability, version 1.0, of Impala. And yesterday, we announced with SAS, the numerical analysis mathematics company, a port of their software so that it likewise runs natively on the Hadoop platform. Scale-out computation and analytics using the SAS engine, using Impala, using HBase, using MapReduce. A variety of engines brought to the data. The real opportunity, we think, in big data is precisely that: all your data from all those sources in one place, and then lots of ways of getting at it, right? Not just MapReduce anymore, but support for BI applications using SQL, and support for sophisticated analytics apps using SAS. I predict you'll see this platform get more interesting still, with more engines added to attack more problems.

I know Jeff's got some questions, but I've got to ask you about the Cloudera status update. Obviously, you've had more funding announcements. You guys are expanding. Just talk about some of the successes you guys have had. Just explain to the folks, because I don't think anyone can appreciate what you've done in two, three short years. Where are you now? You're on a trajectory. Just describe for us some of the highlights.

Well, as you know, hundreds of employees. A tremendous enterprise, Global 1000 customer base, folks in virtually every vertical market now using the big data platform to attack real, meaningful business problems. We are all over the United States, with development and sales offices in about 30 states at this point. We've got offices in Europe. We are in force in Asia, primarily right now in Japan, but looking to expand more broadly. And we are beginning to expand the partner base even more. North of 600 partners, 200 ISVs right now building applications around the platform. We think this is the next generation of data management platform.
We're really seeing it evolve the way that relational databases did in the 80s. Applications and skills are proliferating, and that's going to drive adoption significantly worldwide.

Mike, I'm curious to get your take on what's called the internet of things, or the industrial internet. We're hearing a lot about that lately in the press. The idea is that these really industrial-level pieces of equipment are creating tons of data, and companies want to interconnect them and create pretty complex systems so that they can make them more efficient and respond in real time to anomalies and things like that. What's Cloudera's play in that space? Is Hadoop a kind of repository where maybe you'll bring that data in? Or how are you helping customers build that larger infrastructure and those systems to actually make this a workable platform for that kind of use case?

Yeah, so we think that that is and will be a major driver of the platform's adoption. You think about it: back in the 80s and in the 90s, we generated data at human scale. So you'd hire or fire somebody, and an employee record would happen. Or you'd go to the store and you'd buy something, and maybe a transaction record would happen. When data's getting generated by human activity, it can only grow so fast. You have to go do something to make data. When machines start talking to machines, things accelerate in a hurry. Think about virtually every piece of serious equipment today: jet engines, robots on a manufacturing floor. Heck, even think about your refrigerator, think about the smart meter outside your house. All of these are instrumented extensively. They have sensors that are streaming data continually back to the mothership. If you buy a car now, some insurance companies will reduce your insurance rate if you instrument the car so that it can verify that you're using good driving techniques. Now, of course, nobody wants to do that for themselves.
Everybody wants to do that for their teenagers. But data is going to proliferate in exactly that way. Most of the data in the future is going to be machine-generated data from this internet of things. And then the way that you capture and analyze that data, well, look, if you're a jet engine, you've got one set of problems. What kind of preventive maintenance, what kind of predictive maintenance do I want to do on this platform? If you're capturing data streaming off the smart grid, hey, what do demand curves look like? How should I think about spinning up new generating capacity in order to be in front of the demand curve? So the applications are going to vary, but the desire for data capture and data analysis is going to explode across all of those sectors.

Mike, I've got to ask you a question. We've been covering you guys so long. Two years ago, you and Ping went up and said, we're going to do a $100 million big data fund. I remember it was at Hadoop World, and you said, we believe that a lot of applications are going to be built on the platform. So give us an update. What have you learned? What did or didn't happen? Obviously analytics exploded as the killer app, so that next year analytics kind of dominated the conversation. And now you're starting to see apps across all industries. So share what you learned from that time: what did and didn't happen?

Well, a couple of things. We believed then, and we believe now, that the proliferation of tools and applications is what will really drive adoption of the platform. The reason that we were excited about Accel's decision to invest in that way was that it would build user-facing tools that made the platform consumable. Now, that's absolutely happened. So applications like Causata, for predictive analytics and customer offer optimization in retail. Absolutely tremendous, built natively on the platform.
That means a whole bunch of customers who wouldn't build that application, who don't have the skills to build that application, can just buy it and run it and get the advantages of big data. And we're seeing lots of those end-user-facing applications come on board. There are a bunch of existing tool vendors, think Informatica, MicroStrategy, I mentioned SAS a little while ago, that provide data analysis tools to their users. There's a new class of those players emerging as well. So companies like Platfora, for example, that are now delivering the first generations of native, built-on-top-of-Hadoop analytic tools and applications. The one sector that I probably didn't foresee, that I probably should have foreseen, was, you know, one of the biggest problems in big data is just cleaning it. What have you got? Where did it come from? How can you scrub it? Data exploration and transformation tools turn out to be a pretty big sector. There's a great Accel-backed company called Trifacta working on exactly that problem.

I think that's what we're making more of. Your friend, the Berkeley professor.

Yeah, Joe Hellerstein, and Go Bears, let me say right here on theCUBE.

So let's talk about that. Here at Stanford, I just had Tom Byers on talking, and he's a Berkeley guy too, who actually works at Stanford. So you guys have a lot of Stanford PhDs at work. A lot of PhDs are coming into the field and are leading companies in big data. What do you see happening for these new entrepreneurs? Because there are two types of entrepreneurs that we've been documenting on SiliconANGLE and Wikibon. I want to say they're kind of different profiles. Hadoop brings on people solving big problems. Amazon developers are kind of the LAMP stack developers. They're out doing some really good, fast mobile apps, agile, blah blah blah, all that great stuff. And they have a lot of traction. They have turnkey infrastructure and DevOps. And Hadoop has always been kind of looked at as hard to use.
So with these PhDs coming on, what do you see as that profile? Is it getting easier? And how does that all connect?

Yeah, so we're seeing the industry skill up, for sure. More people understand how to use the platform better. And we do a lot of training. We've trained more than 15,000 people in the company's lifetime, with more than 5,000 certified on Hadoop. I was on a panel here at the event just a little while ago. We need more skills. We need deeper expertise on the platform, but what we really need is good software solutions, so ordinary mortals can just buy apps that they can use to attack those problems. The PhDs you're talking about, the geniuses that graduate from top-flight universities like Stanford, go build those applications. They have the deep science to make that happen and deliver it, we believe, in a package that makes it consumable by ordinary mortals. And let me say, I wouldn't in any way suggest that those building on public cloud infrastructure, Amazon in particular, aren't skilled up to do it. I think that public cloud is going to be an increasingly popular way of consuming this platform. It's much easier to spin up a cluster for quick analytics. It's much easier to build a proof of concept or dev/test cluster in the public cloud, inexpensively. Making sure that the platform runs well on that infrastructure is an absolutely critical task for a company like us that is concentrating on driving adoption.

Yeah, I think you make a really good point that the public cloud might be a good space to do some of the analytics, because a lot of the third-party data you might want to bring in already lives in the cloud. By definition, it's not inside any one company's data center. But of course you've got concerns around security and privacy and things like that. And you also mentioned we're seeing a lot of test and dev situations arise in the public cloud.
Do you also see that potentially expanding to actual production level? Or do you suspect that, as we're seeing now, a lot of those are being brought back in-house when they move to production? A couple of questions there, but I want to get your take.

So I guess the answer is, we're not driving public cloud consumption. We're not particularly driving behind-the-firewall consumption of the platform. Our point of view is, we're a software vendor, and we want you to use our software wherever it makes sense for you to do that. Most of our customers need to do that where the data already is. If you're a bank or if you're a hospital, you've got a big data center, you've got a lot of data already deployed, and as a result, you're probably going to run in your own data center. You may, by the way, have regulatory or legislative reasons that you need to keep data behind the firewall. If, however, as you say, you're taking advantage of publicly available data sets, or if you're running a bunch of cloud infrastructure right now to deliver, for example, your website to users, and your web logs are getting generated in the public cloud, heck, that's a perfectly natural place to run. I predict we'll see increasing adoption of public cloud infrastructure for these applications, but it's not going to happen suddenly. It's going to be over time; we're going to see steadily more and more.

My final question, we're getting the hook here. Great to have you on theCUBE. Final question, just answer real succinctly: your goals for next year for Cloudera. What are you guys going to do this year that you're going to be proud of, your key targets?

So I think the answer is, we want to take some of these application vendors that have made an early bet on the technology, and we want to drive them to ubiquity. I want to see really important business problems and socially meaningful problems solved on this platform.
There are folks working on optimization of energy delivery, on better healthcare, on how to produce food to feed the entire planet. Now, we don't have the expertise to do that, but we can provide tools and platforms that do, and I want to be sure that the vendors, the entrepreneurs, the innovators that have developed those applications see real success. Clearly, that's going to make a difference for us, we're going to make more money, but in addition, I think it's going to deliver the value that the platform really offers. It's socially meaningful, not merely commercially meaningful.

Okay, Mike Olson, CEO of Cloudera, inside theCUBE. This is Accel's Stanford symposium. This is theCUBE, and we'll be right back with our next guest after this short break.