Okay, we're back here live in New York City for Strata + Hadoop World, Cloudera's conference, and this is our third season here. I'm John Furrier, the founder of SiliconANGLE.com. I'm with Dave Vellante, the co-founder of Wikibon. We have Mike Olson, the CEO of Cloudera. I know you're really busy. We've got tight schedules. We'll jump right into it. Mike, Cloudera, what a morphing story it's been over the past three years. I think you have hundreds and hundreds of employees now, up from the 30 when we first met at Hadoop World. We were invited by you guys three years ago, 2010. It's our third season, I guess season is what we call it, of theCUBE. We've been at all of them. We love it, and you guys started the show. This year, the show is now being run by O'Reilly. So before we get into it, just give folks the story about what the show is here, and I see Cloudera's co-producing with O'Reilly. Give a quick sound bite on that.
Yeah, so let me give you a quick rundown. So this is the fourth of these shows ever. In 2009, Cloudera organized the first ever Hadoop World here in New York City. 500 people showed up, and I have to tell you, we were ecstatic. I couldn't believe there were 500 people on the planet who knew what Hadoop was and didn't work for Cloudera. The enthusiasm there, the energy there convinced us that interest on the East Coast and among business users was deep and real, and we should double down. Second year, the first year you guys came out, 800 people. Last year, 1,400 attended. This year, 2,500 people sold out the show well in advance. We made the decision after last year's show to team up with O'Reilly to produce a joint show and did that for a few reasons. First of all, I think that the big data and real social use case stories that get told at Strata are fantastic. I think that it is content and it is information that we weren't always presenting given our focus on Hadoop.
In addition, we thought that by bringing the two events together, we'd be able to attract a much broader population. 2,500 people later, look, I'll tell you what, I think we could have sold a lot more tickets than that.
People were getting kicked out of the door because there was not enough room, fire marshal issues like last year.
Yeah, if we had been able to find a bigger venue, I think it would have been great. I don't know what we're going to do next year. This is about the biggest you can do in New York City unless you want to go to the Javits, and there you kind of need 50,000.
Yeah, we're almost there, but O'Reilly knows how to run events as the brother of the president. Plus, you're the CEO of the business and it's taxing on resources internally too. The whole Cloudera team ran it last year. So it's like, oh, there's that.
I mean, you know, there are a lot of advantages to controlling an event as well, right? I mean, we can set the schedule, we can have a little bit more of a hand in choosing the content, not that there was any problem with the quality of the submissions or with the process to bring them in. 10 papers were submitted for every one that we were able to accept this time. Good news, if you're here, the quality is extraordinary. The bad news is 90% of the people doing great work didn't get their work presented.
We hear it's harder to present at Strata and Hadoop World than it is to get into an Ivy League college, so there you go. Okay, well, that's great. I think it's a great gesture that you guys open up to the community. So you're still leaders in the event and co-produce with O'Reilly. Great stuff, got that out of the way.
My next question is, Cloudera, give us a quick update because you guys are growing, you have a strategic shift with Impala, your big announcement, which we want to talk about, but Cloudera's a company that you're leading, you've got new personnel, your packages are increasing in terms of people, your scope, just give us a quick update on Cloudera.
The company has grown dramatically. We've got offices now all over the United States. We remain heavily centered in California, so our Palo Alto headquarters and a satellite office in San Francisco, really for recruitment purposes. But in Washington, D.C., here in New York City, in North Carolina for developers and test, we've got physical presence as well. We have people on the ground in Japan, in Europe, in the U.K., and when we think about where growth is going to happen in the coming year, it's, for sure, in our ability to reach new customers. So technical field and sales, we expect to grow pretty significantly, and we expect to expand overseas in a very focused way. So we've got pockets of people, but I expect Kirk Dunn, our Chief Operating Officer, to be concentrating on building out infrastructure overseas.
Last year, we were here, you were with Ping Li of Accel, who financed Cloudera and is on your board, talking about the Big Data Fund. And last year, a big focus for you was the applications. So I want to ask you, was that a disappointment to you? Were there other forces in the market? Were you happy with the adoption of applications? Obviously, the killer app that we've been seeing on theCUBE is analytics and insights, which, there's no debate, that's smoking hot. So what's your view on the applications? Because last year, that was a big focus. Did they materialize? Was the middle way of working? You're seeing AppFabric from Continuuity. You're seeing a lot of people putting out new things. You have Impala. Talk about the app market. Is there an app market yet? Is it just analytics? What's your view?
So the war's not won, but the battles are going well. If you look at the existing established vendors, MicroStrategy has done a great job of taking its BI tooling and pointing it at Hadoop, and specifically at Cloudera's platform, and making sure that ordinary business users who know how to interact with MicroStrategy can analyze data in Hadoop. Informatica likewise, you can now design data processing pipelines graphically and push that work down to a relational database or a special purpose Informatica engine or to Hadoop. So that kind of integration lets users who have understood those tools well, often for years, get at Hadoop. But you're going to see some really exciting companies launch here. Ben Werther from Platfora gave a great talk about the visualization and data exploration tool that they've built that runs on top of the Cloudera platform and integrates natively with Hadoop. Our long-time partners Datameer and Karmasphere have good products. This is the quarter, and 2013 will be the year, I predict, of real explosion there. So we've seen enormous progress. It is much easier for business users to use Hadoop than ever before, and I think over the course of the next five or eight quarters we're going to see even more of that.
So on the applications, you're happy. The battles are being fought, with some wins here and there. But analytics obviously has been the big first generation focus. Let's talk about simplicity and real time, because this segues into Impala. Making Hadoop easy has always been something that people have been talking about. So you're seeing some progress there. So kind of creating a hardened top with the integration of SQL. We just had Hadapt on, talking about integrating natively into the clusters rather than building connectors. So that's one question I want to ask. Can you talk about that dynamic? And then secondly we'll go back to real time.
So, appreciate the opportunity.
So let me begin by talking about Impala, what it is we're announcing and what it does. We have known for a long time that batch data processing in Hadoop, powerful, flexible, innovative as it is, solves only part of the big data problem. Not every user can tolerate the latency, not every workload can put up with the delays and the performance that batch data processing imposes. Hadoop lets you store any kind of data. You can store it in enormous volume. What if, we and our customers said to each other, you could get all that power of MapReduce, you could do all those complex analytics, but you could also choose to ask interactive speed queries of your data? I don't want to have to move it out, I want it knit directly into the Hadoop framework. Impala is a distributed query processing engine that's a first class citizen in the Hadoop ecosystem. Your data stays exactly where it is. You leave it in HBase, you leave it in HDFS, you type a SQL query and you don't wait minutes, you get answers back in seconds, instant responses, so you can work at the speed of thought.
Oh, go ahead, sorry.
I was going to say, it gives users and developers the choice of how they get at their data and what kind of analytic and query support they require. So they get the best of all the power of MapReduce and the interactivity of a traditional SQL engine in the identical platform.
Now you came out of the SQL world, you know it well, so talk about the impact on adoption, particularly in terms of the skill sets that are out there, people who know the language and understand SQL.
So in terms of professional skills, if you know how to type a SQL query, you now know how to talk to Cloudera's platform. Impala lets you get at data stored in Hadoop as a first class citizen. Beyond that though, think about the enormous number of high quality applications and tools that speak ODBC and JDBC, right?
Those now get to point at Hadoop and they get exactly the same responsiveness and exactly the same behavior out of the big data platform that they've long experienced from the existing relational players. So it opens up not just more users and more developers, but more existing tooling that people can point at Hadoop, and that'll drive adoption for sure.
So it's clearly a good thing for the marketplace. How does it affect the relationship with some of the current players? Like for example, we were at Oracle OpenWorld this year, and Larry basically paints this picture: do your filtering in Hadoop and then bring it in, big data meets big iron.
As you say, I'm an old guard relational developer. I grew up in the RDBMS industry beginning in the 1980s. I watched that industry and I watched those products mature and I will tell you, they are excellent. Look, if you've got a high performance transaction processing workload, right? If you're doing banking transactions, if you're running an OLAP analytic application where you're flying through the cube and looking up and figuring out, you know, in real time what users are doing, you're going to continue to run that on your big enterprise data warehouse. You now have the option of taking some of the workloads that were forced to run there before, because there was simply nowhere else to put them, and letting them run on Hadoop. So the idea is, look, if I've standardized on a big EDW for all my data management, well, for sure I'm flying through cubes and doing transaction processing, but I'm also doing ELT. And you know what, those are just data processing workloads, and you're paying first-class fares to run that workload on your big EDW. What if you could free up that capacity to do yet more analytics and do the data exploration, the reporting, the interactive exploratory queries on a much more scale-out and much lower cost infrastructure? That I think is the exciting opportunity. That's the disruptive.
Sounds like the mainframe transition to me. But the cost is so low, it's okay. So that market is waiting to be disrupted. It's being disrupted. What are the barriers?
Let me say it's not merely that the cost is lower, but suddenly you've got a place where you can keep not just the last quarter's worth of data available for analysis, but the last decade's.
Yeah, so the innovation curve's also there. The real time, let's talk about real time. So that's a real time benefit. So that's a different mindset. So I ask about the barriers because those guys aren't going to be disrupted quietly. They're going to hold on and clutch on to their data warehousing solutions, and eventually the smart ones will move over fast to this new concept. So what are the barriers to get to that real time, low cost, high performance environment?
Well, so let me say first that, as I said, I think that there are workloads that still naturally belong in the big EDW and in the big relational systems, right? And Hadoop wasn't designed to attack online transaction processing, right? By letting you do exploratory queries, we've opened up a bunch of flexibility and we've enabled a bunch of new uses. We've got a tremendous relationship with our good partner Oracle, right? They resell Cloudera's software as a part of the Big Data Appliance. And you're able to stand up an Exadata relational system, performing the kind of operations that I talked about at scale. And next to it, you can put an Oracle Big Data Appliance with the high performance connector between the two. So you can do all of your data scrubbing and cleansing and then blast it into the EDW for further interactive exploration. I think customers and both vendors.
That's important, I want to capture that point because I think we're seeing a lot of that with IBM. We were just at IOD, Information On Demand, early in the week doing theCUBE there, and they still have a huge mainframe business, right?
So at the top, the flagship offering, yeah, the price performance, there is still that market. Well, you're saying, if I get this properly, that the new use cases are the mid-range. That's exploding, that wasn't there before. Is that what you're saying?
Or you're able now to take workloads that didn't need those really high performance, extremely demanding services and choose, when it makes economic and performance sense to do it, to run them in Hadoop instead. And by the way, if you're running them in Hadoop, it's not just that work you can do, but you've got this hugely powerful analytic engine in MapReduce. And now you can ask questions about the last decade's worth of point of sale transaction data, and you can build behavioral models of users over time, and you can do predictive analytics over what people are going to prefer based on a decade's worth of history. So you get the best of MapReduce and interactive query support. It just wasn't economically feasible on big iron except for a very small select group of customers.
My final geek question before we kind of wrap up is HBase. Obviously HBase has morphed, and we've been showing you what we've been working on. You've been following some of the little tool we built. And it's been great. It's been such an amazing product. But now HBase has grown up to be really big and popular for a lot of different instances. I notice HBase is a key part of Impala. Can you talk about the vision of HBase? Obviously it's growing outside of the geek community. It's coming more mainstream. There are different types of databases out there, but HBase seems to be really, really popular and growing. What's your take on that and vision around HBase?
Yeah, let me talk a little bit about how we view the platform overall. So if you read the announcement, if you listened to the messaging that came out of Cloudera today, Impala is our real-time query offering.
RTQ is what we brand it as when you buy the enterprise support and everything else. HBase is real-time delivery, or RTD. HBase was the first of the real-time additions to the platform. It lets you get at individual records at basically web speeds and at web scale. We believe Impala offers the same sort of real-time access to a different sort of user. I predict that over the course of the next two quarters to two years, you're going to see the Hadoop platform evolve further. It'll support more real-time workloads, and certainly it's an area in which we are actively investing. And I predict it'll attack different programming paradigms, different ways to get at data. There are things happening in the community now with YARN, the new strategy for basically deploying and executing compute operations on a cluster, that are going to make it much easier to build innovative ways to get at your data. So people think of Hadoop as HDFS and MapReduce. But look, you guys, that was just the problem that Google had first, right? Over time, this platform is going to grow into something much more capable and much more flexible. We're thrilled to release Impala as what we think of as really the first volley in that war, but I think you're going to see lots of interesting stuff happening from us and from the community.
You know what's great about having you on theCUBE? One, you know your business, you've been running it, but you're also a geek and you're a big Cal Berkeley dude, and we just had Daniel on, who was at MIT, now he's at Yale. East Coast, West Coast, there's a real big rivalry there. Although we don't joke about Brown versus these guys. But the question legitimately is, I know you follow a lot of the academic activities. You've got stuff at Carnegie Mellon, University of Illinois, Champaign's got some compiler stuff that's got virtual machines built into it. You've got stuff going on all over the top computer science programs.
What are you seeing, what is Mike Olson seeing that gets you intrigued coming out of the computer science programs right now? Things that are going to be related to the trajectory that this new industry, the big data group that's here, is going to connect into over the next decade.
There are projects like Spark and Shark and Mesos that I think are pretty interesting. The trick, really the advantage of open source software, is that we don't need to predict in advance which of those is going to win. We can sit back as a vendor with presence in the market and we can wait to see what gets adopted by users and what solves really interesting problems. I mean, the reason that HBase rolled into the Cloudera platform is we noticed our customers were using it on their own. They went out, found the software, it was clearly solving a problem that they had, and so we invested in it. And we'll continue to do that with the innovative work happening in the academic community. I think there's lots of interesting platform work, and I named a few of those projects. If I had to say one thing that excites me pretty much right now, it is the quality of the data exploration and visualization work that's coming out. Stealthy companies are getting funded in the Valley these days. We get a chance to look at a few of those that are making it much easier to wrap your head around a petabyte in real time and to swim through it.
Get that signal, awareness. I mean, Tim Estes had a great line in his keynote. He said there's an understanding gap. I mean, the attention's flat but data's exploding.
Yeah, so Digital Reasoning, with Synthesys, builds a beautiful platform for understanding and visualizing that kind of data. Likewise, Ben Werther at Platfora launched yesterday with beautiful renderings and huge interactivity, and that kind of stuff hasn't been available on the platform before. That unlocks business users, right? All Cloudera does is make this platform safe for IT staff to operate.
We rely on partners to build those things. 400 plus companies in the Cloudera Connect ecosystem right now, that's tremendous for us, but that kind of innovation is going to drive adoption and that's really important to us.
Okay, Mike Olson is the CEO of Cloudera. Cloudera is the founder of Hadoop World, which started with some 500 people four years ago, in 2009. We've been doing theCUBE here ever since. You've been a great friend of theCUBE. We love Cloudera, great supporter of us and vice versa. You guys are doing some great work. Continue to be the leader, and we appreciate the support. I know you're busy, so thanks for sharing your perspective and good to see you, and we'll be right back with our next guest after this short break.
John, thanks.
Good deal. Thank you, Mike. Good man. Great to see you. Awesome, cool.