Live from the San Jose Convention Center, extracting the signal from the noise. It's theCUBE, covering Hadoop Summit 2015. Brought to you by headline sponsor, Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are here live in Silicon Valley. Live from the San Jose Convention Center. This is theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. I'm joined by co-host George Gilbert, Wikibon.com's new big data analyst, and our next two guests, Todd Lawrence, director of global partner sales at Cloudera and Michael Crutcher, director of product management for storage at Cloudera. Welcome to theCUBE. Thank you. Thank you. Guys, so Cloudera, obviously, you know the history. theCUBE started at Cloudera, was in the office, 17 people back in the day, and now Cloudera's grown massively, worth billions of dollars. New VP of engineering, I saw the news, congratulations. Going to the next level. You guys are kicking some butt. It's great to see, not yet public, but we speculate. So what's going on at Cloudera? Let's talk about some of the things going on at Cloudera. What's happening? What's the market telling you guys? Share some insight. Certainly. First of all, thank you all very much for the opportunity. We're of course happy to be here and talk to our friends at theCUBE. So, you know, it's interesting. You'll see the market continue to go the way that we've been sharing that it's been going for the last two years or so, right? Two years ago, we were having a very different conversation with customers, right? It was all about what I used to call the Hadoop Zoo, right? How many committers you have on this project, whose version number of that project is more advanced, things like that.
And as a result of where our customers have led us over the last two years, the conversation is now about a very different set of needs. It's about business needs and it's about enterprise capabilities. And so now what we're hearing from customers is how can you solve our business problems? And how can you ensure that Hadoop, as we're using it not on sandboxes, out in the periphery of our engineering infrastructure, but actually in the center of the data center connected to all of our other enterprise-grade systems, how can we be sure that our Hadoop cluster shares the same enterprise-grade capabilities as our data warehouse, as our ETL servers, as all of the other things that have been driving our business for a decade or more? Yeah, we saw each other at EMC World, one of your partners. You guys have a great partnership under your leadership and it's great seeing you at all the events. EMC's got a whole new vision now with their core technology division under Guy Churchward and then you've got CJ Desai, who runs the emerging team. It's interesting they put Isilon in there. It's kind of like the anchor, the Macy's of the emerging. It's not emerging, but it's been around and then XtremIO and the other group, which is emerging, but it's going to be a seller. It's going to sell like hotcakes based on the pipeline. So storage is hot. Storage is a big part of Impala, all the stuff going on around data. What's the relationship with EMC and how does that relate to what's going on here? So let me comment about the business relationship and then you can comment on the storage. Yeah, sure. So as you know, we announced a jointly supported solution, Cloudera on Isilon, last October, right? So we've been working and working very deeply with the engineering team over at EMC Isilon for the better part of a year. It's kind of funny, nobody believes this, right?
But I've been told multiple times that we have a tighter relationship at the engineering level than other companies that also have solutions certified on top of Isilon, right? So we have weekly engineering meetings. We collaborate at a roadmap level. We collaborate at an Apache HDFS level, because as you know, OneFS has supported the HDFS protocol for a long time. So our partnership with EMC is super, super strong. Today it's centered around the Hadoop solution on top of Isilon and turning your Isilon data lake into an enterprise data hub with Cloudera. But I think you'll see some very exciting developments emerge within the ETD family of solutions. You mentioned CJ Desai, of course he runs ETD. You'll see some very interesting developments within the ETD family related to Hadoop over the coming months. We're excited to be working with him. Well, it's interesting. Sujal Patel, who is the founder of Isilon, I mean, he's no longer there, but when I interviewed him years ago, back in 2010 when theCUBE started, when EMC acquired Isilon, they were running all the big data for Facebook and all the big, I mean, they were big data. They weren't, and then when Greenplum got acquired by EMC, I mean, that was their big data. So it seemed like a mismatch then, but now we're coming back to Isilon. It's the solid stable in the emerging foundation. It's not like an emerging technology. It's powering some big ass data in a lot of big web scale companies. So just one more comment, and then I want to make sure Michael has a chance to comment as well, but the value proposition to EMC Isilon customers for Hadoop is actually significantly different from the market as a whole, right? So Isilon's got 6,000 plus customers today using Isilon. They're not looking at Hadoop as this new cheap and deep storage solution, right? Those customers have already chosen their scale out storage platform. It's called Isilon, right?
So they've been investing in the concept of a data lake since before we all used the term. Totally, yeah. But what those customers are excited about, and this is validated by all of our early joint customers as well, they've got, in some cases, multiple petabytes of data sitting in their Isilon cluster. They listen to what this whole Hadoop ecosystem is doing at the analytics and at the processing level, and they're thinking, you know what? I want to get more value out of the data that I've already got in my data lake, and so that's what drives their interest in Hadoop. It's really all about bringing the analytics to the data where it lives today in Isilon. Does it also simplify the management relative to just locally attached storage, JBOD stuff? I'm going to give Michael a chance to comment. All right, chime in, come on, get a word in. Yeah, certainly on that, there are management capabilities both on the Isilon side and through Cloudera Manager, which has very extensive storage management capabilities, but kind of going back to what Todd was saying, I think it basically comes down to customers are looking for capabilities and not necessarily just a particular label that's slapped on something. And we've got customers that already have these very large install bases that are sitting on Isilon. They require certain capabilities, but they also want to run analytics and a lot of the other stuff that comes with the Hadoop ecosystem and the applications that we support. So bringing those applications there solves customer business problems. And at the end of the day, that's what we're about, providing capabilities to customers. What about Flash? What's going on with Flash? How does that relate to Cloudera? Because certainly all-Flash arrays look like they're going to sell like hotcakes, certainly on the XtremIO side. So how does that factor into some of the software innovations going on in the big data ecosystem? My background's in engineering, right?
I think if you ask any engineer, he'll say, it depends. Because that's the honest answer, right? That's a good answer. And it really does depend on the characteristics of the application that you're looking to provide. And so for some Hadoop ecosystem products like HBase, Flash is very important because you really need that sub-second, very, very low latency. And for that, Flash is orders of magnitude better than disk. So certainly we see it being used there. So it's a workload-specific issue, right? Yeah, I think that there are certain applications where Flash is the right answer and certain applications where spinning disk is the right answer. And if we look back like 24 months ago or 18 months ago, we may have seen a different shading in terms of how much was appropriate for spinning disk and how much was appropriate for Flash. We might see some of that tipping a little bit towards Flash in terms of cost per throughput and cost per IOPS. So I think the landscape is changing. We're seeing Flash as part of a default configuration more and more often. We're seeing that more available in the data center. So I certainly think it's getting real and customers are really deploying it. If Flash gets more cost-effective for IOPS and you can get closer to memory speed, but with storage capacity, sort of combine the best of both, does that change the type of workloads HBase can handle? It changes, or does HBase have to change? I don't think HBase has to change. It can take advantage of that PCI Flash or just SSDs right now. I don't think it fundamentally changes what you deploy it to do, but it becomes better at the things it's already good at. So it's kind of just like ramping up the things that those kind of applications are already good at. So what's the big challenge around mainstream? I got Merv Adrian who was just on theCUBE earlier talking about how this whole ecosystem is now going mainstream.
You know, you'll see talk about the trough of disillusionment, but then that kicks up to the value, right? So you start to see certainly the conversation change. You know, before, with the early adopters, Cloudera was scale out commodity hardware. Certainly on storage, awesome. Amr and I would have conversations, Amr Awadallah, about how great that is, just like Yahoo, just like Facebook. But now with EMC, we're into this intersection of mainstream. You've got people running stuff with Isilon. There's nothing to do with religion around open source commodity, anything. It's just that's what their businesses run on, right? So you've got big iron systems that are legacy and software management systems. What's the critical thing that's happening in the market for customers? I mean, storage in particular, there seems to be a lot of activity, right? Is it scale, is it latency? I mean, what are some of the top conversations that you're involved in there? This sounds like a business question, Todd. So I wanted to be polite, but I will tell you that you mentioned the religious war kind of perspective here, right? There's still a lot of that, right? There are still people in the Hadoop world that think, oh, you know what? Hadoop is based on something that Google designed for commodity hardware, and that's what it's going to be. And Amazon builds their own machines. God bless America, who cares, right? Exactly, but again, that's a religious argument, right? If you look at the commerce argument here, it's about, for these alternative storage platforms that now Hadoop is moving and supporting and running on top of, you know, the economics for a customer who's already made that investment are very different, right? So you look at a customer who's, again, got, let's say, petabytes of data in Isilon. The marginal cost for them of making that data accessible to Hadoop is not about rebuying commodity hardware.
It's frankly about just finding, redeploying or potentially investing in more compute to be able to add the compute capacity that you need to match whatever workloads you're trying to run on your petabytes of data in Isilon, right? So the marginal economics of making more data accessible to Hadoop are very different when we're talking about now that data living in Isilon, right? One of the other things, I'm sure you've heard this before, but the early customers that are moving in that direction all tell us that for them, they value the ability to manage their storage and their compute independently, right? Data growth, there was a customer that spoke after me at EMC's booth at Hadoop World, I'm sorry, at EMC World. And this customer said, look, we're living this. Our data grows 100% a year, right? Our compute requirements are not growing that fast by any stretch of the imagination. So they're expanding their Isilon footprint fairly significantly, right? They're managing their compute with a virtualization layer in a much more efficient manner. Yeah, it's cleaner. And then you move the commuter on compute, it's great, it's getting cheaper, too. Absolutely. Storage isn't it? You know, it would be interesting to hear sort of the solution selling that Cloudera is doing right now because we heard a message from Hortonworks that was somewhat different, like along the lines of electricity, as in, you know, pervasive and we're going to have, you know, essentially appliances that consume that, appliances, the analogy being applications, you know, pervasive and everywhere. But in terms of getting an economic buyer to write a check right now, you know, where the infrastructure is not ubiquitous and it still has rough edges, what are the ones that are, you know, sticking right now? So I'm not sure I caught your, to compare and contrast with the Hortonworks, but I love that question. So just flesh it out a little bit more and I'll do this.
The Hortonworks one is someday it's going to be ubiquitous, the infrastructure is going to be ubiquitous and we'll have all sorts of apps that plug into it. But we're not there yet. And so the question is, you know, with the Gartner Hype Cycle or Geoffrey Moore's Crossing the Chasm, what are the applications that are getting the customers to write checks now? So another way people have been talking about the recent Gartner report is, you know, Hadoop's dead and things like that. Yeah, but that was, it was, no, I know. So let me just say that's not been our experience. Obviously not, right? But you guys went to EMC. Well, did you see Mike Olson and Doug Cutting's keynote there at EMC World? Yeah, and then Mike came on theCUBE with a tie on. We don't see that very often. I know, we had pictures. We had to prove it, we had to put it in the archive. So look, Mike facilitated a whole panel of customers, right? And these customers in their own words, one from financial services, one from Telco and one from healthcare, described how using Cloudera, they're getting tremendous business value today in production out of Hadoop, right? So there is no disillusionment. But that wasn't the source of the question. The source was more, he said early adopters, mainstream. He's talking mainstream, not so much hardcore early adopters. Okay, so then carrying that thought forward. Again, along with that transition from the Hadoop Zoo to talking about enterprise-grade Hadoop, what else we are seeing, and by the way, this is very much a difference, right? Other players in the Hadoop space will still emphasize things like, well, out of the zoo, right? They'll talk about YARN. Look, we've supported YARN for many years. We agree, YARN's a good thing. That's not what our customers tell us they want to hear about. They want to hear about this transition, I would say from an early adopter to what I would call maybe a fast follower, right?
And in the fast follower, you're seeing, we're already organizing ourselves this way, right? We're moving quickly into this realm, let's call it solutions, right? The horizontal IT-driven use cases, they're still very important for those companies that are starting their journey. But more and more companies are moving to their second, third, fourth use cases. Those are all going to be more vertically driven, meaning more industry driven, right? So let's take the jump. Well, domain expertise is really critical in there. Absolutely, right? Not quite to the level of an application. So here's how I define it, right? A solution has maybe 70% repeatability, right? With 30% configuration on the end, so what you're doing, let's take fraud detection in the banking industry, right? So we've solved fraud detection as a use case at multiple, meaning many, many financial institutions, right? So now we have a solution, which is a collection of assets, some project oriented, some code oriented, some intellectual know-how that we can bundle into a solution and have 60, 70% of the work done when we walk in, knowing that we're still going to have to tailor it or configure it for that remaining bit of functionality that's unique to that next financial institution that wants to tackle fraud. Someday we'll be at an application level, which is 90% packaged, and then there's a little bit at the end. But I think this transition to solutions, and don't make it 90, make it 60, 70% repeatable, that's happening right now. I would agree, and the linguistics of how customers talk is different. It's outcome driven, right? So I hate that word too, it's getting overplayed, but you know, business value, another overplayed word. But this is what people are talking about. They're just like, hey, you know, I don't care about this, I have a problem, I need technology, I need an integrator, I need someone to deploy it, deliver it, I'll pay. They write a check. That's value. Correct.
You repeat that, that's a business. So to me, that's where it gets real. So how are you guys doing in that regard? Mike said services is not a big part of the business. I mean, you say we're going to talk about the business, but yeah, Cloudera makes good product revenue, right? I mean, you guys sell product and you have services. Yes, we do, we're a software company. But it's not all consulting. No, no, no, not at all. I mean, we have a hybrid business model, right? We're very open about that. People can disagree, but we have, and by the way, make no mistake about it. When it comes to an open source core, the core of our solution is apples to apples equivalent with anyone else who says 100% open source. It's just that around that, we believe that there is additional opportunity to have software that we need to control the roadmap on for some period of time. And that's where we get a lot of value. Michael, I want to ask about the product side on EMC. So what are the top requests that the customers have with Cloudera and, I'm assuming it's just the Isilon group, right? Not other parts of EMC, or is it just Isilon? You guys, we'll talk about Isilon today. We'll talk about Isilon today. Just Isilon, okay, just talk about Isilon. So Isilon's out there, they've got a huge install base. What are the key features that they're looking for? What are the top three? I'd say that, I mean, primarily what they're trying to get out of an Isilon installation is they're trying to make it more valuable. And the way you make it more valuable is you're able to do more stuff with it. You've already made that sunk cost investment on what's going to be stored there. And so the thing that really drives customers to reach out to us and what's causing this partnership to be close is that we add a lot of capability on top of just the deep storage that Isilon provides. So what product are you guys tying in that's synergistic with the relationship?
Because it makes a lot of sense. I got the big iron storage. Now I want to pull in some sort of active layer. Impala, is it Impala? What are the products that are tying in with this EMC relationship? It's the CDH suite. I mean, across the board. It's HBase, Hive. It's all Cloudera. So like I said, hey, I love Cloudera, but I got to make it work with Isilon. Is that the number one kind of thing that you get? Yeah, it may start the other direction. I have Isilon. I really want to do these new use cases. That's how I justify my position within the business enterprises. So you're getting pulled into EMC business. Oh yeah, absolutely. More than bumping into them. I think it's both ways. In fact, what's really interesting is to watch when we have a customer that's running Cloudera on one part of their business and they happen to be running Isilon, generally in the core part of their business, right? So they've already made the decision that they want to go with Cloudera as the distribution. And now they have just more data that they can fairly easily, right? I won't say you just wave your hands, but you can very easily take whole new quantities of data and run the same kinds of workloads in the Isilon environment that you're already running in your Cloudera cluster elsewhere. That's a very common motion that our teams are engaging in the field and working together on. Well, I think it's great that you guys are with Cloudera. It shows a level of maturity across the industry. EMC, Cloudera, rising up, two great brands. Obviously Cloudera is now, how many employees now? Like 3,000, 4,000, 50,000. What's the number? We're closing in on 1,000 with this. But it's still, I mean, not 17 like when theCUBE was there. That's more than 17. It's a big company, you know. So do you guys see yourselves, I mean, if you've got 70% repeatability in the apps now. So in the solutions, that's fine. Where are my questions going?
Do you see yourselves getting to the point where you're packaging mostly repeatable apps or do you offload those to partners or what happens to that IP and the orientation of the company? So we're actually not even trying to be the solution provider, right? We happen to have all the expertise to do this. So this is just a, it's a way to make the professional services organization more effective. Or more precisely put, our global systems integrator partners are the folks that are working with us to take what we've done because frankly, to this point we still have more implementation experience across more industries than really any other company, right? So we have all the assets, but we don't aspire to be an application company. That was my question. So what we're doing is we're enabling applications. Yeah, because look, our partners have much deeper industry expertise than we do. Just because we've solved the same problem using Hadoop multiple times in financial services, it doesn't make us financial services experts the way a global SI is. Will a global SI be an application provider or a solution provider that works with a packaged application vendor? We've always believed that the SI channel will be the first place where these solutions emerge and whether those solutions get taken all the way to what I defined as a packaged application. We got a break, we're getting the hook here, but I think what I'm just going to end on that note is that solution providers, I mean, SIs can sometimes have their own cloud. So you're seeing some vertical integration at the top of the solution stack. You guys are enabling that though, right? That's our goal. But the packaged apps will be a distinct class of vendor. Yeah, I think so. And there's a lot of folks that are already, they're much closer to packaged applications here, right? And our goal is we want them to build it on Cloudera. Tools, packaged tools.
But you're also seeing some packaged applications as well, right? So loosely pre-built analytics for a particular business problem in an industry. We're going to take this offline because I'm going to talk about some other things off camera, but great to have Cloudera on theCUBE representing their solutions and their partnership with EMC, Isilon. We'll be right back with more action here in Silicon Valley after this short break.