I'm at the Splunk conference in Las Vegas, .conf2013. I'm John Furrier, the founder of SiliconANGLE. I'm joined by Dave Vellante of wikibon.org, and our next guest is a Cube alum and the famous co-founder of Cloudera, Amr Awadallah. Welcome back.
Thank you.
We saw the future five, six years ago. Now you're living it. Splunk is doing well, too. I mean, they're in the sweet spot.
Yeah.
Cloudera is doing great. You've got Hadoop World coming up in the fall. Big data, New York City action. Great to have you on. So obviously you've been doing a lot of speaking engagements, and you do a lot of the CTO work, really with Mike now looking at the next generation of Cloudera, kind of that next journey. A couple of questions right off the bat here. What are your observations of where we've come from to this point now? And what's the next journey forward in the evolution of the big data ecosystem?
Yeah.
There's the balance between the open source community and the business side of it, which is growing very rapidly. So it's always a ratchet game. One's going here, one's going here, and in the middle is that equilibrium. And as you know, open source needs that balance. So give us your observation of where we've come from, where we are today, and kind of what's going forward.
Yeah. So I think now we are reaching the stage where maturity and enterprise readiness are just so important. That includes stability, robustness, security, et cetera. Some of our customers right now are telling us, can you please slow down, stop innovating and stop adding new things to your platform, and focus more on the stability and the reliability and all of these things that enterprises need. So we're doubling down on that right now. Recently we launched Sentry, and Sentry brings the same security level that you have in a data warehouse and a database to the Hadoop world.
Sentry lets you do role-based authorization within the data that you have inside Hive or Impala, which again is a must-have for enterprises. So that's the focus over the next year: the rigor of how we can be much more reliable, much better at doing workload management across all of these nice features that we have in our platform, and much more secure. So that's the short-term focus.
So the meat and potatoes, the blocking and tackling for enterprise table stakes.
Exactly.
How about going forward in terms of new tech? Obviously, you still keep innovating, with a lot of smart people at Cloudera. What's around the corner? What's that next 20-mile stare for Cloudera?
Yeah, so I can't share our exact roadmap. In that phase now, we'd have to make you sign in blood before we tell you exactly what we're doing.
That's one file down there.
Everything we do innovation-wise, when we are working on things in our innovation labs, is working towards enabling the future vision of having a single platform where you can store data of any type, whether that's relational data or machine-generated data or raw logs or images or emails or PDF documents, and then bring multiple workloads to that data. That's what we're working towards: how to enable that across multiple data assets and multiple workloads.
So we were at Oracle OpenWorld last week. We had Mike on, and he was talking about the Oracle partnership. It was interesting. So I wonder if I can ask you a similar question. We sort of see the juxtaposition of Oracle, and not just Oracle, but a lot of traditional companies with scale-up, expensive hardware. You guys come in with scale-out, open source. Jeff Hammerbacher talks about no more containers and the like. What you just described is a platform that a lot of companies would like to have. So I wonder if you could talk about that juxtaposition a little bit.
Are those natural synergies, or is there tension there? I don't mean between you and Oracle. I mean between the vision that you put forth of an open platform, a lot of Hadoop open source, versus the traditional legacy. They want to have that platform as well, bringing in any type of data and bringing it on to whether it's Oracle or Teradata or EMC or IBM or whomever.
So I can't comment on what the long-term path is going to look like, but today they are complementary. Picking on Oracle specifically, the Exadata system that Oracle has is really, really good at structured data. When you have relational structured data that has already been modeled, it's really good at that and can run extremely low-latency queries against it. And now with the in-memory functionality that Oracle announced, they can also run in-memory transactions against that at very low latencies. Hadoop, on the other hand, is not about just structured data. Hadoop is about bringing the structured data together with the unstructured data and enabling applications on top of that. So they are very, very complementary today. Longer term, how is that going to pan out? That definitely is a very interesting question.
Yeah, so you mentioned Sentry. You guys also just announced support for Apache Accumulo.
Yes.
So you talked about role-based security. Accumulo is very well known for its cell-level security. So help us understand the differences and where they complement each other.
Yeah, so essentially, row-level: in a table, we have a row. And at the granularity of that row, with Sentry, you can say who can access that row and what they can do with it. Accumulo does that at the cell level, meaning not just which row, but for which row and which column within that row. So every single cell within the table can have its own tags for the security credentials of who can access that cell.
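[Editor's sketch] The difference in granularity described here can be illustrated with a small Python model. This is not the Sentry, HBase, or Accumulo API; the names `COLUMN_GRANTS`, `Cell`, `read_row_by_role`, and `read_row_by_cell` are hypothetical, and the rule that a reader must hold every label on a cell is a simplifying assumption (real Accumulo visibility labels are boolean expressions).

```python
from dataclasses import dataclass

# Column-level, role-based control: one grant covers an entire column for a role.
# (Hypothetical roles and columns, for illustration only.)
COLUMN_GRANTS = {
    "analyst": {"name", "city"},
    "auditor": {"name", "city", "card_number"},
}


def read_row_by_role(row, role):
    """Return only the columns that the given role has been granted."""
    allowed = COLUMN_GRANTS.get(role, set())
    return {col: val for col, val in row.items() if col in allowed}


# Cell-level control: every individual cell carries its own security labels.
@dataclass(frozen=True)
class Cell:
    value: str
    labels: frozenset = frozenset()  # labels a reader must hold (simplified)


def read_row_by_cell(row, authorizations):
    """Return only the cells whose labels are all held by the reader."""
    return {
        col: cell.value
        for col, cell in row.items()
        if cell.labels <= authorizations
    }


if __name__ == "__main__":
    plain_row = {"name": "Ann", "city": "Reno", "card_number": "4111"}
    print(read_row_by_role(plain_row, "analyst"))

    labeled_row = {
        "name": Cell("Ann"),
        "card_number": Cell("4111", frozenset({"pci"})),
    }
    print(read_row_by_cell(labeled_row, frozenset()))
    print(read_row_by_cell(labeled_row, frozenset({"pci"})))
```

In the first model the decision depends only on the role and the column, so it is the same for every row. In the second, two cells in the same column can carry different labels, which is the extra granularity attributed to Accumulo in the discussion.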
Obviously, that's very, very important for the federal government. So our support for Accumulo is coming from that. Accumulo was born inside of the federal government. It's very heavily used inside of there. They're using it on top of our platform. So it was a natural kind of thing that we would say, yes, we will support Accumulo.
So it's like that AT&T commercial, you know? I've got to ask you, what you described, isn't that better? I mean, what are the trade-offs? Help us understand that. You're talking about the granular level of security, the fine grain versus the row level. What are the trade-offs there?
So that's actually a very, very good question. We do have a similar system within the Cloudera stack called HBase. And HBase actually is almost identical to Accumulo in terms of feature set. It does everything that Accumulo does, except it doesn't have the cell-level security. It has row-level only. Sorry, it has column-level, column-level only. And actually when I said row-level earlier, I meant column-level.
You meant column-level, right, right, right, right. Which column? You said role-based.
Yes, at the column level. A given role, meaning a developer versus a scientist, and then a given column.
Okay.
So HBase, on the other hand, has a lot of features that Accumulo doesn't have, like snapshots, for example, or replication across data centers as another example.
Obviously enterprise.
Yeah, so HBase's technology has matured in many, many other dimensions compared to Accumulo. We have a lot of extremely large deployments of HBase for real-time production workloads, not just for analytical workloads, because of features that HBase has that Accumulo doesn't have. So that's kind of the give and take. Accumulo is very, very good and deep on the security front, but it lacks on some of the other fronts. HBase is really good on these other fronts, but it lacks on the security front.
So we do support both.
Do those two worlds come together eventually? I mean, about five, 10 years out? It's hard to tell, I know, but.
Potentially, potentially. We'll see. We'll see how that evolves over time. Right now, they are separate worlds. We don't have a lot of our customers in the enterprise space using HBase asking for the row-level, the cell-level, sorry.
Yeah, except maybe in government, or maybe-
It's really just government.
Just government.
The other segments, like health and finance, which are very, very security-aware, come from the database world. In the database world, having role-based security at the column level is good enough. I can have different roles within my organization and then say which columns they can access and which columns they can't access. For example, this column has a credit card number and this column has an ID number, a social security number. Only these users can access it, but other columns everybody can access. That's good enough for them, actually. So they are not asking for the cell-level thing. They're asking, give me snapshots. They're asking, give me a better way to do replication across my data centers.
So that's kind of the give-and-take. They want enterprise function.
Yes.
Amr, talk about the relationship Cloudera has with Splunk. Obviously, you're here at the Splunk conference. You guys have a relationship. Splunk's done very well for themselves. They came from the log file, just logging in the weeds, helping people as a tool. Now they're a full-blown platform. They've morphed into a full-on value proposition for C-level analytics. They've got a cloud. They have mobile. I mean, they're pumping, and they've got a lot of client acquisition and a strong ecosystem here. So talk about the momentum of Splunk and the relationship Cloudera has with them.
Yeah. I mean, we have several integrations with Splunk.
The most exciting one is the one they announced today, which is the Hunk integration. And essentially-
A Vellante. You know. They had the picture of the Chippendale guy with the Splunk address. Pretty funny. The cute guy here.
Please. In my dreams.
What Splunk has, which is really special, is their UI. Their UI is very, very special. It's very easy to learn. It's very easy to use. Traditionally, it came from the search background: how can we very easily define new search patterns for the events we are trying to look for in our logs? And now they have also added analytical capabilities, so how to do analysis within the datasets, with pivot tables and so on. So they are, as you said, expanding into the BI space, right? They're really going up to the BI space at large as opposed to the log searching space. A lot of data lives inside of Hadoop already, right? And obviously, it was a natural extension for Splunk: how can we shed light on that data that's inside of Hadoop without having to suck all of the data out of Hadoop and move it inside of Splunk? And that's really where Hunk comes in. And we love that on our side because one of the biggest gaps for Hadoop has been skill set. Not everybody knows how to use Java MapReduce or Impala or other technologies that we have to look at the data inside of Hadoop. There's a very, very large number of people out there now. I heard almost 1,800 people, 1,200 people, sorry, 2,000 people attending this conference, and obviously a much bigger user base out there that knows how to use the Splunk tool set. Now they can use it to access the data inside of Hadoop.
It's a little liberation too. Dave and I were commenting in the intro that you're seeing names like Arista Networks, which makes a great box that we all know. Palo Alto Networks is here. A lot of ecosystem partners.
That's on the ingest side.
A lot on the ingest side.
And then we had some security guys who talked about policy and compliance, where Splunk is enabling compliance to be done easily, which we all know is a slog and can be an inhibitor to innovation. But now you have this ability to automate. And that's really kind of a new factor in BI, this kind of real-time automation. This kind of sounds like Impala a little bit. I mean, real-time, how do you explain that? What are the differences?
So I mean, Splunk's foundation is about how to work with unstructured datasets, right? So with raw logs and with text, and search was kind of the underpinning of that. Impala is more about SQL. Impala is about structured data that has a well-defined schema, on top of which you're using the SQL language to integrate with traditional BI tools. So it's another way of looking at your data. I wouldn't say it's the same thing.
Is that so I don't have to move it into Exadata? I say that sort of tongue-in-cheek, right? It's hard to predict where this is going, because you have those capabilities that imply that someday the robustness of your system could start to be good enough in a lot of these cases.
Good enough is the right word. So good enough is the right word. And that's the same with Oracle, when we're talking about Exadata. So yes, the analogy for Impala is stronger with Exadata than it is with Splunk. So I agree with that remark. We typically would segment it like this: Exadata is like first class in an airplane, right? It's like when you want to travel in first class in an airplane. You want to get there first, and you're okay paying the higher price tag to get there first, not through the nose.
Oh no, you pay through the nose, come on.
You said that, I will say Oracle.
You're paying first class. You're funding the boat, thank you.
And while all of your data would love to be in first class, not all of your data deserves to be in first class.
There's a lot of data that doesn't deserve to be in first class.
We should fly in first class. I want to see the Cloudera name on the next America's Cup. That's when you know you've made it, when you can get a boat out there. But that's a good analogy, okay. So it's like the first class, right? And so that means it's just the tip of the pyramid.
So instead of archiving it and killing it.
A smaller number of use cases.
Instead of archiving it. No, no, I'm not saying it's a smaller number of use cases, but typically what happens today is when my data doesn't deserve to be in first class anymore, I kick it off to archive and it dies and I can't query my data anymore.
So you said no, no, not a smaller number, but fewer passengers, right? Less data, is that fair?
It's more data on a byte basis. There are more people that travel in economy class than travel in first class.
Yeah, yeah, yeah, that's what I'm saying, right? Just the value of that.
Right, that's right.
So fewer passengers in front, right. But the value of all of the economy class passengers could be higher in aggregate.
To my point, right, that's exactly the point I'm making. My argument would be that your model is, in theory anyway, a bigger total available market than the tip of the pyramid.
We should hope so in the long term.
Yeah, I mean, I know you've got to be careful because you have good partners with Oracle and you don't want to say the wrong thing, but we're talking 10 years out. I mean, nobody knows what's going to happen, but this is new. I feel like the Holy Grail of big data, and you've touched on it before, is putting analytics in the hands of business users, and that's what everybody's going after. Tableau, Splunk, and Hunk, right. And so I feel like Cloudera is this big ice breaker that created this ability for the ecosystem to now create that type of value. So when you were asked about being competitive with Cloudera, not really.
You're the platform, the infrastructure on which you're going to build all these applications.
Yes. Yes. Sorry, just to interject, another analogy we use, and you can tell now I love analogies, is that we're kind of building the iPhone of big data. When you take a picture with the iPhone, that picture is good enough. It's not a perfect picture. You can take a much better picture if you go and buy a DSLR camera.
Yeah, right. But you've got to lug it around.
But first you have to pay more for it, and second, the DSLR camera only takes pictures. It doesn't run other apps. We're building a foundation that can take very decent pictures but can do many, many other things besides taking pictures, and do that at a very good price. That's kind of what we're doing for big data.
The iPhone of big data, it's a high end. We're getting the hook on time here, but I want to ask you for one final parting comment. Obviously open source, we know, is near and dear to your heart. Obviously with Hadoop, scale-out commodity hardware is based on the vision of Cloudera enabling all this resource to be used. What does the open source community need to do to maintain the pace of innovation, given that, as you said, a lot of people are being asked to slow down to build the foundations for the enterprise? At the same time, open source is still accelerating. What is the balance that the open source community needs to take care of in the software communities?
Yeah, so that's the nice thing about having an open source business model, and we see that with Red Hat as well: you have these two communities that you can appeal to at the same time. You have the open source developers that really want the latest and greatest. And Red Hat, for example, appeals to that crowd using Fedora. They have Fedora, and Fedora appeals to that crowd. We appeal to that crowd with the Apache Foundation.
In the Apache Foundation, we push all the latest and greatest bits in there, and anybody who wants to grab them can go grab them and try them out, right? But then on the other hand, we have our own distribution, which is CDH. And CDH has a very fixed release schedule for when the major versions come out. Every three months, we have a minor update that fixes all the security and stability bugs that we encounter. That has a very, very predictable schedule, and that appeals more to the enterprise users.
And that's what you guys are buckling down and doubling down on, as you said earlier.
Exactly.
The CDH side.
It's making that far more robust, far more secure, and far more functional for the enterprise.
Amr, always great to have you on theCUBE, the co-founder of Cloudera. Really amazing. I remember when you started Cloudera, you were in the Accel Partners office and you were tongue-in-cheek, saying we're going to have something really big. That was, I think, in 2007. Congratulations on all your success and the growth of the company.
Thank you very much.
We are here live at theCUBE at the Splunk conference, .conf; the hashtag is #splunkconf. Tweet us, we're watching Twitter. I'm John Furrier with Dave Vellante. We'll be right back with our next guest inside theCUBE, extracting the signal from the noise here live in Las Vegas.