Here, this is Silicon Valley coverage of Hadoop Summit. I'm John Furrier, the founder. We're pleased to have a friend inside theCUBE. It's rare to have such luminaries. Amr Awadallah, good friend, and also co-founder of Cloudera, really the pioneer in the space that helped build this industry that we're living here at Hadoop Summit. I'm with Dave Vellante from wikibon.org. Amr, welcome back to theCUBE. Cube alumni.
Thank you for having me here.
Wow, what a journey. You co-founded Cloudera. I remember when you were in stealth mode: "I really can't talk about it." And then, of course, the history of SiliconANGLE being founded and kind of built in your office when you only had like 20 or something employees. We owe a great deal of gratitude to you, and congratulations to you, Mike Olson, and the team for building an industry. So I just want to say thank you and welcome to theCUBE.
Thank you, it's great to be here.
So what do you think? What's your take on the current Hadoop ecosystem right now? I mean, honestly, a lot's happened. I mean, it's big now. It's growing up fast. The phrase "enterprise grade" is out there. You're seeing it move from trying to change the world. In our first interview, you said, "I've seen the future. I want to bring it to the mainstream." It's here. It's hitting mainstream right now. What's your take on the current situation of the ecosystem and its value?
Yeah, so I have a quick question first. Should I look to you or look to the camera?
Look to the camera or both, whatever you'd like.
So I think the ecosystem is definitely growing, which is very, very healthy. However, there is a side question there, which is what do you think of all the competition coming into the space? So five years ago when Cloudera was started, it was just Cloudera. There was no other commercial vendor trying to support or enable Hadoop in the industry for enterprises. And today, there are at least 10 of them trying to compete with us, right?
And that includes big companies, established companies that decided, "Hey, we're gonna start addressing the space," but it also includes many, many newcomers, like Hortonworks, that were founded over the last couple of years. That's a healthy thing. I mean, that's absolutely a sign of a growing market. If the market wasn't growing, if there wasn't money in the market, if it was just hype, there wouldn't have been all of these new companies and new ventures showing up. That said, I never look at competition as something that worries me or that I'm afraid of. It's always going to happen; that's normal. That's exactly what happens to successful companies. If you look at Red Hat, when Red Hat was launching with Linux, they had 25 competitors, or even more, 30 competitors. That's when Red Hat was starting out. And today, of those 25 or 30 competitors, there are still six or seven left. So I think it's a very, very healthy sign of the growth of this market and the maturity it's reaching.
What do you think about some of the white spaces that are evolving? You guys have obviously been involved in a lot of deployments at Cloudera. Again, you're doing a lot of work with the top names, and the clients that you have aren't usually disclosed because you really can't disclose them. What are you seeing right now as the white spaces for things to do in the Hadoop platform?
It's a very, very good question. So first, I can't really talk about the future roadmap right now. We're becoming a big company, at that level where we can't comment on future roadmaps.
Ah, that's the sign of the times. Well, you're media trained. It's good to see they're doing a good job keeping you...
If you want more information, I can connect you with the PR team.
No, no, no, please, no, no. We're good, we're good. We'll get it out of you.
But our vision for Cloudera from day one, like you were saying earlier, we saw the future, right?
So our vision from day one was really to build this data system where you can have data of any type, whether that data is structured or unstructured or images, it doesn't matter. And then on top of that data, run any type of workload. That workload could be the initial genesis of Hadoop, which is MapReduce, which is batch processing. But now, as we made many announcements over the last few years, we also have Impala for interactive analytics as a workload. We have a very, very strong partnership with SAS for doing machine learning and statistics as a workload. And a few weeks ago, we announced Search as another workload. So you have multiple types of workloads that can handle different types of problems that you have within your organization, and you bring all of these workloads to all of your data, regardless of type. And that's the vision that we'll continue to deliver on. That's exactly what we're building, going into the future.
So how does that fit in with YARN, right? We're hearing a lot at this conference about YARN, the ability to do more with less and a lot of the things that you typically hear within the enterprise. So talk about that a little bit.
YARN is a very core part of our platform. In fact, YARN has been part of CDH4 for more than a year now out in the market. I think we were the first vendor to bring YARN into a distribution of Hadoop out there. It's very, very fundamental to us because that's how we're going to coordinate. We're going to be using YARN to coordinate launching all of these different types of workloads. You're going to have the MapReduce workload, which is very batch-oriented. The Impala workload, which is very latency-sensitive. The Search workload, which is also very latency-sensitive. The machine learning workload, which is more batch-oriented, et cetera, et cetera. And YARN is a very central piece in helping us coordinate all of these different types of workloads on the platform.
Cloudera has been a great citizen in the community. Obviously, you mentioned, and we witnessed, that your team created the industry. You guys were there. You took the chance, you were the first ones commercially funded by the venture capitalists. Then others followed, and now we see a huge ecosystem here. A lot of noise. A lot of people are trying to get attention. So I have to ask you, because I want you to address this, because I know it's been talked about in some of the other blogs: there's a lot of FUD going on around who's doing what. In some cases, maybe flat-out misinformation, and that happens in a growing market. Elbows get sharp. So I want you to share with the audience anything that you want to say about the FUD around what people say about Cloudera or about others or what you're doing, just to clarify. Because there has been, I mean, I've gotten back-channel information around the committers and so on, and it's been well documented. There's a lot of FUD out there. What would you say to the folks out there to clarify that?
I would say that our focus should be to continue to work as a community to push the platform forward. I would say that at Cloudera we do a lot of contributions. Hortonworks definitely is one of the top contributors out there as well. I'll acknowledge that. So are many, many other companies. And we want to continue to see the platform evolve. I would stress, though, that at Cloudera we do have a number of the original project founders working at the company. So it's not just the contribution that we bring but the fact that we have the founders of these projects working at Cloudera. And some of these projects actually were created at Cloudera from day one, as opposed to created in some other company and then you hire the employee and they work for you. So I'll give you both examples from Cloudera. Doug Cutting is the creator of Hadoop.
Doug Cutting is also the creator of Lucene, which became Solr, which is part of the Search project that we launched recently. Doug Cutting wasn't with Cloudera from day one. Right, so when he created these technologies he actually was at Yahoo, for example, when he created Hadoop. He was at Yahoo, wasn't at Cloudera. However, he now works for Cloudera. So we get that because now Doug Cutting works for Cloudera. So that's one example. On the flip side, there are projects like Flume and Sqoop that are now part of every single distribution out there, and Flume and Sqoop were both created at Cloudera. They were actually created inside of Cloudera. So the key point, and that's what I would ask of all of the vendors out there that are trying to leverage Hadoop and get benefit out of Hadoop, is: please don't be just takers. There are some vendors out there who are just takers. They just want to take from open source and don't give back. Right, I'm not going to name them, but there are a few of them out there. Please, please, please. I mean, that is very, very selfish behavior. It's not going to help the ecosystem in the long term. We would like to see you both take and give at the same time. So that would be my core message. And, for example, I thank Hortonworks, because that's exactly what Hortonworks is doing. They're both giving and taking at the same time.
You guys have always been clear on that. I mean, your contribution to open source has been well documented and there's no question about that. John and I have talked about it a lot. You guys helped get it all started. Even Hammerbacher, when we had him on a couple of years ago when Hortonworks came into the market, said, "Hey, the more people working on open source, the better."
Yeah, exactly.
Yeah, it's always been your posture. You're not playing games there anyways. Having said that, you have a strategy to layer on top of that open source some of your own proprietary code.
And so you have choices to make in terms of how you allocate those resources. So as an engineering manager, how do you allocate those resources in terms of, okay, what do we do for the community and what do we do for our own future because of the business model that we chose? How do you make those trade-offs?
Yes, that's a very, very good question. So first it's important to stress that our core platform, CDH, is open source. Everything we put in the core platform is open source. So for example, Impala, which we launched very recently as GA (we launched the beta last year, but now it's GA), is 100% Apache-licensed and 100% open source. Search, which we announced very recently, is also open source. So for the platform itself, we're committed to everything in there being open source. Now, we believe fundamentally, just from having lots of history in studying the open source markets, from our CEO, Mike Olson himself, being one of the very first open source people in the world with Sleepycat, the company that he sold to Oracle before founding Cloudera, and from our investors helping many other open source companies: to have a successful open source company, you need to have a very good engine between the business model that generates revenue and the products that you are creating. If you don't have a good feedback loop there between these two, you won't be able to sustain the innovation to continue to push the boundaries of how good the product is. So we strongly believe in that. If your product is literally 100% open source, meaning both the management and everything, with nothing proprietary whatsoever inside of your products... I can't tell what that is.
It's taking a picture.
Oh, sorry, I thought somebody was waiting for me. Sorry about that.
It's a huge signal. It's such a really good armor. I thought it was like a card of paper with some writing. You have a fan out there. They're storming the concert here.
Okay, that's good to hear. Sorry about that interruption.
So if you have everything 100% open source, that creates two problems. First, you have no differentiation whatsoever, meaning another big corporation, without naming who the big corporation could be, can just take everything you do, literally every single bit of source code you have, and say, "Hey, we can do it too. Come to us, don't work with those guys. We have the greatest things that they have. Why do you want to continue to work with them?" So no differentiation is number one, which is very dangerous. And number two, if it's 100% open source and there are lots of other vendors able to take the open source artifacts and work with them, then it becomes purely about maintenance and insurance on the product, which is a commodity product, and obviously the prices for that will go down to the ground. And you won't be able to sustain this positive feedback effect between your business model and your product roadmap. And you won't be able to build a long-lasting company. So that's why we do have a combination of open source artifacts and proprietary artifacts. Now, our proprietary artifacts are always around the management of the system. So how do we manage the security of the system? How do we manage the data flow within the system? How do we manage the services inside of the system across all layers, right? Not just the Hadoop layer, but the HBase layer, the ZooKeeper layer, et cetera, et cetera. So that's where we focus our efforts going forward. And that's how we differentiate ourselves from other vendors out there. Cloudera Manager and Cloudera Navigator are very unique to us. Nobody else has anything close to those capabilities out there.
So it sounds like the contributions you make to open source are cultural in nature, I mean DNA of sorts, right? And so that's something that you guys do because you've always done it.
Absolutely.
The artifacts that are proprietary are essentially around rationalizing the revenue opportunity with the expense that you're going to apply there and making a business case. That's one, and then two, the differentiation from other competitors. So it's these two things.
Yes.
Okay, so.
And we believe that's fundamental to open source business models.
Yeah, I mean, there are many open source business models, right? You can go pure service, or, like you said, you can closely guard the code.
There is no pure-service open source company that was able to build a long-lasting, surviving public company. Never happened in history. They always get acquired because it becomes a commodity.
I mean, right. I mean, even IBM.
Amr, I want to ask you about the storage thing. We were talking before going on camera about the Hortonworks storage announcement. What's your take on that?
Which one? The Gluster?
The one with Red Hat.
Yes, so Red Hat, and yeah, there has been some recent news about Red Hat with Hortonworks having a version of the Hadoop platform that uses MapReduce for the computation, but uses Red Hat for the storage, right? So Red Hat has a new storage offering that was built based off of a company they acquired called Gluster. And that news was very, very surprising to me. And the reason why it was surprising is correlated also with a shift in messaging from Hortonworks. If you look at Hortonworks at Hadoop Summit last year, one of the key messages that they delivered to us is that within the next five years, or by 2015, the tagline back then was by 2015, and you're doing research right now to see if I'm saying the right thing, by 2015, half the world's data will be stored in Hadoop.
Will be stored in Hadoop.
If you look today at the slides, it doesn't say that.
It says within five years, right?
No, no, no. It says...
Well, that was the second iteration, within the five years, and now they say something different.
Now they say by 2015, half the world's data will be processed by Hadoop, instead of stored in Hadoop. And that's a very, very fundamental shift.
So it's a nuance.
It's a very important deal.
Because when I first saw that I said, what does this all mean? And then 2015 sounds a little early. And now you're saying "processed by." Okay, that's different.
Yes, exactly. And the reason why is that we believe HDFS, the storage system of Hadoop, is very, very core to the Hadoop platform. It's really the layer that made Hadoop what Hadoop is, more than anything else. It's how scalable, how reliable, and how economical the HDFS storage layer is. So we really, I mean, ask Hortonworks and ask all the companies working in the Hadoop community not to fragment at the storage layer. We need the storage for Hadoop to stay inside of Hadoop and not to fragment it out. That's very, very critical.
Okay, so but. So you're saying that they're indicating it through the gesture; they haven't come out saying "we're going to fragment HDFS," but the way that this is positioned might signal it.
No, no, no. The announcement with Red Hat is the direct signal. Literally, you'll be able to run MapReduce directly on top of Red Hat storage instead of HDFS.
Okay, so I.
Is that a compliance?
I interpreted it as Hortonworks just hedging on its prediction, on which I said, okay, I'll give them a break on that. You're saying it's something different. It's a shift in strategy.
Potentially.
Yeah, which can be dangerous. It's a shift in strategy.
Is that a compliance issue? Cause you know, the diss on Hadoop is POSIX. Red Hat does have a lot of enterprise customers. So is that just maybe.
Then invest in making Hadoop POSIX compliant, which actually, by the way, we are as a community investing in.
Yeah, that's a must have.
Yeah, so we are investing in adding POSIX compliance to Hadoop.
We're investing in adding snapshots into Hadoop, which will be coming very, very soon over the next...
Do you think that in some year, I don't care if it's 2015, 2020, whenever, the majority of the world's data will be running in Hadoop?
The majority of the world's data that has to do with analytics. Yes.
Okay. So that is very important, the caveats.
Yes, exactly. Because there are lots of types of data that are not very suitable for Hadoop at all. For example, the data storage for Oracle database systems. No, you want to store that in a NetApp or an EMC. You don't want to store that in Hadoop. The data storage for streaming video files, right? For just streaming lots and lots of video files. No, you don't want to store that in Hadoop.
It's a huge proportion of the data.
Yeah, which is a huge, huge proportion of data.
In fact, that could overwhelm the data.
Yeah, so the nuance: I would say I agree with the half thing, but half within the world of data for the purpose of analysis.
Yeah, okay. So that's... None of that on the... Yeah, okay, but it's a more reasonable, but I never...
Which is still a huge market, by the way.
It is. Yeah, it is.
Yes.
Okay, so what's next for you, Amr? You've gone on this journey. You started this company. You've been traveling around like crazy, working with customers. What's the next phase of Amr Awadallah's career? What do you want to have happen next? I mean, what excites you? What are you working on?
Yeah, just to continue to grow Cloudera to be the biggest company it can be. I mean, we want to be one of the very few companies that were able to take an open source model and turn that into a large, publicly traded corporation.
So you've talked about that. You just brought a new CEO on, right? Look at the background of the CEO. Clearly he's got some IPO chops. So that's an aspiration that you guys have put forth. Okay.
And you're outward facing now, so you're doing a lot of travel?
Yes.
So where have your travels taken you? You've been in China, obviously you've got a European office open. So what's going on internationally? Give us some sound bites of what's happening in the field.
Yeah, so internationally, I mean, Europe definitely is our next big focus right now, and we now have a big operation in Europe, an office presence and a big team down there. And it's growing very quickly. I would say Europe is about two years behind the US. That's kind of how the growth usually mirrors what's happening here. And yeah, so our next big market is Europe. We are looking at China. We don't have a big presence in China right now. Japan, we have a big presence in Japan. Japan is growing very quickly. And obviously the US is growing very quickly as well.
Great to have you on theCUBE again, for me personally and for Dave, and I want to say thanks to Cloudera for some great support over the years. You guys have been fantastic. You know, you've built a great company. It's so hard to build a company. You guys have done a great job. I got to ask you the final question, because you did bring that first sound bite, which was "I saw the future." This is back when you guys were just in your B round and in the Palo Alto office, just starting to ramp. What's next? What do you see as around the corner? Obviously we're on a trajectory right now. A lot of things are going to get done. POSIX compliance, a lot of stuff's going to fill in. The platform's going to get stronger. We think that open source will win, through all the democratization of open source. What's next? What's around the corner that you're watching personally that's interesting to you, Amr, around where this will take us?
Yeah, so what's next is having this vision become true. Having this future vision that you refer to become true.
Meaning having a single platform that can store all of your data, regardless of the type of that data, and allow you to extract value with different types of workloads, whether that be batch, interactive, machine learning, search, or more, right? There will be more things that will come to the platform. But it's about how to bring all of your data applications to your data, and to all of your data, as opposed to having the data go to them.
And what are the landmines out there that you, and the industry and community, need to avoid to make that a reality?
The key landmine is a bit technical, which is making sure that the YARN vision continues to evolve and that we have the capability to properly have a multi-workload resource management system that allows me to run all of these types of workloads without having them step on each other's toes. That's the key, key step going forward.
And of course, playing well together in the sandbox. And as always, competition is good. And again, Hadoop is doing great. Amr Awadallah, co-founder of Cloudera, inside theCUBE. This is SiliconANGLE and Wikibon's exclusive coverage of Hadoop Summit here in Silicon Valley. We'll be right back with our next guest after the short break.