I'm John Furrier with SiliconANGLE.com. And I'm Dave Vellante of Wikibon.org, and we're here with Amr Awadallah and Richard McDougall of VMware, here at VMworld. So this is our big data panel, spotlight number one. We're going to have another afternoon session with some other big data gurus, but the conversation that's hot in cloud is big data. So, Amr Awadallah, co-founder of Cloudera, the hottest startup in Silicon Valley around big data and the market leader in commercializing Hadoop. Amr, welcome back to theCUBE. You're a CUBE alumnus; you've been with us at Strata, and you've been on theCUBE in our office, actually your office.
But I can barely hear you.
You can barely hear me?
Yeah, maybe you can speak up a little.
Okay, I'll speak up a little. I'm hearing you delayed from over there.
Okay, you've got to try to hold it together. Richard, welcome to theCUBE.
Thank you.
Okay, so first question, Amr: big data. How has VMware changed over the years, from a virtualization company to a pure-play cloud company?
Is that a question for me or for Richard?
For you, from your perspective as an entrepreneur. You're pioneering software, and you know about virtualization.
Yeah, yeah.
What do you see changing? What do you expect to hear?
How do I expect VMware to change going forward, or how have they changed already?
How they've changed, and what you expect to hear from them.
I mean, VMware is changing beyond just the raw virtualization layer. We see them growing into the layers above that, right? We saw that with the acquisition of Zimbra and with the acquisition of JBoss, right? No, no, SpringSource.
SpringSource.
Yes. So we can see how they're going beyond just virtualization at the lowest layer. Regardless of that, storage is very important at both of these layers. Traditionally, I've seen VMware focus more on storage for the virtual machines using storage area networks and centralized storage.
And I predict that over the next few years that's going to change, and they will be using more of the storage on the servers themselves.
Richard, the question for you: I wrote a post about OpenStack this morning and how VMware's cloud plans are possibly putting a damper on things like OpenStack, which is open source, where the emphasis is on delivery, on commercialization, and on high-grade quality and reliability. So how is VMware continuing to roll out in the enterprise when you have to deal with open-source frameworks and software development environments?
I think open source is actually really good, because it helps drive the ecosystem of tooling and community above it. But if you look at some of the core values you need in this growing world, especially in the big data world, people want big data with strong isolation and with resource controls. They'd love to be able to share the same cloud infrastructure between big data and other workloads at the same time. So the core features that have gone into the product around resource management, strong isolation, and those types of capabilities will really enable people to leverage that level of isolation on the platform. If you look deeper into the platform, a lot of the core values that have been in the vSphere platform can now be exploited for the needs of big data, and I think there's a lot of differentiation in the platform at that level.
So, Amr, we look at the trending items in the Twittersphere and at what's going on in the marketplace. Obviously, Hadoop and big data are very hot. What are you seeing that enterprises really want in big data? And how does that relate to their cloud play, their cloud strategy, or their cloud possibilities?
I mean, Hadoop and big data in many ways are about cloud, right?
I mean, if you define cloud, cloud is essentially about having a scalable resource that can do your bidding, right? And in many ways, Hadoop enables you to do that when it comes to big data. If you have lots of data and you want lots of flexibility, agility, and scalability in processing it, that's exactly what Hadoop offers you. There are three major trends I'm seeing that are making Hadoop so hot today. The first one is actually behavioral: organizations want to be more agile and more adaptive, and they don't want to be locked into a given language to query their data, or locked into a given schema in which they have to present their data. Hadoop gives them that freedom, that agility. The second important trend is the commodity hardware trend, right? Now we have commodity hardware, boxes with multiple cores and multiple disks, and that allows us to push a lot of the computation down and scale our big data processing much further than we've ever been able to before.
I'll just turn it over to Richard so we can go back and forth. Richard, what's your view of big data? How do you see VMware's commercial opportunities, and how does that compare with what Amr said?
Yeah, I agree with Amr. Big data is very important. Customers are realizing they can unlock a lot of value from their data, and they're starting to invest in a diverse set of big data platforms. Hadoop is definitely growing like crazy in our customer base, and the interest level around Hadoop and the community around it is very substantial. I think you've touched on some of the things Amr raised around agility. People want to be able to deploy Hadoop very quickly on the existing cloud infrastructure they have.
And so if you mix things like Hadoop with a cloud infrastructure, you get agile Hadoop, where you can deploy on-demand Hadoop instances for analytics rather than having to wait three months to set up a very specific cluster for that specific purpose.
I have to ask, because this is something we're following very closely at SiliconANGLE: the whole HP debacle around their PC division and some of their moves. The world is moving toward commoditization of PCs and servers, which is one of the reasons Hadoop is so powerful and its capabilities so enabling. HP and the other big server guys are also big customers of VMware. How does that all shake out? Do you think it's just natural evolution? How are they going to adjust? They're your partners. At the same time, the commodity trend is rapidly accelerating, from the edge all the way into the core data center.
I think what it will do is change the shape of the platform configurations that are used. Hadoop is about using a commodity, off-the-shelf platform for both compute and storage. So I think we'll see an initial amount of Hadoop deployed on virtual clouds on top of things like infrastructure SANs. But I think we'll see a commoditization of the storage layer as well. Hadoop can take a lot of machines with local storage, cluster them together, and deliver high-performance computing and large data stores on top of that. So I think the next big thing to come is a change in the way storage is provisioned, and probably a five- to 10x reduction in the cost per gigabyte of acquiring storage for big data.
The one question I want to ask Amr and Richard, and then I'll let Dave ask a few questions, is: what are the customers saying to you guys? What are they looking for? In a way, they're inventing the future with you guys, right?
So in a way, a lot of the things they need, they don't know yet. You once said you've seen the future and you're out there inventing it.
That was three years ago that I was saying that. I think I started three years ago.
Okay, let's get down to our theme. Our theme is reality: delivering product. It's hard, and you've got to have quality. So what are customers into right now? What are they asking for? What do you think they need, and how are you guys looking at rolling this out to customers?
We've been looking at our customers as either test/dev, small to medium, or one of the super-huge bleeding-edge customers of Hadoop, who are trying to roll out five or 10,000 nodes of Hadoop. In the small to medium and the development and test environments, what people are saying is: we have this cloud infrastructure in place, and we'd love to be able to deploy Hadoop quickly on top of it. And we'd like to be able to share the platform with other workloads. For example, if I've got these 100 nodes, I'd love to use them for Hadoop at night, when I'm not running my main data center applications on that same platform. So it's time sharing: elastic Hadoop on top of existing infrastructure, time-shared with other, more critical workloads.
Are you worried about the fragmentation of Hadoop? Obviously everyone's coming out with their own flavor; it seems to be the trend. Everyone says, "I have a version of Hadoop." You've got MapR, you've got competing versions of MapReduce, and other rewrites going on. How does that all shake out for you guys? Do you care?
Yes, we do care, because if you look at a lot of the work we're doing around PaaS and Cloud Foundry, at the upper level we have an application stack, with developers interfacing with those application frameworks. And Hadoop looks like two pieces of software.
It's a developer framework, on top of which big data developers can write MapReduce and other tools. And then it's a bunch of system software at the back end: the system software for scheduling, storage, and all the other things. So I think it's likely there'll be more work integrating that system software with the virtualization platform, and in the future there may be more work integrating Hadoop with PaaS frameworks, to try to take a different approach to that.
Let me answer your earlier question, the one about customers and what they're asking for. There are essentially two things customers are asking for right now that we've focused on at Cloudera. One is scalability of people. Hadoop is very good at scaling processing, right? But how can we get one system administrator to manage 1,000 nodes, or 10,000 nodes? Over the last couple of years at Cloudera we've been focused on the Cloudera Management Suite, which essentially attacks that problem. It allows one system administrator, one Hadoop administrator, to scale their capacity to manage hundreds if not thousands of nodes. Toward that end, we also launched Cloudera SCM Express, which brings down the level of knowledge you need, so that even our CEO was able to install Hadoop using it. So the first dimension is how we can bring the administration, deployment, and configuration of Hadoop within reach of the normal system administrator. The other one is integration with existing enterprise infrastructure. Customers want to make sure they can use Hadoop with the stuff they have today. Along those lines, Cloudera has been working on many integrations with industry partners. Some of the big ones are the Dell relationship we announced very recently, MicroStrategy on the BI front, and Informatica on the ETL front.
We also launched a partner exchange program, very similar actually to what VMware has, to allow many of the players in the partner ecosystem to start building their own integrations that work directly with Hadoop. But these are the two biggest areas of focus for us right now.
So Amr, I wonder if you could comment on the other part of John's question, which related to competition. When we last met, at Strata in February, I asked you about competition, and you said there really wasn't any. Since then, it's come out of the woodwork. You've seen EMC and Greenplum, you've seen, in a way, LexisNexis, you've seen Hortonworks. So if you could comment on that. And the second part of my question: a lot of the competitors are saying that Hadoop is not enterprise-ready, and I'd like to give you an opportunity to comment on that, because I know, as we've talked about in the past, how hard you're working on that problem. You may understand it better than anybody. So first, talk a little about the competition and your thoughts on it, and then let's talk about whether Hadoop is enterprise-ready.
Sure. In regard to competition, you're right. A couple of years ago, three years ago, it was just us, right? We were the only ones saying, hey, this is going to be huge, and nobody believed us. We told them this was going to be a major movement. As you said, we saw the future: this is going to change the way data storage and processing is done. So we have three years of headway compared to anybody else in the industry right now. That has materialized in lots of customers across many industries, so we know how they use this technology, what their needs are, and what we need to develop for them. So yes, there are other solutions out there that will try to do it one way or another. Any major new technology wave will have many, many players appear as soon as the market proves to be hot.
Many players will appear; few will survive after the wave is done. Cloudera will be number one in that space, I assure you. Now, in terms of enterprise readiness and enterprise maturity, as I said a few minutes ago, that's one of our biggest areas of focus right now. Hadoop is not there yet. It's getting there very quickly, we're putting lots of resources behind that effort, and we will be there. You have to keep in mind that Hadoop is only seven years old. It's not that old as technologies go, compared to VMware at 15 years old, or database technology at 30 or 40 years old. So it's getting there, but it's going to take time.
I have a question for both of you guys around something I was talking about privately with our team; we haven't really written anything about it yet. It's the democratization of IT. Obviously with the social web there's been a democratization of media: blogging, we're all doing that, and we're doing it live right now. We're seeing that trend. You mentioned ease of use for the system admin; there's a democratization there. You're seeing federation in clouds. So it should be as easy as a business analyst saying, hey, I can improve my company's business by using, say, big data, or, I have system resources out there, and then configuring it, rolling their own, and literally not having to go through the IT chain of command, or at least having that made easier. So is this democratization of IT a real trend, in your mind? I don't want to call it consumerization. It's more about enabling someone who can make change and drive revenue or reduce costs at the same time. Let's start with you. Do you think that's true?
Absolutely. I think that's going to happen at two layers. First, at the cloud infrastructure layer, our job is to make these Hadoop distributions run as well as they can on the cloud layer. You don't want a business unit thinking about, do I have to go and request or deploy Hadoop?
It should be a matter of: I know I've got a cloud, which is a set of resource pools; I know what it's going to cost me; I can just go request those resources dynamically and use them. And then at a higher level, on top of Hadoop, in support of what Amr was saying, you would expect Hadoop to head toward very simple, multi-tenant, self-service Hadoop. Once you've got a cloud infrastructure, you can say, I want to use Hadoop, this particular job is going to need 100 nodes, give me that straight away, and you shouldn't have to interact with the systems group or systems platform team to get that job done.
Yeah, I totally agree. Both cloud virtualization technologies like VMware and scalable data processing technologies like Hadoop that can grow to many, many nodes are allowing the layperson to do things they just couldn't do before. And we're seeing startups come up left and right that are changing the way things are done, which wouldn't have been possible without these technologies, and without public cloud technologies like Amazon EC2 as well. A very good example of a startup I'd like to mention is AdMob. AdMob is a mobile advertising company that...
Now run by Google.
Sorry?
Yeah, it was acquired by Google for $800 million or so. And that company wouldn't exist without technologies like Hadoop and virtualization. It just wouldn't exist.
How are VMware and Cloudera working together? Can you share any specifics with us?
I don't know if we can go into too many specifics. We are definitely doing performance testing of Hadoop on the VMware platform. We've done moderate-scale performance testing, because there was a lot of concern from customers about big data workloads. In the past, there were a lot of questions about things like Oracle and large databases, and we spent a lot of time showing we can run those well.
And now we've been turning our attention to big data platforms like Hadoop. We have a lab, we have machines, and we're running Hadoop at moderate scale and showing that we get single-digit overhead, so very low overhead with virtualization underneath Hadoop, while you still get the qualities of agility.
So a lot of proof points, but not intense integration. Is that because it's just not needed, or...?
No, we have integration between our technologies. We have the Whirr project in Apache right now, and Whirr allows you to easily deploy Hadoop on top of a vCloud solution. You can just point it at the vCloud, say how many nodes you want, and it spawns the cluster for you.
Well, we have a startup panel coming up next, so it's a great time to talk about startups, but I want to ask one final question. I've been following Cloudera and VMware for many, many years, and you said system software. It sounds like an operating system: resources, deploying, making things easier. Amr, you've been very impressive in being ahead of the competition and being forward-thinking. And it seems like this VMware and Cloudera connection is a good formula. So what's your vision going forward, not just for Cloudera but for the whole cloud ecosystem over the next five years? And then, Richard, the same question to you.
What do you mean, how? Sorry, I'm trying to understand what you're trying to get at here.
How is VMware going to evolve, and how is the marketplace going to evolve with commoditization? We're seeing smartphones, we're seeing HP's challenges. All these things are going on in the marketplace.
Yeah, so one of the key observations I made earlier is about the storage layer. Again, I might be proven wrong, but my prediction is that storage will move away from centralized storage, specifically for data processing or data-access-heavy applications, because commodity hardware is bringing down the cost of storage significantly within commodity servers. They already have 12 disks, 24 disks in them. So how can we use those disks, as opposed to having to go back to a storage area network? First, those disks are much cheaper in terms of access, and second, the latency to use them is much better, because you don't have to go over multiple network hops or contend for a shared network resource to get to your data. So that's my prediction for the next five years: we'll see a major shift in how storage is managed.
Will flash play a role there, do you think?
Absolutely, absolutely. There's no question.
And just a quick follow-up, and then I'll transition to Richard, and he can add this on to the question: opportunities for startups. You're well financed, with great angel investors, you know the landscape, and there's a lot of white space, whether it's software configuration or, as you're saying, system software versus application software. What opportunities do you see out there for startups?
Yeah, so one specifically is how to work with virtualization technologies like VMware and others to join the storage layer at the node level with the virtualization layer at the compute level. How to bring those two things together.
So, the question: the future, how it's evolving, and then opportunities for startups.
So, two big things in the future. I think Amr hit on the point: local storage will become predominant, and virtualization will be showing how to exploit local storage for Hadoop installations. That's definitely a trend going on.
The cost per gigabyte of storage is something like 10x lower at that price point. The other future we'll see in the five-year timeframe: when we go and talk to big data customers, or data scientists specifically, they typically don't just use Hadoop. They might be using SAS or R, or code in other languages, like PHP. So what we've been doing in the layers above is radically simplifying those development environments through platform as a service. The Cloud Foundry PaaS framework is a really good example of how you can take all that system software out of the mix for developers using those language frameworks. I think the same thing will happen for data scientists. You'll go to a PaaS and just say, here's my MapReduce, or here's my Hive, or here's my Perl code, or whatever I'm writing in, here's my R code, and throw it at the PaaS platform, and I don't have to think about spinning up machines or instances or anything behind that. So: a radical simplification, through PaaS, for all the languages that are needed for data science.
If you don't mind, just to remind everyone, there's also lots of opportunity in building solutions on top of this technology wave. A very interesting example is a company I met with a couple of days ago called Skybox Imaging. Skybox is building commodity satellites, very cheap satellites, launching them into space, taking images of different locations, analyzing those images, and selling the data that comes out of it. For example, it can look at how many cars are parked in front of Home Depot at different times of year. That data is very valuable for analysts who are covering Home Depot, and also for competitors who are working against Home Depot. That's just one example; there are so many things you can build on top of this technology that you just couldn't think of before.
I mean, I'm just so bullish on big data and the limitless opportunities for a business analyst or a young entrepreneur. You shouldn't need a PhD to do big data or run clouds. So: democratization of IT, across systems, environments, and applications. Guys, thanks a lot for coming on theCUBE. Appreciate it.
Okay, thanks for being here.
Richard McDougall, thank you very much. It was great to see you again, always a pleasure.
Thank you.