Let's go, extracting the signal from the noise. It's theCUBE, covering VMworld 2015. Brought to you by VMware and its ecosystem sponsors. Now your hosts, John Furrier and Dave Vellante.
Hello everyone, welcome back. This is theCUBE, live in San Francisco for day three of coverage of VMworld 2015. Again, live in San Francisco at Moscone North Lobby, right off the street. I'm John Furrier with SiliconANGLE. I'm joined by my co-host, Dave Vellante. We're kicking off the day three segments after two days of wall-to-wall coverage. Go to youtube.com/siliconangle or siliconangle.tv for all the videos. We have a special guest on this kickoff: George Gilbert, senior analyst at Wikibon on big data. Guys, Dave, big night last night, obviously it was the circuit party for all the action. VMworld is really known for its engagement, its communities, its interactions, but it's also known for all its parties. And certainly tonight, the super party at AT&T Park. But we learn a lot at these parties. People love to talk and we love to listen. So we were out there pounding the pavement, getting the scoop. And interestingly, a lot of venture capital parties. So you've got a lot of startups here, and VCs trying to find the startups. But certainly the post-Gelsinger day yesterday was really about VMware: the future of VMware, what's going on with EMC, the Federation, a lot of conversation around the ecosystem. Everyone's engaged. I mean, there's kind of a lot of storage focus, but again, it's a free-for-all on the floor there. Ton of activity. It seems like people are waiting. It's like the calm before the storm, we're in the eye of the hurricane. Some stuff's going to happen. We're in this ecosystem, a lot of transformation. Dave, what's your take on what you learned last night on theCUBE?
Yeah, so last night we were at Lightspeed, hit Highland Capital, Kleiner Perkins, Sequoia. We met a lot of entrepreneurs, a lot of VCs.
I'll say this, there's investment and innovation going on in virtually every part of the stack. And surprisingly, that includes storage, because you figure, okay, who needs another all-flash array company, but there are several in stealth. But also a lot of activity around security and networking. Tons and tons and tons of investment going into those spaces, and also still tons into cloud management. So it was good. And then as well, George, a lot of big data action going on. We saw numerous people, I saw Ben Werther out last night. We saw Stefan from Datameer, a bunch of folks in the Hadoop and big data community. Not that this is a Hadoop and big data show, but everybody's here, and it's close. So we wanted to have this special segment with you, George, to sort of catch up on what you're seeing in the space, a special big data segment here at VMworld 2015. So give us the update. What are you hearing on the ground? You've been focusing on Spark versus Hadoop, Hadoop 3.0, you've been talking to the practitioners, the technologists. What's your take on what's going on?
So if I were to sort of draw the evolution of big data on this infrastructure in stages, Hadoop 1.0 was MapReduce. We all know that. And we built higher-level things on MapReduce to hide its complexity, but you could only do one thing at a time. It was the DOS era. It was pretty low level.
C prompt.
Yes, exactly. With YARN and HDFS, we had enough of a platform where we could have multiple compute engines working. We could have an Impala for sort of MPP SQL query. We could have machine learning going on. We could have essentially multiple applications working on the same data. What has been less clear is what Hadoop 3.0 is going to look like. And I think those threads are beginning to come together, where YARN will grow up from something that just allocates hardware resources to something that can also manage workloads.
One of the sort of secret sauces and best-kept secrets within Oracle is that you can run queries that are optimized for performance, but that also have like hundreds of other people competing for the Oracle resource. And it can manage that.
The what's-leftover resource.
Yes. And the methods.
So you're talking about your quality of service, if you will, for the main app. And then, OK, everybody else can go fight over it.
Well, but it's also which apps get how much in terms of processor and memory, things like that. But then there's the other layer, which is, for each user, based on their role and their priority, how much of the workload gets distributed to them within the application and then between apps. So you're running Oracle, and you might be running a stream processor. And how do you keep everyone sort of happy? That's now a layer that's becoming sort of either YARN v2 or Mesos.
And you're saying that's a harbinger of what we're going to see in Hadoop 3.0?
With Hadoop 3.0 as a collection of capabilities, that would be the new foundation in terms of resource management. So it's really kind of like an operating system, like Windows manages all the workloads within all the applications that are cooperating. Then, right now, HDFS is kind of limited as a storage medium because you can only add stuff. So it's sort of like an archive. And it's also pretty much disk-based. But we want to have a performance tier that's in RAM. And we also want to have sort of a higher-performance kind of storage tier, like our analyst David Floyer talks about, which is sort of large, flash, but high speed.
Low latency.
Yes.
OK. No, but doesn't Spark bring that? It's very intensive.
Well, that's the thing. Some people think Spark will choke on the fact that it needs so much RAM. And then when it doesn't have it, it spills to disk and gets really slow.
But if you have a richer storage tier where you have both RAM and this low-latency persistent storage as well as traditional disk, then things like Spark and others can become much more cost-effective at very, very large scales. Those are the first two layers. Then the third layer is, we've got all these different vectors going off, where we have the MPP SQL databases. We have these document databases like Mongo. We have the stream processing things, whether it's for the Internet of Things or because we want analytics to happen really, really fast, rather than storing data in the database and then doing something with it. So this next layer that's coming together is putting together the storage, where you could have the key-value and tables that you would get from HBase, the files that you would have from HDFS++, and a doc database like MongoDB. That would all be integrated in one layer. Then you would have your choice of streaming or storage. And then on another layer, you would pick your poison in terms of analytics, whether you want machine learning, whether you want SQL. So you have all these three things integrated. Then, on top of the storage layer and with this resource mediation, where you put those three together, you have a platform where customers can essentially go to their database vendors and say, OK, you're good for the transaction processing. We're not going to do that here.
Everything else?
Yes.
But you're talking about totally reinventing the big data stack.
I'm talking about evolving it.
OK, so it's not reinventing it. All right, so be specific. What vendors now are doing that?
Well, it's reinventing what? Well, how does that relate to VMworld?
OK, it relates to VMworld in the sense that we need sort of infrastructure as a service to make that work, because we have to abstract out the complexity of running the clusters.
Well, it also relates in the sense that guys like Amazon, as you pointed out to me, George, are building that end-to-end data management stack and delivering it as a service. And that is a killer offering.
Within the infrastructure. Everything from infrastructure as a service all the way through platform as a service.
And so how do VMware and others compete with that?
OK, here's my take. So take someone like a VMware, maybe OpenStack, maybe IBM with SoftLayer, if they can manage that infrastructure as a real service, like Amazon does it. And then you have the software vendors, like a MapR. MapR is really bringing these software layers together. But you'll see it coming from others. I think you'll see it coming together from them for an on-premise offering. If you put that on top of the VMware-style infrastructure, and if MapR can manage it as a service, you're going to see essentially the TCO that you might get in the cloud, but without the proprietary lock-in that you would get in the cloud.
Okay, so now let's come back to Spark v Hadoop. When Spark came out, you guys did the Spark Summit. There was a lot of excitement around Spark. IBM, I think they did their billion-dollar play or playbook, or maybe not, I can't remember exactly. But they put a lot of effort and resources into Spark. Some people said, oh, IBM's just doing that to kill it. But you, George, got very excited, and you actually made the point to me that maybe Hadoop needs Spark more than Spark needs Hadoop. And we sort of had that conversation. What's your take now, having gone out and talked to people, maybe getting some arrows shot at you? Give us a sort of view of the Spark versus Hadoop, where-does-it-fit debate.
Take what I talked about in terms of these three layers: storage, sort of resource scheduling and mediation, and then the application analytics on top.
Spark fits in that in the sense that you can have one engine that does your machine learning, your streaming, your graph processing, SQL, all on the same data set. And it's a single application model. It can rest on the mediation layer, whether it's YARN or the even more sophisticated Mesos. And it can rest on a storage layer, though that has to evolve somewhat from HDFS, just like the broader Hadoop v3 I was talking about. We need a smart storage layer so that Spark knows where to move data around when you're doing the analysis.
So a couple of little things we're hearing this week, John. The rumor was floating around that Microsoft was going to buy Docker. And supposedly that deal's dead because they couldn't agree on a price; the price was too damn high. So, good way to go, Docker is to stay private. So word is now they're going after Mesos. That's because if you look at Mesos, and if Microsoft does some interesting work on storage, then they can lay claim to the foundation of the next-generation data center operating system. And that makes Microsoft extremely relevant in that space, doesn't it?
Right, also Microsoft needs to have a play here, and that's the bottom line. As for Docker, they have too much traction in my mind. Docker is going to be the winner in my opinion, not just because I like the company and like what they've done, but because their lead is significant. Their mind share, their developer traction is significant, but that doesn't mean there isn't room for other guys. A category is being created; there's going to be a number two and a number three.
I agree with you. I love CoreOS, love Alex, but I mean Docker's got a lead, they've got the market momentum, they've got the mind share.
I was talking with Alex from CoreOS last night at the Kleiner Perkins party, and I'm a huge fan of CoreOS; he's been on theCUBE. Here's the problem that they have.
They're running really hard right now, super hot, they're scaling up, they're looking for engineers. They've basically got an open ticket item for any engineer; they're trying to hire so fast, they're growing really, really fast. At the same time there's a lot of pressure coming at them, M&A pressure. Everyone wants to do a reach around on them, everyone wants to grab them. They're the bride right now that everyone wants to dance with, because Docker's saying, no, no, we're going alone. So that's a lot of pressure for an entrepreneur and team. So I said, look, put up the heat shield, you've got to protect yourself and still run the creative product development. They have to go faster, and I think that's the biggest challenge with CoreOS. The question in my opinion is, do they just blow up, or run it too hot? But right now they're looking very good, and I think it's a great company.
Well, a few points. There's definitely room for number two, there's always room for an alternative. But with the market pressure on an emerging startup, it's really hard to grow a company really fast and not have a self-induced blow-up. That's the only thing that's going to stop CoreOS in my opinion. CoreOS is solid, and with Docker's traction, there's totally that love-hate relationship going on between the two.
So, bottom line of the bottom line, Spark v Hadoop: Spark kills Hadoop, Hadoop evolves, there's a coexistence? What's your take?
You know, I don't think it'll kill Hadoop, but there will be a certain class of customers, the most sophisticated and potentially also the smaller ones, where an integrated offering is very compelling: on the high end, because you can take advantage of features that sort of reinforce each other in a way no one else can offer, and on the lower end, because of the simplicity of having one package. But it's likely to involve some amount of the sort of Hadoop v3 infrastructure that can exist separately from it. Databricks doesn't really require the Hadoop infrastructure itself, but you'll see a Venn diagram where there is some overlap. You'll see situations where the Hadoop vendors say, oh yeah, we'll run Spark as a job, and you'll see the Databricks guys and perhaps IBM as well say, you know, you can run it self-contained.
All right, let's wrap, John. Give us some predictions, you know, based on what you've seen here this week, and then we'll wrap it up.
Yeah, we've got a long day today, but we want to kick off this segment by saying, look, we were on the streets last night, pounding the pavement, ear to the ground, listening to all the conversations, and everyone's got their side of the story that they want to promote. A lot of people are promoting their agendas, but here's the bottom line. We predicted on SiliconANGLE, I predicted on theCUBE, that EMC would not allow a reverse takeover. I'm still maintaining that prediction: EMC will create a new version of itself, maybe shed some product divisions off to the HPs of the world that really have good synergies, do some M&A, and VMware comes back in.
You're predicting Federation 2.0?
I'm predicting that VMware, and you said this yesterday, I'm borrowing your line, is too powerful for EMC to give up. And I think that legacy of the Federation is key. And this Elliott Capital is the Gordon Gekko, the modern hedge fund. He's evil, and think about the damage that would happen if he raids that company.
Thousands of jobs lost in Massachusetts; it could be a ghost town. The individuals who run EMC, they're tough; I just don't see them letting that happen. So my opinion is, I think you're going to see EMC reborn, rebooted, pivoted, whatever you want to call it, in a whole new way with VMware at the center. I just don't see VMware...
But that story doesn't end there, right? Because Gordon Gekko's not going to go away.
Well, lighting a fire under someone's butt is one thing, but coming in and taking down the company just for short-term gain, I don't think EMC will let that happen. The founders and the DNA of that company are resilient. And then obviously the startup scene, Docker versus CoreOS, that is going to be a real battle. And then on the big data side, you're seeing the big data conversation here at VMworld, but from an IT ops perspective. I think you're going to see the developer piece be a big part of next year, where it's real developers, where infrastructure as code is going to be the DevOps ethos. And then you're going to start to see the end-to-end architecture. I think that's what the new EMC is going to look like. They're going to come out looking more and more like Oracle, more and more like an engineered system, end-to-end with cloud, to compete ultimately with Amazon. And big data is going to be sprinkled all throughout that fabric.
It's that next layer.
It's that next layer. Again, we're breaking it down here inside theCUBE, day three. We're kicking off, we've got two sets. We have our director set, our new innovation. Go to siliconangle.com and wikibon.com for the free research, and subscribe to some of the cutting-edge research over there. And of course, siliconangle.tv, where all the videos are. We'll be back with more, live from San Francisco, after this short break.