from Union Square in the heart of San Francisco. It's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now here are your hosts, John Walls and George Gilbert.

Well, welcome back here on theCUBE as we wrap up our coverage of Spark Summit 2016 in San Francisco. John Walls along with George Gilbert. George, we talked a lot about the enthusiasm here, about the presence here, about really the vibe, with a lot of our guests, so I want to get your take on that. 3,500 attendees, right? Show's going strong, bigger than ever, jam-packed keynote sessions, so a lot of good things happening. But what's your take on what all of this means in terms of where Spark is in the marketplace?

Well, I mean, you have to remember that Spark and the core of the Hadoop ecosystem started at very different points in time. The components of Hadoop were published in papers by Google back in, I think, 2003, something like that, and then Yahoo turned it into an open-source product within a couple years, whereas Spark as a paper wasn't until 2009. But the numbers of people turning out for the respective ecosystem conferences are converging very quickly now: 3,500 here, and I think the most recent Hadoop conference was in the range of 4,500, so that's a rather rapid ascent on the part of Spark. But I think the energy is that the creators of Spark put in place something that has more runway, so you can push the walls out further before you hit barriers that are rather insurmountable. I'm going to use what might sound like arcane terminology, but there's something called adaptive stretch, which is when you take something simple and bend it in so many ways it was never meant to be used that it turns into, you know, barnacles on the bottom of a boat. A mixed metaphor, I know.

I love the imagery, yeah, I'm not sure if I...

And then, you know, that dog just doesn't hunt.
And we had Doug Cutting on to open up this morning, who was the creator of the open-source version of Hadoop, and he acknowledged that Spark is replacing MapReduce as the core calculation engine.

But on that point though, MapReduce is just one slice of it; you've still got YARN going strong, what have you. So Hadoop and Spark, I mean, can they fit inside the same puzzle?

They can overlap, or they can compete almost completely. You can run Spark within the Hadoop ecosystem, where you carve out anything that's MapReduce-related, which is all the calculation-engine-type stuff, and you keep the different storage layers, the databases, file systems, and the management tools from Hadoop. You have a simpler Hadoop when you combine that Spark piece with the remaining Hadoop pieces. That's coexistence. Then there's a version that's complete substitution, which is what Databricks does. They don't use anything from the Hadoop ecosystem because they run Spark in their cloud on Amazon, and others have solutions just like that.

So they're probably in that camp of: if I'm an admin or developer, whatever, I find it much easier to operate and run Spark than I do Hadoop. But why is that? What's the secret there?

Okay, so I'm being a little politically incorrect, and I'm going to use a...

It would be the first time.

You figured that one out. I'm going to use a semi-biblical reference. When the Great Flood was coming, Noah had to march the animals onto the Ark two by two. Well, in the case of Hadoop, where every component is named after an animal, you have to march them onto the Ark three by three, because for high availability, to have a quorum, if one fails you still need two running. And then you also need three ZooKeepers to march them, in case one of the ZooKeepers fails.
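The "three by three" image comes down to simple majority-quorum arithmetic. A quick sketch in plain Python (a toy illustration of the math George alludes to, not code from any Hadoop or ZooKeeper API):

```python
# Majority-quorum arithmetic: with n replicas, a quorum needs a
# strict majority, floor(n/2) + 1 members, so the system tolerates
# n - quorum(n) failures. Three replicas (e.g. three ZooKeeper
# nodes) is the smallest deployment that survives one loss.
def quorum(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1

def failures_tolerated(n: int) -> int:
    """How many members can fail while a quorum still exists."""
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print(f"{n} replicas -> quorum {quorum(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
# 3 replicas -> quorum of 2 -> tolerates 1 failure
```

Note that two replicas tolerate no more failures than one, which is why replicated deployments jump straight from one node to three.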
And so when you look at all those animals and the zookeepers, that's a fair amount of complexity, whereas Spark is one singular engine. You run it on a cluster, so it keeps going if a node fails, but it's still one engine. And the difference is that means all those components run the same way: their security works the same way, their high availability works the same way. Whereas, again, with the animals marching in triplicate, each of them is a different species, and you almost have to have a different trainer for each of those different species. That worked 10 years ago, but we can do better now.

Yeah, I don't know if that was too politically incorrect, frankly. I mean, it's just kind of tough to throw Noah in here, but that's all right. We've heard a lot about continuous applications this week too, that notion just being launched, the continuous app melding intelligence with all these other capabilities. How close is that, you think? What's it going to take to really be widely adopted and become common practice, as opposed to just a really neat concept on a drawing board right now?

Well, we heard a lot of our speakers trying to move in that direction. To define a continuous app, let's put it in historical perspective. In the very early days of computing, you took a stack of punch cards or a tape, you put it into this giant machine, you went away, and you came back sometime later and it gave you an answer. That's batch: you're running through a batch of cards or a batch of tape. The next big advance was interactive, where it gave you a screen, waited for you to interact with it, and then gave you an answer. In reality, all those applications had a little bit of both. But really, for the first time in 60 years, we're inventing a new programming model, a new way of interacting with applications where they never stop. They're always working.
And we've had problems where the computers work continuously, but we didn't have the technology to treat them that way. Think of point-of-sale terminals in Walmart. Walmart-wide, they're always collecting data and transactions, but the system wasn't set up to work that way. It was set up, well, maybe until recently, to collect the line items for a particular customer and run the credit authorization. That might go into some server in the store, and then nightly it might get batched up and sent to Walmart headquarters. But now you want to treat something like that as a continuous stream, because you get signals from that data, signals that tell you Pop-Tarts are selling out in Mississippi or Oklahoma because a twister's bearing down on them, that sort of thing. That's a pretty profound change, the fact that we're moving in this direction. It's not here yet, but part of the excitement over Spark is that they took the batch programming skills that everyone has, everyone knows how to deal with a SQL database, interactively or in batch, and they made that skill and that capability work on a continuous basis. That's a big part of why Spark is so exciting right now. Now, you can't build an end-to-end continuous application yet; there's still work to push the walls out and make that possible, but that's a big part of the attraction.

As we're wrapping up... I think we're being told to wrap up. The cleanup crew is starting to bring the curtain down on us here. MapReduce had, I mean, you can see the shelf life in a way, right? People were predicting its end from a few years back, basically, if you will. What about on the Spark front? Is it the same kind of runway out there? Or, as you said, is there a little more malleability, a little more flexibility? You can push and shove a little bit, there's still some elbow room here. How do you see that?
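The core idea George describes, writing your aggregation logic once in batch style and having the engine also run it incrementally as records arrive, can be illustrated with a toy sketch in plain Python. (This is not Spark code; in Spark the Structured Streaming API plays this role, and the store names and `aggregate` helper below are made up for illustration.)

```python
# Toy illustration of batch vs. continuous: the same aggregation
# logic ("total sales per store") runs either over a complete
# batch or incrementally, record by record, with carried state.
from collections import defaultdict

def aggregate(records, totals=None):
    """Fold (store, amount) sale records into per-store running totals."""
    totals = defaultdict(float) if totals is None else totals
    for store, amount in records:
        totals[store] += amount
    return totals

sales = [("MS-01", 3.50), ("OK-07", 2.25), ("MS-01", 1.75)]

# Batch mode: process everything at once, like a nightly job.
batch_result = aggregate(sales)

# Continuous mode: the same logic applied as each "event" arrives,
# with state carried forward between arrivals.
state = None
for record in sales:
    state = aggregate([record], state)

assert batch_result == state  # identical answers either way
print(dict(state))  # {'MS-01': 5.25, 'OK-07': 2.25}
```

The point of the sketch is that the programmer writes one batch-style function; "continuous" is just the engine deciding to invoke it per arrival and keep the intermediate state alive, which is the skill transfer George credits Spark with enabling.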
Using another, I don't know if it's a mixed metaphor this time, but think of MapReduce as a plane on a relatively short runway. It was a very low-level language; it wasn't very expressive, and it had performance issues. People tried to put things on top that would sugarcoat it, but ultimately it hit the end of its runway. Spark is like San Francisco International Airport: the runway is much longer. We know that because we know some of the problems that are coming up for it to solve, like the internet of things, not just in the cloud, but out on the machines at the edge of the network. We know we can make lighter-weight versions of it, so we know we're not going to have to break off and do something new at the edge. For the next couple of years we might, but we know that with Spark we can solve that problem. Even more significant is something called deep learning, which is what beat the Go player, a game far more difficult than chess. We know we can run that on Spark. So the point is, some of those far-off-in-the-distance walls or problems, the limitations we might expect to see for Spark, they're not going to be coming any time soon.

They're not, yeah. We can see them, and we can see ways around them, right? Excellent. Well, George, it's been a pleasure. Really enjoyed it the past couple of days. We started with IBM on Monday and moved over here to the Spark Summit on Tuesday and Wednesday. I hope you've enjoyed it here on theCUBE. For George Gilbert, I'm John Walls, saying so long for now from San Francisco and Spark Summit 2016.