Live from New York, it's theCUBE! Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. Hello everyone, welcome to this CUBE special presentation here in New York City for CUBE NYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World, and it's evolved over the years. For nine years we've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data. I don't know what they're going to call it next. As you all know, theCUBE has been present at the creation of the Hadoop big data ecosystem. We're here for our ninth year. Certainly a lot's changed. AI is the center of the conversation. And certainly we've seen some horses come in, some haven't come in, and trends have emerged. Some have gone away. Your thoughts, nine years covering data?
I remember fondly, vividly, the call that I got. I was in Dallas at a Storage Networking World show and you called and said, hey, we're doing Hadoop World, get over there. And of course Hadoop and big data were the new hot thing. I told everybody I was leaving. Most of the people said, what's Hadoop? So we came, we started covering it, and it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abhi Mehta, who at the time was at B of A. Some of the things we learned then were profound. As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the profound ideas behind Hadoop were the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write: put it into the database and then figure it out. Unstructured data, object storage. And so that created a spate of innovation, of funding.
We were talking last night about how, many years ago at this event, this time of year, concurrent with Strata, you'd have VCs all over the place. There really aren't a lot of VCs here this year, not as many VC parties as there used to be. So that's somewhat waned. But one of the things we talked about back then, we said that the big money in big data was going to be made by the practitioners, not by the vendors. And that's proved true. I mean, of the big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, Cloudera has a $2.5 billion valuation. Not bad, but it's not a $30 or $40 billion company. The other thing we said is there would be no Red Hat of big data. You said, well, the only Red Hat of big data might be Red Hat. And that's basically proved true. So I think if we look back, we always talked about Hadoop and big data being a reduction play: the ROI was a reduction on investment. It was a way to have a cheaper data warehouse. And that's essentially...
Well, what did we get right and wrong? I mean, let's look at some of the trends. But first, I think we got pretty much everything right. As you know, we tend to make the calls pretty accurately with theCUBE; we've got a lot of data. We have the analytics in our own system, plus we have the research team digging in. So we do a pretty good job. One thing we predicted was that Hadoop would certainly change the game, and it did. We also predicted that there wouldn't be a Red Hat for Hadoop. The other prediction was that Hadoop wouldn't kill data warehouses. It didn't. And then data lakes came along. You know my position on data lakes: I've always hated the term. I always liked data ocean, because I think there's much more fluidity to the data. So I think we got that one right. And the data lake still doesn't look like it's panning out well.
I mean, for most people who deploy data lakes, it's either not a core thing or it's part of something else, and it's turning into a data swamp. So I think the data lake piece is not panning out the way people thought it would. One thing we also got right is that data would be at the center of the value proposition, and it continues to be. I think we're seeing that now. And back in 2010 we said data is the development kit, meaning data was going to be part of programming.
Some of the other things came from our early data: we went out and talked to a lot of practitioners, who were hard to find in the early days. There were just a select few, other than inside of Google and Yahoo. But what they told us is that things like SQL and the enterprise data warehouse were key components of their big data strategy. So to your point, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. About four years ago, our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud. And Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today, that's all they're talking about: cloud, hybrid cloud.
It was four years ago, I think, Dave, when we were riffing on Cloudera's name: spell it out, Cloud Era, and we're in a cloud era. And I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, I don't understand where you guys are coming from. We were actually saying at the time that Cloudera should leverage more cloud. And they didn't. They stayed on their IPO track. And they had to, because they'd bet everything on Impala and the data model and business model they had, and then they went public.
But I think clearly cloud is now part of Cloudera's story, and I think it's a good call. It's not too late for them, it never was too late, and Cloudera has executed. And if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE, as most people in this industry know, we were in their office, and they had like 17 employees. I thought Cloudera was going to run the table. But then Hortonworks came out of Yahoo, and that, I think, changed the game. In my opinion, that competitive battle between Hortonworks and Cloudera changed the industry. If Hortonworks had not come out of Yahoo, Cloudera would have had an uncontested run, and the landscape of the ecosystem would have looked completely different. Think about it: they had that competitive battle for years, Hortonworks versus Cloudera. I think it could have been a different outcome if Hortonworks wasn't there. I think Cloudera probably would have taken Hadoop and made it so much more, and gotten more done.
Yeah, and I think the other point we have to make here is that complexity really hurt the Hadoop ecosystem. It was just bespoke, with new projects coming out all the time. And you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly Hortonworks and Cloudera. They had to invest a lot of their R&D in making these systems work and integrating them. And complexity really broke the back of the Hadoop ecosystem. So then Spark came in, and everybody said, oh, Spark's going to basically replace Hadoop. Yes and no: the people who got Hadoop right embraced it, and they still use it. Spark definitely simplified things. But now the conversation has turned to AI, John.
So I've got to ask you, and I'm going to use your line on you in a kind of ask-me-anything segment here: AI, is it the same wine in a new bottle, or is it really substantively different in your opinion?
I think it's something different. I don't think it's the same wine in a new bottle. Well, I'll tell you what, it kind of is: it's like the bad wine is going to be blended in with the good wine, which is now AI. If you look at this industry, the big data industry, and at what O'Reilly did with this conference, I think O'Reilly really has not done a good job with the big data conference. I think they blew it. I think they made it a closed monetization system, when the big data business could have been all about AI in a much deeper way. I think AI is subordinate to cloud. You mentioned cloud earlier. Look at all the action within the AI segment: Diane Greene talking about it at Google Next, Amazon. AI is a software layer, a substrate, that will be underpinned by the cloud. Cloud will drive more action. You need more compute, and that drives more data. More data drives the machine learning, which drives the AI. So I think AI is always going to be dependent upon cloud or some sort of high-compute resource base, and all the cloud analytics are feeding into these AI models. So I think cloud takes over AI, no doubt. I think this whole big data ecosystem gets subsumed under the AWS, VMworld, Google, and Microsoft cloud shows. And I also think specialization around data science is going to go off on its own. So I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference: that thing's going to crumble into multiple fractured ecosystems. It's already starting to fork.
I think the other thing I want to say about Hadoop is that it brought such great awareness to the notion of data: putting data at the core of your company, data and data value, the ability to understand how data contributes to the monetization of your company. AI would not be possible without the data. And we've talked about this before; you call it the innovation sandwich. The innovation sandwich for the last three decades has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale. That's the sandwich of innovation over the next 10 to 20 years.
Yeah, and I think data is everywhere. So this idea of data being a categorical industry segment is a little bit off. I know the data warehouse is kind of its own category, and you're seeing that, but I don't think it's a magic quadrant anymore. Every quadrant has data. So I think data is fundamental, and that's why it's going to become a layer within a control plane of either the cloud or some other system. I think that's pretty clear. You can't buy big data; you can't buy AI. You can have AI tools, things like TensorFlow, but every layer of the stack is going to be impacted by AI and data.
And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this. You certainly see it with IBM, the sort of Watson heavy lift, and clearly Google, Amazon, Facebook, Alibaba, Microsoft: they're infusing AI throughout their entire set of cloud services, applications, and infrastructure. And I think that's good news for the practitioners. Most companies aren't going to build their own AI; they're going to buy AI, and that's how they close the gap between the data haves and the data have-nots. And again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core.
If you look at the top five US companies in terms of market value, Apple, Google, Facebook, Amazon, and Microsoft (Facebook maybe not so much anymore because of the fake news, but Facebook will be back with its two billion users), those five have put data at the core, and they're the most valuable companies in the stock market from a market cap standpoint. Why? Because there's a recognition that the intangible value of the data is actually quite valuable. And even though banks and financial institutions are data companies, their data lives in silos. These five have put data at the center and surrounded it with human expertise, as opposed to having humans at the center and data all over the place. So how do these companies close the gap? How do the companies in the flyover states close the gap? The way they close the gap, in my view, is they buy technologies that have AI infused in them. And the last thing I'll say is that I see cloud as the substrate, and AI and blockchain and other services as the automation layer on top. I think that's going to be the big tailwind for innovation over the next decade.
Yeah, and obviously the theme of machine learning drives a lot of the conversations here, and that's essentially never going to go away. Machine learning is the core of AI. And I would argue that AI truly doesn't even exist yet; it's machine learning that's really driving the value. But what validates the idea that cloud is going to drive the AI business is the terms in popular conversation here in New York around this event, CUBE NYC and Strata Conference: you're hearing Kubernetes and blockchain and these automation and AI operations conversations. That's an IT conversation. So that's interesting. You've got IT, really, with storage: you've got to store the data. So you can't not talk about workloads and how the data moves with workloads.
So you're starting to see data and workloads tossed into the same conversation. That's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, with data instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning are going to play. So again, cloud subsumes AI. That's the story. And I think that's going to be the big trend.
Well, I think you're right on it. That's why you're seeing hybrid cloud messaging from the big distro vendors. And the other thing is you're hearing from a lot of the NoSQL database guys that they're bringing ACID compliance, they're bringing enterprise-grade capability. So you're seeing that the world is hybrid; you're seeing those two worlds come together. The playing field is getting leveled out there. It's going to be all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing the data here in New York City. CUBE NYC, that's the hashtag. Stay with us for more coverage live in New York after this short break.