Live from Midtown Manhattan. Breaking analysis from theCUBE at Big Data NYC. Made possible by Hortonworks, we do Hadoop, and WANdisco, Hadoop made invincible. Now your co-hosts, John Furrier and Dave Vellante. Okay, welcome back everyone to Big Data New York City. This is theCUBE, our flagship program. We go out to the events and, in some cases, create our own event, Big Data NYC, which is here live in New York. We are covering all the Big Data action, news, and analysis here in New York City. Also covering Hadoop World and Strata Conference right across the street. So, a very exciting news segment here. I want to bring in Dave Vellante, co-founder of Wikibon, my co-host of theCUBE, and Jeff Kelly, big data analyst, to break down the news and analysis from the past two, three days. And of course this news segment is brought to you by Hortonworks and WANdisco. They do Hadoop, nonstop Hadoop. These are the supporters of theCUBE and they've done a great job of supporting our independent analysis here at Big Data Week, Big Data NYC, Hadoop World, Strata Conference. So guys, let's just jump right into it. Dave, I want to get your take on it. We saw a lot of news and a lot of conversations, so let's start with the news first. Then we'll get into the conversation, what I call conversation news: the scuttlebutt, the hallway conversations. What we're hearing in the streets, what we're hearing from the people at the parties, kind of like the back channel. So let's get to the front channel, which is the news. What are you seeing as the key news? Well, I mean, you're seeing a lot of themes obviously at Hadoop World and Strata and Big Data NYC. It started maybe a year ago, everybody talking about real time and bringing SQL to Hadoop. You're hearing less about bringing SQL to Hadoop. I mean, I think people are generally comfortable with that, but a lot more about real time with Hadoop 2 and YARN.
We're starting to see an increasing number of use cases and a broadening of the applicability of Hadoop. That's sort of one thing. This whole meme of we've got to make Hadoop enterprise ready, which started probably two or three years ago when big guys like EMC and IBM started to get into the business: I think people are getting more and more comfortable applying Hadoop to specific use cases for the enterprise. So the governance, the security, the performance, the scalability aspects are clearly starting to improve as the ecosystem matures. You're seeing some very interesting battle lines occur, swim lanes, whatever you want to call them, with regard to not only the distribution and the base infrastructure but also the middleware and the functionality on top of that base infrastructure, the analytics platforms on which you can build applications. And I think the other thing is we're still seeing a lack of commercial off-the-shelf applications in this community, which is slowing down adoption because the skill sets aren't there and the heavy lifting has to be done in-house. So my take on the news, Jeff, before we get your analysis and perspective, is to kind of run down the top stories. To me the top story here is obviously Hortonworks Data Platform 2.0, a big announcement. They did that prior to the show, kind of on the eve, to try to pre-activate the audience prior to Cloudera's big announcement, which is the data hub. Those are to me the two significant major announcements at the platform level. You are seeing the emergence of data engines or data platforms. Cloudera calls it the data hub, Hortonworks calls it the data platform. Kind of different approaches, but the same naming, the same approach: an enabling platform. We're seeing a lot of decoupling of key subsystems, MapReduce and YARN, enabling applications on top of it to do more. So that's to me the big story right there. Then there's the launch of ClearStory; we're going to have Sharmila Mulligan on shortly.
She's old school, goes back to the Netscape days. She is a serious entrepreneur, a serial entrepreneur, a great executive. We're going to have her on theCUBE, a Silicon Valley legend. Also, Continuuity had some news. Splunk had some news. Pivotal announced. We had Pivotal, we had HP. There's a lot of stuff going on and a lot of little announcements, but the second wave of the story is the business BI market. And to me, the story is business intelligence changing and morphing into a modern view; you're seeing the data warehouse concepts kind of moving from backstage to the center stage of BI. And I want to get you guys' take on it. Do you see the same thing? You're seeing data platforms, not necessarily data warehouses. You know, it's interesting. Jeff, remember the term decision support? Nobody uses that term anymore, DSS. Some of us feel like BI is going to go the same way. Do we need a new term for business intelligence? Business insight? Yeah, well, I think business intelligence is a loaded term. It comes with a lot of baggage. So yeah, I don't think that term necessarily is one that we're going to hear a lot in the long term. What's the new term? I like, you know, ClearStory, as you mentioned, John, just launched, and they're calling it data intelligence rather than business intelligence. But whatever you call it, it's about getting data and insights into the hands of end users so they can actually make better decisions. So I think that's clearly a theme we're seeing. You know, some other things: hearing a little bit of talk about the cloud. Not as much as I thought we'd hear. AWS made an announcement today. They're updating their Elastic MapReduce platform to support Hadoop 2.0. You know, we saw Microsoft announce the general availability of HDInsight, which is their Hadoop offering based on the Hortonworks distribution.
So that's an area where I thought we would probably see a little bit more action because the cloud obviously offers the ability to kind of abstract away all that complexity of running Hadoop and really let customers focus on what they do best, you know, their core business, rather than being in the business of running, you know, Hadoop clusters. So that's another area to keep an eye on. But you know, ultimately, this is about making better decisions with data. And so what I like about this show this year and the vibe here is that that's really what the conversation is around. It's not about the guts of the technology inside of Hadoop. It's about how do we use this data to make better decisions. And related to that, how do we make sure Hadoop is enterprise grade: security, always available, et cetera. So that's kind of where I see the conversation going. You know, Dave brings up decision support. I love that. I just did a little tweet on the CrowdChat. And we've talked in the past about these old terms, you know, data processing; the DP department was the punch card days. The old mainframe days, data processing, decision support, the old glass house. For the young people who don't know what that means, that was where they kept the mainframes. But again, these concepts are rearing their heads again in a new way. You know, what's the old expression? Same wine, new bottle. But you know, data processing and decision support is now data platform and advanced analytics, right? I mean, is this the software mainframe, Dave? We talked about this when Maritz launched it, you know, back in 2010 when he was at VMware. You know, it's the other thing. So I remember we were at the Tableau conference earlier this year with Christian Chabot, a very dynamic individual, CEO of Tableau. He's a very fast talker, he's really excitable.
And when he talks about the old data warehousing and decision support and BI business, he just slows down and puts it in a bucket. And I really do think that's apropos. I think that we do need new terms. I mean, data intelligence, it's interesting. I like it. It's all about being able to take action on that insight that you're getting, and that seems to be what's missing, right? And to me, it's like Bill Schmarzo's quote that he has in this book, this whole notion of, what did he say here, challenging the conventional thinking regarding how non-analytical business users should be using analytics. I mean, that is the holy grail. And frankly, if the big data business doesn't achieve that, it's going to end up just like the promises of the decision support and business intelligence and enterprise data warehousing business. A lot of money was made, but it ultimately ended up being a reporting platform. I think one of the keys is, there's definitely value in a dashboard and reporting and we're still going to see the need for that, but really it's about putting intelligence in context for end users. So whether that's a recommendation engine or next best action, rather than a user simply kind of digging through some data, put it in the context of what they do every day, make it easy for them to understand: what's the next best option for me? What decision should I make? What action should I execute? Rather than, okay, I see a dashboard and it gives me some interesting insights, but in the day-to-day flow, how do I make a better decision? We had one of the guys on yesterday talking about visiting a hotel chain. They were talking about instrumenting the entire hotel, having sensors everywhere, key card swipes, understanding who went to the gym, what they're eating, so they can personalize the experience, and another hotel is thinking, why would we do that? Well, which one's going to win from a competitive standpoint?
Right, and it's an immensely complex problem to crack. It's what we're seeing in the industrial internet, but at scale: forget one hotel, think about a chain of hotels, think about a city. So that's the promise, I think, of big data. We're years from that. Okay guys, I want to get your perspective now on the scuttlebutt. What's the hallway conversation? So, we'll start with you guys. You were out scouring the landscape yesterday. We've been doing the analysis. Dave and I have been doing theCUBE. We had our theCUBE party last night. Guys, what's the scuttlebutt? What are you hearing out in the hallways, in the streets? What are people talking about this week at Big Data NYC? Well, I mean, there's definitely talk about how the Hadoop platform wars are far from won. It's very uncertain what's going to shake out here. So that's one. The same thing with NoSQL; there's like eight zillion. Well, how did Shaun Connolly describe it? The thin slices of prosciutto is how many NoSQL database platforms are out there. So there's a lot of uncertainty as to how that's going to shake out. We heard some scuttlebutt from some of the application guys saying, you know what, I don't need all this stuff that these guys are trying to sell me. I'm building it on my own. I'm building my own database. I'm building my own security. I'm building my own data integration platform. I'm building my own ETL. I don't need these companies. Now, also hearing about the distros. I mean, we're hearing people saying, hey, you know what, distros are like a commodity. So are you hearing the same thing? I mean, what you're basically saying is I want flexibility. I think people agree that distros are the commodity, but still, on top of that commodity, there's money to be made. So Shaun Connolly made a really interesting point about the GigaOM article, because Cloudera was basically saying, hey, we lost nothing from that deal.
And he said, oh yeah, you did. You lost the right. Essentially, he didn't say these words, but I'll use my words: you lost the right to have maintenance revenue for a hundred years. That's what you lost. That's big business. So still some very interesting business model sort of shakeouts going on. Scuttlebutt? I don't know, Jeff. Well, anybody who says the distribution war is over or that it's not a fiercely competitive market, that's inaccurate. The scuttlebutt for me is how fierce this competition is. A lot of off-the-record conversations with different vendors talking about one another, and trying to obviously put their company kind of at the top of the heap, but this is very competitive. It's clear. It gets a little snarky at times. I mean, it's one of those. Well, it's a lot nicer now than it was two years ago when Hortonworks came in, rubbing elbows in there, and sharp elbows. And Cloudera had the lead. And Cloudera was the pioneer. And I gotta say, I like what Cloudera is doing. I think they're just laying it out there. And one of the things, like I said on the intro yesterday, that Cloudera is doing that I like is they're basically showing the world and being completely transparent about their business model. There's no head fake. Now, I still think it's disingenuous of the CEO to say that Hortonworks is not a competitor. That's ridiculous. Of course they're a competitor. I think what they're trying to say is, if they were a competitor, then why is it that only Storify and Spotify get mentioned in these conversations? The point is, they're just laying out their path. And I think what they've done successfully, in my mind, with Cloudera, and I talked to Amr about this, Amr Awadallah, is their platform. So Cloudera's vision initially, Dave, was to be a platform. And when they were pioneering, they didn't have a lot of the politics because they were the only game in town.
Enter EMC with Pat Gelsinger, enter Hortonworks with the Yahoo! spin-out. So certainly it's gotten competitive. Intel's now in there. So what you're seeing is Cloudera putting a stake in the ground. And they need to do that. And I think clarity is key. And I think this is not a pivot in a major way. It's just more of a direction clarification. And I think that's positive. And the data hub, some people are criticizing the name. I don't mind it at all. I think it's fine. Data platform, hub. Glad they didn't call it a data warehouse, but I think what they're trying to say is, this is the modern data warehouse. That's a viable strategy. Very, very viable. Different than Hortonworks. That's why I think that's where they're coming from. See, if you take away some of the labels, I don't think it's that different. They're both trying to serve as a platform to enable all sorts of applications on top of it. I don't see how it's different. They're all doing the same thing. I mean, the business model is different, but the end result is a platform for data. Well, the interesting thing to me about what John was saying is that Cloudera basically came out and said, hey, we compete with Pivotal. Those are the guys that we're competing with, and IBM. And so here's the interesting little scuttlebutt that I picked up last night: the EMC guys were saying, you know, hey, we've seen this movie before with VMware, where the spin-out, Pivotal, is very hands off. You know, we actually have a hard time sometimes doing business with Pivotal. We're aligning with Cloudera and Hortonworks, because really the Pivotal guys are aligning with our competitors. And so that was kind of interesting, but at the same time you, Jeff, were at the NYSE, and there were a lot of customers there, and guess where those customers came from? They were reeled in by EMC sales reps.
And so that's a big advantage that Pivotal has with regard to access to that customer base. And same thing, obviously, for IBM times 10, right? Right. It's interesting because obviously for EMC, huge, you know, storage is their business. Something like Hadoop can undercut that business to some degree, so it's interesting. I'm gonna give EMC credit for spinning off Pivotal and really investing in that company and, you know, pushing leads their way. That could in the long term potentially impact their storage business, but this is kind of what EMC does. They saw virtualization, they spun out VMware; they see big data, they spun out Pivotal. Smart strategy. So, John, you're the master networker. You're out in Silicon Valley, you're hearing stuff from the street here. What's your take on all this? On what? Yeah, the inside scuttlebutt, what's going on? Well, I mean, to me, there's a lot of threads. One, I think the BI market is exploding. To me, the scuttlebutt is everyone's racing to be that layer of business intelligence because to me, that's the search engine of the future. It's an abstracted-away notification network. If you look at big data's origination, the one that Jeff Hammerbacher originally was sour on, you know, recommendation engines for ad servers, that's essentially what you're gonna see with big data and BI. Basically, real-time semantic analysis where things are just happening for users in the UI. That's a search paradigm, that's discovery, that's a user experience. To me, that's the modern BI. I think that's the secret, the public secret, in Silicon Valley, and the venture capital community is putting the big bucks down, heavy funding, platform; they all get this game, they're all driving to that. Underneath that, it literally is the platform wars, but it's nice, I mean, it's a clean environment right now because the market is growing so fast.
So the people I talked to, Dave, said, hey, you know what, there's no real competition, there's posturing, but at the end of the day, it's a massively growing marketplace and the total TAM is exploding, as you guys pointed out. So when you have growth markets, you don't squabble over fruit on one tree; there's plenty of fruit to go around, plenty of beachhead. So to me, that's the key scuttlebutt going on. It was like, hey, you know what? There's so much growth and it's so early. I don't know, I agree with you, but I still see a lot of squabbling. Nevertheless, the other thing I wanted to point out, this whole idea of BI: I mean, look, this is what humans do, they take information, try to process it, and make a better decision based on it. That's what we've been doing since the dawn of time. So that's what BI is all about, and now it's a new approach. We've got more data that we can work with, but it's really an age-old problem. Take in information, process it, make sense of it, and take an action. Well, the other thing that's really interesting here, as we talked about earlier, is the whole NoSQL database space. You go back, we've talked about this a lot. In 1990, you couldn't have predicted SAP was going to win in the ERP battles. In the 80s, you couldn't have predicted that Oracle was going to win in the database battles, and it's really hard to predict who's going to win in the NoSQL battles. And the wild card here yet again is open source, whether it's HBase or pick your platform. Who's got the momentum? Obviously, Mongo has the momentum. You've got DataStax doing pretty well. You've got niches like Aerospike, tucking into things like ad serving. You've got Accumulo as well, tucking into government and financial services, and then dozens and dozens and dozens of others. I mean, how do you pick a winner there? Can you pick a winner? Does there have to be a winner? I mean, a lot of those are open source technology.
I think there will be a winner. Well, they're open source technology and different companies can choose the flavor that works best for them. Database is infrastructure, so at some point, it consolidates, I think. I just don't, I mean, can the market support, I mean, how many are there? Oh well. 30? Probably, I don't know. I think I just heard a new one spring up, so yeah. So can the market support that many? I mean, I would suspect that as standards develop, something's going to emerge. DynamoDB, I mean, a lot of key-value stores out there. There don't need to be 30 key-value stores. Yeah, and then we're seeing kind of the NewSQL, kind of the merging of some of the NoSQL and SQL types of databases. So yeah, there's quite a few out there. And guys like Cloudera and Hortonworks, for that matter, you know, they're playing the field, but they've still really got the thumb on the scale for HBase, because they know if they can do a better job integrating with HBase, they're going to get the downstream residuals, right? Guys, I want to get your take on a couple things. You brought up the whole, you know, Aerospike, in-memory, and I want to get to that in a second, because that brings up the SAP conversation. One of the big themes that we talked about yesterday is the role of these big guys coming in. So Hadoop is obviously maturing, it's been validated, use cases are presenting themselves, we're starting to see some scalability happen. That's very positive for the industry. But you've seen the big guys come in. We saw Boyd Davis last night from Intel. We saw SAP here. So guys, I want to get your take on the big guys and their role with Hadoop. Hadoop has continued to be validated. Are they embracing it wholeheartedly? Are they integrating it in? Obviously SAP has HANA. Vishal Sikka announced some big stuff at TechEd. And obviously you're seeing, with Cloud Foundry and Pivotal, you know, these legacy vendors coming in that have an agenda.
And quite frankly, it's clear that it's their own stuff. So what's your take on the big guys like SAP and others coming into the space? Well, from SAP's perspective, I remember a couple of years ago at Sapphire, I think it was probably 2012, some of the executives were really pooh-poohing, if you will, for lack of a better term, NoSQL and Hadoop. They've changed their tune a lot on that. I think they now recognize that Hadoop has an important role to play in the modern data infrastructure. They've established partnerships with some of the big Hadoop players, including Intel, as a matter of fact. So SAP, I think, has realized that HANA is not the be-all and end-all in terms of big data. It plays a very important role. We've done some research. Our David Floyer has done a lot of research around HANA and use cases. And it solves particular business problems for SAP users. But I think SAP is finally realizing that Hadoop and other technologies have to be part of the mix. Intel, interesting: when they launched their Hadoop distribution, some of us were sort of scratching our heads. A hardware company getting into the Hadoop distribution business was not something I think a lot of us expected. But to their credit, when they launched that, they did put some significant muscle behind it. I've talked to some of the engineers that work on the project, and they're sincere and they're working hard on things like security, on making things like Hive perform better. They've kind of gone quiet a little bit over the last six months. I haven't heard too much about them in the Hadoop space specifically. But one of the other things they're focusing on is the internet of things, the industrial internet, whatever you wanna call it, partnering with Pivotal and GE. So I think we're gonna start to see more focus on that industrial internet meme, if you will, probably from Intel. Dave, what's your take? Well, and then we haven't talked much about Oracle, right?
So Oracle essentially is a similar thing, right? They ignore, criticize, prevaricate, and then sort of embrace. Oracle's announced a key-value store. Obviously the big question that we sort of bat around all the time is, is Oracle gonna buy a Cloudera and just sort of stomp their way into the market, as Oracle oftentimes does? We had a really interesting conversation with David Richards yesterday on theCUBE. He is of the belief, firmly, that there will be another billion-dollar software company that emerges from this space. I hope he's right, just because it means more innovation and more interesting things. But I'm fearful that he's not right, in that the large companies will start absorbing these firms. So the big question is, are they acquisition proof? So if some of them get to IPO and they can boost their valuations, like a Tableau has and like a Splunk is doing, they maybe create an environment where they are acquisition proof. If that happens, then some of these big companies really could get disrupted. And that's really where I think this gets interesting. Okay guys, great analysis. It's our news break here, brought to you by Hortonworks and WANdisco. This is theCUBE, Big Data NYC, covering all the action in New York City, Hadoop World across the street with Strata Conference, all the action happening here in New York City this week around big data. This is theCUBE, we'll be right back with our next guest after this short break.