Live from Midtown Manhattan, the Cube's live coverage of Big Data NYC, a SiliconANGLE Wikibon production made possible by Hortonworks, we do Hadoop, and WANdisco, Hadoop made invincible. And now your co-hosts, John Furrier and Dave Vellante. Okay, we're back here live at Big Data NYC. This is the wrap-up of the show, Day 3. This is the Cube, our flagship program. We go out to the events and extract the signal from the noise. In this case, we created an event called Big Data NYC. We came out to New York to cover Hadoop World for the fourth straight year. Now it's Strata + Hadoop World. Cloudera signed the show over to O'Reilly Media. We are not inside the Hilton; it's right behind us, next door. We're surrounding it, and all the thought leaders are coming into Big Data NYC. It's been a great week. I'm joined by Dave Vellante and Jeff Kelly. Guys, what a fantastic week. We launched a company. We broke exclusive news. We had a book signing at the Cube party. We had a commemorative giveaway, shot glasses with the Cube logo. First ever, a first for the Cube, Dave. It's a great moment. We're proud to be here and continue our independent coverage of events. It wouldn't be possible without the sponsors, Hortonworks and WANdisco, with some help from MapR. These are the guys that support the community. As was said on the Cube here, no one company is bigger than an active community, and the Big Data community is very active. It's early. It's robust. It's growing. There's business to be made. And we had the CTO of Actian say, quote, there are many kings in this land. There's plenty of land to go around. So it's a really exciting time. Guys, let's wrap up the coverage. Dave, we'll start with you. Your take, three days. We had day one, eight interviews first thing, Monday from three o'clock on. A lot of buzz: HDInsight, the Hortonworks and Microsoft announcement, and then a slew of announcements. 
A lot of commercialization, a lot of technology discussions. What's your take? Yeah, I mean, I think the announcements, this has now become the cycle. You're going to have Strata in the spring, Hadoop Summit in June, and Hadoop World in the fall. And you're going to see announcement cycles around that. You know, it's the sort of cadence that you would expect. For me, John, I want to just review some of the money highlights from the Cube guests, some of the great quotes that I heard. Mike Hoskins from Actian said the digital data tap has been turned on and it will never be turned off. And then, of course, on Tuesday we had Abhi Mehta. Abhi Mehta is just an amazing Cube guest. He's always got some great concepts. The one he introduced this time that I've been using all week is BC and AC, before Cutting and after Cutting. Algorithms are free, he said. The stack wars are far from over, despite what a lot of people are saying. He said Hadoop must be more than a cheap storage platform. We had Jack Norris; he said we're helping customers control the fill rate. I love that quote. The big story, according to Merv, is choices. Merv Adrian, fantastic guest from Gartner. When you asked him, John, what's the bumper sticker of this show, he said: from whither and why to how and who. And then Hilary Mason said, you know, she gets a lot of questions about whether humans are going to be replaced by machines and data. She said, look, A/B testing tells you which choices make the most sense when. It doesn't tell you which choices you should test. What should A and B be? And then Bill Schmarzo, the dean of Big Data, said Big Data is challenging conventional thinking regarding how non-analytical business users will be using analytics. That to me is the holy grail of Big Data. And finally, Sharmila Mulligan talked about this intelligent data harmonization with ClearStory and what they're doing. 
So, you know, a lot of innovation, still not enough applications, but a lot of growth. And so John, that's my takeaway. What do you have? I mean, I think one of the things that's clear is that you have the old world, Oracle, the Teradatas of the world. You have the new school, Hortonworks, and the ClearStory platform and the BI companies. The big story to me, watching this over four years, is the rise of the data platform. And I think Amr Awadallah from Cloudera pointed out that it's a platform, it's an operating system, it's going to be a big picture. Merv Adrian from Gartner said it's not sure how to do it, it's about the data center. Again, we connect the dots here in the Cube, and we are telegraphing out that the trajectory will go right back into the enterprise as enterprises transform their architectures. That's going to set the table, in my opinion, for an explosion of big data applications. I think what happens with software-defined networking and software-defined virtualization is going to clearly enable the hybrid cloud. The hybrid cloud is here. That's the preferred path, as confirmed by both Amr Awadallah, co-founder of Cloudera, and Merv Adrian. So the cloud reconstruction that's going on, that we've been reporting at other events and in other verticals, will influence and be a catalyst for big data. Two, business intelligence, advanced analytics, is the killer app for big data. Just like email became the killer app for the internet, analytics in some form or another, helping humans be better, helping people make better choices using social data, the tap of big data, the age of big data that was talked about earlier, is another big story. Not data warehouses; it's the data platform. So the business intelligence transformation to advanced analytics will be a tsunami of user value. And I think that's going to drive that. 
And again, as the cloud matures, on-premise, off-premise cloud, hybrid cloud, you can see big data apps take center stage in the next few years. That's exciting. And then finally, the business model of open source is very viable. We've been watching this from day one, when Pat Gelsinger said there can't be a Red Hat for Hadoop. Hortonworks is executing. They're not wavering. It's clear that there's a business to be had in open source. They're executing on that. And I think that points to the fact that there's not a lot of distrust. People want pure Apache. They want some support. Cloudera has picked their path. Again, the market's maturing. And on top of that, you have Pivotal, right? Pivotal is trying to put the pieces together, building on Greenplum, really the only asset they have right now that's in market doing anything of value for their customers. They're building on HAWQ. They're going to fill out the white spaces in the platform. We're going to watch Pivotal specifically. And again, that's going to be the merit story. And we're going to see if they can pull it off. And again, Cloud Foundry is going to go up against OpenStack. It's going to go against AWS. Another quote I would add to the repertoire that you mentioned: Amr Awadallah said OpenStack's like Android. And he mentioned Cloud Foundry. But then you clarified and said AWS is more like the iPhone. That's what we're seeing. We're seeing that kind of landscape. The iPhone has proven that kind of business model will work, as well as Android. So I think we're going to live in a world where there'll be both. OpenStack is clearly driving that on the cloud side. Again, this is going to map right into the data world. And I think ultimately the data platforms will drive a lot of the analytics. Jeff, what's your take? Well, I think you guys hit on a lot of great points there. You didn't leave me much to analyze. We'll throw you YARN. 
Well, YARN, I think that was one of the big things here, talking about the data platform, as you mentioned. And the key innovation, or one of the key innovations, I should say, there are several, is making Hadoop a multi-application platform. So you can do multiple tasks with it, whether it's search analytics, ad hoc analytics, ad hoc queries, machine learning. We heard a little bit about integrating Storm with Hadoop, which is coming in the future, to do streaming analytics, to make sense of data streaming into your organization, your enterprise, in real time. So that is one of the key storylines for me. The other one: I think you mentioned some of the big players, the old school, Oracle and Teradata. And to me the Hadoop players have really stepped it up a notch. They've got their game together. They're going to market in a very smart way, each in their own way, but they're no longer just these little scrappy startups. They've still got that scrappiness, but the adults are there as well, and they know how to go to market and start selling these products. So I think you're going to start to see the big players, the Teradatas and the Oracles of the world, start to take notice, finally. I shouldn't say finally about Teradata, because they have noticed and they've got their partnership with Hortonworks, of course. But what I mean is the success of the Hadoop distribution vendors is going to start rubbing up against the traditional EDW database players. They're going to start to take notice. I think we're going to start to see rumblings of either acquisitions or real pushback from some of those vendors. I mean, there have been some not-so-great quarterly results from Oracle and Teradata. And part of that, I think, is they're seeing competition now coming from the Hadoop market. So I think that those two markets are coming to a head. 
It'll be interesting as Hadoop continues to develop the capabilities and as the Hadoop players mature and start selling into some of these accounts that are replacing Oracle and Teradata. We'll see how it plays out, but I think it's going to be an interesting next 12 months. So in connecting the dots, we mentioned a little bit of the data center and how we're kind of teasing that out and bringing that to the table a little bit. There's some method to my madness here, Dave. I want to go to Dave Vellante on a comment here. A surprise to me at the show is the rise in relevance of WANdisco. WANdisco has this Non-Stop Hadoop positioning, which is great marketing for essentially continuous operation, continuous availability, high availability. And they're bringing that concept down where horizontally sharing workloads across is a key. And I want to ask you: we had Pivotal on with WANdisco. And I asked Pivotal, you know, EMC has high availability. Why aren't they working with EMC? And they're smiling. Pivotal is really agnostic. They're going to be like VMware was. They're going to go work with a lot of folks and make the solutions. So Dave, the WANdisco rise in relevance, what's your take on that? Why are they such a hot topic at the show this year? So I remember when WANdisco came into the Wikibon offices, and you can imagine, we get inundated with vendor briefings, and quite honestly, 90% of them are just another vendor briefing, right? WANdisco came in, and I walked in the room and met these guys. It was a big crew, you know, and we have a small office at Wikibon, and these guys started to describe what they did, and I said, wow, this sounds like a really hard problem that you're solving. And then I remember meeting some of the tech athletes at WANdisco, you know, guys like Jagane and Kaz and Wright. And they started to describe, you know, essentially what is an age-old problem in computing, which is the speed of light. 
If you have distributed data all over the place and you have to, you know, protect it and make it continuously available, how do you do that over long distances? You can do it over short distances, but the problem is, if you get something like, as Jagane was saying, Hurricane Sandy, everything gets wiped out. If you do it over long distances, you lose data. It's a really hard problem that they're solving. So what they did, again going back to Abhi Mehta's, you know, before Cutting, after Cutting: after Cutting, they did a Hadoop-based continuous availability system right in the Hadoop file system that understands geographically distributed data. It reminds me of some of the magic that Google's doing with Spanner, where they're using atomic clocks to figure out when the data's actually going to get there, even though it's over geographically dispersed distances. It's a very hard problem. So what EMC has solved, and that question you asked was fantastic, what EMC historically has solved is really, you know, on Wall Street, a data center in Manhattan and one across the Hudson in New Jersey. You know, pretty good protection. Saved a lot of people in 9/11. Didn't help with Sandy. You had to use other techniques for Sandy. So the rise of WANdisco is a function of what we hear all the time: Hadoop needs to be made enterprise ready. One of the aspects of enterprise ready is the question, is it continuously available? Yes or no? Okay, how is it achieved? And what happens when something goes wrong? And WANdisco has really solid answers for every one of those. So very impressive, and their growth rate is just enormous, from a small base, but their stock price is going through the roof. Well, I think they solve a really critical problem, and it's a fairly easy problem, really, to show the ROI on. 
If you've got an application, a consumer-facing application, for instance, that's running on top of Hadoop, and it goes down for five minutes, you can calculate what that's going to cost you. And it's a really straightforward value proposition for a really, really important problem. The other thing that I want to bring up, and Jeff, I want to get your take on this, because this came up multiple times, is the importance of what YARN means. YARN is one of those aspects of the big data platform that's kind of confusing. If you're not a tech geek, you may not know the nuances of what it means. But MapReduce has been a core part of the Hadoop platform, and what YARN does is create the ability to not be dependent on MapReduce. That becomes a real disruptive enabler to create more innovation. What's your take on that? Are you seeing consistent feedback from folks out there? What's your take on the YARN situation? Well, I think YARN, the concept, is critical to the overall success of Hadoop as the core enterprise data platform for organizations out there. YARN itself is very impressive, but it's really the first round. They have Arun at Hortonworks, who really leads development of YARN. It's a community effort, but Arun really is one of the core developers. We spoke a little bit about the road map, and there are several phases to go. Just for a quick example: the idea is that YARN essentially allows you to run multiple types of applications on Hadoop rather than only batch analytics using MapReduce. You can now do things like use Hive to do more SQL, interactive-type queries. So essentially what Hive, the current GA version, does is it kind of masks some of the MapReduce happening, but when you do a Hive query it's still running MapReduce in the background. It's much faster than it was before YARN, but it's still MapReduce. 
So down the road, in future versions of YARN, they're going to actually move it off of MapReduce and you're going to get even better performance. The other thing that I mentioned earlier was integrating Storm into Hadoop. That's also on the YARN road map. That's part of that project, bringing those kinds of things in. So really, if you're not a tech geek, the critical thing to understand is that it makes Hadoop a more well-rounded platform that can serve multiple needs, multiple applications. Again, in talking to the president, Herb Cunitz from Hortonworks, what's great is that you can see the business model of big data taking off. And I see it operationally driving top-line revenue and building a marketplace for vendors to do business and do commerce and share value and get paid for it. It's critical. However, these shows always bring out kind of the new shiny object, the new trend that we want to keep an eye on. And I want to get your take on this, but I'll share my take first. We heard from Hilary Mason, who's now data scientist in residence at Accel Partners. We also heard from Edd Dumbill, who's a principal at Silicon Valley Data Science, also co-chair of Strata, who announced this will be his last show co-chairing. So hats off to Edd and Alistair, mainly Edd. Great job. Great run. Fantastic conference. A big fan. It's the rise of data artistry, the art and science. And Hilary Mason's like, yeah, there is art to it. It's got to be the creative side. It's both. It's really going to enable a lot of people to add value. You don't have to be a data jockey or a Python coder to be that killer data scientist, that next Billy Beane, in reference to Moneyball. And two, Edd Dumbill brings the user experience to the table: create predictive analytics, do things, thinking with the human being in mind. So you're going to start to see the instrumentation of the universe. 
And this is something that we geek out on with CrowdSpots, our platform, and CrowdChats, which, it's exciting to say, we did two or three of this show. Very exciting. So bringing the human element. That has got my radar on big time in that area. And that is that it's not just the math. It's the art behind it and the science. Sharmila talks about people with iPhones not having to do queries. So, you know, this kind of discovery with data is going to bring new algorithms, graph computation, machine learning, community detection, all these things that we talk about internally all the time. So to me, that got me kind of intoxicated a bit. What about you guys, what out there got you excited that's off the, quote, blocking and tackling business marketplace side of things? Well, kind of building on what you were mentioning, you know, Hilary mentioned the data artistry, but also the skills that you need to be a good data scientist, and one was communication skills, which is key. I think that gets overlooked a lot. Ultimately, what you're trying to do as a data scientist is tell a story. You're trying to uncover some insights, and you've got to communicate that to folks who are going to actually use those insights to make decisions. And part of the key there is being able to tell a story, whether it's, you know, through what kind of visualization you choose or how you choose to introduce that insight to end users. So for me, that's really what gets interesting when it comes to data scientists and some of the innovations that they're at. To me, John, really two things. One is, and I always love to bring up Nick Carr and how he was so wrong. I know it drives you crazy, but it reminded me of when whoever made the statement said every great invention has been made, every scientific discovery has been made. I mean, it was absurd to say that technology can't be a sustainable source of competitive advantage. 
And if you look at technology and the technology of data, it is increasingly becoming a source of competitive advantage. Now, will that moderate over time? Okay, maybe. Everything goes in cycles. But that to me is really exciting, that practitioners are creating more value around data and the use of technology, the creation of technology, than the technology industry. That's great to see, because traditionally in the technology industry, a few guys made a ton of money, like Microsoft and IBM and Intel, and a bunch of guys made a lot of money, like Sun Microsystems and HP and others. So I'm very excited to see the industry transformation that's going on. The second thing is I love business models. The business model discussion that we were having today around Cloudera and Hortonworks and Red Hat, et cetera. What is exciting to me is we're starting to get some real visibility on business models, because I think you're going to start to see some of these big data companies do IPOs. And that's going to allow us to really track this business in a much better and more transparent way. Breaking news, just coming across the wire. I know it's kind of not related to big data, but talking about cloud: CSC, with whom we've done a CrowdChat with Dan Hushon, CTO, has acquired enterprise cloud management company ServiceMesh. So our friends at ServiceMesh just got acquired by CSC. This enables CSC to continue their transformation to the next IT company. So back to your Nick Carr analogy. This speaks to a very, very huge issue, and that is orchestration. So the cloud business almost mirrors what's going on in big data. We didn't get into it much on the Cube here, but orchestration is a big issue, and I think this is going to be a huge win for CSC. Does it say how much? I'm just reading the press release now. It probably was significant, because I know those guys are doing really, really well. 
Certainly a blow to Pivotal. I'm just seeing the number: in the range of 325 million is what they're saying. Great. Good outcome for the founders. Frank Martinez, congratulations. He's the founder, a friend of ours on the Cube. Great work. They've been doing great work at OpenStack. Obviously EMC, I think, should have been in the running for this one. I'm surprised that EMC let this one go for that kind of dough. CSC and EMC are close partners; I wonder if that's how CSC got good visibility on ServiceMesh. But again, this is it. I mean, you heard one of the Cube guests here, the CTO of Actian, say we bought three companies. People are gearing up, Dave. They're putting their battle gear on. They're putting their armies together. Whatever metaphor you want to use, there's plenty of land to go around for many kings. And what I'm excited about is that, as we said years ago on the Cube, in the big data business it's just going to be a matter of time before there are many fruits on many trees. So I think a lot of people are looking at who to go with, who to play with, who they're going to hang with, the kind of keiretsus, the kind of platforms. Again, the ServiceMesh and CSC announcement speaks volumes to that, and it's pretty exciting. Well, John, I want to thank you and your team at SiliconANGLE for coming up with the idea of the Cube, because it's been just an awesome run. I also want to thank you for the phone call almost four years ago to the day. I was in Dallas. I was at Storage Networking World. You called up and said, hey, we're doing Hadoop World. Get your ass out here. Four years ago, Dave. And I said, John, I've got a lot of appointments. And you said, okay, hey, you can stay there and hang out and talk storage with your 20 buddies, or you can come meet some new friends. And I got on the plane. We got diverted through DC. I got in at like 3:30 in the morning. You were still up. 
We met, and then we had just a phenomenal week, and it's been a great run ever since. It's really a privilege to be part of this community, part of the Hadoop World community, and I really want to thank you. We've got some new stuff, Dave. We've got CrowdChats. Go to crowdchat.net. It's been in public select beta for only a couple of months. We're testing it with some of the B2B clients, but also check out CrowdChat, a new, innovative engagement platform with an integrated application. So we call that the CrowdSpots platform with the integrated application CrowdChat. A fantastic way to communicate in a group setting. It creates a chat room around Twitter and LinkedIn, hence the word crowd, not tweet chat. It allows for threaded conversations and has game mechanics with voting. We love it. We've been using it, and everyone that's using it says it's fantastic. You'll see a lot more CrowdChats, so we're proud to bring that out here, and we're going to continue to roll that out. It does now allow us to do a lot more collaboration and do some social business with folks. CrowdChat, Dave, is really to us the next exciting evolution of social business, and that is crowd-activated innovation. Crowd-activated innovation is the future, and that's activating the crowds and really creating value for each other and the community, and doing it in a way that's in the open, just like open source. So again, I am so excited to be involved in open source, going back to my earlier days coding. Ultimately, here in the social realm, open source principles will continue to thrive; as we get more connected and instrumented, as we all know, it's going to get stronger. So watch CrowdChats. Also, we had our first ever Cube party event, sponsored by Hortonworks and WANdisco, last night. I want to thank you for coming; that was just a tremendous success. We had all of our Cube alumni that were in town there, and we had conversations. 
I saw some deals getting done on the back of a napkin, big-time executives from big public companies talking to startup CEOs, our friends. Great, great community developing here in the big data world, and again, the Cube is here for that. So, one, thank you, and a shout out to all the team here. We had a little bit of a power outage yesterday that knocked us off the air for a couple of hours. We got all those videos captured, all the content; we just missed a few hours live. But we had great sessions Monday. Great show. It is the age of big data. It is Big Data NYC. We'll have a Big Data SF coming up, probably early next year, again a very intimate environment with the Cube, and we'll continue to cover Hadoop World in consecutive years. We're going to be here. This is the Cube. I want to thank the team. Guys, thank you. Jeff, great job with your analysis. Any final parting thoughts, guys? Jeff? Well, so, you know, it will be interesting when we're back here next year. The conference itself is moving down the road to the Javits Center, so maybe not here at the Warwick, but certainly in the city. And it will be interesting to see how far we develop in the next 12 months in this market. As I said, I'm looking for some big things from these players. They're starting to mature, and yeah, it's going to be an exciting year. And next week we're at IOD, IBM's big data conference. Awesome show last year; second year for the Cube. And then the following week we're at AWS re:Invent. It's going to be like the Super Bowl. Really excited about that. We've got the cloud market covered like a blanket. Obviously storage, you know us, we've got that covered as well. We're going to take it to the cloud. Big data, we've got that covered with the analysts. Real thanks again to Hortonworks and WANdisco for supporting our open source, independent coverage here at the Warwick, across the street from the Hilton, where Strata Conference and Hadoop World are going on. Our fourth year here. It's exciting. And, you know, Shaun Connolly, David Richards, Amr Awadallah, Bill Schmarzo, Sharmila Mulligan, Jack Norris, Herb Cunitz; we had Spotify here, SAP, Edd Dumbill, Hilary Mason, Merv Adrian from Gartner, the folks in the community.
We had Bob Haines from BOTS with Formal with PAPS Microsoft, just a lot of support from the community. A great, great event, great content, exploding market. We're excited. This is the Cube. Keep following us. We'll be in Las Vegas for our next couple of events, and we'll be out on the road. We'll see you at the next event. Thanks for watching, and this is a wrap from Big Data NYC in the Big Apple, with Hadoop World and Strata Conference going on this week. Thanks for watching, and all the videos are available on youtube.com/SiliconANGLE. That's a wrap.