Live from New York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic and Teradata. With hosts Dave Vellante and Jeff Kelly. Hi everybody, we're back wrapping up Big Data NYC. Jeff Kelly and I and Jeff Frick have been going two days, Thursday and Friday, wall-to-wall coverage. This is theCUBE. theCUBE is our live mobile studio. We go out to the event, we extract the signal from the noise. We're celebrating our fifth year at Hadoop World. I remember very well, Jeff, John Furrier called me. I was at Storage Networking World and he said, get your butt out here, get out of that old stodgy show, which I think is no more, and come out to Hadoop World and help create the future. And it's the best thing that I think I've ever done. It's really been a life-changing experience for me personally, and certainly it's transformed our organization. SiliconANGLE and Wikibon have come together with theCUBE, and it's just been a wild ride. And we're really pleased to have been here. We had a capital markets event. You had a great presentation and we had that awesome panel, which we'll be curating, I'm sure, for months to come. Really, really well done. Abhi Mehta, Amy O'Connor and Peter Goldmacher and yourself, just a lot of fun. And then of course we had our fifth year CUBE celebration party at Hadoop World. It was really fantastic. But one of the things I want to do now is, for some reason I want to summarize the event here. I think some of the big themes that we've heard: enterprise comes to Hadoop. We used to hear a lot about real-time; real-time is sort of a given now. We're hearing a lot about data integration, governance, a little bit about security, a lot of these hard problems that are typical in the enterprise, that the enterprise guys are bringing to the DevOps crowd and Hadoop crowd. Now you spent some time over at the event, at the Javits Center. 
What's your take on this year's Hadoop World, where we're at in the maturity model and in the cycle? Well, there are quite a few vendors on display down at the Javits Center, which we talked about a little bit in my presentation last night and in the panel. This ecosystem is just exploding. I've got to guess 50 plus vendors down there. So the vibe was good from my conversations with some of the vendors that are on display down there. They're getting great booth traction. People are excited. People are not just stopping by to talk about the technology, the, as one put it, code jockeys who really want to get into the code, but they're talking potential deals and what can we do to make this real? So I think what's interesting is the real-time equation is one thing, and that was one of the buzzwords over the last couple of years: is Hadoop gonna go real-time? What's replaced that, I think, is not so much going real-time but operationalizing big data. That's a slightly different way to look at it. Real-time is a component of that, but people are talking about, okay, I've got this experiment I've done. Maybe I've used a free open-source version of Hadoop from one of the vendors or Apache. I've got some really smart data scientists who may be running some algorithms on large amounts of data. We found some interesting insights. Maybe we've even built an application, but how do we actually scale that out and run it across huge data sets and operationalize it so that it's running 24/7 and it's highly available and resilient? Those are some of the questions that people are starting to ask. That along with some of the data governance and some of the other enterprise-grade issues around security and privacy. 
So I think it's interesting, having come to the show now for myself personally the last three years, and for theCUBE, five years, just watching this evolution of both the user base here, but just the larger evolution of the attendees in general, whether it's vendors or users and developers, and now to see more business side people getting involved. And now we're really starting to talk about making this stuff work in the enterprise and less around science experiments. So Jeff, what I wanted to do is maybe unpack some of the slides that you went through the other day. And if you guys wouldn't mind maybe bringing them up, I know we're going to skip around here a little bit, but I wanted to share with the audience, give them a glimpse of some of the stuff that you presented. So if we take a look at some of the adoption that's going on in the Hadoop world and the big data community, we've made the point several times that it's really not easy. It's hard for practitioners to do this stuff. The skill sets aren't there, they're having data integration challenges. But one of the things that we observed is that customers are starting to baseline their enterprise data warehouse spend and they're clearly shifting spend to the new Hadoop things. That's not dollar for dollar. For the dollar that they maybe spent on enterprise data warehouse, they're maybe spending 30 cents on Hadoop. So that's on the one hand bad news for the industry, but the premise is that the potential is there for much, much higher growth. Yeah, well, I mean, it's bad news if you're one of those data warehouse vendors, A, losing that revenue, but B, even if you are able to adapt and integrate Hadoop into your product portfolio, now you're only getting 30 cents on the dollar where you were getting that dollar before. 
So that's a challenge for the vendors, but yeah, what we're seeing, I mean, I think what we're seeing consistently is the practitioners, as you say, are baselining, they're trying to freeze their spending on the data warehouse. Storage is going up; spending levels, or budgets I should say, are remaining flat. So they've got to find a way to be more cost effective in the way they store large volumes of data. So what they're doing is trying to put a cap on their data warehouse spend, putting historical data into the data warehouse, and then maybe running some experiments, hiring some data scientists, experimenting with the different data types to find new ways to use that data and potentially figure this thing out. As you said, it's hard and it gets particularly hard again when you start to move to larger scale production deployments and you want to actually integrate this into your larger organization, you want to scale this out, you want to run this 24/7. So there's a lot of experimenting going on. We're starting to see that move to production. We found in our survey of early adopters, about 31% of practitioners are running production workloads on top of Hadoop and other big data infrastructure. And that's a sizable chunk, I think, but there's still a long way to go. You've got nearly 30% that are essentially doing POCs, kicking the tires, if you will, and then around 40% that are still evaluating. So yeah, we're seeing that baselining of the EDW spend, moving some of the spend that you would have spent on the EDW to Hadoop, doing some experimenting, because people are still trying to figure this out. All right, so do we have those slides? Can we take a look at some of the data that Jeff presented? We'd like to share with our audience and then maybe go through it. I guess we're having some troubles with that, but so what we are going to do is we're going to put them up on SlideShare. Jeff will have them up next week, I presume. 
Yeah, we'll have them up, may have them up even by this weekend. And really the idea, the premise really of my presentation was that the industry heavyweights are under pressure. I mean, I think it's clear, you look at what's happening to companies like Oracle, like Teradata: Oracle's missing revenue and earnings estimates. Teradata has lost significant market cap over the last year. I think part of that is because of what we just talked about. People are starting to freeze their spend with those big vendors. They're moving to some of these new approaches, collectively called Big Data, a lot of it around the Hadoop ecosystem in particular. That's where a lot of the innovation is happening. We're seeing this explosion of vendors out there. As I mentioned, down at the Javits today there are 50 plus vendors. And so the question, of course, is there a cause and effect here? Are the struggles that some of the industry giants are feeling a result of what's happening in the big data space? And I think they are, largely because of what practitioners tell us. That trend or that pattern I just described, freezing EDW spend, moving to Hadoop, doing some experimentation there, that comes directly from our conversations and collaboration with Big Data practitioners. I think there is a correlation there, but the idea that the enterprise data warehouse didn't really live up to its potential, I think a pretty dramatic argument could be made that that's accurate. But then the question, of course, Dave, is can Big Data live up to the potential, or should I say live up to the hype? Can it deliver on that promise of this 360 degree view of the customer, the single version of the truth? And that remains to be seen. I think it's certainly possible, and those are some of the challenges that the vendors down at the Javits are working on helping practitioners actually achieve. All right, we're going to give it one more try. 
Maybe bring up the slides. We're going to take a look at some of the slides that you gave yesterday in your presentation. And we might have to jump around a little bit, but this is our Hadoop and NoSQL forecast. I want to go forward a little bit, if you could, Andrew. I really want to get into some of the things that we saw from the practitioners. Keep it right here. So this is a slide, Jeff, that you basically uncovered in your research, and it's the percent of organizations that we surveyed that are actually paying for Hadoop and Hadoop distributions. So take us through this slide. What does this slide show? Sure, so the zeros you see on the left, and I think a good choice by our graphic designer on those images, the zeros are for the practitioners who are using roll-your-own Apache Hadoop, which means they're paying zero. They're using free open source software. That's about half of Hadoop users. The yellow circles that you see on the right hand side of the screen represent the 24% of practitioners that told us they're using a free Hadoop distribution from one of the commercial vendors, Cloudera, Hortonworks, MapR. So there's some stickiness there, but they ain't paying. Well, it's interesting. There's a little bit of stickiness, but maybe not as much as you'd think. What I hear from practitioners time and again is whatever distribution they may be using for POCs, they are reevaluating who they're gonna go with when they start to think about production. They are starting from scratch in a lot of ways because they know, one, they can play the vendors against each other. The price pressure being put on the vendors is dramatic. And so they're playing them off one another. And when you go to production, you've got a whole other set of challenges that you've got to meet as a practitioner. When you're in POC, you're testing things out, you're running different algorithms, you're trying to see where the insights are. 
When you go to production, you're increasingly interested in things like high availability, security, privacy. So they are reevaluating the Hadoop distribution vendors when they do move to production, regardless of who they're using, if they're in that 24%. Now, I think if we can put it back up on the screen, you'll see the dollar signs represent the 25% of Hadoop practitioners that are actually paying one of the vendors for a subscription to their commercial distribution. That usually includes support; depending on the vendor, it sometimes includes some proprietary software as well. So there's 25% that are paying for it now. And of course, naturally, that number is gonna increase as we see more production deployments, because that's naturally when you start to look for subscription support. So is that glass half full or half empty? Well, Dave, as you might say, it's a quarter full. We'll see. That glass is going to fill up. The question is how high will that glass be? So Andrew, there's a bunch of infographic slides that I wanna share with the audience. I wanna go down a few, if you wouldn't mind. The other one that I wanna talk about, Jeff, is, as part of the survey, we shared some data yesterday. Andrew, if you wouldn't mind just, yeah, let's take a look at this. So this is the state of big data deployments. The thing that interested me here is that 31% of the people that we talked to had big data in production. And I think this really underscores the theme that we heard at Hadoop World this year, that the enterprise is coming to Hadoop. The average age of the Hadoop World attendee is trending toward my age. And so, you know, that's a decent chunk and you got, you know, 28% kicking the tires still with a proof of concept and, you know, the rest are sort of in the evaluation phase. Essentially nobody in this survey is not evaluating, you know, Hadoop, because we biased the survey toward people that are at least evaluating. 
So well over half actually doing something with Hadoop, so that was pretty substantial. And then if you go to the other slide, Andrew, what you'll see here is this one I thought was really quite telling, Jeff. What we're looking at here is the big data tools and technologies that are in use. And the point you made yesterday is that, you know, Hadoop and NoSQL, 36%, 38%, some people say, oh, that's the bottom of the list, no. Why is that important? Why is that 36% and 38%? Why are those two data points so important? Well, they're important because those two technologies, NoSQL but particularly Hadoop, really are foundational technologies in the, you know, what we've been calling the digital fabric. You know, along with cloud infrastructure as a service, Hadoop really is becoming the new operating system in a data-centric world. So that's why those are important. And additionally, that's where the innovation is coming from. That's coming from the Hadoop community, it's coming from the NoSQL community. That's where we're seeing new and innovative ways to process, store, analyze data. So it's important that those numbers are where they are and they're going to increase over time. And then the other piece that is striking here is the conventional data warehouse. We put conventional in quotes. 51% of the people said that the conventional data warehouse is fundamental, you know, to our big data project. Everybody, we always get asked, is the data warehouse a dinosaur? You know, your joke yesterday was, yes. Well, yes it is. And like the dinosaurs, they'll be around for a hundred million years. Right, so it's a key piece of the technology stack here, the data project stack. Organizations have built processes on top of their data warehouse around compliance and regulation and reporting that, you know, despite the fact that I mentioned earlier, certainly, you know, the data warehouse has not lived up to its full potential, but it has delivered some value. 
And those are some of the areas where it has delivered value, and it's not realistic to think that most organizations are gonna rip that out when they are getting some of that value from it. What they're trying to do, however, is say, well, this is the value I'm getting, but I wanna limit my spend here because these are the big, expensive systems. If you think about the conventional data warehouse, conventional in quotes, the approach is really that appliance approach, right? That pre-configured hardware and software appliance approach from vendors like Oracle and Teradata. Those are expensive systems. They're in place now in most large enterprises and they're serving to, again, report against, for compliance reasons, to report against historical data to show some trends in sales and customer account, things like that. So they're not gonna remove those. And those systems, you know, they contain insights that you may want to include in your big data project. So they need to be integrated in, which brings us to the next data point: data integration tools are the most widely used tools and technologies in relation to big data projects. We had Todd from Informatica on earlier and I kind of made the quip that data integration is about moving data from point A to point B, and he's like, oh, that's a fallacy, Jeff. And he was right. I did simplify it a little bit too much, because included in data integration is the whole data quality component and the data governance component. Understanding your data lineage, understanding the relative quality of your data, improving that quality to the appropriate level; not every use case requires pristine data quality. 
So those tools are gonna play an increasingly important role, especially as we move more into production, because putting this all together in a repeatable way requires orchestrating data movement, understanding the quality of your data and definitely adding data governance capabilities, because when the government comes knocking, you better be able to report and show your data lineage, who did what to the data when, and that's where these kinds of tools and technologies come into play. All right, so I wonder if we could take a look at the next slide, Andrew, again looking for those infographic-like slides. This one, very quickly, we'll talk about this: 72% of the practitioners that we talk to are tapping outside services. 45% of the revenue in big data is for services. Yeah, I mean, this is just kind of a follow-on to what I was just mentioning. I mean, this stuff is challenging and especially, again, as you move to production, there is a whole new set of challenges that you have to deal with around connecting systems, hardening your deployments, ensuring things like high availability, ensuring that you can report for compliance reasons against the system. So there are a lot of moving parts here, both from the technology perspective and from a people and process perspective, and they're looking for help. Practitioners are looking for guidance and that's where professional services comes into play. We had Mani from Cloudwick on earlier today talking about some of the things that he's seeing in the industry, and really, as you move into more mainstream adoption, and we're just at the very beginning of that, but as that continues, a lot of the IT practitioners out there who grew up in the traditional database world of the vertical stack are not versed in this new way of managing and analyzing data. 
In some cases, they're going to be able to adapt, but in some cases, they're not, and organizations are looking for guidance about how best to proceed when they move to operational level big data, and that's where professional services really comes in. Okay, Andrew, go to the next slide if you could. I want to just unpack that a little bit. So again, a lot of data that we're sharing: which of the following best describes your attitude toward big data analytics? I just want to focus on the 1% slice. 1%? Okay, so it says data analytics is a buzzword with unclear meaning or application within my enterprise. When we asked this of cloud in the 2010, 2011 timeframe, it was a huge percentage, like two thirds of the respondents said cloud is a buzzword of unclear meaning. You joked that today they'd still say that. I would actually disagree. I think most people sort of accept that term, but this, I've never seen anything like this before. I mean, big data is all the hype, yet people don't see it as a buzzword of unclear meaning. So perception wise, people see big data as a new source of competitive advantage. Interestingly, Abhi Mehta would dispute that. He said it's not the data that's a source of competitive advantage, it's how you differentiate the data that's a source of competitive advantage. So it's a nuance, but it's important. Yeah, I mean, I would agree with that, but really what we're seeing, I think what's happening is people are seeing companies like Google, like Facebook, and how they're building these enormous, multi-billion dollar companies really on their ability to analyze data and take action on that, to drive revenue from that data. So I think that is starting to play a role. I think in the cloud space it was a little bit different. It was a little bit harder to point to success stories in 2011 for people really to relate to. So I think that has something to do with it. 
And we're just really moving to this area where I wouldn't even say this is new really, when you look at surveys of CIOs and C-level executives, getting better data, getting better analytics is always a top priority, has been for years. The challenge has been, the reason it's continued to be a top priority is because they hadn't achieved it. So I think this is an opportunity where the question, the big question that we asked earlier, can big data live up to this potential? And I think it remains to be seen. I think there's a lot of promising areas. I think Hadoop's already delivering a lot of value for some companies. We're seeing companies like Uber, like Netflix building entirely new lines of business or I should say new business models, new industries based on their use of data and their ability to analyze and derive insights from that data that they can then take action on. So we're already seeing it happen. And I think when you're seeing it happen, that helps to validate the trend of the day and make it real. There was one more data point. We don't have a slide on it, but it's in your report, which we just released, I think, to Wikibon clients this week. Yes. As part of Big Data NYC. And that data point is the percentage of people that are shifting resources from their enterprise data warehouse into Hadoop and Big Data. And it's an enormous percent. 65% of the respondents said that they have already shifted resources from EDW into Big Data and Hadoop. And the second interesting data point is another 35% said they will by the end of this year. So fully 95% of the Big Data practitioners that we talked to said they are either baselining or reducing their investment in the EDW, shifting it to Big Data. Now as we've pointed out before, it's not a one to one. It's a dollar they're exchanging for 30 cents. And the hope of the industry, the sellers, is that the volume will be more than triple to make up for that. And so it's going to really be interesting to see. 
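[Editor's note: the break-even arithmetic behind the "more than triple" remark can be sketched roughly as follows. The 30-cents-on-the-dollar figure comes from the conversation; the function name and the assumption that vendor revenue scales linearly with deployed volume are hypothetical and purely illustrative.]

```python
def break_even_volume_multiple(new_revenue_per_old_dollar: float) -> float:
    """Return how many times volume must grow for total revenue to stay flat.

    Hypothetical assumption: revenue scales linearly with volume, so if each
    old dollar is replaced by `new_revenue_per_old_dollar`, volume must grow
    by the reciprocal of that figure just to break even.
    """
    if new_revenue_per_old_dollar <= 0:
        raise ValueError("revenue per old dollar must be positive")
    return 1.0 / new_revenue_per_old_dollar


# 0.30 is the "30 cents on the dollar" figure quoted in the discussion.
multiple = break_even_volume_multiple(0.30)
print(f"Volume must grow about {multiple:.2f}x just to break even")
```

[Under that linear assumption, exactly tripling volume would still leave sellers slightly short of the old revenue, which is consistent with the "more than triple" phrasing.]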
Another data point that you shared yesterday was the ROI on Big Data spend. For every dollar spent on Big Data today, the average return is 55 cents. That's a problem. Now that's the mean. At the edge, you're seeing much higher return. I'd say greater than three to one. I think you're seeing 10 to one in some cases. But I don't have any hard data on that; that's something that we'll be studying throughout the coming year. And again, sharing with our Wikibon clients. But last thoughts on that and then we're up. Yeah, well, it's this trend that we're seeing where some of the lower hanging fruit is to go for these cost reduction use cases, where you're moving data to Hadoop because essentially you've got lower cost storage. And then it allows you to start experimenting with that data. So it makes sense that we're seeing a lot of the practitioners start in that area. Where it gets exciting though, for me, is moving to more analytic use cases, net new applications, and doing things with data that you just couldn't do before in the old world. And you know, again, we're moving in that direction. You know, again, it makes sense that you're going to see a lot of practitioners start with that data warehouse offload. But it's really interesting when you start talking about not just the cost savings, but the revenue creating workloads. Yeah, I mean, the other point Abhi Mehta made that struck me on the panel yesterday was that the focus right now within large enterprises is reduction on investment, not return on investment, which, you know, kind of makes sense. I mean, ROI is a very simple equation. It's benefit over cost. And if you can reduce the denominator, you can increase your ROI. So if you can cut your costs, you know, you win. So it makes sense that large companies would do that because it drops right to the bottom line. 
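[Editor's note: a minimal sketch of the "benefit over cost" ratio as described in the conversation. The 55-cents-per-dollar figure is from the discussion; the function name and the halved-cost and doubled-benefit scenarios are hypothetical, chosen only to show that shrinking the denominator and growing the numerator both raise the ratio.]

```python
def roi_ratio(benefit: float, cost: float) -> float:
    """Return benefit divided by cost, the simple ratio used in the discussion."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost


# Figure quoted in the conversation: 55 cents returned per dollar spent.
baseline = roi_ratio(benefit=0.55, cost=1.00)

# Hypothetical scenario: same benefit, cost cut in half (smaller denominator).
cost_cut = roi_ratio(benefit=0.55, cost=0.50)

# Hypothetical scenario: same cost, benefit doubled (larger numerator).
benefit_growth = roi_ratio(benefit=1.10, cost=1.00)

print(f"baseline:       {baseline:.2f}")
print(f"cost halved:    {cost_cut:.2f}")
print(f"benefit doubled: {benefit_growth:.2f}")
```

[In this toy example both routes reach the same ratio; the difference is that cost can only be cut so far, while the benefit side has no fixed ceiling.]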
Having said that, the real innovation here is with the guys that are focusing on the numerator. And I think that's where the innovation is going to be. We're seeing the collision of two worlds, the sort of startup, you know, nose ring, T-shirt crowd, hoodie crowd with the traditional enterprise. I think that's a good thing. I think those companies that can sort out that misalignment, that's something else we didn't talk about. There's a real dissonance between IT and the business and how they measure success. A huge percentage of the IT folks, 63% of the IT folks that we talked to, said our big data project was successful, but only 18% of the business people. So that's a gap. Those companies where there's alignment between IT and business are really the ones that are succeeding. So this is just sort of a flavor for some of the research that we've been doing at Wikibon, really stepping it up there, making a lot of investments in the research. You know, SiliconANGLE is cranking, theCUBE is going crazy. Our next event, let's see, what is our next event? We're going to be, I know, at IBM Insight. IBM Insight is in two weeks. Not this week, but the week after next, Monday and Tuesday, I think the 27th, 28th. Guys, got anything else next week? Are we doing anything next week? We're off next week. Wow, okay. Then Google, November 4th. We are doing the Google Cloud Show. We are stoked. We're doing the Google Cloud Show. We're doing Amazon re:Invent. So really excited about that. Covering cloud like a blanket. And anyway, this is a wrap on Big Data NYC. Really appreciate everybody watching. Thanks to all of the folks that came to our event. And as I say, Jeff's slides will be up on SlideShare. We're going to be re-running the panel all week. We're going to be curating that. Really fantastic content. And Jeff, it has really been a pleasure working with you. John Furrier, we miss you. Thanks to the crew. Brendan, Andrew, Matthew's gone. Greg, Mark's gone as well. 
John Greco, good job working the floor, making sure clients got here on time. We didn't have one miss. We were off-site. I think we've mentioned that. We're at the Times Square Hilton; the Javits is where all the action is with Hadoop World. We've been shuttling clients back and forth. Thanks to John Greco. We did not have one miss. Every guest was able to show up. So thank you for making that happen. And I think we even had some extras, Jeff. I think we did. So thanks everybody for watching. That's a wrap. This is Big Data NYC. You're watching theCUBE. Thanks everybody. We'll see you next time.