Hi, this is Stu Miniman coming to you from Wikibon World Headquarters in Marlboro, Massachusetts. We're going to be focusing on big data for this CUBE conversation. At Wikibon, our research agenda covers cloud, big data, software-led infrastructure, and how all of these disruptive technologies are blending together. And of course we always want to allow IT practitioners to share with their peers, and we do that through deep conversations as well as some surveys. So joining me for this segment is our lead big data analyst, Jeff Kelly. Jeff just completed an extensive big data survey. Jeff, can you tell us a little bit about that survey and what the goal was that we were looking to get out of it?
Absolutely. Great to be here, Stu. So, we set out to survey the big data community to really understand how big data analytics is being applied in the enterprise, kind of the state of deployment. Where are we in terms of maturity of the market, maturity of the practitioner community? And of course, another key area was the challenges and the barriers that practitioners are facing in terms of getting the full value of the investments they've made in big data technology. We do hear a lot of talk about all the promise of big data, but we wanted to find out just where we are in that journey from all that talk to reality and actually delivering on that promise of big data analytics. So we talked to a little over 300 practitioners from a number of industries. Some were IT practitioners, others were line-of-business practitioners, so there were some really fascinating insights.
All right, Jeff, can you help unpack for us the top-level findings from the survey?
Sure. So I think from a very high level, we were just amazed by the general opinion that practitioners have around the promise of big data.
So 90% of respondents believe that big data is either critical to the competitive nature of their organization going forward, or at the very least an important complement to some of their existing data warehouse, BI, and data management capabilities. Under 5% thought big data was just a meaningless buzzword or was ill-defined and really didn't have application in their enterprise. If you think about terms like big data, and previously cloud, they go through this progression in how practitioners view them, whether it's just a buzzword or something that's really going to have an impact. And the fact that big data has come to this point where the vast majority of our respondents believe it's so critical to their business, I think that says something about the technology and the approach and the impact it's going to have on the enterprise over the years to come.
Yeah, so Jeff, I know we've looked at getting beyond the definition, talking about some of the struggles users have been having to really get value out, and it seems like we're kind of turning the corner on some of that adoption.
We are. So we asked our survey respondents where they are in terms of their deployments. Are they in production? Are they in a POC, kind of experimentation phase? Or are they still in that evaluation phase? About 30% of our respondents said they are actually supporting applications running on big data infrastructure in production. So that's a reasonably high number considering where we are in this market. It's a fairly new market. I mean, if you think about it, Hadoop is kind of the poster child for big data, and it's less than 10 years into its existence. So the fact that we're that far along, I think it says something about how rapidly this market's maturing. That said, that still leaves about 70% that are either simply experimenting or even just evaluating their options when it comes to big data. So there's still a long way to go.
So that was one of the areas we focused on. We also talked a little bit more in depth about what they're doing with Hadoop specifically. About a third of our respondents are actually using Hadoop in their organization today. And very interestingly, this is one of the more surprising findings from the survey: only about a quarter of them are actually paying customers of one of the Hadoop distribution vendors. And the majority, over 50%, are using essentially roll-your-own Apache Hadoop that they downloaded for free. So, again, what that tells us is we're still very early in this market, and there's a huge opportunity for those Hadoop vendors to capture that business. And I think we're reaching a tipping point where a lot of those practitioners who are not yet paying for support from a vendor such as Hortonworks or Cloudera or MapR are going to come to the point where they're going to have to make a decision about which one of those folks to go with as they go to production. So we're getting to that tipping point, and it's important for the Hadoop vendor community to really be aware of this and be on their game, because the next 12 to 18 months are going to be critical for those vendors in this market.
So I want to talk about kind of the old way versus the new way. If you talk about traditional data warehousing versus Hadoop, is this a zero-sum game? Were these new projects that we were using big data and Hadoop for, or are we starting to see an erosion of the data warehouse market and a shift in where customers are going to be going?
Well, there are some contradictory findings on the surface. So we found that close to 50% of practitioners, as I mentioned earlier, see big data and things like Hadoop as complementary to their existing data warehouse and BI and data management practices.
Yet we also found that over 60% of practitioners who have deployed Hadoop today have migrated at least one workload from a data warehouse or mainframe or some other legacy system to Hadoop. And a lot of those are for cost savings reasons. Hadoop is a tenth the cost of your traditional data warehouse. So those seem to contradict each other. It's complementary, but you're seeing disruption. I don't find them contradictory. To me, complementing doesn't mean there isn't some disruption going on. So while we don't believe that Hadoop is going to replace the data warehouse, at least certainly not in the short to medium term, there is no question that there's competition going on for the different workloads where the two do overlap. There are certainly areas where the enterprise data warehouse is going to excel over Hadoop, and there are going to be things that Hadoop does much better than the data warehouse, but increasingly there's this blurring of capabilities between the two. And that's where we're seeing competition from the Hadoop side of the house with the data warehousing side of the house. So we're definitely going to see some of these workloads migrate. It's going to be interesting to watch how the data warehouse vendors react to that to try to maintain their business, whether they embrace Hadoop or try to put it in a small box off to the side. So it's certainly complementary, but we're seeing this disruption still happen for particular workloads where there are overlapping capabilities.
So you mentioned that one of the things that surprised you was what distribution people were using. I'm wondering what other nuggets you found in the survey, things that challenged your existing beliefs or general practices out there, or were just generally surprising?
Well, one of the more surprising elements was around the use of public cloud to support big data analytics projects.
We think here at Wikibon that long-term, cloud and big data are going to be intertwined significantly, but we're talking long-term, 10 to 20 years. What we found was that, I believe it was over 60% of practitioners are using the public cloud for some element of their big data analytics project. Now that doesn't necessarily mean they're using Hadoop in the cloud. It could mean any other component. Big data analytics takes into account things like Hadoop and NoSQL, but also to some extent traditional databases, data integration tools, and other data management tools. But nevertheless, that's a pretty high number, 60-plus percent using the public cloud for some aspect of the big data analytics project. So that surprised me, because we do hear a lot about the concerns, particularly from enterprises in highly regulated industries where security and privacy are a concern, about moving data from their internal data centers out to a public cloud such as AWS or Microsoft Azure or somewhere else. So that number caught my eye, and what I think is going on here is we're seeing a lot of the early experimentation happen in the cloud. AWS in particular does a great job of offering really easy-to-use, easy-to-spin-up capabilities around things like Redshift and EMR. So developers can go into that environment, create prototype applications, and test those out. And I think that's what we're seeing a lot of, and to some extent data scientists going in there and using some of the data sandboxes to do some experimentation. The next question, of course, is going to be, as we move to production workloads, are enterprises and practitioners going to bring these back in house or are they going to keep them in the cloud?
I think that's an open question, but I think there's an opportunity both for those vendors that can provide a private cloud environment, so that practitioners can shift the work they've done in the public cloud to a private cloud environment and bring some of the benefits of a cloud-like, service-like deployment to those internal big data analytics projects, and obviously for AWS, which would like to expand those into really enterprise-grade, full production deployments. So we'll see how they attack that problem. I would say it's both a perception issue and an actual technology capability issue around things like data integration and data movement.
Jeff, cloud is an area that I focus a lot on, so just a follow-up on that. Are people just using the cloud as a platform where they can do testing, or are they using the services? You mentioned things like Redshift, and there's Google with Google App Engine. Are they building the apps to be a born-in-the-cloud app that's going to live there? Is it just a platform that they can test on? Or are they really using some of the more advanced services? It sounds like it's a bit of a mix.
It's a bit of a mix, and this is an area where we're definitely going to do a lot more investigation as part of our big data service. We're going to continue to do these surveys and other work with the community, both qualitative and quantitative research, to try to answer some of these questions. So we're working on that now, and I think it remains to be seen. It's an open question, and it's going to have pretty big implications, whether it's for the cloud providers or the more incumbent IT vendors.
Sure, absolutely, and definitely AWS has the lead in this space. Google has some nice tools there, and then of course you've got IBM with Bluemix, HP with Helion, and many others that are trying to challenge in that space. It'll be interesting to see what they can do in the second half of this year and beyond.
Last question I have for you: from a research perspective, in following up on this survey, what next steps do you have? What are you going to be looking at later this year as a follow-up?
So this was a pretty extensive survey, the one we've just completed. In fact, we've got lots of data that we're still going through now. You're going to see from Wikibon a number of research notes that are going to roll out over the course of the summer and into the fall that draw on a lot of the findings we have in this survey, and also on the ongoing outreach we have with the community and our other research efforts. Then in Q4, we're going to do yet another survey. In fact, you're going to see this as an ongoing area of exploration for us. We're really doubling down on the research studies and the survey work we're doing. So you're going to continue to see surveys coming out. You're going to continue to see forecasts. We're going to be publishing our Hadoop and NoSQL forecast, market sizing and revenue forecast, for the remainder of the year. You'll see that in the fall. So yeah, stay tuned. There's a lot happening. We're really excited about our research service here at Wikibon around the big data space.
All right. Well, Jeff, thank you so much for spending the time. Of course, go to wikibon.org, big data, to find all the research. Go to siliconangle.tv to see the upcoming events. Jeff will be hosting many shows in the big data space throughout the rest of 2014. And thank you for joining us for this segment of CUBE Conversations.