Live from New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. Peter Burris and I are going to wrap up four days of wall-to-wall coverage of Big Data NYC. This is also Hadoop World, Big Data Week. It's been a good week. It has been a great week, Dave. We came in here expecting to hear a lot about machine learning. We heard a lot about machine learning. We came in here expecting to hear a lot about the next phase of maturity in Hadoop. We heard a lot about some derivative tools that are intended to accelerate the pace of innovation and business value in Hadoop. We also came in here expecting to hear about a lot of new tools, new technologies, new approaches to doing things. While there wasn't an enormous amount of that, it's pretty clear that the ecosystem around Hadoop is starting to settle down, recognize what Hadoop does well, and embrace Spark and some of the other new things, for example data warehousing, and recognize that all of these are gonna end up contributing to a company's analytics capabilities. Right, so Monday, of course, we did an event with NVIDIA, so a lot on GPU, a lot of questions about GPU and CPU, what's it gonna look like, how's it gonna shake out, what are the use cases and apps, and that's a business with a ton of momentum. And Tuesday was great. Tuesday was kind of data science day. IBM organized this fantastic panel of data scientists that I got to interview, eight data scientists for an hour, and it flew by.
And then, of course, IBM had this pretty significant announcement where, again, we've kind of been unpacking it. Essentially what IBM did is they stitched together a number of existing tools that they had, brought in some partners, and then presented it in a new way with a natural language interface. It looks pretty cool, looks like they're really gonna potentially make some new breakthroughs there. Taking what traditionally would have been an IBM services-led business and putting it into software, which I think, Peter, is fundamental for IBM's success going forward. Oh, I agree, and it also kind of speaks to the transformation that IBM has been going through for the last number of years, where before they were the company that sold to the CIO better than anybody, and now they're one of the companies that sells to the CEO as well as anybody. And so if you think about the whole DataWorks notion, it's about trying to get technology into the hands of people who are actually going to create business value with it, and not just through a services play, but through a software play. And given that some of the old guard is actually getting rid of some of their software, it's good to see IBM recommit to try to bring good software to users that are responsible for creating business value. Yeah, and to your point about the maturity of the ecosystem and the tooling, we're spending hardly any time, if any, talking about the Hadoop distros. Those days are long gone. I mean, Hortonworks obviously came on, but even what you see from them is a real discussion around business value, sort of driving innovation and applying data to support monetization strategies.
The other piece is this sort of Big Data Week, hinged next to last week's data science summit, where we really did some excellent research in terms of understanding the frameworks that chief data officers are using to build out their data organizations, really starting with understanding how data can be used to help drive business value, not necessarily monetizing the data itself. Oh, there's some of that going on, but many companies early on made the mistake of, okay, how are we going to monetize our data? And they're like, well, we can't. You know, we've got to go compete against, you know, data markets rather. There is this whole notion of the data economy where everybody's going to sell data to each other, and a good data scientist gets in the middle of that and says, yes, please, because I'll take all your data and I will reverse-engineer your customers, what your customers want, what your customers are buying. I will take all your customers away from you in a week and a half. So I think that the concept of monetization of data is now a recognition that it's not selling the data but actually finding ways to generate new sources of value out of data within the business. So for example, Bill Schmarzo, the Dean of Big Data, talks about this all the time, where what we're really trying to do with this is recognize that data should be regarded as an asset that should be able to generate returns. And in fact, because of its special qualities, it might even be able to generate greater returns than a lot of the other assets that are available, given what technology can do today. Cloud, another big theme. I mean, you know, it's sort of trite, but cloud is taking hold. It's almost like people are surprised, but you're hearing that a lot, which is interesting. I mean, we've sort of seen this for years in our research and our data, that big data, a lot of big data anyway, was gonna happen in the cloud.
So, you know, the question is, if you don't have a cloud strategy, what are you gonna do? It's interesting to see a lot of these distro vendors saying, okay, we're gonna be in the Amazon marketplace or we're gonna reside in the Azure marketplace, et cetera. But those companies, Amazon, Google, Microsoft, are building out their own data pipelines, right? Well, they wanna create value above just the raw infrastructure, and one of the biggest consumers of that raw infrastructure will be the pipelines and the tooling and the application capabilities that will be required to turn all that into value. So that's where the profits are gonna end up. And they're gonna happily commoditize that infrastructure. And I think, going back to the notion, we've touched upon a couple of things. I said that, you know, it's that derivative play. We're now at a point where the derivative play is what happens after people first recognize what Hadoop is and is not good at. Our research shows pretty strongly that the community is struggling a little bit with complexity. And cloud is one of the answers to some of that complexity. You know, creating new and different types of partnerships. We've heard a lot about new partnerships. What IBM is doing. Answers to dealing with the complexity of a lot of the open source software in place. And so, not to confuse anybody: we don't think open source is dead. We don't think open source in and of itself as a way of creating and innovating with software is bad. But we think increasingly we're gonna see some of these other mechanisms for ensuring that customers get visibility into how tools do things and how they create business value with this enormous wealth of open source software that's been generated. Let's talk about that for a second. So Rob Hove published an article. You helped contribute to that. The broken promises of open source software and what can be done about it. In big data. In big data, yes. And so that's right.
And so a couple of observations that we made this week. One is that everybody likes to compare, sort of, who's the Red Hat of big data. But if you go back to Linux, you've made the observation that there was such a huge community of Unix experts out there that when Linux came out they didn't need help. They knew exactly what to do with it. They didn't have to call Red Hat for support. That, plus I pointed out that IBM under Steve Mills put a billion dollars into the Linux marketplace to sort of tip the scales with Microsoft. And it worked. It worked. It was a brilliant strategy. But nonetheless, that was a far less complex situation than you have today with big data, where people don't really understand how to use Flume and Sqoop and Hive and Pig and all these other tool sets. So what do they do? They call Cloudera. So one of the things that's interesting about this market, Dave, and it's indicative of how important this transformation is, is that the use case for an operating system is pretty clear. The use case for a SQL database is pretty clear. The use case for an application development tool is pretty clear. One of the things that's interesting about big data is the use cases become more clear as a consequence of using the tools. So you have multiple levels of complexity going on in the marketplace right now. Will this all settle out? Yeah, it'll settle out and we'll see where the profit pools end up as a consequence. But what's clear is that the idea that we're just going to give people the opportunity to download stuff and let them play with it, and suddenly this is magically all going to come together so that we get these enormous, highly valuable production systems as a consequence, that just isn't playing out. A lot of hard work is going to go into making all of this stuff deliver on these enormous promises, which quite frankly it will ultimately deliver on, but it's just going to take a little bit of time.
Well, five years ago when we published the first big data report, what jumped off the chart to me was the pie chart of hardware, software, and services, and it was predominantly services. Software was quite small. Now you could explain that, okay, open source software, the license fees aren't as large, but we put forth the supposition that that has to change in order for this business to scale. It cannot continue to be a services-led business, and the point you've been making all week is, if in this space you've got to call these vendors to get that support, it's going to be very expensive for them and very hard for them to make money. It's not going to be able to replicate what happened in Linux. The other, again, talking about IBM, but the example there is this is a services-led company that absolutely has to put their services expertise into software in order to scale. Yeah, and it was very interesting to, so you did Edge and the data science summit in Boston with IBM last week. You had OpenWorld. I did Oracle OpenWorld and here this week, and it's very interesting to put all of that together because some really interesting stories come out. Number one, it's very, very obvious the market is starting to pick up at least one of our key research themes. I'd love to say it's all our doing, but of course it's not: this notion of data as an asset, and that increasingly businesses need to acknowledge that what they do with their data is going to have an enormous impact on how their business works. Secondly, as you said, cloud is also now in that second phase of adoption, that derivative play, and we're now seeing different approaches to thinking about how we package cloud, kind of what we mentioned earlier, and one of the drivers of that will be that people are now looking at big data, cloud, data value, new types of partnerships, and that may in fact be one of the primary drivers of cloud growth and big data growth in combination over the next few years.
On the backside of that you have IoT and we also started hearing, I think for the first time, some very interesting approaches to thinking about IoT and the relationship between IoT and big data and for me anyway, a first time that I've heard folks acknowledge that IoT is going to lead to analytics at the edge, processing at the edge and some new types of architectures whereas for the last few years we've heard a lot about everything's gonna go up to some central location and we're gonna crush it with these tools. Yeah, you know this notion of edge to cloud is probably a little too simplistic is really the point there. It's gonna be a lot more complicated than that and require a lot more touch points and an ecosystem to support that. The other big thing we heard this week is the cleaning up of the data lake. We all knew this, when the term data lake came out we all said, uh-oh, this is gonna be all, not all of us, a lot of the vendor community was going crazy after it but we as analysts and bloggers and so forth said, wow, this is gonna be messy because you're kind of kicking the can down the road on actually applying schema, right? Discipline. Discipline, right, that's good. And so now we're hearing about a lot of companies trying to help solve that problem which is critical and again, the chief data officer framework is one of those steps is you gotta have trust in that data, you gotta have provenance, you gotta have data quality and so we're starting to see machine learning techniques applied to helping clean up that Boston Harbor data lake. 
Yeah, and that's, you know, that's a real advance, and again, it's an example of a derivative technology play. George refers to this, based on Brian Arthur and some others who've studied innovation deeply, and some of our research shows it. We call it the adaptive stretch, where you take a look at a tool and you apply the tool to a particular set of use cases and you realize that the tool isn't quite a perfect match, so you start stretching it, and you stretch it and you stretch it until it breaks and you start again. And that's kind of what's happening with Hadoop, and Spark is part of that transition, but we are seeing a proliferation of use cases, some of them successful, some of them not, a proliferation of tools, and finally a first pass from companies like Alation and others that are now bringing in technology to start going back and cleaning all this stuff up. Okay, so this is again a wrap for four days, wall-to-wall coverage here at Big Data NYC. theCUBE is, are we doing Juniper next week? Is that right? We're doing Juniper next week, yeah, and then October's a big month for us. We've got a couple of big events coming toward the end of October. We have, in the third week of October, Dell EMC World. That's gonna be the first sort of coming out of the new Dell Technologies. They're calling it, I think, Dell EMC World, so we'll be down there at Dell EMC World interviewing Michael Dell, and that same week is the Grace Hopper, Anita Borg Women in Tech, Celebration of Women in Tech. We have a huge booth down there, we've got the partnership going with The Ground Truth, Charlie Sennett's organization, we have our fellowship, we have three women fellows that are in a practicum right now, and they're researching women in tech, equal pay, why more women who graduate with a computer science degree don't get into the tech business. Covering that, we've also got two fellows. Location of that?
That's in Houston at the Grace Hopper Celebration of Women, the Anita Borg Celebration of Women event. We've got two students from Palo Alto High School, one man and one woman, coming as well. So we've got five fellows, we're gonna be publishing a ton of content, we've got an amazing lineup there, we're really excited about that. And then at the end of the month, World of Watson in Las Vegas. Ginni Rometty is gonna be speaking. First time I can recall that Ginni has actually spoken at one of the customer conferences. Of course, Ginni, you're welcome to come on theCUBE, but you're gonna get her on there. We got inquiries in, the ask is in. It would be unprecedented for Ginni to come on a show like this, but we'd be happy to have her, love to talk about that. But that's a big deal because it used to be IBM Insight, they've changed the name to World of Watson, obviously IBM's going for it with that whole AI play. She's a great speaker, she's got a lot of good things to say. So Peter, it's been a pleasure working with you this week, really fantastic. Gents, the crew, always, Patrick, Seth, Alex, Brendan, Brian, thank you, and appreciate all your help. And of course Bert Latamore watching, the CrowdChat's doing a great job, and Kristen Nicole, our managing editor, always a pleasure reading what you summarize after these shows. So thanks for watching everybody, this is a wrap. Check out siliconangle.tv, wikibon.com for all the research, and siliconangle.com for all the news. We'll see you next week.