Good. Perfect. All right. Rock and roll.
This is Robin Matlock, CMO of VMware, and you're watching theCUBE.
This is John Siegel, VP of Product Marketing at Dell EMC. You're watching theCUBE.
This is Matthew Morgan. I'm the Chief Marketing Officer at Druva, and you are watching theCUBE.
From Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.
Hello, everyone. Welcome to this special CUBE live presentation here in New York City for theCUBE's coverage of Big Data NYC. This is where all the action is happening in the big data world: machine learning, AI, the cloud, all kind of coming together. This is our fifth year doing Big Data NYC. We've been covering the Hadoop ecosystem and Hadoop World since 2010. It's our eighth year, really, at ground zero for Hadoop, then big data, and now the data market. We're doing this in conjunction with Strata Data, which was Strata Hadoop. That's a separate event with O'Reilly Media. We are not part of that. We do our own events, and this is our fifth year doing our own event. We bring in all the thought leaders. We bring in all the influencers. We bring in entrepreneurs and the CEOs to get the real story about what's happening in the ecosystem. And of course, we do it with our analysts at Wikibon.com. I'm John Furrier with my co-host, Jim Kobielus, who's the chief analyst for our data piece, our lead analyst. Jim, the data world's changed. We had commentary yesterday, all up on YouTube.com/SiliconANGLE. Day one really set the table. We kind of get a whiff of what's happening. We can kind of feel the trend. We've got a finger on the pulse. There are two big notable stories going on. One is that the world is continuing to expand around community and hybrid data and all these cool new data architectures. And the second, kind of a sub-story, is that the O'Reilly show has become basically a marketing event.
They're making millions of dollars over there. A lot of people last night were kind of not happy about that, asking what's being given back to the community. So again, the community theme is still resonating strongly. You're starting to see that move into the corporate enterprise, which is what you're covering. What are you finding out? What did you hear last night? What are you hearing in the hallways? What are the tea leaves that you're reading? What are some of the things you're seeing here?
Well, all things hybrid. First of all, it's building hybrid applications for hybrid cloud environments, and there are various layers to that. For example, one layer is hybrid semantic virtualization, which is critically important for bridging workloads and microservices and data across public and private clouds. Yesterday on theCUBE, from AtScale, we had Bruno Aziza and one of his customers discussing what they're doing. I'm hearing that this venerable topic of semantic data virtualization is becoming even more important now in the era of hybrid clouds. That's a fair amount of the scuttlebutt in the hallway and atrium talks that I've participated in. Also yesterday, from BMC, we had Basil Faruqui talking about automating data pipelines in hybrid environments. That's very, very important for DevOps, for productionizing these hybrid applications for these new multi-cloud environments. Then there are hybrid data platforms of all sorts. Yesterday, from Actian, we had Jeff Veis discussing their portfolio for on-prem and public cloud, putting the data in various places and speeding up the queries and so forth. So hybrid data platforms are increasingly going streaming, in real time.
What I'm getting at is that, more and more, the layering of these hybrid environments is a critical concern for enterprises trying to put all this stuff together and future-proof it, so that they can add on all the new stuff that's coming along, like serverless clouds, without breaking interoperability and without having to change code. Just plug and play in a massively multi-cloud environment.
And I'm also critical of a lot of things that are going on. To your point, the reason why I'm kind of critical of the O'Reilly show, and particularly the hype factor going on in some areas, is that there are two trends I'm seeing with respect to the entrepreneurs and some of the companies. We have one camp that is kind of groping for solutions, and you can see that with the whitewashing in their new announcements. This is going on here. It's really kind of...
Everything's AI now, by the way.
And they're AI-washing it. The telltale sign is that they're always doing a magic trick of some sort: a new announcement, something's happening. You've got to look underneath that and say, okay, where is the deal for the customers? You brought this up yesterday with Peter Burris: the business side of it is really the conversation now. It's not about the speeds and feeds or the management. Those are certainly important, and those solutions are maturing. That came up yesterday. The other thing you brought up yesterday that I thought was notable was the real emphasis on the data science side of it, and that it's still not easy for data scientists to do their job. This is where you're seeing productivity conversations come up with data science. So really, the emphasis at the end of the day boils down to this.
If you don't have any meat on the bone, if you don't have a solution where the rubber hits the road, where you can come in and provide a tangible benefit to a company or an enterprise, then it's probably not going to work out. We kind of had that tool conversation as people start groping. So the buyers out there have got to look and kind of squint through it and say, where's the real deal? That brings up the question of what's next. Who's winning? How do you, as an analyst, look at the playing field and say, "That's good, that's got traction, that's winning," or, "Not too sure"? What's your analysis? How do you tell the winners from the losers, and what's your take on some of the data science opportunities?
First of all, you can tell the winners when they have an ample number of reference customers who are doing interesting things. Interesting enough to get a jaded analyst to pay attention. Doing something that changes the fabric of work or life, whatever. Clearly, solution providers who can provide that have all the hallmarks of a winner, meaning they're making money and they're likely to grow and so forth. But the hallmarks of a winner also belong, in many ways, to those who have a vision and catalyze an ecosystem around that vision of something that could possibly be done before, but not quite as efficiently. For example, AI means many things, but what we're seeing more of right now in the whole AI space, in terms of the buzzy stuff, is deep learning for processing real-time streams of video and images and so forth. And so the vendors who appear to be on the verge of being winners are those who use deep learning inside some new innovation that appeals to a potential mass market. It's something like an app that you put on your smartphone, or it's something you'd buy at Walmart and install in your house.
You know, clearly, like Alexa and all that stuff. Anything that takes chatbot technology (really, deep learning powers chatbots) and is able to drive a conversational UI into things that you wouldn't normally expect to talk to you, and does it well, in a way that people have to have. Those are the vendors that I'm looking for. Those are the ones that are going to make a ton of money selling to a mass market, and once they go there, they're building out a revenue stream and a business model that they can conceivably take into other markets, especially business markets. Look at Amazon 20-something years ago, when they got started in the consumer space as the exemplar of web retailing. Who expected them, 20 years later, to be a powerhouse provider of business cloud services? So we're looking for the Amazons of the world that can take something as silly as a conversational UI, driven by DL, inside of a consumer appliance, and 20 years from now, maybe even sooner, become a business powerhouse.
Yeah, I mean, the thing that comes up that I want to get your thoughts on is that you're seeing data integration become a continuing theme. The other thing about the community play here is that you're starting to see customers align with syndicates or partnerships. I think it's always been great to have customer traction, as you pointed out, as a benchmark, but now you're starting to see the partner equation, because this is an open, decentralized, distributed internet these days, and it looks like it's going to form differently than it did in the web days. With mobile and connected devices, with IoT and AI, a whole new infrastructure is developing. So you're starting to see people align with partnerships, and that's signaling to me that partnering is amping up. People are partnering more.
We had Hortonworks on with IBM. Some people take a Switzerland approach, where they partner with everyone. You had WANdisco, which partners with all the cloud guys. I mean, they have unique IP. So you have this model where you've got to go out and do something, but you can't do it alone. Open source is a key part of this, so obviously that's part of the collaboration. This is a key thing, and then you've got to check off the boxes: data integration, deep learning as a new way to kind of dig deeper. So the question I have for you is about the impact on developers. You can connect the dots between open source, where 90% of the software written will already be in open source and 10% will be differentiated, and the way people are going to market or to the enterprise with partnerships. You can almost connect the dots and say it's kind of a community approach. So that leaves the question: what is the impact on developers?
Well, the impact on developers, first of all, comes when you go to a community approach, and some big players are going more community and partnership oriented in hot new areas. If you look at some of the recent announcements in chatbots and those technologies, we have sort of rapprochements between Microsoft and Facebook and so forth, or Microsoft and AWS. The impact for developers is that there's convergence among companies that might otherwise have competed to the death in particular hot new areas like, as I said, chatbot-enabled apps for mobile scenarios. And so it cuts the platform wars short fairly quickly. It harmonizes around a common set of APIs for accessing a variety of competing offerings that really overlap functionally in many ways. For developers, it's simplification around a broader ecosystem where it's not so much competition on the underlying open source technologies. It's now competition to see who penetrates the mass market with actually valuable solutions that leverage one or more of those erstwhile competitors in some broader synthesis.
For example, take the whole ramp-up to the future of self-driving vehicles. It's not clear who's going to dominate there. Will it be the vehicle manufacturers that are equipping their cars with all manner of computerized everything? Or will it be the up-and-comers? Will it be the computer companies like Apple and Microsoft and others, who get real deep and invest fairly heavily in self-driving vehicle technology and become, themselves, the new generation of automakers? So what we're getting at is that, going forward, developers want to see these big industry segments converge fairly rapidly around broader ecosystems where it's not clear who will be the dominant player in 10 years. The developers don't really care, as long as there is consolidation around a common framework to which they can develop fairly soon.
And open source obviously plays a key role in this. How is deep learning impacting some of the contributions that are being made? Because we're starting to see that the competitive advantage, and the collaboration on the community side, comes with the contributions from companies. For example, you mentioned TensorFlow from Google multiple times yesterday. That's a great contribution. If you're a young kid coming into the developer community, this is not normal. It wasn't like this before. People just weren't donating massive libraries of great stuff, already pre-packaged. All these new dynamics are emerging. Is that putting pressure on Amazon? Is that putting pressure on AWS and others?
Yeah, it is. First of all, there is a fair amount of, I wouldn't call it first-mover advantage, for TensorFlow.
There have been a number of open source DL toolkits in the market for the last several years, but TensorFlow achieved the deepest and broadest adoption most rapidly, and now it is essentially a de facto standard. Let me just go back, betraying my age: 30 or 40 years ago, you had two companies called SAS and SPSS that quickly established themselves as the go-to statistical modeling tools. They got a generation, our generation, of developers, or at least what became known as data scientists, to standardize around them. You were either going to go with SAS or SPSS if you were going to do data mining. Cut ahead to the 2010s: for the new generation of statistical modelers, it's all things DL and machine learning. SAS versus SPSS, that's ages ago, though those companies or those products still exist. But now, what do you get hooked on in school? What do you get hooked on in high school, for that matter, when you're just hobby shopping DL? You're probably going to get hooked on TensorFlow, because it has the deepest and broadest open source community where you learn this stuff. You learn the tools of the trade. You adopt that tool, and everybody else in your environment is using that tool, and you've got to get up to speed. So the fact is that broad adoption early on, in a hot new area like DL, means tons. It means that, essentially, TensorFlow is the new Spark. Spark, once again, came on real fast just in the last five years. And it's been eclipsed, as it were, on the stack of cool by TensorFlow, but it's a deepening stack of open source offerings. So the new generation of developers with data science workbenches just assume that there's Spark in there, and increasingly they're going to assume that there's TensorFlow in there.
They're going to increasingly assume that there are libraries of algorithms and models and so forth floating around in the open source space that they can use to bootstrap themselves fairly quickly.
This is a real issue in the open source community, which we talked about when we were in LA for the Open Source Summit: there are some projects that become fashionable. For example, the Cloud Native Foundation is very relevant but also hot, really hot, right now. A lot of people are jumping on board the cloud native bandwagon, and rightfully so. There's a lot of work to be done there and a lot to harvest from that growth. However, the boring blocking-and-tackling projects don't get all the fanfare but are still super relevant. So there's a real challenge in how you nurture these awesome projects. We don't want them to become like a nightclub where no one goes anymore because it's not fashionable. Some of these open source projects are super important and have massive traction, but they're not as sexy or flashy as some of the others.
DL is not as sexy, or machine learning for that matter, as you would think if you're actually doing it. The grunt work, John, as we know, for any statistical modeling exercise is data ingestion and preparation and so forth. That's 75% of the challenge, for deep learning as well. But also, for deep learning and machine learning, training the models that you build is where the rubber meets the road. You can't have a really strongly predictive DL model for, say, face recognition unless you train it against a fair amount of actual face data, or whatever it is, and it takes a long time to train these models. That's what you hear constantly. I heard this constantly in the...
Well, that's a data challenge. You need models that are adaptive, you need real time, and I think this points to the real new way of doing things. It's not yesterday's model. It's constantly evolving.
Yeah, and it relates to something I read this morning, or maybe it was last night: that Microsoft has made a huge investment in AI, deep learning, and machine learning. They're doing amazing things, and one of the strategic advantages they have, as a large established solution provider with a search engine, Bing, is that Bing is a source of training data they're using for machine learning, and I guess some deep learning modeling, for their own solutions or within their ecosystem. This is something I've read; I haven't talked to Microsoft in the last few hours to confirm it. But it actually makes a lot of sense. I mean, Google uses YouTube videos heavily as training data for its deep learning. So there's the whole issue of, if you're a pipsqueak developer, and I'm sorry if this sounds patronizing, some pimply-faced kid in high school who wants to get real deep on TensorFlow and start building and tuning these awesome, kick-ass models to do face recognition or whatever it might be, where are you going to get your training data from? Well, there are plenty of open source training databases out there you can use, but that's what everybody's using. So there's sourcing the training data, and there's labeling the training data, which is human-intensive. You need human beings to label it. There was a funny recent episode, or maybe it was one of last season's episodes, of Silicon Valley that was all about machine learning and building and training models. It was the "hot dog, not hot dog" episode. It was so funny. On the show, they fictionally bamboozled a class of college students into providing training data and labeling the training data for this AI algorithm. It was hilarious. But where are you going to get the data, and where are you going to label it?
There's a lot more work to do. That's basically what you're getting at.
It's DevOps, but it's grunt work.
Well, we're going to kick off day two here. This is SiliconANGLE Media's theCUBE, our fifth year
doing our own event, separate from O'Reilly Media but in conjunction with their event in New York City. It's gotten much bigger here in New York City. We call it Big Data NYC. That's the hashtag; follow us on Twitter. I'm John Furrier, with Jim Kobielus. We're here all day. We've got Peter Burris, head of research for Wikibon, joining us later, and we've got great guests coming up. Stay with us. We'll be back with more after this short break.