Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. Okay, welcome back everyone, here live on day two of our three days of coverage of Big Data NYC. This is our event that we put on every year, so this is our fifth year doing Big Data NYC. It's in conjunction with Hadoop World, which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data, and probably next year we'll call it Strata AI, but we're still theCUBE, we'll always be theCUBE, and this is our Big Data NYC, our eighth year covering the big data world since Hadoop World, and then as Hortonworks came on, we started covering Hortonworks data summit. You guys are... DataWorks Summit. Arun Murthy, my next guest, co-founder and Chief Product Officer of Hortonworks. Great to see you, looking good. Likewise, thank you, thanks for having me. Boy, what a journey. Hadoop, years ago, I still remember, you guys came out of Yahoo, you guys put Hortonworks together, and then it went public; first you went public, then Cloudera just went public. So the Hadoop world is pretty much out there. Everyone knows where it's at. It's got a nice use case, but the whole world's moved around it. You guys have been really the first of the Hadoop players, before even Cloudera, on this notion of data in flight, or I call it real-time data, but I think you guys call it data in motion. Batch, we all know what batch does. There are a lot of things to do with batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data in motion is a huge deal. Absolutely. Obviously, we've been in this space; personally, I've been in this for about 12 years now, so we've had a lot of time to think about it. Since you were 12? Yeah, almost, I probably look like it.
So back in 2014 and '15, when we went public and started looking around, the thesis always was: yes, Hadoop is important, we're going to allow you to manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was to really enable use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, his vision's always been that we've got to get into sort of a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where we started. And increasingly, the obvious next step was to say, look, enterprises want to be able to get insights from data, but increasingly, they want to get insights and deal with them in real time. While you're in your shopping cart, they want to make sure you don't abandon your shopping cart. If you're at a retailer and you're in an aisle, about to walk away from a dress, they want to be able to do something about it. So this notion of real time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away, rather than having to try to do this post-transaction. So it's been a really important journey. We went and bought this company called Onyara, which is a bunch of geeks like us who started off at the government and built this Apache NiFi thing, huge community. It's just taking off at this point. It's been a fantastic thing to join hands with Joe and the team and keep pushing on the whole streaming data side. You know, I don't want to go off on a tangent, but I'll do it since you brought up community. I want to bring this up. It's been the theme here this week. It's just more and more obvious that the community role is becoming central, beyond open source. I mean, we all know open source, standing on the shoulders of those before us.
You know, the Linux Foundation is showing code numbers going from 64 million to billions in the next five, 10 years, exponential growth of new code coming in. So open source is certainly booming, but now community is translating into other things. You start to see blockchain, very community-based, and that's a whole new currency market that's changing the financial landscape, ICOs and whatnot. That's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities, and certainly it's going to change the vendor landscape. So you guys, compared to Cloudera and others, have always been community driven. Yeah, I mean, our philosophy has been simple, right? More eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone over the course of the six years we've been around. We're super excited to have somebody like IBM join hands with us; that happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. But communities are fundamentally built on trust and partnering. Exactly. I think it's pretty obvious. You code with your friends, you code with people who are good, who become your friends, and there's an honor system among them. You're starting to see that now in the corporate deals. So explain the dynamic there and some of the successes you guys have had on the product side, where one plus one equals more than two. One plus one equals five or three. You know, IBM's been a great example, right? They've decided to focus on their strengths, which are around Watson and machine learning. And for us, to focus on our strengths around data management infrastructure, cloud and so on, right?
So this combination of DSX, which is their Data Science Experience, along with Hortonworks is really powerful. We're seeing that over and over again. You know, just yesterday, we announced the whole DataPlane thing, right? We're super excited about it. And now to get IBM to say, we'll bring in our technologies and our IP for big data, whether it's BigQuality or BigInsights or Big SQL, into it has been phenomenal. I love the DataPlane announcement. You know, finally, people who know me know I hate the term data lake. I've always said it's been more of a data ocean. So I might get redemption, because now, with data lakes, no one's admitting that it's a horrible name, but you're talking about stitching together the data lakes. Which is essentially an ocean, right? But data lakes are out there, and then you can form these data lakes or data sets back to whatever, but connecting them and integrating them is a huge issue, especially with security and retirement. And a lot of it is also just pragmatism, right? We started off with this notion of a data lake and said, hey, you got too many silos inside the enterprise in one day or so and you want to put them together. But then increasingly, Hadoop has become more and more mainstream. I mean, I can't remember the last time I had to explain what Hadoop is to somebody in the enterprise today. As it's become mainstream, a couple of things have happened. One is, we talked about streaming data. We see that all the time, especially with HDF. We have customers streaming data from autonomous cars. We have customers streaming data from security cameras, right? You can put a small MiNiFi agent in a security camera or smartphone and stream it all the way back. But then you get into physics, you're up against the laws of physics.
If you have a security camera in Japan, why would you want to move its data all the way to California and process it? You'd rather do it right there, right? So this notion of a regional data center becomes really important. That talks to the edge as well. Exactly, right? So you want to have something in Japan which collects from all the security cameras in Tokyo, and you do the analysis there and push what you want back here, right? So that's physics. The other thing we're increasingly seeing is that, with data sovereignty rules, especially things like GDPR, there are now regulatory reasons why data has to naturally stay in different regions, right? Customer data from Germany cannot move to France or vice versa, right? Data governance is a huge issue. Here's the problem I have with data governance, and I'm really looking for a solution, so if you can illuminate this, that would be great. So there's going to be an Equifax out there again. Oh, for sure. And the question is, is that going to force some regulation change? So what we see is, and certainly the Mugi Bhan side, I see it personally, is that you can almost see that something else will happen that'll force some policy regulation or governance. Policy change. Absolutely. So you don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of potential waste in not structuring the data properly. Can you comment on what's the preferred path there? Absolutely. And that's why we've been working on things like DataPlane for almost a couple of years now, right? Which is to say, look, you have to have data and policies which make sense given a context. And the context is going to change by application, by usage, right? By compliance, by law, right? So now instead of having to manage 20, 30, 50, 100 data lakes, wouldn't it be better? Not just data lakes, right? Data ponds or any data, right? Any data pool, stream, river, ocean, whatever.
Jacuzzi, right? Data jacuzzi, right? So what you want is a holistic fabric. I like the term Forrester uses, they call it the fabric. Data fabric, yeah. Data fabric, right? You want a fabric over these, so you can actually control and maintain governance and security centrally, but apply it with context, right? Last but not least, you want to do this whether it's on-prem or on the cloud, or on multi-cloud, right? So we've been working with a bank. They were primarily based in Germany, but for GDPR, they had to stand up something in France now. They had French customers, but for a bunch of new regulatory reasons, they had to stand up something in France. So instead of building their own data center, they went ahead and went to a cloud provider, right? Why wouldn't they? And they were like, great, things are working well. Now they want to expand a similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia, or was not available in a timeframe which made sense for the offering. So they had to go with cloud vendor two, right? So now, although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you have to manage it three ways, one for on-prem, one each for cloud vendor A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is DataPlane, and which, by the way, is based on all the open source technologies we love, like Atlas and Ranger. Oh, by the way, that's also what IBM is betting on, and the entire ecosystem too, so it seems like a no-brainer at this point. And that was the reason why we foresaw the need for something like DataPlane, and obviously we couldn't be more excited to have something like that in the market today as a net new service that people can have.
And you get the catalog, security controls, data integration, and then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I like it a lot. I guess the question I have for you personally is, what's driving the product decisions at Hortonworks? And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding product strategies? Does it bubble up from the community? Is there like an ivory tower? "Let's go take that hill"? It's both, right? Because what typically happens is, obviously, we've been in the community now for a long time, and working with, publicly announced, well over a thousand customers not only puts a lot of responsibility on our shoulders, but it's also very nice because it gives us a vantage point which is unique, right? So that's number one. And the second one is that, being in the community, we also see the fact that people are starting to solve these problems, right? So it's another telemetry for us. So you have one on the enterprise side, we see what the enterprises are facing, which is kind of where DataPlane came in. But we also saw it in the community, where people were starting to ask us, hey, can we do multi-cluster Atlas, right? Or multi-cluster Ranger, right? It's like, okay, you put two and two together and say there is a real need. So you get some consensus. You get some consensus. And you also see that on the enterprise side. And last but not least is when we went to friends like IBM and said, hey, we're doing this. This is where we can position this right, so you can actually bring in IGC. You can bring in BigQuality. You can bring all of these. Things just clicked with IBM. Exactly. I mean, Rob Thomas was thinking the same thing. Exactly. You get the Power. Look at that. Exactly. Yep.
You know, for example, we've been working with Power, the Power guys and NVIDIA, for deep learning, right? That sort of stuff just clicks if you're in the community long enough. If you have the vantage point of the enterprise long enough, it feels like the two of them click. That's, frankly, my job. Yeah, great. And you've got to look at the landscape, the waves that are coming in. Exactly. So I got to ask you, the big waves are coming and you're seeing people starting to get hip to the couple of key things that they've got to get their hands on. They need to have the big surfboards, metaphorically speaking. They've got to have some good products. There's a big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, AI washing, people can see right through that. All right, that's clear. But AI is great. So, you know, we all cheer for AI. But the reality, as everyone knows, is that it's pretty much BS except for some core machine learning. It's on the front edge of innovation. That's cool. But value: I've got to integrate and operationalize my data. So that's the big wave that's coming. Comment on the community piece, because enterprises now are realizing, as open source becomes the dominant source of value for them, that they are really going to the next level. It used to be just the emerging enterprises that knew open source. Their guys would volunteer, and they might not go deeper in the community. But now more people in the enterprises are in open source communities, they're recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned. What is the best practice, from your standpoint, on philosophy? How to build into the community? How to build a community model? Yeah, I mean, at the end of the day, my best advice is to say, look, the community is defined by the people who contribute. So you get a voice if you contribute.
That's the fundamental truth. Which means you have to get your legal policies and so on to a point where you can actually start to let your employees contribute. That kicks off a flywheel where you can then go recruit the best talent, because the best talent wants to stand out. Like, GitHub is a resume now. It's not a Word doc. If you don't allow them to build their resume, they're not going to come by. And it's just a fundamental truth. It's self-governing, it's reality. It's reality. Exactly, right? And we see that over and over again. And it's taken time, but as with all things, the flywheel has changed enough that now a whole other generation is coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. And 20 years ago, that wouldn't have happened because the Googles of the world weren't open sourced. And now, increasingly, the 21 years ago. The secret's out, open source works. I'm telling everybody, they know already, but it was just a matter of changing some of how HR works and how people are collaborating. And the policies around that, the legal policies around that, around contribution and so on. Arun, great to see you, congratulations. It's been fun to watch the Hortonworks journey, and I want to thank you and Rob Bearden for supporting theCUBE here at Big Data NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting theCUBE. We really appreciate it. Thanks for having me back. Thanks for helping. Shout out to Rob Bearden. Good luck, and CPO, it's a fun job. You don't have the pressure of, I get a lot of pressure. Thanks a lot.
All right, thanks. More CUBE coverage after this short break.