Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017.

Okay, welcome back everyone. We're here live in Silicon Valley. This is theCUBE's coverage of Big Data Silicon Valley, our event in conjunction with O'Reilly Strata Hadoop. Of course we have our Big Data NYC event, and we have our special pop-up events in New York and Silicon Valley. This is our Silicon Valley version. I'm John Furrier with my co-host Jeff Frick, and our next guest is Scott Gnau, the CTO of Hortonworks. Great to have you on. Good to see you again.

Thanks for having me.

You guys have an event coming up in Munich, so I know there's a slew of new announcements coming up with Hortonworks in April, next month, in Munich for your EU event, and you're going to be holding a little bit of that back. But some interesting news this morning. We had Wei on yesterday from the Microsoft Azure team, HDInsight. That's flowering nicely. You've got a good bet there. But the question has always been, at least from people in the industry, and we've been questioning you guys on it: hey, where's your cloud strategy? Because as a distro, you guys have been very successful with your always-open approach. The Microsoft Azure guy was basically like, that's why we go with Hortonworks. You guys are pure open source, committed to that from day one, never wavered. But the question is: cloud first, AI, machine learning. This is a sweet spot for IoT. You start to see the collision between cloud and data, and in the intersection, that is deep learning, IoT. A lot of amazing new stuff is going to be really popping out of this. Your thoughts, and your cloud strategy?

Yeah, I mean, obviously we see cloud as an enabler for these use cases, right? In many instances, the use cases can be ephemeral. They may not be tied immediately to an ROI. So are you going to go to the capital committee and all this kind of stuff, versus, let me go prove some value very quickly?
So it's one of the key enablers, kind of a core ingredient. And when we say cloud first, we really mean it. It's something where the solutions kind of work together. At the same time, cloud becomes important. Our cloud strategy, and I think we've talked about this in many different venues, is really twofold. One is we want to give a common experience to our customers across whatever footprint they choose, whether they roll their own, they do it on-prem, or they do it in public cloud, and they have a choice of different public cloud vendors. And we want to give them a similar experience, a good experience that is enterprise grade, a platform-level experience. So not a point solution, kind of one function and then get rid of it, but really being able to extend the platform. What I mean by that, of course, is being able to have common security, common governance, common operational management, and being able to have a blueprint of the footprint so that there's compatibility for the applications that get written. And those applications can move as customers decide to change their mind about where their platform is hosting the data. So our goal really is to give them a great and common experience across all of those footprints, number one. And then number two, to offer a lot of choices across all of those domains as well, whether it be, hey, I want to do infrastructure as a service and I know what I want, on one end of the spectrum, to, I'm not sure exactly what I want, but I want to spin up a data science cluster really quickly: boom, here's a platform-as-a-service offer that runs and is available, is very easy to consume, comes pre-configured, and kind of everywhere in between.

By the way, yesterday, Wei was pointing out 99.99 SLAs on some of the stuff coming out.

Yeah, the SLAs are amazing.
And obviously, in the platform-as-a-service space, you also get the benefit of other cloud services that can plug in, that wouldn't necessarily be something you'd expect to be typical of a core Hadoop kind of platform. So yeah, getting the SLAs, getting disaster recovery, getting all of the things that the cloud providers can provide behind the scenes is some additional upside, obviously, as well, in those deployment options. So having that common look and feel, making it easy, making it frictionless are all kind of the core components of our strategy, and we saw a lot of success with that coming out of year-end last year. We see rapid customer adoption, we see rapid customer success, and frankly, I would say 99.9% of customers that I talk to are hybrid, where they have a foot in on-prem, and they have a foot in cloud, and they may have a foot in multiple clouds. And I think that's indicative of what's going on in the world. Think about the gravity of data, right? Data movement is expensive, right? And analytics and multi-core chipsets give us the ability to process and crunch numbers at unprecedented rates. But movement of data is actually kind of hard. There's latency, and it can be expensive. And a lot of data in the future, IoT data, machine data, is going to be created and live its entire life cycle in the cloud. And so the notion of being able to support hybrid with a common look and feel, I think, very strategically positions us to help our customers be successful when they start actually dealing with data that lives its entire life cycle outside the four walls of the data center.

You guys really did a good job, I thought, on having that clean positioning of data at rest, but you also had the data in motion, which I think was ahead of its time. You guys really nailed that. And you also had the IoT edge in mind. We talked, I think, two years ago, and this was really not on everyone's radar, but you guys saw that.
So you've just made some good bets on HDInsight. We talked about that yesterday with Wei on here from Microsoft. But edge analytics and data in motion are very key right now, because that batch and streaming world is coming together and IoT is flooding it with all this kind of data. And we've seen this success in the clouds, where analytics has been super successful, powered by the cloud. So I've got to ask you, with Microsoft as your preferred cloud provider, what's the current status for customers who have data in motion? Specifically IoT, too. That's kind of the common question we're getting, not necessarily the Microsoft question, but: okay, I've got edge coming in strong, and I'm going to run a lot, certainly hybrid, in a multi-cloud world, but I want to use the cloud for most of the analytics, so how do I deal with the edge?

Wow. There's a lot there.

You have 10 seconds, go.

Yeah, thank you.

Take your favorite piece.

But I mean, you have Microsoft as your premier cloud, and you also have an Amazon relationship with the marketplace and whatnot, but you guys have got a great relationship with Microsoft.

Yeah. So I think it boils down to kind of a bigger macro thing, and hopefully I'll kind of peel into some specifics. I think, number one, we as an industry kind of shortchanged ourselves talking about Hadoop, Hadoop, Hadoop, Hadoop, Hadoop. I think it's bigger than Hadoop. Not different than, but certainly bigger, right? And this is where we started with the whole connected platforms notion, indicating that traditional Hadoop comes from traditional thinking about data at rest. So I've got some data, I've stored it, I want to run some analytics, and I want to be able to scale it and all that kind of stuff. Really good stuff, but only part of the issue, right? And the other part of the issue is data that's moving, data that's being created outside of the four walls of the data center, data that's coming from devices. How do I manage and move and handle all of that?
And of course there have been different hype cycles on streaming and streaming analytics and data flow and all those things. What we wanted to do is take a very protracted look at the problem set of the future, and we said, look, it's really about the entire life cycle of data, from inception to demise of the data, or data being deleted, right? Which very infrequently happens these days.

Or cold storage.

Yeah, cold storage. Tiering. You know, it's created at the edge, it moves through, it moves to different places, it's landed, it's analyzed, there are models built, but those models get deployed back out to the edge. That entire problem set is a problem set that I think we, you know, certainly we at Hortonworks, are looking to address with the solutions. That actually is accelerated by the notion of multiple cloud footprints. Because you think about, you know, a customer that may have multiple cloud footprints and is kind of trying to tie the data together. It creates a unique opportunity, and I think there's a reversal in the way people need to think about the future of compute. Having been around for a little bit of time, it's always been: let me bring all the data together to the applications, have the applications run, and then I'll send answers back. That is impossible in this new world order, whether it be the cloud or the fog or any of the things in between, or the data center. Data are going to be distributed, and data movement will become kind of the expensive thing. So it'll be very important to be able to have applications that are deployable across a grid, and applications that move to the data instead of data moving to the application. Or at least to have a choice and be able to be selective. So I believe that ultimately scalability, five years from now, 10 years from now, is not going to be about how many exabytes I have in my cloud instance. That'll be part of it.
It'll be about how many edge devices I can have computing and analyzing simultaneously, and coordinating this information with each other, to optimize customer experience, to optimize the way an autonomous car drives, or anywhere kind of in between.

It's just totally radical, but it's also innovative. You mentioned the cost of moving data will be the issue. So that's going to change the architecture for the edge. What are you seeing with customers? Because we're seeing a lot of people taking a protracted view, like you were talking about, and looking at the architecture specifically. Okay, there's some pressure, but there's no real gun to the head yet, but there's certainly pressure to do architectural thinking around edge and some of the things you mentioned. Patterns, things you could share, anecdotal stories, customer references?

The common thing is that customers go, yep, that's going to be interesting. It's not hitting me right now, but I know it's going to be important, so how can I ease into it, kind of belt and suspenders? How can I prove that this is going to work, and all of that? And so we're certainly seeing a lot of interest in that, and what's interesting is we're able to apply some of that really futuristic IoT technology in Hortonworks DataFlow, which includes NiFi and MiNiFi out to the edge, to kind of traditional problems, like: let me get the data from the branches into the central office and have that round-trip communication to a banker who's talking to a customer and has the benefit of all the analytics at home, but I can guarantee that round trip of data and analytics.
Things that we thought were solved before can be solved very easily and efficiently with this technology, which is then also extensible even further out to the edge. And so in many instances I've been surprised by customer adoption, where they're saying, yeah, I get that, but gee, this helps me solve a problem that I've had for the last 20 years, and it's very easy, and it sets me up on the right architectural course, so when I start to add in those edge devices, I know exactly how I'm going to go do it. So it's actually been a really good conversation that's very pragmatic, with immediate ROI, but again, kind of positioning people for the future that they know is coming. And in doing that, by the way, we're also able to prove the security, right? Security is a big issue that everyone's talking about, cyber security and everything, and that's typically security about my data center, where I've got this huge fence around it and it's very controlled. Think about edge devices that are now outside that fence. Security and privacy and provenance become really, really interesting in that world, and so it's been gratifying to be able to actually go prove that technology today and, again, put people on that architectural course that positions them to be able to go out further to the edge as their business demands.

And it's such great validation when they come back to you with a different solution based on what you just proposed, because that means they really start to understand, they really start to see how it can provide value to them.
Absolutely, absolutely. So that is all happening, and again, like I said, I think the notion of the bigger problem set, where it's not just storing data and analyzing data, but how do I have portable applications, and portable applications that move further and further out to the edge, is going to be the differentiator of the future successful deployments out there. Because those deployments, and folks who are able to adopt that kind of technology, will have a time-to-market advantage, and they'll have a latency advantage in terms of interaction with a customer, not waiting for that round trip, and really being able to push out customized, tailored interactions, whether it be, again, driving your car and stopping on time, which is kind of important, or getting a coupon when you're walking past the store, and anywhere in between.

It's good. You guys are certainly well positioned for being flexible, and being in open source has been a great advantage. So I've got to ask you the final question for the folks watching. I'm sure you guys answer this for investors and customers and whatnot. A lot has changed in the past five years, and with what's happening right now, you just illustrated the scenario: the edge is very robust, dynamic, changing, but yet a value opportunity for businesses. What's the biggest thing that's changing right now in the Hortonworks view of the world that's notable, that you think is worth highlighting to people watching who are your customers, investors, or people in the industry?

Yeah, I think you brought up a good point: the whole notion of open, and the whole groundswell around open source, open community development, as a new paradigm for delivering software. So I talked a little bit about a new paradigm of the gravity of data and sensors and this new problem set that we've got to go solve. That's kind of one piece of the storm.
The other piece of the storm is the adoption and the wave of open community collaboration of developers, versus integrated, siloed stacks of software, and that's manifesting itself kind of in two places, and obviously I think we're an example of helping to create that. Open collaboration means quicker time to market and more innovation, and accelerated innovation, in an increasingly complex world, so that's kind of one requirement slash advantage of being in the open world. And I think the other thing that's happening is kind of the generation of the workforce, right? When I think about when I got my first job, I typed a resume with a typewriter, dating myself.

Whiteout.

Yeah, with whiteout. It was a typewriter. You know, a resume today is basically name and GitHub address, right? Here's my body of work, and it's out there for everybody to see, and that's the mentality.

And they have their CUBE videos up there as well, of course.

Well, yeah, I'm sure. So it's kind of like that shift to, this is now the new paradigm for software delivery.

And I think it's a very interesting way. This is important. Joking aside, Scott, I mean, not the CUBE interview, but you're seeing it in media and entertainment. No, we're seeing people put CUBE interviews on their LinkedIn. So this notion of collaboration in a software engineering mindset. You go back to when we grew up in software engineering, and then it went to open source. Now GitHub is essentially a social network for your body of work. You're starting to see the software development open source concepts be applied to data engineering; data science, which is still early days; media, media creation, whatnot. So I think that's a really key point, and the data science tools are still in their infancy.

And I think, you know, open, and by the way, I'm not here to suggest that everything will be open, but I think a majority, and a majority of the problems that we're solving, will be collaborative.
It will be ecosystem driven, and where there is an extremely large market, open will be the most efficient way to address it. And certainly no one's arguing that data and big data is not a large market.

Yeah, but you guys are all in the cloud now. You've got Microsoft. Any other updates that you think are worth sharing for folks?

You've got to come back and see us in Munich, man.

All right, and we'll be there. theCUBE will be there in Munich in April. We have Hortonworks coverage going on at DataWorks. The conference is now called DataWorks, in Munich. This is theCUBE, here with Scott Gnau, the CTO of Hortonworks. I'm John Furrier with Jeff Frick. More coverage from Big Data SV, in conjunction with Strata Hadoop, after this short break.