Live from New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. Welcome back to the Big Apple, everybody. This is theCUBE, the worldwide leader in live tech coverage. We're here at Strata + Hadoop World. We run Big Data NYC in conjunction with that. John Furrier this week was in Orlando covering Splunk .conf. Scott Gnau is here, he's the CTO of Hortonworks, longtime CUBE alum. It's good to see you again. Thanks for having me back. You're welcome. I was saying, I was seeing you in the audience last week at Edge and now we're in New York. We got to come to New York to see each other, it's good. And so, you know, lots going on. We had Sean yesterday kind of give us an update on the strategy, but what's going on here for you at the show? You know, we've got a lot of exciting things going on. And you mentioned last week, we participated in IBM's Edge conference, announced our partnership with the Power organization. You were like the star of the show. I mean, you were up on stage a couple times, it was good. Yeah, you know, it's really great. You know, being able to give more customers more choice on the technology they deploy, how they deploy it, being able to bring Hortonworks Data Platform to the Power line of servers just creates more options as our customers, our joint customers, really create their data fabric, right? And you know, when I've been here before, we've talked about the value of the ecosystem, right? In today's world of big data, there's so much data flying around, there's so much change and agility. It's really about being able to fit into a broad ecosystem and not having monolithic converged systems, right? And that was just another example of some of the things we're doing.
I'll be speaking later today, talking a little bit about some of our latest product releases and feature functionality, again, giving more choice to customers on how they deploy their data fabric, how they combine different kinds of technology to solve their business problems. We also are bringing that technology, of course, to cloud, so working with Microsoft and HDInsight, making the technology more consumable by persona and allowing data science clusters to be spun up in the cloud, whether they're ephemeral or long running. This is really exciting, not just because of the technology, but really... Before you say, I'm sorry, when you say making it available by persona, could you kind of clarify that and then go into why it's exciting? Sure, so you think about all of the technology that's out there in the open-source stack and how we're trying to bring all of that technology together and create choice. That can also be a very daunting thing for someone who's, gee, I want to go run some machine learning against a set of data to see if I have any interesting correlations, right? That user has a persona, wants to do machine learning, a certain discrete set of technologies, doesn't want to know about the 37 other things that are available, just wants to get the job done. And so we make it less daunting by saying, okay, spin up a machine learning data science kind of cluster in the cloud. So the persona thing is really important as we're trying to get very discrete solutions, time to market and agility, okay? And so it's a packaging of all the broader stuff that we offer, masking the other parts of the stack, too. But it matches the persona of what's that use case that I want to go do. And then being able to execute that in the cloud exponentially makes it even more valuable because you think about ephemeral and cloud and all the advantages of cloud, that's really great.
The key thing is there are applications that can be tested against a whole sample of data, very economically. This is stuff that couldn't have happened before. If you think about, yeah, I've got to go to a capital committee, I've got to buy a bunch of hardware, I've got to spend millions and millions of dollars, and I don't know if this thing's going to work. You'd never do it. If you can spin it up in cloud, load the data, go find it, yep, it's great, I'm going to keep it, I'm going to go do something with it, or you know what, there wasn't anything really there. I haven't spent a lot of money, shut it down, right? And so there's a whole new class of applications that are spawning because of this capability, right? And it's a combination of the open data platform, some of the open source technology, some of the newer algorithms, all that innovation. And then the ability to go deploy it in a very simple and seamless way. So summarize the cloud strategy, Hortonworks' cloud strategy. It sounds like you want to make technology available. We were talking this morning about how technology needs to be plentiful for revolution to occur, and technology's certainly plentiful today. But from a cloud standpoint, you're making the technology available in the cloud for your customers, which gives them choice: they can still do on-prem, they can do cloud. But summarize the strategy for us. You know, the easiest way to think about it is there are two things that we want to go do for our customers. One is we want to make their experience with Hortonworks Data Platform and Hortonworks DataFlow the same regardless of how they deploy. If they deploy on-prem, if they deploy in the cloud, we want that experience, the applications, the integration, all that stuff to be consistent. So that, again, there's flexible choice of how it gets deployed.
The second thing, and we think that this will be happening, we're starting to see it now, but I think it'll be kind of mainstream in the future, is that the entire move to cloud is happening, but I don't think that that implies that customers will move from on-prem to a single cloud instance. I believe that they will have data in multiple cloud instances, maybe with the same provider, maybe with different providers, and they may have some data still on-prem. Why is that? Think about devices on the edge and the collection of that device information. That device information is probably going to live in a cloud. Companies are going to buy that from the provider that they bought the devices from. It's going to live in the cloud that that provider put it in. May not be their cloud. So data having gravity, you may not want to move all the data from there to somewhere else, causing latency, complexity, more expense. You want to play it where it lies. So we believe that companies will have, ultimately, multiple cloud implementations and footprints, and so we want to actually build that common set of shared services so that they can have common security, common governance, and common management across the entire ecosystem that they're building out, whether it be on-prem, in the cloud, or in multiple cloud instances. We call that the data plane, being able to play the data where the data lies. And when you talk about, and we talk about, and we know IoT is coming, the whole notion of getting value out of Internet of Things devices is being able to push processing to the edge. Why do you need to push it to the edge? A, that's the only way it'll scale, and B, you need to push it to the edge so that it happens in a timely fashion.
And so the only way to really enable that is to enable this notion of a data plane with that shared service so that you can actually have a discrete view on a single pane of glass of all your data assets and applications and how they're running, and be able to push and move processing around and have choice in the decision that you make. And that data plane is a collection of open source technologies that you and your open source committers, partners, whatever, develop, that then becomes available wherever, on-prem or in the cloud. And your cloud partners, let's say it's a Microsoft or other cloud partners, will you then deliver that through their marketplace, or is it a code element? Yeah, so obviously, different cloud providers have different business models and different ways to go to market. Our goal is to make it the same look and feel and touch and feel and to conform within each of those business models. So yeah, they'll be able to go natively to a cloud provider and spin up either a persona or the whole breadth of infrastructure as a service. Again, whatever look and feel they want, we'll allow that to happen. Or with these shared services, we'll also create that data plane where they can actually go through our data plane and have that single pane of glass and spin up cloud instances however they like. Let's talk roadmap. What's new, what's coming? We've had a lot of releases this year, and one of the neat things about the model we work in, and I think the fundamental shift to open source and open community development, is this consistent pace of agility. As technology typically matures, it slows down and gets a little bit harder to do upgrades and so on, but the open community driving that agility is allowing us to keep at that pace that we've been at for a very long time. So we just had two major releases, of course: a new release of our data platform that includes integration of security and governance, the integration of Apache Atlas and Apache Ranger.
Why is that important? Well, security's important, right? Making sure that you encrypt PII data, that you're able to secure the perimeter of your clusters, that you're able to understand who has access to what data. That's all really important. Core security as part of the Hadoop 2.0 versioning, that's all very strong. We've taken it to the next level and integrated our metadata tagging, Apache Atlas, with the security, so now you can do tag-based security, role-based, tag-based security. Why is that important? Think about multinational corporations, where I travel around the world. If I'm sitting in the United States, I have access to certain pieces of data. If I happen to find myself traveling and in Germany, I may have different access based on where I'm logging in from, based on, frankly, the rules of different countries. So you're able to tie that security to other metadata tagging like where I've logged in from: location-based, context-based, role-based. And one of the big opportunities we have is if I have access to this data and that data and there's no PII data in either source, but when I join that data together, I could potentially infer PII data. You're able to do tag-based metadata security that says you cannot have access to that data joined. You can have access to it separately, but if you join it, you can't see it. This is a really big deal. We talk about deploying data lakes, the value of all the data that's out there, but the importance of doing governance, having this core technology in there, is a big deal. The other thing, and there are millions of things that are in the stack, we've added interactive query with Hive 2.0 in tech preview, being able to do interactive query, sub-second query response.
Again, rather than creating a new project or some other complexity for our customers to understand, we actually work with the community to improve Hive's performance, and this is something that a lot of customers have downloaded and are using in tech preview. We've created a new release of our Hortonworks DataFlow, 2.0, where we've expanded the footprint of our ability to manage data in motion. So now we can manage very complex data flows, bi-directional point-to-point data flows. This is a huge IoT use case, as sensors need to talk to each other. And not only that: with 2.0, we've actually now created edge agents so we can move further out to the edge, lightweight, either in C++ or a lightweight Java instance, very small footprint, so we can actually push out the security, the governance, as well as the provenance of data collection and data movement. So it's a really exciting time, I think, for the community and certainly for Hortonworks and what we're delivering back for our customers. What's going on? All right, we're out of time. We have to leave it there. Scott, great to see you as always. Thanks for coming on theCUBE. Thank you very much. All right, keep right there, everybody. We'll be back with our next guest right after this. This is theCUBE, we're live from New York City. Right back.