Live from New York, it's theCUBE. Covering theCUBE, New York City, 2018. Brought to you by SiliconANGLE Media and its ecosystem partners.
Okay, welcome back everyone. We're live in New York City for CUBE NYC, formerly Big Data NYC, now called CUBE NYC. The topic has moved beyond big data. It's about cloud, it's about data, it's also about potentially blockchain in the future. I'm John Furrier, with Dave Vellante. We're happy to have a special guest here, the co-founder and chief product officer of Hortonworks. He's been in the ecosystem from the beginning, at Yahoo, and has been on theCUBE many times, but great to see you, thanks for coming on.
Likewise, thanks for having me.
Super smart to have you on here because a lot of people have been squinting through the noise of the marketplace. You guys have been on this Data Plane idea for a few years now. So you guys launched with Hadoop. Cloudera, they were first, you came in from Yahoo, became second, two big players. Evolved it quickly, you guys saw early on that this was bigger than Hadoop. And now, all the conversations are on what you guys were talking about three years ago. Give us the update, what's the product update? Obviously, hybrid's a big part of that. What's the story?
Great point. We started off being the Hadoop company. Rob, our CEO, who was here on theCUBE a couple of hours ago, called it sort of the phase one of the company, right, we were the Hadoop company. We quickly realized that we had to help enterprises manage the entire lifecycle of data, all the way from the edge to the data center to the cloud, and in between, right? Which is why we did the acquisition of Onyara, as we've been talking about, which kind of became the basis of our Hortonworks DataFlow product. And then as we went through that phase of the journey, it was quickly obvious to us that enterprises had to manage data and applications in a hybrid manner, right?
Which is both on-prem and public cloud, and increasingly edge, which is really where we spend a lot of time these days, with IoT and everything from autonomous cars to video monitoring, all these aspects coming in. Which is why we wanted to get to the Data Plane architecture, which allows you to get to a consistent security and governance model. There's a lot of, you know, I'll call it FUD out there about cloud being insecure and so on. I don't think there's anything inherently insecure about the cloud. The issue that we see is lack of skills. You know, enterprises know how to manage the data on-prem, they know how to do LDAP groups and, you know, Kerberos and AD and what have you. They just don't have the skill sets yet to be able to do it on the public cloud, which leads to mistakes occasionally, and data breaches and so on. So we recognized it really early, and part of Data Plane was to get that consistent security and governance model. So you don't have to worry about, you know, how you set up IAM roles on Amazon versus LDAP on-prem versus something else in Google.
It's operating consistent.
It's operating, exactly. I'm gonna talk about this in a fast. So, you know, Data Plane was that journey, and this week at Charlotte what we announced was that we wanted to take that a step further. We've been able to allow enterprises to manage this hybrid architecture, on-prem, multiple public clouds, and the edge, in a connected manner. The issue we saw pretty early on, and it's something we've been working on for a long while, is that we've been able to connect the architectures, but Hadoop, when it started, was more of an on-premise architecture, right? And I was there in 2005 and '06 when it started. Hadoop was born in a world where we had gigabit Ethernet, right? And that was up to the rack, and from the rack on we had only eight gigs up to a rack.
So if you have, you know, a 2,000-node cluster, you're dealing with eight gigs of connection, a huge bottleneck. Fast forward to today, you know, you have at least 10 if not 100 gigabits moving to 100, you know, a 10-bit architecture from a networking standpoint. And then what's happening is, in that world, you've had the opportunity to rethink some of the assumptions we had in Hadoop. And then, you know, the good news is that when cloud came along, cloud already had decoupled storage and compute architectures. And as we've helped customers navigate the two worlds with Data Plane, it's been a journey that's been reasonably successful. I think we have an opportunity to, you know, provide identical, consistent architectures both on-prem and in the cloud. So it's almost like we took Hadoop and adapted it to cloud. I think we can adapt the cloud architecture back on-prem to have consistent architectures.
So talk about the cloud-native architectures. You have a post that just got published. Cloud-native architecture for big data in the data center. No, the cloud-native architecture to big data in the data center. That's hybrid. Explain the hybrid model. How do you define that?
Yeah, like I said, for us it's really important to be able to, you know, have consistent architectures, consistent security, consistent governance, a consistent way to manage data, and a consistent way to actually develop and port applications, right? So portability for data is important, which is why having security and governance consistently is key. And then portability for the applications themselves is important, which is why we're so excited to be, you know, kind of first to embrace the whole containerized ecosystem initiative, right?
And, you know, we've announced the Open Hybrid Architecture initiative, which is about decoupling storage and compute, and then leveraging containers for all the big data apps, for the entire ecosystem. And this is where, again, we're really excited to be working with both IBM and Red Hat, especially Red Hat given their investments in Kubernetes and OpenShift. We see that much like you have S3 and EC2, S3 for storage, EC2 for compute, and same thing with ADLS and Azure compute, you'll actually have the next-gen HDFS and Kubernetes.
So is this a massive architectural rewrite, or is it more sort of management around?
Great question. Part of it is evolution of the architecture. You know, whether it's Spark or Kafka or Hive or any of these open source projects, we need to do some evolution in the architecture to make them work in the containerized world, right? So we're containerizing every one of the 28 animals, 30 animals in the zoo, right? So that's a lot of work. We're well suited to do it. We've done this in the past. And along with that, to your point, it's not enough just to have that architecture. You need to have a consistent fabric to be able to manage and operate it, which is really where Data Plane comes in again. That was really the point of Data Plane all along, right? This is like a multi-year roadmap. When we sit down, we're thinking about what we'll do in 2022 and '23, but we really have to execute on a multi-year roadmap. And Data Plane was a linchpin. It was just like the sharp edge of the sword, right? It was the tip of the spear. But really, the idea was always that we have to get Data Plane in to get that hybrid product out there. And then we can get to an iteration of Data Plane which would work with the next iteration of the big data ecosystem itself.
You see Kubernetes and things like Kubernetes. You've got Istio, these service meshes up the stack.
Absolutely.
They're going to play a pretty instrumental role in orchestrating workloads and providing new stateless and stateful applications with data. So now you get more data being generated there.
Exactly.
So this is a new dynamic. It sounds like that's a fit for what you guys are doing.
Which is something we've seen for a while now, right? Containers are something we've tracked for a long time. Really excited to see Docker, and particularly now Red Hat, all the work they're doing with Red Hat containers, getting to security and so on. It's sort of the maturing of that ecosystem. And now there's the ability to, you know, build and port applications. And the really cool part for me is, we'll definitely see Kubernetes and, you know, OpenShift and so on on-prem. But even if you look at the cloud, the really nice part is each of the cloud providers themselves provides a Kubernetes service, whether it's GKE on Google or Fargate on Amazon or, you know, AKS on Microsoft. We'll be able to take identical architectures and leverage them. You know, when we containerize Hive or Kafka or Spark, we'll be able to do this on Kubernetes on-prem with OpenShift, and then with OpenShift Online, which is available in the public cloud, but also GKE and Fargate and AKS.
What's interesting about the Red Hat relationship is, and I think you guys were really smart to do this, by partnering with Red Hat, customers can run their workloads, analytical workloads. Exactly. In the same production environment as Red Hat is in. But with kind of differentiation, if you will. Exactly. With Data Plane. And it's just a wonderful thing there. So, again, good move there. Now, around the ecosystem, who else are you partnering with? What do you see out there? Who's in your world that's important?
You know, again, our friends at IBM, we've had a long relationship with them.
And, you know, we're doing a lot of work with IBM to integrate Data Plane and also ICPD, which is IBM Cloud Private for Data, which brings along all of the IBM ecosystem, whether it's DB2 or IGC, Information Governance Catalog, all that kind of work, back into this world. And what we also believe this will give a fillip to is the whole continued standardization on security and governance, right? So, you guys remember the ODPi, it caused a bit of a flutter a few years ago.
Just a little bit.
Yeah, exactly. You kind of know how that turned out. But what we did was, we've kind of said, ODPi, you know, was started based on the Hadoop distributions, right? Now, ODPi has turned to be more about metadata and governance. So we're collaborating with IBM on ODPi, more around metadata and governance, because, again, we see that as being very critical in the sort of multi-cloud, on-prem, edge kind of world.
Well, the narrative was always, well, already needed, but it's clear that these three companies have succeeded dramatically. I mean, you look at your financials, you know, there have been public statements made about IBM's contribution to seven-figure deals for you guys. We had Red Hat on, and you guys are sort of birds of a feather, so.
Exactly.
It certainly worked for you three, which presumably means it confers value to your customers.
Which is really important, right? From a customer's standpoint, something we really focus on is the fact that the benefit of the bargain for the customers is that now they understand that their key vendor partners, such as us and IBM and Red Hat, have a shared roadmap. So now they can be much more sure about the fact that they can go to containers and Kubernetes and so on, because all of the tools they depend on, and all the partners they depend on, are working together.
So they can place bets.
They can place bets, exactly.
And the more important thing is they can place longer-term bets, not a quarter bet, right? You know, we hear about customers talking about building their next-gen data centers with Kubernetes in mind, right?
They have to.
They have to, right? And it's more than just building machines up, because what happens is with this world, we talk about things like networking. The way you do networking in this world with Kubernetes is different than what you did before. So now they have to place longer-term bets, and they can do this now with the guarantee that the three of us will work together to deliver on the architecture.
Well, it's great to have you on theCUBE. Great to see you. Final question for you. You guys have a good long plan, which is very cool. Short-term, customers are realizing, you know, okay, the setup phase is over. Now they're in usage mode. So the data's got to deliver value. So there's real pressure for ROI. You know, we would give people a little bit of a pass earlier on because, you know, they had to get everything up, set up the clusters, set up the data lakes, do all this stuff, get it all operationalized. But now, with AI and machine learning front and center, that's a signal that people want to start putting this to work. What have you seen customers gravitate to from the product side? Where are they going? Is it the streaming? Is it the Kafka? What products are they gravitating to?
Yeah, definitely. I mean, in my role, I look at these in terms of use cases, right? We're certainly seeing a continued push towards the real-time analytics space, which is why we placed a longer-term bet on HDF and Kafka and NiFi and so on. But what's been really heartening, back to your sentiment, is that we're seeing a lot of push right now on security and governance, right?
Which is why, for GDPR, we introduced a bunch of capabilities in Data Plane with DSS, and James Kobielus, you know, wrote about this earlier in the year. We're seeing customers really pushing us for key aspects like GDPR, right? This is a reflection, for me, of the maturing of the ecosystem. It means that it's no longer something on the side that you play with. The whole ecosystem is now more of a system of record and sort of a system of augmentation. So that's really heartening. It also brings a sharper focus and more sort of responsibility on our shoulders, given the place we're in.
Well, congratulations. You guys have a stock price at a 52-week high. Congratulations. Those things take care of themselves. Get good products. Stock price takes care of itself.
Okay, theCUBE coverage here in New York City. I'm John Furrier, with Dave Vellante. Stay with us for more live coverage. All things data happening here in New York City. We'll be right back after this short break.