Live from Dublin, Ireland, it's theCUBE. Covering Hadoop Summit Europe 2016. Brought to you by Hortonworks. Now your host, John Furrier. Welcome back, we are here live in Dublin, Ireland for theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier. I've got two guests who are going to do a drill-down on all the Hortonworks news and product announcements: Matt Morgan, Vice President of Product and Alliance Marketing, is joining us with Wei Wang, who's the Senior Director of Global Marketing for Hortonworks. Welcome back to theCUBE, great to see you again. Yes, for sure. We had a great update at BigDataSV. It was part of Big Data Week with Strata + Hadoop World and our event, BigDataSV. And we talked a lot about data in motion and data at rest. It was a great interview. But you were holding back some data and insights because you knew this event was right around the corner. Yeah? So Matt, Wei, welcome. We had all the execs on talking about the big-picture strategy and some of the announcements. Let's go into some detail. What are some of the announcements, Matt? What are you guys announcing here? Let's dive in. Yeah, well, the first point to make on the announcements is to look at how far we have come in the conversation at Hadoop Summit. Today, we're having a conversation that is not limited to just Apache Hadoop. It's about what Apache Hadoop can do and, more importantly, how it connects with all the other platforms that manage data in motion. There's not a single customer we talk to now that doesn't have a complete, connected data platforms narrative in its future strategy. They're trying to solve the data problem, whether the data's in motion or the data's at rest. And they need to be able to do this in real time through a variety of different applications that maximize the value of that information.
So the announcements that we made this week are paying off that strategy by addressing these customer demands. What specifically addresses that? What are the details of the announcements? Sure, so there are three distinct categories of announcements we made this week. The first was around the data at rest platform. The second was around security. And the third was some really cool partnerships. So for the data at rest platform, Wei, you could probably speak best to the capabilities. You could weigh in. I can weigh in, for sure. And I'm not going to hold back this time, right? Come on, weigh in. Yeah, I will weigh in, and I bought you a beer last night. So let's get the news on. We are announcing two technical previews, and we're actually announcing two GA products. So let me go through them for you. The first one is the technical preview of Atlas and Ranger integration. For everybody, security is on every CIO's mind, right? In fact, I met a few customers here, and the major topic during our conversations was: how do you help me secure my cluster? And now, with Atlas, dynamic security and tagging truly fulfill the requirements and needs of these CIOs. So talk about Atlas. That's something that has come up over and over again, especially internationally. There's been demand for that out here. Is it geography-based? I mean, the project? So actually it's a project that we incubated last year. Within seven months, we had not only vendors but also our partners participating in the effort to solve the common challenge of how to do data governance in a big data world. So that's the birth of Atlas as a project. So what's the state of the data at rest situation? I mean, it's pretty stable. Data in motion kind of complicates things with IoT. Yeah, so the Atlas and Ranger integration, the dynamic security, truly helps to solidify and harden the data at rest platform, which we have solved.
So that's the first piece of news. The second is actually Zeppelin, right? The Spark and Zeppelin notebook. Again, security. The news with this technical preview is that we have hardened it. Now you can secure Zeppelin with authentication and other controls, so that you are comfortable running it in your mission-critical application environment. So that's the second technical preview. And now I think everybody here, including Sean, whom we heard yesterday, is talking about cloud. Cloud is a big, big initiative in many people's minds. So we're doing two announcements. One is the GA release of Ambari, which eases operations for many people. The second is Cloudbreak, right? You were maybe here last year when we announced the acquisition of SequenceIQ, and that is Cloudbreak. What Cloudbreak does is truly pay off Hortonworks' cloud strategy. We feel that many CIOs and many corporations are now going to a hybrid cloud environment, right? Cloud, hybrid cloud, is going to be a reality. It's not going to be something down the road five years from now. Again, most of the companies I talked to at the show asked me, so what can you do to help me simplify my operations in the cloud? That's what we're doing. What we are doing in Cloudbreak allows people to truly ramp up, ramp down, and autoscale their cloud operations. Well, Dave and I were talking about this earlier, and Wikibon's research is very clear. They believe that the operating model of cloud in general will be independent of what you call it. It's an operating model, it's not a destination. And that's what Brian Gracely says as well. So it's an operating model: some percentage on public, a big percentage on hybrid, and some still on-prem. But at the end of the day, the operating model is what everyone wants, and they see Amazon as a reference point and reference architecture for costs.
So the goal for CIOs is an operating model that promotes agility, with new software development frameworks like agile and DevOps and other application development approaches, but also a cost model that runs efficiently for more profitable operations. So that's what they want. They don't want more costs and more complexity, and they want to have that speed and agility. So now this is a product marketing challenge. You're looking at the product marketing and saying, hey, I don't want conflicting goals with what the CIO wants, and they want simplicity. How do you guys look at that, and what's your answer to that? So you have to be context-sensitive with this conversation, right? IT organizations within the Fortune 500 have spent the last three decades building on-premise infrastructure. They've got data centers, they've got dedicated facilities, they've got racks of servers, they've got information that they physically must keep on-prem for a variety of regulatory reasons. So the question becomes how you deal with a hybrid environment. We're looking at a hybrid situation. That hybrid situation is easy to forecast: it's going to persist for the next five years, possibly the next decade, with public cloud being an additive element to the conversation. Now, having said that, it's not a trivial additive conversation. People want to bring public cloud resources in. We see it in every single account we service. We have line-of-business owners that are actually making the decision on what public cloud resources they bring in. So you could have the marketing team bring on Amazon. You could have a line of business bring on Azure, you could have Google Cloud, you could have all of the above. So what IT needs from their data infrastructure and their data architecture is a solution that can scale ubiquitously across all three, with a heterogeneous approach. So we're solving that problem with Cloudbreak.
And what we've seen is that the ask of us is to provide a very simple, repeatable solution that can automate that scaling, because they don't want more complexity. They don't want to actually go in and open an account and manage the setup of a cluster inside a public cloud every time a line of business wants to burst some compute or storage needs. They want that in an automated fashion, and that's what Cloudbreak does. And that lowers the cost, because I assume automation reduces the manual setup and/or labor involved. Yeah, so let's talk about costs for a moment. At the end of the day, people want to leverage public cloud because it gives them a superior cost curve. And the easiest way to justify that is from a burst point of view. So as I need more capacity to do some compute or storage, I can bring it online and ramp it down when I'm done, right? So yes, automating that reduces cost, but the ability to automate it also allows you to use it when you need to use it and turn it off when you don't, right? It kind of creates this elasticity between on-prem and the cloud and everything in between. Yeah, Dave and I were talking about our years of experience in the industry and the enterprise, and with these patterns, the movie plays over and over again. Total cost of ownership is a huge issue, because it's not about the point technologies or the platform per se, it's about what the solutions enable and the cost to get there. And Alan Gates, co-founder of Hortonworks, who was just on, was talking about that, saying we want to get to a point where it's not about the platform, it's about the right tooling for the job and the enablement that happens with the platform. That being said, total cost of ownership is the point CIOs want to make. How do you guys talk about that internally, because you've got new projects coming on, and you now have data in motion and at rest? Alan also said that when data has to move, that's the problem.
So data at the edge sits on the edge, or on the cloud, so this is actually a good framework for the cloud. Okay, so I've got three points on this. First off, the connected data platforms concept and category embraces this reality. You're spot on. There are going to be platforms that sit out at the edge. There are going to be platforms that sit in the data center. There'll be platforms that sit in the cloud. The connection between these platforms, the mobility of data between these platforms, will separate the people who can maximize the value of information from the folks who can't. So that's the overarching conversation. But on the cost side, there are three dimensions that Hortonworks focuses on. Number one, this transformation, unlike transformations in the past, is driven by the fact that open source is actually lowering the cost constraints. It is bringing down the barrier to entry in terms of overall cost. From a software development standpoint. From a software development standpoint. Not operating, maybe. Let's talk about operating. Some previous generations of providers in the Hadoop world have not focused on the enterprise-scale problem. And I totally agree with you that the operating side is actually the most important thing to solve. Now at Hortonworks, we've taken the at-scale challenge as an operating charter for our product strategy. We announced Spark at scale with you. We announced, with Wei here, cloud solutions at scale. We've talked about Hive at scale. We use the phrase "at scale" because we are embracing the operations, the governance, the security, the rollout tech to bring that cost vector down. And I'll talk about Spark for a moment. Everyone loves Spark, but no one wants to have the conversation about what it costs to put two petabytes of data live in Spark in my enterprise, with all the security, governance, and operations needs. And the reality is that the cost is too high.
And that's why we've taken Spark and incorporated it into our platform. We allow it to be part of Hadoop and to inherit all the security, ops, and governance capabilities. So I got a question from someone who direct messaged me. They didn't want to put it on Twitter live, but they said, John, can you ask the question about Hortonworks' Spark strategy? It doesn't seem clear to me. Where do they stand today? So clear up that question. Where does Spark stand today? Clarify that message. It is a great question. I appreciate them asking it. We have embraced Spark from the point of view of what it takes for an enterprise to go 100% ubiquitous, with every data scientist leveraging the power of Spark. Now from our point of view, Spark adds enormous capabilities around agile analytics. But getting it out to every data scientist requires us to bring the barrier to entry down. It requires us to bring down the cost vector of what it would take to put two petabytes of data in memory. It requires us to take the worry vector down, if you will. In other words, people are concerned about security, ops, and governance. That needs to be addressed. So we have taken those as our operating charter. We call the strategy Enterprise Spark at Scale, and we're addressing all of this. And oh, by the way, we're also innovating very rapidly in the Spark community. Did you know we were the first company to support Spark 1.6 as part of the platform? Within the same day, to add to that, right? When Spark 1.6 was announced and released in the community, within about eight hours, maybe shorter than that, we actually put Spark 1.6 out as a technical preview component for our customers to download. Spark caught a lot of people flat-footed. Even Cloudera, who supported Spark, has retooled some stuff. And a lot of people in the industry are saying that Hortonworks and Cloudera have to retool everything around Spark. Now, I've heard that, and I know you're saying no, no. But talk about that dynamic.
Did you guys, are you retooling around Spark? When you say you embrace Spark, does that mean you've embraced it from the beginning? You mentioned that same day, which would imply that. Restate that. So are you retooling, or are you embedding native Spark? This is a semantics conversation. Okay, when people say- This is what it's all about in the community. Right, when people say Hadoop using the legacy definition of Hadoop, there's almost a thought process that says Hadoop equals MapReduce. Stop there. Therefore, any other type of processing or access engine is not Hadoop. The reality is that years ago, Hortonworks had the vision to make Hadoop a platform. That's why they called the product Hortonworks Data Platform. This is why we did all that innovation around YARN and the data operating system. What's my point? YARN's a zipper. That's what Raghu from Microsoft said. It's half the zipper. The first half's HDFS, and YARN zips up the second half. Which basically becomes a stack. Exactly. So what does that allow us to do, right? It allows us to have a platform conversation where we can embrace Spark as one of many children that operate within that platform. And we can tune our support for it for the benefits it adds to an organization, which are around agile analytics and empowering these data scientists. It's why we did Zeppelin. So you're saying in the platform you don't have to retool for Spark. No, not at all. You just plug it in. That's exactly right. And I understand that people like to have narratives in the industry, but this particular conversation. Well, I mean, Cloudera has high and they had to retool around that for Spark. That's just their issue. And again, I don't want to get into the Cloudera thing, but my point is the customer out there, your customer, their customer. Your customer's customer is what they care about, right? So that's what they care about.
So when they look at something like HDFS and YARN, they want to say, okay, I've got to run Exadata, or not Exadata, I've got a 12c database. So can I run SQL on big data with Oracle, without Exadata, on Hortonworks? That's what they want. Is that what you guys provide for the Oracles of the world, to run a 12c database with Hortonworks? That's kind of the vision that you guys see, right? So that you eliminate those benefits, I mean, those barriers, yeah. For sure, right? Think about all the partners we have: we now have over 1,600 partners and the co-development effort. People always bring all kinds of... the people you see on the show floor, they are our partners. We don't see the partnership as just a resell agreement. We see the partnership as sitting down together and seeing how our platforms can work seamlessly together. So when our customers and your customers have the systems under one roof, in the same data center, you don't have to do the retooling yourselves. We're going to do the retooling ourselves to make it seamless. Oracle is one of many. I think you guys have a good opportunity with the platform and the new emerging pieces, because of what we're hearing. We heard from Raghu from Microsoft, who came on yesterday, a fantastic interview. He talked about how the zipper is half HDFS and the other half YARN, and that enables people to plug into it. And to me, that is the opportunity for these unknown new opportunities like IoT. Can you share some color on how that plays out? I mean, obviously Microsoft talks about at scale, Azure has huge scale, you know, and Amazon, where your customers might run some stuff, and Oracle's got scale. So all the big vendors that are coming into the ecosystem have customers that want scale. They need scale. And they want operating scale at a cost structure that's compatible with the dream of cloud. Absolutely. So first off, YARN is going to empower that.
To use your language, the data ocean concept is where literally all the apps sit on top of a central data store where they can access and process information. Without YARN that reality falls apart, because instead of having a data ocean, you have many sub data ponds, all with their own access tools. So I think that this is a very big advance for the entire platform. Now going to your question about the emerging tech, the data in motion streaming in, I think there's a little confusion about what exactly data in motion is. So I want to clear it up from our point of view. First off, there's data at rest, which is in the ocean, we get that. There's streaming data, which is data on its way to the ocean, analyzed en route. That's an important piece of the conversation. A lot of people stop there and say streaming, that's it, I'm streaming, I'm done. End of conversation, I've got it in motion, I've got it at rest. But what they're not talking about is the data that isn't on its way to the ocean. It's going from device to device. And the analytics that come off that are just as important to include in a complete intelligence picture. And sometimes that data wants to stay there. And it might always stay there, right? It doesn't have to come back and move. Think of your Apple Watch, right? Your Apple Watch talks to your phone. It's not going to another storehouse. It's going from your watch to your phone. And it's important data, because you're checking your heart rate, you're on the treadmill, you need to know if you should speed up or slow down. Good use case for cloud. Good use case. Very good use for cloud. You know, move the compute to where the data is, that's kind of the idea. So talking about the cloud, right? All of these concepts around scale. Infrastructure at scale certainly has been solved by a variety of different partners that we have. We want to leverage that capability.
But this whole concept of taking unstructured data and managing it at scale, in motion and at rest, is a conversation that has not yet been solved. And that's why we're focusing on it. Okay, so we're getting the time signal here. I want to give you guys the final word. What's on the roadmap? What's next? What are you guys working on? The product marketing and alliance piece, because you guys are obviously very alliance-friendly in the community, the way you guys are set up, and you've got a lot of stuff going on in your area. What's next? What are you guys working on? So let's talk a little bit about security. We also made an announcement around a new Apache project called Metron. Did you see that announcement? No. All right, so Metron is a cybersecurity project. Oh, I did see that yesterday. People ask me all the time, what do you mean, cybersecurity? Why are you guys even talking about cybersecurity? You're data in motion and data at rest. People are spending a lot of cash on cybersecurity right now. What are you trying to do? And we have a perspective, right? Our perspective is that if you look across all of the vendors in the security space, there's a common thread. Data is the foundation of the modern threat detection conversation. We have a contribution to make here. So when we talk about new and exciting, we have announced our interest, our strategic vector, around this. We're going to be making investments here. And it sits on top of these connected data platforms, right? And we're starting to see that the modern data apps that sit on top of these platforms are going to be able to add that value. And Hortonworks thinks that we can make a contribution here. So that's one area of direction. The other is a great relationship with Pivotal. We talked about the new relationship we have with Pivotal. I know other spokespersons have talked about that.
We talked about the fact that Syncsort is now part of our offering to help people onboard more ETL onto Hadoop. We're excited about all of that. Wei? On the product side, certainly cloud. All the customers we're talking to are asking us to give them more functions and features that allow them to streamline and lower the total cost of ownership of operating in a hybrid cloud. So that's the first focus. The second focus certainly is to continue the journey on security and governance. Yeah, you've got to nail that. That's low-hanging fruit. That's table stakes. And there's so much activity. They'll take an embryonic opportunity and try to rebuild custom solutions. But fraud detection is a big one, right? I mean, you know, persona-based 360 marketing is another. These are all easy use cases that are kind of on the table right now. Absolutely. We'll leverage the cloud. We'll leverage our security and governance. Great time here in Dublin. A great event last night. You guys did a great job. Thanks so much for having theCUBE. Really appreciate it. Thanks for the update. And we'll talk to you soon. This is theCUBE, live here in Dublin. Be right back with more after this short break.