Live from Dublin, Ireland, it's theCUBE. Covering Hadoop Summit Europe 2016, brought to you by Hortonworks. Now your hosts, John Furrier and Dave Vellante.

Okay, welcome back, everyone. We are here live in Dublin, Ireland, on the ground for a special live CUBE presentation, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host Dave Vellante. Our next guest, the CTO of Hortonworks, is back on theCUBE. We were together, not here, but in Silicon Valley for Big Data Week, which consists of two events, our event, Big Data SV, and Hadoop World. So we chatted a lot about the stuff going on there. What's changed in two weeks? Obviously it's Europe, so different conversations out here than in the States. What announcements do you guys have?

In like five minutes, things change. And that's a great thing about this market, right? The agility that we have there. We've had a couple of big announcements this week. We've announced two major partnerships. We've announced some additional technology in the last two weeks. And you know what? If we get together in two more weeks, there'll probably be even more to talk about.

So honestly, the big things we're hearing are Spark, partnering, integration. These are kind of the data points that people are connecting around for the next generation, for how fast acceleration will take place for these big data applications. Cloud is a big theme of that. Spark is a big part of that. What is the connected data platform, technically, from your position as CTO? What are the key underpinnings of this connected data platform?

You know, the key thing in there, really, the word is platform.
So in a rapidly changing market, which obviously we're in, where there are new tools, new requirements, and new applications every day, the key thing that we think makes it sustainable is the fact that there is a central platform that can go through this evolution and agility and remain basically the platform, so that applications don't have to be rewritten and rehosted every time there's a new shiny object, right? So platform is really the key: making it extensible, with common security, common data governance, common implementation, common operation, so that customers who are going down this path with us can really future-proof the business by making sure they have a sustainable platform into which new projects, new options, new applications can be built. And they know, because of our open community model, that it will support any of those things that come along.

I want to ask you a specific question on this connected platform, because I think this is a winning strategy you guys have. I know you've got your meat and potatoes, your core engine, and you've got the emerging group, which we discussed at Big Data SV, but bringing it all together under one is really key, especially around the open source aspect, because community will win the day, no matter who's involved, whether it's the data warehousing, pre-existing guys, or the new-school developers coming in. If you don't have a community, you won't be a winner. So I've got to ask you, and this is what we're talking about: the number of Apache projects that go into each distribution grows every year. So Cloudera tells us that 50% of the engineering effort that goes into each incremental product is toward interoperability. 50%, okay? Why not 100%? Is that because they have other priorities? Do you guys have a similar philosophy, where you're putting 50% of your time into open source and 50% into something else? I don't understand.
Well, there's a significant effort that goes into interoperability and building the platform. So, obviously, as I said, I think our differentiation is the fact that it's a platform and not a collection of different projects that don't speak with each other. So there is obviously going to be some work that goes into that. I think the difference really comes from our ability to work with the community, and within the community, entirely in open source, to actually help each of the projects do some of the interoperability themselves and to think of themselves as part of a bigger thing, where we don't have different proprietary things that we have to do manually and expend engineering resources on. We look at the whole thing as 100% open, and we actually influence the community to make those things work together as a platform.

So Cloudera has proprietary stuff.

It would seem that way.

Okay, so with that being said, what you're inherently saying is that your connected data platform, being 100% open source, is inherently interoperable because it's open source, and customers can figure out how to interoperate themselves.

That is what we deliver, that interoperability.

So, a follow-up question on that, because the cloud guys generally, and Amazon specifically, seem to be kind of cherry-picking the big data innovation and building a data pipeline that's a set of services, pretty simple and easy to use. Is the deck stacked in their favor from an adoption standpoint because of that dynamic? Or would you argue that the richness of your ecosystem is going to trump that simplicity? I wonder if you could square that circle.

Yeah, you know, it's interesting, right? I think that, obviously, across the landscape and some of the cloud providers, there's a lot of really good stuff going on out there, right?
And there's one strategy that says, let me do something that's really simple, that's very small, very contained, as an easy on-ramp for customers who are just sticking their toe in the water and trying out these new kinds of applications. The trouble becomes, obviously, that if those things don't operate together as a platform over time, they become very expensive to maintain and manage. Forget the cost of the cloud infrastructure; just for companies to reconcile 37 different applications that don't talk with each other, that don't use common data sources, becomes unsustainable. So we've chosen a platform kind of strategy, and I think one of the things you'll see us working on is ways to package that platform to make the simple onboarding easier, right? So one of the announcements we made this week, the partnership with Syncsort for DMX-h, is a really nice way to package ETL onboarding from legacy systems, right? Customers who are building data lakes, one of the things they want to do is not only get new and emerging data into the data lake, but also pull in some of the legacy data from their operational systems, right, to get better analytics. DMX-h is an easy button for that. So you'll see us looking at opportunities where we can make it easier to either onboard data or build applications, but not lose sight of the fact that it's got to be an integrated platform for sustainability over time.

So that's a solution that you guys, there's a go-to-market? What is that?

It is actually a go-to-market, so we can resell, kind of as a single vendor, a package of the Syncsort solution along with the Hortonworks Data Platform. And it goes together as ETL onboarding for legacy data.

And then you made another announcement with Pivotal, specific to HAWQ, correct? What's that all about?
So yeah, we made a big partnership announcement with Pivotal. Obviously, Pivotal is adopting HDP as kind of the core distribution for Hadoop inside of those applications. And we also have the ability to deploy Apache HAWQ, the open community version of HAWQ, as another data access engine. So this gets, also, to the notion that we're not building individual pieces; in fact, we're building a platform. Having HAWQ in our portfolio gives our customers another data access engine to use without having to change platforms, without having to rewrite all of their underlying infrastructure, but now gives them another option to plug this tool in where they have specific applications that might take advantage of that functionality. At the same time, and we've gotten some questions, we're not abandoning Hive, right? It's not the tool, it's another tool, right? And so there are a lot of advantages that HAWQ brings to the table, there are a lot of advantages that Hive brings to the table, Spark, HBase; we want them all to be part of that platform.

You mentioned Spark and Hive. Obviously, Cloudera has said Spark will be the default execution engine for ETL with Hive, and IBM says Spark's going to hollow out Hadoop. How do you guys see Spark fitting into your roadmap?

So again, I don't see it as either-or, but and, and I think the real value is in the and. Spark as an engine is extremely powerful, and it has some use cases where it's like the best thing since sliced bread, right? There are other use cases where it's maybe not the best option, and so I don't see it as one-size-fits-all.

It's a use case-driven scenario.

It's a use case-driven scenario, and it needs to be part of the portfolio, and there's obviously integration with our latest release of HDP. We have the latest version of Spark supported, and we will continue to do that. We also announced in March a partnership with HPE on adding to the Spark community and adding functionality and performance.
So we are a believer in the engine, but we don't think it's the only engine.

Yeah, what about the conversations that you're in, from a technology perspective? Obviously, we're living in a weird time right now, with a global economy, global communities, and open source. Are the conversations different in the States versus here in Europe around some of the data questions? I mean, we've always seen kind of pockets of country-specific policies and governance and whatnot, and now with IoT data, I think we were talking about telemetry data or telematics data, that could get kind of crazy.

Yeah, I think the biggest difference in the conversation really is around security and privacy, where there are so many different and discrete rules and regulations that are very different than in the United States. And so we have a lot of conversations about that. Obviously, one of the announcements we made, on the integration of Atlas and Ranger, integrating metadata tagging and security, is something that's of great interest in this part of the world, because there are a lot more rules and there's a lot more sophistication in how data can be used and transferred.

You know, it's interesting, even when I connected here in the hotel and was Googling something, I got a little notice: because of where you are, some data has been restricted. I'm like, oh, okay, I hadn't thought of that.

Yeah, so that surrounds us, and I think that is one of the centerpieces of the conversation that we have with customers here in this part of the world.

I want to ask a security question. I'm working with the CXOs in the Wikibon community to basically write a research document on how to communicate with boards of directors about what they need to know about security. With the threat matrix, you know, continuously growing, two questions: one, what's changed that boards need to know about, specifically, and two, what's your advice to a CXO communicating to a board about security?
How should they approach it?

So, first question: what's changed that I should be communicating to the board? You know, I don't know that there's anything that's really changed, specifically, that should be communicated to the board. I think the threats continue to change and mount as people devise new ways to try to infiltrate. I think it's really simple: there are three things that boards need to look at and understand, and understand that people are managing. First, what's my security perimeter? How do I manage the fence? Am I using best practices in managing the fence? Can I report on the threats that have been deterred, and so on? Second, how do I compartmentalize the data inside the fence, so that when someone breaks through that fence, and I'm sorry, the question is not if, the question is when, when they break through that fence, how do I compartmentalize it so I can minimize the data loss? And then third, what are my logging practices, so that when number one happens and number two happens, I can understand what got out, so that I can minimize the impact, so that I can take direct action and not have to go replace every credit card for every user?

A response plan.

A response plan.

So, if I can follow up, one of the things that seems to have changed, and I want to test this, is that executives are more open and transparent about the probability of getting hacked. You said it's not if, it's when. Do you see that change?

Yeah. You know, when I say that, I get varying degrees of response. And I think there's an acknowledgement that, frankly, in the world we live in, you should expect that it will happen and hope that it doesn't, as opposed to the other way around. And if you think about it the other way around, you're living in this partial state of denial, because it's out there. And if you believe that that perimeter, that fence, is going to be impenetrable, and you don't take action on the inside, then the loss downstream is a lot worse.
So I tend to be very pragmatic. Plan on it happening and build in that response, build in the compartmentalization, build in the logging. And by the way, if the fence holds up, you've won.

Hope for the best, plan for the worst, as you say. Scott, I've got to ask you the question I asked Sean Connolly, who's filling in for Rob Bearden, earlier. A lot of hands went up when Herb asked the audience, I mean, people are new to Hadoop. You're still seeing a nice inbound migration of new people coming into the community from a customer standpoint, potential customers for you guys, and developers. What would you say to those folks about how you guys are different from Cloudera and other approaches? From a CTO perspective, what is the Hortonworks main value proposition?

Yeah, the value proposition really is in connected data platforms. I think we're the only vendor talking about data at rest and data in motion, and really being able to manage the end-to-end process from point of data creation to point of analytic and back, with security, with governance, and all that kind of stuff. And then, secondly...

Vapor or shipping product?

Shipping product that works today.

Talking about it doesn't mean you're doing it. You are doing it. Installed with customers at scale.

And we do it as a 100% open community. So that means we can continue with the agility, getting new and fresh ideas in there, and be very transparent with our customers and prospects. I said two things, but there's a third thing, and that is that we're also publicly traded. So we're 100% open, right? We are completely transparent.

You're an open book, literally.

We are open to the third power: we're open community, we're open ecosystem with all the partners attending here, and we're open in terms of being public and transparent.

What's one thing that you would like to share with folks out there that they may not know about Hortonworks, or something that could be a misperception of Hortonworks?
What would you share?

Wow. That was a trick question. It's hard to think of just one thing.

There's a lot of FUD going around the market. There's a lot of people saying this, that, and the other thing. I mean, the end-to-end thing's interesting. It's shipping product, end to end.

Yeah. Something people should know about Hortonworks: it is not vaporware, and the intensity and the energy level that I see every day when I go to work is just, it's very encouraging. It's really great to see.

I have one. I think people still think Hortonworks is a services play. There are still a lot of people out there, I mean, people in the community obviously understand it, but there are a lot of observers, maybe investors or whatever, that hear somebody say that, that FUD, that it's a services play, it's a services business model, it doesn't scale. That's not true.

I think that's completely untrue. Obviously, I voted with my feet, and I'm here, and I believe that we sit at a very unique time. When we were together two weeks ago, I talked about the data tipping point, and just what's happening with the sheer volume, velocity, and variety of data. And I believe that the only way to really address that successfully for the long term is the model that we've chosen. And those things coming together, in my mind, it's a once- or twice-in-a-lifetime kind of opportunity.

Yeah, and the connected network, I was telling Sean, I love that positioning. I think connecting data can give the 360-degree view, because you're getting all the data that's out there now. And it's not about mutually exclusive platforms, it's about integration.

That's what we see: integration.

So my final question is, with that being said, what is the white space that you're going after in your plan now? Obviously the world has evolved, you've got IoT, you've got end to end now, but it's not truly end to end. It's still changing, it's evolving, it's a moving train.
Whatever you want to call it, what are the white spaces?

I think there are a couple of areas of white space. One is new algorithms and analytics. As we get new IoT data in, we're going to have new kinds of analytics evolve, and so fueling the pump on that to get better insights back to our customers, that's an area of white space. And Apache Metron is an example of another white space, of connected applications where we think an open approach can make sense. Obviously, cyber is a big problem to go solve. My simple-minded way to think about it is: the bad guys are a community of bad guys working against the perimeters that we're building and telling our boards about. The good guys combining into an open source cyber module is a really good thing. And we'll look for other areas of opportunity for connected modern data applications, especially where we think there's an open community play.

Is that really your last question? I'm taking bets on that.

This is like the fourth-to-last question. I might have a final final question.

Okay, good. So, John, you touched on it before when you were talking about Spark potentially trying to marginalize Hadoop and deposition it as just a storage engine. But isn't that kind of what Hadoop was originally envisioned to be, a storage engine? And does that bother you? Do you care about that, or is that a misconception? Talk about that a little bit.

I used to be a teenager, and now I'm not. I think that, obviously, there's still some perception out there that we're still in Hadoop 1.0, which was HDFS and MapReduce: batch, single application. The world has changed, right? We're now multi-tenant, multi-application. We're real-time, we're batch, integrated workloads. Now, with Hortonworks DataFlow and our ability to move data, we're also combining data in motion and data at rest. It's a platform decision for modern data architectures. It's not a single-purpose thing.
Some people may have missed that change in the marketplace.

Okay, my final final question is, okay, if you look at it over the next few minutes...

Do I get paid extra for this? You can ask me a question.

All right. Go ahead.

No questions? Okay. Final final question. Obviously the show here is smaller, okay? The big keynote presentation was from Worldpay, a big player up there. Obviously a smaller, more intimate crowd here in Europe, but that deployment is fully loaded with Hadoop, so that's one of those situations where they're in production with a real critical financial transactional system. Can you talk about that? Because it seems to be much more telco here, a lot of telcos in Europe, so transactional stuff is key. But it's also a pretty big deployment. Why Worldpay on stage? Was it the security angle? Can you share some of the deployment scenarios?

Well, I think the big thing is showing relevance, and gravitas is really important. And I think that was obviously the intent. When we go looking for those kinds of opportunities, the folks who are gathered here love hearing from Hortonworks and from other vendors and solution providers as they're looking at their roadmaps. In the end, the most valuable thing they can leave with is hearing from others who have gone before them and done it and been successful, creating that credibility. One of the great things about the show, beyond the keynotes and the speeches and the sessions, is the networking opportunity, where folks can actually talk with others who have implemented and get the real deal. And so, with any of these shows, hearing from the vendors is really great, and I love presenting from time to time, but hearing from folks who have done it is relevant, and we'll continue to push that for this event, this event next year, and the event that we have coming up in San Jose.

Scott, thanks for sharing the insight on theCUBE. It's great to hear from you.
And also, as the CTO, you have a good view into a lot of the frontline activities, as well as inside the kingdom of the community and Hortonworks. So thanks for your time.

Thanks for having me back.

We are here live in Dublin, not London, Dublin, Ireland, for theCUBE's special presentation of Hadoop Summit 2016. We'll be right back with more live coverage after this short break.