Live from Dublin, Ireland, it's theCUBE, covering Hadoop Summit Europe 2016, brought to you by Hortonworks. Now your hosts, John Furrier and Dave Vellante.
Okay, welcome back. We are here live in Dublin, Ireland for theCUBE's European edition of Hadoop Summit, sponsored by Hortonworks, Yahoo, and a host of other industry leaders, bringing the Hadoop Summit show to Europe, where the big data game is certainly as elevated as it is in the States. Governance and security are at the top of the list. I'm John Furrier with my co-host Dave Vellante. This is theCUBE, our flagship program: we go out to the events and extract the signal from the noise. Our next guest is Sean Connolly, VP of Strategy at Hortonworks, filling in for Rob Bearden, who had to take the jet back to the States. Welcome back to theCUBE, good to see you.
Thanks for having me.
Yeah, no Concorde; he's got to take, you know... A five foot eight right-hander. He's still going in. So how's it going? Obviously, we were at BigDataSV, our event, Big Data Silicon Valley, and then there was the other event, Strata + Hadoop World, going on. A lot of talk about the cloud. We saw your messaging early during that show, but there's a new emerging aspect of Hortonworks. You have a whole emerging products group. It seems you've got your blocking-and-tackling groups, and then there's kind of a speed team going after IoT, whole new sets of opportunities that you guys have to be very agile on. Is that the strategy? Is that by design? Is that reacting to market pressures? Talk about the strategy.
Well, it's how you evolve. In an open source business, there are various ways you can scale the business. One is to broaden and deepen the portfolio, right? So towards the tail end of last year, we started making the transformation from a single-product platform company to a multi-product and solutions company.
And that's where our emerging products team is focused: on the areas of data in motion, in particular the Internet of Things, whether it's related to cybersecurity, connected car, or some of the other emerging use cases. So we have a targeted team of really passionate folks. We made the acquisition of Onyara last August. With that came the Apache NiFi technology, which I call a data logistics platform. It's like a FedEx system for data delivery; that's how I like to look at it. And that's the foundation of that team: managing data, getting the data where it needs to be.
So talk about the dynamics going on in the industry right now, because you guys have lived the transformation. I saw Arun's keynote about his journey with Hadoop from 2006 to today, which was awesome, but the world's changed. And you guys have interesting positioning in this new world of connecting the data. We found out at our last event that it's pretty clear the existing data warehouse business is going to have a nice little coexistence piece. It's not about ripping and replacing those guys. You'll have big MPP databases out there from the Verticas of the world. So you have an integration message, and with the cloud powering it all, you're seeing IoT emerge. So, IoT integration: what is this connected data platform that you guys talk about? At the heart of it, beyond the messaging and the marketing, where did that come from? How does that tie into execution on the product side?
Yeah, so in our mind, it's a world of connected data platforms, and those platforms need to reside wherever they need to be. You may have one close to the edge to do real-time edge analytics. It may be in the cloud to do machine learning, exploration, and discovery in an agile way. It may be an on-prem data lake, where you have a lot of data at rest.
And the key is how you create an architecture to very actively orchestrate the data and get it where it needs to be in a secure way, in a transparent way, in a way where you can actually influence the edge. You basically say: you're giving me these 50 data points, stop sending 10 and send me these other five, because I need to reshape the data I'm getting based on the analytics I'm driving out of it. So it winds up being a really closed-loop system that's very adaptive, if you will. It's really an architectural sentiment, if you will, of connected data platforms, some of which are ours and some of which are our partners'. Many are in the cloud, many are on-prem, right? That's today's reality.
So, we go to a lot of events, and marketing always leads the real adoption of technology. So I wonder if you can help us level set here. We heard Rob Bearden this morning emphasizing business impact over technology, which we've talked about a lot over the last five or six years at shows like this. And he gave a couple of examples, a retail company and an insurance company, that wanted a 360-degree view of their business. I was saying to John earlier that they probably wanted that same 360-degree view back in Y2K and weren't able to deliver on it. So now we're talking about Hadoop and big data delivering on that. But we tend to get distracted with IoT and some of these new things. Are they related? And what gives you confidence that you and your ecosystem can deliver on that promise where the traditional vendors weren't able to 10 or 15 years ago?
You know, I basically say we've gone through a variety of ages: the age of the relational database, the age of the web, right? Salesforce and others were born out of that, the Apache web server played a key role at the time, Linux played a role. Now we're in the age of data.
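The closed loop described earlier in this answer, where the analytics tier tells the edge to stop sending 10 of its 50 data points and send five others instead, can be illustrated in a few lines of Python. This is a hypothetical sketch, not Apache NiFi's API: the class `EdgeFilter`, its methods `apply_feedback` and `forward`, and the data point names are invented for illustration, and the feedback is modeled simply as a set difference and union over the names of the data points the edge forwards.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeFilter:
    """Hypothetical edge-side filter that forwards only requested data points."""

    # Names of the data points the analytics tier currently wants.
    requested: set[str] = field(default_factory=set)

    def apply_feedback(self, stop: set[str], start: set[str]) -> None:
        # Closing the loop: the analytics tier tells the edge what to drop
        # and what to add, and the edge reshapes its output accordingly.
        self.requested = (self.requested - stop) | start

    def forward(self, reading: dict[str, float]) -> dict[str, float]:
        # Send upstream only the data points that are currently requested.
        return {name: value for name, value in reading.items() if name in self.requested}


# Usage: the edge can produce 55 data points but starts by sending 50.
reading = {f"p{i}": float(i) for i in range(50)}
reading.update({f"x{i}": 0.0 for i in range(5)})

edge = EdgeFilter(requested={f"p{i}" for i in range(50)})
print(len(edge.forward(reading)))  # 50

# Analytics feedback: stop sending 10 of them, send these other five instead.
edge.apply_feedback(stop={f"p{i}" for i in range(10)}, start={f"x{i}" for i in range(5)})
print(len(edge.forward(reading)))  # 45
```

In a real deployment the feedback would travel over the same managed dataflow as the data itself; here it is just a method call, which keeps the adaptive part of the loop visible.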
And to your point, before we arrived where we are, in many cases the systems were siloed and highly structured. What's different these days is that you can deal with almost any form factor of data, big or small. That wasn't the case five, 10, 15 years ago, right? So your 360-degree view winds up being not just about the structured world; it's about all of the contextual data that surrounds those transactions. You now have the economics and the ability to actually capture that and drive analytics out of it. That's been playing out over the past five years. I've actually been in the cloud space for a while, and I'm really excited about how cloud architectures further drive and accelerate those types of use cases, where you can act on any form factor of data, no matter where it's born or where it lands, on-prem or off-prem. But you need an architecture that joins it all together cleanly, right? You need a vibrant way of getting the data so you can have that 360-degree view.
Okay, and is your sense that we're starting to get there? I guess the claim is we're there in some cases. Fraud detection was another use case we heard about this morning. That's something we can all relate to. Years ago, fraud detection was sampling. Six months later, you'd find out maybe something happened.
Exactly.
As of 12 months ago, fraud detection was not quite real time, but it was very close: hey, a transaction just occurred, with a lot of false positives. It feels like many of those have been eliminated in the last 12 to 15 months. I want to understand your perspective on whether that's changing, and how fast. Is that true? Why is that true?
Well, I think a lot of these waves build on each other. Deep learning is a facet of that, but ultimately, when you pair deep learning on the full historical view with streaming, being able to interact with data as it's flowing, you get adaptive systems, adaptive learning systems. And I would say right now it's exciting: the technologies to create those types of systems are increasingly available. I saw in your CrowdChat you were asking, where are the packaged apps? Where are the packaged applications? I think you and I have discussed this offline before: companies are now born as software-as-a-service solutions. The company is the application. Gone are the days when you created shrink-wrapped software with a perpetual license and installed it on-prem, right? In this connected world, the applications are effectively companies, right? And that is the customer, that is the build, the commercial app.
For an example of that, look at what Zuckerberg now says about their 10-year roadmap. It's all apps that are companies: WhatsApp?
Exactly. Instagram, Messenger. And what's under the hood? Data, right? Those businesses are based on data. So this vision of connected data platforms is a data-first way of looking at the architecture. I'm a big fan of containers, which will move compute, but at the end of the day it really needs to be about how you connect your data for these applications, because you're not going to move it. You need to make sure you interact with it where it is, right?
I think you bring up a great point. That's why I like your marketing around connected data platforms. All the apps are now companies, really; that's the explosion. That's the renaissance we're seeing, right in front of our eyes. It's not some new software package. It's the company. And in many cases, the technology is: swipe your credit card and you can have it booted up almost immediately.
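The idea of pairing the full historical view with streaming, applied to the fraud question above, can be illustrated with a deliberately simple stand-in. The sketch below swaps in a per-card running mean and standard deviation (Welford's online algorithm) for the deep learning mentioned in the interview: historical transactions seed each card's baseline, and each streamed transaction is scored against that baseline and then folded into it, so the system keeps adapting. The names `Baseline`, `score_stream`, `threshold`, and `min_history`, and the sample amounts, are hypothetical.

```python
import math
from collections import defaultdict


class Baseline:
    """Running mean and variance of one card's amounts (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0.0 else (x - self.mean) / std


def score_stream(history, stream, threshold=3.0, min_history=5):
    """Seed baselines from history, then score and update on each streamed event."""
    baselines = defaultdict(Baseline)
    for card, amount in history:  # batch pass over the full historical view
        baselines[card].update(amount)

    flagged = []
    for card, amount in stream:  # streaming pass: score first, then adapt
        b = baselines[card]
        # Requiring some history avoids flagging every card we know little about.
        if b.n >= min_history and abs(b.zscore(amount)) > threshold:
            flagged.append((card, amount))
        b.update(amount)  # a real system might exclude confirmed fraud here
    return flagged


history = [("A", a) for a in (20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0)]
stream = [("A", 21.0), ("A", 500.0), ("B", 1000.0)]
print(score_stream(history, stream))  # [('A', 500.0)]
```

The `min_history` guard is one crude lever on false positives: card "B" has no history, so its large transaction is not flagged on amount alone, whereas a richer model would bring in other context.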
So our new head of research, Peter Burris, who's on our team (I think you've met him), and I talk about this, and it's key to his thesis: context is everything, right? That's why Facebook couldn't be successful with these other apps, because the context was friends on Facebook. WhatsApp was doing a different context. So each app has its own context, and that's the data value. Each app has different contexts; that's the value of the connected data platform. One, do you agree? I'm sure you agree. And two, can you elaborate on what that means for a practitioner saying, okay, how do I figure this out?
Yep, and that gets to the point of the single view. Your point is that a lot of these applications have the context. The power comes when you can actually blend that context across these different applications to derive that single view. That was the credit fraud example. In the smart car wave it's very similar: data's coming from the manufacturing line as much as from driving habits, right? And these auto companies are getting highly contextual insight into the driving behavior of the people in the car.
You mentioned CrowdChat: go to crowdchat.net/HS16Dublin. One of the questions I put on there last night is, to what extent does Hortonworks see native services from cloud vendors as competition? We were talking in our intro about the cloud being the power source for this accelerated explosion of context and apps and more data.
Exactly.
You're seeing IoT. This is a developer's dream. Developers love Amazon. So naturally you look at that and say, well, I've got Kinesis, I've got Redshift, I've got Lambda for all that other stuff. So how do you guys view that? How do you intersect with that? What's your coexistence?
Since our beginning, our approach has been about how you integrate the tech. In the initial waves, it's been:
How do you integrate it with the data warehouse, data marts, and things like that, typically on-prem? In the cloud world, connected data platforms cut right to your point: you want to be able to easily get data into those services. If you need to get it into BigQuery or Redshift or what have you, it's not dissimilar to an on-prem architecture, except it's a cloud service. You need the connected data platform architecture to factor those services in. It isn't just about Hortonworks platforms and technologies. It's how we can be that connective tissue to drive that architecture.
You're connector-centric.
And you'll get a connector as a service to a cloud. It's connecting the data. I think we'll be great at getting it from virtually the source and getting it from on-prem. And if it needs to get into the cloud in specific form factors, we can get it there securely, in a way where you get traceability, provenance, governance, and things like that.
So you guys embrace the cloud, whether customers want to go Amazon or on-prem?
Yeah, it's more use cases. It drives more consumption of data and more need for our technology.
So one of the things the big cloud guys are doing, Amazon in particular, and I presume Microsoft and to a certain extent Google, is making it really easy to construct the data pipeline. How are Hortonworks and your ecosystem partners replicating that simplicity in the data pipeline?
Yeah, I'll use Microsoft as an example, because it's a good cloud-based one.
I'm going to bring that up in my next question.
So Azure HDInsight is effectively Hadoop as a service, where Hortonworks Data Platform powers that service. If you just look at it in isolation, it's interesting: you can get started very quickly. But when you pair it up with things like Azure Data Factory, you can automate your pipelines: spin up a cluster, run jobs, shut it down.
When you pair it with Azure Machine Learning for doing machine learning on the data you've gotten into the cloud platform, and when you share the data downstream into some of the SQL and data warehouse technologies, then you wind up building a really interesting, holistic solution to get to that 360-degree view. So it's as much about integration and getting data in the right form factor as it has been on-prem, but the nature of the services changes in the cloud.
So, the Microsoft relationship: any update there? Any new developments, extensions?
We continue to roll out. Just a couple of weeks ago, the latest version of Spark became available in HDInsight, and we're seeing a lot of uptake there, a lot of excitement. They bring even more tools into that service. The Zeppelin data science notebook is one from the Hadoop ecosystem for doing Spark development, but the Jupyter notebook for Python is also available in HDI, and R is there for the R community. So it speaks multiple languages. Whether you're a Spark guy or gal, an R user, or a Python enthusiast, you can do your work.
So I've got to ask you the question. We saw the hands go up when Herb was out in the keynote and asked everyone three questions. Who's new to Hadoop? Still a lot of people onboarding to the Hadoop ecosystem, either from the vector of "I want to use Hadoop" or "I'm interested in using it in conjunction with existing stuff." So the migration continues to onboard new people. Okay, so that's cool. So what's the message to those folks who want to look at how to get involved? Specifically, they want to know (and we have the same issue with the writers we hire) about Hortonworks and Cloudera, which are always being discussed. How do you talk about the Cloudera versus Hortonworks equation today? Because you've got a lot of things going on. You've got Spark, a lot of people talking about stuff. How do you guys compare versus Hortonworks? I mean, Cloudera.
Yeah, so when I look at the next wave of onboarding (and I'm a technologist by background), those who are successful focus on the use cases, and there are a lot of repeatable use cases around active archive, single view, data exploration, data science, or even predictive applications. And there's a journey to that. Even in the credit fraud example you saw in the keynote, there was a journey to assemble your single view before you get to predictive. That's what we focus on with new folks: as an open source subscription model, we're in the customer success business. Even if it's in the cloud, how you succeed is by driving more usage. So you focus on the use cases, you enable those use cases, and you bring technology to bear on them. I happen to think we're the best in the business at productizing Apache technology into consumable products, on-prem or in the cloud. Our engineers are just fantastic at bringing that together in a consumable way.
Sean, I want to ask you about security. Shift gears a little bit. I want to ask a business question before we geek out about Metron. It seems like CXOs have a choice when they go to their boards. The threat matrix is now so big. They can either say, hey, everything's cool, we've got this, or they can say, listen, chances are we're going to get hacked, we're going to get compromised; we need a response plan and I can lead that. Are you seeing that? What does the board need to know about security in this day and age? How are you advising your customers' senior executives to communicate to their boards of directors?
This gets back to the whole data-driven thing, data in motion and data at rest. The nature of solving that problem has fundamentally changed from how it's been done in the past. It's been a variety of stovepipe solutions that each do just one piece of the security puzzle, and it gets back to that single view problem, but as it relates to security.
When we talk to chief information security officers, they can't inundate a single person with 10,000 events and expect them to respond in a timely fashion. The approach to that problem has to change. You need it automated, you need advanced machine learning to underpin it, and you need it to pull the five needles out of the giant haystack of things they could potentially react to. That's where you see technologies like Apache Metron that are of keen interest to the chief information security officer. Particularly with our customer base, the term data lake has been thrown around a lot. We see enterprise data lakes, but we also see security data lakes, and in many respects it's a fine-tuned security data lake to solve that problem: capturing the network packets, handling a lot of these new forms of threats, and enriching the algorithms so you can respond to new, emerging advanced threats immediately, right? Not after the fact. A security data lake is a corpus of data that you run analytics on to produce security outcomes. It's a common term, and the chief information security officer typically owns it. There's a lot of compliance around that, and that's why I think the NiFi technology is important. In many cases your enterprise data lake wants some of those same security logs, so you should be able to fork that. You should have a connected data platforms architecture where you can fork that stream: give the enterprise data lake team their feed, but also give the security team what they need to solve their problem, protect the business, and protect their customers, right?
What's the difference in the conversation between the States show and here in Europe? Obviously, we're in Dublin. Can you share? Obviously it's different geographies, different issues, but what are the conversations here that are different from the conversations in the States?
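The forking Sean describes, where one stream of security logs feeds both the enterprise data lake and the security data lake, is essentially fan-out routing with a per-destination filter. The sketch below illustrates that idea in plain Python under hypothetical names (`fork_stream`, the two list-backed sinks standing in for the lakes, and the `type` field on each log event); it is not how NiFi itself is configured or called.

```python
from typing import Callable, Iterable

Event = dict[str, str]
Predicate = Callable[[Event], bool]
Sink = Callable[[Event], None]


def fork_stream(events: Iterable[Event], routes: list[tuple[Predicate, Sink]]) -> None:
    """Deliver each event to every sink whose predicate accepts it."""
    for event in events:
        for accepts, sink in routes:
            if accepts(event):
                sink(event)


# Hypothetical sinks standing in for the two data lakes.
enterprise_lake: list[Event] = []
security_lake: list[Event] = []

routes: list[tuple[Predicate, Sink]] = [
    (lambda e: True, enterprise_lake.append),  # enterprise lake gets the full feed
    (lambda e: e["type"] in {"auth", "netflow"}, security_lake.append),  # security team's slice
]

logs = [
    {"type": "auth", "user": "alice", "result": "fail"},
    {"type": "app", "msg": "checkout ok"},
    {"type": "netflow", "src": "10.0.0.5", "dst": "203.0.113.9"},
]
fork_stream(logs, routes)
print(len(enterprise_lake), len(security_lake))  # 3 2
```

Each event is read once and delivered to every interested team, which is the point of forking rather than building a second, separate collection pipeline for security.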
Yeah, in October I spent the better part of the month in Europe, mostly in London, touring around, and I'll be heading back to London after this event. I would say London and the UK in particular are probably on par with the South Central US in adoption of the tech, the sophistication of the use cases, and the business value propositions. Some of the other countries are a little further behind, but this isn't an 18-months-behind thing. There are really great single-view use cases where GPS location winds up being the pivot point, not just the customer, because you have properties under insurance and you need to enrich them with global event data to figure out your risk exposure. That's a really fantastic problem that these technologies can help you solve, right? Along with the availability of public data sets you can use for enrichment. There are people in the UK doing those types of use cases.
What about governance? Are topics like governance more or less prominent here? What's rising?
Security and governance are of extreme importance, particularly in Europe. That's why I think it's top of mind for this show.
More than the US, less than the US?
I would say in the more conservative middle of the US and the East Coast, it's a higher priority. On the West Coast, that doesn't mean it isn't a priority, but there's just a big wave of innovation that tends to emanate out of there, so extracting the signal from the noise is sometimes a little harder. But when you go into Chicago, or into more of the... I'm from the Philly area, one of the blue-collar practitioner states, security and governance are clearly top of mind, right?
It comes with different mandates too, right? So that's another issue here.
Exactly. One of the examples I think Arun alluded to when he was talking about some of the tech directions was: how do you marry metadata and tag-based authorization policies?
If you tag your data by location, let's say you log in in the US and you're allowed access to that data while you're in the United States, but if you're traveling abroad and you log in from somewhere in Europe, you may not be allowed access. Now you can actually do those types of scenarios in the data lake, where based on that tag and your location, you'll be prohibited access. It's a very sophisticated security, credential, and authorization notion that applies well to data lake architectures.
Sean, thanks for sharing your insight here on theCUBE. Appreciate it. Great to see you here in Europe. This is theCUBE on this European kickoff tour. Thanks to Hortonworks for the support. Appreciate all your support and...
Thanks for hosting again. Hopefully you'll get a lot of people on.
It's theCUBE. We're live here in Dublin, Ireland, home of Guinness, as Dave said, his favorite beer, and we got a big keynote plug on the front stage from Rob Bearden. We'll be right back with more live coverage here at Hadoop Summit Dublin 2016 after this short break.
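The location-aware, tag-based policy Sean describes can be sketched as a lookup over a dataset's tags combined with the location of the user's session. In the Hortonworks stack this is the territory of Apache Atlas (metadata tags) and Apache Ranger (tag-based policies), but the code below does not use their APIs: `Dataset`, `LOCATION_POLICIES`, `can_read`, and the `US_ONLY` tag are hypothetical, and the policy is simplified to "deny if any tag restricts the session's country."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    tags: frozenset[str]


# Hypothetical policy table: tag -> countries a session may read it from.
LOCATION_POLICIES: dict[str, set[str]] = {"US_ONLY": {"US"}}


def can_read(dataset: Dataset, session_country: str) -> bool:
    """Deny access if any of the dataset's tags restricts the session's location."""
    for tag in dataset.tags:
        allowed = LOCATION_POLICIES.get(tag)
        if allowed is not None and session_country not in allowed:
            return False
    return True


customers = Dataset("customers", frozenset({"US_ONLY", "PII"}))
web_logs = Dataset("web_logs", frozenset())

print(can_read(customers, "US"))  # True
print(can_read(customers, "IE"))  # False: same user, logging in from Europe
print(can_read(web_logs, "IE"))   # True: untagged data is unrestricted here
```

Attaching the rule to the tag rather than to each table is what lets one policy follow the data wherever it lands in the lake.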