from Berlin, Germany. It's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks.

Hi, welcome to theCUBE. We're separating the signal from the noise and tuning into the trends in data and analytics, here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year, I believe, it was in Munich; now it's in Berlin. It's a great show, and the host is Hortonworks. Our first interviewee today is Scott Gnau, the chief technology officer of Hortonworks. Of course, Hortonworks established itself about seven years ago as one of the up-and-coming startups commercializing a then-brand-new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, and their partnerships. So, Scott, it's great to have you this morning. How are you doing?

Glad to be back and good to see you. It's been a while.

You know, yes, you're an industry veteran. We've both been around the block a few times, but I remember you from years ago. You were with Teradata, and I was at another analyst firm. Now you're with Hortonworks, and Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good. Your latest results look good: you're growing, your deal sizes are growing, and your customer base is continuing to deepen. So, you guys are on a roll.

We're here in Europe, in Berlin in particular. You did the keynote this morning, and it's five weeks until GDPR, the sword of Damocles hanging over the industry. It's not just affecting European-based companies, but also North American companies and others who do business in Europe. In your keynote this morning, your core theme was that, if you're an enterprise, your business strategy is now equated with your cloud strategy.
It was really equated with your data strategy. And you got into a lot of that; it was a really good discussion. Where GDPR comes into the picture is the fact that protecting the personal data of your customers is absolutely important. In fact, it's imperative and mandatory, and will be in five weeks, or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or to withdraw consent to have it profiled, and so forth. So, enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline. One of the things you discussed this morning: there was an announcement overnight that Hortonworks has released a new solution in technical preview called the Data Steward Studio. I'm wondering if you can tie that announcement to GDPR; it seems like data stewardship would have strong value for your customers.

Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you think about driving the digitization of a business, driving new business models, connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what its lineage is, who had access to it, and what they did to it. These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of the data we have access to, we didn't create. It was created outside the firewall, by a device, or by some application running with some customer.
And so capturing and interpreting and governing that data is very different from taking derivative transactions from an ERP system, which were already adjudicated and understood, and governing that kind of data structure. This is a need that's driven from many different perspectives. It's driven by the new architecture, the way IoT devices are connecting and just creating a data bomb; that's one thing. It's driven by business use cases of saying, what are the assets that I have access to, and how can I try to determine patterns between those assets when I didn't even create some of them, so how do I adjudicate that?

And discovering and cataloging your data.

Discovering it, cataloging it. You know, when I think about data, just think of the files on my laptop that I created, where I don't remember what half of them are, right? So creating the metadata, creating that trail of breadcrumbs that lets you piece together what's there, what the relevance of it is, and how you might then use it for some correlation. And then you get, obviously, into the regulatory piece that says, sure, if I am an EU customer and I ask to be forgotten, the only way you can guarantee to forget me is to know where all of my data is.

If you remember that they are your customer in the first place, and you know where all that data is, and you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess its degree of exposure to GDPR.

Right, so it's like a whole new use case that's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. Of course, we work with IBM and the community on Apache Atlas. You know, metadata tagging is not the most interesting topic for some people, but in the context I just described, it's kind of important.
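The discovery and right-to-be-forgotten workflow just described can be sketched minimally: catalog every asset with its location and the data subjects it contains, then answer "where does customer X's data live?" before honoring an erasure request. Everything below is a hypothetical illustration, not the Data Steward Studio or Apache Atlas API.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A cataloged data asset with governance metadata (illustrative fields)."""
    name: str
    location: str                               # e.g. "on-prem-hdfs", "cloud-s3"
    tags: set = field(default_factory=set)      # e.g. {"PII"}
    subjects: set = field(default_factory=set)  # data subjects present in the asset


class Catalog:
    """Tracks assets so exposure for a given data subject can be assessed."""

    def __init__(self):
        self.assets = []

    def register(self, asset):
        self.assets.append(asset)

    def find_subject(self, subject_id):
        """Return every asset, wherever it lives, holding this subject's data."""
        return [a for a in self.assets if subject_id in a.subjects]


catalog = Catalog()
catalog.register(Asset("crm_contacts", "on-prem-hdfs",
                       {"PII"}, {"cust-42", "cust-77"}))
catalog.register(Asset("web_clickstream", "cloud-s3",
                       {"behavioral"}, {"cust-42"}))

# Before honoring an erasure request, enumerate every copy of the data.
hits = catalog.find_subject("cust-42")
print(sorted(a.location for a in hits))  # ['cloud-s3', 'on-prem-hdfs']
```

The point of the sketch is the last line: erasure can only be guaranteed once every store holding the subject's data is known.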
And so I think one of the areas where we can really add value for the industry is leveraging our lowest-common-denominator, open-source, open-community kind of development to create a standard open infrastructure for metadata tagging, into which all of these use cases can plug. Whether it's "I want to discover data and create metadata about it based on patterns that I see in the data," or "I've inherited data and I want to ensure that the metadata stay with that data through its life cycle," so that I can guarantee the lineage of the data and be compliant with GDPR.

And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing together. That was part of this morning's keynote flow as well, so it all flowed nicely together.

Yeah, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest-common-denominator, standard metadata tagging in Apache Atlas, and up-level it: not have it be part of a cluster, but actually have it be a cloud service that can be enforced across multiple data stores, whether they're in the cloud or on-prem.

That's the Data Steward Studio.

Well, DataPlane and Data Steward Studio really enable those things to come together. Data Steward Studio is the second service under the Hortonworks DataPlane Service.

Okay.

Yeah, so the whole idea is to be able to tie those things together. And when you think about it in today's hybrid world, and this is where I really started, with your data strategy being your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. I've copied a bunch of data out to the cloud, and all memory of any lineage is gone. Or I've got to manually set up another set of lineage that may not be the same as the lineage the data came with.
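The lineage-loss scenario just described, where a copy to the cloud loses all memory of its provenance unless the metadata travels with it, can be sketched as follows. The record layout and function names are purely hypothetical, not the DataPlane API.

```python
import copy
import time


def make_dataset(name, store):
    """Create a dataset with an initial lineage entry."""
    return {
        "name": name,
        "store": store,
        "lineage": [{"event": "created", "store": store, "ts": time.time()}],
    }


def copy_to(dataset, target_store):
    """Copy a dataset; the lineage chain is carried along and extended,
    rather than being reset in the target store."""
    replica = copy.deepcopy(dataset)
    replica["store"] = target_store
    replica["lineage"].append(
        {"event": "copied", "store": target_store, "ts": time.time()}
    )
    return replica


original = make_dataset("clickstream", "on-prem-hdfs")
cloud_copy = copy_to(original, "cloud-s3")

# The cloud copy still knows where it came from.
print([e["event"] for e in cloud_copy["lineage"]])  # ['created', 'copied']
```

The alternative, where the copy starts with an empty lineage list, is exactly the "all memory is gone" failure mode mentioned above.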
And so being able to provide that common service across your footprint, whether it's multiple data centers, multiple clouds, or both, is a really huge value, because now you can sit back and, through that single pane, see all of your data assets and understand how they interact. That then obviously provides value, as with Data Steward Studio, to discover assets. Maybe to discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance because the data is already over here. Or to be compliant and say, yeah, I've got these assets here, here and here; I am now compelled to do whatever, to delete, protect or encrypt them; and I can now go do that and keep a record, through the metadata, that I did it.

Yes, in fact, that is very much at the heart of compliance: you've got to know what assets are out there. And it seems to me that with Hortonworks, increasingly, the H-word rarely comes up these days. Not Hortonworks, I mean Hadoop. Hadoop rarely comes up these days when the industry talks about you guys. It's known that that's your core, that's your base, that's where HDP comes from and so forth: great product, great distro. In fact, in your partnership with IBM a year or more ago, IBM standardized on HDP in lieu of their own distro because it's so well-established and so mature. But going forward, Hortonworks, you have positioned yourselves in many ways, and Wikibon sees you, as the premier solution provider of big data governance solutions, specifically focused on multi-cloud, on structured data and so forth. And the announcement today of the Data Steward Studio very much builds on the capability you already have there. So, going forward, can you give us a sense of your roadmap in terms of building out the DataPlane Service? Because this is the second of the services under the DataPlane umbrella.
Give us a sense of how you will continue to deepen your governance portfolio in DataPlane.

Yeah, really, the way to think about it, there are a couple of things you touched on that are, certainly for me and for us at Hortonworks, really critical to continue to repeat, just to make sure the message gets out there. Number one, Hadoop is definitely at the core of what we've done, and it was kind of the secret sauce: some very different technology, and also the fact that it's open source and community-driven, all those kinds of things. That really created a foundation that allowed us to build the whole beginning of big data management. And we expanded on the traditional Hadoop stack by adding data in motion.

NiFi, I believe. You made your investment there.

Yeah, we made a large investment in Apache NiFi, as well as Storm and Kafka, as a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle: from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then within that, obviously, as we discussed, whether it be regulation or, frankly, feature functionality, there's an opportunity to up-level those services from an overall security and governance perspective. And just like Hadoop upended traditional thinking, and what I mean by that is not the economics of it specifically, but the fact that you could land data without describing it, right? That seemed so unimportant at one time, and now it's the key thing that drives the difference.
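The "land data without describing it" idea is often called schema on read: raw records are stored as-is, and structure is derived only when the data is read. A rough sketch, with an illustrative landing-zone format and field names (not how HDFS or Hive actually implement it):

```python
import json

# Landed as-is: note the sensor's fields change between records, as they
# might after a firmware update, and nothing rejected the new shape.
raw_landing_zone = [
    '{"device": "s-1", "temp_c": 21.5}',
    '{"device": "s-1", "temp_c": 21.9, "humidity": 40}',
]


def infer_schema(raw_lines):
    """Union of fields observed across records, computed at read time
    rather than declared before the write."""
    fields = set()
    for line in raw_lines:
        fields.update(json.loads(line).keys())
    return sorted(fields)


print(infer_schema(raw_landing_zone))  # ['device', 'humidity', 'temp_c']
```

A schema-on-write store would have rejected the second record (or required a migration first); here the evolving stream lands cleanly and the schema catches up on read.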
Think about sensors that are sending in data, where the firmware gets reconfigured and those streams change. Being able to acquire the data first and then assess it is a big deal. The same thing applies, then, to how we apply governance, right? As I said this morning, traditional governance was: hey, I started as an employee, I have access to this file, this file and this file, and nothing else. I don't know what else is out there; I only have access to what my job title prescribes. That's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data now. That doesn't mean we need to give away PII, right? We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought of inversely from how it's been thought about for 30 years.

It's great that you've worked governance into an increasingly streaming, real-time, in-motion data environment. Scott, this has been great. It's been great to have you on theCUBE. You're an alum of theCUBE; I think we've had you on at least two or three times over the last few years.

It feels like 35.

No, it's been five. Yeah, it's been great. So we are here at DataWorks Summit in Berlin.
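The tokenization approach Gnau describes, masking PII while keeping referential integrity, can be sketched with a keyed deterministic token: the same customer always maps to the same token, so joins and correlations still work, while only key-holders can resolve a token back to the raw value. The key handling and function names are hypothetical, not a Hortonworks API.

```python
import hashlib
import hmac

# Assumption: a secret managed by the governance service, never given
# to data scientists.
SECRET_KEY = b"held-by-the-governance-service"


def tokenize(pii_value):
    """Keyed, deterministic token: same input always yields the same
    token, so referential integrity across datasets is preserved."""
    digest = hmac.new(SECRET_KEY, pii_value.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


orders = [
    ("alice@example.com", "order-1"),
    ("bob@example.com", "order-2"),
    ("alice@example.com", "order-3"),
]

# What the data scientist sees: no raw PII, but Alice's two orders still
# share a single token, so her purchase pattern remains analyzable.
masked = [(tokenize(email), order) for email, order in orders]
assert masked[0][0] == masked[2][0]   # same customer, same token
assert masked[0][0] != masked[1][0]   # different customers stay distinct
```

Because the token is keyed rather than a plain hash, an attacker without the secret cannot precompute tokens for known email addresses, yet anyone authorized can be handed the key to resolve tokens when there is a genuine need to see the PII.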