Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks.

Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. Thanks so much for coming on theCUBE again, Rob.

Thank you for having us.

So you just got off the keynote on the main stage, and the big theme is really about modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process?

Well, there are a lot of moving parts in enabling a modern data architecture. One of the first steps is to unlock the siloed transactional applications and get that data into a central architecture so you can get real-time insights around the inclusive data set. But what we're really trying to accomplish within that modern data architecture is to bring all types of data together, whether it be real-time streaming data, sensor data, IoT data, or data coming from a connected car across the network, and to bring all that data together in real time and give the enterprise the ability to take best-in-class action so that you get a very prescriptive outcome of what you want. So we bring that data under management from the point of origination out on the edge, and then have the platform that moves it through its entire lifecycle; that's our HDF platform. It gives the customer the ability, after they capture the data at the edge, to move it and then process it. As an event happens, a condition changes, or various conditions come together, they have the ability to process it, take the exact action they want performed against it, and then bring it to rest. And that's where our HDP platform comes into play, where all that data can be aggregated so you can have holistic insight and real-time interactions on that data. But then it becomes about deploying those data sets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we're architected for that on-prem deployment, or private cloud, or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers, today and in the future, are going to want to run and deploy their applications and workloads. That's where our DataPlane Service offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and data sets across each tier transparently, regardless of what the storage file format may be or where that application sits, and we provide all the tooling to mask the complexity of doing that. And then we ensure that it has one common security framework, one common governance model through its entire lifecycle, and one management platform to handle that entire lifecycle of data.
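To make that capture-process-rest flow concrete, here is a minimal Python sketch of the pattern Rob describes. In HDF this flow is built with MiNiFi/NiFi flows rather than hand-written code, so every name below (the sensor reading, the threshold, the archive function) is purely illustrative.

```python
# Hypothetical sketch of the edge -> process -> rest flow described above.
# In HDF this is handled by MiNiFi/NiFi flows, not hand-written code;
# all names and values here are illustrative assumptions.

import json
import time

def capture_at_edge():
    """Stand-in for a MiNiFi agent reading a sensor at the point of origination."""
    return {"device_id": "engine-42", "temp_c": 612.0, "ts": time.time()}

def process_event(event):
    """Act on the event as conditions change, before the data comes to rest."""
    if event["temp_c"] > 600.0:               # assumed threshold
        return {**event, "action": "schedule_inspection"}
    return {**event, "action": "none"}

def archive_to_rest(event, store):
    """Stand-in for landing the enriched event in HDP (e.g., HDFS/Hive) at rest."""
    store.append(json.dumps(event))

if __name__ == "__main__":
    data_at_rest = []                          # stand-in for the central HDP store
    raw = capture_at_edge()
    enriched = process_event(raw)
    archive_to_rest(enriched, data_at_rest)
    print(data_at_rest)
```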
And that's the modern data architecture: being able to bring all data, all types of data, under management, manage it in real time through its lifecycle until it comes to rest, and deploy it across whatever architecture tier is most appropriate financially and from a performance standpoint, on cloud or on-prem.

Rob, this morning at the keynote here on day one at DataWorks San Jose, you presented this whole architecture that you described in the context of what you call hybrid clouds to enable connected communities, and with HDP, Hortonworks Data Platform 3.0, as one of the prime announcements, you brought containerization into the story. Could you connect those dots: containerization, connected communities, and HDP 3.0?

Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively. What it's done is separate the storage from the compute, so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, to move those applications and data sets around, and to leverage each tier in the deployment architectures that are most pragmatic. What that lets us do, then, is bring all of the different data types together, whether it be customer data, supply chain data, or product data. Imagine an industrial piece of equipment, say an airplane, flying from Atlanta, Georgia to London, and you want to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed. Now, with our connected deployment, we have the ability to take every piece of data that's generated from every component, see it in real time, and let the airline act on it in real time, and ensure that we know every person that touched and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents. And we can prove, if somehow that data has been breached, exactly at what point it was breached and who did or didn't get to see it, and we can prevent that because of the security models we put in place.

That relates to compliance with mandates such as the General Data Protection Regulation, GDPR, in the EU. At DataWorks Berlin a few months ago, Hortonworks announced a new product called Data Steward Studio to enable GDPR compliance. Can you give our listeners who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole set of data lineage requirements you're describing, and then, going forward, what is Hortonworks' roadmap for supporting the full governance lifecycle for the connected community, from data lineage through things like model governance and so forth? If you can just connect a few dots, that would be helpful.

Absolutely. What's important, certainly driven by GDPR, is the requirement to be able to prove that you understand who has touched that data and who has not had access to it, and that you ensure you're in compliance with the GDPR regulations, which are significant. Essentially, what they say is you have to protect the personal data, and the attributes of that data, of the individual.
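HDP 3.0's container story rests on YARN's ability to run Dockerized workloads. As a hedged illustration, here is a sketch of submitting such a workload through the YARN Services REST API in the Hadoop 3.x line underlying HDP 3.0; the Resource Manager address, image name, and resource figures are assumptions, not values from the interview.

```python
# Hedged sketch: submitting a Docker-containerized workload to YARN's Services
# API, the mechanism HDP 3.0 uses to decouple compute from fixed storage hosts.
# The RM address, image name, and resource numbers are illustrative assumptions.

import json
import requests

RM_HOST = "http://resourcemanager.example.com:8088"   # assumed Resource Manager

service_spec = {
    "name": "engine-telemetry-scorer",
    "version": "1.0",
    "components": [{
        "name": "scorer",
        "number_of_containers": 2,
        "artifact": {"id": "myrepo/telemetry-scorer:1.0", "type": "DOCKER"},
        "launch_command": "python /app/score.py",
        "resource": {"cpus": 1, "memory": "2048"}
    }]
}

resp = requests.post(f"{RM_HOST}/app/v1/services",
                     data=json.dumps(service_spec),
                     headers={"Content-Type": "application/json"})
print(resp.status_code, resp.text)
```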
And so what's very important is that you have systems that not only secure the data but understand who has had access to it at any point in time that you've ever maintained that individual's data. So it's not just about when you've had a transaction with that individual; it's the rest of the history that you've kept, or the multiple data sets that you may try to correlate to expand your relationship with that customer. You need to ensure not only that you've secured their data but that you're protecting and governing who has access to it and when, and, as importantly, that you can prove in the event of a breach that you had control of it and who did or did not access it. Because if you can't prove in a breach that it was secure and that no one who wasn't supposed to access it did so, you can be opened up to hundreds of thousands of dollars, or even multiple millions of dollars, of fines just because you can't prove that it was not accessed. And that's what our platforms provide. You mentioned Data Steward Studio; as part of DataPlane, it's one of those capabilities. The core engine that does this is Atlas, the open source governance platform that we developed through the community. It really drives all the capabilities for governance that move through each of our products, HDP and HDF, and of course DataPlane and Data Steward Studio take advantage of that in how they move and replicate data and manage that process for us.

One of the things that we were talking about before the cameras were rolling was this idea of data-driven business models, how they are disrupting current contenders, with new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing, and what are some of the most exciting, and maybe also some of the most threatening, things that you're seeing?

Sure. In the traditional legacy enterprise, it's very procedurally driven. Think about classic, core ERP: it has worked very hard to have a very rigid, very structured, procedural order-to-cash cycle that doesn't have a great deal of flexibility. It takes you through a design process that builds a product, then you sell that product to a customer, then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in your supply chain. But it's very procedural, very linear. In the new world of connected data models, you want to bring transparency, real-time understanding, and connectivity between the enterprise, the customer, the product, and the supply chain, so that you can take real-time, best-practice action. So, for example: you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with it? Are they using it with the patterns and the frequency they should be if they're going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a re-evaluation and another buying cycle for something similar that may not be with you, for whatever reason? And when we have real-time visibility into our customers' interactions and understand our product's performance through its entire lifecycle, then we can bring real-time efficiency, linking those together with our supply chain and the various relationships we have with our customers.
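The "who touched this data, and when" trail Rob describes is served by Apache Atlas's REST API. As a hedged sketch, the snippet below walks an entity's lineage graph; the Atlas host, credentials, and entity GUID are placeholders, not real values.

```python
# Hedged sketch: querying Apache Atlas (the engine behind the governance
# capabilities mentioned above) for a data set's lineage, the kind of audit
# trail GDPR compliance leans on. Host, credentials, and GUID are placeholders.

import requests

ATLAS = "http://atlas.example.com:21000"        # assumed Atlas endpoint
AUTH = ("admin", "admin")                       # placeholder credentials
guid = "00000000-0000-0000-0000-000000000000"   # placeholder entity GUID

# Fetch the lineage graph for a data set: its upstream and downstream hops
lineage = requests.get(f"{ATLAS}/api/atlas/v2/lineage/{guid}", auth=AUTH).json()
for edge in lineage.get("relations", []):
    print(edge["fromEntityId"], "->", edge["toEntityId"])
```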
To do that requires the modern data architecture: bringing data under management from the point it originates, whether from the product, the customer interacting with the company, or the customer interacting with our ecosystem partners, mutual partners, and then letting best-practice supply chain techniques make sure we're bringing the highest level of service and support to that entire lifecycle. When we bring data under management, manage it through its lifecycle, keep the historical view at rest, and leverage that across every tier, that's when we get this high-velocity, deep transparency and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do.

You guys have been in business now for, I think, seven years or so. You've shifted, in the minds of many and in your own strategy, from being the premier data-at-rest company in terms of a Hadoop platform to being one of the premier data-in-motion companies. Is that really where you're going: to be more of a completely streaming-focused solution provider in a multi-cloud environment? I hear a lot of Kafka in your story now; it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift toward low-latency, real-time streaming of big data, or small data for that matter, with embedded analytics and machine learning?

So we have evolved from certainly being the leader in global data platforms, through all the work we do collaboratively in and through the community to make Hadoop an enterprise-viable data platform with the ability to run mission-critical workloads and apps at scale, ensuring that it meets all the enterprise requirements around security, governance, and management. But you're right, we have expanded our footprint aggressively, and we saw the opportunity to create more value for our customers by giving them the ability not to wait until they bring data under management to gain an insight, because in that case they're having to be reactive: post-event, post-transaction. We want to give them the ability to shift their business model to being interactive: pre-event, pre-condition. The way to do that, we've learned, is to bring the data under management from the point of origination, and that's what we use MiNiFi and NiFi for, and then HDF to move it through its lifecycle. And, to your point, to take prescriptive action: we have the analytics, we have the insight, and we then have the ability to drive the best-in-class outcome based on the variables we know we're trying to solve for.

Is that happening? And there's a word, the acronym ACID, which of course is a transactional database paradigm. I hear that all over your story now in streaming. So what you're saying is that it's a completely enterprise-grade streaming environment, from end to end, for the new era of edge computing. Would that be a fair way to characterize it?

That's very much so, and our modeling strategy is always to then bring in the other best-in-class engines for what they do well for their particular data set.
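As a hedged sketch of that interactive, pre-event pattern, the snippet below consumes a Kafka stream and acts on a condition before the data ever comes to rest. It uses the kafka-python client; the broker, topic, field names, and threshold are assumptions for illustration.

```python
# Hedged sketch: the "interactive, pre-event" pattern described above, reading
# a stream as events arrive rather than waiting for data to land at rest.
# Broker, topic, and field names are illustrative assumptions.

import json
from kafka import KafkaConsumer   # pip install kafka-python

consumer = KafkaConsumer(
    "engine.telemetry",                           # assumed topic
    bootstrap_servers="broker.example.com:9092",  # assumed broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for msg in consumer:
    event = msg.value
    # Act pre-condition: flag an anomaly before the data ever comes to rest
    if event.get("vibration_mm_s", 0) > 7.1:      # assumed threshold
        print(f"schedule inspection for {event.get('device_id')}")
```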
A couple of examples of that: one you brought up, Kafka; another is Spark. They do what they do really well, but what we do is make sure they fit inside an overall data architecture that gives them access to a much broader central data set, one that runs from point of origination to point of rest on the whole central architecture, and they then benefit from our security, governance, and operations model being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, eliminate siloed data sets that just do particular functions. We give them the ability to have an enterprise modern data architecture, and we manage the things that bring that forward so the enterprise can have modern data-driven business models, by bringing the governance, the security, and the operations, and ensuring those workflows go from beginning to end seamlessly.

Here, go ahead. I was just going to ask about the customer concern. So here you are, you've now given them this ability to make these real-time changes. What's sort of next? What's on their minds now, and what do you see as the future of what you want to deliver next?

Right. Well, first and foremost, we've got to make sure we get this right: that we really bring this modern data architecture forward and make sure we truly have the governance correct, the security model correct, one pane of glass to manage it all, and really enable that hybrid data architecture, letting them leverage the cloud tier where it's architecturally and financially pragmatic to do it. We give them the ability to leg into a cloud architecture without the risk of either being locked in or misunderstanding where the lines of demarcation of workloads or data sets are and not getting the economies or efficiencies they should, and we solve that with DataPlane. So we're working very hard with the community and with our ecosystem and strategic partners to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So what's next is that, now that we have this high velocity of data through its entire lifecycle on one common set of platforms, we can start enabling the modern applications to function, and we can look back into some of the legacy technologies that are very procedurally based and are dependent on a transaction or an event happening before they can run their logic to get an outcome, because that grounds the customer in post-event activity. We want to make sure we're bringing that kind of functionality, supply chain functionality for example, to the modern data architecture, so that we can do real-time inventory allocation based on the patterns our customers are in, either how they're using the product or the frustrations or successes they've had, where we know through artificial intelligence and machine learning that there's a high probability that not only will they buy or use or expand their consumption of our product or service, but that they'll probably do these other things as well, if we do those things correctly.

Predictive logic as opposed to procedural, yes, AI.

Very much so. And so what's next will be bringing the modern applications on top of this that become very predictive and enabling, versus very procedural, post-event, post-transaction. We're a little ways downstream there. That's looking out. That's next year's conference.
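And as a hedged sketch of how an engine like Spark plugs into that same central stream, here is a Spark Structured Streaming job aggregating the Kafka topic from the previous sketch; the broker, topic, and checkpoint path are again illustrative assumptions.

```python
# Hedged sketch: Spark Structured Streaming reading the same central Kafka
# stream, one "best-in-class engine" plugged into the shared architecture.
# Requires the spark-sql-kafka connector on the classpath; broker, topic, and
# checkpoint path are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("telemetry-aggregate").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker.example.com:9092")
          .option("subscribe", "engine.telemetry")
          .load())

# Count events per Kafka key in one-minute windows over the shared stream
counts = (stream
          .withWatermark("timestamp", "2 minutes")
          .groupBy(F.window("timestamp", "1 minute"), "key")
          .count())

query = (counts.writeStream
         .outputMode("append")
         .format("console")
         .option("checkpointLocation", "/tmp/checkpoints/telemetry")
         .start())
query.awaitTermination()
```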
That's probably next year's conference, please. Well, Rob, thank you so much for coming on theCUBE. It's always a pleasure to have you.

Thank you both for having us, and thank you for being here. Enjoy the summit; we're excited.

Thank you. I'm Rebecca Knight, for Jim Kobielus. We will have more from DataWorks Summit just after this.