Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016, brought to you by Hortonworks. Here's your host, John Furrier. Okay, welcome back everyone. You're watching theCUBE here for another event. We are live in San Jose in Silicon Valley for Hadoop Summit 2016. The hashtag to join the conversation is #HS16SJ. That's Hadoop Summit, San Jose. You can go to crowdchat.net/HS16SJ. I'm John Furrier, with my co-host here, Peter Burris. You're watching theCUBE. Rob Bearden, CEO of Hortonworks, back on theCUBE. Great to see you off the keynote. How are you feeling? Doing great, a lot of energy here. Great attendance, tremendous partner showing. The community's here in full force. Of course theCUBE's here. So we're excited about the next few days. Thanks for having us. Again, three days of wall-to-wall coverage. You're watching theCUBE, SiliconANGLE.tv. Rob, I got to ask you. One thing I noticed when I signed in today was some huge names as sponsors, big names from the enterprise, Microsoft. Again, the logos and the support that Hortonworks is getting have been phenomenal. Talk about the dynamic there, because we're seeing ecosystem at all the events we go to with theCUBE. Developers and ecosystem success. You're seeing partnering front and center in this. How is partnering changing now? What is the state of the union? What's the current landscape? When we saw you in London, you said the maturation has to go faster, the acceleration has to go faster, got to get the cost down lower, complexity reduced. What is the partner update? That is a critical story here this week. I think with the partners that you mentioned, whether it be Microsoft, Google, you see on the floor virtually every major enterprise player from IBM to HP and everybody in between. But also, I think part of that is, what's the broader community? You have great representation from Kafka, from Storm, right?
Of course the YARN and Hive communities and Spark communities are very vibrant here. And what it's really saying, I think the signaling there is, Hadoop has become an enterprise-viable data platform. And the major enterprises are adopting it and bringing it into their mainstream strategy for managing data and to modernize their data strategy. And because of that, the major enterprise players, as we mentioned, whether it be Google, Microsoft, IBM, HP, and everybody in between, are here because their customers are adopting Hadoop at scale and en masse, and enabling their modern data architecture with and around Hadoop. And so it's very important that they ensure that their platforms, their tooling, their products are part of those reference architectures going forward. And it's important that we as Hortonworks enable our platforms to work seamlessly with their environments as well, to create a very good solution set. So we've seen a perfect storm developing this year, certainly going through the summer. We'll see it kick up in the fall with a lot of big shows coming. And I'm sure the theme will continue. We saw DockerCon with container madness, going crazy. Developers are really engaged in that interoperability. You guys have been fighting that battle in this ecosystem of big data. It's not just Hadoop anymore. A lot of other data management solutions, data warehousing. What does the simplicity equation look like? Because it's not about one string where everything's lined up and you pull the string and it works. People want Lego blocks. They want to have a developer-agile environment where they can do that. What is the critical success factor for the customers to be successful? Well, step one was making sure that Hadoop had the ability to become an enterprise-viable data platform.
And so that it transitioned from just being a single-dataset, single-application, batch-processing environment to truly being able to drive a central architecture with batch, interactive, and real-time, and then having the enterprise services as part of the native architecture. So things like the security, the governance, the operations, lifecycle management. And those are the things that have transitioned Hadoop into the mainstream modern data architecture. And so that's sort of phase one, and we'll never finish that, but I think we're past the tipping point. I think that's evidenced by, when you signed in, the people who are here, the people who are on the floor, the over 4,000 people attending. But what we now have to do is take those next-generation modern applications that the developer community is now building and make it very easy for them to deploy those applications onto and into the data that Hadoop is now able to manage, and to make it very easy for them to interact with the data, access the data, and take that data and turn it back into an application that they're building. So the next step in this is certainly making sure that we're giving the tooling there and the access to the developer community, but then more importantly, giving them the ability to deploy those payloads and interact with those payloads via containers in the next architecture. And as you see some of the additional keynotes, see some of the tracks, you're going to see very much the containerization that we're enabling through the core architecture of Hadoop via YARN. Because the next phase of adoption is all around the developer community being able to deploy their apps in and on the data, and vice versa.
So George Gilbert's going to be helping host theCUBE through much of the next couple of days, but he did a piece of research recently that's going to be published sometime this week that looks at the concept of adaptive stretch. Adaptive stretch effectively comes from an economist by the name of Brian Arthur. And it talks about how a platform, when delivered, is associated with a set of problems. And then the community will stretch that platform to its near breaking point. And then it snaps back into place and you move on as you take on new problems. Hadoop's been out for about 10 years now. We're celebrating that 10-year anniversary at this great event. And Spark is an example of looking at the adaptive stretch with Hadoop and saying, well, let's bring in a new complementary set of capabilities. But as you think about the Hadoop ecosystem and the new types of problems that Hadoop's going to be associated with, how do you see the tooling evolve over time? Is it going to mush together more? Is it going to get more specific simplicity? I know governance is a big issue. Where do you think this group of people is going to focus their attention on some of those issues over the course of the next year? I think that's a great discussion for us to have. I would actually ask us to look back. During the first five years of the 10-year journey that we've been on, the first generation of Hadoop was really batch-oriented, right? And very monolithic, single applications, single datasets. But it really transformed how that kind of workload and dataset could be managed, in the horizontal scale-out that it brought and the scalability that came with that for that particular use case. And it goes back to the economic principle that the community's then going to stretch that into the next generation. And I would suggest that's the formation of Hortonworks and what we did with YARN, right?
To take Hadoop and open it up architecturally, expand it and transform it to the next generation of being able to bring all of the disparate data into a central platform, and then be able to bring batch, interactive, and real-time applications with YARN as the resource manager on top of that. And what that did was it opened up a much broader opportunity set. And I've said from day one, for Hadoop to reach its full potential, no one company can do that. I'm making your point again. It's about the community. It's about the ecosystem. It's about the developer community. And so what we did as Hortonworks is we took it to the next generation with YARN. And now we're experiencing the third generation and the third push at that. And so we see then where, if you think about Spark and what it's done, it's a tremendous engine that can leverage the power of YARN. And what you now will see is, okay, now that we're managing exponentially more data for mission-critical real-time applications, right? We must make sure that we can lock it down from a security and encryption standpoint. We must be able to have that hand in hand with data governance. And then in parallel with that, now to sort of get to the final part of your question, what's next? And it's going to be about not just being able to manage all of the enterprise data that sits in the data center; it's about being able to work across deployment architectures, cloud and on-premise, and be able to do that very transparently, right? And what we're also doing is we're now pushing to the fourth generation of that with all of the streaming datasets, right?
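The role Bearden describes for YARN, one resource manager arbitrating shared cluster capacity among batch, interactive, and real-time engines, can be sketched in miniature. This is a hypothetical toy illustration, not Hadoop's actual implementation; the class, application names, and memory figures are invented for the example, and real YARN adds schedulers, queues, locality, and preemption on top of this basic idea.

```python
# Toy sketch of YARN-style resource arbitration: one ResourceManager
# grants "containers" (bundles of memory) to competing applications.
# All names and sizes are invented for illustration.

class ResourceManager:
    def __init__(self, total_memory_gb):
        self.free_gb = total_memory_gb
        self.allocations = {}          # app name -> list of container sizes

    def request_container(self, app, memory_gb):
        """Grant a container if capacity remains; otherwise refuse."""
        if memory_gb > self.free_gb:
            return False               # app must wait for capacity
        self.free_gb -= memory_gb
        self.allocations.setdefault(app, []).append(memory_gb)
        return True

    def release(self, app):
        """App finishes: return all of its containers to the pool."""
        for gb in self.allocations.pop(app, []):
            self.free_gb += gb

rm = ResourceManager(total_memory_gb=16)

# Batch, interactive, and streaming workloads share one cluster.
rm.request_container("batch-etl", 8)
rm.request_container("hive-interactive", 4)
rm.request_container("storm-streaming", 4)

# The cluster is now full; a new request is refused until capacity frees up.
assert rm.request_container("spark-job", 2) is False
rm.release("batch-etl")
assert rm.request_container("spark-job", 2) is True
```

The point mirrored from the interview: the engines differ, but the arbitration of shared cluster resources is centralized, which is what lets disparate workloads land on the same data platform.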
And be able to bring real time from the point of origin of data, all the way through its lifecycle until it comes to rest, and give the developer community and the enterprise the ability to interact with that data as events, as it moves, as events happen and conditions change, and to be able to take proactive action against that based on a rule set, and do that across a cohesive architecture. Because from our view, it's got to be about connected data platforms, managing all data, not just centralizing data at rest. It's data at rest, data in motion, across an entire lifecycle of that data. And that's what we're doing with HDF, HDP, and then making sure that we're opening that up for the community, the developer environment, the developer platforms, to build those next-generation apps, right? That allow, now, again, a stretch again, right? Sorry for the long answer. So data at rest, data in motion, connected systems are all themes that you just mentioned, but I want to drill down to go to the next level of that, which is enterprise readiness. On the CrowdChat, Bert Latimore pointed out the main theme of the conference is enterprise readiness. Not a lot in the general session about that, but let's go there. The enterprises just want to be ready. So this is a skill set issue. Adopting technology is difficult, and generally speaking, how does an enterprise get ready to go to that next level? Yeah, that's why it's a three-day summit, right? So day two, day three, we've got to keep the cameras here. I'll be here. Come on, do it. Yeah, but a little bit of a teaser for you guys to come back and see for the next couple of days. But no, I just- So we're going to hear more about that tomorrow. Oh, we're going to hear a lot more about that tomorrow and the next day.
But in general, it's very important that you have central security, central governance, and central management platforms to manage the data, whether it be on-prem or in-cloud, and have a cohesive, common architecture for managing all data, irrespective of where it's deployed. Again, let me make sure we're clear on this. You're saying central security, central management, that doesn't mean central data. Correct. Got it. Central security, so you want to have a central security, governance, and management architecture that then can span to wherever that data may be. Maybe it's at rest in Hadoop, on-prem in the data center; it very well may be in the Azure cloud, or even multiple clouds. Or IoT, around the corner, is coming really fast. And that's that fourth-generation principle. Right. And then when you look at the stream and being able to go out to the point of origin, either at the collection device or at the sensor itself, and to be able to have a common method for managing and governing that data and its associated rules, irrespective of the deployment architecture or whether it's in motion or whether it's at rest, right? And that's the first step. That's actually the second or third step of making sure that it is enterprise-ready, so that it absolutely can have the data integrity, the predictability, the security, the governance. But then we have to make that easier to use, right? We've got to have it rock solid. Now, in the meantime, we want to provide the tooling, but that's where we then open it up and let the community, let the ecosystem build the best-of-breed methods to do that, and then the next-generation applications and tools for managing that. Now, clearly we're providing that tooling. If you look at SequenceIQ and what SequenceIQ does, it gives the user the ability to move data between cloud and data center, and do that in a very clean, drag-and-drop way. So I got two quick questions, Rob.
The first one's a little bit more complex; maybe hopefully you can give us a simple answer. The first one is: centralized rules for governance, security, management, decentralized data. It would make it a lot easier to cohere and put all that together if we had a clear understanding of data value. We're moving to a world where the data is becoming the primary citizen in a lot of systems, but we still don't have a good model for data value. What do you think about that? In my keynote, that was one of the three topics I said I hoped you would take away. And the first one was: data is your product. It's kind of the same principle, data's your product, data's your value, right? And that's where the tech has to be able to bring that normalization and consistency, irrespective of where that data lives, whether it's at the edge, whether it's on or in the device, whether it's across multiple clouds. And that's where our products, like our security products and our governance platforms, are striving to get there as fast as we can, with and in collaboration through many of our partners, to make sure we're thinking about those problems at scale. And we get a big benefit with our founding partners. Yeah, I think we're going to make the discussion about data value in general, as a community, a lot more explicit. The second thing, very quickly: our clients are starting to say, we're learning a lot about how this stuff works, and it's starting to bleed into management routines. So it used to be data in support of existing management. Now we're actually seeing adaptive stretch in management, so empirically based management kinds of things. That's also going to be interesting to see: how the Hadoop community prepares itself to actually support brand-new ways of thinking about governance, not just in the data, but across the entire corporation.
And I think that's a key principle. If we go back to the first generation, the evolutions of the tech: Hadoop was able to bring all that data together and understand the patterns that were associated with our engagement cycle with the customer or product or supply chain, right? And that's great. There's tremendous value in there, because we can understand a whole new level of analytics and engagement with our customers or products and our supply chain. But we need more velocity than that, and it has to happen much faster than that. And that really drove our core strategy and ultimately the acquisition of Onyara. Because we believe we needed to be able to engage with the data from the point of origin, all the way through the movement and the lifecycle of that data across the stream. And the reason that is so incredibly important is that you want to have the ability to take action or to drive a process as an event occurs or a condition changes, right? And so in real time, as that event happens or a condition changes, we can prescriptively act against that based on a value that we're trying to manage to or for. And then what we'll stretch on top of that is we'll be able to apply machine learning to optimize the most efficient and productive outcome, so that as these events are happening, we know how to drive that to a finer grain of execution based on the value of that data, where it is in its lifecycle, and what conditions are happening or changing. And we will know what prescriptive action we take for it, with it, against it, right? And then from that, we can continuously optimize and learn through traditional machine learning algorithms. And the great news is all of that's happening right now. Rob, I wish we had more time. I know you got a hard stop. I had a zillion questions on the growth strategy. And I know you're in a quiet period, so you can't really talk about the numbers. Great to see you.
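The pattern Bearden describes, evaluating each event against a rule set as it arrives and taking prescriptive action on a condition change, can be sketched in a few lines. This is a hypothetical illustration, not Hortonworks' HDF/NiFi code; the rule predicates, event fields, and thresholds are all invented for the example.

```python
# Toy event-stream rule engine: each event is checked against a rule set
# as it arrives, and a matching rule triggers a prescriptive action.
# Field names ("sensor", "temp_c") and thresholds are invented for the sketch.

def make_rule(predicate, action):
    """A rule pairs a condition test with the action to take when it fires."""
    return {"predicate": predicate, "action": action}

def process_stream(events, rules):
    """Evaluate every event against every rule; collect the actions fired."""
    actions_taken = []
    for event in events:
        for rule in rules:
            if rule["predicate"](event):
                actions_taken.append(rule["action"](event))
    return actions_taken

rules = [
    make_rule(
        predicate=lambda e: e["temp_c"] > 90,
        action=lambda e: f"throttle {e['sensor']}",
    ),
    make_rule(
        predicate=lambda e: e["temp_c"] < 0,
        action=lambda e: f"flag {e['sensor']} for inspection",
    ),
]

stream = [
    {"sensor": "pump-1", "temp_c": 72},
    {"sensor": "pump-2", "temp_c": 95},   # condition change: over threshold
    {"sensor": "pump-3", "temp_c": -4},
]

print(process_stream(stream, rules))
# ['throttle pump-2', 'flag pump-3 for inspection']
```

In the connected-data-platforms framing from the interview, the same rule and governance logic would apply whether the data is in motion at the point of origin or later at rest, which is the "common method" the conversation keeps returning to.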
Give me the final word for the folks watching. What is the vibe of the show that's currently going on, and what are they going to hear over the next three days? We're going to hear how Hadoop's become mainstream, and the importance of bringing data at rest and data in motion together, being able to manage all the data through a seamless lifecycle, and how to drive value back into the business by doing that through various types of use cases and tools that they can leverage. Energy's great. Thank you guys for being here. You give great exposure worldwide, and we're very, very happy to have you and appreciate the opportunity to be here on theCUBE. Thanks so much to Rob Bearden, sharing his insight and data on theCUBE. Centralized theCUBE here. We don't care where the data flies. Enjoy the CUBE content anywhere around the world. Appreciate your time. Rob Bearden, CEO of Hortonworks. You're watching theCUBE. I'm John Furrier with Peter Burris. Be right back with more live coverage after this short break. You're watching theCUBE.