Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE, covering Hadoop Summit 2016, brought to you by Hortonworks. Now, here are your hosts, John Furrier and George Gilbert.

Okay, welcome back, everyone. We are here live in Silicon Valley in San Jose for Hadoop Summit 2016. I'm John Furrier with my co-host, George Gilbert. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. Our next guest is the president of Hortonworks. Welcome to theCUBE. We're here at Hadoop Summit 2016. You guys, at day three here, three days of wall-to-wall coverage, you just announced that this conference is going to be renamed DataWorks Summit.

Yes, we did.

Welcome to theCUBE. Why is it going from Hadoop Summit to DataWorks? I think I know the answer, but I want to hear from you.

Well, first, thank you for having me on here, John and George. Great to be here again. So, we think there's a whole evolution in the industry that we've talked about. This year is the 10-year anniversary of Hadoop. And we've seen Hadoop go a long way from where it started around batch analytics, what you can go do, MapReduce, et cetera, to today with IoT and machine learning and streaming and all of that. And we think, actually, Hadoop and the summit need to evolve as well. This is not just about Hadoop data anymore. It's about all data. And when you get to DataWorks Summit, it's really a reflection of all the other ways you can interact with that data.

And I know the name, really, DataWorks, Hortonworks, but data does work, and that's what people want, to put data to work. So, kind of nice, kind of like synergy in the naming. But, you know, categorically, we've been talking on theCUBE. We've been identifying that, you know, the big trend in the industry is to break down silos for the customer, data silos. Here at the event, Hadoop Summit, the conversation has constantly evolved. And Arun in Dublin said, it's beyond Hadoop.
We've been getting that message. So, is that really kind of the expansion? The focus is to kind of open up and have a much more comprehensive event?

It's about a much more comprehensive event, right, about technology. So, technology doesn't go away, right? What can you do around Hadoop and streaming and IoT and cloud platforms, et cetera. But we've expanded. It's not only on-premise, it's in the cloud. It's not just the hoodies and technology. It's also the suits on the business side now. And as you get to that extension, you start to say, how do you really help the business understand not what the technology does, but what it can do for them? It's a very different conversation.

Yeah, I really think that's a brilliant move on your part because, again, we're seeing a conflict. We just had US Bank on and he's excited, but yet frustrated. He's stuck in the middle between two conversations that he's battling with his constituents, his stakeholders. Business value really is around the analytics. That's always been the core conversation that Hadoop enables, among other things. And the technology is the moving train, right? So you have real growth going on on the technology side to scale with cloud and whatnot. That's the transformative aspect of the tech, but yet there's the blocking and tackling that he has to solve for the business side. He's stuck in the middle. Talk about this transformation concept and how you see Hortonworks and the ecosystem solving that problem, where customers shouldn't be confused. They should be straight on where they see the value.

So there's a couple ways you solve that problem, right? I'll give you one on the technology side. It's the emergence of cloud. The cloud experience is very different. The cloud experience is, I just want to consume. And I just want to consume it that way. I don't care about all the other things behind it, just allow me to consume.
So part of it is, how do you give them that experience so they can just consume what they want without worrying about all the underlying capabilities or the underlying formats, et cetera? So that's one thing that drives this. The second is this industry has moved from the early adopters, people willing to take the toolbox, assemble all the tools, and understand how to be a craftsman, to people who say, just give me the application, show me how this works, and let me go use it in the context of my environment and allow me to do what 10 others before me did, because there are already proof points. I don't need to assemble it myself. I'm just going to go take what works and go do it. So give me that application.

That's awesome. Would it be fair to say that, using Geoffrey Moore terminology, the application that helped Hadoop cross the chasm was ETL offload? Was that the one that helped pay for putting the infrastructure in place?

I would say ETL offload, absolutely, George, was the one that allowed Hadoop to cross the chasm. What allows you to cross the chasm is a repeatable use case, and that's the repeatable use case that people start doing for cost savings. But now that they've done that and the data's in one place, as they start to ingest more and more data with NiFi, put it in one place and correlate it, they get to a very different point to say, what else can I go do with all that data, like predictive analytics or increased basket analysis in retail? Because now you have the foundation from the ETL data, now I can expand to all these other, candidly, much cooler use cases that move the needle.

So that's sort of my follow-up, which is, if that was a common app that got most people across, it was sort of a cost carve-out. Are you able to identify yet, let's say, the top three new use cases, or are they so broad that it just doesn't make sense to try and categorize them?
No, so if you go across, there's obviously very specific ones by industry, but if you go across industry and say, what's the commonality? A single view of customer. How do I get a single view of that customer? If I'm a retailer, what's my omni-channel experience, from how they interact with me on the web to when they're walking through the store, and how do I provide a real-time, in-context offer? Still single view of customer. Financial services: they own a mortgage, they own insurance, they have a checking account, can I cross-sell them other services? That's one common use case. Second, predictive analytics. I have assets, machines, running, IoT. How can I get more utilization out of that asset by keeping it running longer? Predictive analytics. Those two are driving most of what we're seeing in terms of the opportunity.

Yeah, that's interesting on the ETL. I'm glad you brought that up, George, because most people don't know what ETL is these days. It's certainly changing, and a trend that we're seeing, certainly in the entrepreneurial circles of technology, is that machine learning is talked about everywhere, but when you apply machine learning with ETL, you have a more intelligent ETL. You have AI, so AI is at the top of the conversations. This is kind of the new thing that's coming out of the ecosystem. Which is why DataWorks is a great name, because it's not just about Hadoop. It's the combination of the platform opportunities that's going to enable better machine learning, an AI-like approach, where ETL eventually goes invisible. I mean, ETL technically should go away. Your thoughts on this, because ETL has always been that, well, we do ETL, but that's now going to change. Do you see that, and does that fit into your vision?

Yeah, so I'd say take three stages of this. So ETL, call that stage one. Taking some of the core workloads off the data warehouse and running them in Hadoop.
Second one is, think now of all the BI and analytics workloads, OLAP, and all the other things. We announced a relationship with AtScale very specifically around how do you take that next tier of workloads that can start to now run in Hadoop and get the same benefit. Now once you have all that, now you can start to do more of the iterative machine learning on top of all that data, bring in all your other data sources, social, clickstream, et cetera. And that machine learning, why is that important? Because everything I just said on ETL and what you do with more of the BI tends to be more on static data. Yes, it comes in new, but it's static data. Machine learning is dynamic. How do I constantly refresh that? And it's really the emergence of the data and AI, artificial intelligence, so that now I can get better insights dynamically.

Following up on that. You had a prescriptive platform to say, if you want to do ETL offload, these are the pieces you need. And I guess most of those came in your distribution. But now with predictive analytics, if you want application vendors to target a certain prescriptive platform, what might have to go in that, whether it's by partnership or organic development?

So, now what we're starting to see is, think back to the database days here, the database and the application. Now we're seeing the emergence of a series of applications, data-focused applications, not as much user-focused, data-focused applications that can run on the platform. If you look around the show floor, there's difference here, there's one over there. DataRPM, which is now a predictive analytics framework and application that does iterative machine learning running on the platform. You're starting to see the emergence of these companies in that next wave of innovation.

This is interesting, because you guys talk about this connected platform concept.
I want to get your thoughts on it, because non-stationary data really is where the vertical applications kind of do their heavy lifting, but they have to be connected in. It's not just about the silos. Take us through where the evolution of the connected platform's going and how that relates to some of the core things that you guys have done.

So, three vectors, right? Vector one is, I have all this data that I want to consolidate into a data lake on-premise. So, for all that, my data rests on-premise, right? Vector two is, I've got all this data that's external to me, right? Could be social signals. It could be somebody walking through a store and the beacon goes off and you know where they're standing at the time, what they're looking at. How do I, in real time, ingest that? Probably in the cloud, and go bring that data in, right? And then third is, how do I correlate all that with a common platform that allows me to do things like security and governance? And I know where the data is and I know it's secured. Whether it's in the cloud or on-premise, all that has to tie together and provide that platform service so the apps can emerge on top.

And that value is going to come from platforms or apps or both?

That is going to come from both, because the way we get to position ourselves is that we get to be the platform that people build apps on, but we also get to be the platform for platform providers. We get to be the platform that the other platform providers build their platforms on and take out to market.

Yeah, and I think that's great because your customers here have app developers. So one of the things that came up this last week was DockerCon. It shows that there's a real thirst for enterprise developers. In this new conference of DataWorks, are you guys going to look at expanding the agenda to include things like Spark, these other hot areas that are kind of categorical, mostly the business tracks, data science? You know, we're talking about Pivotal.
Data engineering is a huge concept that's exploding in terms of a new persona. Obviously the chief data officers. Then you've got the developers. So how do you guys look at the next Hadoop Summit now changing over to DataWorks? How do you view that? What's the internal conversation?

Probably a number of tracks, right? This is based on the interest we're getting from the community. This isn't us, right? The community is asking for this, right? They're thirsting for it. They say things like IoT and streaming, Internet of Things and streaming. Think of that as a track of what you can go do. So where you were just going, think of the DevOps track. How do I do assembly in real time of different components and package it to go run in a DevOps-type environment? So containers, assemblies, how do I go do that? That would be a second track for that type of person. Third is just raw operations. How do I run this at scale? How do I go run this and really make this work at scale? And then I'd give more around data science and machine learning. What can I do around data science and machine learning? All with the same foundation again. Hadoop Summit doesn't go away. Hadoop is still a big piece of it.

We had Joel Horwitz on from IBM and he was saying that, you know, they've been more parochial from an IBM perspective, but, you know, it's not just about Hadoop anymore. It's about the entire ecosystem. That data. And that's why people are confused by the ecosystem. I want to get your thoughts on this because people are like, oh, Hadoop ecosystem, but it's not just a Hadoop ecosystem. What other ecosystems are converging? You mentioned IoT, but from the feedback you're getting, what are the core ecosystems? If you had to kind of label them, what would they look like?

So definitely the Hadoop ecosystem, right? What I would call the traditional data warehousing, right, and that type of ecosystem. Now the streaming ecosystem of companies saying I need to move data.
Every car that's traveling down the road, right, is now being equipped with far more than enough compute power to go transmit that data back and also send messages back to that car. It's a moving computer. It's moving metal and moving iron. What can you go do with it? All of those ecosystems are coming together.

All right, final question is, what do you say to the folks watching that are customers who are saying, oh wow, sounds like a relief. I have one place to go, but they still have to get their answers. They're stuck and confused and trying to figure out how to make technology that's a moving train, and standards that are still developing, work. They're under pressure to provide business value with analytics and apps. They're kind of stuck in the middle. What's your advice to them and what are you guys doing to help solve that problem?

So if you ask any of the panelists and the 20 customers we had on stage talking, they all say the same thing, which is, first thing is just get started, right? Don't overthink it. Don't over-engineer it. Just get started with the process. That's step one. The second is you can get a lot of help and assistance. Obviously from us as a platform provider, what we do to help you get started with it, but also from all of the people in the ecosystem here. We just announced a managed service provider program, which is all around somebody saying, you know what, I'll go take this off your hands. I'll just go run the whole thing for you. I'll just make this work. Those are the opportunities that you have.

And assemblies might help too, and reference architectures, things of that nature.

Absolutely, absolutely. There'll be tracks on that in the next event. There'll be definitive tracks on that in the next event because that's where this industry's going.

Herb, great vision, love the new event. Hortonworks announcing an industry event, DataWorks Summit, that will expand and rename the Hadoop ecosystem into a data ecosystem, which is really about time.
Congratulations, it's something we've been kind of banging on the table about here on theCUBE. You guys are hearing it from everyone else too. It's beyond Hadoop at this point. Thanks for sharing. This is theCUBE, going beyond Hadoop with the DataWorks Summit announcement here. We'll be right back with more coverage here in Silicon Valley after this short break.