Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE, covering Hadoop Summit 2016, brought to you by Hortonworks. Now, here is your host, John Furrier.

Welcome back everyone, we are here live in Silicon Valley, in San Jose, for Hadoop Summit 2016. This is SiliconANGLE Media's theCUBE, our flagship program, where we go out to the events and extract the signal from the noise. I'm John Furrier with my co-host, George Gilbert. Our next guest is Arun Murthy, who's the founder and architect at Hortonworks. Welcome back to theCUBE, great to see you.

It's a pleasure.

I know we don't have a lot of time because you've got a hard stop, another keynote, big things going on here. Great show so far, day one of three days of wall-to-wall coverage from theCUBE. What's going on? We've seen a lot of change and a lot of acceleration in open source. DockerCon last week, container madness, really, it was like Burning Man for developers. You had geeks there, you had businesses, big names. You had VCs circling the hallways. Here you've got big-name partners, like Microsoft on stage, HPE, IBM, and you've got developers, same deal, data scientists. What is the trend with containers and this Hadoop ecosystem, and certainly Spark Summit? Again, that was exploding. What is it telling us?

I think what it's telling us is, from where I sit, as a community we spent the first 10 years of Hadoop building up the plumbing, whether it's Spark, whether it's Kafka, whether it's Hadoop, whether it's Hive, right? What enterprises are really looking for now is, you have all the Lego pieces, right? Now we've got to be able to put them together in a preconfigured way, so you don't have to buy the Lego pieces anymore. We've got the Lego pieces, we've been selling them. The question is, can you now buy a toy that's already been put together? Think of it as a credit fraud application.
Or a customer 360 application. We have all this data, and the enterprise clearly understands that the modern app is going to be a data app, right? It's fundamentally going to be driven by data, whether it's predictive analytics or other forward-looking stuff. So, given that we have all this data in the platform, we now have to make the platform easy for people to develop the apps on, and more importantly, let them get the apps off the shelf rather than have to put the pieces together themselves.

It's interesting, to your point about data value, Peter Burris just asked Rob Bearden about valuation, and George is doing a lot of research on that with the Wikibon team. LinkedIn sold to Microsoft for $26 billion. It's essentially a Rolodex, it's a social network. But the data set is valuable, and now we're seeing customers realize, wow, it's not the database or the scaffolding of the technology, it's the data sets.

It's the data itself, exactly.

How is that changing the game right now? What does that mean? Everyone wants to be in that position; $26 billion for LinkedIn shows the step-change in value that data can create.

Absolutely, and I don't know how much longer we are from a point where enterprises start listing their data assets on the balance sheet. Think about it: you can always buy real estate, that's just money, but you really cannot go back and generate the data. If you don't have the data, you don't have it. Software running 50 years ago isn't valuable in itself today, but IBM trades from the 1950s are still valuable if you can do analytics on them. It's the same vintage, but it's data versus everything else.

Ten years from now, they're going to watch this video and say, hey, back in 2016, before they had cars on the internet flying in the air, Arun Murthy said there might be data on the balance sheet someday.
So Arun, let's continue that and talk about applications. For apps to emerge out of an ecosystem, there's normally a consistent platform, but the Apache Hadoop ecosystem is growing so fast. How do you prescribe subsets of the platform as appropriate for different types of apps? And is there a forking with other distro vendors?

The way I look at this is, we could spend the next 10 years trying to standardize on everything, or we could go solve the business problems that exist today, right? You saw what Progressive said on stage. They've been able to pass on $562 million in insurance savings to their customers using data. Now, that gets people's attention. It means you and I are paying less for our car insurance, right? What I see, and what we see as a community, is that rather than spending all this effort trying to standardize for the next 15 years, let's make it trivial for somebody to take an existing set of technologies. Today it's Spark, tomorrow it'll be Flink; it doesn't really matter, right? Take Spark, take Storm, take Hive, put them together as a credit fraud application, which is what we demoed this morning. Put it in the marketplace and let somebody download that application. It's not going to be perfect, it's only going to be 85%, right? Allow people to modify the last 15%, but go run, right? So everybody can take the 85%, modify it to their use case, and then go run. And as long as you have enough people putting these up in a marketplace that I can download and run, that beats today, where I have to pull up Kafka, pull up Storm, pull up Spark, stand them up, and wire them together. That's painful. We cannot train people fast enough, especially, to your point, when these technologies keep moving. I'd rather have customers like Progressive download an app from a marketplace, run it, and then modify the last 15%. So this is a different go-to-market model.
It dictates a different go-to-market model from what we saw traditionally in packaged apps: first on PCs, and then even the bigger ones, where the change was to the business process, making it map to the application rather than the other way around, and it was painful. So it wasn't all that repeatable.

I think, to our point, what people have realized is that it's not about the application, it's about the data. So if you accept that you have the data, and you look at data as your permanent asset, then you can modify the application; it's not the other way around. You don't want to get into a world where every application holds its data hostage, which means a modification to one app screws up 25 others. Look at data as an asset, and then you can modify the apps, rather than having to go the other way around.

Jim Kobielus at IBM said something interesting about this. He said in the old world we had process, and the data came out of it, and the process could evolve. Here he's saying we have data, and the process can evolve.

And the process emerges. Exactly. But it means the app is always evolving, and that's a reality, which is why we have to embrace technologies like Docker. It allows us to evolve the app faster. So you might have one version of your data, one version of metadata. If the data changes, you can build a new version with a new pipeline, and you can store both data sets anyway, because it's economically viable on Hadoop. You store both data sets, you run the old version of the app with one set of Docker containers, you run another version of the app with a different set of Docker containers, and you can manage all of that consistently through Hadoop and YARN, because you're running all these Docker containers, and you get consistent metadata and security with projects like Ranger and Atlas.
So if you put all of this together, the other way to look at it is that it allows us, as a community and as an ecosystem, to deliver value to the customer on day one. Because right now, they spend four weeks trying to put Storm and Spark and Hive and everything else together, right? If you can get a preconfigured set of that out, then you can always customize it, but you're not starting from scratch every single time.

So would it be fair to say that your favorite go-to-market partners would be the big SIs?

Absolutely. And we're starting to see some of that in the open source community too, right? Look at technologies like Metron. Metron is an app; it's an app for cybersecurity, right? Pretty soon you'll be able to download that app and run it on your existing HDP cluster, right? And by the way, when I say an app, it's not just the technologies, it's everything including the user experience around the app. So you can actually go in and look at who is mounting a denial-of-service attack, who is committing fraud. You want to be able to package the underlying data technologies and the user experience around them and ship them as one; we call that an assembly. So that's an assembly for us, right? And the nice part is that you'll now be able to scale the assembly as a whole. You don't have to go scale HBase and Storm and Hive separately; you scale the credit fraud app, or Metron itself.

So one last question. You talked about containers and continuous delivery for agility, but it sounds like the machine learning component also directly derives the emergent logic out of the data.

Exactly, and that's really what we want. If you really look at what we see as a modern data app, predictive analytics and machine learning are a key part of it.
Yes, we've got the bread-and-butter use cases of BI and reporting, but that's stuff we've seen in the past, right? The fact that we can now do machine learning at scale, using technology like Spark, is really important as we bring these new breeds of data and new breeds of applications on board.

Arun, thanks for coming on theCUBE, really appreciate it. I know you're tight on time, but I want you to quickly summarize the theme this year from your perspective. I know it's energizing: the enterprise readiness, all that good stuff. Specifically, what are the core communities you guys are actively working in right now? If you could give us a list of the top three to five communities around the Hadoop Summit ecosystem, what are they?

Our top three. I would say we continue to drive innovation in the core Hadoop platform, to allow you to build all these apps with very simple APIs around it. Security and governance are super important. You can't have security without governance, and you can't have governance without security, right? That's why tying together technologies like Ranger and Atlas, with Ranger for security and Atlas for metadata, is super important. Machine learning continues to be huge; Spark continues to be one of the shining lights there. Streaming analytics is big, streaming ingest is big. We want to eliminate the distinction between data in motion and data at rest. You want to be able to join both of these data sources simultaneously; that's why NiFi is so important for us. And last but not least, I would say we continue to drive all the SQL set of things, because SQL has been around for 30 years, and it's going to be around for a long time to come.

Where does YARN fit in all that?

YARN is going to be the platform that makes it really easy for you to develop apps and run Dockerized apps on your Hadoop platform, so you can actually access the data and all the resources in there.
And last but not least, of course, cloud. You're spending a lot of attention and time on the cloud.

Cloud is another big one, yeah. So we're making all of this work everywhere the data is.

And IoT is certainly going to be in the mix. Okay, great. Arun Murthy, the founder and architect at Hortonworks, here at Hadoop Summit. Again, congratulations: 10 years of Hadoop, a great celebration going on, big party tonight, a lot of things happening. We'll see you around; have a cocktail later. This is theCUBE, live here in San Jose, in Silicon Valley. I'm John Furrier with George Gilbert. We'll be right back with more from theCUBE.