Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015, brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and Attunity. Now your hosts, John Furrier and George Gilbert. Okay, welcome back, everyone, live in Silicon Valley. We are here at Hadoop Summit 2015. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, founder of SiliconANGLE. I'm joined by my co-host, Wikibon's new big data analyst, George Gilbert, and our next guest, Shaun Connolly, VP of Strategy at Hortonworks. Great to see you again. Every year we have you on to lay out the chessboard for us. There you go. The Hadoop ecosystem, and now we have a lot of customer data to discuss. Welcome to theCUBE. Yeah, the game continues to evolve, and congratulations, George. Great to see you on the team. Great to see you. Awesome, great. And thanks for coming on. ODP, first question out of the gate, ODP. Boom. ODP, what is it, what's going on? Yeah, so I categorize it in two halves. I'll talk about the value and why it's important. And then there's the other half, which I'd make akin to reality TV: there's the housewives of Atlanta, the housewives of New Jersey, the housewives of New York, and it came out of the gate as sort of the housewives of Hadoop type of reality TV show, which is the uninteresting part, from my perspective. The most interesting thing is... The bickering you mean, just the, yeah. The sideshow of the bickering. Exactly, he said, she said, and it loses sight of what we're really trying to accomplish in the Hadoop market, which is how do you enable the broad ISV ecosystem to build solutions easily on a common platform, right? And all the different permutations of versions and stuff make it very complicated for software vendors as well as solution providers to get their technology procured by the enterprise, right?
So if you're built on a version that's incompatible with what I've standardized on, then you're held up in procurement. And so it slows the market down. So simply, it's ISV enablement and enterprise procurement, enabling more solutions to flourish and be acquired more quickly. Really simple. So let's talk about the value proposition of the market right now. There's a big discussion on Twitter and certainly here on theCUBE around crossing the chasm. Yep. Because, you know, we even tried to bring it up with Merv, but he wanted to bring back the trough of disillusionment, which is his framework. But you had Geoffrey Moore, multiple keynotes here. He's been on theCUBE. So we will stay with Geoffrey Moore for a minute. Sure. So, crossing the chasm. We've crossed the chasm, that was in the keynote. Is that the industry crossing the chasm? I mean, I can honestly say George and I debated this. I feel we've crossed the chasm from an industry perspective because it's robust here. You look behind us, people are buzzing, more leads, bigger names, more operational use cases. It's just, it's growing. And so I think we crossed the chasm. But to the customer, are they viewing the chasm crossing as a watershed event? Or are they just saying, hey, still a lot more work to do, or at least letting us hang on to the edge of the chasm? There you go. I think the chasm provides us a Where's Waldo type of metric. The customers, they'll stratify across the technology adoption curve. Early majority, which is where we are. Late majority, laggards, right? Those are sort of the last three phases of the market, once you're on the other side of the chasm. It's very palpable internally when we set our strategy. I plotted where we were and I plotted Merv's numbers into that. And I was like, I do not disagree with the numbers and how they chart out on where we are. 26% adopted so far with another 11% coming in the next year. That's almost a 40% growth on top of that in one year.
But from a market adoption standpoint, I would say it's a barbell: the very large and sort of the small, and the medium, the in-between of the barbell, is starting to fill up. And that's where the meat of the market adoption happens. Yeah, I would agree. And I think Merv's right. And he says the glass is half empty or half full. His survey says that about half of all enterprises are considering with respect to Hadoop, at two purchases or something to that effect. I mean, it's a little bit deeper. Go to Gartner and get the report. But that's successful. I mean, those are good numbers to be seeing. I'd say it could be larger personally, but that's a big number. Yeah, I did some looking into that. In the post I had on enterprise Hadoop adoption, half empty or half full, I basically posited that it's a more legitimate comparison if you look at where the relational database market was about five years into its journey. So I did some quick looking, right? So, let's say it was the late 70s when it actually got started, round up a little bit, right? So '85 or '86. I encourage you to go look at the revenue numbers of Oracle at that time. Interesting data point: Oracle, March '86, went IPO, right? They had 23, 24 million in revenue, and then 50 million-ish in annual revenue the year that they went public. You convert that into 2015 dollars, it's a 2.2 factor. They were 50 to 100-plus million in 2015 dollars. That's where this market is as well, and it's growing aggressively, right? So the stats are not unusual, right? We've seen the story. So it's a trajectory that pattern matches to other software models. And if you look at the adoption curve, that's the inflection point: 99 customers and 105 customers in our last two quarters, which are over 40% of our total customer base. The inflection point has arrived.
On Oracle, you have a great memory to go all the way back to '85, and I think their sort of initial use case was reporting, because back then no one trusted the RDBMS for OLTP. But if you had to pin down one application for Hadoop crossing the chasm, what would you pick? Well, that was one of the questions I asked. There was sort of a panel yesterday, right? So if ERP and OLTP was the killer app, if you will, for the relational database, where is it or what is that for Hadoop? Because we're seeing supply chain optimization, we're seeing a variety of use cases, right? Does the internet of things, if you listened to GE's portion of the keynote yesterday, Vince Campisi from GE Software talking about the industrial internet, is there a killer app that emerges out of that? I would argue the machine and sensor and internet of things use cases represent about 30% of the interest in our platform. So that's coming on really strong, so I think there's something in there. It has not materialized yet, I would say. And a good bit of it is that it's a multifaceted platform for many different use cases, right? Whereas I think the relational database was tuned to do certain things very well and not be so multifaceted in its approach, Hadoop is very much multifaceted. Well, you could argue that with the acceleration of value in this market with cloud, it might be hard to do a straight-up comparison, but I see your point. But I do agree we're at an inflection point, right? I mean, I would agree with the analysis as a way to understand and rationalize the trajectory. Yeah, it just feels very palpable. Yeah. And the question now is, at the inflection point where it kicks up, where are you in that curve? How far down are you? Is it the base of the inflection point? If that's the case, is the angle of the curve straight up? Is it going to kick up again? So I think these are the things that we'll be watching. Certainly George will be tracking that.
So I want to get your take on that point. With cloud powering hard right now, you've seen a lot of cloud action in the enterprise. And certainly public cloud, we know that's out there. Google, Azure, and Amazon. But now VMware, EMC, others, a real heavy-duty cloud push which will shape the enterprise data center. Converged storage, flash. Well, that's been playing out for a while. It's been playing out, so that's baking out. That's coming hard. What's that going to do for the growth here? What's your take on that? Because we're speculating that that could really change things and make the inflection point kick up a little higher. What are your thoughts on that? Have you looked at that in the analysis? Well, it was funny, I did a joint webinar with Lance Olson from Microsoft, right? And it was, from a cloud point of view, is it up there? Is it down here? What is it? And I'm on record as saying yes, it's both. Because the gravity of data is going to dictate where the workload is going to be. And then with the internet of things, if it's born in the cloud, then you're going to want to land it and analyze it in the network where it's born. In many cases, there's a lot of data inside the enterprise. And so you need to have a strategy that gives you an architecture that can span those. And that was one of the things in Arun's keynote yesterday, where he demonstrated an easy click to spin up Hadoop clusters for Spark machine learning in the cloud. So you could democratize the data in the cloud so people can do that in a very agile way. I think that's the key. You mentioned a couple of things I want to unpack and drill down on. The diversity of use cases: you can't really say that's the use case that's going to blow everything up and make it great. Because there are so many different value propositions across the board, it's hard to put your finger on one, the bowling alley comment.
But on what you said, the pattern that I'm seeing with cloud is, whether it's an economics issue or a workload, compute, or resource issue, that'll sort itself out in my opinion. But the issue of standing stuff up fast, that is the common thread. We had Teradata on earlier saying, hey, with Presto, why go buy a Teradata license and go through all that work to do an experiment? Exactly. So I think that thread of standing it up is the true cloud push, with no risk. Well, mitigated risk. There's another side to that, which is, once you've stood it up, how hard is it to operate? And does the cloud help with that? Or the continuing work with Ambari and Zeppelin: it's not a veneer, it's a suite. It's a user experience for a specific end user. For the admin, for the developer. I mean, to what extent does that play now that you've enabled it to work with the cloud and get it up and running fast? Well, and the thing is, there are multiple levels of choice in the cloud, right? So Microsoft has the Azure HDInsight service, right? So from an operational perspective, they handle all that for you. You just swipe your credit card, spin up a Hadoop cluster, spin it down. For those who want to have more control over what goes in the cluster and turn more knobs and dials, they can certainly use the technology we showed in the demo, where you could deploy on the cloud as infrastructure as a service. And the cloud is broad in that case, right? It could be OpenStack, VMware, public cloud. It doesn't matter in that case. But you should be able to appeal to the range of needs and the range of audiences, right? So if you're a small to mid-sized business and you're not very technically savvy, then you're going to go to Azure HDInsight and build your solutions and get started very quickly. I've got to ask you a question from a tweet that was sent yesterday. So you were quoted as saying, gravity can determine where data is stored and/or processed in a multi-data-platform environment.
I think that was your quote, right? Okay, so what does that mean? Be specific, kind of drill down. Gravity in terms of the app workload? Gravity in terms of infrastructure? So I've been in the application platform business, right? So JBoss and those, right? And I switched to, so Hadoop, we call it the data operating system, right? I love that, by the way. It's a data-centric platform for your apps to run co-located with the data. So gravity means that if you have an app and it needs to pull a lot of data for the application logic, then you're stuck drinking it through a straw, right? That's not interesting, right? You need to solve that data problem. It's not performant either. Right, so what you need to do is get the apps as close to the data as possible. If the majority of that data is born in the cloud, then you want to bring the apps to that data. If it's on-prem, then you want to bring the apps to the data there. My point is, yes, it's both. So you need to bring the platform out to those areas so you can actually operate on the data where it resides. That's awesome. I've got to ask you to follow up on that, because I asked CTO Scott, your new CTO, congratulations, by the way, great guy, Cube alumni with Teradata Labs in the past, super smart, to explain tools and platforms in this environment. Because what you basically described is an agile platform. Basically, it moves around; it's intelligence, the systems of intelligence that George is promoting in his research. If you have, with virtualization and all this kind of technology, the ability to move stuff around rapidly, it can be efficient. So that changes the game from the old school. Containerization is changing it even further. This is DevOps. It's all beautiful stuff. We geek out on it. I've got to ask you the question here. We talk about this with the VC community and startups and also big enterprises. Oh, a tool is easier. I like a hammer. I can bang some nails in. Oh, a platform is very heavy.
I've got to spend a lot of dough. So are those old classic definitions all still relevant today, or are they now transitioning and merging? What's your take as the environment changes? What are tools? What are platforms? Is there a distinction between the two? Yeah. So one of the analogies I use is Facebook early on, which was a social website. Then they actually explicitly made it into the Facebook platform. And that's where it expanded its growth, with the ability for people to bring their workloads onto the platform. If we relate it to the strategy at Hortonworks, it's a very similar model, and that's why you see us partnering with the likes of Pivotal around HAWQ and IBM around Big SQL: I want their application ecosystems to ride on top of the platform. Why? Because it drives more customer value, and it drives the industry forward and breaks down the barriers. So you think it's a smart move by theCUBE to have theCUBE and then have theCUBE platform? Absolutely. Exactly. Transient. Which we're doing, by the way. I was quoting these from five seconds, quotes from you from 20 to 90 days ago. And you need people to bring their apps to the platform, right? Ha ha ha ha. Content. Yeah, in all seriousness, but that is the model. Land and expand, very cloud-centric. Get some successes, and then figure it out from there. So tools is not categorical, as in I'm stuck as a tool guy or a tool vendor. And the thing is, platforms are inherently, if you do it right, and that's why we feel YARN is so important, is there will be a next big engine. And we want it to snap in and participate with all that data and bring additional data to the platform as well. So it needs to be future-proof. The platform needs to adapt to what's coming next. When you say to enable the next engine to slide in, are you thinking of Spark as an example of that? Spark's one. You have the DataTorrent guys, who just open-sourced their stuff as Apex, right? They are built natively on YARN, right?
They're very familiar with YARN. They actually did a lot of work to enable Apache Kafka on YARN, right? So that's the beauty of open source: you get this technology out there in a very democratized fashion. And if you make it easy to slot it in and plug it in, then those workloads can come onto the central platform and participate, and they won't be their own silos. And that's really what we're after. It's classic development. You have a branch of open source, and people can come in and tap off that core branch, if you will, to kind of use a GitHub example. Okay, so I'm going to take that concept to the next level. Open Data Platform is about consumption. You mentioned ISVs and customers. So I talked with a lot of your ODP partners, Open Data Platform partners; we talked to all of them. And the common answer when I ask, is ODP real, from a customer standpoint, not from a housewives of Hadoop perspective? They all say, look, here's the deal. We love open source. We're going to continue to contribute to open source. It's so fast. There's so much good stuff going on in open source. Our customers don't move that fast. And also our organization has multiple elements, and we can't even herd our own cats fast enough, those aren't my words, to get around a code base. So the value in their mind is to have one set of code that they can support for their customers. That seems to be the common thread. And it's a common, it's a... Is that right? Would you say that's number one? It's consuming a common version, because if you look at the Apache projects, there are a lot of branches, there are a lot of versions. It's very confusing as to which one you build on. And then you find each ISV that wants an out-of-the-box experience will build their own permutation. So you have this weird infinite version hell, if you will.
And so if we can all agree on, here are the various stops along the way, and just mirror those versions into the ODP community from the Apache projects themselves, that just simplifies the consumption process. So it's not about changing the nature of how the projects are innovated, because innovation will not stop; we will not slow that down. But we have to have waypoints where people can get on the bus or on the train, so to speak. And then let the customers ultimately decide. So I've got to ask you an Apache question as we kind of wind the segment down. When the history books are written, you said you've been in the software business, we've talked about this in the past. And I've been dealing with Apache for 10, 15 years. Our generation, our age, has lived through, I would say, first-generation open source, all the way through. And now it's so awesome, open source is so kick-ass, it's just a whole other ball game. But I want to ask you a question: when the history books write about Apache and its impact, what are they going to say? What is going to be the write-up? What was the impact of the Apache Software Foundation on the industry? So I would say it will be rated as one of the top five most influential innovators in the industry, period. History, historically. Yep. Yeah, up there with Apple, the web, everything. And just a few years ago there was an article posted where it was Google, Apple. Apache was the number one; there's the Linux Foundation, right, there's the Eclipse Foundation and other things that were on that list at that time. But I would say hands down, from an open source community perspective, the Apache Software Foundation has done more to shape enterprise IT consumption of open source technology. What was the key thing in your mind, if you had to point to a couple of key elements? Because with open source, some people look at it as a bunch of guys writing code, drinking Red Bull, going out drinking beer, whatever they're doing, coding away. Right.
What is it about Apache? What did they do differently? What was the tweak, what's the big deal? So Apache has some pretty strong sort of rules, if you will, and their role is pretty straightforward in that they want a thousand flowers to bloom. They're not going to say HBase is better than Accumulo, even though they're both NoSQL databases. They're going to say they are my children, they're both great. So they're not going to dictate what versions of these things should work together. Their goal is to enable the best communities to flourish. And they police checking your vendor badge at the door. They don't want vendor shenanigans in the projects. It's about community, it's about contribution. And it's a forcing function for that to happen. So I'm a vendor in that space, but I have to respect the Apache process, and I have a lot of respect for that. Yeah, and it is historic. Thanks for sharing. Shaun Connolly, VP of Strategy at Hortonworks. We're here inside theCUBE. We'll be right back after this short break.