Live from New York, it's theCUBE, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Now your hosts, Dave Vellante and George Gilbert.

Welcome back to New York City, everybody. This is theCUBE. This is day three coverage of Big Data NYC, which we run concurrently with Strata. Scott Gnau is here, he's the CTO of Hortonworks. We've got a segment, Scott, on just Hortonworks. You came on yesterday with your colleague Ron Bodkin from Teradata and Think Big Analytics, so it's really great to have you back. Thanks for coming on.

Thanks very much.

So we just had Merv, and we were asking him, what's the big theme over there at Javits? And he boiled it down to two things, simplicity and the store. And a lot of people threw in several others over the course of the interview, talking about data in motion, being able to ingest data, all kinds of really interesting innovations going on. What's your take on the big themes that you're hearing this year from customers and the ecosystem generally?

I think the big thing, and we talked about this last time we were together in the summer, is moving from really cool tech to, what am I doing with it? How am I changing my business? How am I creating value from the technology? And as you start to see the early adopters get more mature and the second wave of adopters come in, it's natural to expect a drive for more simplification, right? Simplification covers a lot of aspects. It could be anywhere from, I don't want a command line, I actually want a GUI so that normal human beings can use it. It could be, how do I integrate the projects? How do I make deployment easier? How do I make governance and security easier? Those are all the second-order questions that we're starting to hear a lot of energy around.
Yeah, and on the simplicity piece, the other thing Merv was saying that I found interesting was that there are so many new projects going on, and you're closer to it than I am, but the skills that you learned a year or two ago are becoming outdated almost instantaneously. And that's a challenge for organizations. How do you see the ecosystem dealing with that challenge? Is it a matter of just more training, more resources?

Absolutely. There certainly is a great network of information flowing out there, being in an open world. It's easier to obtain the skills: you don't have to sign up for training and pay money, you can go online and learn things. So there's that aspect of the environment that we live in. But certainly what we're trying to do at Hortonworks is really innovate around the core rather than create a whole bunch of new, separate, different projects that you have to understand. We're trying to make it a little bit more consumable, but at the same time leverage that open community for innovation, and it's a balancing act.

Well, so follow up on that, and then George, I want you to jump in here. The big distro vendors, the big three, two of them sort of announced database projects this year, kind of end-running HDFS. You guys have a different strategy: evolve HDFS. I wonder if you could confirm what I just said, and I may have got it a little bit off, but then I wonder if you could talk about the Hortonworks approach.

Yeah, so again, our approach is really to innovate at the core and make the core better. And I think that's founded on a couple of the key constructs of our belief system in terms of running a business and the business model. The first is that you really can't beat the open community, right? And so we've seen huge innovation. This is where the whole Hadoop ecosystem really started: a community that's very open, leveraging innovation.
And from my history, having worked in a proprietary software company for a long time, one of the biggest differences that I see is that developers in the open world actually gain credibility by sharing ideas, versus in a proprietary development model, where having ideas in your head is your job value, so you sometimes keep more of it to yourself. So that openness in the community is really appealing, because people gain credibility by sharing ideas, and by sharing ideas you get more eyeballs on the problem. You cannot out-innovate the open community model. I think that's one of the core and foundational ideas we have.

And then obviously, getting into how we deliver and make the technology consumable, at this point, as I mentioned at the very beginning, it's about simplicity. So do I really want another project, or another six projects, or do I just want to make it work in the core? To the extent that some of the core underlying technology can be extended and will work, and we can leverage the open community, that's obviously the first strategy. If there's some net new piece of work that needs to get done, as an example, data governance is one of those second-order things that becomes very interesting. And so we went and worked with the community and some of our customers and built Apache Atlas as an open community project to go solve governance, because there really weren't assets in the stack before that covered that aspect. So I think sticking to the core, making the core work better, and leveraging the open community is more the strategy than creating 16 new projects to solve a very thin slice of functionality.

Just as a follow-up to that, Scott, one of the things that Databricks did to simplify access to all the Spark APIs was the notebook capability. So it's like one development environment.
If I date myself, it's maybe like Visual Basic making it easier to build Windows apps. But there's a Zeppelin project for Hadoop. Can a notebook simplify a bunch of separate projects so that a developer has that same experience, or a similar experience, to working with Spark?

I think you hit on another aspect of the simplification, which is really moving from command line to UI. And that is an area where we, through Ambari and Ambari Views in the management infrastructure space, have been starting to build that out. And yeah, I think there's a lot of promise behind projects like Apache Zeppelin, where again, you can have an open community development kind of environment, but actually take it up a level to the user experience. One of the things we talked about in the interview yesterday is having seen this movie before: getting to mass adoption means that you have to take away some of the difficulty and make it a little bit easier to consume. And so I think some of these UI projects are extremely important, because we can get more eyeballs, we can get more folks on board; it's very easy to come on board and take advantage of the technology.

There's a trade-off. The fact that you have so many projects gives huge choice. You can mix and match to build, say, an analytic pipeline in a way that would be much more difficult with a monolithic DBMS. But at the same time, if you've got all these projects evolving independently, putting a GUI over them can be a challenge, as opposed to putting them on one unified development cycle.

It can be a challenge. I think one of the things you'll see with rapid adoption and deployment is that as winners emerge in the GUI space, and certainly Ambari and Ambari Views are emerging as winners in that space, and I think Zeppelin obviously has that opportunity as well, a lot of the project owners will say, gee, it's to my advantage to fit into that framework.
So how do I code to that API? How do I make it more compatible? So you actually turn it around: the leverage is, how do I get more users taking advantage of the technology? And it becomes a pull more than a push.

Okay. So if you heard what I said last night on the panel: I was reading SiliconANGLE the other day, and they were quoting me from theCUBE, and as we were closing I said this space is crowded, overfunded, profitless, but has a lot of potential. And we talked about that on the panel. And I don't state things that are factually incorrect, but it has implications. And you can turn those all into positives: it's good that there's a lot of funding, it's good that it's crowded, there's innovation. My question, Scott, is there's a lot of discussion in the community about, geez, there's not a lot of IP because it's all open source. What happens to all these companies who have all this funding? Hortonworks isn't concerned about that; you have a different business model. I wonder if you could comment on that. Just the lack of hardcore IP that is ripe, let's say, for an acquisition, and how that relates to what you guys are doing.

Yeah, I mean, I think the open source phenomenon as it relates to this space is certainly very much driven by the rapid pace, the rapid change, frankly, the rapid growth of volume of data and sensors and standards and all of those kinds of things. So it's kind of like chicken and egg: did open source come first, or did the requirements come first? I think they really fit hand in glove. And so you're right, in an open world the environment is completely different. Your influence is not what's between your ears. Your influence is how you package it, deploy it, make it supportable for customers, and go find that business value rapidly and support your customers. And that's obviously the business model that we at Hortonworks have built around.
And I think it's a dramatic shift, where the value is not in owning the software anymore. The value is in delivering, deploying, packaging, and supporting.

Yeah, so actually there is IP there, you just don't charge for it.

That's correct.

Or you monetize it differently.

Yeah, we'll monetize it through...

Well, it's interesting, our data suggests that the number of people actually paying for things like Hadoop, for Hadoop specifically, is way, way up this year relative to last year. I think two thirds are actually paying; last year it was around 25%. So we're seeing a huge spike, even though there's a lot of talk about adoption maybe being slow and so forth. The number of organizations that are getting more serious is clearly on the rise. And you can see that, I'm sure, across the street. I wonder if you could comment.

I think, like anything, you move from science project to dabbling to trying out an application to moving into production. And when you start to move an application into production, that's really where paying for the Hadoop infrastructure becomes important and interesting, making sure that it's supportable. As those production applications start to drive business value and drive business process, having the ability to do change management and change control, understanding that it's been tested, understanding that it's secure, that you can do data governance around it, those things become more interesting, and I think that's where you get the on-ramp to, now I want to go with a trusted provider and get support.

Following up on that thought process, as you were saying, it makes perfect sense that as you go into production you need a support relationship. But is there an opportunity for the distro vendors to be even more proactive, not the break-fix model, but perhaps to peer into a customer's deployment, look at how it's operating, and proactively help them do better?
Yeah, and actually that's the next thing I was going to get to, so it's perfect. We talk about what the value is and how we go create value in this market. In an open source world, you have a support case; the support case comes in, and there's a bug, or there's a function that's missing that I really want. Working with a vendor that's got lots of committers, and has got a lot of the folks who can go impact the community and get those things in, not just a bug fix but actually a design or implementation change to really make the product better, I think that's really a differentiator in how you choose a distribution vendor, how you choose who you get support from. And that's certainly a big part of the value add that we provide: being able to drive back into the community not only changes, bug fixes, and stability improvements, but also, gee, we have a cluster of customers that really want to do this, let's get that added to our feature functionality.

Okay, let's talk about ODP a little bit. We had a panel last night. ODPi, sorry. What does the i stand for? I'm still confused on that.

It stands for whatever you want it to stand for. iODP, even.

Okay, so we talked about a lot of things, and really the purpose of the panel was to give us ODP's vision of the enterprise, and I think we openly discussed many, many things. So I wanted to ask, and George, help me here, the components of ODP: Ambari, HDFS, MapReduce, and what's the fourth? YARN, thank you. So can we talk about, in the context of the discussion we were just having, how you're evolving each of those? There's a lot of talk about, oh, we're replacing MapReduce with X. And obviously HDFS, you just mentioned that you guys are evolving that, and YARN was an innovation that you guys, the community, came up with a couple of years ago. So could you talk about those components and their roadmap?
Well, first off, I think the whole ODPi thing is a really significant and important thing in the industry, and I've actually been a supporter of the notion for some time. And that notion is that in this world where there's innovation coming all the time, new projects, keeping track of it all, having a core set of services and defining that as a common kernel matters, very much like in the Linux industry: there's a common kernel, there are multiple distribution vendors, there are other packages you can install, but there's a common kernel. That means that application developers can now code to the lowest common denominator and at least understand that their applications and the certification efforts are a little bit more streamlined and simplified, because they have that common kernel. So I think that's actually really important for the industry, so that we can avoid a lot of fragmentation and incompatibility, which would be very difficult for consumers of the technology to deal with.

So inside of that common core, obviously we at Hortonworks, as a driver and frankly a committer of the ODP core assets, want to make sure that we have extreme stability, scalability, dependability, all of the -ilities around each of those core components. And so you'll see us continuing to come out with new innovations. I would say probably the most rapid innovation you'll see, because it's one of the newer components, is in the Ambari stack: creating a common control and management infrastructure framework and user interface for management of the cluster. And inside of the other core components, we continue to drive new innovation in each of the other layers. You'll probably notice more in Ambari, however, just because it's a little bit newer than some of the other components.

What about MapReduce and the future of MapReduce?
The future of MapReduce: on the core tech, I think there are still some functionality and scalability and stability things that we can build into MapReduce. And it's actually an underlying core tech that a lot of applications are using. So I don't predict the end of MapReduce like some pundits have been doing. I think it will continue. Over time, I think there will be more engines that come along, that come and go, that serve different needs, and that's okay. It's an ecosystem play, and the center of the universe is really moving up to the YARN layer anyway. So being able to plug additional engines in to work alongside a MapReduce framework will be very interesting.

So MapReduce maybe becomes invisible, and the resource negotiator is what people are going to touch and feel. Go ahead, George.

Along the lines of ODP, there's commonality, and I imagine that makes it easier to support, with the movement to hybrid deployments on-prem and in the cloud. Do the value-add opportunities change for a distro vendor when they're in the cloud? Everyone's potentially on the same release; your ability to help people run is sort of the same across all customers.

I think the value proposition is a little bit different, and obviously we have been investing very heavily in that space. My prediction is that more than 90% will be hybrid. I don't think it'll be either-or, but probably both in most instances. And to that end, an additional level of management infrastructure becomes interesting. In the cloud, we've made some investments with our Cloudbreak technology. Cloud providers can provision the hardware; they can provide the bare metal, as it were. But then being able to provision: how do I want my Hadoop cluster installed? Which packages do I want there? Which projects do I want implemented?
Can I create a blueprint and make that replicable, so that I can easily spin up and spin down cloud instances? How do I handle provisioning and scaling? One of the big values behind cloud, and one of the reasons I think we'll see lots of hybrid implementations, is burst processing. On Monday morning from eight to noon, I need to double my cluster for processing, and then I'm done with it till next week. So being able to automatically spool up and spool down, have all of the services work, and have that all be streamlined, that's also part of the value add that we can put into the packaging and build around the support model.

Do you have a different set of competitors in that circumstance, where a Microsoft or a Google or an Amazon would potentially compete as the cloud part of the hybrid deployment? Or is that too difficult if you're the distro for on-prem, and they would try to be the distro for the cloud?

I don't see it as competition. I see them as providing the infrastructure as a service, and us being able to add value by helping our customers manage the infrastructure that they're provisioning from those providers.

Okay, so in other words, the assumption is that you would provide the manageability service.

We take those services and actually make them even more consumable for our customers. I think it's a win-win for us and the cloud provider.

So we're out of time, but Scott, last question: how are you spending your time at Hortonworks as CTO, and how does it differ from what you were doing at Teradata, running Teradata Labs?

You know, it's about the same in terms of how I spend my time. I probably spend about half of my time in front of customers, talking with them about what their needs are, where they're going, how we can help them, how we can help them be more successful.
And then obviously the other half of my time with the product management and engineering teams, talking about how we want to impact the roadmap, how we want to line up our investments and...

And cracking the whip.

Well, I don't know if I'd call it that, but yeah.

Herding cats.

Herding cats, right.

Excellent. Well, Scott, thanks very much.

Thank you very much. Great to see you again. Good discussion. Thanks.

All right, keep right there, everybody. We'll be back with our next guest. This is day three from Big Data NYC at Strata and Hadoop World. This is theCUBE; we'll be right back.