Live from the San Jose Convention Center, extracting the signal from the noise, it's theCUBE, covering Hadoop Summit 2015. Brought to you by headline sponsor Hortonworks, and by EMC, Pivotal, IBM, Pentaho, Teradata, Syncsort, and by Attunity. Now your hosts, John Furrier and George Gilbert. Okay, welcome back. We are live in Silicon Valley at Hadoop Summit 2015. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier with SiliconANGLE, joined by my co-host, Wikibon's new big data analyst, George Gilbert, of wikibon.com. And our next guests, two great guests: Ryan Peterson, Chief Solutions Strategist at EMC, and Scott Gnau, CTO at Hortonworks. Welcome back to theCUBE. Guys, back again, together, EMC, Hortonworks, welcome to theCUBE. This is the place to hang out at this event. You guys were on earlier individually, but I want to talk about the storage aspect of big data. We were just talking earlier with Scott about how to get customers into a situation where they don't get a large cost increase when they have success, or how to create some sort of foolproof environment where you can have some agility and flexibility for the app guys. So as an architect at an enterprise, how do they figure this out, Ryan? Because I don't want to get locked into EMC, I don't want to get locked into NetApp or anybody else, I want my data to be free. Yeah, we've talked about this ad nauseam, and I completely agree. You've got to maintain flexibility across the whole stack, whether that's swapping out the particular storage infrastructure, or swapping out the distribution for the right distribution, or having the right tool sets and being able to run whatever application you want. You've got to make sure you have that flexibility, that avoidance of vendor lock-in. And as much as I work for a storage company and would love to lock in my storage infrastructure, the reality is that customers need that choice, they need that flexibility, or it's just not going to work out long-term for them. And one of the things that interested me about being in the software business: at EMC World, Jeremy Burton said that I think 90% of the engineers, maybe 95%, are software, not hardware. Yeah, and honestly, we very rarely talk about hardware anymore. Think about it: we created OneFS as the storage system for Isilon, as an example, and most of that is just code on the software side. To some extent, it's what the community is doing around HDFS, and that's the one competitive point between EMC and the actual community. Everything else in the stack, we love it, we support it, and that's how we go to market. What's interesting, we had talked to Shaun Connolly, and he's the strategy guy, he's got the chessboard, and he's always fun to talk to because he's got color in how he describes it. But I asked him, why are you guys winning? And the analyst who did the research was up on stage, what's his name, Thomas Del Grequillo I think, was on stage with the research. He said Hortonworks is winning the largest deals in terms of large enterprises, not the largest number of deals, but the big enterprises. And I asked him why that was the case, and what was interesting, Scott, was that as companies get organic growth of deployments inside the company, it's diverse.
You have a little bit of Isilon here or EMC, you've got open source over here, and then they've got to rein it in. It's not shadow IT, because it's actually happening inside the company. So it's going on, and at some point they go, okay, we've got to rein this in. It seems that's where we are right now. So how does someone figure this out? I'm an architect at a large enterprise. It's not just an RFP; I actually have to think this through. I have to say, okay, how do I rein it in without disrupting the innovation? That's the question, Scott. How do you look at that as an architect? Okay, rein it in, but don't screw up any innovation. Well, it's not easy, it's somewhat daunting, right? But I think it really does get back to trying to define the data fabric for the organization at a logical level, and then really looking at the total cost to the organization, and the different standards that have been implemented across the organization, and where the intersection of the data fabric logically lands based on the different use cases and the standards that have been chosen. And so, like we've been talking about, it's about ecosystem flexibility and having choice, and then letting those architects have the freedom to go design and define what they want, versus kind of the other way around: I've chosen a standard, therefore everything is going to be this. Ryan, I've got to ask you the EMC question. I know you're a little bit different inside of EMC because you've got a technical background and you've got kind of a visionary DevOps vibe to you. The old EMC, go back to the old days: very competitive, very sales-oriented culture, win the account, elbows out, lock everyone out, win everything, burn the village, EMC wins. Because back then you could be a single-vendor storage shop, and you had your arrays of storage and some files. Now it's big, it's huge, and with the growth of data, EMC doesn't even have to do much to win because there's more demand for storage than ever before. The only thing EMC has to do is provide a great solution for developers to store stuff. That's DevOps. Did I get that right? Is that the vision of EMC? I mean, if you just get out of the way of the innovation and just be the best storage for apps and data. Yeah, I think in general, as you know, we've gone through a pretty significant transition over the last couple of years especially. The company started off with a bunch of scale-up applications; it really made its way through the mainframe era and has been around for a long time, and that was a very significant portion of the business. What's changed is really the entire persona of where EMC is going. We like to say EMC is no longer your father's EMC. And by that I mean people like Jeremy Burton and Jonathan are going out and changing the vibe, the culture, completely. I mean, we're jeans-wearing, although I'm not today, jeans-wearing, t-shirt-wearing, trying to join Silicon Valley in the direction things are going. And as a result, we're doing things like open-sourcing projects. I think when I started at EMC, I would never have thought that would be a direction for us. And yet, just a few weeks ago, we open sourced our first set of projects, and I think we have about five or six or seven projects already open sourced. I expect that to be a continued trend, as we'll continue to look at how we change our business model to fit in and match the way the world is going.
It's a very unique... Scott, talk about this, because we were just talking about your statement about Presto and open source. There's now a playground and a real track for these trains to run on, for big companies to transform themselves with open source. How does a company transform itself with open source? Because in the old days, the way to kind of get in was, well, not open source, but standards bodies, which open source basically is, in a way, a de facto standards body where the community self-polices itself. Vendors would go in and meddle and screw things up and try to have agendas, the IETF, a long list of history we've all seen in the past. But now open source is more community-based. Those shenanigans don't happen because the community will self-police. It's like its own governing body, if you will. But how does a company, a big company, come in and do open source? Well, I think that, number one, stepping up to go do it takes some strategic thinking, right? Because it's not going to be the norm. And number two, encouraging a community to get built around the code that's been contributed is, I think, a very important part of the credibility factor, because yeah, it's open source, but is it still kind of a single code base that two guys wrote, versus a community of people that are actually making contributions and trying to compete for better algorithms? I mean, the whole beauty of open source is that you get more eyeballs on the problem, right? And so promoting that community aspect, I think, is a really important part. So taking the strategic step to do it is probably a white-knuckle decision in some boardrooms, but then investing to build the community, I think, is a key success factor. It's a whole new customer constituency, if you think about it that way. It's the old Dale Carnegie book, How to Win Friends and Influence People. In open source, there's a playbook for that. It's called build good code and ingratiate yourself into the community. But actually, do you have to build open source in from the start? Or can you take something that was built proprietary? The thought process is different, and of course anyone can publish the source code, but is it the same as designing it to be open source? I'll take that. It's an interesting question, because we just did this. So we built software called ViPR Controller. We put tons of money into going out and marketing this ViPR Controller software, and of course it made quite a big splash. But we realized at some point that what we're doing with it needed adoption by companies outside of EMC; it needed other storage industry providers, for example, to start integrating into the solution. We were finding that, because we created it, they weren't really that interested. So we took the code as we had developed it and we open sourced it. We literally put it up, and we call it CoprHD, pronounced "copperhead." And CoprHD went out just a few weeks ago, and I expect it to drive significant changes, even to Hadoop, because as we start to integrate with the different storage systems that are already in the data center, we're re-utilizing those assets. How can we do things like take Ambari or YARN and start integrating directly into those storage systems at the click of a button? Well, that's what ViPR Controller was built to do, and now we've open sourced it.
We didn't go and build it from the ground up for open source. We took the code that had already been developed and said, here you go. So it's not necessarily in the format the open source community might want, but it's there and it's clean. Guys, talk about the relationship between EMC and Hortonworks. Obviously Hortonworks' whole strategy is pretty clear: open source, all pure, all open, all the time, consumption with ODP, versus what are essentially closed ecosystems, right? So ecosystem partnering is a big deal for Hortonworks. So talk about the relationship. Obviously, with enterprises you guys have a huge presence, EMC and enterprise storage, and like I said, you're already there. Hortonworks is emerging as a leader in open source and large-enterprise standardization. What's the relationship? I can tell you how it started. It was prior to Scott coming on board, but we had joint customers that said, hey, we've got Isilon, we want to put Hortonworks against the Isilon cluster, how do we do that? And of course that forces some conversations in the very beginning. And we said, well, we do want to work together. And ultimately we launched this relationship officially in February, and already we have dozens of up-and-running installs in the customer base. So I think it's successful in that regard. How we take it to the next step is: do we build better hooks, better integration? Do we just make it easier for our customers to consume? So, for example, we built Ambari plugins directly into our code set, so if Ambari reaches out to an EMC cluster, we respond with all of the right responses that a normal ODP implementation would provide. And doing those things, I think, is helpful in providing a good integration. But I don't know, Scott, what would you add to that? I don't know that I can add a whole lot more other than, yeah, customer demand and customer choice are really, really important. And we talked about it before: there's the logical view, and then there's a physical implementation view, and in this world it's not going to be one-size-fits-all in how things get deployed. So creating more choices, making sure that the choices are tested and certified, and that we can do better engineering together, I think, is key to this. Yeah, but to your point, we talked a little bit about this data-being-free thing. This is really kind of a key thing; my mind is still getting around it. If I'm a developer and I'm driving data, I need to have that data available. I don't want to get caught in a net of migration. So how do you guys address that? EMC, answer that question: how do you help me if I'm an app developer? Say I'm an app developer: we agree, Hortonworks, Hadoop, I'm in. But I don't want to get stuck if I'm successful and blow up in costs just to go do better stuff down the road. Hadoop implementations always seem to start pretty small. You know, if you're at 20 nodes, you're probably actually a pretty decent-sized install for a beginning cluster. And what tends to be the problem is not when you're starting out; it's when you get to the very successful range and now you're having to figure out how you manage potentially thousands of nodes. And sometimes you didn't plan for that. Your data center's not ready for that, you know, the power, the cooling. You don't have the resources of Yahoo and Google to go out and build a new data center on the water to cool your infrastructure.
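To make the kind of integration Ryan just described a little more concrete, here is a minimal sketch of what consuming an HDFS-compatible storage endpoint, such as an Isilon cluster, from standard Hadoop-side tooling can look like. The hostname, port, and directory below are hypothetical placeholders, and this uses the generic pyarrow client rather than any EMC or Hortonworks plugin code.

```python
# Minimal sketch: pointing standard Hadoop-side tooling at an external
# HDFS-compatible endpoint (for example, an Isilon SmartConnect zone).
# Hostname, port, and paths are hypothetical placeholders.
from pyarrow import fs

# The same endpoint that fs.defaultFS in core-site.xml would reference,
# e.g. hdfs://isilon-smartconnect.example.com:8020
hdfs = fs.HadoopFileSystem(
    host="isilon-smartconnect.example.com",
    port=8020,
    user="hdfs",
)

# List what sits under a shared data-lake directory; any HDFS-speaking
# engine (MapReduce, Hive, Spark) can read the same files without copying.
for info in hdfs.get_file_info(fs.FileSelector("/data/lake", recursive=False)):
    print(info.path, info.size)
```

The point being made in the conversation is that because the storage tier speaks the HDFS protocol, an Ambari-managed Hortonworks cluster can address it like any other HDFS endpoint instead of requiring a separate copy of the data.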
And so what we do is look at those kinds of things: how can we bring down the cost? So, for example, erasure coding has been built into Isilon for many years, and with ECS, which is our geo-scale solution, we do erasure coding across different nodes, along with XORing, to try to reduce the overall overhead and really keep that cost as low as possible. We add feature sets that care about efficiency, like deduplication: so not only do we do erasure coding, we also dedupe blocks at the 8K level. Some really great things to try to keep it down. Finally, and I think something you and I have actually talked about before, we try to keep it open and flexible for the tool set you're ultimately going to consume the data with. A great example is customers using Splunk and Isilon; there are quite a few. And those customers are ultimately saying, I want to look at the same data that I brought into Splunk, now in Hortonworks. Well, how do I do that? And we're one of the only solutions that I know of that can ingest all of that data from Splunk, drop it into a central data repository, and ultimately that data becomes available to Hortonworks in the next step. So guys, I'm going to get your perspective on a more technical question. You know, in the tech industry we had Moore's Law. Moore's Law kept increasing performance: price drops, performance goes up, then there's a new processor at a high price, and then Moore's Law continues again, and that created a lot of innovation. Is there a similar metaphor or analog in the open source Hadoop world, or the big data world we live in? Not necessarily a Moore's Law, but something Moore's-Law-like where I can understand the cadence of innovation. Because what we're really talking about here is, if this plays out, then things like Docker containers, Kubernetes, and orchestration in the cloud expand the overall functionality. Certainly virtualization has done a lot to change the game. So as all this tech comes together, this operating system builds out. I mean, it's like a distributed mainframe, if you will. So how do we think about the cadence of innovation? Is there a way to think about it? Have you guys thought about that? Any personal thoughts on this? Well, yeah, certainly cadence is an important thing, right? And the more mature something gets, the more it will start to slow down. And so that's why I actually believe in the model that we're building, where we rely heavily, at Hortonworks at least, on the Apache community, being very open to the innovation, and then being able to deliver a regular cadence that's predictable for our customers: kind of draw a circle around the latest of each of those things, package it up, do the testing, make sure it's supportable and make sure there are no regressions in it, so that it's dependable when we release it. That's kind of the best of both worlds. So we're not limited on the input, but we will at least look to contain the output so that it's predictable for enterprises. By the way, if companies want to get ahead of the release, they can go to the open source and pull it, right? Do it on your own, at your own risk. Yeah, and those are the guys who eat glass and spit nails, the hardcore ones. They're going to go get that and maybe commit some of that code, you know? Exactly, exactly. Roll their own Linux distro.
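As a rough illustration of the cost math behind Ryan's earlier point about erasure coding and 8K-block deduplication, here is a minimal sketch. The specific erasure-code parameters (a 10+4 layout) and the block-hashing approach are illustrative assumptions for comparison purposes, not EMC's actual implementation.

```python
# Rough illustration of why erasure coding plus dedup lowers storage cost
# versus plain 3x replication. Parameters are illustrative assumptions only.
import hashlib


def replication_overhead(copies: int = 3) -> float:
    """Raw bytes stored per byte of user data with N-way replication."""
    return float(copies)  # 3x replication -> 3.0 bytes stored per byte


def erasure_overhead(data_chunks: int = 10, parity_chunks: int = 4) -> float:
    """Raw bytes stored per byte of user data with a k+m erasure code."""
    return (data_chunks + parity_chunks) / data_chunks  # 10+4 -> 1.4


def dedupe_ratio(blob: bytes, block_size: int = 8 * 1024) -> float:
    """Fraction of fixed-size 8K blocks that are unique (lower = more savings)."""
    blocks = [blob[i:i + block_size] for i in range(0, len(blob), block_size)]
    unique = {hashlib.sha256(b).hexdigest() for b in blocks}
    return len(unique) / max(len(blocks), 1)


if __name__ == "__main__":
    print(replication_overhead())   # 3.0 -> 200% overhead on top of the data
    print(erasure_overhead())       # 1.4 -> 40% overhead for the same protection class
    # A payload with many repeated 8K blocks dedupes well:
    payload = (b"A" * 8192) * 8 + (b"B" * 8192) * 2
    print(dedupe_ratio(payload))    # 0.2 -> only 20% of the blocks need storing
```

The design point in the conversation is simply that moving from whole-copy replication to parity-based coding, and then deduplicating at a fixed block size, compounds the savings as a cluster grows from tens to thousands of nodes.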
But at the same time, I do think it's important for all of us to work with the community to try to keep it from slowing down, because, you know, as committees get larger, sometimes decisions get slower. And so, luckily, we haven't seen a slowing of the cadence today, and certainly part of our strategy is to make sure that doesn't happen, and it's having good projects out there that provide value. But is it just the structure of Apache? Since they're not the ones trying to put all the pieces together, it's almost like, you know, let a thousand flowers bloom. I don't know if you were the one that said that earlier. I mean, ODP is the one that then kind of packages it up into something that's consumable. Yeah, I see that as kind of a separate thing, but also somewhat important, in that, for the industry, for the Hadoop landscape in general, I do believe it's extremely important that we figure out a way to agree on a common kernel at certain points in time, to make it easier for app developers to develop their apps and not have to worry about this distro versus that distro, only really looking at the kernel version, just the way Linux works today, right? But isn't that ODP's role? And certainly that is the goal of ODP: to try to create at least that common set of services for application developers so they can have kind of a universal thing to test with. You know, one of the worst things that I think we could do as an industry is to let that kernel fragment into multiple different things. I totally agree. Then it would just not be sustainable, and the innovation would actually slow down dramatically. Yeah, I would agree 100% with that. What's your take on this whole thing? Is there a way the common tech person, think of a journalist or a blogger out there, could just understand: okay, it's good, things are healthy, green light, go? I have an interesting thought on this, a couple of thoughts. One is, I've always been told that whenever you have an idea, someone already had it, and there's a good chance somebody's already implemented it. I think a lot of our cadence of development comes from organizations pushing us to be innovative, to the extent of, hey, why haven't you guys put this into the environment? Oh my gosh, I didn't even think about that. Yeah, let's put that on the list and take a look and see if it makes sense for the overall organization. I guess my point is that it's not so much a cadence as it is just the raw speed of development and innovation. People constantly have a different thought and they're putting it into the next thing, and if there's an opportunity to innovate or an opportunity to make money on it, then somebody's going to put it into place and try to make it happen. I think we're going to start seeing things that nobody ever thought would happen with Hadoop in the very beginning. I think we're going to see applications developed, and we're not going to be talking about big data or analytics or any of the things that are traditionally the stories today. I think we're just going to be talking about scale-out applications in the data center. I think that's going to be the future, and when that starts to happen, there's a whole new level of innovation.
It's really hard to think about, and I ask the question because it's a mind-bender of an exercise. If you believe Rob Bearden's electricity analogy in the keynote, which was very interesting, it enables more stuff, obviously, and the quality of life increases; and then Herb was talking about railroad standardization, you know, the standardization that creates this fertile soil, if you will, with Hadoop. It's pretty interesting. So it might be that there's no analog; it might just be a complete changeover. So to me, I'm trying to figure that out: how do I describe the level of innovation from Hadoop and big data at a level people can understand? It's really challenging. We study history to determine what the future will look like, and if you look at history, this has been a repeating pattern. It just hasn't happened yet, in my opinion, in this particular industry, which we'll call the data center at scale. We've had things like Microsoft Windows: Windows 1.0, I remember, was a lot of custom development, custom programming, and then all of a sudden 3.11 comes around and you've got a pretty stable platform, and then Windows 95, and everyone just adopted it massively. Of course, the same thing happened with the iPhone. The device comes out, it starts with just a handful of applications, and now there are a million-plus apps out there. But the platform had to get created, developers had to start understanding it, the applications start to innovate, and as soon as there are applications that are changing the world, that drives the infrastructure, drives the platform underneath. I think that's how things will work out. We've got to wrap, but I want to get closing comments from you guys, just the last word. Good relationship; it's good to see EMC and Hortonworks partnering well together. What's next for EMC? What's next for you guys on this open source roadmap? More coders, more action, meetups, crowd chats, CUBE interviews. What are you guys going to be doing out there to get the word out? How are you recruiting people? Just give us some insight into what's on the agenda for the next year for you guys. You know, the next big thing is we've broken up our strategy into multiple components. The components are: how do we continue with the data lake strategy that's been going really well with Hortonworks, for example, and then start to look at the fringe cases where things just don't work today. Think flash, for example: very, very high-speed flash performance, getting into the millions of IOPS range in very small windows. Talking about DSSD? Yeah, we're talking about DSSD, and starting to look at DSSD as an example. Cloud-scale applications, like what we're doing with ECS, and integration of ECS to full automation. We're looking at how we can give you choice from a customer perspective: how can we give you the choice to put any particular platform underneath this great scale-out Hadoop stack? All right. Scott, any final word you want to share with the folks out there? I would just say the data lake is not a one-size-fits-all thing, and it's going to become even less of a one-size-fits-all thing over time as use cases evolve. And so having choices, like we mentioned: SSDs, rotational drives, high-capacity drives. And what we see from hardware vendors coming in the future is that there are going to be a lot more choices.
So being able to integrate that and provide flexibility at an API level for our customers, I think, is a joint value. Integration and growth, that's a big theme here. Guys, thanks for joining us. This is theCUBE. We're getting the hook; we're running long with these good segments. We'll be right back after this short break. This is theCUBE, live in Silicon Valley at Hadoop Summit 2015.