Live from New York, it's theCUBE covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Okay, welcome back, everyone. We are live in New York City for Strata Hadoop. Big Data NYC is the event, and this is theCUBE live for three days of wall-to-wall coverage. Day three, all the action's kind of coming together, we're here in theCUBE. I'm John Furrier, the founder. I'm still going to hang on. This is Joel Horwitz, a VP of marketing analytics at IBM, and Ryan Peterson, Big Data Ryan, chief solutions strategist at EMC, and we have exclusive content here. You know, theCUBE is, we're breaking news, we're putting it out there. Not necessarily hard news, but this is actually significant industry-related information, so guys, let's talk about the IBM-EMC relationship, you know. This is a market where customers say, guys, figure things out, just make it all work. That's the word we're hearing from customers. We want solutions, we know there's a cobbling together of multiple technologies, it's the one cohesive fabric. IBM, EMC, guys, all work it out, be friends, be competitors, whatever you do, you got to work it out. So guys, what's the news? I'll take a quick stab at it. You know, this is a concept of co-opetition, right? So we obviously have business units that compete, and we have business units that work together. But customers are looking for an open infrastructure. They want to be able to choose what they want to choose. And in organizations as big as we've gotten, we really can't be 100% competitive. We have to figure out where we work together. And customers reached out and said, hey, I love this product. I got to say, I mean, the stuff that IBM's been doing at the analytics level and the tool sets they've been creating, really incredible. And the customers are picking up on that. And we said, well, we really need to make sure our joint customers can work well together. And so I'm really excited about this.
So IBM, obviously, doing great stuff. I mean, from a company standpoint, we've been following you guys now on the messaging side for multiple years, executing the vision. And you guys are winning, doing great. Might not get the press headlines with all the IBM news. Oh, stock buybacks, all this happening, but you guys are executing well. Obviously, Bluemix is doing well. It could go better and go faster. And you guys are working on that. Big data, you guys got the Watson announcement. You guys are investing in moving fast. What does this deal mean for you guys in IBM? Yeah, I mean, you know, we see our main value add. We go deep on analytics, right? And so I see this as just an extension of that. I think one of the things to comment on what we're doing together is the fact that our clients are asking for choice. Our clients are asking for less complexity. Our clients are asking for an open community to work with. And I think that's what EMC, IBM, Hortonworks, Pivotal, all of us are trying to accomplish with this ODPI, is truly taking the friction out of the system. You know, it's a systemic problem right now that we have with Hadoop, where a lot of people are making money on creating friction, frankly. And so my point of view is frankly that we need to remove that so the entire market can grow. Because otherwise I see it stagnating. And so I think this, you know, these types of things that we're doing together, these types of partnerships, I think are really exciting because- Can you name names on who's creating friction and who's monetizing friction? There's a lot, you know. How much time we got, you know? So what we do in the media, we create friction and then kind of clean it up after. We make money on both sides, friction and love, love it. But this is a love deal. But I want to press on that point because this is something that's important. Open source just tends to be a jockeying game, arm wrestling, whatever you want to call it.
Hey, you start a project, I'll start this project, the glass is half full, the glass is half empty, it's always been this way. But the ball moves down the field, the evolution happens. But right now the customer demand for solutions is pretty clear, the straight and narrow there at some level. The technology stacks are not moving along fast enough from any one vendor. So now it's the collection of the parts of the pieces. Right, well that's where the value is, if I may interject. I would say that the value is how you couple these pieces together. And IBM knows this. I mean, ever since we made our decision to support the Linux Technology Center and we got the open source software and a bunch of things. We've kind of been working in this business model for quite some time. And so this is just an example of that. I think to put a point on that, an exact customer is a telecommunications example. We've been storing all of their CDR content on Isilon for actually many years now. And they said, you know what, we just want to start analyzing this content. So they looked at IBM and brought them in, plugged in the software, connected it right to Isilon and it started working, no problem. All right, so let's get specific. What is going on? It's Isilon? What's the relationship between you guys in the big data space specifically between EMC and IBM? Be specific. So we used ODPI, which is a great way to get started. So the benefit now, and I haven't had a chance to really talk about the benefits of ODPI for us. But you know, we have to validate the RPCs all work with HDFS. And so we do that with Hortonworks and Pivotal today through ODPI. Well, when the customers... ODPI, that's the Open Data Platform initiative or I'm not sure... The new letter. Is it a big I, little i? Little i. There's some rebranding going on right now. We had a panel last night. So yeah, so people are familiar with that. So this is a part of the ODPI evolution. It made it possible, right?
It simplified the ability for us. So you know what? Well, this isn't a heavy lift between either one of us because they had already gone to shifting their direction to ODPI versions. We had already standardized our HDFS implementation to ODPI. So okay, well there you go. Then the two work without any friction. So there's a great benefit. This is basically a use case of the promise of ODPI in the sense that when we were talking about it at the last event, you know, I was not, I mean, I was not critical of it. Dave and I certainly, you know, analyzed and said, this could be one big vendor love fest and nothing happens, or we'll see what the proof is. So this is evidence of a working relationship. It actually did end up being a good thing, at least for us to be able to- All right, so I'm a customer. I'm going to put my customer hat on. Pretend I'm a customer. And I'm a big, large enterprise. Okay, guys, great. You're holding hands at the altar. ODPI helped facilitate that. That's phenomenal. All I care about is I want the solution with one SLA. That's what they want. They want to have delivery. They want you guys to do your jobs, line up and execute your- So I'll take that exact customer. The first customer has actually done this. We have actually a customer who's in production now with us. And they just took, because it was ODPI, it was tested, it was validated. IBM brought their software in. They plugged it in. And a couple days later, after just getting through all of the little setup procedures, it worked and it worked great. And the customer's very happy with it. And they said, well, hey, this was surprisingly easy. You could keep doing this. So I got to give you some, we had Merv Adrian from Gartner on earlier, and I want to share with Joel and you and get your perspective on some of the comments. Some of the things Joel just mentioned.
The people that are at this event, Big Data NYC and Strata Hadoop, some of them are funded just for one purpose, to deploy Hadoop and make it easier. So deploying and managing Hadoop easier. That's their whole focus. That's a one-trick pony, right? He also made a comment that every year, the complexity in Hadoop is getting more complex. It's getting harder. So it's more complexity every year, to your point. And then he said, secondly, the data's separating out two data sets of research. Big Data and Hadoop, two different surveys. Big Data survey, 76% of companies are investing and are planning immediately to invest in Big Data. That's a huge difference. 42% in Hadoop. Right, so where's the other 30-something percent? And they added Spark in there, which came out really kind of low actually. Interesting. And there was another term in there, Spark, kind of lower on the, this is full-on enterprises now. Yeah, and I think that's the whole, this idea of Big Data. So what is Big Data? And clearly from that survey, or those two surveys combined, it's not just Hadoop. And so a lot of our clients already have Big Data and they've made huge investments, whether it's on mainframe, whether it's on Isilon, whether it's on Power Systems, like any of these systems, they've invested in over the last few years and they want to see a return on that investment. And the beauty is that, I think for whatever reason, Big Data got coupled with Hadoop. And that's not really what we see. We see actually Hadoop as opening up that data more. We see these types of relationships giving the power back to the IT team to do more with their data. Because otherwise, it just sits there locked in this confined data warehouse in a very rigid structure. And I think things like Hadoop and NoSQL databases are kind of creating this malleable space for people to work in. Well, I think it's great. I mean, you guys have your own storage group. We go to IBM Edge, we've covered that event many times.
So it's not like you don't have storage. It's not like you're doing an OEM deal with EMC. This is a partner deal as part of ODPI with Isilon, which by the way is kicking some butt in a lot of these big-name hyperscale accounts. And it's a great use case. So customer choice, this is a customer choice thing for IBM, right? This is where you guys say, you want to work with Isilon, no problem. Yeah, and I think it's interesting because I think there's a misconception that Hadoop is like this greenfield opportunity. It's like this white space. It's not, right? People are like, oh, we're going to offload all our data off a mainframe or off our systems into a data lake. And it's like, that's ridiculous, right? No, what's actually going on? And the people who are getting the most value out of Hadoop are the people who are actually pulling Hadoop into their current system and integrating deeply into what they've already invested in, right? And I think that's where we're seeing the value. He was just basically, I'm not going to say underhandedly slamming data laking, but data lake really speaks to the data warehouse side of the business. And you got flow, you know, you got, as Merv said, data in motion, which is, you know, may not even hit a database or a lake, it's like, it's the ocean. Right. Well, let's say this is, I think the data lake is something that we've been building for many years. I think the fact that it's been termed that now is something that's getting more interesting. And to some extent- You know I don't like the term, but that's, you know, it's one well done. All theCUBE people know I don't like data lake. It's like data ocean. In Wikibon, next to John Furrier's name, is I do not like data lake. I like, no, actually there's some images out there. Data ocean has currents, it's dynamic. That's real time. I think we should have data oceanomies coming in. Data torrents, data pools, data streams, everything's water, a river. A river, exactly.
I think, you know, we've been building architectures of scale-out architecture. I think scale-out to some extent can equate to data lake. At the same time, we need to make it so you can analyze content within that store. And so, you know, from our perspective, it's really easy. Take BigInsights, for example, and you point it at the network share, HDFS network share, and you get this parallel connectivity and it just works. Yeah, so I bring that up only because I want to highlight the comment and kind of tell you on data lake. It's not one thing. There's a lot of different use cases architecturally for the use of storing data, of which Merv said on his bumper sticker, my last question, what's the bumper sticker of this show? He said, storage simplicity is the key thing that he's seeing as another thing, storage and simplicity for executing solutions, right? So storage, again, back at the center of the value proposition, and Isilon, hence IBM. The interesting thing he said on top of that was, when I asked him, what are the top technologies for big data that you're seeing? Big data categorically, which was the most, you know, 70% he said, past three years consecutive for one, two, three, top three for the past three years were enterprise data warehouse, cloud and Hadoop. Spark rising up, still in single digits, but moving up the leaderboard. Yeah, I think Spark is a means to an end. So you talk about, you know, ease of use or simplicity in his words. And Spark is clearly helping with that. I think where we've stagnated a bit is, you know, MapReduce is a pretty clunky programming framework to work in. So I think Spark is actually pulling, you know, a lot of this industry along because it's making it a lot easier to build solutions, and coming back to the ODPI, I mean, that's what it's all about. It's about making it easier for people to get value out of this technology.
It's not like, when I first came to Strata Hadoop World, like back in 2011, you know, it was people up there talking about data products, DJ Patil for one coming on stage saying, we built People You May Know at LinkedIn and this is how we did it. Nowadays it's like, hey, you know, we're doing this with Hadoop and it's like, it's a technology kind of trade show. And it kind of bums me out because it used to be far more about the solutions people are building than necessarily the technology. Well, hence your friction comment, which you can, if you name names that'd be great for ratings, but I can tell you right now, I've seen firsthand a lot of the mudslinging because it's a land grab. So the other comment is, you know, to his point about cloud. So we'll be at Amazon re:Invent next week with theCUBE and we're going to see the Amazon messaging. Certainly we've seen it at VMware with their cloud. Actually we don't do their events. IBM has their cloud. Cloud has muscle. Cloud actually understands the delivery, the operational scale that customers expect. And Merv said, suddenly he said, no one was really talking about big data in the cloud until this year. Yet 50% of the deployments of big data are already in the cloud. So that means the cool kids, us, or you guys, are not talking about cloud when the customers are actually doing it. If you believe his data, that means the cloud players are going to have a huge muscle in taking over this whole ecosystem. But they're not mutually exclusive, right? I mean, we see, I don't know about you, but we see both, right? So we see hybrid clouds. So I think the trick here is not saying, well, is it all cloud or is it all on premise. It's saying actually there's only a few groups that can do both. So which hybrid cloud vendors are here at this show? Well, frankly, you're looking at one of them. Of course, that's why both of us actually are here. Microsoft had a lower third sponsorship on all the keynotes.
Microsoft, IBM, EMC, I haven't seen HP here. Okay, great. What does that mean? What's going to happen to this ecosystem? I rest my case from Tuesday. I think the whales are in town. No, this is customer driven. You and I have talked about this concept of data gravity, right? The idea that data ultimately will store and stay wherever it was ultimately created. Because it gets too big, to a point where it really can't be transferred between architectures. So I think part of the challenge is, are you talking about data that's actually being transacted in the cloud and then you're ultimately running big data against it? In which case, that's a data gravity challenge. Or is this actually transferring information from a local facility into the cloud? Because maybe it was an easier operational model. The marketing team didn't want to have to deal with local IT, so they go set it up in the cloud. And I think that a lot of that percentage of those opportunities that are happening in cloud are the latter. They're just real quick, easy, simple tests. And when they go to full scale, the question is where will that be? Will that be in cloud? Or will a data gravity problem ultimately bring them back internally? That's a great point. Because the POC's low hanging fruit, we heard on theCUBE, average deal size is there for vendors, between 30,000 and 60,000. But when you get into full deployment, it's larger, obviously six-figure, seven-figure opportunities. But you bring up the great point. Yeah, I can knock down a POC here and there, to your point about complexity. Okay, now I got to make it work. Now the issue becomes, how does the customer consume the products and technologies to ultimately actually put the solution in motion? Meaning put it in practice, operationalize it. That to me is the number one problem that I see is that, okay, hey, I don't care guys, figure it out. But operationalizing the solution is the execution of the products. Yeah, and that's hard.
I mean, it is still hard. We're trying to make it easier. Cloud makes it easier. So what we launched actually this morning with this discussion is something we call the Hadoop Starter Kit for IBM and Isilon. And the intent is, it's a multi-page document that walks the customer through how exactly to set up BigInsights to work with Isilon to make it. We want to really operationalize and simplify the deployment. I think that next step is, can we simplify the architecture about the application capabilities and functionality, making it so customers can really get it to be simple. Same time, we need to take processes out of the steps of setting things up and make it easier to get these up. Joel, I want to get your thoughts on some. Yeah. Go ahead, you want to say something? Well, I was going to say, I mean, one of the things that we see too, especially with BigInsights, is the fact that it has Big SQL. And I was sitting in the conference or in the exhibition hall the other day and overhearing a conversation and someone's like, does this support ANSI, to one of our other competitors? And it was like, does this support ANSI? And the answer was like, I don't know. And it's like, yeah, that's an important thing because if you're already running a ton of workflows, right, or- That's table stakes for the enterprise. Yeah. If you can't support ANSI, you're, don't even, you're a startup. You're like, what's ANSI? You're dead. You know, it's like, what? You never get an enterprise customer. It's just kind of like, it just kind of goes to that kind of ease of use thing and the fact that, you know, we really do need to kind of focus on the actual client as opposed to just trying to do things for the sake of doing things. So I want to get your thoughts. Yesterday, I interviewed Aidan O'Brien from EMC's big data group and Bill Schmarzo, the Dean of Big Data, author, professor at San Francisco State.
I said, how are you guys working on it so hard to integrate all this stuff together? The point of customers want to integrate solutions. They want it to work. They want it supported. And they said, big data is a team sport. Yeah. So I want you to give your comments. How do you view that statement with the ODPI successfully showing some fruit on the tree? Right. What's your view on teamwork and what should people look at for being a good team player? Yeah, I mean, I would say it's a data science really is the practice in my opinion. I don't think big data is, like I said, that's just basically been around forever, right? I mean, data warehouse is big data. But I would say data science is absolutely a team sport. And I think that's where this whole idea of like, oh, there's an individual data scientist who's going to be able to do all these things. That's just not factually true, right? It's just not. So you do need a team. And I think the best way that teams work together is when they can work in one environment, right? If they're having to develop some code over here on one system and one environment and then having to like transfer that code into a different environment to go production or whatever, that just adds this friction, as I mentioned earlier, to the system. So again, that's why things even like Spark help because they have actually a very deeply integrated framework. So if you're doing SQL, it's there. If you're doing machine learning, it's there. If you're doing streams, it's there. If you're doing graph analysis, it's there. And it's all in one place. So people from multiple backgrounds, whether you're a data science person, whether you're a DBA person who's working with SQL, whether you're an application developer working on Scala, all of these people are now in the same place. And we've demonstrated that internally. We ran this hackathon as we talked about last month. 
And we had hundreds of solutions come out from our development team because they were able to just sit down from various backgrounds and on the cloud in Bluemix, able to actually develop these things extremely fast. And so to me, that's what we want to bring to the enterprise, right? We want to bring that in and make all of that available to them so they have this like sandbox to work in and do cool stuff. It's interesting, even the infrastructure layer is a bit of a team sport, especially in the larger enterprises where you have a storage team and you have a compute team and you have an applications team and you have all these different guys. You've got to get those people to all sit down and say, hey, we want to deploy Hadoop or we want to deploy Spark or we want to deploy this technology to reach this conclusion or this solution. And how do we do that together? And that's actually very hard. You get a lot of competition between those two groups. I mean, sometimes there's some in-fighting amongst the team. You've seen that certainly in baseball with Jonathan Papelbon and Bryce Harper, in the dugout there's some friction in baseball, we saw that. So it happens in these industries where there's some in-fighting. But there's a culture shock going on, right? It's the IT crowd who have been taught by IBM and EMC for a long time to protect data. Lock it down, it's a liability. Don't let people touch that stuff. Data hoarders, we talked about that. Yeah, and then you have people now that are saying, no, no, open it up. There's value there. You got to get in, get deep with your data and do something with it. And so that to me is like the actual kind of boundary. That actually came up on the panel yesterday or last night on our panel about how to balance the sandboxing of sharing economy of data with the compliance and security of protection for the right reasons. Not for, you know, massive value reasons.
Well, certainly lots of things we have to deal with around governance, around PII. But when it comes to things like data monetization and understanding the value, I think that's still such a big subject. I know there's people doing research on exactly how to do that. There's got to be some magical algorithm. All right, so I want to get you guys since you brought it up. The data hoarders and that kind of culture of locking down the data. Data hoarders. It could be a reality show on theCUBE. I mean, it could be a hoarder, you know. Open up. Oh, I've seen some files. This looks a little terrible. It's a show that opened up in the storage wars. I mean, we actually do, you know, open up the storage and do a storage wars. I love it. So where are we in the culture? Just kind of give us some color, you know, people are getting it 50%, 100% sharing. How would you, on the spectrum of the locking it in IT mindset that you were referring to, where are we in the spectrum? I mean, from my standpoint, I see, again, I think this is one of the big values of BigInsights is the fact that with our Big SQL, not to push it too much. You know, we do have kind of row level, table level governance, right? So, you know, when you're working with, you know, personally identifiable information, especially in the healthcare space where we're working with a lot of clients, or you're working here in New York City with financial, you know, instruments, that matters. And so I think that's not changing, those requirements, those SLAs for security and for governance aren't changing. And so we take that very seriously as we have for many years. And I think that's also one of the things that BigInsights is bringing to the table is we're one of the few kind of vertically integrated distributions that brings that to the enterprise. I think from our perspective, what we're taking is a lot of data that we've already got stored.
We're figuring out how to enable that to run analytics against that data set and trying to determine exactly how we get value from data that's already been stored and, to some extent, archived. So you'll see even further down the road, more and more EMC products continuously, you see HDFS interfaces attached to those products, even though they've been around for quite some time. Ryan, Joel, thanks so much for sharing this exclusive news, you guys. It's really historic. I mean, EMC, IBM working together. This is a real testament to the partnering, team sport mentality that you guys have. The ODPI, man, it's real, it's happening, it's great. Very powerful. And the benefit to customers real quick is just stability. Ease of use, it's getting going. They've made the investment, they can now use BigInsights there and get a lot of value out of their data. So this is a rising tide floats all boats kind of mentality, right? For sure. Yeah, absolutely. Alright, IBM and EMC working together, bringing it to you here from theCUBE, exclusive. We're uncovering all the signal from the noise out there, and of course, it's theCUBE, live in New York City, Big Data NYC, it's part of Strata. We'll be right back after this short break.