Live from New York City, it's The Cube at Big Data NYC 2014, brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic, and Teradata. Now here is your host, Dave Vellante. Hi, welcome back to Big Data NYC. This is Dave Vellante. This is The Cube, our live mobile studio. We go out to the events, we extract the signal from the noise. We're here doing Big Data NYC all week, Thursday and Friday. We've got our Capital Markets event right after this today at 4 p.m. East Coast time, and then we're celebrating five years of The Cube at Hadoop World. Sam Grocott is here. He's the Senior Vice President of EMC's Emerging Technologies Division. He came over with the Isilon acquisition. And Aidan O'Brien is the general manager of Big Data Solutions at EMC. Gentlemen, welcome to The Cube. It's great to see you. Thanks for having us, Dave. So EMC, you guys got it all started. You know, Cloud, Big Data, really simple, and you were right on. It was 2010 maybe? That's good foresight. I mean, again, simple, but you nailed it, and here we are today. I'll never forget the first time, during the acquisition announcement of Isilon. Joe Tucci was talking about Big Data on the acquisition call, and it was like, well, of course we store lots of data, but there was obviously a bigger idea, and he was very much on the forefront of this new market movement that was coming. So after a few months of being part of EMC, came the realization that Big Data is obviously much more than just big capacity and big storage. There's a bigger solution out there. So, tons of foresight into the future. Yeah, and I think you guys created a conversation around it and started having those conversations with customers, which I'm sure led you to some ideas, whether it's acquisitions you should make or solutions you should create. And now you're a major player there. Before we get into it, you guys were over at Hadoop World and Strata.
And what was the vibe like over there? Oh, it's fantastic. Packed? Yeah, huge numbers of people. It's even better than last year. Very interesting topics, lots of great speakers. Obviously it's great from our perspective in terms of the relationship with Cloudera that we'll talk a lot about. Really good vibe. So what's the buzz? What are people talking about? Obviously the Internet of Things is hot. What are the topics that are coming up? Yeah, so we really see the conversation going down two different paths. One is how you build more value in the Hadoop stack northbound, into more management interfaces and other applications that take advantage of Hadoop data. And then certainly southbound, where the storage layer lives. We talk about it as the EMC storage foundation, or data lake storage foundation. That's where the EMC business has historically hunted, and that's where we really see our differentiated opportunity going forward with the entire ecosystem, whether it's Cloudera or Hortonworks, and certainly Pivotal. We think that's a match made in heaven, both from an architecture standpoint and from an application focus. So can we talk about EMC's big data strategy? EMC in particular, let's stay away from the Federation for now. We can bring that in in a moment. But EMC specifically, how would you summarize, Sam, the big data strategy? We want to store it all. It's going to be the biggest shift going on within the storage industry. If you look at the amount of data under Hadoop, in HDFS storage, today, and then look out over the next five or six years, some people believe that about half of all data stored will be under Hadoop management at that point. So it's a huge industry shift toward being able to leverage your data, the huge investments you've made in content and storage, and analyze against that.
So certainly from the EMC standpoint, we want to make sure we have best-of-breed technology across the board, whether it's a scale-out file architecture, an object architecture, converged Hadoop storage, or server-side Hadoop storage. We want to be the foundation for the data lake in this space. And then from a partnership and applications perspective, we want to deliver what our customers want. If they want vendor XYZ, we're going to be right there, partnered and supporting them. So it's a very aggressive move into what I think is probably the hottest workload in storage today, the sexiest workload we're seeing out there. With most enterprise customers, when we talk to them, the question is no longer what is Hadoop. It's: Hadoop is important, it's critical, help me plan my future strategy for Hadoop on enterprise storage. So it's interesting, because EMC plays in virtually every segment of the market. We've talked a lot today about the traditional enterprise data warehouse and how customers are rethinking their spending and moving some of that budget into Hadoop. Now, a lot of that stuff is running on VNX and Symmetrix and so forth. Joe Tucci says he'd rather disrupt himself than get disrupted. So you're playing that out, but the target for a lot of that is Isilon storage. And so, Aidan, your group is responsible for creating big data solutions. I'm presuming Isilon is part of that underneath. Oh, very much so. Can you talk about some of the specific solutions you're building for big data? Yep, absolutely. The work that Sam's doing with the ETD group, the Emerging Technologies Division, is the foundation Sam mentioned, and that is the base for all of our solutions. Then we bring in additional technologies over and above that.
But I think there's a general realization, to your question before, Dave, about the strategy of the organization. At the most senior levels, we're seeing a need to move from product-based to solution-based selling, and to actually work with clients to achieve outcomes. Our traditional heartland is working with CIOs. So it's helping them use analytics, machine learning, data science, those types of things, to do things such as reduce unplanned downtime for mission-critical applications. That's the type of thing we're really focusing on for the future: delivering outcomes for CIOs. So I know you guys are tight on time, you've got to go, but let's get right to it. Because at the heart of it, people say, oh, Hadoop, that's white boxes and SATA disk drives. We don't need that EMC stuff. Why EMC, Sam? Yeah, absolutely. And most of the market is deployed that way. I don't have the exact numbers, but it feels like nine out of ten engagements we're in are built on build-your-own white box, server DAS hardware. But there is an opportunity. It's not the biggest part of the Hadoop and HDFS storage market specifically, but there is a significant segment of the market that expects enterprise-caliber storage that provides a wider set of benefits than purpose-built HDFS storage alone. That's the market segment the EMC data lake is focused on: merging the enterprise storage use case with the ability to leverage that data and analyze that information by natively integrating with Hadoop environments via the HDFS protocol. That provides a much more scalable, more efficient storage architecture than traditional white box build-your-own. It also allows you to use that data lake in more ways than just running Hadoop against it.
By moving it into an Isilon scale-out data lake, as an example, you can read and write all of that information, whether it was written via HDFS, SMB, NFS, or NDMP. You can back it up, you can FTP it. Once data gets put in the data lake, it can be used by all of your applications, compared to a dedicated silo for HDFS storage in a DAS model. So there's absolutely a very important enterprise segment we're targeting that values enterprise data services, interoperability with existing infrastructure, and the ability to leverage multi-protocol access to all of their data, regardless of whether it's new or not. So you're saying this data lake concept, the big data piece, becomes a horizontal layer the entire organization can take advantage of, versus building rigid silos that are very expensive, obviously, and require staffing for each. That's where a big part of the savings is going to come from. Absolutely. And then you can overcome some of the pain points of server DAS architectures with Hadoop, such as mirroring and the low efficiency of capacity storage and so forth. But the point is, once you put it in the data lake, it's a highly agile data lake that can be used by a lot of different users and applications. It's not a data lake exclusively for doing Hadoop. And what about things like governance? You know, people spinning up Hadoop clusters and nobody worrying about data quality or data governance. Can you help solve that problem? Absolutely. That's the intention with the solutions. When we look at these solutions, there's a combination of hardware, software, and services. It's a recognition that simply buying a technology product isn't going to answer some of those softer issues, such as the well-understood ones around data governance. So what's the conversation like with customers? Share that with us.
I think you see lots of customers that are all in different places. So the first thing is assessing their level of maturity. Are they traditional IT managers? Then we're actually looking at more of a technology conversation. As soon as we start to speak to the more senior clients we deal with, they're all about outcomes. They're all about, okay, fine, I understand you've got great products. I know you've got a great reputation. How do you actually help me in this space, first to understand what you can do with all of that data? And then afterwards, let's talk about making that a reality with your great technology. With those senior people, the technology comes in once they've understood what they're going to do with it. Any big mistakes you see customers making that you can share with our audience, things they should maybe avoid? Yeah, I think the biggest mistake, or the biggest gap, specifically among enterprise buyers of Hadoop, is understanding what it takes to manage it over time, and more specifically, how to successfully set it up, install it, and then extract value from that environment. There is a huge lack of core expertise in the enterprise space about how to actually take advantage of Hadoop, not just set it up. So, as Aidan was pointing out, it's moved beyond just technology. It's how do I have the soft skills, the management skills, the intelligence and the wisdom to actually pull that information out and use these tools to maximum value. The market is really missing a deep, post-integration understanding of how to use these tools. The tools are fantastic, the storage, the applications. Now it's how do you train the people to use those tools to maximum value?
Our data shows that in a recent survey, 60% of the IT practitioners we talked to said, yeah, our big data projects, Hadoop, are totally successful, while only 18% of the business side agreed. So there's a real dissonance there: hey, I got the Hadoop cluster up and running, which is different from what you're talking about, business outcomes. So what does EMC do to close that gap specifically? In the first instance, we've got our own capabilities to help lead customers through that discussion of finding the killer use case that's going to help them justify the investment and achieve competitive advantage in the market. But the other big thing that's really important for us is that we're looking to enable the ecosystem, other SIs and service providers, so they can also help their clients do that as well. Yeah, well, this is why, and like we said, we're not going to really get into the Federation, but this is to me where the Federation comes in and starts to get pretty interesting, when you can bring in the Pivotal piece. There's obviously some VMware infrastructure stuff going on, but the Pivotal piece is really interesting in terms of solving that problem. EMC's got the customer relationships, and increasingly you're becoming strategic to CIOs. Yeah, and I think what I would add is that the other community we're looking to really build out with this trusted skill set is the channel partner community. It's a huge opportunity for them to raise their visibility with their accounts, to be a thought leader not only around Hadoop, but also a deployment specialist in how to successfully deploy these things. I think it's creating a great new opportunity within the channel community to be that trusted advisor. Guys, I know you've got a hard stop, so I'll let you go, but thanks very much for coming on The Cube. I really appreciate your insights.
All right, keep it right there, everybody. We'll be back with our next guest right after this. This is The Cube. We're live from the Big Apple.