Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015.

Okay, welcome back everyone. We are live in Silicon Valley for Big Data SV. This is our event, SiliconANGLE Media with Wikibon and theCUBE, where we extract the signal from the noise for Big Data Week, in conjunction with Strata Conference and Hadoop World. We're here to extract the signal from the noise, talk to the experts, have a great time, do some social media, do some crowd chats, but more importantly, share all that data with you. I'm John Furrier with Jeff Kelly, Chief Data Analyst at Wikibon. We've had too many data scientists on. Our next guest is Sam Grocott, Senior Vice President of Marketing and Product Management at EMC's Emerging Technologies Division, ETD, which is essentially where they're working on all the cutting-edge stuff. That is where the big data is. Welcome, Sam, to theCUBE.

Thanks for having me.

We love talking to the Emerging Technology Division at EMC World because it's some really good content.

Thank you.

You've got to go deep and you've got to get weird in the long-range plan, but then bring it down to reality, right? Go into the fog, come back to reality. Big data. There's clear visibility now on some things. On the app side, for sure. There's some movement on the converged infrastructure side. We talked to your federation partners over there, Pat Gelsinger. You've got vSphere now coming out. You've got VVols, you've got a lot of integration going on. But in this middle layer, the other federation partner is throwing out this red meat called the Open Data Platform in the middleware. Cloud Foundry. That's a stack. How do you guys look at that? What's emerging? Where is there clear visibility, and where are the emerging areas that you guys are focused on?

From an emerging perspective, what we're now talking about is the EMC Data Lake Foundation. What we really focus on is providing a storage tier that can provide native Hadoop analytics, whatever the Hadoop distribution is.
Whether it's Hortonworks, Pivotal, or Cloudera, we plug right into that and support those ecosystems. So as ODP emerges, as Cloudera moves forward with their community, and as Apache moves forward with theirs, we plug seamlessly into that. So from a storage perspective we're really Switzerland to all these different distributions and Switzerland to these communities, and we want to make sure we provide our customers with the best integration and the best enterprise-class storage for Hadoop analytics.

I definitely want to get into the details of the announcement, but let's just take a step back. Can you articulate for us EMC's definition of a Data Lake? What does that mean to EMC? I actually think there are different definitions depending on who you talk to. How do you look at the Data Lake?

Yes, so the Data Lake Foundation is a first step in building the EMC strategy from a Data Lake perspective. It's consolidating all your unstructured data into a single pool of storage, a single volume. We call this a single Data Lake. So we have a couple of offerings, Isilon and ECS, depending on whether you want to build a file-based architecture or an object-based architecture, but the concept is the same. Consolidate your unstructured data from your traditional workloads, whether it's file servers, project shares, your HPC SANs, wherever your data is being created. Consolidate that into a single pool of storage, but also be able to pull in these next-generation protocols, whether it's HDFS, object data sets, even mobile assets via Swift, OpenStack, etc. Be able to provide both traditional workloads and next-generation workloads with a single pool of storage. So consolidation is step one. Step two is to offer these next-generation protocols, like HDFS for example, and drive insights and analytics across them. So our Data Lake is the consolidation of data, and then being able to open that up for analytics and drive insights out of it. That's our definition.
See, I think that is a bit of a wider definition than some people have. If you talk to some people in the Hadoop space, well, to them, the Data Lake is Hadoop.

Yeah.

And it's some of the new things that you're bringing into Hadoop, but it's not necessarily bringing in the old world and some of the more traditional workloads. But that's where your focus is.

The biggest challenge with a traditional Hadoop Data Lake, which is typically built on DAS at scale, a clustered environment where you have compute and storage together, is that you have to move the data. If your primary storage, which is what EMC offers, is also Hadoop-ready and Hadoop-enabled, you never have to move the data. So it removes that ingest of information, which, by the way, is typically mirrored three times. So there's certainly a storage capacity requirement that is triple what you would need in your primary storage. So not only is time to analytics delayed because of the copy time, and the environments we're focused on are not tens of terabytes, they're hundreds, if not thousands, of terabytes. You move it in, and you've got to store three times more just to start running analytics. The beauty of the EMC Data Lake Foundation is that you run analytics in real time where the data sits. There's no movement and it's very efficient.

Which is one of the key concepts of big data: bring the code to the data and run the analytics there. Sam, take us through some of the internal EMC history, as much as you can share publicly, around Isilon and EMC, because we were talking before you came on. Jeremy Burton really set the agenda with "cloud meets big data" in 2011 when he came in as CMO; now he's president. We have a crowd chat with Jeremy at 10 o'clock on Friday, an ask-me-anything, so that's going to be awesome. He likes to take risks, but back then EMC had to really run and put that positioning in place.
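To put rough numbers on the three-copy point made above, here is a minimal Python sketch. The function names, the 500 TB dataset, and the 10 Gbit/s link are illustrative assumptions, not figures from the interview; the only input taken from the conversation is that ingested data is typically stored three times.

```python
# Rough capacity and ingest-time arithmetic for copying data into a
# separate Hadoop cluster. All names and sample inputs are hypothetical.

def hadoop_raw_capacity_tb(dataset_tb: float, copies: int = 3) -> float:
    """Raw capacity consumed when every block is stored `copies` times."""
    return dataset_tb * copies


def ingest_hours(dataset_tb: float, link_gbit_per_s: float) -> float:
    """Best-case hours to copy a dataset over one link (decimal units)."""
    total_bytes = dataset_tb * 1e12
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    dataset_tb = 500.0   # assumed dataset size
    link_gbit = 10.0     # assumed network link speed
    print("raw capacity (TB):", hadoop_raw_capacity_tb(dataset_tb))
    print("ingest time (h):", round(ingest_hours(dataset_tb, link_gbit), 1))
```

Under those assumptions the copy alone takes days before any analytics can start, which is the delay the in-place approach is meant to avoid.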
At the time, Isilon was exploding in value, so a lot of the big hyperscale companies, web-scale companies, Facebook, you name it, were all buying Isilon in droves. That was the initial big data, but it wasn't viewed as big data internally. You had to earn that. What has changed in the definition of big data since the Isilon days? Because Hadoop was just coming onto the radar as this batch open source project while you guys were selling Isilon drives. Can you talk about that dynamic, and what does it mean for today? Where are we with what big data means?

It's a great story, and I'll never forget that the first time I heard the words "big data" was during the announcement that Isilon was being acquired by EMC. Joe Tucci was on the announcement call and he said, hey, we bought Isilon because of big data. And we're all sitting around like, yeah, we store a lot of data, and it's big in capacity, but we don't quite get this "big data" thing. So we're like, okay. But to Joe and Jeremy's credit, there was a much bigger, broader vision.

You had a huge customer base, too.

Absolutely. But our first reaction was, yes, we store petabytes of data, it's big. Obviously they had a much bigger, grander vision of what big data meant. So to EMC's credit, they saw very, very early on this trend of really being able to grab insights from all of this information being created. That's the broader meaning of big data: it's really the value you're getting out of data versus the creation, storage or management of that data. Those are some of the challenges, the means to the end, but the value creation is really what big data represents to us. So from very, very early on, the big data term, as you looked at the industry, moved from massive capacity and massive quantities to now really landing on a firm connection to analytics. Those things are one and the same now.
So talk a little bit more about the details of the announcement you made today, around the things people think about in this new space that a Data Lake has to deliver: security, the ability to scale, obviously. What are some of the details of the announcement?

So a couple of things. We announced a new Isilon platform, the HD400. It's 2.5x more scalable, so we can go from 20 petabytes in a single Data Lake to now 50, so much, much more scalable. More importantly, though, it's much more dense, so we can pack 60 drives in a 4U form factor versus 36, which was the most dense node we had before. So the economics are greater than 50% TCO savings because of the density. So that's key piece number one.

The second piece is that we've already been certified and supported by Pivotal and Cloudera. We're integrated into Cloudera's management tools, but what's new today is that we announced certification with Hortonworks and integration into Ambari. So now we've got that strong ecosystem across the board, both with certification and with providing the customer end-to-end distribution and enterprise-class storage integration points. They can deploy us in confidence knowing that the distribution and storage will work together, and we're also now integrated in their management tools as well. So that's the second piece.

The third piece is that we announced the availability of the ECS object scale-out platform as a Data Lake option as well. ECS does multi-protocol across many different object interfaces and protocols, but also HDFS. So if you want to build a platform or an application designed for the emerging use cases, or the third platform that's continually referenced, object is a starting point for a lot of those applications. We have a best-in-class object storage system with ECS that can also run Hadoop against it.
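The HD400 figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the `Node` class and helper are hypothetical names, and it assumes the earlier 36-drive node was also a 4U chassis, which the conversation implies but does not state outright. The TCO figure is not modeled, since nothing in the interview says how it was derived.

```python
# Sanity-check the scale and density numbers quoted for the HD400.
# Names are illustrative; the 4U size of the older node is an assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    drives: int
    rack_units: int

    @property
    def drives_per_u(self) -> float:
        """Drive density per rack unit."""
        return self.drives / self.rack_units


def scale_factor(old_pb: float, new_pb: float) -> float:
    """How many times larger the new maximum cluster capacity is."""
    return new_pb / old_pb


hd400 = Node(drives=60, rack_units=4)
previous_densest = Node(drives=36, rack_units=4)  # assumed 4U

if __name__ == "__main__":
    print("cluster scale factor:", scale_factor(20, 50))
    print("HD400 drives per U:", hd400.drives_per_u)
    print("previous drives per U:", previous_densest.drives_per_u)
    print("density gain:", round(hd400.drives / previous_densest.drives, 2))
```

The capacity ratio matches the stated 2.5x, and the drive count per chassis rises by about two thirds under the 4U assumption.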
So whether you want an enterprise scale-out NAS file offering from Isilon or a scale-out, cloud-based object platform, both of those can do analytics in a multi-protocol fashion. Those were really the big three announcements that we rolled out today.

So I'm sure what's driving these announcements is what you're hearing from customers. So what are you hearing on the ground from customers? What are their biggest pain points when it comes to building the Data Lake? We keep hearing about Data Lakes, or Data Oceans as John likes to refer to them, and they're great buzzwords, but when you get down to it: all right, how do I practically get started with this? Because when we talk to practitioners, they're struggling. It is a challenge to build these out. Even the Data Lake where they're just offloading some data is challenging. Forget about doing some of the more advanced analytics. What are you hearing in terms of pain points from enterprises, and how does this announcement address some of those?

So EMC's focus is clearly on that enterprise user. The biggest pain point we're hearing now versus a year ago comes from those proofs of concept, those lines of business that have been kicking around Hadoop for a couple of years. Now that it's becoming so big and so mission-critical, it's becoming an IT problem. Before, it was those guys off in the closet: I don't really know what they're doing, I don't even want to know what they're doing. Now it's like, okay, this is actually very, very important to the business. I understand that, but how do I do this without trading off or sacrificing all of my other integration points across the data center and across IT: the security, the governance, the compliance issues, the interoperability with the other parts of IT and so forth.
And that's really where EMC is filling a huge gap: we're delivering technology that gives you the best of both worlds. You get enterprise-grade capabilities, that plug-and-play you would expect from a NAS or, in a more emerging space, an object archive, while also including those next-generation capabilities. So you can run your Hadoop use cases against an Isilon, for example, but you can still snapshot it, you can replicate it, you can back it up, and all of your applications can see the data, read the data, rewrite the data. It's not yet another net-new silo environment, so it really takes that enterprise-grade plug-and-play issue off the table for these IT users, for whom this is now a reality that is here today and is definitely not going away. That knowledge gap is probably the second thing that needs to be filled. There's features, functions and enterprise integration, and then the knowledge around how to deploy this quickly is kind of the second gap.

Obviously as an SVP, you've got to lead the troops and also set the agenda. And you've got to deal with the market chessboard, which is challenging, and that's why you get paid the big bucks. What's the landscape view for you? What's the focus? I mean, obviously you have to navigate and make sure everyone is on the bus to do what you guys want to do. What are you focused on as the outcome for your group, the emerging group? And what within the big data world are you watching closely that will be a lever for your success?

Yeah, I think one of the big things that you'll see us focus more and more on this year is being very active in communities, being active in the development space. Especially in this use case, so to speak, in analytics and Hadoop specifically, we at EMC need to be very, very active participants in this community, because it is going to be one of the key drivers of innovation, whereas historically it's been vendor-driven innovation.
Now it's time for EMC, and all the vendors frankly, to find that middle ground where you're working together and leveraging each other. So whether it's Hadoop, whether it's OpenStack, whether it's community and developer support and integration, that space is very important for us.

What does that mean? Are you going to contribute code, resources?

I think you're going to see it. You know, we definitely don't want that attitude of "if you build it, they will come." Field of Dreams is not going to be successful here. So, one, we have to acknowledge our participation. We have to bring products that support and integrate with these applications, these use cases, these OpenStack communities, and then we have to deliver. We can't just open our arms and think that it's all going to come to us. So I think that's definitely an area where you'll see much more from us this year.

We were talking with a guest earlier, who is really active with customers, about this inside-out organization where everything is done in the open. You can't hide in the shadows anymore, with the transparency of the information flow. It's big data. It's the mobile network. Things are connected. People are going to find out what you had for breakfast the minute you walk out the door of the house.

And from an emerging standpoint, the last thing I want is to be on the outside looking in, watching this all happen. So from an Emerging Technologies Division perspective, we are going to lead, and we're going to lead from the front, and we're going to become much, much more active than you've seen historically.

Keeping with that theme, the big announcement this week that everyone's talking about is the Open Data Platform, where Pivotal, Hortonworks, and other members like IBM and of course EMC are playing a role as well. Talk a little bit about the Open Data Platform.
Why is it important for EMC to play a role there, and what do you hope the Open Data Platform is going to achieve?

Yeah, so whether it's ODP or other communities that are getting together, I think there is that problem of, okay, IT is now dealing with this new emergence of Hadoop and analytics in their IT infrastructure. Any movement, any group, any technology that advances the enterprise experience, meaning it's more predictable and more manageable, helps. Because innovation has been so quick in the Hadoop space and the analytics space, customers and organizations are looking for something that they can count on more consistently and that is more predictable to work with in this much more complex environment, versus, again, the POC closet. So things like ODP, I think, will help to bring forward that predictability and that consistency that these enterprise environments are looking for. From a Cloudera perspective, they're very strong in their community and their strategy around bringing that predictability as well. And so from an EMC perspective, why did we join? We're a gold sponsor of the ODP because we think anything that advances that agenda of getting these applications to more of an enterprise-grade level is great, and we'll support any and all of them. So we're very much committed to that.

What do you think the ODP can and should do that maybe the Apache community can't do? In other words, you're hearing some criticism from the open source community that this is a vendor-driven thing, and that we've already got this standards board; it's called Apache. What role do they play to complement what Apache is doing? What are they going to do that maybe the Apache community, the open source community, can't?
Yeah, so I think the members of ODP, the segment that is a member now, will be collaborating incrementally above and beyond, because they're all members of and contributors to the Apache open source community as well. So I think it's going to provide the ability for these partners to collaborate much more closely, to leverage each other's strengths in a more coordinated fashion, and then again provide that customer base and that user community with a more predictable enterprise experience. I think that is positive. Now, is it going to be dramatically different, or am I saying that the Apache open source community won't be able to deliver on that? No, I'm not saying that at all. But again, it's a new commitment, it's a new initiative that is going to look at this problem a little bit differently with a broader ecosystem of partners. I think that's positive for the industry, positive for all of us.

So I've got to ask you about one of the things we were talking about earlier, and again this is something that we're watching play out in real time: the convergence of the open source community, out in the open, with the governance of standards bodies. We've all been through the [unclear], W3C and so on and so forth, and there's a criticism out there that these are new land grabs, or Barney deals, or marketing programs when people form these consortiums. But OpenStack is successful; it's gone through some trials and tribulations. Cloud Foundry: people were very skeptical of it at the beginning, like, it's never going to work, [unclear]. But you know what? It's working. So with this new concept in this new environment, is it different now? Why are these projects becoming successful? Is it the new era of open source? Is it the transparency? Is it the access to information?

I think what's driving it, more than we've ever seen, is customer expectation. I think why these things are being more successful now is the amplification of the voice of the customer and the expectations
that are being drawn into these organizations are requiring them to really get focused, get their acts together, and solve problems that are being driven by this wider use case. So as OpenStack has proliferated and gotten more and more momentum behind it, and Hadoop analytics as well, you're going to continue to see these types of organizations.

OpenStack looked good, great launch, similar to ODP, which has to prove itself; time will tell. And then OpenStack turned into a marketing program. We were very critical of it: this is a total cluster. But what happened was they got their act together internally and said, let's ship code, and then OpenStack got on the rails again, and it's doing great as an open foundation. So the question is, is it because the customers are now connected to the projects? Because you could argue that in things like the LAN Manager days and the UNIX days, the customer's best interest was in mind. So is it the fact that the customers are now actually part of the contribution?

You know, the DevOps space has obviously emerged dramatically over the last couple of years. The people who are managing the storage and operating the broader IT environment are also now becoming contributors and participating in that community at a level at which they've never participated before. There was historically a strong divide between the developers and the operators. This world is coming together, so I do think that is probably one of the key differences in today's world versus what you've seen over the last few years.

Awesome. So, final question: what's the outlook? What should people pay attention to that's not being talked about in the press or at the events, that you guys are keeping your eye on and that you think is around the corner?

Yeah, I think the big thing that is still not as well understood throughout the entire IT industry is just the true value and the breakthroughs that analytics can provide organizations. You know, as I meet
with customers and CXOs all over the world, there is a clear set of environments that understand the value. This is top of mind, this is a boardroom discussion, it is a clear-cut competitive differentiation. And then there is a segment of the market, a large segment, that is not operating at that level. I think that is probably the biggest missed opportunity, and we'll see it change very quickly now. We've seen it accelerate over the last couple of years. This year is that breakthrough year for the entire industry, I expect, and this is what will determine winners and losers in the marketplace across these key markets: whether they can harness the power, whether they can understand how to take advantage of this massive resource called their data, which they've created and are throwing tons of money at. That is going to be the breakthrough. You know, a lot of organizations haven't even started down that process. So I think this is a year where you're going to see that accelerate to the masses, and you're going to start seeing winners and losers across mega, mega industries decided by who is able to harness this information faster and get to market sooner.

Sam Grocott, Senior Vice President of Marketing and Product Management at EMC's Emerging Technologies Division, where they're building the future. Great to see you on theCUBE. Appreciate it. Thanks for sharing in the open on theCUBE, and for sharing your opinions and what's happening with EMC. Great to see you guys out in the open with the community. It's fantastic to watch. Certainly your cohorts at VMware and Pivotal are doing the same thing, so there's a lot of great stuff coming from the Federation. We are here live in Silicon Valley. I'm John Furrier with Jeff Kelly. It's theCUBE. We'll be right back after this short break.