Live from New York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic and Teradata. Now, here is your host, Dave Vellante.
Welcome back to Big Data NYC, everybody. We're live here. We've been going Thursday and Friday, wall to wall. This is theCUBE, our flagship product. We like to go out to events; we extract the signal from the noise. Last night we had our fifth-year celebration of Hadoop World, and it all started with a friend of ours, Mike Olson, who's here, along with Ryan Peterson of EMC. Mike, of course, is with Cloudera, now Chief Strategy Officer. Mike, I think about that day all the time. We flew in to Hadoop World. Furrier called me up and said, get your butt out here. I was in Dallas. He said, you're at that old-world conference; come and create the future. And you were one of our first guests that day and really taught us a lot. Since then, it's been a wild ride. So thank you very much for letting us in back then.
Well, look, you folks have been a great channel, a great way for us to talk to the market, and a real participant in the ecosystem, including building a lot of data-driven products of your own. So not just talking about it, but actually playing a part in the evolution of the ecosystem.
Well, it's true. CrowdChat was actually born inside of this Cloudera ecosystem, with some interns and Danny Ryan at Georgia Tech, and that thing's just exploding now. As we were saying offline, John, sorry he can't be here, but he's going crazy with CrowdChat. And back then there were, you know, under 1,000 people.
Yeah, you know, the very first one we ever did was like 450, 500 folks. I think we got nearly double that the next year, up to around 900. And you know, it felt so much like a movement. It was exciting, the energy in the room. What's remarkable is that now, a few years later, it's just about 5,000 people, right? The city of New York has organized around it.
There's NYC Data Week going on. We've got tremendous interest and participation: customers, partners, vendors, innovators. The momentum and the enthusiasm have absolutely continued. And I understand the average age of the attendee is starting to trend in my direction.
And of course, Ryan, now EMC's all in. I mean, you didn't have a huge presence back then. You maybe had a couple of guys walking around saying, what's this Hadoop thing? And you and I talked in August or September, I guess, at VMworld, and you were really helping me understand EMC's angle on Hadoop. So you guys are all in to this space. Give us the update. I know you just talked to Aiden and Sam yesterday, and they probably gave you a little update on the EVP and ETD solutions.
So ETD is our new division, Emerging Technologies. We brought Isilon, ViPR, ECS, and the new DSSD innovations that are coming out next year all under one roof. We're really excited, and I can't tell you how excited we are to get Mike and Tom at Cloudera working closely with us. Just a great start to the week this week.
Everybody wants a piece of the Hadoop world. We had Rich Napolitano on this morning. He's looking for his next big thing, and of course, where does he come? Hadoop World. So Mike, tell us from your perspective, what's changed and what's stayed the same, both in the industry and at Cloudera?
You know, in the industry, I think what's changed is the scale, the reach, the maturity of the platforms and the market, right? We're now seeing this platform roll out in some really compelling use cases. Ryan and I were talking on the way over here about one company that is using big data technology to help return missing kids to their families, right? That's just remarkable. On stage I was able to talk about Children's Healthcare of Atlanta doing great work delivering better care to babies in the neonatal intensive care unit.
We get to talk about use cases like DigitalGlobe helping chase terrorists, the kidnappers of the 270 schoolgirls in Nigeria, through the jungles. That collection of use cases is transformative. Look, we're gonna make banking better, right? We're gonna help people optimize their customer relationships. But because the technology's ubiquitous and the skills and the applications are more mature, we're able to do stuff with data now that we could never do before. So that's what's new. What's the same? Man, the pace of innovation is as breathtaking as ever, right? New stuff is happening all the time. The maturity of the market has made it much easier for enterprises to adopt the platform and to integrate it with the mission-critical infrastructure they're already running, right? And that's what has driven us to expand our partnership reach and to form long-term relationships with great companies like EMC.
Well, I'm glad you mentioned how you're having an impact on the world, because in the early days of Hadoop World we interviewed Jeff Hammerbacher, and of course he was very famous for his quote about the best minds of his generation thinking about how to get people to click on ads. And of course he's gone off to Mount Sinai, trying to change the world yet again. I was at a conference last summer and somebody stood up in the audience and asked, when is this technology ever gonna affect society? And it's happening, isn't it, Ryan?
Absolutely. Mike just talked about this customer we have who's taken basically just call detail records and data off of telcos and transformed that content into something easily searchable, giving that information to law enforcement in the right, controlled manner so law enforcement can track down what they need to track down much more efficiently and quickly.
And as a result, they've been able to find child abduction rings, usually operating for very terrible acts, get those children returned to their families, and shut down entire evil practices. You couldn't have come up with this before. And what we were talking about on the way over is, you know what? What's next? What have we not figured out? What information could we start using and discovering? And the real question is, what is the question to ask?
I couldn't have predicted those use cases two years ago, right? That recently. So stop and think right now about what's going to be possible two years from now. It's not that the data's going to be new, although some of it will be; what's really going to be new is the vision and the use of these new platforms to chase problems we could never attack before.
Well, I talked on our panel last night with Peter Goldmacher. When I first met him, he put forth the premise that big data practitioners are going to create more value than the supply side, with all due respect to vendor technologists. We started thinking about that, and people have put forth this notion of the digital fabric. The idea is that industries have always been aligned on their own vertical stacks, whether it's retail or manufacturing or healthcare or financial services: production, design, distribution, partnerships, sales, et cetera, very rigid within their own stacks. And it seems that with cloud, infrastructure as a service, social applications, and now data, these horizontal transports are cutting across industries. It seems like Goldmacher's theory is coming true, but it's still hard to pick the winners. In other words, what organizations seem to be doing is riding on top of that digital fabric. Uber is a good example.
I mean, obviously Amazon, Google and Facebook are the other clear big ones. But whether it's Uber or Fitbit or UPS, Coca-Cola or GE, the degree to which you're able to leverage that digital fabric seems to be where the new value chain is being created: not worrying about undifferentiated heavy lifting or all this geeky tech stuff, letting the industry take care of that, and innovating at the business model level. What do you think about that theory?
It's interesting, because this is exactly why we got excited about this. We have a lot of data. I mean, EMC's got more data on storage than anybody else, and we're realizing that customers of ours include very large organizations and governments. Not to name customers, but say NASA and NIH were to get together to talk about climate and weather data and combine that with healthcare information. What could you find out? What could you glean from those two data sets coming together? But in the olden days, as I call them now, just a few years ago, the discussion was all about taking data and moving it into the Hadoop architecture so you could run analytics on it. Trying to get all of that content migrated over LAN infrastructures and WAN infrastructures, even internet infrastructure, so that you can use it and cross-correlate it, is very difficult. What we've really done is innovate on what I think is the next step, which is turning HDFS into a protocol so that the content can be accessible to the Enterprise Data Hub. Giving the Enterprise Data Hub access to content where it sits and where it lives: we think that combination's just killer.
Inert data sitting in storage is just taking up space. It's always been the case that using that data to make decisions, to understand the world better, is what matters, right? The question is, can you get enough of that? You call it data fabric. The industry likes to talk about a data lake, a great big repository where you can put everything.
It's great to have that data, but layering the processing and analytic capacity on top, so that you can ask those questions, is what allows you to build those applications. Data at scale actually supports more applications than smaller amounts of data. We can ask larger questions because we can look at longer or deeper histories of data. And I think the combination of good scale-out storage, a tremendous way to build the fabric or the lake you're talking about, with the analytic capabilities that Cloudera layers on top of EMC, is going to enable a lot more of those applications.
And you're starting to see certain firms cut across industries. I mean, look at what Google's doing, look at what Apple's doing in healthcare, and it seems to be data-driven. It's almost like the data lake, that data transport, allows you to do things that you never could have done before, and there's a lot of data science behind that. Where are we with the whole data science skills gap? Is that still a barrier? Is it confined to the guys who can attract people, and they're going to make all the money? What's the skinny on that?
We spend a lot of time skilling the industry up. It's absolutely critical for us that lots and lots of people know how to build on this platform. But you know, this is no different from 30 years ago, when relational technology was brand new. We need not just skills in the market but software in the market that attacks these problems. On stage this year, I posited to the audience that there's $100 billion worth of value on existing traditional relational platforms, right? That's platform, software, applications and tools. Now take vastly more data than ever before and analyze it in these new ways, right? Let's go find out what the relationships are in a call graph. We ought to be able to drive dramatically more value.
I think that the big data market is a $1 trillion market, and that value is going to be realized not because we've got data scientists who can poke and prod, but because they've rendered their skills into applications, the software that ordinary people can use.
Yeah, we talk a lot about the market. The question we always talk about on theCUBE is, will there be a Red Hat of Hadoop? You guys are obviously thrown into that mix, but I think Abhi Mehta said it best: there's no way there'll be a Red Hat of Hadoop, because Hadoop is so much larger than Linux, with so much more innovation, investment, and players, that the industry would never let one Red Hat emerge. There's just too much opportunity, so it's kind of an interesting and fun thing. What is the relationship between EMC and Cloudera? You guys are here together. What's new, what's different, what's the same?
So on Monday we put out the official press release that we're working together, and we've just completed certification of Cloudera Manager and CDH with Isilon. We certainly have plans for the future that will come out over time, but we're looking forward to continuing to grow that relationship and to seeing more and more innovations happen between the two companies.
So people always say, well, big data, it's about open source, it's about cheap storage, it's about white box servers. Is that changing?
Absolutely.
Talk about that.
The reality is that people are making decisions about where they need to store their data and the type of storage they need for that kind of data. At the exact same time, they're also talking about how they're gonna analyze and use that content. The problem is that today that's happening within different divisions. Sometimes it's happening at the enterprise architecture layer. When that happens, they find themselves facing the dichotomy of, am I gonna build a storage infrastructure, or am I gonna build a Hadoop architecture?
And we're taking that decision point away. Now you can choose the storage that you need for your type of environment and use it with analytics. A great example of that is what we're innovating around the ViPR platform at EMC. ViPR is going to be a cloud-based architecture. It's really built for large-scale object repositories. And look at how large these are gonna start out: hundreds of petabytes, multiple users, all multi-tenant. How do you take a hundred-petabyte implementation and say we're now gonna build a 300- or 400-petabyte Hadoop cluster and migrate all that content off of the object system so that we can start writing analytics against it? We really believe you need to be able to make the decision on what cloud architecture you need, and then be able to touch all of that data with Hadoop, pull that information in, and start using it, without necessarily having to move it from where it sits.
I have a question, and it's kind of an infrastructure question. You've seen storage scale out, you've seen compute scale out. The network is still this hierarchical mess. Is the network gonna flatten and scale out? What are your thoughts on that?
The answer to that question, very briefly and very simply, is absolutely. One of the reasons we've got the strategic relationship we've got with Intel is that we want to build this software platform along with the evolution in the underlying technology, and take advantage of advances in memory, solid-state storage, on-disk storage, and network and compute in the ecosystem. And there's a lot of innovation happening in all those areas.
I'll pile on what Ryan said. For those who want to run JBODs attached to racks full of white box hardware, we love you. Come on, that's how Hadoop was born. That's exactly what Google intended when it built its data centers. But you know, as the market has matured, so has the platform, and so have the companies using the platform.
They want us to integrate with the infrastructure that they've chosen, and they've made those choices for good and important reasons: manageability, cost, scalability, skills. It's critical that we unlock the value in the data they've already got, right? Not just in new stuff, but in the data that they've built up. And we want to fit in the data centers that they've built out. So I think the relationship with EMC is excellent for Cloudera and an excellent sign of a maturing market. More value is going to be captured and created. This is a deep strategic partnership for us, right? We're making a long-term bet on this relationship. When there's a petabyte of data stored in a system, that data has real gravity, right? It's going to be there for a long time. We're going to need to service that customer together for the long term. And EMC's got a diverse array of storage products. I think there's plenty of room for further strategic integration, but Isilon was a no-brainer start.
You know, we announced a great relationship on Monday, but we've been talking and working together for a while. It was probably nine or 12 months ago that we had our first really substantial meeting between the two of us. And I think we kicked off a partnership that's going to last a very long time.
Well, I mean, the two worlds are coming together. I was joking before that the average age is trending in my direction, but I think it's underscored by these types of relationships. Think about the EMC customer base. It's hardcore, traditional IT that can't ever go down. And what we're seeing in the Wikibon community is the DevOps crowd coming in and saying, okay, here's how you want to do this and do stuff. And there are a lot of smart IT guys out there saying, okay, that's cool, but whoa, whoa, hold on a second. We've got standards. We've got to worry about governance. We've got to worry about data integrity and data quality and things like that.
And so we're starting to see the smarter companies collaborate on those things. It's really your two worlds coming together.
And we're starting to see the smarter vendors move in the right direction, right? I mean, Isilon was super simple for us to integrate, because the Isilon team implemented the HDFS APIs. Yeah, we had to make sure that everything worked. We had to integrate with our management framework. We had to test all of our products and fix a few bugs. But these are enterprise platform companies that have naturally decided to integrate in support of customer requirements.
You brought up the Intel relationship. I wonder if we could talk about that a little bit; I don't think we've talked since then. Obviously the numbers are eye-popping, and everybody focuses on them. I want to talk about the go-forward plan and the outcomes here. So what are you actually doing with Intel? What are you getting? Is it distribution? Is it technology? A lot of that technology, I think, went to the open source community. Can you give us the skinny on not the numbers, but the post-numbers outcome?
Well, people focus on the numbers. But in fact, our intent, before the numbers, was a deep strategic relationship that allowed us to be the innovator. We want to build the very best product and deliver it with the most analytic and processing capabilities, earliest in the market. Intel can look 10 years into the roadmap: the footprint, the capabilities of silicon. The data center that we build a decade from now is not going to look like the data center that we build today. And we need to be sure that the software, the scale-out infrastructure for big data analytics, evolves along with the hardware. So we get to look at the Intel chip roadmap, understand what innovations are driving it, and render them into code. Sure, we contributed to the open source community. Sure, the industry at large benefits. But we're first, and that matters to us.
We want our enterprise customers to have the most time to monetize their data, and that means being the company that delivers that value earliest. Intel's decision to invest was flattering to us. They made a tremendous bet on Microsoft early. They invested seriously in Linux. They got super excited about virtualization. Big data is the largest infrastructure investment that Intel has ever made, and that's flattering to us. But for us, this was not a deal about the money. It's great to have these resources for strategic acquisitions and in case something terrible happens in the markets. But for us, this was fundamentally about making the platform better and continuing to drive the state of the art.
I wonder if I could follow up on a couple of areas. One is security, and the other is global distribution. I mean, you know, you're a small company, everybody wants your stuff, you've got to go overseas, and that's very expensive. How has Intel affected both of those? Security on the chip as well as in Hadoop, and then the distribution side of things.
In the 5.2 release that we just dropped, we've integrated Intel silicon's arithmetic capabilities to do very, very fast on-chip AES-NI encryption. So now you can store your data encrypted in this big data platform, on Isilon or on whatever sort of substrate you want, with the strongest possible encryption. We've also delivered good key management alongside it, so you can do PCI-compliant data management. You can put personally identifiable information in there and feel good about it. Not just there, but across the board: Intel's got in-memory architectures and on-chip networking architectures that'll be rolling out, and we get to continue integrating that stuff. But security is fundamental to the maturation of the market that we're talking about. Beyond the security, beyond the product stuff, as you note, we want to be able to reach customers everywhere in the world.
The largest vendor in this whole ecosystem in Asia was, by a mile, Intel. They had a presence in China and in India that was vastly larger than anyone else's. Because we've combined forces, we now get to use those channels. In fact, as a result of the Intel relationship, we've got a very large operation of our own in China to grow our business in that part of the world. So that's been tremendous for us. It's a terrific strategic partnership on product, on go-to-market, and on real strategic and innovation focus.
So Ryan, I wonder if you could weigh in on the global angle. EMC used to be, you know, everybody thought, okay, storage guy, you sell to the infrastructure people. That's changed. What Tucci did with the Federation was genius. You've become much more strategic, and a lot of customers are saying, hey, we want some help with big data, and you guys have many, many answers, not only through your partnerships; obviously you've got Pivotal as part of the Federation. What are you seeing in terms of global interest in Hadoop across your customer base? We live in this echo chamber sometimes, right? But I wonder if you could comment.
Yeah, and to tie up on the Intel investment, I want to say we're really excited about Intel's investment in Cloudera. Obviously our relationship with Intel has been, you know, extremely strong.
Did Pat help out with that at all, too?
We obviously have a deep relationship with Intel, and I'm very excited about the combination. On globalization: some of the conversations we were just talking about were with customers that are big in Europe and big in Asia. We're really doing quite a lot in our Australia region, for example. We're seeing that the requirement for big data is everywhere. In fact, I would say that when you get into some of the larger populations, you have a bigger problem to try and figure out.
The ideal situation here is that you start asking the bigger questions. You have bigger populations to deal with and a lot more data to mine through. And it doesn't really matter physically where that is. It could be all over Asia or all over Europe, and we're having successes in all those places.
It seems like customers are still trying to figure it out, though. They were spending like crazy on their EDW. We call it chasing the chips, the snake swallowing the basketball. And then this other thing comes along, Hadoop, and they say, wow, maybe we should start spending more money here. So they do a little experimentation, and then they start baselining their EDW. They're not throwing it away by any means; in fact, it's critical to the Hadoop world. But they're saying, okay, how do I shift my investments? What are you seeing? How are customers managing that balance?
I think that's a great question. We see people spending a lot of money on EDWs all the time. And I think where a lot of people are getting started with Hadoop is by shifting some dollars from EDWs, figuring out ways to do some things a little more efficiently, and still using their EDW for quite a lot of work. They're doing things like ETL inside of Hadoop before the data lands in their EDW, trying to clean up that process so it's done on cheaper hardware and less expensive software, and ultimately using that structured platform for exactly what it really needs to be used for.
Amr calls it the SLR and the smartphone. I don't know if that's his analogy.
Yeah, yeah, yeah. There's the high-end camera and the general-purpose camera in your pocket. Look, from our point of view, we've got a fantastic and very long-lived relationship with Oracle, for example. We just announced a great relationship with Teradata.
Our platforms are deployed side by side at every single one of our large enterprise clients. In general, what we are doing is complementary and gives customers much more power and much more analytic capability over their data. They're able to move the workload and the data to the right place. Data volumes are exploding; there is way more data being created today than ever before. We expect the opportunity across the board to grow, right? It's not about what am I going to turn off in order to have budget to turn this thing on, but how can I drive value? How can I extract profit from data that was previously unavailable to me, so that I'm willing to invest forward for additional revenue, for additional growth? And I think the existing long-lived products are outstanding at the work that they do. The real opportunity here is, let's go build something 10 times better together than was ever possible before.
Well, I think that last point is key. Having said that, the EDW did not live up to its promises: a 360-degree view of the business, real time, putting data in the hands of business users. Hadoop promises all those things. Can Hadoop live up to that promise? That's my question.
Look, we'll find out. I could find you a bunch of people who would argue with the premise that EDWs had somehow failed. There are a lot of CFOs not in jail because of the visibility they get into their businesses from those products. Hadoop is young, but we got to talk about some really interesting, socially meaningful applications that are running on it right now. And by the way, detecting fraud in financial transaction flows, making sure that banks comply with regulatory requirements, and optimizing customer engagement by understanding customers better: that stuff's happening today. I'm excited to see what we're able to do with high-performance analytics that were never possible before.
Machine learning has leaped out of the research lab and into the enterprise, where it's doing some amazing stuff. We're only getting started on that. There's hype, and not all of it's gonna come true, but I bet that right now we underestimate what we'll be able to do with data in the next 10 years.
I do agree with you a little bit, though: there was a missing component in the EDW, and Hadoop now adds that extra piece. Where I agree with Mike is the relationship announced yesterday with Teradata. What a great vision: taking the EDW and expanding its capabilities from being overly structured to having a lot of capabilities outside of that environment. So when you get to images and video and the things you can start doing with Hadoop against that information, tying that back into the structured data you already have in your EDW is gonna provide extreme value to the organization. That may answer some of the questions people are concerned about.
I think you're right; I don't think it's a zero-sum game. There's no doubt about it: I used to work with a lot of CFOs, and post-Enron, the EDW saved a lot of butts. It was actually a boon for the EDW business; thank God for Enron in some cases. And our data shows that the number one and number two tool sets in use in big data initiatives are data integration tools, number one, and the EDW, number two.
Oh yeah.
Hadoop is there too, probably a large proportion, but at the top are the EDW and data integration tools. So I think you're right, Ryan: it's that combination that is actually gonna fulfill the vision of a 360-degree view and real-timeness. The hard part is really putting the information in the hands of business users. There's a gap right there now.
That's the application opportunity that I'm talking about.
I think you're right. If you're gonna put a Hadoop cluster next to your existing great big enterprise data warehouse, data integration has just doubled in importance to you. You've got to be able to get the bits to the place they need to be and to the user who needs to understand them. So there's plenty of work to do. But consider a platform that lets you live up to your standards, in a data center you understand how to operate, that lets you capture way more data and analyze it in new ways. That's giving businesses, mature existing businesses, a huge new tool for understanding their customers, their markets, and what they need to do.
Well, gentlemen, congratulations on the renewed partnership, and best of luck. Mike, in particular, thanks so much for giving me a Hadoop 101 way back five years ago. It really changed my life.
Quick study, brother, quick study. You understand this. Good to see you again.
Thanks very much. Ryan, thank you. All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE; we'll be right back.