Dave Vellante, we are here live at the Hadoop Summit 2013. We're here at the San Jose Convention Center. Two days at the Hadoop Summit, the big open source community event, sponsored by a number of folks but headlined, of course, by Hortonworks. And this is theCUBE, where we go to the events, we extract the signal from the noise, and we bring you the absolute best guests attending these events. We pick their brains, we package up the information, and we share it with you, our audience. I'm Dave Vellante with Jeff Kelly of wikibon.org. John Furrier is on his way down from a meeting in San Francisco. And we will be covering two days of the Hadoop Summit wall to wall. A lot of action here. We're seeing the Hadoop community maturing. We're hearing a lot about YARN, Yet Another Resource Manager. Not the greatest acronym. I thought it was an N. It is an N actually, which is why it doesn't... We'll talk about why the world needs yet another resource manager. But we're talking about the maturation of Hadoop. Last year, Geoffrey Moore gave a talk on the whole notion of crossing the chasm. The senior executives up on stage here, folks like Rob Bearden and Shaun Connolly, are putting forth the notion that Hadoop has crossed the chasm. We're going to unpack that and check it out. So we were in the keynote this morning. Rob Bearden, the CEO of Hortonworks, gave a keynote. Rob is a longtime executive at open source companies like JBoss and Spring. Merv Adrian from Gartner gave a great talk about some survey data he just published. One notable piece of that survey, Jeff: 31% of respondents have no plans to do anything around big data. And then of course, Shaun Connolly talked about the modern data architecture, which, again, I want to talk to you about, Jeff.
He showed a slide with the applications up top, the data sources down below, like SQL databases, and the data layer in the middle, and then he showed Hadoop as a bolt-on to that, which was kind of interesting. So he's sending the message that we're not going to have to rip and replace to actually make this thing work. So you were in the keynotes. What are some of your thoughts? And then we'll break down some of the news we're hearing today from a number of folks in the Hadoop ecosystem. Well, I think the key message Hortonworks wants to get across at this conference is that Hadoop is growing up. Herb Cunitz, the president of Hortonworks, mentioned in his opening keynote that the baby elephant is quickly growing up into a large, robust adult elephant. So the idea here, of course, is that Hadoop is ready for the enterprise, and the key, according to some of the speakers we heard this morning, is to enable the open source community to continue innovating and building out the core components of Hadoop: HDFS, MapReduce, and now YARN. Keeping that open source was a key message coming through loud and clear from the Hortonworks executives who spoke this morning, including Rob Bearden, CEO of Hortonworks. So this show is all about making Hadoop enterprise-ready, continuing that journey, and adding components, or I should say more than a component, really a fundamental new core pillar of Hadoop called YARN, yet another resource manager. Negotiator, actually. Negotiator, there we are. Yeah, there's your N.
Adding that core pillar to the other two in Hadoop, HDFS and MapReduce, really enables new types of applications on top of Hadoop, so that Hadoop can serve as a core enterprise data platform where you can run multiple applications, support a lot of concurrent users, and manage that environment so they all perform at optimal levels. We'll talk about that a little bit. So YARN is designed to solve the problem that MapReduce is essentially a single job stream. YARN allows you to run multiple job streams and schedule multiple jobs in tandem. We heard some data today that Yahoo runs YARN across 30,000 nodes, with 400,000 jobs a day and 10 million compute hours per day. The significance of that is it's 2X the amount of work they can do on the exact same hardware. So that's the point of YARN, right? It allows you to do a lot more with less. Right, it allows you to do a lot more with less. It allows you to really manage those resources. In a lot of early Hadoop environments, if you were to run an HBase application, for instance, it would take up a lot of the resources on your cluster, and you really couldn't do much else with your Hadoop cluster while that HBase application was running. What YARN allows you to do is run that HBase application, but because it manages resources more efficiently, you can also run, say, a graph analytics job, or potentially a SQL-type job with Hive or Tez. So it allows you to deploy multiple applications simultaneously on your Hadoop cluster, make better use of those hardware resources, and really make it a true data platform rather than a single-use-case application. We're also hearing a lot of news today. I know you're seeing a lot of partnerships with Hortonworks. Teradata's got some news, WANdisco's got some news, and so do Fusion-io and MapR. What are you hearing out there?
What's the news of the day? Well, there's a lot. As often happens on the first day of a conference like this, there's a flood of press releases that hit the wire at 9 a.m. So there's a lot to wade through, but some of the highlights are a couple you mentioned. Let's talk about Hortonworks first. They're announcing a few things: a Community Preview of HDP and a YARN certification program. There's also a joint announcement between Teradata and Hortonworks, where Teradata will now resell the Hortonworks Data Platform in a number of form factors, including a Teradata appliance for Hadoop. You mentioned MapR. They have partnered with Fusion-io, the net result being a significant improvement in the performance of the MapR Hadoop distribution and the applications you can run on top of MapR's M7 distribution, thanks to the flash-enabled hardware that Fusion-io brings to the table. WANdisco, another company in the Hadoop market, announced a number of things today, including the latest iteration of their distribution, which includes some tools to make it easy for customers that have started experimenting with Hadoop on Amazon Web Services and want to migrate that to an internal cloud. So they've got a global namespace, the S3 piece, some in-memory work, and then the new Hadoop distro. A lot of things happening there. WANdisco is a very interesting company; we're going to have them on later. Sure. So a lot happening in the news. Most of the companies here have announcements timed to this conference. So let's take a look at our schedule today, Jeff. John Furrier is on his way down from a meeting in San Francisco. Rob Bearden's coming on right after this kickoff, in about five minutes. Then Arun Murthy, who's the founder and architect of Hortonworks.
David Richards, the CEO and co-founder of WANdisco; they've got three or four announcements today. Intel is coming on, and Merv is coming on. Amr Awadallah of Cloudera, a person who's been here from the beginning of this big data and Hadoop movement. Sanjay Mehta, vice president of product marketing at Splunk. Splunk is really doing some amazing things. Anjul Bambri, vice president of Big Data and Streams at IBM. Streams is a very interesting new capability; I'm going to talk about that a little more in a moment. Scott Gnau from Teradata Labs is coming on. Brian Bulkowski of a very innovative startup called Aerospike. Jana Uhlig is coming on. She's the CEO of HStreaming, a company that was essentially spun out of IBM's labs around streaming technology, and they're taking that to a new level. This is about making decisions on data before it gets persisted. First of all, go to siliconangle.com and check out all the news. Go to wikibon.org and check out all the research. siliconangle.tv is where you can see all these live broadcasts; we've got multiple channels going as well. Go to youtube.com/siliconangle to see the playlists from all the events we do. I also want to give a shout out to our sponsors. Of course, we're here in large part because of Hortonworks. They've built this fantastic stage for us and are really the anchor sponsor of this event. Also Teradata, Nutanix, Sqrrl, Hadapt, MapR, Splunk, WANdisco, Cloudera, Qubole, and HStreaming. Thank you so much for allowing us to bring you coverage from events like this, and specifically the Hadoop Summit. So Jeff, what are you looking for over the next two days? What should observers be paying attention to? What are the kinds of things that are going to excite you? Well, I want to hear a lot more about YARN, of course.
YARN has the potential to allow Hadoop to take the next step in its ability to be a real, fundamental data platform in the enterprise. So I want to hear a little more about the details of how that's going to happen. As for the state of YARN as it stands now, it's a three-year effort that Arun Murthy at Hortonworks has undertaken, really leading that project. He's been heads down on it for three years, and we're going to have Arun on later today to talk about that. I want to hear from a lot of the supporting players in the ecosystem about how they're going to leverage YARN to create and deploy their applications on top of Hadoop. So those are some of the things I want to hear. And of course, at a big ecosystem conference like this, I want to hear about the partnerships and what's going on between vendors. We heard during the keynote this morning from CEO Rob Bearden that this is clearly going to take the entire ecosystem, really a community effort, to improve Hadoop and make it a core part of enterprise IT infrastructure. So I want to hear from those vendors how they're working together. At the same time, a lot of them are competing, so that will be interesting. And again, there are news items we didn't mention: Hortonworks announced $50 million of new funding, so that's something we'll talk to CEO Rob Bearden about. Datameer announced version 3.0 of their platform, with what they're calling smart analytic applications. Splunk announced a new product called Hunk. I particularly like that name. It's basically Splunk Analytics for Hadoop. So lots of news to cover and lots of developments, on both the business side and the technical side.
And of course, the other thing I'm really looking for is to hear from practitioners, people out in the field using big data and Hadoop: what their challenges are, some of their success stories, and where this is really going inside real enterprises. All right, great, thank you, Jeff. Keep it right there, buddy. We'll be right back; Rob Bearden's coming up next. We're going to talk about that funding. We're going to talk about this event. Has Hadoop crossed the chasm? What needs to be done? He said there are two things you've got to look for: one is to harden Hadoop, and two is to bring those data services to market. And he stressed that we cannot do that alone; it has to be an ecosystem. So we'll be right back with Rob Bearden, and John Furrier's joining us. Keep it right there. This is theCUBE, live from Hadoop Summit in San Jose.