Okay, we're back, we're live in New York City for Hadoop World 2011. This is SiliconANGLE, SiliconANGLE.tv Productions, this is theCUBE. My name's John Furrier, the founder of SiliconANGLE.com. My co-host is here and his name is? I'm Dave Vellante of Wikibon.org, and we're here with a special guest, Jeff Kelly, also of Wikibon.org. Jeff is a big data analyst for the Wikibon community. Welcome, Jeff. Thanks for having me, guys, appreciate it. Yeah, good to see you. Hadoop World, third year in a row. Your first year, my second; John, you've been here since the beginning. So, Jeff, what are your initial impressions? You're scouring the crowd, talking to some users. What are you seeing out there? Well, there's just a ton of interest in Hadoop, all things Hadoop, right now. We're seeing an evolution of Hadoop from a niche, almost fringe, approach to big data analytics and processing to really starting to gain mainstream adoption, and we're seeing more business users here, as we heard in the keynote. I think about a third of the attendees here are actually business users. They want to understand how Hadoop can help them in their everyday business. So we're definitely seeing a shift from the more techie crowd to include some business users. Yeah, so I know that leading up to Hadoop World you published, along with some other folks in the Wikibon community, though it was really your effort, the Big Data Manifesto. A big project you undertook, really trying to describe the ecosystem, the use cases, the industries, the market segmentation and the like. Talk about why the Big Data Manifesto. What led to that? Well, really, we were hearing from our community, the Wikibon community, that they wanted this information. There's a lot of interest around big data right now, and our community wants and needs to understand what it is and how it can impact their business.
There are a lot of questions around adoption challenges and how Hadoop can fit into your existing IT infrastructure. But I think most importantly: how can it be applied so that organizations can achieve competitive advantage? And so that's why we put together the manifesto. So what's the big premise that you're putting forth? Right, well, the premise is that big data is the new reality. The fact is that the old ways of storing, processing and analyzing data just aren't going to cut it anymore with the variety, the volume and the velocity of data that organizations are now grappling with. So organizations that want to maintain their competitive advantage, want to innovate, want to create new ways of doing business and explore the new opportunities that big data makes available to them need to embrace this new reality. Okay, but are people really making money? It sounds good, but how real is it? It's real. Our research indicates that a good number of organizations are working with Hadoop now. You've got the large web companies that essentially helped establish and develop Hadoop, Facebook, LinkedIn, Yahoo, Google, et cetera, and those are the well-known use cases. But we're also seeing online retail organizations, Orbitz for instance, eBay, Amazon; they're big Hadoop users as well, and they're all making money by optimizing their business in various ways, finding new efficiencies, developing new products. There are also financial firms; we heard this morning from JPMorgan Chase in the keynote. Financial firms are really starting to understand that Hadoop can help them manage risk and manage their exposure. We've got the Greek debt crisis going on now, so managing their exposure to Greek debt, or whatever the case may be.
Also the customer: pretty much across industries, this can be used to understand your customer better. We heard about a 360-degree view of the customer; you can use that information to offer better services and products and basically provide unprecedented levels of customer service. Okay, so talk about big data. For the people in the audience who may not be familiar with the concept: big data, little data, what's the difference? Well, obviously people hear "big" and they think volume, and that's true. We've gone from small data, gigabytes; a terabyte of data a few years ago was not that common, even in large enterprises. Now we've got multiple terabytes, hundreds of terabytes, even into the petabyte level, even for non-web companies. So that's certainly one factor. Then there's the variety of different types of information: unstructured data, semi-structured data, data that doesn't fit into neat rows and tables. With small data you have stable data models and known, complex interrelationships; with big data we're talking petabytes of data, unstructured data, flat schemas. So it's really a whole new set of data sources; it's really a new world. Jeff, you're talking to end users, you're talking to all the companies, as well as the practitioners out there who are kicking the tires. As we heard, proofs of concept are moving on to other areas. Where's all this data coming from? We've heard different approaches: it all originates with Hadoop, is that Cloudera's message? We heard Informatica say something different. Where's all the data coming from?
Well, it's coming from a lot of different sources. I think what really kicked off what people think of as the big data era was social media, social networking data. Think about tweets, Facebook updates, LinkedIn updates. A tweet, for instance, has multiple data points: it's not just the text, but the time, the user, the device. So there are multiple data points being created with all types of social media updates. You've also got mobile devices that are sending back location-based data, and organizations are using that to optimize, to give you real-time advertisements and real-time offers depending on where you are. You've got network data; IT hardware is sending data back to home base constantly. And sensors are becoming a part of the way we do business. Think about the trucking world: fleets of trucks are fitted with sensors, and organizations can use that data to plan more efficient routes, cut gas consumption, things like that. We're here with Jeff Kelly, who's the author of the Big Data Manifesto. You can find that on wikibon.org. It's a really in-depth, comprehensive document, and it's a wiki, so it's always editable. Again, Wikibon's information and research is free. Jeff deals with all the vendors and talks to end users, so go to wikibon.org and find the Big Data Manifesto, or just go to Google and search "Big Data Manifesto" and you'll see the link up there; it's one of the top two results in the Google search. Let's talk about the marketplace, right? You have different market segments, and we're here at Hadoop World 2011 with a lot of vendors: Aster Data, you've got Arista, you've got Dell. So let's handicap these guys and talk about how they shake out and what bucket they fall into. And tell us, what are the buckets? Business intelligence, analytics? Describe them for us. That gets confusing sometimes. Can you lay that out and just kind of...
Yeah. When we think about big data, the market segments lay out in terms of hardware segments, software segments and services segments. Starting with the hardware segment, and Mark, I think we have a slide we can put on the screen: you've got your storage, your servers, your networking equipment, the hardware that's going to power a big data deployment. Which Andreas Wagens says is irrelevant: the infrastructure. Well, you might hear a different argument from some of the players here today. Tell us about that. Right, so commodity hardware is the foundation here. Without it this wouldn't really be possible; it wouldn't be economically feasible. So you've got players like Dell, HP, Cisco, a lot of the names you've heard before in the hardware business. Then we've got the Hadoop distribution game, which we can go into in a little more detail later, but in terms of the different options out there, you've got the open source options, the purely open source ones. So we're looking at the slide; this is the big data market segments chart that you've laid out in your manifesto. Okay, so go ahead and continue telling them what they're looking at. You've got free open source distributions, and you've got vendors like Cloudera selling enterprise-level distributions that bundle in different levels of proprietary customization as well as services. Then you've got some non-Hadoop big data approaches. LexisNexis spun out a company called HPCC Systems; they have a competing approach. And then you've got the next-generation data warehouse vendors, Aster Data, Vertica; they complement most Hadoop deployments. It's a slightly different approach, but there are some commonalities around massively parallel processing and MapReduce within the database. So handicap a little bit.
I mean, who are some of the companies you really see as the favorites that people should be watching out there? Well, obviously we're here at Hadoop World, and Cloudera has been on the scene the longest. They've got a two-year head start on the market; they've been around since about spring of 2009, so they've got the most mature distribution on the market. They've been updating it for two years; they're on to version 3.5, I think. As I mentioned, they have two different levels. They've got CDH, Cloudera's Distribution including Apache Hadoop, which you can download for free. Or you can purchase the Enterprise Edition, which includes their proprietary management console as well as some services. So who else is going to make a boatload of dough in this space? Well, that's two different questions. There are a few different players; who's going to make the boatload of dough, we can talk about. We've also got EMC, which got in the game over the summertime, creating their own distribution, mainly based around MapR's distribution, which includes a proprietary storage layer. So it's been painted as a largely proprietary distribution, and that's fairly accurate, I think. And then of course, just over the summer, Hortonworks was spun out from Yahoo. And I believe just last week Hortonworks announced their first distribution, 100% open source, including their custom-built management. So those three are going for the big prize, right? John, we've talked a lot about the next Red Hat. Armour made some comments about the similarities and the differences, and one of the differences is that Red Hat's got a $10 billion market cap, and that's what everybody wants in this space, right?
Yeah, and it's obviously validated by, one, the presence of the vendors. The market segments slide that Jeff put out lays out the ecosystem, the landscape of the industry. But really the telling sign to me on the money-making and growth side, and I'd love to get both of your perspectives on this because I know you're tracking and doing some specific research on it, is Accel Partners putting in a $100 million fund, announced today at Hadoop World. That's compelling. Cloudera also just announced yesterday the close of a $40 million financing, another pile of money for them. So big bags of money are being brought into the sector; obviously investors see returns there. And we're going to have Frank Artale on, who's with Ignition Partners, formerly the head of biz dev at XenSource, now part of Citrix, and he dealt with the cloud business the same way. Is it the same kind of momentum? What do you guys see as the end game for monetization and opportunity? That's a really good question, John. It's interesting: we're talking about the big prizes that Cloudera, EMC and Hortonworks are going after, but those are largely funded, right? There may be some subsequent rounds, but the funds that Accel announced this morning are largely going to go after the rest of the ecosystem. So where would you put it? Well, I think building applications on top of Hadoop is definitely the way we're going to go. We heard that in the keynote this morning, and I completely agree. I think we're starting to move from the "what is Hadoop" phase to the "what can Hadoop do for me" phase. We're not quite there, but we're moving in that direction. And the key to answering that question is building applications on top of this really powerful infrastructure that we call Hadoop, to actually take the data that you've now processed and are storing and managing in Hadoop and make it work for you.
So I think there's definitely going to be a huge opportunity for startups and other organizations to build these types of applications, to really turn Hadoop into an enabler for both business analytics and, ultimately, achieving competitive advantage. That said, there's certainly a lot of money to be made in the distribution, the infrastructure layer itself. If that doesn't work, you can build all the applications you want on top and it's not going to do you much good. So we're going to have a remote today from Alex Williams with the author of the HBase book. It's the first time we've ever done that on theCUBE; we're actually going to use Skype to do it. But my final question is, what are you seeing here at the event? Any new discoveries? Obviously you've got your sharp eye out there; how does that tie into some of the work you're doing? Well, I think some of the sponsors and vendors we see here are very interesting. We've got Oracle here, which some people might not have expected. I was a little surprised seeing them, right? I was a little bit shocked too. I thought maybe somebody had misplaced the banner. They've got to play, right? But yeah, absolutely. What do you think they're doing here? Do you think they're just doing reconnaissance? Well, they got into the big data game themselves, at least in name, at Oracle OpenWorld, which you guys covered with theCUBE. So I think they're starting to feel it out and trying to understand this market, and that's probably why they're here. A little bit of competitive intel, that kind of thing. Just feel the pulse of the community here, get a feel for the Hadoop community. It's an open source community.
One of the challenges for a lot of the vendors here, the more traditional proprietary-type vendors, is getting into this market: cutting through to the open source community and gaining acceptance. Are they Hadoop washing? The term that John Furrier coined today? Hadoop washing, I like that. We're inventing new terms on theCUBE, which is good. That's what we do: we extract the signal from the noise. And we're looking for noise, and the noise is in the form of Hadoop washing, which is essentially bolting on the word Hadoop. It's a tricky one, like cloud washing; we're going to look at that. I think that's definitely happening. I would include big data washing too, because you're seeing vendor after vendor slapping "big data" onto an XYZ product. What does it really mean? There are different definitions depending on what part of the market you're coming from. So that's something end users and companies that are evaluating Hadoop and other big data approaches need to cut through, and that's what Wikibon's trying to do. Okay, just stay right here. We're going to go to a remote with Alex Williams, our lead blogger for ServicesANGLE and SiliconANGLE in the enterprise, with Lars George, the author of the O'Reilly HBase book, which they're giving away, by the way, to all the attendees. I haven't gotten mine yet, but... Got yours. I do have mine. I read it last night, cover to cover. As you know, Dave, we have an HBase deployment. We've built our own custom HBase, so I'm really excited about HBase. It's one of those hot things that just works really well for us. It's allowing us to do things we've never done before, and if we'd tried to do it years ago, it would have been very difficult with structured databases. So...