Okay, we're back live in New York City for Hadoop World 2011. This is siliconangle.tv, siliconangle.com, continuous coverage of Hadoop World 2011. We are here on the ground with our flagship telecast, theCUBE, where we go out and talk to the smartest people, extract all the knowledge and share that with you here on theCUBE, and we're here for two days, today and tomorrow, all day, talking to all the top executives, thought leaders, bloggers, Twitterers, and we have here a famous Twitterer, Dr. Lucky Spin, Todd Papaioannou, who is an entrepreneur in residence at Battery Ventures. I'm here with my co-host, Dave Vellante. Todd, welcome to theCUBE. Thank you for having me. So, you know, we missed the boat with you at VMworld because there was a huge band playing and you were slotted to be on theCUBE. We didn't make it work. Someone like Paul Maritz gazumped me for this time slot. I don't know what he was thinking. Glad to have you back. You're a chief architect. You're a tech geek. You're in big data. You were at Yahoo, and before that, Teradata and Greenplum. You know the space. You know the environment. The world's changing. So, you know, you're working on a startup. I know you don't want to share any details. Obviously, when you're an EIR, what that means is you're cranking something up and trying to put things... It's a pretty, pretty good, you know, guess that I might be doing something. Yes. He's doing something. Chief architects just don't sit and work with VCs unless they're retired, doing due diligence for them. But no, in this case, you know, the guys from Yahoo spun out Hortonworks. Yeah, all my friends. You've seen large-scale stuff. You know, you're shaking hands. Everyone knows who you are in the community. So talk about what's happening. I mean, you're in a position now where you've got a VC office. You're working there and they have a good view of the landscape, looking at all the investments. What's the hot area?
And what are you working on, you know, categorically, some of the things that you're seeing that are hot? Can you share a little bit? Yeah, absolutely. We'll talk about, so, I mean, I think where we're, you know, the market now is highly interesting, right? If you think about it, really, we're kind of two years into the market formation. And I would say that we're just at a point now where we're past the kind of like missionary phase, you know, where, you know, Cloudera was doing a great job out there, kind of like teaching everybody in the world. Now we're at the point where the early adopters have adopted and, you know, the fast followers are starting to actually jump on the platform. And so the formation of the big data market around us is accelerating, I think, you know. So if you look around, you know, at Hadoop World compared to last year, just even compared to Hadoop Summit, that was, you know, only like six months ago, whatever it was, less than that, three months ago, there's been a tremendous amount of, you know, change in the industry. I think we're just starting to see it accelerate. I think what's interesting from an opportunity standpoint is that, you know, most of the work that we've seen to date has really been kind of like at the plumbing level. And I think we're going through the next phase of kind of like market, you know, evolution where people are going to start to come out higher up the stack, right? There's going to be many more opportunities for enabling application development, you know, allowing people to build applications, big data applications, more quickly. And ultimately over time, I think, you know, value accrues in the market higher up the stack. People want to pay for business insight, value, analytics, you know, over time, they don't really want to pay for infrastructure and plumbing. What about your architectural issues? I mean, you know, a lot changes.
Is there anything on your radar in terms of, you know, you look at this and you look at solutions. A little bit more complexity now. I mean, obviously the growth brings in new players, a little bit more diversity outside of the cocoon of Apache, you know, the greenfield, as they say, of the Hadoop platform, which is accelerating as you said. But now you've got, you know, unstructured data now blending in, in production, with structured data, you know, the SQL database, relational database market, which has traditionally not been the fastest. It's been, you know, a lot of overhead involved. Different use cases. You know, we're seeing stuff from HP. We were just on theCUBE last week, they announced new servers, 280 servers in a fricking slot in a ProLiant box. Low voltage, off and powered on. So, whole new paradigm coming down the pipe. There's 2,880 servers in a rack. In one rack, it's in one rack. One rack, one rack. So, more servers on a card. Micro nodes, basically. Into a ProLiant. Managing cores, turning off cores, really cool. But this points to a whole new architecture, a network architecture, or software architecture. Can you share with us what you're seeing there? Yeah, I mean, I think there's, so as I kind of like look at the market and try to take a step back and look at the mega trends, right? There's two mega trends that I see happening at the minute. One is cloud and one is big data. And those are the two mega trends for this decade. If you look back at the last decade, probably the two mega trends you could identify were SOA architectures and virtualization. So if you buy that assessment, that, look, cloud and big data are the two trends, what does that really mean? Cloud is really a way of thinking about how you deliver business applications. And you put this kind of like cloud fabric in place. And so there's a lot of stuff that goes on underneath that fabric. And for everybody running applications, they shouldn't have to care about any of that stuff.
So I think actually that this decade we're going to see a tremendous amount of innovation, more kind of like exotic configurations of stuff in the data center underneath that fabric. And then above the fabric, the applications are going to be rolled out much more quickly, more dynamically, more elastically. And the most interesting thing going on up there is big data. I think we're moving to a trend now where data applications, the next generation data applications, are no longer going to just be batch offline applications. They're going to be real-time big data applications where people are running data through their platform in real time and being able to act on that in much more real time than they traditionally were. Real time, so big data will go real time. We've said it. Yeah, we're already saying that. And let's quickly define real time. What do you mean by real time? Is that transactional? So what does that mean? Like before you lose the customer, you can react? Is that a fair definition of real time? If you look at the types of applications we were trying to get to at Yahoo when I was there, what we were trying to do is get to a real-time platform so we could do intra-session targeting or optimization. So that doesn't have to be, that's not microsecond level, right? But it's still fast enough so, you know, if we understand that you searched for a holiday in the Bahamas, when you go to another page, we don't show you an advert for a holiday in Antarctica. You're probably looking for a hot destination. Right, so that's all the stuff. So I think, you know, traditionally so far with the evolution of the big data market, we've seen that it's been incredibly good for doing batch offline science. And people have been able to do a whole bunch of data science on this data. But you know, even if you look back at the traditional database world, the evolution in the traditional database world was from the batch window reporting to the active data warehouse.
And there's a whole reason for that. Time to decision, time to insight is actually highly important for your business. So we're gonna see the same thing in big data. It's just a different type of data. You talked earlier about how much data is actually exploding. And you look at, you know, recent Gartner reports saying the amount of data in the world is gonna explode by 800% in five years. But 80% of that data is gonna be unstructured. So we have the opportunity here for a completely new, you know, ecosystem, marketplace and landscape to appear, where people are gonna go and get value out of that, not just in science, but in real time, much more business insight, real time, for their business. So the scenario that a lot of people put forth at this event is that, particularly, we had Mike Olson on theCUBE and he basically said, look, this is incremental. It's not competitive to existing, you know, traditional transaction environments. Do you buy that or is this, there's a feel of disruption in the air, isn't there? I think there's disruption, you know, for certain parts of the workloads in the traditional data space. I think it's also very complementary. There's a reason that structured databases, you know, and I've been at two of the kind of like leading companies there, right, do so well, right? If you just look at Teradata, they're killing it at the minute. Greenplum are doing very well as well. There's a reason that we've been able to derive a huge amount of value out of structured data and models and all the rest of it. But I think what we're finding now is that there was this class of data that traditionally people just used to drop on the floor. It was this unstructured data, and people were like, look, I don't know what the value is in it and I don't know how to use it and I don't have a platform that has the right dollar cost for me to actually store it. So they just dropped it on the floor.
And in the last few years, we've found that things like Hadoop have really driven the cost down and so people are starting to look at this data again and go, oh, there is value, there is insight. And there's actually this drive in that a lot of this data has been generated by the consumers. And so for a lot of the online companies, whether they're e-commerce, whether they're gaming companies, whether it's just internet companies, actually targeting the consumer and understanding how to do better localized deals or advertising on content or coupons or, you know, that sort of stuff, that's actually been driven by this unstructured data. So I look at it as very complementary, but I think you're right, over time, as this market unfolds. So I think we're going to see that because of the volumes of data, and the way people want to deal with the data, we're going to have new paradigms in how people actually get to interact with and query that data. I mean, my sense, and I talk to a fair number of customers in the enterprise data warehousing space, is that a lot of installations are a real mess. They're cobbled together. It's like they got one of everything. It's bubble gum and bandages. Oh, we'll try that now. Another chip from Intel. We'll put it in and see if that helps. I describe it as a snake swallowing a basketball and it's a very painful environment. And now, Hadoop is so new, and it's not solving the same types of problems, but is there potential in your opinion to actually change that dynamic, that painful dynamic that is known as the enterprise data warehouse? You know, are we overstating it? Well, I think we can even step back and say, look, this is not my fault, I think I was in when I came over, you know. I think we could actually even abstract up a little further and say, look, the dirty little secret in enterprise software is there is no enterprise software. There's enterprise solution ready for customization with my PSP, right?
But, right, so it's not just the enterprise data warehouse, but I do think that that is an example of the fact that the enterprise data warehouse used to take the kind of six to nine month modeling phase, and the data models are there and then the data cleansing, the MDM and all the rest of it. And people have found now that the pace of business and the pace of requirement of kind of like insight from that data has driven us to these new platforms where what we want to do is just plow data into a data fabric, that's how I think of big data. It's a fabric that you just plow data into, you don't have to pre-model it, you don't have to pre-structure it, and then you're able to derive value out of it by being able to navigate and query through the soup, right? And so I think actually we're going to see the query models and the way we actually think about extracting value out of the data change. So if I posed the question to you and said, what do you think the world's largest, most widely used big data application is on a daily basis? What do you think the answer is? Say, upload these pictures. Google, Google is a big data application. They pull data from unstructured data sources from all over the world and the search interface is the interface to that data. So I actually think that over the next decade we're going to see the return of the inverted search index, or just the search index, as the navigation and query interface into this unstructured data. So we're going to see a lot of change in the way people extract value. But fundamentally, you know, all businesses want intelligence. Is that a little teaser? No, no, no, no, no. I see, it isn't, it's just, this is one of my pontifications and predictions of the future, right? Good CUBE, right here, that's some good CUBE action right there. We'd love to, Dave and I will talk about, because Dave's a big horse racing man. We like to handicap the ponies on the track.
So let's go through and talk about the growing market, with Hadoop rising. No, fair enough, you know, a lot of those people are my friends, so let's not call it handicap, we'll just call it like, you know. Being polite. We got some real stallions on the track. Exactly, there's some studs out there. Studs out there. So we have different, so rising tide floats all boats, right? Hadoop is growing. So the more people use Hadoop, there'll be winners, and maybe Cloudera maintains that number one lead position and continues to do great work, and they got $40 million of fresh capital. God bless you. Yes, they have. And that's good for them. But you got Hortonworks, you know the guys from Hortonworks, ex-Yahoo, you knew those guys, they're out there. For full disclosure, I'm on the board of advisors at Hortonworks. Okay. So I love those guys. I love the Cloudera guys too. And you know what? There's nothing wrong with being number two in a huge growing market, right? Well look, here's the really interesting thing, right? I think which is, you know, let's take a company like MapR. MapR are doing extremely well because they're really focused on enterprise features. And in some ways, they've delivered features ahead of the Apache community. So the challenge for companies like Hortonworks and Cloudera is actually to be able to implement those features and catch up and move ahead. But, you know, we still have the, you know, the challenge of working with an amorphous community who will have, you know, individual, you know, criteria for what they want to do. So I actually think that MapR has been fantastic for the big data community because it's posed a challenge to Apache, and everything is going to get better because of that. Competition is going to drive competition. Competition drives innovation faster. Exactly. People don't get, you know, lazy and kind of like take things for granted. Right. Exactly right. Because they're looking in their rear view mirror.
Okay, so Hortonworks is back on the track, you know, spun out of Yahoo. Very successful marketing campaigns are running. Very strong PR right now. The management suite's in alpha. So that's kind of obviously a strategy. Let's talk about MapR. I mean, does MapR have a chance? Is it just a unique use case? Does it have a chance there? I mean, obviously EMC is making a big bet with these guys. I think they have a real chance. I mean, I think they've done a, you know, extremely good implementation of Hadoop. You know, it's, you know, got very nice enterprise integration features with their NFS stuff. It's more performant. It's easier to use. It's easier to manage. What's not to like, right? If you're an enterprise. You can make a pretty strong case for MapR right now. They're backed by EMC, you know, Greenplum's got momentum. You know, you see a shift toward the enterprise, right? The enterprise is adopting now. The early adopters were the West Coast companies. They're going to more likely buy from an EMC than they are, you know, a Cloudera. Now granted, Cloudera just made a deal with NetApp, for example, I like that move. Okay, so you're seeing all these things come together. But I mean, MapR's got a product, Hortonworks, are people indifferent though? I mean, that's a question I want to know, is that, are the users out there, the buyers, the marketplace, do they want to consume this stuff? Are they indifferent? Do they really care? That's what I think they care about, right? And just to kind of pick up on that point, look, so I think MapR have done great. You know, they have a great product, right? I do think that Cloudera and Hortonworks are going to catch up, right? I'm very confident that my, you know, guys at Hortonworks are going to do a great job. But to your point about, are they indifferent? What I actually think the market wants is easy-to-use features further up the stack. People don't want plumbing anymore.
They don't want to have to get a distribution and install that distribution and figure out, I've got 25 components that I have to wire together before I can even build an application. What they want is a fabric, like an Oracle, that says, boom, it just works. And here's a language, PL/SQL, that I can build applications in. And it's that thing I want. They don't want a kit. They don't want a kit. They want a product, right? So I think actually over time. The platform, they want a hardened top. They want a hardened platform with a simple-to-use interface, right? And right now you've got all of these different components that you have to wire together. So companies are basically forced to go and acquire talent and build distributed systems teams and stand up a platform. Build a JSON connector, it's like, you know. Yeah, so, but here's the good news. We're really only two years into kind of like this market formation. The stats I gave you about how much unstructured data we're going to have tell me, look, it's a great opportunity. And who believes those numbers? It's probably higher. It might even be higher. It could be higher, exactly. Gartner. You know, exactly right. Double. You know. Even if it's order of magnitude close, I think there's a big opportunity. They're usually doubling the numbers. In this case, I think they err on the low side. Right. No, I think that more than 80% of the world's data is going to be unstructured. What's the number? 8x it's going to grow? 8x in five years, 80% unstructured. So to me that says, look, look at how many people around here are with companies coming out of stealth or working in the space. It's embryonic. It's very embryonic. It's early stages. I really think 8x in five years is immediately. It should be, basically, you're doubling every year. I think 18 months to two years is what they say. Exactly. I think that's compressive, actually.
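A quick aside on the arithmetic in that exchange: the figures quoted are "8x in five years," and the speakers weigh "doubling every year" against "18 months to two years." The sketch below checks which doubling time the 8x figure actually implies. It assumes smooth compound growth and takes the 8x multiple at face value; the function names are illustrative, not from any cited report.

```python
import math

def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Years needed to double, assuming smooth compound growth."""
    return period_years * math.log(2) / math.log(growth_factor)

def annual_growth_rate(growth_factor: float, period_years: float) -> float:
    """Equivalent constant yearly growth rate (0.5 means +50% per year)."""
    return growth_factor ** (1 / period_years) - 1

# Figure quoted in the conversation: 8x over five years.
months = doubling_time_years(8, 5) * 12
rate = annual_growth_rate(8, 5)

print(f"doubling time: {months:.1f} months")   # doubling time: 20.0 months
print(f"annual growth: {rate * 100:.1f}%")     # annual growth: 51.6%

# For comparison, literally doubling every year for five years:
print(f"doubling yearly for 5 years: {2 ** 5}x")  # doubling yearly for 5 years: 32x
```

Under those assumptions, 8x in five years works out to three doublings, one roughly every 20 months, which sits inside the "18 months to two years" range mentioned; doubling every year would instead give 32x over the same period.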
Are there other approaches that haven't been launched yet that you think might be dark horses that could come out of the woodwork this year? That's a good question. I mean, I think that there's undoubtedly some startup companies out there working in stealth, working on some of this stuff. There are different approaches to dealing with it. I'm not so sure. LexisNexis? LexisNexis coming out is interesting. Strategically, they have a challenge, right? Which is, let's say their platform is great and performs and does all the rest of it. But it's very difficult to establish a totally new beachhead and new platform and standard in an existing ecosystem. So, from a strategic standpoint, I think they face an uphill battle, right? I think anybody who's going to come out in the big data space at this point has to be in some way associated with or complementary to the Hadoop ecosystem. It's going to be very difficult to establish a completely new platform and language, right? And this is just so simple. It's like Apple simple, that people are like, oh, okay, I got it. And one company that actually has a great chance is Splunk. Splunk is a company that has a very, very interesting platform, right? They make it extremely easy. They just announced their Hadoop connector stuff today. They make it extremely easy to just plow data into a fabric and get it out using kind of search and the graphical user interface. How much of this open rhetoric is, you know, we're more open than you are. How important is that, do you think, to customers? Or is that just a way to, you know, get some good marketing going? I think there's a lot of marketing there, right? I mean, if you said, you know, how open are you? Teradata is doing better and better revenue every single quarter, they're absolutely killing it. Oracle OpenWorld. Yeah, exactly, right? So, look, I think people, you know, at the enterprise, this is what I've learned over the last decade that I've been doing it.
People want a platform that works, that's stable, that's durable, and a single throat to choke when they have a problem. As you said earlier, they have right now a patchwork of stuff that they have to assemble from all over the place. And that's painful, right? If you talk to all of the CIOs out there, they're kind of like, yeah, man, this is not exactly what I want to be dealing with. But you know, it's interesting, because I think, I agree with you, short-term. No question, customers want solutions. But the market has proven that long-term, open wins, right? I mean, certainly Linux, I mean, open source neutralized Microsoft's monopoly, you could argue. I would argue, anyway. Yeah. And so. So, there's a really interesting quote from Matt Mullenweg, who's the founder of WordPress. What he said about the formation of his business, he said, the more open I made the platform, the more successful the platform became. And he's actually, you know, and it's open source, but it's a very, you know, something I took to heart was like, you know, you definitely need to make it open. It doesn't mean, though, that you can't actually have a, you know, very viable, successful company. At Yahoo, one of the things, to just pick up on your point about open source, at Yahoo, part of our strategy was to say, we actually believe that the infrastructure that we were building was not a competitive advantage to our business, it was a cost of doing business. And so, we were going to open source all of it. So, Hadoop was kind of like the early forerunner of that, but we also open sourced a bunch of other projects, and we were working on open sourcing, you know, we open sourced Traffic Server, which was the CDN stuff, we were working on the next generation cloud serving container and open sourcing that. And the reason was that, you know, infrastructure software is hard to build. It's expensive, it takes a lot of resources and it takes time.
And so, if you look at the internet, how the internet grew up, the internet infrastructure software was open source, we built this thing that, you know, we all now depend on. Our thesis at Yahoo, and my thesis for what we were doing in cloud there, was to say, below the PaaS, everything was going to be open source and we were going to do that as a community and the whole thing was going to rise up, but the secret sauce applications that sat on top, that's where we were going to differentiate. So, I think we're going to see that again in this market. I have this thesis in big data that there's kind of like three layers in the market. There's the bottom layer, which is the plumbing, data storage, infrastructure layer, right? That's saturated now, we've got plenty of players there. The next layer up, where there's application enablement, monitoring, building tools, we're starting to see some people there. And then the top layer is kind of like analytics, horizontal analytics or vertical analytic applications. Once we get to that layer, that's where the secret sauce is, that's where you're going to see closed applications, just like you did in the database world. You know, it's just going to take time to get there. So, you're saying you still can do, have both open source and sustainable competitive advantage, you just got to pick your spot. Yeah, exactly. You pick at which layer you want to play. Absolutely. Okay, we're here live, Hadoop World 2011, day one. And we got two days of coverage, SiliconANGLE's continuous coverage, wikibon.org, check out the websites, check out siliconangle.tv, siliconangle.com, research at wikibon.org. You got questions, we got answers, John. Yeah, we have an answer for everything. So, yeah, ask anything, question. I'm just gonna win this. On Twitter, it's at Furrier. Todd Papaioannou. Papaioannou, Papaioannou. Dr. Lucky Spin, at Dr. Lucky Spin on Twitter, check him out.
Guru in big data, chief architect, understands this deeply and is an expert and working on a new startup. So, if you're interested, contact him at Battery Ventures. What's your email address? Todd P at Battery.com. Todd P at Battery.com. And if you put the promo code, theCUBE, he'll give you a special lunch invitation. Absolutely. I added that in the end. Ha, ha, ha. Just, what's next? I mean, like, we have to prognosticate around the future. So, like we talked about, you know, a lot of cases of stags. What's around the corner? So, let's assume the ecosystem starts to go. I mean, everyone plays nicely. There's competition. People have got their running shoes on. The demand in the marketplace is for products and solutions. I agree, applications will be key. Cloud provisioning's getting easier. What's next? That's a great question. I mean, I actually think next is, you know, the developer focus. Right now, if you look at the marketplace, companies that have to build big data applications have to go and acquire really rare kind of talent. Distributed systems talent first, then people who can build on top of distributed systems. We can look at the kind of like evolution of what happened in the database world, right? The database came along as the first data fabric. And then people suddenly started building applications on top of it. PL/SQL became kind of like almost a de facto development language on top of data. And that enabled all of these applications to be built. Then it took time for the archetypal pattern applications to appear, whether that was, you know, PeopleSoft, CRM, or sales force automation. So I actually think, if you think about it, you know, BI took maybe 30 years to unroll to where we are today. I think big data goes faster because we've already been through it once and people have learned a lesson. So let's say it takes 15 years to get to the point where it's as mature as where we are now.
The first couple of years are done and the early adopters have put their feet in the water. Well, what's next? Next is everybody wanting to build applications, right? But the stack isn't there yet, right? The stack is not there to enable people to do it. So there's all of this pent up, you know, kind of like, you know, frustration of like, I want to monetize my unstructured data. I know there's an application here for me to build, but right now I'm just dealing with plumbing. So I think the next kind of like three to five years will be a focus on these bespoke one-off apps being built by people internally and the application developers, all of the SaaS startups out there who kind of like come through the Battery doors saying, I have an idea for an application. They, too, are building apps, and then over that period of time these patterns start to emerge and we'll see the winners, whether it's the Salesforce or PeopleSoft or... Or a new startup, new incumbent. Yeah, no, yeah, I mean those types of pattern applications, the kind of like archetypal big data apps, will emerge. We don't know what they are yet. We really don't know. Other than maybe consumer intelligence, we don't know what the patterns are that exist out there. So I was talking to Mike Olson about this and, you know, being an entrepreneur since 1997. You know, I'm out in the trenches always trying to look at startups and do startups and we've got our own going on now and a few other projects. You know, remember during the dot-com bubble and even before that, but mostly in the 90s and 2000s, you go in with a startup idea, you go see a VC, it's like, oh, I love the idea, but that's a feature, not a company, you know? And that's, back in the day, legitimately, you know, features were not something to invest in because you had to build the data center, you know, do all that stuff.
So in today's environment, you know, which is well documented all over the place, it's cheaper to start a startup. But with big data, you could actually differentiate on a feature and aggregate using open source, other stuff, get massive traction through virality, meaning penetration, use case, good product, or a good feature in this case, and build a company around it. Lower cost, higher profitability. Do you believe that? I mean, that's kind of my vision. I mean, Mike agreed with it. Big data fund is going to look at that. So that kills that argument. I do believe it. I mean, I think for startups now, life has never been better, right? I mean, you can start a startup with like, you know, a MacBook Air, sitting in Starbucks and having a massive infrastructure running up on Amazon, right? When you need it, you start with like one VM and pretty soon you have like, you know, 2,000 VMs running up there and you never built any data center and never bought any equipment. Back in the 90s, right?