Big Data SV 2014 is brought to you by headline sponsors WANdisco, "We make Hadoop invincible," and Actian, "accelerating Big Data 2.0." Okay, welcome everyone to SiliconANGLE and Wikibon's theCUBE, our exclusive coverage of Silicon Valley big data, Big Data SV as we call it, hashtag #BigDataSV. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host Dave Vellante for three days of live coverage starting now. The world of big data is happening here in Silicon Valley. We have all the coverage and analysis, news, opinions, and commentary here on theCUBE, as always, and also coverage from the Strata Conference happening right across the street. Big data is happening right here, and we're pleased to bring you exclusive wall-to-wall coverage. My co-host Dave Vellante, this is theCUBE. We extract the signal from the noise. And Dave, this is, I mean, our fifth conference with big data. Strata, it seems like a... More, John. I think we started, what, at the end of 2010, when everybody was asking, what is Hadoop? And we were answering that question. I asked that question of you years ago and you explained it to me. You said it's the next big thing. And it has certainly become the next big thing, although John, I think we're seeing the slow maturity of Hadoop. There's still a lack of applications. We're here at Big Data SV. It's concurrent with O'Reilly Strata. We're seeing a lot of partnerships announced this week. Not a lot of whole new innovations. We're gonna talk about that today. Is that a sign of what Merv Adrian at Gartner calls the trough of disillusionment? He says that's progress, because we're going from left to right. But questions remain. We came out this week: Jeff Kelly quantified the size of the big data market. It's big, as we said. But the interesting thing to note, John, is it's still dominated by a lot of the whales. IBM, HP, you see Dell in there. You see Teradata. So where are the startups? 
Palantir is the biggest startup there. Where is Cloudera? Where is Hortonworks? I think we're going to start seeing those guys, toward the end of this year or the end of 2015, start to break out. You're probably going to see some IPOs. But right now, it's sort of all quiet on the Western Front. Well, we're going to have live coverage here on SiliconANGLE.tv. Go to SiliconANGLE.tv. You'll see our new Cube page there, which we just pushed this week. We're psyched to have that. We're going to start breaking down all the events. And we're indexing all the videos that you can actually watch. Every single tidbit, we're tagging it so you can get the sound bites. If you don't have time to watch the videos, you can go in there and dig in and tweet out the epic moments, the sound bite, the epic tweet, the epic crowd chat. And of course, this year, we'll be adding a new element, our CrowdChat product. Go to crowdchat.net slash big data SV and join the crowd. We have open source crowd commentary to augment theCUBE and Strata Conference coverage here in Silicon Valley. And we're also going to bring in an analyst, Jeff Kelly, who's going to join us. Jeff, great to see you. I know you've been really busy, so thanks for taking the time to join us. I know you're out doing the lunches. Big news. Yesterday, you dropped your big data report again for the third straight year, constantly getting up there. The road to $50 billion total addressable market, or TAM, as Dave says. Great feedback. It's the trending item on Twitter around Strata Conference and Big Data SV. And Silicon Valley is on fire. A lot of new startups. Tell us what the report is saying. Tell us what you're hearing on the ground. New trends. What's game changing? What's business as usual? Well, what the research and the report tell us is the market is growing, somewhat along the trend line we forecast two years ago. So up to about $18.6 billion in 2013. 
That's a 58% growth rate over last year. We're going to see a similar growth rate, probably a little bit slower, next year. But still, it's growing fast. As Dave mentioned, there are a lot of the IT heavyweights kind of leading the market right now. And that's for a few different reasons, mostly around the hardware and services part of the market. Companies like IBM and HP, even Dell from a hardware perspective. But there's also a great ecosystem of startups. And I guess we still have to call them startups, though a lot of these companies have been around five, six years now. Companies like Hortonworks and Cloudera and MapR and others. But it's a really interesting mix. And now we're finally starting to see some of these companies start to grow up. And it's really interesting to watch. So Jeff, you've seen a lot, of course, of the innovations in software. But in this market, the software started out free. And you pointed out in your report that that has somewhat suppressed the software revenues. But you're starting to see some of these companies bubble to the top. But as you say, you still see Oracle, Teradata, IBM, HP, and Dell, of course, because it sells a lot of hardware. Yeah, it's got some big data stuff, but you don't really consider them as pushing the market for big data. But where are all the software startups? Obviously, you see Splunk and Tableau have done IPOs, very successful. But where are some of the core big data startups, some of what we talk about as pure plays? Where are those guys at? Well, the pure plays. So let's take this from a couple of different angles. You've got the Hadoop pure plays that we have been covering for years. You've got Cloudera, Hortonworks, and MapR, the three competitors that have been in this market the longest. And really, they're battling it out for that pure play Hadoop mantle, the title. 
So we're seeing companies like Cloudera hit 70 million plus in revenue last year. Hortonworks hitting over 50 million. So these companies are starting to grow. MapR, over 30 million. So they're starting to actually accrue some significant revenue. That's where some of the software is starting to catch up with the hardware. So a lot of these companies initially had a lot of users, a lot of downloads, but they were a lot of free downloads. In Cloudera's case, for instance, you can download CDH for free, deploy it on some cheap boxes from Dell, and off you go, doing some experimentation. But now people are starting to actually move into production deployments, or at least thinking about it, and are starting to engage companies like Cloudera and Hortonworks for support contracts, for more enterprise-grade support and high availability, things like that. So finally we're starting to see them generate some significant revenue. From the NoSQL space, companies like MongoDB and DataStax are doing well. So all these companies that have the freemium model, based on open source software, are finally starting to catch on from a revenue perspective. And Pivotal has now jumped up, because EMC took the collection of misfit toys, pulled them out of VMware, pulled them out of EMC, threw them into Pivotal, and all of a sudden, boom, you have this $300 million company on the scene. Yeah, it's technically a startup, right? Yeah, so my question is, okay, how much of that is real? How much of it is just bits and pieces coming together? Where is Pivotal? You see 300 million and say, wow, that's a big number. All of a sudden it's surpassed some of these other companies that we talked about. Where is that all coming from? Well, it's a real number, but most of the revenue is coming from their Greenplum line and from their GemFire line. 
So those are not necessarily deployed, or certainly not deployed, in the larger Pivotal platform; they're still selling them somewhat as point products. But I think Pivotal and their owner, EMC, understood that was gonna be the initial revenue base of this company. They're at very early days. They have what I would call a grand vision of really building out this three-layer platform: the infrastructure, the PaaS layer, and then the data fabric, as they call it. So they've got a ways to go before they really make that hardened and really enterprise-ready. And they've got to do a lot of work with partners. It requires a lot of cloud partners to really realize their vision. So they've got a long way to go there, but they've still got this revenue base. And that's not a bad thing. That gives them a little bit of room to play with. So, John, it seems to me that the guys who win the developer community are the guys who are gonna win this end game. I mean, what are your thoughts on this whole thing? Well, Dave, I put a question out on the crowd chat. And I think ultimately Silicon Valley is leading the charge right now in terms of technology innovation. When we had Big Data NYC in New York, you saw much more of the meat-on-the-bone kind of approach, where it's like, hey, the proof is in the pudding. The financial services market drives a lot of value. But here in Silicon Valley, it's really about the startups and the emerging categories. So to me, the top couple of clusters of audience, or crowd spots, if you will, are data science; then the database app kind of developer, that's the Mongo, that's the NoSQL, that's the database market. And you've got Hadoop, which by itself is a revolution. That speaks to the open source software developer. That's more the open source. That'll span Spark and a variety of other new projects like Storm, really classic. That bleeds into the Red Hats of the world. That's a core community. 
And then you get the legacy vendors, like the IBMs, the Informaticas, the Oracles, the HPs, guys who are trying to get their foot back in the door. These are the fast followers coming in behind the innovators. And then finally, there's the new emerging category that Jeff Kelly and I were discussing, that Jeff pointed out, kind of teasing his report, but we'll see more data coming in this next year. That is a new emerging school of applications, new modern-era kind of thinking, like BI and data warehousing reinvented. So these are paradigm shifts, these are disruptive approaches, these are new approaches to big data. This will impact anything from the internet of things to software. And so again: data science, database app developers, Hadoop, legacy, and then this emerging category that we are watching very, very closely, where big data is native in all aspects of the computing software fabric. So to me, that's what we're going to watch at this event. And obviously we have all the usual suspects coming through, talking about more of the same around enterprise grade, security, BI, et cetera. So that's exciting. Jeff, your take on that, I want to get your thoughts. You were talking about this emerging area. Connect the dots on some of the things that you're seeing around the corner. As I said, we're still at very early days, but you mentioned the industrial internet, for example. So we're going to have GE on tomorrow. They're going to come on and talk a little bit about their strategy there. But there's this whole idea of connecting physical devices, physical objects, through data, and then building applications that actually help those devices perform more efficiently, essentially orchestrating processes across operational environments so they're more efficient and actually help people in their everyday lives, whether it's health care or energy or transportation. 
So that's really an interesting area from my perspective, one that really has an opportunity to impact the larger world, not just the smaller world of IT and enterprise tech. So that's something to keep in mind and watch. The other thing from a market perspective that struck me in 2013, as I was doing my research and working on this report that we just published: I think the critical thing that happened last year is that Hadoop and some of the other big data application frameworks and other open source components continued to mature. That's a given, but the way they did it was focused around things like, and I think the biggest news was, YARN and the idea of making Hadoop a multi-application framework. And complementary to that was all the integration work that a lot of the larger legacy vendors you just mentioned, John, are doing to bring Hadoop into the fold. So I think last year was an important year for Hadoop specifically, and big data generally, in validating that this is not gonna be relegated to backroom data science. Big data and Hadoop are gonna play a very important role in the data architecture of the future. I think that's what we saw last year with some of these developments. It's still early days, and of course we have a long way to go in terms of bringing in enterprise-grade features like security, which you mentioned and which I think is one of the critical ones that often gets overlooked. Privacy, and all those not-so-exciting things like metadata management and data integration, all those things have to play a role here. So those things are being built out, but to me what was important last year is that the platform continued to mature. Clearly it's gonna be a multi-application framework when we're talking about Hadoop, and it's really been adopted and embraced by a lot of the larger legacy vendors, so that's what we're gonna see this year and next year going forward. 
I think these technologies are starting to really find their way into the enterprise. So we talked about Pivotal as a relatively new name, in your report anyway. Amazon was the other one that you mentioned quite a bit. We weren't typically talking about Amazon in your report last year and at this event last year, but now we've got a situation where, as you said, cloud and big data are continuing their love fest. Amazon has announced things like Kinesis, so it seems like it's a place for a lot of people to park their big data projects. What do you see going on with Amazon and big data? Yeah, I'd say it's a courtship right now. It hasn't been consummated yet, to stretch the metaphor. So we're seeing that cloud is a good place for a lot of this test and dev: building initial applications, testing them out, seeing how they work, seeing if they're gonna deliver the results you expected. You can go to Amazon and spin up a Hadoop cluster. You can load data into Redshift for some data warehousing. You can now use things like Kinesis for streaming data. So they have a fairly comprehensive portfolio of services around the different types of data workloads. But the thing is, you're not seeing a lot of enterprise workloads really being moved to the cloud, AWS specifically, and that's for a number of reasons. I mean, you've got data integration challenges, you've got security concerns, you've got internal compliance policies to consider. So we're starting to see some movement there. I think it's really interesting, the opportunity that AWS offers to enterprises looking to get started with big data and build their first applications. It removes a lot of that complexity around deploying your own Hadoop clusters and things like that. But it's still very early. And I think it's gonna take some time before the cloud becomes a place where you're really going to bring your enterprise workload. 
It's gonna be a place where you test new applications. Maybe you deploy some new applications, but for full-on production deployments that are supporting mission-critical apps across a large base of concurrent users, I think it's gonna be a while before we see that. Dave, one of the things I wanted to get to, since we have some time limits here on our intro segment: Silicon Valley is really the place we're gonna be digging into here. We had Big Data NYC, but Silicon Valley is really where the innovation center is. Obviously the market's really on fire right now, some say a bubble, but we see the innovation. Obviously big data is kind of coming out of the trough of disillusionment. You're starting to see where the rubber meets the road. So your assessment of Silicon Valley: as we rise up the trajectory in terms of growth, you're starting to see the partnerships form. You're starting to see the fog lift in terms of where the action is. So what's your take, Dave? What's your take on the maturity of big data and where the value propositions are really being highlighted? Well, when I get questions like that, I always come back to the customer. So when you look at what the customers are actually doing, they're still, as Jeff was saying, asking: okay, I'm not going to put my mission-critical apps in the cloud, so am I going to put them in Hadoop? So you're seeing the infrastructure still maturing. When you talk to companies like Yammer or Facebook or Google, the internet guys, obviously they're adopting. And there are examples of mainstream companies. We've had guests on theCUBE, John, and many others, that are experimenting and actually getting value out of big data, but it's still not hit the mainstream of the customer base. 
When you talk to CIOs, they're all talking about how they're transforming into data-driven companies and the like, but the center of their data universe is still the data warehouse, it's still the traditional BI tools. And it is very slow to break that pattern. At the same time, you're seeing big companies like Oracle and IBM and others hang on to that base like a big giant magnet. So Dave, I mean, one of the things I'm seeing right here is that the startup scene's kind of mellowed out a little bit. You're starting to see the big A rounds come in. So really the startup scene is very simple. You're seeing people getting funding. Certainly you're starting to see a lot of cloud action there, but on the big data front, the startup action really is seed financing to get going. And then ultimately either they cross the chasm and they get the big series A round, in around the seven to 15 million dollar range, or they don't. What that does is cause a dynamic where the funding is the filter of success. So you're seeing people getting about a million dollars, maybe two million dollars, in some seed funding, which kind of means like a series A, and then seeing what they can do with that, because the ability to scale with big data, with mobile and cloud, is phenomenal. In order to do that, you know, the investors are looking for massive traction fast. If you don't have escape velocity in the venture, then you might not make it. Or you'll be a nice little lifestyle business or small business. But the series A rounds are between seven and 15 million dollars as a funding vehicle. That's a new filter. So you're starting to see the real powerhouses emerge. So you're going to see fewer winners coming out of that startup scene, in my opinion, and a lot more value propositions coming from there. And then you're going to start to see those guys who got funding two years ago, that we covered here, and what their traction points are. 
We're going to start to see a platform, a continuity, you know, a clear story. These are the companies that are hitting the table right now. They're in their B rounds, B and C financing. And you're going to start to see: is that horse coming home in the home stretch? And that's going to be a big proof point. And of course, you've got the big guys. You're going to see HP, you're going to see IBM, you're going to see the big whales coming in here. They're going to kind of build on a position, leveraging their customer base and trying to be innovative. So again, it's the NASCAR race we always talk about. Which car is going to emerge? And that's the dynamic of that market. Let's keep in mind, can they do it with their customer momentum? Does the horse have any, you know, gas left in the tank, as they say? Okay, this is theCUBE. We're going to be here live for three days. SiliconANGLE.tv is the new site. theCUBE is on Twitter; I'm @furrier, and Dave is @dvellante. Follow us, and go to the crowd chat, where we have crowdsourced commentary. Shout out to all our guys out there. We have all the usual suspects back in the crowd chat. And we have open questions out there: ask us anything. Use crowdchat.net slash big data SV, ask us questions, and we'll respond to them. You know, we've got Tim Crawford out there. Hey, Tim, a shout out to you. Great to see you in the crowd chat. Spread the word, let's bring in the crowd. We want to bring the crowd into theCUBE. We love extracting that signal from the noise, talking to the thought leaders. But we really want to have you guys be part of the process. So if you're watching, go to crowdchat.net slash big data SV, and be part of the conversation and part of the production. We'll be right back with our first guest here, kicking off day one of Strata Conference and Big Data SV, right after the short break.