Okay, we are back here at Strata Live. This is SiliconANGLE.com's independent analysis, commentary and opinion. We're here with theCUBE, our flagship program, where we go out to all the top tech events and talk to the smartest people we can find, extract the signal from the noise, bring that to you guys out there, and share that knowledge with you and out to the social media streams. This is the end of the day of our extended deep coverage, like we always do at the events. I'm joined by my co-host Dave Vellante from wikibon.org. Today we had an amazing day of great guests here at the Strata conference, really getting deep into analyzing what's happening in the big data world. Obviously, we've been there from the beginning, and this is our wrap-up. I'm joined by Dave Vellante, my co-host, and Alex Williams, SiliconANGLE's senior editor and writer for Enterprise and Services Angle and now DevOps Angle. Guys, welcome. Alex, welcome back on theCUBE again. Guys, we're wrapping up here after a great day, a long day. We've got a big day tomorrow as well. A couple of observations. I think our ad-supported business model is working out great. I want to thank our sponsors, Cloudera, Digital Reasoning, MapR and 1010data. You guys are great. They allow us to bring this great coverage and bring our full team here to do a full analysis and inspection of what's happening here and share that out. We've got blog posts flowing on siliconangle.com, and you'll see Dave's team at Wikibon. Jeff Kelly, the leading analyst in big data, wrote groundbreaking market research, market sizing and revenue share by vendor, on wikibon.org slash big data. Between all of our free content, the research, the video, the blogging, we're pounding out as much content and signal as we can. So let's bring it together, guys. Let's talk about the wrap-up. Dave, we'll start with you.
Experience here, first day, observations, feeling, compare and contrast with last year, notable points: what's your wrap-up? Well, the first thing I want to say is how proud I am of the team and what a pleasure it is, John, to be working with you again, and with Alex and the rest of the team, Mark and Kean, Jeff Kelly, David Floyer. I pull up SiliconANGLE, and we just had Pervasive Software on minutes ago, and here's an article: Pervasive Software DataRush will make it faster and easier. Kit Dotson puts it up. Our team is watching the stream, we're covering this, and I'm just very proud to be associated with such a high-quality team and such motivated people. Essentially, we're using big data in our business to identify the trends and report on them, and we make everything available under an open source license. It's free, not only free of charge, but free to use in any way you want. And again, I'm very proud of that. People always say, how do you make money? It's just that you put out good work and the money will come in; that was our basic philosophy. John, I know it was yours when we started, so again, very proud of that. I think this year really does underscore that Abhi Mehta was right. It's really the year of substantive proofs of concept. Every business out there, 80% of them anyway, has some kind of big data project going on. They're beginning to do real things with big data. It's going to be like a website. You remember the dot-com boom, the joke that everything was named dot-com. It's the same thing with big data. Virtually everybody has some kind of big data project. So we're sinking our teeth into it and we're really seeing some substantive traction. I think next year, the year after, is really when it's going to start raining money, John, to use your term. We're not there yet. You can see it with the pure-play guys: not a ton of money's being made, around $300 million from the pure plays.
That number is just going to absolutely... Well, let me ask you, Dave. First of all, I love working with you, and again, the SiliconANGLE team and Wikibon have been cranking it out. I really want to ask you one specific, pointed question: how do you feel about the feedback you've gotten here at the Strata Conference, from last night and today, on the big data study that Jeff Kelly put out? Obviously pretty groundbreaking, pretty bold. A year's worth of work, collectively, between SiliconANGLE and Wikibon around research, around market size. It's a very difficult task. This is not just some blog post with some analysis from the armchair quarterbacks out there. This is real deep research. You did some investigative work on numbers. So what kind of feedback have you gotten, and what kind of feedback has Jeff Kelly received? Well, first I think it's important to understand that the motivation for that report was to put a stake in the ground. There was no data on big data. Back in the days when we were at Forrester and Gartner and IDC doing these types of things, we would sit in a room and collaborate amongst ourselves for many, many months. What we decided to do is take it to the community, the power of peer insight. So you're right, we did a lot of work, a lot of digging, a lot of community interaction, and put it out there. It did take a lot of guts, and the feedback has been, first of all, great. People have said, wow, great job. Thank you for doing that. We really appreciate it. Others say, hey, I think these numbers should be higher, these numbers should be lower. Debates about the definition of big data and all that are taking place in public, and that's what we want. We don't want to hide behind phone calls and things like that. We want to make it all public. So that's been fantastic. It's free, so the report is free as well.
Just for folks out there. Yeah, so as you said, wikibon.org slash big data. It resolves to the report. We're very excited about that, and the feedback has been flowing in. It's been quite amazing. We think the market's bigger than you say. We think the market's smaller than you say. We like your numbers. We don't like your numbers. The feedback has been really fast and furious, and we're taking it all in. We're assessing it. It's like a detective following leads. Somebody will throw out a little tidbit, and we'll track it down. Let me ask you, where were you wrong and where were you extremely right? I don't want to trivialize it, like I said, as throwing darts at the board, but you've been in the research business. You built IDC's research. You know the game. Where were you surprised, like, hey, we were so right on this one and we're way off on this one? First of all, it's a little early to answer that question fully, but I will say this: when you're quantifying something like big data and you're segmenting the pure plays, a lot of these pure plays, as you can see, are talking about a couple million dollars in revenue. The swings. Start-up pure plays. Yeah, the start-up pure plays. For a company you predict at three or four million, you could be a hundred percent off. That could really be five, six, eight million. So on some of those, I think, we were off. On others, some of the larger pure plays, I'm very comfortable with the Vertica number. We've been getting some heat on that, but I think it's very good, based on the sources that I have. And then the other is the big whales. It's hard to say. Like IBM, right? IBM, yeah. I think a lot of IBM's is services.
Oracle, personally, I think the Oracle number's still too high. I just don't buy that there's that much Exadata in big data. I'd like to see that number come down a little bit, and we're having a discussion about that in the community. The other thing is, as you know, I like to talk about S-curves. There's this ogive curve. I'm not so sure the market's going to double this year. I think it might be a little flat and then go even steeper than we say. And I think your scenario, that the market is bigger than we say, might be right. Yeah. Alex, I want to go to you. What's your perspective on big data? You've been out at the sessions, getting data, talking to people, breaking stories. What are you finding out there? What is your ear to the ground telling you, and what are your face-to-face conversations here at the conference telling you about the segment and some of the things that we're reporting? Well, first of all, I think it makes sense that in Jeff Kelly's report we saw real representation from the services providers. That's reflected on the show floor and in who I talked to: there still is a real need for services. The EMC data scientist who came on today said that he walked into a room with 10 people who were experts in business intelligence, and none of them knew about Hadoop. That, I think, is a picture of where the market is. You see that when you talk to people on the show floor. They're going out there just to try to learn and find out what it's about. I spoke to one woman, for instance, who works for a nonprofit. She said, I just need to learn from my peers. I think that's what is so important about this event. She said, I really can't afford the solutions from these big providers out there, but here I can learn from my peers. So I think that's just as much a factor as anything.
It's like the peer-to-peer development we see in the Hadoop community and the big data community, and that will be a factor in driving it forward. So John, you have a unique perspective. Abhi said you make bold statements. You said, yeah, well, I work for myself, so nobody can fire me. Well, the audience can reject the bold statements. But I'm right most of the time. So what's your take? On what, in particular? Big data, this event. Day one, day two, whatever you call it. What's changed since last year? Let's start there. OK, I'll just randomly pull out some factoids, opinions and comments from my perspective. I had a bunch of notes from the day. The first thing that pops into my mind is obviously Mike Olson, who I really like and have a relationship with because of my relationship with Cloudera, being there a year and a half in their office. I got to see firsthand a working startup that I wasn't actually involved in, but was kind of involved in. I know Amr Awadallah, and I watched that whole team grow really from the beginning. When Amr was EIR at Accel, I remember meeting him and saying, hey, what are you working on? I was trying to get the data from him. We were talking about some of our visions, and it was really cool. Then when they funded Cloudera and moved to Palo Alto, they had extra space and I was allowed to go in there. So I'm really impressed with Cloudera: how much growth they've truly achieved, and they've made some good business moves. They're known for their technical team, but I'm really impressed with how much they grew the revenue and with their pure business moves. It's a chess game. As you know, Dave, it's an early market. Some companies have no revenue or a couple million dollars in revenue. Cloudera has made some very, very smart business moves, and I'll give you a few examples. Mike also talked about ramping up the sales teams and having a good business development strategy. Get that to the scoop. Alex Williams getting phone calls.
That's right. But what I was really impressed with is those business moves, specifically the federal business. He talked about their investment in the federal market. Cloudera made the investment in federal early on, and Digital Reasoning actually validated that by saying how much money they're making on the federal business. So that's a bold move for a startup. In addition, they've done very well with the early-mover advantage on proofs of concept, hence the revenue leadership there. So one, Cloudera is very solid. The other thing I noticed is that the human talent conversation has been a big part of this Strata. The people equation is really interesting: even though machines and algorithms and data sets are all being talked about in their normal way, the human element, the impact on society, which is people, the data scientist and the DBA, that role is very, very important. Humans are still the last mile, as Roger said a while ago. Yeah, that came up a lot in a session I attended with the other media, people being the last mile. And I think what we're hearing a lot here is that the whole premise needs to change, and we're starting to see that evolve. It's not about starting with some end point that you believe is right and correct, but about starting by asking a lot of questions, learning, and building something from that. John, how about the HBase meme? Yeah, I think another observation is that the HBase conversation is great. As you know, Dave, we are using HBase, and our team was an early pioneer, part of the Cloudera team that built HBase. HBase is fundamentally going to be the database on top of HDFS, an absolute opportunity for anyone involved in the HBase community, not only on the open source side but also on the commercialization of it. Very positive feedback has come out of that.
In addition to that, I thought Scott at Pure Storage talked about data ISVs. Interesting comment, because that opens up a whole other conversation around the kinds of businesses and services that will be funded or self-funded around Hadoop. So I think that's interesting, and then Abhi Mehta really circled back to that thread: data as a business is fundamentally a core asset, and there's a perplexing set of conversations around policy, privacy, identity and trust. That's going to be big around the data. Also, a conversation that hasn't been had much here, other than by a few people, is the systems side of the database architecture: flash. The impact of flash on performance will have a tremendous impact on a lot of these integration conversations. Finally, my other observation is that integrating legacy data warehouses and business intelligence systems with Hadoop is fundamentally a massive opportunity, because they're not mutually exclusive. So I think real-time predictive analytics will be part of the new model, and that'll be a mix of relational and unstructured systems, mostly open source, but data warehouses will have to build those layers. I think that's why you see Vertica and other companies coming out with these approaches doing very, very well. Alex, I understand you attended the Netflix session today. It was one of the hottest sessions going. Did they talk about Cassandra? Did they talk about how they're using real-time analytics in their streaming platform? They talked a lot about how they build recommendations. They started with the premise that they have five billion user recommendations, but they needed additional data beyond that to really determine what people are trying to do overall. And what they came up with was that what they really want is for people to watch a movie and then want to watch another movie.
And we hear these debates about big data. What they're saying is that having more data actually does help a lot. There's this idea of small amounts of data, but in their case, the more data, the better. You know, John, when we were in New York in January, at a private investor meeting, so we can't say the name of the company, there was a debate about how much data to keep. It was a large web company, and one of their people made the point that, well, we found that all that old data was useless, and everybody jumped on him: are you out of your mind? It's totally antithetical to what you just said and to what Netflix is finding, and maybe that's why this company is sort of struggling. Again, I won't name them, but it's one of the granddaddies of the internet in the old days, right? You remember that. I'm curious about your perspective on storage, Dave. That changes the whole storage paradigm in some respects, doesn't it, with the ability to keep more data and tier it? What is the impact there? Oh, that's HDFS, right? Can I interrupt you for a second? Because that was a good comment you guys just made. There was a quote; I'm getting a note here from Mark Hopkins, who's been monitoring the Twitter stream and also some of the sessions. In the Netflix session, they said it was more expensive to delete the data than to save it. More expensive to get rid of it. Yeah, more expensive. Because of the impact of data dependency. Again, with the data models, I don't want to say lock-in, but no one can lock in one data model, as Bill Schmarzo pointed out. The data model architecture is no longer a lock-in for one vendor.
So I think the Netflix comment points to the fact that as data interoperates between these systems, the impact of and dependency across different data sets becomes a real big developer upside, and the consequence there, operationally, is cost. It's kind of like oil; it just flows everywhere. Yeah, and you know, HDFS basically solved the storage problem. We had Jeff Hammerbacher on at Hadoop World, and he said that when he was at Facebook, he was incensed by the amount of money he had to spend on the container, meaning putting this stuff into a big monolithic array. HDFS allows you to keep the data on commodity devices in a distributed fashion and bring five megabytes of code to a petabyte of data, instead of bringing 100 terabytes of data to five megabytes of code. So architecturally, that problem has been solved. Now, there's a long way to go. What we've been hearing at this event is about protecting that data, securing it, replicating it, moving it, cleaning it, transforming it; all that stuff still needs to get done. The other big theme we've heard, and we predicted this, certainly I predicted it in my post on big data late last year or early this year, is the intersection between the traditional enterprise data warehouse, the legacy data warehouse, and Hadoop. You're seeing that everywhere now. Look around: all the legacy guys up here have a Hadoop solution. In my conversations on the show floor, I heard a lot about that. For instance, the people from Hadapt were talking about what they're seeing: companies do want to adopt Hadoop, but they need to think about their existing infrastructure and the expertise they have. The cost, the overhead and the skill set are really not there for Hadoop.
So what they want to do is optimize what they can with the existing resources they have. All right, this is Dave Vellante. I'm here with my co-host, John Furrier, of SiliconANGLE, and Alex Williams of SiliconANGLE as well. It's probably a good time to say: look, we've got these free resources. We've been talking about open source content. Go to wikibon.org. It's a wiki; hit edit if you want. Ask a question; we've got answers. Go to siliconangle.com. Go to siliconangle.tv. Go to Services Angle. We're just about to launch DevOps Angle. These are all free sites, resources for practitioners to help you make better decisions. There are ways in which you can interact and engage with us. Please do; we want to have you. We really want to give back to the community. And it's really a pleasure being here again; thanks to O'Reilly, thanks to our sponsors. Yeah, O'Reilly has been great, Dave. O'Reilly's busy. They're producing great content here at the event, and we're covering their event. I've got to give this event a solid A minus. I wouldn't say A plus because the venue is a little bit spread out. The Wi-Fi's been pretty good. The connectivity's been great. The content's been A plus. O'Reilly has a great set of people putting it on. The other thing they're doing at the conference that I like is opening up the live streaming, not only to us but to themselves. They've got all the keynotes and sessions live-streamed, which will be on the O'Reilly channel, and allowing us to do coverage the way we like to do it, without meddling in our editorial, has been great. Not that they ever have, but they're really cool to work with and high-quality people.
I think also hats off to them for attracting such a smart and energetic community to this event, with people traveling from all over the world to learn from each other. That's really what community's all about, and you can really see it here at this event. Okay, so we're going to wrap this up for today. We'll be back tomorrow, day two for the conference, day three for us, with eight hours of live coverage, to drill into more interviews and find out more about the hot trends. Again, we saw the hot trends here: convergence between business intelligence, the data warehouse and Hadoop, machine learning, visualization, all the hot topics; big data's hitting all the different verticals. SiliconANGLE.com and SiliconANGLE.tv have all the coverage, and all the research is on wikibon.org. So go to those sites, check it out, and we'll be back tomorrow. We want to thank our sponsors for making this all possible: Cloudera, Digital Reasoning, MapR, 1010data. We love them, support them, and we'll see you tomorrow. That's a wrap from day two here at Strata at SiliconANGLE.com. Thanks, guys.