Live from Boston, Massachusetts, it's theCUBE at the HP Vertica Big Data Conference 2014. Brought to you by HP with your hosts, John Furrier and Dave Vellante. Hello everyone, welcome to theCUBE. We are here live in Boston, Massachusetts. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, I'm here with my co-host Dave Vellante and here with Jeff Kelly, analyst at Wikibon. I'm excited to be at the HP Vertica Big Data end user conference. Really it's a conference around big data, around Vertica and HP, HP software, and really about the players in the industry. It's exciting to be here. This is year two, Dave, that we're here at HP Vertica, in Boston again, in the summer. No summer blues going on for theCUBE here, Dave. Just a lot more action. HP, again, really kind of conservative but still lighting it up out in the marketplace with Vertica. Yeah, I mean, yes and no on conservative, right? So we have 1,000 people here this year, second conference. I thought last year was very, very well done. The thing that struck me last year, John, was just the number of customers, the number of customer stories, guys like Guess, Yammer, and we've got the US Postal Service here. A lot of customers willing to come on theCUBE and talk about how they're using data. So as I say, about 1,000 people here. We heard in the keynotes this morning, Meg Whitman got up in a video and gave kind of the obligatory "big data rocks" kind of discussion, and then Colin Mahony, who's the general manager of Vertica, gave a talk, and he's always good. He always brings in new information. Everybody talks about how much data is growing. Well, he gave us some stats.
He basically said that in 1983, for about a dollar, you could store about 3.5 web logs, and today you could store like 35 billion web logs for that same dollar. And so he put forth a scenario where you basically want to log everything, figure out what happened, figure out what didn't happen, and he talked about samples. Remember our friend, Abhi Mehta, says sampling is dead. Colin talked about how sample sizes are essentially infinite now. So he sort of laid out that scenario. The other thing he did, and I think he did a credible job of it, was talk about how Vertica used to be essentially a one-feature company, a one-trick pony around column store. Remember when Vertica first came out, it was really all about columnar compression, doing it for less, doing it faster. And then he sort of laid out, okay, now here we are today and, I don't know, Jeff, where are we at now? Vertica 7. Vertica 7, that's correct. And so he showed the pieces of the puzzle, which was a much, much more robust portfolio inside of Vertica. And then of course they had Tyrek the rock star, who's this kid who does Rubik's cubes. In like literally 10, 15 seconds he can solve a Rubik's cube. Today his trick was to do it with one hand. So an interesting mix of good gimmicks, a little bit of fun and some good customer interaction. I want to bring it over to Jeff Kelly, Wikibon's analyst, the number one analyst in big data as far as I'm concerned. First to actually do a big data market sizing, Dave, not to kind of pump up our own report. But Jeff, looking back now, you did the first market sizing of big data. So I've got to ask you first, is it bigger than we thought, and how does that relate to what's going on in the marketplace today here at the Vertica show? Yeah, I'd say we did a pretty good job of forecasting the market.
I'd say it's growing a little faster than we anticipated, which is not surprising considering all the innovative uses of data that we're seeing out there from practitioners, from users who are coming up with new ways to gain value from their data. And we talked about, back in that first forecast, how we really thought the majority of value in the big data market was going to come from practitioners, people actually putting the technology to use, versus the vendors. So the revenue related to big data products and services certainly is growing a little bit faster than we thought, but I think the value that practitioners are creating from big data technology and services is increasing at least as fast, if not faster. But that's a little different than some of the other folks out there who were pumping up big data apps as being the tsunami. You pointed out that it was going to come from the practitioner market, meaning how it was being put to use. So let me ask you some questions around this show. Obviously the guest list is pretty significant; we have mostly customers here to talk about their stories, which is phenomenal for us. We love that versus people just pumping up their own software. But be specific, Jeff. What are you seeing around the growth on the practitioner side versus the hype that just never came home on the big data apps? Well, on the one hand, you've got essentially greenfield opportunities where you've got either startups or companies that really were born on the internet, born on the web, whose whole business is around data. We're seeing them use technologies and tools related to big data, Hadoop, Vertica, Tableau for visualization, and other tools, really to create totally new business models.
So you're seeing, you know, a company like Cardlytics, who we'll have on later in the show, talking about how they actually help organizations, help merchants essentially, better target ads. Okay, makes a lot of sense. But you're also seeing more mainstream, more traditional organizations making use of big data. So we're gonna have on the show the United States Postal Service. You don't get much more traditional than that. They're a big Vertica customer. They've got obviously a lot of challenges in this new environment where you can connect with people through any number of means other than the Postal Service. So they've got to modernize. We're seeing them use data in interesting ways to better compete in this marketplace. We're also seeing healthcare organizations; Blue Cross Blue Shield will be on during our broadcast. Obviously healthcare is a huge opportunity for big data analytics. There's data coming off the machines. There's all the wearable technology that people are now using, Fitbits and trackers such as that. There's of course larger demographic data, social data coming off the web. So any number of use cases in healthcare as well. So it's both traditional organizations and of course the big data startups, those companies that were really born around data and the internet. So one of the things, Jeff, that you found in your survey, which was astounding to me: you had a question in there, has your organization shifted any workloads from a traditional data warehouse, like a Teradata or an Oracle, or maybe a mainframe, to Hadoop? And a huge number of people said, oh yeah, we've already done that. So over 60%, and another 30-plus percent said they're going to do it by the end of this year. So literally 94%, 95% of the respondents said, yeah, we're going to start shifting resources by the end of this year, or they already have. That's an astounding figure. Now, you did really dig down deep.
You had some in-depth one-on-one interviews you did with various practitioners. So add some color to that stat. Are we seeing what you'd call a share shift occurring? Is there a big disruption going on where Hadoop is that big sucking sound taking resources away from the traditional EDW? I wonder if you could talk about that, and talk about it in the context of Vertica. What does that mean for Vertica? So this is obviously a big topic and one of contention among the different vendors in the space. You've got the Hadoop vendors on one side, you've got the traditional data warehouse vendors on the other, and most are positioning Hadoop as a complement to the traditional data warehouse. And we would tend to agree with that on the whole. However, as you mentioned, in our survey we found the vast majority of respondents have shifted at least one workload from a traditional data warehouse or even a mainframe to Hadoop. Those shifts are being driven largely at this point by cost reduction. Hadoop is something like one-tenth the cost of a traditional data warehouse. Obviously it depends a lot on the workload and its characteristics, so that's a bit of a broad statement. But if we accept the premise that there's a huge price difference between the EDW and Hadoop, the question then is, of course, which workloads are gonna shift? There's no question that the EDW is a much more mature technology than Hadoop. And in fact, the EDW isn't going away altogether, but there is no question that Hadoop and the EDW are competing for workloads. You're not going to see wholesale replacement so much as workloads where Hadoop can do the job just as well as an EDW, but for a tenth of the cost. You're gonna see those workloads migrate. And I think that's what the survey findings show. So we did dig a little bit into what those actual use cases are. And they're the things you might expect: the most popular use case was data transformation.
So doing some of those big jobs, the T in ETL, moving data to Hadoop to do those transformations rather than doing it in an expensive data warehouse. So, Jeff, I'm gonna ask the Vertica folks this question when they come on, and I'm gonna ask you too. I want you to give me the answer and then we'll see how they answer. See if they jibe. Obviously Hadoop is huge. We've been following that from day one, but now it's gone mainstream. How does Vertica play with Hadoop? What does it mean to them? And what's your take on their position vis-a-vis Hadoop and what they're going to do from that position? Are they riding that wave? Are they just kind of barely holding on? Is it really gonna be beneficial to them? What's your take on the Hadoop trend and how Vertica fits in with it? Vertica is in a much stronger position than a lot of the other data warehouse vendors. Vertica, from a technology standpoint, has got a very small software footprint. It's a scale-out platform on commodity hardware. It shares a lot of characteristics with Hadoop. It's popular with a lot of developers; we've seen that of course in the gaming industry and at a lot of other digital native companies. So that shift that I mentioned, where you're moving some workloads from a traditional data warehouse to Hadoop, does not impact Vertica as much as some of the other players in the market, such as a Teradata. Again, that's partly because of the small footprint, partly because of the types of workloads that we're seeing most Vertica practitioners doing inside of Vertica. That said, in terms of actually integrating with Hadoop, again, Vertica shares so many technology characteristics with Hadoop: scale-out, commodity hardware, small footprint. We are seeing a lot of areas where it's very popular to either put Vertica side by side with your Hadoop installation or, in some cases, actually run it on the same cluster. We covered here on theCUBE back in February
the partnership with MapR, so that you can actually run Vertica in the same cluster as MapR. And of course, a few weeks ago, the big news was HP investing 50 million in Hortonworks, the open-source-focused Hadoop company. HP's gonna get a board seat, Martin Fink on the Hortonworks board. But from a technology perspective, the two are gonna work together, Vertica and Hortonworks, to certify Vertica on YARN. So really to make it much easier and more seamless to run Vertica on top of Hadoop. So what Vertica's positioning themselves as is yet another SQL-on-Hadoop option. And what Vertica has going for it, in addition to the things I mentioned, is that it's of course a much more mature database than some of the other SQL-on-Hadoop options. So SQL on Hadoop was hyped up big two years ago. What's happened there? A lot of dead bodies kind of on the side of the road now. I won't name any names. We know them, Dave. We have some friends of ours. But it really is panning out. You've seen the big whales obviously there. Give us the update. What is the status of SQL on Hadoop? Well, as part of our survey we asked that question. And we found that again, an overwhelming majority of Hadoop practitioners are in fact at least experimenting with these different SQL-on-Hadoop options. So of course you've got, really the first to market was Cloudera's Impala. I should say even before that, of course, Hive, a sub-project of Apache Hadoop, really was the SQL-on-Hadoop originator. But people felt that it was not performant enough. So you saw Cloudera develop Impala. You saw Hortonworks work to develop Hive and improve its performance with the Stinger Initiative, which was recently completed. And then you've got other options, of course, like now Vertica on top of MapR, soon Vertica on top of Hortonworks. But you're saying there's demand for it. Oh, there's absolutely demand. So what did the survey say? They said, we want this?
Right. So over 80% of our respondents said they are at least experimenting with some SQL-on-Hadoop option. Which makes sense because they know SQL. It's a language that they're comfortable with. Look, it makes sense because it allows you to move some workloads from your traditional system onto Hadoop, bring in more data, scale to larger volumes of data and cut your cost. So it makes a lot of sense. But I think where the real value is going to be is in these new uses, these new ways of interrogating data, using a tool you know, SQL, but bringing in all sorts of different sources of data in ways you couldn't in a traditional environment. How mature is the integration, though, Jeff? Are the vendors getting it done, generally, and Vertica specifically? Where are we at in terms of true integration? It's still pretty early. The practitioners that we've talked to, while they're experimenting with SQL on Hadoop, I think most advanced analytic workloads are still taking place outside of Hadoop, side by side with Hadoop. So we've got a ways to go. The technologies generally are relatively immature, relative to the data warehouse market, obviously. So there's a ways to go. As I mentioned, the Hortonworks HP Vertica partnership, that will be interesting to watch. Right now, that's all it is, a partnership announced. We haven't actually seen the hard work happen yet to integrate the two, so that's where the execution has to happen. We're interested in that news because, obviously, it puts Hortonworks right back in the conversation after Cloudera's massive liquidity event with their financing from Intel, which, as some say, was an IPO for Cloudera, which it really was. It was a great liquidity event for early investors. A pre-IPO IPO. I mean, this is the new normal, right? Get liquid before the IPO, these secondaries. This is kind of what's happening.
Because of all the pressure, Marc Andreessen has been very, very upfront about it. He's like, hey, you know, this liquidity activity is normal because it just sucks to go public. And he says, on the other side, you've got to be really buttoned up to go public. So this brings the conversation, Jeff, to the IPO window that's clearly open. This is big data land going public in the next couple of years. So you're starting to see the horses get ready to hit the track. As we say, you know, when the Kentucky Derby's coming and they're getting into the stalls, it's an IPO window that's wide open. So who's winning? Who's knocking the numbers out of the park? Obviously, Cloudera did the big financing. Hortonworks takes the big financing from HP. Are these guys ready to go public? They're moving in the right direction. As I said at the start of the year, I thought, and I still think, that Cloudera is going to top the $100 million revenue mark for 2014. That's kind of one of their- So you've got to be at 100 million. They've got to be at 100 million. You've got to be growing fast. You've got to be growing fast. Based on our research, I think Cloudera is on pace to hit that. And we're only about halfway through the year, so a long way to go, but we think they're on pace to do that. And that's largely license revenue, correct? Or is it mostly services still? They are probably 55/45, 60/40 license to services. So probably still too heavy on services for the public market. I think if you're looking at a potential IPO for them, probably mid-2015, maybe late 2015. As John said, they don't necessarily have to- Just as interest rates are starting to rise. There's not as much pressure on them now to go public with the huge investment from Intel. Another company, Hortonworks, they have very clearly said their goal is to go public. I think you're looking at definitely first half of 2015 for them. They likewise, I think, they're nipping at Cloudera's heels.
They may even pass Cloudera in terms of revenue this year. Really? It remains to be seen- So you're saying they're growing faster? They're growing extremely fast. And a lot depends on the relationship with the resellers, obviously, which kind of underpins Hortonworks' whole go-to-market strategy. So some other companies we're potentially going to see: MapR is another company that sometimes gets left out of the conversation. They seem to be the smallest of the three, but they had a large investment round recently, 100 million. And they're another possible IPO, maybe an acquisition target, maybe IPO, we'll see. And then on the NoSQL side, MongoDB: Max Schireson recently stepped down as CEO, which was an interesting move. I think family reasons were why he stepped down, and I don't doubt that. But he was more of an engineering-focused CEO, and I think they're bringing in more of a finance guy to run the company now, and I think that's in anticipation of an IPO for MongoDB. So they've got some challenges of their own, but it's really an interesting market. It's going to be really interesting on Wall Street to see how these companies perform once they do go public. So how about the view from the Valley, John? I mean, everybody talks about there being a bubble. We tried to get some B-roll a couple of weeks ago in Boston, Massachusetts, and the companies around here are few and far between in terms of brand names. What's happening in the Valley? Well, first of all, I just want to say, from theCUBE standpoint, Jeff Frick and Grace Stewart and the team will be kicking off September 1st. theCUBE Silicon Valley is kind of our little pilot, but we're going to be aggressive, with HBO's Silicon Valley hitting mainstream, the big popularity of that show, and also the attention going on around the innovation, which is huge. The situation in Silicon Valley right now is awesome, right? Jeff mentioned some of the players.
Cloudera is positioned with Intel, that liquidity event we were talking about. It's actually a positive for Cloudera, because here's what's going on, Dave. In Silicon Valley, a lot of these are transformative technologies, and let's take Cloudera, for instance, and Hortonworks. These companies are in the long game. They're going to win big, and if they get forced to go public too early, a lot of bad behavior could come out of that versus staying true to the mission, which is really transforming industries, and that's what's happening. There are some key technologies happening around big data where the new stuff is either putting incumbents out of business, and we know who those companies are, or transforming industries. So that's the key thing going on here. So the public market is interesting to me. However, I think you've got to look at that and say, if they go too early, they could miss the full transformation upside, and that is a huge strategic decision that a lot of the VCs are making. And just to use some other random examples, let's take, on the consumer side, Snapchat: turned down a $3 billion offer from Facebook, recently valued at $10 billion. So it looks like a good call. BuzzFeed just scored $50 million. The market is frothy right now, but we're talking about transformative technology. Are you worried? I mean, if these companies start doing IPOs and we do have another bubble burst, are you worried that the timing will cause real havoc in the marketplace? Well, no, I think the froth will come off. Let's take Twitter, for instance. They went public. I thought they timed their public offering well, but their model was different. The Street didn't understand Twitter. What's happening now is, the public market actually helped Twitter get focused.
And they're getting focused on keeping the user experience the way it is, maybe making some tweaks, but focusing on the revenue. If you don't have the revenue, public's not going to be a good spot for you. So that's why, you know, look at Cloudera: they pulled back a little early. They don't want to force the revenue equation too early. Twitter did the same thing. They created the penetration and the critical mass, then they doubled down on the revenue. So going public is a double-edged sword. If you go public too early and you're not scaling on the revenue side, you're going to be screwed. Is the next billion dollar software company, in your opinion, going to come out of the big data world, or is somebody going to gobble up one of these potential candidates? What's your opinion? I think this is the big debate right now. My personal opinion is that yes, it will. And I think it's going to be different. I think this whole idea of how you put companies in categories is going to change. Oh, they're a software player. They're a SaaS player. I think you're going to start to see things like what Uber's doing with cars. These new transformative technologies don't fit in a category. They're cross-category. I think Andreessen Horowitz calls it the full-stack startup. Whatever the hell they want to call it. That's kind of a marketing term. But in reality, it's fully integrated, and using big data is critical. That's the real lever. So this new model is not going to be some categorical, siloed company. It's going to be something completely different. I completely agree. And I think we are going to see the first billion dollar company in big data. It's going to happen. But it might not be who you think. You mentioned Uber. I think it's more likely going to be a company that's using data. A practitioner of big data. It's going to be a practitioner that maybe you don't think of as a big data company. But that's what drives their business.
That's what we're going to see, I think. So Uber and Airbnb, potentially, gaming companies, social media companies. Facebook, Riot Games, Twitch. There's going to be, yeah. theCUBE. There will be companies emerging that we can't even think of right now, with different use cases. I mean, we saw on stage today, now, I think they're a nonprofit, Conservation International. But some of the things they're doing with data, I mean, you never would have thought of some of these use cases. So what's going to happen is you're going to have startups emerge. You're going to have maybe some existing companies pivot and start to use data in ways that we never would have thought of. And that's where you're going to see the billion dollar companies, in my opinion. We've got those guys coming on, Conservation International. I totally agree. Billion dollar companies are going to come from out of left field. It's going to be a unicorn. It's going to be a black swan. Whatever you want to call it, it's not going to be predicted. It's just going to happen. I think that's the beautiful thing about transformative technologies. And certainly, the disruption is there. The bubble helps with the financing. And that's why we're here at theCUBE, getting all the data and sharing that with you. Excited to be back in Boston, Massachusetts for year two of the HP Vertica Big Data Conference. We'll be right back with our next guest after this short break.