Live from New York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic, and Teradata. With hosts Dave Vellante and Jeff Kelly. Welcome back to New York City, everybody. This is Dave Vellante with Jeff Kelly. We're here for two days. We've got three events going on. We're running theCUBE live, wall-to-wall coverage of Big Data NYC, which is concurrent with Hadoop World and Strata. We did our first Hadoop World in 2010. So we've got two days of CUBE coverage. This afternoon at four o'clock, we have a capital markets event. We've got over 150 people registered to come in and hear Jeff Kelly talk about some new big data information from our latest survey. And then we've got a great panel: Amy O'Connor, the former big data team lead at Nokia; Abhi Mehta, formerly of Bank of America, now CEO of Tresata; and Peter Goldmacher, who was the lead analyst for enterprise software and big data at Cowen; as well as Jeff Kelly. That's this afternoon at four. And then this evening we have our Celebrating Five Years of theCUBE at Hadoop World party. We are here at the Hilton Times Square, which is offsite from the Javits Center, but buses are running and we've got cars going. So stop by and see us. David Richards is here, CEO of WANdisco. Great to see you again, always a pleasure. Always great to be here, love the show. So, at Oracle OpenWorld, I was texting saying, hey, can you get up here? We wanted to shake things up a little bit. So that was fun. It's just amazing to see what's going on with Hadoop. Jeff and I have been sifting through the data, and I call it the big sucking sound. The enterprise data warehouse isn't going away, but boy, are people rethinking where they put their money. Do you see that? Give us your take.
So I had a wander down Wall Street and met with a bunch of CIOs a couple of weeks ago, and this actually is where the rubber is meeting the road in terms of Hadoop adoption and deployment: the banks. So it's like an inverted product adoption lifecycle curve. Virtually all of our early customer uptake is with massive financial institutions. And I spoke to one particular CIO — I can't name any of these banks, I'd love to — and I said, how are you going to cost-justify this move to Hadoop? How are you going to do that? And his answer was so succinct and so simple: they're going to tear out half of their existing data warehouse solution — I mean no disrespect to the man whose technology that is — and they will save half a billion dollars a year. Half a billion dollars a year. When you hear numbers and stats like that, you know, this is a seismic shift, a tectonic shift in the marketplace that I don't think we've ever seen before. And then the other side of that coin is that people are making money with Hadoop because they're doing things faster: they're identifying fraud more quickly, they're able to identify risk more quickly. Talk about the other side. Some of the use cases are just amazing. So we did a production trial with one of the world's largest banks. And they had a very simple question, which was: are we making money in a particular geographic region? Now, you'd imagine a bank should be able to add all those numbers up and figure it out. Well, it turns out that a lot of the data is in a very unstructured format — and by that I mean PDF files and Word documents. Within 30 days, they were able to determine that they actually weren't making money in this geographic region. Another case: I went to see a massive insurance company in the metropolitan New York area. They've been in business for 100 years, right? They're actuarial scientists. The assumptions behind their business have been in play for 100 years.
This is a 100-year-old business. When they implemented big data and just ran a simple text algorithm — regression analysis on text — they determined that their top three actuarial risk elements were all wrong. Now, can you imagine what the boards of directors at those two companies would have thought? A, they're not making money in a particular geographic region. And secondly, the assumptions associated with insurance risk are actually not the top three reasons for insurance risk. The opportunity is phenomenal, but the risk to those businesses in not doing it is almost as great. So it's this risk-reward. I think there's an interesting stat that in 10 years, 66% of the S&P 500 hasn't even been created yet. That tells you that we're gonna have disruption. And I think we're beginning to see now that big data — and it's a horrible phrase, we know that — but Hadoop is the disruptive force in the market that might be enabling that. And you're seeing the hub of innovation as the financial services sector generally, and New York City specifically. Are you seeing similar trends in London, for example? Well, Wall Street obviously is the financial capital of the world. And no disrespect to my friends in London, but certainly the financial services industry is a major driver for this. Now, I know banks have had a history of adopting technology pretty early, but I've never seen a wave coming like this. And I think the most interesting thing as well — we talked about economics earlier — but when you see a curve of the amount of data that we're gonna have that's doing this, and when you see the curve of budgets for storage that's doing that, I would suggest that there is probably a problem in the marketplace that Hadoop solves, which is that gap in budget. And that gap is clearly widening, more so than it ever has in the 30 years that I've been watching it.
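The insurer anecdote above — mining free-text claim data to surface risk factors — can be sketched in a few lines. Everything here is hypothetical (the claim notes, the costs, and the crude score-by-average-cost approach are invented for illustration); an actual actuarial pipeline would use far more sophisticated regression models:

```python
# Toy sketch: rank claim-note terms by the average cost of the claims
# that mention them, to surface candidate risk factors from free text.
from collections import defaultdict

def rank_risk_terms(notes, costs):
    """notes: list of claim-note strings; costs: parallel list of claim costs.
    Returns all terms sorted by average cost of claims mentioning them."""
    totals, counts = defaultdict(float), defaultdict(int)
    for note, cost in zip(notes, costs):
        for term in set(note.lower().split()):  # count each term once per claim
            totals[term] += cost
            counts[term] += 1
    return sorted(totals, key=lambda t: totals[t] / counts[t], reverse=True)

# Hypothetical sample data
notes = ["basement flood damage", "kitchen fire", "flood in basement"]
costs = [90_000, 20_000, 80_000]
print(rank_risk_terms(notes, costs)[:3])
```

The point of the anecdote survives even in this toy: the highest-ranked terms need not match the risk factors the actuaries assumed.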
Yeah, I would just add to that — to back up that premise — that we're seeing evidence of this talking to Wikibon community members, who talk about a lot of the different benefits of Hadoop. One that they talk about, and it's not always the sexiest one — a lot of people want to talk about analytics and all the cool things you can find — is storage costs. Now, to really leverage Hadoop, you've got to look at it as much more than just a storage platform, but that is an element. You're talking about a 10x economic difference between the two technologies. It's interesting, Mike Olson, founder of Cloudera, in his keynote at Strata today talked about Hadoop disappearing. It doesn't mean it's gonna go away. What it means is it's gonna be ubiquitous, and it will disappear behind lots of other applications that will be built on top of it. I do think Hadoop will commoditize as a platform very quickly, which goes to this point as well about ubiquity and stretching right across the market. So, you guys have chosen what I'll call a niche. Somebody once said to me, if you're gonna start a company, make sure you own a niche. And you guys are solving a really, super hard problem that most people just wouldn't even want to touch. I wonder if you could talk about that, and talk about the uptake that you're seeing in your space. So, we came to market with our patented active-active replication technology, applied it to Hadoop, and solved the single point of failure associated with the NameNode — in fact, the whole data center being a single point of failure: if a data center goes down, you're screwed, right? So, we ensure that the data is not merely backed up; because it's active-active, there is no DR site. You have data that resides in different geographic locations. That was our first premise in going to market.
Actually, interestingly — and this is brand new, hot-off-the-press news — we're discovering that there are plenty of other use cases for our technology. One: we went to a very large consumer electronics company that had a problem associated with the Internet of Things, which is a huge driver for that increase in data. They run a Kafka job. They cram a colossal amount of data every 20 minutes into a Hadoop data lake, or an enterprise data hub, or whatever they call it these days. That cramming of data, that data ingest job, utilizes the entire system resources across the entire data lake. So, that means that while that ingest is going on, you can't run analytics, you can't do other batch jobs, you can't do anything else with the data lake other than ingest. And as that data climbs, this company was concerned that eventually the entire system would be used only for data ingest. So, what WANdisco can do is segment the data lake into what we call cluster zones. So, you have zone A that's doing ingest, zone B that might be trimming the data, zone C that might be doing an in-memory analytics job, and so on and so forth. I'm not gonna go through every single potential use case for that. But it also plays into a cost perspective as well. A CIO from a company said, can you guys do this zoning thing? We thought about it and said, actually, yeah, that's a byproduct of what we do. Turns out, if you need to spec out the hardware in a data lake — assuming that you have a big data lake — you have to spec it out for the highest possible use case. So, if there's a lot of in-memory processing, you need to spec the whole thing out for massive in-memory processing. With zones, it means that you could have one particular zone with a high-spec set of machines, another zone with a lower spec of machines, and so on and so forth. So, we can probably reduce the hardware cost associated with these deployments by 33% minimum.
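The cost argument behind zoning can be made concrete with back-of-the-envelope arithmetic. A minimal sketch, with entirely made-up node prices and cluster sizes (none of these figures come from WANdisco): without zones, every node in the lake is specced for the heaviest workload; with zones, only the analytics zone gets the expensive machines:

```python
# Hypothetical per-node hardware prices (illustrative only)
NODE_COST = {"high_mem": 30_000, "standard": 12_000}

def uniform_cost(total_nodes):
    # No zones: the whole lake is specced for the peak (in-memory) workload.
    return total_nodes * NODE_COST["high_mem"]

def zoned_cost(zones):
    # Cluster zones: each zone is specced for its own workload.
    return sum(nodes * NODE_COST[spec] for nodes, spec in zones)

zones = [
    (20, "high_mem"),   # zone C: in-memory analytics
    (40, "standard"),   # zone A: Kafka ingest
    (40, "standard"),   # zone B: trimming / ETL
]
saving = 1 - zoned_cost(zones) / uniform_cost(100)
print(f"hardware saving: {saving:.0%}")  # → hardware saving: 48%
```

With these invented numbers the saving comes out at 48%; the 33%-minimum figure quoted above would correspond to a different mix of zones and machine prices.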
I mean, we're seeing much bigger numbers than that. So, I think those two use cases — and add to that remote data ingest, because of course, if you're monitoring oil pipelines in Saudi Arabia, or wind speed or whatever, you're surely not gonna build an ingest pipeline all the way from Saudi Arabia to a data center in California. You want to collect that data locally. And we did announce two new products today that also deal with that problem. So, data-center-aware YARN — now what is that? Okay, this means that you can run a MapReduce job, create a single global namespace, against data that resides in different geo-locations. So, we made YARN data-center-aware. And I'm also really excited about a product that's come out of our big data labs. It's not ready for prime time yet. It enables a single global namespace across different Hadoop distributions. And we're demonstrating it today with, I think, Hortonworks and Cloudera, because it turns out that a lot of the banks, a lot of the financial services companies, are not really settling on one particular distribution. They want two, maybe three distributions. So, when you say you make YARN data-center-aware, you're talking about resource negotiation and job scheduling across data centers? Across data centers. Because if you think about it, there are countries where you cannot remove data: Saudi Arabia, Australia recently, Argentina, Germany — exactly, you can't remove data. So, what do you do? How do you run fraud analysis if you're a bank? This is a real use case, of course. When you've got data in all these different places — you know, if you're using your credit card in Germany, how can I query that data, right? I'm not allowed to remove that data from Germany. So, you have to be able to run a MapReduce job across multiple data centers. We enable that. It's really cool technology. That is cool.
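The data-residency point — querying German card data without moving it out of Germany — comes down to shipping the computation to each data center and moving only aggregates back. A toy sketch of that scatter-gather pattern (the site names, the flagging rule, and the threshold are all invented; real cross-data-center YARN scheduling is far more involved than this):

```python
# Sketch: run the analysis where the data lives; only small aggregates,
# never raw records, cross a national border.

def local_fraud_counts(transactions):
    """Runs inside one data center; raw records never leave it."""
    flagged = sum(1 for t in transactions if t["amount"] > 10_000)
    return {"flagged": flagged, "total": len(transactions)}

def global_fraud_rate(sites):
    """Combine per-site aggregates into one global answer."""
    parts = [local_fraud_counts(txns) for txns in sites.values()]
    flagged = sum(p["flagged"] for p in parts)
    total = sum(p["total"] for p in parts)
    return flagged / total if total else 0.0

sites = {  # hypothetical per-country transaction stores
    "germany": [{"amount": 12_000}, {"amount": 40}],
    "us": [{"amount": 500}, {"amount": 25_000}],
}
print(global_fraud_rate(sites))  # → 0.5
```

The same map-locally, reduce-globally shape is what a geo-aware MapReduce job does at scale: the "map" side stays inside each jurisdiction, and only the reduced results travel.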
Well, yeah, the interesting thing is, we're seeing from our community — a recent survey of big data analytics adoption — an astonishingly high number: I think it was over 70% of practitioners have Hadoop deployed in multiple data centers. So, the next question is, well, are they trying to connect these data centers, or are they isolated pockets? Related to what you mentioned about seeing different distributions, we're also hearing, especially from the large telecom companies and banks, that they're using multiple distributions. You might have the security team using MapR to parse and analyze data relative to fraud, and you might have the product development team bringing in data phoning home from their products using Hortonworks. So, you've got different distributions, but ultimately, if you can tie those together — because one of the benefits of big data, of course, is you don't know what correlations are gonna be found, you don't know what data sets are gonna relate to one another — if you can tie those together into a virtual data lake and run jobs across all that data, well, now you're in a position where data scientists would just love to have that data. And that's a good point. I mean, no disrespect to my product management team, but these ideas came from customers. As Hadoop gathers pace in the marketplace, these use cases are coming out, and I'm hammering our sales team: every single account we go into, I want to understand exactly what the business use case is. So I'm also curious — you mentioned some of the banks that you're having some success with, and one of the things we're trying to squint through here at Wikibon is, what are some of the characteristics of the early adopters that are having success? Because we've done some research around ROI, and frankly, a lot of early adopters are struggling.
This is hard stuff, but there are, of course, those handful of companies that are doing it well. What are some of the characteristics that you see in companies that are doing this well? That's a great question, and I think there are a couple of characteristics, and they're all related to the same thing, really. The ones that are not successful go and create a data lake, pour data into it, and it's the "build it and they will come" mentality. And guess what? Nobody comes, right? Where you have a business driver — and I go back to the major bank we were looking at: are we making money in this geo-region? — that is not trying to boil the ocean. It's a very succinctly defined use case that has a clear business driver. Those are successful. It's the IT department saying, yeah, we're gonna build a data lake — this is not an IT problem, guys. This is a business problem, and the driver for Hadoop adoption — the reason it's picking up pace, the reason we're seeing it move so quickly — is that it's being driven by the board of directors. That's so interesting. So in this survey, we asked people, to what degree has your Hadoop, or big data, project succeeded? And the IT guys, in a huge way, said: success, job done, mission accomplished, we're there, 100% of the value. I think it was about 60% of the IT guys who said success. Less than 20% of the business said it was a success. So there's real dissonance: the IT guys saying, hey, the Hadoop cluster's up, it works, it didn't fail, we got it going, we learned how to write a MapReduce job; whereas the business guys were saying, wait a minute, what was this? I don't get it. So if you look at the average ROI on Hadoop projects, it's small right now: for every dollar they spend, it's 55 cents in return. But that's the mean. If you find the guys that are the best of the best, they're getting huge returns. It's such an interesting discussion.
When you go to the IT department, sometimes the IT guys will just say, well, we can do this with Exadata, or we can do this with our existing technology. You go to the business guy, and the business guy talks about storage cost. They talk about the potential business use cases associated with lots and lots of data. The business guys, not the tech guys, are the visionaries here, I think. Yeah, it's very difficult to make the business case for storing the amounts of data we're talking about in a more traditional system — even if they're not running up against performance issues, even if you could do it, any insights you find that might save you some money or make you some money are gonna be offset by the millions that you're pouring into companies like Oracle and others for those huge scale-up proprietary boxes. We went to speak to a pretty major government department — I'm gonna have to try and skirt around revealing what this is — but they wanted to monitor HL7 traffic, all the healthcare traffic, and look for things like Ebola outbreaks and so on, because you can do that if you can store all that data. The only reason they've never done it is the cost. That's what it boils down to. It would have cost them several billion dollars to do this with Oracle, right? The technology's kind of been there-ish. You know, we had the CTO of SunGard, Steve Cummins, speaking at an event that we did earlier this week, and he said, big data was actually invented on Wall Street, we just didn't tell anybody about it. Wall Street's always been dealing with large quantities of data. Really, the simple fact is that nobody could afford to do it in the past; now we can. So let's talk a little bit about capital markets. We have our event this evening. You guys are a public company, but there's a real lack of pure-play big data public companies.
I mean, there's you guys, and I guess Splunk, Tableau, Qlik, but they're sort of on the periphery. Everybody's kind of waiting for that big data IPO, and Wall Street's trying to figure out, well, how do I play big data? What's your take on what's going on? Companies seem to be raising money and staying private. You're, of course, in Europe — being a public company in Europe is maybe a little bit less onerous than it is in the US. What's your take on everything that's out there? About 20% of our shareholder base are actually US investors, pretty large investment funds here in the US. They're all desperate, frankly, for Cloudera to get this IPO done, and so are we, because when we start to get those numbers from companies like Cloudera, Hortonworks, and so on out there in the marketplace, people will then see the herd mentality, the herd moving in this direction, because we'll be able to see the real numbers. At the moment, of course, with private companies, you can't really see what the numbers look like. There's a huge appetite for big data public companies. I mean, you've seen it: massive demand. And with the amount of money that's been raised, the only logical exit for many of these is an IPO, right? Well, I mean, a $4 billion, or whatever it was, $5 billion valuation that Cloudera got from the investment that Intel made — have you ever seen anything like that before in your life? I'd never seen anything like it. And that's kind of the thing: they don't need to go public. Well, yeah, exactly. Right, well, that's the challenge. Cloudera bought itself a lot of time there. So, I think more likely you're going to see Hortonworks or MapR be the first Hadoop provider to hit the public market. I spoke to a fund manager last week. He had a bet with me that they would do a $10 billion IPO next year. That we will see. That Cloudera would? Yes. I don't know, I've got no inside information. That's what he said.
He said, I bet you that they do. He's a pretty smart investor as well. And you've been saying that you think Hortonworks is going to go first. Well, to me, Cloudera doesn't need to raise the money right now. I mean, they've just raised nearly a billion dollars in the last year. Hortonworks, you know, to keep up with that, maybe they need to go public sooner rather than later. MapR's in the same situation. They sometimes get left out of that conversation, but to me, they're already in that conversation — but they may need cash to keep up. But isn't it interesting, because a year ago, two years ago, we were all saying, when is XYZ company going to acquire Hortonworks, when are they going to acquire Cloudera? Now we're not talking about that. Now we're talking about when these companies are going to go public, because they're the disruptors. Well, that's an interesting question: how are the big players going to adapt here in this world — Oracle, Teradata, SAP? Are they going to make some acquisitions? It seems like the ones we talked about are more on the path to IPO, but there's nothing stopping Oracle from making an acquisition in this space. I mean, if you look at the show floor down at Hadoop World, there's a million startups out there doing interesting things. What stops them from making an acquisition? I think we've seen this movie played out before, haven't we? When we had these things called mainframes, there was a bridge over to client/server, and the bridge technology there was screen scraping, right? Because we wanted to make a mainframe work inside a PC. Now we have this thing called Hadoop and big data, and the bridge is things like Impala, which means that SQL can run on top of Hadoop. I just think that's a lot of baloney. I don't think that market, just like the screen scraping marketplace, is going to be around for very long. And I don't think Hortonworks and Cloudera need to be acquired.
And I think there's a dilemma, right? It's back to that graph where budgets are doing that and data's doing that. They can't adapt their business models. In the public market, you're not allowed to do it. Because what happens is — and I may be teaching granny to suck eggs here — they're out there in the marketplace, and their shareholders are mature shareholders. What do mature shareholders want in the mature phase of a company? They want cash, they want dividends. What do our shareholders want? Our shareholders want growth. We're allowed to do all that sort of stuff. If we wanted to change our business model — I'm not saying it would be a good idea — we could. Companies like Oracle can't. It's impossible for them to suddenly say, you know what, we're going to stop selling all this expensive hardware, we're going to pivot our entire business and go down this path — I'm not picking on Oracle, by the way, they're a partner of ours — they just can't do it. Well, the other hard part about that is that Oracle has thrown off $15 billion of free cash flow in the last four quarters, and they're getting punished for lack of growth. So it's hard. EMC figured this out. When they said, okay, the data center is going to be less hardware-centric, we're going to create VMware, we're going to spin VMware off — and they still own, whatever it is, 50% of VMware. 80, 80%. They've done exactly the same thing with Pivotal. It's a pretty smart strategy. There's big pressure from Elliott Management to have VMware spun off out of EMC, and I think you're right — I think Tucci and company figured it out, and that would be a disaster long-term for them. But of course, short-term, it would unlock a bunch of value inside of EMC. That's how you do it, to your question. You have to do it the way that EMC are doing it. If you can't beat them, you join them.
Well, it's interesting that more companies don't do that. I mean, that federation model is pretty unique. IBM's pretty much got the whole thing in-house, and Oracle certainly, when it buys companies, red-washes them. Well, you know, CEOs don't like having other CEOs right alongside them. Yeah. Well, I think, again, that's the brilliance of Tucci — very open-minded. And of course, one of these days, he's going to retire; I'm not sure when. All right, David. Well, listen, thanks very much for coming on theCUBE. Always a pleasure riffing on the trends. Good luck with everything at WANdisco. And thanks again for coming by. Always a pleasure, guys. Keep it right there, buddy. This is theCUBE. We'll be back from Big Data NYC right after this.