Live from New York, it's theCUBE, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Now your hosts, Dave Vellante and George Gilbert. Hi everybody, we're back in New York City. Bruno Aziza is here, longtime CUBE alum and friend of theCUBE; he's the CMO at AtScale. Great to see you again. Thanks for having me, guys. Awesome. Now this is, as I said to my panel last night, survey week: Gartner did a survey, AtScale did a survey, Databricks did a survey, and we're sort of parsing through all the data. So I definitely want to talk about that, but before I do, give us the update, give us the quick bumper sticker on what AtScale does. On what AtScale does, sure. Your latest, you know, next big thing. Yes, yes, so we launched the company in April and we announced our Series A this year, where we had the participation of Jerry Yang, the founder of Yahoo, and the founder of Cloudera. We're essentially a BI-on-Hadoop middle-layer application, and what we do is we enable any BI tool, like the Tableaus or the Excels of this world, to connect to Hadoop while providing scale, speed and security, which is today the issue that people have when they connect BI tools directly to Hadoop. We're kind of like, if you remember, Business Objects had this thing called the Business Objects Universe. We're that: we provide an environment for people to model the data that's in Hadoop without ever moving it. We integrate directly with the BI tools without ever downloading any client. We integrate natively with them, and we provide security, access, and performance. We have this great aggregation management system that allows people to get sub-second queries on whatever data they have. So billions and trillions of rows can be queried very quickly using AtScale with very little intrusion into your environment. And what can you tell us about the company, the funding, the headcount, all that? Yeah, so the company was started by Dave Mariani.
Dave was a customer of mine, actually, when I was at Microsoft. Dave is the guy who built the largest OLAP cube on Earth using SQL Server Analysis Services. And when he started that, he realized there was a huge opportunity. That was at the time when Yahoo was incubating Hadoop, so they were really the first company to do this. And he realized there was a need to connect the BI tools to the Hadoop layer. He left Yahoo. He went to do the same thing at Klout, and we had this discussion like, well, surely someone must be building this software, and he realized nobody was. So that's when he started the company about three years ago, and I just joined at the beginning of the year to help launch the company. So we launched in April. We announced our Series A in July. We are building the sales and marketing team. Actually, Bobby DeMartino is our VP of sales, an ex-platform guy. So we've really got the A team in the big data space to help enterprises connect the multiplicity of BI tools that they have to Hadoop, which is going to be the secret, I think, as we'll talk about in the survey: connecting the business users to Hadoop is how people get the most value and are the most successful. So you collaborated with a number of organizations, companies, and we're going to be collaborating together as well; Wikibon's going to dive in and participate in this. So who are the companies that participated? Let's start there, besides AtScale. Yeah, so we worked with Cloudera, Hortonworks, MapR, and Tableau. And the background for the survey was that there wasn't really a lot of information on what people were doing with Hadoop in particular. So what we decided to do was go out and survey people that are already working with Hadoop and get a sense of how mature they are, what workloads they are working on, what they're anticipating doing in the next six to 12 months, and so forth.
And so this is essentially the largest Hadoop maturity survey in existence: 2,200 responses from 1,300 unique companies across the world. And what the data is telling us is actually quite different from what we've seen over the last few surveys that were commissioned by Barclays or Gartner, I guess, recently. How so? Well, if you look at the surveys up to now, I think the Barclays survey came out in January; it surveyed 100 CIOs and it was fairly positive. And I think Gartner came out and reached out to 300 companies. They got about 125 answers on some of their questions and it was fairly negative. I think it was saying 54% of people were actually going to deploy Hadoop. In our case, out of the 2,200 people, 77% of those companies are people that are not tire kickers, people that have more than 10 nodes and have been using Hadoop for more than six months. And what we find is only 3% of them say they will do less with Hadoop over the next 12 months. Only 3%. So 97% of people are not going to cut back on Hadoop. And there are some interesting trends in terms of what they are doing with Hadoop itself. You know, I think in the first generation of Hadoop, you saw a lot of work on ETL and data science, and clearly the trend is going towards business intelligence. The way we structured the survey is we asked people if they had Hadoop, and they got one set of questions, and we also asked people who didn't have Hadoop what their intentions were, and that's another set of questions. And the workloads on Hadoop are changing, which to us indicates that there is a maturing going on and Hadoop is entering its second phase, where really it's about engaging with the business users. The key stat to remember there is that companies that have been able to provide self-service access to Hadoop are 50% more likely to gain business value out of Hadoop than any other company.
And so I think we're now entering the second phase, where Hadoop is about kind of what we lived in the BI world: it's less about structuring the data and worrying about loading the data into an environment, and more about connecting it to people like you and me that could actually get business answers. So I'm curious as to whether or not you probed around the role of the traditional BI, the traditional data warehouse. I'm struck, you know, Christian Chabot has been on theCUBE before and he talks about, and you lived this world, the slow building of cubes in the BI business. Of course Tableau comes in and they're all about the viz, they work fast. We had Qlik on recently; they're sort of going after a similar space. Did you query the respondents around the role of that existing data warehouse and existing BI tools? Yes, so I think there's a couple of trends going on. The first one is the assumption that Hadoop is going to replace the enterprise data warehouse, and that's clearly not the answer from the survey respondents, I think. Opposite, right? Yeah, I don't have the exact stats in front of me, but I think it's only 40% of people that are saying they're going to replace it over time. So I think the reality is we're going to live in a heterogeneous environment for a long time. Like we have mainframes today, I think we're going to have traditional data warehouses for a long time. I think what's going on is there are some scenarios that Hadoop is uniquely positioned to address, and companies are betting on it. I think if you look at the data in general, if you're an executive today and you're sitting next to two other guys and you're the one that's not doing anything with Hadoop, two things are going to happen. The guy to your right that's been working on Hadoop is going to do a lot more, right? Only 3% of them say they will do less. And the guy to your left who's looking at Hadoop is going to onboard it over the next six to 12 months.
So I think there's a sense of urgency that these executives should get to. Now in terms of agility, I think you're absolutely right, and there's a need for a set of new tools and new approaches to provide agility on the data that's in Hadoop. I think historically, in the first generation of Hadoop, vendors have been pushing people to move data into Hadoop, leave it there, and then at analysis time take it out of Hadoop, which is, I think, completely contrary to why people go to Hadoop. So I think, and I'm talking about it because that's the space we're in, we need to be able to provide analysis capabilities to people in Hadoop without moving the data. You need to be able to play the data where it lies. And I think that's a big change in how people are looking at how they're going to be able to leverage their Hadoop layer. Just to be clear, we've heard scenarios like this where you put the raw data in Hadoop, data engineers massage it, data scientists try and get some signal out. The repeatable sort of answers they then push out to the data warehouse because it's more consumable. Would it be fair to say that what you're trying to do is sort of put that cube layer on top of the data so it doesn't have to be moved into the curated data warehouse? That's exactly right. So specifically for AtScale, we're a virtual cube technology. That's unlike the early OLAP technologies, which were very physical: you had to build the cube and refresh it, and it would take weeks to do that. In our case, it's an XML definition, so it's very malleable, it's very agile. And as you make changes there, they're visible right away in the BI tools people are using. But to your point, I think enterprises are realizing that by having the data scientists front-end this process and having a very physical process of moving the data, you're cutting into the agility of your organization.
I think where the game is going next is that you actually don't have to move the data, and you should be able to analyze it in place rather than go through the typical ETL process. What's happening there is that as data is growing, the process of moving the data becomes very heavy, very expensive, and very slow. In fact, in some cases, and we're working with an online company right now, they can't move it fast enough. So we've got to divorce ourselves from putting the data in Hadoop and then moving it out at analysis time. I think organizations want to be able to analyze it the minute the file is closing on HDFS. So I wanted to ask Merv this, but we didn't have time. So I'm going to ask you to play a BI analyst. You know that business. I love Merv, but I can't fill his shoes. No, he's great. So Mark Madsen tweeted out yesterday that the fragmentation of metadata in Hadoop is the same as in the old BI world. And then Merv responded and said it could be worse if a lot of people just dump it in and figure it out later. I wanted to ask Merv, is that necessarily such a bad thing? Just dump the data? Yeah, just dump it in and figure it out later. I mean, I know there's a lot of work to be done, but what do you make of that comment from Mark Madsen? So I don't think it's a bad thing. I think what we've learned is data is just like wine. You never know when it's going to be good, and you probably will regret it if you don't have access to it. So I think the habit of never throwing away data, which is a big principle for our company and something Dave learned at Yahoo, has got to be principle number one. Whatever data you can collect, just dump it in there and then worry about it later on. But you don't want to be in a situation where you don't have access to it later on.
I think what's going on as well, and I think it's probably a lesson from the first generation of Hadoop, is that we were thinking about Hadoop as a very cost-efficient way of storing a lot of information. And if you look at the data in the survey, what we found is that the companies that are approaching the Hadoop opportunity as a revenue-generation opportunity rather than a cost-saving one are more likely to succeed and gain value. I think the stat, actually I've got it here, is that they're 30% more likely to achieve business value. So I think what we're learning in the second phase of the life of Hadoop is that all the things that we thought about in terms of reducing storage and moving data around are just disappearing, and it brings us to a new light, kind of what we lived in the BI days, which is you've got to figure out a way to attach it to business people, and you've got to figure out a way to attach it to them quickly, in the tools that they already know. Yeah, Abhi Mehta said on our panel last year that thus far the ROI in Hadoop has been reduction on investment. The point was that's not sustainable. You really have to focus on the value. But I wanted to ask you something about the keep-it-forever idea, because I would imagine for a lot of customers, the general counsel doesn't want to hear that, and the governance guys go, oh, wait a minute, no. We want to delete everything. How do you deal with that, square that circle for them? Okay, so this is an interesting trend as well, because in the survey we asked people that have Hadoop and people that don't have Hadoop what their fears were. And in both cases, skill set is number one, which I think makes a lot of sense because they're afraid of how to front-end this kind of new data environment. But for the people that have been working with Hadoop, other fears kind of come in. And those fears are management and security and performance and so forth.
So I think there's a little bit of a misnomer among people that are onboarding Hadoop, who think, oh, I'm going to have a problem because I don't have enough people to work with Hadoop. And then they realize it's kind of a traditional data management project, which is you're going to have to have security, you're going to have to restrict access, and so forth. And so that's why I think it's an interesting market for us, because of what I call the three S's. The three S's are speed, scale and security. And if you're able to, oops, I guess somebody's calling me, if you're able to nail those three things, you'll be able to transition those people, the legal counsel and the people that are worried about access. It's not about not storing it; it's about giving the right people the information they need to have access to, when they need it. All right Bruno, we have to leave it there. I really appreciate you coming back on theCUBE. We're looking forward to getting that raw data from the survey. That's right, it is right here. We're going to keep you on community. And always great to see you again. Thanks for coming on. Great to see you, thank you very much. All right, keep right there, everybody. We'll be back with our next guests right after this word. This is theCUBE. We're live from Big Data NYC at Strata Hadoop World.