Live from New York City, it's theCUBE at Wikibon Big Data Capital Markets Day 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic and Teradata. Now here is your host, Dave Vellante. Good afternoon, everybody, and thank you for attending the Wikibon Capital Markets event. My name is Dave Vellante. I know many of you, and I appreciate you coming here today, especially since 42nd Street is closed; those of you who are here know the story. So thanks for finding your way in. When we had the notion of this event (of course, we're running it in conjunction with Hadoop World and Strata), the idea was to share some of the research we've been doing at Wikibon to understand the investment angles on big data. There are only a few big data companies that are pure plays: Splunk, Tableau, Qlik, and really they're sort of on the periphery of big data. So we thought we could share some of our research with you, talk about the market, and maybe come up with some investment angles. A couple of years ago, when Jeff and I were at Atlas Venture, we first heard Peter Goldmacher, who's in the audience and on the panel today, put forth the premise that big data practitioners are actually going to create more value than vendor suppliers. So we started researching that. What we've observed over the past several years is the emergence of what some people are calling a digital fabric. What's the digital fabric? Well, historically, if you look at industries, whether it's retail or manufacturing or healthcare or media, their stack, if you will, has been very highly vertically integrated: design, production, manufacturing, and it was pretty hardened. What we're seeing now is horizontal layers emerging across those industries: infrastructure as a service (some people call it cloud), transaction applications, social applications, and of course big data.
And it seems to us that it's the organizations that are able to ride on top of that that produce value. The obvious ones are Google, Amazon and Facebook, but Jeff Kelly is going to talk about some others. So in our session today, we're going to talk about what's happening in the enterprise market generally, and specifically in the enterprise software market in the big data space. We're going to drill into big data, introduce this notion of what's happening in the practitioner world, and try to come up with some investment angles for the Wall Street folks in the room. So really: how do you play this market? Jeff Kelly is going to give a presentation, and then we've got a great panel. Amy O'Connor is here; she's with Cloudera and is a former big data practitioner at Nokia. The aforementioned Peter Goldmacher, a former Cowen analyst, is also in the audience. And Abhi Mehta, who was a CUBE guest earlier today, is a former practitioner at B of A and now CEO of a company called Tresata. Along with Jeff Kelly, we're going to riff on some of these trends. So without further ado, let me introduce Jeff Kelly, Wikibon's big data analyst. Jeff. We're really excited about today. We've been planning this for a while, and we've got some great new research we want to share with you. First, I want to talk a little bit about my background, since Dave has talked a little bit about Wikibon. I've been covering the big data space with Wikibon for about four years. Prior to that, I was with TechTarget as a journalist covering the more traditional BI and data warehouse world for about seven years, so I've got over a decade of experience covering this market. At Wikibon, we talk to and collaborate with big data practitioners, hundreds of them, through research, through theCUBE, and through other means.
Everything we're going to share with you today, all the insights, are derived from our conversations, our research, and our collaboration with big data practitioners and members of the Wikibon community. So to get started, let me lay the groundwork. My presentation today is broken down into three parts. First, we're going to take a look at the traditional data management market: where it stands today, some of the pressures it's feeling, and what practitioners think about it. Second, we're going to talk about the big data market, the emerging big data ecosystem, including the technologies, the vendors that play there, and some of the dynamics happening there. And finally, as Dave mentioned, we're going to put forth an investment thesis: a framework for investors who are looking at this market and trying to figure out where to place their bets to make some money in big data. Now, it's obvious to anybody here who watches CNBC or reads the Journal or the New York Times that the industry heavyweights are under pressure. There's no question about that. Oracle is missing earnings and revenue quarter after quarter. Teradata has lost over $3 billion in market cap since last year, and on their last call their CFO said they expected to come in at the low end of guidance for the next quarter. With SAP, we hear about HANA, and they're trying to figure out how that fits into this new landscape; they're also struggling a little bit with how the cloud fits into that equation. So these guys are under pressure. Meanwhile, at the same time, we've got this big data ecosystem that is just exploding.
If you've been down to the Javits today, or if you get a chance to head down there tomorrow, go to the show floor and you'll see an explosion of vendors in this space. They're creating new, innovative approaches to data management and analytics, very different from what we've seen in the more traditional space. So the question is: is there a cause-and-effect relationship between the two? Are the struggles of the industry heavyweights, the pressure they're feeling, due to the emergence of big data and the Hadoop ecosystem? In my opinion, the answer to that question is yes. As I mentioned, we talk to hundreds of practitioners in this space, and this is one of the core research areas we're focused on. What we hear over and over again from practitioners is that they are increasingly baselining their spend on the traditional EDW. They're challenged with storage costs: data volumes are going up, but budgets are staying flat. So we're increasingly seeing them baseline EDW spending, carve off a portion of what they would have spent on the EDW, and move it to the big data and Hadoop space, where they're experimenting and trying to get this figured out, because, as we'll talk about a little later, big data, for all its promise, is still very challenging. As part of my presentation, I want to share with you some feedback we've gotten from the Wikibon community. Specifically, we've talked to practitioners about the relationship between the enterprise data warehouse and big data. This is just a snippet of findings from those conversations; these are actual quotes from practitioners we've talked to.
This is from a financial services professional. Quote: "We're chasing the chips. We struggle to get maximum performance from a complex, highly concurrent operational data warehouse environment, where hundreds, sometimes thousands, of users are interacting without constraint on the system 24/7." Here's another one, from another financial services professional: "I've been at Company X 22 years, and this data warehouse, for the entire time I've been here, has been the snake swallowing the basketball. We're just getting a huge amount of data through the pipeline. It's always been a challenge. We get it solved, and then we go from fractions to decimals and the data explodes by 10 times. Now they're going to throw pennies at us, and data is going to explode again. And this isn't just 10, 20, 30% data growth year over year. There are these data tsunamis where all of a sudden my data grows by 10 times; you solve it, and then it explodes on you." And one last one I wanted to share is from an insurance professional: "We're definitely looking at moving more and more of our analytic and reporting workloads to Hadoop from our data warehouse. There are a lot of factors at play, but cost is definitely one of them. When you're looking at the proprietary hardware and software of a data warehouse vendor versus commodity hardware and open source software, there's quite a difference. I think there will be a role for the data warehouse, but I don't know what it is. I definitely don't see it getting any bigger." We've got dozens more of these conversations with quite a few practitioners. So I think it's fair to say, in my opinion and in the opinion of the Wikibon community, that the enterprise data warehouse has not lived up to its promise. As I mentioned, I've been covering this market for over a decade, and during that time I've heard a number of promises from the data warehouse, BI and data integration vendors.
Four in particular seem to come up over and over again: the promise of a 360-degree view of the customer; a single version of the truth; ubiquitous business intelligence capabilities for business users, not just analysts and data scientists; and, most important I think, real-time actionable intelligence. Based on our conversations with practitioners and our coverage of the market over that time, I think it's fair to say that the enterprise data warehouse has not lived up to those promises. Now, to be fair, it certainly has delivered value, particularly on the compliance side of the house, and the data warehouse has certainly provided a level of insight that wasn't available before. But largely, I don't think the data warehouse has lived up to those promises. Moving to the second part of our presentation today, the next question, of course, is: will big data live up to the hype? What you're looking at here is Wikibon's big data market forecast. It represents revenue from hardware, software and services. As you can see, we've pegged the big data market to top 28 billion dollars in 2014, growing to 50 billion in 2017. The biggest component here is services. That's really not surprising when you think about it: as I mentioned, big data is still hard, the technology is still complex, and services play the role of helping put it all together. Services are about 45% of the market. The next largest component is hardware. When you've fundamentally got a scale-out approach on modern boxes, modern hardware, that takes a lot of hardware, especially as you scale and add more data volume. So hardware plays a big role; that's about 35% of the revenue. Software is actually the smallest component, just about 20% of overall revenue, and we'll talk a little more about this in a moment.
Part of the reason, of course, is that in the big data space a lot of the software is open source, which means it's free to download. Especially at this stage of the market, where there's a lot of experimenting going on, a lot of POCs (and again, we'll share some data around that), many practitioners are starting with free open source software. So how does this translate into market leaders? When you've got a market that's very heavy on services and hardware, it might not be surprising that today the leaders in terms of big data revenue are IBM, HP and Dell. IBM obviously has a huge services arm, they've made a lot of investments in analytics software, and among the larger vendors they're at the forefront in pushing this notion of big data analytics to their customers. HP and Dell are coming at it more from the hardware side of the equation. Now, I can see my friend Abhi Mehta out there rolling his eyes a little bit: IBM, HP and Dell are big data? This is not what we expected, right? So for you purists out there, let's talk about, quote unquote, real big data. What's really happening in this part of the market? What you're seeing here is Wikibon's Hadoop and NoSQL market forecast. It's important to note that this is a sub-segment of the larger forecast we just shared with you. Of the total 50 billion dollar market we expect by 2017, a little over 3 billion, somewhere between 3 and 4 billion, is going to be generated through Hadoop and NoSQL software. Granted, this is a small slice of the larger market, but what's important to remember is that Hadoop and NoSQL are two of the foundational technologies in what Dave described as the digital fabric, along with infrastructure as a service, social and some other things.
So this is a critical component of the big data market, and this is where a lot of the innovation, pretty much all of the innovation, is coming from: the open source Hadoop and NoSQL space. So we've taken a look at the larger market and at the Hadoop and NoSQL space. As investors, what are you thinking? Where do I want to put my money? Who are the big data pure plays out there? As Dave mentioned earlier, on the public markets there aren't too many options, and I think it's even a bit of a stretch to call these players pure plays. I'd say the closest thing to a public big data pure play is probably Splunk. We just attended Splunk's conference, .conf2014, the third one in a row we've attended. The knock on Splunk is that they're a little bit expensive and that they're going to have challenges scaling, but the reality is their customers are enthused, they're solving problems, and Splunk is really delivering value for them. Splunk is actually one of the few what I would call big data application companies out there, focused on machine-generated data. Next we've got Tableau, which again is a bit more on the periphery, as Dave mentioned, focused on data visualization. They're really disrupting the traditional business intelligence market; the whole idea behind Tableau is to deliver self-service data visualization to business users, to end users, the non-analysts out there. Their challenge is that they were created before the mobile and cloud revolutions, so they have to adapt to this new world, and they're taking steps to do that. Qlik has a similar value proposition: a business intelligence and data visualization company really focused on self-service, disrupting that traditional market. They have a heavy presence in Europe, and one of their challenges is going to be breaking into the US market.
But of course, what everybody really wants to know is: when are the Hadoop pure plays going to go public? With all due respect to my friends at IBM and Pivotal who may be in the room here or are watching, we really think there are only three Hadoop distributions that matter: Cloudera, MapR and Hortonworks. Cloudera started in 2009; they were the first to market. All three have raised quite a bit of money, but Cloudera in particular has raised nearly a billion dollars, about three quarters of that from Intel, which now holds an 18% stake in the company. So in terms of going to the public market, I think that move, which happened in the spring, probably pushes an IPO off a ways for Cloudera. In terms of revenue, let me just say we don't have any inside information; we do our revenue estimates based on publicly available data and conversations with vendors and their customers, taking into account metrics like headcount. Having said that, as we come up on the end of 2014, our estimate for Cloudera is just north of 100 million in revenue this year. Next, MapR and Hortonworks: I think those two vendors are more likely to go public in the shorter term. We have MapR coming in just north of 60 million in revenue, and we're estimating Hortonworks just south of 100, somewhere between 90 and 95 million in revenue this year. We'll talk a little more about these players in a moment. The NoSQL space is a little different. There aren't two or three vendors that dominate; there are any number of vendors out there. It's a more splintered market, with dozens of NoSQL databases and vendors trying to commercialize them. I've identified three here that are leading the market, at least in terms of revenue. MarkLogic we're estimating at somewhere north of 80 million in revenue this year.
MongoDB is extremely popular with developers through its open source model; for the company itself, we're looking at slightly over 50 million in revenue this year. And finally DataStax, which is commercializing Apache Cassandra, the NoSQL database created at Facebook, is coming in somewhere north of 49 million. So we've talked about the size of the Hadoop and NoSQL market and some of the players out there, and I think there's an interesting dynamic happening in both spaces. I'll use Hadoop to illustrate it, but it's happening in both markets, and that is the question practitioners are asking about Hadoop: to pay or not to pay. One of the interesting dynamics of the new big data movement is that most of the software I mentioned is open source. What you're looking at here are some findings from the big data adoption survey we conducted at Wikibon with Wikibon big data practitioners. It's focused on early adopters, people who are at the very least in the evaluation stage of big data. These empty circles, these zeros, represent the 51% of practitioners who told us they're using roll-your-own Apache Hadoop. They're not paying anything for it; they've downloaded the software from Apache and they're experimenting with it. The tan-colored circles on the far right represent the 24% of practitioners in our survey who said they're using one of the free distributions from one of the vendors, whether it's Cloudera CDH, Hortonworks HDP or MapR's M3. And these dollar signs, which I think the vendors are probably most interested in, represent the 25%, just the 25%, who are actually paying subscription customers of one or more of the distribution vendors.
So if you're one of these vendors, or an investor thinking about investing in these companies as they come to market, you can look at this as glass half full or glass half empty. I'd say it's a quarter full, anyway. It's an interesting dynamic, and one that has to be taken into consideration when you're looking at this market. And then, as I mentioned at the beginning of this presentation, the ecosystem is just exploding, which makes things that much more difficult for investors to squint through. The reality is the big data market is the Wild West right now. This slide represents quite a few companies, and I could have filled up three more slides with the different startups in this space. As Dave and I were traveling here to New York, we were talking about this market and trying to think of a technology company that's not in the big data space in one way or another, and we really couldn't think of one. Pretty much every major technology company has a play here; everybody's trying to get their piece. This does make it difficult to squint through all the noise, and I think it's important to boil it down to two basic but important questions that we consistently get from the Wikibon community in one form or another, and that are of course important for investors. First: will anybody make any money in open source big data? Put another way, will there be a Red Hat of Hadoop? That's another question you hear a lot in the media. If you want to boil it down, the bottom line is: will there ever be a billion-dollar big data software company? The second question we get is: will Oracle, SAP, Teradata, all the industry heavyweights, swallow up a lot of the big data startups we're seeing in the ecosystem?
On question one, my opinion is yes: there will be money to be made in open source big data, and I do think there will be a billion-dollar big data software company. If I had to place my bets right now, I'd say the most likely candidates are either Cloudera or Hortonworks. From Cloudera's perspective, they've got the first-mover advantage. Clearly they've got the juice: they've got the cash injection from Intel and a really deep bench. From Hortonworks' perspective, they have a very disruptive business model. I'm not sure if everybody's familiar with it, but essentially they give away their software for free; they don't charge any license fees, and they simply monetize subscriptions for support services. They've also got a really deep and extensive partner ecosystem, and likewise a pretty deep bench. If you look at these two companies in terms of engineering talent, it's very impressive. Those would be my two most likely candidates for the first billion-dollar company. An outlier or dark horse in this race might be somebody coming out of the big data application space. I think that's really where the value is going to come from, from a practitioner perspective: when we start to see more big data applications built on top of the infrastructure being built out at the Hadoop layer and just above it. So that's a possibility as well. On to the second question: if we go back a slide, there's going to be consolidation in this market for sure; there's no question about that.
There are just too many vendors for this market to support in the long term. This is a pattern we've seen in other markets: it happened in the business intelligence market in the mid-2000s, and shortly thereafter acquisitions and consolidation happened in the MPP data warehouse market. I expect it to happen in the big data market as well. We're already starting to see it, with players like Teradata acquiring Think Big Analytics, Hadapt and some others. So I would expect that to continue, but the question here is: what does that do for the industry heavyweights? In my opinion, some of those acquisitions are going to happen, but I don't think they change the fundamental economic challenge the heavyweights face, which is the open source nature of Hadoop and the big data ecosystem evolving around it. The licensing model is different. It's a fundamental challenge to the heavyweights' business model, and simply acquiring the technology, which they potentially could do, does not change that equation. It's still going to require a business model change on the part of the heavyweights to really adapt to this market. Okay, so you're thinking about where to place your money. We've talked about the big guys in this market; they're under pressure and struggling to adapt. We've got the new startup vendors, but most of them aren't public yet. So the question then is: where do I put my money? Put another way, who are going to be the biggest winners in big data? Our opinion, and our thesis, is that the biggest winners in big data are going to be the practitioners.
As Dave mentioned, a lot of the credit for our thesis goes to our friend Peter Goldmacher, who really opened our eyes to this idea back in 2010-2011. He was talking to us about the ERP market and how it evolved: if you had tried to pick the winner from a vendor perspective in the ERP market in the early 90s, you never would have picked SAP, but if you had focused on which companies were using ERP technology most effectively and put your money on those companies, you would have had quite a good return. So we've been testing this thesis: does it hold up in the big data space? We fundamentally think it does. We believe the winners are going to be companies that leverage the digital fabric, as Dave described, to create new business models. For example, new companies like Uber, like Netflix (notwithstanding their not-so-great results today), like Intuit, are creating completely new lines of business and new markets and really disrupting some of the old-line industries. Those are the companies we think will be the big winners. But so will more traditional enterprises that are making use of this technology to find efficiencies, drive more profits, and open up new lines of business for themselves, sometimes in completely adjacent markets. We're seeing that from 100-year-old-plus companies like Coca-Cola and GE, and from companies like UPS. So we don't believe this is confined to the web giants and startups; it's going to impact companies across industries.
Now, having said all that, it's important to recognize that in terms of the state of big data in the enterprise, we are still in very early days. Again, this is data from our big data adoption survey, and I've laid it out for you here. On the left-hand side, the blue piece of the pie represents the 31% of early adopters who are in production deployments, meaning they're supporting production applications with big data infrastructure and software. 28% have a pilot project or POC underway, and 41% are in the evaluation stage. In the gray box, we asked practitioners what tools and technologies they're using in conjunction with their big data projects. 36% said they're using Hadoop, and 38% said they're using a NoSQL database of one flavor or another. Again, we think this is really important because these are foundational technologies in this stack. Another interesting data point: second from the top, you'll see 51%, which represents the practitioners who said they're using a conventional data warehouse as part of their big data practice. Another question we get a lot from the Wikibon community, in one form or another, is: is my data warehouse a dinosaur? The answer to that question is yes, but like the dinosaurs, it'll be around for another hundred million years, so it's going to play a role. And finally, at 52%, the most widely used tools and technologies are data integration tools. I think that tells you a little about the complexity of big data technology: it still requires a lot of data movement. What's interesting is the shift we're seeing here: the traditional ETL world is moving from ETL to ELT, loading data into new platforms like Hadoop and using the power of those platforms to do the transformation. So it's disrupting the traditional data integration market. We're seeing much more
interest in real-time data integration as well, where you're not necessarily moving huge volumes of data, but you've got to move discrete volumes of data in real time to point applications, and you need to orchestrate that movement effectively. One takeaway from this, of course, is that this is complicated and challenging, so it's not surprising that practitioners are looking for guidance. Another striking finding in our survey was that 72% of practitioners are engaging professional services firms in one form or another to help them architect, deploy or run their big data analytics projects. They're looking for help. They're looking for guidance on how to make this work from a technology perspective, on what the best use cases are to get started, and on how to get a return on their money, and they're looking to outside professional services firms to help them do that. So how do you get a return on your money, and what are practitioners expecting? Well, they're expecting big things. We asked practitioners: for every dollar spent, how much do they expect to get back in return
on their investment in big data technologies? They're expecting big things: three dollars and fifty cents in return on every dollar invested in big data technologies and services. Unfortunately, the reality today is that on average they're only getting a return of fifty-five cents, so that's a long way from where they want to be. Now, it's important to remember that this is just a mean, an average. In fact, there are a handful of companies across industries that are doing this well, really transforming their businesses and earning quite a profit with big data. But on average, big data practitioners are struggling to get a return on their investment. The reason is that there are a number of barriers associated with big data. It's complicated, meaning there are data- and technology-related challenges: challenges around data integration, as we talked about, and data transformation. Another major challenge we hear about is just getting data into a suitable form so you can analyze it; that takes a lot of time and effort. There are people and process issues, and these may even be a bigger barrier: things around mindset, around what we can do with data. People are so used to being constrained by the old model of defining the question in advance, rather than being creative with the questions they could ask. That's changing, and people have to change with it. So there are people and process challenges they have to overcome. But the most striking finding, in my opinion, in terms of barriers was the misalignment between IT and the business. Now, this is as old as the hills, and it's certainly not restricted to big data, but it's certainly rearing its head in this space as well. We asked practitioners to rate the relative success of their big data projects.
So when we asked IT, over sixty percent felt they were getting the full value of their investment in big data technology. Sixty percent sounds pretty good. When we asked the business side of the house the same question, under twenty percent felt they were getting the full value of their investment in big data. Clearly there's a disconnect here, and I think part of the challenge is that they're measuring success in different ways. IT is looking at standing up systems: is the Hadoop cluster up and running? Are the green lights flashing on the disk drives? Okay, great, mission accomplished. The business side is saying, well, we need insights from this data that we can actually use to drive our business. So their criteria for evaluating success are different, and I think that disconnect is in part leading to the fifty-five cents on the dollar return we talked about earlier. I've laid out a lot of doom and gloom here about the state of big data in the enterprise today, but the reality is we do believe this is going to be transformational. The reason, among other things, is that practitioners really do believe this is critical to their business. In the yellow box, we asked practitioners which of the following best describes their attitude towards big data analytics. This tiny slice of the pie represents the one percent of people who said big data analytics is a buzzword of unclear meaning. Interestingly, we asked the same question about the term cloud
back in two thousand eleven, I believe it was, and ninety-five percent of the practitioners felt cloud was a buzzword of unclear meaning. I'd venture to guess that if you asked it today, you'd still get around fifty percent. So the fact that we're not that far along in big data and it's just one percent who feel that way says something, I think. On the other side of the equation, forty-six percent of practitioners felt that big data analytics is the new source of competitive advantage for their enterprise. On the other side of the page, you'll see this blue box. It represents the question we asked: what are the major drivers for your big data projects? Is it to save money, to make money, neither, or both? The majority, fifty-five percent, see this as what I would consider a strategic investment: they are looking to both save money and make money with big data. They're not looking at this as a tactical tool to solve a particular, discrete problem; they're looking at it as a transformational technology that's strategic to their business. And again, just to reiterate, we really do believe practitioners are going to create exponentially more value in big data than vendors. The other reason we believe this is that it's already happening. In this box you'll see some of the industries that were part of our survey:
IT and technology providers, but also healthcare, manufacturing, banking and finance, retail and wholesale, and we've got a big slice of the pie there in "other": agriculture, pharmaceuticals. So this really does span industries. And as I said, we're already seeing this happen in new companies that are creating new business models, like Uber, Fitbit, and Netflix, but it's also happening in some of the larger, more traditional enterprises, if you will: Coca-Cola, GE, JPMorgan Chase. Banks and financial services firms are somewhat ahead of the game in terms of adoption. So this is happening at the web giants, at new startups, and at more traditional enterprises. But it's important to remember that, as exciting as this is, big data is challenging, and just because a company adopts big data technology does not mean it is going to be one of the winners. From an investment perspective, you've got to be very selective about the companies you invest in, and it's going to come down to which practitioners use this technology most effectively. One way to look at this is: what's your data IQ?
Here are some questions, a kind of framework, that we think will help investors ask the right questions and evaluate companies that are starting down this big data journey. Again, it's not necessarily specific to, quote unquote, big data. It starts with: how well are you leveraging the cloud? Companies that are adopting the cloud and leaving the undifferentiated heavy lifting of IT to others are the ones that I think are starting down the right path. How well are you personalizing the user experience? Consumers expect a certain level of personalization, and, I might add, how well are you anticipating their needs? That's also a big data analytics challenge. How secure, private, and trusted are your transactions? Ultimately a transaction is what you're trying to get to, and we've seen the concerns around privacy, data breaches, and those kinds of things; there's no quicker way to sink your prospects than to suffer a major breach. So this is an important part of the equation. How effective are you at fostering information sharing? This is the collaborative economy. It's not a one-way street with the vendor talking to the consumer; it's the consumer talking back to the vendor and consumers talking to each other, and if you can foster that information sharing, that's going to help you move on to making those transactions happen. Another important big data question: do you have the data science skills to harness collective intelligence? Organizations are gathering data from multiple sources, from inside and outside the enterprise. Do you have the skills to make sense of all that information and find actionable insights that you can use to actually drive your business? And then of course, how rapidly can you react to those insights, to evolving consumer tastes and new value opportunities as you identify them through data science? You can't just
identify them; you've actually got to be able to put in place the infrastructure and the processes to take advantage of them. So this is our framework for evaluating big data practitioners and for seeing where we think some of the biggest winners are going to be. So just to wrap up: we talked about the market, the pressure we're seeing on the major traditional vendors, and the big data ecosystem, which is exploding with a lot of innovation. But the biggest winners are going to be practitioners, so focus on the practitioners that are doing these things well and are laying the foundation to do them at scale. We think that's the best place to put your money in the big data market. Thank you very much. I'm happy to take any questions if anybody has any; don't be shy. Well, I think one of the challenges... so the question was how organizations are tackling the challenge of getting that single view of the truth, of transforming data into a form that gives you a single view of the customer, or a single view of whatever entity it might be. Well, I think what's happening is that, on the one hand, big data can help: the larger the data volume, the more it can iron out some of the discrepancies in data quality. But it's still a really important problem, and I think we're still trying to figure out how to crack that nut. There are vendors out there building tools and technology around data transformation and trying to do it in a way that allows business users, those with the domain knowledge, to take action on it. But it's an area where we still need to see more development. Data quality is a challenge that's been with us in the old world and it's going to be with us in the new world, but it's really about connecting the right dots. It's an analytic challenge is what it is, so it's not just thinking about it in terms of data quality; it's
thinking about it in terms of analytics. It is certainly true that with the emerging big data technology for doing SQL, based on a lot of open source technology, the economics are very different from the traditional world; the numbers are as much as ten times different in terms of pricing for an EDW versus a Hadoop installation with similar amounts of data. So that's one reason: I think the pie just isn't going to be as big in that space. For a mega vendor to try to replicate the revenue it gets from its database business with Hadoop would be a challenge. So I think that's partly the reason, but ultimately I think it's because it's the right dynamic. I mean, it's good news for vendors, I think, that practitioners are going to create more value, because that's the right market dynamic. If it were the other way around, if the vendors were creating a lot more value than the practitioners, that would eventually collapse under its own weight. So ultimately we just think that when you're putting this technology to work, the ceiling is much higher in terms of the amount of value you can create. Well, yeah, so the question was to expand on some of the people and process challenges we're seeing practitioners run up against. There are a few different areas I'd touch on there. One is around this whole idea of what some call schema on read. The idea in the old world was that you had to model your warehouse ahead of time; you basically had to know the questions you wanted answered before setting up the system, so you had to model your data warehouse in such a way that it could answer those predetermined questions. What happened with that is that practitioners, business users,
were kind of trained not to ask too many questions. It was: let's set this up in advance, and if you have to ask a new question, it's going to take three or six months to bring in that new data source and model that data so you can actually get an answer. And I think practitioners were conditioned not to ask more questions, because the answer they were often getting from IT was, well, we can't do that, or it's going to take too long. Now, with some of these newer systems, you can ask any question. The idea is not to model the data ahead of time but to ask the question and model the data as it's read back. So that's a change in mindset: the idea that you can actually ask these questions and get these answers. In terms of scaling, I think one of the challenges with traditional BI has always been how you get more people to use it. Roughly twenty percent of people in any given organization use business intelligence, and it never really got much higher than that in the average organization. Part of the reason, in my opinion, is that it required a separate user environment: if you wanted some insight into your data, you had to go to a separate BI tool. What I think needs to happen is more integration of analytics and BI into the operational applications that people use every day. I think that's one way you scale analytics and make it truly ubiquitous in the organization. Well, I think... sorry, the question was what's different between the old world and the new world in terms of the technology. I think a couple of things are different; the three things that are most
important to recognize are that the software itself is open source. Now, open source doesn't mean free; as I've heard it said, open source is free like a new puppy is free. There are a lot of other challenges that go along with that, for sure, but that's one dynamic that's different. The hardware component is also very different. The traditional data warehouse world is really built around the appliance model, the Exadata model, the Teradata appliance model, where you're bundling really expensive proprietary hardware with the software. In the new model you're using commodity hardware, scaled out and not tied directly to the software. That dramatically changes the economics, and it also allows you to scale at a rate you couldn't before, either for economic reasons or for performance-related reasons. So I think those are two areas where there's definitely a difference. There were people doing big data before this, in the high-performance computing space and elsewhere, but it was very expensive; it was supercomputers. I think what's different about big data is that it's making it economically feasible for most enterprises to tackle this problem. So the question was whether the concept of the data lake has run its course; the premise of the question was that it was popular as a concept a couple of years ago and that things may be moving away from it. I don't know if I necessarily agree with that. I would say that the idea of having a single physical data lake is, I think, difficult to achieve in reality, especially in the large enterprise. You're seeing large enterprises with deployments across data centers, for example, and in some cases you've even got multiple distributions in use. The interesting thing would be if you could link those to create more of a virtual data lake
that allows data scientists and analysts to access any data within the organization without it needing to be in a single physical cluster. I think a more virtualized data lake, with a kind of federated view across it, is the model I see going forward. That's a great question. So the question was what percentage of signal is being generated from all the noise. That's a tough one to answer, and I think it certainly varies by industry, so I can't give you a number on that. What I would say is that the more successful big data deployments we've seen are the ones that start with a business problem, not the more IT-led ones where, from an IT perspective, they say, well, we're going to create this data lake, load data into it, and then maybe we'll figure out some signal from all that noise. That is a much more challenging problem than if you come at it with a business problem and a hypothesis about which data sources are going to be the most valuable and hold the most predictive value. If you take that approach, you're going to frame the question much better and get a much higher ratio of signal to noise than if you take a boil-the-ocean approach: we're just going to throw all this data in here and find a business problem for it later. It would be very interesting to do some research on that. I suspect that's one way to look at the problem, and I would imagine it also varies by vertical industry and the particular problem you're trying to solve. So with that, thank you very much. I appreciate your time and attention, and I'll welcome Dave Vellante back to the podium. Thank you, Jeff, appreciate it. So there's plenty of time to ask questions of the panelists. Let me now call up the panelists. Jeff, would you mind helping me with the chairs
here? We're on double duty. Let me introduce the panel members; we just need four. Peter Goldmacher, we've been abusing his name. Peter, thanks very much, good to see you. Former Cowen analyst and sort of expert in the space. Amy O'Connor; Amy's with Cloudera, so we're going to recuse you, Amy, on a lot of the questions we're going to be asking and riffing about, but feel free to jump in. Thanks very much for coming today, Amy, former team lead of the big data team at Nokia. And Abhi Mehta. Abhi, there he is, a friend and colleague, CEO of Tresata, a really interesting company doing some cool stuff in the big data space, former practitioner at B of A, and a well-known speaker. So thanks very much for joining us, and of course Jeff Kelly as well. Actually, some of the questions you guys asked are better than the ones I had, so I'm going to start there, and when we turn that... Oh yeah, sorry: when's the IPO? Okay, so again, thank you very much for coming on. I actually love that question about what's really different. I'm going to start with the practitioners. Amy? Okay, what's really different about this whole... I love this question, because this is how I start all my talks with customers. So I am with Cloudera; I've been with Cloudera for a year. My title is evangelist; it's fairly religious. I travel the world and talk to customers at all sorts of different stages of adoption about what they need to do to adapt, how they need to change their orgs, their culture, and all that. When I talk about what's different, I basically say there are seven things that are different between this big data world and the traditional BI and analytics world, and Jeff, you really hit on them; you hit on the first three that I was going to say are pretty important. Number one, economics: the economics of this are just radically different, so you can store a lot more data. Number two, compute: in the old world, if you put a whole bunch of data on a big disk, physically you couldn't spin the disk and get it into
the CPUs to compute on it fast enough. So it's a distributed file system, but the compute is distributed too, so now you can compute on the data, and we have all sorts of customers doing things in a matter of minutes that used to take them weeks. The third thing is linear scale, and this is something that's going to impact a lot of IT organizations: you can build this thing up by just popping in more of these little commodity servers. That changes the procurement model drastically; it really means you don't have to buy out all of this capacity way ahead of time. I always say those three things are the bread and butter, and once you get past the bread and butter you get to the differentiators, and there are only four, so I'll hit them quickly. Number one, which you also mentioned, is this concept of no schema on write. When you bring the data into the system, you don't change it from however it was created. Whether it came from video, from sensors, or from log files, you don't change it; you don't do any work on it, you just put it in the system. That actually creates an agility that lets people transform it over and over again over time. It's really, really powerful. And that basically means you can start to do things like real-time views as well as historical views. We see customers using this for fraud analysis, for geolocation analysis; there's a lot happening in that space. And also, because you're bringing all this data together... someone asked whether it's still one lake, and we do often see the concept of one lake or one hub, maybe with some other ancillary ones depending on where the data is created, because a lot of people want to share their data across these silos. So they're breaking down the silos in their business and sharing data, which leads to the last basic premise: true innovation, the ability to get at the data very quickly, because you don't have to ask IT to
go find it; it's just there. You transform it and then find some innovation, or fail fast and move on. So Abhi, you built one of the first data factories in the financial services business. Could you have done it with existing technologies? And is there anything you'd add to Amy's list? Turn me off and I'll just yell. So I use three words to make Amy's seven points: faster, better, cheaper. We've all been in technology long enough, and we've all said that it's impossible to achieve faster, better, and cheaper in technology: if you want two of those, you have to lose one, right? If you want better and cheaper, you can't be faster; if you want faster and cheaper, you can't be better. And we finally have all the tools in place, thanks to Moore's law and no one else, right, that allow you to get faster, better, and cheaper in technology. That's my three-word answer to what's different. But I think the biggest thing that's different has nothing to do with technology: fundamentally, customer behavior is different. And as industries... to the point you made that practitioners would make the most money, it's very interesting how we pick the customers we work with, which is the segment... The term we coined, which I think I first shared with you, is "sampling is dead." What it means technically is that you can actually treat your customers as segments of one. We joke about this with our banking buddies, as a software company specialized in financial data: if Google can personalize search for two billion people, the least we can do is personalize banking for a hundred million, or retail for three million. That is what's fundamentally different. If you have to treat every single customer as their own segment, and build products and services as a business that caters to Dave, Amy, Jeff, Peter, and Abhi, how do you do that
with an economic model that can scale? Luckily for us, in this Darwinian moment we are living in, a business can finally enable an understanding of customer behavior and treat its customers by giving every single customer exactly what they want, at their price. I think that's what's fundamentally different. Everything else we are seeing is the evolution of technology that enables a business to deliver on that promise. I think that will fundamentally drive companies like ours, delivering customer intelligence, or Amy's, or Peter's new gig, to figure out how customers can be served with products and services that no one except the smartest practitioner can build. That is what's fundamentally different. Peter, the premise that we first heard from you, that practitioners will create more value than the suppliers, is an interesting thought. You made a lot of noise about the existing old guard, particularly Oracle and SAP. I don't know what you said about Teradata, but I'm just going to throw them in there. Nothing good. So you were very outspoken about that, and as you joked the other day when we were prepping for this panel, you said, yeah, and now I'm out of a job. Have you changed your tune on that? Do you still feel that way? I mean, Oracle threw off $15 billion in free cash flow in the last four quarters. Now, the stock really wasn't rewarded for it, maybe to your point, but what if they just gobble up some of these big data guys? So have you rethought that at all, or have you dug in your heels and feel even more conviction? I wonder if you could talk about that. Yeah, so I was a sell-side analyst with Cowen for ten years, left for a bunch of reasons, went back into industry, worked at Mongo for two months, and then realized I didn't really want to have a boss, which in my case meant I didn't really want to have a job. So I'm not working now, and I'm thinking deep thoughts, but what I have learned... how close are we to the fight?
spend more time with my family. What I have learned, or what I have come to believe, is that this is definitely an enormous opportunity for companies. Nobody buys a product saying, I'm going to spend $10 on this piece of technology because I only want to generate $3 in return. If I'm spending $10, I need $100 or $1,000. So there's no doubt that if Cloudera sells a billion dollars' worth of product, it's generating $100 billion worth of value for whoever its customers are. So what Jeff was talking about, and what you were alluding to earlier, is that we ran a study of what happened to all the customers of the ERP vendors in the '90s. What happened was that ERP gave the customers the ability to automate, gave them scale, and let them get much bigger. That automation process created an entirely new kind of company, where R&D spend as a percentage of revenue declined by a third, revenue growth was up 8x, and market cap grew by 5x. And what I think you'll see now... we've already got a handful of multi-multi-billion-dollar big data companies: Netflix, Facebook, LinkedIn, Google, and Adobe is arguably now a big data application vendor. So it's happening. We just thought Google was search, we just thought Facebook was what people used to show pictures of their cats and stuff, and LinkedIn is Facebook for adults, but these companies fundamentally could not have existed without this technology, and that's where we are. Let's talk about the ROI. I saw the numbers: practitioners expected $3.50, was it? They're not talking to CEOs if that's the return they're expecting. Amy, what are you seeing in terms of desired ROI and actual ROI? Jeff was saying the mean is $0.55, which is not too good, $1 spent to get $0.55 back, but talk to some of the folks Peter just mentioned and you hear a lot bigger numbers. What are you seeing?
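The ROI figures traded here are easy to misread, so here is a minimal Python sketch of the arithmetic. It assumes, as the phrasing "$1 spent to get $0.55 back" suggests, that both the $3.50 expected figure and the $0.55 actual figure are gross value returned per dollar invested rather than net profit; the constant and function names are illustrative only and do not come from the survey.

```python
# Minimal sketch of the ROI arithmetic discussed in the talk.
# Assumption: both figures are gross value returned per $1 invested,
# not net profit. Names are illustrative, not from any survey tooling.
EXPECTED_PER_DOLLAR = 3.50  # what practitioners said they expect back
ACTUAL_PER_DOLLAR = 0.55    # survey mean reported in the talk


def net_per_dollar(gross_per_dollar: float) -> float:
    """Net gain (positive) or loss (negative) on each $1 invested."""
    return gross_per_dollar - 1.0


def shortfall(expected: float, actual: float) -> float:
    """How far realized value per $1 falls short of expectations."""
    return expected - actual


if __name__ == "__main__":
    print(f"expected net per $1: {net_per_dollar(EXPECTED_PER_DOLLAR):+.2f}")
    print(f"actual net per $1:   {net_per_dollar(ACTUAL_PER_DOLLAR):+.2f}")
    print(f"shortfall per $1:    {shortfall(EXPECTED_PER_DOLLAR, ACTUAL_PER_DOLLAR):.2f}")
```

Under that reading the sketch prints an expected net of +2.50 against an actual net of -0.45 per dollar, a shortfall of 2.95. If the survey figures were instead net returns, `net_per_dollar` would not apply and only the shortfall comparison would carry over.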
Well, yeah. So those numbers were great, and I think what we see a lot are numbers a little bit lower than that, and that's because of the other chart you had that showed how early the adoption is. In most enterprise customers, and I mean the enterprise company, not the Google and Facebook, Web 2.0 kind of company, what we're seeing is that the simplest way for them to adopt this is to pick an operational-efficiency use case somewhere. Maybe it's compliance. FINRA, for example, is required to store 15 years of data, and 80% of their IT cost was being spent on storage. So we're seeing a lot of folks who either couldn't store or couldn't process their data in traditional environments move to Hadoop, and those ROI numbers are really easy for them to measure because they're hard and fast: I've reduced my storage costs, or I've reduced my processing time and can therefore put that time toward something else. So a lot of people start with operational efficiency and at the same time start looking at the things that are going to hit the top line and really cause a basic transformation in the business. We see those two things happening in parallel: a little IT-driven on one side, creating that environment, getting some of those lower numbers but enough return to actually justify the cost of putting this infrastructure in place; and then the revenue-generation and innovation use cases, which take a little bit longer. As folks start to get into those, that's where you're going to see faster and larger ROI numbers. At Nokia, to the extent that you can talk about it, what about that kind of larger return? And I'm going to ask the same of you about your previous practitioner world. So, some of the things that we did at Nokia when I was first there... I don't know if we went into that background, but I built and ran the big data team at Nokia for three years before I joined Cloudera. So some of the things that we did
there were basically bringing in search files that we weren't using at all and looking at some of the app store data, so some of that was basically operational efficiency. And then, because Nokia owns Navteq, the mapping property that's in most of the cars and the Garmins, we eventually got into really building out maps on an automated basis from GPS probes off the phones. It took two years to get to the point where we could automatically refresh maps on a very quick basis, and that made a major change in the business, not only on the cost side (otherwise somebody has to drive a road or go there, but if you can do it from a GPS probe, you have an automated process) but also because you can get those maps to market more quickly, so you can actually raise your revenue. And then there are incremental market offerings after that, like traffic patterns. Traffic patterns are based on historical traffic models, and then you layer incidents on top. We all know and suffer with traffic every day of our lives, and there are a lot of people out in the world working on traffic on top of some of the basic things. What are you seeing now, and what have you seen in your past, in terms of return on dollar invested? Nokia and Bank of America are examples of innovation with data, and they both basically... one got sold and one was technically bankrupt. Nokia is a new company now, and the devices business is owned by Microsoft, so you've got to keep track of where things go. Reinvention is a good thing. Google Maps changed how mapping is done; there's a reason why people don't buy GPSes anymore. But that's beside the point. I think there's an interesting way to think about ROI. I think the numbers are very telling, and that's a good thing. I think the reason the numbers are very low right now, looking at the market, is that it's not a return on investment, it's a reduction of investment, and that's why the numbers are low. So when there's an operational use
case with big data or Hadoop, it will reduce your existing investment, and the numbers will be lower. We have a very interesting customer, a big bank. We can't name the name, not because I wouldn't love to (I'm an open guy), but I can't; I would be sued if I did, which is not a good thing. They bought our software for payment analytics; they are one of the largest treasury providers in the world, and they returned their investment in our software in 30 days. On the higher side, that's a new business process: they actually used data that was differentiated in the market to build a product no other bank can build. So the slight change I would like to offer to your practitioner numbers is this: I think practitioners will win, but it's not going to be the practitioners who adopt big data; it's going to be the practitioners who have unique data. If you don't have unique data, no matter how good the tool set is... So when you're looking at the players who will be winners in big data, if they have a data asset that isn't unique, no matter how good my software is, or Amy's software, or Peter's next software company when he goes to one, it doesn't matter, because the data asset itself is a commodity. The thing you have to remember is that data is the new infrastructure. Data is not the new competitive advantage; what you get out of the data is the competitive advantage. The reason Google has been so successful is that its online data is unique, and Facebook because its social data is unique. I remind people: Google knows what you search for, Facebook knows what you like, Amazon knows what you buy. So what is the most interesting piece of data? Banks know exactly how you transact; Nokia GPS knows where you travel. And Google knows a lot more than just what you search for. So I think you really have to start thinking about return on investment in terms of properties, practitioners that have unique data assets, and that is a hard thing to go eke out. I think if you can find those properties, they will do
incredibly well, because data, again, as I mentioned, is infrastructure; what you get out of it is the competitive advantage. You talked about this notion of what some people are calling the digital fabric, and the basic concept is that you've got industries vertically, and horizontally you've got infrastructure as a service, social applications, etc., and one of those pipes is data. You're putting forth that data is a way to traverse potentially different opportunities, and you actually see Amazon now competing with Google and Apple in media, and in healthcare, actually traversing, taking advantage of that digital fabric. Question? It's a great question. First of all, improving productivity and increasing revenue by 1% is really what they're looking for. I'll take a crack at it. I'm not a GE business model expert, but this is where big data becomes interesting. We were talking to an interesting company this morning, Syncsort (John is in the audience). They're in 4,000 companies, and they are a mainframe company. The interesting part about that, outside of the fact that he's a great guy, is that there are 4,000 companies in the world that still use mainframes. But the point I'm going to make with this is that companies like GE have a lot of legacy assets, and I believe personally they have a lot of unique data assets. They probably have the only, or the single largest, source of aircraft usage information, through their engines. They probably had it with the GE appliance business too, which they recently sold. I try to keep up with some of the sales and M&A transactions, not all of them, but they sold the appliance business, and they have 30% of the world's appliance data on how you're using your washers and dryers and microwaves. Take a base, be it a bank with $100 billion in revenue, and then find the operational improvement or revenue improvement: these are big numbers. 1% of $100 billion is $1 billion, and you can use unique data assets to hopefully get there,
whether it's cost-saves or revenue. I'm assuming the GE one was more cost-save driven; GE's expert at cutting costs out. But we don't see it yet. This is us saying, as an intelligence software company, that we see very few customers talking in terms of return on investment; we see a lot of customers talking in terms of reduction on investment. But I think it's going to change. I think this is a tough item to grasp. The bank we worked with was interesting because they realized that their payments business was unique. They have 50% market share, and no one else has that, and they had visibility into the world economy like no one else does. I would buy that stock. If I am an investor, I want to buy that stock, because they are taking a dynamic approach to changing the business model, versus saying I'll save $50 million in cost, which is interesting, but it's not a market we play in. It's really important, because if GE, for example, is able to reduce processing time somewhere, make more efficiencies, reduce those costs, that means that they can reinvest in other areas. We see both sides of the coin, and I've worked with, just in one year, over 200 customers that are at various stages of this big data adoption. We see both sides of the equation in every account. Think about somebody who spent this much last year and now is only spending this much this year; if you can then take those savings and pump them back into new investments in big data and analytics, that's one way to get the flywheel moving. FINRA, for example, is going to save $10 to $20 million a year just by moving their storage into the Hadoop environment, and now they're doing better fraud analysis. We're going to be doing a similar reduction-on-investment emphasis today. You know, when I talk to these folks, we're just talking about the big data side; we don't dig into the rest of their budget. I want to just talk about what's the new and exciting stuff we're going to do, so we'll let you guys come and do the rest of it. Yeah, no, I was just, you know, it
certainly makes sense; that's where you're going to start. But I think it would be a mistake to look at Hadoop as just a low-cost storage environment, because then you're missing a huge part of the value proposition. That's kind of where, you might call it a lake: is it just going to sit there in a lake? That's not what you want. You want to bring the data together so that you're going to do something with it. To your point, I would say, based on the conversations we've had over the last, I mean, I've seen it changing in the last six months. I think it started at Hadoop Summit, and a lot of the conversations we had on theCUBE here today: what are we talking about today on theCUBE? Machine learning, analytics. We're moving up the stack, slowly, to actually making use of all this data that people are collecting in the so-called data lake, and that's an important step. But really, a lot of it does start with those efficiency plays. Peter, we were talking about your scenario, practitioners and value and all that stuff, and that was sort of the first part of the question. The other part was sort of what happens to the existing guys, the guys that you, you know, didn't have many good things to say about. You know, we're kind of always looking, when these waves come along, for new players, new billion-dollar companies to emerge, and the old guard getting crushed. Although a lot of these companies did the crushing back in the client-server days, with so much cash and the ability to sort of... Like, we were at Oracle OpenWorld a couple of weeks ago, and everything was cloud. Remember 8i and 9i, that was internet, and then 10g and 11g, that was grid, and then grid turned into cloud, so it was 12c. But basically, you know, Oracle, Larry Ellison, they act like they invented it, right? They wait and wait and wait, buy a company, spend money. Is that not sustainable? What's your outlook for this large enterprise software world? So I think that something very bad is happening in the startup world, and it's the
Intel investment in Cloudera that really crystallized it for me. You have Intel saying, here's an arguably 100 million dollar company that we think is worth about 4 billion dollars. So Tom and Mike say, fuck, 4 billion dollars, we need to sell a ton of stuff. So they have a big problem, right? They have all this money, and they're not going to take that money and make a lot of really compelling technology buys or do something great with the product. They'll do a little stuff around the edges. They've got an enormous go-to-market effort. So what they're saying is, we're going to build our differentiation through distribution. So if we generate 100 million dollars in revenue, did that 100 million dollars cost 150 million dollars to generate, or 200, 250 million dollars? So revenue growth is only a component of the story; it's not really the most interesting part. The most interesting part is, when are you going to make money? So we see Mongo raising 250 million dollars, now DataStax has raised 250; I mean, the amount of money behind these companies is enormous. Well, I mean, look, you've got salesforce.com that does 5 billion dollars in sales, doesn't make money, and has a 35 billion dollar market cap. So nobody gives a shit about making money anymore; they just want to get big. So, we're bigger than Hortonworks because we've got 100 field sales reps and they have 75 field sales reps, so we're doing better than they are. Cisco has 35,000 field reps, Oracle has 35,000 field reps. If and when Oracle comes in and buys one of these distros and gives it the Oracle glow, sure, you hate us, we don't care, because you pay us every year more than you paid us last year, which is the case for 95% of their customers. If you're not the company that got bought, all of a sudden your differentiation through distribution is gone, because I don't care if you have 100 reps and Hortonworks has 75; if Oracle buys Hortonworks, now they've got 25,000, and they've got their place in a broad data architecture that already exists. So no, I
don't think the legacy guys are dead. I think the legacy guys are doing what they do, which is, let's see how this sorts out a little bit, and then we'll make our move. And I think it's going to be a very bad outcome for most of these investments in these technologies, until, maybe they screw up, maybe they don't. You want to weigh in? I'd like to weigh in. It was a nice amount of money. It didn't line my pockets personally, but it was a nice amount of money. One of the first things that we did with it, jewelry is beautiful, so one of the first things that we did with it was to acquire a company called Gazang, because a piece of our portfolio that was missing was encryption. So we acquired Gazang, and we're integrating the encryption, which is vitally important, particularly in the financial services arena and retail, healthcare, anybody that has personally identifiable information, into our product. But the other interesting thing that's happening is that Intel stopped doing their own distribution and now redistributes Cloudera, and we kind of took the intellectual property from that and are putting it back into Apache. And then I think the third thing is that Intel, in the late 90s, kind of said Linux is coming, and they made a major investment in Red Hat to change their chips, because they really wanted to have the footprint in the data center to run Linux. Then they saw the major shift to virtualization, and they made a major investment in VMware. So this is what Intel does; this is how Intel takes a step ahead, given the fact that they have that strong footprint in the data center. So has Intel, Jeff, anointed Cloudera as the Red Hat of Hadoop?
I don't know if I'd go that far. I mean, it's an interesting play from Intel. I think at the time the numbers were staggering, for sure. I think it does fit to some extent with their Internet of Things play; it makes some sense there. But I don't think that means, no, I don't think Cloudera is necessarily the Red Hat of Hadoop. Well, there are some differences between the environment when Red Hat was emerging and what's happening now. Red Hat didn't have a lot of competition, whereas clearly there's a lot of competition in this market now. So I don't know if that's the right way to frame the question. Do I have an opinion on that? No, I think it's... So I'll take it, I'll answer you. I'll talk about the Intel investment from the position of a smaller part of the ecosystem. We sit higher up in the stack; all of our software runs in Hadoop, and we deliver customer intelligence. I think, as a player in the stack and in the already emergent big data world, the Intel announcement was phenomenal, because I think it has emboldened one player in the ecosystem to go hard. You've heard me say since day one, since five years ago, that Hadoop is bigger than Linux or any other open source wave combined, many times over, because it goes after the fundamental building block for enterprises in what I called the second industrial revolution seven years ago, which is data. Data will fundamentally transform business models. If that is true, it is awesome for companies like ourselves to have a player like Cloudera go out and say, finally, this is not complementary: the enterprise data hub will take away business from the data warehouses the large players sell. That conversation needs to happen. Unfortunately, we were tired of sitting there as a player saying this is not complementary, it is truly disruptive. I think disruption is the wrong word; this is Darwinian. We are rebuilding the technology stack. IT is a three trillion dollar market. It's the only place I disagree with your analysis; maybe we'll do another one, Jeff, because
you're so good at analysis. IT is a three trillion dollar market, and 80% is enterprise software. 2.4 trillion dollars is up for grabs, every single dollar of it in the old stack, and there will be multiple billion-dollar companies built off of it. I agree with Peter; I don't think it's time to put the nail in the coffin of the big players. I joke and say you should pick a big tech player who has a good M&A team. I don't think there will be a Red Hat; I don't think there will be a Red Hat for Hadoop. I think Hadoop provides such an interesting building block to architect what we call customer intelligence management software. I actually do believe, and agree with Peter, that you will see some interesting acquisition and M&A activity, because, and I hope my friends in the analyst community agree with me, I know people in the stock market have a different intelligence level and they'll buy a lot of things, but Tableau has a 4.5 billion dollar market cap on, I think, 80 million dollars of quarterly revenues, and Cloudera had an investment at 4.2 billion with nowhere close to Tableau's revenues. Whether it's frothy or not is not my opinion to give, but I hope investors are smart enough to realize that market caps deserve some fundamentals at the back, and I don't see it. I think it's very difficult to build a multi-billion dollar open source software company. One's been built, and I think we won't see many more. Peter, when I said the rich get richer, you started to say, well, wait a minute. You disagree with that? So the enterprise software world, they continue to bump along, more consolidation? We've preselected ourselves to be in a Hadoop world and kind of get involved in this echo chamber, with all these guys saying how great everything is and all the opportunity, and I believe that's all true. I think time horizons and rates of adoption are being massively compressed. I think the reality is it's gonna happen much more slowly than people think. So you have, I live in
Silicon Valley, and that place thrives on innovation, right? People are always experimenting. But then I come to New York far too often, and there's some innovation here, not to the extent of the Valley, but there are all these other states I fly over that don't give a shit about Hadoop or any of this stuff, right? I mean, there are exceptions, there's John Deere, and there are guys along the way, but when you talk to these customers and you ask them what they're up to, they're like, you know, we're fine, we'll wait. And you have waves of adoption, and you're not gonna really get into the bell curve of adoption for a long time. You're not gonna get into that bell curve of adoption until the technology is really vouched for, vetted, and people don't perceive the risk. Anybody that's buying anything from Cloudera is also having to talk to 15 other vendors: who's your production database, and what are we doing for analytics, and what are we doing for all these other accompanying technologies? And when whomever it is says, we have enough pieces of the architecture that you don't need to talk to 7 other vendors, we have all of that, then we really start to see things go mainstream. So it's a lot of fun to be in the echo chamber, and we all feel great about it, but your number of 500 million in revenue for 2012 in big data software probably came at the expense of about a billion dollars in spend, right? So we're selling dimes for nickels. Hopefully that's a good business. Until we see these companies get profitable, and this isn't just big data, this is all these cloud guys, this is Tableau, and until we see the market really start pulling stuff into the market, which I associate with profitability, we're not there yet. And again, if all these startups are spending all their money on distribution and cramming stuff into the market, we're accelerating the eventual trough of disillusionment, which is going to happen. Then the assets are available for a lot
cheaper than they were when Intel thought Cloudera was worth 4 billion. Maybe it is, I don't know; I don't think so. And then we can sort of get on with it. I think I would like to add something. I agree, and this is very rare: I've been on a panel where I have not used the F word or the S word, and you have, which is very good. The S word, which is awesome, I appreciate it. I think customers don't give a shit if you have Hadoop in the back end. I can say it now, he's used it already; the thing has been said. You can't, you're about to go public; I don't have to go public for a while. But they don't care if you have Hadoop. If you actually have a piece of software that can make you money, I can promise you the business guy will not ask you what's in the back end. And I think there is a sense of maturity when the entire ecosystem reaches that level, when people stop asking you, so, do you do it on Hadoop? And we're not there yet. But look, I do believe this is a massive market. I think Cloudera is worth multi-billions; whether they IPO or stay private doesn't matter. The telling thing for me is more about the ecosystem, and it's telling because you're part of it. You go to a partner summit, which can go unnamed, and you have two customer success stories, and you ask the customer the question, which is, what else do you use outside the distro? And this is a room full of partners, and they say nothing. And that's a problem, because unless that partner ecosystem matures, all these numbers are moot, because yes, you'll have three distribution companies with $100 million revenues, but what about everybody else? That's when it becomes interesting: when the ecosystem is developing, a healthy ecosystem, and we are partnering together, and you're launching multiple hundred-million-dollar companies, not just in the distribution tier. That's when the picture looks different. And I agree with Peter, I think it will take time. I think there are trailblazers like us there, but it takes time. You know, there is no partnering in the Hadoop
ecosystem, I can assure you of that. There are big players working with the new players saying, we connect to it, so we are Hadoop ready. It's become like the new dot-com; it's the dot big data. So you put "big data ready" on everything, but the smaller players get no support from the people with the billions. We are building a company the fun way, right? We are profitable, we are paying taxes this year, we have raised zero institutional money, and that allows us to be disruptive. But I think unless this ecosystem learns how to partner better... To your original question, how do you get there? If this is truly, if you all buy that this is Darwinian, whatever word you want to use, and truly multi-billion, we have to prove to each other, all of us as an ecosystem, that we can ride this together and do that. I want to bring the conversation back to the practitioners. And I do agree with you, Peter, I see waves of adoption of this big data. It clearly started in Silicon Valley, moved to Washington DC and New York, and then I have to tell you, maybe you get to fly over the middle of the country, but I've spent a lot of time in the middle of the country with, you know, companies that make the food that we eat, that grow the food, that make the equipment that moves the oil and gas, healthcare, all over the place. A lot of the conversations are just starting, but they're really interesting conversations. And I also agree with you on differentiated data; it's a conversation I have with every single customer. So there may be things that you can do that are low-hanging fruit, that just kind of fix things in your business, but always try to figure out what data you have, or what data you can create, that's different from everyone else. Every single business that I've talked to, once you get them really cranking, can think of some data that they have or can create. Whether they're in retail, where we talk about creating a gamified experience across the brick and mortar, the app on the phone, and, you
know, the presence on the internet, and really starting to think about things more like being a casino. With healthcare, we talk about the data they've created, the videos between the patients, that have some really vital information. Device manufacturing companies go to trial to try to get the device out, and the FDA at the end says you have a number of successful trials, but along the way, all of a sudden, they realize we can actually look at what happens in a trial and come up with a better care path for how a doctor should be using that new piece of medical equipment. So it is really early, I totally agree with all of you that it's really early, but the conversations are exciting. The highest value here is that the people who are really thinking about data in new ways are going to transform our lives. We all want to transform what happens to us in healthcare; we all want traffic to be fixed somehow, so that we know where to go when we leave work at night. We all want all of those things fixed, and they can all be fixed by data. But yeah, Peter, it's going to take a while. Absolutely. So, a lot of brain power, and that's what Tamara is, just hanging out with you guys. Questions from the audience? Privacy and security is a huge issue. There's a delicate balance between convenience in our lives and the use of the data that we produce in order to create that convenience. I think it's going to change legislatively multiple times over the next decade, so we're all going to have to kind of keep in front of it. I firmly believe that privacy policy has to be attached to how the data is used, not how it's created, but that makes for a much more difficult way to process data, which is one of the reasons why there's more cost. I think it was your top line, that 52% of the effort was being put into data transformation. So we built a number of new things. I mentioned we acquired Gazang; we built a new Hadoop component called Sentry, which does better access control. It's going to
push the encryption down into the chip. So there are a lot of things out there, but there need to be a lot more people and processes around data governance, around all of that. I think we have to make sure we're differentiating between privacy and security. Security is: you should not have this data, you can't get at this data. I don't want my social security or my credit card numbers available on the internet. Privacy is: can somebody see what I'm doing on the internet? Privacy is great fodder for the media. I promise you, we don't give a shit about privacy. Raise your hands if you're on LinkedIn. Everybody's on LinkedIn; you don't care about privacy. Facebook, you don't care about privacy. Do you use Google Chrome? You don't care about privacy. We sacrifice privacy for convenience every single day. We love to bitch about it and say we care about privacy, but we don't care. My angle would be slightly different. I agree that you should separate the two. I think, like any new innovation, data privacy as an issue for society has not caught up with the innovation, and it will take decades to catch up; we don't even know what it will look like in the future. I do agree with everything you just said, by the way. That's number one. Number two, there's a very smart gentleman who advises us, and he built this thing called the New Deal on Data, which takes privacy head on. So our policy on data privacy for our customers is you don't own the data: we put data back into their HDFS environment, they always own it, and we don't give access to it. Security may be addressed properly, but I think the most important part of privacy, which is what you said, is that people are willing to give a lot up for something free. So Eric Schmidt said, if you don't want me to track you on my search engine, then start paying me a penny for every Google search and I won't track you; I've got to make money somehow. So when people are willing to get something back in return for giving something up, that's, in our opinion, the ultimate answer, because now you've
incented people; you've told people up front. So, for example, the one I use is: do you have an iPhone or a smartphone, and with which company? AT&T, doesn't matter. When you bought that iPhone or that smartphone, every single telco makes you sign a disclosure, whether you use your phone or not, doesn't matter. And if they told you that up front and said, you can sign this and it will give you everything for free, free service, free monthly, some people will say yes and some will say no; or you can pay us a lot of money and we won't track you. That conversation has to change, because if there's value in exchange, then you know what you're giving up. Unfortunately, it's not transparent right now; that's the problem. The other challenge here, not just privacy, but the model is, you know, the provider can use your data in a certain way. Now, what big data allows you to do, and the value proposition, is to find new uses for that data. So that creates a challenge for the vendor, the company creating the insight: do I have to go back to the customer every time to get their permission to do a new type of analytics and try to find a new insight? And then, you know, the other thing is, with the old model, the more rigid enterprise data warehouse model, where you modeled everything ahead of time, you could build that governance into the model. You can only do certain things with that warehouse; it's built that way. You can only ask certain questions, and you can build that in. With the new model, schema on read, you're asking questions that you don't predetermine, so you could run up against situations where you're asking questions that run afoul of compliance regulations. So how do you... and I don't think we as an industry have figured out at all how we're going to handle that problem. Yeah, it's industry, and it's also government. So we see a lot of people talking about legislation that may happen in this case, where instead of having that big, huge privacy policy that you sign up front, it would
be more use-case based, and that's kind of why I talked about the security components of it. You have to secure the data in the system and be able to audit the data in the system to ensure that you actually have implemented the right privacy policy. May I ask a question? There was a thought as you were answering it, Jeff. Don't you believe, Jeff, in that case, and we say this to our banking colleagues a lot, that industries that are extremely good at working with regulations, like privacy, like compliance, could actually have an advantage, a comparative advantage, as privacy does catch up? Because it will; whether we like it or not, the government is going to be in our business. That's the price of living in a lawful society. Don't you think that industries that are adept at running in a regulated environment have a unique advantage as data privacy becomes a commercial concept? The flip side of that is, if you're not adept at it and you continually run afoul of regulations, well, that's going to be a problem. But it's also even non-regulated industries, where maybe you're not running afoul of a regulation, but you're doing something that's deemed unethical and you're on the front page of the New York Times because you're Target and you figured out that your 15-year-old customer is pregnant before the father did. It's like, well, jeez, should we be doing that? I don't even know if that story is true, by the way. I think it's true. With Target, the bigger deal was the data breach, the security part. That's a very good point, that's a very good point. But beyond regulation, just because you can do something with big data doesn't mean you should do it, whether there's a law banning it or not. You know, it's a very good point. There's a phenomenal gentleman, he wrote a book, Philip Evans. I don't know if you've ever heard his TED talk; it's a must-listen. He argues that the cost of continuing to run data as infrastructure is going to go so close to
zero that the marginal cost of adding a product or service... So the economics, right: when the marginal cost of adding a product or service is zero, an incremental search on Google costs Google nothing, and they can afford to sell an ad for a penny; it's all profit, right? So I would not give the answer myself. I'll take Philip, I'll default to Philip, who's a good friend of ours, and say yes. I think the economics of managing this unique asset called data have become so commoditized that in every industry, if you don't have a platform where the marginal cost of an additional unit of product or service is zero, you'll be out of business. So I think it's an excellent point, and I think that's an interesting dimension on how you pick the winners, which is: have they built a platform that can scale without adding more cost when they add more customers? If you have to spend more money to add 100 more customers, you don't have a marginal cost of zero, and that's a problem. But I do believe that Moore's law, and whatever law we create, Cloudera's law, whatever it may be, and I say that with respect, whatever laws we create... You didn't like them? I love you guys. You don't like me, I don't know why, but Nokia is a great company, you know. But I think it is here to stay, and we will see tremendous opportunity with it, where every market is green, and my nightmare, you know, that won't happen; that would be worrying. Yeah, I think the question was, how far are we from this future, the Internet of Things, where places, people, and objects are all connected and spewing data? Why even search? You should know me; I think I'm paraphrasing, you should know what I'm looking for even before I search. Eric Schmidt's concept of Google's vision is, I think, the same thing that you said: we should give you the information before you search for it. And how far are we from that? That was the question. I was talking to a retailer yesterday, and just to put the RFID tags on all of the different clothing, there's a cost to
it, so they've got to look at their model and make sure that they're going to get some sort of return on it. But I would have to say, in every industry we are seeing sensors: the railways, the travel industry, entertainment, retail stores. Healthcare is huge, and Kaiser Permanente is starting to do a lot of work to help heal people in their homes, so that they can gather information about people and their health in their homes, so they don't have to go to a doctor's office or hospital, where you're likely to contract something else because of the people that are ill there. So we're really seeing those sensors showing up everywhere, but it's really going to take a while. One more question? Better be a good one. If not, I want to thank Peter, Abhi, Amy, Jeff. Really great panel, thank you very much. Okay, now, as I think you know, don't leave if you don't have to. We are celebrating five years of theCUBE at Hadoop World, the first year I met you, 2010, and so out here we've got drinks, we've got tons of food, so please stay, and thanks very much for coming today. If you guys would like, we'll share and email them out to everybody. So again, thank you for coming, and please hang out and have a drink and some food. Great job.