Okay, we're back live here in Silicon Valley in Santa Clara, California. This is SiliconANGLE's exclusive coverage of the O'Reilly Media Strata Conference. This is where big data is driving the world. It's the ground zero for innovation. Business meets technology. I'm John Furrier, the founder of SiliconANGLE, here with my co-host, Dave Vellante of wikibon.org. And we are the reference point for innovation and big data. We've been covering the big data world since it first existed. Dave and I have been to all the Hadoop Worlds, all the Stratas, all the Hadoop Summits. We've seen the movie over and over again here in big data and we're going to break it down for you, but I'm especially pleased to have my friend Bill Schmarzo on theCUBE today. An early guest at EMC World; we call him the Dean of Big Data. This is back when it was just a small little cottage industry growing and we called you the Dean of Big Data and it sticks, I hear. I hear it's on your blog, the Dean of Big Data.
That it is. Yeah, well, I've got to take advantage of that. You're pumping up my marketing, and I've got to follow through.
We've got to call EMC and definitely let them know about that copyright violation immediately. Good thing we're open source. Yeah. And I'll say it's great to have you on theCUBE and obviously it's an exciting time. You're at EMC, where you're on the consulting side, but obviously you've been involved since the Greenplum days and you've seen what they've done over the past iteration. So obviously big news with EMC, and EMC has Greenplum. They have Pivotal Software, they have Cetas and a lot of other software, and Paul Maritz is spinning that out into a new company called Pivotal Project. Not yet, no details around that, but yesterday was an EMC announcement.
You guys announced Greenplum Pivotal HD, which is a new distribution of Hadoop, software-based innovation, no appliance, really targeting the data warehouse market and the business intelligence market. So it was a really excellent announcement, a lot of sizzle, some steak, and we've been commenting on what kind of steak that is. A lot of people are up in arms about the aggressiveness of EMC against Hive and Impala in particular, and we had some Cloudera guys on, but I want to get your take because you've worked in the industry, even prior to EMC, and you're out with customers and you're on the consulting side. So, one, what's your take on the announcement with Greenplum? Obviously you're a little bit biased, we'll comment on that, but really tie that directly to what you're seeing in customers.
So John, the BI community for the most part has been on the sidelines throughout all the big data discussions. For whatever reason, they've decided not to participate, not to play, not to get engaged, not to fight, and what's exciting to me about yesterday's announcement is it has opened a channel for the BI folks, and there are tens of thousands of BI-trained individuals out there, maybe hundreds of thousands, who now have a ticket for playing in the big data world. They no longer sit back and say, well, I only know SQL, so I can only access relational databases. Well, now the Greenplum announcement has opened up the world for those people to just directly jump in and play.
So talk about that specifically, because we've had some commentary this morning, we had Cloudera on the record, we have yet to talk to Teradata and those folks, but the business intelligence market is really driven by the data warehouse.
Now big data opens up a little bit more range, and so there are really two approaches we're hearing early on in the commentary of the analysts: a big data platform to identify new use cases, new data sources, and then the pre-existing data warehouse market, which is SQL based, which is part of the announcement, rigid some say, but yet a lot of people still have that. So why is that such an important distinction, to say, hey, it's okay to have a business warehouse? So maybe it's a little bit, some people say, a cheap data warehouse, some say viable, faster and more innovative, because there's no appliance. What's the defense from your standpoint on why that's a relevant announcement, and why do you think it's going to be successful?
Well, there's a couple of things, John, I think are important takes in this. First is, no business person wakes up in the morning and says, oh my gosh, I need a data warehouse, right? Data warehouses exist because they allow organizations to make better decisions, to ask questions they couldn't ask before, to gain insights into the business. And the challenge that the business intelligence or BI market has had is they've been hampered by the inability of data warehouses to support really detailed, high-speed analysis of data. With this announcement yesterday by Greenplum, the burden of proof is now going to shift off of the liabilities, the governors, the constraints of the data warehouse, and force the BI organizations, the BI vendors out there, to really step up and provide the ability not only to harvest more detailed data more quickly, but also to bring semi-structured data into that analysis in a way that allows the business users to tease out more insights about their customers, about their products, about their operations in the marketplace.
Yeah, so in a way the BI guys haven't had to worry so much about big data.
They could say, yeah, that's sort of this experimental thing over on the side and we've still got our thing and we're doing our reporting and the company's relying on us, and that's starting to change, isn't it? Talk about that a little bit.
Yeah, so whether it was by choice or by necessity, the BI vendors have stayed away from the big data space, right? Their tools were constructed to optimize around SQL. And so it really didn't allow them any way to play in the Hadoop marketplace and leverage MapReduce. There were some hacked-around kinds of ways to do that, but none of them was very clean or very efficient. So the BI vendors have pretty much stayed on the sideline. With this announcement yesterday, they are going to be forced to address this marketplace. They're going to be forced to play. And I think it's going to open opportunities for either existing BI vendors to move into that space or, what I think will be interesting, new BI vendors to step in and provide some very differentiated capabilities for the business users. I'm talking about the people who run marketing and sales and finance and store operations and inventory and supply chain. Those people don't really care about data warehouses, but they care about getting insights about their processes, how to make decisions more quickly, and ultimately to get into more of a predictive environment where they can start making predictions about what inventory they need, what marketing programs work, et cetera.
Describe how a business person traditionally, and I think you come from the BI space, would interact with that corpus of data, the data warehouse and the BI professionals, and describe your vision for how that will change with Hadoop and big data.
So I think in today's world, the BI community is dominated by reports.
Whether it was Business Objects or Cognos or MicroStrategy, the number one selling product, without a doubt at Business Objects, was the reporting tool, right? Which basically just monitors existing business performance. And the business users have been dying for the BI tools to leverage both real-time data and predictive analytics to uncover insights buried in the data, to provide recommendations about what they should do. And if you think about it from a data warehouse perspective, we've been delivering aggregated data for so long that if I'm looking at a total brand number, which might show a change of zero, and below that I have product A that has increased unit sales by a million and product B that has decreased unit sales by a million, then looking at aggregate data, I'm missing that. And so a lot of the nuances in the data have been missed by the business because the data warehouse didn't allow them to see it. And the BI tool is only as good as the data underneath it. So now we change this, right? We now provide the ability to have access to very low-level detail very quickly. And so those nuances in the data that got buried can now surface to the top, and people can start acting on them; the system can find insights in the data, deliver them to the users, and help them figure out what's really driving their business performance.
Okay, so Bill, I've got to ask you, so you're at EMC and so you've got to have this notion of a journey.
Yes, that's right.
That's been a fundamental part of your marketing now for the last couple of years and it's good because it sets a direction, it sets a vision, and users like to say, okay, here's where I am; it's basically a maturity curve. We were talking off camera about sort of the maturity, the adoption of big data.
I like to think of it as, okay, you've got the internet giants, the Googles, the Facebooks, the LinkedIns, you've got some financial services and some government agencies that are very heavily adopting it, and then there's everybody else. But take us through that journey, where we are; essentially, give us the state of that spectrum, if you will, of maturity.
Yes, so what we're seeing is that the organizations who have a significant investment in their BI environments and their supporting data warehouse environments are looking to move from being a business that's monitoring to one that's really trying to uncover insights in the business and optimize around those insights. So what they're doing is they're taking their current BI environment and they're looking to integrate predictive analytics, unstructured data sources, and real-time data feeds to tease out insights in the data that may be buried at a low level of nuance, and to deliver recommendations as far as actions people should take. Once you have that process in place, you're already prepared to start optimizing, to create self-sustaining analytic models that basically optimize the process, that are instrumented so you can measure effectiveness, and that are constantly fine-tuning themselves. What's interesting about that process is, as you start moving along that maturity curve, there are three things that happen for organizations. One, data gets treated like an asset. People start realizing that the more data you have, the better decisions you can make. And so there becomes this appetite to start gathering data at the lowest level of granularity, even data that I may not even be certain I know how to use. The second thing that comes out of this process is organizations start realizing that their analytics is actually intellectual property. That it's something to be nurtured, to be gathered, to be harvested, to be protected, to be refined and grown.
The third thing is more around the organization. As organizations start moving from monitoring to insights to optimization, they are starting to appreciate and adopt and develop decision-making capabilities, starting to trust data around which to make decisions, starting to get confidence in the fact that the data can actually help them make more decisions, more quickly, at a higher level of confidence.
So I want to follow up on something you said, the second point, which was analytics as IP. Big data over the last couple of years, like any new trend, has been purported to do a lot: certain people are going to make a lot of money, it's going to improve healthcare, it's going to improve society, et cetera. And at the same time, we've talked a lot about data being the new source of competitive advantage. And I think of Moneyball as an instantiation of big data; you guys are baseball guys. And you look at Moneyball now and what's happened, and pretty much everybody's doing the Moneyball techniques. They're all using sabermetrics, or if they're not, they're probably losing out. Do you see analytics as IP, and the ability to use data as a competitive advantage, as sustainable, or is it kind of a race where everybody's going to be racing to the mean? I wonder if you could comment on that.
That's a good question, Dave. And actually, I do think it's a race, but I don't think it's a race to the mean. I think it's a race for organizations to continuously look for new ways to look at their business. So take the Moneyball example. One of the metrics that's been commonly used to measure a fielder's effectiveness is fielding percentage, which is: how many of the balls that were hit to you did you catch? And the higher that fielding percentage, the more money you can demand through your agent. And fielders quickly learned that you can game that number. If there's a ball that's hit outside of your fielding range, don't go for it.
If you don't go for it, it can't be an error. So what happened is that there were some players out there who realized that they could game the system by knowing how they were being measured, and they figured out those variables and optimized around that, right? Humans are revenue optimization machines. So what teams have done is they've now put video cameras in the stadiums and they video each of the players and the balls hit to them. So instead of measuring them on fielding percentage, they now measure them on effective fielding range. How much range can they actually cover? Which is a much better predictor of performance. So I think what's going to happen in the business world as well is that there's going to be a constant search for ways to identify those variables that might be better predictors of performance, leveraging new data sources, like video cameras in the case of baseball, to take advantage of that data and really know how much money they're willing to pay for that asset.
That's interesting because baseball is kind of, it's almost an infinite little microcosm of data, but it is a microcosm. And so you can think about a business context. The data is vastly greater, as are the number of possibilities and permutations. So your premise would be that, let's say everybody figures out range of fielding; in the business context, there's going to be another source of competitive advantage. And you might commoditize the range of fielding, but then there's going to be something else down the line. I don't know if that'll happen for baseball but it'll most likely happen for business.
I mean, you can definitely see it even in baseball: outfielders' arm strength, ability to throw the ball accurately, ability to throw players out, right? You could see that every time you think you've got something mastered, new data becomes available or there is a new analytic approach to figuring out how really to value that person. So it does become a race.
I don't think it's a race to the mean. Well, it might be a race to the mean for the people who are in the middle of the pack, but the leaders are going to be the ones who are always trying to find those metrics that are much better predictors of performance and trying to gain that advantage by leveraging those insights.
Well, maybe it'll be the race to the next source of competitive advantage that's going to be data driven. So my last question for you is around the CIOs. They're under a lot of pressure. I mean, we see Amazon attacking the enterprise now. You're seeing all this big data stuff coming on. What do the CIOs tell you about big data? Where are they at and what are they telling you? What are they looking for? Are they part of the EMC Global Services? What are they looking for help on?
So when I talk to CIOs, many CIOs are frozen. The bevy of new technologies, the new announcements that seem to occur on a daily basis, has really confused them and almost frozen some of them into doing nothing. But there is a body of CIOs who realize that technology exists in order to help deliver business advantage. And those are the ones who are basically taking a different approach. They're not looking at it from a technology perspective. They're looking at it from a business perspective, saying, how do I leverage these new technologies, these new data sources, these new predictive analytics to derive new business value that gives me competitive advantage in the marketplace? And then whatever technology I use is great, but they're not wed to it. They are basically willing to say, I'll throw away that technology and get rid of it if something better comes along. But the thing that they really focus in on is, how do I start treating data as an asset? How do I build intellectual property in my analytics? Those are the things that are going to be sustainable. The technology underpins that. It might be very different three years from now. So I'm ready to throw it away.
Bill, I want to ask you just some questions. Obviously yesterday the announcement was awesome with Greenplum and obviously it's causing a lot of stir. But what people are missing here is there's a bigger picture around the industry, around what Hadoop can be used for. So while people rearrange the deck chairs on certain segments like data warehousing and argue whether it's data warehousing or business intelligence, or is it open source or open source plus or proprietary, closed, open, all that stuff, it's ubiquitous. So Intel announced a distribution of Hadoop. So another one. I mean, SiliconANGLE's about to release their distribution soon. So it's going to be announced in 2009. And it's coming soon, it's going to be awesome. It's going to kill Cloudera. So believe me, it's going to be 10 times faster than Greenplum, which is 100 times faster than everything else. And I'll say this, Intel, I mean, it speaks to a platform. So just the Bill Schmarzo, or the Dean's, take. Not so much the EMC angle, but you're looking at a pervasive, ubiquitous ability to collect, whether it's an internet of things or smart TVs. Obviously Intel is probably looking at big data from a little bit different perspective, that it's not a segment anymore. It's an industry. And we've always said on theCUBE, even Dave and I at our first Hadoop World, that this is an industry-creating opportunity. There's plenty of beachhead to go around. And that's what we said when you guys entered the market with EMC. So just what's your view? I mean, you have a lot of experience, you've been through the industry. Do you agree with that statement and what's your commentary on that?
Technology is an enabler, and the smart organizations will figure out how to leverage technology to enable not only their existing business processes, but to uncover new opportunities to monetize data, to serve customers, even to enter new markets. I'd like to comment, John, about platforms.
And we talk about platforms from a technology perspective, but the smart companies are the ones who are going to start realizing that they are going to manufacture platforms. Think about, for example, Ford. Ford has got 12,000-plus sensors in their car. And they're going to know all kinds of information about the driving behaviors and the car performance. But that's also a platform upon which developers could develop applications that sit inside that car, whether it be customized radio or help services like OnStar or mapping services and things like that. So while we think about technology and we always kind of come to this platform discussion, I think that model's going to move up in the value chain as companies start looking at platforms, underpinned by technology, that allow them to take their products and create intelligent products that in essence become platforms for other people to provide value and services on top of.
The petabyte club, we were joking about the petabyte club, and I said terabyte by accident, but exabyte. So obviously we're at the beginning of a new era, right? Obviously big data is still stuff to be stored, right? What's your experience with some of the biggest customers you've worked with? Where are they in the evolution of really transforming big data to be something that's a part of the fabric of their organization, where there's a data fabric, as Paul Maritz would say?
I think most organizations aren't there yet. Most of the companies I talk to haven't gotten their heads around that yet. And I think it has to do with a lot of organizations still not understanding what data can do for them. The data warehouse world and the BI world taught them that they can do great reporting, but we didn't do a lot to really help optimize their businesses. We didn't provide the kind of insights that they needed in that space.
So I think there's a lot of education, a lot of missionary work that has to take place in trying to help companies understand how they're going to leverage these new massive data sources and new technology innovations, things like data fabric, to broaden their ability to bring data in and to create an appetite. And a corporate mandate that says that data is a corporate asset, and we're looking to constantly enhance that data asset through instrumentation and acquisition and enrichment.
Okay, final question is, what's next for you guys right now on the consulting side? How's that going, and on the engagement side? What are some of the hot products, projects you're working on?
So I have been traveling a lot lately, working with a lot of different customers across industries, running what we call vision workshop projects, which is really to help our customers envision the realm of the possible. What can they do if they have access to all these kinds of data sources? What can they do if they had access to all this processing capability and processing power and these advanced analytics? Because most organizations, while they know what questions they're trying to answer today and what kind of decisions they're trying to make, haven't been able to go through the process of envisioning how that world changes. So to be honest with you, excuse me, it's really taken off in the last quarter and a half. I've spent a lot of time talking to customers, which I love, by the way. The customers teach me a lot about what's really important in the marketplace, and most of the things that are important in the marketplace have nothing to do with technology.
Okay, this is theCUBE, with Bill Schmarzo, the Dean of Big Data, with EMC Consulting, who also works on the Greenplum big data applications. I'm sure that's going to expand in scope, given some of the new things going on, and welcome back to theCUBE. Thanks for your time.
John Furrier here; we'll be right back with our next guest on SiliconANGLE's theCUBE at Strata, exclusive coverage of the O'Reilly Media Strata Conference in Silicon Valley. Right back.
The cube is this conceptual box, if you will, and we bring people inside of the cube, and then we share ideas, but those ideas don't stay inside the cube. We explode that idea. We allow that idea to grow and grow, and it does. So we really try to own the whole enterprise technology space. I mean, that's what we're all about. We take analysis, we take publishing, we take news, and we take live TV, and we combine it together in a product and share that with our community. No one's doing what we're doing. What we're doing, in my opinion, is the future of media, future of television, future of the internet. Video is an amazing, powerful product. So we work in what John and I talk about as a data model. People always say to us, well, how do you guys make money? We sell knowledge, we sell information, we sell data. So the problem that we identify is about what we call big, fast total data. Anybody can analyze a gigabyte of data. If you do 1,000 gigabytes, that's a terabyte of data. You take 1,000 terabytes, that's a petabyte of data. 1,000 petabytes, that is an exabyte of data. So you are talking big data, lots and lots of data, and can you analyze it in real time as it comes in, right? The cube is like, we call it the ESPN of tech, because we want to cover technology like ESPN covers sports. John has a great vision for what's gonna happen next in tech. And so John is sort of that alter ego of mine that lets me...