From San Jose, in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now your hosts, John Furrier and Peter Burris.

Okay, welcome back everyone. We are here on day three of coverage for Big Data SV, Big Data Week here in Silicon Valley in San Jose. This is theCUBE, SiliconANGLE's flagship program. We go out to the events and extract the signal from the noise. We're here working in conjunction with Strata Hadoop and we're on day three. So we had a big event. We had our party. We had our new research that we introduced, and I'm here with my co-host, Peter Burris, who introduced that research. So Peter, good to see you. Our guests here are Bill Schmarzo, the Chief Technology Officer at EMC Global Services, Big Data, and Kerry James, Business Development for Big Data with EMC. Great to see you guys.

Thanks, great to see you. Thanks for having us.

Top of the morning. So we're seeing now the end of the show of Strata Hadoop and Big Data SV. Peter introduced some new research last night and kind of the philosophy, and you're seeing that a couple of themes are emerging. One, big data is for real. Obviously you guys know that, Bill. You wrote the book. You're the Dean of Big Data. But Hadoop itself is just a feature in the overall growing ecosystem. And then two, the valuations. Cloudera just got clipped big time on valuations. So where's the business? Where's the value? So again, value, hence the unicorns getting clipped a little bit, and slayed if you will. And also Hadoop going beyond just Hadoop. Your thoughts?

So I've always wrestled with this idea that every new market thrust has to have a killer app. And so people scratch their heads thinking, oh, what's the killer app for big data? Is it Hadoop? Or is it Spark now? Or is it something else? And I actually think the killer app is the apps that the customers are building.
That the value creation in the big data space isn't in the software that's being developed, but it's in how organizations like GE and the Googles of the world and such are actually extracting value out of it. Building apps that are so unique to their business, they're in many cases transforming those organizations' business models.

So the customer apps, we banter on theCUBE all the time about this. But back in 2011, I think it was 2011 or 2012, it might have been one of the early Hadoop Worlds, we were there. Mike Olson from Cloudera got on stage and said, this is a big data app economy developing, and Ping Li from Accel Partners went up there and said, we've got $100 million earmarked for the big data fund. And so we expected this renaissance of apps, kind of like the old way, the old ERP and software business, where companies were going to be built. It just never happened. And so to your point, there is an app market that's completely developing, but it's not what people thought it would be.

And I think it validates the research you showed yesterday, Peter, that said that today the biggest winners are the consultants. But at some point in time, the software is going to supersede that, as more and more of the consulting skills find their way into products, not necessarily a mass product that's going to be the killer app, but lots of smaller products that are taking over jobs piecemeal, whether visualization or data science work or data engineering work. So I think your research is spot on that we're going to see that transition as services continue to grow, but the software is going to start usurping more of those services, and it's going to become a bigger play in this space.

But it's not going to be a market where the services suddenly start going away. And we could talk a lot about what have been the problems with the services market in the last 30 years in the tech industry.
But what is going to happen, and I think we obviously think you're right, is that we will see software start to be codified, or some of these insights start to be codified and turned into software. But at the end of the day, the applications that we're talking about building are so embedded within the differentiation and what makes the company distinct. It's not like an accounting package. It really is about the new behaviors that you're going to present to the marketplace that are going to distinguish you in the marketplace. And if everybody has the same distinction, as The Incredibles likes to say, if everybody's special, then nobody's special. And these are going to be applications that make businesses special. So how do you think? I mean, look, we saw this in many respects in the data warehouse world. We had Jerry Held on yesterday, and he gave a great overview of the history of data. And the observation that Jerry made is, you know, the data warehouse market delivered some value, but it maybe didn't get to where we wanted it to, perhaps because we were focused on the distinction between the infrastructure and the application. And he starts to see this notion of data lake and the emphasis on the infrastructure at the data lake level, and now searching for the application, perhaps also creating similar conditions. What's your guys' thought on that? How is that going to play out?

So what we see happening in a lot of this space, you're exactly right. We have a saying at EMC: insight without action is useless. So it's the action that allows you to create the value. So we're working with companies and partners to actually embed applications into the systems. We can manage the data. That's all great. I mean, it's EMC, we are a data management company. That's what we do and we do it really well. But managing the data is not the control point. It's not the tipping point of where the value comes from.
So we're working with smaller organizations, like you said. If I go build a population healthcare application and I give it to every regional office in the United States, right? Then all those healthcare providers look identical, but we all know the regions of the United States are not the same. So I think that's where we agree with you, right? We see the development of applications, but still the need for services. And what we say is, we're working to build an ecosystem similar, you know, to the iPhone App Store, where you'll have different types of applications that are not, like the App Store, generic for every single person, but are the basis for starting that, right? We see the platform as a service, the third platform, embedded applications, mobile, and we're starting to see that trend grow. And that's where we see the information coming from: managing the warehouse, managing the data lakes, you know, the term that we all despise, and then bringing that data in, making it easy to consume that information, but then making it easy to understand the value from that. Part of that also is moving it away from being only the realm of the special few, right? The data scientists. So making it accessible to, you know, application developers, making it accessible to the business analyst to find the value and then be able to work with them. The biggest thing for us too is process, right? Because I can do these types of insights, but if I don't have the processes and people in place to take the action, I don't get the value on the back side.

So I want to build on that, Kerry, because I don't want to say that we despise the term data lakes. At the end of the day, it's a term and a concept, but we want to make sure that it doesn't run away and promise more than it possibly can deliver.
Bill, how is this notion of data lakes going to be snapped back into place so that we remain focused on what's important, which is creating value out of these things, as opposed to buying a bunch of infrastructure?

I think it's a challenge, Peter, and let me be really blunt. We failed in data warehousing. We failed to deliver compelling value to the business.

We, EMC, or we, the industry?

We, the industry, failed. An industry I've been involved in for almost 30 years. And I'll tell you a story, a classic story I heard from a lot of clients, that they'd budget their money to build this analytic environment using BI and data warehouse tools. And by the time they got the data warehouse built, they'd run out of money. They'd built a data warehouse using the technologies that we had available to us back 10, 15, 20 years ago. It's amazing that things even worked. It was like making a dog walk on its hind legs. The fact we made these things even work. But we never got to the point where we could ever deliver value to the business on it. And so there's a lot we can learn from the failures of the data warehouse. Things like, how do we make sure we don't have all these data silos, right? Data silos are the anti-data science. How do we make sure we don't have this proliferation of data where you have executives in a meeting arguing about whose version of sales is the right number? And I think most importantly, getting back to the analytics or the application part of this, how do we make sure that the data in the data lake, the data lake's only a repository, but how do we make sure that we're mining that data, that we're cultivating that data in a way that helps us make better decisions? Whether it's human decision making through decision support or it's automated decisions in fraud detection and things like that.

Yeah, I mean, you brought this up.
I wrote a comment on the CrowdChat a couple days ago when we kicked off, you know, why did the data warehouse fail? Huge thread. We had some people say, didn't it fail just in scale? Well, let's dig into that, because you have some thoughts on this and I like where you're going with this, because I don't mind data lake as a term, but it's not all-encompassing. It's like, hey, I'm gonna store some stuff, but there's other stuff going on. I would joke and say data ocean, because there's real-time stuff going on, there's different currents and whatnot. But the issue is, okay, if I'm a customer, I want the information to be free. And Peter, your presentation last night kind of brings up this potential energy of information. You mentioned silos are the anti-data science. I mean, that is really the thrust of it. It's not the one-trick pony anymore. It's the freely available interaction of the data, the engagement, that is the key.

Yeah, and so let me tell a story. So, two points. First off, companies that say you're gonna take and put Hadoop on top of some sort of technology and call it a data lake, they're doing the industry a huge disservice because they're bastardizing the term data lake. They're making it sound like it's just a pure technology stack that only needs Hadoop on top of it. But let me tell you a story that goes back to this freeing up the data, right? Making this data freely available. We did a project for a healthcare company in Denver. We built for them a data lake. And I would call it a real data lake 2.0: data governance, cataloging, lineage, traceability, security, all the things you need to have in something that's gonna be a living, breathing asset to the organization. And he was talking about how the data lake helped them to focus on quality of care. That was the focus area.
How do we improve quality of care while holding down costs, and how do we pipe decisions and recommendations to the doctors and the nurses and such. And he said, we had a surprise. We didn't expect this. He said, but we are saving about $12 million a year in reducing and eliminating shadow IT spend. Because he said a lot of our shadow IT spend was about going out and getting data that would then be stuck into some random Access database or some place. Now I stick it in the data lake and now it's available to everybody. So not only have I stopped the shadow IT spend, but I've taken an asset that was gonna be put into a silo and now made it, as you talked about, freely available to everybody in the organization who can take advantage of it.

So in that case, the shadow IT, which had good intentions, ended up foreclosing the opportunity because it was stuck.

Yes. And it did its job. Then it kind of got caught outside. It was locked in a jail.

Well, I think it might not even have had good intentions. It just grew out as a response to the fact that there wasn't an approach, a set of disciplines, and a philosophy that data is an asset that has to be taken advantage of.

And you also gotta be able to move quickly on it. The problem with a data warehouse: to get new data into a data warehouse, the joke was always three months, a million dollars. Data scientists can't wait three months. They wanna go out and grab building permit data to see the impact that's having on traffic patterns. They wanna be able to grab that data, drop it into the environment, do their data magic stuff with it and see what the impact is. They can't wait months.

I wanna get your guys' thoughts on something, because this brings us to the point. Last night at the event, Peter, in the presentation and customer panel, the industry panel, there was a guy who raised his hand twice, and he was so compelled to ask this question, and I was totally enthralled by his question.
And his question was, you know, hey, actionable insights. Give us the bottom line. Is that fantasy? 'Cause you could see the frustration in him, and I see this in all of my CUBE interviews. Oh yeah, let's get to actionable insights. The question is, that's really hard: one, to get the insights to be right, but then to have the infrastructure, as you say, or the environment where someone can take action. So what are your thoughts on that? Because it seems the customers are saying, we want actionable insights, but yet there's no burning bush, there's no magic pixie dust, there's no, you know, magic. Where does it kinda come from? So what's your thought? You talk to the customers, what's going on there?

So, I mean, we see it with a lot of our customers, the actionable insights, right? It is the holy grail, right? It's where they're trying to head. So what we again see in that system is setting up the ecosystem that provides the infrastructure, the data governance policies, right? The security functions that allow them to open up this information freely to enable it, but not so free that it, you know, starts to wander off into the wilderness. What we see initially right now is a lot of companies getting the actionable insights by taking the insights that they've developed through analytical models, kind of in the science fair on the side, and beginning to embed those in existing business processes.

So on a low-hanging-fruit basis.

Low-hanging fruit, so.

Nothing wrong with that.

Nothing wrong with that, right? It gets them started. You know, we also have a saying, right? Start small, win and grow. And so it allows them to do exactly that. Start small, get a win that they can handle in their existing processes and their existing technologies and tools. But now what they're starting to see is they're wanting to move forward. We're working with a couple of companies in the energy space.
And, you know, they're looking at energy theft, energy fraud, but what they want to also start doing now is looking at energy, you know, production. You put energy onto the grid and it's not consumed, it's a wasted asset. So they're starting to look at, you know, how can they take these pieces of information and provide them back to the consoles where they're generating power, so they can take in weather information, they can take in the existing grid, they can take in historical data, and then predict, based on what's happening, the power they need to put into the grid to be consumed. You don't want a brownout, but you don't want to waste energy either. So we're seeing that. The other big one we're starting to see, even inside the companies: we're working with a gaming industry customer that wants to put an app on the iPads of all of their floor bosses, so they can now start to see interactions with their customers in real time, work through those pieces, and understand how they can drive action. Because if you have a customer that's having a bad night or having a good day, right, if I don't provide that information to the boss, nothing happens. The application is what allows them to get that insight into action.

This brings up an interesting point that you brought up, because you mentioned data science and silos as the anti-data science, and this is interesting because what you're talking about here is not a back office IT configuration. It is a frontline, line-of-business, forward-facing thing. So the innovation in the digital transformation seems to be coming in from the front lines, not the back office. So the question is, it seems to me that the data scientists are frustrated. They're stuck in the middle, like, I'm waiting over here. I mean, that's my take. What's your take?
So I think this process is really simple, and as Kerry was talking about, in a lot of these projects we're doing, we're figuring out what decisions people are trying to make and we're delivering recommendations of what they should do, and we use concepts such as scores. We're working with a client who's trying to create a retirement readiness score for all their clients in the financial services space, right? And people understand the concept of scores, and scores are a very powerful way to communicate where things stand. And so if you've got somebody in the casino business and you're trying to figure out who do you give a free hotel room to? Well, you look at the score and it tells you, is that person really qualified for one or not? And by the way, the data scientists aren't frustrated, because they know what decisions we're trying to support, and they can go through a process with the business people to brainstorm the different data sources they can bring in to help figure that out. On this project, the casino, what was really interesting is, as we were trying to figure out all these decisions about who gets free play money, who gets comped on this, et cetera, et cetera, who do you target from a marketing perspective, they had this idea they wanted to create a customer value score, right? How valuable is the customer to them? Well, we came back and said, no, what you really want is what's the maximum value for that customer? What's the upside potential? So instead of focusing your money on the people who are already spending money, why don't you focus your comps on the people who should be spending more money? That one decision on hotel rooms was worth $14 million a year to them. One decision.

So, very quickly, I was a little bit afraid you were gonna tie together the retirement score to the upside on the casino.
But I'm gonna tell you a quick story and then slightly shift in the next couple of minutes, and I know we're gonna be talking again a little bit later, so we might be able to pick up on this. So my story very quickly is, I walk in to see the guy who's running analytics, just a number of years ago. And he's got three offices. And he's a very important guy. So I go to his office, and he says, let's come down. He goes past an office with the door closed on the way to his conference room. I said, what's in that office? He says, I'll show you. He opens it up and there's 14,000 copies of a very popular BI front end tool in boxes in the office, which he doesn't want anybody to know he couldn't deliver because it didn't work. Here's the question. The question is, we might be concerned about the Hadoop or the data lake thing, but at what point in time do we actually start worrying about delivering the insight? Not just capturing the data, running the analytics, but delivering the insight. Because at the end of the day, you're absolutely right. It's about getting it to the person that needs the information so they can make the decision. We're not hearing much of that here.

And so you're right. We see that piece, right? So we looked at this floor this week, right? It's a lot about simplify. Simplify data ingestion, simplify the analytics model, simplify data management, simplify pieces. But it's still at that, right? It's simplify this piece, simplify this piece, simplify this piece. There's not a lot of people looking at the ecosystem and how do you simplify the ecosystem. And to your point, I kind of agree with you, right? Data scientists in some respects are frustrated, but they're not frustrated from the business perspective. They're more frustrated by the fact that it takes them, you know, on average, probably about 70% of their work time to manage the environments and manage the data. Even trying to find the data: does the data exist?
So you can look at the simplification of each of the pieces, but if you actually look at the simplification of the ecosystem, that's what EMC is looking for. And the exact reason is to make it easy to consume the infrastructure, easy to understand and manage the data, allow them to do their job, and then that makes it quick and easy to deliver those insights to the marketplace. So they're now focusing more of their time on the analytics and not the technology. That allows them more time to drive insights, but also allows them more cycles to deliver those insights. And then we're also seeing the development, as you guys both talked about, right, of the specialized applications by the customers themselves, you know, the platform three mantra of mobile applications and applications that are utilizing analytics to provide the value immediately, in near real time, for those functions. So simplifying that entire ecosystem allows them to spend more of their time deriving analytics, delivering analytics, and then creating the valuable insights and delivering those insights to the application systems.

Let me take a slightly different approach. Because you asked a really important question, which is what insights are we trying to deliver? And so I'm gonna steal a quote from Stephen Covey: begin with the end in mind. And so when we approach these things, we don't approach talking about technology. We approach saying, okay, what decision are you trying to make? What business problems are you going after? What are your key business initiatives? And when we know the decisions they're trying to make, that helps us figure out what data they could be using, which tells us what the architecture needs to look like, which tells us what technology should be there. And yes, the simplification is really important, because Kerry's points are right that the data scientists spend too much time gathering data and not enough time analyzing data.
But if we don't know what we're trying to build, if we don't know what decisions we're trying to support, the whole thing is just a waste of time.

And you guys are doing a lot of work with the customers, and I think it's really a testament that the language you're using is not speeds and feeds. It's all about what's the business value.

The 4 M's of big data, baby.

That's right, the 4 M's. Guys, thanks so much for coming on theCUBE. Really appreciate the insight from EMC. EMC is doing a lot of work with customers. Obviously, the journey of big data, digital transformation, here on theCUBE. If you want to follow the journey of theCUBE, go to @theCUBE, follow us on Twitter, and check out SiliconANGLE on youtube.com, with over 8,000 videos of theCUBE. We're in our seventh season, and we've been at every single Hadoop World. This is theCUBE, live in Silicon Valley. We'll be right back with more after this short break.