From Las Vegas, it's theCUBE, covering EMC World 2016, brought to you by EMC.
And welcome back to EMC World 2016. I'm John Walls, along with Jeff Frick, here from theCUBE. And in our quest to provide you with all the insight and all the know-how and knowledge that you possibly need, we've constructed our own best-seller list, the rival for the Times, you might say. And this is number one on theCUBE best-seller list, the latest by Bill Schmarzo, Driving Business Strategies with Data Science, Big Data MBA. And we have the author with us, the Dean of Data from EMC. Bill, thanks for being with us here. Tell me, data science, what's data science? What's that all about?
So people get very wrapped around the axle regarding what data science is. And data science is really very simple. And actually I think the book that describes data science better than any book I've read is the book Moneyball. And Moneyball describes data science as trying to identify those variables and metrics that might be better predictors of performance. So the Moneyball story is, you know, for years and years, people paid baseball batters based on batting average, when in reality on-base percentage is a much better indicator of value. And so data science is all about finding that next on-base percentage.
So what drove you to write the book then? I guess the target audience, and then ultimately, what do you want them to take away from it?
Sure, so as you know, this is my second book. The first book I wrote was really targeted towards IT, helping the IT folks understand how do we get value out of big data. I teach a class at the University of San Francisco School of Management. And the class is designed to help business students and tomorrow's business leaders to embrace analytics as a business discipline. That the days where we could flip analytics over to IT are over, that it's a business discipline in the same sense as finance or marketing.
And so I needed a book, a textbook I could use in order to teach the students. And I went through and read all these data science books, and most of them were so fixated on stats and the mathematics of it. And they really missed the point behind how do you get value out of data science. How does a business person, how does an MBA student think about big data and data science in a way that allows them to get more value out of the big data conversation?
It always makes me think of something that comes up all the time, which is there aren't enough data scientists, there's the shortage of data scientists, where are they going to come from? And it makes me think back to the original car conversations. These cars were never going to go anywhere because there aren't enough chauffeurs. I mean, we need to flip the conversation and get it out of the realm of the pure, statistical, heavy-lifting guys and into the business.
Well, that's a great story because there was a study done many, many years ago by Western Union, Western Electric, and they did a study about how many operators they were going to need in order to support the growth of phones. And at one point they realized that half the country was going to have to be operators to support the phone. So what they did instead is they turned us all into operators, right? And that's kind of the key here. I think what you're getting at, Jeff, is that to make the data scientists more effective, we can't expect to go out and find data scientists who understand the business as well as the business people. Those unicorns are not out there, they're hard to find, right? So how do we get the business people to start thinking more like a data scientist? Remember we said data science is about identifying those variables and metrics that might be better predictors of performance. Who are the people who have probably the best ideas about what variables and metrics they want to test? The business people.
So when the business people are brainstorming what variables they want to test and the data scientists are applying their techniques and methodologies to figure out which of these variables are actually better predictors, you have that one plus one equals seven sort of synergy that sort of explodes out in the business community.
So it seems intuitive here. The more information you give me, the better guided I am to make decisions and the more predictive analysis I can do, great for me. So why is there, I wouldn't say reluctance, but maybe slow uptake then? I mean, why aren't people just, I would think, embracing this opportunity and looking to incorporate it more into their thinking?
The process, while it's really simple, is hard work. It requires you to really take the time to understand what your business drivers are. You know, any MBA student who's done any of the Peter Drucker stuff, right? We'd know how to take this stuff apart, right? But we look for the silver bullet, and especially in technology, we hope that cognitive computing can come out and tell me what the answer has to be, when in reality, that's never going to work. Never is maybe a long time, but you are much more effective when the business people are taking responsibility for their own destiny. And by the way, they know how to do this. If you put them in the right environment, they will tell you what variables they want to test. They have ideas. And so I think it's a case where, on the business side, we've been over-promising technology for so long that business people have given up on us. Let me tell a story. So probably the number one question I have heard this past week is what can we do in the business side to rebuild trust with the business? I mean, what can we do in the IT side to rebuild trust with the business? That the IT people have lost the trust of the business. And why have we lost that trust? Because for years and years, we've been over-promising and under-delivering.
Oh, the data warehouse will solve all your problems. Well, no, no, the BI tool will solve all your problems. No, no, this predictive analytics tool will solve all your problems. No, no, this visualization tool. Every time we lead them down this path, we over-promise and under-deliver. So the business people are primed to have that conversation, but they expect the IT people to be there arm in arm with them to help them figure out where and how I can apply data science in order to drive the business.
And it's really interesting. And have things like cloud freed up the IT person from managing boxes to actually sit down with the business person to help them execute their hypothesis? Because that's the other thing, right? It's kind of hypothesis versus this concept of throwing it on Hadoop and out the answer will come with some great visualization. You still need a hypothesis, but you also still need that IT guy that can help you, as opposed to: IT guy, I need some help. Well, it'll take you six months to spin up a server. You know, I'm exaggerating, but that used to be the conversation. Maybe not, yeah.
No, I think you're spot on, Jeff, in that we are freeing up the IT people. The first book was really designed for those people who were all of a sudden now having more cycles available, who were interested in learning more about how do I apply this to the business. So the first book was really geared towards them. But I soon realized that, well, I didn't address half the equation. I was missing the business people, and how do I make sure the business people have that right mindset? So that when they engage with IT, when they engage with the data scientists, they're asking the right questions. They're moving from an environment where I'm asking questions about what happened historically to start asking questions about what's going to happen in the future.
So that transition from descriptive questions to more predictive, prescriptive questions is part of the process that we take the business people through. We call that thinking like a data scientist, getting the business people to think like a data scientist.
So I'm big on case studies, you know? So I get it, that helps me certainly see things. What do we have in here in terms of trying to bring it to life?
Oh, gosh, the thing is full of stories. Full of examples, we'll just pick one. Well, I usually, for my class, I use a Chipotle example, because I find it-
It's a good start, we both like that, right? We like Chipotle. Make sure when you go there, you order no E. coli. It costs extra, don't order the E. coli. Wash your hands, people, wash your hands.
So I use the Chipotle example because all my students are familiar with it. And plus, it's a really easy business to understand. And so I make them go through the exercise of reading the annual report of Chipotle and identifying Chipotle's key business initiatives. One of them that jumps out of the annual report is increasing same-store sales. And so we walk the students through a process of brainstorming. What kind of decisions do you need to make in order to support increasing same-store sales? Well, decisions regarding staffing and scheduling, inventory, production. Then you start broadening the conversation. So you say, okay, if I know what my sales were last week at Chipotle, how would you predict sales next week at Chipotle? What would you want to know? Well, knowing that there's a Little League field across the street might be very important. Knowing that you're five blocks away from a high school might be really important, to know when school's still in. When are there softball games or football games? And so the students start thinking about what's the traffic pattern? What's the weather going to be like next week? Are there any major events, a Super Bowl? Is there a protest planned?
All the things that you can get the business people to start thinking about, variables that they might want to consider, help position them in a place where now, when you engage the data scientists, they can start looking at all these variables to see: do those variables actually have predictive power? Can they help me to predict how many people are going to come in next week at Chipotle?
That's such a great example because John and I were talking offline about this concept of businesses now need to use data that exists outside their four walls and leverage it with the data inside the four walls in a much more powerful combination.
Yeah, think about using Zillow data for the value of a home, using building permits data to understand where construction's going on, understanding where accidents are happening, understanding are you near a corporate headquarters, for Chipotle, where on Fridays they have people go out, they have the catering thing. So knowing all these variables that drive your business, the best people to identify those in the Chipotle example are store managers, because they live it every day. They look across the street and say, wow, there's a soccer field across the street. They're having a soccer tournament this week. I'm probably going to need to have more people. Well, how many more people do you need? By golly, the analytic model would tell you you need to have two more. I got a great example. I think I've told this story before, but we're doing this project for this hospital and they're located in Denver. And we're talking to the nurses in the ER, and the nurses in the ER know that when the Broncos have a home game, the two hospitals within the three-mile radius of Sports Authority stadium have an increase in broken bones, lacerations and head injuries. When the Broncos have a home game, they have an increase in broken bones, head injuries and lacerations. What they didn't know was it's a 37% increase.
The minute I know it's 37%, now I can act on it. Now that I can forecast it, I can act on it. I need two more nurses, one more physician, more supplies. And so, as you said, Jeff, the challenge is getting people to start thinking more predictively about what's going on. The business people can make that transition. Every business person I've talked to in my job over the past five years and even longer, they can transition from those descriptive questions about what happened to the predictive. And when you do that, all kinds of great ideas come out.
Now the other thing that, you know, you're so fortunate because you sit down and you have a captive audience.
Best job in industry, best job in industry, no doubt.
And a topic we've talked about before, but I want you to review again, is this whole topic of diversity and diversity of opinion and differences of opinion. And too often people think of diversity as just, you know, we need more women, we need more minorities or whatever. It's much broader than that. And you tell a great story of the quiet little mouse in the back of the room that really had the breakthrough point of view.
Yeah, so a project we did at a casino, we were trying to figure out how do we understand a more predictive lifetime player score, right? We knew how much they spent, but how do we predict how much they could spend? Because we want to direct our comps and marketing to people who could spend more. So we're doing this brainstorming workshop process. We do these vision workshops. Greatest engagements in the world. Customers just come out of these things, their eyes are all like, it's like Moses walking down from the mountain, right? Their hair is all pulled back and gray. It's really, it's great. So we're doing this exercise. We have 13 executives in the room. And there's this woman in the back, her name is Sally.
And Sally runs, if you're in the casino here, she runs all the people who sit behind the bars, the cashiers. And she's responsible for giving out lines of credit. And this casino is located in Southern California. And because these casinos don't want people running from casino to casino, running up a line of credit, at the end of every day, they share with everybody else who all applied for a line of credit. They put it on a PDF and they mail it to each other. And she's telling this story. And the room goes dead quiet. And a guy who's head of slots, his name is Buddy, stands up, turns around and says, wait a second, Sally. You're telling me you know when our best players are playing at another casino and applied for a line of credit? And she goes, yeah, but it's on a PDF. And he goes, wait, wait, wait. You also know who their best players are based on what kind of line of credit they applied for. And she goes, yeah, Buddy, but it's on a PDF. And of course, our data scientists in the back room, they're all drooling: a PDF, let me at them. I can scrape that stuff. So your point on diversity is spot on. You want as many different business functions as are related to the problem you're trying to solve. So whatever problem we're trying to solve, we're always trying to identify the business stakeholders who either impact or are impacted by that initiative, and we bring them into the process, because one plus one equals seven equals 21. It creates a very creative process, and creative thinking is contagious. One idea leads to three leads to 10. It happens so lickety-split.
It seems like it kind of plays in the data lakes too, right? Because all of a sudden you're getting data from a lot of different inputs, different sources, maybe that you never considered before. And all of a sudden you put some intelligence behind that and you have a voila moment.
Well, the data lake is such a great enabler because I can dump data of all different kinds, structured, unstructured, video, audio. I can dump it into the data lake. But I have a process that I go through to prioritize the data I'm going to put in there. I'm not just going to dump it in there, because nothing's going to happen if I just dump data in there. My data scientists still got to do work with it. This vision workshop process that we go through, and it's described in the book here, talks about how we go through this process and then we prioritize. So you brainstorm everything you can think of, right? All ideas are worthy of consideration. Then you got to focus and prioritize. What are the four or five most important data sources? From a value perspective and from an implementation feasibility perspective, that's where we're going to start.
So what's your take on the reality of, you know, we hear often, you know, schema on read versus schema on write. And it's so different now with big data because you can have schema on read, because you're not really sure what you're writing to, because Lord knows what the hypothesis is going to be down the road.
Very good.
How does that actually work in the real world with some of the use cases and companies you're working with?
So in a data warehouse world, before I could ever put data in a data warehouse, I had to build a schema first. And that schema process could take two, three, six months to build a schema, right? In Hadoop, I don't need to build a schema. I got a sensor, a bunch of log files off of the Internet of Things, drop it right in there. I got a bunch of text files, dump it in there. I got a bunch of PDFs from Sally. I just dump it in there. I just dump it in. I don't need to build a schema. I build my schema once I figure out what kind of test I want to, what kind of question I want to ask, what kind of decision I want to make.
The data scientist then figures out what data they want and they build, by the way, the schema for a data scientist is a flat file. It's just a flat file. No snowflake, no star, no dimensional. And so the schema development process is much faster. And I'm only building it based on the question I'm going to ask or the decision I'm trying to make. And so it's very focused, very, very fast, which enables this fail-fast environment. I could never fail fast with a data warehouse because every time I wanted to add a new data source to my data warehouse, we jokingly said, six months, one million dollars. With the data lake, boom, I'm dropping it in here.
Back to your trust issue.
Yeah, you went back to trust, right? They didn't trust it, it took so long. Now I can screen scrape Zillow. I can drop it into the data lake. I can compare the changes in the home value to a student's performance in class to see if there's any sort of correlation there. And when there's not, I can realize I just learned. Right, I didn't fail, I learned fast. I learned that's not the right variable to look at, and move on.
Well, before you do move on here, I'd like you to make this a collector's item, if you would, please. For Jeff, if you would. Just an autograph for Jeff.
For Jeff. Jeff Schmarzo.
Yeah, we're building our library. We might know him as the Dean of Big Data, but I think he's also the Billy Beane of EMC, the way you were talking about this earlier. See if you'll take that. Bill, thanks for being with us. We appreciate the time, as always.
Thank you very much. Thanks for having me.
Thank you, Bill. And we'll continue here on theCUBE in just a bit. Looking back at the history of Dell.