from San Jose in the heart of Silicon Valley. It's theCUBE covering Big Data SV 2016. Hey, welcome back everybody. Jeff Frick here with theCUBE. We are live in San Jose, California for day three of Big Data SV, which is part of Big Data Week, which is in conjunction with Strata Hadoop. Basically, everything big data is happening in San Jose. So if you're not at Build in San Francisco, you should be at Big Data SV here at the Fairmont. And we're excited to have one of our favorite guests. I don't know what his official title is, we just go with the Dean of Big Data, Bill Schmarzo from EMC, back from the hinterlands. We got him off a plane, a rare week for him to be home. Bill, welcome. Thanks for having me as always. Always great to have you. So give us a quick update. You are out in the field, really touching customer use cases, the real world, and not back at the office talking about products and PowerPoint slides. No one back at the office has any ink in their pen. So we're talking to customers and we're doing projects around airports, how to optimize flow in airports. We're doing another casino project. We're doing a project regarding manufacturing and one on financial services. We've just got projects all over the place. And it's great, other than the fact you've got to fly on an airplane to get to these places. So it is nice to be home this week. So Bill, one of the things that we've been talking about, and we talked about it in the session this morning when we were describing the relationships and the learnings from the data enterprise, from data warehousing to Hadoop and data lakes, is that, you know, at the end of the day there has to be a new touchstone. And that new touchstone is: what's the action that we want to take, and what insights are going to be required, working back from the business impact and the role that decisions and information and data ultimately play in that.
So as you talk to these customers, you have a different perspective. You have a consistent perspective. How do you talk to them about what they need to do? So I like to talk about the anti-Jabberwocky strategy. You know, the Jabberwocky poem, I wish I could rattle it off the top of my head, but the poem is full of all kinds of nonsense words and such, and most of the sales pitches I hear in the big data space sound like the Jabberwocky poem. It's like there's this belief that if I can confuse my customer enough, they'll actually buy from me. When in reality, of course, customers don't buy in confusion, they buy when they have understanding. So our approach is really simple. We want to simplify the conversation, and we start by asking: what is it the organization, from a business perspective, is trying to accomplish? Is it, you know, customer acquisition, improving teacher retention, improving student performance? Is it improving hospital outcomes? You know, what is it you're trying to accomplish? And we really like to focus on this nine to 12 month window. We want to find something where somebody's hair is on fire and there's a sense of urgency. Anything beyond 12 months is a science experiment. So we start with that bullet point. And every business executive knows what they're trying to accomplish. If nothing else, you pick up their annual report and you can read it, right? From that, what we do is we go through a process of identifying the business stakeholders who are impacted by that business initiative, which is usually three, four, five different business functions. And then the real secret of the entire process, and this is what we tell our customers, is to identify the decisions that those stakeholders are trying to make in support of that business initiative. Organizations understand that. They go, okay, I can get that information. Well, data science is all about making better decisions. And you boil down data science. What is data science?
The book Moneyball describes it better than anybody else, right? Data science is about identifying those variables and metrics that might be better predictors of performance. And so we're gonna identify the variables and metrics. We're gonna take the decisions. We're gonna brainstorm the kinds of questions you wanna ask, both the descriptive as well as the predictive questions. That's gonna unleash all the other kinds of data sources we can bring in, like building permits and the Zillow data and traffic and whatever else you could ever imagine. The process is really simple. So when we simplify the discussion, customers are like, oh, I can do that. That makes sense. And to your point, Peter, it gets back to an outcome. What decisions am I trying to make? How am I trying to improve this? And, you know, what data is gonna help me get there? So in many respects, we're talking about literally designing decisions and designing data to serve those decisions, where the decision design process is: what's the outcome, and who's involved in that outcome? That says here's a schema of decisions, so to speak. And then on top of that, we say, what information or what data capital is necessary to put in service to those decisions? It's as simple as that. Why do we overcomplicate this conversation? I have no idea. We wanna jump right into the do-I-need-Spark conversation. And when somebody says, you know, Schmarzo, do I need Spark? I go, I don't know, what are you trying to do? What decision are you trying to make, and how does it tie back to the business? And so if you simplify the conversation, what we're finding when we have these conversations is that customers are engaging us. We are busier than a one-legged man in a NASCAR contest, right? Good things are happening. But it comes not from the Jabberwocky strategy, which you can hear if you walk around the expo hall. It comes from simplifying.
That's funny, because there was this meme, this early days big data meme, and it's kind of the data lake, you know, where you just throw it all in a pile and magically throw some analysis on it and insight is going to pop out. And clearly that's not the case. You can't get someplace if you don't at least start walking in a direction. We talked earlier about this frustration of data scientists, right? Data scientists, of course they're frustrated, because the tools aren't easy enough and there's too much data wrangling going on, but you've simplified their life tremendously when you've said, here's the problem we're trying to solve. And by the way, these are the subject matter experts on the business side who can help you think through the questions and the decisions. When you combine the business subject matter experts with the data scientists, all kinds of magic stuff happens, right? It's unbelievably exciting to be involved in these projects. And the data scientist in many respects, at least I'm gonna test this on you, the data scientist in many respects is kind of the guide through the magic domain. The person that's helping the executive, the domain expert, understand how to move their way through all this data and find the nuggets that make them more productive. I've never thought about it that way, but I think you're spot on. They are the guide. They are the guide helping the business people understand the realm of what's possible. We had this project, and I tried to tell you about it, where one of the most important data sources the client had for determining the future value of a customer was located in a bunch of PDF files, and they'd given up on it, right? Our data science team is sitting in the back room and they're drooling, right? We can screen scrape the PDF files, because they had all these incredible insights from the customer. So the business users had sort of said, that's PDF, we can't use that.
The data scientists, like you said, they're a guide, they say, hey, the data, it's in the PDFs, text files, a bunch of notes, information that's out there, we can get at all that nowadays. So, Bill, there's this other meme that's going on, and I'm curious to get your input from the field, that there just aren't enough data scientists, and that one of the keys to the magic is unlocking access to the data as well as the ability to manipulate the data with the tools, not necessarily just for the data scientists. It's this whole question of, can big data get to the point where today's Excel users are the people manipulating the data and trying to find that insight? What's your reaction to that, and what do you see in the real world? So I think the biggest mistake we make is thinking that data science is one single person. Data science is a team sport, right? You've got data engineers who are responsible for munging the data, you've got data visualizers who are building a lot of the visualizations, you have architects, you have subject matter experts, you have user experience people. So data science is a team sport, and if you start thinking about it as a team sport, then you become less reliant on trying to find that one unicorn data scientist, and you let the data scientists do what they do well, which is to build analytic models, predictive models that tell you which variables are predictive and which ones aren't. And you surround them with a team of people which, by the way, includes the business subject matter expert who knows the decisions, who knows the questions, who knows the metrics, right? If you put that team together, then you get that great synergy. But then how do you accelerate it without kind of the classic old school loop, okay, you know, the data scientist comes back, here you go, not what I wanted, change this, goes back? How do you get a quicker turn, a faster, better iteration in the process?
That's the power of the data lake versus the data warehouse. Because I don't have to worry about a schema, I can bring in Zillow data, I can test to see if it has any sort of predictive power on a student's performance in class, and do that in a matter of hours, right? I can iterate in hours, so I can have the subject matter expert pose these kinds of questions. The data scientist tests it out, working with the rest of the team, the data engineers and the visualizers and such, and comes back and says, that doesn't have any predictive power, right? Or it does. So the data lake allows me to have that very rapid environment, so that the business users aren't throwing an idea at you and then going away for two months, three months while you build a data warehouse to answer that question. But it's very clear today, and this notion comes from data warehousing and we're seeing it now actually starting to happen in some of the data lake conversations, that the idea of grab all the data and put a visualization tool on it is not synonymous with, you know, just grabbing the data, because the visualization tool is not a guide. The visualization tool is, you know, I think about what Eisenhower said so many years ago: it's not the plan, it's the planning. You don't necessarily want what's on the screen, you want how you got there, how you get to that moment at which somebody says, I get it, I understand how to do it. Do you subscribe to that notion? It's the process of doing the data science more than just the visualization of what shows up on the glass. Well, clearly the visualization is like a CAT scan. Right, it's just going to show me what's in there. Right. It doesn't tell me anything, right? My data scientist is using what's in there, using the results of the CAT scan to figure out which variables are in there, which ones might be moving in combination, where my outliers might be, and using those sorts of insights to figure out how do I build the right analytic model?
Because at the end of the day, I have to quantify cause and effect in order to make a better decision. Right, so having the visualization is great, but I'm going to have to take what's in those data sets, or across all these different data sources, and at some point in time actually build a model and then do all that F, p, and t values stuff to measure the goodness of fit. So Bill, we've talked a lot over the past couple of days about the way that developers are going to enter into this whole data world to start finding new ways of unlocking the value of software. We've observed that process-oriented development might not work. Maybe we need to think about model-oriented development as a way of making things work. Are we really talking about the data scientist, in conjunction with this other team, becoming the new developers of capabilities that can be codified as software? Are we talking about, when you talk about iteration, a new agile thinking for how we think about data? Yeah, that's a really interesting point, 'cause we really feel that the user experience people should be on the data science team, and most people don't even think about user experience. But think about when we're delivering recommendations to the pit bosses in the casino about who they should be giving free play money to, right? When we're delivering recommendations, we want the user interface to capture what decisions they made or didn't make, or how they might have modified them, because these mobile devices, they're a two-way street. And so it isn't just about the data scientist throwing ideas out at people. We gotta be able to measure how effective the decisions were, which requires the subsequent capture of data. Maybe somebody's putting in notes that say, that decision didn't work in this situation, I tried this instead, right? It's gotta be this sort of two-way street, and so... And it persists.
It's something that lasts over an extended period of time. Oh my gosh, yes, like forever. Not three to six weeks, but forever, right? You wanna know economic impact, right? When the world goes through a life is good, life sucks, life is good, life sucks sort of cycle, which sometimes takes five to seven years, you need to have 14, 21 years of data to figure out what somebody does. What are their behaviors when there's an economic change? You can't do that by having 24 months of aggregated data in a data warehouse. You gotta have seven, 14, 21 years of detailed data so I can see that when this happens, people change their buying patterns. They buy less expensive goods. They go to less expensive hotels. They take public transportation more than they take Ubers, right? They change patterns. And so you gotta have that detail out there that allows you to tease out all that data and drive those kinds of interactions. So another concept we've been talking a lot about, to shift gears a little bit, is really the value of the data. And data as an asset, as Peter likes to say: every company's got a bunch of assets they can deploy, they've got processes, they've got technology, they've got people. How do they put that together and make money and build their business? So it's interesting on the value of the data. Whether you know it's valuable or not, obviously some stuff is obvious, but we talked a little bit about the airport example, where maybe I had no idea what the real value of that data is. So how are people figuring out the value of the data, allocating resources against it because they know it's of value, discovering new wealth, if you will, within data that they didn't know was out there? What are some of the stories you can share from the field? And it also fuels the question, if I need to go out and buy data, how much am I willing to pay for that data?
And so, by the way, this session today is the impetus for me to finally finish a blog I've been working on forever, which is, what's the economic value of data? And in my very limited view of the world, I think you tie the value of the data back to the decisions you're trying to make, which tie back to your business initiative. Take my ability to improve customer activation: at one of the casinos we're working with, 48.2% of the people who sign up for a player's card only come once, 48.2. If I can drive that down to 46, that's worth $40 million, right? So I know I've got a problem worth $40 million. I know the decisions I can make to help support that. Now I've got the basis for making decisions regarding, okay, how important is this data that I can go out and get on people? Maybe Acxiom data, maybe other sorts of data sources I can pull in. So I think by taking that top-down approach, it gives me the basis for making some range of economic decisions on the value of the data. Again, motivation for me to finish that blog I've been hanging onto for three years, but I don't know any other way to approach that problem. Well, and on top of that, and you already mentioned it, that's the economic value of the data for that decision. Yes. One of the things that makes data so interesting is that data that was valuable for that decision is, in unexpected ways, gonna become valuable for another decision later. Great example. So we're doing this project for a very, very large worldwide international organization, and they're running a bunch of projects for how to fuel big data ideas, right? And so they've got a bunch of submissions: they wanna bring in tourism data to predict economic growth, they wanna bring in big ticket sales, they wanna bring in shipping data and things like that.
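The casino arithmetic quoted earlier in this exchange can be worked through as a back-of-the-envelope sketch in Python. Only the 48.2%, 46%, and $40 million figures come from the conversation. The assumption that value scales linearly with each percentage point, the helper names, and the 0.5-point contribution of a purchased data source are hypothetical.

```python
def value_per_point(baseline_pct, target_pct, total_value):
    """Dollar value of each percentage point of improvement,
    assuming (hypothetically) that value scales linearly."""
    return total_value / (baseline_pct - target_pct)

def data_spend_ceiling(expected_points, per_point_value):
    """Upper bound on what a data source is worth if it is expected
    to contribute `expected_points` of the improvement."""
    return expected_points * per_point_value

# Figures from the conversation: one-time visitors fall from 48.2% to 46%,
# and that improvement is worth $40 million.
per_point = value_per_point(48.2, 46.0, 40_000_000)

# Hypothetical: a purchased data source is expected to deliver 0.5 points.
ceiling = data_spend_ceiling(0.5, per_point)

print(f"Value per percentage point: ${per_point:,.0f}")
print(f"Ceiling for a 0.5-point data source: ${ceiling:,.0f}")
```

The point of the sketch is the direction of the reasoning: start from the business initiative, price the decision, and only then ask what a given data source is worth.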
And one of the things we realized as we went through this is that if you bring in, say, tourism data that's being used to solve that one particular decision, it has the advantage of being usable across a whole number of other decisions. It does something that's hard to do with dollars, right? And with people. It's something that can actually replicate in value. And so you have almost a network effect when you put the data in, you have that one plus one equals three equals nine equals, et cetera, et cetera, that sort of just grows from it. And you don't see that with dollars and people, but with data, you can certainly do that. It has that multiplier effect. And I would say that that is essential, that is the essence of digital business. It's thinking that way. So, good point, thinking. Thinking is the hardest challenge. When I talk to my customers, it's getting them to think differently about their approach. You know, in my book, I talk a lot about, and in my class at USF, I teach a lot about, how do you get the business people to think like a data scientist? How do you get them to embrace using data and analytics to help power their business? It isn't the technology that's getting everybody all balled up. It's how do I think differently about the economic impact of data and how I'm leveraging data to drive the analysis, to drive the decisions that support my key business initiatives? And what's the hard part of getting them to think that way? What's the part that they're not used to? Is it just thinking, I have more data than I think I have because I've never actually explored it? I mean, I would imagine people are beyond the gut versus data decision making. What is the key component of thinking like a data scientist that most people don't do?
It's cultural. It's allowing people to have ideas and not be throttled by the HiPPO in the organization, the highest paid person's opinion, right? So when we've done these facilitation workshops, we always start by saying, all ideas are worthy of consideration. No one's allowed to pull rank and say, oh, that's a stupid idea. Because if you wanna kill creative thinking, have some senior person go out there and say, oh, that's a stupid idea, and then no one's gonna volunteer anything. And so I think culturally, you have environments where some organizations think that the only smarts in the organization are at the very top of the organization. When in reality, the smarts are located at the point of customer engagement. Sort of to validate that, we're seeing the biggest success in big data in small to medium sized businesses, because they don't have the luxury of having a bunch of HiPPOs running around telling everybody what to do. They have to drive this cross-organizational collaboration, especially between IT and the business, to figure out how do we get more value? Data is especially disruptive inside the organization. The politics, and I talked yesterday about this notion of management by astrology: I have the model, I know the right thing to do, don't let data change my mind. I think there's a lot of that that goes on in the organization. Well, I would tell you that I think data silos are a political issue, not a technology issue. People have their data and by God, they're not gonna share it. Well, Bill, as always, thanks for stopping by. They're giving us the time hook. We love to have you on. I'll give you the last word, anything in the real short term that you wanna highlight that you're working on? Yeah, actually, I do wanna share something. I have the very good fortune to be a part of a very select group inside EMC called the Pathfinders.
And it's a select number of SEs who have been chartered to change the way EMC does business. And so, I told them I'd give them a call out here, because they have been an incredible force trying to help drive, inside of EMC, as you said, inside the key battleground, how we go to market and how our sales teams are engaging with our customers. So the Pathfinder group has been a marvelous team to be a part of. They've taught me more than I've taught them. Well, that's great. Thanks for sharing that. Because it's really important, and we talk about it with everyone that we have on. How can you tell your customers to transform their business if you're not transforming your own, if you're not looking inside and trying to apply all these things internally? Amen, brother. All right, well, Bill Schmarzo, thanks for stopping by. As always, the Dean of Big Data here with Peter Burris. I'm Jeff Frick. You're watching theCUBE. We're live in San Jose, California at Big Data SV, part of Big Data Week. We'll be back with our next guest after this short break. Thanks for watching.