Produced from theCUBE Studios, this is Strong by Science: in-depth conversations about science-based training, sports performance, and all things health and wellness. Here's your host, Max Schmarzo. All right, thank you guys for tuning in today. I have the one and only Dean of Big Data, the man, the myth, the legend: Bill Schmarzo, also my dad, the CTO of IoT and Analytics at Hitachi Vantara. He has a very interesting background, because he's known as the Dean of Big Data, but also the king of the court and all things basketball related when it comes to our household. And unlike most people in the data world, I don't wanna say most as an umbrella term, but some, Big Bill has an illustrious sports career, having played at Coe College, the Harvard of the Midwest, my alma mater as well. I think having that background of not just computer science, but multiple disciplines, your basketball career, and obviously the career you're on now, plays a huge role in being able to interpret and take multiple domains and put them into one. So thank you for being here, Dad. Thanks, Max. That's a great introduction, I really appreciate that. No, it's wonderful to have you. And for our listeners who are not aware, Bill is, I'm referring to him as Bill, he's my dad, but if I call him my dad the whole time, it's gonna drive me crazy. Bill has a mind that thinks not like most. When he sees things, he thinks about them not just in terms of the single trajectory they could take, but the multiple directions they can go, both vertically and horizontally. And when we talk about data, data is something so commonly brought up in sports, so commonly brought up in performance and athletic development. Big data is probably one of the biggest catchphrases or hot words that people have nowadays.
But it doesn't always have a lot of meaning to it, because a lot of times we get the word big data and then we don't have action out of big data. And Bill's specialty is not just big data, but getting action out of big data. With that going forward, I think a lot of this talk will be about how to utilize big data, how to utilize data in general, how to organize it, how to put yourself in a situation to get actionable insights. And so just to start it off, Bill, can you talk a little bit about your background, some of the things you've done, and how you've developed the insights that you have? Thanks, Max. I have kind of a, I don't wanna say deep background, but I've been doing data and analytics a long time. And I was very fortunate, one of those forced moments in life, where in the late 1980s I was involved in, well, I ran, a project at Procter & Gamble where we brought in Walmart's point-of-sale data for the first time into what we would now call a data warehouse. For many, this became the launching point of the data warehouse and BI marketplace, and we can trace the origins of many of the BI players to that project at Procter & Gamble in '87 and '88. And I spent a big chunk of my life as a big believer in business intelligence and data warehousing, trying to amass data together and use that data to report on what's going on and provide insights. I did that for 25 years of my life until, as you probably remember, Max, I was recruited out of Business Objects, where I was the vice president of analytic applications. I was recruited out of there by Yahoo. And Yahoo had a very interesting problem, which is they needed to build analytics for their advertisers to help those advertisers optimize their spend across the Yahoo ad network.
And what I learned there, in fact what I unlearned there, was that everything I had learned about BI and data warehousing, how you constructed data warehouses, how schema-centric you were, how everything revolved around tabular data, at Yahoo there was an entirely different approach. I had my first introduction to Hadoop and the concept of a data lake. That was my first real introduction to data science and how to do predictive analytics and prescriptive analytics. In fact, it was such a huge change for me that I was asked to come back to TDWI, The Data Warehouse Institute, where I had been teaching for many years, and asked to do a keynote after being at Yahoo for a year or so to share my observations. What did I learn? And I remember I stood up there in front of about 600 people and started my presentation by saying, everything I've taught you the past 20 years is wrong. And, well, I didn't get invited back for 10 years, so that probably tells you something. But it was really about unlearning a lot of what I had learned before. And probably, Max, one of the aha moments for me was this: BI was very focused on understanding the questions that people were trying to ask and answer. Data science is about understanding the decisions they're trying to take action on. Questions by their very nature are informative, but decisions are actionable. And so what we did at Yahoo, in order to really help our advertisers optimize their spend across the Yahoo ad network, was focus on identifying the decisions the media planners and buyers and the campaign managers had to make around running a campaign. You know, how much money to allocate to which sites, how many conversions do I want, how many impressions do I want.
So we built predictive analytics around all those decisions, so that we could deliver prescriptive actions to these two classes of stakeholders, the media planners and buyers and the campaign managers, who had no aspirations of being analysts. They're trying to be the best digital marketing executives or people they could possibly be. They didn't want to be analysts. So that sort of leads me to where I am today. My teaching, my books, my blogs, everything I do is very much around how we take data and analytics and help organizations become more effective. So everything I've done since then, the books I've written, the teaching I do at the University of San Francisco and, next week, at the National University of Ireland in Galway, and all the clients I work with, is really about how we take data and analytics and help organizations become more effective at driving the decisions that optimize their business and operational models. It's really about decisions and how we leverage data and analytics to drive those decisions. So how would you define the difference between a question that someone's trying to answer versus a decision that they're trying to be better informed on? So here's how I'd put it. I call it the SAM test, S-A-M: is it strategic? Is it actionable? Is it material? You can ask questions that are provocative, but you might not ask questions that are strategic to the problems you're trying to solve. You may not be able to ask questions that are actionable in the sense that you know what to do. And you don't necessarily ask questions that are material in the sense that the value of answering that question is greater than the cost of answering it. And so when I apply the SAM test to data science and decisions, I start mining the data knowing what decisions are most important. I've gone through a process to identify those decisions, validate their value, and prioritize them.
I understand what decisions are most important. Now, when I'm starting to dig through the data, all the structured and unstructured data across a number of different data sources, I'm trying to codify patterns and relationships buried in that data, and I'm applying the SAM test against those insights. Is it strategic to the problem I'm trying to solve? Can I actually act on it? And is it material in the sense that it's more valuable to act than it costs to create the action? So that, to me, is the big difference: by their very nature, decisions are actionable. I'm trying to make a decision. I'm going to take an action. Questions by their nature are informative, interesting; they could be very provocative. Now, questions have an important role, but ultimately questions do not necessarily lead to actions. So if I'm a sport coach, I'm running a professional basketball team, some of the decisions I'm trying to make are: what program best develops my players? What metrics will help me decide who the best prospect is? Is that the right way of looking at it? Yeah, so we did an exercise at USF where the students worked through what decisions Steve Kerr needs to make over the next two games, right? And we went through an exercise of identifying, especially, in-game decisions. So how often are you going to play somebody? How long are they going to play? What are the right combinations? What are the kinds of offensive plays that you're going to try to run? So there's a bunch of decisions that Steve Kerr, as coach of the Warriors, for example, needs to make in the game to not only try to win the game, but to also minimize wear and tear on his players. And by the way, that's a really good point to think about: good decisions are always a conflict of competing objectives. Win the game while minimizing wear and tear on my players.
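The SAM test described above can be sketched as a simple filter over candidate insights. This is a minimal illustration, not an implementation from the interview: the Insight class and its fields are my own rendering, with "material" taken as value-of-acting exceeding cost-of-acting.

```python
# Hypothetical sketch of the SAM (Strategic, Actionable, Material) test.
from dataclasses import dataclass


@dataclass
class Insight:
    description: str
    strategic: bool     # does it bear on the problem we're trying to solve?
    actionable: bool    # do we know what to do with it?
    value: float        # estimated value of acting on it
    cost_to_act: float  # cost of creating the action around it


def passes_sam_test(insight: Insight) -> bool:
    """An insight survives only if it is strategic, actionable,
    and material (the value of acting exceeds the cost of acting)."""
    material = insight.value > insight.cost_to_act
    return insight.strategic and insight.actionable and material


# A provocative but immaterial insight gets filtered out.
i = Insight("fans cheer louder on Fridays", strategic=True,
            actionable=True, value=100.0, cost_to_act=5000.0)
print(passes_sam_test(i))  # False: not material
```

The point of the filter is the order of operations: decisions are prioritized first, and only then are mined insights tested against them.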
All the important decisions in life have two, three, or four different variables that may not be exactly the same, which is where data science comes in. Because data science is going to look across those three or four metrics against which you're going to measure success, and it tries to figure out the right balance of those given the situation you're in. So going back to the decision about playing time, think about all the data you might want to look at in order to optimize that. When's the next game? How far are they into the season? Where do they currently sit ranking-wise? How many minutes per game has player X been playing? Looking over the past few years, what's their maximum? So there aren't a lot of new decisions that people are trying to make. And by the way, the beauty of decisions is that the decisions really haven't changed in years. What's changed is not the decisions, it's the answers. And the answers have changed because we have this great bounty of data available to us: in-game performance, health data, DNA data, all kinds of other data. And then we have all these great advanced analytic techniques, neural networks, unsupervised and supervised machine learning, all this great technology now that can help us uncover those relationships and patterns buried in the data, which we can use to help individualize those decisions. One last point there, the point at the end: when people talk about big data, they get fixated on the big part, the volume part. It's not the volume of big data that I'm going to monetize, it's the granularity. And what I mean by that is I now have the ability to build very detailed profiles; going back to our basketball example, I can build a very detailed performance profile on every one of my players. So for every one of the players on the Warriors team, I can build a very detailed profile that details out their optimal playing time.
How much time should they spend before a break on the court, right? What are the right combinations of players in order to generate the most offense or the best defense? I can build these very detailed individual profiles, and then I can start putting them together to find the right combination. So when we talk about big, it's not the volume that's interesting, it's the granularity. Gotcha. And what's interesting from my world is, when you're dealing with marketing and business, a lot of that, whether it's a company trying to find out more about your customers or a startup trying to learn what product you should develop, there's tons of unknowns. And a lot of big data, from my understanding, can help you better understand some patterns within customers, how to market. In your book, you talk about, oh, we need to increase sales at Chipotle because we understand X, Y, and Z are occurring around us. Now, in the sports science world, we have our friend called Science. And science has already helped us identify certain metrics that are very important and correlated with different physiological outcomes. So it almost gives us a shortcut, because in the big data world, especially when you're dealing with the data that you guys are dealing with and trying to understand customer decisions, each customer is an individual and you're trying to combine them all together to find patterns; no one's doing science on that, right? It's not like lab work, where someone is understanding muscle protein synthesis and the amount of nutrients you need to recover from it. So in my position, I have all these pillars that maybe already exist where I can begin my search, though there are still a bunch of unknowns. With that kind of environment, do you take a different approach, or do you still go with the, I guess, large encompassing approach and collect everything you can and siphon after? Maybe I'm totally wrong, I'll let you take it away. No, it's a good question.
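The "granularity over volume" idea above amounts to keeping one small profile per player rather than one aggregate model for the team. Here is a minimal sketch under my own assumptions; the field names (optimal_minutes, minutes_before_break, best_pairings) are illustrative, not an actual team's schema.

```python
# Hypothetical per-player performance profiles: the granularity, not
# the volume, is what makes the data usable for individualized decisions.
from dataclasses import dataclass, field


@dataclass
class PlayerProfile:
    name: str
    optimal_minutes: float       # target playing time per game
    minutes_before_break: float  # stint length before a rest
    best_pairings: list = field(default_factory=list)


def build_roster_profiles(rows):
    """Turn raw per-player rows into one profile per player."""
    return {r["name"]: PlayerProfile(r["name"],
                                     r["optimal_minutes"],
                                     r["minutes_before_break"],
                                     r.get("best_pairings", []))
            for r in rows}


roster = build_roster_profiles([
    {"name": "Player A", "optimal_minutes": 34, "minutes_before_break": 8},
    {"name": "Player B", "optimal_minutes": 22, "minutes_before_break": 6,
     "best_pairings": ["Player A"]},
])
print(roster["Player B"].best_pairings)  # ['Player A']
```

Once profiles exist at this grain, "putting them together to find the right combination" becomes a search over pairings rather than a single team-level average.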
And what's interesting about that, Max, is that the human body is governed by a series of laws, we'll say. Kinesiology and the things you've talked about, physics, they have laws. Humans as buyers, shoppers, travelers, we have propensities, we don't have laws, right? I have a propensity to try to fly United because I get easier upgrades, but I might fly Southwest because of schedule convenience, right? I have propensities, I don't have laws. So you have laws that work to your advantage. What's interesting about laws: if you start going into the world of IoT, into this concept called digital twins, they're governed by the laws of physics. I have a compressor or a chiller or an engine, and it's got a bunch of components that have been engineered together, and I can actually apply the laws, I can actually run simulations against my digital twins to understand exactly when something is likely to break. What's the remaining useful life in that product? What's the severity of the maintenance I need to do on it? The human body, unlike the human psyche, is governed by laws. Human behaviors are really hard, right? Las Vegas is built on the fact that human behaviors are so flawed. But body physics, like the physics that run these devices, you can actually build models and run simulations to figure out exactly what's the wear and tear and what's the envelope you can operate in. Gotcha, yeah. So that's when, from our world, you start looking at subsystems, and you say, okay, this is your muscular system, this is your autonomic nervous system, this is your central nervous system. These are ways that we can begin to measure it. And we wrote a blog on this, a stress response model, where you understand these systems and their inferences for the most part, and then you apply a stress and you see how the body responds.
And then you determine, okay, well, if I know the body can only respond in a certain number of ways, it's either compensatory, it's returning to baseline, or it might be maladaptation; there are only so many ways, when you look at a cell at an individual level, that that cell can actually respond. And it's the aggregation of all these cellular responses that ends up manifesting in a change in a subsystem, and that subsystem can be measured inferentially through certain technology that we have. But I also think, at the same time, we make a huge leap, and that leap is the word inference, right? We're making an assumption, and sometimes those assumptions are very dangerous, because if that assumption is unknown and we're wrong about it, then we sway and miss a little bit on our whole projection. So I like the idea of looking at patterns and looking at the probabilistic nature of it. And I've actually changed my view a little bit from when we first talked about this; I was much more hardwired on laws, but I now think it's a law with some level of variation or standard deviation in it, so we have guardrails instead. So that's how I think about it personally. Is what I said on the right track, or how would you approach it? Yeah, actually there are a lot of similarities, Max. So, your description of the human body made up of subsystems: when we talk to organizations about things like smart cities or smart malls or smart hospitals, a smart city is made up of a series of subsystems, right? I've got subsystems regarding water and wastewater, traffic, safety, local development, things like this. There's a bunch of subsystems that make a city work. And each of those subsystems is comprised of a series of decisions, or clusters of decisions, which equal use cases around what you're trying to optimize.
So if one of my subsystems is traffic flow and I'm trying to improve it, there are a bunch of use cases there about where do I do maintenance, where do I expand the roads, where do I put HOV lanes, right? So if you start taking apart the smart city into its subsystems, and then know the subsystems are comprised of use cases, that puts you in a really good position. Now, here's something we did recently with a client who was trying to think about building the theme park of the future. How do we make certain that we really have a holistic view of the use cases we need to go after? It's really easy to identify the use cases within your own four walls, but digital transformation in particular happens outside the four walls of an organization. And so what we're doing is a process where we're building journey maps for all their key stakeholders. So you've got a journey map for a customer, you have a journey map for operations, you have a journey map for partners, and such. You build these journey maps and you start thinking about, for example, I'm a theme park, and at some point in time my guest slash customer is going to have an epiphany: they want to go do something, they want to go on vacation, right? At that point in time, that theme park is competing against not only all the other theme parks, but against Major League Baseball, against going to the beach on Sanibel Island, against just hanging around, right? They're competing at that point. And if they only start engaging the customer when the customer has actually contacted them, they miss a huge part of the market. They miss a huge chance to influence that person's agenda.
And so one of the things to think about, and I don't know how this applies to your space, Max, but as we started thinking about smart entities, we used design thinking and customer journey maps as a way to make certain that we're not fooling ourselves by only looking within the four walls of our organization; that we're knocking those walls down, making them very porous, and looking at what happens before somebody engages with us and even afterwards. So again, going back to the theme park example: once they leave the theme park, they're probably posting on social media about what kind of fun they had or didn't have, they're probably making plans for next year, they're talking to friends, and other things. So there's a bunch of stuff, we're gonna call it afterglow, that happens after the event, and you wanna make certain that you're part of influencing that. So again, I don't know, when you combine the data science of use cases and decisions with the design thinking of journey maps, what that might mean for your business, but for us, in thinking about smart cities, it's opened up all kinds of possibilities. And most importantly for our customers, it's opened up all kinds of new areas where they can create new sources of value. So anyone listening to this needs to understand that when the word client or customer is used, it can be substituted for athlete. And what I think is really important, when we hear you talk about the amount of infrastructure you build for an idea when you approach a situation, is something that sports science, in my opinion, especially across multiple domains, is truly lacking. What happens is we get a piece of technology and someone says, go do science. Whereas you're taking the approach of: let's actually think out what we're doing beforehand. Let's determine our key performance indicators.
Let's understand maybe the journey that this piece of technology is gonna take with the athlete, or how the athlete's gonna interact with this piece of technology throughout their four years. If you're in the private sector, right, that afterglow effect might be something you refer to as client retention: their ability to come back over and over and spread your word for you. If you're in a sector with student athletes, maybe it's those athletes talking highly about your program to help with recruiting, and understanding that developing athletes is gonna make that college, or that program, or that organization more enticing to go to. But what really stood out was the fact that you have this infrastructure built beforehand. And the example I've given, I've spoken with a good number of organizations and teams about data utilization, is that if you were all of a sudden dropped in the middle of the woods and someone said, go build a cabin, and I was in a giant forest where I could use as much wood as I want, I could just keep chopping down trees until I had a shelter of some sort. Even I could probably do that. But if someone said, you know what, you have three trees to cut down to make a cabin, you're gonna become very efficient, and you're gonna think about each chop and each piece of wood and how it's going to be used: your interaction with that wood in conjunction with that wood's interaction with you. And so when we start looking at athlete development, or client retention, or general health and wellness, it's not just, oh, this is a great idea, we wanna make the world's greatest theme park, we wanna make the world's greatest training facility. What infrastructure and steps do you need to take? And you said stakeholders. So what individuals am I working with? Am I talking with the physical therapist? Am I talking with the athletic trainer? Am I talking with the skill coach?
How does the skill coach want the data presented to them? Maybe that's different from how the athletic trainer is gonna have the data presented to them. Maybe the sport coach doesn't wanna see the data unless something, a red flag, comes up. So now you have all these different entities, just like how you're talking about developing this customer journey throughout the theme park and making sure they have an experience that's memorable, causes an afterglow, and really gives that experience meaning. How can we now take data and apply it in the same way, so we get the most value, like you said, on the granular aspect of data, and really turn that into something valuable? Max, Max, you said something really important. Let me share one of many horror stories that comes up in my daily life, which is somebody walking up to me and saying, hey, I got a client, here's their data, go do some science on it. Like, well, what the heck, right? So we created this thing called the Hypothesis Development Canvas. Our sales teams hate it. Our data science teams love it, because we do all this pre-work. We make sure we understand the problem we're going after, the decision they're trying to make, the KPIs against which you're gonna measure success and progress, the operational and financial and business benefits, and the data sources we want to consider. Here's something, by the way, that's important, that maybe I wish Boeing would have thought more about: what are the costs of false positives and false negatives? Do you really understand where your risk points are? And the reason why false positives and false negatives are really important in data science is because data science is making predictions. And by virtue of making predictions, we are never 100% certain whether they're right or not. So predictions have to be built on good enough. Well, when is good enough good enough?
And a lot of the determination as to when good enough is good enough is really around the cost of false positives and false negatives. Think about a professional athlete, right? The ramifications of overtraining a professional athlete like a Kevin Durant or a Steph Curry, and they're out for the playoffs: there are huge financial implications, both personally and for the organization. So you really need to make sure you understand exactly what's the cost of being wrong. And so with this hypothesis development canvas, we do a lot of this work before we ever put science to the data. Yeah, it's something that's lacking across not just sports science, but many fields. And what I mean by that, especially as you refer to the hypothesis canvas, is that it's a piece of paper that provides a common language, right? You can set it out beforehand. For listeners who aren't aware, the hypothesis canvas is something Bill has worked to develop with his team. It's about 13 different squares and boxes, and you can manipulate it based on your own profession and what you're diving into. But essentially it goes through the infrastructure you need to have set up in order for a hypothesis or idea or decision to actually be worth a damn. And what I mean by that is that so many times, I hate this, but I'm gonna go on a little bit of a rant and I apologize, people think, oh, I get an idea. And they think Thomas Edison all of a sudden just had an idea and he made a light bulb. Thomas Edison is famous for saying he didn't just make a light bulb; he learned 9,000 ways not to make a light bulb. And what I mean by that is he set up an environment that allowed for failure and allowed for learning. What happens often is people think, oh, I have an idea. They think the idea comes not just in a flash, because it often doesn't; it might come from some research. But they also believe that it comes with legs, that it comes with the infrastructure supported around it.
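The "when is good enough good enough" question above can be made concrete: pick a prediction threshold by weighing the cost of a false positive (say, resting a healthy player) against the far larger cost of a false negative (playing an at-risk player who then gets hurt). This is a sketch under illustrative assumptions; the risk scores and cost figures are invented for the example, not from the interview.

```python
# Hypothetical cost-weighted threshold selection for a risk prediction.

def expected_cost(threshold, predictions, cost_fp, cost_fn):
    """predictions: list of (predicted_risk, actually_injured) pairs."""
    cost = 0.0
    for risk, injured in predictions:
        flagged = risk >= threshold
        if flagged and not injured:
            cost += cost_fp   # false positive: rested a healthy player
        elif not flagged and injured:
            cost += cost_fn   # false negative: played a player who got hurt
    return cost


preds = [(0.9, True), (0.7, False), (0.4, True), (0.2, False)]

# A missed injury is assumed far costlier than an unnecessary rest day.
best = min((expected_cost(t, preds, cost_fp=1.0, cost_fn=20.0), t)
           for t in (0.1, 0.3, 0.5, 0.7, 0.9))
print(best)  # (1.0, 0.3): a low threshold wins when misses are expensive
```

Flip the cost ratio and the chosen threshold moves, which is the point: the model is unchanged, but "good enough" is defined by the asymmetric cost of being wrong.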
And that's kind of the same way I see a lot of the data aspect going in our field: we get an idea, we immediately implement it, and we hope it works, as opposed to setting up a learning environment that allows you to go, okay, here's what I think might happen, here's my hypothesis, here's how I'm going to apply it. And now, if I fail, because I have the infrastructure pre-mapped out, I can look at my infrastructure and say, you know what, that support beam, that individual box itself, was the weak link; we made a mistake here, but we can go back and fix it.
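The pre-work discussed above, filling in the canvas before any science touches the data, can be sketched as a simple checklist object. This is my own rendering, not Schmarzo's actual template: the field list only paraphrases the elements mentioned in the conversation (decision, KPIs, benefits, data sources, costs of being wrong), and a real canvas has more boxes.

```python
# Hypothetical Hypothesis Development Canvas as a pre-work checklist.
from dataclasses import dataclass, field


@dataclass
class HypothesisCanvas:
    decision: str                          # the decision we support
    kpis: list = field(default_factory=list)
    business_benefits: list = field(default_factory=list)
    data_sources: list = field(default_factory=list)
    cost_false_positive: str = ""          # cost of acting when we shouldn't
    cost_false_negative: str = ""          # cost of not acting when we should

    def gaps(self):
        """Flag sections still blank before any science hits the data."""
        return [name for name, value in vars(self).items() if not value]


canvas = HypothesisCanvas(
    decision="How many minutes should each player log tonight?",
    kpis=["minutes played", "soft-tissue injury rate"],
    data_sources=["in-game tracking", "wellness surveys"],
)
print(canvas.gaps())
```

A non-empty gaps() list is the "support beam" check from the cabin analogy: the weak box is visible before the build starts, so a failure points back to a specific section rather than to the whole idea.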