Hello, and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVercity. We'd like to thank you for joining the latest installment of the DataVercity Webinar Series, Data Insights and Analytics, brought to you in partnership with First San Francisco Partners. Today, John and Kelly will be discussing descriptive, predictive, and prescriptive analytics. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag DI analytics. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our speakers for today. Well-known industry analyst John Miley is a business technology thought leader and recognized authority in all aspects of enterprise information management, with 30 years' experience in planning, project management, improving IT organizations, and the successful implementation of information systems. He is the president and chief delivery officer at First San Francisco Partners. Also joining us is Kelly O'Neill. Kelly is the founder and CEO of First San Francisco Partners. Having worked with the software and systems providers key to the formulation of enterprise information management, Kelly has played important roles in many of the groundbreaking initiatives that confirmed the value of EIM to the enterprise. Recognizing an unmet need for clear guidance and advice on the intricacies of implementing EIM solutions, she founded First San Francisco Partners in early 2007. And with that, I will turn it over to John and Kelly to get today's webinar started. Hello and welcome.
Hello. Hello. I hope everyone is doing fine out there. Good morning, good afternoon, good evening, wherever you may be this day. Today we're talking about analytics in a bit of detail: to be more precise, descriptive, predictive, and prescriptive analytics. Today's offering will be a bit on the educational side, and a bit on the side of putting things in perspective with our annual topic this year of data insight and analytics. Before that, we have a couple of our poll questions, which we like to ask about this very dynamic and busy discipline. So first of all, get your mouses ready, or your mice, whatever. What type of statistical analysis do you use or plan to use? You can choose multiple answers. We will allow about 30 seconds for this one. Then we'll take a look at that and move on to the next one while everyone is typing in their answers. You don't have to be a data scientist to listen to this today. In fact, we kind of hope that you're one of those people who will be supporting data scientists, or thinking about supporting them, or peripherally involved, or looking over the cubicle wall at things and going, I don't understand a lot of what's going on. Sometimes you feel like you should, and that's who we're pointing our little session at here today. We have, I think, do we have an answer? Yes, we do. And "no answer" is the biggest one. And we have a dead-on tie between descriptive and predictive. Very, very interesting findings there. Not too far away from "I don't know." Okay, let's go on to our next question then. And if you don't want to answer one of these, we're going to ask you why you don't want to answer, and then you're going to have to answer. All right: how frequently do you use statistical analysis in your work? So if you don't use it, put that there; otherwise less than once a week, once or twice a week, once a day, whatever. Please answer that, and we'll let the time race by for that one as well.
And then we will get started here. Now, when I used to be in radio, this is where you did the weather forecast, but it would be a whole lot of different weather forecasts at one time. So I think we're about ready to go here on the results. We'll take a look at that and away we go. And we had some questions on the first one; someone was asking questions for you, Shannon: the poll didn't seem to work, right? So we're ready to go here. I don't see the answer. I'll keep an eye on it, but we'll see it go. We have a lot here. There we go. Most of the people aren't using them now. Okay, so that's very, very good. So we're in an educational mode today, and that's kind of what we were thinking. We're going to start with an overview and a definition. Then we're going to take a slightly deeper dive into all three of these things, which is what we're supposed to do, and we're going to follow with some examples. We're going to pick on the retail industry today because that's something a lot of people understand. And then we're hopefully going to give you folks some takeaways to take back with you at the end of this webinar. So let's get going here. Let's talk about the process, first of all, of statistical analysis. Because when you do this type of work, what you're really saying is: we need an answer to something, but we don't have the resources to get all of the data and have a comprehensive understanding of the situation. So we're going to take a subset and make statistical inferences. And there are a lot of different ways to do that. No matter which way you do it, however, there is a sort of method to this. And that is to have a hypothesis. There are two possible results of testing your hypothesis: the null and the alternative. We need to have a data source, and then we need to test our hypothesis.
And I'm going to go over this process just briefly, because the thought process is very important if you're trying to understand this line of thinking and what data scientists are doing. I think you'll find some surprises today versus some of the common perceptions of data science and statistical analysis. So our first step here is the hypothesis. And the null result means that while we thought we had an answer, whatever differences we see are really just down to chance. The alternative is that the differences are real, and they are going to have a bearing on our conclusion. We're going to use the example of a retail chain, and our hypothesis is: if we train our sales associates better, we'll get more sales. So we're going to have a sample of data to work with, and we're going to think about what might happen. Now, if we have experiment one with a null result of no difference in the amount sold between those who were trained and those who weren't, then we would say, okay, it doesn't matter. But we might notice in that experiment that there is some difference in the amount sold between the people who had some training and the people who didn't. At that point we know there's a difference; we don't know yet whether that difference is good or bad, but we know it's a difference. In the second experiment, we shift our view, our hypothesis, to say that the trained salespeople did better, and we focus our analysis on whether, if we did train them, they sell more on average. So there's a difference between just comparing amounts sold and having some indication that, on some kind of average, they sell more. The takeaway from this slide, with the hypothesis, is that you have to understand exactly what it is you're looking for.
It isn't just that the answer is going to leap out at you. You have to understand the type of answer that you're looking for. Secondly, what are the appropriate data sources? First and foremost, yes, we're talking data insight and analytics. This year the theme is big data-ish, but that's not necessary. You can do this without big data, and it has been done for many, many decades without it; big data was preceded by many, many years of statistical analysis. It's also about the data that you don't need. There is a tendency to collect every possible thing and then dive in and hope there's an answer. But the data scientist isn't going to look at everything all the time. There is going to be some setting aside of data to use, and there might be consideration of external data. The key here, like any other data insight or analytics or BI or data warehouse effort, is: what do you need it for? What condition is it in? What's it going to cost us to handle it? All of those types of things. And that's kind of an easy one for most of us data folks. The last one is: we have this hypothesis, we understand the type of null or alternative result that we're going to get, so now we have to figure out how confident we are. And this is where we get a little bit into the statistics. You can reach some incorrect conclusions if you don't have a realistic view of your confidence. So we're going to look at our example here, and we're going to pick a 95% confidence level. Now, we don't ever pick 100% for a confidence level, because if we needed the data to be 100% certain, we wouldn't be doing statistical inference; we'd have to look at all the data and then have an absolutely firm conclusion. So what we're saying is: if we can get within 95%, we're going to accept that, and that's going to give us a bit of a margin of error, a margin for error, and we'll be able to work within that. We have a couple of errors that can happen.
We reject something that we shouldn't have, or we don't reject something we should have. It takes a little bit of training to understand how to handle that, but you're going to hear those terms, Type I and Type II errors, from the data scientists. And the key is: which type of error is more detrimental to your investigation? Then you study your data accordingly. Well, let's take a quick look at our example here and keep moving along. With our confidence level, there are a lot of numbers on this slide, and you don't need to understand the whole thing. The key thing here is that you have .05. That's something called the alpha. Notice that 100% minus our 95% confidence level gets you 5%, or .05. And then the Sig. (2-tailed) value: if, through the analysis, that significance value comes in under our .05, whatever the difference is in the run of the variances here, we know, because of some statistical theory, that the data is supporting our hypothesis. Now, again, we're not going to dive into a lot of this in detail today. It's just to show you an example that says, wow, if you look at the bar chart, if we train people, we're going to get an uptick in sales. And that's pretty good. Now, this doesn't replace common sense. Common sense would say, well, at the same time we did this, there was this wonderful new product and everyone flocked to the store to get it. So you have to use common sense with this kind of stuff. But this is the beginning of understanding how these types of analysis work. So I'm going to go over the top three real quick, and then we're going to start to dive into them one at a time. Very, very briefly: descriptive analytics uncovers insight from things that have happened. So, what happened? We have some data, we take a look at what happened, and we get some deeper understanding there, and we call that descriptive. Then we have the next one, which is predictive, which helps forecast behavior. That is: what could happen.
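For readers who want to see the trained-versus-untrained comparison concretely, here is a minimal sketch in Python. Everything here is hypothetical: the sales figures are invented, and for simplicity the two-tailed p-value uses a normal approximation from the standard library rather than the exact t distribution that real statistical software would use.

```python
from statistics import mean, stdev, NormalDist

def welch_t_test(a, b):
    """Two-sample Welch t statistic, with a normal approximation for
    the two-tailed p-value (illustrative only; real work would use the
    t distribution, e.g. via scipy.stats.ttest_ind)."""
    na, nb = len(a), len(b)
    se = (stdev(a) ** 2 / na + stdev(b) ** 2 / nb) ** 0.5
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # normal approx to the t tail
    return t, p

# Hypothetical sales-per-day figures for trained vs. untrained associates
trained = [12, 15, 14, 16, 13, 15]
untrained = [10, 11, 9, 12, 10, 11]

t, p = welch_t_test(trained, untrained)
alpha = 0.05  # the "point-oh-five" on the slide: 1 minus the 95% confidence level
print(f"t = {t:.2f}, p = {p:.4f}, reject null: {p < alpha}")
```

If the p-value comes in under the alpha of .05, we reject the null (chance alone) and conclude the training made a difference, which is exactly the Sig. (2-tailed) reading described above.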
Now, what's interesting here is that when you look at the market and the literature, everything is called predictive analytics or advanced analytics right now. And what we're finding is that many places are using the term predictive analytics to cover all three. It's important for you to know that there are shades within this advanced analytics world. Predictive is the second. The third is prescriptive, and that is what should be done: we do some analysis, and some actions or activities are suggested by the result of that analysis. At this point, I am going to hand the descriptive topic over to Kelly, catch a little beverage, and then we'll pick up from there. Kelly, take it away. Sure. So descriptive analytics are really the backbone of analytics. Although, as John mentioned, they don't really get much credit these days, descriptive analytics are generally a good starting point for further, more complicated analysis like predictive and prescriptive. And while the findings of a descriptive investigation may not be as exciting as a complicated model, you wouldn't actually be able to complete the more complicated models, or even know whether they're appropriate, without descriptive analytics. So I wanted to spend a minute on descriptive analytics because they are still very valuable. If we look at the example regarding the salespeople training, that was a type of descriptive analytics dealing with means, or averages. So what we're going to go through here are the two primary types of descriptive analytics, and then we'll walk through another example, again using a retail chain. There are two main types of descriptive analytics: measures of central tendency, which is what we saw in the previous example, and measures of dispersion. For measures of central tendency, the one most people are familiar with and use commonly is the mean, or the average.
In the previous example, it was the average sales per person. The median is the middle-of-the-road answer, and the mode is the most common answer, the one with the highest frequency. The second type of descriptive analytics is dispersion. One measure of dispersion is the range: what are the minimum and maximum, and the difference between the two? This essentially tells you the raw spread of the data. The variance is the average degree to which the points differ from the mean. So how similar or different are the values in the data? This tells you the spread not just between your maximum and your minimum, but between the average and the tails. And then the standard deviation is found by taking the square root of the variance, and it is the most common way that people measure and discuss the spread of a data set. Now, if we take a quick example on the right-hand side of the screen, what we're looking at here is a buying analysis. Again, this is very simple data, but it's a way to express what a descriptive analytics example might be. In this instance, we are looking at customers, the number of items purchased, and the amount spent, in an effort to get to know our customers overall. And here we see that the mean, or average, amount spent is $6.50. We find the median amount spent, and then the mode of the number of items purchased, which tells us that most commonly people purchased one item. So within this retail chain example, we can see that there's quite a big spread in the number of items purchased. And our conclusions about our customers might be swayed if we look just at the average amount spent and don't understand the median or the mode. In this instance, the number of items purchased is quite high on an average basis.
But the most common purchase is just one item. And this is where you can see how the measures of central tendency can give us a wider view: not just looking at the average items purchased, but looking also at things like the median and the mode to learn even more about our customers and their buying habits within this retail chain example. So with that, I am going to turn it over to John to talk a little bit about predictive analysis. And then we will continue with examples of predictive analysis and, lastly, prescriptive analysis as well. Thank you. And I could have predicted that this one was next. I'm sorry, that was a little levity. When I took statistics in college, we definitely needed levity and we didn't have it. But we didn't get it there either, did we? Okay, predictive analytics. Some folks, because of the name, think this is about predicting future events. And that's not really what's happening here. It's more answering the question: what could happen, based on the factors that we're putting into some sort of model? The models can be very simple, or they can look at factors that are very, very complex and crunch through a lot of things. But we're looking at what could happen. The example here is sentiment analysis. We see this a lot now with tweets, and organizations finding out in the morning that there are bad tweets or good tweets and tracking all that. And you can start to say, well, if we do something this way, we're going to get a bad social media impression, things like that. Lots of models can be used here: forecasting, simulation, regression, classification, clustering, and there are many, many more. We're going to take a look at just a handful of these, just to give you the idea of what's going on within these models.
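The central-tendency and dispersion measures Kelly just walked through map directly onto Python's standard `statistics` module. The figures below are invented to loosely echo the buying-analysis slide (a mean spend of $6.50 and a mode of one item); only the measures themselves come from the talk.

```python
from statistics import mean, median, mode, variance, stdev

# Hypothetical per-customer figures echoing the buying-analysis slide
amount_spent = [2.00, 3.00, 4.00, 5.00, 6.00, 19.00]
items_bought = [1, 1, 1, 2, 3, 10]

# Central tendency
print("mean spend:  ", mean(amount_spent))    # the average: 6.5
print("median spend:", median(amount_spent))  # the middle-of-the-road answer
print("mode (items):", mode(items_bought))    # the most common answer

# Dispersion
print("range:   ", max(amount_spent) - min(amount_spent))  # raw spread
print("variance:", variance(amount_spent))  # avg squared deviation from the mean
print("std dev: ", stdev(amount_spent))     # square root of the variance
```

Note how one large basket drags the mean well above the median, which is exactly the "swayed conclusions" effect described above.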
Remember, the key today is the kind of scenario we talked about when we put this together: you're a data architect or a data engineer of some sort, and you're sitting in a kickoff meeting and all these words are flying around you. This helps you figure out what folks are talking about, because as a data management person or a data governance person or a BI person who's starting to look at an environment where a lot more sophisticated things are going on, you might have something to contribute if you understand this a little bit better. So that's kind of our mindset. If you have any questions, don't forget to enter them. We do leave time at the end, we try very hard to get to the questions, and then we write our answers and get those sent out. Last week we broke our record: we had things turned around in about four or five hours. Okay, forecasting, the first one here. We've all seen this kind of thing: taking some points on a line and then extrapolating what that line would look like. We take the means of various periods. Periods one through four would give us period five, and then two through five would give us six. So we have a rolling set of means there, and it smooths out. Notice our curve: the brown curve before the blue part is kind of jagged, but the predicted part is smoothed out, because some type of smoothing occurred there. Now, this is the thing with any forecasting: you're using past data to give a rough projection of the future. And the longer the period of time that you're forecasting, the more variable this can become. I mean, if I were to take this line and say, what's going to happen in 2050, we'd pretty much, as rational people, go: too much is going to happen between now and then.
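The rolling-means idea can be sketched in a couple of lines. The store sales below are hypothetical; the window of four periods mirrors the "one through four gives us period five" description.

```python
from statistics import mean

def moving_average_forecast(series, window=4):
    """Naive rolling-mean smoothing/forecast: the mean of each `window`
    consecutive periods becomes the next smoothed point. A sketch of the
    idea on the slide, not a production forecaster."""
    return [mean(series[i:i + window]) for i in range(len(series) - window + 1)]

# Hypothetical annual sales for "store C" (units arbitrary)
sales = [100, 120, 90, 110, 130, 105, 125]
print(moving_average_forecast(sales))  # periods 1-4 -> one point, 2-5 -> the next, ...
```

Notice how the jagged raw series comes out smoothed, which is the brown-to-blue transition John points out on the chart.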
So again, you have to use common sense here, but we're using past data. With our example here with the retailer, we took some sales for store C, we plotted them, and we did some type of smoothing technique. There are a bunch of those you can use, and you can try them out in Excel if you want. And we got our little line pushed out there into the future, into the years '17, '18, '19, and '20. The next technique within predictive analytics is simulation. Now, again, there are a lot of ways to do this, but what you're trying to do is run a model in a simple, sample-data fashion. It is what it sounds like: simulation. Most of us are familiar with simulators, like airplane simulators or climate simulations, and we see the results of really, really interesting models all the time on the evening news or a weather channel or something like that. So this is in that ballpark. The queuing model is a really good one. It's used a lot because it's about wait time and queue length. Anyone who's been in a Costco or a Sam's or a grocery store of any sort and wondered how all of a sudden the lines go from five deep to two deep: they turn things on somewhere because someone has done some queuing modeling, a little indicator went off somewhere that it's time to open up some more lines, and there's been some analysis behind that. The next is the discrete event model, where, when you can't use queuing, we can go look for bottlenecks. And the last one that we'll talk about is the Monte Carlo simulation, which is a very sophisticated type of simulation scenario, but it is used a great deal. We're just going to look again at some examples, starting with queuing. The takeaways here are the highlighted items. We have a situation in our retail store where customers are arriving, and in a certain scenario we have a wait time of 30 minutes. Oh my gosh, that's too long.
This might be, say, service in the layaway department or the automotive department or something like that. And we find we've got four people an hour arriving, and we can handle six an hour. Well, with one person, people are waiting; we add a person, but now the utilization goes way down, so you're paying people who only end up working a third of the time. Well, that's another thing that we really can't handle. So we say, well, what if we just train people for more throughput and get more out of fewer people? If I up the service rate to 10 an hour for the one server, in that scenario the utilization does improve, and the wait time does go down a little, from 11.3 to 10. So the probability that a customer waits does go up; however, the time they're waiting goes down. You can see this is a way to help you make your decision. And notice, this is where the name predictive is a bit of a misnomer: it's not telling you what's going to happen, it's telling you what can happen based on the data. You still need to make a decision based on this data. So you're going to ask yourself a lot of questions, like: how much do I invest in this training to increase the service rate? Will this person quit if I make them work harder? Is there something else people can do when they're not helping a customer, at this utilization rate of 60% or 66%? Can they be doing something else? Again, this is where common sense comes in. But this is a pretty cool way to break the problem down and help you make your decision. That's what the queuing types of models do: they really help give you a lot of alternatives. Of course, that's what predictive modeling does in general. It gives you a lot of these relationships that tell you what could happen. It's not really predicting what will happen. The best example of this is the Monte Carlo simulation, where you have a lot of variables.
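One standard way to formalize the scenario John describes (customers arriving at four an hour, one server handling six) is the textbook M/M/1 queue, which assumes Poisson arrivals and exponential service times. The webinar's exact figures come from its own slide, so treat this as an illustrative model of the same shape rather than a reproduction of it.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Classic M/M/1 single-server queue formulas (Poisson arrivals,
    exponential service, arrival_rate < service_rate). Rates are per hour."""
    rho = arrival_rate / service_rate          # server utilization
    w = 1 / (service_rate - arrival_rate)      # avg time in system (hours)
    wq = rho / (service_rate - arrival_rate)   # avg time waiting in queue (hours)
    lq = arrival_rate * wq                     # avg number waiting in queue
    return {"utilization": rho,
            "time_in_system_min": w * 60,
            "wait_in_queue_min": wq * 60,
            "queue_length": lq}

# 4 customers/hour arriving, one server handling 6/hour
print(mm1_metrics(4, 6))
# Retrain that server for more throughput (10/hour) and compare
print(mm1_metrics(4, 10))
```

Under these assumptions the base scenario gives a 67% utilization and 30 minutes in the system; raising the service rate to 10 an hour drops that to 10 minutes at 40% utilization, which is the kind of trade-off between wait time and paid idle time the talk walks through.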
And here we have an example with our retail company again. Maybe we're going to build a new store, or should we build a new store? And here we have maybe 18 or 20 variables. Normally when you do these there could be hundreds of variables, and we put in the ranges that we can tolerate. And the simulation, using random number generation, runs all kinds of permutations of this and comes up with what things would look like if all these things happened. Again, it gives you a bunch of data to look at, but then you have to look at a particular scenario and say, oh, that's what's going to happen if all these other things happen. This is a really cool one. It's so cool and so in demand that in some tools it's an upcharge, and in other tools it's a much-requested feature. We've got two more to go here, so real quick: regression. Regression analysis is about understanding your independent and dependent variables. That's the old statistical conundrum of causality and correlation, and now we can start to get some things to maybe help us work through that. Logistic regression is a good one: if our store is closer, will someone shop there, or will they go to the competitor? You can look at linear relationships. For example, daily store revenue by the number of customers who enter the store. In other words, if someone enters the store, are we going to have more revenue? Can we tie the two together? With this type of modeling, we can learn to see what things tie together. And what you'll find in the case of our example here is that if people do come into your store, you can actually predict revenue. So I do have a correlation between a headcount and a revenue forecast. You know, it's really important to understand accuracy here. Again, use common sense.
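A least-squares fit of daily revenue against customer headcount, as in the linear-relationship example, can be done with nothing but the standard library. The customer and revenue figures below are made up; an r-squared near 1 is what "we can predict revenue from headcount" looks like numerically.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x, plus r-squared."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx                 # slope: extra revenue per extra customer
    a = ybar - b * xbar           # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot      # share of revenue variance explained
    return a, b, r2

# Hypothetical daily figures: customers entering the store vs. revenue
customers = [50, 80, 110, 140, 170]
revenue = [500, 790, 1120, 1390, 1720]
a, b, r2 = linear_fit(customers, revenue)
print(f"revenue ~ {a:.1f} + {b:.2f} * customers (r^2 = {r2:.3f})")
```

The r-squared is the accuracy check John warns about: a low value would mean headcount alone is not a trustworthy predictor, however tidy the fitted line looks.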
There are lots of ways to do regressions, but that's the general idea of this type of model. The next one is near and dear to my heart, because in my professional life I have had a lot of use for classification and clustering. That is where you start to group things by common characteristics and then see how they relate to others. For example, you want your marketing team to pay attention to social media for your grocery store. So they might do some type of sentiment analysis and then classify the content of various posts as positive or negative, and then put those into some type of model. The colors would indicate perhaps the sentiment or a category of something. Then we take a look at the grouping of revenue numbers against the count, you know, the number of items purchased. And here you can see, wow, of course, everyone looks at the upper right-hand corner, the lower left-hand corner, things like that. Again, remember, this is showing us how things relate. Now the data scientist or the business person has to sit down and look at this, and then they make their decision from there. Clustering: there are a lot of ways you can look at clustering. For our example of the retail chain, suppose we look at the data from a customer rewards program, and you see customers' items purchased in a year and the total money that they spent. You can take that data and look at whether the rewards program is working. You can see if customers are getting some type of rewards feedback. Did the targeted promotion work? Should you do more targeted promotions? And all of us who have shopped or have a credit card or anything like that, we get things in the mail. We get little messages on our phones. We log into Amazon to buy something and it tells us something. This is the type of analysis that makes those types of things happen with retailers or anyone else.
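A tiny k-means sketch shows what clustering rewards-program customers might look like. The shopper data is invented, the seeding is naive (the first k points become the starting centroids), and real work would use a library such as scikit-learn; this is only to make the grouping idea concrete.

```python
from statistics import mean

def kmeans(points, k=2, iters=20):
    """Minimal 2-D k-means: assign each point to its nearest centroid,
    recompute centroids as cluster means, repeat. Seeds with the first
    k points for simplicity."""
    centroids = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        centroids = [(mean(p[0] for p in cl), mean(p[1] for p in cl))
                     for cl in clusters]
    return centroids, clusters

# Hypothetical rewards-program data: (items bought per year, dollars spent)
shoppers = [(2, 40), (3, 55), (1, 30),          # occasional buyers
            (25, 900), (30, 1100), (28, 950)]   # heavy rewards users
centroids, clusters = kmeans(shoppers)
print(centroids)
```

The two centroids that come out (one low-spend group, one high-spend group) are the sort of segments a targeted promotion would then be aimed at.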
We actually use a form of this in our practice with really large organizations that have lots and lots of data requirements. Where we have hundreds and hundreds of information requirements, or uses of data, we will cluster them by category to see where they sit. Where this has helped us in the past, using statistical analysis in doing data architecture, is that in a really, really large enterprise you will always be challenged: why do I have to standardize things when my area does its job, leave us alone. And using clustering, you can say: your data requirements match 60% of the rest of the organization's data requirements. Therefore, the burden of proof is not on us; the burden of proof is on you. What makes you so different that we have to spend the extra money to have a special architecture for you? We've done exactly that type of analysis in our architecture work. So that's why this one's a pretty cool example. From that one, we're going to move into the really cool one, and we saved that one for Kelly, and that's predictive analysis. Prescriptive analysis. Did I say predictive again? You did. I'm sorry. Well, I thought mine was cool, but actually prescriptive is much cooler, because it pushes a lot of people's thinking, too, when something says you ought to do something, right? There's a neat cultural aspect to that, which we've talked about in our other webinars. But take it away, Kelly, on prescriptive. I got it right that time. Sure. Okay. So if we go to the next slide. Oh, am I meant to be advancing these? Sorry, I thought John was. Okay, great. Thank you. Okay, so this is our last category. We are going to take a similar approach as we've done in the previous categories, talking about the most frequently used types of analysis and then providing an example to give you a feel for what this could look like.
So as John said, prescriptive is really what should happen, or what should I do. Realistically, you can get some of those answers via descriptive and predictive analysis as well. So it's not to say that prescriptive analysis is the only way to figure out what you should do, but it is a way of using statistical analysis, based on past performance, to say what should happen. Usually, prescriptive analytics answers explicit questions that you're looking to solve to improve your business, and usually it focuses on maximizing profits or minimizing costs. Many of these are done through programming, such as the four examples that we have listed here, but some of these you can do more simply, in a more manual way. So let's go through and define the four examples that we've got here, and then the next slide will show an actual example of linear programming. Linear programming is used to minimize or maximize an output, again, usually minimizing costs or maximizing profits, based on multiple variables. For example, where you have a limited supply of resources: in the next example, a limited amount of storage space; maybe it's manufacturing time, assembly time, maybe it's a limited number of parts, etc. Each product uses a different amount of a resource, and each product provides a different amount of profit. In linear programming, just as its title says, all of the variables are linear in the way that they are represented. Nonlinear programming is one in which at least one variable is not linear. So it's a more complex way of looking at how those different variables relate to each other and, therefore, what the prescriptive output would be. Integer programming is a subset of linear programming in which at least one of those variables must be an integer, or whole number. So there are some typical use cases for this.
For example, capital budgeting usually uses integer programming, where you can only order, you know, four of the five product options. Sometimes it's warehousing locations; again, I'm trying to continue with this retail example, where you must minimize the costs associated with transportation between a warehouse or a store location given a specified route. This can also be used in scheduling. So let's say you've got 10 salespeople who live in various parts of the city and five stores in different locations around the city. Where do you want to put those salespeople, not just based on their location, but also based on their skill set, on how they've been trained to sell the particular items that are represented, etc.? And then mixed integer programming is a subset of integer programming where some variables are constrained to be integers and some are not. Again, these are typical types of prescriptive data analysis. So we're going to take a minute and go through a linear programming example. In this example, what we're trying to solve for is that, I guess, rust-colored row at the top. We're trying to determine the optimal product quantity to order across our five product lines in order to maximize profit. So our goal is that total profit, and the maximization of that profit. In this instance, we've got some constraints, and you'll see below the constraints that we're trying to take into consideration. Each item takes a different amount of storage space. Each item has a different degree of selling effort. The storage space can be very specific; the selling effort is probably a scale that has been created, agreed upon, and approved by your stakeholders. The minimum order is bound by the provider of that product, the manufacturer. So in this instance, we're looking at product A, which has a profit per unit of $5. Its storage space is .05, and it has a low selling effort.
So it's .25, versus product E, which has a high selling effort of 7 and a minimum order of 100. So you can see how all of these relate to each other. Within a linear programming example, you can actually solve this by hand, or by creating a graph, and you can then see how that graph extends in order to find the maximum profitability per order. However, it is of course much faster to do this using software. So using linear programming software, we can solve for the problem below. And you can see that the linear programming software has given us a solution of ordering the minimum amount for products A and B and C. But for product D, we want to, sorry, I'm getting this wrong: the minimum amount for products A and B; we want to maximize our order for product C; and then we're looking at getting the minimum amount for D and E as well. And what we find is that we have a maximum profit for this order of, depending upon the scale, either $14,000 or $14 million. Now, one of the things this also tells us is that we have some unused storage space. So if we look towards the bottom right-hand corner, the output also tells us that we have used up 852.5 of our storage space when there's 1,000 available. So one of the benefits of this type of programming example is that you can actually get additional information that may be helpful in other sorts of decisions. If you have some leftover warehousing space, maybe there is a different product line that you want to use it for, or maybe you want to use your human analysis to modify this order just slightly to take advantage of that additional storage space. So again, there's the output from the linear programming software that solves for the profitability, and then also taking advantage of some of the additional information that is found as a result of that linear programming example. I think that's back to you, John. Oh, already? Already. 
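[Editor's note] The product-mix walkthrough above can be reproduced with linear-programming software such as SciPy's `linprog`. The slide's exact coefficients aren't fully captured in the transcript, so the numbers below are hypothetical stand-ins with the same shape: profit per unit, storage space, selling effort, and minimum orders for products A through E:

```python
# Product-mix sketch in SciPy; all coefficients are hypothetical
# stand-ins, since the slide's exact values aren't in the transcript.
from scipy.optimize import linprog

profit  = [5, 4, 6, 3, 7]                 # profit per unit, A-E
storage = [0.05, 0.10, 0.15, 0.20, 0.25]  # storage space per unit
effort  = [2, 1.5, 0.5, 5, 7]             # selling effort per unit
minimum = [50, 50, 50, 100, 100]          # minimum order per product

c = [-p for p in profit]                  # linprog minimizes: negate
A_ub = [storage, effort]                  # shared resource constraints
b_ub = [1000, 2000]                       # storage capacity, effort budget
bounds = [(m, None) for m in minimum]     # minimum orders, no ceiling

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("order quantities:", res.x)
print("maximum profit:", -res.fun)
print("storage used:", sum(s * x for s, x in zip(storage, res.x)))
```

With these stand-in numbers the solver orders the minimum of everything except product C, which it maximizes until the effort budget binds, and the leftover storage capacity falls out of the same run, just as in the walkthrough.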
Well, we went through those fairly quickly. I wanted to just revisit the prescriptive, and this is where I mentioned something culturally; I wanted to come back to that. This type of analysis is the one that's going to say, well, you know, this product is not the most profitable, or takes up the most space, or your order quantities are wrong. So you might want to consider things like cutting the product, not selling the product, outsourcing the product, all kinds of things. And this is the type of analysis that can, you know, raise eyebrows with a lot of people. The other ones will say, well, you can consider this or you can consider that, here's the ramifications. This is the analysis that goes, ooh, really? And people will open their eyes and go, I'm not sure we want to do that. Excuse me. So let's just revisit these and do a comparison here for a little bit. And we have a few questions that have come in. Please remember that if you do have some questions, we will do our best to answer them, and if we don't get them all answered, we will put those answers to paper and get those to you in the near future. So the most common of our three types here is the descriptive. And one of the questions that came in, we can answer while we're going through this one here. The best practice here is to perform this first: understand your means, modes, standard deviations. This will help with your hypothesis; this will help with the null or alternative answers that you're looking for. This is that initial pass someone's going to make at the data, and you have to see if the data is even worth using. So this is a very important type of analysis that is going to be done. And after that, well, with predictive, maybe you want to know what could happen if we understand all the variables, or, with prescriptive, what should we do once we have all the variables understood. 
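[Editor's note] The descriptive first pass mentioned here, means, modes, standard deviations, needs nothing beyond a standard library. A minimal sketch in Python, using made-up sample data:

```python
# First-pass descriptive statistics using only the Python stdlib.
# daily_orders is made-up sample data for illustration.
import statistics

daily_orders = [12, 15, 11, 15, 20, 14, 15, 9, 13, 18]

print("mean: ", statistics.mean(daily_orders))    # central tendency
print("mode: ", statistics.mode(daily_orders))    # most common value
print("stdev:", statistics.stdev(daily_orders))   # sample spread
```

This is the kind of quick check that tells you whether the data is even worth feeding into a more complex predictive or prescriptive model.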
And we're looking at what that's saying. We run some additional modeling, and it tells us, based on certain assumptions, you should do X or Y. Is there a precise step between all three of these? No, not really. It's like anything else in this world; there are going to be indistinct borders at times between these. But there are three distinct thought processes here. Really, how much time do you have? Obviously, when you get into predictive and prescriptive, you're going to have to run a model. You're going to have to stage some data in whatever tool might run that, or you're going to have to make sure that the data you have positioned is visible to some tools. Do you have the right people to run those and understand those? Is the data accurate enough to support your hypothesis? Not support the answer; support the hypothesis and the analysis you're doing. Don't forget, the same data set can support one type of analysis and totally not be able to support another type of analysis. And that's, again, something that the data scientist or the analyst is going to be considering. When you're supporting those folks or talking to them, a good thing to bear in mind is that one super-duper great thing they did a few weeks ago might not be repeatable on another data set. Then, how popular or accepted is the model? This is where, if you have a model that says, don't sell product C anymore, but product C is the product that Great-Granddad founded the company on, you've got a business dilemma. You have a conundrum now, because something that might be your brand, your visibility, isn't really doing anything for you. So subscribing to "that's how we've always done it" may not work. On the other hand, make sure that your stakeholders are aware of what type of analysis is going to be coming out of some of these models. These models will give you answers you did not expect, and they will give you answers that you may be uncomfortable with. 
We've seen this in many, many disciplines. Those of us that travel a lot are experiencing the, I don't know the word for it, the a la carte-ization of airfares. And you're going to get charged to put your bag in the overhead if you're not getting charged that already. This is a model that came about based on a real keen analysis of data and spending patterns, on what people will or will not pay for. And again, the results were a little uncomfortable for the stakeholders, but we will see what happens. The recent election in the United States had one set of data scientists say it's going to go this way; another model in another country said it's going to go another way. One model turned out to be right, and the other model turned out to be incorrect. So again, common sense, reality, those are the kinds of things that we want to bring to bear on all of these. So to review: descriptive, what happened? Predictive, what could happen based on a bunch of variables? And then prescriptive, what should happen if all this other stuff happens? Now, one of the questions was, why is clustering not considered descriptive? Well, because clustering is taking a lot of variables into play. It is one of those simulations where I have to categorize things, and then I can try different categorizations and different permutations of the data. Whereas with descriptive, pretty much you're taking the data as is, where is, and describing it; that's why it's called descriptive. You're describing characteristics of that statistical sample. So that's why clustering is not put in the descriptive category, and why it's put in the predictive category. Let's see here. Kelly, I think it's your time to come on, go ahead. Yeah, so before you go on to this slide, I think another thing just to consider is that you don't have to use these in isolation. And so we did talk about how descriptive can be a foundation to validate what information you have. Can you do a more complex model? 
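[Editor's note] John's point about clustering, that it takes many variables into play and lets you try different categorizations of the same data, can be seen in a small sketch. This assumes scikit-learn and NumPy are installed and uses synthetic data:

```python
# Why clustering is more than description: the same data can be
# partitioned differently depending on how many clusters you ask for.
# Synthetic data; assumes scikit-learn and NumPy are installed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Three loose groups of points in two variables (e.g. two customer metrics)
data = np.vstack([
    rng.normal([0, 0], 0.5, (50, 2)),
    rng.normal([5, 5], 0.5, (50, 2)),
    rng.normal([0, 5], 0.5, (50, 2)),
])

# Try different categorizations of the same data
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    print(k, "clusters -> sizes:", np.bincount(labels))
```

A descriptive statistic of this data is fixed (its mean is its mean), but the clustering changes with the modeling choice, which is why it sits in the predictive rather than the descriptive bucket.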
But the output of a predictive model or forecast could be fed into a prescriptive model, right? So it's not that you would be using either/or, that it has to be one versus the other. There might be benefits in the way that you integrate these different sorts of models, to provide that additional information in terms of what could versus what should happen, and to get to that next level of granularity. So that's one thing I wanted to highlight on this slide. And then the second thing, just to add to what you were saying, John, is that this last bullet point I think is very important. It's not just making sure that you have validated with your stakeholders that there is a resonance and a consensus around the output, but that the model that you're using, or the way that you're getting to the output, is accepted. This could be because of a level of understanding; this could be because of the inputs that go into it. But it's just as important to ensure that you have agreement that the model is valid and meaningful, in the same way that once you get the output, in order to action a decision, you need to get agreement and consensus around the output of the model as well. That's all I wanted to add. Very good. Well, we'll just stay open here and give our audience some takeaways. And thanks again, everyone, for listening here. We do have time for some questions, and we probably have room for a few more; I haven't looked at the list in the last minute or two, but it seemed there were some. And thank you everyone for hanging in there. It's a really good turnout here, and we really appreciate it, and we hope we're helping. Key takeaways today: you have to plan this out. Kelly was just talking about that. You have to have some awareness and consistency in how you do these things. There's a perception out there that the data scientist just dives in and some miracle occurs. 
No, there's a discipline; there is science going on here. It is not a replacement for common sense, so don't just take everything at face value. We had a client a few years ago, and this is an example I gave in one of our first talks, where the initial recommendation was to close some huge percentage of branch offices. And it turns out that was driven by an incorrect assumption on the data, where someone didn't use common sense. So, you know, you have to be careful. You have to use some common sense here. There's a lot of resources out there on this stuff, but the key phrase when you're looking for help understanding it is applied statistics. If you run down to the local university and grab a 400-level statistics coursebook, if it were me, my head would explode. You need to look at some things that are applied and give you some good examples. Big data is not required. Data insights and analytics, the name of our webinar series, covers big data, but you can use this stuff with little data. So one of the questions that came in here, Kelly, is someone would like some examples of tools for these three types of analysis. Any of your long-standing statistical tool suites, which are well known, so I'm not advertising for anybody here, SPSS, SAS, right? They work in big data environments. They work in middle and small data environments. They have the ability to do all of these things and are priced and configured in various ways. But your Alteryx and other types of tools also have big chunks of functionality that cross all of these as well. When you're looking at them and picking them, the type of models you want to run is a big criterion for your evaluation of these things. A basic understanding of statistical tools helps. And don't forget that Excel is a useful statistical tool too. That's simply me. Yes. If you want to just start to play with this stuff, yes, Excel. You can smooth curves. 
You can do exponential smoothing, logarithmic smoothing. You can do all kinds of plotting. You can do clustering in Excel. If you really get fancy, you can download or purchase macros that go on top of Excel to do some even more sophisticated stuff. There's a lot out there. In the old days, and I'm dating myself, you could put too much data in your PC and the tools wouldn't work. Nowadays there's a whole lot more data you can stuff in there on your desktop and go crazy. And I guess, Kelly, you can weigh in on this one here: a basic understanding of statistics goes a long way. As we were putting this material together, Kelly and I had to revisit some stuff that we had done in the past. Maybe it's been a few years since you did it, or you heard it in college and never got back to it. I worked with some people, for example, that did all of this, and listened to them and saw their results, but it had been a long time since I myself sat down and did something like it. And refreshing our awareness was invaluable for this particular topic and for the whole data insights and analytics topic, right? I mean, it really helped expand the vision of how this stuff is going to be used, and then how do we support it and how do we get the data in the right place to do it. It was a great refresher for us as well. Here are some examples: we have a bibliography, Statistics in Plain English; When Predictive Models Fail, a TechTarget topic out there; and a podcast on statistics, proving that there's a podcast on just about anything, I guess, nowadays. So there are some other sources for you to consider. Kelly, anything to add here before we take a look at the questions? Nope. I think that this is all really good, and I do think that we covered a lot here. So there probably are a bunch of questions that we could go through, and, of course, anything that we don't get through live, we will answer in written format. Sure. 
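[Editor's note] The exponential smoothing mentioned above as an Excel-level exercise is just a running blend of each new value with the previous smoothed value. A minimal sketch in plain Python, with made-up sales figures:

```python
# Simple exponential smoothing: the same calculation Excel can do
# with a formula column, written out in plain Python.
def exponential_smoothing(series, alpha):
    """Each smoothed point blends the new observation with the previous
    smoothed value, weighted by alpha (0 < alpha <= 1)."""
    smoothed = [series[0]]                 # seed with the first value
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [100, 120, 90, 110, 130, 95]       # made-up sample data
print(exponential_smoothing(sales, alpha=0.3))
```

A larger alpha tracks the raw series more closely; a smaller one smooths harder. That single knob is why it makes a good first experiment before moving to heavier forecasting tools.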
Well, here's one that just popped in. Data analysis or data analytics: is there a difference, and which is the correct one? That's an interesting question. I think data analytics is more a label for the discipline, and perhaps data analysis is a label for a process. That's how I would view it. Kelly, your thoughts on that? Yeah, I would agree with that. I think that's a good way to think about it. I think the implication is that analysis tends to feel more simplistic, versus analytics being the bigger, fancier picture, but I think that's accurate. John, one is more of a practice and the other is more of a process. Yeah, and this is a really well-timed question, because, and this just happens in all human endeavors, you know, labels get assigned and then there's an impression that goes with the label. With someone who is a data analyst versus someone who does data analytics, the impression might be that the latter is a much more sophisticated, smarter, more highly paid person, but the reality might be they're identical. The reality might be it's reversed, and the reality might be, yeah, it is much more sophisticated. It just depends. The takeaway there is: look underneath the label and look at what they're doing. Are they running these types of models? Are they just moving the data? Are they maybe just doing descriptive models and coming up with some relatively simple understanding of the data, versus moving it up the curve? So, good question. I think the functionality, what you're doing, what the intended results are, and what you're going to do with the results are really the important considerations, before the label. Let's see, we have another question here. Data-driven versus data-informed: big deal or not? You can have that one, Kelly, to start. 
I think those, well, in my opinion, those are levels of maturity, and some organizations need to pass through data-informed before they can be data-driven. So, in my view, with the first, we are using data to make decisions. We are using data to identify trends. We are using data many times to validate a hypothesis that we're making without data. Data-driven is: we are proactively incorporating data into all decision-making processes, and we are not making a decision until we actually do some data analysis. So it's the difference between using data to inform decisions versus using data to truly drive decisions. It's a nuance, but I would say that there is a difference, and it's mainly a cultural difference, and likely a maturity progression. Yeah, I think data-driven and data-informed are kind of shades of maturity. Both of them require a certain acceptance of models. It's similar to the predictive versus prescriptive acceptance we talked about, but it is a shade of maturity. I'm not so sure about going through one before you get to the other; there might be a way to go directly from point A to point C. But it's definitely a good way to describe the difference between someone who's really, really going to build data into everything versus someone who will just consider it as they see fit. At this point, we're out of time for the questions; we're at the top of the hour here. In a moment I'll turn it back to Shannon, but please join us in a month for the next webinar, Building a Flexible and Scalable Analytics Architecture. In that one, we're going to be throwing up some punch lists for architecture and some ideas for a reference architecture that is broad-spectrum, all the way from your traditional type of BI to the more sophisticated, big data type things. So we look forward to presenting that material for you. 
Kelly, anything to add? Or we'll turn it back over to Shannon. Thank you, John and Kelly, so much for another great presentation. Just a reminder for everyone: I will send a follow-up email by end of day Monday with links to the slides, links to the recording of the session, and anything else that was requested. It looks like we got through all the questions, but we'll just do another quick scrub of those. And thanks to all of our attendees for being so engaged in everything we do and asking all the great questions. We appreciate it so much. And I hope everyone has a great day. Again, John and Kelly, thank you so much. Thank you. Appreciate it. Talk to you in a month. Bye-bye. Ciao.