Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of Dataversity. We'd like to thank you for joining the latest installment of the Monthly Dataversity Webinar series, Advanced Analytics with William McKnight. Today, William will be discussing strategies for machine learning success. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We'll be collecting questions via the Q&A section. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just note that the Zoom chat defaults to sending messages to just the panelists, but you may absolutely change that to network with everyone. You can find the icons to open both the Q&A and the chat sections in the bottom middle of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar. Now let me introduce our speaker for this series, William McKnight. William has advised many of the world's best-known organizations; his strategies form the information management plans for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. William is a leading global influencer in data warehousing and master data management, and he leads McKnight Consulting Group, which has twice placed on the Inc. 5000 list. And with that, let me give the floor to William to get today's webinar started. Hello and welcome. Thank you, Shannon. Let me get my screen going here. And welcome, everybody. I appreciate you being here, and I appreciate what we have created over the course of the years doing Advanced Analytics. 
And we are going full steam into 2024. So I look forward to seeing you all back on the second Thursday of the month at this time, every month, next year. And thank you as well for pounding that proverbial like and smashing that proverbial subscribe by coming back every month. I see there are a lot of regulars to this now, which is great. You help this program out by sharing it with your friends, liking it, and taking a peek at our sponsors, too, when we have them. So now let's get into it: strategies for machine learning success. This has come about over the course of this year. This year is when I think this really took off in terms of organizations, enterprises, actually doing a lot of machine learning. And the preparation for next year has been nothing but extraordinary in terms of the ratio of projects that we're looking at, that we see, that are going to be machine learning oriented. That is where the enterprise is going. So it's important that we take a step back as we do these projects and make sure that we are ensuring success. Now, some of these strategies come from my work. Some of them come from what I've heard in my walk with fellow consultants, analysts, and vendors. And some of it I just sat here and surmised based upon thinking of the whole life cycle and where things could go wrong and where things can go right. And with that, we will launch in. By the way, if any of you are thinking, what is the data guy doing talking about machine learning? Well, yes, I do associate with data, and I think machine learning is highly associated with data. I've always felt that way about machine learning. And I often feel that it is too bad that data scientists have to spend so much time in an area that they're frankly not as well equipped in, which is data wrangling. And so that's why I've been doing a lot of data wrangling for machine learning, learning things in the process, and staying very, very close to the applications. 
So with that, this is our partial client list for my company, McKnight Consulting Group. So just to let you know, this is where a lot of the practices, the strategies come from. And this is, I'm just proud to share this. If you'll indulge me a couple seconds, this is hot off the press. This is our 2024 tech logo guide. These are the capabilities that we have. If you have any need in any of these technologies or any combination of these technologies, let us know. Okay, machine learning. I'll start with a few minutes of definition on machine learning. And as we get into it, I will have 24 strategies for you today for machine learning success. So let's start with: what is machine learning? And this is how I define it. It's supervised learning plus unsupervised learning plus reinforcement learning, these types of learning. Now supervised learning is like regression algorithms, where the regression model is going to look at features and output a score, for example, what's the price of a house based upon the square footage of the house and many other dimensions and things like that. The error is defined as the distance between the prediction and the actual. So how far are you off of the line that's formed by the regression model, which can be polynomial, logistic, or a straight line, linear. You know you have a regression problem, and so you know you're in supervised learning, when the output variables are real or continuous in value, such as salary, weight, things like that. Regression, the particular algorithm I'm talking about here, requires continuous output, so you can fit a curve to the data points. Another type of supervised learning is classification, another popular type, and there are many others, but you might think of classification, and you can classify that as a supervised learning problem. This is like regression, with the format of the prediction being different. The classification model will actually predict the outcome. 
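To make the regression idea just described concrete, here is a minimal sketch in plain Python: fit a straight line to (square footage, price) pairs, score a new house, and measure the error as the distance between prediction and actual. The numbers are invented for illustration.

```python
# A sketch of supervised regression: fit a line y = slope*x + intercept
# to toy (square footage, sale price) data, then score an unseen house.

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# invented training data: square footage -> sale price
sqft  = [1000, 1500, 2000, 2500, 3000]
price = [200_000, 290_000, 405_000, 500_000, 610_000]

slope, intercept = fit_line(sqft, price)

# score a new house: an 1,800 sq ft home
predicted = slope * 1800 + intercept

# the "error" mentioned in the talk: distance between prediction and actual
residual = price[2] - (slope * 2000 + intercept)
```

A real project would use a library like scikit-learn and many more features, but the mechanics are the same: minimize the distance between the fitted line and the observed points.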
A classification problem is when the output variable is a category, for example, when filtering emails as being either spam or not spam, or when looking at transaction data and trying to determine if it's fraudulent or authorized. So it's a categorical output. Unsupervised learning is where the data is unlabeled, and the typical algorithm we use there is K-means clustering, which is where you find groups in the data; you're able to group up the data where the data hasn't been explicitly labeled. Then you can use domain knowledge of the dataset to try to actually label the various groups within the data. And it's a beautiful thing because it helps you to classify your data and act appropriately in tiers, and individually, instead of just one way for everybody. Now reinforcement learning, that's a little more complex. A clicker, for example, is a technique to let your pet know that a treat is about to get served. So that's the example. This is essentially reinforcing your pet to practice good behavior. You click the clicker and follow up with a treat, and with time your pet gets accustomed to this, right? Generally, we know the start state and the end state of an agent, but there could be multiple paths to reach that end state. Reinforcement learning finds an application in these scenarios. So this essentially means that things like driverless cars, self-navigating vacuum cleaners, the scheduling of elevators, street lights and so forth, they're all applications of reinforcement learning. So that was a little bit long-winded, but I wanted to spend a minute on: what is machine learning? What are we talking about here? It's a subset of artificial intelligence. And the uptake is strong, as I mentioned before. And this shows you by industry how many machine learning models are in production. And I talked to the author of this study, who's from Harvard Business Review. 
And I can say that the zeros, although they're not explicitly shown on here, the ones that have none, that's probably most of the gray on the far right. I'm not sure why it's on the far right, but as you can see the percentage is rather low, 5%, 4%. Most companies have models in production, and some of them are getting started on their journey with one to 10, but probably the sweet spot is in that 11-to-50 range. And it's only going to go up from there. And I would say, and I've been to several conferences, several fall conferences, between this machine learning and the things I'd call developer assist, things like Oracle APEX and Microsoft Copilot and Amazon Q, these are the hot areas that the vendors are pushing in as well. So you are in a very hot area when you're talking about machine learning, and the use of AI and machine learning to drive business transformation and reimagine customer experiences has been throughout all industries and will continue to be so. That's why we need strategies for this. The use cases, I kind of hesitate to box them in here because there really is no limit. It's everything. Everything that you're doing as an enterprise could have a machine learning component to it. It's just a matter of time. It's a matter of priority. It's a matter of when things make sense to get around to. And that's going to be a combination of factors: how easy things are to do, how important they are, how much savings and efficiency you can drive, how much TCO and ROI you can drive out of the project. And some things set up other things. Some things are more preliminary in terms of what they are. Now, I've been working in healthcare and technology here recently. So I'll pick on healthcare. These are some things that machine learning can do for healthcare, and it's doing them already. Patient care pathway optimization. 
So the most effective care pathway for the desired result, usually a healed patient, can be determined. Disease research and drug creation; many variables go into that. The early diagnosis of conditions. And wouldn't that be nice, if our medical system could be turned from one that is really sick care, where we don't really engage it until we're sick, to one that is helping us do early diagnosis of conditions? Could machine learning be a conduit to that future? I hope so. And finally, different forms of patient safety work are done in healthcare. But whatever industry you're in, there are flow optimization, modeling and analytics, predictive insights, and threat and risk analysis use cases, at least that's how I've classified them. Those cases are there for you as well. Now, my first strategy for you here today, and keep in mind that some of these strategies are high level like this one, while some of them get into much more detail. So there's something here for everybody, but somebody in the organization needs to be attending to all of these strategies. So you might want to make yourself a checklist, a grid or whatever, and make sure that as you go into your project these things are being taken care of. And some of them come before the project is even conceived. Obviously, like this one: when you think machine learning, think big. Machine learning is not just about small projects. It's about transformative solutions. So don't focus completely on short-term gains or incremental improvements. Think about how machine learning can be used to address the major challenges and create lasting impact. Let me add at this time that in my venture capital circles, the new pitches and so forth must have a machine learning and artificial intelligence component to them today. That's just how hot it is. And so this success is not guaranteed, but it's not going to be solely about technical expertise either. 
It also requires a collaborative and learning-oriented culture within the organization. So we want to encourage knowledge sharing, cross-departmental collaboration, and so on, the things that create the big ideas that machine learning can be a part of. And I say look everywhere for machine learning opportunities. Look at the products you make and the services you offer. What can we instill in them from machine learning? The supply chain for those products and services, business operations, the intelligence you use in determining and designing your product, and the marketing approval funnel for your products. So it's everywhere, really. I've done a separate presentation on what I call the world in 2050. So you can find that and you can see how artificial intelligence and machine learning are going to impact our lives in such profound ways. So anything is game for this. Now this is the no-brainer. You knew it was coming. I have to give this success strategy, everybody does, but in this case everybody does it because it is correct. You must align this with business goals, like everything else. So you need a firm grasp of the business issue you're attempting to address. You need to know how important good old return on investment is going to be in the success of that project. What are the success criteria for the project? And if it's return on investment, then you've got to deliver that cash flow back to the business. You've got to show how you're going to do it, track it, and deliver that back to the business. But sometimes it's just breaking into new strategic ground. We've definitely returned to ROI this year. It's been that kind of a year. But as we break into more machine learning oriented projects, I'm finding the windows opening a little bit more for strategic projects that break into new ground, where we don't have time to stop and look at ROI. It's going to be that transformative to the company. So you've got to know, though. 
You've got to know what the goal of the project is, what your business goals are, and how machine learning can address them. And when you do, boost productivity and effectiveness. Look for repetitive tasks. Look for improving decision making and personalizing experiences, especially the customer experience these days. Optimize resource allocation and weigh the compromises among accuracy, efficiency, scalability, and cost, all the things that are part of a machine learning project, and how they are all brought to bear for the business. And by optimizing efficiency and performance, one can help ensure the robustness, agility, and sustainability of their ML solution. Bringing people to higher value-added tasks in the organization must be a goal of the project. So this is another lens to look through: where can machine learning be effective here? Or maybe you're already chartered with using machine learning to do one thing. Well, what about this other thing? Is that what you should be trying to do? And I say part of that has to be bringing people of the organization to higher value-added tasks. Now, let's turn a little more technical and look at the machine learning stack. Anytime you have one slide for something as complex as this, it's going to be out of date, and it's probably going to be well short of what the possibilities are. But hey, we know that we have these data platforms out there that are sourcing data into various layers, a couple of layers I'm going to add here, because I think most of you are going to be familiar with the whole data warehouse thing where you have source data and you have BI on top of the data warehouse. Okay, we get all that. And by the way, that's data warehouse slash data lake slash lake house. And before I end that point, a lot of the BI tools today, ThoughtSpot comes to mind, have AI built in. 
So there are machine learning algorithms built in there, but a machine learning project, and we must separate these, generally requires much more expertise and effort to develop and implement than simply, I don't know, clicking a few buttons in ThoughtSpot, although that is great. I am talking about full-on ML projects here today for the most part. Now, I show you a unified semantic layer. What's that? For some of you, that's new. Okay, so that's a conceptual layer that sits between the raw data and the ML models. It provides a consistent and standardized way to represent the data regardless of its source or format. That way the data is uniform and can be addressed with machine learning algorithms. So this makes it easier for ML models to understand and use the data. It can also help to improve the performance of the models. Now an acceleration layer, that's the other layer that might be new and different in machine learning. It's a type of layer that is designed to improve the performance of a neural network by reducing the amount of computation required. And for a lot of us these days, this means getting GPUs, if you can get them. TensorRT, for example, is a runtime optimizer that can optimize deep learning models for deployment on hardware platforms. So there are some new things in the stack, and what does the full stack look like? Well, let's look at the full stack. And we break it down by 11 things, 11 categories. You might do it a little bit differently and that's okay, but make sure you're covered. Or maybe you've decided consciously that, wow, William, that's for an enterprise-level, Global 2000 company, big project. Yes, it is, by the way. We're not quite doing that, so maybe we don't need this or that; a data catalog comes to mind. Well, really think twice about that. We're not going to belabor that here, but here you can see some of the major stacks. 
Azure, AWS, GCP, and the fourth one I'll call the Snowflake stack, even though Snowflake has a much more limited part of the stack than the other three do for their stacks, right? Snowflake's a database, a data warehouse, but it does other things as well. And as you can see, I've really thrown in some other things there. There are some things from Amazon. What else? Tableau, I threw in there. I mean, you can mix and match, especially on the Snowflake stack. As a matter of fact, nobody has these other three hyperscaler stacks completely, that I'm aware of. Everybody has their pet: I like this or I like that, we've got to use it. Nobody's exclusive to one of these stacks. So if you're not, that's okay. You might have a different BI tool. You might have a different data integration tool. Maybe you don't want to go with, for example, AWS Glue; you want to go with a more robust enterprise solution like an Informatica for that. Yeah, that's okay. But these are the standard stacks that the vendors would push to you for now. And I just want you to see all the products that are involved. And that's why machine learning costs a lot. And we're going to get to cost, because that is a success strategy coming up for you as well. Now, know what you're building. Know what you're building. You're chartered to do ML. Are you building a machine learning program, which will provide analytics, automation, and personalization, all the things that it provides, for several projects? Or are you captive to a project, which frankly, most people are today. And here's what I say to that. Try to build it with scalability in mind. Try to build it so that you're not completely locked into just supporting this one project with this ML project. Make the project into a program. So keep that in mind as you build your project. Or are you trying to just build machine learning insights for a project? In other words, you're not doing the project. You're not doing, let's say, predictive maintenance. 
Somebody else is in charge of that, but you're in charge of the ML parts of that. Okay, just know. And then there's the inclusion of new projects into an existing ML program. There's already ML going on. There are already some best practices, some standards, and so on. You've got to fit into that. And if that's the case, you've got to know what you're trying to fit into. So know what you're building. That's going to be my success strategy number five for you. Now let's get into cost. I cannot talk about machine learning strategies without talking about cost, because most everybody out there cares about cost and these things are quite expensive, much more so than even building a data lake or, as we used to do, just data warehouses, right? And all these things that we like to build, master data management hubs and so on. ML costs a lot. And I am throwing in, of course, the data layer here. It's very important. We'll get to how important it is. But it's a big part of the stack cost. Now, a large project stack cost, and we did a study on this, so there's a lot more behind this I won't get into, is going to be between 7 million and 23 million. This is a large, Global 2000 type of company doing a full ML-based project, taking it to production, probably with some technical debt to deal with, and so on and so forth. Buyer beware, as always. You can't completely buy into what anybody says, myself included. You need multiple opinions and you need to put your thought bubble up above your head and think about everything you are hearing and make the best decision. By the way, I know this is about AI and ML and all this automation and so on. We still need to do these projects successfully. We still need a lot of human judgment. All right, nobody's taken that away. You might be building something that doesn't have human judgment, that has machine judgment, but you need to apply your human judgment to it. Let's be sure about that. 
Now hardware is often the biggest performance bottleneck of a database management system, which is going to be a part of the stack here. And most of those cloud analytical products scale in powers of two, so watch out, that can really drive your cost up, while in many systems you can add more memory here or more CPU there at a more fractional cost. The true gauge is price performance. So be able to understand what your price performance is going to be. For that, you need to benchmark. You need to see what the performance is going to be. Make sure it's acceptable. Make sure you know what you're getting. And frankly, I would say try to make sure that the performance is top notch, as good as it gets. And you won't know that unless you at least benchmark a couple. And the true gauge of project efficiency is ROI. So bottom line, I am not opposed to ROI for machine learning projects. I'm actually banging the drum for it, because it makes sure that the project has legs in an organization, which still is requesting a lot of ROI, as I mentioned before. So be cost-conscious about the ML stack, that's strategy number six. Be on the lookout for cost optimizations, like not having to pay when the system is idle, compression, and moving or isolating workloads to avoid contention. Look for the ability to operate on compact open file formats like Parquet, Iceberg, or Hudi. Yeah. Wow. We are all over Parquet. I have just gotten so excited about Parquet. I've always been excited about columnar formats, going back to columnar databases, going back to Vertica, I don't know, 20 years ago, but leading right up to what we see today in Parquet. So it's easy to turn your data lake into a Delta Lake. All you need to do is specify, when you are storing data to your data lake, that you want to save it in Delta Lake format, which is Parquet plus a transaction log, as opposed to other formats like CSV or JSON. And this gets you ACID compliance. This gets you DML capabilities on that data. 
And these can be quite essential for these applications. And there's a whole lot more to say about that, but I'm going to move on. But Parquet, Parquet, Parquet. Okay. Also, costs can spin out of control if you have to pay a separate license for each deployment option or each machine learning algorithm, and some of them do that. So watch out, and consider how you will be paying, because it's not consistent across all the products: per user, per node, per terabyte, per CPU, per hour, etc., or per some combination of these. So be cost-conscious about the ML stack. And 6A is kind of related, right? Watch for pricing gotchas. Like I mentioned before, concurrency scaling, serverless RPUs. Now, picking on Redshift, they all do it, but picking on Redshift for the moment, serverless RPU usage refers to the amount of Redshift Processing Units consumed by a serverless workload. Now, this is serverless; I'm not necessarily promoting serverless, especially for your big enterprise-grade applications, but that's just an example. And you also may get some extra costs thrown at you, for example, SageMaker costs alongside Redshift. Yeah, that happens. Now, speaking of Redshift for a moment, they have reserved instance pricing, speaking of pricing, which can be substantially cheaper than on-demand pricing. This goes for all of them. These are available with a one- or three-year commitment and are cheapest when paid in full upfront. So yeah, it's pay as you go, but within some big limits. I don't even say that phrase anymore, pay as you go. It's really not. Okay, so to know some of the things that I just mentioned, performance and price, you need to benchmark your stack. And I find so many enterprises try to do this on their own, and they get halfway through their goals, and they just have to make a wild guess at it, because it is hard. It's hard to get it fair. It's hard to get it locked into what you're trying to measure. 
It's hard to measure everything that you want to measure, but I still have to say, I think it's essential. What are you benchmarking? Training performance, loading performance, inference performance, with concurrency or without? How about ease of use? What competition is there going to be? What models? What data? What efficacy? How far do you go with this? I'm not answering the questions, but I think putting the questions out there might help you. And what kind of scale are you going to? This reminds me, I was in a band, we called ourselves the 999 megabytes, but we could never get a gig. We could never get a gig. Okay, I'll be here all week. Anyway, back to the benchmarking things. We've got cost in there, the number of runs, how big the cache is going to be, the number of nodes, how much tuning is going to be allowed to do a comparative benchmark. What's the vendor involvement? People don't think about this. Are we going to let the vendors crawl all over this and tune it and give us their advice and so forth? Or are we going to try to go on our own? More or less, I'd like for you to try to go on your own, because ultimately you will be on your own, and I think that having the skills in-house eventually to do everything is pretty important. Any free third-party software? Any not-free third-party software? Finally, just make sure you're measuring price performance. Now, some of the key things that we're measuring are recall, which measures the ability of the model to identify all true positives; precision, which measures the ability of the model to identify only true positives and avoid false positives; hopefully, you're seeing the Venn diagram in your head when I talk about these things; and F1, which is the harmonic mean of precision and recall. We're doing this right now for a vector database benchmark. And it's very exciting to see these measures come into play. New measures for us. Quick strategy number eight here. Don't ignore the UX. 
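Before moving on, the recall, precision, and F1 measures just mentioned can be sketched in a few lines of Python, computed from invented actual-versus-predicted labels (1 = positive, 0 = negative):

```python
# Compute precision, recall, and F1 from paired actual/predicted labels.

def precision_recall_f1(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp)   # of everything flagged positive, how much was right
    recall = tp / (tp + fn)      # of all true positives, how many we found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# invented evaluation labels
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

p, r, f1 = precision_recall_f1(actual, predicted)
```

On this toy data there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 0.75.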
I know we're talking about machine learning, and the machine's going to do it all, but I don't know of a machine learning project yet that has no human involvement from a user perspective. There's still that. So when you do the UX, you should do it with user-centric design principles, clear and intuitive interfaces, all the things that you see there. So don't ignore the UX; that's strategy number eight for you. Strategy number nine is don't forget that corporate requirements are going to be more than data. And this is sort of a lead-in to the next section, which is about data. But before we get there, I'll get everything else out of the way. You need these kinds of skills in-house: math skills, GPUs, Python. Yeah, it's still very important. TensorFlow or equivalent, R and MATLAB or equivalent, Java and Scala or equivalent. These are some of the requirements, some of the skills you might think that you really need on the team in-house. And that brings us to machine learning data. And of course, you know I'm going to be excited about the data, and I might overplay it a little bit, but I don't think I am. ML data is so important, I'm going to say focus more on the data than the algorithms. This is what I've seen leads to success. You'll need data for machine learning, but without a discrete focus on it, you will not get it done well. If you ask a data scientist to wrangle the data every time, they're not going to do it to the standard that is really ultimately necessary for long-term success with machine learning. So you want to do the data layer with data specialists at least: data modeling, integration, quality, a lot of the things that you learn about here at Dataversity. These are oldies but goodies when it comes to skills that we still need when it comes to data, even in machine learning. It's operational and it's real time. Let the data infrastructure create the analytical or empowering elements. In other words, build that in. Build that into your data layer. 
And I know, machine learning, wow, it's so fast. Yes, but you ensure consistency of calculation when you build it right in. And it actually ends up becoming a field in the data warehouse or the data lake, what have you. Focus on total cost of ownership first for justification of data storage. In other words, once you've justified the project, deciding which stack you're going to use can oftentimes effectively be a TCO comparison. And build to scale. I mentioned this before in the context of processes, when I was saying make sure that you can scale this, but make sure also that the data is built to scale. For decades, much of the analysis, I would say 80%, was on getting the right data to the right place at the right time. Remember that? Now it's 90%. It's even more. So searching for and preparing data are the most common activities of the data professional and the data scientist, using different tools. Half of the time is spent on unsuccessful data activities today. And we just need to change that. We need to make sure that it's being done to standard with data professionals and so on. And data professionals, data specialists, as I say on here, they're not everybody. And they're becoming harder to find than ever. But that's your challenge. Machine learning data. I'm not going to read all these, but it's everywhere and anywhere. See the picture? Data is the new oil. So data for machine learning projects has come from and will continue to come from all of these places. Website behavior, call center recordings, streaming sensor data. Yeah, that's huge. And even good old alphanumeric data like customer account data and purchase history. That's still a part of this. And that's going to bring us to strategy number 13. But right now we're on strategy number 11. Click. There we go. Get your machine learning data from a data lake house. 
Plus, okay, at least a data lake house, but maybe it's in a mesh, maybe it's in a fabric, maybe it's in the data cloud. I'll get to that. But with this idea of decoupled data warehouses and data lakes, let's start to really think about the lake house concept. It has stood the test of time, proven itself, and is a going-in notion, I guess, of mine into any new enterprise situation. We've got to get you there. Combine those; make sure there's one pane that gets you to all that data seamlessly. And there's still some skill set involved here in terms of what goes where: what goes in the data warehouse, what goes in the lake, what goes in both, what goes in neither. Okay, just to round out the lake house concept, there are a few key technology advancements that have enabled the lake house: metadata layers for data lakes to set up drill-through paths; new query engine designs providing high-performance SQL-like execution on data lakes, and sometimes straight SQL; access for data science and machine learning tools; and the lake not necessarily being concerned with offloads into other systems. It's a terminal point for the data. It's where the data will get accessed. Now, all of the major data platform vendors have converged their messaging around this concept of a lake house architecture. It takes the best of the data warehouse and the data lake and enables them to run on platforms like S3, with data lake storage architectures where the data might be on object storage, but it's laid out in a relational way. Now, to add on to that, strategy number 12 for you is to use an architectural pattern. I mentioned the lake house. These are not mutually exclusive, all right? And I'm not going to belabor it, but there is the mesh. There is the fabric, the cloud. We're very excited about these. And so we are bringing these ideas to bear as well in our engagements. There's no one-size-fits-all, by the way. And a lot of it has to do with: where are you at today? 
What does it look like in your shop today? A lot of organizations almost have a data mesh, if you squint your eyes and don't look too hard. It almost looks like a data mesh without even introducing the concept of a data mesh. How's that possible? Well, we've just become more or less decentralized in our data warehouse and data lake approach. And that's what it is. And so why not add a few concepts that the industry has learned about doing things that way, which they call the data mesh, and make it into a more true data mesh so that you have all the benefits of a data mesh. But I think the best in the long run is a combination. It's a lakehouse and a mesh. It's a lakehouse and a fabric, or maybe all three of those, or maybe it's the lakehouse and the data cloud. You definitely want the data virtualization, if you will, that the data fabric brings. So ideally, I'm saying all four. That detail might be for another day, but I wanted to plant that out there when we're talking about data. Now, earlier I talked about the importance of that alphanumeric data. And here, I'm going to say, make sure master data management is in the environment. Of all the 24 strategies that I'm giving you here today, this might be one of the ones that people are not doing, to their peril, though. As you can see in these industries, there are various subject areas that are very interesting for the various ML applications that are being built today. And if they're being built without MDM, they're doing it the hard way, frankly. So yeah, I don't want to pile on here. I know you've got a lot of work to do, but that alphanumeric data brings a lot of good light, if you will, to the detailed data that is the bread and butter of machine learning. And let's move on. The data, okay, data's important. So are processes. Adopt MLOps early. These are still the early days for ML and success is not a given. Adoption faces several challenges.
So in response, ML adoption requires a cultural shift and a technology environment with people, processes, and platforms operating in the responsive, agile way organizations are looking to operate today. That's the approach we call MLOps. It is the science of getting your ML work to production and beyond and getting it into an iterative cycle. And so to get from ML to MLOps, yes, there are a bunch of tools, and you don't have to have one, but the tool list is long and growing. We've worked with Microsoft Azure ML and Google Vertex AI, both have MLOps offerings, but there's Amazon SageMaker, Cloudera Machine Learning, DataRobot, Dataiku, just thinking here, Splice Machine, so many others, so many others. It's not a complete list, but these products help you with your MLOps, and open source is quite prevalent in the MLOps world, by the way, as a place maybe to start. So getting from ML to MLOps, many companies have built strong ML capabilities, many have not, but wherever you are on that journey, the sooner the better that you implement your MLOps. Only a few businesses have been successful at putting the majority of their ML models into production; the rest are leaving a sizable amount of value on the table. We know how it is. You may not even be interested in building something that you think has a low chance of getting to production and adding value. So you would probably not be surprised to hear that when I land at organizations that have hired me, very few of them can lay out their path to production for machine learning, which I call MLOps, very few today. And that's okay. It's wherever you are in the journey, but that is something that we're going to have to focus on. So get from ML to MLOps and strive for iterative pipelines that are reproducible, reusable, manageable, and have automation. With these criteria in place, it is possible to deliver on the iterative nature of ML modeling and application development.
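Reproducible, reusable, manageable, automated: what might that look like in the smallest possible terms? Here's a generic, plain-Python sketch (not any particular MLOps product's API) of a pipeline defined as data, with its configuration fingerprinted so a run can be reproduced and the whole thing invoked by one entry point a scheduler or CI/CD job could call:

```python
import hashlib
import json

# A pipeline as data: an ordered list of (name, function) steps plus a config.
# Reproducible (the config is hashed into a run manifest), reusable (steps are
# composable functions), and automatable (one entry point to call).
CONFIG = {"lowercase": True, "min_length": 3}

def clean(records, cfg):
    # Normalize whitespace and, per config, case
    return [r.strip().lower() if cfg["lowercase"] else r.strip() for r in records]

def filter_short(records, cfg):
    # Drop records shorter than the configured minimum
    return [r for r in records if len(r) >= cfg["min_length"]]

PIPELINE = [("clean", clean), ("filter_short", filter_short)]

def run_pipeline(records, cfg=CONFIG):
    """Run every step in order; return the result plus a run manifest."""
    for _, step in PIPELINE:
        records = step(records, cfg)
    manifest = {
        "steps": [name for name, _ in PIPELINE],
        "config_hash": hashlib.sha256(
            json.dumps(cfg, sort_keys=True).encode()).hexdigest()[:12],
    }
    return records, manifest

out, manifest = run_pipeline(["  Cat ", "ox", "  ELEPHANT "])
print(out, manifest)  # out is ["cat", "elephant"]
```

The same shape scales up: in practice the "steps" are feature engineering, training, and validation jobs, and the manifest would also record code versions and data snapshots.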
As a result, data scientists get the benefit of CI/CD, evolving a model creation pipeline, a working environment, and a target architecture continuously. This is strategy number 15 for you. Are you keeping track? How many of the 15 so far are you doing? Just kind of mentally think about that and maybe bring it to your next team meeting and make sure that all of these things are being done. Like number 16: use a step-wise progression for model development and evaluation. Yes, that's right. A step-wise progression for determining what the model will be and how you are even going to evaluate it. Determine which algorithm is most suitable. Some people are hunting a little bit more and allowing AutoML and things of that nature to do the selection. It's kind of there. AutoML is a way. It is an algorithm, if you will. Make sure that the algorithm ultimately selected is the most suitable to the problem. Data pre-processing and feature engineering. Divide the data. Assess the efficacy of the model. Did it work? Don't just take it and run with it. Did it work? Does it pass some muster tests? Tune the hyperparameters of the model. Check for bias, and I'm going to come back to that notion of bias here in a minute. Document the process of developing the model and let that be a guide to your MLOps process. Package the model for deployment. Package the model along with its dependencies. Enable version management and revert functions. Revert functions: that enables you to undo changes made to your code and return to a previous state. Constantly monitor the efficacy of the model. Retrain the model at regular intervals.
I don't think too many of us are at the point where we're hurting badly because we are not retraining our models, because it hasn't been years and years and years, but you need to build that in, or obviously you're just going to be snowed under with everything else and not get back to the idea of retraining the model, making sure the model is still fit. Because it's not really that the model will change, but the underlying world that it operates in will change. The data will change, so you must really monitor model efficiency, efficacy, and performance continuously. Review and update your model governance as well, how you are providing that sort of guidance to the modeling process. Consider more than accuracy for the models. Robustness: make sure the results are stable over time. Make sure it's easy to understand the model results. Make sure that it has fast processing time for training, test, and execution or inference. The resulting model has only the essential parameters and is easy to monitor and explain. Make sure the model can handle growing data volume and concurrency. So, more than accuracy. Yeah. Okay. A lot of us are stopping at accuracy. Employ explainable AI, or XAI, techniques. Transparency and explainability are crucial for building trust in machine learning models. XAI techniques help you to understand the inner workings of models, identify potential biases, and gain insights into why the model makes certain predictions. It's about buy-in. It's about compliance. It's going to be increasingly required by compliance mechanisms. So start early and often on your explainable AI techniques. I have a client that just ignored this completely and we're kicking the can. Okay. We're consciously kicking the can down the road a little bit here, but I'm keeping it alive. I'm keeping this idea alive and we're revisiting it. And I know I sound sometimes like, I don't know, the boy who cried wolf. Is that who that was?
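To make "more than accuracy" concrete, here's a small, hand-rolled Python sketch (no ML library assumed, and the example labels are invented) that scores binary predictions on accuracy, precision, recall, and F1 from the confusion matrix. On imbalanced data, accuracy alone can look fine while recall exposes the problem:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Imbalanced example: the model misses 2 of the 3 positives,
# yet accuracy still reads 0.8 because negatives dominate
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
m = classification_metrics(y_true, y_pred)
print(m)  # accuracy 0.8, but recall only ~0.33
```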
I may sound like that guy, but I think that this is going to be important, and the modeling that we're doing is only getting more complex. So this is necessary. XAI is essential for upcoming regulatory compliance and responsible ML practices. And again, something we don't think about often is the buy-in by the business. Of course, we're going to come back to that. That's going to be big. Address ethical considerations. Wow. It's a big topic here. It's a big topic. There aren't ethical considerations in every ML project, but are there any in yours? And if so, are you addressing them? Are you collecting data responsibly, ensuring that data is gathered responsibly, ethically, in a way that respects and protects individual rights? If you were looking for data sets, especially around language, I work with Defined.ai. They have ethically gathered languages from all over the world for various call center and customer interaction systems. Responsible development: taking into account how AI is developed and used and how ethical considerations, including fairness and privacy, are addressed. What about the trustworthiness of the data, the explainability of the data? Is there any discrimination in the data, or privacy exposure? Because data does potentially contain these sorts of things. This is potentially a big problem for your ML project. And so make sure that ethical considerations are adhered to, or at least considered. And I say that I like to expand data governance to this area and provide that level of data governance over the project. Also, plan for data drift, strategy number 21. Model decay and drift is inevitable. It's not the model necessarily that's changing. It's really the reality that is changing while the model remains static. And so changes to that environment affect model input. Implement statistical tests and anomaly detection algorithms for monitoring the performance of ML models over time.
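One common statistical test for input drift is the Population Stability Index (PSI), which compares a feature's current distribution against the baseline the model was trained on. A minimal Python sketch (the data and the common rule of thumb that PSI above roughly 0.25 signals significant drift are illustrative, not a standard):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a current one."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) on empty buckets
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i % 100 for i in range(1000)]                # roughly uniform 0..99
shifted = [(i % 100) * 0.5 + 50 for i in range(1000)]    # the world changed

print(f"PSI vs itself:  {psi(baseline, baseline):.4f}")  # ~0: no drift
print(f"PSI vs shifted: {psi(baseline, shifted):.4f}")   # large: retrain candidate
```

Run on a schedule against each important model input, a check like this turns "the reality changed while the model stayed static" from a surprise into an alert.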
And implement that version control I talked about earlier and retrain those models as I talked about earlier. It's not a one and done. It's not a put-it-in-production-and-forget-about-it mentality. Now, I've alluded to this a couple of times at least already: support your machine learning with data governance. When you have good data governance in place, some of these things, especially I think strategies 18 through 21, the ones I just covered, can be supported by that data governance. Ensuring data quality and reliability, data quality still being job one. You still have to do data quality. Even though it's machine learning data, you may not have to do as much as for the accounting data that you might have in a data warehouse, but you still have to do some. As a matter of fact, I have a whole presentation coming next year on that very topic. It's very nuanced, but we'll get to that. Then hopefully I'll get to the topic before you do and help you along on that. But data governance does a lot of things, right? Data security. Data sharing, making sure that data can be shared, that it's not being hoarded. Managing the whole data lifecycle and the disposal of older data. Supporting regulatory compliance. Improving model fairness, which touches on some of the strategies I just talked about. Enhancing trust and adaptability. So data governance is still important in machine learning. Now, I said I have 24, so 23 is getting pretty close to the end. We've got to plan to deal with big change. Big change, because when you are doing a project that's machine learning, that is very different, especially to people in the business that aren't part of the project, people that are hearing about AI this and that, misinformation sometimes, sometimes accurate information. Organizations implementing ML have recognized the need to make significant changes, or they should. People instinctively don't like change to begin with. And this is change. It's not just another project.
Your first ML project and your second and third are not just another project to the business. They see this and some of them, at least, are thinking different thoughts than they ever have about projects. When you add AI coming into the workplace, which might be what you're doing, that's going to make the issue even worse if you don't get ahead of it. That's what I said. I want to get ahead of that challenge. And I was challenged on this point by a client, and I'll put it out there: well, you're just going to get them all hot and bothered about this ML and AI business, and I don't know that I want to do that. And I think you're just kicking the can down the road when you take that approach. I like being upfront, open, and honest about it. And there are definitely organizations I could cite that are taking that good approach, I'd say. And I think it's paying off in employee loyalty, and better buy-in to the projects and so on. That's a big topic. It's a big topic. And it's not a technical topic. And so for a lot of technical folks that come to a webinar like this, this may not be what you need to do, but somebody needs to in the organization. Somebody needs to demonstrate how what we're doing is here to help the company, instead of having the fear grow with people thinking it's going to hinder them or, even worse, replace them. I haven't come across that project yet, even in ML, that's huge on replacing people. It may come, but it's not here yet. And so I definitely want to get ahead of that issue, say what needs to be said about it. In order to enable that cross-functional collaboration, we want to provide a more robust and up-to-date IT infrastructure. That's what this is. And manage new risks that can jeopardize trust in AI. If you want to hear more about this, I have a whole presentation in this series with Dataversity on organizational change management. I think I gave it a couple months ago, and you can find that. But please deal with this.
And finally, the last strategy for you is to stay abreast of industry developments. This presentation is good for maybe a year, and then it needs to be revised because we're going to learn so much more. The technology is going to step in on so many different areas that I talked about. And it's going to change how I make recommendations around ML strategy. It's going to change how you should view machine learning. So stay abreast. You must have a plan. I'm so thankful that so many of you have reached out to me this year and said you try to make every one, or at least catch it on YouTube, or catch it on Dataversity later. You catch every one. And that sort of is your launching point for your education in this field. And that's what it's all about. I'm a launching point here, and I'm helping you understand the topic to some degree. But there's more to learn. And there's obviously more to learn coming in 2024. It's going to be a big year for machine learning. So take these strategies forward to success. In summary: machine learning opportunities are everywhere. Think big and produce business results. Choose the ML stack with careful consideration, and I gave you a lot of considerations, including performance and cost. Focus more on the data than the algorithms. Yeah, you've got to focus on both, of course. But the data is where a lot of organizations are getting tripped up the most. And if you get that data solid, the algorithms can be laid on top of great data so much more easily than if it's poor data. You can trust what you get out of great data. Adopt MLOps early. Prepare the organization for the change in different ways. Data governance: make sure you're taking care of data drift, making sure you're taking care of ethical concerns, and so on. People do care about that within your organization. And they care about this mysterious thing that you're bringing in called machine learning. And so take care of them that way as well.
This has been 24 strategies for your machine learning success. Shannon, back to you to see if we have any questions. Oh, love it. Thank you so much, William, for this great presentation. Let's answer the most commonly asked questions. Just a reminder, I will send a follow-up email by end of day Monday for this webinar with links to the slides and the recording. Diving in here. So William, can you not have machine learning without an MDM, master data management, implementation, but with a data quality program for all the data we move into the warehouse or lakehouse? Yes, you can, is the straight answer to that question. I am using master data management, I'm not necessarily saying, oh, you've got to get a master data management tool and you have to do a program that way. You can do master data management without a tool. As a matter of fact, I like to say that we're all doing master data management. We just may be doing it the hard way. And frequently, the hard way is without a tool. But the idea is to have a place to go, a go-to place where you can trust that that data is fit for my application. And I'm going to borrow it. I'm going to use that data as a service to my machine learning application. And that data has data quality in it, as the questioner mentioned. Yes. So as long as you have access to the data sets that have data quality, that is great. You might consider some of the other benefits of master data management to your overall organization, however. And by that, I do mean a project and a product and that sort of thing. So you can get there different ways. Yes. And can you expand on explainable techniques? Yeah. That's a mouthful. But you want the algorithms to be able to produce some language, and with LLMs, this is becoming a reality, of how it got where it got. Not too deep, because you can't get too deep, because nobody can understand it. But it's not too high level either. Like you cannot just simply say the algorithm picked it.
And so, well, hey, you know, the algorithm picked it, and that's the end of the story. So somewhere in between. And that's to be determined. I think the law and compliance regulations will help guide us along the way there. But I want you to stay ahead of that because it's just going to get harder than it is now, where it's sort of loosey-goosey. But what you're missing out on if you don't employ explainable techniques today is that buy-in. Because not only people in the business that are unexposed to ML, but also your executives, they're going to want to put trust in this application, they're going to want to know, roughly speaking, how it works. So you can either dig in, which is very hard. I find it very hard to dig in and understand sometimes, except to kind of explain the algorithm generically speaking, like k-means or whatever. Or you can choose a data science platform that is attentive to that area. So I think it's a definite plus when the platform is attentive to that area and is providing us some help with explainability. I love it. We've got just a little over a minute left, but I'm going to ask the next question. So with generative AI coming in, do we still need machine learning programs and projects, especially after seeing the cost of machine learning? Yeah, I didn't mean to dissuade anybody with the cost, because I think the costs mostly can be more than covered with the returns on these projects. So Gen AI is a type of AI, obviously, that's categorized a little bit differently from machine learning. But yes, there's room for all of this. So I'm not throwing Gen AI in here necessarily, but a lot of the things that I mentioned here are definitely strategies for Gen AI success as well, because it is a machine learning category. And so yes, you need both. Gen AI is going to focus a little bit more on those projects that are heavily centered around language and text.
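One widely used, model-agnostic explainability technique of the sort those platforms offer is permutation importance: shuffle one input at a time and measure how much the model's accuracy drops. A toy, library-free Python sketch (the model and data are invented for illustration):

```python
import random

random.seed(0)

# Toy "model": predicts 1 when the first feature exceeds 0.5; ignores the second
def model(row):
    return 1 if row[0] > 0.5 else 0

# Toy dataset: feature 0 drives the label, feature 1 is pure noise
rows = [[random.random(), random.random()] for _ in range(500)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

def accuracy(rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

baseline = accuracy(rows, labels)

def permutation_importance(feature_idx):
    """Accuracy drop when one feature's values are shuffled across rows."""
    shuffled_col = [r[feature_idx] for r in rows]
    random.shuffle(shuffled_col)
    permuted = [r[:feature_idx] + [v] + r[feature_idx + 1:]
                for r, v in zip(rows, shuffled_col)]
    return baseline - accuracy(permuted, labels)

print("importance of feature 0:", permutation_importance(0))  # large drop
print("importance of feature 1:", permutation_importance(1))  # 0: model ignores it
```

This is the kind of answer that lands "somewhere in between": not the math of the algorithm, but a ranked, plain-language statement of which inputs actually drive the predictions.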
And machine learning will be grabbing a lot of the rest, a lot of the more quantitative stuff. So room for both. And 2024 is definitely going to be the year for all of these kinds of projects. I love it. Well, William, again, thank you so much for this great presentation and for another amazing year of webinars. I look forward to hearing more in 2024. Thanks to everybody who's joined us throughout, and for the engagement. Again, I'll send a follow-up email by end of day Monday with links to the slides and links to the recording, everyone. So thank you all so much. Thanks, William. Thank you and bye, everyone.