Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of DATAVERSITY. We'd like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight, sponsored today by Informatica. Today, William will be discussing MLOps: applying DevOps to competitive advantage. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions through the Q&A section, or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #AdvAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open the Q&A panel or the chat panel, you'll find the icons for those features at the bottom of your screen. Just note that the chat defaults to sending to just the panelists; we absolutely encourage you to change that so you can network with everyone. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me turn it over to Abelash from Informatica for a brief word from our sponsor. Abelash, hello and welcome.

Thanks, Shannon. I hope everybody can hear me.

You sound good.

I'm going to share my screen here and start presenting. I hope everybody can see my screen too.

Looks good.

Right. Hello, everyone. Before I get started, allow me to introduce myself. My name is Abelash Mola. I'm part of the product management team here at Informatica, where I drive the roadmap and strategy for our MLOps product, and I also lead some of the serverless technologies on the data integration side. Today I'm going to talk about how to put AI into action and boost productivity with MLOps. Before I jump into the topic, let me set some context and lay out some of the things happening around us, which you're probably also seeing on a day-to-day basis in the world of cloud, AI, and analytics. If you look around, data volumes are increasing exponentially, with new data types and new file formats being introduced. More and more devices are connected today, and with connected devices there's more data generated, and we want to get insights from that data faster. New user personas have evolved in this cloud and analytics space. We also see a lot of organizations taking a multi-cloud and hybrid approach: data is stored both on-prem and in the cloud, and in the cloud they are adopting a multi-cloud strategy. So this is the new world we are seeing, with different volumes and formats of data, different user personas, and so on. Now, how do we get insights from this data faster? Before we get there, there are some challenges we face in being more productive. Some of the things we are seeing, and these are numbers from a third-party study, show that it's becoming more complex for organizations to get insights from their data.
For example, 72% of organizations today don't have a complete architecture to manage an end-to-end set of data activities. They're using many point products that don't integrate and interoperate very well, and for that reason data practitioners are spending over 80% of their time preparing the data instead of analyzing it. A lot of these organizations are also challenged by data quality. Another challenge we are seeing in this changing landscape is resource constraints: technology is changing, and organizations are finding it very difficult to find people with specialized skills. There's also a lack of automation, which keeps existing resources from scaling. And there's a lack of self-service access: as more non-technical users want to work with the data, the lack of self-service tools delays innovation because they can't get to the data quickly. Another challenge in this newer landscape is cost overruns. Most organizations, as they adopt cloud, are encountering budget overruns, leading them to question the value of cloud; 75% of organizations are running into these cost overruns and questioning whether cloud is the right strategy. Part of the reason is that it's very difficult to predict compute costs, and there's a lack of visibility and control over how users are accessing and processing the data. That's one of the reasons these organizations are running into budget overruns. So given this changing landscape and these data management challenges, only about 1% of AI/ML projects are becoming successful. Again, that's a number from a third-party study, but very few AI/ML projects are really making it into production today. How can we help AI/ML projects go into production much faster? That's where the concept of MLOps comes in. MLOps is basically the process of streamlining the development, operationalization, and execution of AI/ML models. If some of you have already embarked on this AI/ML journey, you probably know that productionalizing machine learning models is difficult and painful. Part of the reason is that the machine learning lifecycle, at a high level, consists of three phases: you have to prepare the data, build the model, and then operationalize the model and monitor it. Those are the three high-level steps, but if you double-click on each of them, there are many more components and steps involved, from data ingestion to data preparation, to model training, model tuning, model deployment, monitoring, explainability, and many more. It's a complex process, and that's why productionalizing machine learning models is difficult. It also requires collaboration and handoffs across different teams: data engineering, data science, and ML engineering. You may want the data engineer to clean the data for the data scientists, for example.
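To make those three phases concrete, here is a minimal end-to-end sketch in Python, using scikit-learn on synthetic data. This is a toy illustration under my own assumptions, not Informatica's product or any particular platform:

```python
# A toy MLOps lifecycle: prepare -> build -> operationalize/monitor.
# Hypothetical example on synthetic data; real pipelines have many more steps.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Phase 1: prepare the data.
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Phase 2: build the model.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Phase 3: operationalize (here, just serialize) and monitor.
joblib.dump(model, "model.pkl")              # stand-in for deployment
deployed = joblib.load("model.pkl")
print("holdout accuracy:", accuracy_score(y_test, deployed.predict(X_test)))
```

In a real platform, each of those phases fans out into the ingestion, tuning, deployment, and monitoring steps just described, with handoffs between teams at the seams.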
So there are a lot of teams that need to come together and collaborate in MLOps, and naturally MLOps requires stringent operational rigor to keep all of these processes in sync so they can work in tandem. Essentially, MLOps stands for machine learning operations. It's a core function of machine learning engineering that streamlines the process of taking machine learning models to production and then maintaining and monitoring them. It's a collaborative function that comprises different personas: data scientists, DevOps engineers, and also IT, who need to handle the infrastructure. So it's not just streamlining; it's a team sport, and cross-functional collaboration is key. Now, if you take a typical machine learning project today, you find different personas: data scientists, data analysts, and data engineers. Today, outside MLOps, these three groups each perform their own dedicated role, and in some cases these personas are part of separate organizational branches, with each group working in a silo. For example, as a data scientist, if I need something, I have to request it from the data engineer and wait for that person to fulfill the task before I can continue my work. A lot of back and forth happens. What MLOps does is make an attempt to remove those silos and empower the data scientists. Traditionally, a data engineer is required to generate features, but by applying MLOps approaches we empower the data scientists and remove these silos, so that the data scientists can kick off machine learning models, train high-quality models at will, and make them available for the business to consume. That's essentially what MLOps does. Now, given some of these challenges, we at Informatica recently launched a new product around MLOps called ModelServe. To overcome some of the challenges I described in the previous slides, we abstracted away the infrastructure overhead within our product. Essentially, there's a model registry and model deployment in a simple wizard-driven product, where data scientists can take a model built in whatever framework they're using, TensorFlow, PyTorch, or however they built the AI/ML model, bring it into our platform, operationalize it, and expose it as an endpoint within our IDMC platform. That way, data scientists can focus more on building high-quality models instead of worrying about things like infrastructure. We also have monitoring capabilities that help data scientists monitor models on a continuous basis. If you're interested in trying out this service, it's free; we made it available as a public preview.
You can just sign up. We also recently published an MLOps white paper; feel free to download it to put AI into action. Thank you very much, Shannon, for having me, and I'll pass it back to you.

Abelash, thank you so much for kicking us off, and thanks to Informatica for sponsoring today's webinar. If you have questions for Abelash, feel free to submit them in the Q&A section of your screen, as he will be joining us for the Q&A at the end of the webinar today. Now let me introduce our speaker for the series, William McKnight. William has advised many of the world's best-known organizations; his strategies form the information management plans of leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. And with that, I will give the floor to William to get his presentation started.

Hello, and welcome. I trust my slide is being presented okay here.

Looking good.

Thank you, Shannon. And thank you, Abelash, for that introductory material; I found it pretty fascinating. You talked about resource constraints, which is a very real thing, and I think that plays into our topic today a little bit: we're going to automate some of the functions within ML to make it successful without it being resource intensive. That other thing you said, Abelash, about 1% of AI projects being successful: I guess the only thing I have to say to that is wow. I noticed it was from a few years ago; hopefully it's at least doubled to 2% by now, and I guess we can fudge a little on what success is, but I do see ML still being strong, and definitely being a wave of the future, so we'll have to try, try again until we are successful with it. That's my take on it. Okay, so I titled this MLOps: Applying DevOps to Competitive Advantage, because in a lot of ways, as we'll see, MLOps is a fork, if you will, of DevOps as it applies to machine learning projects. Now, why couldn't we just use DevOps? Because it doesn't fully embrace all the nuances that ML has for us. So we created a new discipline a few years ago, called MLOps, and a whole wave of vendors have emerged in this space to help out with that. I do believe they provide strong value, and if you're serious about ML, you've got to be serious about MLOps, and you pretty much have to be using a product today, so I'll show you some samples as we go through. A little bit more about me: I'm a longtime consultant and analyst in the space of enterprise data. Today that means data lakes, data warehouses, analytics, BI, streaming data, graph data, and so on; a lot of the topics I speak about in this series. We offer strategy, training, and implementation. Interestingly, we provide a lot of strategy to the vendor community out there, and we actually teach them a lot about their competition, which a lot of people find interesting. But yes, we are on top of this space for you. Here are some of our client logos. We learn from our clients and bring that to our analyst work; we learn from the analysts about where things are going and bring that to our client work. It's a beautiful thing, and we've had the opportunity to be part of a lot of great, mostly large but also midsize, implementations over the years.
Okay, let's talk about machine learning, because that's what MLOps is all about. Machine learning uptake is strong. This is not the 1% slide, right? This may be a little more up to date, but you can see at least that a lot of people are trying, and they're putting machine learning models in production. This isn't a success slide per se, but if you're over 100 models, I guess you've had some success along the way, and a few people are, as you can see here. You might be thinking: well, where am I? I'm at zero. Well, I talked to the authors and publishers of this study, and zero is probably in the N/A over on the right, the gray bar: 5%, 4%, and so on, depending on industry. So the use of AI and ML to drive business transformation or reimagine the customer experience has become ubiquitous across industries. Again, we're out there trying. And throughout organizations large and small, one thing has become clear: the ability to deploy ML in and of itself is not a silver bullet for success. Today, the enterprise's ability to leverage ML to its fullest has reached a critical juncture: while many companies have built strong ML, far fewer have been able to deploy it. And that's where MLOps comes into play. But first, a look at some use cases for ML. I built this slide around various industries, public sector, oil and gas, and others, set against what I see as the four pillars of ML usage: flow optimization, modeling and analytics, predictive insights, and threat and risk analysis. I hope you can find your way in this slide somewhere and see a use case or two or three for your enterprise; you can probably just look and see what you're already doing with ML. Scaling ML to reach its maximum potential is a highly methodological process based on a set of standards, tools, and frameworks broadly known as MLOps. MLOps focuses on the entire lifecycle: design, implementation, testing, monitoring, and management of ML models. I love this topic, because I love ML. I think it's the strong wave of the future, and I am passionate about doing things right, in a process-oriented manner. MLOps brings that to ML. I'm also passionate about everything we do marching toward production and value to the enterprise, and MLOps does that. So what are some of the drivers for MLOps, besides just "we're doing ML, so why not?" Senior management does not always see ML as strategic, and it can be difficult to measure; that's more or less a truism warming us up here. ML initiatives can work in isolation from each other, resulting in difficulties aligning workflows between ML and other teams, and that can be a real challenge to leveraging the expertise in an organization. At the bottom there you can see the evolution of data usage across the enterprise: good old reports, data warehousing, data lakes, and machine learning. And that's actually how it looks within a lot of enterprises, because machine learning algorithms need data from someplace, usually multiple places, but one large place for that today is the data lake. I'm not going to say the data lake is a prerequisite for ML, but it's pretty much way up there, okay?
So, high quality data, as the third bullet here mentions. By the way, I've been working with some vendors that can support that need for high quality data, so if you want to discuss that, if you don't have enough data for your ML algorithms, start with some of the basic things. In response to all these drivers, ML adoption requires a cultural shift and a technology environment with people, process, and platforms operating in the responsive, agile way organizations are looking to operate in today. That's what we call MLOps. Creating that culture and environment cannot happen overnight; it comes by learning from those at the vanguard of ML how to map the potential of MLOps-driven innovation against an organization's specific needs and resources. This next slide shows a little of where we are with ML today. Many of the enterprises using ML are well past the phase of having just a few models in production. As we saw, we're cranking on the models; indeed, tens or hundreds of models may now be the norm, and highly mature organizations may have thousands. These organizations need a DevOps-like model for ML, and this is where MLOps emerged. So let's define it. Got to define it, right? MLOps is a practice for collaboration between data science and operations to manage the production machine learning lifecycle. I'm not going to read the rest; you can see it there, along with some of the benefits: faster time to market and a more rapid rate of production, which is very important, so you can turn around your experiments quickly and get to something that works for production. MLOps is essential for scaling ML; without it, enterprises risk struggling with costly overhead and stalled projects. Several vendors have emerged with offerings to support MLOps, so I'm just going to rattle off some names of vendors in this space, in case you hear about them and wonder what they do; maybe now you'll know a little more, so you can size them up for yourself: Microsoft Azure ML, Google Vertex AI, Algorithmia, Allegro ClearML, Alteryx Promote, Amazon SageMaker (see, none of them have MLOps in the name, which makes it a little confusing), Cloudera Machine Learning, Dataiku, DataRobot, Domino Data Lab, Informatica ModelServe, which we heard a little about earlier, Azure Machine Learning, which I mentioned already, Paperspace Gradient, SAS has one, Splice Machine has one. Today is not a competitive analysis; I certainly don't have enough time to talk about each one of these in any great detail, but I will say that they all do the things I'm going to talk about here to some degree, and the degree of applicability to your situation is obviously something you'll want to look into, because you want to go from ML to MLOps. Companies have built these strong ML capabilities, but few businesses have been successful in putting the majority of their ML models into production. Maybe this gets back to Abelash's statistic about that 1%, right? We've got to get it into production. Again, a passion of mine: let's get everything we do marching in that direction. Machine learning operations, MLOps, is a set of standards, tools, and frameworks used to scale ML objectives, create repeatability, shield the complexity of it all behind the scenes, and develop without needing, as I say here, a horde of engineers. Today, every enterprise serious about embracing machine learning is turning to MLOps.
It helps standardize and, to a degree, automate certain processes so engineers and data scientists can spend their time on better optimizing their model parameters and business objectives. MLOps can also provide important frameworks for responsible practices to mitigate bias and risk and enhance governance. So, MLOps operations; these are just some more basic truisms about MLOps. It's an iterative approach. It's automated tooling. And it helps you work collaboratively, leveraging knowledge gained someplace in the organization across the rest of the organization, and ensuring some level of structure (I know we want to be creative and free-flowing with this, but some level of structure) around the process of model creation. That is needed today, I will say, because you've got varying levels of experience in building ML models even within a single organization, so creating some consistency is going to be important: so you know, for example, how many models you have and where they are, how many are in production, and how many are working. They all should be iterated on, but how many are actually working, and what have they done for us? Because we want to see results. So why not DevOps? Well, DevOps success depends on how well platforms, data, and existing and new services can be integrated, adapting to changing circumstances, but not on machine learning. The organization needs to be considered in the context of machine learning to ensure consistent delivery of business value, and machine learning has some specialized needs in this area. DevOps does not address the operation and orchestration of resources for ML. That's why not DevOps, which is supposed to assure the delivery of value to the business's customers and its stakeholders. Alright, now let's get into some terminology around MLOps, because it's fairly terminology intensive. And no, terminology is not consistent out there; to some degree it is, I suppose, but I'm sure it's going to get worse before it gets better. That's just the nature of things, so I'm going to put a stake in the ground here for you and give you that foundational piece with these definitions, so you can go out into the market and really understand what people are talking about in regard to MLOps. We'll start with the pipeline (I have two slides of terminology, by the way, so this is only half of it). A pipeline is the process of automating the machine learning lifecycle, from data collection and preparation to model training, deployment, and monitoring. We used to just call these data integration flows, right? Well, now we've got to call it something else: a pipeline. It's a modern flow, and it has a lot of machine learning in it. Next, the dataset store holds the datasets, the data itself, wherever that may be; I talked about the data lake being a major place for that. Then, in MLOps, the repository: a storage location for all the artifacts related to a machine learning project. This is where your datasets are referenced and where your models, your scripts, and your configuration files live. It's used to store and manage the project's source code, track changes, and collaborate with other team members, so it's pretty important. I'm going to show you a screenshot of a repository, which helps the concepts sink in.
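To help tie these terms together, here is one way you might picture how the pieces relate in code. This is a hypothetical, simplified data model of my own, not any vendor's actual schema; all the names are illustrative:

```python
# Hypothetical sketch of how MLOps terms relate; names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Run:                      # one execution of a pipeline
    run_id: str
    metrics: dict               # e.g. {"accuracy": 0.91}

@dataclass
class Experiment:               # groups related runs for comparison
    name: str
    runs: list[Run] = field(default_factory=list)

@dataclass
class Repository:               # storage for all project artifacts
    dataset_refs: list[str]     # pointers into the dataset store
    scripts: list[str]          # training and scoring code
    configs: list[str]          # environment and pipeline configuration
    experiments: list[Experiment] = field(default_factory=list)
```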
History: a logical picture of all the elements required to support a given ML model. Okay, more terminology. Workspace: a workspace is the virtual environment that, as the name may imply, allows data scientists, engineers, and other stakeholders to collaborate on the development, deployment, and management of machine learning models; a platform for teams to work with data, code, and models, as well as track and monitor model performance. Then there is the target: that's what you're trying to get out of the model. Next, the experiment. Let me spend a minute on that. It's the process of running an ML model in a production environment and measuring its performance; I guess that's the way I would say it. It typically involves deploying the model to a production environment, collecting data from the environment to see how it's doing, and then analyzing results to determine accuracy and performance. And by the way, in case you don't know, these models are continually enhanced and iterated. The model itself: the packaged output of an experiment. That's what it is, and it can be used to predict values or to build on top of. An endpoint in MLOps is a web service or API that allows users to interact with the ML models, typically used to make predictions or provide insights into the model's performance. These are typically hosted on a cloud like AWS, Azure, or GCP. And that is the end of the terminology, in case that was just brutal for you; we're done. Now let's look at some better-looking pictures than that. Okay, the workspace itself. This is a pretty basic one; I am showing you examples from Azure ML, as an example only, not meaning to call it out for anything other than providing us an example today. The pipeline must be able to package and deliver models and code into production, both into training and into target environments, in order to adhere to the principles of continuous delivery, that CI/CD we all hear about, which is very important here in MLOps. The workspace is used to develop both applications and models, package models into containers or microservices, and configure the target environments; you can see some of that going on in our workspace example. A workspace, you could also say, is a cloud-based development environment that enables you to collaboratively develop, test, and deploy ML models. I wanted to give you a quick peek at the workspace since I just defined it. Now let's talk about the stakeholders. Abelash already introduced some of the stakeholders in this whole process, so maybe I'll double-click on that. We see here some of the artifacts that they work with. There's the developer, the data scientist, the business user, and a few others. Let me start with the data scientists, because to me they kind of own the room when it comes to ML, and I want them to, because they're going to provide us a lot of value, right? They develop a lot of models to answer business questions brought up by SMEs, or they do whatever they feel like, and that's okay.
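Since the endpoint was just defined, here is what consuming one typically looks like: a hypothetical REST scoring call, where the URI, key, and payload shape are all made up for illustration rather than taken from any specific vendor's API:

```python
# Hypothetical call to a deployed model endpoint; URI, key, and payload are illustrative.
import requests

scoring_uri = "https://example.com/score"        # assumed endpoint URL
api_key = "REPLACE_WITH_KEY"                     # assumed auth token
payload = {"data": [[0.4, 1.2, 3.1, 0.0]]}       # one feature vector

response = requests.post(
    scoring_uri,
    json=payload,
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=10,
)
print(response.json())   # e.g. {"predictions": [1]} for a churn/no-churn model
```

In practice, the endpoint host, auth scheme, and payload schema all come from whatever platform deployed the model.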
I like it when the data scientists can do whatever he or she feels like. I like that I have enabled that and haven't pigeonholed them into "okay, you get to do this, but everything else you can't do until ten other things are done." I like it when the scientists have a free hand and can create high levels of business impact; frankly, I think that's where the big levels of business impact have come from recently and are going to come from in the future, so enable those data scientists. They are also responsible for testing the models and delivering them into production to produce that business value, and they review the model results and accuracy and retrain the models. That's their domain. Now, also shown here is the developer. They develop the APIs and the applications that work with ML models, and they verify the models work correctly with other software platforms. They don't get into the model results per se; they don't test the models. They're more or less integrators of the applications with the models. The SMEs are not shown here, but they are the higher-level business users. They might ask the business questions that kick off the whole process, and they ensure model performance meets their business needs and goals; so it's not just the data scientist who says "this is great." They work in conjunction with the users, the SMEs. There are also the data analysts. I mean, everybody wants to be a data scientist today and nobody wants to be a data analyst, but there are data analysts who handle data analysis and exploratory data analysis. They optimize and build the data extractions for use in ML processes, and often they are fairly attached at the hip to the data scientists or the data science process. We could go on; there are other stakeholders: the machine learning architect, who enables scaling for ML models in production and improves and optimizes the architecture for ML models in production, and the DevOps engineer, sitting over all of this, overarching, looking at the process and making sure it's followed, managing the security and performance of the architecture that supports ML models, and handling all the CI/CD aspects for ML models in all environments. Okay, so those are the stakeholders; here are some of the things they'll be doing. This is the flow of things, how things need to look for ML: configure the target, prepare the data, train the model, containerize the service, validate the results, deploy the model, and monitor the model, and then probably repeat, ad infinitum. Alright, so how does this need to look? Let's consider the activities involved in the development of an ML-based application, as I just went through. To adopt the terminology, we call this sequence of activities a pipeline. The ML pipeline necessitates review and iteration: models need to be tuned, results need to be tested, and data sources and models need to be improved. For example, you may discover that the insights you need are associated with only a subset of the data sample, or you may discover some inherent bias in the results, which needs to be addressed through additional data or by improving the results, or you may find discrepancies between training and inference data sets. We call that data drift, by the way.
So for these iterative pipelines to continue to deliver results, we need these things out of our pipelines: reproducibility, reusability, manageability, and automation. With these in place, it is possible to deliver on the iterative nature of ML model and application development. As a result, data scientists can have the benefits of CI/CD, evolving a model creation pipeline, a working environment, and a target architecture continuously. The pipeline must be able to package and deliver models and code into production, both through training and target environments, in order to adhere to the principles of continuous delivery. Continuous integration and delivery, CI/CD, like DevOps, depends on automation to ensure quick and repeatable pipelines, especially when supplemented by governance and testing, so we have bounds around what we're doing. Let's take a look at an example, an MLOps scenario. Our scenario will be customer churn. An online mobile phone retailer has been looking at how it can reduce churn across its customer base; pretty typical, right? They need to do this. Churn relates to the length of time that customers stay loyal, particularly at the end of a subscription period. They're well established. To reduce churn, the retailer needs to increase customer loyalty, ensure that products and services are a good fit (make sure the customers are in the right products for them and that you're offering the right next best product), and deliver targeted and effective marketing. However, the linkage can be difficult. Let's say the retailer already has some success with ML. This isn't their first model, and they have some data scientists and engineers and so on; put all that aside. So what steps are required to expand to an MLOps approach? They seem kind of ready for success here, right? They need to configure the model and the data environment. A first step is to prepare the data and the modeling by putting in place an environment that can manage ML within an iterative process. Platforms and tools need to be able to support deployment of the infrastructure and the libraries to multiple local and cloud-based targets, depending on model status. Next, setting up the data store. Pipeline steps may also consume data sources and produce intermediate data; we'll set that aside for now. In this scenario, customer loyalty factors may be fed back into the model as variables, for example, testing the effectiveness of historical loyalty schemes. And remember, for ML: ML itself can't tell you whether it is successful. Ultimately that comes back, today anyway, to the data scientists working with the SMEs to confirm at the least the success of the overall model, and then we need to factor that into what we're doing in MLOps: what do we need to iterate on, and how strongly? Creating the pipelines now; this may be a little bit of an eyesore, I apologize, but it's just the basic workflow you see here, which is how we create the pipelines for training and inference. With an environment and data in place, it is possible to consider how to organize the flow of model creation activities, from training through validation and testing, operation, and inference, as reproducible pipelines. During training and other steps, scripts can read from or write to a data store, and records of execution are saved as runs in the workspace, grouped under experiments.
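As a toy illustration of a reproducible training run grouped under an experiment, here is a sketch on synthetic data with made-up feature names. It is not the retailer's actual pipeline, just the pattern of pinning seeds and recording a run:

```python
# Toy, reproducible churn-training run; data and field names are synthetic/illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 7  # fixed seed so the run is reproducible end to end

rng = np.random.default_rng(SEED)
n = 5000
X = np.column_stack([
    rng.integers(1, 60, n),        # months_on_contract
    rng.poisson(3, n),             # support_calls
    rng.normal(55, 20, n),         # monthly_bill
])
# Synthetic label: churn risk grows with bill and support calls, shrinks with tenure.
logits = 0.03 * X[:, 2] + 0.4 * X[:, 1] - 0.08 * X[:, 0] - 1.5
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=SEED)
model = RandomForestClassifier(n_estimators=200, random_state=SEED).fit(X_tr, y_tr)

run_record = {                     # what a workspace would store as a "run"
    "experiment": "churn-baseline",
    "auc": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
    "seed": SEED,
}
print(run_record)
```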
And by reviewing the experiments, the retailer can monitor the results for applicability and effectiveness of the insights. Speed and repeatability are key to this test-and-learn approach. At the same time, managers need to ensure that the results are delivered in the right way, for example, assessing whether data drift remains below a certain threshold between training and inference. Let's drill in on that a little. Let's talk about data drift. I've brought it up a couple of times, and you may be wondering when we might talk a little more about it, because I just sort of put it out there; there's a lot of information here. Monitoring the results for applicability and the effectiveness of insights is part of this process, as I've mentioned. Now, the main thing here, in my opinion, is to look at data drift. Data drift is the phenomenon of data changing over time, resulting in a model's performance degrading. It can be caused by changes in the data distribution, such as new data points or changes in the underlying data. Drift can also be caused by changes in the environment, such as changes in hardware or software. Drift can be monitored and managed through MLOps. Once you're satisfied on this, you can use (since we're talking Azure) the Azure Machine Learning SDK, or whatever SDK is associated with your MLOps tool, to create and then publish a pipeline into the workspace. So MLOps looking at drift is, to me, kind of an under-the-radar but effective aspect of it, and it's pretty important. This is an Azure Machine Learning "architecture example," so you can see how it sits in the context of everything. It's a fully managed platform as a service for ML: Azure ML provides developers and data scientists the ability to build, train, and deploy machine learning models and accelerate time to value with end-to-end, fully featured MLOps. Any ML tool is going to provide end-to-end lifecycle management, keeping track of all experiments and storing the code, settings, and environment details to facilitate experiment replication. And these models, by the way, can be put in containers for deployment like any container, in Kubernetes or whatever. There are a lot of ways to carry out ML on a cloud computing platform; a popular choice is to leverage a machine learning service, which is a collaborative environment. So here is the solution architecture for Azure, using some Azure terms, but it really applies to a lot of MLOps tools; I'll just pick on a few things. A workspace is used as a managed workstation by data scientists to build those models, and a compute cluster is used as training compute to train ML models. Once the model is created, it can be deployed on an Azure Kubernetes Service cluster, or whatever Kubernetes service you're using. A user can use the ML Python SDK, CLI, or UX, as is mentioned there, to provision a workspace, private link, customer-managed keys, and role-based access controls; this is how we put some controls in place around all this. After the overall security controls are in place, automation can be done at that point. ML models are trained using the compute cluster as a training cluster. If using a public IP is prohibited, the IT administrator can enable a private link or build the compute cluster behind a VNet, and a model can be deployed on the AKS cluster after it's been created.
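Before we get into features, let me circle back to drift with something concrete. Here is a minimal sketch of one common drift metric, the population stability index (PSI), comparing a training-time feature against incoming inference data; the 0.2 alert threshold is a widely used rule of thumb, not a universal standard:

```python
# Minimal population-stability-index (PSI) drift check; threshold is a rule of thumb.
import numpy as np

def psi(train: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """PSI between a training feature and the same feature at inference time."""
    edges = np.histogram_bin_edges(train, bins=bins)
    p = np.histogram(train, bins=edges)[0] / len(train)
    q = np.histogram(live, bins=edges)[0] / len(live)
    p, q = p + 1e-6, q + 1e-6            # avoid log(0) for empty bins
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
train_bill = rng.normal(55, 20, 10_000)   # monthly_bill at training time
live_bill = rng.normal(63, 20, 2_000)     # prices crept up in production

score = psi(train_bill, live_bill)
print(f"PSI = {score:.3f}", "-> drift alert" if score > 0.2 else "-> OK")
```

Real platforms compute something like this per feature on a schedule and alert when the threshold is crossed.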
Now, some of the features of MLOps. I'm going to go from embryonic features all the way to more mature features. I have about four slides on the features of MLOps, which I have evaluated for several of the leading vendors. First: ease of setting up and use. These are my criteria for you. Creating those managed ML endpoints: how easy is it to do, how quickly can you do it, and what kind of person do you need to do it? Part of that is configuring the networking and the security and connecting to a workspace using the vendor portal; how quickly is all that done? How about creating the compute resources: how quickly can a data professional deploy and attach compute resources to a workspace? Is that hard? Actually, I recommend doing this before you commit to a tool. In our case, in the benchmark we did on these tools, we created a compute instance with startup scripts and an auto-shutdown policy, provisioned by an admin persona but assigned to a data scientist, and we put it behind a VNet with no public IP. We deployed a single instance and a production-grade cluster, and we measured all that. For managing the compute resources, you want to look at how easy it is to delete a resource; that does happen. Then the MLOps workflow and model orchestration: how quickly can you build models? For example, some of them have a one-click ability for a data scientist to launch a Jupyter, RStudio, or terminal interface to build models. That's great. How about reusing the models? And then data orchestration: how quickly and easily can a data engineer perform things like importing data, validating and cleansing data, transforming and normalizing data, and staging data? All of that has to do with the important asset in this process, the data. So that's my very beginner set of MLOps features. Moving on, growing a little here, into security. To fully secure the MLOps platform, network perimeters must be put in place to keep potential attackers out. IT administrators need to configure the platform and the other services, like storage, the key vault, the container registry, and the compute resources or virtual machines, in a network-secure way, such as using virtual networks, to enable end-to-end machine learning lifecycle security. A virtual network acts as a security boundary, isolating your resources from the public internet, so I won't belabor that; security is pretty important in this process. Then governance capabilities. Wow, we haven't really talked about that too much, but it's a part of this as well. Governance capabilities should allow users to set up network and data protection policies that ensure users are not able to create problems with public IPs or without customer-managed keys. Additionally, monitoring is key to maintaining effective governance; the cloud platform should offer full-stack monitoring. Finally, automation: it's a key differentiator in MLOps. You want to be able to automate your experiments, that is, the ability to automatically pick an algorithm and generate a deployment-ready model (and look at the smarts it uses to pick the right algorithm for the job); to automate workflows; to automate code and app orchestration, maybe using GitHub or Team Foundation Server or something like that; and finally, event-driven workflows.
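On that automation point, here is a rough sketch of what "automatically pick an algorithm" can mean at its simplest: a naive cross-validation bake-off over a few candidates. Commercial AutoML features do far more (search spaces, feature engineering, ensembling), so treat this only as the core idea:

```python
# Naive "auto-pick an algorithm" sketch: cross-validate candidates, keep the best.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=12, random_state=1)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=1),
    "gboost": GradientBoostingClassifier(random_state=1),
}

scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores)
print("selected:", best)   # the winner would be refit on all data and deployed
```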
Moving right along, growing a bit with MLOps capabilities: experiment management. Yes, I want to look at the capabilities for managing experiments at any point in time. A company should have, well, really dozens of experiments in place if they're really active with MLOps, so managing those experiments is pretty important. Then, accuracy management and retraining. Typically, MLOps platforms provide ways for the accuracy of all runs in an experiment to be compared, including through visualization of the relevant metrics data, and once an experiment is completed, the data scientist or ML engineer will have determined which model is best and will proceed to submit that model for deployment. Now, while it's possible to achieve this with operating-system-based cron jobs and schedulers on provisioned physical or virtual infrastructure, a dedicated scheduler in the MLOps platform is generally more reliable, so we're looking for that capability. Prediction accuracy has to do with looking at the actual values for the predictions and how they played out, how it all became known; and then there's the retraining aspect, which is doing the whole thing over again with new data and new information. So, a lot of things there. And finally, we get to some other things to look at. Yes, there's a lot to look at; these products haven't been around a long time, but some of them do a lot of this quite well. We want to look at model explainability. We have learned that it's increasingly important to be able to explain how models work. This might be to a third party, it might be to a customer, or it might just be for your own information, but the more the MLOps tool can provide for that, the better. Explainability frameworks are critical to responsible AI as well, because they can help humans monitor and confirm that the predictions and their explanations comport with common sense and reality on the ground, and in general are fair. And we humans still have that common sense, right? That's what we bring to the table, so that's our part in this. Comparing models against each other for performance, accuracy, and other metrics is an important MLOps capability. And this all costs. Moving to MLOps costs: we did the study, and we have it broken out here for three products, to be unnamed today (not important, because I'm just giving you parameters, not laying any details out there), and the grand-total bottom line is about $70K to $196K for a midsize organization. Large organizations would probably be a good double or triple that. Of course, there are gradients here, and I'm just generalizing. We are assuming here that the amount of model prediction compute stays fixed and that we use 16 compute nodes running 24/7/365, which may or may not be your case, but that's what we used to come up with these numbers. Notice that each one is broken down by compute and service; sometimes the service is included in the compute.
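For a back-of-the-envelope feel for those figures, under the stated assumption of 16 compute nodes running 24/7/365, and treating the totals as annualized (my reading, not something the study states), the implied per-node-hour cost works out roughly as follows:

```python
# Back-of-the-envelope: implied per-node-hour cost under the stated assumptions.
# Assumes the $70K-$196K totals are annualized (my reading, not stated explicitly).
nodes = 16
hours_per_year = 24 * 365              # 8,760 hours per node per year
node_hours = nodes * hours_per_year    # 140,160 node-hours

for total in (70_000, 196_000):
    print(f"${total:,} -> ${total / node_hours:.2f} per node-hour")
# -> roughly $0.50 to $1.40 per node-hour, compute plus service combined
```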
Now, you may say that all sounds great. Well, you are somewhere in here: you might be at maturity level one, all the way through five. There are enough data points out there from users of MLOps to actually have a go at what the maturity levels might be. I've laid it out this way. Maturity level one is where you're just gaining an understanding of machine learning, and there is a general belief (I know that's kind of weird to say, that an organization has a belief, because an organization is made up of people with different beliefs, but call it the general understanding), and the mindshare the company gives it, that DevOps is fine for us; we're not going to need anything more. You move on to maturity level two. This is where the data architecture is serving up most of the data that would be necessary for ML. Good thing; that's a big item of pre-work, if you will, before you can step into ML. The cloud commitment is there, and that's going to be necessary. You have at least one true data scientist in place, and full-lifecycle ML is accomplished, but with manual processes. Once that becomes too painful, we might move on to level three, where you begin to fork your DevOps for ML. Maybe you've embraced an MLOps tool at this point, or maybe that's level four, but somewhere in here you're not only moving to MLOps tools, you also have company-wide embracement of ML: the power it brings, the value it brings, the commitment, and so on. Level four is where you are fully into MLOps. Now, are you fully mature with MLOps? Not until five. At five, the business has fundamentally changed due to ML, and it could not have done so without MLOps. ML is applied to initiatives wherever possible, and MLOps is well nurtured within the organization; it's iterating, we understand it's iterative, and you have the governance aspect we talked about in your MLOps. So you're doing pretty well there. Now, this will change over time; we'll have to do more. But most of you are somewhere in one, two, or three, I would imagine, and hopefully you've learned a little more about where that journey goes today. So, in conclusion: MLOps uptake is strong. An MLOps workspace is a cloud-based development environment that enables you to collaboratively develop, test, and deploy machine learning models; that's the workspace. You will develop iterative pipelines to continue to deliver results. Automation is a key differentiator among MLOps platforms. And embrace transparency and predictability in your MLOps journey. That brings me to the end of my slides, and I'll turn it back over to Shannon to see if we have any Q&A for myself and Abelash.

Thank you so much for another great presentation; I appreciate it. If you have questions for William, for Abelash, or about Informatica, feel free to submit them in the Q&A portion of your screen. And just to answer the most commonly asked question: a reminder that I will send a follow-up email by end of day Monday with links to the slides and the recording for this webinar. So, diving in here: "I didn't hear mention of DataOps. It seems plausible to delegate to the data team everything that fills the lakehouse. Would that include data drift and schema drift, which affect all applications, not just machine learning?"

I can take a first pass at this. I did mention DevOps; DataOps is not as ruggedized a term, so I didn't use it, but pretty much everything I talked about when it comes to DevOps applies, and it's really DataOps that I'm talking about here today. Now, I did not get into exactly who's going to do everything within MLOps, but certainly I would expect a data professional to handle the types of things you mentioned in the question, data drift and schema drift. So yes, DataOps is a part of it, if you consider it that way. And obviously there's more. What the question left out was all the ML processes, right? You talked about the data processes, which is great and very important.
But there's also the process of moving machine learning across the platforms into production and beyond, into its iterative eventuality, and that is a very strong aspect of MLOps, beyond the DataOps aspects of it.

Yeah, and just adding to that: the philosophy around DataOps and MLOps is basically collaboration, bringing different departments together so they can work together. Both DataOps and MLOps have some overlap; I mean, there's the word "ops" in both, so if you draw a Venn diagram there's definitely overlap, and both of them are trying to automate processes within the pipeline. DataOps automates the entire process from data preparation to reporting, and part of that carries over to MLOps: the whole data preparation aspect of DataOps is applicable to MLOps as well. But MLOps, primarily, like William said, is mostly focused on automating the entire process from model creation to deployment and monitoring. So that's the difference between the two.

Great question and answers. That's the only question so far, but I'm going to give everyone a moment to type an additional question. Let me ask Abelash if there's anything you want to add after hearing William's talk.

I don't have anything specific to add here. But yeah, MLOps is evolving, and like William said, my 1% slide was an old statistic, from 2018. Given how this whole space is evolving, more and more ML projects are becoming streamlined, there's more visibility at the C-level on ML projects at various enterprises, and ML projects are becoming successful. It's not a big number yet, but I think there are more tools out there that can help you with these initiatives.

I love it. Well, thank you both so much for these great presentations, thanks to the attendees for attending today, and thanks, of course, to Informatica for sponsoring today's webinar and helping make these webinars happen. Oh, I've got one more question: "Our group has different environments, dev, UAT, and prod. Do you suggest having different workspaces?"

I'm going to let Abelash answer that in the context of the Informatica product, because I think it's fairly product-specific as to how you would handle that.

Yes. Coming to workspaces: if you bring in security around how you access data between dev, UAT, and prod, then yes, ideally; most of our customers have different workspaces for dev, UAT, and prod.

All right. Again, just a reminder: I will send a follow-up email by end of day Monday with links to the slides and the recording for this webinar. Thank you both so much. Thanks, everyone, for joining. Thank you.