I'm very excited to present at DevConf again this year, even if only virtually. I hope next time we can all be in person and see each other. The topic today is feedback about the deployment of an intelligent application. We will consider machine learning model deployment, and we're going to focus on the Python ecosystem.

Let's see what is on today's agenda. Project Thoth is the first topic we're going to address: what Project Thoth is, what we do in it, what our goals are, the integrations that Thoth has, and how you can interact with the Thoth services. The second topic is the relation to MLOps: we talk about this new pipeline that gives feedback about deployments, but how does it fit into the MLOps context? Then we're going to look at why, what, and when: why we want to provide this pipeline, what we want to provide in this feedback, that is, what the data scientist and the DevOps engineer will see at the end, and when we're going to provide it, in other words what triggers this pipeline. Then we're going to see how you can use it: where you can test it and what you need to install to use it. At the end, we will see all the pieces of the pipeline, what the pipeline actually does, and an example of the results it provides: the feedback itself, how it is shown, and how the user will see these results.

But let's take one step at a time and start with Project Thoth. Project Thoth is an AICoE open source project with three main goals. The first is that we want to help developers with the selection of the software stack. We want to provide a service that allows them to decide which software stack to use based on their requirements. What do we mean by that? When you start working on your project, you usually select some libraries, like TensorFlow or Matplotlib. These are typical libraries that you use in your project. But when you select these direct dependencies, you should know that other packages get installed together with them: TensorFlow pulls in NumPy, and NumPy has dependencies of its own. So there are different layers of dependencies, called transitive dependencies, and they all come along with the packages that are installed when you just run pip install tensorflow, for example.

Selecting these dependencies is not trivial if you want to consider more than just the latest version. Maybe there is a version, or a combination of dependencies, that gives you higher performance, or a combination that gives you better security. Or, if you care about community health and want to know which are the best maintained projects in your software stack, you can find out that information too. What Thoth tries to give you is a tool, a service, that eases the selection of these dependencies based on your requirements. Say I'm a data scientist and I need to train my model and I want the most performant software stack: I can ask Thoth for a recommendation of type performance. Or maybe I want to put my model into production, my application is ready, I have my software stack, but which dependencies should I use for security, so that no CVEs or vulnerabilities impact my application?
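Before moving on, here is a minimal sketch, not part of Thoth itself, that makes the idea of transitive dependency layers concrete by walking the metadata of an installed package with the standard library; it assumes the package (TensorFlow here) is already installed in the current environment.

```python
# Sketch: list the layers of dependencies behind a single "pip install tensorflow".
# Assumes the package is installed in this environment (Python 3.8+).
from importlib.metadata import requires, PackageNotFoundError


def direct_requirements(package):
    """Return the direct requirements declared by an installed distribution."""
    try:
        # Keep only the package name part; markers/extras are good enough to ignore here.
        return [r.split(";")[0].split(" ")[0] for r in (requires(package) or [])]
    except PackageNotFoundError:
        return []


def walk(package, depth=0, seen=None):
    """Recursively print the transitive dependency tree (best effort)."""
    seen = seen if seen is not None else set()
    for req in direct_requirements(package):
        name = req.split(">")[0].split("=")[0].split("<")[0].split("[")[0]
        print("  " * depth + name)
        if name not in seen:
            seen.add(name)
            walk(name, depth + 1, seen)


walk("tensorflow")  # prints numpy, protobuf, ... and their own dependencies
```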
All these questions about performance and security can be answered by the Thoth services. Another important goal we have is to deliver optimized images. If you follow this description of the different recommendations that Thoth can provide, you can see that we can also provide optimized images for your application. If you're working on natural language processing or on computer vision, each of these specific stacks has its own dependencies and transitive dependencies, and each of them can be optimized for specific runtime environments. Whether you run on GPU or CPU is another important factor that affects the selection of the dependencies. It's not just the version that you typically pick for a specific package; different versions work on different runtime environments. Maybe a combination works on UBI 8 but does not work on Ubuntu or on Fedora. As you can see, the space of combinations is very large: not just the dependencies, but all the layers on top of them as well. Different Python versions also impact the selection of the software stack.

The third goal is that we don't want a human to do this work; we want bots to automate it. Keeping these dependencies up to date, verifying that there are no CVEs in your software stack, verifying that you are using the most performant software stack for your application: this is something we want to automate completely. We want the bots to take care of it, so that you can focus on your project, on your problems, on the real problems you need to solve.

But let's go ahead. How can Thoth help the developer? We mentioned some of this already, but basically we want to keep dependencies up to date, so you don't have to focus on that. The bot comes to your repo, opens a pull request, and updates your dependencies with a specific justification: maybe there is an increase in performance, or a package is not secure enough, there are too many CVEs and the community is not maintaining it, so you should be warned or you should use another version with fewer vulnerabilities. All of this can be completely automated by the bots and by the Thoth services. We want to maintain a secure and performant software stack for your AI application, and we will keep integrating more knowledge into Thoth in order to provide better recommendations and better justifications of why a decision has been taken by the bots or by the Thoth services, to make your life easier and to offload that work from you. You don't need to go and update the dependencies unless you have some specific requirement, of course, but in general this is something the bots can take care of completely. We can also help developers by integrating our services into day-to-day developer tools and data scientist tools.

So let's have a look at the integrations that Thoth provides. First of all, there is a command line tool, which is called Thamos.
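As a quick, hedged sketch of driving that CLI from a script: this assumes Thamos is installed (for example via pip install thamos) and that a .thoth.yaml file describing your environment, which we discuss later in the talk, already exists in the working directory; the exact commands and output format may vary between releases.

```python
# Minimal sketch (assumptions: `pip install thamos` was run and a .thoth.yaml
# file is present in the current directory; behavior may differ by release).
import subprocess

# Ask the Thoth service for a recommended, fully pinned software stack
# for the runtime environment described in .thoth.yaml.
result = subprocess.run(["thamos", "advise"], capture_output=True, text=True)

print(result.stdout)      # the advice / resolved stack reported by the service
print(result.returncode)  # non-zero if the advise could not be completed
```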
Say I'm on my laptop: I have a certain operating system and a certain Python version, and I want to start my project, but I don't know what the best software stack is for this specific environment, because it's not only about taking the latest release; some packages simply cannot be installed in certain environments. Instead of getting stuck debugging and trying to solve that myself, I can immediately ask a service that already has that knowledge and can give me directly a software stack that will work on my machine. That is the CLI.

Then we integrate with data scientist tools, so we have a Jupyter integration. Something we strive for is to help data scientists with reproducibility and shareability of notebooks. We see many repos with notebooks whose dependencies are not maintained, and where dependencies are installed directly with pip install from the notebook cells. This breaks reproducibility: there is no way to reproduce that notebook consistently if I run it today, in a week, or in a month. This is something we don't want, and we want to help developers solve that problem, so we created an integration for Jupyter notebooks. You can manage your notebook requirements using the UI, the CLI, or the magic commands available in the notebook. You just ask for a specific package and the extension takes care of it: it reaches out to the Thoth services, creates the software stack, pins down all versions with all the hashes, and stores everything in your notebook. This way, when you give the notebook to someone else, they can reuse the same extension and say, okay, this is the software stack that my colleague, or this upstream contributor, was using. This benefits all of us, because if I want to try an experiment that another person did, the first thing I have to do is install the dependencies; if I cannot install the dependencies, I cannot reproduce the experiment.

And then we have the bots. We talked about them: we want to automate many of these things. The Kebechet bot is the one that takes care of the dependencies. You can install it from the GitHub Marketplace, and once it's installed it can immediately start taking care of your dependencies. It looks for specific dependency management files; we focus on the Python ecosystem, as I said at the beginning, so it looks for Pipfile.lock, requirements.txt, and requirements.in, and it can handle these dependencies automatically.

Then, of course, we want to integrate with source-to-image builds. If you want container images that are built with the Thoth services, you can integrate them and receive recommendations on your software stack during the build. If you integrate this in CI, for example, and you ask for security, the Thoth services can check whether the software stack you're about to build is secure enough, and the CI can immediately fail if some packages do not come from a trustworthy source (a toy sketch of this kind of gate follows below). This is important in order to maintain a secure pipeline and secure images before you deploy to any production environment. And we also want to optimize the deployment pipeline.
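In the same spirit, here is a toy sketch of the kind of automated gate a CI integration can run. It is not the actual Thoth or AICoE-CI check; it simply fails a build when the requirements file is not fully pinned, which is a precondition for the reproducibility discussed above. Everything in it is illustrative.

```python
# Toy CI gate (illustration only, not the Thoth/AICoE-CI integration):
# fail the build if any requirement is not pinned to an exact version.
import re
import sys
from pathlib import Path

PINNED = re.compile(r"^[A-Za-z0-9._\[\],-]+==\S+")  # e.g. numpy==1.23.5


def unpinned(path="requirements.txt"):
    """Return requirement lines that are not pinned to an exact version."""
    lines = Path(path).read_text().splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip()
        and not line.strip().startswith(("#", "-"))
        and not PINNED.match(line.strip())
    ]


if __name__ == "__main__":
    bad = unpinned()
    for line in bad:
        print(f"unpinned requirement: {line}")
    sys.exit(1 if bad else 0)  # non-zero exit fails the CI step
```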
That optimized deployment pipeline is close to part of the topic we're going to consider today; it is, let's say, a first step towards that goal. Thanks to its knowledge, Thoth is going to learn what the best deployment is for you: from the software stack point of view and from the runtime environment point of view, this is what you should use in order to get the most performance out of your model. We will see this in a moment.

How can you interact with Thoth? You have all these integrations, but the point of interaction is always one specific configuration file. In this configuration file you can state which requirements format you're going to use: Pipenv, pip-tools, Poetry; all these kinds of requirements can be handled by the system. Then you can state the runtime environment: whether you're using Fedora 34, UBI 8, or Ubuntu, all these runtime environments can be stated, and based on the knowledge that Thoth has, we can provide a recommendation. Of course, not all runtime environments are supported at the moment, but if you are interested or have some specific requirements, you can reach out to us and we can discuss it. As you see, there is not only the operating system: there is also the Python version, whether you use CUDA (if you have images with GPUs that need to be used), and the type of hardware you're going to run on. This configuration file can also be discovered automatically: there are commands in the CLI, in the S2I integration, and in the notebook integration that discover this information from the system, so you don't actually need to provide it unless you have specific requirements. A hedged example of what this file can look like is sketched below, after the MLOps overview.

Now, the MLOps relation. Let's talk a bit about a general example of what happens in the machine learning lifecycle of your project. You have your project, typically on GitHub or GitLab, the places where you work with your team. I am a data scientist, and what I will do is use a specific platform. For example, I can use the Operate First environment, which is Open Data Hub deployed in the open on top of OpenShift. There you have an open environment where you can start an experiment with all these tools: JupyterHub, JupyterLab, Elyra, Kubeflow, Tekton. Basically I have my whole environment for doing my data science project: I use notebooks, I create pipelines, I can re-trigger the training of my model. All the tools that a data scientist requires are there. This is the first part of machine learning development.

Then, every time I make changes, or I want to freeze my project at a certain moment in time, I use releases. This is something you do with CI/CD: you run your CI checks, you do the builds, and in this way the project can be tracked and everything always comes from the source of truth, which is your GitHub repo where everything is stored. You have the Kebechet bot, for example, that maintains and updates your dependencies, so you don't have to do anything there. The CI is all automated; what you have to do is just release. And then you have the last part, which is moving to the stage and prod environments.
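To make the Thoth configuration file concrete before moving on, here is a hedged sketch of what a .thoth.yaml can look like, loaded and inspected from Python. The keys and values shown are illustrative, based on the fields described above (requirements format, operating system, Python version, CUDA, recommendation type); the exact schema may differ between Thoth releases, so check the Thoth documentation. PyYAML is assumed to be installed.

```python
# Hedged sketch of a .thoth.yaml configuration (keys are illustrative; consult
# the Thoth documentation for the exact schema supported by your release).
import yaml  # assumes PyYAML is installed

EXAMPLE_THOTH_YAML = """
host: khemenu.thoth-station.ninja   # Thoth service to contact (illustrative)
requirements_format: pipenv         # or pip-tools / poetry, as discussed above

runtime_environments:
  - name: training
    operating_system:
      name: ubi
      version: "8"
    python_version: "3.8"
    cuda_version: "11.0"            # only if the image ships GPUs / CUDA
    recommendation_type: performance
"""

config = yaml.safe_load(EXAMPLE_THOTH_YAML)
for env in config["runtime_environments"]:
    print(env["name"], env["operating_system"], env.get("recommendation_type"))
```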
In the stage and prod environments, everything can be automated for you if you use something like Argo CD, together with Prometheus and Grafana for monitoring. These tools give you automatic deployments and automatic monitoring. If there is some data drift for your model, or anything else happens, you can check the metrics, and that can re-trigger a new training, for example with a new dataset, or maybe a new software stack triggers a new training, or something else in this lifecycle.

This is roughly what it looks like when you describe what happens from model training onward. You have the model that is trained with a pipeline. You store it somewhere, in cloud storage, or on GitHub if it is not too big. Then you create an image out of it, you test this image, and you push it to the registry. The manifests are updated through the pipelines, Argo CD is triggered by these changes, and it redeploys the model with the new version. Continuous monitoring then carries on, and if there is anything like data drift, or anything wrong in the metrics, you get alerts and you can re-trigger the pipeline. All of this can be completely automated.

Let's focus on the three big parts of this schema: the first is model development with continuous training, the second is CI/CD, and the third is continuous deployment and monitoring. So where do we want to add this feedback? This is where the new pipeline fits into this context. We want to provide this feedback as early as possible, so you don't need to move to stage and prod first; you can receive feedback about the deployment immediately. Is the model I just created as a data scientist going to behave the same way when it goes into another environment? Is it going to provide the same performance, or do we need to tune or adjust something? This is a question we want to answer as soon as possible, because it speeds up the process and of course reduces cost: we need to focus less on these problems and more on the general problem we are trying to solve. So this is where we fit this new pipeline: as soon as I open a pull request, the CI can start, and one of these checks or pipelines will be the one related to this feedback. Of course, there are some inputs and checks that need to be done, but we will see them in a moment.

Why, what, and when? Why, we have already covered: we want to speed up the process and reduce cost. And of course this is related to the lifecycle of the application: when you build a model, it is not static; it changes over time because the data changes, the inputs and the dataset change, and the software stack changes. All these inputs can change the application lifecycle, so we want to offload this part of the work so that people can focus on modeling and data.

What feedback? The feedback we're going to provide is basically metrics: metrics related to the model, metrics related to the application, such as latency, CPU usage, and memory usage, and also the platform metrics, of course.
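As an illustration of what a platform metric looks like in practice, here is a minimal sketch of fetching a deployment's memory usage from Prometheus over its HTTP API; the Prometheus URL, the metric, and the pod label value are placeholders for this example, not something defined by the Thoth pipeline.

```python
# Hedged sketch: fetch a platform metric (container memory usage) for a model
# deployment from Prometheus. URL, metric, and label values are placeholders.
import requests

PROMETHEUS = "http://prometheus.example.local:9090"  # placeholder
QUERY = 'container_memory_working_set_bytes{pod=~"example-model-.*"}'

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    pod = result["metric"].get("pod", "?")
    value_bytes = float(result["value"][1])
    print(f"{pod}: {value_bytes / 1024 / 1024:.1f} MiB working set")
```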
We target different personas: this feedback is useful not just for the data scientists, who focus on specific model metrics, but also for the AI DevOps engineers, who care about latency, memory consumption, and CPU usage. When do we want to give this feedback? As soon as there is a change in any of these specific things: the software stack, the model version, the dataset version; but there could also be other types of triggers.

How do you use it? There are some requirements, of course. You need to install AICoE-CI; you find it in the GitHub Marketplace, and it's quite easy and straightforward to install. The same goes for the bot: we have the Kebechet GitHub app, which is the one that runs the Kebechet services. You need a model to be deployed in your repo, because this pattern works only if there is a model to deploy, and you need a test; we will talk about the test in a moment. So, AICoE-CI: you find it on the GitHub Marketplace, and all you have to do is hit the install button.

How do you interact with AICoE-CI? We saw the Thoth configuration file; AICoE-CI has its own YAML configuration file, where you can state the build that you want: which base image to use, the source strategy of the build, and the registry you want to push to. If you have a model, as I said, you can also automate the process of updating the manifests: when there is a specific tag after a release, it can be updated automatically by the bot, Argo CD immediately sees that there is a change in the manifests, and it can redeploy the model for you. As for deployments, there are several types: Flask applications, KFServing, Seldon, TensorRT; we could go on with this list, but for the moment we focus on Flask applications. This is the Kebechet bot; you can also install it, and as I said, it basically triggers the Kebechet services.

And this is the library we use for the test; I will spend just a moment on this. Behave is a library that allows you to describe tests in natural language, backed by Python code. What do I mean by this? Let's have a look at this example. This is the test that I want to use: I'm a data scientist and I want to use this test to gather some metrics for my model, together with my colleague who takes care of the platform and application metrics. So I write this test saying that, given I have a dataset and a deployment of this model, I want to run this test against the predict endpoint and gather metrics that I can use. And this is the Python part: behind this natural language definition of the test there is some Python code. The first part is dataset availability, so check the dataset and download it; then take the deployment and check whether it's available; and so on. A hedged sketch of such a step file is shown below.

Now, the pipeline and an example of the results. Let's have a look at the pipeline itself. The pipeline is triggered, as I said, when you modify anything in your project and open a pull request: you modify the dataset, the model, or the software stack. The inputs are checked, which means we check whether there is a model to be deployed and whether there is a test to be used, because we expect the test to be created by the users; it is not something we can automate yet.
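Here is a minimal, hedged sketch of what such a behave setup can look like. The scenario wording, endpoint, and step implementations are illustrative assumptions for this talk, not the actual test shipped with the pipeline; the natural language part normally lives in a separate feature file, shown here as a comment to keep the example in one place.

```python
# features/steps/model_metrics.py -- hedged sketch of behave step implementations.
# The corresponding feature file (the natural language part) could read, for example:
#
#   Scenario: Gather metrics from the deployed model
#     Given the dataset is available
#     And the model is deployed
#     When I run predictions against the predict endpoint
#     Then I report the gathered metrics
#
# Endpoint, dataset, and metric names below are illustrative placeholders.
import time

import requests
from behave import given, when, then


@given("the dataset is available")
def step_dataset_available(context):
    # A real test would download/verify the dataset; here we stub a tiny sample.
    context.samples = [{"inputs": [[0.1, 0.2, 0.3]]}]


@given("the model is deployed")
def step_model_deployed(context):
    context.url = "http://example-model.apps.cluster.local/predict"  # placeholder
    # Assumes the deployment exposes a /health endpoint (illustrative).
    assert requests.get(context.url.replace("/predict", "/health"), timeout=5).ok


@when("I run predictions against the predict endpoint")
def step_run_predictions(context):
    context.latencies = []
    for sample in context.samples:
        start = time.perf_counter()
        response = requests.post(context.url, json=sample, timeout=10)
        context.latencies.append(time.perf_counter() - start)
        assert response.ok


@then("I report the gathered metrics")
def step_report_metrics(context):
    print(f"average latency: {sum(context.latencies) / len(context.latencies):.4f}s")
```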
The test is something the user needs to provide, because the metrics of the model depend on the type of application you are creating, so this part is specific to the data scientist. Then we configure the pipeline: we use the AICoE-CI configuration file, and we start building the image and push it to the registry. Then we deploy the image, and once the deployment is done, we expose the route. Then we run the test: this specific test is run against the deployment and the predict endpoint. At the end, the metrics are shown in the pull request.

And this is an example of the result. It is simply the bot commenting on my pull request and telling me what kind of test was run, in which namespace, and the metrics related to the model and the application: everything specific to the data scientist, and also, for the AI DevOps engineer, the platform metrics, to know whether the deployment requires more memory or more CPU in order to improve performance.

If you have any questions, please reach out to us. We have the Thoth Station website and we are on GitHub, so you can open issues and reach out there. These are the links for AICoE-CI and for Kebechet. If you are interested in more of what Thoth is doing, you can check the YouTube channel, which covers the Thoth services, how everything works, and how we provide the recommendations with the reinforcement learning used in the Thoth services. You also have the link to Operate First: if you want to join the open community and learn more about the data science part and also the operational part, you can go there and start playing with our tools. I hope you enjoyed this presentation, and I'm open to any questions offline. If you want to reach out to us, please do. Thank you very much and see you next time.