Welcome, everyone. I imagine we'll have a few folks starting to trickle over from the main stage to this session, so we'll get going here in just a moment. I hope you've all had the opportunity to enjoy the morning or afternoon sessions, depending on the time on your end, and then the great keynote just now by Burr Sutter. Really interesting talk from Burr, as always.

I'd like to welcome you to our next session in the track today, Intelligent Applications, the Red Hat Way. My name is Mike Ward and I'll be your moderator for this session. It's my pleasure to introduce our speaker for today, Myriam Fentanes, who is a Principal Product Manager for OpenShift AI. She's been working with customers on automation projects and intelligent applications, and previously held solution architect positions for customers and partners in North America.

First, a few comments on logistics before we get started. If you have questions during the session, please submit them in the comments area and we'll address them as we go along. We'll be monitoring for your questions and will take them as they come up. If for some reason we're not able to address your question during the session, we'll make sure to log it and make a point to follow up with you, as long as we have your contact information and can do so. Also, a recording of this session and all the other sessions throughout today's event will be available on the Red Hat Developer YouTube channel. It will probably take a couple of days to get those uploaded and posted, but you'll certainly have the opportunity to go back and watch all the great information being shared today. And as a reminder, as mentioned coming out of the keynote session in the live chat: if you stay with us through the end of this block, we welcome you back to the main stage for a general open Q&A session, if you have questions and want to chat with Red Hatters. So with that, let me turn it over to Myriam and we'll get going. Thanks so much.

Thank you very much, Mike. And thank you, everybody, for watching this session. The topic for today is how to build applications powered by AI, the Red Hat way. What we will discuss are some of the basic principles for building AI-powered applications.

We'll start with a quick level-setting explanation of what an intelligent application is. An intelligent application is an application where part of the code was written by a human, in any language of their liking, using the platforms and tools they have been using all along. The other part is behavior that was not hand-coded: it was learned and captured in a model trained with data. That intelligence you put into the model is what now makes the application intelligent.

What are some examples of this type of application? Well, recommendation engines and virtual assistants. It's very useful in things like fraud detection, anti-money laundering, and any type of activity where you can leverage large amounts of data to discover patterns and learn from past experience. So pattern detection, text analysis, summarization; and now, with ChatGPT and all of these LLMs, the use of AI in everyday activities is becoming very popular.
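To make that split concrete, here is a minimal sketch of what "hand-written code plus a trained model" can look like in practice. The fraud-detection scenario, the toy data, and every name below are illustrative assumptions, not something shown in the session:

```python
# A toy illustration of an intelligent application: part of the behavior is
# coded by hand, part is learned from data. All names and data are made up.
from sklearn.linear_model import LogisticRegression

# The "intelligence": behavior learned from (toy) historical transactions
# instead of being coded by hand. Features: [amount, hour_of_day].
X = [[20, 14], [15, 10], [9000, 3], [8500, 2], [30, 16], [7600, 4]]
y = [0, 0, 1, 1, 0, 1]  # 1 = fraudulent
model = LogisticRegression(max_iter=10_000).fit(X, y)

# Ordinary application code written by a developer, calling the model the
# same way it would call any other function.
def handle_transaction(amount: float, hour: int) -> str:
    if amount <= 0:                                  # hand-written rule
        return "rejected: invalid amount"
    if model.predict([[amount, hour]])[0] == 1:      # learned behavior
        return "flagged for fraud review"
    return "approved"

print(handle_transaction(25, 13))    # approved
print(handle_transaction(8800, 3))   # flagged for fraud review
```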
But what are some of the challenges you face when you try to build this type of application? This is a forum for developers, and I'm sure you are eager to learn: "Well, I already develop applications. What else do I need to develop an intelligent application?" Things are a little bit different in this space.

The first difference is that there are different personas involved in the development cycle. On one side you have the application developer, and on the other side you don't have a developer, you have a data scientist. Each of these personas has their own tools, and they do very different things. The data scientist mainly works with data that needs to be gathered, sanitized, and prepared; once they feel confident in it, they use that data to conduct experiments, to produce a model trained to a certain level of confidence. The application developer, well, they have requirements, they code them, they test them. Once they are confident that the requirement was met, they deploy, and then they operate and monitor. From here we can see that one of the main differences is the path. Even if application developers use agile methodologies and iterate over their code, theirs is more of a straightforward path: you have a requirement, and although there are multiple ways to meet it, you choose one, you implement it, and that's it. For a data scientist, it's really a scientific process where you need to experiment to find the optimal answer; there is never a single right one.

The next challenge is that it's not only two personas; there are different streams of work with even more personas. When you are building an intelligent application, you need a multidisciplinary team. Maybe you need a data engineer, in addition to the two you already had, the data scientist and the application developer, and again, each of these personas has their own tools and their own environments to work in. But in the end, they need to produce just one output: the intelligent application. And to add a little more complexity, you have a myriad of tools to mix and match to accomplish each of the tasks we saw for the different personas. All of these tools have different degrees of maturity; some are open source, some are closed source, some are from vendors. There is a lot of choice out there. OpenShift AI is a platform that gives you all the tools to manage the end-to-end lifecycle, but it does so in a pluggable way: if there is a tool you already use or prefer, you can just plug it into the platform, so you have one cohesive platform for managing the whole application lifecycle.

Another difference between regular development and intelligent application development is that there are more variables. With a regular application, you usually just have to control things like the code and the configuration; those are the things you need to version, maybe using a source control repository. With an intelligent application, you have more variables: the data you used to train your model, the code of the model itself, the application code, the trained model, and the configuration of the application. All of these are variables you need to handle, so when something goes wrong, like when you find a bug, you need to control each one of them to know what changed and what produced that bug.
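As an aside, a bare-bones way to picture "controlling each variable" is a release manifest that fingerprints all five of them together. This is only an illustrative sketch with hypothetical file paths, not a Red Hat tool:

```python
# Illustrative sketch: pin every variable of an intelligent application in
# one manifest, so a bug can be traced to exactly what changed.
import hashlib
import json
import pathlib

def sha256(path: str) -> str:
    """Content hash of a file; any change to the file changes the hash."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

manifest = {                                        # hypothetical paths
    "training_data":    sha256("data/train.csv"),
    "model_code":       sha256("src/train_model.py"),
    "application_code": sha256("src/app.py"),
    "trained_model":    sha256("models/model.pkl"),
    "app_config":       sha256("config/app.yaml"),
}

# Diffing the manifests of a good release and a buggy one tells you which
# of the five variables changed in between.
pathlib.Path("release-manifest.json").write_text(json.dumps(manifest, indent=2))
```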
The other thing that is very different is that with a regular application, if you have bugs when you go to production, it's because those bugs were there from the beginning: you didn't test properly, or whatever happened, the bug made it to production, and you know you have to figure out a way to fix it, and that's it. With an intelligent application, the model is only as good as the data you used to train it. The data comes from the real world, and the real world is not static; it's constantly changing. That means the premises you used to train your model are constantly changing too. So you know that at some point your model is no longer going to be valid, or it's going to be defective, and you will have to do the whole process again. So if with regular applications it's important to have the ability and agility to deploy fast, fail fast, and iterate rapidly based on the feedback you receive, with intelligent applications it's even more important, because you know they are going to fail: the environment, and the data they were trained on, are constantly changing. Having that process down is super important for producing value. To help with that, and to build the muscle memory of constantly deploying applications, we're going to look at the principles of ML DevOps, or MLOps. But before I continue, I'd like to pause to see if there are any comments or questions. Okay, perfect. So, let's see what those principles are.

The first one is: automate. If you want to be able to measure the progress you are making, you have to be able to measure each of the steps in the process. When you're building models, or building applications that use models, there are different activities you need to do to get an end product. The first ones are gathering and preparing data, and developing the model. These two, which are part of training and happen before you actually deploy the model and it starts serving and making inferences, are called the inner loop. Then you have the last two steps, deploying the model into an application, and monitoring and managing that model once it's deployed out there; that is called the outer loop. All of this is sustained and supported by a hybrid or multi-cloud platform where you can get compute acceleration: depending on the model or the algorithms you are using, you may need something like GPUs if you are doing computer vision or neural networks. Those are all services that support this process. The important thing is that all of these steps need to be automated, and they need to be measurable, as in the sketch below.
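Schematically, you can think of the two loops like this. The function names and metric values below are illustrative stand-ins, not an OpenShift AI API:

```python
# Schematic sketch of the lifecycle: every stage is an automatable step
# that reports something measurable. Names and numbers are made up.
def gather_and_prepare_data():
    return {"rows": 10_000, "null_ratio": 0.02}         # data-quality metrics

def develop_model():
    return {"accuracy": 0.91, "f1": 0.88}               # training metrics

def deploy_model():
    return {"endpoint": "https://models.example/infer", "replicas": 2}

def monitor_model():
    return {"p95_latency_ms": 42, "drift_score": 0.07}  # may trigger retraining

# Inner loop (before the model serves) and outer loop (once it is deployed).
# Because each step is a function, the whole sequence can be automated; and
# because each step returns metrics, every run is measurable.
for step in (gather_and_prepare_data, develop_model,    # inner loop
             deploy_model, monitor_model):              # outer loop
    print(f"{step.__name__}: {step()}")
```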
The second principle is that you need to have control over all of the variables of your model at all times. To be able to reproduce results, you have to track and version the code, the training data, and the artifacts you produce; and not only the artifacts themselves, but their lineage, which means how those artifacts change as you traverse this continuum. And remember, with intelligent applications this really is a continuum: you are never done developing the application. I've seen plenty of places where apps are running so well that nobody has touched them in years; the people have left, nobody even has the source code anymore, and they sit there as static applications. Well, that will never happen here, when you have a model and an intelligent app. You have to be able to accurately tell what happened with each of the artifacts you are generating.

To help you do that, there is a very nice solution in Open Data Hub, and in RHODS (Red Hat OpenShift Data Science), the product: a pipeline that helps you automate and integrate all the different components of the model lifecycle, because, as we remember, some of these activities are done by different personas. The data preparation could be done by a data engineer, who might use tools like feature stores or data curation tooling. Training the model is done by a data scientist, who works in Jupyter notebooks. To connect all of that into one cohesive, repeatable process, we use Data Science Pipelines. Red Hat OpenShift Data Science Pipelines is based on Kubeflow, an open source project that includes a lot of components. Just be aware that right now we are not using all of the available components that do lots of other things beyond orchestration; we are mainly using Kubeflow Pipelines and the pipeline storage. And at Red Hat we are using a version that differs from open source Kubeflow: we are not using Argo for the underlying workflow engine, we are using Tekton, because we support Tekton, so you get a fully supported platform to automate the process of building applications.

The main pieces of this solution are the following. First of all, you have a UI to access all of your pipelines. Here you can see the process you follow to train a model, deploy that model, and put it into production; you can see it graphically, along with all the information associated with it. This is your automation engine for that end-to-end lifecycle. The other piece is the data and metadata storage. This is very important, because as you experiment and do all these different runs of the process to continuously improve the performance of the model and get better results, all of that is tracked and stored in this metadata storage.

Now, let's say you are training a model to identify whether a person is a man or a woman based on his or her personal information. And this is a real-life scenario: in this model, when someone had the title of doctor before their name, the model immediately assumed that person was a man, because it was biased about women and the degree of education they might have. When you have a problem like that, how can you go back and say, okay, I used this data set to train models; how do I know now which models were trained with that bias in them? In OpenShift Data Science Pipelines, the data set, the pipeline, the execution, and the model are all artifacts, and you can clearly see all the relationships between them. So you can see, for example, which models were trained using this particular artifact, this data set, and that will lead you to exactly which models are biased and need to be retrained or removed. So if you are a data scientist producing hundreds of models, and all of a sudden you see that one of them is biased, you can go back, look at the lineage of that model, and reconstruct the process you followed by looking at it graphically.
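To see why that lineage matters, here is the same idea hand-rolled in a few lines. Data Science Pipelines records these relationships for you; nothing below is its actual API, and all identifiers are made up:

```python
# Hand-rolled illustration of lineage: each run links its input data set to
# the model it produced, so a biased data set can be traced forward to every
# model that needs retraining.
from collections import defaultdict

lineage = defaultdict(list)            # dataset_id -> [model_ids]

def record_run(dataset_id: str, model_id: str) -> None:
    lineage[dataset_id].append(model_id)

record_run("census-2021-v3", "title-clf-007")
record_run("census-2021-v3", "title-clf-012")
record_run("census-2022-v1", "title-clf-031")

# The first data set turns out to be biased: which models did it reach?
print(lineage["census-2021-v3"])       # ['title-clf-007', 'title-clf-012']
```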
So what are some of the data that you store to track the lineage of your model? Well, as you remember, we saw that intelligent applications have a lot of variables. For every workspace, which is your sandbox, you have runs. A run is an execution of your pipeline, where you train and deploy your model. Each run is associated with an experiment, because maybe you're experimenting with a certain algorithm and you have to rerun it several times until you get the optimal performance. Out of all of this, the data that you store includes the description of the data set: what happened with the data, what its name was, who the owner was. All the information about the model: the description, the name, the version, the type, the training framework, the hyperparameters you used in case you changed them, and any specialized labels. And then you also have the metrics: what was the result, what was your confusion matrix, what were the values, what was the model ID. All of this will help you track what happened to the model and to the execution, and it will make your life easier as a data scientist.

Another good thing about Data Science Pipelines is that it's really a workflow engine for data scientists, and there are a lot of other tools for workflows; Tekton is one of them, and it's supported by Red Hat. So why do you need something on top of Tekton? The answer is: because with Red Hat Data Science Pipelines, you don't have to deal with all of the Tekton code. You don't have to be an expert in YAML to define each of these steps. You don't have to understand the internals of custom resources in Kubernetes. You don't have to containerize each of these steps yourself, because the pipeline server spins up a container that runs each particular step of your model-training process. You don't have to worry about any of that. The only thing you have to do is define each of the steps; if you're a data scientist, you can do it inside Jupyter notebooks, using Python. And once you have the Python definition of your pipeline, there is the Kubeflow Pipelines SDK, which we saw here: you just give it the Python code that defines your pipeline, and it produces the YAML definition that the pipeline server then ingests to do your runs. There are other options too. There are extensions like Elyra, if you want to work only within Jupyter notebooks, and there are also SDKs for things like Java and other languages. So it gives you a lot of flexibility to meet developers where they are.

Again, the pipeline ends up defined in YAML, but if you are a data scientist who mainly deals with Python, this shields you from having to write all of that code on your own; you just stay with the tools you know. The workflows are also very portable: you can spin up a Jupyter notebook on your own laptop and do runs and experiments locally, and then, when you're sure everything is working, upload that exact same pipeline to an OpenShift cluster running on any cloud provider and take advantage of GPUs, distributed training, containers, and the scalability of the cloud providers, without having to redo all the work you did locally.
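To give a feel for that flow, here is a minimal sketch of a two-step pipeline written with the upstream Kubeflow Pipelines v1 SDK and compiled with the kfp-tekton compiler, which matches the combination described above. The step bodies are placeholder assumptions, and exact package versions may differ on your cluster:

```python
# Minimal sketch: define pipeline steps as plain Python, then compile to the
# YAML that the pipeline server ingests. Assumes the KFP v1 SDK ("kfp") plus
# the Tekton compiler ("kfp-tekton"); the step logic is a stand-in.
import kfp
from kfp.components import create_component_from_func
from kfp_tekton.compiler import TektonCompiler

def prepare_data() -> str:
    # Stand-in for real data preparation; runs in its own container.
    return "s3://my-bucket/clean.csv"

def train_model(data_path: str) -> str:
    # Stand-in for real training; also runs in its own container.
    print(f"training on {data_path}")
    return "s3://my-bucket/model.onnx"

prepare_op = create_component_from_func(prepare_data)
train_op = create_component_from_func(train_model)

@kfp.dsl.pipeline(name="demo-pipeline", description="prepare data, then train")
def demo_pipeline():
    prep = prepare_op()
    train_op(data_path=prep.output)   # one step's output feeds the next

# Produces demo-pipeline.yaml without writing any Tekton YAML by hand.
TektonCompiler().compile(demo_pipeline, "demo-pipeline.yaml")
```

The same Python definition can be run and tested locally first, then the compiled YAML uploaded to the pipeline server, which is exactly the portability just described.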
So what are some of the benefits of applying these two principles, automating and tracking? Well, you get reproducible and reliable software releases. Your data scientists and the rest of the team won't dread having to iterate and experiment. You can release software at any time, and that's very important: you need to be able to deliver production-grade models and software at any time, because they don't bring any value to the business until they are actually deployed. For any application it's important to have short adaptation cycles, but for intelligent applications even more so, because reality, and the data that depicts that reality, change very fast, and you need to be able to adapt just as fast.

If you want to try Red Hat OpenShift Data Science Pipelines, you just need an OpenShift cluster, the Red Hat OpenShift Data Science operator, and the Pipelines operator. You need a project to hold your pipelines, and then you just configure your pipeline server, mainly the storage where, as we saw, you will keep track of all the different variables and the lineage. And that's it; then you create a new pipeline. So with that, I'm going to show you real fast what the pipelines look like. Okay, just give me one second.

So here we are. This is Red Hat OpenShift Data Science. If we go to Pipelines, we can see that we have two pipelines here, "the best pipeline ever" and a new pipeline. If we go into "the best pipeline ever", we can see the different steps. This is just a dummy pipeline, but if you click on each of the steps, you can see the arguments, because that's also important: remember that each of these steps runs as a container, so the pipeline server also takes care of passing arguments from one container to another. For example, the result of this execution could be the input for the next step in the process. You can also see the image being used; you can use different images for different steps, again because this is a multidisciplinary process. And here you can see the YAML that configures the pipeline, if you want it or feel more comfortable working with YAML.

Here are the runs, the actual executions of your pipeline. You have two types of runs. You can schedule one: let's say you know you will have more compute resources at a certain time of day, say midnight, so you schedule your pipeline to run at that time. Or you can just trigger ad hoc runs. Here you have a record of all the experiments and all the runs you have done, and as you see, each run is filed under a default experiment. If you go into a run, you can see whether everything went well, the details, and the output of the run, and you can clone it, delete it, or stop it. So with that, I hope you'll try the pipelines and automate the process of delivering intelligent applications. Thank you.

Great. Well, thanks, Myriam, for that great presentation. We hope you all enjoyed this session. As a reminder, it will be available on our YouTube channel in probably a few days if you'd like to rewatch it, or to catch other sessions that you weren't able to attend earlier or later in the day. Be sure to hang out here for our next session, Optimizing Apache Kafka with Cruise Control. We're going to do a quick disconnect and reconnect to swap our speakers and go into the next session in about three minutes, at the top of the hour. So if you're not able to stay with us, we hope you have a great day. Again, Myriam, thanks for your time and presentation today. Thanks, everybody. Bye.