Have you ever found yourself lost in logs and dashboards while getting to the root cause of failing builds and tests? Have you considered monitoring your CI processes using AI tools? How do you like the idea of having an intelligent monitoring tool that can help you analyze all these failures? Well, in this session, I, Aakanksha Duggal, and my co-presenter, Oindrila Chatterjee, are going to introduce you to our project, AI for CI, and walk you through the tools we've built that can help you better monitor your CI/CD processes.

Hello again. I am Aakanksha Duggal. I am a data scientist in the AI Center of Excellence at the Office of the CTO at Red Hat, and I am based in Boston. And hey, everyone. I'm also from Boston, United States, and I also work as a data scientist with Aakanksha in the Office of the CTO, on the AI Center of Excellence team at Red Hat.

So by the end of this presentation, you should have an idea of how you can use AI for CI and its set of AI tools to monitor your CI/CD workflows. But before we go there, let's start by understanding what this project is about, the story of how we got here, and the various initiatives that support this process.

So what's AI for CI? Before getting there, though, let's start by understanding the AIOps mindset. AIOps stands for artificial intelligence for IT operations. AIOps is a critical component of supporting any open hybrid cloud infrastructure. In our opinion, though, it's mostly a cultural change, very much like what we saw with the DevOps movement. There we combined two cultures, dev and ops, to create a new culture, the DevOps culture. The same cultural change is going to happen with AIOps. The DevOps culture and the data science culture use very different tools and have some very different mindsets. Our aim with AI for CI is to bring these two cultures together so that they can learn from each other, and we aim to embrace the intelligent tooling that is available in the AI world and apply it to the operational domain.

And what's CI? CI, or continuous integration, is the practice of automating the integration of code from multiple contributors into a single software project. CI/CD, or continuous integration/continuous delivery, is the solution to the problems that integrating new code can cause for development and operations teams. You may have heard of it: it's called integration hell.

So what's AI for CI all about? Simply put, it's AI applied to CI/CD data. It's an intelligent open source AIOps toolkit that can be used to better monitor builds in order to help the development lifecycle. The goal of this project, AI for CI, is to build a set of AI tools for developers by leveraging the open data made available by the OpenShift and Kubernetes CI platforms. As part of AI for CI, we have built a collection of open source AIOps tools which can help support the CI/CD process. Open source CI/CD data coming from real-world production systems is a rarity among public datasets today, so this also presents a great starting point and an initial area of investigation for the AIOps community to tackle. We are focused on cultivating an open source community that uses open operations data and an open infrastructure for data scientists and DevOps engineers to collaborate.

So let's look into some of our current initiatives. Firstly, we are involved in collecting data from various open data platforms and creating a community around open CI data sources.
We are also involved in quantifying and evaluating the current state of the CI workflow using key performance indicators, and we are building AI and ML techniques to improve the overall CI workflow. And finally, and most importantly, we are creating a reproducible end-to-end workflow around our data collection, analysis and modeling efforts using multiple open source technologies like Elyra, Kubeflow Pipelines, JupyterHub and Seldon, all of which are developed, built and operated on the Operate First environment.

So what's the Operate First environment, or the Operate First cloud? We build this AIOps community and develop the necessary tools on the Operate First cloud. Operate First is an initiative to operate software in a production-grade environment, bringing users, developers and operators closer together. It uses the same community-building process as open source projects, but extended to ops procedures and data. Operate First enables collaboration between open source developers and cloud providers, and AI for CI supports this collaboration by creating a new set of tools around CI/CD processes, which could otherwise be a pain point during the development and production of these open source systems. So next, Aakanksha will dive into the various open data sources that we are currently looking at.

So as we discussed earlier, one of the goals of this project is to build an open AIOps community involving open data sources which originate from different parts of the CI process. To understand these different segments of a CI process, let's take the example of OpenShift, an enterprise Kubernetes container platform which consists of hundreds of repositories. Thousands of contributors create pull requests in order to contribute to the code via GitHub. Every new pull request to a repository with new code changes is subject to various tests and builds. Prow is the central component of this automation. It is a Kubernetes-based CI/CD system, and here is how the Kubernetes testing group defines it: Prow is a CI/CD system built on Kubernetes, for Kubernetes, that executes jobs for building, testing, publishing and deploying. It is seamlessly integrated with GitHub via hooks which can trigger automated CI/CD jobs for GitHub PRs. TestGrid is a platform used to aggregate and visually represent the results of all these automated tests and builds. OpenShift also collects cluster-level telemetry data, which consists of metrics about the resources being used while running these builds and tests, along with any alerts experienced while running them. The ultimate hope of this work is that we will be able to connect these various data sources, like GitHub code changes, bugs, cluster telemetry datasets, TestGrid and Prow, in order to develop a complete picture of a CI process.

Now let's take a look at our first open data source, GitHub. The builds and tests run by the CI process are required because of the changes happening in the application's code base. Beyond the tests and builds, the information within the GitHub code repository, such as the metadata and the diffs of a PR, can give us more information and insights about the overall CI process and ultimately lead us to the root cause of a failure or other issues in the development process.
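To give a feel for the kind of PR metadata involved, here is a minimal sketch of fetching a few fields straight from the GitHub REST API. This is only an illustration: the project collects this data with its own tooling, and the repository and PR number below are just examples.

```python
# Minimal sketch: pull a few PR metadata fields from the GitHub REST API.
# The repo and PR number are illustrative; the project uses its own collection tooling.
from typing import Optional

import requests


def fetch_pr_metadata(owner: str, repo: str, number: int, token: Optional[str] = None) -> dict:
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"

    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    pr = resp.json()

    # Keep only the fields that are useful as features later on.
    return {
        "title": pr["title"],
        "body": pr["body"],
        "changed_files": pr["changed_files"],
        "additions": pr["additions"],
        "deletions": pr["deletions"],
        "created_at": pr["created_at"],
        "merged_at": pr["merged_at"],  # None while the PR is still open
    }


# Example (hypothetical PR number):
# print(fetch_pr_metadata("openshift", "origin", 12345))
```

Metadata like this is exactly the kind of raw material the next step turns into model features.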
So in an attempt to quantify the critical metrics within the software development workflow, starting from the code contribution, we prototyped a model which can predict the time it takes for an open pull request to be merged, that is, the time from the creation of the pull request to the time it was merged. A metric like this can help identify bottlenecks within the development process. For example, having an estimate of how long it is going to take for a pull request to be merged can help developers and engineering managers better allocate resources to a certain set of pull requests and speed up the engineering process.

Now let's take a look at the workflow that we use for estimating the time to merge. We frame this as a classification problem: does the time taken to merge a PR fall within one of a few predefined time ranges? As a first step, we collect the data for these pull requests using one of our internal tools, which gathers all the metadata from the OpenShift Origin repository. Then we transform the input columns obtained from the pull requests, such as the size of the PR, the types of files added in the PR, and the description of the PR, into features that can be ingested into a machine learning model. We then explore various vanilla classifiers to classify the time-to-merge value of a pull request into one of 10 bins or classes, using the features engineered from the raw pull request data. And finally, we deploy the model that yielded the best results as an interactive service using Seldon. This endpoint is available for anybody to interact with and test out on new pull requests. Once integrated with a GitHub repository, this service can provide newly submitted pull requests with a time-to-merge estimate.

Now we shall look into another data source, Prow. Prow is a Kubernetes-based CI/CD system which allows us to triage an issue and see the actual logs generated during the build and test process. The Prow dataset consists of log data generated for each build and job. In order to understand the dataset of events and build logs from Prow, we download them programmatically in a JupyterHub environment, a tool used by data scientists to interact with Jupyter notebooks. We perform exploratory data analysis and apply some ML/NLP techniques such as TF-IDF, that is, term frequency-inverse document frequency, in order to cluster these build logs and automatically determine the type of failure for a given log. These logs represent a rich source of information for automated triaging and root cause analysis, but unfortunately they are a noisy data type and cannot be ingested into a machine learning model directly. Therefore, we clean the logs by encoding some subject matter expert knowledge into our process, a couple of well-defined regexes, before applying TF-IDF to encode each build log into a vectorized representation that should retain its contextual meaning in relation to the others. We cluster the job runs based on the term frequencies within their build logs, grouping the job runs by the type of failure. For example, different clusters represent groups of builds that fail due to platform failures, or failures caused during the preparation steps, even before the tests start to execute.
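As a rough illustration of that idea, here is a minimal sketch: clean the build logs with a couple of regexes, vectorize them with TF-IDF, and group them with k-means. The regexes and cluster count here are illustrative assumptions, not the project's exact rules, which encode considerably more subject matter expertise.

```python
# Minimal sketch: regex clean-up, TF-IDF vectorization, and k-means clustering
# of build logs by failure type. Regexes and cluster count are illustrative only.
import re
from typing import List, Tuple

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def clean_log(text: str) -> str:
    """Strip tokens that vary between builds but carry no failure signal."""
    text = re.sub(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*", " ", text)  # timestamps
    text = re.sub(r"\b[0-9a-f]{8,}\b", " ", text)                           # hashes / hex ids
    text = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", " ", text)                # IP addresses
    return text


def cluster_build_logs(raw_logs: List[str], n_clusters: int = 5) -> Tuple[List[int], KMeans]:
    cleaned = [clean_log(log) for log in raw_logs]
    vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
    vectors = vectorizer.fit_transform(cleaned)   # one TF-IDF vector per build log
    km = KMeans(n_clusters=n_clusters, random_state=0, n_init=10)
    labels = km.fit_predict(vectors)              # cluster id per build log
    return list(labels), km
```

Each resulting cluster can then be inspected to see which kind of failure it corresponds to.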
The clustering is also able to differentiate between the different syntactic structures of the logs. And having said that, now let's move to our final data source, TestGrid.

TestGrid is an open source visualization platform developed by Google to help various communities track the status of their tests and visualize their builds in a friendly format. So where does it fit in the context of a CI process? As we see, TestGrid is a dashboard that aggregates multiple tests over a time period, and each cell represents whether a certain test was failing (red), passing (green), or perhaps not running, along with a few other possible states.

So now let's take a look at the workflow that we use for TestGrid. To gain more insights and work with this data programmatically in a JupyterHub environment, we need to find a way to access the visual grids displayed on TestGrid programmatically. For that, we create a connection from the Jupyter notebooks to the TestGrid URL and download all the selected dashboards from TestGrid by scraping the HTML. This allows us to dig deeper into the various features of the data and gain an in-depth understanding that would not be very obvious by just looking at the dashboard. Once we have collected the data, our goal is to apply some AI and machine learning techniques to improve the CI workflow. But first, we start by applying some analyses, aggregating tests and detecting patterns in the data. For instance, if developers and managers want to manage the resource allocation for a CI process and save resources, it is vital to know whether a particular test has a higher probability of failing, or whether we can find an optimal stopping point for a build run after which that build is likely to fail. To do that, we can calculate some relevant metrics and key performance indicators, which will not only help us evaluate any AI-based enhancements we make to the CI process, but also pinpoint for developers which specific areas need the most improvement and should therefore be devoted more resources.

After using Jupyter notebooks for data collection, model training and all the other parts of the machine learning workflow, we automate the sequential running of these notebooks using a simple workflow to ensure that the tasks are sequential and continuous. Using tools like Elyra and Kubeflow Pipelines, which is a platform for building and deploying scalable machine learning workflows, we can run our notebooks in an automated fashion. Let's take a quick look at Kubeflow Pipelines and how they are useful to us. Machine learning pipelines help automate manual and repetitive machine learning tasks and also help us create faster iteration cycles by parallelizing the tasks. Predefined and automated components lead to better reproducibility and more consistent workflows. Pipelines can also help introduce better version control and monitoring of the machine learning code and artifacts, and thus make it easier to monitor the iterations. Coming back to our workflow, in order to help developers and stakeholders view the KPI metrics and the aggregated results of their tests visually, we create automated dashboards which can help better analyze the status of multiple tests and investigate problematic tests, builds or jobs. Let's take a look at the demo.
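Before we get into the demo, here is a minimal sketch of what a two-step pipeline like this could look like in code. The project builds its pipeline visually with Elyra's notebook pipeline editor, so this is only an illustration of the same structure using the Kubeflow Pipelines v1 SDK; the container image and notebook names are hypothetical.

```python
# Minimal sketch of a two-step Kubeflow pipeline (KFP v1 SDK): one step fetches raw
# TestGrid data, then several metric notebooks run in parallel. The container image
# and notebook paths are hypothetical; the project wires this up visually in Elyra.
import kfp
from kfp import dsl


def run_notebook_op(name: str, notebook: str) -> dsl.ContainerOp:
    """Run a single notebook with papermill inside a generic notebook-runner image."""
    return dsl.ContainerOp(
        name=name,
        image="quay.io/example/notebook-runner:latest",      # hypothetical image
        command=["papermill", notebook, f"/tmp/{name}-output.ipynb"],
    )


@dsl.pipeline(name="ci-metrics", description="Fetch TestGrid data, then compute metrics in parallel.")
def ci_metrics_pipeline():
    fetch = run_notebook_op("get-raw-data", "notebooks/get_raw_data.ipynb")
    for metric in ["build-pass-failure", "time-to-fix", "correlated-failures"]:
        step = run_notebook_op(metric, f"notebooks/{metric}.ipynb")
        step.after(fetch)   # metric notebooks only start once the raw data is available


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(ci_metrics_pipeline, "ci_metrics_pipeline.yaml")
```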
So this is the TestGrid UI, which a lot of communities use to track the status of their tests and builds. If I click on Red Hat, we can see there are various tabs which cover different versions of OpenShift releases and are classified as informing, blocking and broken. Let's take a look at one of these and see what information it contains. As we see here, most of the tests were flaking. To get more details about what's happening here, I'll click on one of these tests, and we can see a dashboard showing that a lot of tests were flaking. Some of them were green over the time window, which means passing; the red ones here indicate failing. And if we click on one of these cells, we will be redirected to Prow, which is the central component of the CI/CD system. Here we can see the various build logs, as we discussed earlier. As you can see, just by looking at these build logs you cannot easily get to the root cause of the failure.

To get access to all of this data, which is not very digestible on its own, we access the entire dataset in a JupyterHub environment. We have a notebook, Get Raw Data, that helps us fetch the data. What we do here is create a connection from the Jupyter notebook to the TestGrid URL, scrape the HTML using the Beautiful Soup library, and get access to all the dashboards. We download this entire set of information and store it in Ceph S3 storage.

Now, we compute a certain set of metrics, and we have a well-defined metric template for that. Let's take a look at a small example for build pass and failure (a rough sketch of this computation appears after the demo walkthrough below). We start by importing the raw data that we had previously stored. After getting that, we perform the metric calculation. We first fetch all the tests that were failing, with the status code of 12, and store them in a data frame. After that, we do the same for all the tests that were passing, with the code of 1, and store them in a data frame. We combine both these data frames and perform some basic calculations: what was the total number of builds, how many builds were failing, what was the percentage of builds that were failing, and we visualize all these metrics in a dashboard format that we'll take a look at shortly. We've also done some small visualizations using Seaborn, and Oindrila will show you the automated pipeline and the Superset dashboard. So over to you, Oindrila.

All right. So we saw an example of how we collect the TestGrid data and how we perform the various metric calculation steps. Now we can take a look at a sample Kubeflow pipeline that we have created using the Elyra notebook pipeline editor. As you can see on the left, we have the Get Raw Data notebook, which is the first node and collects the raw data from the TestGrid UI. That is followed by a number of metric calculation steps which run in parallel to one another. So this is essentially a two-step pipeline that we have created. To trigger this pipeline, we can hit the run button here and give the pipeline a name. Let me run this again. All right, I'm going to call this AI for CI, select a preexisting Kubeflow Pipelines runtime which I've previously created, and hit OK. Once this pipeline is submitted, clicking on run details takes us to the Kubeflow Pipelines UI, where we can see the running jobs and the various experiments that we trigger.
What you see here is one node within the experiment, which refers to the first notebook in the pipeline. To view any running logs, we can also navigate to the logs within Kubeflow Pipelines, as we saw in some of our previous talks and demos today. Once this pipeline completes executing, it looks something like this: the green check marks essentially indicate that all the steps ran successfully. We can also track metrics during the execution of these notebooks using Kubeflow Pipelines metrics. And this can also be automated to run on a daily, weekly, or other recurring basis, depending on the schedule you want to run it on.

Moving on to the Superset dashboard, where we visualize the output of all of these metric calculation steps. What you see here is basically a dashboard which aggregates multiple metrics, like the total number of running tests, the number of tests that are passing, the number of tests that are failing, metrics such as how many times a build or a test suite was run before a failing test started to pass, how much time it took before a failing test started to pass, and metrics such as the different correlated test failures. Once we log in here, we can essentially see all the outputs of the various metric calculation steps. We also have the ability to filter by tabs, grids and tests within this dashboard. So if I filter by a particular grid, all the various metrics will be recalculated based on that particular grid. Metrics such as these can give engineering managers and developers more insight into long-lasting failures: how much time it takes before a failing test starts to pass again, how much time it takes for a passing test to start failing, and metrics like that.

So moving back to our presentation: we saw how we collect the data, how we calculate the different metrics, and how we visualize them. Let's take a look at how you can engage with this project. There are multiple ways to get started, and we have compiled a list of ways you can engage with this project at the URL that you see on the screen. You can interact with and leverage the various open CI data sources that we work with on this project, which include TestGrid, Prow, GitHub, Telemetry, Bugzilla and various others, along with the data collection scripts and exploratory analyses. We also have interactive and reproducible Jupyter notebooks for this entire project, which are available for anybody to start using on the JupyterHub instance of the Operate First cloud right now. We also have interactive dashboards, like you saw earlier, to leverage for your own CI data sources. We also have an interactive model endpoint for the GitHub time-to-merge model, which you can try out. We run AI/ML workflows using Elyra and Kubeflow Pipelines, so if you wish to run ML workflows and automate your Jupyter notebooks, you can follow some of the guides that we have compiled. And finally, to learn more about the different analyses and notebooks within the project, you can also check out our YouTube video playlist.

So this is an open source project, which we started within our small team at Red Hat. If this project is useful to you, or if you're working on similar efforts, we strongly encourage contributions to this effort.
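One of the easiest places to start is a metric notebook. To make that concrete, here is a minimal sketch of the build pass/fail calculation Aakanksha walked through in the demo, assuming a pandas DataFrame with one row per test/build cell and TestGrid's status codes (1 = pass, 12 = fail); the column names are illustrative, not the project's exact schema.

```python
# Minimal sketch of the build pass/fail metric: count passing and failing cells per
# build and compute a failure percentage. Assumes a DataFrame with "build" and
# "status" columns, where status uses TestGrid's codes (1 = pass, 12 = fail).
import pandas as pd


def build_pass_fail_metrics(df: pd.DataFrame) -> pd.DataFrame:
    failures = df[df["status"] == 12]   # failing test cells
    passes = df[df["status"] == 1]      # passing test cells

    summary = pd.DataFrame({
        "failing": failures.groupby("build").size(),
        "passing": passes.groupby("build").size(),
    }).fillna(0)
    summary["total"] = summary["failing"] + summary["passing"]
    summary["failure_pct"] = 100 * summary["failing"] / summary["total"]
    return summary
```

A summary like this could then be written out and picked up by a dashboard, as in the Superset example above.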
So there are various ways in which you can contribute to our existing work and notebooks, or contribute additional KPIs and analyses. If you would like to contribute to the work of developing additional KPIs and metrics, we have a video tutorial and a helper notebook which outlines the template for a metric notebook contribution. If you would like to improve an existing ML workflow or model, we highly encourage that, and if you would like to add your own ML analysis or model, we also have issue templates for you to do so.

So that's all we had for you today. Thank you so much for joining. If you have any questions, feel free to drop them on Slack. For our virtual friends, please feel free to drop them in the Q&A. And if you're going to be around at the conference, you can also find us at the Red Hat booth for the next four days. So thank you so much.