Hello, can everyone hear me? Perfect. Hello everyone, hope you're all doing well. Most IT organizations generate a large amount of operations data in the process of running their software, and they are beginning to embrace AI to support IT operations. Yet real-world operations data is rarely available as a public dataset. In this session we will introduce how we are bridging that gap with our project AI for CI, and how we are making a shift towards an open operations data community.

Hey everyone, my name is Akanksha Duggal, and hey everyone, my name is Oindrila Chatterjee. We are data scientists working in the Emerging Technologies group at Red Hat, in the Office of the CTO. We come from Boston, United States, and we are excited to be here.

Today we are going to look at what open operations data is, why it is important, and where it comes from. We'll also introduce the concept of AIOps for those who are not familiar with it, along with our project AI for CI and the various tools that are part of it. Finally, we'll show a short demo of what we have been working on and share some resources with you.

So, open operations data: what is it, where does it come from, and why do we care? It is data originating from real-world production systems: logs, metrics, telemetry, all of the data you generate by operating software and applications in production. For the most part, though, software, even open source software, is operated behind closed doors, and the data generated from those operations is not openly available.

So where do we get open operations data from? That brings us to the Operate First initiative. It is an initiative centered around open sourcing the operation of software. It provides a real production cloud environment that can be used to operate software and applications in production openly. We are thus open sourcing the operational details and SRE best practices, generating tons of logs, commits, issues, and templates, and making all of that open source. Operate First also provides data scientists with a real production cloud environment, with instances of cloud services and applications such as Trino for relational data storage and query needs, JupyterLab for model training, the Seldon operator for model serving, Red Hat OpenShift for deploying models, and various other applications that support their data science workstreams. If you're interested in learning more about this initiative, I definitely recommend checking out the upcoming session by my colleagues Karsten Wade and Marcel Hild on Thursday, June 9th at the Open Infra Summit.

Another source of open operations data that we are currently interested in, and that we use for demo purposes, comes from the Kubernetes testing systems. The Kubernetes testing system contains a few applications. First, Prow, which is the central component of the Kubernetes testing infrastructure and contains a lot of logs, builds, and data related to test runs. Second, TestGrid, which is the Kubernetes test result visualization platform and contains the aggregated results of builds and test runs. These are some of the data sources we are using for our demo. So we have all this open operations data: why do we care and why is it important to us?
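As a rough illustration of what the TestGrid data looks like, here is a minimal Python sketch of pulling aggregated tab summaries for one dashboard from the public TestGrid instance. The dashboard name, the "/summary" endpoint path, and the "overall_status" field are assumptions about the public API, not details confirmed in this talk.

```python
import requests

# Hypothetical example: pull aggregated tab summaries for one TestGrid dashboard.
# The dashboard name and the "/summary" endpoint are assumptions; adjust to the
# dashboards and API paths you actually need.
DASHBOARD = "sig-release-master-blocking"
url = f"https://testgrid.k8s.io/{DASHBOARD}/summary"

resp = requests.get(url, timeout=30)
resp.raise_for_status()
summary = resp.json()

# Each key is a tab (one test grid); print its overall pass/fail status if present.
for tab_name, tab in summary.items():
    print(tab_name, tab.get("overall_status"))
```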
So firstly, open sourcing operations and SRE best practices leads to better collaboration between SREs, DevOps engineers, and data scientists, and ultimately to building better software. Second, open operations data and CI/CD data coming from real-world production systems is a rarity among public datasets today, so it presents a great starting point and an initial area of investigation for the AIOps community to tackle. And finally, these open operations datasets make it possible to create AI tools that can assist with cloud and IT operations.

We have all heard about AIOps, or for those who have not, I'm here to introduce it to you. Many of us know it is a hot topic and everybody is doing it, but what does AIOps actually mean? We define it as artificial intelligence tools used to build services and ML applications that are applied to support IT operations. We use these AIOps capabilities to support CI/CD workflows. Hence, our project AI for CI uses open operations data to build a collection of intelligent, open source AI tools to collect and analyze CI/CD data. And now, to dive a little deeper into this project, I'll hand it over to Akanksha.

As part of the project AI for CI, we periodically collect the open operations data from various sources like Prow, GitHub, and TestGrid, and make it available for analysis. We use these open datasets to build a collection of AIOps tools and share the notebooks, scripts, dashboards, and pipelines that make up each tool. We collect key performance indicator metrics from CI/CD data, display them on Superset dashboards, and also build machine learning tools; we will demonstrate that in action, so stay tuned. We build all of these tools and services using the Operate First cloud, which provides us with the services we need: clusters with Jupyter notebooks, Superset dashboards, storage, the Trino database engine, and model serving infrastructure to support our end-to-end machine learning workflow.

Now let's take a look at the architecture diagram to understand the workflow we follow to build these tools in the AI for CI project. The first step is to collect the open operations data from sources like TestGrid, GitHub, and Prow, coming from the Kubernetes testing infrastructure. To gain more insights and work with this data programmatically, we use Jupyter notebooks running interactively in a JupyterHub environment. To create a connection from the notebooks to our data sources, we specify the URL for the desired data source, collect the data, and store all of it in Ceph S3 storage. Once we have collected the data, our goal is to apply AI and machine learning techniques to improve the CI/CD workflow. But first we start with analyses such as aggregating various tests and detecting patterns in the data, which help us quantify and evaluate the CI workflow. We calculate relevant metrics and key performance indicators, which not only drive the data-based enhancements we make to the CI process, but also pinpoint to developers which specific areas need the most improvement and hence should be devoted more resources.
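As a rough sketch of that collect-and-store step, here is a minimal Python example that downloads one data file and uploads it to an S3-compatible (Ceph) bucket with boto3. The source URL, bucket name, and environment variable names are placeholders, not the project's actual configuration.

```python
import os
import boto3
import requests

# Hypothetical source URL and bucket; replace with the data source and
# Ceph S3 endpoint you are actually using.
SOURCE_URL = "https://example.com/ci-data/builds.json"
S3_ENDPOINT = os.environ["S3_ENDPOINT"]          # e.g. the Ceph RGW endpoint
S3_BUCKET = os.environ.get("S3_BUCKET", "ai4ci-raw-data")

# Download the raw operations data.
raw = requests.get(SOURCE_URL, timeout=60)
raw.raise_for_status()

# Store it in Ceph S3 so later notebooks and pipelines can reuse it.
s3 = boto3.client(
    "s3",
    endpoint_url=S3_ENDPOINT,
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
s3.put_object(Bucket=S3_BUCKET, Key="raw/builds.json", Body=raw.content)
```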
And finally, for developers and stakeholders to view these KPIs, metrics, and aggregated test results visually, we export the metrics as SQL tables and then use those tables to create automated visualization dashboards in Superset. After computing all the relevant metrics and KPIs, we also want to apply more advanced AI and machine learning techniques to better support the CI/CD processes. A few examples of the ML services we are currently developing: one is a model that determines the time to merge for a new pull request that comes into a repository. Another is suggesting an optimal stopping point for long-running tests, so that we can stop a test at the point after which it is most likely to fail and save resources. To build these models, we develop them in Jupyter notebooks, store the trained model in Ceph S3 storage, and then deploy the saved model as a service using the Seldon operator on Red Hat OpenShift. This service creates a route to the model, and we can interact with that endpoint from a terminal or from Jupyter notebooks. After using Jupyter notebooks for data collection, model training, and every other part of the machine learning workflow, we make sure all these tasks run sequentially and automatically. For that we use Elyra and Kubeflow Pipelines, a platform for building and deploying scalable machine learning workloads, so we can run our notebooks in an automated fashion.

So now let's take a look at the demo. I'd like to introduce one of the ML services we have built: the GitHub time-to-merge model. This is a model that predicts, for a new incoming pull request on a Git repository, the time it will take to merge that PR. It gives an estimate of when the pull request will be closed, classifying the time into buckets such as 0 to 3 hours, 3 to 5 hours, 5 to 10 hours, and so on. This could be packaged as a bot that integrates with a GitHub repository or helps an existing community. A metric like this can help identify bottlenecks within the development process. For example, having an estimate of when a pull request will be merged can help developers and engineering managers better allocate resources or speed up the engineering process. An estimate like this can also help new contributors or an existing community by giving them a sense of when their issue will be triaged or closed, or when their PR will be reacted upon or merged, which can encourage contribution to an open source community and give new contributors feedback on when their work will be acted upon. There are different use cases and various ways a concept like this can be used, but just to give you an overview of the workflow we follow to train such a model: we started with the pull request data coming from the openshift/origin repository, which we picked as the demo repository for the demo model.
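As a sketch of the "train in a notebook, store the model in Ceph S3" step described above, here is a minimal example that fits a scikit-learn classifier and uploads the serialized model to an S3-compatible bucket. The training data, bucket, object key, and environment variables are placeholders, not the project's actual artifacts.

```python
import io
import os
import boto3
import joblib
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data; in practice this comes from the collected PR features.
X = [[150, 1, 2, 1, 4], [10, 0, 5, 0, 1], [800, 0, 0, 1, 20]]
y = [2, 0, 3]  # hypothetical time-to-merge bucket labels

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Serialize the trained model and store it in Ceph S3 so it can be served later.
buf = io.BytesIO()
joblib.dump(model, buf)
buf.seek(0)

s3 = boto3.client("s3", endpoint_url=os.environ["S3_ENDPOINT"])
s3.upload_fileobj(buf, os.environ.get("S3_BUCKET", "ai4ci-models"),
                  "time-to-merge/model.joblib")
```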
We then stored all of this data in S3 storage and extracted certain features from it, essentially encoding each incoming pull request into features a machine learning model can understand: the size of the PR, whether it was created by a reviewer or a maintainer, the day of the week and the time of day it was created, whether it has a description and how long that description is, what the title looks like, the types of files changed in the PR, and so on. Once we had gathered these features, we tried some vanilla classifiers like Naive Bayes, XGBoost, random forests, SVMs, et cetera, and once we had compared these models and found the best performing classifier, we deployed that model as a service. For that we used the Seldon operator on Red Hat OpenShift to create an endpoint that can be accessed from a terminal, from a Jupyter notebook, or even integrated with a bot that comments directly on a new PR.

To show you how we can interact with this model, I'm going to this inference notebook. Here, for example, I'm importing a new pull request; you'll see it's a typical pull request with a title, body, size, when it was created, when it got merged, all of that, and I also have similarly formatted test data. I translate the input data into features such as the size of the PR and categorical variables such as which day of the week or month it was created in, whether the PR creator is a reviewer or an approver, and so forth, and then we send that as a payload to the model service we just trained. We put the URL of the model here, JSON-encode the input data, and push it to the model, and the model responds. Here it's performing well: it gives us a positive response, and it is able to classify which class it thinks the PR falls under, so it might say that this PR is most likely to be merged after a long time, greater than 462 hours. This is a model we are working on, and it can be suited to multiple repositories; we obviously don't need to stick to one single repository, and it can also be expanded to whole orgs. We are working to improve its performance, but that's just one example of a service we worked on.

We will also go into another ML service we have built. Every time a developer wants to contribute to a project, the usual way is to create an issue and address it with a pull request. These pull requests are not merged directly; they are always subject to certain tests and builds before they can be merged into the code base, and sometimes these pull requests run into tests that take longer than 40 minutes to finish. Our aim with this model is to find an optimal stopping point after which a test is most likely going to fail, so that we can better allocate resources and save time for the developers and managers who are waiting for these pull requests to be merged. So let's take a look at a quick demo. We follow a flow similar to the one we used for time to merge; however, the dataset we use here comes from the TestGrid visualization platform. We specify the URL where the model is deployed as a service, and to interact with this model we provide a test name for which we would like to find an optimal stopping point, along with a timestamp for the day that we want to find it for.
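To make the inference step concrete, here is a minimal sketch of encoding a new PR into features and sending them as a payload to the deployed time-to-merge service. The route URL, feature names, and their order are illustrative assumptions, not the project's exact schema; the payload shape assumes the standard Seldon REST protocol with an ndarray of already-encoded features.

```python
import requests

# Hypothetical route created by the Seldon operator for the time-to-merge model.
MODEL_URL = "http://time-to-merge.example.com/api/v1.0/predictions"

# Encode a new PR into numeric features the classifier can understand.
# The feature set and ordering here are illustrative placeholders.
pr = {
    "changed_lines": 150,
    "created_by_reviewer": 1,   # 1 if the author is a repo reviewer
    "day_of_week": 2,           # 0 = Monday
    "has_description": 1,
    "num_files_changed": 4,
}
payload = {"data": {"ndarray": [list(pr.values())]}}

resp = requests.post(MODEL_URL, json=payload, timeout=30)
resp.raise_for_status()

# The model returns probabilities over time-to-merge buckets
# (e.g. 0-3h, 3-5h, 5-10h, ...); pick the most likely one.
probs = resp.json()["data"]["ndarray"][0]
print("predicted bucket index:", max(range(len(probs)), key=probs.__getitem__))
```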
We then interact with the URL and get a response 200, which means the service is up and running, and finally we come to the conclusion that the optimal stopping point for this given test is, say, 104 seconds. Things like this can help developers and managers better allocate resources and save a lot of time.

Now let's look at how we automate all of these workloads using Elyra and Kubeflow Pipelines. We just saw some of the ML services we built, but they involve a lot of steps for data scientists to carry out manually, especially when you want dashboards populated on an automated basis, say every day. You don't want to collect all this data, run these notebooks, and create tables by hand every day; you want it automated, and you also want a more consistent MLOps workflow so you can track metrics and do things the right way. For that we use Elyra and Kubeflow Pipelines, which are also services available on Operate First. What you see here is an example of a sample Elyra workflow. Elyra is essentially a JupyterLab plugin that can be used to create Kubeflow pipelines and automate notebooks and scripts. This is a three-step workflow: the first step collects some data from the TestGrid visualization platform, and the second step calculates certain metrics from it. Here we have two notebooks, but you can add as many notebooks as you want to compute as many metrics as you like. We calculate metrics such as aggregated views of how many tests are passing, how many builds are passing, how many builds are failing, how many are flaking out, and how many are getting blocked; we can also see which tests are correlated to a given test, how many times a test run fails before it starts to pass again, whether there is some weirdness in certain tests, and metrics and KPIs like that. Once we have calculated these metrics, we store them in S3 storage and create tables from them on the Trino database engine, and those tables can then be used to create nice visualizations and charts in Superset.

To trigger this, I hit the run button, give it a name, select an existing Kubeflow Pipelines runtime configuration that we created earlier, and hit OK. What this does is package up these notebooks together with the environment variables and everything the notebooks need to run, and submit it to the Kubeflow UI. Now when I go to Kubeflow Pipelines and look at the runs, I can see that it got started and is running, and to debug any issue that comes up while a notebook is running, we can go to the logs and figure out what's going on and how the notebooks are executing. We can also make this happen on a recurring daily basis so that we don't have to do it every day. This is how it should look after the whole thing has completed executing successfully: there should be a green check mark next to each step, and the data is then sent on to Superset.

So here we are, here's a Superset dashboard. As we discussed, we calculated a bunch of metrics and KPIs, and it's easier to visualize them using this dashboard. We start by looking at builds passing and failing and the distribution percentage for the same, and we also try to find out how many tests were flaking out and what the number of flakes over time was.
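To make the metrics-to-Trino-to-Superset handoff concrete, here is a minimal sketch of querying one of those metric tables with the trino Python client. The host, catalog, schema, table, and column names are placeholders for whatever the pipeline actually creates, not the project's real configuration.

```python
import trino

# Hypothetical connection details for the Trino instance backing Superset.
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=443,
    user="ai4ci",
    http_scheme="https",
    catalog="hive",          # placeholder catalog
    schema="ci_metrics",     # placeholder schema created by the pipeline
)

cur = conn.cursor()
# Placeholder table/columns: daily counts of passing vs. failing builds.
cur.execute(
    "SELECT date, passing_builds, failing_builds "
    "FROM build_pass_fail ORDER BY date DESC LIMIT 7"
)
for row in cur.fetchall():
    print(row)
```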
We probably need to log in again. This is our Superset platform on the Operate First cloud; you can log in here by creating an account, and if you click on Operate First you can access Superset using your GitHub ID. There we go: you just need to add yourself to a Git repo and you have access to all of this. After the number of flakes, we have also calculated the average percentage of tests fixed over time, the duration of the tests per grid, and how long a particular test takes to run at different times. Finally, one more feature is persistent failures: sometimes you don't even make a fix to your PR and suddenly the failing test starts to pass. We try to capture that metric here, finding how many times you need to run your test before it starts to pass again and how long it takes for these tests to start passing, and also calculating the transition rates from pass to fail and from fail to pass. You can also filter this dashboard by selecting a particular tab, a particular grid, or even a particular test. So that's it for the demo.

Coming back: if you would like to engage with this project, there are multiple ways to get started. The first is to scan the QR code and get redirected to our GitHub repository. You can also interact with and leverage the various open data sources we talked about, and get access to the data collection scripts and exploratory analyses we've performed. There are also interactive and reproducible Jupyter notebooks for the entire project, available for anyone to start using right now on the public JupyterHub instance of the Operate First cloud. We also demoed the Superset dashboard, which is available for anybody to look at, add new charts to, and explore the existing KPIs and metrics we've plotted. We also saw a bunch of model endpoints that are available for anybody to interact with, pass your own payloads to, and see what your results are, and you can run your own AI and machine learning workflows using the Elyra and Kubeflow pipelines. We've put together a guide for you all to follow along, and if you want to learn even more, we have a YouTube video playlist you can look at. If you wish to get started with Operate First for your data and machine learning workloads, you can scan the QR code here, and here are a few links for you to explore. That's about it; thank you everyone for joining in. I'll open the floor for any questions. One, two, cool.

So, are the models that you're talking about available publicly as well, and specifically the training pipelines for these models? The answer to both is yes. They have been trained on some particular repositories right now, and we have the code, the training pipelines, all of the notebooks, and the data that we used publicly available, but if you wish to leverage it yourself you'll have to adapt it to your own workloads.

Based on your experience so far, what's the accuracy for the models you've been talking about, especially the first one for time to merge? How is it going so far? Time to merge has multiple buckets, it's not a binary classification, so accuracy is not always the best metric, and it's hard to summarize with accuracy as a single number rather than performance more broadly.
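As an illustration of how a metric like the pass/fail transition rate might be computed from test run history, here is a minimal pandas sketch. The column names and the assumption of one row per test run in time order are placeholders for the data layout, not the project's exact schema.

```python
import pandas as pd

# Hypothetical test-run history: one row per run of one test, in time order.
runs = pd.DataFrame(
    {"test": ["t1"] * 6,
     "status": ["pass", "fail", "fail", "pass", "pass", "fail"]}
)

def transition_rates(group: pd.DataFrame) -> pd.Series:
    """Fraction of consecutive runs that flip pass->fail or fail->pass."""
    prev = group["status"].shift()
    pass_to_fail = ((prev == "pass") & (group["status"] == "fail")).sum()
    fail_to_pass = ((prev == "fail") & (group["status"] == "pass")).sum()
    n = max(len(group) - 1, 1)
    return pd.Series(
        {"pass_to_fail_rate": pass_to_fail / n, "fail_to_pass_rate": fail_to_pass / n}
    )

print(runs.groupby("test").apply(transition_rates))
```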
To give you very broad numbers, classification accuracy for one bucket out of 10 buckets is about 24%, which is not great but not bad, and that is noting that we only trained it on vanilla classifiers. If you introduce deep learning models, improve the features, or create some stacked models, this might do even better. These are very vanilla classifiers, with perhaps well-engineered features but very basic models, and it's about 20 to 25% accuracy on each bucket right now. Cool, thanks.

Thank you, I've got a quick question if I can. Just for those like me who aren't as familiar with the Jupyter notebook environment: when that was going on and it was showing the run, there was a certain portion that was interactive and actually happening in real time, and you went right by it while it was running through the cells. For time to merge it's pretty fast, so we can run it for you; I had pre-run it, but that can obviously be changed for different PRs and different datasets, and it's working, it's live, we just didn't want any unexpectedness during the demo.

Yes, we tried to evaluate which features are more important and exclude the features that are not very important. As far as I remember, the description of the PR was an important feature, along with some others I don't remember, but let's see if I can pull it up right now; everything is in the repo. While I do this, I can also see where this is being created. I can't tell if the notebook is partially broken somewhere, but here we basically evaluate feature importances, like you see here, and we can see which ones are doing well. If I'm seeing the right thing, size was important, followed by the month the PR was created in, changes in a particular directory (which I don't even remember the purpose of), changes in this other directory, and the day of the week it was created on, things like that. But these feature importance scores would depend heavily on the repo characteristics or maintainer characteristics; things would differ from org to org. This was just on that particular openshift/origin repo for a particular date range, so this would change. The username of the person, like a good username gets merged sooner? Maybe we should include that. I think we are just out of time, but if there are any last-minute questions, or if you want to reach us, we'll be hanging around the Red Hat booth, or just feel free to come to our repo; you can find us almost anywhere. Thank you so much.
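For anyone curious what that feature-importance evaluation might look like in a notebook, here is a minimal scikit-learn sketch; the feature names, data, and bucket labels are illustrative, not the project's actual training set or exact method.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative PR features and time-to-merge bucket labels (not real project data).
X = pd.DataFrame(
    {
        "size": [10, 500, 40, 900, 25, 300],
        "created_at_month": [1, 6, 3, 12, 7, 9],
        "created_day_of_week": [0, 4, 2, 5, 1, 3],
        "body_size": [0, 250, 80, 10, 400, 120],
    }
)
y = [0, 3, 1, 4, 1, 2]  # hypothetical time-to-merge buckets

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by importance, similar in spirit to the notebook shown in the talk.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```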