Okay, great. Welcome everyone. Our next speaker is Sri Anand, and he will be presenting on data science with Red Hat Insights. Sri is a software engineer slash data scientist, I'm actually not sure what your title is, at Red Hat in the AI Center of Excellence. And as always, if you notice any issues with the recording, please let me know. I'm going to be sharing my screen now. Hi everyone. My name is Sri Anand. I am a data scientist with the AIOps team, which is part of the AI Center of Excellence. Today I'll be talking about data science with Red Hat Insights. All the work that I'll show has been done in collaboration with the AIOps team, so thanks to everyone who participated and gave feedback. We'll have a Q&A session at the end, so feel free to ask any questions then. That's it. Let's jump into the slides. So I'm going to talk about Red Hat Insights, give you a feel for how users interact with Insights, what the workflow is like, and what problem we're trying to address using Insights. I'll also give you examples of the kind of data that we collect from Insights. Then I'm going to go into two different projects: SAP data analysis and drift baseline suggestions. One of them is the data analysis part, where we try to visualize what's going on and come up with insights from that. The drift baseline suggestions project is more about pattern recognition and data science, where we try to find patterns. At the end, I'll tie everything together, talk about other projects we're doing that fall into the same bucket, and also discuss how we can improve the current approach. So this is a meme that I created to walk you through the journey of what happens when a customer runs into a problem. Imagine that you are a RHEL user and your kernel starts to panic. The first thing you would do is go to a support person and try to figure out what's wrong.
Red Hat has an incredible support team, and they would navigate the problem and come up with appropriate solutions. So your problem gets solved, but that's not the end of the story. There could be other people who face similar problems. In order to save the support engineers' time and not duplicate effort, the problem and the solution that have already been figured out are encoded in a KCS article, which is a knowledge base article. So if someone else has the same problem, they can always refer to the KCS article and figure out what's wrong. But even that's not the end of the story, because we have Insights. What Insights does is pretty incredible. Through the KCS articles, we know that if certain conditions are present in your system, then your system could go into a kernel panic. Right. So what we have done is create an engine of these conditions, and we say, okay, we'll scan all the data in your system and see if these conditions are present. If they are, then before the problem arrives, we're going to proactively warn you that your system may go into a panic and that these issues need to be fixed. So we have a repository of these conditions, and in this process, we also collect the data of all the systems. We check this data using these rules and see if there are any vulnerabilities. This is a well-defined process, and we have a team that creates all these incredible rules. Where we come into the picture is that we collect a lot of data from the systems, and then we have these manually written if-then rules. These rules are pretty specific, and also business critical, right? If there is a kernel panic and we know that it's because of a particular hardware error, then we need to warn the users. So these are pretty important rules.
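The manually written if-then rules described here can be sketched as a simple predicate over collected system facts. This is a minimal illustrative sketch, not a real Insights rule: the fact names, kernel version, and module name below are all invented.

```python
from typing import Optional


def check_kernel_panic_risk(facts: dict) -> Optional[str]:
    """Return a warning if the system matches a known bad pattern.

    A toy stand-in for a manually written Insights rule; every name
    and version here is made up for illustration.
    """
    if (facts.get("bios_vendor") == "ExampleVendor"
            and facts.get("kernel") == "4.18.0-80.el8"
            and "bad_module" in facts.get("loaded_modules", [])):
        return ("This BIOS/kernel/module combination is associated with "
                "kernel panics; see the corresponding KCS article.")
    return None


# A system matching the conditions gets a proactive warning before it fails.
facts = {
    "bios_vendor": "ExampleVendor",
    "kernel": "4.18.0-80.el8",
    "loaded_modules": ["bad_module", "ok_module"],
}
print(check_kernel_panic_risk(facts))
```

The point of encoding rules this way is that the same check can be run mechanically against every system's collected data, rather than waiting for a support case.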
But there are other things that can be done with that data, in the sense that we can come up with rules using data science and data analysis that are maybe not hardware specific, but that can be mined by examining data from a lot of the users. For example, if there is a configuration file and the user is trying to edit it and they make some error, and a particular service or maybe even the system crashes because of that, we can catch those errors, because we have seen a lot of these configuration files from users across the RHEL family. Using that information, we can alert the users that they're doing something wrong in the configuration file. Now, we have to understand that there are only certain kinds of errors that can be detected using data science, and there are certain kinds of errors that can only be detected manually by support engineers. Data science here is trying to find these large-scale errors, so it's scalable and can do a lot of these tasks for unknown environments. So that's where we come in, and I'll elaborate more on the kind of projects that we're doing here. Before moving on to that, this is a typical Insights dashboard. You have these vulnerabilities affecting your system. The rules that I talked about, right? They check for these vulnerabilities and then show them to you in this dashboard. And similarly, there are other things going on here. In the process of finding these rules and checking the data against them, we are collecting this data. The examples on the screen are some of the data points that we collect. On the left you see hardware information and software information: who's the cloud provider, what's the architecture, the BIOS vendor, the BIOS version, and then the software information, like which services are enabled on your system, which packages are installed, and the services and modules.
So that's the software and hardware information that we get from the systems. And then there are also configuration files, like I talked about. This example is of a configuration file for a service called SSSD. There are a bunch of key-value pairs here: access provider, ID provider, etc. There are a bunch of services that can be configured using this file. So that's what the data looks like. Now, we can do a bunch of things with this data. The first thing is data analysis. What that means is we can visualize different aspects of the data and find relationships between them. For example, if we have three columns and we want to see how they're related, we would find the distributions, the correlations, and all those things. That helps us understand the system better and make better decisions. One project that I'm going to show is SAP workload analysis, and hopefully that will clear things up for the data analysis part. The other thing that we can do is data science. What that means is we are trying to find patterns in this data: given all this data, what are the patterns, or what are the frequently occurring themes in it? The baseline suggestion project is an example that I'll go into in detail. Before I move on to that, I want to emphasize that all the technologies that we use are open source. For the SAP workloads, we are using Superset, which is an open source business intelligence visualization tool. And for all the data science work, we use JupyterHub, which is a part of Data Hub, and it allows us to have notebooks and do all our experiments in those notebooks. So a big shout-out to the Data Hub team, who have helped us with all the infrastructure that we need. Let's move on to the projects. Right, so SAP data analysis. In this project, we're trying to explore user systems and also the SAP systems that are installed on them.
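Since SSSD configuration is an ini-style file of sections and key-value pairs, it can be read with Python's standard `configparser`; the fragment below is illustrative, not a recommended configuration.

```python
import configparser

# An ini-style fragment in the key/value layout sssd.conf uses.
# The domain name and option values here are made up for illustration.
sample = """
[domain/example.com]
id_provider = ldap
access_provider = simple
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)

# Extract the key-value pairs from one section as a plain dict.
pairs = dict(cfg["domain/example.com"])
print(pairs)  # {'id_provider': 'ldap', 'access_provider': 'simple'}
```

Parsing many such files into dictionaries like this is the first step toward comparing configurations across systems.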
On your screen, you see an example topology of a user system. The 5127 here is a system associated with a RHEL Insights account, and HA1, HA2, and BV2 are different SAP instances; these are the IDs of the SAP instances. What we're trying to do here is see the SAP workloads that are spread over different machines in a user account. We want to visualize them, and we want to see the topology in order to reorganize or maybe change how the systems are laid out. So I'm going to show you a dashboard in Superset, the visualization tool that I talked about. The image that I showed you was of an account that was pretty simple, but this one is more involved. All these small dots are the systems associated with this particular Insights account, and the DA here is again an SAP instance ID. SAP is enterprise business software that a lot of Red Hat customers run on RHEL, so it's worthwhile investigating how these workloads are organized. On your screen, you see that there are a lot of systems, and a lot of them have this particular type of SAP instance, and then a lot of these systems have their own different types of SAP instances. What this does is give you an overview of how your systems are structured and how you can better organize them. It gives you a bird's-eye view of what's going on. I think that's really powerful, and it also shows what you can do with the right analysis. There are a bunch of other things that you can see in this dashboard; the link is attached in the presentation. For example, this graph has accounts on the x-axis and the number of instances on the y-axis, so it shows which accounts have a lot of instances. We can see that this particular account has many instances, right? And then there's the other information here. So, coming back to the slides. That was the data analysis part.
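The instances-per-account chart just described boils down to a group-and-count over (account, instance) records. A minimal dependency-free sketch, with invented account names and instance IDs:

```python
from collections import Counter

# Toy (account, sap_instance) rows standing in for the dashboard's data;
# the account names and instance IDs are invented.
rows = [
    ("A", "HA1"), ("A", "HA2"), ("A", "BV2"),
    ("B", "HA1"), ("B", "HA1"),
    ("C", "DA1"),
]

# Count how many SAP instances each account runs.
instances_per_account = Counter(account for account, _ in rows)
print(instances_per_account)  # Counter({'A': 3, 'B': 2, 'C': 1})
```

In Superset the same aggregation is done by the chart itself, with accounts on the x-axis and the counts on the y-axis.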
And then the second thing that we do is the data science work, and that's the drift baseline suggestion project that I'm going to explain. I'll go into the notebook here. It may get technical for a bit, but I'll try to make sure that everyone is on the same page. So what is drift analysis? It's a visual tool that tracks changes in the environment. Remember the Insights dashboard that I showed you at the beginning? It's a part of that. What it does is this: if a user account has, let's say, 300 systems, it gives them a way to compare those 300 systems. For example, let's say one of your systems is crashing and you want to pinpoint why that's happening. One way of doing that could be to compare that system with a similar system and say, okay, this one is crashing and this one is not, and these systems are similar, so what's wrong with this system, or what's different about it? You can compare them side by side, and the drift application allows you to do that. The left column is one system and the right column is another system, and these rows are the data points that Insights collects. For example, here, this one has a Lenovo BIOS vendor and this one has SeaBIOS; this one is a virtual system and this one is bare metal; this one is KVM. These are the differences that you can see when you compare your systems. So that's what drift analysis does. What we are trying to do here is recommend baselines. What that means is, like I said, in the example where a user was having issues with one of their systems, they had to know which system was similar to theirs in order to compare, right? Otherwise, there are 300 systems and you don't know which one to compare your system with. So you have to have a baseline.
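The side-by-side comparison drift performs can be sketched as a diff over two systems' fact dictionaries, reporting only where they differ. The fact names and values below are examples, not the real Insights schema.

```python
def diff_facts(left: dict, right: dict) -> dict:
    """Map each differing fact name to its (left, right) value pair."""
    keys = set(left) | set(right)
    return {k: (left.get(k), right.get(k))
            for k in sorted(keys)
            if left.get(k) != right.get(k)}


# Two invented systems: same infrastructure type, different BIOS and CPUs.
system_a = {"bios_vendor": "Lenovo", "infrastructure": "virtual", "cpus": 4}
system_b = {"bios_vendor": "SeaBIOS", "infrastructure": "virtual", "cpus": 8}

differences = diff_facts(system_a, system_b)
print(differences)  # {'bios_vendor': ('Lenovo', 'SeaBIOS'), 'cpus': (4, 8)}
```

Facts that match on both sides drop out of the result, which is exactly the view you want when hunting for why one system misbehaves and a similar one doesn't.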
So in this project, what we're trying to do is, given an inventory of systems, find groups in those systems. We want to say that all the elements in this group are similar, and then from each of those groups, we want to recommend a baseline, or a central point. So if you have a problem in one of your systems, you'll find which group it belongs to, and then you'll compare it with the baseline and try to see what's different. The image here shows these systems, and the colors show the groups; I'm going to go into more detail. Where this helps is that it'll save time for users by identifying and recommending baselines. It'll also help in improving configuration management. That means, if you want to standardize all your systems in a group, you can just compare them and standardize them, right? It's always better to have standard systems. So let's go into the demo. This is the JupyterHub environment that I talked about. You go to jupyterhub.datahub.redhat.com, you log in, and you'll arrive at this page. Then you select the drift analysis notebook. Anyone with a Red Hat account can do this. So you select this drift analysis notebook image, and what that does is fetch all the dependencies that are required for this project and put them in your environment, so that you don't have to do it on your own. Then you just click on spawn, and it'll take a couple of minutes to load the dependencies, create the environment, and do everything. I have one that is already open, so I'm going to go through this notebook in detail. Let's talk about what we are doing here. The first step is to select features. Remember the bunch of features that I showed you, right? When we are finding the groups in our systems, all of those features may not be important, so you would want to find groups based on only certain features.
For example, OS release, installed services, installed packages, and kernel modules. If these are the only features that you want to investigate, you just keep them and comment out the rest. If you want, you can change this, and then you'll get a different grouping of systems. So that's the first step. In the second step, you encode these variables. What that means is that when you get information from these features, a lot of them are categorical in nature. A statistical model may not understand what AMD or Intel or things like that mean, right? It needs numbers in order to operate. Encoding is the process that converts all these categories, all these features that aren't directly usable, into something that you can put into a model. We have numerical features and we have categorical features. We one-hot encode the categorical features: that basically means we give one bit to every category, and when the category is present, we say that bit is turned on; when it's not, we say the bit is turned off. Similarly, we have list-of-words features, like the list of services that are installed, and again we use something similar to one-hot encoding there. When we do that, we run into the problem of dimensionality. We have 333 systems, but when we encode them, we end up with very high-dimensional vectors: 333 systems and 25,000-dimensional encoded vectors. So we have a dimensionality reduction problem: we want to reduce this number so that the statistical model is able to process it computationally. We use a dimensionality reduction technique called UMAP, which reduces it to 100 dimensions, and then we cluster based on that. For the clustering, we use something called k-means, which groups these systems.
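The encode-reduce-cluster pipeline just described can be sketched roughly as below. This is an illustrative toy, not the real notebook: the system facts are invented, the actual notebook uses UMAP for the reduction step, and TruncatedSVD stands in for it here just to keep the sketch scikit-learn-only.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Invented system facts: a categorical OS release plus a
# list-of-words feature (installed services) per system.
systems = [
    {"os": "rhel8", "services": ["sshd", "chronyd", "sssd"]},
    {"os": "rhel8", "services": ["sshd", "chronyd"]},
    {"os": "rhel7", "services": ["sshd", "ntpd", "httpd"]},
    {"os": "rhel7", "services": ["sshd", "ntpd"]},
]

# One-hot encode: one bit per OS value, one bit per service name.
os_values = sorted({s["os"] for s in systems})
os_bits = np.array([[int(s["os"] == v) for v in os_values] for s in systems])
service_bits = MultiLabelBinarizer().fit_transform(
    [s["services"] for s in systems])
X = np.hstack([os_bits, service_bits]).astype(float)

# Reduce dimensionality (UMAP in the actual notebook; TruncatedSVD
# stands in here), then group the systems with k-means.
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(labels)
```

With real data the same three stages run at 333 systems by roughly 25,000 encoded dimensions, reduced to 100 before clustering.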
And then it automatically finds that, okay, we have five different groups in the systems that we have provided, based on the loss. Then we get to this image, right, the one that I showed. What this means is that these dots are systems and the colors represent the groups. For example, this is one group, this is another, the red one is another, and the green one is another; we have five of these. The black dots represent the centers of these groups, and those are also the baselines that we recommend. So, for example, if this system is failing, then we compare it with the baseline system and say, okay, let's see what's going wrong. Also, the number of groups here is important. We automatically detected it based on the loss, but you can also change it. For example, if you say, okay, I want a completely separate group for these elements, and I just don't want them to be in the yellow group, you can increase that from five to ten in the notebook and you'll get different clusters. So there are also some parameters available for the user to configure this. The next part of the notebook is about interpreting what's going on in these clusters, so I'm going to jump to the inspecting function here. What we are trying to do is inspect the installed services in the zeroth cluster. This provides an overview of what's going on. It says that the total number of elements in the cluster is 33, and then we have all these services here, and we are looking at their frequency. These services are installed on all 33 systems that belong to the group, and all of them are in the baseline as well. But as we scroll down, we come to some services that are only installed on 31 of those systems. So we are claiming that the systems in these groups are similar, but these particular services are only installed on 31 or 29 out of 33.
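The cluster-inspection step boils down to counting, for one cluster, on how many of its member systems each service appears. A toy version, with invented service lists for a small made-up cluster:

```python
from collections import Counter

# Installed-services lists for the systems of one made-up cluster.
cluster_services = [
    ["sshd", "chronyd", "sssd"],
    ["sshd", "chronyd"],
    ["sshd", "chronyd", "sssd"],
]

total = len(cluster_services)

# Count how many systems in the cluster have each service installed.
freq = Counter(svc for services in cluster_services for svc in services)
for svc, count in freq.most_common():
    print(f"{svc}: installed on {count}/{total} systems")
```

Services installed on every system confirm the cluster is coherent; the ones installed on only some of the systems are the candidates for standardization.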
So maybe, in order to standardize, we could also install those services on the other systems, or, if they're not required, remove them. What it's offering is a clear description of how your systems differ. So yeah, that's it. We saw here how to select features, how we can encode those features, and finally how we can use them in the clustering. I also have a detailed video of each of those steps, because it wasn't in the scope of this talk to cover everything. I'll include that in the links as well, so if you are interested, you could go through that and the notebook as well. It's pretty simple, right? When you spawn your notebook, you'll come to this page and open the recent one. You go to the demo folder, open the baseline demo, and you'll basically come to this place. You don't have to fetch any dependencies; you can just come and play with this notebook. So, coming back to the slides. Before I finish and let you ask questions, there are some other projects that we're doing. There is the configuration file analysis that I hinted at in the beginning, where we're trying to detect errors in configuration files, so that we find those patterns and can alert users that these are the things that can go wrong with their configuration files. Then there's also KCS article classification. Remember, in the initial example, I was talking about these KCS articles and how we make if-then rules out of them. We also did a project where we had these KCS articles, and we were reading the text, the problem and the solutions, and trying to figure out whether it would make a good rule for Insights. And in one of the projects, we were trying to figure out which systems are outliers: in those 300 systems, let's say one system is completely different.
And then we were trying to figure that out and see whether or not that could be a problem. So yeah, those are some other projects that we did and are continuing right now. I hope I was able to give you some intuition about Insights, the data that we collect, the analysis that we can do, and the pattern recognition and data science projects that we can do with this data. With that, I am going to open the floor for questions. I think there's a lot of noise coming from your mic, could you adjust that a little bit? Is it better? Yeah, I think so. I think it was too close, so we could hear you breathing. Okay. Not that there's anything wrong with breathing. All right. I can stop there if you would like. You sound good. Great. Yeah, so folks, the floor is open for questions. But while we wait for someone to get their confidence up, I have a question. Okay. Red Hat being open source, are these notebooks available for anyone to try? I don't see the GitHub or anything. That's a good question. Right now, no. But we are working on it. As you can understand, this is customer data, right? So we have a lot of privacy issues there. But we are working on anonymizing this data and also putting out the first few notebooks that deal with public data on open source repositories. So in short, right now it's not publicly available, but it should be soon. You answered my next question, which is, are there any plans to open source some of this data and the notebooks? Yes, it's a matter of anonymizing it. There you go. That's a top priority on our list of next things to do. So yeah, I think a lot of people could... I think it's something the community would find interesting. In a similar vein, I like hearing about when people struggle with their work. So what are some of the biggest challenges you faced with this? I think the biggest challenge here would be just dealing with the systems data.
I think when we do our education and study data science, we get all these comfortable datasets that are cleaned, and it's natural language or computer vision and that kind of stuff. But we never really get to deal with data that's noisy in a real-world way. So one of the biggest challenges for me in the beginning was dealing with systems data, because I never did that in school or anywhere else, and also dealing with noisy data. That's purely from a data science perspective. And then from an organizational view, I think the challenging part was collaborating with other teams, because you cannot possibly exist in a silo and keep working on something; the end user for your model or your project has to be continuously involved. From that perspective, you learn a lot about how to collaborate with other people and how to be responsive. I'm still learning. I mean, it's a continuing process. So that's another thing. Yes, a lot of us can agree that those organizational challenges are sometimes bigger than the technical problems, right? Like figuring out how to work with people and building the right sort of relationship, I think. Yeah. There's a lot of trial and error as well. Sometimes things just don't work, and then you have to keep moving on and keep trying new things. Are you talking about AIOps work right now? I think it can be applied anywhere. Yeah. I have something else, which is, you know, with all this data, a lot of what you're doing is exploratory. What do you think would be the coolest use, or where do you see this data being most useful long term? And this may not be a question you can answer, since it's, well, you know, kind of restricted data. But what do you think is the coolest thing? That's a good question. There are a lot of things that can be done. We already have, for example, Insights, right?
We already have a platform that's there. One low-hanging fruit would be to include all the kinds of analysis that we are doing in that already existing platform as features, right? Having an active integration with existing software would be pretty cool for a data science team. I think that's a challenge that we can solve, and we are already trying to do that. In the long term, something really cool would be building systems from scratch that rely on AI, and that would mean solving harder problems like root cause analysis, or the causation models that Gordon was suggesting: trying to figure out why your kernel is panicking, or why something happens in an unknown environment, and then trying to figure out those policies or rules. If we ever come to the stage where we can algorithmically deal with an unknown environment, that would be really cool. From where I stand, I think this is something that operators could really benefit from, right? This whole idea of OpenShift operators, which can be self-remediating, and if you can start pulling in machine learning models that were trained with this customer data, maybe we can start solving problems before they happen. That's always the dream. Right. Yeah. So this was fun. Thank you, Anish, for asking these questions, and thanks to everyone who watched the video. Yeah. Thanks for putting this together for us. This was some really cool stuff. Have a great day, and we'll see you around. Bye. For everyone else who is still here, don't leave quite yet. We have an exciting panel coming up with some experts from the AI CoE. They'll be talking about machine learning, and it'll be a panel, so really any questions you have, they're happy to answer. We'll see you soon.