So, hi again. Welcome everyone. My name is Rada Warbowski and I work at Red Hat as one of the kernel engineers based in Brno. Part of my job as a kernel engineer is to help deliver new kernel updates to customers. Two years ago I saw Stef Walter's presentation about cyborg teams and automation for the Cockpit project, and I thought it was something really cool and that I wanted to do something similar, but for the kernel. So I started experimenting with the data that is available to me. And that is not only patches and the kernel source tree, but also various kernel error messages generated by customers and our internal teams. Although Red Hat is not exclusively an OS company anymore, Fedora and Enterprise Linux are key assets of our ecosystem, and such error messages are a sort of feedback on how well we are doing our job.

There's a department in Red Hat that serves as a gateway between kernel engineers and customers; we call it Global Support Services, or GSS for short. Customers send their kernel crashes to Red Hat to take a look at what happened on their systems. If the problem cannot be solved by the first-line support, a trained engineer needs to look at the sent data using cryptic tools like crash or GDB, compare the customer data with the source code, and do a technical analysis by hand. There are only a handful of such engineers, and there's a limit to how much time they can invest in a certain case. When I heard that the kernel crashes and their logs are processed manually, by hand, I thought I could help here. Red Hat has a knowledge base with already solved customer cases, with solutions to problems, and such cases already contain data like the kernel error messages. And I thought: if one customer encounters a problem, there's a probability that someone else has the same problem. There's also a possibility that the problem is already solved, and all you have to do is connect such cases either with existing solutions or with a similar problem, which can lead an engineer in the right direction.

But there's a small catch regarding the Linux kernel and its error messages, and it's the reason they are analyzed by hand. First, let's take a look at a kernel error message. Here's an example. If you have been using Linux for a longer period of time, you have probably seen a similar message. It's usually generated when something went wrong in the kernel and the kernel does not know how to handle the situation, so it wants to notify you about its crash. It brings you its last state. In case you are a kernel hacker, you can look up the source code to figure out what happened. Or, if you are a customer, you at least have something you can report to your support, so they have a starting point to help you solve the problem.

There are many parts to such a kernel error message, so let's take a quick overview. The issue line is usually the first line of the message. It tells you what went wrong, what kind of problem you are looking at. In this case, the watchdog was triggered after its internal counter was not refreshed for a while, because some process was holding the CPU for a long time and the kernel could not schedule other tasks and became unresponsive. The yellow bar points to the list of loaded modules. They are parts of the kernel that can be loaded on demand.
You might think of them as hardware drivers for networking cards and sound cards, but they might also contain file systems, communication protocols, and even native language support. It's also a way for third-party vendors to deliver their proprietary code to the kernel. The orange bar highlights the information about the process that caused this message: the CPU where it was running, its process ID, information about the kernel version, whether a proprietary kernel module was loaded, and of course the hardware and the BIOS string. The pink bar highlights the content of the CPU registers when the crash happened. The registers might contain intermediate results of previous operations, pointers to memory structures, input parameters of functions, but they might also contain nonsense and garbage values if the process has gone wrong. The violet bar highlights the top part of the stack. The stack is a piece of memory privately allocated by the kernel for every process and running task in the kernel. It may contain return addresses of functions from the call trace and local variables, but also nonsense if the memory was overwritten by a rogue process. The blue bar shows the call trace. It's a simple call graph of functions, starting from the top, where the entry point to the kernel was, down to the bottom, where the problem happened. The trace shows the mapping of the functions in the kernel address space, but most importantly the names of those functions, and it's usually the first part an engineer will look at when analyzing the error message. The last one is the light blue bar, showing the machine code around the place where the problem happened.

So there's a lot of information encoded in this text, and depending on the type of the failure, not all of the information present needs to make sense, as it might be random noise. Additionally, there's no single unifying function or macro to display such output, because every subsystem displays different messages and each hardware architecture defines its own set of callbacks for hardware-specific information. And to make things more complicated, the Linux kernel is a complex piece of software. It's not an average user-space application which can be easily debugged and where errors will show up with the same output. Debugging a running operating system is tricky, and although the kernel has ways to tell you that something went wrong by printing an error message, there's a catch. In the kernel there are many asynchronous processes running at the same time, which might influence each other. Also, every system is different, unique in its hardware or in its list of loaded modules. All of that can cause the same problem to manifest with different error messages on different systems. In other words, if you run into this call trace again, everything that is shown here might change even though you have encountered the same problem; you will get different content for the same underlying issue. So if you are using a simple string comparison function to compare kernel messages for an aiding system, it will not work, and some kind of fuzziness in the comparison is necessary.
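Whatever comparison we end up using, the first step is always to pull the interesting tokens out of the raw text, for example the function names from the call trace. Here's a minimal sketch in Python; the name+0xoffset/0xsize frame format is the usual one, but the regular expression and the helper name are my own illustration, not the actual tooling:

    import re

    # Call-trace frames typically look like "  ? do_syscall_64+0x5b/0x1a0";
    # keep only the function name of each frame.
    FRAME_RE = re.compile(
        r'^\s*(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+',
        re.MULTILINE)

    def call_trace_functions(oops_text: str) -> list[str]:
        """Extract function names from the call-trace part of an error message."""
        return FRAME_RE.findall(oops_text)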
And you might think, well, that's easy. There is a whole set of string similarity algorithms that are proven to work and that we can use. And you are right. So I made a list, to name a few. I'm not going to go into the details of how they work, as that would take another presentation. In practice, they accept two strings on the input and give you a number on the output that tells you how much the two strings differ from each other. Since they are proven to work, and since the list of them is huge, I wanted to make sure I pick the best performing one. So I made a small performance test to see how they hold up. The test is simple: find the most similar match in a search space consisting of 63,000 unique kernel error messages. Here's the table showing the results for a single average search. The values are in milliseconds, sorted from the worst on the left to the best on the right. The worst performing algorithm took 36 seconds for a single search, and the best one took 3.3 seconds. This might be good enough for a small research project, but it's not imaginable that an application would, in the best case, lock a server CPU for three seconds to get one result, and the search time will grow with new cases.
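To give an idea of the kind of loop being timed here, a minimal sketch, with Python's standard difflib standing in for whichever similarity metric is used (each algorithm in the table has its own implementation):

    import difflib

    def most_similar(query: str, corpus: list[str]) -> tuple[str, float]:
        """Brute force: score the query against every stored message, keep the best."""
        best, best_score = None, -1.0
        for msg in corpus:
            score = difflib.SequenceMatcher(None, query, msg).ratio()
            if score > best_score:
                best, best_score = msg, score
        return best, best_score

Every query touches all 63,000 messages, which is why the search time grows linearly with every new case.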
But maybe there's a way to make it faster. For example, if we take one of the algorithms and do some tweaking and optimization, we could achieve a faster result. So let's take two of those algorithms and see if we can make them faster. I have not chosen them randomly; they are already used in Red Hat for specific projects. I won't name those projects; instead, I will use the substitutes JW* and LCS*. Before I talk about the specific optimizations, let's take a look at how fast those projects have become compared to their unoptimized counterparts. There's a huge drop in search time in both projects: JW* is down to half a second, and LCS* is down to 39 milliseconds. And although LCS* seems to be the clear winner here, let's talk about the JW* optimization first.

The JW* project takes the call trace part of the error message, the blue portion of the message, throws most of it away and keeps only the three bottom frames. Those are then concatenated into a single string and compared against its internal database. You might ask: are three frames enough to decide about the string similarity of two different messages? That probably depends on how long an average kernel call trace is, so here's a graph. It was generated from the sample search space and it shows the distribution of the kernel call traces by their length in frames. Reading from the graph, a kernel call trace will most probably be between 18 and 25 frames long. So if we keep only the last three frames, we throw away 83 to 88% of the information from the call trace, or more. We can still argue whether those thrown-away frames have any value, but I would say it depends from case to case.

So what about the LCS* optimization? LCS* compares the whole call trace, and it was the most successful after optimization, dropping from almost 13 seconds per search to 39 milliseconds. The secret behind the LCS* improvement is that it compares only against the last 200 entries in its database. In this specific case, where we have 63,000 entries, it throws away 99.7% of the data to speed things up, and only 0.3% is kept for comparison.

The results with optimization are much better, but at what cost: either by limiting the search space or by throwing away content. I wanted to scale up for speed, but without any compromise, and ideally while keeping all the data. So the string similarity functions are a dead end here. It looks like computers were not really designed to do fast operations on large quantities of human text. Although kernel error messages contain technical information about the CPU and the state of the operating system, it is presented in a form that is readable primarily by humans, not by computers. Luckily, there's a field of computer science and linguistics that specializes in just that. It allows computers to process human-readable text in a fast way, and the name of the field is natural language processing, also known as NLP.

The one feature I borrowed from NLP is word embeddings. To quote Wikipedia for a definition: word embedding is a collective name for a set of language modeling and feature learning techniques in natural language processing where words or phrases from the vocabulary are mapped to vectors of real numbers. In other words, a word embedding is a representation of words or phrases using numbers and high-dimensional vectors. Explaining how to turn words into vectors would be a topic for another presentation, but you can look it up on Wikipedia. After some initial tests with my own implementations, I chose an existing library named word2vec, written in 2013 by Tomas Mikolov, a former Brno university student. Here's a picture from the original word2vec paper, where they used a method named PCA, or principal component analysis, to squash the high-dimensional vectors into a two-dimensional representation. It's easy to see the relationship between countries and their capitals. Although word2vec is a statistical model which does not understand the text it is transforming into numbers, the trained vectors capture surprisingly much of the syntactic and semantic information from the source text. Here's another example: there are multiple degrees of similarity among words; for example, king is similar to queen as man is similar to woman. So you can think of word2vec as analogy detection.

But how is this applicable to Linux error messages? I trained the word2vec model on the 63,000 messages from the search set. Each string, meaning all the function names, module names, kernel versions, issue strings and hardware strings, is turned into a large vector. Then I chose some random function and module names and searched for the most similar ones. And remember, the found similarities are based on the content of the other messages, not on the source code. Here is a picture that uses dimensionality reduction to draw kernel functions as dots. The color of the dots was added by me, depending on where each of the symbols was defined in the Linux source tree. The structure and the clustering of the points was learned by the NLP model, not from the source code, but from the error messages. Although information is lost when you transform a vector from hundreds of dimensions into just two, it's clearly visible that frames from the same subsystems are grouped together. So it seems to work.
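For completeness, training such a model is only a few lines nowadays. A minimal sketch, assuming the gensim implementation of word2vec; the parameters and the ext4_readdir example are illustrative, not the values actually used:

    from gensim.models import Word2Vec

    # One "sentence" per kernel error message: its function names, module
    # names, kernel version, issue string, hardware strings, ...
    sentences = [tokenize(msg) for msg in corpus]  # tokenize() as sketched earlier

    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

    # Analogy-style lookup: which symbols appear in similar contexts?
    print(model.wv.most_similar('ext4_readdir', topn=3))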
Let's take a look at how the searching actually works. We don't have to deal with characters anymore; the strings are turned into something that computers are good at: numbers, in this case vectors. You probably still remember from your high school or university years what the easiest way to find the distance between two vectors is. And for those of you who guessed L2, or Euclidean distance, you guessed wrong, because we are working with vectors, not points. The easiest way to measure how similar two vectors are is to find the angle those two vectors are holding. On the right is the formula: cos θ = (A · B) / (‖A‖ ‖B‖). In the numerator, we multiply the vectors element-wise, and we divide the result by their magnitudes, computed in the denominator. If we make sure the denominator is always equal to one, there's no need for the division at all. The question is: when is the denominator equal to one? The answer: when both vectors are normalized, so that their magnitudes equal one. The whole distance measurement then becomes a simple multiplication of two vectors, or in our case of a vector and a matrix, where the matrix contains the vectors from the search space. It takes two lines of Python code to implement the search: the first line is the multiplication of the search space matrix and a sample, and the second line sorts the result and returns the three closest matches.

So we have seen how to turn individual words into vectors, and we have seen how to search for the closest vectors, or analogies. The next piece of the puzzle is to figure out how to transform the individual vectors into a single vector that can represent the whole kernel error message. There are several ways to do it; however, the simplest one that worked for me is to sum all the vectors and normalize the final product. Though it might seem illogical, it's proven to work, for example for sentiment analysis on human text. And those are all the steps needed for a simple aiding system. We have an NLP model that can translate strings into vectors: take an error message, split it into individual words, or tokens, translate those tokens into vectors using the NLP model, sum them up, and make sure the result is a unit vector. You probably would like to know how fast it is. A single search over the 63,000 samples in my configuration takes below 10 milliseconds. And there are no compromises; no data is thrown away for speed or optimization.
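Putting it together, a minimal sketch of the message vector and the two-line search; the function names are mine and the exact code is a reconstruction, not the original:

    import numpy as np

    def message_vector(tokens: list[str], model) -> np.ndarray:
        """Sum the word vectors of one message and normalize to unit length."""
        v = np.sum([model.wv[t] for t in tokens if t in model.wv], axis=0)
        return v / np.linalg.norm(v)

    # search_space: one normalized message vector per row (N x dim matrix)
    def search(search_space: np.ndarray, query: np.ndarray) -> np.ndarray:
        scores = search_space @ query            # cosine similarity via dot products
        return np.argsort(scores)[-3:][::-1]     # indices of the three closest matches

Because everything is reduced to a single matrix-vector multiplication, the search stays in the millisecond range even for tens of thousands of messages.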
You may ask: does it really work, and is there a proof of it? A mathematical proof would be difficult, so instead I made a small test application that collects data from various parts of our internal network: new error messages from the testing infrastructure, and new customer cases and their solutions. The more data is available to the model, the more accurate its searching is. The application is accessible through a web interface where you can upload your query as a file, or copy and paste your query into a web form, and the application will give you the three best results stored in its database. Here in this picture I'm entering the example from the start of the presentation into the web form, and the application gives the three best matches for every unique kernel error message that was uploaded.

How close the matches are to the original query is displayed by the colored cubes, seen here in the upper left corner of the picture, with the color gradually changing from green for the best result to red for the worst one. In this example it found the perfect match, as the query was already in the database. In the next example you can see that the application found a match that is not the same, but a very similar one. The hardware is different, the kernel is different, the call trace is different, but the problem seems to be the same: the watchdog barking about a stuck process that is network file system related. Also, this match points to an already solved case, which might be used as an inspiration for solving the new problem, even though there wasn't an exact match.

And you might be asking why I am talking for 17 minutes about something that could be explained in just two sentences. Here is a picture of all the error messages from the sample set as word embeddings. The vectors are gathered into groups or clusters, and there is an underlying structure. Unlike the previously mentioned algorithms, which work directly with strings and whose only output is a single number, this approach works with vectors. With the help of NLP we can find analogies between individual messages, but more importantly, such vectors can be fed into more advanced machine learning models that can do more than just searching in the future. They could take a look at new, previously unseen problems and help classify them. With the growing number of automated tests and more complex infrastructures, this would be a huge help to those who are taking care of the infrastructure and also to those who are writing and running the tests. And with the help of large language models like transformers, a machine learning model might one day become a coworker that helps you with customer requests. With time, computers will become less a source of frustration and error messages, and more a valuable partner that will help us be more productive in the future.