Hello everyone, my name is Arseniy Ryutov, this is Fedor Sakharov, and today we'll be speaking about detecting web attacks with recurrent neural networks. Before we start, a little bit about ourselves. I'm an application security researcher at Positive Technologies, which is an application security company, and Fedor is a software developer at SONM, a decentralized computing platform. Our presentation will be divided into three parts. We will start with the problem, the challenges of web attack detection, then we will move to the actual solution, anomaly detection for HTTP requests using deep learning, and we will finish with our results, a demo, and our plans. So, the first part: the problem. We are solving the problem of web attack detection. What are web attacks? The systems that aim at detecting web attacks are called web application firewalls. These are systems that protect websites and web applications against attacks at the highest level of the OSI model, layer 7. The first commercial WAFs appeared about 20 years ago, and the best-known open-source WAF is ModSecurity. Typically a WAF operates as a reverse proxy: there is an intermediate server that processes the web traffic and then proxies it to the backend. Most WAFs still use pattern matching to detect web attacks today. From a WAF's perspective, there are basically two types of web attacks. The first type is time-series based, meaning the attacker makes multiple requests to perform the attack. This may be web scraping, brute forcing, for example of a login page, fingerprinting to detect the version of your web server, or scanning for vulnerabilities. The second type of attack is based on a single request, a single pair of HTTP request and HTTP response. These attacks can be detected on a per-request basis.
These are, for example, SQL injection, cross-site scripting, XML external entity injection, and all the other attacks that are basically some kind of injection into an HTTP parameter. The focus of our research is the second type of attack. So we do not analyze request sequences at present; we analyze a single request and try to detect patterns of web attack injections in it. Now I would like to compare classic pattern matching to the machine learning approach. Pattern matching is effective at detecting known attack vectors, and it is easily maintainable. It's just a text file: you can open it in any text editor, add or delete rules, and maintain it via version control such as git. It can be pretty fast, the results are always predictable, and rules and patterns can work out of the box, so you mostly don't need to tune them for particular websites; they work for the majority of websites, though of course false positives can happen. There are also disadvantages of pattern matching. Patterns are subject to attacks themselves: if you write a pattern that is not well thought out and contains a vulnerability itself, it can be subject to attacks like ReDoS, so you can perform a denial of service against a buggy pattern. Patterns are typically easily bypassed by hackers: if a WAF uses patterns to block, for example, SQL injections, it can often be bypassed. And of course it's not so effective at catching zero-days, because pattern matching cannot extrapolate to unknown vectors. To write a pattern, to write some rule, you of course need to understand what you are protecting from, so you need extensive web security knowledge. And pattern matching causes lots of false positives. So let's see what benefits and disadvantages machine learning presents.
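To make the bypass point above concrete, here is a tiny illustration of how a naive pattern matching rule can be evaded. The regex and payloads are deliberately simplistic examples written for this demonstration, not real WAF rules.

```python
import re

# A deliberately naive WAF-style rule looking for "union select"
RULE = re.compile(r"union\s+select", re.IGNORECASE)

plain = "1' UNION SELECT password FROM users--"
evaded = "1' UNION/**/SELECT password FROM users--"  # inline SQL comment instead of a space

blocked_plain = RULE.search(plain) is not None    # the rule catches the plain payload
blocked_evaded = RULE.search(evaded) is not None  # the comment trick slips through
```

The evaded payload is still valid SQL to many database engines, which is one reason pattern lists tend to grow endlessly as new evasion tricks appear.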
The biggest advantage, of course, is that machine learning can extrapolate: it can detect previously unseen samples, and it's usually not so easy to bypass. It's also pretty fast, if we are speaking about just the forward pass, not training. And it doesn't require web security knowledge, especially if we speak about deep learning, where you don't need to extract features prior to training. The disadvantages of machine learning are that it requires some time, of course, to be trained, and the results are difficult to interpret. It's just a decision, zero or one: it tells you that there is something uncommon in this particular request, but it cannot tell you what is uncommon, what kind of vulnerability or attack is in this particular request. You also cannot predict the behavior of a trained model the way you can with rules and patterns. And models are not so easy to maintain: you cannot just open one in a text editor and modify it, you always need to retrain it. So those were the pros and cons. Now I would like to state the goals of our research. There are three of them. The first is that we would like to create a deep learning model that does not require feature extraction prior to training. The second is that this model should solve the anomaly detection problem for HTTP requests. And finally, this model should yield interpretable results. What is an anomaly, if we are speaking about HTTP requests? In fact, it can be anything. It can be a request that has, for example, three HTTP headers instead of the normal 20. It can be spam, or it can even be a zero-day attack. The model should understand the intention of the HTTP request, whether it is malicious or not, like in the classic movie review sentiment problem. And of course, the malicious/benign classification greatly depends on the samples and the history of previous observations.
There are three examples that I want to share with you, just to show how important it is to be able to take into account the context and the history of previous observations. If you're a web security enthusiast, you would of course notice that this request contains a string that resembles an SQL injection attempt. But without context, we cannot say whether it is a real SQL injection or not. In fact, this request comes from a bug tracking system known as Jira, which uses a special query syntax called JQL. A pattern matching approach would detect this request as an attack, because obviously there are some keywords that are a sign of an SQL injection. But in fact this request is benign, and it shouldn't be classified as an attack. Let's see the next example. In this request, you can see a POST parameter that contains some HTML markup. Just looking at this example, we cannot say whether this request is actually an attack. Web security hackers would definitely try to inject, for example, script tags there to see if there is an XSS vulnerability. Again, a pattern matching approach would definitely block this request because there are some HTML tags. But in fact this request is also benign, and this markup is allowed for this particular website. Only knowing the previous states, the previous observations, can the model tell that this HTTP request is actually benign. And the third example: as you can see in the Host header, this request comes from a content management system called Joomla. Looking at the task parameter, you can see that this is probably a typical user registration. But in fact there is an additional parameter, user groups, which is equal to seven. This is actually an exploit for Joomla which escalates privileges and allows registering anyone as an administrator on any Joomla prior to 3.6.4.
And again, this is a contrary example, where a pattern matching approach would just let this request pass, because it doesn't contain any injections. It just has an additional parameter with the value seven, so it typically would not be blocked. But a properly trained model, like ours, would detect this request as an attack. So the next part is the actual solution: how our model works and how it was built. All right. Please raise your hands, those of you who know what a neural network is. All right. Now those of you who know what a convolutional neural network is. Right. And a recurrent neural network. Okay. LSTM. Good. So I guess I'll have to spend a little more time on this. We decided to build these machine learning algorithms as a proof of concept inside our web application firewall. First of all, we tried to build a classifier. What we tried to do is collect some benign data, and obviously we didn't have any malicious data for our example web application, so we had to generate some malicious examples. It would look something like this: there would be requests labeled malicious and requests labeled benign. So let's try to build a classifier. Well, what is the HTTP protocol? It is a text-based protocol, and each line of it is an independent sentence. It consists of headers and a URI, which are not that long, and some body, which may be encoded somehow. It's text, and it's sequential in nature. For example, the values of the parameters depend on the names of the parameters: it would be a weird thing to see an IP address in the Connection header, or something like that. So we decided to use recurrent neural networks for analyzing this text data. They deal pretty well with text, and they are a class of neural networks that can work with sequential data, data that varies in size and comes as sequences, like normal text or music or movies.
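As a rough sketch of treating requests as sequential text, here is how a character-level vocabulary can turn raw request strings into numeric sequences, plus the batch padding mentioned later in the talk. The function names and the reserved PAD/UNK indices are illustrative assumptions, not our actual code.

```python
# Character-level preprocessing sketch: build a vocabulary from training
# requests, encode each request as integer indices, and pad a batch to
# the length of its longest sequence.
PAD, UNK = 0, 1  # reserved indices for padding and unknown characters

def build_vocab(requests):
    chars = sorted({c for r in requests for c in r})
    return {c: i + 2 for i, c in enumerate(chars)}  # 0 and 1 are reserved

def encode(request, vocab):
    return [vocab.get(c, UNK) for c in request]

def pad_batch(batch):
    longest = max(len(seq) for seq in batch)
    return [seq + [PAD] * (longest - len(seq)) for seq in batch]

train = ["GET /index.php?id=1 HTTP/1.1", "GET /login HTTP/1.1"]
vocab = build_vocab(train)
batch = pad_batch([encode(r, vocab) for r in train])
```

The padded, integer-encoded batch is what a recurrent network would then consume through an embedding layer.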
First, we tried to use simple recurrent neural networks, built a classifier on top of them, and tried to evaluate our results, which were somewhat good. However, there are known problems. The results of classification are not interpretable: you just get a label, and you don't know why the model considers something malicious or benign. So your user, who is probably not a security expert, has no way of understanding the decisions of your model. Also, you need to construct the malicious data, which is tricky. You can just take your benign samples, inject some known attack sequences into them, and call that your malicious data set, but that is kind of weird, because most likely real attacks on your web application will not look like the data set you generated. It also needs manual labeling. For example, you want to detect SQL injections, XSS attacks, benign data, whatever else you want to detect. These become your classes, but you have to label it all. And there is a further problem: if you encounter a new type of attack, it's not clear which class it will belong to. Okay. So what we tried to do is improve our classifier, and the first thing we decided to add was an attention layer. The attention mechanism is something that solves a lot of problems. First of all, it aids the learning process, and it also makes the results of your decision process interpretable. You can use it to highlight the parts of the data that your model considered most important for its decision and leave unhighlighted what it considered unimportant. This improves our situation, but it still doesn't solve the problems of classification. But what if we tried to detect anomalies instead of trying to classify our data into SQL injections, XSS attacks, and so on and so forth? The initial task of attack detection is more similar to that, and if you think about it, it's exactly what a human brain would do.
Well, if you tried to detect some anomalous attack on your web application yourself, you would probably first notice something weird about the request, and only then would you understand that it's, for example, an SQL injection. The advantage is that if we built a reasonable anomaly detection platform, we would no longer have to manually label the data or generate malicious samples. Well, there's a class of recurrent neural networks that are used mostly for machine translation, or for music generation, and so on. These are the so-called encoder-decoder models. They are basically two different recurrent networks with LSTM cells, connected with each other in the following fashion: the encoder processes the input and outputs some state, which is a fixed-size vector. That state is fed into the decoder, along with the target sequence, and the decoder outputs the target. For machine translation, obviously, the input sequences are sentences, say in English, and the target sequence is the sentence in French, so the encoder and decoder together translate your English sentence to French. It works nicely; it's used in Google Translate and a lot of applications nowadays. But what if our target sequence were the same as the input sequence? So what if we build a model where the targets are the same as the inputs? Not really the same, but the probabilities of the inputs, which during the learning phase would be one-hot vectors. For example, if the first letters of our request are G, E, T, we also feed them to the decoder, and the decoder outputs the probabilities of the letters: the first output is the probability of the letter G, the second of the letter E, and so on and so forth. So what we're trying to do is teach our encoder-decoder model to reconstruct the data with some probabilities.
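The reconstruction idea above can be sketched without any deep learning framework: training targets are one-hot vectors over the character vocabulary, and each character is scored by the probability the decoder assigned to it. The probabilities below are made-up stand-ins for real softmax outputs, and the toy vocabulary is an illustrative assumption.

```python
import math

def one_hot_targets(ids, vocab_size):
    # Training targets: each input character as a one-hot vector
    targets = []
    for i in ids:
        row = [0.0] * vocab_size
        row[i] = 1.0
        targets.append(row)
    return targets

def reconstruction_loss(true_char_probs):
    # Summed negative log-likelihood of the true characters
    return -sum(math.log(p) for p in true_char_probs)

# Toy vocabulary {G: 0, E: 1, T: 2}; the request is "GET"
targets = one_hot_targets([0, 1, 2], 3)

# Stand-in decoder outputs: probability assigned to each true character
normal_probs = [0.9, 0.8, 0.85]
weird_probs = [0.9, 0.01, 0.02]  # low probabilities make a request look anomalous
```

A request whose characters the model reconstructs with low probability accumulates a high loss, which is exactly the anomaly signal described next.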
Now the outputs of the model are the probabilities of each letter in the sequence, and we also collect the total loss of this reconstruction process. If the model failed, so to speak, to reconstruct our request, then we consider that request anomalous. And it turns out that the probabilities of the anomalous characters in a request that would be considered anomalous are quite low. Here's a bit about how the input data is actually processed. Earlier today there was a great talk by a gentleman about reverse engineering and recurrent neural networks, and he explained a bit about embeddings. We also use embeddings, but not word embeddings: we use letter embeddings, and we use a vocabulary we created to transform our sequences into numeric data. Also, the data inside the batches is padded to the maximum length in the batch. It improves the learning process, I don't know why. So in the end we built a model which detects anomalies, which can be visualized in the following manner. Here we have an SQL injection, and you can see that the detection is a bit noisy, but the letters which are considered anomalous are highlighted in red. It seems that, apart from some small noise in parts which aren't actually anomalous, like the first letters in the parameters and some one-letter anomalies in the PHP session ID cookie, we have detected the SQL injection quite well. Okay, now I think it's time for our demo. First, we generate some normal traffic to our test backend. It's actually Burp Suite, nothing special. The parameters of the traffic will be set now. Yeah, we use Burp Intruder to generate some traffic to get the model trained. So we have sent some requests, we assume that the model is ready, and now we will try to send some attacks to the target web application.
It's a demo application that we built for testing our different machine learning models. First we try to submit an SQL injection, and it is refused. Then we log in as a regular user. Now we try to perform a payment: we do a money transaction, and in the comment field you can see part of an XSS attack, just a sample. This is the interface of our enterprise WAF, and you can see how these anomalous requests are logged into an actual WAF. Now here's an interesting part. We are trying to do a, I don't know how it's called, split SQL injection. Yeah, a fragmented SQL injection. As you see, it is split across two parameters. Traditional pattern matching would typically analyze each parameter one by one, so it would miss this attack, because it doesn't see the whole payload. But the machine learning approach, on the contrary, detects it, because it analyzes the whole HTTP request instead of the particular parameters inside it. And again, you can see the detection being logged to the web interface of our WAF. And it's quite interesting, because we addressed this problem with these kinds of SQL injections separately: we even built a separate tool which tried to detect sophisticated SQL injections by parsing SQL grammar and so on. And it turned out that the generalized neural network can detect these kinds of sophisticated attacks just as well as the tool we built specifically for detecting these types of injections. All right. So it looks like we have created a deep learning model that does not require prior feature extraction. It can just look at the live data flowing to the web applications, for example your clients' or your own web applications, and use it to learn what normal requests to your apps look like. It detects anomalies in future traffic after the learning phase, and, which I think is the most important result of our work, it yields interpretable results.
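Here is a tiny sketch of why per-parameter matching misses a fragmented injection like the one in the demo: neither fragment matches the rule on its own, but the combined payload, as the backend would assemble it, does. The rule and the fragments are illustrative, not taken from any real ruleset.

```python
import re

# A naive per-parameter rule, as a traditional WAF might apply it
RULE = re.compile(r"union\s+select", re.IGNORECASE)

# Fragmented SQL injection: the payload is split across two parameters
fragments = ["1' UNION ", "SELECT password FROM users--"]

# Per-parameter analysis sees no attack in either value alone
per_param_hit = any(RULE.search(v) for v in fragments)

# A detector that sees the whole request (or the payload as the backend
# concatenates it) does catch the attack
combined_hit = RULE.search("".join(fragments)) is not None
```

An approach that scores the entire request as one sequence, like the autoencoder here, sides with the second check by construction.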
So your client, or you yourself, will know why the model considered something anomalous or not. We have open sourced this work as a Jupyter notebook; you can find it at the GitHub link provided below. Please do run it. There's some data from a bank application included. Please run it, try to train the model, and try to verify our results. That would be very helpful, and I hope some of you will come up with new ideas on how to improve it. There are some weird things in the code which you will probably notice. Feel free to ping us about them whenever you want, for example about how the thresholds for anomalies are chosen: it's in the code, but it looks kind of like magic right now. As for future work, well, there's a lot to be improved. First of all, we need to optimize the learning time, because right now it takes about five hours on a high-end GPU for the data set that we have open sourced. It would also be a great idea to build a classifier on top of the anomaly detection: if we have detected some anomalous sequence, that is a good point to try to classify it, whether it is an SQL injection or an XSS, because by then we have stripped all the normal data from the request. And we need to improve the threshold calculation. Right now the threshold is calculated in a somewhat sophisticated way: we have a hypothesis that the losses follow a normal distribution, and for our training data set we calculate the parameters of this distribution, which we later use in the threshold calculation for anomaly detection. That hypothesis has to be proven in a more scientific manner, because to someone it may look more like an exponential distribution. So that has to be improved a lot. All right, thank you.
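The threshold idea described above can be sketched like this: fit the mean and standard deviation of reconstruction losses on benign training traffic, then flag requests whose loss lies more than k standard deviations above the mean. The k = 3 and the sample losses below are illustrative choices, not the notebook's actual values.

```python
import math

def fit_threshold(losses, k=3.0):
    # Assume losses on benign traffic are roughly normally distributed;
    # the threshold is mean + k standard deviations
    n = len(losses)
    mean = sum(losses) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in losses) / n)
    return mean + k * std

def is_anomalous(loss, threshold):
    return loss > threshold

# Illustrative reconstruction losses measured on benign requests
benign_losses = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05, 0.95]
threshold = fit_threshold(benign_losses)
```

If the losses turn out to follow, say, an exponential rather than a normal distribution, as the talk concedes is possible, the same interface works but the fitted parameters and the cutoff formula would change.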