Hello everyone. Welcome to my talk, "Normalizing Empire's Traffic to Evade Anomaly-Based IDS." Can everyone hear my voice? Is it okay in the back? Okay, great. Let me introduce myself first. I'm Utku Sen. I usually do research and write tools on the offensive side of security. I'm currently working at Tier Security, which is not live yet but will be soon. You can find detailed information on my website, and you can follow me on Twitter if you are interested. In this presentation I will start by talking about the current state of defense and how we prepare assume-breach scenarios. After that we will look at the advantages and disadvantages of network intrusion detection systems and their evasion techniques. Then we will examine the Empire project and its situation against anomaly-based systems. Finally we will look at the proposed solutions and the firstorder tool, which can be a candidate solution for these problems.

You are seeing a very basic network architecture here. There are client computers located alongside other assets like servers, databases, etc. On top of them there is an intrusion detection system which monitors the whole traffic and tries to identify malicious traffic. We also have a concept named perimeter defense. This concept isolates your internal network and acts like a door which connects you to the outside. But as we can observe from recent breaches, perimeter defenses are not keeping attackers out of an organization's network. A well-crafted spear-phishing email is usually enough to gain a foothold on the network. More and more attackers are using these techniques to bypass perimeter defenses, so gaining a foothold on the internal network is not a big deal anymore. As the attacker profile has changed over time, defense methods are changing too. Besides that, penetration testing concepts are changing as well.
Today, organizations and testers mostly focus on assume-breach approaches. Assume breach simply means accepting that attackers have bypassed your perimeter defenses and gotten a foothold on your network. So red teams and testers focus on post-exploitation instead of bypassing the perimeter defenses. Rather than traditional vulnerability scanning, testing activities become a cat-and-mouse game between attackers and defenders. We can look at network intrusion detection systems (NIDS) as an initial step for detecting attackers on the network. We can talk about two main types of NIDS: signature based and anomaly based. You know the signature-based systems: they keep a stack of signatures of previously known attacks and compare each network packet against those predefined patterns. If they match, then we have malicious traffic. There are very good open-source projects out there, such as Snort and Suricata. The downside of these systems is that since they only catch known attacks, they won't be able to catch new types of attacks. So an attacker can evade those systems. I'd say it's not so complex, but it's not super easy either. All you have to do is change traffic elements such as packet size, headers, and some known strings. Maybe you can also apply some encoding methods. Anomaly-based systems are a bit more sophisticated than the signature-based ones. Anomaly-based NIDS build a statistical model describing the normal traffic and flag the abnormal traffic. To do that, they analyze the normal network traffic and apply various data-science techniques to build a pattern. These kinds of systems exist, but they are usually commercial products, not open-source tools. There are some theoretical concepts described in various research papers, but they are not so practical yet. So an anomaly-based system has a chance to catch new types of attacks. We can show the overall process in a basic chart.
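As a toy illustration of the signature-based approach just described, matching a packet payload against a list of known byte patterns might look like the sketch below. The patterns and their names are made up for illustration; real rulesets like Snort's or Suricata's are far richer.

```python
# Minimal sketch of signature-based detection: each "signature" is just a
# byte pattern known to appear in a specific attack's traffic.
KNOWN_BAD_PATTERNS = {
    b"/etc/passwd": "path traversal attempt",
    b"' OR '1'='1": "SQL injection probe",
}

def match_signatures(payload: bytes):
    """Return the names of all signatures found in the payload."""
    return [name for pattern, name in KNOWN_BAD_PATTERNS.items()
            if pattern in payload]
```

A payload containing none of the known patterns passes untouched, which is exactly why a slightly re-encoded attack slips through a signature-based system.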
An anomaly-based system should record the daily traffic first in order to build a baseline. What do we mean by daily traffic? It's the regular activities of the regular users on the network: for example, visiting a news website, sending an e-mail via SMTP, or accessing a server via SSH, that kind of thing. Then that captured data is given to the learning algorithm, which is trained on it to create a pattern, meaning the normal traffic profile. After that, the system listens to the network and compares each network packet with the normal traffic profile. If a packet doesn't fit the normal traffic profile, it's flagged as anomalous traffic. The evasion part is a bit trickier than with signature-based systems. We can list the evasion methods in two different categories: pre-training evasion and post-training evasion. Training refers to analyzing the normal traffic to build a pattern. As we know, an anomaly-based system needs to capture and analyze regular network traffic in order to build a pattern, and we assume that during traffic capture only legitimate users are generating legitimate network traffic. That way, we expect the anomaly-based system to be able to differentiate malicious traffic from normal traffic. But what if an attacker is located on the network before the training process is done? He can generate malicious traffic, and it will be included in the normal traffic pattern. For example, in this chart one user visits amazon.com, another one connects to some server via SSH, but the other one, the attacker, generates Meterpreter reverse TCP traffic. The anomaly-based system will include this in the normal profile, so in the future, when anyone runs Meterpreter on the network, the anomaly-based system won't catch it. But we can say this scenario is not realistic, right? How can an attacker know when the traffic is going to be trained? It's hard. We can, however, consider malicious insider threats here.
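The train-then-flag loop just described can be sketched as follows. The feature I model here, a (port, protocol) pair per connection, is a simplified placeholder for whatever features a real product actually learns.

```python
from collections import Counter

def train_baseline(connections):
    """Count (port, protocol) pairs seen during the training window."""
    return Counter((c["port"], c["proto"]) for c in connections)

def is_anomalous(baseline, connection, min_seen=1):
    """Flag a connection whose (port, protocol) pair was rarely or never
    seen while the baseline was being built."""
    return baseline[(connection["port"], connection["proto"])] < min_seen

# Note the pre-training evasion scenario from the talk: if attacker traffic
# (say, a reverse shell on TCP/4444) leaks into the training set, it becomes
# part of the baseline and is never flagged afterwards.
```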
In a realistic scenario we expect that the attacker can't know when the traffic training is done. Therefore we need to focus on post-training evasion. In this scenario we will assume that a trained anomaly-based system watches the whole network. We, as the attacker, should gain a foothold on the network and exfiltrate data without causing any anomaly alerts. To achieve that we need a post-exploitation framework. I could have chosen Metasploit for this job, but instead I chose Empire, because I love this project. It's very flexible for me, easy to add new things, and also written in Python, which I personally prefer over Ruby. How many people have used the Empire project before? Okay, quite a lot of people, but there are also some who don't know it. So, Empire is basically a post-exploitation framework like Metasploit. We can describe Empire's workflow in two parts: agent and listener. The agent sits on the infected machine on the network and takes and executes tasks there. The listener is the communication server, the C2 server: the agent connects to it, gets its designated task, and sends the related output back to the C2 server. Empire supports different types of listeners, such as HTTP, HTTPS, Dropbox, and some others. But even though an HTTPS connection encrypts all communication, we will assume there is a solution on the network which intercepts and decrypts TLS communication. Because of that, the HTTP listener will be our main focus. There are different traits of this HTTP listener, as you can see on the slide. We will see how these traits affect our visibility on the network in upcoming slides. The HTTP listener provides encrypted communication even without a TLS connection. This encryption is done in two parts. The client data, which includes things like the command to be executed and its output, is encrypted symmetrically with the AES algorithm.
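The agent/listener split can be mimicked with a simple polling loop. This is an in-memory sketch of the tasking model only, not Empire's real protocol; in Empire, the task and its result both travel inside the encrypted HTTP bodies of the heartbeat requests.

```python
class Listener:
    """Stands in for the C2 server: hands out tasks, collects results."""
    def __init__(self):
        self.tasks = []
        self.results = []

    def get_task(self):
        return self.tasks.pop(0) if self.tasks else None

    def post_result(self, output):
        self.results.append(output)

def agent_beat(listener, execute):
    """One heartbeat: fetch a task, run it, send the output back."""
    task = listener.get_task()
    if task is not None:
        listener.post_result(execute(task))
```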
On the other hand, there is the metadata (routing data) package, which is responsible for routing packages to the right destination. It's encrypted with the RC4 algorithm. We will consider these encrypted payloads not decryptable on the fly, since no publicly known product has man-in-the-middle capability for Empire's communication. So, assuming an agent is deployed on the network, we will check the communication between the agent and the C2 server. After the initial negotiation, the agent connects to the C2 server every n seconds. You can see a generic request and response of a heartbeat connection here. We can list the following traits which can be in the sights of an anomaly-based system. First, the request URI. As shown in the request, the agent makes its connection with a GET request to a specific URI; it's read.php in this example. If only HTML or ASPX pages are in use on the local network, this PHP extension may be flagged by the anomaly detection system, because it's something different from the HTML or ASPX pages. Second, we have the user-agent value. If all users on the network use Microsoft Windows with the Chrome browser, setting the user-agent value to macOS with the Safari browser will probably be flagged by the anomaly detection system, because it's something else, something different; it's not familiar traffic. It's the same with the Server header: if you are using Microsoft IIS servers on the network, setting the Server header to Apache won't be a good choice. The last one is the POST request body. As seen in the example request, the POST request body is encrypted and contains gibberish characters. If all users are browsing regular websites, this will likely be flagged by the anomaly detection system. So we listed the traits which will be considered by an anomaly-based system. Now let's check how we can adjust them. We can gather the traits into two different groups.
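These traits are all just fields of the heartbeat request, which is why they are so easy to tune. A sketch of assembling such a GET request from configurable values; the header names are standard HTTP, while the default values (read.php, the Chrome user agent, the hostname) are illustrative placeholders, not Empire's actual defaults.

```python
def build_heartbeat(uri="/read.php", host="example.com",
                    user_agent="Mozilla/5.0 (Windows NT 10.0) Chrome/91.0"):
    """Assemble a raw heartbeat GET request; every trait an anomaly-based
    system might key on is exposed as a parameter."""
    return (f"GET {uri} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"User-Agent: {user_agent}\r\n"
            f"Connection: close\r\n\r\n")
```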
Traits that can be changed in the listener's options menu, and traits that can be changed by modifying Empire's source code. Our first group consists of the request URI, user-agent value, server header, port, and connection interval. The second group consists of the default HTML content and the POST request body. So let's look at the proposed solution: with which method can we normalize the mentioned traits? The proposed solution is called the polymorphic blending attack. It's a useful technique for evading anomaly-based systems. From a high-level perspective, the idea is creating attack packages which match the normal traffic profile. Of course, to use this technique, the attacker should know what is considered normal, right? The attacker should know the features which are used to train the normal traffic profile. To do that, our model requires a traffic capture of the network taken after the normal training process is done. By analyzing the traffic capture, we can make some deductions about the normal traffic profile. For normalizing the first group of traits, which are the request URI, user agent, server header, and port, we need to analyze the traffic capture and identify the most common values: for example, which user-agent values are preferred most, which ports are used, what kind of server headers are there, etc. After identifying these most common values, we can configure Empire's listener options. Then we can start the agent and C2 communication. We also have the connection interval in the first group of traits. However, finding a normal connection interval is not an easy task. One way to do it is figuring out the connection interval and frequency of the users to specific websites. However, this solution will not be practical, since there can be delays and interruptions during the traffic capture, and this will skew our results. The second solution actually relies on the false positive rates of anomaly detection systems.
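The "identify the most common values" step amounts to frequency counting over the parsed capture. A sketch, assuming the request dicts stand in for whatever your pcap parser produces; the trait keys here are illustrative, not Empire's exact option names.

```python
from collections import Counter

def most_common_traits(requests):
    """Pick the single most frequent value of each trait across the
    capture; the result can seed a listener configuration."""
    traits = {}
    for key in ("uri", "user_agent", "server", "port"):
        counts = Counter(r[key] for r in requests if key in r)
        if counts:
            traits[key] = counts.most_common(1)[0][0]
    return traits
```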
Some research mentions that the false positive rate of an anomaly-based system has a positive correlation with the size of the network. This means that in a large network, even if we keep the connection interval value small and get flagged by the anomaly detection system, this will likely be dismissed as a false positive by analysts, because it happens a lot. However, if we are located in a small network, we need to set the connection interval higher than a predefined threshold. The disadvantage of this method is that C2-agent communication will be delayed, but it's a trade-off to keep our communication out of sight of the anomaly detection system. For the second group of traits, which were explained in the previous section: the default HTML content can be chosen by identifying the most visited website in the traffic capture data. However, normalizing the POST request body of the communication is not achievable by using traffic capture data, of course. As explained in the previous sections, the POST request body is encrypted and contains gibberish characters. So here's the deal. If we encrypt the POST request body, it gives us power over signature-based solutions, because it won't match any malicious signature. However, an anomaly-based system will flag this, since it's not like normal HTTP data. On the other hand, if we don't use encryption there, our problem is bigger. Now the signature-based IDS will catch us, since the body may contain malicious indicators like whoami, cat, etc. And the anomaly-based IDS will catch it too, for the same reason as before: it's not like normal HTTP data. Instead of directly encrypting the data, we can use the Markov Obfuscation tool, which was created by the Cylance SPEAR team. Let's check the overall process on the slide. First, we need training data, which consists of lots of text. The Cylance team used an English book for their demonstration. After that, the Markov encoding algorithm is trained on this book to build a model.
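The interval rule of thumb above (small networks need a longer beacon interval, plus some jitter so the heartbeat isn't perfectly periodic) can be written down directly. The threshold and interval numbers here are invented for illustration; a real operator would derive them from the capture.

```python
import random

def pick_interval(host_count, small_net_threshold=50,
                  small_net_interval=300, large_net_interval=5,
                  jitter=0.2, rng=random):
    """Base interval depends on network size; jitter blurs the periodicity
    so the beacon doesn't stand out as a fixed-rate signal."""
    base = small_net_interval if host_count < small_net_threshold \
        else large_net_interval
    return base * (1 + rng.uniform(-jitter, jitter))
```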
Once the model is built, it's ready to encode any string into fairly meaningful English text. For example, this is the data of an /etc/passwd file. Normally, every IDS solution will flag this. However, when we use an English book to generate a model and encode that string with the Markov Obfuscation tool, we get this result. It looks like meaningful English text, like a blog post. And when it's decoded using the same model, we get back the initial content of the /etc/passwd file. You can check the Cylance SPEAR team's GitHub page for more information about this tool. So the operational steps will be like this. First, we need to encode the encrypted content with Base64 to get rid of gibberish characters. As we said, we also need lots of text which will be used as training data for the Markov obfuscation. One way to do it is including the training data inside the agent; however, that won't be practical, since it will increase the agent size a lot. Instead, we can download the data set from an external source. It doesn't have to be our own website; we can program the agent to crawl and parse text from news websites, blogs, etc. After that, we are ready to encode our data, and we will send the encoded data along with the training data to the C2 server. Since the anomaly-based system will see only English text, it will think it's some kind of blog post or a forum comment, and probably it won't raise any alerts. But there are some drawbacks to this method. The first one is that the training phase will consume time and resources on the compromised computer. Time may not be a very big problem, but if there's a centralized resource-monitoring solution on the network, the blue team may identify these resource usage spikes. The other drawback is that you need to implement the Markov obfuscation code inside the agent, because the agent should do the encoding and decoding on its own; we can't just send commands and expect outputs otherwise. This also increases the agent file size.
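As a much-simplified, invertible sketch of the idea: map each byte value to an English word drawn from the training text, so the wire payload reads as prose. The real Markov Obfuscation tool instead walks a trained word-transition model, which produces far more natural-sounding sentences; this codebook version only illustrates why the same training data must be available on both ends.

```python
def build_codebook(training_text):
    """Take 256 distinct words from the training text, one per byte value."""
    words = []
    for w in training_text.split():
        if w not in words:
            words.append(w)
        if len(words) == 256:
            break
    if len(words) < 256:
        raise ValueError("training text needs at least 256 distinct words")
    return words

def encode(data: bytes, codebook):
    """Replace each byte with its codebook word; output looks like text."""
    return " ".join(codebook[b] for b in data)

def decode(text: str, codebook):
    """Invert the mapping with the same codebook to recover the bytes."""
    index = {w: i for i, w in enumerate(codebook)}
    return bytes(index[w] for w in text.split())
```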
So we created a tool called firstorder, which is written in Python. Given a pcap file, firstorder can extract key features of the network, such as the most used ports, most used user agents, server headers, how many different machines are located on the network, et cetera. It can extract all the traits which we discussed in the previous sections. According to this identified data, it automatically configures Empire's listener. Let's see a small demo video of it in action. Sorry, I can't pause the video, so I need to explain in real time. firstorder takes three arguments: one of them is the capture data, the pcap file, and the others are Empire's username and password. It then analyzes the packets, provides a summary of the capture file and its most used elements, and automatically generates an Empire listener. Since this Empire listener is fed by the most common elements of the network, it will probably bypass the anomaly detection system located on the network, because we are using the normal traffic profile of the network to build the agent and C2 communication. But if you use the default configuration of Empire, the anomaly-based system will probably catch it. The tool is available at this address. Feel free to use it, submit issues if you find any bugs, and of course send some stars. As a result, we can say that defense mechanisms are evolving into something smarter, something better. Maybe in the future, signature-based approaches will be totally abandoned, and there is a high probability that machine-learning-based defense mechanisms will be cheaper and more widespread than today. So attackers should also evolve in this way. We need to find smarter ways to mislead artificial intelligence. Thank you for listening. That's all I've got.