Hi, Meysam here, and I'm very excited to be part of this fantastic event this year. I would like to thank the Blue Team Village and DEF CON teams for this great opportunity, and a big thanks to all of you for being here; it's highly appreciated. In this presentation, I'm going to talk about one of the most challenging issues in cybersecurity, one that may affect almost every type of engagement. I call it Scope X, and I'm going to define it. But before that, let me give a quick introduction about myself.

Again, my name is Meysam. I'm a cybersecurity practitioner, consultant, researcher, and blogger. Please take some time and check my blog on Medium; every bit of feedback helps me improve the series. At the moment, I'm working as a senior director at EGS, EC-Council Global Services, which is the consulting division of EC-Council. I strongly believe in collective knowledge, so let's connect. You can reach me on LinkedIn, Twitter, or whichever works better for you. Together we share, and together we learn.

Here is the agenda for today's presentation. But first, a very typical disclaimer: these are all my own words and opinions, certainly not the views of my employer. I will start with the definition of Scope X: what it is and why it's such a big challenge. I will talk a little bit about subdomains, their issues and challenges, and why I chose subdomains as a perfect example for this scope. Section 3 covers the process flow for discovering the unknowns, followed by structured threat hunting. And I will end with hypotheses: how to convert findings into proper hypotheses for successful threat hunting.

So, Scope X: what exactly don't we know? Almost every cybersecurity engagement begins with a pre-engagement questionnaire and clarifications: how many web applications are being assessed, how many IPs will be in scope. There is nothing wrong with that; scoping is essential for every service line. But it suffers from a very fundamental issue: it's all about what we know and what we think needs to be tested. There could be a lot of other assets out there in our environment that we simply exclude from the scope for one reason or another, and we define the scope based on our own point of view, which is not necessarily how a hacker thinks. Hackers will attack any part of our organization. That's why we can define one of the scariest parts of our environment as Scope X: a group of entirely unknown assets and items about which we have zero knowledge and zero visibility, and which mostly act as a perfect attack vector for cybercriminals. This is effectively the scope of work for the cybercriminals.

Normally, when we talk about scopes of work, we focus on the two familiar categories of what we know: whether assets are in scope or out of scope, and whether we know the risk, hiring a third party or conducting in-house security tests to find the threats. But hackers are much more interested in what I call Scope X, where the assets are unknown and so, accordingly, are the risks. This is a big challenge for almost every security service line, not only offensive but defensive as well. If you are part of an IR team, I bet you have experienced it several times in your career: an alert is generated, we know something happened to an asset, let's say a server, we own that server, but we have no idea where it is located.
It's a common issue in large organizations with poor asset management. However, Scope X does not belong only to large organizations; it affects companies of any size. And to be honest, this is not something completely new, but in my opinion it has not been paid enough attention. Around 2016, Gartner predicted that by 2020 most attacks would be conducted against shadow IT. That prediction came true: Bugcrowd's 2021 report showed statistics that proved the hypothesis. But I'm glad to see that Bugcrowd did not use the term shadow IT, because Scope X and the unknowns are something more than just BYOD. Shadow IT is, technically, every unregistered, unattended, unauthorized, and unknown thing within our people, processes, and technology.

Subdomains are a very interesting example of Scope X. Why do I say so? There's a very common issue with subdomains: we want to test something, an external service, so we create a subdomain, run a service, and when the test is done, we forget about it. It remains active there, easily used by attackers as an entry point for walking into or breaking into our network. There can be hundreds or thousands of subdomains under a main domain, and they do not necessarily inherit the security controls of the parent. They are not well maintained, and obviously they are not under monitoring, because we may not even be aware of them. As a red team, we normally use them as attack surface; I personally always suggest my clients consider subdomains as part of the red team scope when they engage us. As a blue team, however, we should proactively look for them: actively discover the subdomains, and any other unknown things, treat them as a risk, and address them.

When we talk about subdomains, I call them the vulnerability lucky draw, because there is a wide range of attacks against subdomains. Yet when we search Google and other sources, we mainly find talk about subdomain takeover only. No doubt this is a very big issue; even big companies like Microsoft have been affected. But I did some research on HackerOne recently, and only around 30% of the findings were takeovers; more than 60% were other types of attacks. That's why it's not easy to just jump into subdomain analysis and Scope X threat hunting without a proper process.

So, technically, threat hunting is supposed to proactively look for anything malicious in the past and present. In an ideal world, it watches the areas not covered by our monitoring systems, to look for the attacks that were not detected. But there are a few success factors for that. Total visibility, for sure: we should be aware of everything that exists in our environment. Situational awareness: knowing the good, the bad, and any deviation that makes something unknown, so we can tag it for further analysis. We should have advanced knowledge of the attacks and techniques used by the attackers. And of course, the data should be available, and it should have a certain level of quality to be analyzed. This is something we rarely see for subdomains.

So I have a question for all of you: do you know how many subdomains exist for your organization's domain? That's my point. We never check them, but hackers may use them, may take them over, may hack into our registrar system, create a subdomain under our name, and use it for malicious purposes. For that reason, we cannot just blindly jump into Scope X and search for something we have zero knowledge about.
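If you want a first-cut answer to that question, certificate transparency logs are one passive source you can query in a few lines. Here is a minimal Python sketch, assuming the public crt.sh JSON endpoint and using example.com as a placeholder; serious enumeration should of course combine several sources and tools.

    # Hedged sketch: passive subdomain enumeration from certificate
    # transparency logs via crt.sh. "example.com" is a placeholder;
    # the %25 in the URL is a URL-encoded '%' wildcard.
    import json
    import urllib.request

    def enumerate_subdomains(domain):
        url = "https://crt.sh/?q=%25." + domain + "&output=json"
        with urllib.request.urlopen(url, timeout=30) as resp:
            entries = json.load(resp)
        subdomains = set()
        for entry in entries:
            # name_value may contain several names separated by newlines
            for name in entry.get("name_value", "").splitlines():
                name = name.strip().lstrip("*.").lower()
                if name.endswith(domain):
                    subdomains.add(name)
        return subdomains

    found = enumerate_subdomains("example.com")
    print(len(found), "unique subdomains found in CT logs")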
Without that discovery step, we're exactly as confused as Alice, just in a cyber-digital Wonderland. So here is a simple process for discovering the unknowns in Scope X. We start with data collection, which can be passive and active; for subdomains, for example, we can do proactive subdomain enumeration to come up with a long list of subdomains. All the data must then be validated to ensure accuracy and quality; for example, I check whether each subdomain is resolvable or not, whether it's up or down. Every piece of collected data should be validated and classified: we can group them, filter them, and reduce the unwanted data before we start the analysis. And of course, we end up with a report and findings.

Maybe you think this is something you already know, and of course it is not new. But I merged the second part with the first part, which is very common in normal penetration testing: the moment we discover an unknown in the data collection phase, we can feed the other service lines with a new scope. Let's say these are the domains and IPs in our organization; we do the proactive discovery and find new items. Before we do the analysis, before any further action, we can update the other teams with the new findings, so the compromise assessment team, the penetration testing team, and the red team can each have a new scope of work.

But the main point here is this: once we have a finding, meaning we have collected the data, validated it, classified it, and analyzed it, that is the time we can conduct structured threat hunting by creating a hypothesis. That hypothesis should be testable. We should identify every source of data that can help us prove or disprove it, and all the tools and techniques we need, and then we test the hypothesis. The result could affirm our hypothesis: we thought something bad happened, and it is exactly there; that could produce an action list, like triggering the IR process. Or we could simply reject it: we had a hypothesis, we thought our server was infected, we went and checked, everything is fine; peace of mind. There is a third option, which is non-conclusive: we made an initial hypothesis and analyzed, but we couldn't find anything, either because our hypothesis was not mature enough or because we didn't use the proper toolset.

So let's start with data collection, using subdomains as the example. There are a lot of tools and techniques to enumerate subdomains. I'm not going to cover them, because there are plenty of interesting talks out there; even in previous DEF CONs you can find amazing techniques presented by awesome speakers. Just one suggestion: use different techniques, because the objective is to collect as much data as possible. I'm not going to shortlist any tools here, but the three tools I introduce are my top favorites, simply because they work for me, especially Assetfinder, which is quite fast.

I would like to say more about the potential observations we could make at this stage as threat hunters. When we collect the data, when we enumerate the subdomains, we may see a lot of unknowns: a domain is there, but we have no idea what it is. It could have an unusual name, unusual length, or unusual pattern, or there could be an odd number of subdomains: a small company with a hundred subdomains, when the business never expanded to that scale and we have no idea where they came from. That could itself be an indicator of an attack.
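As a minimal illustration of the validation step in that process flow, here is a hedged Python sketch, assuming the enumerated names sit in a plain-text file, one per line; the file name is a placeholder.

    # A minimal validation pass: keep what resolves, park the rest
    # on a watchlist rather than discarding it.
    import socket

    def validate(subdomains):
        live, watchlist = {}, []
        for sub in subdomains:
            try:
                live[sub] = socket.gethostbyname(sub)  # resolvable: keep the IP
            except socket.gaierror:
                watchlist.append(sub)  # down today, but may come back up
        return live, watchlist

    with open("subdomains.txt") as f:  # placeholder input file
        subs = [line.strip() for line in f if line.strip()]
    live, watchlist = validate(subs)
    print(len(live), "resolvable,", len(watchlist), "parked on the watchlist")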
So once we collect the data, we should make sure to validate it, because good data, high-quality data with accuracy, reliability, and completeness, gives us better vision and higher quality in the analysis. For validation in this presentation, for example, I check which subdomains can be resolved and which are down. I simply filter out whatever is down and keep everything that resolves in my dataset for further analysis. As you can see, for this case study I had around 6,600 subdomains, which I reduced to 4,025, roughly a 40 percent data reduction. But the important thing here is that we should not simply filter the rest out and forget about them, because whatever we don't know could come up at any time in the future. That's why we should keep that data on a watchlist and keep our eyes on it for further analysis.

There are a lot of prefixes for subdomains, based on their nature, their service, or their stage, like development, testing, or acceptance. But there are a few interesting ones I always go for, such as admin, login, and API. In this example, I was searching for any subdomain that contains "api" and is in the staging state. Once I do the first classification, I go for another round; here, I check ports 80 and 443, and I also check the HTTP response code of every subdomain that is up, live, and operating over HTTP. This is not only about having better data; it's about having situational awareness of what is normal in our organization. When we discover things, we study them, understand them, and record all the normals, and if any deviation happens, any sudden change, any sudden difference from what we are used to observing, we look for the justification. Is there an operational justification, or is this something we are not aware of that should be tagged as potentially malicious?

For example, if we keep tracking the number of subdomains that resolve, and it normally moves up or down by one or two per month, but suddenly we see a significant drop in our active subdomains, that could be an indicator of, say, an attack on our DNS server, such as a DDoS. Even the status codes can be checked: one subdomain changing its status code once in a while is pretty normal, or could just reflect different stages. But if we observe a subdomain that is constantly changing status, it could be some sort of C2 communication between an infected server of ours and a clever, smart hacker.

So technically, there are a lot of things to analyze, from technology to platforms to services to applications, because there could be anything interesting behind every subdomain. For HTTP, for example, we could review the source code, directories, external and internal links, and login pages, which may lead us to vulnerabilities, misconfigurations, data leaks, and so on. The main question is, with so many techniques, what should we do? Should we go for manual observation, use automated tools, or use advanced analytics techniques? I don't want to debate human versus machine here, because at the moment threat hunting is still a human-driven activity; we need the human brain to understand, to sense, to find all the strange stuff. As Charles put it so well, we should automate what we know, and spend our time finding what we don't know.
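Here is a hypothetical sketch of that classification round in Python: tag interesting prefixes and record each live host's HTTP status code as a baseline to diff against later. The subdomain list is a placeholder.

    # Tag interesting name fragments and record HTTP status as a baseline.
    import urllib.request, urllib.error

    INTERESTING = ("admin", "login", "api", "staging", "dev", "test")

    def http_status(subdomain):
        try:
            with urllib.request.urlopen("http://" + subdomain, timeout=5) as r:
                return r.status
        except urllib.error.HTTPError as e:
            return e.code      # 4xx/5xx still means the host is alive
        except (urllib.error.URLError, OSError):
            return None        # nothing answering on port 80

    baseline = {}
    for sub in ("api-staging.example.com", "www.example.com"):  # placeholders
        tags = [p for p in INTERESTING if p in sub]
        baseline[sub] = {"tags": tags, "status": http_status(sub)}
    print(baseline)

Re-run something like this on a schedule and diff against the stored baseline: a subdomain that keeps flipping status codes, or a sudden drop in resolvable hosts, is exactly the kind of deviation worth justifying.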
So manual observation is the first step in the data analysis, but of course, navigating the subdomains one by one is not an option. As a common practice, we can use automated tools that take bulk screenshots. There are a lot out there, but in this example I'm using EyeWitness to take the screenshots and also download each page's source code for review later. The screenshots are fun to review: we may find a lot of error messages that could expose sensitive information, login pages, and other juicy information. For example, here I show you a real example of an Apache Tomcat instance from a real engagement; time to say hi to Exploit-DB.

On the other side, we can check the downloaded page source code, looking for default credentials, login pages, or JavaScript. One interesting case from one of my engagements: with a simple directory and file traversal, we found a zip file that contained the entire backup of the website. A simple grep on the configuration files for "pass" or "password" led us to a username and password. Even if that username and password are no longer in use, the pattern can still feed our dictionary attacks later. I'll show a small sketch of this kind of sweep in a moment.

So there is a lot to discover, and that is the challenge. We have to know three main things. First, what to discover normally: we find that through technical checklists and guidelines, like everything covered by OWASP, or the CIS Controls for misconfigurations. Second, we should get our hands dirty with knowledge bases like MITRE to learn the recent techniques and technologies used by attackers. And third, the most important thing in threat hunting, cyber threat intelligence: when a new attack is discovered or a new pattern is observed, we should adopt it and use it to look for similar things in our environment. From all this huge amount of information, we pick up what relates to our threat hunting objectives. For subdomains, for example, I could choose application entry points from OWASP, application layer protocols for command and control from MITRE, and look for any subdomain, port 80, HTTP, and web-based traces in publicly available cyber threat intelligence databases.

But to be honest, it's too much work. We need a lot of knowledge; we need two big monitors, popcorn, coffee, and nice eyeglasses like mine. Manual observation is still an integral part of it, and personally I enjoy doing things manually. But don't take me wrong: we should do that mostly during training and capability development, and we are not supposed to avoid tools and waste our time and energy when we are on the real battlefield. We are all human; we need the tools.
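Before moving on to the tooling, here is the promised sketch of that source-code sweep: walk the pages that EyeWitness (or any crawler) saved locally and flag lines that look like credentials or configuration leftovers. The directory and the regex are examples, not a definitive rule set.

    # Sweep locally saved page sources for credential-looking lines.
    import re
    from pathlib import Path

    PATTERN = re.compile(r"(pass(word)?|pwd|secret|api[_-]?key)\s*[:=]", re.I)

    def sweep(root):
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for lineno, line in enumerate(text.splitlines(), 1):
                if PATTERN.search(line):
                    print(path, lineno, line.strip()[:120])

    sweep("eyewitness_output/source")  # hypothetical output directory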
There are a lot of tools out there that help us automate the process and make it faster at a larger scale. Again, I'm not shortlisting any of these tools, but I personally have a lot of fun with SpiderFoot, which is an OSINT automation tool: it gets feeds from several data sources, classifies them, and makes it easy for us to navigate. What I really love most about SpiderFoot is the ease of installation; if you have worked enough with open source tools, you know the pain. Then there is the famous Amass from OWASP, a tool for network mapping and attack surface external asset discovery. ProjectDiscovery is interesting because it is not one tool but a whole toolkit we can use. Sn1per seems good to me, but I have only checked a little bit of it; there is more yet to explore.

These are all good tools, no doubt; they can automate things and make them faster for us. But when it comes to complex, sophisticated attacks, multi-stage techniques where a huge volume of data is involved, the data keeps changing, and the data comes in a wide variety of formats, we need something more than the ordinary human hunter and tools. This is where we should use advanced analytical techniques like data mining, machine learning, pattern recognition, and neural networks to look for the sophisticated things.

I always use botnets as an example, because they are really sophisticated, and the botmaster uses a lot of techniques to remain persistent in our environment for as long as possible. For instance, I picked up a few subdomains from IBM X-Force that belong to certain botnet activities. For all of them, the main domains are still up. For the first one, the domain is up, but the subdomain is operating from a different IP that does not belong to the main entity: the domain is hosted by one company, and the subdomain under it is operating from another entity. That could be an obvious indicator of either a takeover or a shadow domain attack. I found another interesting one where the domain was up and the subdomain was up, both operating from the same IP; that could be an indicator of server compromise. The last one was interesting too: the domain was up, but the subdomains are no longer there.

So when we look at threat hunting and threat feeds, we should not only focus on what is active at the moment; we should consider what is no longer active as well, for many reasons. It could simply have been disabled temporarily by the attacker as a strategy to evade defensive solutions, or it could have been disabled, retired, and replaced by a new subdomain. In that case, we can identify the pattern and look for something more or less similar to it; this is what we call similarity-based detection.

The main point of this slide is that, as you can see, there are a lot of patterns, a lot of behaviors, a lot of scenarios, a lot of detection evasion techniques. For example, one web-based generation of botnets uses HTTP and port 80 to communicate with the command and control server. They can disguise themselves as normal web traffic, and that makes things difficult: we cannot easily block port 80, and we cannot just monitor it either, given the huge amount of web-related activity in our company. That's why we need advanced techniques to look for the patterns of these activities, combined with manual observation and, of course, automated tools, to detect these kinds of complex attacks. For example, in two of my recent research works, I proposed classifications based on neural networks and pattern recognition to detect HTTP botnet traffic based on its periodic nature. If you are interested in knowing more about them, I would be more than happy to connect and explore together.
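Not every check needs machine learning, though. The first indicator above, a subdomain resolving outside the organization's own IP ranges, can be screened with a simple script. A hedged sketch follows, where the netblock and the subdomain list are placeholders for whatever your asset inventory says you own.

    # Flag subdomains that resolve to IPs outside the ranges we believe
    # we own: a possible takeover or shadow-domain indicator.
    import ipaddress
    import socket

    OUR_NETBLOCKS = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder

    def resolves_outside(subdomain):
        try:
            ip = ipaddress.ip_address(socket.gethostbyname(subdomain))
        except (socket.gaierror, ValueError):
            return False  # unresolvable: one for the watchlist, not a mismatch
        return not any(ip in net for net in OUR_NETBLOCKS)

    for sub in ("cdn.example.com", "legacy.example.com"):  # placeholders
        if resolves_outside(sub):
            print("[!]", sub, "resolves outside our netblocks - investigate")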
So let's say we now have a set of findings. These are not the final findings, and the final report should be more advanced; these are just initial observations. I'll give you three examples. The first comes from manual observation: unknown subdomains found during a red team exercise, a pentest, or just our own fun with discovery. We observed two newly created subdomains under our company name, so we have to go and validate them. Is it something one of the development teams did for some purpose? Is there any logic behind it? If not, we should definitely tag them as suspicious for further analysis. The second could be as simple as a result from our vulnerability scanner for a known vulnerability; in this example, there is a CVE with an unverified exploit available in Exploit-DB. As threat hunters, we should treat everything as top severity, because if the exploit is not publicly available, it doesn't mean no one out there could exploit it; it simply means it's not publicly available. We should still consider it a top priority.

But I'm more interested in discussing the third one, a new botnet. Let's say our cyber threat intelligence monitoring discovers a newly observed web-based botnet that targets Microsoft Windows operating systems and exploits three different vulnerabilities, not to mention that we are heavily operating in that type of industry or on that kind of infrastructure. This is something we really need to go deep into.

So now we know the different issues; what is the next step? For that botnet infection we have two kinds of input, and we can go in at least three directions. First, we have the indicators of compromise: that botnet has certain URLs and certain hostnames in the threat intel feeds. The first thing we could check is whether any of our systems are communicating with those known bad URLs or IPs; that is something we can easily search across our different data sources, logs, and firewalls. The second direction is still IOC-based, but instead of searching for communication from our organization to those URLs, we look at the other side. That botnet works based on three known vulnerabilities, and if by chance we have those vulnerabilities on our servers, there is a chance we got compromised, the botmaster took control of our server, and our subdomains are being used as part of its C2 mechanism. In that case, I highly suggest enumerating our own subdomains, listing all the IPs and subdomains, and searching for any trace of them in publicly available threat intelligence. If we find them, it means we are compromised.

So IOCs are very good for rapid testing, but they are basically rule-based, because it's pretty obvious: I have this pattern, this signature, this hash, this IP; if I find it, I find it. As Anton put it so well, if we can simply write a rule, let's write a rule; we don't really need to go hunting. The challenging hunt comes when we talk about TTPs, the advanced techniques and procedures used by the attackers. As we just mentioned, a botmaster could use web-based protocols to communicate with command and control and hide among the normal traffic. However, based on the research out there, botnets mainly follow some sort of periodic pattern, because they need to communicate periodically with the botmaster to receive new commands and updates. So any periodic web-based network traffic that does not follow the expected standard protocols, for example HTTP on ports other than 80 and 443, could be considered a sign of infection.
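To make the periodicity idea concrete, here is a toy sketch, not a production detector: if the gaps between a host's outbound HTTP connections are suspiciously regular, it may be beaconing. Real timestamps would come from proxy or firewall logs; the two series below are made up for illustration.

    # Flag metronome-like connection timing as possible C2 beaconing.
    import statistics

    def looks_periodic(timestamps, max_jitter=0.1):
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        if len(gaps) < 5:
            return False  # too few samples to judge
        mean = statistics.mean(gaps)
        # Coefficient of variation near zero means very regular traffic.
        return mean > 0 and statistics.stdev(gaps) / mean < max_jitter

    beacon = [0, 60.2, 120.4, 179.8, 240.1, 300.0]  # ~60s apart: suspicious
    human = [0, 12, 97, 103, 340, 341]              # bursty: normal browsing
    print(looks_periodic(beacon), looks_periodic(human))  # True False

A real detector would add jitter-tolerant and multi-interval analysis, which is where the pattern recognition and neural network approaches mentioned earlier come in.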
But for this we need to collect reliable data. For example, we collect the network traffic based on that hypothesis, we identify the parameters used for detection, we go for the advanced analytics, and we either affirm the hypothesis or we reject it. So, actually, Scope X is much wider than a 30-minute talk; the talk ends here, but threat hunting is a never-ending battle. I would like to say a very special thanks to Joko, a very good friend and teammate, for all the brainstorming and support, not only for this talk but for everything else. Feel free to connect with me if you want to know more about these concepts. Discover your unknowns, be proactive, dive into the ocean, and happy hunting.