Hello, everyone. Welcome, and thank you for tuning into this virtual session, Cleanup on Idle Cloud. My name is Sarah Johnson, and I'm a lead cloud security architect at Boeing. This session is for security engineers, platform engineers, developers, and everyone in between. So whether you're new to the industry or you're quite experienced, there should be something in this talk for everyone. We'll jump into both high-level strategy and technical details as we explore challenges and solutions I've encountered on my cloud-native journey at Boeing. I started my career at Boeing on the commercial airplane side of the house, performing vulnerability assessments on commercial aircraft for compliance. Since then, I've had the opportunity to support some really unique and interesting projects, including machine learning, wireless pen testing, security automation, platform engineering, and most recently, which we'll be focusing on today, cloud engineering and GitOps. When I worked in commercial airplanes, I picked up a term that I've been thinking about a lot lately. The term is FOD, which stands for foreign object debris. This is an oversimplification of the definition, but it essentially refers to something on an aircraft that isn't meant to be part of the configuration and could interfere with its operation. Sometimes the object can seem insignificant, like a pencil or a paperclip or a wrench, but left in the wrong place during the assembly of the aircraft, it presents a vulnerability and an increased risk to flight. I've been thinking about this recently because, well, we sort of have FOD in software too. Whether we're downloading packages from the internet or copying code from Stack Overflow, do we always check to see if everything they contain is safe? How many of us have ever left debugging or testing code in our software? Or maybe you left a testing resource running that's no longer in use. 
Or maybe you're testing out a complicated infrastructure deployment, but when you go to tear it down, not all the resources are cleaned up. That is software FOD. And like in aviation, it comes with risk. Whose job is it to manage software FOD? If you know me, you'll know that I've joked occasionally about my title and that instead of cloud security architect, I should really be called cloud janitor because a good percentage of my job has been cleaning up other people's cloud trash. Working in security sometimes feels that way. Sometimes we're builders, sometimes we're defenders, sometimes we're quality control, and sometimes we're garbage collectors. But is it always security's job? What can security do better? And how do these issues affect the jobs of other engineers? I've been part of Boeing's transformation to the cloud and helped a number of development programs transition from legacy environments. During that time, I've experienced all the pain points that come with adapting to new technologies through both failure and success. From this experience, I'd like to present three relatable examples of FOD in the cloud and talk about how to navigate these challenges with your teams when configuring cloud development environments. We're using AWS GovCloud for software engineering at Boeing. So the examples that I'm presenting here will be specific to AWS, but the principles apply to other cloud offerings as well. Let's take a look at scenario one. In our first scenario, Alice is a cloud engineer for a platform engineering team. The team is new to the cloud and is pretty small, less than 10 people. However, the team is building a platform that needs to support over 300 developers. Alice comes to work one day and notices an odd resource in the team's environment. It's an unnamed, untagged Windows 2012 EC2 instance. So why is this a problem to Alice? Well, Alice knows the team is a Linux shop and shouldn't be running Windows systems. 
Windows 2012 is also an old version that the team really shouldn't be leveraging. Alice also found that the image came from an unknown vendor in the AWS Marketplace, and whoever created it didn't follow the team's standards on naming and tagging conventions, so she can't tell who created the resource or why just by looking at it. So to Alice, this seems to be a classic case of software FOD. Now, in our second scenario, Bob is an SRE on a development team. Bob is having trouble accessing data he put in an S3 bucket. So Bob makes the bucket public so he can pull the data down to the resource that needs it. Why is this a problem for Bob? Well, we know public S3 buckets are just that: they're public. Data in the bucket could also be sensitive. And there were other ways Bob could have configured the bucket to access it properly. This is an example of software FOD because we know it's a misconfiguration, and it only exists because of a temporary problem, but the misconfiguration will probably persist unless other measures are put in place. This brings us to our third scenario. Eve is a security engineer on an IT team. Eve was assisting some development teams and noticed an issue with SSH key management. Eve knows there's a policy to enforce SSH practices, but it turns out some of the teams were not made aware of the policy. After more investigating, Eve found several teams that had completely open security groups or sensitive ports exposed. Why is this a problem for Eve? Well, Eve knows there are big vulnerabilities associated with some of the open ports. Open security groups are also just bad practice. And Eve knows the teams need to be NIST compliant, so they need to disable unused ports and protocols. Now consider the impact of scaling these issues in a large organization. What happens when these teams become 10, 20, or more teams with different regulatory and customer requirements? 
Even in small organizations and projects, we need solutions that can grow with us. With these scenarios and scalability in mind, let's walk through how an organization can tackle addressing misconfiguration and software FOD in our cloud environments. We're going to leverage classic security frameworks to apply administrative and technical solutions to our problems. First up: administrative controls, or policy. Policy is the first step to creating change. Trying to force changes without a written policy will likely cause tension in the organization. In the second scenario, if we just tell Bob he can't have a public S3 bucket, but we don't have any policies to support our statement, we may damage our relationship with Bob's team. Bob believes he needs the bucket to be public, and his team may feel that security has unpredictable and arbitrary requirements. Similarly, in order to create an effective policy, we need buy-in from all the parties affected by the policy. Proceeding without buy-in can also damage our reputation and have unintended consequences for our organization. For example, in the first scenario, Alice could create a policy to only permit the use of official CIS Benchmark images. From a security perspective, that's a great decision because they're from a known vendor and have some hardening applied. However, what if the development teams have a requirement to use images that match their customer's configuration and can't use a hardened operating system? If Alice goes forward with the policy without working with the teams, that might mean developers can't meet their product requirements without violating the policy. And lastly, as we saw in the third scenario, policy also needs to be communicated to be effective. If teams don't know about the policy, it can't be followed. Let's look at some example policies that may apply to our scenarios. In scenario one, Alice works with stakeholders to create and communicate a policy to use only approved vendor images. 
In scenario two, Bob's team is consulted and a policy is created to disallow the use of public resources. In scenario three, Eve already has a policy to remove sensitive ports and works to communicate it. With all this in mind, administrative controls have limited effectiveness on their own. We also need technical controls to support our policies. Technical controls are divided into preventative, detective, and corrective categories. Of the preventative controls, we should start with IAM. IAM is the backbone of AWS security. Working in AWS, it should be the first place you go to allow or deny user actions. There are a few ways we can leverage IAM and, in AWS Organizations, service control policies to address our earlier scenarios. We can use a permissions boundary policy or service control policy to do things like prevent launching an instance without required tags, prevent creation of service accounts without a permissions boundary, or block changes to S3 bucket policies and access control lists. However, security groups might be a little trickier. In my experience, it's not a good idea to block changes to security groups in a development environment. Things may evolve quickly, developers or SREs may become blocked for long periods of time, and it may not be reasonable to have a security team babysit security group configuration for everyone. That brings us to some of the limitations of this approach. It may be difficult to enforce policy at this layer if a lot of exceptions occur or frequent changes are expected. We're going to use Bob's example to explore other preventative control options and look at other tools we can leverage. Looking at the S3 service in this demo account, we can see there's a setting for blocking public access at the account level. One way we can prevent public S3 buckets is to enable the setting for each of our member accounts. Obviously, I'm demoing this in the GUI, which doesn't scale well and isn't enforceable. 
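As a rough sketch of the SCP idea mentioned above, a service control policy denying changes to S3 public access settings might look something like this. This is a hedged example, not our production policy: the action names should be verified against current IAM documentation, and a real SCP would typically carve out an exception for a break-glass administrative role via a condition.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyS3PublicAccessChanges",
      "Effect": "Deny",
      "Action": [
        "s3:PutBucketAcl",
        "s3:PutBucketPolicy",
        "s3:PutBucketPublicAccessBlock",
        "s3:PutAccountPublicAccessBlock"
      ],
      "Resource": "*"
    }
  ]
}
```

Attached at the organization or OU level, this would block member-account principals from loosening bucket ACLs, bucket policies, or the public access block settings.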
So let's look at two tools to do this programmatically. First up is AWS Config. Config is a great cloud-native configuration tool that provides policy as code, automation, data aggregation, and resource management capabilities. We're leveraging AWS conformance packs at Boeing to make measuring configuration compliance easier. Conformance packs use Config managed rules to monitor compliance with best practices and frameworks like CMMC, NIST, CIS, and more. Config also provides two managed rules for the S3 public block setting that come with the conformance packs. The one we want to examine is the periodic check. The other rule is only triggered on changes, which means we won't catch accounts where the account-level public block was never enabled. I'm going to manually add this managed rule to my account, assuming it wasn't already enabled. I'm going to change the frequency to 12 hours and keep the rest of the settings the same. And then, for some reason, we have to go find this in the GUI. I'm not sure why it's so hard to find these rules. I could have used the CLI, but I thought that would be less interesting for a demo. Found it, and we should see my account get flagged. Now that we have this finding, we want a way to remediate it. If we're applying a remediation action through Config, we can use SSM automation documents to apply automatic remediation. AWS provides a bunch of managed automation documents for different actions, including one for configuring the public access block. But availability of these services can vary by region, and it doesn't look like this one is available for us in GovCloud. So we have some other options. Instead of using a managed rule, we can create our own in Lambda or Guard. But there's also a Config Rule Development Kit (RDK) from AWS Labs that's available for us to leverage. The benefit of this is there's a huge repository of open-source rules for us to use. And there's a ton of stuff in here. 
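As an aside, enabling that periodic managed rule programmatically instead of clicking through the GUI might look like the following CloudFormation fragment. This is a sketch: it assumes the managed rule identifier `S3_ACCOUNT_LEVEL_PUBLIC_ACCESS_BLOCKS_PERIODIC`, which you should verify against the AWS Config managed rules list for your region before using.

```yaml
Resources:
  S3AccountPublicBlockRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: s3-account-level-public-access-blocks-periodic
      # Matches the 12-hour frequency set in the demo
      MaximumExecutionFrequency: TwelveHours
      Source:
        Owner: AWS
        SourceIdentifier: S3_ACCOUNT_LEVEL_PUBLIC_ACCESS_BLOCKS_PERIODIC
```

Deployed through your existing pipeline, this gives you the same periodic check across member accounts without anyone touching the console.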
And if we look, we should be able to find one to solve Bob's problem. Here it is, if it loads. Awesome. But Config does have some downsides. There are some differences in GovCloud, which means some features aren't available, or weren't until recently. You also can't update service-linked rules like the ones that come in the conformance packs. Config is also a little complicated; I found that it does have a learning curve. And if you're like me and you're trying to write super customized rules, you may have to muddle through some convoluted errors when you're trying to get things to work. I mentioned the flexibility of the RDK is awesome, but the rule we wanted to leverage for S3 actually requires some special modifications to work in Lambda. And some of these rules are really long, like almost 500 lines of code long, when all I wanted to do was something simple like turn on the public access block setting. So in general, Config is great, and we want to leverage it where possible. But we also explored another tool to supplement our AWS-native services: Cloud Custodian. Cloud Custodian is an awesome open-source CNCF incubating project that complements other cloud-native tools really well. It also supports the AWS, Azure, and Google Cloud platforms. To look at how to solve Bob's problem in Cloud Custodian, we're actually going to look at our demo account again first. We're just verifying that the account-level public S3 block is still not enabled. And then we can just create a little YAML policy to use with Cloud Custodian. The resource we're using is the AWS account, and we're applying a filter for the S3 public block. Then we're going to run this policy. It returns a count of one, which means it found one resource that isn't in compliance with our policy. And we can go into the output directory that it created to get more information about our resource. 
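The little YAML policy described here looks roughly like the following. This is a sketch based on my recollection of Cloud Custodian's `account` resource; the filter name and behavior should be checked against the Cloud Custodian documentation for your version.

```yaml
policies:
  - name: account-check-s3-public-block
    resource: aws.account
    filters:
      # Matches accounts where the account-level S3 public
      # access block is not fully enabled
      - type: s3-public-block
```

You'd run this with something like `custodian run -s output policy.yml`, and the matched resources land in a `resources.json` file under the output directory, which is where the account details in the demo come from.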
I'm going to hide the account ID and account name here, but you can see that it did find my account, and these fields are all set to false, which we know is true because we just looked at that. But in order to actually make this a preventative action, we want to enable this setting in our account. So we're just going to add a set-s3-public-block action, set each of these fields to true, and then run our policy again. Okay, so we see our count of one, but we also see that action here: it says it took an action on one resource, and it looks like our setting is enabled, which is pretty cool. And we did that with 16 lines of code rather than the nearly 500 lines we saw in the other rule. So to sum this up, we just looked at how to leverage a preventative control for blocking public S3 buckets using IAM, SCPs, Config, and Cloud Custodian. That brings us to detective actions. What if we're not able to prevent an action, but we just want to flag and monitor findings? Alice's use case of a rogue resource is a perfect candidate for a detective Cloud Custodian rule. We've written some YAML here, and we have some variables set for what we believe are authorized AMI names and owner IDs. We're just looking for EC2 resources that have been created that don't fall into these permitted categories. So we're going to go over to the console and spin up our example Windows 2012 instance, which was not permitted in our rule set. We don't care about accessing it. Right, and it should be spinning up. You can see it's pending; it's okay if it's not running yet, we should still catch it. All right, we ran our rule, and now we have a count of two, which is interesting. We know about one, because we just provisioned a Windows EC2 instance, which isn't in our list. So let's go take a look at our resource output. We see an instance ending in BAE3, which looks like our instance ID. It also says a Windows platform, so that makes sense. 
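A rough sketch of the kind of detective policy being run here is below. The AMI owner IDs are illustrative placeholders, not real accounts, and the `image` filter and `post-finding` action are from my reading of Cloud Custodian's EC2 resource docs, so verify them for your version.

```yaml
policies:
  - name: ec2-unapproved-ami
    resource: aws.ec2
    filters:
      # Flag instances whose AMI doesn't come from an
      # approved owner (placeholder account IDs below)
      - type: image
        key: OwnerId
        op: not-in
        value:
          - "111111111111"   # e.g. approved marketplace vendor
          - "222222222222"   # e.g. internal golden-image account
    actions:
      # Forward findings to Security Hub for monitoring
      - type: post-finding
        types:
          - "Software and Configuration Checks/Industry and Regulatory Standards"
```

Because this is detective rather than preventative, the only action is posting a finding; nothing is blocked or terminated.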
Then we're just going to scroll down until we find our other resource, which I should have grabbed ahead of time, but okay. We have another image ID here and an instance ID. We're going to grab the instance ID and figure out which one this is. And interesting, cool. So we have another instance that it flagged. It's terminated, but we still caught it. The AMI is Amazon Linux 2; we do have Amazon Linux in our list, but we're restricting it to benchmark and STIG images, so we were able to flag that as well. Because we're looking at a detective control, we can also send these findings to other monitoring services like Security Hub. So we're going to take that action now with a post-finding action, which should post this to Security Hub in our account when we run it. Looks like it found our resources and committed the action. We're filtering for the product Cloud Custodian in Security Hub, and it should show up here. Sometimes it takes a little bit to post. Awesome, we have our two findings. The other thing that we're doing at Boeing is ingesting these findings from Security Hub and other AWS services into a SIEM solution. We're running Splunk to report findings, trigger alerts, and present dashboards to different teams. So to summarize, we looked at Alice's use case of a rogue resource to apply detective controls using Cloud Custodian and send the findings to solutions like Security Hub or a SIEM. Finally, we have corrective controls. We can use both Config and Cloud Custodian to take corrective actions on misconfigured resources in our environment. Eve's use case of sensitive ports is a great candidate for corrective actions using Cloud Custodian. We're using this test security group in the demo account, and we're just going to add a rule to permit RDP from an arbitrary source. It's added, and now we have this little policy in YAML that we're going to run in Cloud Custodian. 
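The corrective policy for this demo looks roughly like the following sketch: filter for an ingress rule exposing RDP (port 3389) to the world, then remove just the matched permissions. The `ingress` filter and `remove-permissions` action follow Cloud Custodian's security-group examples, but as always, check the docs for your version.

```yaml
policies:
  - name: sg-remove-open-rdp
    resource: aws.security-group
    filters:
      # Match security groups with 3389 open to any source
      - type: ingress
        Ports: [3389]
        Cidr:
          value: "0.0.0.0/0"
    actions:
      # Remove only the ingress rules matched by the filter,
      # leaving the rest of the group intact
      - type: remove-permissions
        ingress: matched
```

Removing only the matched rules, rather than deleting the whole group, keeps the blast radius small, which matters when you're auto-remediating in someone else's development environment.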
It's filtering for that rule, and it has an action that will remove the permissions. It says it took an action on one resource, and we should see it remove our rule. Awesome. So we just discussed how to take a simple corrective action using Cloud Custodian to solve Eve's problem. We don't have time to demo Config, but we can do similar things in that tool as well; Guard rules in Config also support YAML. We just happened to leverage Custodian at Boeing because of what was available in GovCloud at the time, and Cloud Custodian's flexibility has worked out well for us. To recap, we looked at three different scenarios involving cloud misconfiguration: rogue resources, public resources, and exposed sensitive ports. We also discussed how to leverage administrative and technical controls to solve our issues, including policies, IAM, SCPs, Config, Cloud Custodian, and Security Hub. At this point, I want to bring us back to a question I asked in the first few minutes of this talk: whose job is it to manage software FOD? The obvious answer may be the security engineer, as we focus a lot on the role of security and the custodial nature of our jobs at times. But security is just a lens to see problems through. It's a perspective, but the problems themselves aren't unique to security. A messy cloud and messy software are challenging for everyone. Does anyone know the concept of the big ball of mud? It's expensive to maintain, difficult to scale, inefficient, and unsustainable. In general, the same problems that present security risks also present other risks to the business. So to my security engineers out there: if you're not already, start leveraging compliance as code and automation tools to do your work for you, and approach these problems not as isolated security concerns but also as roadblocks to efficient development. I think you'll find that if you apply that mindset, you'll build better cross-functional partnerships outside of your security teams. 
To my developers, SREs, platform engineers, and others tuning into this call: I encourage you to think about how you could leverage your existing solutions to jointly solve security problems. If you find it difficult to manage your resources, what's the security risk associated with that? How does infrastructure or configuration as code help secure your environment? Thank you again, everyone, for joining. That's my time, and remember to clean up your cloud.