Hello, everybody, at the Cloud Native Security Conference. Welcome. Today we're going to discuss going from theory to practice with Sean Anderson, myself, Anna McTigert, and Michael Hackett. Here we are. Michael Hackett and I are at Red Hat; I'm a security analyst and he is a principal product experience engineer. Sean Anderson is a PhD student at Portland State University with a focus on formal methods. One little thing: this is actually from practice to theory; the title was created before that decision. For clarity, this talk covers the practical security used in Ceph, the flaws with those methods, and a better way with theory. We're going to start off by discussing Rook for Ceph, move into the shortfalls of current practice and other security features in Ceph, and then show how theory, and in particular formal methods, can help us fix these flaws. I'm going to hand it over to Michael Hackett now.

Thanks, Anna. So before we get into the practical security side of Rook Ceph, I'd like to give a quick overview of why we're talking about Rook Ceph today. We all know from a development standpoint that deploying cloud native applications can be extremely complex and challenging when we're looking at it from the outside in. Large clouds and large amounts of compute can lead to a very overwhelming situation. We're looking at Rook Ceph because it simplifies the underlying storage and allows our developers, our end customers, to focus on writing applications and testing code. With Rook Ceph, we've designed a way to automate packaging, deployment, cluster management, upgrading, and scaling of storage, all underneath the stateful applications, as well as providing infrastructure services, such as a place for logging, metrics, and a registry, to Kubernetes clusters. We do this by augmenting Kubernetes with access to storage services, including block, file system, and object storage. You can go to the next slide.

Cool. So why are we focusing on Rook Ceph? Working at Red Hat, we can see that container adoption is exploding throughout our customer base. More and more businesses are looking at Kubernetes as a way to drive automation, improve their teams' overall efficiency, and scale as needed, without the requirement of going out and purchasing large storage arrays. The trouble we have with cloud native environments, which we're looking to cover here, is how we actually secure and protect all the data going into them. We don't want to leave this to the development engineer to figure out. We need features in place that provide this type of security and protection for the data that resides in our Kubernetes clusters. Go ahead, go to the next slide, please.

So where are we currently at with Rook? Rook gives us the ability to auto-scale and auto-heal when we're facing any type of underlying hardware failure, or anything that may impact the underlying cluster or data access for our end users. There's no requirement to go in there and recover a failed object storage daemon yourself; the Rook operator automatically goes out there and restarts the pod for you.
So we're dealing with a self-healing cluster, where developers don't have to take time away from their application work to troubleshoot an issue and recover the cluster. This is all done automatically by the Rook Ceph operator. We're also allowing Ceph, which can be a complex storage product, to run on our Kubernetes platform via the Rook Ceph operator. This brings the benefits of containerization, the ease of using containers, down to our developers; that ease gives us the automated packaging, automated upgrades, and the other features I discussed on the previous slide.

Looking at security from a practical standpoint, we can apply these features from OpenShift to our Rook Ceph clusters, and we can implement them based on an individual YAML file, or in the CRDs specified for each of your resources. A CRD is a custom resource definition. In standalone Ceph, outside of Kubernetes, when you're dealing with these types of changes to any of the security policies or features inside Ceph, it's usually a very manual process where you update configuration and then push it out to a set of nodes. That's a manual configuration. With Ceph under the Rook Ceph operator, we're still dealing with somewhat of a manual configuration, but it's a lot less overhead than a bare metal Ceph cluster outside of Kubernetes. You can go to the next slide, please.

Addressing a little more of where we are on the practical side of security with Ceph, I want to go over what we're currently offering. We can deploy our object storage daemons, which are the underlying data storage devices in Ceph, as encrypted, using dm-crypt during the OSD's creation. This can be done in multiple ways using LUKS: on the actual device layer, or on the LVM abstraction layer. As of our Nautilus release, we're also supporting an external KMS, with HashiCorp Vault being the only KMS server we currently support; that was made available with the 1.5 release. The way we set encryption on these OSDs does need to be applied manually, so this isn't automated, nor is it something that can be done to an OSD that's already online, unfortunately. It's done by updating the respective storage class device set YAML or template and setting your encryption value to true. By default, the key is stored in a Kubernetes Secret, which from a security standpoint isn't as secure as certain users' requirements would call for. That's why we introduced support for the external KMS, HashiCorp Vault, for when the requirement is for the key to live outside of Kubernetes, on an external Vault instance.
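To make that concrete, here's a rough sketch of what encrypted OSDs plus the external Vault KMS can look like in a CephCluster CRD. Field names follow the Rook documentation of roughly this era, but treat this as illustrative: the Vault address, backend path, token secret, and storage class here are all placeholders.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # ... other cluster settings elided ...
  security:
    kms:
      connectionDetails:
        KMS_PROVIDER: vault
        VAULT_ADDR: https://vault.example.com:8200  # placeholder Vault endpoint
        VAULT_BACKEND_PATH: rook                    # placeholder key-value backend path
      tokenSecretName: rook-vault-token             # Secret holding the Vault token
  storage:
    storageClassDeviceSets:
      - name: set1
        count: 3
        encrypted: true          # create each OSD on a dm-crypt (LUKS) device
        volumeClaimTemplates:
          - metadata:
              name: data
            spec:
              storageClassName: gp2                 # placeholder storage class
              volumeMode: Block
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 100Gi
```

If the security.kms section is omitted, the generated keys land in a Kubernetes Secret, which is the default Michael described.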
We also, in Nautilus, released something called the v2 messenger protocol, which introduces encryption on the wire. With the encrypted OSDs we were talking about data-at-rest encryption; with the v2 messenger we're actually encrypting data as it moves through the network, and this is done utilizing our own authentication system, CephX. There's a default mode called crc, which puts a CRC on traffic flowing through the network and validates it at the receiving endpoint. crc is the default mode of the v2 messenger, but there's also a secure setting, which is a more in-depth, full encryption of all the traffic going through; we're looking at full cryptographic integrity protection of the traffic running through the network. This also requires manually setting it in the cluster CRD during cluster creation. We can't just turn on secure mode while the cluster is up and running; it's a setting that has to be made while the cluster is being created, by defining it in your cluster CRD.

Another area we look at from a security standpoint is the ability to modify user permissions per pool, giving certain users read, write, or execute access to any of the pools. This can be configured by setting the client custom resource definition for a specific client. We're looking at this primarily for librbd use cases, for example OpenStack running on OpenShift, where we may have a registry or something like it where a user may require permissions to access specific pools. There are other use cases, too: for example, the RADOS Gateway object store user CRD can also be set to give specific users access to different sets of object store pools on the underlying cluster.

That's where we currently are from the practical standpoint. With what we're offering right now from Rook Ceph, we understand we are limited in some areas. In particular, there is no automation when setting these types of security principles on Rook Ceph; we're looking at manual additions and changes for anything required to enable these types of features.
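Before Anna takes over, here are two quick sketches of the settings Michael just walked through; both are illustrative rather than copy-paste recipes. First, secure mode for the v2 messenger. The exact knob varies by Rook version (newer releases expose an equivalent setting on the cluster CRD directly), but one way it can be wired up is through Rook's configuration override ConfigMap using Ceph's messenger-mode options, which, as Michael said, needs to be in place when the cluster is created:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    # prefer full on-the-wire encryption over the default crc integrity mode
    ms_cluster_mode = secure
    ms_service_mode = secure
    ms_client_mode = secure
```

Second, per-pool user permissions via the client CRD. Here a hypothetical client named glance is limited to RBD access on a single pool named images:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephClient
metadata:
  name: glance              # hypothetical client name
  namespace: rook-ceph
spec:
  caps:
    mon: 'profile rbd'                 # monitor caps for an RBD client
    osd: 'profile rbd pool=images'     # OSD access restricted to the 'images' pool
```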
You want to cover the next slide, please? Yeah. Thank you, Michael. So this is really state-of-the-art stuff, and it's really important. It's amazing. Ceph is continually improving its security, as are many other places in industry. It's really good. But it's not enough: I still have a job analyzing CVEs day to day. Ceph on Rook is really good; it enables all the benefits of containerization, and it enables a lot of security benefits, but it doesn't formally guarantee a whole lot. It's not bulletproof. We still get CVEs. So what causes our CVEs, and how could we reduce their number, going from a whole lot of CVEs to just a few?

Let's think about some of our sources of flaws and CVEs. Many of our flaws occur with access control and with buffer overflow attacks. I'm going to discuss these categories of flaws and then how to prevent them, in theory and in practice. So, access control. We deal with this by applying permissions, checking user authorization, making sure our key management is set up. It hasn't eliminated the flaws; it has decreased them, down to about 1% of CVEs falling under CWE-284, a.k.a. improper access control, in 2020. That's a decrease from prior years, but these flaws still exist. We could eliminate them if we had formal guarantees stating that only certain users are authorized to access particular things. We don't have a formal, bulletproof guarantee; we just have software that enables access control and generally prevents these flaws.

The most common and overwhelming source of CVEs is buffer overflow and memory buffer attacks: 21% of our CVEs in 2019. These are entirely preventable with guardrails from programming languages. There is no need to have a buffer overflow in 2021. It happens due to the languages we use, and of course we fix overflows when they occur, but there are languages that prevent buffer overflow attacks, for example Go and Rust. Go and Rust are both memory safe, meaning you can't write to sections of memory you didn't mean to write to; they eliminate buffer overflow attacks. Just by having these guardrails from a programming language, by adopting a technique that says you cannot have this, you can design a system so that these attacks are impossible. Go and Rust have different benefits: Go has garbage collection, Rust does not, and Rust can be a little faster, so it depends on your goal which language you use, but either way we can eliminate buffer overflow attacks. Even with older languages, with C, you can have verified compilers that don't introduce security flaws. The biggest downside is efficiency, but as we're seeing, this is starting to be practical, with languages such as Go and Rust being widely adopted in industry. These techniques are not just in the realm of theory; they're very practical.

So how are we even enforcing security nowadays? On RHEL, we're using SELinux and seccomp, and this helps, but it relies on user configuration, which, as we touched on, can be a huge pain, doing all this manual user configuration. Containerization really helps to contain flaws and decrease their security severity, but alone it doesn't eliminate the flaws; it reduces them, it provides a stopgap, but we still get CVEs. Again, we want to eventually eliminate these security flaws so that there are no more CVEs of these types. So how would we do that? Well, we really need to make security impossible to mess up. Yes, we are very optimistic, but we are making progress in reality. We want to make this impossible to mess up, and we want to make it so that programmer errors don't lead to exploits. If a programmer error can lead to an exploit, that's not just a personal flaw; it's a flaw in your design. How do we make it idiot-proof, so you can't mess up, can't create a major security flaw? We can do that using formal methods.

Currently, we're the Danaides: we keep bringing our buckets of water, our patches, to a vessel that pours them out, day in, day out, for all of eternity. It's annoying. Fixing all of these CVEs is honest work, but it can feel repetitive. So what's our way to the future? I'm going to let Sean speak about formalizing security, formal methods, and ways to actually stop these flaws at the source. We're no longer the Danaides, no longer letting it flow out the bottom while we slap on a CVE fix. We're going a little further than slapping on a fix: we're repairing the entire bucket, so it's not held together with duct tape; there's a nice, solid glass beaker up there. Over to Sean.

Thanks, Anna. All right, so we've been hearing this term "formal methods" a bunch in this talk, and one might reasonably wonder: what is that? What do I mean by formalizing, why is this helpful, and why do we have these visions that one day this will take us beyond the era of flex-tape security? Formal methods is a broad class of research connected to programming language theory and formal logic. The essence is that we're going to represent programs and systems in some kind of logical system that is amenable to mathematical proof. We're going to rigorously model our systems and programs.
What that means, for instance, in my area of research, is that we can define logical propositions representing the essence of some concept of security. We can think of this as just a very rigorous way of specifying what our system is supposed to do. Then, ideally, we take that specification, and the fact that it's embedded in this formal logic, and we actually build a logical proof that guarantees that, for a given system, whatever property we've claimed we want actually does apply to that system. And the very nice thing is that, with recent technologies, we can make these formal logics machine-checkable. So it's not just that I wrote a proof down in my notebook that you'd eyeball and say looks right; we can actually feed the proof into a machine that will say, yes, absolutely, this proof holds.

Now, that is a pretty high bar, and as I'll get to in a couple of slides, it's not currently viewed as super practical for large-scale projects. So for my part of this talk, I want to focus on a secondary benefit of formal methods: when you start thinking in terms of these logical systems and properties, you change the way you think about security, in a way that can be helpful in understanding the nature of security in a system even absent the formal proofs. We want to get beyond thinking about just individual exploits and examples of bad behavior, and think at a higher, more abstract level. Hit the slide.

So we want to think, in some sense, in terms of abstractions. This is something that, as computer scientists, I think most programmers have some natural ability to do, just by the nature of programming; certainly we have many, many layers of abstraction in all of our systems. And it's useful to think more explicitly about what those abstractions are and what they do for us. Programmers, perhaps our end user, or perhaps the programmers working on a system like Rook Ceph, rely on abstractions that are given to them by the programming language. We rely on the idea that if I call malloc, I'm going to get a block of a particular size that is separate from everything else in the system. That's memory safety. I expect that if I write through a pointer to this block, I don't accidentally write into this other block. That's an abstraction, and it makes it easier to write a program. But often those abstractions don't actually hold. You clicked twice, Anna. Often the programmer's mental model of what's going on in the system has some holes in it, and there are all sorts of different places those holes can exist. A lot of CVEs come from the fact that someone was thinking in terms of a higher-level abstraction and missed the fact that underneath it there's actually a much lower-level, more concrete system doing something a little unexpected. A classic example, which Anna referenced, is the buffer overflow. The fact is that our buffers are not separate from one another, and in languages like C you can do simple pointer arithmetic and jump from one to another, and nothing will actually stop you, even though it's not legal. So these assumptions, we can almost think of as being in our trusted computing base, and that's sometimes a problem, because we shouldn't actually trust them. If you hit the next slide.
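As a toy illustration of the "logical propositions" idea, the malloc abstraction Sean just described might be written down, in notation of my own rather than anything from the talk, as a separation guarantee between distinct allocations:

$$
p_1 = \mathrm{malloc}(n_1) \;\wedge\; p_2 = \mathrm{malloc}(n_2) \;\wedge\; p_1 \neq p_2 \;\Longrightarrow\; [p_1, p_1{+}n_1) \cap [p_2, p_2{+}n_2) = \varnothing
$$

Memory safety then adds that every access through $p_1$ stays inside $[p_1, p_1{+}n_1)$. In C, nothing enforces either claim at runtime; in a memory-safe language they hold by construction, which is exactly the sense in which the abstraction can be trusted.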
What the formal methods community will sometimes get into when we're talking about solutions to this is an idea some people find a bit utopian: that we can actually specify, at all the layers of abstraction in a system, what things do and how they work. That gives the programmers working up near the top the ability to specify, and then verify, exactly what their program is supposed to do. And because the specifications for the lower levels should also be verified, they are actually able to trust that the abstractions they're relying on are going to hold. That's very important. There's been plenty of work on verifying specific programs. A flagship example is CompCert, a verified C compiler. The verification there says it does exactly what it's supposed to do; that's not the same as saying it has no security vulnerabilities. Luckily, with a compiler, you're not running it actively on a production system, so maybe that's not as big a deal. But the point is that you can verify a program.

There's also a lot of work on mitigating these abstraction-breaking behaviors in the lower levels of our systems, and that includes hardware and software security mechanisms; there's a whole movement toward hardware that provides security-enforcement primitives. There are also the programming-language features Anna talked about, which exclude certain kinds of bugs, such as memory safety in Rust and Go. These help us be more confident in the abstractions we're giving the final programmer, even without actually doing all the work of specifying and verifying every level, which is a lot of work. You have to specify everything and then do a formal proof. We have good tools for writing those proofs now, things like Coq and Isabelle/HOL, proof assistants that let us do this in an interactive environment, but it's still a lot of person-months to verify even fairly simple programs. And then you have to build up this whole system, or you start at the top and make assumptions. So often this is viewed as impractical. In fact, though, there are some projects getting close to making realistic attempts at this, and there are many that prove a core set of functionality. One thing I want to call attention to is MicroV, a recent work that does this with a fairly realistic hypervisor.

But given that this is not something we're going to just start being able to do overnight, what are some takeaways we can act on right now? If you hit the slide. I think one of the big ones is, at the level of development philosophy, starting to think in terms of these abstractions, in terms of the properties that we are depending on and that we are offering to further users of our software. This is where, if I want to connect this to Ceph, I might think: okay, we have all these different security features being offered to the end user, but what is the end user actually using them for? Can I write down a formal description of how the system should behave, so that the end user can rely on that? If the end user were writing their own code that they were going to verify in this very rigorous way, and I were providing them with some axioms to use in their proof, what would they want to assume? Can you hit the slide?
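To give a flavor of what such an axiom can look like, CompCert's headline correctness theorem can be stated, very roughly and in simplified notation of my own, as semantic preservation:

$$
\mathrm{compile}(S) = \mathrm{OK}(T) \;\Longrightarrow\; \mathrm{Behaviors}(T) \subseteq \mathrm{Behaviors}(S)
$$

That is, if compilation succeeds, every observable behavior of the generated target code $T$ is a behavior the source semantics permits for the program $S$ (with caveats around source programs that have undefined behavior). An end user verifying their own code could take a statement of exactly this shape as an axiom about the compiler beneath them.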
And there is a whole area of research around just writing these properties down. I won't spend a huge amount of time on them now, but, for example, if we're talking about containers, the official way in the security literature to describe a compartmentalization system, where containers don't talk to each other, is something called non-interference. We actually use non-interference for a lot of different purposes, but in this case it means something like this. Say I'm a container living in some larger system. Maybe that larger system could be either of two different systems, and I can't tell the difference between them, because the parts of the system I can see, my own memory, my own storage, are identical between the two. The other containers in the system might be doing who knows what; they might differ between the two systems. Non-interference says: if I'm living in either of these environments and can't tell which, then after we execute for a while, I should still not be able to tell whether the rest of the system is system A or system B.

That's just an abstract notion, but think about what it means. It means that if out there in system A there is an attacker, and no attacker in system B, and I can't tell the difference, then clearly I didn't get attacked, right? And on the flip side, if I'm an attacker, and out there in system A there's some secret I'm trying to uncover, and there's a different secret in system B, well, if I still can't tell which system I'm in after executing for a while, then I must not have discovered that secret, right? Phrasing things this way may seem convoluted, or just too abstract, but it means we don't have to talk about specific actions like memory accesses, and so we can actually capture a wider range of behaviors that lead to leaking data or interfering with programs. You can hit the slide.

The other thing to think about: let's pretend you sat down for a bit and asked, how would I show my system obeys this non-interference property? You start thinking, okay, in C semantics, we step and step and step; these are the different steps we can take, and at each of these steps I want to say I'm only reading from things I'm allowed to see, and only writing to things I'm allowed to change. You'll pretty rapidly hit some circumstance where this doesn't actually hold. The property, as I've stated it here, is too strong, and that's kind of intentional; this is the utmost level of safety. If we want to apply it in practice, we may need to make it probabilistic, because the underlying mechanisms Michael talked about rely on encryption, and encryption gives us probabilistic guarantees, not absolute ones. And of course, many systems will have ways that containers are supposed to talk to each other, so we'll need to model that; it's not going to be this nice, straightforward isolation story. But the really powerful thing about a model like this is that it gives us a starting point to say: this is the extreme of what security is, and now here are the ways it doesn't apply. We can build our model from there. Hit the slide. And then we can talk about what kinds of security guarantees we do and don't offer.
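For reference, the core property Sean sketched is usually written along these lines; this is a textbook-style rendering in my own notation, before any of the weakenings he mentions. Write $s_1 \approx_C s_2$ when two whole-system states agree on everything container $C$ can observe (its own memory and storage). Non-interference for $C$ then says that observational equivalence is preserved by execution:

$$
s_1 \approx_C s_2 \;\wedge\; s_1 \longrightarrow^{*} s_1' \;\wedge\; s_2 \longrightarrow^{*} s_2' \;\Longrightarrow\; s_1' \approx_C s_2'
$$

The realistic variants replace the implication with a probabilistic one, to account for encryption, and refine $\approx_C$ to permit the channels that containers are actually supposed to share.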
So if I'm going to give any guidance on how to use formal methods without really formalizing everything, just informally getting value from these ideas, it's this: think about these abstractions and the ways you're relying on them, and think about the abstractions you offer. Do an informal kind of reasoning in your head about what those are and why you believe your security mechanisms are sufficient to offer them. Then you can tell your customers that, and you can have some new buzzwords. We all love a good security buzzword, and I'm frankly a little surprised that non-interference isn't one of them yet. I'm sure the marketing people would love to slap that on your software, if you have a reasonable belief that you can provide it. Just by going through the process of thinking about security in this way, you get a higher-level view of the sort of security you're trying to offer, and the reasons you might fail to offer it, and that's even before you've started doing any real formal methods work. Of course, as a formal methods person, I advocate for you to also do the formal methods work, but I'm a little biased there. And then the other thing, as this talk is an example of, is that it's good for developers and product people and security people and formal methods people to all be talking about our different perspectives on how to make systems more secure. I think that cross-pollination is going to be very valuable, because at the moment, in industry at large and in academia, these are very siloed disciplines, and we could really use more of it. So I think that's it for this talk, and we'll now take questions.