All right, so let's get started. Hello, OpenStack Summit — I mean, Open Infra Summit. It's nice to be back at the summit after such a long break. We on the Ceph team are very attached to OpenStack, so it's nice to meet you all in person again. My marketing manager has a long history of saying that you have to introduce yourself, otherwise it's malpractice, so here are the things I worked on in the past. I've had the privilege of spending basically my entire career in open source, aside from a brief stint in academia. I'm the product management director for the Ceph platform at IBM and at Red Hat; previously I was the Ubuntu Server PM at Canonical, and, if you round out the decade, I was the dreaded systems management czar at SUSE — that was my actual title. I was the maintainer of man for a very long time, and I wrote a book for O'Reilly about AWS, which is why the picture looks funny like that. And I think that's enough about me. So, security. This is a pretty advanced session. We have the obligatory introductory notes, but this is Ceph Day and it's the afternoon — if you don't know what Ceph is by this point, see me afterwards for the CliffsNotes. Similar thing for Rook: you should have absorbed something about Rook by now, but here the focus is OpenStack, so maybe you didn't. Rook is a Kubernetes operator for Ceph; essentially, it manages Ceph clusters running on a Kubernetes fabric. That's the way you should think of it. 
It's interesting because it automates some of the day-to-day management, or encapsulates multiple options, in ways that are not possible outside of a containerized and virtualized environment. So that's another interesting component that we use all the time. Now let's go to the actual security — let's dive right in. What is security? Each security best practice hardens a specific point of the infrastructure. Cherry-picking practices without a model of the threat and of the attacker you're worried about is not a viable strategy. It's like saying that everything is a priority, which is the same as saying that nothing is. Instead of management-speak that results in everything being a priority, you have to have an actual priority, something that is actually meaningful. Otherwise, you wind up in scenarios like "cover your computer in concrete and bury it at the bottom of the ocean," because you have to stop every possible imaginable thing — which is not very useful, clearly. So, are you facing script kiddies at one end of the spectrum? Or the GRU, the Russian secret service, at the other? These are very different threat models. Some attackers want to steal your data. Others don't care about your data; they just want to encrypt it and hold you for ransom — cryptolock you, as they say now. Others may be satisfied with complete disruption, simply bringing you down; it doesn't matter where the data is or whether you still have it. For these reasons, you have to decide who is in your threat model and who isn't. Typically, the worst-case scenario is the privileged insider: somebody who is inside your security perimeter and has root credentials. If you have that in your security profile, you're going to have some very interesting challenges to deal with. Once you have this, you have the basic framework to make choices. 
And so, let's start from the network. In Ceph, the network is segmented into multiple logical networks. The public security zone is an entirely untrusted area of the cloud. It could be the internet as a whole, or just networks external to your cluster that you have no authority over. Data transmissions crossing the public security zone should make use of encryption. Note that the public zone, as I just defined it, does not include the storage cluster front-end — Ceph's public_network. That's an unfortunate naming, but it is a different thing: it defines the storage front-end and properly belongs in what is called the storage access zone. Next, the Ceph client zone refers to networks accessing Ceph clients — the object gateway, for example, the Ceph file system, or block storage. Ceph clients are not always excluded from the public security zone; for instance, it is possible to expose the object gateway's S3 or Swift APIs in the public security zone, and that's exceedingly common. Next is the storage access zone: an internal network providing Ceph clients with access to the storage cluster itself. And finally, the cluster zone refers to the most internal network, which provides storage nodes with connectivity for replication, heartbeat, backfill, and recovery tasks. This zone includes the Ceph cluster's back-end network, called the cluster_network in Ceph. Operators often run cleartext traffic in the cluster zone because they rely on the physical separation of this network — or on logical separation, usually a VLAN at the very least. This, going back to my previous comments, would not be a valid choice if, for example, the privileged insider is in your threat model, because the privileged insider very likely has the keys to the cabinet and can go in and tap this network. 
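Concretely, the storage access zone and the cluster zone map onto two settings in the cluster configuration; a minimal sketch, where the subnets are made-up examples:

```ini
# ceph.conf fragment: split front-end and back-end traffic.
# The subnets below are illustrative placeholders.
[global]
# Storage access zone: clients reach the MONs and OSDs here.
public_network = 192.168.10.0/24
# Cluster zone: replication, heartbeat, backfill, recovery.
cluster_network = 192.168.20.0/24
```

If cluster_network is left unset, the back-end traffic simply shares the public_network, which collapses the two zones into one — a choice that should be deliberate, not accidental.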
But if your threat model doesn't include that, then you can have the performance of a cluster that doesn't encrypt its internal communication — if you're not worried about people putting a network tap inside your rack. So again, everything starts from the threat model. These four zones are separately mapped or combined depending on the use case and the threat model in use. Now, components spanning the boundary of two security zones — which in Ceph means daemons — with different trust or authentication requirements must be carefully configured. These are natural, maybe not weak points, but natural points of attack when someone is trying to escalate privileges: you've got access to one network and you're trying to hop into a more privileged one. So the crossover points should always be configured to meet the requirements of the zone with the highest security requirements — the higher level of trust, if you want to think about it in those terms. In many cases, security controls should be a primary concern here, due to the likelihood of these components being probed, and the possibility that you just fat-finger some configuration and create a passage between two security zones by misconfiguring the way things are set up. Operators should consider exceeding zone requirements at integration points. That can sound like an empty recommendation, because, again, you have to be able to use the technology that you're deploying — but in storage it's actually often a possibility. Storage is a little bit simpler in this regard than compute. So if your storage use case gives you the possibility of ratcheting up the security configuration of the integration points, by all means do it: that's a good hardening practice. For example, the cluster zone can be isolated from other zones easily, because there is no reason for it to connect to other zones. Conversely, an object gateway in the client security zone needs access to a lot of things. 
The monitors, on port 6789; all the OSDs, on ports 6800 to 7300, to access the actual data storage; and it will likely expose the S3 API to the public security zone on ports 80 and 443. So some things you can't close, but there are plenty of things that you can — those are the low-hanging fruit. Now, normally I'm joined by Sage McTuggart, who leads our security team, but they could not be here today. The interesting thing about product security in Ceph is that, as you heard yesterday, we have transferred Ceph internally across the IBM and Red Hat boundary: we moved the Ceph team from Red Hat to IBM. So now we are looking at IBM product security, not Red Hat's, which is slightly different — if possible, even more paranoid, is my impression. From the point of view of the customer, that is good. You may get more patches, although it matters less because the deployment model is containerized: when you get one container, how many security fixes have been addressed inside the container is really not that relevant. Potentially — if it doesn't slow us down — we'll fix things that in the past we would not have considered relevant because they were not reachable in a storage system. The canonical example is a vulnerability in the printer driver in RHEL: how does that matter when you're running storage on top of it? IBM tends to be very strict about what is in the images, so we've become stricter about purging the images of dependencies that we're not using. It's fairly easy to just uninstall those things that are in the stock RHEL image we start from. And we're probably going to be shipping more fixes for things that are rated medium or lower in the CVEs. I don't think it changes things much, but it's better or the same, right? So that's probably the most visible change. Let me see — the new team is called Product Security. 
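The crossover ports listed at the start of this section can be locked down with a host firewall; a sketch using firewalld, where the zone names are made-up examples and this is not a complete ruleset:

```shell
# Illustrative firewalld rules on a Ceph node; zone names are examples.
# Monitors: msgr1 on 6789, msgr2 on 3300 by default.
firewall-cmd --zone=storage-access --add-port=6789/tcp --permanent
firewall-cmd --zone=storage-access --add-port=3300/tcp --permanent
# OSD daemon port range.
firewall-cmd --zone=storage-access --add-port=6800-7300/tcp --permanent
# RGW's S3/Swift endpoint exposed to the public zone over TLS only.
firewall-cmd --zone=public --add-port=443/tcp --permanent
firewall-cmd --reload
```

The point is the asymmetry: the monitor and OSD ports stay reachable only from the storage access zone's interfaces, while only the gateway's TLS port faces the public zone.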
But because we're still getting revved up in terms of what processes we're adding as part of IBM compared to Red Hat, we're trying not to say things that then don't happen later — so bear with us as we figure it out. The general line is that it's going to be the same or better in terms of security. Generally speaking, the standard for Red Hat is that vulnerabilities are addressed if they're exploitable; otherwise, Red Hat tries to minimize the number of fixes that ship, to reduce the noise operators have to manage in the constant stream of updates. As I already mentioned, IBM takes a different tack, which tends to be: if a patch exists, ship it. That is probably the most visible difference. We are getting more disciplined about doing penetration testing and regular scans. I don't think I can tell you how often just yet, but it's more often than in the past, which is nice. And of course, nothing changes in terms of our relationship with upstream: everything that we fix goes upstream first, and any CVE that we address is available in the community project, just as it is in the products. So let's move on to encryption. Server-side, operators overwhelmingly choose to encrypt data at rest using the Linux LUKS mechanism. All data and metadata of a Ceph storage cluster can be secured using a variety of dm-crypt configurations, and nearly all of the existing Red Hat customers choose to. I stopped tracking how popular at-rest encryption was in 2015, because at that point it was already overwhelmingly over 50%. It's a given that at rest the data is encrypted — the OSD is encrypted, period. We're at the point where replacing encrypted drives is a single operation in cephadm, and a very simple workflow in the dashboard, so all the configuration steps that need to take place when you are taking a dead drive out and putting a new one in are handled for you. 
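With cephadm, turning on at-rest encryption is a single attribute in the OSD service specification; a minimal sketch, where the service_id and the all-devices placement are illustrative assumptions:

```yaml
# osd-spec.yaml: cephadm OSD service spec with dm-crypt/LUKS at-rest encryption.
service_type: osd
service_id: encrypted-osds    # example name
placement:
  host_pattern: '*'           # example: every host in the cluster
spec:
  data_devices:
    all: true                 # example: consume all available devices
  encrypted: true             # provision OSDs on dm-crypt/LUKS volumes
```

Applied with something like `ceph orch apply -i osd-spec.yaml`; from then on, newly provisioned OSDs come up encrypted without further per-drive work.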
We also have a security best practice of locating the monitors on separate hosts from the storage daemons. This is because the at-rest encryption keys are kept in the monitors, so with that anti-affinity, if somebody steals one machine of your cluster, they either have the keys or they have the data, but they don't have both. They would need to take two machines — and the right two machines — to get both. This obviously is not possible at the edge, where you overlay everything on three nodes, but for normal Ceph clusters with seven nodes, 11 nodes, or hundreds of nodes, this very much makes sense. And there is one other thing I get asked about all the time — okay, actually I get asked only by the Washington, DC crowd, but since this is the OpenStack crowd, you're on the same technical level: what happens with the keys that are being kept by the monitor? How are those secured? Those are at rest on the file system of the monitor, so the file system of the monitor should itself be encrypted — the Linux partition should be encrypted. And then it's in the hands of however you secure the boot-up of your machines, right? In your data center, you've already solved the problem of how to authenticate boot-up, and it applies here as well. Encryption-wise, the object store gateway has additional capabilities, including encryption at ingestion time: you can have RGW encrypt data as it comes in, rather than LUKS encrypting it as it gets stored on the file system. The difference there is a little bit academic. The interesting part is this: with data encrypted at rest, the keys are managed by the cluster operator, they're centrally managed, and they are per-machine keys. With encryption done at RGW, the keys are per-user keys and they're managed by the user. That's the biggest, most relevant difference. Key rotation with tools like Vault is supported in RGW. 
There is support for Amazon's SSE-S3 and SSE-KMS, and there are more things coming. Further, you can use Department of Defense-certified cryptography under FIPS 140-2 — and, we hope, FIPS 140-3 this year — as supplied by RHEL in the version that you're deploying. If you use RHEL, you know there is a mode you can install RHEL in called FIPS mode. It only gives you Department of Defense cryptography, and Ceph very strictly uses only crypto from the operating system, so at that point Ceph is also using certified crypto. Encryption in transit now. For encryption in transit, network communication can be secured by turning on the Ceph protocol encryption in the Messenger v2.1 protocol or later — Messenger v2 being the protocol introduced with Nautilus. Now, here it's different from at rest: network communication in cleartext may be fine. The extreme case was the one I gave you earlier — the internal network that the cluster nodes use to replicate data is very often physically secured with its own NICs. So if your threat model doesn't include someone who has the keys to the cabinet, there is no need to purchase additional CPUs and RAM just to encrypt traffic between the nodes themselves. The most common scenario where that kind of communication gets encrypted is corporate policy. We have some customers — I believe one is in France, I think due to a French regulator, and a couple of others, mostly in Europe — that are adopting policies saying we encrypt everything, no matter what. So, from the point of view of the operations team, if what is happening between the Ceph nodes in one cluster is regarded as network communication, the operations team may decide that it's actually better to buy a few more CPUs and a little bit of RAM rather than slugging it out with their security team for an exception. So most of the network encryption that we see for the Ceph protocol comes from that kind of compliance reason. 
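Turning on wire encryption amounts to selecting the messenger's secure mode; a sketch of the configuration I believe applies, to be verified against your release's documentation:

```shell
# Require the encrypted ('secure') mode of Messenger v2 on all connections;
# 'crc' (cleartext with checksums) is the usual unencrypted alternative.
ceph config set global ms_cluster_mode secure    # OSD<->OSD back-end traffic
ceph config set global ms_service_mode secure    # daemons answering clients
ceph config set global ms_client_mode secure     # client-initiated connections
```

Splitting the setting three ways is what lets you encrypt only the client-facing paths while leaving the physically isolated cluster zone in cleartext, per the threat-model discussion above.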
But there are plenty of cases where it's actually necessary for practical security reasons. CephFS talks to the entire set of OSDs; RBD talks to the entire set of OSDs. If the clients are transiting any network that is less than 100% locked down, it makes sense to encrypt that. In the OpenStack-specific scenario, this has been addressed for years in a slightly different way: in the Nova VM, you deploy dm-crypt to encrypt the file system of the VM internally. Then all the communication from the virtual disk of the Nova VM to Ceph is in the clear as far as the Ceph protocol is concerned, but the payload is dm-crypt-encrypted data. It's perhaps not ideal, because you're using at-rest encryption for in-flight data — some cryptanalyst may rap my knuckles — but in general, that is how OpenStack has done it all this time, and it's worked perfectly fine. However, if you want, you can replace dm-crypt at rest in the VM and use Messenger v2 encryption instead. It's logical to use dm-crypt in the VM, because you want to encrypt the VM at rest anyway, so you kind of get two birds with one stone there — or with one scone, as I think is the new way to say it — by encrypting both the VM at rest and, effectively, the network payload. But that is very OpenStack-specific. How much does this encryption cost is the other question. If you're going to encrypt the in-flight protocol, you have to size the cluster — RAM and especially CPU — to account for these overheads. In most cases, the performance impact is not that significant. You have to account for it in the cluster architecture, but in terms of user-visible performance, you're going to see the same performance, because the added latency is not significant enough to avoid being overshadowed by the latency of network communication. So we usually don't see other slowness; you just need a little bit more hardware when you design the cluster. 
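In OpenStack, that dm-crypt-in-the-guest pattern is typically set up through an encrypted Cinder volume type; a sketch with python-openstackclient, where the type name, cipher, and sizes are examples to check against your release:

```shell
# Create a volume type whose volumes are attached through dm-crypt/LUKS
# on the compute host (names and parameters are illustrative).
openstack volume type create LUKS
openstack volume type set LUKS \
  --encryption-provider luks \
  --encryption-cipher aes-xts-plain64 \
  --encryption-key-size 256 \
  --encryption-control-location front-end
# Volumes of this type are encrypted before the payload reaches Ceph.
openstack volume create --type LUKS --size 10 encrypted-vol
```

With `front-end` control location, encryption happens at the hypervisor, which is exactly why the RBD traffic carries only ciphertext even though the Ceph protocol itself is in the clear.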
Things vary between large block sizes and small block sizes, but that's the general heuristic. All right, looking at more specific protocols: the S3 service is usually secured between RGW and the S3 client with TLS on port 443, obviously. It is also possible to serve plain HTTP on port 80, if for some reason you want to do that — I guess for some services that may be desirable; maybe you don't want to encrypt public images or something like that. The interesting bit is that TLS termination at HAProxy is a special case: TLS ends at HAProxy, and then HAProxy needs to talk to RGW, and that hop is in the clear. You have to account for that in your security model. The other generally applicable point is standard practice like maintaining the firewall of all your nodes — not just the firewall around the cluster, but individually firewalling the nodes so that they don't expose anything that is not necessary to expose. That is an obvious best practice and should absolutely be followed. Now, Rook-specific matters. This is a little bit of Kubernetes bombardment, so I apologize if you're not too familiar with that, but it's only one slide. Rook uses CRDs, custom resource definitions, to encode many settings, like configuring trust certificates for the RADOS gateway web server. Rook supports at-rest data encryption, as we discussed earlier, with in-flight Ceph protocol encryption management being added in 1.9. It's essentially a management tool, so you usually see a little bit of lag between Ceph versions and support for them in Rook. The Kubernetes user permission system applies to PVs, Kubernetes persistent volumes — so permissions, quotas, and all the other accoutrements that come with Kubernetes storage management are all there; Rook doesn't need to do anything for that. Rook also supports key management systems through the CSI interface, allowing individual volumes to be encrypted with their own key. 
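As an example of that CRD-driven configuration, here is a sketch of a Rook CephObjectStore serving S3 over TLS — the names, pool sizes, and the Kubernetes secret holding the certificate are all assumptions:

```yaml
# Rook CephObjectStore with a TLS-enabled gateway; names are illustrative.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store              # example
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPool:
    replicated:
      size: 3
  gateway:
    securePort: 443           # TLS listener
    sslCertificateRef: my-store-tls   # secret with cert+key (example name)
    instances: 2
```

The certificate lives in an ordinary Kubernetes secret, so rotation goes through the same Kubernetes machinery as any other secret rather than through Ceph itself.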
This helps a little with managing key rotation and revocation, and with limiting the scope of each key. Control plane — how do you manage all this? This is very standard. As popularized by Ansible, SSH is used by cephadm; so cephadm, ceph-ansible, and other deployment or day-one tools tend to be SSH-centric, which is something everybody tends to be very familiar with. These provide the paths for install and upgrade management as part of host management. Then there is the management dashboard, where there is another interesting decision: you have to decide what network you're placing the dashboard on. You can place the dashboard on a fairly secluded network where only the Ceph administrators can reach it, or you can put it on a more public network where it's more accessible within your corporate infrastructure. Different users choose differently; it's one of the decisions you must make. Ultimately, it needs to be reachable from the operator's workstation to be useful, right? But where the operator workstation is varies depending on who the customer is. The manager is the daemon that provides a significant number — mostly all — of the functions of the dashboard, so it needs to be reachable for the dashboard to be accessible. All right, more on identity and access. CephX is the internal identity system that Ceph uses. Its use of shared secret keys protects clusters from man-in-the-middle attacks by default, but good practices still apply. Good practice is to grant keyring files read and write permissions only for the current user and root, with the client.admin keyring restricted to root only — that limits which users can act as the administrator, essentially. Talking about RGW: RGW supports the key-and-secret model of AWS that most of us are already familiar with, so nothing too surprising there. 
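A sketch of that least-privilege practice — the client name and pool are made-up examples:

```shell
# Create a CephX identity scoped to a single RBD pool (names are examples).
ceph auth get-or-create client.app \
    mon 'profile rbd' osd 'profile rbd pool=app-images' \
    -o /etc/ceph/ceph.client.app.keyring
# Keyrings readable only by their owner; the admin keyring only by root.
chmod 600 /etc/ceph/ceph.client.app.keyring
chown root:root /etc/ceph/ceph.client.admin.keyring
chmod 600 /etc/ceph/ceph.client.admin.keyring
```

A compromised application host then yields only a key for its own pool, not the ability to impersonate client.admin.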
It also supports the security model of OpenStack Swift, as it needs to be a drop-in replacement for Swift. Of course, the administrator's key and secret need to be treated with the appropriate respect — no big surprise there. Use full-privilege keys — administrative keys, if you want to call them that — sparingly, to reduce your risk profile. RGW's user data is stored in Ceph pools, so it should be secured as we discussed previously for data at rest; you just have to consider this user data too, so be aware of it. You can integrate your organization's identity providers to supply identities to RGW, so that you don't need to create a whole set of users just for RGW — you can obviously import the users from your corporate identity vault. On identity and access, we support LDAP and Active Directory users, and we recommend secure LDAP. And obviously, in the OpenStack context, we support Keystone for OpenStack clouds. On auditing: a lot of operations are audited. There shouldn't be anything touchy in there, but it is considered best practice to regularly purge your logs so that you don't leave too much data behind. You should aggregate your logs in a central location — rsyslog or Splunk or whatever it is — and purge them locally, out of an exceeding level of paranoia. Data retention is an interesting one. Once data is deleted from a Ceph cluster, you generally cannot do anything with it. However, there are exceptions. RBD has a trash bin, where dynamic use of spare capacity can retain deleted images for a certain number of days or until the space is reclaimed. RGW has versioned buckets, so you retain data as part of previous versions of objects that you have deleted. If you don't want to retain data inadvertently, you have to configure these things so that they don't keep data around that you don't want to keep. 
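If unintended retention is a concern, the RBD trash bin is worth auditing; a sketch of the relevant commands, where the pool and image names are examples:

```shell
# Deleted-but-retained images sit in the trash until expiry or purge.
rbd trash list app-images               # see what is still recoverable
rbd trash move app-images/old-volume    # move an image to the trash
rbd trash purge app-images              # irrevocably drop expired entries
```

The same review applies to RGW: check which buckets have versioning enabled and what their lifecycle rules do with old object versions.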
Additionally, as on almost any storage medium, the individual data blocks that used to constitute an object are still there on the disk platters, and you certainly cannot overwrite a Ceph cluster by writing a lot of data to it. It's not going to work — at least not reliably, which is what you'd need. So the sanitization, or secure deletion, practice is: use hard drive encryption, as we discussed before for other reasons, and when you want to sanitize the drive, you throw away the key. The fact that there are now drives that do this in hardware makes things even easier — you don't even need to manage the keys yourself. Hardening options are very vendor-dependent. These are the Red Hat options, and consequently also the IBM options; others may vary. Now, we're not going to go into a discussion of GCC and kernel security options, but if you're lucky enough not to know what these things are, you can learn more from the documentation of the Ubuntu kernel team — they have a very nice table of all of them. Once you know what they are, you can decide whether you need them or not. For most people — well, maybe not, this is OpenStack after all — but for most people, we don't go out compiling new binaries; we don't have the time to do that. But you should be aware of what the security level is, so go and study that. Now, that is it. There is some further reading available for you, covering the things we haven't covered. Kubernetes secrets are always a hot topic; Hacking Kubernetes from O'Reilly and a tutorial by Rani Osnat at Aqua Security cover that in detail. Our products, Red Hat Ceph Storage and IBM Storage Ceph, have a dedicated hardening guide that expands in long form on what I've been describing. The Kubernetes documentation has a very nice treatment of encrypting data at rest, and again, those mysterious GCC options for the hardening of binaries are also documented. 
These are all the people that contributed to the presentation so far and that's it for today. Are there any questions?