All right, everybody. It's 3:40, so I'm going to get started. How are you doing? My name's Jason Fritcher. I'm a principal infrastructure engineer in Symantec's Cloud Platform Engineering group, and I'm currently in the middle of rolling out Barbican for a lot of services we're using in our OpenStack cloud, and I just wanted to share how we're using it. I've worked in the internet industry for almost 20 years, mostly in development and operations roles running internet-facing services. I've always had an interest in security, kept an eye on developing trends in the security field, and have been one of the people that others on my team come to for advice and reviews and whatnot. In my free time, I enjoy playing with hobby electronics, video games, and riding my motorcycle.

This is the agenda I've put together. For those in the audience who may not be familiar with Barbican, I'm going to give a quick five, maybe ten-minute introduction to it. Beyond that, I'm going to discuss what kinds of problems we have at Symantec and how we're using Barbican to solve them. I'll then talk a little bit about what we've done to improve the out-of-the-box configuration of Barbican and make it a little more secure, and cover some improvements that Symantec is working on in Barbican. And then, if things go well, we'll have about ten minutes for Q&A.

So, an introduction. Back in the old days, this is what a barbican was: a defensive structure outside a castle or city. It was a very effective defense until about the 15th century, when more advanced siege tactics and artillery became prevalent and barbicans fell out of use. Today, Barbican is a REST API designed for the secure storage, provisioning, and management of secrets, such as passwords, encryption keys, and TLS key pairs. Barbican basically has three types of resources that it manages.
You have secrets, which actually contain the data you want to protect. These are typically small bits of arbitrary data; Barbican really doesn't care what you put into it, but out of the box it has a 10-kilobyte limit on what you can store. You don't want to use this as a large encrypted object store, because it's really not meant for that. If you have large objects you want to encrypt, encrypt them, throw them into Swift, and then store the key in Barbican.

Barbican also has containers. These are basically logical groupings of secrets. So if you store a TLS key pair, the container will have a reference to a secret with the private key and a reference to a secret with the certificate. This gives you one unit you can fetch, which gives you the URLs for the rest of what you need.

The last resource type is orders. These are longer-running tasks, like when you ask Barbican to generate a key for you. Asymmetric key generation can take a long time if your system doesn't have a lot of entropy, and you really don't want a client hanging around waiting for a request that could potentially take minutes to complete. So you submit an order telling Barbican what you want, and then you can poll Barbican to find out when it's done and get the result back.

There are two basic components to Barbican. There are the API processes, which are basically the front door to the system. All client interaction with Barbican occurs through the API processes. These processes also completely handle all of your secret and container requests: whether you're storing secrets or fetching secrets, the API processes handle it all. When you make requests for orders, the API processes take those requests, stuff them in the database, and then send the request off to an asynchronous worker for further processing, which brings us to the second type, the worker processes.
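As an aside, to make that resource model concrete: a client pulling a stored TLS key pair back out might look roughly like this. It's a sketch against the Barbican v1 REST paths; the endpoint URL, token value, and container ID are made-up placeholders, not anything from this deployment.

```python
import json
import urllib.request

BARBICAN = "https://barbican.example.com:9311"   # hypothetical endpoint
TOKEN = "replace-with-keystone-token"            # placeholder auth token


def tls_refs_from_container(container):
    """Map the named secret refs in a container to their secret URLs."""
    return {ref["name"]: ref["secret_ref"]
            for ref in container.get("secret_refs", [])}


def _get(url):
    req = urllib.request.Request(url, headers={"X-Auth-Token": TOKEN})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


def fetch_tls_pair(container_id):
    # One fetch for the container, then one per secret payload.
    container = json.loads(_get(f"{BARBICAN}/v1/containers/{container_id}"))
    refs = tls_refs_from_container(container)
    key = _get(refs["private_key"] + "/payload")
    cert = _get(refs["certificate"] + "/payload")
    return key, cert
```

The container fetch returns only references, which is the "one unit you can fetch" idea from above; the payloads themselves are separate round trips.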
They do the longer-running tasks behind the scenes, and when they're done, they update the database with the results so the API nodes can see them. Again, those longer-running tasks are asymmetric key generation and symmetric key generation, and it's also how certificate management is implemented.

Here are some potential dependencies you can have with Barbican. If you want to do user authentication or authorization, out of the box you have a Keystone plugin. If you need something more scalable than SQLite, which is the default configured database, you can interface with Postgres or MySQL or any of the other databases popular in OpenStack. If you have standalone worker processes, you need a message queue for the API nodes to talk to the worker processes; in OpenStack's case, that's RabbitMQ. And if you want true security in your system, you need hardware security modules. These basically provide a secure environment for cryptographic operations and key storage.

So, moving on to how we're using it. Some of the problems we're having are, I'm sure, like a lot of organizations': you'll find secrets stored in your version control system, be it credentials for scripts to access systems, API keys for API services, or any number of other secrets that you might find in programmer code. In addition to that, there's the question of how you store TLS keys. How many of you here have a central machine that you go to to generate your certificates and key pairs, and then that's the only copy you have other than what's in service? Or a developer who generates that stuff on their local machine, and that's the original copy? Barbican can help solve that. Another problem you might see is encryption keys. If you're doing per-object encryption, how do you handle the keys? Where do you put them to store them securely until you need to use those objects again?
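The dependencies mentioned above (database, message queue, Keystone) all come together in Barbican's config file. The fragment below is only an illustration: exact option names and sections vary between Barbican releases, and every hostname and credential here is a placeholder.

```ini
# barbican.conf (illustrative fragment; check your release's sample config)
[DEFAULT]
# Something more scalable than the default SQLite:
sql_connection = mysql+pymysql://barbican:secret@db.example.com/barbican

# Message queue for API -> worker traffic (RabbitMQ in OpenStack's case):
transport_url = rabbit://barbican:secret@mq.example.com:5672/

[keystone_authtoken]
# Keystone middleware for user authentication/authorization:
auth_url = https://keystone.example.com:5000/v3
```

If you run without standalone workers, the message queue piece drops out; the database and Keystone pieces are the common baseline.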
On top of all those problems, automation is the keyword of the day. How do you enable automation in your deployments and deploy secrets like service account passwords in a machine-accessible form, so a human doesn't have to go and put them in? And finally, it'd be nice to be able to automate more of the certificate management lifecycle. The capability is not there today, but it'd be nice to have a system where you store a TLS key pair and the system notices, hey, this expires in two weeks, how about I go renew it? And then, after getting a signed certificate back, it notifies the service, hey, I have a new key for you, and it's deployed automatically.

The use cases we're looking at: our LBaaS team really wants to use Barbican for TLS provisioning. We want other Symantec customers to be able to come in through some interface and say, hey, I need an LBaaS instance, it needs to support SSL, and here's the certificate and key pair for it. Then the LBaaS system can stash the key pair and certificate in Barbican, provision the LBaaS instance, and that instance fetches the keys for its provisioning. This also enables future scaling, in that if you need to scale up your LBaaS instances, the keys are already there in Barbican to provision from.

We also have, in our configuration management system, tying back to version control, service account passwords and API keys we'd like to get out. We'd like to pull that stuff out of our Puppet manifests and Chef recipes, stuff it into Barbican, and then have a plugin for those systems so that at runtime they can fetch those secrets for deployment on the systems. And finally, we're looking at deploying key management as a service for different product groups within Symantec.
They've been really interested in this service because there are products that really want HSM-backed security, but they don't want to have to manage HSMs themselves in the data center, HSMs being hardware security modules. So Barbican is going to help out there, in that our group will manage that stuff and they get a stable API to interact with.

For how we've deployed things: we've got a Galera MySQL DB cluster that handles all of our data storage needs. This is really nice because it gives us a multi-master topology where we don't care which database node we write our data to. We just write it, and then the clustering software takes care of replicating it around to all the other cluster members. This gives us really good replication support, and it also enables us to do cross-data-center replication, so that we can have one consistent Barbican view across data centers. This is particularly important because one of the things we're looking at doing is, if we hit an issue with Barbican in one data center, we can just fail over to another data center, and the client shouldn't notice anything other than some increased latency.

We've got a RabbitMQ cluster handling all of our messaging needs. We're using SafeNet Luna hardware security modules for the root of our security trust; I'll get into more on those later. And for the actual API nodes, we're basically running Barbican in a uWSGI container, with Apache and mod_proxy_uwsgi in front of it. We wanted Apache up there so that we had a convenient place to insert filters into the request pipeline, and we also offload all of the TLS processing to Apache itself. One of the things we're considering is inserting mod_security into that pipeline and writing some web application firewall rules for Barbican, so that at the Apache layer we can detect bad requests and reject them without ever passing the request back to Barbican. This also provided a nice point to insert metrics emission.
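The Apache front end just described might look something like the fragment below. This is a sketch, not the actual production config: the hostnames, ports, and certificate paths are placeholders, and the uWSGI port is assumed.

```apache
# Illustrative Apache vhost: TLS offload plus mod_proxy_uwsgi to Barbican.
<VirtualHost *:9311>
    ServerName barbican.example.com

    # All TLS processing happens here, not in Barbican itself.
    SSLEngine on
    SSLCertificateFile    /etc/pki/barbican/server.crt
    SSLCertificateKeyFile /etc/pki/barbican/server.key

    # mod_proxy_uwsgi speaks the uwsgi protocol to the app container
    # (assumed to be listening on localhost:8080).
    ProxyPass / uwsgi://127.0.0.1:8080/

    # This pipeline is also where mod_security WAF rules or request
    # filters could be inserted later.
</VirtualHost>
```

The design point is that anything you bolt on at this layer (WAF rules, metrics, filters) needs no changes to Barbican's own code.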
So we generate metrics for Barbican transactions at that layer, and we didn't have to dig into the Barbican code to enable metrics emission.

Moving on to some of the hardening we've done. One of the first things I looked at was how to secure database access, and in a way, MySQL made that hard, because their TLS code sucks. No offense to any MySQL people in the audience. Enabling TLS within MySQL is pretty easy. On the server side, you just insert the ssl-key and ssl-cert directives into your my.cnf file. After this, MySQL will advertise to clients that it has TLS support. If you want to ensure that TLS is always used, add a REQUIRE SSL clause to your user account, and the server side will reject connections that don't come in over TLS. On the client side, activating TLS is as simple as adding the ssl_ca parameter to your SQL connection string. This enables the client to validate the MySQL server's certificate.

But as I said, there are challenges here. Basically, MySQL does not implement intermediate certificates properly. If you put an intermediate certificate into the PEM file on the server side, MySQL won't read it and won't send the intermediate to the client, so you won't get a proper certificate chain on the client side for validation. You need to add your intermediate certs to the CA PEM so that the client can build a trust path. In addition, in the MySQL client libraries, the hostname validation is extremely poor and naive. I ran into an issue deploying it where, Symantec being a CA, we use EV certs for everything, and the subject DN in an EV cert is longer than the buffer the client uses for parsing. Basically, it expands the DN into a text string, searches it for "CN=", and extracts the hostname after it. We overflowed that buffer and truncated the string, and hostname validation failed. So it's something to watch out for.
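Put together, the server-side pieces described above look roughly like this; the paths are placeholders, and note the intermediate-certificate gotcha captured in the comments.

```ini
# my.cnf, server side (illustrative fragment):
[mysqld]
ssl-cert = /etc/mysql/server-cert.pem
ssl-key  = /etc/mysql/server-key.pem
# Gotcha: MySQL won't serve intermediate certs appended to ssl-cert.
# Ship the intermediates in the CA bundle that clients validate against
# instead, or the client can't build a trust path.
```

On the account side, something like `GRANT USAGE ON *.* TO 'barbican'@'%' REQUIRE SSL;` (or `ALTER USER ... REQUIRE SSL;` on newer MySQL) makes the server reject plaintext logins for that user, and the client turns on validation by adding an `ssl_ca` parameter pointing at that CA bundle in its connection string.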
Also, depending on your mix of OpenSSL versions, you could run into issues with Diffie-Hellman key sizes. A recent version of OpenSSL added code to reject all connections that use Diffie-Hellman keys below a certain size, which MySQL's are. If you run into this, basically the only fix is to specify a cipher that doesn't use Diffie-Hellman for key exchange. It sucks, but that's the way it is.

Moving on from there, we did some work on hardening the Barbican nodes. With the type of data Barbican holds, you really want to restrict access to these machines to as small a pool of people as possible, so you can actually vet everyone who has access to all this potentially sensitive data. Depending on your security needs, it could be as simple as restricting SSH access. If that's all you need, then the AllowUsers, AllowGroups, DenyUsers, and DenyGroups directives in the sshd_config file should be sufficient. If you're looking for broader restrictions, say you don't trust your data center personnel and don't want them logging into these machines on the console, then you could look at the pam_listfile or pam_access modules to enable system-wide restrictions, covering console access, X access, and anything else that uses PAM to authenticate.

Automation brings challenges as well. If you use automated systems like Puppet or Chef, that opens up the scope of who has access to your nodes to anyone who can modify your Puppet or Chef code. If you don't have fine-grained controls on who can commit changes to those repositories, you may need to just disable automation altogether, which is the route I took. If you're not using community packages, be aware of file system permissions. Really, you want all of the Barbican code and configuration files to be owned by some user other than the one Barbican runs as.
Really, the only things Barbican should have write access to are a log directory and a temp directory. This is to prevent issues where, if someone manages to get Barbican to execute arbitrary code, they can't then modify Barbican to allow greater access. With read-only access to its own code, an attacker can't do anything to it. And finally, something to look at is host-based firewalling. I would really advise implementing iptables rules to restrict access to the Barbican service to just your load balancers and your monitoring systems. This way, if someone happens to break into your network on an adjacent node, they can't attack Barbican from that node; you force them to go through established paths to get to Barbican, where hopefully monitoring or intrusion detection systems will pick up their activity.

Now we get to hardware security modules. For cryptographic security, these things are the gold standard; you're not going to find much better in the civilian market. Hardware security modules are specialized devices. They provide an environment that is tamper-resistant and tamper-evident, an environment in which you can securely store key material and perform cryptographic operations, and it's extremely difficult for an attacker to remove secrets from them that they're not authorized to get. Start tampering with these, and usually they'll either freeze the cryptographic modules until you reboot them, in the case of minor attempts, or just outright erase themselves in more drastic attempts. They come in different shapes and sizes, all the way from USB thumb-stick-sized devices, which mainly just hold keys and maybe do a little bit of on-chip processing, to PCI cards that you add to your servers, which give you much better performance, to network appliances, which are basically hardened servers with PCI cards in them for the secure processing.
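Circling back to the node-hardening steps above (SSH restriction, read-only code, host firewalling), here is an illustrative sketch of what they can look like on a node. Every group name, path, port, and address below is a placeholder; these commands need root and are meant as a shape, not a script to run verbatim.

```shell
# sshd_config / pam_access: only the vetted admin group gets in.
#   sshd_config:              AllowGroups barbican-admins
#   /etc/security/access.conf (pam_access, covers console/X too):
#                             + : barbican-admins : ALL
#                             - : ALL : ALL

# Code and config owned by root; the barbican service user gets read-only.
chown -R root:barbican /etc/barbican
chmod -R o-rwx,g-w /etc/barbican

# Barbican only needs write access to its log and temp directories.
install -d -o barbican -g barbican /var/log/barbican /var/lib/barbican/tmp

# iptables: only load balancers and monitoring may reach the API port.
iptables -A INPUT -p tcp --dport 9311 -s 10.0.1.10 -j ACCEPT   # load balancer
iptables -A INPUT -p tcp --dport 9311 -s 10.0.2.20 -j ACCEPT   # monitoring
iptables -A INPUT -p tcp --dport 9311 -j DROP
```

The ordering of the iptables rules matters: the ACCEPTs must come before the catch-all DROP for the allowed hosts to get through.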
One thing to note is that performance varies greatly between these devices, and when it comes to raw symmetric performance, the kind Barbican uses, general-purpose CPUs will beat them, but those general-purpose CPUs don't give you the security environment the HSMs do. Another thing to keep in mind is that these are not easily jumped into; there are challenges here. I've run into this myself with my production deployment, in that data center personnel who are not familiar with these things are very hesitant to put them in their environment. I lost two or three months fighting with our data center people trying to get my devices put in. Finally they caved, but it was a horrible fight to get them in there.

Depending on the security risks for your production devices, you may need to build a physically secure environment within your data center. In our case we are, because our data centers are accessible to third-party vendors, contractors, and such, and it would be a shame for someone with data center access to be able to walk in, pull a hard drive out of my Barbican box, and have the credentials to everything else they would need to steal all the data. It would also be a shame for someone to be able to walk in and tamper with the HSMs, or potentially just walk out with one. But building that secure environment comes with challenges of its own. You want it to be secure, so you end up having to implement auditing and compliance processes and policies to ensure it remains secure. And when the auditors come knocking and ask you to prove that your environment is secure, you'd better have the evidence on hand that it is. Something else to consider with these things is that you need a good separation-of-privilege model. In the ideal case, you don't want your HSM administrators to have access to your Barbican servers or to the database.
Because if one person has access to all of these things, that one person could dump your database, get credentials to the HSM, decrypt everything in there, and walk off with all your sensitive data. You don't want that. Now, granted, that ideal case takes a lot of resources and knowledge, so it might not always be possible to have ideal separation. I know in our case we don't yet have the three-way separation we should, but luckily we have another team within the company that's going to manage all of the production HSMs for us, because we already have a whole lot of them managing our CA business, so it was a good fit there. Now, if we could find a way to get DBAs that don't have server access, that would be great.

So let's move on and talk about some improvements we're working on, the first being performance improvements in Barbican's PKCS#11 module. The module that's in there right now has a lot of room for performance improvement. There are a lot of round trips to the HSM per transaction. A lot of this is because it opens a new session, authenticates that session, does what it needs to, and then closes that session for every transaction. And it's not caching things that could be cached. So part of what I'm looking at is reducing the number of operations against the HSMs by caching data where possible. Barbican implements a multi-layer encryption scheme where you have master keys that live inside the HSM and are protected by it. Each project within Barbican has its own top-level keys, which are then used to protect the data within that project. The way things are currently, every time you go to encrypt or decrypt a secret, Barbican loads the encrypted project key out of the database, sends it to the HSM to decrypt, and then sends the data into the HSM to operate on.
Doing this over and over and over is really not efficient. So we're looking at caching those project keys such that, after they're used the first time, we leave the decrypted key in the HSM and just hold a reference to it for future use. As part of the caching, we're also looking at holding a session open so that we don't have to make authentication requests for future sessions. In PKCS#11, sessions inherit the authentication state of already-open sessions, so by having an authenticated session already open, any new sessions we create inherit that state and we don't have to log in again. That saves some time. Later on down the road, once we can do some profiling and performance testing on this and ensure there aren't any resources leaking on the HSM side, we can reduce that down to a single persistent connection to do everything through, and that'll save some more overhead. The community does have an effort going on where they're looking into improving these things as well, and I'm looking, here at the summit, to sync up with them, compare notes, and hopefully reduce this down to just one new, improved module.

The next improvement I'm looking to make is basically classes of service in the PKCS#11 module. The existing module is all or nothing: either you do everything through the HSM or you don't use it at all. And this becomes a bottleneck if you need high transaction rates. I haven't been able to generate performance numbers of my own, but in previous conversations with people, the current module does not scale very well. So if you want to scale this out in the current model, you need to throw money at the problem and buy lots and lots of HSMs. I'm sure the companies that make these things would love that solution, because they tend to be expensive, but for our usage, we don't want to throw gobs and gobs of money at them. So what I'm looking to do is basically create a split class of service.
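The project-key caching idea above can be sketched as a toy: pay the HSM unwrap cost once per project, hold a reference to the unwrapped key, and sweep out references that haven't been used recently. The HSM interface here is a stand-in (a plain callable), not the real PKCS#11 bindings.

```python
import time


class ProjectKeyCache:
    """Toy cache of per-project key handles, standing in for handles to
    keys left unwrapped inside the HSM."""

    def __init__(self, unwrap_fn, max_age=3600):
        self._unwrap = unwrap_fn   # stand-in for the HSM unwrap round trip
        self._max_age = max_age
        self._handles = {}         # project_id -> (key_handle, last_used)

    def get(self, project_id, wrapped_key):
        handle, _ = self._handles.get(project_id, (None, None))
        if handle is None:
            # Only the first use of a project pays the HSM round trip.
            handle = self._unwrap(wrapped_key)
        self._handles[project_id] = (handle, time.time())
        return handle

    def sweep(self):
        """Expire handles not used within max_age seconds."""
        cutoff = time.time() - self._max_age
        self._handles = {p: (h, t) for p, (h, t) in self._handles.items()
                         if t >= cutoff}
```

The `sweep` method reflects the capacity concern raised later: if the HSM can only hold so many unwrapped keys, periodically expiring idle handles bounds the cache.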
The default class would be exactly the same as it is right now, so that we don't have default behavior differences from the existing module. But what I want to implement is a second class of service that relaxes the encryption standard. Not everything needs the full level of security that the HSM plugin provides. What I'm proposing here is a hybrid between the PKCS#11 module and, basically, the simple crypto module that's in there. The simple crypto module is the one you get out of the box. It's a development module, and it's not secure, because the master key lives in a configuration file in plain text, so don't put anything sensitive in there. What I'm proposing is to take a lot of what's in simple crypto but, instead of having the master keys in a configuration file, use master keys in the HSM. They would continue to protect the project-level keys the way they do now, but instead of unwrapping a key for use inside the HSM, the HSM would decrypt the project key and hand the plaintext key back to Barbican, and then Barbican can do secret-level encryption on the API node itself. Combined with the caching from the previous change, this will take a lot of load off the HSMs and enable high transaction rates for services that don't need the top tier of security. I would love to contribute this back to the community, and it's something else I hope to get feedback on while I'm here. We haven't started implementing this functionality yet, so now's the perfect time for feedback.

And finally, we've got one other improvement we're researching at this point, and that is that Barbican doesn't have a whole lot of database integrity protection. The encrypted data that's in there, so your encrypted secrets and the encrypted project keys, has integrity protection on it, but all of the metadata associated with those objects, and all of the other data in there, does not.
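To make the hybrid class of service above concrete before moving on: the division of labor is that the HSM decrypts the project key once, and the API node then does the per-secret encryption itself. The sketch below shows only that flow; the cipher is a deliberate stdlib placeholder (a SHA-256 counter-mode keystream plus an HMAC tag) standing in for a real authenticated cipher like AES-GCM, and is not proposed as the actual algorithm.

```python
import hashlib
import hmac
import secrets


def _keystream(key, nonce, n):
    # Placeholder keystream; in a real build this role is played by AES.
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def encrypt_on_api_node(project_key, plaintext):
    """Secret-level encryption done locally, using a project key the HSM
    has already decrypted and handed back (the proposed hybrid class)."""
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in
               zip(plaintext, _keystream(project_key, nonce, len(plaintext))))
    tag = hmac.new(project_key, nonce + ct, hashlib.sha256).digest()
    return nonce, ct, tag


def decrypt_on_api_node(project_key, nonce, ct, tag):
    expect = hmac.new(project_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("integrity check failed")
    return bytes(a ^ b for a, b in
                 zip(ct, _keystream(project_key, nonce, len(ct))))
```

The point of the split is visible in what's missing: no HSM call appears in the per-secret path, only in obtaining `project_key` in the first place.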
So if an attacker can manage to get write access to your database, they could go in there, play with the metadata, and do fun things like take an expired key and make it not expired anymore. Or, if you're using per-secret ACLs, take the encrypted content of a secret they don't have permission to access and swap it into a secret in the same project that they do have access to, and now they have the plaintext of a secret they shouldn't. This is a challenging problem, though, because we don't want to kill performance just to add integrity protection, and there are a lot of other hurdles that go along with it. For example, every time you make a schema change in the database, you potentially invalidate all of your stored signatures. So we need to find a way to add this integrity protection and have a migration path in the face of database schema changes. I'm open to feedback from anyone in the community who wants to provide ideas here, and we'll keep the community up to date on what we find.

That's it for my improvements. We're going to move on to Q&A now and see what you've got. There's a mic back there for questions, so that they're picked up on the recording. Anyone?

So I was wondering what objects you're encrypting, and you talked a little bit about your transaction rate. Which ones have the highest transaction rate?

The encrypted objects right now are the same ones we've had in Barbican forever: just the secret data. Those are the only objects that get encrypted in the database. Beyond that, you have your encrypted project keys, but your containers are just collections of references to secrets, and orders are mainly references to other things as well, so there's no encrypted data there. As for transaction rates, we're still in the process of building our production environment, so I don't have real performance numbers myself yet, just numbers that other people have generated. Anyone else?
What do you use for high availability of the REST API endpoints themselves?

For API HA, we've got Barbican behind hardware load balancers. They've worked fairly well, except for health checking, because the entire API basically sits behind authentication, so it's been a challenge to do a real deep health check on Barbican. I've got a service I wrote that kind of works around that by sitting on the side; it knows how to query Keystone for an authentication token so it can make an authenticated request. But that's another improvement I would like to make within Barbican that I haven't actually started working on yet.

You mentioned, in your non-HSM-rooted crypto approach, that you cache the project-level keys in the HSM itself. Are you worried at all about HSM capacity being a limiting factor on how many of those keys you can actually store in the HSM?

It very well could be an issue. At the scale of my deployment right now, it's not, but it'll be a simple matter to add code to the plugin to periodically sweep the cached keys, find ones that haven't been used recently, and expire them.

Okay, thank you.

Going once, going twice? All right, I'll give you seven minutes back. Have fun. Thank you.