Hi, everyone. We will be talking about Keyhouse, a key management system that's ready for production. We work for a company called ByteDance, founded roughly nine years ago. We have a lot of products; the most famous one is TikTok. A couple of words about ourselves. Yu Ding leads trusted computing and secure coding initiatives at ByteDance. He's a big supporter of Rust and maintains Apache Teaclave, an open source universal secure computing platform, and before that he worked on similar problems. Max, who is with us today, is a very quick learner and the primary developer on Keyhouse; without him, this project wouldn't exist. He brings his Rust skills, his general computer science skills, and his security skills to the table. Previously, Max worked on anti-automation problems. And me, Sergei, I primarily work on web security problems, which includes anti-automation, some browser security APIs, and so on. None of us had prior key management system development experience, which makes this all the more interesting.

So what is Keyhouse? It's an open source project, well, soon to be open source, completely written in Rust. At ByteDance, security and privacy are top concerns. A few words on how we started: there was an existing key management system that we had started integrating with ZTI, our new authentication mechanism in the company cloud. While looking at the code, several refactoring initiatives came to our attention, and at some point the critical mass of changes was so big that we decided to rewrite everything from scratch. There were no massive Rust projects in the company at the time, but management allowed us to experiment with Rust for a bit, rewrite some small components, and see whether the performance or other benefits outweigh the steep learning curve. It went so well that within a couple of weeks we had a minimum viable product, and we were so impressed with the performance, scalability, and memory safety guarantees that we got support to implement everything from scratch, 100% in Rust. The project has since matured at ByteDance, and we're deployed in production all across the world now. So we can say this project has internally been a success, and part of our open source initiative is turning it into a success for everybody else.

Thanks, Sergei, for covering why Rust. I'm Max, and I'll be going into some of the design considerations in Keyhouse. Starting off with our core components: everything in Keyhouse is designed around a monolithic end product, in the sense that you ship just one service that provides Keyhouse and then balance it across many pods. It's horizontally scaled, but there's just one category of service. The reason is minimizing the chance of failure, minimizing potential dependency problems, and overall maintaining as much simplicity as possible, because, as it turns out, key management systems tend to be depended on by a lot of critical services. We really need top reliability and redundancy; we just don't have any room for failure, so a lot of our design considerations go with that in mind.
So, as a monolith, we have two halves to our program. We have our data plane, which is what our SDKs talk to, and we have our control plane, which is what you as an operator or administrator would talk to in order to create new keys and do other administrative operations. In general, each component of Keyhouse is designed to be easily swappable at the build phase. We did this with a service-as-a-library approach: you import Keyhouse almost as a library and then fill in whatever implementations you need to integrate with your own corporate infrastructure. The idea being: as ByteDance, if this were somebody else's open source project and we had all these specific infrastructure requirements to fulfill, how could we get that project to mesh with our own internal systems? We tried to take an approach that answers that question both for us and for everybody else, and that's why we landed on this hot-swappable design. So, for example, control plane authorization can be defined by any system — some kind of SSO internal to a company, standard JWT tokens, or identities coming directly from SPIFFE/SPIRE zero trust infrastructure — anything works. On a similar note, the exact crypto algorithm support and even the backend itself are swappable; we'll primarily be talking about etcd here, but you could swap in another backend if your needs differ. And the same goes for a lot of other components of Keyhouse.

Speaking of etcd: if you're not familiar with it, etcd is a backend data store, and we'll talk more about it shortly, but it's ultimately the root of all backend storage for Keyhouse. etcd itself doesn't have support for SPIFFE/SPIRE identities, which is what we use for identity, attestation, and ultimately authorization, so we wrote another project called Spire Proxy, which is a wrapper over etcd that provides that SPIFFE/SPIRE attestation and authorization. We'll talk more about SPIFFE later in the presentation. Generally speaking, Keyhouse will also be talking to an HSM, a hardware security module, as a hardware root of trust; we'll cover that more later as well.

So let's talk more about etcd. As I said before, it's our primary authoritative backend data store, and every Keyhouse instance talks to etcd all the time. Of course we do have failure handling in place, but how that's deployed is ultimately up to you. We use SPIFFE IDs as the form of identity between etcd and Keyhouse, so we're not relying on a username/password scheme or anything like that. You could set that up with other backend stores as well, but, like I said, we'll be focusing on etcd. In general, etcd is a very simple key-value store whose primary selling points are high consistency guarantees, persistence, and reliable load distribution, and it is distributed: generally speaking you deploy something like five to seven nodes, so it's scalable enough for what we need. It's also relatively easy to create read replicas of, both with etcd itself and, as we'll talk about right now, with replicas and caching.
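To make the service-as-a-library idea from a moment ago a bit more concrete, here is a minimal sketch of what a pluggable control-plane authorizer could look like. The trait and type names (ControlPlaneAuth, Identity, StaticTokenAuth) are hypothetical illustrations of the pattern, not Keyhouse's actual API.

```rust
// Hypothetical sketch of a "service as a library" integration point.
// The trait and struct names here are illustrative, not Keyhouse's real API.

/// An authenticated control-plane caller (e.g. an administrator).
pub struct Identity {
    pub username: String,
}

/// Implement this trait to plug your company's auth (SSO, JWT, SPIFFE, ...)
/// into the control plane at build time.
pub trait ControlPlaneAuth: Send + Sync {
    /// Validate an opaque credential (cookie, bearer token, ...) and
    /// return the caller's identity, or None if the credential is invalid.
    fn authenticate(&self, credential: &str) -> Option<Identity>;
}

/// Example implementation: a stub "SSO" that accepts a single static token.
pub struct StaticTokenAuth {
    expected: String,
}

impl ControlPlaneAuth for StaticTokenAuth {
    fn authenticate(&self, credential: &str) -> Option<Identity> {
        if credential == self.expected {
            Some(Identity { username: "admin".to_string() })
        } else {
            None
        }
    }
}

fn main() {
    // The deployer chooses an implementation at build time and hands it
    // to the (hypothetical) server constructor.
    let auth: Box<dyn ControlPlaneAuth> = Box::new(StaticTokenAuth {
        expected: "secret-token".to_string(),
    });
    assert!(auth.authenticate("secret-token").is_some());
    assert!(auth.authenticate("wrong").is_none());
}
```

The point of the pattern is that the deployer supplies the concrete implementation; Keyhouse's core code only depends on the trait.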
So, as I touched on at the end of the last slide, we also create replicas of etcd in order to deploy the same dataset across multiple regions around the world. Part of that is a really interesting cache infrastructure in Keyhouse, where each Keyhouse instance — each pod running Keyhouse — effectively becomes a read replica of etcd in and of itself. In more theoretical terms, this ends up being a write-through, evictionless cache. Because the entire dataset is so small — I think our entire production dataset doesn't grow above a couple of megabytes with every encryption key included, since encryption keys just aren't that big — we actually cache everything, all the time, in every Keyhouse instance, which handles failures really well. If etcd goes down, Keyhouse just keeps trucking, no big deal. Of course, we deny any writes during that time; that's why it's write-through, to deny writes when etcd can't confirm anything. In general, Keyhouse relies a lot on the consistency guarantees that etcd provides in order to synchronize its operations, so that's a very useful feature of etcd. This has given us great performance gains across the board, and ultimately it means that any given request Keyhouse handles has no direct network dependencies, other than the request itself, of course. That results in really consistent request-serving latency and, in general, high reliability of serving requests.

We've mentioned it a little before, but what is SPIFFE? If you're not already familiar with it, it's an attestation and identity framework — an open standard with open source implementations. We mostly center around SPIRE at Keyhouse and ByteDance; our own internal deployment uses SPIRE. Ultimately, what SPIFFE and SPIRE provide is a SPIFFE Verifiable Identity Document, or SVID, generally in the form of an X.509 certificate signed by SPIRE or an ES256 JWT token, containing a SPIFFE ID, where a SPIFFE ID is just a URI of a specific form. In Keyhouse, we use a generalized form of this: you have the spiffe scheme, then a trust domain, and then a series of key-value pairs, each denoting some property, the intention being that you can use wildcards to refer to parts of a SPIFFE ID more generically.

So how does Keyhouse specifically use SPIFFE? In general, everything in Keyhouse is protected with mutual TLS, and we use SPIFFE to communicate identity over that mTLS. SDKs confirm the identity of the Keyhouse server they're talking to, and the server confirms the identities of SDKs, if applicable. On top of that, Keyhouse also uses these identities to authorize resources — we'll talk more about the specific resources later, but the customer keys that a given data plane request accesses are authorized against these SPIFFE IDs as well, those URIs we looked at before.
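As a rough illustration of that generalized SPIFFE ID form — a trust domain followed by key-value segments, with wildcard support — here is a small matching sketch. The exact syntax, the function names, and the wildcard semantics are assumptions for illustration, not Keyhouse's real parser.

```rust
// Minimal sketch: match a SPIFFE ID of the (assumed) form
//   spiffe://<trust-domain>/<key>:<value>/<key>:<value>/...
// against a pattern that may use "*" as a wildcard value.
// The syntax and semantics here are illustrative, not Keyhouse's real rules.

/// Split an ID into its trust domain and key-value segments.
fn segments(id: &str) -> Option<(&str, Vec<(&str, &str)>)> {
    let rest = id.strip_prefix("spiffe://")?;
    let mut parts = rest.split('/');
    let trust_domain = parts.next()?;
    let mut kvs = Vec::new();
    for part in parts {
        kvs.push(part.split_once(':')?);
    }
    Some((trust_domain, kvs))
}

/// Return true if `id` is covered by `pattern` ("*" matches any value).
fn id_matches(pattern: &str, id: &str) -> bool {
    let (Some((ptd, pkvs)), Some((itd, ikvs))) = (segments(pattern), segments(id)) else {
        return false;
    };
    ptd == itd
        && pkvs.len() == ikvs.len()
        && pkvs
            .iter()
            .zip(&ikvs)
            .all(|((pk, pv), (ik, iv))| pk == ik && (*pv == "*" || pv == iv))
}

fn main() {
    let id = "spiffe://example.org/env:prod/service:payments";
    assert!(id_matches("spiffe://example.org/env:prod/service:*", id));
    assert!(!id_matches("spiffe://example.org/env:staging/service:*", id));
    println!("SPIFFE ID matching works as sketched");
}
```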
Because support for SPIRE and SPIFFE in the Rust open source ecosystem has been fairly limited so far, we wrote our own implementation for all of these integrations, called spire-workload-rs. That will also be open sourced at the same time as Keyhouse, in the coming months.

So let's talk more about our key hierarchy. In general, Keyhouse revolves heavily around envelope encryption, and we put a lot of effort into the hardware root of trust. Our master key, generally speaking, is stored in our HSM and is really locked down: it lives in a locked-down cluster of HSM modules, with an implementation-defined backup system and access control outside of Keyhouse's hands. Generally speaking, you'd expect an HSM to have fairly low encryption throughput — you're not going to want to push a lot of traffic through it — so we use an intermediate key that is directly encrypted by the HSM. The intermediate key is stored in etcd, encrypted at rest, and it's used to encrypt all the other data in etcd, generally speaking. Customer keys are the number one example; we'll talk more about them later, but those are really our meat and bones, and they're what a lot of access control and users interact with. Data keys are what we use at the SDK level; you can think of them as the end-use key.

So, the master key. It's generally manually rotated: presumably you have some kind of operational process in place to create a new master key internally, and then you have Keyhouse start migrating all of the existing encrypted customer keys and intermediate keys onto the new master key. As for the intermediate key, the only time you need to access the HSM, or whatever other hardware root of trust you've implemented, is at startup, when you need to decrypt the intermediate key stored at rest, or when you rotate the intermediate key. With our current defaults, where we rotate intermediate keys every 24 hours, and assuming you don't restart Keyhouse very often, that works out to one encrypt operation per rotation to create the new intermediate key, plus one decrypt operation per pod, since each pod has to decrypt the intermediate key on boot. That's the general load you'd expect on the HSM.

So, our intermediate keys. We've heard a little about these already, but as I said, they're rotated once per day and encrypted at rest by the master key in the HSM. There can be more than one floating around, due to the way Keyhouse migrates customer keys when intermediate keys rotate. Intermediate keys actually get copied alongside customer keys, encrypted at rest, so that when we rotate an intermediate key we can do it reliably, with each pod taking an arbitrary chunk of those customer keys and re-encrypting them with the new intermediate key, taking advantage of etcd's consistency to coordinate.

Our customer keys are the real meat and bones. These are what access control lists live on, and they each have an assigned purpose — these are for encrypting secrets, these are for encrypting data — and they never leave Keyhouse. They're the lowest-layer key that stays inside the Keyhouse server itself.
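To make the hierarchy concrete, here is a rough sketch of the envelope-encryption chain just described: the HSM-held master key wraps the intermediate key, and the intermediate key wraps customer keys. The XOR "crypto" below is only a placeholder so the example runs; a real deployment would call the HSM's API and use a proper AEAD, and all the names here are illustrative rather than Keyhouse's actual code.

```rust
// Sketch of the key hierarchy: master key (in the HSM) wraps the
// intermediate key; the intermediate key wraps customer keys.
// XOR is only a stand-in so this compiles and runs; it is NOT real
// cryptography. A real deployment would call the HSM and use an AEAD.

fn xor_wrap(key: &[u8], data: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

/// Stand-in for the HSM holding the master key; only wrapping and
/// unwrapping of the intermediate key ever touches it.
struct Hsm {
    master_key: Vec<u8>,
}

impl Hsm {
    fn wrap(&self, plaintext: &[u8]) -> Vec<u8> {
        xor_wrap(&self.master_key, plaintext)
    }
    fn unwrap(&self, ciphertext: &[u8]) -> Vec<u8> {
        xor_wrap(&self.master_key, ciphertext)
    }
}

fn main() {
    let hsm = Hsm { master_key: b"master-key-material".to_vec() };

    // Rotation (roughly every 24 hours): create a new intermediate key
    // and store it at rest wrapped by the master key.
    let intermediate_key = b"intermediate-key".to_vec();
    let wrapped_intermediate = hsm.wrap(&intermediate_key);

    // Pod startup: unwrap the intermediate key once via the HSM.
    let intermediate_key = hsm.unwrap(&wrapped_intermediate);

    // Customer keys are stored wrapped by the intermediate key, so
    // serving requests never needs the HSM again.
    let customer_key = b"customer-key".to_vec();
    let wrapped_customer = xor_wrap(&intermediate_key, &customer_key);
    assert_eq!(xor_wrap(&intermediate_key, &wrapped_customer), customer_key);
}
```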
Generally speaking, what happens with these customer keys is that an administrator goes and creates one, sets an access control list, and then we get to the data key. From the SDK's perspective: the SDK wants to encrypt some data, so it requests a data key from Keyhouse. Keyhouse uses a one-time-password-style scheme, kind of like two-factor authentication codes, to generate a data key for a given day, which makes caching much more reasonable, because we want to keep the number of data keys floating around in a given time period low. Generally speaking, there's one data key per day per customer key. Keyhouse returns both the data key envelope-encrypted by the customer key and the unencrypted data key to the SDK, which then encrypts the original payload and stores the encrypted data key alongside it, along with metadata that ties it to its owning customer key. It ends up being a really clean setup: the data never gets sent to Keyhouse, which minimizes bandwidth usage, but we also avoid sending the customer key to the client, and we minimize the number of requests upstream to Keyhouse.

This also gives us a lot of redundancy. If the Keyhouse server is down, the SDK can't fetch another data key — which isn't great, but there's leeway. Say we try to fetch a new data key every six hours and it fails; well, we can wait another six hours, since the key only rotates every 24 hours anyway. Of course, you want to put strict limits on that to avoid problems, but it allows for a lot of extra fault tolerance and strengthens that reliability guarantee.

We've touched on this before, but: secrets. Not everything fits into the neat box that is the customer key and data key setup. Say you have some persistent key you need for another purpose — an HMAC key, an immutable private key for some PKI, credentials for a sensitive service. You can store these as a Keyhouse secret, and they get stored in etcd encrypted by a data key that doesn't leave Keyhouse. A given customer key that's exclusively assigned to encrypting secrets, and not other data, can have any number of these associated with it.

So, key rings. A key ring is not a type of key; it's a category, a grouping of customer keys, mostly there for operator and administrator ease of use and convenience. You can use them to group customer keys and, say, share them with another operator. It's mostly a form of organization: if you have one customer key per use case, as you should, following good security principles, you'll end up with quite a few of them, and it's nice to group them into something coherent, say per team or per service.
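Going back to the data-key flow from a moment ago, here is a rough SDK-side sketch of it: the SDK asks Keyhouse for a data key, gets back both the plaintext key and a copy wrapped by the customer key, encrypts the payload locally, and stores the wrapped data key and customer-key metadata alongside the ciphertext. The struct and function names and the toy cipher are assumptions for illustration only, not the real SDK.

```rust
// SDK-side sketch of envelope encryption with a daily data key.
// Names and the placeholder cipher are illustrative, not the real SDK.

/// What Keyhouse (hypothetically) returns for a data-key request:
/// the plaintext data key plus the same key wrapped by the customer key.
struct DataKeyResponse {
    plaintext_data_key: Vec<u8>,
    wrapped_data_key: Vec<u8>,
}

/// What the SDK stores: ciphertext, the wrapped data key, and metadata
/// pointing back at the owning customer key. The payload never travels
/// to Keyhouse.
struct EncryptedRecord {
    customer_key_id: String,
    wrapped_data_key: Vec<u8>,
    ciphertext: Vec<u8>,
}

/// Placeholder cipher so the sketch runs; a real SDK would use an AEAD.
fn toy_cipher(key: &[u8], data: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

/// Stand-in for the network call to Keyhouse's data plane.
fn fetch_data_key(_customer_key_id: &str) -> DataKeyResponse {
    DataKeyResponse {
        plaintext_data_key: b"todays-data-key".to_vec(),
        wrapped_data_key: b"<wrapped by the customer key inside Keyhouse>".to_vec(),
    }
}

fn encrypt(customer_key_id: &str, payload: &[u8]) -> EncryptedRecord {
    // An SDK would cache this response for the day and tolerate temporary
    // Keyhouse outages by reusing the cached key, within strict limits.
    let dk = fetch_data_key(customer_key_id);
    EncryptedRecord {
        customer_key_id: customer_key_id.to_string(),
        wrapped_data_key: dk.wrapped_data_key,
        ciphertext: toy_cipher(&dk.plaintext_data_key, payload),
    }
}

fn main() {
    let record = encrypt("payments-service-key", b"example payload bytes");
    println!(
        "stored {} ciphertext bytes for {}",
        record.ciphertext.len(),
        record.customer_key_id
    );
}
```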
So, we've talked a little about operators and how access control applies at the customer key level; here's the full picture of what authorization looks like in Keyhouse. You have some administrator who has access through some implementation-defined mechanism, say SSO, and they have ownership of, or access to, some set of key rings. Each key ring uniquely owns some number of customer keys under it; like I said before, it's a grouping. And each customer key has its own access control list, which defines how it behaves at the data plane level. So operators can go into the customer keys they have indirect control over and grant access to whatever SPIFFE-authenticated entities — services, people, whatever your SPIFFE IDs represent — so that they have exactly the access they need: reading secrets, writing secrets, encrypting data, et cetera.

So, next steps. Currently Keyhouse is not open source; however, we're working on open sourcing it now, and we've been working with some security vendors to make sure everything is in tip-top shape. We're also working to open source our spire-workload project and Spire Proxy, which is mostly an offshoot of spire-workload. We expect those to be open sourced in the next few months. We also want to increase the number of integrations with generic or open source implementations of various components of Keyhouse — different open source metric servers, different ways of collecting logs, control plane authorization schemes, perhaps even other backend stores — basically making Keyhouse ready to go on more open source stacks.

We also want to add support for asymmetric keys. Right now Keyhouse is really tailored to what we need internally; we've designed it around what our internal customers at ByteDance need for their projects, and one of those things just hasn't been asymmetric keys stored in Keyhouse. You can still store an asymmetric key as a secret, but things like automatic rotation or setting up PKI infrastructure aren't natively supported by Keyhouse as a special feature the way they are for customer keys. It fits in pretty well; we just haven't implemented it yet, and it's something we want to add. In general, a lot of our design decisions have been motivated primarily by what we needed to get this rolling internally.

I think we mentioned this before, but Keyhouse is deployed in production now at ByteDance and is serving all the people on TikTok. We can't really give out concrete numbers, but several pods of Keyhouse are serving tens of thousands of services. That's all we have. Thanks, everyone. I think now we have some time for questions. Thank you so much. All right.