So we can get started. Hi, everybody. Welcome, and thank you for coming to what I think is the last block of sessions in the conference.

We've been using Hyperledger Fabric at Corsa for about three and a half years in production now, and we're really excited to share what we've been working on with the community. My name's Alex. I'm Corsa's head of infrastructure, so I handle things like our infrastructure as code, building containers, and everything of that nature. Today I'd love to discuss some recent work we've done as part of a Venafi development partnership around cert-manager.

We'll start off today by discussing why trust is so important and why we went to such lengths to do this integration. We'll briefly review how Fabric manages trust today and the current off-the-shelf options. We'll double back a little to talk about why we at Corsa care so much about this in particular and why we use a blockchain to accomplish our business goals. We'll briefly review how we use and deploy Fabric, then move on to a quick overview of what cert-manager is, how it works, what it does, and how we use it in the context of our application. I look forward to a quick demo where we poke around one of our staging Kubernetes clusters and view cert-manager in that context. Finally, we'll talk about extending these concepts, how we push cert-manager to the Fabric network edge, and conclude with some lessons learned.

So fundamentally, it's all about trust. Why are we going through all the trouble of using a rather complicated distributed system? It's a way to programmatically decentralize trust. I truly think that was Satoshi's greatest contribution to the world. Initially it was used for currency, a fairly straightforward application.
And Hyperledger is building on that use case and extending it to support arbitrary data sets. Fundamentally, Fabric is implemented by flowing trust through a PKI, which is a very mature and well-understood technology, and it allows us to achieve massive scale. Bitcoin measures its throughput in tens of transactions per second; with Hyperledger, people do 20,000 plus. We basically achieve this massive scale by using PKI technology.

Just a quick high-level overview to give us some context. Components are usually booted up with a certain set of certificates provided to them at startup; this is called certificate pinning. When a component comes up, it says, basically: I've been instructed by my operator, my trusted user, that if a request comes in or I'm presented with a signature, and it's signed by something I trust and it's not expired, then it's considered good. I know we could get into NodeOUs and some more nuanced aspects of the way Fabric uses X.509 certs, but at a high level, that's how things work.

In the official Fabric documentation, there are two primary ways of provisioning these certificates when you bootstrap your X.509 PKI and start up your Fabric network: via the cryptogen tool, which is a command-line binary, and via the Fabric CA. Throughout Fabric's documentation, they refer to concepts you can extend when using other CAs; they say, well, we support this, you can use them, but there's no documentation on them. So we'll explore one of those today.

To go quickly through each of the off-the-shelf tools the documentation does cover: the first one is cryptogen. This is what people are pushed to when they first start getting involved with Fabric. It's a binary program that runs on your local machine, and it generates a full PKI.
The PKI just sits in a directory structure on your machine, and then it's basically up to the operator to manually distribute it to all the different components. Sometimes it's Docker shared volumes; sometimes people are manually copying things over. It's very simple: you run a nice manifest where you specify which CAs you want and whose certificates you want them to sign. But it comes at a pretty steep cost, in what's called a trusted setup. Specifically, it's all in one place. If I'm an attacker, all I need to do is somehow intercept this bundle of certificates and private key material, and I own your network. Our little attacker here could spin up phony components using legitimate PKI, or could swap out your legitimate certificates with their corrupt or otherwise bad PKI, to give just one of many examples.

Beyond the Byzantine implications, this in general makes maintenance difficult. How do we handle a certificate rotation? How do we swap that out? It's a lot of manual steps transferring these certificates from where they're generated out to where they're actually used. And it raises secondary concerns, like how we securely store these certificates well after they're out in the network: where do we make backups, and how do we handle destruction of data? If I rotate out certificates before they expire, do I have legitimate certificates floating around on a hard drive somewhere that are still valid?

The Fabric community, I think, acknowledges this, and has implemented the Fabric CA. This is the officially recommended route for production deployments according to the Fabric docs. There's official Fabric CA container infrastructure distributed for the Hyperledger project, and the CA binary integrates either with an LDAP directory that you bring or with Postgres out of the box.
The way this works, at a high level again, is that clients submit a certificate signing request that is signed by their private key. There's a huge benefit here in that the client generates this key pair on its own device, and the secret key never leaves the client for the entire duration that the client interacts with the network. So we get a green check mark here: that addresses probably the biggest drawback of cryptogen, in that the certificates are generated in a distributed fashion.

The way a client authenticates itself to the CA to say, please sign my certificate, is that it provides a username and password. It counts on someone else having said: at some time in the future, a client will present this username/password combination, and you are allowed to sign a client certificate or a peer certificate for them. That puts the responsibility on the client for actually reaching out. The client is provided this username/password combination by whichever CA administrator set up the account, and then it interacts with the CA via REST, via SDKs, or via the Fabric CA client binary; it reaches out and obtains the certificates it needs.

This is a graphic that I took right from the Fabric CA documentation itself. Fabric CA is designed to be run as a separate service, and it's pretty much a standard two-tier web app: you have your application tier, which runs the business logic of the CA and performs the actual cert signing, backed by a persistence layer. We can see that at the extreme right of this diagram; you can back it with either an LDAP directory or various RDBMSs. Fabric CA has its own configuration format, and it has the concept of a bootstrapping.
So if you have a root CA, as in this diagram, the intermediate CA can contact the root CA, get its own CA cert signed, and then participate in the network. Again, this is all handled asynchronously through a config.

What does this mean if you're deploying this and want to use it in prod? Well, you have to follow traditional service deployment best practices: do things like load balancing, as they reference in this diagram again in the top right, and address concerns like RDBMS scaling, which could become a bottleneck. These are all well-understood technologies, but they're things you need to keep in mind and keep in view.

So what does all of this mean? We've discovered that this is hard. Running a CA is hard; there are many large companies that make an entire business out of doing this, like Verisign and DigiCert. There are always challenges around RDBMSs, read replicas and that sort of thing, and scale in general. Do you provision for peak? Do you auto-scale? How do you implement that? It depends a lot on your workload, sure, but these are all concerns around running a CA that don't really have anything to do with blockchains or fault tolerance or any of the core Fabric offerings.

What we also found challenging is that the CA has its own RBAC system. It uses attributes encoded in JSON in X.509 v3 extended attributes in certificates to express things like: you're allowed to sign certs for this type of client, you're allowed to sign certs for that, and who can create usernames and passwords. So you have this whole other layer of authentication, of who's allowed to sign certificates, on top of your existing Fabric enrollment certs.
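To make that extra layer concrete: Fabric CA encodes registrar permissions in identity attributes such as `hf.Registrar.Roles`. A toy Python check of that idea might look like this (the attribute name is real, but the parsing and data shapes here are simplified illustrations, not Fabric CA's implementation):

```python
def may_register(registrar_attrs: dict[str, str], new_identity_type: str) -> bool:
    """Illustrative sketch of Fabric CA's extra authorization layer.

    The 'hf.Registrar.Roles' attribute on a registrar's identity lists the
    identity types (peer, client, orderer, ...) that registrar may register.
    Real Fabric CA stores and evaluates these attributes itself; this just
    shows the shape of the check.
    """
    allowed = registrar_attrs.get("hf.Registrar.Roles", "")
    roles = {r.strip() for r in allowed.split(",") if r.strip()}
    return new_identity_type in roles
```

So before any enrollment cert exists, there is already a separate question of who may create the username/password accounts that enrollment depends on.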
We found that the bootstrapping process can't really be extended to third-party CAs. The concept of bootstrapping necessitates that the intermediate CA be talking to another Fabric CA at the root; the bootstrapping is a unique thing here. You can't talk to Venafi, and you can't go out and talk to, say, DigiCert to get your certificate using the bootstrapping process specifically.

In terms of longer-term maintenance concerns, we use a separate CLI to interact with the Fabric CA. Even if you're interacting over REST, using curl or something, or using the Fabric CA client, it's on the client to implement this and invoke it in some way to get these certificates. That begs the question: what happens upon cert expiration? We never went through with implementing this, but as a quick thought experiment: you need to detect that your certificate is expiring soon, perform the CLI client call, and then somehow swap the new cert into your node. That's tough. We did it manually. So it's truly a hard challenge.

And we had to tackle it, because we care a lot about trust. What we fundamentally do is offer a security product: we built a cybersecurity solution on top of Hyperledger Fabric, and essentially what we offer is an MFA solution for APIs. We have a little diagram here; the DLN is again top right. What we do is generate unique identities per API client that we then validate down at the service level, and go from there. What this means is that our deployments are typically mission critical. The customers looking for this level of security for their API clients include a lot of government users and that sort of thing. So we need to drive the attack cost through the roof. We can't have one RDBMS, because there are some well-resourced adversaries here that will attack any single point of failure.
So we care about trust because our customers, in turn, must trust us. Just a quick aside here to give a little context: we deploy Fabric on a Kubernetes cluster using a tool called Helm. Fundamentally, we use one Kubernetes namespace to deploy the core Fabric components along with some license management pieces. We have our Fabric maintenance jobs scripted; some are Kubernetes cron jobs, some are just one-off jobs. We have rebased all of the Fabric containers on Department of Defense Iron Bank containers to minimize our attack surface there. And all of this deploys multi-tenant DLNs: we have on the left a distributed ledger namespace that is shared between many customers, all separated out, and these are fronted by small microservices that we implement as a company to facilitate this interaction with Fabric.

On top of this, by using Kubernetes, we gain cloud agnosticism. We run equally well on-prem; there are customers that run us in a totally air-gapped environment on virtual machines and that sort of thing. GCP, AWS, Azure, we run on all three, within the last year, and honestly maybe within even the last four months.

So this is what we're trying to protect, and we have had so many challenges with scaling out the CA in particular as we scale out. We've done some research work recently where we've deployed 49 peers, and it's like: how do you handle that scale and manage that trust? How do we scale a CA without needing to deal with the operational headache of running this big, scary, very powerful web service exposed to the internet, basically? Our answer to that was to adopt cert-manager.
For those of us who aren't familiar with it, cert-manager is an open source project, and what it essentially does is sign X.509 certificates using pre-configured backends, which in cert-manager parlance are called issuers. In this graphic we have multiple different types of backends; the ones I've listed here are the ones it supports out of the box. There are tons of third-party plugins for Cloudflare and most of your other issuers, honestly. If your DNS system supports the ACME protocol, you can issue certs that way. It works with HashiCorp Vault, and it works with all of these simultaneously; you could be running 50 of them. Again, it's a form of indirection. And on the bottom, it can issue certs to peers, to Fabric clients themselves (we'll get into specifically how that's done later), or just to your typical web service that needs an FQDN cert signed to serve its web UI or something like that.

The way cert-manager accomplishes this is by using the Kubernetes API exclusively. In Kubernetes, again, it's declarative infrastructure: resources declare, I need a certificate, and they specify what they need; we'll get into the specifics here. cert-manager is always watching the cluster, effectively subscribed to the Kubernetes API server, and as soon as a request is put on the cluster, it does the backend work. It says: okay, this peer wants a certificate signed by this CA; it goes and handles the interaction and gives the cert back to the peer. All of the authentication and authorization is done by the Kubernetes RBAC API; none of that is implemented in cert-manager. Kubernetes gives us so many features here that they just delegate it out.

To add a little depth, this is an actual certificate object that I pulled off of one of our clusters.
It's a custom resource definition. Going top down: we specify to cert-manager how long before the cert expires that cert-manager should handle the renewal. Right beneath that, I'm sure we'll all recognize standard X.509 fields like organizational units, common names, and expiration dates; these are basically all copied verbatim into the signed cert, assuming the CA doesn't modify anything. It allows us to specify the private key type. Fabric requires ECDSA certificates, so we manually override that here; I think it defaults to RSA. cert-manager, of course, supports X.509 v3 extended attributes, so for an enrollment cert we need the digital signature usage. And finally, at the bottom, this is the linkage that tells it which CA we actually want to sign this. We tell it: I want this issuer, issuer one. Here it's a local key pair, but this could be Venafi, an ACME provider, whatever you'd want. And that's it. You just kubectl create this object, and it works.

This is an architecture diagram of how trust now flows through one of our production configurations using this paradigm. We root all of our trust in air-gapped root CAs: just offline key pairs that never hit the internet and are never live in the cluster. We manually sign intermediate CA certs from those; we use a little Go binary we wrote internally, but you could use OpenSSL or anything. Those intermediates are what are given to cert-manager in a Secret, as a key pair, and each is then registered as a CA. From each one of those (we have three orgs in our production networks), trust flows down into the peers. We run our orderers in org one, so that trust flows from there. And towards the bottom, we have our auth server, which is how we run our RBAC authorization on clients.
And if a client earns its trust, basically, we issue a cert there, and the trust flows through there. Just a quick aside: our consensus policy is two of three, so we can tolerate the loss of any one org.

Cool. So now I think we have time for a quick demo. We're just going to explore one of our staging clusters; hopefully this will be large enough. All right, first I'm going to just get the pods. So first we'll look at one of our production clusters. Actually, this is staging: it matches a production cluster topology, but we're not in prod right now. What I'd like to point out is that at the bottom we have our nine peers all running. We use external chaincode, so this is all the chaincode that backs each peer. What else? There are our orderers here, some of our maintenance jobs, and what we use to set up the Fabric network. I'd also like to point out that there is no CA running here. We don't have any Postgres; there's no CA. That's because they're here, as issuers. An Issuer is a custom resource introduced by cert-manager, and here are the three CAs backing this cluster now. They don't exist in the traditional sense; each is just a key pair that tells cert-manager: okay, when someone wants a cert for org zero, use this key pair, sign it, and get it back to them, basically.

If we go look at cert-manager itself (we store it in our ingress namespace), cert-manager consists of three main components. There's the controller logic itself, here. This is the CA injector, which configures the Kubernetes API server to trust cert-manager and to send over the objects being created to cert-manager, to see if anything needs to be signed or acted on. And then these webhooks are listening to Kubernetes; cert-manager configures a subscriber, essentially, on the Kubernetes API server.
As soon as anyone creates a Certificate object, a CertificateRequest, or anything else associated with cert-manager, these webhooks are waiting and listening, and then they make changes to the object, invoke cert-manager itself, or do whatever else is needed. This is an HA configuration, so we're scaled out; we can sign several hundred certs per second, I think, honestly, in this configuration.

If we go look at the certificate objects themselves, we'll go back to our demo cluster. Here we have all of the certificates backing the cluster; I'm sure a lot of us recognize these. Here is our enrollment cert for a peer, for example. For mutual TLS we have a client cert and server certs, and this is just replicated for everything. cert-manager is what's actually modifying these statuses, so we have a real-time update of whether each certificate has been signed and is available and ready to be used. And then we finally point to the actual Kubernetes Secret object where the cert data itself is stored; the certificate object is just more of an abstraction.

If we dig in, we'll grab one of these enrollment certs at the bottom. Actually, we'll describe it. We get the full range of Kubernetes API commands; again, this is all implemented in a custom resource definition. So we get basically real-time status on all of our certificates in the cluster. cert-manager has let us know that the last message was: okay, I've signed the certificate, it's not expired, and I'm continuing to check. We have the time it expects to renew the certificate, the maximum age, all of the fields that we declared a few slides earlier in our definition, and, again, our usages. This is the abstraction that Kubernetes holds and that the cert-manager binary itself is watching to know when it's time to renew or if anything's up.
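For reference, a Certificate object like the ones we just looked at has roughly this shape. This is a sketch of the dict you would serialize to YAML and `kubectl create`, using field names from cert-manager's v1 API; the specific names, durations, and issuer here are illustrative, not our actual production values:

```python
# Sketch of a cert-manager Certificate for a peer enrollment cert.
# Field names follow the cert-manager.io/v1 API; values are made up.
peer_enrollment_cert = {
    "apiVersion": "cert-manager.io/v1",
    "kind": "Certificate",
    "metadata": {"name": "peer0-org1-enrollment"},
    "spec": {
        # Kubernetes Secret where the signed cert and key are written.
        "secretName": "peer0-org1-enrollment-tls",
        "duration": "2160h",    # 90-day lifetime (illustrative)
        "renewBefore": "360h",  # start renewal 15 days before expiry
        "commonName": "peer0.org1.example.com",
        "subject": {"organizationalUnits": ["peer"]},
        # Fabric requires ECDSA; cert-manager would otherwise default to RSA.
        "privateKey": {"algorithm": "ECDSA", "size": 256},
        # X.509 v3 extended key usage needed for an enrollment cert.
        "usages": ["digital signature"],
        # Which issuer signs it: a local CA key pair here, but this could
        # reference Venafi, an ACME provider, Vault, and so on.
        "issuerRef": {"name": "org1-ca", "kind": "Issuer"},
    },
}
```

The issuerRef at the bottom is the indirection point: swapping the backing CA means changing that one reference, not the component.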
Yeah, so the benefits of cert-manager aren't just in this automation; there are other add-ons that work with this binary. It's an indirection point. One of the things we utilize at Corsa is called Jetstack Secure, from the makers of cert-manager itself. This is an additional add-on that we've put into our cert-manager deployment, and what it does is synchronize our cluster certificates (just the certificate portions; no secret information leaves the cluster) to a SaaS. So now we and our security team get full insight into every certificate in our cluster. If we dig in here to one of our certs, we can see it makes some recommendations in terms of certificate health and how we could better utilize these; some of them might not apply, since we sort of have to use ECDSA for Fabric. But we can view the cert metadata and trace through who issued it; here are our cluster issuers, which we looked at a little earlier. It provides us with an all-in-one monitoring tool. You can configure Slack alerts to notify you if something's up. But again, it's one place: you can find the status of everything, you can find the renewals, and you don't need to be digging into the cluster to get this sort of thing.

Right, so that's a standard production deployment. This DLN, just for your information, sustains about 100 transactions per second write and 2,000 transactions per second read, give or take.

So to carry this forward: how do we sustain this over long periods? cert-manager gives us a lot of benefits, like automatic rotation. We support a number of CAs. It completely abstracts CA management from the application code, and in general it just improves security hygiene.
One trap that we fell into: cert-manager in general is very fast, so if you spin up your Fabric network all at around the same time, it will sign all of your certificates at once. And if all your certificates are signed at the same time, they'll all expire around the same time. That can be very bad: your entire cluster essentially goes down cryptographically, because all your certificates are expired. Sure, cert-manager can reprovision them quickly enough, but you might have lost consensus, and then you're in big trouble.

So I just wanted to include this little bit of pseudocode that we have implemented. We essentially hash the name of every individual component, strip out anything that isn't a digit to get a pseudo-random number, and then normalize that so that every component has a different lifespan somewhere in the last 50% of the certificate's life. Different parts might be getting rotated more often than they need to be, but one peer goes down at a time; maybe some days later, another peer gets its certificates renewed. We spread this out, and we can easily tolerate that sort of behavior.

As we wind down here, I wanted to go over some concepts around extending cert-manager outside of the Kubernetes cluster. A lot of cert-manager's power and capability lies in the fact that it runs inside of Kubernetes, but it is also totally dependent on it. That's nice in a way, but the question we faced was this: a Fabric network isn't just peers and orderers. People are using it; that's how it delivers business value. So how do we push this capability to sign certificates out of the cluster? Our answer to this was: well, we handle it in our app.
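A quick aside before I describe that: the staggering pseudocode from a moment ago reads roughly like this in Python. This is a sketch; our production implementation differs in its details, and the modulus and exact normalization here are illustrative, but the digit-stripping hash and the 50% floor are as described:

```python
import hashlib

def staggered_lifetime(component_name: str, max_lifetime_hours: int) -> int:
    """Derive a per-component cert lifetime in the upper half of
    [0, max_lifetime_hours], so certs issued at the same moment
    do not all expire at the same moment."""
    # Hash the component name and keep only the digits of the hex
    # digest, giving a deterministic pseudo-random number per name.
    digest = hashlib.sha256(component_name.encode()).hexdigest()
    digits = "".join(c for c in digest if c.isdigit()) or "0"
    # Normalize into [0.5, 1.0) of the maximum lifetime, so every
    # component lands somewhere in the last 50% of the cert's life.
    fraction = 0.5 + (int(digits) % 1000) / 2000
    return int(max_lifetime_hours * fraction)
```

Because the lifetime is a pure function of the name, every restart of a component computes the same stagger, and renewals stay spread out across the fleet.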
So a client, when it's first booting up, will generate its own CSR and submit it to our services; that's this inner dashed line. It runs through our control plane, the microservices logic fronting this, where it goes through our business logic and RBAC checking; we check that it's allowed to get a certificate signed. Then we pass it right into Fabric. Our chaincode has a cert-manager client in it (really, it's just a Kubernetes client), and this chaincode is responsible for submitting the certificate request onto the cluster. The chaincode then waits until the signed certificate is ready, and the signed certificate follows this green path back out. At that point we have a signed enrollment cert and can interact directly with Fabric services. And we did not have to expose a public CA.

I'd just like to conclude here with some lessons we learned through this effort. We have been running this cert-manager integration in prod for darn close to a year now, maybe ten months. We found that Fabric components read their certificates once at startup. It's not like an ingress controller or something, where you can swap the contents of the file system and it will immediately get used in the logic. This makes automated cert rotation challenging: you need to figure out how to get the application to read its updated certs, which are just silently updated in the background by cert-manager. To get around this, we use an off-the-shelf tool called the Reloader Helm chart. What it does is watch all the secret information attached to a pod, and when anything changes, it bounces the pod and restarts it. That solved that problem.

cert-manager is a best-effort service; it doesn't really do a blocking operation or anything like that when you want a certificate signed.
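Because there's no blocking call, the client side ends up as a submit-then-poll loop, roughly like this. This is a sketch: `submit_csr` and `fetch_cert` are hypothetical stand-ins for the Kubernetes API interactions our chaincode performs, not real cert-manager functions:

```python
import time

def obtain_certificate(submit_csr, fetch_cert, csr_pem: str,
                       timeout_s: float = 30.0,
                       interval_s: float = 0.5) -> str:
    """Two-phase issuance against a best-effort signer.

    submit_csr(csr_pem) -> request_id      (hypothetical API call)
    fetch_cert(request_id) -> signed PEM, or None while still pending
    """
    # Phase 1: put the certificate request on the cluster.
    request_id = submit_csr(csr_pem)
    # Phase 2: poll until the signed certificate shows up, or time out.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        cert = fetch_cert(request_id)
        if cert is not None:
            return cert
        time.sleep(interval_s)
    raise TimeoutError(f"certificate request {request_id} not signed in time")
```

In practice cert-manager signs quickly, so the loop usually completes within a poll or two, but the caller still has to own the timeout.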
So what we have to do is a two-phase approach: we submit the certificate request, and then we poll until the certificate is signed and ready to come back.

cert-manager supports raw PEM: if the API doesn't support exactly the fields you need, you can just give it a raw certificate request PEM, and that will get signed.

And lastly, if you manually create CertificateRequests that don't match a Certificate object, which is what we do in the previous slide for clients outside of Kubernetes, make sure to clean them up. cert-manager will dutifully monitor them and keep them up to date, even if the client has long since come up, done its business, and stopped existing. We eventually wound up with thousands of these certificate requests knocking around, so just make sure to clean those up; we implemented a tiny little cron job to take care of that.

And that's it from my end. Happy to handle any questions, and yeah, thank you again.

Audience: [question about whether the signing process can be chained through an intermediate CA and the root CA, perhaps with more layers]

Sure. So the question was about chaining different tiers of CAs, essentially. It's a really interesting question. I don't see any reason why you couldn't use cert-manager to sign an intermediate CA from a root CA. We just do it manually. That is a deliberate business decision to keep it a manual process, because the root is the root of everything, and that way we don't have a root CA that ever goes live on the cluster. But it's an incredibly flexible tool, and at the end of the day, it is signing X.509 certificates, so I'm sure you could chain it.

Yeah, cool. Well, if there are no other questions, I'll be up front here if anyone wants to chat. But yeah, thank you again.