Hello, my name is Carlos Sánchez and I'm here to talk to you about dedicated infrastructure in a multi-tenant world. Why would you care about this? Our use case is that we have multi-tenant Kubernetes clusters where each tenant may want dedicated networking features: dedicated IPs, dedicated network connections, VPN connections, things like that. We built this solution on top of Envoy, and for us it's been a great success to be able to do this while keeping multi-tenancy on the Kubernetes cluster, with Envoy kind of driving the infrastructure. It's been great for us, so maybe it's good for you too. So let's go.

I'm a cloud engineer on the Adobe Experience Manager Cloud Service; that's the product I'm going to talk to you about today, and where we built this at Adobe. My background is a lot of open source work over the years: I'm the author of the Jenkins Kubernetes plugin, which I started years ago, and a long-time contributor to open source projects like Jenkins, Maven, Puppet and so on.

A brief introduction to Adobe Experience Manager: it's a content management system, with digital asset management, digital enrollment and forms, and it's used by more than 1,400 companies. This was even before the cloud service was created. It was already a distributed Java OSGi application that used a lot of open source components from the Apache Software Foundation, and it has a huge market of extension developers writing code for AEM.

So, Adobe Experience Manager on Kubernetes: we took what was already there and started running it in Kubernetes a couple of years ago, running on Azure.
We have more than 18 clusters already, across multiple regions: US, Europe, all over the world. An interesting point is that Adobe has a dedicated team that runs the clusters for us and for other products, so this also limits the amount of customization we can do to the Kubernetes clusters. A very specific use case is that customers can run their own code: customers can create their own extensions and run them in the cloud, in AEM. That's why cluster permissions are very limited, for security, and we have to enforce that traffic leaving the clusters is encrypted. We use namespaces to provide a scope for network isolation, quotas and permissions, trying to keep every tenant separate from other tenants. If you want to know more about these details you can watch my KubeCon 2020 talk from last year, where I went through them; today I'm just going to focus on the Envoy details and the networking infrastructure we set up.

For the dedicated infrastructure part, as I said, customers want to have their own egress IPs, because by default all the tenants running on the same cluster would get the cluster egress IPs. Maybe they want private connections: VNet peering, Private Link, ExpressRoute, all these things that the cloud provider offers. Or they want to connect through VPN to their on-premise or other cloud assets. So we built the solution on top of Envoy: we run Envoy on virtual machines, and we also run Envoy as sidecars inside Kubernetes. That's a bit of how the whole system works.

On the setup, or architecture: for each tenant we give them a virtual network, a virtual machine scale set and a load balancer, and these virtual machines are running Envoy. The VNet can be privately connected to the customer network, for when they want VPN, or when we want to support ExpressRoute, things like that.
We can use the cloud-provided features and products to connect this VNet to the customer network.

In the VNet where the Envoy VMs are running, the load balancer in front of the virtual machines provides a dedicated public egress IP that is only for those VMs, so all the traffic going out of those VMs will have a dedicated egress IP. The private load balancer, on the right-hand side, gives the dedicated private egress IP for all the traffic that goes to the customer network through VPN or dedicated connections.

On the Kubernetes side, we have the Java virtual machine configured with an Envoy sidecar as an HTTP proxy, so the traffic from the JVM goes to the Envoy sidecar. Between the sidecar Envoy and the Envoy on the virtual machines we have an HTTP/2 tunnel that is encrypted and authorized with mTLS.

So what are the benefits for us of using Envoy? For the Java virtual machine we have a simple and transparent configuration: we are setting the HTTP proxy system properties, so this is Java-level configuration that applies to most, if not all, of the HTTP connections going out of the JVM. We can also support other, non-HTTP protocols by using different listeners on the Envoy sidecar, and all traffic from the pod to the VMs is encrypted. The virtual network on Azure allows us to configure VPN and private connections at the cloud level, as a service, so there's no other infrastructure we have to set up or buy. And mTLS prevents unauthorized connections, including one tenant connecting to another tenant's Envoy; this is how we keep tenants isolated.

Some of the issues we hit, or the main issue, is that we have to manage one set of certificates for each tenant, for sidecars and for VMs, and that comes with rotation, expiration and all the things you have to do with certificates, which becomes complex over time.

How is the Envoy sidecar set up?
We have one listener with a TCP proxy filter for HTTP and HTTPS, using HTTP CONNECT; using Envoy as an HTTP proxy means the CONNECT request tells it the destination where each connection should go. For each non-HTTP port we have one listener that hardcodes the destination in the Envoy configuration, under the tunneling config. And we have one cluster that just points to the load balancer in front of the Envoy VMs, with the TLS transport socket configuration there.

So, quickly through some code: we have the filter chains, and we are listening on port 3128, where we just get all the HTTP proxy traffic and send it to cluster_0. For non-HTTP, we just put the tunneling config hostname: the destination host where we want to send all the traffic coming through that port. And on the cluster side we just put, in the socket address, the destination address of the Envoy load balancer and the port where the load balancer is listening. Those are the main configuration options. For the TLS configuration, we point Envoy to the SDS configuration files for the certificates and the certificate authority.

What about the Envoy VM side? How is that configured?
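Before moving on, here is a rough sketch of the sidecar configuration just described. This is not our exact config: all names, addresses, ports and destinations (cluster_0, db.example.com, envoy-lb.example.com, the SDS file paths) are illustrative.

```yaml
static_resources:
  listeners:
  # HTTP/HTTPS: a TCP proxy on the proxy port; the CONNECT request itself
  # carries the destination, so nothing is hardcoded here.
  - name: http_proxy
    address:
      socket_address: { address: 127.0.0.1, port_value: 3128 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: http_proxy
          cluster: cluster_0
  # Non-HTTP traffic: one listener per port, with the destination hardcoded
  # in tunneling_config (the database host here is made up).
  - name: postgres
    address:
      socket_address: { address: 127.0.0.1, port_value: 5432 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: postgres
          cluster: cluster_0
          tunneling_config:
            hostname: db.example.com:5432
  clusters:
  # A single upstream: the load balancer in front of the Envoy VMs, reached
  # over an HTTP/2 tunnel protected by mTLS.
  - name: cluster_0
    type: LOGICAL_DNS
    connect_timeout: 5s
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: cluster_0
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: envoy-lb.example.com, port_value: 443 }
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificate_sds_secret_configs:
          - name: tls_cert
            sds_config: { path: /etc/envoy/sds-cert.yaml }
          validation_context_sds_secret_config:
            name: validation_context
            sds_config: { path: /etc/envoy/sds-ca.yaml }
```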
We just have one HTTP connection manager that listens with a CONNECT upgrade, so for every connection coming in, Envoy checks the CONNECT header and then it knows where the destination is. We have one dynamic forward proxy cluster for all the destinations, so we don't have to hardcode any destinations, neither on the listener side nor on the cluster side.

On the listener side: an HTTP connection manager on port 443, accepting the connections from the Envoy sidecar. We also need to configure HTTP and HTTPS a bit differently, and the upgrade type is CONNECT, so all the HTTP/2 connections coming in through the tunnel from the other end get sent to the right destination. And we just set up the dynamic forward proxy DNS configuration and all that.

Again, for the TLS configuration we point Envoy to the SDS config files, both for the certificates and the CA, and, very importantly, we require a client certificate: we set require_client_certificate to true, so it is effectively mTLS. On the cluster side it's just the dynamic forward proxy, as I mentioned.

For the SDS configuration we have separate files, because this allows us to reload the certificates without having to touch Envoy: it will automatically read the new certificate files when they change. And we set match subject alt names, so we use the SAN (subject alternative name) to match exactly the tenant that we want to allow on that set of VMs. The tenant that is set on this side has to match the tenant in the certificate SAN that has been generated for the sidecar.
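The VM side described above can be sketched in the same spirit; this follows the shape of Envoy's CONNECT termination examples, and again every name, port and path is illustrative rather than our exact config.

```yaml
static_resources:
  listeners:
  - name: mtls_ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 443 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress
          # Accept CONNECT over HTTP/2 from the sidecars.
          http2_protocol_options: { allow_connect: true }
          upgrade_configs:
          - upgrade_type: CONNECT
          route_config:
            virtual_hosts:
            - name: all
              domains: ["*"]
              routes:
              - match: { connect_matcher: {} }
                route:
                  cluster: dynamic_forward_proxy
                  upgrade_configs:
                  - upgrade_type: CONNECT
                    connect_config: {}
          http_filters:
          - name: envoy.filters.http.dynamic_forward_proxy
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.dynamic_forward_proxy.v3.FilterConfig
              dns_cache_config: { name: dns_cache, dns_lookup_family: V4_ONLY }
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          # Requiring a client certificate is what makes this mTLS.
          require_client_certificate: true
          common_tls_context:
            tls_certificate_sds_secret_configs:
            - name: tls_cert
              sds_config: { path: /etc/envoy/sds-cert.yaml }
            validation_context_sds_secret_config:
              name: validation_context
              sds_config: { path: /etc/envoy/sds-ca.yaml }
  clusters:
  # One cluster covers every destination: hosts are resolved on demand.
  - name: dynamic_forward_proxy
    lb_policy: CLUSTER_PROVIDED
    cluster_type:
      name: envoy.clusters.dynamic_forward_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
        dns_cache_config: { name: dns_cache, dns_lookup_family: V4_ONLY }
```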
So both the sidecar and VM Envoy certificates have to have the same subject alternative name.

For certificate rotation we use Vault agents to generate short-lived certificates, both on the sidecar and on the VM. We have the certificate authority stored in Vault, and we have passwordless authentication from Kubernetes and from the Azure VMs to Vault, so we don't have any extra secrets to manage; it just uses the APIs there. The certificates are configured, as I mentioned, in separate SDS files, and this ensures that they are automatically reloaded on change: when the Vault agent rotates the certificates, Envoy automatically notices. The CA is the same setup, just two different files.

There are some alternatives: you could use cert-manager on the Kubernetes side to do the rotation for the sidecars, or you could use SPIFFE. We just didn't want to complicate our first implementation, but that's something we also considered.

What about Envoy debugging? This is probably the most useful advice I can give from all the issues we hit. A lot of TLS connection errors only show up in the debug logs, under the connection component, and when there's a TLS connection error the client side, like the sidecar, only sees socket-closing messages and doesn't get any feedback on why the connection is not working.

So, some errors you should look for. On the config side, at the warning level, you're going to get a key values mismatch if the key doesn't match the certificate. But all these other ones are only at the debug level: "certificate has expired" means your certificate is no longer valid; "certificate verify failed" or "certificate unknown" means the certificate subject alternative name does not match the expected one, and you only get this in the debug logs. So, as an example, if the certificate SAN does not match, these are the log errors.
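Circling back to the SDS files for a moment: each one is a small YAML document that Envoy watches and reloads on change. A sketch of the two files, where the paths, secret names and the tenant SAN value are all made up for illustration:

```yaml
# sds-cert.yaml: the tenant certificate, rewritten by the Vault agent
resources:
- "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret
  name: tls_cert
  tls_certificate:
    certificate_chain: { filename: /etc/certs/tls.crt }
    private_key: { filename: /etc/certs/tls.key }
---
# sds-ca.yaml: the CA plus the SAN match that pins the peer to one tenant
resources:
- "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret
  name: validation_context
  validation_context:
    trusted_ca: { filename: /etc/certs/ca.crt }
    match_subject_alt_names:
    - exact: "tenant-1234.tenants.example.com"
```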
You're going to get "certificate verify failed" on the VM side, and on the sidecar side it shows as "alert certificate unknown".

How do we use all this Envoy setup? For HTTP and HTTPS, as a normal HTTP proxy. For other ports we just open a port on localhost in the pod, in the sidecar, and it's available to any container in the pod. On Java we just set the system properties, the default ones that most if not all Java clients will honor, pointing to localhost on the port where Envoy is listening. There's a specific case on Java: the Apache HTTP client ignores them, and you have to explicitly tell the client to use the system properties.

We've seen some issues with long-running connections, the typical connections for databases where you have connection pooling. Something we had to do was configure max stream duration and stream idle timeout, because otherwise our connections were getting disconnected very early, and we also had to increase the load balancer timeouts to make sure these connections didn't get dropped. This is something you can tweak, and you should if you have long-running connections: with the defaults, the connections from connection pools get dropped very early. We also advise customers to use validation of connections before use, like in Java JDBC drivers; this is just a configuration option where you say: whenever you pick a connection from the pool, execute this query to verify that the connection is good.

So, some tweaks we have done.
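Concretely, those two timeouts live in the HTTP connection manager configuration; a fragment with placeholder values, which you should tune to your own pools (and remember the cloud load balancer in front has its own idle timeout as well):

```yaml
# Fragment of the HttpConnectionManager config on the tunnel-terminating side.
# Values are illustrative, not a recommendation.
stream_idle_timeout: 7200s          # allow pooled connections to sit idle
common_http_protocol_options:
  max_stream_duration: 86400s       # total lifetime cap for a tunneled stream
```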
Yeah, the max stream duration and the stream idle timeout: these are increased to prevent the connections from being dropped.

You can also proxy httpd connections through these Envoys. One case we've seen is that httpd sends the proxy request with an HTTP/1.0 CONNECT header, and we saw Envoy was dropping these connections; we had to set the HTTP protocol option to accept HTTP/1.0, which is disabled by default. That's something that took us a bit to figure out.

There are some more resources you can find about how all this is set up, very interesting reads: the HTTP upgrades documentation, on how these HTTP/2 tunnels work; the TLS documentation for the mTLS setup; and the double proxy example in the sandbox. Those were very helpful for understanding how Envoy should be configured for these use cases.

So that's all from me. If you have a use case like this, then I hope this helped you and showed you what you can do with Envoy: have multi-tenant Kubernetes clusters and also have dedicated infrastructure, like networking. This solution of providing a VNet for each tenant allows us to use any of the cloud provider's products: we can set up VPN, dedicated IPs, ExpressRoute, private connections, anything like that; any cloud provider will let you do that once you have a virtual network. And then the configuration with Envoy, and all this architecture with Envoy, allows us to bridge the gap between the Kubernetes clusters and the virtual machines running on this VNet. So this is one solution that we are running now, and it solves a lot of these use cases for us. I hope it's also useful for you, and I think I'll be able to answer some questions. Thank you, and have a good day.