Hello, my name is Carlos Sánchez and I'm here to talk to you about dedicated infrastructure in a multi-tenant world, and about some of the things we are building at Adobe in one of our services. I'm a cloud engineer on Adobe Experience Manager Cloud Service and a long-time open source contributor. I started the Jenkins Kubernetes plugin, which allows you to run Jenkins agents on Kubernetes, and I've been a contributor to other projects such as Apache Maven, Puppet and a few more.

First, a brief introduction to what Adobe Experience Manager is. It's a content management system that does digital asset management, digital enrollment and forms, and it's used by many Fortune 100 companies. This is the product that my team and I work on at Adobe. From a technical point of view it's a distributed Java OSGi application that uses a lot of open source components from the Apache Software Foundation. It also has a huge market of extension developers who write modules that run in-process inside Adobe Experience Manager, which is important, as you'll see later.

We run Adobe Experience Manager Cloud Service on Kubernetes. It launched over a year ago and it all runs on Azure. We have more than 14 clusters right now and growing, in multiple regions: United States, Europe, Australia, Japan, and we keep adding more. Something to note is that Adobe has a dedicated team managing clusters for multiple products. We are one of that team's customers, so we get the clusters handed to us, and that defines what we can do in them: we don't have full access to the Kubernetes clusters. Another important point is that customers can run their own code. We have limited cluster permissions for security, and another requirement is that traffic leaving the cluster must be encrypted for compliance. That will matter later on as I explain what we built. We use namespaces to provide scopes: namespaces in Kubernetes give us network isolation, quotas and permissions. If you want more details on our Kubernetes setup, you can watch my KubeCon 2020 talk, where I went over all the things we built on top of Kubernetes to support AEM. This talk focuses on what we are doing with Envoy.

The requirement we got from customers is that they wanted dedicated infrastructure. Dedicated infrastructure boiled down to dedicated egress IPs: for traffic going out of the cluster, they wanted a specific IP that is not shared with other customers, for multiple reasons, for instance avoiding rate limits when hitting third-party API servers, or filtering at the network level in firewalls. They are also interested in private connections to their data centers or other cloud services: virtual network peering, Private Link, ExpressRoute, Direct Connect, all those services that cloud providers offer to set up virtual networking across multiple locations. And VPN is another one, whether they want people to access the service only through a VPN or to connect to their internal systems through a VPN. So this came in as a requirement, and we worked iteratively on how best to support it.
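As a generic illustration of that namespace scoping (not our actual manifests, the names and limits here are made up), a per-tenant namespace would typically carry its own resource quota and a default-deny network policy:

```yaml
# Hypothetical per-tenant namespace scoping; names and limits are examples only.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
# A quota caps what a single tenant can consume in its namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    pods: "50"
---
# A default-deny policy provides the network isolation; specific egress is then
# opened per tenant as needed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
```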
The first version we built ran Squid proxies on virtual machines connected to the internet, and those virtual machines gave the traffic going through them its own dedicated IP address. Here is how it looked. On one side we have Kubernetes, with pods running Adobe Experience Manager inside the cluster, and their traffic goes out through the cluster load balancer. That traffic goes into a virtual network where we run multiple Squid VM scale sets, one for each customer, one for each tenant. So the traffic goes from the cluster into the Squid virtual machines, and then out through the load balancer that sits in front of the Squid machines. That load balancer handles both incoming traffic to Squid and outgoing traffic, and it has a dedicated public IP for that scale set of virtual machines. We run a scale set because we don't want just one Squid server: if it goes down, or if we want to run updates, we obviously want more than one running, and we want it to scale automatically on demand if there's a lot of traffic. So again: traffic comes from Kubernetes into a load balancer in front of the Squid VMs, and from there it goes out to the internet through that load balancer, which has the dedicated IP address. The load balancer is what provides the dedicated IP, and there is a scale set for each tenant.

The Java virtual machine is configured with Squid as an HTTP proxy, and this is transparent for the customer. Adobe Experience Manager, as I mentioned, is a Java application: you set some system properties and then every connection going out to the internet uses that proxy. So all the traffic flows transparently into the Squid VMs and then out to the internet. We use Kubernetes network policies to prevent one tenant from accessing the Squid proxy of a different tenant; that's how we prevent one tenant from using another tenant's dedicated IP. As I mentioned, each tenant gets a scale set and its own load balancer with the dedicated IP. All the virtual machines run in a virtual network that is peered to the Kubernetes cluster's virtual network, which gives us high-speed, low-latency, private connections between the two virtual networks.

What is good about this solution? It's a simple and transparent JVM configuration using the system properties that Java supports, and the virtual network peering keeps all the traffic private. On the other hand, there are some issues. Proxy authentication and authorization are not well supported in Java or Squid, so we need network policies to prevent one tenant's traffic from going through a different tenant's Squid VMs, and that complicates the setup a bit. It only works for the HTTP and HTTPS protocols, and if somebody were to use plain HTTP by mistake, that traffic would not be encrypted; that's a general problem, of course, people should not be using plain HTTP. And it doesn't support other use cases like VPNs or private connections, because everything runs in the same virtual network. All the proxies, all the VMs, run in one virtual network, which makes it easy to peer with the Kubernetes network but in turn makes those other use cases hard or impossible. So we came up with a second iteration using Envoy.
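To give an idea of the kind of network policy involved, here is a hedged sketch: pods in one tenant's namespace are only allowed to send proxy traffic to that tenant's own Squid subnet. The CIDR, port and names are placeholders, not our real values.

```yaml
# Hypothetical policy: pods in tenant-a's namespace may only send proxy traffic
# to tenant-a's own Squid scale set.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-own-squid-only
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  - to:
    - ipBlock:
        cidr: 10.50.1.0/24   # tenant-a's Squid subnet (example value)
    ports:
    - protocol: TCP
      port: 3128             # Squid proxy port (assumption)
```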
The second iteration is a very similar architecture, but we run Envoy on the virtual machines and also run Envoy as a sidecar next to the Adobe Experience Manager containers in the pods. On one side we have the Kubernetes cluster; on the other we have a virtual network, and this time each tenant gets its own dedicated virtual network. We have a virtual machine scale set as before, but this time running Envoy, and for the same reasons: we don't want a single VM that can go down, it runs across multiple availability zones, and it can scale out if there's a lot of traffic. Having one VNet per customer also opens up more opportunities to grow the architecture and support more use cases. So each tenant gets a virtual network, a scale set and load balancers. On the right-hand side, the virtual network can be privately connected to the customer network: cloud providers let you set up VNet peering, VPN, Private Link, ExpressRoute and so on at the virtual network level, as a service. You just tell Azure, AWS or Google to connect this virtual network to something the customer already has. That gives us all the use cases the previous iteration didn't cover.

The load balancer in front of the virtual machines, as before, gives the traffic leaving Envoy a dedicated public egress IP. Once the traffic is on the virtual machines it can go out to the internet with its own dedicated IP, which we already had in the previous version and still support. We also have a private load balancer connected to the customer-side network. All the traffic going from Envoy into the customer network, through whichever connectivity option the cloud provider offers, gets a dedicated private egress IP through that load balancer. So the first load balancer gives a dedicated public IP to traffic going to the internet, and this one gives a dedicated private IP to traffic going into the customer's private network.

The JVM is configured with an HTTP proxy as before, but this time, instead of pointing to the VMs, it points to the Envoy sidecar running in the same pod. The Adobe Experience Manager Java container uses the localhost Envoy sidecar as a transparent forward proxy. Then, from that sidecar Envoy to the Envoy running on the VMs, we establish an HTTP/2 tunnel for all the traffic, encrypted and authenticated with mutual TLS. That gives us full encryption from the pod to the virtual machines running Envoy in the scale sets.

What's good about this solution? We keep the same advantage of transparent configuration on the Java virtual machine using the HTTP proxy system properties, so that part stays transparent to the customer. We support any protocol, not just HTTP. HTTP is easy because Envoy can inspect the headers and know where the traffic is going, but for any other protocol we can add Envoy listeners on different ports in the sidecar container, and the traffic arriving on those ports is routed to the correct destination. And all the traffic leaving the pod through Envoy towards the outside of the cluster is encrypted, so if the customer sends something unencrypted by mistake, it gets encrypted automatically.
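To make the sidecar arrangement concrete, here is a rough sketch of what such a pod could look like: the Adobe Experience Manager container gets the standard Java proxy system properties pointing at localhost, and the Envoy sidecar runs next to it with the per-tenant certificate mounted. Image names, the JAVA_OPTS variable, ports and paths are illustrative assumptions, not our actual deployment.

```yaml
# Illustrative pod shape only; names, images, ports and paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: aem-author
  namespace: tenant-a
spec:
  containers:
  - name: aem
    image: example.com/aem:latest
    env:
    # Standard JVM proxy properties: every outbound HTTP(S) connection from the
    # Java process goes through the local Envoy sidecar on port 3128.
    - name: JAVA_OPTS
      value: >-
        -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128
        -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128
  - name: envoy-sidecar
    image: envoyproxy/envoy:v1.24.0
    args: ["-c", "/etc/envoy/envoy.yaml"]
    ports:
    - containerPort: 3128    # forward-proxy listener used by the JVM
    volumeMounts:
    - name: envoy-config
      mountPath: /etc/envoy
    - name: tenant-certs     # per-tenant client certificate for the mTLS tunnel
      mountPath: /certs
      readOnly: true
  volumes:
  - name: envoy-config
    configMap:
      name: envoy-sidecar-config
  - name: tenant-certs
    secret:
      secretName: tenant-a-envoy-cert
```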
The dedicated virtual network per tenant lets us configure multiple options at the cloud provider's service level: VPN, private connections, ExpressRoute, Private Link, all of that. It's just a matter of configuration on the cloud service. Mutual TLS prevents unauthorized connections and stops one tenant from connecting to a different tenant's Envoy virtual machines. What we were doing before with network policies we can now do with certificates, which is much simpler to manage and more secure. There are some drawbacks. VPN and private connections require the VNet to have an IP range that doesn't overlap with the customer's private network, so it requires some interaction with the customer to learn their IP ranges. And certificate management becomes a bit complex, because we need one set of certificates per tenant for both the sidecars and the virtual machines. We have a certificate authority and we issue certificates for each tenant, for the sidecar, for the VMs and for all the pods, and then we have to rotate them, handle expiration and so on, and that gets a bit complicated.

How did we configure all this with Envoy? Envoy is very powerful, so we looked at the features it supports and built a solution where, on one hand, you have the Envoy sidecar running in the pod, and on the other hand the Envoy running on the virtual machine. From the container running Adobe Experience Manager, HTTP traffic goes to the Envoy sidecar, using it as a proxy with the CONNECT method. That's one listener using Envoy's TCP proxy filter, for both HTTP and HTTPS; HTTP CONNECT gives Envoy the destination. All the traffic entering the sidecar through this listener is encrypted and tunneled as TCP over HTTP/2 using mTLS, and it goes to the Envoy running on the VM. The Envoy on the VM examines the CONNECT header and then sends the traffic to the internet through the dedicated public IP for that tenant, or, if applicable, to the private network through the dedicated private IP.

For other ports that are not HTTP, we have one listener per port, and the destination is hard-coded in the Envoy configuration, in the tunneling config. So we have a set of rules: basically, if traffic arrives on port 10000, send it to smtp.example.com on port 25; if traffic arrives on port 10001, send it to mysql.example.com. The Envoy running on the VM then knows what to do with it and sends the traffic to the right place through the right interface. In the sidecar's Envoy configuration we also have the cluster configuration, which simply points to the load balancer in front of the Envoys running on the VMs, and we configure the TLS transport socket to do the mutual TLS.

Now some code examples. The listener for the proxy looks like this: it listens on port 3128, and the filter is just the TCP proxy, pointing to cluster_0. For other, non-HTTP ports you do the same thing, but in the tunneling config you put the hostname of the destination, so any traffic coming through that listener ends up at the hostname and port you set there. On the Envoy sidecar, the cluster configuration just points to the load balancer endpoint in front of the Envoy VMs.
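Putting those pieces together, here is a minimal sketch of what the sidecar configuration could look like, in the spirit of the tunneling examples on envoyproxy.io. The load balancer hostname, the extra port, the destination and the certificate paths are placeholders, not our production values.

```yaml
# Minimal sketch of the sidecar Envoy; hostnames, ports and paths are placeholders.
static_resources:
  listeners:
  # Forward-proxy port the JVM points at via the http(s).proxyHost/Port properties.
  # The raw proxy requests (CONNECT for HTTPS, absolute-form for HTTP) are passed
  # through unchanged over the mTLS connection and handled by the VM-side Envoy.
  - name: jvm_proxy
    address:
      socket_address: { address: 127.0.0.1, port_value: 3128 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: jvm_proxy
          cluster: cluster_0
  # Example of a non-HTTP port: anything sent here is wrapped in an HTTP/2 CONNECT
  # with a hard-coded destination and unwrapped by the VM-side Envoy.
  - name: smtp_tunnel
    address:
      socket_address: { address: 127.0.0.1, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: smtp_tunnel
          cluster: cluster_0
          tunneling_config:
            hostname: smtp.example.com:25
  clusters:
  # Points at the load balancer in front of the per-tenant Envoy VMs.
  - name: cluster_0
    connect_timeout: 5s
    type: LOGICAL_DNS
    # The CONNECT tunnel towards the VMs runs over HTTP/2.
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: cluster_0
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: envoy-vm-lb.tenant-a.example.internal, port_value: 443 }
    # Client side of the mutual TLS towards the Envoy VMs.
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain: { filename: /certs/sidecar-cert.pem }
            private_key: { filename: /certs/sidecar-key.pem }
          validation_context:
            trusted_ca: { filename: /certs/ca.pem }
```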
We're just saying where the Envoy VM load balancer is listening: in this case it would be envoy_vm on port 443, for instance. Traffic goes to that load balancer, and the load balancer sends it on to Envoy. And the transport socket configuration is where all the mTLS happens: you set the certificate and private key of that Envoy and the trusted CAs. As long as both sides have a certificate trusted by that CA, Envoy can connect, and the Envoy sidecar will present its own certificate to the other side.

Now on the VM side, we have one HTTP connection manager listener with the CONNECT upgrade, and it takes all the traffic coming through one port, and one port only; we don't need more than one. And for the clusters we just configure a dynamic forward proxy, which handles all the connections and all the destinations for all the traffic, so we don't need more than one cluster either. On the listener side, we listen on port 443 with the HTTP connection manager filter, and it demultiplexes all the traffic coming through the tunnel. This is configuration you can find in the Envoy examples: basically it says the upgrade type is CONNECT, so everything coming through there is sent to the destination set in the CONNECT headers. One small detail: if you only use this as a proxy for HTTPS, the first route, the one matching on CONNECT, is enough. But if you have encapsulated traffic that was originally plain HTTP, you need a second route that matches the "/" prefix and sends it to the same cluster. So you need those two routes to match both HTTP and HTTPS. The filters section is just the router, with allow_connect in the HTTP/2 protocol options and the CONNECT upgrade type, to handle all the traffic coming through the tunnel.

On the transport socket, this is the other side of the mTLS connectivity and security. As before, we set the Envoy certificate, private key and the trusted certificate authority. A small detail here is that we match subject alternative names against a specific SAN, in this case the Envoy sidecar's, so only certificates carrying that SAN can connect to this Envoy. This is where we define which tenant can connect to which Envoy: you have certificates with different SANs, and each SAN can only talk to its counterpart Envoy on the VM. Each set of Envoys has its own entry here, its own matched subject alternative name, and the Envoy sidecars have different certificates, with different subject alternative names, for different tenants. This is what validates that an Envoy VM is not receiving connections from somebody else. On the cluster side it's just a simple dynamic forward proxy, so we don't have to define the destinations; it figures them out from the DNS name and just forwards all the traffic.
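And here, roughly, is the VM side: one HTTP connection manager listener that terminates the CONNECT tunnel, two routes to cover both the CONNECT and plain HTTP cases, a single dynamic forward proxy cluster, and the downstream TLS context that pins the subject alternative name allowed to connect. Again a sketch following the envoyproxy.io CONNECT and double-proxy examples; addresses, certificate paths and the SAN value are placeholders.

```yaml
# Minimal sketch of the VM-side Envoy; addresses, paths and SAN are placeholders.
static_resources:
  listeners:
  - name: tunnel_terminator
    address:
      socket_address: { address: 0.0.0.0, port_value: 443 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: tunnel
          # Accept CONNECT over the HTTP/2 tunnel coming from the sidecars.
          http2_protocol_options:
            allow_connect: true
          upgrade_configs:
          - upgrade_type: CONNECT
          route_config:
            virtual_hosts:
            - name: forward
              domains: ["*"]
              routes:
              # HTTPS and the tunnelled non-HTTP ports arrive as CONNECT requests;
              # terminate the tunnel and forward the payload as raw TCP.
              - match:
                  connect_matcher: {}
                route:
                  cluster: dynamic_forward_proxy
                  upgrade_configs:
                  - upgrade_type: CONNECT
                    connect_config: {}
              # Plain HTTP proxy requests need this second route on "/".
              - match:
                  prefix: "/"
                route:
                  cluster: dynamic_forward_proxy
          http_filters:
          - name: envoy.filters.http.dynamic_forward_proxy
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.dynamic_forward_proxy.v3.FilterConfig
              dns_cache_config:
                name: dfp_cache
                dns_lookup_family: V4_ONLY
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      # The other half of the mutual TLS: present the VM certificate, require a
      # client certificate, and only accept the SAN issued to this tenant's sidecars.
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          require_client_certificate: true
          common_tls_context:
            tls_certificates:
            - certificate_chain: { filename: /certs/envoy-vm-cert.pem }
              private_key: { filename: /certs/envoy-vm-key.pem }
            validation_context:
              trusted_ca: { filename: /certs/ca.pem }
              match_typed_subject_alt_names:
              - san_type: DNS
                matcher: { exact: envoy-sidecar.tenant-a.example.internal }
  clusters:
  # One dynamic forward proxy cluster handles every destination; the DNS name
  # comes from the CONNECT authority or the Host header.
  - name: dynamic_forward_proxy
    connect_timeout: 5s
    lb_policy: CLUSTER_PROVIDED
    cluster_type:
      name: envoy.clusters.dynamic_forward_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
        dns_cache_config:
          name: dfp_cache
          dns_lookup_family: V4_ONLY
```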
Some resources: on the envoyproxy.io documentation site you can read about HTTP upgrades, which explains how to do all this tunneling, and about "TLS and double proxy", which explains how to do the mTLS authentication between two Envoy proxies.

Some advice about debugging setups like this. TLS connection errors only show up in the debug logs of the connection component; from the client's point of view, you only see that the connection is dropped, or rather that the connection is closed. You have to go and look at the debug logs for the connection component. There you will see things like, if the certificate's subject alternative name doesn't match the one the Envoy on the VM expects on the other side, "certificate verify failed" on the virtual machine side. That's where you can see why the connection is being closed. If you see that on the VM side, or something like "alert certificate unknown" on the sidecar side, you know you made a mistake in the configuration or something along those lines. Otherwise Envoy is not going to give you a lot of information at the default logging levels.

So that's it, I hope you liked it. If you are looking to do something like this, having dedicated infrastructure in Kubernetes clusters that are shared across multiple tenants, Envoy is a very nice solution for the networking side. For us it enables the dedicated IPs, dedicated virtual network connections, dedicated Private Link, ExpressRoute, VPN; all these things are basically put together thanks to Envoy, allowing us to do end-to-end encryption, route the traffic and dedicate infrastructure to each specific tenant. It's a very interesting project, I've liked it a lot so far, and I hope you get some ideas from this talk. Thank you, and you can find me on Twitter at @csanchez. Thank you, have a good day.