Hello, I'm Michael Farinacchi. I'm a staff software engineer at VMware, and my talk is Securing Microservices with a Service Mesh in the Real World. I'm going to cover network and infrastructure trends, answer the questions "What is a service mesh?" and "What is missing from current meshes?", and then cover global namespaces, which is what I'm currently working on.

I believe that enterprise networks have changed, creating new challenges not only with routing but also with security. Enterprise networks are increasingly multi-cloud, multi-location, and multi-platform. You have different providers in the cloud, your own private data centers, different ISPs, potentially SD-WAN on top of that, and private lines. You have many locations: many data centers, many offices, edge computing. You could have Kubernetes running anywhere, and you could have VMs, so now you're mixing monoliths with newer containerized, microservice architectures. All of this creates unique challenges from a networking perspective. The size of the network has increased, and the types of devices accessing it have changed too. A lot of them are now ephemeral, which makes it very difficult to track who's actually on the network at any given time.

The Kubernetes network model presents its own unique challenges. To start with, a pod on any given node of a Kubernetes cluster should be able to communicate with all other pods on all other nodes of the cluster without NAT. That means deploying a pod and interconnecting it with other pods in the cluster is super easy by default. But it also means you're taking much greater security risks: the potential for lateral movement if a pod ever becomes compromised is much more severe and damaging. And when you consider that these clusters can exist anywhere in the network and are all interconnected, it becomes even clearer that a breach is very difficult to contain when it does happen, and very difficult to locate, because these pods are short-lived and hard to track without identities associated with everything on the network.

Having identified the problems created by current network trends and application architecture changes, we can segue into what a service mesh is, because it's a potential solution to these problems. Before we talk about how a service mesh can solve our network security problems, let's first discuss what it is by going over the features it provides. In my opinion, there are three main features: traffic management, security, and observability. With traffic management, we get automatic load balancing, routing rules, rate limits, and quotas. With security, you get service authentication and authorization through mTLS, and you can employ a zero-trust network model, which means all communication that occurs needs to be whitelisted, and anything that isn't whitelisted cannot happen. That's very different from a vanilla Kubernetes application deployment, where everything can access everything else. And with observability, you get metrics, logs, and traces for all traffic within a cluster. You may have to sample these things past a certain scale, but for the most part you get visibility that did not exist before.
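To make that zero-trust allowlist idea concrete, here is a minimal sketch using Istio's AuthorizationPolicy API, assuming a hypothetical shop namespace with frontend and orders services: a deny-all default for the namespace, plus one explicit allow rule. (This is the current API; it differs from the alpha RBAC APIs of early Istio releases.)

```yaml
# Allow-nothing default: an AuthorizationPolicy with an empty spec matches
# no requests, so once it's in place only explicitly allowed traffic flows.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: shop          # hypothetical namespace
spec: {}
---
# Explicitly allow the frontend service account to call the orders workload.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-orders
  namespace: shop
spec:
  selector:
    matchLabels:
      app: orders          # hypothetical workload label
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/shop/sa/frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
```

Anything not covered by an allow rule, including pod-to-pod traffic that vanilla Kubernetes would have permitted by default, is simply rejected.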
There isn't just one service mesh; there are actually quite a few, and they all address the traffic management, security, and observability challenges that exist in this new Kubernetes environment. I personally have used Istio, and that's what I'll discuss for the rest of this talk.

Istio's architecture is a control plane and a data plane. The data plane is the Envoy sidecar proxies, which run in every single pod that's part of the mesh. The control plane was a collection of microservices, each with an individual job. I won't go into the individual details, because the architecture has since changed to something much easier to manage: the microservice control plane was quite complicated to maintain, and as a result the architecture shifted to something that resembles more of a monolith, called istiod, where you only need to deploy a single pod to manage your entire Istio control plane. It's easier to operate a single pod for your mesh than the many pods of the old architecture, and I believe this is a step in the right direction for stability and manageability.

The modern network has quite a few attack vectors. For instance, we are now reliant on DNS for service discovery, and DNS can be hijacked. We also have flat networks, so route injection is possible. The result is that we are potentially exposed to man-in-the-middle attacks: we may be talking to destinations that are not trusted. To solve these problems, we can move toward encrypting everything on our network, and Istio provides authentication and authorization out of the box, and it's actually quite good. Istio provides workload identity using Kubernetes service accounts as a potential principal, and user request-level authentication using JWTs is possible; the principal can come from the transport (mTLS) or from origin authentication. Authorization is possible with RBAC, and you can supply optional "when" conditions to make your RBAC policies even more precise.

Istio uses Kubernetes service accounts to identify who a service runs as. The identity is assigned at service deployment time and encoded in the SAN field of an X.509 certificate. Both the source and destination are authenticated via the TLS handshake, and all requests go through the Envoy proxies, so TLS is handled between source and destination without you having to build any of it into your service logic. You can apply policies to the source, the destination, or both at the same time, and you can also put optional conditions on when they apply.

Service-to-service communication with Istio is mutual TLS, and it's automatic, whether mesh-wide, per namespace, or per service. There are some costs associated with it. For example, with a thousand services, 2,000 sidecars, and 70,000 mesh-wide requests per second, the Envoy proxy uses half a vCPU and 50 MB of memory per thousand requests per second going through the proxy, and it adds 2.76 milliseconds to the 90th-percentile latency. So there is a cost to deploying a service mesh, but I think it is more than worthwhile from a security vantage point. Mutual TLS configs can also be complex for large clusters; that's another consideration, partly because you're going to have a lot of roles and a lot of rules about who can talk to whom and when. It's all manageable, but it's not "just deploy the mesh and expect everything to work."
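As a concrete illustration of those scopes, here is a minimal sketch of enabling strict mutual TLS with the PeerAuthentication API; note this API arrived around Istio 1.5, replacing the earlier alpha security APIs, and the payments namespace is hypothetical.

```yaml
# Mesh-wide strict mTLS: a PeerAuthentication named "default" in the mesh
# root namespace (istio-system in a default install) applies to every workload.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# The same policy scoped to a single namespace, useful while migrating
# the rest of the fleet one namespace at a time.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments      # hypothetical namespace
spec:
  mtls:
    mode: STRICT
```

STRICT rejects any plaintext connection; there is also a PERMISSIVE mode that accepts both, which makes incremental rollout much less risky.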
One of the things that instantly sold me on service meshes was the observability you gain. In particular, I thought the Kiali dashboard was amazing. Instead of having to manually create the service topologies or the overall network graphs, it's already there for you in real time. You can see who's talking to whom at any given instant and what they're actually doing.

Do service meshes solve the new network challenges? I think the answer is yes. Within the Kubernetes cluster, we now have identity, observability, the ability to audit, and the ability to provide micro-segmentation and whitelisting. I think this solves a lot of the problems that exist within microservice architectures. However, the network is much larger than a single cluster, as we discussed before; not everybody is going to have just one Kubernetes cluster. In the case where you have many clusters, Istio has two different solutions. One is a shared control plane: you only deploy the controllers in one cluster, and they manage all of the clusters that are on the same contiguous network. In that case you get mTLS between all the services, but you are essentially restricted to one geo location. The other option is a replicated control plane, where the same root CA is used for all clusters and the Istio control plane runs on every single cluster as pretty much a replica. With this model you get encrypted cross-cluster traffic through a gateway instead of service to service directly, and this is possible because you are using the same credentials.

Now I would like to talk about some of my personal experiences with Istio. I used Istio 1.0 through 1.5 in production, and I have seen huge improvements in stability. Some of the alpha APIs changed; that was expected, though I would say the security APIs changed more than most others. I have used the mTLS and routing features, tracing, and Kiali, and I only used Istio in single-cluster environments, for instance one cluster for prod, staging, or dev company-wide. I have used multi-cluster Istio managed by GNS, which I will talk about later, but I never used Istio multi-cluster as a native feature.

Over the years and many versions of Istio, I encountered a few issues that stood out. One was the port name issue, all the way back in 1.0. The problem was that all service-to-service connectivity was broken, because Envoy requires a specific port name convention: service ports have to be named for their protocol, like http, grpc, or http-web. The solution was to use an admission controller, and Istio now surfaces warnings for config errors like this, which didn't exist back when this was a problem for me. Another thing I noticed was that Pilot had very high CPU usage in one of the earlier versions and was unable to keep up with updates. The fix is that you're supposed to use the Sidecar CRD to limit what's advertised to each proxy (there's a sketch of this just below); eventually your cluster gets to a size where the control plane can't process everything if there's a network-wide flap. There was also a bug that was pretty damaging for production: after some time, ingress gateways would stop forwarding traffic, and the root cause ended up being that the Envoy state machine can get stuck, unable to process listener updates. My team actually pushed a fix upstream to Envoy so that no users would encounter this issue again. And there was another bug we noticed in production, the gateway SDS bug: TLS and mTLS gateways were not forwarding traffic because an update wasn't being pushed. Fortunately, the Istio team fixed it in release 1.7.
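Here is a minimal sketch of the Sidecar resource I just mentioned, assuming a hypothetical payments namespace. By default the control plane pushes configuration for every service in the mesh to every proxy; this resource narrows that down so a network-wide flap doesn't force a mesh-wide recomputation.

```yaml
# Only push config for services in this namespace plus istio-system
# (the control plane and shared gateways) to sidecars in "payments",
# instead of config for every service in the mesh.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: payments      # hypothetical namespace
spec:
  egress:
  - hosts:
    - "./*"                # services in the same namespace
    - "istio-system/*"     # control plane and shared gateways
```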
Despite the challenges I've seen with Istio, I'd say Istio development is shifting from features-first to reliability and stability, and the later versions of Istio are much more enterprise-grade than they've ever been. Multiple Kubernetes clusters running Istio have been addressed. But what about the rest of the network? As I said before, the enterprise network is now multi-cloud, multi-location, and multi-platform, and we now have monoliths mixed with microservices. That's a whole set of new challenges that need to be addressed.

To address some of the remaining problems that Istio alone doesn't solve, I'd like to introduce the concept of a global namespace. A global namespace, or GNS, is an abstraction that enables users to securely deploy and manage interconnected applications agnostic of platform or location. Any service within a GNS can securely communicate with all other services within the same GNS by default, and federation policies allow inter-GNS communication. As for how a GNS works, I'd like you to think of it as a controller for your service meshes' controllers. You can federate many different service meshes, or load balancers, gateways, and firewalls, and access a unified control plane for all of them. Your large, modern enterprise network therefore has one management access point, where you also have full visibility.

An example GNS use case could be where you have two clusters and two different namespaces. In this situation it shouldn't matter where your application pods are deployed; they should be able to communicate with each other by default, across clusters and across namespaces, with fully encrypted mTLS connections. We can extend the global namespace to the enterprise network example, where a Kubernetes cluster has a namespace foo with two pods, and those pods should be able to communicate with legacy VM workloads that could be sitting in a vSphere cluster. The GNS should take care of all the interconnectivity and encrypt all connections between the pods and the VMs.

Global namespaces should also provide dedicated DNS per global namespace: services should not be discoverable outside the GNS except where explicitly configured. Ingress and egress gateways can be shared cluster-wide if isolation is not critical, and we can have dedicated gateways per GNS for sensitive workloads, specifically in the case of a multi-tier or multi-privilege application or network. I believe policies, security policies specifically, can be greatly reduced in complexity with the global namespace concept. Take external services, for example: services within a GNS should be able to access the same whitelisted public IPs or domains merely because they're part of the same privilege level. Every service within a GNS can talk to every other service within the GNS, so they should also be able to talk to the same external services. As far as exposing those services, you can have global public services, where select services are exposed to workloads outside of Kubernetes, so that we can interconnect our VM workloads with our microservice deployments in today's hybrid network. And we can get HA and global load balancing with the global namespace concept as well.
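There's no single standard API for global namespaces, so the following is a purely illustrative pseudo-manifest, not a real product API; every kind, field, and name in it is hypothetical. It's only meant to show the kind of information a GNS has to capture: member services across clusters and platforms, a dedicated DNS domain, and a shared external-service allowlist.

```yaml
# Hypothetical sketch only -- not an actual GNS API.
kind: GlobalNamespace
metadata:
  name: checkout-gns
spec:
  domain: checkout.global          # dedicated DNS domain for this GNS
  members:                         # services joined from anywhere
  - cluster: aws-east-k8s          # hypothetical Kubernetes cluster
    namespace: foo
    service: cart
  - cluster: onprem-vsphere        # hypothetical VM estate
    vmGroup: inventory-vms
  externalServices:                # one allowlist for the whole GNS
  - host: api.payments-partner.example.com
```

Everything inside the GNS gets mTLS and mutual discoverability by default; everything outside it sees nothing unless a service is explicitly published.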
Overall, I think Istio plus the global namespace concept solves most, if not all, of the modern network security challenges presented here. We don't have to worry about route injection or DNS hijacking, and we can track who we're talking to, source and destination, with workload identities. It's overall a much safer environment, one that we can monitor with observability and audit when things go wrong.

Now I'd like to wrap everything up and summarize the conclusions we came to. Istio does solve many of the modern network security challenges that exist, and Istio is prioritizing stability and supporting multi-cluster. Ultimately, the concept of global namespaces is needed for the modern hybrid network that mixes old monoliths with new containers. I think we are heading in the right direction, and a lot of the problems that exist now, or that will become problems because of the direction we're going, will be addressed. I hope you enjoyed my talk. I've left my contact info on this slide, so please feel free to reach out with any questions regarding what we just discussed. I'm happy to answer any and all questions.