Hello everyone. So we're going to talk about Istio and Cilium today. Starting with a quick intro: hey, my name is Ahmed Bebars, I'm a software engineer at The New York Times. I do scuba, I do AWS, I do Go, and I'm joined here by Pete. I'm Pete, also an engineer here at the Times. It's an honor to be here; I'm super excited to give this talk. I'm from Brooklyn, New York. You know, this is my first KubeCon, so super excited. We'll make it fun.

So I was thinking about how to start this talk, and this quote from Albert Einstein caught me, which is: out of complexity, find simplicity. Usually, as engineers, most of the time we try to find the best working solution, but over time it gets more complicated, and when we talk networking, more complications mean more problems, more debugging, more time spent trying to figure out what the hell went wrong in my stack. So today we're walking through how we started to build a platform at The New York Times and transitioned, one step at a time, to make things simpler in a good way.

Here's our agenda for today. We'll talk about the multi-tenant cluster foundation; we've given talks about this before, but we'll get the foundations sorted out. We'll talk about our day-one setup, how we started the platform and set up all of our tenants with their networking stack and runtime. Then we'll hand it over to Pete, who'll talk about Cilium and eBPF, and then we'll close with the takeaways.

Starting with our platform. We host our platform on a multi-tenant architecture. What I mean by that: we have clusters that host all of the runtime for our tenants, and by tenants I mean customer engineering, the product engineering teams that have services they need to build on the platform. Each team gets their own cloud accounts, which get connected with our clusters. We have multi-regional clusters across multiple environments, and all of this is based on Kubernetes. That's how teams get their runtime access and build their deployments.

Now let's talk about CIDRs and network addressing. One of the things I want to mention here: because we're using a multi-tenant architecture, we want to ensure that all of our IP addresses are planned up and down the stack, so we're not running into IP exhaustion at any of the layers. You can see here that we're using the 10.x range for all of our resources across all of our cloud accounts, that's for the nodes, and then we transition from the 10.x range to the 100.x range for our Cilium pods. That gives us a good way of not running toward IP exhaustion in our clusters. Basically, we have a translation between the node layer and the pod layer.

Now we'll talk about strict isolation. Clusters are strictly isolated per tenant. We're multi-tenant, but each tenant runs different kinds of workloads, so when we deploy a service for a tenant, they get a specific namespace. They can get one or more namespaces, but by default they get one, and that namespace is isolated from the other namespaces, so they can't talk to each other by default. That's how we built it, to ensure all of this stays strict between namespaces.
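To make that default isolation concrete, here's a minimal sketch of what a same-namespace-only CiliumNetworkPolicy can look like, built as a Go unstructured object the way a policy controller might generate it. The policy name and the Go shape are illustrative assumptions, not the Times' actual defaults.

```go
// Sketch: a default "same-namespace-only" ingress policy for a tenant
// namespace. Illustrative only; names and rules are assumptions.
package policy

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// defaultIsolationPolicy selects every pod in the tenant namespace and only
// allows ingress from pods in that same namespace (for a namespaced
// CiliumNetworkPolicy, an empty endpoint selector is namespace-scoped).
func defaultIsolationPolicy(tenantNamespace string) *unstructured.Unstructured {
	return &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "cilium.io/v2",
		"kind":       "CiliumNetworkPolicy",
		"metadata": map[string]interface{}{
			"name":      "default-isolation", // hypothetical name
			"namespace": tenantNamespace,
		},
		"spec": map[string]interface{}{
			// Empty endpointSelector: apply to all pods in this namespace.
			"endpointSelector": map[string]interface{}{},
			"ingress": []interface{}{
				map[string]interface{}{
					// Empty peer selector: allow from any pod in the same namespace.
					"fromEndpoints": []interface{}{map[string]interface{}{}},
				},
			},
		},
	}}
}
```

Note this sketch only constrains ingress; in Cilium a pod becomes default-deny per direction once a policy with rules in that direction selects it, which is where the cluster-wide defaults described next (DNS in kube-system, access to istio-system) come in.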
Later we'll talk more about how we can open things up between different namespaces. Another benefit of using Cilium and network policies is that we can apply cluster-wide policies across workloads. So tenant namespaces by default get access to the istio-system namespace, for example, which allows all of the sidecars to communicate with istiod, the ingress gateway, and so on. And on the other side, all of the DNS lookups and other things need to happen in kube-system. These are all default policies that get applied to the cluster. So that's the foundation.

Let's talk more about our day-one setup, or how the platform started with the multi-tenant architecture. This is our ingress layer. Traffic comes in at the DNS level, hits whichever provider we're using, Route 53 for example, and then goes to an external ingress load balancer that hits our first layer of ingress, which is our Envoy. You can see there are multiple load balancers in between, but it was purposely built that way as a first step, so we could ensure we have a public ingress and then a private ingress. But it makes the setup really complex: you go from DNS to a load balancer to Envoy to another load balancer to Envoy to the workload. It works, in the sense that we can expose different criteria on the first load balancer and keep the second one internal, but it also complicates the process, and you can see we have multiple Envoys in the picture here, which makes the setup a little bit complex. I'll talk about why later.

Then we talk about the mesh, and how we built Istio in the first iteration. In the first iteration, as I said earlier, we have regional clusters, east and west, and to ensure communication between the east and west clusters we had to install load balancers between the two regions, and all of the communication between workloads goes in that direction. So we have pods with Istio sidecars, and when they want to communicate with another pod on the other side, in the west region, they have to go through the Istio gateways. Traffic traverses multiple layers, and between the previous ingress layer and the mesh setup you can see we have a lot of load balancers, Istio sidecars, and Envoys. The setup here seems a little bit complex; we were trying to make it work as a first iteration, but then we looked at how to simplify it.
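For reference (this config isn't shown in the talk), the cross-region path in an Istio multi-primary, multi-network setup typically runs through an east-west gateway that passes mTLS traffic straight through to the sidecars in the remote cluster. Below is a sketch of the canonical upstream Gateway for that, written as a Go unstructured object; it's the standard pattern, not necessarily the exact manifest used here.

```go
// Reference sketch of Istio's canonical cross-network (east-west) gateway.
// Values follow the upstream multi-primary/multi-network sample and are
// illustrative, not this platform's exact manifest.
package mesh

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func crossNetworkGateway() *unstructured.Unstructured {
	return &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "networking.istio.io/v1beta1",
		"kind":       "Gateway",
		"metadata": map[string]interface{}{
			"name":      "cross-network-gateway",
			"namespace": "istio-system",
		},
		"spec": map[string]interface{}{
			"selector": map[string]interface{}{"istio": "eastwestgateway"},
			"servers": []interface{}{
				map[string]interface{}{
					"port": map[string]interface{}{
						"number":   int64(15443),
						"name":     "tls",
						"protocol": "TLS",
					},
					// AUTO_PASSTHROUGH forwards workload mTLS to the remote
					// sidecar without terminating it at the gateway.
					"tls":   map[string]interface{}{"mode": "AUTO_PASSTHROUGH"},
					"hosts": []interface{}{"*.local"},
				},
			},
		},
	}}
}
```

That passthrough hop is what keeps workload mTLS intact across regions, but it's also exactly the kind of extra moving part the rest of the talk works to remove.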
And here I'll hand it off to Pete, who'll talk about how we're doing this with Cilium and eBPF.

Thanks, Ahmed. I'm going to pick up where Ahmed left off and talk about our current configuration and how we're leaning into Cilium and eBPF. This is a more holistic view of our network stack. Our friends at Istio, Cilium, and AWS are continuously improving and adding new features to their offerings. However, that also means there's more overlap between the features that they offer. For example, there are quite a few CNI providers, including AWS themselves. As platform engineers, it's important for us to map the capabilities we're going to build on to the vendor; otherwise, we're just making things more complex, not simpler. So we can see it here: with Istio, we use service discovery, L7 observability, and advanced traffic management. That's for things like routing a request based on the headers on that request. Cilium we use for the CNI, L3 and L4 observability, and namespace isolation; we'll get into that in the next few slides. And AWS we use for the EKS offering and the various VPC offerings. This makes migrations easier and the stack more maintainable overall.

If we drill in a little further, we can see how these are all tied together. At the bottom, Cilium and Cilium Cluster Mesh handle the CNI and workload identity. Next, we peer our VPCs to flatten the network, right in the middle here. And finally, up top, we're leveraging Istio with a multi-primary configuration to handle multi-region service discovery. This type of configuration makes active-active and active-passive setups more manageable for product developers because of the built-in failover inherent in this approach.

Let's take a deeper look at how that failover can work. Here we can see there are actually three moments where failover can happen. The first is at the DNS level: if the DNS query fails, the request will automatically be routed to the healthy host, if one exists. The second is an Envoy failover at the edge of each cluster that tenants can optionally configure. Finally, at the Istio mesh level, the sidecar will automatically fail over if the workload reports unhealthy. You may be wondering why that last failover is even necessary, so let's take another look from a different angle to see how it can be useful. Here we can see that the mesh failover is quite useful for in-cluster service-to-service communication. If a workload is degraded, the mesh failover will automatically proxy to the viable workload in the team's namespace with the same name in an alternative region. We'll get more into that in a second. Keep in mind that without explicit access grants, that proxying can't happen in a namespace outside of the namespace the tenant owns. This is thanks to Cilium's network policies, which Ahmed spoke about previously, and which enable the multi-tenancy aspect of the cluster.

Let's change gears a bit and take a look at how the Cilium network policies are maintained and automated. To keep the policies in sync across clusters, we use a Kubernetes operator pattern to distribute the Cilium network policies that keep namespaces sandboxed. This keeps the security folks happy, but workloads within the cluster still need to be able to communicate with each other, so let's see what we can do about that. Here we can see the operator up top, the controller, distributing the CNPs across all of our regions. As we saw on the previous slide, the controller manages the core Cilium network policies, which restrict namespaces by default. However, we also allow users to define their own Cilium network policies to explicitly open up access to other tenants. It's important to note that those Cilium network policies must pass through an admission controller to ensure they're only altering namespaces they're allowed to; or rather, that the policy stays within the tenant's own scope. This combination of Cilium and Istio is powerful, as it allows for granular service-to-service authorization capabilities.
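A rough sketch of the kind of check that admission step might perform, assuming the webhook hands us the tenant's namespace and the incoming policy as an unstructured object. The specific rules are illustrative, not the actual controller's logic.

```go
// Sketch of a validating-admission check for tenant-authored
// CiliumNetworkPolicies: the policy must live in the tenant's own namespace
// and cannot be a cluster-wide policy. Illustrative only.
package admission

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func validateTenantPolicy(obj *unstructured.Unstructured, tenantNamespace string) error {
	if obj.GetKind() == "CiliumClusterwideNetworkPolicy" {
		return fmt.Errorf("cluster-wide policies are reserved for the platform")
	}
	if ns := obj.GetNamespace(); ns != tenantNamespace {
		return fmt.Errorf("policy namespace %q does not match tenant namespace %q", ns, tenantNamespace)
	}
	// A namespaced CNP's endpointSelector only selects pods in its own
	// namespace, so tenants can't attach rules to other tenants' workloads;
	// peer selectors (fromEndpoints/toEndpoints) may reference other
	// namespaces, which is the "explicit access grant" case described above.
	if _, found, err := unstructured.NestedMap(obj.Object, "spec", "endpointSelector"); err != nil || !found {
		return fmt.Errorf("spec.endpointSelector is required")
	}
	return nil
}
```

Requiring endpointSelector here is just one illustrative guard; the real value of the webhook is keeping tenants inside their own namespace and away from cluster-wide kinds.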
Let's take a look at how a tenant might configure a service to work with these authorization policies. Here we're building on top of the first layer, Cilium, and introducing the Istio layer, which uses its AuthorizationPolicy. This handles both internal and external authorization, freeing applications from any validation business logic, since it's handled completely within the Istio sidecar. Writing this by hand can be fairly error-prone and not the most user-friendly thing to do, so we've abstracted it to look a little something like this. That then gets translated into the appropriate CRDs, and with this, tenants are able to leverage complex networking policies at a lower level with a fairly simple abstraction.

Now I'm going to take a moment and talk about some opportunities we've identified for improvement. In general, the reasons for consolidation in computer science are pretty obvious, but I think it's important to call them out: a smaller blast radius for us to maintain, the ability to lean more heavily on eBPF, a.k.a. Cilium in this case, and overall simpler management; things are easier to upgrade if we can consolidate them. To do that, we've identified two main things we're looking to consolidate: replacing Istio multi-primary with Cilium global services, and replacing Istio VirtualServices for L7 with Cilium's L7 configuration. Now I'm going to pass it back to Ahmed to take us out with some key takeaways.

Thank you, Pete. So this is the first diagram I showed, of how the cluster and all of the traffic hosted in it look. We're still working at the DNS level, with multiple load balancers, going through multiple Envoys, and I didn't even add the Istio sidecars there, just because I couldn't find enough space inside the chart. Then we moved from that layout to a simpler one, where we have an external ingress gateway, a load balancer that takes all of the traffic coming from outside of the cluster through DNS and sends it to Envoy, and then from Envoy directly to the workload. This doesn't help tremendously on latency; the problem is not latency. It's more about simplifying the number of hops that your traffic goes through. Instead of going through multiple layers of Envoys, TLS termination, and different load balancers, we're now going from one load balancer to Envoy to the workload.

However, there's still a piece missing here: we're still going from an ingress Envoy to another Envoy in the Istio sidecar, with the failover and all of the Istio sidecars on the clusters. That's another piece of abstraction that we want to get out of our setup. We want a single Envoy that handles all of the ingress traffic and also does all of the L7 traffic routing between the workloads. We're not there yet, but we're trying to get there, simplifying the process and making it much easier.

So, what have we learned so far? Make it simple. I ran into problems, and everyone who runs Cilium and Kubernetes networking runs into problems, and once you add more components to your setup, you have more debugging tools you need to add to your stack. Just simplify it as much as you can, even if it starts out a little bit harder; just try to simplify it down the road.
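To make the earlier point about abstraction concrete, here's a hedged sketch of one plausible expansion: a minimal, hypothetical access grant ("let this service account in that namespace call my app") expanded into an Istio AuthorizationPolicy. The input type and all names are invented for illustration, not the actual tenant-facing CRD, and in practice a matching CiliumNetworkPolicy would typically be generated alongside it so traffic is also allowed at L3/L4.

```go
// Illustrative sketch: expanding a simplified, tenant-facing access request
// into an Istio AuthorizationPolicy. The AccessGrant type and names are
// hypothetical.
package abstraction

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// AccessGrant is the kind of minimal input a tenant might provide:
// "let <FromNamespace>/<FromServiceAccount> call my workload <App>".
type AccessGrant struct {
	Namespace          string // tenant's own namespace
	App                string // app label of the workload being protected
	FromNamespace      string
	FromServiceAccount string
}

func authorizationPolicy(g AccessGrant) *unstructured.Unstructured {
	// Istio identifies the caller by its SPIFFE-style principal.
	principal := "cluster.local/ns/" + g.FromNamespace + "/sa/" + g.FromServiceAccount
	return &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "security.istio.io/v1beta1",
		"kind":       "AuthorizationPolicy",
		"metadata": map[string]interface{}{
			"name":      g.App + "-allow-" + g.FromServiceAccount,
			"namespace": g.Namespace,
		},
		"spec": map[string]interface{}{
			"selector": map[string]interface{}{
				"matchLabels": map[string]interface{}{"app": g.App},
			},
			"action": "ALLOW",
			"rules": []interface{}{
				map[string]interface{}{
					"from": []interface{}{
						map[string]interface{}{
							"source": map[string]interface{}{
								"principals": []interface{}{principal},
							},
						},
					},
				},
			},
		},
	}}
}
```

The tenant only ever sees the four fields on the access grant; everything else is generated, which is the "service A just needs to talk to service B" experience described next.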
User experience is very important. When you look at it, as a platform engineer, or as someone who works with Cilium day to day, you understand what a Cilium network policy looks like and you can define one easily. But when you, as a platform engineer, start talking to your product engineering teams and you say, hey, there are these Cilium network policies, there's kube-dns, there's this workload, there's TLS, there's mTLS, all of that kind of stuff, it gets pretty complex, and they shouldn't have to worry about any of it. As Pete showed earlier, as an engineer I'm deploying a Helm chart and I just need service A to talk to service B; that's all I care about. If I'm running the network, that's a different story, but the user doesn't really need to know everything behind it, unless they're a power user who wants to dive deeper because they have an edge use case. So user experience is pretty important, even in something as complex as the network stack.

And then, that approach works best because at the layer where we started, super early, we had multiple load balancers and most of the clusters had Istio sidecars on most of the containers. I don't remember the exact number, but roughly half of our containers were Istio sidecars and the other half were applications. The footprint of Istio has drastically decreased with some of the configurations we're doing. We've also reduced the number of hops between Istio sidecars, ingress, and all of the other traffic going through the cluster. So it's okay to start a little bit complex, as long as you know how to remove things from the path once you start making them simpler for your engineers.

And with that, that's a wrap. Thank you, everyone, and we'd love to hear your feedback on the survey. We do have time for a couple of questions if anybody has one.

Thank you very much for the great talk. I have one question regarding the network policies that you have for users. We're talking about people not having to care about more than service A talking with service B, right? What usually happens when we start applying a large number of network policies is that it gets very hard to debug and to test. So I want to know if you have any framework for testing, or some kind of dry run of your global network policies, so you don't get into situations where dependency management becomes very hard or impossible.

Sure, I can talk to one aspect of that. We have several tests; we're working on a test suite so that when we stand up a new cluster, it validates everything is working the way we expect. That's at the lower level. And there's a more interesting thing that we started to adopt recently for our tenants that are using Helm charts to deploy their applications.
We're actually using Helm unit testing to test the policies and ensure that, whichever way tenants write them, they're actually creating the policies we expect, and we can assert on those. So there are two different layers where we do that testing: one is more preemptive and the other is more reactive.

Yeah, and that's from a user perspective. As a platform, we depend on Hubble for the cases where we're trying to debug how network requests are flowing. We keep all of our flow logs, and we have a controller; most of the network policies are done programmatically, so they're not written by hand. It's either through a controller, which generates some default policies, or through a Helm chart, which templates something specific. Beyond that, we use Hubble to track all of the flows happening between services and understand what's being dropped and what's being allowed. That's how we debug it, but we're still early in the process of providing this capability to tenants so they can understand whether their traffic is being blocked between one service and another. We're not there yet; we're doing this at a higher level, on the cluster side, but we're looking to build something like a CLI that might say, hey, do I have access to that thing, similar to kubectl auth can-i: something you can ask "can I access that?" and it answers yes you can or no you can't.

You mentioned that you have a component that makes it really easy for the developer teams to manage the connectivity, the CNPs. Is that the Helm chart you're providing, or is it part of the controller? And as a follow-up, is that controller open source, or planned to be open source?

I can speak quickly to that. It's both of the things you mentioned. We do have a controller that, if we think back to the slides, manages the core CNPs; those are the things that keep the tenants isolated, or sandboxed. We have a CRD that the tenant will apply, and the controller will parse it and adjust certain things in the policies. But the tenants can also change things themselves, and they do so via a Helm chart. So I would say both. And I don't think there are any plans to open source the controller right now; it's pretty specific to what we're doing, but maybe in the future.

Okay, great. Thanks for your talk. We're now breaking for lunch.