Let's get started. Good afternoon, everyone. My name is Suresh, and I'm a senior architect at Oath. Go ahead. Hi, I'm Rinmay, a senior software engineer at Oath. So, Oath. What's Oath? Anyone know what Oath is? Good. Nice. Oath is the company that was created when Verizon bought Yahoo and merged it with AOL and TechCrunch. So we are from Oath; I'm an ex-Yahoo. Today we're going to talk about a Kubernetes ingress controller built with Apache Traffic Server.

The team we belong to is the core infrastructure team, which covers all of the media products such as sports, finance, and news. We manage and operate the complete Kubernetes clusters for those products. And not only do we manage the clusters, we also provide tools that increase developer productivity, and reliability tools that help improve the reliability and availability of all these media sites.

So, the agenda for today: we'll go over Kubernetes at Oath, do an overview of the architecture we have, do a little deep dive into the ingress design, run a demo, go through the key features we built as part of it, and also cover how our Kubernetes setup looks.

Okay, Kubernetes at Oath. We have 12 Kubernetes clusters across six data centers. We run 100-plus applications, and at peak we do 150k RPS. The ingress layer we're going to talk about today does 185k RPS at peak. Another important number: across those 12 clusters and six data centers we run more than 10,000 pods, and each pod may have three to four containers, so we run around 40,000 containers. The other important part is that while we manage the clusters for all of media, we give the teams an independent pipeline: product developers for sports or finance can deploy their code whenever they want; we just give them the tools to deploy to the Kubernetes clusters we manage.

Now I want to go over the architecture we have. This is the Kubernetes architecture we run in-house; we are on-prem, not in any public cloud, and we deploy everything ourselves. If you look at the left-hand side, all the boxes marked yellow are components we built in-house. The green ones are existing components within Oath that we reused. Everything else comes from the open-source Kubernetes world.

On the left-hand side there are custom tools. These are things we built into our Screwdriver continuous delivery pipeline, so that users can deploy their code natively, kubectl apply their YAML to different environments, using the tooling that is part of their Screwdriver pipelines. Then there is a webhook we built, at the bottom here, for authorization of the workflow. There are two kinds of identity we handle: the identity of the user, and the identity of the service deployed into Kubernetes. If someone is trying to deploy code to a cluster, into a particular namespace, we make sure that that user or that pipeline has been authorized to deploy there. So this webhook makes sure you are authorized to deploy.
Athenz, which is open source, is basically our role-based authorization system; if you think of AWS IAM, it's similar to that. Then, on the right side, is the custom ingress: ATS, Apache Traffic Server. How many of us are using Apache Traffic Server here? In production? Nice. Nice.

So, Apache Traffic Server. This is the ingress that any user hitting, say, sports.yahoo.com from mobile or web goes through before the request is routed to the origins running in Kubernetes. The origins are a bunch of pods that change dynamically: every new pod that gets deployed gets a new IP. Pods go down, nodes crash, new deployments happen, and every time that happens, every origin pod serving sports.yahoo.com content gets a new IP. Those IPs need to be fed into the ingress layer dynamically so it can route requests coming from users to the right origins. That's the design we built, and we'll go through it in detail.

Okay, requirements. What did we want when we designed this ingress layer? First, dynamic updates. As I said, there are lots of deployments, pods get deployed and get new IPs, and in a day, in a single cluster, 40 to 50 thousand pods and containers get created and destroyed, generating new events. We want to capture those changes and feed them into the ingress very dynamically.

In version one of our design, whenever the origin IPs changed for a deployment, we fed the new IPs into ATS and then did a reload of the ATS process. The problem with a reload is that it impacts in-flight requests and increases tail latency. As a mitigation, instead of reloading every time a change occurred, we batched changes and reloaded every five seconds. The result was that every five seconds, if you looked at our ingress latency chart, it spiked up and dropped, spiked up and dropped: loading the dynamically changed origin IPs into ATS by reloading the process was hurting end users. Then we said, this is not scalable, we are impacting the end user, and we batched it to every 30 seconds instead. So yes, every 30 seconds is better, but we didn't really fix the issue, we just moved it; instead of a spike every five seconds we saw a spike every 30 seconds. Not a great solution. And there's another cost: when a new pod comes up, Kubernetes deploys it in seconds, but for us to route requests to that pod we had to wait for the next reload. So we needed something much more dynamic and flexible than version one.

At the same time, we also wanted to keep supporting the many custom plugins we have for Apache Traffic Server, because Apache Traffic Server was open-sourced by Yahoo under the Apache license.
We have a lot of plugins powering the media sites, plugins such as the ESI and SSL plugins we built, and we wanted to keep supporting those plugins as part of the migration from the old stack to the Kubernetes world, with ATS as the ingress layer. At the same time, now that we were moving to Kubernetes, we wanted to natively use the Kubernetes Ingress specification, the same ingress spec used by everyone else, like NGINX or any other ingress controller.

Another important requirement: we're not in a public cloud, we're in-house, but we still want a seamless experience for users. Someone who wants to deploy an application to a cluster should just define a bunch of YAML, define the URL and host name for their application, and deploy it, and everything is taken care of for them end to end. They shouldn't have to deal with many different operations teams to get their application deployed. It's in-house, not public cloud, but we want to give that kind of experience. Those were the requirements we had before designing the solution we're going to present now.

So, why Apache Traffic Server? It's fast and scalable, and we have a lot of investment from Yahoo in ATS, with many plugins already built that we want to keep leveraging. At the same time it's easy and extensible: if we want to build a new load-balancing algorithm beyond round robin, such as least connections, we can quickly develop a plugin and drop it in. It's very easy to hook into every HTTP transaction in ATS, so we can customize a lot for our requirements; for example, we have lots of plugins that modify request headers before forwarding to the origins. That's mainly why we picked ATS. Now we'll go to the Kubernetes ingress controller we built and how it operates, with a demo. Okay, go ahead.

Thank you, Suresh. Today I'm going to go over the design of the Kubernetes ingress controller with Apache Traffic Server. I'll go over the various components that make up the ingress controller, talk about some of the key features we've built into it, and then go into a quick demo that shows how an application can actually use this.

This slide shows the overall design of our ingress controller. The flow goes from top to bottom. The ingress controller reacts to changes on the cluster: it fetches information from the cluster, parses and processes it through the various components, and finally provides routes to ATS. With this process it keeps all the routes to the pods up to date at all times. Now I'll go over each of the components in a little more detail.

Our first component is the service watcher. It's responsible for setting a watch on the Ingress and Endpoints resources of the API server. Whenever there's a creation, update, or deletion event, it gets notified, fetches the information from the cluster for the changes that took place, and writes it out in a JSON file format. So, for example, if a new application gets deployed, that creates a new Endpoints object and a new Ingress.
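As a rough illustration of what the service watcher sees (not the actual output format, and all the names below are made up), a freshly deployed application's Endpoints object looks something like this; the pod IPs under `addresses` are what eventually become origin routes in ATS:

```yaml
# Hypothetical Endpoints object for an application named "sports-web".
# The pod IPs listed under "addresses" are what the service watcher
# collects and hands to the rest of the ingress controller pipeline.
apiVersion: v1
kind: Endpoints
metadata:
  name: sports-web
  namespace: sports
subsets:
  - addresses:
      - ip: 10.1.2.34   # pod IP; changes every time the pod is rescheduled
      - ip: 10.1.7.81
    ports:
      - port: 4080
        protocol: TCP
```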
All of that information is gathered by the service watcher. Our next component is the health monitor. It's responsible for doing a periodic health check of all the pod IPs that the service watcher obtained, and only the IPs that return a 200 response for the health-check URL are written to a second set of JSON files. This is very important for us. Our ingress controller currently runs on a set of bare-metal hosts alongside the Kubernetes cluster, and we want to protect against a network split-brain: it's possible that in some cases the pods are actually healthy on the cluster but are not reachable from our Apache Traffic Server hosts. This component helps us detect that situation and quickly route away from those pods.

Our next component is the compiler. It picks up all the JSON changes that have taken place and compiles them into a binary format that can be consumed by Apache Traffic Server. It sets a watch on the JSON file directory, so it's triggered whenever there's a change in that directory, and it compiles only that particular change into the corresponding binary file.

Finally, we have Apache Traffic Server and the traffic manager plugin. The traffic manager plugin sets a watch on the binary file directory, so it's notified whenever any of the binary files changes, and it memory-maps that particular binary file into Apache Traffic Server. With this approach we only remap the change that took place; we don't replace the entire memory map. This lets us avoid the whole hot-reload situation Suresh mentioned before: we don't see any impact to in-flight requests, and the pod routes get updated much faster, in a matter of seconds. That's the overall flow of our ingress controller.

So how would an application actually go about using this ingress controller? As Suresh mentioned, our key principle throughout has been to use native Kubernetes resources as far as possible, so even for our ingress controller we decided to use the native Kubernetes Ingress resource. All we require is that the user specifies certain custom annotations that are understood by our ingress controller; you can see those annotations highlighted on this slide. For example, we need them to specify ports. The reason we went with annotations is that we need to specify non-standard ports, say 8080 or 9999, because those are the ports our internal applications need. Another thing we need users to specify is the default domain, which is the domain they want their application to be routable on. We also have an internal use case where we want to specify multiple aliases for the same backend. We could have gone the route of using the host rules in the spec, but that would mean duplicating the rules for the same backend, so we decided to go with annotations instead, where it can be specified in a less complex way. All users need to do is kubectl apply this kind of Ingress resource to their cluster, and the ingress controller automatically picks it up and sets up a route to their pods.
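To make this concrete: the exact annotation keys from the slide aren't captured in this transcript, so the key names below are placeholders, but an Ingress of the kind just described looks roughly like this:

```yaml
# Sketch of an Ingress using the custom annotations described above.
# The annotation keys (yahoo.com/...) and all names are placeholders,
# not the actual keys shown on the slide.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: sports-web
  namespace: sports
  annotations:
    yahoo.com/ingress-port: "4080"                 # non-standard backend port
    yahoo.com/default-domain: "sports.yahoo.com"   # domain the app is routable on
    yahoo.com/aliases: "sports-beta.yahoo.com"     # extra hosts for the same backend
spec:
  backend:
    serviceName: sports-web
    servicePort: 4080
```

A `kubectl apply -f` of a file like this is all a team needs to do; the service watcher picks up the new Ingress and Endpoints from there.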
Now I'll cover some of the key features we provide as part of our ingress controller, apart from the dynamic updates and the active health checking we already mentioned.

The first feature is cluster-level failover. Like we mentioned, we have clusters across multiple regions, and it could be that a particular region has a network issue, or we may need to take a particular cluster out for some kind of maintenance. We want to seamlessly divert the traffic going to that cluster, for all the applications on it, away from that cluster. Again we decided to use ConfigMaps, which are natively provided by Kubernetes, and the service watcher component also sets a watch on ConfigMaps. A cluster admin can go ahead and create such a ConfigMap on the cluster; the service watcher notices it and informs Apache Traffic Server to serve a 404 response for the health-check URL of that cluster. At Oath we have DNS-level routing in front of our ingress controllers, so once it starts receiving 404 responses from a cluster, it diverts traffic away from that cluster seamlessly. That way we don't really impact any application: the traffic gets diverted, and we can do whatever maintenance we want on the cluster.

Similar to that cluster-level ability, we wanted to provide the same thing at the application level. Applications may suddenly need to fail all of their traffic out of a particular region, so we also provide the ability to specify application-level ConfigMaps. If an application provides such a ConfigMap, the service watcher picks it up, the information flows through the entire ingress controller pipeline, and Apache Traffic Server starts serving a 404 response for the health check of that particular application only. All the other applications on the cluster remain untouched and keep getting traffic as before, while all the traffic for that one application gets diverted to the other regions it is serving from.
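The talk doesn't spell out the exact ConfigMap name or keys, so this is only a sketch of the idea: a cluster admin creates something like the following, the service watcher notices it, and ATS starts answering 404 on the cluster health-check URL so the DNS layer drains traffic away.

```yaml
# Hypothetical cluster-level failover ConfigMap; the name, namespace, and
# data keys are placeholders. Once it exists, ATS serves 404 on the cluster
# health-check URL and DNS-level routing drains the whole cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-failover
  namespace: kube-system
data:
  failover: "true"
```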
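The application-level version is analogous; again, the names and keys below are placeholders. A team creates a ConfigMap like this in its own namespace, and only that application's health check starts returning 404, so only its traffic shifts to the other regions it serves from:

```yaml
# Hypothetical application-level failover ConfigMap (placeholder names/keys);
# only the owning application's health check starts returning 404.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sports-web-failover
  namespace: sports
data:
  failover: "true"
```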
The last feature is preventing multiple teams from usurping the same domain. As a side effect of how easy it is to onboard to our environment, it's possible that multiple applications try to claim the same domain. Let me go a little deeper into how that could happen. Consider a situation where a generic app one deploys an Ingress resource and has already claimed an alias, kubecon.media.yahoo.com, and is serving on that alias. Since applications can update their routes as they wish, another application, generic app two, could come along and apply an Ingress that uses the same alias, kubecon.media.yahoo.com. If that Ingress resource got created, it would lead to non-deterministic behavior on Apache Traffic Server: routing really depends on the order in which the rules are picked up, and since we run multiple ingress hosts, different hosts could pick up the rules in a different order and potentially route to both applications at the same time. Imagine you go to Yahoo Sports and suddenly see the Yahoo Finance page; that's really not a situation we want to occur.

So we came up with a solution to this problem: our ingress admission controller. Kubernetes provides a dynamic admission control feature: before a resource actually gets created on the cluster, you can perform validations of your choice. When you configure this control, you specify which service is going to handle the validation for you. What we did is create a dynamic admission control for Ingress resources and provide an ingress claim service that handles the validation. Now, if a user or a CI/CD pipeline tries to create or update an Ingress resource, the admission control comes into play before the resource gets created: it forwards the resource to the ingress claim service, which performs the validation. The ingress claim service maintains, at all times, a mapping of the domains currently present on the cluster and the Ingresses that claim them. When it gets a request for a particular domain, the first thing it does is check whether that domain is present in its map. If the domain already exists and is claimed by a different application, it fails the validation and sends a failure back to the dynamic admission control, which prevents the resource from being created. On the other hand, if the domain is not found in its map, meaning no other application has claimed it, it returns success to the admission control, which then goes ahead and creates the resource. That's how we prevent two applications from claiming the same domain.
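Kubernetes dynamic admission control is configured declaratively. The registration for an ingress-validating webhook like the one described would look roughly like the sketch below; the service name, namespace, path, and webhook name are placeholders, not Oath's actual configuration.

```yaml
# Sketch of registering an ingress claim service as a validating webhook.
# All names, the path, and the caBundle value are placeholders.
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: ingress-claim-validator
webhooks:
  - name: ingress-claims.example.com
    rules:
      - apiGroups: ["extensions"]
        apiVersions: ["v1beta1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["ingresses"]
    failurePolicy: Fail            # reject the Ingress if the claim service is unreachable
    clientConfig:
      service:
        name: ingress-claim-service
        namespace: kube-system
        path: /validate
      caBundle: <base64-encoded CA certificate>
```

With this in place, every Ingress create or update is sent to the claim service, which can accept or reject it before it ever lands on the cluster.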
So this sums up pretty much how our ingress controller works, how you can use it, and the key features we offer with it. Now I'll go into a quick demo that shows how an application can go ahead and use this.

This is our "Be Right Back" page, at kubecon.media.yahoo.com/kubecon.html. Right now you're seeing the Be Right Back page; it's the static page we show whenever there's no route to a particular application. I have this YAML file. The Ingress specification is on top here, and I'm specifying the custom annotations required by our ingress controller: the alias, which is kubecon.media.yahoo.com, plus the port the application should be routable on. For the purposes of this demo I'm using a simple nginx container, so there's also a Deployment YAML specified along with the Ingress resource. You can just apply this like we usually do, with kubectl apply, so I'll go ahead and create the resources on the cluster. Let's see if our pod got created. It's still in ContainerCreating, so let's put a watch on it and wait until the pod is ready to serve. It's almost there. The pod is up and running. Let's also make sure the Ingress resource got created. Yes, the Ingress resource is there. So now when I go back and refresh the page, we'll see that the nginx pod is ready to serve.

As you can see, within a matter of seconds after the application gets deployed on the cluster with its Ingress resource, the ingress controller picks it up and it's ready to serve. That's it for my demo. I'll pass it to Suresh for the concluding remarks.

Thank you. Can you hear me? Okay. Apache Traffic Server is great and we've used it for this particular HTTP-based use case, but it has some limitations: there's no TCP support, and we have use cases where we want TCP support, for things like Redis or other protocols. We want to support those workloads, so we're looking at alternatives for those use cases, like gRPC or plain TCP. The possible solutions we're looking at at this point are things like IPVS or Envoy. At the next KubeCon we'll try to share our experience once we support both HTTP and TCP workloads.

Okay, open source. I spoke about a bunch of components we built, the ones in the yellow section. The webhook, Athenz, and the way we build these dynamic admission controls are all available for the community to use. If you go through this repo you can find all the code, and we're in the process of open-sourcing the controller we just built for the ingress claim service.

Other important things: we've talked about the Kubernetes ingress controller, but within Oath, how are we managing these clusters? We have our own network model and our own templating, which we've customized. At this link is a talk I presented a couple of months ago in the Bay Area which explains completely how our clusters are deployed, how we manage them, and what our network model is. I'd encourage you to take a look to get the full picture of how we manage Kubernetes on-prem. And the other important thing: we are hiring. If you want to work on this awesome on-prem Kubernetes platform, please contact me; my name is Suresh V, and you can see my contact details here. I'd also like to thank the entire team who worked on this, from my team as well as the other teams within Oath that helped us, like the Athenz and Screwdriver folks. Thank you. I'm happy to take questions now.

You're basically sitting in front of Kubernetes. In the Oath use case, do you ever go beyond just HTTP request load balancing and use any of the caching or other components? I know ATS can; are you also doing caching in front of Kubernetes in the Oath case?

Very good question, I'll answer that. This particular routing layer we described does just the routing. The applications have their own caching; for sports, for example, the Node.js application has its own caching, and we have a look-aside cache running on Redis which powers those websites. We have not enabled caching at this ATS layer; we left it to just routing. The caching that powers those websites is the Redis look-aside cache, not this layer. Hope that answers your question.

What were the challenges there? So, the network split-brain is something we have observed because we're on-prem: we have many switches and racks distributed across the data centers, right?
What happens is that the pod running on the Kubernetes cluster is able to communicate with the API server, so the API server says, okay, this pod is healthy, and feeds that to the ingress layer. But the path from the ingress layer to the pod is broken, because one of our ACL pushes (we do lots of ACL pushes in the network layer) went wrong, or a particular switch is not routing properly: the switch between our ingress layer and the node running that pod is broken. So we have to have this active check to avoid sending requests that will never get a response. That's what pushed us to run those checks. Yes.

The annotations you just showed are a little too generic and may conflict with other ingress controllers and so forth. What would it take to properly namespace them? Is it edit the code, make changes, and recompile, or do you have some control over that? You showed three annotations that control your ingress: the path, I forget what the other ones were, but the port and the domain name. Those are very generic and could conflict with other things, or with someone who wrote something specific and took a shortcut. To properly namespace them you would prefix them with a domain name or some attribute. How hard would that be to do today?

So right now, this particular ingress layer is a shared ingress layer across all the namespaces.

Right, but that's probably the problem: you may conflict with other products that pick up that annotation and cause problems. That's why I'm asking how hard it would be to change those annotations.

It's easy; we just need to change the spec. We're also looking at some critical cases where the ingress could be specific to a particular namespace: if you have a very critical workload and want your own thing, you could hook your own ingress to just that namespace and fetch from it. That's something we're evaluating. But most of our use cases are very similar kinds of workloads, so for us, having one ingress layer supporting all of our workloads made sense, and at this point it's all HTTP anyway.

Right, okay. You talked about building support for TCP ingresses. That's actually a limitation of the Ingress resource itself right now, in that it only supports HTTP and HTTPS. Is that something you're planning to develop in the Ingress resource itself for TCP services, or are you going to develop your own implementation?

So this is about the ingress spec itself, right?

Yeah, the ingress spec only supports HTTP and HTTPS, so to do TCP or UDP or other layer-4 services, is that something you're thinking about developing and upstreaming? I'm just interested in what your implementation might be.

We haven't thought that through. I did not know the ingress spec doesn't support TCP; I know that in a Service you can define the protocol and say it's TCP, but I did not know you cannot define TCP in an Ingress. We'll look into it, and if it's something we can contribute back to the open-source world, we'll contribute it back. Thank you, guys. Thanks for coming.