All right, so I think we can start. Next up we have Amador Pahim talking about the SRE tale: taking 100k calls per hour to the GitHub API. Go ahead.

Yeah, thank you. That's the presentation today. The intent is to go through our process of developing a service that enabled us to get to that number, 100k calls per hour to the GitHub API. But I want to start that tale with the topic of GitOps, because the year is 2019. Back then, a teammate of ours at Red Hat, Jaime, gave a presentation at DevConf.US in Boston. He went through the GitOps implementation that led the application SRE team at Red Hat to a massive scalability achievement in terms of how much workload we can take into the infrastructure persistence piece. I advise you all to go watch that presentation; it's on YouTube, you can just look it up. Back then, he showed how the GitOps infrastructure and implementation enabled the team to scale up to 17 services being controlled by that GitOps infrastructure, and he mentioned that we already had 22 reconcile loops. Those reconcile loops are integrations: control loops that get data from the GitOps repository and persist it into the infrastructure.

In the next slide, I have a more detailed view of that infrastructure. We have the tenants down here, opening merge requests to propose changes to the Git repository that contains the GitOps code base. Every time something is merged there, it gets bundled as a package into an S3 bucket. That S3 bucket is the data backend for a web service deployed on Kubernetes, on an OpenShift cluster. That deployment takes the data coming from the GitOps repository and exposes it through a GraphQL API behind a Kubernetes service. Then we have all the reconcile loops consuming that information from the GraphQL API, treating it as the desired state and propagating that desired state into all the infrastructure pieces we talk to. For example, we have a reconcile loop creating namespaces and other Kubernetes objects, and we have reconcile loops talking to the GitHub API to create organizations and to manage users, permissions, and that kind of stuff. All those reconcile loops can also be sharded, so we can have multiple instances of the same reconcile loop, each of them processing a slice of the data coming from that GitOps repository.

What that whole infrastructure brought us was a very high capacity to horizontally scale the machinery that persists configuration into the infrastructure. With that, our tenants, the internal development teams that need infrastructure artifacts or configuration, come to that GitOps repository and write a specification of what they need. In this example here, they want to create a Quay repository. After that configuration change gets merged, within a few minutes they have the Quay repository created by the reconcile loop. So the interface is well defined, there's a reconcile loop that delivers it, and we can create as many reconcile loops as we want to automate whatever piece of infrastructure we need.

With that in place, we fast-forward to 2021. Today we have more than 120 services configured in that platform, many of them fully onboarded to and supported by the SRE team. We also have more than 120 reconcile loops.
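To make that reconcile-loop pattern concrete, here is a minimal sketch of the shape of one of those loops. The GraphQL endpoint, the query, and the helper functions are illustrative assumptions, not the team's actual code:

    import time

    import requests

    GRAPHQL_URL = "http://localhost:4000/graphql"  # illustrative endpoint

    def desired_orgs(shard: int, num_shards: int) -> list[dict]:
        """Fetch the desired state from the GraphQL API, keeping this shard's slice."""
        query = "{ githuborg_v1 { name } }"  # hypothetical schema
        resp = requests.post(GRAPHQL_URL, json={"query": query}, timeout=30)
        resp.raise_for_status()
        orgs = resp.json()["data"]["githuborg_v1"]
        return [org for i, org in enumerate(orgs) if i % num_shards == shard]

    def current_org(name: str) -> dict:
        """Placeholder for the calls that read the live state from the target API."""
        return {}

    def apply_changes(desired: dict) -> None:
        """Placeholder for the calls that persist the desired state."""
        print(f"reconciling {desired['name']}")

    def reconcile(shard: int = 0, num_shards: int = 1) -> None:
        for org in desired_orgs(shard, num_shards):
            if current_org(org["name"]) != org:
                apply_changes(org)

    while True:  # in this setup, each loop is executed every two minutes
        reconcile()
        time.sleep(120)

Sharding falls out naturally: running the same loop with num_shards=3 and shard values 0, 1, and 2 gives three instances, each reconciling a third of the data.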
Those 120-plus loops also represent big growth compared to 2019, because each of them does a different thing, a different piece of automation. And with all that in place, we have received more than 25,000 merge requests to date in the GitOps repository, because we advocate a self-service approach: the internal development teams come to that repository to propose changes that are later reflected in the infrastructure.

When you have all that in place and you have solved the scalability issue in the infrastructure automation piece, you basically start transferring the bottleneck to the next layer. Whatever API you are talking to, if you talk too much to that API, the bottleneck shows up in that next layer. And that brings us to the problem we started facing with GitHub: we started exhausting the rate limit of the GitHub API, because we make a lot of requests to check whether the data there matches the desired state in our GitOps code base. Because we do that too often and from too many integrations, we started exhausting the rate limit imposed by that API. Many APIs impose rate limits, of course, but GitHub is especially critical for us because we have a lot of reconcile loops talking to it. In this slide I'm listing six different reconcile loops that do different things with the GitHub API: github-repo-invites takes care of invitations to our bot account, and github-org and github-owners manage organizations, teams, users, and so on. Each of those is an independent reconcile loop, and each of them can be sharded into any number of instances. So we can really generate a high load towards that API.

When we looked up the GitHub API rate limit documentation, we found that there's a limit of 5,000 requests per hour if you're using OAuth or basic authentication against that API. If you are not authenticated, the rate limit is way lower: just 60 calls per hour. But for authenticated requests, which is our case, the limit as of now is 5,000 requests per hour. In this slide I'm showing the headers that come back from requests to the GitHub API. When you make a request, the response carries a set of headers, and I'm highlighting the rate limit total and the remaining counter. From those headers you can tell how many more calls you can make to the GitHub API within the hour (there's a small sketch of reading them just below).

Before we go into the exact implementation of the system that helped us there, a bit of the anatomy of our API calls. We make those API calls every two minutes; by that I mean that every reconcile loop we have is executed every two minutes. It takes some time to complete, and two minutes after it completes, it is executed again. Mostly, it deals with unchanged data. The data on the GitHub side, like users, permissions, or organizations, doesn't change a lot; it's not that every time we reconcile, the data is different. It's the other way around: most of the time, the data doesn't change. But when it does change, we need to react fast, because it typically means we are trying to persist something that was just merged into our GitOps repository.
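Here is that sketch of reading the rate-limit headers. It uses an unauthenticated call, so it counts against the lower 60-per-hour limit, and the org in the URL is just an example:

    import requests

    # Any GitHub API response carries the rate-limit counters in its headers.
    resp = requests.get("https://api.github.com/orgs/app-sre", timeout=30)

    print("status:   ", resp.status_code)
    print("limit:    ", resp.headers["X-RateLimit-Limit"])      # total per hour
    print("remaining:", resp.headers["X-RateLimit-Remaining"])  # calls left this hour
    print("reset:    ", resp.headers["X-RateLimit-Reset"])      # epoch seconds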
And we have to make sure to reflect that changed data into the infrastructure fast. We also have contracts with our tenants where we say: we will reconcile a change proposed to the GitOps repository within a given amount of time. So we have to comply with those contracts. By the time we started this project, we were making around 6,000 to 8,000 calls per hour to the GitHub API, so we were already a bit beyond the 5,000-call limit. Of course, with some optimizations and code changes you can reduce that number, but we were after a solution that would also unlock us in terms of scalability.

We considered a few alternatives to get around the issue. The first and most obvious one: pay for more calls. If there's a subscription or a paid option that lets you make more calls, that's the first thing you should look at. But when we looked it up, it was still limited. Even if you pay for an enterprise subscription, or you manage to grow the rate limit for your user accounts, you are just postponing the issue: sooner or later you'll hit the new limit imposed by the new subscription level. There was no such thing as unlimited API calls.

The second option we considered was creating additional users, because those API limits are applied per user. If you can make 5,000 calls per user and you have multiple users, you can load-balance the calls by sharding across users; you can come up with some mechanism around that. But there's a lot of management overhead, and it's hard to horizontally scale a scenario like that: you have to create new users and maintain all that sharding machinery. It also typically means you are being a bad citizen towards that API. You should not pretend to be multiple users when in fact you are a single user, just to get around a limitation. So there might be something better.

We started looking into the GitHub API documentation and found their implementation of conditional requests. Conditional requests are specified by RFC 7232, and GitHub implements them in their API. What's very interesting about conditional requests is that every time you request something from the API, the response headers include an identifier that uniquely identifies that resource, called an ETag. That ETag changes whenever the content of the resource changes. So when you request a resource, if you send along the ETag you received before and the resource hasn't changed, you get a 304 from the API instead of a 200. That 304 is telling you: the resource is the same, use what you fetched before, because that's still the current data. And the important thing about those 304s is that they don't consume from the rate limit. They don't count against it. So you can safely make those requests, get the response saying the data is the same, and move on without having consumed anything from the rate-limit quota.
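That behavior is easy to verify against the live API. Here is a minimal sketch of the exchange the next slides walk through; the endpoint is just an example:

    import requests

    URL = "https://api.github.com/orgs/app-sre"  # any resource works

    # First request: a regular 200 with content and an ETag header.
    # This one counts against the rate limit.
    first = requests.get(URL, timeout=30)
    etag = first.headers["ETag"]
    print(first.status_code, etag)

    # Second request: send the ETag back in If-None-Match. If the resource
    # hasn't changed, the API answers 304 with no body, and the call does
    # not consume from the rate limit: X-RateLimit-Remaining stays put.
    second = requests.get(URL, headers={"If-None-Match": etag}, timeout=30)
    print(second.status_code)                      # 304
    print(second.headers["X-RateLimit-Remaining"])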
This is just a more detailed representation of that flow. Whenever you do a GET /foo/bar against the API, if this is the first call, the API gives you a 200. That 200 has the content of the response, of course, and it counts against the rate limit. But it also ships with the ETag header carrying that unique identifier. The next time you request the same resource, if you send the If-None-Match header with the value of the ETag you got before and the resource is the same, you get a 304 with the same ETag, no content, and it doesn't count against the rate limit. And I'm proving that this works with the GitHub API with the code snippet here: I make a request against the GitHub API and get a 200; there's content, there's the ETag header. Then I make the same request against the same URL, this time shipping the value of the ETag in the If-None-Match header, and what I get back is a 304 with no content in the response. So this really works with the GitHub API.

So next up, how can we use that to make better use of the rate limit we have? The first thing we considered was implementing conditional requests in the client libraries we use to query the GitHub API. The problem with that solution is, first, that you have to rewrite or add features to the client libraries, and some of those libraries are not developed by us, so you have to contribute to whatever community develops them, which imposes a longer turnaround. But most importantly, if you implement it in the client and you don't share the cached responses, each instance of your automation has a separate cache, and there's a lot of duplicated data.

So what we came up with was a service called GitHub Mirror. The service basically acts as a proxy to the GitHub API: all the clients make their API calls against this new service instead of talking directly to the GitHub API. The goals of the project: it's an API mirror that caches the responses and implements conditional requests. It supports both in-memory and Redis cache backends. It has an offline mode, which detects when the GitHub API is down and, when that's the case, switches to serving all content directly from cache, so outages in the GitHub API are mostly unnoticeable to the clients. It has a low footprint, it's easy to get started with, it's easy to scale out, it's written in Python, it's highly tested, and it's an open source community project on GitHub.

Getting started with GitHub Mirror is very easy: you just run the container image we publish, with podman or Docker, and you have a local instance of GitHub Mirror running out of the box. After that, you start making requests against that URL instead of making them directly to the GitHub API. Even though I make the request twice here and get a 200 both times, with content and everything, the client doesn't have to deal with conditional requests: that piece in between, GitHub Mirror, is doing it for you. You can see that the first request got a cache miss, meaning the response was not yet cached, but the second one got a cache hit, meaning it was served from cache because the GitHub API gave GitHub Mirror a 304. GitHub Mirror was able to serve that request from cache, and that saved us one API call from the rate limit.
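As a minimal sketch of that demo, assuming a local GitHub Mirror instance listening on port 8080 (the port depends on how you start the container):

    import requests

    MIRROR = "http://localhost:8080"  # assumed local GitHub Mirror instance

    # Same paths as https://api.github.com, just a different base URL.
    for attempt in ("first", "second"):
        resp = requests.get(f"{MIRROR}/orgs/app-sre", timeout=30)
        # Both calls come back as 200 with the full body; the client never
        # sees a 304. On the second call the mirror revalidates with its
        # stored ETag, gets a 304 from GitHub, and answers from cache.
        print(attempt, resp.status_code, len(resp.content))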
Talking a little bit about client support: we mainly use the requests module and also PyGithub. With requests it's straightforward, you just point to a different URL. And PyGithub supports a base_url argument to the Github class, so you can point the requests made by that library to a different URL, and that also works out of the box. So zero implementation was needed on the client side, other than adjusting the URLs we code against.

This is just a more visual representation of what I just described. The client talks to GitHub Mirror all the time and gets 200s all the time, always with content, so there's no awareness of conditional requests on the client side. GitHub Mirror is the component doing the conditional requests against the API. For the first request, it follows the normal flow. For the second request, it sees that there's already a response for that resource in cache, takes the ETag from the cached response headers and puts it in the next request, gets the 304 from the API, and serves the request from cache with a 200. So that's just another way of seeing that flow happen.

The service was first rolled out to production in April 2020. When we started using it, we were already at around 10,000 calls per hour. Two months later, because we had of course unlocked that limitation from our perspective, we got to 30,000 calls per hour to the GitHub API. And today we've been consistently crossing 100,000 calls per hour at peak, when more automations kick in for different reasons. But one very important number I'd like to highlight in this slide is the number of cache misses we have per hour. Even though we made 100,000 calls to the GitHub API, only around 100 of them counted against the rate limit, because only those got something other than a 304 from the API, typically a 200 saying: here's the new data. That brought us to a very low level in terms of how many of our calls actually count against the rate limit, allowing us to scale massively.

The deployment of the GitHub Mirror service is a simple Kubernetes deployment, and, guess what, we just use the GitOps repository to define everything we need for it: the namespace, the five replicas. We also have a Redis backend, using ElastiCache on AWS, with three cache.t3.small nodes: one primary and two replicas. All those pods use Redis as the caching backend. Here's a sneak peek into the OpenShift console for the GitHub Mirror deployment. You can see we have five pods there, each of them using around 100 megabytes of memory, with a very low CPU footprint. It's a very slim service, very easy to run, and it scales very well: you can have any number of replicas you want, there's no hard constraint there.

Just a couple of words about the Redis cache backend. To enable it, you just have to set some environment variables pointing to Redis: the primary endpoint, the reader endpoint, the port, the password, everything needed to connect to the Redis backend.
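As an illustration of that wiring, a sketch using redis-py; the variable names here are assumptions, the real ones are documented in the github-mirror README:

    import os

    import redis

    # Hypothetical variable names mirroring what the talk describes:
    # a primary endpoint for writes, a reader endpoint for reads.
    primary = os.environ["PRIMARY_ENDPOINT"]
    reader = os.environ["READER_ENDPOINT"]
    port = int(os.environ.get("REDIS_PORT", "6379"))
    password = os.environ.get("REDIS_PASSWORD")

    writes = redis.Redis(host=primary, port=port, password=password)
    reads = redis.Redis(host=reader, port=port, password=password)

    # Cached responses are keyed per resource; sketching the idea:
    writes.set("etag:/orgs/app-sre", 'W/"some-etag"')
    print(reads.get("etag:/orgs/app-sre"))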
When you have those environment variables set, GitHub Mirror uses Redis for the cache out of the box, instead of an in-memory data structure. The problem with the in-memory data structure is that if you have two replicas, each of them has a separate cache; it's not shared between the replicas of the service. That's why we implemented the Redis cache backend.

Now some data from the monitoring we have for the service. This is the aggregated CPU usage for all five replicas, and this is the aggregated memory usage. One very important piece of information is the latency graph. What we measure here is how long it takes from when the client's request reaches the GitHub Mirror service until we send the response back, so that time includes looking up the cache, talking to the GitHub API, potentially writing data to the cache again, and then sending the response to the client. You can see that P95 is well below 0.4 seconds and P99 well below 0.6 seconds. It's a very efficient service, and we try to keep those numbers as we go forward.

For the offline mode, there are some metrics that show when it kicks in. You see these peaks here: they mean the offline mode kicked in because of some outage in the GitHub API. What happened to us is that, by using this service, we became more reliable than the GitHub API itself, because when it goes offline, we just keep serving from cache and things keep working for the clients.

A few issues are worth mentioning. The main ones were related to the content, and the headers, coming back from the API containing URLs pointing to api.github.com, which means that if the client used that information to make a new request, it would skip GitHub Mirror and go directly to the GitHub API. So we had to do some URL rewriting to get that fixed from the client's perspective.

The last slide, sorry, is about the roadmap. We want to serve from cache not only when there's an outage in the API, but also when we get errors back from it, meaning that on 500s we serve from cache too. We also want improved offline detection, because right now the API sometimes misbehaves without us detecting it as offline; we treat it as still working, and the clients keep getting errors from the API. And we want to use those client requests to flag that the API is down, so it's all connected: it's all about making sure we can detect when the API is not working as expected. The last topic is multi-tenancy metrics awareness: right now we only show metrics aggregated across all the users of the service, and we want to split those metrics per user.

Yeah, that's all I had. I think we still have a couple of minutes for questions. Thank you very much for joining. Here's the GitHub repository URL; you can go there and check it out, and feel free to reach out to me if you want to discuss the topic further.

Thank you so much for sharing this with us. I'm just waiting to see if there are any questions for you. And it seems like your presentation was probably flawless. Right, yeah, that's nice. All right, thank you. Thank you for joining, and thank you for having me here.
Thank you so much.