Hi, hello and welcome everyone. Thank you for joining us today. We're very excited to have you all on. In this webcast, we will be covering an interesting topic: GitLab implementation tips for performance, stability and recovery. My name is Zach and I'm on the marketing team here at GitLab, and I'm joining you today from Raleigh, North Carolina. We'd love to hear where everyone is tuning in from, so please use the chat tool at the bottom of your screen to say hello and tell us where in the world you are calling in from today. Before we get started, I'm gonna cover just a couple of housekeeping items. First, feel free to ask questions throughout the presentation using the Q&A function at the bottom of your screen. Please enter them as you think of them and we'll make sure to get to as many of them as we can by the end of today's presentation. If you're experiencing any technical difficulties, you can use the chat function as well to get in touch with me, the moderator, for some assistance along the way. Our presenter today is Joel, a Solutions Architect Manager who is joining us from Chicago. We are going to launch a couple of polls throughout the webcast so we can learn a little bit more about you; that way Joel can tailor his presentation accordingly. So without further ado, I am going to kick off our first poll and then Joel is gonna get things started. For our first poll, we're just looking to find out a little bit more about you and which GitLab package you might currently be using. We'll leave this up for just a couple of seconds and then we will move on from there. All right, Joel, it looks like we've got a pretty even mix of non-users, free Core users and Bronze/Starter users. So without further ado, I'll let you go ahead and continue our presentation.

All right, thanks Zach. Thanks everybody for joining us today. So we are gonna talk about GitLab and its implementation offerings as part of the GitLab Premium package today. The reason for that is when we start talking about performance, stability and recovery, we're moving well beyond the Core package that you might install with GitLab or that you may be hosting today. This is where we start to get into issues of capacity, where we start talking about performance, where we start dealing with recovery and backups and asking, well, how do we recover from a disaster? Obviously, GitLab is one of those systems that's really important in any software development shop, and so we wanna be careful with our implementations and make sure that we've got the data integrity that we desire. So we're gonna start talking about some of those things today. As we get into this, we're gonna specifically focus on two areas: one being GitLab Geo and one being GitLab high availability. For both of these, we're gonna talk about kind of where you start and what they do. We're gonna talk about some tips as you're rolling them out, some things to think about. And we're gonna get into not only the what and the why here, but we're also gonna give you some basic outlines for a rollout. So you can see, what would it look like on AWS? What products should you use? If you've got a thousand users today, how many nodes should you deploy for those users? And even some common problem resolutions — I'll give you a few examples from some outages that I've actually been a part of and what it took to resolve those.
One thing I do wanna point out here: this is not prescriptive. What you're gonna see today, what we're gonna talk about today — these are not things that are going to be absolutely applicable to you. We're gonna talk quite a bit today about the different factors that cause us to have to evaluate which nodes we implement as part of GitLab high availability. So with that, let's jump into just a quick why. Why do we talk about this stuff? Conservatively, if you had a thousand developers at 65 bucks an hour — and that is a very conservative number — any eight-hour outage, any day-long outage, generates over half a million dollars in lost productivity. That by itself may seem like a huge number and you might scale it back and say, well, shoot, I'm only 10% of that — that's still 50K. The idea here, though, is that one day of downtime can exceed the cost of the purchase of the product, right? So you bought GitLab, and now the outage day costs you more than that. This, by the way, is something I've seen at least four times in my duration here at GitLab — and I've only been around a couple of years — where entire enterprises went down for days at a time. It wasn't GitLab, it was the way that the product was architected to host on-premise, okay? So that's why we're talking about this topic today. It is absolutely avoidable and I'd hate for any of you to end up there.

So let's start with GitLab Geo. Based on the poll that we ran earlier, I'm assuming that most of you are not running Geo. However, Zach, if you wanted to add that poll just to see if anybody is using this today, that would be great — just a question on how you're doing disaster recovery today, if anybody's doing that. So GitLab Geo — why do we talk about GitLab Geo? I talk about this first because a lot of people jump from the idea of the Core package of GitLab, a simple single-node installation, into the high availability structure right away, and they go right by GitLab Geo. GitLab Geo is one of my favorite things, okay? This is the thing where, if the red dot on this map is your primary node, you can roll almost identical, geographically distributed nodes of GitLab into the other locales — these are the orange dots that you see here. That can actually reduce the cloning time in those locales where the orange dots are from minutes down to seconds. It really is that substantial. We're removing that network dependency, right, where it can slow everything down. And it's also balancing our user load. If you think about the simultaneous user load, when we start distributing into those geographically dispersed nodes, we're now taking some of that user load off of the primary server. Now as part of that, once we do this, this also becomes our fastest path to disaster recovery. If you look at the GitLab documentation, disaster recovery is best served by a GitLab Geo node. So if you think about it, if the red dot is continuously synchronizing with the dot in New York, that secondary node becomes your failover node. You can designate it as such. Now, failover is not an automated thing. Today, the most common way of doing it is that we alter the DNS records, then go onto that secondary server and reconfigure it to be the primary. But there's a lot of advantages here: your database is already up to date, right? Your Postgres has been synchronizing in the background.
Your authentication is inherited from the primary node — you're not reconfiguring single sign-on and LDAP and all those things. That is ready for you. So this is your fastest path to disaster recovery and it's something that is really important to think about. We'll chat a little bit more on that in a bit. It's also the simplest method to manage. You can see all these nodes in a single UI. The servers that you deploy, as I mentioned earlier, are basically identical nodes — you can deploy the same server type in four different locations. So it's not like you're trying to figure out, well, what do I deploy where? It's a single Omnibus-based rollout, and there's that one-to-many kind of deployment method where, again, back to the red dot, right? That red node, that primary node, is synchronizing with those remote nodes in the same way. We're not doing node-to-node on the secondaries; we're doing primary-to-secondary, a one-to-many type approach. So it really is simple from a management perspective.

What exactly is a Geo node doing? Well, it's really quite simple. The primary server is mirrored onto the secondary server. It's a streaming replication service, and any interaction on that secondary server — any users who are doing Git pulls and commits — is actually proxied back to the primary. So what that's doing for us is replicating the data between the primary and the secondary at all times, but it's also proxying the pushes back so that the primary stays fully up to date at all times. And just for more detail on that, you can see the streaming replication happening on the Postgres layer, and you can also see here — just notice all the arrows moving to the right — everything is syncing. The LDAP and the HTTPS transfers, all this stuff is transferring to the right. That Git push proxy is the idea of everything on the secondary coming back over to the primary node, okay? So you've got users interacting with both nodes. The primary is always up to date. The secondary is getting the streaming updates from it. So it's a relatively simple concept, but here are a few tips for you as you're rolling this out.

First and foremost, you can roll these nodes at will. You don't have to have that whole network designed today. So in the original picture I showed you, where you had one primary server in Europe and a secondary in New York, you could start there before you roll nodes in India and China. There's no set order to Geo implementation that you have to follow, so you can start small and grow without being disruptive to your existing stack. Set up your failover plan ahead of time. I mentioned this is a manual thing today, so you do need scripts configured to alter your DNS and be able to do a reconfigure in case of an outage — we've seen teams script that ahead of time, and I'll show a rough sketch of what that runbook might look like before we move on from Geo. Is it a manual set of steps? And if it is, do you know what they are? You wanna know this stuff ahead of time so you're not just jumping in and hoping for the best. Understand the limits of the secondary. The secondary is synchronizing — you noticed the Postgres database side of things — but it's not syncing everything. So if you're thinking, well, what would that look like, I would advise you to consult our docs and take a look at exactly what is synced. One example might be an npm registry, which always points back to the primary. So you wanna be careful to understand: what are the limitations of the secondary? What data do I want to restore on my primary?
Or what are the things that I need to look for to make sure that we have everything in place? Oversize your nodes. In case of failover, you've potentially just doubled the user population on that server, so you wanna make sure you don't undersize your nodes, especially the one that you're going to fail over to. Size that secondary node as big as or bigger than the primary, right? Make sure you've got plenty of compute and plenty of RAM on hand, because if things fail over, you wanna have the capacity and not suddenly suffer a big performance slowdown. And lastly, for the other nodes that are not necessarily your secondary failover node, you can use selective sync to limit your data exchange. So maybe in that diagram we originally looked at, you're going to be failing over from the Europe node to the secondary node in New York, but the instances that you have in China and in India are only selective — you've got one group of projects or a specific set of projects that you wanna sync. And that really helps out, of course, from the network perspective: you don't have nearly as much streaming traffic replicating and pushing back and forth between the primary and the secondaries. So it creates an efficiency model that you wanna look for. Now, the bottom picture that you see here is a snapshot of some Geo node configuration and what you would see in the GitLab UI. You can see the health and the status of the secondary node, what's actually occurred, what hasn't occurred, any problems we've had replicating the data between the primary and secondary servers. The idea here, of course, is that the primary has the data and it's replicating into the secondary. And this is where I have the ability to shut things down, promote that secondary to the primary, and actually visualize it here in the UI, where that secondary node will now become the primary and I'll be able to see that within the GitLab UI.

All right, so maybe you've rolled GitLab Geo nodes out, maybe you haven't. But at this point, there are a couple of considerations. Geo nodes can really help you support up to about 500 people. When your nodes are between 300 and 500 users per node — on the primary, that is — that's when I like to say, okay, let's stop and talk about how you're working. What are you doing when it comes to your software development? Are you a game developer with giant commits? Are you doing a lot of streaming commits with a whole lot of pipelines running? Do you have large system-level pipelines — system integration types of tests — that are running for an hour or two at a time? We see all these different models of working. We see people who don't commit until the end of the day — I can't recommend that, but I sure do see a lot of it, right? And all of a sudden you've got this influx. So when you're looking at the different ways of working, once you get to that threshold of 300 to 500 users, and especially above 500, we wanna start talking about high availability. There are other instances, though, where workflows have pushed that threshold down and we've needed high availability at lower user counts. So we're gonna talk about high availability next as a second method to implement a reliable GitLab instance. I will say that under 300 users this is rarely needed; most of the time we do recommend Geo as your first stop until you've scaled further. So if you've decided that Geo's not enough for you, GitLab high availability is the next stop.
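Before we dig into high availability, here's that failover runbook sketch I mentioned. This is a minimal outline only — the hostnames, hosted zone ID and DNS change file are placeholders, it assumes a single-node secondary, and the exact promotion command varies by GitLab version, so verify against the disaster recovery docs for your release before you rely on it.

```bash
# Rough failover runbook sketch (single-node Geo secondary, ~GitLab 12.x era).
# Hostnames, the zone ID and the change-batch file are placeholders; the exact
# promotion command varies by version, so check the DR docs first.

# 1. If the old primary is still reachable, stop it so nothing else writes to it.
ssh gitlab-primary.example.com 'sudo gitlab-ctl stop'

# 2. Promote the secondary you planned to fail over to.
ssh gitlab-geo-ny.example.com 'sudo gitlab-ctl promote-to-primary-node'

# 3. Repoint DNS (Route 53 shown here) so users land on the promoted node.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000000000 \
  --change-batch file://promote-geo-secondary.json

# 4. Confirm the promoted node is answering as the primary.
curl -fsS https://gitlab.example.com/-/readiness
```

The point isn't these exact commands — it's that the steps are written down, scripted where possible, and tested before the day you actually need them.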
Now, GitLab high availability is something that we use today on GitLab.com. You're getting the same code in your GitLab Omnibus package that's used on GitLab.com. So when we start talking about scaling to literally tens of thousands of users — and we do, by the way, have reference architectures available in the GitLab documentation online that you can take a look at — what does it look like? How many nodes do I need for 2,000, 5,000, 10,000, even 25,000 users? That kind of data is available to you, so you can see what the baseline architecture should look like. But each of those needs to be taken in context, and we do recommend professional services from GitLab as you consider implementing things like this, right? Because these can get pretty complex pretty fast, and when you add too much complexity, you can actually increase your likelihood of downtime with GitLab. Too much complexity can counteract the stability you're trying to put in place. So that's the caution here: there's a trade-off between complexity and the cost of downtime. So understand which components are being most utilized and stressed. We're gonna talk about the different components, what they're doing, and where you might wanna add new nodes. And of course, the benefit here is you can troubleshoot and scale things at that component level. You're no longer dealing with GitLab as a single application server; you've now broken components out from that server, and you can take a look at some logs and actually break down where an issue is happening. And of course, no-downtime upgrades. We do have a no-downtime upgrade capability; however, it's just much more seamless with high availability structures because, hey, there's two of everything — something should always stay up. And of course, we can install this in the environment of your choice. Everybody's pretty aware of the fact that you can install GitLab kind of anywhere, and most of our users are designing their instances on-premise today, behind the firewall or in their private cloud. We'll actually talk a little bit about the cloud install on AWS as well. You can customize your install. Now, again, this is where it's really, really important to understand your workflows and your usage patterns. How many large-scale commits do you have? Where is the bottleneck going to occur? Is it going to be something to do with your repo and your file size? Could it be something to do with your NFS storage? Could it be something to do with your CI system — queuing mechanisms, numbers of CI jobs? There are so many different things that can come into play here, and we want to be aware of what those things are so that we can make sure we design enough nodes to keep you highly available and performant along the way. We're also, of course, looking to eliminate any single points of failure — so not just bottlenecks, but those single points of failure. The last thing that's pointed out here, which is kind of fun, is that we come bundled with Prometheus and Grafana. So you can monitor all these nodes at any given time on a dashboard and keep an eye on what's going on in your GitLab instance. That, of course, is very, very helpful when things start to go awry.

So, types of GitLab high availability. We start with the simpler form. There's kind of a horizontal model and a distributed model, and there are hybrid models you can have. You can see here, this is a horizontally scaled example — I pulled this image straight from our documentation.
You can see that we have a load-balanced front end to the application servers that are in place. We have the Postgres database there with the multiple Redis instances. The numbers in this image aren't necessarily exact, but they're pretty close to a baseline example, which I'll show you in a bit. The thing I'll point out on this page is the Redis Sentinel there on the right side. When you look at that, that number is interesting: you do need a quorum for Redis high availability, which means a three-node Sentinel cluster. So that's really important for keeping your uptime as it relates to Redis. You'll also notice on the Postgres database we've got three there. Again, these numbers are not here as a design guideline, but they are fairly accurate for what you may want to roll in a horizontally scaled approach for up to your first thousand users.

So what changes when we go beyond this into a more highly available structure supporting a larger user base? Well, we get into what we call our distributed model, and you can see here that NFS has now become redundant. By the way, let me touch on that real quick: NFS by itself is not something that GitLab provides any redundancy on. That is something that the vendor of your NFS solution is going to provide, so you want to know how to take advantage of that. So that's one thing I do want to point out. From a load balancing perspective, it's obviously the same thing, but in this distributed model, the thing that's most important is that we start breaking out the application server. In the last image, the application servers were kept whole; here we're starting to break out the components of the application server. In particular, what's noted here are some Sidekiq queues. Sidekiq is a background job queuing mechanism that controls how much work gets queued up for various functions. So if you think about it from the perspective of the CI pipelines — and this is one thing that I see pretty regularly — the CI pipeline Sidekiq queues are helping us throttle which traffic is going where, and when those Sidekiq queues get too long, we end up with some problems. So we want to make sure that the runners can pull jobs regularly from GitLab, and that queuing mechanism becomes critical. To the point I made earlier, if you're doing large file storage, the lines going up on the page into the NFS shared storage become a critical thing. Do you have enough network capacity? Is your NFS solution fast enough? When we look at Git traffic, is your IOPS level high enough to support the amount of Git traffic that you're sending? How many of those nodes would you want to have in place? When we're talking about CI traffic moving through Sidekiq nodes, how many of those nodes do I need based on the number of CI jobs we might have queuing up at any given time? So these are just some of the things I wanted to call out as items to consider as we get into a distributed architecture. And again, this is about the most complex architecture that we see rolled out, because you can see the number of nodes growing dramatically. If we start growing with multiple Sidekiq nodes for CI or best-effort queues, then all of a sudden we have a pretty complex scenario here, and we want to understand it really well and make sure that it's very robust.

What if we take this conversation to AWS? Well, from a components perspective — let me pause here a minute.
We've been talking about our Omnibus package a lot. There is a Kubernetes Helm chart installation of GitLab, and we are moving in that direction — I'd probably be advocating for that later this year, later in 2020. However, for right now, the most robust package that we have, and the way that we're running GitLab on gitlab.com, is primarily the Omnibus package of GitLab. So that is why we're talking here not about EKS but about EC2 for those application servers. This is a robust, proven system, so I really encourage you to stick with that for now. As part of that, if you break out the components of the GitLab application server that you're installing on the EC2 instances: you'd be using EBS to store your Git data; you'd be using S3 for all of your artifacts, all of your large files, your Docker containers, all those good things; ELB for the front end — you want those elastic load balancers there; RDS will give you an HA Postgres instance, so you're not configuring anything special there; and ElastiCache for the Redis installation — again, HA. The reason we're talking about the AWS install in particular is that when we see a non-local install of GitLab — a cloud install or a private cloud — 70% or so of those conversations are happening in an AWS context, so this is by far the most common rollout that we see. Again, what we talked about earlier as far as numbers of nodes applies, but some of this gets simplified: when you roll out HA Postgres with RDS or HA Redis with ElastiCache, you're not thinking about rolling all those nodes yourself — that's taken care of in the background by AWS, simplifying your rollout. One thing I will call out, though: I mentioned earlier the idea of IOPS and the amount of Git traffic you have, and how that affects which nodes you roll and how many. We have found that EFS for data storage — EFS for the Git data — is problematic. It's not AWS's fault and it's not GitLab's fault; it's simply the way Git's access patterns and the IOPS rate come into play with each other. So as you scale into a highly available structure over a certain user count — and I'm talking a pretty low user count here — this isn't something I'd even want to attempt or test with high availability today. Don't roll EFS for that Git data storage; it will in fact cause you performance problems. So that's the one caution flag that I would raise when you're rolling high availability on AWS.

The application architecture here — we've talked about a lot of these things already. Part of the reason I wanted to call this up is to point out the ports we've got to interact with: you see there are three open ports, 22, 80 and 443, and those are the primary methods of communication from GitLab servers to the outside world. You can also see on this page a number of lines coming out of the Unicorn and Gitaly workers. Those two workers are really important for understanding the flow of Git data and the interaction with storage and with the database, so those two components are pretty important. You'll see us focusing a lot more on Gitaly as we go forward, as an option for storing Git data that replaces NFS. That is something we've been working on, and hopefully it will shortly be a high availability function that allows us to fully replace the NFS requirement. That is a common question we get as well.
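Just to make that component mapping a little more concrete, here's a minimal sketch of what pointing an Omnibus application node at those managed AWS services can look like. The endpoints, bucket name and credentials are placeholders, and the exact settings shift a bit between GitLab versions, so treat this as an outline and confirm against the Omnibus documentation for your release.

```bash
# Sketch only: wire an Omnibus application node to RDS, ElastiCache and S3.
# Endpoints, the bucket name and credentials below are placeholders.
sudo tee -a /etc/gitlab/gitlab.rb <<'EOF'
# Use the RDS Postgres endpoint instead of the bundled database
postgresql['enable'] = false
gitlab_rails['db_adapter'] = 'postgresql'
gitlab_rails['db_host'] = 'gitlab-db.abc123.us-east-1.rds.amazonaws.com'
gitlab_rails['db_password'] = 'CHANGE_ME'

# Use ElastiCache Redis instead of the bundled Redis
redis['enable'] = false
gitlab_rails['redis_host'] = 'gitlab-redis.abc123.use1.cache.amazonaws.com'
gitlab_rails['redis_port'] = 6379

# Send CI artifacts (one example of large object storage) to S3
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = 'example-gitlab-artifacts'
gitlab_rails['artifacts_object_store_connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'use_iam_profile' => true
}
EOF
sudo gitlab-ctl reconfigure
```

Each of those blocks replaces a service you would otherwise be running and making highly available yourself, which is most of the simplification being described here.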
So again, a recap on the components and a quick idea of some of the things they do. You can see here the application nodes. The GitLab application can be broken out into things like Unicorn (or Puma) and Workhorse, so you can separate those web requests out — you saw that in an earlier diagram. Sidekiq — again, those queues are really important. Postgres — we've got the database nodes themselves and the PgBouncer nodes. Make sure you've got those PgBouncer nodes in such a place that if a database node goes down it doesn't take out the bouncer as well. That is something we've actually seen done, where the bouncer was installed on the same nodes, and of course once the node went down there was no way for PgBouncer to point at another node or keep things moving. So that was its own problem. Redis — same thing here: we've got Sentinels watching for failover, and again, that quorum of three that I talked about. We've talked a little bit about Gitaly and NFS. Again, if you're using AWS you're gonna be using S3 storage for all your large object storage, and again, there's no redundant NFS solution here from GitLab — that is gonna be based on your vendor. We typically point toward NFSv4 wherever possible as far as the NFS protocol goes, but we do not specifically recommend an NFS solution; that is certainly up to you. Load balancer — we talked a little bit about ELB on the AWS side of things, but we commonly use HAProxy. It's also what GitLab uses on gitlab.com, so we see that commonly rolled as the front-end proxy for high availability systems. And of course the monitoring nodes I talked about.

Now, for anything to be highly available, there are gonna be some core components that are really important, and that is where those asterisks come into play, right? Sidekiq, Postgres, NFS and the load balancer are the things that are really gonna be critical to keeping you moving forward. Notice there's not one on the application nodes, which is the first thing somebody asked me about when I put this together. The reason is that if your load balancer is working correctly and you've sized everything correctly, the application nodes will typically suffer less than a Sidekiq or Postgres misconfiguration of some kind. If those other items are misconfigured, then your application nodes will act up — but more commonly, the problem is where the asterisks are. And we'll talk a little bit about that here in just a second.

One other thing I wanna talk about is the whole idea of persistent versus ephemeral components. Which components are saving state and user information, and which components can we recreate at will? Because we're talking here about highly available, disaster-recoverable types of systems. When you talk about things that save state or save user information — things where we want that data to persist — of course we're talking about the database itself, and we're talking about the Redis caching system, which is saving some of that user info; it's queuing and caching that data. There's also the file system: we don't wanna lose any Git data, and we want our object storage to not lose any data. These are the persistent components. From an ephemeral perspective, a temporary perspective, we can always spin up a new application server or any of its components, front end or back end.
And of course, from an external services perspective, if the load balancer or the networking is affected, we don't have a data integrity situation. So that's kind of how this breaks out. But I like to point this out so you know that if you do need to spin up a new application server, you can do so without affecting the file system you're trying to access behind it. That's something we get common questions on.

So what does all that look like if you just have a simple horizontal scale for 1,000 users? It looks like about 14 nodes, and those 14 nodes are as listed here. Now again, if you access our documentation, you can actually see what these systems would look like and what we recommend for 2,000, 5,000, 10,000 users, et cetera. The way that we size this is around per-user support for things like how many Git interactions and API requests per second we can handle — 10-plus API requests per second, those kinds of things. That's what's taken into consideration as we design this out. So again, your mileage may vary; I can't say that enough today. I think what you're seeing here is a couple of things. One is that the Gitaly storage node, of course, gets the largest allocation of RAM. We wanna make sure that all of that data is processed successfully; we don't want anything to become a bottleneck there, because it's handling our Git storage. You'll also notice that the primary CPU allocation happens on the GitLab application servers. This also shouldn't surprise you, since this is our primary method of interaction with the user base itself. You can see that some of the other components can be pretty small, but there's a fairly sizable allocation here when you're talking about 14 nodes and this amount of CPU and RAM in your ecosystem. So you wanna make sure that you're well prepared for rolling out a high availability system with GitLab.

Okay, a couple of quick notes on some common HA problems. I told you earlier on, I was part of some production recovery efforts in the past. We'll talk about just three of them here and then open up for any Q&A you might have. From the HA problem perspective: if your Git commits are slow, if your overall GitLab performance is lagging, if it feels like, my goodness, shouldn't we be moving faster than this? Yeah, you probably should, right? If you can feel it, it's not right. So the questions, again, come back to your workflow. How many commits? How many large-file commits? How many users do you have? We've seen people try to scale up to thousands of users on a single Omnibus node. It's not built for that — you're going to run into problems, I guarantee you. From that perspective then, first things first: try doubling up the RAM and the CPU on your GitLab application server. That's the first thing I would try. And if that's not working, you can always add GitLab application servers, because again, those are ephemeral components; they can help you with that capacity. I noted up front the asterisk on the load balancer component, right? If the load balancer is not doing its work correctly, or if it's getting overloaded, it's going to slow things down before you ever get to the application server. More commonly, though, when I see this problem it's related to the GitLab application servers and their configuration — they're being overloaded by the users and/or the number of interactions with those servers. So that's the first thing that I would look at.
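Since the application servers are ephemeral, adding capacity there is mostly a repeat of what you already have. A minimal sketch under those assumptions — the hostnames, NFS export and package details are placeholders, and the copied gitlab.rb is assumed to already hold the shared database, Redis and storage settings:

```bash
# Rough sketch: stand up one more ephemeral application node in an existing
# HA deployment. Hostnames and the NFS path are placeholders.

# 1. Install the same Omnibus package version the existing app nodes run.
sudo apt-get install -y gitlab-ee   # pin to the exact version of your fleet

# 2. Reuse config and secrets from an existing application node so tokens and
#    encrypted values stay consistent (assumes your ssh user can read them).
sudo scp gitlab-app-1.example.com:/etc/gitlab/gitlab.rb /etc/gitlab/gitlab.rb
sudo scp gitlab-app-1.example.com:/etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab-secrets.json

# 3. Mount the shared Git data at the same path as the other nodes (NFS here).
sudo mount -t nfs4 nfs.example.com:/gitlab-data /var/opt/gitlab/git-data

# 4. Apply the configuration, then add the node to the load balancer pool.
sudo gitlab-ctl reconfigure
```

The key detail is gitlab-secrets.json: every application node needs the same secrets file, or sessions and encrypted values won't line up across the fleet.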
Pending and slow CI jobs. So if you're happily merging along one day and you notice there's this giant queue of CI jobs building up, and nothing seems to be getting processed anymore, that could just be because of a surge in commits. Again, back to your workflows. Say you're doing a lot of CI jobs late in the day: one shift of workers is about to go home — let's say Europe in general, any of the European time zones — and the US is coming online at the same time and beginning their merge requests. You've got people making a bunch of commits at the end of their day and people starting a bunch of merge requests at the beginning of their day, clashing at once. Are you prepared for that capacity? Because too many jobs can create that pending pipeline backlog. The answer there is typically Sidekiq, and we've sometimes also added more Puma nodes to help with that. I was part of an outage where we actually saw the Sidekiq logs and queues back up tremendously because there was a really intense push toward a release — we were pushing a lot of things at once, and the Sidekiq nodes had been only just adequately sized. So we had to add some nodes to help with that capacity. Now, of course, Sidekiq's not the only answer, right? When CI jobs go bad, there's one thing we haven't talked about in this webcast, and that is runners — autoscaling runners and runner capacity are, of course, part of CI issues. If your autoscaling runners aren't set up correctly, or if you don't have enough runners in place — those runners are always coming back when they're available and saying, I'll poll and look for work, ah, I can take this job. If you have too many jobs tagged for a specific runner, which becomes overloaded, or if you're not set up properly for autoscaling, you can of course run into this very same issue purely from runner capacity. That's an important thing to note as well.

Last item here is just downtime and outages. I regularly get asked by people in passing who say, well, GitLab's not performing well for us; we have to restart this thing on a weekly basis, or we've got this problem where it goes down. That is not a normal situation — I just wanna get that out there. You may be used to that with toolsets you've used over the years; GitLab is not that product. So if you're losing performance over time or it's going down entirely, we have a different issue in play. We wanna start by reviewing the logs and just seeing what is going on in the system. We wanna double-check the network. We wanna double-check the load balancers. Is PgBouncer installed correctly, if you're in an HA configuration? Because we do see cases where the database loses a node and all of a sudden everything just shuts down. That's a fairly common occurrence when things aren't configured correctly, but I wanna point out that it's definitely not normal behavior. And a lot of times it comes down to a simple spike in users overwhelming the application server or, again, some of the Sidekiq nodes, creating a queue that can't be recovered from — because of course, once those pending jobs or that interaction slows the server down, the first thing we do is not back off, we just try again. And because of those repeated retries, that puts us in the situation where things go down entirely or slow to a crawl and we have to restart. So nine times out of ten, that points to a capacity issue.
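When I say review the logs, on an Omnibus install that first pass is usually just the built-in tooling — something like the sketch below. These are standard gitlab-ctl and gitlab-rake commands; which ones apply on a given node depends on the roles that node runs and on your version.

```bash
# First-pass triage on an Omnibus node when things back up or fall over.

sudo gitlab-ctl status                          # is every expected service up?
sudo gitlab-rake gitlab:check SANITIZE=true     # built-in configuration checks
sudo gitlab-ctl tail sidekiq                    # background job workers
sudo gitlab-ctl tail postgresql                 # only on nodes running bundled Postgres

# How deep are the Sidekiq queues right now? (run on an application/Sidekiq node)
sudo gitlab-rails runner 'Sidekiq::Queue.all.each { |q| puts "#{q.name}: #{q.size}" }'
```

If those queue sizes keep climbing while runners sit idle, that's exactly the capacity signal we were just talking about.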
Reviewing the logs helps us identify where those bottlenecks are. All right, so real quick, some of the things we chatted about today: GitLab Geo, part of GitLab Premium; high availability, part of GitLab Premium. One thing we didn't mention is that as you roll those out, you do get live upgrade assistance from GitLab Support, and we release monthly, which means you can have live upgrade assistance on a regular basis. A lot of folks really enjoy that capability and rely on it. We also have priority support that comes with GitLab Premium. So if you're not using that today: you get a four-hour SLA, but notice also the 24/7 priority support, where our support folks can jump on with you and help diagnose your high availability constraints and concerns and downtime. We wanna make sure that's not something that happens to you. Again, professional services helps tee this up really, really well. As part of that, we wanna make sure you've got a fully supported and properly structured high availability implementation, because without that, of course, we're left trying to debug, well, what did you set up for high availability? We wanna help you vet that system before we have to support it. I mentioned Elasticsearch briefly on the AWS side of things; that is part of our Starter offering, so it's readily available. And then, of course, GitLab TAM support. With a certain number of users or a certain value to your GitLab contract — and it's a fairly low threshold, by the way — you get a GitLab technical account manager to help you. We wanna keep you growing and healthy in your usage and adoption of GitLab, and this is something often overlooked. There's no extra cost to this; it's a live person from the customer success team who will work with you to make sure you're aware of the new functionality that's coming out, and work with you on product and feature issues. We'll also make sure that you have a healthy approach not only to the availability of your systems, but also to your system usage. So these are all part of the GitLab Premium offering that we've been discussing today. And I wanna pause now and just see if there are any questions.

Looks like we do have one question, and that is from someone looking into GitLab high availability not for performance reasons but for business continuity guarantees — what would we recommend in this case, not necessarily needing multiple geographic locations? So there are two things there. One is you probably still wanna consider GitLab Geo, and the reason I say that is the node doesn't have to be geographically distributed to provide you the continuity that you're looking for. When you're close to a hundred users, you're still very much in the mode of trying to keep things simple. When you roll high availability, you will certainly have a lot more complexity in your system to contend with, and it takes a lot more time from an administrative perspective to keep up with. So I would say you still probably wanna consider rolling a localized Geo node of some kind. If you have to go high availability because you've got a really highly interactive system that has a 24/7 requirement to it, or something like that,
we would look at that baseline high availability structure that we talked about earlier, that 14 nodes or so. Because when you get into the high availability pieces, Redis and Postgres alone kind of force you into that multi-node consumption — you start getting into "I need six different virtual machines" and that kind of thing just to start the process. So I think that's it: the cost, the complexity, and the trade-off between the two is what we need to evaluate. And again, your mileage may vary depending on what your workflow looks like. So I think that would be the difference there: you may need to consider high availability if you're in that 24/7 constant-interaction situation, versus a Geo node.

Looks like we have one other question, about rate limiting values, because we've seen some on-premise installations being hammered by automated API requests, which hurts real users — absolutely agree. We have the same issue on gitlab.com, as you can imagine, and so we do limit API access on gitlab.com. Now, it really is gonna depend on your architecture and how much it can support, right? If you've rolled high availability, you're able to say, well, I can throttle that at maybe 40 requests per second or something like that. The reality is that a lot of the time we see that set at 10 or maybe 20 requests per second so the instance isn't just totally beaten up, because otherwise we've seen those requests jump immediately over 100 per second, and of course that starts to create some issues. So we wanna be careful with that one; the rate limiting value is really gonna depend on how many nodes you have available. If you have a single node in place, obviously you wanna keep that number very low. When you get into the high availability structures, I would advise you to take a look at our GitLab implementation docs, where we do have those reference structures for 2,000, 5,000 and 10,000 users — you can actually see there what we recommend as the average API requests per second as part of those rollouts. And I think you'll see some of those numbers are in that 10, 20, 40 range, because we know that we hit problems when we double some of those numbers.

Joel, it looks like the other question that we had was just a very general, broad question about how to get started with HA, and I think after that we will be able to wrap up. Okay, yeah — getting started, again, starts with analyzing what your workflow looks like. Talk with the solutions architect or the technical account manager on your account. Let's look at things like: what are you doing as far as volume of commits? How many — and I think this is a great question too — how many different automated API requests might be part of the way your system has been architected? Is that really important and part of what you're doing? Do you have a lot of bots interacting with your system? Are you doing terabyte commits, right? These kinds of things really make all the difference in how we architect our systems. I think we wanna start by understanding that workflow, and then secondarily understand the primary business case behind what you're doing. Is it simply uptime? Is it simply productivity? Is there a specific driver behind the performance because you're dealing with something that's, say, financially based or regulatory based?
There are so many different drivers behind it, and we wanna understand that piece before we go rolling out a system just because we need uptime or just because we need a continuous flow of interaction with our users. So there are lots of different things to think about there. The fact that we're talking about this, though, and that the questions coming in are what they are, tells me good things, because it means we're not trying to roll a whole bunch of users onto a single server — and that's the top thing that has caused us issues in the past.

Awesome, well, thank you Joel for a very informative session here — you've given everybody a lot of things to think about and consider. That's gonna be all from us today. I wanna thank everybody for joining us. We'll be sending out the recording from this webcast in the next few days, so be on the lookout for that. Thank you again to everybody for joining, and thanks again, Joel — have a great day. Thank you.