In the participants list it says Francisco is the host. Francisco, would you mind clicking record, if you can? It shows recording from... Okay, it's already done. Ah, thank you, Francisco.

So, as I was saying, I'm going to show you what Gitaly is, why we have it, and why we decided to extract it out into a microservice. Then I'm going to show you a feature that I built, and at the end I'll go through the things that I learned or worked around, the things I needed to do to be able to work comfortably on a Gitaly feature.

First of all, this is where we were coming from. We had GitLab Rails running on the server, with several NFS mounts on that server, and we would just talk to Rugged to manipulate or read a repository. Rugged is a Ruby wrapper around libgit2, which is a C implementation of Git, so we could perform operations on Git repositories very quickly through that.

This had several advantages. The API of Rugged is very nice, so we could easily make changes to repositories; it's simple. Adding more storage was also simple: if one of the storage nodes ran out, we could just add another one, start storing repositories on the new disk, and keep scaling in that direction. We also had high availability, because NFS offers that. For those of you who don't know, NFS means Network File System: there's another node with a lot of storage, and it shares a folder with the application server, and we store the repositories there, if that makes sense. Another thing that was nice about this setup is that Rugged caches some things by itself, which means that by using Rugged we didn't need to reopen the repository for each read or write. This is the reason some of our big customers have now needed to fall back to parts of the Rugged implementation: Gitaly introduces a little bit of overhead for the network calls, but also some overhead from opening and closing the repositories.

The main reason we wanted to get away from this model is that it was kind of uncontrollable. If there were timeouts on the disk and so on, the whole system would just stop, and we couldn't really control that.

So that's why we moved to the current implementation. We have the GitLab Rails application running on the same kind of server as before, but now it talks to different servers running Gitaly as an application, and the disks are directly mounted SSDs on those servers running Gitaly. The Rails application goes through a class called GitalyClient, which talks RPC to the Gitaly service. RPC stands for Remote Procedure Call; it's kind of like an API. So instead of counting on a file system, we have a real API with the error handling that comes with it, and our implementation is more robust: if one of the storage servers goes belly up, all the repositories on that server become inaccessible, but it doesn't bring the whole system down. Obviously this implementation is more complex than what we used to have, and there's some overhead, as I mentioned: the network call, and opening and closing the repositories, which is something we're working on right now.
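To make the RPC idea concrete, here is a minimal sketch in Ruby of what a Gitaly call looks like through the generated gRPC classes. The service and field names follow the gitaly-proto conventions, but treat the socket path and the exact call as illustrative rather than the precise production setup:

```ruby
require 'grpc'
require 'gitaly' # generated client classes from the gitaly-proto repository

# Connect to a Gitaly server; the socket path here is made up.
stub = Gitaly::CommitService::Stub.new(
  'unix:/path/to/gitaly.socket', :this_channel_is_insecure
)

# Every request names the repository it targets. Gitaly resolves the
# storage name plus relative path to a directory on its own disk, so the
# Rails side never touches the file system directly.
repository = Gitaly::Repository.new(
  storage_name: 'default',
  relative_path: 'group/project.git'
)

request = Gitaly::FindCommitRequest.new(repository: repository, revision: 'master')
response = stub.find_commit(request)
puts response.commit.id
```

Because it's a real API call, a failure comes back as a gRPC error with a status code instead of a hanging NFS mount.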
In reality, it looks a little bit more complex than that, because we have several consumers using Gitaly. We have GitLab Rails, and that's the one we're going to be most interested in, because as GitLab developers that's where most of our work happens. But there's also Workhorse and GitLab Shell talking to Gitaly, and even Gitaly itself talks to Gitaly. As I mentioned, each Gitaly process runs on a single storage node, but sometimes it needs information from another Gitaly host, for example when we're working across forks, so then they talk to each other to fetch that information.

All of this happens through a gRPC protocol defined in our gitaly-proto repository. It's kind of like a contract that all of the Gitaly consumers and servers speak. We define the protocol there and generate code that we can use across the different services: for GitLab Rails it generates a Ruby gem, for gitaly-ruby it's the same Ruby gem, and there's a Go package that we can use from all the Go code, and so on.

This might all be a little much to take in, but I think it's much easier with a short example. The example I'm going to show is a feature I implemented a few months ago. It allows users to email patches as attachments to GitLab, and GitLab will create a merge request and apply those patches to a branch. That's what the feature does.

To build this (I'm going to start showing code now) the first thing we needed to think about was what we are actually going to change in the repository. When you implement a feature that touches Gitaly, one that is going to modify or read from a repository, the first thing to start working on is the messages we're going to send around. What we need here is to send the patches to Gitaly, apply them, and commit them as a certain user to a certain repository.

So the endpoint, the RPC where we're going to send the message to, is this one. It's defined in the gitaly-proto repository that I just mentioned. We have to add a method to the server, UserApplyPatch, that receives a certain message: we send a message in the UserApplyPatchRequest format, and we get back a UserApplyPatchResponse.

To show you what these look like, these are the messages we're actually going to send. This is the request, and we reply with an OperationBranchUpdate in the response here. As you can see, the request message consists of two parts. The header contains the user that's going to be committing the patches, the repository where we're going to do it, and the branch we're going to do it on; that's the new branch we'll be creating. And then it contains all of the patch files. We don't actually know how many patch files there are going to be, or how big they're going to be, so we need to stream the request. That means we could be sending multiple messages with only the patches part of the message filled in, because the message size of RPC calls is limited. If there are any questions about that, feel free to interrupt me.

That's the easiest thing to start with: figuring out what you're going to do inside the repository. It's a small change that we can make, and we can get the maintainers involved quickly to review it.
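To make that concrete, here is roughly what constructing the two kinds of request messages looks like with the generated Ruby classes. The field names match the proto described above; `repository`, `gitaly_user`, and `chunk_of_patch_bytes` are placeholders:

```ruby
require 'gitaly'

# First message on the stream: only the header is set. It carries the
# metadata that we know fits in a single message.
header_message = Gitaly::UserApplyPatchRequest.new(
  header: Gitaly::UserApplyPatchRequest::Header.new(
    repository:    repository,      # where the patches will be applied
    user:          gitaly_user,     # who the resulting commits are made as
    target_branch: 'email-patches'  # the new branch that will be created
  )
)

# Every later message carries only a chunk of raw patch bytes, so the total
# patch size is not limited by the maximum size of a single gRPC message.
patch_message = Gitaly::UserApplyPatchRequest.new(patches: chunk_of_patch_bytes)
```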
That way they can see if there are any spots you missed, or any more data you might need for your call. They're also very contained changes, and they force you to think first about what's going to happen. That's why I think it's a good idea to start with the proto changes.

"Bob, can you explain the stream thing on the method declaration there? Is it because you're going to have multiple messages?" Yes, you mean this stream keyword? It's there because the message size is limited, so we can't be sure that all of the patches the user uploaded fit in one single RPC message. That's why we split the request into two parts. One is the header: that's just the main metadata, and we know for sure it's going to fit in a single message. The following messages contain the actual patches, and we keep looping through them, splitting them up into chunks of bytes that fit in one message, and sending them off to Gitaly. Does that make sense?

"Yeah. Just for completeness, can you give us a hint of how we figure out if it's too big or not?" If it's user-provided, then you should assume it can be bigger than the message size. Data like what we're sending inside the header is limited. The one that might be worrisome is the target branch; maybe somebody could come up with a branch name that's way too long, but I don't think that's a real risk. The patches part of the message, though, we can't guess how big it's going to be; the patches could be megabytes long. I don't remember the exact maximum message size. It's about one megabyte, and we try to stay well below that, more in the order of 100 kilobytes per message or less.

"Do we apply a limit to the branch name in the code? Do our branches have a fixed maximum length?" They're mostly limited by the file system. On a Mac, try to create a branch longer than, what was it, Jacob, 200-something characters, or was it even more? Branch names are file names, and the file system isn't happy if you create paths that are too long, so it's limited by that. With branches we usually just let it blow up on the user; we don't set a hard limit. We could say a branch isn't allowed to be longer than X, but it becomes a problem way before we hit the maximum message size.

"What do you mean, Christopher? What happens when there's no host?" "Oh, sorry, that was about the recording of the session. Please continue, Bob, that's a side thing. Sorry."

So, where were we? Ah yes, the message size. If we suspect it could be bigger than one megabyte, then we separate it out into multiple messages. Thank you for that, Jacob.

The protocol is the first thing I proposed to the Gitaly team. They picked up on it, we discussed in the merge request what we were going to do with it, and they agreed with me that this would fit the requirements. So that got merged first, which meant I had a new gem to include and a new Go package that I could use for developing the feature, on the Ruby side and on the Gitaly side itself.
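As a rough illustration of that splitting, this is the kind of loop the client side ends up doing. It's a sketch: the 100 KB chunk size matches the rule of thumb above, and the method name is made up:

```ruby
# Stay well below the ~1 MB gRPC message limit: roughly 100 KB per chunk.
MAX_CHUNK_SIZE = 100 * 1024

# Lazily yield one request message per chunk, so we never need to hold
# the whole set of patches in memory as protobuf messages at once.
def patch_messages(header, patch_io)
  Enumerator.new do |messages|
    # The header always goes first and always fits in a single message.
    messages << Gitaly::UserApplyPatchRequest.new(header: header)

    # Then keep reading fixed-size chunks until the patch data runs out.
    while (chunk = patch_io.read(MAX_CHUNK_SIZE))
      messages << Gitaly::UserApplyPatchRequest.new(patches: chunk)
    end
  end
end
```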
So let's walk through what happens... "Excuse me, can I ask a question before you move on?" Yes, of course. "In the previous code, what are those numbers, like repository = 1, user = 2?"

Thank you very much for bringing that up. I meant to talk about that, but I forgot it; I forgot it during my practice run as well. We give a name and an index to those fields, so the protocol can assign values to fields on the wire. Because of backwards compatibility, we cannot reuse those numbers. For example, here we have a field that was called pre_receive_error and had an index of 2. To keep old servers and old clients happy (an old client would still set this field) we cannot reuse that number. Does that make sense?

"So the name is just a mapping?" Yes, and the number is the location of the field in the message. It's a size trick: if you think of how big JSON messages are, in JSON you waste a lot of space repeating the keys in every message. This is protobuf, which is designed to be a more space-efficient encoding. So instead of writing out a key as an ASCII string, there's a small binary integer (one, two, three) which indicates the field, and the name is only used in code as a way of accessing those fields. Anything else around the protocol?

Okay, so let's see how this is implemented on the GitLab Rails side. I'm not going to go too deep into this, but this class receives the email and checks if there are any patches in the attachments; if there are, we're going to apply those patches to the source branch. We do that through a service, which is a common pattern in our code base: a service that we pass a project, the user, and whatever information it needs to do what it does. So here's the service. It performs some validations, checking things like whether the patches aren't too big, and then calls out into a piece of code that lives inside Gitlab::Git. All the code that lives in Gitlab::Git is the code that calls out to the Gitaly client, which calls out to Gitaly itself.

So here's the code that actually calls the Gitaly client. What this class mainly does is know where to call: which Gitaly client, and which RPC it has to invoke. It also wraps the RPC errors that could be raised into something more meaningful for the application; for example, a GRPC::NotFound is wrapped into a NoRepository error, which is more meaningful on this side. This then calls into the Gitaly operation client, and the main responsibility of that class is to wrap the Ruby objects we built before into the messages that will be sent to Gitaly over the gRPC call. As you can see here, the header is just a fixed one, and then here we're building the different chunks of the patches out of this binary IO. Does that make sense? Once the request is entirely built, we call out to Gitaly here, and we expect a response back in a certain format; in this case that's the, what's it called again, the OperationBranchUpdate, if I'm not mistaken. We parse that here and pass it back to everything that needs to do cache invalidation and so on. So this is passed off to Gitaly, and according to the protocol we were just looking at, it ends up in the UserApplyPatch handler of Gitaly's operation service.
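Putting the client side together, the shape of that code is roughly this. It's a sketch with hypothetical names (`operation_service_stub`, plus the `patch_messages` enumerator from the earlier sketch), and the error classes stand in for whatever the application side defines:

```ruby
def user_apply_patch(user, repository, target_branch, patch_io)
  header = Gitaly::UserApplyPatchRequest::Header.new(
    repository: repository, user: user, target_branch: target_branch
  )

  # Stream the header followed by the patch chunks, then read back the
  # single response message.
  response = operation_service_stub.user_apply_patch(
    patch_messages(header, patch_io)
  )

  # The response carries the OperationBranchUpdate with the new revision,
  # which the caller uses for cache invalidation back in Rails.
  response.branch_update
rescue GRPC::NotFound
  # Translate the transport-level error into one the application understands.
  raise Gitlab::Git::Repository::NoRepository
end
```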
Let me find my other window... here, inside the UserApplyPatch handler in the operation service. As you can see here, we were working on the operations proto, and the operations package we have here implements all the methods that are defined in that protocol. The one we're interested in now is the apply patch one.

First of all, and this is a decision I made (we could have passed it on immediately), right now we validate the header of the request inside Go. We receive the stream, get the header, check that everything is present, and return early if it's not. If everything's fine, we pass this off to gitaly-ruby. gitaly-ruby is another RPC service that holds some of our older Rugged code and Ruby methods that read from and write to the repository. The reason I'm calling into gitaly-ruby here is that there are already a bunch of helpers there that execute the Git hooks, the pre-receive hook and so on, that we want to run when we're pushing a new branch, that kind of thing. So what we're doing here is just setting everything up to call gitaly-ruby, setting the headers on the request again, and then proxying the rest of the request through to gitaly-ruby.

gitaly-ruby itself implements the same interface we defined in the protocol, so it also has an operations service, and that's the one receiving the request we just sent from the Gitaly Go process to gitaly-ruby. Here's where the request comes in again: we read the request, map everything into Ruby objects we can reason about, and pass it off to a class that does the actual thing we want to do. That's the class we're interested in. Similar to what we have in GitLab Rails, there's a repository class that's going to perform the actual commit, but the reason I wanted to bring this up is this OperationService, singular. That's a little bit confusing, and I'll come back to that split, but it's the one that performs this update branch with hooks.

Sorry, I was distracted by the chat. Alexander asks: "How is the load on these RPC calls handled?" What do you mean by load, can you elaborate? "So, obviously we have multiple users, and there will be a lot of calls. Do we have some sort of load balancing or proxying in front of these services? Or how is it handled, are all of the requests queued in some way?" As far as I know, not all of the Gitaly requests go to a single node; each repository is stored on a certain node, so each node only needs to deal with the requests for its own repositories. "But there can still be a lot of requests at the same time, in parallel, right? So there has to be some sort of boundary." Andrew, I would love for you to answer that.

"Sure, Alexander. Most of the requests are handled in Go, so the vast majority of requests don't go through to gitaly-ruby, and nearly all of those are handled with no concurrency limits at all. At any one stage, I think at the moment we have about 1500 requests going to Gitaly a second." "Will it start a process for every request?" "Pretty much. For most requests it will start a Git process."
"Now remember that there are 35 Gitaly shards at the moment, and every now and again we'll add some more. When we started off with Gitaly there were maybe 18 shards, and now we've got about 35, and whenever those shards get full we add some more, so we can horizontally scale it like that.

There's a small subset of requests that are considered dangerous, if you want, and those requests have something on them called a concurrency limiter. What that does is, for any particular repository, for those requests, only a certain number can happen at one time, and the rest of them queue. But that's not the default. One of the RPC methods we use it for is called GetArchive, and what it does is create a tarball of the entire Git repository and send that through to the client. So if you spin up 100 of those requests concurrently, it could damage the server quite drastically, and so we intercept that and we say: no, if you want that, you're going to have to wait. But for 99% of the requests we don't do that, because it's not necessary." (There's a small sketch of the limiter idea just after this discussion.)

"Okay, okay. I'm just wondering, it may happen that one repository is more active than another, so on a specific server, a specific shard, there will be a lot of these RPC calls. I'm just wondering if we have a queue."

"You know what we do? We try to balance things out a little bit and move things around between the shards if there's a particularly noisy repository; we'd move that repository to a different shard. We have done that. You know, three of the busiest repositories for Gitaly are, surprisingly enough, GitLab CE, GitLab EE and www-gitlab-com. Luckily those are on three different servers, because if we put them all on the same server, it would put a lot of load on that server. What you also find is that at any stage there are about four servers that are getting new repositories, and the rest of the servers, what's that at the moment, like 29 servers or whatever it is, don't get new repositories. If you look at the load, the ones getting new repositories always have much higher load, and on the rest of them it sort of quietens down a little bit." "Cool. Thank you."

Matt asks if every Gitaly shard is a singleton. As far as I know, yes, for now it is. Each shard has one disk with a certain set of repositories, and we keep track of which repository is on which shard. Andrew, you can correct me there if I'm wrong. "That's exactly right. And I don't know if some of the Gitaly people are on here, but there's a piece of work called Gitaly HA, and one of the things that will come out of Gitaly HA is that we will have multiple copies of each shard. But at the moment it's a single copy, so for machine reboots, requests to a subset of the repositories will fail during that reboot. But luckily, it's generally pretty stable."

What's also interesting is that, as I mentioned, for some requests, mainly writes, we have this second Gitaly process running on the Gitaly server called gitaly-ruby. That's obviously much slower than the Go implementations of things, but it contains a bunch of code, and it uses Rugged, which has a pretty handy API for writing to repositories and so on, which is what we're doing here.
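Going back to the concurrency limiter Andrew described: Gitaly's real limiter lives in the Go code, but the idea can be sketched in a few lines of Ruby. At most `max_per_key` callers run the block for the same key (say, a repository path) at once, and everyone else queues:

```ruby
require 'monitor'

# Per-key concurrency limiter: the dangerous RPCs get wrapped in
# with_limit, keyed by repository, and excess callers wait their turn.
class ConcurrencyLimiter
  def initialize(max_per_key)
    @max = max_per_key
    @monitor = Monitor.new
    @slot_freed = @monitor.new_cond
    @running = Hash.new(0)
  end

  def with_limit(key)
    @monitor.synchronize do
      # Queue here until a slot for this key frees up.
      @slot_freed.wait_while { @running[key] >= @max }
      @running[key] += 1
    end
    begin
      yield
    ensure
      @monitor.synchronize do
        @running[key] -= 1
        @slot_freed.broadcast # let queued callers re-check their key
      end
    end
  end
end

# e.g. allow only a few concurrent GetArchive-style calls per repository.
LIMITER = ConcurrencyLimiter.new(5)
LIMITER.with_limit('group/project.git') { build_archive }
```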
Back in the code: this update branch with hooks is mainly going to run the Git hooks, the, I forget the names now, pre-receive, update, and post-receive hooks, and we want to do that for every operation that updates the repository. We already had that implemented in Ruby, and because the writes aren't as high-traffic as the reads, there wasn't really much reason to rewrite all of that in Go, which is why we're using gitaly-ruby for this RPC as well.

As you can see, it just calls out to the repository, which performs the commit patches operation; in this case that's just calling git am. We pass in the user that's going to perform the commits. Remember, we're using patches here, so the author of each commit is recorded inside the patch itself, but we're committing it as the user that sent the email, using the information we got from the GitLab Rails side; that's passed inside the environment here. And we just run git am the way you would apply a patch locally (I'll sketch that below). Afterwards we parse the result so we can send back the branch update response, containing the new revision and so on, to update the caches back on the GitLab Rails side.

Matt asks what the relationship is between gitaly-ruby and the Gitaly Go code, and whether the Go implementation was a rewrite of the Ruby implementation. As far as I know, when we were implementing Gitaly, we were implementing a bunch of RPCs and rewriting them in Go. We started especially with the high-traffic ones, since those would perform much better in Go. We try to map one Git operation, one read, to one RPC call, but that hasn't always worked, which is why we sometimes have problems with performing way too many Gitaly calls inside a single request. Andrew, you were going to say something?

"Yeah, just to give a little bit more color to that answer: the original intention was that everything was going to be written in Go. As we went along, we realized that there's a lot of code that is really old, like some of the original Rugged code that went into GitLab, with a lot of really weird edge conditions, and rewriting that in Go would be incredibly complicated; there aren't enough test cases. So at that point we realized that in order to speed things up, we would just leave those in Ruby, and that's sort of where gitaly-ruby came from. That long tail of things, I don't think it will ever go away, because there's not a lot of value in moving it to Go."

Right, and since it's mostly writes going through gitaly-ruby, and the volume of requests doing those writes is low, there's not a lot of performance to be gained there either. And as I mentioned, the Rugged API is pretty nice for us to use, so I'm kind of happy it's still there for some cases as well.

As Andrew mentioned, at some point we just copied all the code over. That's also why you'll see a lot of similar namespaces here and so on. It can be a little bit confusing: the OperationsService, plural, is the RPC side, the implementation of all the different RPC calls defined in the proto, and the OperationService, singular, is something that came over from GitLab Rails and performs a thing on a repository as a user. Andrew, you can correct me if I'm wrong on that, but that's how I understood it. "Yeah, a lot of gitaly-ruby is just a bunch of weird code that we literally vendored from GitLab Rails, to keep it working exactly the way it used to work."
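To make the git am step concrete, here is a minimal sketch of the shell-out, assuming a prepared worktree. `worktree_path`, `patches`, and `user` are placeholders, and the hook-related environment plumbing is left out:

```ruby
require 'open3'

# Commit as the user who mailed the patches in. The author of each commit
# stays whoever is recorded in the patch itself; GIT_COMMITTER_* controls
# the committer identity.
env = {
  'GIT_COMMITTER_NAME'  => user.name,
  'GIT_COMMITTER_EMAIL' => user.email
}

# Feed the mailbox of patches to `git am`, just like you would by hand.
output, status = Open3.capture2e(
  env, 'git', 'am', '--quiet', '--3way',
  stdin_data: patches, chdir: worktree_path
)
raise "git am failed: #{output}" unless status.success?
```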
So that's roughly how that particular feature works. Hello, there are two more questions. "Are the RPC calls vulnerable security-wise? What has been done to tighten up the security?" The RPCs themselves are not open to the public, but the source code is, so you could look for ways to expose vulnerabilities there. Sometimes we shell out to Git, like we did for this particular call; where's the file... here, this is where we shell out to Git, so there is some danger in that. But Gitaly is supposed to be a safer way of working with Git than just shelling out to Git directly. I think I read that from Jacob somewhere in a merge request, and I thought it was nicely said. As for what we've done to tighten up security, Jacob? "One of the obvious problems is argument injection on commands, because we spawn lots of Git commands, so we try to have safe abstractions where those things can't go wrong. But there's no silver bullet; it's an ongoing effort, something we still get reports about. As much as possible we just try to create safe abstractions so that certain types of security errors can't happen. It's not perfect."

Matt asks: do the clients know which operations to send to Go and which ones to send to Ruby, or are all operations routed to Ruby with some passing through Go? It's actually the other way around: all operations arrive first at the Gitaly Go implementation, and some operations, the writes, are passed on, proxied, to the gitaly-ruby process running on the same host.

"Actually, I have a question about when we add a new RPC to Gitaly. Like you said, gitaly-ruby still has some of the functions. When we add a new RPC, should we also add a method in gitaly-ruby, or should we add it in the Gitaly Go code only?" That brings me nicely to my last slide here. A lot of stuff already exists in Gitaly, so there's no need to reinvent the wheel. For example, the operation service that I just showed, which triggers the hooks we want to run for a user operation: if you were to implement a new RPC that needs to write to a repository as a user, I would recommend using that service and not reimplementing it entirely in Go. But if you're doing something that doesn't relate to any of the existing pieces, then by all means do it in Go only. Does that answer your question? "Yeah, thanks."

So those are some of the takeaways I wanted to leave you all with. Gitaly does a lot already, but we need to be careful: just because Gitaly already does something doesn't mean we should call three different RPCs right next to each other in a single request, if that's going to be a high-traffic path. It might be better to implement that as one new RPC. And there's a trick that I linked here that you can use in specs to see how many calls you're making to Gitaly and check that it doesn't exceed a certain number, for example when you add resources, the same way we try to limit the number of queries we send to the database. I also noticed that trying to work outside of the GOPATH is just, well, more complicated than it needs to be, so I decided not to do that.
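That spec trick boils down to counting GitalyClient calls around the code you exercise. A sketch of the idea; `Gitlab::GitalyClient.get_request_count` is the kind of counter involved, but check the linked docs for the exact helper to use:

```ruby
it 'does not fan out into too many Gitaly calls' do
  requests_before = Gitlab::GitalyClient.get_request_count

  visit project_path(project) # whatever code path you are budgeting

  # Budget Gitaly traffic the same way we budget database queries.
  expect(Gitlab::GitalyClient.get_request_count - requests_before).to be <= 5
end
```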
And there are some tips I wanted to leave you with on how to use a local Gitaly instance. I personally have symlinked my GDK Gitaly checkout to the one in my GOPATH, so I can use that one instead; GDK also keeps it up to date and so on, if you want to do that. When you update gitaly-proto, as I showed you before, you can vendor it before it's merged as well, or you can point your Gemfile at your specific branch, and then you can already start working even before the protocol changes have been merged.

Regarding tests: for this particular feature it was very handy to just generate some patch files and then have a high-level test in GitLab Rails, sort of an integration test, that validates that they get applied properly. But you should also be testing all the units along the way. For example, as you've seen in my example, there was some validation happening on the Go side and then the application of the patch in gitaly-ruby; both of those are tested separately, and there's an integration test on top of everything in GitLab Rails, since we spin up a Gitaly process for tests anyway.

Furthermore, the people in the #g_gitaly Slack channel are super helpful, so if you're building something and you get stuck, feel free to reach out to them.

And that's, yeah, those are my main takeaways. Are there any more questions? Okay, so thank you very much everyone, and talk to you all later. Bye bye. "Thanks Bob. That was awesome." All right. Thank you. Thanks.