Great, so for our first tutorial session for the Q4 hackathon, I have a couple of great colleagues here to talk about GitLab Geo: Fabian and Ton. Let me turn things over to you; you can introduce yourselves and we'll go from there. For people who have questions, feel free to either type them in the chat or ask them out loud during the session. Let's get started.

Super, yeah, thanks for the introduction. So I'm Fabian, I'm the product manager for Geo, and I'm very excited about the hackathon. We got some really great community contributions during the last hackathon, and we've already merged a community contribution during this hackathon, which is amazing. What I'm going to do is share my screen, give you a high-level overview of what Geo actually is, and then show you Geo in action and what you can do with it. I'm joined by Ton, who's a backend engineer on Geo and who is hopefully going to stop me when I say things that are technically not correct. All right, let me quickly share my screen. You should be able to see my screen, is that right?

Yep.

Okay, so Geo is a part of GitLab that is interesting to GitLab customers who are very distributed and who are potentially interested in additional disaster recovery capabilities. Those are two statements. I'll run through both of the main use cases the Geo group is concerned with at the moment and explain each a little more. The first one is that Geo provides read-only replicas of a GitLab instance. This reduces the time it takes to clone and fetch large repositories and speeds up development. Imagine you have offices on different continents and in different time zones. Having a single GitLab instance, let's say in Europe, may cause performance issues for developers outside Europe, for example in the United States, in Asia, or in India.
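To put rough numbers on the latency argument, the short Python sketch below estimates clone transfer time as repository size divided by effective throughput. The 6 GB repository and the 160 vs. 40 Mbit/s throughputs are hypothetical figures chosen for illustration. The model deliberately ignores server-side packing, compression, and protocol overhead.

```python
def clone_minutes(repo_gb: float, throughput_mbit_s: float) -> float:
    """Rough clone time in minutes: size / effective throughput.

    Hypothetical back-of-the-envelope model: uses decimal units (1 GB = 8000 Mbit)
    and ignores packing, compression, and protocol overhead.
    """
    if throughput_mbit_s <= 0:
        raise ValueError("throughput must be positive")
    megabits = repo_gb * 8000
    return megabits / throughput_mbit_s / 60


# A nearby node vs. a distant one for the same (hypothetical) 6 GB repository.
print(clone_minutes(6, 160))  # 5.0
print(clone_minutes(6, 40))   # 20.0
```

The only point of the model is that transfer time scales inversely with throughput. That is why serving the same data from a nearby node helps.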
Geo solves that problem by letting you create many read-only instances very close to where your developers actually are. That's particularly useful with very large repositories and large amounts of data to clone and fetch. Internet speed is great in some regions but not in others, and you really want your Geo instance close by. So that's the first use case for Geo, and it's the one that makes the most sense: it's about geographic distribution, which is why we are named that way.

The second one is almost a natural follow-on to having a geographically distributed system. Geo replicates not only Git repositories but also the database, meaning the Postgres database that powers GitLab, and a few other assets. That means you can also promote a read-only instance to a read-write instance in a disaster recovery situation. This is useful if, for example, you run a GitLab instance that contains sensitive data, as most instances probably do. It can reduce the time it takes to recover from a disaster, say a power failure in a single data center, because you can promote another, read-only instance. One important side note: this is not a replacement for backups. It's there to reduce recovery time, so you should still back up your data.

At a high level, it works as follows. You have a primary instance, and I'm going to use that term quite a lot. The primary instance is readable and writable. Then you have a secondary instance in a different region or a different data center that is read-only. We mirror most of the data between those two instances, and for Git traffic specifically, we are able to proxy requests from the secondary over to the primary. So what you end up with is this kind of system.
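The two roles, and what promotion changes, can be sketched as a toy Python model. Everything in it (Node, Role, promote, replicate) is a hypothetical illustration, not Geo's code. Real replication uses Postgres streaming replication and Geo's own sync machinery, and real promotion goes through GitLab's administration tooling.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class ReadOnlyError(Exception):
    """Raised when a write is attempted on a read-only (secondary) node."""


@dataclass
class Node:
    name: str
    role: Role
    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        # Only the primary accepts writes; secondaries are read-only.
        if self.role is not Role.PRIMARY:
            raise ReadOnlyError(f"{self.name} is read-only")
        self.data[key] = value

    def promote(self) -> None:
        # Disaster recovery: turn a read-only secondary into the read-write primary.
        # (In this toy model the failed old primary is simply assumed to be gone.)
        self.role = Role.PRIMARY


def replicate(primary: Node, secondary: Node) -> None:
    # Hypothetical stand-in for replication: copy the primary's state wholesale.
    secondary.data = dict(primary.data)
```

Promotion only flips which node accepts writes. The data must already be on the secondary, which is why continuous replication is the prerequisite for shorter recovery.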
It's almost like a star-topology network: you have a read-write primary in the center and one or many secondaries surrounding it. Quite a few of our customers have, let's say, one secondary. Others have many different offices and sometimes run five or more secondaries. That's the whole thing at a high level.

Because this is a hackathon, the high level may not be sufficient, so here is the Geo architecture diagram. It gets a bit more involved once you dive into what replicating things from A to B actually means. One thing to highlight from the Geo team is that Geo is built to be relatively resilient when transferring information over the internet. There's a lot of internal logic to retry and to make sure the data actually gets copied over.

Starting from the top, we again have a primary node and a secondary node. The first thing to know, and this is quite interesting for many of our engineers, is the database aspect. GitLab uses a Postgres database internally; at the moment this is the only supported database technology. We stream all of the changes in the database to the secondary node using Postgres streaming replication, which is relatively fast. So every time a change is written to the database on the primary, say someone creates a new issue, it gets replicated very quickly to the secondary. We also have a tracking database on the secondary, which is actually the only writable part of the secondary. It keeps track of all the events and of which changes have already been synced, and that's how we keep the primary and the secondary in sync. There are other pieces too: for example, the secondary node authenticates against the primary. And there are a bunch of additional bits and pieces that are being replicated.
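The tracking-database idea can be pictured with the Python sketch below: a cursor over the primary's event log, plus per-item sync state with retries. Event, TrackingDB, process_events, retry_failed, and the max_retries limit are hypothetical names and values chosen for illustration. Geo's real event log and registry tables are structured differently and live inside GitLab itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    id: int
    resource: str  # illustrative identifier, e.g. "repo/a"


@dataclass
class TrackingDB:
    """Hypothetical stand-in for the secondary's tracking database."""
    last_event_id: int = 0
    synced: Dict[str, bool] = field(default_factory=dict)
    retries: Dict[str, int] = field(default_factory=dict)


def process_events(events: List[Event], db: TrackingDB,
                   sync: Callable[[str], bool]) -> None:
    # Record only events newer than the stored cursor, so replays are harmless.
    for event in events:
        if event.id <= db.last_event_id:
            continue
        db.synced[event.resource] = False
        db.last_event_id = event.id
    retry_failed(db, sync)


def retry_failed(db: TrackingDB, sync: Callable[[str], bool],
                 max_retries: int = 3) -> None:
    # Try every unsynced item; failures are counted and picked up on a later pass.
    for resource, done in list(db.synced.items()):
        if done or db.retries.get(resource, 0) >= max_retries:
            continue
        if sync(resource):
            db.synced[resource] = True
        else:
            db.retries[resource] = db.retries.get(resource, 0) + 1
```

Separating "which events have I seen" (the cursor) from "which items are actually synced" (the per-item state) is what lets a transient network failure be retried without losing track of the event.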
For example, we replicate attachments that you upload on your primary, as well as CI artifacts and LFS objects. They get pushed to the secondary and are available there. That's quite helpful because you don't need to manually rsync these things over; Geo supports them out of the box, which is quite neat. One of the main challenges for the team is actually adding new data sources here; for example, we're currently working on design repositories.

Lastly, we deal with Gitaly, the RPC layer that GitLab uses to access Git. This part is very neat, and I'll show it to you live in a bit. From a developer's perspective, using Geo is almost fully transparent. If you clone or pull from a secondary node, you can also push changes to that secondary, and the push gets proxied transparently to the primary. So it behaves as if it were writable, even though it isn't. That's nice because you don't need to configure Git to always pull from the secondary and push to a different server; you can do both against the secondary.

By now we also support using a Geo-aware load balancer, specifically for Git traffic. Say you have five secondaries and one primary, and they all share the same name. Then, with Route 53 or another implementation, the closest node is determined automatically based on where you are, and you are sent to it. So that's Geo in a nutshell. From an engineering perspective it's quite interesting, because a lot of the relevant changes happen in different parts of GitLab as a whole. We sometimes joke that if you want to work on Geo, you need a good overview of everything GitLab actually does.
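A Geo-aware load balancer comes down to a routing decision: send each client to the closest healthy node. The sketch below models latency-based selection with a fallback to the primary. pick_node, the node names, and the latency figures are hypothetical. In practice this decision is made by a DNS service such as Route 53's latency-based routing, not by code like this.

```python
from typing import Dict


def pick_node(latency_ms: Dict[str, float], healthy: Dict[str, bool],
              primary: str) -> str:
    """Return the healthy node with the lowest measured latency.

    Hypothetical model of latency-based routing; falls back to the primary
    when no node is known to be healthy.
    """
    candidates = {n: ms for n, ms in latency_ms.items() if healthy.get(n, False)}
    if not candidates:
        return primary
    return min(candidates, key=candidates.get)
```

Falling back to the primary is a design choice for the sketch. The primary can always serve both reads and writes, so it is a safe default when nothing closer is available.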
So for us it's sometimes quite interesting, because we interact with a lot of different teams and need to understand how new features are being added, and that's actually quite exciting. Okay. I also have a list of issues prepared that I'll send to Ray, and we can go through it in a little while. But first I wanted to make this a bit more concrete and show you what it looks like on a live demo instance of GitLab. So let me drop out of my wonderful presentation. I don't think there are any questions in the chat yet, so I'm just going to continue.

Okay. What I have here is a pretty empty GitLab instance. I just reloaded it, and you can see that I've designated this one as a primary. It behaves exactly like any other GitLab instance. For example, I go to my projects, create a new project called Hackathon, and initialize it. Nothing here is different or anything you need to be aware of. Now let's look at the Admin Area; this is the latest installation of GitLab, 12.4.2. You can see the Geo section on the left, and that this is one of two nodes. We have the primary, which is where we are, and it's healthy. We also have a secondary configured, and the two are linked. If you look at the verification information, you can see we have three checksummed repositories and the three wikis that always get generated with them, plus some more technical information. The secondary is set up to replicate all of the changes that happen on the primary. If we look at the sync information here, you can see that the last event we processed was 46 minutes ago. This is going to update quite quickly, because we just created a new repository.
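The "last event processed" figure in the sync information is essentially a measure of replication lag. Here is a minimal sketch of turning such timestamps into a status. It assumes a hypothetical sync_status helper and an arbitrary 10-minute threshold. Geo's actual health checks report more detail than this.

```python
from datetime import datetime, timedelta, timezone


def sync_status(last_event_on_primary: datetime,
                last_event_processed: datetime,
                max_lag: timedelta = timedelta(minutes=10)) -> str:
    """Classify a secondary by how far its processed events trail the primary.

    The 10-minute threshold is an arbitrary, hypothetical default.
    """
    lag = last_event_on_primary - last_event_processed
    if lag <= timedelta(0):
        return "in sync"
    return "catching up" if lag <= max_lag else "lagging"


t0 = datetime(2019, 11, 1, 12, 0, tzinfo=timezone.utc)
print(sync_status(t0 + timedelta(minutes=2), t0))  # catching up
```

Comparing the primary's newest event with the secondary's last processed event, not with wall-clock time, matters here. On an idle instance like the demo one, "46 minutes ago" can still mean fully in sync.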
If I now follow this "Open projects" link, we are on the secondary, this node over here. The first thing you notice is that the web UI is in read-only mode. If you recall, I said this is a read-only node. In the web UI you can look at things, but you can't change anything, because that would cause changes in the Postgres database. We have not currently enabled proxying of those changes to the primary. I'll reload, and luckily you can already see that the Hackathon project I created has been synced over here. I can look at it, and it's the project I just created, but I'm not allowed to make any changes. Let's say I try to add a changelog test and commit it: it fails, because I cannot perform write operations on the read-only instance. That's a limitation of being on a read-only instance.

If we go to the Admin Area again (I'll just leave the site) and go to Geo, you get a bit more information. Here's the sync information again. The last event from the primary was two minutes ago, which is exactly me creating a new repository. You can also see that this repository is currently unverified. We verify some of our data types, so verification will be scheduled in the background and run later. You can also look at more detailed information here, where I created some test projects. From here you can take corrective action, for example to re-verify and re-sync them. The same goes for uploads: for example, I uploaded a screenshot a little while ago, and it's currently in sync as well. Presenting this information better is one of the challenges we are going to address relatively soon.
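Verification compares checksums computed independently on the primary and the secondary. The sketch below assumes a SHA-256 checksum over a repository's sorted refs (ref name to commit SHA). That scheme is an illustrative choice, not necessarily how Geo or Gitaly compute their checksums. repo_checksum and verify are hypothetical helpers.

```python
import hashlib
from typing import Dict


def repo_checksum(refs: Dict[str, str]) -> str:
    """Checksum a repository's refs (ref name -> commit SHA), order-independent."""
    h = hashlib.sha256()
    for name in sorted(refs):
        h.update(f"{refs[name]} {name}\n".encode())
    return h.hexdigest()


def verify(primary_refs: Dict[str, str], secondary_refs: Dict[str, str]) -> str:
    # Matching checksums mean "verified"; a mismatch is what would trigger a re-sync.
    if repo_checksum(primary_refs) == repo_checksum(secondary_refs):
        return "verified"
    return "mismatch"
```

Sorting the refs first makes the checksum independent of enumeration order. That lets two nodes compare only a short hash instead of shipping the full ref list across regions.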
This may also be an area where community contributions are welcome: redesigning parts of the UI. That's one of the things I'm looking forward to, understanding how to surface the information that is relevant to systems administrators.

One more thing I wanted to show you is how you interact with Geo when you use Git, which many of you will be quite familiar with. Say we go to my projects, open the Hackathon project, and clone it, for example with SSH. I'll open my terminal and run git clone into a directory called hackathon-primary. So I'm cloning the project from the primary, and I go into the hackathon-primary directory. You can see here's my README. I'll quickly edit it, fix some spelling, and leave a message from the primary. Then I commit with a very informative message and push. This is on the primary, so it should just work: this is the read-write instance, and I can make changes directly. If we go back and look at the repository, you can see the change is already there. That's not particularly surprising.

So what happens if you clone this from the secondary? If I go to my secondary node and open the Hackathon project, you can see the change was already replicated over, which is exactly what Geo does. I pushed it to the primary, and my changes have already been copied to the secondary. Now I'm going to clone from the secondary rather than the primary, into a directory called hackathon-secondary. So I have the same thing again, but cloned from the secondary itself.
And this is one of the main benefits, although you can't really see it here, because this repository only has a small README file at the moment. But think of other instances. One industry that comes to mind is gaming, which is very asset-heavy; some repositories there are very large and contain many LFS objects. Say the primary is in the US, and cloning your repository there takes five minutes. If another office in Europe clones it from the US, it may take 20 minutes. Developer satisfaction goes down a lot, because nobody likes to wait. But if they clone from a secondary that is also in Europe, it may be five minutes again. That's the main benefit here.

So, I said the secondary is read-only, and we've cloned from the secondary. But one of the cool things Geo does is that we can also make a change here. I'll do this and commit, which isn't surprising; I made some changes. Now I'm going to push again. With Geo, the push gets transparently proxied over to the primary. You can see the message we helpfully leave for you here: "Hey, you're pushing to a Geo secondary. We'll help you by proxying this request to the primary." This is exactly what we are looking for. We can use a local read-only mirror and more or less transparently proxy our writes over to the primary. Add a Geo-aware load balancer into the mix, and the remote would not be my demo secondary hostname. It would just be something like git.company.com, and the right node would be determined automatically. If we go back to the web interface, you can see the second message appears.
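The behaviour in the demo amounts to a small routing rule: reads are served locally and pushes are forwarded to the primary. The sketch below uses Git's real service names, git-upload-pack for clone/fetch and git-receive-pack for push. route_git_request, Route, and the example.com URLs are hypothetical. The notice text paraphrases the message shown in the demo rather than quoting GitLab's exact output.

```python
from dataclasses import dataclass


@dataclass
class Route:
    serve_locally: bool
    target: str
    notice: str = ""


def route_git_request(service: str, is_secondary: bool,
                      local_url: str, primary_url: str) -> Route:
    """Decide where a Git request should be handled (hypothetical model).

    Reads (git-upload-pack: clone, fetch, pull) are served by the node that
    received them; writes (git-receive-pack: push) on a secondary are forwarded
    to the primary, mirroring the behaviour shown in the demo.
    """
    if service not in ("git-upload-pack", "git-receive-pack"):
        raise ValueError(f"unknown Git service: {service}")
    if service == "git-upload-pack" or not is_secondary:
        return Route(True, local_url)
    return Route(False, primary_url,
                 "You're pushing to a Geo secondary; proxying to the primary.")
```

Because only the push path changes, the developer keeps a single remote. That is what makes the experience "almost fully transparent" as described earlier.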
So that's essentially the functionality Geo offers. One thing I would like to highlight is that, from an engineering perspective, this is really quite interesting. All the logic and functionality needed to enable something like this is quite exciting. But ultimately, from a product perspective, Geo is also a part of GitLab where we want the experience to be as transparent as possible for our users. I'm explaining all of these things to you because they're interesting. But think of a developer who wants to use Geo and get the best possible experience regardless of their location. For them, fewer configuration steps and fewer extra things to set up are very important. So, to a certain extent, we are trying to abstract all of this internal logic away from our users.

If you want to learn more about Geo and how it works, I recommend taking a look at the documentation; there's a lot of interesting information there. Geo is very actively developed, and there are some current limitations I would like to highlight. We replicate quite a few data types already, but there are some things we don't support yet. We are thinking about next steps such as supporting GitLab Pages or Maven repositories. We also have some interesting things on the roadmap to let others contribute more actively, both inside GitLab and in the wider community, because at the moment adding new things is sometimes relatively challenging.
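One way to picture a framework that makes adding new replicated data types easier is a plug-in registry. Each data type contributes one sync function instead of touching Geo internals. The sketch below is a hypothetical illustration: replicator, REPLICATORS, and the data-type names are invented, with maven_package_file nodding to the Maven support mentioned above. It does not reflect the actual design of any proposed framework.

```python
from typing import Callable, Dict, List

# Hypothetical registry: data-type name -> function that syncs one object by id.
REPLICATORS: Dict[str, Callable[[str], bool]] = {}


def replicator(data_type: str):
    """Register a sync function for a data type (hypothetical plug-in API)."""
    def decorator(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        if data_type in REPLICATORS:
            raise ValueError(f"{data_type} already registered")
        REPLICATORS[data_type] = fn
        return fn
    return decorator


@replicator("lfs_object")
def sync_lfs_object(object_id: str) -> bool:
    # Placeholder: a real implementation would fetch the object from the primary.
    return True


@replicator("maven_package_file")
def sync_maven_package_file(object_id: str) -> bool:
    # Placeholder for a newly contributed data type.
    return True


def supported_types() -> List[str]:
    return sorted(REPLICATORS)
```

With a registry like this, a new data type is one decorated function, and generic code (scheduling, retries, verification) can loop over REPLICATORS without knowing about each type.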
We currently have a proposal to try out a more self-service framework, which would make it easier for everyone to contribute new data types that Geo supports. That's that. Looking at issues in general: this here is the self-service framework issue I was talking about. We also have a number of issues labeled "Accepting merge requests". What you are searching for is essentially the Geo group label together with "Accepting merge requests". That's quite broad and there are loads of them. We have a short list, but any contributions are welcome, and we are really looking forward to the work you do. Some are frontend improvements, and some are more backend-heavy, so it depends a bit on what you want to do. And if you are really keen, we certainly have some more advanced feature requests you can dig into. So yeah, I think that's pretty much it.

Cool. Thanks, Fabian. Just one quick question: for some of these issues, are there cases where you may need an enterprise license of GitLab to work on an MR?

I think that may be the case, especially for the last, advanced one. If you're interested in that one, a license may be required, since Geo is a Premium feature. But I must admit I'm not quite sure what the process is in that regard. I'm sure we can figure something out.

Yeah. Basically, what I recommend if you need an enterprise license is to go ahead and get a 30-day trial. If you need more time beyond the 30 days, especially for complicated issues, just ping me, Ray, and I'll be happy to extend the license for you so you can continue your contribution. I just wanted to double-check, so thanks for confirming that. Any other questions? Let me make sure there's nothing in the chat; I don't believe there is.
Ton, do you have any recommendations or suggestions for contributors in general as they get started? Obviously, they can ping you on issues and merge requests if they have questions.

Yeah, please do browse through the issues and look for something that interests you. We can discuss how to break it down into smaller issues, things like that. Just ping me or Fabian, and we'll help you along.

I think that's true. If you are thinking about contributing and you have a question, need clarification, or would like somebody to review an MR, just ping me. I'm usually pretty good about responding, so don't be shy. These things are really appreciated, and you're not bothering us at all. It's exactly what we would like to see.

Yeah, as Fabian noted, there was an MR that came in for the hackathon, and I think you were able to merge it within a few hours. It was a good documentation fix that somebody pointed out. So everything's welcome.

Yes.

Cool. All right, I guess there are no other questions, so we can wrap things up. I'll post this video on the playlist. And Fabian, if you can send me the links to your slides, I'll post them on the hackathon page along with the issues. Appreciate your time.

Thanks for organizing it, and thanks everyone for contributing.

All right, have a good evening in Europe.

Thank you. Bye.