Okay. Can everybody see my screen? Yeah? Okay, cool. Let's get started then. So welcome. This is the Functional Group Update for Infrastructure, which is a group of multiple teams. I'm not reading the chat right now, so if you want to ask anything, just leave it there and we'll get back to it later.

First of all, production: the status of the post-mortem of the database incident, which means all the actions that came out of it. Currently we're running backup recovery tests at all levels. The database is covered: it's covered by WAL-E, which is a streaming backup solution. We talked about this before, and it's behaving really, really well. We're also checking storage. We discovered that things basically broke after we moved to ARM (Azure Resource Manager): the Azure Classic way of doing snapshots is completely incompatible with the new environment we pushed to. So we're rebuilding all the disks. This is going to happen. It requires downtime, which means we're going to do it during the weekend, because that's when we have low usage of the site. But it's a risk, so we want to do it really, really soon, because technically we don't have backups for the file system right now. So this matters a lot. It's critical.

This will be a public recording. If you say no backups for the file system, can you elaborate on what we do have? So we do have the disks right now, and we do have the old snapshots. The problem is that we don't have new snapshots for the Git file system. We do have backups for the databases and Redis. Okay. So it literally is: we don't have backups for the new ARM Git storage. So this is our highest-priority item. Yes, which is basically why we dropped everything else and are concentrating on getting this solved during this weekend. The sooner, the better.

Besides that, we added a monthly Backup Appreciation Day, on which we test our backups and restore something. It's scheduled in the production calendar. I think it's the first Tuesday of the month, which means that next Tuesday we're going to recover the database completely to check that it works.

The big theme ahead, besides all this critical stuff, is that we have a plan for getting to canary deployments. It's a long-term vision. It's going to take some time, but it's going to bring some really interesting things. The first one is that with this, we will be able to run migrations at a large scale and deploy branches using review apps, which means we're going to be able to dogfood ourselves. With this, we also want to provide automated black-box testing to detect performance regressions. In the same plan, there's a proposal on how we can actually keep a baseline, so that whenever we're working on a branch we can test whether the new branch moves us forward or backwards (there's a rough sketch of that idea below). In the long term, we want to use containers in production with rolling deploys. It's a long-term plan. It's going to take some time, and we're going to start with development and then move through staging until it reaches production, so we get there safely.

In other news, Jason Tevnan joined us as a senior production engineer last week. There are more people coming soon. You'll get to meet them all.
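To make that baseline idea a bit more concrete, here is a minimal sketch of what a black-box regression check could look like. This is not our actual tooling: the endpoint, baseline file, and threshold are all invented for illustration.

```ruby
require 'benchmark'
require 'net/http'
require 'json'
require 'uri'

# Hypothetical review-app endpoint and baseline file; names are assumptions.
ENDPOINT      = URI('http://review-app.example.test/api/v4/projects')
BASELINE_FILE = 'baseline_timings.json'
SAMPLES       = 20
TOLERANCE     = 1.10 # flag anything more than 10% slower than the baseline

# Time SAMPLES sequential requests against the review app and take the median.
timings = Array.new(SAMPLES) do
  Benchmark.realtime { Net::HTTP.get_response(ENDPOINT) }
end
median = timings.sort[SAMPLES / 2]

# Compare against the stored baseline recorded from the main branch.
baseline = JSON.parse(File.read(BASELINE_FILE))['median']

if median > baseline * TOLERANCE
  puts format('REGRESSION: median %.3fs vs baseline %.3fs', median, baseline)
  exit 1
else
  puts format('OK: median %.3fs vs baseline %.3fs', median, baseline)
end
```

In a CI setup this would run against a review app for the branch, so "forward or backwards" becomes a pass/fail signal rather than a judgment call.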
Azure keeps restarting NFS instances, and this is causing downtime. We have been chasing them a lot on this, at least to understand why it's happening, what we can do about it, and particularly why they don't send an email to let us know when this happens or is going to happen, because we're not having a good experience there. We got some improvements, but nothing too significant.

Besides this, we're also pushing for a change procedure, because we have been detecting that we spend a lot of time in production introducing changes. We want to announce whenever we need downtime with enough lead time for customers to actually adjust to it. We want to control it, and we want to have a solid checklist of the things that need to happen when we take a change to production. With this, we're also pushing for a production readiness questionnaire, just a set of questions to get to know the state of the things we want to push into production. The aim of this is to avoid having no clue about what is actually going to happen whenever we push something to production.

Regarding the database (I'm taking your spot today): we started working with a consultant from Crunchy. It's already yielding really great results. We're getting really good information from them, and we're lining up next steps to improve quite a lot of things. Right now we're running three database hosts, one as the master and two as secondaries, with load balancing enabled all the time. We did have a little bit of a setback here a couple of times, but in general it's behaving really well, and it's performing great. We also enabled PgBouncer in production. PgBouncer allows us to reduce the resources the database needs: we can use fewer connections, which means less memory, which means fewer resources in general, so we aim to use smaller hosts for the database (there's a rough sketch of how this works below). It's really good progress.

As far as I know, we're talking about dropping MySQL support. I don't know how official this is, but... It's not official. We're not going to drop MySQL; we have a lot of customers using it. I don't think it's likely, but we're talking about it.

Yesterday we lost George somewhere in the canals of Amsterdam (it's King's Day), which means that we are hiring a database specialist. We're hiring a database specialist because George has way too much work, there's a lot of work to do there, and we want to have more people. The position is open, and we are actively hiring for it.

Regarding security, we're close to completing the first risk assessment. This is going to yield interesting recommendations, ranked by impact and by the cost of mitigating each risk. Some of them are DNS protection and improving the incident response policies. Basically, we don't have good processes for handling these things, and we are working on getting them in place and then testing them, so we are ready whenever we have a problem. Hopefully, we will never have one, but we want to be ready for it. The top recommendation by impact is a disaster recovery plan for whenever Azure goes down. We're all working on that, and we plan to work harder on it. Also better monitoring for vulnerabilities: in particular, if we ever get compromised, we want to be able to react really fast, which is what the data breach notification policy is about as well.
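For a rough illustration of why PgBouncer saves resources: the application connects to PgBouncer as if it were Postgres, and PgBouncer multiplexes many client connections onto a small pool of real server connections. A minimal sketch with the standard `pg` gem follows; the hostname and database details are assumptions, not our actual topology, though 6432 is PgBouncer's conventional port.

```ruby
require 'pg' # the standard Ruby Postgres driver

# Instead of every application process holding its own Postgres backend
# (each backend costs the database server real memory), everyone connects
# to PgBouncer, which shares a much smaller pool of server connections.
conn = PG.connect(
  host:   'db-pgbouncer.example.internal', # assumption: illustrative hostname
  port:   6432,                            # conventional PgBouncer port
  dbname: 'gitlabhq_production',
  user:   'gitlab'
)

result = conn.exec('SELECT count(*) FROM projects')
puts result.getvalue(0, 0)
conn.close
```

The application code doesn't change at all; only the connection target does, which is what lets us shrink the database hosts without touching the app.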
On top of this, we're hiring a security specialist, which is a development position for, to say it somehow, a highly paranoid person. I think the position is opening, and we're going to be interviewing, or are interviewing for it already. Julie, Andrew, want to take this?

Cool. Thanks, Pablo. Yeah, I'll just tell you guys about Gitaly a bit. I'll start with accomplishments. Our first accomplishment is that we actually managed to ship an initial release of Gitaly in GitLab 9.0. For that release, we shipped the Smart HTTP info/refs endpoints, and they've been running well on GitLab.com. Right now, Gitaly is running alongside Workhorse and reading Git data through an NFS mount. We hope to change this in the near future by running Gitaly close to the Git data, and I'll tell you a little bit more about that later on. In GitLab 9.2, we have several more migrations running behind feature flags, and I've included links to some of those migrations.

Concerns for Gitaly: probably the biggest concern we have in the project is that we underestimated the learning curve for gRPC adoption, so we've had a lot of problems around the Ruby component that we use for gRPC, and I've included issue 191 there, which gives you an outline of all the different problems we've been seeing. We've also been investigating a potential workaround using a piece of technology called grpc-gateway together with Swagger, generating a Swagger Ruby client that doesn't use the C bindings. Basically, what grpc-gateway does is take the gRPC component and expose it as a RESTful interface, which is nice, simple, boring technology (there's a rough sketch of what that would look like from the client side below). It's probably worth saying that this is only a backup at the moment, so we're going to continue trying to get the gRPC component to work as it should, and if all else fails, at least we'll know that we have something we can fall back to. We've had a few other problems with gRPC as well, but I won't go into those now.

The second concern we have is the lack of a stress-testing environment where we can test things under load. One of the things we found is that we tested things in staging and in development and didn't find any problems, and then we put them on GitLab.com and did find problems, and it all comes down to load. But I think some of the things Pablo mentioned earlier in the presentation might help with that. Cool, do you want to skip to the next slide, Pablo?
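To show what the grpc-gateway fallback would mean on the Ruby side, here is a minimal sketch. The URL, path, and payload are invented for illustration, not Gitaly's actual API: the point is that the client speaks plain HTTP and JSON from the standard library, with no gRPC C extension loaded anywhere, and the gateway translates the request into a real gRPC call on the server side.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical grpc-gateway endpoint and request shape; assumptions only.
uri = URI('http://gitaly-gateway.example.internal:8080/v1/commits/last')

req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
req.body = { repository: 'gitlab-org/gitlab-ce.git', revision: 'master' }.to_json

# No gRPC bindings in this process: the gateway does the gRPC translation.
res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
puts JSON.parse(res.body)
```

That's why it's attractive as a backup: boring, well-understood HTTP plumbing, at the cost of running and maintaining the gateway in between.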
Cool, so our plans for Gitaly. Firstly, as I mentioned before, getting Gitaly to run on the network. Currently, as I said, Gitaly runs alongside Workhorse and is reached via a Unix socket. In 9.2 we'll be moving Gitaly across the network and co-locating it on the NFS servers, so it's nice and close to the data, and once it's close to the data we're hoping to see some performance improvements. These will mostly come from reducing the round-trip latency of Git I/O calls. And once we've got some experience with the network configuration, we'll be able to start getting an idea of what the next step forward will be in terms of optimizing Gitaly.

Alongside the optimization work, of course, we'll continue to migrate routes one by one to Gitaly, across GitLab CE, Workhorse, and GitLab Shell. We're hoping to complete the migration of Workhorse first, and we'll probably do Shell sometime after that. GitLab CE will take a lot longer because there are a lot of routes that need to be migrated, so we've prioritized those by the worst-performing routes first, and we're working our way down the list; there's a link to that list included there. And that's it for Gitaly, so I think we're on to questions.

Correct. Let's see what we have in the chat. Yes, to your question: yeah, Gitaly is running on all the GitLab servers at the moment. Yeah, so I think it's quite an accomplishment that we shipped it in 9.0. There was, I think, a small thing where it didn't start, but we didn't have any major issues; I don't think we had a lot of complaints. So we introduced a big architecture change, and everything just kept working for our users. I think that's a huge accomplishment by the Gitaly team. Well done. Thanks, thank you.

Any more questions? A quick one: how is the pipeline for the database specialist looking? Not very good. We started changing the job description. We had a meeting yesterday, and there's a merge request where we're basically changing it, because it looks like a lot of people go to the job description but then don't apply. So it seems like something is not exactly clear. We also used the opportunity to clarify our ideas a little, because the position has been changing over time. Yeah, and if I recall correctly, I think it's now advertised on Stack Overflow, so someone can confirm whether that's true or not. Same for the security specialist; I think that's what she mentioned yesterday.

Going once... going twice... going three times. Have a great day, everyone.
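As a closing illustration of the socket-to-network move described above: with gRPC, the transport is just the channel target, so moving Gitaly from a local Unix socket to a TCP address on the NFS servers is essentially a configuration change for the client. The target strings below are assumptions, not the real deployment paths.

```ruby
require 'grpc' # the Ruby gRPC gem (the C-extension component discussed above)

# Hypothetical targets; actual socket paths and hostnames will differ.
SOCKET_TARGET  = 'unix:/var/opt/gitlab/gitaly/gitaly.socket' # today: same host
NETWORK_TARGET = 'nfs-01.example.internal:9999'              # 9.2: near the data

target = ENV.fetch('GITALY_TARGET', SOCKET_TARGET)

# gRPC creates a channel to a Unix socket or a TCP host:port the same way,
# which is what makes the migration a config change rather than a rewrite.
channel = GRPC::Core::Channel.new(target, {}, :this_channel_is_insecure)
puts "channel to #{target}: #{channel.connectivity_state}"
```

The performance win then comes purely from topology: the same calls, but with the round trips happening next to the Git data instead of across an NFS mount.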