Okay, everybody, it's eight o'clock my time, so welcome to the engineering update for July. I hope everyone has access to the slides; they should be on the calendar. It's not a walkthrough.

We have two new members since my last update. On the production side we have Victor Lopez, and on the CI/CD team we have Shinya. Welcome, Victor and Shinya; they've already hit the ground running in their first five weeks and done a lot. So welcome aboard.

I wanted to go over the engineering OKRs before I get going on a summary of what we've done and what we're going to do. The main things we're focused on are a couple of items.

First, enabling high availability for GitLab.com and for our customers. For GitLab.com specifically, there are two things we're really working on right now. One is graceful degradation of NFS servers. Right now we have something on the order of 12 to 16 NFS servers running on GitLab.com, and if any one of them falls over, GitLab.com is down. We're trying to avoid that and make sure that if one of them goes down, we can still function with the 15 others. That work is ongoing, thanks to Bob.

One thing we identified in maintaining availability is that our Redis cluster was growing quite big. We were storing lots of cache data alongside a lot of persistent data, like Sidekiq queues. One piece of feedback we got from the Redis author himself was: look, you're using Redis in two different ways; you really need two separate clusters for this use case. Thanks to a community contribution from Paul Charlton, and review by Robert, we should have this in 9.5.

The one big area that customers, and we ourselves, are facing today is just the performance of GitLab itself, and the OKR there is to lower latency in the application. There are two areas for that. We need to focus on the backend: there's obviously a lot of work going on with Gitaly and file system access.
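To make that Redis split concrete, here's a minimal sketch, assuming hypothetical hostnames and helper names, of what pointing ephemeral cache data and persistent queue data at two separate Redis instances could look like. This is illustrative, not GitLab's actual configuration:

```ruby
# Illustrative sketch (not GitLab's shipped config): splitting Redis usage
# into two separate instances, one for ephemeral cache data and one for
# persistent data such as Sidekiq queues.

REDIS_INSTANCES = {
  # Cache: safe to lose, so it can run with an eviction policy (e.g. allkeys-lru).
  cache: "redis://redis-cache.internal:6379/0",
  # Queues: must be durable, so no eviction, backed by append-only persistence.
  queues: "redis://redis-persistent.internal:6379/0",
}.freeze

def redis_url_for(role)
  REDIS_INSTANCES.fetch(role) do
    raise ArgumentError, "unknown Redis role: #{role.inspect}"
  end
end

# A cache store and a job queue would then be built from different URLs:
cache_url = redis_url_for(:cache)   # => "redis://redis-cache.internal:6379/0"
queue_url = redis_url_for(:queues)  # => "redis://redis-persistent.internal:6379/0"
```

The point of the split is that each instance can then be tuned for its workload: the cache instance can evict freely under memory pressure, while the queue instance never drops data.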
But on the more general, higher-level stuff, at the database level there's a lot of optimization we can do, and there's a good list of the ten things we need to optimize. If you're on the backend team, please take a look; we can optimize them one at a time, and hopefully we get a lot of bang for our buck by doing that.

There's a good frontend plan that Tim and Jacob put together about what we're going to do to optimize the perceived performance and also the actual frontend performance. There's a lot of simple stuff, like lazy loading of images and getting rid of all this inline JavaScript so you don't block the whole rendering when you go to a page. That work is ongoing right now.

The other key one, which I'm particularly sensitive to, as is our support team, is the critical stability issues. For example, we've all seen merge requests getting stuck, or forking not working, or import not working. We need to get that solid and make sure those things work a hundred percent of the time.

The other OKRs are just making sure the other existing features get a lot of use, things like Service Desk, Kubernetes deployment, canary deployments, and so on. And the last one that's really been on my radar is getting Geo DR off the ground, getting customers using it, and working out all the kinks there.

Jumping into the actually cool stuff we've been working on: the frontend and UX teams have done a great job putting in the new navigation for 9.4. If you haven't seen it yet, it should show up pretty soon; it's on dev, and it is on staging right now. Once this RC goes live, either today or tomorrow, you should be able to click on the upper right side of the screen to turn on the new nav. You click there, and you do have to change your preference, but then you get this new navigation. So it looks cool.
It's a significant improvement over what we had, and it answers a lot of the criticism we had about the two-level navigation. So great work by everybody involved in getting this going.

On the platform side, I think we mentioned the improved audit events that showed up in 9.3. This is actual data from GitLab.com: you can now filter by groups, which you couldn't before, which is great. So if you have specific groups you want to monitor, you can go to the admin panel and do that. Before, you just had one giant list, and it wasn't necessarily that helpful.

On the Discussion side, this is launching in 9.4; I just played with it yesterday. You can add related issues to an issue. If you've got one issue that ties to other ones, you can add it there and it shows up. It's a pretty cool feature, and it's our goal to match the kind of features people want in an issue tracker like JIRA. It's finally landed; I think it got delayed from 9.3, so it's pretty exciting.

Postgres HA: the build team is working on it and doing some great work there. I know Marin mentioned a bit about this, and I wanted to draw the diagram of what is actually going on with Postgres HA. Correct me if I'm wrong, Ian or Jason or Marin, but this is the picture that I see. You've got GitLab running on one machine, talking to a database proxy called PgBouncer, so if a database goes down, PgBouncer will direct traffic to either the master or the secondary. On each of the database nodes we have this thing called repmgr, which is essentially monitoring the health of the database, and it will take action if something goes down.
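As a rough sketch of the PgBouncer piece of that picture, a minimal configuration could look like the fragment below. Hostnames and paths here are assumptions for illustration, not our actual setup; the key idea is that the application only ever talks to PgBouncer, and a failover tool such as repmgr (or a hook it runs) would rewrite the target host and reload:

```ini
; /etc/pgbouncer/pgbouncer.ini (illustrative sketch, hostnames assumed)

[databases]
; The application connects here; on failover, this host line is what
; gets rewritten to point at the promoted secondary.
gitlabhq_production = host=db-master.internal port=5432

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode   = transaction
auth_type   = md5
auth_file   = /etc/pgbouncer/userlist.txt
```

That rewrite-and-reload step is exactly the open question mentioned below about whether repmgr gets access to the PgBouncer machine.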
As Marin mentioned in his update, we shipped PgBouncer in 9.2, we shipped the actual repmgr in 9.3, and the config to use repmgr is now going to be in 9.4. I think Ian and the team are still figuring out what happens on an actual failover: do we give repmgr access to the PgBouncer machine to alter the config? That's TBD, but that's basically what's happening with Postgres HA. Hopefully we'll start using this on GitLab.com to have real database failover, and once we have that, it will enable other customers to do the same.

On the Edge team, there's this performance bar I mentioned in my last update, and I want to showcase it because it's a really cool feature that will help our support team, our development team, and our customers as well. Essentially, we shipped this in 9.4. The challenge was that we'd identified a number of security issues: if you gave everybody access to it, they could figure out things like the existence of projects. So Rémy did a lot of great work to make it possible to say, okay, I want this group to have access to this feature. We've enabled it on staging, so you can play with it there by going to your account, logging in, and typing "p" and then "b"; that activates it. If you then go to a page and scroll down, you can see the performance bar and click around. If you click on one of the links, the "pg" link up here, it actually shows you the SQL queries that happened, sorted by longest query. Actually, it's not sorted; that seems wrong there, but I thought it was. You can click around and look at the view profiles, and it basically gives you a lot more data about what's slow. I think this will really help debug a lot of things, frontend and backend. If you notice a page is slow, you can activate this, click on the performance bar, and see what exactly is taking all the time.
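The group-scoped gating Rémy built can be illustrated with a minimal hand-rolled sketch. This is not GitLab's actual implementation, just the shape of the idea, with illustrative names: a feature is enabled only for the groups explicitly allowed to see it.

```ruby
require "set"

# Minimal sketch of per-group feature gating (illustrative, not the
# shipped implementation): a feature is visible only to allowed groups.
class FeatureFlags
  def initialize
    # Maps a feature name to the set of groups it is enabled for.
    @gates = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def enable_for_group(feature, group)
    @gates[feature].add(group)
  end

  def enabled?(feature, group)
    @gates[feature].include?(group)
  end
end

flags = FeatureFlags.new
flags.enable_for_group(:performance_bar, "gitlab-org")

flags.enabled?(:performance_bar, "gitlab-org")      # => true
flags.enabled?(:performance_bar, "some-other-group") # => false
```

Gating by group rather than per user is what makes it practical to roll the bar out to internal teams without exposing project-existence information to everyone.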
Yes, it is activated on dev already; I took the liberty of doing that.

On Geo, I've been working on this with the team, Douglas and Gabriel. The first order of business: we decided to load up an instance with 1.5 million projects to see where everything would fall over. The first thing that happened was that Postgres fell out of sync, even though we were connected. This is a solved problem: Postgres has a thing called replication slots, which basically allow you to keep as much data on the primary as the secondary needs. If the secondary gets disconnected for a week, the primary will keep as much data as necessary so that when the secondary does come up, it can automatically resync. We've added support for that in 9.4.

We also ran into all sorts of read-only issues, because the secondary has a copy of the primary database, but a read-only copy. We have a lot of code in our EE system that tries to update the database when it really can't, so we've addressed a number of those issues. There are still a few remaining, but the big ones, where background workers were trying to do stuff they shouldn't, have been fixed.

And the third thing we saw was that repository syncs were just too slow. If you've got 1.5 million projects, you really need to parallelize as much as possible, and the first iteration just synchronized one at a time. In 9.4 we're synchronizing multiple repositories at once, so you don't have that problem. We still need to retest this, but it's an improvement over what we had before.

We've done a lot in Geo this past month. We have this event log that tracks metadata about when people delete projects or rename projects, which we need to be able to reflect on the secondary. So we've added a lot of information to the event log that isn't in the audit log, for example what the project was before and what it was after, so we can actually respond to those events. Gabriel implemented the first iteration of the Geo Log Cursor.
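The replication-slot mechanism mentioned above boils down to a couple of commands; this is a generic Postgres sketch with an assumed slot name and hostnames, not our exact setup:

```sql
-- On the primary: create a physical replication slot for the Geo secondary.
-- The primary will now retain WAL segments until this slot has consumed them,
-- so a secondary that disconnects for a week can still catch up.
SELECT * FROM pg_create_physical_replication_slot('geo_secondary');

-- On the secondary (recovery.conf in the 9.x era), point the standby at the slot:
--   standby_mode      = 'on'
--   primary_conninfo  = 'host=primary.db.internal user=gitlab_replicator'
--   primary_slot_name = 'geo_secondary'

-- On the primary: monitor slots and how far back WAL is being retained.
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

The trade-off is disk usage: a slot whose consumer disappears forever will retain WAL indefinitely, so orphaned slots need to be dropped.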
It basically looks at this event log, goes through it one by one, and tries to work out which repositories need to be synchronized, which files need to be downloaded, and so forth. Again, as I mentioned, Douglas did some work on improving the performance of the repository synchronization, and we've added support for replication slots.

The big question we've always had is how we maintain the consistency of the authorized_keys file. That's the thing that actually manages SSH access to Git. What we've done on GitLab.com is a database lookup. Now, does that work for our enterprise customers running CentOS? The challenge is that they need a custom version of the OpenSSH daemon to do this. So we've tested this, and we've written instructions on how you actually build your own version. We may consider shipping our own package to make it easier for customers, but the first iteration is just documenting how you do it if you want to do it on CentOS.

And the last thing, of course: a number of customers are trying out Geo already, so we're constantly talking to these customers, or prospective customers, about what they're running into, and getting feedback right away.
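The database-lookup approach above replaces the flat authorized_keys file with an sshd hook that queries GitLab per connection. The sshd_config side looks roughly like this (paths assume an Omnibus install; the need for a custom OpenSSH build on CentOS comes from stock OpenSSH there predating support for these options):

```
# /etc/ssh/sshd_config (sketch; paths assume an Omnibus install)
# Instead of scanning a huge authorized_keys file, sshd asks gitlab-shell,
# which looks the key up in the database. %u is the user, %k the key.
AuthorizedKeysCommand /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-authorized-keys-check git %u %k
AuthorizedKeysCommandUser git
```

For Geo this matters because the secondary no longer has to keep a multi-gigabyte authorized_keys file byte-for-byte in sync; it just needs a synced database.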
So it's been a good feedback loop this past month.

Concerns, and where we need help. This is a great graph that Mike Bartlett put together to analyze how many of our issues are related to new things and how many have to do with regressions. The blue line is the number of issues we have for each release, the red line is the number of issues related to regressions, and the yellow line is the percentage. You can definitely see there's an uptick: from about 10 to 15 percent around 8.17 to closer to 30 to 35 percent. We're addressing this by trying to get feedback earlier, doing better reviews, having a better canary deployment process, having people look at staging earlier, and so forth.

Other concerns: more customers are actually hitting performance issues. I'm seeing a pattern in the last couple of weeks where customers are doing stuff and saying it's slow, and one common theme seems to be MySQL. Whether that's the only problem isn't really clear, but it's definitely one main source of pain, and we're addressing it; I'll talk about that in a little bit.

And the support team is getting backlogged again, because we have a lot more tickets. We lost two members, we're getting more inquiries from prospective customers, and the tickets are more in-depth. They're not "I can't log in"; it's "my Postgres database isn't working" or "this setup isn't working". So they're becoming trickier, and they require jumping on a call and doing some really tough debugging.

Plans for the next five weeks: as I mentioned, MySQL is becoming a bigger problem, and Jen Shan has been doing some great work to figure out how we can provide a really smooth MySQL-to-Postgres migration process. There's a tool out there called pgloader.
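For context, pgloader is driven by a small command file; a generic migration sketch looks like the fragment below. Hostnames, credentials, and the exact WITH options are assumptions for illustration (a real GitLab migration needs the Rails-created schema handled carefully), not a recommended recipe:

```
-- migration.load (illustrative pgloader command file; hosts/credentials assumed)
LOAD DATABASE
     FROM      mysql://gitlab@mysql-host/gitlabhq_production
     INTO postgresql://gitlab@pg-host/gitlabhq_production

WITH include drop, create tables, create indexes, reset sequences;
```

You'd then run it with `pgloader migration.load`. The hard part, as noted, isn't the data copy; it's making the source and target schemas consistent.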
He's been playing with pgloader, and there are promising results there. The main issue is getting the schemas consistent; that's the hard part about doing this migration.

The platform team is doing a lot of work to make that the default, so you can download it, install it, and use all the features, and then if you want to upgrade for a certain feature, it will be right there in the UI; you can purchase it from a web link. That makes the whole process of going from CE to EE much smoother.

Victor Lopez, Nick, and Valery have been working on actually starting the Elasticsearch indexing. It's started; they ran into a number of issues with sharding and balancing across different nodes, but it's happening, and they're going back and improving that.

And with Geo, one big thing we realized is that we really need to get the rename and deletion cases working well at scale. Right now we tie all the repository names to the actual directory name. So, for example, the gitlab-ce project is named gitlab-ce.git on disk. Now, if you rename that project from gitlab-ce to something else, you've also got to rename it on disk, and that can be a real problem if you think about it happening at scale. This is also a problem on GitLab.com; we see that sometimes the file system move fails. We can simplify this by just saying: look, the disk name can stay the same, but the project name can be renamed to your heart's content. So we're working on figuring out what the right scheme for that is, and how we're going to make sure we have a smooth migration process. I think the production team will be happy with this feature, and we'll be happy on the Geo team, because it will simplify our lives greatly.

There are a bunch of other events, like the rename and deletion cases, that we need to handle in the log cursor.
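One way to decouple the disk name from the project name, sketched below as an assumption rather than the scheme we've settled on, is to derive the on-disk path from the project's immutable ID. Renaming the project then never touches the file system:

```ruby
require "digest"

# Illustrative sketch (not the final design): derive a repository's on-disk
# path from its immutable project ID instead of its human-readable name,
# so renaming the project never requires moving files on disk.
def disk_path_for(project_id)
  hash = Digest::SHA2.hexdigest(project_id.to_s)
  # Shard into two levels of directories to keep any single directory small.
  "#{hash[0, 2]}/#{hash[2, 2]}/#{hash}.git"
end

# The path depends only on the ID, so a rename changes nothing on disk:
path_before = disk_path_for(42)
path_after  = disk_path_for(42)  # same project, new display name
path_before == path_after        # => true
```

This is exactly why both the production team and the Geo team would benefit: no more failed file system moves on rename, and the secondary's replication logic no longer has to chase directories around.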
We're working on that now.

And then for hiring: we're still hiring, and we've actually opened up a number of positions recently. The backend developer position is open, and support engineer positions in all regions are open. Please take a look, and if you know somebody who'd be a good fit, please have them apply, and let us know if it's somebody we should talk to right away.

Any questions?

Yes, the performance bar is at the bottom of the page. It was at the top of the page; I think we had CSS issues, so Rémy's looking at whether we can move it back to the top.

Do we plan on doing a blog post to let the community know about it? Yes, there's an entire blog post on that. Great.

Yeah, and thanks, Jim. There's a customer trying to use our API to create projects and create commits directly, and they ran into a number of performance issues there. So we've been doing a lot of work to optimize those. Those optimizations won't show up as UI performance, but they're important for customers, because they're trying to use GitLab in a different kind of way.

Great, if there are no other questions, thanks for everything, and I'll see you in the team call.