All right, let's get started. Hello, everybody. This is September's engineering update, and I am Stan Hu. I'm going to start by welcoming a number of new members. I think I might have missed Greg Stark last time; he wasn't on the team page, but now he is. I also want to welcome Mikkel (I think I misspelled your name there, sorry about that, I'll fix it later), Daven Walker for support, and Alex Straighten for support as well. So welcome aboard. We really need your help.

GitLab 10.0: I want to highlight this. I read the blog post, and there's too much to cover. This is the draft post; it's not actually published yet, but I encourage everybody to read it. The big highlights from 10.0 are going to be Auto DevOps and the new navigation, but there's a ton of other stuff in there that I'm really excited about, like S3 support for LFS, that will really help a lot of our customers and GitLab.com.

I just want to highlight some great work that was done behind the scenes and isn't necessarily called out in the release post. I know that Fatih started this with the help of Jacob Schatz, Phil, and Filipa, and Torrey Sean Dow would review all this stuff. Basically, they took our existing issues and issue discussions and refactored them in Vue.js. That had two benefits. One, it decreased the technical debt, because it's much easier to test and debug issues with code written in Vue.js. But it also made things faster, because instead of loading everything at once, we are now loading the page and then pulling in other things asynchronously. So in some ways it's not exactly a direct performance improvement, because we're moving stuff outside of the initial load, but at least the user perceives the initial load as faster. And if you look at the issue there, there's a bunch of other related issues that say how we can make this go faster.
So I'm hoping by 10.0 we can get this number down. Originally it was over 2.8 seconds; now it's under two seconds, at 1.8. I think we can get it down to under 1.5 or even faster by this Friday, or rather this Thursday, when the release goes out. So I'm super excited about those improvements. Again, it's not necessarily highlighted in a release, but it's something that we should be proud of, and it shows that refactoring also pays dividends. It is the beginning of the performance improvement; there is a lot more there.

I wanted to talk a lot about Geo, because that has been on my mind, and what's going on in 10.0. I've talked a lot in the past about the previous architecture with system hooks: basically, every time you do something like a push, a hook goes out to the secondary and then it does something. We've removed that completely. We've removed a lot of code as a result. If you look at the numbers there, there are 63 changed files, but close to 1500 lines of code deleted. Oftentimes you measure progress by the code you delete, not by the code you add. The new architecture has this Geo log cursor. We've documented that and added it to the documentation. Thanks to Toon for removing system hooks and to Gabriel for adding the diagrams and documentation around it.

What that means for most customers is that you now have to move your Geo installation so that SSH key lookups happen over the database. Traditionally that's been done with the authorized_keys file, which you have to manage, and it's really hard to manage. Big customers are having trouble keeping it consistent. But we've made a lot of progress in figuring out what the story is for CentOS users. CentOS 7.4 has a good story, because the new upgrade in September makes this possible and makes it really easy to enable out of the box.

Something Gabriel's been working on for a while.
This is the hashed storage support. You can look at my last update to see the details, but there is now a setting in the admin page to enable this. It's been behind something like a feature flag, because we're still testing it out and making sure that there aren't any corner cases. Essentially, the idea is that we don't have to touch the path of the repository once it's there. It's an immutable name; we never have to rename it. That will help with Geo and help with GitLab.com quite a bit, but we're still making sure that it's working before we enable it for everybody.

There were a bunch of security fixes. Nick came in and immediately saw some potential security holes and patched them right away, so those went out with 9.5 and also with 10.0. And then we fixed some other customer issues. For example, a customer had an issue cloning LFS objects from the secondary, and we fixed that pretty quickly in 9.5.

The next slide really talks about what it takes to make Geo production ready. This is the test bed that we have set up right now. If you hear that word "test bed" a lot, this is basically what it is. We have a production instance in Azure. It has a sanitized copy of GitLab.com's database. It also has one file server from GitLab.com. There are 16 in total, I believe, so we've only taken a snapshot of one of them. And then we have another Geo instance in a different cloud.
It's running a different database, it's the secondary of the primary, and it also has a separate file server. So we're essentially trying to test GitLab.com, or at least some subset of it, and see how this performs.

I'll give you a little update on where we are right now. What we found immediately was that large databases need some optimizations, and Nick added some great optimizations that at least got us out of the rut we were in before, where we weren't syncing: we were basically retrying the same projects and failing because the query was too big. We've added a workaround to at least make that go. In 10.0, we've identified that we want to use Postgres foreign data wrappers, because we have a copy of the primary database on the secondary and also a tracking database on the secondary to track what we've downloaded and what we haven't. Foreign data wrappers allow us to tie these two databases together in a nice way and make queries that are less expensive.

We're also testing a Geo setup that's bigger than any customer we have. I've been on calls almost every day last week with customers setting up Geo. They have far fewer repositories, and Geo seems to work for them right now. We've worked out some issues and identified others, but at least we're progressing and getting customers using it.

For GitLab.com, repositories are still syncing way too slowly. They're going, but they're way too slow right now. You can see in the chart below that we started this last week and it's only up to 0.06 percent. Part of it is that we are too conservative about how we're scheduling things, so in 10.0 we're going to focus on increasing the parallelism and reducing the scheduler delay. Basically, the scheduler delay governs when we decide to say, "Hey, go clone a repository from that primary," and we're just too conservative right now.
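
To make the effect of those two knobs concrete, here is a back-of-the-envelope sketch. It is not how the Geo scheduler is implemented; it assumes a simplified model in which the scheduler works in rounds, waits a fixed delay before each round, and then clones up to a fixed number of repositories in parallel. The function name, parameters, and example numbers are all hypothetical.

```python
import math


def estimate_backfill_hours(total_repos, avg_clone_seconds,
                            max_parallel, scheduler_delay_seconds):
    """Rough hours needed to sync every repository under a toy model.

    Assumed model (not the real scheduler): work happens in rounds.
    Each round waits scheduler_delay_seconds, then clones up to
    max_parallel repositories at once, each taking avg_clone_seconds.
    """
    if total_repos < 0:
        raise ValueError("total_repos must be non-negative")
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    if avg_clone_seconds < 0 or scheduler_delay_seconds < 0:
        raise ValueError("durations must be non-negative")

    # Number of rounds needed to cover every repository.
    rounds = math.ceil(total_repos / max_parallel)
    # Each round costs the scheduling delay plus one clone duration.
    seconds_per_round = scheduler_delay_seconds + avg_clone_seconds
    return rounds * seconds_per_round / 3600


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the test bed.
    conservative = estimate_backfill_hours(100_000, 5, 1, 60)
    aggressive = estimate_backfill_hours(100_000, 5, 25, 5)
    print(f"conservative: {conservative:.1f} h")
    print(f"aggressive:   {aggressive:.1f} h")
```

Under this model the total time scales with the delay plus the clone time, divided by the parallelism, which is why raising parallelism and cutting the delay are the two levers mentioned above.
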
I wanted to highlight the work that we've done over the last releases and see how much of our time is being spent on bugs from previous releases. These are called regressions: bugs that surfaced in the current release, things that worked in the previous release and no longer work in the current one. The blue line is the number of closed issues, the red line is the total closed issues for that release, and the orange line is the percentage of the issues relating to that. You can see that in 9.0 about 20% of it was regressions. It crept up to 40%; in 9.4 it came down to 20%, but in 9.5 it's closer to 40%. So it's not great to keep introducing new things when we're spending a lot of time trying to fix customer-related problems or things that surface. We're going to have to figure out ways to address this so that we're spending less time fixing things that we broke and getting ahead of that. To be fair, some of the regressions are things that we saw in release candidates, so they didn't actually get out to customers. But in general, our testing should have caught that before it even went out.

Concerns, things that we need help with: Unicorn metrics. Pavel has spent a lot of time putting metrics into Unicorn so that we can instrument things like how busy our HTTP workers are. But we haven't been able to enable them because there are some concurrency issues. They're really tricky, and Pavel's been sick and unfortunately hasn't had the bandwidth to address that. Once we get that in, it will enable a whole new level of monitoring within our application that we really need.

We're seeing a lot more performance issues, not only on GitLab.com, but also in customer-related issues.
I think there must be at least a number of tickets every week that say, "Hey, this thing is loading slowly, this thing is timing out," and support engineers have had to get on the phone and benchmark and profile and find things that are clearly just not optimal. So keep in mind that performance optimizations we do for GitLab.com benefit our customers as well, because they're also running into these things. If we solve it for us, we'll also solve it for them.

We're still getting a lot of error 500s, things that shouldn't happen. We should be more graceful about failing. For example, if we don't have a commit, we shouldn't totally bomb out when loading a merge request; we should at least load some stuff. What this really translates into is increased load on the support team, because either it's a GitLab.com user or it's a customer who says, "Why is my page not loading?" I think we've pinged a lot of people on how we improve these things, but essentially it boils down to making things more fault tolerant, so that if something does go wrong, we at least handle it, gracefully capture it, and provide more feedback to our developers about what exactly went wrong.

Again, as on the last slide, I really want to highlight the need for integration testing.
We haven't put nearly enough time into this. For example, LDAP logins and Google auth have at some point this past release been broken for people, and that's really unacceptable, because that is fundamental: to get into GitLab you need to be able to log in. We have unit tests, but in the end things have to be tested as a whole. There's a great project called GitLab QA, if you haven't looked at it. It's a great framework for adding these integration tests. We need to put more time and effort into those things so we reduce the regressions, reduce the amount of time we're spending on fixing things, and spend more time on the cool stuff that we want to build.

Now the plan for the next five weeks. For Geo, obviously, I've mentioned that we want to significantly increase the performance. The goal there is to make it ready for production for GitLab.com and to migrate to another cloud if we want to. Gabriel has been working a lot on finishing a tool that will help us migrate to this hashed storage format. It's still in review. Sid mentioned this NFS circuit breaker that will enable availability on GitLab.com. We're still improving that, and there are still some cases we need to handle so that it doesn't get too trigger happy. We need to reduce the number of false positives there and have the Unicorn metrics to monitor that as well.

We're going into Q4, so we need help defining the OKRs for all of engineering. I think a lot of them are going to follow on from Q3, but we need to flesh those out.

Hiring: we're really going to increase hiring across engineering. You can look at the jobs post, and there are a lot of different positions available today. I'll talk about that in the next slide, but if you know people who are a good fit, please reach out to them. We need their help.

GitLab 10.1.
You can look at the kickoff documentation for more details. And then the summit: I'm looking forward to meeting everybody and seeing everyone there. That will now coincide with the release of 10.1, so it will be exciting to be in the same place when the release actually goes out. I think last time, in Austin, it went out the Saturday following the summit, which is not ideal because people had to travel and get back.

Hiring: as I said, there are a lot of positions open, anywhere from [unclear] to frontend and backend developers. And yes, there is a referral bonus. Thanks for that reminder, Sasha.

Any questions? All right, I don't see any questions. So thanks, everybody, for your time.