Okay. Hello, everybody. Let's go ahead and get started; it's the top of the hour. So hello and welcome to the infrastructure functional group update for September 2017.

A quick rehash of the OKRs for the infrastructure team: the goal of the infrastructure team is to ensure that GitLab.com is ready for mission-critical tasks. This includes availability of GitLab.com at 99%, 99% of user requests served in under one second, and completing the top 10 risk assessments.

Accomplishments. First up, production. The Geo testbed is up and running, and we are now actively working on Geo development, so that's great; this is the future of our redundancy, and it's off and going. Elasticsearch: we are slowly, very slowly, backfilling our Elasticsearch cluster to enable using it for searches on GitLab.com. We've also added feature toggles to dev.gitlab.com, which has removed the bottleneck in the GitLab testing process. cgroups are now in place around GitLab: Ilya did a great job of wrapping the GitLab services up inside Linux cgroups. It's a jail that means GitLab cannot take all of a file server's system resources, so it can no longer crash a file server. Good stuff. We are slowly moving artifacts into S3 object storage; object storage will give us redundancy and the ability to scale, and removes another single point of failure. We currently have a proof of concept that can build on-demand staging environments. And as we speak, we're rebuilding the front-end fleet and removing all nested mount points. Nested mount points can cause a lot of problems, particularly with LFS, so those are going away.

Next, database. In 10.0 we're finally getting rid of the old events setup. This saves us around 165 gigabytes of storage; it was very large. We initially estimated 140 gigabytes, so we did better than expected. There's also potentially a reduction in the amount of buffer cache used by the events tables, but we still need more deployments and measurements to verify that. The P99 of global SQL timings is closing in on that 200-millisecond barrier; during the week we're averaging 230 to 260 milliseconds. The setup on our primary database has been improved, thanks to the production team, so it now matches the secondaries, leading to much better I/O wait timings. Also, the high database load that before was taking production completely offline now only increases replication lag. We're finally using PgBouncer from Omnibus; there are still some manual tweaks, but there's an open issue to fix the last of these and remove them. Migrating all events to the new schema, we moved somewhere around 100 million rows. It was a background migration and it went very smoothly: it was estimated to take about seven days, but after the disk change we did it in under four.

Next, Gitaly. Gitaly-Ruby is working well and is helping the team perform the migrations to Gitaly at a faster rate than before, and there are currently 13 new migrations ready to be tested in GitLab 10.0. The team also built the Gitaly benchmark tool. This is designed to help test the cgroup functionality, but it will also help in the future with optimizations and stress testing. And Gitaly is now opt-out in the development environments, which means more developers will be using Gitaly by default: it's turned on by default, so people won't forget to turn it on.
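As a quick aside on the cgroups item above, here's a minimal sketch of the general technique: confining a process with the cgroups v1 memory and CPU controllers. The group name and limits are hypothetical, and the real setup on our file servers is more involved; this just shows the mechanism that keeps one service from taking the whole machine down.

```python
import os

# Hypothetical name and limits -- the values used on the file servers differ.
CGROUP_ROOT = "/sys/fs/cgroup"
GROUP = "gitlab-services"
MEMORY_LIMIT_BYTES = 8 * 1024 ** 3   # 8 GiB hard cap
CPU_QUOTA_US = 200_000               # ~2 CPUs of each 100 ms CFS period

def confine(pid: int) -> None:
    """Place `pid` into a cgroup (v1) with memory and CPU caps (needs root)."""
    mem_dir = os.path.join(CGROUP_ROOT, "memory", GROUP)
    cpu_dir = os.path.join(CGROUP_ROOT, "cpu", GROUP)
    for path in (mem_dir, cpu_dir):
        os.makedirs(path, exist_ok=True)

    # The process can no longer take all of the host's memory...
    with open(os.path.join(mem_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(MEMORY_LIMIT_BYTES))
    # ...or all of its CPU time.
    with open(os.path.join(cpu_dir, "cpu.cfs_quota_us"), "w") as f:
        f.write(str(CPU_QUOTA_US))

    # Finally, move the process into both controllers.
    for path in (mem_dir, cpu_dir):
        with open(os.path.join(path, "cgroup.procs"), "w") as f:
            f.write(str(pid))
```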
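Similarly, the artifacts move is conceptually simple: copy each file into the object store, verify it landed, and only then treat it as migrated. A rough sketch using boto3, with made-up bucket and key names; this is only the idea, not how the production migration actually runs.

```python
import boto3

s3 = boto3.client("s3")  # credentials and region come from the environment

def migrate_artifact(local_path: str, bucket: str, key: str) -> None:
    """Copy a CI artifact from local/NFS storage into S3 object storage.

    Bucket and key names are hypothetical; deleting the NFS copy would
    only happen after the object is confirmed readable.
    """
    s3.upload_file(local_path, bucket, key)
    s3.head_object(Bucket=bucket, Key=key)  # raises if the upload is missing

# Example (hypothetical paths):
# migrate_artifact("/var/opt/gitlab/artifacts/1234/artifacts.zip",
#                  "gitlab-com-artifacts", "1234/artifacts.zip")
```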
Next, N+1 detection. There's some badly performing code hitting Rugged and Gitaly inside a loop, and that's the primary Git performance problem we have on GitLab. There's a merge request now that will cause the build to break for most new N+1 problems that are accidentally introduced into the code base, so we're being a lot more proactive about stopping new N+1s from ever getting into the code base. And there's a new acceptance testing template that's going to speed up the testing process: it auto-generates the template with custom links directly to the right searches in Kibana and our Grafana dashboards, and there's a good example there.

On to security. Security vulnerability scanning improvements: these include a new VM for it, custom scan rules, the addition of all cloud providers to the scans (Azure, AWS, and DigitalOcean), and a division between GitLab.com web scans and the network scans of other assets, which allows scans to happen more frequently. There's also a custom report of the summaries that's sent to Slack and email. Application auditing improvements and external logging: we're now logging successful and failed logins, account lockout events, new account creation, email confirmation, admin logins, and login failures with one-time passwords and U2F, so there's a lot more visibility into what's happening. And we've performed security audits on both AWS and Azure.

Now we have to go to concerns. The inability to test Chef cookbook changes before they go into production has unfortunately impacted users; that, along with changes to load balancing and PgBouncer in the production database environment, led to an outage. There are also load-induced outages: we've had high CPU and load on the back-end NFS servers causing a lot of pain, and we found projects running 400 builds on one of our first Git hotspots. We're also having outages from our provider: we've had host routing problems, we've had the database load balancer just stop working, and at least twice that I know of, NFS servers have rebooted unexpectedly. Another concern is that Geo is only capable of syncing about seven gigabytes of data per day right now. We have 107 terabytes, which roughly breaks down to 40 years to move everything, so we have a lot of room for improvement there.

Database concerns. While we're getting close to that 200-millisecond barrier, we are not going to break it, so we'll miss the OKR of all global P99 SQL timings being under 200 milliseconds. Like I said, we're at around 240; we're approaching it, but we're not fully there yet. The GPG keys feature had some database scaling issues very early on, and we're wondering if we could have caught that a little better in the review process; I think in the plans you'll see we are working towards that. And a big need-help: we just need more people actively working on P1, P2, and P3 issues for the database.

Gitaly concerns. We recently discovered N+1 problems in GitLab CE: some of these routes will query Rugged plus NFS, or Gitaly, hundreds or even thousands of times in a single HTTP request. Thursday morning, I believe we were up to about 5,000 requests to Gitaly. To help with these, we need to work on the nightly staging environment so the Gitaly team can debug and move forward faster. It's all in the plans; everything's in the plans.

Security concerns. CI abuse trends: we're seeing a lot of smash-and-grab abuse of CI. We don't really want to go into depth in a public forum about what's happening and give too many people ideas, but it is really impacting CI availability, so we're going to need to require quotas on the number of running jobs as well as the existing minutes quota that's there. And the logging infrastructure is still too fragile; work has been done and we're moving forward a lot on it, but we've still got a way to go.
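To illustrate the running-jobs quota just mentioned, here's a small sketch of the kind of check we mean. The limit value, field names, and enforcement point are all made up for illustration; the real change would live in GitLab CI's scheduling code.

```python
from dataclasses import dataclass

# Hypothetical cap -- the real number hasn't been decided.
MAX_RUNNING_JOBS = 25

@dataclass
class Namespace:
    running_jobs: int    # jobs currently executing for this namespace
    minutes_used: int    # CI minutes consumed this period
    minutes_quota: int   # 0 means unlimited

def may_start_job(ns: Namespace) -> bool:
    """Gate new CI jobs on a concurrency cap as well as the minutes quota."""
    if ns.minutes_quota and ns.minutes_used >= ns.minutes_quota:
        return False   # existing minutes quota, already enforced today
    if ns.running_jobs >= MAX_RUNNING_JOBS:
        return False   # new cap on concurrently running jobs
    return True
```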
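And going back to the N+1 build-break from the accomplishments: the real check is the merge request mentioned earlier, but the idea is roughly the following sketch (hypothetical names, Python only for illustration). Count the backend calls made while a code path runs and fail the build when the count blows past a sane limit, which is exactly what an accidental call inside a loop does.

```python
from contextlib import contextmanager

class CallCounter:
    """Stand-in for instrumentation that counts Gitaly/Rugged calls."""
    def __init__(self) -> None:
        self.count = 0

    def record(self) -> None:
        self.count += 1

@contextmanager
def fail_on_n_plus_one(counter: CallCounter, limit: int = 30):
    """Fail the enclosing test if too many backend calls happen.

    The limit of 30 is hypothetical; an N+1 pattern makes one call per
    item in a loop and blows straight past any reasonable threshold.
    """
    before = counter.count
    yield
    calls = counter.count - before
    assert calls <= limit, f"made {calls} backend calls, limit is {limit}"

# Usage in a test, assuming the client is instrumented to call record():
# with fail_on_n_plus_one(client.counter):
#     render_project_page(project)
```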
Plans. So what are we going to do to address these? First, we're completely revamping staging. We're going to make it production quality with full SLAs, and we're going to call it pre-prod; it's going to be an exact mirror of production. We're also going to have new staging environments with fully sanitized production databases, so developers can log in and run queries against a large database without requiring any help from production infrastructure; developers will have full access. And we plan to bring GitLab QA into the development process. This is an end-to-end testing tool, so in pre-prod we'll be able to test any deploy end to end and gain a lot more confidence that a deploy will be successful. Of course, it won't be perfect to start, but as we go on we'll learn better tests to write for it and improve it.

For Gitaly, Git fetches and clones already go through Gitaly. However, in the coming days and weeks, we're going to be transitioning HTTP and SSH Git pushes to go through Gitaly as well. And, as always with Gitaly, one or more migrations are needed.

The security team is looking into a paid bug bounty program; I think it starts with a small invite-only group and grows from there. They're also working on an incident response policy: the creation of a comprehensive incident response framework that will put all the existing policies and procedures under one umbrella. And there's a return to content security policies: a full implementation of the Content Security Policy, both on GitLab.com and inside the GitLab application.

Last but not least, the team needs to grow; we're adding people all the time. These are the current open positions, but if you go to about.gitlab.com/jobs, you'll see all the openings we have.

And thank you. I'll open it up to whatever questions we have in the chat. I have to, of course, figure out how to stop sharing my screen first. Oh, look at the chat. Feature toggling, that's great, great for Gitaly, but what about using feature toggles elsewhere? Right now we're using feature toggles mostly for infrastructure-level items; we're not going to put everything behind feature flags and toggles, but I guess that's up to the individual teams. Okay, well, I'm not seeing any more questions. Oh, that was fast; I guess everybody gets some time back. So thank you, and see you in the team call. Thank you, everybody.