So welcome everyone to today's CI/CD team update. Let's maybe start with accomplishments. This is always the happiest part of these presentations. We shipped a few interesting features, but we also did not ship everything that we planned.

From the things that we improved: something that we tried with 9.1 as something new is CI/CD pipeline schedules. Pipeline schedules got a major overhaul, both from the backend and from the frontend perspective; it could basically be considered a new feature. With 9.1 we gave ourselves a chance to ship something fast, get it out, and possibly rework it later, but the truth is that we didn't break anything. We just improved this feature in a major way. We also shipped a few very important performance improvements that already have quite a significant benefit right now, but more will be visible once we finish everything else that we have planned on our agenda. And we continued our work on making GitLab as real-time as possible.

But we had plans for much more. We are always very ambitious in what we want to deliver. We want to move fast and we want to ship as much as possible, but unfortunately we didn't manage to ship the Direction features, which is the biggest lowlight for the CI/CD team in 9.2, because from our perspective Direction is equally or even more important than everything else we are working on; it is what pushes our product forward. This happened for various reasons. One of the reasons is simply the number of things that we had to fix in 9.2. If you look at the previous slide, we had 36 closed issues, out of which 28 were bugs and regressions, so a large part of the issues were things that we simply had to fix. It's not really true to say that every issue is of equal weight, but it shows a bit of the story of the complexity and the scope of what we are covering with our changes. We also had things that were kind of unexpected, like mixed people availability during the release. I actually had to take a few days off a few times; it was not really planned, but if you take away 30% of the CI/CD team's capacity for some time, it makes a crazy amount of difference.

But this is not the only lowlight. The other lowlight is that we are still facing CI outages. This is something that we are constantly working on; a little more about this part of the story later.

But maybe before that, let's talk a little about 9.3. Actually, sorry, it says 9.2 there, but it should be 9.3 at the top. We still have 51 issues open, out of which 20 are bugs; 60 are closed so far, and something like 8 of those are bugs. We are just crazy ambitious. We are working on so many new things. When I started building that list today, I didn't realize how many things we actually have in our pipeline for 9.3. It is a crazy amount of Direction features, with guest help from Dimitri, who is working on Code Climate. Thank you, Dimitri, for doing that. And we also have a crazy amount of new features that just build on top of what we have now and improve the small, tiny things of CI/CD that make it even more awesome. We are getting asked about a pipeline usage quota for users. We are getting asked for the CI environment URL, to make it easier to script. We are getting asked for aliases for .gitlab-ci.yml and for Docker integration. This is something like an eight- or ten-month-old feature request.
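[Editor's note: as an illustration of the kind of YAML aliases being requested for .gitlab-ci.yml, here is a minimal sketch. The job names, image, and scripts are made up, and it assumes anchor/alias support in the CI YAML parsing, which is exactly what the feature request is about.]

```yaml
# Hypothetical .gitlab-ci.yml sketch: common job settings defined once
# via a YAML anchor (&) and reused via aliases and merge keys (<<, *).
.job_template: &job_definition      # anchor holding the shared settings
  image: ruby:2.3
  before_script:
    - bundle install

test:unit:
  <<: *job_definition               # reuse the shared settings via alias
  script:
    - bundle exec rake test:unit

test:integration:
  <<: *job_definition
  script:
    - bundle exec rake test:integration
```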
We are also pushing the envelope on CI/CD security: we've introduced the concept of protected variables. And as always, we are also pushing the real-time stuff. We have pretty much finished making the environments list real-time, and we are working on making the job details page not require a refresh, which is something that works most of the time, but sometimes it does not, and it just gives you a bad impression of how the product is behaving. We are also pushing a lot in terms of scalability and performance. Unfortunately, object storage is basically a follow-up from 9.2, something that we didn't ship, but we plan to finish it now. And we are also pushing things that are requested by the support team.

A little more about CI on GitLab.com. I just mentioned in the lowlights that we had CI outages, and some time ago we had a very constructive discussion with Ernst, Pablo, and Stan about what we could do better to make it more production-ready. What we came up with is nominating a person within the team to be a single point of contact, a reliability specialist. Thomas is currently a developer who is a maintainer of GitLab Runner, but his work will change slightly to be more focused on the production readiness of GitLab CI, GitLab Runner, and GitLab. Everything that he will be doing will be focused on making sure that CI on GitLab.com is always working, and on building all the features that make it possible for us to scale significantly beyond what we see now, while keeping in mind the limitations of our product. This is actually something that has been happening for some time, but it's not really well described in written communication: Thomas is already doing a lot of work in production, and right now he will be focused completely on CI on GitLab.com, on making sure that it's always stable, that we can scale, and that it's also cost-efficient, which is something I know we could greatly improve.

If we continue this topic: CI on GitLab.com comes with a lot of responsibilities, but also with more rigid planning of what we want to deliver. For example, this is a mix of things that we are resolving or have already resolved, but also things that we are trying to resolve in the upcoming weeks and months. We have a step-by-step plan of what we want to achieve with our changes so that we can, at least to some extent, consider CI on GitLab.com to be in much better shape than it is now. This is crucial because we started offering shared runner minutes on GitLab.com. We are actually selling CI on GitLab.com right now as part of our bronze, silver, and gold plans, and people are asking us about features, about improvements, but also about stability, because they expect much more from us now than they did when the service was free. Thomas's work will basically be focused on making sure that we can fulfill this goal of CI production readiness, at least with the steps outlined today.

One of the changes that brings us closer to achieving this goal is pushing the envelope on monitoring, and we are doing that every week. It's really interesting because, for example, recently we had Zigarian contributing to GitLab monitoring and adding new graphs. We also have Thomas, who is of course working on that.
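[Editor's note: for context on the protected variables mentioned at the top of this update, the idea is that a variable marked as protected in the project's CI/CD settings is only exposed to jobs running on protected branches or tags. A minimal, hypothetical deploy job that would rely on such a variable might look like the sketch below; the variable name, script, and URL are illustrative, not taken from the call.]

```yaml
# Hypothetical deploy job relying on a protected variable.
# PRODUCTION_DEPLOY_TOKEN would be defined in the project's CI/CD variable
# settings and marked "protected", so it is only injected into jobs running
# on protected refs (for example master), not on arbitrary branches.
deploy:production:
  stage: deploy
  script:
    - ./deploy.sh --token "$PRODUCTION_DEPLOY_TOKEN"
  environment:
    name: production
    url: https://example.com        # environment URL, also mentioned above
  only:
    - master                        # typically a protected branch
```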
With better monitoring we can actually build better alerting, which allows us to react much faster to problems. Recently we have actually been more responsive to infrastructure outages than the infrastructure provider itself, because we simply know about them before they do, which is a very interesting outcome.

One thing that concerns me about our work is the constant balance between new features and the maintenance cost of the things that we are building. For example, I took a look at the CI technical debt: we had, maybe not a crazy amount, but 38 open and 34 closed issues labeled as technical debt. It still shows the amount of things that we have to improve before we can say that CI performance, CI scalability, and CI reliability are up to our expectations.

There is another concern, something that we started discussing when we began working on storing pipeline statuses in the database. This is one of those big architectural changes to how we store data today. Just because of the amount of data that we have, it is already quite complicated to make this architectural change without interruption and to make sure that everything continues to work on GitLab.com. But it also brings us to the question of how the world will look half a year or a year from now, when we have 10, 20, or 100 times more data than we have today. This is something that we are trying to figure out now, based on this quite complex topic of storing data and reconstructing it from what we have, so that we understand what the future looks like when we start migrating other sets of data that would greatly benefit system performance. Another case is the migration of artifacts and traces. This also has a significant impact on project deletion, something that has been asked about for some time, and we have some ideas about how to achieve it. This is part of a bigger story, because it's another problem that we have today that will only get bigger over time.

As for hiring, it's actually very interesting, because we are in the final phase of hiring one CI/CD engineer to the team. We are waiting for a resolution on how the contract should be done. It could take some time, but we have a few other people in the pipeline that we are constantly reviewing, and some of them are very interesting.

I think that's it from me. Does anyone have any questions? Okay, I am not hearing anything. Ben is asking what else we want to monitor. I believe everything. I think we are getting to the place where we will monitor everything, whatever the amount of data, and then try to figure out what is important in terms of alerting. Yes, Ben, I think we are heading to the point where we probably need a dedicated Prometheus server for CI. This is also something that is discussed in the steps for CI production readiness.

So the question is how much can Prometheus handle? Well, that depends. Of course, it depends on how big of a server you have. We've got reasonably good-sized servers right now, and of course I have to do two-factor to get into the Prometheus server to see the metrics. Where's my two-factor box?
Okay, maybe one thing to mention about what we want to monitor: we basically want to monitor every job run that is executed on the CI runners. So we are looking at scraping something like 1000 servers right now, and that will only grow over time. It's a problem today, but in about half a year from now it will probably be an order of magnitude more data.

Camille, why does Ben have to go for his two-factor? Aren't we trying to move everything to public servers?

I'm not sure if I understand the question. Shouldn't all of our monitoring Prometheus servers have open public views? I think not all of them are. We would like them to be close to public, but for most of that I believe Pablo would be better placed to answer the question.

As far as I know, we don't put the Prometheus servers directly on the internet publicly because there's potential for abuse, because we don't have a way to disable writable endpoints yet. We're working on that for the 2.0 release.

So you're logging into the Prometheus server and not the Grafana views of the Prometheus server. Okay, and that's being worked on so that we can do that in the future. Okay, makes sense.

It's also difficult on the Prometheus server side to limit how long and how extensive queries can be, which can basically cause a DDoS against the server.

Okay, so in the history of GitLab, a lot of talk has gone into DDoS, and never in the history of GitLab was there a malicious DDoS. I'm not saying there will never be one, but we'll take our chances. All the problems have been self-inflicted or shared incompetence.

Yeah, specifically I'm looking at that. So we have it set up so that we have redundancy between the public-facing Prometheus and the internal-facing Prometheus. So that's no problem, but there's no way to limit query time and such on a public Prometheus server. So it can get annoyingly dangerous, which is why we put Grafana in front of it, because it has a little better limitations.

Cool, yeah. The ultimate solution would have Prometheus outage problems when a database outage happened, because there would be too many people viewing our Grafana. Yeah, that can happen. Yeah, it would be great to have a public and a private one, but with exactly the same data on both of them, so that they're automatically mirrored. No, it'd be great to have a fully accessible public Prometheus.

But if we talk about how much stuff we're monitoring: we're doing about 20,000 samples per second into the current Prometheus server. And if we look at the memory usage... just from the top of my head, we've got about 300,000 metrics in the Prometheus server right now, which is actually not that much. But if we want to go and monitor all of the runner nodes, we want to get node exporter metrics from every single runner. If you're talking about a thousand runner servers, that's going to push the limit of the production server, and we should probably have a separate one for that, because we want to do no more than about a million metrics on a single Prometheus server, especially given that our cloud nodes are a little smaller than dedicated hardware nodes.

Ah, so CI means all the runners. Okay, that makes a lot of sense. That's cool. So actually, can we do one million metrics per second per server? We can probably do a million different time series at a rate of about 100,000 samples per second. Okay, so the base one can still grow by a factor of five? Yes.
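[Editor's note: to make the runner-fleet monitoring discussion concrete, here is a sketch of what a scrape configuration on a dedicated CI Prometheus server could look like, assuming node exporter on each runner host and a file-based target list for the roughly 1000 machines mentioned. The job name, file paths, and interval are illustrative assumptions, not the actual production configuration.]

```yaml
# Illustrative prometheus.yml fragment for a dedicated CI Prometheus server
# scraping node exporter on the runner fleet. Rough numbers from the call:
# ~20,000 samples/s and ~300,000 series today, with ~1,000,000 series as the
# practical per-server ceiling mentioned, hence a separate instance for CI.
global:
  scrape_interval: 60s        # coarser interval keeps the sample rate manageable

scrape_configs:
  - job_name: 'ci-runner-node-exporter'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/ci-runners-*.yml   # ~1000 runner hosts, maintained externally
```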
We started to run into problems when we had an excessive amount of metrics. The Azure VM that we're on, the VM class we're on, is medium-sized or medium-large-sized; I forget what they call the sizes these days. But I'd say we have about three to five times headroom right now.

Thanks for the presentation, Camille. I thought it was really, really interesting. I think we're seeing great progress in stability, and I'm super excited for all the features that will come out in the next release. Okay, thank you very much. See you next time. See you. Bye.