Hi, everyone. It's 5 p.m. for me, so we should start. You should be able to hear me. Welcome, everyone, to the Functional Group Update of the CI/CD team. You can find the presentation link in the invitation, and if you are interested in what we talked about last time, there is also a link in this presentation to the previous one.

Let's start with the current release, so the accomplishments. This is very important, because the last release was pretty tough. We all had an amazing summit, where we had an opportunity to meet each other, spend time together, and have a really fun time, but it also made it very hard to deliver the things that we planned. Still, as part of 10.2 we closed more issues than we did in 10.1, which is basically impressive: we managed to do more, and we also took on a big undertaking with the Kubernetes story and the GKE integration. We quickly figured out that the architecture we shipped in 10.1 for GKE and Kubernetes was not really the best one we could envision for future scalability, so we decided we wanted to change that, quickly improve it, and be done with it. It turned out to be a very big task that everything else in the Kubernetes story kind of depends on, but we did manage to do it basically just in time. We merged everything, most of it in the last two days, which was crazy. We are done with that, we can iterate on top of it, and it was very impressive how everyone in the team worked together on getting this done, helping each other solve the issues, fixing tests, and making sure that we could merge it in time.

But this is not the only thing; there are other things that we've been working on and that we are continuing to work on, for example today.
We merged an update to the GitLab Runner issue triage policy; Alessio can probably give an update in chat on when it's going to be enforced. We'll be more aggressive about closing GitLab Runner issues if there is no response in time, because right now the GitLab Runner issue tracker is growing and we need a better way to control the amount of information stored there.

That was more about the engineering side of the accomplishments, but there are also accomplishments about the team itself and the way we present ourselves and talk about things. Mattia, the guy who actually did the undertaking of making the Web IDE happen at the summit: we invited Mattia to start helping us with the CI/CD deliverables, and so far he is doing really great helping us with the Kubernetes direction deliverable of this release. Also, we got back to having the CI/CD team retrospective every month. We will continue having that, probably somewhere around one week before the company-wide retrospective, because it allows us to iterate more easily and much better on how we are performing as a team and how we could improve our teamwork.

The next thing you could consider a kind of promotion, and it's very important: Alessio actually became a maintainer of GitLab Runner. He's focusing most of his time on support issues, helping customers and our users, specifically around GitLab Runner and everything connected with that part of the CI product. This actually allows us to move Tomasz to be more focused on the reliability side. Tomasz was previously the maintainer of GitLab Runner; he became a reviewer of GitLab Runner, so he's still actively helping with everything we work on there, but his full focus right now is on the production readiness of CI. Previously he was focused only on the Runner part of CI; currently his focus is basically full-stack.
So he covers it end-to-end: infrastructure, monitoring, security, the Runner and its communication. He also helps handle abuses and is basically building all the tooling and filling the gaps in the product from the engineering side that would help us better manage these kinds of things. There is also the scalability: we know how important CI is for GitLab.com and for GitLab in general, so there has to be a person basically constantly dedicated to this story. And also the GitLab Runner manager on Kubernetes: switching to Kubernetes is the big story for everyone in the company right now, and we follow this train of everyone doing that; we also want to be ahead of the curve in this case.

The last thing: we actually had a UGC session during the summit about CI scale. You could consider it more like me complaining about the state of CI, probably something like that, but if you click this link you will find my preliminary notes that I prepared ahead of this session, where we actually covered about one-sixth of the content. You can get a glimpse of what we could also talk about regarding CI scale at GitLab.com in the future, find links to everything that we discussed during the UGC session, and go through the graphs and materials listed there.

As always, a lot of things go right, but some things also go wrong.
So, I did mention this huge undertaking of the refactor. We did manage, but it put a lot of pressure and a lot of communication challenges on everyone in the team. It was worth doing, but it also cost a lot from the people working on this story, and it impacted the parts we were planning to ship. One important part of the Kubernetes integration to conclude was moving Kubernetes from the service page to the cluster page. Due to this refactoring, we decided to split this story, ship a smaller iteration, a smaller deliverable, for 10.2, and basically postpone the rest of this work to continue and finish it in the next release. It's like 90% done, and we will manage it, but this is definitely a lowlight.

Alessio, who jumped on the GitLab Runner train, also started improving the GitLab Runner code base around some concurrency challenges that we have: we started testing our Runner with the race detector and detected 21 data races that we have to fix, just to make sure the GitLab Runner code base is concurrency-safe.
We are still continuing to work on object storage. This is a very big task, and it's very important for the whole GCP migration. It seems that we fixed the last outstanding bug that prevented us from migrating all artifacts to object storage, so we are basically continuing to migrate the rest of the data that is left. We're figuring out what to do with traces, and migrating traces too. Also, we lent Grzegorz to the Geo team to help with GitLab QA, because GitLab QA is the best way to test the Geo end-to-end story, and Grzegorz was the most knowledgeable person about QA, since he was the first person introducing it at GitLab.

As for 10.3, we are continuing some stories that we started. Probably the biggest one this release is adding support for multiple Kubernetes clusters per project. You could, for example, delegate one of your clusters to be used only for your production apps, and another cluster to be used for the review apps. This was one of the key deliverables for the company-wide tasks announced by the CEO some time ago. We are also helping with Static Application Security Testing: this is already merged, so we are extending the CI/CD scope with automated SAST.
This is with big help from Dimitri, who is doing all the backend work, and Philippa, who is doing the frontend. We are also working on building some needed APIs for better handling of crypto miners, improving performance, solving a lot of Runner issues, and of course continuing to work on object storage. We are working on object storage because it is part of the bigger undertaking of the GCP migration, for which we are starting to get a picture of what has to be done, and object storage is one of the key factors that would allow us to move from a stateful application to a more stateless one. One of these tasks is definitely object storage; the second one, probably somewhere around the same size, is decoupling GitLab Pages from GitLab, allowing us to easily migrate to GCP without being blocked by having to migrate GitLab Pages too.

We are also working on the GKE integration and will definitely continue that. If you look at the CI/CD planning, our original plan was, and still is, to ship group-level clusters next release. It's still to be decided whether this is something we will be doing, but it's a major candidate for 10.4 at least. 10.4 is also about moving GKE out of the beta phase into the GA phase, making it something that we commit to support in that specific form for a longer period, because right now we are still making a lot of changes and a lot of improvements; this is still not the production quality that we could think of.

That also touches our OKRs: we are working hard on trying to hire more developers, seniors and intermediates. The current plan is to hire four more developers, basically doubling the CI/CD team capacity, which has become a story of its own.

On GitLab.com: Bitcoin miners. This is probably a never-ending story.
This is something we have been constantly fighting for probably six months now. For example, recently we were hit by the problem of growing pending jobs, exactly the thing we discussed during the UGC session: in cases where Bitcoin miners scale up very rapidly, to something like 7k, 10k, 20k pending jobs, we are having scalability problems. This definitely has to be solved, and we are using two approaches: one is being more aggressive in discovering the miners and killing them, and the second is finally implementing proper queuing for serving CI jobs, something we have been discussing for a crazy amount of time but that is getting more and more important as CI at GitLab.com grows.

Also, we recently did very good work on enabling monitoring of all autoscaled machines. However, we are hitting some performance problems with Prometheus, because, as you probably know, we create a crazy number of machines. In hot times, when we process a lot of builds from gitlab-org (so basically our own builds) and the shared runner builds, we can process something like 2000 jobs at a single time; this is probably our upper limit. And because a lot of these machines are ephemeral, we constantly create and destroy them, so this is very dynamic, and we are actually hitting the file descriptor limits, IO limits, and CPU limits of Prometheus. We thought that Prometheus 2.0 would help us, and it's definitely helping, because it allows us to handle way more, but it still doesn't run constantly without problems.
This is something that Tomasz is working on right now, and we hope that, with the joint effort of the Prometheus team and the CI/CD team, we can finally enable everyone to see these kinds of metrics: basically knowing exactly how many CPUs we are using, GitLab.com-wide, at a specific time. We are looking at this kind of information, but also at information like how much egress traffic we are using in a given period of time.

I mentioned hiring, and this shows a little about our hiring pipeline. This is actually a summary of two separate positions, senior and intermediate backend. The pipeline is, let's say, medium; it could be better. Right now we have around 14 applicants in the assessment phase, whom we are interviewing and ping-ponging with, trying to get the most out of our questionnaire, and we also have four developers in the second stage, where we are having the second or third call with them. So we are trying to hit this goal of five developers, and it seems that we should be able to hire maybe half of that, maybe more; it just depends on how good the applicants in the first stage turn out to be. But as you probably know, our hiring process is usually very, very long: from the moment we contact a person for the first time to the moment this person joins GitLab is probably, in the best case, two months, and in pessimistic cases maybe even three. So it's not a very fast process, but we are still optimistic that we could hire maybe half of them.

Basically, this is all; I'll now look at the questions. I should have mentioned that we start closing a lot of issues tomorrow at 5 a.m. UTC, so we'll see how it goes and what the response from the community will be.

Okay, I'm not seeing any more questions. If you have more questions, you can ask in the CI/CD channel. So thank you very much, and see you on the team call.