Welcome to the fourth session of the day from the analysis, testing and automation track. This is "Keeping the Lights On for Fedora Project Infra" and it's by Mark O'Brien. So Mark, it's yours.

Yeah, I'll just start my presentation and take it away. Okay, so first, just to introduce myself: my name is Mark O'Brien. I'm the lead of the Infra and Release Engineering team, which is part of the Community Platform Engineering (CPE) team. I've been working with Fedora for about two years or so now, and this is keeping the lights on for Fedora Project Infra.

So first of all, just to give you an idea of the size and scope of the Fedora infrastructure. We have two main data centers, one in Ashburn, Virginia and one in Raleigh, North Carolina. Between those two, with 147 servers, we support four different architectures: we have x86_64 and aarch64, we also have IBM Power, and s390x, which is made available to us by Red Hat rather than hosted by ourselves. Aside from this, we have a couple of sponsored machines at ibiblio at the University of North Carolina. As well as those data centers, we use the AWS cloud; that's a community account sponsored for us by Amazon. There we use a number of different services, such as EC2, CloudFront, EKS, S3 and others as we see fit.

All in all, we host well over 60 applications across all of our infrastructure. Some of it we both maintain and run, some we just run. For example, Nagios: we just run it, someone else maintains it. Some of it is a critical part of Fedora, like Bodhi, which is the update system for those who aren't familiar, and some of it is just part of the ecosystem, like the election software used for Fedora elections; we run that too.

So, the different types of infrastructure. We have a number of different types. We use bare metal machines; on those you might have KVM hypervisors. We have a couple of OpenShift Container Platform installs, two in production and two in staging. We have a 3.11 cluster, which we'll soon be migrating to 4.9. We also use virtual machines a lot, mostly for applications which are either not suitable for containers or haven't been moved to containers yet. For example, we have IPA, our auth server, which isn't really suitable for containerization, so we use virtual machines for that. We also have most of our databases on virtual machines. As well as that, we have Koji, our main build system for RPMs, and for that we use both virtual machines and bare metal machines depending on the build type.

In AWS, we have a number of things. We leverage the multi-region aspect of it for proxies, so we can have them all over the world and deliver content to users quicker, because it's coming from closer, obviously. We use the CloudFront CDN for static web pages like our status page, and for our registry. We provide resources to the Fedora CI team, who use EKS clusters. We also provide storage to a few teams in S3, and we provide EC2 instances to open source projects. One example of that is Libravatar, hard to say, an open source avatar service similar to Gravatar; we provide the EC2 instances and they run it on them. As well as that, as I mentioned earlier, we have OpenShift, so we run our containerized applications there. We do encourage containerizing applications where suitable, as they're easier to maintain and run.

So let's move on to Ansible. Ansible is a huge part of how we manage our infrastructure.
So important it gets its own slide. We were very early adopters of Ansible; we've been using it for years. Our repo is publicly available at the address you can see there.

So what do we use Ansible for? We use it for both configuration and deployment: almost all of our infrastructure is defined in Ansible. This allows us to have consistent configuration everywhere, so we can re-deploy hosts quickly if we need to duplicate them or move them. It also allows idempotent deployments: you can run parts of playbooks again and again without anything changing, ideally. It allows for rapid deployment, as I said, and also for templating deployments. For our OpenShift clusters there are templates, so some of the work is taken out for you: you can just fill in the template for what your OpenShift app needs, and it will deploy the app for you.

We also use it for inventory. Because we use it to configure and deploy all of our infrastructure, the inventory contains all the hosts that we have and run, so it's easy to find what we use.

We use a central deployment node for our Ansible, which gives us two big advantages. One is access control: we use rbac-playbook, which is a wrapper around ansible-playbook. What that does is use groups from our authentication system to say who's allowed to run a playbook. For example, you could be in a group called sysadmin-dns, which will allow you to run the DNS playbook to update DNS, but won't allow you to run another playbook, say for deploying OpenShift apps. There is one main group called sysadmin-main, which can run every playbook. It's allowed root on the deployment node; they're the only people with root on the deployment node, and they're the main group for Fedora infrastructure. We also use the central deployment node for secret control: we have a private git repository where we host all our secrets, and they get injected into the playbooks at runtime. That repo is present on the central deployment node.

We're also community driven with our Ansible. Anyone can contribute; I have the link to the repository on the last slide. You can fork the repository and create a PR. Your changes will be reviewed by someone, and if they're accepted, they'll be brought in and run. It's as simple as that to contribute. We have lots and lots of contributors, some very regular, some occasional; we take whatever we can get.

We also like to protect infrastructure integrity. Around the release times of Fedora, for beta releases and for the final releases, we do an infrastructure freeze. There's a script within our Ansible code to tell you it's frozen; try not to make changes during that time. It's not impossible, though: we have a freeze break request process where, if you'd like to make a change and it's urgent, you just need to get a +1 or a thumbs up from two members of the sysadmin-main group and we can make that change for you.
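To make that concrete for readers who haven't used Ansible, here is a minimal sketch of the kind of idempotent, templated playbook being described in this section. It is hypothetical rather than taken from the Fedora infrastructure repo: the proxies group, file paths and secret names are invented, and the vars_files entry merely stands in for the private secrets repository that lives on the central deployment node.

```yaml
# Hypothetical sketch only -- not from the actual Fedora infrastructure repo.
---
- name: Configure a web proxy host
  hosts: proxies                 # an invented group from the shared inventory
  become: true
  vars_files:
    # Stand-in for secrets kept in a private git repo, injected at runtime.
    - /srv/private/vars/proxy_secrets.yml

  tasks:
    - name: Ensure httpd is installed (a no-op on re-runs)
      ansible.builtin.package:
        name: httpd
        state: present

    - name: Template the proxy configuration
      ansible.builtin.template:
        src: proxy.conf.j2
        dest: /etc/httpd/conf.d/proxy.conf
        owner: root
        group: root
        mode: "0644"
      notify: reload httpd       # fires only if the rendered file changed

  handlers:
    - name: reload httpd
      ansible.builtin.service:
        name: httpd
        state: reloaded
```

Because each task declares a desired end state rather than an action, re-running the play against an already-configured host changes nothing, which is the idempotency mentioned above.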
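Similarly, the access-control idea can be pictured as a simple mapping from playbooks to the authentication-system groups allowed to run them. The sketch below is purely conceptual: it is not the real configuration or data format of Fedora's rbac-playbook wrapper, and the group and playbook names other than sysadmin-dns and sysadmin-main are invented.

```yaml
# Conceptual sketch only -- not the real rbac-playbook configuration format.
# Each playbook lists the authentication-system groups allowed to run it;
# a wrapper checks the caller's groups before handing off to ansible-playbook.
playbook_acl:
  update-dns.yml:
    - sysadmin-dns
    - sysadmin-main
  deploy-openshift-apps.yml:
    - sysadmin-openshift
    - sysadmin-main
```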
So, next on to how Red Hat helps Fedora. Red Hat is the main sponsor of Fedora, but while it is the main sponsor, it doesn't steer or make decisions for Fedora. We have FESCo, which is the Fedora Engineering Steering Committee, and we also have the Fedora Council. They make the decisions; they're community bodies, so they are separate from Red Hat.

Most of the hardware we use is supplied by Red Hat; nearly all of the machines in our data centers come from them. We have onsite IT support from them too, so they take care of rack and stack, network issues and things like that in the data centers for us. And as I mentioned earlier, they also supply us with some infrastructure like the s390x machines. They also hire a dedicated team to work on Fedora. I myself am a Red Hat employee, and there are probably a few more watching this presentation who work for Red Hat on Fedora. That allows for full-time contribution, so we can supplement the community, support the community, complement it. Some people can only contribute part-time, so it is important to have people doing it full-time.

They also foster an open culture, which is community first. As I said, we like to get the community on board, we like to help out the community and we like to get them involved as much as possible. The more community members try things out, the more Fedora benefits: more ideas, more contributions, more of everything. We also like to be open by default, so we try to have all our discussions in public, and we like to share as much info about what we're doing and how we do it as possible. We use IRC and Matrix clients, which are bridged, for most of our communication, and also mailing lists.

So, what CPE does in Fedora Infra. CPE is a subset of Fedora Infra: we work in the Fedora Infra community, but we're only a small part of what is a very large community. We also have other things we work on, like CentOS Stream, EPEL and a few other things. As such, we see ourselves as community members, so communication with us is the same as with any other community member. If there's something you wish to talk to a CPE team member about, you're good to contact us on IRC, you can mail the infrastructure list, or you can raise a ticket on any of the trackers, which would be the preferred method, and we'll work through the tickets as they come in. We're always happy to help; we're here to support the community, so if you need anything, just reach out to us.

So, this is kind of an overview of how we do our work in Fedora. Just as a caveat, any urgent fires, or the Fedora releases themselves, are higher priority than all of this, but they're kind of outside the scope of the talk. Outside of those, I'll just go through what we do. We have three main types of work: day-to-day work, which is small pieces of work; mini-initiatives, which are medium-sized; and initiatives, which are large pieces of work.

So, day-to-day work. This is work which takes a relatively short amount of time; it might take a couple of minutes, or maybe two or three days max. Generally it comes from issues raised on our ticket trackers, the Fedora infrastructure and Fedora releng trackers. These are triaged twice daily, once in the morning EU time and once in the morning US time, and assigned tags. We score trouble and gain: trouble is the estimated amount of work it will take to carry out the issue, and gain is how much the community will gain from it. The priority then is that a low-trouble, high-gain ticket would be the first one we do, obviously, because it's the most benefit for the work, and a high-trouble, low-gain ticket would be lowest on the priority list, because it takes a lot of work and there's not much gain from it. We also assign different category tags to the issues. So for example, if you came in as an AWS expert and you wanted to help out, you could filter for the AWS tag and start working on those tickets as necessary.
Generally, there is a person in each category who you can talk to: if you had an issue with AWS, reach out to me; if it's general infrastructure, you could reach out to Kevin Fenzi, or whoever. As well as that, we have some unticketed work we carry out day to day. These would be general tasks: server upgrades, mass reboots, CVE patches, reviews on PRs, things that come in as regular work that we didn't raise a ticket for.

So next up is the medium-sized work, the mini-initiatives. These might take a couple of weeks, or a couple of people to do them. As they take a bit of time, they take a lower priority than day-to-day tasks. When we do our daily meetings, if a ticket is deemed too big for a day-to-day task, we tag it with the mini-initiative tag. Every month we prioritize these in order, and then we work down through the priority list. They're second priority to day-to-day work, and as such they can take a little longer to do. An example of one we've done recently: we wanted to centralize all our documentation, which was spread across a few different places, some here and there, some on readthedocs.io. We wanted to move it all into docs.fedoraproject.org. So a few people on the team, in their spare time or during downtime during the day, started moving over bits and pieces, and at this stage we have most of it over.

Then last are the big chunks of work, the initiatives. These usually take a team of a couple of people, anything from three to five people really, and they can take multiple months. These are generally proposed by the community; they can be proposed by anybody. When a proposal comes in, the product owner of CPE will have a look and see if it benefits the Fedora community, gauge the merit of it, and maybe do some more information gathering if anything isn't clear. If the product owner thinks it might be acceptable, they hand it off to the ARC team. The ARC team is a small team, usually two to four people, who work for about two weeks. It's built up of different members of the CPE team based on skill sets. For example, I'm normally the sysadmin one, so there might be some sysadmin expertise from me, or if it was for Mailman, for example, somebody who knows Mailman.

The ARC team first looks at feasibility, whether the project is possible: maybe some technical paths we could take, some possible solutions, and if it is possible, maybe a POC. If it's deemed unfeasible, we'll go back to the initiative's proposer and say, look, this isn't feasible in its current state. They might then ask for it in a slightly different way, or they might just drop it. But if it is deemed feasible, which most of the time it is, it goes onto our backlog to be prioritized by the stakeholders. Every quarter, the stakeholders prioritize what work we should do, and based on that and the availability of people, we assign maybe two to three initiatives a quarter, carry out the work on them and, all going well, deliver them to the community. As a big example, the biggest one would be our new authentication system, which was carried out over a number of months by a few different people and delivered early last year.

So, how can you help, if you'd like to contribute? If you do any of these things, like Python, Bash, Linux admin or Ansible, or really anything else, UX, UI, anything, we can always use your help.
A good way to introduce yourself is to mail the infrastructure list at infrastructure@lists.fedoraproject.org. We have weekly meetings on IRC, where we go through any news in Fedora and go through some tickets. Sometimes people give talks on parts of our infrastructure: information for new people, and for people who have been around but might not know that area very well. It's a good way to share knowledge and learn.

You can create a Fedora account; that's kind of necessary for almost everything, to get access. When you have one, you can get sponsored to become an fi-apprentice, which is one of our authentication groups. What that will do is give you read-only access to a lot of our servers, so you can SSH in and have a look around, see how things work without the risk of breaking anything, see what applications we use, and get familiar with things. And as always, the team is there to help. If you see a ticket anywhere on any of our repos, listed below at the bottom of the page, and you think you might be able to help out, or you just want to learn and watch, just make a comment on the ticket to put your hand up. Maybe you can just do it yourself, but if you can't, someone from the Infra team can help: you can chat with them, discuss with them, and they'll go through anything to get the community on board and involved.

So that's pretty much it from me. There are my details. If anyone wants to contact me after the talk, if they have questions they can't think of now, on IRC or Matrix you can just ping me as mobrien, or you can email me at mobrien@redhat.com. So if anyone has any questions, I can open it up to the floor.

Right now I do not see any questions, but I have one. You said that you have three levels of work or projects. Does it happen that your day-to-day work is so intensive that you do not have time for anything else for a prolonged period of time? And how do you handle it?

So that's a very good question, actually, because obviously that can happen occasionally. Generally, CPE is a fairly big team, and Infra and Releng is a sub-team of CPE. We're the ones who carry out most of the day-to-day work. The initiatives have assigned teams, but as we run the quarterly cycle, if in one quarter we find that the day-to-day work is too much for us, then in the next quarter we'll take on fewer initiatives and leave more people on the Infra and Releng team to make up the difference. So it's a case of moving bodies to where they need to be so that nobody's swamped. In my experience it doesn't happen that often. Okay, there are certain people who know a lot who will obviously have more work on their plate, but in general there are usually enough people around to get the work done.

All right, thank you. There is one question: what about initiatives like rpmautospec that require a small but constant amount of attention? Issues and PRs on rpmautospec don't get answers right now.

Okay, so rpmautospec was an initiative run recently, so it did get a solid three months of attention and it got updated. As I said, the Infra and Releng team do the day-to-day work, so we keep an eye on the issues and PRs that come in to rpmautospec. It can be a little bit slower on applications like that, but if there's anything urgent, we'll try to get to it straight away.
If it's small incremental builds or minor bugs, it might take a little while, but generally they'll just come up on our radar and we'll try to get to them as soon as we can.

Thank you for the answer. All right, I think there are no more questions. Yeah, okay. As I said, if anyone thinks of anything after the call, feel free to reach out to me by email or on IRC. And all the links are in the slides, which are available there, so you should be able to get to any of them. So thank you, Mark, for the presentation. Thank you. Enjoy the rest of Friday and the rest of the... Thanks. Yeah, time for a beer. All right, thank you.