Hello, my name is David Kirwan, and I created this talk alongside Vipul. We're both members of Red Hat's Community Platform Engineering team, and we're both responsible for the CentOS CI infrastructure.

A quick overview of what we're going to cover: we'll give a brief introduction, talk about the services we provide (bare metal, containers, and VMs), give a little demo, and hopefully leave lots of time at the end for questions. Hopefully someone will actually have a question.

So what is CentOS CI? It's an initiative of the CentOS Project: a place where open source projects can run their CI workloads. Basically anybody can apply, as long as the project is open source and somehow benefits Fedora or CentOS.

On this slide, the parts in green are the pieces of the CI story that we actually provide. We give you a workspace, which is a project in an OpenShift cluster, and optionally a persistent Jenkins instance, so you have somewhere to run Jenkins. You can then run your tests on bare metal, VMs, or containers, and you get a place to store artifacts temporarily, so you can, say, check out your code, build an artifact, and run your tests against it.

What's changing in CentOS CI? At the moment we have a central Jenkins instance, and we're hoping to turn it off pretty soon. We also have an OpenShift 3.6 cluster that we hope to switch off by the end of the year, though it may run on a little longer. Replacing both is a single OpenShift cluster. On the central Jenkins, any configuration change meant opening a ticket so one of us could make the change for you; on the new cluster you get your own namespace and your own Jenkins instance with admin access, so you can make those changes yourself in a self-service manner. Hopefully that frees up a lot of our time, since we won't have to answer basic configuration-change tickets. And we're continuing to maintain the bare metal images, VM images, and container images that you can run your tests on.

About the new cluster: it's OpenShift 4, with three control plane nodes and nine compute nodes; you can see the stats below. You get a namespace with, as I said, full admin access, and we can provide ReadWriteMany or ReadWriteOnce persistent storage via NFS. You get an optional Jenkins, and we maintain a shared library on the team that makes it easier to run Python tests, so you can easily integrate and run your Python tests against any of the workloads we offer.

So OpenShift covers the containers. The system we use to control the bare metal is Duffy. What is Duffy? It's a REST API; on the back end, Ansible playbooks communicate with the bare metal nodes over IPMI, and the various OS versions are installed via PXE.
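[Editor's note: to make that concrete, here is a minimal sketch of what a raw call against the Duffy API looked like, based on the legacy endpoints that were publicly documented at the time; treat the host, key, and field names as placeholders to verify against the current docs.]

    import requests

    DUFFY = "http://admin.ci.centos.org:8080"  # legacy endpoint; illustrative
    KEY = "xxxx-xxxx-xxxx"                     # per-project API key issued by the CI team

    # Reserve one CentOS 7 x86_64 bare metal node for the standard six-hour window.
    data = requests.get(f"{DUFFY}/Node/get",
                        params={"key": KEY, "ver": "7", "arch": "x86_64"}).json()
    hosts, ssid = data["hosts"], data["ssid"]  # hostnames to SSH into, plus a session id
    print(hosts, ssid)

    # ... run your tests over SSH against the hosts ...

    # Hand the session back so Duffy reinstalls the nodes for the next tenant.
    requests.get(f"{DUFFY}/Node/done", params={"key": KEY, "ssid": ssid})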
We have a Python client that makes it easy to contact this API, and you can use it to reserve bare metal nodes. There are multiple architectures to choose from, and you can hold a node for a period of six hours. As you can see, we offer a couple of different CentOS versions, and on the right-hand side there's a sample JSON blob returned by the API; you use that to discover the hostname you need to SSH into, or whatever else you need later.

We have a pretty large pool of hardware; you can see the numbers below. We keep a certain number of nodes pre-provisioned at any one time: back-end workers feed message queues containing the metadata that maps to each node, so when you call the Duffy API it consumes one off the top. We have monitoring in place, so if a pool drops too low, new nodes are provisioned automatically, and if it gets really critical it alerts one of our engineers, who can go and kill any stuck instances. On the right-hand side there's a little table with the number of bare metal nodes we've provisioned per year. It's a pretty sizable amount: we're up to 132,000 for 2020, and the number in brackets is where it's likely to land by the end of the year. Down in the bottom left you can see that in total we've provisioned 1.2 million bare metal instances using Duffy.

The next category of workload we offer is VMs. We recently provisioned an OpenNebula cluster and we're still in the process of integrating it, but we're hoping to expose it through the Duffy API, so you'll be able to check out a VM instead of a bare metal node if that's all you require. That's ongoing work at the moment.

Okay, quick demo. How would you interact with us? This is for new people who want to become tenants of CentOS CI, and maybe for existing tenants on the central Jenkins or the legacy OpenShift cluster. The first step is to open a ticket on (I hope I'm pronouncing this right) Pagure.io, on the centos-infra board. You just go to Issues and create a new issue; it's pretty simple. Among the issue types we have a template; click it and it pre-populates the issue with all the information we need from you: do you have a namespace in the old cluster, yes or no; do you require access to Duffy, that is, do you need to check out bare metal or VMs, or are containers enough, yes or no; and then, of course, the project name and the ACO accounts of the people you want to have admin access in your OpenShift project. Once you fill that out and create the issue, one of us will pick it up, and if it's approved we'll provision a namespace on the cluster for you.

I'll give you a quick look at one we created earlier. You can see the name of the project; it's just a monitoring example, one of my test projects. Inside, I've deployed one of the Jenkins instances, with a little worker that already has the Duffy API Python client installed on it.
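[Editor's note: since the worker ships with the client, the pipeline logic shown in the demo below boils down to roughly the following in plain Python. This is a sketch assuming python-cicoclient's documented wrapper interface; the endpoint and key are again placeholders.]

    from cicoclient.wrapper import CicoWrapper

    api = CicoWrapper(endpoint="http://admin.ci.centos.org:8080/",  # placeholder
                      api_key="xxxx-xxxx-xxxx")                     # placeholder

    # node_get reserves nodes and returns their details plus a session id (ssid).
    hosts, ssid = api.node_get(ver="7", arch="x86_64", count=1)
    try:
        for hostname in hosts:
            print("would run tests over SSH against", hostname)
    finally:
        # Always return the session, even if the tests blow up, so the nodes go
        # back for reinstallation instead of idling out the six-hour window.
        api.node_done(ssid=ssid)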
If we go into Jenkins, there's a sample pipeline. You can see, just down at the bottom, that it checks out one of the nodes, and within the Groovy scripting we grab the hostname and the SSID (the session ID) and print them out. At the end of the pipeline it tells Duffy we're done with the node. Duffy then recycles it: it tears it down, reinstalls the operating system, and once that's complete makes it available back in the pool. Let me run it to show you: Build Now. It makes the connection to Duffy, gets back the information, and prints it out; you can see the hostname, and the metadata that can be used to hand the node back, to tell Duffy we're done. So that's a basic example of how you'd interact with this. In between those two steps, provision and tear down, you put your own tests.

That's pretty much it. Here are some of the resources we have available: the top one is how to interact with the team on a day-to-day basis; then the place where you can create tickets if something's wrong with the infrastructure or if you want to create a project with us; and some wikis that show the architecture. Some of it's out of date and needs updating, but we're working on it at the moment. So, after powering through that: are there any questions? Go ahead, Brian.

Brian asks: whether your workload should be a container, a VM, or bare metal, can you give an example of how to decide? Ideally, the fewer resources you consume, the better?

Totally, but it very much comes down to what you need. If you're fine with a container that you maintain somewhere, and you can just have Jenkins pull it and run it, that's even better. But if your job can't be containerized and you need bare metal, one example would be a storage team that needs four servers and four clients interacting with each other in a storage pool, writing data and checking for speed regressions. For that you'd need VMs or bare metal, and bare metal is much better for exact performance numbers if you want to check for regressions, so in that case it's much better to go for bare metal. But the idea is very much the least you can get away with, because this is infrastructure we run for multiple projects; it's not "consume all you want". So it's also very much up to you and what you think works best. I'm just reading the chat so I have the exact context, in case you added anything.

If you're not a tenant yet, David went through it, but I'm going to run through the process once again. Go to pagure.io (David, can you share the slides? Is that working? Yes), to the centos-infra project, and you'll see a template; it says CI migration, but right now it's the same thing. David, if you can zoom in on the template, that would be good. ("I can, yeah.") We have moved away from our older way of doing things on the central Jenkins for the reasons David gave: Jenkins can be hard to maintain with that number of jobs thrown at a single node and more and more projects onboarding. That's one problem. The second is that you can't add plugins or credentials yourself, because with so many projects, giving admin access to everyone isn't feasible and creates security problems.
So we don't want that. What we do now is create your namespace in OpenShift. We give you an account you can log in with: ACO, that is, accounts.centos.org. You authenticate with ACO, you're in the OpenShift cluster, and once the namespace is created you'll see it on your dashboard. There we can create a Jenkins deployment, a Jenkins workload, for you, or you can do your own thing if you know exactly what you're doing. That's why we call it CentOS CI infrastructure: we don't claim to run all your CI jobs for you. How you run your builds, how you configure your jobs, and how you trigger them is very much up to you. We can help to the extent of offering you a Jenkins instance: you go to the Route, you see the Jenkins, you authenticate with ACO, you're in, and now you're the admin of the Jenkins that David just demoed.

So, again, the process is to file a ticket on pagure.io/centos-infra using the template, or without the template as long as you provide enough information for us. That information is: the namespace name you need (that's why I went to the OpenShift side, since we need to know which project you're going to run your tests in), and then who will be the admins of that namespace in OpenShift and Jenkins. Once all that is set up, you're good to go; you can configure your Jenkins to run your jobs on whatever trigger you want. Brian, was that clear? I'll wait for Brian to confirm. Thank you.
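[Editor's note: for reference, the information a new-tenant ticket ends up carrying looks roughly like this; the field wording below is illustrative, since the template on the tracker pre-populates the real questions.]

    Project name:                          my-project
    Namespace on the legacy cluster:       no
    Duffy access (bare metal/VMs) needed:  yes
    ACO accounts for namespace admins:     alice, bob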
Next question: is there some cleanup policy for Duffy VMs, such as reclaiming one that has been running for n minutes without activity, or something like that? Right now, since we're still in the process of integrating virtual machines, we haven't thought that far ahead. It's definitely a good idea, though, and we have something like that for bare metal. When you check out a node, you get it for six hours. Even if you haven't finished your builds, we take it back after six hours, because so far I haven't seen a CI job that runs for six hours; imagine a six-hour job running for every PR. If you do have one, you get the option to mark the nodes as reserved for 12 hours. You can also just fail the nodes: say you saw a failure and want to investigate exactly why it happened, you can go in and check. If not, after six hours we automatically claim the nodes back.

But we don't want all tenants to think "they'll claim them after six hours anyway". These nodes need to be reprovisioned from scratch before the next tenant can use them, so it's best if you run cico node done with the session ID so the node comes back to us; if you're done in 15 minutes, or one hour, or two hours, give it back, and we can reinstall it from scratch for the next consumer. So for bare metal we take nodes back automatically after six hours if you don't return them. For VMs we definitely have to think about it. I'm not sure we can key it off whether there's activity or not, because that's much more error-prone: what if we start getting complaints of "hey, I'm still running a job and you took it back"? That would be difficult to investigate. I'm not trying to run away from the work, but it's much easier to say "your VMs are reclaimed after six hours, or five, or two" and let you manage how you run your jobs within that window. I also need to see how we'd implement it: is there an OpenNebula feature where we can auto-reclaim those VMs, or do we integrate it into the Duffy part, just as with bare metal? That's definitely something to look at. Thanks a lot, Leo, it's a great question.
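[Editor's note: on that hold-for-investigation option, the legacy REST API documented a separate call alongside Node/done. A rough sketch follows, going from the public wiki; treat the Node/fail endpoint and its parameters as an assumption to double-check.]

    import requests

    DUFFY = "http://admin.ci.centos.org:8080"  # placeholder, as before
    KEY = "xxxx-xxxx-xxxx"

    # Instead of returning the nodes, mark the session as failed so the nodes
    # are held for debugging rather than reinstalled (assumed endpoint).
    requests.get(f"{DUFFY}/Node/fail", params={"key": KEY, "ssid": "the-session-id"})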
I see Steven's comment in the chat: he's more than happy to run OpenShift, or a community OpenShift, but how do we handle the GDPR issues? Yeah, we can take over a community OpenShift if Steven is ready to spell out exactly which GDPR issues we'd have to deal with. I mean, we'd just have to run an OpenShift cluster, and we wouldn't even have to run it ourselves; we could ask a community member to take care of it, if you know what I mean. But yes, I understand the pain point. We'd probably set up something similar to what CentOS had in the past, where we just wipe it every 24 hours.

Sorry to interrupt, David, but Brian has a really big question: what if I want to run tests on a Fedora VM? Yes, we do have plans for that. Right now there are reserved VMs, but once we start on virtual machines, that would be the first job, in my opinion. CentOS VMs won't be the priority, although it's just adding VMs as a template, the same day's work; I'd very much like to have Fedora VMs as, say, an OpenNebula template already existing that we can create for your jobs. So Fedora virtual machines are the top priority, and as soon as we have VMs integrated into Duffy, they'll be the first to come across. So definitely plans for that, and we've been waiting for it for a very long time. Steven says commenting on OpenShift is a fun hobby of his, but he's not sure how much he can do on a recorded session. Yeah, that makes sense, Steven.

Leo, do you have any more questions on how we operate? Leo asks: is there a way to check a VM's state through the API, rather than the get method provisioning a new one? Right; I'm just repeating the questions for the recording. As I said, VMs are still in the integration process, but OpenNebula definitely has its own status mechanism, and we can integrate it into the Duffy API. It's very easy with our bare metal; go read the Duffy docs (I'll share the link, or go to the slides; David, if you can share the landing page again, that would be great). The way we work is on states: every bare metal node has a state. If a node is "active", that means it's ready to be provisioned again, and once it's provisioned it goes into the "ready" state. Duffy maintains a queue; the pool David mentioned is just all those provisioned nodes sitting in a queue, and you query against what you've requested and claim one. Once you claim it, it's in the "deployed" state so that no one else can claim it; once you give it up, it goes back to "active"; and if there's a failure in provisioning, it's put in the "failed" state so we can go and investigate. So we work on states, and that's definitely something you can get. But Leo, if you have more ideas about what kind of state you'd expect from a virtual machine, definitely let us know.

Yes, and Brian mentions that Duffy has an inventory call: if you give it your API key, it will show all the machines you have requested. That's very handy. I think we should highlight the features of this a lot more; more people would be interested in using it if they knew about it. It's our best-kept secret, in a way. Well, it's not a secret: if you go to the python-cicoclient docs, it's very much mentioned there. Since our tenants use python-cicoclient in the Jenkins we've given them, or in the Jenkins they've provisioned with their Duffy API key, we kind of expect them to go and see everything the client can do; if you go to the python-cicoclient documentation, all its features are mentioned there.
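[Editor's note: on the inventory call Brian mentioned, with the legacy API that was a plain GET as well; same assumptions as in the earlier snippets.]

    import requests

    DUFFY = "http://admin.ci.centos.org:8080"  # placeholder, as before
    KEY = "xxxx-xxxx-xxxx"

    # Lists the machines currently checked out against your API key,
    # along with their session metadata.
    print(requests.get(f"{DUFFY}/Inventory", params={"key": KEY}).json())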
Right, and we're definitely planning to hook the same features into virtual machines as well. And ideally, as I said about the cico node done part, we'd very much expect that once you've consumed virtual machines you do the same, so that the VMs are freed up; if not, we'll find a way to auto-reclaim them, because resources are definitely a concern, and with the wide set of projects related to Fedora and CentOS, they can get consumed pretty quickly.

Another thing we just recently enabled is KubeVirt, so it's possible we'll actually be able to run VMs directly on the OpenShift nodes themselves, and then you can keep working with the OpenShift API or the Kubernetes API. It would have been a good topic to include in the talk, but we didn't, because right now we're a bit torn, still identifying the problems that would come with it: if people start using it heavily, it may become a throttling issue for the whole cluster and slow it down; another issue is what kind of extra privileges we'd have to grant for you to consume KVM through KubeVirt, and whether we'd need to maintain those. So there are some challenges, but we're definitely interested in making it as simple as possible. That's why we've said that even if you're consuming VMs through KubeVirt or OpenNebula, you'll do it through Duffy, so you don't have to deal with any of that; you just make a call to Duffy. In the example snippet we shared, you'd pass --type vm and it would give you a VM. So we're excited about doing some more work and investigation around that.

Are we at time? We have five minutes, and that was our main goal, leaving lots of room for questions, because infrastructure questions can be tricky and there are sometimes a lot of complaints. So if anyone has any more questions, we're more than happy to answer. Question from Leo: how hard would it be to make this a Foreman plugin? Could we bring it to the joint meeting? Yeah, if you're interested, you can do that. Right now we don't use Foreman in CentOS CI at all, just to make that clear; we use Ansible to provision nodes and to power them off and reinstall them.

Since we don't have a lot of questions, I'd like to go over the future plans once more, very briefly. The most important one, ongoing right now: with a lot of tenants, we need to make sure we know how many resources are being used, and that's where the problem is right now; we want to track how many VMs are consumed. Providing virtual machine capability is the immediate next thing. Then I'd very much like to focus more on containerized tests, especially using the Fedora and CentOS container images; first and foremost, how we can get the most out of the least amount of resources. And then there's our ongoing maintenance: David has been doing great things on monitoring, and we had Fabian hook us up with a nice Zabbix instance. We have yet to look into making that monitoring a little smarter, so that I'm aware of problems before you notice them, which I guess has been going well since January; I haven't had to learn from the mailing list that something is going wrong, I get notifications on time. So monitoring is also one of the things I want to finish.

All right, it was a great talk, and thanks a lot, David, for volunteering to take the hard part of the slides. If you have any questions, feel free to reach out to us (David, can you go back to the resources slide, if you don't mind): on Freenode, #centos-ci is where you reach us, or, even better, if you just want to discuss something with us, #centos-devel would also work. If you want to file a ticket, if you have any request, or if you want to know more about the infrastructure and how to build things around it, we very much appreciate tickets, because we triage them and work according to the priority we see: that's centos-infra on pagure.io. You can read more about Duffy at github.com/CentOS/duffy, where you'll find the code, and the wiki is on wiki.centos.org; the links are in the slides, and the slides link is in the chat. Thank you very much. I guess it's time to go. Bye, thank you folks!