Welcome everyone, and happy Fedora 35 release! If you don't see me, my video is not active for some reason, it's all blacked out, but I'm raising both my hands and I'm very happy for the 35 release party. I'm Vipul, and I'm here with Akashdeep and David. All of us work in the Community Platform Engineering (CPE) team. If you have been around the Fedora infrastructure, you will have heard of this team: we work on a bunch of different infrastructure on the Fedora, CentOS, and release engineering side, but one of our major responsibilities is taking care of the Fedora infrastructure. This short talk is about some of the steps we have been taking, and are going to take, to improve metrics and monitoring of our applications, and of course of the infrastructure they run on as well. So, Akashdeep and David, do you want to just say hi?

Hello. Hi folks. Hello.

There you go. So, as of now, we have Nagios to monitor our hosts, plus some basic monitoring of the applications running on them. Say Pagure is running on one of the hosts and it crashes: we get notifications for that. But for a while we have been feeling the need for a newer, more up-to-date monitoring stack. There is no neat way to monitor what's happening within an application, just the state of the application, whether it's running or not. Slightly more complicated things can be achieved by placing scripts on the host, which is not an ideal way: you cannot just run a query and filter the metrics down to the point you care about. And you can't do any of that when it comes to OpenShift applications. As we are trying our best to move as many applications as possible to OpenShift Container Platform for easier administration, we feel the need for a new solution even more.

To tackle this, the CPE team has a sub-team called ARC: before any initiative is taken on, two or three team members are assigned to it, and they take the task and try to find a feasible solution for it. That team came up with two solutions for replacing Nagios, and the answers were Zabbix and Prometheus. Zabbix, because we have already been using it in the CentOS infrastructure and we are quite happy with it; it has all the features we need to monitor different hosts and infrastructure. And Prometheus kind of came as the default solution on OCP 4, and upon investigating it, we loved it; it satisfied everything we were looking for. So, again: Zabbix for your VMs, bare metal, and hosts in general, and Prometheus for applications, which it turned out in our research to be very good at. Prometheus can also integrate with Zabbix, so you can export metrics and see everything in one place.

In the last couple of months we installed a new OpenShift Container Platform, version 4.x, I guess it's 4.8 right now; it's moving quite fast. We have had OpenShift in Fedora before, where a bunch of applications run, but it was 3.11 if I'm not wrong, quite old, and it had been a long while that we needed to update it. So we installed OpenShift Container Platform 4, both staging and production. At the end I'll share a slide with links to both clusters, and anyone can go there and log in right now using your FAS ID, which means it's ready to be tested. But when you access it directly, you won't have any access rights yet; we are working on how to provide some of that access, and we'll come to that. So, again: Prometheus is the best option. I like this line.
Prometheus is the best option for monitoring applications running inside OpenShift, and 100% of our sample size agreed with it: three out of three devs working on the project thought Prometheus was the best solution. And again, it comes with OpenShift by default; it's an integral part of OpenShift 4, so we always knew there would be support for it as long as OpenShift support lasts. The monitoring stack comes with three main components: Prometheus, Alertmanager, and Grafana. Prometheus also has a web front end, but we are not talking about that here; Prometheus is mainly for collecting metrics from the applications that you define. Grafana displays them, with very nice, beautiful dashboards. And Alertmanager is for customizing how you receive your alerts. Unlike Nagios: if you're part of the sysadmin group, you know how many emails you get, and there's very little per-user customization of what you do and don't receive. With Alertmanager, the namespace admins, which is to say the application admins, can configure those things themselves.

But the problem is that by default, Prometheus, Alertmanager, and Grafana are accessible only to cluster admins. The reason is that they are mainly there to monitor the state of your OpenShift cluster, not the applications within it. But we found the user workload monitoring configuration, which allows every namespace admin and project owner to use this feature. So we configured it, and every app owner with a namespace where their application runs will be able to expose metrics and have Prometheus monitor their application's metrics there. As I said, the only thing that needs to be sorted out is role-based access control over who has namespace admin access. We want it to be well integrated with FAS so that we don't have to add and remove users in multiple places. This is in progress; it's already been configured to quite some extent, but we are still doing some polishing work on it. Once this is done, every namespace admin can add and remove users themselves, deciding who has access to the application and to the monitoring and metrics stack. So it's very much a self-service thing, and you don't need to depend on the infrastructure folks, which is always very welcome.
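For reference, the user workload monitoring feature mentioned here is switched on in OpenShift 4 with a small ConfigMap in the openshift-monitoring namespace. This is a minimal sketch following the documented OCP 4 mechanism; the talk does not show the exact configuration used in Fedora's clusters:

```yaml
# Minimal sketch: enable user workload monitoring on OCP 4.
# Once applied, namespace admins can create ServiceMonitors and
# PrometheusRules for their own applications.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```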
So that was the introduction, a little bit of a primer, before we move on to a very small demo of what we have done, which I'll leave to David. David, do you want to share your screen? I'll stop sharing here.

Okay, okay. Can you hear me? Yeah. Oh yeah, perfect. Okay, let me try to share this. And meanwhile, I'll add the slides in the chat there and I'll upload it on Facebook. Okay, let me know when the OpenShift dashboard is visible. I can see it. Yes, it's visible at my end as well.

Okay, so this is the OpenShift 4 dashboard. We've created a sample project that has a couple of Prometheus rules configured. This is just an example to show what it will look like when a custom application, or an existing Fedora app, has been migrated to the new cluster and is then configured to use the user workload monitoring stack. So I'll jump in here, into this monitoring example. Let's just look at the pods inside it: there's a single container running. Let me show you, if we jump onto that container, hold on, sorry, one second: it's got a service, so it's got networking. You can see that it has a single service running.

The new thing that we need to add here, to get Prometheus to scrape the service, is a ServiceMonitor. So we'll go up here and search for service monitors. Okay, you can see we have a ServiceMonitor object. This is all self-service: the administrator of the app can create one of these objects themselves. I'll give you a quick look at what it actually looks like. It just says to scrape the app that matches a particular label, over HTTP, on a particular port. Prometheus always expects to scrape the /metrics endpoint on that service, so simply by creating the ServiceMonitor, Prometheus will automatically scrape the metrics endpoint every 30 seconds; you can see the interval here.

Okay, let me show you a quick example of the data that gets returned by this endpoint. In this example, it's just querying the price of Bitcoin and Ethereum, and it's being returned here. This is the format Prometheus understands, and PromQL is the query language you use over it later. So when you're creating your metrics, you just need to follow the Prometheus guides so you produce data in the correct format that Prometheus will understand. So, every 30 seconds Prometheus is scraping this data.

The next thing we can do is set alerting rules. You can create a PrometheusRule here; it's pretty easy to read once you see it. This one is called "Bitcoin over 100,000": it checks that if this metric returns data and the value is over 100,000 for five minutes, we then create a warning-level event. Now, I don't think Bitcoin is over 100,000, so let's put it at 5,000, because it's definitely higher than 5,000 at the moment. We'll just save that and check back later. Alertmanager will be continually looking at the data: is this alert firing? Is this alert firing? And if it is, you can configure it to, say, send an email to the app owners. We can configure all that type of stuff in Fedora Infra as well: we could have a custom mailing list that's unique to a particular app, and then set up alerts that only the app owners get. So that's pretty cool.

Okay, so let's have a quick look at what the monitoring stack itself looks like. In here, you would see any alerts that are firing, and you can see there are a couple already, alerts that are actually firing right now. If we wait about five minutes, you'll probably see that the Bitcoin alert fires too. Yes, that's pretty much the demo.
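To make the demo concrete: a ServiceMonitor along these lines would produce the behavior David describes. This is a sketch, not the exact object from the demo; the names, label, and port are illustrative:

```yaml
# Sketch of a ServiceMonitor like the one in the demo (names assumed).
# It tells the user-workload Prometheus to scrape /metrics over HTTP
# every 30 seconds on any Service matching the label selector.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitoring-example
  namespace: monitoring-example
spec:
  selector:
    matchLabels:
      app: monitoring-example
  endpoints:
    - port: web          # must match a named port on the Service
      scheme: http
      interval: 30s
      path: /metrics     # the default path, shown here for clarity
```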
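The data returned by the /metrics endpoint is the standard Prometheus text exposition format. The demo app returns coin prices; the metric name below is an assumption, since the demo doesn't show the raw output up close:

```text
# HELP crypto_price_usd Current price of a coin in USD (hypothetical metric name)
# TYPE crypto_price_usd gauge
crypto_price_usd{coin="bitcoin"} 61234.50
crypto_price_usd{coin="ethereum"} 4567.89
```

Any Prometheus client library will produce this format for you, so applications rarely need to emit it by hand.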
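The alerting rule David edits would look roughly like this as a PrometheusRule object. Again a sketch: the rule name matches the demo narration, but the expression assumes the hypothetical metric name from above:

```yaml
# Sketch of the "Bitcoin over 100,000" rule from the demo
# (the threshold was lowered to 5,000 on screen so it would fire).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: bitcoin-alert
  namespace: monitoring-example
spec:
  groups:
    - name: crypto.rules
      rules:
        - alert: BitcoinOver100k
          expr: crypto_price_usd{coin="bitcoin"} > 100000
          for: 5m                # must hold for five minutes before firing
          labels:
            severity: warning
          annotations:
            summary: "Bitcoin has been over 100,000 USD for 5 minutes"
```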
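And the app-specific email routing mentioned here would be expressed in Alertmanager's own configuration. This is generic Alertmanager syntax, not Fedora Infra's actual config, and the mailing list address is hypothetical; SMTP settings are omitted:

```yaml
# Generic Alertmanager routing sketch: send alerts from one
# namespace to an app-specific mailing list.
route:
  receiver: default
  routes:
    - matchers:
        - namespace = "monitoring-example"
      receiver: app-owners-email
receivers:
  - name: default
  - name: app-owners-email
    email_configs:
      - to: "app-owners@example.org"   # hypothetical mailing list
```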
Will we stop there and see if anyone wants to ask questions, or will we move on to the last area? Yeah, let's move on to the last part, and then we can come to the discussion of how folks can start utilizing this. So Akash, do you want to share your slides, or do you want me to share the last slide for you? Yeah, please share the last slide for me. Okay, all yours.

Thank you, and thanks, David, for the demonstration. All right, so what are the next steps for developers? The cluster now talks to Noggin/IPA for authentication and authorization, so you can log into the cluster with your Fedora account. The infrastructure and release engineering team will work on migrating the existing applications from the older 3.11 cluster to the newer one. You can help us with existing running applications by reworking their OpenShift application playbooks to fall in line with the newer cluster, and making a PR against our Fedora Infrastructure Ansible repository, the link to which I will provide over here. There you go. And by finding metrics to add to the Prometheus monitoring for your own applications; the link to how exactly that can be done is over here as well.

Right. So Akash, where... okay, I see the links. Yeah, in the chat section. So, down the line, Zabbix will replace Nagios entirely for monitoring the infrastructure, and together with the OpenShift monitoring it covers things pretty well. And that's pretty much it from our side. Folks, please feel free to use the Q&A section for all your questions; I guess we'll stay around a little bit for that.

Yeah, I realize I was sharing my video, but it was not visible, it's all just blacked out. So yeah, thank you; that's all we wanted to share of what we did over the last couple of months and how you can start utilizing it. Very soon you'll see emails on how developers and maintainers of the different applications can come and start utilizing this metrics stack. As of now, the pieces are in place; we are just figuring out some of the connecting plumbing: from the Fedora side of things, who is going to get access to these different namespaces and what's the right way to do it. If you go to the Ansible repository in Fedora Infra on Pagure, you'll see openshift-apps; that's where all the applications live. They need some small tweaks, because some APIs and the general workflow have changed a little from OpenShift 3 to OpenShift Container Platform 4. To make sure we do it in the right manner, obviously, expect more communication from our side, more blogs and guides on how to do this. As Akash said, the infrastructure and release engineering team, sorry, I forgot my own team's name, will already be working on migrating some of these applications, so there will be a proper guide for anyone who wants to help, or who just wants to follow along. That's all from our side. If you have any questions, we'll answer them now and we can discuss. Feel free to even jump on the call if you want to discuss something; you're more than welcome.

So, which OCP 4 minor version are we using here? Currently we are using, I think, the latest version; we update quite frequently. We still have to come up with an update cadence for when we are going to update these things; we are hoping to update every month, but right now we are just updating as soon as a release is out. So I guess right now it's 4.8-something, and we apply every small stable version. Yeah, thanks, David, for sharing it in the chat: 4.9 has gone GA. So there you go, we have one more task, updating it, and that will be done. Thankfully, the OpenShift upgrade process is quite easy and stable.

That's nice. One good thing is that we are using OpenShift Container Storage for our storage solution. It helps a lot in taking care of the self-service side of things when it comes to storage. But OCS would be a totally different talk; if you want to listen to us talking about OCS, and how and why we love it, we would love to talk to you all later on. That's all. Thank you everyone for coming. I hope we were able to share some good things here. Reach out to us if you have any questions or suggestions for moving forward; we are looking forward to it. Bye.