Hi everybody, welcome to this new Jenkins infrastructure meeting. Today we have quite a few topics, and we also have Victor, who joined to do a small demo. But first let's look at a few announcements. The first thing is we just noticed issues with the Windows package for the latest weekly release, 2.294. For some reason we can't start the container that packages Windows, so that release is a bit delayed, but otherwise packages are available for the other distributions: Debian, Red Hat, and openSUSE. The second thing I want to announce is that we now have access to a Discourse instance. The service is available on community.jenkins.io. At the moment we don't want to open it to everybody, so it's still in beta mode, and we are looking for people who can help us configure it. We would like to use that service to help organize questions around Jenkins, so the first focus will be the Jenkins user community, and then we'll open it to a broader group of people. If you're interested in helping, just drop me a message with your email address and you'll get an invite. And finally, the third major issue I want to share — we've been a little bit behind on this one. We noticed last week that the Puppet certificate expired a few months ago, and while we fixed the Puppet master and a few Puppet agents, we now have to go back to every machine to be sure that Puppet is in sync. We'll have more time in the coming weeks to fix that. So let's start with the first topic. Since Victor joined us today to present how the OpenTelemetry plugin can help on our Jenkins instances, I'm going to make this the first topic. This is based on discussions we had with Victor and other people at Elastic: Victor has been working on the OpenTelemetry plugin, and on ways to visualize the information it collects.
So in this case, it's not about monitoring Jenkins to know whether Jenkins is working or not; it's more about how we can collect information and visualize it to detect wrong behaviors. Maybe Victor, you want to share a little bit here? Sure, I'm going to share the screen first. Yeah, let me stop sharing. While you prepare the sharing: our objective is to use ci.jenkins.io as a way to test this. Yeah. Can you share? Awesome, here we are. Right. The very first thing I would like to introduce: the plugin is already available in the Jenkins update center and has had a few releases already. We are on the 0.x branch at the moment, so it's not yet 1.0; basically, we need to standardize all the naming conventions to be sure they can be used and won't change in the future. For that, we need to engage with different communities, you know, with the OpenTelemetry specification community and also with the Continuous Delivery Foundation. So a little bit of context: the idea of this plugin is to enable distributed traces for every single build that happens in Jenkins. It supports different types of jobs, freestyle and pipelines, though the richest information will always be in pipelines, so we decided to focus more on pipelines themselves. The point is that for every single build, there will be a transaction created as an APM distributed trace. This particular plugin is completely agnostic: it supports different vendors because it uses OpenTelemetry, which is becoming the standard for anything related to logs, metrics, and traces. That's why we chose OpenTelemetry, to be completely vendor agnostic. From the architecture point of view, you can plug this plugin into any kind of backend. The supported ones at the moment are, well, any; you can customize as you wish. Let's say you can plug it into Jaeger, Prometheus, Elasticsearch, and into the visualization as well.
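The mapping Victor describes — one build becomes one trace, with the build as the root transaction and each step as a child span — can be sketched with a tiny model. This is only an illustrative sketch, not the plugin's actual data structures, and the attribute keys here are invented for the example:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    """Minimal stand-in for an OpenTelemetry span: a named, timed unit of work."""
    name: str                 # e.g. the pipeline step ("checkout", "sh")
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)

    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

# One build == one trace: a root "transaction" span with one child span per step.
# Names, timings, and attribute keys below are made up for illustration.
build = Span("my-job #42", 0, 10_000, {"ci.pipeline.type": "multibranch"})
build.children.append(Span("checkout", 500, 1_500, {"git.branch": "main"}))
build.children.append(Span("sh (mvn package)", 1_500, 6_000))
```

A backend like Jaeger, Zipkin, or Elastic APM then renders this tree as a waterfall, which is what makes slow steps visually obvious.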
The visualization could be in Grafana, or it could be in Kibana as well. The idea is that you install the plugin and then start collecting data: a distributed trace for every build, and metrics as well. Logs will come somewhat later. So, a little more context. All the information we gather at the transaction level, the build level, is the name of the job, the type of the job, whether it's a multibranch type, how long it took, any description, the build number — you know, the traditional data that is normally in the metadata of every build. The spans correspond to things like doing a Git checkout in a pipeline or running a shell step. So every single built-in step, whether provided by Jenkins core or by any plugin you use, is reported as a span, with details such as the plugin version, the name of the plugin that supports that particular step, and the name of the step. If you use a label, such as for shell, batch, or PowerShell steps, it will be populated as well. Anything related to the checkout of the source control, such as the Git repo, the name of the branch, the user — these sorts of things will also be included. And there are other things, such as metrics: the number of builds, the number of failures, how big the queue is, how many items have left the queue, the disk space, and metrics related to the garbage collector and system CPU as well, which have been added recently. A little more context: the only thing you need to do after installing this plugin, either with JCasC or the web UI, is basically to specify the endpoint, the credential to use, and, for the UI, which dashboard you would like to link to. And now a couple of examples of how it looks from the user interface. For every single build there will always be a link.
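As a sketch of the per-step attributes Victor lists (step name, plugin name and version, optional label), here is a hypothetical helper. The key names are invented for illustration and are not the plugin's standardized conventions — standardizing them is exactly the open work he mentions:

```python
def step_span_attributes(step_name, plugin_name, plugin_version, label=None):
    """Build the kind of attribute map attached to each step span.

    Key names are illustrative placeholders, not official OpenTelemetry
    or Jenkins plugin conventions.
    """
    attrs = {
        "jenkins.pipeline.step.name": step_name,
        "jenkins.pipeline.step.plugin.name": plugin_name,
        "jenkins.pipeline.step.plugin.version": plugin_version,
    }
    # Labels (e.g. on sh/bat/powershell steps) are optional, so only add
    # the attribute when one was provided.
    if label is not None:
        attrs["jenkins.pipeline.step.label"] = label
    return attrs

# Example: a labeled shell step (plugin name/version are illustrative).
attrs = step_span_attributes("sh", "some-step-plugin", "2.38", label="Release")
```

Keeping optional attributes out of the map entirely, rather than storing nulls, matches how span attributes are usually handled.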
You can see the link in this particular box. This is customizable, so you can have your own; you won't see any reference to Elastic, as this is just an example, and you can even customize this particular description. It links directly to the distributed trace. This is probably the simplest example of running a build: you just check out source code and run Maven for building and packaging. So what happens is that the entire transaction — this is its name — took 10 seconds, and then for every stage and every step you start seeing spans. How long did it take to provision a machine? In this case less than a second, because it was a local machine. Checking out the source code took one second, building and packaging from scratch took four or five seconds, and the rest are purely echo steps. This visualization is specific to Elastic; there are different ones for Jaeger and Zipkin, all of them possible because it's OpenTelemetry based. But yeah, everything related to OpenTelemetry is here. Feel free to ask any question, anything about the roadmap or what we are planning to do. There are also discussions in the issues, and in the project board you can see the different topics we are interested in moving forward with, and which ones we are postponing. So yeah, it's public and you can access it anyway. A little more context: distributed traces are normally used for applications you monitor in real production environments; for every transaction that interacts with microservices or third-party services, people use these distributed traces. So we thought we could apply the same to Jenkins, more specifically to troubleshoot and analyze when something goes wrong in the CI ecosystem. So now I want to give you a demo; we already have a couple of Jenkins instances.
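The timing breakdown in this example (provisioning under a second, checkout about one second, build four to five seconds, out of a 10-second transaction) is the kind of arithmetic the trace waterfall does for you. A minimal sketch, with stage names and durations mirroring the demo rather than any real data:

```python
def stage_breakdown(total_ms, stages):
    """Return each stage's share of the total build time, as a percentage.

    `stages` maps stage name -> duration in milliseconds; names and
    numbers in the example below are illustrative, echoing the demo.
    """
    return {name: round(100 * ms / total_ms, 1) for name, ms in stages.items()}

breakdown = stage_breakdown(10_000, {
    "provision agent": 800,    # < 1s, it was a local machine
    "checkout": 1_000,         # ~1s
    "mvn package": 4_500,      # 4-5s
})
```

The remainder (echo steps, overhead) is whatever share the listed stages don't account for — exactly the gap you'd eyeball in the waterfall view.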
The one I would like to show you is the one related to our production instance at the moment. This is one of the instances that is publicly accessible anyway; it belongs to the APM team in the Elastic organization. I'm going to log in, and then from there — I don't want to go too deep into details — I'm going to click through a couple of folders to show you one of the projects I'm most interested in showing, because it's a real live demo, so please bear with me. This is a multibranch project, our shared library; we have an instance, again publicly accessible. In this case, as I mentioned earlier, there is a link — the link is here — and we can click on it to visualize the latest build as a distributed trace and see how it looks. I'm going to open that one. Meanwhile, I'm going to go back to the previous tab and open one of the builds, probably the previous one, which just finished, and show you the Blue Ocean view, so you get more of a one-to-one comparison between how it looks in Blue Ocean and how it looks in the distributed traces. This one is quite linear; it's the most traditional set of steps: you check out source code, you run some linting, you do some tests and other things. It's basically what a Ruby or Java project looks like, just a sequence of steps. In Blue Ocean, every step you see here is most likely a step in the pipeline that does something specific: this one is a print, this one validates whether the machine is Linux or not, then one loads a file, and so on. So it's quite accurate about what every single step does. That's what we see in the Blue Ocean UI; how does it look in the APM distributed traces? This is one of the environments we have. There is a certain granularity here, because obviously the Blue Ocean view is more collapsed compared to this one.
We can collapse it too, so you can just focus on the different stages, and then we can move forward; that's probably easier. So in this case, what we have here is the entire view of a distributed trace, and this is a transaction: this particular build or job was executed. You can see certain metadata here, such as the name of the pipeline — I'm going to zoom in a little more — whether it's a multibranch type, and other things: the time it took, the URL. This is the information about the transaction. Then for every single stage, we see the same. We have the checkout, which is the default checkout from SCM in the declarative pipeline, and then the specific ones we have, like the check-license stage. The stages you see here are the ones you can click into; those are spans. And if we go a little deeper — let me go deeper — for instance, the checkout happens by default. We also have a sleep, but that's something specific to us anyway. What is important here is that from the checkout we can visualize, in the details, what the repository is, what the branch is, what protocol was used, and so on. These are standard attributes we are using; others still need to be standardized, and that's what I meant earlier. So every single step that happens in the pipeline is a span, and there are meta-spans for the stages. Similarly for how long it took to provision a machine: this one is an ephemeral agent, and it took almost two minutes from the moment the worker was requested by the pipeline. So this is the idea: every single span you see here gives you a sense of what's going on. We don't see any failures in this case and everything works out of the box, but you can go deeper into any metadata or any logs regarding the build or the machines that were used. This is not fully integrated yet; in the future it will be.
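The troubleshooting pattern here — scan the spans for the one that dominates the build, such as the two-minute ephemeral agent provisioning — can be sketched as a one-liner over a flat list of spans. The span names and durations below are illustrative, loosely following the demo:

```python
def slowest(spans):
    """Return the span with the longest duration.

    When a build is slow, the longest span is the first place to look;
    here `spans` is a plain list of dicts standing in for real trace data.
    """
    return max(spans, key=lambda s: s["duration_ms"])

# Illustrative spans, roughly matching the demo's numbers.
spans = [
    {"name": "provision agent", "duration_ms": 118_000},  # ~2 min, ephemeral worker
    {"name": "checkout scm",    "duration_ms": 4_000},
    {"name": "check license",   "duration_ms": 9_000},
]
```

In a real backend this ranking is what the waterfall view gives you visually; sorting spans by duration is the programmatic equivalent.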
Other important things are how you can visualize metrics, how you can see what's going on in your CI. There are a couple of dashboards that give you a sense of the health of your system, and all of them are based on distributed traces rather than metrics; at the moment, the metrics we gather from the plugin are not used much. But those are not the pages I want to show: let me go back to OpenTelemetry and back to the dashboards. There are a couple of them; I'm going to click on the provisioner one and on the CI one. The provisioner dashboard is about the build queue: how many machines have been provisioned, how long that took, how much time was spent. Every time you run a pipeline and there is a node request, you will see an entry; that's the idea — gathering details from these particular transaction spans and showing them. In this case, in the last 15 minutes, at this particular peak, 11 workers were requested, and then we can see the number of jobs that are queued at the moment. So those give informational data points about what's going on. If we move to the more traditional dashboards that you probably know from other kinds of tools, we have a similar one where we monitor the queue, we monitor the status of the jobs, how they look. This is, over the last 15 minutes, the number of workers that were requested, the number of agents that were alive at that particular time, and the things that are more specifically about the pipeline: the number of steps, how often they are called, the time they take — the duration is in nanoseconds or milliseconds. And the top 10 jobs that have been queued, the number of steps queued per minute. If we have any failures, we will see them here as well — failures such as a traditional Java stack trace or a connection issue.
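Two small details from this dashboard tour can be made concrete: durations arrive in nanoseconds but are usually displayed in milliseconds, and the "top 10 queued jobs" table is just a sort over per-job counts. A hedged sketch (the function names and the sample data are invented for the example):

```python
def ns_to_ms(ns):
    """Convert a duration from nanoseconds to milliseconds for display.

    1 ms = 1,000,000 ns, hence the divisor.
    """
    return ns / 1_000_000

def top_queued(job_counts, n=10):
    """Return the top-N jobs by number of queued items, highest first,
    as in the dashboard's 'top 10 queued jobs' table."""
    return sorted(job_counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Illustrative: a 2.5-second step reported in nanoseconds.
duration_ms = ns_to_ms(2_500_000_000)
```

Getting the unit conversion wrong by a factor of 1000 is a classic dashboard bug, which is why it's worth pinning down explicitly.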
These are some of the dashboards that are already public as well; we'll be happy to show you them whenever you wish. So there is a lot of room to work here. We are transitioning to using this in our production instances, working with the community to help them use it and to gather use cases or evaluate scenarios that are important, so we can show a lot of things here. We are trying to prioritize this; mostly, this is how it looks at the moment. I don't know if you have any more concrete questions you want me to answer, but in brief summary, this is the demo I wanted to give. I can't hear you for the moment — you have a sound issue again. I was muted, sorry. Thanks Victor for the demo, that's really impressive. What is interesting here is that while monitoring Jenkins is pretty easy — I mean, you know whether Jenkins is running or not — really having information about how it behaves, whether it can provision nodes, how long it takes to provision nodes, which protocol is used and so on, would be really interesting, especially on ci.jenkins.io, because we regularly switch between cloud environments and we regularly test the latest plugins and so on. So that would definitely benefit the Jenkins community, and I'm really excited to see this. Another interesting area we could work on, I think, is identifying the different scenarios, because I'm sure other people have other questions as well, and if we can build generic dashboards that people can get inspired by, that would be really nice. So I think what we could try is, say once a month or so, to report a little on the improvements we made here and the kinds of questions we were able to answer. That would be really nice. Another thing: I don't think you're using the Metrics plugin? Sorry, say again? I don't think that you're relying on the Metrics plugin. No, no, not at all.
All the information we are gathering here is purely within the OpenTelemetry spectrum: all the SDKs we use to create these distributed traces are pure OpenTelemetry. So the information I mentioned about the Java virtual machine, the garbage collector and so on does not come from the Metrics plugin; it comes from the OpenTelemetry instrumentation, which provides this kind of thing out of the box. But again, this is pretty new — we just merged the PR a couple of minutes ago. Yeah, this is a really nice project and I'm really excited to work on it. So thanks, Victor, for that demo. Any questions? No. So, lucky for you, it was recorded. Yeah, I missed some of it, but I have a few good news items from the CDF side. Sure — when we finish the main agenda, we can discuss those topics at any time. Okay. Then I propose we quickly move on to Discourse. I sent an email this morning to collect feedback. As I said, we deployed it; the company behind Discourse offered to sponsor the project with a business tier, so we have access to quite a lot of things. The service is available on community.jenkins.io; at the moment it only works by invite, as we want to better understand the tool, so if you're interested in participating, feel free to reach out. For me, the main question to solve in the coming days is how we authenticate with that tool. Do we only rely on GitHub SSO? The benefit I see in GitHub SSO is that we don't depend on a third service, and everybody who contributes to Jenkins already has a GitHub account. The other question is whether we also allow people to use their Jenkins account to connect to Discourse, because that's also a possibility. Well, I commented on the mailing list thread, but actually my proposal would be to focus on the Linux Foundation SSO if possible, because we definitely don't want to keep our own LDAP for the long term, as we discussed in previous meetings.
So we want to keep it, but not for every Jenkins user — those can use GitHub, etc. — only for people who need permissions to core systems. So my preference would be to not use GitHub directly, but to use the Linux Foundation, because in that case you also get support for Gmail and G Suite accounts and so on. Not everyone who would be using this Discourse would have a GitHub account. Well, I think it's not hard to create one, unless you're based in a country where you cannot use a GitHub account — for example, if you're based in Crimea, Iran, etc., though Iran is no longer concerned. But having a foundation-level account would be nice. Yeah, that's a good point, that you mentioned Jenkins users and not Jenkins developers. Yeah. So again, it's a preference; I have no idea about the technical feasibility, and I guess Andrew Guggenberg will eventually comment on that thread, so we'll see whether I'm completely nuts or not. But yeah. So last time we had a discussion about using the Linux Foundation authentication system, and that was okay. I'm not sure if we have access to the groups, but those are technical implementation details that I have to see with Andrew. Yeah, I guess our point is that we don't really want to use Jenkins accounts for new systems and services we deploy. Okay, so we kind of have an agreement that we'll try to avoid Jenkins accounts. Okay, thank you. The other thing I would like to clarify is: when do we consider the Discourse instance ready to be used broadly? What would be the acceptance criteria for this? I think we need a core group of champions who would drive the adoption, and this group is already forming on the developer mailing list, from what I can tell. Okay. Then, once you think everything is ready, there can just be a proposal at the Jenkins governance meeting or on the developer mailing list to make it official. Then we just have a common process.
So if there is a consensus, you hold a vote on it and, okay, it's official. I wouldn't really want to reinvent the wheel there; that might be enough, but if you want, you could write a specification for it. But yeah, I think the main point for us is to just evaluate the system and see whether it fits our needs. Sounds like we have an agreement there — that's perfect. So we've covered all the points I had regarding Discourse, and I propose we move to the next topic that was brought here: CDF news. Oh, okay. So one of the CDF news items: I brought up the question about transferring an AWS account. Generally, there is no concern from the technical oversight committee, and Tracy is supportive; Tracy will research and explore what it would take, because we know that Spinnaker already has an AWS account, and if there are a Spinnaker account and a Jenkins account, we will be in the same situation as the CloudBees and Jenkins accounts, in principle, in terms of billing. So we need to figure it out, and Tracy will help us with it. There was no opposition, and yeah, I will be able to drive it on the CDF TOC side. It's yet to be formally announced, but I will be joining the technical oversight committee: there were four of us running for four seats, so there will be no elections this year. Effective July 1st, I will be a TOC member, basically replacing KK in that role. Which sounds like really good news. So, what else? Yeah, nothing else related to the CDF at the moment. So we don't have any estimated time for the AWS accounts, right? Yeah, no estimated time, but we agree on all sides. What is currently happening is what I sent to the infra mailing list: we are unlinking the CloudBees account so that there will be an independent Jenkins account to which we will connect all sponsorships. So this account will be ready for transition. Will we be technically able to add more contributors at this stage? I believe so, but it won't be owned by the CDF.
So there will be somebody who is formally the owner. Maybe it will still be CloudBees, maybe it will be one of the individual contributors who attaches their credit card to it — I'm not sure, that's yet to be decided. But we are in the initial transition stages, and we have initiated the discussion with the CDF. That said, let's set expectations: if we get it done by autumn, that will already be an achievement, so don't expect it to happen overnight. Yeah, I mean, that's great, because it means we would be able to invite people to that account beyond CloudBees employees, which is really nice. We should be able to unblock that once the CloudBees migration completes. And thanks to Ben Walding, thanks to Ray Sullivan and other CloudBees employees — this is being processed quite quickly. Sprint planning and other things apply, but generally it should happen fairly soon. That's awesome. We are running over time for this meeting, so do you have any other topic you want to bring up? No? Then thank you for your time, and I'll see you later on IRC. Goodbye. And thanks, Victor and Ivan. Thanks.