So far today, we've discussed collaboration and metrics independently. Now let's bring them together. Collaboration is about understanding the context of a communication, whether that's a suggestion in a code review, a request for resources, or the dismissal of a security vulnerability. We need to understand what other team members are doing and why they're doing it. As a DevOps platform with a single data store for all of that, GitLab has the data, but how do you approach it? How do you leverage it to break down silos, shorten those collaboration loops, automate with confidence, and ultimately deliver your software more efficiently? In our next talk, Chris Riley from Splunk will talk about exactly that: how to look at analytics from a platform level to build that context and build that confidence. Let's take a look.

The only thing better than working in an environment that has implemented DevOps is working in an environment that has implemented DevOps in a way that is sustainable and visible. You know where you've been, you know where you're going. Thank you for joining me today, and welcome to my session on measuring DevOps success with pipeline analytics. I want to start out with an imaginary scenario. Maybe it's not imaginary for you, but currently it's imaginary for me. Let's say you've got a chance to drive an F1 McLaren race car, one of the most brilliant, fastest pieces of machinery out there. You get in the car, you start to look around, and you get really excited about the idea of just going fast, velocity, because after all, that's what we're all excited about: building applications faster. But once you get in, you realize that you don't have a seat belt. You don't have a helmet. And you look in front of you, and all you see is a country dirt road. Well, this is the way a lot of organizations implement DevOps. They focus solely on application velocity without thinking about what it takes to go fast.
And if you're in a state of transformation today, you need to think about visibility from day one. In this McLaren, are you going to feel very comfortable driving it down the country dirt road without the tools you need to ensure that that velocity is going to be sustainable? My name is Chris Riley. I am a DevOps advocate at Splunk. That basically means that my career as a software developer was not fantastic, we'll say, but I could not give up my obsession with talking about improving application development practices and doing a better job of delivering better-quality applications faster. If you want to find out more about me, you can scan that QR code, get connected to my podcast, and please reach out. I love hearing from people who view my sessions, and if there's something you want to discuss or you disagree with me on, just reach out and let's have a conversation. So obviously we all want to be in that McLaren scenario where we just focus on velocity. That's usually where developers and DevOps engineers want to be, and certainly enterprises want to be able to deliver functionality to their users faster. But delivering that functionality in a way where we don't know where we've been and we don't know where we're going really means that you have a shelf life, or at some point in time you're going to have to stop everything and build that visibility into your environment. So traditionally what it looks like is enterprises have silos. They have silos where developers are really good at visualizing their activity. Maybe it's test activity. Maybe it's Prometheus metrics. They understand really well what they're doing. Then you have the silos with the DevOps engineers, where they may have a good understanding of what's happening in the pipelines that they've developed. They've automated these delivery chains with GitLab, they're moving very fast, and they're able to offer this as a service to their engineering team.
But they don't really understand what the developers are doing, and most of the time they don't have a good understanding of what's going on in production, especially in large organizations. And then on the production side, you have SREs and security professionals and monitoring professionals who understand really well what's going on with the application at runtime, but nothing that came before it. Their understanding starts right then. Well, the problem is, you've seen the DevOps infinity loop, and you know that it's supposed to be an infinity loop. But how do you complete the circle if every team is speaking their own language and understands the information about what's going on in their delivery chain differently? These silos come at a big expense, and I'm sure you've experienced it. You've probably had to interrupt your processes in order to explain how things work in your environment, because somebody further down the chain just did not understand, and they need to understand. Maybe it was in an actual incident, where it's more stressful. Hopefully it was in a scenario where they just wanted to confirm or better understand that you're doing vulnerability scans, for example. So if I can convince you to think about the software delivery chain in a different way, think about your software delivery chain as an application. It's the meta application. It's the application which ships code, and its customers are the entire engineering team, but also the business, because you have to deliver functionality. Well, if you think of your delivery chain as an application, that means you already understand that it needs to be operable, because this is how we manage every application. It needs to be securable. You need to think about security and potential exploits. You need to be able to measure it. You need to be able, at any point in time, to go to somebody and say: this is how we're doing, and these are the things we can do to improve.
This is how we were operating previously. If you don't have a baseline, you have nothing to compare to. That's what pipeline analytics is. And my job today, if I can get you to buy into this idea that the delivery chain is the application of applications, is to make sure that it's visible, and pipeline analytics is the tool you do that with. Why? Well, hopefully I've given you all the reasons why; the ultimate reason is you want to go fast. You want to deliver functionality faster. And the best way to do that is to make sure you understand what is going on throughout your development process. But one obvious thing is that if the delivery chain is down, then no code ships. You can't deliver functionality. So that's why it needs to be operable. Your software delivery chain is part of your attack surface. You do not need to look far into recent industry news to see that delivery chain attacks are becoming more common. It could be secrets. It could be configuration drift that makes it into production and thus makes your application more exploitable, or even more advanced attacks where artifacts are injected into the delivery chain itself so that they become part of the production application. So your delivery chain is part of your attack surface, which makes it everybody's responsibility. As I said, visibility silos come at a huge cost. Reconciling information between teams is a distraction. Nobody wants to deal with it. Wouldn't it be a lot better if each team had their own lens on the same data, so that you didn't constantly need to explain what different attributes of your environment or different dashboards mean? You need to align your software delivery chain to business value. As an engineer, even from a career perspective, it helps you to be able to point out how the functionality you're delivering impacts the business. This is how you grow your career. This is how the business better understands how to better serve its users.
Technical debt: this is a thing that always happens. Well, you will never have an opportunity to address technical debt unless you can identify where it is. It could be in your repositories. It could be aging branches, for example, which is a fairly common form of technical debt and risk. Creating visibility so that you can spot those things is a way to address technical debt sooner and better. And finally, I'm sure you've heard the terms shift left and shift right. If you want to operationalize those practices and put more responsibility earlier in the software delivery chain, you can't just assume that developers are going to take on more responsibility. You have to give them the tools to be successful. One of those tools is being able to see what is going on, so that they're not shifting left or right blindly. So pipeline analytics is a critical aspect of any delivery chain. And I've always been surprised at how few enterprises have actually considered this. But I know why: because in DevOps conversations we usually talk about speed, and we don't often talk about how to pave that road in front of you. The good news is it's not tremendously difficult, especially when you have tools like GitLab. It's not hard to bring this data together and start to create those visualizations. So the remainder of this conversation is really how you do that and why you do that. So, how? Well, the first step is you just need to get this telemetry. The tools in your delivery chain, from your ticketing system to your repositories to your pipelines, all of your automation and on into production, all produce telemetry and data. Different tools will produce data in different ways. Some will do it via logs. Some will do it via APIs. And you also have webhooks. So you need to gather these metrics, collect them, instrument for them, and get them into a monitoring platform.
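As a hedged sketch of that first collection step, here is one way to pull pipeline telemetry from the GitLab REST API in Python. The `GET /api/v4/projects/:id/pipelines` endpoint and the `PRIVATE-TOKEN` header come from GitLab's documented REST API; the base URL, project ID, and how you store the token are placeholders you would adapt to your own environment.

```python
import json
import urllib.parse
import urllib.request


def pipelines_url(base_url, project_id, per_page=100):
    """Build the GitLab REST URL for listing a project's pipelines
    (GET /api/v4/projects/:id/pipelines)."""
    query = urllib.parse.urlencode({"per_page": per_page})
    return f"{base_url}/api/v4/projects/{project_id}/pipelines?{query}"


def fetch_pipelines(base_url, project_id, token):
    """Pull recent pipeline records so they can be shipped to a
    monitoring platform; `token` is a GitLab personal access token."""
    request = urllib.request.Request(
        pipelines_url(base_url, project_id),
        headers={"PRIVATE-TOKEN": token},  # GitLab's access-token header
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Each returned record carries fields like the pipeline status, ref, and timestamps, which is exactly the raw material the dashboards discussed later are built from.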
Now, how you instrument matters a lot. Especially when you have multiple tools and you're correlating data together, you want an apples-to-apples comparison. With GitLab, the API is a tremendous source of information: all the data and telemetry I'm about to talk about here. But also, if you can find tooling that has direct integrations into GitLab and other tools in your environment, then the quality of the data is going to be better, and it's also potentially supported by the vendor. The benefit of logs and APIs is that they're very verbose. They have a lot of data, especially in your CI/CD processes. The benefit of that is, if you're using testing tools that don't expose a lot of data, you can actually surface that data through your CI/CD tool. And that has to do with data resolution, which I'm about to talk about. Webhooks are extremely powerful, but the problem with webhooks is that it's hard to go back historically. Webhooks are going to give you day-forward, really great real-time information at a low volume cost, because it is not a lot of data. You lose some of your verbosity, you lose some of your detail, but it is very useful for real-time data. You may mix and match these. Now, the crazy thing is, this instrumentation is not tremendously difficult. You can already, in your head, kind of intuit how you would build this. But what you choose matters a lot, so you need to be strategic about it. Once you collect the data, if you're collecting from multiple tools, you need to correlate it. So if you want to track, for example, what happens from your ticketing system all the way to production, your CI/CD tool and your ticketing tool are going to be the most critical tools. You have various options for correlating those together, but you need to think about it. And then finally, once you've correlated the data and presented it in dashboard form, you need to observe it over time. Now, that seems so obvious.
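To make the webhook path concrete, here is a minimal sketch of flattening a GitLab pipeline webhook payload into one compact record before forwarding it on. The `object_kind`, `object_attributes`, and `path_with_namespace` fields follow GitLab's pipeline-event payload; the shape of the output record is just an illustration of normalizing tools into a single language.

```python
def normalize_pipeline_event(payload):
    """Flatten a GitLab pipeline webhook payload into a flat record
    for a metrics store; returns None for any other event kind."""
    if payload.get("object_kind") != "pipeline":
        return None
    attrs = payload["object_attributes"]
    return {
        "project": payload["project"]["path_with_namespace"],
        "pipeline_id": attrs["id"],
        "status": attrs["status"],       # e.g. "success", "failed"
        "ref": attrs["ref"],             # branch or tag the pipeline ran on
        "duration_s": attrs.get("duration"),
    }
```

A receiver endpoint would call this on every incoming event and drop the non-pipeline kinds, giving you that low-volume, real-time stream described above.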
But yes, you need to be strategic and create meaningful dashboards that everybody is going to understand. And it is important not to boil the ocean here and think that you're just going to suddenly, out of the box, visualize everything. So this gets to some critical considerations. The organizations I've seen implement this, some very successfully, have run into some very common challenges. First is data resolution. When it comes to pipeline analytics, you're going to get the greatest resolution when you correlate two or more tools together. Individual tools can also give you good resolution, but spanning tools will give you even better resolution. In the scenario I talked about before, if you want to correlate your ticketing system to your CI/CD process, you need to have data related to the pipeline version that you're using, and your commit messages need to include the ticket number, because that's one of the only ways to tie the two together. So you need to think about this stuff. Second is tool sprawl. Almost every organization is in a tool sprawl scenario where they have a ton of tools, and there is a noble ongoing effort to reconcile those. Fortunately, with a lot of the tools out there, you can do that, and the more you limit tool sprawl, the better quality pipeline analytics data you're going to have, because you know that you're speaking a single language. Each of these platforms will have slight variations in their logs and in how they report data, especially when we talk about releases and the types of releases. And the third is information architecture. If you're a developer watching this, this is something that's going to make you grumble, but I promise you it impacts you in a very big way. You need to organize your repos in a meaningful way, by projects. How you name your repos is also extremely important. How you version and name your pipelines is important from the DevOps perspective.
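That ticket-number convention in commit messages is easy to enforce and extract. Here is a small sketch, assuming a Jira-style `ABC-123` key purely as an example; adjust the pattern to whatever format your ticketing system actually uses.

```python
import re

# Hypothetical Jira-style ticket key, e.g. "PROJ-42"; adapt as needed.
TICKET_RE = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def ticket_ids(commit_message):
    """Extract ticket references from a commit message so pipeline
    and deploy events can be joined back to the ticketing system."""
    return TICKET_RE.findall(commit_message)
```

Running this over every commit in a pipeline gives you the join key that ties the ticketing system to CI/CD, which is the correlation this section is about.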
Your information architecture is what is going to determine whether pipeline analytics is easy to implement for you, or whether you have to go through a lot of minutiae and re-architecting of how you organize your repositories, et cetera, in your organization. So if you already know this is a problem, it's something worth addressing today. Some more considerations that will impact your organization: how you think about deployments will inform how you measure them. So, for example, take an organization that wants to measure change failure rate, the frequency with which deployments fail. If you're doing blue-green deploys or canary releases, a failure may not mean the same thing as it does in an organization that is on regular sprints. A canary deployment that fails might not be a failure, because that is part of the process. This bleeds even into the world of feature flags. So how you determine failure is different for pretty much everybody, and in my experience it has been different for every customer I've worked with who is implementing pipeline analytics. So your deployment strategy will influence how you calculate the metrics I'm about to talk about. Automation is a prerequisite. You have to embrace everything as code. You have to embrace infrastructure as code. You have to automate your pipelines. You can't treat your software delivery chain as the application of applications unless you have this automation. And starting with one metric is okay. Organizations are really successful when they start with one meaningful metric that is high value and has a huge impact. Because if you are the one implementing pipeline analytics, getting interest can be difficult. But if you create a meaningful metric that everybody can understand and put it out to the organization, many people are going to want to know how they can get that for themselves. So these metrics can actually be a way to lead people into embracing pipeline analytics so that you can do more. And you can start with one metric.
The most common are mean time to recovery, from the incident response side, and change failure rate. But also application velocity. And there are other metrics that I'm going to talk about and show you that I'm really excited by and actually like better than MTTR, just because I think they're more fun. But you need to think about it. Also, there are metrics that can hurt you. Creating leaderboards for engineering teams can sometimes be problematic, especially if you don't understand what in particular you're measuring. For example, a team building a backend service is going to have a slower release velocity than a team building a frontend service. So just because a release velocity is slower does not mean that the team is not delivering the value that they need to be delivering to the organization. So let's talk about what these metrics are in particular. I mentioned three primary use cases of DevSec... I'm sorry, that's another really big topic, and a topic at this event; three primary use cases of pipeline analytics. The first two are the ones that we generally don't get the most excited about, but they're really important. And all of these are influenced by pipeline analytics. You cannot measure these categories, which are operate, compliance, and measuring success, without pipeline analytics. So the first is operate and monitor. You do it like you would operate any application in production. These are your standard metrics from an infrastructure perspective, especially if you're running your own tooling. So it's going to be memory, CPU, disk, all your compute metrics. You're going to want to keep track of: is it up or is it down? Most enterprises that I see being successful with this actually have status pages for their SDLC, for the tools in their delivery chain, even if they're SaaS products. They will test their availability via API.
And if you are hosting and managing these tools yourself, then you're going to have rate, errors, duration (RED) and USE metrics. So, for example, this could be a secrets management tool, an artifact management tool, all the typical tools that you would find in a delivery chain. The next one is auditing your SDLC. Now, I already gave you a hint of what I was going to talk about a little bit, which is DevSecOps. In DevSecOps, we have three key categories of practices or use cases. The first is building more secure applications. That is code quality: code quality scanning and testing. Second is securing the software factory: keeping track of who has access to your repositories, to your delivery chain or your CI/CD, to your secrets management tool, to your artifacts, et cetera. This is where we start to see a lot of delivery chain attacks. It is pipeline analytics that is going to give you the visibility to spot issues in these areas. And for every tool, you are going to be looking at requests by policy, authentications, where authentications are coming from, denials, overall activity, any anomalous or suspicious behavior. So pipeline analytics is really what you need to support DevSecOps. But measuring success is really where people get excited. This is where everybody on the engineering team starts to get excited, because they start to think about their productivity, but also how their productivity and the application relate to the bottom line of the business. Now, there is a concept out there called the DORA metrics. The DORA metrics are really powerful, but they are not the only metrics you can use. DORA was a research organization that was acquired by Google, and they released these four common metrics, which are deployment frequency, lead time for changes, change failure rate, and mean time to recover.
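To make two of those DORA metrics concrete, here is a sketch of computing change failure rate and deployment frequency from already-collected deployment records. The record shape (`status` and `finished_at` fields) is an assumption for illustration, and as noted earlier, what counts as a failed deployment depends on your deployment strategy.

```python
from collections import Counter
from datetime import date, datetime  # datetime is used when building records


def change_failure_rate(deployments):
    """Fraction of deployments whose status is 'failed'. Remember: a
    failed canary may be part of the process, not a true failure."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["status"] == "failed")
    return failed / len(deployments)


def deploys_per_day(deployments):
    """Deployment frequency, bucketed by calendar day."""
    return dict(Counter(d["finished_at"].date() for d in deployments))
```

Feeding these functions from pipeline or deployment telemetry is one way to start with a single meaningful metric, as suggested above, before growing into a fuller dashboard.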
All of these are very useful to every organization, but they are not the only ones, because maybe you want to audit your repositories and see if you have aging branches, or repositories that are not being used or accessed. So some of the other metrics I like are: work in progress. This is an agile metric, so you can see the throughput, the volume of work going from your ticketing system to functionality in development. Cost of downtime. This is a very real metric that is similar to, and sits right next to, MTTR, but actually shows the exact cost of your outages. Amount of unplanned work. This is such a critical metric, because when a developer has to stop building functionality to do something else (and this, again, is how you support shift left and shift right), it is important to understand how much time developers are spending there. In a way this is a metric of burnout, which is risky for every engineering team and its top talent, but also of context switching, which comes at a huge cost. Context switching can be hugely disruptive, and this is the thing that you want to work to decrease. And aging summaries, as I already said: just analytics on your repos, how old your repositories are, the activity by repo, et cetera. There is a lot out there that you could measure. And again, as I said, you can even start with one of these, and the most common are release velocity, change failure rate, and MTTR. But this is where you want to grow, and these are some of the most common metrics out there. Now I'd like to take a chance to show you some examples. The first example is an MTTR dashboard. This MTTR dashboard has the cost of downtime. This is relative, based on your business. A consumer application is obviously going to have a more dramatic spike when there is an incident. But this dashboard also correlates incidents to activity from the developers, and also shows who is addressing incidents the most: what is, potentially, the highest risk for certain on-call engineers?
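The aging-branch audit mentioned above is a good first query once repository telemetry is flowing. A sketch, assuming each branch record carries a `name` and a timezone-aware `last_commit_at` timestamp, with a 90-day cutoff purely as an example threshold:

```python
from datetime import datetime, timedelta, timezone


def stale_branches(branches, max_age_days=90, now=None):
    """Return names of branches whose last commit is older than the
    cutoff: a common, easy-to-spot form of technical debt and risk."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [b["name"] for b in branches if b["last_commit_at"] < cutoff]
```

Run periodically per repository, this produces exactly the kind of aging summary a repo-activity dashboard can chart over time.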
The number of active incidents you have, the number of incidents that need to be resolved. The next dashboard I want to show you is related to velocity; sometimes these are called your flow metrics. These are the metrics of how fast you are going from your ticketing system to value for the end user, or back, because a lot of the time change is the cause of incidents. So there are rollbacks, and all of these analytics in your delivery chain, your pipeline analytics, help feed context into production so that you can resolve incidents better. So the deployment frequency dashboard gives you an idea of the number of artifacts being deployed, whether they're being deployed, and who's deploying them. It could be by team or by individual; usually it's by team, which gets to the information architecture challenge, but it could be by individual. I would recommend team deployment frequency. You can see over time what's happening there. And I think most of us intuitively want to say we want to deploy more, faster. That's great, but that is not necessarily always the case. This is a view into CI/CD. This is what you would commonly use, as part of, say, a GitLab integration, to visualize pipeline analytics, but also on the repo side. So you're going to look at the number of releases and what's happening in your CI/CD process. You might segment it by environment; that is very common. And in general, you're just going to look at the status of your pipelines. Now, one thing that I've seen organizations do very successfully is version their pipelines. And what the DevOps team will tell the rest of the engineering organization is: you may use the pipelines up to two versions back, for example. So you may use the version five or the version six pipeline, and the visualization will be built all around that. And here is an expanded view of this, but this expanded view incorporates data from other testing tools.
Because, like I said before, your CI/CD processes are a great source for seeing activity from other tools. It could be SonarQube. It could be Selenium tests, whatever you're using to do your testing. You can also get this data in a single view from a single tool. That is ideal. That is where tool sprawl comes in and can potentially be a big challenge, if you have too many tools reporting the same data in different ways. And then finally, maybe we're just looking at merge requests. Maybe we're looking at activity across your repos. This is also important, both from a risk and security perspective, but also to understand velocity. So all of these are examples of pipeline analytics that enterprises are actually implementing, consuming this data to grow their business, to have a better understanding of how their delivery chain is operating, to have a better handle on risk, and also to see how the delivery chain impacts the business. These are all things that you can do that are not significantly hard. One thing I will say is that every organization is slightly different, so there's an 80-20 rule here: 80% of it you can get in a best-practices format, instrument the data, and start to visualize it, but there will be unique cases for every organization in how they choose to consume this information and how they calculate some of these metrics. Thank you so much for joining me today. I understand that there's a ton of information out there. I hope you enjoy the rest of Commit and all the great sessions, and I appreciate you taking a little bit of time with me. And also, thank you to GitLab. Organizing these events is tremendously difficult; I've done it at a small scale for a DevSecOps event, so I understand the effort. Please thank everybody contributing to that. Have a great day.