Good morning, everyone. Do we all have enough coffee? Maybe? OK. So I'm here to talk about my first year at Chef, measuring all the things. I've been at Chef just over a year now, and a bunch of people ask why I came to Chef, what brought me there, and what I used to do. So I'll start by talking about that a little bit.

So who am I? Why am I there? I started out doing a little bit of software development for a hot minute at IBM. I was a software engineer, and then I did hardware, and along the way I ended up supporting those systems a lot of the time. So that was my background. And then I went and got a PhD. Right now at Chef, I spend a lot of time helping the company out internally. I ended up consulting with teams, helping them figure out how to measure what they're doing. Sometimes I help our customers measure what they're doing. But essentially, I end up helping us all figure out how to make things better, and especially how to make things better with metrics, because we can't improve what we don't measure. So the joke is that I rub science on things to make your dev, your IT ops, and your DevOps better with metrics.

So who in here is on the dev side of the house? Who in here is on the ops side of the house? Awesome. Who's security? Okay, so we've got a handful of security. Who's QA, test? And who's here just for the free food? Fantastic. Okay. Who's here for the coffee? Awesome.

As we move forward in our journey, metrics really help us make better decisions, and that thinking is finally starting to move into the IT side of things. For years, business has known this, right? We've all heard metrics around sales and revenue and ARR and TCB and whole bunches of three-letter acronyms that people throw at us. But when it came to software, we all just sort of guessed, or had intuition, or tried to figure things out. So now we're starting to do this around software.

So what did I do at Chef? What have I done when I work with companies in this area? And how can you do it too? Otherwise known as a talk outline: Where do you start? What sort of numbers should you think about? What about benchmarking? And what are some other things you can think about as well?

So where did I start? First of all, you can start at the highest level. Think about what makes sense for you. For me at Chef, they asked me to take a look at the entire company, so I started by talking to several of the executives. If you want to do this for your very own team, or at an individual level, start by talking to your team. If you want to do this across teams, start talking to middle management. Ask around. Ask what's important. What matters to you? What matters to your team? Is measurement happening? Are metrics even a thing? Ask around.

Next up: "But all we do is fight." Or, I mean, talk in circles. One thing that people always forget, or don't necessarily realize, is that metrics provide an opportunity for communication. I'm sure that several of us here have heard of CAMS. Have we heard of CAMS? CAMS is a thing, right? When we talk about DevOps, we know it's culture, automation, measurement, sharing. The really great thing about metrics is that they actually help us communicate, because now we can start sharing what it is that we're doing. So metrics really help us along here. If you still aren't sure, ask what's important. This can be done at any level of the organization.
So ask your team members. At Chef, I started by laying some groundwork. I shared this really, really great O'Reilly short book. It's free, it's all online, and it was written by Hilary Mason and DJ Patil. Since they wrote it, DJ Patil was named Chief Data Scientist for the US, at the White House. It's about 20 pages. So I sent out an email introducing myself to the company, attached it, and said, feel free to read it if you're interested, if you'd like to take a look and see what I'll be doing here at Chef. And I let them know it's about 20 pages, so a quick, quick read.

I also established the Chef Scorecard. This was something that came from my conversations with executives: asking them what was interesting, what was important to them, what their goals were for the year. We called it a scorecard because "goals" and "initiatives" were special words for the executives, right? Words have meaning; definitions matter. So I called it a scorecard. Now, the rule here: this was the important milestones for the whole business, and it had to fit on one page. No cheating on font size. I used to be a professor, right? So I held myself to the same rule. No six-point font. I think I kept it to 11-point font and one-inch margins, because then if I needed to, I could print it out on a single page and anyone could keep it near their desk.

The important thing here is that by keeping every important milestone or goal on a single piece of paper, anyone in the company could figure out whether what they were doing that day mapped to something that was important somewhere in the business. It might even map to a few areas in the business. And this was important. At first a few people panicked, right? Because it might take a little bit of thinking to figure out where what you do maps.

So this is scrubbed, but this is the 2015 scorecard. You can see areas like ecosystem, training, and community. Now, some of these things might be difficult to measure. Totally fine. That doesn't mean they aren't worth measuring. For some things we decided we'd come around and figure out how to measure them later. These are things that are important. It was even interesting: one example here, international infrastructure, came from our VP of legal. I was asking her what was important, and she actually said, wow, thank you, no one's asked me this before. She's a VP. So even just asking everyone what's important to them, at the appropriate level, can be huge. There are examples from finance, marketing, people (that's our HR team), sales and BD, and product and engineering. So I was able to collect all of these.

Now, once we had all of these, what we wanted to do was collect them and measure them periodically. That was the goal. So for each metric you have: define it, so you know what it is you're talking about. Each area was going to be responsible for its own. Set a target; this target may change. Measure it periodically. And then communicate it throughout your team or your organization, whichever level is appropriate. You have a metric, you have a baseline, you know where you are currently, and you know your target, so you can see how you're progressing. Now, keep in mind, your metrics aren't set in stone.
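To make that define-target-measure-communicate loop concrete, here's a minimal sketch of what a scorecard entry could look like, in Python. The Metric record and its fields are illustrative assumptions, not Chef's actual tooling.

```python
# A minimal sketch of the define / target / measure / communicate loop.
# The Metric record and its field names are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    definition: str                 # agree on what you're actually measuring
    baseline: float                 # even a bad baseline is good
    target: float                   # may change as you learn
    readings: list = field(default_factory=list)

    def measure(self, value: float) -> None:
        """Record a periodic reading."""
        self.readings.append(value)

    def progress(self) -> str:
        """Communicate where you are relative to baseline and target."""
        current = self.readings[-1] if self.readings else self.baseline
        return (f"{self.name}: baseline={self.baseline}, "
                f"current={current}, target={self.target}")

# e.g. a scorecard entry for the engineering initiative discussed later
ship = Metric("ship_every_day", "releases shipped per day", baseline=0.2, target=1.0)
ship.measure(0.4)
print(ship.progress())  # ship_every_day: baseline=0.2, current=0.4, target=1.0
```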
That scorecard was our initial scorecard, and a handful of those metrics ended up getting tossed. There were a few things that some of the executives thought were super important, absolutely loved, and after a few months or a few rounds, either they were no longer super passionate about them, or they realized those metrics weren't driving the things they thought they were driving. Which is fine. Another thing: you can start with an MVP, a minimum viable product, and then iterate. Also, as I just mentioned, toss what doesn't work. This is a continual improvement process, right? Treat your metrics like you treat your DevOps.

Okay. The next big challenge: what numbers should I think about? Business has set metrics that it uses. We've been doing accounting for years and decades. I also have a master's in accounting, and I've used money numbers for examples because we know them, right? There's a reason those numbers don't change: we've refined them over decades. Software is different. We're just starting to figure out how to build software, and how to build it in different ways. So what numbers should we be thinking about?

There are three general categories here. Any time you start on your journey, try to think about all three. If you're just barely starting, pick one or two metrics in each category.

The first is external: outward-facing, customer-focused. Now, when I say customer-focused, think of who your customer is. Your customer might be inside your company. Who is it that you support? Think of one or two metrics here. They can be subjective (that's my next point, subjective versus objective). Subjective is something that comes from, say, a survey: someone's opinion. It could be customer satisfaction, even if it's your internal customers. Or it could be objective, something that comes from log files, okay? The second is internal: inward-facing, your process improvement. How do your own systems behave? And the third is culture, because people matter.

Go back to your most-important-things list and pick something on that list. It can include things like maintain excellent customer satisfaction, increase the speed of software delivery, hit your revenue targets, increase software quality or usefulness. This is going to be your external measure. Okay, how does this apply to us? It gives insight into the metrics that are important to each area, it identifies potential external metrics, and it focuses your efforts on value-added metrics.

At Chef, we started with an engineering initiative. So start by identifying a goal. I flew through that Chef scorecard earlier; we started with "ship every day." We wanted to be able to ship every day. So ask yourself: is there existing data? Is there anything I can reference right now? I took a look at the data when I first got in there. We didn't have consistent sources of objective data; that objective data comes from our systems. We had some data, but it wasn't totally consistent, and it wasn't super awesome. We took a look at some of our third-party data, since we were using third-party tools, and dug into some of that. Also not super awesome.

So here's what we did for initial steps. Remember that slide I had with a star just a minute ago, where I said MVP, iterate, and improve? We started by doing some subjective data collection. We went around to every team.
We went around to every team lead, starting with test maturity. We know that test maturity is going to be an important part of delivering code. For people here who are devs: does this sound right? Does this feel right? Awesome. So what we said was, before we start building out this massive tool, which is going to take a whole bunch of time, we're just going to interview all the team leads on a periodic basis and ask: have all of these tests been done? Do you have your unit tests, your component tests, your integration tests, your upgrade tests, your compatibility tests, and your ancillary tests? Have they been done? We did this as a gut check. Interviews were conducted monthly with the team leads on each product, and we calculated the percentage complete for each area and communicated it. How hard is this going to be for anyone in here, if you don't have any data yet, or if you realize that your data isn't very good? Can we do this? We can all pull this off, right? Okay.

Here's what we found. Oh, and these are, by the way, some of the example questions for integration tests. So we pulled together the questions, pulled together the team. This accomplishes a few things, by the way. It socializes everyone to the idea of metrics. It gets buy-in from everyone. It gives everyone exposure and feedback. We did this for several different areas: unit, component, integration, upgrade, compatibility, ancillary. We ended up adding a few more areas. The other thing it lets us do is test which metrics we think are most important, so that when we start asking for support and resources to build them into tooling, we have a pretty good idea of which ones matter. If I just walk in and say, we need metrics for automation and tooling, we need metrics for all of this stuff, build this, how much excitement do you think I'm going to get from people who are like, no, I'm super busy? But I socialized this, and I got data.

This is what we had. Here are our initial charts, just for the first couple of months. Not necessarily super awesome. These have been scrubbed: they are charts for overall test maturity, a composite score across each product or feature, so you won't see product or feature names at the bottom, but they exist on the internal charts. By the way, this is real data, though it's old data now; we now have charts that cover lots and lots of stuff. Here's where it gets interesting, though. I'll come back to these charts.

What about benchmarking? A benchmark is absolutely essential. You have to have truth. Even a bad baseline is good, because it tells you where you are and it tells you where you can improve. So one thing that was awesome here is that those charts are honest and they are true. That's why I'm showing you the super, super early ones: the team was super awesome, and they acknowledged the fact that they had improvements to make. You need to have at least one reference group. You have to have something to compare to. So I'm showing you month two; even month one is good, because you have somewhere to start and then you have something to compare to. You can compare to yourself, back at an earlier time period. You can compare to someone else. You can compare to another team. You can compare across the industry. You can compare somewhere else.
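Going back to those monthly interviews for a second: here's a minimal sketch of how the percent-complete numbers behind those charts could be computed. The six test areas come from the talk, but the yes/no answer format and the equal weighting in the composite score are assumptions.

```python
# A minimal sketch of scoring the team-lead interviews. The test areas come
# from the talk; the answer format and equal-weight composite are assumptions.
TEST_AREAS = ["unit", "component", "integration", "upgrade", "compatibility", "ancillary"]

def area_percent_complete(answers: dict[str, bool]) -> float:
    """Percent of 'yes' answers to 'has this test been done?' for one area."""
    return 100.0 * sum(answers.values()) / len(answers)

def composite_score(per_area: dict[str, float]) -> float:
    """Composite test-maturity score for one product or feature."""
    return sum(per_area.values()) / len(per_area)

# e.g. one product's monthly interview; True means 'yes, that test is done'.
# The question wording here is hypothetical.
integration_answers = {
    "installs cleanly": True,
    "talks to the server": True,
    "handles bad input": False,
}
per_area = {area: 0.0 for area in TEST_AREAS}
per_area["integration"] = area_percent_complete(integration_answers)
print(f"integration: {per_area['integration']:.0f}% complete")
print(f"composite:   {composite_score(per_area):.0f}%")
```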
You also need to make sure it's communicated and visible: to your whole team, to other teams, or to the company. So let's compare, okay? Let's run through that checklist I just gave, for the charts of overall test maturity for each product and feature. Does it have truth? At least relative truth: the teams were committed to being honest. Was it communicated and visible? This was shared to a Google Drive, and they actually reminded all of engineering to take a look. And if anyone else in the company wanted to take a look, you totally could. Is there at least one reference group? Yes. And even a bad baseline is good: they just went ahead and showed each product and feature as it was. Now, this was actually a pretty new product and feature.

I will point out, or you may notice, that there are a few areas where month two goes down. That's because after the first month they realized they were missing an area of tests they wanted to add. So instead of recalculating and renormalizing, they just said, we'll take the hit. We all know what this means. We're fine with it. Again, your metrics aren't set in stone, right? They added a whole new test area even though they realized it would make their charts look like they went down, because it looked like they were missing a whole area. Start with an MVP, iterate, and improve. We're now building in metrics for automated data collection; they started with interviews because they wanted to get a handle, a better systematic understanding, of what was happening. Toss what doesn't work.

So we talked about external metrics, right? We were also going to be tying this to build times and ship rates. That was our external metric. The internal metric is test maturity. We also wanted to tie it to cultural metrics. I am a big fan of the Westrum organizational culture measure, and there are teams now in Chef that have started capturing it on a quarterly basis. We know organizational culture is important. This measure in particular comes from the work of Ron Westrum. He is a sociologist who used to study healthcare and aviation, and some people ask, why does that relate to IT at all? It's because healthcare and aviation are highly complex systems, and this organizational typology predicts what happens when things go wrong in highly complex, high-risk systems. Would anyone here agree that IT is a highly complex adaptive system? Do things ever go wrong? Maybe. So this is awesome, because the thing it optimizes for, the thing that's really important, is information flow and trust.

So, who here has a friend who works in a pathological organization? Low cooperation, messengers are shot, responsibilities are shirked, bridging across teams is discouraged, failure leads to scapegoating, and novelty is crushed. In a bureaucratic culture, you have modest cooperation, messengers are neglected, narrow responsibilities, bridging is tolerated, failure leads to justice, and novelty leads to problems. And in a generative culture, which is performance-oriented and high-trust, you have high cooperation, messengers are trained, risks are shared, bridging is encouraged, failure leads to inquiry, and novelty is implemented.

Now, this is particularly awesome: the Westrum culture measure is predictive of both IT performance and organizational performance. I've been the lead investigator on the State of DevOps studies for the last three years.
And overall, across the whole population, we tend to see about half of people end up in bureaucratic cultures, about a third in generative cultures, and about 15% in pathological cultures. But the high performers are most strongly represented in generative cultures. And by the way, this is a great way to measure it. I know people always say that people lie on surveys, but you can't really get a perception of what it feels like to work on a team from an HR database. You just can't. These survey items are scientifically valid and statistically reliable. You ask people six quick items, scored from one, strongly disagree, to seven, strongly agree, and then you average their scores, with four as neutral, to come up with one measure. You get someone's culture temperature, basically: a reading of what their culture is. And this ends up being a leading indicator: once culture falls apart, technology and tooling start falling apart six months out.

So, a summary of that engineering initiative. They collected three types of metrics. The external metric was shipping every day, among a few others. The internal metrics were test maturity and all of its sub-components. The cultural measure was Westrum culture. And they benchmarked the metrics.

A few things to consider. Subjective versus objective: they started with subjective and are now moving into objective. Leading versus lagging: we didn't chat about that quite yet. Most metrics are lagging metrics. Lagging metrics tell you about things that have already happened; leading metrics tell you about things that are about to happen. Westrum is both leading and lagging, because it tells you how things feel and how they have been feeling, but it's also an indicator of how things are about to go. By the way, one thing to consider is WIP limits. Dominica DeGrandis is here, and she'll tell you that WIP limits are a fantastic leading indicator, because if you don't implement WIP limits, that's a leading indicator that your software delivery, and especially your lead time and your cycle time, is about to blow up. So WIP limits are a particularly good leading indicator in technical work.

So, another example at Chef, with some of our external customers. The goal was increased commercial adoption. Some of our existing data wasn't really consistent. Premium features were being reported just as a count; we had a summary count, which wasn't really nuanced. We didn't have good holistic data here. But our account managers and our success engineers knew that more holistic data would be more meaningful. They were having really great conversations about how you can use Chef, and about other things happening in organizations that could really speak to your DevOps journey. So we started collecting more data. We had several key areas for a more holistic approach. They started walking customers through an assessment that had several steps, and we could then create what we call dojo, the DevOps journey assessment, and present them with a chart. And the great thing here, remember I said metrics are also about communication? Before, we had a count, and it was visible only inside the company. Now we have a DevOps journey assessment that we also share with the customers.
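Going back to that six-item culture survey for a moment, here's a minimal sketch of the scoring: six Likert items averaged into one culture temperature per respondent. The item wording below is paraphrased from Westrum's typology and is an assumption, not the exact validated instrument.

```python
# A minimal sketch of scoring a Westrum-style culture survey: six Likert items
# (1 = strongly disagree, 7 = strongly agree, 4 = neutral), averaged per person.
# Item wording is paraphrased from the typology, not the official instrument.
from statistics import mean

ITEMS = [
    "On my team, information is actively sought.",
    "On my team, messengers are not punished when they deliver bad news.",
    "On my team, responsibilities are shared.",
    "On my team, cross-team collaboration (bridging) is encouraged.",
    "On my team, failure leads to inquiry, not scapegoating.",
    "On my team, new ideas are welcomed.",
]

def culture_temperature(responses: list[int]) -> float:
    """Average the six item scores into one culture measure for a respondent."""
    assert len(responses) == len(ITEMS) and all(1 <= r <= 7 for r in responses)
    return mean(responses)

score = culture_temperature([6, 5, 6, 4, 5, 6])
print(f"culture temperature: {score:.1f} (4 is neutral; higher is more generative)")
```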
So now the DevOps journey assessment is communicated both internally and externally to the company. These are two different customers; the second one is the one I'll use for our benchmarking checklist. It has truth. It's communicated and visible to the customers. We have a reference group, and here the interesting thing is that the reference group is current, where they are now, and future, their goal six months from now, where they want to be. So they set a goal and they decide where they want to be. And even a bad baseline is good: they decide where they are, they decide where they want to be.

Now, remember, this is also about continuous improvement and iteration. I stepped back in. And I am not the only one measuring all the things, I hope that has been clear; I'm only one person, and this is totally a company-wide effort. I stepped in and took a look at dojo. It's a fantastic effort, but there was a lot of potential for bias there. So: iterate and improve. I took the opportunity to expand dojo in a few ways and decrease the opportunity for bias. That meant expanding the categories so that there was less overlap in some of the data collection, and changing the way we collect the data. So now we have dojo v2, DevOps journey assessment version two, which is still communicated and visible to customers. We still collect even bad baselines, but now we have a few extra things. We now explicitly capture organizational culture; it's not rolled into other measures, because we know it's so important. We have technical practices and version control broken out explicitly. We include metrics and decision-making. We have a few extra things in there. So we're continually improving and continually iterating with the teams. And again, we still have at least one reference group: current and goal. We also keep the previous assessments, so we can see them and help customers see their journey as they go through.

So, other things to think about. These are the advanced topics as we run through, for the stats heads in the room. Anyone? No? We're going to race, then. Okay. Measurement targets are distributions of probable outcomes. The variance of your metrics matters, not just the level. Don't ignore the rate of change in your data; by the way, if any of your distributions or graphs ever make a drastic change, take a look. Super interesting. If anyone tells you to normalize your data — have you ever heard anyone say that? Normalize your data? — sure, go ahead, but be super careful, because a bad thing is still a bad thing. A customer is still going to feel a defect or an escape or a system failure. Don't normalize the bad things away.

This one is super huge. Everyone, please pay attention to this one. (This picture's going to be really weird because I'm standing funny.) Distributions in operations and development are never normal. We've all seen the bell curve, right? We know the bell curve. So don't report an average, a mean. And especially don't report a standard deviation; it's meaningless. This is a normal curve; our distributions look like this instead, skewed, which means the average doesn't tell us anything. Use a median. That's our takeaway for today: if you're going to use a number, use a median almost every single time, especially for system and log data. It's almost always going to be a median. And that's it for today.
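To see why the median is the right default, here's a minimal sketch with simulated latency data. The lognormal shape is an assumption, but it matches the right-skew you typically see in system and log data, where the mean gets dragged up by the long tail.

```python
# A minimal sketch of why the median beats the mean on ops data. The lognormal
# distribution here is an assumption standing in for real, skewed log data.
import random
from statistics import mean, median

random.seed(42)
# Simulate request latencies in ms: most requests are fast, with a long right
# tail of slow ones. This shape is typical of system and log data.
latencies = [random.lognormvariate(mu=3.0, sigma=1.0) for _ in range(10_000)]

print(f"mean:   {mean(latencies):7.1f} ms")    # dragged upward by the tail
print(f"median: {median(latencies):7.1f} ms")  # what a typical request sees
# On skewed data the mean lands well above the median, and a standard
# deviation assumes a bell curve that simply is not there.
```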
Okay, so the question is: I have three categories of measures, internal, external, and cultural, and is there anything that passes between them? Like the test metrics? I'm not sure I understand the rest of the question. Oh yeah, so we could totally game the system just by adding more tests — is that what you're getting at? Like, we want to make sure they're good tests. So the thing there: if you only incentivize people with one metric, you know which metric will be gamed. So you want a handful of metrics, a handful that you think are going to impact that outcome measure. But you want to do it as cleanly as possible. Throw in the ones that you think are going to impact it, but do it in a way that you think is going to be useful, clean, impactful, and pull out the ones you don't think are useful. Yeah, you want it to be a good test. Don't just throw in as many tests as you can, which is what happens if your incentive is to create as many tests as possible.

Oh, here's another way to say it: an input metric in one case can be an outcome metric in another case. So if "ship every day" is influenced by input metrics, someone else farther down the road could be incentivized by saying, my output is to create tests. So how do I create as many tests as possible? Does that make sense? And so for that, you would want to say: creating good tests. And what's the input for creating good tests? These can chain; you can chain all of that along. So then you want to find the good input metrics for that, because their outcome is creating test cases, right? A developer farther down the line, their goal is to create good test cases. And the key there is "good." So what is a good test case? Catching as many defects as possible, running as quickly as possible. Does that answer your question?

So the question was, how do you apply Westrum to a larger team, or larger sets of teams? It's best if you keep it at the team level, because people know what it feels like at the team level. But if you want to, you can change the wording, so instead of "on my team, responsibilities are shared" or "failure leads to inquiry," it can be "at my organization."

So, how was it accepted? Because sometimes people are like, whoa. A little bit of both, to be honest. Some of it was, whoa, Nelly, don't tell me the emperor has no clothes — they super didn't want to hear it. And some of it was, oh my gosh, finally, we totally need to see the data, right? And that's kind of why I pointed out, for the engineering initiative, that capturing some measures through subjective means — doing the mini-interviews to collect the data — can be a really good way to do it, because you're essentially socializing things. And you end up triangulating, right? I've been researching this space since 2007, so I had a pretty good idea of what data we needed to be collecting and what hooks we needed to build into the tools. I had a really good idea of what we needed to do. But walking in and saying, I'm the expert and this is what you're going to build? Not a good way to do it. And I didn't do that. It was more like: I think I have a pretty good idea, I think I'd like to see this in our tools, and I don't see it. So they went ahead and did their thing. It was their initiative. Several of them came up to me and said, it's looking like these end up being really good ideas. What do you think?
They said, I would like to get your advice, your insights, on what to build in. And I was like, thanks! I'm excited to hear that the team is ready. This is a great idea. And it actually triangulates really well back into the things I've been finding across the industry. So sometimes people joke, oh, surveys, right? Yeah, except for the most part, people have really good intentions. And socializing it that way also builds exposure, right? Suddenly everyone is used to the idea of collecting some data and seeing some data. It eases everyone into it.

Who's responsible for collecting metrics, at Chef or when I see it in other organizations? At Chef, we have someone in engineering on a special team who helps, but the team leads do too. At other organizations, it can be someone they bring in just to do metrics. Okay, thanks everyone. Thank you.