Yeah, thanks for coming. I hope this will prove to be a very exciting subject. Glad to be here. My name is Ina Stani, and I'm the senior manager of infrastructure services here at Acquia. I was on the ops team from December 2010 through 2015, so I did ops for a long, long time with Ricardo here up front. During my tenure there I formalized the incident response and ticketing process, wrote up a lot of docs in Confluence, and created process around things that, in a fast-growing company, were very haphazard. I also wrote automation tools to manage a very rapidly growing fleet. When I started it was less than a thousand servers; now we're at 17,000 with production and dev, and I think production alone is 15,000. Last year I implemented a Kanban to manage the operations team's work in progress. Kanban works very well because the work is very interrupt-driven. And finally, right now I'm the tech lead for the ops tools team, which is an automation team that builds tools for ops to get their jobs done more easily. I'm also the people manager for tier two operations, soon to be SRE. If you were at Ricardo's talk yesterday, he talked a bit about what site reliability engineering is.

So, metrics, right? What do you usually think about when we're talking about metrics? Usually we're talking about things like utilization and saturation, availability, error rates, and throughput, and we're measuring these things from certain components in your stack. So we're talking about the hardware: CPU, memory, all the basics. Then of course the esoteric stuff in the operating system, like the number of network connections and open files. You might have service metrics, so you're measuring the number of requests you're getting, or cache misses and hits if you're talking about Varnish. And of course, holistically, at the app level: what are the HTTP responses? Are you getting click-throughs?
Are you getting sales, all that stuff? And why are you gathering metrics in the first place? Well, they help you make decisions, and there are all kinds of decisions that you'd make around data. Should a person get paged that the app is down due to certain service-level metrics that you're gathering? Do we need to scale our infrastructure? Are we running too hot in terms of utilization across our fleet? Do we need to revert that last deploy? Are we starting to get 500 errors because there's a bad piece of code that we deployed? Or should we keep that feature? Are we not seeing a lot of requests for a particular API endpoint, and therefore maybe we should just deprecate it, remove the code, and thereby remove liability?

Where's the problem? It's not the whole picture. Humans are involved in the process of building and operating software. They are essential in terms of keeping a service up and the customers happy. So it makes perfect sense that we should be measuring people also. So, people metrics. What could this accomplish?

Let's say you're a manager and you're trying to keep your team engaged, happy, and retained. These metrics enable you to do a few things. You can be proactive about quality-of-life issues for your team: alert fatigue, getting paged too much, toil, lots of manual labor in the day-to-day process, et cetera. It also makes the team's status transparent to the rest of the company, so that people can see and empathize with what you are doing. It allows you to make justification for additional funding for staffing resources.
You can identify that you're taking in more work than you're able to do, that type of thing, and also identify opportunities for process improvement, making your team happier through incremental changes over time.

Now let's say you're one of those people that just picked up a copy of The Phoenix Project, and you see that there's a serious issue in your organization, and you're like, wow, I need to help fix this, because I see that there's a cliff and we're approaching it, and we need to slow down before we go off the cliff. If you're that type of person and you're trying to raise urgency around an opportunity or a problem in your organization, people metrics can do certain things. They allow you to convert anecdotal experience into empirical data; I think that's the number one thing. They reveal the operational cost of current conditions to leadership, so you're able to express things in dollars. They identify constraints in key business functions: where are things slowing down? What team is not performing as fast as the other teams and making the cycle time of a process for the customer longer than you'd like? And finally, they win members of leadership to your cause, so that decision makers are rallying with you, supporting your initiative, and willing to dedicate time, effort, and resources to you.

A couple of things. That little diagram on the right is from John Kotter. He wrote something called the eight-step process for leading change, and he wrote a book called Leading Change; very interesting. The first step is establishing a sense of urgency, so these metrics are all about establishing a sense of urgency.

If you're a business leader, so you're higher level, people metrics can quantify the level of efficiency your teams have in creating value. How efficient are they? How much time are they spending towards what they're hired for?
They allow you to identify what the organizational pain points are, so you can see which teams are struggling and need some assistance. They allow you to be equipped with the essential data to make tactical decisions: do you need to create a new team? Do you need to move resources to another project? And finally, they help ensure customer success, because without a successful customer you're not going to be around.

This set of information is probably the most important in the whole presentation, so take note: complaining about a problem isn't going to work. There's a book by Eli Goldratt called The Goal, and from that book there's this quote: the goal of an organization is to increase throughput, as in the rate of features or widgets going through the system to the customer's hands, while reducing both inventory, the amount of money that's tied up in raw materials and not translating into value, and operating expense, the cost of doing business. So that's the goal, right? You have to communicate with leadership in these terms: throughput, inventory, operating expense. If you can't convey your message in these terms, you're not making it very easy for them to listen, because again, these are the guys that look at spreadsheets all day.

So what can this accomplish? What would influence decision makers more effectively? Let's take a little story of the two bears.
You have the good bear and the bad bear. The bad bear is just going to say, hey, working on team X really stinks, we're always firefighting and doing tickets, and it's kind of disruptive. He's just making a big stink and isn't really able to back up his point. Or you could be the good bear: 40% of team X's time is spent on incident response, and 30% of their time is spent on manual tasks that the business needs. That's 70% of their time not spent on making improvements to the product or streamlining current processes. Out of the two, which one is going to be more successful in getting the change to happen?

So let's get into the metrics. The first metric I want to talk about is the time and effort spent on the four types of work. How many of you have read The Phoenix Project? All right, so a few of you. If you haven't read The Phoenix Project and you're in the DevOps track, or you're watching stuff from the DevOps track, I highly recommend that when you get home you go on Amazon or wherever, pick up a copy, and read it. In The Phoenix Project they talk about four types of work that an IT team does, and I argue it's the same for development. Let's go over them briefly.

The first is business projects. That's new features; that's doing your job as a developer, creating the software necessary to make the customer happy and therefore putting food on the table. The second is internal projects. These are things that improve your ability to do your job, as in cleaning up technical debt or investing in CI and CD, those types of things where you're reinventing yourself and your process. Then you have operational change. With any piece of software that you're operating, you're going to have to do some manual steps in terms of the upkeep of the system. You're going to have to do releases.
You're going to have to configure your stack. You're going to have to provision and instantiate new instances of your application. That happens, and you have to track it. And then finally, the silent killer of every single project out there: unplanned work, a.k.a. outages and firefighting, the big things that make you very, very sad and waste a lot of your time, where you're up till three in the morning and it's not good for anyone.

If we actually measure these things, the quantity and the percentage of each type of work over time, and you show it to the business, they are instantly aware of where their money is being spent, and they can see whether they're getting a return on investment.

So what do you do with this data? First, you can track the amount of unplanned work you have and start taking measures to keep it to a minimum, so you can focus on the things that actually provide value. There are some priorities here. Let's say you're on a development team. The priority is to target maximum time spent on business projects, new features, right? If you need to do some internal work to increase the velocity of the pipeline, that's great. If you have to do releases, that's okay too, but you should probably be investing internal time in order to automate that stuff away. And finally, unplanned work always sucks, and ideally you should never be spending your time on it.

For an ops team it's a little different. You don't want to be spending all your time on business projects, because for an ops team that's like tickets, like tasks, right?
But if you're spending all your time on internal tasks and internal improvements, you could in theory build a series of software systems that do all your tickets for you. You're continuing to improve upon yourself, and you're freeing up more and more time to do projects, rather than the classic IT model, which is: I'm just doing tickets all the time and my job is sad.

So this is fake data. Suspend disbelief for just a few moments, and let's say that this is the amount of time, on a per-hour basis, spent by team X on the four types of work. We can take a look at this data and make some very interesting conclusions. You can see that unplanned (I actually tried to get Google to make this a red line, but whatever) is constantly non-zero for this team. So there's always someone, hour to hour, spending time on incident response or outages or critical failures or whatever. That's a very interesting thing you can pay attention to. You also have business projects. If this were an operations team, we'd be talking about tickets and getting things out to make customers happy. You can see that there are mountains and valleys; it's very unpredictable. Then you have infrastructure change, which is just kind of happening, and then project time, where we're improving on ourselves. There's only a little bit here.

So what would you say about this team right now? Well, here's the percentage. Almost half of their time is unplanned work, a little bit more than a quarter of their time is on ticket work, then you have some scheduled work, and then you have this little sliver of improvement.
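The percentage breakdown described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the `(work_type, hours)` entry shape, and the sample numbers are all hypothetical, chosen to roughly mirror the fake data in the talk rather than taken from any real tool.

```python
from collections import defaultdict

# The four types of work named in the talk. The identifiers are hypothetical.
WORK_TYPES = ("business", "internal", "operational_change", "unplanned")


def work_type_percentages(entries):
    """Return each work type's share of total logged hours, in percent.

    `entries` is an iterable of (work_type, hours) pairs, for example
    exported from a ticketing system's time-tracking data.
    """
    totals = defaultdict(float)
    for work_type, hours in entries:
        if work_type not in WORK_TYPES:
            raise ValueError(f"unknown work type: {work_type}")
        totals[work_type] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {t: 0.0 for t in WORK_TYPES}
    return {t: 100.0 * totals[t] / grand_total for t in WORK_TYPES}


# Made-up sample: one day of logged hours for a hypothetical team X.
sample = [
    ("unplanned", 11.0),
    ("business", 7.0),
    ("operational_change", 5.0),
    ("internal", 1.0),
    ("unplanned", 1.0),
]

shares = work_type_percentages(sample)
for work_type in WORK_TYPES:
    print(f"{work_type:20s} {shares[work_type]:5.1f}%")
```

With this sample, unplanned work comes out at just under half of the logged time and internal improvement at a small sliver, which is the shape of the chart the talk describes.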
This team is not happy. For this 24-hour period this team was really, really sad and doing really, really bad stuff.

Just to make the point even more clear: unplanned work is waste. Unplanned work is, by definition, work that the business did not plan to perform. It is taking money and literally throwing it out the window. Every time you have an outage, every time you are paged in the middle of the night because you have a failure in infrastructure or your app or whatever, you have taken that money, that time, and that effort, and you have just tossed it out the window or set it on fire. Unplanned work is the bane of your existence, and you should work to eradicate it.

So here's a quote from Tom Limoncelli. He worked for Google as a site reliability engineer, and he currently works at Stack Overflow. He's written a lot of books on system administration and, recently, site reliability engineering. This is a quote from him: if more than 25% of a team needs to be dedicated to ticket duty and on-call, there is a serious problem with firefighting and a lack of automation. That's just something to keep in mind when you're gathering your metrics. If you're seeing a good portion of your time spent just handling tickets or dealing with outages, there's an issue that you must solve.

So you have these four types of work, and that can be kind of complicated to track, so let's do something simple. There's operational load. If you were at Ricardo's talk yesterday, he already explained operational load; I'll go ahead and repeat it for those who weren't there. Operational load is simply the percentage of time spent on the upkeep of your service, as in time not writing code or making improvements. At Google they capped that time at 50% for their SREs, and when it's exceeded, the work actually overflows to the software engineering team. So why 50%?
Well, a very interesting thing is this wait-time graph from The Phoenix Project. It's in the back of the book, and it's why I thought 50% was a very cool value. Wait time is the time that a customer would wait for a request to get done. This is plain queuing theory, too. Once you exceed 50%, you start to wait a little bit more for your request to get performed. As you can see, wait time is equal to the percent busy divided by the percent idle. So when you're at 50%, that's one over one, which is one, so you're good, no problem. But once you start going up to 80, now you're at four; if you go up to 90, you're at nine. Once you get into the zone where all of your time is spent on busy work and your team is fully utilized, you're going to be waiting forever. You're going to have a huge ticket queue, people are going to get frustrated and upset and angry, and that's not a place you want to be. So that's why I think a 50% cap is necessary in SRE teams, clearly for their careers and so on, but also to prevent this situation from happening.

So slack is your friend, and I'm not talking about the chat service. What I mean is idle time: time that you have set aside, not earmarked for anything. Slack means that your team can be responsive to bursts of unplanned work without a business impact. How many of you are doing Scrum? Okay, some of you. Are you setting aside a buffer just in case there's a bug or something? You're establishing that? All right, that's good, because without it, you have one thing come into your sprint that you really have to do, and then you've blown your sprint. You didn't complete the stuff you targeted, right?
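The wait-time relationship quoted earlier (percent busy divided by percent idle) is easy to check numerically. The following is a minimal sketch; the function name `wait_factor` is made up for illustration, and the result is a relative wait multiplier, not a time in any particular unit.

```python
def wait_factor(percent_busy):
    """Relative wait time: percent busy divided by percent idle."""
    if not 0 <= percent_busy < 100:
        raise ValueError("percent_busy must be in the range [0, 100)")
    percent_idle = 100 - percent_busy
    return percent_busy / percent_idle


# The utilization levels mentioned in the talk, plus two higher ones
# to show how quickly the wait grows as idle time disappears.
for busy in (50, 80, 90, 95, 99):
    print(f"{busy}% busy -> wait factor {wait_factor(busy):.1f}")
```

At 50% busy the factor is 1, at 80% it is 4, and at 90% it is 9, matching the numbers given in the talk. Beyond that the curve climbs steeply, which is the argument for keeping some slack.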
So slack allows your team to be responsive to bursts of unplanned work without business impact. It also provides the opportunity to improve skill sets and morale. If people have a few hours a week to learn about something new, they're going to be happy, and they're going to be able to bring it into your team and introduce new ideas. It's good. The 20th-century management style of keeping slack lean or non-existent, 100% utilization, squeezing every last drop out of the team, doesn't work. What it actually does is create constraints; it creates bottlenecks. The flow of work and the rate of requests coming into your team can be very inconsistent, so be prepared and set aside some slack.

So let's talk about another metric: happiness. This is something that I've deployed at Acquia, and it's been very, very useful, and I want to share it with you today. It's just the happiness metric. All you have to do is, per interval of time (it could be a week, a month, a quarter, whatever), ask: on a scale of one to five, how happy are you doing your job? How happy are you on your team, doing the stuff you are doing right now? And then, from one to five, how happy are you working at your company? Do you agree with its vision? Are you happy with its policies and procedures, the community, the culture? There are three other questions that are kind of useful. They're not something you can plot on a graph, but they're cool. What makes you the most happy? What makes you the least happy? And what single thing, if changed, would most greatly increase your happiness? Which is kind of like, hey, what do you want me to go fix for you so you stay? That's a good thing to ask people.

So why? Sure, this is all touchy-feely. Why do this? The first thing is that it allows you to quantify the common morale of the team. You're able to do some math and figure out:
Hey, everyone's kind of hanging out at a three or four. What is it that we can do to make things better? And you can see changes in people's morale over time as new initiatives, crises, or other situations pop up, so you can see their effects. You're also able to identify improvement opportunities. If everyone's saying, hey, what would make me really happy is if we had a foosball table, I don't know, and they all suggested that and then you implemented it, you're probably going to make them happier. And of course you're going to be preventing burnout and employee turnover by addressing the serious problems they've indicated. It also provides a safe place for people to sound off on team issues, especially if you allow anonymous submissions. At Acquia, when we started doing this for the operations team, we had a field that pretty much said: you can put your name in here if you want, but we're not actually going to record who submits this. So if someone comes in, doesn't record their name, and is frank and honest about the current situation for them, we have key data, and they can feel confident and trust that we're not going to use that data to some nefarious end.

Employee turnover is really, really expensive. If you have good talent on your team, you should do everything you can to keep them and keep an eye on them, and these types of metrics make it easy for you to do that. This isn't a replacement for one-on-ones and actually meeting with your team, but it allows you to get yourself a baseline and provides opportunities for discussions and initiatives later.

There's a bunch of other metrics that I'll go over briefly, but I wanted to talk about the primary three first, because I think those are the big quick hits: if you implement them, you're going to be equipped to communicate with other teams and your
leadership to make change.

So you have cycle time, which is simply how long a customer will wait for a request. You've got throughput: the requests performed per day, week, or month, that is, how many widgets the team is churning out. Frequency by request type: what should be automated first? If you get, say, 20 requests of one type in a week, that's probably a candidate to look at the process and figure out how to optimize it, fully automated or at least tool-assisted. Frequency by root cause, so now we're talking about incidents: what's causing the most pain? Maybe a particular subsystem, this database or that component of the stack, is always causing trouble. If we can see that's where the pain is coming from, then we know we should probably start investigating that portion of the stack and working on fixing it before anything else. Reopened issues or bugs: how often are defects going downstream? Are there process failures, or is there an issue with a particular piece of software? What's the impact of the current state of your process or your software? And finally, a very interesting one is time spent per customer. Imagine you had everyone track their time and you were able to attribute requests to a particular customer. You could then calculate the operational overhead of having a particular customer, and then work out whether or not they're profitable. Very interesting.

So hopefully I've won you over on this stuff and you're like, man, I want to do this. How do I get started? I will tell you. Six steps. I'm not going to say they're easy, but they're clear.

The first thing you've got to do is track your work in a ticketing system. How many of you don't track your work in a ticketing system? Good, it's okay.
You can talk to me later and we'll hug it out, and we can get you set up. It's okay. It's question one of the Ops Report Card for a reason: if it isn't in a ticket, it doesn't exist. That's really important when starting this. All of your work has to be tracked in the ticketing system, so you have a single place of information to query to get your data. So there are several things. Oh, I duplicated the slide and forgot to remove some stuff, and I'm really sorry about that, so I'll move on.

The second thing is logging time. You have your ticketing system and you're tracking all of your stuff in there. Great. Well, now you should track your time spent on each issue that you work on. For ops and SRE type people, it's important that they track all of their time, so that you're able to see what percentage of their time is actually toil, the stuff that Ricardo and I have talked about. Developers should probably just track their time spent performing non-coding tasks. That's okay, because then you're able to figure out how many dollars or hours are spent on the stuff that isn't coding.

But tracking time sucks. Yeah, it really does, but it still needs to happen, and there are some tools out there that make tracking time a lot easier. Toggl is one of them.
We use Toggl at Acquia. It plugs into Jira, and there are some nice buttons that you can click and some reports, and it makes it a little bit easier for you to track your work. You can also write tools that integrate with your ticketing system to make that much more ergonomic. What we have is a very simple Ruby tool. For any ticket where a comment or a group of comments was made on a given day but there's no time tracking, we generate YAML files and email them to the engineers, and the engineers feed them into this utility. It simply says: hey, on this ticket you made these comments; how long do you think it took to do this work? Then they're able to reconcile, and you get your data back.

You've got to emphasize over and over why time tracking is important. Just saying, oh yeah, you've got to track your time, is not very motivating, and people aren't going to do it. But if you tell them, hey, if you give me this information it's going to be easier for me to get you more staff, more automation, more tooling, and more resources, then they're going to be motivated to do it. And of course you can always bribe them and provide incentives to accurately track time. But be very, very careful: never, ever use time-tracking data as a weapon. Never go and say, oh, if you're not tracking N hours a week then you're not working. If you set up an environment like that, you're going to get bad data and you're undermining your team's trust.

Step three. You've got your ticketing system and you're tracking your time. The third thing you can do is track non-issue data using custom tools. There are time series databases like StatsD/Graphite, and there's also InfluxDB for those who are into that, and they're really useful. You can write code, or basic utilities and tools, to emit metrics into these systems for graphing. And in the worst case you can use Google Forms, and I'll talk about
that later.

So now you've got all your data. You've got your ticket data and your non-issue-tracker data that might be relevant, and now you can make a dashboard out of them and do something very interesting. Grafana is a very useful tool for this, especially if you're using the StatsD/Graphite stack. If you're using Jira, and I know some of us are, you can create widgets for your dashboard, set those up, and get some key team metrics there as well. So get a television and a little Raspberry Pi or whatever and display this data in a prominent space in your office, right where people are walking by and can see it, and make sure that you have some form of basic documentation or a very clear description of what the data means. The reason you want to do this is to generate empathy for your team's current state. I can walk by, look at your team's dashboard, and say: wow, they had a really tough day today. What is it exactly that they do? What type of work are they doing? How can I help? Being able to get the information in people's faces is huge for motivating change from a grassroots point of view.

So you've got all this data and the dashboards, you have some stuff in your face that you can look at. Now you have to interpret it and you need to communicate it. Reviewing the metrics daily or weekly as part of your stand-ups, your agile process, your weekly meetings, or whatever is going to be very useful, because it allows you to start asking questions about particular anomalous metrics. You can say: oh look, there was a big spike of unplanned work. Why is that? Well, on that day there were these incidents. Wow, that was really expensive.
Maybe we should look into that. It makes it possible for you to articulate the information that you're gathering in the form of a story. Back in the slide with the two bears, just saying, oh yeah, I'm firefighting, isn't very useful. But if you're able to say, in the form of a story: hey, this code push caused this many hours of unplanned work this week, and we weren't able to get out those tickets that we promised the customer, that's a really clear story to tell.

Then you can take this data and share it with management. So now you have to approach management. How do you do that? Well, remember, they care about operating expense, inventory, and throughput, so you have to speak in terms of time and money. If you're able to articulate statements like these examples, you're going to have a lot of power: $5,000 of team X's time is spent rebooting servers due to bug Y. Customers are waiting up to two weeks for team X to fulfill requests. It takes one hour on average to perform task X. We need double our usual EC2 costs while bug X is unresolved. So: time and money.

Step six. You can define a target condition, which is pretty much a very elaborate way of saying a goal, and work towards achieving it. Target conditions are something that I learned from a book called Toyota Kata. It talks about the coaching kata, the form in which they have their teams solve problems. What a target condition is: if you have these metrics, you can now say, hey, I want the cycle time of issues being performed for the customer to be reduced from the current state, say a one-week average cycle time, to a half-week average cycle time, and we want to get that done in three months. So it's pretty much just saying: I have a metric, here's where it is now, here's where I want it to be, and here's a time box on when we want to achieve this. So it's a
very interesting effect. You want to set your target condition not so close that it's easily achievable, which lends your thinking to solutions that you already have in your head. You want it just a little out of reach, so that you start thinking creatively about how to make it possible. Then you're able to iterate using the PDCA method, which is pretty much the scientific method, making changes one at a time until you reach your target. Examples of a target condition: reduce the operational load to less than half in six months, or reduce the 90th percentile cycle time on tickets to one week in three months. I have a reference at the end for the Toyota Kata book. It's a good book, and very interesting, especially if you like plant manufacturing for some reason. Maybe I'm the only one.

So okay, seriously, show me how to do these things so I can make dashboards and impress my boss. All right, let's do it, quick and dirty. How many of you write bash, or at least Perl? Okay, we've got some of you. A really cool thing about Graphite, and you probably already know this, is that you can just netcat some UDP packets to the port and start making graphs right now. There's nothing stopping you from writing really simple scripts, or putting little hooks in your code, to send integer or float values to a machine and graph them. Here's a really simple one. If I run this, it will read in a number and then plot it on a graph for me. This uses StatsD gauges: once you set a gauge, it remains at that value until you change it again, which is very useful. There's another example, which is counters.
So, the rate of incidents of certain things. A very common thing for system administrators to deal with is interruptions. People come up to your desk. You build your career around being helpful, being the guy that can answer any question, but there's a consequence, which is that people just come up to your desk and say, hey, you got a sec? Usually you don't, but you don't want to be mean, so you take the interruption. So there was a period of time in which I thought, hey, I'll just run a script every time I'm interrupted, and I can watch the interruption rate over time, and that might be useful data to have. This uses counters, and it just sends a one. As you continue to run it, it will start making spikes at certain periods of time, so you can see when the interruptions are happening. So again, real simple stuff in bash.

I actually have an itty-bitty little demo, so let's run this stuff and see what happens, so I'm not pulling a bait and switch on you. Here's the happiness script. All I did was print a helpful message, hey, how happy are you, and then take the answer in. Real simple shell script. So I'm going to run this. Let's say it was a really bad day and it's a two. And let's say the reason I had a bad day was that I got a lot of interruptions. So I'm going to interrupt myself a whole bunch. Hey, you got a sec? All right, so a bunch of people came up to my desk asking for some help. Yeah, exactly. So here we go. Now we're going to go back. Sorry, tiling. All right, I have a Grafana dashboard that I set up with these metrics. If I refresh this, look at that. I ran this, and you can see, oh look, my happiness is at two. That's bad. And look, I got interrupted. I got a bunch of interruptions.
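The scripts shown in the talk were written in bash and are not reproduced in this transcript. As a rough equivalent, here is a minimal Python sketch of the same idea: a StatsD gauge for the happiness score and a StatsD counter for interruptions, each sent as a single UDP datagram. The host and the metric names `team.happiness` and `team.interruptions` are assumptions made for illustration; 8125 is StatsD's default UDP port, and the `name:value|g` and `name:value|c` line formats are the standard StatsD gauge and counter formats.

```python
import socket

STATSD_HOST = "127.0.0.1"  # assumption: a StatsD daemon running locally
STATSD_PORT = 8125         # StatsD's default UDP port


def format_gauge(name, value):
    """A gauge keeps its last value until a new one is sent."""
    return f"{name}:{value}|g"


def format_counter(name, count=1):
    """A counter is incremented by `count` each time it is sent."""
    return f"{name}:{count}|c"


def send_metric(line, host=STATSD_HOST, port=STATSD_PORT):
    """Send one StatsD line as a fire-and-forget UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    try:
        # Replays the demo: a happiness score of 2 and four interruptions.
        send_metric(format_gauge("team.happiness", 2))
        for _ in range(4):
            send_metric(format_counter("team.interruptions"))
    except OSError as exc:
        print(f"could not send metrics: {exc}")
```

Because UDP is connectionless, the sends succeed even when nothing is listening, so a typo in the host or port fails silently. Checking the dashboard after the first send is the practical way to confirm the metrics are arriving.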
It takes a little while for Graphite to actually flush and persist everything to disk, so when you render the graphs there's a bit of a delay before you get the true metrics. But as you can see, I interrupted myself four times, and there you go, it's right there. And I'm at happiness level two, which is not great. So if you had your team generate these metrics by some means like this, you'd be able to see, as a group and individually, what's going on. I fired this up in a Docker container this morning. It was really easy to set up, so I think it'll be really easy for you folks to do also.

But that Bash stuff was really awful to read, and I might not be a programmer at all, and I really want to do this metrics stuff, but I can't. What can I do? It's not a problem. If you're using Jira, there are lots of reporting capabilities built in. There's documentation and there are plugins, so you can do things from within Jira if you have to. There are also some business intelligence tools. We use Domo, so we actually feed our Jira data into Domo, do some analytics, and get all kinds of information. There's also Amazon QuickSight, which they released last year, if you want to use their service to do business intelligence. And then again, there's Google Forms, and I'm serious. This is actually really useful, and I'm going to show you. I'm going to make a new form and call it Happiness Metric. We're going to do exactly what we were talking about in the slides. So, how happy are you on the team? Linear scale. This is kind of cool. So one to five, right? It's pretty straightforward. And then I can duplicate it. All right, so we've got that linear scale again. Then I'll make a new one, paragraph. Duplicate that, duplicate that again. Okay, so now we just go use it.
They changed it recently. Preview, there you go. So let's say I'm in the middle about what I'm doing on my team, but I'm really jazzed up about what I'm doing in the company. What makes me most happy is puppies and kittens, and what makes me least happy is outages. And what change would most increase your happiness? Unicorns, I don't know. Okay, so I submit this. Your responses: I've got graphs right off the bat. Right? How long did it take me to do this? Maybe a minute, maybe two, of dead air, and we have a means to gather metrics on your team. You didn't have to write a single line of code. You didn't have to do anything elaborate, and you can start getting metrics.

So let's talk a little bit about Jira. How many of you use Jira, again? I don't know if I asked. Okay, so a bunch of you, all right, cool. I use and abuse the Jira API. I do all kinds of crazy stuff; if you're interested, talk to me after. But it doesn't have everything you need. So if you want to mine for some interesting little goodies in the database, there's some stuff you should know. Let's say you wanted to match the comments in an issue with time tracking. You need to look at two tables. There's the worklog table, which is the time tracking table, and then jiraaction, which is where the comments are stored. If you can figure out, okay, here's the worklog for the issue for the day, and here are the comments in the issue for the day, and you can match them up, then okay, that guy tracked time. But if you have items in jiraaction and no items in the worklog, then you know they need to track time. I can't share source code for what I have, but I think that should give you enough to get started.

There's a bit more. Remember that simulated graph that I showed a little while ago? That actually exists on my team. There was no good functionality for generating it in the Jira API, so I actually had to talk to the database directly.
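Going back to the comment and time-tracking match described above, here is a rough sketch of the idea, not the speaker's code. The table names `worklog` and `jiraaction` come from the talk; the column names are simplified assumptions, the sample rows are made up, and an in-memory SQLite database stands in for the real Jira database, whose schema varies by version.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory stand-in for the two Jira tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE worklog (
            id INTEGER PRIMARY KEY, issueid INTEGER, author TEXT,
            startdate TEXT, timeworked INTEGER);
        CREATE TABLE jiraaction (
            id INTEGER PRIMARY KEY, issueid INTEGER, author TEXT,
            actiontype TEXT, created TEXT);
        -- alice commented on issue 100 and logged time the same day
        INSERT INTO jiraaction VALUES (1, 100, 'alice', 'comment', '2016-05-02 09:15:00');
        INSERT INTO worklog    VALUES (1, 100, 'alice', '2016-05-02 09:00:00', 3600);
        -- bob commented on issue 101 but logged nothing
        INSERT INTO jiraaction VALUES (2, 101, 'bob', 'comment', '2016-05-02 14:30:00');
        -- alice commented again the next day without logging time
        INSERT INTO jiraaction VALUES (3, 100, 'alice', 'comment', '2016-05-03 10:00:00');
    """)
    return conn


def missing_time_entries(conn):
    """Find (author, issue, day) where a comment has no matching worklog row."""
    query = """
        SELECT DISTINCT a.author, a.issueid, date(a.created) AS day
        FROM jiraaction AS a
        LEFT JOIN worklog AS w
               ON w.issueid = a.issueid
              AND w.author = a.author
              AND date(w.startdate) = date(a.created)
        WHERE a.actiontype = 'comment'
          AND w.id IS NULL
        ORDER BY day, a.author, a.issueid
    """
    return conn.execute(query).fetchall()
```

With the sample rows, `missing_time_entries(demo_connection())` returns bob's comment on issue 101 and alice's second-day comment on issue 100, which are the people a reminder tool would nudge.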
So if you want to set this up, this is what you have to do. In Jira, for your issues, you create a custom field called work type, with the options business, internal, change, and unplanned. In the Jira database, you have to join the worklog issue ID against the custom field's issue, and then you're looking for specific custom field values mapping to each of those options, and they're going to be stored in some weird representation, because it's Jira. Then, if you sum all of these together by the time spent in the worklog, you can aggregate the time spent over whatever time frame you want for a specific type of work. And then you can push the data to a time series database, like in the Bash scripts. With that, you're going to be able to figure out, hey, over the last 24 hours, the last week, or the last quarter, where have people been spending their time? And that's going to be really huge when you have to go to your boss and say, hey, I think I need more people, or hey, there's something we need to fix. So if you have Jira, you have the means to do it, but it's going to take a little bit of work.

So there's a bunch of books that led to me putting together this presentation, so I'll share them with you. The Phoenix Project is, like, the book. It's the gateway drug to DevOps. If you haven't read it already, please get it, because it's going to really open your eyes to a new way of thinking about doing your job, especially if you're in IT or software development. There's also The Goal, written by Eli Goldratt. It was the Phoenix Project of the 80s.
Actually, The Phoenix Project was modeled after The Goal. It talks about the theory of constraints, and he has his own system around setting the pace of work and things like that, but it's still a really good read and I highly recommend it. There's also The Practice of Cloud System Administration, which is this big tome that talks about architecting systems as well as operating them, and it covers all kinds of interesting stuff relating to toil, tracking your time, and automating stuff away. You also have Scrum: The Art of Doing Twice the Work in Half the Time. That's where I think I got the happiness metric from. And then Toyota Kata, which I talked about, which covers the PDCA method. And finally Kanban, for those that use it. Very awesome book, especially for interrupt-driven teams that need to be able to track their work and can't put things in a sprint. I also added the links in here so you can get them after I share the slides.

So, yeah, I hope you really enjoyed this presentation. I really appreciate you coming. Yeah, someone at Acquia actually photoshopped my face on that. That's really cool. I just want to motivate you folks: if you're in a situation where you see that there is a problem and you don't quite know how to articulate it to people in order to make a decision or a change happen, well, congratulations, you now have those tools. Go forth and conquer. Questions? And please... So, yeah, I know you probably have a bunch of questions. Just make sure you queue up at the microphone so we can record it and people can hear.

Yes, thanks. Very useful. Thank you. So my question is, how would you implement these things for small teams? Because, you know, my team is like seven people. Sure. So when you bring in this information, it's so easy to tell who was not happy. I mean, you figure these things out very easily when you have such a small group, right?
So it's how you communicate these things and how you use this information. That's one question: how do we make these things more applicable to small teams? And also, in small teams, I have seen this culture where people are a little bit loose with their time, right? It's not like big operations at Acquia. Sure. You know, people are more loose, but they are also more flexible when you need them. So how do you manage moving people from being so loose into doing a lot of time tracking and so on?

Sure. So you have to approach it from two angles. You have to give them an incentive and a reason to be willing to have the discipline to track their time. At the same time, you're going to want to understand what tools you already have in your arsenal and how you do your work, and build tools and process to enable them to do so. If I just go up to my team and say, hey, you're going to track your time, you're going to use this product, they're going to say no. But if I say, hey, if you do this, we can get more headcount, or if you do this, I can go to my boss and say, hey, there's a problem, or if you do this, I will give you a hundred dollars if you have tracked all your time in the week and I roll a d20 and your number comes up. It might take a little bit of bribery. But you have to continually and consistently communicate the reason why. When your team understands the purpose and the goal behind it, and knows that this is a tool for you, their boss, to enable them to be happy, they're going to be motivated to do it, because it's like, okay, I'm going to give him what he needs for me to be successful as an engineer. So you have to set that up. Now, for small teams.
You're asking about small teams. I mean, if you are in a situation where you have a culture of distrust, where they're like, look, I'm not going to submit my true feelings because that might single me out, well, first, that's something you have to fix. You have to work on building relationships and building an atmosphere of trust so that you can talk openly about these things. And while that work is in progress, you could take the arithmetic mean of the numbers, or take very basic themes out of the text submissions and say, hey, here are some correlations in the data, here's the general tone. And it's really cool to share a digest of the data with the team as a whole, because first, you're saying, look, I have paid attention, I spent enough time to put the data together and tell you. And it also gives them an opportunity to maybe reiterate points they weren't able to clearly convey. So it's a bit on the culture side and a bit on meeting them in the middle with the processes and tools you have. It's not easy, but it's definitely well worth it.

Ricardo? It's kind of touching on his question, and asking a question to Amin. So this is all a part of culture. If you don't start with the culture, this will be really, really hard to implement. And the culture starts with honesty. People have to trust you. I noticed you're probably a project manager or a manager at that level. People need to trust you, honestly. Amin has been on our team, he's been a leader, everybody trusts him, and when Amin says, hey guys, this is really to help you, we don't have a single point of distrust to point to in the past that would say, no, he's not trying to help us. No, he is trying to help us. So this worked because, your manager, you can trust them, right? And you have to build those bonds of trust. You have to show your team that they are a part of the solution, not the problem. If there is a problem, then process should fix
it. Process, automation, all of those things that will remove toil and unplanned work from the way, because sometimes people don't work because they are blocked, and your job is to unblock them, right? So my question to Amin is, how were things in the beginning? Before we had this, how did things work?

So before Kanban, prioritization was based on the loudest screamer. And there were a lot of discussions in unseen places to try to get people's time to work on things. With the lack of visibility, people just assumed that ops were lazy, or they didn't care, or whatever. Having an opaque team, a team that's just like, file a ticket, and having that standoffish relationship between the customer-facing teams or the stakeholders and ops, really eroded confidence both ways. Now, by having data and being transparent about your day-to-day work, you're able to start creating a dialogue and to start having opportunities for empathy, where people can see, wow, these guys are really doing a good job, it's just that there are all of these impediments. So you're inspiring those folks to find opportunities to help you, because by helping the team, they're helping themselves, by unblocking them or enabling customer success or any of those things. Yeah, no worries, man. Thank you. Sure.

Well, sure. I mean, that's why in the happiness metric three questions were non-numerical values. So sure, the happiness in a team and the happiness in a company is a real general barometer of, okay, where is people's general sense? And that can change week to week depending on what's going on, right? So it's kind of a loose metric, but when taken as a group, you start seeing some very interesting trends over time. Now, the other questions that are text-based,
I mean, it's tantamount to a suggestion box. You can look at those and start grouping them into themes, like, this is about how meetings are being conducted, and this is about how early I have to wake up and come into the office, or the commute, or whatever. By looking at that and finding commonalities, you're able to put together a series of continuous improvement opportunities where, over time, you're going to be able to make your team happier, or at least reset expectations about what you are expecting as a manager, so that they can at least know if they're in the right place. These questions are now asked by the whole company on a quarterly basis, which was really cool. They started doing this and gathering metrics, and we're able to actually go from team to team and say, okay, this team feels this way and this team feels that way. So it's being done quarterly. Also, on the ops tools team, which I lead, we do it every two weeks. As part of the retrospective process of a sprint, we ask, you know, one to five, happiness, real straightforward. That way you can see the correlation between what happened in the sprint and the current morale, because there might have been a story or a piece of unplanned work that really burned them or something.

Thanks for the talk. Thank you. My question is, at a really high level, when you were rolling this out, it went from kind of throwing stuff over a wall and not tracking work to this. What happened, and in what order? At a really high level, how did the rollout happen?

Yeah, sure. So the first thing we did is we established Kanban, because we needed to get control over work in progress and be able to visualize the basic metrics of what the team did. We're talking about ops, right? So the thing that people were going to be concerned with was, how many tickets did they do?
Because if you know that on a day-to-day basis, for Kanban you can say, okay, how many more tickets can we replenish the queue with? Because you have to limit work in progress on a Kanban board. We started doing that in April 2015, so a bit over a year ago, and that was pulling the wool off of a lot of people's eyes, because they started to realize, man, we don't have a dedicated team that just does our stuff. We have a team that is shared among multiple stakeholders with multiple perspectives, multiple goals, and multiple products, and now they have to communicate with each other to figure out what's most important to the business. So that was step one, just doing that. And we were graphing the Kanban throughput, and we were like, hey, look, there's a big dip. What happened? That led to the four types of work. So we started metricking that: we added the four types of work to the tickets and started classifying them. We monitor the count of tickets that are created as well as the time spent on them. And then we have some tools that remind people to track their time on issues that we find haven't been tracked for a given day or whatever. And then we have this nice little graph where things are crossing and going around, and you can see relationships. That was huge, because I was able to take that graph and share it with people in engineering and people in leadership, like, look at this, this is where the team is at. As a result, it created a huge sense of awareness: okay, now we know this, now we know what we need to do in terms of taking action. So it started with just how we interface with other teams, and then we started gathering more and more data as the need and opportunity came up, and that allowed us to tell a clear story. Thank you.

So we have a little bit more time, I think. Any other questions? My time is your time.
I want to make sure you get every last drop of value out of this talk. Sure. So the question was, what are we doing with the Jira API? So I will tell you. Of course, we do some basic Jira API calls for, hey, are there any issues that are like fires we should pay attention to? We can graph that. Real simple stuff. But there are a couple of projects that came out of the Jira API. The first one was a tool-assisted automation mechanism, where we had a REST API that you could call to do certain things, and then we had a client and a chat notification saying, hey, there's some stuff in the queue. It reduced some of the work into a yes-or-no question: should this change happen or not? And if the engineer said yep, it just took place. So it reduced all of the toil from, okay, I got a ticket and have to figure out what was supposed to happen, then get the docs, and then do it, to: this request is a valid one, and then it simply implements it and closes it. That increased ticket throughput and eliminated a lot of the work, by like 10 to 15 percent right off the bat after we launched it.

The second thing is something we're working on right now. In Jira, of course, you can set required fields for each ticket type, but we felt that made it kind of cumbersome, and it isn't like Jira configuration is version controlled. So we did something very interesting. We created a JSON Schema-based web form. We can create pretty much a bill of materials for ticket types and throw them in a folder, and it would render a form saying, okay, this is the type of work that you're asking for, and here are the required fields you have to fill out. It did validation in client-side JavaScript, and then it did validation server-side in the app.
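To make the schema-driven form idea concrete, here is a small sketch of the server-side check. It is not the speaker's tool: the ticket type and its field names are invented, and the hand-rolled `validate` function only understands the `required` and `type` keywords, so it is a stand-in for a real JSON Schema validator rather than a full implementation.

```python
# Hypothetical "bill of materials" for one ticket type, written in
# JSON Schema style. In the setup described in the talk, files like
# this would live in a version-controlled folder.
DISK_RESIZE_SCHEMA = {
    "title": "Disk resize request",
    "type": "object",
    "required": ["summary", "environment"],
    "properties": {
        "summary": {"type": "string"},
        "environment": {"type": "string"},
        "server_count": {"type": "integer"},
    },
}

# Only the schema types this sketch knows how to check.
TYPE_CHECKS = {"string": str, "integer": int}


def validate(form, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in schema.get("required", []):
        if form.get(field) in ("", None):
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        value = form.get(field)
        if value in ("", None):
            continue  # absent optional fields are fine
        expected = TYPE_CHECKS.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly
        if expected and (not isinstance(value, expected) or isinstance(value, bool)):
            errors.append(f"{field} should be of type {rules['type']}")
    return errors
```

A submission that passes would then be turned into a Jira ticket; one that fails gets the error list back so the requester knows exactly what is missing.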
So what that meant is you're clearly telling people, this is what we need in order to do the work, and then we're validating that the information is complete via the schema. And then we actually reached out to Jira, made the ticket, and filled out all the forms for them, all the fields and all the stuff that people don't want to do, and then we 301-redirected the person to that ticket. What that did is make it very, very easy for people to ask for stuff to get done without thinking about the process, because we don't want to burden people with, you have to go into Confluence and read this page, and then once you fully understand it, you can file this ticket, or we're going to yell at you. I mean, that doesn't work, and it doesn't serve the customer well. So those are the things we're doing with Jira. And yeah, the Jira API is well documented and it has served us quite well. Anything else?

All right. Thank you very much for your time. I'm going to leave this on the desk because they asked me to leave it. Thanks, man. I'm happy you