One reason is you have to get through me to get to lunch. It was somewhere around midnight, halfway through a nasty data center migration, when the exhaustion began to take hold. Hi, I'm Mervo, and I'm the least interesting part of this presentation. This is Fear and Loathing in Systems Administration.

But first, a little housekeeping. This talk isn't endorsed by anyone; this is just me. I'm just a boy, standing in front of his community, asking it to love him and make things a little better. Will I take questions? Yes, but after the show. I only have 30 minutes, and I'll probably run long anyway, so if you really want to talk, I encourage you to propose an open space. If you heckle, if you must heckle, you have to be funnier than me.

So, of the attendees, how many people are first-time DevOpsDays participants? Wow. If nobody else has said so: welcome. Welcome to our community. Of the audience, and the people who are paying attention to me, how many would, show of hands, call themselves systems people, ops people, network, storage? That's pretty good, actually. How many would call themselves developers, software engineering, architecture? Likewise, good participation. The often neglected but very technical: DBAs, security, things like that? A small but noticeable amount. How many people are product or management? Perhaps formerly technical, but now they deal with other things. I'm kidding; that's actually some of my favorite work. How many people are straight-up business types? I'm talking financial, legal, HR, administrative. Yes. Any more? The best thing about DevOps is that it's the entire company, and these people are a part of DevOps too, even though they're not listed in the name. We just really don't want to make the name any longer than it has to be. So, is there anything I missed? Any roles that I missed? OK, don't say anything.

Given the many, many conference talks out there, you might be expecting to hear lovely stories about sunshine and rainbows and happiness and cooperation and so on and so forth from our favorite unicorn companies. These stories are fantastic, and they give us hope and a goal, but they seldom describe the struggle or the path. They describe what they have now, or at least the idealized now; they describe where they came from, and probably a key change that they made. There are some missing steps. There's also usually enough content to participate in the DevOps drinking game. So instead of talking about the glorious utopia that is the DevOps promised land, I'm going to talk about the first steps of the path.

Now, one final point in my preamble: I'm going to be moving pretty quickly. So here's the safe word for my talk. If at any time you're feeling overwhelmed, please say the safe word, clearly, and we will take a break. So let's get on with the show.

There's a Chinese proverb that says: the best time to plant a tree is 20 years ago; the second best time is now. The number of times I hear this phrase, "DevOps doesn't work," is amazing. The best thing about this phrase is that the people who say it are almost completely right, even if for very wrong reasons. Who says this? I'm going to use the word sysadmin as shorthand for a very broad group; I get a lot of feedback about that. Other times I'll say operations or ops engineering, things like that. I'm talking about largely the same group of people. The people in this group have widely varying titles, but it's most commonly a combination of systems, networks, or operations with administrator, engineer, technician, or analyst.
And even though the suite of names is shared, the actual day-to-day job can vary so widely that the people doing one job might be completely incompatible with another. In some places, they have the best computers; in other places, the worst. The workspace probably isn't very nice and almost certainly has no natural light. If there's a pager rotation, they're certainly on it. If there isn't a pager rotation, they're on call all the time. The work is largely invisible until something fails, in which case it's the talk of the town. Define: operations. And if anybody's keeping track, the mean time to thought leader is: now. The typical day looks something like this. But if one word could be used to describe these people, I think we could all agree it would be cynical.

So, who has ever had a party where management says, "We launched a thing and it's awesome. We're going to have a party to celebrate"? Who's ever been stuck behind, keeping that success breathing? A lot of people. It's typically your systems and ops staff who are keeping it running, so for them, it doesn't look like a success.

You may be going about the course of your day when you see a new project announced, and they begin to hire some more people. So you go, "Well, that sounds great. Who's going to support this thing?" The response is probably a blank stare. "You are, of course." The "of course" might not be vocalized, but it's probably there. So you think, great, how many people do I get to hire to cover the added support workload? The response: "Ooh, sorry, there wasn't anything left in the budget. Come on, it won't be that much more work." Or a variation of the team player, good soldier speech. Anybody receive that? If you haven't, count yourself lucky not to know what that is. Moving on. Or ever had your request for training or conference budget rejected out of hand, or because another group used it all?

These people probably have a basic working knowledge of half a dozen programming languages, even if they don't use any of them full time. They most likely think in shell. They probably know at least three ways of testing whether a TCP port is open, and they probably have a soft spot in their heart for a couple of terminal commands. I'm not kidding. They have no patience for people who won't try to help themselves.

They may have seen or participated in a DevOps initiative, but let's be honest: that largely consisted of a team or a position rename. Or they've been "helped" when someone suggested that you should install some Chef and some Jenkins and some monitoring so "we can DevOps now" or "we can Agile now." When they hear DevOps or Agile, what they're really hearing is: now let's take the same people who can't handle a planned release schedule, or make whatever effort they need to squeak by the change board, and let's give them unfettered access to production. Because clearly, I'm not paged often enough.

When bringing a problem to management or engineering, they're likely to be told, "But that means the developers will have to do additional work. I don't want to give them more work. Can't you just deal with it?" What you hear is: well, you clearly think they're more valuable than we are. And you come to expect that this is what's coming. Why is this meme so pervasive? Because this attitude is pervasive. It's not intentional. Nobody's going to go on the record and say it, but actions speak louder than words, and these things become patterns when they happen often enough. See if you don't start becoming cynical. So, what is one to do?
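(As an aside for the curious: that port-check trivia is easy to make concrete. Here's a minimal sketch in Python, with a hypothetical host and port; two of the shell-based classics live in the comments.)

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Two of the other classic ways, from a shell:
#   nc -zv example.com 443
#   bash: (echo > /dev/tcp/example.com/443) && echo open
if __name__ == "__main__":
    print(tcp_port_open("example.com", 443))  # hypothetical target
```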
How do you maintain your sanity in the face of increasing job scope, increasing demands for access and security, and little hope for additional head count? Not to mention continuing to juggle the existing volume of requests and continuing to keep the gears greased so that the machine keeps running.

Now, please note, I'm not saying "just." There's nothing simple about this. There's nothing just about the situation, and there's nothing simple about it; justice hasn't been involved in a long time. Making these changes will be difficult. They will take work, and you're going to have to convince other people and other teams to get on board. But the first step towards getting help is admitting you have a problem. And the problem isn't technical; it's almost entirely social.

Because the sysadmins are typically responsible for the environment, the easiest way to ensure that its state stays maintained and stable is to lock everybody else out of it. Now, by doing that, you've met the goal of keeping out unexpected changes, but it has a number of side effects. Aside from decreased velocity, a kind of learned helplessness sets in. Your customers and your teammates become so used to being hands-off that they don't know what a reasonable level of effort is to expect. Some teams settle on hands-off production, but you can make changes to staging and whatever else you have. This is fraught with peril. Since your customers are uncomfortable making any changes, all changes still end up being made by sysadmins, who are frustrated because they're doing work their customers have every right to do themselves. This leads to your time being taken up by a lot of low-value tasks.

The most common problem here is configuration drift. Config drift is when you have different settings in one environment versus the others. When the cost to discover what production looks like is high, like having to go talk to those bitter, cynical people, people are more likely to just take a guess, use defaults, make assumptions, or use the settings in their IDE. "It works on my machine" indeed. Again, this is well solved by config management tools, but you need to be willing to trust your peers and give them access. If you want to be part of the process of validating changes, put in place a pull request and code review system. Not even a system, just a workflow; let them into it. This is something your software engineering peers should be very accustomed to. Granting access to see the existing configs and the ability to propose changes also shares the responsibility for your team's environments and contributes to feelings of ownership. Denying your colleagues the ability to make necessary config changes contributes to the problems of config drift and learned helplessness.

You need to stop feeding the machine. Your value isn't in doing the work, but in being able to decide what needs to be done. And I'll be the first to say that "automate all the things" is a flawed goal. It's usually said in the context of a project rather than a philosophy of continual improvement. While I agree everything should be automated, not everything should be automatic. Certainly some companies are there, but the vast majority aren't; these are common problems I see everywhere. Decision making is complex, and trying to codify all the possible decision trees is a fantastic way to make yourself insane. Not to mention documenting your decision-making process might be an unwanted look inside your brain. That may just be me.
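(To make the config drift point concrete, here's a minimal sketch of a drift check in Python, assuming flat JSON config files per environment; the paths are invented for illustration. It's a toy, not a replacement for a real config management tool.)

```python
import json
from pathlib import Path

def load_config(path: str) -> dict:
    """Load a flat key/value config file (JSON here, for simplicity)."""
    return json.loads(Path(path).read_text())

def drift(reference: dict, other: dict) -> dict:
    """Return keys whose values differ or are missing between environments."""
    keys = reference.keys() | other.keys()
    return {
        k: (reference.get(k, "<missing>"), other.get(k, "<missing>"))
        for k in keys
        if reference.get(k) != other.get(k)
    }

if __name__ == "__main__":
    prod = load_config("config/production.json")   # hypothetical paths
    staging = load_config("config/staging.json")
    for key, (prod_val, stage_val) in sorted(drift(prod, staging).items()):
        print(f"{key}: production={prod_val!r} staging={stage_val!r}")
```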
But you shouldn't have to engage in an automation project to improve your environment. Build some time into your schedule to review, propose, and make changes towards a better environment. Pick something that's rough, manual, and repeatable, remove a small piece of friction, and move on to the next one. See, an automation project, my friends, is ocean boiling if I've ever seen it, and I'm really good at ocean boiling. Ideally, you should have time blocked out for continuous improvement. If not, create a meeting or a weekly project: review the issues you've experienced lately and pick something to fix. Make this a habit.

Whatever you don't automate has to be documented. Your brain doesn't count, and neither do post-it notes. It has to be something shared that people can see. Now, beyond the typical benefits of documentation, like the bus factor, it also serves as functional requirements for someone to pick up when they can help you make things better later. You have to try to recognize whether documenting or automating takes longer. Perhaps a piece of documentation would be better served as executable documentation, meaning code.

You should attempt to pick apart the pieces of your work and attempt to describe them. A way to make this a fun exercise: use other job titles to describe your role. Code cleric, or release ranger, et cetera. Or are you an internet plumber? How much legacy-systems spelunking do you have to do? Could you describe your role as a production customs official? I mean, are you the gateway to production? And if so, are you really equipped to do that? A quick test for that: if you say no, that can't go live, do you get overridden? Because if you do, that's not your job. More importantly, is this what you want to do?

Before you can ask for help, you need to prepare. It's no secret that most sysadmins do not have a healthy relationship with the rest of the business. You'll need to initiate the healing. Take someone to lunch, preferably someone you don't know well. Ask questions and listen to the answers. This isn't the time to defend yourself or your team; it's time to find out what the business needs from someone else's perspective. Ask what they think your team's role is toward achieving that success. Align to business goals. Ask them what they think your team does well and what needs improvement. You'll probably recognize the words, and you may even use the same ones, but in different contexts, even within different divisions or teams, the same words can have vastly different meanings. You need to go out of your way to speak in their terms. To communicate your message, you have to speak on their turf. Now, this may seem terribly unfair. Why can't they meet me on my terms? But I'm guessing that hasn't been a winning strategy so far; otherwise, I wouldn't be giving this talk.

So, not only do you need to use their language, but you need to communicate over their medium. Probably not IRC, and sending an email is a really good way to get it ignored. If you're speaking to management, you need to write a slide deck. Executives love presentations. It doesn't have to be slick. It shouldn't have any sounds, and a minimum of transitions, if any. GIFs are fine, trust me. This can help lay the groundwork for a conversation. So, now that you've described your role, we need to describe everything that you support. It's entirely possible, likely even, that the people and teams you support don't actually know what you're responsible for.
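(Here's a sketch of what executable documentation might look like, for a hypothetical runbook step of purging old log archives; the path and retention period are invented. The point is that the procedure and its documentation are the same artifact.)

```python
"""Runbook: purge old log archives (hypothetical example).

This used to be a wiki page saying "delete anything in /var/log/archive
older than 30 days." Now the documentation *is* the procedure.
"""
import time
from pathlib import Path

ARCHIVE_DIR = Path("/var/log/archive")  # assumed location
MAX_AGE_DAYS = 30                        # assumed retention policy

def purge(dry_run: bool = True) -> None:
    """Remove archives past the retention window; dry-run by default."""
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for f in ARCHIVE_DIR.glob("*.gz"):
        if f.stat().st_mtime < cutoff:
            print(f"{'would remove' if dry_run else 'removing'}: {f}")
            if not dry_run:
                f.unlink()

if __name__ == "__main__":
    purge(dry_run=True)  # flip to False once you trust it
```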
It could be argued that most of them shouldn't need to know. But if you've been saying no to protect yourself, it's a sign that you're significantly overtaxed. You need to have a real discussion with your leadership about your role, scope, and staffing. In order to have this discussion, you need to prepare. You need to come up with a fairly comprehensive list of the products and teams that you support: every team, every product each of them makes, and the components and tasks that belong to you for each. Now, don't forget about all those components that nobody owns, but that people somehow keep coming to you to implement or fix. Most commonly, what I've seen is CI, the source code repository, any ticketing or project systems, wikis; these are things that nobody owns, but, well, "fix it." Are you also responsible for directory services, the virtualization platform, mail, chat, phones? Do you do workstation purchasing? Do you have to deal with printers? Do you manage the storage and networking layers? Seriously, get into detail. This is not the time to skimp.

Here's the thing. Your leadership might not know what LDAP or directory services are, but they'll understand if nobody can log into their machines, they can't pull information to build reports, and, by the way, code can't deploy, because that credentialing thing is kind of important. Are you sure you understand what the company's goals are? Would your CXOs agree with your take on them? So, what do you need to succeed? How much more staff do you need? What tooling or equipment do you need to help you work more efficiently? Does code get deployed even when tests fail? And how many outages has that caused? Because, let's admit it, that happens.

In order to have a meaningful discussion with the people in your company who aren't necessarily technical, you need to relate in a language that they speak. Regardless of team duties, the lingua franca of most businesses is money. As engineers, we prefer to think in terms of the tech itself, but in order to describe an impact, everybody can translate to money. It's a helpful, if difficult, and important habit to get into, and I encourage you to consider the components of cost that go into every incident or task. What's the cost of a main-site outage? How much revenue does this feature bring in? Why are you spending so much on infrastructure and effort to make this component highly available? Why does it matter that you do that piece of maintenance?

You have to show the negative value of doing things the way they are, or the opportunity cost versus the time invested to improve the situation, including providing automation, delegation, and self-help tools. Describe how doing this maintenance work reduces your context switching, unplanned outages, and lost reputation for your company. Describe the benefit of enabling technologies: the increased visibility the business gets and the agency other teams gain to do their own work. If they claim they're doing agile but they can't do continuous delivery, they're not doing agile. The whole point of that framework is to improve delivery of value to the customers and the business.

Further, you need to show how systems relate. It doesn't have to be terribly detailed. Describe how the features that customers use rely on X, Y, and Z pieces of infrastructure. Draw the lines from LDAP to storage to your CI to testing to artifacts delivered to production. Then show all the other systems that have those same dependencies.
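(A back-of-the-envelope sketch of that translation to money; every figure here is invented for illustration.)

```python
# Back-of-the-envelope outage cost: all figures are hypothetical.
revenue_per_hour = 12_000          # main-site revenue, $/hour
outage_hours = 1.5                 # duration of the incident
engineers_responding = 4
loaded_cost_per_eng_hour = 120     # salary + overhead, $/hour

lost_revenue = revenue_per_hour * outage_hours
response_cost = engineers_responding * loaded_cost_per_eng_hour * outage_hours
total = lost_revenue + response_cost

print(f"Lost revenue:   ${lost_revenue:,.0f}")    # $18,000
print(f"Response cost:  ${response_cost:,.0f}")   # $720
print(f"Total incident: ${total:,.0f}")           # $18,720
# Compare: two engineer-weeks to automate the fix is about $9,600 at these
# rates. If this outage recurs quarterly, the automation pays for itself fast.
```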
Once the picture emerges showing how everything relies on certain unexciting pieces, like LDAP and your storage cluster and that janky collection of angry Perl and shell scripts that keeps everything working, realization will begin to dawn. Congratulations: you've just effectively communicated value.

So, I mean, this is a DevOps talk, and if you're playing the drinking game, we have Conway's law now: "Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations." To most people, that means an org chart. And you see how smooth and linear those lines are. Yeah, that was redundant. And how many lines separate you from somebody else? But this is missing the point of Conway's law. It's the communication structures that are key. If you have regular communication with people outside of your reporting structure, those are communication structures too. Lunches and friendships, not status meetings. Cross-pollinate.

A key to making the business work better and improving the life of operations is the pager. Are you held responsible for applications written by other people? Who gets paged when the app goes down? And how does that make sense? Get devs on call for their apps. Sysadmins should be available to be escalated to. Devs can triage and troubleshoot their own apps more readily than you can. They can look at the error log and go, "Oh, that's a normal error. That's a normal error. There's the problem." Instead of me reading through 200 lines of Java stack trace; that's not a good time, especially for me. This is especially true if you have full-stack engineers. Hey, go debug this kernel module for me. I was hoping for a laugh on that one. All right, can't win them all.

So they get to call in the cavalry when they get stuck. They don't need to know everything about the system. They don't need to resolve everything. But when a fault occurs and they need help, they stay on the call. They pair with you as you diagnose and triage and bring about resolution. That way, they don't need to escalate to you the next time that thing occurs. They can collaborate. They can stop, collaborate, and help work on a permanent fix.

See, when teams aren't responsible for their products, when they're not paged when it fails, they're numb to the pain they inflict. They're not trying to cause pain; they just don't feel it. And it's especially easy to argue for this when they're using agile. They claim they want continuous feedback. Well, there is no feedback more visceral than being woken by a pager in the middle of the night. When the inevitable exclamation comes, "We can't interrupt our developers!", ask if it makes sense to interrupt somebody else instead. Even being aware of the pain, your friend telling you how many times he got woken up by the pager last night, "well, I'm sorry to hear that," is a far cry from being woken up yourself. Here's another thing. If you don't want to be woken up at 3 a.m. because of something you wrote, how do you think your ops team feels about being woken up for something they didn't write?

Further, remember that list of responsibilities you prepared? Ask each team to take responsibility for their own products. You'll likely still have a hefty list of things that you're responsible for. But as realization sets in of who owns what, who's responsible for it and really takes care of it, the staffing numbers are going to start making more sense.
Now, this may just be a matter of the places I've been, but I've never seen a developer-to-systems ratio of less than five to one. Most places, it's more like ten engineers for every systems person, and often it's even higher than that. By adding teams to pager rotations, you drastically reduce the load on yourself. By not adding them to pager rotations, they are complicit in your burnout.

Now, sysadmins have a reputation for saying no. The people who are asking probably aren't trying to make your life worse; they're just trying to get their work done. They may not know what their simple request involves, and all of it might not even be necessary. But by not having responsibility aligned with authority, you may have been stuck with the pain of other people's wishes. You know that fulfilling their request is going to cause you pain, so you understandably say no. What happens next is, well, they escalate until they get to somebody who is sufficiently important to override you. And this is the basis of why sysadmins feel steamrolled by everybody else and everybody else feels held hostage by sysadmins.

All hope is not lost. Stop saying no. "Yes, but" is extremely powerful. "Yes, but" can get you help. "Yes, I can set that up for you, but we don't have the capacity to be responsible for it. We can't run it for you." What happened there? You agreed their request was important and reasonable. You set expectations for the level of support that you can give: none. And you left the requester with several options for going forward. They might have hiring requisitions that they can't fill; you can negotiate for some of those. You can negotiate to have some of their engineers join your team, since you're clearly understaffed. Perhaps some of their engineers join your team on a rotation, or as a plain lateral move. They'll need mentorship or training, but this kind of cross-training is invaluable. It's a force multiplier, and it also sets precedent. Or maybe they take responsibility for the thing: they run it, they get paged for it. Of course, you'll probably have to be an escalation point at some point to help when it fails, but it's their product. They feel ownership. Again, you're setting precedent.

Most sysadmins are stuck doing tasks that provide very little value because they restrict access from their peers. In my mind, there's a perfect example, and I call it playing telephone. When I say playing telephone, this is the situation where somebody, say a developer, wants some logs. So you go fetch the logs for them and get them to them in some awkward way. "No, not that log, this log." "No, I'm not seeing what I'm looking for. Can you check over here, somewhere else, for this thing?" And so on and so forth. And, like a Pikachu: just kill me. No laugh. I don't know what you're hoping to prevent by restricting access, but if this scenario ever happens, you should know you're providing negative value. You're making things worse.

Again, let's try to remember that your peers are not out to get you. They can probably be trusted to be reasonable people. They're human beings, and they will meet you halfway. So, with that framework in mind, it's time to demonstrate some trust and delegate to them. Give them access. Look, your value is not in the credentials that you hold; otherwise, you're just a poorly implemented terminal-as-a-service. Even better than giving access is giving tooling. Logging into a server should be an anti-pattern. You need better tooling. So, with the example of logging, let's talk tooling.
First, logging into a box to get logs is just plain dumb. Sure, you could wrap a tail command in a Rundeck job, but let's centralize those while we're talking about it. Look, syslog is better than nothing, but not by much. Shipping logs is easy, but consuming them into something useful isn't; batteries are not included. If your company wants to spend money on Splunk, then encourage that. It's a fantastic suite of tools, but I would wave you away from it if you're not going to use it for everything. It's going to be expensive, and if you're not going to spend the money to do everything, you're going to have confusion about what lives where, and you're going to miss things; some logs just won't be captured anywhere. I like ELK (Elasticsearch, Logstash, and Kibana), or use a cloudy ELK, ELK-as-a-service, for a middle ground. If you run it yourself, it's free, as in beer, and it's very featureful. But whatever centralized logging tool you choose, provide access to your customers, give them the URL, point them to the docs, and get out of the way. Again: terminal-as-a-service is not your job.

You don't need to run all the things. Someone asks you to run this command for them? You need to put a button on it. Rundeck is a fantastic tool for putting a button on it. It's pretty easy to make highly available, you can tie it into LDAP for credentialing, and you can manage permissions for specific jobs or projects. And by shipping its logs (see what I did there?), you get auditing of who ran what and when. And it's just a REST API. That's a redundant slide.

So, if you have some data that must be restricted, try to isolate those specific cases from the rest of your environment. You shouldn't have to restrict everything just because something needs to be kept in isolation. Why would you want to deploy other people's code? Do you really provide any value in that activity? If the deployment goes poorly, you're launching straight into another game of telephone. What if you made it easy for them to do it instead? Empower them with trust and tooling; make it easy for them to do the right thing. Logs are a start. Metric dashboards that show changes in performance, conditions, and error rates will make it plain to see if a deploy worked.

This freedom doesn't come free. Providing tooling doesn't absolve the development team of the need to communicate. In fact, it's likely they'll have to communicate a lot more. They'll need to be watching those dashboards and logs to see for themselves the success of every deploy. Look, I say "they" a lot in this talk, and that's because, by default, most organizations have a strong component of us and them, us versus them. It's only natural for there to be an us and a them. And while it may not be my job to do foo, it's our job to ensure that the team and the company are successful. Now, that may sound like some happy-go-lucky, tree-hugging pop-psychology nonsense, and it is. The goal here is to get you, the beleaguered sysadmin, the help that you need in order to improve the capabilities of the business.

And if I may be permitted one final thing: the psychological damage done to sysadmins by their peers makes us bitter and cynical. I encourage my people to try to see that their peers aren't trying to make life difficult for them; it's very likely that responsibility and authority are misaligned. I likewise encourage my people to take steps to make their lives better. A ship's course changes in small increments over time. And it may seem like the onus here is on the sysadmins.
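(As a sketch of the "button" idea: triggering a Rundeck job through its REST API from Python. The URL, job ID, token, and job options are placeholders, and the API version in the path may differ on your install; check your server's docs.)

```python
import requests  # third-party: pip install requests

RUNDECK_URL = "https://rundeck.example.com"        # placeholder
JOB_ID = "00000000-0000-0000-0000-000000000000"    # placeholder job UUID
API_TOKEN = "your-token-here"                      # placeholder credential

def run_job(job_id: str, options: dict | None = None) -> str:
    """Trigger a Rundeck job run and return the execution's permalink."""
    resp = requests.post(
        f"{RUNDECK_URL}/api/41/job/{job_id}/run",  # version varies by install
        headers={
            "X-Rundeck-Auth-Token": API_TOKEN,
            "Accept": "application/json",
        },
        json={"options": options or {}},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["permalink"]

if __name__ == "__main__":
    # e.g. a hypothetical "fetch the logs" job, parameterized so that
    # nobody has to play telephone with the sysadmin
    print(run_job(JOB_ID, {"service": "checkout", "lines": "500"}))
```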
They're the ones keeping the plates spinning, and much of the work described here lands on them, and that's a fair point. We sysadmins, operations people, have to put in the effort to make our lives better ourselves, because the way we've been working clearly isn't working out well for us, or anyone else. Look, there's so much more to this topic, particularly the shift away from a systems team supporting a bunch of projects towards a series of largely self-sustaining product teams, but that's going to have to wait for another day.

So when somebody says DevOps doesn't work, they're absolutely correct. DevOps is a concept; it's a philosophy and a professional movement based on trust and collaboration among teams to align them to business goals. A concept doesn't do work, and a philosophy doesn't meet goals. People do. So please remember that, just like Soylent Green, DevOps is made of people.

I'd like to take a moment to thank my editor, Sean Moten, for his help on the original article. Our DevOpsDays organizers: could we get a hand for our organizers and volunteers? They're more than a little bit awesome; this event has gone fantastically smoothly. And the DevOps community at large, for being so welcoming and sharing. So, as your thought leader, I advise you to take one antagonist, one company credit card, and two lunches. Let's go plant some trees. And with that, thank you. Lunch.