As we were doing this, as we were going around the building, people would talk to us about how bad they felt and how they really wanted to do the right thing, but they were trapped in this system. It felt very much like counselling: we'd be sitting there taking notes and people would be pouring their hearts out to us over the terrible things that they saw going on around them. This was a very weird feeling. A lot of the coaching literature says, don't get into counselling, that's not what you're doing, you're here for coaching and improving. So we were trying to use this information to work out how the organisation could better organise itself, how it could be better structured, with better working practices and better technology to do development and operations work. As we were having these discussions, one overarching message came out, which was that at this client and at other clients, most of the problems they were having in the DevOps space were non-technical. There were technical issues, for sure; there are technical problems in software everywhere, because development is hard and testing is hard and system administration is hard. But the people in the organisation were able to do that; they were all smart people. The developers were good developers, they had good testers, they had good sysadmins who knew their domain, they knew the field, they knew the products, but they still had enormous pain releasing software and supporting it. They also had lots of money, so they had the best tooling you could imagine. There was no want for test tools or continuous integration platforms or anything they needed that the company would buy; money was no object. So they had fantastic tooling and really smart, very motivated people who wanted to do the right thing, but still all of these difficulties and problems in just keeping their software running. To give you an idea of how bad things had got, this organisation was running a website with 85% uptime. OK, it was pretty bad. So it turned out that most of the problems that were happening with this organisation, that the people in it were having, were social or psychological, and were to do with the way that they interacted with each other and the way they thought and behaved. So we thought we'd look at where there is common ground between all of these people, because they all want to do the right thing, they all want to help and work together and collaborate and produce some awesome software and release it and run it. So where can we get some common ground? We looked at the values that are encouraged by DevOps. These are not a canonical list, this isn't the rules of DevOps; these are just things that I have found that this team had and that are good DevOps-y things. So in the world of DevOps we believe in having a common purpose. The dev team and the ops team have the same goal, which is to get awesome software into production and running so that it can make our users happy and we can make money from them.
We believe in sharing. It's not that the dev team does a load of stuff and keeps it quiet and the ops team does a load of stuff and keeps it quiet. We like to talk to each other and say, hey, I've got some awesome test tools that you can use, and I've got some awesome deployment scripts that you can have, and we can share these around and be helpful towards each other. That's part of the point of DevOps: the devs help the ops and the ops help the devs and we all get on and live happily. We believe in technical rigour; we believe in doing the right thing technically. We're going to write good code, we're going to have good tests, we're going to manage our systems well, we're going to use configuration management tools. We're not just going to bodge things in and get it done as fast as possible; we're going to do it the right way. We believe in rich communication, so it's not the dev team sitting in one room for six months writing a load of software and then throwing it over an imaginary wall to the ops team, who then pick it up and run it. We're going to talk to each other all the time, and we're going to sit near each other, and we're going to be friends, and we're going to go out for lunch together, and we'll occasionally have a beer, and it'll all be really nice. Hopefully you're looking at this and going, yeah, this is not revolutionary stuff. These values are very basic, straightforward things that we teach our children. Maybe not technical rigour, we maybe don't teach them that one, but it's all decent stuff. It's all good things that people just want to be like. Everyone wants to be helpful, right? Who doesn't want to be thought of as a helpful person? Who doesn't want to be nice to other people, and who doesn't want to communicate with them and bring them your good ideas and share and help your fellow colleagues work towards your common goal? So everyone in this organisation believed in the same sort of stuff: we all believe this, we all want to be good people, we all want to do the right thing. The difficult part that I found wasn't necessarily that people didn't want to be an aligned DevOps team. It wasn't that they didn't want to do good software or that they couldn't. Something in the system, in the way they were working, the way they'd been organised, the way they behaved, was blocking this. That's what this talk is largely about. It's not about whether you should do DevOps, because the answer is yes. It's not about whether DevOps is good or any of that stuff. You've decided, right, I'm going to have my development and my operations people working closely, we're going to collaborate, with all of this good stuff. We've decided on that, and we've formed maybe a single team, and we all sit in our team room, and there are some developers and some sysadmins and some QAs and some BAs, and we're all sitting together working on our project. We're excited, we're going to do this, and it will all be wonderful. And then you do it, and nothing changes, and it's all still broken. This is what we're looking at here. You've tried doing some cool DevOps stuff, you've read the books and you think it's awesome, you try it out, and it still doesn't quite work. These are 10 tips for things that you might want to look at if you're having the same problems that you had before you drank the proverbial DevOps Kool-Aid. What do you mean by an operations team? For me, it could be a release team or it could be a packaging team. Maybe your definition for this presentation was different.
This presentation is an amalgam of clients that I've visited over the years. There are a couple that are very dysfunctional and very large that I think are particularly appropriate, so I come back to them more often than others. And an operations team is whatever it is that the developers don't do between code getting written and it being run. That's different in different organisations; you're right, you might have a packaging team separately, you could have a build team separately, and it's not uncommon to have a monitoring team, a networking team, all of these various teams. Part of what we come onto is: does it make sense to have all of these teams? Is there just one? So, not every tip in this list applies to every situation. Most of you in the room will look at some of them and say, we already do that, which is awesome, good. Many of them you'll look at and say, we can't do that, that's not going to work for us, and that's okay too. What this is is a grab bag, an amalgam of useful things, some of which will be useful for your specific problems. So they're useful things to think about and come back to; maybe you're feeling some of the pain I illustrated, and these are some ways of getting out of that pain. So in the DevOps world we talk about production: the whole cycle from concept, business analysis, stories, development, QA, staging, production is one delivery process. The DevOps part of the world is mostly concerned with the develop-and-put-into-production part, but there's also some stuff that we can do up front with helping drive out requirements and that sort of thing as well, particularly around monitoring and gathering metrics. So we're mostly looking at the dev and operations bit, but there's a little bit of the earlier part of the chain too. So the first tip, shockingly for an Agile conference, is: do Agile development. The reason for this is GIGO. GIGO stands for garbage in, garbage out. If you don't have good analysis, if you don't have good stories, you're not going to be writing the right software. You could have a fantastic operations team producing beautiful servers that are always up and never down, running wonderfully crafted software that nobody ever uses. And that's a pointless and expensive place to be. You should be doing iterative development and regular deployment. If your development team is only able to produce a releasable artifact every six months, there's no point in having a highly tuned, DevOps-aware operations team who can deal with a release every 10 minutes, because they'll be sitting around bored most of the time and you'll still have the pain from doing big releases. Continuous integration precedes continuous delivery. Continuous delivery, where we're looking at the whole end-to-end process and we're able to continually push useful features into production, is awesome, and it's a great ideal to work towards. But continuous integration is the first step. So you should be doing that: when you commit, you run your tests, you produce a deployable artifact. This is good, solid Agile stuff. You should definitely have automated testing at all the appropriate tiers for your application. And the reason I have all of this stuff here is that if you don't do this, you've got bigger fish to fry. The problems that exist in your development team will almost definitely be bigger than, can we do regular releases, can we support what's out there? You'll almost definitely be fighting: is our software of high enough quality? Are we catching bugs? Are we doing the right thing? Do we know what's actually in our deployable package?
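To make that "commit, run your tests, produce a deployable artifact" loop concrete, here's a minimal sketch of a continuous integration step. It isn't any particular CI product; the repository layout, the test command, and the artifact naming are all assumptions for illustration.

```python
#!/usr/bin/env python3
"""Minimal CI step sketch: on every commit, run the tests and,
only if they pass, produce a versioned, deployable artifact.
Paths and commands here are illustrative assumptions."""
import subprocess
import sys
import tarfile
from pathlib import Path

def run_tests() -> bool:
    # Run the project's test suite; a non-zero exit code means failure.
    result = subprocess.run([sys.executable, "-m", "pytest", "tests/"])
    return result.returncode == 0

def build_artifact(version: str) -> Path:
    # Package the source tree into a versioned tarball: our deployable artifact.
    artifact = Path(f"dist/myapp-{version}.tar.gz")
    artifact.parent.mkdir(exist_ok=True)
    with tarfile.open(artifact, "w:gz") as tar:
        tar.add("src", arcname=f"myapp-{version}")
    return artifact

if __name__ == "__main__":
    # Identify the commit we're building from.
    commit = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                            capture_output=True, text=True).stdout.strip()
    if not run_tests():
        sys.exit(f"Tests failed at commit {commit}; no artifact produced.")
    print(f"Built {build_artifact(commit)}")
```

The point is only the shape: every commit gets tested, and the thing you deploy is exactly the thing that passed.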
So the first tip is not groundbreaking. It's just saying: carry on doing this good stuff. Next, sitting together is important. There was a great presentation on Friday about sitting together and how an organisation refactored the room setup within their building so that the dev teams could sit together, rather than having everyone hidden away in their own little offices. This applies to operations teams and development teams as well. One interesting thing, and you'll forgive me if I don't have the citation, but there's a body of research that shows that the richness of your communication is mostly a function of distance. There was a study of US university students to see where people's friends come from. When you go to university, you're out of your hometown, you don't really know anyone, you're living in a hall of residence. Who do you make friends with? It turned out that the biggest factor in who you made friends with was not your sex or your race or your age. It was who your next-door neighbours were, and who their next-door neighbours were. That was your circle of influence, and that's where the closest relationships were made: simply the people you were near. Naturally, as human beings, we make neighbours. There's this idea of neighbourliness, of being good neighbours and getting on with the people around you. This is a really strong thing. Communication is a function of distance. The amount you can communicate with someone, the bonds that you build with them, depend on being close to them. Obviously, that's easy to sort out in physical distance if you sit together. If you're remote, then you have to really make efforts to shorten the effective distance, so things like always-on Skype channels and what have you. You definitely need to make time to talk explicitly, especially if you have totally disparate dev and ops teams: to explicitly call out, we're going to talk to each other. Maybe we'll come to each other's stand-ups in the morning. Maybe we should get the ops guys involved in our planning meetings, because operations people have requirements of software, right? They need it to have logging and they need it to have monitoring and all of this stuff. When you bring in ops people early on, they can help. They can start saying, we need this stuff, and they'll be much more helpful later in the process because they know they're getting something too. If you come and say to them, here's our new release, and by the way, it's got this cool logging stuff in it that you wanted, they'll grab it off you and throw it onto the servers before you've even finished speaking. Open channels. This is like the big Skype screen that everyone can just walk up to and wave at and say hello and talk to the people in the other building. But it's also about saying, don't funnel all of your communication down one channel. One of the things that was most disruptive about one of the clients I had was that the operations team and the dev team were completely separate business entities. There was no common management until you got up to the senior vice president level of an organisation with hundreds of thousands of employees. So it meant that if you had to get anything done, you had to go all the way up the development management food chain, across at this senior level, and then all the way back down the operations food chain.
Whereas actually the development team and the operations team had offices right next door to each other. So the open channel was that you would walk out of one office into the other and say, hey guys, while the managers are off doing their thing, can we just sort this out? So those open channels, the back channels, secret IRC channels, mailing lists, going down to lunch together, that sort of stuff, are really important. And it's also important to make time for individual people. One of the things that organisations traditionally do very badly, and that Agile tries to fix a bit, is that it's very easy in a corporate context to not treat people as individuals. And those individual relationships are more fulfilling for your life in general, as well as being much more effective for getting stuff done. So it's very important to take the time to go and have a coffee with someone and see how they're doing, building up those relationships, and to give your people enough time to do that with each other. If two colleagues go out for a coffee, I can almost guarantee that when they're out they're talking 90% about work, because that's their common bond. So it's really important to nurture those relationships, to let them grow and to help them happen. Does that come out all right? Yeah. Don't do this. Does anyone want to have a guess at what this is? All right, I'll explain, because it is an idea so appallingly awful that I can't blame you for not immediately recognising it. This is the door to the sysadmin team's office. This particular team were a bit annoyed with the number of developers who would come into their office to ask them questions and distract them. So this thing here is an IRC terminal placed outside the room, so that when developers came to talk to the sysadmins they would be told to go and stand outside and type any questions that they might have into the IRC terminal. Don't do that. That's bad and wrong. So, moving on. Knowledge sharing. You really need to spread knowledge out around your dev and ops teams. Having knowledge silos is a real horrid anti-pattern, and it results in lots more communication overhead and lots more risk to your project, because only one person or a small group of people can do a specific activity. If only Bob knows how the backups work, then when you need to do a restore you've got to really hope that he's not on holiday, that he's well, that he happens to be in the office and that he's not doing something else at the time. Whereas if you have loads of people and they all know how the backups work, or they all know how monitoring works, or they all know where the production data centre is and how to connect to it, you've de-risked any work that involves that topic and those resources. So you've really got to go out of your way not to siloise people. So, ways to do this. One of the things we covered at this client was packaging, because that's kind of an interesting handover point between development and operations teams. A development team has a CI server, they build some software, they test it, and then at the end they can make a package. In this instance we were using RPMs, but it could be whatever. And then the sysadmins would take the package, put it into their yum repositories, and then it's available to be deployed in whichever data centre it needs to go to.
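As a rough illustration of that handover, here's a small sketch of the publishing side: build the RPM, drop it into a yum repository directory, and rebuild the repository metadata. The paths are made up, and I'm assuming the standard `rpmbuild` and `createrepo` command-line tools are installed.

```python
#!/usr/bin/env python3
"""Sketch of the dev-to-ops packaging handover: build an RPM from a
spec file, copy it into a yum repository, and regenerate the repo
metadata. All paths here are illustrative assumptions."""
import shutil
import subprocess
from pathlib import Path

REPO_DIR = Path("/srv/yum/myapp/x86_64")   # hypothetical yum repo location

def build_rpm(spec_file: str) -> Path:
    # Build a binary RPM from the spec file using the standard tooling.
    subprocess.run(["rpmbuild", "-bb", spec_file], check=True)
    # Assume the newest RPM under ~/rpmbuild/RPMS is the one we just built.
    rpms = sorted(Path.home().glob("rpmbuild/RPMS/**/*.rpm"),
                  key=lambda p: p.stat().st_mtime)
    return rpms[-1]

def publish(rpm: Path) -> None:
    # Copy the package into the repository and rebuild the metadata,
    # so it becomes installable with yum from any connected machine.
    shutil.copy(rpm, REPO_DIR)
    subprocess.run(["createrepo", "--update", str(REPO_DIR)], check=True)

if __name__ == "__main__":
    publish(build_rpm("myapp.spec"))
```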
Not all the devs understood RPM, so we'd run special interest groups. We'd say, every couple of weeks, people who do packaging get together, sit in a room, have some coffee, and talk about how you make RPMs, how you deal with them, what a good version numbering scheme is, the comments field, how you build them, how you deploy them, all of this stuff. Which meant that the actual packaging work didn't have to belong to the packaging expert. It was spread around all of the dev teams, but there was a level of support that meant that a developer, if he was told, here's your story, you're going to have to do some packaging soon, had someone to talk to; he wasn't just left in the lurch to work it out. And this next one is a bit contentious: irrelevant learning. I believe quite strongly that there's no such thing as irrelevant learning. Learning new stuff is good even if it doesn't immediately mean that you can use it. Just getting into the habit of reading books, talking to interesting people, learning about stuff that's outside your domain is really good. It's interesting, it makes your life more fun, and it means you're not buried in a single topic. Encourage sysadmins to come to your testing talks, if you do testing talks for your developers. If you're doing special interest groups on Puppet or Chef or system management tools, have the developers come along. They might not need to use this stuff straight away, but the knowledge is in the building and they can have conversations about it. One day it's going to break, and it's nice to have a little understanding. Encourage people to go and try things outside their comfort zone. You need to look at Conway's law. Conway's law is an observed phenomenon whereby an organisation will always produce software that resembles its organisational structure. If you have a separate networking team or a separate web design team, you'll produce a product with a distinctly separate network layer or web layer. Back-end and front-end teams are very common, where you have the back-end system and the front-end system and nothing joining them up. You can also exploit this. When you know that this happens, you can say, we're going to build this deliverable thing. The deliverable thing is going to have some software, some networking, some databases. We'll make a team with all the right people in it, and they'll deliver that one thing as a distinct business unit. I've had customers who do that. One was a financial company that sat their developers right next to the traders. Developers and traders sit next to each other; the developers write the code, the traders run it. It all moves very, very quickly because they're very tightly coupled as a business unit. Having said don't have silos: if you do have them, and many people do, don't just get rid of them. There's a transitionary phase to go from siloed teams to delivery teams with operations embedded in with your development teams. You can't just say, right, we're not having a network team, networking guys go and make yourselves useful somewhere else, because if you do that, your network will break. You do need to have time to do that work. There truly are some things in operations that are cross-project. The network is the usual example. Everyone goes across the network; it doesn't belong to any one team, but you all have to use it. There's lots of shared infrastructure: the firewalls, load balancers, the actual connection to the outside world. Even if you are going to bring operations people into your dev teams, and they're going to sit with the dev guys and all work together, they'll still probably have some work to do outside that room.
Maybe you do it on a timeshare basis. If you're a dev team, you get an operations guy once a week, and he comes around and helps you with whatever it is that you need doing, and he pairs with you and trains you up on how to do operations stuff. You have to make sure that there's enough time to get the shared workload done as well. How do you handle things like people having access to production services? That is an excellent question, and one of the future top tips will deal with it. If I've not answered it by the end, ask me again. You still want to reduce the amount of siloed work. There will always be common shared things, but you want to take those common shared things and make them specific to delivery projects as much as you can. The shared infrastructure for your development teams should be as close to pure infrastructure as it can be. There shouldn't be anything interesting in there. You shouldn't need to have big central services teams. That doesn't mean that there's no one there, right? It doesn't mean that there aren't networking people who understand the network. They still need to exist, because networks are complicated. But you should make sure that there's enough networking knowledge embedded in each of your delivery teams that the centralised function is quite small, quite compact, and their job is mostly to do with training up the network engineers who are around your dev teams and your developers, rather than fixing stuff themselves. They're still there to fix the really nasty stuff, but the centralised function teams should be very light, and they should be providing services into delivery teams rather than the usual anti-pattern, which is: oh, you want to use our network? Well, here are the requirements. You're not allowed to use these ports. You're not allowed to use these transport protocols. Which is horrid. Going down that path is just grim. Keep those teams small and light and focused on helping developers. Excuse me. Next, some organisational structure. This was a huge pain for one of my clients. I said the development team and the operations team were totally different business units. It was really damaging to have that. The whole team, the whole delivery unit, needs to have similar reporting. It doesn't necessarily mean you all have to have the same boss or go through exactly the same processes, but it needs to be similar enough that you can understand each other and that your motivations are the same. In this organisation, the developers were given bonuses. They were literally given bonuses that were based on the features that they delivered. That's fairly sane; you'd expect that, give developers bonuses for delivering features. The operations guys were given bonuses based on the stability of the site. So these guys want to change as much stuff as they can to earn bonuses, and these guys want nothing to change at all so they get their bonuses. This is really horrid. They've got totally separate bosses, and their bosses get even bigger bonuses based on not having as many changes. These whole organisations, hundreds of people, were set up from the outset to fight over whether we are going to release or whether we are going to wait. You need to have similar reporting so that you understand each other, and so that your boss and your colleague's boss can have a conversation that makes sense if you disagree about something. Then you're not totally set up to fail from the outset.
You need to set yourselves up so that you don't have to go all the way up the management chain, because it's just inefficient. One thing people grossly underestimate in our industry is the cost of communicating. It's actually quite expensive for you to call someone and have a conversation with them, then for them to call someone else and have a conversation, and then for that person to call someone else in turn. By the time you've gone up a certain number of levels of management, the problem is being described in a totally different way, with totally different impacts, and who knows what's going on. It's quite expensive, so every time you have to go indirectly rather than just to the person who can help you and say, let's talk about getting a release done, that has a cost. Similarly, hand-offs, gates, checkpoints, this sort of thing, these all have a cost as well. It's not uncommon for the operations team or the QA team to be the gatekeepers. Your dev team does some work, produces their deployable package, and says, please deploy this. The operations team will go, ah, right, okay: please fill in a change request form in order to be allowed to deploy your package into the production infrastructure. Here is our change request form. I worked with an organisation whose change request form was a spreadsheet that had 11 tabs. It was a phone book that you had to fill in, and that was the standard change. Anything you wanted to change in the production infrastructure: 11-tab Excel spreadsheet. Fix a typo? Massive spreadsheet. Release a whole new data centre? Massive spreadsheet. Going through that process made the developers avoid any work that wasn't strictly necessary, and it gave the operations guys a false sense of security. They believed that this big hand-off process, where they say, describe your testing, protected them. It was pointless. It was just paperwork. All a large pile of paperwork proves is that you've written a large pile of paperwork. It doesn't prove that the software is well tested. What the developers are really saying is: we think it's all right, we've done the usual testing, and if it breaks, we'll fix it. Having a big pile of paperwork saying that if it breaks we'll fix it doesn't help you. These hand-offs become a way for operations teams or QA teams or whoever is holding the gate to try and push risk out of their team and further down the chain. They've got this massive document they can beat you with if anything goes wrong. Reducing that is really useful. One thing that we did was just say, we're not going to do that anymore. The change request process is now a little web form. You put in the dates of your change and a short description of what it is, and hit go. The operations team manager will look at it and say, that seems sensible, I trust you, go and do it. The main reason we have that at all is just to stop two people changing the same thing at the same time. You don't want two changes at the same time, so that when something breaks you're pretty confident you know what it was that caused the breakage. Reducing those hand-offs is really important.
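To give a feel for how small that replacement process was, here's a sketch of the logic behind such a form, reduced to its essentials: who, what, when, and an overlap check so two people can't change the same thing at the same time. This is a hypothetical reconstruction; the field names are assumptions.

```python
#!/usr/bin/env python3
"""A minimal change-request sketch: record who is changing what and
when, and flag overlapping change windows. Hypothetical reconstruction;
the real form's fields are assumptions here."""
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ChangeRequest:
    who: str
    what: str              # short description of the change
    start: datetime
    end: datetime

changes: list[ChangeRequest] = []   # in-memory log; a real one would persist

def submit(req: ChangeRequest) -> str:
    # The only real rule: refuse two changes that overlap in time, so that
    # when something breaks you know which change caused it.
    for other in changes:
        if req.start < other.end and other.start < req.end:
            return f"Clashes with {other.who}'s change: '{other.what}'"
    changes.append(req)
    return "Seems sensible. I trust you. Go and do it."

if __name__ == "__main__":
    print(submit(ChangeRequest("asha", "bump nginx version",
                               datetime(2013, 1, 10, 9), datetime(2013, 1, 10, 10))))
    print(submit(ChangeRequest("bob", "rotate TLS certs",
                               datetime(2013, 1, 10, 9, 30), datetime(2013, 1, 10, 11))))
```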
There's an exercise called value stream mapping, which is a lean technique that's very useful for working out what your management structure should look like and where your hand-offs should be. This works by looking at the whole stream from concept to production and saying: in this stream, where are we adding value, and where are we just waiting or not adding value? You add up those periods of not adding value, and then you can work on ways of trying to reduce them; value stream mapping will tell you where you are being productive. Testing is good: when you're testing something you're adding value to it, you're improving its quality. But filling in a document to say it's been tested doesn't actually make it any better, so you want to reduce that. Let's talk about incentives. I did touch on this earlier. Incentives are good. Who doesn't like getting paid? This is the ultimate question for any incentive: does it help or does it hurt? The case where the developers are being told to do one thing and paid for it, and the operations guys are being told to do another thing and paid on that basis, is really hurtful. So you need to give them a common goal. Delivery is the ultimate goal: good software, put into production, running. Everyone has to be part of that. You all have to work together, and it's in the sysadmin's interest that the software doesn't have bugs, and it's in the developer's interest that it's reliable and can run consistently in production. You should reward based on: is the software out there? Are we making money? Are customers happy? And you should do it fairly. It's really obvious to say you should reward people fairly, but what I mean here is that rewarding these people for quality and those people for quantity doesn't make sense. Everyone gets rewarded for quality. One of the key things that comes out of XP is that you bake quality in. It's not something you add at the end of the process. It's something that everyone does while working on your product: it gets a little bit better every single time someone touches it and improves it. Everyone is feeding into quality, so everyone should get rewarded for building a high-quality product. Don't get gamed. The awesome thing about software developers and operations people is that they will game any system you put in front of them. They're very clever people, because that's what the job is. The job of software development is: here's a bounded system, manipulate it within the rules of that system to do what you want it to do. If you give a bonus system to someone, they will find ways of maximising their bonus. That doesn't necessarily mean it will have the positive effect that you're hoping for. The canonical example of this is a software development team where the developers were given a bounty every time they fixed a bug. They were given £50 for every bug they fixed. Which is nice. We all like fixing bugs. That sounds good. Except what happened was that the developers, being sensible, rational, good-at-gaming-systems people, went to the QA team and said, every time you raise a bug that's really trivial, you get £25. The number of bugs went through the roof, and they were all trivial please-fix-this-typo bugs, and they'd get fixed. The developers would get paid, they'd split the money with the QAs, and everyone was happy. Except for the fact that the quality of the product didn't actually improve. Nothing got better. Everyone got richer, and the management realised that maybe they'd made a terrible mistake. Watch out: any system that you make will get gamed.
You have to keep it simple and open and fair and honest, and really know what behaviour it is you're trying to encourage with those incentives. Celebrating. Celebrating is awesome. Who here has a project, or is on a project, or works on a software project? Keep your hand up. If occasionally you have a release, or something good happens on the project, you don't need me to tell you this: it's nice. You go out and celebrate and you all feel better. You've done some good work, and maybe you've worked nights and pushed really hard just to get the software out the door, and it's there, and that's great. But you have to be careful with this. You have to be inclusive. At one of my earliest clients, I was working with the systems team alongside a development team, and we were working away, doing agile development, all looking very nice. The first release came up. We'd been going for three months; it was a totally greenfield project. Three months in, we'd been doing little QA releases and staging releases, and then the big day of putting it live on a public website came. So the final iteration came around and the developers had done all their stories, and it was, hurrah, all the stories are done, we're going to the pub. The development team went off to the pub, which meant that the systems team was left behind to pick up the final build from that iteration and put it into production. So their work was just starting as the developers' work was finishing. And here we were doing quite risky stuff. We were bringing down a production web server, or in fact 70 production web servers, to install this new software on a site that serves 200 million hits a day. Nervous, right? Meanwhile, everyone else was getting really drunk. That sort of thing doesn't build a cohesive team. It says: I'm going to do my work, you're going to do your work, and we don't really talk to each other. The right thing to do in that instance is to say, hurrah, we're done, now we'll help the systems guys. Now we help the operations team, and then when they're done, we'll all go out drinking. So the thing is to be inclusive. No one likes being left out of the party. This is mostly pitched at a Western audience, but I would imagine a similar thing applies in India, where quite a lot of people don't drink. So while it's nice to go to the pub on occasion, you don't have to do it every single time. Do something different. One of the best after-project celebrations I had: we finished early, it was a sunny day, no drinking involved, we just all went out and enjoyed the sun. So have variety, do things that are inclusive. Again, no one likes being left out, and at the level at which we work, where we have experts in systems whom we depend on, everyone's an individual, and we need to look after those individuals and do things that they like. So make sure that when you do have celebrations, you remember that not everyone will like everything, and cycle around lots of different activities to let everyone let their hair down. This next one, again, is a little contentious. Eric Ries in The Lean Startup describes businesses as learning machines: the point of a startup is to learn about what your customer wants and learn about how you can deliver it. This means that failure is a very important thing. Failing is good. We like failing, because when we fail at something we've learnt about it, and we've learnt that what we did was not the right thing to have done, so we won't do it again. So recognising and celebrating your failure is important.
So that's the thing: by celebrating small failures and learning from them, you prevent big failures. So here's an example of that. This is the fail cake. This is an idea from a friend of mine; where he works, they have a rule that as a system administrator, if you break the system in some horrible way, your punishment is to buy cake for the team. This is awesome because it's a small punishment. It's not like you're being beaten up, it's not awful, but it's still enough that you're going to be careful. And then you bring the cake to everyone, and then you talk about what you did and why things broke and what went wrong. So you have a little retrospective, and everyone's eating cake, and it's really hard to get annoyed with someone who's just brought you some cake. So you defuse the fear of the situation and of the problem that this failure has caused. The fail cake is an awesome thing, and I encourage you to try other interesting ways of looking at your project's failures and saying: how can we learn from this, how can we retrospect, how can we improve the way we work? So, systems access. This is something where I start to deviate from the way a lot of system administrators think. Everyone has write access to Puppet, Chef, configuration management. I would go so far as to say everyone should have root on your servers. Developers should have root on your production servers, and here is why: because if your production server gets hacked or broken in some way, you need to be able to rebuild it quickly and efficiently and sensibly, so you have to assume that it's a hostile environment anyway. And your developers can already run code on your production servers; that's their job, to write code that runs on your production servers. Functionally, they've got root already: if they wanted to do something bad, they could write code that does it. What it means to say they can change things is: we are confident that we can fix our systems if they break, and we're saying we trust the developers to do the right thing. Now, there's a whole load of other systems or practices that you have to have to back that up. You might say, we'll not let you have root, but a sysadmin will sit next to you and you can pair; he'll log in as root and you can do the work together. Or you might say, we'll give you sudo for certain things. But the rule of thumb is that if you just lock developers out of your production systems, you're just saying: these systems are our problem, and anything before them is your problem. It's a false risk mitigation. But it comes with a cost, and the cost is that you have to share the pain. So if developers can log into production, that also means that when production breaks, developers fix it. To say root, meaning root on the actual box, is maybe overstating it for some cases, but you should certainly have write access to Puppet. You can make changes to Puppet, you can check them in, and everyone can see what you've done; if we need to roll it back, then we can change your Puppet configs and roll it out. But that means that if it breaks because of your change, you need to put it back. You look at places like Etsy, and Forward, like Fred talked about on Friday: developers just push straight to production. Can they make a change into production just like that? Sorry? Will they test it to make themselves feel happy? So it's: do a change. Do I need to test it? Am I comfortable? Am I just fixing a typo? As long as I'm comfortable, I push. And if it breaks, I fix it straight away.
In fact, at Etsy they're very strict about this, and they say that if you commit to the repo, you have to push that change into production within 15 minutes or so. Because if you don't, then what happens is the next person comes along and wants to push their change, and they can't; it gets backed up behind yours. So they're very strict: if it's good enough to go in the repository, it's good enough to go into production. Super fast. Etsy is a website, a little bit like eBay for handmade goods. If I make something, I can have a little Etsy site, and you can go there and see what I've made and buy it, and they take a percentage. And they're doing very well. Yes? Sorry, can you speak about banks? Yes. So the question is: if you have a bank or a financial institution, or some system with regulatory requirements about user data, how do you deal with access to those? The answer is that you focus around delivery teams, not around the operations team. A delivery team is a unit that writes code into production. So at the financial company I was working with five years ago, their developers sat next to their traders. The developers had access to the systems the traders were using, with all of the live data in them, and they would write code and deploy it, and then the traders would use it. So they had access to production: their title was developer, but they were also the administrators of the production system. The production system breaks, the developer fixes it. So that's how you look at that: you try to get out of the mindset of the developer role and the sysadmin role, and you just say, here's the team, and they manage that product. But it is an interesting question; we can talk about it more later if you want to. Sure. So that's a great question, right: how do you stop developers breaking your stuff? Tests. As developers, as agile people, we like tests because they prove stuff works. Documents don't prove stuff works; tests do. So as the guardian of the production database, I might say to my developer: have you run your query on the staging database? Did it take a sensible amount of time? Have you got a test that proves that it works and that it's not going to go and delete a load of data? For things like Puppet and Chef changes there are tools: there's cucumber-puppet, and there's a similar cucumber-style tool for Chef whose name I can't remember. So you can write tests for these things and ask: is it passing the tests? Have you done it in QA? Or you can do pairing. If you're going to run a big query on a database, you should sit with a DBA first. But the thing is that the access is shared. It's not my system; it's not that operations owns production and you can't ever see into it. It's our system; we all have to work on it. So let's find good ways of doing that, and largely it's based around trust. A lot of the gatekeeping-type systems that exist, exist because there's not enough trust between operations people and development people. Occasionally you'll run a long-running query that's horrid; most of the time you won't, and it means you'll go so much faster that you negate the problems of breaking your database every so often. Information sharing systems are great for DevOps people. Here's a great example: this is a monitoring dashboard in an operations team, and you can see these are the alerts that are coming in, and as the team clears out the alerts, they get more kittens. So if you get all of the alerts cleared and the production data centre is fine, you get pictures of kittens, which is a wonderful motivator for systems teams.
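Just to show how little machinery an information radiator like that needs, here's a toy sketch of the kittens idea: poll your alert count and reward an empty queue. The alert source is a stand-in; any monitoring system's API could feed it.

```python
#!/usr/bin/env python3
"""Toy information radiator in the spirit of the kitten dashboard:
the fewer open alerts, the more kittens. The get_open_alerts()
function is a placeholder for your real monitoring system's API."""
import time

def get_open_alerts() -> list[str]:
    # Placeholder: in reality, query Nagios or whatever you run.
    return ["disk full on web07", "cert expiring on lb01"]

def render(alerts: list[str]) -> str:
    if not alerts:
        # All clear: maximum kittens.
        return "=^.^=  " * 5 + "\nProduction is fine. Enjoy the kittens."
    lines = [f"!! {alert}" for alert in alerts]
    # One kitten fewer for every open alert, with a floor of zero.
    kittens = "=^.^=  " * max(0, 5 - len(alerts))
    return "\n".join(lines) + "\n" + kittens

if __name__ == "__main__":
    while True:
        print("\033c", end="")          # clear the terminal
        print(render(get_open_alerts()))
        time.sleep(30)                  # refresh every 30 seconds
```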
So developers need to see into production. I want to see that the change I've just put live has worked, and this is really useful stuff. At places like Etsy, where they're doing continuous deployment and every change goes straight into production, the developers will watch the production metrics when they make the change, and if user traffic just drops like this, you've broken something. So developers really need to see into production, and similarly, ops people need to see the new features. An awesome thing that we can do now is use Nagios or similar monitoring tools to drive development. I can develop my feature, and while I'm developing it I'll write a monitoring script that monitors that it works, and I can deliver the whole thing into operations and say: here is the new feature, and the script that proves it works. And that's awesome for doing releases, because I can put the monitor live, watch it fail, fail, fail, do the new release, and then the monitor will start passing. That way I know that all of the features of the application are working, not just the usual monitoring stuff of, is it running. I can actually say: is it running, and does it work? Big visible displays: they look great, they make the investors feel like their money is being well spent, they're really impressive, and they're useful information radiators that just kind of bleed information into the room. Is the build passing or failing? Is the production website running a bit hot? Are we going to need to add capacity? All useful stuff, because it gives you a feel over time for what the project is like. Things like build noises that go off whenever a build passes are really useful too, because they give you that heartbeat of a project. You can hear, every hour or so, a noise that says a build has passed, and when you start running this in a team, you notice things, like when that noise stops being there. You notice, and people start saying: what's broken, where do we need to fix things? If you're surrounded by a wall of red lights, you can be pretty sure something has gone wrong, and it's more important than whatever it was you were doing before all the lights went red. And dev and ops people should all go to each other's meetings. Not every meeting necessarily, but enough to make sure that you understand what's going on with the other people: what they're doing, what's going to come up soon that's going to affect you. The operations guy is going to change the base operating system you're running on? That's going to affect your development team, so you need to know about each other's work, backlogs and schedules. Metrics are really useful. There have been some other talks on metrics, so I'm not going to go into them in huge detail, but these are things that you need to ask yourself. You definitely need to have metrics to drive your system and to make decisions about it. So ask yourself: what happens at release time? Netflix, when they do a release, do A/B testing: they release their new code, they watch the site, and if the amount of money coming in goes down, they roll back the release. Which is really useful: they only release stuff that makes their site more valuable. In practice, try it for a certain amount of time. The window should be short enough that you're making decisions quickly, but long enough that you've got a representative sample of data; the answer depends on how many customers you have, how rapidly they visit the site, and how much they spend each time. Try a period, and if you're making decisions too slowly, shorten it; if you're not getting enough information, lengthen it.
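As a sketch of that Netflix-style decision, here's what the core of a metrics-driven release guard might look like: compare a revenue metric before and after the release, and roll back if it drops. The metric source, the thresholds, and the deploy and rollback commands are all assumptions for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a metrics-driven release decision: watch a business
metric after a release and roll back if it falls. The metric fetch
and the deploy/rollback commands are placeholders."""
import random
import statistics
import subprocess
import time

def revenue_per_minute() -> float:
    # Placeholder: pull this from your real metrics system.
    return random.gauss(100.0, 5.0)

def observe(minutes: int) -> list[float]:
    # Sample the metric once a minute for the given window.
    samples = []
    for _ in range(minutes):
        samples.append(revenue_per_minute())
        time.sleep(60)
    return samples

def release_guard(baseline_minutes: int = 30, trial_minutes: int = 30,
                  tolerated_drop: float = 0.05) -> None:
    before = statistics.mean(observe(baseline_minutes))
    subprocess.run(["./deploy.sh"], check=True)      # hypothetical deploy step
    after = statistics.mean(observe(trial_minutes))
    if after < before * (1 - tolerated_drop):
        # The release made us poorer: put the old version back.
        subprocess.run(["./rollback.sh"], check=True)
        print(f"Rolled back: revenue fell from {before:.2f} to {after:.2f}")
    else:
        print(f"Keeping release: revenue {before:.2f} -> {after:.2f}")

if __name__ == "__main__":
    release_guard()
```

The window lengths are the knobs the talk describes: shorten them if decisions come too slowly, lengthen them if the sample isn't representative.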
But you need to answer the basic question: how can we measure it? There are two schools of thought on metrics. One is capture everything, and the other is capture just those things that you care about. I tend to be a bit in the middle, which is: capture everything, but then only look at those things that you care about. There's no point in measuring individual server uptime if you've got a web farm of bajillions of servers; who cares about one server's uptime? It's totally irrelevant, so don't even bother looking at it. But you definitely do care about site uptime, and probably more importantly, you care about the different uptimes of different parts of the site, because you want to degrade gracefully if something breaks; you don't want to just throw your toys out of the pram. So you need to think about what metrics you're going to have, and then you can use those to make proper decisions. Do we keep a release in, or do we roll it back? Are we making more money? Then we'll keep it. There's hard evidence. It takes away a lot of the wishy-washy defensive thinking that occurs around organisational boundaries to do with risk. You can say: this definitely happened, it's an empirical, observable fact, therefore this is the right way to do it. That's very valuable. And you should share the data around with everyone. It's useful for developers to know how many hits the site gets and how it needs to scale up. If your site isn't performing, are you going to improve that by adding nodes to your pool, or by improving the speed of the software by performance tuning it? Work that out, and you can only do that by sharing the metrics data. The final tip for having happy DevOps teams is change agentry. In order to do a lot of this stuff, you need to make changes to the way people work, and that's difficult. So you need to employ change agents: people to go and make friends, stir things up, break the boundaries, ask those difficult questions, say: is this really the right way to be doing this? Shouldn't we try something else? An interesting question about these people is: are they an internal employee or an external consultant? There is no right or wrong answer to this; it depends on your organisation. Is it better to have some proverbial skin in the game and feel the pain alongside everyone else, or do you want someone who is external and unburdened by past history? So, for those of you who were counting hard, you will have seen that the top tips actually went up to 11. That is the end. I'm sorry for running us right to the end of time. If you do have further questions, I'm around all day; grab me any time, I'm happy to talk. Thank you very much for coming.