All right, so I'm here to talk to you about how we use the code review system in OpenStack for our DevOps team, for our infrastructure team in the OpenStack project. A little bit about my background: I work on the OpenStack infrastructure team. I'm paid by HP Enterprise, but I'm not actually doing anything for HPE directly; I'm working full-time on upstream OpenStack infrastructure work. I've been contributing to open source projects for probably about 15 years now. So even before I was hired by HPE, I'd been working on open source projects, Debian and Ubuntu, and then various little ones all around, for a very long time. I'm also one of the co-authors of the official Ubuntu book; we're working on a ninth edition right now. And then I'm also maybe someday publishing an OpenStack book. It's about half done. It's been half done for a while now. The OpenStack infrastructure team itself that I work on is pretty much making sure that all the developers can do their jobs. So we run all of the services that the OpenStack developers interact with on a daily basis. That means the whole continuous integration system, which I'll talk about, as well as wikis, chat bots, pastebins, pretty much anything a developer will interact with in our infrastructure; we're running all of those services. All of our infrastructure is open source, and you can find it at git.openstack.org/cgit under openstack-infra. And I'll make these slides available later as well. Anyone can propose changes to our infrastructure, since we are an open source project. And then we all work remotely. None of us actually works in an office anywhere, and we work across various companies; there are core committers in our project from five different companies. As I mentioned, we run a bunch of services for developers. This is a small list of what we work on. We've actually had to work on getting our documentation up to speed with what we have, and we're doing a very good job of that these days.
So if you go to ci.openstack.org, it redirects to all of our documentation for the project. You don't really need to read this, but it's pretty basic stuff that an infrastructure team might run for an open source project. I mentioned the CI system. So OpenStack itself has over 1,000 projects, and OpenStack projects need to work together. So we have a whole system of unit tests, functional tests, and integration tests to make sure that a change to one component of OpenStack will not break another component. So you make a change to networking; you want to make sure that doesn't break anything in compute. We also have hundreds of companies working on OpenStack, so we don't want to break the master branch, because if someone at one company can't get their work done because someone at another company broke everything, you're going to have some serious political problems happening. So we make sure things don't break by testing them. We make sure code is syntactically clean so that people can look at their code and see something that looks vaguely familiar; at least that's the hope. And then, of course, testing is completely automated, because we have patches coming in every minute from developers all around the world. How we're accomplishing this: again, it's all open source. We're using Launchpad for authentication and bug reporting. We're hoping to move to OpenStack ID for authentication pretty soon, but Launchpad is still where we have our bugs, and Launchpad is a completely open source platform. Of course, we're using Git for revision control. Gerrit is our code review system, which I'll show you in a couple minutes. Zuul is a gatekeeper that makes sure all changes are merged together and tested together properly. And Gearman is a tool that tells Zuul which Jenkins masters it can run tests on. So we have a series of Jenkins masters that all tests are run on.
And then we have a fleet of VMs, currently about 600 to 800, that we do all of our testing on. And those are controlled with a tool called Nodepool. Now, this is all important here because we use the same system in the OpenStack infrastructure team. So this may be a little hard to see, but it's sort of the workflow that I just explained. You've got local changes here on your own computer. You send changes up to Gerrit for code review; so instead of doing a pull request with GitHub, in our infrastructure you send them to Gerrit. It goes over to Zuul, which sends it to Gearman, and then Gearman will send it up to a Jenkins master for testing. Once it's all tested, people will review the code, and then it'll go through testing again to make sure nothing has changed since it was first tested. And then it'll land in our Git repository, which is mirrored to GitHub and git.openstack.org. So I just explained our CI system, and this is not a CI talk, really. But the important part for this talk is that we use it for the infrastructure team as well. We used to be a very traditional open source project with our infrastructure: people would submit bug reports when a service would go down, and then one of the sysadmins on the team would go and fix it. But we've evolved since then. We have a number of developers on our team and then other operators, and we work very closely together to make sure all of our tooling is developed and launched in a way that makes sense. And then, of course, it's all open source. So people don't need to submit tickets to us; instead, they can submit code reviews to fix things, or they can come into the channel and bother us. I guess that's pretty traditional. So OpenStack itself is all written in Python, so there are Python tests that are run on things, and then integration tests, which I mentioned, test all of the components together. For infrastructure, things are a bit different.
We're using this testing system to test our code, but also infrastructure-y things. So we use flake8, which does pep8 and pyflakes, to do all the Python syntax checking and little unit-test-type stuff. And then we're using Puppet for our infrastructure; not because we love Puppet, it just happened to be the one we got running faster than Chef at the time. So we run puppet parser validate to validate the syntax of our files, and puppet-lint to make sure that the files all look about the same. Because again, just like OpenStack, we have people from many companies contributing to our infrastructure, so we don't want the Puppet configs to look really out of whack when people try to look at them. We also do some Puppet application tests. So we test that our Puppet configs will apply on Precise and Trusty, different Ubuntu versions. That doesn't guarantee they'll actually work in production, but we have tests that make sure all the boxes are ticked to, in theory, make sure that they would run. We also validate the syntax of a bunch of our XML files, so we have some scripts that will run through our XML files and do checking against those. We also like to alphabetize things. And we learned that humans are very bad at the alphabet. Computers are really good at the alphabet, so we have checks to make sure that files that are supposed to be alphabetized actually are alphabetized, and it'll throw back an error if you upload a file that's not. And then we also have a bunch of IRC bots. So when you want to add a bot to your channel, we have a little script that will log on to Freenode and check that the permissions on the channel are correct, because we need access to the channels in order to add them to our infra. So that's the automated testing side of things. And then we have the code review side of things. So I can show you. Yeah, OK, so I've got some pictures of Gerrit here.
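That alphabetization check is simple enough to sketch. This is a hypothetical Python reimplementation of the kind of check the gate runs, not the actual infra script:

```python
# Hypothetical sketch of an alphabetization gate check; the real
# script lives in the openstack-infra repos and differs in detail.
def check_alphabetized(lines):
    """Return the first out-of-order pair, or None if the file is sorted."""
    entries = [line.strip() for line in lines if line.strip()]
    for prev, cur in zip(entries, entries[1:]):
        # Case-insensitive comparison, since humans sort that way too.
        if cur.lower() < prev.lower():
            # The gate would throw back an error naming this pair.
            return (prev, cur)
    return None
```

A gate job would run something like this over each file that is supposed to stay alphabetized and fail the build whenever a pair is returned.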
Gerrit is a code review system. And I don't know how well you can see this, but this is pretty much what Gerrit looks like. You've got a commit message up here, a bunch of people who looked at the change here, and then we've got our automated tests down at the bottom here. And then the second page, this is like if you're scrolling down: people will vote on your change. They can also leave comments, like in this change, which is very funny if you can see it. This is my change, and I wrote, if robots.txt does not equal undefined. That's a double negative, and that's very silly. So one of my colleagues chimed in here and said, you could just do if this thing. And I was like, oh, yeah. Because I had refactored this patch a few times, and I came out with a double negative, which was very amusing. I mean, it works, but it's not really the way I should have been doing things. So that's what Gerrit looks like. So when we write a patch, we'll submit it; it's all done in Git. We submit it to Gerrit, and then it shows up on that page I showed you, and that's where people can review it. So instead of logging into servers and making changes, we're all doing everything pretty much through Git and Gerrit. At my old job, we would fight fires and go off our own ways and fix things. And then we'd come back at the end of the day and say, hey, I fixed things this way, and we'd talk about it after we had applied the changes. This wasn't so great. Sometimes I'd apply something wrong, or one of my colleagues would do something that wasn't quite so efficient. But it worked, and we'd be like, all right, next time we'll do it right. Since we're doing code review for our systems, we can do things right the first time around. It also provides a really nice framework for developing new solutions.
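To illustrate that review comment: the real change was in our Puppet configs, but a hypothetical Python analogue of the before and after looks like this (the function and parameter names here are made up for illustration):

```python
DEFAULT = "User-agent: *\nDisallow: /"

def robots_body(robots_txt=None):
    # Before review: "does not equal undefined" -- a double negative
    # left over from refactoring. It works, but it reads badly.
    if not (robots_txt is None):
        return robots_txt
    return DEFAULT

def robots_body_clean(robots_txt=None):
    # After the review comment: just test the thing directly.
    # Same behavior, much easier to read.
    if robots_txt is not None:
        return robots_txt
    return DEFAULT
```

Both versions behave identically; the review caught a readability problem, not a bug, which is a lot of what day-to-day code review does.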
So if I'm working on a project and I'm not quite sure yet where I want the direction of the code to go, I can just put up my infrastructure changes into the review system, and people can make comments on it, and we can work on it together. There's a question. OK, so the question is, how long does it take to get a change accepted upstream? So it depends. If you are reviewing our code and helping us in the infrastructure team and we know your name, we're likely to review your change much faster. If it's someone we've never heard of and they decide to submit a change to our infrastructure, it'll take longer. So it may take a couple weeks if you don't come to us and say, listen, I'm doing this change, and this is why I want to make it. But if we know you, and if you're helping us with our patches, we're going to help you with your patches, which means there is an investment in the team if you want to make changes. But I think overall, it makes for a better team. But yeah, it can be a few weeks. It depends on the change, too; if you're refactoring something, it may take months. So also, since everyone is using this code review system, no one can really make changes directly on a server that are going to be persistent. And we don't really have a separate procedure to go through for commit access. We've got core reviewers who can actually approve changes and make things go through, but no one has special access where you, as one person, can commit changes directly. Technically, it's possible to upload a patch and then self-approve, but you're not going to be a core member for long if you're going to be doing that kind of thing. It also trains us to be really collaborative.
One of the things I really wanted with this job, in addition to working on open source and doing infrastructure work, was to work on a team that was really cohesive and made decisions together. And this system works very well for that, using code review. It also means any company can contribute resources to our infrastructure. So a while back, Red Hat wanted to run an Asterisk server in our infrastructure. And most of us were like, I don't want to touch Asterisk; that's a big, scary place. So some of the people who did know about Asterisk at Red Hat devoted a bit of time to writing the Puppet configs, getting them reviewed, and then finally testing it with us to launch an Asterisk server. And it's still running, and it's used by several projects these days, because we try to steer projects away from using things like Google Hangouts and other proprietary solutions. We really want OpenStack projects to use open source solutions. So by providing them a voice system, they are happier. So these were the Gerrit pictures. Once things are code reviewed, we continuously deploy in our infrastructure. So the change gets merged, and our Puppet master, which is actually driven by Ansible now (we're driving all of Puppet with Ansible, but it still runs off of a server called puppetmaster), or whatever's running it, will grab the change and apply it to the servers, and it gets deployed. We also use a module in Puppet so that if something is committed to a Git repository, it gets checked out automatically through this vcsrepo resource, to continuously deploy. There are a few exceptions to this. Gerrit itself: since we're merging patches with Gerrit, we can't really do an upgrade of Gerrit with Gerrit. Also, that would be crazy, because upgrading Gerrit is really complicated and takes time; it's a whole process. But most of the stuff we continuously deploy. So the question I get is, can you actually do this? Like, does this actually work? It does.
We have a few tools to make it easier. Since a lot of people on our team don't have shell access to the servers, we have to give them some sort of view into the system so they can make intelligent decisions. One of the things we run is Cacti. You've probably seen Cacti before if you're into infrastructure. So you don't need to log into a server to, say, see CPU usage. If you see a server is going really, really slow, you can go to cacti.openstack.org and say, yeah, the CPU load is way crazy high; that's probably why it's slow. And then you can dig in and see how the disk is looking, memory usage, CPU usage, everything. And this is totally public as well. We also use a tool called Puppetboard. So if you submit a change via Puppet and you want to see if it's applied yet, you can watch Puppetboard. And this is just a picture of Puppetboard here, showing this AFS server change, because we use AFS. And you can see whether your change applied or not. Or if it applied and it failed for some reason, you can start on your next patch to fix the thing that failed. And that means you don't need to come to me and say, hey, Lyz, can you look at the syslog? And I'm like, I'm too busy, I'm at a conference. Instead, people can look at Puppetboard, and that's at puppetboard.openstack.org. We also, as I mentioned very early in the talk, maintain a bunch of documentation. It's in the system-config part of our documentation, and I don't have a picture of that, but if you go to either this URL or ci.openstack.org, you'll be able to find all our documentation. And we're really disciplined about this, because if we don't do documentation, we can never go on vacation. So that's kind of our code review and our testing system. There are a few caveats. We can't do everything through code review. Sometimes we do need to log into a server. Sometimes I just need to dig into a syslog, or Matt over there needs to look at his subunit2sql logs.
We also need to do complicated migrations and upgrades. Like I mentioned with Gerrit, we can't really do that through code review. We can stage all of the patches; we can go into Puppet and make all the changes to get ready for a migration. But when it comes down to it, we have to use other tools to finish that. We still have a manual process for the initial deployment of the single servers that run our services. So I briefly mentioned Nodepool, which runs all our testing. That's all completely automated; those VMs just take care of themselves, in theory. They break all the time, but the system is designed for that to be completely automated. But when we're bringing up a new service, like a wiki or a new Gerrit server or some new Jenkins server, that's all kind of manual. We are working to automate that more and more. And then, of course, we can't have everything be open source. Our passwords and SSH keys are not public, because that would mean OpenStack would be completely insecure and anyone could change anything at any time. So those are privately managed on one of our servers. But we do use Git, so they're revision controlled at least, and we can keep track of those things. For the things we can't do with code review, we use collaborative tooling. This is Etherpad; a lot of people use etherpad.openstack.org for collaborative stuff. We also use it for our maintenance windows. So in this one, I think we were, what were we doing here? Yeah, we were making a bunch of changes to Gerrit. In order to make those changes, we had to rename a bunch of projects, so we had to shut down Gerrit, which means we shut down Zuul, and we shut down Gerrit, and so on. We had all these steps laid out in the Etherpad. We chatted on IRC, and then we also claimed tasks in the Etherpad to do them. So Jeremy, fungi, up here, he stopped the Puppet runs. I shut down Zuul. My friend Spencer, nibalizer, there shut down Gerrit.
And so we're all chatting in IRC and also working on this document. And this document would have been written beforehand, and then we just add our names to it as each step is done. We also do a lot of human-based collaboration. So I mentioned we have an IRC channel; that's #openstack-infra on Freenode. We also have an OpenStack Infra incident channel. So when something really goes wrong, we all pop over to the incident channel, and we can focus on solving that one specific problem. Then we don't need to worry about other people coming into the channel and asking us, is something broken? Yes, everything's broken, but we're focusing on fixing it rather than answering questions right now. We also have an OpenStack sprint channel, where we will focus on a specific project for a day or two and everyone works on that single project. One of the things we did a while back is a bunch of work on our Puppet modules, to split them all out from a giant monolithic Puppet module we had. And we all just went to the sprint channel for a couple of days and worked through getting all the changes in, so it didn't break everything all at once. We have weekly meetings, which are all logged, so anyone who doesn't get to our meeting can see the logs afterwards. When we're sharing a bunch of stuff, we use pastebins quite a bit; I'll paste a bunch of logs into a pastebin when someone needs to see them. And then we see each other at an OpenStack design summit every six months, which is really valuable, because it turns out you do need to see your colleagues every once in a while. At least on our team, it's been very helpful. We don't do voice or video calls, just because we don't really like them. We like each other; we just don't like talking on the phone. One of the challenges we definitely have, which, if anyone here is working with teams across the US or Europe, you may be familiar with, is time zone issues. We haven't quite solved that problem.
It's really hard to have a team that's distributed around the world when you're working on an infrastructure. So the first person in a region who's a core and root member of our team really struggles to find cohesion with our team, because they're waking up at the wrong time. We've tried to start doing handoffs, like: this is what's wrong, good luck. And then they may be reluctant to land things in production, and they don't get the opportunity for mentorship that the rest of us do, because when something goes wrong during the day in the US, I can just say, help me debug this, and I've got people around. But if you're the first member of our team in a new time zone, that can be tough. So we just try to add more people and get them the support they need. But it's a tough problem; we don't really have a solution at this point. But between the code review that we're doing and the continuous integration testing that we're doing, the team works really well, and we actually have a pretty solid infrastructure that people can contribute to. Since we're all open source, anyone can contribute, and we could always use a lot of help, because it's a big infrastructure, and things break sometimes, and we don't always have the people power to fix them. So if you see something wrong, we welcome the help. And that is all I had. I've got some contact information here if you're interested, and then if anyone has questions; I think I'm at time, but yeah. [Audience question:] For the developers in our company, it's gotten really hard for me to get the team to do code review, because I have to pause them and make time for it. So in your experience, how do you train the culture, or what else do you do, to make people do code review for the team? That is an excellent question. So he asked, how do we make our team do code review? Because it's really hard; people just want to write code, they don't want to review other people's code. So in the OpenStack project, we've tackled this project-wide.
Reviews are a big deal. Reviews are how you get core access on a project, and they're how most of the statistics in our project are framed. They're not about lines of code; in OpenStack, it's about reviews. So we've tried to push that seriously in OpenStack itself. Within the infrastructure team, it's also what gets you on your path to core access to the team and root access. So you'll never, ever get to log into a server in our infrastructure if you don't do code reviews. Also, as a team, you're less likely to get your patches reviewed if you don't review other people's. So I will review code if someone reviews my code, and it's sort of just institutionally that way. And if you never review anything, your code's probably gonna sit around. So we motivate people that way. So: rewards based on number of reviews, even if they're, like ours are, very project-based rewards, as far as clout in the community and getting the +2 access so you can approve changes yourself. Yeah, so that's pretty much what we do. Happy to talk about it more. Anyone else? All right, well, I'm around through tomorrow, so feel free to grab me in the hallway or whatever if you have more questions. Thanks.