Well, I guess we'll get started. Those of you who have been in this room this morning have probably heard a lot of really good advice about how to contribute upstream from the upstream perspective: all the tools, processes, and good practices you can use for successfully contributing to and interacting with the upstream OpenStack community. I'm going to talk about things from the other perspective: the view from inside an organization, and how you can get a team that's not used to contributing to or working on open-source software up to speed and contributing in a fully productive way.

I'm Tom Leggett. I'm an engineering manager for a team of engineers at HP. First of all, a disclaimer and an apology to my colleagues whose stories I have borrowed, exaggerated, or otherwise mangled.

It's useful to have a bit of context and some introductions. You know who I am. My team is the Nova team; Compute as a Service is our official name. We're about 13 people, and importantly, we're a product engineering team. This means we have support and operations responsibilities for a couple of live HP product lines: HP Helion OpenStack, the HP Helion public cloud, and the community edition of HP Helion OpenStack. We are third- or fourth-line support for each of those product lines.

I want to tell you a little story about the way we worked a couple of years ago, as a way of setting the context. We were working in a mode I call interrupt-driven development. Each engineer had multiple tasks on their plate at any moment, certainly more than five. Support work was handled by one individual, and knowledge was very siloed: certain individuals could do certain tasks on certain parts of the system, and if that individual was busy, you were out of luck until they were free. This meant that predicting when any one piece of work our customers cared about would get done was very difficult.

One thing the team decided to put in place early on was something I called the Nova engineering environment and product support rota, but which the team rapidly came to know as sentry duty. This was a daily rotation of duties. The person on duty would answer any walk-up, email, or IM queries, deal with any issues in production, triage incoming bugs, deal with issues in the build tooling and dev-test environments, and, if there was any time left over, work on improvements to those systems. What this meant was that knowledge previously siloed in one person's head rapidly got spread around the team. I'm telling this story to demonstrate the kind of improvements the team was putting in place and was capable of driving themselves.

I want to have a brief aside on team size before I go any further. Thirteen people is a big team, and you should have some intuition about why. Let me briefly explain why that is, and what you can do about it if you've got a big team that needs to act as one single team. Say you have a three-person team: Alice, Bob, and Quentin. Quentin has a question, and he knows the team knows the answer, because he's seen this piece of knowledge put to use before. He has four different routes to that knowledge: he can ask Alice, who knows and tells him; he can ask Bob, who knows and tells him; he can ask Alice, who doesn't know, asks Bob, and then tells Quentin; or the same thing the other way round, with Bob asking Alice. So in a three-person team there are four different ways you can find out a piece of information that is only known through oral history. At four and five people there are 15 and 64 different routes, respectively. By the time you've got a 13-person team, you're looking at about 1.3 billion different ways that a piece of information the team already knows could be transmitted to the person who doesn't know it.
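If you want to check that arithmetic, here's a minimal sketch. It assumes the model I just described: the asker queries an ordered chain of distinct colleagues until the answer comes back.

```python
from math import perm

def knowledge_routes(team_size: int) -> int:
    """Count the ways one team member can obtain a fact the team knows:
    the asker queries an ordered chain of 1..(team_size - 1) distinct
    colleagues, and the answer relays back along the chain."""
    others = team_size - 1
    return sum(perm(others, k) for k in range(1, others + 1))

for n in (3, 4, 5, 13):
    print(f"{n} people: {knowledge_routes(n):,} routes")
# 3 people: 4 routes
# 4 people: 15 routes
# 5 people: 64 routes
# 13 people: 1,302,061,344 routes -- about 1.3 billion
```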
So there are two morals here. One: keep your team sizes small. Two: document your stuff. Write it down, maintain a team wiki, whatever works for you.

Two years ago we had a proprietary fork of the Diablo code base. This was in early 2013, and our cloud was running Diablo, a September 2011 code base, with about 30,000 lines of proprietary closed code on top. Some people find it amazing that we managed to make Diablo work in a production system at all. However, we were now at the point where we wanted Grizzly. Upstream was rapidly catching up with and overtaking our fork in certain aspects. We wanted to start using and maturing Neutron, for example, in our public cloud; our customers were telling us they wanted the flexibility and power that Neutron offered. Unfortunately, we had approximately zero upstream contributions, give or take one, and we had no real experience working or engaging with the upstream community.

So we did a few things. The most important was to institute a formal project around getting back upstream; the internal code name was Bravo. Making it formal was the key, because it allowed us to get time and priority from the business, and from the people who had input into what we did day to day, in order to work on getting our code base back upstream. So my advice about maintaining a fork of OpenStack is: don't. It will kill you in the end.

Okay, so we've got a project. We've got some time and some priority carved out for the team to start working upstream. You could say we're at the river mouth, at the estuary: we've been released from the cages of the salmon farm. Unfortunately, not really knowing what to do, we just swam around in circles near where the cages were. So we needed to put some practices in place, and a bunch of the other speakers in this room today have covered those practices and tools at length: read the mailing list, be on IRC, all of that good upstream stuff. We also started sharing on our internal communication platforms, either via email or in our HipChat team room, whenever there was an interesting discussion going on in IRC or on the mailing list.

Then we put in place some very simple practices designed to encourage people to get started collaborating with upstream. The first thing we decided to do, based on the advice of some folks we met at one of the summits, was to start with code reviews. This turned out to be very good advice.
For all the reasons the previous speakers have mentioned: it gets you visibility, it starts to build your karma in the community, it builds familiarity with the code base, and it gets you used to giving and receiving peer feedback. To encourage this, we put in place a simple practice at our stand-up every day: raise your hand if you'd done a review the previous day. A bit like this photo: raise your hand if you can't swim. That got our review count up. It's important to note this was not a hard-and-fast rule; you weren't in trouble if you hadn't done a review in the past day. As a couple of the other speakers have mentioned, it's as much about quality of reviews as it is about quantity. But by gently encouraging folks to get out there and start reviewing, we made our first real foray into community engagement.

The next thing we started to do was celebrate small wins. The first time one of our team members got a patch merged, that was donuts for the whole team. This set a trend which, thankfully for my waistline and for HP's health insurance plan, didn't continue to this day: we now only get donuts when someone gets a change merged with only one patch set.

There was also a realization that language is powerful. We started to refer to any place where we had diverged from upstream in any significant way as technical debt. That was a metaphor our product management and senior management folks could understand. And we started measuring that debt and reporting it upwards on a regular basis, simply in terms of the number of local patches we had to apply to make the project work in the environments we needed it to work in.

We also developed and matured a piece of software called git-upstream, which is available on Stackforge now. It's a small toolset that makes working with a Gerrit-based workflow easier. What it really gives you is rebasing with a slightly easier-to-read history, but most importantly, it will automatically drop your local patches once it notices they've merged upstream. It's a way of reducing the friction of carrying local patches against the upstream project.
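git-upstream itself does considerably more, but the core detection, spotting local commits whose content has already landed upstream, can be approximated with plain git. A minimal sketch; "upstream/master" here is a placeholder for wherever upstream actually lands in your setup.

```python
import subprocess

# `git cherry` compares commits by patch-id rather than by SHA, so a
# local patch still counts as merged even though the upstream commit
# has a different SHA.  Lines starting with "-" are local commits
# whose equivalent already exists upstream; those are the ones you can
# drop from the carried-patch stack on the next rebase.
out = subprocess.run(
    ["git", "cherry", "upstream/master", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout
droppable = [line.split()[1] for line in out.splitlines()
             if line.startswith("-")]
print(f"{len(droppable)} local patches already merged upstream")
```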
Okay, so we'd made a start. We were on our way upstream. Then one of my team members said: my goal for the next six months is to commit as much code upstream as I would have committed in an afternoon in my last job. When I first heard it, that was quite a shocking statement. Is working upstream really impacting our productivity that badly? How does this developer feel about being so apparently unproductive? But I think it indicates we were moving in the right direction: we were thinking about committing upstream, working upstream, and what it takes to work upstream. I don't know whether this person would agree, but we'll see.

That led to a realization. Writing working code is hard in the first place. Writing working code that works with other working code is harder; writing integrated code is an order of magnitude harder than code that just works in a nice, pristine, standalone environment. Hardest of all is writing code that works with other people's code: code that changes on a daily basis, is owned by another team, and is largely unknown to you. But the point is, it's not only harder, it's more valuable. Code that only works in its own isolated little bubble is all very well if it performs its task. Code that is integrated into a wider code base and is part of something bigger, like OpenStack, allows combinatorially more ways of exploiting its usefulness and power. It's harder because it's more valuable, and it's more valuable because it's harder.

Peer review. Getting your first -1 on the first patch set you've ever pushed up is devastating. But giving and receiving good peer feedback is a skill, and it can be learned through practice. One of the things I've noticed happening organically in our team is that people will critique each other's reviews: what was good about this review they saw on this patch set, or that review they saw on another, with an aim of getting better at giving as well as receiving reviews. To get better at receiving reviews, you probably just need to get your stuff out there and start getting feedback.

All of this sounds really hard and quite demoralizing, but after a certain amount of time we noticed we were getting better. Patch sets were going through fewer iterations before being accepted upstream. The patch sets we submitted were of better quality, with more comprehensive tests. We were definitely on an improving trajectory, and we got there by engaging with the upstream community and leveraging the power of all these immensely clever and knowledgeable people tearing our code to shreds. It's soul-destroying, but it's making us better engineers.

It was about this time that we also started to measure and track things. It's not a leaderboard, because it's sorted alphabetically, but it's a dashboard of the number of reviews we've done over the past X amount of time. We can use it to see who's giving more positive or negative reviews, and what our overall level of contribution as a team is. I'm slightly reticent to give this advice, because if you turn this into a hard target, you'll start to see some pretty dysfunctional behavior; if it's used purely for information, I think it's very different. If you're targeting a certain number of reviews in a certain period as a hard target, and you're in trouble if you don't hit it, then the quality of reviews is going to be awful. People will do drive-by reviews: pick the easiest review they can find, something somebody else has already +1'd, +1 it themselves, and they've got a stat. That's not useful to anybody. Also, all of this is available through Stackalytics. There's nothing proprietary or secret here; this data is all pullable from Stackalytics. We just found it convenient to have a team-focused dashboard, a team-focused view.

We also had a view that gives us stats on a patch-by-patch basis. Here we track things like time to first review and time to first core review: the time between submitting a patch set and it getting attention from somebody else in the community, or somebody on the core team. This lets us have discussions about what it was about a given patch that made it get attention sooner than the others. Was it chased in IRC? Was it smaller or larger? Was it better formatted or documented? Did it have a really nice commit message? All the kinds of discussions we can have about why that was.
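Computing that kind of stat is straightforward once you've pulled the change records, for example from Gerrit's REST API or Stackalytics. A minimal sketch, using a simplified record shape of my own invention with a created timestamp and a list of review timestamps:

```python
from datetime import datetime
from statistics import median

def hours_to_first_review(change: dict) -> float:
    """Hours between a change being pushed and its first review.
    `change` is an assumed, flattened shape: {"created": iso8601,
    "reviews": [iso8601, ...]} -- adapt to whatever your source emits."""
    created = datetime.fromisoformat(change["created"])
    first = min(datetime.fromisoformat(t) for t in change["reviews"])
    return (first - created).total_seconds() / 3600

# Illustrative records, not real data.
changes = [
    {"created": "2014-10-01T09:00:00", "reviews": ["2014-10-01T15:30:00"]},
    {"created": "2014-10-02T11:00:00", "reviews": ["2014-10-06T10:00:00"]},
]
waits = [hours_to_first_review(c) for c in changes]
print(f"median time to first review: {median(waits):.1f} hours")
```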
However, we still get to patch set 42. How do you maintain your focus and your morale when you're at patch set 42 on a change and it's just been -1'd for the 42nd time? I guess there are three things you can say about that. First, at least it's still getting attention and hasn't been -2'd or abandoned, so people see value in the thing you're trying to propose. Second, it's often useful to go back to the first patch set or two of that change and look at how much it has evolved over those 42 patch sets: how many spelling mistakes you've corrected, and what you've learned about the change you're trying to make to the code base. And third, if you're in a really foul mood, you can console yourself by saying that maybe it says as much about the review team's ability to convey their will if it takes 42 patch sets to get there. But only if you're in a really bad way.

Okay, so what happens when you reach a dam? You've been asked to deliver a feature by your customers or by the marketplace, and no matter what you do, you cannot seem to get the work accepted upstream. This is where we use the git-upstream tooling to maintain a very small set of changes against the vanilla upstream code base. We have a very clear and ruthless policy about the kinds of patches we will accept locally. In fact, we have a process loosely based on the Nova specs-repo process: you submit a patch to Gerrit explaining why you need to carry this local change, against a template that forces you to think about the area of code you're patching, how often it changes, and how often you're likely to have to rebase. Because for every proprietary patch you put on top of the upstream code base, you risk diverging in behaviors that customers care about, and you let yourself in for a large maintenance overhead if you want to keep up with the pace of upstream development. My general advice is to not do it at all. But if you absolutely have to, because you've hit a dam upstream, then do it with clear and ruthless policies, and enforce them transparently.

Okay, waterfalls. Most corporate development processes are waterfalls, that is, phase-gated, even those that claim to be agile; they're probably closer to a phased approach, as anyone who has lived through a QA sprint or a test sprint can testify. The problem with a phased approach to development, where you run through requirement specification and elucidation, then coding, then testing, is that it doesn't deal well with uncertainty or change, especially late in the day, but really at any point in the process. And when you're working with upstream, there can be plenty of uncertainty about when something is going to land, because so much of it is out of your control. If you're trying to deliver a major feature to your customers, it might comprise a dozen or so patch sets, any one of which may be delayed for any amount of time, and if one gets pushed beyond a feature freeze, you may have missed your delivery window altogether.
So how do you deal with that? Typically, product managers and project managers, and I can be rude about project managers because I supposedly am one, don't deal too well with uncertainty. They don't like being told that maybe this will go in next week, maybe it'll take three months. It gives them the heebie-jeebies and keeps them up at night. However, just because there's uncertainty in the individual items doesn't mean we can't provide useful forecasts to the business, to the folks who care about this stuff. We don't just say, we can't estimate this, you'll get it when you get it, because that's not useful to anybody. Marketing folks need to know when to roll out their campaigns; the training folks need to be able to prepare their training plans; et cetera.

What we did for the Bravo project was look at the history. We had collected about three months' worth of data on how long it was taking us to get things done. Most good ticket or work-tracking systems will do this for you. I'm not talking about hours spent on a task; I'm talking about calendar time, wall-clock time, from when you started work on something to when it went into your product and you had no further thinking to do about it: fully QA'd, tested, and done. If you've got historical data about how long those sorts of things take, and you know roughly how many things you've got to do in a given release, you can come up with some surprisingly accurate forecasts surprisingly easily. Look at the statistical distribution of the end-to-end delivery times of your individual work items; you can then very easily use a statistical or Monte Carlo forecasting method to predict confidence levels for the delivery of the whole chunk of work you've been asked to do.

I appreciate I'm going through this pretty fast; this slide probably deserves a talk of its own. So I'm just going to say: start by drawing a cumulative flow diagram for your work items as they move through the different phases of your project. Cumulative flow diagrams are one of my favorite visualizations for showing how a sequential process behaves over time. This diagram is from nodepool, a component of the upstream infra CI suite, and it measures how many nodes are in each state at a given time. The idea, applied to your process, is that every day or every week you count how many tickets are in each state: how many in requirements planning, how many in development, how many in QA, how many deployed to a dev-test environment, to a staging environment, to your production environment. That lets you draw a picture that looks like this. There's all sorts of things you can tell from a cumulative flow diagram, but the point is: if you're gathering enough data to draw one, you're gathering enough data to make some accurate forecasts about when stuff will get done.

There's a difference between accurate and precise. Accurate is how right you are; precise is what sort of a range you're offering. These kinds of probabilistic forecasting methods will let you offer accurate forecasts. They might not be that precise if you don't have a lot of data.
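To make the Monte Carlo idea concrete, here's a minimal sketch with made-up numbers. It resamples historical end-to-end delivery times, and it simplifies by assuming work items finish one after another; a real forecast would account for work happening in parallel.

```python
import random

# Historical end-to-end delivery times, in calendar days, for finished
# work items -- illustrative numbers, not real data.
history = [3, 5, 8, 2, 13, 6, 4, 21, 7, 9, 5, 11]
remaining_items = 30     # items left in the release
trials = 10_000

random.seed(1)  # reproducible for the example
# Each trial: draw a delivery time from history for every remaining
# item and total them up, then sort the simulated totals.
totals = sorted(
    sum(random.choice(history) for _ in range(remaining_items))
    for _ in range(trials)
)
# Read confidence levels straight off the sorted simulation results.
for confidence in (0.50, 0.85, 0.95):
    days = totals[int(confidence * trials) - 1]
    print(f"{confidence:.0%} confident: done within {days} days")
```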
So: we started from a salmon farm in an estuary, and we've worked our way upstream. We flattened some waterfalls along the way. And this is pretty much where the metaphor breaks down, because at this point in a salmon's life cycle they spawn and die, which is unfortunate. Luckily that hasn't happened to our teams. I much prefer this picture: by putting these very simple but very effective practices in place, we've managed to maintain a sustainable pace for our engineering teams, and we've lived to swim another day in the OpenStack river. Thank you.

I finished early, so who's got some questions? Very good, that means I've thoroughly confused you all. Thanks.