Good morning, everyone. No watching Stranger Things during my talk, OK? I am here to talk to you today about the relationship between tools and culture. How many of you develop tools as part of your job? We heard about a lot of tools today. So how many of you are thinking about adopting some of those really cool tools we heard about this morning? Great. So I think so, too. I heard some great tools. What I'm going to talk about is how culture impacts the tech, and then also how the tech can turn that around and also impact the culture. So we're going to start with talking about how culture impacts the tech. I think it's really important for us to understand when we build tools that we're basically reflecting the culture in those tools. If we're not reflecting the culture in those tools, we might be challenging the culture with our tools. Those are two very different approaches, and they need to be dealt with very differently. And so being mindful of which approach you're taking is super important to the success of your tool. Likewise, when you adopt a tool, understanding which one of those things is going to happen when you bring that tool into your organization, whether it's going to challenge the culture or reinforce it, is also a consideration. So I'm going to talk about Netflix culture a little bit, and then I'm going to bring in a discussion about a tool at Netflix so that we can reinforce that conversation. So Netflix is really about freedom and responsibility. I'm going to talk in a little bit more detail with the slides, but basically it means that employees are empowered to make a lot of decisions, but they need to be responsible for making sure that's a good choice for the company, for their team, for the product. I thought the clicker was going to run out of batteries. That would have been a different problem. At Netflix, we really encourage those decisions to be made at the place in the organization that's going to feel the impact the most.
And so employees are empowered to make these decisions at every level. It means that we also have to share information really deeply, broadly, deliberately. Because otherwise, people won't have enough context to make that decision, to make any decisions. And we try to avoid unnecessary rules. These pieces of freedom and responsibility are the part that I want you to think about as I talk about an open source tool that we have called Spinnaker. How many of you use Spinnaker today? So Spinnaker, I'm not here to encourage you to use Spinnaker, although I think it might be fun if you thought that it was useful. What I'm really here to do is to talk about it as a case study and talk about how our culture is being reflected in that tool. But first, I want to talk about the genesis of Spinnaker as a continuous delivery platform and infrastructure management platform. Before Spinnaker, there was a tool at Netflix called Asgard. How many of you used Asgard for cloud delivery? Yeah, several of you. Asgard was really built to simplify the delivery of our services to the Amazon cloud. We opened it up and put it in open source. And a lot of people enjoyed it. They used it in their own environments and for their own purposes, but when they did, they ran into some of our cultural assumptions, such as the fact that we didn't need permission to be able to do deployments. They also found that it only supported one cloud provider. And there were a lot of other limitations about Asgard that made us think that we wanted a continuous delivery platform beyond just a deployment tool. So we were working on Spinnaker inside of Netflix. At the same time, there was another project, which you probably hadn't heard about. It was an internal project called Edge Center. This was in 2013. This was being done by another team at Netflix, the Edge API team.
And they were building this for their own purposes because they couldn't wait for the centralized team, us, to finish the product that we'd been working on. They had very immediate needs, and they needed to do fast deployments, and they needed to have those done in a safe, reliable, repeatable way. So they built this tool called Edge Center. And a lot of people were really enjoying using it. And then other people were coming to them saying, gosh, we're not on the Edge API teams, but we'd really like to use your tool. And that exerted some pressure on that team, because they're not a centralized team; they don't typically support centralized tools. You might wonder, what was Engineering Tools doing during this time? We were continuing to work on the centralized product. But we were chasing the features that were showing up in Edge Center. If we wanted a centralized tool that was going to serve all of Netflix, we needed to have those features represented in the tool that we were building. But remember, they were solving a problem for a very distinct purpose. And so they were able to run very, very fast. And we were basically solving the problem for the rest of the organization. So we were running a lot slower, maybe walking. And so this presented a challenge. How are we ever going to catch up to them? These two teams met a lot. And every month they'd meet, they'd say, oh gosh, not yet. We're not ready yet. We're not ready yet on the Spinnaker side. And the Edge Center side would say, your product isn't ready for us to adopt yet. And I'm telling you all this because I think that the answer to this problem basically demonstrates two of our cultural values at Netflix, courage and selflessness. The answer to this problem was that Andy Glover, who leads the delivery engineering team at Netflix, went to Sangita Narayan, who leads the Edge API team at Netflix, and said, and this took a lot of courage, I can't possibly keep up with you.
Is there any way we can join forces and make this happen? Because I am not going to be able to get Edge's needs built into my tool fast enough for you and for the rest of the company. And Sangita, who's very selfless, said, you know what? We can come up with a solution to this problem. How about if I give you two of the developers, the two developers who are working on Edge Center? I'll stop development on this tool, and they'll work on your team. And they'll build the features that we need into Spinnaker. They'll bring the context from the Edge teams into the product as well. And so this is what they did. And they worked together as a team. And they were able to realize Spinnaker. I probably just embarrassed those two people on stage. And I did not warn them that I was going to do that. But I think it's really important to call out cultural values and make sure that people feel appreciated when they do something that I think is really quite unique. The resulting product was a continuous delivery and infrastructure management platform. Edge Center was a continuous delivery tool. Asgard, in its infancy, was a deployment tool and an infrastructure management tool. Spinnaker brought those two together. It was designed to be a pluggable architecture. One of the challenges I mentioned we hit with Asgard was that people would fork it because they had extra needs on top of what Asgard could do. It was never going to work for everybody. And so they would fork the tool in the community and go off on their own path. And we would never get the benefit of that innovation. Likewise, people would fork it for other clouds. So Spinnaker was designed, from the very beginning, to be a multi-cloud tool. We worked with other companies in order to realize the multi-cloud solution. And this actually changed the architecture, because you're thinking about what the cloud is, not just what your implementation of the cloud is.
So in 2014, Spinnaker was an internal tool. We used it for about a year before we opened it up to the broader ecosystem. But during that time, we were working with some partners on developing this multi-cloud strategy and this pluggable architecture. When we finally released it, we talked about the fact that we'd been working with Google and Microsoft. We were supporting Amazon's needs with our implementation. But Google was supporting its own endpoints with a cloud provider, Microsoft was providing its own cloud provider, and Pivotal was providing one as well. So what you see today is a tool that can support many cloud providers and that doesn't have to be forked in order for all of us to realize that innovation. The community is vibrant and strong, and I'm really excited about having all of them working together. What's not on this list is that Oracle is also working on a bare-metal solution. So it continues to grow. So let me take you back a little bit: Spinnaker in 2015 really started at the bake stage and moved forward until your instances were live in the cloud. We're continuing to expand that, and I think that moving forward we'll actually be pushing back a little bit into those earlier stages around continuous integration and code check-in, so that what you actually feel is this blurring of the distinction between continuous integration and continuous delivery. And you can also see that the stages after the deployment, well, all the way from the baking out to the end of this timeline, show that we are also pulling in resiliency testing with our ChAP platform and also some health and status monitoring as well, which I'll show you in a few minutes. So continuous delivery: what you find here is primarily support for continuous delivery. There is a pipeline-building environment. We also support setting persisted properties.
We're deploying immutable objects into our ecosystem, and we need to be able to make changes to those immutable objects. That's kind of funny, isn't it? We need to adjust those immutable objects, maybe to turn on debugging and such. And so persisted properties give us an ability to do that. We've often called these fast properties. We're starting to call them persisted properties now because we want to emphasize the fact that these can roll out just like deployments roll out across your infrastructure. And they can be rolled back as well. And one place where I want to stop and talk about culture, with both the deployments and the persisted properties, is that one of the expectations we have at Netflix is that we'll be able to have an active-active environment. We don't want your Stranger Things streaming to be interrupted by the fact that an engineer maybe pushed out an errant deployment or changed a property that they shouldn't have and the whole system goes down. So one of the ways we can respond to problems in our ecosystem, which a lot of times are us shooting ourselves in the foot, is to fail over to another region, which gives us enough time to figure out what we did wrong in this region. What that means, though, is that in a multi-region environment, what we don't want our engineers to do is to push globally to all regions at the same time, because then what have you just done? You've completely defeated the purpose of having an active-active environment. And so we need to set that expectation within our culture and then also with our tools. But we don't want to restrict it, because you understand your service better than anyone else, and we want to make sure that you have the freedom to make that choice, but you understand that basically you're taking on a lot of responsibility. We also want to give the developers the ability to manage their instance health. What needs to happen to this service?
Do I need to re-bake it and redeploy it because something was deprecated? I want to be able to give you a dashboard and an opportunity to see this at the place where you're spending your time. So at Netflix, like I said, it's a pluggable architecture. Some of the plugins that we have added to the pipelines include ChAP, our chaos automation platform, automated canary analysis, and then some squeeze testing to figure out how to right-size your instances. Key to understanding the way that this tool works and the way that Netflix works is to understand that we want to provide guardrails, not gates. I don't want to stop you from doing something. I want to give you context about why we think it might be a bad idea. I want to make sure that you have all the context to make that decision, but I don't want to prevent you from doing it. We do this for a couple of reasons. One, because the developers are making the decision about what to deploy. And also because they understand their service better than anyone else. If there's an emergency, I do not want to get in their way. The last thing I want to do is put up a gate that basically people have to jump over and trip and fall and make things worse, right? And our tools really reflect that. Developers have a lot of responsibility. They decide when to deploy. They're on call for that service. They're the ones who are going to be woken up in the middle of the night if they made a mistake, and they have to fix it. So we give them context about execution windows as well. And if all of you are watching Stranger Things, we have a pretty good understanding about what those time frames will be. And that's reflected at the bottom of the screen, where you see this very cyclical graph, a very repeatable pattern in what traffic looks like to our service.
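To make the "guardrails, not gates" idea concrete, here is a minimal sketch of what such a check could look like: every rule attaches context to the deploy instead of blocking it. All names and checks here are hypothetical illustrations, not Spinnaker's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Deploy:
    service: str
    regions: list
    peak_traffic_now: bool = False
    warnings: list = field(default_factory=list)

def guardrails(deploy: Deploy) -> Deploy:
    """Attach warnings with context; never raise, never refuse."""
    if len(deploy.regions) > 1:
        deploy.warnings.append(
            "Pushing to all regions at once defeats active-active failover.")
    if deploy.peak_traffic_now:
        deploy.warnings.append(
            "Traffic is near its daily peak; consider an execution window.")
    # The deploy always proceeds: the engineer owns the decision
    # (and the pager), so the tool informs rather than prevents.
    return deploy
```

The point of the sketch is the return type: the engineer gets context back, not an exception. A gate would raise; a guardrail explains and steps aside.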
We provide this information so that developers can create time-based deployments that target times that aren't busy, windows that they're able to deploy into. But they might have a service that they don't really think will be affected by that, and it's okay for them to deploy outside of those windows, so they don't have to use execution windows. Engineers manage the life cycle of their services, but we also want to give them context about which services are absolutely critical to the service overall. And this traffic guards page is a solution to the problem of making sure that systems don't go to zero, that you don't accidentally delete or disable some services that you didn't expect to. Engineers decide their own deployment strategy. There are many different deployment strategies built into Spinnaker, but we also give, in the continuous delivery pipeline, the ability to add a manual judgment stage. Now, does that sound counter to continuous deployment? Maybe it does, but it's optional. I think what it does is give our developers the opportunity to build confidence in our tooling: they know that when they push that button, they can decide, do I want a manual stage? Do I want that last screen before I check out at a retail store that says, don't worry, we're gonna let you look at this one more time before it processes your credit card, right? And that's kind of the equivalent here. It also gives them the ability to build confidence in their own service. Am I ready to go without a manual judgment stage? Maybe I'm not yet. And so I want to make sure that we build that into our tooling so that it's really accessible to everyone. We want to provide smart defaults based on the criticality of a service. We've started identifying our services as critical to streaming, critical to the business, with different expectations around each service.
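The traffic guards idea boils down to a check along these lines. This is a hedged sketch under assumed names, not the real implementation:

```python
from typing import Optional

def traffic_guard(serving_instances: int,
                  instances_to_remove: int,
                  takes_traffic: bool) -> Optional[str]:
    """Warn when a destroy or resize would leave a traffic-serving
    cluster with zero instances; return None when no guard trips."""
    remaining = serving_instances - instances_to_remove
    if takes_traffic and remaining <= 0:
        return ("This cluster still serves traffic, and this operation "
                "would leave it with 0 instances. Are you sure?")
    return None
```

Again, the output is a question with context, not a hard stop: the guard only fires for clusters that are still taking traffic, and even then it asks rather than refuses.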
The core services that provide you with the ability to stream a movie all have great fallbacks. If the service that tells you what you've been watching on a different device isn't available, or the service that gives you recommendations about shows you might like isn't available, we still want you to be able to stream a movie, right? And so we want to treat those core services, those critical services, a little bit differently than we would treat the services a little bit farther out in the tiers. Where are we going in the future with this? As I said, I think we're going to extend back a little bit more into the continuous integration space, kind of blur that distinction a bit. Declarative continuous delivery gives us the ability to be a little bit more abstract about what we're deploying and where we're deploying to. It gives the developer more focus on the domain and the problem that they're trying to solve. We're taking on some of that responsibility: they give us some configuration information rather than specifying which specific instance type they want to deploy to or which specific parameters they want to use over here. So we're going to see a lot of movement in that area as well. I'll spend a few minutes here talking about how tech impacts the culture, and how some of this was a surprise to me. An increased interest in delegating that responsibility to us was kind of a surprise to me. When I joined Netflix about four and a half years ago, the engineering culture was really all about: let us configure all the things. We want to see all the details. We want to look at every detail that might be available in the Amazon console. And by the way, we might want to go to that sometimes. So give us that access, right? We just don't want you to make decisions for us. We want to be very involved with all of that. In 2017, it's: give us smart defaults.
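That shift toward declarative delivery and smart defaults could look something like the sketch below: the engineer declares a small amount of intent, and the platform expands it based on criticality. The field names and default values are assumptions for illustration, not Spinnaker's actual configuration model.

```python
from dataclasses import dataclass

# Assumed defaults keyed by how critical a service is to streaming.
DEFAULTS = {
    "critical": {"strategy": "red/black", "canary": True, "manual_judgment": True},
    "standard": {"strategy": "rolling", "canary": False, "manual_judgment": False},
}

@dataclass
class Intent:
    service: str
    criticality: str = "standard"

def resolve(intent: Intent) -> dict:
    """Expand a small declaration of intent into a fuller deployment
    spec, leaving instance types and parameters to the platform."""
    spec = dict(DEFAULTS[intent.criticality])
    spec["service"] = intent.service
    return spec
```

The design choice is that the engineer names the problem domain (my service, how critical it is) and the platform takes on the rest, which is exactly the delegation the 2017 engineers were asking for.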
Now, this might be a change in the people that we're hiring; maybe they're more domain experts. It might be a reflection of the fact that people trust our tools more. I don't know, but what I'm telling you is that people really do want to focus more on the problems that they came to solve, rather than every single person understanding the details of a build and deployment. Something for you to consider as you build your own tools: what are you making your users know? We built some understanding about what the best practices are across all of these providers. We really understand what a cloud deployment looks like, not just an AWS deployment or a Google deployment or an Azure deployment. We really get to build those abstractions, really understand what the cloud ecosystem looks like, and build those into the tool. Lots of community contributions. I don't think this is always realized in open source projects, but this community's been really responsive. We've taken inspiration from the Kubernetes cloud provider that Google developed. We've also directly implemented the role-based access control solution called Fiat. We're deploying a lot more often. We have about 8,000 orchestrations a day. That includes tearing things down and creating services. It means that we're also doing a lot of deployments. We're also holding some hackathons. And I think this is critical to the success of Spinnaker internally, especially because we want our other internal teams to be able to understand how to contribute back to Spinnaker. Those custom stages for their use cases are what's going to continue to grow the tool. And we don't want to be the linchpin, the bottleneck, in that problem. We're only getting started. I think there's a lot more we can do. But I want to give you some parting thoughts. I want you to think back to what I talked about with courage and selflessness.
Are there places in your general workday that you can exhibit courage and that you can exhibit selflessness and you can actually make change in your environment? I also want you to think carefully about the tools that you choose. Are you fighting culture or are you reinforcing it? I say here, don't fight culture. I just actually mean be more aware of the fact that you are. Thank you very much.