So yeah, as Mike said, we were chatting about this for a little while, and I have opinions, just like most of you. So I decided to quickly write a blog post about this, which has far more content than this talk will have. I had to cram a lot of it into 30 minutes, but I've been doing this for quite a while, and these are clearly my opinions.
So, a little bit about me. I've been doing ops for a real long time. It feels like forever, but it's definitely not. On pager forever, as we all know. It can be pretty fun, right? I've been contributing to open source for a while. I'm father to a very fantastic little guy. I tried to get him here today, but he was cranky, so that wasn't happening.
And one cool thing is that DevOps, SRE, whatever all these other things are, the terms are thrown around so much that it's really difficult to understand what people are talking about, the different roles people are hiring for, and all that fun stuff. So, interestingly enough, I love to field phone calls from recruiters, take what they're looking for, and then give them my 15-minute opinion. Usually at the end of the call they're like, "Oh, this is great, you opened my eyes," and 15 minutes later I get another phone call about the same thing. So that's really fun.
So right now we're at Bloomberg, just as Mike said. Been at companies previously: DigitalOcean, Sailthru, ideally a bunch of startups here in the city.
And yeah, just a quick disclaimer: these are my opinions, just like this man here, probably one of the most opinionated people, if you know who that is. Well, it's Hunter S. Thompson. I've been a part of so many different organizations, in IC roles and leadership roles, and this is just the information I've gathered along the way. Now, this won't be for every organization. You could start off small or start off big, but there is really no definitive answer for how to implement this type of org inside your current company.
It's impossible, really. Well, not impossible. But here are all the fun terms that you've probably heard over time, right? You have SRE, TechOps, network DevOps, DevOps developer, senior DevOps engineer, production engineer. It's just kind of everywhere, right? All the different calls and all the different LinkedIn requests you get. These guys will come in and help make your company super fast, fix all of the problems, handle all the technical debt, and everything will be super fun. Incorrect, which is what I love to talk about and tell people.
So this is actually my son, right? I sat him down in front of the next-generation MacBook Pro and iPhone 8 and showed him basically what I do day in and day out, and he was obviously thrilled. It's kind of funny, but he got real angry, obviously.
But again, one of the most important pieces of this puzzle, and the thing I most want to get across, is that it's not about the technologies, it's not about the people coming in, and it's not about all of the ways of doing things. Primarily, the organization really needs to focus on the cultural shift that's going to happen. Once people start to understand this culture of bridging ops and developers, once that starts to click and the engineering culture becomes a much better, cleaner, and more welcoming place to work in, then these things can start taking form.
So I want to talk quickly about an implementation that we've been doing at Bloomberg and what we're trying to do. We're taking this massive organization, right? It's huge: 4,000 developers, thousands of applications. And we said, okay, how do we support all of this? How do we get everybody on the same page so we don't have people building the same thing on every team? I'm sure everyone has had that happen.
So I wanted to dive into what SRE is to me and the numerous challenges around implementing it in organizations of different sizes.
So what does SRE mean to me, right? It's a core group of individuals who have a wide array of skill sets. These skill sets can range from operations, networking, development, and hardware to distributed systems, all the fun stuff, right? They're responsible for building out and scaling all aspects of your infrastructure, and for helping people understand the right tools, or the good tools, that fit a certain situation. And it's not just about the technology, it's the mindset that people think in. I always try to push that the cultural shift is just so important for people to understand, and that's the current situation we're working through at Bloomberg.
Then you've got the areas of responsibility that I want to quickly touch on: monitoring, configuration management, automation, core infrastructure services, tooling, provisioning, and capacity planning, right? Because that's clearly important. Performance, documentation, documentation, runbooks, and incident response. I'm not sure if I said documentation yet, but that is clearly very important, because we all go into an organization and there's nothing. "Oh, we have a wiki, we have Confluence, we write it on this wall over here, we have some napkins over here." But at the end of the day, nobody has any idea about half of it.
So I like to talk about application SRE versus infrastructure SRE. As you can see, you have an individual staring into a mirror, because technically it's the same thing and it is the same person. Just the roles and the responsibilities are slightly different in how I look at allocating these individuals between teams.
The application SRE is someone who's allocated to a specific app team or service team, and these individuals are the one-stop shop for bridging the gap between infrastructure SRE and the development teams. The application SRE, in my opinion, is usually very knowledgeable about the app or service they're assigned to. They can be a part of one team or a part of many. It all depends on the size of the org and how many people you actually have, and that's kind of important. Can't see my bottom notes, but fun.
Infrastructure SRE, as you would assume, is someone who works on building out the core services, the centralized services: your monitoring, your alerting, your Kafka, your HBase, your Hadoop. They build all of these bits as services and offer them back to the application teams, and then the application SRE really works in the middle there.
So I'll talk about bridging the gap, right? This is another very important thing to accomplish when implementing this type of organization in a company. Traditionally, operations and development have always been separate orgs and worked in silos. This still happens today in most companies, and one thing we noticed coming into this was that nobody talks to anybody. I still meet brand new people every day and I'm like, "Oh, you've been building that? That's awesome. We have that over here, this other guy has it, and this guy has probably built it also." It creates a very difficult situation when everyone eventually needs to come together.
So whether it's how the application is written, or will be written, or how it can be operationalized, everybody has skin in the game on this. We as technologists and employees, and all of the other fun portions of that, need to bridge the gap and make things far more cross-functional and far more understanding between the teams. Everyone should help and be a part of the design process, the architecture reviews, and the operational reviews, and not just get an email saying, "Oh, hey, we're going live in two weeks and you guys need to figure all this out for us."
So where to begin, right? Because everybody wants to be a part of this journey, and everybody wants this type of support within their company. So how do we know what to do, or who's who? I wanted to quickly walk through a few of the areas where we can start to address the different aspects of SRE. We talked about application SRE. We talked about infrastructure SRE. We'll go through some organizational layout stuff that we worked through. We'll go through some support structure for the outside teams that'll be involved.
In some cases, right, at a lot of these smaller companies, your role will basically be all of this. You're going to have to do all of the work. But eventually, when you start growing, you can start breaking out a lot of these pieces, and you don't necessarily have to be the point person who goes out and tries to solve all of these problems for your company.
So, application SRE, right? Let's drill slightly more into this. Like I said before, these application SREs are embedded into different application teams, or not embedded but available through some sort of chain of command, and they're available to multiple application teams.
The goal for us is for the application SREs to spend about 70% of their time dedicated specifically to the application and service team, and then take the other 30% of their time to assist the other areas of SRE: building out some of the core infrastructure and relaying some of the problems that they're seeing in the application world. They can say, "Hey, these are the problems that I'm seeing. These are the most consistent issues that we have across the board. What can we build to centralize this and help multiple teams?" Now, these are rough numbers depending on the situation, but for us it was a pretty decent guideline to follow.
We also want to make sure that the application SREs don't just become a dumping ground for the work that the app teams don't want to do. Like, "Well, I don't want to automate this. This thing just breaks all the time and I've gotten a thousand pages for it." But they expect you to just sit there and field them, and then go off and fix them and deal with all the problems that they don't necessarily want to. And you don't blame them, right? But at the end of the day, we need to fix that ourselves.
So, infrastructure SRE. The primary focus that I've seen is really building out some of these core infrastructure services inside the company, right? Building and managing some of the core tech: a lot of the provisioning, OS, DNS, DHCP, networking, all of that fun stuff that is obviously needed. Automation of the infrastructure services. Telemetry is huge. That's probably super important. Without some of the telemetry, log aggregation, and Chef, I don't really know what you would do all day long. Like, what do you look at, right? How do you know anything's broken? If everything's just cool all the time, then awesome for you, but that's not reality.
So getting this type of plumbing in place was, for us, one of the initial steps we had to take, and it's how we've had to do it in the past, to even start introducing this organization inside of any company, really. I mean, these will probably be the most core components.
So organizational layout, right? I like to pick a home reception school. The one thing is, the title's Narnia, but I definitely don't have any talking animals in here. Somebody picked that up when I was going through the slides, and I said, well, I'm not changing everything.
Trying is the first step towards failure, and that's hopefully what most of us do, because if you don't fail at least once or twice, how do you really know if what you're doing is the best possible route?
So organizational structure is something that should be addressed when building out an SRE org. Now, we have the application and the infrastructure SREs, but it is centralized. There is a nice head of SRE that kind of controls everything, and I'm going to show you a quick chart of how we do it. The leads inside these different teams can have multiple app teams and multiple infrastructure teams, and those could cross. It all depends on the size and however you guys would like to chop things up. That's probably the most important part.
So here's an example of how we have implemented these initial phases of SRE. Again, it's just because of the sheer size and mass of our company. We have a large number of infrastructure teams focusing on services: HBase, Hadoop, Mesos, just a ton of different things. And then we dove into the application side, instead of going out and specifically saying, "Oh, let's hire these application SREs." Again, those are very difficult to find, and it's hard to figure out what the job specs should be.
We actually dipped into the application teams and said, "Okay, who here is interested in automating some of these things, fixing some of these processes, and being a part of the overall infrastructure and operations of your service and your teams?" And there were plenty of people who raised their hands. They were like, "Okay, cool. I don't have to write bug fixes all day long. Let me go over here and try to fix it from a different angle."
Support structure. Yes, I have no idea who this guy is. He just seemed pretty cool and he was super enthusiastic, so I picked the photo.
So support structure is another very important item for us, right? We have thousands of applications. However many you have shouldn't really matter, but we don't want to bombard the teams with pages and alerts that are just noise, right? You're going to burn people out pretty quickly, and that's obviously not fun.
So now, if you have an infrastructure SRE working on building out some portion of a metrics cluster, or any other centralized service that's going to take a lot of thought and a lot of their time, you don't want them to start getting pages in the middle of the night that, you know, port 80 on some random application that has no documentation just went down. They're just going to look at you and go, "Oh, well, I'm probably not going to do this," and your app may stay down. Or you're just really going to make them angry, and the next day when they come in, it's going to be interesting. We've run into plenty of situations like that, and it's unfortunately not fun.
So this is an example of how we've laid out, or how I've laid out, some of these support structures. At the bottom, we have this really fun team called Frontline Support. Frontline Support handles a lot of the initial fielding of the pages. They do a lot of the incident management.
They'll document a lot of the common issues for us. They'll follow runbooks, because that's important, right? Every alert should have a runbook. That's just my opinion. If you have an alert in a system and it's just blank, I mean, that's not helpful for anybody. So these people are really the lower tier of the system, actively monitoring everything. They'll initially try to diagnose the issues, and they'll take the common patterns and try to improve the runbooks. That's very important, because they're on the ground. I mean, those are the people who are really getting the main brunt of the issues.
So most of the time, the Frontline Support people will go directly to the application SRE teams. They'll give them this collective understanding of all of the issues that have happened and that they've gone through.
Now, the application SRE team. Here's where I broke out some of the primary responsibilities they would have on this example web application team. These are some of the main parts they'll be covering: performance tuning, release process, and on-call scheduling. They'll be building the initial monitors and alerts. They'll be handling the database, the Rails, the web server, and doing a lot of the best practices and tooling info. And obviously they have to have the application code knowledge, because without that they can't really assist many people outside of the team. Then they're in direct support of the application team themselves, and that covers most of the areas the application team controls: bug fixes, application performance, testing, and all the other fun stuff that goes along with building out a service or an app.
So now, there's a skip between Frontline Support and the application team, and that's where the infrastructure SRE is.
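
As an aside on the point that every alert should have a runbook: that rule can be checked mechanically. The following is a minimal sketch, not the tooling described in the talk. The `Alert` record, its `runbook_url` field, the alert names, and the `wiki.example.com` address are all hypothetical and stand in for whatever alert definitions an organization actually keeps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    """Hypothetical alert definition; the field names are assumptions."""
    name: str
    team: str
    runbook_url: Optional[str] = None


def alerts_missing_runbooks(alerts: List[Alert]) -> List[str]:
    """Return the names of alerts whose runbook link is absent or blank."""
    return [a.name for a in alerts if not (a.runbook_url or "").strip()]


# Made-up example data: one alert with a runbook, two without.
alerts = [
    Alert("web-5xx-rate", "web-app", "https://wiki.example.com/runbooks/web-5xx-rate"),
    Alert("db-replication-lag", "web-app", ""),
    Alert("disk-usage-high", "infra"),
]

if __name__ == "__main__":
    for name in alerts_missing_runbooks(alerts):
        print(f"alert without a runbook: {name}")
```

A check like this could run wherever alert definitions are reviewed, so a blank alert is caught before it ever pages anyone.
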
So now, the infrastructure SREs' direct support is to the application SRE team, and we've seen this to be much more helpful, so they're not trying to field information, alerts, or pages when they don't necessarily understand what's going on. They're really responsible for supporting this overall random Ruby web application: the monitoring cookbooks, monitoring standards, capacity, database cookbooks, API hooks to the infrastructure, best practices, OS updates, service and package updates, just all of the actual underlying architecture and infrastructure for running this fabulous Rails app that we all love today.
So yeah, this is the approach we've taken and how we've seen the flow work out for us, and it seems to work really well. And I think I went through that very quickly. Five more minutes, awesome.
So that was my quick 30-minute approach to explaining what random thoughts go on inside of my head, which is interesting. This is just my opinion on quickly implementing an SRE organization. I have a blog post that I wrote that has far more information about it. I like to talk, so you can just come up to me and ask me anything you want and what I think about it. I'm sure everyone else has their own opinions on it.
So, questions? Some contact info? So yeah, you're out of time. Yes, so if anybody wants to talk, let me know. Cool, thank you, Anthony. Thank you.