Hey everybody. I'm sure you love instrumentation like I do. I was checking my heart rate: we're at about 101, so it's sort of a warning. I wouldn't call it critical yet, so I think we're good.

So hi, I'm Jarvis. Or rather, that's Jarvis. He's our Hubot. So let's talk about ChatOps. I'm a product owner at CA Technologies, and I support our Agile Central and Flowdock products. Jarvis is excited to be here, but there's a little static, so you're going to have to bear with us.

So what's ChatOps? It's doing ops, but doing ops in the flow. Why? It's transparent. It's like pairing all the time. It's remote-first. You can do automation. It's safe. There are all these reasons, and if you want to talk about them, let's chat after this. Today I want to talk about how we use Flowdock, our Hubot Jarvis, and some ChatOps patterns, and share what our teams do. I want to thank everyone for letting us be very transparent: all of these examples are real. We're just pulling back the covers, so bear with us.

Context: we have 18 teams in our release train, over a hundred people. Engineers, ops, DevOps, designers, everybody working together across three sites: Boulder, Denver, and Raleigh. We do continuous deployment, with dozens of things going out to prod every day. We have feature toggles, and they let me use them, so I turn things on and off for people. So there's a ton of change happening at any time out in our environment. How do we swarm around this? How do we take all those people and focus on getting great things done?
But still making sure that when we need to work together, we do that? One of the patterns we use is that we share the load. There are teams that are a little bit more sticky to ops. We have a tier-one team, a support team, and teams that make it really easy to measure and monitor, things like that. But we expect that if you put a service out into the world, it's your kid. You're going to take care of it. So we have "of the weeks" for different services and things: people stand up and say, "We're going to have somebody. We're going to figure it out. They're going to be the person you talk to if you need to disrupt someone." So here I'm volunteering to hang out with Jarvis. If someone asks something about Jarvis, like "Hey, does anyone know how to fix Jarvis?", it calls me. Maybe somebody doesn't know who I am, but they can say, "Hey Adam, this thing called you. I think you volunteered for this. Maybe you can help me out." And sometimes that's more than one person.

So who are all these people? "Jarvis, list alert tags." We have a dulcerita. Can you guess what they do? We have some interestingly named services. We have on-prem. And it looks like Wade is actually really good with our kegerator, and he can help us switch the lines.

What else do we do? We swarm on big problems. Sometimes we have a P1, which is our major incident level: something's really wrong, so we stop what we're doing and fix it. We're going to get all these people together and figure out what happened, and sometimes it takes a while to figure out what happened. That's the hardest part. So we have some P1 commands. One of them is "what the heck is this thing?" If you're a new person, you can say, "Hey Jarvis, what is this P1 process?" and you can hang out with Jarvis and learn about what that is. That's in our ops docs repo, which I'll talk about later. And then there's "Jarvis, announce P1": when things go really bad, you pull the fire alarm. This is a real one that happened last week, unfortunately.
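As a rough illustration of the alert-tag pattern just described, here is a minimal sketch in Python. It is not Jarvis's actual code (Hubot scripts are normally written in CoffeeScript or JavaScript), and the class name `AlertTags` and its methods are hypothetical names chosen for this example. The idea is simply a lookup from a tag to the people who volunteered for it, plus a check of each chat message for known tags.

```python
import re
from collections import defaultdict


class AlertTags:
    """Hypothetical registry: maps a tag (service or topic) to its volunteers."""

    def __init__(self):
        self._owners = defaultdict(list)

    def volunteer(self, tag, person):
        """Register person as a contact for tag; repeated sign-ups are ignored."""
        tag = tag.lower()
        if person not in self._owners[tag]:
            self._owners[tag].append(person)

    def list_tags(self):
        """Return every known tag in alphabetical order."""
        return sorted(self._owners)

    def who_to_ping(self, message):
        """Return the volunteers for every tag mentioned in a chat message."""
        words = set(re.findall(r"[a-z0-9-]+", message.lower()))
        people = []
        for tag in self.list_tags():
            if tag in words:
                for person in self._owners[tag]:
                    if person not in people:
                        people.append(person)
        return people


# Example wiring, using names from the talk purely as sample data.
tags = AlertTags()
tags.volunteer("jarvis", "Adam")
tags.volunteer("kegerator", "Wade")
print(tags.who_to_ping("Hey, does anyone know how to fix Jarvis?"))  # ['Adam']
```

Matching on whole words keeps the sketch simple; a real bot would more likely hook into the chat tool's own tagging or mention mechanism.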
Oh no, did something really go wrong? It did. I'm sorry. What happened? Safari users couldn't log in. So we announce that, it pushes to a bunch of team flows, and we all get together in one place to have a single conversation. We talk: what happened? Who was it? Do we have a resolution? It looks like we're rolling forward, and it should be out in five to ten minutes. We're getting some metrics out of Splunk. We see it going down; we see it coming back up.

We also recently got a fire alarm. There's a service that lives out in the cloud, and it knows if we're having a P1. Things can poll it, so we have a Raspberry Pi that checks it. We got a big blue police light. What else would you hook into that? We've got the red alert siren going off on some speakers. So maybe you're playing chess, maybe you're having a conversation, maybe you're trying to focus and you have your chat tool down for a minute. This is really disruptive, and you can jump in for a second, come up and say, "What is it? Can I help?" If you can't, you go back, and if you can, you swarm.

And then we inspect and adapt. Things go wrong. I love when people lean into things going wrong. It's just about: how do we make it not happen? How do we recover faster? So how do we inspect and adapt and learn along the way? Our tool is a post-event retrospective. This is not a tool that any one team owns. As part of saying, "We're back up," the folks involved say, "We're going to meet about this. We're going to meet in two hours." We start gathering all the stuff and putting it together in a Google Doc, from Splunk and VictorOps and Flowdock and the bot and everything, and 20 people get in this doc. We get a facilitator, and we talk about what went well, what didn't go well, how we could recover faster, and how we could detect faster. And then hopefully folks take things away from that. This is actually an example: we created a story.
I asked Jarvis to tell us about this story. In this case, we're updating ops docs, which is our repo that anyone can contribute to. It's where we keep the things that Jason was talking about, to answer some of those questions. And we got that change done the very next day. So the cycle begins anew.

Check out my projects and some of the code from today on all these tinker lab things. And if anyone wants to talk about ChatOps, I am super passionate about it, and I'd love to write some great scripts with you. Thank you.

All right. Thanks, Adam.
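For anyone who wants to tinker with the fire alarm idea from the talk (a small device polls a cloud service that knows whether a P1 is in progress, then switches on a light and a siren), here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the URL, the `p1_active` JSON field, the polling interval, and the names `FireAlarm`, `fetch_p1_active`, and `poll_forever`. The talk does not describe how the real service or the Raspberry Pi code is built.

```python
import json
import time
import urllib.request

# Hypothetical endpoint; the real service's address and format are not given.
STATUS_URL = "https://status.example.com/p1"


def fetch_p1_active(url=STATUS_URL, timeout=5):
    """Ask the status service whether a P1 is in progress.

    Assumes the service answers with JSON like {"p1_active": true}.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return bool(json.load(response).get("p1_active", False))


class FireAlarm:
    """Fires the start/stop callbacks only when the P1 state changes."""

    def __init__(self, start_alarm, stop_alarm):
        self._start_alarm = start_alarm  # e.g. switch on the light and siren
        self._stop_alarm = stop_alarm    # e.g. switch them off again
        self.active = False

    def update(self, p1_active):
        """Record the latest state and trigger a callback on a transition."""
        if p1_active and not self.active:
            self._start_alarm()
        elif not p1_active and self.active:
            self._stop_alarm()
        self.active = p1_active


def poll_forever(alarm, fetch=fetch_p1_active, interval_seconds=15):
    """Poll the status service and feed each result to the alarm."""
    while True:
        try:
            alarm.update(fetch())
        except (OSError, ValueError, AttributeError):
            pass  # network or parsing problem: keep the current alarm state
        time.sleep(interval_seconds)
```

To try it, you could build a `FireAlarm` with two callbacks that just print a message and pass it to `poll_forever`; on real hardware the callbacks would drive the light and the speakers instead. Acting only on state changes means the siren starts once per P1 rather than on every poll.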