Good morning. It's the last day of the Boston Summit, and I know it's early. I'm from the left coast, so I've got a little bit of coffee in me; hopefully I can get my mental rev going this morning. I'm looking forward to the rest of the day, and I hope you've had a good summit so far. So I want to introduce who we are. I'm Joseph Sandoval, the engineering manager of the Cloud Platform team at Adobe AdCloud. Mikaela is on my team; he's one of the lead SREs for the Cloud Platform team, and he's been one of the drivers behind our adoption of the topic we're going to talk about today, which is ChatOps-ing your production OpenStack. Just to get the business side out of the way, so I don't get in trouble when I'm back at the office, let me tell you a little bit about us. You may have seen me speak last year in Barcelona. Back then we were TubeMogul, a leading programmatic advertising company, and at that time we had just been acquired by Adobe, which was a great opportunity for us to really lead in the AdCloud space. We're really the first end-to-end platform for managing your advertising across channels, whether it's traditional television, your computer, or mobile. It makes it simple to deliver video, display, and search ads, and it's an independent cross-channel platform, so you can identify and engage with your audiences and reach them no matter what. Especially now, the ad space is shifting from the traditional world to a dynamic programmatic world where there's so much interaction behind the scenes. At Adobe AdCloud, the one thing we love to talk about is making it an experience for our customers. That's what we try to deliver, and that's what our team tries to do as well.
We really want to make sure that as we serve up our platform, it's a good experience. We don't want to make things harder or introduce impedance; when someone has a request, we want them to be easily enabled to get the answers they need. I like to think of the platform we run as the bacon of the company. My boss loves bacon, and that's kind of what we are. If you're a vegan, I apologize; just think of it as fake bacon. It's what's core to running our business, and this codename is really representative of our infrastructure platform. One thing we've been very pragmatic about is that even though we're running primarily OpenStack, we have workloads that also need to be considered because of the high volume of traffic, the billions of requests per second. So bare metal is in our infrastructure; it's something we use on our edges, and we take an infrastructure-as-code approach throughout. We also know our business has to scale, so public cloud is something we consume in our platform. We want to be in a position where we can serve up to our business, on demand, whatever they need to make our customers happy and give them a really good experience. At this point we're getting close to having six clouds online. Currently we have four running, on Icehouse, and we'll have two new ones by the end of summer, which will roll us onto Mitaka. When I had a chance to reflect on what we have, I realized we're actually at a little over 1,700 compute nodes, and we're trending toward 100,000 cores. It's going to be an interesting milestone when we get there.
But even as our business continues to grow, we have a core philosophy of being lean in our approach. That means we have to be judicious about the things we choose for our platform, and when it comes to OpenStack we've taken the same approach. There's a great, rich set of services, but our business really comes down to core compute, storage, and network; I'd say storage is a little bit secondary. As a small team supporting a growing platform, we always have to keep thinking about how we make things easier for ourselves, and how we extend things out to our customers so they can get the information they need at any point in time. That's where ChatOps has come in and really helped us. It's been around since 2012. If you've ever had a chance to look at Jesse Newland's talk, I think he framed what ChatOps is in a very concise way: it's putting tools right in the middle of a conversation. For us, that could be a conversation about what we're developing, or about operational things we need to get done. If you think about the normal dialogue that happens anyway, it's not a true disruption to how we go about our workflow. Think about what already happens in a channel: a dev may ask, "Hey, I think I'm seeing something with this cluster in this region, what do you see?" We'll reply back with some graph, some data, and say, "Well, this is what we're seeing on our side." There's a dialogue that goes back and forth.
Within that chat framework, all we're really doing is inserting a bot that interfaces with, or wraps around, a lot of the tools we're using in our infrastructure. One of the drivers for doing this is that in our world it's easy: we live and breathe OpenStack day to day, we know our tools, we know how to get the information we need. The challenge on the customer side is that there's always that smart and savvy person who can jump in and get the information they need, but it's not always that way. So for us it's a way to extend knowledge; we want our customers to be power users just by using chat. There are a lot of benefits. As I was saying, there are opportunities to enable self-service, and these conversations about what's happening in our environment are captured, so people can scroll back. We have teams in Kiev, teams in India, our team here in the US, and some that are just spread across. This is a great way to capture what happens: if something happens in the evening, we can easily scroll back and see, all right, something did happen, and what did they do? So we get a nice system of record to go back through. When Mikaela and I were talking about this, one thing we liked, and I think I learned this when I was at Lithium Technologies, is that it's nice when you can enable your devs to do the right thing and steer them, because they don't live in our world.
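As a sketch of what "inserting a bot that wraps your tools" can look like in practice (this is not their actual bot; the command names and the handler are hypothetical), a minimal ChatOps dispatcher just maps chat commands onto functions that front infrastructure tooling:

```python
# Minimal ChatOps dispatch sketch: map chat commands to handlers that
# wrap infrastructure tooling. Command names and handlers are hypothetical.

def vm_info(name: str) -> str:
    # A real bot would call the OpenStack API here; this is a stand-in.
    return f"instance {name}: status=ACTIVE, host=hv-01"

def help_text() -> str:
    return "commands: " + ", ".join(sorted(COMMANDS))

COMMANDS = {
    "vm-info": lambda args: vm_info(args[0]),
    "help": lambda args: help_text(),
}

def handle_message(text: str) -> str:
    """Parse a chat line like '!vm-info web-01' and dispatch it."""
    if not text.startswith("!"):
        return ""  # not addressed to the bot
    cmd, *args = text[1:].split()
    handler = COMMANDS.get(cmd)
    if handler is None:
        return f"unknown command: {cmd}"
    return handler(args)
```

The same dialogue the dev and operator would have had by hand ("what do you see on this cluster?") now happens through the bot, and the answer lands in the channel for everyone to scroll back to.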
What I mean by that is that with automation and some of this tooling around it, we want them to be able to easily take care of tasks, respond to events, and provision as they need to provision. It avoids the interrupt-driven work that can take us away from what's most important, since we are such a lean team. There are also times, and this happened to us recently, when you're out and about. We were out at dinner and something happened. I didn't have my laptop right there, but if we have an easy way to auth in, we can go and observe in a chat channel without having our nova CLI or the other tools we keep around. It gives us the opportunity to get in quickly, no matter where we are, and respond and understand what's happening in our environment. It just makes people's lives easier. I had a ticket the other day, because since being acquired we've moved from HipChat over to Slack, and I liked this ticket because one of my key engineers was basically saying: "I want access to Cloud Ninja", which is the pre-release name we had for this new bot, "because I just don't want to interrupt you guys. You're busy." He just wants to get to the information he needs, and this allows us to give him that power. He's not bothering us, but he knows it interrupts us, so I really appreciate what that gives us. Just to give you some background: Mikaela started on this journey before I got here. As I mentioned, at Lithium we used ChatOps as a way to wrap around Kubernetes; we didn't want to expose kubectl.
But here at TubeMogul, which is now Adobe, these guys had been doing this for a couple of years. So, Mikaela, why don't you give some context around how you kicked this off? What were the initial things you were tackling, and how did you get started?

Yeah, so basically we started two years ago, and at that point we had only HipChat. The initial goal was to simplify day-to-day operations: to bring simplicity, not complexity. We tried to provide an easy way for on-duty engineers to query information without having any access to the CLI. Right now we're in the middle of migrating to Slack. What we did in the beginning was build all the functionality as a separate library, so we don't need to rewrite everything; it's easily pluggable into any chat solution.

So I know that was definitely some of the motivation in getting started. One thing I started realizing when we walked through what we had already done is that it was also an indirect way for me to teach some of the junior members of the team. As I mentioned earlier, we're a lean team, and we've brought people into our organization who had no OpenStack experience. We wanted to give them a way to hit the ground running, make sure they weren't destroying our infrastructure, but still let them contribute. By using this, we can wrap commands so they can go back, learn, and see what they're doing. And there are a lot of mundane things that most of us do, especially as you scale, where having this type of approach will just make your life a little bit easier.
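That "separate library, pluggable into any chat solution" point is what made their HipChat-to-Slack migration tractable. A hedged sketch of the pattern (the class names and commands below are illustrative, not their actual code): keep the commands as plain chat-agnostic functions, and write one thin adapter per chat platform.

```python
# Sketch: keep ChatOps logic in a chat-agnostic library so migrating
# between platforms (e.g. HipChat -> Slack) only means writing a new
# thin adapter. All names here are illustrative.

# --- chat-agnostic command library ---
def hypervisor_count() -> str:
    # Real code would query the cloud; this is a stand-in.
    return "42 hypervisors online"

LIBRARY = {"hypervisors": hypervisor_count}

def run_command(name: str) -> str:
    fn = LIBRARY.get(name)
    return fn() if fn else f"unknown command: {name}"

# --- per-platform adapters ---
class SlackAdapter:
    """Translates a Slack-style message into a library call."""
    def on_message(self, text: str) -> str:
        return run_command(text.strip().lstrip("!"))

class HipChatAdapter:
    """Same library, different front end."""
    def on_message(self, text: str) -> str:
        return run_command(text.strip().lstrip("/"))
```

Because the library never imports anything platform-specific, swapping chat systems touches only the adapter layer.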
So as Mikaela mentioned, it really simplified our day-to-day. But let's get to the meat of it: the use cases. What are we really talking about? Just to level set, as I mentioned at the beginning, with CloudMogul we're really across multiple types of compute and storage resources. But let's kick it off, Mikaela, with the OpenStack part: control plane versus data plane. Maybe you can qualify what that is, and walk through some of the use cases you've addressed using ChatOps.

Yeah. Since we're growing every day, we are hybrid: we have public clouds, private clouds, and physical servers spread all over the world. So what we do, for example, is provision all our databases, all our load balancers, all our physical equipment using ChatOps. It's not a replacement for any other tool; we have proper automation in place, like Jenkins and other frameworks. From an OpenStack perspective, we can provision a full OpenStack cloud, if you want, using ChatOps. The initial goal was to simplify all those things, because it's hard to remember what particular configuration you need for every environment: the networking, the VLANs, the base operating system configuration. And when we use it, we fully rely on what it does: if we ask it to run a particular command, we expect it to do it. We're not trying to bring more complexity, and we're not trying to replace people; we're trying to make things easier.

Yeah. And then, like we were talking about the other day, there's the question of which control plane operations should be automated.
Because obviously when we talk about the data plane, what we're doing is extending out things like VM operations. To me that's the safe spot to start at: you can easily extend VM operations, and people can get the metadata about their clusters. But on the control plane there are some things you may want to steer away from. What areas did you decide just didn't make sense for us to go after?

Yeah. So basically it's very useful when we build our private cloud: based on workloads, we have a different purpose for every compute node. With the ChatOps automation, we automatically define, based on its role, where a compute node will go: whether it will run customer-facing services or background tasks like databases. It helps us a lot.

Yeah. With my manager hat on, I have peers constantly asking me for certain information, so we've also used it as a way to extend things out so they can easily grab resource information: how much am I consuming? We don't have a formal billback approach to our infrastructure, but people want to see the size of their VMs and what their fleets look like; they want inventory reports. Those are really easy things to push out so they get the reports and information they need, instead of those tasks coming to me. I always love what GitHub did with some of these things, and I see Dmitri from StackStorm here; they do some cool stuff with CI/CD pipelines.
I really love those, because it's so great to have that in a conversation: you can see a build trigger, something fails, and immediately everybody is in the know about what's happening. For us, being such a lean team, it's important information that we can capture and scroll back through. To get started, the barrier to entry is very low, and that's why I really encourage it, especially now that I'm in a more enterprise organization. There are definitely some things you have to think about, but the prerequisites are low: if you've done your automation, you've got the building blocks right there. What you're really doing is taking workflows that already exist, an action you take, an event you respond to, and just wrapping chat around them. The only thing I would definitely encourage is to think about the security aspect. For us, I want to make sure we're not creating a backdoor into destructive actions, or a way for someone to get access to things they could destroy. So I've always been asking: hey guys, what permissions or what actions does this individual have? I think you had some comments about RBAC, and what you would like to see.

Yeah, so we had an amazing talk yesterday where Verizon was complaining that it's not easy to implement better granularity in access control. What we do in ChatOps is provide role-based access control. If some people just need to get simple information, they don't need to see what's going on overall in the cloud.
They don't need to see any other specific information; they just get what they need to get. It helps us because we provide information for everyone, all our customers: developers, SREs, DBAs, and our own team too. We manage access based on ACLs, which is much better than the default access rules in Horizon.

Yeah. So, about getting started: what would be the simplest way if you just wanted to kick the tires and get going? What would be your approach? You gave me some ideas about that earlier.

Yeah, so the best idea is to start with your local environment. You can just start with DevStack and create a separate account for the chat bot. As soon as you feel comfortable, you can move to QA or pre-prod. And don't try to make a complex thing; just do the basic stuff, and if you see it's working for you, and what it takes to implement, you can move on.

Sure. All right, there's quite a bit of other stuff that we have, but Mikaela, we were going to walk through a couple of the things that you've been extending out to the team.

Yeah, just a small demo.

Okay. Just talk me through what exactly you're doing here.

Yeah, so for example, let me show you how we work with the Nova API. By typing this command, I'm trying to get information about an instance running in one of our public clouds. Under the hood it's just a wrapper for the nova show command, so we can see any information about the instance, including its metadata, which is very helpful. The next command is a nova list. It feels like a command you wrote for me, basically, as I forget everything.
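The role-based access control Mikaela described, where a read-only query is open to everyone but more dangerous commands are restricted by ACL, might look something like this as a sketch (the roles, users, and command names are made up for illustration):

```python
# Sketch of per-command role-based access control for a ChatOps bot.
# Roles, users, and command names are made up for illustration.

# Which roles may run which commands (least privilege by default).
ACL = {
    "nova-show": {"dev", "sre", "admin"},   # read-only: widely available
    "nova-list": {"dev", "sre", "admin"},
    "deploy-compute": {"admin"},            # destructive: admins only
}

USER_ROLES = {
    "alice": "dev",
    "bob": "sre",
    "carol": "admin",
}

def authorize(user: str, command: str) -> bool:
    """Return True only if the user's role is allowed to run the command."""
    role = USER_ROLES.get(user)
    allowed = ACL.get(command, set())  # unknown commands: deny
    return role in allowed

def run(user: str, command: str) -> str:
    if not authorize(user, command):
        return f"sorry {user}, you are not allowed to run {command}"
    return f"running {command} for {user}"
```

The key design choice is that unknown commands and unknown users are denied by default, which is the least-privilege stance Joseph comes back to later in the talk.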
This is another Nova one: nova list with a host, so we can get the list of instances running on a particular hypervisor. It's very useful when developers ask whether they're running on the same cluster, on the same hypervisor, or not. It's the simplest way to check that you have proper server groups in place in your cloud.

Basically, whether they're using affinity or anti-affinity. Got it. And then you were mentioning some of the deploys as well; we talked about some of the bare metal aspects.

Yeah, so for us it's very important to have all the information in our inventory database, because based on that we can make decisions: whether a node was deployed properly, which version of OpenStack is running on it. We have an inventory database where we store all the information about our physical servers, so we can easily query it and check whether we have the proper BIOS config settings and which OpenStack version we have. This particular command shows all the information about a particular compute node. But what was really important for our team was being able to test OpenStack releases quickly. So we did some work in Jenkins and integrated it; when you have multiple integrations in your chat, you can see very quickly what's going on. You ask it to provision an OpenStack cluster, and then you get information back from Jenkins. For example, if I ask it to deploy a compute node, it takes just ten minutes. What happens under the hood is that we query the inventory database to get the networking information, then trigger a Jenkins job that consists of three steps: the first does the basic operating system provisioning, the second deploys OpenStack itself, and of course the last one is validation. Sure. It helps a lot in our daily job.
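A hedged sketch of the nova show / nova list wrappers just demoed: one common pattern (an assumption here, since the talk doesn't show the bot's internals) is for the bot handler simply to shell out to the authenticated CLI and relay the output to the channel.

```python
# Sketch of bot commands wrapping the nova CLI, in the spirit of the
# "nova show" / "nova list --host" demos. Assumes the CLI is installed
# and authenticated in the bot's environment; names are illustrative.
import shlex
import subprocess

def run_cli(cmd: str) -> str:
    """Run a read-only CLI command and return its output (or the error)."""
    result = subprocess.run(
        shlex.split(cmd), capture_output=True, text=True, timeout=60
    )
    return result.stdout if result.returncode == 0 else result.stderr

def bot_nova_show(instance: str) -> str:
    # Wraps `nova show <instance>`; quoting guards against injection
    # from untrusted chat input.
    return run_cli(f"nova show {shlex.quote(instance)}")

def bot_nova_list_on_host(host: str) -> str:
    # Wraps `nova list --host <host>`: which instances share a hypervisor?
    return run_cli(f"nova list --host {shlex.quote(host)}")
```

Splitting with shlex and quoting every chat-supplied argument is what keeps a convenience wrapper like this from becoming the backdoor Joseph warns about.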
So basically what happened is that it output the Jenkins job running behind the scenes.

Yes, so right now we're provisioning a compute node in our lab.

All right then, good. So you can see that a lot of these are pretty simple use cases. Thinking about where we go next: these are simple, easy steps, but they're building blocks. Some of the things we're considering are even deeper integration into monitoring systems. I was in a conversation yesterday about some network challenges, and one team was looking at it one way, another team another way, and we were looking within our preferred monitoring system. I thought: we should just integrate that into chat, so that in the conversation we're all looking at the same data. We'd grab the same graph for the same timeframe, and we can all observe and agree that we're seeing the same thing, instead of bringing all these different tools, seeing through different lenses, and having misaligned optics about what's actually happening. Another thing is that we sometimes already use it for retrospectives: going back and tracking data. It's easy for us to go back, get into root cause, and capture all the data: what was happening, and who did what at what time. The other thing you mentioned you were looking at was anomaly detection.

Yeah, so it's pretty easy when you have... well, it actually requires a lot of work, because you need to build a dependency graph. But it will genuinely help you determine what's going on, because sometimes when a particular service is down, you don't know what really happened without any pre-analysis. Sure.
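To make the dependency-graph idea concrete, here is a toy sketch (the service topology is invented, and this is only the simplest possible interpretation of what Mikaela describes): when a service is down, walk its dependencies and report the failing services that have no failing dependencies of their own, since those are the likely root causes.

```python
# Toy sketch of dependency-graph root-cause analysis: a failing service
# whose dependencies are all healthy is a candidate root cause.
# The topology below is made up for illustration.

# service -> services it depends on
DEPS = {
    "api": ["database", "cache"],
    "cache": [],
    "database": ["storage"],
    "storage": [],
}

def root_causes(service: str, is_down) -> list:
    """Return failing services with no failing dependencies of their own."""
    if not is_down(service):
        return []
    downstream = [cause
                  for dep in DEPS.get(service, ())
                  for cause in root_causes(dep, is_down)]
    # If no dependency explains the outage, this service is a root cause.
    return downstream or [service]
```

So if an alert says the api is down, but the database and storage are also failing, this walk points at storage rather than paging someone about the api symptom.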
So, things to think about. As I said a little earlier, security is something that... there goes my wifi. The area I always caution about, and I think this applies to automation in general, is that there's a saying that you can automate anything, but it doesn't always make sense to. Think it through with chat in a similar way: does it really make sense? Am I really saving time? We talked about the complexity challenges; you may want to think it through before you decide. We really try to go for the low-hanging fruit: start with the easy things, the things that are constantly interrupting you every day, and find an easy way to hand those off. I would also encourage echoing the underlying commands so people can see exactly what's happening; you're also indirectly teaching them to fish for themselves. The security aspect is really important. Now that we're in this new world with our business, I constantly have to think about staying in line with our two-factor authentication. We don't want to bypass these things, so think about what you're delegating. You really don't need god access for the bot; that's a really, really powerful thing. So we try to go least-privilege with everything. I think that's the best practice we all know, and it's not hard; it's really easy. Just make sure you don't compromise those things. And there are a lot of resources out there; I listed just a couple that are available, and it's a very rich ecosystem of tools. Where do I see us going next?
I'd love to take these initial building blocks and make even richer orchestration happen with our ChatOps. With that being said, I just want to say thanks for the opportunity to share, and we're open for any questions, suggestions, or comments. All right, it's early; I think we all need more coffee. Thanks for your time.