Good afternoon, everyone. Yes, I'm one of the practitioners of CI/CD at MayaData. I just introduced myself — I'm Uma Mukkara, COO and co-founder of MayaData. I actually applied to give a talk at this conference about CI/CD. Then Priyanka said, why don't you moderate this panel instead? Because there are a lot of experts here you could ask some good questions. So, happy to be here. Why don't we all introduce ourselves before we get started. Hi there. My name is Ana Medina. I currently work at Gremlin as a chaos engineer. We help companies build more resilient systems by proactively injecting failure. Prior to this, I worked on cloud infrastructure at Uber, as well as chaos engineering there. And I've been working in software engineering for about 10 years now. Hi, everyone. My name is Angel Rivera. I'm a developer advocate at CircleCI. At Circle, we enable developers to build, test, and deploy their software at scale. I started with Circle a little over a year and a half ago, and I basically come to conventions, conferences, meetups, whatever, and do grassroots engagement with individuals and teams on how they're using technology. Hey, everyone. My name's Denver Williams, and I'm a co-founder and lead on the CNCF.CI project. What that project aims to do is test CNCF projects and do interoperability testing. We want to test things like what versions of Kubernetes these projects work on, make sure everything's compliant, and then do matrix testing: does this version of CoreDNS work with the stable version of Kubernetes, and so forth? And my name is William Chia. I am part of the product team at GitLab. I work mostly with CI/CD and Ops, which means I get to work with Kubernetes and our serverless features in GitLab.
And I work specifically on the go-to-market side of our product team, where I get to work mostly with customers — hearing from everyone from startups and small businesses to large enterprises about what they're doing every single day in regards to CI/CD, Kubernetes, and multi-cloud. And I took a photo of you all because you look great, and it's on Twitter. Hey, my name is Frank Ford and I am an IT architect at Genworth, primarily helping enable our internal development teams to build CI/CD solutions. Great. The previous panel was also interesting. I picked up two things just to recap. Kubernetes made the multi-cloud journey easier and faster, but I think it was the speaker from Google who said Kubernetes doesn't solve the CI and DevOps problem. That's an interesting one. So we have real users here, practitioners of CI/CD, and we're going to find out whether they use multi-cloud in their daily practice and how they're actually using CI/CD. So why don't we go around the panel: do you use any cloud provider in your daily lives, and is multi-cloud involved already? So Gremlin is actually only using one cloud provider. We're strictly on AWS. But I get to work on the developer advocacy team — that's kind of like the community side of Gremlin. So we actually play around with all cloud providers. I mostly stick to using Azure, AWS, DigitalOcean, and Google. So most of my experience is actually playing around with Kubernetes and building out environments, and sometimes we play around with CI/CD or actually implement multi-cloud technologies around it. But for the most part, as a company, we still haven't gotten to the point of going multi-cloud. But I actually did get to experience trying to move Uber to multi-cloud. I was part of the cloud infrastructure team there, where we were assessing being completely bare metal and trying to move to the cloud. But we wanted to do that in a very vendor-agnostic way.
So we were actually implementing the API with AWS and GCP and trying to build out a way to do that. And from what I recently heard, they now have a tripod model implemented at Uber, where they're still touching two clouds and keeping their bare metal data centers. Yeah, so at Circle, we're definitely doing multi-cloud. It presents a lot of challenges. We have to run your code, right? And we have to have the ability to provide the functionality from the different cloud providers — specifically Mac, right? That's a really hard one to solve. For that one, we actually have our own private cloud that we built out, because of the restrictions in the Apple ecosystem. But we also added Windows recently, so that's presented a challenge for us: adapting to Azure and all the other things that come along with that, right? But yeah, we're definitely doing multi-cloud. And it's always a challenge, and we're always reinventing — or not reinventing, but figuring out how things should work as we go. At CNCF.CI, we're using a combination of two cloud providers. We use AWS for our front-end stuff because it's a live workload, yet it has a really high rate of change. And then we use packet.net bare metal for our resource-intensive workloads, because those have such a low rate of change that cloud doesn't make sense. So at GitLab, we have a pretty complex story as it relates to the clouds and the environments which we use. We offer an on-premises product that you install yourself and self-manage, and you can run that on bare metal or any cloud. And then we also offer a SaaS platform, which runs the exact same code as our on-premises product and is the largest instance of GitLab. That GitLab.com instance actually runs all on GCP. We had a well-publicized migration from Azure to GCP, and for the most part that is a mono-cloud SaaS that runs completely in GCP. But we also have a component of our CI/CD build agents, which we call runners. And those runners are actually multi-cloud.
So those we offer as what we call shared runners. You can just sign up and start using these to run your CI/CD workloads. Those run on GCP, and as of 12.5, which is launching this month, you can also do it on Windows as well — so we're launching a Windows beta. So that's multi-cloud. Although you can also manage your own runners: using our SaaS solution, you can run them on-prem or in any cloud you want. And then of course, we have features where the users I talk to all the time are doing multi-cloud. This is things where they're using, perhaps, Terraform to orchestrate this and doing deployments to multiple clouds using GitLab's CI/CD. This is where we have a Kubernetes integration with GKE, and as of 12.5, releasing this month, we'll also have an EKS integration. So that is multi-cloud. And then you also heard about our Crossplane capabilities shipping in 12.5, which allow orchestration across managed services. So a complex story, I think, if you caught all of that. So at Genworth, we're using a combination of some on-prem solutions as well as AWS and Azure, and then a couple of SaaS and PaaS cloud-native solutions as well. And for us, multi-cloud is not necessarily a choice, due to the regulated nature of the space that we're in. There are certain products where the regulation, whether it's external or internal, kind of dictates where we actually have to run certain workloads. Thank you. Before we actually go to a question, I can also share how we use multi-cloud and CI/CD. We had two projects. One is open source, OpenEBS. The other one is a SaaS platform called Director. CI/CD for both is done using GitLab. And the runners, as William was saying — we actually need to certify these projects on multiple clouds before they get shipped. So we run the runners on multiple clouds.
One thing that we also make sure of is to keep the agnostic pieces of the stages separate, and the cloud-specific tasks separate. I think we're going to talk a little bit more about that in the panel. Before we actually talk CI/CD, why don't we recap: what is multi-cloud, and what are the advantages? And specifically, are there cases where multi-cloud is not a good scenario? Maybe Angel? Yeah, so I would say, obviously, as computing is becoming more and more modern and changing, you definitely have to have capabilities to enable your customers to run their software on whatever platform, right? I don't think there are any more of these excuses where, oh, we're only Windows or Linux or Mac anymore. And it definitely makes sense to build out that capability, but there are challenges, right? I saw a briefing here earlier about Crossplane — that's a pretty cool project that I didn't know existed. I'm gonna go research that a little bit more. But those are the kinds of changes, I think, that are coming that we need to start looking at, so that we can, again, bring that capability, and it doesn't matter where the applications are running. The other piece to this, though, is what you said — the contradictory pieces. When you're running in multiple clouds, that also introduces problems, right? Especially for operators. And I'm talking more specifically about HA — high availability — and also fault tolerance and disaster recovery. These are things people don't just think about; they think, oh, we need to connect, to integrate. But at the end of the day, if you're serious about running these applications, you need to also think about those things, and introducing those complexities from the different cloud providers will definitely impact your operations. Trust me, from experience — I worked in the government before, and we had all those problems.
So it's stung me in the past, and I always have that in the back of my brain — that's some of my insight into it. I'll kind of piggyback on what Angel was talking about. I think there are various uses for multi-cloud, and we're at the moment where, if we only have a single point of failure on one cloud, it's really easy to have some downtime, have an outage, and say, well, it was my cloud provider's fault. But to our customers, that doesn't matter: you as a company were down, and that affects them. So I think, with all the more reason, we really need to push for more folks to start going multi-cloud, start looking at hybrid clouds. But it's not just about coming up with the multi-cloud strategy — it's actually doing a lot of fault injection to make sure that your multi-cloud strategy is working. So when we talk about disaster recovery: yeah, it's great that you say you have AWS, and if something goes wrong, you're gonna fail over to GCP. But until you actually go and pull the plug on all your AWS cloud, you don't know that you actually failed over, that you didn't suffer any data loss, and that maybe the mean time to recovery was only an hour — and maybe that's what your company's okay with in terms of your objectives with your customers. But a lot of folks are still at the "we're implementing multi-cloud" stage, and that's where we stop. We don't really talk about what other obstacles we hit once we have more vendors in the space. Yeah, I think back to this morning's keynote, where we talked about maturity levels, and I would even say HA is kind of orthogonal — at each stage, as you go on your multi-cloud journey, you could be talking about failover, HA, DR. I'm even curious in this room — I don't think we've asked this yet — how many folks in the room are using more than one cloud today? Okay, so that's most everybody, but there are some folks that perhaps are only using one cloud.
You're in the right spot learning about multi-cloud. So I would call that using multiple clouds, but maybe not necessarily multi-cloud. The next stage would be what we would call workflow portability: you're using an orchestration tool — it could be Circle, GitLab — and with that tool you're able to deploy to multiple clouds. And I would maybe even call that application portability, where you can take the same application and run it on multiple clouds. How many folks have that today, where they can take the same application and deploy it to multiple clouds? So, a lot fewer folks. I'll even ask this: how many folks are at the level where you have the same application and workloads from that application can just span multiple clouds? Does anybody have that kind of sophistication? So our folks that spent two years to go build it have gotten to a very, very sophisticated level, but most of us here in this room are on a journey. It's early days, as a lot of folks have said, and I think the question to ask is not necessarily binary — multi-cloud or not — it's: where am I in this maturity model, and what's the next step? All right, go ahead. I think there's another area where it's really important to have a multi-cloud strategy now, and that's testing. There are a lot more applications now that are becoming event-driven and rely on integrations with cloud providers — more than one. You can't just test on one provider and go, well, it works across the board. You need to be expanding your test coverage to cover multiple cloud providers to know that your application actually works and that no random regressions got in. Yep. Frank? I guess I can be the negative guy in the room and touch on where they fall flat. So, at least in my space, multiple cloud providers can potentially fall short. It's usually around, like I touched on in my previous comment, regulatory things and compliance.
So, if a particular cloud vendor has a product that is not compliant with a particular privacy regulation or something like that, I can immediately take it off the table — and that would be in the case where I need to run a sensitive workload out in one of those cloud providers. So, like I said in my previous comment, that's kind of what starts to direct where some of these workloads, especially sensitive workloads, can run. Yep. Yeah, that's great. It's actually interesting to see the multi-cloud adoption. I was expecting a little bit less than 50%, but most of us are using multi-cloud — on the panel as well. So, let's actually talk about deployment, which is the crux of the topic for this panel. Can we just go around the panel and talk about how you have deployed or set up your CI/CD pipeline so that your developers can deploy the built, tested images directly onto the clouds? What are some of the best practices that you've been using? Sounds good. One of the ways that we're doing it: at Gremlin, we help companies avoid downtime. So we're starting to work on integrations with CI/CD platforms, so folks actually start having a stage where they run chaos engineering experiments.
So last year — or I think it was earlier this year — we came up with an integration with Spinnaker, and that allows you to integrate your entire pipeline to now have a stage that lets you run chaos. That allows your engineering team to keep deploying code at the same rate that they usually do, but with a lot more confidence. And there's another layer that we actually talk about with CI/CD: you can build a lot more testing around past outages that your company has had, or maybe some of the large outages that we've seen around the industry, and build testing around those scenarios — from making sure the caching layers are able to handle it when one of your services goes down, to making sure that if your caching layer maxes out, the other services that depend on it are still able to continue providing a good user experience. So there are various ways that we're talking about it with our customers and around the community, but for the most part we're looking at building integrations that allow folks to have an entire other stage just to run chaos before they get to production. I think one thing that helps a lot when you're working on deploys for multi-cloud is to choose tooling that is going to support multiple clouds off the bat — looking at things like Spinnaker, for example, or Terraform. One thing you really want to avoid, if possible, is ending up with different workflows for different cloud providers, because then you're testing with different CI/CD pipelines, it's different code, it's inevitably gonna behave differently, and then you're gonna run into weird bugs. Yep. Yeah, so at Circle — I don't spend as much time as I should with my operations team, but as I understand it, we eat our own dog food, right? So that enables us to make sure that we're delivering quality software.
The other piece of that is — I can speak to, I guess, the runs that our customers do — we do have customers that are deploying to multiple platforms. We have a wealth of data; we've been doing this for like nine years. So we understand, with the metadata — we're not looking into the code, I just want to preface that, we're not looking at the sensitive stuff — but we do a lot of analysis on the data that we get from the performance of the pipelines. And yeah, we can see how often people are deploying to the different cloud providers, and I was also shocked, because there are quite a few companies doing this across the different cloud providers. But as far as Circle is concerned, we encounter the same problems that our customers encounter, and that's how we deploy multi-cloud. All right, I can probably squeeze in one more topic before we go to the audience. Multi-cloud is all about being cloud agnostic, right? You want to be able to develop and deploy on any of the cloud providers, and CI/CD is a pretty important tool in that area. But the cloud providers themselves are offering their own CI/CD solutions — for example, Cloud Build from Google, Azure Pipelines. Are you going to get yourself stuck in one of these clouds? Do you think there is a real agnostic nature to CI/CD? Can we have some comments? I can start from that side. So, yeah, I got that question earlier and I was thinking about it. For the most part, I think CI/CD platforms in general have to remain kind of agnostic, in a sense. The syntax — how you control that automation — is where we kind of stumble, I think, as a whole, as an industry. You know, we've kind of settled on YAML as the common syntax, right? I think that's fine, but on the other side of the house, people want to do more extreme things inside of their pipelines. Now, I don't know if that's the right answer, you know? If you want to put logic into YAML, that's pretty risky.
I've seen — Ansible's a good example. They have a couple of control structures there for logic. But at the end of the day, I think we definitely struggle with being agnostic. Everyone does things differently. Some of the other platforms definitely have a niche — they use languages instead of a generic syntax. In my opinion, YAML's not the best, but it's one of the better ones out there for folks to have an even playing field, and then you can expand on that, or extend it. It sounds like a bit of a controversial question. We had the cloud providers up here earlier, and so now we're asking the non-cloud providers — the vendor-agnostic CI/CD tools — hey, are you better than the cloud providers? From that, I was encouraged a bit by what I heard earlier. I think the answer to that question will be: we'll see. For example, as I mentioned, GitLab is not just shipping a cloud-agnostic tool where folks are using it to deploy — we have tight integrations with GKE, we have tight integrations with EKS, we partner with all of the clouds, as Brandon will tell you if you see him around here. That's what we consider a strength: not just the cloud-agnostic functionality, but the cloud partnerships we have with all of the clouds. So I think that will be the test: if you start to see the cloud-provider-specific CI/CD tooling grow tight integrations with the other clouds, okay, yeah, maybe it'll work. But I think that is the key component, as Denver said: you want something cloud agnostic that is going to keep your workflow the same regardless of which cloud you're deploying to. Right — the workflow. There's always going to be platform-specific code; just keep that separate, and keep the logic agnostic.
So it really depends on how you as developers adopt the best practices around multi-cloud. That's what we do. We've got about five minutes — some questions we can take from our patient audience. All right, audience questions. I would think of these folks as end users just like you, because some are vendors, some are pure end users, but none of them are a cloud provider, as they said. I write YAML daily, probably, as well. Yeah. All the time. We use CI/CD quite a bit. Thank you. What are some of the common use cases you're running into — problems that solution providers should be solving for in this space — that you're seeing repetitively? Uptime. Lately, it's been a struggle, I think, for some of these cloud providers, and, you know, it fails for some folks, doesn't fail for others. So yeah, that's one of the struggles I see from the platform's perspective. And again, getting back to the integration points: just the other day, GitHub had an issue that impacted all of our customers — or most, I should say, most of our customers — and it was out of our hands. We were definitely like, wow, this is bad, and we're working to fix that, right? So, becoming more version-control agnostic: we're going to be having connections into different version control systems like GitLab and others. But I definitely believe that these integration points need to be monitored a little bit better, and also — again, going back to failover — maybe having the ability to say, all right, well, our code's here, but we also have it here, so let's automatically fail over. I always encourage people who are writing pipelines on our platform to do some checks against the APIs that they use, so that they can fail their builds right away instead of wasting money and effort on a build that's going to eventually fail for reasons out of your control.
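A minimal sketch of that kind of pre-flight check, assuming a Python step at the front of the pipeline; the endpoint names and URLs here are illustrative placeholders, not part of any particular CI platform:

```python
import urllib.request

# Illustrative status endpoints -- substitute whatever external APIs your
# pipeline actually depends on (these names and URLs are examples only).
PREFLIGHT_CHECKS = {
    "source-host": "https://api.github.com",
    "image-registry": "https://registry.hub.docker.com/v2/",
}

def preflight(checks, timeout=5):
    """Return the names of endpoints that are unreachable.

    Run this as the very first pipeline step and fail the build
    immediately if anything comes back, instead of burning build
    minutes on a job that cannot succeed."""
    failed = []
    for name, url in checks.items():
        try:
            urllib.request.urlopen(url, timeout=timeout).close()
        except OSError:  # DNS failure, timeout, connection refused, HTTP error
            failed.append(name)
    return failed
```

A CI job would call `preflight(PREFLIGHT_CHECKS)` and exit non-zero on any failures, so later stages never start.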
I think with that, definitely having a bit of a checklist of what it actually means to be production ready, and doing a lot of that failover testing. It's like — yeah, it's cool if you actually implemented the two clouds, but if you're not actually running these exercises weekly, monthly, at least once a quarter, you won't know until you get paged at three in the morning. And your engineers are gonna be pissed, because maybe that runbook hasn't even been updated, and you're just kind of praying to all the runbook gods that your failover is actually gonna work when you really need it. So make sure that you're actually implementing and running some of this stuff while we wait for more uptime to be provided by the cloud providers. I think that's kind of where I see it. Yeah, that's a terrific question: what are we seeing that tools should be building to offload from practitioners? And I can think of two. One is the whole pipeline. You shouldn't have to build a pipeline. What if we just built the pipeline for you? You could just click a button, or just commit your code, and the whole pipeline was already built. At GitLab we call this Auto DevOps, and there's a certain set of use cases where this works really nicely. So we think that you shouldn't have to build a pipeline from scratch — you should just get it all out of the box, and all of that duplicative work should just be there for you to extend. And the other one that comes to mind is orchestration. When I'm talking to the users and GitLab customers that are doing multi-cloud, they're doing a lot of orchestration and abstraction. So for example, whether they want to use Google Cloud SQL or RDS, they're having to write an abstraction layer in order to homogenize the logic. They're basically building an interface so that their developers can just say, okay, write to state, and then whatever's on the backend is abstracted.
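A tiny sketch of the kind of abstraction layer described here, in Python; `StateStore`, `InMemoryStore`, and `make_store` are hypothetical names, and a real setup would add backends wrapping the actual Cloud SQL and RDS clients:

```python
from abc import ABC, abstractmethod

class StateStore(ABC):
    """The interface application code sees: 'write to state',
    with the cloud-specific backend abstracted away."""
    @abstractmethod
    def write(self, key: str, value: str) -> None: ...
    @abstractmethod
    def read(self, key: str) -> str: ...

class InMemoryStore(StateStore):
    # Stand-in backend for local tests; a real deployment would add
    # e.g. a CloudSQLStore and an RDSStore implementing the same interface.
    def __init__(self):
        self._data = {}
    def write(self, key, value):
        self._data[key] = value
    def read(self, key):
        return self._data[key]

def make_store(backend: str) -> StateStore:
    """Pick the backend from config so application code stays cloud agnostic."""
    backends = {"memory": InMemoryStore}  # "cloudsql": ..., "rds": ...
    return backends[backend]()
```

Application code only ever calls `write` and `read`, so switching clouds becomes a configuration change rather than a code change.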
And that's a lot of code that a lot of different companies are writing. A lot of folks have talked about Crossplane today, and when I see this type of capability in Crossplane and that community emerging, that's pretty exciting, because that's what I see a lot of folks writing all the time that could just be pulled out into a tool and offloaded, so that you can focus on the business logic. Yeah. I can probably add one more — though I'm the moderator, I directly have real-life experience here. Uptime is pretty important, but the fact is the cloud will fail at some point in time, right? So it's very important for us as developers to have an API to actually inject platform failures — to see how your pipeline really behaves when the cloud fails. Some of the cloud providers have ways to create a disk loss, network loss, node loss, but with not all cloud providers is it as easy as with others. So it would be good to have some kind of API emerging for platform failures — that's my take. Awesome, that question really got the panelists going, so thank you for asking it. Any other questions? Oh, sorry. So I'm gonna ask Ana this question, but everybody's free to answer it — specifically because of your chaos engineering experience, which is a sweet job title, by the way. I'm kind of wondering, because I'm thinking about how I might start with chaos engineering, and my first question would be: well, what kind of tool do I use to orchestrate these tests, gather the results, and then judge the results? Because when you're inducing failures, it's not necessarily a pass/fail test scenario — it's kind of fuzzy. So I'm curious if you could speak about some of the tooling that you're using to do your chaos engineering. Yeah, so I'll start off by saying that I work at a vendor, a company that offers a chaos engineering platform.
So we actually offer a free tier of our tool, which allows you to run shutdown and CPU experiments on your hosts and your containers. And we actually just launched a pretty neat new Kubernetes product today. Then on the enterprise side, we also have various attacks that include more of the network layer — latency, blackholing, packet loss — and maxing out your CPU, your disk, your IO, your memory. So we have a tool that allows folks to do it. But prior to building your own tool, going to open source, or using Gremlin, there's a bit of a culture that needs to come with it, where you first have to embrace failure. So having a blameless post-mortem culture is one of the big things you should have. And then we think about chaos engineering in a way that is very thoughtful and very planned. It's not about just going into your company the next day and being like, hey, we're gonna do chaos in production, let's go, we're gonna bring down some instances, we're gonna shut down some clusters — that's not it. We think about it like the scientific method: you come up with a hypothesis, and you think about some abort conditions that would mean you have to stop the experiment. Then you go ahead, you implement it, you look at your observability tools, and you make sure that you're still up and that you haven't met those abort conditions. If you've met those abort conditions, you want to halt that experiment, go fix your application, make your infrastructure a little bit more resilient, and do it all over again until you can pass. And we talk about the blast radius: you don't wanna run this on 100% of your fleet, or 50% of your fleet, if you don't know how it behaves on one cluster or one node first — that's the mindset. Starting really small is the key to success for chaos engineering. I think it's called game days, right? Game days — the culture is really important.
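The loop Ana describes — a hypothesis, a small blast radius, abort conditions, halting at the first breach — might be sketched roughly like this; the function names are hypothetical stubs, and a real experiment would drive a tool such as Gremlin or Litmus rather than plain callbacks:

```python
def run_experiment(inject, check_abort, blast_radius_steps=(0.01, 0.05, 0.25)):
    """Run a chaos experiment starting from a tiny blast radius.

    `inject(fraction)` applies the failure to that fraction of the fleet;
    `check_abort()` returns True when an abort condition is met (error
    budget burned, latency SLO breached, ...). We halt and report at the
    first breach instead of widening the blast radius further."""
    for fraction in blast_radius_steps:
        inject(fraction)          # e.g. add latency to 1% of hosts
        if check_abort():         # consult your observability tooling
            return {"halted_at": fraction, "passed": False}
    return {"halted_at": None, "passed": True}
```

For example, `run_experiment(inject_latency, error_budget_breached)` would stop at the first fraction of the fleet where the abort condition fires, which is the point to go fix the application and rerun.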
Just to add, you can also take a look at the open source chaos tool, Litmus. The point of that project is chaos engineering for Kubernetes, specifically for cloud-native applications. It's at a little bit of an early stage, but I think it's going to get there. Yeah — if you ask me how to get started with chaos engineering: don't. But, to Ana's point, it's about building a culture. So for example, before you can do continuous deployment to multi-cloud, how about figuring out continuous deployment and DevOps for one cloud, right? How about getting a good DevOps culture of small changes and repeatable, boring deployments? And when you have that done really well, then you can think about multi-cloud. In the same way, with chaos engineering, you need to have enough instrumentation, monitoring, rigor, and blameless postmortems to even understand what is caused by the chaos tool and what's an actual failure. Chaos will help you build resiliency — to test, if this goes down, does it fail over? But if there's actually something wrong, you need to troubleshoot: there's not enough memory on this node, and it's not just that the node went down and you need to fail over — this is "I have an incident and I need to go add more memory or more compute." If you can't tell the difference between those things, then it's not time to jump into chaos engineering. You need to build that culture first. Well — do we have more questions? We'll take one last one. Anyone? All right. Thank you very much. Thank you so much, panel. And moderator, that was amazing. All right, thank you.