All right, we'll get started a few minutes past. We have 30 people, that's a good number, so let's get started. Hopefully everyone has the deck in front of them today. First off, on slide three: is there anyone new to the call who would like to introduce themselves, who wasn't here last time or before?

Yeah, sure. My name is Paul Osmond. Can everyone hear me?

Yeah, loud and clear.

Perfect. My name is Paul Osmond. I run a team here at Under Armour Connected Fitness, so we're the ones who build MapMyFitness and MyFitnessPal. We've been doing game days for a while. We're enthusiasts of chaos engineering and just interested in checking out the group.

Cool. Thanks, Paul.

Yeah, I'm Deepak here, Deepak Sarada. I'm actually here in a personal capacity today. I work for a fairly large financial services company, and I run a chaos engineering program there. I'm trying to make that involvement official, but as of today it's just personal.

Cool. Welcome.

Hi, can you hear me?

Yeah, we hear you loud and clear.

My name is Geeta Gopas. I'm from Capital One. I run the identity and access management team for Capital One, the external consumer identity platform, and I also own the team that does a lot of chaos engineering. We just got started about six months ago.

Cool, welcome to the group.

Thank you.

All right. Anyone else? Cool, we'll move on. Slide five essentially gives an update on where we are with building out a landscape. I'm not sure how many of you are familiar with the cloud-native landscape that CNCF puts together; if you go to l.cncf.io you can take a gander at it. We're trying to do the same thing for chaos engineering.
I've taken the liberty of putting together a spreadsheet, which I'll paste in the Zoom chat, of the technologies and projects I found out there that are the more popular and widely used. I would love to get a sanity check from folks on the tools that are on this list and tools that may be out there that are not on this list, because I would love to get all this information collected so I can do a pull request and get our first cut of the chaos engineering landscape that plugs into the CNCF landscape.

The other thing that I think we discussed last time with Uma and some other folks was how we categorize the different tools out there. It looks like, Uma, you've suggested some subcategories. I believe that was you, but it would be good to get feedback from the group, because some of these tools are hosted offerings, some of them are not, and some of them focus on a specific area, maybe just storage. The categorization problem is something that has plagued us, at least in terms of trying to bring some organization to this list. We could keep it as simple as saying we have one category, and all of this just fits under the category "chaos engineering" to start. Then maybe later on we play the game of trying to break things out into more concrete and discrete categories. Uma, do you want to share any thoughts? I believe you were the one who came up with these categories last time, right?

I believe so. Can you hear me okay?

Yeah, we hear you loud and clear.

Okay. There was a very good comment from Matt a few days ago. So the idea here is that chaos engineering has a lot of tools out there, some small in scope, some specific to introducing one type of chaos. Meanwhile, there is a broader effort we are trying to make, which is to have Litmus as a framework or an orchestrator to get started with chaos engineering as a general practice, whether you are an application developer or an administrator. So the idea is to identify what belongs under frameworks and put it into that framework category, and then a framework could also be more focused on either infrastructure chaos or application chaos. Inside infrastructure there are multiple types, right? Storage, compute or network. That is just one way of doing this. And I read Matt's comment where he says, let's do a layering, right? Chaos is introduced at a layer, so let's define layers, and he talks about orchestration. And I like that as well. Okay, I'll take a look at it and write something up. Sorry, my dog is... okay.

Appreciate the comments. Any other thoughts here before we move on to the community presentations?

I think if we are going to be categorizing, it would also be useful to think about the target of the chaos. There's a bunch of tools focusing on Kubernetes, so it would be interesting to see which other platforms are getting active attention: AWS, Azure, whatever it is. Some characterization like that would be useful.

Yeah. So that was a little bit of what I touched on, in the sense that it was OS versus provider versus containerization or orchestration layer. My suggestion is we keep hammering away on this and continue the discussion on GitHub. I have a feeling I just want to start with something simple and get tools on a landscape, and then from there we divide and figure out how to lay things out.

All right. So, slides six and seven.
So we've got two community presentations today, one from the Pumba team and the other from the Litmus team. I believe Alexei is going to go first to talk a little bit about Pumba, what it does, and where it fits into the chaos engineering landscape. So Alexei, are you there?

Yes, yes.

It's yours. Feel free to share your screen if you want to, or do whatever you need to.

Just a second. Screen... fine. Okay, perfect. Do you see my screen?

Yep.

Okay. Let's make it small. So, a few words about Pumba. First of all, it's a well-known Disney supporting character from The Lion King. In Swahili, pumbaa means to be foolish, silly, weak-minded, careless, negligent. So I think the name quite represents the idea behind the tool. It's a command-line tool for chaos testing of Docker containers.

I started this journey two years ago, when I was working on a project with a lot of microservices involved. We used Docker and, at that time, CoreOS Fleet as the orchestration engine. I was enthusiastic about trying chaos testing on this project, and I read a lot about chaos engineering and Chaos Monkey. But when I searched for a Docker-specific, container-specific tool, I failed to find one. That's why I decided to create one using the Docker API, starting from simple scenarios, and that's what I have today. I have maintained this project until now, from time to time extending it, fixing issues and enhancing it with people's suggestions.

The project itself, as I said, is a command-line tool. It's a single binary for all platforms: Linux, Windows, Mac. So what can Pumba do?
It can work with a Docker runtime environment and inject failures. You can specify the victim containers using names, IDs or a regular expression. You can introduce randomness: with the random flag, a random container is selected, not all containers, to define the chaos target. Pumba can disturb either a single Docker host with multiple containers, a Docker Swarm cluster, or containers running in a Kubernetes cluster.

The basic commands, like the ones Docker has, are what I started with. You're able to stop a running container, or stop and restart it; kill it, or send any termination signal, any Linux signal, to the main process within a Docker container; remove a container together with its links and volumes; and pause all processes within a container for a specified time.

So, switching to a short demo, and I split the screen horizontally. Okay, do you see my screen, guys?

Okay, yeah.

Okay. So Pumba, as I said, is a command-line tool. You have several commands and multiple subcommands within each command, with a lot of parameters. You can connect to the local Docker host or to a remote Docker host with a TLS certificate, et cetera. So I will run several containers that do nothing. Let me just increase the font so you can see it. Okay. So, seven containers doing nothing, and I'm going to run a kill command that selects a random container from the running containers, repeating the same every five seconds. Sorry. So you see the containers that were running, and they start to exit one by one. And in the log you can see that it finds containers, selects a container to kill, and sends a SIGKILL. It's possible to specify the signal and to do other stuff.

Okay. So let's take a look at what else it can do. As I said, it can kill, it can pause, you can specify the termination signal, you can use a regular expression, and you can also pause processes within containers, which are also valuable scenarios.

The next thing after that, I added the capability to create network emulation, to emulate network failures at the container level: things like delaying all egress traffic, introducing packet loss, or defining a rate limit. These are the currently supported network emulation disciplines. Just to demonstrate it, let's run a ping command. Okay, I am running ping to some DNS... just a moment. For some reason it does not work. Let me try another one. So for some reason I cannot issue the ping command right now from my Docker machine, so I cannot show the latency delay, but I will show a different scenario. I will show the packet loss scenario, and it will be on the local network.

I am using the iperf tool for the server and for the client. So I start the iperf server; I just need to remember how to start it. I have the command to copy and paste. So the server is listening for traffic, and I start the client, sending packets for, let's say, 60 seconds. You can see the packets start to arrive with zero packet loss. And I start the Pumba command on the client to introduce packet loss. For 20 seconds, I will add packet loss of 20 percent with a correlation of 10 percent. And you can see in the bottom right pane that there is packet loss. If I stop it, it will restore the connection.

What's behind the scenes? As I said, it's possible to introduce delay, packet loss and a rate limit. Behind the scenes I'm using the tc (traffic control) program, either the one available inside the container or, if it's not there, I inject a container into the same network namespace as the target container.

Okay, and for the next plans: I think to extend it to support Kubernetes-native chaos testing, introducing chaos for services, resources and the network in a Kubernetes cluster, not only at the container level but also at the level of Kubernetes entities. And adding a public API, possibly compatible with Chaos Toolkit as it defines it. I need to drill down and see how I can make it compatible. And that's all. Thank you.
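The victim selection Alexei describes in the demo (match containers by name or regular expression, optionally pick just one at random, and repeat on an interval) can be sketched in a few lines of Python. This is an illustration of the idea only, not Pumba's code: `select_victims`, the container names and the seeded random generator are all assumptions made up for the example, and nothing here talks to Docker.

```python
import random
import re


def select_victims(containers, pattern, pick_random=False, rng=None):
    """Return the containers whose name matches `pattern`.

    With pick_random=True, return a single randomly chosen match,
    mirroring the "random flag" idea from the talk.
    """
    rng = rng or random.Random()
    matched = [name for name in containers if re.search(pattern, name)]
    if pick_random and matched:
        return [rng.choice(matched)]
    return matched


if __name__ == "__main__":
    # Hypothetical container names standing in for the seven idle containers.
    running = [f"test{i}" for i in range(1, 8)]
    rng = random.Random(42)  # seeded only so the sketch is repeatable

    # Each loop iteration stands in for one five-second interval of the demo.
    while running:
        victim = select_victims(running, r"^test", pick_random=True, rng=rng)[0]
        print(f"killing {victim} with SIGKILL")
        running.remove(victim)
```

Without the random flag, every matching container would be a victim on each interval; picking one at a time is what makes the containers in the demo exit one by one.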
Thank you, Alexei. Does anyone have any questions?

Any questions, guys?

Sure. Yes, Alexei. Hi, this is Uma from the Litmus team. First of all, thank you for this great tool. We use Pumba for introducing network chaos in one of the Ansible playbooks. You actually answered my question in the last slide. This tool is great, but most teams are moving towards some kind of orchestration to manage the workflow, Kubernetes being the most interesting one. So it would be great to have some effort going on to have this working with Kubernetes, which you said you already want to do. And we'll see now if we can spend some bandwidth to contribute, because we are already using this in the Litmus framework. Just wanted to voice that. Thank you.

Okay, thank you. And actually, it's an open source project. I lead it, and guys, I'm more than happy if you decide to contribute and help with this idea. So pull requests, support, anything.

Yep. Thank you.

Okay, Chris.

Thank you, Alexei. Next up, I believe we have Uma and the Litmus team. So you're here, Uma, and you're welcome to start sharing now.

All right, let me just share my screen.

Yep, perfect. I see your screen, so go for it.

Great. All right, thanks, Chris. So Litmus is basically a combination of many things. It all started from doing chaos engineering in our own OpenEBS project development. We were faced with an interesting scenario, storage being the most difficult area in which to get to the required quality levels. So we started writing negative tests in the team, in an end-to-end framework. We also started hearing from a couple of users in the community: "I'm using your product, but when something happens, things are not behaving exactly the way we want." So we wanted to give our own users a good framework to test their applications that use the underlying storage, which is OpenEBS.
That's really how Litmus was born. We then thought, let's have a good e2e framework that includes a lot of functional, security and performance tests, but mainly negative tests that can be used in the application, in the infrastructure or in storage, at any layer. We wanted a good framework, and that is what we called Litmus. We put it up in a more usable form. Starting from our current e2e framework, we put a structure around how end users can use the Litmus framework.

I like Matt's categorization of Litmus as an orchestrator, right? So think of Kubernetes: you see Kubernetes as a framework to plan your application deployment. In the same way, you can think of Litmus as a framework, rather than just a tool, to plan and inject chaos into whatever you're doing, right? You may be doing application development; then you have a certain way of planning chaos and getting some expected results out. Or if you're doing DevOps orchestration as an architect, you would use Litmus in your own way. So that's how Litmus was born.

And who can use Litmus? It's mostly two categories. I have four here, but primarily it's application developers, mostly of database-related stateful applications, who are interested in testing how the application works under certain conditions, mainly negative scenarios, right? I have a use case in the slides below.

And the other one is the DevOps architect. This is where we expect most of the crowd will really practice Litmus. The DevOps architect has the interesting job of making sure that whenever you introduce a change, you certify that change: things have to be okay, and they have to keep working for your developers, right? The change could be as simple as upgrading Kubernetes from one version to another. So how will I, as a DevOps architect, certify that everything is going to be okay? Because I'm actually introducing a big change.
Well, of course, Kubernetes must have been well tested before release. We all know that. But I still have the responsibility to ensure that my particular environments are going to be okay. So you need a framework for a DevOps architect to run certain tests before saying, okay, you can start moving to this platform. That's where Litmus will be helpful.

And more importantly, we are developing Litmus as a community. There may be test suites that are specific to a given application. Vendors of MongoDB, Cassandra or MariaDB may be interested in adding specific chaos tests that are related to their own database, so that their users can load this test suite and run it in Litmus. Similarly, OpenEBS has a list of tests, and we also contribute some for Kubernetes local PVs. Other storage vendors can also start contributing more specific chaos tests that are related to their own application, for example Gluster or Rook, et cetera. It could also be for network providers and other open source projects, right? It's almost for everyone, because it's just a framework with a set of tests.

Let me go into the details. In this presentation I do not have a demo, really because we're still trying to bring the e2e framework together into Litmus form, so a couple of the guys are busy. Probably they'll be available for the next session to do the demo.

Let me explain what Litmus really is. At its core, Litmus is a Kubernetes job, defined and designed to do a particular job. You can define what the job is, and Litmus is the framework, or orchestrator, to run the job. The framework contains a lot of Ansible test cases, right?
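Uma describes Litmus at its core as a Kubernetes Job that runs a test case. As a rough illustration of that shape only, the Python sketch below builds a generic `batch/v1` Job manifest as a dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The Job name, image name and playbook path are hypothetical placeholders, not taken from the Litmus repository, and the real Litmus job files may look quite different.

```python
import json


def make_test_job(name, image, playbook):
    """Build a generic Kubernetes batch/v1 Job that runs one test playbook.

    `image` and `playbook` are hypothetical placeholders for illustration.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # No retries: in this sketch a failed test is reported as-is.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "test-runner",
                            "image": image,
                            "command": ["ansible-playbook", playbook],
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    job = make_test_job(
        name="mysql-network-delay",                 # hypothetical test name
        image="example.org/chaos-test-runner:dev",  # hypothetical image
        playbook="/tests/mysql_network_delay.yml",  # hypothetical path
    )
    print(json.dumps(job, indent=2))
```

Setting `backoffLimit` to 0 is just one possible choice here; a framework could equally allow retries and record each attempt.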
And you can really think of the test case, or playbook, as independent: you have a set of end-to-end functionality in a playbook, and the playbooks are defined and put out there so that anybody can run the test. A workload is either standardized or customized for a given purpose. For example, we might configure a MySQL application with a particular parameter and then customize it. You need not take it from the Helm chart of MySQL that's out there; if it's customized, it will be available in the Litmus repository. Right, so you can consider workloads as either standardized, in which case you use a YAML file as is, or customized, in which case they will be available in the Litmus repository.

Litmus will also have a set of tools. Pumba is one of them. A test case may make use of these tools to really create that Ansible playbook. For example, what Alexei demonstrated, introducing a network delay between the client and the server, is embedded in an end-to-end test: deploying a MySQL application, putting some load on it, then Pumba coming in and introducing some network delay, and then observing whether MySQL has behaved as expected or not, right? That entire thing is a test case. That's what Litmus will have, and you can check it out, clone the repository, change it, and then start executing. So it's designed as an easy way to start validating a given application, to verify whether your application works or not under some chaos.

And then we're also thinking of writing Kubernetes deployers, for anyone to start. To run an end-to-end test you need to have a Kubernetes cluster, and it's not only about running a test case; if you also have to deploy a Kubernetes cluster, it becomes a difficult task, right? So we do have some deployers, mostly built on well-known tools. You can choose one of them and bring up your Kubernetes clusters.
So that lets users get started testing their applications and then integrating the tests into their CI/CD. And then there is provider-specific depth, which is more of a future concept. We have that for OpenEBS right now. What it really means is that Litmus is a kind of global framework. OpenEBS has been plugged into the Litmus framework through some API, and Gluster or Rook or anybody else can use the same API and make their tests available to Litmus users.

We don't have a lot of it written. What we have today is really this job framework and some of the test cases. We have e2e tests that are related to these different applications, and we are in the process of moving them into Litmus. Our original end-to-end framework is also open source, and in the coming weeks we will be moving all these application tests into the Litmus framework.

Another important use of Litmus is how you really plug this in, in an automated way, into the DevOps pipelines, right?
And if you see, there's a diagram of your regular CI pipeline, and before you actually go to the end of the pipeline, you plug in Litmus there to do some of the tests. It is expected to be readily available: with minimal integration into your CI pipeline, you could add a lot of already well-written tests into the pipeline. This is how we expect DevOps architects to make use of Litmus, to get immediate benefits in seeing how applications behave under chaos.

Let me summarize by describing, as a use case, how any user, either a DevOps architect or a developer, can make use of Litmus. So I will define Jeffrey as a DevOps architect whose job is to run a large Kubernetes cluster that runs multiple services in different namespaces. He is really responsible for making sure that Kubernetes is up and running and that his developers are using the Kubernetes platform. If that is the case, one of the key responsibilities would be cluster upgrades, and whether all the services that are offered are going to behave the same when you move from one version to another version. So what Jeffrey, as a DevOps architect, can do here is add some of the chaos tests at the end of his regular test cycle, and Litmus puts out the results, whether each test passed or not. So just imagine you are on the Kubernetes journey, and there are a bunch of tests that are well used by the community and that can certify that all the applications are actually working fine. You might have a MySQL server or other databases; all of them are going to continue to run fine after the upgrade. That is expected to bring a lot of productivity improvement when it comes to DevOps being agile, right?

And then the other one is the application developer. Application developers are usually not allowed to introduce chaos into the infrastructure.
So what they are permitted to do is probably defined by the administrator, right? In this case Jeffrey would have already put in place some namespace restrictions, or RBAC restrictions, on what type of chaos can be introduced for network, security or disk conditions. John, the developer, provides this list of tests, and Litmus gives back the results at the end of the tests that have been run against his application. And Jeffrey would give a certain type of feedback to his developers: hey, your application is not really handling the disruptions properly, so this application is not good enough to add into the pipeline. I would expect you to go back and implement the disruption-handling logic properly in your application. So application developers can continue to add more and more negative test cases into the Litmus framework.

So, where Litmus really is right now: it's still part of the OpenEBS project. There's a list of demo apps that can be used if someone wants to try out Litmus. Maybe I'll just walk through it. It's simple. Let me just go back there. You clone Litmus onto your own machine, then set up the RBAC policy, then go to the set of tests that you're interested in and start running them. For example, we have some related to MySQL and other applications, and for MySQL there are playbooks in the storage area, right? So you go there and run Litmus; the YAML file will be there.
You just do kubectl apply -f with the YAML. It installs MySQL, runs benchmarks, and reports the results, currently into a mounted directory on the host where you're running. But we expect to see more development, or more contributions, into this project on how you actually interpret these results. To begin with, we have MySQL, and a lot of these things are in the e2e framework, which we will move to Litmus. We're trying to develop some automated way of interpreting the results of a given test, right? Many times it is possible that the same application worked fine in my previous commit, but with this commit something broke this test case. So I want to be able to compare my previous logs and current logs. We plan to set up Elastic and give some insights into the results of each test, either in a given run or across multiple runs. But that's probably coming in the next few months as we progress. That's all I have for today. Certainly I was not able to give a detailed demo, but I look forward to giving a detailed demo of Litmus in the next session. Questions?

Cool. Thank you, Uma. Any questions?

Thank you, Uma.

All right, so a couple of things to wrap up. Slide 25 is really pointing out two things that we need help with. Sylvain and some folks have been doing a good job trying to piece together a white paper introducing chaos engineering to the CNCF community and ecosystem, so I'd appreciate it if people could continue to send pull requests and iterate on that. And then of course we have the landscape, where I'd prefer people to add things to the spreadsheet and make any comments on the issues as we continue to iterate there.
To wrap things up, slide 27: just a reminder that CNCF is sponsoring ChaosConf, which our friends from Gremlin are putting together. So hopefully we'll see some folks there in September. Also, we are going to have a chaos engineering track, an intro and deep dive session, at KubeCon + CloudNativeCon in Seattle. I'm going to be looking to this group to help put together an intro and a deep dive presentation on the topic. I don't know if anyone wants to volunteer, but essentially my goal is to have two to three sessions on chaos engineering there, featuring an introduction to the overall topic of chaos engineering along with a deep dive on some of the technologies out there that you can use to do it. So if you're interested in volunteering, let me know and I'll get you added to the list for sessions. Other than that, I am going to close out this meeting and ask if there are any volunteers willing to do a demo or an intro on their project. It would be great to have you.

Chris, I think I was voluntold last meeting to do a deep dive on FireDrill, which is the other part of LinkedIn's chaos engineering stack. It may have to be prerecorded for corporate security reasons, but we should be able to have something for the next meeting.

Okay.

Okay. Yeah, that'll be in two weeks.

So, cool. Awesome. Thank you, Michael. Anyone else?

Yeah, we might be able to do a detailed demo, Chris.

Okay. Yeah, just let me know. Generally, I want to fit about one to two demos or presentations per meeting. All right, so I think that's it for now. I linked the Google Doc for folks to volunteer to present at the next meeting.
But for right now, I will put down Michael as presenting on FireDrill, and then Uma as tentative, depending on where you are in a couple of weeks. So thank you very much, and we'll meet in a couple of weeks.

All right. Take care, all.

Thanks, everyone. That was awesome.

Thanks.

Thank you.

See you, folks.