Let's get started now. I hope everyone had a couple of good weeks since we last chatted. A fairly brief agenda this week. I know many of you will hopefully be at Chaos Conf coming up on Friday, so it will be good to catch up with some folks face-to-face in San Francisco. For today's meeting: we'll start with introductions for any new people on the call. Then we'll have a community presentation from Julian, who's going to do a demo around Istio and chaos engineering, which I'm personally excited about, since Istio is a growing project and there are a lot of knobs and switches you can play with. I'm curious what Julian has to share. And then I'll do a brief update on where we are with the landscape and the white paper. So before we get started, any new folks to the group who want to introduce themselves and say hi? Going once, going twice. Hey, I'll jump in. It's Lee Calcote. I haven't been in this working group just yet, but I've been busy in other ones. I saw the Istio topic, I've been wanting to jump in, and figured I had to, so I'm compelled today. Thank you, Julian. Awesome, good to hear from you, Lee. So, let's go on. To start things off, we'll hear from Julian. He linked his slides, and for people who are not familiar with Kubernetes, he'll do a brief intro and then talk about service mesh and chaos engineering. I'm happy to give up the screen share if you want to steer things, Julian. Let me stop sharing really quick. All right, you should now be empowered. Very good. Can you hear me okay? All right, cool. Welcome, everybody. Can I zoom? That's better. So, we're going to talk about chaos engineering with a service mesh.
And even though many people are more interested in service mesh than chaos engineering, I hope you take away a lot of knowledge you can use to implement chaos engineering in your organization. So, who am I? I'm a software engineer turned DevOps, and I used to work for Unity, the game engine company. I'm now in Stockholm, Sweden, working for Discovery. Feel free to drop me a message if you want to know more about these topics. As Chris said in the introduction, we're going to cover a little bit of the background of why service mesh exists and how it came to be. We'll follow with a few demos. I will demo Envoy separately from Istio, because I've done this talk a few times now and people have a hard time grasping what the power of Envoy is and how it fits with Istio. And since this is basically intended to introduce people to chaos engineering, I will introduce the concept and demo some fault injection. Feel free to stop me at any time if you have a question. I don't think I can see everybody, but that's all right. So, in the beginning there was an app, and the app was code, and it needed to scale. Most companies have this big monolith, and inside that monolith they have a few components that are pretty independent from each other. Instead of scaling vertically, meaning buying bigger boxes, they try to break it down into what we call microservices, into separate services. The problem is that instead of just calling a function that is nearby in the code, they have to go through the network, and that eventually brings a whole new set of problems. From there, the deployment is no longer one thing that's done atomically; now we have tons of little microservices that each do one thing. And to solve that, Docker and the container came to be.
The change that brings to the code is that now you have to code in a certain way, respecting the twelve-factor app. But it also facilitates the deploy and the build, because now you have a declarative language to describe your deployment and your build. The problems don't stop there, though: how do you scale your containers? How do you schedule them? How do you make them talk to each other? How do you make sure they are healthy? Secrets and configuration are also kind of hard. This is where Kubernetes can help with all those points. Kubernetes is a scheduler; basically, Kubernetes fixes deployments and the rolling out of new versions. It's very good at increasing the speed of development for a team. But it doesn't solve everything. As I said previously, the network is still a problem. This introduces the eight fallacies of distributed computing, which people might be familiar with. The first one is "the network is reliable." That's the biggest lie people tell you when they say "just send it and you will receive something." It's never true. You can have hardware failures, you can have misconfiguration, and packets get lost all the time. To go with that, there's also an RFC that I highly recommend. It's three pages long, it was written 22 years ago, and it's still relevant. It's actually quite funny and totally worth reading. Of course, once you have Kubernetes, you still have all these problems here to figure out, and it becomes tremendously overwhelming for a team to handle each of these building blocks separately. For instance, metrics are a real pain for developers, because they have to figure out some kind of consistency across all those microservices. You also have a lot of problems with authentication. And what happens if one service acts badly? You want to cut it out and just serve traffic to the microservices that are actually healthy.
And what happens if you want to do a canary release? If you want to do A/B testing and route a small part of your traffic to a new release to test it? How about failover? All those things are really hard to get right. I don't know if anybody here has implemented retries against a service, for instance; I've done it myself, and it's quite hard to get right without breaking something. The way developers solve this is with more code. The problem with more code is that you get an explosion of languages, frameworks, and versions. Just imagine: for each language, you have to have one framework at the exact same version or API, depending on which features you use. With that complexity, it's almost impossible to maintain and upgrade. Sometimes you just want to have a service running and not touch it, because by changing something, you might break something. There's so much complexity in the system that people talk about the distributed monolith: if something breaks, basically everything breaks. There's a really good talk, I think it's called "On Abstraction" by Zach Tellman, that describes what a good abstraction in code is, if you're interested. But basically, all these problems come with deployment too. If you do rolling updates so you don't have downtime, that means your code has to handle two cases, for instance if you migrate from one database schema to another. Debugging also becomes tremendously complicated when you have all those microservices talking to each other; you want to know what actually happened. So there's this trend now where, when something becomes irritating to developers, what they do is take that code and make it a separate part of the infrastructure. We see that trend very much with the service mesh, trying to fix, or rather, since fix is not the right word, to basically abstract away the network.
So, the genesis of service mesh, of Istio in particular: I think it comes from someone at Google who went touring Europe and the US asking Google's main customers what their main problems were. For instance, banks have billions of dollars invested in hardware to encrypt all their traffic. They don't want canary releases, because nobody wants their bank trying out how it handles money today to see how it works, but they very much want the encryption; especially if you're in Europe, GDPR actually forces you to encrypt your traffic. That's one thing to think about. There's also connecting traffic: even if Kubernetes provides some service discovery you can manage, doing canary releases in Kubernetes is quite hard. And the last step is observe, because it's really hard for developers to know how a service behaves once it's in production. Here's the link to the video if you're interested in seeing further why you don't need the whole thing; you can pick just one block and go with it, which will make your evolution towards a complete service mesh easier. When talking about what a service mesh is, I like to explain what problem it solves, and the only problem it solves is communication between services. Now it's no longer just one function call away: you're not on the same box, you have to go through the OS, through the network, to the other OS, and then the application receives it. There are a lot of components that can go wrong. A service mesh can be summarized as a network for services. You don't want to describe all the iptables rules on every host; with all those moving parts, it becomes quite hard to keep up even with automation. And how does a service mesh solve inter-service communication? Basically, the sidecar pattern. Here you can see that service B is in what we would call a Kubernetes pod.
The pod is the smallest unit of deployment in Kubernetes; a pod can contain one or more containers. The goal here is to inject a proxy inside that pod that listens to every network packet service B is sending or receiving. So if service A wants to talk to service B, it has to go through the proxy. The interesting part is the control plane. The proxy-to-proxy layer is called the data plane; the control plane is where the overall governance decisions get made. In Istio specifically, you have three parts. There is Pilot, which is in charge of making sure the routes are spread consistently to the proxies. If you create a new service and you want to let the other proxies know about it, you just publish an update and Pilot is in charge of updating their route tables. Keep in mind that since all the traffic goes through those proxies, you see exactly how much traffic gets used; all those statistics get sent back to Mixer. For every request, there is also a request to the Mixer service. Then you have Citadel, which is in charge of encrypting and rotating the certificates to make sure that service A is authenticated with service B. Actually, it's not services B and A that communicate; it's the proxies that encrypt. The encryption is abstracted away, taken out of the code and put inside the proxy, so you don't have to worry about whether your call is encrypted. You don't care. For the data plane, the proxy is Envoy, made by Lyft. If anybody hasn't heard of Envoy, it's basically a single binary that takes 10 megabytes in memory and can handle around 2 million requests per second, so before you have a scaling problem with that... if you do have a scaling problem, just let me know, I would be very curious to see what you're doing. So here I want to demo Envoy. I have httpbin, which is just a simple demo application. Is the font size okay? All right.
I just want to curl this httpbin with a header to show you the headers I get back. It's just a very simple application that prints things depending on the path I use. Now I'm going to connect Envoy to that httpbin and look at the proxy. Sorry, I forgot to curl from the proxy. Here, if I query through the proxy, you see that Envoy added some headers. You have the request ID, which allows for tracing: all your requests are marked in a way, and it's the responsibility of the application to take that ID and pass it to the next service it calls, so that you have all the tracing necessary. That's a very quick demo, but we can actually create errors too. There is a 500, and the telemetry is quite interesting to see. You see that I actually defined some retry logic, so you know exactly what happened in the service and how many times it retried; it returned the 500 after the three retries. So that was the Envoy demo. Is everything okay? I cannot see. Is it good? Okay, that's good. All right, where was I? Envoy, that's done. As you can see, this is the OSI network stack; if you have to configure an overlay, it's quite low-level, you have to deal with iptables, and it becomes tremendously complicated. If you have 20 services that talk to five databases, that's 100 rules you have to implement. But with a service mesh that sits on top of TCP/IP, you can just name that service "web" and name that database, and you have one rule that scales, instead of having to be specific about every pair. It's like the difference in authentication between saying "I want to talk to that person" and "I have this phone number." So yeah, the control plane here is Pilot, which does service discovery. And I was wondering what was in the code.
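The retry behavior from the Envoy demo can be expressed in Envoy's route configuration. A minimal sketch using the v2 route config; the cluster name `httpbin` and the exact numbers are assumptions for illustration, not the presenter's actual config:

```yaml
# Sketch only: a route that retries upstream 5xx responses.
route_config:
  name: local_route
  virtual_hosts:
  - name: backend
    domains: ["*"]
    routes:
    - match:
        prefix: "/"
      route:
        cluster: httpbin       # assumed upstream cluster name
        retry_policy:
          retry_on: "5xx"      # retry when the upstream returns a 5xx
          num_retries: 3       # matches the three retries seen in the demo
          per_try_timeout: 1s  # bound each attempt
```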
So, how do you have to code your application in order to gain from the service mesh? It's smart, because you just give it a name; it doesn't matter what you use. You give a name to the service and you stick with it, and Envoy will make sure the name is resolved to the service you define in the rules. Here, I cannot understand for the life of me why they use this port inside the URL, but I guess it's just the demo app the Istio team built; they might have some reason. Maybe it doesn't matter, maybe it's just "okay, we use a port and we stick to it," so we have the full address instead of defaulting to 80. If you want to see what a manifest of what Istio calls a virtual service looks like, it's quite easy. You define the host it's going to send to, and you define rules to match how the traffic gets routed. Here you see I have an HTTP clause that will match when the request contains an end-user header that is exactly "jason" (in the YAML, I guess this part goes together). So if a request contains the header end-user with the value jason, it gets routed to the service called reviews v2. The v2 is like a label in Kubernetes, but you can implement it a different way. Otherwise, it gets routed to v1. I will do a demo to clarify what all this means. Another thing is that resiliency is basically out of the box. If you want to implement retries, you just have three lines to add at the end of your YAML. I've implemented retries a few times, and this is so much easier: you know your timeout, you can understand the overall behavior of the whole request. About authentication and security: you can, of course, implement mutual TLS between those proxies, you can have namespace- and service-level policies, and it integrates very well with Kubernetes RBAC. Observability is actually quite interesting.
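The header-matching VirtualService and the three-line retry block just described would look roughly like this. A sketch modeled on the Istio Bookinfo sample; the exact values are assumptions:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason      # only this user is routed to v2
    route:
    - destination:
        host: reviews
        subset: v2
  - route:                  # everyone else stays on v1
    - destination:
        host: reviews
        subset: v1
    retries:                # resiliency "out of the box": three lines
      attempts: 3
      perTryTimeout: 2s
```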
I have Kiali here, and you can see I deployed an application, Bookinfo, the tutorial one. All the traffic reaches this product page; the product page talks to the reviews service; and of the reviews service, only versions 2 and 3 talk to the ratings service. This is the application that gets called, and you can see that sometimes the color of the reviews changes: those are the different versions. V2 is black, V3 is red, and V1 has none. So you get free load balancing, and this is how you can visualize what happened. You can see the percentage of requests: 50% of requests go to the reviews service and 50% go to the details service. A nice thing to have is also the tracing. With the request ID that was in the header, you can do very interesting things. You see that the details page is called before the reviews page; maybe now that you see that, you can say, "can't we make those calls in parallel and save a lot of time?", and improve your service that way. So let's do a little demo. That's the application I showed you, and the best thing is that it's code-independent: it doesn't matter which language is used, because it's separated from the code. I also like to show this tool, it's called slapper; it allows you to generate some traffic and show the response times. All right. Here, what we're going to do is set all the requests to go to v1, and let's test that. As you can see... let's check if it's really... yes, all the requests go to v1. I don't have to worry about anything; the others are not used anymore. Now what we want to do is specify that one user, let's call him jason, gets routed only to v2, so he's the only one with access to v2. The way we can do that is by opening a new page in our super secure app, and there I'm logging in as jason.
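Sending 100% of traffic to v1, as in the demo, is a weight rule on the same VirtualService. A sketch; the subset names assume a Bookinfo-style DestinationRule:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 100   # shift this gradually (e.g. 90/10) for a canary
```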
So you see that without impacting the users, I can have everybody on the main version and a specific version just for my user, so I can test whatever is happening. I remember at the beginning of the working group, I think it was Michael from LinkedIn who gave a demo about how they do chaos engineering, and it was exactly the same: they choose, for one user, which problem they want to generate, so that you can observe it. From there, we can come back to the slides. Is it okay so far? All right, perfect. I put up a list of the various service meshes; they have more or less the same features, some have more, some have less. I would say Linkerd was made in Scala, but they re-implemented it in Go and Rust, and they provide a simpler adoption path: you don't need to mesh the whole cluster, you can introduce the proxy gradually. Consul, with Consul Connect, made it very easy to implement security, meaning encryption. Depending on all of that, you can choose whatever your organization needs, and there's a nice article on the HashiCorp website where they compare Consul and Istio. So now, the chaos engineering part. You have the official definition: chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production. We want things to go well even when they are mistreated. It's a little bit like a vaccine shot, where you inject a little bit of harm in order to create a reaction from the body, so the body is able to defend itself. And the point is not so much to cause problems as to reveal them. We want to know how the system behaves under certain circumstances, because we can have a lot of ideas about how things work, but in reality it's completely different, and we all end up surprised: "oops, and there it goes."
It's like the Nintendo Switch snafu that happened at Christmas; it impacted the business a lot. And I really like this explanation: chaos engineering is exploratory testing of non-functional requirements, where non-functional requirements are the requirements that, if not met, render the service non-functional. It's very hard to define what would make a service withstand turbulent conditions; that's why we need to explore and test. It doesn't mean we have to blow up half the cluster to find out, because that way you will know exactly what happens and you probably won't like it. And I love this one: having a child is chaos engineering for everything in your life. What chaos engineering is not is having the belief that if you do things without paying attention, problems will go away, because hope is not a strategy. It's really doing things in order to find out, because there are known unknowns and there are unknown unknowns: variables you don't know about yet that you might need to discover. One of the things that usually goes untested is, for instance, draining requests. Sometimes during a deployment there's a shift of traffic, and some requests get a 500 back, and nobody notices because it's during the deployment. So sometimes things go wrong, but the health checks and all those timeouts are super hard to detect. And there are different types of errors you might want to test. For instance, what's the difference between a service being late and a service being unreachable? Those can be difficult to tell apart. Or what happens when one service replies 500 for too long but the others are fine? You might want circuit breaking there. And I have this story about doing a database migration on a huge MongoDB replica set, where the migration was run before the program starts.
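The "one service replies 500" scenario just described can be simulated without touching any code, using Istio's abort fault injection. A hedged sketch against the Bookinfo ratings service (names and percentages are assumptions):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      abort:
        percentage:
          value: 100.0   # abort every request...
        httpStatus: 500  # ...with an HTTP 500 from the proxy
    route:
    - destination:
        host: ratings
        subset: v1
```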
The thing is that on Kubernetes there are these liveness and readiness probes, and nobody ever thought the migration would take longer than usual. But it did. Kubernetes saw that and killed the migration right in the middle, because we hadn't implemented a health check endpoint through which Kubernetes could check how the migration was doing. Those kinds of things would have been nice to catch beforehand instead of right in production. This is very much the role of site reliability engineering: to identify weaknesses and improve resiliency. There's also a very good paragraph in Google's Site Reliability Engineering book about service level indicators, objectives, and agreements. It says: these measurements describe basic properties of metrics that matter, what values we want those metrics to have, and how we'll react if we can't provide the expected service. Basically, the SLA is money: it's the contract you made with the customer, saying "okay, we can provide this level of service." With a service mesh, you have all your data right there; you know exactly how many 500s you sent back last month, because everything is stored in Prometheus, and you can query as far back as you want, because all your data is kept. For doing chaos engineering, it's a good idea to run what they call a game day: fill in the blank when you want to answer "what happens when...". There's a good article about breaking DynamoDB. One thing to note is that DynamoDB doesn't scale down: after a certain load, DynamoDB scales up, and you cannot reduce the size after that, so you're stuck with a big bill. If you want to scale down, you have to migrate the data. How to do that migration is a very good game day: say, we're going to practice recovering by migrating the data from one instance to another. That's a nice example. But nothing gets done if the organization is not behind it.
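The migration story comes down to Kubernetes probe configuration. A sketch of the relevant part of a container spec; the paths, port, and timings are illustrative assumptions, not the actual service's:

```yaml
# Without a generous initialDelaySeconds (or a dedicated health
# endpoint), a long-running startup migration can be killed by the
# liveness probe, exactly as in the story above.
livenessProbe:
  httpGet:
    path: /healthz          # assumed endpoint
    port: 8080
  initialDelaySeconds: 300  # leave room for the migration to finish
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready            # assumed endpoint
    port: 8080
  periodSeconds: 5
```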
The mentality of the organization should be to expect failure and learn from it. We should not fear and cover up failure; it's something to be dealt with. A very good idea is to have a high-severity incident management program, so you know what to do, who to update, who should communicate what, and how to resolve things in a useful manner. It's very much a cultural approach, and never underestimate the power of root cause analysis. It's really nice to have a document with proof saying "we did this, this happened because of that, and here are the results," with everything documented, because if you don't learn from a mistake, it's bound to be repeated. I found out recently about the Toyota assembly line. They have this motto called kaizen which, if you take the characters separately, means "change" and "good"; it basically means continuous improvement. And they have this andon cord on the car assembly line: as soon as an employee detects a problem, they pull the cord and a manager comes to the site to check. If the problem is severe, it can actually stop the whole line, and at the scale of Toyota, where they produce thousands of cars per day, that's a lot of money they might lose. So problems get fixed really fast, because the detection of problems is really fast: they detect problems early and they fix them early, and they have a procedure for how to fix them. And for people who would like to start chaos engineering, I really recommend, from experience, being careful with the word "chaos," because it means different things to different people. I really like that people in IT are really excited about it, but to a manager, saying "oh, we're going to blow up half the cluster to see how it reacts" might not sound like a good idea at the time. So I would recommend you use the word "resiliency" at first, and only afterwards say, okay, we're doing chaos engineering.
Once you have the results, it's easier to explain that it was chaos engineering. That's the little setup, because there are a few steps that are important. Instead of mentioning chaos, mention the results, mention the goals: "we want to improve the resiliency of the database." How do we do that? If we don't have monitoring, it doesn't exist; something that doesn't get measured is just going to be forgotten. The good thing is that since we now have all those nice graphs of how the system reacts, you can almost feel "this is not normal, it was not like that last week." You have a gut feeling of how things should look, and that could be called your steady state. Once you understand the steady state, it's easy to ask "what happens if...", because you're used to how things should look and you can challenge that by forming a hypothesis. Once you have your hypothesis, you can set in place real-world events, like shutting off one instance to see how the service reacts, or killing a pod in Kubernetes to see how it gets recreated, and so on and so forth. The goal, once you've done that, is to write a report so that others may benefit from it; you don't have to reinvent the wheel over and over again. Another good thing is that once the report is written, you can talk about the chaos experiment, and it becomes a positive thing instead of something scary, because the only reason chaos is scary is that people don't know about it. It's no different from testing: you want to know about something, so you experiment with it. And the last thing is to keep doing it. Doing it only once won't make things improve, so it's better to set a practice in place and do it often. So yeah, let's see a little demo of chaos engineering. I have this manifest here that will introduce a delay for our friend jason, who is logged in.
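The delay manifest for jason that is about to be applied is, roughly, the Bookinfo fault-injection example. A sketch; the host and subset names are assumptions matched to the demo:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    fault:
      delay:
        percentage:
          value: 100.0   # 100% of jason's requests...
        fixedDelay: 7s   # ...wait seven seconds
    route:
    - destination:
        host: ratings
        subset: v1
  - route:               # everyone else is untouched
    - destination:
        host: ratings
        subset: v1
```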
So I set a timeout; maybe we can look at how easy it is to set. Basically, what I did here is: if it's jason, for 100% of the requests, add a seven-second delay; otherwise, for all the others, just serve the normal route, so we don't impact anybody else at all. Here I'm unimpacted, but if I look at jason, you can see the requests are stuck, and we should see an error soon. There you go: "error fetching product reviews." So you have a very small, controlled blast radius to do your testing and to see what the page looks like if this service doesn't answer. It's a very good way to get comfortable with errors; it allows for very easy handling and shows that we can recreate problems easily, answering "what happens when...". Where was I? Yeah, I want to clean up, because I might have another demo later. And that was easy, just cleaning everything up, and let's see: everything is back to normal now. So yeah, I went a little fast maybe, but here are most of the resources you can find on chaos engineering and also on Istio. I highly recommend you look at them, and thank you for your attention. Awesome. Thanks, Julian. Super thorough. Any questions from folks? I totally appreciated the intro to onboard folks onto Istio who were not familiar with how it works with Kubernetes and so on; I think it was good to set everything up. Yeah, I think there's a lot of myth around it and what it is, and I just wanted to show simply how to handle it. It is a lot of work; if people are not on Kubernetes, it might take some time, but if carefully planned, it can be a great tool for the organization. That was awesome, Julian, I really appreciated it. I also especially agree with the whole idea that we should try to change the culture around reporting outages and be very upfront about what failures happen, so we can all learn and grow together.
I did have a quick question about that little YAML file at the end there. That's Kubernetes YAML, right? So how does that actually interact with Istio? Is the delay injected between the actual application and the proxy, or is it injected before it hits the proxy? If you want, I can show you what the manifest to install Istio on Kubernetes looks like. Sure, I'd love to see that. It's 5,000 lines of YAML. If you've never seen the wall of YAML, this is it: there is the Great Wall of China, and then there is the wall of YAML. It's insane; there's config, there's everything. To be honest, I installed a lot of add-ons, everything I could get my hands on. I guess if this is 5,000 lines, how much did you actually have to configure yourself? Or was it all just cut and paste? For the most part, this thing ships with Istio, right? Yeah, exactly. Actually, there is a Helm chart, but I don't like to install things with Helm, because Tiller is not super secure, so I try to avoid that; I just template the manifests to plain Kubernetes YAML and push them. So, back to the original question: it's in the proxy that the delay gets injected. The delay gets sent to Kubernetes through the custom resource definition; Kubernetes sends it to Istio's Pilot, and Pilot takes that info and spreads it to all the proxies, to all the Envoys. Gotcha. I can show you what the Envoy YAML looks like; it's a little more verbose. Istio comes in as a manager of the fleet of Envoys, and here you can see the retry on 503. In Istio it's far fewer lines of YAML, so it provides a nice abstraction on top of Envoy; especially if you have hundreds of Envoys to manage, Istio becomes really useful. Yeah, it looked very clean. I was just curious how it all ended up working. Cool, thank you.
I have a question on the same point Matthew was asking about. I tried injecting the delay, and I realized that if I try to access the endpoint using curl, the delay is not enforced. However, if I go through the web browser, which then goes through the ingress gateway, the delay is observed. So I guess that explains the point you were making earlier: the delay is injected by the proxy, so if you go through the proxy it's observable, but not with a plain curl. Yeah, exactly. The main question I also get from that is that people who already have a service, for instance with an SDK inside that connects with TLS, might run into issues if they try to go egress. You have to configure that to allow a service to go outside the cluster, and you have to configure traffic to come into the cluster. So the north-south traffic is also part of the YAML definition you need to think about; it's not only east-west, service to service. You also have to think about what goes into the cluster and what goes out, and the tagging of requests matters. I can show you the basic gateway; the gateway is interesting, look at this one. You define a virtual service, which is the product page, the web page, and you define routes, and for those routes you send the traffic to that service. That's the ingress gateway. That's how you allow this traffic to reach the cluster, because by default nothing is reachable; it's an opt-in mechanism. Does that answer your question? Exactly, it does, and that's exactly what I was doing as well, because when I started to play with this, I realized it cannot be reached using curl; it has to go through a gateway, and that's when it works. As a matter of fact, funny enough, I'm actually giving a talk at O'Reilly Velocity next Monday on this exact same topic, so this is very relevant.
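The ingress gateway being shown pairs a Gateway resource with a VirtualService bound to it. A sketch in the shape of the Bookinfo sample; names and ports are assumptions:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  selector:
    istio: ingressgateway   # use Istio's default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - "*"
  gateways:
  - bookinfo-gateway        # opt this route in to north-south traffic
  http:
  - match:
    - uri:
        exact: /productpage
    route:
    - destination:
        host: productpage
        port:
          number: 9080
```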
I have a lot of content prepared already. I wish I could have gone through the Envoy concepts, but I only have a 25-minute slot, so I'm jumping straight into the code and saying, all right, here's the role that Envoy plays and here's how you can inject faults. And I like the way you pitched it: to your manager, do not even mention the word chaos unless you have results. If you have results, you can do it. Right, we mention resiliency instead. As a matter of fact, we were talking to a customer, a financial bank, and we said we're going to do chaos engineering, and the customer completely freaked out in the room as the manager walked in. Then we said, oh no, no, no, we're going to do chaos engineering to make your applications resilient. Now the manager is talking, and he's listening, and he's much more tuned into it. So I think that point about how you pitch it to senior management versus the developers is very important, and I think you covered it very clearly. Well, thank you. I'm very happy to hear that I was not the only one who got that kind of reaction from non-technical people. If you don't know, I heard that even Netflix renamed the chaos engineering team to the resiliency team. So they want to state the goal, not what they're going to do; resiliency is the goal. Yeah, because chaos engineering, if you think about it, is an implementation detail. What you are really trying to do is bring resiliency in there, and that's the ultimate goal. The way I usually pose it to customers is: chaos engineering is a tool to get you to be more resilient. So you can still talk about chaos engineering without freaking people out, so long as you talk about the end goal. So I think you definitely covered that point. Exactly. And as a matter of fact, the title of my talk at O'Reilly Velocity next week is bringing more resiliency into Kubernetes using chaos engineering. Very good. I have more demos.
One thing I'm really excited about is dark launches. You can actually mirror your traffic from the ingress. So you would have a second cluster, cloned, with a new product version, and you can test it live with real traffic going there, because all the answers back to the proxy are discarded automatically. So you can do super powerful things, and just with configuration, basically. I know that Weaveworks, the company, also has a service mesh. What they do is use Kubernetes to query the cluster to find out what the state of the network is. Then they compare that with what's in the Git repository, and if something doesn't match, that means someone made a manual change. So they enforce pull requests for infrastructure changes, and they can avoid infrastructure drift, so that the infrastructure doesn't change over time. I think they named it GitOps, because everything is in Git and whatever is in Git gets reproduced into the infrastructure. So this is a very interesting topic, and I think we are just at the beginning of what's possible to do. I agree. And Istio does open up a lot of possibilities in terms of the capabilities it offers and how you can leverage that for chaos engineering. Thank you. Thank you. Any other questions for Julian? All right, that was an awesome demo, so super appreciate that. I'll make sure to post it to the mailing list, because it was great to get the walkthrough from beginning to end, and a lot of people will appreciate it. So thank you, Julian. Thank you, my pleasure. All right. We only have probably about five minutes left, so I'm not going to take too much time beyond what I mentioned last time we met. We have a chaos engineering landscape now for CNCF. Please send pull requests; there are obvious things missing, like Netflix's Chaos Monkey and so on, but I've been too lazy to add them myself, hoping someone else does it.
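A minimal sketch of what such a mirroring rule might look like in Istio, using a hypothetical `myservice` with assumed `v1`/`v2` subsets: live traffic is routed to `v1`, while a copy is sent fire-and-forget to `v2`, whose responses never reach the caller:

```yaml
# Hypothetical traffic-mirroring sketch; service and subset names are placeholders.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myservice
spec:
  hosts:
  - myservice
  http:
  - route:
    - destination:
        host: myservice
        subset: v1      # all live responses come from v1
      weight: 100
    mirror:
      host: myservice
      subset: v2        # v2 receives a copy of each request; its responses are discarded
```

Because the mirrored responses are dropped at the proxy, the dark-launched version can be exercised with real production traffic at zero risk to callers.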
More importantly, and more timely: there's a chaos conference this Friday, where I know some of you will be, that I think Matthew and some other folks from his organization are putting on. So I hope to meet some of you face to face there, and if you have any topics you want to discuss, let me know. Then, I had some volunteers for the intro to chaos engineering topic for KubeCon + CloudNativeCon in December. So thank you to everyone volunteering there; we'll have a session hosted by a few folks. And finally, there's some work for us to iterate on the white paper. I've just been super busy, so I haven't had time to go look at it myself; I've been wrapped up with traveling to China. So I appreciate that Sylvain's been driving some of that. I don't know if Sylvain is willing to give a quick update on that, or maybe he just disappeared. Essentially, I'll send a note out to the mailing list about taking a look at that. Eventually, I'd love to get it a little bit further along and use it as the basis to pitch the official blessing and formation of this group under the CNCF to the Technical Oversight Committee there. That's mostly on me, but I appreciate it when folks have time to contribute and look at it. Other than that, we'll meet again in a couple of weeks. If there's anyone in this group that would like to volunteer to present on a topic, let me know. I found Julian's topic today awesome and very interesting, so if there's anyone else that wants to talk about something, let me know. I know Mikhail from Bloomberg is up to present, but I don't know if he can make it in two weeks. If anyone wants to do something, please shoot me a note. Are there any topics or questions for the last few minutes that we have? Do we have a draft date, or a date by which we want to make the white paper final?
There's no official drop-dead date, but it would be advantageous for us if we got it ready for early December, for KubeCon, mostly because we'll have a lot of PR and analyst people there at the conference. It's a way to drum up interest, and if folks are actually planning to be there, we could set up meetings for people to discuss it. So my ideal would be to get it in shape by December 10th. Got it, I think that's completely doable. Yeah, cool. I think it's totally doable. A lot of us have been busy with conferences and so on, but I think now we've got a good group of folks to iterate on it and get there. Sylvain's been doing a good job of pushing it along as far as possible. Yeah, I think I've reached my limits; I don't want to make it sound like it's just my idea of chaos engineering. So it's probably better if many people actually contribute, so that we have variety and diversity there. I just made a note to take a look, so I'll help you out. Thank you very much, appreciate it. Well, maybe we just need to get a handful of people really charging ahead and pumping it out, because it sounds to me like there's too many people with not enough time, and then you get too many opinions and nothing gets done. Yep. Yeah, that's a challenge with open source sometimes in general, right? The idea would be that it doesn't have to be this chaos engineering tome slash Bible; we're basically positioning it to introduce the topic to the wider cloud native community and offering an initial landscape. So I would love more additions to that landscape.
So when we officially launch the working group and announce the white paper, we'd have the updated landscape to go with it. That's what we're really trying to do here: educate the wider CNCF and cloud native community on chaos and resilience engineering. Cool. Any other thoughts, concerns, or questions before we cut out? Hey, Chris, you can add me at some point to do the security chaos engineering topic. Okay, awesome. Yeah, I'll shoot you a note with a couple of upcoming meetings, and then just let me know what works best for you. Okay. All right, Chris, a quick question: this Friday's gathering in San Francisco? Correct. Is there a link to that? It's a conference called Chaos Conf. We've been sold out of tickets for a little while now, unfortunately, but there's a waiting list. I'll have to check with CNCF; we may have a few extra tickets that come available. We'll see. Yep. I won't be able to attend, but I was just curious about it. Yeah, I sent you the link via chat, Lee. So, cool. There'll be folks at Velocity too. Yeah, I'll be there. Okay, I know Tammy will be there as well. Casey will be there, Rosenbaum will be there. Yes. Yep. I'm most likely planning to make it work, so maybe we could do something. Let me think about that. Okay. All right, thanks everyone for your time, and thank you again, Julian, for that amazing demo. We'll get this published on YouTube so people can watch it. I'm going to go update the readme too, to link to previous videos. I've been a poor steward of that readme, so I'll get it done by the end of today. All right, take care. Great. Thank you. All right, bye-bye. Thanks, Chris.