Hi. Good afternoon. Let's go ahead and get started. Sorry, we're running a little bit late. I'm honored to see so many of you on a Friday afternoon, so I really appreciate that you're still here. Let me quickly introduce myself. I'm Linh Son, an Istio contributor. I started working on Istio not so long ago, in August of this year, and I work for IBM. Hi, my name is Mandar. I've been working on Istio since the project started. I work for Google, and I'm really, really excited to be here. First, a show of hands. There have been many Istio sessions so far, so we just want to get an idea of how many of you have attended one or more Istio sessions before. Okay, great. We'll use that information to help us pace the session. So let's talk about common DevOps challenges. What we're really trying to do with this session is share some of the challenges we foresee and show how Istio can actually help you solve them, so we can connect the dots for you. The biggest challenge, number one, is: how do I roll out a new version of my microservice without any downtime, and without impacting my existing microservice?
So I want to have a new version. I've been telling Mandar there was one thing I really didn't like about Bookinfo: the red stars. I'm Chinese, and I have an issue with red, because to me red means "I hate you." So we recently did a new version of Bookinfo and changed it to Istio blue, and you can see both of us are wearing Istio blue. We want to change the rating to a different color and a different shape, a heart shape. So how do you roll out a new version while your users keep working on the previous version, without exposing the new version to anybody else? That's the biggest challenge, number one. The second challenge is: as the new version is rolled out to production, test, or staging, how do I do A/B testing? I wrote this new Istio-blue-heart version of the reviews service in Bookinfo, but I don't want to roll it out to every single one of you. I just want to roll it out to myself, because I want to be the pilot tester and try it out first. This is actually not possible with Kubernetes by itself: you can't roll out a new version of your microservice to just one user and test it without impacting everything else. That's the second challenge we're trying to solve. The third challenge is canary testing. As I get comfortable with the new reviews version with the Istio blue hearts, I want to roll it out to a percentage of my users, say 10% of them, to start testing it. I want to make sure those 10% are actually happy, so that we can roll it out to a much wider percentage of the users. These are the different challenges where we feel Istio can really step in to help.
You could do it with Kubernetes with a percentage, but you have to be really good at math, and you have to watch your system really closely. Let's say I want 90% of my traffic to go to my existing version and 10% to go to my new version. Then I have to look at my Kubernetes Deployments and count the replica sets to make sure my older version has a nine-to-one ratio with my new version. You have to count it very carefully, but with Istio you don't have to do that. There are also some other challenges. What if one of your services goes down? How do you inject faults into your application to simulate that failure in the test or staging environment, before you even roll out to production? What if your team members are writing your microservices in different languages? That's the biggest thing people advocate about microservices, right? People have their own skills and can choose to write in their preferred language. But then how do you get everyone to reimplement retries and fault injection in every different language? That's a lot of work, and it won't be implemented in a consistent way. The other thing is: how does our service handle a certain rate of traffic? As version three puts some load on the system, it shouldn't impact the user experience of the existing version. And also, I heard this word "observability". It's pretty hard for me to pronounce, but I've heard it throughout the conference this week: how do you gain observability of your applications as they're deployed into your cluster? You guys already know Istio, so really quickly: this is a project launched by IBM, Google, and Lyft in May of this year, not so long ago. We're really trying to bring Istio in to solve these problems.
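To make the replica-ratio bookkeeping above concrete, a plain-Kubernetes 90/10 canary would look roughly like this sketch (all names and images are hypothetical): two Deployments share one `app` label behind the same Service, and the replica counts do the traffic math for you.

```yaml
# Hypothetical sketch: a 90/10 canary with plain Kubernetes.
# Both Deployments carry the label app: reviews, so a Service selecting
# app: reviews round-robins across all 10 pods: 9 old + 1 new ~= 90%/10%.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v2
spec:
  replicas: 9                      # 9 of 10 pods -> ~90% of traffic
  selector:
    matchLabels:
      app: reviews
      version: v2
  template:
    metadata:
      labels:
        app: reviews
        version: v2
    spec:
      containers:
      - name: reviews
        image: example/reviews:v2  # hypothetical image
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v3
spec:
  replicas: 1                      # 1 of 10 pods -> ~10% of traffic
  selector:
    matchLabels:
      app: reviews
      version: v3
  template:
    metadata:
      labels:
        app: reviews
        version: v3
    spec:
      containers:
      - name: reviews
        image: example/reviews:v3  # hypothetical image
```

Note that shifting the split means editing replica counts, which couples scaling to routing; that is exactly the bookkeeping the speakers say Istio removes.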
We were just mentioning intelligent routing and load balancing, so users can quickly do A/B testing and canary testing before they shift all traffic to the version they like. Istio helps you do resilience testing across different languages and platforms: regardless of whether you're writing Java, Python, or Ruby on Rails, we inject the same Envoy sidecar for you and control the traffic for you. You can also do policy enforcement across the entire service mesh; in fact, we're going to talk about rate limiting as an example of policy enforcement very soon. And then there's telemetry and reporting for observability. This is actually my favorite, because I feel like this is the heart of the DevOps problem: it doesn't matter how your system is doing if you have no visibility into it. On to the components of Istio. Since many of you have already attended Istio sessions, we're going to run through this quickly, and a lot of this information is already available on istio.io, so feel free to visit there. Here's a chart on Envoy. Envoy is the sidecar that Istio injects next to every one of your microservices, and the reason we picked Envoy is that it has been battle-tested at Lyft: Lyft has 100-plus services running on around 10,000 VMs, handling two million requests per second. On the right side you can see what Envoy already provides, and at the bottom of the right side, what the Istio team has contributed back to Envoy as part of the Istio work. Getting down to the architecture of Istio: as traffic comes in from the internet to visit your microservices deployed in the mesh, it first goes through the Istio ingress, which is an Envoy sidecar, on the left side of the diagram.
And then it goes on to hit your microservices. In fact, I think of Envoy as a dummy router without Pilot, because Envoy doesn't know how to route your traffic without talking to Pilot. That's why this picture always shows Envoy connected to Pilot: it's always asking Pilot, "How do I route to the next hop? What are the service endpoints, and which pod should I reach out to?" The sidecar also talks to Mixer for telemetry and policy enforcement. It always asks Mixer, "Is this request okay? Does it match the policy?" and it also forwards metrics and monitoring information through Mixer. Istio Auth also took me by surprise recently; by the way, we've renamed it to Istio Security now. When I put Istio Security and every other Istio component into Weave Scope, I found out that Istio Security doesn't talk to any of the other Istio components. That surprised me, and it turns out the main reason is that Istio Security mainly talks to the Kubernetes API, to make sure the service accounts and certificates are generated and mounted properly. Traffic control: traffic control really helps us solve challenge number one. We talked about how to roll out a new version without downtime or changing any existing code. This example shows a simple route rule where you can specify that, in my production environment, I want 100% of traffic to go to this particular version. Really simple: you don't have to change any of your application code. You just apply this route rule using kubectl or istioctl, and boom, it takes effect on your system immediately. Traffic steering: as an example here, we were talking about how to do A/B testing. I wrote this new version of the reviews service, and I just want to test it myself, or maybe on my iPhone, or maybe for all iPhone users.
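A "send 100% to one version" rule like the one on the slide might look roughly like this in the RouteRule format Istio used around this time. The exact schema has changed between Istio releases, so treat this as an illustrative sketch rather than a current API reference:

```yaml
# Sketch of a default route rule: all traffic for the reviews
# service goes to version v2, regardless of caller.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-default
spec:
  destination:
    name: reviews
  precedence: 1      # higher-precedence rules win over lower ones
  route:
  - labels:
      version: v2
    weight: 100      # 100% of traffic to v2
```

As the speakers say, you would apply a rule like this with kubectl or istioctl, with no change to the application code.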
Traffic steering is typically based on request headers or cookies, to roll out to a specific set of users first. Then traffic splitting: this is another route rule where we say, okay, route 90% of the traffic to the production version, and because we already did A/B testing on the new thing, which is still alpha, go ahead and route 10% to it and do some canary testing. Resilience is another thing Istio adds to your microservices out of the box: you can add fault tolerance to your application, and what's really nice is that you don't have to change your application code. You just write a configuration that specifies your destination and how you want to configure your circuit breaker. In the example here, the maximum number of connections is 100 and the maximum number of HTTP requests is 1,000. Every five minutes it checks the host's HTTP endpoint to see if it's still live, and if it returns errors seven times consecutively, it ejects the host and returns an error code for 15 minutes. This is also really effective when you need to inject a fault into your system: you can easily try it out and see how your system reacts to the fault. As you've probably heard, having the right timeout configuration for microservices is extremely important, and some services might need a different timeout than others. Out of the box, Istio supports a timeout configuration where you can say the timeout is 100 milliseconds, and you can also configure how many times to retry. In our example here, we configure three retries, which essentially means that after 300 milliseconds it reaches the overall timeout and returns a specific error code that you pre-specified in the configuration.
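The circuit-breaker and timeout/retry settings described above might be sketched like this in the configuration format of this era. The field names and values follow the talk's example, but they are best-effort reconstructions, not a current API reference:

```yaml
# Sketch: circuit breaker for the reviews service.
apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  name: reviews-circuit-breaker
spec:
  destination:
    name: reviews
  circuitBreaker:
    simpleCb:
      maxConnections: 100        # cap on concurrent connections
      httpMaxRequests: 1000      # cap on concurrent HTTP requests
      httpConsecutiveErrors: 7   # eject a host after 7 consecutive errors
      httpDetectionInterval: 5m  # health-scan interval from the talk
      sleepWindow: 15m           # keep the host ejected for 15 minutes
---
# Sketch: per-route timeout and retries.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-timeout
spec:
  destination:
    name: reviews
  httpReqTimeout:
    simpleTimeout:
      timeout: 100ms             # each attempt times out after 100 ms
  httpReqRetries:
    simpleRetry:
      attempts: 3                # 3 retries -> ~300 ms worst case
```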
We support HTTP and gRPC error codes out of the box, and you can also inject delays into your system this way. With that, I'll pass it on to Mandar to talk about rate limiting. Yeah, so rate limiting is another one of those things you need in a microservices environment. Even just within a mesh, you sometimes need to protect a production service from a rogue or bad deployment: something gets deployed and it essentially starts DDoSing your own service. You definitely don't want that, even in a canary situation. And then the normal API-management rate-limit use cases are, of course, well known. This is just an example of how a rate limit is configured. I won't go into the details of the format because there's a demo at the end, so we'll see some of it there. But essentially, we provide configurable limits and overrides. Istio supports multiple quota or rate-limiting backends: there's one that ships with Istio, and you can always use an external, enterprise-grade rate limiter if you so choose. And then telemetry, which we've mentioned a few times: we have to make sure we collect metrics in a very systematic way, and Istio defines a standard set of metrics, which are what you would expect. The mesh is monitoring every link, and every link has a set of obvious, reasonable metrics: you want to know the error rates, the request rates, and the latencies, and you want them labeled by which service is calling which other service, and by service version. At present that's our set of standard metrics, and we let you slice traffic by those four or five dimensions. So we've taken you through several features that Istio provides, and you've also attended other talks, right?
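The configurable limits and overrides Mandar mentions looked roughly like this in the Mixer in-memory quota adapter config of this era. The schema has since changed, and the service names below follow the Bookinfo demo, so treat this as an illustrative sketch:

```yaml
# Sketch: rate limiting via Mixer's in-memory quota (memquota) adapter.
apiVersion: config.istio.io/v1alpha2
kind: memquota
metadata:
  name: handler
  namespace: istio-system
spec:
  quotas:
  - name: requestcount.quota.istio-system
    maxAmount: 5000        # default limit: 5000 requests per second
    validDuration: 1s
    overrides:
    # Override: protect ratings from a misbehaving reviews v3 canary.
    - dimensions:
        destination: ratings
        source: reviews
        sourceVersion: v3
      maxAmount: 10        # only 10 requests per second allowed
      validDuration: 1s
```

Requests over the limit are answered by the proxy with HTTP 429, as shown in the demo later.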
How do you use all of this to do a reliable application rollout? You can take many of these primitives and fit them together, and here is an example of how you could do that. This is just a proposal, and we'd like your comments on it, but it's also an example of how you can stitch together the primitives Istio provides to accomplish a higher-level task. Here we show an Istio deployment controller, which is similar to the Kubernetes deployment controller. It spawns its own replica sets and attaches horizontal pod autoscalers to them, so as those get more traffic, they are scaled automatically; that part is done by the autoscaler itself. One of the key pieces Istio adds here is being able to smoothly change the traffic percentages: you can dial up from 1% to 5% to 10%, and let the horizontal pod autoscaler take care of scaling the instances up or down. Architecturally that's a separate piece, a traffic-split controller, if you will. As I mentioned before, we collect a standard set of metrics, which means that while this process is going on, you can watch those metrics, including error rates both to and from whichever service you're deploying the new version of, and then make decisions about whether to proceed, pause, or roll back. Those are the three signals the traffic-split controller acts on. If everything is stable, it says proceed, and it gradually goes to 100%, at which point we can retire the old replica set.
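The proceed/pause/rollback loop described above can be sketched in a few lines. Everything here, the thresholds, step sizes, and function names, is a hypothetical illustration of the traffic-split controller's decision logic, not actual Istio code:

```python
# Hypothetical sketch of a traffic-split controller's decision loop.
# Thresholds and steps are illustrative assumptions, not Istio APIs.

WEIGHT_STEPS = [1, 5, 10, 25, 50, 100]  # percent of traffic to the new version


def decide(canary_error_rate: float, baseline_error_rate: float,
           tolerance: float = 0.01) -> str:
    """Return 'proceed', 'pause', or 'rollback' from observed error rates."""
    if canary_error_rate > baseline_error_rate + 10 * tolerance:
        return "rollback"   # canary is clearly worse: undo the rollout
    if canary_error_rate > baseline_error_rate + tolerance:
        return "pause"      # slightly worse: hold traffic and keep watching
    return "proceed"        # healthy: dial traffic up to the next step


def next_weight(current: int, signal: str) -> int:
    """Advance, hold, or reset the canary's traffic weight."""
    if signal == "rollback":
        return 0            # send everything back to the stable version
    if signal == "pause":
        return current      # keep the split where it is
    later = [w for w in WEIGHT_STEPS if w > current]
    return later[0] if later else 100  # at 100%, retire the old version
```

An external canary analyzer, as described on the next slide, would simply replace `decide` with richer logic while emitting the same three signals.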
Now, this is a simple use case, but one of our goals with Istio is to make it composable, so that other people can use it to do other great things. That means you could plug in an external canary analyzer, which doesn't just look at the basic metrics but could be looking at lots of other things we have no visibility into, as long as it provides those same signals: proceed, pause, or roll back. This forms a useful and extensible mechanism for reliable application rollout. And because Istio is composable, it doesn't have to be implemented this way: you could implement this whole flow in CI/CD, and there were several talks this week that attached some of these pieces to CI/CD and achieved a similar goal. From the perspective of an external canary analyzer: yeah, this is a project I showed around this week. It's a research project by the IBM Research team. Essentially, I'd been telling them how much I love Zipkin and distributed tracing, but I don't want to be a machine and look at every single request in Zipkin; that's just too boring. So they came up with this project, which analyzes aggregated traces over a configurable time period and tells you what's actually going on, so you don't have to analyze 100 traces yourself. It can also compare a baseline deployment against a canary deployment and give you a visualization of any latency impact during your canary rollout. Tying it back to the previous slide: this can also provide that signal, looking at lots of other things, and say proceed or stop. Last up, the demo, with all the standard disclaimers and a prayer to the demo gods. And remember, if it doesn't work, at least you know it's live. So let me start.
Okay, so that's the Bookinfo app. It's outside. Oh, okay, sorry about that. Oh, it's down there. That's the easiest way. Okay. Do you want me to try that? Yeah. I can just talk while you set it up. So, the Bookinfo app, you've seen it before. There's a reviews service in the middle, and as Linh set up earlier, we're going to deploy a new version of the reviews service and show you some of these pieces at work. Okay. Thank you very much. All right, so that's the reviews v3 service. Right now it's deployed as normal, which means Kubernetes is going to round-robin between the various backends. This is what we want to deploy, the blue stars, and the other two versions are there as well; this is just to show that we've deployed the third version. This is the default Kubernetes behavior: unfortunately you don't have a lot of control, so if you have multiple versions deployed, traffic is just round-robined between them. Okay. Now we will create a route rule which will send all the traffic for reviews to version v2, which means we should stop seeing all the other stars and only see the black stars. If everything works right, we should see the black stars. And now we're seeing the black stars. So this is the setup: we have our v2, and we've recently deployed v3, but it's not visible yet. Now I will create a test rule, which is for A/B testing, and you'll see that we use the user "jason" as the tester here; we should really change that to "linh". I'm creating this rule, and what it does is: if the logged-in user is jason, only that user goes to version 3 of the application. Typically you wouldn't do it for one user.
You would do it for a group, something like a test group. Can you see it now? Okay. Thank you. So now, if I log in as jason, I should see the stars... sorry, I should see the hearts. You can test with that, and now I'll put some actual load on it. I'm going to put some load on it now, and that load is going to come from the user jason, so what should happen is that we see a lot of traffic going just to version 3 of the application. Oh, let me make it even bigger. Okay. And here is the Grafana dashboard. I've just put some load on it; this is the ratings service. I'll let it load for a little bit, but as you can see, this traffic is all v3. That means only v3 is getting traffic, and since reviews v3 calls ratings, we're seeing all the traffic here and no traffic on the other side; that was just the manual traffic. Once this has gone on for a little bit and you're sure your application works, or seems to work, we update the previous rule and start a normal canary. So now it's a 90/10 split, but for all traffic. We replace that rule, and now 90% should go to v2 and 10% should go to v3, and we should see that over here. On the product page, as you can see, it was all going to v3, and now I'm going to load it one more time, this time without the cookie, and just let it rip and go to both of these places. Very soon we should see both of these light up, and here is where I'm going to introduce the rate-limiting part. So consider a scenario where this new service that Linh just updated has some crazy tight loop that puts a lot of load on the ratings service.
Even though you're canarying, and v2 is the primary version being used, you don't want to affect the ratings service just because there is some problem with version v3. And as you can see, even though the traffic is now evenly split, the amount of traffic that ratings is receiving from reviews is completely skewed, because apparently there is a tight loop in the code that you wrote. So now we have to make sure that doesn't happen, and the way to do that is to configure the rate-limit rule I showed you earlier. What the rate-limit rule says is: if the source version is v3 of the service, only allow 10 requests per second. While the deployment is going on, it makes sure you're protecting your backend services as things roll forward. So we create that rule, and here's the rate-limit rule; I can actually look at it now. Here's the quota rule, the rate-limit rule that was just created. So now if I load the system again, and I'm going to load it for a bit longer this time, I should see that the ratings service never gets more than 10 requests per second, and I'm actually going to do it with the cookie, just to bring the point home. Now we let it rip for a little bit, and we should see some traffic flowing. That's great. So I wrote that code, but I'm not supposed to impact production for version two, right? Right. Thank you. So now, as you can see, reviews v3 to ratings v1 is receiving all these 429s, and a 429 means it's being rate-limited. All that traffic is being turned back by the proxy; it never actually reaches the service.
Which means your actual service won't be impacted, and all your production traffic can continue going through. Again, this is just one of the features, an important one, that Istio provides, and you can stitch together a very compelling application rollout with the primitives Istio offers. And you can see here that the actual traffic allowed to flow through is just 10 requests per second. So that's it; that's the demo. That's amazing. So you can write bad code, but Istio really helps you with policy enforcement and traffic management, so you have full control over exactly what's deployed into your production. You can do rate limiting. So I might roll out bad code, but thank you for protecting me and our system. All right. We have one minute left, so maybe we have time for one or two questions, but we'll be around if you have further questions. Any questions? My question is: does this work with only HTTP traffic, or can I apply policies to other traffic like TCP or UDP? We also support TCP. TCP has fewer attributes than HTTP, because HTTP has headers and you can write policies on headers, but yes, we support TCP, and you'll also see TCP metrics. In fact, if you scroll through, there are no TCP services in the mesh right now, so no TCP metrics, but if we had TCP services, we would see TCP metrics as well, and you can write policies on them too. Okay, I have one more question. If my application is an MQ consumer or producer, it won't actually accept TCP or HTTP traffic; it's basically just pub/sub. What is the best practice to roll out those kinds of apps? Sorry, I'm having trouble hearing you.
I couldn't get the question, sorry. Thank you all; we really appreciate you attending.