Hello, everyone. Welcome to my talk. As the title suggests, this is about a not-so-scary way to add and use Istio on your platform.

A little bit about me: my name is John, and I work at the vacant retail group in the Netherlands. I've been doing platform engineering for about seven years, although it wasn't called platform engineering back then. I've also done a lot of consultancy, mostly to keep a fresh and diverse perspective, because I believe that if you stare at the same thing for too long, you go a bit blind, and that's not good. Most of that revolves around DevOps culture, cloud-native architecture, that sort of thing. If you need to reach me later, there's a QR code for the Sched feedback at the end, but you can also find me on all of the Slacks: Kubernetes, Istio, CNCF, whatever you like.

So let's dive into the scary part first. Why might Istio be scary? It could be because Istio seems very big, with a seemingly infinite amount of things to learn. Or it might not be the learning that scares you, but the question: what if something goes wrong? What if you need to make a very big change, the change goes wrong, and you get something scary? It usually isn't that dramatic, but that is of course the outcome you want to prevent.

So how do we make it not scary? First, we make the scary stuff smaller. Say you're scared of breaking production, because that is something you might accidentally do. That would be bad. But we can make it smaller: what if it's just a single department or workflow? It wouldn't be nice if some of your developers can't work, but it's smaller than the entirety of production. And we can scope it down even further, to a single application. If an individual application doesn't perform right, that's not great.
But it's a whole lot less problematic than having the entirety of production down. So how do we do that? We make a plan, we test our changes before we apply them, and we keep each change small.

In the interest of time, I'm going to move quickly through the first few slides, because they're not the technical details, and people are usually most interested in the technical details. Let's say you're at the Kubernetes starting point and you want to get to the Istio side of things. We're going to assume a few things about your environment: you have Kubernetes running, you have some applications already, and maybe you're using the NGINX Ingress Controller, a very easy starting point that people often use. And you might be wondering: how do I get universal observability, or some other service mesh feature, without having everyone refactor their applications?

You might also have other existing requirements, and it's quite important to know those ahead of time. Usually you just ask around: hey, is anyone using this connection to this database, or anything else? You need to define your starting point. Where are you now? Where do you know that you are? And how are you going to make a plan to get from that position to your future position?

So we have a couple of steps. The first two are just a handful of slides, so don't worry. The first one we can already tick off, because we've just assumed what the starting point is. Yours might be different, but it's often somewhere in that direction. The next thing is the inventory step. You want to know what you have before you start building, because if you start building and then run out of time, money, or people, that wouldn't be a nice execution of a plan.
So we're going to do that first. How? You take inventory of existing knowledge. Maybe you're already familiar with service meshes; that would be the best start, because it makes everything a lot easier. If you're not, you might have a classic networking team in your company, and maybe they're doing software-defined networking. That's very close to service meshing. It's not the same, but it might mean you have knowledge you can reuse, so you don't have to learn everything from scratch, and you have internal people you can ask questions. That makes it a lot more comfortable and a lot less scary.

The same applies to the other two. The infrastructure context is: what existing compute and network resources do you have? Maybe you have infrastructure you can repurpose, or you can buy or rent something new, or you're in a cloud. You should know that ahead of time so your plan actually fits your needs. Maybe you're in a regulated environment and need specific controls on encryption or data privacy. Important stuff. And the last one is your people, your time, and your money. If you have a lot of time and a lot of people but not a lot of money, you may need to do more yourself. It could be the other way around: lots of money, not a lot of people, so maybe you have to hire someone. It's not that you need to know exactly what you are going to need, but know at least what your resources are, so you can make decisions based on that.

With those out of the way, we get into the interesting stuff, Istio-wise. But the other steps are also very important; they're where things can go wrong. So, what is it that you want out of Istio? We have this diagram, straight from the documentation. And don't worry, we're not going to do an entire introduction to Istio.
There have been many talks about that, and they're all very good. But we can use this diagram as a map of some of the things inside Istio, and you can use it to mentally place the things you might be configuring or adding to your system. This one, like I mentioned, comes from the documentation. The documentation is very good, especially if you're just starting out.

So let's take a look. The Concepts and Tasks sections essentially give you an introduction to all of the main topics of Istio. Concepts tells you what exists within Istio, why it exists, and what it's for, while Tasks assumes you already know what you want and shows you how to do it. Together they get you about 90% of the way there; the remaining 10% is just stuff specific to your situation.

The last section of the documentation is the Examples. There is an example application, a multi-microservice app with services in, I think, three or four different languages. It gives you a complete setup that you can run locally if you want, with all sorts of Istio features enabled, and you can enable more, disable some, and just see how it works. So if you don't have a playground environment, or a pre-production or testing environment, you can just learn about it locally. This is mostly useful if you're saying: I want to learn about this, but I'm not entirely sure I'll get there by just reading a bunch of documentation. That's where the example is super useful. You can relate something that is actually built to a piece of theory and place it on the map in your mind.
If you look at the main topics, the top row of three items, those are the main things your service mesh will give you. Of the bottom three, the first and the last (maybe not the middle one) are the ones you often find most useful when controlling and letting traffic in, so those are very important. They're also a very good place to start, because the one at the bottom has zero impact on everything you already have. You can add it and nobody will notice, which is not entirely a good thing, because it also doesn't do anything if you just put it in. But that's the one we're going to start with.

So, to actually start with it, we're now in the execution phase, and this is the good part: we start with an application of your own. Assume you have a couple of pods behind a Service, you're using an Ingress, and there is a load balancer. Maybe the load balancer is managed by your cloud provider, maybe it's an internal hardware device; both are fine. Your traffic just flows in like this, maybe on production, and maybe you have a duplicate of this in development, which is super useful if you want to test things. There are a couple of other things listed here: you might have your GitOps and CI/CD around this as well, maybe you're already deploying your manifests as custom Helm charts, maybe you have an internal Helm registry. It's all good, and if you don't have it, this still works.

We're going to add the ingress gateway. What that gives you is a way to instrument traffic coming in: it gives you metrics, and it gives you control over it. It also means traffic can get in at all, because without an ingress gateway, in theory, your mesh is closed to traffic from the outside. That's a good thing; you don't want arbitrary traffic going in.
Going out, I think, is still open by default: you can connect to whatever you want. But we'll get to that later. To get the ingress working, you also need a teeny, tiny bit of traffic management, and we'll get into what that means in a second. That gives you things like control over who's talking to whom, and more insight into metrics about requests and responses: timing, errors, that kind of thing. The advanced routing part is something you also get, but we're not going to get into that, because it would take more time than we have.

So, back to the map. We're going to start with the bit where your traffic actually goes in and reaches your service. Before we can do that, there is a dependency: the control plane, the thing at the bottom of the diagram. And you can add that to any existing cluster. The cluster won't notice it; your users won't notice it. You can just add it, try it out, check the logs, and get comfortable that it works.

So what are our steps? We started with Kubernetes. Step one: we add Istio. Step two: we add the ingress gateway. Step three: we do a teeny, tiny amount of traffic management.

What does it look like? We use the standard installation options provided by Istio to install both the control plane and the gateway, and we'll call our gateway the ingress gateway because, well, it's a gateway that lets stuff in. If we look at the diagram on the side, it's essentially the same diagram, but some things have been connected and added. Istiod, the Istio daemon, is a combination of various components; in the more complicated diagram you see various boxes and words at the bottom. Those are specific components of Istio, but you don't necessarily have to worry about them if you're just using it.
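As a sketch of steps one and two together, one of those standard installation options is an `IstioOperator` manifest consumed by `istioctl install -f`; the profile and gateway name here are just the common defaults, so treat this as illustrative rather than prescriptive:

```yaml
# Illustrative IstioOperator: installs the control plane (istiod)
# plus one ingress gateway in a single step.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default            # the default profile already includes istiod
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
```

Helm charts or plain manifests get you to the same place, as mentioned; this resource is just the istioctl flavor of it.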
The ingress gateway and the virtual service in there are connected, and I'm going to show you how that works. Step one and step two are essentially the same thing: whether you're using Helm charts, istioctl, or bare manifests, it all works, and it all results in the same setup. The documentation also has platform-specific instructions. If you're running minikube or k3s locally, you're in one of the clouds, or you're doing bare metal installations or self-managed Rancher, there's dedicated documentation for every one of those, plus generic documentation. So if you're making your own clusters, you can do that too.

So how does this work? When you do the installation, you get your istiod and your ingress gateway, but that doesn't mean anything is happening yet. The next thing, step three, is the virtual service we're going to add, and that's a resource specific to Istio. The resource is essentially a manifest, and this one I've made very small so it fits on the screen. It says: you have a Service, the Kubernetes Service that already exists, and we've named it my-cool-app, because we need to put a reference in there. And then we have the gateway, the ingress gateway, which is also referenced. So the virtual service references both your existing Kubernetes Service and your ingress gateway, and that's all it takes to glue them together.

That's the first step in adding Istio without anyone noticing it, without any impact, which is both good and bad, because it still doesn't do anything. It brings us to a crossroads. Option one: we keep it as is and deploy to production. Nobody's any the wiser, but it also doesn't do anything.
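A minimal version of that glue manifest might look like the following; the hostname, service name, and port are made up for the example, and the `gateways` entry refers to an Istio `Gateway` resource that is typically created alongside the ingress gateway itself:

```yaml
# Sketch: route traffic arriving at the ingress gateway
# to an existing Kubernetes Service, without changing anything else.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-cool-app
spec:
  hosts:
  - "my-cool-app.example.com"   # hostname this route answers for (hypothetical)
  gateways:
  - istio-ingressgateway        # reference to the Gateway resource
  http:
  - route:
    - destination:
        host: my-cool-app       # the existing Kubernetes Service
        port:
          number: 8080          # assumed service port
```

Applying this changes nothing for traffic on the old path; it only defines what the new path through the gateway should do.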
Option two is: all right, maybe we should start doing something, but we'll deploy it later. This is generally the option where you say: maybe I'll add more features before I do a release, because you're not in control of your releases. Your company or organization says you don't get to release when you want; you release when they want. That's annoying, but it's a fact of life in some cases. And the third option, which is the one we're going to take, because it's always the option we're going to take: we make it testable first. We make sure we can actually put traffic into the thing and reach our service just as if nothing has changed.

How that works: we take the existing diagram, with the resources you're somewhat familiar with, and we add another load balancer. That's the thing at the top. That load balancer lets you replicate the way traffic flows through your system, but in your Istio-enabled version. You configure it to connect to your ingress gateway, and how you do that varies per load balancer implementation. Usually it's an annotation you put on your gateway's service, something like: hey, this gateway would like its node port to be load balanced. Look at the documentation for your load balancer controller, because I don't know which one you're using, but it's definitely supported.

So now you have two load balancers. The one at the top is connected to the internet, and the other one is not. What you could do is change the hosts file on your machine, or use an extra domain you have lying around, and actually put traffic into your Istio mesh and reach the service.
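How you attach that second load balancer depends on your controller, as said; on many platforms a sketch like this is enough, where a second `LoadBalancer` Service selects the gateway pods (the selector label and target ports below are the usual Istio defaults, but verify them against your installation):

```yaml
# Hypothetical second load balancer in front of the ingress gateway,
# deliberately not exposed to the public internet yet.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway-test
  namespace: istio-system
spec:
  type: LoadBalancer
  selector:
    istio: ingressgateway       # default label on the gateway pods
  ports:
  - name: http
    port: 80
    targetPort: 8080            # default gateway container port for HTTP
  - name: https
    port: 443
    targetPort: 8443            # default gateway container port for HTTPS
```

With a hosts-file entry pointing your test domain at this Service's address, you can exercise the Istio path while real users keep using the old one.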
That lets you test it without any of the existing systems noticing any change, or anything suddenly behaving differently, even if it behaves well. People don't tend to like things changing with no notification. But that's not enough, because once we've tested this and perhaps deployed it, we're still not entirely sure everything is going to work. So how do we make it not scary? We add a bypass service.

This is a special virtual service that says: I am not going to match any specific path or hostname, and I'm not pointing at any Kubernetes Service either; I'm going to point back to the old load balancer. Istio has specific logic that says: if I find a virtual service that is essentially a wildcard, it gets the lowest priority. It will never receive traffic unless none of the other virtual services in your cluster match. That's essentially why it's a bypass, or fallback, service.

What this does, and we have the manifests that describe it, is take your gateway. It's connected to the ingress gateway, and it says: give me everything, because in the lower section it has no match rules. If it sees any HTTP, it doesn't care what it looks like, it'll take it. So it eats all the traffic, and then at the bottom it says: oh, by the way, this traffic has to go to this other load balancer. Now, Istio doesn't know about that load balancer, because it's not inside the mesh. That's where the second resource comes in: a service entry. With service entries, you can inform Istio about things it can't discover by itself. Normally Istio knows about everything inside your mesh, but something that's not in your mesh it wouldn't know about.
The service entry says: there is a host, it's external to the mesh, you can find it by doing a DNS lookup, and this is the protocol and the port to use when talking to it. And that's all it takes. At that point, you could in theory point your internet traffic at your new load balancer, and everything keeps working. You've essentially transferred all your traffic over without anyone noticing any changes.

Well, in theory. What you're actually doing is giving any traffic not captured by a virtual service an extra hop: it goes into the new load balancer, into your ingress gateway, gets matched by the wildcard virtual service, and then hops back into the old load balancer. So if you're super sensitive about latency, where a couple of milliseconds extra is really bad, this specific trick won't work for your specific case. But in a lot of cases, it'll definitely work. So that's pretty neat.

But so far we've only talked about traffic coming in. If we want anything else, say doing something with traffic between pods, we're going to need more. If we look at the grid again: ingress gateway, check; traffic management, a teeny, tiny little bit. But we want more. Say we want observability, because that's another very important feature. It gets you insight into pretty much everything, provided there is some support for the protocol in Istio; a custom binary protocol for a custom service is probably not going to be understood. But that's specific to your application again. If we're just talking about, say, web traffic, you will see everything, which is pretty cool. It's also required if you want to make use of any other system.
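Putting the bypass pieces together, the fallback virtual service and its service entry could be sketched roughly like this; the hostname and port are placeholders for your old load balancer:

```yaml
# Catch-all fallback: anything no other VirtualService matches
# is sent back to the old load balancer outside the mesh.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fallback
spec:
  hosts:
  - "*"                          # wildcard: lowest priority, matched last
  gateways:
  - istio-ingressgateway
  http:
  - route:                       # no match rules: takes all remaining HTTP
    - destination:
        host: old-lb.example.internal
        port:
          number: 80
---
# Teach Istio about the old load balancer, which it cannot discover itself.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: old-lb
spec:
  hosts:
  - old-lb.example.internal      # hypothetical DNS name of the old LB
  location: MESH_EXTERNAL        # it lives outside the mesh
  resolution: DNS                # find it via a DNS lookup
  ports:
  - number: 80
    name: http
    protocol: HTTP
```

Any traffic you later claim with a more specific virtual service simply stops hitting this fallback, one route at a time.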
There's Kiali, for example, which gives you a visual representation of your services and how they talk to each other. There's also your Prometheus stack if you want metrics. For all of that, you're going to need observability enabled in your mesh. So we're going to do that. And the same applies to the other thing, security. You can't really do anything with policies or rules, make something work or not work, without adding a special ingredient. And once you have security, you also get authorization basically for free. Authorization brings you cool stuff like rules for who can talk to whom. Say you have a bank account service: you're not allowed to talk to it unless you are a banker. Kind of a weird example, but the banker should be able to talk to it, and strangers should not, because if a stranger says, you know what, add a million to that account, that would be very nice, but very much against the rules.

So what do we do? We add the thing at the beginning: sidecars. That's the logo of Envoy, because that's the technology that's used. The sidecar is essentially a prerequisite for anything else. There is also a new mode of operation, currently not ready for prime time: the ambient mesh. It uses different technology, and there have been some very cool talks about it this year, last year, and at the virtual IstioCon, I think. If you're into that, definitely look it up. But the sidecar is deployed with every pod that you launch. It sits between your pod and the rest of the network, which is logical, because at that point it can see all the traffic, instrument it, block it, and transform it. The next step would be adding some security parameters, but most of that you get for free: the default configuration gives you a lot out of the box.
So you don't need to do any heavy studying just to make it work. The last step is the one where you actually get to work. Let's see what that looks like. We have a slightly simplified diagram, without all the extra bypass stuff; let's assume you've been running this for a while and don't need the old setup anymore. We need the pods at the bottom to get a sidecar, so we need to add a little bit. How does it work? You use an annotation or a label; things have changed a bit over the past few years. But this allows you to inject the sidecar into any pod, and you can do it at various levels. You can scope it very tightly, just to your specific pods, or you can do it on a ReplicaSet, a Deployment, an entire namespace, or the entire cluster. Again, this lets you choose your level of scariness: if you like scary, do the entire cluster; if you don't, stick with a single deployment first.

This very complicated diagram, the advanced version from the documentation, again shows you every feature on the map. Injection gives you these two proxies that sit between your pods and the rest of the network, and they allow you to see and manipulate all the traffic. Taken in a very simple diagram, that's what enables something called mTLS, mutual TLS, which gives services encryption, confidentiality, and trust. It means that if my service talks to your service, I know that I am who I say I am and that you are who you say you are, and your service can verify that too. Essentially, the two sides handshake, and each knows who the other is.
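Scoping that injection can be sketched like this, using the standard Istio injection labels; the namespace and workload names are hypothetical:

```yaml
# Namespace-wide: every new pod in this namespace gets a sidecar.
apiVersion: v1
kind: Namespace
metadata:
  name: my-team
  labels:
    istio-injection: enabled
---
# Tighter scope: opt a single workload in via the pod template label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-cool-app
spec:
  selector:
    matchLabels:
      app: my-cool-app
  template:
    metadata:
      labels:
        app: my-cool-app
        sidecar.istio.io/inject: "true"   # per-pod override
    spec:
      containers:
      - name: app
        image: registry.example.com/my-cool-app:latest   # placeholder image
```

The per-pod label also works the other way around: set it to `"false"` to keep one workload out of an otherwise injected namespace.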
The nice thing about this is that if you want to make rules, you can now do it not only based on a service name but also based on an identity, because the sidecar also provides a cryptographic identity. The service inside your pod, your container, is not in charge of that; it cannot fake an identity. It just gets the identity from the sidecar. It has no choice, which makes it very secure.

Now, if our pods are injected and we communicate with another pod, by default that's allowed, and you can change that. You can say: maybe I want my mTLS to be strict, so you can only talk to someone else if that someone else is also talking mTLS. That makes it more secure. However, anything that is not injected yet will break, so it's not always the best choice to enable that right out of the gate. One more thing about the bottom section here: usually it just works, but there are some protocols where it doesn't. We've had this internally with an actor framework that does cluster coordination over a custom binary protocol. If you enable sidecar injection, that protocol isn't well understood by Istio, so it doesn't work that well. But you can exclude specific ports if you run into that.

Once the sidecars are enabled, you essentially get all of this immediately for free, which is pretty sweet, because you don't have to work through 20 pages of documentation to enable these features one by one. You take your application deployments, and hopefully you're using some sort of charting or GitOps or an automated system that lets you enable this kind of injection across multiple services at once. Or maybe you do it individually: for each service, you make a pull request, and the service owners, the teams, have to sign off on it. You can make it as small or as big as you dare.
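The strict mode just described might look roughly like this, as a namespace-scoped policy; the namespace name is an assumption, and remember that un-injected callers will break the moment this applies:

```yaml
# Require mTLS for all workloads in this namespace.
# Plain-text callers (pods without a sidecar) will be rejected.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-team
spec:
  mtls:
    mode: STRICT
```

For protocols the proxy can't handle, the escape hatch mentioned above is a pod annotation such as `traffic.sidecar.istio.io/excludeInboundPorts`, which keeps those ports out of the sidecar's path entirely.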
The next, and last, item in the series of icons is security authorization. Let's pretend we have an application that speaks REST, has a metrics endpoint, and is on the internet. Your metrics are probably not for the internet; they are for you. And maybe we also have another idea: say we don't want to allow anyone to delete anything. We could just do it in the application, of course; there's nothing to prevent you from writing that code and adjusting the application. But what if you have 100 applications, or 1,000? And what if some teams are on vacation and you have to make the change now? There is an easy way to do this inside the mesh.

If you think about it, you can do it in layers. You can block this inside the application, and you can also block it in the mesh. Why not? The idea is that when you do it inside the mesh, the responsibility is no longer just that application's responsibility. You can say: maybe we should have policies that apply to everyone. In this specific example, we just have one app speaking REST.

So how does this work? The first policy is the deny-deletion policy. It says: if I spot a request and the method is DELETE, and the selector matches my cool API, then it's not allowed. We deny it, period. It doesn't matter where you come from; this one has no selection on source or anything else. If it spots a DELETE, you can't have it. The other one, for the metrics, is a little bit more specific, and this is where the cryptographic identity is also used. It lets you say: if I spot a request that comes from my ingress gateway and goes to my REST API, then I'm going to check it for a /metrics path, and if I find that as well, I'm going to deny it.
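Sketched as manifests, the two policies could look like this; the labels, namespace, and the gateway's service account name are assumptions for the example:

```yaml
# Policy 1: deny DELETE requests to this app, regardless of source.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-deletes
  namespace: my-team
spec:
  selector:
    matchLabels:
      app: my-cool-api
  action: DENY
  rules:
  - to:
    - operation:
        methods: ["DELETE"]      # no 'from' clause: applies to everyone
---
# Policy 2: deny /metrics only when the request arrives via the
# ingress gateway, identified by its cryptographic identity.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-external-metrics
  namespace: my-team
spec:
  selector:
    matchLabels:
      app: my-cool-api
  action: DENY
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"
    to:
    - operation:
        paths: ["/metrics"]
```

Because the second rule matches on the gateway's identity rather than an IP or header, it can't be spoofed by a client that simply forges a hostname.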
That means your internal Prometheus can scrape your services all day long, but the internet has no access. These are two relatively simple examples of how you can control access to your services.

Now, this normally works just fine, but there is one thing that always goes wrong when you're just getting started with this: you can cause a service disruption if you don't have this diagram memorized. The diagram shows the rules and the order in which they are checked. The most important one is the one, well, let's say not at the bottom, but just above that. By default, everything is allowed, but as soon as you add one specific allow policy, everything flips around: then everything is denied except what you explicitly allow. This goes wrong every now and then, and you end up debugging: why doesn't this work? My policy isn't that strict. Well, it doesn't work because this is the authorization flow; this is how it works. That's something I've done a couple of times myself, wondering: wait, I have done this before. And then you run into this diagram, because that's where you always land, and you find out, well, this is how it works.

The last thing is the egress gateway. With an egress gateway, you can instrument, block, and control traffic to services you don't own. This is essentially adding Istio to the entire internet. How this works, very briefly, because we are running out of time: you make a virtual service for an external API, and you also add a service entry for that external API. That lets Istio notice: hey, there's a virtual service for this one, and there's a service entry for this one.
And then you can use your policy to say that that virtual service has to attach to the egress gateway, which means the egress gateway gives you all the metrics, information, and control for traffic going outside your cluster, the same as you have for traffic inside your cluster. So you've essentially extended Istio to things you don't own, which is pretty cool.

And that concludes my presentation. If there's any feedback or any questions, I am all over the Slacks. There's the QR code for the Sched feedback; if you have any, please leave some. I hope you enjoyed it. Thank you.