Everyone, welcome to our workshop. This is almost like a part two of the previous workshop that Lynn did. Lynn's workshop was more about how to install and use Istio, whereas this one takes a slightly different angle and focuses more on the SRE role: how to deploy Istio for production, how to use production components rather than just the samples that come out of the box, and some debugging tools so that as an SRE, when something goes wrong in your production environment, you can figure out exactly what's happening without knowing too much about the application itself, just by being able to introspect the network. I think everyone here was in the previous lab. If you haven't, go to play.instruqt.com and make an account.

A little bit about me: I'm also a field engineer at Solo. I work with our customers to make sure that they're happy and that their adoption of Kubernetes, Envoy, Istio, service mesh, whatever the technology, goes well. We're solution architects, developer advocates, customer support engineers. We do it all. And we're hiring. I think you've heard a pitch about Solo a couple of times already, so I'm going to skip over that. The two main products that we have are Gloo Mesh, which is our enterprise Istio, and Gloo Edge, which is our enterprise Envoy that sits at the edge. And with Gloo Mesh, we also have a gateway solution that brings all of the API gateway functionality to the Istio ingress gateway. We have various other sessions today. I think we're halfway through the day. The other session we have is on multi-cluster service mesh later on at 3:30 PM, which another field engineer from Solo will be running. And I'll be in here moderating, like I did this morning.

OK. Again, we're going to be using the Instruqt platform. The link to this workshop is at the bottom of my screen. Eric will also share it in the ServiceMeshCon Slack channel, and it should also be available on the main CNCF session page. Can I get a thumbs up that most people are able to access this? Cool. I'm still going to wait probably a minute or so so that the people joining virtually are able to access it. Cool. Once you click the link, you'll see the track. You add it to your study room, and then you start it. Once you start it, it sets up the environment, including your Kubernetes cluster. That process takes a couple of minutes, so while it's running, I'm going to keep moving forward. The people in the room are experts in Instruqt now, so this should go much more smoothly.

In the first portion of our lab, we're going to put Istio aside for a while and understand Envoy a little bit better. This is important so that when things are not working well, you're able to see what Istio did to Envoy to achieve the functionality, whether it's traffic routing or mTLS or request transformation. All Istio's API is doing is converting your simple Istio configuration into thousands of lines of Envoy configuration and pushing it to all the various pods. It can scope that configuration down, export it to the right namespaces, and so on. It knows which pods need to talk to which pods, so it's able to program each Envoy intelligently. It's a control plane. So let's do that. In this lab, we're going to deploy the sleep sample and an httpbin sample, so two different apps. We're going to call httpbin directly.
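Roughly, those first steps look like the following sketch; the manifest file names and the httpbin port are placeholders, since the lab provides the exact files:

```shell
# Deploy the two sample apps (file names are placeholders for the lab's manifests)
kubectl apply -f httpbin.yaml
kubectl apply -f sleep.yaml

# Exec into the sleep pod and call httpbin directly on /headers
# (the standard httpbin sample listens on port 8000; adjust if the lab differs)
kubectl exec deploy/sleep -c sleep -- curl -s http://httpbin:8000/headers
```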
And then we're going to install Envoy and go through Envoy to reach httpbin, to see the difference. OK, with that, everybody should be on a page like this, "Running an Envoy server," and I'll click start. Let's see if you're able to see the screen. I'm going to go full screen and zoom in. Can I get a thumbs up that everything looks good on the screen? OK, cool.

For the first part, we're just going to install the httpbin and sleep YAML. You can just click in the code box and it does the copy for you, just a quick shortcut. So if you click in the code box, it copies it, and then you paste it in the terminal window. It deploys the httpbin app and the sleep app. Once that's done, give it a couple of seconds, and then the next command will exec into the sleep container and call your httpbin container on /headers. httpbin will respond with whatever headers you passed it, plus anything else that it adds. Pretty straightforward; there's really nothing there. It's saying you called me from curl, this is the host you called me on, and nothing else.

Next, let's deploy Envoy. Envoy can read its configuration in multiple ways. You can statically declare it using the Envoy config YAML that we're about to look at, or you can program it dynamically using its xDS API. For now, let's just look at a static definition. If I cat the envoy-conf YAML that already exists in this environment, we can see some admin configuration showing where to save the access log and what port the admin interface listens on. Then the static resources section is all the config you need to tell Envoy: if you receive traffic on this port, do whatever Envoy things you have to do and then direct the traffic to the httpbin app. What we're trying to do is use Envoy as a proxy. So this is just saying: I want to create an HTTP connection manager filter, and this is my route config over here. I want to match all domains, with a catch-all prefix for route matching, and send everything to this httpbin service cluster. A cluster, in Envoy terminology, is basically a set of endpoints: the destinations that Envoy will direct traffic to. That cluster is defined at the bottom, so the httpbin service you see referenced up here is defined down here. That's my very simple Envoy config. That's about as simple as it gets.

Then let's create a config map out of that configuration, apply the config map to the Kubernetes cluster, and deploy the Envoy proxy. It's going to be another pod, another container. So now you have three containers: httpbin, sleep, and Envoy. When you ran your exec command before, we exec'd into sleep and went directly to httpbin. Now we're going to exec into sleep, but instead of going to httpbin, we're going to call Envoy on /headers. We've configured Envoy to direct all traffic to httpbin. So let's do that. Now we have a couple of additional headers. You can see the host is Envoy, which means Envoy is the proxy that received the curl request and forwarded it on to the httpbin service. You can also see the x-envoy-expected-rq-timeout-ms header: the request timeout that Envoy is applying is exposed here for clients to use. Everybody good? Cool. So now let's change the Envoy configuration and change the call timeout.
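For reference, a static Envoy config along the lines described above might look roughly like this; the listener port, cluster name, and httpbin address are illustrative assumptions rather than the lab's exact file:

```yaml
admin:
  access_log_path: /dev/stdout
  address:
    socket_address: { address: 0.0.0.0, port_value: 15000 }   # admin interface
static_resources:
  listeners:
  - name: httpbin-listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }  # where Envoy receives traffic
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: all
              domains: ["*"]            # match every host
              routes:
              - match: { prefix: "/" }  # catch-all prefix
                route: { cluster: httpbin_service }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: httpbin_service            # the set of endpoints Envoy forwards to
    connect_timeout: 5s
    type: STRICT_DNS
    load_assignment:
      cluster_name: httpbin_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: httpbin, port_value: 8000 }
```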
So if I look at this config again, before we didn't specify a timeout, but now we're going to specify a one-second timeout. Again, we create (or update) that config map, and then we restart Envoy to pick up the change, the change being the one-second timeout. Give it a couple of seconds to restart. You can always run kubectl get pods and see the three pods; you can see Envoy has restarted. Now, if we run the exec command again to curl Envoy, you can see the timeout is updated to one second, 1,000 milliseconds.

The httpbin app has an endpoint where you can tell it to delay before responding. If you go to /delay/5, it's going to wait five seconds before responding. Well, we've configured Envoy to time out after one second, so as you would expect, the request should fail. You can see we now get a 504 Gateway Timeout. Pretty straightforward.

When things are not working well, Envoy exposes the stats endpoint, as well as other endpoints that we'll get to. For now, we're going to look at the stats endpoint, where you can get statistics about what's going on with all the requests. So let's exec again from sleep, but this time we're going to call Envoy's admin port, 15000, and hit /stats. Now we're not going to httpbin; we're just going to the admin port of Envoy and hitting the /stats endpoint. We see a lot of good information: all kinds of stuff in here, the way it's configured, all of the various clusters that Envoy knows about, all the 500s, all the retries, all kinds of goodies. Let's run that same command again, except this time we'll grep for the word retry so that we get a smaller list. You can see that all the retry counters are set to zero, which means Envoy hasn't retried any requests yet.

The next thing we'll do is configure Envoy to retry a request on, say, 500s. Again, we have an Envoy config YAML; we'll take a look at that. This one says: add a retry policy, retry on any type of 5xx (like 500, 502, 503), and retry up to three times. Just like before, apply the config map and restart Envoy. Give it a couple of seconds. The httpbin app also has a /status/500 endpoint. If you call /status/500, it replies back with a 500. If you call /status/502, it replies back with a 502. It's basically an echo, so you can do various tests. So if you do /status/500, it comes back with a 500. That's as expected; you can see HTTP/1.1 500 Internal Server Error. But we want more detail about this 500, so let's use the stats endpoint with the grep for retry and see what's going on. Now we see, if you look closely, that the httpbin service cluster has a retry counter with a value of three. That means Envoy retried requests to this cluster three times. Good stuff there. That's very useful in production when you're checking whether your retry policies are actually applying. You can see if your app is really down, or if it just takes a couple of tries to respond. If everything is working well and you periodically check the stats endpoint and see various retries, that might be a sign to figure out: hey, what's going on? Why am I getting retries once in a while? It's good that Istio is able to mask this by applying retry policies, but you still want to know that it's happening.
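To sketch the change being described here (values are illustrative, not the exact lab file), the route entry picks up a timeout and a retry policy roughly like this:

```yaml
routes:
- match: { prefix: "/" }
  route:
    cluster: httpbin_service
    timeout: 1s              # fail requests that take longer than one second
    retry_policy:
      retry_on: "5xx"        # retry when the upstream returns any 5xx
      num_retries: 3         # up to three retry attempts
```

After restarting Envoy, hitting the admin interface again with something like `curl localhost:15000/stats | grep retry` from inside the pod network is how you'd confirm the retry counters actually move.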
So that's Envoy for you: very simple, basic configuration if you were to program Envoy yourself. Cool? All right. Once you're done with that lab, click Check at the bottom; it just does a quick check to make sure you've done all the required pieces before you move on. Can I get a thumbs up in the room that everything's good? Cool. Eric, everything's going well online? Awesome.

In the next lab, we're actually going to install Istio, and we're going to install it using revisions. Revisions let you specify a version of Istio to deploy into your cluster. That helps when you want to do upgrades: you can have Istio 1.8.3 installed in your cluster and Istio 1.9.5 also installed in the same cluster, and you can tell your pods which Istio they should use and do slow, incremental upgrades. For that reason, revisions are extremely important, especially in production, where you don't want to impact any traffic and you want to do gradual rollouts of any changes. The namespaces at the top are the default and istioinaction namespaces, as well as the istio-system namespace. In the default namespace, we have the httpbin app that should already be there. We're going to create the new istioinaction namespace and deploy the sample application into it. It's got four microservices: web, recommend, purchase, and sleep. And the istio-system namespace is where we're going to deploy Istio.

Question? We're not using the operator here? Right, the number one recommended way of installing Istio is to use istioctl: istioctl install, or istioctl manifest generate to look at the YAML and then apply it. I would say the second-best approach, if you're installing brand new Istios going forward, is Helm; the Helm support is getting better and better. Right now it's alpha, but I believe in the next version the alpha label is going to go away and Helm 3 will be supported. And then there's the Istio operator. It works well; it's been a supported way of installing Istio for a while, but the community is moving slightly away from it because the value of the Istio operator is not as big as it once was. Before, like last year, Istio came with multiple components, but now Istio has shrunk down to just the single monolithic istiod. And the gateway portion of Istio is becoming more of the user's responsibility, because users are installing the gateways into their own namespaces for better delineation of security, traffic, and administrative domains. For that reason, because there's really only one deployment, istiod, there's not much value in the overhead of an in-cluster operator running. You're still going to use the IstioOperator API; you just give it to istioctl, and istioctl takes that IstioOperator resource, converts it to all your Kubernetes YAML, and deploys it to your cluster.

OK, back to our lab. Like I said, let's first start with deploying our sample application. First, create the istioinaction namespace and then deploy the sample application microservices. This installs the deployments, the services, the service accounts, et cetera. Then do a get pods so you can see it come up. It might take 30 seconds or so for these containers to pop up. So those, my pods are up. And now let's download Istio. This lab uses Istio 1.8.3; the concepts are the same for 1.9, 1.10, 1.11.
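The download and setup step amounts to roughly the following, using the standard istio.io download script; the paths may differ slightly in the lab environment:

```shell
# Download a pinned Istio release (this is the standard istio.io download script)
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.8.3 sh -

# Put istioctl on the PATH and confirm the CLI version
export PATH=$PWD/istio-1.8.3/bin:$PATH
istioctl version
```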
Yeah, this curl command just sets the Istio version and then goes and downloads the binaries for istioctl. Once it's downloaded, we'll set the PATH so that we can type istioctl and have it point at the right Istio. And once that's done, you should be able to run istioctl version and see the version output. Normally, when you run istioctl version and your kubeconfig is pointed at a cluster that has Istio installed, it queries that cluster to see what version you have installed. Because we don't have Istio installed yet, it says "no Istio pods in istio-system namespace." The 1.8.3 at the bottom is the version of the CLI. Like we talked about, there are three ways of installing Istio: the istioctl CLI, the Istio operator, and Helm. With the current support, I would say that's a good ordered list, but in the future I think Helm is going to overtake the Istio operator.

Great, let's proceed; we're running a little short on time, going a little slower than I expected. We're going to create the istio-system namespace, and we're going to create an istiod service. This is a workaround for a bug that's been in Istio for a while: if you want to use Istio revisions but you're starting on a fresh cluster, you need the istiod service created for the gateways to communicate with. That's what this section is, applying the istiod service as the workaround. This is also documented on the Istio revision installation page.

And then finally, the Istio profile that we're going to install is minimal, meaning the only thing you'll get is istiod. That's what's defined here in this IstioOperator resource. In production, you should always have an IstioOperator file. Have multiple IstioOperator files: one for your istiod, and then one for each of the gateways you have defined. Take advantage of multiple IstioOperator files, save them in your source control, and then when you install Istio, you do istioctl install, point it at that IstioOperator file, and specify your revision label, in this case 1-8-3. Many users also do istioctl install and pass every single flag they want to override directly to the CLI, but that's more error-prone, so take advantage of actually storing your config in the IstioOperator resource. With the istioctl method, you're not applying the file to your Kubernetes cluster; you're just passing it to the CLI, and the CLI does all the magic and installs it.

Let's take a look at the pods we have set up. Now you can see the only thing that got installed is istiod, but it's actually named with the revision that you passed. So you can start to see how you could have multiple istiods running: they would all have a different name, and they would all have a different service. To make sure that istiod is working okay, you can exec into istiod itself. It has this pilot-discovery binary inside of it, and that binary exposes all kinds of sub-commands that you can use to figure out what services your istiod has discovered. This one is just calling the registry endpoint, so you're basically asking istiod's service registry to list all the services it knows about. The list will have both services that are part of the mesh and services that are not, because it knows about all the services by talking to the Kubernetes API.
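As a sketch, a minimal, revisioned IstioOperator file could look something like this; the resource name and file name are hypothetical, not the lab's exact values:

```yaml
# Hypothetical IstioOperator for a revisioned, istiod-only install
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiod-1-8-3
spec:
  profile: minimal   # install istiod only, no gateways
  revision: 1-8-3    # revision label used to name istiod and its service
```

You would then hand it to the CLI with something like `istioctl install -y -f istiod-operator.yaml`, rather than applying the file to the cluster with kubectl.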
The output is pretty verbose, but you can take a look at it in your own time. The next thing we'll do is add a sidecar to the httpbin service. You might be familiar with just labeling your workload namespace, restarting your workloads, and letting the webhook inject the sidecar for you automatically. But if you're taking a more methodical, step-by-step approach of injecting one pod at a time to make sure things are working properly, and you want to take advantage of revisions to point at the injector for version 1.8.3, then you can use this method: you use istioctl and specify the config maps you want to use. So I want to use the 1.8.3 mesh config map, and you can specify the injector config map you want to use. This allows you to be very specific about which sidecar you're getting. If you have 10 versions of Istio running in your cluster, you can say: I want to deploy this pod, and I want this pod to have the 1.8.3 sidecar.

OK, that's the end of that lab. So now we have the httpbin service with a sidecar. You do kubectl get pods and you can see that httpbin shows 2/2, the two meaning one is your httpbin container and the second is the istio-proxy container. All good, moving right along.

The third portion we want to cover is observability. Out of the box, Istio comes with... actually, how am I doing on time? Hey Will or Eric, do you know when part one ends? How much? 15, cool, yep, we'll be able to get through this. The observability lab focuses on not using the sample observability add-ons that come with Istio, the Prometheus and Kiali deployments. They used to ship with Istio itself, but now that's been broken apart in the documentation; there's a separate section for observability that lets you get up to speed quickly by basically using shortcuts to deploy Prometheus and apply the config maps and everything. But in a more production environment, you might have a full-fledged Prometheus installed using the Prometheus operator, and then use resources like PodMonitors and ServiceMonitors to scrape metrics. So that's what we're going to do here: set up a more enterprise-grade Prometheus. We're going to use the Prometheus operator and the Kiali operator, and make sure they're secured and configured properly.

So, moving on to lab three. First we're going to install Prometheus. Step one, create the Prometheus namespace. Then download the Prometheus Helm charts, and then use Helm to install Prometheus. When you're copying and pasting these commands, remember you have to hit Enter after you paste. Sometimes, if it's multiple lines, everything except the last one gets entered, so just be careful with that. So now we're running the Helm install; we've configured it so that it works well with Istio. Once that's done, do a kubectl get pods in the Prometheus namespace, and you can see the various components starting up. Run that get pods a couple of times until everything is running. Then we should be able to port-forward to the Prometheus service. The way the port forwarding works in these Instruqt environments is that we already have tabs configured to listen on that port; we have the localhost 9090 port.
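The Prometheus portion boils down to something like the following; the chart, release name, namespace, and service name here are assumptions based on the community kube-prometheus-stack chart, not necessarily the lab's exact values:

```shell
# Install a Prometheus-operator-based stack via Helm (names are illustrative)
kubectl create namespace prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prom prometheus-community/kube-prometheus-stack -n prometheus

# Watch the components come up, then port-forward in the second terminal tab
kubectl get pods -n prometheus
kubectl port-forward -n prometheus svc/prom-kube-prometheus-stack-prometheus 9090
```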
So if you click on this Prometheus tab, you can see that it now loads. If you had clicked it before doing the port forward, you would have had a blank screen. Actually, one step I missed: there are multiple terminals, terminal tab one and terminal tab two. We want you to do the port forwarding in tab two so that you're not blocked and can continue on in tab one. If you did the port forward in terminal one, do a Ctrl-C to break out of it, go to terminal two, and run the port forward there. You should still be able to get to this Prometheus page. You can start typing expressions in here, but it doesn't have any stats yet; Prometheus is not configured to scrape any control plane or data plane pods, so executing queries won't give you anything.

Next, go back to terminal two, exit out of the port forward, and port-forward to the Grafana service instead. If you go to Grafana, you should see this page load. The credentials are in the chat. Sorry, the credentials are in the workshop. You should see something like this. If I go to Home in the dashboards, you can see I don't have any dashboards yet. So let's fix that; let's add the Istio dashboards to Grafana. The Istio sample Grafana deployment comes with Grafana as well as a bunch of pre-configured dashboards, so we can take those dashboards and apply them to our Grafana instance. We've already downloaded all of those dashboards to the local file system, so you should be able to just create a config map out of them by specifying all of the files. I don't think it matters which terminal you run these in. So we've created a config map using all of the dashboards on disk. Next, we label the config map so that Grafana will pick it up, and then do a port forward to Prometheus again. Sorry, port forward to Grafana again. Now if you go back to Grafana, there should be a refresh button on the top right; I'm going to hit that refresh just to make sure it reloads properly. And if I now go to Home, you can see I have all of my Istio dashboards loaded. If you click on, say, the Istio control plane dashboard, you can see the dashboard is configured, we just don't have any stats yet. And that's because we need to configure Prometheus scraping for the control plane as well as the data plane metrics.

So leave that port forward for Grafana running in terminal two, move over to terminal one, and start looking at the Prometheus ServiceMonitor resource so you can see how it's configured to scrape the control plane metrics. Take a look at how it determines which service it's trying to scrape and whether it applies any modifications; for this one, it looks like there are none. Then apply it. It says you should be able to go to the Grafana UI and start seeing data, but in my experience it usually takes a couple of minutes for the scraping to start picking up new metrics. While that's happening, let's just keep going down and apply the PodMonitor as well, which scrapes your workload metrics. Once you've applied both the ServiceMonitor and the PodMonitor YAMLs, go down and simulate some load by running this for loop. Ten times, it's going to exec into the sleep container and call the httpbin container.
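That step looks roughly like this; the YAML file names are placeholders for the lab's ServiceMonitor and PodMonitor, and the httpbin port is assumed:

```shell
# Apply the scrape configs, then generate some traffic so there's something to graph
kubectl apply -f service-monitor-control-plane.yaml   # placeholder file name
kubectl apply -f pod-monitor-data-plane.yaml          # placeholder file name

# Call httpbin through its sidecar ten times from the sleep pod
for i in $(seq 1 10); do
  kubectl exec deploy/sleep -c sleep -- curl -s http://httpbin:8000/headers > /dev/null
done
```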
Remember that your httpbin container is now part of the mesh, but sleep is not. Looks like my control plane dashboard is now picking up metrics; see, there are a few lines here and there. And if I go back here and then go to the Istio service dashboard, in a couple of seconds I should be able to see metrics for my httpbin app as well. Not yet. Like I said before, this usually takes a few minutes. We'll come back to this in a later section so that you can see better metrics, but you can see that the control plane metrics are picked up; the service metrics are taking a little while longer. So with that, hit Check at the bottom.

We're not quite done yet. Ah, Kiali, I forgot about installing Kiali. Grafana and Prometheus are good for seeing how every single service is doing: the health of every service, the number of 500s it's getting, the response times, et cetera. Kiali, on the other hand, is a service graph, so it's good for observing traffic as it flows through the system. It knows about all the various services and how they connect together, and it determines that from the Prometheus metrics. So let's go ahead and install Kiali. First, we'll create the Kiali operator namespace and then do a helm install. If you check the pods, wait for the Kiali operator pod to start up. And then, just like with Istio, where you define the config in the IstioOperator YAML, with Kiali you can define the configuration using this Kiali resource. So let's create that as well. If you list the pods in the istio-system namespace, in a couple of seconds we should see the Kiali pod come up as well. My Zoom stopped. Okay. Is Kiali running? Looks like Kiali is taking a little while to start up. But basically, same thing as Prometheus and Grafana: you install the components and then you port-forward to it. This is taking a little while, so what I'm going to do is just apply all of the Kiali commands so that we can move to the next section. We have a lot of things to cover, and I don't want to spend all of our time on observability.
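For reference, a Kiali resource in the spirit of what the lab applies might look like this; the auth strategy, Prometheus URL, and namespace scoping are illustrative assumptions, not the lab's exact configuration:

```yaml
# Hypothetical Kiali custom resource reconciled by the Kiali operator
apiVersion: kiali.io/v1alpha1
kind: Kiali
metadata:
  name: kiali
  namespace: istio-system
spec:
  istio_namespace: istio-system
  auth:
    strategy: anonymous      # the lab may use token or openid instead
  external_services:
    prometheus:
      # point Kiali at the operator-managed Prometheus installed earlier (URL assumed)
      url: http://prom-kube-prometheus-stack-prometheus.prometheus:9090
  deployment:
    accessible_namespaces:
    - "**"                   # let Kiali see all namespaces
```

Once the operator reconciles it, port-forwarding to the kiali service on port 20001 exposes the UI, the same pattern as Prometheus and Grafana.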