I will start with a quick introduction. I am Sunit, I work as an application developer at ThoughtWorks, and Girish is with me. I am Girish, I am also an application developer at ThoughtWorks. This is a somewhat advanced topic in the container world; most of you, I think, have some experience working with containers. So here is a quick overview of what we are talking about today. We are trying something different: we will start from spinning up the cluster, so it is like starting from zero, and we will get up and running within the session itself, which gives you a feel for how easy or difficult it is to get all of this working in your own environment. We will start by spinning up a cluster on Google Cloud. While the cluster comes up, we will go through why we need a service mesh and what a service mesh is. Then, since the cluster is fresh and new, we will set up Istio and everything inside the Kubernetes cluster. After that we will go through the Istio architecture and look at demos of the different solutions a service mesh gives us, with the remaining demos at the end. We have too many demos, so please bear with us if we go wrong somewhere; fingers crossed. Cool. To set up the Kubernetes cluster we went a little jazzy: we did not want to use the normal CLI, so we created a chatbot, with the help of which we will spin up a cluster now. So I will invoke this chatbot. OK Google, can you talk to Peter Griffin? "OK, let's get the test version of Peter Griffin. Hello master, what can I do for you today?" Can you create a Kubernetes cluster for me? "OK master, how many machines would you like to have in your cluster?"
I would like to have three machines in the cluster. "I have created a cluster demo-cluster with three nodes." Thank you. Who created you? "But you are the one who created me, master. Have a nice workshop, bye-bye." Cool, so let's refresh the GCP page and see if Peter is telling the truth. Yes, we have a demo cluster in the creation stage, with three nodes. Let's go back to the slides until it is up. So we got the cluster setup part done, which is easy: you have a visual interface, and the command line can even be automated with a chatbot. Now, once we have a Kubernetes cluster: how many of you are using a microservice architecture, or have worked on microservices? Fair; then I think you will relate well to the next few slides. How many challenges do we face? When we start the journey of implementing a microservice architecture, there are many different things we have to go through, many cross-cutting concerns we have to implement. The simplest one is service discovery. You can imagine how easy or complex that can be. There are many ways to do it, and the most complex part is getting client-side discovery done; otherwise, every time one service calls another, you go via some gateway, which adds a hop and increases latency. You have to take care of that, or a call that takes milliseconds in a monolith will go into seconds here, simply because we are calling so many services internally. Getting this right is very important. Then traffic management and throttling.
This is also a key requirement: one service bombarding another can bring that service down for everybody else. So you have to make sure you have throttling implemented, so that if one service goes bad, only that one culprit is affected. Then resiliency: circuit breakers, health checks, fault injection; yes, we have to do all of that. Going further, security is very important. Zero trust: I do not want one service to trust another service by default. There should be an authentication mechanism, and authorization has to be in place. Services should be talking over SSL, secure connections; I do not want to leave it open on plain HTTP. All of that needs to be taken care of. Monitoring, again: how much response time each service is producing, which gives me the ability, when there is a problem in the user experience, to go and find out where the actual bottleneck is. And testability: if I want to deploy another version of a service, it is important that I can test it before rolling it out to everyone. A/B testing, canary testing, all of that has to be done. And whenever I think, OK, I want to go with microservices, all these concerns wear me down. I always wonder whether I really want to do microservices, or whether I should stick to my old ways of getting stuff done. These are the challenges; how many of you resonate with them? It is important to get this right. And it is a scary thing, because once you have many services, you have a large number of inter-service connections. The first time I did this, after a while I had no clue which service was talking to which service, and we started creating diagrams.
Let us have a repository somewhere to make sure we know who is talking to which service. Again, zero trust and authentication all have to be in place. Multiple points of failure, again: with a standard application, the application fails and I know it failed; here I have to monitor hundreds of microservices, and one going down can give users a bad experience. So all of this is critical when implementing microservices. But we have tools, many of them from Netflix obviously; we have a tool to take care of each problem. How many of you have used most of these tools? Right. Now, taking one example from the list of problems: have you noticed this kind of problem with timeouts? One service calls another, which calls another, and it just goes on. Where do we keep all this configuration? We have to configure it. Where? In a web.config, which is inside your application. So each application knows about this; one person changes it in one application, and it can have a ripple effect across all the services. I am just building the problem statement in this first part: all my configs live in the application, and that is where the challenge is. For each problem I have a tool to use; for 10 types of problems I will be using 10 different tools. Every time, I have to research which tool is right, and whether there is a newer, better tool available. Uber is also publishing tools nowadays, like Netflix. Which one do I choose? I have to make a choice, then implement it in each application, and make sure each of the 30 services is using those tools, with the right versions, which is also important.
If they are not backward compatible, it is going to bomb, and the overall objective will not be achieved. Upgrading these tools, or swapping one tool for a newer, better one, is again challenging. With 30 services, there is one service nobody has touched for the last six months because it is stable and running; changing it becomes a project in itself. "Oh, we have to move from Netflix Zipkin to Uber Jaeger; let's treat it as a project." That is the kind of problem we start facing once we are in this world. The solution is a service mesh, which can help us connect, secure and observe our services. On the application code side we have our microservice. The standard approach is that alongside the business logic we also have all our cross-cutting requirements: as we discussed, the configs for timeouts and the libraries all reside in the same application code base, so in a way I am polluting my code base with all this knowledge. What would be better is a pod, again a Kubernetes-world definition, containing a microservice and a proxy. The proxy takes over all the cross-cutting concerns I would otherwise have to implement in the application. Say I have two pods running, each with a microservice and a sidecar proxy. To talk to the outside, the service goes through the proxy, and incoming traffic also arrives via the proxy. So all incoming and outgoing traffic flows through the sidecar proxy, and that proxy is going to do the magic for us. That, in a nutshell, is what a service mesh is all about. How many of you have used AOP in the Java world? It is like the AOP of the Java world: alongside my service there is a proxy taking care of all these communication challenges. So with this proxy, what can I do?
I can do traffic control, bring in resiliency and retries, get monitoring, observability and telemetry information, and also bring in security. Service-to-service communication, sidecar proxy to sidecar proxy, can be a secure connection, because that is outside my pod; my actual service talks to my proxy internally. Is this clear? This is the main concept of what a service mesh is all about. On the ThoughtWorks Technology Radar, this has been in Assess since November 2018. Now let's jump into a bit of history. Istio is an open source implementation of a service mesh, created by Google, Lyft and IBM. It allows you to connect and secure your microservices: you can do load balancing, observability, telemetry, authentication and authorization without changing a single piece of code. That is where it is so powerful. Before we deep dive into Istio, let's set it up in our newly created cluster. Istio is available as a Helm chart; Helm is a package manager for Kubernetes, so it is as simple as doing helm install istio. There are a couple of steps I need to do first. I need to set up my Kubernetes CLI so that I can connect to the newly created cluster. For that I have a script which I can just invoke now: it takes my Google account, gets the configuration, puts it in my kubeconfig, and my kubectl should then be connected to the new cluster in GCP. If I now do kubectl get nodes, I should see three nodes. To set up Istio, first we initialize Helm in the cluster, and then we can just say helm install istio. Let's do this. It is going to take some time, so until then let's go back to the slides and talk a bit more about Istio's architecture. Istio's architecture is very similar to how Kubernetes is designed: there is a data plane and a control plane. The data plane basically consists of Envoy proxies.
The Envoy proxies, as Sunit mentioned, live as sidecars next to your container: inside a Kubernetes pod you have an Envoy proxy and your application sitting together, and all service-to-service communication happens via this proxy. Envoy was written by Lyft, a US-based company. It is a very lightweight proxy written in C++, just about 8 MB in size. The control plane consists of three components. The first is Pilot. The role of Pilot is to dynamically configure your Envoys with the rules you provide: if I want to do any traffic-based routing, which we will see, we send those rules to Pilot, and Pilot configures the Envoy proxies at runtime, with no restarts required. The next component is Citadel. Citadel manages all the certificates in the cluster, such as TLS certificates, and also keeps rotating them periodically. The third and important component is Mixer. Mixer talks to Envoy and collects all the telemetry data flowing inside the cluster, and it also makes sure policy checks are enforced. The communication between one Envoy and another can be with or without TLS. For demo purposes we have the Bookinfo application, and Sunit will talk about it. Bookinfo is an application provided by the Istio folks for demoing these capabilities. It is a simple microservice architecture: there are multiple services and one UI application running. The main product page makes different calls: one goes to the product details service, another goes to reviews, and reviews in turn calls ratings. On the reviews side we have three versions deployed: v1, v2 and v3. v1 is simple, without ratings; v2 shows ratings as black stars; and v3 shows ratings in red.
These are the three versions available, and v2 and v3 talk to the ratings service to get the actual rating data. So that is how the setup is done. With Istio, every pod runs the service alongside a sidecar proxy, and there is an ingress gateway for incoming requests. To install this Bookinfo application, we already have the Kubernetes YAML files ready: four files for details, product page, ratings and reviews. Each creates a deployment object and a corresponding service object. So let's install the Bookinfo application; Istio is set up. I am going to say kubectl apply -f on the whole bookinfo folder, and it will create the service objects and deployment objects for the four microservices. If I now say kubectl get pods, I see details, product page, ratings, and three pods for reviews, because we have three versions. And as you can see here, Istio has injected a sidecar container automatically, so two containers are running in each pod. If I refresh and do kubectl get pods again, each application now has both containers up and running. Now, Istio requires that all requests come in through its ingress gateway; we just need to configure the gateway. People who have worked with HAProxy or nginx will relate to this. This is how you configure a gateway: you say this gateway will listen for these hosts on port 80, like the server blocks we put in an nginx config file. Then we do some path-based routing: we say that any request coming for bookinfo.demo.com, if it matches these URLs, should be directed to the product page host on port 9080. Now this product page host is nothing but a Kubernetes service.
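The gateway and routing just described can be sketched roughly as follows. This is a sketch against Istio's v1alpha3 networking API; names such as bookinfo-gateway and the exact URL matches are illustrative, not necessarily the speakers' actual files.

```yaml
# Sketch of the ingress configuration described above (names are illustrative).
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  selector:
    istio: ingressgateway   # bind to Istio's default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "bookinfo.demo.com"   # like a server block in an nginx config
---
# Path-based routing: send matching URLs to the product page service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - "bookinfo.demo.com"
  gateways:
  - bookinfo-gateway
  http:
  - match:
    - uri:
        exact: /productpage
    - uri:
        prefix: /static
    route:
    - destination:
        host: productpage   # the Kubernetes service name
        port:
          number: 9080
```

Both objects would be applied with kubectl apply -f, after which requests to bookinfo.demo.com/productpage are routed by the ingress gateway to the productpage service.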
We already created a Kubernetes service with the name product page, and that service is registered as a DNS name in the Kubernetes cluster. So we are basically saying: if a request comes in for bookinfo.demo.com/productpage, go to the product page service. Let's install this gateway. Now if I go to my browser and open bookinfo.demo.com/productpage, we have our application up and running. If I keep refreshing, you will notice the ratings changing; look at this part: no rating, black stars, red stars, the three versions. It is going round robin, the default way, to each of the services, with this simple configuration setup. So by default Istio load balances in round robin fashion across v1, v2 and v3. Now let's imagine we want to control this traffic. Say the ratings feature is under development and not ready for production use: we do not want any request going to it. Basically, every user coming to my application should see only reviews version v1; no request should go to reviews v2 or v3. How can I do that? I divide my traffic into three subsets. I say the traffic for reviews consists of three subsets, v1, v2 and v3: the v1 subset consists of the pods that carry the label version v1, the v2 subset of the pods with label version v2, and similarly the v3 subset of the pods with version v3. Then I say: whenever a request comes for reviews, route it to reviews, but only to subset v1. I do not change any application code, and I do not uninstall reviews v2 and v3; they keep running. I just apply this traffic route so that all traffic goes to reviews v1 only. Let's see this. Now if I refresh the page, I do not see any rating.
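The subsets and the v1-only route described above can be sketched like this, again against the v1alpha3 API; the version labels are the ones the Bookinfo deployments carry.

```yaml
# Define subsets of the reviews service based on pod labels.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews          # the Kubernetes service name
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
---
# Route every request for reviews to subset v1 only.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
```

With this applied, the v2 and v3 pods keep running but receive no traffic; removing the VirtualService restores the default round robin behaviour.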
And no matter how many times I refresh, the request goes to v1 every time. That is where Istio comes into the picture: you can control the traffic and do any routing you want. Now let's imagine a second scenario. The ratings are developed, but we do not want to roll them out to everyone; we want to do a canary release. We have a special user called Peter. I have already signed in as Peter, so I will sign out. When Peter signs in, he should see the ratings, and nobody else should: a canary release. We can achieve this with Istio using header-based content routing. We say: if the request is for reviews and the header identifies Peter, route it to the v2 subset; otherwise route it to the v1 subset. I do not change any application code; I just configure my Envoys to direct traffic to v2 when the header contains Peter. Similarly, you can imagine use cases based on user agents: if the request comes from an iPhone, an Android device, Google Chrome or Mozilla Firefox, you can direct traffic accordingly. So let's apply this. As a normal user, I still do not see ratings. When I sign in as Peter, I see the ratings in black; I refresh the page and still see them. So I have canary-released this feature to Peter. Similarly, we can do weight-based routing. If I now want to roll the ratings out gradually, I can say 80 percent of requests should still go to v1 and 20 percent to v2, so I can test how my ratings service behaves: route to v1 with weight 80 and to v2 with weight 20. In this way you can configure these traffic routes. Finally, say our ratings service is ready for production and we want to roll it out to everyone; I just need to configure my route accordingly.
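Sketches of the two routes just described, under the same v1alpha3 API. The end-user header name is an assumption about how the signed-in username reaches the mesh; the actual header in the demo may differ.

```yaml
# Canary by header: Peter goes to v2, everyone else to v1.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:          # assumed header carrying the signed-in username
          exact: peter
    route:
    - destination:
        host: reviews
        subset: v2
  - route:                 # default rule for all other users
    - destination:
        host: reviews
        subset: v1
---
# Weight-based rollout: 80 percent to v1, 20 percent to v2.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 80
    - destination:
        host: reviews
        subset: v2
      weight: 20
```

Only one VirtualService per host is applied at a time; the weights can then be shifted gradually until v2 takes 100 percent of the traffic.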
So I say: whenever a request comes to reviews, route it to v2, with no header clause or anything, and I just apply it. Now if I refresh the page as a normal user, the normal user sees ratings. Next, let's imagine I want to test the fault tolerance of my application: how resilient my microservices are, and how my UI behaves if one of the microservices is down. For that as well, Istio provides traffic rules. You can say: if the request is for ratings and Peter is the user, abort that request with a 500 status, but for every other user proceed normally. So you can do this kind of testing in production, with production test users. You can do regex matching here, and you can also inject delays to test how your timeouts behave. Here we are simply aborting ratings requests for Peter with a 500 error, so let's apply this. I think fault injection is very powerful and very helpful for testing, specifically worst-case scenarios. Right now, for the normal user, everything works fine. Now Peter signs in: "The ratings service is currently unavailable." The UI has handled this scenario gracefully: I can see that my application behaves correctly when there is a fault in the ratings service, and displays the right error. You can handle it however you want, and now you can actually test it; otherwise, bringing down a service in production to test this is very difficult. Yes, round robin is happening there. If you need more pods for v1, you can have automated scale-up and scale-down per version; you have to use the Deployment for that purpose. Doing that without this is not really possible, to be honest; the workarounds you would need will not give you this kind of flexibility without downtime.
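Before moving on, the fault-injection rule demonstrated above can be sketched like this (v1alpha3 API; the end-user header name is again an assumption about how the username reaches the ratings route).

```yaml
# Abort ratings requests with HTTP 500, but only for the test user Peter.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:          # assumed header carrying the signed-in username
          exact: peter
    fault:
      abort:
        percentage:
          value: 100       # fail every matching request
        httpStatus: 500
    route:
    - destination:
        host: ratings
        subset: v1
  - route:                 # everyone else proceeds normally
    - destination:
        host: ratings
        subset: v1
```

A fault.delay block with a fixedDelay can be used the same way to inject latency instead of aborts, which is how the timeout testing mentioned above would be exercised.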
Without this, you would need a lot of workarounds; a canary release is still possible, but it is difficult. I think we will continue and take questions at the end; we still have a lot to cover and show you, and there is really good stuff there. So we looked at the demos: the service discovery that is happening, A/B testing, canary releases, and fault injection as well. These are just the tip of the iceberg; there are many, many more use cases you can come up with. The most important part Girish showed is header-based routing and header-based fault injection: if you design your architecture so that the information you want to route on travels in the HTTP headers, your job is done. So let's look at how it works internally, at a very high level, a hundred-foot view, without going down to the implementation. There is Istio plus Kubernetes on one side, and then the services. As soon as your services are up, they register themselves with the Kubernetes masters and become available as endpoints of a service. In a similar way Istio is there, and all this information is fed to the Envoy proxies, similar to the kube-proxy running on your actual nodes. Now each Envoy proxy has knowledge of all the services and endpoints, so it can achieve client-side routing. And when you apply traffic rules, like the fault injection Girish was doing, as soon as the rules are applied they are pushed to the Envoy proxies as well. That is how the control plane comes into the picture in Istio, pushing all the details to the proxies. Now, for an actual request, if service A wants to talk to service B, all outgoing traffic goes through Envoy.
In some cases Envoy needs to talk to Istio constantly for telemetry data, for example when you have percentage-based routing, because another pod somewhere else could also be making these calls, and all this telemetry data has to be centralized somewhere. That is where Mixer comes into the picture, and that is also how percentage-based routing is achieved. User-based or, specifically, header-based routing is a very classical problem: a lot of the time we want different services in place for Android and for iPhone. This is a classical use case I have used in multiple projects, where you just look at the user agent in the header and send the traffic to different implementations, all without polluting the code anywhere. So that is how you do the configuration in Istio. Now let's look at some telemetry. As we said, Mixer collects all the telemetry data: any request flowing through Envoy is registered as a metric in Mixer. When you set up Istio, it installs Prometheus, Grafana and Jaeger for you by default. So let's see what metrics we already have in the cluster. Here you will only have the data of the last 10 minutes, because our cluster was created just now; we have a backup cluster running, so I will show the metrics there. These dashboards in Grafana come out of the box; we have not installed anything ourselves. If I look at the data for the last three hours, where I made some requests: yes, monitoring is done. All the telemetry data is collected and sent to the Prometheus and Grafana setup for monitoring, and to Jaeger for distributed tracing. Those who have used Zipkin will relate to the distributed tracing easily.
This is the beauty of the combination of Docker containers, Kubernetes, and, on top of that, the service mesh: all of them working together to give you this kind of output. All these dashboards come out of the box when you install Istio. You can see requests flowing in and out, and the response time per response code. The most important tool is Jaeger, which lets you do distributed tracing of requests. Say I select the product page and find traces: I can see which services each request went through. Take one request: first it came through the ingress gateway, then it went to the product page; the product page called details and also made a request to reviews, and reviews made a request to ratings. I can also see that the details service took 8 milliseconds, the ratings service took 1.48 milliseconds, and the whole request took 66 milliseconds overall. So you can trace each request through each microservice and see where it is failing, where it is lagging, where the latency is. See, this one was the fault injection we added, and this is running on the cluster we set up just now; there is nothing backed up there. Now that all this is in place, the question in everybody's mind is: what about performance, what about scaling? Some people say this is more or less proven; some say no, it is very new to the game. So here are some observations that helped convince us it will not be a problem. First is the size of the Envoy proxy, or whichever proxy you choose, since there are many more implementations available.
Nginx has its own proxy, Linkerd has one, and they are all roughly in the range of 10 MB in size, with the teams working hard to shave off every MB; I have heard there is one proxy that is under 5 MB as well. Remember that this sidecar proxy is injected into every pod, so you have to think about it: if it were 200 MB, it would multiply across all the pods. So it matters, and they are working hard on it; at under 10 MB we are covered in a way, and they keep reducing it further. On performance: since the proxy runs on localhost itself, not as a hop outside, it adds some overhead, but less than single-digit milliseconds in what I have observed. Sometimes performance becomes heavy under load testing, and it is not about the proxy: the overhead we have observed is all about telemetry, because it is collecting so much data. By default it collects everything, and we have seen that this creates a bit of overhead. Header-based routing works fine, because it is just the same request and nothing extra is needed. But with percentage-based routing, the proxies have to keep collecting that data every time, and there could be tens of nodes running; all of them have to collect it, not just the one node making a call to another service. That is where it gets a little tricky and the performance overhead goes up. Also, you have seen that the architecture of a service mesh is similar to Kubernetes: a control plane, and a data plane, which is like your nodes. And in Kubernetes you know that even if the master goes down, my application keeps running.
Same architecture, same principles: if my control plane goes down, I cannot deploy new rules and I might not be able to collect telemetry information, but my system can still continue to work. So scalability and resiliency are all available in the architecture. When I was reading more about this, there is one specific proxy, Cilium, which claims lower overhead than the Envoy proxy; I have to read more and understand why, but there is a lot going on in this space. That is all we had. Any questions? We will make our code available on GitHub, under vgirish, in the Agile India project; you can take a photograph of this so it is easy to remember if you want all the code. We will make it available today, or by tomorrow morning you will have it. We are done, and we have kept 5-10 minutes for questions; we are good on time. Kong on AWS is a hop; it sits separately, outside. When you have incoming traffic from the public, Kong is the right choice there. But this is inter-service communication: with 30 services, you will not go to Kong every time one service calls another, and that is what I was talking about; client-side service discovery is very important, and I do not want a central API gateway, so that does not answer your question. That is a different use case: bringing resiliency for a service. If you look at the reviews service, we have three versions: v1, v2, v3. In v1, if I deploy 10 pods and one goes down, Kubernetes brings it back up, not Istio. But Kubernetes does not help you do rule-based routing; Istio does. And Kubernetes does not let you do fault injection; Istio does.
Actually it is not even Istio; it is the proxies that hold the actual information about what to do and how to make all the calls internally. The telemetry data does take a good amount of storage, but you can reduce what telemetry you collect, and that is what I was talking about: Istio itself by default does not take too much in resources. Yes, you have multiple options in Prometheus; retention is a Prometheus feature, not an Istio feature. Istio just sends the data, and Prometheus is a tool where you can reduce granularity as days pass: you can say that up to 7 days you want fully granular data, from 7 to 30 days a data point every 15 minutes, and from 30 days to one year one every hour. Basically, Istio exposes /metrics endpoints and Prometheus scrapes those endpoints, so you can use another tool with this as well: if you are comfortable with some other monitoring tool instead of Prometheus, it can be plugged in. Kibana is a visualization tool; you would have to use some tool to ship the data to Elasticsearch. Yes, this architecture is all pluggable; you can pull out components and use other components. The proxy supports HTTP, HTTPS, gRPC and TCP. All of that is possible through the proxy; however, some features will be available and some will not. The feature we showed is based on HTTP headers, which is specific to HTTP, so it works when HTTP services talk to each other; it might not work for TCP, but for TCP you have other ways to inject faults and config changes. I have seen that it supports Consul and Kubernetes; I am not sure about Docker Swarm, they might be working on that, but Consul and Kubernetes I am sure about.
So Istio right now works seamlessly mostly in the Kubernetes world. You have seen that we started a cluster and had everything up and running, all the way to distributed tracing, in that same newly set-up cluster. And that is the beauty of the whole Kubernetes world, not just Istio: Kubernetes, Helm charts, and the many tools that work together. Cool. Thank you.