Hi, welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm your host today. My name is Whitney Lee, and I'm a CNCF ambassador and a developer advocate at VMware Tanzu. Every week we bring a new presenter to showcase how to work with cloud native technologies. We'll build things, we'll break things, and we'll answer your questions. This week we have Viktor Gamov here with us to deliver a presentation titled "Streamline Service Mesh Observability with Kuma and OpenTelemetry." That's a bit of a mouthful, right? Yeah, yeah. And I'm not even done; I have some more to say. I have to do this disclaimer, which is just to remind everyone that this is an official live stream of the CNCF, and as such it's subject to the CNCF code of conduct. So please don't add anything to the chat that would be in violation of that code of conduct, which basically means please be respectful of each other, be respectful of Viktor, be respectful of me, and we'll do the same for you. Friends who are joining us live, if you have any questions, please do drop them into the chat. We're hoping this feels more like a discussion than just a simple presentation. And with that, I'm going to hand it over to Viktor to kick off today's presentation. Thanks so much, Whitney. I wanted to say that I'm a huge fan of the things you do, not only on this channel but in general on the internet and YouTube, a lot of the lightboarding stuff. Folks, if you haven't seen the things that Whitney does with the lightboards, go check them out. Wow, thank you. Yes. And the second thing is that I'm a long-time listener, first-time caller, basically, because we did a few of these in the past, but they weren't live. We would hand over a recording and the CNCF would play the video. So it's great to have a live conversation.
I was wondering if we can test the chat with the folks watching us live: write down where you're coming from. What's the geography of our audience? I'm coming to you today from super sunny New Jersey. We're expecting a very scorching sun today, which is why I'm trying to avoid going outside this afternoon. Want to spend the time with us instead? Yeah, let's stay in our dark rooms. I'm in Austin, Texas, so it's also swelteringly hot outside. Yeah, unbearable. We have Gorov saying hello in the chat, which I love. I marvel all the time (I'm in my 40s, so I remember when there was no internet at all) that we can have a real-time conversation with people from across the world. I think it's just the coolest thing. I'll never get tired of it. Yeah, that's how we roll, for better or for worse. And let's get to it. So at least there are three of us here: we have me, we have you, and we have Gorov, so it's a good audience to start. Excellent. As Whitney pointed out earlier, we're going to talk about service mesh observability. I'm going to talk a little bit about different CNCF projects. I'll talk about Kuma, which is, I think it's not incubating, it's still a sandbox project. And obviously OpenTelemetry is also a CNCF project, plus many other projects involved in the integration. This is the kind of talk where I'd like to spend a little time on slides and talking, so don't hesitate to interrupt and ask questions. I work as a developer advocate, and for me, talking to developers and making sure I'm unblocking anything that stands in their way of building awesome apps in a cloud native way, that's my goal, and that will be my achievement today. Richmond, Virginia, very nice. Amitash, welcome. Kumar says hi from LinkedIn. Great.
So feel free to drop your questions, and Whitney, if you also see some interesting things, don't hesitate to interrupt, or ask a question yourself. I like having these conversations; it's much better than doing this alone. I agree. And we already have a good question from chat, which is: how is this different from Istio? Oh, that's a very good question. A very, very good question. We're going to talk about that as well, so stick around. I love an audience of straight shooters: should we go directly into the business? What's up with that? Anyway, essentially we're going to be talking about observability, and the most important question people try to figure out: hey, why is it so slow? Why is X slow? Whether I'm talking about Kuma or some other technology, people tend to care about two things: why things are slow, and why things are so expensive. So when you bring those categories into the conversation and give examples from them, people react the best, right? When you give an example with a credit card or financial stuff, people pay attention, or when you ask why things are slow. And we're going to try not to be Charlie Day's character from It's Always Sunny in Philadelphia, standing in front of the conspiracy board trying to track down the particular person who handles the mail. We'll try to investigate some of the problems that might happen in our microservices environment, and things like that. Once again, my name is Viktor Gamov. I work as a principal developer advocate at Kong, and at Kong we build the tools for what we call cloud native connectivity. That can include multiple things, but essentially APIs are at the core of any type of cloud native connectivity.
So we build tools that allow you as a developer to build your APIs, deploy your APIs, govern your APIs, and all those kinds of things. Kuma was one of the projects we started at Kong. A couple of years back we donated it to the CNCF, and we continue to develop this service mesh in the open. We also use it internally to build our own SaaS offering for service mesh. You can follow Kuma on Twitter, a very active account, and we have a very active Slack community as well. A brief agenda for today's presentation: in the first part I'll answer what Kuma is, and probably how it's different from Istio, one of my favorite questions to answer. Then: what is OpenTelemetry, and what are the benefits? And hopefully, as Whitney mentioned at the very beginning, we'll break some things. We'll build things, we'll break some things, and maybe some things will work. We'll see; it's going to be exciting. So first of all, what is Kuma? Essentially, when we talk about service meshes, people need to think about two components: one big component and many smaller components. A service mesh includes a control plane, which is a service responsible for storing the configuration of your microservices, traffic, and things like that, and also for storing the policies that you want to enforce inside your service mesh. One of the roles of the control plane is to manage and monitor data planes. In the current iteration of service mesh history, which I would call the second generation of service mesh, the data plane is a separate process that runs next to your microservice. It proxies all the traffic, and communication happens through this proxy. It acts kind of like a reverse proxy for your services, for internal and external traffic.
The control plane is responsible for configuring this, sending configuration, and making sure that the data planes have up-to-date information about the topology of the services. And the beauty of this is that your applications, your services (this blue box that I have here), don't necessarily need to know that they're running in an environment like a service mesh. Compare this to the past, what we might call service mesh generation one, when your application actually needed to include libraries that implemented this data plane proxy functionality, maybe collected metrics, and so on. Now the data plane is a separate process: the lifecycle of your application and the lifecycle of your data plane are not connected. You can upgrade your applications without updating data planes, and the control plane is responsible for making sure that the data planes are up and running and have all the required information to pass traffic from one service to another. So, back to the question of how this is different from Istio. Istio is another service mesh. Almost a year ago, Istio joined the CNCF as another service mesh project, which brings the number of service mesh projects inside the CNCF to, I guess, seven or eight or something like that. Istio popularized the concept, but obviously Istio was not the first to implement it. Kuma includes very similar components to those Istio includes. Kuma relies on another CNCF project called Envoy. Envoy is a teeny-tiny, super-fast proxy server that runs next to your application. This is going to be the data plane, and Istio uses the same thing. Personally, I'm not a huge fan of doing checkbox comparisons and things like that.
If you Google, you can find a very interesting Google spreadsheet where the different service meshes are compared, but there are three things that, when we designed and built Kuma at the very beginning, we wanted to put front and center. First is developer productivity. We didn't want to overwhelm a developer with the number of possible CRDs you need to configure in order to run this. Deployment of the control plane is literally one deployment, and when we run this in Kubernetes, we can enable a sidecar injection label on a namespace, and the control plane will be responsible for injecting the sidecar into any application. So developer productivity is the key here. The second thing, when we planned this, was that we really wanted to think about how people run their workloads. In many cases, big organizations are not running one single workload across, I don't know, one AWS region or one Google Cloud region. They're running across multiple regions, or maybe even across clouds. So Kuma has this concept of multi-zone, multi-mesh deployment that allows you to span your service mesh across multiple different, even heterogeneous, environments. We'll talk about what that means in Kuma terms in a moment; it gives you a unified platform for your applications to run on, regardless of where they're physically deployed. And the third thing: even though we talk about Kubernetes, all this Kubernetes life and deploying our pods every minute of the day, many engineers and SREs deploy applications not necessarily on Kubernetes. They deploy applications on VMs, or to some other deployment systems. So one of the goals was to create Kuma as what we call a universal mesh, meaning that it doesn't have a very strict dependency on Kubernetes. You can deploy some number of data planes in Kubernetes and some number of data planes on VMs.
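As a rough sketch of what the sidecar injection part looks like in Kubernetes (the namespace name is illustrative; check the Kuma docs for the exact label your version expects), enabling injection is just a label on the namespace:

```yaml
# Label a namespace so the Kuma control plane injects the
# Envoy sidecar data plane into every pod deployed there.
apiVersion: v1
kind: Namespace
metadata:
  name: mesh-demo                       # illustrative namespace name
  labels:
    kuma.io/sidecar-injection: enabled  # Kuma injection label
```

With this in place, any Deployment applied into `mesh-demo` gets its data plane proxy automatically, which is what Viktor means by not overwhelming developers with configuration.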
And after that, they will create this unified environment across a heterogeneous network. Hopefully that was an answer. I kind of have a question; I'm going to restate what I heard. So, three differences set it apart from Istio. One is that it's simpler to use, a better developer experience specifically, you said. The second is that you can span geographical regions with it pretty well. And the third is that it's not Kubernetes-specific. And the question I have is: is that what sets it apart from Istio specifically, or does that also set it apart from other service meshes in the space? Yeah, very good question. The similarity to systems like Istio comes down to the architecture. The majority of service meshes right now use the sidecar model. But, surprise, surprise, shocking: not all service meshes use Envoy as the data plane. For example, another CNCF project is Linkerd. Linkerd is also a service mesh and also provides a control plane, but they use something different: they chose to develop a purpose-built proxy for their data planes. At least this is the situation we see: either people go with Istio because they were introduced to it at an early stage and they've learned how to love it and hate it. Sometimes people love it; sometimes people find it challenging in situations where they need to scale outside Kubernetes, because Istio is very tightly connected to Kubernetes. Or you want to do the things I mentioned around multi-zone, multi-cluster deployment and all those kinds of things. But most importantly, don't listen to me, and don't listen to other vendors. You have to decide what you're comfortable with when you're implementing a service mesh in your environment, and what kind of problem it solves for you. There will be plenty of presentations, and people will get you excited about this.
And my goal is not to tell you, hey, this is the best one. My goal is to show you what's available and help you decide if there's something you want to use in your organization, or maybe you want to go with other things. Historically, it's very difficult to beat historical context, when someone comes onto the team and says, hey, yeah, we used this in the past, we'll continue to use it. I'm here just to show the options. That's why I said I'm not a huge fan of doing the feature comparison: this one has this feature, and this one has that feature. There are plenty of those materials out there. I'll try to show what's possible, and our audience can decide. I love it. We do have a question in the chat for you from Amatech, and it's: can it be deployed and used with ECS? Yes, it can be deployed and used with ECS. I believe we have documentation there. If you can't find it, drop me a DM on Twitter; my handle is conveniently placed at the bottom of my presentation. So in any case, if you want to ask, hey, Viktor, can you send me this link, I will do that for you. All right, we see Sanjeev joined us from Frankfurt. Very great, thanks so much for joining. Yes. And this is just basically the overall architecture of any kind of service mesh. As I said, Kuma is a CNCF incubating project... I'm sorry, sandbox. We really want to get to incubating closer to KubeCon, so we're working right now to make that official. But we do have a very, not ginormous, but I guess very fast-growing community of users, and we also provide commercial support, so there are people who build this stuff as a commercial thing. And in this presentation, I'm going to talk only about Kuma.
The mesh product we build on top of Kuma has some differentiators, but I'm not going to touch on those today. Everything is going to be as-is, and you can take this and run with it. And yes, Envoy. Since we're talking about observability, it's a great opportunity to talk a little bit about Envoy. When Envoy was created, the goals that Matt Klein and his team at Lyft came up with at an early stage were that this Envoy, fronting services as a proxy, needed to be dynamically configurable. We shouldn't have to send a person to go and configure our proxy server; there needs to be an API to configure it. And it needed to be observable all the way. First of all, every time you introduce an additional hop in your network, you want to make sure you're not introducing unnecessary latency, right? So they wanted to make sure that the proxy they were introducing would not introduce additional problems, and for that they needed observability along the way. Envoy provides all of these things, and these two properties helped people build other things on top of Envoy. What the control plane actually does is take the definition of whatever network policy and translate it into configuration that is shipped to Envoy, including the things I'll show in some of the examples: how we can manage traffic between the services in order to introduce some failures. I'll talk about why you'd want to introduce failures when we get there. And another thing is to observe services and collect metrics from all the nodes, which brings us to OpenTelemetry. OpenTelemetry is, first of all, a standard that came out of years of different industry leaders talking about the different pillars of observability. We know these are metrics, traces, and logs; those are considered the pillars of observability.
Metrics give you information about what is happening right now. Logs give you information about what has happened. And traces basically give you this trail of what happened across the system, because we're going to be interacting between multiple microservices, and you want to know what happened in system X when a failure happened in system Y. OpenTelemetry as an open source project includes specifications, a set of tools to collect data, and a set of libraries that you can embed in your applications to instrument them and send telemetry data to whatever system is able to consume it. So that's the premise. There are different groups of developers, I would say. One group prefers to have control over things, so they prefer to embed the OpenTelemetry libraries in their applications; in this case, they instrument their application by enabling these libraries. Another group of people wants to keep their microservices teeny-tiny. They don't want to include additional tools or libraries in their microservices, manage those things, and make sure the libraries have equivalent capabilities across different languages. Those people want infrastructure that can collect all the data for them. For the rest of this presentation, we're going to talk about that second approach. I'm not going to instrument my applications to support OpenTelemetry. I'll show you how we can enable it declaratively when the application runs inside a service mesh, and how we can start getting the benefits of collecting OpenTelemetry data. We get different benefits for the business and for developers. We have some features built into the gateway as well, but I want to focus on what the mesh itself does with OpenTelemetry and how this all fits together.
There are a couple of components that OpenTelemetry defines. The first one is the OpenTelemetry Collector, this thing that I have here in the center of my screen. The Collector is a small intermediate tool that we're going to be using. You can definitely use some of those OpenTelemetry backends without it, but the Collector does a few interesting things that you definitely want to check out. First of all, it has the ability to batch. With the number of metrics, traces, and logs your system produces, you don't want to overwhelm your backend. The Collector can perform batching, so we can optimize how the metrics are delivered without sacrificing much delay in receiving them. Also, not every system has native support for OpenTelemetry. That's why there's another component called a receiver, which can collect telemetry data from different systems, whether in the native OpenTelemetry format, or the Jaeger format, or the Zipkin format, or the Prometheus format, and the Collector will translate it into something the backend can understand, maybe even doing some processing and transformation along the way. So we have a receiver that gets the data from our systems, a processor that does some internal massaging of the data, and an exporter that holds the connection to our external system. In order to collect this data inside the service mesh, we need to define a policy that collects some of the information flowing between the different components and sends it to the OpenTelemetry Collector. So, is it time for me to switch from my presentation? Where should we start? We should start with a quick look at the application. Right now this application is deployed to my Kubernetes cluster, which runs somewhere in GCP.
And this is the public IP address; you can go and hit this IP address from anywhere in the world. In this case, if you go to this IP address /work, it's a simulator of my life. When I go to work, I do meetings. That's why, when I click here, go to work, I went to four meetings and it took me one second. Inside this, let me quickly show (hopefully my boss is not going to see this, but we're friends here, we're not going to report it): if I go to my meeting application, what do I do in meetings? Well, I spend my time very productively: I sleep for a quarter of a second. That's why my work, four meetings, equals one second. So, two microservices. Let me open this. Fairly straightforward deployments. The meeting app has a Deployment that includes this application, and it has a Service, so within the Kubernetes cluster it will be available through meeting.mesh4dev. The work application includes a few things here. It's also a Deployment, but it also needs to receive the meeting URL from somewhere, because one of the important steps when you're running in a microservices world is service discovery. Somehow the services need to be discovered, and the configuration needs to be provided to the application to receive this URL. In this particular case I'm using the Kubernetes default, even though there are ways to customize it. Specifically, when we need to deploy this in a multi-zone, multi-mesh environment, we really don't want to rely on namespaces or other identifiers that name our service in a not very consistent way. So, for example, the service mesh also provides a DNS service, with the .mesh suffix (you can customize it), and everywhere inside this mesh, this meeting application will be accessible through this mesh DNS name, or through the port directly.
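The wiring described above can be sketched roughly like this (the image, environment variable name, and port are illustrative, not taken from the actual demo manifests): the work service receives the meeting service's URL through configuration, resolved by cluster DNS or the Kuma mesh DNS.

```yaml
# Sketch of the "work" Deployment receiving the meeting
# service URL via an environment variable (service discovery).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: work
spec:
  replicas: 1
  selector:
    matchLabels:
      app: work
  template:
    metadata:
      labels:
        app: work
    spec:
      containers:
        - name: work
          image: example/work:latest     # illustrative image
          env:
            - name: MEETING_URL          # illustrative variable name
              # Kuma mesh DNS name; could also be the plain
              # Kubernetes name, e.g. meeting.default.svc
              value: http://meeting.mesh:8080
```

The point of the mesh DNS name is that it stays valid across zones and clusters, where a namespace-qualified Kubernetes name would not.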
I'm not going to spend much time on that, because it's slightly beyond the actual topic of this conversation, even though it's also related to traffic policies inside the service mesh. Kuma itself also includes a gateway component that allows me to expose the service to the outside world. The way it looks, it actually exposes my work service through this prefix. Inside Kuma, we use Envoy as the gateway: we provision another data plane, a special type of data plane that behaves as a gateway for our application. That's why, as I said, you can hit this URL right now from your computer. What could go wrong? You can do something like a curl, yes, I can do something like this, and this should work as well. Yeah, so it works everywhere in the world. Let's take a look at what this service mesh thing looks like. Kuma also comes with a very, very nice UI. The control plane exposes this UI, so I can get in and see what is going on in my mesh. If you have multiple different meshes deployed here, you'll see them, but we're interested in "default" for now. Here we can investigate the data plane proxies deployed next to our applications. I can see there are a few services: one for my work application, one for my meeting application, and this special type of service that was created by Kuma, which is the gateway for our application. Also inside my cluster, I'm running Prometheus and Grafana, just to see what is going on there. They also joined the mesh, so the mesh can collect some of their traffic too. Now, how can we configure the tracing? The services are registered, and we see the services are communicating. Now I need to find a way to collect the data flowing between those systems. In the mesh, we have this concept of policies.
And one of the policies, the one responsible for collecting traces, is called MeshTrace. MeshTrace, as you can see here, shows the specification of the policy. Let me show this in a bigger font on another screen. People who are watching this very attentively will notice, if you look carefully at this comparison, that the YAML here is not exactly like the YAML here. That's part of the universal mode: this is the YAML for configuring the mesh if you're running it standalone, but inside Kubernetes, it looks the way Kubernetes people expect it to look. In this particular case, we have an apiVersion and we have a kind, so it's going to be a CRD. If I just list the CRDs and grep for mesh, there are a bunch of different CRDs already deployed: mesh fault injections, gateway instances, mesh insights, proxies, a lot of cool things that can be configured here. Inside this policy, what we want is for all the trace traffic to be collected through a backend; we need to route all this trace information somewhere. Inside my Kubernetes cluster I already have a couple of things running. Specifically, there's my OpenTelemetry Collector, which is available within my Kubernetes cluster through my-opentelemetry-collector.default.svc.cluster.local. All the traces will go there. And I can tell Kuma to collect some additional information: I want to include an environment header, and I want to include version information, so when I deploy and redeploy these applications, I'll be able to trace down and see what is going on. Now, with this trace... where is it? I'm sorry, yep. With the OpenTelemetry Collector, I also need to configure the Collector itself somehow.
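A MeshTrace policy along the lines Viktor describes can be sketched like this on Kubernetes. This is an assumption-laden sketch, not the exact demo manifest: the policy name, tag names, and header are illustrative, and the field layout follows the Kuma MeshTrace schema, so verify it against the docs for your Kuma version.

```yaml
# Send all mesh traces to an OpenTelemetry Collector
# running inside the cluster, with a couple of extra tags.
apiVersion: kuma.io/v1alpha1
kind: MeshTrace
metadata:
  name: trace-to-otel            # illustrative policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Mesh                   # apply to every proxy in the mesh
  default:
    backends:
      - type: OpenTelemetry
        openTelemetry:
          # the Collector's in-cluster address (OTLP gRPC port)
          endpoint: my-opentelemetry-collector.default.svc.cluster.local:4317
    tags:
      - name: env                # illustrative literal tag
        literal: dev
      - name: version            # illustrative tag taken from a header
        header:
          name: x-app-version
```

Because the policy targets the whole mesh, neither the work nor the meeting service needs any code changes to start emitting spans.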
The way it works: the Helm chart that the OpenTelemetry Collector provides lets you configure a lot of things, multiple different environments, different integrations, and all that kind of thing. The Collector is very sophisticated. The Collector, the thing in the center that I'm talking about right now: I stripped its configuration down to something that can be easily digested for my use. We go with pipelines that define how the data flows. The pipelines include traces and logs. We receive these via OTLP, which is the OpenTelemetry protocol receiver; we do some processing, in this particular case batching with the default configuration; and we export the traces. Same thing for the logs. So inside this YAML, I'm looking at the receiver that configures the Collector; the Collector will be available on this port. This is the processor, nothing fancy. But the magic actually happens here, in the exporter: that's the backend that will collect all the data. The backend can be something like Jaeger, something like Datadog, something like Honeycomb. I just use Honeycomb because I found the integration the easiest to do. For me, I just need to put in the endpoint and the API key. Don't worry, this API key will disappear right after the stream; the reason I'm showing it is just so you see that everything is very explicit. Now, after that, once this data is produced by my services (I call one service, the gateway calls the work service, the work service calls our meeting service), all the trace information will be propagated to the OpenTelemetry Collector, and the Collector will push it into Honeycomb. So let's see how this works.
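A stripped-down Collector configuration of the kind Viktor describes might look like the sketch below. The Honeycomb endpoint and header are the standard way that vendor documents OTLP ingestion, but treat the exporter details as illustrative; the environment-variable substitution keeps the API key out of the file, unlike the on-screen demo.

```yaml
# Minimal OpenTelemetry Collector pipeline:
# receive OTLP, batch, export to a vendor backend.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317         # port the mesh proxies send to
processors:
  batch: {}                            # group telemetry before export
exporters:
  otlp:
    endpoint: api.honeycomb.io:443     # illustrative vendor backend
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}  # key via env var
  logging: {}                          # debug exporter, prints to stdout
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, logging]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, logging]
```

The `logging` exporter is the "backend that doesn't go anywhere" mentioned a moment later: it echoes everything to the Collector's log so you can confirm data is flowing before pointing at a real backend.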
For that, I will be using a very sophisticated benchmarking tool to generate some traffic. It's called "send request on interval": Insomnia, in this case, will be sending this request every second. And we'll start getting some interesting data here. Obviously the OpenTelemetry Collector has the default logging exporter; it's kind of like a backend that doesn't go anywhere, and all the headers, all the information coming from the systems, get spat out here in a log. So you can also see whether this is actually working. But we visual people like to see whether it actually works in the system. For the last ten minutes, let's see, I'm starting to get some traffic; some data is coming in, and I can see the traces coming in here. It can show me different things. Excuse me. I would love to see the actual spans. In this particular case, there's one that happened a second ago. Let's explore what kind of information we can see here. Now we see the request come into the router, which is our gateway; after that, the request goes into the service, through its data plane proxy. You have a question, Whitney? Can you make it bigger? Oh, of course. Thank you. Yeah. Sure. I'm not sure if the UI will fall apart. Yeah, that should be something like this. And as we can see, let me just... is it still okay? Yeah. Okay, I wanted to show these spans because there are a few things we can see. We called the work service once and got the response in, I don't know why it's two seconds; it's supposed to be one second total. Maybe some extra overhead that I need to investigate. But my meeting service was called one, two, three, four times, and each call took, including some of the network things that happened, precisely around 250 milliseconds of sleep. So we do have some information here. How can this information be useful?
Well, some of this information can be useful in situations like investigating problems. How do we get into problems in the world of service mesh? A service mesh can help you solve problems, but it can also introduce some. And this is actually a very, very cool thing that a service mesh can do. There's a concept called fault injection, which I guess comes from the chaos engineering methodology, right? You inject failures on purpose into your system, so you can see how your system will behave. When I talk to the SREs who support our Konnect cloud, they have this concept of a game day. What does that mean? One of the practices they like to do is to take a failure scenario and inject that failure declaratively into the system. As an architect or SRE of the system, you have some idea of what could happen, because you know the system; some people might not know what could happen because they just joined the team and don't have full context. So there's a scenario of how the system will behave: maybe the system will go into failure, it will start sending alerts, all those kinds of things. All the fun stuff that you expect at night when you're sleeping and something goes wrong with your live system. In order to sleep a little better, people like to be prepared. That's why we do these game days: we inject failures into the system and see if our alerts fire, if our dashboards detect it and we see the spikes, and if all the responsible parties were notified. It's a simulation of the actual thing that might happen in real life. Like a fire alarm. Exactly, yes, a fire drill. And so let's try to simulate this. Specifically, here I want to inject some failures between my two services.
So remember, my work service calls my meeting service four times. If I'm injecting a 50% failure rate, it should succeed roughly two times, right? I also really want to see whether my telemetry system will be able to report those problems. At least we will be able to observe them; I'm not going into the alerting mechanism, because that would differ depending on whatever system you use, but at least you have the data, so you'd be able to build on that. So I'll go ahead and just apply the MeshFaultInjection. The fault injection is created, and now immediately we start seeing one meeting, two meetings; something is going on. Between those two services there is now a fault injection working. Let's take a look. Yeah, go ahead. For those people watching live, if you want to play around and see that you're getting fewer meetings, do it on your own; you can follow along. Yeah, I'll leave this on the screen, I suppose. That would be even cooler, because we'll get real traffic. I'm playing with it over here. Yeah. So first of all, we reported some traces, and some of those traces we can go and query. We can say: show me my errors. There should be something like "all datasets" and a filter on errors, and we will be able to see our 500s. So we do have a few errors, true. A 200 response is not considered an error, but a 500 is, so we have some errors here. So let's investigate: let's open one of the spans we see here, from two minutes ago. Do we have anything more recent? A few seconds ago. So we see that we received some information from the service with the error. We're getting this information in the format of an access log; in my configuration here, I'm collecting not only traces but also access logs.
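A sketch of what the applied policy might look like, along the lines of Kuma's MeshFaultInjection policy; the names, namespace, and target service here are hypothetical stand-ins for the demo's actual services:

```yaml
# Sketch of a Kuma MeshFaultInjection policy: abort ~50% of HTTP
# requests to the destination service with a synthetic 500 response.
apiVersion: kuma.io/v1alpha1
kind: MeshFaultInjection
metadata:
  name: meeting-50-percent-abort   # hypothetical policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshService
    name: meeting-service          # hypothetical destination service
  from:
    - targetRef:
        kind: Mesh                 # inject for traffic from any caller
      default:
        http:
          - abort:
              httpStatus: 500
              percentage: "50"     # roughly half the requests fail
```

Deleting the policy (`kubectl delete meshfaultinjection …`) removes the injected failures again, which is what makes it convenient for game-day drills.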
And inside my mesh, I have a policy configured that also pushes data into the OpenTelemetry Collector: a policy called MeshAccessLog. So that's another thing you can use in your tool belt. Now let's go back and see whether I can find some recent traces that include the error. Let's take a look. What we can see here, first of all: it's not a precise science. When I said it's going to be 50%, it's actually sampling, and there's also a sliding window. We as people like to observe things in a deterministic world, but computers in real life don't behave deterministically. So in the sliding window we happen to be observing, in this particular case, just one of these requests failed. But if I open another span, we can find more, if we want to be picky about it. Essentially, what I'm trying to show you is that in real life, if you have a real 500 response, you'll probably look through a different lens: you'll be able to get the trace where the thing happened, and after that you can use tools like Loki, which captures all the logs, and correlate the trace ID with a particular line in the log, for example. For that matter, I do have Grafana running, and inside my Kubernetes cluster I have Loki. Let me go into Explore, search from the Mesh Gateway over the last five minutes, and run the query. Can we make it bigger, please? Oh, yes, of course. Thank you. No errors; I don't see why I don't see this. There should be some more traces; say, 30 minutes. Let's see if I have traces from yesterday, because I'm pretty sure I have some from testing this yesterday. And inside this, there are no errors; there are some traces without errors. Let's see. And this data is good. Yeah, so for example, here I should be able to navigate to the particular place in the log, supposedly.
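A sketch of such a MeshAccessLog policy shipping access logs to a Collector; the Collector's in-cluster address is an assumption, not taken from the demo:

```yaml
# Sketch of a Kuma MeshAccessLog policy: send access logs from every
# data plane proxy in the mesh to an OpenTelemetry Collector backend,
# so logs and traces land in the same pipeline for correlation.
apiVersion: kuma.io/v1alpha1
kind: MeshAccessLog
metadata:
  name: otel-access-logs           # hypothetical policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Mesh                     # apply mesh-wide
  from:
    - targetRef:
        kind: Mesh                 # log traffic from any source
      default:
        backends:
          - type: OpenTelemetry
            openTelemetry:
              # assumed Collector service address (OTLP gRPC port)
              endpoint: otel-collector.observability.svc:4317
```

Because the proxy emits both the trace and the access log, the trace ID shows up in both, which is what enables the jump from a span straight to the matching log line in Loki.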
Why, Victor, are you not navigating to the right place? So inside the log, I'll be able to see what happened here, what kind of error occurred — some java.net exception or whatever it was. So from the trace there's a good correlation with the logs, and you know exactly where the error happened. In this particular case, we see the only trace, and we'll be able to inspect it. Another thing: since we're running this inside the service mesh, the response time is very, very quick, and the reason is that the data plane proxy short-circuits all of this, because the failure was injected as a synthetic error. In real life, if it were a real request, you would probably see slightly more time. Usually, when a system depends on a database or some other system, we can see cascading failures, and usually the way you detect cascading failures is that latency grows, because one system is trying to reach another and there will be retries or something like that. So by default, when you run this inside the Kuma mesh — remember when I was talking about developer experience — Kuma also applies some sane policies. For example, there's a circuit breaker policy that automatically, for every service inside the mesh, introduces a maximum number of retries. Because we're running this in Kubernetes, and it's not a dedicated environment — my cluster probably uses the cheapest possible tier on Google Cloud, so it's shared infrastructure — some failures can happen just because I'm running this on somebody else's computer. So in order not to introduce false positives, whenever we run inside the service mesh, these policies are enabled by default: maximum retries, maximum number of requests, and all those kinds of things.
That's the circuit breaker policy that runs inside the mesh. There is also a retry policy that we configure based on empirical data and recommendations from the community, so it's a good-enough default. But as an SRE or operator, you're able to go and tune these things without changing the application code. That's the whole purpose of this. Now, I have a couple of questions, if there's time — do you have enough presentation left? Okay. One is about something you showed briefly: that you can use the service mesh to expose your service outside the cluster. Does that mean you could possibly use this as your ingress implementation? Like, you might not have a separate ingress; you could just use Kuma all the way. Or would you use both? Yeah, okay, interesting. So, the thing we like to say here at Kong — Kong is known as one of the most popular API gateways, and people usually expect, every time we talk about these things, that I'll bring Kong. So yes, it will definitely work with Kong as an external API gateway. Potentially it will also work with any ingress controller: the way it works, the service mesh will inject the data plane proxy into that ingress controller too. But many users and customers ask: okay, I don't want to overcomplicate my infrastructure. I just want a bare minimum on this side; I don't want full-blown API management in my system. And it's like: okay, we already have Envoy — what if we just use Envoy as our ingress? That's exactly what the mesh gateway does.
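As a concrete example of tuning these defaults without touching application code, a sketch of a Kuma MeshRetry policy (the name and numbers are illustrative, not Kuma's actual shipped defaults):

```yaml
# Sketch of a Kuma MeshRetry policy: cap HTTP retries mesh-wide with
# exponential backoff, so transient infrastructure hiccups don't turn
# into false-positive failures or retry storms.
apiVersion: kuma.io/v1alpha1
kind: MeshRetry
metadata:
  name: tuned-retries              # hypothetical policy name
  namespace: kuma-system
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Mesh                     # apply mesh-wide
  to:
    - targetRef:
        kind: Mesh                 # outbound traffic to any service
      default:
        http:
          numRetries: 3            # bounded, to avoid amplifying load
          backOff:
            baseInterval: 25ms
            maxInterval: 250ms
```

An operator applies or edits this with `kubectl apply`, and the data plane proxies pick it up; the application binaries never change.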
And if you think about it, I believe it was at last year's KubeCon that the Envoy team announced they're going to build a gateway based on Envoy and on the Gateway API, which is another very exciting piece of specification that hopefully this winter will reach GA, or at least land as a good-to-test feature inside Kubernetes. As a community, we work closely with the CNCF and the relevant Kubernetes special interest group; a few engineers from Kong actually helped define the Gateway API and GAMMA — GAMMA being the Gateway API initiative for meshes — so, the specifications for this. It's all shaping up very well, with the help of the community and of the people interested in this type of jazz. So yes, you don't have to use it if you don't want to. It has all the batteries included: if I go to the gateway, it has a built-in type, and if I ran this with Kong instead, it would be a type called delegated — and then you also get all the bells and whistles that come with a full API gateway. Okay. And my second question is a complete departure from this particular topic. Absolutely, yeah. You talked about ways to collect data from your running service or application: you can do it with OpenTelemetry by importing a library, where it's tightly coupled with the application itself, or you can do it using a service mesh like Kuma, which is exactly what you demoed. And my question is: is the information you get at the end the same regardless of the implementation, or do you get different sets of logs or different information depending on which way you do it? Yeah, this is a very good question. The answer is that OpenTelemetry defines a standard for how this data is formatted.
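A sketch of what a built-in Kuma MeshGateway resource might look like, with an illustrative name, selector tag, and port (not the demo's actual values):

```yaml
# Sketch of a built-in Kuma MeshGateway: Kuma runs an Envoy-based
# gateway at the edge of the mesh, selected by a kuma.io/service tag,
# listening for plain HTTP on port 8080.
apiVersion: kuma.io/v1alpha1
kind: MeshGateway
mesh: default
metadata:
  name: edge-gateway               # hypothetical gateway name
spec:
  selectors:
    - match:
        kuma.io/service: edge-gateway  # tag on the gateway's data planes
  conf:
    listeners:
      - port: 8080
        protocol: HTTP
```

This is the "builtin" mode described above; in "delegated" mode, an external gateway such as Kong handles the listeners instead, with a mesh data plane proxy injected alongside it.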
And regardless of whether you're using instrumentation inside your application, or instrumentation built into something like a service mesh or an API gateway, you will still end up with the same data format. So let me use an example from the tracing backend; there should be something like a raw view. Let me run a query — there should be raw data, and the raw data lets me see how this stuff looks. Nope, where is it? So yeah, in the raw data, all this information is included in these messages. See the tags that I propagated from my policy? I can include additional information here declaratively when I define the MeshTrace. So I said: include some stuff. There's a service name, some additional information like environment, and user-agent data extracted from the headers. So you'll be able to see the same information regardless; it's the format of the payload that the standard defines. Okay, so the short answer is — it's the same. Yeah, it's the same. It's like: choose your own poison. Do you want to maintain libraries, or do you want to maintain infrastructure? Cool, thank you. Yeah. And — you didn't ask me, but I'll still answer — as you can see, if you're using the OpenTelemetry Collector, you have flexibility in what can be done by infrastructure rather than by your application. Say you have a Spring Boot application that uses Micrometer to report all its metrics in the Prometheus format. You can still use the OpenTelemetry Collector and send the data on in the OpenTelemetry format. The beauty of this tool is that it can receive data from multiple different sources, whereas if you're using a library — say, Spring Boot with Micrometer — you have to hope there's an easy way to migrate from one format to another.
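The Spring Boot/Micrometer scenario can be sketched as a Collector pipeline; the scrape target and backend endpoint below are assumptions for illustration:

```yaml
# Sketch of an OpenTelemetry Collector pipeline: scrape Prometheus-format
# metrics exposed by a Micrometer/Spring Boot app and forward them to an
# OTLP backend, converting formats in infrastructure, not application code.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: spring-boot-app
          metrics_path: /actuator/prometheus   # typical Actuator endpoint
          static_configs:
            - targets: ["my-app:8080"]         # hypothetical app address
exporters:
  otlp:
    endpoint: otel-backend.example.com:4317    # hypothetical backend
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```

The application keeps emitting what its library knows; the Collector owns the translation, which is the "maintain infrastructure instead of libraries" trade-off being described.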
Sometimes some of the fields are not available: a field exists in one format and not in another, so it would require some application code changes. Another thing to consider when you're using libraries is that different languages have different levels of support. Some languages have full support for every part of the specification; some SDKs and libraries might not. So it really depends — that's the consultant inside me talking. Yeah. But both options are available, and it's going to be the same data; the way it ends up in your backend system is the same. Excellent. We have about two minutes left. Do you have anything you want to say in closing? The last thing is what we always do on the internet and YouTube: please subscribe to my YouTube channel. You know, you've probably also felt this as a developer relations specialist — over the last couple of years, we had to change the way we approach our audience and spend more time on YouTube, more time building setups. I also built a lightboard in my basement, so I'm doing that style of explainers too. Oh, nice. A lot of things have changed: three years ago I said I was never going to do TikToks, and now I'm doing short videos explaining smaller bits about API management. So yeah, if you're interested in API management, Kubernetes, cloud native technologies, or service mesh, I post a lot of content on the Kong YouTube channel, and I'm happy to answer any questions regardless of the topic. I know a few things. At least five things. We also have some lovely comments from a guest saying it's been an excellent presentation of the technology, and as a final question, they said: can we do it again? Let's do it again. Thank you so much, Juan. That's nice.
If you didn't know, all of these sessions are recorded and available on the CNCF YouTube channel; you should go and subscribe to that one too. It's massive — if you missed KubeCon for some reason, the videos become available within weeks of the event. So subscribe to the CNCF channel, enable notifications, and stay abreast of all the news. We'll see you in the next one. Okay, I'm going to do the ending script. Are we ready? Yeah, goodbye everyone. Thank you so much for joining today's episode of Cloud Native Live. I don't think we broke anything. What the heck, Victor? We didn't live up to our promises. No, there were a few things that broke, but I didn't show them. You know, remember that meme from The Office, where the presentation goes well, the customer doesn't notice, and the presenter gives a smirky look to the camera? There was a moment like that... Yeah, there was the moment when the traces weren't showing the full information and you pulled them from yesterday. Yeah, okay, there was one. You're just like a magician with sleight of hand. We have another nice comment from chat too — love it, it's great. Thank you, Diego. All right — oh, accidentally — there we go. All right, so thank you so much, Viktor Gamov, for teaching us about service mesh observability with Kuma and OpenTelemetry. The audience — y'all are great, super fun in chat and from all over the world, which I will never stop loving. And here at Cloud Native Live, we bring you the latest in cloud native code every Wednesday at noon US Eastern, and we're actually adding Tuesday episodes too, so the next episode is going to be next Tuesday. So thanks for joining us today, thanks to everyone who watches the recording, and thank you, Victor. Everyone have a wonderful, wonderful day.