So hello, everybody. My name is Michael Haberman. I'm from Aspecto. And today we're going to talk about two interesting things. We're going to talk, of course, about OpenTelemetry, but we're also going to talk about service mesh. And I think when we combine those two things, we're going to see that they are interesting for two reasons. First, both of them can produce telemetry data, right? Both of them claim they can help us monitor and observe our application. But there is also an interesting overlap between the audiences: a lot of people that are interested in running OpenTelemetry would also use a service mesh. And when you have two things that can work well with one another and they share the same audience, that's a good match. So this is why I was interested in doing this talk. A bit about what we're going to do here today: I'm going to really briefly talk about what a service mesh is, just to make sure that everybody is aligned. But mostly I'm going to talk about where service mesh meets OpenTelemetry and where OpenTelemetry meets service mesh, a bit about the configurations, and lastly about the pros and the cons, because in any technology decision there is good stuff and there is bad stuff, and we need to talk about both. So, really short: a service mesh is a networking component, part of your infrastructure, that is eventually responsible for networking, and mostly I'm going to refer to service-to-service communication. When your service needs to communicate with another service, it's going to communicate through this mesh. And I'm going to look into three specific things that the service mesh can help you with. The first one is observability, then routing, and lastly authentication. We'll try to see how we can benefit when using those things. So here is a very, very complex architecture.
We have the order service communicating with the user service, then the auth service, and when they communicate with one another, they are going to communicate through a mesh. If you go to the websites of most service meshes, like Istio or Linkerd, you would see that they say you can get tracing out of the box. Now imagine going to your service mesh, going to the configuration file, turning on this magic tracing configuration, and everything just works. That sounds really, really good. And I just want to put it out there and see how it really works, what you are getting and what you're not getting. So we can tell the service mesh to generate spans, and here I have a very, very simple application which I ran before we started this talk. You can see here that I can decide whether or not to initialize the tracer. So currently what I have here is two services communicating with one another through Envoy, and OpenTelemetry is not activated. Looking at our example here: I sent a request, now let's go to Jaeger, refresh it, and look at our proxy. And you can see here a trace where I have, first, the proxy, then I can see that I'm sending a request to the order service, but I don't see the request to the user service. So I do have spans, I do have a trace, I just don't have the context; I have a broken context. And this is a truth that I think needs to be told: if you just enable your service mesh to start emitting spans, you're not going to get tracing. You're going to get traces, but you're not going to get the whole tracing experience that you expect. What you do need to do is propagate the context yourself, right? You need to implement OpenTelemetry, you need to instrument the services, and you need to take care of the context propagation.
So what I'm trying to say here is, and let me show it live so you can see that it actually works: if I change this parameter to true, meaning that now we will have OpenTelemetry running, and it will take a second, then our trace will look like a whole trace, right? Now I'll be able to see everything that is happening there. So some people may think that you can get tracing from a service mesh, but that's not actually the case, because you really, really need to implement OpenTelemetry. And this is a quote from the docs saying, hey, yeah, we do have tracing; however, just a tiny little bit, you still have a way to go. So what I'm saying here is: when you are using a service mesh, you are still going to implement OpenTelemetry. But the question I want to ask is, is there a way around it the other way? If I already implemented OpenTelemetry, will it be beneficial for me to also get spans from the service mesh? I have to have OpenTelemetry, but what am I going to get when I get spans from Istio or Linkerd? So let's take, at least in my opinion, probably one of the most common use cases that you see with a service mesh, and that's routing. When we are getting a request, we are going to route it differently, whether it's because we are running an A/B test, a feature flag, or I'm just running some deployment, a canary deployment, where I have two versions of the service and I need to route between them. If you think of that from the perspective of tracing, you look at a trace and you ask yourself: well, to which version does it belong? Does it belong to the version that I already deployed? Is it the one that is being deployed? This can be very critical.
Let's say that you are deploying a performance improvement, and you expect to see a span that now takes less time, and it's not there; it's taking the same time. Well, is it the new version or the old version? So things that you may not be able to solve using plain OpenTelemetry, like just using the distro, could become much easier when you look at the data the service mesh produces. So this is a trace where we are routing the request, and you can see those two traces. The first one is 12 spans, the second one is 13 spans, and here we have no telemetry data being emitted from the service mesh; this is plain OpenTelemetry. When I add the service mesh data, and you can see it right here, now it's starting to get a bit more detailed, right? Now if you look at the first request, you can see that here we communicated with order service number two, and here with number one. And this is a great clue for whoever is going to observe this trace. You're going to understand: okay, now I understand what's happening. Now I understand that there is some kind of decision, that there are two versions, and I know which version I'm looking at. And let me try to show it to you while it's actually running. So you see here, this is service version number one, and now I'm sending a few more requests: service two, thank you, service two. Now let's look at those, and we can see it live. We can see that there is a difference between them, and I can understand the difference quite well. And if I click this one, and something is happening with the timeline, I don't know why the timeline is so off, but it's making it very, very easy for me to understand that this is a decision that the proxy took, and the proxy decided to use order number two. So what I'm trying to say here is: without the service mesh data, it would be harder for me to understand this whole trace.
Now, another interesting use case is regarding authentication. When we were talking about routing, it was basically one service communicating with another, with a decision taken in the middle. But service meshes can be even more involved in our application. When the service mesh is responsible for authentication, it actually becomes part of the application. So when the client makes an API call to the order service, we want to authenticate the user before it reaches the order service. Meaning that the service mesh, although it's infrastructure, a networking piece, is now responsible for authentication, which is absolutely part of the application layer. So if I look at the trace: we send an API call, the service mesh gets it, takes the call, makes sure the user is valid, and only then passes it on to the service, and the trace would look like so. I have this bit of a broken trace, right? I don't have a single root parent for this trace. I'm not sure exactly what's happening here, and I don't know why the user service was called; I can just see that it was called, and that after that we got an API call to the order service. Now, if I ask my service mesh to emit telemetry data as well, it's going to look much better. So looking at it, let me increase that so we can all see: now I can see the whole story. Now I can see that we sent a GET request, we did this egress call where we authenticated the user, and then we communicated with the rest of the application. So what I'm trying to say here is: even though the service mesh is a networking piece, it's part of the application. It affects how our application behaves. And if we're trying to observe the application, and we're saying we can look at traces, we can look at the observability data and understand what's happening in the system, well, it's part of the system.
It needs to emit telemetry data so we can understand the whole thing. So just to sum up the options that you have with instrumenting telemetry as a whole: you can just go with implementing an OpenTelemetry distro. It would probably give you, I don't know, 90% of what you're looking for, and that's great. If you only go with the service mesh, you are left with a broken context, which I guess is not going to take you very far. And if you go with both of them, you are going to be super happy. Now, if I were ending this talk right now, we would all be super happy, but we need to talk about the real stuff, because everything in tech comes with a "but", right? When we introduce the service mesh and start emitting traces from it, bad stuff is going to happen. So let's talk about that a bit. I'm going to talk about three things that are going to be affected: cost, sampling, specifically head sampling, and a bit about the configuration. The first thing to understand is that we are going to increase the overall cost. When I say cost, I don't just mean dollars you need to pay a vendor; I mean the overall cost, even if you manage it yourself. You are going to generate more spans, and just by introducing at least one more, or even two more, spans for every network activity you have, this could add up to significant cost. Now, when you're increasing the cost, you need to ask yourself: okay, what value did I get? If the value that you're getting is significant, yeah, go for it, it makes sense. But I saw companies implementing Istio traces where the only thing they could learn from them was the extra latency that Istio introduced. And that's fine, but that's not tons of value; I'm not sure it was worth the cost for them.
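A rough back-of-envelope, with entirely made-up numbers, shows why "one or two extra spans per network hop" adds up: if the mesh sidecar contributes spans on both sides of each service-to-service call, it can come close to doubling your span volume.

```python
# Back-of-envelope (illustrative numbers only): extra span volume from a
# mesh that adds spans on both sides of every service-to-service hop.
requests_per_sec = 1_000
hops_per_request = 5      # service-to-service calls in a typical trace
app_spans_per_hop = 2     # client span + server span from the apps
mesh_spans_per_hop = 2    # proxy spans added by the mesh (ingress + egress)

app_spans = requests_per_sec * hops_per_request * app_spans_per_hop
mesh_spans = requests_per_sec * hops_per_request * mesh_spans_per_hop
print(app_spans, mesh_spans)  # mesh roughly doubles span volume here
```

The exact multiplier depends on your mesh and topology, but the point stands: the mesh's spans scale with every hop, not with every request.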
But when you start to really do things like routing, like retries, like backoff, like authentication, then you really need the service mesh to tell you what happened in order to understand the traces, and then I think the cost is worth it. But every deployment, every company, needs to take this decision for themselves. So this is the first piece of bad news that we need to think about. The second one is regarding head sampling. If you are using head sampling, you are used to OpenTelemetry taking the sampling decision in the first service, once it receives the request. With a service mesh, the mesh now sits in front of your application, and the mesh is going to take the decision for you. So if you wrote code that said, okay, in my service I don't want to sample health check traces, they're not interesting, well, now the service mesh, depending on the service mesh, but most service meshes that I know, is only going to give you the possibility to define what percentage of traffic you want to sample. So if you were relying on head sampling and then you introduce a service mesh, that may go to waste. You can start to work around it. You can say, okay, I won't do parent-based sampling, I won't look at whatever the service mesh decided, but that's going to take you to a whole kind of bad experience and a lot of work that you need to do in order to make it actually work. So out of the box, I think, this doesn't really work, and it is another area that you need to consider. And the last thing I would mention regarding the bad news is that when we are working with OpenTelemetry, we want everybody to work as we expect: with the latest specification, following the semantic conventions, and using OTLP. But truth be told, that's not always the case.
So I took Envoy as an example, just because many service meshes use Envoy behind the scenes. Envoy still doesn't support OTLP; they do have a pull request for that, but it's still not merged. So currently, if you're using it, you need to export the data through Zipkin or through Jaeger, which is fine, but then you are not using W3C context propagation. So now you've implemented OpenTelemetry in, I don't know, 100 services using W3C, and now you say, okay, I want to turn on tracing in Istio, and now you're in a problem, because you need to change it to B3. It's not something that you can't solve, it's just a lot of management and a lot of changes that you will need to do. I guess in next year's talk I'll probably say, hey, we're all good, everybody is now aligned on W3C, but at this point in time it's a bit annoying. And also the semantic conventions are not always spot on, which can be a bit annoying too. So if I want to sum up what I'm trying to say: any component that is part of the application, any component that affects how the application is running, is worth monitoring, worth observing. When we look at a trace, we think this trace is going to represent what really happened, and to do that, we need as much data as we can get. And I think there is something interesting with service meshes, because most of the companies that I saw use an OpenTelemetry distro implemented in their services, and this is kind of the first time that we see something that is not part of the application code starting to emit telemetry data. And I think, and the panel discussed it a bit, that we're going to see more and more components start to emit telemetry data as we go. I don't know, databases and message brokers and queues and those kinds of things are going to follow at some point. And we're going to see probably the same kinds of problems that we see with service meshes.
So we can look at that as a community and learn from it what we need to do to guide those components that are going to emit data, so that we don't hit those same problems and mistakes again. So thank you very much, and if you have any questions, I'd love to try and answer. Hold on, the mic's coming. First of all, thank you. I did a similar exercise to what you did and I came to the same conclusion, but I've always been surprised at how Kiali is able to create the traces out of the box, without any OTel implementation, so you have the context propagated. I've been digging, digging, digging and I have never understood how Kiali was able to do that. Have you found out? Which project? I didn't get that. Kiali. So I'm not that familiar with that one. It is very interesting, and I'll try to dig in, and maybe I'll come back with a conclusion, because if there is a magic button you can click and get good traces, with context, that's a great offering. Any other questions? All right. Well, thank you very much. A big round of applause. Thank you.