Thank you for being here. I'm Nicolas Frankel. As I mentioned yesterday, I have a bit of experience, and when I started working we already had monitoring. At the time, monitoring meant rows of people looking at a huge screen showing a dashboard. When something happened on the dashboard, they were alerted that something was wrong, and then they tried to fix the issue. Everything was manual. I actually worked on an account where they were super proud to tell me they had the biggest dashboard screen in all of France. It was in the south of France, and they were responsible for paying out unemployment benefits, so you can imagine, in France, that was a lot of money. Then systems became more and more distributed. Now we say we don't have monitoring anymore; now we have observability. It might be a semantic thing, it might be a real shift, I don't know. I'm not an ops guy; I've been a developer and an architect all my life, but I've always had a keen interest in helping my ops colleagues operate my solutions.

So even though this talk is about distributed tracing, because I think that's the hardest and the newest part, here is a quick reminder about the three pillars of observability. Who here is a developer? OK, mostly everybody. Who here knows about the three pillars? Not that many. OK, so it's a good reminder, just so we are on the same page. This talk is for developers, to help your ops colleagues. Metrics: that's what I described before, dashboards with metrics. At the time it was hardware, very low-level metrics: CPU, memory consumption, disk storage, whatever. Now I believe we are moving toward higher-level metrics, because if you tell the business, "hey, we are using 85% CPU," they look at you and say, "yeah, thanks, interesting information, but what should I do with it?"
But if you tell them, "99% of people put items in their cart and then leave before buying anything," that's important information they can act upon. That's something that can be observed and that gives real insight into the business.

Logging has many different facets. If you are not used to logging, it looks simple, but there are many questions you need to ask yourself to implement it correctly. The first is what to log. When I was a young and eager engineer, I thought: I know about Java agents, so I will add a Java agent that logs every entry into a method and every exit, along with the input parameters and the return value. That was very smart, yes. But what do you do with all of that? And what if one of those parameters is a password? It's easy to automate, but the value is very low; whereas doing it manually is a lot of effort, because you actually need to think about it. We have been doing, especially in Java, sorry, I have been doing in Java, regular Log4j, SLF4J, whatever, for ages, and it's very easy to output what you want. But nowadays, most logs are not read by humans; they are read by machines and sent to aggregated log storage. So it's better to output JSON directly, so you don't need an additional transformation stage with, say, Logstash, which is time-consuming and more fragile. Again, from my experience: when I started working, I was told you don't log to the console, logging to the console is bad. Now everybody is using containers, so where do you log? And then, as I mentioned, you always, always need to aggregate your logs. A single log on a single system is completely useless. As soon as you build a distributed system, you want log aggregation. So this is the general idea.
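Since the demo later in this talk is partly Python, here is a minimal sketch of what "output JSON directly" can look like with the standard library alone. The field names are my own choice, not any standard:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line, ready for aggregation."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

# In a container, logging to stdout/stderr is the norm; the platform scrapes it.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("shop")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("cart abandoned by user %s", "42")
```

Because each line is already valid JSON, the aggregation pipeline can skip the parsing stage entirely.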
You get the log, and then another question is: are you actually sending the log? Is the application pushing the log somewhere, or is a component scraping it? That's a question you need to answer. Then again, as I mentioned, you need to parse the log if it's not in JSON, so better to write JSON directly. Then you store it, and finally you can search it to get more insight into the system. I've been using Elasticsearch for some time, so it's the system I'm most familiar with, but there are many, many available. I won't comment on which is the best, because, as I said, Elasticsearch is simply the one I know best. Use the one you like the most.

The third pillar is tracing. I generally like to reuse an existing definition, but in this case the usual definition is not that great, so I came up with my own, probably inspired by many others. It describes what tracing is: you want to trace a request across your distributed system's components. You want to know which places it went through, and if something bad happens, of course, you want to know where it stopped. There have been a couple of pioneers in the distributed tracing area. Zipkin and Jaeger are generally the most familiar, and they still work; I will be using Jaeger in one of my demos for no other reason than that they have a good Docker image that works out of the box. There was also OpenTracing, which doesn't exist anymore, for one simple reason I will get back to later. But when we have a system of distributed components, what we want is a specification. Zipkin and Jaeger each had their own format and their own protocol, so you had to come up with a solution compatible with your tracing provider. With a specification, everybody adheres to the specification, and you no longer need to ask, "which implementation should I choose? Only this one, because it's compatible."
So there is the W3C Trace Context specification. And I don't think it's like the XKCD joke about competing standards; I think it's going to be a real standard that people adhere to. The idea is very simple. You have a trace, which basically represents a single request, and then you have spans, where a span is the execution of that trace inside one component. If that sounds abstract, a diagram always helps. So here, perhaps I can impress you again. Yes, are you impressed again? OK, here is a single trace, a single request. In component X there is one span, in Y another, in Z another. And every span but the first, the entry point, has a parent ID, so you can trace every span back to its parent span.

OK, a specification is good, but we need tools. For that there is something called OpenTelemetry, which implements Trace Context but gives you a lot more: it gives you APIs, SDKs, and the format in which to send the data. I told you about OpenTracing. This is one of the few mergers in the open source world that went well: there were two projects, OpenTracing and OpenCensus, and they merged to create OpenTelemetry, and I believe this is really, really good. Nowadays it's a very, very big project, backed by the CNCF, with a massive following. More importantly, nearly all the tools I know now support this OpenTelemetry specification. What we need to understand is that the OpenTelemetry specification, as I mentioned, is just the format and the channel. They provide, for your convenience, something called the OTel Collector, but what happens afterwards, the green stuff on the slide, the specification doesn't care about. Even the yellow stuff is just for your comfort; you can implement your own OTel Collector. So you have lots of sources that know how to talk to an OTel Collector: they send the data in this format, on this channel, and it's done.
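Concretely, the Trace Context specification boils down to an HTTP header called `traceparent` with four hex fields: version, trace ID, parent (span) ID, and flags. Here is a stdlib-only sketch, with helper names of my own, of producing and propagating it:

```python
import re
import secrets

def make_traceparent(trace_id=None, parent_id=None):
    """Build a W3C traceparent header: version-traceid-parentid-flags."""
    trace_id = trace_id or secrets.token_hex(16)   # 16 bytes -> 32 hex chars
    parent_id = parent_id or secrets.token_hex(8)  # 8 bytes -> 16 hex chars
    return f"00-{trace_id}-{parent_id}-01"         # flag 01 = sampled

TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header):
    m = TRACEPARENT_RE.match(header)
    if not m:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = m.groups()
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "sampled": bool(int(flags, 16) & 0x01)}

# A downstream component keeps the trace ID but becomes the new parent:
incoming = make_traceparent()
ctx = parse_traceparent(incoming)
outgoing = make_traceparent(trace_id=ctx["trace_id"])
```

The whole trick of distributed tracing is in those last three lines: the trace ID stays constant across components, while each hop mints a new span ID and passes itself along as the parent.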
Zipkin and Jaeger provide their own OTel-compatible endpoints. It means that if you are using Zipkin or Jaeger today, there is no reason to use their proprietary formats; it's much better to switch to the OTel format, and then, if needed, you can replace Zipkin or Jaeger with the next good thing.

Now, as I said, this talk is for developers. You know how it works, so how can a developer implement OpenTelemetry? The first thing to decide is whether you want auto-instrumentation or manual instrumentation. If you have a platform underneath, that's a real question. If you don't have a platform, as with Rust or Go, you have to do manual instrumentation; there can be no auto-instrumentation. With auto-instrumentation, for example on the JVM, you can use a Java agent, which means your code is completely oblivious to OpenTelemetry. The developers don't need to know anything about OpenTelemetry at all. Only the people building the artifact add the agent, and then the agent, through the magic of the framework (in my demo it will be Spring, but there are probably a lot of Quarkus people here, and it works the same), sends the data to the OpenTelemetry collector. If you go the manual instrumentation route, you will have additional dependencies in your code, and your developers will need to use them explicitly. I believe that, as a first effort, you should reach for the low-hanging fruit, which is auto-instrumentation, if you can. If you're on the JVM, use the agent. If you're on the Python platform, add a couple of dependencies in the Dockerfile and you're done.

So here is my demo, and it is again an e-commerce demo. Basically, I have a user who asks for products. The first entry point is always the API gateway; you shouldn't expose anything over the internet directly. The API gateway forwards the request to the product service, and the product service gets the data it has in its own database.
Then it will look up the prices of the products in the pricing microservice and the stock in the stock microservice. Yesterday I pitched a lot against microservices; you can consider these just distributed components. Even if you don't do microservices, you'll probably make distributed calls anyway. The entry point, the first span, is the most important part, because it's the one that generates the first parent ID. So your reverse proxy, your API gateway, matters most.

I work on the Apache APISIX project, which is an Apache open source project. Who has heard about Apache APISIX, by the way? Three, four people. So people who attended my talk yesterday hadn't heard about Apache APISIX; I'm a bit disappointed, since it's my job to let you know about it. Yesterday I didn't present it, so let's present it today. Basically, it's an API gateway. It's built on NGINX open source. On top of that there is the OpenResty layer, a lower layer that lets you script the configuration, so it's dynamic: you don't need to switch it off and on again to change the configuration. Then there is the APISIX core, and APISIX is based on plugins, so you can add or remove plugins.

And now I will do my demo, because I've talked a lot already. Here is my Docker Compose file; I will start it and describe it: docker compose up. I use Jaeger because they provide everything in one single image: the web app, the OTel collector, and whatever other components they need. I don't need to care how they provide it; they have one Docker image, and I use it because I'm lazy. Who isn't lazy here? Let whoever isn't cast the first stone. Then I have APISIX, of course. Yesterday I showed you that I was using a key-value store, etcd. Here I'm using flat files, static YAML files, so I'm using more of a GitOps approach: if I want to change the configuration, I listen for changes in my GitHub repository.
The change will be reflected in my static file, and Apache APISIX will reload its configuration. That's another way to use Apache APISIX. I also have the catalog, which I will show you, the pricing, and the stock. I have multiple components, and every one of them uses a different technology. So who here is a JVM developer? Wow, are we at a Red Hat-organized or Red Hat-sponsored conference? Only three people using the JVM? Interesting. Python? Interesting. Rust? Yeah, everybody who develops in Rust says they are not that great at Rust; me as well, so that's not an issue.

So let's start with the Python part, then. Here I have my Python application. It's a Flask application, nothing really interesting. I'm using SQLAlchemy to query a SQL database. I get the price, I jitter it a bit just to have more random values for fun, and you can ask for the price of one single product. OK, nothing mind-blowing, just a regular Python Flask application. I'm not a Python developer, so if you see bad things in this code, please let me know afterwards, but not publicly. Then, in the Dockerfile, I actually add the OpenTelemetry dependencies. The requirements are what I told you: Flask and SQLAlchemy, nothing more. But when I build the image, I add the additional OpenTelemetry dependencies, and when I run the image, I run it through the OpenTelemetry instrumentation wrapper. So, as I mentioned before, the developers are completely oblivious to this OpenTelemetry stuff; they don't need to know about it, OK?

And then on the Java side (we will forget the Rust side), it's a Spring Boot project. I'm using Kotlin because I love Kotlin; everybody should do Kotlin anyway. It's reactive, even though that's not necessary here. And again, nothing related to OpenTelemetry in the code. However, when I create the image, it's a multi-stage Docker build.
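To make the Python side concrete, here is a sketch of what such a Dockerfile can look like. The package names and the `opentelemetry-instrument` wrapper follow the OpenTelemetry Python documentation as I know it, but the file names, service name, and collector endpoint are made up for illustration, so check them against your own setup:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# The application's own requirements: Flask, SQLAlchemy, nothing more.
COPY requirements.txt .
RUN pip install -r requirements.txt
# Added at image-build time; the application code never sees these.
RUN pip install opentelemetry-distro opentelemetry-exporter-otlp \
 && opentelemetry-bootstrap -a install
COPY . .
ENV OTEL_SERVICE_NAME=pricing \
    OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
# Run the unchanged app through the auto-instrumentation wrapper.
CMD ["opentelemetry-instrument", "python", "app.py"]
```

The developers own `requirements.txt` and `app.py`; only whoever owns the image build ever touches the telemetry lines.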
So first I build the app, for which I need a JDK; then I run it, for which I only need a JRE. I build it normally, then I add the OpenTelemetry Java instrumentation in its latest version, and I start Java with this Java agent. As for the Rust part, there are not that many people doing Rust, so it's not that interesting; besides, again, I'm not super great at Rust.

So now I can curl localhost, I go through the API gateway, and I ask for the products. And I've got the result. Now I can go here and check Jaeger. This is the Jaeger UI. I can find the traces; I did a single request, so I will find its trace here. And here we can see the path of the request through all the components. Of course it starts with the API gateway, which then forwards it to the catalog. And inside, you can see the different calls, and that is through the magic of Spring: I didn't do anything, but Spring was able to do this for me. Then, something interesting even if you are not an ops person: you can see that the catalog's call for the price goes through APISIX. We don't call the pricing component directly; we go to the gateway, which forwards the request to the pricing service. That can be one valid architecture. On the other side, the catalog calls the stock service directly. So even if you are not interested in distributed tracing as such, you can check how a request flows through all the components, because the flow might reveal a misconfiguration on your part.

And then you've got additional data; on every component, you can attach additional data. Here, for example, on the Apache APISIX side, as I mentioned, everything is a plugin, so I have a global rule that says: every request that goes through APISIX gets the opentelemetry plugin. Normally you should always sample; you don't want every single request to be instrumented. But for demo purposes, here I trace everything.
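For reference, a file-based APISIX configuration with such a global rule might look roughly like this. This is a sketch from memory of the standalone YAML format, so treat the exact keys, the route, and the upstream node name as assumptions to verify against the APISIX documentation:

```yaml
# apisix.yaml -- standalone, file-based configuration
routes:
  - uri: /products*
    upstream:
      type: roundrobin
      nodes:
        "catalog:8080": 1
global_rules:
  - id: 1
    plugins:
      opentelemetry:
        sampler:
          name: always_on   # demo only; sample in production
#END
```

A global rule applies its plugins to every route, which is exactly what you want for tracing at the entry point.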
And then I have additional attributes, such as the route ID and the request method, and I capture this HTTP header. Here I have nothing, but if I redo the same request with an additional header called x-otel-key, with the value hello.devconf.cz, and I go back and search again, then I have a new trace, and on the Apache APISIX side, hey, I have this additional data. So you can capture additional data if you want it in your traces. On the Java side, you get additional data that I didn't configure; it's there by default. Here you can see I'm using Netty, because I'm using WebFlux, and the thread is a Reactor thread. Even if you are not using WebFlux, Project Reactor has its own threading model, so everything is handled by it. So that's the beginning, and I believe it already gives you many, many insights into your architecture.

But now, suppose we want to add manual instrumentation on top of it. No worries, let's do it. So what I will do is docker compose down, right here. I will just use Git, because, as I mentioned, I'm lazy, and I will bring it up again. And now I've made a couple of changes. Here, I'm telling my developers: now you should add OpenTelemetry to your list of dependencies. OK, let's do it, and now you can use some interesting things. Inside the application, inside a single function, I can add additional spans. Spring already adds internal spans, but now I'm adding my own inside my Python application with the explicit API. Here I've decided to trace the query, and I add as an attribute the product ID that I'm querying. On the Java side, I've added the annotations: Spring allows me to do this through annotations or through the explicit API. Honestly, the explicit API doesn't work that well at the moment.
So I'm using annotations, but in my opinion, especially for a demo, it's good enough. And here I've added additional things. I've added annotations, so now every function will be instrumented. And here I also want to record the product ID: for one single product, I want to capture the product ID. And here is the part that is not that great: I don't actually need this ID parameter in my code here, because I pass the whole product; I could get the ID from the product itself. But because I'm using annotations, I need to change the signature of my method to auto-instrument it. So there are pros and cons to this approach.

Now, if I go there and curl localhost again for the products, and I get back to Jaeger and find the traces: there were 20 spans before, and now I have 27, because I've added some of them through the explicit API and through those annotations. And here you can see what I added manually: SELECT * FROM, and it's not good to select star, but again, laziness, WHERE id equals the ID, and here I have the ID 1. We can also see that my API is actually quite bad, because I'm calling every single component once per product, whereas I should pass all the IDs I want and get them in one call. It's also interesting to see that. And on the JVM side, we now have the findAll span that we didn't get before.

And folks, that's all I have for you today. Just a few last words. This is for developers, and as I showed you, auto-instrumentation is not that hard. If you are using a runtime such as the JVM or the Python runtime, you can just adapt your Dockerfile for auto-instrumentation, and you will already get lots and lots of insights that can help your ops colleagues, or even you, when debugging, to understand the flow of a request throughout your system. If you want additional details, then you need to do that yourself, either through annotations or through an explicit API.
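To see why the two styles differ, here is a toy tracer, emphatically not the real OpenTelemetry API, that mimics both: an explicit with-block you place around exactly the work you care about, and a decorator that, like the annotations, wraps a whole function and can therefore only record what flows through its signature:

```python
import contextvars
import functools
import uuid
from contextlib import contextmanager

_current = contextvars.ContextVar("current_span", default=None)
SPANS = []   # stand-in for exporting finished spans to a collector

@contextmanager
def start_span(name, **attributes):
    """Explicit-API style: open a child span of whatever span is current."""
    parent = _current.get()
    span = {"id": uuid.uuid4().hex[:16], "name": name,
            "parent": parent["id"] if parent else None,
            "attributes": attributes}
    token = _current.set(span)
    try:
        yield span
    finally:
        _current.reset(token)
        SPANS.append(span)

def traced(fn):
    """Annotation style: every call to fn becomes a span automatically,
    so attributes must come from the function's own parameters."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with start_span(fn.__name__, args=repr(args)):
            return fn(*args, **kwargs)
    return wrapper

@traced
def find_by_id(product_id):
    # The explicit span wraps only the query, with a hand-picked attribute.
    with start_span("query", product_id=product_id):
        return {"id": product_id}
```

The decorator is zero-effort per function but pushes you toward signatures that expose the attributes you want; the with-block is precise but couples that function's body to the tracing API.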
But then you couple your code to OpenTelemetry. Thanks for your attention. You can follow me on Twitter, and you can follow me on Mastodon. Everything is on GitHub, as always; I just use this bit.ly link to check how many people actually look at the code. And if I got you interested in Apache APISIX, please have a look; it's the reason I can come here and give talks, because somebody is actually using the product. Any questions? Yes. You will need to shout, because I'm old and I don't hear that well.

So the question is whether I have any experience with very fast requests, such as DNS requests, correct? And the answer is no. I've done my share of consulting, and I was considered either a very good consultant or a very bad one, because when I don't know, I say I don't know. My employer didn't like it; my customers did. Other questions that I can answer? If you don't ask, you will never know whether I can answer. Yes.

So the question is whether it's possible to aggregate the traces, which I don't really understand, because traces you cannot aggregate. Let me change direction a bit. I've seen a demo of the Grafana stack with Loki, Grafana, Tempo, and Mimir; LGTM, "looks good to me", that's how I remember it. There you can go from the traces to the logs to the metrics and around in a circle, and that's probably what you want, because a request is a request; aggregating across multiple requests doesn't make a lot of sense, in my opinion. Here I showed just the Jaeger UI; other products will give you somewhat different insights, but they look more or less like this. I know Elasticsearch does the same, and because I'm too lazy, I asked a friend who works for Elastic to show it to me that way, so you can have the same experience where the metrics, the logs, and the traces are all in one place. Other questions? Yes. Yes. OK, good. Yes. Yes. So, Python is not my area of expertise.
The question was why the OpenTelemetry dependencies in the Python Dockerfile are in beta, when that will be fixed, and why it is so. As I said, Python is not my area of expertise, so honestly, I don't know. However, I know from people who are very deep into OpenTelemetry on the JVM side that there can be some, let's say, slight issues. It's not as easy as it looks: it works very well for a demo, but if you use it in the real world, it will work, yet you might sometimes have surprises. I don't think that's a real issue, because if you sample, and 99% of the traces are correct and one of them is not, that's, in my opinion, not that big a deal. So get used to using the beta version; either it will be fixed one day or it never will, and you should just use it. In my opinion, the ratio of effort to benefit is incomparable. If you are an engineer, yes, it sucks to use beta versions, but if you just want to get things done, use it. Other questions? Yes.

So the question is: is it possible to combine auto-instrumentation with manual instrumentation? Actually, if you look at the Dockerfile, I think that's the case with the JVM part. Let me go back. I think it's the case, yes: here I'm using the Java agent plus the annotations. The Python one, I don't remember. Yes, it's the case as well. So yes. Because somehow you need something to send the data to the OpenTelemetry collector, so you will need it for sure; it's just that it's an additional step where your code gets coupled, which is not that great. You had a question, yes. So the question is which stack I would recommend. I don't recommend anything, because I'm not an ops person, but you should check the Grafana stack, and, as I just learned, Elasticsearch does the same as well, so at least you have two stacks you can compare.
But I would recommend using the stack your ops people are already familiar with, and not trying to force a new tool upon them. Sorry, can you repeat? So the question is how hard it would be to instrument a third-party library, and the answer is: it depends. First, it depends on the stack, Python versus the JVM, for example; Rust is out of the question, because you need a platform underneath. And it depends on the library itself. If the library is, for example, recognized by Spring and has the correct entry points and so on, it will work out of the box. If it's not designed for that, tough luck, because you need to be at the call site to send data to the OTel collector. So if it's not designed for that, you cannot do it at all: either it works out of the box or not at all. Yes. So the question is whether there is an integration between OpenTelemetry and Istio, and the answer is: I don't know. Yes. So the question is whether I've seen similarities between OpenTelemetry and Sentry, right? OK. And the answer is: I don't know, because I've never used Sentry; I don't even know what it's made of. And I am out of time, but I will be around for a couple of minutes if you have questions. Thanks a lot, and enjoy the rest of the day.