Hello everyone, glad to see so many people here. Glad you made it here as well — we were close to not making it; we got stuck in the elevator for a second, but then it worked again. Okay, for today we have a few steps prepared. In case you have any questions during the tutorial, just wave or raise your hand and one of these beautiful people will come by and help you. For those who are joining virtually, you can reach us on Slack and just ping us in the channel.

So what we do first is quickly wrap up what observability is, then we continue with the OpenTelemetry project, and then we just start with the tutorial. Oh, and the prerequisites: in case you want to participate, you need to have a terminal, have access to a Kubernetes cluster, and you need Docker or Podman to run some containers. If you have no access to a Kubernetes cluster, you can use kind. How this works is also described in the tutorial steps, so we can set it up all together.

All right, so what is observability? When people speak about observability, they want to know the unknown things about a system — which means, what does it mean? It might mean... I'm a bit stuck. Probably you can explain it for me.

Yes — sorry. So observability is about understanding our applications, right, and we do that by looking at certain data, which are metrics, logs and traces. This tutorial will try to use all of them on Kubernetes by using the OpenTelemetry technology.

Yeah, right. So the thing is, usually people come with different solutions and then it's hard to correlate them, but OpenTelemetry can help us with this. So, who here is a bit familiar with OpenTelemetry — has played with it or something like that? Not too many people. So who has used it in production?
It's probably the people who also played with it. Okay, so OpenTelemetry itself is an open source project, as you — or most of you — probably know. It's from the CNCF, and it's a vendor-neutral approach to ship telemetry data. It is a bit of an overloaded term: it comes with a specification, an API and SDK, a data model, a lot of other tools to generate traces and to auto-instrument your applications, and a collector. This collector is the thing we will speak about first. You can run this collector — or the entire thing — on Kubernetes or on OpenShift, and you can run it on your PC itself, and that's what we will do now.

So we go first to the OpenTelemetry Collector and figure out how the configuration works and what we can do with it. Then Pavel will show you how the operator works, and then Severin and Pavel continue with instrumentation — they will also show you the auto-instrumentation. Next, Christina shows how we can integrate the OpenTelemetry operator with Prometheus, especially with the CRs, and then finally Yuri comes and shows us how it works with logs — how we can get logs from our nodes.

So, what's next? You need to open up this link; it will lead you to a repository from Pavel, and there we will find all the instructions. Should I leave it there for a moment so that everyone can open it? Or you can go to my GitHub account. Well, you don't see the URL, but if you scroll up — scroll up the other way — yeah, it's pinned, so you will see it, and we will follow the README in this repository.

Okay, so what we will find here, aside from the abstract and the slides, are the prerequisites. So let's jump on setting up a cluster.
I hope most of you have kubectl installed, and in case you don't have kind — that's what we use here — you can quickly install it if you have Go; otherwise you can also download the binary over here. And let's see if I can use a MacBook... so this will set up our cluster, and then you can type, for example, kubectl get nodes to see if it's up and running. Here we see the node is not ready yet — but now it is. Okay.

Later, in case you don't use the containerized version of telemetrygen to generate some trace data, you can also install it via Go, but it's not required.

So next we need to set up a few things for the admission webhook of the OpenTelemetry operator: we need to install cert-manager. So — Ben is a Linux user and this is a MacBook. Believe me, it's way quicker than it was when we tested it yesterday. He even managed to break it. It's Command-C, right? Okay, and then Control-V — Command-V. One try, then... yeah. Okay.

So next we can deploy an observability backend. We just glued something together; it consists of Mimir, Loki and Tempo. Okay, and now we should be able to see the Grafana dashboard, which is also directly preconfigured. I will open up a new tab — not this one. Yeah, I recommend opening a new tab so that I don't need to port-forward the ports every time.

Yes, the backend deploys a couple of Grafana projects to store and visualize trace data, metric data and logs. And so it takes some time to get it up and running. It's not only the MacBook, it's also the keyboard layout. Okay, so we have no data, but it's up and running.

Okay, aside from that — let's see if I can get it on the screen.
So that's an image you will find when you go to the OpenTelemetry Collector documentation, and it displays basically its architecture. We can see it has three major parts. On the left we have the receiving side, which here is the OTLP receiver, the Jaeger receiver and the Prometheus receiver. They are slightly different: for example, the OTLP receiver is a passive one — it waits for messages and listens on a specific port — while the Prometheus one is an active one, which just scrapes data. Those are basically the types you will find there.

Next we have the processors. The processors are used to filter data and enrich data, for example with Kubernetes attributes — you will see it afterwards. Finally, we can export the data to different data stores. That's what we also do today: we send data to Mimir, we send data to Loki, and they all use different export formats.

There are a few more components you will find when you go to the documentation. For example, extensions — those are used for things like handling authentication — or connectors, which are used to, for example, generate metrics out of traces.

By default, the OpenTelemetry Collector is available in the Docker repository; it's maintained by the OpenTelemetry core maintainer team. And for contrib, there is basically everything included — you will find it in the contrib repository. In case the core one is not sufficient for you — and you probably shouldn't use the contrib one, because it includes a lot of things you don't need — you can also build your own collector. There is the OpenTelemetry Collector Builder available, and this is a manifest of how it looks, so you can just put together what you would like to have in your distribution and then go ahead.

Finally, there is the configuration. That's what we have seen on top. It's divided into three parts: we have the receiving part here — it accepts gRPC on a specific endpoint, which is the default one.
I just placed it there so that you see which port is used. Then we use a batch processor to save some overhead, and the logging exporter to see on the console when telemetry data is arriving. Finally, we need to specify pipelines. This is basically which receiver gets telemetry data and where it should go, so we can have different destinations — different kinds of databases, which we'll have later on.

So you can try and get this configuration using this curl command, or just copy it. Then running a collector is also quite easy: we have this docker command here. We just run it locally, it forwards the ports, gets the collector configuration mounted in, and then just sets it as a parameter.

So how does it look when such a thing runs? Let me open a new terminal. I hope I'm in the right folder — no, I'm not; wrong directory. So now here should be the collector configuration. This looks good. And there it is: it shows us the exporters are loaded, and now it's waiting.

So next we can generate some traces. In case you started locally and you are not on a MacBook, I think there is something different: you can create another container which then does the same as what we see here on top — calling telemetrygen, setting what it should generate, and sending data there. So we just pick traces here. This was again the collector — it seems I picked metrics, not traces. Yeah, but we see here the metrics arrive. So in case we want to see more details.
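For reference, the minimal configuration walked through in this step looks roughly like this — an OTLP gRPC receiver, the batch processor, and the logging exporter wired into pipelines (the exact keys may differ slightly between collector versions, so check the Collector docs for yours):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # default OTLP gRPC port

processors:
  batch:                          # batches telemetry to save overhead

exporters:
  logging:                        # prints arriving telemetry to the console
    verbosity: normal

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```

And a docker invocation along the lines of what's shown on the slide (image tag and config filename are assumptions from the demo setup):

```shell
docker run --rm -p 4317:4317 \
  -v "$(pwd)/collector-config.yaml:/etc/otelcol/config.yaml" \
  otel/opentelemetry-collector:latest \
  --config /etc/otelcol/config.yaml
```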
We can then just go up again in the configuration and change the verbosity, which is there somewhere. Yeah, that's it so far, and now Pavel will show you how to get those things up and running on Kubernetes.

Yeah, so we started with the collector to kind of show you how to use it locally, to play with it — because by playing with it locally you can get it up and running very quickly, and you will see if your configuration is correct or not. So if you are starting with the collector, I would definitely recommend first running it as a Docker container and then using the operator.

So, for the operator. First of all, I would like to talk about what a Kubernetes operator actually is, right? A Kubernetes operator is a component that you can deploy to your cluster, and it will expose new functionality to your users on Kubernetes. This functionality is exposed as a custom resource definition, and it usually hides the complexity of the application — in this case the OpenTelemetry Collector. The operator usually supports the application's upgrades; it can as well fix any kind of breaking changes of that application. Let's say the otel collector breaks the configuration format — the operator is able to fix that on your behalf. And it also allows you to scale the application a bit more easily.

The OpenTelemetry operator solves three use cases. It can deploy the OpenTelemetry Collector — deploy, provision and scale it. Then it allows you to instrument your business applications on Kubernetes; there is right now support for instrumenting Java, .NET, Node.js and Python. And last but not least, it integrates with the Prometheus ecosystem.
It can read the pod and service monitors and distribute the scrape targets across deployed collectors. We'll see that in the metrics section later.

So, for the CRDs that the otel operator manages, there are two of them: the first one is for the collector and the second one is for the instrumentation. The operator itself is a deployment, right? We have to deploy it to the cluster and we have to create the CRDs as well. We can deploy it by creating or applying the operator manifest files from the OpenTelemetry operator release page, or we can install the operator through OperatorHub — there is an install button — or on OpenShift, where there is an operator hub directly built in: you can just type in that you want to install the otel operator and click a button. The otel operator uses cert-manager, which we already installed, so we're gonna just install the operator; the cert-manager step you can skip.

I'm not a Mac user either, so I have to figure out how to change to the terminal. How do I call this? Command-1. Okay, so let's — okay, it seems it's there, and we're gonna check the installation by getting the pods from the operator system namespace. This is the namespace where the operator is installed. Okay, it's up and running. Command-1 again — how do I kill this?
Control-C — Control-C, not Command-C. Okay.

So now I would like to speak about the collector CRD, and then about the instrumentation CRD. What you can see here is the collector CR — one instance of the CRD — and there's a couple of configuration options that you can set. The most important one is the config, and that's the place where you can put the entire collector configuration that we saw in the previous step. Then there is config for the image, and this is the place where you can configure your own distribution of the collector, or use the contrib or the core one from the OpenTelemetry upstream. Then there is a mode, where we tell the operator how we want to deploy the collector: if it should be a Deployment, StatefulSet, DaemonSet or a sidecar. And there's configuration as well for auto-scaling, or for exposing the collector outside the cluster.

For the sidecar: if we create a CR with the mode sidecar, the operator will not deploy the collector. You have to use the annotation and put it on the pod spec of your application, and the operator will inject the collector sidecar into your application.

So now I'm gonna create a collector CR. This is the spec — it's pretty much what we saw in the previous example, but in this case there's a couple of exporters: we're gonna export through OTLP to Tempo, Mimir and Loki. And, you know, Loki is a log system.
So logs are gonna go to Loki, metrics to Mimir and traces to Tempo.

Now we're gonna check if the collector pod was created, and we see it's up and running. The operator not only creates the collector deployment — it also creates the service for the collector. So now I'm gonna get the service, and we see it exposes the OTLP ports for receiving gRPC and HTTP.

Now we're gonna change the CR by using kubectl edit, and we're gonna add the Jaeger receiver. Then we're gonna apply the change, and we're gonna see that the operator will expose the Jaeger ports on the service. That's the wrong command. Yeah, I'm gonna enable just the gRPC and remove all the other protocols. So I have added the Jaeger receiver to the receivers section — but just by doing this the receiver is not enabled; I have to also add it to the service section of the config, and to the traces pipeline, because Jaeger works only on traces. Save the config, and we're gonna get the services of the collector. Copied — yeah, now we see there is also a port for the Jaeger gRPC.

Okay, so this is one kind of functionality of the operator — the collector CR — and the second CR is the instrumentation. As I mentioned before, the instrumentation allows you to instrument your business applications, and this is how the CR looks. The most important config option here is the exporter, where we define the OTLP endpoint of the collector. The operator doesn't automatically configure this for you: you have to first create the collector and then set the collector endpoint in the instrumentation. Then there is configuration for the OpenTelemetry SDK — which is, you know, the sampling configuration or the propagators — or you can as well configure the instrumentation to set any kind of resource attributes.
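For reference, sketches of the two CRs just described — the endpoints, names and versions follow the demo setup and are assumptions; check the operator's README for the CRD versions your operator serves. First the collector CR (abbreviated to a single traces pipeline):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: deployment          # or statefulset, daemonset, sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      otlp/tempo:
        endpoint: tempo:4317   # placeholder backend service name
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/tempo]
```

And the instrumentation CR, pointing the SDKs at that collector:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317   # placeholder collector service name
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1.0"                        # sample everything for now
```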
We will see that later in the demo. Okay, and we're gonna use the instrumentation CR more in the next step, but first Severin will start with the manual instrumentation.

So now we're gonna be instrumenting the application. We're gonna use the manual instrumentation and later the auto-instrumentation, so you see what the differences are, and you will understand how the auto-instrumentation can dramatically simplify the instrumentation work you have to do if you want to get telemetry data. Awesome, cool.

Yeah, let's talk about the application. Let me drink a little bit first. I could also make some music here — like, there are five glasses, so I'm in-between entertainment. So I will talk a little bit about the application first and about the instrumentation itself. Actually, if you want to, stand up for a few minutes and just listen, and then do that.

So what we need now, of course, is a sample application, right? We need something that we run in our cluster. I just want to quickly give you an idea of how this application works — it's not terribly complicated. So think about it as a game of dice, right? You have two backend services, and they choose a number between one and six, most of the time, and they give back this number to a frontend service. And this frontend service just simply tells us, like, hey, the winner of this game is Bob, or the winner of this game is Alice, depending on the people playing this game. This app also comes with a load generator, so this will make sure that we have a lot of traffic going on in our application, in our environment, right?

Okay, so let's talk about instrumentation first. What do we mean when we talk about instrumentation? This is really the moment where you take your code and you add in logs, traces and metrics, right?
So you weave that into your application to make it observable. There are two ways you can do this. You can do it manually — this is the more, let's say, developer-centric approach, and we will walk through this in a minute as well. What you really do is: you initialize your SDK, you tell the SDK where to send the data, and you can even go down and say, okay, here I want to start a span, here I want to end this span, here I want to add a metric. So this is really, as said, a manual process where you do all of those things yourself, and the big advantage is that you can really decide yourself what you want to monitor, what you want to observe, and what you want to add in.

The other approach is automatic instrumentation. One of the big advantages — and you will see this in a minute — is that it works just immediately, right? You can attach it to your application, and for most of the languages it even works without changing any code. Think about it: Java, for example, can use bytecode instrumentation for that; in Node.js we can do some monkey patching. So there are a lot of approaches that can be used here. And this is especially useful if you have a huge deployment of applications that are not instrumented — then you can use something like the operator to just say, hey, instrument all the applications running on my cluster. So think about this more as something you would need if you're an application operator and say: I don't really know what is in this application, but I want to have traces, metrics and logs emitted by it, right?

When we now go into the manual instrumentation — maybe to say this up front, right? — we will do some Node.js programming now. I don't know if all of you are good at Node.js — I'm not; that's not a problem, because later, when we then use the automatic instrumentation and run it directly in the cluster,
we will just use a version that is done for you. The next few minutes are really just for you to get a feeling for how manual instrumentation works, right?

So what we have prepared for that is a frontend application. I will start it in a minute. If you don't have Node.js installed on your machine, you can use Docker for that — so you can bring it up in Docker and run the same commands inside a Docker container, right?

Okay, let's see. Here's the app frontend, right? So let's go back and copy out the thing here. So we go into the app — still, it's there, everything is there. So we now say npx nodemon. nodemon is such a watching service, right? It takes care that, every time we change the code, it just restarts the service. And it's now starting that very simple application.

So let's now go over into the source code. You see, here is this application — it's not terribly complicated. You have here at the bottom your requests being handled; it calls those two backend services and then says, yeah, hey, player one rolls this number, player two rolls this number. But actually I don't really care about that. What I want to do now, here at the top, is really initialize my Node SDK, and then use the instrumentation library for Express and HTTP to get all of that done for me automatically, right?

What we do for that: we go into the OpenTelemetry documentation. So let's go there in a minute. Right, so we go here into the JavaScript getting started, Node.js instrumentation. We can skip a few things because, as said, we already have our sample application — so we can skip that top part, where you just have a very simple Express application; we have this already. In the very first step we install those dependencies. This should work fairly quickly, because... in that Node we are running — let me find it again. It's this one, right? No. Where's my — I needed to put it.
Yeah, I'm sorry. Where's the — this one. Yeah, I just need to stop the process, of course, and put this in here. Yeah. So this installs, as said, the SDK, the API, some auto-instrumentation libraries, and some SDK again for the metrics.

And then in the next step, as said, we can make use of that in the application itself. The documentation here says: hey, create an independent file for doing that and use require to load that module from the CLI. But you can also really copy out what we see here — just take it and put it into your source code. Let me jump over here again. So you go to the top of this index file and you put it here, and you see: let's load all the modules we need — the Node SDK, the exporters, the auto-instrumentation — and then also let's initialize all of that. For the start we use a console exporter for traces and metrics, we use all the instrumentation libraries, and then we just say, yeah, start the SDK. So let's save this, and then let's see if it's working. Let's run the node command again. If we are lucky — it's just taking a little bit longer.

Okay, so you already see it's dumping out a bunch of traces that are generated in the beginning by the instrumentation of the fs module. But what we can now easily do: in another terminal we can send a request to it — I hope that goes well now, right? Yeah, don't worry about the internal server error; the thing is, there's no backend service, so it's complaining about that. But what we should see now here, right — you see traces, or spans specifically, being dumped to the console.

And now, of course, the next step is that we want to have that emitted to the OpenTelemetry Collector, right? And what we do for that: we need to replace the span exporter and the metric exporter here with something that uses OTLP. Let me go back to the docs.
There's documentation as well that gives you all those exporters, right? So here, in the case of Node.js, it's the OTLP exporters and the Zipkin exporters, and again we can simply copy out a little bit of that code. So we take the things here at the top and we go back to the code. Yeah, it scrolls in the wrong direction sometimes — sorry for that. And you see, I just add those modules in. The only difference we make: we use gRPC, right? There are different modes in which you can use the OpenTelemetry exporter, but we will use the gRPC one here, right? Here — let's copy that — the trace exporter, the metric exporter, also put it like that. And then we should be able to send that to the — how is it called — the collector, right?

So the only thing we have to do differently now: we need to set the right environment variable, which I don't have in my head right now. Let me check this quickly in the code again — or we can do it like this: we can simply change the configuration of, for example, the trace and the metric exporter. So we can put this here inside. Our collector is running under the name — yeah, there's the slash, there it is: otel-collector. You see, I'm normally using a different terminal and keyboard layout. And then the same for the metric exporter, right, so we can put it here. This should work just fine for both of them; they just use the same endpoint. Oh yeah, we need to set the port of course, which is 4317 — and 4317 here as well. I think we should remove the — we want traces here, for gRPC. Let me do this quickly, and then we restart the application. Give it some time to start up again. Yeah, let's see — it's not sending anything here. And here's the collector — you see that the collector is now receiving spans, right? So if you look here, it's receiving those spans. We can also send some requests using curl again to that service.
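Putting this step's pieces together, the bootstrap code looks roughly like this. This is a sketch, not the exact file from the demo repository: the collector hostname `otel-collector` and the use of a separate `instrumentation.js` file are assumptions, and the package names come from the OpenTelemetry JS getting-started docs:

```javascript
// instrumentation.js — manual Node.js setup, loaded before the app
// (e.g. node --require ./instrumentation.js index.js)
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');

const sdk = new NodeSDK({
  // "otel-collector" and port 4317 (OTLP gRPC) follow the demo setup
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4317' }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4317' }),
  }),
  // patches Express, HTTP, fs, etc. without code changes
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

To switch back to console output for local debugging, you would swap the OTLP exporters for the console ones, as in the earlier step.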
It's again still giving us this internal server error. We don't see the spans anymore here on the console; what we see, a little bit hidden here, is that the span ID and the trace ID are injected into my log lines. And at the same time you see the spans — the 27 spans you see here at the bottom are the ones I just generated using curl, right? Cool.

Quickly again, here's the source code. It's fairly simple: you really just initialize the SDK using those trace exporters, and here getNodeAutoInstrumentations is doing most of the magic for you, right? It's loading a lot of instrumentation libraries. But what you can then also do, if you go down a little bit in the source code, is add in your own OpenTelemetry metrics. For example, we can add some counters for how often the game has been played, or how often each player has won. We can also add some additional span attributes. So those are things you can do with manual instrumentation. I left in some TODOs for you — we will not go through them, but if you review all of that later for yourself, you can play around with it and say, okay, how can I replicate those things, right?

I think the next step that remains is that we deploy our application to the cluster, right? So this was, as said, the way you do manual instrumentation, and the situation we want to introduce you to now is that we have the frontend manually instrumented. As said, we will use something we prepared, but you could technically now use the code you created in a new container image. And then, for the other applications, we will see the auto-instrumentation later, right? So let me now deploy everything to the cluster. Let me go into the right directory again. I think I can stop the application now; we don't need it anymore. So we have deployed all the applications, right?
Let me go back, and then let's see how everything looks. Right, so you see there's the two backends, the frontend, the load generator — all of that is being created right now. It takes a little bit, and then we should be happy and up and running. You could now also start an additional proxy to expose the frontend service and play some games on your own, but since we have the load generator running, we don't need that. Yeah, and the next part is: how can we auto-instrument the remaining services?

Yeah, and for the auto-instrumentation we're gonna use the CRD. What the CRD actually does: it uses the OpenTelemetry auto-instrumentation agents and injects them into our workloads — but we're gonna see that in a minute. So let's first create the instrumentation resource. This is how it looks. I set the exporter endpoint to our otel collector that we deployed earlier in the observability backend namespace, and that's pretty much it. And we're gonna sample all the requests, so we're gonna keep all the telemetry data in this instrumentation step.

So the CR has been created, and now we're gonna instrument the frontend, backend-one and backend-two. The frontend is already instrumented, but it's not configured. So right now, if I take a look at the logs, it should be printing something to the standard output. What is the namespace?
tutorial-application. So maybe I can write something like this. Yeah, and we see — we actually see logs. But you have to trust me that the application is instrumented and just sending the telemetry to the console. So now we're gonna use the operator to inject the SDK configuration.

So first of all, we're gonna take a look at the pod of the frontend application, how it looks. As you can see, the auto-instrumentation is enabled, and we are just setting the URLs for backend-one and backend-two; there is no other configuration for the OpenTelemetry SDK. And now we're gonna apply the instrumentation annotation with value true, which means the operator will use the only instrumentation CR that we created in the namespace. The annotation is instrumentation.opentelemetry.io/inject-sdk, which tells the operator to inject just the SDK configuration — it will not change the deployment and it will not inject any kind of libraries into this container; it will just inject the configuration.

So let me go back and apply the annotation. If I apply the annotation to the pod spec in the deployment, Kubernetes will restart the pod. Now I will get the pod spec again, and we're gonna see what the operator configured. So the operator in this case — it seems like it didn't do anything. Okay, so maybe there is some — why are there two pods? Yeah, the old one is still there. So the output has a list of two pods for the frontend, and so this is the second one.
This is the one that has changed, and we see that the operator configured the OpenTelemetry service name — which is the frontend deployment name — the endpoint for sending the OTLP data, then the Kubernetes resource attributes, which are in this environment variable, and as well the propagators and the sampling configuration.

So now I should be able to access traces in Grafana. Yeah, and we are getting some traces for the frontend application. Right now there is only a single span, and we see that in the resources we are getting the Kubernetes resource attributes to describe, you know, what is the container name, deployment name, namespace and so on.

Okay, now let's instrument the backend-one service, which is a Python service. We're gonna do the same workflow: we're gonna get the pod spec and see how it looks — there is pretty much no configuration — and we're gonna apply the inject annotation again, but in this case we're gonna choose to inject Python. So the operator in this case will inject the Python instrumentation libraries, but will configure the SDK as well. Okay, so now I should get the new pod.

So how does it work, right? How are the instrumentation libraries injected? They are injected by using an init container. The operator will change the pod spec and add the init container, and this init container just copies the auto-instrumentation libraries for Python into a directory called otel-auto-instrumentation, which is a volume that is also mounted into your application container. Then it configures the Python runtime to use those libraries, so Python will start, load the auto-instrumentation, and then load your application. It's using the PYTHONPATH in this case to tell Python that it has to load the libraries, and then again the same otel SDK config. We can access traces for backend-one.
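The injection just described ends up looking roughly like this in the mutated pod spec — a sketch with illustrative names; the exact image, paths and container names are chosen by the operator version you run:

```yaml
initContainers:
  - name: opentelemetry-auto-instrumentation
    # copies the Python auto-instrumentation libraries into a shared volume
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    command: ["cp", "-a", "/autoinstrumentation/.", "/otel-auto-instrumentation/"]
    volumeMounts:
      - name: opentelemetry-auto-instrumentation
        mountPath: /otel-auto-instrumentation
containers:
  - name: backend1          # the application container
    env:
      - name: PYTHONPATH    # makes Python load the injected libraries first
        value: /otel-auto-instrumentation
    volumeMounts:
      - name: opentelemetry-auto-instrumentation
        mountPath: /otel-auto-instrumentation
volumes:
  - name: opentelemetry-auto-instrumentation
    emptyDir: {}
```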
Yeah, we see there are some traces already. And Java — very similar. I will not get the original pod; we're gonna just apply the inject annotation and see how it configures the JVM. It's again a list of pods, so the first one should be the one that has changed. And again we see the init container that is copying the Java agent, and then the operator is configuring the JAVA_TOOL_OPTIONS environment variable — the JVM uses this environment variable for its configuration, right? And in this case we are setting the Java agent. The Java agent, again, is loaded into memory and then it sees all the classes as they are being loaded, you know, from the application.

All right, now we have instrumented all the applications, so we should see the entire trace, starting in the frontend and finishing in the backends. We don't see the frontend service — yeah, but it should show up here anyway, on the left. It's six minutes ago, so we are not getting any traces from the frontend. Let's take a look. The frontend seems to be running. Let's take a look: it has the config for otel, the endpoint looks good, sampling looks okay as well. I will just try to delete it — and I was just going to say okay. This one, yeah — so it seems to work. Maybe there is a delay in getting these spans through the collector and then to Tempo, and maybe it just takes some time to propagate them and make them available for query.

Okay, and so we ended up with this architecture: we have the frontend service, backend-one and backend-two, and all of those are reporting telemetry data to our otel collector, which is then sending each individual signal to a different backend — metrics to Mimir, logs to Loki and traces to Tempo.
So we have already seen this in Grafana, and now we're gonna take a look at different use cases: how we can improve our instrumentation and get a bit more value, or solve different problems. We're gonna start with the resource attributes, which are very important. In a Kubernetes environment we should understand where the data is coming from, right? So we should be getting attributes to identify the deployment name, the container name, the pod name, the pod UID, and all these kinds of Kubernetes attributes.

There are different ways to do that. If you use the Instrumentation CR, the operator knows the pod, it can figure out all this data for you, and it will inject those resource attributes as environment variables into the application container. We are using this approach in the demo. The second approach is to use the collector CR and configure the Kubernetes attributes processor. In this case we can deploy the collector as a deployment and, you know, have multiple pods sending data to this collector, and the attributes processor will recognize where the data is coming from, call the API server, and get the attributes for you. The last approach is when you're using again the collector CR but deploy it as a sidecar. In this case the operator will again set the attributes as environment variables, and you just need to configure the resource detection processor to consume that environment variable and set the attributes on the data. There is as well a link to a blog post that describes how to set it up.

What we're gonna do right now is update the Instrumentation CR and enable collection of the Kubernetes UID attributes. We are getting the pod name and deployment name, but we are not getting the UID ones. So I need to go to the resource section, I think it's this one, and I'm gonna just copy it.
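For the second approach, a Kubernetes attributes processor fragment might look like this sketch (the surrounding pipeline components are assumptions):

```yaml
# k8sattributes processor sketch: enrich telemetry with pod metadata
receivers:
  otlp:
    protocols:
      grpc: {}
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
exporters:
  otlp:
    endpoint: backend.example.com:4317   # hypothetical endpoint
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp]
```

The processor matches incoming data to a pod (for example by source IP) and queries the API server for the listed metadata, so the collector needs RBAC permissions to read pods.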
I have changed it, but actually nothing happens, because the change is not propagated to the workloads: the attributes are set as environment variables, so we need to restart our applications to let the operator change the environment variables and inject the new configuration. Now we can go again to Grafana. Let me just first check if everything looks okay. The new pods are running. Yeah, let's try this one, for instance. And now in the resource attributes I should see the UID attributes. Yeah, we are getting them for the deployment, pod, and replica set. Okay.

The next use case is sampling. Sampling is a technique for deciding what amount of data we want to store and send to the backend. In our previous setup we were using 100% sampling, so we were storing all the telemetry data that we collect. We're gonna change that and, you know, save only 25%. So we again edit the Instrumentation CR, and in the sampler we edit the argument and set it to 0.25. Again, the sampling configuration on the SDK is set through an environment variable, so we have to restart the workloads once more. Seems okay. If you would like to see all the possible configuration options for the type and argument, there is a link to the SDK config.

Now I'm gonna open the Grafana dashboard for the collector, and we should hopefully see that the amount of received traces decreased. So this is the last 15 minutes, and we see that it is receiving fewer and fewer traces; over time it should stabilize at around 25% of the original value. This approach works, but you will probably have to experiment with sampling quite a bit when you instrument your applications, and restarting your workloads is not ideal. In the OTel project there is a sampler called the Jaeger remote sampler, which allows you to change the sampling without the restart. The way it
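The sampler change in the Instrumentation CR can be sketched like this (the CR name is made up; parentbased_traceidratio is one sampler type that takes a ratio argument):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation       # hypothetical name
spec:
  sampler:
    # keep 25% of traces, respecting the parent span's sampling decision
    type: parentbased_traceidratio
    argument: "0.25"
```

Because the operator turns this into SDK environment variables, the change only takes effect after the instrumented workloads are restarted, as shown above.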
works is that you deploy the OTel collector, and there you configure an extension with the sampling configuration; the SDK will then connect to the collector and can receive and update the sampling config.

All right, so the last topic I want to cover here is data manipulation. As Ben has shown us, the collector is a pipeline, and in the middle you can configure processors. These processors carry a lot of functionality, and what they usually do is manipulate the data, which can be super useful for, you know, removing any sensitive data, or for extracting new data from already existing attributes. There are a couple of processors that offer this functionality, but we're gonna use the attributes processor, and we're gonna use it to extract the player name from the HTTP target attribute.

So maybe let's go back to Grafana and see how the HTTP target looks. It's not in the resource attributes, it's in the attributes. In this case, you know, this is the root endpoint, there is just a slash. But if I go, I believe, to the back-end, it will look like this: /rolldice?player=Christina. We want to extract "Christina" into a separate new attribute, and this could be useful if we then need to query by the player name. So I'm gonna edit the collector CR and just copy the config into the processors section, and I have to as well put the attributes processor into the pipeline for traces. Let's see if the collector was restarted. Looks perfect. Now we can go again to Grafana, and we should see the player attribute somewhere. Yeah, it's actually me. That's cool.
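An attributes-processor snippet for this extraction might look like the following sketch (the regex assumes a query string of the form ?player=NAME; the surrounding components are assumptions):

```yaml
# attributes processor sketch: pull the player name out of http.target
processors:
  attributes:
    actions:
      - key: http.target
        action: extract
        # each named capture group becomes a new attribute; here, "player"
        pattern: ^.*\?player=(?P<player>.*)$
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp]
```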
Okay, so, yeah, what I wanted to show you here essentially is that the collector is highly configurable, and there is a bunch of processors that give you a lot of functionality to manipulate the data. In the next section we will see how you can even use the collector to extract metrics; we will show how to extract metrics from traces.

Hi everyone. So before we extract metrics from traces, we're going to look at the auto-instrumentation that we did earlier. When we added auto-instrumentation, it didn't just affect traces; we also got logs and metrics from it. These are the metrics that we are now getting because of the auto-instrumentation we added before. The front-end actually has some instrumentation, so we can see these games, as well as who is winning and how many times; nobody can also win, we can tie. But from the auto-instrumentation we mostly see these HTTP metrics, for the front-end, back-end one, and back-end two. We can see our server responses, which are 200, which is great, as well as the request duration. And then for back-end two, since it's using the JVM, we also get some CPU and memory usage.

One thing you might notice is that back-end one has Prometheus metrics, but since we're not currently scraping back-end one, we're not getting those metrics in Grafana. The collector can collect metrics using a scrape config like you can see here, where we list the job name and provide targets, and then the collector goes and scrapes them. But if you are deploying a lot of services, or your service makeup is changing often, you probably don't want to go in, modify the configuration, and redeploy your collector every time. So Prometheus has something called service and pod monitors, which allow you to watch your services and pods; they then update the collector so it knows what to scrape. So here is what we need in order to use the pod and service monitors.
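A static scrape config inside the collector's Prometheus receiver, as described, might look like this sketch (the job name and target are made up):

```yaml
# prometheus receiver sketch: statically scraping one service
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: backend1                        # hypothetical job name
          scrape_interval: 30s
          static_configs:
            - targets: ["backend1-service:8080"]    # hypothetical target
```

This is the approach that requires editing and redeploying the collector whenever targets change, which is what the service and pod monitors avoid.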
We just need to install the CRDs, so we'll do that quickly now. And then when we check our CRDs, we can see both pod monitors and service monitors there.

The way that we use the collector and the service and pod monitors together is with a service called the target allocator. It uses the monitors to discover targets, and then it splits up those targets among all of the collectors in the StatefulSet. Previously, when we deployed the collector, it was as a deployment; the main difference in the CR that we're going to deploy now is that it's a StatefulSet, along with various configuration for the target allocator. We need to enable the target allocator; there are a few different allocation strategies for how you want the targets from the scrape configs to be allocated to the collectors; and then things like the image and the number of replicas. The prometheusCR setting is important: that's what enables watching the service and pod monitors and using them in the target allocator.

Another important piece is that we need to use the Prometheus receiver in the collector itself that we're deploying. So we want to set that up, and it has a target allocator configuration: we just need to tell it what endpoint to hit to get its scrape config from the target allocator. When we apply this chart, we'll deploy our new collector as well as the target allocator for it, and we also need a special cluster role that grants the permissions the target allocator needs to see the service and pod monitors. Oh yeah, and we also need service monitors, so we'll add those here. We are monitoring the back-end one service, which we know has the Prometheus metrics, and you can monitor the collector and target allocator as well, so we'll be doing that.

So now that we've deployed everything, we can see our new OTel collector is called otel-prom-cr-collector. We have three of them in our StatefulSet, as well as two target allocators. Yeah, and everything's running, so we can check.
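Putting those pieces together, an OpenTelemetryCollector CR with the target allocator enabled might look roughly like this sketch (names and endpoints are made up; the field names follow the operator's targetAllocator section and the Prometheus receiver's target_allocator support):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-prom-cr               # hypothetical name
spec:
  mode: statefulset                # target allocation works across a StatefulSet
  replicas: 3
  targetAllocator:
    enabled: true
    allocationStrategy: consistent-hashing
    prometheusCR:
      enabled: true                # watch ServiceMonitors / PodMonitors
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs: []       # targets come from the allocator instead
        target_allocator:
          endpoint: http://otel-prom-cr-targetallocator
          interval: 30s
          collector_id: ${POD_NAME}
    exporters:
      prometheusremotewrite:
        endpoint: http://mimir:9009/api/v1/push   # hypothetical endpoint
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheusremotewrite]
```

Each collector pod asks the target allocator for its own slice of the discovered targets, which is why the per-pod collector_id matters.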
Oh, we should check our service monitors, too. Yep, they're all here: one for the collector, one for the target allocator, and one for the back-end one service. So now when we go back, well, this may take a few seconds. Usually I see it first in the collector graphs. We have a scrapes graph; previously the OTel collector that we first deployed was scraping just itself, but now we can see additional scrapes of the new targets that we've just added. Oh, it's very small on your screen. Yeah, so we can see scrapes, as well as a slight increase in the Prometheus metrics scrape duration, and other collector metrics.

When we go back to our app, oh, there we go, now we can see metrics from back-end one. So somebody has already come in and had Prometheus metrics running on this back-end one, and now we can see them in Grafana. We can see what dice rolls we're getting per second, and from the numbers we can see that some people are cheating, with sevens and eights on our six-sided die. We also get some regular Python garbage-collection and CPU metrics. Since we're also scraping the target allocator, we get metrics from that too, so we can see any events that happen for service discovery, or failures, and the targets per collector. It's pretty evenly split between our three collectors that are running, how many targets they scrape. We've found all three collectors, which is great, and everything's looking good.

So the final thing we want to look at for metrics: in the collector dashboard you may notice that there are span metrics graphs, but we don't have those metrics yet. Span metrics is a connector that we can configure in the OTel collector that can transform spans into request, error, and duration metrics. A connector is just a special component that can consume data as an exporter in one pipeline and then act as a receiver in another pipeline. So in this case it'll be an exporter in the traces pipeline but then a receiver in the metrics pipeline.
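The wiring just described, with the connector acting as exporter in one pipeline and receiver in the other, can be sketched as follows (the other component choices are assumptions):

```yaml
# spanmetrics connector sketch: bridge the traces and metrics pipelines
receivers:
  otlp:
    protocols:
      grpc: {}
connectors:
  spanmetrics: {}
exporters:
  otlp/tempo:
    endpoint: tempo:4317                          # hypothetical endpoint
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push       # hypothetical endpoint
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo, spanmetrics]   # connector consumes spans here...
    metrics:
      receivers: [spanmetrics]               # ...and emits RED metrics here
      exporters: [prometheusremotewrite]
```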
So right now all of our traces are going to the original OTel collector that we deployed, so that's what we want to modify to add the span metrics connector. Then we need to add it to the pipelines: it's an exporter in one and a receiver in the other. Now that we've modified this, we just need to restart our collector, and then hopefully we will see new metrics; we just wait a few seconds. Here they are. So now, from the spans we saw before, you can see the GET on /rolldice on our back-end servers, as well as the GET / for the front-end. We can also see back-end one, since we've added scraping on the metrics endpoint, so we also see that in our server calls. And then client calls: our only client is the front-end, so that's what we see here. Other internal calls are also included; basically, any spans that it can convert, it will, and then you can see them as metrics. So next, Yuri is going to talk about OpenTelemetry for logs.

Thank you, Christina. Before we start talking about logs, let me just close... oh, you just closed the tabs, okay, thank you. As metrics and traces are covered, and as OpenTelemetry is capable of handling the logs flow as well, I'll give as an example the filelog receiver, which can be configured to read logs from a file. The filelog receiver is just one of many different receivers that we have in the opentelemetry-collector-contrib repo. Let me just open it to have a look. As you can see, we have different folders, divided into exporters, receivers, and so on, and we should go to the receiver folder; you can see, like, the Apache receiver and so on and so forth. We've opened the filelog receiver folder.
You're going to see basically a doc on how to use it, and you can see the definition in part of the page: okay, you have to define which files will be included in your receiver, basically defining your collector instance, and how you use a regular expression to read the logs.

Going back to the example, I designed this diagram: we have a file input, a regular expression in the middle, and then basically the logs flow out to the Loki instance that we just deployed with my colleagues. Why does the filelog receiver have this parser as a regular expression? Because the logs can be formatted by the container runtime, and you configure the regular expression for whichever container runtime you are using to run your workload.

I will deploy a new instance of our OpenTelemetry collector, now using the DaemonSet deployment mode. Let me just apply that. Okay, as we can see, we have a new DaemonSet running in the observability-backend namespace, and if you want to check the DaemonSet running, yeah, just check it out through this command that is described in the repo. Okay, just have a look.
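A DaemonSet-mode collector with a filelog receiver could be sketched like this (the name, paths, regex, and Loki endpoint are illustrative assumptions; the regex shown targets the CRI log line format, and other runtimes need a different one):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-logs                  # hypothetical name
spec:
  mode: daemonset                  # one collector pod per node reads local logs
  volumes:
    - name: varlogpods
      hostPath:
        path: /var/log/pods
  volumeMounts:
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
  config: |
    receivers:
      filelog:
        include: [/var/log/pods/*/*/*.log]
        operators:
          # parse the CRI (containerd/CRI-O) log line format
          - type: regex_parser
            regex: '^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<flags>[^ ]*) (?P<log>.*)$'
    exporters:
      loki:
        endpoint: http://loki:3100/loki/api/v1/push   # hypothetical endpoint
    service:
      pipelines:
        logs:
          receivers: [filelog]
          exporters: [loki]
```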
So, there we go. We should see the logs on the Grafana dashboard now, coming from the OTel collector instance configured with this filelog receiver. Let's go to the dashboard here, and if I open it, for example when I roll the dice, I have all that information.

Just to give you also an option, because I talked a lot about this filelog receiver: OTLP is also capable of receiving and transmitting logs using the native protocol. So if you simply don't want to read the logs from a file, you can use the OTLP protocol. That's basically what I have for the logs.

After that, I will just give some bullets regarding what is coming for the OpenTelemetry operator. We have five, four actually, different topics, or different challenges, for the OpenTelemetry operator. Nowadays we have auto-instrumentation for .NET, Python, Java, and Node.js applications, and we are planning to release the auto-instrumentation for Golang applications. Also, as a second challenge, we have the Open Agent Management Protocol (OpAMP) bridge, which Christina's team is also working on; we have this as the second bullet for the roadmap. Sorry, I skipped ahead on the roadmap.
I missed the point. And for me, the third one is simplifying the operator CRDs. When we have a CR configured for a collector instance, we have to define the whole config, including the receivers, processors, exporters, and the pipelines. As a concept, we are trying to develop some opinionated CRDs: for instance, a CR with the receiver configuration, another CR for the exporter configuration, and so on and so forth. For now we are just accepting new ideas, and I'd like to invite any one of you to help us develop that simplification of our CRDs.

Also, as Pavel mentioned, we have four different deployment modes: sidecar, StatefulSet, deployment, and DaemonSet. All of them except the sidecar get reloaded when we change the configuration; the sidecar doesn't. So, as a final bullet for our roadmap, we are working on this: when we deploy a collector as a sidecar and the configuration changes, we will reload the collector instance. Okay, this is our challenge for the next months, and I'd like to invite anyone again: if you want to contribute, we have the opentelemetry-operator repo. And I guess that's it. Thank you all. Do you have any questions?

Yeah, sorry. Yes, I noticed the versions are specified for the collectors. I assume if I upgrade OpenTelemetry, I would have to go through the individual collectors, check their versions, and maybe update them too, I guess, right?

Yes, so do you mean the image for the collector, right?
Yes. Yeah, so if you don't specify the image, the operator will use the default one, which is the one for the core distribution, and this one is set as a flag, I think, on the operator deployment. So you can override it with the contrib distribution as a flag on the operator, and once you update the operator, all the instances will be updated. You don't have to then, you know, go and change the image field.

So it would be awesome if it were possible just to override the image name or the tag, so I could get this auto-update feature but use my own repository.

Exactly, yeah. You just specify the collector distribution on the operator; I think it's a flag. Okay, that's awesome. Thank you.

Any other question? Yeah, there is one. Thank you for explaining the whole structure of the project. I have a question regarding the CRDs responsible for injecting instrumentation into the pods: is there a possibility of using extensions, or custom instrumentation, as well in these CRDs? For example, in the case of Java, when we have a jar with the agent, we also need separate jars with our own developed extensions; at least that's how I approach it in my project. Is there like a dedicated field in the CRD, or a plan to introduce that?

There isn't one, but the Instrumentation CR has an option to configure the init container image that contains the agent. So you can build your own init container image with the Java instrumentation and your extensions. But it's not gonna work, because the operator will copy just the Java agent, so the answer is no: it's not gonna work if the extension is a separate jar. Right now there is no way to do it. But if one embeds the extension and compiles their own agent, then it would work.

So there is a field to customize something like this? There is a field to customize the init container with the Java agent. Yeah, but it's a good proposal.
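As a sketch, pinning the image for a single collector instance looks roughly like this (the names, registry, and tag are made up; the operator-wide default is changed via an operator flag instead):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector                              # hypothetical name
spec:
  # per-instance override of the operator's default collector image
  image: ghcr.io/example/otelcol-contrib:0.88.0   # hypothetical image
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```

Leaving spec.image out keeps the instance on the operator's default, which is what gives you the auto-update behavior discussed above.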
Maybe if you open an issue on the operator, we can take it from there. All right, thank you. Any other questions?

Yeah, hi, thank you for the presentation. I have a question about serverless. You know, how can the OpenTelemetry collector play with, for example, a Node.js AWS Lambda application? What would be the approach to, I don't know, flush the spans and send them to the collector? And is there any out-of-the-box solution that supports serverless?

So with the operator there's nothing related to that, but there is a project right now really looking into serverless instrumentation. I would recommend that you reach out to that project. There's a lot of things going on right now; that's the best answer I can give you. So look into the, I think it's called, otel-faas Slack channel over in the CNCF Slack. They are very much looking for people asking those kinds of questions and giving feedback. All right, thank you. But I believe the SDK should flush the in-flight data before it's shut down.

I tried this, and the Lambda doesn't wait for the flush, even if I try to await the flush function. So I did try this, and yeah, I used the batch span processor, but I'm not sure why it's not, you know, actually waiting. I did see some tickets about this; that's why I wanted, like, a best-practice approach to AWS, or to serverless anyway. Thanks.

Thank you for the talk. I was wondering, first of all: I think most of the time we want both automatic and manual instrumentation, right? So we want the basic automatic instrumentation, but also custom business metrics, for example. In that case, what would I have to do to get both? Do I just deploy the automatic instrumentation and add manual metrics somehow, or do I have to do the fully manual approach?
So if you look at the front-end application that we created in the demo, it already has some OpenTelemetry in it, even before we did the instrumentation with the Node SDK. And this is actually what you can do: when you call the API and no SDK is loaded, just nothing should happen, right? And you can even combine this with the auto-instrumentation. Also, we have the OpenTelemetry demo: if you go into the OpenTelemetry docs and look at the demo, some of the applications are actually doing exactly that. There are some Java applications where business metrics and business attributes are being collected, and then the Java agent is used for the automatic instrumentation. So you can do some mixing of that. There are sometimes situations where things work not as you might expect, but then I ask you to rely on the documentation, or reach out to the individual project and ask.

Okay, thank you very much. And just one more thing I was wondering: we just added a lot of components, but we also had logs and metrics at the beginning of the setup, without adding the collectors. I was wondering how that worked. We kind of had metrics in the Grafana dashboard, in Mimir, before adding the Prometheus stuff.

Yeah, okay. So most of the languages right now already expose a bunch of traces and metrics. If you look at Python and Node.js, they give you some HTTP metrics for some standard frameworks just out of the box. What we also wanted to show you with the Prometheus work is that you can mix and match, because this is one very important part of OpenTelemetry: it really helps you on your journey with observability, and if you already have something like Prometheus, then you can just take that. And a really cool thing is, I think, that the Java agent is doing this already.
It's also giving you log auto-instrumentation, right? So it's looking into your code, looking for your log framework, and then turning that into OTLP as well. This is also something to stay tuned for: in the next few days and weeks, some stuff will come and make those things a lot easier for you. Great, thank you.

We also didn't have metrics at the beginning, until auto-instrumentation was added. So I want to make that clear: auto-instrumentation is what allowed us to get any of those app metrics. All we had were collector metrics, from the collector scraping itself. Same for logs.

So, one more question: can you say something about workloads on virtual machines, how to get the logs into the collector? I'm sure there is an approach; we don't work with virtual machines. All right, thanks.

I mean, what you can of course use is the filelog receiver, right, that Yuri showed you. You can of course also use this on a virtual machine. And if you go into the collector-contrib repository, there are a few other log receivers as well. It's really the same with the metrics: you can use the protocols that you already have to scrape those files, or any other source of logs. And not only logs: there is also, what is it called, the host metrics receiver, which extracts metrics for you from the host, and you can enable something like this. You can also mount the log file as a volume and basically tell the DaemonSet of the collector: look, read the file from this volume.

Thank you for the cool tools and the comprehensive tutorial; even though it's Friday afternoon, this was great. One question regarding the auto-instrumentation: I've been listening to the, what was it?
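On a VM, a plain collector config with the host metrics receiver might look like this sketch (the scraper selection and OTLP endpoint are assumptions):

```yaml
# hostmetrics receiver sketch: collect system metrics from a VM host
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      filesystem: {}
exporters:
  otlp:
    endpoint: collector.example.com:4317   # hypothetical central collector
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [otlp]
```

The same standalone collector could also run a filelog receiver, as discussed, to ship the VM's log files to the central collector.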
I guess Pixie's talk earlier, and they are doing this with eBPF, I guess. I just wanted to know if you could, like, compare the solutions, the pros and cons. I prefer the approach that you presented, to be honest, and I'm already using it; I just wanted to know if you see maybe some pros or cons in this. Thanks.

I think it depends on the language. For instance, the Golang auto-instrumentation from OTel is based on eBPF, right, because Golang is a natively compiled language. For Java, it doesn't make sense to me to write eBPF agents, because then you would have to understand how the virtual machine works, and it's just much easier to write the bytecode manipulation for that, which is the technology the current Java agent uses. It's important in auto-instrumentation to get the context from the runtime, so to understand, you know, what is the handler name, what is the HTTP path, and all this data; at the eBPF level that is maybe gonna be way trickier than on the JVM level, if you are just comparing, like, Java. Okay, thanks.

And if you don't use the auto-instrumentation, you can also customize how rich your data is, and basically those are, I would say, the pros and cons. Okay, thanks. Any more questions? So, yeah, thank you very much.