All right, so this talk is about distributed tracing with OpenTelemetry and Knative. I was supposed to give this talk with my colleague Daniel, who was actually meant to be the main speaker, but unfortunately he couldn't make it. So if you don't like this talk, just blame him, okay? A few words about myself: my name is Kevin Dubois, I'm based in Belgium, I work for Red Hat as a developer advocate, and I also contribute to some community projects. You can find me on all those channels. So this talk is about observability: being able to see what's going on in our applications. But let's take a step back into the past first. This soccer player, or football player depending on where you're from, is Silvio Piola. He played for Lazio, which is my team. There's a lot going on in this picture: there's a guy in the back lying with his legs up, and our main player, who may or may not have kicked the ball. Maybe he's about to score; maybe he got pulled down by the other player. We don't know, because this happened in the past. There's no recording of it, no data, no telemetry about what happened here. Fast forward to a few years ago. I don't know if anybody remembers this game, Japan against Spain; it was a pretty exciting one. In this instance the ball seems to be out, and Japan went on to score, I believe. But thankfully, in this case we do have data. We can go back, see what happened, and verify whether the ball was really in or out, because we have the exact image that shows the ball was in. It still doesn't look like it, but there's a tiny sliver still within the line, and in football the rule is that the whole ball has to completely cross the line for it to be out. So this was a valid goal.
And thanks to having observability, we can verify that that's actually what happened. So what does this have to do with software? Well, we also want visibility into what happened, and we want to be able to go back and see it. If an end user or somebody from the business comes and tells us, hey, there was an issue with your software, we want to be able to find out what happened: to know exactly, for this particular request, this is what occurred. That's what telemetry and observability are all about. From a developer perspective, we want to know: what is the health of my application? Is the performance okay? If there's an error, what was the root cause of the error or the defect? And where are the performance bottlenecks; are certain requests taking a long time? Especially in distributed architectures, where one service calls another service, which calls another service, how can we trace through all of that and see how a request moved through our system, what happened to it, and how long each component took, and then drill down into the details? That's the ideal world of observability and tracing. In terms of observability's main components, there are three main pillars. First, metrics, where we find the numbers we're interested in: how long did something take, what is the memory usage of my JVM, what is the garbage collector doing if it's a Java application. From those metrics we can find out whether there's an issue and how to improve the application. Then logs; most of you are probably familiar with logging, so having access to the data we export as log records. And finally traces, being able to trace a request through our application, which is the main component we're going to talk about today.
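Since traces are the pillar we'll focus on, it may help to see roughly what a single span, one step inside a trace, carries in the OpenTelemetry data model. This is a loose, hand-written sketch, not the exact OTLP wire format; the IDs and attribute values are made up, and the endpoint name mirrors the demo later in the talk:

```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7",
  "parentSpanId": "0020000000000001",
  "name": "GET /places",
  "kind": "SERVER",
  "startTimeUnixNano": 1700000000000000000,
  "endTimeUnixNano": 1700000000002000000,
  "attributes": {
    "http.request.method": "GET",
    "db.system": "postgresql"
  },
  "status": { "code": "OK" }
}
```

Every span points at its parent via `parentSpanId`, and all spans for one request share the same `traceId`, which is what lets a backend reassemble the tree you'll see in the demo.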
So, being able to follow the trace of a request and see exactly what was going on, down to which line of code is having issues and which request caused the damage. Where do we get started with this? If we look at the Cloud Native Computing Foundation's landscape, there are many tools out there. A lot of them are really good, but you have to pick and choose, and there are almost too many choices. And once you've made your choice, many of the more legacy tools are not open source, or only certain components are. Maybe you have one tool for instrumentation, for integrating the observability stack into your code, something else for collecting that data, and yet another for processing it, and you end up locked in to that provider. Especially if you've had to instrument your code with that vendor's libraries and integration patterns, switching to another provider becomes a mess. For the rest of the observability stack it's a bit less complicated, in the sense that migrating from one backend to another isn't necessarily that bad, but having to refactor our code is not good; that's going to be a very expensive project. So we want the instrumentation side of our observability to be open source, ideally. Fortunately, the industry has converged on the same conclusion: it's better to use open source, which means people using a different provider can still come to my project if I'm running a project, and it also means we can work in an open way. So we don't just create open source software, but also open standards.
And so the industry created a new open standard for telemetry. Originally there was a project called OpenTracing, a nice tool for tracing in an open source way, and there was another project called OpenCensus, which was more focused on edge and IoT kinds of devices. So we had multiple standards, multiple projects, and you know how that goes: you start with 14 standards, you add one that's going to be the standard, and you end up with 15 standards, because the other ones are still there. Fortunately, in this case the two projects actually got together and said, hey, we're both tackling the same thing in an open source way, let's converge into a new project, and that's OpenTelemetry. You could say this is yet another standard, but ideally the two others, OpenTracing and OpenCensus, can now fade away. OpenTelemetry is a relatively new project, but it's been very popular in the cloud native world: after Kubernetes, it was the most worked-on project last year. It's very active, and it's not just people working on the core of OpenTelemetry; all these providers are now integrating with it as an open standard. So we as developers can just use the OpenTelemetry specifications, and it's up to the platform engineers to decide which tool ingests and handles that data. We get a nice, open way of working.
So, the components of OpenTelemetry. On the one hand it provides a common specification for how we produce and send our telemetry. The second part is the instrumentation: how we actually integrate this into our code. With OpenTelemetry we can do that in different ways: a push system or a pull system, an agent running on our systems that collects and sends the data, or standard OpenTelemetry libraries in our code, in a nice lightweight way. And then there's the collector, a standard way of gathering this data and forwarding it to a backend that handles it, the one you actually use to consult your telemetry. So this is fine, right? We now have a tool for tracing. But what about distributed applications, and especially serverless applications? With serverless, the idea is that you scale up when there's high demand and scale back down, potentially to zero, when there's no traffic; once traffic arrives again, you handle more requests. So let's talk a little bit about serverless before we continue with the tracing part. One project that does this in a Kubernetes-native way is Knative. It tries to make serverless seamless for developers: on the one hand it makes it easier to deploy to Kubernetes, and then it adds request-based autoscaling out of the box. It doesn't need to check RAM and CPU metrics; it just looks at the requests coming in. If a lot of requests come in, it scales up; if no requests are coming in, it scales down to zero. It's very reactive.
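To make the collector part concrete, here is a minimal sketch of an OpenTelemetry Collector configuration that receives OTLP data from applications and forwards traces to a backend. The `jaeger:4317` address is a placeholder for wherever your backend actually runs; Jaeger is just the example backend used later in this talk, and you could point the exporter at Tempo or any other OTLP-capable system instead:

```yaml
# Receive telemetry over OTLP and forward traces to a tracing backend.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # applications push OTLP data here

processors:
  batch: {}                      # batch spans before exporting

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317        # placeholder address of the backend
    tls:
      insecure: true             # demo setting; use TLS in real setups

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```

The point of this indirection is exactly the decoupling described above: applications only ever speak OTLP to the collector, and swapping backends is a change to this file, not to application code.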
That's just one component of Knative; there are actually quite a few interesting capabilities, for example around eventing, where you can plug in different eventing systems, but that's beside the point of this talk. Right now we're mostly interested in the autoscaling, and especially in this problem: if you have a container running and you want to consult its telemetry, that's relatively easy, because you can look into your pod and see what's going on. But with serverless, once it scales down to zero, there are no more pods and no more containers, so how do we consult our data? We need a distributed way of handling this, and of course we'll do that with OpenTelemetry. Now, I'm a Java developer at my core, so for me it was interesting to do this with serverless Java. I don't know how many Java developers are here. A few? Nice, I like it. I'm always afraid to ask that question at CNCF events. Using Java with cloud native and serverless hasn't always been so great, because traditional Java was built to run on big dedicated servers, and if you wanted to scale, you scaled vertically by adding CPU and memory. But the Java ecosystem is pretty awesome, with a lot you can do and a really great community, and there have been developments in Java over the last few years that make it much nicer to use with cloud native and serverless. One project I'm particularly passionate about is Quarkus. We're going to use Quarkus with Knative and OpenTelemetry in a little demo in just a bit, to show how they all work hand in hand as an example of using OpenTelemetry. But again, you don't need Quarkus to use OpenTelemetry.
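For reference, the kind of Knative Service definition this talk's demo leans on later looks roughly like the sketch below. The service name and image are placeholders, and the concurrency settings mirror the demo's deliberately aggressive one-request-per-pod setup, which you would not use in production:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: places                  # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # Target one in-flight request per pod (demo setting only)
        autoscaling.knative.dev/target: "1"
        # Allow scaling all the way down to zero when idle
        autoscaling.knative.dev/min-scale: "0"
    spec:
      containerConcurrency: 1   # hard cap: one request per container
      containers:
        - image: quay.io/example/places:latest   # placeholder image
          ports:
            - containerPort: 8080
```

With `min-scale: "0"`, Knative deletes all pods after an idle window, which is exactly why the telemetry has to live outside the pods.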
The idea is that it supports all programming languages: there's instrumentation for the main languages like Go, Node.js, and Python, and of course also Java. So, a real quick introduction to Quarkus, if you don't know it yet. The tagline is "supersonic subatomic Java": very fast, very small footprint, but still Java, so all those Java developers can keep using their knowledge and their ecosystem in a cloud native way. The idea is that Quarkus takes a lot of the optimization work the JVM usually does at startup and moves it to the build phase, and that's how it cuts the startup time so much. Then there are projects like GraalVM that let you compile your application natively, so it starts up even faster with an even smaller footprint. Quarkus does more; I wouldn't say magic, because it's all very well documented, but there's a lot of optimization for cloud native development, in terms of performance but also in terms of developer joy, and you'll see in my demo how easily it interacts with Kubernetes and OpenTelemetry. So, the highlights of Quarkus: a focus on performance and productivity, what they call "Kube-native" Java, so it's very easy to work with Kubernetes, and a focus on standards, just like OpenTelemetry, rather than anything specific to Quarkus itself. Let's look at a little demo. Who likes live demos? Hopefully everyone; you never know what's going to go wrong, right? Here's a little project I've already created. If you're familiar with Java, you'll see we have our pom.xml, a source main Java folder, and all that good stuff. Let's take a look and run it real quick. As you can see, in this case my instrumentation lives in my dependencies, so I have the Quarkus OpenTelemetry dependency, which does the instrumentation automatically, out of the box, so I don't need to add anything to my code to make this work; and then I have a second one that uses the OpenTelemetry spec to add tracing to my database requests as well. So let's run this on my local machine. I can just do quarkus dev, which starts the application locally and even starts a container for my database dependency, which is kind of handy. And let's see... we have a connection refused. That's always fun. I'm going to look at my Podman Desktop, and it looks like Postgres is running but my tracing backend was not, so I'll go ahead and get that started, and maybe restart the app real quick to make sure it makes its connection. Then we'll make a few requests on the local machine, and after that we'll of course deploy it to Kubernetes, because that's the main point, to have this work in a distributed way, and I'll deploy it as a serverless application. Let's go look at our browser, at Mr. Firefox here. "Congratulations, your application is running." I have this endpoint, places, that just shows where I've been recently, and yes, the last one is Singapore. Cool. Just so you also see: I don't have any dependencies on OpenTelemetry in my code itself; this is all orchestrated by having the dependency in the project. So let me make a few requests here, hitting refresh a couple of times, and then go to the Jaeger UI on my local machine. In my case I'm using Jaeger, but you could use Tempo with our friends here from Grafana. You can see it automatically integrated with my application, so I can select the service and find the traces: 13 traces from my requests. I can see that I called the places endpoint, there are three spans in there, and I can see what happened in my code and which processes were called, and then also the database request. Let me make this a little smaller and scroll down. So I can also see my database request and how long it took, in this case 135 microseconds, and hidden underneath it the actual select statement and how long the query took. I got all that data out of the box in my Java application, and it works the same way in other applications, by just adding the OpenTelemetry dependency to the project. So it's running on my local machine; let's go ahead and stop it and deploy it. In this case I just have a few application properties for OpenTelemetry: an endpoint where my traces are going to be collected, a setting that formats my data in a specific way, although the defaults would work too, and then an additional deployment target, knative, so it knows to deploy to Knative. I can create a container image with just quarkus image build, so it makes
it pretty easy to create containers with Quarkus, and in a few seconds we have our container image. I'm not going to push it to my registry; instead I'm just going to deploy it with quarkus deploy. Quarkus has this Kubernetes dependency that creates our YAML out of the box, and because I have the Knative target it creates a Knative Service, so I don't need to worry about writing that default structure myself, and that's what lets us deploy our application to Knative. I have the container concurrency set to one, so that every concurrent request creates a new container, a new pod; that way we'll see it scale more. It's a little bit of a cheat, because in production you wouldn't want a separate pod for every concurrent request. All right, let's see if this is working, if we're deploying our application to Knative. Let's see what it's saying... did I get logged out or something? See, that's the fun of live demos, right? I'm going to copy my login command here and get a token for it; this is one way you can log in to an OpenShift cluster. Back to the command line; if not, we'll deploy it from the UI, that's always an alternative. Okay, let's log in real quick. Come on. We need some holding music; who can play some holding music? ...Why is it taking so long? All right, my patience has run out. I was going to do this from the command line, but what I can also do is go to Add to Project, Container Image in the console. I have an image already ready, of course, and I'm going to deploy it to a new application. I want to make this a serverless deployment, otherwise we're not using Knative, and I want it to scale, so you can specify minimum and maximum pods here, and I'm going to set my concurrency target to one. Actually, let's also make the autoscale window a little shorter. Okay, so this should be the same thing: I'm deploying my application as a serverless application in my project. Let's give this a second. We can see that it's starting up, and if we look at the logs, it started in 90 milliseconds, which isn't so bad for Java, right? If we open the URL, we see it's the same thing, and we can go look at our places endpoint; again the data is being loaded from the database, a Postgres instance that's also running in my project. I'm going to hit refresh a couple of times and then go to the instance of Jaeger that I've deployed here as well, and we should see the same kind of thing happen, except this time we're running on Kubernetes, on an OpenShift instance. Also in this case it was able to find the service automatically, just from adding the instrumentation dependency to my code and telling it where to send the data. Here we have those requests that were made a few seconds ago when I hit refresh, and again we can see the database request, this time in 1.75 milliseconds, quite a bit faster than on my local machine. That's good, right? Ideally our cloud environment is a little more performant than our local machine. As you can see, this demo is very simple; there's nothing crazy I had to add. And because we're using serverless, we've now autoscaled to zero, but my traces are still there, so if there were issues I can still find which container, which pod was misbehaving, and which line of my code was involved. So let's do one more fun thing and send a bunch of requests to this. Yeah, let's do this: let's send a thousand requests, with a thousand concurrent
requests, and see whether it handles them in a nice way. We can see that it's creating 200, 300 pods; I'm hopefully not blowing up my entire cluster here, but let's see how this behaves. These requests are all going to the same application, and if we look at our tracing, we can see that even though we have a whole bunch of distributed instances, they're all being handled by the same central collection place. In this case I have one application running; had I had multiple, they would all stream into this, and I would be able to trace through those different services. If I have different services running on different clusters and they're connected, it's the same thing: they all stream to my central collector, and then I can access it and have observability over my application. And thanks to Knative, we can scale up very fast and scale down very fast: all those pods are terminating now, because I'm not sending any more requests, so it's just going to go to sleep and wait for more requests to come in. As we saw in the previous keynote session, that's good for the environment, right? Not running all those idle pods. All right, that was a very quick introduction to OpenTelemetry, Knative, and also a little bit of Quarkus; sorry about the Java stuff, I like Java. At Red Hat Developers we're lucky, because Red Hat also likes to sponsor some of our books and make them available, so if you're interested there's a whole bunch more books, including some Knative books I believe, and you can download them for free thanks to Red Hat. That's about it for my talk, so thank you, and have a nice rest of the day. I think we have one minute for maybe one question, if anybody has one. I have stickers, so if you ask a question you can have a sticker, and if you don't ask a question you can also have a sticker. We have 59 seconds for a question... nobody? All right, you can find me in the room over there if you have questions and don't want to be put on the spot; that's fine too. Oh, there we go, one question. There's a microphone in the aisle; maybe somebody can pass it along.

[Audience] Thanks for the demo. I just wanted to ask one simple question: when you showed OpenShift and deployed the application, there was a serverless option. Is that the Knative implementation on top of OpenShift, and you can just use it?

Yeah, exactly, thanks for asking. Because I added OpenShift Serverless, the operator that uses Knative, I get that option in the OpenShift UI to deploy as serverless.

[Audience] And is it part of the OpenShift platform itself?

Yes, you can add the OpenShift Serverless operator as part of your OpenShift platform.

[Audience] Thank you.

[Host] That's time; we have to go to break and transition to the next presenter.

You can ask more questions offline. Thank you!