Hello, welcome to Cloud Native Live, where we dive into the code behind Cloud Native. I'm your host today. My name is Whitney Lee, and I'm a CNCF Ambassador and a Developer Advocate at VMware. Every week, we bring new presenters to showcase how to work with Cloud Native technologies. We'll build things, we'll break things, and we'll answer your questions. Today, we have Adnan Rahic here with us to talk about the power of traces: why OpenTelemetry embraced trace-based testing. Now, this is an official live stream of the CNCF, and as such, it's subject to the CNCF Code of Conduct. Please don't add anything to the chat that would be in violation of that Code of Conduct. Basically, be respectful of all of your fellow participants, be respectful of Adnan, and respectful of me too, please. Friends who are joining us live, please say hello in the chat. Tell us where you're tuning in from. It's so cool that this is a global community. And then, as always, if you have questions during the presentation, please post them in chat. With that, I'll hand it over to Adnan to kick off today's presentation. Thanks for being here, Adnan. Thank you for having me. Hello, everyone. Hello, community. Super thrilled to be here. My first time joining Cloud Native Live. Let's just get started. Just a bit about myself. I do developer relations. I've been doing it for roughly five, six years now, so I should know what I'm doing. I hope, right? Isn't that always what you want to hear at the beginning of a presentation? You're really building trust right now. Yeah, what can you do, what can you do? Yeah, in my spare time, when I'm not doing developer relations, I like lifting heavy things off the floor and getting punched in the face, which basically translates to weightlifting and boxing. You are building trust in that we believe you're honest. That's the kind of trust you're building. Isn't that the way it should be? Just cut through the nonsense and go for the important stuff, right?
Yeah, getting punched in the face. Wonderful, wonderful. Yeah, let's just quickly jump into the rough agenda for today. All right. Yeah, let me pull up the... here we go. So obviously, what we're going to be talking about today is really based around observability, but not just observability in general. It's going to be based around OpenTelemetry, and specifically I want to demystify the whole thing. It's quite hard to understand what it exactly is, and also how you can contribute to helping the community, and helping the OpenTelemetry demo, which is also a big, big part of the community. So we're going to be going through all of those things. And a key part of contributing is not breaking stuff, and that's what we're going to be talking about mostly. We're going to talk about how to reliably write traces, how to reliably test them, how to reliably test your system as well. Because when you're using any cloud native tool or any cloud native app, and you're building in cloud native, there are a lot of microservices, a lot of things that are interconnected, where it's very hard to understand what's happening. It's very hard to test it. Integration testing is a pain. So yeah, that's basically what we're going to be talking about in the next hour or so. And I really want to start from the bare bones basics. So let's say: what is observability? It's like a buzzword. We know about monitoring, we know about metrics, we know about logging, but what does this specifically mean? The easiest way I would say to explain it is that it's the way you observe your system. It's the way you figure out all of the unknown unknowns that are going on in your system, to help you troubleshoot any problems in your system. Now, the way you do that is by emitting signals, and those signals are the holy trinity, or whatever we call them: traces, metrics and logs.
The three pillars of observability are traces, metrics and logs. Now, OpenTelemetry does support all of those, but the most important thing, and what we will be talking about during this live stream, is distributed tracing. So it's the distributed tracing part that we really want to understand. Now, the basics of distributed tracing are not really easy to wrap your head around, because there are a lot of things like, oh, what is this context thing? And nobody really knows, so they just kind of wing it. Things like that, I want to help demystify. The easiest way of thinking about it: everybody knows what logs are. Everybody knows about logging. Everybody has done logging. If you haven't, you probably should try, on a side note. Anyway, logs are quite literally: this is a log line. You write a console log or whatever log in your application, it spits out the log, and you know what's basically happening in your system. Tracing is very similar conceptually, where instead of logs you're generating something called a span. And as it says here, a span is a unit of work or operation. But let's key in on this keyword here: distributed. Distributed tracing wouldn't be distributed if you only had spans. The thing with spans is that they connect into a distributed trace. So here's a perfect example. This auth span is just part of this entire distributed trace. And this entire set of spans that are connected to each other is a distributed trace, or a trace per se. And this is what the actual power of tracing gives you. You don't have just individual log lines where you have to figure out what each one does and how it connects between one part of your system and another part of your system. You're basically getting a waterfall diagram, so to say, of the whole interaction one request goes through within your system, and that's the power of it. And yeah, so let's just kind of take a step back. And I want to show you really quickly.
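To make the span-to-trace linkage concrete, here's a hypothetical sketch in plain JavaScript — not the real OpenTelemetry data model, which carries timestamps, attributes, status and more. Spans share a trace ID and point at a parent span ID, and that parentage is what a backend like Jaeger uses to assemble the waterfall:

```javascript
// Hypothetical, simplified span records. Real OpenTelemetry spans carry
// far more fields (timestamps, attributes, status, kind, etc.).
const spans = [
  { spanId: "a1", parentId: null, traceId: "t1", name: "HTTP GET /checkout" },
  { spanId: "b2", parentId: "a1", traceId: "t1", name: "auth" },
  { spanId: "c3", parentId: "a1", traceId: "t1", name: "charge card" },
];

// Assemble the waterfall: recursively indent children under their parent.
function buildTree(spans, parentId = null, depth = 0) {
  return spans
    .filter((s) => s.parentId === parentId)
    .flatMap((s) => [
      "  ".repeat(depth) + s.name,
      ...buildTree(spans, s.spanId, depth + 1),
    ]);
}

console.log(buildTree(spans).join("\n"));
// HTTP GET /checkout
//   auth
//   charge card
```

The parent/child linkage is the whole trick: one request's spans all share a trace ID, so a single query reconstructs the full interaction.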
Jaeger is a tool for distributed tracing — and it doesn't really matter which one you use — what I want to show you here is what you would see in your system if you have something like Jaeger installed. Jaeger is basically a tracing backend. It's like a data store for your traces. So all of the traces you generate in your system you put in Jaeger, and then you can get a really nice waterfall of everything that's happening within your system. So this is a GET request. You can see what's happening. You can just open up every single part of the code that gets touched by this particular transaction, and you can see exactly what's happening. So we have a couple of questions, from one person asking more than one question. But I guess they're wondering, how do you manage this for Python logging? And then, at which point is it best to use this framework instead of a sidecar in a Kubernetes environment? So I guess this is touching on: are these things language specific, and how exactly are traces collected? That is a lovely question. And the way I want to answer that is to just pull up the OpenTelemetry website. So OpenTelemetry handles all of this. OpenTelemetry is a set of SDKs. It's a set of tools and libraries that lets you generate telemetry in the most efficient way possible. So you can generally slot this in instead of any logging framework or log library or whatever you're using, and use OpenTelemetry instead. So as you can see here, it's literally a collection of APIs, SDKs and tools, and you use it to instrument your code. So the same way you would do, like, Python, library.log, debug or whatever you want to use there, you use the OpenTelemetry SDKs instead. Now, the best way of explaining that is popping into the official documentation and seeing: what is OpenTelemetry?
So you basically get the observability framework to create and manage your telemetry data, which means that, same as you were doing for, let's say, your Python logging, you would slot in OpenTelemetry and say, yep, I want to use it for logs, I want to use it for metrics — but most specifically the traces part is the power, because with a trace you can also add log events to a particular distributed trace. So you get context for that log. You're not just sifting through logs. You can actually see that this particular request had a problem, and here are all the logs within that distributed trace. And regarding the — yeah, go ahead. What's the benefit of using OpenTelemetry instead of your Python logging framework, for example? Observability, short and sweet. You get an exact map of what is happening in your APIs. So let's say you have an API for a payment. Even if you have black-box tests for that API, you only really know whether it failed or succeeded, because you're looking at the output. If you've set up tracing for that particular API, you can know exactly everything. You can know whether the card is invalid because it's a Visa but you want Mastercard. You can know if it's invalid because of X number of different reasons, which is the power of tracing. You're basically observing exactly what's happening, and you can even add custom spans within your code, the same as you would do with logs, but these custom spans are added into the entire context of that one distributed trace, which represents one API request. And that's basically the most powerful thing, I would say. So it's way more rich in terms of what it collects, but then also it plugs into a lot of different frameworks, not just Python logging, for example. It can know about your Kubernetes, it can know about other languages. So it has a standardized way to collect the telemetry data, regardless of where it's collecting the telemetry data from.
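A tiny illustrative sketch of the idea described here — log events attached to a span instead of free-floating log lines. The structures below are made up for illustration; in the real OpenTelemetry API the equivalent call is `span.addEvent(name, attributes)`:

```javascript
// Hypothetical sketch: a log event attached to a span inherits the
// trace context, so you can jump from "this request failed" straight
// to every log emitted inside that request.
function makeSpan(traceId, name) {
  return { traceId, name, events: [] };
}

function addEvent(span, message, attributes = {}) {
  // In the real SDK this would be span.addEvent(message, attributes).
  span.events.push({ message, attributes });
  return span;
}

const span = makeSpan("t-123", "POST /payment");
addEvent(span, "card rejected", { "card.type": "visa", expected: "mastercard" });

// Every event is now queryable by trace ID, not by grepping flat logs.
console.log(span.events[0].message, "in trace", span.traceId);
```

The point is the attachment: the event rides along with the span's trace ID, which is exactly the "context" that flat log lines lack.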
I want to ask you, Adnan — we have more questions rolling in. Do you want to go with the questions for a while, or do you want to tell your story and then maybe we'll break for questions in a bit? Yeah, let's do a couple more questions before we jump in, because some of the things I want to move on to right now are exactly what you were saying about using different languages, and what the SDKs actually entail, or what they actually offer. And I would say that that would be something we can jump into after a few more questions. Okay, sounds great. So Said has a question. Can you add traces to black-box legacy applications where you don't have access to the code base? Before you answer that, will you let people know what black box means? It's quite literally where the only knowledge you have about the system is what you put into it and what it spits back out at you. So basically the request and response. You only know what you have to give it and what you should get back. You don't have any clue what's happening in the inner workings of that system. Excellent, thank you. And then answer the: how can you add traces to a black-box application? So one thing that I think you could do there, depending on where your application is running — let's say you have it running in Kubernetes. It even says here on the screen: OpenTelemetry gives you an operator for Kubernetes. So you would quite literally set up the operator, run it as a Kubernetes CRD inside of your cluster, and then on the pods you want, you can specify the auto-instrumentation libraries, which is basically a super magical way of getting tracing enabled in your Kubernetes environment without writing any code. So this means no changes to your code base. These are just configuration changes, and something that your DevOps team would do, or platform team, or whatever you call it in your organization.
But apart from this operator for Kubernetes, for everything else that I've tried, you basically need to make some sort of configuration changes that are code based. And that's something I'm going to be showing in a moment as well. We still get auto-instrumentation, but there are some slight code changes you need to make to actually load those libraries and start generating those automatic traces, so to say. Excellent. So another question: if you compare tracing to logging, how intrusive does tracing become in terms of resource consumption? Does it make sense to only do tracing and ditch logging? That's a good question. I would say that back in the day, you would start with logging, and then you would kind of add in some metrics, and then you would kind of add in some APM, and that would all just mesh together in some weird way. If you're doing it from scratch now, I would suggest starting from tracing and going from there. Using distributed traces can in some sense be equated with the term APM, or application performance monitoring, which is something — I'm not sure who coined it, but it's quite common with certain tools like Datadog, New Relic, all of the big vendors in the space. So you could say that if you're used to having a sort of APM monitoring system, using distributed tracing goes with that logic, so to say. So I would say try doing instrumentation with OpenTelemetry first, because that enables you to both generate metrics with OpenTelemetry and add your logs with OpenTelemetry as well. So you're getting a non-vendor-locked-in way of generating the telemetry, where you can basically choose whatever backend you want. If you want to send it to Datadog, if you want to send it to Jaeger, if you want to use Grafana — I mean, it doesn't really matter, because the telemetry is standardized, and you're generating it in a way that will be accepted into any trace data store that you choose. Amazing.
And now we're level, we're back to ground zero with the questions. Wonderful. And with that, I think it's gonna be fine. I mean, I did mention the OpenTelemetry operator for Kubernetes, but the thing I also want to mention here is that for all of the libraries that the OpenTelemetry project offers, the languages you can choose from are immense. I mean, everything from C++, JavaScript, Python — basically any language that you're using, you'll have the SDK available to generate traces from it. And to show you the specifics there — because I'm a JavaScript developer, hopefully nobody calls me a fake developer because I'm using JavaScript. I mean, I am, but yeah, let's not go into that. I wanted to actually talk about the instrumentation part. So what's the easiest way of getting started with a language such as JavaScript? Let's actually pop into the automatic instrumentation and I'll just walk you through here. You install some libraries, cool. It's Node.js, that's npm install. It's perfectly, perfectly reasonable. We all do that all the time. You export some environment variables, and then you run your app, and you magically, magically have tracing. Like, is this real? Is this, you know, a hoax? Is it really possible to do it this way? I mean, come on. It looks too good to be true. And this is the part where I want to talk a bit about the way you need to actually configure some code to get it to work. And I think the best way of doing that — let me just pull in something and show you. So let's go ahead and open some code. And I specifically want to show you this: on the left-hand side, you can see that it's a service called payment service. It's obviously JavaScript, it's Node.js, and we have a file called opentelemetry.js. Now, I have very conveniently just commented this little section out and added that in.
So with this — it's OpenTelemetry requiring some SDKs, the Node SDK, getting some auto-instrumentation, getting some exporters, yada, yada, whatever. And I'm setting my auto-instrumentation, and I'm setting where I'm going to export my traces. Now, these 30-odd lines of code are going to auto-instrument my Node.js app. Cool, that's it. Mic drop, yeah. And you're obviously looking at me like, this dude is lying. There's no way, no way. But yeah, let me backtrack a bit. I've commented this out, so there's no OpenTelemetry instrumentation in my payment service, cool. Let me go ahead and trigger an API test against this payment service and just show you what happens. I have very conveniently set that up as well. So let's do — I think it is this one, and it is this one. So I'm just going to trigger an exploratory — what's it called? I'll just mess that up. I wonder if I can turn your name off, your name is blocking the... Oh, okay, let me zoom in a bit more. Okay, run your command. So like that, we have the demo running, we have this guy. Ah, okay, I'm just misconfiguring the test itself, so let me pull up the test and change it real quick, and it's going to work perfectly fine. Let's do it like that. You see, this is what happens when you're not prepared for your demo, isn't that terrible? We did promise up top that we would break things. So I'm glad you're... Yeah, live demos are better that way, right? Yes, yes, 100%. Actually, we can probably also just move this up a bit. There we go, great. That's a bit prettier. So what this is going to do here is that I'm basically just running an API test. Then, whilst that's running, let's just pull up the actual test itself so you can see it running. Let's do last run, and then we put that one in. Zoom in a bit on that. So obviously this test here — I think this is going to be fine. Obviously this API test here is going to say, yeah, I'm looking for the traces, whatnot.
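For readers following along, a bootstrap file like the one described typically looks roughly like this. This is a sketch using the published `@opentelemetry` Node packages; the demo's actual `opentelemetry.js` may differ in exporter choice and options:

```javascript
// Sketch of a minimal opentelemetry.js bootstrap for a Node.js service.
// Package names are the published @opentelemetry ones, but treat the
// exporter choice and options as placeholders.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const sdk = new NodeSDK({
  serviceName: 'paymentservice',
  // Sends spans to an OTLP endpoint, e.g. an OpenTelemetry Collector.
  traceExporter: new OTLPTraceExporter(),
  // Auto-instruments HTTP, gRPC, Redis, and other supported libraries.
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

Loading a file like this before the rest of the app is what turns the "npm install plus environment variables" story into actual spans.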
This, by the way, is Tracetest. This is the testing harness that the OpenTelemetry demo is using — more on that in a second. This is going to start running, and it's going to fail after three minutes, which is the timeout. It's going to fail because I have no tracing in my payment service. So it's just gonna say, I triggered it, nothing really happened, I don't really know what to do. Well, let's see if my hypothesis is true. So let's move back to our OpenTelemetry file. Let's go ahead and comment back in our supposedly working auto-instrumentation. Save that file. Go ahead and rebuild our payment service. So we obviously need to stop it, rebuild it, and restart it once again. This is going to take three seconds. So if we have any questions — we have a comment. Too late? No, we can go ahead. What's nice in logging versus OpenTelemetry is that OpenTelemetry defines semantic conventions about attributes, which would be there for a given scope like HTTP. So that's better than full-text indexing some logs. I love that. I love that. This is a perfect segue. So just for reference, I've restarted this payment service with the changes. I'm going to run my test again. So let's do just the same API test, pop back into the UI and reopen the — obviously, last run. Let's reopen this little buddy here. And now, if we go to the trace tab, we're going to see the trace. But also, one thing that I'm going to talk about here — the question was about semantic conventions. Beautiful question. That's something that was also thought of in the community, because one of the big problems when pushing changes is that you don't just break the tests — you obviously break the code while the tests are passing, when they shouldn't be passing — but you're also not writing adequate telemetry.
So you're basically not adhering to the semantic conventions and the rules that the OpenTelemetry community wants you to adhere to. Which is — first and foremost, we have our trace now, obviously we have our trace span. So that one jumble of 30 lines of code actually made us get our trace back. So with that 30-lines-of-code magic, let's say, we can actually see our trace. But even more importantly, we can pull up this analysis of our trace and see, okay, our trace is 83% good, which basically means, yeah, I have some semantic conventions that have failed. I can jump in and say, okay, so this failed. Let me go in and pull up the documentation on how to fix it — which is the amazing thing that our community member just mentioned. I actually have some hand-holding. I don't really have to make up my own way of doing this. I have the OpenTelemetry semantic conventions that I can follow. And if I don't really know what the problem here is — it says attribute required, I mean, who knows — pull up the documentation. I can see, okay, so here are the rule details. I need to do this one for my HTTP spans. I need to do this one for my database spans. I need to do this one, or these two, for my RPC spans. So it's the hand-holding that you really need to improve your software development life cycle. It's not easy setting up tracing and writing traces. It's not easy, right? So that's the point of the whole addition of trace-based testing to the OpenTelemetry demo: to improve the hand-holding. And with that, I think we can pop back here to talking a bit about this magical — I'm going to say — auto-instrumentation. This code that I'm showing right now is part of the payment service. It's part of the OpenTelemetry demo, which is — I'm just gonna be blunt — an amazing demonstration of what OpenTelemetry can do. It's obviously maintained and run by the OpenTelemetry community.
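The "analysis" step described here is essentially a lint over span attributes. As a pure illustration (not Tracetest's actual implementation), checking an HTTP span against required attributes might look like this, using the classic OpenTelemetry HTTP attribute names:

```javascript
// Illustrative sketch of a semantic-convention lint: does an HTTP span
// carry the attributes the convention expects? Attribute names follow
// the classic OpenTelemetry HTTP semantic conventions.
const REQUIRED_HTTP_ATTRS = ["http.method", "http.status_code"];

function lintHttpSpan(span) {
  const missing = REQUIRED_HTTP_ATTRS.filter((a) => !(a in span.attributes));
  return { passed: missing.length === 0, missing };
}

const span = {
  name: "HTTP POST",
  attributes: { "http.method": "POST" }, // status code forgotten
};

console.log(lintHttpSpan(span)); // reports http.status_code as missing
```

A rule failing like this is exactly the "83% good" situation from the demo: the trace exists, but some spans are missing attributes the convention requires.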
And it gives you an overview of both the power of OpenTelemetry and also guides you on how you should be using OpenTelemetry. And I think the best way of showing that is to pop into the documentation here and say, yep, let's actually see what features are in the OpenTelemetry demo. And one of the most important features that I really want to talk about a bit is that it features tons of languages. So it doesn't really matter what your background is. If you're like me, writing Node.js, if you're a Golang engineer, if you write Java — it doesn't matter. Every single one of the SDKs is showcased in the demo itself. So you can pop in and actually read code and see what's happening. You don't really have to read any documentation and kind of fumble through. You know exactly what's happening if you pop into the code, look at the code, read the code, and understand it that way. And I love that. And with that, I think there are a total of 11 languages and 12 services. And you can see here that we have the payment service that we're going to be looking at as well. And yeah, I just think it's freaking amazing. And yeah, with that, I think — we have a couple of questions. Can we get to those? Yeah, for sure. Let me just open up the architecture here so we can walk through, just have that open. And yeah, let's do some questions. Okay, so we have one that is asking, well, first of all: my question is, what is OpenTracing? And then Rima's question is: is there a way to maintain compatibility between OpenTracing, where tracing is governed by B3 headers, and OpenTelemetry, where tracing is governed by traceparent headers, in a distributed environment at the same time? So the short answer would be that OpenTracing and OpenCensus merged in 2019 into OpenTelemetry. So you should just drop OpenTracing.
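For context on the headers mentioned in the question: W3C Trace Context, which OpenTelemetry uses, propagates a `traceparent` header shaped as `version-traceid-parentid-flags`. A quick sketch of pulling one apart (the example value is the one from the W3C spec):

```javascript
// The W3C Trace Context `traceparent` header is four hex fields joined
// by dashes: version, 16-byte trace ID, 8-byte parent span ID, flags.
function parseTraceparent(header) {
  const [version, traceId, parentId, flags] = header.split("-");
  return { version, traceId, parentId, sampled: flags === "01" };
}

const ctx = parseTraceparent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
);
console.log(ctx.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
```

B3 (used by OpenTracing-era Zipkin setups) carries the same information in different headers, which is why the question about running both at once comes up during a migration.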
Sorry to be so blunt, but I mean, you should just move to OpenTelemetry and trash everything else you've been doing. Obviously there's a migration guide as well, so it's not that big of a deal. Obviously people have thought about it. People that are much smarter than me have written the guides and actually figured out a way of doing it. Are those guides part of the OpenTelemetry documentation? I am 99% sure yes, but also a quick Google search for migrating from OpenTracing to OpenTelemetry — and we can pull up, yeah, obviously they have a dedicated page. It's purple even, yeah, yeah, there you go. So obviously they have backward compatibility to some extent, and they have language version support. But I mean, if you have existing OpenTracing SDKs or instrumentation in your code, you should probably consider dedicating some time and figuring out how to change. Otherwise, if you do not, you should just use OpenTelemetry. Excellent. And we have another question about collecting the telemetry data. Can we do tracing with OpenTelemetry as an init container or a sidecar? So I think the logic of collecting the telemetry is a bit different from what people are used to with, let's say, logs. It's a bit different because you specifically need to write the instrumentation in your code, or you use the OpenTelemetry operator in Kubernetes that auto-injects these libraries. So I would say — it's not an easy question — I would say use the specific guides that the OpenTelemetry documentation provides you with. If you're running Kubernetes, as I obviously see you are, use the OpenTelemetry operator. It's dead simple; I've tried it. There's even a documentation page I wrote. I can also pull that up. So it's dead simple. You install cert-manager, you install the OpenTelemetry operator, you configure a file, apply it, and you're basically done.
The only thing that you really need to do is specify an annotation in your services — so, on the deployment that you want to have the auto-injected library. Now, they have four, I believe they only have four right now that are available. So, let's say you use Python: you just add inject-python true as an annotation, and you're basically done. I keep saying it's magic. It really is magic — hopefully you believe me, right? So, to actually believe it yourself, try it. Dedicate half an hour of your day, try adding it. I mean, it's a CRD, so if you don't like it, you can just delete it. You know, nothing's really gonna happen to your cluster. So yeah. So I think you said just the right thing. The question is: is the Kubernetes operator available for all the languages supported? So these four languages are the ones that are. Exactly. We can even pull that up in the documentation here. Let's say we want to go to the OpenTelemetry operator — that will be the auto-instrumentation — and we can pull that up, and we can see here: Kubernetes operator. We can also probably link this in the chat as well, so people can check it out for themselves. It's quite strange that they don't have a list. Oh, there we go. So they have annotations for injection: Java, Node.js, Python, .NET and Go, but Go has a bit more complicated setup that they need to figure out. And that is pretty much it. Yeah, I need to update my documentation now, because Go wasn't in there before. Cool. Sweet, sweet. Yeah, with that, I think — yeah, I think we can move back, if there are no more questions. No more questions. I'll show that operator. Oops. I think we can pop back into the demo architecture really quickly and just show you here. So you have 11 languages. You have 12 different services.
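As a sketch of the operator setup described here — an `Instrumentation` resource plus the inject annotation on a workload. The resource names and collector endpoint below are placeholders; the `apiVersion`, `kind`, and annotation key are the operator's real ones, but check the operator docs for the current versions:

```yaml
# Sketch: an Instrumentation resource plus the annotation that opts a
# deployment into auto-injection. Names and the endpoint are placeholders.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-python-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"
    # ...container spec elided...
```

Swap `inject-python` for `inject-java`, `inject-nodejs`, or `inject-dotnet` depending on the workload's language; as noted above, Go needs extra setup.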
You have a few different things regarding databases, caches, queues, whatnot. And this is obviously the OpenTelemetry demo, and it is a perfect demo of what a production system at a corporation would look like. And speaking of the OpenTelemetry demo: I think in the last month it had 15 or so different contributors with numerous pull requests. And these people are from all over the world. They're from every imaginable time zone that we have on the globe. And how do you sync with that? How do you make sure, when somebody edits this guy, that everything else doesn't just die? It's not easy, especially if you're writing instrumentation, if you're setting everything up and then changing that. And you obviously don't know how it's all running, because you're a new contributor and you have no idea what's happening. And then — even worse — if tests are passing but it's broken, I mean, that's a nightmare. And then you merge that, and it's broken and it's merged. It's just not fun at all. So I think those were some of the major pain points that the OpenTelemetry demo had, because they had black-box tests already set up with Ava and Cypress. And it looked fine, but sometimes tracing broke. Sometimes telemetry was just misconfigured. Sometimes tests passed when they shouldn't have passed. And that was basically one of the problems that the community wanted to fix — to just stop that from happening, more or less. And that's when the decision to introduce trace-based testing came in. And trace-based testing is exactly what it sounds like. It's using distributed traces for your end-to-end testing, for your integration testing. And with that, I think the best way of showing it is to pop in and do some live coding. And that live coding is going to be in the OpenTelemetry demo. And yeah, if we don't have any questions, we can pop back in and just pop into the code. A quick question. Oops, yeah.
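To give a flavor of what a trace-based test looks like: Tracetest tests are YAML files that pair a trigger with assertions over the resulting trace. The schema evolves, so treat this as an approximate sketch rather than a copy-paste definition; the URL, test name, and selector here are placeholders:

```yaml
# Approximate sketch of a Tracetest test definition: trigger a request,
# then assert against the spans in the trace it produces.
type: Test
spec:
  name: Payment service creates a charge
  trigger:
    type: http
    httpRequest:
      method: POST
      url: http://paymentservice:8080/charge   # placeholder
  specs:
    - selector: span[tracetest.span.type="http"]
      assertions:
        - attr:http.status_code = 200
```

The key difference from a black-box test is the `specs` section: instead of only checking the response, it selects spans inside the distributed trace and asserts on their attributes, which is why broken or missing telemetry fails the test.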
Are traces supported on the database layer with OpenTelemetry? Yep. Yes. Short and sweet. Yep. So if you use Postgres, Redis, whatever you can think of — yes. I'm going to show that in a bit as well. So if you're patient for another maybe 15 minutes, I'm going to show some interactions between an API — a gRPC API — and a Redis database. So... Cool. Will you zoom in just a tad, please? I will, so let's... Thank you. Nice. This is probably fine. Cool. Now, the thing I really want to show from the get-go, just to have everybody on the same page — we did look at the architecture overview — so I think the best thing we can do is also check out the actual Docker Compose file that is part of this demo. And you can see we have tons and tons of lines of code, and you can see the accounting service, you can see all of these services. We're just going to pop down to the payment service, because that's the one we will be changing and editing. And what's happening in the payment service is quite simple. It's building a file, it's loading some environment variables, and it's just starting. Nothing really magical is happening there. What I do want to show, just really quickly, is the actual Dockerfile that is used within the payment service. And what's happening here is — obviously, yep, I'm loading in — that's not the one. Let me just change that up. Mr. Fumblefingers, there we go. So what's happening here is it's just a Node app: I'm loading the source from the payment service, I'm loading the proto file as well, and I'm just running it. Super simple Node.js service. Now, one thing I do want to note about the actual Docker Compose file that is quite important if you want to contribute to the OpenTelemetry demo, is something called profiles.
So profiles in Docker Compose are just a way of selecting which services are going to start by default versus which services you have to specifically tag when starting Docker Compose to spin them up. Now, the profiles that are named with tests — those are going to run the frontend tests, the integration tests, the trace-based tests, and then obviously the Tracetest server, which is the testing harness that we're using. They also pull in Postgres, because it's a requirement, but that's less important. One more thing that I really want to note is that I've added this dev profile to the Tracetest server, just because I want to have that up and running when I'm doing my red-green cycle — when I'm editing code, when I'm testing the code and writing the instrumentation — so that my software development life cycle is actually up and running. So I want to do that. So when I actually pop back into my terminal window, I have the demo running here, and I'm running it with docker compose --profile dev up. So this is going to start my demo. You can see here, I'm running my OpenTelemetry demo, all of the services are up and running, perfect. Now, one other thing if you want to continue contributing: there's a really nifty little file called a Makefile. And in the Makefile, you have all of the commands set up for you. So if you want to run tests, you can basically run the tests, and it's going to trigger all of the test containers. Same if you want to only run the trace-based tests — you can do that as well. And then obviously you can also just run start and run all of the services that way. Now, one thing that I really want you to check out is that this command is going to trigger the trace-based tests. I do want to show you this really quickly. So if I pop back over to my terminal — let's say this one — I want to run only the trace-based tests for the payment service.
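The profile mechanics described above, as commands. The `--profile` flag is standard Docker Compose; the profile names here mirror what's described, but check the demo's compose file for the exact names:

```shell
# Start the default services plus anything tagged with the "dev" profile
# (e.g. the Tracetest server). Services tagged only with a test profile
# stay down.
docker compose --profile dev up

# Explicitly bring up the test-tagged services to run the test suites.
docker compose --profile tests up
```

A service with no `profiles:` key always starts; a service with one only starts when its profile is named on the command line, which is exactly how the demo keeps test containers out of a normal run.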
So the payment service that we checked out a moment ago — this is going to spin up everything for my payment service and run the trace-based tests. And if I pop back here into my UI, I'll see the last run, and this test suite is going to start for my payment service. And you can obviously do that whilst you're developing, when you're actually... Oh no, it just kicked you out. Oh, sorry, did I die? You just — I don't know. They got you kicked out of the room momentarily, but you're back. Yeah, you were only cut off for maybe five seconds' worth. It wasn't long. Well, I mean, I was just talking too much. I'm sure it is. The forces that be. Somehow Restream knew — you pay extra for that feature. Right. And perfect. Now, what did I want to say? Yeah, so the payment service — this is going to trigger all of the tests that are currently in the demo for the payment service. So it's quite simple to actually get started if you want to contribute to the demo. I would definitely suggest you do, because it's super fun. Also, what I want to show here is that once the tests are finished running in the terminal, you get either green as passed or red as failed for the test specs as well. So this is just the usual way of testing that everybody's used to. It's very natural. One thing that I do want to show for this particular step is the way this gets run. So let's actually dig into this trace-based-tests container. If we look at it right here, what's happening is that it's loading a Dockerfile and it's loading some environment variables. If we open the Dockerfile — let's say, so it is this Dockerfile — no, it's not, it's this Dockerfile. Okay, see, the only thing that really happens is that it's running this trace-testing run bash script.
So what's happening in that script is that it's pulling in the CLI to run these tests in an automated fashion, and it's generating something called a variable set. It's quite literally loading in all of these environment variables. Now, which environment variables, you're probably asking? Obviously the environment variables for the entire demo — the important things like the actual addresses for the services and the ports for the services. We're pulling those in so we can run the tests. And then all of these environment variables get loaded into the test files, and the test files get run that way. And with that, I think we can move on to — yes, the important part. I just need to remember. Yeah, go ahead. A couple of questions — is now a good time? Yeah, go ahead, go ahead. Okay, we do have someone who wants to follow up with questions later. Do you have a good way to contact you? Yeah, for sure. You can either do my email or, I mean, you can jump into the CNCF Slack and just Slack me there. I'm in there — it's the easiest way to find me. Excellent. And then we have a question here. Is database tracing client-based, or do the database tools like Redis, Postgres, et cetera, have built-in server-side OTel support already — or is that not required? No — so the code we added in the OpenTelemetry file for the Node.js SDK is going to pick up all of the interactions with the databases. That's the only config you really need. Cool. And then someone asked just generally about Helm. How does Helm relate to OpenTelemetry? Well, that's probably a question regarding the OpenTelemetry Operator for Kubernetes. Yeah, I was thinking that too.
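A generated variable set like the one that script produces might look something like this — the field names follow Tracetest's resource format as I recall it, and the keys and values here are illustrative:

```yaml
# Sketch of a variable set holding the demo's service addresses, so test
# files can reference them as ${var:PAYMENT_SERVICE_ADDR} and so on.
type: VariableSet
spec:
  id: tracetesting-vars
  name: tracetesting-vars
  values:
    - key: PAYMENT_SERVICE_ADDR
      value: paymentservice:50051
    - key: FRONTEND_ADDR
      value: frontend:8080
```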
I'm not actually sure if they have a Helm setup for the OpenTelemetry Operator. I would think so, but we'd have to check. Yeah, we can probably check that later on and then forward that to you. Cool. And those are our questions right now. Sweet. And yeah, with that, this is just a basic run-through of how you can get started with contributing and running the trace-based tests yourself. One cool thing as well, if we pull up this — so I did obviously run this test by passing in the payment service. If I want to run all of them, I could just trigger it like that and run all of them. But for the purposes of this demo, we don't really need to do that. What I think is going to be more fun is if we jump into the code a bit more — more specifically, into the code of the payment service itself. So we did jump back in here; let's open up the OpenTelemetry file. We did jump in here and add in our instrumentation. One more thing I really want to show you before we move forward: I want to explain what the actual code of the payment service does, so we can understand how to write the instrumentation for it as well, which is what we're going to be doing in the next few minutes. And I think the best way of doing that is to first pull up the proto file. So let's do that. In regard to the payment service specifically, we have these four things that we need to look at. The payment service has a Charge function. It takes a ChargeRequest object and it returns a ChargeResponse. Very simple. The ChargeRequest has a Money amount parameter, and it also has a credit card — this specific object — and then it's returning a string. And that's basically it. So just understand what's happening here. If we pop back into the service itself — let's say first the index file — you can see that pretty much the same thing is happening here. We have a charge service handler.
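The contract being described is, in rough protobuf form, something like this — a sketch reconstructed from the description above, so message and field names only approximate the demo's actual proto:

```protobuf
service PaymentService {
  // takes a ChargeRequest, returns a ChargeResponse
  rpc Charge(ChargeRequest) returns (ChargeResponse) {}
}

message ChargeRequest {
  Money amount = 1;               // the money amount parameter
  CreditCardInfo credit_card = 2; // the specific credit card object
}

message ChargeResponse {
  string transaction_id = 1;      // the string that comes back
}
```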
We're putting in the call request, so the request object is getting passed into the charge function. The charge function is getting called here — as you can see, the charge function. And we're grabbing the values from the credit card that gets passed in. We're getting the details, we're validating the card, and we're throwing some errors if it's invalid. With all that done, we specify the requested amount and we generate the transaction ID up here. And we pass that transaction ID back as the response here in the index — in the service — and then we're just running a callback. So it's a very simple little service. What we want to do now — you can see that we have a bunch of code that's commented out. We're going to add in some of this code step by step, to show how the trace changes and how we can change the test specs against this particular service. And with that, I think the first thing we can do is just start adding in some OpenTelemetry instrumentation. Just to refresh our memory, popping back into the test suite: let's do last run, and let's — let me just find it. Okay, I'll just rerun the whole thing to not think about it. Let's rerun our tests really quickly, and let's pull up this one — let's pull up the test just to refresh our memory on what the trace looks like right now. This is with only the auto-instrumentation. It's going to pull up a little trace and it's going to show me these two spans: I have my trigger span and I have my RPC span. And you can see payment service slash charge — obviously this is the payment service and the Charge function that I just showed you in the code. Now from here, I can go in and say, yeah, I want to make sure that my status code is equal to zero — it should return status code zero — and I can go ahead and save that test spec.
And now, every time that this particular test runs, it's going to want my RPC span to return status code zero. Sure, that's cool, but that's not really that specific. We can do much, much better. And here's what I would say we can do. First, we can pop back into the code. Let's go ahead and add a context where we're getting the current span. We want to get this current span from the context, and it's going to equate to this particular RPC span. So if we pull that in, we can then go ahead and say, yep, I want to add some attributes to this span, and I want to add a span event — which is just a fancy word for a log, right? So I want to add the payment amount, and I want to add some values here — let's say severity, message, request. I'll also obviously need to go ahead and end the span. I always need to make sure that if I'm starting a span, I end it. And I also want to add a status to my particular span. Now, because this service also has a charge function — this particular function — I also want to pop into the charge function and do the same thing. So I go up and, again, I'm getting the active span — the active span is going to be my RPC span — and I want to add some span attributes. And here's the kicker: because we have a valid value here — because we're obviously validating the card and getting that value back — I want to add that as an attribute, so I can run a test spec against that particular value, which is actually pretty cool. Let's close up the span, and the next thing we have to do is obviously just rebuild our payment service. So let's go ahead and stop it. And whilst we're rebuilding, are there any questions we might take right now? No, we're good right now on questions.
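The calls being added here follow the OpenTelemetry span API shape — setAttribute, addEvent, setStatus, end. Below is a tiny stand-in span object so the shape is visible and runnable on its own; in the real service the span would come from `trace.getActiveSpan()` in `@opentelemetry/api`, and the attribute names and values here are illustrative:

```javascript
// Stand-in span illustrating the span API used in the walkthrough.
// Real code: const span = require('@opentelemetry/api').trace.getActiveSpan()
function makeSpan () {
  return {
    attributes: {},
    events: [],
    status: null,
    ended: false,
    setAttribute (key, value) { this.attributes[key] = value; return this },
    addEvent (name, attrs) { this.events.push({ name, attrs }); return this },
    setStatus (status) { this.status = status; return this },
    end () { this.ended = true }
  }
}

const span = makeSpan()
span.setAttribute('app.payment.amount', 124.22)      // illustrative names
span.setAttribute('app.payment.card_valid', true)
span.addEvent('charge request processed', { severity: 'info' }) // span event ≈ a log line
span.setStatus({ code: 1 })                          // 1 ≈ SpanStatusCode.OK
span.end()                                           // a span you start must be ended
```

The point of mirroring custom logging is visible here: each attribute or event is one extra line at the call site, just like a log statement would be.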
So yeah, this rebuild is actually going faster than expected. Here's the thing: it's not a lot of code that I was writing — it's a few lines and a few attributes — and the process of adding this is not very different from the way you would add in custom logging. I mean, you can confirm or deny, but for me at least it's not that big of a difference. And that's what I particularly like. Then let's go ahead and start it. And with that started, let's just validate one more time that we're actually running it. It is perfectly up and running. Let's go ahead and run this same test once again — let's pull up our exploratory test once again, and pull it up in the UI. Now, if I've done this right, I'm going to have a few more attributes on my RPC span. It would be pretty funny if this just failed miserably. Everyone's rooting for you. But yeah, sometimes we learn more troubleshooting, I think, when things don't go right. Yeah, I agree, I agree. Oh yeah, moment of truth. Let's see — nice. So now we have the app payment amount, we have the app payment card type, and then we also have the card valid. We have all of these custom attributes added. Even the span events here — you can see that we have the log, so it's very much the log-message idea. I can even run some test specs against the events here as well, which is freaking cool. I mean, if this isn't cool, I don't know what is. And that's the cool part: let's say I want to do a custom test spec against my card valid being true. I'm going to say it should return valid equals true, like that. Save that up. Let's go ahead and save that whole set of tests. And here's what we get. So now, if this card is invalid — I mean, it's getting that value from the code itself.
And it's part of my distributed trace, meaning it's part of my test harness — which means if this changes, I know something's wrong. This is as white-box as testing can be. It's, I mean, it's actually pretty cool. But let's not stop there; let's do some more cool things. What I think we can do specifically — this setup doesn't really give me a good point of view of what's happening. It's just one RPC span. I'm not quite sure what else is happening. I have some custom attributes, which is cool, but we need to take it a step further. To do that, first let's stop our payment service whilst we change some code. The part of the code that I really want to change — I'm going to pop back into the index. Here at the beginning, I'm just going to comment out this part and put in this part instead. Now, what's happening here? Again, we're doing the same thing: we're getting the active span, which is the RPC span. But we're setting it on the context, and we're creating another span called charge service handler. Okay, cool. So we have another span that's a child span of our RPC span. But we also want to do the same thing for this part of our charge function. So we pop in here and, again, the same type of thing: we're getting the active span — but this active span is not the RPC span anymore. Well, no, actually, I'm wrong. It is still the RPC span, because this will create a sibling span of the span I was just mentioning. This charge service handler is going to be added to the active span, which is the RPC span. And this charge span is also going to be added to the active span, which is the RPC span. So we're going to have one RPC span and two sibling spans. I might have messed it up, but let's hope I haven't. So if I — Do you have a graphical representation? I feel like you showed one earlier, yeah. It is coming. So let's do a rebuild of our payment service.
So that's what should happen. So if we move back — we rebuilt it. Let's go ahead and spin it up and run start. Now, if we rerun the test once again — let's go ahead and run our exploratory test once again. Let's go back into — I believe we can also just rerun it here. Let's see what happens, if I've done this right. And Tracetest is open source? Oh, yes, of course. It's part of the CNCF landscape as well, which I would believe was one of the reasons why the OpenTelemetry team decided to add it in as a test harness. Cool. I like Kyle's hype. Yes, Kyle! There we go. And I was right, believe it or not. We're getting two sibling spans that are part of the gRPC — so the RPC span here. And what I particularly like about this is that I can say: I want to make sure all of my gRPC spans are okay. Cool — I'm just targeting my gRPC spans. But then I also want to target these individually, where I say, okay, I want to make sure that my OTel status code is okay here for this one. Or I can even do: I want the payment amount to be equal to a particular payment amount. But in my charge span, I want to make sure that my card valid is equal to true — it should be true, like that. So I'm basically splitting up my test specs based on the spans, and this just makes it easier to reason about. By looking at this trace, I know that I have an RPC server running, I know that I have a charge service handler somewhere, and I know that there's a charge function getting run somewhere as well. But we can take this a step further by doing it the right way. The right way would be for this charge span to actually be a child of this span, because that just makes sense — it's triggered from the service handler, not from the gRPC call specifically. So let's take another stab at that. Let's stop the service, and while that's stopping, let's change that up in a second here.
Let's comment this back out. And we want to use something called startActiveSpan instead. What startActiveSpan does is give you a callback function, and everything that happens within that callback function is going to be nested within that span — it's going to create child spans within that nested callback. Let's put that here, and let's move that a bit, like that. Let's clean that up a bit more — and also put that there. And if I've done this correctly — you can see that this callback function is, again, going to... also, we need to change this up, because we don't really need the context anymore; we're running this function in that callback, within the active span. So what's going to happen now is that this charge span right here is going to be part of the active span, which is now the service handler span. With that, let me go ahead and rebuild our payment service, restart it, and run the same test once again. Let's go ahead and do the up and the start, like that. So exciting. Yeah, I mean, it's taking a while, but we're getting there. So now, if we rerun the test this time, this should ideally say the number of spans collected should be four. So just give it a moment to iterate through them. And it's going to show us a nice linear view — a nice visual, linear view of what's actually happening in our code. And then we can go ahead and run our tests against that and add specs logically, I would say, against those spans. There we go. So this makes more sense now. We have our RPC server, we have the charge service handler, and we have our charge within that. And again, the tests are passing just fine. It doesn't really matter, but I think it matters — looking at this, it just makes sense now. I have a question, and we also have one in chat. My question is: you've written all the tests in the UI.
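What startActiveSpan buys you is that the parent/child bookkeeping happens automatically through the callback. Here is a tiny stand-in that makes the nesting visible and runs on its own — real code would use `tracer.startActiveSpan` from `@opentelemetry/api`, and the span names below are the ones from the walkthrough:

```javascript
const stack = []  // currently-active spans, innermost last
const spans = []  // everything we "recorded"

// Mock of tracer.startActiveSpan(name, callback): whatever starts inside
// the callback sees this span as its parent.
function startActiveSpan (name, fn) {
  const span = {
    name,
    parent: stack.length ? stack[stack.length - 1].name : null,
    end () {}
  }
  spans.push(span)
  stack.push(span)
  try { return fn(span) } finally { stack.pop() }
}

// Mirrors the change in the demo: charge now nests under the handler,
// not directly under the RPC span.
startActiveSpan('charge-service-handler', handler => {
  startActiveSpan('charge', charge => { charge.end() }) // child of the handler
  handler.end()
})
```

Running this records two spans, with the charge span's parent set to the handler span — the same shape the rerun test then shows in the trace view.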
Can you write tests with pure code, or do you have to interact with the UI? Yeah — the next step is going to be running all of this in code as well, just with YAML files. And that's the neat thing here: if you jump to the Automate tab, you get the text file as well. So you can just copy this and run it with the CLI command. It's just convenient as hell — it doesn't really get any simpler than this. And doing that in the CLI and the code editor is really simple. If we pull up the test itself — let's do this one. We have the valid credit card test, so we want to just check if the card is valid. We have our protobuf file — the demo proto I was showing a moment ago. And then we have our request: the amount and the credit card we want to pass into the payment service. We can trigger this and add in our test specs. Now, the magic of the test specs is that they're quite literally a part of this file. The way they work is that the selector here works on the same principle as I was showing in the UI. If you pop into the UI, you can basically go in and say, I want to run a test against — let's say this one. And I can see here: span type general, name charge. If I copy this over and add it here and run this test — let's copy that and put it right here. Let's add another name: should return a valid credit card. And let's say I want the span with type general and name charge — put it here in the selector. I want my attribute — I pulled the attribute as well — app payment card valid, should it be equal to true? That works perfectly fine. Let's save that up, and let's go ahead and run this test now instead.
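Put together, the exported test file being edited here looks roughly like this — a sketch from memory of Tracetest's YAML format, so the id, the request body, the card number, and the attribute names are all illustrative:

```yaml
type: Test
spec:
  id: payment-valid-credit-card        # illustrative id
  name: 'Payment: valid credit card'
  trigger:
    type: grpc
    grpc:
      protobufFile: ./pb/demo.proto
      address: ${var:PAYMENT_SERVICE_ADDR}   # comes from the variable set
      method: oteldemo.PaymentService.Charge
      request: '{"amount": {"currencyCode": "USD", "units": 124}, "creditCard": {"creditCardNumber": "4111-1111-1111-1111"}}'
  specs:
    - name: should return a valid credit card
      selector: span[tracetest.span.type = "general" name = "charge"]
      assertions:
        - attr:app.payment.card_valid = "true"
```

The selector line is the same string the UI shows when you pick a span, which is why copying it over works; you would then point the CLI at this file to run it (check the CLI help for the exact flags).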
If I go ahead and point to that file in my CLI — actually, let's just delete that part, and let's say it was called valid credit card. Obviously that happens because you're not attentive enough. We obviously need to add in the — it's a list of — let's pop back in there and run that again. And this is going to ask me for — this is what I like calling an ad hoc test — it's going to ask me for environment variables, which is super cool, because remember, maybe 30 minutes ago I was talking about how we're adding the environment in the run bash file. And here's how that gets loaded. Any time I want to run this from an environment that's already set up — I can see the variable set here, so here's my variable set. I can also do that in the CLI, where I say it like that. And it's — I think it's vars, it's probably vars. If I mess this up, it's on me. Cool, so I'm pointing to the environment — actually, the variable set — that I have already configured. I could obviously also just write a file for the variable set and have it that way. But yeah, let's pop back into the UI so we're not waiting for the terminals to get back to us. Yeah, here we go — recent, last run. And here we can see right away: payment, valid credit card. We're getting the transaction ID back, we have our trace, and obviously we have our test. And right here we can see that all of these test specs are equivalent to what we're getting — we're getting the same back in the terminal, but we're getting the same test specs as we have defined here in the test file. Whatever you prefer doing — one cool thing is that when you're running in CI/CD, you can generate all of the tests by hand if you want to actually see what's happening, then export the files and set that up to run automatically with the CLI in any CI/CD process that you have. Amazing.
We have one more question in chat, and then I think we're about finished with time. But let's go: how is privacy handled in OpenTelemetry? Is that left to the trace store, or can features like masking be leveraged out of the box? As far as I know, it's handled in the Collector. The OpenTelemetry Collector — for whoever doesn't know, we can just pull that up here in the OpenTelemetry documentation — is basically a piece of software that acts as the middleman between your system sending traces and the trace data store. And you can do pretty much anything in the Collector. You have different extensions, you have different processors that you can add in, and with the processors you can do tail sampling, you can do head sampling — you can do a bunch of different things. For this specific thing, I think it's called masking — I'm not quite sure what it's called — but if you pop into the Collector docs and look for that specifically, I'm sure you'll find it. Also, the CNCF Slack has a dedicated channel for the OpenTelemetry Collector, so asking there is going to get you an answer even quicker. Yeah. Is there anything else you'd like to add to your presentation before we say goodbye today? I mean, I think this is super cool, so I could probably be talking about this for another hour or so, but I don't really want to waste anybody's time. If you have any more questions — I'm going to try to add all of the examples that I showed today, and even more, to the fork that I'm maintaining of the OpenTelemetry demo, and then share it with everybody so you can try these samples yourself, whether you want to just get up to speed or add it to your own project. So here's the GitHub of the OpenTelemetry demo. And yeah, and your fork is... Yeah, I'll just add it in. I can also add it in here.
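For the privacy question, the Collector route would mean adding a processor to its config — for example the attributes processor deleting or hashing a sensitive key. A sketch, where the attribute name and pipeline wiring are illustrative (the contrib distribution also ships a dedicated redaction processor):

```yaml
# Collector config sketch: scrub a sensitive span attribute before export.
processors:
  attributes/scrub:
    actions:
      - key: app.payment.card_number   # illustrative sensitive attribute
        action: delete                 # or: action: hash

service:
  pipelines:
    traces:
      processors: [attributes/scrub]   # wire the processor into the pipeline
```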
So I'll actually try to make sure — because the official OpenTelemetry demo has this little part here at the bottom, where there are different demos from tools and vendors that maintain their own forks, just to show people how it's run. I'm going to try my best to get a version in there with the examples I showed today, so everybody can pop in and try it if they want to. Awesome. So we have the GitHub URL now to see the demos, if you want to get your own hands dirty with everything that we saw Adnan do today. Please, please go for it. This has been super informative and super fun. Yeah, you're super impressive. I appreciate you sharing your time and your expertise with us. And I appreciate everyone for coming and sharing your time with us. Everyone's time is so valuable — it's such a gift that you're giving us some of yours. So thanks, everyone, for joining today's episode of Cloud Native Live. It was great to have Adnan Rahic here teaching us the power of traces and why OpenTelemetry embraced trace-based testing. As always, I really loved the interaction and questions from chat — y'all are the best. Here at Cloud Native Live, we bring you the latest Cloud Native code on Tuesdays and Wednesdays at noon US Eastern, so we'll be doing another show tomorrow. Thanks for joining us today, and thanks to those who watch the recording — we'll see you again soon. Bye. Bye.