Hello there, I'm Michael Shee, I'm a Product Manager for Developer Experience here at LogDNA, and today I'll be talking about planting trees and Kubernetes observability. So just a quick agenda of what we're going to be talking about today. I'm going to start off with a quick background of who we are as LogDNA and why we're here at KubeCon, then a quick discussion of our thoughts on what makes a good Kubernetes observability tool. And then on to the demo portion, where we'll be standing up a production-ready service that plants trees. It will be a microservice deployed on Kubernetes, and of course we're going to have some observability wrapped around it to debug our issues as we bring it into production. So just some quick background: LogDNA is a centralized log management service. Over the past three years, we've been helping customers manage their Kubernetes logs, though of course we also accept logs from all sorts of other sources. But Kubernetes is certainly one of our largest logging sources for our customers. And on top of that, we've been built from the ground up on top of Kubernetes. We operate worldwide deployments on both bare-metal clusters and managed Kubernetes clusters. And through our years of operating Kubernetes internally, we've learned a thing or two about observability, especially as it relates to Kubernetes and microservices. So what makes good observability tooling? When we have internal conversations around observability tooling, or talk to our customers and their engineering leaders, we typically aren't talking about high-cardinality data or artificial intelligence. They're really worried about making sure their observability tooling enables their microservice teams to solve their application problems at the end of the day. It means making sure that these service owners aren't being bottlenecked in solving their issues.
They're not dependent on other teams to solve their specific microservice issues. It also means that your shared services teams are able to be a lot more high-leverage: they don't have to triage requests from a whole host of different microservice teams that might be trying to deploy and running into issues. On our end, we think there are three big properties that enable this goal. The first one is really about lowering the infra learning curve. As a service owner operating on top of infrastructure, you really shouldn't need to be an expert on that infrastructure. Especially when you're on top of very dynamic infrastructure, it can be difficult at times to tell whether there's a service-level issue, something wrong with the microservice under your control, or an infrastructure issue, an underlying problem that's actually outside of your control. You shouldn't have to be an infrastructure expert as a service owner to understand that. And at the same time, as an infra team member, you shouldn't have to be a service expert to tell whether it's a service issue or an infra issue. Your observability tool should be working to bridge that gap between those two roles. The second part is about habits and workflows. We're all creatures of habit, so it's best if your observability tooling fits in with the existing tools you already reach for: if it's already part of a workflow, the tool should fit into that existing workflow. What's even more important is that once you're in that tool, you're able to get your job done without a ton of context switching, flipping in and out of the tool just to get the information you need, so you can stay focused on getting your job done.
And the third part, of course, is that it has to be easy to set up and get started with. We all know that no engineer wants to sit through training sessions just to get started with a new tool, and even the most powerful tool available isn't going to be useful to you if it's too hard to set up. We all know that you're not going to spend all of your time on observability, so ease of use and ease of setup are crucial to making an effective observability tool for your team. So, on to planting some trees. For this demo, we're going to deploy a microservice into Kubernetes that plants a tree on every single request made to it. But like a lot of applications, we're going to run into a few hiccups on the way to production, and that's going to be OK. I'll be playing the part of a service owner, and we're going to be deploying a new microservice into production to help with sustainable logging by planting trees. All right, awesome. So now let's hop right into the demo. What I have here is a small microservice in JavaScript that will plant trees upon request. So there you go, the server is now running. I'm just going to test this locally to make sure it actually works on my computer before I push it out to the cluster. And there we go: I just made the curl request, and we see that it comes back. So that's fantastic. I've already prepared the deployment manifest as well, so we're just going to run kubectl apply -f, then grab the pods, and we'll see that the pods are being created right now. So one of the pods is running. And I'm going to get ahead of things a little and get ready to port forward to the service that we've just stood up. And now what I should be able to do is just curl localhost:8080.
It'll go through port forwarding and reach the service within Kubernetes, and if all goes well, we should get some trees planted. Of course, deploying to prod is never that easy, unfortunately. So we're just going to check what's going on with the pods. One thing I'll notice, while this looks quite similar, is that there's a restart on this pod. That's kind of interesting. Let's try to get the logs of what's going on. And the logs are empty. Those who have run into this before in Kubernetes know that it's really hard to get the logs out of a container that's already restarted. That's what we're running into here: kubectl is pulling the logs from the latest container that just got restarted, instead of the older one that errored out and would have that valuable crash log telling us what the issue is. (kubectl logs does have a --previous flag for this, but those logs are gone for good once the pod itself is deleted.) This is where centralized logging is extremely useful. And getting some centralized logging and observability into our system is just a couple of kubectl commands. This will set up both the agent and the reporter, so logs and events will both be pulled into my LogDNA instance. So here, I'm just going to make another request, and this time it should be captured within LogDNA. So it should be a pretty easy thing to go into LogDNA and pull up the logs. And you'll see pretty quickly: OK, awesome, there's this fatal error here saying that there's an env var that's undefined. So let's fix that up. I do have the config map already here, but one thing I'll notice is that the pod spec here doesn't actually reference the config map. So let's go here, make sure the config map is referenced, save that, and apply the deployment. And here we go, it is now configured.
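The fix being described is wiring the ConfigMap into the pod spec. A sketch of what that part of the deployment manifest might look like, where names like tree-service and tree-config are assumptions rather than the demo's actual identifiers:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tree-service            # assumed name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tree-service
  template:
    metadata:
      labels:
        app: tree-service
    spec:
      containers:
        - name: tree-service
          image: tree-service:latest   # assumed image tag
          ports:
            - containerPort: 8080
          envFrom:
            - configMapRef:
                name: tree-config      # exposes every key as an env var
```

Without that envFrom block, the container starts with the variable undefined, which is exactly the fatal error surfaced in the logs.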
And what I'm going to do is also restart the port forward, since the deployment is being redeployed. So there we go. Now I'm just going to look at the live tail here to monitor the progress. And very easily, with LogDNA running in the background, it's telling me: OK, cool, there's this error here about the config map not being configured correctly. And there seems to be a typo right there. So that's not a big deal; let's just fix that up real quick. Fix that, do another apply, and of course I'll have to restart the port forward in just a second. But in this case, let's take a look at LogDNA this time. So now it's saying that it was created, whereas on my first try it was creating and then got stuck on an error. So let's just monitor what's happening here. Great, so now it's running, and there are the new pods. We'll just restart our port forward. Now we should be able to curl the service like we were trying to do before, and hopefully get an answer from these pods. So, still not doing really well. That's certainly not a good thing. Let's check the logs real quick. Notice that you get into the habit of just checking the logs, because the logs will usually tell you all the nuances of what's happening in your system. In this case, there's actually nothing much happening: both the servers are noted to be running, and nothing else is really going on. Let's check back in with the request, and we'll note that it's definitely still running right now. Yeah, so let's give the request another try; something is going on here, it's not working properly. OK, there we go. So it's definitely working, but it's a little bit slower than what we'd really expect here. But it's certainly coming through. So I'm just going to go back to our application. So wow, that's super interesting: looks like there was a timeout on one of the requests.
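The config map being fixed here might look something like the following sketch, where the name, key, and value are all assumptions for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tree-config        # assumed name, matching the deployment's envFrom reference
data:
  TREES_TO_PLANT: "1"      # the typo in the demo was in a key or value like this
```

One detail worth knowing: editing a ConfigMap consumed via envFrom does not restart the pods using it, so re-applying the deployment (or a kubectl rollout restart) is what actually picks up the corrected value.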
So that's certainly something to look into, and super helpful to know what's going on in the application. But I don't know if I can completely explain why it's taking so long for the request to come through. So I'm just going to take a quick peek at one of the log lines themselves. There's nothing super interesting in the line itself, so maybe I'll open the context and see what's happening around it. And that's super interesting: this is red, and it's saying there are nine milli-CPUs used, out of a limit of one milli-CPU. That's probably an error here where I under-provisioned this application. Indeed, going back to the config here, really what we wanted was to give it a quarter of a CPU; I probably meant to give it a quarter CPU and gave it one milli-CPU by accident instead. But it's not a huge deal. Again, it's really easy to tell in LogDNA, when you have your metrics and your logs correlated in the same place: OK, fantastic, this is an application error. It's not slow because of a noisy neighbor problem on the underlying node; rather, at the application level there are a couple of issues surfacing up here as the primary problem. Everything else is looking healthy in terms of the replicas and the ready status of the node, and the pod itself is also looking pretty good. It just had a supposedly high CPU usage because the limit was set a little bit too low. And certainly there's some more interesting stuff to look at up here, in terms of the unhandled promise rejection that's happening in the API, but that seems to be on the API side of things that we'd have to fix up. So at this point, we can just take a quick look to make sure that we have our actual servers running, and indeed, the servers are running now. So we're just going to make a quick request. Forgot to restart the port forward service. Make a quick request, and there we go: now the API is responding a lot faster.
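The under-provisioning described here comes down to Kubernetes CPU units: quantities use the m suffix for millicores, so 1m is one-thousandth of a core, while a quarter of a CPU is 250m. A sketch of the corrected container resources, with the surrounding values assumed for illustration:

```yaml
resources:
  requests:
    cpu: 250m          # a quarter of a CPU; the typo was "1m" (1/1000 of a core)
    memory: 128Mi      # assumed value
  limits:
    cpu: 250m
    memory: 128Mi
```

With a 1m limit, the container is throttled almost constantly, which shows up as slow responses and timeouts rather than an outright crash, exactly the symptom seen in the demo.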
And it's certainly working a lot better right now. Fantastic. And we can check that there's nothing else weird happening in these logs. So that's the quick demo of deploying this application, getting it running in production, and making a couple of requests to it. Obviously, there are more things we could clean up at the application level, but here we've already got our service in production, and we've been able to debug a couple of issues in the logs quite quickly. And beyond the service owner view, perhaps as a cluster operator, there are even more things that you might be interested in within LogDNA, and there's a lot more that I could go through in this demo. One thing I'd love to point out is some of the dashboards that we include out of the box for Kubernetes and Kubernetes enrichment, such as the cluster overview here, which helps you aggregate information across the entire cluster: how much memory is being used, CPU, the nodes that are ready, and some other useful information about your cluster. So I know we didn't get a chance to plant a lot of trees with our microservice as part of this demo, but please come by our booth in the Gold Hall, and if you drop us a message on how logging has helped you troubleshoot on the board at our booth, we will donate $10 on your behalf to the Family Giving Tree. And of course, please feel free to come by and chat. My colleagues and I will be there, happy to talk about anything from Kubernetes to observability, logging, or LogDNA. And thank you all for watching this demo on planting trees and Kubernetes observability.