So today we are going to talk about what we are doing in Kubernetes to make the logs machine readable. Right now they are human readable, but they are not machine readable. As we take a step forward in the observability space, we need some automation on top of Kubernetes to be able to manage Kubernetes in a better way. So this is what we're going to cover today: what we have done to make that possible.

A short introduction about me: I work at [company name inaudible], and I've been contributing to WG Structured Logging, which is a part of SIG Instrumentation, and I'm also contributing to Kubernetes. If you have any questions, you can reach out to me on Twitter and we can talk. And a shout-out to Patrick and Mark: they have been investing their time and reviewing a few things to make this possible. It wouldn't have been possible without them, so we are glad for their help.

Today's agenda involves a bit of introduction about what structured logging and contextual logging are; then we're going to dive a bit deeper into what we have done to actually achieve this in Kubernetes; and then, for Kubernetes contributors, the migration instructions for writing code. As developers, we ignore the importance of logs when writing code, but it becomes a bottleneck when, let's say, a platform engineer is debugging something. That's why we have created a migration guide that people can actually use. Questions are welcome at the end.

As for the target audience: anyone who is contributing to Kubernetes is welcome, and so are people who are contributing to logging agents, who are playing around with telemetry data all day long.
So yeah, they are also welcome, and so are users of Kubernetes who are interested in managing Kubernetes. For example, with GKE or any managed Kubernetes solution, you need some sort of automation so that if something goes wrong, we can do something about it. That's why end users of Kubernetes are welcome too, so that they can build something on top of the structured and contextual logs. And it's very good for new contributors to get started, because we have a lot of work which revolves around migrating the existing logs, and that is beginner-friendly work.

Okay. So what is the motivation behind this? I think when Mark started this in 2020, the Kubernetes logs were very messy. They don't lead you anywhere; it's just some strings which can guide you, and you'd probably have to navigate through the code base to see what's happening. Kubernetes uses klog for logging, which is the default logger, and it is essentially a fork of glog. We don't have much capability in glog, so we have our own fork in klog, and we maintain it according to our needs in Kubernetes. There is no easy and standardized log collection design for Kubernetes: people are parsing those log messages, but there's no standardized way to do it. For example, in OpenTelemetry we have the OTLP log data model, which is a standardized way to collect your telemetry data, but that's not the case with Kubernetes today.

And if you want to gain fine-grained information: for example, if you have a main function, it's calling maybe thousands of goroutines, it is spawning multiple goroutines, and there's a context associated with everything that's happening. Logs can give you that information, but we need to figure out a few ways to actually achieve this. And of course, if we achieve this, we can have easy and
automated monitoring of logs. So yeah, that's the motivation behind this.

This is the very basic Kubernetes architecture: we have control plane components and we have node components. As part of the structured logging and contextual logging migration, what we essentially need to do is transform all the code to be able to produce contextual and structured logs. We have done the migration for the scheduler, we have done the migration for kube-proxy and the kubelet, and things are still in progress.

This is a basic way of collecting logs: you have your container logs, then you deploy an agent, and then in the pipeline you probably deploy some parsers to correlate a few things in your logs; then you basically store it somewhere and show it in a UI. If we are using structured and contextual logging, there are a lot of things that we can correlate, and we can attach a complete pipeline to this simple flow, so that a lot of the things we are doing manually today can be automated.

We have divided this into two parts, the structured part and the contextual part. First we're going to understand at a high level what we are actually doing here, and then what we are doing with respect to the code. So the introduction will cover the proposal, and then comes the deep dive. The proposal around structured logging is to define a standard structure for Kubernetes log messages, which is not there at the moment; to add methods to the native logger of Kubernetes, which is klog, to enforce this structure; and to add the ability to configure Kubernetes components to produce logs in JSON format.
We have chosen JSON format because it's very parsable, in the sense that it's key-value pairs, so it's the easiest way to get structured logs. And at the end, to initiate the migration of the current code base to structured and contextual logging.

This is an example of a structured log. In the beginning we see that it's an info message, then the timestamp, and we get the exact line the log is coming from. Then this is the message, and then we have key-value pairs: we have pod, which is namespace slash name, and then status. These key-value pairs can be dynamic, they can be anything, but we'll come to the structure later in the slides.

So the goal here is to make the most common logs more queryable, so that we can search for specific things and correlate a few things, and to standardize log messages and references to Kubernetes objects. Right now, if you are contributing to Kubernetes, you know that when you're writing a log, you cannot pass a Kubernetes object as a parameter to the log; you probably do something like pod.Name. But with this, you can actually pass a Kubernetes object and it will print out the essential information about it. We enforce the log structure by introducing new klog methods; to achieve this, we had to modify a few things in our klog fork. And we simplify the ingestion of logs into third-party logging solutions: it's about how to consume these logs when Kubernetes is producing gigabytes and terabytes of data, and how logging agents can actually get that data. We have done some benchmarking about that.

There are non-goals here also. We are not planning to remove klog as of now; klog is going to remain there, and we're going to keep using it for the functions that we are using today. This is just an additional layer to provide structured and contextual logging on top of klog. Similarly, for contextual logging:
What's the idea? If you are aware of contextual logging, it's like this: if you have a main function, the main function can call different functions, and a function can be spawning multiple goroutines. At the end it's like a tree structure, where at the leaves you are getting the logs. But if you don't have contextual logging in place, all these logs are going to look alike. So you need to pass the context from the parent along to the leaf. This is how it looks, and the context is also shared between different goroutines, so that information is also important. The idea is to retain the context from the parent to the leaf, and the proposal is to replace the global logger and use logr.Logger, which basically gives us the freedom to use the original context, and to add extended support in klog to use the logr capabilities to get the context from the parent.

This is a sample log. It's similar to a structured log; the only difference is this 451895 number here. If you have used, let's say, the Proxy-Wasm SDK to configure your Envoy, you might have seen that there's a concept of VMs and a concept of threads involved there. It's similar to that: here we are getting the context from the parent, the way there you get the context from the thread.

The goal here is to grant the caller of the function (the caller meaning the one who is programming Kubernetes) control over logging inside that function. So if there's a cascaded function, we can pass the context, and the same context can be propagated down to the leaf. And, for example, if you also want to check in a unit test what's happening with the complete context,
we have also added a wrapper on klog so that this functionality can be tested in unit tests as well. A few API changes are also required here, which we'll talk about in the deep dive. And again, the goal is not to remove klog; we are not deprecating klog as of now.

So let's take a look at structured logging. There are a few implementation details; these are the things that we want to achieve. We want to have a structured log message; we want to have references to Kubernetes objects in every log; we want to introduce a JSON output format so that we can parse it; and there's the logging configuration: JSON logs are huge, they are heavy, so if you don't want them, don't use them. There's a performance implication associated with this, because now we are logging more than we were logging before, so we also need to be worried about performance. And then the migration details.

So let's go one by one through everything. The first thing is the log message structure. The structure is very simple. There's a KEP where we were discussing the structure, and we finalized on this particular one. You just write your log message, and then everything which is associated with that particular context can be logged as a key-value pair. You just have to provide specific keys and specific values, and the guidelines for that are in the migration guide. So for example, if at some point you need a pod (here I've created a sample pod with some name and namespace), you can log it by writing the message, "Pod status updated", and then the key "pod" with the object. This key-and-object form is what we have introduced; we'll come to how it gets emitted. Basically, this is the idea: you have a message, you have key-value pairs, and then references to Kubernetes objects.
So for example, here we are referencing a pod. How do we actually achieve that? The idea is to use Kubernetes native APIs first, relate the information, and embed that in klog. For example, for KObj we have ObjectRef, and ObjectRef looks like this. In Kubernetes, every Kubernetes object boils down to being either namespaced or not namespaced, and ObjectRef gives us exactly those specific details. The idea is that a namespaced object is passed as namespace slash name, and for non-namespaced objects we just use the name. So here, for example, it's kubedns in kube-system, and there it's just a node name; the cluster name is given there.

As an example: we have KObj, similar to the previous example, and if there's an error, we can have a custom error message, we propagate the error here, and then again the key-value pairs.

And how are we going to introduce the JSON output format in klog? Again, by introducing new methods in the klog logging library to support JSON. With klog, we can take further advantage of the fact that there's an option to produce structured logs in JSON format. There are some pros of using JSON and there are some cons. The pros: it's very efficiently implemented using zap and zerolog; out of the box, many logging backends can parse JSON logs, so you don't have to configure any regex pattern or any parser like that, and the logs are easily parsable. If you are an OpenTelemetry fan, you can just use the filelog receiver with the JSON log parser, and all the logs can be parsed easily. And for your local debugging, tools like jq can help.

This is an example of a structured log where we have the timestamp and then everything in key-value pairs. The log that we are referring to here is going to look something like this:
It's it's gonna look like something like this So we have the pod name and name space and other other things so how like how we have introduced the This is one another example. So for example, this is a cubanities native object pod is a cubanities native object But let's say if we have Something else. So there's a there's a request and we want to also log that so we can use custom key value pairs also So basically the idea in this migration guide is to have standardized key value pairs for Knative objects and then you can use your custom Implementation and your custom key value pairs for for other things Similarly, just like here like we are logging the pod here. We are logging the request in a similar way other objects can be logged and Then how we actually Change the logging format. So if we do not want this much of log volume The default is the text-based logging, but if you want to have the decent logging, we can just configure this flag and it will gonna emit the decent logs The like the first benchmarking that we did around this is so info f is the previous way of logging and info s is the new way of logging so if you look at the time that is like nanoseconds per operation So it's there's not much difference. There's a slight difference and again with number of bytes There's not much difference and the CPU allocation. It's almost is exactly the same so info s implementation for text is nine percent slower than info f and It's roughly takes two percent of overall CPU CPU usage if you are just logging if you're not doing anything So there are user stories around this So if we want to take a look like why we are doing it in the first place. So so people are using Kubernetes and then they they are facing hard time to actually debug when random schedulers goes down or a Bluebird is not responding. So how do we actually do bug it? So like contextual logging plays a big role. 
So Instead of navigating through the code and just looking at the right and just correlating things on your head You can now use the contextual number to actually correlate by using a software and that's gonna be very helpful for platform engineers and the implementation details around is like removing the dependency on the Global keylog logger and extend keylog for contextual logging and use migrate a six log and attach context with it So that's what just we discussed right now so The idea is to when we are writing code instead of using keylog error s we need to use logger dot error and if so basically logger log r dot logger is an instance of So if the log is already migrated to such a logging we can use klog dot logger, which is actually an instance of logger dot logger So for example, these are the three sample methods that we have. So for example I want to retain the context. So I can use from context, which is internally using log r dot from context. So And similarly for the background It's it's received the Context related information and then context dot to do is also there. So it's just like Using a ctx package, but but within the context of klog And here are migration instructions. Let's Take a look at it so So the idea of this migration guide is to so that anybody who is contributing to Kubernetes They can actually refer to it. And then if they are writing new logs, they can Get an idea how to actually write Conductional and structured log and if they are a new contributor who are migrating the old logs to the new logs They also need to refer to this So the so like we have defined everything like what is the what we were previously using and then what are the new functions that have been introduced So for example for contextual logging, we need to change klog dot error s to logger dot error And we need to use the sender function. 
So I recommend taking a look at this migration guide.

What we actually need from you is your help. Consider the big volume of PRs: for example, we had 16 huge PRs for the kube-proxy migration, and for the scheduler we also had a lot of PRs. We have divided the process into two phases. The first phase is that someone from WG Structured Logging or SIG Instrumentation reviews the PR and sees whether everything is migrated properly. The second part is when the code owners of that particular component take a look, in case they want to modify the logs, because as the code grows, sometimes what we wrote earlier does not make sense now. It's about having a second look at the logs. So the idea is that people from the structured logging group just LGTM the PR, and then the maintainers have to approve it if it's correct.

And for contributors, we have the WG Structured Logging channel and we have a bi-weekly meeting. Please join; it's very beginner friendly, because a few things here are already implemented and the migration part is all we need. Please join and ask questions.

And yeah, I think that's it. If you have any questions, I will try to answer them; if I cannot, Patrick is here, he will help me out. This was not a very deep talk; it was also targeted at new contributors, so it's a high-level view. But if you want to dive deep, I have links in the slides, and I will upload them with the session so that you can take a look at what kind of implementation we have done, and we can go from there. Yeah, please.

[Audience question]

Okay, so if I understand the question correctly: in a Kubernetes-native system, we are logging either to the service logs or to the container logs, but it's like dumping logs somewhere and then reading them from there.
So there's a lag there. Are you asking whether we're going to expose some API so that you can directly query logs from the Kubernetes API server, or something else? Okay, so the question is whether there are any plans around that.

You wouldn't get any output from kubectl logs, so we would kind of have to do both: log to a file for kubectl, and write to some agent. But there is no standardized API for that that I know of. Well, it's mostly a question for the SIG Node maintainers and the container runtime folks, how they would want to optimize that. I'm also not sure whether there actually is an improvement. Intuitively you would think there is, but if you read from a file that has just been modified, the data will be in the Linux kernel cache. Essentially, the write goes into the cache, and the read that happens immediately afterwards, because someone is following that file, will come from the cache. And serialization is necessary anyway: even if you write through a pipe or something, you need to serialize and deserialize. Even if it's JSON, you still need a byte stream, so that overhead doesn't go away. So I'm not sure how much we can actually gain in terms of performance if we just optimize that data path. It might be worthwhile to experiment, but it would involve quite a few components, container runtimes and so on. It would be interesting if someone digs in here and reports back.

But I think today, in the Kubernetes-native ecosystem, that's what every agent is kind of doing: they are reading from the log files.

[Audience question] We're talking about logs here, but there's other telemetry that the components might be exposing, right? The components are exposing metrics, traces, maybe something else in the future. How are those other signals exposed, and is there a way to converge? Yeah, so we have SIG Instrumentation, which is actually taking care of that.
We expose metrics, traces, and events, and there are ways to consume them efficiently. But this talk was specifically about the logging. Yeah, sure.

[Audience question] And the metrics are exposed in Prometheus format, right?

[answer inaudible] Okay. Thanks. Thank you.