Hi, and thanks to Omid, that was a great talk. I'm here to talk about observing the gRPC protocol. But first I want us to take a look at the bigger picture, because this is about much more than gRPC: observability into our clusters will undergo an enormous revolution in the coming years, thanks to eBPF.

When I say observability, I'm talking about our ability as developers and DevOps to see everything that's going on in our cluster. This includes all the interactions between our services, all the metadata that accompanies them, various events such as program exceptions and protocol errors, and determining exactly when all of this occurs and which processes it belongs to. This is what I call full observability.

Today, your cluster is most likely in the dark. When a bug occurs, you don't really know what happened, so you have to try to reproduce it in your development environment, and that's only if you care enough. But if we are able to reach full observability, this will totally change.

I'm Ori, founding engineer at groundcover.
I'm very happy to be here today. I've been implementing observability solutions for many different use cases for quite some time, and I fell in love with the magic that is also known as eBPF. What I want to share with you today is the most important insight I found throughout my experiments: that with eBPF, we can reach full observability.

The best way to understand that is by example, and this is where gRPC comes into the picture. gRPC is quickly becoming the preferred tool to connect microservices, replacing a lot of the uses of HTTP in the modern era. It is also quite hard to observe, and this is exactly why it makes the best example for our revolution.

I will not dive into the technicalities of gRPC here, but there is some basic stuff we should know. When a client wants to request a resource from the server using gRPC, it sends a GET-like request with the resource name as a header: the path header. The headers are then cached, to minimize the total amount of data sent over the connection, because headers usually repeat themselves. This basically means that if the same resource is requested later on, it is given an ID at the first request, and the ID is used instead of the resource name. In our example, the ID 7 is requested instead of the resource name, Oli.

We also need to know that gRPC supports multiple streams. This means the client may request more than one resource at the same time. And if it does so, requesting for example two resources almost at the same time, the server may even respond to the first request only after responding to the second one.

gRPC, like every other protocol we know of, is implemented in different libraries. The main library I will discuss today is the gRPC C library. It is used by a lot of languages, including Python. So today we are going to implement observability for the gRPC C library. But first, let's consider: what do we really want to observe?
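The header-caching mechanism described above can be sketched as a toy model. This is not the real gRPC/HTTP2 wire format, just the HPACK-style idea that a header is sent in full once, gets an index, and later requests carry only the index; the names and the starting ID of 7 merely echo the talk's example.

```python
# Toy model of gRPC/HTTP2 header compression (HPACK-style dynamic table).
# A header goes out in full the first time and receives an ID;
# afterwards only the ID crosses the wire.

class HeaderTable:
    def __init__(self):
        self._ids = {}      # header -> assigned ID
        self._headers = {}  # assigned ID -> header
        self._next_id = 7   # arbitrary start, echoing the talk's example

    def encode(self, header):
        """Return what goes on the wire for this header."""
        if header in self._ids:
            return ("indexed", self._ids[header])   # later requests: just the ID
        hid = self._next_id
        self._next_id += 1
        self._ids[header] = hid
        self._headers[hid] = header
        return ("literal", hid, header)             # first request: full header plus new ID

    def decode(self, frame):
        """Recover the header; 'indexed' only works if we saw the literal."""
        if frame[0] == "indexed":
            return self._headers[frame[1]]
        _, hid, header = frame
        self._headers[hid] = header
        return header

table = HeaderTable()
first = table.encode((":path", "/Oli"))   # ('literal', 7, (':path', '/Oli'))
second = table.encode((":path", "/Oli"))  # ('indexed', 7)
```

An observer that only ever sees `('indexed', 7)` has no way to recover the path, which is exactly the trap waiting for us below.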
Let's take a simple example, the most simple one I could think of: two containers, a client and a server, communicating over gRPC. The client periodically, every 10 seconds, sends a request to the server asking for resource Oli. It's quite a common use case: a client that needs to know when a resource is updated.

In that communication, what do we really want to observe? Well, we have to see the gRPC data. This is the most basic part of the protocol; in our case it's just the resource being downloaded. We also need to see the headers, especially the path header we talked about, the resource name. We also need to know when the stream has been closed; that will indicate to us that the resource has been downloaded, or that there was some other error. And lastly, for our simple example we don't really need the stream ID, but if the client were to ask for more than one resource at the same time, we would need the stream ID to differentiate the responses.

gRPC connections can live for a very long time; these requests every 10 seconds could go on forever. This means we are most likely going to start observing the traffic when the resource name, alongside other headers, is not transmitted anymore.

So the first solution that eBPF brings to the table is kprobes, kernel probes. They can be attached to kernel functions, including system calls, and there we have access to the arguments of those functions. For example, we could attach probes to the send and receive system calls and have access to their arguments. This basically gives us everything going in or out of the container, and this is really powerful.
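As a user-space analogy of what a kprobe on the send syscall observes, here is a sketch in plain Python sockets (no eBPF involved). A real kprobe would see the same raw buffer at the kernel boundary; the wrapper function here just stands in for the probe.

```python
import socket

# Everything a process sends eventually crosses the send/recv syscalls
# as a raw byte buffer. A kprobe on those syscalls sees that buffer;
# here we emulate the probe with a logging wrapper around sendall.

captured = []

def observed_send(sock, data):
    captured.append(bytes(data))  # what a kprobe on send would log
    sock.sendall(data)

client, server = socket.socketpair()
observed_send(client, b":path /Oli")  # first request: full resource name on the wire
observed_send(client, b"7")           # later request: only the cached header ID

# The server still receives everything normally.
received = b""
while len(received) < len(b":path /Oli") + len(b"7"):
    received += server.recv(1024)

client.close()
server.close()
```

Note that the second captured buffer contains only the ID: an observer who attaches after the first request never sees the name. That is the problem coming up next.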
eBPF is used this way to monitor a lot of different protocols, like HTTP. It also looks kind of promising, right? Everything we talked about, all four items, has to be there, coming in or out of the container.

Well, the problem is that kprobes just don't cut it in our case. The header compression mechanism we talked about, which replaces the resource name with an ID, makes the gRPC protocol stateful. If we used kprobes from the beginning of the connection, we would observe the first request and see that 7 represents the resource Oli. But we might start observing when the connection is already alive, and we also want to do that without harming the connection in any way, with zero downtime. If we do that, we are going to miss the resource name. So what can we do?

Luckily, eBPF also introduces uprobes, user-mode probes, and they can be attached to user-mode functions, including library functions. Using uprobes we can, just like with kprobes, see the arguments of the functions we probe. So probing inside the gRPC C library can give us information from the memory of the library in the middle of the connection. The gRPC C library obviously knows that the number 7 represents the resource Oli; otherwise it just wouldn't have worked. If the server didn't know what 7 was, it wouldn't know how to respond to a request for resource 7. So if there were a gRPC receive function that receives all the information we seek as arguments, we would probe it and see all the incoming information into the container. We would just need another probe for the complementary gRPC send function, and we would be done: we would have full observability.

When you think about it, this kind of solution works for all types of stateful connections, including even encrypted traffic. The encryption libraries, libssl for instance, obviously know what the unencrypted data is; after all, they are used to translate encrypted to unencrypted data. So using uprobes in the correct locations inside these libraries will provide full observability into encrypted traffic. I've got a feeling we are going to hear more about that later today in a talk by Dom, and I am definitely looking forward to that one. But for now, let's focus on our gRPC use case.

A similar concept that Omid talked about briefly is USDT: user-level statically defined tracing. USDT functions are basically functions like the gRPC receive function we just imagined. They are functions designed to make monitoring the library easier: they receive all the information you need to monitor as arguments. Monitoring a library that has USDT functions is the most simple and reliable uprobing solution you could have.

Our problem is that the gRPC C library does not have USDT support. We can add USDT support; after all, this is an open-source, community-driven library. Hopefully the library developers will like the idea and merge our pull request, and hopefully they will also deem it important enough to be turned on by default in newer versions. That is great, because when the new version that has USDT functions is rolled out, and everybody updates the library version they are using, which will take, I'm guessing, about three years, then we can have full observability into the library's traffic using USDT probes. But what can we do right now?
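What we are about to do by hand approximates what a USDT hook would give us for free: a single function whose only job is to receive everything observable as arguments, so a tracer has just one well-known symbol to instrument. Real USDT probes are declared in C (for example with the `DTRACE_PROBE` macros from `sys/sdt.h`); the hook name and fields below are invented for illustration, as a Python analogy.

```python
# A USDT-style tracepoint boils down to a deliberately probe-friendly
# function: it does nothing itself, but receives the stream ID, the path
# and the data as arguments, so observing the library means attaching to
# this one function instead of hunting through internals.

events = []

def grpc_usdt_recv(stream_id, path, data):
    """Hypothetical hook: the library calls it on every received message."""
    pass

def attach_probe(hook):
    # Stand-in for attaching a uprobe/USDT probe to the hook:
    # record the arguments, then let the hook run unchanged.
    def probed(stream_id, path, data):
        events.append((stream_id, path, len(data)))
        return hook(stream_id, path, data)
    return probed

grpc_usdt_recv = attach_probe(grpc_usdt_recv)

# Somewhere inside the library's receive path:
grpc_usdt_recv(stream_id=7, path="/Oli", data=b"resource bytes")
```

Without such a hook, we have to find ordinary internal functions that happen to receive the same arguments, which is exactly the hunt described next.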
Well, there ought to be send and receive functions inside the library, and the headers and the stream ID must be somewhere in the library's memory too. So the solution to our problem may not be as simple as two neat USDT probes on library functions, but the concept is similar: we are looking for functions that receive everything we seek as arguments. gRPC C is an open-source project, and that makes our task way more feasible; we can look directly at the library's code to search for these functions.

So, let's see what uprobe hacking looks like. The first thing we're going to search for is the gRPC data; it is the most basic part of the protocol. Let's look for gRPC receive and gRPC send functions, and hopefully we will find functions that have access not only to the data but also to the stream ID and the headers, providing us with everything we need in just two probes. This is our dream right now; this is what we are hoping to find.

But how can we even start to search for these functions in the hundreds of thousands of lines of code inside the gRPC C library? Trust me, I've counted. Well, we can search for them in three ways.

The first is a bottom-up search, where we start by looking at the kernel system calls, send and receive. We've established that the data we search for goes through there. Then we can ask ourselves: who uses these functions? The data must be there too. We ask the same question again and again, working our way up through the data-receiving and data-sending flows, and hopefully we will find our perfect functions.

The second way is a top-down search, where we start by looking at the API of the library; these are the functions that the user process calls in order to communicate using the gRPC protocol. Then we start working our way downwards, again through the data-sending and data-receiving flows, and again, hopefully, we will stumble upon our perfect functions.

The third way is a middle-out search, where we start right in the middle, between the API functions and the kernel system calls. We can do that by, for example, searching for a string, say "send". If the functions in the library are named properly, we will land somewhere in the middle of the flow, and then we can start expanding outwards in both directions.

So, starting with a bottom-up search, I found the flush-data function. It is used to send data, and it also has access to a stream struct that has the stream ID and the headers stored within it. That is amazing, because all the data we were searching for is in one place. But it wouldn't be that simple; I wouldn't be here if it were that simple. This function is compiled inline. This basically means that the compiler, for whatever performance reasons it knows better than I do, decided that it's as if this function doesn't even exist: as if the code written in this function were instead written directly inside the function that calls it. And that's bad news for us, because we can't probe a function that doesn't exist.

So let's look at who uses this function (which is, by the way, bottom-up searching), and that is the begin-write function. What it does is iterate through all the different active gRPC streams and send the data for each of them that is ready to be sent, using a set of three functions. All three are compiled inline, and that makes begin-write quite a huge, bloated function. Theoretically, we do know that the flush-data logic is embedded directly there, in the middle of begin-write, so we could probe it in the middle.
eBPF does give us that power, but doing so is way harder, and even harder to maintain. So let's look for something else. Instead, we found a pop-writable-stream function, which is used by begin-write to iterate through the different active gRPC streams. Probing at the end of this function gives us, as the return value, the stream struct, the same one that is later passed to flush-data. This struct has the stream ID and the headers stored within it, and from that specific context we can also use it to access the data. And that, just as we dreamed, is everything we searched for in just one probe.

If you thought this wasn't hard enough: to see incoming data, we needed three probes, one for the data and two for the headers. So, summing up: with five probes on five library functions, one for the outgoing data, three for the incoming data, and a last one to see when the stream is closed, everything works. And that is really exciting to me.

So I want to show you this solution put into action in this demonstration. In the bottom right you can see the eBPF code. In the top right is the gRPC server, which is now starting, and the first thing that's going to happen is the eBPF code attaching five uprobes. The client, on the left, will then start sending periodic requests to the server, and also printing all the information from the connection. As you can see, with zero code changes to the server or the client, the eBPF code magically has access to all the gRPC information from the connection.

Uprobe hacking was also used in the Pixie open-source project, to monitor the Go gRPC library. With this collaboration, Pixie now supports tracing both libraries, gRPC C and gRPC Go, and all the languages that use them. You can see the code for both solutions on GitHub. There are still some languages left unsupported; you can see three examples in the diagram.

This was rough. We can repeat this process for every library we wish to observe, but why bother?
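Before moving on, the key trick above can be sketched in code: probing at a function's return, which is what a uretprobe does, hands us whatever the function returns. Here is a user-space Python analogy where a wrapper plays the role of the uretprobe; the class, queue, and function names are invented stand-ins for the library's internals.

```python
# User-space sketch of a uretprobe: wrap a pop-writable-stream-like
# function so that we capture its return value, the stream struct
# carrying the stream ID and the cached headers.

class Stream:
    """Stand-in for the library's huge stream struct."""
    def __init__(self, stream_id, headers):
        self.stream_id = stream_id
        self.headers = headers

seen = []

def uretprobe(fn):
    # The "probe" fires on return and reads fields out of the struct.
    def wrapper(*args, **kwargs):
        stream = fn(*args, **kwargs)
        if stream is not None:
            seen.append((stream.stream_id, dict(stream.headers)))
        return stream
    return wrapper

@uretprobe
def pop_writable_stream(queue):
    """Stand-in for the library's pop-writable-stream function."""
    return queue.pop(0) if queue else None

queue = [Stream(7, {":path": "/Oli"})]
pop_writable_stream(queue)  # probe observes (7, {":path": "/Oli"})
pop_writable_stream(queue)  # returns None; nothing recorded
```

One probe, and the whole struct is in hand; the real eBPF version reads the same fields out of the struct's memory at known offsets, which is where the maintenance problems below come from.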
As I said before, using USDT as a short-term solution for observability purposes is futile; it would take three years, in the best case, until we can reap the fruits. But long term, as one of eBPF's forefathers, Brendan Gregg, said: USDT is the solution. The main reason is that uprobe hacking has its problems. They all stem from the fact that we use the magic of eBPF to access information in the library that is not made visible to us by the library developers.

It's not that big of a deal, right? I mean, we saw this process. It's hard, but it's not impossible, and we only have to do it once per library we want to observe. Well, the real issue is that this kind of solution may be hard to maintain. So let's explore three reasons why this specific solution may require more effort in the future, all three of which would not exist if we used USDT probes.

The first is that library developers refrain from changing the definitions of the library's API functions. This matters because they wouldn't want someone's code to stop working when they update the library version. So to counter this, whenever you uprobe-hack (and if you do, be cautious), probe API functions; they usually have all the data you need. In our case that wasn't a possibility, because when pip installs the library for Python usage, it does something quite different. So what we did was look back, two years back, and the functions you saw, the ones we chose, are functions that didn't change in that time period. But this only minimizes the risk, because they can still change in the future, and if they do, our probing solution will need to be changed accordingly.

The second is that the stream ID and the headers, and sometimes even the data, that we found are all in one huge stream struct that spans thousands of bytes, and it changes almost every library version. If, in a new version, a new field is added to the struct, even one that has nothing to do with the information we seek, it might, for example, shift the stream ID forward by four bytes. This means that for our probes to work, they need to be aware of the library version they are probing. In our case we had no easy way to find the library version, so what we had to do was download a bunch of library versions and hash them all; now, when we encounter a new library, we can hash it and thereby know its version.

The third is that new versions of the gRPC C library, this specific library, are sometimes compiled without symbols. This means we don't know the addresses of the functions we are probing, and if we don't know the addresses, we can't probe them. The functions we chose are functions that have unique strings inside them, like the one you see on the screen. These strings can be used to find the address quite easily using reverse engineering, but it still needs to be done for every library version.

The work you heard about here, uprobe hacking, is observability right now. Uprobe hacking is a necessity, and a necessity that I enjoy very much, to be honest. But eBPF can be made easier.
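The version-fingerprinting step can be sketched like this. The version numbers and byte blobs below are invented placeholders; in practice you would hash the actual `.so` file of each released build.

```python
import hashlib

# Build a fingerprint table once: hash every known build of the library.
# Later, identify an unknown copy by hashing it and looking it up.
# The "library files" here are stand-in byte strings, not real builds.

known_builds = {
    "1.46.0": b"\x7fELF...stand-in bytes for build A...",
    "1.48.1": b"\x7fELF...stand-in bytes for build B...",
}

fingerprints = {
    hashlib.sha256(blob).hexdigest(): version
    for version, blob in known_builds.items()
}

def identify(library_bytes):
    """Return the version of a library blob we've seen before, else None."""
    return fingerprints.get(hashlib.sha256(library_bytes).hexdigest())
```

With the version identified, the probe code can select the struct offsets and function addresses that match that exact build.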
It shouldn't be that hard. Modern libraries should add USDT support to make this easier, because this is the future. We already added USDT support to the gRPC C open-source library, and if I've convinced you that this is a worthy cause, please upvote our pull request. Meanwhile, we paved the way, using uprobe hacking for observability to grow, giving it everything we've got, so that in a few years we will be able to leverage the true power of eBPF: full and effortless, on-demand observability.

So, you've experienced the power of eBPF in changing the world of observability. gRPC C is a very interesting and complicated use case, and this work depicts only the first steps of the revolution. I invite you to stay tuned in the coming years and see how it all turns out, or even join me and be a part of this too. You could, for example, add USDT support to a library that you use, or even maintain, if you have one. If you do, or even if you don't, please message me; I would love to be of help.

So thank you very much for having me today. I hope to see you the rest of the week, and I'd love to hear any questions.

Thanks, Ori.