Right, hey, thank you everyone for joining this session. Hope the digestion didn't kick in, right? Are we good? All right, nice. Did you bring coffee for everyone? All right. So, yeah, again, thank you for joining this session. Today we're going to be talking about observability in terms of service mesh capabilities and what we can do better, okay? There's a lot going on today with eBPF, and there's a bit of market confusion about which technologies we should use for our service meshes; there are multiple ones out there. In this talk I'm taking Istio as my example, because that's what I know, but I think this can fit any other service mesh technology, because monitoring is something common to every single service mesh: we all need to know what's going on in our cluster and between our services.

Okay, so a little bit about me. My name is Adam Sayah. I'm a field engineer at Solo, so basically I help prospects and customers with all their API gateway and service mesh questions. And recently I've been getting really interested in the eBPF realm. For me, it's a domain we should really investigate when we deal with application networking in general, okay?

So today we're going to go through a quick reminder about why we should actually care about a service mesh in the first place. Then we're going to look at the pros and cons of what we currently do with service meshes in terms of monitoring. We're going to talk about how we can improve this a bit, how we can go that extra mile in configuration and optimization. And I'm going to do a quick demo. The demo here is not part of any product; it's just something I put together to get the idea across. It could be implemented in other service meshes, or it could be an extension of Istio, as the main service mesh I know. And yeah, we're going to have a quick conclusion and some questions, I hope.

All right, so let's start from scratch. Why did we actually move to microservices? We moved to microservices because it reduces operational complexity. We have different teams with different needs; they need different things, right? So if we have different microservices, it's easier to manage. We can upgrade them easily, we can scale them differently, we get all the fun things that come with microservices. I remember, just a couple of years ago, we used to have projects with release lifecycles that were months and months long. You'd wait six months just to get one security patch, then you'd have to do a massive upgrade, and that was just a nightmare week; I used to take my PTO around that time just to get out of it. So now, with microservices, things are way better in that respect. Every single team can schedule their own upgrades at their own pace, and they can scale their microservices on their own terms. So it's really good for that.

Now, the thing we didn't really have to care about when we had monolithic applications, if you remember, is that everything lived within the same component. Meaning, in terms of observability and knowing what's going on, it used to be easy: I'll plug a debugger in there and see what's going on, from function calls to API calls; everything was contained within the same thing. So no problems at all in terms of monitoring.
Now, with microservices, it's a different story. We're moving to, I don't know, thousands of services. How am I going to know which one failed, right? There's no way out of the box to know which service failed my request unless I'm adding some sort of metadata, and only if I'm lucky and my microservices are returning a result. Otherwise I just have to dig through every single log in my log collection, and it's a mess. So how do we improve this? That was actually the main premise of what a service mesh is. A service mesh is a tool that helps us in a microservice architecture, mainly with three things. First, traffic control in general: resiliency, rerouting, blue/green and canary deployments of services and all that stuff, that's fine. Then security, security in transit: mainly I want mTLS between every single component, every single microservice, just to be safe, since now that the services are scattered I have a larger attack surface. But again, one of the main things in a service mesh is observability. I need to observe what's happening. If something goes wrong, I need to be able to go back and see exactly what happened, trace it back to the right service that failed, try to debug it, and just have a good monitoring stack on top of all my microservices. So that's what service meshes try to answer.

Now, the thing is that we have multiple technologies out there, and most of them face pretty much the same problem. First of all, monitoring itself is hard for different reasons: you need to capture the right data, you need to capture it in the right spot, and that concept alone is kind of complicated. Plus, a lot of service meshes out there still use proxies. Proxies are a main component of many service meshes, and they're important, important for all the things I talked about previously like security and traffic control, even if other tools could do those, but also for monitoring: today we use proxies a lot for monitoring. Now, the problem with that, whether we're talking about sidecars or even sidecar-less, is that if my traffic is not going through that proxy, nothing gets created, right? Nothing gets generated if there is no proxy in the path. And that's not the reality of the world. We are not in an environment where we can just go and put sidecars and proxies everywhere, and I don't think it's a good idea either. Let's say I'm a big company and tomorrow I want to adopt a service mesh. Am I just going to go overnight and inject a sidecar proxy into every one of my microservices and redirect all the traffic? I don't think that's a good approach. I think we should do it gradually: always start with a safe application, test that everything is well, then start migrating more applications, add them to the mesh, maybe not with strict mTLS right away, maybe just permissive mode for now, then add security gradually. But this also affects monitoring: if you do things gradually, you're also observing gradually. And that doesn't make sense, because if your application has A calling B, calling C, calling D, and A and B are in the service mesh but C and D are not, then when you observe your stack you're going to see A and B, you're going to have metrics around A and B, but nothing for C and D. In the end, you are only half monitoring your stack, and this is something we should look into.
Now, that's basically what I was just talking about: we need to go gradually. And also, proxies are expensive. We can debate that all day, but they are expensive because they are an actual component you add to your infrastructure. So if you add them, you need to add them for the right things. L7 traffic, for example, is handled really well by proxies. But maybe for monitoring there's something else, and maybe we should look into something different.

Now, here is an example of a traffic flow from an application to a proxy. That proxy can be a sidecar, like in the traditional sidecar approach, or it can even be in the sidecar-less world, where we still have a proxy somewhere, either on the node or on the cluster itself; it works the same way. You still go from the application through the kernel, the socket, the TCP/IP stack, out to the network, all the way up to the proxy, then through the network again, and then down to the physical layer to send the traffic somewhere else. All of this is important, I mean, we still need it. But what if we can optimize this in terms of metrics? Why should I wait for the traffic to go all the way through all this stuff just so the proxy can generate my metrics? And actually, here we're making a big assumption: that the request actually reaches the proxy successfully. Imagine something fails even within the kernel, I don't know, different versions of different things, a problem with the request in flight itself. How can I know what's going wrong in my environment?

So how can we enhance this? How can we improve monitoring with service meshes in general? I think the thing we should look into is eBPF, okay? That's the core of this talk: how can we use eBPF to improve monitoring in the service mesh? Who here has heard of eBPF? I'm pretty sure most of you have, it's a hot topic. Okay, awesome. Who is using eBPF in some manner? Okay, so we have a good crowd here. eBPF is a technology that deals directly with the kernel. I'm not going to go too deep into the topic, but I just want to stress the fact that we are not dealing with the same problems we used to have with a proxy, right? Now we have something lower-level that works directly with the kernel, to watch, to report, to modify data. So there is a way to interact at a layer really close to the kernel, and therefore really close to the physical layer. We're going really deep there.

So how does it work? This is the basic idea. In Linux, in terms of memory separation, you have a kernel space, where your kernel is actually running and doing its stuff, and Linux is really strict about what can run there, right? We don't want to run anything that can impact the machine in the kernel, because obviously we want the machine to keep running. But if an application fails, well, the application fails; that's the user's problem, not really the kernel's problem. That's why there's a separation between kernel space and user space. And what we usually see from a proxy perspective is that the proxy runs in the user space, doing traffic shifting, monitoring, all that stuff. That's fine. But the kernel space is something we don't traditionally interact with, okay, in a service mesh context.
How it works with eBPF is that we can create a program, we can write some code. This code is going to be verified, to make sure it doesn't impact anything, and then it runs on the kernel side. It listens for events, does some reporting, and we can even modify things, but in this monitoring discussion, mostly we observe things in the kernel. Then there is a way to send data to the user space, where we can report on it, generate metrics, create Prometheus metrics, and so on.

Okay, so from an efficiency standpoint, in terms of monitoring, we're no longer depending on the user space; whether my proxy is there or not is a different discussion. We are going all the way down to the kernel to observe what's happening. This is really powerful. It can be way more effective than the traditional approach: we don't need a proxy, we don't need to go all the way through all this stuff; the data is there, so let's just consume it and expose it. It is lightweight, it is effective, it is fast, and it is not on the request path, unlike a proxy. So there's a lot we can get from eBPF.

Now, that's how it could look. We still have our application, we still have our user space to run our proxy to deal with L7-type things, I don't know, retries, security, OIDC, all the stuff we sometimes want to do at the proxy layer. But the eBPF part can run within the kernel, and there we are collecting data and creating metrics, and from that perspective we can now observe everything going on in the cluster. So what does that mean? Now that we have eBPF running in the kernel, we don't need any proxy to report any metric. Again, meaning that if I'm introducing a service mesh, my monitoring doesn't have to be gradual. From the moment I install my service mesh, even if the traffic is not yet captured and redirected as part of the mesh, it's still monitored end to end. This is really powerful, right? If I have to stress only one thing during this talk, it is this: we should really look into eBPF to monitor our whole stack, while we still gradually add our services to our service mesh, okay?

So if I take an example, going back to my A, B, C, D services: I have four services running there. Even if I'm adding only services A and B to my service mesh, C and D are also monitored, meaning that if I make a call to A, I'm going to see A, B, C, and D. So I get full monitoring over my cluster, while the service mesh only partially covers two services. That's really powerful. Now, again, how it works with eBPF: you run some code in the kernel, this code listens to different things, like the network: if A is calling B, or this is happening, report it, put it into what we call a map. On the other side, in the user space, we get this data and do whatever we want with it; we can basically create Prometheus metrics from it. And actually, this is what I'm going to show. Here, I'm going to use a tool that we created internally at Solo called Bumblebee.
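To make the kernel-to-user-space mechanism a bit more concrete, here is a minimal sketch, not the demo program and not how any particular mesh packages it, of what such kernel-side C code typically looks like: a kprobe fires on a kernel event and a ring buffer map carries a small event up to user space, where something else can turn it into metrics. The hook point tcp_connect and the event fields are illustrative assumptions.

```c
// Minimal sketch (illustrative only): a kprobe fires on a kernel event and a
// ring buffer map carries the data to user space, where metrics can be built.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

// Event handed to user space: which process triggered the kernel event.
struct event {
    u32 pid;
    char comm[16];
};

// Ring buffer map: the channel from kernel space to user space.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} events SEC(".maps");

// Attach to a kernel function; tcp_connect is one example of a network event.
SEC("kprobe/tcp_connect")
int probe_tcp_connect(struct pt_regs *ctx)
{
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));

    // Submit the event; a user-space reader turns these into metrics.
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```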
Bumblebee is basically a tool that helps us package eBPF programs. You get the same kind of experience as with Docker: you can package an eBPF program, push it to a registry, and then run it in your cluster. But the nice thing is that Bumblebee also automatically creates Prometheus metrics from eBPF programs, so you only have to deal with the eBPF side of it. This is just an example; I'm not saying Bumblebee has to be used with a service mesh. I'm just stressing that this technology, whatever I'm showing right now, could tomorrow be captured and made part of, let's say, Istio, okay? So you would install Istio, you would have everything, either the sidecar or the sidecar-less version, but you would get full monitoring over your stack while gradually adding your services to the service mesh. So let's see how we can do that using Bumblebee in this case.

Ah, yeah, good thing: Bumblebee is an open source solution. Actually, I can show you the repo here. We're happy to have any contributions to the tool, so check it out. It's a good entry point. It was actually a good thing for me, for example: I'm not coming from an L4 networking background, I'm more into proxies and service meshes, but I'm learning eBPF, and for me this is a great tool because it lets me focus on my kernel program without thinking about the user-space program, right? I'll show you this in a second.

So let's go back to this. The first step is basically just installing Bumblebee; it's downloading now. Then I'm going to run bee init, which creates my eBPF program. I want it in C. I want to listen to the network, okay? That's probably what a service mesh would use; there's also a file system option, but it's not really relevant in this context, so let's say we want network. Then the map type: I was talking about the map, the map that sends data from the kernel space to the user space. In this case we're going to use, well, a hash map. Or, you know what, I'm probably just going to re-run the init and create it with a ring buffer. Then for the output, let's say I just want to print this, and I'm going to name it something like probe.c. All right, it's created now. Let's just compile it. Go back here. I'm going to come back to this eBPF program in a second.

All right, I want to show you first what an eBPF program looks like. We don't have to get into too much detail; we're not here to understand eBPF code line by line. I'm just saying: imagine this code being packaged with the service mesh. This is what would get created behind the scenes for you when you install a service mesh. And if you look at what's happening here, we use something called kprobes, which just listen for events. What I'm doing here is listening to two events: when we enter the socket call and when we exit it, and from there I'm triggering a function. So that's the very simple explanation of what we're doing here: we're listening to the network stack, and if something happens, we grab the data, put it in the map, and that gets pushed back to my user space. All right, that's it. So this one will monitor communication between A and B.
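Purely to illustrate that "listen on entry and exit of a socket call" idea, here is a rough sketch of what such a kprobe/kretprobe pair could look like. The generated program in the demo may hook a different kernel function; tcp_v4_connect and the map name here are assumptions made for the example.

```c
// Sketch (assumptions: tcp_v4_connect as the hooked function) of an
// entry/exit probe pair that captures source and destination addresses.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

// Remember the socket seen on entry, keyed by thread id, so the exit probe
// can read its addresses once the connect call has filled them in.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, u64);
    __type(value, struct sock *);
} inflight SEC(".maps");

SEC("kprobe/tcp_v4_connect")
int on_connect_enter(struct pt_regs *ctx)
{
    u64 tid = bpf_get_current_pid_tgid();
    struct sock *sk = (struct sock *)PT_REGS_PARM1(ctx);
    bpf_map_update_elem(&inflight, &tid, &sk, BPF_ANY);
    return 0;
}

SEC("kretprobe/tcp_v4_connect")
int on_connect_exit(struct pt_regs *ctx)
{
    u64 tid = bpf_get_current_pid_tgid();
    struct sock **skp = bpf_map_lookup_elem(&inflight, &tid);
    if (!skp)
        return 0;

    u32 saddr = 0, daddr = 0;
    bpf_probe_read_kernel(&saddr, sizeof(saddr), &(*skp)->__sk_common.skc_rcv_saddr);
    bpf_probe_read_kernel(&daddr, sizeof(daddr), &(*skp)->__sk_common.skc_daddr);
    // ...here the (saddr, daddr) pair would be recorded in a map for user space

    bpf_map_delete_elem(&inflight, &tid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```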
Okay, every time something happens between two IPs, it gets captured and stored by source address and destination address. That's the data being collected. All right, let's build this thing. So here, there we go, we build it. This is going to run for a second here and should be fine. All right, so while this is compiling... once it's ready we'll be able to use it. Yeah, there you go. Now that the program is compiled, we should be able to use it. So now that this is ready, we should push it to a registry. As I mentioned earlier, Bumblebee allows us to create programs and push them to a registry to be reused, the same kind of Docker experience. And now that it's built, we can actually run it. And if we run it, we're going to see data captured between two pods here. Every time we have an IP here, it gets monitored and the counter gets incremented; this is the number of calls between this IP and the other IP. Okay, so we have this. This is awesome.

Now let's think about Prometheus metrics. If I go back to my terminal and make a call to the Bumblebee program here, I'm going to see these automatically created metrics for the events happening between this address and the other address, and you can see the counts incremented here. So this was an example only for source address to destination address, but a lot of things can be monitored using eBPF, way more than with a traditional proxy, because now that we have access to the kernel, we can listen to many different things that we would never have access to when monitoring through a proxy.

All right, so this is done. Now let's actually build a Grafana dashboard that uses these eBPF metrics to show something. It's going to take a second here. There you go, we are starting. I'm going to create a set of applications. Think about this set of applications as the app you want to introduce to your service mesh, and maybe you don't want to introduce everything; maybe you just want to introduce the product page service, and all the other services stay out of the mesh for now. So first of all, I'm going to create my services. Now that they are created, I can deploy Prometheus, because I want to save these metrics somewhere, right? This is created here, the namespace first, and then we are installing Prometheus. It's going to take a second to be ready. After that, we're going to deploy the eBPF program we just created into our cluster. Think about that as installing our service mesh: behind the scenes, this is what's happening, okay? All right, I think the network is a bit slow here. And at the end, we are just going to install Grafana along with a graph exporter, which basically listens to the Kubernetes services and transforms the data into graph data. So here, let's deploy Bumblebee. Again, this is just an example of what you can do with eBPF. And then we're going to use a PodMonitor. A PodMonitor in Prometheus just lets us reach out to certain pods, scrape some metrics from them at some address, and put them into Prometheus, right? So the PodMonitor got created. Now we can generate some traffic, right? This is going through my gateway, and I'm feeding data to Prometheus through this. So it's happening right now: I'm making calls to the application I just deployed.
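As a rough sketch of the counting side of what was just described: a hash map keyed by source and destination address, whose value is a call count, which a user-space reader (Bumblebee, in this demo) can expose as Prometheus counters. The names pair, calls, and count_call are hypothetical, not the demo's actual ones, and this fragment is meant to be called from a probe like the one in the earlier sketch.

```c
// Sketch of the per-IP-pair call counter. A user-space reader iterates this
// map and exposes each (source, destination) entry as a Prometheus counter.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct pair {
    u32 saddr;   // source IPv4 address
    u32 daddr;   // destination IPv4 address
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 8192);
    __type(key, struct pair);
    __type(value, u64);
} calls SEC(".maps");

// Called from a probe once the addresses are known (see the earlier sketch).
static __always_inline void count_call(u32 saddr, u32 daddr)
{
    struct pair key = { .saddr = saddr, .daddr = daddr };
    u64 one = 1;

    u64 *val = bpf_map_lookup_elem(&calls, &key);
    if (val)
        __sync_fetch_and_add(val, 1);                      // existing pair: increment
    else
        bpf_map_update_elem(&calls, &key, &one, BPF_ANY);  // first call seen for this pair
}

char LICENSE[] SEC("license") = "GPL";
```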
But behind the scenes, I have the eBPF program running. It's listening to this data and automatically exposing it in a Prometheus format. Then the PodMonitor grabs this data and puts it into Prometheus. And the last step, obviously, is just using Grafana to show this data. So this is created and this is fine. Now let's create this, and that's all we need to do. Now we're going to create Grafana. This is created; it's going to take a second for Grafana to get ready. Let's list the pods, dash capital A for all namespaces. All right, it should be... All right, everything's running now. Let's take a look at Grafana. I'm clicking here. Just loading, first time starting. admin/admin, set a secure password, skip. And then I can do something like... So now we have data. We have the eBPF data directly in Prometheus, and now we can use it in Grafana. So I can, let's say, create a new dashboard. I'll create a new one here, doesn't matter. Oops. Okay. New dashboard. I'm going to add a new panel here. This is fake data, but I can actually build on it here. Let me go back to the main page. I'm going to add a data source for my graph data. The eBPF data is in Prometheus, and I can create graphs with it, but if I want a graph representation of the topology, I need an exporter that creates the structure for a graph service. And for that, I'm going to use this: the Node Graph API data source. Then I'm going to add the address for my source. Save and test. Now we are good. Oh, actually, we don't need all this whitespace. Save and test. There you go.

Now let's go back and create a dashboard. New dashboard, new panel. And this time I want to have, let's say, everything; it doesn't really matter. And then, here, I'm just going to use a node graph. And there you go. There it is. You see, I don't know if I can zoom, it's a bit small here, but here is a representation. I don't know if you can see it, but this is my microservice topology: traffic going from my product page service, with the UI, all the way down to my other microservices. And as you can see, I have this full monitoring without even the service mesh at this point, right? I didn't install Istio, I didn't install anything else. This is just part of what eBPF can do for us in the monitoring realm.

Now let's go back quickly for a conclusion on what we should care about. eBPF for monitoring is great. It has no impact on the request path, so we're not adding any proxies; this is awesome. It's lightweight: proxies run in the user space, while eBPF programs run on the kernel side and report data up to user space, so they have a much lower footprint. With proxies, we need a proxy in the path to get metrics; with eBPF we don't need anything, we can just observe directly from the kernel. The thing is, with a proxy we get a good understanding of L7 out of the box; with eBPF, that still needs a lot of work. My colleague Aiden gave a great talk yesterday about what we can do with L7; I invite you to check out his talk. Now, will eBPF replace proxies in general? Personally, I don't believe that, at least not in the near future. This is going to be really complicated. Why?
We should not think about a service mesh as a whole where everything has to use the same technology, like I need Envoy to do all the things, or I need this one program to do everything. We need to use the right tools for the right things. If, for monitoring, eBPF is more effective, faster, and better, well, let's use eBPF. But for L7, where I need a better understanding of my stack, where it's more dynamic and more complex and I need that in the user space, well, let's keep using Envoy or another proxy technology, right? So I'm just stressing the fact that tomorrow we should stop looking for one technology to use for everything; let's be less dogmatic about technologies and just use the best of each one. I have one minute left, just for the conclusion here. eBPF is super powerful, and I invite you to take a look at all the features it can provide, okay? Bumblebee is an open source project you can take a look at as a first step into eBPF programs. eBPF will not replace the traditional capabilities of sidecars and proxies in the near term; it's evolving, we'll see where it goes, but for now it's not there yet. Proxies and eBPF programs can absolutely run together: we showed today that we can have monitoring done by eBPF while L7 is still handled by our traditional proxies. With that being said, this is the end of my talk, and thank you for listening.

All right, thanks Adam for this presentation. We have a few minutes for quick questions, and for the gentleman over there, I will bring the microphone. My question is: for this to be used comprehensively, developers would have to create applications on a specialized container, right? Do you mean for the implementation of the eBPF program itself? Yeah, yeah, because it seemed like you had a special container that... Yeah, so no. Well, that was because of Bumblebee, the tool we used for that. Things can be different. Plus, actually, as a service mesh user, honestly you should not care about this. It should be embedded within the service mesh technology you're using. This code I showed you, all this mechanism that gets deployed to monitor everything, should just be part of your install: install it, it runs in your cluster, and it collects all the metrics. So you should not really have to care about the eBPF program; that should be the service mesh community working on providing this value for you. One more question. So when you attach the eBPF program to one of the functions, what happens if I have multiple eBPF programs used by different technologies? How do you coordinate that? Is that even possible? So, well, when you create an eBPF program, you're attaching it to an event happening in the kernel. Now, in terms of technologies, it doesn't really matter, since the data is all sent to the user space using a map. Your eBPF program in the kernel is standard; there are no technology dependencies there. It's limited, there are strong guardrails. Once it's defined, you add it to your kernel, then the data can be sent to the user space, and then you can use any technology to parse that data and transform it into something different. Oh, yeah. Okay. Got it. Yeah, I see your point. Okay, so what if there are different technologies? I think you can follow up with Adam directly. I'll follow up with you.
We need to move on to the next speaker. Once again, one round of applause for Adam.