Hello, KubeCon, for this last day on Friday. My name is Thomas. I'm one of the creators and maintainers of Cilium, and Cilium is one of the popular Kubernetes networking implementations. Based on what we have learned creating Cilium and working with many of you, I would like to share how to survive day two and how to troubleshoot Kubernetes networking.

So, a bit of context: I'm one of the maintainers of Cilium, so what I'm sharing today of course has a bit of Cilium context, because I've been mostly working with Cilium the past few years. But the exercises, the best practices, and how to approach this are not at all specific to Cilium and can be applied regardless of which Kubernetes networking implementation you are using. For those who have never heard about Cilium: Cilium is a CNI plug-in, among other things. It is a CNCF project at incubation stage. But Cilium does not only provide networking or CNI functionality, which is what we will focus on today; it also provides a service mesh and Hubble, and a lot of what we will hear about today is made possible by Hubble or tools similar to Hubble. Hubble is the observability layer of Cilium. Then we also have, in the Cilium family, Tetragon, which provides runtime security and security observability. All aspects of Cilium are built using a technology called eBPF, and that's actually what's enabling a lot of the observability that we will see today, which will assist you in troubleshooting. In particular, in the last part of the talk, we'll also look at layer 7 latency, or HTTP latency. Some of that work is done through Envoy, the Envoy proxy, which is also a CNCF project.

So let's jump into Kubernetes networking, or, as some people call it, the dark side of Kubernetes. How many of you are familiar with the core concepts of Kubernetes networking? Excellent, almost everybody. So this will probably be a repetition for most of you, but if you've never seen how Kubernetes networking works, this may help you understand the rest of this talk. First of all, Kubernetes networking is very simple and basic. All pods have an IP address, which means every individual pod has an IP address, and they can directly talk to each other. There is no specific network topology required. Kubernetes makes a simple assumption: a so-called flat layer 3 network. Pods have IPs, and thus pods can talk to each other. This is implemented differently depending on whether you're running in the cloud or on-prem. You may be running BGP on-prem, you might be using the cloud provider's SDN if you're running in the cloud, or you might be running an overlay. It doesn't matter; all of these implementations provide this basic foundation. Typically (this is not true for all Kubernetes implementations, but typically) every node has a so-called PodCIDR range, which means pods on a particular node will have IPs assigned from a range belonging to that node. So each node has a particular set of IPs, and all pods on that node will have an IP out of that range. Some implementations do this differently and allocate IPs in other ways, but there is a concept you may see in Kubernetes called the PodCIDR, or per-node PodCIDR range. Kubernetes uses services for load balancing. You're probably using this all the time. This is the load balancing layer of Kubernetes.
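As a quick illustration (not taken from the talk itself), here is a minimal sketch of what such a ClusterIP Service looks like; the names and ports are made up for the example:

```yaml
# Minimal ClusterIP Service sketch: one stable virtual IP and DNS name
# (backend.demo.svc.cluster.local) load balancing across all pods that
# match the selector. Names and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: demo
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
    - port: 80          # port clients connect to on the cluster IP
      targetPort: 8080  # port the backend pods actually listen on
```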
Services are what allow us to have multiple replicas of a pod, address a single service name or a single cluster IP, and Kubernetes will load balance to any of those replicas. We'll look into some troubleshooting there as well. And then, of course, Kubernetes uses DNS for service discovery, so you can address a service by its name using DNS instead of hard-coding IP addresses into the application. And then, as the last principle of Kubernetes networking, there is network policy. Network policy is what implements segmentation; it is what allows you to define which pods can talk to which other pods. You could be doing segmentation at the namespace level, so maybe within a namespace you want to allow communication, but cross-namespace communication should not be allowed; or you can even do this at the pod level and essentially say that only certain pods can talk to certain other pods. This is called Kubernetes network policy, and as we will see in the demo later on, it plays a huge role in some of the troubleshooting.

This slide looked very simple, right? It looks beautiful. Kubernetes networking is very, very basic. As you learn Kubernetes, as you run Kubernetes... I did this slide back in 2021; that's a little bit closer to how it looks in practice. So you have an overall goal, which is to do network forwarding. You have iptables somewhere. You have kube-proxy using iptables. You have an application team trying to schedule 5,000 services, really putting stress on kube-proxy. You have the liveness probes succeeding, so the applications seem like they're up and running because they're not aware of the actual network underneath, or the applications are just reporting "hey, I'm healthy, I'm healthy, I'm healthy," but they're not at all, because they're actually not even reachable. And you may have a platform team that is completely ignoring the CoreDNS pods that are in CrashLoopBackOff. That may be closer to the reality sometimes. So this talk is giving you tools for how you can look into this and figure out where the problem actually is.

So let's dive one step deeper into Kubernetes and see how this is actually implemented, because as we troubleshoot networking, it helps to understand the base concepts of how the CNI actually works in a Kubernetes cluster. And this is true for all CNIs; it's a core concept of Kubernetes. We have multiple nodes, and we have pods running on the nodes, and of course containers running inside them. We then have a network CNI level, or layer, which typically runs as an agent, as a DaemonSet, on all the nodes. The CNI then essentially spans the network plane, which allows the pods to talk to each other. Kubernetes itself does not have a built-in networking layer. There is a basic default, kubenet, but Kubernetes requires the CNI plugin to actually allow pods to talk, not only across nodes, but even inside of the node itself. And then we have kube-proxy, which is the default implementation of Kubernetes services, as well as CoreDNS, which is not in the picture here. When we implement this with Cilium, it looks very similar: we have Cilium providing the networking data path, and then we have eBPF, which implements that data path and actually causes packets to be forwarded, NATed, or network policy rules to be enforced.
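As a hedged aside (not shown in the talk): if you want to check that the CNI agent is actually healthy on every node, something like the following works on a Cilium cluster. DaemonSet names, labels, and the exact status subcommand vary by CNI and by Cilium version, so treat this as a sketch:

```bash
# The Cilium agent runs as a DaemonSet in kube-system; DESIRED should equal READY.
kubectl -n kube-system get daemonset cilium
kubectl -n kube-system get pods -l k8s-app=cilium -o wide

# Ask one agent for its own health view (in newer releases the in-pod binary
# may be called cilium-dbg instead of cilium).
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium status --brief
```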
The last concept we need to understand before we can get a little bit more hands-on is Hubble. Hubble is built on top of Cilium and provides observability. There is no requirement for a CNI to actually provide observability; the standard only demands that pods are able to connect to each other, and everything else is optional. Even the network policy implementation is actually optional, so a CNI can provide as little as just pure network connectivity. With Hubble, we are essentially providing additional observability tooling in the form of flow logs (who is talking to whom) and metrics. It's essentially a TCP dump for Kubernetes, because in the early days of Kubernetes, many of us have been exec'ing into a node or into a pod and literally running tcpdump to somehow figure out what is going on on the network. And for those of you who have never heard about tcpdump, it's a 25-plus-year-old tool which was obviously never intended to be used in an environment like Kubernetes. Hubble has native integration with Prometheus and Grafana, so you can, but you don't have to, look at the terminal; you can look at dashboards and at Prometheus metrics.

Let's look at a couple of examples, and we'll jump into a live demo afterwards. We can, for example, look at Grafana-based panels: how many packets are we forwarding? What is the drop rate, so how many network packets are getting dropped by the network layer? What is the total amount of traffic being forwarded? Or we can even create dashboards to display how much cross-region, cross-AZ network traffic I have in my cloud, because that's usually what your cloud provider charges a little bit extra for. That's the Grafana view. We also have a Hubble UI service map, where you can see all the services that are running and how they are talking to each other. Here we actually see the individual network connections and the API calls for application protocols that we understand; these are HTTP, gRPC, Kafka, Cassandra. So we're not only showing you who is talking to whom; with Hubble we can even show you what type of API calls they are making, or what the request-response latency for a gRPC call is. And in the lower part, you actually see the live flow data that this service map is based on, because what you're seeing here on the screen is derived completely transparently. This does not require changes in the application; it's essentially Hubble running transparently on the nodes and extracting all the connectivity data transparently. We can then calculate what we call the service map from that data, and the raw data is what you see in the lower part of the screen.

As a last concept, the Kubernetes service implementation: if we went into a lot of detail, there are several types of services; for the purpose of this talk, I will keep it simple and talk about ClusterIP. This is the ability to expose multiple pod replicas via a single IP, a single cluster IP, which means that instead of being aware of the potentially hundreds or even thousands of pod replicas to talk to, I can talk to a single cluster IP, and this cluster IP will then get load balanced to any of the hundreds or thousands of pod replicas. And of course, you do not want to talk specifically to an IP, so Kubernetes allocates a service name for you and makes that available as a DNS name via CoreDNS. So essentially your application talks to the service by name, and Kubernetes takes care of the rest for you.
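A hedged sketch of how you could see that name-to-ClusterIP mapping yourself from inside the cluster; the service and namespace names are illustrative (matching the Service sketch above), not from the demo:

```bash
# Resolve a service name from a throwaway pod; CoreDNS answers with the ClusterIP.
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup backend.demo.svc.cluster.local

# Compare against the ClusterIP Kubernetes allocated for the Service.
kubectl -n demo get service backend
```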
Let's jump into the first demo and actually show you how to troubleshoot some problems. I figured let's start simple and use a very basic network policy example, because what could possibly go wrong there, right? In this simple example, we will have a frontend and a backend pod, and we'll use a network policy that will look something like this. So this is a Kubernetes network policy, and for those of you who have used this before, it will look very, very simple. We're allowing the frontend pod to talk on egress, so the outgoing side; we networking people call egress the outgoing side and ingress the incoming side. So we are creating a policy that says the frontend pod is allowed to talk to the backend pod. Does somebody already see the problem with this? Yes, I see one hand, excellent. But it looks very simple; what could go wrong? So let's actually try this out.

I have this running right here. The pods are up and running and they seem to be pretty happy. Is it big enough? I think so. They're up and running, so this is probably good. But actually, under the hood, this is not good at all, and it's pretty hard to even see this, because, as I mentioned, the health check here, the application, is reporting just fine; the application is up and running. In this case, the app team may actually be complaining: hey, my app is not really working even though it's up and running, what is going on? So let's start looking below the hood at what we can see on the network observability side.

I'm swapping over to the Hubble Grafana dashboard, and we'll start out with the overall view. This is the cluster-wide view of everything that is going on in this cluster. Here we see the total amount of traffic being forwarded. We see the total number of endpoints; that's the total number of pods running in this cluster. We see the number of nodes. We see that we have no unreachable Cilium nodes. We see that we have no warnings or errors being reported by any of the agents. We do see a bunch of DNS errors, though. So if we zoom in here, that's probably something we should be looking into; we have quite a few DNS errors that are ongoing. We also see that we have policies loaded; we have 16 policies loaded. We can look at the enforcement status, and we see we have 53 pods which have network policy enforcement enabled and 61 pods which have no network policy enforcement enabled at all. We can go forward and actually see we have some missing DNS responses; we can look into the connection tracking table and so on, a lot of information. In this case, I actually know that something is potentially wrong with this application, frontend and backend, and it is deployed into a namespace that I know. So I'll go over here and look into this kubecon-simple namespace; this is where frontend and backend are running. And it's very interesting: we can see there are some flows being forwarded, we can scroll down, and here we go, we see network policy drops. We see a constant rate of network policy drops from the frontend. We can go deeper and look at where these network policy drops are happening, where these packets are attempting to go. We can zoom in here and we see, huh, this is going to the kube-dns pod. So what is wrong? At this point, we know that something is being dropped, so let's figure out what's actually being dropped. I will go back into my terminal and use the hubble observe CLI command. This is the CLI that will allow me to query Hubble.
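As a rough idea of what that invocation looks like (a sketch from memory, not copied from the demo; check `hubble observe --help` for the exact flags in your version, and note the namespace name here is just how it was spoken in the talk):

```bash
# Follow all flows, forwarded and dropped, in the demo namespace
hubble observe --namespace kubecon-simple --follow

# Or narrow it down to only the packets the datapath dropped
hubble observe --namespace kubecon-simple --verdict DROPPED
```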
I will run that, and it will show me all the network drops and all the forwarded flows happening in this namespace, because I run it with the -n kubecon-simple namespace filter. And yes, indeed, we are seeing drops from the frontend pod to the kube-dns pod. So we've narrowed it down from the high-level overview, where we saw we have some policy drops, to the namespace view, where we saw this was going to kube-dns, and then with the Hubble CLI we were able to see the actual individual drops.

So we go back to the slides. It was indeed DNS, as is often the case for many incidents, even though it was actually not obvious at all. I think a lot of application teams will write a network policy and not understand, or may not understand, that they also need to allow traffic to kube-dns: I need DNS for my service. So let's go to the Hubble UI. This is the service map with the view on just that namespace, and we see, in fact, that we only have communication from the frontend to kube-dns. And you actually see in the lower part, it's a bit small maybe, but these are all dropped flows; all of this has been dropped. I could even click on one and it would tell me the drop reason is "policy denied". So let's actually fix this; let's allow CoreDNS. I have a policy, allow-dns; let's see if that works. So I created this policy, allow-dns. This is the policy which, on top of the policy we already have, now allows traffic to kube-dns. And bam, we can go back into the service map and we now see not only connectivity to kube-dns but also to the backend service. And we now see new forwarded flows in green; those are allowed flows. So we fixed our problem.

Now, how did I get to this allow-kube-dns policy? Of course, you can ask ChatGPT today, right? If you don't know how to do that yet, you can also use what we call the network policy editor. This is available for everybody at editor.networkpolicy.io, it's free. You can use it and it will visualize network policies for you. Right now I've loaded the original policy, and based on the graphic you see at the top, you can actually see that, yes, indeed, we are allowing traffic to the backend but we are not allowing it to kube-dns; you can see the arrow is actually red. I can now quickly allow this: go in here, add an allow rule, and it has now extended our existing policy to allow DNS. You can see the new YAML that was added below, which will actually allow kube-dns. So if you don't know how to get to the YAML yourself, the network policy editor can do this for you, or even in the visualization you can see the problem straight away.

If you do not know specifically which namespace the application is running in, we have what's called a policy verdict dashboard. This is this one, cluster-wide, and we see a whole bunch of drops going on, so you'll probably need to look into which drops you care about. There's actually a namespace-level view down here where we can indeed see that we have drops from the frontend pod in the kubecon-simple namespace to the kube-dns pod in kube-system. So this is another way: if you're not sure which application is even affected, you can go in here and see all the drops by name, filtered by namespace, across the cluster. So this is how the fixed policy looks: we not only need to allow traffic to the backend, we also need to allow traffic to the kube-dns pod.
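The exact YAML from the demo isn't in the transcript, but a fixed policy along those lines would look roughly like this; the pod labels and namespace names are assumptions for illustration:

```yaml
# Rough sketch of the corrected policy: frontend may reach the backend AND
# kube-dns on port 53. The original demo policy only had the first egress rule,
# which is why the DNS lookups were being dropped. Labels/namespaces are assumed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-egress
  namespace: kubecon-simple
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: backend
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```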
So, more DNS, because DNS is really often the problem. We looked at pod-to-pod; for the next example, we're looking at something that's a little bit more advanced. As we have learned, DNS is used for service discovery. It's usually implemented by CoreDNS, but that's actually not mandatory; you can of course use a different DNS implementation. It looks simple, but it is not always simple, so let's look at how to troubleshoot something like this. This is a simple pod in another namespace that intends to reach out to google.com. So let me go over and switch tabs. This is the same cluster, but I have just one pod running in one namespace, and it is doing a curl to google.com. And let's say the application team is complaining: hey, this is not working, what is going on?

Let's go back into the dashboard; I have the DNS demo dashboard. We've seen this view before, and we can see the dashboard view of how the DNS layer of Kubernetes is doing in general. We see about 30 DNS requests per second, and we see the top 10 DNS queries; I can make this a bit bigger. We can see we're looking up Elasticsearch, the core API, and so on. Then I also see a bunch of DNS errors; we're seeing quite a few of them, and one of them is from my DNS demo pod. So this is the rate of DNS errors that this pod is actually experiencing. We can then go in and look at the specific queries being made that are not successful, and we see down here that, yes, we're trying to look up, or curl, Google, but the app has spelled Google wrong and was using zeros instead of O's.

Of course, that's the dashboard view. We can go back and look at the observe view as well. So I go back and I run hubble observe, this time specifying the namespace debug-dns, which is where my application is running, and here we actually see the communication to kube-dns, that this is UDP, and then we see the actual DNS requests and responses. We see here that the pod is attempting to resolve Google with zeros, for both IPv6 and IPv4. I can then even dive deeper and say I only want the layer 7 information, like this, and now it will only show me the DNS resolutions, so what DNS resolution is going on inside of this namespace. So with the dashboard I can quickly look at whether there are pods in my cluster that are experiencing DNS resolution errors, and once I've identified those pods, I can use the Hubble CLI to go in and find out what is specifically going on, what they are trying to look up, and so on.
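A hedged sketch of what that layer 7 DNS query might look like on the CLI (flag names from memory; the namespace matches what was said in the demo):

```bash
# Show only the parsed DNS queries and responses for the demo namespace
hubble observe --namespace debug-dns --protocol dns
```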
Last example: debugging service latency. Service latency is, let's say you have deployed an application and the application is not performing as fast as it should be. How are you troubleshooting this? How are you even identifying this? There is a standard, or a best practice, for this called the golden signals dashboard, invented, or written down, by the Google SRE team. You can see the reference at the bottom; it's actually documented pretty well. It's a standard way of monitoring your infrastructure, specifically for cases where you're running a service that's publicly available, and the famous four golden signals that matter in this case are latency, traffic (or throughput), errors, and saturation. We'll look at what that actually means and why it's useful.

So I go over here and open up the third demo. This is the Hubble dashboard for the four golden signals. We see at the top the request rate, how many requests per second are actually coming in. We then see the request duration, and in this case this is HTTP, so this is the HTTP request-to-response latency. We can see P50, P95, and P99; these are percentiles of the latency distribution. P50 is the median, P95 is the value that 95% of requests stay under, so it captures the tail of the worst 5%, and P99 captures the worst 1%. And what really matters is P99 or P95, because on average it often looks good, but then for some customers the experience could be really, really bad. So you really want to not only monitor an average, you want to monitor P99 as well. And it's interesting that we can actually see significant peaks here. The average is actually pretty good: if you look at P50, that's the green line, it's all the way at the bottom. So if you only look at the median, everybody's like, yay, happy, it's almost zero, that's probably really good. But if you look at P99, it's sometimes up to two seconds. So some requests going into this service have actually experienced a latency of two seconds before they got back a response.

Great, now we understand that there is a problem in terms of latency. We can also see that there are some problems in terms of errors being returned, which is the second column. This is the error rate, so the rate of HTTP errors being returned by the application, any sort of HTTP 500-type code. So one source of problems can be the request taking very long, the service being slow, and another common problem is the service crashing and returning an HTTP 500 error code. Both are very meaningful and important to monitor. Now we know that there are some problems. In order to actually debug this, we need to correlate it with saturation, because latency usually comes from limited availability of resources, and that's often CPU. So we actually have the HTTP request duration and the CPU usage right next to each other. I can go in here, zoom in, and just look at this spike; I can check whether this spike in latency was caused by a spike in CPU usage on that node. And I can do that for both the source and the destination. So if I have two pods running in the cluster, I can measure the latency, and I will see the CPU consumption on the source node and on the destination node. So I can even understand, if the latency is bad on the destination side, whether that was caused by limited availability of CPU on the source node side. This allows you to monitor applications broadly, cluster-wide, to figure out: is my service up and running, is it returning errors, and how fast is my service actually responding? And again, all of this is done using transparent extraction of observability data; this is not using application instrumentation. We're looking at the Grafana dashboard here, but all of this data is also available as OpenTelemetry metrics and traces, so you can also visualize it in other tooling that is OpenTelemetry-capable. And with that, I think we have a couple of minutes left for questions. This was a quick overview. Thank you.
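As a hedged aside for readers who want to reproduce panels like these outside the bundled dashboards: the queries would look roughly like the following PromQL, assuming Hubble's HTTP metrics are enabled. The metric and label names are from memory and vary with the Cilium version and the configured context options, so treat them as assumptions.

```promql
# Traffic: HTTP request rate
sum(rate(hubble_http_requests_total[1m]))

# Latency: P99 HTTP request duration over the last 5 minutes
histogram_quantile(0.99,
  sum(rate(hubble_http_request_duration_seconds_bucket[5m])) by (le))

# Errors: share of responses with a 5xx status code
sum(rate(hubble_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(hubble_http_requests_total[5m]))
```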
If you want to learn more: cilium.io, where you can find information about Cilium and Hubble, or if you want to ask me questions afterwards, feel free to reach out on Twitter. Now, I think there are mics left and right in case you want to ask questions, but I also know that it is lunchtime. This side.

Hello, just one simple question, and fast, I hope. Let's say we would like to use Cilium, but we use OpenShift now, and we don't use the OpenShift CNI but OpenShift OVN. Do you know if they are compatible? I mean, can we add Cilium on top of OpenShift and not break everything?

Yes, you can run CNI chaining mode, where you run Cilium, or just the Hubble portion, on top of OpenShift SDN or OVN and so on, absolutely. But Cilium is also available on OpenShift if you want to replace it entirely, so both are possible.

I see. Thank you.

Hi, I have a question regarding network plugins. What does a network plugin like Calico or Cilium provide that is not provided by the EKS VPC CNI? What would make me choose the network plugin and not the cloud plugin? Especially since I feel that the VPC CNI makes life easy: it makes the pod reachable, accessible inside the VPC. Also, if I use the load balancer controller with the IP target type, the request can reach the pod directly without it having to go through the API server. So, yeah, that's it.

Yeah, so I think a good, often a motivation to run Cilium on top of the VPC CNI in EKS would be to install Hubble on top, or the network policy implementation. So, just like with OpenShift, you can run on top of the VPC CNI. You can also run Cilium natively in what's called ENI mode, in which case it can, just like the VPC CNI plugin itself, natively route all of the pod traffic, so you don't need to run Cilium in what's called overlay mode. Was that your question? I'm not 100% sure whether I got it right.

Yeah, so what will it provide for me if I put it on top of the VPC CNI? The VPC CNI already implements the network and makes the pod accessible, and another layer could also increase the latency. So what does this add to my implementation? What's the main feature?

Can you repeat the last part? I didn't fully understand it.

Yeah, I mean, if I have the VPC CNI and after that I install Cilium on top of it, what does it add to my implementation?

I see, okay. So the VPC CNI, at least to my understanding, right now does connectivity, and there is of course kube-proxy already running, so the service implementation, and I think there already is, or there is an upcoming, network policy implementation. What does not exist there is all the observability that we've seen today, which is needed for the troubleshooting. Then Cilium also has functionality such as cluster mesh, service mesh, Tetragon; all of these are not available in the VPC CNI plugin. So, the extended functionality. Does that make sense?

Yeah, yeah, thank you.

Yeah, a good question as well. We are currently using Istio with Kiali on top, which provides a very similar set of capabilities to what you were just describing for Hubble. So what would be the benefit of switching from one to the other? Because there is of course the service mesh level and the CNI level, and what's the difference that gives you?

So I think the difference for most users will be that if you don't have a service mesh yet, you can get this easily, natively from Cilium; you don't need to actually deploy a full service mesh.
If you are already running Istio, then Cilium provides very similar observability data; we even implement the same open standards. There can be some performance differences depending on how you run Istio, ambient mesh versus traditional Istio and so on, which you would have to look into in detail. So the difference is less in capability and more in the architectural model, potentially the performance of it, and so on. Let's go ahead.

First of all, thanks for a great presentation, and I was trying to come up with a question. I'm trying to troubleshoot problems with AWS ALBs talking to the Kubernetes cluster itself, and a lot of times I see errors like "connection to target" or similar. Based on our investigation, we're suspecting that the target, the pod, is basically still not ready, but there are still attempts to route traffic to it. Is that a use case where Hubble can help us troubleshoot those things?

Yes, absolutely, because if that is happening on the network, you typically receive what's called a TCP reset, because the port is not yet open. There's even a Hubble filter, or a dashboard, that shows you those cases, where you're reaching out to a service and maybe the backend replicas are still scaling up or already scaling down. This is actually recognizable on the network, because typically you don't have a lot of TCP resets in your Kubernetes network, so it's often a pattern you can see. If you know the specific service, then with the Hubble Observe view here, with the CLI, you could actually see each reset. So you could see the service trying to connect, and instead of just not receiving anything back, you will get a reset back. I can show you an example of how that looks. That's how many Cilium users actually troubleshoot this.

Because in our case it's not really at the level of the service itself, but more when the node itself spins up. At that time the node somehow shows as ready even though it's still slow to come up, so the ALB should really wait. So if you're running Cilium, in those scenarios that's a motivation. I know that we have seen the Hubble dashboard for the new cluster we just launched and we see exactly the same problems, and I think at that time we were using EKS, because before it was a self-hosted cluster. So there is Hubble somehow.

There is Hubble, yeah, okay. And if you already run Cilium, then when the node is still coming up there's actually a specific error, a specific drop reason, that you will see: that this went to a node that either does not exist or is not available yet. So if it even gets to the node, but Cilium is not ready there or the node is not ready yet, Cilium will actually report the specific reason why it could not deliver the packet. So that could be a motivation to run the Cilium CNI part.

You said I can see a demo, how can I? Can I just come up? Yeah, absolutely. Okay, cool, thank you.

So I have a couple of questions. Great presentation, firstly. We want to migrate away from Calico to Cilium. We run on-prem, so all the fun stuff that goes with on-prem clusters. Is there a migration document that we can follow? I asked at the booth and apparently there isn't one. Is there anything in the works?
There is, even better, actually a video recording of Duffie Cooley migrating, I think from Calico, but I'm not quite sure. So migration paths exist, and on eCHO, which is the eBPF and Cilium newsletter and show, there's actually a migration episode on how to migrate to Cilium. That can be a great starting point. And then, as the next step, if that's not enough, ping us on Slack, we're happy to help. We may have a more formal, more official migration path or documentation soon. The trouble with that is there's often not just one way to run other CNI plugins, so it really depends: are you running, let's say, Calico in BGP mode, or are you running it in the cloud? Are you using network policy? Are you using Calico network policy, and so on? That's why it's not just a flip of a switch.

Okay, and the next question was: we run vanilla Kubernetes across a couple of sites, and Azure too. So if we were to use Cilium and Hubble, Hubble UI, how will traffic show when it goes through several other layers of infrastructure, like firewalls, gateways, et cetera? We obviously have access to the virtual machines, but the infrastructure teams manage the firewalls. So how will that show in Hubble? Would it actually show the next hops, or what will Hubble do with that traffic?

Unless you run something on those middle boxes, you will not see anything, but you can actually, if you want, install Hubble on them if you want the observability data. Often Cilium users will then replace, for example, the external load balancer with the Cilium load balancer, and then you get the observability data. But if you run Cilium in overlay mode, it will not actually even care what the underlying infrastructure looks like, and as such you will not see any details. But Cilium will warn you if it transmitted on one node and did not receive on the other node, which is usually the indication that the underlying network dropped it somewhere.

Right, cool, thanks.

Oh, by the way, if it's L3 routers, you can compare the IP TTL on the sending and the receiving node; then you at least know how many routers you had in between.

Cool, thank you. I have a question regarding the L7 detection. You showed with the command line tool and in the UI that there was some information about L7, and I wanted to ask: in case we have TLS, do we still have some information about the L7 protocol used?

The answer is: it depends. If you are using Cilium to do the TLS, then yes. If the application is doing the TLS itself, then the answer is, by default, no, but with kTLS we can optionally see it if the application allows for this. And then, if you are running a service mesh to do mTLS, then Hubble can also see it, because it's unencrypted up until the proxy. But if the application itself does TLS and the application does not want to be eavesdropped on, then we cannot see it; that would break the security model. There are some ways to do that with uprobes in the app, but that's not really reliable. So we can demo that, like many others as well, but it's not something we recommend running in production because it's not 100% guaranteed.

Okay, thank you.

Hi, you mentioned that there are traces from Hubble and Cilium. What are they, or what kind of information can I get from them?

So for traces, we do OpenTelemetry traces.
We can do HTTP traces with request and response spans, with the latency, the return code, all of this. We can also do traces for the DNS resolution, which is interesting: you can see the whole trace and then the spans, including the DNS resolution, so you actually know how long the DNS resolution took. Then we also have TCP trace support as well. So for whatever HTTP or layer 7 protocol we support, as well as for the underlying protocols, we can emit OpenTelemetry traces with spans. And then, if you want to add additional instrumentation via OpenTelemetry application instrumentation, you can combine that together, if you want to have spans inside of the app, including trace ID support.

Okay, cool. And the second question is about routing of layer 7 protocols like HTTP, like in a service mesh. Is that really possible with Cilium? I mean, building some custom logic with specific URLs, et cetera, the things that Istio can do, right?

So Cilium implements the Ingress standard and the Gateway API standard, so whatever you can do with either of them: you can fully perform URL rewrites, path-based routing, header-based routing, canary rollouts. Then we have some annotations on both of them. And then, as the most sophisticated, most powerful API, we have what's called an Envoy CRD, which allows you to provide direct Envoy configuration. So whatever is possible to configure in an Envoy listener, you can use; the full Envoy feature configuration that's available in a so-called listener you can actually configure, for retries or for specific TLS termination, going beyond what's possible in Ingress and Gateway API.

So is it correct that it's not happening inside of the kernel, right? I mean, it's...

Correct. All of the layer 7 load balancing is being done with Envoy. We have some observability into layer 7 using eBPF, but the load balancing, retries, and rate limiting on layer 7 are all done through Envoy. We do have an mTLS model that does not rely on a proxy, but that has nothing to do with HTTP or any layer 7 processing; that's purely on layer 4.

Okay, cool, thanks.

Hi, so we are also running Calico on-prem, and I was wondering whether it's possible, assuming the worst case that we cannot migrate to Cilium for some reason, whether we can run Hubble on top of Calico and have the same observability metrics and so on.

So yes, you can chain Cilium on top of Calico and run Hubble. It does have some limitations, because some of the layer 7 observability into HTTP, the Envoy injection we're doing, relies on a particular feature that Calico also uses right now, so that specific aspect is currently not compatible. But everything else, I would say roughly 95% of what you have seen today, is possible to run on top of Calico, absolutely.

And does it require any change at the application level, or is it completely transparent?

No, it's completely transparent. You are changing your CNI configuration to also launch Cilium next to Calico, on top of it, and then it will not actually do the routing. You can even still have Calico do the network policy if you want, or you can say, I'm migrating away from Calico network policy and I'm using Cilium network policy. And either way, you do the observability on top; that is possible.
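Going back to the Gateway API answer a moment ago: as a hedged illustration (not from the talk), path-based routing with a Cilium-backed Gateway would look roughly like this standard HTTPRoute; the Gateway, service, and namespace names are made up for the example.

```yaml
# Minimal Gateway API sketch: route /api to one backend service and everything
# else to the frontend. Names are illustrative; Cilium acts as the Gateway
# implementation when its Gateway API support is enabled.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo-routes
  namespace: demo
spec:
  parentRefs:
    - name: demo-gateway          # a Gateway object managed by Cilium (assumed name)
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: backend-api
          port: 8080
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: frontend
          port: 80
```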
Okay, so the traffic is going through Cilium and then down to Calico, and then out?

Calico actually only uses the standard: if you're running Calico in the default configuration, it's just using Linux networking; there's nothing too special about it. If you're using Calico in eBPF mode, then it would use that eBPF, but both are compatible, I think.

Would there maybe be performance issues from running two CNIs on top of each other?

Only running one is definitely more efficient than running two. If you are not doing anything with Cilium, the overhead is very small overall. As you configure more things in Cilium, there is overhead, and then it will depend on how much observability you want, which will also dictate how much overhead you have. But in general, it is definitely 100% feasible, and we have many Cilium users that are doing this.

Okay, thank you.

Hello. We're thinking about shifting from AWS to GCP, but we know that Hubble isn't available in GKE. Do you know if that will change someday, or if there is an alternative?

You can run Cilium on all the cloud providers natively. I can obviously not speak to when the cloud providers would enable specific functionality in their version of Cilium; I know that some of them are considering bringing in Hubble. But you can run Cilium, to my understanding, on all the cloud providers as a replacement CNI or in chaining mode on top. In particular, of course, on AWS that's common, and that's actually what a lot of users and Isovalent customers are running.

Okay. I think we got all the questions. One more... sorry, two more.

We are also an OpenShift customer, and we are using OpenShift with the default OpenShift SDN, which uses iptables. Currently, we have a ClusterIP service and the clients are calling this service at a high rate, so we are creating lots of new TCP connections and we are close to consuming all the available TCP source ports. In the Cilium implementation, I think you are replacing the cluster IP address with the real endpoint IP address at TCP connect time. Does this implementation increase the number of tuples, so that it increases the number of usable TCP source ports?

Very good point. We have a good answer for you, because when Cilium implements the load balancing with the kube-proxy replacement, we actually use our own connection tracking table, which you can set the limit for, so that solves limit number one. And then, if you're really running out of tuples overall, I would recommend using the socket-based load balancing, which does the load balancing at the time of the socket connect system call, which is vastly more scalable. If you ping me on Slack, I can give you pointers on that to get you started.

Thanks for the demo. Since GKE Dataplane V2 uses Cilium, I was wondering if that's compatible with Hubble, because I did a quick search and it said that the Hubble UI wasn't available, because the agents don't expose the ports, but I was interested in whether the CLI observe tool would work.

I know that there is work to enable Hubble on GKE; I'm not aware of the latest status as of now. In the initial version it was not enabled, but I think the Google team is working on enabling Hubble. As soon as the Hubble API is exposed, the dashboards and the Hubble CLI will work to some extent, because they may not immediately enable all of the Hubble functionality, for example the DNS observability or the layer 7 observability. That's a question to ask Google, what they will actually specifically enable there.
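Picking up the source-port question above: a hedged sketch of how the kube-proxy replacement and socket-based load balancing are typically enabled through Helm. The value names are from memory and change between Cilium releases (older versions used kubeProxyReplacement=strict and hostServices.enabled), so check the docs for your version before relying on them.

```bash
# Sketch only: enable Cilium's kube-proxy replacement and socket-level service
# load balancing, which translates the ClusterIP at connect() time instead of
# rewriting every packet. Placeholders must be filled in for your cluster.
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set kubeProxyReplacement=true \
  --set socketLB.enabled=true \
  --set k8sServiceHost=<your-api-server-host> \
  --set k8sServicePort=<your-api-server-port>
```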
Okay, I'll follow the project. Thanks.

Awesome. Thank you very much again.