First, we'll hear from Thomas, one of the creators of the project, who will introduce Cilium, and then from Laurent about Datadog's journey with Cilium. Then we're going to hear from Purvi from Google about how they've been adopting Cilium in the data plane. And then I'm going to wrap up with a little bit about how you can all get involved in Cilium. Can I get a show of hands? How many of you are currently using Cilium today? Good number, excellent. How many of you are thinking about using Cilium? OK. How many of you have contributed PRs to Cilium? A few of you. How many of you would like to contribute PRs to Cilium? Excellent. Wonderful. I think that's great. So, without further ado, I'm going to hand over to Thomas, who is going to explain a little bit about what Cilium is. So, welcome, Thomas. Thank you very much, Liz. All right. Welcome. Hello. My name is Thomas Graf. As Liz mentioned, I was one of the creators of Cilium. I also co-founded Isovalent, the company behind Cilium. Because this is the first time many of you are hearing about Cilium, I want to give a very brief introduction to what Cilium is and what eBPF is, because the two are tightly connected. So, what is Cilium? Cilium is a CNCF project at incubation stage. It uses eBPF, and we'll talk about what eBPF is, to provide networking, security, observability, and, very new, service mesh and ingress. As mentioned, it's in the Cloud Native Computing Foundation, and the underlying technologies are eBPF and Envoy; we'll see where Envoy comes into the picture. Cilium today is used by many, many users from all sorts of verticals, SaaS companies.
We'll hear from an end user, Datadog, later on, but also telcos and cloud providers; for example GKE, and we will talk about this, is using Cilium as the networking layer for its managed Kubernetes offering. So, what is eBPF? How many of you have heard about eBPF before? Everybody, great. Thank you. eBPF is, in one sentence, to the kernel what JavaScript is to the browser. What I mean by that is that eBPF makes the Linux kernel programmable: we can extend and change the behavior of the Linux kernel by loading eBPF programs, and, very similar to JavaScript, they run in a safe sandboxed environment. The difference is that they run a lot faster than a JavaScript program would. We can do this for a variety of events. On the slide, we're seeing an example where this is done for a system call. We can do this for network packets, for storage access, for file access, for a variety of tracepoints, kprobes, uprobes. Cilium, obviously, uses the networking-related hook points. Cilium is not just a CNI, though, even if many of you have probably heard about Cilium as a CNI first. It provides very scalable (and we'll hear about scale from Laurent), secure, high-performance CNI networking for Kubernetes. Cilium is also a service mesh now, so Cilium can provide a sidecar-free service mesh and ingress. We have network observability with the Hubble project, which provides OpenTelemetry support and Prometheus metrics, and, very new, the recently announced Tetragon, providing security observability and runtime enforcement. So let's look into all of them a little bit. Cilium the CNI: what are all the things it can do? There's a lot. Obviously, it's a CNI, and it provides a variety of networking functions. For the networking people in here: IPv4 and IPv6, of course, but then also things like NAT46, translating between IPv4 and IPv6, so we can bootstrap an IPv6-only cluster but still make parts of it addressable over IPv4. For telcos, for example, it also provides SRv6. So, a very wide range of networking capabilities.
It supports various topologies: overlays, BGP, and integration into the SDNs of all major cloud providers. It can replace kube-proxy, so it can provide the Kubernetes service implementation, and it has its own north-south load balancer for load-balancing traffic into the cluster, and, very new, Kubernetes Ingress: layer 7 traffic management and load balancing. Cilium has extensive network security controls: obviously support for Kubernetes network policies, but these can be extended with Cilium network policies to provide, for example, DNS-based policies, where you can use DNS names or wildcards to define what is allowed, and also layer 7 policies (HTTP, Kafka, and so on), where you can write even tighter security policies. On top of segmentation, we can also do transparent encryption with IPsec and WireGuard. Widely popular as well are multi-cluster and external workloads. We can connect clusters together with Cilium using standard Kubernetes resources, so you don't need additional CRDs or anything; you can use Kubernetes services to define, for example, global services that span multiple clusters, and network policies will work across clusters as well. Also fascinating are the egress gateway and external workloads, because often not everything can be moved into your Kubernetes cluster. Egress gateways allow you to present pod traffic behind stable source IPs, which makes the life of traditional network firewalls a lot easier, because all they understand are IP addresses. And Cilium can also run on virtual machines and bare-metal machines, so you can take external machines that you don't want to run as containerized workloads and, from a network perspective, integrate them into your Kubernetes cluster. Hubble, in a nutshell, provides metrics, logs, and a service map, at layer 3 and layer 4 on the network level, but also at layer 7.
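The DNS-aware and HTTP-aware policies just described can be combined in a single CiliumNetworkPolicy. Here is a minimal sketch; the labels, FQDN pattern, and path are hypothetical, purely for illustration:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-policy        # hypothetical example
spec:
  endpointSelector:
    matchLabels:
      app: frontend
  egress:
  # DNS-based egress: allow destinations by name/wildcard, not by IP
  - toFQDNs:
    - matchPattern: "*.example.com"
  ingress:
  # Layer 7 ingress: allow only GET requests under /public on port 80
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/public/.*"
```

In practice, `toFQDNs` rules are paired with a rule allowing (and inspecting) DNS traffic to kube-dns so Cilium can learn the name-to-IP mappings; see the Cilium policy documentation for the full pattern.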
We're seeing an example of how Hubble parses HTTP requests and provides, for example, HTTP latency to build a golden-signals dashboard. We also have a service map where you can see which services are talking to each other, and then you can export all of these metrics and logs via Prometheus or Fluentd, or just as JSON; you can build Grafana dashboards, send it to Elasticsearch, and, added recently, there is OpenTelemetry support as well. Cilium Service Mesh is moving to stable in 1.12, which will be released in a couple of weeks and which adds native service mesh capability. Cilium Service Mesh offers two options to implement a service mesh. The option we had so far was option number two, where Cilium integrates with Istio and enforces its layer 7 policies in the sidecars managed by Istio. The new option is a completely sidecar-free service mesh, where eBPF and Envoy together provide a service mesh data path that does not require running sidecars in each pod. What's important to us is that we want to support the entire ecosystem of service mesh control planes. So whether you want to use Istio or Linkerd or SMI, or maybe choose Kubernetes Ingress or the Gateway API to define what you want the service mesh to do, we want to support all of that, and obviously, with the observability we can provide, support the ecosystem in the cloud native space. Tetragon, introduced this week, was part of our enterprise offering so far; this week we have open sourced Tetragon, and with this it has moved into the Cloud Native Computing Foundation. It provides runtime visibility: for example, the ability to see what system calls are being made and what files are being accessed. We can see privilege escalation, so when a pod gains security privileges; we can see when a pod breaks out of a container namespace; and then, even more importantly, we can actually mitigate some of that.
So when we see these events happen, we have an in-kernel eBPF implementation to react to them and automatically terminate a process that violates the rules. For the observability, of course, we support the usual ecosystem: with the runtime visibility we have, you can build Grafana dashboards, export events into your SIEM via Fluentd, for example, or use an OpenTelemetry collector to provide syscall traces. Last but not least, very briefly, an outlook on what's coming in 1.13. We're about to release 1.12, so we're in feature freeze for 1.12, which will move service mesh out of beta into stable. In 1.13 we will focus heavily on adding additional service mesh control plane integrations. Gateway API support will be coming; SPIFFE integration, the PR is out there and we want to merge it in time for 1.13; as well as a new architecture for providing mutual authentication with mTLS that will work not only for TCP but for all varieties of network traffic. Obviously there's a lot more; these are the high-level points. But even more important is what's missing: tell us, and Liz will be speaking a bit more about how you can actually influence the roadmap, how you can tell us what we should be doing next. And with that, I would like to introduce Laurent from Datadog, talking about Datadog's journey with Cilium. Here with a live audience for the first time in a long while. I'm Laurent Bernaille; I work at Datadog as a staff engineer in infrastructure. (There's a bit of a delay.) For those of you who don't know Datadog, we're an observability company. I put a few figures on the left-hand side of the slide, but I'm not going to go through them because they don't really matter for what I'm going to talk about today. I work in the infrastructure team at Datadog, and this is where we use Cilium and why I'm here today.
So, just to give you an idea of the challenges we have: we currently run tens of thousands of nodes and hundreds of thousands of pods, spread across a lot of Kubernetes clusters, because of course we can't run a single cluster that big. But they're already pretty big, as you can see: this slide says our biggest clusters are close to 4,000 nodes, but we now have clusters closer to 5,000 to 6,000 nodes. To make things even more complex, we run on multiple cloud providers, and we're growing pretty fast. So this is a very high-level view of our infrastructure: we run on different cloud providers with different constraints, and what we want is for communication to be secure within clusters but also between clusters, because with so many clusters we also have a lot of traffic between clusters, and we need ways to make that communication secure. One of the constraints that comes with our design is that, because we have workloads running in multiple clusters, we want to be able to efficiently route traffic between workloads. For instance, imagine you have a Kafka cluster running in one Kubernetes cluster and Kafka clients in another cluster: you want that communication to work, and for that you can't use load balancers, and you can't easily use ingress, so you need all the pods to have IPs that are routable within your network. This is a design choice we made very early on, because it's more performant (there is no overlay) and it allows for communication between clusters.
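The cross-cluster service discovery problem described here is one of the things Cilium's multi-cluster "global services" (mentioned earlier) address: a regular Kubernetes Service can be marked global so it load-balances across connected clusters. A minimal sketch, assuming Cluster Mesh is set up and a Service with the same name and namespace exists in each cluster; the names here are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-bootstrap        # hypothetical service name
  namespace: streaming         # hypothetical namespace
  annotations:
    # Merge endpoints for this service from all connected clusters
    io.cilium/global-service: "true"
spec:
  selector:
    app: kafka
  ports:
  - port: 9092
```

Note this is an aside about Cilium's feature set, not a description of Datadog's setup; their design relies on natively routable pod IPs.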
Of course, it's not as easy as that. There's a reason many people start with an overlay: overlays are very simple to set up, all your clusters are independent, and you don't have issues managing the IP space. But if all your IPs are routable, you need to be very careful about how you allocate IPs, and sizing clusters is a bit challenging; sizing subnets is challenging. And of course, even if you have IPs that are routable between clusters, working IP connectivity doesn't mean services can discover each other, so we needed a solution for service discovery between clusters. The solution we started with, and to be honest we were very happy with it for a long time, was one of the first plug-ins that provided routable IPs for pods in AWS, a plug-in developed by Lyft. Of course, this didn't work on other cloud providers, because it was AWS-specific. On GCP we were lucky, because GCP was already providing alias IP ranges, which are also heavily used in GKE. These two solutions worked for AWS and GCP, but they were different, and we had no solution for other providers. In addition, as the scale increased we started to see issues with the Lyft CNI plug-in, because IP allocation was done on each node, which means that when you have thousands of nodes asking for allocations, you run into issues with API rate limits, for instance. And with this solution we had no network policy and no support for encryption between nodes. Another challenge we had was service load balancing, and I'm sure most of you are familiar with how this usually works.
You have kube-proxy, which is responsible for translating services into pod IPs, and, like everyone, we started with iptables. If you run kube-proxy in iptables mode on large clusters, I'm sure you've seen some interesting challenges, such as the number and size of the iptables rules, the time it takes to update them, and the time it takes to match them. So we decided early on to use IPVS instead of iptables. That was much better, but it was very new and it didn't have all the features iptables had, so it didn't honour everything Kubernetes supported, and connection tracking in IPVS proved very, very difficult for us. As a summary, we had many, many growing pains. For service load balancing, IPVS and iptables both kind of worked, both with issues, but neither was really designed to do smart client-side load balancing. It's obvious for iptables: iptables was not designed for load balancing at all. And while IPVS was designed for load balancing, it was designed for middleboxes, not for client-side load balancing.
Also, getting patches and improvements into the kernel was pretty slow. Something that happened pretty recently: we found a bug in the kernel, in the IPVS code, which impacted performance. It was actually easy to get it fixed, but of course then you need the fix to roll out everywhere. And I mentioned network policy before: doing network policy was possible, but once again it meant a lot of iptables rules. So this gets us to why we're currently using Cilium: what it gives us is the power of eBPF, where we can program the features we need, plus of course all the features Cilium already provides. I was mentioning the IPVS bug before: when we discovered it, I reported it to the Cilium team, because at the very beginning I assumed it was due to our Cilium configuration, but down the road it turned out to be a kernel bug. And Daniel from the Cilium team said, oh, this is going to be very easy: what we're going to do is fix this in the Cilium datapath, in eBPF, and you won't see the bug anymore. That was amazing.
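For context, Cilium's kube-proxy replacement, which sidesteps the iptables and IPVS datapaths entirely, is typically enabled through Helm values along these lines. This is a sketch, not Datadog's actual configuration; the value names are from Cilium versions of this era (check your version's docs), and the API server address is a placeholder:

```yaml
# values.yaml (sketch): run Cilium as a full kube-proxy replacement
kubeProxyReplacement: strict   # eBPF handles all Kubernetes service traffic
# With kube-proxy removed, Cilium must reach the API server directly:
k8sServiceHost: 10.0.0.10      # placeholder API server address
k8sServicePort: 6443
```

With this in place, service-to-backend translation happens in eBPF at the socket or packet level rather than through per-service iptables chains or IPVS virtual servers.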
Something I wanted to mention is that we started as users of Cilium, of course, but we run it at quite some scale and we have a few specific use cases based on our design at Datadog. So after some time we started suggesting features, and then we started contributing: some small fixes and some larger ones. What I wanted to say is that we really felt welcome; the Cilium maintainers are very welcoming. Earlier, many of you raised your hands when Liz asked who wanted to contribute to Cilium, and I can tell you it's pretty easy to get started, because people are very, very welcoming. In terms of contributions, we did a few, and I wanted to mention two recent ones that might be of interest to some of you. One is prefix delegation on AWS. We run Cilium in an IPAM mode where Cilium allocates IPs to AWS network interfaces, and the problem is that when your VPC starts to get big and you allocate IPs one by one, you start having issues, because the VPC API can get slower. AWS heavily recommended we migrate to a new design of theirs called prefix delegation, where instead of allocating IPs to interfaces one by one, you allocate a block of IPs at a time. This was possible on AWS, but it was not yet supported by Cilium, so someone from the team did all the work to support it, and I think it's going to be in 1.12, if I'm correct. Another recent one: we had interesting issues where, when you delete a pod, its IP suddenly isn't routable anymore and traffic to it is just black-holed. It's sometimes very useful to notify the client that this IP is no longer routable by sending an ICMP error, so you don't get stuck in an endless retry loop. So this is a new feature we added.
Another one, which I didn't put on the slide because I forgot, is an interesting performance improvement we landed: instead of adding IPs one by one when we notice that a node doesn't have IPs available for pods anymore, Cilium can now watch all the pending pods for a node, which means that instead of allocating IPs one by one it can allocate blocks of IPs, and it's much faster. So yeah, we've been using Cilium for some time now, and we really plan to continue using it and contributing to it, because it's been a very interesting journey for us. And on this, I'm going to welcome Purvi. All right. Thank you, Laurent. Good afternoon, everybody. I'm Purvi Desai, director of engineering for Kubernetes Networking at Google Cloud. I'm very happy to be here to give you an update on GKE Dataplane V2 and our journey forward with Cilium. We launched GKE Dataplane V2, based on eBPF and Cilium, in 2020, and from very early on we saw eBPF's and Cilium's disruptive innovation in exposing the programmability of the Linux network stack. Based on Cilium, GKE Dataplane V2 is our opinionated, fully managed, Kubernetes-compliant network stack. You all know we have been fully committed to the Kubernetes community and Kubernetes open source, and likewise we are committed to the Cilium community and Cilium open source. As many of us know, one of the hidden superpowers of Kubernetes is really its developer-first networking model, and with Dataplane V2 we harnessed the power of eBPF, Cilium, and Kubernetes together. This is very important for us, and we have seen great feedback from customers. But more importantly, we are very happy and very proud that Google was the first cloud provider to adopt Cilium, and the strategy has worked out very well, based on the customer feedback we have received.
Since the launch, we have made it generally available as the data plane foundation for our flagship GKE and Anthos, and for newly announced platforms like Google Distributed Cloud Edge and Google Distributed Cloud Hosted. It is also now the default for our GKE Autopilot offering. We have always been excited about this, but we have also seen our users really interested in migrating away from iptables and towards the ability to harness the flexibility of the stack. During this journey, we have learned a lot from our customers, and while listening to them we feel the power of "and", not "or". While our customers love the fully managed, opinionated service, we also have customers who would like the flexibility and the ability to pick and choose the best of both worlds: Cilium and Google's opinionated service. We believe we can achieve this by bringing modularity, pluggability, and composability to Cilium and to the data plane, and we will be investing in this in the coming months on the journey forward. The North Star for GKE is a vibrant and open ecosystem where we can offer innovative networking features. We are very excited about what lies ahead, and we plan to give you all a tangible and detailed update at the upcoming KubeCon North America in October. So stay tuned. Thank you so much. Thank you to Purvi, and thank you to Laurent, for sharing their experiences of working with Cilium; we really appreciate the input from our adopters and users, and their contributions and feedback. At the start, quite a few of you said you're interested in using Cilium but you've not tried it out yet, so here are a few pointers for where you can go to get started. Cilium.io is the website.
There's a page on there with a collection of pointers to beginner materials, whether that's the documentation or our weekly introduction sessions with Thomas and other contributors and maintainers from the Cilium project, who will answer your questions and help point you in the right direction if you have some initial questions. We have documentation with a number of getting-started guides, and we now have some interactive tutorials that will show you some of the basics. If you're new to the project, I really highly recommend joining the Slack, because that's really the easiest way to reach out to the community, to ask for help, to get help, and to give feedback to the maintainers. Maybe the clicker is going to work, maybe it's not. Maybe I'll use the button. Okay. If you do ask for assistance on Slack, you'll find a number of channels for specific subtopics. Obviously, people have a tendency to just put everything in the general channel, and then things get lost. So if you realise that your question is very specific to service mesh or Hubble or some other feature of Cilium, it's great if you can use those specific channels. I also really encourage you to help answer questions in those Slack channels, because after all, this is a community project, and your experience as an existing user is really valuable. This has to be a community where we can help each other solve problems. Sometimes you will find something that is not just a question but actually needs a change or a fix, and of course GitHub is the place to raise issues. I've just listed the Cilium organisation, because we have the main cilium/cilium repo and also some other repos for things like Tetragon and Hubble. You'll probably find the right one pretty obvious; if in doubt, use cilium/cilium. So, feature ideas. As you've heard from our users, we often find that there is a new requirement, or some interface that someone wants to integrate with.
I would strongly recommend asking questions and asking for feedback about ideas on Slack. Partly, that helps you see whether there's a community of other people who want the same feature or have the same idea. We have a new CFP template: if you go to the issues section on GitHub and raise a feature request, you can fill that in at a high level. Or, if you have a strong opinion on how it could be implemented, or you want to provide a lot of detail, there's a Google Doc template you can use to give us a more expansive idea of what it is you'd like to achieve. For bigger features, we use those CFP templates as a way of asynchronously debating how they could be implemented. If you want to see the roadmap, there's no real replacement for GitHub issues: GitHub issues are the set of things we know about and would like to implement. The higher-level set of broader features we want to tackle, we've started publishing in a community roadmap. We deliberately don't put time frames on those, because it's a community project. If there's something specific that you really want, you can help contribute to that feature, or you can at least give us feedback on why it's urgent and what you could do to help us deliver items in the time frame you need them. If you are one of those people who put your hand up because you would like to make contributions to the code, there's a whole developer section in the Cilium docs that will tell you how to set up the build environment, and it covers, what do you call it, I've lost the word, coding conventions: keeping the code consistent, guidelines for how you can develop. I strongly recommend getting involved in Slack and talking about a PR before you submit it, because quite often there might be half a dozen different ways to tackle a problem, and there may be some strong opinions from the maintainers about which of those approaches is best.
So we'd much rather have that discussion first, help point you in the right direction, and make it an easy process for getting those PRs accepted and merged, because that's what we all want; nobody wants PRs sitting there unmerged. We have a number of issues: I looked this morning and there were ninety-something issues labelled "good first issue". I think there were probably about ninety of you who put your hand up and said you wanted to contribute, so if everybody takes one, great. Of course, you might want to actually talk to people. It's been amazing seeing people in real life this week; we might all be a bit fed up of Zoom calls, but it can be really useful to have a conversation. We have a weekly developer call; you'll find all the details on GitHub and also in the documentation. You can add items to the agenda: there's an ongoing, living Google Doc that describes the agenda every week, and we welcome you to come and add the items you want to discuss. We also have some special interest groups, which may be very busy one week and less busy another, depending on what's happening in different areas of the project. If you have specific areas of focus, you might want to join a SIG; I would say join the developer call first and get advice on who is involved in particular areas of the project. Last but very much not least, you might want to contribute to the project in other ways. We really want to enable people to go out and help their local communities learn about Cilium and get help with Cilium. If you have stories to tell, whether you want us to amplify your blog post or maybe publish a blog post on the Cilium site; or if you want slides, perhaps because you want to present something about Cilium at your local meetup, we can support you with some slides. We have a form you can fill in if you'd like us to send swag.
Also, if you'd like to invite one of the Cilium maintainers to speak at an event, I'm not going to guarantee that you'll always get a yes, but we can try to connect you with people in your local area who might be interested in having a discussion and sharing the Cilium joy locally. That help page will give you a form; you can ask for all kinds of help. We also really want your ideas on how you might want to contribute to Cilium. Get in touch on Slack: that is the best way to reach all of us, the best way to tell us what you need and your ideas for Cilium going forward. I think we have about three minutes for questions: questions for Thomas, for Laurent, and for Purvi. If you can shout out a question, I will try to repeat it for the mic. Any questions? In a whole room like this, there must be a question? Yes, wonderful. The purpose? First of all, the North Star, as I mentioned, is that vibrant and open ecosystem to enable innovation in networking features. The main purpose is, as we said, that we are hearing from our customers that they would like to have things from open source, the goodness from there, and also get the managed service and the differentiation from Google Cloud, let's say on GKE. For us, it is important to have that kind of ecosystem where they can take tools they get from Cilium and make them work. Maybe, if everything works out well, far in the future, they can even write their own eBPF programs and install them. But that's a vision. Okay. Is there another one? Justin? Yes. So, just to repeat the question: the question was, eBPF is great, but what are the limitations? The answer is that what's specific about eBPF is that it runs in the Linux kernel; that's what's different from other languages such as WebAssembly or Java or just C.
This privilege of running in the Linux kernel comes with limitations, because nobody wants the kernel to crash, which means that in order for a BPF program to be loadable, it needs to be verifiable. It is verified by the kernel verifier and will be rejected if it does not pass this check. One of the conditions is, for example, complexity: we cannot load a 10-megabyte word-processing app into the kernel. Programs need to terminate; they need to run to completion, which means it must be provable that they will eventually end, so a program can't run forever and stall your kernel. There's also a size limit, so we can only load a program up to a certain size. So there are limitations, and these limitations exist so that it is safe to run programs in the kernel. But if we look at the feature set of Cilium, I would say the limitations are not holding us back much, and some of them have been relaxed over time; for instance, the number of instructions a BPF program can execute has increased, so it's less and less of a problem. I think we are pretty much out of time; we'll probably get kicked out of the room. I'm just going to add two more points. One is that there is a Cilium booth: do come and ask your questions. There are people staffing that booth for the rest of today and tomorrow, so we'd love to have a chat with you there. And also, if you're interested in eBPF, we have some books that have actually arrived after being stuck in customs, so do come by later today; we'll have those at the Isovalent booth if you're interested in one. Thank you so much, everyone, for coming here.