Welcome to this Cilium Maintainers Track session. My name is Liz Rice. We're going to have some guest speakers today to talk about how they've been using Cilium, and some updates about how you can learn about Cilium. But let's just start by saying welcome to Cilium. How many of you are already using Cilium? Pretty good show of hands. And also put your hand up if you have contributed to Cilium. Okay, hopefully next time you come there will be a bigger number of hands, but thank you to everyone who's been contributing and thank you to everyone who's been using Cilium. So let's just have a quick refresh on what Cilium is. Probably most of you know it best as a CNI, a networking solution for Kubernetes, although it isn't just for Kubernetes. You will probably be aware of Cilium's service mesh and some of the features we've been building there around Ingress, Gateway API, and mutual authentication, which we've been talking about this week. How many of you have used Hubble for observing network packets? Yeah, I think it's one of the real strong points of Cilium: the packets aren't just flowing, you can also observe them with Hubble. And then finally, the Tetragon subproject. Anyone here using Tetragon? Okay, again, that's another one where I expect next time there will be a bigger show of hands. Tetragon is the subproject that uses eBPF for runtime security observability and enforcement. And all of these things, of course, are built on eBPF, which is very exciting technology that allows us to change the way the kernel behaves. Did anybody come to see the eBPF documentary screening last night? It's so good. It's half an hour of your life that you will not regret if you watch that documentary. So, Tetragon reached 1.0. Give a round of applause for everybody who worked on Tetragon. Tetragon is using eBPF to detect events which may be relevant for security purposes.
Because we're in the kernel, we can detect file access or network access or changes of privilege, all of these kinds of things that are potentially very significant for detecting malicious activity. And the other thing that's really important about Tetragon is that those events are filtered in-kernel, so it's extremely high performance. We've done some measurements pretty recently about the performance impact. Normally when you add a security tool, or anything that's adding a ton of observability, it might have an impact on performance. But you'll see some really low numbers, under 2% CPU usage, for some really useful observability. So the performance is excellent. Tetragon is 1.0 and ready for use in production, and we want to hear when you try it. We want you to tell us how you get on. We want you to raise issues and contribute profiles and just tell us how Tetragon has been for you. The other element I alluded to earlier is that Cilium is not just for connecting within a Kubernetes cluster. We have Cluster Mesh, which allows us to connect multiple clusters. We have Cilium Mesh, which allows us to connect with external workloads, have ingress from external traffic, and essentially meet all of your connectivity needs with Cilium. I guess that leads us on to the other big piece of news, and I'm sure you all know this, but I think we're going to have another round of applause for it: the fact that Cilium is now graduated in the CNCF. So a massive thank you to everyone who has done anything at all with Cilium, because even just adding your company to the adopters list has had an impact and enabled this milestone. I think everyone who has been involved can be really proud.
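As an illustration of the kind of in-kernel event detection and filtering described above, a Tetragon TracingPolicy might look roughly like this. This is a sketch modeled on the upstream file-monitoring example; the hook point, policy name, and `/etc` prefix are illustrative, not a recommended production policy:

```yaml
# Illustrative Tetragon TracingPolicy: report access to files under /etc
# by hooking an in-kernel function, filtering in the kernel before any
# event is sent to user space.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: file-access-monitoring
spec:
  kprobes:
  - call: "security_file_permission"
    syscall: false
    args:
    - index: 0
      type: "file"      # the struct file * being accessed
    - index: 1
      type: "int"       # the access mask (read/write)
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc"        # only emit events for this path prefix
```

Because the `matchArgs` selector is evaluated in eBPF, events outside the watched prefix never leave the kernel, which is where the low CPU overhead mentioned above comes from.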
If you're not involved yet and you're thinking, I want to be part of this rocket ship, then there is the Cilium contributor ladder, which lays out how you can work your way into the Cilium community, whether that's through code contributions or non-code contributions like educational material, documentation, or community management. We're really interested in non-code contributions as well as actually fixing issues and adding features. If you do want to get involved, or you have an idea for a feature, or you want some advice, there's a weekly Cilium developer meeting that happens on Wednesdays, and we also now have a monthly APAC-friendly-time-zone meeting as well. All the details for this are on the Cilium README, and you can also find all this information in Cilium Slack. How many of you are in Cilium Slack? Okay, if you're not already, it's really great: slack.cilium.io, or you can find the information on the Cilium README. There's a ton of people there, like 18,000 people who are interested in Cilium and eBPF, and it's a great place to share information and get advice. I mentioned this already, I stole my own thunder, but if you haven't already seen it, definitely go and watch the eBPF documentary. The final thing that I want to leave you with: I would like you to take a picture of this URL or scan that QR code, fill in the survey, tell us about your experiences with Cilium, tell us what you want to see next, and give us the feedback that helps us move the project along to its next success. With that, I would like to welcome James McShane, who is going to tell us about how they're using Cilium in the wild. Let's give James a round of applause.

Thank you, Liz. The Cilium team is really killing it at this conference. You want to talk about buzz — they've done a spectacular job this year in their community, and I think that's what really drives our usage of Cilium in the wild. Who am I? My name is James McShane.
I'm director of engineering at SuperOrbital. We are a cloud-native consulting services firm, and we also offer a training curriculum. I am trying to resurrect the Cincinnati cloud-native meetup; if any of you are in Cincinnati, come find me afterwards and we'll make it happen. I've contributed to Tekton, Argo, and Cilium, though don't go looking for my PRs in Cilium. I've done a lot of work with the team getting real production use cases out for Cilium at a couple of our clients. Our team focuses on delivering high-leverage projects requiring deep integration across multiple teams in large end-user environments. For us, the way that I see it, Cilium has really nailed delivery and maintenance across hundreds of clusters, thousands of services, and all the major Kubernetes providers, and it has given us a consistent stratum on which to build network observability and network security. That makes it really easy for us to build solutions on top, where enterprises can trust that they are delivering on a secure platform they can then run their applications on. This is where we really started our path of utilizing Cilium: thinking about it from the network security perspective. For us, this has evolved from a set of layer 3, 4, and 7 policies to thinking about a security policy set as an application in its own right. When you think about a large multi-cluster, multi-region delivery, the key thing for us is to be able to say, hey, I've got this policy set that is its own application. I need to ensure that I can modify my policies consistently, get them delivered and validated in a test environment, and run a number of validations to make sure that when I release this out into the wild for my edge-release customers, I know I'm not going to break their existing workflows.
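For readers who haven't seen one, a single Cilium policy can span those three layers. Here is a minimal sketch of a CiliumNetworkPolicy; all labels, ports, and paths are hypothetical:

```yaml
# Illustrative CiliumNetworkPolicy combining L3 (peer identity),
# L4 (port/protocol), and L7 (HTTP method and path) rules.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-api
spec:
  endpointSelector:
    matchLabels:
      app: api             # policy applies to these pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend      # L3: only traffic from frontend pods
    toPorts:
    - ports:
      - port: "8080"       # L4: only this TCP port
        protocol: TCP
      rules:
        http:              # L7: only these HTTP requests
        - method: "GET"
          path: "/v1/.*"
```

Treating a set of such manifests as a versioned application in Git is what makes the validate-then-promote workflow described above possible.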
On the back end, we can observe that by looking at the network flows, but by then it's a little bit late: if you've already started breaking your customers, you're seeing the impact of that. The goal is to validate and ensure that that consistent policy set is enforced before we get there. By treating policy as an application, it's also really useful for security teams that want to know what's going on in their environments and how policy is being enforced. They can view our GitOps delivery model as the source of truth for how the network is being secured inside the clusters we deliver applications to. It starts there when it comes to delivery, because with these high-leverage enterprise projects we need to involve security from the start. It's that shift-left mentality that Cilium is at the root of for us when we're delivering these policy sets across a multi-region, multi-cluster environment. Then, as we look at the work that's been done in 2023, I really see that the buzz of the project, the significant effort that's been done in the community, has immediate impact for our use cases. I put this last, but I'll be honest, it's one of the biggest things that has improved the observability and the operational aspect of Cilium for us: separating out Envoy into its own DaemonSet, a separate unit. It's a very simple thing. It's a separate process running Envoy alongside Cilium for the layer 7 policy management. But those kinds of usability improvements have been consistent for us. I was up here in Detroit last year talking about what Cilium 1.10 and 1.11 had provided for our team, and I just see that consistent delivery going on in this project. Another example: enterprise customers need supply-chain validation for their open-source artifacts.
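In Helm terms, that Envoy split is, as far as I can tell from recent Cilium charts, a single values toggle; treat this as a sketch rather than a complete values file:

```yaml
# Helm values (sketch): run the Envoy L7 proxy as its own DaemonSet
# (the cilium-envoy pods) instead of embedded in the cilium-agent process.
envoy:
  enabled: true
```

The operational win the speaker describes is that the proxy then has its own lifecycle, logs, and resource accounting, separate from the agent.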
These things that are going on in the open-source ecosystem are really critical to delivering secure environments, starting from delivery in the repository down to the runtime security environment that you need to be able to monitor and enforce. When I think about the use cases that we haven't yet picked up, the one I'm really excited about is mutual authentication; it is absolutely top of mind for us as we move forward. These are use cases we've been asked about consistently by our customers for a really long time, because everyone wants this kind of transparent encryption from service to service, and Cilium makes it so easy. Thomas will talk about that a little bit more later in the presentation. Unified SPIFFE identity across the cluster boundary with Cilium service mesh is another thing we can then utilize in other APIs, using that identity for authorization mechanisms. The BGP control plane, too — these kinds of improvements are really in line with the things we've needed to see in the delivery of our applications on top of Cilium. For us, in the runtime, it's about pairing security and observability. We put together the observability of Hubble, seeing those network flows and process flows in that immediate way that Hubble provides. Tetragon is the tool we've needed for a long time when it comes to monitoring syscalls. That's a really key use case. We were working with a client in 2020 that wanted to do deep-level integration into syscalls, and we ended up using seccomp policies, which have been around for a really long time. But the maintenance of those types of policies is hard, especially across nodes that have less host access and different host profiles, once you get into different operating systems and different architectures. Tetragon enables us to build better visibility around that, which in turn enables enforcement.
By bringing that centrally, allowing us to view and query it, we can then build consistent policies that let us get far more granular about the syscalls these containers should be making. The next step for us with Tetragon is to treat that policy as code in the same way: take our network policy approach of delivering everything starting from the repository, and do the same thing with Tetragon. What's next for us is very much in line with the things Liz was talking about: Gateway API integration, Cilium service mesh with mutual authentication, and then using Tetragon to make runtime security even more of a reality than it is today. Please feel free to come up and talk to me afterwards about your end-user use cases with Cilium; we're really excited to be a part of this community. We'll go ahead and Hemanth will come up and talk about learning with Cilium.

Thanks, James. Hello, everyone. We wanted to take some time to talk about some new developments and updates in how we go about learning Cilium. A quick introduction about me: I'm a senior software engineer at Datadog. Datadog is an observability company, and I work on a team called Compute Data Plane, which is responsible for managing Cilium for all of Datadog. We run hundreds of Kubernetes clusters and tens of thousands of nodes. I'm a Cilium CNCF maintainer; I primarily work on SIG-IPAM, which covers the different IPAM modes in Cilium, and I also spend a lot of time in the Azure and AWS modes. You can find me on Cilium Slack and at our community meetings on Wednesdays, as Liz mentioned. For this section, I want to start off with a quick question: how are all of you learning about Cilium these days? What are some of the resources that have helped you learn Cilium? Anyone? Did not expect that, but okay. Blogs, yeah. Perfect. So, here's what I had in mind, right?
Documentation, docs and blogs, eCHO office hours, maybe reading code, running Cilium at your own company or in a home lab. There are so many resources that you could use to learn about Cilium. But depending on your background, your own path might look completely different, right? And part of what makes it slightly tricky is that Cilium is extremely feature-rich. I went to the cilium.io website earlier today and I could not fit all of the features onto one single slide. That's how many features Cilium has. And Cilium is a very fast-paced project. I saw a screenshot from earlier today: in the last one month, we had 90-plus authors making 362 pull requests, and there were four major releases. I can tell you from our own experience, it's hard to keep up with the pace at which Cilium is developing and building new features. The community has been constantly adding lots of new features. Where I'm going with all this is that a graduated CNCF project with as much complexity as Cilium needs a structured learning path, right? So the community has been really excited that the CNCF is finally coming up with a Cilium Certified Associate (CCA) exam. This is an entry-level certification exam, and the ideal candidates would need some Kubernetes knowledge, a networking background, and some hands-on experience with Cilium, of course. This should allow users to validate their Cilium knowledge and also give companies an opportunity to identify Cilium talent. So we are really, really excited about that. And here's what the blueprint looks like currently. This blueprint was designed by a set of subject matter experts from different companies, and all of us have agreed on these topics as the first things somebody should try to understand when getting into Cilium. So, who's creating this CCA? There is a set of, I think, 15 subject matter experts from different companies.
We're all collaborating with each other to come up with the certification exam. The process is basically: all of us draft our own questions, those go into a peer review process, we exchange feedback on the questions, and once a question passes — once all that feedback is incorporated — it gets added to the question bank. So, here's a sample question. I'll give you all a few seconds to read it. Any guesses? Hopefully not the first one. First one? Yeah. Cool. So, the idea of this certification is that it's meant to be an intro-level certification. You are not required to be an eBPF expert or a networking expert, because we understand that there are different kinds of engineers from different backgrounds all trying to learn Cilium for their own use cases. This is meant to create a learning path for everybody and put some structure around that. So, how do we go about preparing for the CCA? Isovalent has done a great job building a set of labs, and these labs have their own distinct learning paths. Based on your background, you can pick your own path and explore the different labs in each of those paths, and that is a great starting point. On top of that, there's a course from the CNCF called Introduction to Cilium which is completely free, so check the course out. And, of course, the official reference guide: the documentation. I know everybody says read the documentation but nobody really does — we should. It has a lot of good content in there, it is the official guide for the entire certification, and every time we write a new question, we are required to cite things from it. And here's a link if you want to get notified about updates on when the certification will be released; we expect it to be available early next year. Right? Yeah. So, what else is in store? I was told that after the Cilium Certified Associate exam launches, we might be working on a few advanced certifications as well.
Hopefully, if everything goes well. And that's pretty much all I had in mind. I know all of you are waiting to hear from Thomas, so let's invite Thomas, CTO of Isovalent.

Thank you. Thank you very much. All right, we've heard Cilium is a fast-paced project. I have the pleasure to talk a little bit about what is next for Cilium, because we keep on hearing from you what your challenges are and what you want Cilium to solve next for you. So, let's dig in and give a bit of an outlook on what is planned for roughly the next six months. Of course, we will continue evolving the mutual authentication that we introduced in 1.14. We're not fully done with this feature; it's still in beta. We will continue to evolve it, hopefully looking to mark it stable in the upcoming 1.15, or maybe 1.16, depending on how fast we all write code. We will of course look into fully implementing Gateway API as it evolves. Gateway API itself is not fully complete yet: it has reached the 1.0 milestone, but TCP and UDP route support is still being added, and the spec and the standard are still evolving. We're continuously keeping up, so whatever is being put into the spec we immediately implement in Cilium, but Gateway API itself is also still moving. I'll talk a little bit about NetKit, which was formerly called the meta device — something very exciting. We can still make networking go faster. It's hard to believe at this point, but we keep on finding ways to improve performance. Then we also have a new Cilium initiative, remove the friction, that I will briefly talk about. Let's dive in. Mutual authentication. It's very exciting that from early on we have been able to do encryption with WireGuard and IPsec, so if you had a desire to just encrypt all of your traffic, that was incredibly simple.
You could enable a single helm flag, turn on encryption, choose IPsec or WireGuard, and automatically encrypt your entire network without actually having to manage keys or run an entire service mesh. If you were looking purely for encryption, this was a great option. But then we heard from many of you that you also want mutual authentication on top of that — and please, without sidecars — which is why we have brought in mutual authentication for the first time in 1.14. We also wanted to keep it simple. You can now use a single helm flag to bring in an entire SPIFFE/SPIRE stack. Cilium will run a SPIRE server as well as a SPIRE agent on all the nodes and automatically generate certificates for all the deployments you are running. And then you can use network policies: as you can see on the screen there, an existing Cilium network policy adds two lines of YAML to require authentication, and Cilium does all the magic in the background to actually run the mutual authentication handshake. So we're trying to hide a lot of the complexity that exists when running mutual authentication — which is really the necessity of the complex inner workings of handling certificates and handshakes and all of that — and expose a user experience that is as simple as possible. I talked a little bit about this NetKit device, and it goes a little deeper down the stack. So what is NetKit? This was called the meta device until recently. It's a new kernel feature, and if you want to learn more about what exactly it is, I recommend Daniel Borkmann's talk about meta devices. So far we have been kind of abusing what's called a virtual Ethernet device, which is a software device, like a software patch cable representing a virtual Ethernet connection. It was not built for container use cases in any way; we've just been kind of abusing it.
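The two lines of YAML mentioned here are the `authentication` block on an otherwise ordinary CiliumNetworkPolicy. A sketch, with hypothetical app labels (on the Helm side, the corresponding toggle is, roughly, `authentication.mutual.spire.enabled=true` together with the SPIRE install flags):

```yaml
# Sketch: a standard CiliumNetworkPolicy; the `authentication` block is
# the addition that makes Cilium run the SPIFFE/SPIRE-backed mutual
# authentication handshake before allowing the traffic.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: require-mutual-auth
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    authentication:
      mode: "required"   # the two extra lines the talk refers to
```

Everything else — certificate issuance, rotation, and the handshake itself — happens in the background, which is the "hiding the complexity" point made above.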
There was no dedicated network device built for container workloads, even though running containers has obviously become a primary use case for Linux. So why not come up with a native network device specifically built for container workloads? A lot of the performance overhead of container networking came from having not only a networking stack inside the container's network namespace, but an entire networking stack outside as well. So far Cilium has been able to bypass quite a lot of that, but not quite all of it. With NetKit, we're able to essentially remove all of that network stack overhead. You can see it labeled Meta here — the slides are still old; the NetKit rename came after the slides were created. You can see that we can now essentially almost directly connect the network interface card of the VM or the server with the interface leading into the container. And I will show you what that actually means in terms of performance, because it gets exciting. You can see a couple of numbers here. We have in yellow the baseline: that's host networking, not running any containers, simply benchmarking one node talking to another node. That's the baseline; that's what we want to get to. Then in light blue we see the old stack, a virtual Ethernet device with no modifications. And then in red we see the prior optimizations we have done to bypass quite a lot of that stack; you can see the line has gone up, and we're almost as fast as host networking. And now with NetKit — that's the purple — we're essentially able to get to almost host level from a throughput perspective. And it gets better, because we can do even better. Here, lower is better, which means the baseline is all the way on the left. The first five bars on the left are what we've seen before.
We can take it one step further: if your application uses a kernel feature called zero-copy, a special system call option which avoids copying the data from the application into the kernel, we avoid the memory copy. At this point we're so fast that the memory bandwidth of your server becomes the limiting factor. If you can use zero-copy in the application, you can see it go all the way down, and that's an improvement of six times in terms of performance. If you're running particular workloads like video streaming or any sort of data-intensive workload, this is exactly what you will want. NetKit is currently being merged into the kernel, and we already have a branch in Cilium to support it, so as soon as kernels are released with this feature you can start using it. Then the Cilium initiative: remove the friction. Cilium is very feature-rich. We are trying to accommodate all of you with all your features, from running Cilium on a Raspberry Pi to Datadog clusters of thousands of nodes to heavily regulated enterprise environments, and this needs everything from encryption to high-scale service mesh — all of it. We are doing our best to optimize both the learning of Cilium — how you discover Cilium, how you learn about Cilium — and the usage of it, migration to it, and so on. But we have been dealing with this for such a long time that it's sometimes hard for us to understand what's actually hard. So I would invite all of you: as you have learned Cilium, or if you have migrated to Cilium or started using Cilium, please give us feedback on what was hard. We need to know that in order to remove that sort of friction. We have started this initiative; there's a QR code, and you will also find it in the Cilium Slack. If you have any sort of input on what was hard with Cilium, what we could optimize about learning Cilium, or what is missing, please tell us, because we would love to optimize that.
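For concreteness, the zero-copy send path mentioned a moment ago is the Linux `MSG_ZEROCOPY` flag. Here is a hedged sketch in Python: the two constants are copied from the kernel uapi headers because the `socket` module does not expose them, the loopback pair exists only to have a connected TCP socket, and the code falls back to a regular copying send on kernels without support. A real sender would also reap completion notifications from the socket error queue before reusing the buffer, which this sketch omits.

```python
import socket

# Linux constants not exposed by Python's socket module
# (from <asm-generic/socket.h> and the kernel msg_zerocopy docs).
SO_ZEROCOPY = 60
MSG_ZEROCOPY = 0x4000000

# Loopback pair just to have a connected TCP socket to send on.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

payload = b"x" * 65536
try:
    # Opt in, then ask the kernel to pin our pages instead of copying them.
    cli.setsockopt(socket.SOL_SOCKET, SO_ZEROCOPY, 1)
    sent = cli.send(payload, MSG_ZEROCOPY)
except OSError:
    # Kernel or platform without MSG_ZEROCOPY support: plain copying send.
    sent = cli.send(payload)

cli.close(); conn.close(); srv.close()
```

The six-times number quoted in the talk is for the full NetKit-plus-zero-copy path in their benchmarks, not something this sketch demonstrates by itself.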
Sometimes it's just hard to actually know. And with that — first of all, I would like to thank all of you for coming today. If you have questions on Cilium, if you want to contribute to Cilium, or if you just want to learn more or try it out, cilium.io is the best way to get started. You will find links to Slack, the documentation, the contributor ladder, introduction material, tutorials, labs, and so on. That's a great starting point. And of course the code is all on GitHub under the Cilium organization: Hubble, Tetragon, Cilium, and so on. With that, I think we have a bit of time for questions, and you can ask essentially any question on Cilium — we have a lot of expertise here today. So go ahead. Thank you very much. Any questions? Yes — I can also repeat the question.

The question is: how are we making money at Isovalent? We are offering an enterprise distribution of Cilium, and if you would like to have enterprise-grade support, you can give us a bit of money. Yes. Could you speak just a little bit louder? Yes, the question is what algorithms we are using for mutual authentication. For mutual authentication we are using Go's TLS, so the mutual authentication handshake is standard TLS using Golang. That's for the mutual authentication handshake. The encryption is performed using IPsec or WireGuard, and in particular with IPsec you can run a FIPS-compliant kernel, limit the ciphers to, let's say, AES-CBC, and only run FIPS-compliant ciphers. But you can use the full cipher set of IPsec, and the standard WireGuard ciphers; on the authentication side it's Go TLS. Other questions? You can also tell me about the friction of Cilium. Yes. Specifically for load balancing? Yes. So the question is what sort of algorithms are available for load balancing. Cilium offers layer 3, layer 4, and layer 7 load balancing.
Layer 3 and layer 4 load balancing is always performed in eBPF. We have round robin; we have weighted round robin, so you can send a certain percentage of traffic in one direction; we have session affinity, so you can ensure that for a particular client all requests go to the same backend. Then for layer 7, all of the routing is done using Envoy, and you have a wide variety of algorithms: least connections, round robin, Maglev — all the consistent hashing algorithms. We also have a standalone load balancer which can run outside of Kubernetes; it's Maglev-based and performs consistent hashing. We also support DSR, direct server return, so we can preserve the client IP. So I would say we implement and support the standard load balancing algorithms that you will find in other software and products. I think we have time for one last question. Yes. Do we see a reason to move away from Envoy? Envoy is a fantastic project. Like any proxy, it does introduce overhead. If we're talking about the XDP, eBPF-based load balancer, we're talking 15 million packets per second in load balancing capacity. If we tried to move that through Envoy, that's just not possible. So the reason not to use any proxy, including Envoy, is performance. At the same time, we cannot perform all layer 7 load balancing in eBPF at the moment — maybe we never will. That's why we use Envoy. The motivation is performance, but for things like HTTP parsing, load balancing, and splicing connections, that's when we use Envoy. I think we're running out of time. I'd like to thank all of you for coming, and have a safe trip home. Thank you very much. We have one more quick thing: we have two giant bees that Bill is going to set in flight. Set them free. So if you're lucky... Brilliant. Safe travels, everyone. Thanks for joining us today.
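As a closing aside, the Maglev consistent hashing mentioned in the load-balancing answer can be sketched roughly as follows. This is an illustrative toy, not Cilium's implementation (which lives in eBPF and Go): the table size, hash choice, and backend addresses are all made up for the example.

```python
import hashlib

def _h(key, seed):
    # Stable hash; Maglev needs two independent hash values per backend.
    return int(hashlib.sha256(f"{seed}:{key}".encode()).hexdigest(), 16)

def maglev_table(backends, m=13):
    """Build a Maglev lookup table of prime size m (choose m >> len(backends))."""
    offsets = [_h(b, 0) % m for b in backends]
    skips = [_h(b, 1) % (m - 1) + 1 for b in backends]
    # Each backend's preference list is a full permutation of 0..m-1,
    # because m is prime and therefore coprime with every skip value.
    perm = [[(offsets[i] + j * skips[i]) % m for j in range(m)]
            for i in range(len(backends))]
    table = [None] * m
    nxt = [0] * len(backends)
    filled = 0
    while filled < m:
        for i, backend in enumerate(backends):
            # Claim this backend's next preferred slot that is still empty.
            while table[perm[i][nxt[i]]] is not None:
                nxt[i] += 1
            table[perm[i][nxt[i]]] = backend
            nxt[i] += 1
            filled += 1
            if filled == m:
                break
    return table

# A flow is then steered with table[flow_hash % m]; when a backend is
# added or removed, most slots keep their previous backend, which is the
# "consistent" property that preserves existing connections.
table = maglev_table(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```

The round-robin fill gives each backend a nearly equal share of slots while keeping lookups O(1), which is why it suits a per-packet eBPF data path.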