Hello, my name is Thomas Graf. I hope everybody had a good lunch; I'm still pretty full. I'm here to talk about Cilium and BPF, and it will be somewhat Kubernetes-specific. Before I start, I would like to understand: who has no clue at all about BPF, has never heard of BPF at all? Who kind of knows what BPF is, but is not quite a super expert yet, mid-level? And who is the pro? All right. So let's start.

What is Cilium? I'll try to put it in one sentence: API-aware networking and security using BPF and XDP. We're going to dive a little bit into what that means. The other thing that summarizes it well is that picture up there, which has the Kubernetes container ship, and the anchor is iptables, and Cilium with BPF is coming along and cutting that anchor loose. So we're using BPF as a replacement for a lot of functionality that has previously been provided by iptables. And I can talk badly about iptables because I helped write it, so I'm one of the few people who can actually really bash it.

All right, so what is BPF? This is what the toolchain looks like; some of you may have seen this. What BPF gives us is the ability to load small programs into the Linux kernel and run them when a particular event happens: for example, when a system call is being made, when a network packet is being received or sent, when an application is enqueuing data into a socket, or when a user space application calls a particular function for which we know the symbol address, and so on. So, concretely: we can run a small program whenever a TCP retransmission happens, we can call a program whenever a connect system call is made, or we can call a program whenever a network packet is received on the virtual interface that is owned by a particular container.

So BPF gives us the ability to extend the kernel with the logic of small programs. These small programs are written by a programmer in source code, typically using pseudo-C, a restricted subset of C. We then feed that into a compiler toolchain, in this case LLVM, which compiles the pseudo-C code into BPF bytecode, an instruction set that is close to x86 assembly. We can then load that program into the kernel and say: run this program whenever this system call is made, run this program whenever a network packet is received.
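To make this concrete, here is a minimal sketch of what such a small program can look like in restricted C, written in the libbpf style. It counts calls to the kernel's tcp_connect function; the map name and program name are illustrative, not Cilium's actual code:

```c
/* Minimal sketch (not Cilium's code): count tcp_connect() calls.
 * Build with something like: clang -O2 -g -target bpf -c probe.c -o probe.o */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} connect_count SEC(".maps");

SEC("kprobe/tcp_connect")             /* run on every tcp_connect() in the kernel */
int count_tcp_connect(void *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&connect_count, &key);

    if (val)
        __sync_fetch_and_add(val, 1); /* loop-free and bounded: verifier-friendly */
    return 0;
}

char _license[] SEC("license") = "GPL";
```

A user space loader (for example via libbpf) would load this object, attach it to the kprobe, and periodically read the counter out of the map.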
Most of you have actually used BPF in some form. For example, if you're using Chrome, Chrome leverages BPF to isolate and limit the system calls that a Chrome plug-in can make. Or if you have been using tcpdump: tcpdump leverages BPF to filter the packets that are being displayed. If you run tcpdump and say "port 80", that's a small BPF program which will run for every packet received, and all that program does is look at the port, and if it's 80, return a match so the packet gets displayed. So we've all been using BPF in some way. What we're talking about here is extended BPF, or eBPF, which became a lot more powerful.

So what happens when we load this into the kernel? Isn't it dangerous to load an arbitrary program into the kernel? Isn't that just a kernel module? There is one big difference between a kernel module and BPF, and it's the verifier bit up there. When we load the program, the kernel doesn't just accept it and run it; it verifies the program first. For example, it makes sure that there are no loops inside the program, so the program is guaranteed to run to completion. It prevents us from arbitrarily leaking kernel memory. We also cannot just call into arbitrary kernel functions; there is a whitelisted set of helper functions that we can call, so there's a known API. We can't arbitrarily access kernel memory or kernel functions.

And the last bit is the just-in-time compiler. BPF, as we load it into the kernel, is a software bytecode instruction set. The just-in-time compiler takes that program and compiles it to x86, ARM, PPC, whatever your CPU actually runs, which means that from an efficiency perspective we're back to native execution. So to summarize: we can extend the kernel at the same speed as if we had recompiled it, and we can run small programs at arbitrary events inside the Linux kernel. There's a typo in this slide, kudos to those who have noticed it; I was just too lazy to fix it: TC ingress is mentioned twice, for those who noticed.
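Going back to the tcpdump example for a moment, here is a hedged sketch of what such a capture filter can look like as an eBPF socket filter. It assumes IPv4 without IP options and TCP on top, which a real generated filter would of course check; a socket filter returns 0 to drop a packet from the capture and a byte count to keep it:

```c
/* Hedged sketch of a "port 80" capture filter (illustrative; not the
 * code tcpdump actually generates). Assumes IPv4 without options, TCP. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("socket")
int keep_dport_80(struct __sk_buff *skb)
{
    struct tcphdr tcp;

    /* Safely copy the TCP header out of the packet. */
    if (bpf_skb_load_bytes(skb, ETH_HLEN + sizeof(struct iphdr),
                           &tcp, sizeof(tcp)) < 0)
        return 0;                      /* short/malformed packet: filter out */

    /* 0 filters the packet out; a nonzero value keeps that many bytes. */
    return tcp.dest == bpf_htons(80) ? skb->len : 0;
}

char _license[] SEC("license") = "GPL";
```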
All right, so: the rise of BPF and XDP. I've listed four use cases here that I'll dive into a little bit. Some of you may have heard about Facebook's use of BPF; I heard it mentioned in an earlier talk this morning as well. BPF is being leveraged a lot for profiling and visibility, and now also for networking and security. Cilium is on the networking and security side, but I want to make sure I also mention some of the other use cases. The upper left, the lower left, and the upper right are all tracing and visibility. Many of you may have heard Brendan Gregg talk about flame graphs and using BPF to trace applications and figure out which function call is actually consuming CPU, and he's able to do this with BPF at low overhead. So he gains the ability to profile applications while they run, at minimal overhead.

The last example is the one in the lower right: the kernel community decided, about two or three months ago, that the datapath portion of iptables, the kernel piece, is being replaced by BPF, because BPF has been declared the future in terms of how these things should be done, and it doesn't really make sense to maintain both an iptables datapath and a BPF datapath. So while preserving iptables binary compatibility, that is, while allowing you to continue using iptables as a binary, the kernel portion is being replaced with a BPF implementation.

Let me dive into this one a little bit. This was not open source when I first presented this slide. Facebook, I think at Netconf 2016, presented this slide and basically announced to the world: hey, we're starting to use BPF and XDP for our load-balancing needs. They put out a slide showing the performance difference between IPVS, which is another Linux-based load-balancing solution, and BPF with XDP, and nobody really believed those numbers; they looked way too good. How is it possible to replace one piece of software with another piece of software and see a 10x performance difference? But it turns out this is actually true, and it's not because the software is better; it's simply because it runs a lot closer to the network driver, which is why the "XDP" is in there. XDP is the ability to run a BPF program basically inside the Linux network driver, extremely close to the network hardware, with access to the DMA buffers. So we can cut out a huge portion of the kernel code that would otherwise introduce latency. It shows the true nature and potential of BPF as a really interesting technology. The code has since been released; you can go to GitHub, to Facebook's incubator: it's called Katran.

Then there's DTrace, or DTrace for Linux. DTrace was huge on Solaris, and Linux never really had an equivalent tool. BPF, with BCC and various other frameworks, does more or less the same: it's the use of BPF for profiling and tracing purposes. Netflix has lots of blog posts around this.
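To illustrate why running at the driver level is so much faster: an XDP program sees the raw DMA buffer and can drop or redirect a packet before the kernel even allocates a socket buffer for it. Here is a minimal hedged sketch of an XDP-level early drop; the real Katran datapath is of course far more involved:

```c
/* Hedged sketch of an XDP early drop (illustrative only): the decision
 * is made on the raw buffer, before the kernel builds an skb. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_non_ipv4(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* Bounds check required by the verifier before touching packet data. */
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_DROP;               /* dropped at the driver: very cheap */

    return XDP_PASS;                   /* hand the packet to the normal stack */
}

char _license[] SEC("license") = "GPL";
```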
Then iptables. I really like this tweet by Jérôme: in any team you need a tank, a healer, a damage dealer, somebody with crowd control abilities, and another who knows iptables. I think a lot of us in the networking field in particular have been in the situation of trying to debug a system with 10,000 iptables rules, and it turns out to be extremely difficult. With Cilium we're trying to resolve this to a large extent by not relying on huge sets of rules but on actual logic; we'll talk more about that. And then there's what I referred to earlier, BPF replacing iptables; there's the QR code up there with a blog post, and there's an LWN article which gives some of the background on this.

All right. I usually don't do performance-number slides, but this one is actually interesting, because I think it shows why this is not just more of the same but something entirely different. This is measuring a relatively simple packet drop implemented using iptables, nftables, and BPF. The yellow series is a standard iptables rule, which drops packets matching a particular pattern. The dark blue is doing the same using nftables; nftables is the successor of iptables and is actually pretty close to BPF in terms of extensibility. It's also a bytecode virtual machine that you can program. The difference between BPF and nftables is that BPF is general purpose, you can use it for non-networking things, while nftables is entirely domain-specific to networking, so the language actually knows about IP addresses, it knows about networking, it knows about packets and so on. BPF is like Java, a general-purpose virtual machine. The gray series is BPF. So it's not the concept of programmability that makes the big performance difference here; it's where we can attach and run these programs. That's the revolution that's going on: we can program the kernel at various points, in this case very close to the network hardware. And because BPF is general purpose, we can even offload it, in this case onto SmartNICs, in which case we see another performance win. What's interesting to me as a pure software developer is that we can gain huge performance wins with software only, without depending on any hardware-specific features.

There are many, many more examples, of course. I listed some here: the Kinvolk folks did an amazing Weave Scope plugin using BPF, which I can highly recommend. There's also gobpf, which gives you Go bindings for BCC.

All right, so what does eBPF mean in the context of Cilium? We'll look at two different use cases. The first one is how Cilium provides networking in the context of Kubernetes or similar orchestration systems. Cilium itself is a Go-based agent that runs on your worker nodes, could be Kubernetes, Docker, Mesos, whatever, and takes high-level intent, for example a CNI plug-in request: hey, provide the networking for this pod; or: hey, implement this security policy; or: hey, do load balancing for this particular service IP. It takes that notification and generates a BPF program that implements it. So think of Cilium as the thing that takes the BPF complexity away and writes the program on your behalf: instead of you manually programming BPF, Cilium automatically generates the program. We actually generate unique programs for particular containers. Instead of having one program that is executed on behalf of all containers, we look at the particular needs of a container and generate a program for that particular container or pod, which means we can leave out certain implementation details. For example, if a container does not need IPv6, we leave that code out; if a container does not need policy, we leave that code out; if a container does not need load balancing, we leave it out. This allows us to minimize the code to the minimal amount required to implement the functionality.
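As an illustration of that per-pod specialization, a generated datapath can gate whole features behind compile-time defines, so unused features are not merely skipped at runtime but never compiled in at all. This is a hedged, made-up sketch of the pattern, not Cilium's actual source:

```c
/* Hedged sketch of per-pod feature elision (names are made up). The agent
 * would emit the defines for just the features this pod needs, then
 * recompile the template with clang/LLVM into BPF bytecode. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define ENABLE_IPV4 1
/* ENABLE_IPV6 left undefined: all IPv6 handling is compiled out. */

SEC("tc")
int from_container(struct __sk_buff *skb)
{
    __u16 proto = bpf_ntohs(skb->protocol);

#ifdef ENABLE_IPV4
    if (proto == ETH_P_IP)
        return TC_ACT_OK;              /* continue through the stack */
#endif
#ifdef ENABLE_IPV6
    if (proto == ETH_P_IPV6)
        return TC_ACT_OK;
#endif
    return TC_ACT_SHOT;                /* drop anything we don't handle */
}

char _license[] SEC("license") = "GPL";
```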
This is a very basic picture. Who knows about Kubernetes, and who has never heard of Kubernetes at all? I think the talk before this one was about Kubernetes as well, but I'm not sure who is actually familiar with it; about half. So, CNI is what Kubernetes uses to make networking modular. Whenever a container or a pod is started, CNI is invoked and requests a plug-in to provide the networking. What Cilium does in this case is generate the BPF program and attach it to that pod. So whenever that pod sends or receives network traffic, the BPF program is in the path, and Cilium, or rather the BPF program, makes sure that all the pods can talk to each other.

We have two ways of doing this. We have a so-called encapsulation mode; for those familiar with other plugins, this is similar to, for example, Flannel, where we use VXLAN or Geneve. The benefit here is that there's no dependency on how the network underneath operates: as long as nodes have IP and UDP connectivity, this will just work. The second mode is the direct routing mode, where you have the ability to make your network fabric aware of pod IPs or container IPs. This can be achieved by using routing daemons or by running something like kube-router. We support both modes; it's up to you which one you run. Often people start out with mode one because it's simpler; as deployments get bigger, they go to direct routing mode because it has some performance benefits.

We also support BPF-based service load balancing. Typically the standard solution is what is on the right, which is kube-proxy; kube-proxy by now has a non-iptables mode as well, using IPVS, but this is a slightly older slide which is still focusing on the iptables approach. iptables is basically a linear list of rules that needs to be traversed every time a load-balancing decision has to be made; once the decision is made, it is cached in the connection-tracking table. But that first lookup is incredibly slow, and it gets slower the more services you add, because you're adding more and more rules to the list. That's simply because iptables is 20 years old; at that time, scale did not matter that much. Just like IPVS, we use a hash-table-based mechanism in BPF, which gives roughly constant-time lookups instead of a linear search.
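A hedged sketch of that hash-table idea, with a made-up map layout rather than Cilium's actual one: the service lookup becomes a single keyed map access instead of a walk over one rule per service.

```c
/* Hedged sketch of BPF-based service load balancing (illustrative map
 * layout, not Cilium's datapath). One hash lookup replaces a linear
 * walk over per-service iptables rules. */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct svc_key {
    __u32 vip;     /* service (cluster) IP */
    __u16 port;    /* service port */
    __u16 pad;
};

struct svc_backend {
    __u32 ip;      /* selected backend pod IP */
    __u16 port;    /* backend port */
    __u16 pad;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct svc_key);
    __type(value, struct svc_backend);
} services SEC(".maps");

/* O(1) no matter how many services exist. */
static __always_inline struct svc_backend *
lookup_service(__u32 vip, __u16 port)
{
    struct svc_key key = { .vip = vip, .port = port };
    return bpf_map_lookup_elem(&services, &key);
}

SEC("tc")
int lb_demo(struct __sk_buff *skb)
{
    /* In a real datapath the key comes from the parsed packet headers;
     * a fixed key keeps the sketch short (10.0.0.1:80). */
    struct svc_backend *be = lookup_service(0x0a000001, 80);

    return be ? TC_ACT_OK : TC_ACT_SHOT;
}

char _license[] SEC("license") = "GPL";
```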
This is a summary of the networking features. We have native IPv6 support; we actually wrote IPv6 first and added IPv4 later. We have a very simple, flat, layer-3-only model; I think that's becoming pretty standard in the container world. There's no concept of networks or subnets: it's basically one big flat L3 network, and every pod can potentially talk to every other pod; you then lock that down with security rules. So you don't need to think about how to manage your addressing space; you can start small, grow big, and it will continue working. We have a tiny ARP responder that allows resolving the MAC address of the default gateway, but that's it; other than that, it's all L3. We do efficient load balancing, as we saw on the previous slide. We have NAT46 support, which is the ability to translate between IPv4 and IPv6, which is actually pretty fancy: it allows you to run an IPv6-only cluster but still reach out to IPv4 nodes, and you can make your IPv6-only containers reachable over IPv4, which I think is pretty cool. A couple of forward-looking users are already using this, and I think it is actually going to be the future: people don't want to manage IPv4 address space in large clusters; they will want native IPv6 support. Then we have connection tracking that is optimized for container workloads. The general-purpose Linux connection tracking is built for middleboxes and has to handle everything; in the world of containers we can make a lot of simplifying assumptions, which we have done here.

Let me check on time. All right. So that's the networking side of things, the CNI world that gives you connectivity between pods. Not that super fancy. I think some people use us because we're a little bit faster, a little more efficient; some people just hate iptables and come to BPF because of that. But where it gets really fancy and interesting is the network security side. I'm specifically saying network security because we're not doing image scanning, and we're not doing system call filtering; we're doing segmentation on the network security side.

For this I would like to bring up one example. It's probably not a good idea for a kernel-level person to be talking about microservices, but I'm trying. This is a basic example where we have a service that exposes an API; call it the jobs API. It has three simple API endpoints.
First, we can GET the jobs. Second, we can POST an applicant: if an applicant has applied to a job, we can post that and add it. Third, we can GET the applicants' data for a particular job. Then we have two different front ends in front of that. One is the applicant front end: that's the front end exposed to the outside, and whoever is applying for jobs uses it; it will call POST /applicants when an applicant submits her CV, for example. The second is the recruiter front end, which obviously needs to retrieve the applicants' data as well. This is a good example of a single API that's consumed by multiple services with multiple levels of sensitivity. If the applicant front end is compromised, retrieving all applicants' data is obviously the worst-case scenario that we want to avoid at all cost.

If you look at the traditional network security world and how a firewall would have handled this, we would have done something like: allow the applicant front end to talk to the jobs API. Whether that's label-based or IP-based doesn't really matter; on a pure networking level, that's what we would have done, and we could have locked it down to port 443, for example. This obviously allows the legitimate communication to happen, but if the applicant front end gets compromised, it can still make the GET-applicants API call and retrieve that information. That's definitely not least privilege in the sense of API awareness, in the age of microservices.

So what we really want is something like this, and this is what Cilium provides, what we call API-aware security: we can actually look into the API calls that are being made between services. Right now we support HTTP, gRPC, and Kafka; we just added Cassandra and memcached, and we're going to talk about our new fancy Go-based extension framework for Envoy at the next KubeCon, which makes it possible to add more protocols very rapidly. So we want to write a security rule like: allow GET /jobs and POST /applicants from the applicant front end. This is the level of network security that we need in the age of microservices, and this is what Cilium adds that is unique on this front.

This slide shows the different enforcement points where we can enforce security rules. In this case you have two containers or two pods. We can obviously do security between the pods; that's the standard stuff. The applicant pod or service can talk to the jobs API, and we can restrict that down to certain API calls. We can also enforce on communication into the cluster: for example, we can say that my pod foo can only be accessed by a certain subnet range. That's typical if certain services are only exposed to a VPN, say; then you want to limit access to the IP range of that VPN. And we can do security outside of the cluster, for example if you're talking to an API hosted outside your cluster; for this we actually support DNS-based policies. Instead of whitelisting IP addresses, you embed the DNS name into the policy, and we automatically resolve it and update the policy as the IPs for that DNS name change.

The last bit about security is that we are identity-based. I think that's another big move: instead of traditionally translating policy rules into IP addresses and then enforcing which IP addresses may talk to each other, what we do instead is look at all replicas of a particular pod and assign a security identity to that pod. We do that for every single pod in a Kubernetes cluster, or every container in a Docker Swarm cluster. Then, instead of filtering on IP addresses, we embed this identity into all packets that leave a particular pod, so we can enforce on the identity when the receiving pod receives the packet. This is very similar to mutual TLS or TLS-based enforcement; the main difference is that we can do it at the packet level, which means it also works for UDP, it works for ICMP, it works for all network traffic. Mutual TLS or HTTP-based identity mechanisms usually only work for TCP, for example.
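Here is a hedged sketch of that mechanism with a made-up map layout, not Cilium's actual datapath: the sender's numeric identity travels with the packet, and the receiving side makes the allow/deny decision with a single map lookup keyed on identity rather than on IP address.

```c
/* Hedged sketch of identity-based enforcement (illustrative layout, not
 * Cilium's datapath). Every pod gets a numeric security identity; the
 * sender embeds it in the packet, and the receiver checks the tuple
 * (identity, dport, proto) against this pod's policy map. */
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct policy_key {
    __u32 identity;  /* numeric identity of the sending pod */
    __u16 dport;     /* destination port */
    __u8  proto;     /* IPPROTO_TCP, IPPROTO_UDP, IPPROTO_ICMP, ... */
    __u8  pad;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 16384);
    __type(key, struct policy_key);
    __type(value, __u8);             /* presence of an entry means "allow" */
} policy SEC(".maps");

static __always_inline int ingress_allowed(__u32 identity, __u16 dport,
                                           __u8 proto)
{
    struct policy_key key = {
        .identity = identity, .dport = dport, .proto = proto,
    };
    return bpf_map_lookup_elem(&policy, &key) != NULL;
}

SEC("tc")
int ingress_demo(struct __sk_buff *skb)
{
    /* Real code would parse the identity out of the encapsulation header
     * or packet mark; skb->mark stands in for it in this sketch. */
    if (ingress_allowed(skb->mark, 80, IPPROTO_TCP))
        return TC_ACT_OK;
    return TC_ACT_SHOT;
}

char _license[] SEC("license") = "GPL";
```

Because the check is per packet rather than per TLS session, the same mechanism covers TCP, UDP, and ICMP alike, which is the point made above.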
All right, before we go into the service mesh integration, I want to do a quick demo to actually show this. So I have a small Kubernetes cluster here; you can see it's Star Wars themed. Let me just make sure I have Wi-Fi. Maybe not. All right, the cluster demo will be slow; I think the internet connection is a bit slow. So let's do the intro.

A long time ago, in a container cluster far, far away... It is a period of civil war. The Empire has adopted microservices and continuous delivery. Rebel spaceships, striking from a hidden cluster, have won their first victory against the evil Galactic Empire. During the battle, rebel spies managed to steal the Swagger API specification of the Empire's ultimate weapon, the Death Star. All right, that's the intro to our demo.

What I have here is a couple of deployments that I will get started. So let's deploy that; k is just my alias for kubectl, and you're not supposed to say "kube-cuddle" anymore. Let's see if that's coming up. I have a Death Star deployment with three replicas, I have a couple of spaceships which represent the Empire, and I have a couple of X-wings which are the rebels. Let's see if everything is up. Okay.

Next, I have a small script that basically generates kubectl exec command lines for me, which I'm going to copy-paste and run. What this does is execute curl inside one of the X-wing pods and talk to the Death Star service. So let's do that; this is basically the rebels spying on the Death Star. Let's see what the Death Star responds with. It responds with some JSON: hey, I'm a Death Star, I have some attributes, and by the way, this is my entire API that I expose. You can do a GET to /v1, you can retrieve my health (this would be the Kubernetes health check), you can request a landing (which is probably useful), you can put something into the cargo bay, you can get the status of the hyper-matter reactor, or you can put something into the exhaust port. That's pretty interesting, so let's do that.

Before the rebels come back, though, the Empire starts thinking: shouldn't we protect the Death Star a bit? So they enforce a so-called layer 7 policy. Let's look at that.
So this is just standard Kubernetes YAML; it's a CRD, a custom resource definition, in this case. It says that this is a policy that applies to all pods with the label deathstar in the organization empire. It's an ingress rule, so it's for traffic going into the Death Star: any pod with the label spaceship can talk to me on port 80, and it may do a GET to /v1 or request a landing. Those are the public API endpoints that I want to allow. So let's load that policy. All right, it's loaded now.

So the rebels come back. First I do a GET to /v1; that still works. Now let's see if I can turn this into a PUT. Exhaust port: access denied. The rebels have lost, you can see it here. I'm sorry, I changed the story.

Well, no, no, no, we didn't lose. What you'll notice is that while the Empire was constructing the Death Star, the rebels actually managed to infiltrate it. They managed to load a different policy than the one we saw. This is the policy that I showed you, and this is the policy that they actually loaded. Let's look at the difference. You can see there is one additional rule in there: you can actually do a PUT to the exhaust port, but only if the HTTP header X-Has-Force is set. So let's try that; I'm going to add the header: -H 'X-Has-Force: true'. Death Star exploded. We're good again.

So that was the demo, and I think I'm running out of time anyway. I think we have maybe one or two minutes for a question. Or wasn't this supposed to be half an hour? Okay, maybe one question. Otherwise, I'll be back in the hallway. Before I forget: I have stickers of the logo that you saw in the beginning, our special iptables edition. Come up to me and I'm glad to hand you stickers. Yeah, one question.

[Audience] We've been doing a lot of research into Istio and everything lately, and this seems to have a ton of overlap with Istio. How do you address (a) that, and (b) Istio out of the box allows for best practices for microservices; can you integrate things like exponential backoff and circuit breakers in this as well? Would this be instead of Istio?

No. I'll give you a super quick answer, because that would have been the next topic we dived into, and I can give you the link to a presentation from the last KubeCon that talks exactly about this. The 30-second summary is: Cilium does not compete with Istio at all. We actually have an Istio integration where, for example, you can run Istio, have Pilot manage Envoy, and we enforce the same rules that you saw here in the sidecar. We can also accelerate Istio; I have numbers and slides. So the short answer is: there's no way we'd compete with Istio; we're actually the best match. If you want to run Istio, you should talk to us about running Cilium underneath.

Cool, right, we're running out of time. Thanks a lot. I'm outside if you have more questions.