Yeah, a couple of reminders before that. Don't forget about the leftover tickets that you will be able to pick up from the Red Hat HR person near the Exposition Center. You can pick them up at 14:40, so go there to pick up the leftover tickets. Please don't forget, after this presentation and all the others, to leave feedback. You can see the URL on the whiteboard. Please complete the feedback so the presenters know how well they did. Some of them will get some prizes at the end. And may I introduce Florian with a presentation about a very cool package, which is nftables, the future of iptables.

So welcome, everyone. The agenda of the talk will be roughly this. I will first explain what nftables is and why it's being created. Then we will have a look at some of the underlying features in the kernel that nftables is using, so that we are on the same page with regard to some of the terms. Then we will talk about the limitations that iptables has, and then we will look at how nftables avoids these and solves some of the underlying problems. And then there will be a couple of examples of things that you can do with nftables that you can't do with iptables.

So nftables simply is a new packet classification framework based on lessons learned from iptables. It was first presented publicly at the Netfilter Workshop in Paris in 2008, and then finally merged into the kernel roughly two years ago, in 2014, for 3.13.

So what's being replaced? Most simply speaking, iptables, ip6tables, arptables, and ebtables, the latter being the bridge filtering framework. So all these user space tools go away and they are replaced with a new one. And at the same time, we are also replacing the kernel implementations that do the iptables, ip6tables, and so forth, runtime evaluation of the rule set in the kernel. What we do not replace is the connection tracking engine and the network address translation engine; that stays the same.
And we are also reusing the existing netfilter hooks. We will look at what those are. And the user space packages like conntrack-tools, the conntrack daemon, and ulogd for packet logging also remain unchanged. So we are not replacing everything. It's just basically iptables and ip6tables and so on.

Now the more interesting question is why, because you usually do not replace something that just works, and iptables works pretty well. But it does have some fundamental problems in its architecture that are not resolvable without changing so much that basically a rewrite is the only way to do it.

One problem with iptables is that at first there was just iptables, and it only did IPv4 filtering. And when the IPv6 stack was added, what happened? Basically, iptables was copied from IPv4 netfilter to IPv6 netfilter. And then someone replaced all the IPv4 addresses in the code with IPv6 addresses. And then you had the same code base twice. And the same happened again with bridge and with ARP. And so basically, we have the same code four times in the kernel with slight variations for the different protocol families. And at the same time, we have very similar extensions that basically just do the same thing for different protocols like TCP or SCTP. It's getting out of hand.

And at the same time, user space has all these different tools. So if you have to do bridge filtering, you can't use iptables. You have to use ebtables. And the great thing is that it also has slightly different syntax than iptables, which is greatly confusing. And people that want to use iptables in a programmatic fashion also basically lose on every count, because there is no way to do it race-free and to learn what rules exist. So it's basically a mess, and there's no library that you can use for iptables. There is libiptc, but that's an internal library for iptables and it does not have an exported stable API. So basically, if you use that, you are on your own anyway and there's no support whatsoever.
So what are netfilter hooks? I mentioned that briefly. So basically, if you look at the kernel source code and grep through that (don't worry, there will be not much C code in this presentation, so calm down), you will see a bunch of inline functions or macros called nf_hook that are sprinkled across the kernel and placed in strategic places. And basically all this does is provide other kernel modules a point where they can register functions that will then be called by the kernel for a packet. And these functions then have control over what to do with the packet: drop the packet, move on to the next hook, or queue it to user space, things like that. And all the netfilter features, like iptables, connection tracking and so forth, use these nf hooks. So they register with the kernel, something like: I want to see packets that arrive or that come in, and please call this function when that happens.

There are five of these hooks in the IP stack. Then another five in the IPv6 stack, similar for the bridge. And if you look at the hook names, and you have used iptables before, then you will instantly recognize those names, because they also match the built-in chains that iptables offers. So for instance, iptables filter.

Yes? No, sorry. Sorry. Can you add a second monitor? I don't have a second output device here, sorry. No, sorry. My bad. But fortunately, there's not much missing from the slide, because the only thing that you can't see is basically the last two letters of postrouting. There is nothing else on the right side.

So the basic idea is that you place these hooks in places where you want to have particular properties. So for instance, if you want to do network address translation and redirect packets to the local machine, you have to change the IP address. And if you change that after the routing decision has taken place, then it doesn't do what you want. So you have to do this before the routing happens.
So that's why there's the hook in prerouting, before the routing decision. Then you might want to have different filtering for packets that are being forwarded and for packets that are arriving at the local machine. So that's why you have different hooks for forwarding and for input and so on.

Now for iptables. This is an iptables example rule, and the thing that comes to mind here is an iptables limitation: you can only have one target. That's the thing that comes after the -j. So if you accept, as in this example, but at the same time you would want to log or to mark, you can't do that in one rule, and you have to copy the rule or you need a helper chain or whatever.

Rules are always stored in a binary blob in the kernel. So what iptables actually does is only this: it translates a textual representation into a binary one. It has no idea what it's actually doing. The only thing that user space knows about is how to parse the text and how to translate that into a binary representation. It has no idea what TCP is or what a port is. It only does the translation.

Furthermore, if you run this command to add a rule, then it doesn't just add a rule. What it actually does is first fetch the entire existing rule set from the kernel, this huge array of structures, then it mangles that somehow to add this rule into the blob, and then it commits everything back to the kernel. In other words, when two persons or two programs try to add rules at the same time, then one of the adds will fail and silently disappear because of this race condition, because iptables does not guarantee atomicity for individual rules, only for the switching of tables. And that's something that you can't resolve in iptables, because that's just basically how it works.
The kernel has no idea of what changed in the rule set because it only sees old blob versus new blob, and so it can't really tell you what has changed. That would also be a desirable feature, so that you could, for instance, monitor what rules are being added or removed, but it's not possible to do that. And user space has no idea what the rule set is actually doing, so you can't do optimizations in user space on the rule set unless you go the full way and basically duplicate, in user space, the kernel part that does the evaluation of the rule set. So in user space you can't do things like eliminate dead rules, and you can't detect things like: here I could rearrange these rules to make it match faster. User space doesn't know what the rules do.

At the same time, we have a linear evaluation of the rule set. So in iptables it starts at the top and then one rule after another is evaluated until a final decision is made, like drop or accept. We have four times code duplication, as already mentioned, for all the families that we support. ebtables hasn't seen any extension since it was added; it's basically iptables from 2001, and we still have a reader-writer lock in the main traverser. In other words, it does show up when you profile bridge traffic, and it's really becoming a problem and it has to be fixed some way.

iptables limitations are being worked around by adding various matches. So for instance, because you can't test for two different ports in a single rule with iptables, someone added a multiport match, and then you can give it up to eight ports or nine ports, I'm not sure, in a single rule, and if you need more than that, you already have to add two rules again. There is ipset, which is an external tool that you can use to efficiently match on white or black lists.
So you can use the ipset tool to generate a list of addresses or a map of addresses and then use the iptables set match to directly decide whether it's a match or a non-match, to avoid iptables chains with tens of thousands of rules, which you would have to have otherwise. There's the bpf match and the u32 match to match on packets where iptables doesn't have an existing match. So for instance, if you want to match on DNS packets, that's the only way to do it: you have to use the bpf match or something like that.

We have a lot of duplicate functionality. So for instance, someone wanted to add the ability to match on IPComp packets, and what did they have to do? They had to copy the authentication header match and then change it so that it would work for IPComp. There are some ridiculous matches for special cases. For instance, someone needed to do load balancing for policy routing and they didn't want to add 100 mark rules, so they added this HMARK target, which computes a hash over the packet and then sets the mark based on that hash. That's a special feature that not many people need, and iptables should have offered a generic way to do that without having to add all those special-case features. Then we have magic tables. For instance, the mangle table is special in that, if the mark changes during output, it will do a rerouting, but that's something you just have to know, because the kernel does it behind your back.

So what does nftables do differently? A lot, it turns out. The first difference is that all the protocol details are in user space, not in the kernel. It's exactly the other way around: the kernel has no idea what TCP is. User space knows what TCP is, and user space knows where in a packet the TCP ports are located and things like that. The kernel is just dumb, and it's basically, for lack of a better term, a virtual machine that offers registers and then operations that work with these registers.
So the nftables tool generates code, or instructions, to do what the user has specified, and the kernel will just dumbly follow those instructions without knowing what's actually being tested. So the kernel has no idea that, for instance, the TCP destination port is being tested, or the TCP flags or whatever. It just follows instructions like: look at this register and check if it's this number, and things like that.

The main focus of nftables is efficient rule representation. So you can express a lot with nftables that you can't do with iptables. We will see in examples what exactly that means. So we replace the kernel implementation of the iptables evaluation of the rules, and all the user space tools go away and there's a single program called nft which handles everything. And nft is also, at the same time, iptables-save, iptables-restore, iptables and so forth. So you can give it a file and it will load all the rules in the file. You can start it in interactive mode and then you have a shell-like prompt where you can input rules directly, and you can also add rules on the command line.

From a kernel point of view, there are no matches or targets anymore, because that distinction is something iptables-specific and it doesn't make a whole lot of sense. So the terms match and target have no meaning anymore, and it's just called an expression. And an expression can be terminal, like accept, or it can be non-terminal, like for instance log. So you can have as many actions as you want in the same rule as long as those are not terminal. You can't accept and drop at the same time because, well, that's not possible.

So we basically have two groups. We have some basic expressions like immediate, which is just numbers. We have expressions like payload, which basically just asks the kernel to load some bytes from a particular offset and stuff them into a register.
Then we have expressions that work with those registers, like: is it equal to, is it larger than, less than, not equal. We have bit operations, so you can do a flag test. The counter is also no longer built in, so if you want packets to increment the byte and packet counter, you also have to specify this counter expression on the command line.

There are a bunch of special and more complex expressions, such as conntrack, which is the same as the conntrack match in iptables, so you can check if a packet matches an existing connection or not, or if it's related, and things like that. And there is the meta expression, which basically replaces a whole lot of iptables matches, like mark for instance, or you can use it to fetch the input interface name and test for that. It covers basically everything that is not related to the content of the packet itself but to the metadata, so things associated with the packet like the firewall mark, the input interface, the output interface that is being used, even the ID of the processor that is currently processing this packet.

The user space front end has a new grammar which is somewhat similar to what tcpdump uses. User space parses this grammar and then translates that into the nftables instruction set. Another thing that is very different from iptables is that the user space and the kernel representations of the rules are completely distinct. In iptables, user space and the kernel use the same format, so you can't make changes without breaking everything. And in nftables the kernel has a completely different understanding of what a rule looks like, which greatly helps with optimization.
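To make the split between the nft grammar and the kernel's instruction set concrete, here is roughly what nft prints in netlink debug mode for a rule that accepts TCP destination port 22. This is a sketch, not the speaker's slide: the table and chain names (filter, input) are assumptions, and the exact output differs between nft versions (newer releases may fetch the layer 4 protocol through a meta expression instead of a payload load).

```text
$ nft --debug=netlink add rule ip filter input tcp dport 22 accept
ip filter input
  [ payload load 1b @ network header + 9 => reg 1 ]
  [ cmp eq reg 1 0x00000006 ]
  [ payload load 2b @ transport header + 2 => reg 1 ]
  [ cmp eq reg 1 0x00001600 ]
  [ immediate reg 0 accept ]
```

The first payload/cmp pair is the protocol dependency that nft adds on its own (the IPv4 protocol field at offset 9, compared with 6 for TCP). The second pair loads the two-byte destination port at offset 2 of the transport header and compares it with 22 in network byte order (bytes 0x00 0x16). The immediate instruction then places the accept verdict in register 0.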
We have netlink as an intermediate format, and user space will just serialize the rules in this format, and the kernel will just do some sanity checking on that, just to see that, for instance, you don't have infinite loops in your rule set. But it doesn't do anything other than basic sanity checking, and it will just translate that to its internal representation.

So if you look at this rule there and you fire up nft in debug mode, it will output the instructions that it has generated based on the input. So for instance this here, 22, is this number here, because that's the network byte order representation of the number 22. And we see it has generated a payload instruction to fetch something from the transport header, and it stuffs it into register one, and it does a compare, and if that succeeds it will place the verdict accept into the verdict register zero. And what it also did: it prepended another payload instruction beforehand, because you're asking for TCP port 22, not for instance for UDP or something else. So it automatically generates this dependency instruction to fetch the protocol field from the IP header, load that into a register, and compare it with six, that's TCP. So it will check if it's TCP and then proceed, and otherwise it doesn't match and it will leave here.

Any questions so far? Yes.

I have a question on the last slide before this one. It was about the interface in the meta expression, the input interface. Right now I'm able to match interfaces like eth+, so I can match several interfaces, but unfortunately nowadays the system changes the interface names, so it's almost not possible anymore. Will this implement this feature so I can match multiple different names of interfaces?

So the question was whether you can match multiple interface names with the input interface name meta match, and the answer is yes.
We will see in some examples how that works, because it's not just limited to interface names. Basically you can match a lot of things at the same time with nftables in a single rule. We will see that in a couple of the next slides. The other thing that is different from iptables with regard to interface naming is that there are two types of interface matches: one that actually checks for the name, and another one that internally checks for the interface number. So for instance when systemd or something else renames the interface, it will still continue to match if you use the form which matches on the index, because the index is unique and stays the same all the time. If you use the other form, then it will no longer match.

An interface index is always unique. Yes, yes, yes, exactly. Between reboots it can change, yes, but it does not change while the system is running, which is what this was about. If you... okay, I don't know what systemd does there.

So yes, this is basically the most confusing feature for people that are used to iptables, because in nftables there are no tables. Well, that's not exactly true. There are no built-in tables. So what nftables does is ship, by default, a few template files conveniently named something like ipv4-filter or ipv6-filter. And this here is basically: how do I get the iptables filter table in nftables? What you have to do is say: I want a new table, and the table should be called filter (it could be any name), in the IP stack, not for instance in the bridge or in the IPv6 stack. And this table is basically just a container for chains, and then you create chains here. And this tells nftables that it should tell the kernel where to hook it up. So you want a hook at input with a particular priority, and you want a hook in forward, and you want a hook in output.
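A table like the one being described could be written as follows in nft's file syntax. This is a sketch rather than the speaker's slide; the priority value 0 and the accept policies are assumptions.

```nft
# Rough equivalent of the iptables "filter" table for IPv4.
# Table and chain names are free-form; only the "type ... hook ..." line
# tells the kernel where each chain is attached.
table ip filter {
    chain input {
        type filter hook input priority 0; policy accept;
    }
    chain forward {
        type filter hook forward priority 0; policy accept;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}
```

A file like this can be loaded with `nft -f`, followed by the file name.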
So for instance, if you don't want or don't need output filtering, then you could just remove this chain output, and it would also help a bit with performance, because the kernel would just skip the NF output hook completely instead of redirecting it to an empty chain that doesn't do anything. Same for a mangle table.

There is also something interesting here. There is no hidden magic anymore with tables or chains. You have to specify a particular type if you want magic behavior. So for instance, to replicate the iptables mangle output behavior, doing a rerouting when the packet mark has changed, you have to tell the kernel that it needs to do some extra work when evaluating this chain.

This here is an arbitrary example. So first is the family that you want to hook. The table name can be anything that you want. The chain name can be anything that you want. The filter type means basically just the default: I want to accept or drop and nothing else. Route means I want nftables to re-consult the routing table after it's done. And this here is important: it specifies which one of the five built-in hooks you want your packets to be hijacked at.

This is a family overview. So basically for arptables you would use table arp, for ebtables you would use table bridge, and so forth. There are two new tables, or new families I should say, in nftables, which are inet and netdev, but we will look at those soon.

Other important changes in nftables: finally, rule replacement is atomic. So if several instances try to inject rules at the same time and it succeeds, then the rule will exist in the kernel and that's it, because all rule replacement is transaction-based. There's a unified tool, so it's the same syntax regardless of what family you are using. So if you use bridge, then you don't have to get accustomed to a different syntax as with ebtables versus iptables. It's just the same.
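The chain type and the unified syntax can be sketched in one file. The addresses, the mark value, and the priorities below are assumptions chosen for illustration (-150 mirrors the priority traditionally used by the iptables mangle table).

```nft
# Mangle-like behaviour has to be requested explicitly through the chain type.
table ip mangle {
    chain output {
        # "route": the kernel redoes the route lookup if a rule in this
        # chain changed the packet mark.
        type route hook output priority -150; policy accept;
        ip daddr 192.0.2.10 meta mark set 0x1
    }
}

# Same grammar for a different family: bridge instead of ip.
table bridge filter {
    chain forward {
        type filter hook forward priority 0; policy accept;
        ether type arp accept
    }
}
```

Loading such a file with `nft -f` applies it as one transaction: either everything in it is committed or nothing is.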
The inet family, which is new, should help greatly with hosts that use both IPv4 and IPv6, because you no longer have to duplicate all your rules for iptables and ip6tables. So if you want to accept, say, SSH inbound, you just accept tcp dport 22 in the inet family and it will just work for both IPv4 and IPv6.

Another greatly sought-after feature is the ingress hook in the new netdev family, which allows you to filter packets before they enter the IP or IPv6 stack, so you can actually filter on raw IP sockets. So if you want to, say, drop raw IP frames, then you can do that in the ingress hook, and it's also different in that it allows you to attach rules directly to an interface. The syntax is pretty much exactly the same as earlier. You ask for the netdev family, the table name doesn't matter, the chain name doesn't matter, the hook name is ingress, and you have to specify a device that the rules should be attached to. After that you can use the standard nftables syntax to, for instance, check for a particular ether type, and it will generate the required dependencies to make sure that, for instance, if the device is something other than Ethernet, this will not cause false positives.

It's both, because IPsec packets traverse the rules twice. So you would see it once as the raw IPsec frame, and then at the second location you would see it unencrypted.

Just regarding the families: what happens if you pass IP addresses to that family?

That's quite simple. In the background, if you say ip saddr 1.2.3.4, then it will inject a dependency that guarantees that this rule is only consulted if it's an IP packet. So it will just work. You can use IPv6 or IP addresses in the inet family, no problem.

Now for some of the more interesting features that nftables has. I already mentioned that it integrates sets. What that means is that you can use curly brace syntax like this in a single rule, and then you can give it a list of as many addresses as you want.
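As a sketch of the features from this part of the talk (not the speaker's slides): an inet table with a dual-stack SSH rule and an anonymous set, and a netdev table attached to a device. The device name eth0, the addresses, and the priorities are assumptions.

```nft
table inet filter {
    chain input {
        type filter hook input priority 0; policy accept;

        # Applies to both IPv4 and IPv6. counter and log are non-terminal,
        # so they can precede the terminal accept in the same rule.
        tcp dport 22 counter log accept

        # IPv4-only match inside inet: nft adds the protocol dependency
        # itself. One rule, however many elements the set lists.
        ip saddr { 192.0.2.1, 192.0.2.7, 198.51.100.23 } drop
    }
}

# Ingress filtering, attached directly to one interface.
table netdev filter {
    chain ingress {
        type filter hook ingress device eth0 priority 0; policy accept;
        ether type arp counter
    }
}
```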
And unlike iptables, it will not expand into multiple rules. If you look at what it actually does, it will fetch the source address in this case, push it into a register, and then there's a single lookup instruction that checks whether the address is part of this anonymous set. And based on that it will just match or not match.

Yes. Is it possible to define a range?

Yes. The question was whether it's possible to define a range, and yes, nftables also allows you to match on ranges natively without extensions. Another feature is that you can concatenate keys, so you can match on address and destination port pairs at the same time. So that's what this means: it's a placeholder, and it should expect destination address and port as the content of the set.

Yes. The question was whether you can mix IPv4 and IPv6 in one set, and the answer is no, not yet.

Another feature is named sets, and with named sets you can basically reference the set in the rule set by a name. So you first have to define a name and tell it you want a new set. Then you have to specify what type is going to be used, for instance IPv4 address in this case and port number, and then you can use it with an at sign followed by the name, so @foo here, to reference this set in a rule. So you can use the set multiple times in the same table, and you can also remove and add entries to that set at any time, which should be useful for something like fail2ban.

Yes. The question was whether there is support for nftables in Shorewall, and I have no idea, sorry. Could you please resolve this discussion later, because I'm running out of time and I would like to present all the features. Thank you.

So since it's cumbersome to remember all those type names, you can query the nft tool to tell you what it is. So if you say nft describe and then something like tcp dport, it will tell you inet_service, and the same works for IPv6 or for meta mark and things like that. Next cool features are maps.
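A rough sketch of the set features just described, together with the map and verdict map forms discussed next. All names (example, blocked, services, lan) and all addresses are hypothetical.

```nft
table ip example {
    # Named set: referenced as @blocked; elements can change at runtime.
    set blocked {
        type ipv4_addr
    }

    # Named set with a concatenated key: address . port pairs.
    set services {
        type ipv4_addr . inet_service
        elements = { 192.0.2.10 . 22, 192.0.2.11 . 80 }
    }

    chain lan {
    }

    chain prerouting {
        type nat hook prerouting priority -100; policy accept;
        # Map: the lookup returns the replacement address for dnat.
        dnat to ip daddr map { 192.0.2.10 : 10.0.0.10, 192.0.2.11 : 10.0.0.11 }
    }

    chain input {
        type filter hook input priority 0; policy accept;
        ip saddr @blocked drop
        ip daddr . tcp dport @services accept
        # Native range match, no extension needed.
        tcp dport 1024-2048 accept
        # Verdict map: one lookup decides accept, drop, or jump.
        ip saddr vmap { 192.0.2.50 : accept, 192.0.2.51 : drop, 10.0.0.0/8 : jump lan }
    }
}
```

With this loaded, an entry could be added at runtime with `nft add element ip example blocked { 203.0.113.5 }`, and `nft describe tcp dport` reports the data type to use in a set definition.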
So for instance, if you want to do a lot of NAT transformations, you no longer need a rule for every transformation. You can put all of them in a single one with the map keyword. So this is basically the address that you are expecting in the packet, and that's the address that you want to replace it with. And again there is this lookup instruction that's being used in the background to look up the content of the set, and at the same time it will return something, namely this, into a register, and the NAT instruction will then just grab whatever is in that register and push it into the packet for the replacement operation.

And this is basically the greatest feature and the most important one, because it's the one feature that allows you to represent the rule set very efficiently, and it's also something that you can't do with iptables, because now suddenly we have full control over the control flow of the rule set, with verdict mapping. So you can say: for this IP source address accept, for this one drop, for this one jump to this chain, for that one jump to that chain, all in a single rule. There is no overhead from doing a linear evaluation as with iptables. So basically you can represent everything as a tree, sort of a decision tree.

For monitoring, we now have an equivalent to something like the routing monitor that you have in the iproute package. So if you launch nft in monitor mode, then it will log every rule that is being added. It will log when a rule goes away, and you can just check what's happening on the computer. This is also very useful for tools like firewalld, because now they can learn when someone other than them starts to change rules, and it allows them to discover what's happening outside of their control.

For debugging, iptables has a feature called trace.
You add a rule with the trace keyword, and after that you're basically on your own. You have to look at the kernel ring buffer, because it now starts to fill with crazy output for every packet that is being matched somewhere in your rule set. It gives you line numbers, and then you have to basically look at the iptables-save output and somehow map this output that you're getting to your rule set, and then you have to figure out yourself why this matched or didn't match.

In nftables, that's integrated natively now. So you also add a trace rule, and after that you no longer have to do anything special. You just fire up the trace monitor mode and then it will directly dump every rule that is being matched, and not just a line number. You will also see an ID number, so if there are two packets that are matching at the same time, you can tell them apart, and you know this packet is going through this rule and this policy, and it matched somewhere in this verdict map and there was a jump verdict. That should allow debugging in a much simpler way. It's also internally implemented via netlink, so you could also subscribe to that in a programmatic fashion and get netlink-parsable data to allow, say, a graphical representation of what packets are being matched.

So, last slide, so we should be on time. Future work: the one outstanding thing is a high-level library for third-party applications to integrate with. That will happen, unlike with iptables, where there's no such thing and there will never be such a thing. It's also not yet entirely feature complete. Some of the popular magic extensions from iptables that we don't have at the moment: policy matching for IPsec, the rate estimator, reverse path filtering is missing, hashlimit is still missing but it's being worked on, so that will happen soon. Some targets are missing, for instance TCP MSS mangling, and the CT target is missing.
There is no NFQUEUE and conntrack matching for the bridge family so far, and we also have not yet done any extensive performance testing and optimization work on nftables. That said, nftables itself should be quite fast, because there are no locks anywhere in the evaluation path and there are no default counters anymore, so a lot of the performance penalties that iptables has to pay by design are no longer there anyway. If you want to look at nftables, I highly recommend the nftables wiki, which should have a lot of examples for you to look at. And yeah, any questions?

It was installed in 4K. Is there some command where I can say: nft, pick up these rules and rewrite them?

Yes, so the question was: if you have an iptables rule set, is there something to translate the rules to nftables? And the answer is yes, there is a conversion tool being worked on, and slowly, as we extend nftables to replace a particular feature that is only available in iptables, that's being extended to give you the optimum output for, yeah.

RFC 1918 for the private data source.

So the question was: there is this curly brace syntax to match IP addresses, and is there an alias to, for instance, match on RFC 1918 addresses? The answer is no, but nftables does have a define feature, so you could create a define that expands to this list. Basically we could even add it to the nftables default config file. It would be something to consider.

Yes, this presentation? Yes. Yes. Okay, I'm eating and there's somewhere is Alex. We are for the session chat.
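The define-based workaround from the RFC 1918 answer could look like this. It is a sketch; the variable name rfc1918 and the drop rule around it are assumptions.

```nft
# The three private IPv4 ranges from RFC 1918.
define rfc1918 = { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 }

table ip filter {
    chain input {
        type filter hook input priority 0; policy accept;
        # $rfc1918 expands to the anonymous set defined above.
        ip saddr $rfc1918 drop
    }
}
```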