Hello, everybody. I'm Marcelo, software engineer at Red Hat, and I'm here today to talk about Open vSwitch and conntrack hardware offload. Why do we need it? How does it work under the hood, and how does the user perceive it?

So, first let's start by reviewing what conntrack is. Conntrack is part of the Netfilter project. It is a building block for functionality such as NAT and stateful firewalling. It is essentially what keeps track of connections: a table keyed on a 6-tuple of L2 and L3 protocol information, source and destination IP and port, and it also stores extra information such as state, timeout, statistics, mark, label, and a couple of others. An interesting piece of it is that it is reusable by other projects, such as OVS.

Then we also have this thing called flowtables. What are flowtables? Flowtables were introduced in late 2017 as a software flow offload, or fast path. A flowtable bypasses the traditional forwarding path of the stack, because the stack already computed the steps that are needed when it saw the first packets of the connection, and it was decided to offload this connection. It consists of seven table keys: the same six as conntrack, plus the input interface.

And why conntrack offload? Why do we need it, one may wonder? That is because OVS is being used for way more than L2 switching or stateless forwarding these days. With the hardware offload that we had for OVS so far, we could match on packet headers and do some actions on the packets, like encapsulate, decapsulate, push VLAN tags, pop VLAN tags. We could even do stateless NAT, but we couldn't do stateful stuff. If you required stateful things, you couldn't offload so far. And that's required by OpenStack security groups, which consist of stateful ACLs, and it is also heavily used by OVN, which relies on NAT to work. So, yes, without conntrack offload, we would limit where the OVS hardware offload that we had so far could be used. Conntrack offload now makes the solution whole.
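As a quick illustration of the table just described, the conntrack entries (tuple, state, timeout, mark, and so on) can be inspected from userspace; a small sketch, assuming the conntrack-tools package is available:

```shell
# List current conntrack entries (tuple, protocol, state, timeout, mark, ...)
conntrack -L
# The same information is exposed via procfs
cat /proc/net/nf_conntrack
```

Both commands need root privileges and a kernel with conntrack loaded.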
And this is the result of a big community collaboration. It was a really long-term project, years long to be a bit more precise. It was a collaboration between multiple parties and vendors, including weekly meetings with participation from Red Hat and Nvidia, which was formerly known as Mellanox back then, plus Netronome, Intel, and Broadcom. Special thanks goes to all the folks who helped one way or another, contributing with comments or with code and tests. It has been available since kernel 5.7; it is there, it is upstream, we can use it. It is also available on RHEL 8.3 as a tech preview. And it is only supported by the Mellanox mlx5 driver so far; that's the only driver that supports it.

So, conntrack and flowtables. In the center of the diagram we have the traditional Netfilter path: the ingress hook, the prerouting chains, then the routing decision, the forward chain, postrouting, and okay, we need to output that packet, so it gets forwarded. The flowtable attaches at the ingress hook and consists of a special table of flows that should be handled differently: it triggers the fast-path bypass at the bottom. It will essentially skip over all those blocks on top if the decision was made to offload the connection to the flowtable. And that is purely a software offload so far here, okay? That's the position where the flowtable usually sits.

And how can we use these flowtables? Here is an example with the nft tool. We create the table, create the flowtable f, and attach it to the interfaces eth0 and eth1. Then, on chain y, we specifically say that packets matching these conditions will get offloaded to the flowtable f, okay? Once we do that, and packets are flowing, matching, and getting offloaded to that flowtable, we can check /proc/net/nf_conntrack and we'll see the [OFFLOAD] mark on the entry.
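A minimal sketch of the nft setup just described; the table, chain, and interface names (x, y, eth0, eth1) are illustrative assumptions, not the exact slides:

```shell
# Create a table and a flowtable f attached to eth0 and eth1
nft add table inet x
nft add flowtable inet x f '{ hook ingress priority 0; devices = { eth0, eth1 }; }'
# On a forward chain y, offload matching TCP/UDP packets into flowtable f
nft add chain inet x y '{ type filter hook forward priority 0; }'
nft add rule inet x y meta l4proto { tcp, udp } flow add @f
# Offloaded connections then show the [OFFLOAD] flag in conntrack
grep OFFLOAD /proc/net/nf_conntrack
```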
So that means that the packets belonging to this connection will hit that fast path.

Then, conntrack and TC. TC had no idea about conntrack so far; it never had. The support now consists of two parts, a match and an action, because when a packet gets into the system it doesn't have any conntrack information attached yet. TC hooks happen before the Netfilter hooks, so we would never have conntrack information attached to the packet unless something even before TC does it first, which could be, say, a BPF hook somewhere, but we don't rely on that. And it has to be these two parts because attaching conntrack information to a packet actually changes it. So we always need to somehow match the packets that we are interested in, send them to conntrack, attach that information to the packet, and only then can we match on the things that only conntrack knows about, like the connection state: whether it's established, whether it's part of a new connection, whether it has mark x, or something like that.

The matching part is rather easy. It is integrated into the flower classifier, which is the vehicle for OVS hardware offload. Support for conntrack was added to the flow dissector, so the flow dissector can now go to the skb, pull the bits that we are interested in from it, and append them to the key. Then flower will just compare them. For flower it's just extra bits to be compared, once the conntrack information is available, right? So that's just the matching side.

Now let's look at the action part. It had to implement some extra code because the flowtable, as we were seeing in the nft example, doesn't just exist in the system: something needs to instantiate it. And what does that now, for conntrack offload, is the TC action called CT. Yes, traffic control and connection tracking, so TC and CT. It also has to interface between the TC world and conntrack.
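A hedged sketch of how the two parts described above fit together in tc; the device name and chain numbers are assumptions:

```shell
tc qdisc add dev eth0 ingress
# Part 1, the action: send IP packets through conntrack (zone 1 here),
# which attaches conntrack info to the packet, then continue in chain 1
tc filter add dev eth0 ingress chain 0 proto ip flower \
    action ct zone 1 pipe action goto chain 1
# Part 2, the match: flower can now match on the attached conntrack state
tc filter add dev eth0 ingress chain 1 proto ip flower \
    ct_state +trk+est action mirred egress redirect dev eth1
```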
So it has to make both worlds talk to each other. Also, TC so far didn't deal with IP fragments. If we have large UDP packets flowing as fragments, we need to defragment them so we can actually match on the UDP headers, because they are only present in the first fragment. And if we need to drop, we need to drop the whole thing, so we need to defragment. The CT action also controls when an entry gets offloaded: currently it will only offload entries that are in established state, so that the software always keeps control over how the connection is getting set up, that the right flags are being seen, and that kind of stuff. That policy is left to software. The aging, however, is done by the flowtable, not by the CT action.

But how will the driver know when the flowtable wants to expire an entry? When the CT action gets offloaded to a specific interface, the driver receives a pointer to the flowtable that was created for the zone being offloaded. With that flowtable pointer it can register a callback, and the flowtable will then issue commands on it: we need to offload this entry, we need to remove this entry, we need to fetch stats for this entry.

And now we have the same diagram as before, but with the TC ingress hook on the left of it, to give an idea of where it is attached. That view is simplified; now let's see it in a bit more detail. When a packet comes into the system, it will hit the TC flower classifier, most likely, if you are using OVS. At this point matching on conntrack is not possible yet; you can only match the packet based on something that is not conntrack related. Flower then sends it to the TC action CT, which triggers the following pattern: it checks whether the packet already has some conntrack information on it; if not, it checks the flowtable for the zone in question; and if that misses too, it sends the packet to conntrack by calling nf_conntrack_in.
In that step, conntrack will create an entry if the connection isn't known yet, and the CT action will attach it to the packet. And if the entry is now somehow in established state, the CT action will also ask for it to be offloaded into the flowtable. The next step is for TC to either reclassify or jump to a new chain, which then needs to hit the flower classifier, which can interface with the flow dissector and match on conntrack information. So now we can, for example, match packets belonging to connections that are already established, packets belonging to new connections, or packets whose entry has label y. On those matches we take the wanted actions, and at the end there will probably be a TC action like mirred to put the packet back somewhere, okay?

It's very important to highlight that this conntrack call, shown here in red, can only be performed in software. If you use skip_sw on the TC rule here, this block will never get called, which means that new entries will never get created. That means the flowtable will always be empty, and then the hardware will never have a connection entry to match.

So, conntrack, flowtable and TC: these three things together. Factors that made this complex: the flowtable is not exclusively used by TC; nftables, as we saw, also uses it. The flowtable didn't care about hardware so far; it was built to be a software fast path. It had no knowledge of hardware, and that connection had to be built. And all the management decisions, as I was saying, need to be left to software, like aging, timeouts, and changing states from new to established. The result of that is that the flowtable now has a representation in the drivers, and nftables flows can now also be hardware offloaded. It's amazing when a collaboration goes that long and we have two projects benefiting from a common effort. Strictly speaking, what is actually offloaded is the flowtable, not conntrack, but for easy understanding we call it conntrack offload.
TC and flowtables. There is one flowtable for each zone. Whenever you instantiate a CT action for a specific zone, and it needs to be exactly one zone, it checks whether there is already a flowtable for that zone; if not, it instantiates one, and if there is, it shares it. And act_ct is the CT action that we are seeing here, yes, but it is not the database itself; that is conntrack, and that is the flowtable. The CT action is the piece that makes all the other parts talk to each other. It adds entries to the flowtable, which will then in sequence offload them to the NIC drivers. This is asynchronous, because here we are talking about datapath packets: they are passing, they should be handled as fast as possible, and if we pause there a little bit, other packets may get stuck. So the offloading action is asynchronous, because we generate the flowtable entry out of a datapath packet. And this means, again, that conntrack offload is incompatible with skip_sw, okay?

So, conntrack, flowtable and TC, a bit more. The flowtable holds, by definition, a subset of the conntrack entries. It is always like this: the CT action notices that something went into established state and requests that it be offloaded. So the conntrack entries are likely to be what the flowtable has, plus entries that are not in established state so far, plus other entries that are not related to this flowtable. A conntrack entry can belong to one flowtable, and one only. As I was saying before, the CT action datapath checks, in order: whether the packet already has conntrack information attached; if not, it consults the flowtable; and if that misses, it relies on conntrack to either find an entry in its own table or create a new entry for us. The first and the second steps can be performed in hardware, but the third one can't. And the flower classifier can only work with information that is already attached to the packet, so flower can't interface with flowtables.
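To illustrate the one-flowtable-per-zone rule described above: two CT actions referencing the same zone share a single flowtable, even across devices. A sketch with assumed device names and zone number:

```shell
# Both rules reference zone 5, so act_ct instantiates one flowtable and
# shares it; offload callbacks will later go to both devices' drivers
tc filter add dev eth0 ingress proto ip flower action ct zone 5
tc filter add dev eth1 ingress proto ip flower action ct zone 5
```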
Flower doesn't know about the flowtable. Then, TC drivers and the flowtable. The flowtable has access to the drivers via infrastructure that was already present in TC, which is the indirect flow block. As I was saying, the driver registers a callback using flow_indr_dev_register. With that callback, the flowtable can generate events and send instructions to the drivers, telling them that they should offload or remove some entry. And it's a one-way communication: drivers today can't generate events towards the flowtable.

One common question is: can a flowtable be shared by multiple NICs? Can I reply to that with another question? Why? Why would someone want to do that? But yes, it can be done, because the CT action shares the flowtables for specific zones. So if you have it offloaded to three different cards and they are referencing the same zone, it's the same flowtable that will be attached to them. That means that when the flowtable wants to offload something, it will send callbacks to the three cards. In essence, that's just a way of sharing resources, and it's hard to see any case where that is useful. So our recommendation would be to keep zones restricted to a single NIC and avoid this sharing.

Now an example of how we can see conntrack offload working when using OVS. We have this simple bridge with two virtual function representors attached to it. The way to configure it is just as we had to do for OVS hardware offload in general, without conntrack: it's the same thing. You enable the hw-offload knob in OVS and that's pretty much it. Then a set of flows to interact with conntrack, in this case really simple: the first one sends all IP packets to conntrack; the second one just forwards packets that are not IP; and then we make a distinction between new and established connections, just to show it. And then we check the datapath with dpctl/dump-flows.
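The OVS side of this example can be sketched roughly like this; the bridge name and the exact flow layout are assumptions matching the description, not the actual slides:

```shell
# Enable OVS hardware offload, same knob as for stateless offload
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# Simple conntrack ruleset: send IP traffic through conntrack and
# recirculate to table 1, forward everything else as usual
ovs-ofctl add-flow br0 'table=0,priority=10,ip,actions=ct(table=1)'
ovs-ofctl add-flow br0 'table=0,priority=1,actions=normal'
# In table 1, distinguish new vs. established connections
ovs-ofctl add-flow br0 'table=1,ct_state=+trk+new,actions=ct(commit),normal'
ovs-ofctl add-flow br0 'table=1,ct_state=+trk+est,actions=normal'
```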
We can see conntrack actions and matches taking place, but without -m we don't see much difference. If we add -m, now we can see that it is offloading, that it is using the tc datapath. Before this project, this combination of conntrack and the tc datapath wouldn't work; now they can be there together. Then, checking the tc filter dumps, we can see that the rule is in hardware and the CT action is getting called. There will always be both software and hardware hits on it, and probably the hardware ones will always be much bigger. Here we are matching on the new entries, because we are asking for new state and not established state, and as we can see there is no distinction in the statistics: this one was only executed in software, because new connections are always handled in software. Now we check the established state: we have some software hits, but a lot of hardware hits as well. And this is the last step of it, just outputting the packets. And if we check /proc/net/nf_conntrack properly, there is now a flag on the entry telling us that this connection was offloaded to hardware. It is present on the NIC, and it will only hit software again if the packet has some flags on it, like the RST or FIN flags. And that's it.
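The checks walked through above can be reproduced with commands along these lines; the device name is an assumption:

```shell
# Datapath flows; with -m the dp:tc datapath and offload status show up
ovs-appctl dpctl/dump-flows -m
# tc filters: shows the in_hw flag plus separate software/hardware counters
tc -s filter show dev eth0 ingress
# Hardware-offloaded conntrack entries carry the [HW_OFFLOAD] flag
grep HW_OFFLOAD /proc/net/nf_conntrack
```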