I'm taking this with me again. Yeah, it's good. Thanks a lot. You're welcome. See you. What is your nickname on the jersey? I'm Boucher. Thanks a lot. Is it possible that I can use power from... What do you want to use? Power. What do you want? Yeah, I'm sorry. What's that? No, it's something. Yeah? Okay, good. The... Okay. The rest of the speakers. And this talk, so if there are any seats in the middle, can you please... Excuse me. Can you move over towards the camera, please? No, no, no. That was last year. That was last year. Okay. This is a kernel subsystem. The kernel subsystem? OK, I'll switch that.

Hey, good morning. This is about switchdev. So first, a little bit of history to set it up. So that's an Ethernet switch, basically. Initially, an Ethernet switch was no more than what was called a bridge: something that forwarded Ethernet frames between basically two wires. You can't just connect two wires together, so you had something called a bridge to stretch the length of an Ethernet run. But of course, that was very limited. You can only connect two lines to a bridge. So multiport bridges started coming in. We're talking about like 34 years ago, by the way. A multiport bridge is formally known as a hub: you get a packet in on one port and it just floods it out to all the other ports. Hubs became very unpopular, very fast, because obviously they tended to just generate a lot of traffic where it didn't need to go.

So in came switches. Where switches formally differ from hubs is that they have an FDB, a forwarding database, with learning and aging of MAC addresses. That means the switch sees a packet come in on one port, an Ethernet packet, a frame, sorry.
It remembers that the source MAC address came in on that port, and therefore, if it ever gets a frame destined for that MAC address, it will send it out of that port. It does that by storing an entry in the FDB: just the MAC address and the port, that's the essential part, the least an FDB needs. Aging is an important part of that. We keep a timer, and if it hasn't seen a packet from... Is it five minutes? Okay. I see, like, five minutes left. Jesus God, what have I done? So if you see a packet come in, okay, you know that the host belongs there on that port, but if it hasn't generated any traffic for a while, you should age the entry out, because maybe that host has moved to a different port, a different segment of the network, and therefore you should start flooding frames for it out of all the ports again.

So then what came in afterwards was VLANs. Very important. It's a way to segment your switches. There are lots of ways to do VLANs: you can do the VLAN tagging codified in 802.1Q, the standard, or you can just do a partitioned switch, where you don't have any extra tags on your Ethernet frames, but basically you configure your switch to be like two different switches, three different switches. And if you want to forward traffic between those partitions, you actually have to connect a wire. So that's the basics of an Ethernet switch, right?

Now, we've been able to do an Ethernet switch, or an emulation of it, you might say, in the kernel for a long time. But that was, of course, a software thing. You use a bunch of NICs, basically. You put four or whatever cards in your server, and you can configure a bridge, a software bridge. It's got an FDB, all implemented in the kernel. And you can forward frames between them, and route them out to IP interfaces, things like that, as needed. But of course there are some limitations there. First off, everything is done by the CPU.
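To make concrete what the CPU has to do for every frame in a software bridge, here is a minimal sketch of the FDB learning, lookup, and aging described above. It is a toy model in Python, not kernel code: the `LearningBridge` class, its method names, and the injectable `clock` are all hypothetical, and the 300-second aging time is an assumption chosen to match the commonly used default.

```python
import time
from typing import Callable, Dict, List, Tuple

AGEING_TIME_S = 300.0  # assumption: five minutes, a commonly used default


class LearningBridge:
    """Toy model of a learning bridge with FDB aging (illustrative only)."""

    def __init__(self, ports: List[str],
                 ageing_time: float = AGEING_TIME_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ports = list(ports)
        self.ageing_time = ageing_time
        self.clock = clock
        # FDB: MAC address -> (port it was last seen on, time it was last seen)
        self.fdb: Dict[str, Tuple[str, float]] = {}

    def age_out(self) -> None:
        """Drop entries whose host has been silent longer than the aging time."""
        now = self.clock()
        stale = [mac for mac, (_, seen) in self.fdb.items()
                 if now - seen > self.ageing_time]
        for mac in stale:
            del self.fdb[mac]

    def handle_frame(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Return the ports the frame is sent out of."""
        self.age_out()
        # Learning: the source MAC is reachable through the ingress port.
        self.fdb[src_mac] = (in_port, self.clock())
        entry = self.fdb.get(dst_mac)
        if entry is not None:
            out_port = entry[0]
            # Known destination: forward to one port (never back where it came from).
            return [] if out_port == in_port else [out_port]
        # Unknown destination: flood to every port except the ingress port.
        return [p for p in self.ports if p != in_port]
```

With three ports, a first frame from an unknown host is flooded, the reply is forwarded to a single port, and once the aging time has passed without traffic the bridge goes back to flooding. A hardware switch does the same lookup in dedicated memory instead of on the CPU.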
So you need the actual general-purpose CPU to do a bunch of comparing of frames, frame MAC addresses, looking them up in the FDB. And it's not super efficient. Switches tend to be a lot more specialized. They typically have specialized memory for implementing an FDB that just makes it faster to do lookups.

Another thing, of course, is that these things tend to be I/O bound. Initially, in early Linux, getting a frame on a NIC meant getting an interrupt on your CPU; the interrupt would then cause the kernel to receive the frame from the NIC. But of course, too many interrupts kind of ruins your performance anyway. That was solved a long time ago. I forget what that's called. What was that system called? NAPI. That's it. That's basically just handling more than one frame per interrupt. But of course, you've still got the same problem: as long as the kernel is doing all this manual work on a general-purpose CPU, it's not going to be super performant.

So then something interesting started happening. Little SOHO routers and switches started coming out, starting with the venerable WRT54G, and, of course, the OpenWrt project came in. Now, these are really kind of very specialized. They have small CPUs, but they have a built-in switch chip. That means they typically have something like five Ethernet ports, and you can configure those switch chips to actually do the switching between them. In other words, you program the switch's FDB so that if it gets a frame on one port, it actually knows: I'll just switch it through, I don't need to send that to the CPU. Of course, that's much more interesting, particularly on those little SOHO routers where the CPU really isn't that strong. You can't really do CPU switching on those.

So the OpenWrt project needed a way to configure those switch chips. The problem, of course, was that you can't just use the vendor software, the vendor-provided software or source code to program those chips, because it's... Sorry.
The problem is that the vendor software tended to be quite hacky. A lot of it was done in user space. That's one thing, but a lot of it was very much a hack: programming a switch chip via some random SPI port or I2C port, nothing standardized about it at all. That was a real problem. What the OpenWrt folks found themselves doing was having to do wildly different things to get the switch chips working on different platforms. So they came up with an abstraction, a switch configuration setup called swconfig. It's basically a kernel framework and some user space tools to abstract out the programming of switch chips. It really got them ahead, because there wasn't anything like that in the mainline kernel at the time. So it was a big win for them. The features they needed to support tended to be kind of wild. I'm not sure what all was in there, but it was just random stuff that's supported by vendor switch chips. Some weird features sometimes. I'm not sure if swconfig really supports them all.

The problem with that was that swconfig was never upstreamed to the mainline kernel. I'm not entirely sure if the OpenWrt project tried to upstream it and it was refused, or if it didn't bother. Maybe somebody here knows, if somebody is on OpenWrt. That didn't show up. Next slide. It was somewhat tried. Hauke says that they tried to upstream it to the mainline kernel, but it was more or less refused because the kernel was starting to get something similar.

That was the DSA subsystem. DSA stands for Distributed Switch Architecture, as I recall. It really comes from a feature present in several Marvell switch chips, the MV88E series chips, which I think were originally from the Intel switch chip division. I think Marvell just bought it. It was initially just for the way the Marvell chips worked, but actually it was pretty generic. You could do lots of other things with it.
Some Broadcom drivers were submitted, for the B53 series and the Starfighter 2, which I have no idea what it is, but apparently it's one of these small switch chips that you find in SOHO routers. I see there's Qualcomm Atheros switch support as well, the 8553, I believe, the 8000 series anyway. There's lots of room there.

But the DSA subsystem, and this is very new, exposes all the ports that it has access to, to the kernel. That means if you have a SOHO router running a mainline kernel with a DSA-supported switch chip in it, and it has, say, five Ethernet ports, you get eth0, eth1, 2, 3, 4. So you get five actual ports. You can put different IP addresses on them and just use it as a router. Or you can bridge them together as you like. If you bridge them together, the magic that happens is that it offloads the actual bridging to the hardware, and that's really important, right? Of course, that's what swconfig also did, but this works in the kernel, sorry, it works in the kernel but towards user space, exposing the ports, and it does the actual switching, if you configure it that way, in hardware. So that's really important. The cool thing is that you configure bridging the way you normally do, with brctl or the bridge tool or whatever. You just do it as usual with your interfaces, as if they were NICs in your system, and behind the scenes the actual switching happens in hardware. That's really important, right?

But the problem with DSA is that it was intended to be MDIO only. I think that's because of the original Marvell thing, I don't know, but it was kind of limited in that way. There are lots of other ways to connect Ethernet switches to your system, as many as there are buses, right? And there are enough buses.

So then the switchdev system came in. Switchdev is a proper kernel system. That means that you can invoke it from any bus.
For example, you can have a switchdev driver that is triggered by a device being present on a PCI bus or PCI Express bus. That makes it, of course, more interesting to, you know, the higher-end type of switch chips that need the kind of bandwidth that PCI Express offers. It's a very generic system, right? It's more generic than DSA. That means that the DSA subsystem was essentially changed in the kernel to be a switchdev driver. So DSA is still there. There are still DSA drivers, but DSA, the framework itself, is essentially a switchdev driver. If you want to do switching support for an MDIO-connected switch in the kernel right now, what you would do is write a DSA driver, right? Because that's kind of the MDIO part of switchdev. You might see it that way right now. It's kind of the easiest way, right?

But switchdev in general is super generic. It's got standard operations to work on your FDB: add entries, age entries, set the aging time, and various other parameters. MDB support as well, so multicast group caching and things like that. VLANs as well, setting up VLANs. Let's switch that.

The big news from last year: it was a while after switchdev got merged before anything but DSA was supported. But at some point last year, Mellanox, a vendor of very high-end switch chips, submitted drivers for their SwitchX-2 and then their Spectrum switch chips. These are super, super high-end, right? The Spectrum has a forwarding rate of 6.4 terabits per second. To give you some idea, that basically means you can have a 32-port switch with every port at 100 gigabit, doing full duplex, on every port, constantly. And this thing can handle it. Right? You can use a switch with the Spectrum chip or the SwitchX-2 chip in it right now, running a mainline kernel, doing 100 gig on every port. 100 gigabit on every port.
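As a quick sanity check on the 6.4 terabit figure, the arithmetic can be sketched like this. The helper name `switching_capacity_tbps` is made up for illustration, and the sketch assumes the headline number simply counts both directions of every port.

```python
def switching_capacity_tbps(ports: int, port_speed_gbps: float,
                            full_duplex: bool = True) -> float:
    """Aggregate capacity in Tbit/s, counting both directions when full duplex."""
    directions = 2 if full_duplex else 1
    return ports * port_speed_gbps * directions / 1000.0


# 32 ports, each at 100 Gbit/s, sending and receiving at the same time.
capacity = switching_capacity_tbps(ports=32, port_speed_gbps=100)
print(f"{capacity} Tbit/s")
```

Under that assumption, 32 ports at 100 gigabit in both directions works out to the 6.4 terabit quoted in the talk; counting one direction only would give half of that.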
Just on a mainline kernel, with a standard driver, using the standard tools: iproute2, bridge, all that stuff. That's magic, right? So what Mellanox did is very much a pioneering thing. I think pretty much everybody in the kernel networking community is hoping that other vendors will follow. It would be nice if Broadcom would follow. It would take a small miracle, but it might happen. In any case, there are a bunch of other vendors that are sort of on the fence about this. You might say that somewhere in the spectrum between Mellanox and Broadcom, there are a lot of other ones. Cavium seems kind of interesting. So we'll have to see what happens. Hopefully some more vendors will come in and write drivers. If you want to look at the switchdev subsystem, look at the kernel source: include/net/switchdev.h. That's the entire API. It's super simple because it's so generic. I encourage you to take a look at it. So that's all I got. Any questions?

6.4, so it was for one node in detail with this number of 6.4 terabits per second. Sorry, what do you mean? 6.4, the number is set because of the rich context. Oh, no, that's just the total switching capacity of the Spectrum chip, the Mellanox Spectrum chip. The 6.4 terabit. I don't work for Mellanox or anything. I don't mean to, you know, but I'm just saying that's a really high-end chip. It has a total switching capacity of 6.4 terabit. Do you have any questions? I don't know. I don't know. That's their marketing literature. I assume it's correct. I'm naive. I don't know. Any other questions? There's an exam after this. I don't have questions. Okay, thank you.

No? Maybe. Or maybe it's just blinking low battery. Yeah, I think it's all right. My sense? Are you kidding me? Are you kidding me? I did work in switching in the past. We were starting to do something. I worked for Intel, so we were starting to do something. I looked at switchdev for the Fulcrum stuff. It's good. So, you're using your own laptop, right?
I have it on my laptop. What do you call it? You speak here, if that helps you what? No, that's a joke. That's a joke. No, this is the old one. They were probably my ThinkPad. They were probably my old bag. Your LinkedIn profile, I was stalking. And I saw you went to DCU, so I kind of assumed. That was my postgrad. I was in UCG before that. You know what they say about assumptions? Make an ass out of you and me. This is nice. You see this laser quick box? It's a Raspberry Pi inside. It's brilliant. You know, it's not picking it up. But it's not picking up my... It's not extending the display. Okay. Well, it worked for me. I don't know why. Maybe we could do it online with the USB key. Yeah, maybe that would be better. Oh, just with the... It's a bit strange, isn't it? I don't know. It was a funny one. Yeah. I say I exported as PDF, so... Exported from... PowerPoint. Okay, that'd be perfect. Okay, some USB key. You know, 53 MB. Yeah.

As people... Please, we can't have anybody in the stairwell, so if there are spare seats to the inside of you, please move in, so that we get as many people into the room as possible. And... If you can do that before he starts, that would be great. Could you switch? There's one particular row. I see five seats in the middle, and that would be really nice if you could lock those up. And there's four in the middle here. So, excuse me. Can you move that way? Thank you. Thank you very much. Next up, Ray Kensler, presenting PLDK, a TCP/IP stack on top of DPDK and USB keys. Can you switch me on? Can I switch you on? I hope you can. I don't think people...