Hi, I'm Mike Lowe, the architect for Jetstream2. We've done some unusual things here with the networking, so we'll dive right in.

In your classic topology you've got leaf-spine, with a bunch of links between the switches, the inter-switch links (ISLs). At layer 2 there is no way to signal a broken link between the server and the spine back to the server, so you have to carry redundancy all the way through. The way to get rid of that is to get rid of all the layer 2 and use layer 3, because layer 3 is resilient; it was built to route around damage. Compare the two pictures and the routed one is much cleaner. What may not be obvious right off the bat is that when you drop all the redundant inter-switch links, you also drop your cost. In Jetstream2 that works out to about 1% of the cost of the system, roughly $100,000. Beyond that, I'm not sure we could have built it any other way; there just wasn't enough room in the system as designed to fit all of the ISLs.

Conceptually it's relatively simple. You run a routing daemon and advertise to the switch: hey, I know how to get to this particular IP address, where the IP address is the host's address. Through some magic you can advertise that IPv4 address to an IPv6 destination. And because of the way IPv6 was constructed, there is always an IPv6 link-local address that is automatically configured and always works; everything that runs an IPv6 stack has one. So you have basically zero configuration: advertise the IPv4 addresses to that magic address and you're done. This works great. It's called BGP unnumbered, and it's relatively well known in the networking world. What's unusual is running a routing daemon on your host. That used to be a thing back in the 70s with some of the IBM topologies; before TCP/IP they would run routing daemons and route everywhere. So this is really just a return to the past.

Let's look at a live system, one of the compute nodes. We've got an RFC 1918 address here, and it's a /32. What we've done is build a router with a routing daemon and tell it: if an IP address is directly connected to the box, say on the loopback there, go ahead and advertise it through BGP. And, we'll get to this later, there's another important line here: redistribute any static kernel routes you know about. So that's this host's address, the one ending in 101. Pop over to another host and we find that address has been advertised over BGP across the two redundant links, over IPv6 link-local addresses, one for each interface on the machine. So we build the whole fabric, all the compute nodes together, all advertising their addresses, and it works beautifully.
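To make that concrete, here is a minimal sketch of what the host side of this could look like. It assumes the routing daemon is FRR driven through vtysh; the interface names, AS number, and address are placeholders, not Jetstream2's actual configuration.

```python
#!/usr/bin/env python3
"""Sketch: put a /32 host address on the loopback and peer BGP unnumbered
on each uplink. Assumes FRR with vtysh; all names and numbers are examples."""
import subprocess

HOST_IP = "10.0.2.101/32"     # RFC 1918 host address, advertised as a /32
UPLINKS = ["eth0", "eth1"]    # the two redundant links toward the leaf switches
LOCAL_AS = "65001"            # private AS number used for this sketch

def run(args):
    subprocess.run(args, check=True)

# The /32 on the loopback becomes a "connected" route the daemon can advertise.
run(["ip", "addr", "replace", HOST_IP, "dev", "lo"])

# BGP unnumbered: peer over the automatically configured IPv6 link-local
# address on each interface, and redistribute connected and kernel routes.
frr_cmds = [
    "configure terminal",
    f"router bgp {LOCAL_AS}",
    "no bgp ebgp-requires-policy",   # newer FRR otherwise wants an explicit policy
]
for dev in UPLINKS:
    frr_cmds.append(f"neighbor {dev} interface remote-as external")
frr_cmds += [
    "address-family ipv4 unicast",
    "redistribute connected",        # advertises the /32 on the loopback
    "redistribute kernel",           # advertises static kernel routes (used later)
    "exit-address-family",
]

vtysh = ["vtysh"]
for cmd in frr_cmds:
    vtysh += ["-c", cmd]
run(vtysh)
```

The redistribute-kernel line is the piece we come back to below: any static route added to the kernel gets advertised to the fabric the same way.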
Now we're going to switch gears a bit. Due to a lack of imagination, I think, there's this stated restriction: if you use DVR, distributed virtual routing, they say every compute node has to have a routable IP address, so you consume one from your public pool for every compute node. That's not true, and I wasn't willing to accept it. The topology on the compute node and on the network node looks like this.

It's a bit crazy, but the important parts are these: when you use DVR you have two namespaces, one for the router and one floating IP namespace. When you assign a floating IP, traffic winds its way from the instance through the bridge and through Open vSwitch into the router and then to the floating IP namespace, where the NAT happens, and then it goes back through OVS and winds its way out through your provider bridge. The lack of imagination is not realizing that this works in both directions. All you have to do is make sure that traffic for a floating IP knows how to get back in through that provider bridge. And as I pointed out before, we are publishing any routes the kernel knows about. So on this compute node we add a static route, just an ip route add, pointing at the provider bridge, with the floating IP as the destination and the address that belongs to the floating IP namespace as the next hop. That gets traffic directly out of the compute node, as is normal for DVR, and it also brings the return traffic right back into it, so you don't have to use the network node as your ingress. That gets you full bandwidth from your entire cluster.

One of the tricks to doing this is Neutron service subnets. You can take a subnet belonging to a Neutron network and tag it with its purpose, and IP addresses will only get pulled from that subnet if they match that purpose. So you can say: I want this subnet to hold my floating IPs, and I want the floating IP agent gateways to come from a different subnet. That second subnet doesn't actually have to be routable, although the documentation would lead you to believe otherwise. That's what makes the restriction untrue: you can put non-routable addresses in a special subnet tagged with its purpose, so you don't consume any of your publicly routable IP addresses, which I know is particularly important for people who are not as fortunate with their IP address space.

To make this happen, you just need a simple two-line patch: when a floating IP is associated, add a kernel route for it, and when it goes away, remove it. If we disassociate this floating IP, the route is gone and you can no longer ping the address from the world. Associate a floating IP again and the static route is back in the kernel on the compute node, and on another node we find that address published over BGP and therefore reachable from the world.

So: a two-line patch, don't believe the docs, and use the service subnets, and you can get the full aggregate bandwidth of all your compute nodes to all of your instances. That's it.
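For reference, here is a rough sketch of what that associate/disassociate hook boils down to. The bridge name, the floating IP namespace address, and the function names are assumptions for the example, not the actual Neutron code paths or Jetstream2 values.

```python
"""Sketch of the kernel-route hook for floating IPs, not the real Neutron patch.
Assumed: the provider bridge name, the fip namespace's address on that bridge,
and the hook names; the floating IP is routed back in through the provider bridge."""
import subprocess

PROVIDER_BRIDGE = "br-ex"         # provider bridge on the compute node (assumed name)
FIP_NS_ADDRESS = "203.0.113.10"   # floating IP namespace's address on that bridge (example)

def _route(action, floating_ip):
    # 'replace' is idempotent on associate; 'del' removes the route on disassociate.
    subprocess.run(
        ["ip", "route", action, f"{floating_ip}/32",
         "via", FIP_NS_ADDRESS, "dev", PROVIDER_BRIDGE, "onlink"],
        check=True,
    )

def on_floating_ip_associated(floating_ip):
    # The kernel now has a /32 for the floating IP; "redistribute kernel" in the
    # routing daemon advertises it over BGP, so ingress lands on this compute node.
    _route("replace", floating_ip)

def on_floating_ip_disassociated(floating_ip):
    # Dropping the route also withdraws the BGP advertisement.
    _route("del", floating_ip)

if __name__ == "__main__":
    on_floating_ip_associated("203.0.113.50")   # example floating IP
```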