I presented this talk in London and at KubeCon Europe in Valencia earlier this year. This time I will do it a bit differently, and I have brought a buddy with me who is going to talk about some of the pieces they did at Isovalent.

So I will first start with a problem statement, because when you do this, people ask me why we want to do it, so there needs to be a problem statement. The logic is around how we make our network more tuned to the application. We normally do things with encapsulation, we do some kind of networking tricks, but we don't look at the fact that what we're trying to do is actually connect applications.

So I brought just three statements that I liked around some of the problems with applications. For example, Network 2030 from the ITU-T explained that one of the gaps in the existing internet right now is that the network doesn't really connect in a relevant way with the applications. Another one, which is famous, I think it was on Medium: somebody complained about how to make the network get out of the way for an application developer. But the one I really liked came out of SIGCOMM last year, from Amin Vahdat at Google, when he was talking about network and application integration, explaining that most of the time application developers don't really care, should not care, and don't want to learn about the network. They just want to build their applications, and the concept of what the network is, for them, is overly complex for no reason.

I really like this one because in a utopian world we would like this to happen: we would like to have a simple, declarative and developer-friendly way to do application networking, evolving towards a network specification, or a way to expose those networking concepts in a meaningful manner for developers, not in terms of how to build layer 2 or layer 3 constructs. So basically,
what we want is for an application developer or coder to be able to express the specifications; some magic will happen in the network to try and make that happen. And how do we make that developer friendly?

So, a crash course in SRv6 network programming. Some people might have their ears bleeding afterwards, but that is OK. This comes directly out of the IETF. It is already a standard: it's an RFC, RFC 8986, which allows you to create network programs using, basically, policies composed of instructions, or behaviors as we call them. Network programs have a view which is network-wide: you build a network program which is end-to-end from a telecom perspective. But your instructions are locally defined: you build your code, and that code is what executes the instruction. It's built on standard IPv6; it leverages IPv6 as a common denominator and transport. And basically what you do is connect those small instructions together to create that end-to-end program.

It uses a source routing paradigm, meaning that at the source of the network, or the source of your segment routing domain, you actually encode all the places you want to hop through in your network, and that gives you the actual program you wanted to adhere to. This is done either through an extension header, which is also a standard, RFC 8754, or through basic use of the outer IPv6 header. Those instructions are encoded into what we call segments, or segment IDs (SIDs), which are represented in IPv6 address format. There are two ways of doing so. The base one, which was the initial one, uses the 128 bits of the IPv6 address to identify a segment. The newer one, which is compressed, uses chunks of 16 bits out of those 128 bits to create the segments. The latter is friendlier towards ASICs, but both share the same architecture. So basically you can encode a path into either a list of
128-bit segments in an IPv6 extension header; or just the outer IP header if you only want to have one instruction; or, still with only the outer IP header, multiple small instructions, which is the compressed version.

So you're going to ask me: what the heck does this have to do with applications? Basically, you write your instructions. Instructions are packet processing programs. Whatever you want to do when you treat the packet can be done in P4, in C, or in eBPF, for example. You decide which instructions you want your packet to be processed by, and you implement them at multiple points. As I said before, they are locally defined, so I can actually create code that nobody in the world will ever know what it does, but it's my code and it's customized to me. Some instructions are standardized, such as how to do encapsulation of layer 2, or how to decap and do a lookup in an IP table, those kinds of things, and some can be totally user-defined. For example, a network function vendor who wants to create something SRv6 compliant could decide to build a completely custom instruction into their box and their function, and it would still be SRv6 compliant, even though they have not gone to the IETF and standardized it.

So to my logic, if I summarize segment routing v6, I call it the one protocol to rule them all, because once you've done it well, you don't have to go back to any standards body to rebuild it and get acknowledgement. It's all integrated. Once you have created all your segments, you create your SR policy, your end-to-end program, and then you steer traffic into it based on any kind of characteristic you want: logical or physical interfaces, 5-tuples, all those things. That is how you build your network program. So what is interesting is that from a single
policy logic, a single address, I can represent any networking construct: layer 2, layer 3, functions, firewalls. Everything can be identified this way.

So that brings me back to the dreaded telco networking use case. This is how we actually have to deal with networking right now in a Kubernetes world. Most of our applications need to talk to multiple domains, and we don't know how to do that easily in Kubernetes. We create an insane amount of complexity with Multus and multi-CNI to try to do those kinds of things. We still haven't figured out how to connect a Kubernetes cluster to multiple environments, and the reality is that network operators, and even enterprises, have that problem. Before, we didn't really think about this, but I think it came into the spotlight when telcos started to think about using Kubernetes.

So when we engaged with the Isovalent team, what we tried to look at was how we could do something a bit more disruptive and find new ways of addressing the problem. This is where we went and did SRv6 VPN, or layer 3 VPN, in Cilium: how to leverage the constructs of Cilium and its egress policies so that a pod has a single interface and a single default routing table, nothing complex, no forwarding rules, no insanity of static IP routes, and still attaches to the right networking constructs when required; and how to use BGP VPNv4 for the layer 3 VPN advertisements to create those policies dynamically. So from a developer perspective, you just assign a VRF, and the rest becomes really quite simple.

Once you've done this, any pattern is now possible, because I have found a way to hook my application into a network. My network can be stretched across multiple domains and multiple technologies, because I can still use the earlier constructs of the network, like gateways and NAT devices, whatever we want.
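
The two segment formats from the crash course can be made concrete with a small sketch. It is illustrative only: the `2001:db8::/32` block, the 32-bit block length and the function names are assumptions made for the example, not values from the talk or from any particular deployment.

```python
import ipaddress
from typing import List


def full_sid_list(sids: List[str]) -> List[ipaddress.IPv6Address]:
    """Base format: every segment is one full 128-bit IPv6 address."""
    return [ipaddress.IPv6Address(s) for s in sids]


def pack_compressed_sids(block: str, ids: List[int]) -> ipaddress.IPv6Address:
    """Compressed format: 16-bit IDs packed after a block prefix.

    Assumed layout (hypothetical): a 32-bit block followed by up to six
    16-bit segment IDs, all inside a single 128-bit address.
    """
    net = ipaddress.IPv6Network(block)
    if net.prefixlen != 32:
        raise ValueError("this sketch assumes a 32-bit block")
    if len(ids) > 6:
        raise ValueError("at most six 16-bit IDs fit after a 32-bit block")
    value = int(net.network_address)
    for position, sid in enumerate(ids):
        if not 0 <= sid <= 0xFFFF:
            raise ValueError("each compressed ID must fit in 16 bits")
        # The first ID sits right after the block, the next one 16 bits lower.
        shift = 128 - 32 - 16 * (position + 1)
        value |= sid << shift
    return ipaddress.IPv6Address(value)


# Three instructions as three full addresses (48 bytes of segment data)...
base = full_sid_list(["2001:db8:0:1::", "2001:db8:0:2::", "2001:db8:0:3::"])
# ...or as three 16-bit IDs inside one address (16 bytes).
compressed = pack_compressed_sids("2001:db8::/32", [0x100, 0x200, 0x300])
```

With the base format each extra instruction costs another 16 bytes; with the compressed format several instructions share one address, which is the property described as friendlier to ASICs.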
We can also introduce service chains. We can insert functions or different devices, either physical or virtual, inside that policy, meaning my program can now reach beyond the Kubernetes cluster and attach to the network or to the services. So in essence, I'm making my network more programmable by combining the strengths of eBPF and SRv6.

How do I do it? You build a simple instruction using eBPF code, whether it's encapsulation, decapsulation, or any kind of packet processing. You link those instructions to a segment ID, say a service segment ID; basically, you give them an IPv6 address. You build the policies out of those IPv6 addresses, those segments, which creates a segment list. Then you find a way to steer the traffic, in any way or form, into that policy. And voilà, you now have a network program: simple encapsulation supporting overlay and underlay, and the ability to do complex service chaining. In the end, I can have an application doing network functions at the application level while still being tied to a program that also supports my underlay traffic engineering, for example network slicing if you look at 5G transport, and all those kinds of things, including service chaining, in a simple way, using the SRv6 protocol and the flexibility Cilium and eBPF give me.

That's basically the crash course on why we took that approach to support our networking requirements. I will pass the mic to Lewis, who can explain the details of how it's been implemented.

Hi everyone. Yeah, my name is Lewis.
I'm a staff software engineer at Isovalent, and for this part of the talk I'm going to go into detail on how we achieved the POC of a popular SRv6 feature, L3 VPN, which was somewhat explained already.

The first concept that we needed to achieve was pod VRF membership. The idea here, and one of our design goals, was for a pod to be able to belong to multiple VRFs. To achieve this, we would like the Cilium data path to identify the destination of its egress traffic and, depending on that destination, identify which VRF that traffic should be tagged with.

In order to do that, like most things with eBPF, we accomplish this with a series of maps which store state. I'm actually going to explain these maps from the bottom up. The first map that we have is the VRF map. It takes the pod's IP address and the destination CIDR and maps them to a VRF ID. So this gives us the notion of pod VRF membership. Next we have a SID map, which is a simple mapping between SRv6 SIDs and the VRFs which the SIDs act as locators for. And finally we have a policy map. The policy map takes the destination CIDR and the VRF which we've identified the traffic as belonging to, and provides you a SID. This really allows us to encode multiple SIDs for a particular VRF.

Okay, so what does that actually look like on the data path side? The next concept that we have to achieve is traffic encapsulation. The idea here is that you label a pod, and that label determines the VRFs that the pod could potentially be in. When the pod begins egressing traffic, the Cilium data path will identify which VRF the traffic belongs to and then perform a lookup inside the policy map, which I just covered, to identify which SID should actually encap the outgoing traffic. So at the actual data path level, what are we doing?
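
Before the packet walk-through, the three maps just described can be sketched in a few lines of Python. This is an illustration of the lookups only, not the eBPF map definitions: the dictionary layout, the function names and the `2001:db8:...` SIDs are assumptions made for the example, while the pod and destination addresses are borrowed from the demo.

```python
import ipaddress
from typing import Optional

# VRF map: (pod IP, destination CIDR) -> VRF ID
VRF_MAP = {("10.1.0.97", "10.3.0.0/24"): 0}
# SID map: local SRv6 SID -> VRF ID it locates (hypothetical SID)
SID_MAP = {"2001:db8:a::100": 0}
# Policy map: (VRF ID, destination CIDR) -> remote SID (hypothetical SID)
POLICY_MAP = {(0, "10.3.0.0/24"): "2001:db8:b::100"}


def _best_match(dst: str, candidates):
    """Return the value whose CIDR contains dst with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    best_len, best_value = -1, None
    for cidr, value in candidates:
        net = ipaddress.ip_network(cidr)
        if addr in net and net.prefixlen > best_len:
            best_len, best_value = net.prefixlen, value
    return best_value


def lookup_vrf(pod_ip: str, dst: str) -> Optional[int]:
    """Pod VRF membership: which VRF does this pod use for this destination?"""
    return _best_match(
        dst, ((cidr, vrf) for (pod, cidr), vrf in VRF_MAP.items() if pod == pod_ip)
    )


def lookup_sid(vrf_id: int, dst: str) -> Optional[str]:
    """Policy lookup: which SID should encapsulate traffic in this VRF?"""
    return _best_match(
        dst, ((cidr, sid) for (vrf, cidr), sid in POLICY_MAP.items() if vrf == vrf_id)
    )


vrf = lookup_vrf("10.1.0.97", "10.3.0.1")
sid = lookup_sid(vrf, "10.3.0.1") if vrf is not None else None
```

Keying the VRF map on the destination as well as the pod is what lets one pod sit in several VRFs, and keying the policy map on the destination CIDR is what lets one VRF resolve to several SIDs.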
So here I have a packet flow diagram, and I'm going to start at the top left. When a pod sends traffic, we traverse the veth device on egress and we hit the host networking stack. From there, we'll do a FIB lookup and we'll find the native device. On the native device we attach a TC egress eBPF program. If you're not super familiar with the terms TC egress or TC ingress, they're more or less queues which you can hook eBPF programs onto as the packet traverses either direction of the interface.

So let me go through the actual logic flow of what happens when SRv6 traffic is egressing. We'll perform a lookup in the IP cache map. I didn't cover this map much, but it's a mapping of IP addresses that Cilium knows about, and it gives us metadata. For instance, it will tell us whether the destination address is an IP address in our cluster or not. If it's not in our cluster, then we're going to go down the encap path. From there, we do a VRF lookup: we take the source IP address and the destination address and look up the VRF ID. Then we take the VRF ID along with the destination address and do a lookup in the policy map. The policy map provides us the SID, and we encap the traffic: the SRv6 packet, addressed to the SID, becomes the outer packet, and the inner packet is the original pod traffic.

The next step we have to deal with is traffic decapsulation. In this packet flow diagram we first have a native device, eth0, and incoming to the Cilium data path we see that there is a SID. When the Cilium data path determines that this could be a SID, we do a lookup in the SID map and determine whether we know about this SID. Did we learn it from the control plane? Did we allocate it ourselves? If we did, we decap the packet. So again, what does that look like on the data path here?
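
The encap and decap steps described so far can be sketched at the byte level. This is a minimal illustration under stated assumptions, not the POC's eBPF code: it assumes a single SID carried as the outer IPv6 destination address with no segment routing header (one of the options mentioned in the crash course), an IPv4 inner packet, and hypothetical `2001:db8:...` addresses and function names.

```python
import ipaddress
import struct
from typing import Optional, Set

IPPROTO_IPIP = 4      # IPv6 next-header value for an encapsulated IPv4 packet
IPV6_HEADER_LEN = 40  # fixed IPv6 header size in bytes


def encap(inner: bytes, src: str, sid: str, hop_limit: int = 64) -> bytes:
    """Prepend an outer IPv6 header whose destination is the SRv6 SID."""
    header = struct.pack(
        "!IHBB16s16s",
        6 << 28,        # version 6, traffic class 0, flow label 0
        len(inner),     # payload length: the whole inner packet
        IPPROTO_IPIP,   # inner packet is IPv4
        hop_limit,
        ipaddress.IPv6Address(src).packed,
        ipaddress.IPv6Address(sid).packed,
    )
    return header + inner


def decap(packet: bytes, local_sids: Set[ipaddress.IPv6Address]) -> Optional[bytes]:
    """Return the inner packet if the outer destination is a SID we own."""
    if len(packet) < IPV6_HEADER_LEN:
        return None
    first_word, payload_len, next_header, _hop = struct.unpack("!IHBB", packet[:8])
    if first_word >> 28 != 6 or next_header != IPPROTO_IPIP:
        return None
    dst = ipaddress.IPv6Address(packet[24:40])  # outer destination address
    if dst not in local_sids:                   # the SID map lookup
        return None
    return packet[IPV6_HEADER_LEN:IPV6_HEADER_LEN + payload_len]


inner_packet = bytes(20)  # placeholder standing in for the pod's IPv4 packet
outer = encap(inner_packet, "2001:db8:a::1", "2001:db8:b::100")
recovered = decap(outer, {ipaddress.IPv6Address("2001:db8:b::100")})
```

The decap side only strips the outer header when the destination matches a SID it knows about, which mirrors the "did we allocate it ourselves?" check in the SID map.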
We have the eth device hanging off the external network, and packets are now ingressing. We again attach an eBPF program, this time on the TC ingress side, and we perform a SID lookup in the SID map. If the destination is a SID we know, we decap the packet and send the inner packet on to RX. There might be more data path processing, but normally it just goes to the pod.

L3 VPN usually also includes a BGP aspect, and in this case Cilium can act as a BGP speaker and learn about upstream PE VPN networks this way. When Cilium is acting as a BGP speaker, we have a concept called the BGP control plane, and it can peer with upstream PEs. The VPNv4 BGP advertisements hold the data which provides the VPN information: the SID and the prefix. When we learn about this, we map those VPNv4 advertisements into data path map calls, which program the maps we outlined before and set up the encapsulation side.

Likewise for the decapsulation side. Again using our BGP control plane, we're able to understand that a VRF has been created, allocate a SID for this VRF, and then advertise that to upstream PEs. When they learn about it, they'll understand that they can encapsulate traffic in that particular SID and send it to us. We'll decap it and then send it off to a pod.

Okay, so with that, we actually have a demo of the POC. There should be audio involved with this. Okay, well, I'll just talk us through it then, the best I can. We have a topology here; let's describe it starting on the right side.
We have Cilium acting as a PE, and we have pod one, which is in VRF zero. Cilium is peered with PE zero, and it's going to advertise L3 VPN routes for VRF zero. PE zero is an FRR router, and it has two routers hanging off it, CE one and CE two, and they both have overlapping VPN addresses. So the idea here is that pod one can send traffic, it can be encapsulated, and it can reach the VPN 10.3.0.1/24 over VRF zero.

Okay, so we're taking an initial look here at FRR, and this shows the FRR configuration that we have. This is PE zero. The routing table of FRR has two End.DT4 routes. These End.DT4 routes tell FRR what to do when it sees those SIDs. We're interested in the VRF zero routing entry there, and the SID read out as "B two zero zero one hundred": when it sees that SID on ingress, it's going to decapsulate the packet and send it off to VRF zero. That's our initial FRR configuration.

What we don't have right now is the encapsulation portion, which says that the return traffic going through FRR needs to be encapsulated in a SID which locates pod one's VRF zero on the Cilium side. So we apply a configuration and trigger this. What this is doing is having Cilium allocate a SID and make an advertisement up to FRR saying: to get to VRF zero, you can encapsulate this traffic in this SID that we've allocated. Here's a log line showing that we've done the actual SID allocation, which is highlighted there. The resulting routing table 100 on FRR indicates that we have learned about the allocated SID, and that when we see egressing traffic to 10.1.0.0/24, we're going to encapsulate it in the SID which Cilium allocated and advertised over the BGP control plane. The 10.1.0.0/24 is our pod CIDR, so that's going to get the traffic back.

Okay, so here we start a ping from pod one in VRF zero. This is a good sign, because the traffic is already coming back.
So we know we have end-to-end communication. But now we probably want to dig a little bit deeper and show a tcpdump of what's actually going on on the wire itself.

Okay, so now we have an echo request and an echo reply. What we can see here is the outer IP addresses of the request: you see our allocated SID, and you see that the destination is the SID which locates VRF zero on the FRR side. You can see that it is an encapsulated packet: the inner source is our pod IP address, 10.1.0.97, and the inner destination is VRF zero's 10.3.0.1. Likewise, the return traffic is now coming from our FRR node at b1::1, and it's destined to our SRv6 SID on the Cilium side. So ideally what happens is that when Cilium sees this SID, its data path is set up in such a way that it will decap that SID, pull out the inner packet addressed to 10.1.0.97, and deliver that to pod one. And that is currently occurring, because you see the return traffic getting back to pod one, demonstrated there.

The last interesting thing to note is just looking at CE one. CE one is all the way on the other side, in the customer's network. It's past the PEs, and it's hanging off FRR's VRF zero interface. What we see here is the decapsulated traffic. So this is just a proof saying: okay, we did the decapsulation, we don't see any IPv6 addresses. We basically have end-to-end L3 VPN working as a proof of concept.

So, yeah, that's all for the demo. And just to say, I am far from the only person working on this. It's been a lot of fun, and it's definitely some cool network tech that I got the liberty of working on with three other great engineers: Paul, Yutaro and Jerome.

Great. I hope you enjoyed it. Thank you very much.
If you have questions, come up.

Hi. So you have two amazing and awesome policy expression languages, and I'm curious, for the SRv6 side, how you think about end-to-end observability. What's happening between eBPF and SRv6: is Hubble taking SRv6 data, or do you sort of do debugging between eBPF and SRv6 if you're trying to figure out, end-to-end, what the state of a route was at one point, or which labels were used, which v6 addresses were used for what at one point? How do you think about that end-to-end operability and debugging?

Yes. Currently, I don't think there is an integration between this POC and Hubble. But the traffic would follow the normal data path flow. So given that we can see the traffic and pull out label information, and we do have the control plane, which has all the smarts of what's going on, we should be able to pick apart that information and be intelligent about how we show the traffic observation.

And would there be some storage of what that was at 10 p.m. yesterday, versus the network view of that? (Pull down your mask a bit. Yeah, right.) So, going a few days back, not just real time: how would you look at what the state was a few days ago if you're trying to debug a ticket or some question about connectivity, something that might not have been working but wasn't detected in real time?

Yeah. So you will have the information out of Hubble for the data path. For the routing plane, you can get streaming telemetry, or streamed data about the BGP routes, and you can get timestamping on all the state of your network. I think it's not part of Hubble, but the correlation can be done quite easily.

Any other questions? Thank you very much.