Hello, welcome to Extending Service Mesh to the Edge. Let's get started. The agenda for this talk is actually very simple. We want to talk about the motivation behind extending a service mesh to run on the edge, some of the key challenges you would face, a demo that shows our solution for one of the challenges we think exists, and then we'll wrap up by talking about some future plans we have for the project.

Typically, when anyone talks about edge, the first thing people ask is: what do you mean by edge? To address that, it makes sense to take a little time to talk about where we came from. We came from a project called OPNFV Clover. OPNFV stands for Open Platform for Network Function Virtualization. NFV, Network Function Virtualization, is basically the telcos' attempt to cloudify network services. OPNFV itself is an integration and testing project where you take open source projects, integrate them together, write some test cases, and try to make sure they can actually address particular telco NFV use cases. For Clover in particular, we were interested in integrating cloud-native open source technologies and seeing what kind of NFV use cases we could address. The feedback we got from users, after our investigation, is that there is a very strong inclination to use Kubernetes and all the cloud-native technologies on the mobile edge, namely what we call MEC, which used to stand for Mobile Edge Computing and is now Multi-access Edge Computing. If you think about telcos, they actually have a tremendous edge-site presence: their radio networks. When you're on a cell phone connected to AT&T or someone, there's a base station in close proximity that terminates your radio connectivity. So they have those sites, in small buildings, with compute power.
So they effectively have micro data centers, and that presence has been around for a very long time. And particularly with the 5G upgrades, because the bandwidth is so much bigger, and because of the applications that will come with it, the MEC effort is about taking advantage of that edge compute power and allowing third-party application owners to utilize it by running their applications there. So we are not talking about sensors or devices when we talk about edge, and therefore you don't see us doing something like Virtual Kubelet. We're talking about some real amount of compute power: typically anywhere from a single-digit number of servers to maybe a few racks of x86 or ARM based servers sitting on the edge. Do note that the technology we're talking about today doesn't just address the MEC use case; it addresses any use case where you have servers on the edge, you want to run applications on those servers, and you want the benefits of extending the service mesh all the way to the edge.

Going back to that use case, what makes sense for us in terms of Kubernetes deployment is to run a cluster, and a control plane for that cluster, independently on each edge site. The benefits are pretty obvious: if you want to restart or reschedule a pod, all the decisions are made locally, so you don't have to go back to the cloud just to restart or reschedule a pod. And that's not anything majorly controversial right now. For the last two or three years there have been a number of projects trying to address the lightweight Kubernetes engine use case: K3s from Rancher or MicroK8s from Canonical are example projects addressing a lightweight Kubernetes cluster.
The other question is: with this kind of deployment, what kind of applications are we looking at? Going back to MEC, or any kind of edge micro data center, you are trying to run single applications whose components span both edge and cloud. That's the feedback we got from users in the OPNFV community as well: the control and analytics may run in the cloud while the processing of frames and video runs on the edge; or machine learning, where the inference engine runs on the edge and the training runs in the cloud; or an IoT gateway, where some sort of filter runs on the edge and the core analytics engine runs in the cloud. So these are single applications whose microservices span both sides: some components running in the cloud and some running on the edge.

We'll make the argument, and this is really the gist of this presentation, that it makes a lot of sense to run a single mesh across the cloud and edge sites for this kind of application. The obvious benefit is that you get consistent network policy and telemetry formats and models across a single application spanning your cloud and edge sites. You can basically use the same framework, and treat it just like any other service mesh application you would write for a single data center. Also, the major use case for Istio really is CI/CD pipelines: canary releases, red/black deployments, A/B testing type stuff. For those things, Istio's service mesh works very well; Spinnaker, for example, has been integrated with Istio for over a year now.
And we do think these features, the ability to do canary releases and CI/CD pipelines, are critical for edge deployments, because physical access to edge sites is very expensive. You want the deployment process, restarting pods, and things like that to be fully automated, to avoid having technicians access the edge sites all the time.

So with all these fairly obvious benefits, why has no one done it before? We actually thought about doing this as early as the second half of 2018, when Clover was under major development. At that time we investigated and encountered a major blocker, and that was a component called Mixer. Mixer is an Istio control plane component, and it is a very good idea in theory. As you probably know, Envoy, the proxy engine of Istio's data plane, runs alongside every single pod as what we call a sidecar: for every workload pod that gets spawned, there is an istio-proxy container, which is Envoy, running next to it. The idea behind Mixer was: you have potentially thousands of these running, and if you have complicated policies, complicated telemetry configurations, or a bunch of infrastructure backends that each do their own thing, like recording sessions, you should centralize those capabilities instead of spreading them over thousands of proxies where it doesn't really scale. Great idea, but because you implement part of your network policy, applied to the data traffic, in Mixer, Mixer is not purely a control plane component. It's actually a mix, I guess that's what the name means, between data plane and control plane.
The way it works with Envoy is that the first request of every session first gets forwarded to Mixer; Mixer processes it and decides whether it is allowed, and only then can Envoy proceed. Just by hearing that, you know it's pretty terrible to run on the edge. You can actually run Mixer on the edge; in multi-cluster deployments Istio does assume you run Mixer on the other clusters too. But for edge in particular it doesn't make sense, because your infrastructure backends probably don't exist on the edge; they're mostly pretty heavy pieces of software. So your Mixer may actually be running in the cloud, and if all your requests have to go to the cloud just to get Mixer to say yes, that kind of delay is completely unacceptable. Thankfully for us, around Istio 1.4 or 1.5 the Istio community decided to deprecate Mixer, and instead take advantage of the rich set of filter capabilities in Envoy, which lets you implement those complicated policies, telemetry pipelines, or custom protocols directly in Envoy rather than in a centralized component. That unblocked us, so we resumed looking at extending Istio to the edge.

One of the great things you can do, and a good opportunity to innovate, when you run Istio out to the edge is extending Istio's capabilities, the route rules and policies, into influencing the choice of which WAN connectivity link you are going to use. When I say WAN here, wide area network, I don't just mean wide area networks in the typical definition.
Going between edge and cloud, in networking terms, you usually go through an access network, then a metro network, and then the real WAN, the long-haul or transcontinental links. When I use WAN here, I mean the combination of all of those. It's very typical now to have multiple WAN links going from an edge site to the cloud; and in the era of SD-WAN, software-defined WAN, the SD-WAN capabilities create many channels, even if you only have one or two physical WAN links, for different purposes like low latency, high bandwidth, or best effort. What we really want, and where we see the opportunity, is to map Istio policies and route rules onto different WAN links depending on their characteristics. In fact, that's exactly what the very simple demo we're going to run shows.

Before we go into the demo description: the demo, and the solution we propose, rely heavily on a technology we developed called Clovisor. Clovisor was developed as part of OPNFV Clover, but it has since spun out and is an independent project now. What it does is talk to the Kubernetes and Istio controllers, and then, the big thing, it hooks into the Linux kernel through BPF. It uses IOVisor, an open source Linux technology that compiles and loads what we call BPF code into the kernel. I hope you know what BPF is; if you don't, I probably don't have time to explain it here, but the one-liner is that it's a kernel technology that lets you insert your own code safely at different event points in the kernel.
For us, that means, for example, the ingress and egress of a network device: we can insert our own code there, and it's kernel-safe; the kernel's verifier ensures it won't crash the kernel. Clovisor runs as a DaemonSet, so it runs on every single node.

So let's talk about the demo. The setup is pretty simple: I run two VMs on my MacBook through VirtualBox, and I create two internal links between those two VMs. Each VM represents a cluster, so I run a full Kubernetes cluster on each VM, with the master untainted, so the master and all the workers run on that one node and all the pods get scheduled on just that one node. I also use Istio's multi-cluster deployment capabilities. There are three deployment scenarios, and the one we picked is replicated control planes with separate networks. The other two are shared control plane with separate networks and shared control plane on the same network. The last one doesn't make a lot of sense for practical purposes, it's probably more for testing, so we didn't use it. The second one, shared control plane with separate networks, actually reflects the cloud-edge relationship much better, but we still went with replicated control planes because it's the easiest to set up; that's basically the only reason. As I mentioned earlier, there are two interfaces running between the two clusters, and they symbolize the WAN links: this is WAN link A and this is WAN link B. So this is one Istio instance running on one cluster, and that's another one, and when the sleep pod wants to access httpbin, it goes through the Istio ingress gateway.
Our point is that with our solution, based on the route rules that send traffic to either v1 or v2, it selects one of the WAN links to map to that route rule. So let's get right to the demo.

Hi, this is the demo screen. What you see here are three windows logged into the edge node, and this is the cloud side. Going back to the slide deck: this is basically the httpbin route rule that maps user "minions" to version 2 of httpbin and everything else to v1. As you can see, there are two interfaces, the two simulated WAN interfaces, at 1.3 and 2.3. Now look at the Istio ingress gateway IP address: when we set this up, we set it up so that the sleep pod has a DNS rule that resolves httpbin to that gateway's IP address, and Envoy intercepts the packets and forwards them with a new port number. We set up a route that goes to the ingress gateway, which is why that interface is selected to forward traffic out. If you look at the packet flow between the two, the nice thing about this setup is that there really aren't any packets going through the two internal interfaces until you actually start forwarding things. As you can see from the route rules, we are matching on user "minions", and the second user name is "boss". So here, `curl -u boss`: as boss, it goes through the first interface. And if you use user "minions", it still goes through this same interface; nothing has gone through the other one yet. So let's get started: what we do now is create a Clovisor policy asking it to use the WAN mapping.
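The httpbin route rule described here is standard Istio header-based routing. The actual manifest isn't shown in the talk, but it would look something along these lines (the `user` header name, hosts, and subset labels are illustrative, not taken from the demo's files):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin
  http:
  - match:                      # requests from user "minions"...
    - headers:
        user:
          exact: minions
    route:
    - destination:
        host: httpbin
        subset: v2              # ...go to httpbin v2
  - route:
    - destination:
        host: httpbin
        subset: v1              # everyone else (e.g. "boss") goes to v1
```

The subsets v1 and v2 would be defined in a matching DestinationRule keyed on the pod version labels.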
Looking at this policy, it basically says that anything matching port 8000 and user "minions" should be rewritten and redirected over to this particular link. So again: as boss, nothing changes, it still goes through the first interface. As minions, you can see it's now going through s9. Once again: boss goes through this one; minions goes through this one... it takes a moment... yes, it's going through this one. So that's a very simple demo of WAN selection driven by route rules.

So what's actually happening under the hood? The first thing is that you load a WAN mapping into Clovisor. That WAN mapping tells Clovisor, as we saw earlier, which route rules map to which WAN interface. We learn the route rules from istiod. In this case we do have to reapply the route rules on the edge side as well, because even with replicated control planes, istiod doesn't actually sync route rules across clusters. Then, in the sleep pod on the edge side, there's an Envoy into which we load a Lua filter. What does the Lua filter do? For everything matching the sidecar inbound on port 8000, before the port number is rewritten, it extracts the header information and sends it to Clovisor. Clovisor then matches it against the rules, say, "if user equals minions, then do something". Clovisor takes what it learned from Istio, knows this maps to a particular WAN link, takes the session info, and programs BPF to start routing packets for that particular session out through s9.
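To make that control flow concrete, here is a minimal sketch, in Python rather than Clovisor's actual implementation, of the classification step just described: a WAN-mapping rule is matched against the header info reported by the Lua filter, and the resulting session-to-interface entry is what would then be handed to BPF. All names and the rule format here are illustrative.

```python
# Illustrative sketch of Clovisor's rule-matching step: a WAN-mapping
# policy maps (port, header match) -> egress WAN interface, and each
# reported session gets pinned to an interface (a stand-in for the BPF map).

wan_policies = [
    {"port": 8000, "header": "user", "value": "minions", "iface": "s9"},
]
DEFAULT_IFACE = "s8"  # the default WAN link

session_table = {}  # (src_ip, src_port, dst_ip, dst_port) -> egress iface

def classify_session(five_tuple, dst_port, headers):
    """Called when the Lua filter reports a new session's headers."""
    iface = DEFAULT_IFACE
    for rule in wan_policies:
        if dst_port == rule["port"] and headers.get(rule["header"]) == rule["value"]:
            iface = rule["iface"]
            break
    # In Clovisor this step would program a BPF map so the kernel redirects
    # all subsequent packets of the session; here we just record the choice.
    session_table[five_tuple] = iface
    return iface

# "minions" traffic is steered onto the second WAN link...
assert classify_session(("10.0.0.5", 40000, "10.0.1.7", 8000), 8000,
                        {"user": "minions"}) == "s9"
# ...while "boss" (or anything else) stays on the default link.
assert classify_session(("10.0.0.5", 40002, "10.0.1.7", 8000), 8000,
                        {"user": "boss"}) == "s8"
```

The key property, as in the demo, is that only the first request of a session needs header inspection; afterwards the per-session entry decides the link.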
Once that redirect rule is in place, anything sleep sends that matches "minions" is routed onto s9, but anything else, like "boss" in our demo, still goes through s8, a different WAN link. So this is the cluster-level view of what's going on inside. There are four major pieces: one, the rule settings loaded into Clovisor; another is the ability to read from istiod; then the Lua filter in Envoy, which forwards us the information on new sessions; and finally Clovisor implements the route rule classification, maps it onto the session information, and programs BPF, so that all subsequent packets are forwarded onto a different WAN link.

Now for future enhancements; this is cool, and hopefully you think it's good too. Some of the things we're looking at: say you have a very resource-constrained edge node. Because Envoy is deployed as a container per workload pod, and you may not have applications that actually require that, particularly on the edge, it may not be needed. Talking to users, there's a real difference between whether your application is CPU bound versus I/O bound; a lot of the time your edge applications may not be very chatty, so it would probably waste resources to create that extra container per pod. One thing you may want to do is run Envoy as a single entity that multiple pods go through. By comparison, the sidecar approach works like this: when you deploy Envoy as a sidecar, the first thing an init container does is set up iptables rules so that all the pod's incoming and outgoing traffic is routed through Envoy.
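For reference, the sidecar redirection that the init container programs boils down to a couple of iptables NAT rules. This is a heavily simplified excerpt, not Istio's exact rule set (the real one adds exclusions for the proxy's own UID, loopback, specific port lists, and so on), using Istio's conventional proxy ports:

```shell
# Simplified version of what Istio's init container sets up inside the pod's
# network namespace: transparently steer all TCP traffic through Envoy.
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15006   # inbound  -> Envoy
iptables -t nat -A OUTPUT     -p tcp -j REDIRECT --to-ports 15001   # outbound -> Envoy
```

The point of the comparison is that this redirection, and the extra container that goes with it, is paid per pod; a shared, non-sidecar Envoy would pay it once per namespace per node.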
Clovisor can actually do this using something called sockmap, and credit goes to the Project Cilium folks, who created the sockmap capability. If you know TCP splicing, it's basically the same thing, except programmable: it lets you switch packets from one socket to another socket. So we can use sockmap to run Envoy as a non-sidecar entity, but at least one per namespace per node. Why one per namespace per node? For security reasons: anything before it enters Envoy is cleartext, and you would at least trust everything in your own namespace to do the right thing. And you do not want unencrypted traffic leaving the node on the physical interface, so the boundary is per namespace, per node. This is something we are exploring; the hard part is making Istio work with a non-sidecar Envoy.

Another thing, very quickly because it's almost time, is how to ship logs. Right now the log collectors, trace collectors, and metrics collectors are not really part of the Istio control plane, not part of istiod; they are separate entities you have to deploy yourself. It really doesn't make sense to deploy them on the edge, so we would like to figure out a solution where you still get that telemetry: the logging information, the metrics, and especially the trace information are very useful. One thing to think about is: can we store them in batches and send them out, instead of continuously consuming the WAN by sending everything as it happens? Traces are pretty taxing, because with the usual OpenTracing-style configuration every hop keeps sending spans out of the pod; that's a pretty frequent operation if you have a complicated application.
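As a rough illustration of that batch-and-ship idea for edge telemetry, here is a minimal sketch; this is not from the project, and all names are invented. It buffers spans locally and only makes a WAN round trip once enough have accumulated (a real version would also flush on a timer and bound memory use):

```python
# Illustrative sketch: batch telemetry on the edge before shipping it over
# a WAN link, instead of emitting every span immediately.

class SpanBatcher:
    def __init__(self, send_fn, batch_size=100):
        self.send_fn = send_fn      # uplink sender, e.g. a POST to a cloud collector
        self.batch_size = batch_size
        self.buffer = []

    def record(self, span):
        """Buffer a span locally; ship the whole batch once it is full."""
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send everything buffered so far in one WAN round trip."""
        if self.buffer:
            self.send_fn(self.buffer)
            self.buffer = []

# Usage: count uplink calls to show the WAN savings.
sent_batches = []
b = SpanBatcher(sent_batches.append, batch_size=3)
for i in range(7):
    b.record({"trace_id": i})
b.flush()  # ship the remainder, e.g. on shutdown or a timer tick

assert len(sent_batches) == 3                     # 3+3+1 spans -> 3 uplink calls, not 7
assert sum(len(batch) for batch in sent_batches) == 7
```

The trade-off, of course, is latency: batched traces arrive in the cloud later, which is why flush policy (size, timer, link availability) is the interesting design question for an edge deployment.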
So that's one thing to really think about moving forward: how to address telemetry in an edge-cloud deployment. In summary: as we said, there are a lot of benefits to running a service mesh across cloud and edge. We demonstrated WAN selection, which we believe is a very good feature, particularly together with SD-WAN; it will definitely be powerful moving forward. And for the future, if you have resource constraints or concerns about sidecars, and about control traffic in general, those are the two major areas to investigate and address. We want to conclude by saying that we believe edge computing is as much a networking problem as it is a computing problem, if not more. That being the case, applications need to be aware that they're running on the edge instead of in the cloud, and infrastructure needs to invest heavily in solving the problem from a networking perspective. If you're interested in the project, please contact me. This is the Clovisor project, and the code is on GitHub. Thank you for attending; I know that by the time you see this session, it's one of the last at the event, so thank you very much for staying. Have a nice weekend.