My name is Wim Hendricks, and I'm from Nokia. This talk was supposed to be given by John from Google, but unfortunately he couldn't make it, so you'll have to deal with me. The title of the talk is rather extensive, but I think there are three important aspects represented in it. The first is collaboration, the second is manifests, and the third is scale in complex organizations. These are the three main themes I'm going to talk about, and I'll try to show how we deal with them in a specific example, because these concepts are very generic, and it's always good to look at a concrete case. One caveat: the example I have is telco, because I'm from Nokia, and there are also people from Ericsson here. But the framework we are building is not specific to this use case. There was a talk this morning at 11 o'clock about dealing with large YAML manifests in an organization, and you'll see that the approach we are taking is also applicable in such an environment. So we use telco as the use case, but it's not limited to it. Now, how many in the room are from the telco space? Quite a few — about half the room, thank you. There are a bunch of acronyms here that probably don't mean anything to the rest of you. When I talk about a UPF or an AMF, think of it as an app; we call it a network function. And when we talk about manifests, we mean the manifests to deploy or configure such an app.
So why did I put this example here? First of all, it's what our main business is about. Secondly, there are two important things you see here. One is scale: if you look at the numbers at the bottom, on the right-hand side you see 100,000-plus sites. All of these things are being developed more and more in a cloud-native way, meaning they leverage Kubernetes and are deployed in a Kubernetes environment on a cloud-native stack, and that brings a certain set of complexities we have to deal with. The second thing about these workloads is that they have quite a few interdependencies. You see the lines between those network functions: they have dependencies on each other. So if you want to deploy or operate them, you have to deal with those relationships and dependencies, and at this scale that creates a set of interesting problems that we are trying to address. Before I go further into the solution, I want to give you a bit of context on why the organizational part of the title is important. In this environment you have lots of relationships and dependencies. It's not like you have an application, all the freedom in the world, and you just deploy it; that's not what typically happens in the real world. We have infrastructure people, application people, DB people, storage people, and so on — various roles inside an organization that deal with the particular problem space we are trying to represent.
Now, in particular when we talk about network functions, keep in mind that as a vendor we used to develop these as hardware black-box systems; that has been our main business. With everything happening in cloud native, we are decoupling that and making them truly cloud native, running on Kubernetes. But our culture meant we had full control over everything, and we are going into a world where the amount of control we have is less and less. We should only take care of the app; the surrounding environment it is deployed on is less and less under our control. That's a journey a vendor has to go through and understand. It's also why the relationships and roles in this whole deployment and orchestration are so important. What we used to do was say, OK, we do cloud native, but this is the infrastructure it can be deployed on and you have no choice. People living in the true cloud-native world would say that's not cloud native. So it's important to define an environment in which those relationships and dependencies disappear and are decoupled. You can still state a set of requirements for how to deploy it — NUMA, SR-IOV, and so on — but you don't want to be 100% in control; it's actually the infrastructure people who take care that it happens. So decoupling is going to be important. The next thing that led us to this approach is this picture of a house. Think of it in this context: we have network, compute, and storage, and each is represented as a house like that.
Why do I believe Kubernetes is so successful? Because it gave you plug-ins to make it extensible. There are various vendors for CNIs, several vendors for storage and compute, and they could all do their own thing — but they were all given an environment in which they could operate independently. If you look back at how we in telco, or as a vendor, were operating, we wanted to be in full control of that house. Rather than saying "here you can put your app," we wanted to also define all the surroundings around it. The reason I'm talking about this is that four or five years ago I started to onboard our applications on a public cloud environment, and it was a challenge. Inside Nokia I said, OK, we have to change the approach we are taking; it's not working very well, we have too many dependencies. So I started the journey of moving more and more natively inside a Kubernetes environment and focusing on the app; the surroundings we don't have to define — we just have to be able to get deployed and operationalized in an environment that someone else has defined for us. That led to the following: in parallel to us, Google started an open-source initiative called Nephio. Who in the room has heard about Nephio before? Very few — maybe 10 or 20 people. Nephio is a project in LF Networking (LFN), part of the Linux Foundation. Google pioneered it, but people from Ericsson, Nokia, and several service providers are defining that architecture together.
Now, the goal of Nephio, as represented on the slide, is to use Kubernetes as that unified layer and make sure we can automate all our elements in that environment. Rather than each having our own house, we want a unified Kubernetes layer on which those systems are built. Of course, this is a huge problem space, so to give a bit of context we divided it into three categories. The first category is what we call infrastructure: deploying clusters. To be clear, we are not going to come up with a new Cluster API system; we are just using it. We have to understand which clusters are there, and we potentially have to instantiate a cluster, but we are not going to take care of how to do it; we just drive that, using the tools that are available. The second pillar is the network functions that have to be deployed on a certain infrastructure. The third pillar is the configuration of those network functions. The third pillar is a bit interesting, because think of the pets-versus-cattle approach: configuration management means you have an existing container running and you want to change its configuration while it's running. You can take two approaches. You could say, I deploy a new one — like cattle, you throw it away, deploy a new one, and you're fine. Or you could say, no, I want granular control and the ability to change certain configuration parameters during the lifecycle of that thing. Depending on the environment, you take different approaches.
But bear in mind that with this automation framework we don't want to be limited to purely cloud-native applications. We also want to control, for example, physical network elements. And when you go to a physical network element, like a top-of-rack switch or a device running in your network, you probably cannot just kill it and deploy a new one. So depending on the environment, you can use either approach, and as such we believe configuration management is also a consideration we have to take into account. At the moment in Nephio we are focused mainly on the first two pillars, but also on the interdependencies between them. In other words, if I have a network function X with a set of requirements on the infra, we want to handle that at scale, and in a way that takes organizational boundaries into account. The high-level goals we have in Nephio are: first, it should be intent-driven — focus on the what, not the how. Second, given the scale, we want to do it in a very scalable and distributed way: you have central control, but once you have defined what you want, the actual workloads should operate independently, while you keep the ability to intervene if changes are needed. And third, we want to do it in a uniform way, with guardrails around the changes that happen, to ensure that what you deploy is actually going to be right. We sometimes call that shift left: we try to give you more and more tools to validate upfront what you are going to put on the runtime environment. That's part of the collaboration aspect as well. So those are the three main pillars.
Now, because so far this was probably a little fluffy, let me give you an analogy for the example. We all know how Kubernetes works: we have clusters, we have nodes, and we schedule pods. In the approach we have in Nephio, the unit on which we schedule is a cluster. We deploy network functions — apps — and when we talk about manifests, it's the manifests related to those apps. And third, when you schedule a pod on Kubernetes you can use a DaemonSet, a Deployment, a StatefulSet: various mechanisms to deploy it. We have a similar approach, which we call a PackageVariantSet. You can basically say, I want this network function running on this set of clusters with these characteristics. So you see, there is a lot of analogy in how we are operating. Also, the use case I'm describing is edge clouds, but in IoT or other environments you need a similar thing: you sometimes need the same thing deployed in various environments. The controller that does that fan-out, or that scheduling, is not limited to network functions; I could deploy any set of manifests that do something, and the same controller handles it. We want to build generic building blocks that are reusable for multiple use cases, because the problem space we are trying to address in Nephio is huge, so we have to build these reusable components — something I will probably repeat a few times. That's the high level of what we want to do; now let me go into how we approach it.
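To make the analogy concrete, a fan-out intent in this model can be sketched as a PackageVariantSet: one upstream blueprint, cloned once per matching target cluster. The repository name, package name, and selector labels below are illustrative, and field names should be checked against the current Porch/Nephio API:

```yaml
apiVersion: config.porch.kpt.dev/v1alpha2
kind: PackageVariantSet
metadata:
  name: upf-edge-fanout
spec:
  upstream:                     # the blueprint package to clone
    repo: blueprints            # illustrative repository name
    package: upf
    revision: v1
  targets:
    # One package variant per WorkloadCluster matching the label.
    - objectSelector:
        apiVersion: infra.nephio.org/v1alpha1
        kind: WorkloadCluster
        matchLabels:
          nephio.org/region: edge   # illustrative label
```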
So first of all, we are a heavy consumer of KRM, the Kubernetes Resource Model — in fact a 100% consumer of KRM. We love the KRM model that Kubernetes came up with, and we believe it's very important to standardize the whole automation system on it. Why? Because the schema is well known and you have standard metadata and tools, so you can build very generic code on KRM without necessarily understanding the full details of what those things do. The second thing we are using is a concept called Configuration as Data. I don't know how many people have heard about this before; I'm expecting very few. It's a new approach, and by the way, if you scan this QR code, it links to a website with a lot of information about it. The approach is a little different from what we have done so far. First, we define the configuration artifacts around a concept called a package. A package consists of a bunch of KRM files — YAML in the normal case — and that can be anything: a package is just a list of YAML representations that together do something. In our case it's deploying network functions, as in the example I'm explaining, but it could be anything. That package is version controlled: there is a versioned backend the package lives on, so for any change you know exactly when it happened and who did it, and that feeds into the collaboration approach. And then you have a bunch of functions or controllers — what we call KRM functions — that act upon that package. For example, you could mutate: say you create a package with the default namespace and you want to deploy it in a particular environment; you have a function that changes the namespace from A to B.
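The namespace example can be sketched as a package's Kptfile pipeline using kpt's published set-namespace function. The package name, namespace value, function image tags, and the added validator are illustrative choices, not something from the talk:

```yaml
apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: upf-blueprint          # illustrative package name
pipeline:
  mutators:
    # Rewrite the namespace on every resource when the package is rendered.
    - image: gcr.io/kpt-fn/set-namespace:v0.4.1
      configMap:
        namespace: site-b      # "change the namespace from A to B"
  validators:
    # Reject the render if any resource fails schema validation.
    - image: gcr.io/kpt-fn/kubeval:v0.3.0
```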
There is a standard set of functions already developed and available, and we can develop new functions based on your needs, which is what we are doing inside Nephio so far. What's nice about the approach is that you get a full trail while you make changes and clone those packages for your needs. For example, you'll see in a bit that we fan this out to various environments, and you keep a trail back to the source. That means that from the moment you change one of the blueprints you have defined, you can still trickle that change all the way down to where it was deployed. You never lose track of the trail back to the origin, and at any point in time you can make a change and it follows through all the way, which in my view is something that has not been accomplished so far, and it's a very nice attribute. It means that if you define what I call a blueprint design and use it for dev, prod, and so on, you have that same set of capabilities: if the blueprint changes, you can still do the updates and understand where it was deployed and who is using what. The other thing is that those functions can be instantiated using containers or — the new hotness — WebAssembly. They are very tiny and extensible, and you can execute them as you wish. Of course, to make this work there is one important requirement: they have to be idempotent. You have to write them so they are idempotent, because otherwise you get different results in different environments. But the nice thing is they are very tiny and very extensible, so you can use them in whatever context you wish.
Now, to give you a bit of context on Nephio, this is the architecture we are adopting. On the left-hand side we have a management cluster with a bunch of controllers and functions — we have the choice to use either. They do a number of things on a package, which is the unit we operate on. So we have a set of controllers and functions that manipulate the contents of that package: mutating, validating, and generating new KRM resources we need. I'll have an example in a bit that hopefully clarifies what I mean. That management cluster has a Git backend, to ensure we can collaborate: multiple organizations can act upon that package in a way that matches their responsibilities for performing a certain task within their organization. Then we have a set of workload clusters where the actual app — in this example, a network function — is going to be deployed and run, which we call the runtime. And we have a layer in between. When people look at the system they say, OK, you're doing GitOps. Yes, we are doing GitOps. But so far, when people talk about Argo and Flux and the like, it's simply: I get a set of YAML files and I synchronize them onto the cluster. That happens at the lowest layer. Where we use Configuration as Data is all the way from the deployment side — where at the moment we use Config Sync, but you could use Argo or Flux.
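At that lowest layer, the hand-off to a workload cluster is plain GitOps. With Config Sync, it can be sketched as a RootSync pointing at that cluster's hydrated directory; the repository URL and directory below are hypothetical:

```yaml
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://git.example.com/deployments.git  # hypothetical repository
    branch: main
    dir: /edge-cluster-0001                        # this cluster's hydrated package
    auth: none
```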
So all the way from where you start the intent — the whole engine — down to here, we use Configuration as Data to do that manipulation and version control, in a scaled-out way, always with the link back to the source so you understand what the origin is. Here I try to show conceptually what happens on the management cluster before you get deployed. We have what we call a blueprint package: a set of manifests — YAML — that someone defined. In our case it is the set of artifacts for a network function, in the example I'm explaining. Then typically you need to put that onto a cluster, and that could be one, but it could also be thousands. So we have what we call a fan-out aspect: we take the same package, clone it for a particular environment, and — through what we call injection — add some context about where that package runs and what it operates in. That could also be dev versus prod, for example; anything you can imagine. The use case I have is really the scaled-out edge deployment. Now that same package is cloned towards a specific instance where it should run. But if you say, I want to run on a specific cluster, you still need a set of attributes aligned with that environment, and that's what we call specialization. Specialization is, for example: I need an IP address aligned with the subnet provisioned on that site, or I need a VLAN, or some other context specific to that environment. During specialization we again change the content of the package, based on the functions and controllers that handle that. So you have this nicely orchestrated system that at some point results in something you can deploy at runtime.
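With kpt, the injected context typically lands in the package's conventional package-context ConfigMap, which functions in the pipeline can then read during specialization. The site name below is hypothetical:

```yaml
# package-context.yaml: kpt's conventional per-package context resource.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kptfile.kpt.dev        # well-known name read by kpt functions
  annotations:
    config.kubernetes.io/local-config: "true"  # stays out of the API server
data:
  name: edge-site-0001         # deployment context injected at fan-out time
```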
During this step, once you have done this whole hydration, as we call it, you have an artifact that you can control: you can run it through a CI pipeline, do some validation on it, and see the changes from version A to version B. So we have full control over the changes you are going to apply to the cluster, in order to avoid problems while deploying. That doesn't mean everything will be fine, but we try to limit what gets deployed and give you the controls to handle it. Now, an example — I'm not sure whether it's a very good one, but I'm trying to show how this works. On the left-hand side we have the blueprint package. As I said, you can put any YAML, any KRM resource you can think of, in there. One important thing I haven't said so far is that not all of these KRM resources have to be applied to the API server. Think of it: if we have 100,000 sites and each needs, say, five IPs, that's half a million entries in the API server. So we also have a way to offload certain things, because sometimes the IP address you need doesn't have to go onto the API server; it's an intermediate resource that is produced in order to get to the final result. So we have a way to offload what gets put on the API server, while — as you will see in a bit — keeping the same controller-style philosophy for acting upon it. That's the other important aspect to remember: for the content of a package, you control which pieces you apply to the Kubernetes API server and which you don't. That again allows for scale, without losing capabilities.
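The offloading described here maps onto kpt's well-known local-config annotation: a resource carrying it stays in the package (and in Git history) but is skipped on apply. The CRD group and kind below are hypothetical stand-ins for such an intermediate resource:

```yaml
# An intermediate resource that drives specialization but is never
# applied to the Kubernetes API server.
apiVersion: example.nephio.org/v1alpha1        # hypothetical group/version
kind: IPAllocation                             # hypothetical kind
metadata:
  name: upf-n3-address
  annotations:
    config.kubernetes.io/local-config: "true"  # kpt: do not apply this resource
spec:
  prefixLength: 24
```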
The second piece here — which is, by the way, also part of the package — is a Kptfile. We use kpt and Porch, which is the Configuration as Data tooling. What's in the Kptfile is the pipeline: which functions and which controllers you actually use to do that specialization. Once that pipeline has run, you still have the manifests you started with, but you can also have new KRM resources that the specialization phase produced. For example, here the result is a new NetworkAttachmentDefinition, because we are using Multus. But again, think of this as a very generic approach: it can go from A to Z, or it can mutate A slightly into A-prime; any mutation and permutation is possible. What I think is also important is the approach to doing that specialization. One thing I like so much about Kubernetes is what I call the loosely coupled framework: you have a bunch of independent actors that together achieve a certain goal. Wouldn't it be great to have the same thing acting on the package that does the specialization? That's what I call — what we call — the conditional dance, or the conditional choreography. You have a set of independent actors; the actors are either functions or controllers, your choice, but together they achieve a certain result. Why is this so nice? When you schedule a pod on a Kubernetes cluster, you actually have the same dance. It's not usually explained this way, but: you need a CNI — someone takes care of that; I need storage — someone takes care of that. You just see that the pod gets scheduled, and if something didn't work, you see in the status why.
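A generated NetworkAttachmentDefinition, of the kind the specialization pipeline emits for Multus, might look roughly like this; the attachment name, interface, CNI type, and address are illustrative values a specializer would fill in:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: upf-n3                 # illustrative attachment name
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "ipam": {
        "type": "static",
        "addresses": [
          { "address": "192.0.2.10/24" }
        ]
      }
    }
```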
The same philosophy sits behind this, but we apply it at the automation layer. It lets us have these independent actors that together achieve a goal; personally, I call it the conditional dance, or the conditional choreography. It's a very nice way to have independent actors act upon the KRM package you saw — mutate it, do all of the specialization — and do it so independently that if we change one of these parameters, we can do so in isolation. Also, on the slide you see this VLAN backend and IPAM backend: you can even make calls outside of the package. It's not limited or constrained to the KRM located in the package; you can call out through an API to an IPAM system, or anything you can imagine, to get information about something. Now, I talked about organizational complexities. I didn't go into much detail so far, but of course the whole system has to be built in such a way that the people responsible for infra, the people responsible for network functions, and the other roles you have — security people and so on — can all work in harmony. For that we have a consumer/provider concept, so there is a clear delineation of responsibilities. For example, the network function people want to put certain constraints on the infra, and you have a consumer/provider relationship: the CRDs, or the KRM, in the package have a clear notion of who owns them and who acts upon them. You could have a consumer that says, I want an IP, and the provider — owned by the infra people — actuates that.
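The consumer/provider split can be sketched as a claim resource: the network-function team authors the claim, and an infra-owned controller fills in the result. The API group and fields below are illustrative, loosely modeled on Nephio's IPAM resources, and should not be read as the exact schema:

```yaml
# Consumer side: the NF team asks for an address in a network instance.
apiVersion: ipam.resource.nephio.org/v1alpha1  # illustrative group/version
kind: IPClaim
metadata:
  name: upf-n3-claim
spec:
  kind: network
  networkInstance:
    name: vpc-internet                          # illustrative network instance
# Provider side: the infra-owned IPAM controller writes the answer back, e.g.:
# status:
#   prefix: 192.0.2.10/24
```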
And then we have this loosely coupled, extensible framework that allows vendor extensions and so on. The other nice attribute of the package is that you can extend it with whatever KRM you want. If your vendor has specific attributes, you basically put the vendor KRM inside the package, have a function that acts upon it, and you're good to go. That's the other nice property of the package: it's not limited to a certain CRD; you can put any resource in it. Now, we probably don't have a lot of time, and this is a very busy slide, but the point is that in Nephio, rather than building only for the telco use case, we are building generic frameworks that fulfill those roles. What do we have? We act on KRM, we use packages, we use Configuration as Data, we use the fan-out controller, we use the conditional dance, and so on. We are building a set of generic capabilities that I believe are applicable in various other domains, and I would love to talk to people to see how they would apply and hear their feedback. With that, I would like to thank you all for listening. We are doing all of this in open source, by the way; everything I talked about is open and part of LFN, the Linux Foundation. Here are a bunch of links to follow what is going on. And with that, thank you; I'll open it up for questions. I don't know whether we have time for questions or not. No? Yes. I think there is a mic. [Audience] With kpt, do you run some kind of kpt agent in the cluster? [Wim] The question is: do we run kpt in the cluster? In the management cluster, yes — I did not explain that. kpt has a set of components, and Porch is a component of Configuration as Data.
So Porch is the Configuration as Data backend that runs inside the management cluster, and it has a Git backend; that's what we are using. If you look at the management cluster, the specialization and all the fan-outs interact with the Porch API, which is a Kubernetes API, and that in turn makes all the changes to the backend — to Git — to get the version control and so on. So there is a component of kpt called Porch that does that.
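Clients of Porch work with PackageRevision resources through that Kubernetes API; a published revision looks roughly like this (names are illustrative, and exact fields should be checked against the Porch API):

```yaml
apiVersion: porch.kpt.dev/v1alpha1
kind: PackageRevision
metadata:
  name: blueprints-upf-v1      # Porch-managed name; illustrative here
spec:
  packageName: upf
  repository: blueprints
  revision: v1
  lifecycle: Published         # lifecycle moves Draft -> Proposed -> Published
```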