So, okay, good afternoon everyone. My name is Jose Lato, I work at Red Hat, and thank you everyone for coming to this presentation, where I am going to talk about — the title is — Tuning and Automating Telco 5G Containerized Workloads. I work as part of a team called Partner Workloads and Enablement, together with other colleagues like Javier Peña, who also works on these same topics and helped me prepare the presentation. We help our different telco partners run their telco workloads on our platform, which is OpenShift. Very quickly, the agenda for this session: I will do a very quick introduction to the telco architecture and the different layers we can find there, also to explain the different points where at Red Hat we are trying to help these partners. The idea is not only to enable OpenShift to run the workloads, but also to provide them with a solution to manage all the clusters they are going to have — not only deploying, but also doing automation, configuration, upgrades, etc. Then, how we put all this together: we have a set of OpenShift telco operators that help partners do the different tuning they have to do for telco. And finally I will talk about a special configuration of OpenShift, which is single node OpenShift. It is pretty special because it is a cluster with only one node, and these clusters can be configured with a special profile for these partners. So, what does this architecture look like? Well, this part is the part of the devices — our mobiles — that are connected to the internet through what in telco is known as the radio access network, which I will mention a lot during this presentation. The radio access network is divided in turn into different layers. In the first one we find the radio units, the common antennas where our mobiles are going to connect.
And then, close to these antennas, or behind these antennas, we have the first set of clusters or sites. These are called distributed units. The distributed unit takes the signal from the antenna, and everything is collected into the central unit, where we again have a new cluster. And then the last layer is the packet core. The packet core is closer to the internet, and again here we have a new cluster, this time maybe in a data center, while for the central unit you can have one or several in different regions, managing different distributed units. So we can use different OpenShift cluster configurations here, from standard to compact, but the most special part is — well, by now it is out of the screen, sorry. So, okay, during the presentation I will focus on the radio access network and the distributed unit. We implement this with single node OpenShift, together with a management cluster, with a special profile that comes pre-configured inside the cluster, also using some special operators for telco, and with a very big requirement about scaling and making things reproducible. Why? Because if we focus here on the distributed unit, which we said is close to or behind the antenna, you can imagine how many radio units or antennas these partners are going to have: hundreds or thousands of new clusters that they need to scale and manage. Well, we usually say that the telco world is different — maybe everyone says the same about their partners or customers — but it is true, and we will see later that we have to do very low-level configurations. Also, we want to reach the far edge, the edge and the cloud using the same technologies and the same platform, and, as I said before, we have very strict requirements about scaling and making things replicable. So, how do we do it? Because we not only need to enable OpenShift to run these workloads, we also need to manage and administer this huge number of clusters.
This is why we need to create a management cluster that is composed of a set of different technologies. A management cluster is an OpenShift (or Kubernetes) cluster that manages other clusters — deploys, monitors, configures them — in an architecture like this one, because we are combining it with GitOps and zero touch provisioning technologies and tools. So we have our Git repo, our GitOps workflow, where we define our infrastructure and configuration; this is managed by our hub cluster, and we can deploy different sites for the distributed unit, the central unit or the packet core part. What does the management cluster look like? We said it is an OpenShift cluster, so the base layer is going to be CoreOS and OpenShift. Then we are going to use the Red Hat Advanced Cluster Management product, together with the Assisted Service, which is the newer OpenShift installer, to install the different clusters — we will see more about that later. Then we want to do GitOps combined with all this, so we have Argo CD, and also a new operator, the TALM operator (Topology Aware Lifecycle Manager). This operator helps with the lifecycle management of your clusters' configuration, and we will see later the special needs that we have there. And then, of course, you can have monitoring or other workloads that you want in your management cluster. I am also talking about zero touch provisioning, and the scope of the areas we want to cover is the whole lifecycle management. So on day zero we deploy our clusters with this ZTP GitOps tooling and its plugins, which I will explain in another talk in more detail. You can define your infrastructure — in this case, you define your different single node OpenShift clusters.
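To make this concrete, here is a rough sketch of the kind of custom resource used to declare a single node OpenShift site in the ZTP GitOps flow. The field names follow the SiteConfig CR, but the domain, addresses, secrets and labels below are all made-up example values, and the exact schema depends on the ZTP version:

```yaml
# Hypothetical SiteConfig sketch for one SNO site; host names, BMC
# addresses, secrets and labels are placeholders, not real values.
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: du-site-example
  namespace: du-site-example
spec:
  baseDomain: "example.com"
  pullSecretRef:
    name: "assisted-deployment-pull-secret"
  clusterImageSetNameRef: "openshift-4.12"
  sshPublicKey: "ssh-rsa AAAA... user@host"
  clusters:
    - clusterName: "du-sno-01"
      networkType: "OVNKubernetes"
      clusterLabels:
        group-du-sno: ""          # label later used to target configuration policies
      nodes:
        - hostName: "du-sno-01.example.com"
          bmcAddress: "idrac-virtualmedia://192.0.2.10/redfish/v1/Systems/System.Embedded.1"
          bmcCredentialsName:
            name: "du-sno-01-bmc-secret"
          bootMACAddress: "AA:BB:CC:DD:EE:01"
          bootMode: "UEFI"
```

Pushing a file like this to the Git repo is what triggers the hub cluster to provision the site, with no manual interaction at the remote location.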
When these are deployed you go to day one. On day one the cluster is only deployed; it is still not ready for telco, it is just single node OpenShift. On day one we configure what we call the vDU, or virtual distributed unit, profile, which is a set of operators with some configuration and some tunings that we will see in a moment. Day two is the work of the partner: they have a cluster where they can deploy their CNFs or workloads, and of course after that, on day two, we can still do more configuration, upgrades, whatever, using the same platform — ACM, our GitOps workflow, our ZTP tools, etc. Very quickly, because we will see this later: with the ZTP tools we provide two new custom resource templates — SiteConfig and PolicyGenTemplate — that allow you to define the cluster and to define the different configurations. This is just a screen capture of ACM. ACM is not only a UI, it is also a set of services that allow you to do all the work around installing clusters, monitoring, deleting nodes, whatever you need to do. I won't go into detail on this, because right after this talk I have another one where I go further into how we work with this management cluster. But okay, let's say we have this management cluster with thousands of distributed units; now, how do we configure them? Because we said at the beginning that the telco world is different. So we have the OpenShift telco operators, which are what we configure on day one. As an overview: we have our single node OpenShift, and we want to make it a virtual distributed unit in our radio access network, so we have to create this vDU profile. Again, it is OpenShift, just with only one node. Then we are going to use the telco operators: logging, local storage — well, local storage is not a special operator for telco, but it is there.
Then we have the Node Tuning Operator — and, prior to 4.12, the Performance Addon Operator — which allows you to do this kind of customization. Then the SR-IOV operator for networking, and finally the PTP (Precision Time Protocol) operator. All these operators, together with their configuration, allow us to convert single node OpenShift into something that can be a distributed unit. Finally, you can have other operators and other workloads, but maybe the most relevant part is the application workload of the telco partner that is going to run there. We have been talking the whole time about the distributed unit, the virtual distributed unit: they are going to run pods with their vDU implementation. And what is this vDU doing? Going back to the telco architecture: we have our mobile connected to the radio unit, sending a radio signal. At the radio unit we don't do anything. The radio unit converts the radio signal into a digital signal, and this reaches our vDU pods. Each of these vDU pods is processing the digital signal that is going to be sent to the central unit. How is this done? It is a loop, a process that is continuously working on this signal processing. The interesting thing here is that these pods work in this infinite loop with very, very demanding requirements. Each one will take one CPU, it will take that CPU for itself, and it will not share it — it will use 100% of the CPU. If you have four CPUs, you can run four vDU pods that are going to be constantly using the CPUs. We need a real-time kernel for that. These CPUs cannot receive kernel interruptions, because they only want to do signal processing and they don't want to be bothered. If you interrupt the CPU for even a few microseconds, this will make the distributed unit drop thousands of packets, which is not acceptable.
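This "one pod, one dedicated CPU" model maps onto a standard Kubernetes mechanism: when the kubelet's static CPU manager policy is active, a pod with Guaranteed QoS (requests equal to limits) and an integer CPU count gets exclusive, pinned cores. A minimal sketch of what a vDU pod spec could look like — the names, image and sizes are hypothetical, not taken from any real vDU:

```yaml
# Hypothetical vDU pod sketch: Guaranteed QoS (requests == limits) with an
# integer CPU count, so the static CPU manager pins dedicated cores to it.
apiVersion: v1
kind: Pod
metadata:
  name: vdu-example
spec:
  containers:
    - name: vdu
      image: registry.example.com/partner/vdu:latest   # placeholder image
      resources:
        requests:
          cpu: "4"              # integer count -> eligible for exclusive CPUs
          memory: "8Gi"
          hugepages-1Gi: "4Gi"  # backed by pre-allocated 1 GiB huge pages
        limits:
          cpu: "4"
          memory: "8Gi"
          hugepages-1Gi: "4Gi"
```

Any fractional or mismatched values would drop the pod out of Guaranteed QoS and it would lose the exclusive cores, which is exactly what the signal-processing loop cannot tolerate.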
They also need to access the network card with a very high data throughput, so they access the network card directly, not passing through the kernel. For that they use DPDK, which, together with some special network cards, allows them to do this kind of thing. This kind of tuning is what makes this vDU single node OpenShift special, and it is why I am focusing the presentation on it. And here the previous list of operators comes into action. The first one is the Node Tuning Operator, which allows you to create a performance profile. In a performance profile you can enable the real-time kernel. In a performance profile you can disable kernel interruptions for some CPUs. You can create huge pages of memory for these processes, etc. And maybe the most interesting part is that you do the CPU pinning, where you define some reserved CPUs. The reserved CPUs are going to be used by the operating system, by OpenShift, or by other workloads. And then we have the isolated ones. The isolated ones don't receive kernel interruptions and are going to run the vDUs. More tuning: then we have the PTP operator, for Precision Time Protocol, because they have to have synchronization at the level of nanoseconds, and this protocol is in charge of that. So, this is a server where you have a network card, and you receive, in some way, a GPS signal for doing this Precision Time Protocol. One of the daemons managed by PTP takes the GPS signal from the satellite and synchronizes the clock of the network card. Another daemon takes that signal and sends it through the network to synchronize other cards, which could be elsewhere in the network or in the same server. And a third daemon synchronizes the clock from the network card into the server's hardware clock. And finally, the third operator: the SR-IOV operator.
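The performance profile just described is an ordinary custom resource managed by the Node Tuning Operator. A minimal sketch — the CPU ranges, hugepage count and node selector are example values that depend entirely on the actual hardware:

```yaml
# Sketch of a PerformanceProfile for a DU node; all numbers are examples.
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  cpu:
    reserved: "0-3"     # housekeeping: OS, OpenShift, other workloads
    isolated: "4-31"    # no kernel interruptions; runs the vDU pods
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
      - size: "1G"
        count: 16        # pre-allocated 1 GiB huge pages for the vDU
  realTimeKernel:
    enabled: true        # switches the node to the real-time kernel
  numa:
    topologyPolicy: "restricted"
  nodeSelector:
    node-role.kubernetes.io/master: ""   # on SNO the single node is master and worker
```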
This SR-IOV technology is available on some network cards, and it allows you to take one physical port of the network card and virtualize it into different virtual functions. From the point of view of the pods, they see each of these as real hardware. And you can dedicate each of these virtual functions exclusively to one pod. So the pod has one CPU only for itself, and it also has one of these virtual functions only for itself. And remember that here we don't go through the kernel: we skip it and go directly to the card using the DPDK technologies. Okay, I am more or less reaching the end. So we have this single node OpenShift with the vDU profile that I explained before: one OpenShift cluster with just one node, designed to be used at the edge. Maybe not exactly edge computing, but it is meant to be used at the edge, because for these distributed units we have to deploy hardware that is confined to very small spaces, with very reduced cost; power consumption is very limited, and so is the network connectivity, etc. And single node OpenShift is optimized for the number of CPUs and the amount of RAM it consumes. Why? Because the fewer CPUs used by OpenShift and the operating system, the more CPUs are free to run the vDU pods. If each vDU pod can manage one radio unit, and you free more CPUs, you can manage more radio units at the same time — and that is a lot of money for telcos. So we are also pushing to reduce even further the minimum number of CPUs needed to run single node OpenShift. Okay, so when we deploy single node OpenShift using these tools — zero touch provisioning, GitOps and the related tooling — this automatically includes what we call the RAN reference profile, with some pre-configuration of these operators: some configuration that is generic, and some that can be customized depending on each partner.
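Going back to SR-IOV for a moment: the operator is driven by custom resources such as the SriovNetworkNodePolicy below. The NIC name, VF count and resource name here are hypothetical; the `vfio-pci` device type is what hands the virtual functions to a userspace driver so the pods can use DPDK and bypass the kernel:

```yaml
# Sketch of an SR-IOV policy; the NIC port name and counts are examples.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-fh-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: du_fh            # pods request this as openshift.io/du_fh
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numVfs: 8                      # split the physical port into 8 virtual functions
  nicSelector:
    pfNames: ["ens1f0"]          # physical port to virtualize (example name)
  deviceType: vfio-pci           # userspace driver, so pods access it via DPDK
```

A vDU pod would then request one unit of the `openshift.io/du_fh` resource, getting a virtual function for itself alongside its pinned CPU.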
Also, with ZTP and GitOps in the management cluster, we can manage your fleet of clusters in groups: you can do upgrades for a group of clusters, and a lot of different management tasks. What is in the vDU profile? Well, I won't go into detail, but I have a link here to the GitHub repo. It is a list of manifests that by default are always included with the single node OpenShift. For example, it enables the SCTP protocol, which is pretty usual in these cases. It makes some customizations so that the server starts up faster, because in these cases, if something goes wrong and you have to reboot, you have to start up as quickly as possible, otherwise you can lose some connectivity. We enable kernel dump, or disable, for example, crio wiping the container storage partitions on every reboot; if we did that on single node OpenShift, each reboot would take a lot of time, which is not acceptable. That kind of customization. These are the generic ones, and then we have tools that allow you to customize and configure the operators, or whatever else you have in OpenShift, with what we call PolicyGenTemplates, which again I will explain in the next talk. This is an example of a performance profile PolicyGenTemplate, which helps you define the performance profile. You can see that we configure — I don't know how to point at it — the isolated and reserved CPUs, you configure your huge pages, and also enable the real-time kernel, etc. All of this configuration and the PolicyGenTemplates are managed by the hub cluster, and you can decide which clusters to apply the configuration to. Okay, some numbers about scaling, and some performance tests, because we have talked about scaling a lot. These are results coming from a Red Hat internal lab where they do a lot of this performance testing, with one hub cluster that is basically a compact cluster with three bare-metal servers.
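Before the numbers, here is roughly what such a performance-profile PolicyGenTemplate looks like; the names, binding label and values below are illustrative, not taken from the actual repo:

```yaml
# Sketch of a PolicyGenTemplate that overlays a PerformanceProfile onto
# every cluster labeled group-du-sno; all names and numbers are examples.
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: group-du-sno
  namespace: ztp-group
spec:
  bindingRules:
    group-du-sno: ""          # applied to clusters carrying this label
  mcp: master                 # on SNO the single node is in the master pool
  sourceFiles:
    - fileName: PerformanceProfile.yaml
      policyName: config-policy
      spec:                   # fields overriding the shipped base manifest
        cpu:
          reserved: "0-3"
          isolated: "4-31"
        hugepages:
          defaultHugepagesSize: "1G"
          pages:
            - size: "1G"
              count: 16
```

The hub cluster renders this into policies and the lifecycle manager rolls them out to whichever group of clusters matches the binding rules.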
In this lab they use pretty powerful servers, with hundreds of CPUs and 500 gigabytes of memory, and this management hub cluster tries to deploy single node OpenShift clusters, and configure the vDU profile, as quickly as possible and in huge quantities. We push to our Git repo 500 SNOs per hour to be deployed, during about 8 hours, with the intention of reaching 3672 single node OpenShift clusters deployed in those 8 hours. In the end, 99.7% of the installations succeeded, which is pretty impressive, and only a few failed to get the vDU profile fully configured. Anyway, these are very good results. In this graph we can see how each hour we add 500 new clusters; these lines tell us that each hour we have 500 new single node OpenShift clusters installed, and this pink one is the one that shows the vDU profile being installed and configured. So it is pretty stable at deploying the clusters and at applying the profile, and it is also more or less stable in how many new single node OpenShift clusters contain the profile. So, just ending with some conclusions. In this presentation I have focused on the radio unit, the distributed unit, and the radio access network layer, because I think it is in some way different, due to the customization we have to include. The distributed unit is implemented with single node OpenShift, with this vDU RAN profile, with the telco operators and extra configuration, and we have a management cluster that does the lifecycle management, as we will see later: not only deploying, but also doing the configuration on day one and day two. The same technology on the far edge, the edge and the cloud; and, well, we need to provide a platform that is scalable and that provides replicability. Okay, so the last thing is the different levels of certification that we have.
Vendor Validated means, let's say: okay, we have seen the vDU running in the cluster, so the vendor can say it is validated — it means it will work on OpenShift. What is new now is that we want to provide the CNF, cloud-native network function, certification. The CNF certification doesn't go into the details of whether it works or not — it works because the vendor says it works — what it tests is that the workload follows best practices: security, lifecycle, etc. So we have the platform, the management cluster, the lifecycle management, and also the certification that the CNF is going to work, and is going to do it following best practices. That's all for the presentation; now, your questions. I think I am more or less on time — I'm not sure now. Okay, yeah, the question was about which PCIe version the SR-IOV card is using. I'm afraid I am not sure, because there are also many different cards. But it is true that this is something we don't usually have problems with. I think the next question is about how the work is being done to reduce the requirements of single node OpenShift, to free more hardware for the telcos. Well, I am not part of the engineering team, so I am not sure how they are doing the magic of going, for example, from eight CPUs to four CPUs. But, for example, some of it is related to the work we are doing: OpenShift is a general-purpose platform, so it contains some pieces or operators that maybe are not needed in these telco scenarios. By default, single node OpenShift is a full OpenShift cluster, so it contains operators such as, for example, OpenShift monitoring. Some of the partners don't need or don't want OpenShift monitoring in this scenario, maybe because they have something different that they are going to use for monitoring, or because they simply don't want monitoring at all.
If something fails, you replace the server, you reinstall, and that's all. So these are the kinds of things we are also looking at, to see how customizable we can make this platform in order to consume fewer resources. How do we manage the fact that, because it is single node OpenShift, there is no redundancy? There is no way of providing it, so it is something that has to be acceptable in a way. Okay, so you have one distributed unit on one antenna. Well, the theory is that you don't have only one antenna in the same area. So if you lose this distributed unit, in theory the other distributed units that are close by are going to compensate, so you don't lose all the connectivity for the devices in the area. The distributed units compensate for the failure, and this gives you some time to try to fix it as quickly as possible. Also, the management cluster is important here, because we said replicability: mostly these kinds of servers are always the same, so reinstalling should be very easy and fast, and since we are doing GitOps, we delete and we recreate. Okay, well, I have some more links, and thank you everyone.