Okay, my name is Raymond Long. I'm a Linux kernel engineer working on both the upstream kernel and the enterprise Linux kernel, and this talk is about a feature in the Linux kernel called cgroups, short for control groups. So what is a cgroup? A cgroup is a mechanism for grouping a set of related processes under one control group so that you can control the resources allocated to that set of processes running in the system. You can partition all the processes in the system into multiple groups and then assign different resources to each of those groups. That is what cgroups are for. Together with namespaces, a mechanism that exposes only a subset of the system's resources to a user process, they form the basis of what we do for containers as well as for virtual machines. Many kernel resources can be controlled by cgroups: the amount of CPU, the amount of memory, which I/O devices a specific container or virtual machine can use, the network, and the number of process IDs you can use in the system. So how do you use cgroups? From the kernel perspective, cgroups are controlled through an exported virtual filesystem: you write the desired values into its files to move processes into a cgroup and to set the resource limits you want for that cgroup. Each process running in the system is associated with one directory in the cgroup hierarchy. By default, when you start up a process, it is most likely in the root directory, the root cgroup. You can then use the mkdir command to create additional cgroups, move processes into them, and set a different set of resource limits for the processes that belong to each new cgroup.
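As a minimal sketch of the mkdir-and-write workflow just described, the Python below creates a child cgroup and moves a process into it by writing its PID to `cgroup.procs`. It assumes a cgroup v2 hierarchy mounted at `/sys/fs/cgroup` (a common default, not a given) and enough privilege to write there; the helper names `create_cgroup` and `move_process` are illustrative, not part of any library.

```python
from pathlib import Path

# Assumption: cgroup v2 is mounted at /sys/fs/cgroup (common, not guaranteed).
CGROUP_ROOT = Path("/sys/fs/cgroup")


def create_cgroup(name: str, root: Path = CGROUP_ROOT) -> Path:
    """Create a child cgroup directory under `root`.

    On a real cgroupfs, mkdir alone is enough: the kernel populates the
    new directory with its control files.
    """
    path = root / name
    path.mkdir(exist_ok=True)
    return path


def move_process(cgroup: Path, pid: int) -> None:
    """Move a whole process into `cgroup` by writing its PID to cgroup.procs."""
    # One PID per write; the kernel migrates all of the process's threads.
    with open(cgroup / "cgroup.procs", "w") as f:
        f.write(f"{pid}\n")
```

For example, `move_process(create_cgroup("demo"), os.getpid())` would move the current Python process into a hypothetical `demo` cgroup. Passing a different `root` makes the helpers easy to try against an ordinary directory.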
So these are just some examples of the virtual files that are available for the cpuset cgroup. Each cgroup type, cpuset for example, is what we call a controller, and there are many controllers for controlling different resources in the system. How do you use cgroups in practice? Basically, you can directly access the virtual control files of each cgroup to assign processes and set up all the resources. But in reality, you should use some abstraction layer, like the cgcreate and cgexec commands. Or, more commonly, the middleware layer provides some interface through which you control how the processes in the system are classified into different cgroups. So you seldom write directly into the cgroup control files, such as the ones for cpuset. Right now there are two versions of cgroups supported by the Linux kernel: version 1 and version 2. In kernel terminology, version 1 is what we call the legacy hierarchy, and version 2 is called the default hierarchy, because that is what the kernel engineers want to promote: the move from v1 to v2. V1 is mostly in maintenance mode. There may be a little bit of new feature work here and there, but most of the development activity focuses on cgroup v2. So if you want to use the new features provided by the cgroup subsystem, you will need to move to v2 eventually, because many of those new features may not be available in v1. Internally, the cgroup subsystem has a core set of functionality that is shared by all the different controllers, plus some controller-specific functionality. The controllers available in the kernel right now include one to control block I/O (blkio), the CPU controller, and cpuset, which decides which CPUs and which memory nodes a process can use. Then there are devices, freezer, hugetlb, and so on; there's a list on the slide.
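To check which cgroup version a system actually has mounted, one rough approach is to look at filesystem types in `/proc/mounts`: v1 hierarchies show up as type `cgroup` and the v2 hierarchy as type `cgroup2`. The sketch below, with hypothetical helper names, parses that text; on a hybrid system both lists can be non-empty.

```python
def cgroup_versions(mounts_text: str) -> dict:
    """Collect cgroup mount points from /proc/mounts-formatted text.

    Each line is "device mountpoint fstype options dump pass"; fstype
    "cgroup" means a v1 hierarchy and "cgroup2" the unified v2 hierarchy.
    """
    found = {"v1": [], "v2": []}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        if fstype == "cgroup":
            found["v1"].append(mountpoint)
        elif fstype == "cgroup2":
            found["v2"].append(mountpoint)
    return found


def read_local_versions(path: str = "/proc/mounts") -> dict:
    """Apply cgroup_versions() to this machine's mount table."""
    with open(path) as f:
        return cgroup_versions(f.read())
```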
I'm not going to go into each of them one by one, but if you have questions you can ask about them. Now let's talk about the hierarchy. Cgroups are managed like a directory tree: you have a root cgroup, and underneath it you can create child cgroups. Because it's a tree structure, you can have multiple levels of nesting within the cgroup hierarchy. For cgroup v1, each controller can have its own hierarchy. As a result, if you look at the cgroup directory on any system, you can see multiple subdirectories, one for each controller. But that comes with the flexibility of combining different v1 controllers: a process can be at level one in one cgroup hierarchy, while in another cgroup hierarchy it can be deep, at level two or even level three. That creates a lot of complexity that the controllers need to manage in order to know exactly which cgroup each process is in. And there are cases where one controller wants to coordinate with another one to provide some kind of joint feature that requires both. That's very hard to do in cgroup v1, because a process can be placed anywhere in two completely different hierarchies, two different trees, so it's very hard to coordinate between two controllers. This is where cgroup v2 comes in. The idea behind cgroup v2 is that there is only one unified hierarchy for all the controllers. A process sits at one point in the cgroup tree, and different controllers at the same position in the tree are able to work together, because they are guaranteed to share the same set of processes. One of the new features this enables is cgroup writeback. It controls writeback I/O according to the dirty memory each cgroup generates.
So it needs cooperation between the blkio and memory controllers, and the only way you can do this is with cgroup v2, because in v1 you just can't coordinate between two completely different trees. How do you use the unified hierarchy? Not all the cgroup v1 controllers are currently available in the unified hierarchy. Some of them may never be supported in v2 and will remain v1-only unless there is a strong request for it; some of them may not move at all. A given controller can be either a v1 controller or a v2 controller; you can't have the same controller in both. So when you start the system, you have to decide whether you want to use that controller as a v2 controller or as a v1 controller. You can't do both. By default, when the system boots up, system tools like systemd bind each controller to its own v1 hierarchy, and whatever is not mounted in v1 is available in v2 when you mount cgroup v2. But you can also use the boot command-line parameter cgroup_no_v1 to specify controllers that shouldn't be available in v1 and should be v2-only. This is one way to change how the system allocates controllers between v1 and v2. It's basically a transition mechanism. So right now you opt in with cgroup_no_v1, but in future kernels will it be no v1 by default, so you'll have to explicitly enable v1 if you need it, and then eventually it will be gone? Is that the idea? Well, the idea is that we want to eventually deprecate v1, but right now v1 is still very heavily used, so that's why it stays the default. If the system mounts a v1 controller, that controller is v1-only and whatever is left is available in v2, but you can override that using this option. Is v1 even considered deprecated yet? Not yet, but we are moving in that direction. So we should treat it as deprecated? Yeah, basically. Cgroup v2 also supports a concept of delegation.
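The `cgroup_no_v1=` boot parameter takes a comma-separated list of controller names, or `all`. The sketch below, a hypothetical helper that works on a `/proc/cmdline`-style string, shows which controllers that setting keeps out of v1. It only handles controller names and `all`, so treat it as illustrative rather than a full parser of the kernel's syntax.

```python
def v1_disabled_controllers(cmdline: str, known: set) -> set:
    """Return which of the `known` controllers cgroup_no_v1= keeps out of v1.

    Those controllers remain usable through cgroup v2 only.
    """
    disabled = set()
    for arg in cmdline.split():
        if not arg.startswith("cgroup_no_v1="):
            continue
        for token in arg.split("=", 1)[1].split(","):
            if token == "all":
                disabled |= set(known)
            elif token in known:
                disabled.add(token)
    return disabled
```

On a running system the input would typically come from `open("/proc/cmdline").read()`.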
Delegation means that a less privileged user is allowed to manage a subtree within the overall hierarchy. That feature allows a container that doesn't have privileges to manage the cgroups within its own subtree. It's not available in v1; v1 doesn't have this feature. That's one reason people are trying to move to v2: all these new features that are available. So I just gave a talk on delegation, actually. Can you give an example of something that an unprivileged or less privileged user can do with a cgroup that you couldn't do before? Maybe with a user namespace? Yeah. It's mostly used for, or within, user namespaces. The root user within that particular namespace doesn't need to have root privilege in the whole system, and you can use delegation to allow it to manage all the cgroup control files underneath its own tree. So does that support rootless operation? Yeah, something like rootless. Many of the control files can only be written by root or by a process with privilege, so in order for a non-root user to do it, you have to use the delegation mechanism. Okay. Unlike the legacy hierarchy, a controller in the unified hierarchy actually isn't enabled by default. You have to explicitly enable each controller, because you can have multiple controllers bound to the cgroup v2 subsystem and you don't want all of them enabled by default. To enable a particular controller, you have to write to the cgroup.subtree_control file. Subtree control means that all the child directories underneath will have that controller enabled. So the controllers are all available at the root cgroup, but not in any child cgroup unless you enable them in the parent's cgroup.subtree_control file. As a result, even if you have a hierarchy of cgroup directories, not all the controllers are enabled in all of them.
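Enabling controllers for child cgroups boils down to writing `+name` tokens into the parent's `cgroup.subtree_control`, and only controllers listed in the parent's `cgroup.controllers` can be enabled. This is a minimal Python sketch with an illustrative helper name; it assumes write access to the parent directory (root, or a delegated subtree).

```python
from pathlib import Path


def enable_controllers(parent: Path, controllers: list) -> str:
    """Enable `controllers` for the children of `parent`.

    A controller can only be enabled if it appears in the parent's own
    cgroup.controllers file. The request uses "+name" tokens ("-name"
    would disable a controller) and is written in a single write.
    """
    available = (parent / "cgroup.controllers").read_text().split()
    missing = [c for c in controllers if c not in available]
    if missing:
        raise ValueError(f"not available in {parent}: {missing}")
    request = " ".join(f"+{c}" for c in controllers)
    (parent / "cgroup.subtree_control").write_text(request)
    return request
```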
So you can manage which controllers are enabled in which cgroups. For example, you can enable a controller at the first level and not below it. When you enable the controller in the root's subtree control, both of these first-level cgroups, A and D, will have that controller enabled, but B and C, underneath A, won't by default. Processes within B and C are then governed by the nearest ancestor that has that controller enabled, here A. So you can think of A's controller as controlling all the processes within the A, B, and C directories. Thread mode. In cgroup v1, we manage processes by thread ID, so different threads of the same process can be in different cgroups of the same controller tree. That creates some complication in how you manage resources, because for some resources, like memory, all the threads within a process share the same memory space. So it doesn't make sense to manage each thread differently in terms of memory. For some controllers it is meaningful to have different threads with different resources or limits applied, but for controllers like memory it doesn't make sense to put different threads in different memory cgroups. So cgroup v2 has a feature called thread mode, which you have to enable explicitly, because cgroup v2 by default manages at the process level: you write the process ID, and all the threads within that process go to that cgroup directory. But there are controllers, like the CPU controller, that require thread-level control: different threads within the same process may need different CPU priorities or different CPU resources. So in order to allow a thread-level controller to be used in a mainly process-based environment, we created thread mode.
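The "nearest ancestor" rule from the A/B/C/D example can be written down as a small function. This is a toy model with made-up paths: it takes a mapping from each cgroup to the controllers enabled in its `cgroup.subtree_control` and walks upward until it finds the cgroup whose settings apply. It ignores threaded subtrees.

```python
from typing import Dict, Set


def parent_of(path: str) -> str:
    """Parent of a cgroup path like "/A/B"; the root's children map to "/"."""
    return path.rsplit("/", 1)[0] or "/"


def governing_cgroup(path: str, controller: str,
                     subtree_control: Dict[str, Set[str]]) -> str:
    """Return the cgroup whose `controller` settings apply to tasks in `path`.

    A non-root cgroup has a controller enabled only if its parent lists it
    in cgroup.subtree_control; otherwise the nearest ancestor that has it
    enabled (ultimately the root) is in charge.
    """
    while path != "/":
        parent = parent_of(path)
        if controller in subtree_control.get(parent, set()):
            return path
        path = parent
    return "/"
```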
You have to explicitly enable thread mode to be able to move different threads into different directories within the tree structure. So this is how you use thread mode. You can't enable thread mode in the root, but at level one or below you can. Once you enable it, that node is called the threaded domain. Underneath the threaded domain, you explicitly mark cgroups as threaded. Threaded means that only controllers that support thread mode, like the CPU controller and a few others, can be used there. Controllers like memory are not allowed in thread mode, so the memory controller stays at the threaded domain. You can't activate the memory controller in a cgroup that is designated threaded; only threaded controllers can be activated at that level. Underneath a plain domain cgroup, which just means one that is not threaded, you can keep nesting normally. Underneath a threaded directory, all your subdirectories have to be threaded as well for it to function. But underneath a domain, you can enable thread mode again and create additional threaded subdirectories underneath it. There are also some behavior differences between v1 and v2. First of all, the naming of the control files may differ between v1 and v2. The reason is that v1 controllers were developed independently by different developers over a period of time, so the naming conventions and the semantics of the control files differ between controllers. You really have to know what each control file is and how to use it; there is no consistent naming convention or semantics across all those control files. For v2, the current developers want a more consistent interface that looks and feels similar across the different controllers.
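In cgroup v2, a cgroup is made threaded by writing `threaded` to its `cgroup.type` file, and individual threads are moved by writing their thread IDs to `cgroup.threads` instead of `cgroup.procs`. The helpers below are a hypothetical sketch of those two writes, plus a check against the threaded controllers listed in the kernel's cgroup v2 documentation (`cpu`, `cpuset`, `perf_event`, `pids`); that list is an assumption that may change between kernel versions.

```python
from pathlib import Path

# Assumption: controllers documented as "threaded" in the kernel's cgroup v2
# docs; the set may differ on other kernel versions.
THREADED_CONTROLLERS = {"cpu", "cpuset", "perf_event", "pids"}


def make_threaded(cgroup: Path) -> None:
    """Turn a non-root cgroup into a threaded cgroup.

    Its nearest non-threaded ancestor then acts as the threaded domain.
    """
    (cgroup / "cgroup.type").write_text("threaded")


def move_thread(cgroup: Path, tid: int) -> None:
    """Move a single thread (by TID) rather than a whole process."""
    (cgroup / "cgroup.threads").write_text(f"{tid}\n")


def non_threaded(controllers: list) -> list:
    """Controllers from the request that cannot be enabled in a threaded cgroup."""
    return sorted(set(controllers) - THREADED_CONTROLLERS)
```

In Python, `threading.get_native_id()` returns the calling thread's TID, which is the value `move_thread` expects; the thread's process must already belong to the same threaded domain.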
So they apply stricter control over how each controller's files are named. Also, over time a lot of features were added to v1 controllers that may not have many users or may not be that useful, and those features were dropped from v2. So v2 provides a simplified interface that is more consistent and supposedly easier to use than v1. Now let's talk about how to migrate from v1 to v2. It's not an easy process, because the names of the control files differ and the functionality may also differ a bit between v1 and v2, so you have to rewrite parts of your application if you want to use v2. Upstream is working on that: the userspace tools and system libraries are adding cgroup v2 support. They will probably provide one version for v1 and another version for v2, so applications can choose which one they want to use: the v1 library with cgroup v1, or the v2 library with cgroup v2. And as I said before, there are some controllers that are not core to cgroups and are not being supported in v2. Instead, we advise people to use other mechanisms to provide similar features, like eBPF programs. Alternatively, they can still use them as v1 controllers. Practically speaking, is that going to keep v1 cgroups around, so you can say everything I want to do is in v2, except I need net_cls? Well, you can continue using those controllers in v1. So it's not a case where you just can't switch the v2 stuff on? Well, a lot of these are very special use cases. In fact, most of the middleware libraries don't use those controllers at all. hugetlb, for example, isn't used by them; it's used by the applications themselves, mostly enterprise applications like Oracle Database. Those are the...
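Porting a v1 configuration usually means translating both file names and value ranges. The sketch below shows a few correspondences and two conversions: folding `cpu.cfs_quota_us`/`cpu.cfs_period_us` into a single `cpu.max` line, and mapping `cpu.shares` (range 2 to 262144) onto `cpu.weight` (range 1 to 10000) with the linear formula that runc has used. The table is deliberately incomplete, value semantics can differ between the paired files, and other tools may use a different shares-to-weight mapping.

```python
# A few v1 -> v2 file-name correspondences (illustrative, not exhaustive).
V1_TO_V2_FILES = {
    "cpu.shares": "cpu.weight",
    "cpu.cfs_quota_us": "cpu.max",
    "cpu.cfs_period_us": "cpu.max",
    "memory.limit_in_bytes": "memory.max",
    "blkio.weight": "io.weight",
    "cpuset.cpus": "cpuset.cpus",
    "pids.max": "pids.max",
}


def shares_to_weight(shares: int) -> int:
    """Linearly map v1 cpu.shares [2, 262144] onto v2 cpu.weight [1, 10000].

    This follows the conversion runc has used; 0 is treated as "unset".
    """
    if shares == 0:
        return 0
    return 1 + ((shares - 2) * 9999) // 262142


def cfs_to_cpu_max(quota_us: int, period_us: int) -> str:
    """Combine v1 cpu.cfs_quota_us and cpu.cfs_period_us into one cpu.max line.

    A negative v1 quota (-1) means "no limit", which v2 spells "max".
    """
    quota = "max" if quota_us < 0 else str(quota_us)
    return f"{quota} {period_us}"
```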
That's quite a big world, though. Yeah. So what's going on with that? Well, there just hasn't been any customer request to do it in v2, and that's why those controllers stay there. But that leaves them locked into only having v1, right? In many cases, if you are running those enterprise applications, you probably want them to control the whole box, because cgroups are a way of partitioning the system, and the application can manage resources itself without using cgroups at all. So it's probably a legacy feature that we provide. I don't know how many people are actually using it, but I think it's not that many. Yeah, until people complain. Yeah, this is usually how we move forward: if no one complains, we move forward; if someone complains, we see what we can do to satisfy the demand. Okay. There are some issues with the unified hierarchy, cgroup v2. It does solve some of the problems associated with cgroup v1, but it also introduces some of its own. One of the problems is that you no longer have the freedom of using different controllers in completely different hierarchies; you only have one hierarchy. So you have to plan how you want to structure the hierarchy and move things there, which requires more thinking and planning from the user. And of all the cgroup controllers, the CPU controller has some performance implications with nesting: the more levels you nest, the more performance impact you may see. The reason is the way scheduling works. If you use the CPU controller, each cgroup has its own set of run queues. At the root level, the scheduler chooses what we call a scheduling entity to run, but that entity may not be an actual process; it can be a subtree of processes. So once it has selected the entity, it has to go down level by level until it finds a process it can run.
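To see why nesting costs time, here is a deliberately simplified toy model (not kernel code) of the descent the scheduler performs: starting at the root run queue, it keeps picking a group entity until it reaches a task. Real CFS picks the entity with the smallest virtual runtime at each level and does per-level accounting on top; this sketch just takes the first child and counts the levels walked. The `Entity`, `pick_next`, and `nested` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entity:
    """Toy scheduling entity: a task (no children) or a group with children."""
    name: str
    children: List["Entity"] = field(default_factory=list)

    @property
    def is_task(self) -> bool:
        return not self.children


def pick_next(rq_root: Entity) -> Tuple[str, int]:
    """Walk down from the root until a task is reached.

    Returns the task name and how many levels were descended. Real CFS
    picks the entity with the smallest vruntime at each level; this toy
    model just takes the first child.
    """
    entity, levels = rq_root, 0
    while not entity.is_task:
        entity = entity.children[0]
        levels += 1
    return entity.name, levels


def nested(depth: int) -> Entity:
    """Build: root run queue -> `depth` nested cgroups -> one task."""
    entity = Entity("task")
    for i in range(depth):
        entity = Entity(f"cg{depth - i}", [entity])
    return Entity("root", [entity])
```

In this model a task directly under the root is reached in one step and a task three cgroups deep in four, so every pick pays once per level, which is roughly the effect described in the talk.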
That takes time, and that's why there are some performance implications with several levels of hierarchy. There are developers working upstream to reduce the performance degradation associated with the CPU controller, but it's still a work in progress and we don't know how much it will improve. I posted a patch upstream for what I called a bypass mode to work around that problem with the CPU controller, but the upstream scheduler maintainers would rather do it the other way, basically reducing the performance degradation of the CPU controller itself instead of adding a new mode to work around the problem. Now let's talk about the cgroup v2 core. These are the control files associated with the v2 core; they are all prefixed with "cgroup.". There is a cgroup.controllers file, which lists the controllers available for that particular directory. Then there's an events file, cgroup.events. I'm not going to go over all of them; the slides will be posted on the conference website so you can read them when you have time. Now let's talk about each controller, starting with the cgroup v2 CPU controller. This controller manages how much CPU time is allocated to the processes within a particular cgroup. It does that using a weight, and there's also a max file to control the bandwidth allowed to the processes within the cgroup. Basically, if you have two processes with different weights, the one with the higher weight will be allocated more CPU time than the one with the lower weight. You can also control it by allocating bandwidth, that is, how much of the total time the cgroup may use. For bandwidth there's a period, which defaults to 100 milliseconds, and a quota within each period. If you allocate a bandwidth of, let's say, 50%, then out of each 100-millisecond period you are only allowed to run for 50 milliseconds.
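The two knobs can be illustrated with a little arithmetic. When sibling cgroups are all busy, each gets its `cpu.weight` divided by the sum of the siblings' weights (the default weight is 100, range 1 to 10000), and a bandwidth limit is written to `cpu.max` as `quota period` in microseconds. The helper names below are illustrative; the 100 ms default period matches the kernel's default.

```python
def weight_shares(weights: dict) -> dict:
    """Ideal CPU fraction per sibling cgroup when all of them are busy:
    each cpu.weight divided by the sum of the siblings' weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def cpu_max_line(fraction: float, period_us: int = 100_000) -> str:
    """Build a cpu.max value granting `fraction` of one CPU per period.

    A fraction above 1.0 grants more than one CPU's worth of time per
    period, which is how a cgroup is allowed to use several CPUs.
    """
    quota = round(fraction * period_us)
    return f"{quota} {period_us}"
```

For the 50% example, `cpu_max_line(0.5)` yields `"50000 100000"`: 50 ms of runtime in every 100 ms period.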
The rest of the time can be used by processes in other cgroups. Then there's the cpuset controller. The cpuset controller controls which CPUs and which memory nodes the processes within a cgroup are allowed to use. The cpuset controller is very useful; it is used by all the container management tools, as well as by virtual machine managers, to control which CPUs and memory nodes are allocated to each container or virtual machine. You can partition a system into different sets of CPUs and memory nodes and assign them to different cgroups. Cgroup v2 also has a new feature called partitions that lets you actually partition the CPU resources so that scheduling happens within those CPUs only, which affects the way the CPU scheduler works. Then we have the cgroup v2 memory controller. This controller manages how much physical memory may be used by the processes within a cgroup. I want to distinguish between physical memory and virtual memory: a process within a cgroup can have a very big virtual address space, but only a small portion of it is actually backed by physical memory. The memory controller controls how much physical memory is actually used, not virtual memory. The way the memory controller works is that you can specify the high memory limit, memory.high, which is the amount of physical memory the processes within the cgroup are allowed to use. Once you reach that limit, the kernel kicks in memory reclaim to try to get memory back from those processes. Then there is another control file, memory.max, which specifies the maximum amount of physical memory you can reach before something bad happens. What I mean by that is an OOM: the memory controller's OOM killer is activated and tries to kill some of the processes so that it can get that memory back.
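Both memory thresholds are plain files in the cgroup directory, accepting a byte count or `max`. The sketch below, with a hypothetical helper name, writes `memory.high` and `memory.max` and refuses a `memory.high` above `memory.max`, a sanity check added here rather than something the kernel enforces. It handles only plain integers and `max`, not suffixed values.

```python
from pathlib import Path
from typing import Union

Limit = Union[int, str]  # a byte count, or the literal "max" for "no limit"


def _exceeds(a: Limit, b: Limit) -> bool:
    """True if limit `a` is strictly larger than limit `b`."""
    if b == "max":
        return False
    if a == "max":
        return True
    return int(a) > int(b)


def set_memory_limits(cgroup: Path, high: Limit, max_: Limit) -> None:
    """Write memory.high (reclaim threshold) and memory.max (hard limit,
    past which the OOM killer may run) for one cgroup."""
    if _exceeds(high, max_):
        raise ValueError("memory.high above memory.max would never take effect")
    (cgroup / "memory.high").write_text(f"{high}\n")
    (cgroup / "memory.max").write_text(f"{max_}\n")
```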
But an OOM kill in the memory controller is never a good thing, because there is little control over which process will actually get killed, so you want to avoid it as much as possible. There are also control files relating to how much swap space you can use, and things like that. Then there is the io controller. One thing you'll notice is that the naming convention across different controllers is more consistent than in v1: you have max, weight, and stat files, and those control files appear in many of the v2 controllers. So once you see the name, you have a rough idea of what the file is and how to use it, even though the controllers manage different resources in the system. The final one I want to talk about is the cgroup namespace. The cgroup namespace provides a mechanism to virtualize a process's view of the cgroup hierarchy. If you are in a container, you don't want it to see the cgroup tree above its own level, so you can use a cgroup namespace to do that. The cgroup namespace is also used with the delegation mechanism, so you can delegate management of that particular subtree to a normal process. Okay, about v2 support in RHEL. We are not going to support cgroup v2 in RHEL 7, which is now entering maintenance mode, because including it would require kernel kABI changes, which we don't want. In RHEL 8, some of the v2 controllers are not enabled because they weren't merged upstream in time. So RHEL 8.0 will only provide cgroup v2 as a technology preview. But the cpuset and other v2 controllers should be available in 8.1, and we are hoping to fully support v2 in the 8.2 time frame if possible. But that also depends on whether the other layers are able to support it. From the kernel perspective it's almost done; all the needed controllers are there. But we also need support in the middleware and the higher levels before we can claim official support. Looking forward, most people are still using cgroup v1 today.
There are some companies using v2, like Facebook. But the trend is that we are trying to move people from v1 to v2, and it will take a while, maybe a few years, maybe ten, maybe more; I don't know, it depends on how willing people are to move. So what I'm thinking is that for a considerable period of time both v1 and v2 will be supported, and until we are sure that most users have migrated to v2, we will keep supporting v1. But most of the time it will be in maintenance mode: if there are any bugs, we will try to fix them, but we are probably not going to add new features to cgroup v1. Okay, that's the end of my presentation. Do you have any questions? Yeah. You talked about deep CPU cgroup trees having a performance impact. Does that happen in both v1 and v2? Yes, it's the same. I guess I was wondering: if you do that, does it just impact the subtree, or does it impact the entire operating system's performance? Well, you lose some performance if you have to nest many levels deep. It's mostly for the processes under those cgroup trees. If you have a process at the root level, the scheduler manages it like any other, and if you are allotted, let's say, 3% of the CPU time, you will get 3%. But there is some overhead for scheduling; the scheduler itself has overhead, and the more levels of nesting you have in the CPU controller, the more overhead it consumes. So overall, the system performance will be less than what you expect. So you're saying you want to wait for people to migrate to v2. Given that v1 is such a train wreck, isn't waiting and not pushing people a disaster? I mean, I would argue that it's an absolute catastrophe that RHEL 8 doesn't ship with v2 enabled. Well, we are trying to push people forward. Right now, for RHEL 8, the default is cgroup v1. At some future date we will change the default to v2.
But you can explicitly change the default back to v1 in userspace. That's our way to move users forward: by changing the default. Yeah, we agree. One of the reasons we have cgroup v2 at all is that we don't want to break the kernel kABI; we don't want to break existing applications using v1. We couldn't change v1 in a way that provided the new features we needed without breaking existing applications, and that's why we had to create a separate version of control groups. Now we are trying to move people over from v1 to v2. It will take a while, yeah. Time's up. Okay, okay. This gentleman had a question. You mentioned that there was a year-and-a-half discussion to decide between a per-thread and a per-process model. Yes. Do you feel personally that they reached the best decision, or was it an unhappy compromise just to move forward? Yes, kind of. There were some differences of opinion among upstream developers about how the v2 CPU controller should work, and they ended up in a deadlock: they couldn't reach a compromise, and that stalled the whole process for about a year and a half. Finally, I proposed some changes, and they reached a compromise on what the current model is. It certainly is a compromise, but at least everyone feels it is acceptable, and there's a way to move forward. Okay.