Hello, everyone. My name is Rui Huang from the AMD Linux team, and I have focused on multiple components in the Linux kernel, including CPU power management and GPU DRM graphics/compute support, for about 10 years. This is my first time presenting a talk at LinuxCon / Open Source Summit, so it is my great honor to have the opportunity to talk about the new CPU frequency control mechanism on Linux here with all of you. In fact, it has been a long time since we started preparing this solution last year. We began to upstream the kernel module in September, went through 7 rounds of public upstream code review, and around earlier this year the initial patch set was finally accepted into kernel 5.17. Today, I would like to walk through the overall design and implementation of this solution as a technical sharing. Thank you so much for being here and listening to my introduction, and feel free to contact me if you have any questions.

Okay, moving to the background. This function was requested by the Chrome OS project for the Cezanne chips, which are the first with MSR-based support for Collaborative Processor Performance Control, let's say CPPC, because the Chromebook team would like to have better performance-per-watt scaling in their new products. At the same time, while we were working with Valve's engineers to enable the Steam Deck on the AMD platform last year, we were tuning Steam games' performance with Vulkan/D3D under Proton. They found two issues on the AMD processors: one was incorrect CPU frequency reporting, and the other was slow motion in Horizon Zero Dawn. We spent a long time investigating them and found quite a few problems with the ACPI CPUFreq driver on AMD processors. So we thought the ACPI CPUFreq driver might not be very performant or power-efficient on modern CPU platforms.

Okay, let's talk about the existing kernel frequency control. As we know, the Linux kernel provides the CPUFreq framework, which uses kernel governors such as ondemand and schedutil as the policies to control CPU clocks. Many years ago, Intel implemented an ACPI-based CPUFreq driver, and it is absolutely a very good driver that provides a general solution on legacy Intel processors before Sandy Bridge. They then switched to the Intel-specific intel_pstate driver for recent CPU series. However, current AMD CPU platforms are still using the ACPI-based CPUFreq driver to manage CPU frequency and clocks, switching among only three P-states. The ACPI-based CPU frequency driver was developed based on Intel platforms, which brings some potential issues on AMD processors. The AMD hardware also provides firmware-based dynamic power management of the CPU clocks, which also controls the processor frequency; we call it DPM. The ACPI CPUFreq driver is not aware of the AMD CPU clock DPM in the firmware, so the two can conflict and impact the final target frequency. So we planned to design and implement a new AMD CPU frequency control proposal. This proposal is expected to handle hardware functionality such as the AMD SMU firmware and ACPI CPPC. The AMD processor provides MSR registers as the back-end mailbox for frequency control, and the MSR is a low-latency register model that is faster than the ACPI AML code interpreter. Above all, we decided to implement a new amd_pstate driver instead of the ACPI CPUFreq driver for AMD platforms. It is able to use a fine-grained CPPC frequency range instead of the traditional three ACPI P-states to control the CPU frequency.
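To make the MSR mailbox idea a bit more concrete, here is a small user-space sketch that reads the CPPC capability register through the msr driver and decodes the four capability levels. The MSR index and the byte-per-field layout are assumptions based on my reading of the amd_pstate patches, not something this talk specifies; the real driver does this inside the kernel.

```c
/* Sketch: decode the CPPC capability MSR on an AMD CPPC-capable core.
 * Assumes the msr kernel module is loaded and the program runs as root.
 * The MSR index and bit layout are assumptions based on the amd_pstate
 * patches, not guaranteed by this talk. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MSR_AMD_CPPC_CAP1 0xC00102B0 /* assumed capability register index */

int main(void)
{
    uint64_t cap1;
    int fd = open("/dev/cpu/0/msr", O_RDONLY);

    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return EXIT_FAILURE;
    }
    if (pread(fd, &cap1, sizeof(cap1), MSR_AMD_CPPC_CAP1) != sizeof(cap1)) {
        perror("pread");
        close(fd);
        return EXIT_FAILURE;
    }
    close(fd);

    /* Assumed layout: one byte per capability field, lowest byte first. */
    printf("lowest_perf:           %u\n", (unsigned)(cap1 & 0xff));
    printf("lowest_nonlinear_perf: %u\n", (unsigned)((cap1 >> 8) & 0xff));
    printf("nominal_perf:          %u\n", (unsigned)((cap1 >> 16) & 0xff));
    printf("highest_perf:          %u\n", (unsigned)((cap1 >> 24) & 0xff));
    return EXIT_SUCCESS;
}
```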
We can even leverage kernel governors such as schedutil to predict the workload and calculate reasonable desired performance values from the Linux CFS scheduler. We mainly use the schedutil governor to optimize the solution and pass the hints to the SMU CPU clock DPM, which calculates the final target frequency.

This slide explains the main method of frequency control in the firmware. The formula comes from the SMU firmware, and it is the algorithm the SMU firmware uses to adjust the frequency at a low level. I probably cannot go into the details of the algorithm, but we can see that the target frequency is mainly decided by the activity factor here. The activity in fact indicates the maximum C0 residency during the DPM cycle. That means the input is decided by the CPU idle driver in the Linux kernel. It is very similar to the ondemand governor in the Linux kernel, and I will expand on this later in the discussion.

These four performance values are defined as the amd_pstate performance capability entries. They are performance levels which map to different processor frequencies, and different processor types have different frequencies. The amd_pstate solution introduces continuous performance scaling between the highest perf and the lowest perf. The highest perf is similar to the boost state from before, the nominal perf is similar to the P0 state, and the processor frequency can be dynamically adjusted within this range.

Another four performance values are defined as the amd_pstate performance control entries. These are performance levels which map to different processor frequencies as well. Users can assign the performance range as hints to the hardware, and the hardware controls the real-time frequency according to the hints that the amd_pstate driver provides. The fourth value is different: it provides a hint on whether the driver wants to bias towards performance or energy efficiency (EPP). We are still working on this feature at the moment and will continue upstreaming the support once it is verified.

The kernel has several governors, such as ondemand, performance, powersave, userspace, and schedutil, to control the general frequency changes for most processors. Performance and powersave statically set the highest and lowest performance goals for the processor cores. The ondemand governor is mainly used on most of the current processors, including the modern AMD Zen series CPUs; this governor sets the CPU frequency depending on the current system workload. The schedutil governor provides better integration with the Linux kernel scheduler, and its load estimation is achieved through the scheduler's Per-Entity Load Tracking (PELT) mechanism, which also provides information about the recent workload.

The AMD CPUFreq design is to create a new kernel module, named amd_pstate, under the CPUFreq subsystem. This module can access the CPPC-related MSR registers, which are provided on some of the Zen 3 and later AMD processors that include the CPPC functionality. The SMU firmware detects the hints that amd_pstate passes through the MSR registers and applies the requests to the hardware to control the CPU clocks according to the performance goals from the amd_pstate module. The governors estimate CPU usage statistics over the last period, which is similar to the SMU firmware method of monitoring C0 residency, and then decide whether the frequency should be increased or decreased. However, these kinds of CPU statistics are mainly influenced by the ACPI processor idle driver in the Linux kernel.
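As a rough illustration of how the capability and control entries just described fit together, here is a small sketch that converts a governor's target frequency into a desired-performance hint and packs a CPPC request word. The linear perf-to-frequency mapping around the nominal point and the one-byte-per-field request layout are my assumptions for illustration, not the driver's exact code, and the numbers in main() are made up.

```c
/* Sketch: map a target frequency to a CPPC desired-performance hint and
 * pack a request word.  The linear perf<->frequency mapping and the request
 * field layout are assumptions for illustration. */
#include <stdint.h>
#include <stdio.h>

struct cppc_caps {
    uint8_t lowest_perf;
    uint8_t lowest_nonlinear_perf;
    uint8_t nominal_perf;        /* perf level delivered at nominal_freq_khz */
    uint8_t highest_perf;        /* boost-range perf level */
    uint32_t nominal_freq_khz;
};

/* Convert a target frequency into a desired perf level, clamped to the
 * advertised capability range. */
static uint8_t freq_to_des_perf(const struct cppc_caps *c, uint32_t target_khz)
{
    uint32_t perf = (uint64_t)target_khz * c->nominal_perf / c->nominal_freq_khz;

    if (perf < c->lowest_perf)
        perf = c->lowest_perf;
    if (perf > c->highest_perf)
        perf = c->highest_perf;
    return (uint8_t)perf;
}

/* Pack max/min/desired perf and the EPP hint into one request word
 * (assumed one byte per field). */
static uint64_t pack_cppc_request(uint8_t max, uint8_t min, uint8_t des, uint8_t epp)
{
    return (uint64_t)max | ((uint64_t)min << 8) |
           ((uint64_t)des << 16) | ((uint64_t)epp << 24);
}

int main(void)
{
    struct cppc_caps caps = { 20, 40, 120, 166, 2800000 }; /* example values */
    uint8_t des = freq_to_des_perf(&caps, 3600000);        /* 3.6 GHz target */

    printf("desired perf = %u, request = 0x%016llx\n", (unsigned)des,
           (unsigned long long)pack_cppc_request(caps.highest_perf,
                                                 caps.lowest_nonlinear_perf,
                                                 des, 0));
    return 0;
}
```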
The AMD CPU uses the MWAIT/FFH-style transition to move the CPU from C0 to C1. The current Linux kernel mainly uses the Completely Fair Scheduler, which we call CFS, to manage process scheduling. Because the scheduler is aware of all the processes, including kernel and user space execution status, the CFS scheduler provides the Per-Entity Load Tracking method, which we call PELT, to track the real workload in real time. Schedutil works based on DVFS for the tasks managed by CFS; DVFS is dynamic voltage and frequency scaling. Real-time and deadline scheduler tasks always run at the highest frequency.

This is the user-mode cpupower tool, which manages the CPUFreq subsystem's running status; we also added amd_pstate and ACPI CPPC library support in this tool. After that, we also introduced the amd_pstate tracer in the Linux kernel. It is very useful for monitoring the processor's running status and can be used for performance and power tuning on these platforms. The amd_pstate driver is brand new but can also work on some older processors, so we introduced a unit test suite into the kernel selftests as well, to make sure it functions well with the current system BIOS and firmware on the supported platforms.

As we know, AMD processors are controlled by the SMU firmware on both the CPU and GPU portions. Each AMD CPU and GPU uses the SMU DPM, which we call the dynamic power management function, to control the CPU, graphics, and data fabric clocks. The arbiter in the SMU firmware is the state machine that manages the clock control. Please take a look at the two pictures showing the programming sequences: one is how to get the CPPC capability data of the highest, nominal, lowest nonlinear, and lowest performance and expose it through the ACPI interface or MSR registers; the other is the programming sequence for responding to the CPPC request hints, such as the max, min, desired performance, and EPP values.

These are the dependency components that we need to take care of for the amd_pstate driver. The first is the CPU idle driver. The AMD CPU uses the MWAIT/FFH-style transition to move the CPU from C0 to C1. The CPU idle framework actually manages the C0 residency percentage in the system; as in the slide that I just mentioned, the residency indicates the activity factor in the SMU firmware, and that is the key factor impacting the final target frequency. So the SMU firmware-based frequency control also depends on the OS kernel's CPU idle driver. Another component is the kernel governors in the CPUFreq subsystem. As suggested by the kernel maintainers, we are trying to optimize the schedutil and ondemand governors for the amd_pstate driver. For example, schedutil can predict the workload directly, which is more efficient in theory, but we still need to do more tuning on real hardware, and we still need to tune the new CPUFreq driver on our existing AMD platforms.

The Linux CPUFreq core offers a standardized interface for the CPUFreq subsystem. We use these interfaces to implement a new kernel CPUFreq driver, named amd_pstate. The amd_pstate module replaces the legacy acpi-cpufreq module and implements the CPPC function on AMD platforms, and it manages CPU frequency and performance through the amd_pstate APIs with the multiple Linux kernel governors.
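To make the schedutil relationship more concrete, here is a small sketch of the utilization-to-frequency step that schedutil-style governors perform on top of PELT, plus a straight utilization-to-perf mapping of the kind a fast-switch path could use. The 25% headroom factor mirrors schedutil's usual policy, but the perf mapping and the sample numbers are illustrative assumptions, not the kernel's exact code.

```c
/* Sketch: schedutil-style frequency selection from PELT utilization,
 * followed by a linear mapping onto the CPPC perf range.  The 25% headroom
 * mirrors schedutil's usual policy; the perf mapping is an assumption. */
#include <stdint.h>
#include <stdio.h>

/* next_freq = 1.25 * max_freq * util / max_capacity */
static uint32_t schedutil_next_freq(uint32_t max_freq_khz,
                                    uint32_t util, uint32_t max_capacity)
{
    uint64_t freq = (uint64_t)(max_freq_khz + (max_freq_khz >> 2)) * util
                    / max_capacity;
    return freq > max_freq_khz ? max_freq_khz : (uint32_t)freq;
}

/* Map utilization straight onto the perf scale, as a fast-path hint. */
static uint8_t util_to_des_perf(uint8_t highest_perf,
                                uint32_t util, uint32_t max_capacity)
{
    uint32_t perf = (uint64_t)highest_perf * util / max_capacity;
    return perf > highest_perf ? highest_perf : (uint8_t)perf;
}

int main(void)
{
    /* Example: ~55% utilization on a core whose max is 4.2 GHz / perf 166. */
    printf("next_freq = %u kHz\n", schedutil_next_freq(4200000, 563, 1024));
    printf("des_perf  = %u\n", (unsigned)util_to_des_perf(166, 563, 1024));
    return 0;
}
```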
We use two different processor identifiers to know whether the processor supports the amd_pstate module or not. One is the ACPI _CPC object in the ACPI table from the system BIOS, and the other is the CPPC CPUID feature flag; the CPUID is used for checking the MSR support on the current processor. The amd_pstate driver implements a cpudata structure to store the private AMD-specific information and the function callbacks. We implement the amd_pstate register write and read helpers for the capability and request interfaces. This driver also implements a frequency QoS request instance for the frequency constraints; the constraint structure sets the limit range between the maximum and minimum frequency, which is registered into the Linux PM QoS framework. QoS means quality of service. We also wrote the CPU performance scaling driver documentation in reStructuredText in the kernel documentation, which introduces the driver's details; please feel free to access this link to learn the details, and we will maintain this documentation as well.

Let's continue to look at the amd_pstate driver. This module is the amd_pstate driver instance that implements the general operations in the callback functions. It uses a relative performance and scaling control that is dedicated to the amd_pstate interfaces on AMD processors, such as the target and adjust_perf frequency update functions. The target is the traditional callback, which is used for the ondemand and schedutil governors, and adjust_perf is the fast-switch-style function that is only used by the schedutil governor. In fact, they all map to functions in the governors, so the kernel governors can manage the P-state performance hints through the amd_pstate module for each processor core (see the skeletal sketch below). In the future, we plan to implement kernel parameters to switch between the different frequency management policies and customize the management policies for specific products. We will also implement EPP (energy performance preference) control, preferred core, and fast CPPC. I would also like to discuss further support with the Linux power management developers at the Linux Plumbers Conference as well.

There are two static trace events that can be used for amd_pstate diagnostics. One of them is the cpu_frequency trace event, which is generally used by CPUFreq; we can use these traces to monitor the real-time CPU frequency of each logical core. The other is the amd_pstate_perf trace event, which is specific to the amd_pstate driver; it can monitor all the performance goals that amd_pstate programs to the hardware side at runtime. This is an example of the trace event output from the amd_pstate module, which is definitely very useful debugging information for performance tuning. Then we can leverage the trace events in the Linux kernel to develop a user-space tracer tool. The tracer tool is a Python script that can record and parse the trace logs. This feature is used to check the latency between when we have a workload and when the CPU frequency is increased. You can look at this picture: when we have a workload, the frequency is increased; once the application is gone, the workload is also gone, and then the frequency goes down to a very low level. We can monitor the runtime data every time we write a performance goal into the hardware registers, and at the same time we can also check the runtime CPU frequencies; please see this kind of data from the Excel file. This is the link to the tool for the amd_pstate tracer's detailed implementation.
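Here is the skeletal sketch of the target and adjust_perf callbacks mentioned above, shaped roughly like the kernel's cpufreq_driver interface. All demo_* names, the MSR index, the request field layout, and the toy perf mappings are made-up assumptions for illustration; the real amd_pstate code is more involved, and the exact struct fields and signatures should be checked against the target kernel version.

```c
/* Heavily simplified sketch of how a CPUFreq driver could wire the two
 * update paths (target vs adjust_perf) into the cpufreq core.  Everything
 * named demo_* is hypothetical; do not treat this as the amd_pstate code. */
#include <linux/cpufreq.h>
#include <linux/module.h>
#include <asm/msr.h>

#define DEMO_CPPC_REQ_MSR 0xC00102B3 /* assumed request register index */

/* Pack max/min/desired perf into the request word (assumed layout). */
static u64 demo_pack_req(u8 max, u8 min, u8 des)
{
	return (u64)max | ((u64)min << 8) | ((u64)des << 16);
}

static int demo_init(struct cpufreq_policy *policy)
{
	policy->cpuinfo.min_freq = policy->min = 400000;  /* example limits, kHz */
	policy->cpuinfo.max_freq = policy->max = 4200000;
	policy->fast_switch_possible = true;              /* allow the fast path */
	return 0;
}

static int demo_verify(struct cpufreq_policy_data *policy)
{
	cpufreq_verify_within_cpu_limits(policy);
	return 0;
}

/* Traditional path: governors such as ondemand hand us a target frequency. */
static int demo_target(struct cpufreq_policy *policy,
		       unsigned int target_freq, unsigned int relation)
{
	u8 des = (u64)target_freq * 166 / policy->cpuinfo.max_freq; /* toy mapping */

	return wrmsrl_on_cpu(policy->cpu, DEMO_CPPC_REQ_MSR,
			     demo_pack_req(166, 20, des));
}

/* Fast path: schedutil hands us utilization-derived perf values directly. */
static void demo_adjust_perf(unsigned int cpu, unsigned long min_perf,
			     unsigned long target_perf, unsigned long capacity)
{
	u8 des = (u64)target_perf * 166 / capacity;                 /* toy mapping */

	wrmsrl_on_cpu(cpu, DEMO_CPPC_REQ_MSR,
		      demo_pack_req(166, (u8)min_perf, des));
}

static struct cpufreq_driver demo_pstate_driver = {
	.name        = "demo-pstate",
	.init        = demo_init,
	.verify      = demo_verify,
	.target      = demo_target,
	.adjust_perf = demo_adjust_perf,
};

static int __init demo_pstate_init(void)
{
	return cpufreq_register_driver(&demo_pstate_driver);
}
module_init(demo_pstate_init);

static void __exit demo_pstate_exit(void)
{
	cpufreq_unregister_driver(&demo_pstate_driver);
}
module_exit(demo_pstate_exit);

MODULE_LICENSE("GPL");
```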
amd_pstate exposes several global attribute files in sysfs to control its functionality at the system level. These attribute files are located under the CPUFreq subsystem; this is part of the sysfs interface. They indicate the highest, nominal, lowest nonlinear, and lowest processor performance values and frequencies. These APIs are used by the cpupower tool; in the amd_pstate driver, we only implement the frequency and performance attributes that are not already included in the CPPC library.

This is the cpupower tool and library. In this picture, we introduce the cpupower tool and turbostat support. The cpupower tool and library are user-space tools to query and access the CPUFreq APIs. This tool is recommended by the Linux community and widely used in the Linux world as the processor power configuration application. We added two components: the amd_pstate performance and frequency level helpers, and the amd_pstate sysfs operation handling. We also added general ACPI CPPC sysfs APIs into this library. The user can use the cpupower tool to manage the amd_pstate module. Turbostat will be the next step we would like to support, for displaying the frequency, power consumption, idle status, and so on; it is definitely a very famous tool in the community.

Next we will introduce the unit test support. The UT is a unit test module to test the amd_pstate driver's functionality. It is for verifying the support of the system BIOS and firmware on existing CPUs. You know the amd_pstate driver is new, but it can run on many older AMD CPU platforms, so we would like to have a unified test baseline that can run on all the CPU platforms that support CPPC. We will also add more automated CPU benchmark testing, such as tbench and gitsource, into the test module. It leverages the kernel selftest framework, so each developer and user can easily collect performance data on their platforms, and it is also easy to test all the existing CPU platforms with CPPC functionality. So it is definitely a very useful tool, and we will add more test suites and test steps into the kernel repo.

We are using RAPL, the running average power limit interface, to calculate the energy of the current CPU package power, and we are using the perf tool to monitor the energy consumption while we are running any CPU benchmarks. The APIs are already implemented in the latest kernel for all the Zen-based processors. This approach can be used for performance-per-watt testing on the CPU platforms, and this is the guideline formula with which we calculate the performance-per-watt scaling for each benchmark test. The community provides a paper explaining how we can test and measure the performance-per-watt data; I don't want to go into too much detail here, but if you have any interest, we can discuss it offline.

Okay, we picked the AMD Cezanne processors to do some performance-per-watt testing on multiple CPU benchmarks. This is the tbench test result; these are the average data. amd_pstate got a 3% drop with schedutil and a 2% improvement with the performance governor. These are the averages over 15 runs and the charts here, and this is the test environment for this benchmark. We also provide the tbench test suite in the amd_pstate UT, so users can verify it on their platforms at the same time.
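The guideline formula for performance-per-watt scaling mentioned a moment ago is on the slide rather than in the transcript; as a hedged reconstruction, the usual form of such a metric is along these lines, with the acpi-cpufreq baseline being an assumption on my part:

```latex
\text{perf-per-watt} = \frac{\text{benchmark score}}{\text{average package power (W)}},
\qquad
\text{scaling (\%)} = \left(
  \frac{\text{perf-per-watt}_{\mathrm{amd\_pstate}}}
       {\text{perf-per-watt}_{\mathrm{baseline}}} - 1 \right) \times 100
```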
You know, amd_pstate is still brand new; even though we have a better design, we still need to do more power and performance optimization in the existing BIOS, firmware, and kernel driver. This is the gitsource benchmark test; these are the average data here. amd_pstate gets very positive data in this kind of benchmark: about a 24% improvement with schedutil and 30% with the ondemand governor. So we have good data on this kind of benchmark, but we still have more issues to investigate regarding how we get this performance data; it is not only about saving power or raising performance, perhaps we need to strike a balance between the performance and power sides.

Okay, this is the Speedometer benchmark, which is required by the Chromebook project. With the schedutil governor, the performance-per-watt is almost the same, but we still need to keep testing and optimizing the power side. We have a 9% improvement with ondemand, but we see a 3% drop with the performance governor. Chrome OS defaults to ondemand for the AMD platforms, so we get a bit of improvement on the Chromebooks here, but we still need to do more optimization. amd_pstate is still new, and we will continue testing and optimizing the power side on the Chrome OS software platform.

Okay, finally, let's talk about the current status of the amd_pstate driver. Actually in design, both on the hardware and software side, the amd_pstate CPPC provides a fine-grained performance range instead of the legacy ACPI P-states leveraged by the acpi-cpufreq driver; that is the main improvement from this driver. But so far, we still have a long way to go. We need to do more testing because it is new, and we can support this driver on the existing platforms, the Zen 2 and Zen 3 platforms, as well. We also spent a long time on upstreaming: in Linux kernel 5.17, we have the amd_pstate driver support, and in kernel 5.18, we have the cpupower tool support and the amd_pstate tracer tool support. I think in the following kernel 6.1 we will introduce the new amd_pstate unit test support, and next we will introduce the EPP and the preferred core support in the future. You can see some of the working status from our working repo and git tree. At last, you know, it is a brand new driver, amd_pstate, so we still have a long way to go.

Currently, we face some challenges. The first one is that we need to resolve the performance and power issues on multiple benchmarks. You know, we still face issues on the shared-memory solution, with performance and power drops compared with the acpi-cpufreq driver; on those kinds of processors we definitely still have some issues. Second, we still need to cover all the existing CPU platforms with the different types of processors, such as mobile, desktop, server, and so on. Third, we still need to optimize the power on the Steam Deck for gaming, that is, Proton. Because, you know, Proton leverages Wine, and Wine talks to the CPUFreq APIs directly, maybe we can do some good interaction between Proton and the kernel CPUFreq driver on the AMD platforms so that we have better gaming power and performance in the future. That is all, actually, for the challenges so far; we will try to address these kinds of concerns and challenges. So, yeah, that is all from my side. These are the reference links for the whole picture of the amd_pstate driver design.
Feel free to ask me any questions and we can discuss them here, or you are welcome to email me. Any questions are welcome. Also, thank you very much for listening. That is all from my side.