I'm going to give a brief introduction to something people have been working on for over five years, so it's very hard to summarize in 30 minutes, but let's try. Because Red Hat is one of the sponsors of this conference, I'm assuming people here know roughly what a scheduler does; at a general conference there'd be no such guarantee. Quick show of hands... okay, that works.

So, what is EAS? Energy-Aware Scheduling. Interestingly, EAS started when ARM, about six or seven years ago, came up with an architecture called big.LITTLE. What it does, basically, is put very high-powered CPUs and low-powered CPUs on the same CPU complex. In Intel terms, think of having a Xeon and an Atom on a single CPU complex. The Atoms are low-powered and can't do as much processing as a Xeon, and both of them sit in the same CPU complex. Now, EAS is not restricted to big.LITTLE; it came about as a result of big.LITTLE, but it's not restricted to it.

Because you have something like a Xeon and an Atom together, in ARM's case a Cortex-A57 and a Cortex-A53 on the same CPU complex, suddenly it becomes very important which CPU your task runs on. The big cluster, the big cores, can do a lot of processing, but they're also very power-hungry. On the flip side, the little cores are very power-efficient, but they can't do as much processing. So depending on your task, you have to figure out where to place it. Think about the main job of a CPU scheduler: take all the processes running in the system and distribute them across all the CPUs. So it became very important how to do that, and throughput was no longer the only metric. Until then, throughput, how quickly you got through all the tasks on the system, was the only metric the scheduler cared about; it had to be fast. CFS, the Completely Fair Scheduler, if you've heard of it, was always about throughput. And here I am suddenly talking about energy efficiency.

A quick history lesson. There are two key frameworks doing power management alongside the scheduler: cpufreq and cpuidle. cpufreq is essentially scaling your CPU frequency: running at 2 gigahertz versus running at a couple hundred megahertz. Depending on how much CPU performance you need, it scales the frequency up and down automatically, and instead of four hours of battery life you get eight. That's cpufreq. And cpuidle is for when you have nothing to do: shut the CPU down, shut the caches down, that kind of thing.

So think about it this way. The Linux scheduler is moving tasks around; say you have eight CPUs on your mobile, and it's taking tasks and placing them across the system. cpufreq and cpuidle are trying to save power at the same time, but they are not talking to the scheduler. The scheduler moves a task from CPU 1 to CPU 6. cpufreq and cpuidle notice maybe 30 milliseconds later: oh, something changed. Then they scale the frequency, because, oh, we need more processing power on CPU 6 now that a task moved there, so we need to raise the frequency on that one. And, oh, by the way, we need to actually wake that CPU up, because CPU 6 was asleep.
So we need to wake it up before we can do anything. There was a lot of latency in all this, because it was all reaction. The scheduler was not talking to the power-management frameworks at all, so everything was reactionary; everything happened 100 milliseconds after the fact. That's a lifetime in kernel terms. In hardware terms, it's a lifetime.

I don't know why my introduction is running so long; I apologize, but that's me. I just want to convey the problem EAS is solving; this isn't meant to capture every detail of EAS. So this is what the picture looked like. The scheduler is doing task placement. You have cpuidle with the menu governor, a heuristic which says: okay, this CPU is idle, shut it off. And cpufreq with the interactive governor, a similar heuristic running on its own timer, saying: scale the frequency up, scale it back down. Each doing its own thing, none of them talking to the others.

If you keep thinking about the problem, it becomes obvious that the scheduler is actually the best place to make these decisions, because the scheduler is the one picking tasks. It already knows: I'm picking this task off CPU 1 and moving it to CPU 6, so I know CPU 6 is about to need more processing power. And if there are no tasks left on CPU 1, I can idle CPU 1 right away; I don't have to wait 100 milliseconds to make that decision. So the scheduler is the right place. That's when we started working on EAS: the scheduler would have some sort of energy/performance metric, we would teach the scheduler what energy efficiency means, and then the scheduler would just go and finish the job. I mean, it's six years later and we're still not done. Six years ago we were like, okay, we'll solve this problem, we'll do a few things and get it done.

So we started with some piecemeal approaches. We had an in-kernel switcher, which basically created a virtual operating point out of the big and the little cluster, and we thought, okay, this is going to solve all our scheduling and energy-management problems. Then we came up with this thing called small task packing. There are lots of very short-lived, small housekeeping tasks constantly running in the operating system. If you ever trace it: something wakes up, does a little work really quickly, goes back to sleep. Thousands of these, all the time. So the idea was: all these small tasks, let's move them to the little cores and avoid waking up the big cores, so we don't burn too much power. That's small task packing: pack the small tasks onto a few CPUs so we can put enough of the others to sleep. The big cores you want to keep asleep as long as possible, until you start something like a 3D engine. That was the whole objective. Then there was this crazy idea of having a completely separate power scheduler.

Okay, so, these ideas. I was there, in the power-management discussions, when we were presenting all of this. And the scheduler maintainer, this is what he said. Can you read this at the back? Because I'm not going to read it out from here. But he essentially said: you guys are idiots, I'm not going to accept any of this.
And essentially what he said is: this policy belongs in the scheduler. For a very long time, power-management people and scheduler people had been doing things on their own. These guys were off doing something over here, those guys doing something over there; the power people didn't know how to talk scheduler language, the scheduler people didn't know how to talk power language, everyone stayed in their own corner. Start collaborating and come up with a design which actually integrates the two. Essentially, everything we had done so far, everything we'd spent years on, was NAKed. We are not going to accept this; come up with a better design. This was 2013. Back to the drawing board.

And by the way, the other requirement was that whatever we did could not address only new systems. Think about all the old Intel PCs out there: you have to bring energy efficiency without regressing anything. No regressions were acceptable in terms of throughput performance; throughput in the scheduler is still the most important thing. Trading a few percentage points of performance just to get energy efficiency was not an option. Linux runs data centers and servers; the Googles and Facebooks of the world would kill us if we landed a scheduler patch which actually regressed their production workloads.

Now I'm going to quickly walk through the things we actually ended up solving over the last five-plus years, just to show you the shape of each problem.

The first thing: the scheduler had no concept of power topology. Do you know what a modern 8-core CPU looks like? Typically you might have four CPUs which share a cache, another four CPUs which share a second cache, and then all eight of them share a further level of cache. The scheduler did not have this information; it knew how many CPUs there were, but not the cache hierarchy, and for power that hierarchy matters: you cannot shut off a cache until all of the CPUs that share that cache are idle. If the first four CPUs go idle, I can shut off their shared cache; if the other four go idle too, I can shut off the next level that all eight share, and then the package can drop into the deeper C-states, the package C-states in ACPI terms.

So we had to teach the scheduler about power topology. There are scheduler-domain flags, like SD_SHARE_PKG_RESOURCES, which let a platform describe these topologies: at this level these resources are shared, at that level those are shared.

Then, the equality assumption. Before this, all CPUs were assumed equal, because symmetric SMP was the only kind of system: every CPU was capable of the same top frequency, every one had the same performance and the same power. With heterogeneous systems that changed; suddenly the power characteristics and the performance characteristics of the CPUs were different, and you had to teach the scheduler about that. It's been taught through this thing called asymmetric CPU capacity. Roughly, the platform describes all of this to the scheduler like the sketch below.
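Just to make that concrete, here is a minimal sketch of how a platform can describe such a topology to the scheduler, loosely modeled on what arm64 does; the my_* names are invented and a real table carries more levels and detail:

    #include <linux/sched/topology.h>

    /* CPUs at the MC level share a cache: prefer to keep work together
     * so the rest of the cluster (and its cache) can be powered down. */
    static int my_mc_flags(void)
    {
            return SD_SHARE_PKG_RESOURCES;
    }

    /* Clusters at the DIE level have different maximum capacities
     * (big.LITTLE): that is what "asymmetric CPU capacity" means. */
    static int my_die_flags(void)
    {
            return SD_ASYM_CPUCAPACITY;
    }

    static struct sched_domain_topology_level my_topology[] = {
            { cpu_coregroup_mask, my_mc_flags,  SD_INIT_NAME(MC)  },
            { cpu_cpu_mask,       my_die_flags, SD_INIT_NAME(DIE) },
            { NULL, },
    };

    /* Registered once at boot: set_sched_topology(my_topology); */

With SD_ASYM_CPUCAPACITY set on a level, the scheduler knows the CPUs below it differ in compute capacity, which is exactly what comes next.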
The capacities sit on a normalized scale of 0 to 1024. So you pass that information to the kernel at boot, saying: the first four CPUs are capacity 1024, and the other four are capacity 512, which means they're roughly half the performance of the first four.

(Someone asks: you mean the big cluster and the little cluster, that they're of different strength?) Yes: previously all CPUs were treated as equal, and now, by passing that information, the scheduler can tell this is the big cluster because they're all at 1024, and this is the little cluster because they're all at 512. Then ARM came up with DynamIQ, so you can have three different core types, three clusters that all have different performance characteristics. In that case the capacities become something like 512, 768, and 1024.

This was an extremely hard problem. The assumption that all cores are equal was so ingrained; it was baked into functions all over the scheduler, and all the load calculations were based on it. Now suddenly you're saying these are not equal: you have to start thinking in terms of CPU capacity. How much capacity is there on this particular core where I'm moving this task? The big core can maybe take ten tasks, say they're all equally sized, but the little core can only take four of them. So you have to start reasoning about the maximum capacity of each CPU, and teaching the scheduler that took a lot of work. How much time do we have? Okay.

Then there was the integration of the idle loop. As I said, cpuidle was a separate subsystem; we moved the idle-loop code into the scheduler, so the scheduler actually took ownership of idling.

There was also the assumption that, on wakeup, you can just pick whatever idle CPU is convenient and schedule the task there. On mobile you really feel the cost of that. You always want to pick a core that's in a shallow idle state. Remember the hierarchy I talked about: CPUs, a shared cache, another level of cache. Everything you turn off has a cost; the more things you turned off, the longer it takes to turn them back on, and bringing everything back can be measured in hundreds of milliseconds. Even just the regulator, powering the rail that feeds a CPU back up, can take tens of milliseconds. So that cost is very high. So we taught the scheduler to prefer a CPU in a shallow idle state, something that has only just gone idle, over one in a deep C-state where the caches and everything have been turned off. That also saves power, because it lets the deeply idle cores stay in their deeper C-state, which saves a lot more. There's a minimal sketch of that selection just below.

Then there was scale invariance, which I'll come back to in a minute. (Someone asks: is there an interface that tells you which idle state a CPU is in? Yes, there is; as I said, I'm not going to cover everything here, but that information is available.)
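Here's the wakeup-placement sketch I promised. It's a self-contained C illustration of the policy only; in the real kernel this logic lives in the fair scheduler's wakeup path and the exit latencies come from the cpuidle driver, so every name here is made up:

    #include <stdbool.h>

    struct cpu_idle_info {
            int cpu;
            bool idle;
            unsigned int exit_latency_us;  /* cost of waking this CPU back up */
    };

    /* Prefer the idle CPU that is cheapest to wake: a CPU in a shallow
     * idle state has a small exit latency, while one in a deep C-state
     * (caches flushed, rail off) is expensive and is better left asleep. */
    static int pick_shallowest_idle_cpu(const struct cpu_idle_info *cpus, int n)
    {
            int best = -1;
            unsigned int best_lat = ~0u;

            for (int i = 0; i < n; i++) {
                    if (!cpus[i].idle)
                            continue;
                    if (cpus[i].exit_latency_us < best_lat) {
                            best_lat = cpus[i].exit_latency_us;
                            best = cpus[i].cpu;
                    }
            }
            return best;  /* -1 means no CPU is idle at all */
    }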
Then there's interrupt routing, the interrupt controller: which CPU does an interrupt wake up? There are tools you can use to tune the system so that all the interrupts go to particular CPUs, so you don't end up waking the big CPUs. A lot of tuning goes into mobile so that even interrupts don't land on the big cores; if you have an Android phone, a lot of effort has gone into making sure interrupts wake the little cores first, and only if the work turns out to be a big task do you wake up a big core.

Now, scale invariance. Think about the concept of load. A task that looks like 20% load at 1000 megahertz is not the same thing as 20% load at 500 megahertz. If the frequency halves, the task still does the same amount of work; it just takes twice as long. The amount of work is the same, but all the load metrics get messed up, because they don't account for the CPU frequency changing: your load averages blow up just because the frequency dropped. So there's something called scale invariance, and we have a diagram for it. This side is without invariance, this side is with scaling applied. The task runs at 1 gigahertz here and at 500 megahertz there, and its tracked load essentially doubles. The load average shouldn't double; that's unfair to the task. The task didn't change its behavior, the CPU frequency was reduced underneath it, yet we penalized the task, and that distorts all the statistics used for fair balancing of tasks. With scale invariance, that problem goes away.

I'm running out of time, so let me take these questions quickly. (But it genuinely takes more time at 500 megahertz, right?) Yes, and the wall-clock time is fine; time is not the problem. The problem is how that time is accounted. The scheduler is also responsible for process accounting: how much CPU time did this particular task consume? It has to do fair balancing based on that, as we talked about. If scale invariance doesn't exist, the accounting penalizes the task: the task is not actually spinning in a busy loop and taking more time, it's the CPU frequency that has gone down, and that's not in the task's control. (So the penalty is that the task may not get its fair share of CPU time?) Exactly, exactly.

And how is it done? It's a multiplication factor: you multiply the tracked load by a scaling factor for that particular CPU, and that normalizes it against the current frequency. The platform provides that factor; there's a small sketch of the idea below.

And then we built a lot of tools, because everybody wanted to be able to test this and verify that what we were claiming was true. rt-app is an application the real-time folks use; we took that and added the ability to simulate all kinds of mobile workloads. And on top of it there's a harness, LISA, with which we automate all of this: we literally point it at our patches, fire it off, and it comes back and tells us whether we regressed performance on this system, even by 0.5%.
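And here's the small frequency-invariance sketch I mentioned; the helper names are made up, and in the real kernel the equivalent of freq_scale() is supplied by the architecture code:

    #include <stdint.h>

    #define SCHED_CAPACITY_SHIFT 10
    #define SCHED_CAPACITY_SCALE (1 << SCHED_CAPACITY_SHIFT)  /* 1024 */

    /* ~1024 when the CPU runs at max frequency, ~512 at half, and so on. */
    static uint64_t freq_scale(uint64_t cur_khz, uint64_t max_khz)
    {
            return (cur_khz << SCHED_CAPACITY_SHIFT) / max_khz;
    }

    /* Runtime contribution to a task's load, made frequency-invariant:
     * running slower no longer inflates the tracked load. */
    static uint64_t scaled_contrib(uint64_t delta_us,
                                   uint64_t cur_khz, uint64_t max_khz)
    {
            return (delta_us * freq_scale(cur_khz, max_khz))
                            >> SCHED_CAPACITY_SHIFT;
    }

Run the numbers: 10 ms of runtime at 500 MHz on a 1 GHz-max CPU contributes 10 * 512 / 1024 = 5 ms of "work", so the tracked load stays the same when only the frequency changes.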
All of that automation takes the manual verification out of the loop, and the results get published along with the patches.

Now, of the problems we discussed, here's my perspective on the big pieces. The energy model. The energy model is essentially the cost of running a CPU at a particular frequency: at this operating point, this is how much energy you consume; this is the cost of keeping the CPU alive; this is the cost of each idle state; all translated onto a normalized scale. It's a table, per CPU, per frequency. It's a tricky problem, because how you obtain this table is very platform-specific, it's a hardware characteristic, and how the table gets passed into the kernel, through the device tree or from firmware, needed a design of its own.

Then there's PELT, per-entity load tracking, which is a topic in itself; it's basically how the load of every task is tracked. The number that matters for interactive workloads is that it takes about 32 milliseconds for a new burst of activity to register significantly on the tracking curve; PELT's geometric decay has a half-life of 32 milliseconds. So, say you tap your screen and a browser opens up: it takes on the order of 32 milliseconds before the scheduler can say, oh, this browser is going to do a lot of computing, let me scale the frequency up from 500 megahertz to 2 gigahertz. And that costs you performance at the start. For interactive workloads it's very important to shoot up to the highest frequency quickly, and that's the tension: ramp fast enough for interactivity without burning energy on every little blip. People need to be able to tune this for different workloads; the right answer for a phone is not the right answer for a server. So these are the things still being discussed. These are some references; I'll share the slides. That's it.

One question first. There was an article recently about this in Android: the scheduler by itself is not aware of which tasks matter to the user. Say two processes show up with similar CPU demand; the scheduler can't tell that one is critical to the user experience and the other isn't. So there are annotations around this: a user-space interface which basically says, even if this process is demanding a lot of CPU, it is not critical to the user experience. Background tasks are the classic example. Right now your phone is in your pocket, but it's constantly syncing your mail; that doesn't need to run on a big core. So there's a cgroup created for all these little background tasks, so you don't wake the big cores up for them. Android's grouping is deliberately simple, basically foreground and background, and that works well; you can see how the two interact.

Any other questions? (I recently noticed that even the Pixel 2 and Pixel 3 ship with these ideas. Were there any benchmarks showing how EAS compared to HMP, the old out-of-tree big.LITTLE solution?) Look at the mailing-list archives; with the patches we posted, we actually published benchmark results. Against the non-EAS solutions of the day it was something like 18-20% energy savings. It was massive, and at almost no cost in performance, close to nothing.

(And if I have to generate an energy model for my own SoC, how do I do that?) There are tools, but nothing polished; it's more like a Python hack, and you have to measure the power physically. What you end up with is a table like the sketch below.
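Just as an illustration of what that end product is, here's a hedged sketch of an energy-model table and how a scheduler could cost a placement with it; the numbers are invented and this is heavily simplified compared to the kernel's actual energy-model framework:

    #include <stddef.h>

    struct perf_state {
            unsigned long freq_khz;   /* operating frequency */
            unsigned long power_mw;   /* full-load power at that frequency */
    };

    /* Hypothetical table for one CPU (made-up numbers). */
    static const struct perf_state em_table[] = {
            {  500000,  80 },
            { 1000000, 200 },
            { 1500000, 450 },
    };

    #define N_STATES (sizeof(em_table) / sizeof(em_table[0]))

    /* Estimate the power cost of putting `util` (on the same 0..1024
     * scale the scheduler uses, up to max_cap) on this CPU: pick the
     * lowest frequency with enough capacity, then scale that state's
     * full-load power by how busy the CPU would actually be. */
    static unsigned long estimate_power_mw(unsigned long util,
                                           unsigned long max_cap)
    {
            unsigned long max_freq = em_table[N_STATES - 1].freq_khz;

            for (size_t i = 0; i < N_STATES; i++) {
                    unsigned long cap = max_cap * em_table[i].freq_khz / max_freq;

                    if (cap >= util)
                            return em_table[i].power_mw * util / cap;
            }
            /* More utilization than capacity: run flat out. */
            return em_table[N_STATES - 1].power_mw;
    }

EAS computes an estimate like this for every candidate CPU a waking task could land on, and picks the placement that leaves the system cheapest overall.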
You measure it yourself, or you talk to your hardware designer: tell me the power characteristics of these operating points. But you can do it without them, too. One of my colleagues actually gave a talk on how to create this table without talking to the hardware people: how to instrument the platform, measure power, and derive the table. It won't be a perfect table, but it will be a good approximation; you'll get maybe 80% of the way, and that's good enough.

(And if I'm on, say, a 3.18 kernel with HMP, is it time to move?) 4.14 is the kernel version in which Google has fully merged EAS into the Android common kernel. So if you simply move your base to 4.14, you get it. Anything you want to add on top of that is on you, and EAS is already such a complex thing; think about five years of getting this stuff upstream. You would need a company's worth of manpower just to backport it and keep it working on your platform. For one person it's not feasible, unless you want to write your own. Otherwise, just move your base to a 4.14 kernel. It was difficult enough carrying all the out-of-tree patches on 3.10 or 3.18; that's why it was pushed upstream in the first place. Any last question? Thank you very much.

[Next speaker] I'm not Atish Patra, I'm Sivesh. Atish gave this presentation at a workshop at IIT Madras last month, and I'm not going to run through the whole deck; this is essentially a plug for people who are interested in contributing to the Linux kernel on RISC-V. How many people came here looking for ways to contribute? That's great. Oh, and I don't work for Western Digital either, so the affiliation on these slides isn't mine.

How many people know what RISC-V is? That is really amazing. For those who don't: RISC-V is an open-source instruction set architecture. How many people know what an instruction set architecture is? That is even better; we're almost halfway there already, so I can cut this introduction down a lot.

RISC-V was developed at the University of California, Berkeley, originally as a teaching aid. All the instruction sets they had been using were either way too complex or had patent issues, so they could not use them to develop their own microprocessor core in-house. That's how RISC-V was created, with the aim of being as simple as possible, and even today the base RISC-V ISA is under 100 instructions. Partly that's because it is a relatively new ISA, but still: fewer than 100 instructions. And the best part is that the ISA itself is open. So if you are a CPU manufacturer, and I doubt most of you in this room are, you could actually develop your own microarchitecture and use the RISC-V instruction set as the interface to the software. Western Digital is one of the companies heavily involved in building processors around the RISC-V ISA and running Linux on them.

I'm not going to talk about why Linux and all that, because you already know it; I'm going to jump directly into what's needed. I'll skip the user-space side as well, but come talk to me if you're interested in that.
So Western Digital, Google, a bunch of companies are interested in RISC-V as a way to create their own microprocessors and build application-specific microarchitectures. To that end a lot of people have already contributed to the RISC-V port of the kernel, and a lot still remains to be done, because it is such a new architecture and such a new ISA; there's a lot of scope for developers to get involved and help out.

One of the things Atish is really, really interested in getting people working on is the boot flow. Right now RISC-V boots through its own non-standard loader format, and the goal is to move away from that to something more standard, so that eventually the standard boot flows you know from other architectures would be possible. If you're interested in boot loaders, find Atish and get started on that. There's also CPU topology support, which I'll come back to in a moment, power management, and a bunch of general fine-tuning of the kernel for this ISA.

Then there's advanced memory-management support. Huge pages, for example, are not there in the RISC-V kernel yet. I don't think there's hardware that would actually benefit from huge-page support at this moment, but it would be interesting as a college project: if you want to learn how huge pages work, and how memory translation and page handling in the kernel work, this would be a good project to start on.

Then there's a lot of tracing and debugging work. People tend to treat tracing and debugging as secondary functionality, but ask any software developer working in industry and their wish is: I wish we had the tracing and debugging infrastructure first, and then started writing the software. It usually goes the other way around, because functionality is king for businesses: they ship features first, and the infrastructure to trace and debug what they shipped comes later. So there's a bunch of things here: there's kexec, there's kprobes, there's KGDB. And there's BPF; eBPF is the up-and-coming way to do tracing in the Linux kernel, there's a lot of very interesting work happening around it, and a very nice community to interact with as well.
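To give you a feel for that area, here's roughly what a minimal kprobes module looks like in C; the probed symbol is just an assumed example and varies across kernel versions, so treat this as a sketch rather than a recipe:

    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/kprobes.h>
    #include <linux/smp.h>

    /* Hypothetical probe target: pick any traceable symbol that exists
     * on your kernel; "do_sys_open" here is only an example. */
    static struct kprobe kp = {
            .symbol_name = "do_sys_open",
    };

    /* Runs just before the probed instruction executes. */
    static int handler_pre(struct kprobe *p, struct pt_regs *regs)
    {
            pr_info("kprobe hit: %s on CPU %d\n",
                    p->symbol_name, smp_processor_id());
            return 0;
    }

    static int __init kprobe_demo_init(void)
    {
            kp.pre_handler = handler_pre;
            return register_kprobe(&kp);
    }

    static void __exit kprobe_demo_exit(void)
    {
            unregister_kprobe(&kp);
    }

    module_init(kprobe_demo_init);
    module_exit(kprobe_demo_exit);
    MODULE_LICENSE("GPL");

Build it as an out-of-tree module and insmod it, and every hit on the probed function logs a line; getting exactly this kind of infrastructure working and exercised on RISC-V is what the tracing to-do items are about.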
As for hardware, if you want to run on real silicon: there are boards available now, but they're about $3,000, so if you can afford one, great. Otherwise there are FPGAs you can buy and load a RISC-V FPGA image onto; I think lowRISC has a design that works with some FPGAs. Failing that, the best way to get things running is QEMU, which gives you a virtualized environment, and there's some work going on around KVM as well that I don't have slides for here, but that's also an area you can get involved in.

So that's all I have on RISC-V and running Linux on it. Any questions? (Do you know anything about the IIT Madras initiative, the Shakti processor project?) Yes, I'm aware of it, but I'm not involved in it. (How could we get in on it? Any ideas?)

So there's a Bitbucket repository where they keep the sources. They've also formed a company, whose name I forget; I actually attended one of their talks during the RISC-V workshop at IIT Madras last month. They have several families of processors in progress at the same time, and they apparently have government funding as well. There's an open-source part of it, and it's up on Bitbucket. The ISA obviously is open, so if you can read the RTL files and understand the pipeline structure and so on, you could actually port it to a different target.

And that's one of my pet projects, which so far exists only in my head and I have done nothing about: there are a bunch of simulators out there, and one of them is gem5. What gem5 lets you do is build a custom processor pipeline, as long as the ISA support is there. So for gem5 you'd need to add the RISC-V ISA support and then build a Shakti pipeline on top of it, which would be a really cool thing to do, and I don't have any time to do it. So if anyone's interested in doing that, get in touch with me and we can figure out a way. Any other questions? Thanks.