Okay, hi everyone. Sorry for being late; I had some problems getting the laptop to show the presentation. I will do a somewhat faster presentation because I have less time now, and there will be time for questions at the end.

Let me start by introducing myself. I am Matias Vara. I studied electronic engineering at the University of La Plata, Argentina. Then I did my PhD in France, I worked at Citrix in Cambridge, and then at Silicom, where I work now. And during my free time I develop a kernel, which I will present to you in what follows.

So, what is Toro? It is a kernel for the x86-64 architecture, and the whole kernel is written in Free Pascal. The idea is to provide a simple API to the user application and to compile everything together. In this sense it is application oriented, with a really simple set of APIs. As I said, the application and the kernel are compiled together, and then they run in ring 0 of the processor. That means there are no different privilege levels; everything runs together there. Compiling them together generates an image, and this image can run on several hypervisors, like QEMU/KVM, VirtualBox, and so on.

What I am saying is, more or less, this: you have your application, and you write it using the API that Toro provides. The compiler, which is the Free Pascal compiler, links everything together and generates the image. This image is a raw .img, so you can use it with any hypervisor. Well, you can convert the format of the image and then use it with VMware or KVM and so on. What is nice here is that you can use the same image on all hypervisors; you don't have to modify it depending on which hypervisor you want to run.

This is, more or less, the Toro stack. You have the hardware, which is always the same, which is architecturally fixed.
Well, maybe you have the hypervisor layer or maybe not. Then you have the Toro kernel, which provides the modules that most ordinary operating systems have, more or less: the scheduler, memory management, device drivers, networking, and so on. Currently we have just finished the driver for virtio, for example, but we also have drivers for emulated devices, like the E1000. In the virtual file system, for the moment, we support ext2, which is easy to use. And on top of that you have the user application. As I said before, kernel and application are at the same level; there is no difference at execution time, they use the same privilege ring.

(Sorry, I don't see the mouse here; I will just continue describing what I am going to talk about.)

Actually, in this presentation I did not want to talk about the kernel itself; I already did that two years ago, in 2015, saying more or less what was special about it. This time I thought it was more interesting to show some work I did in the last three or four weeks, when I tried to reduce the CPU usage of Toro running as a VM, as a QEMU/KVM guest. That is what this talk is about. I figured out that it was consuming a lot of CPU: the guest was at 100% CPU usage. The presentation is about how I tried to design an API to reduce the consumption of the VM guest.

What I observed when running Toro as a QEMU guest was that, if I ran top on the host, the VM was consuming 100% of the CPU, which is completely unacceptable in production. You cannot have a VM that is consuming 100% of the CPU almost all the time. I cannot show any pictures now, but I can show them at the end if you want; I can describe what I did anyway.
The problem was, as I said before, that when I was running Toro as a guest, I observed that 100% of the CPU was being consumed, and I actually didn't know why that was happening. When I started to analyze where the kernel was spending its time, I figured out that there were many idle loops in different parts of the code, and that was consuming a lot of CPU. So the first thing I did was to identify these points. Idle loops are common in some areas, and they are very useful: you stay in a loop checking a condition, do some work if the condition is true, and otherwise just keep looping.

I identified three parts of the code that were using idle loops: one was the use of spin-locks, another was the scheduler, and there were also some threads in the system that implemented idle loops themselves; they did some work, called the scheduler, and repeated the same thing forever.

So I started to look through the literature to figure out how to fix this. In the case of spin-locks, Intel proposes to use the pause instruction inside the spin-lock. That is a way to relax the CPU, to tell the CPU: hey, I am in a spin-wait loop. Well, I don't remember exactly how the instruction behaves, I have that in the slides, but the point is that it delays the next instruction in order to relax the CPU. The whole idea is to try to relax the CPU in all these cases.

In the case of the scheduler, I observed that when no thread was ready in the system, the scheduler was just spinning in a loop. This case was quite easy to fix: what I did was to use the hlt instruction, so that when the scheduler queue was empty, I was just halting the core. The core was then completely idle, and it wakes up when an interrupt arrives, for instance when you receive a packet.
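The first two fixes can be sketched roughly like this. Toro itself is written in Free Pascal, so this is only an illustrative C sketch with made-up names, not Toro's actual code: a spin-lock that relaxes the core with pause, plus the scheduler-side hlt idea shown as a comment (hlt needs ring 0).

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>            /* _mm_pause(): emits the PAUSE hint */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)     /* no-op on non-x86 builds */
#endif

typedef struct { atomic_flag locked; } spinlock_t;

/* Spin until the lock is free; PAUSE tells the CPU this is a
   spin-wait loop so it can relax the core between attempts. */
static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Scheduler side (ring 0 only, so shown as a comment): when the run
   queue is empty, enable interrupts and halt the core:
       asm volatile ("sti; hlt");
   The core then sleeps until an interrupt arrives, for instance
   when a network packet is received. */
```

Besides lowering power, pause also reduces the penalty a core pays when it finally leaves a tight spin-wait loop, which is why Intel recommends it inside spin-locks.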
And the third case was the harder one, because I had threads, which could be system threads or user threads, that were spinning in idle loops. It was harder because I had to find a way to tell the scheduler that a thread was doing idle work. So what I did was implement an API based on two functions. One tells the scheduler: I am doing idle work. In that case, the scheduler counts the time that the thread spends in the idle state. The idea is that if you have many threads in the system, the scheduler will go to sleep only if all the threads in the system are in an idle loop. At that point the scheduler says: okay, I will halt the core. That was the tricky part, because you cannot halt the core if you have some thread in the ready state, right? I don't remember exactly the whole API, but the idea was to tell the scheduler: hey, I am doing idle work. It counts the time, and after that time, if all the threads are in the idle state, it stops the core completely. The second function tells the scheduler: okay, I am doing some work again, so stop counting me as idle; the state of that thread then becomes kind of ready again. It is quite hard to explain like this, because I am using a lot of terminology that maybe sounds weird; I'm sorry I could not show the presentation.

When I implemented this, I compared the old Toro with the new Toro, and you can see that we are now saving half of the CPU power, because when there are no packets, for instance, the CPU usage goes to zero. I mean, if you run top on the host, you see that. So we are now consuming half of what we were consuming before. (You tell me if I am out of time, okay?) Say you send packets for 60 seconds and then stop sending for 60 seconds.
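The two-call API described above might look roughly like the following. This is a hypothetical C sketch (the real API is in Free Pascal and its names and bookkeeping surely differ); it only captures the core invariant: the scheduler may halt the core only when every thread has declared itself idle. The real version, as described, also counts how long each thread has been idle before allowing the halt.

```c
#include <stdbool.h>

/* Illustrative names, not Toro's actual API. */
#define MAX_THREADS 8

static bool thread_idle[MAX_THREADS];
static int  nthreads = 0;

/* Register a new thread; it starts in the "doing real work" state. */
static int thread_create_stub(void) {
    thread_idle[nthreads] = false;
    return nthreads++;
}

/* API call 1: the thread tells the scheduler "I am doing idle work". */
static void sys_thread_idle(int tid)  { thread_idle[tid] = true; }

/* API call 2: "I am doing real work again; stop counting me as idle". */
static void sys_thread_ready(int tid) { thread_idle[tid] = false; }

/* Scheduler-side check: the core may only be halted (hlt) when no
   thread is in the ready state, i.e. every thread declared idle. */
static bool scheduler_may_halt(void) {
    for (int i = 0; i < nthreads; i++)
        if (!thread_idle[i])
            return false;
    return nthreads > 0;
}
```

The design point is exactly the tricky part mentioned above: a single thread still in the ready state is enough to keep the core awake.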
You observe that the CPU is at 100% during the first 60 seconds and then at zero during the next 60 seconds. That was a first quite nice result, and it was easy to implement too; it took me five days to implement all of that.

But then I started to compare with something more serious, like Apache running as a QEMU guest, and it was quite interesting: what I observed was that while Toro stays at 100% during the first 60 seconds of the experiment, Apache stays at 40%. And when Apache is idle, it goes down to 10% of the CPU. That was quite interesting, because I thought my result was good, but that one is much better than mine. The interesting thing is when you start to increase the number of requests in these 60-second windows: in the case of Apache, the CPU usage starts to scale up. At around 200 requests during the first 60 seconds, the CPU goes to more or less the same as Toro, I mean to 100%.

So the takeaway lessons were these. First, in production you cannot have a VM that is running at 100% CPU, and, depending on what application you are running in the guest, you cannot really control that. Second, the solution you can implement, in my case, was quite basic, because the hlt instruction that I use is well supported by most hypervisors. But if you want to do something more complex and use other instructions, it depends on the hypervisor whether they are supported or not; I mean, whether the hypervisor will emulate that instruction or not. I am talking about, for example, the monitor and mwait instructions, which are not that well supported, but which are something more intelligent. And the third takeaway lesson, I would say, is that my solution, which only halts the core, was for sure not enough.
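Regarding that support problem, one thing a guest can at least do before relying on monitor/mwait is check whether the hypervisor exposes them at all via CPUID (leaf 1, ECX bit 3). A hedged sketch in C, again only illustrative since Toro is in Free Pascal:

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Returns true if CPUID reports MONITOR/MWAIT support. Hypervisors
   often mask this bit for guests, so the check reflects what the
   VM actually sees, not the physical CPU. */
static bool mwait_supported(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                    /* leaf 1 not available */
    return (ecx & (1u << 3)) != 0;       /* CPUID.01H:ECX.MONITOR[bit 3] */
#else
    return false;                        /* non-x86 build: assume absent */
#endif
}
```

A guest could use this to fall back to plain hlt when mwait is masked out, which matches the experience above that hlt is the portable choice.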
And a real solution would have to take CPU frequency scaling into account, which is for sure much more complex to implement than a single hlt instruction. Yeah, that was more or less the talk; it was really an experiment. That's it. I don't know if someone has a question, because I went so fast and...

Well, in the case of the networking particularly, it is not blocking. That is how I implemented it: if you want to get a packet, you ask the upper layer for a packet, but that function returns nil if there is no packet; it does not block until one arrives. So the networking is not blocking at all, in the whole stack I mean. I think that is why I had to implement such an API: if it blocked, I would not need such an API, because it would just block, right? Yeah, I know, I already thought about that. But the thing is that the whole network stack is non-blocking.

Sorry, can you repeat the question? ("So you keep asking all the time whether there is data, even if no data is available?") Yeah. Well, I think I could implement something like that. Right now I don't have in mind all the decisions and assumptions I made when I was working on the sockets, so maybe if I did that now it would not work, but I can think about it. Everything has a reason, but I don't remember all the reasons.

I don't know if there is another question. I will show a demo at four: I will show maybe the three slides I showed here and the whole workflow, and I will run a Toro guest in the cloud, with a web server running on Toro, so you can access it and trust that it works. So, if possible, I will do it at four. Thank you very much. Sorry for the...