Hi, Ricardo! Hi, how are you? I'm good, how are you? I'm good, thank you for joining. Let's wait a little bit until more people show up, maybe two or three minutes.

All right, three minutes past the hour. Thank you for joining, everyone. Today we have Yulin, who will be talking about Quark, another container runtime, written in Rust. Take it away.

Okay, thank you. Let me share my screen. Can you see my screen? Yeah? Thank you. I will introduce Quark. It is a high-performance, secure container runtime. For a secure runtime, there are three major dimensions. The first is security, another is performance, and since we are building a Linux-compatible container runtime, the third is Linux compatibility. For security, Quark is based on KVM virtual-machine isolation, and it is written in a secure programming language, Rust. For performance, it is designed specifically for containerized workloads, optimized for multi-core CPUs, and implemented in a high-performance language, Rust. Our goal is Linux compatibility: so far we have implemented more than 210 system calls, and in the future we will add more system calls and provide more compatibility.

For secure container runtimes, there are two others on the market today. The first is Kata; the other is gVisor. Kata is based on a virtual machine; it is a straightforward implementation that runs the container inside a VM. In this mode the virtual machine has the common architecture: a Linux kernel plus QEMU, and it can run different OSes inside the machine, either a Linux kernel or a Windows kernel. Based on this architecture, we saw some performance opportunities. For example, QEMU is a general-purpose virtual machine monitor: it supports not only Linux but other OSes as well. A Linux container runtime only needs to support Linux workloads, so there is an opportunity to optimize the VMM and build a better VMM for better performance. Kata also runs a Linux kernel inside the VM, and there is an opportunity to optimize there too. The Linux kernel is designed to run on real hardware, but for a container runtime, the target environment is always on top of a host Linux kernel, so we can take advantage of that. The Linux kernel also supports all kinds of hardware and devices, but our container runtime only runs on server hardware: multi-core, 64-bit x86-64 CPUs (maybe in the future we will support ARM64). It doesn't need video or audio, and it doesn't need to support other devices; for example, the floppy disk driver is still supported in the Linux kernel today. So we benefit from supporting only limited hardware. Finally, the Linux kernel supports all kinds of workloads, but a secure container runtime's target is mostly cloud-native workloads: the main work item is the TCP/IP protocol, plus hardware disk IO and cloud-based IO, for example S3, virtual disk IO where the data is eventually stored in the cloud. We can do more optimization for that.
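To make the compatibility goal above concrete: a Linux-compatible guest kernel ultimately exposes a dispatch table keyed by Linux syscall numbers, returning -ENOSYS for anything not yet implemented. Below is a minimal illustrative sketch in Rust; the names and table layout are hypothetical, not Quark's actual code.

```rust
// Hypothetical sketch of a Linux-compatible syscall dispatch table.
// A real guest kernel such as QKernel implements 210+ entries.
type SyscallFn = fn(args: &[u64; 6]) -> i64;

fn sys_read(_args: &[u64; 6]) -> i64 {
    0 // a real implementation would fill the caller's buffer
}

fn sys_write(_args: &[u64; 6]) -> i64 {
    0 // a real implementation would write out the caller's buffer
}

// Indexed by x86-64 syscall number so unmodified Linux binaries work.
const SYSCALL_TABLE: &[(u64, SyscallFn)] = &[
    (0, sys_read),  // SYS_read
    (1, sys_write), // SYS_write
];

fn handle_syscall(nr: u64, args: [u64; 6]) -> i64 {
    match SYSCALL_TABLE.iter().find(|(n, _)| *n == nr) {
        Some((_, f)) => f(&args),
        None => -38, // -ENOSYS: not implemented yet
    }
}

fn main() {
    assert_eq!(handle_syscall(0, [0; 6]), 0);     // supported call
    assert_eq!(handle_syscall(999, [0; 6]), -38); // unsupported call
}
```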
So we designed an environment that targets just the container runtime, just container workloads, compared with the common Linux virtual machine solution. We developed QKernel: it works as a guest Linux kernel and provides Linux-compatible system calls to the container application. On the VMM side, we provide QVisor, which works as the VMM. Unlike the QEMU-plus-Linux-kernel solution, QVisor only supports QKernel, QKernel only runs with QVisor, and QVisor only runs on KVM plus the host Linux kernel. That lets us do more optimization based on this more targeted architecture.

This is our high-level design. For a Quark container, QKernel runs in the guest kernel space, and QVisor runs on the host, just like a common Linux application. Between QKernel and QVisor we use a special call mechanism for communication: QCall. QCall is shared-memory-based communication between QKernel and QVisor, so when QKernel sends a request to QVisor, we don't need a hypercall every time; in the KVM architecture, the hypercall cost is very high. With this shared-memory channel, a thread running inside QKernel doesn't need to exit the guest space; it just uses the shared memory to communicate with QVisor, so we get better performance, better throughput, and better latency. Inside QKernel we have multiple virtual CPUs, and inside QVisor we have one QCall thread. The QCall thread uses a shared-memory queue to talk to QKernel: it waits for messages sent from QKernel, and when it gets a message from the queue, it handles the request using Linux system calls and returns the result to QKernel.

We found that even with the QCall mechanism, the latency and throughput of some system calls were still poor. So we also use io_uring, the Linux io_uring, to accelerate processing. With io_uring, QKernel directly shares memory with the host Linux kernel, so QKernel can send IO requests to the Linux kernel directly. This way we get much better performance than with QCall or hypercalls. But for security reasons, so far we only pass the data plane, the data IO requests, through io_uring: for example read, write, and sendmsg. Metadata-related operations like open and socket still go through QCall, so that we can do another level of checking in QVisor to make sure the request is not compromised. This is the high-level design.
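The shape of the QCall mechanism, reduced to its essence: guest threads hand requests to a single QVisor-side thread over a queue and wait for the result, with no VM exit on the request path. The sketch below is a rough analogy that stands an ordinary in-process channel in for Quark's real shared-memory ring; the request shape and names are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical request shape, for illustration only.
struct QcallRequest {
    syscall_nr: u64,
    arg: u64,
    reply: mpsc::Sender<i64>,
}

fn main() {
    let (tx, rx) = mpsc::channel::<QcallRequest>();

    // One QVisor-side thread drains the queue and performs host syscalls.
    let host = thread::spawn(move || {
        for req in rx {
            // Here QVisor would validate the request and issue the real
            // host system call; we fake a result for the sketch.
            let result = (req.syscall_nr + req.arg) as i64;
            let _ = req.reply.send(result);
        }
    });

    // A "guest" vCPU thread enqueues a request and waits for the reply
    // instead of exiting the guest with a hypercall.
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(QcallRequest { syscall_nr: 1, arg: 41, reply: reply_tx }).unwrap();
    println!("result = {}", reply_rx.recv().unwrap());

    drop(tx); // close the queue so the QVisor-side thread exits
    host.join().unwrap();
}
```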
A question: how do you handle the virtual CPUs, the resources? In QKernel you may see four virtual CPUs, but what does that mean at the host level? Is there some sort of mapping there?

Oh, yes. In the KVM architecture, a virtual CPU is a host thread. It is presented to the guest kernel as a virtual CPU, but it is just a thread inside the host. In QKernel we have multiple kernel threads, just like the Linux kernel, and those kernel threads run on the host threads backing the vCPUs, like a kind of user-space thread. As in Solaris, the operating system, between the host threads and the guest kernel threads we have many-to-many multiplexing: for example, eight host threads can handle maybe 100 or more QKernel threads. Got it.

At some level this architecture is just like gVisor, but we also differ from gVisor quite a bit. This project was motivated by the gVisor project, and we learned a lot from gVisor, but we have a different design and implementation, and we did some optimization; our goal is better performance.

The first optimization is that we use Rust instead of Go. I think this is a major performance-improvement area. Go is a very good language, but it is not designed for this kind of OS-kernel-level development: it doesn't support system-level memory management, it relies on GC, and its performance is slower than Rust's. That introduces real differences. First, memory management: gVisor uses Go's own heap management, with garbage collection, which means we cannot fully control memory placement. Maybe three years ago I did some Kubernetes performance tuning, and I found Kubernetes consuming a few hundred megabytes of memory that I could not account for; most likely it was reserved by the GC. With Rust, we can plug in our own heap management: in QKernel we currently use buddy allocation, manual memory management, to manage the heap.

Another is scheduling. As mentioned, in a KVM-based virtual machine one host thread maps to one virtual CPU, and we have multiple kernel threads, so we need scheduling between the virtual CPUs and the guest kernel threads. The gVisor implementation relies on the Go runtime's scheduling strategy; in Quark, we implemented our own scheduler, optimized for the io_uring call and the QCall. The guest-to-VMM call uses QCall, the shared-memory-queue-based call. When we did tuning, we found that for high-QPS workloads, for example a workload like Redis, where each request is a small amount of in-memory processing followed by IO, very high-QPS IO, performance was bad, and the majority of the performance penalty was in the hypercall. So we implemented the QCall- and io_uring-based calls. With io_uring, the guest kernel can call the host kernel directly, fully bypassing the host application layer, so it gets better performance. Any questions about that?
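For reference, this is what the io_uring fast path looks like from the host side: the submission and completion queues live in memory shared with the kernel, so a single submit can drive IO without per-request context switches. A minimal standalone example using the Rust io-uring crate (not Quark's code):

```rust
use io_uring::{opcode, types, IoUring};
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    // The ring's submission/completion queues are memory shared with the kernel.
    let mut ring = IoUring::new(8)?;
    let file = File::open("/etc/hostname")?;
    let mut buf = vec![0u8; 64];

    // Build a read request and push it onto the shared submission queue.
    let read_e = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
        .build()
        .user_data(0x42);
    unsafe { ring.submission().push(&read_e).expect("submission queue full") };

    ring.submit_and_wait(1)?; // one syscall submits and waits for completion
    let cqe = ring.completion().next().expect("completion queue empty");
    println!("read returned {}", cqe.result());
    Ok(())
}
```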
Hi, Yulin. Hi, my name is Cesar; I have a question, please. To do the performance optimizations you're doing for the communication between the guest machine and the underlying VMM, you are, for example, sharing memory. Do you think that has any impact in terms of reducing isolation? Going through a hypercall is a very narrow interface, right? I wonder whether going through shared memory, yes, it's faster, but it ends up reducing the isolation of the VM, in the sense that it makes it easier, let's say, for an attacker who compromises QKernel to then get into QVisor.

Yes, this is something we need to consider: we need to balance security and performance. In theory, yes, with shared-memory-based communication between the guest kernel and the host kernel, we may get a larger attack surface; that is possible. But I think we'd better do more evaluation against a concrete threat model, and so far we don't have a concrete threat model for this communication mechanism. For io_uring we do add some protection. For example, we only support data-plane operations, like read and write; for metadata operations, like creating a socket or creating a file descriptor, we can do more protection in the QVisor layer. We also restrict io_uring by file descriptor, limiting the permissions for each FD. So far we have no concrete threat model showing a security hole in this design, but we will keep working on that.

I see. So would you say that, compared to a traditional VM, it is slightly less isolation for the benefit of more performance, but even with less isolation, you still haven't found a security threat that can break out of it? Is that right?

Yes. And another level of security comes from QKernel itself, compared with the current Linux kernel's threat model. A large part of that threat model is that the Linux kernel is developed in C, which is not a secure language: when an object is freed and then used again, there is a good chance of compromising the whole kernel. We developed this system in Rust, so at least that attack surface is reduced. So this is the balance between security and performance. From my point of view, a security hole is still a kind of bug: we cannot fix all bugs, and we cannot fix all security holes, so we still need to balance security and performance.
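A sketch of that data-plane/control-plane split in code. All the names here (HostRequest, submit_io_uring, submit_qcall) are hypothetical, chosen only to show the routing policy described above:

```rust
// Hypothetical request type; not Quark's actual definitions.
enum HostRequest {
    Read { fd: i32, len: usize },      // data plane
    Write { fd: i32, len: usize },     // data plane
    Open { path: String, flags: i32 }, // control plane (metadata)
    Socket { domain: i32, ty: i32 },   // control plane (metadata)
}

fn submit_io_uring(_req: HostRequest) { /* push straight onto the io_uring queue */ }
fn submit_qcall(_req: HostRequest) { /* enqueue on the shared-memory QCall queue */ }

fn is_data_plane(req: &HostRequest) -> bool {
    matches!(req, HostRequest::Read { .. } | HostRequest::Write { .. })
}

fn dispatch(req: HostRequest) {
    if is_data_plane(&req) {
        // Fast path: data IO goes directly to the host kernel, limited to
        // file descriptors the sandbox already holds.
        submit_io_uring(req);
    } else {
        // Checked path: metadata operations pass through QVisor so they
        // can be validated before any host state is created.
        submit_qcall(req);
    }
}

fn main() {
    dispatch(HostRequest::Read { fd: 0, len: 4096 });
    dispatch(HostRequest::Open { path: "/tmp/demo".into(), flags: 0 });
    dispatch(HostRequest::Write { fd: 1, len: 128 });
    dispatch(HostRequest::Socket { domain: 2, ty: 1 });
}
```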
And one more question, if I may, about QKernel itself. Even though you're running the containers on top of it, does QKernel have the same concepts of namespaces and, let's say, cgroups, the primitives the Linux kernel uses to create containers? Do those also exist in QKernel, or not really?

Yes; actually, our goal is to implement all of those Linux kernel objects. So far we support cgroups and partially support namespaces inside QKernel. But overall, the Quark container runs inside a Linux container; it runs inside one namespace.

I got you. Because you're running a single container per VM, QKernel doesn't necessarily need to support multiple namespaces itself, right? It's almost like the whole thing is part of a single namespace, and the underlying Linux kernel is the one separating the different workloads. Yes, and for cgroups, that kind of resource isolation is still based on the host kernel's cgroups. All right, thank you.

Another question: on the shared memory, have you considered a fallback mechanism, in case some users don't want shared memory at the expense of performance? Oh, yes. We support three kinds of calls: the hypercall, the QCall, and io_uring. The hypercall was developed first, so every QCall request can also go through a hypercall; it's just a switch inside our code. Got it, thanks.

Okay, I can give more information about our current performance. We have tested it and compared it with Kata and gVisor; here are our test results, across a few metrics. The first is startup time, which we measured with the time command. For runc, meaning running the container natively on the bare machine, the startup time is about 600 milliseconds, and Quark is about the same, with only a small difference. gVisor is also still good, but Kata takes much longer. I think that's reasonable: Kata boots a full Linux kernel, and it has already done a lot of optimization, so almost two seconds is actually pretty good, but it is still a full kernel, so Kata has the slowest startup.

Another metric is memory overhead. I used BusyBox to test memory consumption. Quark takes about 12 megabytes of overhead, gVisor is 28, and Kata is the largest, because Kata boots a full Linux kernel. And on the memory side Quark has a further benefit: with Kata, when the guest Linux kernel consumes memory, it cannot release it back to the host kernel, so memory consumption keeps growing. With Quark, when the application frees memory, the Quark runtime can return it to the host, and that freed memory can be used by another container. gVisor has a similar issue: because it is based on the Go runtime, with garbage collection, freed memory cannot be reclaimed by the host.

We also did some throughput comparisons with industry benchmarks. The first one is etcd; these numbers are QPS. For runc, the put operation is about 3700 QPS, and Quark is slightly lower.
gVisor is also still good, but Kata is the slowest in this benchmark. It's a similar story across the etcd results: Kata is the slowest, and Quark is better. gVisor is sometimes better than Quark, but on the majority of operations Quark is faster. Where gVisor is sometimes ahead, we simply haven't had time to do deep performance tuning yet; we think any optimization gVisor can do, Quark can do as well, so in the future we can close that gap.

For Redis, gVisor is not good, but Kata is good. Here are the Redis test results. runc is our ceiling, because the Quark workload effectively runs inside a container, so runc is our performance ceiling. Sometimes Quark appears to run better than runc; I think that may be a testing artifact. We are currently using io_uring; in the future we may adopt more advanced technology, for example RDMA, to improve IO performance, so maybe then we can sometimes do better than runc. Overall, gVisor's performance is very low in this high-QPS environment, and Kata is much better, but overall Quark's performance is the best.

What's the metric on that? Oh, that's QPS. QPS, okay. Yes, all of these tests are throughput-based; I haven't tested latency yet.

This is the Nginx test. Here Quark is much slower than runc; I'm not sure why that happens, and in the future I will do more profiling on it. gVisor is much slower than Quark, and Kata is slower than gVisor. This is still QPS; I used the simplest case, an HTTP GET test.

We also tested a more complex scenario: MongoDB and MySQL initialization. Starting MongoDB and MySQL involves very complex system call sequences, so we used that as a test, and Quark's performance is still better than gVisor's and Kata's. The unit here is seconds: how long it takes to start MongoDB and MySQL.

In summary, Quark's memory overhead, startup time, and performance are the best among these three secure container runtimes. Any questions about the performance?

I have a question, Yulin. Have you done any performance benchmarks measuring how many containers you can put on a host, let's say with bare runc versus Quark and the other runtimes? In other words, these tests are for a single container, but if you take a big host and run maybe 100 or 150 containers with runc, and then do the same with Quark, how many can you run and still get the same performance? The capacity, sort of.

Oh, I haven't tested that. Yes, that's a good point; in the future we will add it. Our focus so far has mostly been throughput, and later we will test this kind of density. Sometimes you may get surprised: it may not scale linearly, or it may, one never knows, right? But people are often looking at, okay, how much stuff can I run on my server,
and how much capacity can I get out of it as far as workloads? Yeah, that makes sense. In the future we will add this kind of benchmark.

The last part is a demo. Here we have different commands to run with the different runtimes: Quark, Quark debug, runc, gVisor, and Kata. First, I'll demo running a bash shell in Ubuntu using Quark. This starts Ubuntu with Quark; we get a shell, we can go into /etc, and we can cat files, for example; it's all straightforward. I can also demo dd, for a simple disk test. For this test, with the Quark release version, the throughput is about 160 megabytes per second on my machine. If we run the same with gVisor, it is much slower. Kata is better than gVisor; this time it's about 100. So Quark is much better here.

We can also run etcd. This etcd instance runs with Quark, and I can run the etcd benchmark: the put benchmark, MVCC put, and so on. Okay, this is the put benchmark. Hmm, why is it much slower this time? Maybe I just introduced some performance regression with my recent changes. Only 1000; that's weird. Maybe my code added a regression. Here is another benchmark: lease keepalive, this kind of thing. Oh, and the log: I forgot to disable logging, but that means I can show more debug logs. Let's go back and run bash. In the log we can see the many system calls coming from the application: read and fstat, this kind of system call. And the last call here is pselect, with more calls generated after it, and so on. That's the demo; that's the last part. Any more questions?

I have a question, actually two. One of them: if I wanted to run Quark in the cloud, say I go to AWS or GCP and get an EC2 instance on AWS, which is itself a VM, and I want to run Quark on it, what do I need? Is that possible, or do I need nested virtualization enabled? What are the requirements?

Well, firstly, I haven't tried that, but it should be possible. We need to make sure, for example on Amazon, where I heard the virtual machines are based on KVM, that the virtualization feature, the official name is nested virtualization, is enabled. Yes, nested virtualization, they call it. I think Amazon doesn't support it, but Google's GCP does have an option for it. Do you know, Ricardo? Is that correct? Yes, they support it, and Azure supports nested virtualization as well. Okay, cool. Got it.

And would that be enough, Yulin? Let's say the machine supports nested virtualization; would that be enough?
Or are there any other requirements that you need?

I believe that's enough, but I haven't tested it. We require a common KVM-based virtual machine, so it should work. You may want to try that, because that's the most common use case you're going to see: people are renting their machines from the cloud all the time, so they're going to ask about it quickly. All right, good.

The other question I had: are there any limitations on the workloads you can run inside a container with Quark, any that you know of?

Yes, that's a very important part; that's the compatibility question. So far we support more than 200 system calls, and there are still some system calls we don't support. We have tested some common services like MySQL and Nginx, this kind of thing, but some applications may break if they use a special system call we don't support yet. We are working hard to add more support and improve this compatibility.

Good. And other than system calls, what about things like procfs, /proc, /sys: are they also Linux-compatible when you're inside the container? Do they look exactly the same as they would on a Linux kernel, or not?

Actually, that's a good question. For the /proc folder, we support some of it; it's still a subset of Linux, we don't support all of it.

Got it. So a lot of workloads run in it, but there are certainly workloads that either use system calls you don't implement yet, or access things like /proc entries that aren't implemented yet, and right now those don't work; maybe in the future they will. Is that right? Yes, you're right.

Okay. One final question: if you were to launch a pod in Kubernetes, does this work with pods? In a pod you have multiple containers sharing some namespaces but not others; for example, they share the network namespace but not the rest. Have you tried this with Kubernetes pods?

We haven't done full testing of that yet. In principle we support it: one QKernel, which we name a sandbox, can run multiple containers inside it, and starting containers inside a sandbox the way runc does is possible. We have implemented some namespaces inside QKernel, but we didn't fully test that, so we haven't tested pods yet.

You may want to work on that too, because that's also going to be a very common use case. People will say, okay, it's great that I can use Docker, but most people are beyond Docker; they want to use it with things like Kubernetes. So that's something I would also recommend you put some attention on.

Yes, that is our target: it must be able to run inside Kubernetes. We are also collaborating with one company, a cloud provider, to work on this; this requirement is a key requirement on the critical path to supporting Kubernetes, and we are trying to add the support together with them.

Do you have any users now, or not yet? We are working with two cloud providers. One of them is trying to put some production workload on it, and we are doing that together, step by step.
So far we have no production deployment, but someone is trying to get there. That cloud provider is using gVisor now and trying to move to Quark, if Quark can meet their requirements. Got it.

I had a few questions. On the performance metrics: are you going to make those available, the slides you presented today? I think people would be interested in looking at them.

Yes, all of them are in the Quark Git repo, in the doc folder, in the performance PDF. It also covers how to run the tests, how to reproduce these results: for example, how to run the startup-time test and how to run etcd and the others; it's all in there, and the test results are in the performance PDF.

Right. I think it would also be interesting to see the CPU utilization overhead for all the runtimes you compared. Yes.

Some other questions about the project. You mentioned having a couple of cloud providers interested in testing it out. When I first looked at it, granted that was a month or two ago, it was unclear to me who is sponsoring it and how many people are involved in the project. I'm just curious what the state of the project is today.

Actually, we just open-sourced this project last month, and we are in the initial stage now. So far we have no solid sponsor.

So how many contributors do you have today? I'm just curious: did this start as a personal project or a work project? The majority of the contributors are from our company, and now a cloud provider is trying to contribute as well, but they are not contributors yet.

Okay, I was just curious: is Quark Containers your company's product that you've been working on? It's already open source, under the Apache license, and the majority of the contributors are from our company now. Yes, I was just trying to understand where the sponsorship was coming from.

So how did you get started with the project? How did the idea come to fruition: was it because of some of the limitations of gVisor?

Yes, it was motivated by gVisor. I think at that time quite a few people had the thought of rewriting gVisor in Rust, and we just started working on it. All right, that's great; I think it's interesting. Thank you.

Any other questions? So yeah, this is great; it's exciting to see this as an alternative to something like gVisor and Kata. And you already applied for Sandbox, right, for the CNCF? Yes, we plan to donate this project to the CNCF.

What are some of the things you want to get out of being in the CNCF? What do you expect: more contributors, more community support? Yes, the motivation is more contributors, and also alignment with the CNCF ecosystem; we hope to align with it, put the project inside the CNCF ecosystem, and be part of it. Got it.

All right, we've got ten more minutes. More questions? I think that's about it. So yeah, thank you, everyone. Thank you for presenting, and we'll keep in touch. Yeah, thank you. Thank you. Bye.