Hello, my name is Maxime Coquelin, I work at Red Hat on virtualization. Today we are going to talk about post-copy live migration with vhost-user backends.

The reason we added post-copy migration support for vhost-user backends is that today we can only use pre-copy migration, which is the standard way of doing live migration. This method can cause problems when we have high page dirtying rates, and NFV is exactly such a use case. When that happens, the total migration time becomes very long, sometimes the migration never converges at all, and the downtime can also be significant. With post-copy live migration, the goal is to have a deterministic migration time. Until recently, we had no post-copy support for vhost-user backends; you could only use post-copy live migration with a classical VM, where the virtio device is emulated in QEMU.

First, let's look at what pre-copy migration is, to understand why it causes these problems. The concept of pre-copy migration is to copy the guest memory from the source to the destination before execution starts on the destination. While this copy is in progress, the guest is still running on the source, so some pages are still being written to in guest memory, and we need to track these writes. This dirty page logging must cover writes done by the guest, writes done by the QEMU process, and writes done by other processes, like the virtio backend running in OVS-DPDK.

In QEMU, a migration thread is in charge of copying all the memory from source to destination. This thread gathers all the dirty page information and sends the pages to the destination host. Since pages can be dirtied while they are being copied, the same page may be transferred multiple times, which causes longer migration times. In some cases the migration can even never converge, if the guest is writing memory too fast or if the bandwidth is too low. To overcome this, we have to either slow down the guest, which has an impact on performance, or pause it, which has an impact on the downtime.

So, as I said, we have to do dirty page tracking at multiple levels. For guest page writes, this is done at the KVM level. The idea is that QEMU notifies KVM that it needs to track dirty pages, using a dedicated ioctl flag. When handling this request, KVM creates a bitmap for each memory slot and removes write access to that slot's memory pages. Once dirty page tracking has started, a guest write to a page causes an EPT violation, because we removed the write access, and therefore a VM exit. We trap into KVM, KVM sets the corresponding bit in the dirty bitmap, and then restores write access to the page, so that further writes to the same page no longer trap. On the QEMU side, the migration thread collects this dirty bitmap with another ioctl and transfers the dirty pages to the destination.
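To make this concrete, here is a minimal sketch of the two KVM ioctls involved, KVM_SET_USER_MEMORY_REGION with the KVM_MEM_LOG_DIRTY_PAGES flag and KVM_GET_DIRTY_LOG. It is an illustration of the mechanism, not QEMU's actual code; the helper names are mine and error handling is omitted:

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Re-register an existing memory slot with dirty logging enabled:
 * KVM allocates a dirty bitmap for the slot and write-protects its
 * pages, so the first guest write to each page traps. */
static void enable_dirty_logging(int vm_fd,
                                 struct kvm_userspace_memory_region *slot)
{
    slot->flags |= KVM_MEM_LOG_DIRTY_PAGES;
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, slot);
}

/* What the migration thread does periodically: fetch (and reset) the
 * dirty bitmap of a slot, then transfer the pages whose bit is set. */
static void collect_dirty_bitmap(int vm_fd, uint32_t slot_id,
                                 uint64_t slot_size)
{
    size_t bitmap_size = (slot_size / 4096 + 7) / 8; /* one bit per 4K page */
    void *bitmap = calloc(1, bitmap_size);
    struct kvm_dirty_log log = { .slot = slot_id };

    log.dirty_bitmap = bitmap;
    ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
    /* ...scan 'bitmap' and queue the dirty pages for transfer... */
    free(bitmap);
}
```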
So, this is nice because we don't have to modify the guest OS to handle the migration. But each time the guest writes to a tracked page, we get a VM exit, meaning we switch from the guest back to the hypervisor and the vCPU is no longer executing guest code, so there is a performance impact. And since the same page can be transferred multiple times, there is an impact on the migration time too. QEMU has the auto-converge feature to throttle the vCPUs when the page dirtying rate is too high; it helps reduce the dirtying rate, but it also has a performance impact.

In the vhost-user backend, we do something similar: we also do dirty page logging, because we have a process external to QEMU that writes to guest memory, and we need to track that. There is a vhost-user protocol extension for this: QEMU allocates the bitmap and shares it with the vhost-user backend using a dedicated SET_LOG_BASE request. Then, for each page it writes, the backend sets the corresponding bit in the bitmap, and periodically QEMU checks the dirty bits, transfers the pages, and clears the bits. Setting and clearing the bits has to be done with atomic operations, otherwise we could lose some dirty pages. But when multiple queues, plus QEMU, perform these atomic operations on the same memory region, it creates contention, which also causes performance issues; a sketch of the logging is shown below.

To mitigate this, we implemented an optimization in which the vhost-user backend keeps a per-virtqueue dirty page cache: as long as we don't commit, that is, as long as we don't write the used descriptors back to the guest, we don't mark the pages as dirty in the shared bitmap. This reduces contention. But it is still not perfect: with many queues we can still have a lot of contention and fail to converge. And the auto-converge feature that QEMU uses to throttle the guest vCPUs may not help here, because the backend is a separate process: it can keep writing even when the guest is slowed down.
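The logging itself boils down to an atomic OR into the bitmap shared through SET_LOG_BASE, roughly as below; the names are illustrative, not DPDK's exact ones:

```c
#include <stdint.h>

#define VHOST_LOG_PAGE 4096

/* Mark one guest page dirty in the log shared with QEMU. The atomic
 * OR is required because QEMU clears bits concurrently, but it is also
 * what creates the contention: several queues and QEMU keep bouncing
 * the same cache lines. */
static inline void log_dirty_page(uint8_t *log_base, uint64_t gpa)
{
    uint64_t page = gpa / VHOST_LOG_PAGE;

    __atomic_fetch_or(&log_base[page / 8], 1u << (page % 8),
                      __ATOMIC_RELAXED);
}
```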
So, we looked at post-copy live migration for the vhost-user backend to overcome this. The difference with pre-copy is that with post-copy, we start execution on the destination before all the memory has been copied. We start by copying only the minimal VM state, like the vCPU states, the registers and the device states, which is a few kilobytes, not megabytes, and we start execution on the destination with that. In parallel, we start copying pages from source to destination. At some point, the guest will try to access a page that has not been copied yet, so we have a page fault to handle, and we use a mechanism to request that specific missing page in order to continue execution. This page fault handling is done with userfaultfd, so that the fault is handled in userspace: QEMU on the destination can do the network communication needed to get the missing page from the source. The advantage of this solution is that we have no more convergence issues, because each page is copied only once. But if the migration fails for some reason, for example we lose the network, we cannot recover the migration, because part of the VM state is on the destination and part is on the source. In that case, we are lost.

As I said, to implement this we use userfaultfd. userfaultfd provides a dedicated syscall that returns a file descriptor on which we can issue ioctls. The first ioctl is used to negotiate the API, because depending on the kernel version, different operations are available. We have a register ioctl, which we use to register every memory slot of the guest with userfaultfd. Then, when we want to map a received page into the guest, we have a copy ioctl. If the page is empty, all zeros, we also have a dedicated ioctl to map a zero page. And when we are done copying or zeroing the new page, we wake up the thread blocked in the page fault using the wake ioctl. Like with pre-copy, we have a dedicated thread running to handle these page faults; it receives the missing page information by polling and reading the userfaultfd. What is also great about userfaultfd is that it has a non-cooperative mode, which is interesting for our use case: it means an external process can handle the page faults of another process.
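Here is a minimal sketch of that flow: negotiating the API, registering a region, then resolving one fault with the copy ioctl. It is a simplified illustration, not QEMU's implementation; the function names are mine and error handling is omitted. Note that UFFDIO_COPY wakes the faulting thread itself unless asked not to, so no separate wake ioctl is needed here:

```c
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create the userfaultfd, negotiate the API (available features depend
 * on the kernel version) and register a guest memory region for
 * missing-page events. */
static int uffd_setup(void *region, size_t len)
{
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)region, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };

    ioctl(uffd, UFFDIO_API, &api);
    ioctl(uffd, UFFDIO_REGISTER, &reg);
    return uffd;
}

/* Fault-handling loop body: poll the fd, read the faulting address,
 * then map in 'page' (already fetched from the source) at that spot.
 * UFFDIO_COPY also wakes up the blocked faulting thread. */
static void handle_one_fault(int uffd, const void *page, size_t page_size)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    struct uffd_msg msg;

    poll(&pfd, 1, -1);
    if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
        msg.event != UFFD_EVENT_PAGEFAULT)
        return;

    struct uffdio_copy copy = {
        .dst  = msg.arg.pagefault.address & ~(page_size - 1),
        .src  = (unsigned long)page,
        .len  = page_size,
        .mode = 0,
    };
    ioctl(uffd, UFFDIO_COPY, &copy);
}
```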
Now, let's see how post-copy works in detail. The vCPU is started on the destination and begins to execute. At some point, it tries to access a missing page, so we trap into the hypervisor, and the hypervisor notifies the migration thread, which gets a POLLIN event on the userfaultfd. The migration thread reads the missing page's address, requests the corresponding page from the source, and when the page is received, it copies it into guest memory and resumes execution of the vCPU.

What we need is post-copy support for the vhost-user backend, so that when the backend, which can be for example OVS-DPDK, tries to access a page that is missing, QEMU can request the page from the source and execution can continue. To do this, we use the non-cooperative mode discussed earlier, so that it is QEMU that requests the missing page, because it is the one that knows about the QEMU on the source side. The vhost-user backend creates the userfaultfd and shares it with QEMU, and for every guest memory region it maps, it registers that region with userfaultfd. With that we are almost done, except that when QEMU receives the missing page information, the address is in the backend process's virtual address space, not in QEMU's. So QEMU needs a way to translate this address into its own address space, and for this, the backend has to send QEMU the addresses at which it mapped the guest memory regions. To do all of that, we extended the vhost-user protocol: we can send the userfaultfd over the Unix socket, notify the slave that post-copy is starting, and notify it when post-copy is done. Also, to send the slave's address space information, we extended the SET_MEM_TABLE protocol request so that the slave can reply with its mapping addresses.

I did some benchmarks. They are not really realistic, because the migration is done on the same host, so we have unlimited bandwidth, and the workload is not really realistic either: we use DPDK both on the host and in the guest, the guest does I/O forwarding in a loopback setup, we inject packets, and once the I/O loop is running we start the migration. With the guest memory backed by 1 GB huge pages and a forwarding rate of 10 million packets per second, we see a big gain in the total migration time, which goes from 16 seconds down to 2 seconds with post-copy, and a slightly better, though not really noticeable, downtime with post-copy as well. Then I switched to 2 MB huge pages, and we see that with pre-copy the migration time gets higher, whereas with post-copy it stays at the same duration, which tends to confirm that post-copy is a good fit here. But we can also see a higher downtime with post-copy, which might be explained by the much larger number of page requests sent to the source to get the pages.

We still have some remaining work to do. First, we have an issue where post-copy migration breaks when the process pre-faults the shared memory. For example, if the application mlocks its memory at startup, the migration will fail, because the pages are pre-faulted at mmap time, before the migration is initiated, and today we have no way to prevent this. The problem is that we are a library, so we have no control over the application. Maybe we need something like a new mmap flag meaning "never pre-fault", or something like that. At the very least, we need to detect, before doing the mapping, that the application has done an mlock, so that we can forbid post-copy in that case; one possible detection heuristic is sketched below.
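This is only a hypothetical heuristic, not something DPDK does today: before accepting post-copy, the vhost library could use mincore() to check whether the region's pages are already resident, which would be the case after an mlockall() at startup, and refuse post-copy if so:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical check: returns true if any page of the (page-aligned)
 * region is already resident in RAM, e.g. pre-faulted by mlockall().
 * Registering such a region with userfaultfd would be pointless, since
 * its pages will never fault, so post-copy should be refused. */
static bool region_prefaulted(void *addr, size_t len)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = (len + page_size - 1) / page_size;
    unsigned char *vec = malloc(pages);
    bool resident = false;

    /* mincore() fills one byte per page; bit 0 means "resident". */
    if (vec && mincore(addr, len, vec) == 0) {
        for (size_t i = 0; i < pages; i++) {
            if (vec[i] & 1) {
                resident = true;
                break;
            }
        }
    }
    free(vec);
    return resident;
}
```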
We also still have to enable post-copy in OVS-DPDK: it is supported in DPDK, but not in OVS-DPDK yet. And we need to run benchmarks in more realistic conditions, with NFV workloads, so that we can measure the packet drops, which is what really matters, and also the migration time. This support was released in DPDK 18.11, and the vhost-user protocol extension was added in QEMU 3.1. I am working with the QE team to get more realistic benchmarks, and I will try to post the results in a blog. I would also like to give credit to David Alan Gilbert, who worked on the QEMU side of post-copy support for vhost-user backends and on post-copy in general, and to Andrea Arcangeli, who added userfaultfd support to the kernel.

Question: do we have some idea how to improve the downtime? So, I think you are referring to the downtime figures shown here. First, we need to be sure what the root cause of this downtime is; then we can trace it and understand the exact reason. We have some ideas, but we need to do the investigation first. Maybe we could use a hybrid solution: do pre-copy for one pass, copying most of the guest memory to the destination, and then switch to post-copy; there is an option for that in virsh, to switch to post-copy after pre-copy. We would need to see, depending on the use case, whether the migration has to finish early or not. With some tuning, we could maybe also speculate on the pages: there is already this kind of algorithm in QEMU, so that when it handles a page fault, it also sends the surrounding pages speculatively. Post-copy should only fail on a network outage, or because of a bug, but if that happens, it is really bad, because we cannot recover.

OK, thank you, we can talk afterwards.