My name is Maciej Szmigiero. I am a principal member of technical staff at Oracle, working in Konrad Wilk's Linux virtualization and security organization; Boris Ostrovsky is my director. I mostly work on Windows guest support, especially Windows support in Oracle Cloud.

Today we will talk about adding a hot memory resize capability, that is, the ability to change a guest's RAM size at runtime, for Windows guests running under QEMU, because QEMU is what Oracle Cloud uses. This is centered on the Hyper-V Dynamic Memory protocol.

Since this is Open Source Summit, the obvious first question is why support proprietary protocols in an open source or free software ecosystem at all; that is something people often ask in these cases. The answer is the same as for other Windows support in open source, for example Samba, or the Hyper-V enlightenments in KVM: customers want to run Windows VMs under an open source stack, and if they cannot, they will simply run them on Azure and switch to a 100% proprietary stack. So it is better to support Windows than to push those customers away from open source entirely. It is similar to supporting hardware that is not supported by its manufacturer, like running Linux on Apple M1 machines.

So why resize Windows VMs at runtime in the first place? Mainly because it is very hard to predict the required VM size with 100% accuracy all the time. Customers do not want to pay for spare capacity they are not using, but they also do not want to shut a VM down just to resize it, because the VM still has work to do and a reboot costs time. So it is always good if a VM can be resized while it keeps running: if a customer notices the VM is running short of memory, they can add some without interrupting whatever is running inside it. It also gives the host side a tool, for example for memory overcommit solutions: when too much memory has been committed on a host, a VM can sometimes have its size trimmed at runtime.

So, in concrete terms, let's say we want to resize a VM's RAM up and then back down. The first candidate mechanism is ACPI DIMM hotplug, but QEMU supports something like 256 memory slots. The obvious solution would be to increase that number, but it does not scale: there are already configurations whose ACPI tables are getting too big, and if you increase the slot count, those tables grow even more. And even setting that aside, the slots have very coarse granularity, which is a direct effect of their limited number.
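To get a feel for that granularity problem, here is a rough back-of-the-envelope sketch in C; the numbers are illustrative, not from the talk:

```c
/* Rough illustration (made-up numbers): with ACPI DIMM hotplug the
 * resize granularity is bounded by the slot count, so large guests
 * get very coarse resize steps. */
#include <stdio.h>

int main(void)
{
    unsigned long long maxmem_gib = 1024; /* 1 TiB of hotpluggable RAM */
    unsigned int slots = 256;             /* assumed ACPI slot limit   */

    /* Covering the whole hotpluggable range needs DIMMs of at least
     * this size, so the guest can only grow or shrink in steps of
     * several GiB at a time. */
    printf("minimum DIMM size: %llu GiB\n", maxmem_gib / slots);
    return 0;
}
```

With these example values each virtual DIMM has to be at least 4 GiB, so a guest that needs just a few hundred megabytes more still pays for a whole stick.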
Removal makes this even harder to scale: a single stuck page inside a DIMM stick pins the whole stick. There is also a performance problem, a ripple effect on removal. What I mean is that the host basically has no information about the guest's memory layout, so it can always end up choosing the wrong, most used DIMM to remove. Say there are three extra DIMMs plugged into the guest, A, B and C; A and B are nearly empty but C is nearly full. As I said, the host does not know this, so it makes a wrong guess and asks for the last one, C, to be removed. On the removal request the guest wants to free that stick, so it copies its contents to stick B. Then the host guesses wrong a second time and chooses stick B to remove instead of, for example, stick A, so once again the guest has to copy the memory contents. That is obviously fixable if you make the host aware of how used each stick is inside the guest, but it is something that would have to be developed; it is not there currently.

The second popular resizing solution these days is virtio-mem, but it still has a fairly large block size: the current minimum, as far as I know, is one megabyte, which is 256 pages. So it basically has the same problem that a single stuck page in a block can prevent the block's removal. It is not as severe as with a 100-gigabyte DIMM stick, or something around that order of magnitude, but it is still not management of memory at the basic granularity of the hardware page.

Also, in terms of Windows, there is no native Windows driver for virtio-mem. I know there was an attempt presented last year at DevConf.CZ by Marek Kędzierski; there is a link to his talk. He mentioned there are a lot of issues on the Windows kernel side, because, just as on Linux when virtio-mem and similar things were added, you basically need access to a lot of the memory management internals, and with Windows that is a bit challenging. I think he also noted that the Hyper-V Dynamic Memory protocol client driver in Windows actually uses a lot of undocumented Windows calls, as one might expect. So this approach would be fragile: if we wrote a virtio-mem client driver for Windows and it used undocumented Windows data structures or functions, there would be a problem if Microsoft removed those in the next Windows version, or even in the next Windows update. So it is rather challenging to add virtio-mem support to Windows. I am totally for doing it if somebody wants to; it is always good to have competition, because it is good for business, and if somebody adds a virtio-mem client driver for Windows that will be great. It would probably also push Microsoft to document those interfaces, which would help everybody, because things would be more transparent. As I said, some of those issues are theoretically at least partially fixable, but fixing them would be much more invasive, while this driver is a pretty much self-contained solution: it only requires QEMU changes and does not require any other changes, for example to the guest itself.
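Coming back to the block granularity point for a moment, here is a minimal sketch (this is not virtio-mem's actual logic) of why one stuck page pins a whole block: a block can only be returned to the host when every page in it is free.

```c
/* Minimal sketch: a hot-unplug block is removable only if *every*
 * page inside it is free, so a single in-use page keeps the whole
 * 1 MiB block resident. */
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE       4096u
#define BLOCK_SIZE      (1024u * 1024u)             /* 1 MiB block */
#define PAGES_PER_BLOCK (BLOCK_SIZE / PAGE_SIZE)    /* = 256 pages */

static bool block_removable(const bool *page_in_use, size_t block_idx)
{
    const bool *pages = page_in_use + block_idx * PAGES_PER_BLOCK;

    for (size_t i = 0; i < PAGES_PER_BLOCK; i++) {
        if (pages[i]) {
            return false;   /* one stuck page prevents removal */
        }
    }
    return true;
}
```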
In terms of guest resizing, the other thing that is often mentioned is ballooning, and we basically have to have some kind of ballooning solution, because something is required to change the guest size at runtime with the required granularity. It does not matter what you call it, a small DIMM or ballooned-out pages; what is important is that the management happens at very low granularity, basically at the hardware page level, and that is what ballooning usually gives you. That is why it is pretty much necessary if you want these guest size changes done with a lot of flexibility.

That is also why integration between the ballooning and hot-add drivers is desirable: we do not want to hot-add memory to a guest that has been ballooned down to some size. We want to first deflate the balloon, and only once the guest is back at its boot memory size can we think about adding a new memory range to the guest to increase its size further. With that integration in place it is easier for the process controlling QEMU to actually manage the resizing.

In terms of ballooning, virtio-balloon is the currently preferred QEMU ballooning solution, but the problem with this driver regarding Windows is that its Windows client driver actually marks the ballooned-out pages as in use inside the guest. So there is no way to remove a DIMM stick backing them, because as far as the guest is concerned those pages are actually in use. That is obviously solvable if somebody wants to solve it, but as I said, it requires further changes in the guest. This is also a pure ballooning driver and protocol, so there is no way to resize the guest past its boot size; you can only resize it down and back up to the boot size. It also has rather low performance, because the whole protocol communication operates on single pages. There is no optimization for removing a large range: you want single-page granularity, because there can be a stuck page somewhere, but at the same time you will usually remove a big range, so you want to operate on ranges. As I said, that is probably fixable too, if somebody wants to do it and it turns out to be a performance bottleneck.
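To illustrate the single-page versus range point, here is a rough sketch; the message format is invented for illustration, not the actual virtio-balloon wire protocol. Coalescing runs of releasable pages means a handful of range messages instead of one message per page, while a stuck page merely splits a run in two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented message format, for illustration only. */
struct release_range_msg {
    uint64_t start_pfn;
    uint64_t count;
};

/* Coalesce runs of releasable pages into range messages instead of
 * sending one message per page. Freeing a 1 GiB run per-page would
 * take 262144 messages; per-range it takes one per contiguous run. */
static size_t emit_release_ranges(const bool *releasable, size_t npages,
                                  void (*send)(struct release_range_msg))
{
    size_t msgs = 0;

    for (size_t i = 0; i < npages; ) {
        if (!releasable[i]) {
            i++;            /* stuck page: just ends the current run */
            continue;
        }

        size_t start = i;
        while (i < npages && releasable[i]) {
            i++;
        }
        send((struct release_range_msg){ .start_pfn = start,
                                         .count = i - start });
        msgs++;
    }
    return msgs;
}
```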
So what is the Hyper-V Dynamic Memory protocol? It is a protocol that uses the Hyper-V VMBus, which is pretty much an undocumented bus: neither the bus nor the Dynamic Memory protocol has been documented in the Hyper-V Top Level Functional Specification. I actually asked Microsoft for specs for this, but they said they would write them at some unspecified point in the future, so they are not there yet. The QEMU host support for the VMBus itself was actually developed by Virtuozzo; that was a pretty big effort, so big thanks to those guys for doing it. It was, let's say, orphaned recently, so I took a maintenance shift for that driver in QEMU. It is almost self-contained and does not require much integration with other QEMU subsystems, so it gets very low traffic, but it is still good to have a maintainer, because even though it is self-contained it is a large and tricky piece of code. And since we do not have documentation for this protocol, at least we have the Linux kernel client drivers for it, so we can sort of observe it from the opposite side. Obviously they only document the messages and data structures that Microsoft decided to use for their Linux guests; they do not document everything that is possible when Hyper-V communicates with genuine Windows guests. So we are limited to what we can reverse engineer from the Linux kernel client drivers, but they are still a huge help.

Those were, let's say, the bad things about the protocol. The good thing is that it has built-in support starting from about Windows Server 2012 R2, and newer versions of Windows Server obviously also support it. Probably the client Windows versions have support too, but client Windows versions are not really run in clouds, so we do not care that much about those. That should say 2019, sorry, 2022: I have not checked Windows Server 2022 myself, but I am almost certain that it also supports this protocol. Whether it uses some kind of extension I do not know; that has to be verified, but the basic protocol support is surely there.

A limitation of the protocol is that it will not advertise the hot-add capability unless S4 support is disabled in the guest. So it is somewhat incompatible with VM freezing solutions that are based on hibernation; a small limitation, or maybe a major one if your provider wants to freeze VMs via client-side hibernation. It is something to keep in mind.

More good news is that the Windows interpretation of this protocol seems really determined to free the requested memory; it basically tries to free as much as possible. If there are, let's say, 800 megabytes of memory in use in Windows and you request a balloon-down, it will free everything down to those 800 megabytes and leave very little to spare. It is worth mentioning that it will not actually crash doing this: it keeps some kind of memory reserve, so even if you balloon it down to almost the minimum required size, it holds on to that reserve and does not crash. It is also important that in Windows part of the kernel memory can be swapped to disk, so it will start swapping very heavily, and it probably also does some kind of memory compaction, but it is Windows, so we do not know the details. I think that enforcing the balloon floor, the minimum size the guest can be ballooned down to, is best left to the guest, because the host obviously has no knowledge of the guest kernel memory internals, so it does not know how low the guest can go. I know there was a problem with the Hyper-V client driver for this protocol in Linux: if you set the size too low, it could go so low that the kernel would crash. And this is somehow related to the dynamic memory VM setting in actual Hyper-V. There is a setting like this in the VM config in Hyper-V, but it is a bit misleading, because it only controls the automatic VM size management that depends on the load on the host; if you manually change the VM size in Hyper-V, it will use this protocol if it can even when that setting is disabled. So the protocol is kind of always in use when possible.

So let's talk about the driver. It was developed as a new driver, named hv-balloon after the Linux kernel client driver for this protocol, following the pattern established by the virtio-balloon driver. The driver supports both memory hot-add and hot-removal requests, and ballooning. Because this is the first version, the easiest way to wire it up was to plug it into the QEMU ballooning commands, that is, balloon and info balloon. As for what the driver is not: it is not a cross-platform VM resizing driver; it is limited to what Windows supports. It is currently only tested on x86; ARM64 should in principle be possible in the future, but there is no requirement for the first version to have it. And Linux guests are obviously not in scope, because this is a proprietary, Windows-specific protocol.
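Since a single target size arriving via the balloon command has to drive both mechanisms, here is a hypothetical sketch of the ordering described earlier: deflate the balloon back to boot size first, then hot-add. The function names are illustrative, not actual QEMU APIs.

```c
#include <stdint.h>

/* Hypothetical controller logic; balloon_to() and hot_add() stand in
 * for whatever actually performs each operation. Assumes no extra
 * range has been hot-added yet. */
void resize_guest(uint64_t current, uint64_t boot_size, uint64_t target,
                  void (*balloon_to)(uint64_t),
                  void (*hot_add)(uint64_t))
{
    if (target <= boot_size) {
        /* At or below boot size, pure ballooning is enough. */
        balloon_to(target);
        return;
    }
    if (current < boot_size) {
        /* Deflate the balloon fully before any hot-add. */
        balloon_to(boot_size);
    }
    /* Only then grow past the boot size with a new memory range. */
    hot_add(target - boot_size);
}
```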
The first implementation was actually built using a kind of universal backing device for a memory hot-add protocol; I call it HAProt, and the Dynamic Memory protocol driver registers as a provider for it. This allows removing memory from the guest in single-page units; we are reusing the ballooning part of this protocol, since the Dynamic Memory protocol supports both ballooning and memory hot-add. Obviously it has to not trust the guest: the guest could, for example, report that it is returning pages that are outside its current address space, and there would be a problem if those ranges were later hot-added.

For performance, the guest-released memory is tracked in range trees; it is basically something like an extent tree in a file system. This required a few new GTree operations to be added to GLib, and they were already upstreamed and released as part of GLib 2.68, like a year and a half ago. The driver detects their presence, so as not to break building QEMU with older GLib versions. Using these range trees gave much better performance than virtio-balloon: it achieves almost three times the ballooning performance in comparison to virtio-balloon. As I said, it is possible to port these enhancements over; it is a fairly universal idea.

The basic HAProt device works like a virtual DIMM stick in that it allows inserting extra RAM into the guest at runtime, but the DIMM stick is not defined at VM start time; it is created at runtime, and notifying the guest about the new memory range is done via the protocol handler, which in this case is the Dynamic Memory protocol driver. Obviously the ACPI DIMM slot limit does not apply, and the virtual DIMM size is determined at insertion time. The protocol handler, for its part, can inform the guest about removal of the device and the virtual DIMM, and do its own cleanup, if that is the desired operation.

Also, as part of this effort, I upstreamed a scalable memslot implementation for KVM, since each hot-added memory range is a new memslot. The new implementation lets KVM grow on a pay-as-you-go basis: it changes all the linear scans that were there, which depended on the memslot count, into logarithmic operations, so it gives much better performance with a large number of memslots, and the maximum number of memslots was increased from 509 to 32K. Those improvements were released in kernel 5.17.

As a summary, because we are running out of time: this was just a first attempt, an MVP, so there is still a lot of work to do. One of the main things to figure out is what to do on guest reboot. Hyper-V seems to resize the boot memory to match the current guest size at that point, and implementing this in QEMU would be tricky because it requires a lot of changes; it is quite an invasive change. At the same time we try to avoid relaunching QEMU with the new guest size, because that is kind of risky and it also breaks some workflows. Currently the virtual DIMM sticks are reinserted after the guest reconnects to the Dynamic Memory interface; there is actually a waiting period for this, the same one the original Hyper-V uses. And obviously, when the guest reboots they are marked as not in use, so they can be removed. There are also other, smaller changes, for example this HAProt device itself: it is a first attempt, and we can probably do better and use a better QEMU interface for it.
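To illustrate the range-tree bookkeeping mentioned above, here is a minimal, self-contained sketch using the GTree node API that arrived in GLib 2.68 (g_tree_upper_bound() and friends). The actual hv-balloon trees track more state than this, so treat it as a flavor of the approach, not the real code.

```c
#include <glib.h>

/* Order ranges by their start address. */
static gint addr_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
{
    const guint64 ka = *(const guint64 *)a, kb = *(const guint64 *)b;
    return (ka > kb) - (ka < kb);
}

int main(void)
{
    /* key = range start, value = range size (both heap-allocated). */
    GTree *ranges = g_tree_new_full(addr_cmp, NULL, g_free, g_free);

    guint64 *start = g_new(guint64, 1), *size = g_new(guint64, 1);
    *start = 0x100000;
    *size  = 0x200000;
    g_tree_insert(ranges, start, size);

    /* g_tree_upper_bound() (GLib >= 2.68) returns the first node whose
     * key is strictly greater than the probe: a log-time lookup of the
     * next tracked range instead of a per-page scan. */
    guint64 probe = 0x0;
    GTreeNode *n = g_tree_upper_bound(ranges, &probe);
    if (n) {
        g_print("next range starts at 0x%" G_GINT64_MODIFIER "x\n",
                *(guint64 *)g_tree_node_key(n));
    }

    g_tree_unref(ranges);
    return 0;
}
```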
NUMA awareness is also an important thing. There appear to be virtual NUMA node members in the protocol messages, but it has to be investigated how to actually tell Windows about this, and whether it actually changes anything in how Windows places and satisfies its memory requests, whether it has the desired effect. We also want coexistence between hv-balloon and virtio-balloon, because we want to be able to use both, for different kinds of guests, with the same QEMU; maybe they can even be integrated as backends of a common user interface. I think I am running out of time, so I will skip the live demonstration and go to Q&A immediately. Are there any questions?

The question is whether Hyper-V detects the guest type and does something different depending on whether it is Windows or Linux. I know that Hyper-V has a guest type field, so it does detect the guest type, but at the same time the only resizing protocol Hyper-V has is the Dynamic Memory one, so that is the only protocol it can offer to the guest; there is no choice here depending on the guest type.

The remark was that the virtio-mem driver for Windows is being reviewed, because it was submitted as a pull request by Marek. Great; it is great to have competition, really.

The question was why use the generic QEMU ballooning interface instead of integrating with virtio-mem. It could integrate the three things under the hood, let's say as backends that execute the request: it is entirely possible to have virtio-mem, ballooning and hv-balloon under one user interface, depending on what the guest can use. It totally makes sense; it is a great idea, I think. Thanks.

Yes. I have not heard about any changes regarding this; I will repeat the question: it is whether using this interface actually changes something in Windows about zeroing memory. I do not know about any changes; maybe there are some. It could be a performance optimization for the future, but that is for version 3, maybe, or 4. I have not heard about this kind of interface that basically tells the guest that the memory is already zeroed, but it would be a good performance optimization for sure.

Any further questions? Then thank you for attending.