Good morning, I'm Sean Christopherson. I work at Google Cloud, mostly on KVM x86, and today I'm going to be talking about scaling KVM and its community.

Okay, disclaimers. Disclaimer number one: this is going to be a non-technical talk, focused on scaling KVM as a project, not as a hypervisor. If you were expecting a technical talk, sorry, no cake for you. Disclaimer number two: much of this talk is going to be x86 centric. That said, many of the observations and learnings can be applied to other architectures; at the very least, hopefully this talk will help others avoid repeating x86's mistakes. Disclaimer number three: I am not a data scientist. Any flaws in the data I've collected are due to incompetence, not malice.

Okay, before I dive into the meat of the talk, I want to state the obvious: KVM is not small. With the addition of RISC-V, KVM now supports six architectures, and if you count the different flavors of virtualization in each architecture, for example x86's SVM versus VMX, that number is easily in the double digits. There are over 300 KVM-specific files across all those architectures, with over 150,000 lines of code, not counting whitespace or comments. In 2022 alone, over 150 people have contributed more than 1100 patches to KVM. And that's not taking into account all the people using, testing, and deploying KVM. And that's also not taking into account the over 90,000 lines of test code between the so-called KVM-unit-tests and the KVM selftests. I expect all these numbers will continue to grow.

Looking beyond 2022, where does KVM need to scale? In my mind, there are three points of focus: development, maintenance, and validation. From a development perspective, KVM needs to scale to support the increasing number of features being added to KVM, as well as an increasing number of use cases and the variety of those use cases. Inevitably, all those new features and use cases mean the number of contributors and contributions to KVM will also continue to increase, which means more patches to review and more code to maintain. And more code, features, and use cases means more things KVM can break in exciting new ways.

Note, I specifically did not say developers and maintainers. We don't necessarily need to scale the number of developers; we can scale development overall if we can make the developers we have more efficient. Same goes for maintainers: we don't strictly need more maintainers, though that would certainly help. We can effectively scale maintenance if we reduce the cost to maintain KVM, for example by reducing the number of bugs that are introduced in the first place. And obviously all these focal points overlap and interact: if we increase the number of maintainers and reviewers, then we'll theoretically be able to support all that new development, but if we don't also scale our validation capabilities, then keeping KVM healthy will become more and more costly, and eventually development will slow down and/or maintenance will again be a bottleneck.

On the surface, the goals of this talk are to identify what I see as the pain points and bottlenecks in KVM as a project, and to propose ideas for how I think we can eliminate or at least reduce those pain points without upsetting the balance between development, maintenance, and validation. But if the only thing I accomplish is to raise awareness and spur conversation in the KVM community, then I'll still consider this talk a success. We as a community must first acknowledge that something needs to change, even if we don't agree on exactly how or what to change. That's it.
Let's get into the how. So, to scale KVM development, maintenance, and validation, let's pretend for a minute that KVM is a network. If KVM were a network, users would be asking for more cat videos, delivered faster, and with less downtime. In KVM terms, that means we need to improve latency, so that features get merged faster. We need to improve developer and maintainer efficiency, so that merging new features requires less time and less effort. We need to improve monitoring, so that when there are bugs, they're found in a more timely fashion and result in fewer visible problems for downstream users. And finally, we need to improve KVM's durability, so that we aren't constantly in a state of fixing KVM.

First up is latency. Most of us have sent exactly this email at least once in the last year, except for maybe Paolo. Latency is arguably the single biggest problem in KVM x86. Developers spend too much time waiting for reviews, which leads to frustration and loss of efficiency. And it takes too long for features to get merged, which impacts the schedules of downstream consumers and/or forces downstream projects to backport features to an older kernel. Backporting obviously sucks up more developer and testing resources, and again hurts efficiency.

Looking at KVM x86 in terms of the number of contributors versus the patches merged per contributor gives us a good idea of why latency is a problem. The number of people actively maintaining the code base has been roughly flat over the last decade, but the number of overall contributors has roughly doubled. And it's not just one-off patches: while the number of people submitting one or two patches here and there does account for many of KVM's contributions, the number of people contributing medium-sized series has also doubled. Note, I took the number of contributors verbatim for this year and didn't try to project the final results, so the drop-off for 2022 will likely be much less sharp by the end of the year. In other words, there are more and more contributions coming in, but the number of people actively maintaining KVM x86 on a day-in, day-out basis is staying the same. And here I'm defining maintainers to be people that are regularly fixing bugs and doing cleanup, not just so-called official maintainers.

Realistically, that trend is not going to change. There's a relatively low hard limit on the number of active maintainers that KVM can sustain. It's difficult to fully isolate subcomponents in KVM; for example, adding support for a new APIC virtualization feature in x86 requires enabling in vendor code, enabling in APIC code, enabling in common x86, and potentially even in common KVM. And in a lot of ways, subdividing an architecture would be counterproductive: when VMX and SVM were maintained more independently than they are today, there was a lot of duplicate code. So at some point diminishing returns kick in, as coordinating between everyone becomes more and more difficult. And there are other issues too if we just try to throw more maintainers at the problem. If there's not enough work for the maintainers, then they'll move on to something else, and we're back to square one with not enough maintainers. And not everyone likes the day-to-day work of maintaining a code base. I personally enjoy fixing bugs and maintaining a code base, but there are plenty of folks that prefer to develop new features and/or want to constantly experience something new, and there's absolutely nothing wrong with that. And lastly, staffing a team with people that do nothing but fix bugs and do code reviews? It's just never going to happen.
So we need to figure out ways to improve latency without just adding more maintainers, though obviously adding more maintainers and reviewers will help.

Another area in which KVM can improve is efficiency, and what I mean by efficiency is minimizing the amount of time, code, and churn needed to enable and stabilize a given feature. Why does KVM need to be more efficient? Again, like the number of contributors, the number of commits to KVM x86 has basically doubled in the last three years. A large part of that is due to Google shifting to an upstream-first development process for KVM; many of those commits are the result of Google pushing features and patches that Google has carried internally for however many years. Google's upstreaming of existing features and fixes will naturally taper off, so it's entirely possible that x86 has peaked in terms of number of commits. But it's also unlikely that KVM is going to return to its 2018 levels in the near future. TDX and SNP are knocking on the door and are going to bring with them a new memory API and plenty of churn. And we've also effectively ignored major feature proposals, in no small part due to lack of reviewer resources, for example virtual machine introspection and address space isolation. There are also a non-zero number of Intel and AMD features that KVM doesn't utilize or support, and it's unlikely that Intel and AMD are going to suddenly stop adding shiny new things.

Looking at ARM, the uptick isn't as notable, but the number of commits is still trending upward. The dip back in 2019 is the result of KVM dropping support for 32-bit ARM CPUs, so if we mentally flatten out the history to account for dropping 32-bit support, ARM has been on a slow but steady upward trend. Given that Google Cloud launched its first ARM-based VMs this year, that the Android folks are deploying protected KVM and are building up quite the backlog of patches, and that Google isn't the only company pursuing ARM-based platforms, I fully expect the upward trajectory to continue. Sooner or later, many features that exist on x86, like nested virtualization and optimizations for live migration, will make their way to ARM.

The other architecture that I want to mention is RISC-V. Like ARM, I expect RISC-V activity to increase substantially in the next few years. And we also have a prime opportunity to intercept RISC-V, in the sense that if there's code that can be shared, now is the time to do that. RISC-V has a relatively small code base, there are relatively few users, and so the risk of breaking existing users is lower. And implementing a feature in common code from the get-go means we don't have to go rip out the RISC-V equivalent later.

And then there's common KVM. Common KVM development has been more or less flat for the last decade. The optimistic takeaway is that KVM's core APIs are mature and stable. The pessimistic takeaway is that there's likely too much duplicated work across architectures. Obviously a significant amount of KVM code is inherently arch specific, but there are features and sequences that can be shared across architectures. For example, a significant chunk of the x86 commits over the last few years has been to implement a new MMU, and as David Matlack's talk showed, the vast majority of that code isn't truly x86 specific. So from an efficiency perspective, while supporting x86 levels of activity across multiple architectures is certainly doable, it would be quite costly, especially for users such as Google that want to deploy on multiple architectures.
Note, unlike contributors, for this one I did try to project, and I just assumed linear scaling for the rest of the year in terms of the number of commits. I also excluded merge commits, which is in theory why my numbers differ from what Paolo presented on Monday.

Sharing code between architectures can reduce the amount of code needed to enable a given feature, but what about the amount of time and the amount of churn needed to get a feature merged and healthy? Picking on Paolo again: we've all seen the magic "queued, thanks". But where is it queued? When will it show up in kvm/queue? What's the difference between kvm/queue and kvm/next? These are all questions I've been asked multiple times by Googlers. The questions and answers themselves aren't relevant to improving efficiency, but the underlying problem is: expectations of developers are unclear, rules are undocumented, protocols for handling anomalies don't exist. We have tests, but which ones are developers expected to run? Do all tests need to pass? Do the tests need to be run on multiple architectures? Depending on who you ask, you might get different answers.

Unclear expectations lead to more flawed submissions. Bugs that could have been found via developer testing are instead found by code review and/or post-submission testing. That leads to more resubmissions, which increases reviewers' workloads, and in the worst-case scenario, bugs that are found by maintainer and reviewer testing consume even more reviewer time. And lack of documentation and protocols means it takes more time to resolve problems. Because rules and expectations aren't documented, maintainers sometimes feel obligated to help resolve problems instead of saying, hey, this broke X, please go see why, fix it, and resubmit. And as a result, developers are left waiting for maintainer feedback while maintainers figure out what's going on, maintainers are spending time debugging and fixing bugs in someone else's stuff, and no one's happy. Now, I'm picking on Paolo because he's the most visible KVM maintainer, and "queued, thanks" has reached mythic proportions. I'm not blaming Paolo, nor am I saying that it's Paolo's responsibility to improve the situation, or at least it's not only Paolo's problem.

Related to efficiency, and especially to maintainer efficiency, is monitoring, or in KVM's case, the lack of monitoring. Having all the tests in the world doesn't help if they're not run, or, as is often the case at Google, if they're run too late. Testing upstream KVM and kernel releases months or years after the fact makes it more costly to fix bugs: developers need to page back in the relevant details, and it makes it much more likely that bugs will be encountered, debugged, and possibly even fixed by multiple users or developers. In short, upstream KVM is woefully behind in catching the continuous integration train.
Now, I'm not saying that end users of KVM aren't utilizing CI for their kernels, rather that CI principles aren't being applied to upstream KVM. Upstream KVM does get some amount of continuous testing, but it's mostly from bots that test the kernel at large, and/or the testing and reporting isn't consistent, and so it can't be relied upon in any meaningful way. At a minimum, it should be very feasible to test kvm/queue and/or kvm/master on an on-demand basis, but ideally we would go even further and find bugs before they make their way to an official queue.

Again looking at x86, around the same time that the number of commits started accelerating, the percentage of commits that are bug fixes also jumped. Now, these stats are likely misleading to some extent, and I'm obviously using them to tell the story that I want to tell. Odds are very good that many of those fixes were for bugs in older versions of KVM, and/or that tagging commits with Fixes: simply became more prevalent around 2019. In other words, it's probably a good thing that we're fixing more bugs. But that said, regardless of why the increase occurred, the fact remains that roughly one in five commits to KVM x86 over the last four years was a bug fix. Again relating back to efficiency, those commits represent more failures that had to be triaged and debugged, more patches that had to be written, tested, submitted, and reviewed. And note, I'm counting anything with a Fixes: tag as a bug fix, not just patches tagged for stable, which again is in theory why my numbers differ from Paolo's.

Last up is durability, i.e. how do we improve KVM's long-term health and break out of the loop where we're constantly fixing bugs in code that was written three-plus years ago? The easy answer is to write more tests, to write better tests, and to run them more often, and for the most part that is the answer. But there's one underlying problem I think we can tackle.
KVM, at least on x86, has historically employed a "good enough" approach. Okay, so obviously Frank Robinson didn't include the bit about KVM in this quote, but that's only because KVM wasn't around in the 70s. The good enough approach largely worked when KVM had a relatively small development community and when KVM was being used to run a relatively small set of workloads, but that only works for so long. Eventually, KVM's shortcuts are no longer good enough. Case in point, x86 has addressed several KVM errata in the last year alone: a hypercall patching quirk, a MONITOR/MWAIT quirk, an x2APIC hotplug hack, and a truly brutal hack where KVM added several layers of glue to fudge around a bug where KVM injected a double fault into L2 instead of synthesizing a nested VM-Exit to L1. All of those hacks and quirks were avoidable. Fixing them properly would have required more effort, and/or might have required non-trivial changes to other components in the stack, but the fact remains that they were avoidable.

And even if good enough ends up being truly good enough, we'll soon forget the details and eventually break things in our own ignorance, either because we're human with limited brain capacity and we simply forget, or because people leave the project. Attrition is inevitable: as denoted with an asterisk, over half the de facto maintainers throughout KVM's history are no longer active. Unless we're extremely diligent in documenting KVM's quirks and/or errata, knowledge of KVM's good enough shortcuts will eventually be lost and will have to be relearned by the next generation. Without solid documentation, the long-term cost of not breaking KVM is non-trivial, as for good enough code it's not always obvious why KVM behaves a certain way. Anecdotally, I can attest to that fact, as I've spent far too many hours doing git archaeology to understand why, even if it might only take a small amount of time to understand what KVM does.

Okay, so after all that, I still actually haven't said anything about how we can improve. First and foremost, document expectations and processes, and if we don't have expectations and processes, define them. For example, KVM x86 doesn't have a well-defined merge window, which is a double-edged sword. On the one hand, the lack of anything resembling an official merge window allows for flexibility: there's no hard cut-off date, so any last-minute patches can be squeezed in, and maintainers can accommodate variances in their own schedules. On the other hand, not having constraints of any kind means being heavily reliant on maintainers' judgment to decide what's safe and what needs more soak time, and the inherent subjectivity also leads to friction with developers: KVM releases are unpredictable, and there's often a lack of clarity as to why feature X was merged but feature Y was not.

As for expectations, at the top of my list is defining and documenting KVM health requirements, for example explicitly stating what tests must be run before posting based on what code is changing, what kernel config settings are mandatory, and so on and so forth. Less ambiguity means fewer assumptions and thus fewer surprises, and hopefully it'll also help developers find bugs in their code before submitting. On the receiving side, lay down hard rules for the health of kvm/queue, kvm/next, kvm/master, whatever we call our branches, again so that we're less reliant on maintainers' judgment to keep KVM healthy, and so that there's less ambiguity and less subjectivity in what is merged and when. We also need to document even seemingly mundane details, like the difference between kvm/queue and kvm/next, and when patches go to kvm/master instead of kvm/queue.
We also need to document KVM coding style preferences that aren't covered in the common kernel documentation, what scope to use in the shortlog, and so on and so forth. Documenting so-called mundane details will help improve predictability from a developer perspective, cut down on patch churn due to nits and minor flaws, and will also reduce the cost to onboard new developers. Lastly, we need to document KVM errata, i.e. document KVM's deviations from architectural specs, so that KVM's known quirks and flaws don't need to be rediscovered by the next generations of users and developers.

Next up is testing, which in my mind is the single biggest thing individual developers can do to improve efficiency and latency. At the very least, run the tests we have. A non-trivial number of the bugs that take up maintainers' time are caught by existing selftests or KVM-unit-tests. Sometimes it's infeasible for developers to fully test due to lack of hardware, but in many cases bugs escape simply because tests were never run. And for new features, don't write a test just so that you can get the magical "queued, thanks". Actually try to break your code. Deliberately introduce bugs in KVM and verify that your tests hit them, and when possible, brute force every combination of input. In other words, when writing tests, make it your goal to find bugs and keep KVM healthy, not to reach the point where the maintainers' thirst for tests is quenched and they'll merge your code.
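To make "deliberately introduce bugs and verify that your tests hit them" concrete, here's a minimal sketch of what such a harness could look like. To be clear, this is not existing upstream tooling, and every path and name in it (the tree location, the bug patch, the test binary) is a hypothetical placeholder; rebuilding and reloading the KVM modules is just one way to get the intentionally broken code running, and it assumes a configured x86 tree on an Intel host with KVM built as modules.

```python
#!/usr/bin/env python3
"""Sketch: deliberately break KVM and verify that a new selftest notices.
Not upstream tooling; every path and name below is a placeholder."""
import subprocess
import sys

KERNEL_DIR = "/path/to/linux"              # configured kernel tree (hypothetical)
SELFTESTS = "tools/testing/selftests/kvm"  # KVM selftests directory
BUG_PATCH = "deliberate-bug.patch"         # patch that intentionally breaks KVM
TEST_BINARY = "./x86_64/my_new_test"       # the selftest being validated (hypothetical)

def run(*cmd, cwd=KERNEL_DIR, check=True):
    """Run a command, echoing it for visibility.  Module (un)loads need root."""
    print("+", " ".join(cmd))
    return subprocess.run(cmd, cwd=cwd, check=check)

def reload_kvm():
    """Rebuild and reload the KVM modules so the (un)patched code is live.
    Assumes an Intel host with KVM built as in-tree modules."""
    run("make", "arch/x86/kvm/")
    run("rmmod", "kvm_intel", check=False)  # ignore errors if not yet loaded
    run("rmmod", "kvm", check=False)
    run("insmod", "arch/x86/kvm/kvm.ko")
    run("insmod", "arch/x86/kvm/kvm-intel.ko")

def test_passes():
    """Rebuild KVM and the selftests, run the test, return True if it passed."""
    reload_kvm()
    run("make", "-C", SELFTESTS)
    result = run(TEST_BINARY, cwd=f"{KERNEL_DIR}/{SELFTESTS}", check=False)
    return result.returncode == 0

def main():
    # Baseline: the test must pass on an unmodified tree.
    if not test_passes():
        sys.exit("Test fails even without the deliberate bug; fix that first")

    # Introduce the bug, then demand that the test notices it.
    run("git", "apply", BUG_PATCH)
    try:
        if test_passes():
            sys.exit("Test still passed with a known bug, so it isn't actually "
                     "exercising the code it claims to cover")
        print("Good: the test caught the deliberately introduced bug")
    finally:
        # Always restore the tree, even if something above blew up.
        run("git", "apply", "-R", BUG_PATCH)

if __name__ == "__main__":
    main()
```

Even a crude harness like this turns "I think my test covers that path" into something you can demonstrate, which is exactly the difference between writing a test to get the "queued, thanks" and writing a test to find bugs.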
Tests also serve as documentation. Pointing out the various edge cases in a test means that reviewers spend less time ferreting out those theoretical edge cases, because the author has already identified them. Improving the overall quality of submissions and reducing the number of bugs that are found by maintainers means less back and forth between reviewers and submitters, fewer versions to review, less time spent reviewing each version, and ultimately faster merging of code.

Continuing with that theme, we as a developer community need to make it easier to write tests, and we need to make it easier to maintain those tests. If we can make writing tests a less painful experience, for example by reducing the amount of boilerplate code, providing better frameworks, better helpers for common behaviors, etc., then hopefully we can increase the overall quality of our tests. Realistically, we're never going to get to the point where developers are actually excited to write tests, but we can and should get to a point where they don't want to rage quit after trying to write a KVM selftest.

Closely related to testing is knowledge sharing. Anyone that's worked on KVM for any substantial amount of time either is spending way too much time fighting our tools or has built up a repository of scripts and whatnot to make themselves more efficient, anything from simple git aliases to complex scripts to run VMs. Upstreaming such scripts is just not going to happen: the bar for quality is too high, and we don't want to discourage tips and tricks just because someone cobbled together a set of scripts instead of developing an elegant framework. The end result also needs to be tailored to each individual to maximize their efficiency, so upstreaming is just not realistic. But rather than force everyone to reinvent the wheel, which often seems like a rite of passage in Linux kernel development, let's figure out a way to share our tips and tricks. Even something as simple as a file with links to people's personal repositories would give others a starting point to build up their own arsenal.

Okay, so that mostly covers latency and efficiency of developers and maintainers, but what about achieving a more efficient code base? The obvious answer is that we need to put more effort into sharing code between architectures. Solve common problems once, instead of replicating features and fixes across multiple architectures. When implementing an existing feature on a new architecture, consolidate code instead of copy-pasting. But without a forcing function of some form, telling people to go share code isn't going to get us anywhere. To that end, we need maintainers that work across architectures, so that they have the knowledge and the authority to say, hey, this can be common code, and this should be common code. One step we've already taken is to reduce Paolo's responsibilities on x86 so that he can focus more on KVM as a whole. Another way to eventually reach that state is to train the next generation of KVM developers and maintainers on multiple architectures. For those of us that have been focused on a single architecture for many years, it's often difficult to shed our existing responsibilities to dedicate time to a new project, which is just a polite way of saying we're crusty and stuck in our ways. But for newcomers that don't have years of baggage, I think it's feasible to achieve greater breadth without sacrificing too much depth.

As for improving monitoring, we simply need more automation. At a bare minimum, we need automated testing of to-be-merged patches. Ideally, bugs that are found by existing tests would never escape a staging branch. That might not be 100% realistic due to the number of possible configurations, as we have a jillion compilers, kernel configs, module params, and hardware capabilities. But for configurations that users care about and that have well-defined tests, detecting the overwhelming majority of bugs before they hit kvm/queue is absolutely doable. And fewer bugs in kvm/queue means fewer bugs that are ultimately the maintainers' responsibility to resolve, and fewer bugs that maintainers need to resolve means more time maintainers can spend doing literally everything else.

In theory, bugs can be assigned to the author of the guilty patch to achieve a similar effect, but in practice it's not so simple. Once a patch is officially accepted, bugs become the responsibility of the maintainer, as the bugs now affect other developers and users. And because such bugs affect others, there's often a sense of urgency: if KVM is broken, waiting days or weeks for a fix is just not a viable option, so if the author is unavailable, or for whatever reason has tasks that for them are higher priority, they can't be relied upon to fix the bug. On the other hand, it's far easier to punt a failure back to a developer if the offending patch hasn't been officially merged; the onus is on the developer to fix the bug in order to get their series accepted.

As for making automated testing a reality, the Red Hat folks are already working on setting up CI for KVM-unit-tests, and maybe selftests, and are actively working on automating Paolo's battery of x86 install-and-boot tests. We at Google Cloud are also, slowly, working on providing additional automated testing, for example more comprehensive integration testing. Unfortunately, that's easier said than done on our end, as is probably the case with other companies: Google's testing infrastructure is tailored to validating Google kernels, not vanilla upstream kernels, and that infrastructure is developed and maintained by engineers outside the KVM team.
It's a solvable problem; it just might take us a while to get there. That said, I want to make it very clear that neither Red Hat nor Google is laying claim to automated KVM testing. This is very much a more-the-merrier situation. Worst-case scenario, we end up with duplicate and redundant testing, and redundant automated testing would be a very good problem to have: pivoting to test something slightly different is much easier than creating the ability to test in the first place.

Long term, the pie-in-the-sky goal is to have automated testing of individual series and patches, for example so that developers can receive feedback and iterate without needing to wait for a human to review their code. And obviously, reducing the number of bugs that are encountered by maintainers would improve maintainer efficiency as well. Realistically though, we're at least a year away from automated testing of an individual series, and it's possible we'll never get there. In addition to the mechanics of identifying what to test, what tests to run, reporting the results, etc., there are also logistical problems to solve: testing requires having enough physical systems available to actually do all the testing, which means having the additional budget to acquire the hardware, and engineers to maintain those systems, handle unexpected behavior, etc.

That said, in the meantime, developers can and should automate their own workflows, for example using scripts to compile-test multiple kernel configs, run checkpatch before posting, etc. And jumping back to the previous slide, we as a community can help by sharing the scripts, methodologies, etc. that some of us have already written to do exactly this.
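As a small illustration of that kind of workflow automation, here's a sketch of a pre-posting script. Again, this is hypothetical, not shared community tooling: the tree path, the config list, the job count, and the commit range are all placeholders to adapt to your own setup.

```python
#!/usr/bin/env python3
"""Sketch of a pre-posting checklist: compile-test several kernel configs
and run checkpatch on the series about to be sent.  All values below are
placeholders, not upstream policy."""
import subprocess

KERNEL_DIR = "/path/to/linux"                           # kernel tree (hypothetical)
CONFIGS = ["defconfig", "allnoconfig", "allyesconfig"]  # pick what you care about
COMMIT_RANGE = "HEAD~3"                                 # base of the series being posted

def run(*cmd, check=True, capture=False):
    """Run a command in the kernel tree, echoing it for visibility."""
    print("+", " ".join(cmd))
    return subprocess.run(cmd, cwd=KERNEL_DIR, check=check,
                          capture_output=capture, text=True)

def main():
    # Compile-test each config; check=True makes any build failure fatal.
    for config in CONFIGS:
        run("make", config)
        run("make", "-j16")

    # Generate the patches and let checkpatch complain before a human does.
    patches = run("git", "format-patch", "-o", "/tmp/patches", COMMIT_RANGE,
                  capture=True)
    for patch in patches.stdout.splitlines():
        # checkpatch exits non-zero on warnings, so don't treat that as fatal.
        run("./scripts/checkpatch.pl", patch, check=False)

if __name__ == "__main__":
    main()
```

The specifics matter far less than the habit: if something like this runs before every posting, an entire class of build breaks and style nits never reaches a reviewer in the first place.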
Lastly, we as a community need to make a conscious effort to adapt to what KVM has become, and move away from what KVM was 10 years ago. KVM is no longer a startup project whose primary goal is simply to run VMs. We need to get out of the good enough mindset and declare the job done when KVM faithfully emulates or virtualizes hardware, not when KVM is able to boot a VM without exploding. Realistically, KVM is never going to achieve perfection. There will inevitably be features that are flat-out impossible to virtualize in all scenarios, and there will be cases where the cost of achieving perfection outweighs the benefits; I've argued that exact case more than once in the last year. But failure to faithfully emulate hardware needs to become the exception, not the rule.

If KVM honors hardware specifications, then we can rely on hardware vendors to document KVM's behavior. Hardware vendors have pages upon pages of detailed specs, and while we sometimes grumble that their specs are flawed, the reality is that documentation from hardware vendors is, and always will be, miles and miles ahead of KVM's documentation. And as mentioned earlier, when KVM inevitably needs to diverge from architectural behavior, document the erratum so that it's visible, so that it's feasible for someone to understand what to expect from KVM without having to go spelunking through the codebase and without having to do years of git archaeology.

And part of implementing to the spec means not making assumptions about the guest, i.e. don't use "that'll never happen" as justification for taking a shortcut. Even if something never should happen, that doesn't mean it can't happen. If the guest encounters a bug of its own, then the guest and/or host userspace needs a sane, predictable response from KVM, and if that response is rooted in architectural behavior, then all the better.

Finally, speak up. Force KVM to adapt. Propose process changes if you see something isn't working. Request documentation for processes or requirements that are undefined or unclear. For many of the x86 issues I've talked about, especially around documentation and expectations, I think many of the people in the x86 community, developers and maintainers alike, have actually been aware for quite some time that the status quo isn't working, but for whatever reason we've all been waiting for someone else to initiate change. Obviously, there's no guarantee that proposed changes will be accepted, but you should at least get an answer as to why things are the way they are.

And with that, I'll stop talking and open the floor to questions and comments.

And I believe we are going to do virtual first, from Jennifer. There are no virtual questions. So if you have live questions, we're going to have a mic set up. If you don't mind, you want to come up here? I'll repeat you.

[Answering an audience question:] You get your code base to a point where you're not spending copious amounts of time debugging failures. That's the easy answer. As far as getting the infrastructure, ideally you don't have your development team writing your infrastructure code and doing the day-to-day work of maintaining your systems. Obviously you need a big enough company that you can afford to do that, but we're Google, we can do that, we should be able to do that. And then once you have a healthy enough code base and you don't have failures every day, you don't have a hundred bugs to debug, and you have one-off failures, you can bisect those and just punt them back to the developer that made the problem in the first place. Then you don't put all that toil onto the people doing development, theoretically.

A live question. [Audience comment:] I think also, if we, like you said, put a high bar on what's required to submit patches, people will be naturally incentivized to automate, because it'll be too painful to do everything manually otherwise. And so that will just naturally be a forcing function for automation.

I think we had a virtual question. [Audience comment:] I have a suggestion for maintainers, which is: don't necessarily try to be nice. Be polite, but not necessarily nice. If somebody is actively making your work a pain, tell them. Tell them, I won't accept your patch until you do X, Y, and Z. Do it politely, but don't try to be nice, because it only leads to your own burnout.

I think we had a virtual question. [Virtual question, partially garbled in the recording:] ...hardware availability is a real problem. Do we have a system in place so that individuals can test their changes on different architectures?

No, but we'll get there eventually, in theory, if we can get enough CI to do that. Hardware availability, though, is not fully solvable. An example being people at a hardware company working on a bleeding-edge CPU.
They are the only people with any kind of access to that CPU. However, those are not the problems we have. Yes, bugs come into KVM because people don't have hardware to test, but they're a relatively small minority, and there are lots of red flags around such patches to make it quite obvious that we may want to take a closer look at the series. If someone from Intel submits a patch that modifies core code and AMD code, it's obvious that they probably didn't test it all the way, and we should take a closer look. So yeah, it's a problem, but I don't think that it is a major issue, nor is it a major source of the things that are slowing us down.