Thank you very much. So I wanted to talk today about not just the specific RFC that I worked on, adding unions to the language, but the general RFC process and how it got me into Rust. So naturally I want to start by talking about the most related possible topic to those things, which is virtual machines. A virtual machine is a way of giving you a system in a box. To give an example of this, let me show you a screenshot of a virtual machine running my desktop inside of my actual desktop. If you can see the command in the corner there, yes, that is a scary and really amusingly useful thing to do, as long as you remember the -snapshot option. The big advantage of a virtual machine is that it gives you containment. Okay. I'll keep going and we'll see how things go. So yes, containment and isolation, which would be useful in audio systems as well. This provides a form of sandboxing, and you can use that for a variety of things. You can sandbox not just a whole system in a box, but an application, just your application running alone in a virtual machine, or an arbitrary piece of code. Virtual machines are a key part of what people call the cloud, which would not exist without a very robust virtual machine infrastructure. The most popular thing people tend to run for virtual machines is based on QEMU. But in practice, what people use is a mode of QEMU called KVM, or Kernel-based Virtual Machine, which provides hardware acceleration. This allows you to run virtual machines at near-native speed. There are various other tools based on this, things like kvmtool, which is more of a sample virtual machine that's easier to reason about and much smaller to work with. These are all based on a common API called the Linux KVM API, or sometimes just called /dev/kvm, because that's the device used to access it. When you're building a virtual machine, you provide various things like virtual hardware.
And the problem with virtual hardware is that it's a very large, very complex attack surface. To give an example of this, how many people have used one of these? Awesome. So there was a vulnerability in the QEMU virtual machine in its floppy emulation. Even if you didn't have an actual floppy being emulated, the floppy drive controller was there, and there was a buffer overflow in it. This had a major impact on hosting providers, who had to go patch and reboot all of these VMs, upgrade, and deal with the problem. At the time they figured, okay, let's take out all the hardware we're not actually using. So there's no way you'll have a problem in the popular stuff like, say, graphics, which had a problem with, yeah. That one turned out to be an out-of-bounds memory access: let's access this buffer full of bytes using a uint32 pointer and go four times further than you're supposed to. I don't want to pick on QEMU here. It's the most popular, so that's why I'm mentioning cases where it has been affected. But there have been major security vulnerabilities in all virtual machines, and the worst ones tend to lead to arbitrary code execution. These are often caused by buffer overflows, by use-after-free vulnerabilities, or by double-free issues. Just to emphasize this for a moment: the initial discussion of what a buffer overflow was happened in 1972. In 1988, the very first exploit of a buffer overflow occurred. It's 2016 and we're still dealing with this. So you can probably see where this is going. The reason I want to try using Rust for this is that it prevents these whole classes of bugs, but in addition to that, it's also a pretty good fit for virtual machines for other reasons. It's a great systems programming language. There's no runtime or garbage collector to get in the way of doing that low-level programming.
And it's honestly the first language I've seen that's a credible replacement for C and C++ anywhere you could use them. So I'm not going to go into great detail about the KVM API here. I've actually written an article for LWN on the details of the API, along with a sample virtual machine that's very minimal. But one thing I want to look at is a structure in it called struct kvm_run, which is the state of one virtual CPU, and it has all sorts of things in it related to that state. So that structure looks a little bit like this. The most common thing it's used for is this: you call into the kernel saying, run the VM for a while, and it returns after a while saying, okay, I can't proceed until the virtual machine monitor does something for me. Here's why I exited. It might be: I don't know why, the hardware just seems to have done something weird. It could be: you did an I/O port operation, here are the details. But the thing I want to point out here is that this is a struct with a union in it, with a bunch of structs inside that. And that's actually a really common pattern: here's some discriminant, and here are the details of all the variations. So it's common in FFI, and it's common in Linux and Windows APIs. I haven't used OS X a lot, but I would be surprised if it doesn't show up reasonably often in systems-level programming there as well. And the problem here is that Rust doesn't actually know what a union is, or didn't actually know what a union is. So let's talk a bit about unions, just for a bit of background if you haven't seen one before. A union is a structure that has multiple storage variants, with a different type for each variant and overlapping storage for all of them. So if you have ten things in a union, it uses the space of the biggest one, not the space of all of them at once. So Rust says, I know what you're talking about, you mean an enum, right? An enum is a safe tagged union.
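To make the size rule and the enum comparison concrete, here is a minimal sketch in present-day Rust. The type and function names (`IntOrBytes`, `Tagged`, `first_byte_of`) are hypothetical illustrations, not taken from the talk's slides.

```rust
use std::mem::size_of;

// Hypothetical untagged union: all three fields share the same storage.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
union IntOrBytes {
    word: u64,
    bytes: [u8; 8],
    small: u16,
}

// Hypothetical enum with the same payloads: Rust also stores a tag,
// so it knows which variant is live.
#[allow(dead_code)]
enum Tagged {
    Word(u64),
    Bytes([u8; 8]),
    Small(u16),
}

fn union_size() -> usize {
    size_of::<IntOrBytes>()
}

fn enum_size() -> usize {
    size_of::<Tagged>()
}

// Write one field, read another: the bytes are simply reinterpreted.
fn first_byte_of(word: u64) -> u8 {
    let u = IntOrBytes { word };
    // Unsafe because the compiler cannot know which field is valid.
    unsafe { u.bytes[0] }
}

fn main() {
    println!("union: {} bytes, enum: {} bytes", union_size(), enum_size());
    println!("first byte of 0x01: {}", first_byte_of(0x01));
}
```

The union is as large as its biggest field, while the enum needs extra room for its discriminant; which byte `first_byte_of` returns depends on the platform's endianness.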
And when you're writing native Rust code, it's almost always the thing you want to be using. But a union from a low-level systems programming point of view, when you're writing C code, is an unsafe untagged union. You have to know, in some application-specific way, which variant you need to access. You can do this in Rust without any native support. You can call functions like std::mem::transmute and pointer offset, the existence of which in your code tends to mean you're having a bad day. They tend to be kind of awkward. And most importantly, they lose just about all the interesting safety properties that I wanted to get out of writing a virtual machine in Rust. So how do I fix this? I looked into this, trying to figure out what to do about it, and I found the request for comments process in Rust. I want to compare this for a moment to the C and C++ standards process. I know people who work on that process, so I've talked with people who actually make changes to the C and C++ standards. The main thing is that it tends to be very meeting-driven, and I mean very in-person-meeting-driven. Let's all get together and talk about the language. It's very opaque. It tends to only really publish results when it has something to show, and not all the intermediate bits. And it's pretty much an experts-only club. You need to already be an expert and then go say, I really want to change the language and here are all my myriad reasons why. Compare that to the Rust RFC process: like many things in Rust, it's extremely welcoming. It's open and transparent, and it's extremely inclusive. In particular, Rust has a community process around RFCs. It's very lightweight, but it's not too light. And I want to emphasize that, because a lot of people think, well, if you want to make something more approachable, you need less process so that it's less impenetrable, that kind of thing. But there are cases where you actually want more process in order to be more friendly.
For example, Rust has a shepherd for an RFC: somebody on the core team who's saying, I'll keep an eye on this and help move it through the steps. And the actual decision ends up being made by one of the Rust core teams. That avoids problems like diffusion of responsibility. If you send a mail to a mailing list with a thousand people on it, you might get no answer. Not because nobody knows the answer, but because everybody figures it's somebody else's problem and nobody does it. So I looked around for details of what I could do with unions. And it turns out there was already an RFC for this in January of last year: RFC 724, for unsafe enum. The idea was, let's have an enum without a discriminant. That makes sense. Unfortunately, that got closed without being adopted. But there was some interest in reviving it, based on renewed interest in the FFI layer. And there was a thread on the Discourse forum saying, here are the reasons why we still need this, the reasons why this is important, and here's the approach we could take to solving this problem. So I started poking at this and trying to come up with a new approach that might attract more interest and have a better chance of being adopted. The first problem I ran into was keywords. Things like struct and enum are keywords in the Rust language. So if you try to write something like a function named struct or a variable named enum, the compiler will say, no, you can't do that. And everybody I know has that same cartoon image from this morning in their heads now. Union, on the other hand: if you wanted to write union something-or-other and define the fields, your first thought would be, let's make that a keyword. It turns out that not only would that prevent you from using functions or variables named union, there are already functions with that name in the standard library. Let's take two sets and get the union of them. So we clearly can't use that.
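The standard-library clash mentioned here is presumably the `union` method on set types such as `std::collections::HashSet`. The sketch below shows the kind of existing code a reserved `union` keyword would have broken; the helper name `union_of` is a hypothetical illustration.

```rust
use std::collections::HashSet;

// Hypothetical helper: the union of two slices, sorted for stable output.
fn union_of(a: &[u32], b: &[u32]) -> Vec<u32> {
    let a: HashSet<u32> = a.iter().copied().collect();
    let b: HashSet<u32> = b.iter().copied().collect();
    // `union` is an ordinary method name here. A reserved keyword
    // would have made this call a syntax error.
    let mut out: Vec<u32> = a.union(&b).copied().collect();
    out.sort();
    out
}

fn main() {
    println!("{:?}", union_of(&[1, 2, 3], &[3, 4]));
}
```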
So let's take some other approach. I looked around and thought, well, you can have a repr attribute on a struct that says this struct has the C representation, meaning the same layout as C on the native platform, or it can be a packed struct, which doesn't include any padding for natural alignment of types. So maybe union could be the representation of a struct: this struct has the representation of overlapping all its fields. So that would have looked like this. And yeah, it's not the prettiest thing in the world, but if you're only gonna use it in FFI, it's functional, it gets the job done. That was enough, so I sent a mail to that thread, called a pre-RFC, saying, hey, I'm thinking of filing an RFC on this, here's the approach I would use. This produced a bunch of responses, the most notable of which tended to be: well, you're using the struct keyword, but you're giving it completely different semantics. C and packed are just ways of laying out a struct, but union completely changes the semantics. And by the way, accessing the fields is now unsafe. So that seems a little bit strange. So what if we instead pick a keyword that doesn't conflict with much of anything in the standard library? How about we make it a little longer and say untagged_union? It's still a keyword, so it could potentially break things if somebody happened to have a local variable called untagged_union. But it's an improvement, and there are ways to find out whether it would actually break anything in practice. There's a beautiful process in Rust, often used as part of RFCs and other language changes, called Crater. Crater takes a proposed change to the language and builds the entire Cargo ecosystem against it to say, would anything we know about break? So that's really handy if you wanna test drive something with the entire corpus of available Rust code.
So that was another option, and we would likely have gone down that route if we were gonna introduce a new keyword that might break things. In any case, based on that, in December of last year I ended up proposing RFC 1444, and proposing that we use untagged_union as the name of this. That produced a bunch of additional discussion, and that led to some new ideas, new proposals, new variations on a theme. A few of those ideas I wanna mention in particular. Somebody said, well, we could use a compound keyword where no individual piece of it was a new keyword, like unsafe union, so you could still use union for other things. There were a bunch of other interesting proposals. And the one that ended up gathering the most attention was from Niko, I believe, who offered the idea: what about a contextual keyword? Something that's a keyword in the right context, but isn't reserved in other places. That would let us write the thing we really wanted to write to begin with, which is union. But then you can also write this, and it doesn't break. So it's a little quirky, but it doesn't really cause much of a problem in practice, and that's what we ended up going with. I mention this not by way of demonstrating bikeshed painting, but rather to say, this is the kind of improvement that happens through the RFC process. Here's a bunch of people with a bunch of ideas on how to do this better, and we ended up with something much better than what we would have done without that process. Another aspect that I had to deal with when writing these RFCs was that unions interact with everything, and the RFC needs to discuss all those language interactions. The result is that the RFC was about 20% the definition of unions and 80% how it touches absolutely everything else and what happens when you mix the two. It starts with the really obvious things, like how do you initialize a union? How do you read and write fields of a union?
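A minimal sketch of the contextual keyword as it behaves in present-day Rust, together with initializing, reading, and writing union fields. The names `Value`, `float_bits`, and `overwrite` are hypothetical; the talk's slides are not reproduced in this transcript.

```rust
// `union` introduces a declaration here...
#[repr(C)]
#[derive(Clone, Copy)]
union Value {
    int: u32,
    float: f32,
}

// ...and is still usable as an ordinary identifier here.
fn union(a: u32, b: u32) -> u32 {
    a | b
}

// Initialize through one field, read through another.
fn float_bits(x: f32) -> u32 {
    let v = Value { float: x }; // initialization names exactly one field
    unsafe { v.int } // reads are unsafe: the compiler can't track which field is valid
}

// Overwrite the storage through a different field.
fn overwrite() -> u32 {
    let mut v = Value { float: 1.0 };
    v.int = 7; // writing a Copy field needs no unsafe block in current Rust
    unsafe { v.int }
}

fn main() {
    let union = union(0b01, 0b10); // also fine as a variable name
    println!("{} {:#x} {}", union, float_bits(1.0), overwrite());
}
```

The same source file declares a `union` type, a function named `union`, and a local variable named `union`, which is what the contextual keyword makes possible.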
How do you pattern match one? How do you borrow from it? But then it increasingly touches a bunch of other features of Rust. What about visibility annotations? How do those work? Well, hopefully they work in the obvious way, but what's the obvious way? Let's talk about it. What about traits? Can you implement traits on a union? Sure, okay, here's how they work. Can you have generic unions with a parameterized type? How does it interact with drop? That ended up producing a bunch of additional discussion: what if you want to stuff a field into a union that has a destructor, but then you access it as a different field? Do you still destruct the first variant? So there are a lot of corner cases, and part of the job of an RFC is to cover all those corner cases. So this produced some discussion on the RFC. I want to put this in perspective, because when I proposed this RFC, I was very new to Rust. I'd done a few tests of various interesting things, but I was still very new. And I did not expect that this would be the fourth most talked about RFC ever. Note that the ones above it are things like the question mark operator. So, controversy: did not expect that. One of the things that helped a great deal, and that people remarked on later (okay, that monitor went away, but that one's still working, that's good), was to have summary posts in the thread that said, hey, there have been several hundred messages, but here's what I see as the major open issues for us to move forward. We've been talking about this, and here are the alternatives. We've been talking about that, and here are the alternatives. Here are the points of disagreement. And even if people had different opinions, they could agree: yeah, that's the set of open issues, and here's where I fall on those. The other aspect of it is that an RFC should really be self-contained. So all that history, all those alternatives, all those forum threads and discussions should end up in the RFC, in summarized form.
So even though the unions RFC says, this is how we're gonna do it, it also says, here are the eight other ways we could have done it, with their advantages and disadvantages, represented fairly. That way anybody looking at the RFC doesn't have to say, well, if I wanna know what it could have been or what the trade-offs are, let me go read 350 GitHub posts and a Discourse thread. So given that, how does this progress through the process? How do you make an RFC move forward? There's a bunch of discussion, and when the core team feels like it's ready to say yes or no (we've picked a reasonable alternative, let's decide whether that's what we're gonna do), it goes into final comment period, or FCP, which says, okay, last call, if you have any objections, please speak now. This typically gets announced in This Week in Rust, which, if you're not already reading it, I highly recommend. It's a good opportunity to see, hey, here's something that's coming up in the language. It shows when an RFC is first posted and then when it hits final comment period. So what is the status of union? The RFC was actually accepted in April 2016, yay. And, thank you. So this is actually part of the Rust language now: Rust includes unions. But the RFC process does not include implementation. The work on implementation can run in parallel if it wants to be speculative, but in general, the implementation doesn't tend to happen as part of the RFC process. When a new RFC is accepted into the language, a tracking issue is opened. The RFCs are in a separate repository; the tracking issue is opened on the main rust-lang/rust repository to say, okay, we've decided to implement this, and here's the issue where we track its implementation status. That issue is still ongoing. But in particular, petrochenkov, I hope I'm pronouncing that correctly, implemented this as of last week. So it's actually available in nightly Rust right now.
So it's behind a feature flag, but it's usable now. And now that that's available, let me go way back to, why was I shaving this yak again? Oh yeah, I wanted to implement the KVM API. So let's talk very briefly about that. I implemented a subset of the union here, the bits that I actually needed, just as a quick example. I actually ended up doing this yesterday. Here's a union with the variants that you need for what happens if you fail to start, what happens if you get I/O, and what happens if there's an internal error. Here's some padding to make sure it's as big as it needs to be. And then the actual struct on top of that has a bunch of common fields, but it has an exit reason and then the actual union. What I wanted to do with this was run a very simple test program: a little bit of 16-bit assembly inside this virtual machine. The details aren't particularly important, so don't worry if you don't know x86 machine code, that's fine. The key detail here is: let's add two plus two, which are in the registers I get when I start up. Let's turn that into ASCII by adding it to the character '0'. Let's write that out to the serial port, and then write a newline to the serial port. So if all goes well, this should add two plus two, write four and a newline, and then halt the machine. Then on top of that, I'm gonna create a virtual machine with one virtual CPU, using a bunch of ioctl calls to KVM APIs: map some memory containing that one bit of code, set up the initial registers, point the instruction pointer at that code, and set the two initial registers to two and two to give it something to do. Run the code in the virtual machine. Whenever it exits, I need to handle its I/O, and the actual halt at the end. So let's run that. It runs the KVM test, it prints four, and it halts.
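The shape described above, an exit reason plus a union of per-reason details, can be sketched without touching /dev/kvm at all. Everything below is a simplified, hypothetical stand-in: the field names, the constant values, and the serial port number 0x3f8 are illustrative assumptions and are not claimed to match the real `struct kvm_run` layout or the talk's code. The exits are simulated in-process rather than produced by a real virtual CPU.

```rust
// Simplified stand-in for the I/O exit details (not the real kernel layout).
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct IoExit {
    direction: u8,
    size: u8,
    port: u16,
    data: u8, // assumption: the byte is carried inline to keep the sketch small
}

#[repr(C)]
#[derive(Clone, Copy)]
struct FailEntry {
    reason: u64,
}

// One variant per exit reason, plus padding to pin the overall size.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
union ExitDetails {
    io: IoExit,
    fail_entry: FailEntry,
    padding: [u8; 256],
}

// The discriminant lives outside the union, as in the C pattern.
#[repr(C)]
#[derive(Clone, Copy)]
struct Run {
    exit_reason: u32,
    details: ExitDetails,
}

// Illustrative values only.
const EXIT_IO: u32 = 1;
const EXIT_HLT: u32 = 2;
const EXIT_FAIL_ENTRY: u32 = 3;
const SERIAL_PORT: u16 = 0x3f8;

// Returns Ok(true) to keep running, Ok(false) once the guest halts.
fn handle_exit(run: &Run, serial: &mut Vec<u8>) -> Result<bool, String> {
    match run.exit_reason {
        EXIT_IO => {
            // Safety: exit_reason tells us `io` is the field that was written.
            let io = unsafe { run.details.io };
            if io.port == SERIAL_PORT {
                serial.push(io.data);
            }
            Ok(true)
        }
        EXIT_HLT => Ok(false),
        EXIT_FAIL_ENTRY => {
            // Safety: exit_reason tells us `fail_entry` is the valid field.
            let reason = unsafe { run.details.fail_entry.reason };
            Err(format!("failed to enter guest: {:#x}", reason))
        }
        other => Err(format!("unhandled exit reason {}", other)),
    }
}

fn io_exit(byte: u8) -> Run {
    Run {
        exit_reason: EXIT_IO,
        details: ExitDetails {
            io: IoExit { direction: 1, size: 1, port: SERIAL_PORT, data: byte },
        },
    }
}

// Replays the exits the guest program would cause: '4', newline, halt.
fn simulate() -> Result<Vec<u8>, String> {
    let digit = 2u8 + 2u8 + b'0'; // two plus two, turned into ASCII
    let halt = Run {
        exit_reason: EXIT_HLT,
        details: ExitDetails { padding: [0; 256] },
    };
    let exits = [io_exit(digit), io_exit(b'\n'), halt];
    let mut serial = Vec::new();
    for run in exits.iter() {
        if !handle_exit(run, &mut serial)? {
            break;
        }
    }
    Ok(serial)
}

fn main() {
    let out = simulate().expect("simulated guest failed");
    print!("{}", String::from_utf8_lossy(&out));
}
```

Every `unsafe` read sits directly under the `exit_reason` check that justifies it, so the discriminant and the union access stay together. That is about as much safety as an untagged union allows.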
So I do plan to write a more complete version of this beyond a test program, like a kvm-sys crate providing bindings to the KVM API. But in the meantime, this test program successfully simulated a processor that's older than I am. I've worked on a few other RFCs since then. I ended up working on an RFC, kind of mentoring somebody else new to the process, to add named field puns, which, leaving the jargon aside, means this: you've got a struct like this, and you can write an initializer that looks like this, without having to write field gets the value of field, field two gets the value of field two. So you can define local variables with the same names as the fields and then just bind those all into the struct. That's really convenient when you've got 27 fields with descriptive names and you don't want to write x gets x, y gets y, z gets z, and so on. The first question that came up here is a very reasonable question that just about every RFC is gonna get asked: could you do this with a macro instead? What would a native implementation give you over that? We did have an answer: brevity, simplicity, some better compiler checking, that kind of thing. There were reasons for it, and so far the response is relatively favorable. But the discussion leading to this RFC is kind of interesting. I had some conversations about this on a private chat server, where somebody said, hey, has anybody heard of a variation of this for Rust? Somebody may have suggested it, but I haven't seen anything like it. We talked about it for a while, and I said, well, hey, you wanna write the RFC? Maybe I can help with this. Let's try it. It probably won't interact with too many language features, so that should make it simpler, much simpler than the unions RFC, right? And a few hours later, we had pulled up an Etherpad, started hacking on this collaboratively, and said, I think we're pretty close to done here. I mean, we can't think of any unresolved questions.
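For readers without the slides, the field shorthand described in this part of the talk looks like the sketch below. It uses the form that is available in present-day Rust, which may differ in detail from the RFC draft under discussion at the time. The struct `Config` and the function `make_config` are hypothetical examples.

```rust
#[derive(Debug, PartialEq)]
struct Config {
    name: String,
    width: u32,
    height: u32,
}

fn make_config(name: String, width: u32, height: u32) -> Config {
    // Shorthand for Config { name: name, width: width, height: height }:
    // each field is initialized from the local variable of the same name.
    Config { name, width, height }
}

fn main() {
    println!("{:?}", make_config("demo".to_string(), 80, 24));
}
```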
I bet the Rust community will probably find them if there are any. But yeah, I mean, we're not being too ambitious here. So yeah, much discussion and several revisions ensued, but it's making good progress and the response seems relatively positive. So let's go up one meta level here and ask, what's the state of the RFC process itself? The RFC process is actually about to go through some minor renovations and refinements. There's been some discussion about pulling in process improvements learned through various RFCs, including 1444 for unions. One of the big ones was to standardize on a process for tracking these periodic summaries of complex RFCs. Also, several of the teams have already moved to asynchronous review: instead of saying, let's talk about this in our weekly meeting, they're saying, check the box if you agree that we should move forward to final comment period, that kind of thing. That helps make the teams more responsive to RFCs, and the rest of the teams are moving over incrementally. There's also a new Rust style team, and a process is being introduced for formatting RFCs, for how code in the language should be laid out. So we clearly need a logo that has a bicycle shed and a paintbrush inside the Rust logo. If there are any artists in the community who would be interested, I'd love to talk to you. I'm working with the Rust style team on this. The last thought I want to leave you with is this. You might look at this process and say, well, I'd have to be an expert to write an RFC and to change Rust. And probably the single most important thing in this talk that I want to point out is: I wasn't. Give it a shot. I learned a lot about Rust by trying it. I sure wasn't an expert when I started. I'm not really an expert now. So it should be approachable. There are lots of people who are willing to provide help and mentorship, that kind of thing. So, a few possible tips for trying to make a successful RFC.
First of all, definitely start with a pre-RFC discussion. Start a Discourse thread: hey, I was thinking of proposing something like this, does this sound reasonable? You'll get a lot of ideas before you start the formal process. Definitely seek advocates, and seek people with use cases other than your own. Find collaborators who want to work on this with you. It doesn't need to be a solo process. Don't break compatibility with existing code if you can help it. Crater helps, but it's even better to be able to say, this can't possibly break anything. Definitely consider and discuss all the language features it might interact with. If you don't know about all those language features, somebody in the discussion will. And you'll want to document how they interact. You'll want to document all the history, all the alternatives, all the trade-offs in the RFC, and not just point to the discussion for them. And give them a fair evaluation as well: really seriously consider the alternatives and the suggestions. The response to an alternative isn't just, let's write that down as a thing we could have done. It's: maybe this is better than what we were proposing. Maybe we should go with that instead. What could we learn? Is there a synthesis of both? You should also seriously consider the null alternative, or a macro implementation. Is there a better way to do this that doesn't require changing the language? Could it be a library thing instead? Could we keep the surface area of Rust more minimal? Does this really need changes to the language? And definitely make summaries as you go. If you've had dozens or hundreds of messages over months, not everybody's going to want to read the whole thing. Summary messages linked from the start of the RFC help people catch up, so they feel like they can participate without first reading 200 mails or duplicating things. So I want to specifically acknowledge a few people who were hugely helpful in this process.
Peter is the primary author of a lot of the Windows support and Windows API support for Rust, and he had a huge need for unions, so we talked a great deal about the use cases there. ubsan provided a lot of feedback on approaches and pushed for better alternatives, better approaches to make this more semantically sensible. Niko provided the help of saying, hey, the parser can actually do this contextual keyword thing, so that we could actually ship it that way. And then petrochenkov did the actual implementation. There were a lot of other contributors to RFC 1444. And of course, the Rust community is just incredible. So the thought I'd like to leave you with is this: I titled this talk "In order to form a more perfect union" partly because, I mean, who can pass up that kind of pun? But it's also because the RFC process is the process by which we form a more perfect Rust, and the process by which we form a more perfect community. Thank you.