So, how many of you had a Mister Softee? All right, Mister Softee is going to be here till three, so you guys have permission for a second one, because it's only once a year. I was at a close friend's house party, someone I'd been playing soccer with for a few years, and this was, I don't know, a month ago, and he had Mister Softee there. And I reached out to Connor and said, figure it out, make it happen, we've got to have Mister Softee. So Connor pulled some magic. At Google, you can imagine, we have a lot of processing paperwork, so Mister Softee is now an approved vendor, or whatever it's called here at Google, and we went through all that. Thankfully, I didn't have to do any of that, but Connor got Mister Softee. So definitely feel encouraged to have at least one before you leave. For those who don't know, the great trick of Magic Shell and Mister Softee is really easy to replicate: if you take two parts coconut oil and three parts dark chocolate and just melt them in a microwave, it becomes Magic Shell. And if you use really good chocolate, it becomes really good Magic Shell. All right, I think we're ready to go. Let's just verify that this thing clicks. It does, as God and nature intended. Wait, it's not clicking this, is it? Yes, it is. Great. All right, lights out. Can you cut the front lights? They're washing out the... Thank you. Hey, folks. I'm Matt Kulukundis. And I'm Mike Kruskal. Welcome to our talk about protobuf evolution. This talk is meant to motivate and give a sneak peek of the upcoming work we're doing in the protobuf ecosystem. Evolution for protobuf is a really high-dimensional thing, so we're going to try to split apart those dimensions a little to give you a clearer sense of what we're talking about. Consider a very basic system: what are the entities that can evolve here? You have all three, and they can evolve independently. And this is the evolution that protobuf was really designed for from the start.
This is what everyone thinks about in their microservices, right? You can add fields. You can mark old fields deprecated. There's built-in handling of unknown fields. This is the bread and butter of the protobuf ecosystem. All those wires between the boxes are the protobuf wire format. Let's zoom in a bit and walk through a basic example of schema evolution. We can start with a simple message here, just a single string field, along with the wire format that someone might get from it. This first byte, 0A, indicates field one has length-delimited content. The second byte, 17, is the length of the string field, and then the rest of the bytes are just the string contents. But rather than inflicting any more binary on you, I'm going to introduce you to a tool that we developed for decoding wire format in a nicer way. This notation comes from the Protoscope tool, and it's equivalent to the bytes I showed earlier; it's just a lot more readable. You can see here that field one has a string in it, and the contents of the string. Now imagine that the client had a newer schema for Person that included an address field. When this message comes to an old version of the server, the server will see something like this. The server won't know about the address field, but protobuf has semantics for unknown fields baked into it. It'll be able to decode this message, and the address data will just end up in the unknown field set. Schema evolution has been planned for from the start, and it's therefore handled very smoothly. Wire format evolution, on the other hand, is much harder. You could imagine adding new tag types, but that requires updating all of the existing parsers, everything, right, on your mobile phones deployed in India, on your servers, everywhere, so that they can accept this new format, and then you have to wait for all of the parsers to roll out globally. And only once everything has rolled out can you change your serializers to emit these new tag types.
This is the sort of effort that one really wants to try only once a decade, or less, and that's not what we're doing right now. We're not focused on message schema evolution, which is already well handled. We're not focused on wire format evolution, because it's too big a problem right now. We're focused on API evolution. Consider a single component: we're now down into one part of our microservice architecture, right? This is the I-want-to-add-features, I-want-to-upgrade-a-library level. This is the sort of day-to-day thing that we as developers do when we are making a single unit of change and releasing it. While protobuf has really strong evolution primitives for interacting between components, it doesn't actually have strong evolution primitives for evolving the projects within them. If you want to change a protobuf API, you have to do it entirely atomically. You would have to update protobuf and then update all of the code in your project simultaneously, in a single atomic commit. It's somewhat like if you want to update from C++11 to C++14: you just have to fix the entire thing and throw a giant switch for your software stack. Of course, upgrading languages should be hard, and there's nothing to be done about this. Actually, Matt, maybe we could look at some specific examples of this and see if they give us any ideas. What about the evolution of Python 2 to Python 3? Okay, that was a super painful transition. They didn't do it great, but, I mean, they did it better than Perl 6 in some sense. There are good points in 2 to 3 that we can learn from. Let's look at the various tools they had. The 2to3 tool had rough edges, but it helped a lot; it automated a ton of mindless changes. Import future allowed libraries to pre-migrate before the rest of the code was ready.
So you could say in your Python 2 code, import future, and now I want to use the Python 3 features. Similarly, there was the six package, which kind of gave you a way to say, I want to live in both worlds at the same time for a little while. And yet despite all that, the change was just too large for Python 2 to 3 to be smoothly incorporated into a large system. I think the key pattern here is obvious, though: things that allow for flexibility and incremental migration make life easier. Generated API evolution has huge potential to unlock performance wins. It lets us fix historical mistakes and replace inefficient designs. This is the target we're actually aiming for. But how do we apply the lessons from Python 2 to 3 so that this target is tenable? What we want are powerful primitives in protobuf to enable evolution of generated APIs, so that people who produce language bindings for protobuf can actually do these sorts of things without harming their users. We're currently in a worse state than Python 2 to 3, because we don't have any of those mechanisms for incremental evolution. So the question becomes: how do we create the equivalents of 2to3 and import future for protobuf? How do we evolve the schema language defining .proto files so that it provides rich primitives for API evolution? Fortunately, language evolution is a lot simpler than wire format evolution. You only have to update the parsers, and if the change modifies the semantics in any meaningful way, then you have to update the code generators as well. Also, it's important that your changes don't affect the wire format at all; otherwise, you're back in the situation we mentioned earlier with wire format evolution. To start with, we're going to borrow a concept from Rust called editions. But before we dive too much into that, let's talk about proto2 and proto3. These can be thought of as two different representations of the same common underlying language.
On the wire, they're identical, and they can work in conjunction with one another; they just have slightly different semantics. Unfortunately, when proto3 was created, nobody had a migration plan from proto2. Because of that, there was no way to switch between them incrementally, the differences between them are all or nothing, and now we're just stuck supporting both of them. This rigidity is one of the major problems we face, and to get around it, we're pivoting to the concept of editions. Rather than the current all-or-nothing set of configuration changes, an edition is a set of defaults that can be overridden. We call each of these fine-grained configurable differences features. Features are the mechanism that provides us both import future and import past, incrementally. But this will be a lot easier to explain if we look at a concrete example of how to use editions. We're currently in a bit of a knot around strings. Consider this message and the C++ that it generates. The C++ that it generates is actually pretty straightforward. If you're loosely familiar with C++, you might look at this and think there's nothing really wrong with it. But it has two really important shortcomings. This setter means that a caller must have a std::string object. So if you want to call set_name with "Matthew Fowles Kulukundis", what you'll end up doing is creating a temporary std::string, allocating memory, memcpy-ing from the static portion of your binary into that allocated memory, and passing that temporary string into set_name. Then proto will take it and make a copy of that std::string into its own std::string, and then it will return to you, and your temporary string will be destroyed, which will free that memory. This is a lot of churn for something that should just have been one memcpy. And there's a relatively easy migration that allows us to change this to use a string_view. Fortunately, we already did that.
Our contracts allow this, because you're not supposed to take the address of, or rely on the exact signature of, these functions. This is very similar to the Abseil compatibility contracts for C++. This accessor, on the other hand, means that a std::string must exist somewhere inside Person's representation. As a result, we're highly constrained in how we can implement Person. We must have a std::string object from which we can hand a const std::string& to the user. That means we either have to hold it by value, or we have to keep it allocated and hold a pointer to it. Either way, the std::string must exist, and the std::string will have its own indirection on top of that. But what if we knew from runtime data and profiling information that this string was always between 3 and 14 bytes? Then we're wasting space; we could have had a scratch buffer of 16 bytes directly in the proto. What if we knew it was always 2 bytes? What if we knew it was always 1,000? There's so much you can do if you have flexibility in the underlying representation. So instead, we want to return an opaque handle to the data. Think of the optimizations we get: we could do custom sizing, where we have a custom SSO buffer for it, or we could use a Pascal string for things we really want to minimize. And we can avoid having the std::string destructors run entirely, which is really big if you're using arenas, which are themselves a 15 to 30% performance win in C++, and there's more we can gain once the std::string destructors are out of the picture. So how do we use features to get here? To start out, we can make it explicit what edition we're on and consider a slightly more complicated proto. Here we've added an address field and specified that we're on edition 2023, which is the first one we'll be shipping. Now we can use features to import future for a single field.
Here we've specified that the name field should generate string_view getters, even though the address field doesn't, because the defaults for 2023 are to generate string. The core idea here is that an edition specifies a set of default features, but a user can override them at different levels. So when the time comes to upgrade the entire file, you can import past for the parts that aren't ready yet. Here we've bumped the file up to edition 2024, where hypothetically string_view is now the default, removed the feature from name, and added a new one to address that says we want to continue generating string. Alternatively, if none of the strings in this file were ready, we could simply specify it at the file level, and this would mean that all of the string fields in this file will generate string getters. As an aside, you might be curious what syntax we're using. It looks a little weird the way we've written it, but it's actually pre-existing syntax in protobuf for custom options. The only actual change we had to make to the protobuf grammar for any of editions is the first line, where you specify an edition name instead of a syntax. Using custom options for features gives language generators space to evolve their own features independently. This allows third-party language generators to play on equal footing with the ones built directly into protoc that we own. Together, editions and features give us an equivalent to Python's import future and import past, and also a way to know what time it is. But we still need an equivalent to the 2to3 tool. What if we had a simple way to automate upgrading a proto file? It would be a single command, and it would update a file for you automatically, adding import past and removing import future as needed. It would even guarantee that it does this without changing the generated code.
So you could just sweep it across your codebase, get everything up to the next edition, and then start the incremental changes. It would also allow more fine-grained control of the modifications this tool supports. With a rich set of primitives, people can evolve their own codebases safely. But we can take it a step further and encode more smarts into the tool. By making the tool aware of protobuf semantics, we can teach it how to ensure that the edits it generates are safe from several different perspectives. Is it safe from a wire format perspective? Changing a string to an int is not, but changing an int32 to an int64 is. Is it safe from a JSON format perspective? Changing the name of a field isn't safe from a JSON format perspective, whereas changing a tag number is not safe from a wire format perspective but is safe from a JSON format perspective. So you can build those checks in. We hope this tool provides a baseline capability to the protobuf ecosystem that is valuable beyond simply updating to the latest edition. So Mike, how long until it's released? Well, we have finished the C++ implementation. Edition 2023 is out as of our 24.0 release; it's just guarded by an experimental flag. We've finished the Objective-C implementation. We've open-sourced all our internal design docs and started on the public-facing documentation that will actually be maintained long-term for editions. We're also currently working on rolling it out to Java, Python, Ruby, and PHP, and all of those should be available well before the end of the year. Prototiller is still under development, and that's mainly because we're focused on finishing Edition 2023 and also implementing it in all of the languages we support. Does anyone have any questions? Someone, I believe, is running around with a mic, so... Definitely. If you want to ask now, I can repeat the question while he gets the mic. Is the feature a guaranteed behavior, or is it more a suggestion?
So the question is: is a feature a guaranteed behavior or a suggestion? The intent is for features to be guaranteed behaviors. There are a few historical quirks around things like UTF-8 handling where we've had to give ourselves a little bit of wiggle room, but the intent is that features are requirements. How does that play into using the generated code with the rest of my codebase? Great question. For the specific case of protobuf, we have a dependency on Abseil, and so we use absl::string_view, but the intent of your question isn't specific to string_view, so I'll try to answer that intent as well. If the rest of your system isn't ready, you should put in top-level disables on things and say, oh, I want string accessors, and it'll hit all of the fields, right? They lexically inherit, so you can just put a line at the top of all of your files saying, I'm not ready for this feature. It's just more along the lines of, it's obscured from the user, right? Like, for example, if I don't really need to, I'm not going to look at what the types are in the generated code, and so how would I know, if I'm adopting Edition 2024, that I have to preemptively disable certain features? I would think it wouldn't compile when you try it. Right, Protoc would still generate the code, but it wouldn't compile, yes. And when you went to upgrade with Prototiller, it would add all the features for you, because your previous edition had string by default, and Prototiller does a no-op upgrade, so it would add string to all your protos when you did the upgrade. Got you. And then you also had a slide for the Prototiller example where you went from 2023 to 2024 and it went from string_view to string. I'm curious why it went backward. So in this slide, string_view in 2023 is the future: the default is string, and in 2024 the default is string_view.
And so Prototiller, maintaining that the upgrade should be a no-op, said, well, you don't need to specify something that's the same as the default, but you were using a default that's no longer there, so I will explicitly set it for you. I see. Okay, thank you. Yeah. So I'm curious: for the internal proto files, like descriptor.proto, it's feasible to open them and look at file options, message options, et cetera. Will you do the same with features, so that we can get a list of all the features? Yes. The features live on descriptors; descriptor.proto will have the appropriate pieces, and each language generator, like pb.cpp in this case, will have its own file, but you'll have proto messages and be able to see it from there. So will it be like a folder per language, and then inside there will be some proto files? Because, I guess, if you have a lot of languages, you can't put everything in descriptor.proto, right? Yeah, it's not all in descriptor.proto. The idea is that each language generator would define its own. So C++ will have a cpp.proto. And we cheated a little bit here: we didn't import features. There should be an import statement at the top of this, but slides compress things. Okay, thank you. Okay. I didn't hear any mention of Go. Are you planning on supporting Go? Yes, we are. We forgot it. It's there; we're planning on supporting it. It's pretty easy to forget a language when you're listing the set of languages we support. Is that out now? No. And Doug here is the TL for Go and a gRPC maintainer. Yeah, we're working with Florian on it. Awesome. So we have five minutes until the next set of talks. Antoine is going to be in this room; I forget who's in the other room. I think we'll go ahead and stop the Q&A now, and many thanks. Just as a quick note, the slides are available online.
If I fly back through time here, you can get to the slides if you go to that URL. Thank you. All right. Good job. Good job. I heard it last year, and I'm thinking, how far along have you gotten? So it looks like you made some good progress. Yeah, Mike has been the one doing the work.