Hello. Thank you for joining me today for Show Me the Schema: RPCs and Learning to Love Code Gen. My name is Eddie Zaneski. You can find me on the internet at eddiezane. I'm coming to you from Denver, Colorado in the United States, where I like to climb big mountains. I'm a software engineer at a company called Chainguard, where we do software supply chain security. I'm also a maintainer for the Kubernetes and Sigstore projects. And I'm sick and tired of writing API clients by hand. Is anyone else? Yeah, cool. So I figured I'd start with a story, and I'm going to set the stage. I joined Chainguard back in April of this year. And as with any small company, any small startup, you wear lots of different hats. And so I had to put on the hat of helping with our marketing operations. I had to wire up a CRM that we were using called Copper. It's like Salesforce. And I had to wire that up to some different APIs and kind of make everything work and sync together. And with that in mind, rewinding back eight, nine years ago, I was working for another API company. And I was responsible for shipping all of these different language clients. All of these were written by hand. Whenever a new feature came out, I had to go in there and add this feature, make any changes, bug fixes across everything. We had extra libraries that these libraries depended on. So I was writing in tons of different languages, including Perl. I published a package to CPAN, which was a journey in itself. And my beard immediately grew to the floor. And so I was very strict in this world of: these libraries need to be idiomatic. This needs to be idiomatic code that developers know how to use. A Python developer wants to use Python-esque code. A Ruby developer wants to use Ruby-esque code. And so when we were approached with the opportunity to try out this new thing called Swagger, I was like, no. I'm not going to generate code. I think it needs to be idiomatic and work.
And so I completely rejected code generation. Cool. So now we're fast-forwarding back to the present. I had to do this task with the Copper API, and so I pull up their API documentation. Obviously, the first thing I look for is any clients that they have written. They had a JavaScript client that was pretty old, and that was it. And so I go to look at the API docs. They have a search endpoint: this is how you list all the companies, you make a POST request to the search endpoint. And going through the docs, I was like, OK, I really don't want to have to do this by hand. And I thankfully found a page in their docs where they had a Postman collection. Anyone use Postman before? Yeah. I love Postman. Postman is a great tool for playing with your APIs, mocking them out, documenting them, testing all different sorts of things. And so when I saw that they had a Postman collection, I got super excited. And I figured, oh, I know Postman has code generation or some kind of code sampling built in. I can generate a client this way. It'll be easy. And so I load it up in Postman. I test out my API keys, test out my headers, and everything's working as expected. I pop over to the code generation for Postman, and it gives me a pretty simple and awesome code snippet. The problem is that this has no types. And I'm writing in Go, which is a typed language. And so I'm not able to easily do this without having to unmarshal into map[string]interface{}, which is basically a void star in Go. And so I was like, OK, well, I have a Postman collection. Can I turn this into OpenAPI or Swagger? Because I know I can codegen from that. And I actually found this awesome package on npm called postman-to-openapi. This lets you convert from that Postman collection into an OpenAPI spec. That's cool. And so I convert it, I run it through my Swagger code generator, and I get stuck with a body of byte slice, which is just raw bytes.
I could, again, turn that into a map[string]interface{}, but I'm not working with types. The Postman tool doesn't have this concept of types. So I'm using the wrong tool for the job. And so I'm like, OK, how can I fix this situation? Well, thankfully, the Copper API docs had lots of JSON examples. So I pull up this awesome tool called Quicktype. Your editor might have this built in. You can paste in JSON on the right there, and then it'll just generate you structs and types for just about every language in there. There are tons of different languages that you can pick from. And so here I go. I'm able to take this raw JSON example, generate a Go type for it, and I go to plug this in. And then I run into an issue with pagination. And of course, like many of you, I was like, no. I don't want to have to deal with pagination. I don't want to have to write this by hand. All of these things keep compounding and adding up. So I reach out to Copper on their support, throwing a Hail Mary, like, hey, do you happen to have an OpenAPI/Swagger file internally that you might be using and not publishing? And RJ thankfully asks the engineering team and confirms that they don't. So thank you for asking at least, RJ. So where am I at? Well, I've spent a good amount of hours on this problem so far, and I'm no further than I was when I really started. And so I kind of put on headphones and go to crank out this API client by hand. But it's more like this. I don't want to be doing this. I want to solve a problem, and that problem shouldn't have to start with building a client for an API that I don't own. We're a paying customer. And so I eventually come up with a giant pile of code. This is a single method, with all the types somewhere else. I've got pagination baked in, and what do I have to do? I have to copy-paste this logic over and over again for all the other methods. I could write my own code generation. I could write my own abstraction logic to handle this.
But I just want to get done with the actual problem I'm trying to solve. So I don't want to do this. And I come to the conclusion that this sucks. I don't want to have to write clients by hand. I want to do and complete the job that I started. And so this is what I've been calling my first rage CFP. After spending an entire day trying to solve this problem and generate a client, I threw together a CFP and submitted it like five minutes before the deadline. And so I'm here. So shout out to anyone who wants to give a talk at a conference: just get mad enough to write a last-minute CFP. But who's been here before? Can I get a show of hands? You've been in this position where you just want to do what you're supposed to. And the real thing is, this isn't anyone's fault. This isn't Postman's fault. This isn't Copper's fault. This isn't any of the tooling's fault. This is just where we're at right now as a community. I can't blame Copper for not having a Go client. There are how many other languages that they need to support and staff with resources? And they want to build features. They don't want to build clients. So my question is, why is this still a thing in 2022? Why is this still happening? This was a problem years ago and it's still a problem now. And so I have a thesis. And this is the part of the talk where I'm supposed to tell you I have a solution. I don't actually have a solution, right? There's no solution here. I have a proposal. I have a pitch. And my pitch is that we should be publishing more about our APIs than just docs. We should be publishing the schema. Or as my friend Kirby likes to say, the schema. And so talking about what a schema is, right? This is the definition. This is the shape of our API. I can look at the schema, interpret it different ways, maybe do some codegen, maybe dynamically create a client. And so what I really want out of this is what we call an IDL, or interface definition language.
And so Wikipedia's definition, this is actually great: it's a generic term for a language that lets a program written in one language communicate with another program written in an unknown language, right? So this is the idea that we can agree on a syntax, a contract, for what a spec or a protocol looks like. I can do some kind of generation, send this over a wire, send this to a different program that's written in Python when I'm using Go, and it will just work. This is basically what we're doing on the web already, right? But we're not agreeing to the standard, the spec, these types, ahead of time. So taking a look at some IDLs that are out there: Swagger and OpenAPI I've already mentioned. Most folks have used those before. ASN.1, anyone heard of this before? You use it many times a day. This is the IDL that certificates are encoded in for all modern cryptography. So all modern cryptography gets marshaled into an ASN.1 syntax and then turned into like a PEM or a DER or different bits. But you use these every day. Protobuf, which is my personal favorite; we'll spend a bit more time talking about Protobufs. This is a tool written by the folks over at Google. It lets you serialize your data into different bits. Apache Thrift is another very common one, made by a bunch of Googlers who went over to Facebook and wanted Protobuf, so they built Thrift, is what I'm told. Smithy is a tool by the folks over at AWS. It's very similar, and I think it might predate a lot of the other tools. They just recently open sourced it in the past couple of years. And so taking a look, this is what a Swagger file looks like. To me, this is still cumbersome to write. There's nested YAML. You could do this in JSON. There are editors out there that make this experience a little better, where your auto-completion, your editor, can help. But again, this is something that I feel like a machine should be generating for me as well.
To me, this is still too close to the lower level, below where I wanna be. This is what Protobuf looks like. So at the top here, we have the definition for a Person object and some fields that we define on there. Those numbers are what we call field descriptor numbers. You can ignore them; they basically just have to stay the same. They're what gives you backwards compatibility. And you can see at the bottom, that's the code that gets generated for Java. This just gives me what I've been calling POJOs, plain old Java objects. I'd rather work with that. From that tiny bit of syntax and code, I can work with a POJO and an object builder that is Java-esque, that I already know how it works. Thrift looks very similar to Protobuf. Same kind of stuff. Different syntax, same purpose. Smithy as well looks very similar. We have different services, different types in there. Different approaches for the same problem. And so the IDLs handle part of this problem. This is sending our contract, sending our spec and our definition. This is what I'm gonna write. And Protobufs don't have to run over a network, for example. You can serialize a Protobuf and write it to disk. But when we talk about RPCs, or remote procedure calls, this is what we're sending over the wire, right? Dating back to the days of SOAP and all the other RPCs. Don't get cringy on that one, right? It's moving forward. It's how modern frameworks are gonna send these remote calls to access functionality that another server exposes, kind of like an API. And so there are different languages that ship specific RPC clients built in. Go has net/rpc, which is super common and popular. There's also a JSON-RPC codec under net/rpc that you can use. I think Python also ships with a native RPC client. gRPC, which makes use of Protobufs, is a CNCF project. It probably has the most traction out of all of these. Twirp, anyone use Twirp before? Twirp is from the folks over at Twitch. It is basically Protobuf for REST.
It works really well over the web. gRPC requires you to use HTTP/2, while Twirp can run over HTTP/1.1 and all that. So it's a lot closer to REST. Connect is a rather new framework from the folks over at Buf. This is something I'm a big fan of so far and have been following heavily. Connect is basically gRPC, but with the protocol re-implemented. We'll talk more about why the gRPC protocol can get a little challenging, but if you're starting from scratch, I definitely recommend you check out Connect. Apache Thrift, again, runs over the wire. Apache Avro is a cool project. It's usually used in the data warehouse and data serialization world. Data kind of gets serialized to rows. So if you have data that can be represented as a row in a database or a spreadsheet, Avro is a good RPC to take a look at. And then, kind of sort of, yes, GraphQL. GraphQL has an interesting thing where it's kind of great for front-end developers, but on the back end there are tons of performance issues with the joins you're doing. If you were to write the same SQL query, it would look nothing like what the GraphQL data resolver is putting together. So I'll include it here. It's a good step in the right direction. And so, looking around on the internet, my first question was, again, why aren't companies doing this? So I happened to find Twilio. If folks are familiar with Twilio, they're a communications messaging company. As of 2020, they actually started publishing a Swagger spec for all of their API definitions. And so I came in here and I saw that they have this spec. You can grab it in YAML, and there's their OpenAPI spec for the whole Twilio API surface. It's a huge file. How many lines is this? Almost 25,000 lines. And this is most likely machine generated. I don't know for sure, but this is most likely machine generated. And I know that their Go client actually generates off of this, right?
I think they might be working towards their other clients doing this as well. So this was really cool to see shipped. And so I took that just as a proof of concept, and I was able to generate a client library from it. And it just simply worked, right? I plugged in my authentication, hit their API endpoint, listed all of the phone numbers that I had on my account, and printed them out. As far as developer experience goes, this is fine for me, right? It's kind of unidiomatic for Go; I have to dereference all these things here. That JSON200 thing at the bottom I'm accessing is the response type that I'm expecting back. That could probably be named something a little better. But as far as experience goes, I'd rather have this than having to build it from scratch. And if Twilio wanted to, which is what they did for their Go client, they can build a wrapper around this, right? So you don't have to generate your client and ship that directly to your customers. You can generate the bulk of it and build a nicer user interface on top of it. This is what a gRPC service looks like. Super straightforward and simple. We declare a service and then RPC methods on there. Those image request types and image response types are just those proto messages I showed you at the start. So that's really cool. And so I have a couple of demos to show folks. Anyone familiar with Stable Diffusion yet? If you haven't played with it, it's awesome. It's open source, like DALL-E image generation. So text to image. And so I have a server running that's built off of Python gRPC generation. And gRPC works in, I think, eight or nine different languages. And to support a new language, someone just has to build a compiler for it. So there are specs for that. So let me show you what the protos look like on that. Right, so this is what my service looks like. I have this image type. The generate image request takes in the prompt and sends back that image type.
I can list the images, get the images. Pretty straightforward. Here's what my service looks like. It's pretty much the slide I showed you before, but I have some extra annotations in here that we'll talk about in a little bit. This kind of does some really cool magic. And so if I codegen that, right, this is gonna read all my proto files for my whole project, doing a bunch of other gen in here too. And so what we wind up with is this client that we can use, right? And so I have my generated code that's in here. So this is just my regular gRPC generated client. I pull in some of the other gRPC stuff and open a connection to my host. gRPC wants to work with HTTP/2 and TLS certs by default, so it's secure, but I'm gonna set it up for insecure. And then this is what I love, right? So I took this plain old definition file and I was able to code-generate a library. And inside of my editor, I can get full-on completion that is syntax-aware, right? And I can just call those methods that I want, right? So I can call this generate image method. That takes in a Go context. That takes in a generate image request. That takes in a prompt. I give it a prompt. What's a good prompt? A yellow dog, right? And boom, that's just gonna work. So I can save that and run it. I forgot to log something, right? And so if I pop back to my Python client that's running on my server, this is running Python. So this is talking natively to PyTorch, using the Stable Diffusion models directly. At first, I built this server in Go and I was exec-ing out to Python to call it. And then I realized this is stupid. PyTorch is Python. There's a gRPC library for Python, so I built a Python gRPC server around PyTorch. And it just works. And so I got back the thing that I didn't log out. And so we can look at the prompt. Where did we go? A dog, here we go. Let's see what a yellow dog looks like. Hey, it's a yellow dog, right? So super straightforward.
But you can take my protos and you can generate a client in whatever language you want. And so that's a quick gRPC demo. And so we need to talk about REST a little bit, right? REST was a convention that we all decided on for how to treat URLs as resources. And so REST is kind of discovery for our clients, right? If we want to get our users, we can make a GET request to the /users endpoint, right? If we want to create a new user, we usually POST to /users. Different filtering, different querying, updating with a PATCH request. And it was a great convention to make sense of things for developers who were using an API. But there are still some problems with REST. And I still think we need to move away from REST and further back into the RPC world. I like to start by talking about what I call these artisanal decisions that you have to make with REST, right? So let's say that you want to fetch a user. You hit that /users endpoint, like I said. Well, what if you want to fetch a user in an organization? Do you pass a query arg with the org ID? Or do you do /users/:orgId? I don't know. You want to fetch a specific user in an organization. What does that look like? Are we doing /users/:org/:id? Well, what is that actually referring to? Right, so it's just these things that we don't want to have to be bikeshedding and thinking about. Same thing with errors in REST. We have status codes, right? That's great. We all agree on status codes for responses: 200 OK, 201 Created, 404 Not Found, whatever. Well, are errors sent back as status codes or JSON? What about the famous 200 OK with a "status: not found" message in the body, right? How is this useful to anyone? You're basically just making this up, right? As far as REST goes, it's a great tool. It's a great place to get started. We've learned a lot from it, but there still aren't answers for these outside of agreed-upon conventions.
But I think the worst part with this is that it forces you to focus on those URL paths instead of the shape and structure of your data. And that's what I like about the Protobuf and gRPC style, where we actually start with what our schema looks like. Same thing with Swagger and OpenAPI. We start with what our paths look like. We start with what our types look like. As opposed to bikeshedding about what this URL path looks like. No lie, when I was working for that API company, we spent an entire month discussing what the endpoint for creating an email would look like. Because you're not sending it in real time, you get back a receipt that it's supposed to be sent. And what does this look like? It was just a complete waste of time to figure out what should have been a simple RPC call. So why are we doing this? Why aren't companies shipping and publishing protos and OpenAPI files? Why are we stuck? Well, some companies are. Google actually ships all of their protos for the Google APIs. If you've ever used a Google API client before, it's probably generated from protos. They're all publicly available, and you can use them to generate a client in your own language. gRPC and Protobuf also have a ton of extension points, so you can add extensions for different things. We saw Twilio was doing it. I'm sure there are plenty of other companies out there that are shipping and publishing these things. But the real problem is, I think they're still cumbersome to write. As far as Swagger and OpenAPI go, that's still a cumbersome thing to write by hand. The tooling is complicated. Setting up gRPC and protos takes so much time. It's very complicated. I have this magic incantation script that I invoke that is filled with commented examples about how to do different compilations, and you gotta pass all these flags to tell it where to get paths from. So that's kind of a pain. Breaking changes are also hard to do. Nodding your head, yeah.
Breaking changes are also kind of hard to do when you have these contracts. Do you generate a new client every time that you make an API change to a field? Does it break? Depends on the language, depends on the implementation. Well, what if clients are already using a generated client in production? And the real problem is that a lot of these specs and protocols are way too focused on performance. Google built gRPC internally to solve a real Google problem: tons of data, tons of requests. Most companies will never see a fraction of that traffic. And so when it comes to gRPC specifically, HTTP trailers are one of the hardest things to reason with, right? The HTTP/2 spec has trailers that are sent after the request. Well, browsers will never support them. All the browser vendors have said, we're never gonna support them. So we're stuck in this place where gRPC is very complicated to get working on the web. They also have specific things where if you're sending a certain header, it has to fit into a certain data frame. And there are different rules for if it goes beyond that data frame. And so what I've found talking to people is that if you wanna write a gRPC client for a new language, you often have to write a full-on HTTP/2 client from scratch that follows the gRPC spec. So you're definitely getting into trouble there. And I mentioned browser support. Those trailers will most likely never be supported by the browser vendors. So it just will never be able to talk natively. And discovery is still a problem. Searching for which companies ship protos, which companies ship OpenAPI/Swagger files, doesn't give you many results. You just have to pray that there's something in the docs or a GitHub repo that has this information. The Buf folks, I mentioned them before, are doing a lot when it comes to protos that I wanna call out. So this is actually the proto that I uploaded.
They have a CLI tool that makes compiling this much easier, instead of writing those bash commands. You can just write some YAML manifests. And so this is kind of cool, because this is self-documenting. I could put annotations and comments in here to flesh it out, but I just basically shipped them my entire proto file and they documented it and commented it. That's really cool. And then, so, transitioning, right? I think we're at this point where we need folks to start doing this, right? We can't stick around and have people writing API clients by hand forever. It's a waste of everyone's time. And so the gRPC-Gateway, I think, is a good thing to call out here. That was these annotations that I had in my proto files. So you can see here, I can annotate those requests and I can say, hey, when a POST body comes in, stick all of that body into my request, right? So this is a way that I can take my protos and make them work with HTTP and REST-style APIs. This is a good transition point. And I know that most companies that are using gRPC internally are probably using gRPC-Gateway behind the covers. They're just not exposing that gRPC endpoint. I can definitely say most of the companies I've worked at in the past have been doing this. And what's cool about this is I can generate a Swagger file from this API gateway, which I did. So I created a Swagger file somewhere in here. Gateway, there we go, right? So this is the generated Swagger file from that Protobuf. That's cool. It looks like a regular Swagger file. I took that and fed it into the Swagger generator, right? So I went from Protobuf to Swagger to Swagger-generated code, and it just works. Everything works. When we're generating these clients, they're built to a spec and they work. The Buf CLI I mentioned is a really great tool to get you started. It kind of tackles that "I don't know what flags to pass protoc" problem, so check that out for sure. Connect is another great thing to get started with.
Like I said, it works in the browser. It works natively. It's backwards compatible with Protobuf and gRPC for the most part. So you can just plug it in and use it. It kind of works everywhere. The browser can talk to the client since they re-implemented the protocol. Twirp, again, is a good middle ground to get started with. It's more like gRPC for REST. The folks over at SmartBear have a cool tool called Pactflow. This can help you with detecting breaking changes. So you can do contract testing in place, so you can figure out if your API is breaking as you change it and catch that as you generate. We need more folks to do education, write content, make videos, do examples of all this cool stuff. If you like doing that, it's a great place to get started. Having spent the past two weeks working on this talk, there is not a lot of good getting-started information out there. I'm gonna give a quick shout out to Akshay. He works at Buf, and he gave me a ton of time to get this talk right and answered a bunch of questions I had. There are more resources for you in the slide deck. Cool. So this is a link to the slides. If you wanna scan this, take a picture of it. And I'd love to answer any questions folks have. Like I said, this talk was more of a rage talk to highlight the problem and suggest how we get to a solution. So thanks for coming. We have a mic if anyone has questions. Thank you for the talk. It was excellent. You mentioned idiomatic and user-friendly code. What are the closest things you've seen for being able to generate code that just feels good? Yeah, great question. I still think gRPC gets the closest. I'm a Go developer, so I'm very biased. And that's actually a big complaint in the gRPC community. I've heard this a lot from Ruby developers: most of the gRPC generators out there were inspired by Go. And so in Go, so gRPC also has streaming, bidirectional streaming, single streaming.
You can open a channel that keeps going. So as a Go developer, gRPC will create me channels that I can just feed into, right? So it's like a native language construct. So I think gRPC gets closer there. The Ruby client, again, not so much. It creates things like TextToImageService service class names, which no one in Ruby would call their stuff. Yeah. Thanks. Thank you. Great talk, thanks. One question. With the little attempts I have made at generating things, I always ended up having to tune things a little bit, making some changes, which means that if I'm going to regenerate things, I'm going to have to redo my changes again. So how do you deal with this kind of thing? Yeah. That's a very common thing that I've seen. That's why in the Go world, we have that kind of, we call it the hack/update-codegen script. It's kind of just a convention. And so I've seen all sorts of different things, right? And again, it's a hack, because the tooling still isn't where we need it to be. So I've seen folks take this file and drop sed commands in here that update the generated code, right? Is there something specific that you have to make changes to? Yeah. The other thing I forgot to show was that when generating these clients, the server code that gets generated is super straightforward. This is what a Go server looks like. So I just have to declare my server struct up there and implement an interface, basically. Then I would just add my business logic in this TODO thing. This is just a greeter service, right? So I could send back like, oh, your name is whatever. And that's what I love about it: the rest of this is still boilerplate. So I can just fill in my business logic in those generated sections. Yeah, I actually had a question about the server side here. So the tools you discussed here, all of this, can they also generate the server side of it, or are some of them only for the client side?
Yes, yeah. gRPC definitely generates server code; actually, all of them do. So Thrift, Smithy, gRPC, and the Swagger generator generates a server too. I have a bunch of demos in this repo if you wanna check that out. This is what the, oh, this is the client, but the Swagger generator can also generate a server stub for you. And we actually use this exact generator in the Rekor project, which is under Sigstore. So that's all generated. We have one giant OpenAPI file that everyone makes their changes to and generates the server from. Yeah, and so you mostly just wind up with these stubs that you fill in, like I showed. Like, TODO: put business logic here. Any other questions? Thank you for the talk. According to you, what is the best way to distribute the protos for other people to build a client? Like having them on GitHub? Is there some best practice around this? Yeah, that's a great question. And that's, again, why I think the Buf folks are doing a great job. For protos specifically, they have this Buf Schema Registry, which has a bunch of different bits in here. The most popular ones are the Google APIs, right? And so this has all the Google API protos. And so this is an open, free registry that people can use. And what's really cool about it is the tooling. With this buf generate file, I don't even have to have the Protobuf plugins installed. It can do remote building. So I can say, hey, remotely build my proto using this. It's basically a container, right? It's a Docker container. And I can take any of these from the schema registry here and add them as dependencies to my code. So somewhere in here, here we go. So here's where I'm declaring the dependencies I have on that. And that will just remotely go out and pull them in, right? So it's like fetching, importing something based on a URL. To me, that's awesome. Maybe I might have time for one more. Great talk.
It's not really a question, but it's just because I'm from the Swagger team. So we are also thinking about a new codegen option. We've rebuilt some of our core, which is ApiDOM, which is like a common DOM for multiple different types of specifications, not just Swagger and OpenAPI, to help us reimagine codegen. And it's all about letting codegen happen in the coding languages that folks want it to happen in, which I think will go a long way to making it easier. And I would also add to your list to check out CADL from Microsoft. They're doing some cool stuff with auto-generating OpenAPI directly off your code. Oh, very cool. Thank you so much. The other cool thing: check out the SmartBear folks. They have another spec, AsyncAPI, alongside OpenAPI, where you can generate based on Kafka streams and other kinds of async producer/consumer models. I thought that was really cool too. I'll be happy to stick around and answer anything afterwards. Thank you so much for listening to me rage rant and for hanging out.