 So we're gonna go ahead and get started. I know people are kind of trickling in here, but I'm gonna take a little bit of time for introductions to start off with. So really happy to see how many people are interested in API design. I'm super stoked about that actually. Yeah, so this is a talk called Broken APIs Break Trust or as someone from the audience suggested, Broken APIs, Broken Hearts, which is awesome, thanks again. But yeah, we're just gonna be talking about API design, client design, and a bit about backwards compatibility and why it's important. So hi, I'm Alex, and I've noticed a trend in some of the talks I've gone to where people are posting embarrassing photos from their youth. So I was recently on vacation in Japan where I have some step family, and I found this photo, which I have not seen in a long time. And unfortunately also I'll warn, I got a little bit sick on that trip and I'm still recovering. So if I accidentally cough into the mic, just remember that every time I accidentally cough into a mic, somewhere a puppy is getting adopted. So we can feel good about that. Yeah, so I work for the AWS SDK for Ruby. Just curious, show of hands if you've used the Ruby SDK in AWS before. That is super awesome. Feel free to ask questions or heckle me later after the talk, depending on how you feel about it. You can find me on Twitter at Alex Wood with two W's because the one without my middle initial was taken. And I kind of evaluate how well I've done on a talk by how many people tweet about it. So feel free, if I see everyone on their laptops and phones, I'm just gonna assume that people are excitedly tweeting about everything they're learning. Yeah, and I also have bad jokes. I'm sorry, or you're welcome in advance. All right, so today we're gonna walk through rules and strategies for API design. 
Essentially, we wanna understand what breaking API changes are, how to avoid them, and then how you design your APIs to avoid the need to make breaking changes as often as possible. And we're also gonna try to think about ways we can look around corners to find ways that your users may be relying on your API and client design in ways that you have not considered. So I mentioned my work on the AWS SDKs. Another thing that is a part of my job is we do API reviews for every API that Amazon releases. So we have over 110 services with a large number of APIs per service and we review all of them. So there's a lot of lessons that we've learned over time about decisions we can make in our API design to try to improve the customer experience. And my hope is that's what we can get across today. So to start off with, I wanna get into some definitions and kind of a common language about how we're going to describe APIs. So this is a Rails and services track talk. So we are going to talk both about how your API design can affect your Rails applications and how it can affect client library design. And also this interaction is really important because your community is going to have potentially a large number of client users, especially if you're releasing Rails APIs which are becoming more and more common. And a lot of your client users may not wanna think too hard about how you did the details of your web API implementation. They just want their code to work and you might even have community client authors who want to make sure that they have a consistent web API experience. I was gonna add a quick note again since we have a lot of people on the wall, raise your hands one more time if you have seats next to you just to help people out. Thank you so much. I'm excited, I'm filing that under high quality problems. Okay, so we have here a list of terms. I don't wanna spend too much time on it but essentially these are the terms we use for naming parts of an API at AWS. 
These are not universal terms, these are not the only way to describe things, it's just a way to have a common language to talk about this. So things like resources, which is probably a familiar term if you're a Rails developer, but also we use shapes as a way to describe the input and output models and shared components of the API interface itself. So input shapes, output shapes, error shapes and even subshapes, and we'll talk about all those. A member is a single property that exists in a shape, so often the closest equivalent is a single column in a database for example, but your APIs are not necessarily going to be one to one with the database, so what we're talking about specifically is API shape members. Operations are something exposed by a service to be invoked, so in Rails, generally anything you're writing in your routes file is something that would likely be one to one with an operation. And an interesting note, so when you're writing libraries, if we step outside of Rails for a second, like just literally code libraries that people are interacting with, whether it's across teams or something you ship, a lot of the same stuff still applies. So you don't have to be going over the internet for these API design principles to apply to your work. Okay, so let's talk about the most basic scenario. You have a client, you can imagine the Ruby SDK since a lot of you said you were using it as an example, and it's talking to an API, so you're making a request to S3 for example. It's simple, we're all familiar with this. This is the life cycle of something that is done by a web service. And eventually we're gonna add new features to our API and we'll launch a new client version that has support for those new features. But as new iterations of an API are released people are going to still be using the older clients. They're still going to be hitting that endpoint and it still has to work. Raise your hand if you're a fan of forced upgrades. 
We got one guy raising his hand and you are an amazing person who's living on the edge. You're a rebel, you live by your own rules. And it just continues. You have a new version of an API and now you just get this spaghetti nesting problem and it can get out of control really quickly and it's really important that these older clients continue to work. Updating your clients shouldn't be required to continue functioning. New client versions are for new features. It shouldn't be required housekeeping just to keep the lights on. And if you can ensure backwards compatibility in your API the mental model suddenly becomes a lot simpler. Your API evolves, your clients evolve to provide the new functionality and users of older clients, if they don't need the new functionality yet, they just keep going along. There's no problem. If there is no problem then there is no problem. And you have little to no forced migration. So there's kind of like a recipe for happiness here. And Ruby is a language designed to maximize developer happiness so we can take this all internally. As a design principle, we want APIs to be backwards compatible so that existing calling patterns and output usage continue to work in perpetuity as much as possible. And we want clients to be forwards compatible so that the only changes you need to make to your code are the changes you need to support using new functionality rather than mandated code changes for standard upgrades. Other than our brave soul on the wall here nobody enjoys that. Okay, so we're talking about definitions. So here's an example of how you might model a resource for an API. So here we're talking about a trip, which many of us had to take with some drama, we'll get back to that later. You have travelers on a trip, a string description of it, and a flight shape which we'll get back to in a second. So shape members. This is kind of the common language we're using to describe these API concepts. 
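As a rough illustration of the trip resource on the slide, here is a minimal Ruby sketch of a shape with typed members and a nested flight shape. The struct names, the `MEMBER_TYPES` table, and all sample values are assumptions made up for this example; the talk only names the members (travelers, description, flight, and the flight's identifier, airline, and status).

```ruby
# Hypothetical shape definitions modeled on the talk's trip example.
# Flight is a nested shape that other shapes can reference as well.
Flight = Struct.new(:flight_id, :airline, :status, keyword_init: true)
Trip   = Struct.new(:travelers, :description, :flight, keyword_init: true)

# Member types as they might be modeled for the wire format.
# This table is an assumption for illustration, not a real SDK structure.
MEMBER_TYPES = {
  travelers:   "list of string",
  description: "string",
  flight:      "Flight (nested shape)"
}.freeze

trip = Trip.new(
  travelers:   ["Alex"],
  description: "Conference trip",
  flight:      Flight.new(flight_id: "XX100", airline: "Example Air", status: "SCHEDULED")
)

puts trip.flight.status
```

Because `Flight` is defined once and referenced from `Trip`, a change to the flight shape shows up everywhere it is used, which is the point of nesting shapes.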
You'll also notice that we're typing our members and that's gonna be important even though Ruby, we're talking about Ruby and everything's an object because when we get to our full set of customers we're not just talking about Ruby. And flight, it's a special member because shapes can be complex and in this case we have a shape which references yet another shape. So nested shapes allow for us to avoid duplication. So there might be several points in our data model where we're talking about a flight and the flight is the same. You have a flight identifier, you have an airline, you have some sort of status and if you came to this conference your flight status might've changed a few times I know mine did. But you can have changes of that shape be reflected everywhere that it's used across input and output shapes. And we have our operations. So get, create, update, list, delete, fairly standard controller stuff. In fact it's gonna map pretty one to one with a lot of the default routes that Rails would create for a resource. And it's gonna map about one to one with controller actions that are gonna be generated by any kind of scaffolding. So these are basic Rails concepts like whether or not you're thinking about and modeling what your API does, it has an implied model and it's useful to think about what that is and move with a sort of intentionality. Okay, so from an API client perspective let's look at a request response lifecycle. So we're talking about people using code to access your web API. So first we have a user, they're making a client request such as getting trip information and that's gonna turn into a request on your web API. And that's in turn going to give us the response shape that we spent a few slides describing earlier and your client can translate that into a language object so that your user can not think about the details of what your web API is doing. 
Though this example I actually had to go back and change because that was actually my original flight number but I had to take a different flight because it really was delayed because BlizzardConf. You may also receive an exception from the service and surface it through the client. So maybe your system deleted the trip from the flight database because you drove instead of flying because Day After Tomorrow Conf. And that's just gonna be represented as an error and it's gonna be raised and your client's gonna handle that in whatever way is appropriate to them. So yeah, again these are definitions that we use because they're useful to us for designing and essentially reviewing APIs. It's not the only way to describe it but it's a useful sort of syntax. And don't worry about taking a picture of this slide, I'm gonna put all the slides up later. So you will be able to find them, I will definitely post them on Twitter. All right, now let's talk about safe API changes and unsafe ones. So avoiding breaking changes is really important. Like a lot of you have experienced the time where you did a bundle update. You know, your Gemfile didn't have loose dependencies, you were thinking about it, and there was some patch upgrade to some library and it had a breaking change and all of your stuff exploded in production. It's a horrifying experience as a customer and so you really want to take intentional effort to avoid inflicting that pain on your users. So let's go back to our trip shape. We wanna make changes. For example, what happens if we add a new Boolean member to differentiate whether a trip is confirmed or unconfirmed, for a booking service maybe. That's totally fine. You can add new optional members to a shape, like new output members, new optional inputs. That's totally okay. But I realized another thing and that's that I made this array of travelers and for some reason in implementation I've only ever been doing one to one. 
So let's just go ahead and delete that. We don't need that. Wrong, no. Changing the type is going to break older clients. It's a breaking change. Imagine the poor guy who wrote an each loop over the travelers member and suddenly they're getting NoMethodErrors and nil values. It's a horrifying experience and we're gonna get into this a little bit more later, but also think about someone who's using a Java client because you tried to support multiple languages and now their code doesn't compile because they're looking for an array and it's now an object. They actually have to have a type and it has to match. Even if we're using Rails, which is great, and for all the reasons we've talked about, our clients might be in many languages. Oh, okay, okay. But collections, each, I get it. You know, I really wish though confirmed could be a number of states because again, BlizzardConf, confirmed is a state of mind. Well, so maybe I can make it a string. It can be multiple types. No, this will also break your clients. You might be able to find clever ways around this in Ruby. I know someone's thinking like, yeah, but if it's nil, then I mean, an if condition works and Ruby's awesome. But again, client libraries in multiple languages, you're gonna break people using compilers and even with Ruby clients, you're kind of stepping into weird uncharted territory. So once you've set a type, it's a commitment. So let's go back to that happy change. We've added a new Boolean member to the output shape. What does that actually look like for old clients? Like how is this actually working? Why is it okay to add things? So if we're using the latest version of our client, we can access our new value returned by the service. I wanna go check if something is confirmed. It is confirmed, even though I'm gonna go to the airport and it's gonna say my flight is canceled after I get through security, it's confirmed now. And if we're using an older client, we can access the output values we knew about. 
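To make the old-client versus new-client behavior concrete, here is a small Ruby sketch, assuming a JSON response and a client that only keeps the members it was built to know about. The response body, the member lists, and `parse_trip` are all hypothetical and are not how any particular SDK is implemented.

```ruby
require "json"

# Hypothetical wire response after the service adds the optional `confirmed` member.
RESPONSE = '{"description":"Conference trip","travelers":["Alex"],"confirmed":true}'

# An older client release only knows the members it shipped with;
# a newer release also knows about `confirmed`.
OLD_MEMBERS = %w[description travelers].freeze
NEW_MEMBERS = (OLD_MEMBERS + %w[confirmed]).freeze

# Keep the members this client version knows about and silently
# ignore anything else instead of raising on unknown keys.
def parse_trip(json, known_members)
  data = JSON.parse(json)
  data.select { |name, _value| known_members.include?(name) }
end

old_view = parse_trip(RESPONSE, OLD_MEMBERS)
new_view = parse_trip(RESPONSE, NEW_MEMBERS)

puts old_view.keys.inspect
puts new_view["confirmed"]
```

The older client never sees `confirmed`, but everything it did see before is still there with the same types, so existing code keeps working.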
You know, the confirmed member is gonna show up, it's gonna get dropped on the floor, but that's fine, I wasn't looking at it. Nothing broke. So adding new things is okay. And if I need to use the new values, I can upgrade. It's a reasonable story. Let's explore another question on adding and removing members from our input shape where we're listing trips from a service. So raise your hand if you're familiar with the concept of pagination. Okay, we've got most hands, but essentially like, if you think about like index all and you have a million items, if you don't have some way to break up the response, you're gonna have a bad time and that's pagination. So we have like list objects in S3, we're not gonna return in a single response every single object you have, that could be a very bad time. So we have concepts like next token and max results and you can use that to control the way that your API paginates. So maybe that's all we launch with. That's enough to have a functioning list trips that's gonna scale pretty well. And now I wanna add a new optional parameter called confirmed so that I can go limit our listed trips to confirmed trips or trips that are not confirmed, or if I leave it unset, it just gives me everything. And that's totally fine. New optional parameters on input shapes are good. But now I'm running into a scaling problem because Rails doesn't scale, we all know that. And so I wanna require that you have to tell me if the trips are confirmed or not, and is this okay? Interactively, is this okay? No. Relaxing parameter requirements is okay. If we had an old required parameter and we wanna say it's not required anymore, that's fine, that breaks nobody. But adding new required parameters is going to break you. Why? So let's consider older clients. They've touched literally nothing, their code works, they're doing what they're supposed to do. And now we add this requirement on the server side. Yep, they're broken, they're probably broken at runtime too. 
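A minimal sketch of why a newly required input breaks untouched callers, assuming a hypothetical server-side `list_trips` handler that checks a list of required parameter names. The handler, the error class, and the parameter names are illustrative only.

```ruby
class ValidationError < StandardError; end

# Hypothetical server-side handler for a ListTrips operation.
# `required_params` stands in for whatever the API model marks as required.
def list_trips(params, required_params)
  missing = required_params - params.keys
  unless missing.empty?
    raise ValidationError, "missing required parameter(s): #{missing.join(', ')}"
  end
  "ok"
end

# A request built by an older client that has never heard of `confirmed`.
old_client_request = { max_results: 25 }

# At launch, nothing is required, so the request succeeds.
before = list_trips(old_client_request, [])

# After `confirmed` becomes required, the exact same request fails at runtime.
after =
  begin
    list_trips(old_client_request, [:confirmed])
  rescue ValidationError => e
    e.message
  end

puts before
puts after
```

The caller changed nothing between the two calls; only the server-side requirement did.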
And worst of all, they're literally unable to fix it until they do an upgrade. They can't add the parameter, it's not in the library. And I'm imagining too that a lot of you have experienced a problem like this with a service or library that you use. Don't do that to people. It's wrong. Oh, okay, okay, okay. But we've got this guy, he always upgrades and that's all my customers because they're really on the ball and they upgrade their bundles with a daily build and that's great and that's still insufficient. You still need to make a code change if this restriction is put on the client side or the server side. And to be honest, nobody has a user base which keeps fully up to date with all their dependencies for any number of reasons. It's not a realistic thing to expect or ask for from your users. So what I want to instill is a sort of empathy for your users. Yeah, breaking your users is gonna get a bit of ill will. Some of you probably when you came to this talk or I started talking, started thinking about libraries that had mandatory deprecations and where you're really frustrated. And it's not even the most important thing, it's almost like, why does this engender ill will? Because your code is gonna break at runtime. You're gonna take down someone's application. At best, someone adds a sprint task of urgent but not important but not useful of yet another overhead code change and it doesn't actually help them achieve their goals. And it's preventable. An ounce of prevention in the form of carefully reviewing your APIs before an initial launch is gonna prevent the need to cause most breaking change possibilities. So if you're respecting the rules, even if you have to do a little bit of creative thinking and you're not making the exact perfect change you wanna make is going to lead to happier users and we think it's worth it. 
And finally, especially important for us because some of these breaking changes here would cause compile time issues for something like Java but in Ruby these problems are probably gonna show up at runtime. And even if you test and you went to the mini test talk, you went to the RSpec talk, you'd love testing but changes to a backend server are kinda hard to model in tests anyway. And you can't rely on the fact that all your users are even writing tests so good behavior is still important. All right, so now we're gonna talk more about constraints and validation and exceptions. So modification of shapes gives you some of the most obvious opportunities for breaking changes but there's a lot of subtle changes we need to think about when it comes to constraints and exceptions. So again, no new required parameters. It breaks existing code even if they're upgrading but it's important to remember if you have an existing required parameter and you feel like you wanna make it optional, that's okay, it's not gonna break anybody. And it's worth reviewing just because this theme is gonna come up over and over and over again when we talk about constraints and validation. So constraints on both the server side and the client side are important. If you're writing APIs, you're probably writing clients. So if you look again at our requests for listing trips, a maximum value for max results is reasonable. So it's probably implied when you don't specify it as well. Maybe you don't validate anything on the client but if I send you more than, if I say 50 results, I'm gonna get 25 and there's an implied maximum. So you can ask for less, you can say give me five per page but you can never ask for more. So maybe later I wanna squeeze a bit more performance out of my server for first page requests and lower that limit, is that okay? No. No and double and triple no if you're doing validation on the client side because now formerly valid requests are gonna raise validation errors. 
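Here is a short Ruby sketch of the difference between loosening and tightening a max results limit, assuming a hypothetical validator that rejects values above the limit. The method name, error class, and the specific limits are assumptions for illustration.

```ruby
class ValidationError < StandardError; end

# Hypothetical constraint check, whether it runs in the client or the service.
def validate_max_results!(value, limit)
  raise ValidationError, "max_results #{value} exceeds limit #{limit}" if value > limit
  value
end

# An existing caller has been asking for 50 results per page since launch.
existing_request = 50

# Original limit of 50: the request is valid.
at_launch = validate_max_results!(existing_request, 50)

# Loosening the limit to 100: the old request is still valid.
loosened = validate_max_results!(existing_request, 100)

# Tightening the limit to 25: the same, previously valid request now fails.
tightened =
  begin
    validate_max_results!(existing_request, 25)
  rescue ValidationError => e
    e.message
  end

puts at_launch
puts loosened
puts tightened
```

If this check lives in a client library, the tightened limit also ships to everyone who upgrades, so the break can arrive from either side.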
Customers are gonna be relying on that previously accepted value and they're gonna break. But increases by loosening constraints, that's totally fine. No old code using lower values is going to break. They can pull in the new maximum. And actually, so this is a design decision that we were thinking about in the AWS SDK for Ruby when it comes to client design. We only validate required parameters and the existence of parameters. We don't validate other constraints. So there's APIs that say like you can only have a certain length on a shape and that'll be validated on the server side if you exceed that but we don't check that on the client side. And this actually came in handy at one point. So there's a small service, some of you may have heard of, called Amazon EC2, and the fact that you're laughing is because at some point it got too big for the fact that we had eight-digit instance IDs. We actually ran out and users of older Ruby SDKs had no problem with this even if they didn't upgrade. However, if we had validated the previously correct length constraint on instance IDs customers would have needed to upgrade their SDK to keep functioning and it's not a good experience. So I'm not gonna say wrong, you can't do it, but use caution when you're validating constraints on the client side. Use your judgment but not a fan. Okay, so I wanna talk a little bit about states. This is gonna get a little bit into the weeds and this doesn't affect a lot of you but it's worth covering at least a little bit. So let's consider the example of our flight shape and the status value, and while the type over the wire and in Rails is a string, effectively this is an enum. In fact, if you're writing a Java client you might literally use an enum type because that's going to be useful for them. So consider the possible states of a flight, and I know all of you have, as we would represent them in our flight status. 
The interim set of statuses could be used as a flight workflow and the terminal states are the end states of the flight. It was landed or it was canceled like the original flight I was supposed to take to get here. Maybe we decide we wanna be more descriptive and we wanna split landed into landed on time and landed late. And by now we know this is wrong even though we would have found this super handy for all the people whose flights landed late. Well, okay, but I'm clever, I'm smarter. So let's keep that value and just add a new one. Now we haven't deleted a value. We've only added a new one and I mean again we really need it after our experience in this conference. So cool, right? No, no, no it's not. We have in fact split a terminal state and users are going to be getting an unhappy surprise and the way to think about why this is is to think about concepts like waiters. If you've used the AWS SDK for Ruby, raise your hand if you've used waiters before. Okay, we got a few hands. Essentially a waiter is something where you're polling on a describe-type request over and over again waiting for a certain state to arrive, so you might say wait till it says landed or wait till it says canceled and keep polling every five minutes. Well, now we've added a new state and that's just gonna poll forever until it hits a timeout. So we've actually broken our customers. And one way we could look at this is okay, but we still wanna describe if it was on time or if it was late in the end. You can add an extra member that has that information. So this is where we're thinking about creative ways to get the information we wanna get across without breaking existing behavior. Adding interim statuses is different, it's often okay. You could look at them as lifecycle events. So rather than polling for termination on these states and risking looping forever and for all time, you're potentially taking an action on certain interim states. 
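The waiter problem can be sketched in a few lines of Ruby. This is a simplified stand-in for a waiter, not the AWS SDK's waiter API: it polls by calling a block and gives up after a fixed number of attempts. The state names other than landed, canceled, and boarding are assumptions, and no real sleeping is done between attempts.

```ruby
# Terminal states the older client was written against.
TERMINAL_STATES = %w[LANDED CANCELED].freeze

# Hypothetical waiter: calls the block to fetch the current status and
# stops when a known terminal state arrives or attempts run out.
def wait_until_terminal(max_attempts:)
  max_attempts.times do
    status = yield
    return status if TERMINAL_STATES.include?(status)
  end
  :timed_out
end

# Original workflow: the waiter finishes when the flight lands.
original_flow = %w[BOARDING IN_FLIGHT LANDED].each
original = wait_until_terminal(max_attempts: 5) { original_flow.next }

# A new interim state is skipped over and the waiter still finishes.
interim_flow = %w[BOARDING DELAYED IN_FLIGHT LANDED].each
with_interim = wait_until_terminal(max_attempts: 5) { interim_flow.next }

# A new terminal state: the service now stays at LANDED_LATE forever,
# which the old waiter does not recognize, so it polls until timeout.
with_new_terminal = wait_until_terminal(max_attempts: 5) { "LANDED_LATE" }

puts original
puts with_interim
puts with_new_terminal
```

Keeping `LANDED` as the terminal state and reporting lateness in a separate member would let the old waiter finish while newer clients read the extra detail.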
So a lot of the airline apps, when a flight is suddenly boarding you get a text message notification. That's something where you're polling on an interim state. And if it never boards, cause it's canceled, well, your code probably knows that's a possibility and you're ready to handle it. If new unrecognized service states arrive, that's a lot easier to just ignore and drop on the floor. And I say this is tricky because there's interim states that are optional in the middle of a flow and then there's state forks. State forks are generally not so good, but I could probably spend a lot of time just talking about states and terminal states. The short way I'm gonna say it is generally interim states are okay, new terminal states are not okay. And also consider that even if you're adding interim states, in languages like Java you probably have to future proof. You have to account for the fact that you might get an enum value you don't recognize at some point in the future and you don't wanna just explode as soon as you see it. So thinking about ways you can future proof clients is also important. Okay, so remember our old example of a simple exception? I canceled my trip, I drove and I specify a unique ID that's not present. I get an exception and it's neat and tidy and I can handle it rather easily in code. So I go to get a trip. It's in like a getter method and maybe if the resource isn't found I don't need to crash hard. I can ignore it. I have to pick up the error. I log it. I return nil. It's cool. We've all written code like this all the time. It's not invalid. When do we have a problem? Well, I've decided that HTTP status code 410 is the greatest idea of all time and I'd love to give my users an amazing insight when they specify a resource that used to exist but doesn't exist anymore. So I split my exception behavior. Okay, pause. Again, show of hands. Has anyone in the room been burned by code that depended on doing this? I feel so good for you. 
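The exception split described here can be sketched as follows. The exception class names, the `get_trip` helper, and the lambdas standing in for the service are all hypothetical; the point is only that code written to rescue one error type does not catch a newly introduced sibling type.

```ruby
class ResourceNotFound < StandardError; end
# Hypothetical new exception introduced later for the HTTP 410 case.
class ResourceGone < StandardError; end

# Existing caller code: log a missing trip and return nil instead of crashing.
def get_trip(id, service)
  service.call(id)
rescue ResourceNotFound => e
  warn "trip #{id} not found: #{e.message}"
  nil
end

# Stand-ins for the service before and after the exception behavior is split.
old_service = ->(_id) { raise ResourceNotFound, "no such trip" }
new_service = ->(_id) { raise ResourceGone, "trip was deleted" }

# Before the split, the missing trip is handled quietly.
before = get_trip("trip-123", old_service)

# After the split, the new error type escapes the existing rescue clause.
escaped =
  begin
    get_trip("trip-123", new_service)
    false
  rescue ResourceGone
    true
  end

puts before.inspect
puts escaped
```

Nothing in `get_trip` changed between the two calls, yet the second one lets an error propagate up the stack.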
I only see a couple raised hands. I've seen it happen. It's not fun because what ends up happening is this old code was squashing one type of exception. Now it gets another type and that's gonna jump the stack and possibly crash your application at runtime. The code above has no idea why the resource is missing and now for a previously supported use case you're gonna get an error breakout. And some of you were thinking, because we're all smart, inheritance can fix this, and in library design you're right, but in API design nesting exceptions can bring on a lot more complexity and you don't wanna necessarily go there. So clients need to be future proof. So proceed with caution, or again prefer to use added fields and headers on an exception to break out details. So go ahead and put your 410 Gone information in an optional header field. It's fine, it'll work. This is wrong. Yeah, adding a new field parameter where you have that information, you can go ahead and fork in your logging. That's cool. And customers who use curl style requests will also like you a lot more. Okay, one thing I wanna think about for a second is looking around corners because again, customers are clever. They're gonna do things that you weren't anticipating. A note of caution. One thing you'll note throughout the presentation is a lot of the proposed API changes that broke customers were well intended and even potentially useful. But at the same time, customers weren't writing bad code and getting punished for it. They were following common conventions. So I wanted to introduce Hyrum's Law. Backwards compatibility in practice and especially at scale, it's hard work. Customers are gonna build workarounds based on shortcomings of your API whether those are real shortcomings or just something they perceive should work differently. Like you give a hacker an inch they're gonna take you to the boom. And they notice behavior quirks that you didn't intend on and they build on them. 
So if Stack Overflow becomes sentient and starts shocking PHP developers they're gonna start depending on the free electricity at some point. Every detail or inconsistency is gonna become part of someone's workflow and careful design can minimize those sharp edges and reduce the scope of the problem. But now you can't stop a security fix because it's gonna break someone's workflow. There's always exceptions but you should be mindful when you're making changes that have a downstream behavior impact. So I know that when we're doing changes to the Ruby SDK at AWS we're super, super paranoid about things that are going to break users even in implied ways, like new defaults changing the way that retry behavior works. It's not technically a breaking change but if you were relying on some part of that behavior you might get a surprise you're not happy with. You might have been relying on the fact that we only retry a few times and then fail out rather quickly so that you don't have cascading timeout failures in your system. And suddenly we decide to give you a lot of retries so that we can make sure things will succeed and you get a knock-on failure that takes down your system. You're not happy with me if I do that to you. So think about not only modeled behaviors but unmodeled behaviors. All right, so let's wrap this up with API design rules that you can use. But first I wanna talk a little bit about related talks you might wanna check out if this piques your interest. And again, I'm super stoked to see how many people are interested in API design. So much of this talk was inspired by a talk from my colleagues Kyle Thompson and Jim Flanagan at this year's re:Invent called embracing change without breaking the world. We all work together on reviewing APIs at APS, or APIs at AWS, say that five times fast. And their talk is especially useful for understanding the static language side of the problem in more depth. Kyle is a developer on our Java SDK team. 
Second, I gave a talk at the same conference about building APIs with Ruby, using a Ruby on Rails example with our API Gateway service. And given that API Gateway gives you the ability to generate SDKs for your services automatically in multiple languages including Java, Ruby, several mobile SDKs, JavaScript, this talk can be a bridge between the concepts you learn today and using services that we offer, quick plug, for applying them to your real world API customers. And finally, I added this one recently but there was another talk, on GraphQL, which already happened. So you'll have to look at it later on YouTube if you didn't go. It has some interesting ideas about different approaches to some of these same problems. It's not an endorsement, it's not not an endorsement, but if you want to learn a little bit more about ways people are looking at these same problems, it's an interesting talk to check out. Okay, so, four out of five, this is the slide you can take a screenshot of if you want. I'll still be posting all the slides later but this is kind of the thing we've been building up to over time. So I have enough time to do the thing that speakers are not supposed to do and read the slide. So the dos for APIs: add new members and shapes. Do add intermediate workflow states, cough, carefully. There, a puppy got adopted, I coughed. Do add detail to existing exceptions. Do add new opt-in exceptions. So I use a new feature, I might get a new type of exception, that's okay. Do loosen constraints. And for clients, think about forward compatibility, build so you support all those things. Focus on discoverability which is not something we dove into too much but could be a whole talk in and of itself. And Kyle and Jim's talk at re:Invent talked a little bit more about this topic if you're interested. And then the do nots. Do not remove or rename members and shapes ever. And renaming is the same as removing. Don't change member types. I know you're clever. 
And that you can find some way to make it work in Ruby. It's gonna cause a problem, I'm telling you. It's not a good time. Do not add new terminal workflow states. Do not add new exceptions that are not purely opt-in. Don't fork exception behavior. And don't tighten constraints. And in clients, again, asterisk here, this is kind of a do-it-at-your-own-risk but I wouldn't recommend it. Don't validate API constraints or do it very carefully. It can cause more problems than it's worth. So hopefully you found this useful. I've got AWS stickers. So if you have questions, I'll take them over here. If you just want stickers, I've got them over here. So there's two sticker piles. Thanks everyone for your interest. Thank you.