I'll just say a quick word about why you're here, and I'll leave the introduction and everything else to you. We only do this short introduction for the featured sessions. Yes, and that feels quite proper. Your name is pronounced David, right? Yes? Okay. Then I may introduce you. The thing is, people always spread out as far as possible and sit on the aisles. Were you ever at the CCC congress? At the beginning they always do a defragmentation. That means everyone sitting at the edges should move toward the middle. Great. They call that defragmentation. Although at the congress center in Berlin they are also very keen on keeping the aisles clear as emergency exits. That's sometimes a bit... Well, then we can start. Wait until those people sit. Okay. Good morning, everybody, and thanks for showing up to the last featured session in the development track. I hope everybody attended the keynote, and if you remember the HTTP commercial that Fabian did, you are now in the right session to learn what that was about. So, I'm very happy to present David Zülke. Please, David, go ahead. Good morning. Can you hear me well? If you cannot hear me well at any point during this talk, throw a soft object my way. I'm really excited that this is a featured session. I wasn't even aware; apparently that's something special. But thank you for showing up so early in the morning. I'm sure you had lots of beer yesterday, so I really appreciate you showing up so early. Whenever I'm at a conference, I sleep until noon and then I'm late even for lunch. The session is called Designing HTTP Interfaces and RESTful Web Services. Welcome to the last few people coming in. I started a minute late; sorry, that's not very German of me. This is my name. You can spell it with a "ue", because I am in fact German, and you might recognize this city. I think you've been there.
I work here in Munich as one of the founders and partners of a consulting company called Bitextender. We do various scalability and web architecture things. I'm also the lead developer of the Agavi framework for PHP. And this is my Twitter nickname, in case you have Wi-Fi at the moment: you can follow me and praise or bash this presentation as I proceed with it. I want to start this talk with a look at how things were before the term REST was even popular. Back in the day, whether it was a simple personal web page that had some sort of guestbook or comment feature, or a big web shop, all the URLs kind of looked like this: index.php?something. And then at some point after the millennium break, the turn of the century, people started figuring out: hey, we can earn money on this internet thing. Along with that came the SEO experts. And the SEO experts said: nein, you can't have these URLs. Verboten. At least the German ones said verboten. So we had to make the URLs SEO friendly. Right. Like this. And then things really spun out of control, because really nobody had a clue what they were doing. Also with the advent of Ruby on Rails, where you put everything with slashes in the URL, things spiraled really out of control, because from the latest hamburger videos you get to some obscure things like local picture, yes, one, two hundred. And around the same time, the whole idea that we need service-oriented architectures, and that we need to have layers in our systems, et cetera, became really popular. And for quite a while the SOAP standard was still popular, because it's one of those dot-com-bubble things where XML was everything. So people kept using that. And you make a POST request to some sort of... I'm sorry I don't have dual lasers. That would be really cool, actually. It could be like Arnold: I'll be back.
So I'll pace like a tiger in a cage between these screens occasionally. So, for you: good morning. You would POST to a SOAP endpoint, right? And there is an envelope and a body, and it has the getProduct thing in there, and that tells the server what operation we are calling. In this case we are calling the getProduct operation, and we want to retrieve a product based on an ID, which is an argument to that method. It's simple remote procedure calls, right? We are calling a procedure remotely on another service. And the details of how our objects are marshalled into XML and unmarshalled back into some other object on the server side are completely transparent to us when we call it, because in code we just call a getProduct method on some sort of object, and it does it over the wire. And then the response is going to be the red stapler with ID 13456 and a price. And somewhere there's a WSDL file with an XML schema. The WSDL file defines the getProduct operation, and the schema defines the responses, et cetera. So the product object has three fields. And then if an error happens because the product doesn't exist... SOAP really doesn't take advantage of any of the HTTP features, because it's designed to be transport agnostic. SOAP can be done over HTTP; there is actually even a standard for SOAP over SMTP. For a very long time, Amazon did dozens of SOAP calls for every product page through custom socket connections with custom serialization. They didn't serialize from and to XML, they used sockets, but the principle was the same. And at least the spec for the HTTP transport binding says: well, if there is an error on the server side, a SOAP fault, which again is very generic, you could use an HTTP 500 Internal Server Error code, to at least on the wire indicate some sort of failure. But if you have an addProduct operation, it doesn't return a 201 Created code. It always says 200 OK.
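As a rough sketch of what such a SOAP exchange looks like on the wire — element names, the endpoint, and the price here are illustrative, not taken from a real WSDL:

```xml
<!-- Request: always a POST, always to the same endpoint URL -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getProduct>
      <id>13456</id>
    </getProduct>
  </soap:Body>
</soap:Envelope>

<!-- Response: 200 OK on the HTTP level, no matter what the operation was -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getProductResponse>
      <id>13456</id>
      <name>Red Stapler</name>
      <price>39.99</price>
    </getProductResponse>
  </soap:Body>
</soap:Envelope>
```

Note how the operation being called lives inside the body, invisible to HTTP and to any proxy in between.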
And the whole thing of marshalling these things into the envelope... and then people don't understand XML Schema and don't understand WSDL, so they build the XML by hand, because they don't know that they can just load the WSDL into a client and fire off a request. "SOAP sucks" is what really everyone said. And it's true to a really big extent; in many cases it does really suck. So: let's build APIs without the clutter. One of these examples is the Joind.in API, the old one — there is a new one now. On the Joind.in API, you make a request, which again is a POST request, to some sort of generic endpoint, and you pass some sort of envelope with XML. And in the XML, you enclose the operation that you are trying to call. In this case, we are calling getDetail and we're passing it a talk ID. And as you can tell, the authentication details are inline here in the request. It doesn't use HTTP authentication. So you cannot have a generic proxy in front of your web server that handles authentication. You can also not have a proxy in this case that says: well, it's a get operation, so we can cache that. Because the proxy would need to understand your custom application logic and your application semantics. It would need to know that it can inspect a POST request because it might be a side-effect-free request, and that it may cache the operation whenever the name starts with "get" or something like this. That's not good. Summary: it's always a POST. It doesn't use HTTP authentication, for instance, among many other things that HTTP has to offer. The operation that you're trying to call — and this is the really, really important bit — is enclosed in the request and completely domain specific, even though the protocol, HTTP, has means of conveying this. It's not cacheable because of this, not without a lot of custom logic. And everything goes through one endpoint. There is a guy called Leonard Richardson. He once wrote about a maturity model for RESTful applications.
And then Martin Fowler, who wrote Patterns of Enterprise Application Architecture, among other things — you might have read that book — picked it up and said: this is Richardson's maturity model, this is really cool. And what we just saw is level zero. It's plain old XML over the wire. You could improve that by giving every resource, so every talk, an individual URL. So you don't go to /api/talks; you go to /api/talks/12345. That would be level one in this maturity model. Something really important for you to remember is that level zero and level one are a bag of hurt, and you don't want to use them, ever. What is with my slides? Damn it. Hold on. Ah, there we go. Keep that in mind. Don't do this stuff, because you're really just repeating SOAP's mistakes, and SOAP failed. Now, we built these sorts of APIs, and people fiddled with the URLs and stuff. And there was this HTTP thing. HTTP is an insanely smart protocol. So all the people out there who say "HTTP sucks, it's so slow, and it always closes the connection" — idiots, they haven't understood it yet. But we needed someone to point us to why it's great. And that was this Roy Fielding guy. He gave us this term REST. And that was really, really awesome, because now everyone could say "I have a REST API", when in fact they bloody didn't, because they hadn't understood what REST really means. The term is an abbreviation for Representational State Transfer. And it's defined in this dissertation of his, published in 2000: Architectural Styles and the Design of Network-based Software Architectures. Note how it doesn't say "web" or "HTTP". In fact, the whole thing is not about the web. Well, it's about networks, but it's not about HTTP. It's about how you design software and systems that have dependencies, et cetera, in such a way that they can scale with the growth of a network, and what trade-offs you need to make when you communicate over a network.
And this REST thing defines a bunch of constraints. The notable ones are: a client-server architecture, which is a really obvious requirement — we have a browser making a call to a web server. Requests are stateless. So there is no implicit mechanism in the protocol allowing a server to know that a subsequent request from the same client is, in fact, from that same client, and no sort of built-in session mechanism. Because, as we'll look at later, sessions don't really scale well. We added that feature with cookies, but the protocol itself should not have it, for really logical reasons. If it's stateless, it's really easy to cache. Also, it's a layered system; that's important. There can be any number of intermediaries between a server and a client, and it's completely transparent to the call or the response. So if I make a call to some sort of web server, I don't care how many proxies I go through, how many load balancers I go through, and how many reverse proxies I go through. That means that you can deploy a Varnish just there, and you don't have to change anything. Imagine how ridiculous it would be if you had to change the calls from the browser to your web server just because you deployed a Varnish. It's not even possible. In SOAP, you kind of have to do this. You can target intermediaries in your calls, et cetera, because they overthought things — designed by committee. HTTP was just a bunch of guys with beers and a problem in the science field. They said: ah, let's just do this. Another constraint is code on demand, which is completely optional; a good example of it is a server shipping JavaScript code to the browser to enhance the experience. We can completely ignore that, because for the purpose of this talk it doesn't really matter. And then — and this is the important one — we need a uniform interface. Who knows what the uniform interface is? Aha, great.
That is the important thing. The uniform interface basically states: we have a URL that identifies a single resource. We use an operation — a method in HTTP, or a verb in common language — to perform operations. We GET /cart. We PUT to /products/12345. We DELETE /categories/15. We do not include the operation — that we want to delete something, get something, show something — in the URL, because it's implicit from the verb. When we say DELETE /products/15, it means the thing identified by that URL should be deleted. We don't need DELETE /products/15/delete; that makes no sense. Redundant. And we use a hypermedia format to represent the data, and we use link relations to navigate a service. These last two bits are particularly important, because these are the things that make REST RESTful. Without them, it's just HTTP. We'll look at those in detail in a moment. But in the meantime, as a first aha for you in the room maybe, consider this: a web page is not a resource. Remember how I said a URL identifies a resource? A web page is not that resource. It is merely a representation of that resource. Because any resource can have any number of representations. We can make a GET request to the list of products and say: we accept JSON. And we get a list of products as JSON. And then we can repeat the same request and say we want XML, and then we get XML. Neither XML nor JSON are hypermedia formats, because they have no defined semantics. JSON is just a definition of how you serialize objects. XML is just a markup language where, basically, there's a bunch of tags; it's really just stuff delimited by angle brackets. We can make a call with a browser and want HTML back. In this case, we're Safari: we're asking for XHTML, HTML, and other things. And we get HTML back, which is a hypermedia format, because it attaches meaning to tags. We'll talk about that in a moment.
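One resource, two representations, selected via the Accept header — a hypothetical exchange (the path and fields are made up for illustration):

```http
GET /products/13456 HTTP/1.1
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

{"name": "Red Stapler", "price": "39.99"}
```

The same request with a different Accept header yields the same resource in another representation:

```http
GET /products/13456 HTTP/1.1
Accept: text/xml

HTTP/1.1 200 OK
Content-Type: text/xml

<product><name>Red Stapler</name><price>39.99</price></product>
```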
Before we talk about the whole hypermedia thing, let's talk about just HTTP, without all the magic. Because the first step to REST is getting the HTTP stuff right, and that's where most people fail, unfortunately. Once you have that, then you can worry about real REST and hypermedia. So, designing an HTTP interface means, first of all, you need to think about your resources. You don't have to, but it's really useful, because it helps your own development and the understanding of your own system to know what the entities in your system are and how they relate. Commonly, you would see URLs such as /products — like people who put all their holiday photos in a folder called "photos". And you do. And then you can filter by... I don't know if this "cats" means actual kittens or categories; could be an abbreviation. So we filter by cats in descending order. And if we don't want to filter in descending order, and if we want to add a keyword filter, we probably add more slashes. And then we really quickly get a problem where we have ambiguous URLs, or we can't figure out what to do anymore. Then we have an individual product, which apparently has a list of photos — the wrong way around. This is strange. I've seen stuff like this in the wild a lot. And it's not bad per se, because it still works. In the end it doesn't matter what the URLs look like, but it tells me that people haven't even thought about which elements from their problem domain they even want to represent on the web. Does this create a new photo or a new product? And is this a photo ID or the product ID? These are all the confusions you could get. So I always start, even though it's completely unnecessary, as we'll see later, by just saying: OK, these are the URLs that I will have in my system to identify resources. I'll have a list of products. And I'll use the query string.
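Constructing and taking apart such filter parameters is trivial for any client when they live in the query string; a quick sketch with Python's standard library (the parameter names are made up for illustration):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Build a filter/sort URL from parameters instead of inventing more slashes.
# "category", "order", and "q" are hypothetical parameter names.
base = "/products"
params = {"category": "kittens", "order": "desc", "q": "red stapler"}
url = base + "?" + urlencode(params)
print(url)  # /products?category=kittens&order=desc&q=red+stapler

# Any client can take such a URL apart again without knowing any slash order:
query = parse_qs(urlsplit(url).query)
print(query["order"])  # ['desc']
```

With extra path segments instead, every client would have to hard-code the segment order; with a query string, parameters are named, optional, and order-independent.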
Please use the query string to filter, because that's what it's for. Don't add more slashes for filtering, for ordering, for sorting, for selecting fields. It's a query string. Any client out there can easily construct such a URL with additional parameters, and knows how; it doesn't have to know what order additional slashes would have to come in. An individual product has a subresource of photos. The HTTP specification actually states this — well, it doesn't state it explicitly, but it's implicit from some things it says about the POST request: something that has more slashes in it is a subordinate resource. More slashes means, in this case, that the photos belong to the product. That's what HTTP says about its URLs, which is kind of strange. And then again, you could append a query string to sort by the latest products, or address any individual photo. And the fun thing is that once you've done this, and once you move on to real REST, if you want to, then it really doesn't matter what your URLs look like. They could be random hashes. But it's good for humans, good for developers, to understand what they're dealing with. And it's really helpful for yourself to understand what the relations in your system are. So try and do this. Then, once we've defined these URLs, we want to use them. So we have a collection of products. We can now make a GET request to get the list of products, obviously. We could also POST to that location to create a new subordinate resource. That's in the HTTP specification: it says a POST could trigger a script or something, or it could make something underneath this location. So in this case, we make a new product. Question for the attendees, only for those who haven't been in this talk before: what does this return? What should this return? Have you been in this talk before? Good. Many people then return an ID. What do I do with that ID? The product ID. Sorry?
Well, but I have only an ID. What do I load? It's just an ID; I don't have a URL. I have an ID. How do I get the URL? So I append it to this. But then I need to know how to construct URLs, and I need to hard-code them into my client, right? That's not good, because I need to know what to prepend. Maybe one day I'll be in a subfolder, or I want to use the same mechanism on a set of related resources where I don't prepend /photos but /songs or /music. So, what should this operation return? I'm not talking about a web page with a form, where we need to return a 302 or something with a Location header to redirect, blah, blah, blah — that's a browser limitation, and it's stupid. I'm talking about the way the protocol is designed: if you want machines to interact, you return a 201 Created, because that says: I've created a new resource, and you can find it at the location specified in the Location header. And then you follow that link. And you don't have to program any knowledge of how to get a product given an ID into your client, because there aren't even IDs. Maybe you just use the URLs as IDs on the public-facing side of the API. And once we have such a product, we can GET it to retrieve it, right? We could make a PUT request to update it. If we PUT, then we need to enclose the whole product, which is very, very good. Many people say: oh, that's stupid, because it's not bandwidth efficient. It's important to do that. Imagine that I PUT details to the product, but I don't send all fields; I just send the price, to update the price. Well, maybe someone else has updated the price. Maybe someone else has updated the description in such a way that my price makes no sense anymore. You get concurrency issues really quickly. HTTP uses something called optimistic locking, where basically you don't have any locking.
So, what a server can do is, when it sends you the product in a GET response, along with Content-Type, Content-Length, blah, blah, blah, it can give you an ETag, which is basically a hash, a fingerprint of a state of the product. Could be a hash of all fields or something. It sends you this ETag. And then when you make the PUT request and you enclose the whole product — with an updated description, an updated price field, and a new category — you send along another header, If-Match, with the fingerprint that you got in the ETag header. The server can compare this and say: yep, that's the fingerprint I still have, so I'll let you update. If it's not the fingerprint the server still has, that means someone else updated the resource in the meantime. Then it responds with 412 Precondition Failed; that's a status code. And because it's in the protocol, that tells you, the client: the ETag has changed, so maybe I need to GET the resource again, compare the fields, and ask the user whether his update should still be applied, because he could overwrite changes, et cetera. And then you can retry the request with the new fingerprint. If there are a lot of people writing, that means you might have to try a thousand times until your update goes through, right? Under high concurrency. But you don't have one idiot locking a resource and going on holiday. So optimistic locking is good. PUT is used to update something; DELETE is obviously to delete something, right? And you could also return any other status code, like a 400 Bad Request if you think the input is no good, or you could return a 409 Conflict on a delete, saying: you can't delete that category, because other products depend on it, or something like this.
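The server-side half of this check can be sketched in a few lines. This is a toy model, not a real framework: it assumes the ETag is simply a hash of the serialized resource state, and the field names are made up.

```python
import hashlib
import json

def etag_of(resource: dict) -> str:
    """Fingerprint of the resource state: a hash over all fields."""
    payload = json.dumps(resource, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:16]

def try_update(stored: dict, update: dict, if_match: str):
    """Apply a PUT with an If-Match header; return (status_code, new_state)."""
    if if_match != etag_of(stored):
        return 412, stored   # 412 Precondition Failed: someone else updated it
    return 200, update       # fingerprints match, the update goes through

product = {"id": 13456, "name": "Red Stapler", "price": "39.99"}
tag = etag_of(product)  # what the client got in the ETag header on GET

# Client A updates using the ETag from its GET: succeeds.
status, product = try_update(product, {**product, "price": "44.99"}, tag)
print(status)  # 200

# Client B still holds the stale ETag: its update is rejected.
status, _ = try_update(product, {**product, "price": "5.00"}, tag)
print(status)  # 412
```

Client B now re-GETs the resource, gets the fresh ETag, and can retry — no locks held anywhere.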
And one thing you should remember: whenever you do these things for machine-to-machine interactions, don't ever let the server maintain any sort of client state. It's fine to have, for instance, a temporary URL identifying search results or a shopping cart, with a unique URL — because you might have a machine that likes shopping, because it just got Amazon Prime. I just got Amazon Prime; I order five things a day from Amazon. So I could build some API bot thing that just orders stuff. But when you're a machine, cookies are stupid, because then your client has to keep track of them, and it's really not the way to go. Every request you make should be stateless. So if you want a bot that shops, you should have a POST request to /carts, and then the server creates a new shopping cart with a temporary URL, and in there you somehow put your items, so that one client could have 100 shopping carts at the same time. It's not a problem we typically deal with when we write a web app, right? In a web app we have /cart, and that's the shopping cart for the person with the current session. But for machines we can't do that, because we need parallelism, and we don't want client state on the server side in any way. We don't want clients to deal with cookies. So once we've done all this, we are at level two in the Richardson maturity model. We are using HTTP verbs. GET is both safe and idempotent, which means we can repeat it anytime and it doesn't have any side effects, other than visit tracking and stuff like this. POST is really not safe, because it creates something new; it has a side effect. Unsafe doesn't mean it blows something up; it just means it does something on the server. And it's not idempotent, because if we repeat it five times, then we have five new products, for instance — unless there is custom logic on the server that automatically detects duplicate products, right?
But that's not on the protocol level. PUT and DELETE are also unsafe, because they will delete something, right? Or update something. But we can repeat them 100 times. If we delete a product 100 times, it's still gone. If we update a product with the same details 100 times, it's still the same details. That's idempotency: you can repeat the operation and it doesn't give you a different result. Those slides, by the way, are already on my session page as PDFs. So if you — I saw some people taking photos — you don't have to. And we do this: we use URLs to identify resources, and we have HTTP status codes to indicate success or, well, failure. So we could send a 409 Conflict to say that price is too low for that category, because we're selling supercars and they can't be $5. Which is really sad — a $5 supercar would be very nice. Now, who has ever used the Twitter API? Good, so a bunch of people know it. It's not good, no. It's not very good. It's not good at all. That's the point of the next slides. So we're looking at Twitter's thing, and we don't even get to bother with REST, because they get HTTP so wrong that it really hurts. So in this case we're just looking at how well it uses HTTP the way you're supposed to use HTTP. In its current state: we make a GET request to show an individual status. We make a POST request to statuses/update. We destroy something by sending a DELETE request or a POST request — we can send both, which is stupid. We can make a GET request to get the list of retweets, and we can make a PUT request, to a completely different URL, to retweet something. So the first problem is it doesn't even allow the Accept header. It's not just that you can append .json or .xml to URLs — you cannot use the Accept header as a client to say: I want JSON, or I want HTML.
So you need to construct URLs differently for the same tweet depending on whether you want it back as HTML or JSON, which is not good, because, remember: it's one resource, and it has different representations. The only reason .json and .xml and .html and .pdf exist is so browsers can get various versions of the same content. That's the only reason: if we link somewhere, then the browser is going to say, well, I accept this HTML thing and a bunch of images and really everything, because it can download things. That's why we have .csv for the sales guy, so he can download his sales statistics from the backend. But machines don't need that. Machines can use the protocol the way it was intended and say: give me that thing as content type whatever. Then, the destroy here. This is really redundant. What we really want to do is DELETE this location, right? That would be enough. Why is "destroy" in the URL, and why is it in front of the ID? Why is it not statuses/12345/destroy? The "show" is also really redundant. Then we can get a bunch of retweets, and we can make a new retweet. Why are these URLs different? I don't know. These are all in some way cosmetic, in some way nitpicky — as is the question why we PUT to make a new retweet; that's really against the spec, but that's just me with my German OCD hat on. There is one big problem in this interface specification. It's in the second line. Who can tell me what the problem is with the update thing? First of all, what does update do? Guesses, please. It creates a new tweet. That's why it's called update. Very logical, right? But what is the problem there? It doesn't tell you? Well, it doesn't update anything. It makes a new tweet. Exactly. For whom does it make a new tweet? That's a really fucking stupid interface. Because I've seen this — I swear I've seen this before on the Twitter web interface.
Where Twitter's own support account had the ability that different employees posted. It would be a really, really cool feature for Twitter to have, because they could sell it to businesses: you have five support guys, all in charge of Twitter; they can post to your company Twitter account with their own credentials, and then we display "posted by" and then the employee's name or something. You want to be able to delegate access to your account. So Derick, because he trusts me so much, could give me access to his Twitter account, and then I could post on behalf of Derick, but using my login. Right now, that's not even possible, because I can't say: I want to post to the status collection of derickr, right? I can't do that. So it would be much nicer if the API allowed that. The reason I'm bringing this up is because I was thinking: many of you know this API, and I can tell you what's wrong with it and how to do it better. And it's really obvious how to do it better once you've seen it. Because you really just want a list of the statuses, right? To which you can also POST to create a new tweet. And then any individual status is addressable through a URL, and you can DELETE it, or PUT, if they ever want to allow updates in the future — like, for 60 seconds you can still edit it. You could say: okay, that's fine. And after 60 seconds the API just returns 405 Method Not Allowed. Done. And you have a bunch of retweets, and then you could POST to that, something like this, to create a new retweet. That would be really nice and really tidy. But instead they have the mess I showed you earlier. The mess in many cases is just cosmetic. In the case of the status update, it's a really big forward-compatibility problem: without adding a new endpoint, they can never add this delegated-access functionality. And that's not good, because then you need to maintain two things forever.
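The tidier design sketched above, written out as a hypothetical interface (these paths are illustrative, not Twitter's real endpoints):

```
GET    /statuses                  list tweets; /users/derickr/statuses for delegation
POST   /statuses                  create a tweet -> 201 Created + Location header
GET    /statuses/12345            one tweet; representation chosen via Accept
PUT    /statuses/12345            edit it (or 405 Method Not Allowed after 60 seconds)
DELETE /statuses/12345            delete it
GET    /statuses/12345/retweets   list retweets
POST   /statuses/12345/retweets   create a new retweet
```

Every operation is implied by the verb, every resource has exactly one URL, and delegated posting falls out naturally as a subordinate collection of another user.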
Well, not if you're Twitter — Twitter just switches off all the things that were even worse designed, but that's a whole different story. So, we've looked at this. Despite many people getting HTTP wrong, et cetera, the web is still this really massive thing, right? That we use every day, and that's growing and growing. But why is the web so successful? What is the reason the World Wide Web has become the first data exchange system that spans the whole planet? Because email is really not a data exchange system, right? Our planet, by the way, looks like this, in case you don't know. And everywhere there are computers, and they can communicate. Now you could say: well, I can communicate with email. Which is true, but if I send one half of this room an email with really important information, then the other half of this room has to walk over to that half and say: hey, what's your email address? Ah, my email address is this; I'll send you an email, and then maybe you can send me the really important information back. And the guy who's just in the bathroom is really out of luck, right? He has to call someone with a phone, and so on. So that's not data exchange; that's just like letters, right? So why has the web grown into something where you just post a link on Twitter, and other people can click the link and follow it, and then everyone has the information, even an hour later? If someone forgot to tell them, they can just look it up elsewhere. Well, the reason is the hyperlink. Because the hyperlink means we have no tight coupling. It makes the web loosely coupled by design. Before that we had Gopher, we had links and stuff, and there were persistent connections, et cetera. But the modern web has no notification infrastructure if a resource is deleted. If I link to some web page, and that web page somehow decides to change all its links, I don't know that. They don't have to send me a notice.
They don't even have to ask me: hey, can you change your links so I can change my URL? Because that wouldn't work, right? That doesn't scale. Instead, I just get a 404 Not Found. Because the web fundamentally embraces failure. It's in the design. They knew things would fail. Very early on, when HTML was conceived — you know how HTML has a rel attribute on links? There is also rev, which is the inverse relation. And those guys were eggheads at CERN. They do nuclear laser rocket stuff, right? Which is really cool. But they also like integrity of links, and they have sources, and they attribute everyone in their papers. So they could have said: every link has to have an inverse link. Whenever I add a link to a document, the other side has to add me as "referenced by". Imagine how ridiculous it would be if I had to call Google and say: hey guys, I want to link to you — can you link back to me from your start page? That's what we did in the 90s, right? We used link exchanges to go up in the rankings at Yahoo, which was still maintained by hand. Google doesn't care if I link to them. And Google doesn't care if I link to something that they want to change; they just change it. Which means that we can add more information, we can spawn new web servers, thousands of them every minute, and it doesn't add any more friction on an architectural level. There is more network I/O, maybe. And obviously we as humans have to find the content, but that's a problem for the search engines, right? I, as a website author, am not affected if someone in China boots up a new web server, creates a new website, and links to me. I don't care; I can tell from my web server logs, but that's it. So there are no limits to scalability. In 100 years, this approach is probably still going to be around. There are no persistent connections, there are no complicated protocols.
There are no rules on how to do something, because back in the early 90s, a few really clever people devised a protocol-centric system that still works to this day. Even on Mars, if you want to go to Mars. So, that's done. And I think I have 20 minutes left, is that correct? Part two: Hypermedia. Now this is the fun part I promised. We looked at the uniform interface earlier, remember? We identify resources through a URL. We have conceptually separate representations. And through these representations, we manipulate the resources as a complete thing: we send the whole product. That doesn't mean you can't optimize that. You could always say, well, I have a product and I add a sub-URL where I can get just the description. It's just a virtual thing. So you have /products/12345/description, and then I can PUT just the description to that URL to update it. You can do that at any point in time, if you say, well, products are five megabytes and I really don't want to send that over the wire. But you communicate through complete representations whenever you want to update something. And each of these representations is completely self-descriptive. That means a server receives it and knows what to do; it doesn't have to keep track of the requests you made earlier. From the URL, the method and the contents, it can determine what to do. And then, and this is the fun ingredient, we need hypermedia as the engine of application state, HATEOAS, a really ugly acronym. But it's the piece that was still missing in the puzzle up until now. Because the question is, how does a client even know what to do with a representation? How does it know to go to the product we just created? How does it go to the next page in a search result? How do you create a new product in a list of products? Where's the whole contract for the service? Remember SOAP? Who's used SOAP in the past? Okay, a bunch of people, or heard of it or roughly know what it is.
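The "virtual sub-resource" idea above can be sketched in a few lines. This is a hypothetical, in-memory stand-in for a server's PUT handler; the path layout `/products/<id>/description` and all names are invented for illustration:

```python
# Toy sketch of a PUT handler for a virtual sub-resource, as described in the
# talk: instead of sending a five-megabyte product to change one field, the
# client PUTs a complete representation of just the description sub-resource.
products = {"12345": {"name": "Red Stapler", "description": "old text"}}

def handle_put(path, body):
    """Idempotently replace the (sub-)resource identified by `path`."""
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[0] == "products" and parts[2] == "description":
        product = products.get(parts[1])
        if product is None:
            return 404  # unknown product
        product["description"] = body  # complete replacement of this resource
        return 200
    return 404  # no such resource

status = handle_put("/products/12345/description", "A very red stapler")
```

Because PUT replaces the whole (sub-)resource, repeating the same request leaves the same state, which is exactly the idempotence the uniform interface promises.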
You have a WSDL file. The WSDL file is not a contract for the service. All it says is: there is a getProduct operation, it requires these arguments, and it will give you this object as a response. What it does not tell you is that you have to call initialize before you can call execute, right? It doesn't tell you what the meaning of the returned product is, or that there is a category ID field and you can call getCategory using that ID. That's nowhere to be found in a machine-readable way. You have to write it down in documentation that nobody reads, and then people complain about how much the API sucks, right? And this thing, theoretically anyway, solves it. We want to use links to allow clients to discover new locations and representations. So in a product we have a bunch of links to the categories, right? And in order for the clients to understand that it is a link to a category, we have to somehow tag the link. We have to say this is a link of a certain type, which is a link relation. And because the clients don't need to know what the URLs for a category look like, they can change. I as a server can at any point in time change the URL of a category and nothing breaks, because the client just looks it up from the product object. If a client stores and bookmarks the category URLs, well, that's a problem, so maybe I should add redirects, because I want to be a nice server, right? And that ultimately abstracts the whole application workflow, and you can really easily change things without having to make 15 calls to your most important clients and scheduling a 4 a.m. Saturday switchover to a new system. And if you want, you can version the hypermedia type itself; we'll look at that in a moment. And ultimately that means you don't break clients if you update your implementation, if you add information, change the way links work, et cetera. Really well-known examples are XHTML and Atom. But they're not really good examples to learn from.
Atom is very generic, so it's hard to grasp things with it at first. And XHTML is really special, because while XHTML attaches meaning to elements, the meaning is "this is a paragraph" or "this is an image". But when you have an API for machines to consume, opening up your whole shop inventory, you want to say "this is a product" and "this is the price". So you could roll your own media type. Say we want to get an individual product, and we say: we accept application/vnd.com.acme.shop+xml. The application/something+xml syntax you may have seen before with XHTML. It basically means that this media type is based on application/xml, which means it has the serialization semantics of XML, right? It's angle-bracket tags that you can parse with a DOM or a SAX or a pull parser. And the vnd. prefix is the vendor tree, so I can just register that. Because if I call it application/david+xml, then the guys at IANA will be really angry and send me letters about how angry they are with me. So I don't want that. I get back an HTTP 200 OK, it says, yep, it's that content type, and it also already includes the information that you can get it at that location. And there's my product. In this case, I actually reuse XML namespaces for the link relations, because Atom also has a link element defined. So people can just look at that documentation; I don't even have to document it anymore. And there's a rel=payment link that the client can then be programmed against to go to the payment operation. And that's level three in the Richardson Maturity Model, right? And XML is really, really good for that. Most importantly, it has namespaces, and it has a document model that allows extensions. JSON is a bit more difficult, because it doesn't have these things, and because it really is not built for exchange of information between parties that are not necessarily on the same implementation level.
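The reuse of Atom's link element inside a custom media type can be sketched like this. The product document below is invented (there is no real application/vnd.com.acme.shop+xml); only the Atom namespace URI is real:

```python
import xml.etree.ElementTree as ET

# A made-up product representation in a vendor media type that borrows
# Atom's <link> element for its link relations, as described in the talk.
ATOM = "http://www.w3.org/2005/Atom"

doc = """<product xmlns:atom="http://www.w3.org/2005/Atom">
  <name>Red Stapler</name>
  <atom:link rel="payment" href="https://shop.example/orders/42/payment"/>
</product>"""

def link_by_rel(xml_text, rel):
    """Return the href of the first link carrying the given link relation."""
    root = ET.fromstring(xml_text)
    for link in root.iter(f"{{{ATOM}}}link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None

payment_url = link_by_rel(doc, "payment")
```

The client is programmed against the "payment" relation, not against the URL, so the server can move the payment resource at any time.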
It is a thing to serialize objects. So for machine-to-machine APIs, it's not always the best choice. If you compare these two, then here we already have a price which is an object, because we thought, hey, there needs to be a currency. But we can't evolve this now. We can't add a price without breaking things. In the XML version, I have bacon. First of all, XML's document model allows mixed content, so I can express my love of bacon without breaking the document structure. And I can also add a second price element without changing the document structure. Because if a client wants to retrieve the price, it takes the first price element on the product and returns its value. So if I add a second one, I won't break that client. In JSON, if my price is a float, I can't change the price to an object with amount and currency, and later to an array of objects. That breaks a client. The client expects a float in the price field. So with JSON, you have to put a lot of effort into designing something, and it ends up really complicated and hard to read. Just keep that in mind. In XML we can add new name elements, and we can all of a sudden give a name an attribute saying it's in such-and-such a language, and again it doesn't break the clients. So it's a lot more extensible, which is really important for systems such as Drupal, where a plugin could add attributes and elements from a different namespace, and it still all interoperates with a basic setup somewhere on the other side of the planet that syncs documents and doesn't have that plugin installed. Now, the one thing you should keep in mind: if you don't want to go through the trouble of your own media type and only using links to navigate, et cetera, then that's just HTTP and not a RESTful service, and that's totally fine, right? In some ways it can be totally fine, and in some ways it's the only way to do it.
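The evolvability difference can be demonstrated concretely. The documents below are invented examples; the point is that the XML access pattern ("take the first price element") survives a second price, while the JSON access pattern breaks when the field changes shape:

```python
import json
import xml.etree.ElementTree as ET

# Version 1 and an evolved version 2 of a hypothetical product document.
v1 = "<product><price currency='EUR'>9.99</price></product>"
v2 = ("<product><price currency='EUR'>9.99</price>"
      "<price currency='USD'>12.99</price></product>")

def first_price(xml_text):
    # Old clients take the first <price> element; adding a second is harmless.
    return float(ET.fromstring(xml_text).find("price").text)

xml_still_works = first_price(v1) == first_price(v2) == 9.99

# JSON: once the price field stops being a float, the old access pattern breaks.
old = json.loads('{"price": 9.99}')
new = json.loads('{"price": {"amount": 9.99, "currency": "EUR"}}')
old_client_ok = isinstance(old["price"], float)       # True: client gets a float
old_client_breaks = not isinstance(new["price"], float)  # True: shape changed
```

So with XML the server can add elements freely, while the JSON shape change would have to wait for a new media type version.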
Ever used CouchDB, anyone? Okay, a few people. Ever used Amazon S3? If you upload something to Amazon S3, does it know that it's an image of you and your dog? Does it care that the XML document describes a rocket ship? Well, maybe it does. I would care if it described a rocket ship, but S3 probably doesn't. S3 is so basic that you can't have hypermedia stuff there, because it just stores things, right? And so does CouchDB out of the box. So you don't want hypermedia there. It's fine if you don't want to do that or don't want to go through the hassle; just don't call it a REST API. We're going to look at two quick examples of good-ish hypermedia formats. First is the Lovefilm API, which those of you from the United States may not be familiar with; it's the same kind of thing as Netflix. We have a bunch of links at the top here, because we searched for the term "old" in the games catalog. It's a bit small, but you can also look at it later. We have links here with rel="next". Next has a universal meaning across all media types, right? IANA has a link relations list, and next is defined in there as the next thing in a sequence of things. So if I want to build a client that ingests all the titles with the term "old" from the Lovefilm catalog, I'll build a client that keeps following the next link until there is no more next link, done. I don't have to understand the total-results and items-per-page thing, compute whether I'm already at the total because I have so-and-so many items per page, and then construct query strings or slash-URLs to go to page three, four, five. I don't care, I just follow the next link. That's good, right? And there's also a self link. A self link is really important. Self does not mean the current document. Self just means: this is the item I'm enclosed in. In this case, that is the document element.
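The "just follow the next link" client can be sketched without any pagination arithmetic. The pages here are faked in a dictionary standing in for HTTP GETs; the URLs are invented:

```python
# Toy pagination client: follow rel="next" until it is absent, instead of
# computing page numbers from total-results and items-per-page.
PAGES = {
    "/search?term=old":        {"items": ["a", "b"], "next": "/search?term=old&page=2"},
    "/search?term=old&page=2": {"items": ["c"],      "next": None},
}

def fetch(url):
    """Stand-in for an HTTP GET returning a parsed hypermedia document."""
    return PAGES[url]

def all_items(start_url):
    items, url = [], start_url
    while url is not None:
        page = fetch(url)
        items.extend(page["items"])
        url = page["next"]  # the server alone controls what the URL looks like
    return items
```

If the server switches from `?page=2` to `/page/2` tomorrow, this client never notices, because it never constructs a URL itself.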
So that means this is the location where this search happens. And you can see here, they have an ID element inside the catalog title. That's actually not ideal. What this should be is a link rel="self" with an href, again, because it links to the catalog title, and the self link is contained in the catalog title. Then you can build code that goes through every page, extracts the catalog title element and passes it to some completely different code that you maybe didn't even write. But it's a complete, self-contained catalog title, and it has a link rel="self", so the code that handles it even knows the URL where this thing came from. So when it stores it in the database, it could set up a recurring process that every day goes to the URL, fetches it, sees if there are any changes, and updates your database. All by following links. What you can also see at the bottom is a link whose rel is some long URL ending in /reviews. "Reviews" or "synopsis" is not really something universal. The universal link relations that you can use anywhere are self, next, previous, alternate, the ones you know from HTML. Reviews is really something custom, so they prepended their own URL to avoid naming collisions. There is nothing at that URL, it's just an identifier. But they link to the reviews. Now, these fellas got bought by Amazon last year, or the year before. They could decide tomorrow that Amazon's product reviews for Knights of the Old Republic are much better than the ones on Lovefilm, and they can swap that link for one pointing to amazon.com/something-something. As long as it still returns the old familiar media type, the client doesn't break. The client doesn't even know that the implementers changed the link from lovefilm.com to amazon.com. No breakage. And that becomes really important when you run a Drupal site that reviews things, or is a catalog of things, and then you want to link to another Drupal install that has the reviews.
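The self-link-per-item pattern can be sketched as follows. The feed below is an invented, simplified stand-in for the Lovefilm-style document (element names are hypothetical, not the real API's):

```python
import xml.etree.ElementTree as ET

# Each catalog_title carries its own rel="self" link, so code that receives
# just that element still knows where the item lives on the web.
feed = """<search>
  <link rel="self" href="https://api.example/catalog?term=old"/>
  <catalog_title>
    <link rel="self" href="https://api.example/catalog/titles/1"/>
    <title>Knights of the Old Republic</title>
  </catalog_title>
</search>"""

def item_urls(xml_text):
    """Extract the canonical URL of every item in the feed via its self link."""
    root = ET.fromstring(xml_text)
    urls = []
    for item in root.iter("catalog_title"):
        for link in item.findall("link"):
            if link.get("rel") == "self":
                urls.append(link.get("href"))
    return urls
```

A sync process can store these URLs and re-fetch each item later to detect changes, exactly as described above, without ever knowing the server's URL scheme.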
And you can just do it. You can do it while the system is up and while there are a million mobile phones out there calling your API, if you design it right. They could still improve this. They use application/xml, which on a semantic level means there is no meaning attached to any of the elements in this document. It's just angle brackets. So they should be using application/vnd.com.lovefilm.api+xml or something, and then document that type, saying this element means this, that element means that. Then they could also use XML namespaces to say this is a search and this is a catalog title, and they're in different namespaces; that makes code reuse easier, but that's a minor thing. What they should also do is, for every link, not just say where it points, what the href and the rel are. They should also say what the content type behind that link is. Because then you could link to the reviews in the Lovefilm API format and in the Amazon format and in the Metacritic format and as HTML for human readers. And with XML namespaces, it's just easier: if you have a client and you just throw it this catalog title, the code can say, ah, the element is called catalog title, it's from the namespace I was programmed against, so everything's fine, I can parse this. If not, I just abort right away, maybe close the network connection. Another good example is the Huddle API. It's also well documented, on Google Code. They also have a lot of links, and here you can see we have a bunch of actor elements that each have a self link identifying the actor. They do make a couple of mistakes here: they use thumb or avatar as rels, but technically they're not supposed to, because these are not in the IANA list of link relations, so they should have their own prefix, like the reviews link earlier. And they use one global XML schema for everything, which again makes code reuse and delegation in clients a bit more difficult.
Now, I talked about versioning earlier, that you can version your media type if you want to. You know, Twitter has /1 and /2 in their URLs, but that's bad, because if I remember a link to a tweet and it has /1 in it, I can't, in five years, take that link from my database and fetch it with version two of the API, because I would have to know to replace /1 with /2, right? And maybe Twitter ultimately turns 180 degrees and opens up their API, instead of closing it down like I think they're doing right now. So why shouldn't I have this URL, and then this one? First of all, from a pure protocol perspective, those are different resources, which they really are not. They're the same resource with different representations, version one and two. And we can't bookmark this and evolve. We cannot say, ah, I remember that URL, and a year later I as a client have been updated to version two of the protocol and I want to fetch it as the .v2+xml media type. And here's one important thing. You can evolve the meaning, and you can have breaking changes, through these media type versions. So if we fetch products, in this case we have a list of products and we get the red stapler, right? Which is available right now. And in a few years, we could make another call and say we want it as my-service+xml, and maybe it wouldn't include the red stapler anymore, because by then it's not in stock. Because before, if it was not in stock, it wasn't shown to me, since I as the client think: if it's there, then I can sell it, right? To my robot friends. So I will sell it. But maybe there's a version two of the protocol, where I say, well, I understand v2 now, and version two actually says: you know, the red stapler is not really in stock right now, but I'll show it to you anyway, because version two of the media type defines a new element: availability. This is an example of where you have a breaking change.
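The media-type versioning just described can be sketched as a content-negotiation dispatcher. Everything here is hypothetical: the media type names, the product, and the simplified server function are invented to illustrate the mechanism, not any real API:

```python
# One resource URL, two representations, selected by the Accept header.
# v1 clients assume every listed product is sellable, so out-of-stock items
# are hidden from them; v2 defines an availability element and can show them.
PRODUCT = {"name": "Red Stapler", "in_stock": False}

def get_products(accept):
    """Stand-in for GET /products with content negotiation on the media type."""
    if accept == "application/vnd.com.acme.shop.v2+xml":
        return [dict(PRODUCT, availability="out_of_stock")]
    if accept == "application/vnd.com.acme.shop+xml":
        return [p for p in [PRODUCT] if p["in_stock"]]  # hide unsellable items
    return None  # a real server would answer 406 Not Acceptable

v1_view = get_products("application/vnd.com.acme.shop+xml")
v2_view = get_products("application/vnd.com.acme.shop.v2+xml")
```

The URL never changes, so a link bookmarked years ago can still be fetched with whichever version of the media type the client understands today.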
You cannot, with an existing API that shows products, all of a sudden say: now we have availability, so we're going to add an availability element to tell clients that maybe they shouldn't sell a product. You can't ever do that, because you already have 100 clients out there and they won't be updated to understand this new element, right? And then they will sell something that you don't have in stock, which might be bad. So that's one case where you need to add a new version of the thing, because of the interface, because it's a breaking change. If you just want to add a new description, or a link to reviews that you didn't have before, that's no problem, right? Clients can safely ignore that. Availability is very fundamental; it breaks things. So we have a version two. And I still know the URL of the list of products, or the individual product, so I can call it a few years later with version two of the interface, and boom, I get it. If the URL had /v1 in there, I could never upgrade the URLs I already have, right? And, oh, hold on. The next slide has a little typo; it used to say imagine every install of phpBB or TYPO3 had an API, but let's say it's phpBB or Drupal, right? And you know phpBB installs, they are at different locations, and so are Drupal installations: subdomains, subdirectories. Imagine that I, as a client, need to regex around in the URL, because is it sharksforum.org, or is it forums.sharksforum.org, and do they have /community/api or just /api, and I have to do regexing with v1 and v2, right? That would be a fail. And now imagine something even more important; that was just an inconvenience to client programmers. Imagine the phpBB guys wrote an API and it became so successful that vBulletin wants to copy the complete protocol. They could, right? If it's an open specification, they say: we support that same thing.
Well, then the vBulletin guys would have to have the exact same URL structure as the phpBB guys, which is really not good for interoperability, right? They would also have to have /v1 in their URLs and /api at the beginning. But maybe they don't want that. All they want is one well-known index URL, maybe /api or just /. And then the vBulletin guys would say: just call the forum URL and send Accept: application/vnd.org.phpbb+xml, because their API is so great, we support the same thing. And then you have interop, right? Then you can have two different CMSs agreeing on one media type and having true interoperability. They can talk to one another, and then they could also say, hey, let's have an iPhone app that already works for Drupal, and it could so easily be updated to work with TYPO3. Because a bunch of smart folks in this room came up with such a great interface to manage content that the TYPO3 guys said, well, that really makes sense, so let's just implement the same thing. And the client that I wrote for my iPhone doesn't even have to know what the URLs look like in a Drupal system, because the system tells me. All I need to talk to a TYPO3 installation is the entry URL, and that has a bunch of links to search and maybe other things, done. With URL-based versioning and fixed URLs, you can't have that, you can't have interop. And no interop is bad in many cases. For us it's bad, for us open source fellows. For Twitter, no interop is really good. Twitter wants a silo. Twitter doesn't want a great hypermedia interface that app.net can copy today, so that every Twitter client magically becomes app.net capable. That's their nightmare. But I'm hoping that in the future, this could also be driven by governments and by non-profits, et cetera, and by the open source community, and we'll have many strong standards that are built with the explicit purpose of interoperability.
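The well-known-entry-point idea can be sketched like this. The two "servers" below are fake in-memory documents standing in for two independent implementations of the same hypothetical media type; all URLs are invented:

```python
# Two different forum implementations answer the same entry URL with the same
# media type. The client hard-codes only the entry URL and link relations,
# never the URL structure, so the same code works against both.
SERVERS = {
    "https://forum-a.example/": {"links": {"search": "https://forum-a.example/api/search"}},
    "https://forum-b.example/": {"links": {"search": "https://forum-b.example/s"}},
}

def resolve(entry_url, rel):
    """Stand-in for GET entry_url with the shared Accept type, then follow rel."""
    doc = SERVERS[entry_url]
    return doc["links"].get(rel)

search_a = resolve("https://forum-a.example/", "search")
search_b = resolve("https://forum-b.example/", "search")
```

The two servers expose completely different URL layouts, yet one client handles both; that is the interop that /v1-style fixed URLs would make impossible.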
And then all of a sudden, writing clients and servers will be really, really easy. But that aside, you might be wondering why this is really awesome. Because REST has a bunch of merits. I'm almost done, don't worry. You can evolve it really easily, right? You can add a new element, a new description tag, without breaking backwards compatibility. If you do that in SOAP, then you need to update your WSDL file, because the XML schema says the product object now has a description field, and you need to regenerate your stuff. And then all the clients out there break, because some highly paid Java engineer needs to regenerate the client with his Apache Axis thing, and hold on a moment, how do you even coordinate the upgrade for that? It's a nightmare, right? So it's easy to evolve. It's also really easy to learn. Once you have such a RESTful system, where everything is a link, even the list of products is linked from the entry point of the API. Then for any API you can say: here's the format description, but all the interactions, like going from a product to a category, you can discover just by pointing your browser at that location. It will return you a bunch of links, and you say, ah, that's the link to a product. Oh, okay, and it has a link to a category, fine. Easy to learn for people. And it really scales well with the number of servers, and especially with the number of people that implement it. You can have alternative implementations, and a client can support them all. And also, you get all the nice features of HTTP if you implement REST over HTTP. You don't have to; it just happens to be the case that REST maps really well to HTTP semantics. But with HTTP, you get all the cool things. You get authentication, transport layer security, conditional requests, which I explained earlier with the ETags, right? You already have load balancing and caching figured out.
Content negotiation: you can say, give me that as an image. And that is exciting stuff. This is what excites us as developers, mostly. Because you might say, well, hold on. If I just do an HTTP thing, a level-two Richardson Maturity Model service, then I get all of the transport layer security and content negotiation features, right? That does the job. And REST is really just about evolving things; maybe in five years, when we have world peace, we'll all use the same APIs and media types. Is there not something more to REST, another real killer feature? Is there not a really good reason to use it? And the answer is: no. Roy Fielding himself says, and these are links in the slides, so you can click them in the PDF: REST is software design on the scale of decades. It's for independent evolution and longevity. And it's often directly opposed to short-term efficiency. So you need to make the trade-off; your manager needs to make the decision. Do we want something quick out the door? Or will we very soon have the problem that lots of third-party developers program against our thing, and then managing even the first breaking update is going to be such a nightmare, because we want to change behavior and we would break them all, so we have to maintain all the versions forever? Then maybe it's worth it. The whole thing is designed to be on the scale of years, so that two years down the road, you don't have to add a new API endpoint, like Twitter is doing right now for some operations, because they screwed up the initial design. And they don't care, they still don't care, because they're so big that people have to implement whatever they do. But he points out another really important thing: many people just don't care. In Europe it's a little bit different, but in the US, chances are that you implement the API, and two or three years later you already have a different job. So why would you care?
So keep that in mind: it's not always worth it. But in particular for open source projects, it's really worthwhile, because interoperability ultimately only extends the potential audience of clients and other people you want to interact with. Some further reading. "How I Explained REST to My Wife" is a discussion about the uniform interface. This guy, Ryan Tomayko, he works for GitHub now. He's in bed reading about Fielding, I think, and his wife goes, who's that guy? And then they talk about idempotent operations with coffee cups and stuff. It's a very accessible overview; it doesn't really touch hypermedia, more the uniform interface. "How to GET a Cup of Coffee" is a very interesting tutorial. It models the whole Starbucks ordering flow, the whole process of getting coffee from Starbucks, as a RESTful system. It's really useful, for a reason I'll show you in the next slide. Then there is obviously Fielding's REST dissertation. I would recommend reading that early in the morning, because if you try in the evening, you'll fall asleep. At least I did. But it's an interesting read. Read through the last few chapters, chapter five or wherever it is; that's where it gets really interesting. But he also tells some stories of the olden days in the trenches, when they were designing HTTP and fighting dinosaurs and stuff. And then there are a bunch of books. The lower two I can't recommend all that much, because, well, Subbu in the cookbook says you should use URLs for versioning and stuff, and that's just wrong, don't listen to him. But REST in Practice is really good, for two reasons. First, REST in Practice uses the Starbucks ordering flow from "How to GET a Cup of Coffee", including the media type they define in that tutorial, in the book.
So you already have interoperability between something that's already on the web and a book, and you can follow along really easily. And there are existing clients for the Starbucks example. Also, they take you from the very beginning, where you just construct some XML and POST it somewhere, and then they show what's wrong with that. Then they say, well, you should have an individual URL for every individual order. From there they go to HTTP, and from there to hypermedia. And at every step you understand why the things they did previously had weaknesses. So it's a very, very good book. That would be the end. We don't have time for questions, but you can still ask me here or outside. In the meantime, thank you. This is a QR code that takes you to my session page, where you can rate it. Or send me an email if you have any questions. Maybe I can also do an interview that I can post to the issue thread, so we can actually have a bunch of questions there. We have one major question about versioning APIs. If we're exposing every Drupal entity through an API, well, one of Drupal's big things is that every content type is totally configurable. You can add fields, you can add properties, maybe add new things. So in essence, if all of your entities are exposed through the API, you're changing your API version every single time you add a field. Well, but you just add a field. Why would that break anything? A client or a server that doesn't have that field definition can just ignore the field. There's a really important thing for Drupal, though: you can also delete a field, or change it, or make it a multiple field. You can do all sorts of things. Right. Well, with XML, a multiple field is easy, right?
Because you can have one title element, and then two title elements, and that doesn't break things. With JSON, that's really difficult. So you should maybe start with XML first. Or start with JSON-LD. But that's the problem: you have one title, then two titles, and boom, stuff breaks. Right. And there is a much more critical problem you're facing, which is that I as a client, because I'm some sort of synchronization client, download one document from one server with these fields, and I push it to another server that doesn't have those fields defined.