Welcome. It's 3 p.m., and now on stage is Angel Remboy, telling us about RESTful APIs in the gaming industry.

Hello everyone. Can you hear me? So, is that statement accurate? Not really, and I'm going to expand on that in just a moment. My name is Angel Remboy and I work for the most awesome company in the world, Demoware. Of course, Demoware is based in Ireland, and when I moved there I had to adapt fast in order to survive in those foreign lands. This is my solution.

Demoware is an Activision Blizzard subsidiary. We have offices in Dublin, Vancouver and Shanghai. We're around 200 people, but we like to keep a start-up-ish feel to the company. What do we do? Well, it can basically be summed up in a line straight from one of our recruiting books: we enable gamers to find one another and then shoot each other in the face, and we're pretty good at it. What we actually do is provide backend services for Activision game studios: leaderboards, matchmaking, anti-cheat, account management and more. We have 70-plus services that serve our past games and also upcoming games like Call of Duty: Advanced Warfare and Destiny. And of course we're hiring, so if you're interested in what I'm about to show you, please come talk to me afterwards.

So, back to our previous slide. Is this statement accurate? Well, as you can see from more of our graphs, the user count doesn't come even close to zero, and with over a hundred billion API calls per month, that's an API developer's dreamland. And these guys get really excited during launch time; I heard HR offices around the world experienced a spike in sick-day requests in November. I want to tell you that this is just a coincidence. It's not us.

Talk overview: I'm going to touch on topics ranging from API design to authentication and authorization. So let's get to it. Why REST? First, interoperability.
Our APIs must be available to game clients, websites and companion apps, and by using the right protocol, HTTP in our case, and the REST principles, we can achieve the level of interoperability we need. Second of all, scalability. Of all the architectures I've come across, REST looks like the only one that's truly web scale. Basically, you can look at the web as a huge REST API, and your browser is the client that consumes it. You have your entry point, your bookmark; you use HTTP verbs to talk to web pages; you have links from one page to another, from one resource to another; you have URIs that define resources; and your browser interprets those resources based on hypertext and metadata. So I think it's safe to say that applying the RESTful architectural style to your API design will make your services easier to scale in the long run.

Anyone who has ever worked on an API has probably heard of Roy Fielding's thesis. It's an interesting read, and when you're done with it you're left astonished by the elegant concepts outlined in it. You want to adhere to those concepts, and you will probably succeed, but then you realize that, whatever you do, your clients will misuse your otherwise perfect creation. In the gaming industry, things get more complicated. You have custom protocols, because everyone has to do their own stuff. You have mandatory libraries and SDKs. You have multiple languages and platforms: C#, .NET, Java, you name it. And documentation goes from okay to non-existent. Most game developers have little to no contact with web services for most of their careers, and some user education is sometimes needed, even for simple things like what JSON is, or what HTTP status codes mean, or how to use query strings. Only in recent years has the gaming industry started to embrace HTTP and REST-like services that make life easier for us.
Having said all that, is our API RESTful? I want to say yes, but even I have to admit that we don't adhere to all the principles, whether because of business constraints, legacy logic or backwards compatibility. The important thing is that we're moving in that direction, and we are encouraging our clients to follow suit, one step at a time.

So, design-wise, we use the GET, POST, PUT and DELETE verbs for API CRUD, HTTP as the communication protocol and JSON for representation, and every time we work on an API design we try to be pragmatic about it: good enough is better than perfect. Other things we do design-wise: we have a version in the URL, but it's mostly semantic. We tend not to break backwards compatibility; it's mostly to tell the client that we have this set of features that is only available in version 2, for example. We also use camelCase in JSON and query strings, which isn't very Pythonic, so we have a mapping to underscores for that. We standardize dates, and also links to other resources, and of course it's all human-readable.

Just using REST is not enough to run your services at scale; you need to have the right processes and tools in place, so I'm going to walk you through how we do it at Demoware. We use Scrum and Kanban, depending on what works for the team or the release cycle. We also have automatic builds that test everything that's merged to master, including tests against our other systems, so we know everything is all right.

Demoware services use a lot of different tech, so I'm going to focus only on what we use for APIs: Python 2.7 and Django 1.6, and we also use MySQL. At DjangoCon this year, some people were surprised that we still use MySQL in production, and it scales really well for us. At one point there was even a show of hands, and about 90 to 95% of the people were using PostgreSQL. So I'm curious: here at EuroPython, how many people use MySQL? Can you raise your hands, please? PostgreSQL in production? About the same.
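Coming back to the camelCase-to-underscore mapping mentioned a moment ago: the idea is simple enough to sketch in a few lines. This is an illustrative sketch, not Demoware's actual code; the function names and payload keys are invented.

```python
import re

def camel_to_snake(name):
    # Insert an underscore before each interior capital, then lowercase:
    # "playerScore" -> "player_score"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def snake_to_camel(name):
    # "player_score" -> "playerScore"
    head, *rest = name.split("_")
    return head + "".join(part.title() for part in rest)

def snakeify(payload):
    # Recursively rename the keys of a decoded JSON payload so the
    # Python side works with snake_case while the wire stays camelCase.
    if isinstance(payload, dict):
        return {camel_to_snake(k): snakeify(v) for k, v in payload.items()}
    if isinstance(payload, list):
        return [snakeify(item) for item in payload]
    return payload
```

The inverse direction (applying `snake_to_camel` on the way out) would be symmetrical.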
MongoDB? So, our reasoning is that we couldn't find enough pros for the other SQL databases to warrant such a big migration of our infrastructure, and MySQL has seen some really good development in the last few years, so it works pretty well for us. We also use CentOS 6, Apache and mod_wsgi. As you can see, we don't use anything flashy: our layout is simple, reliable and easily scalable. Our projects are built with sharding in mind, as are our dev environments, and our builds run the unit tests, acceptance tests and all the other tests against sharded environments. Our layout also saves us from a lot of tech-stack-related issues most of the time, and lets us focus on real business problems. Reliability is something we take really seriously, as you can see compared to other big game launches in the past few years.

For code we use Git and GitHub Enterprise. We use feature branches, master is always deployable, and we do pull requests for team review. When all the features are in and the builds pass, we bag it, tag it and ship it. We use RPMs for our packaging, we individually package our dependencies, and we run our own repo for them.

I'm going to talk a bit about schema migrations. Schema migrations are anything but straightforward when you're working with huge amounts of data. There comes a time when you need to do a schema change but cannot afford any downtime, and when you have lots and lots of records, an ALTER can mean a table lock. And when you have a table lock, you're going to have a bad time. For this we use Percona Toolkit, a clever set of scripts created by Percona to deal with these kinds of situations. We also use the Percona fork of MySQL in our production environment, but the tool should work for pretty much any MySQL variant. So what does the tool do behind the scenes? It creates a copy of the original table with the alteration already applied, then sets up triggers for INSERT, UPDATE and DELETE on the old table that mirror those operations to the new table.
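The trigger-and-copy approach can be illustrated end to end in miniature. This toy uses SQLite in place of MySQL and skips everything that makes the real tool hard (replication lag, chunk sizing, locking); all table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Original table with existing data.
c.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
c.executemany("INSERT INTO users VALUES (?, ?)",
              [(i, "u%d" % i) for i in range(1, 101)])

# 1. Create a copy of the table with the alteration already applied.
c.execute("CREATE TABLE users_new "
          "(id INTEGER PRIMARY KEY, name TEXT, last_login TEXT)")

# 2. Triggers mirror writes on the old table into the new one.
c.execute("""CREATE TRIGGER mirror_ins AFTER INSERT ON users
             BEGIN
               INSERT OR REPLACE INTO users_new (id, name)
               VALUES (NEW.id, NEW.name);
             END""")
c.execute("""CREATE TRIGGER mirror_upd AFTER UPDATE ON users
             BEGIN
               UPDATE users_new SET name = NEW.name WHERE id = NEW.id;
             END""")
c.execute("""CREATE TRIGGER mirror_del AFTER DELETE ON users
             BEGIN
               DELETE FROM users_new WHERE id = OLD.id;
             END""")

# 3. Copy existing rows over in small batches (the real tool also
#    watches slave lag here and adjusts or pauses the copy).
for start in range(1, 101, 25):
    c.execute("""INSERT OR REPLACE INTO users_new (id, name)
                 SELECT id, name FROM users WHERE id BETWEEN ? AND ?""",
              (start, start + 24))
    conn.commit()

# A write arriving mid-copy is kept in sync by the triggers.
c.execute("INSERT INTO users VALUES (101, 'late_arrival')")

# 4. Swap the tables; dropping the old table also drops its triggers.
c.execute("DROP TABLE users")
c.execute("ALTER TABLE users_new RENAME TO users")
```

After the swap, `users` has the new column and every row, including the one inserted while the copy was running.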
So everything stays in sync: for every operation, you have consistent data. Then it copies the data over in batches, monitoring the slave lag in the meantime and adjusting the batch size, or pausing the operation to let the slave get back in sync with the master. And at the end it just renames the tables, which is an operation that takes fractions of a second. The only downside of this process is that it uses a lot of space, as it duplicates all the data in the table. But other than that, we couldn't find anything in our load tests that could deter us from using this tool.

This is what our configuration file looks like. So, why YAML? First of all, it's cross-project: we're not a Python-only shop, and we need to be consistent with configuration files across the board. Also, YAML is just as diffable and readable as Python code, and it has to be reviewed by people who don't know Python. Also, validation: we dynamically build the Django settings module at runtime from the loaded YAML file, and we check for missing configs and invalid values at that point. This way, if something is not right, we know before the actual setting is used; we know when it's loaded, not when it's used in the app. This is a simple example of what our validation library does: it checks the type of a valid entry, and it has defaults and descriptions, just to give you an idea.

On the subject of validation: to validate data sent by our clients, we use JSON Schema, and it's pretty awesome. If you haven't used it, I really recommend it. As you can see from this example, you can do all kinds of fun stuff with it. You can have restrictions by type, minimum and maximum values for integers, pattern matching; you can also have minimum and maximum lengths for strings, and required fields. You can also return different errors depending on what kind of exception is encountered. The cool part is that in your Python code you can see all the primary validation related to an endpoint or a resource in one place.
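To give a concrete feel for the kinds of checks just listed, here is a toy validator implementing a small subset of JSON Schema keywords (type, minimum/maximum, pattern, maxLength, required). In real code you would use the jsonschema package rather than rolling your own; the schema and field names below are invented.

```python
import re

def validate(instance, schema):
    """Check `instance` against a tiny subset of JSON Schema.
    Returns a list of error strings; an empty list means valid."""
    errors = []
    expected = schema.get("type")
    type_map = {"object": dict, "string": str, "integer": int}
    if expected and not isinstance(instance, type_map[expected]):
        return ["expected type %s" % expected]
    if expected == "object":
        for field in schema.get("required", []):
            if field not in instance:
                errors.append("missing required field: %s" % field)
        for field, subschema in schema.get("properties", {}).items():
            if field in instance:
                errors.extend("%s: %s" % (field, e)
                              for e in validate(instance[field], subschema))
    elif expected == "integer":
        if "minimum" in schema and instance < schema["minimum"]:
            errors.append("below minimum %s" % schema["minimum"])
        if "maximum" in schema and instance > schema["maximum"]:
            errors.append("above maximum %s" % schema["maximum"])
    elif expected == "string":
        if "maxLength" in schema and len(instance) > schema["maxLength"]:
            errors.append("longer than %s chars" % schema["maxLength"])
        if "pattern" in schema and not re.match(schema["pattern"], instance):
            errors.append("does not match %s" % schema["pattern"])
    return errors

# A hypothetical schema for a player-creation endpoint.
PLAYER_SCHEMA = {
    "type": "object",
    "required": ["name", "level"],
    "properties": {
        "name": {"type": "string", "maxLength": 32,
                 "pattern": r"^[A-Za-z0-9_]+$"},
        "level": {"type": "integer", "minimum": 1, "maximum": 100},
    },
}
```

The point of the pattern is exactly what the talk describes: everything an endpoint accepts is declared in one place, as data.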
You don't have to jump through hoops or dig around. Just one thing to mention: we don't use this for heavy business-logic validation; that can get messy and is hard to maintain in the long run.

Error handling is where Django middleware shines. In our views, you just raise the exception, and the middleware catches it in its process_exception method. Then you can wrap it in a nice HTTP response and send it back to the client, like this. You might have to do some adjustment depending on what your clients need or what you need, but basically we don't do anything fancy there. You can also see we take a structured approach to our errors and our error codes; that's where most of the design work goes: how to trace back from those errors to the actual thing that happened.

For logging, syslog relays our logs to an aggregator built on open source tools, and in the end you get something like what you see here. We use Logstash, Elasticsearch, and Kibana for the frontend. Things are somewhat messy in this slide, but once in a while you can see exactly when something happened and you need to act. Of course, we don't spend all day looking at these graphs; we have alerts that do that for us. But looking at them, we can get an overview of the frequency of the errors and also the timeframe when they happened: for example a deployment, an event, a promotion for a game, something like that. We tend to prioritize the needs of production over development when we format our logs. The logs need to be concise, complete, and contain context. Think about it this way: if you put a log line in a bug report, would you understand immediately, at first glance, where the issue occurred and why? We try to be guided by those principles. A brief example of logging: as you can see, you have the level of the error, the project, the app, so you can search pretty easily by keywords across all the logs you get from all our services.
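The middleware-based error handling described above can be sketched without pulling in Django itself. The class and field names here are invented, and a tuple stands in for `django.http.HttpResponse`; in a real project Django calls `process_exception` for you when a view raises.

```python
import json

class APIError(Exception):
    """Base class for errors a view is allowed to raise."""
    status = 500
    code = "internal_error"

class NotFoundError(APIError):
    status = 404
    code = "resource_not_found"

class ErrorMiddleware(object):
    def process_exception(self, request, exception):
        # Only wrap errors we know about; returning None lets
        # everything else propagate to Django's default handling.
        if not isinstance(exception, APIError):
            return None
        body = json.dumps({"error": {"code": exception.code,
                                     "message": str(exception)}})
        return (exception.status, body)  # stand-in for HttpResponse

# In a view you would simply: raise NotFoundError("no such player")
```

The structured `code` field is what lets a client, or a support engineer reading a log, trace back from the error to the actual thing that happened.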
Besides logging, we use metrics, lots of metrics. All our metrics are sent to an aggregator that verifies and sorts them, then sends them to Graphite, and finally you get the visual image you see here. So what's the difference between logging and metrics? With metrics you get different information than with logging: how many mails were sent or failed, SQL query times, slave lag over time, app response times, user creation over time, user deletion over time. And you notice anomalies like the one in the middle right away.

Again, Django middleware to the rescue. In this simple example, we record the start time on the request, and then on the response we take the difference and send the request time to our metrics aggregator. It's pretty straightforward, no rocket science there. We can also add a process_exception method here, and then you have metrics for your exceptions and can see metrics for different types of exceptions over time.

For authentication and authorization, we use JSON Web Tokens. JSON Web Tokens contain claims that a system can use to access resources it owns. We use two types of JSON Web Tokens. We have JSON Web Signature objects, which are claims that are base64-encoded but carry a signature with them for authentication. And we also use JSON Web Encryption objects, which are claims that are fully encrypted with a public/private key pair. For that, we use a JOSE framework created by us and open-sourced. You can find it on GitHub and also pip install it right now; it's called jose. I'm going to walk you through how it can be used. As you can see, you have your claims there: the issuer, the expiration time and the subject, and you also have the key. This is for JSON Web Signature: you sign the claims. You can use either a symmetric or an asymmetric algorithm for signing; in this case, it's symmetric. This is the object you get: the payload and the signature.
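The sign-then-serialize flow just described is what a JOSE library wraps for you. For the symmetric case (HMAC-SHA256, i.e. the HS256 algorithm), the mechanics can be sketched with the standard library alone; the claims and secret below are invented, and a real library adds header validation, algorithm checks and expiry handling on top of this.

```python
import base64
import hashlib
import hmac
import json

def b64url(data):
    # JWS uses unpadded, URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def jws_sign(claims, secret):
    """Produce a compact-serialized JWS: header.payload.signature"""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signature = b64url(hmac.new(secret, header + b"." + payload,
                                hashlib.sha256).digest())
    return b".".join([header, payload, signature])

def jws_verify(token, secret):
    """Check the signature and return the decoded claims."""
    header, payload, signature = token.split(b".")
    expected = b64url(hmac.new(secret, header + b"." + payload,
                               hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    # Restore the stripped padding before decoding.
    return json.loads(base64.urlsafe_b64decode(
        payload + b"=" * (-len(payload) % 4)))
```

Note that the claims in a JWS are only signed, not hidden: anyone can base64-decode the payload. Hiding the claims is what the JWE variant, described next, is for.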
Then you serialize and compact it and send it to the client, and the client, with just one line of code, verifies it and knows it's okay. For JSON Web Encryption objects, it's pretty much the same, with the difference that we use a private/public key pair. We encrypt it with the public key, you get a slightly bigger object, you serialize it, and at the client side you decrypt it with the private key, and there you have it.

So, in summary: REST is awesome, so use as many of its concepts as you can, but be pragmatic in your approach. Error logging and metrics monitoring are what make scaling survivable. And we're hiring. And that's it. Thank you.

Thank you. We have a good five minutes for Q&A. There's one microphone, and I'm on this side.

Hi, thank you for your talk. Can you talk a little bit more about the last two bits, JSON Web Signatures and JSON Web Encryption? In the sense of: how do you use them, what research did you do behind it, do you believe this is secure, and what are the constraints?

We did quite some research, and we think it's pretty secure for our use case; for our use case, I guess it's secure enough. We're going to release a new version of jose in the near future, which should have more context and better security, so this example will probably become a bit outdated by then. But from our research, and as we plan to make some APIs more open in the future, it looks like it's pretty secure. We can expand on this subject afterwards; security is not an easy subject to approach. This is how we do it, and we can talk a lot about it.

One quick remark, am I right? JSON Web Signatures and encryption are open standards; you did not invent JSON Web Tokens?

Yes, these are based on open standards. We wrote the library, but we did not invent JSON Web Tokens.

Hi, thank you for your talk.
You decided to implement your own framework rather than using another framework such as Django REST framework?

For the API, you mean? Yes. Well, we use Django, which is a framework in itself, I think it's safe to say. At the time we started developing our own framework, we didn't have a lot of viable choices. Now you have Django REST framework and Tastypie, which are pretty mature; back then I think there was only Piston, which is not even maintained anymore, and it didn't suit our needs at that time. That made us do our own thing.

More questions?

Do you use anything for measuring response time or throughput of a new version of your software?

You mean like load testing? Yes, we do a lot of load testing. I've seen some companies that, for example, when they ship a new version, put part of their cluster, or one node of their cluster, into production and see how it behaves directly in production. But we tend not to do that. We load test beforehand, with real machines, a production-like environment and production-like requests.

So a proper staging environment where you load test?

Yes, and it's usually an exact replica of the production environment. Thank you very much again.