Today, we are going to talk about GraphQL. First, I want to see if you know what GraphQL is. Can I see a show of hands? I think you all know. Perfect. Have any of you used GraphQL before? Very few, but that's still great to hear. This talk is about the tools you might need when you want to run GraphQL in production, and I'm going to mention why that is a bit different. So let's get started.

But before I go on, I'd like to introduce myself. Hello, it's me, Arthur. I'm currently working for Kraken Tech as a lead backend engineer, and I'm leading the APIs team, which makes these tools. These are the tools our team has been building, and I'd like to present them to you. If you are looking for a new opportunity, or if you'd like to join Kraken, we have offices in, I think, nearly 10 countries now, or cities, cities and countries. We have a booth outside, so please come and say hi. We'd love to chat to you.

So let's jump into these tools. Why do we need them? I started using GraphQL about six years ago, and back then it was a bit magic. Just a quick introduction: unlike REST APIs, GraphQL is a query language, meaning that when you make a request to a GraphQL API you just tell the server what you want. The server then reads your query, a bit like SQL, and gives you back the results. This design is great for many reasons: it lowers the network costs, it's faster, it's easier to build, and it has tons of other benefits. I'm not going to go over them; that's out of scope. But due to this design, you have to treat things differently. I'm going to present these as problems, and I'm going to show you the tools we've built at Kraken to address them. Over the last five or six years there has been a growing community of GraphQL Python libraries, and some of the newer libraries might eventually decide to address these problems and offer built-in tools that you could just install and use.
But currently, that's not the case for all of them, although I know some of these tools are already being developed in the newer GraphQL libraries, so please check them out.

So let's start with the first problem: someone requests too much information. In a REST API, you have an endpoint, and you just hit the endpoint. That's requesting one thing. But in GraphQL, you can request 1,000 things, right? Because no one is really stopping you and saying, OK, you have requested too much. How do we solve this problem? This was one of the first things I did when I joined the company: checking the query complexity in a simple way.

For example, here you see we are requesting a viewer, and let's say this is GitHub, OK? This was inspired by GitHub. You are getting the repositories as nodes, and you are asking for the name. For each of those 50 repositories, you are asking for 10 issues, and you are also asking for the title and body HTML of each. When you look at this, it's a big query. We said we want to look at this query as a string, extract things out of it, and compute a number for how complex the query is, in a simple way. We also wanted to take into account the nesting and the parent and child objects, because that is how the number of objects you are getting grows, and with it the pressure you may be putting on the network, the database, or your application.

So this is what we did. We created this query complexity tool that, given a query text like this, gives you a number. For example, in this case, it's 50 repositories, and for the issues you also get a different number. But from this simple case, we have taken it a bit further. We said, why not have different coefficients? Maybe some of the queries are more expensive. Maybe they are making network calls. Maybe for some of the queries you have to do lots of SQL, number crunching, or other CPU-heavy work.
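A minimal sketch of the counting idea described so far, assuming the query has already been parsed into a small tree. The `Field` class, its `page_size` and `weight` attributes, the scoring rule, and the limit are all illustrative assumptions, not Kraken's actual tool.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """One requested field in an already-parsed query (hypothetical structure)."""
    name: str
    page_size: int = 1   # value of `first`/`last` on a list field, 1 otherwise
    weight: int = 1      # assumed cost coefficient for expensive fields
    children: list = field(default_factory=list)


def complexity(node: Field, parents: int = 1) -> int:
    """Score `node` and everything below it.

    `parents` is the number of parent objects this field is resolved for.
    """
    resolved = parents * node.page_size   # how many objects this field returns
    score = node.weight * resolved
    for child in node.children:
        score += complexity(child, parents=resolved)
    return score


# viewer { repositories(first: 50) { name issues(first: 10) { title bodyHTML } } }
query = Field("viewer", children=[
    Field("repositories", page_size=50, children=[
        Field("name"),
        Field("issues", page_size=10, children=[
            Field("title"),
            Field("bodyHTML", weight=3),  # assumed to be costlier than a plain field
        ]),
    ]),
])

MAX_COMPLEXITY = 2000  # hypothetical limit checked before execution

score = complexity(query)
too_complex = score > MAX_COMPLEXITY
```

Under these assumptions the example scores 1 + 50 + 50 + 500 + 500 + 1500 = 2601, so a middleware using this limit would reject the query before touching the database.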
So each of the things that has been requested can have a different weight: while doing the maths, you also put in a coefficient, like a multiplier, for each of those nodes, so that you have a better representation of how complex a query is. This means you can have a middleware saying something like, OK, if this query is too complex, don't execute it. Stop here and just raise an error without hitting the database or anything like that. So this is like the entrance to your API.

A related problem is the depth of a query. What is the depth of a query? Imagine something like this. You have foo. As a child, you have the bar type. Then as a child of that, you have the author. Then you are looking for the posts of those authors. Then for the posts, you get the tags, and for the tags, you display something. You can imagine this would be a big SQL query if these are in different tables, or even in different databases. So we want to say, OK, we allow nesting like this, but we don't really want it to be that nested. Sometimes it's beneficial to have top-level queries for your developers instead, so you don't want to give full control to whoever is asking. And sometimes you even have these cyclic things. If your types are not designed carefully, and you want to crash a server, you can do this. I mean, this unfortunately works in some GraphQL setups at the moment. You can ask for an author, and you can ask for the author's posts. Then for each post, you can ask for the author, and then for that author's posts again, and you can just go on and on. This usually crashes a server instantly. We didn't want to have this, so we created a tool that checks for this and says, OK, your query is too deep. Please don't go that deep. This was the second thing.

The next one is a better-known problem. It's about access control. Back in the days when GraphQL was young and just developing, most of the libraries that you used didn't have this access control built in.
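Going back to the depth check described above, here is a minimal sketch of how such a limit could work. It assumes the query's selection set is already available as nested dictionaries, where a leaf field maps to an empty dict; the function names and the error class are hypothetical.

```python
class QueryTooDeepError(Exception):
    """Raised before execution when a query nests further than allowed."""


def query_depth(selection: dict) -> int:
    """Depth of a selection given as nested dicts (leaf fields map to {})."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def check_depth(selection: dict, max_depth: int) -> int:
    """Return the depth, or refuse the query if it is deeper than `max_depth`."""
    depth = query_depth(selection)
    if depth > max_depth:
        raise QueryTooDeepError(f"query depth {depth} exceeds limit {max_depth}")
    return depth


# author { posts { author { posts { title } } } }: the cyclic pattern from the talk
cyclic = {"author": {"posts": {"author": {"posts": {"title": {}}}}}}
```

With a limit of, say, 3, `check_depth(cyclic, 3)` raises before any resolver runs, which is the point: the check only needs the shape of the query, not the data.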
And this means that the problem of who has access to what is kind of critical. For REST and other things, this is an easier problem, because you have an endpoint in REST. If you have access to that, you get that. But what happens when you have this graph of things? You have blog posts. You have authors. And authors have other things, and these are nested. So you can really traverse the types and jump from one type to another. This is a difficult problem to solve in terms of access control, so you have to think about that kind of thing.

What we did was start very simple. We created some decorators to decorate some mutate or query methods. Basically, this one was just checking if the user can, let's say, update their electricity reading. And this is just the logic itself, meaning the check is a tool, but the logic behind it is, let's say, that you have to be on the account that you are trying to update. Or you could embed other business logic, like whether enough time has passed between the updates, or validation of the numbers. All of that. So this was one thing we did, until it wasn't really enough. Excuse my voice; there was too much partying last night.

Our system is quite complex. We don't have a single user model; we have different user models. And one of the types of users is organizations. So it's not like an account user, but a third-party organization or third-party application. Think of it as OAuth or something. So we said we would like to define permissions for these types or mutations, and we would like to bundle them in roles so that the roles can be assigned to certain organizations. Once those roles are assigned, those organizations, with their API access, would be able to systematically get the data they want.
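The decorator and role ideas above can be sketched roughly as follows. This is an illustration under assumptions: the role names, permission names, `Organization` class, and the simplified resolver signature (the caller is passed directly rather than through a GraphQL `info` object) are all made up for the example.

```python
import functools
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    pass


# Hypothetical roles, each bundling a set of permission names.
ROLES = {
    "meter-reader": {"submit_reading"},
    "account-viewer": {"view_account"},
}


@dataclass
class Organization:
    name: str
    roles: set = field(default_factory=set)

    def permissions(self) -> set:
        """Union of the permissions granted by every assigned role."""
        granted = set()
        for role in self.roles:
            granted |= ROLES.get(role, set())
        return granted


def permission_required(permission: str):
    """Decorator that rejects callers lacking `permission` before the resolver runs."""
    def decorator(resolver):
        @functools.wraps(resolver)
        def wrapper(caller, *args, **kwargs):
            if permission not in caller.permissions():
                raise PermissionDenied(f"{caller.name} lacks '{permission}'")
            return resolver(caller, *args, **kwargs)
        return wrapper
    return decorator


@permission_required("submit_reading")
def update_electricity_reading(caller, account_number: str, value: float) -> dict:
    # Business rules (account membership, time between readings) would go here.
    return {"account_number": account_number, "value": value}
```

Keeping the check in a decorator separates "is this caller allowed at all" from the business rules inside the resolver, and bundling permissions into roles means an organization's access changes by assigning or removing a role rather than by editing resolvers.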
Of course, since these permissions are in place, it means they cannot get everything that a customer gets. They get less, basically.

OK, another problem. This is a well-known problem: someone is making too many requests. In the team, we have created two tools to address this issue. Unfortunately these are not built in, so we had to create them. Our inspiration was, again, GitHub in this case. You might remember the first slide about determining query complexity. When we look at the query, we kind of know how complex it is. So we said, OK, for a given hour, we would like to define a limit for a user. Let's say 5,000 points per hour. Each time they make a request and it is successful, we deduct those complexity points from the user's allowance. This is quite smart and flexible, because maybe you need to get a lot of data, and you don't really want to be limited in a basic way. You can get all the data; this just creates some fair use of the API. Once you are over your quota and don't have any points left, you can't do any more queries, and you have to wait some time. This is what we have implemented as a custom tool.

But there's also one more thing, and this is the rate limit. This is a Django library, actually. You can pip install it. It's called django-ratelimit. We kind of took that idea and extended it to create some custom logic. For example, here it is applied to one of our core mutations, where you can obtain an access token to our APIs. This means if you fail to log in three times within a minute, you are going to get rate limited for some time. So this just opens up ways to create different rate limit types. You could rate limit by IP, by user, or by some custom logic. You can define this kind of limit. And this is really essential.
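The points-per-hour allowance described above can be sketched as a small in-memory class. This is an assumption-laden illustration: the class name, the fixed one-hour window that starts at the first charge, and the in-memory storage are choices made for the example, and a real deployment would more likely keep the counters in a shared cache.

```python
import time


class PointsAllowance:
    """Fixed one-hour window of complexity points per user (illustrative)."""

    WINDOW_SECONDS = 3600

    def __init__(self, points_per_hour: int = 5000, clock=time.monotonic):
        self.points_per_hour = points_per_hour
        self._clock = clock
        self._usage = {}  # user_id -> (window_start, points_used)

    def _current(self, user_id):
        now = self._clock()
        start, used = self._usage.get(user_id, (now, 0))
        if now - start >= self.WINDOW_SECONDS:
            start, used = now, 0  # a new hour begins, so the allowance resets
        return start, used

    def remaining(self, user_id) -> int:
        _, used = self._current(user_id)
        return self.points_per_hour - used

    def can_afford(self, user_id, cost: int) -> bool:
        """Check before executing a query with complexity `cost`."""
        return cost <= self.remaining(user_id)

    def deduct(self, user_id, cost: int) -> None:
        """Call after a successful request to charge its complexity score."""
        start, used = self._current(user_id)
        self._usage[user_id] = (start, used + cost)
```

A middleware would call `can_afford` with the query's complexity score before executing, and `deduct` only once the request has succeeded, which matches the "deduct on success" behaviour described in the talk.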
And while the allowance looks at the bigger picture of what a user is allowed to do, the rate limit is a really fine-grained approach, meaning you can be really specific about this thing, and you can really tightly control how the experience is going to be.

I'm going to talk about a few more tools. We did not write them, but these are must-haves, and I think all of the GraphQL libraries include them, so I highly recommend you use them. One of the problems is that you are asking for something, but the result, let's say a list, has 1,000 items or 10,000 items. You don't want to return those 10,000 items, because the client who is asking for them can't even show them on a page. So the solution is pagination. In GraphQL, you can do pagination, and all the libraries have tools for it. We were using Graphene, so this is how you define the connection object. This means that every time someone makes a request, they will be making the request in pages. So you can't return 1,000 items at once; you have to return them in smaller chunks. And the good thing is that you can enforce this, and we really do enforce it on our queries. This means you can request a maximum of 100 items from a paginated query. And if it's a paginated type, meaning something like this, the connection type, you must include the first or the last argument. I don't really have time to show this as a demo, but you have to say, give me the first 10 items, and then you can say, give me the next 10 items. Or you could say, give me the last 50 items. That's how you actually construct your query. This is really a must-have.

And just to conclude, how much time do I have? 10 minutes? That's good. These are not GraphQL-specific tools, but they are really good to have when you're running GraphQL, and this applies to many other technologies as well. I just want to bring up a few things. So in GraphQL, you don't really do versioning. You deprecate things.
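Going back to the pagination rules from earlier in this part (a maximum of 100 items, and a mandatory first or last argument), a library-independent sketch of the enforcement could look like this. The function and error names are hypothetical, and cursors (the "next 10 items" part) are left out to keep the example short.

```python
MAX_PAGE_SIZE = 100  # the limit mentioned in the talk


class PaginationError(ValueError):
    pass


def validate_page_args(first=None, last=None, max_size=MAX_PAGE_SIZE) -> None:
    """Reject paginated requests that omit `first`/`last` or ask for too much."""
    if first is None and last is None:
        raise PaginationError("you must provide `first` or `last`")
    for name, value in (("first", first), ("last", last)):
        if value is None:
            continue
        if value < 1 or value > max_size:
            raise PaginationError(f"`{name}` must be between 1 and {max_size}")


def paginate(items: list, first=None, last=None) -> list:
    """Return one page of `items` after validating the arguments."""
    validate_page_args(first, last)
    if first is not None:
        items = items[:first]
    if last is not None:
        items = items[-last:]
    return items
```

Validating the arguments before any data is fetched means an oversized request fails cheaply, in the same spirit as the complexity and depth checks.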
And then you replace them with a newer version, so both are available at the same time. After some time, you just remove the deprecated version. But the thing is, sometimes you want to make breaking changes, or you want to communicate this to the stakeholders: people who are using your APIs, your customers, or other companies that systematically consume your APIs. This is a very hard problem, because sometimes, if the API is public, you don't know who is using it. So you have to create tools to reach out to them, or they could subscribe to your breaking changes. Then once you say, OK, I'm going to deprecate this in three months, they would be made aware. For this, we have created Kraken API announcements, which sends you an email and has an RSS feed that you could put in Slack or any other RSS reader, so it just pops up for you. There is even a blog-like interface, as you see here.

The other problem is knowing when something is not right, or when there are performance issues. You definitely, definitely need good dashboards, some event monitoring about the error rates, what's being requested, and how it performs, and even alerts that say, oh, something doesn't look right, can you please check this, so that you can be notified in Slack. These are really must-haves.

The second thing is that GraphQL looks like magic to some people if they have not used it before, and it's not really common knowledge. So we try things, we fail, we learn. Then we try again, we fail some more, we learn some more, and then we publish these as conventions, the best practices, for our internal team, of course. Sometimes we share those things at conferences like this. Long story short, these best practices really encourage people to do things in the tested or best way for the time being, until we come up with other things. So if you are going to do GraphQL, make sure your whole team is on the same page about those things.
And I think there are two last items. The first is linters. Yeah, those best practices are great, but who reads them? Sometimes people don't read them. So if you can automate something, please, please, please automate it using linters. There are many different linters out there. You could use Flake8. We've started to use Fixit recently, and we are moving some of our Flake8 rules to Ruff. Meaning, if it's a convention and you can automate it, just do it. The Fixit library even fixes things for you, like Black does, when something is not best practice. It's a bit harder to write, but it saves tons of time.

And I think this is my last point: Pact testing. This is contract testing. As a provider, you put a middleman between your consumers and yourself, and that middleman, the broker in between, acts as a way of making sure that changes on the provider side do not break the consumer or client implementation. Suppose it's a mobile app, and it's consuming your GraphQL API or REST API or any API, really. The consumer writes a contract, that is, a consumer test specifying precisely how they use the API. They say, I'm calling this endpoint. I'm expecting this HTTP status. I'm expecting a body with these items, in this shape, of this type, whatever. They write this as a test. We verify the test. We say, OK, thumbs up. This is correct. We can provide you this. When we verify it, it's a handshake. It means we now have a contract, and it's a pact. Then we upload that to the broker. So any time any of our teams makes a breaking change, boom, the broker says, you are going to break this client's mobile app. Be careful. Don't do this. So this is really a must-have.

So this is all I had to say. I hope you enjoyed the talk. I'm happy to take any questions and, again, excuse my partying voice.

Before we break for lunch, let's show our hunger for questions. We have plenty of time for questions.

Hi.
Thank you for the great talk. I have two questions, actually. Please. Because I am also struggling sometimes with GraphQL. Yep. One of the issues we have: you talked about validation of permissions on mutations. Yes. But the same thing goes for queries, and it could be a bit more complex, because let's say someone has rights to access all of the authors, but for some reason shouldn't have access to posts. Yes. When you have a nested query, how do you solve this kind of situation?

So for a given query, we have a middleware that traverses all the types and all the children, and all their children's children. Imagine it like a tree: we traverse it all, and we check on each type whether they have the permission or not. If they don't have the permission, we just return None for those things and we raise an unauthorized error. But for the things they do have access to, we say, OK, this is good, and we give you the result. So the magic is having a middleware that can traverse the relationships between the child nodes and the parent nodes, and then check whether the user has access to those things individually, type by type.

Thank you. And the second question is actually about logging the errors in the API, because most GraphQL implementations tend to return 200 even though there was an error in the API. Yes. So it's a bit harder to log it and view it in something like Grafana or other dashboards. How do we do that?

It's by design that GraphQL returns 200 for errors, so we raise custom events for errors. And not all errors are the same. Errors are unique in our system, so we have these unique error codes. I haven't covered those, for lack of time, really. But each error is unique with that code, and we log them in a different system so that we can use those error events to build these dashboards or visualize them. We don't depend on HTTP codes, because they don't really work here.

OK, thank you. No worries.

Are there any other questions?
If not, we can thank our speaker again.
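The permission middleware described in the first answer can be sketched as a recursive walk over the requested fields. This is only an illustration: the `FIELD_TYPES` mapping, the function name, and the `"UNAUTHORIZED"` error code are hypothetical, and the query and data are plain dictionaries rather than real GraphQL objects.

```python
# Hypothetical schema information: which type each object field returns.
FIELD_TYPES = {"authors": "Author", "posts": "Post"}


def filter_by_permission(selection, data, allowed_types, errors, path=""):
    """Copy the requested fields from `data`, nulling types the caller cannot see."""
    result = {}
    for name, sub_selection in selection.items():
        field_path = f"{path}.{name}" if path else name
        type_name = FIELD_TYPES.get(name)
        if type_name is not None and type_name not in allowed_types:
            result[name] = None
            # Made-up example of a unique error code that could be logged as an event.
            errors.append({"code": "UNAUTHORIZED", "path": field_path})
            continue
        value = data.get(name)
        if isinstance(value, list) and sub_selection:
            result[name] = [
                filter_by_permission(sub_selection, item, allowed_types, errors, field_path)
                for item in value
            ]
        elif isinstance(value, dict) and sub_selection:
            result[name] = filter_by_permission(
                sub_selection, value, allowed_types, errors, field_path
            )
        else:
            result[name] = value
    return result


# authors { name posts { title } }, requested by a caller who may only see authors
selection = {"authors": {"name": {}, "posts": {"title": {}}}}
data = {"authors": [{"name": "Ada", "posts": [{"title": "Hello"}]}]}
errors = []
response = {
    "data": filter_by_permission(selection, data, {"Author"}, errors),
    "errors": errors,
}
```

The response still carries both `data` and `errors`, which is why it would go out with HTTP 200; the entries collected in `errors`, each with its own code, are what a separate logging system could count and chart, as described in the second answer.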