Hello, Nick. Good day. A little early? Yes. Are you nervous? No. I think I'm already recording, actually, so it's fine. Don't mind me, just a few technical issues while I try and get the other me to go away. There we go, I think that's it. Great, so today we're doing a deep dive into GraphQL, which is a new API GitLab has started developing. We initially became aware of GraphQL in around 2017, and it's been a bit of a fun story since then. I'll go into the history a little bit more later. I'm going to be using this view for the presentation simply because if I switch screens too often, Zoom crashes and I can't do anything. I want to be going into GraphQL here a little bit to play with queries and such, so this is just the safest way for me to get through the whole presentation without it completely blowing up on me. This is being recorded and uploaded to YouTube afterwards, of course. And if you have any questions at any point, just pop them into the document right here. I'm not monitoring Slack, I'm not monitoring the Zoom chat, so here is absolutely the right place for them to go. And I would love for there to be some questions. If you have anything you want to ask partway through that you think is better addressed immediately, just interrupt me, I don't mind that: say, hey Nick, what about this? You've forgotten about this. Absolutely fine. So getting started then, we have to begin by asking what GraphQL is. Since it's an API, let's start with APIs: they're things you use to programmatically interact with a web service, something like gitlab.com. You don't always want a user to be going in and clicking buttons and so on. It's quite hard to drive websites programmatically by having a program click the buttons for you. It's possible, but it's much better to have something the computer can interact with more directly. That's an API. They've been around for a very long time, and GitLab's got a REST API at present.
REST APIs were initially defined in around 2000 by the great Roy Fielding himself. And in 2000, the web was a very different place. In particular, APIs weren't really considered separate things. Typically you'd have your one endpoint and you'd be trying to handle programmatic and user requests on the same endpoint using different response types. So the API would be: you POST to the page with some JSON, the JSON is interpreted by the server, and it returns some more JSON. Whereas if a user visits exactly the same page, they get a nice HTML page with a web form that you can fill in and submit, and that uses a different content type. So REST was very much built around this idea of allowing you to use the same endpoints for different users. In modern years, we've moved away from that. APIs are now typically JSON-only or XML-only on a completely separate endpoint, sometimes even a completely separate host name from the one the main web services connect to. And that's just one of the things that's fallen away from how REST was laid out in the early years. There are many different types of REST API and they come in levels. GitLab.com's generally falls into the second-highest level; it's what I'll call a level four REST API. What this means is it's almost completely REST, but not quite. In particular, we do a lot of good things. We version the API. The top one here is the level four graphic that I stole from Damian Cremont over there. The resource is in the URL itself: you can see /api/v1/users, so we're operating on users and that's specified in the request path. We've got a request version so we can change the API over time. As you can see right here, we're also in a separate place: we're looking at /api/whatever/users.
So we've moved away from this initial idea that everything would be intermingled in the same endpoint, and that's generally considered normal for REST these days. The request parameters themselves go into the body of the request; that's normally JSON, certainly at GitLab.com. And we use HTTP methods to denote what we want to do. So if we just want to read the contents of the users, we'd GET /api/v1/users. If we want to change it, we would PUT to /api/v1/users. These are verbs in the HTTP standard that allow us to specify more semantics around what we want to do. There are a lot of things we don't do in our REST API though; it's about as good as REST APIs generally come. The central promise of REST, and of what Roy Fielding was looking at when he was writing his paper nearly 20 years ago now, is that you would have a single client that would work for everything. And this is something that GitLab never really bought into, and the web generally never ended up buying into. We've got the web browser, which is a single client of course, and it works for all manner of websites as long as they're talking HTML and those kinds of simple web formats. What we're doing with APIs these days instead is using the web to deliver an application, written in JavaScript at the moment. In the future that's likely to be some sort of WebAssembly blob that comes down, just runs in the browser context, and interacts almost exclusively with APIs, which might be REST, which might be something else. So we've moved quite far from the initial vision of what REST was going to be. Instead of having lots of different web services all accessible from the same client, which is what we call a level five REST API when it permits that, we have lots of different APIs and lots of different clients, one per API. Our front end is a good example of this. It's kind of a mix at present. Some of it is old web 2.0 style: here's some HTML with some JavaScript mixed in.
A lot of the rest of it is essentially a Vue-based web application. We download a bunch of JavaScript and a bunch of templates, and we execute that in the browser against the API that we have. So it operates quite differently to the initial idea of REST. GraphQL is also an API. It is a level zero REST API, which is the least powerful, from a REST perspective, that you can do. In particular, you've got a single endpoint that you always POST to. No matter what you want to do, it's always an HTTP POST, always to the same path, and the body itself, the data that you post to the API, says what you want to do, what the semantics of the request are: whether you want to read some data, or whether you want to create a new object, et cetera. So in many ways, it's a step back in terms of accepted best practice. I have a wonderful book on my bookshelf called RESTful Web APIs, which goes into all the wonderful things you can do if you fully buy into REST, and we'll cover some of those items that essentially go away in GraphQL a bit later. We throw 20 years of best practice out of the window because we're giving up on this idea that you can have a single client that can consume any API. Instead, we're trying to optimize for the case where you have a client that's adapted very closely to a single API, because that is essentially what we're building these days. And that's the paradigm shift that goes from REST or SOAP to GraphQL. You're giving up on those essentially really good ideas that you've been holding for a very long time to say: okay, we're going to do it completely differently. Anything else REST can do, GraphQL can also do. We'll cover some of the capabilities later. It's not like we're giving up on things and there's no replacement for them. You could argue that giving up on the single-client ambition means that you can do these things better as well, and we'll come onto that a little bit more later.
What we're really trying to optimize for is flexible queries that can be served very efficiently by the server. The client can always request exactly what it wants; the server can always give it exactly that and calculate it in the easiest way possible for it. Has there been talk about, like you said now, we always POST to the endpoint. In the REST world, we could optimize GETs to only read and do optimizations based on what we think is going to happen, and the GET doesn't write anything so we can optimize for that. Are there similar ways of doing that for GraphQL? It's one of those things we're giving up on. Caching especially is very different in the GraphQL world, and essentially it's left up to each individual client. A whole bunch of good things REST gives you, like the ability to have a caching proxy in the middle that keeps hold of requests and serves them for as long as they seem in date, just go away. And that sounds awful. As I said in this slide I've got: why would you do any of that? That sounds like a terrible idea. 20 years of best practice gone, 20 years of ecosystem no longer available to you. And certainly when I first encountered GraphQL, I thought, oh, this is just JSON-RPC or XML-RPC done slightly differently. There's a lot of truth to that, and it does lead to the question of why we would even use it in that case. I struggled with that for a while, but just yesterday, while I was writing these slides in a great hurry, I came across a merge request that kind of encapsulated why we changed our minds; I've already changed my slide for it. It's a wonderful feature being worked on by Mario. It tries to improve the performance of a specific area of GitLab. In doing so, it bumps up against some functionality in our REST API. Essentially, there's one attribute in the commit model which we can no longer serve from the efficient data store. The thing is, that attribute is used by basically none of our API clients.
They're getting it anyway because of the REST API model we have. There are projections: it's a thing that exists in REST, but they're quite difficult to add, and it's difficult to add them in a backward compatible way. If you want to remove a field from the response because you've discovered it's expensive to calculate and basically nobody uses it, you can't remove it without breaking the compatibility guarantee. All you can do is add a new projection that somebody can select, and this might be something like GET /api/v4/commits?view=simple. That's a projection: it says, leave out some fields. But it's difficult to do in a backward compatible way, and as a result, we tend not to. In this merge request, it's a single field that turns out to be really expensive to calculate, and we just can't remove the field without breaking our compatibility guarantee. We also don't have much visibility into which fields are actually being used by clients, because we always just give them back the whole response. At best we can see which projections they use. Lazy evaluation would let us batch things up: it's an N+1 issue underlying it, and we could lazily evaluate and try to improve performance. Instead of making every query we would make, perhaps the commits are spread across three projects and there are 20 queries in total, we could reduce it to perhaps three queries instead of 20. But it's quite difficult to monkey-patch that into existing REST APIs. We have done it in a couple of places on gitlab.com; commit authors are one example: there was an N+1 issue loading lots of different commit authors and we had to fix that. It's a lot of code. It's spread out. It's hard to understand. And while I was looking at this merge request I found myself really wishing that this was a GraphQL API, because with a GraphQL API the client always tells the server exactly what fields it wants.
So for the vast majority of cases, where this field isn't needed, the client's never going to ask for it in the first place. There would be absolutely no issue removing it from a default query, because there is no default query. Perhaps 1% of queries would go down the more expensive path. As it is, with the fix we've gone for in the REST API, 100% of queries to the API will still go down the more expensive path. You're less flexible; you can't do as much. The API can change: GraphQL aims to be versionless. Definitely versionless. It has a built-in deprecation mechanism. You have instrumentation: you can see which fields are being used and which aren't. So you could put out an advisory saying we're going to remove these fields in a few months' time, and then you can monitor whether or not people have stopped using them. It really helps. And lazy evaluation is something that's just built into the frameworks from the start. Anybody can add a new field to GraphQL and have that be loaded in a way that's respectful of the underlying performance possibilities. At GitLab we can only query repositories one at a time, so if you query three different projects you have to have at least three queries. But if you're going for commits in each of those three different projects, you're only doing three queries rather than perhaps 20 or 40 queries to get all the information. So it really helps that you can build that in just as part of the framework. We could overcome most of these problems using REST, but we'd have a very complex code base and a very complex API that's difficult to use. So that's why you might want to do this. GraphQL's origin is Facebook. They have a lot of APIs, very large APIs, and initially they weren't too interested; in 2012 they started using APIs and this idea of a web application. Prior to that they were betting very big on HTML5, and certainly their mobile application was just a thin wrapper around the existing website.
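To make that concrete, here is a rough sketch of what such a commit query could look like. The field and argument names are illustrative assumptions, not GitLab's actual schema:

```graphql
# Sketch only: field names are assumptions for illustration.
# A client that needs basic commit details selects exactly those
# fields, so an expensive derived attribute is never computed for it.
{
  project(fullPath: "gitlab-org/gitlab-ce") {
    repository {
      commit(sha: "abc123") {
        sha
        title
        authoredDate
        # the expensive attribute is simply not selected here;
        # only the rare client that asks for it pays the cost
      }
    }
  }
}
```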
It didn't do much that was different. I've linked to a talk there by Lee Byron, who talked a little bit about the challenges from perhaps 2006, when they started betting on mobile, all the way up to March 2019. And the main thing he discovered while building the REST API, and then the GraphQL API that replaced it at Facebook, was that REST was very slow, very fragile and very tedious to write; it was just difficult to use. And they did literally throw out 20 years of best practice and start again, designing something that worked for their specific use case, which is of course a single client for a single website, rather than one client that works across lots of different websites. What they did was essentially take the API and optimize it for that case rather than trying to be more general. It's linked quite heavily to React.js; they were both open sourced around that time, and React.js was used internally at Facebook, together with Relay, to actually build the web applications they were deploying to iOS, to Android, et cetera. GraphQL itself was open sourced in 2015, as a specification plus a reference implementation. In 2017, we became aware of it ourselves, in this issue where we looked at essentially starting a GraphQL API of our own. I think GitHub had started one as well, so we had an interest; we wanted to know why. There were some issues with patent grants, or rather the lack of them. They were raised and resolved in 2017, around the time we were working on the initial merge request to add support to GitLab. They were resolved, and now it is an entirely open specification as well as being open source code, so there's no worry about the legal issues around it anymore; that's all been resolved. As a result, we merged alpha support for GraphQL into GitLab in 2018. It was a very simple merge request: you could access projects and a few other things. We still don't have full support in GitLab for everything you can do in the REST API in GraphQL.
A lot's missing. It's still alpha, and we are looking to build out that support so that everything the REST API can do, the GraphQL API can do as well. The first feature that we've got using GraphQL is actually issue suggestions. It's really good fun. You start typing an issue title and it will find matching issue titles. And if GraphQL is enabled on the GitLab instance, it will do that using GraphQL instead of the REST API. GraphQL is now independent from Facebook in a lot of ways. They've just started a new foundation for it; Facebook are still members of the foundation, but they're trying to turn it into more of a public thing than a Facebook thing. So I just wanted to cover some basics of GraphQL before we dig into the query language itself. The most important aspect is that everything, literally everything, in GraphQL is a field, and throughout, if it's a field, I'll italicize it. A field is essentially a calculation. It's saying: this is going to return some data. And you can pass arguments to fields. As you can see on the right hand side, you've got this field echo, which is effectively a function. It takes some text as an argument and it returns some text. The exclamation mark at the end just means you will always get some text back. It will never return nil; it will always be a string. And the field encapsulates some calculation on that text. What it actually does, I'll just demonstrate over here: you can echo "world". If you run that, it says there's a parse error. Because I need to do that. Oh, I know why: the argument needs to be specified. So you can see, this is running on my local machine, and the definition of the echo field, normally we'd call this a function, is that it always returns back the same string with the username prepended. It's a little testing thing, but it is a field rather than being a function or a type or anything else. It's a field. I'll say that a lot.
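The demo being described looks roughly like this; the exact response text depends on the logged-in user, so treat it as an assumption:

```graphql
# The echo field takes a required String argument and always
# returns a String (the trailing ! in the schema: String!).
{
  echo(text: "world")
}

# The response is along the lines of:
# { "data": { "echo": "someuser says: world" } }
```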
So fields can take arguments, fields have types, arguments have types; everything has a type. Types are also reached through fields, I just want to make that point: in the schema, there is a field that lists the types, and the types themselves carry fields. It's great fun. Some types are built in, some are user defined, and there are some special types. Right at the top is the schema type, and that always has a query type and a mutation type, which are accessed from the query and mutation fields. You can probably see where this is going. Types can have fields themselves, so you end up with a graph of fields. Here we've got the query field, the project field, the ID field: that's a graph, three nodes, one, two, three. Then we have the issue here, so that's three, four, down into the ID. You're forming a graph of fields, essentially, and it gets even more explicit in the issues collection, which we'll come to later: you can see your edges and nodes and all the rest of it. The types determine what fields you have. Some are built in, as I said, and others you define yourself, like the project type. GraphQL comes with a schema built in; you essentially define the schema through code, and that determines what fields are available to the users, what types are available to the users, and also what directives are available to the users. The whole thing is entirely queryable. I'll just demonstrate this through the graph. When we load GraphiQL, this little front end tool, it requests the schema and gets back a response which contains the schema. You can see it lists a few types, these two are special, and it lists the other types; these are all fields. So the whole thing together is a GraphQL query that returns the schema. It's a bit like Swagger, except it's built in: you can introspect exactly what you can do with the GraphQL endpoint, which is pretty handy for API clients. You can auto-generate them.
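Introspection itself is just another query. This one is standard GraphQL, straight from the specification, and works against any compliant endpoint:

```graphql
# Ask the schema to describe itself: the special __schema field
# exposes the query and mutation root types and every other type.
{
  __schema {
    queryType { name }
    mutationType { name }
    types {
      name
      kind
    }
  }
}
```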
The same goes for auto-completion: you can see in this GraphiQL view, if I start typing, it shows me what's available in that context, and that's all based on the schema. Unlike Swagger, the schema is automatically generated; you don't have to put any work in to get that, which is very nice from my point of view. You also get automatic documentation, which is great. In order to use GraphQL, mostly you're writing queries. They look like this. It's essentially a type of SQL for graphs; if you've ever used a graph database, the syntax is quite similar to that, but it's relatively niche in the world, I'd say. Most people are familiar with at least a bit of SQL; very few people are familiar with graph databases. So we'll dig into what you can do with the query language a little bit more. Essentially, this gets sent to the server as-is. The server parses it, interprets it, and from that generates a response containing what we've asked for. So here's a very basic query, and we'll just go through some of the points in it. The request is on the left, the response is on the right; it's just a screenshot of up here, it's pretty much the same. So in here, we have the outer block, and we specify in there what fields we want. If you don't say query on line one, it's implied, so it's just not here; you can delete that and it's exactly the same. The definition is a set of fields. You say: I want this field, I want this field, I want this field. A field can have fields itself: this field project has the type Project, we know that. It takes the argument fullPath, which specifies a particular project to return. That's all server-side semantics that it knows about, and it's documented in the schema. But because it's of type Project, we can also say: okay, from the type Project, I want the field id; I want the field issue, which takes an argument, which is the IID; and from that, I want the webUrl. And then down here we've got two more fields.
Essentially what this query is doing is always returning the GitLab CE and GitLab EE project IDs, along with a third project which is specified here, the www-gitlab-com project. And that works pretty much as you'd expect. It is a bit unusual compared to a REST request though, because essentially you're requesting three completely arbitrary projects, and that's not something that is often permitted in a REST API. It's a bit weird. The uses for it are many, and obviously it's not just this kind of query; you can do it for anything. Behind the scenes, GraphQL works very hard to make this efficient. If this were a REST API, we'd probably make three separate GET requests: one for project one, one for project two, one for project three. Here we've made a single request with exactly what we want, and the returned data on the right is also exactly what we want. If this were a REST API, we'd have three large documents containing full details of all three projects. We'd have done three round trips, which is slow, and probably done them in series, because parallelizing things is quite hard on the client side. And we'd have far more data than we want. In GraphQL, we've written a single query. It specifies exactly what we want. If the project model were quite simple, this would be a single database query from the GraphQL point of view. In fact, it's a few more than that, because getting the issues adds one or two. But the point is that because the client asks for exactly what it wants, the server can serve that very efficiently. It can translate that into SQL that does SELECT * FROM projects WHERE full_path IN (one, two, three), which means it needs to do one SQL query instead of three, and the database parallelizes that for us. It's absolutely great. The response, as you can see, has just the fields we've asked for. It doesn't matter how many fields a project has, because we only ever get back what we ask for.
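A reconstruction of the kind of query on the slide might look like the following. The aliases (ce, ee, www) are needed because the same field, project, is requested three times; the exact paths and field selections are assumptions based on the description above:

```graphql
{
  ce: project(fullPath: "gitlab-org/gitlab-ce") {
    id
  }
  ee: project(fullPath: "gitlab-org/gitlab-ee") {
    id
    issue(iid: "1") {
      webUrl
    }
  }
  www: project(fullPath: "gitlab-com/www-gitlab-com") {
    id
  }
}
```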
This is a huge benefit compared to what normally happens in REST, because you don't have to parse all that extra JSON and you don't have to generate all that extra JSON. It just makes everything much easier. The downside, of course, is that the response is less generally applicable. You can't use this response in many different contexts, I should say, as you would a general purpose REST one. So the value of caching actually goes down a lot with GraphQL, and we'll talk a little bit more about caching later; it's just not something you do as often, because you're making much more specific queries whose results are less generally useful. What else have I got in there? Yeah, the only other thing I'd like to note is that you could, in theory, generate a very complex query that asks for, I don't know, 10,000 projects, and that would kill the server. So there are ways to restrict the complexity of queries. You can restrict the maximum depth: if you've got very complex nested types with many fields, you can say the maximum depth is 10 or 100. If you said the maximum depth was one here, you'd be able to get the project ID, but you wouldn't be able to get the issue's web URL. That limits how expensive it is to parse the query and to respond to the query. You can also limit the overall complexity, which is usually a measure of how many fields you're asking for. So you might set that to 100 fields with a maximum depth of 10, for instance. And that helps somewhat against denial of service attacks. I've got a slightly more complex query here going into pagination, especially for issues. A project might have millions of issues and we don't want to return them all at once. In REST we deal with this usually by having an offset and a limit. We've started introducing keyset pagination into the REST API, which works a little similarly, but you specify exactly where you want to continue from. In GraphQL it's very different. For a start, the pagination information is included directly in the response.
In the REST API we put this into the headers, so you've got two different places to look. And the reason why REST puts it into the headers is that the content type for the response generally doesn't have a space for it, so it's just easier to put it into the headers. It's not necessarily the best place for it to be. We can see in this query we're asking for the first two issues, and we're saying sort by created at, ascending. You can also go backwards. There are a number of different arguments you can use for the issues. The response returns a very strange type for issues, which is an IssueConnection. It's not an array of issues. Let's just change this to sorting by created at ascending... so this is an IssueConnection, as you can see, and it has just two members. It's got this pageInfo, which carries the pagination information, and that's the same for essentially every connection type that exists. It also has edges, which is not an array of issues but an array of issue edges. An issue edge contains a cursor for every node, plus the node itself, which is actually the issue. There are some shortcuts you can use to get around this deeply nested construct, but they're not in use on GitLab.com at present. The cursor allows you to say, at any point: paginate from here. So if we take the edges, and I say we get the cursor as well... run that... and then we say, let me remind myself, endCursor. We give it any of these cursors and the pagination will proceed from that edge rather than from the end of the page. So say you start with the first 10. Like so. It doesn't accept any cursors... it's easy to forget exactly how this works. There we go. So you say after the cursor. And then maybe you decide you've processed the first five, and for some reason you throw away the next one; you want to go from this one instead. You take this cursor and it will go from there instead. But you can also just go page by page by looking at the cursor of the connection type itself: there is an endCursor, like so.
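Putting that together, a connection query of the sort being demonstrated might look like this; the sort enum value is an assumption:

```graphql
{
  project(fullPath: "gitlab-org/gitlab-ce") {
    issues(first: 2, sort: created_asc) {
      pageInfo {
        endCursor      # cursor for the last edge on this page
        hasNextPage
      }
      edges {
        cursor         # resume pagination from this exact edge...
        node {         # ...and the node is the issue itself
          iid
          title
        }
      }
    }
  }
}

# The next page is then requested with:
#   issues(first: 2, after: "<endCursor from the previous response>")
```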
So you've got choices of where you paginate from, and you can make this as arbitrarily complex as you like. You've got a search field in there as well, which is used, I guess, by the... so these are all options you've got. As I said earlier, caching is out. There's just no way to make this generally applicable; it's completely incompatible with the usual HTTP semantics for caching, because everything is a POST. We've not actually bothered handling this at all with GitLab yet. Instead, we're just waiting to see if it becomes a problem. Which might not be the best idea; we'll see how it goes. The GraphQL way to solve this is to move caching into each individual client rather than having it at the edge of the server. Every entity gets a globally unique ID, and the client uses that to decide whether or not to go and fetch that object again. It's not ideal. Like I said, there are downsides as well as upsides to GraphQL. It's not simply an amazing addition to REST; it's a complete replacement, and it doesn't do everything REST does better. It gets close. Back onto the query language. You can have fragments, and you can also have variables that you pass into queries. This is a fairly complicated example, but essentially we've pulled the definition of the fields we want out of a project into a fragment and included it into all three project definitions. We've also named the query so that we can give it a parameter here, which is the path. And as you can see down in query variables, we specify the path. What this means is that the query itself can be static. You don't have to interpolate into the query. You just have a static string in your client, which is designed for the use you've got, in this case pulling a project alongside GitLab CE and GitLab EE. You just change the variable at runtime and pass it alongside the query string when you want to change the result you get back from the server.
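As a sketch, the fragment-plus-variables version of the earlier query could look like this; the operation and fragment names are mine, not from the slide:

```graphql
query ProjectComparison($path: ID!) {
  ce: project(fullPath: "gitlab-org/gitlab-ce") { ...projectFields }
  ee: project(fullPath: "gitlab-org/gitlab-ee") { ...projectFields }
  other: project(fullPath: $path) { ...projectFields }
}

# The shared field selection, pulled out once and reused three times.
fragment projectFields on Project {
  id
  name
  webUrl
}

# Sent alongside the static query string, as JSON:
# { "path": "gitlab-com/www-gitlab-com" }
```

Because the query string never changes, the client only ever varies the JSON variables, which is what makes it safe to keep as a static document.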
GraphQL is kind of designed to help you have fewer, more general-purpose queries in that sense. And they can get quite complicated. You can specify multiple queries in one document and then select them by name. So if we have the query projectsWithGitlab and just the query projects, we could have the same string: we send the same string to the server and just say, run this query instead of that one. It gets even more interesting with directives. The query language allows you to say: include this field if this variable is true or false. You can also skip the field in the same way. So you can build queries up to be quite complicated and do a lot of different things. And directives, I think you can specify your own custom directives as well; they show up in the schema. I'm not really sure how to do that. I know this is a deep dive, but I've not gone that deep. We will see if that becomes useful in the future. As it is, it means that you can have the same query string for pulling down the project, but you can say: do you want the issues, do you not want the issues? Instead of having two different queries, you've got one query and a variable to sort that out. What I was wondering about with the variables is why they are evaluated server side. Does that mean that you can do tricks with them, and not specify a fixed value here, and do something on the other side? Yeah, the server can specify a default value for a variable. The point of having them evaluated server side is that the client doesn't need to modify the query string. Interpolating into a string is dangerous; think of SQL injection. Just having a static string that the client always sends, along with the variables for the server to handle, keeps that part safe. The query is validated to be well formed, and the variables are inserted essentially like in a programming language. It's just a lot safer to do it like that.
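The two built-in directives, @include and @skip, are part of the GraphQL specification. A sketch of the one-query-two-shapes idea described above:

```graphql
query ProjectMaybeWithIssues($path: ID!, $withIssues: Boolean!) {
  project(fullPath: $path) {
    id
    # Included only when the variable is true; with
    # $withIssues = false the very same static string
    # returns just the project id.
    issues(first: 5) @include(if: $withIssues) {
      edges {
        node { title }
      }
    }
  }
}
```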
It does mean that each request to the server carries a bit more text, but those requests are typically compressed anyway. The queries don't get that large, and there are ways to deduplicate. Quite how far you take that is up to you: in theory you could have a single document that you always send with every GraphQL request, with a thousand queries in it, and you just select one at a time depending on which one you want to run. Typically I'd expect we'd break it up a bit more than that; we might have a few queries at most per static document, but each individual document would be static. GraphQL gives you the ability to choose how far down that route you want to go. Does that make sense? It does, thanks. Cool. So we talked about directives, that's fine. One thing we haven't talked about so far is changing things. As you might expect from a company like Facebook, most of their workload is reading data. Customers and their applications want to get a lot of data to look at, and make changes relatively infrequently. I'd argue this is the case for most of GitLab.com as well. It's not the read-write web that REST envisioned; in general people are reading from it, they're consumers rather than producers making changes. This is probably why WebDAV isn't very popular these days either: most of the time you're reading data, and GraphQL is very much optimised for that, for making that amazing. You can change things though. Here's an example. On GitLab.com we have a single mutation, which is what they're called. All it does is change the WIP status of a merge request. Mutations do have most of the same features as queries, but they are much closer to an RPC than they are to the REST idea of: here's a PUT, here's a PATCH with some arbitrarily complex JSON, and this is a state transfer for this resource. It's very much instead: here's a named action that you want done. In the REST kind of context, it would be like having lots and lots and lots of new special-purpose HTTP verbs.
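From memory, the shape of that mutation is roughly the following; treat the exact input field names as assumptions rather than the definitive schema:

```graphql
mutation {
  mergeRequestSetWip(input: {
    projectPath: "gitlab-org/gitlab-ce",
    iid: "1",
    wip: true
  }) {
    mergeRequest {
      title    # read back fields of the changed object in the same round trip
    }
    errors     # mutation-level errors come back in-band, in the response body
  }
}
```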
So as well as GET, PUT, PATCH, POST and DELETE, perhaps you might also have an HTTP verb called SETWIP, and that would be very similar to the RPC-style interface that we have in GraphQL. You do get a lot of the same benefits, though. You can do multiple mutations in a single query, as in the example shown down below. We don't really know what this is going to look like in the future. As mutations get more complicated, I expect how we handle them in the code base to also get more complicated. I don't have good answers yet for how we're going to handle our most complicated mutations. It's quite possible that GraphQL just isn't the best choice for some of them. We'll just have to wait and see how it goes. One note I'll make is that RPC interfaces aren't the devil. Gitaly, for instance, is an RPC service using gRPC. That's been a success; it's been very good. What it's not is general purpose. You can't have a single client which can reasonably consume any RPC interface; you always need some special-purpose code to do so. You get advantages from that, but also disadvantages in terms of loss of generality. One thing I've seen coming up is gRPC in JavaScript, gRPC being what Gitaly uses. I expect at some point in the future we'll see a website, a web application written in JavaScript, that talks to a gRPC API server in the background. That will be very interesting to see. It might be better than a GraphQL mutation, it might not. One thing is for sure: it will not be general purpose. Moving on. We also have subscriptions in GraphQL, and this is a very simple slide because we have not got a clue. All I know is that they're very similar to existing pub-sub mechanisms like Action Cable, where you've got a long-lived WebSocket and you're pushing events down it like an event stream. You can also push requests back.
The GraphQL library we're using on gitlab.com has support for subscriptions over Action Cable out of the box, and we might be able to make use of this in the future when we move to Puma as the main web server. At present we don't have much scope to offer these long-lived connections; they're just bad for the back end because they tie up too many resources per connection. We'll see how that moves in the future; we might be able to make something really good there. At present, we prefer to poll every 10 or 15 seconds for data rather than holding a single long-lived connection.

Authorization and authentication are essential for any API. They work fairly similarly between REST and GraphQL, but I should at least make the point that they exist. You can have cookie-based authentication or token-based authentication. If you're using cookies, then you also have CSRF protection in order to prevent you from being attacked by arbitrary third-party JavaScript on other websites. It all works more or less as you'd expect. One advantage we do have with GraphQL is that authorization is very easily applied per field. We have this in a few places in the REST API, where if you're an admin you'll see extra fields or be able to do extra things. With GraphQL, that's much more integrated and it's easier to do. You can also ask for your permissions. In the example we've got here, we ask what permissions the user has over that project, and the user can see that they are allowed to read the project, so that can be used in the front end to show or hide additional controls. That's not something we've ever really had in the REST API.

Next I wanted to talk a little bit about how you'd add a new GraphQL endpoint to Rails. I can't really talk very much about how you'd handle it in the front end; maybe someone will be able to jump in a bit later and walk me through that, because I honestly don't know.
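The permissions query just described would look something like this; the field names are assumed from the description rather than copied from the schema:

```graphql
query {
  project(fullPath: "gitlab-org/gitlab") {
    userPermissions {
      # The front end can use these booleans to show or hide controls,
      # instead of guessing what the current user is allowed to do.
      readProject
    }
  }
}
```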
But here I've got a list of the important files you might want to change, and the places you need to look, if you're adding new functionality to the GraphQL API. We do not currently have support for EE-only features; the GraphQL API is the same in GitLab Core as it is in GitLab Ultimate. We are adding that. I don't know how that will look yet, but presumably we'll just have some extra directories in ee/app and ee/spec in order to make changes. In general, you're adding fields to existing types; that's mostly how you add support for new things in GraphQL. So say the project gets a new attribute: you just go to the project type and add a field with that attribute's name, so that clients can request that attribute through GraphQL as well as through the REST API. Sometimes you'll add a new type or a new top-level query. At present, we have very few top-level queries. We've focused on the project quite a lot. We've got echo. We've also got metadata, which is a top-level type, and from that you can get the version and revision of the GraphQL endpoint. What we don't have is issues at the top level; that doesn't exist. We don't have merge requests at the top level. We don't have groups at the top level. None of this exists. At some point we'll probably want to add them, and when we do, we'll be adding them to this query type. I'll pull up some code in a second. The resolvers are responsible for taking essentially a list of IDs which have been generated from the query. In this case, we've got two project full paths here. The entire query is parsed, and GraphQL sees it needs to pull these two different things up; it gathers these two paths into one list and asks the resolver: please give me back all the objects for these paths. Functions are stored in here too, and that's just like the echo function: it's a very simple field, essentially, with no backend data store behind it.
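As a sketch of what "adding a field" amounts to in the schema the server publishes: the change described above is roughly one added line in the type definition. The `starCount` field here is just an illustrative attribute, not necessarily what the real `Project` type contained at the time:

```graphql
type Project {
  fullPath: ID!
  name: String!
  # Hypothetical new attribute: one added field definition is all it
  # takes for clients to be able to request it alongside the rest.
  starCount: Int
}
```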
It's just some arbitrary code you want to run. The specs are quite composable. Mostly you have a spec for each type, with the request specs, the API specs, being the main means by which you verify the correctness of the API. They're usually more like smoke tests: they're checking that you get something back. All the complex details, the authorization, the fields that are returned, and so on, can go into per-type tests, which is a lot simpler. In the front end, as I say, we've got one example so far, which is issue suggestions. This uses a GraphQL client library, which is a thin wrapper around Apollo, and it has a static file that contains the GraphQL query; you just put the two together in JavaScript. I don't really know how that works, but it does work. It's great. In general, what I'm trying to do is re-implement the REST API in terms of GraphQL. The REST API itself becomes a client of the GraphQL API: the API classes in the GitLab code base become just a set of queries that are executed against the GraphQL code, and this prevents us from having two completely independent implementations of the same functionality.

Going back to that search issue: I did have a look at what we'd need to do to support it, and actually it was a bad example; the search API is just far too complicated. So I went back to a different example I had from a while ago: file templates. I just spent five minutes going through the files that we'd change and so on, and then we'll have ten minutes or so for questions. So: projects have templates for different types of file. A project can have a template for its .gitignore file, for instance; when you add a new file, you can choose from different templates to apply to the new file that you're adding. We have support for this in the main API right here. It just uses this template finder to get a list of templates, essentially. It's very simple: templates of a given type. But there's no support in GraphQL.
So here's where we'd change the JavaScript. At the moment we're calling API.projectTemplates, which calls the REST API; it would instead use the new GraphQL client. I have no idea how, obviously; the pagination and so on would change as well. Then down at the back end: the GraphQL project type is included from the GraphQL query type, which is included from the GraphQL schema, forming that graph we talked about earlier. We've added a new field to it called file templates, which is of a particular type, and we've told it which resolver to use as well. So we're returning these types and, as you can see, it's a connection type, just like the issues were, so we get all the pagination goodies essentially for free; and we're specifying a custom resolver and a custom type. This pulls from GitLab's existing code base, and this is what we output in the GraphQL response. If we go to the file template type to start with, we can see it's got a name of FileTemplate; this goes into the schema that we automatically generate. The file template has a set of fields itself. It's got a type, which is an enum, so it's got four possible values, also in the schema. The REST API has this documented manually, with text, and it's not machine-parsable; here, you could use the schema to generate a drop-down that contains those four options, whereas with the REST API it has to be hard-coded. We have the ID, and here we can see we've got a custom resolver for this ID, which is very simple: the resolver gets a template, which was generated by the file template resolver, and it just calls template.key to get what we're calling the ID; it's just a translation between the two. But this could be arbitrarily complex code, and in the case of the file template resolver it is arbitrarily complex code. That's just saying how to generate that field with this code. We've got a name and content as well, and there's a bunch of stuff that's not supported yet here, which still needs to be added.
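Pulling the walkthrough together, a client would exercise the new field with a query shaped something like this. The field, argument, and enum names are reconstructed from the talk, so the exact spellings may differ from what was eventually merged, and the project path is just a placeholder:

```graphql
query {
  project(fullPath: "gitlab-org/gitlab-test") {
    # A connection type, so pagination arguments come for free.
    fileTemplates(type: GITIGNORES, first: 5) {
      nodes {
        id      # resolved from template.key by the custom resolver
        name
        content
      }
    }
  }
}
```

Because the type argument is an enum, the schema itself lists the legal values, which is what lets a front end generate that drop-down instead of hard-coding it.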
Then in the resolver here it's a bit more complicated. We specify two arguments, and essentially what happens is that the resolver function is called by GraphQL to fill the file templates field in the project, with these arguments. We're only specifying one of them here; yes, this was pulled together in about five minutes last night, so it's not complete yet. At present this is vulnerable to N+1 queries: if we ask for three different types of file template, we'll run this code three separate times. In the project case we hand off to a loader here, which batches everything up and executes it in parallel; we don't have that support in here yet, but it's fairly easy to add. Finally, the API itself. What we've done here is try to convert the existing REST API endpoint into a GraphQL client. So we've specified the query that we want, and we've got a little helper here to execute the query with these variables. You can see the query is completely static: we never need to interpolate the full path of the project or the type of template that we want. And then it returns exactly the same data. In the specs, we've modified the existing API specs so we run them both without and with GraphQL, and this helps to verify that the functionality is complete, that it works in all cases. As it is, I think there are a bunch of skipped tests, because this is not complete, but what runs is all passing, and that's good. So there are a bunch of to-dos, but this is quite close to adding this functionality without duplicating any of the logic, and I'm quite proud of it. I'll probably finish this off in the next few days and get it merged, and our GraphQL API will be one step closer to parity, which will be very nice.

So that's everything I wanted to say, and I'm really impressed that came in on time, actually; I wasn't really sure how long it would take. Let's see if we've got any questions. Yes, we do. So: efficiently authorizing arbitrarily complex queries can be a challenge. Do we have any answers for that, Sean?
I don't, no, I'm asking you.

Yeah, so, I mean, I don't have any good answers for it. What we do allow at present is that each field is individually authorizable; if the authorization check doesn't pass, the field just isn't filled in and comes back null, I think. What this is referring to is the case where, say, someone queries a million different commits and we have to work out whether they can view each commit, which is quite expensive. The best we can do, really, is to limit the complexity of queries so that even the most pathological query is something we can answer quite quickly, and that goes back to the maximum depth and maximum complexity settings that we had earlier. Does that make sense?

Yeah, no, that's fine. I don't expect these questions to necessarily have "oh yeah, we solved this" answers. So, my second one: you gave an example earlier where removing the fields for that Elasticsearch MR would have been able to help. But I feel like, in general, an optimisation is often: we know that we never need to do this, and we can show that we never need to do this, so we won't do it, and by not doing it we save time. It seems like GraphQL could make that harder, because we have to go and find out whether anybody uses it that way; and even if they don't use it that way on gitlab.com, there's always the possibility that they use it on a self-managed instance, which I assume we wouldn't be able to monitor as easily.

I think, in general, when we say "this will never be used like this", what we're saying is that if they want to do it, they'll have to do two separate queries against the REST API. Does that sound fair?
So, more what I mean is: with the REST API you can look at it and say, well, actually we're loading this data from the database, but in fact we never do anything with it in the API, we just load it for no reason, for instance. Whereas in GraphQL a client might just be able to say, I can load this as well as this at the same time, so you don't have that same sort of certainty about what the client is going to do.

I'm not too sure I'm getting it, but certainly if there's something that we'd never want to expose, we would still never expose it. And certainly clients can request combinations of things, but because of the effort we put into avoiding N+1s in the back end, and the tools we have to do that, I think in general we're going to see that the queries will be more efficient. Maybe we can take it offline; I'm not too sure exactly what the question is.

I think I have a more specific example in mind, but I might let you move on to the next questions and see if we've got time for that at the end.

Cool. So, Luke: how easy is it to invert relations in the schema?

Yeah, so for the first example I had to drop out for another call, but anyway: you showed something like an edge from project to issues, and I have a specific use case where I want to have issues as the top-level thing, and I want to give several IDs. Is this something that can be done easily? Because it seems to me that if you have a lot of edges, it just gets to be deeply-nested JSON madness.

There is one way we can get around having these edges: we can just have the issue nodes here. We're not doing that at present; we will at some point in the future, I suspect, because it gets rid of edges and nodes completely. But there's no reason why we can't take issues here at the top level, and then maybe we'd have projects, and then we could give an array of IIDs, for example. Yeah, there's no reason why we can't do that; we just happen not to have done it at present. And maybe this speaks a little bit to what
Sean was asking about as well: it's up to us to decide what fields are available, and we can do this if we want to. Another option we've got: at the moment, all GraphQL queries go against this one endpoint, and that's something we'll keep supporting forever. But we could also have an api/graphql/projects/... endpoint, like so, and that would imply this project; all queries would then be relative to that project. So if we have this query here, we could remove that outer part and just use this, with exactly the same implementation, but without the surrounding stuff.

I actually wouldn't be interested in that, because that's boring. I would be interested in being able to make great joins. So, for example, I have issues and I want to sort them, and these issues are from CE and EE, and I want to maybe have a nice joined query where I'm like: OK, I have these five CE issues and these five EE ones, please give me the thing back. And I'd prefer to have some structure I can deal with later on.

Yeah, so, like that: what we'd be doing is, maybe instead of having a project at the top level, we'd have projects, and maybe paths, like that, and then we'd just go down to the issues. That's something we can add support for, and it would mostly be a helper around this syntax that just returns it in a nicer way. So it is possible; we haven't done it yet. And certainly if we're interested in that kind of thing, that's kind of the point of this deep dive: we want to be able to add these things as people actually use them.

OK, thanks.

Have we got anything else in there? Yeah: so, in general, should we use BatchLoader for all existing has_many relationships? Yes, absolutely. Will this work for nested relations? Yes; this is one of the cleverer things, actually, about how we're using BatchLoader at present. Say we have two projects and we want all the issues; then we have to give them different names, like so. In the existing REST API, this would be two requests, essentially: two separate REST queries, one
for project one and one for project two, and each would involve one query for the issues as well as one for the project. In GraphQL, we're getting it all at the same time. So it actually looks more like: projects (this doesn't actually exist, but it looks a bit like that), and then we have, like so. So instead of being four queries, it's two queries; and as you add projects, a third and a fourth and a fifth, it's still just those two queries. That's how we're using BatchLoader at present. Obviously, without restrictions on complexity and depth, this could still get up to tens of millions of projects and issues, so it raises different scalability challenges, and hopefully we'll be able to add more safeguards in the future. Does that make sense, Jan?

Yes, thank you. Just to make sure: even if I add assignees under issues, there will not be any difference, other than one more SQL query for fetching all assignees for all issues in all selected projects?

Absolutely: that will just add a third query, for users, like so. That does break down from time to time, but when it does, it's a bug; it's quite easy to spot and we can fix it, so it's very nice. It gets a bit more complicated when, from here, you might go to, say, users, and maybe that has (we're just coming up to the end of time now) I just want to see if this has projects. It doesn't. If it did have projects, then what we'd have created is essentially a loop, and in that case we might see a second query for those projects at this depth. But if we duplicated this up here as well, it wouldn't be two extra queries for projects; it would just be one extra query for projects at that depth. So it works out pretty well.

Great, so that's the hour. Thanks a lot, everybody, and I hope that was in some way useful. I'll be around on Slack again in about five minutes if anybody has any questions they'd like to follow up on, or if they're really inspired and want to work on a GraphQL API endpoint. Alternatively, if
you absolutely hate it and you want to submit a merge request to remove all GraphQL support entirely, we can talk about it. Well, let's not go that far. Great, thanks a lot everyone. Bye!
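To close with a concrete version of the batching example from the Q&A: a nested query like the sketch below should stay at roughly one SQL query per level (projects, then issues, then assignee users), however many aliased projects are added, because the batch loader gathers the keys across the whole request before touching the database. The field names follow the pattern used elsewhere in GitLab's schema but should be treated as illustrative:

```graphql
query {
  a: project(fullPath: "group/one") {
    issues { nodes { title assignees { nodes { username } } } }
  }
  b: project(fullPath: "group/two") {
    issues { nodes { title assignees { nodes { username } } } }
  }
  # Roughly three SQL queries in total: one for the projects, one for
  # the issues, one for the assignees. Adding c:, d:, ... keeps it at
  # three, which is the N+1 protection described in the Q&A above.
}
```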