OK, hi everyone, and thank you for being here. My name is Miguel Angel Fernandez and I work at Bitergia, a company that does software development analytics. I'm part of the engineering team: I work as a software developer, and I'm also involved in things related to data and metrics. To give you an overview of the talk, first I'm going to talk about the problem we are trying to solve, then a little bit about GraphQL, and then how the implementation went and the process we followed. So, we have people contributing to a software project, and people contributing to an open source project use a lot of different tools, right? There is a huge variety of tools depending on the aspect you want to focus on: it could be code, it could be messaging. So you may want to ask some questions about the project, and the main one is: how can you measure your project? The first, basic questions are: how many contributors do we have? How many companies are contributing to my project? To answer them, you need to focus on identities and have all the contributors of your project properly identified. Here is an example. This is a guy contributing to a project. His name is Tom Riddle, he's affiliated to Slytherin, and he's part of Hogwarts. Then we have another contributor. This guy is called Lord Voldemort, and he's working as a freelance wizard now. But wait, they are the same person! As a project manager, you may not have this information, so it's difficult to properly identify a contributor in your project. If you don't identify them properly, you may think these are two different contributors, but they are not: they are the same person. That's one of the problems.
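The identity problem above comes down to deciding which records belong to the same person. As a minimal sketch, assuming one hypothetical matching rule (identities sharing an email belong together) and made-up records, a union-find can group them. This is only an illustration of the idea, not how SortingHat itself matches identities.

```python
from collections import defaultdict

# Hypothetical identity records from different data sources: (id, name, email).
IDENTITIES = [
    ("git-1", "Tom Riddle", "tom@hogwarts.example"),
    ("slack-7", "Lord Voldemort", "tom@hogwarts.example"),
    ("git-2", "Harry Potter", "harry@hogwarts.example"),
]


def group_by_email(identities):
    """Group identity ids that share an email, using a simple union-find."""
    parent = {ident_id: ident_id for ident_id, _, _ in identities}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # normalized email -> first identity id that used it
    for ident_id, _, email in identities:
        key = email.lower()
        if key in first_seen:
            union(ident_id, first_seen[key])
        else:
            first_seen[key] = ident_id

    groups = defaultdict(list)
    for ident_id, _, _ in identities:
        groups[find(ident_id)].append(ident_id)
    # Sorted output so the result does not depend on dict ordering.
    return sorted(sorted(g) for g in groups.values())


print(group_by_email(IDENTITIES))
```

Union-find pays off once more than one matching key is used (for example email and username), because two identities linked through different keys still end up in the same group.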
Another problem is that an open source project is a bit more complex than that. This is the CEO of my company, Marike. He has made a lot of Git commits with different emails. He has a GitHub account. He has participated in several mailing lists with different emails. He has a Phabricator account for ticketing, and he's on Slack in different channels. He has also worked for many companies, and for each of them we have the dates when he started and when he left. So this is a bit more complex than the previous case. How do we solve this? The question you may ask is: who is who? And who is going to ask this question? It could be anyone, but usually it's the project manager, going crazy with all these identities and trying to identify everyone. And here comes SortingHat, the tool I'm talking about. For those of you who don't know Harry Potter, the Sorting Hat is the hat the students put on their heads, and it tells them which Hogwarts house they should be in. SortingHat uses a relational model with all the entities related to an identity: profiles, enrollments, organizations, domains, and so on. Going back to the guy from before, Lord Voldemort and Tom Riddle: we know that he belonged to Slytherin, and we have a basic profile about him. SortingHat helps us say: merge these two identities, affiliate this person to Slytherin, and complete his profile with this information. So far, how can you interact with SortingHat? You can use the command line; you can use Hatstall, a web tool based on Django that communicates with SortingHat; and you can use SortingHat as a Python module.
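A rough sketch of the model and the two operations mentioned (merge and affiliate), using plain Python dataclasses instead of a database. The class names, fields and the `merge` and `enroll` helpers are hypothetical; they loosely follow the entities named in the talk (unique identities, identities, profiles, enrollments, organizations).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Identity:
    source: str                     # e.g. "git", "github", "slack"
    email: Optional[str] = None
    username: Optional[str] = None


@dataclass
class Enrollment:
    organization: str
    start: date
    end: date


@dataclass
class UniqueIdentity:
    uuid: str
    identities: List[Identity] = field(default_factory=list)
    profile: dict = field(default_factory=dict)
    enrollments: List[Enrollment] = field(default_factory=list)


def merge(source: UniqueIdentity, target: UniqueIdentity) -> UniqueIdentity:
    """Move identities and enrollments from `source` into `target`.

    Profile fields already set on `target` win; missing ones are filled in.
    """
    target.identities.extend(source.identities)
    target.enrollments.extend(source.enrollments)
    for key, value in source.profile.items():
        target.profile.setdefault(key, value)
    source.identities.clear()
    source.enrollments.clear()
    return target


def enroll(uid: UniqueIdentity, organization: str, start: date, end: date) -> None:
    """Affiliate a unique identity to an organization for a period of time."""
    uid.enrollments.append(Enrollment(organization, start, end))
```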
So the main idea here is to build a robust API for SortingHat, to make it even easier to integrate with external applications. It has to be flexible and easy to adapt to new technologies, and of course it has to ensure consistency, as we expect many users to be using SortingHat at the same time. And here comes GraphQL. How many of you know something about GraphQL? OK, probably most of you, so for the rest, just a quick definition. GraphQL is a query language, typically served over HTTP. It is also a specification for client-server communication, but it doesn't tell you which language or tools to use, how the data has to be stored, or which clients you have to support. It's only a specification. And it is based on graph theory: it uses concepts from mathematical graph theory. Just in case, if you look at the GraphQL logo, the pink dots are the nodes, the lines between them are the edges, and edges can also have a direction. Let's compare a traditional REST-based application with a GraphQL approach. The traditional REST approach is the one used by Hatstall, the web interface we have right now. Imagine that you want a basic view of a person's profile. You have a unique identity, which is the main ID for that person, and you want his or her identities in all the data sources. You also want the profile with the name, maybe the gender and the country, and the enrollments, and for every enrollment the main domain of that organization. With REST, that means one request per resource. And this is what the query would look like in GraphQL: in one request you ask for everything you need for that view, and you can specify, field by field, what you want to get back. OK, I'm not going to be impartial here.
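To make the single-request idea concrete, here is a sketch of how a client could send the whole profile-view query as one HTTP POST with a JSON body. The endpoint URL and every field name in the query are assumptions for illustration, not SortingHat's actual schema; the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the real server may expose GraphQL elsewhere.
GRAPHQL_URL = "http://localhost:8000/graphql/"

# Illustrative field names only, not the actual SortingHat schema.
PROFILE_VIEW_QUERY = """
query ProfileView($uuid: String!) {
  uniqueIdentity(uuid: $uuid) {
    uuid
    identities { source email username }
    profile { name gender country }
    enrollments {
      start
      end
      organization { name domains { domain } }
    }
  }
}
"""


def build_graphql_request(query, variables, url=GRAPHQL_URL):
    """Wrap a whole GraphQL document in a single HTTP POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(PROFILE_VIEW_QUERY, {"uuid": "0123abcd"})
# urllib.request.urlopen(request) would send it; that step is left out here.
```

The REST equivalent would need separate calls for the identities, the profile, the enrollments and each organization's domains; here everything travels in one body.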
I just want to point out the weakest points of the traditional approach and the strongest ones of GraphQL. As I said, REST is just a convention between the server and the client; it's not a specification, so you may have your own rules. You can get more data than you need or less data than you need: that's over-fetching and under-fetching. The API documentation is not tied to development, and we all know how hard it is to write good documentation for the things we code. And, as you saw in the last slide, you have multiple requests per view. What about GraphQL? It's strongly typed: you define the abstract types you have and the type of each field. The client defines what it receives, as you saw in the previous query: the client says how it wants to receive the information, and the server only sends what the client needs. And there is one single request per view. So, summarizing, I think GraphQL is the winner. Now let me talk a bit about the implementation process. First, the data model and the schema. Then how to implement basic queries and mutations, which, together with the schema, are the basic entities of GraphQL. We added support for paginated results, and we are now adding authentication, so I'll explain how we are doing that, plus some next steps. About the implementation: as I said before, GraphQL is not tied to any programming language, so you find implementations for different languages. Graphene is a library that provides tools to implement a GraphQL-based API, and Graphene-Django is a library built on top of Graphene. It provides some additional abstractions that make it easier to implement GraphQL functionality using Django objects. How many of you know Django or have worked with it before? OK, most of you are familiar with Django. Perfect.
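The point that the client decides what it receives can be illustrated with a toy projection: given a full record and a selection of fields, keep only the selected ones. A real GraphQL server gets the selection by parsing the query and running a resolver per field; this hypothetical `project` helper only mimics the effect on data that is already loaded.

```python
def project(data, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to None (a scalar) or to a nested
    selection (an object or list of objects). Lists are projected item by item.
    """
    if isinstance(data, list):
        return [project(item, selection) for item in data]
    result = {}
    for name, sub in selection.items():
        value = data.get(name)
        # Scalars (or missing values) are copied; objects are projected recursively.
        result[name] = value if sub is None or value is None else project(value, sub)
    return result


organization = {
    "id": 1,
    "name": "Hogwarts",
    "created_at": "2019-01-01",
    "domains": [{"domain": "hogwarts.example", "is_top": True}],
}

# Equivalent in spirit to: { name domains { domain } }
print(project(organization, {"name": None, "domains": {"domain": None}}))
```

Nothing outside the selection (`id`, `created_at`, `is_top`) reaches the client, which is the opposite of the fixed, over-fetching responses of a REST endpoint.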
OK, so the first thing we have to do is define the schema. Basically, GraphQL needs three basic entities. The schema is defined by types: which types are we going to need for our model? On top of those types, we define queries, for obtaining results by asking for objects of those types, and mutations, which are requests that change the objects we have in the database. All of that together makes the GraphQL schema. So what does this look like in Django? Instead of types, in Django we define models. For the queries, we have to define resolvers, which are functions telling how to resolve each query and how to return the results. And the mutations are operations that create, update or delete objects in the database. That's the Graphene-Django schema. About the data model: it turns out it is already a graph, as we saw in the picture before, so it makes sense to model it as a graph, with nodes and edges: nodes for identities, for profiles, for domains, for organizations, and so on. Now let's dig into the implementation. This is a basic recipe for building a query in GraphQL with the Graphene-Django library. For those in the back, please tell me if you can't see the code; remember that the slides are available on the website of this talk, where you can find a link to them. First, you define a class in the models.py file in Django. Here we define a simple one, Organization, which has only one field, name, a CharField. This is Django syntax. Then, in the schema script, we tell GraphQL: this is an OrganizationType object, and it's based on the model we have just created.
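Graphene-Django builds a GraphQL type by introspecting the Django model. The toy below mimics that idea without Django: a dataclass stands in for the model, and a hypothetical `to_sdl` helper prints the resulting type in GraphQL schema language. The field list follows the fields mentioned in the talk (id, creation and last-modified timestamps, name); the Python-to-GraphQL scalar mapping is an assumption, and unlike Graphene, which camel-cases field names by default, it keeps the names as they are.

```python
from dataclasses import dataclass, fields
from datetime import datetime


# Stand-in for the Django model; the real one subclasses django.db.models.Model.
@dataclass
class Organization:
    id: int
    created_at: datetime
    last_modified: datetime
    name: str


# Assumed mapping from Python types to GraphQL scalars (DateTime is a custom scalar).
SCALARS = {int: "Int", str: "String", datetime: "DateTime"}


def to_sdl(model, type_name):
    """Derive a GraphQL object type, in SDL, from a dataclass's fields."""
    lines = [f"type {type_name} {{"]
    for f in fields(model):
        # Primary keys are exposed as GraphQL's ID scalar.
        scalar = "ID" if f.name == "id" else SCALARS[f.type]
        lines.append(f"  {f.name}: {scalar}!")
    lines.append("}")
    return "\n".join(lines)


print(to_sdl(Organization, "OrganizationType"))
```

This generated description is also what an interface like GraphiQL reads to build its automatic documentation panel.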
Then we can define the different queries we want to have. The first one is a query asking for organizations, which returns a list of OrganizationType objects. Then we have the resolver, which gets the objects and orders the results by name. And the good thing is that the documentation is already updated. I don't know if any of you have played with GraphQL before, but most implementations provide a basic interface called GraphiQL, where you can run queries and see the results, and it has a documentation section generated automatically from the schema. There you can see the query we defined and the objects it returns: OrganizationType objects, with an id, created_at and last_modified, which come from Django and from the very basic base class the model inherits from, plus the name field we added before. Building mutations is similar. We define in the schema how the mutation is called, in this case AddOrganization: we want to add a new organization, with its name, to the database. We declare that it returns an OrganizationType field, the same type we defined before for the query. Then we have another class that defines all the mutations, like the one we had for the queries, where we declare the argument we expect: the name of the organization. If you look at the mutate function, it calls another function, add_organization, behind the scenes. That's how we designed it: a method in the API script takes care of validating the input and handling the errors we may get, and that API method calls a db method that handles the interaction with the database itself. In this case, we validate the field and check for integrity errors, to make sure the organization is not a duplicate, and things like that.
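The layering described above (resolver or mutation, then an API function that validates, then a db function that talks to the database) can be sketched with the standard library's `sqlite3` standing in for Django's ORM. All function and exception names here are hypothetical; the point is where validation and integrity errors are handled.

```python
import sqlite3


class InvalidValueError(ValueError):
    """Raised by the API layer when an argument fails validation."""


class AlreadyExistsError(Exception):
    """Raised when the database rejects a duplicate entity."""


# --- db layer: only talks to the database --------------------------------
def db_add_organization(conn, name):
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute("INSERT INTO organizations (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError as exc:
        # The UNIQUE constraint turns a duplicate into a domain error.
        raise AlreadyExistsError(f"organization '{name}' already exists") from exc
    return {"id": cur.lastrowid, "name": name}


# --- api layer: validates input before touching the database ------------
def api_add_organization(conn, name):
    if name is None or not name.strip():
        raise InvalidValueError("'name' cannot be empty")
    return db_add_organization(conn, name.strip())


# --- schema layer: what the GraphQL resolver and mutation would call ----
def resolve_organizations(conn):
    """Query resolver: every organization, ordered by name."""
    rows = conn.execute("SELECT id, name FROM organizations ORDER BY name")
    return [{"id": org_id, "name": name} for org_id, name in rows]


def mutate_add_organization(conn, name):
    """Mutation: delegate to the API layer and wrap the created object."""
    return {"organization": api_add_organization(conn, name)}


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organizations ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL UNIQUE)"
    )
    return conn
```

Keeping validation in the API layer means the command line, the Python module and the GraphQL mutation could all share the same checks.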
And again, the documentation is already updated, which is a good thing. Now, about pagination. There are many ways to implement pagination. Most of the proposed methods are based on first and offset arguments; you can also use the ID of an object, or a cursor, which is a value you create for that purpose. The question you may have is: where do we get this cursor from, if we follow one of these approaches? The thing is, the cursor is not a property of the object itself. You can't store it in an organization object or an identity object; it's a property of the connection. And here comes the concept of edges and connections. Imagine you have two nodes: one is friend A, the other is friend B. The time they have been friends only makes sense in the connection; it's not a property of either of them. It's information specific to the connection, the edge, rather than to the objects themselves. There are specifications for building this in GraphQL, like Relay, but for our case it's too complex, because we would have to change the whole model we defined before. Basically, we wanted to take our own approach, but without reinventing the wheel. So we went for a hybrid approach: offset and limit, but using Django's paginator objects, which are already implemented and well tested, so we know they work well. And we also benefit from edges and connections; I'll talk about this in a moment. It may seem like we are complicating our lives, but we are not, because we are relying on a well-proven implementation. This is what a query looks like; I don't think I showed one before. The query is split into two parts. First, I'm asking for organizations and setting the filters: I want the first page, with three results per page.
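To show why the cursor lives on the edge rather than on the node, here is a small sketch of a cursor-based connection. The cursor format (base64 of `offset:N`) and the `Edge` and `Connection` classes are assumptions for illustration; it is in the spirit of Relay-style connections but is neither the Relay implementation nor the approach the talk finally chose.

```python
import base64
from dataclasses import dataclass
from typing import Any, List, Optional


def encode_cursor(offset: int) -> str:
    """Opaque cursor for a position in the result list (hypothetical format)."""
    return base64.b64encode(f"offset:{offset}".encode()).decode()


def decode_cursor(cursor: str) -> int:
    prefix, _, value = base64.b64decode(cursor).decode().partition(":")
    if prefix != "offset":
        raise ValueError(f"not an offset cursor: {cursor!r}")
    return int(value)


@dataclass
class Edge:
    node: Any     # the object itself (an organization, an identity, ...)
    cursor: str   # belongs to this edge of this connection, not to the node


@dataclass
class Connection:
    edges: List[Edge]
    has_next_page: bool


def connection_from_list(items, first: int, after: Optional[str] = None) -> Connection:
    """Return up to `first` items that come after the `after` cursor."""
    start = decode_cursor(after) + 1 if after else 0
    window = items[start:start + first]
    edges = [Edge(node, encode_cursor(start + i)) for i, node in enumerate(window)]
    return Connection(edges, has_next_page=start + first < len(items))
```

The same organization gets a different cursor in a differently filtered or ordered list, which is why it cannot be a field of the organization itself.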
Then the query has two different parts. The entities section is the information about the nodes, the objects we are asking for. The page info section is the information about the connection, in this case the pagination fields. On the left you can see the result of this query. It gives you the list of organizations and tells you: this is the first page, the page size is three, these are the fields we asked for in the query, there are two pages, there is a next page but no previous page, the results start at index one and end at index three, and there are five results in total. So the client knows how to ask for the next results. Here it is bigger, for the people in the back. This is how it is implemented behind the scenes. Basically, we create an abstract type on top of the different types. Here we have the Django paginator object we create, the query results, the node information (identities, profiles, organizations, etc.), the pagination info that goes into the connection side, and a constructor method. Every time we want to add a new type, it has to inherit from that abstract type, so in this case, for organizations we have an organization paginated type that inherits from it. Then we have to modify our queries so that, when the resolver is called, it constructs the abstract paginated type and returns paginated results. About authentication: we followed an approach based on JSON Web Tokens. I don't know if any of you are familiar with this technology. The idea is that an existing user generates a token, which has to be included in the HTTP Authorization header of the request. The token is generated by an existing user through a mutation that returns it. We are using a module called Graphene JWT.
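The offset/limit side can be sketched without Django: a hypothetical abstract paginated type slices the query results and computes the page info fields from the example above (page, page size, number of pages, next and previous, start and end index, total). In the real implementation Django's paginator does the counting and slicing; the class and method names here only echo the description in the talk.

```python
import math
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PageInfo:
    page: int
    page_size: int
    num_pages: int
    has_next: bool
    has_prev: bool
    start_index: int
    end_index: int
    total_results: int


@dataclass
class AbstractPaginatedType:
    """Hypothetical base: every paginated result carries entities plus page info."""
    entities: List[Any]
    page_info: PageInfo

    @classmethod
    def create_paginated_result(cls, query_results, page=1, page_size=10):
        total = len(query_results)
        num_pages = max(1, math.ceil(total / page_size))  # an empty result still has page 1
        if not 1 <= page <= num_pages:
            raise ValueError(f"page {page} out of range 1..{num_pages}")
        offset = (page - 1) * page_size
        entities = list(query_results[offset:offset + page_size])
        info = PageInfo(
            page=page,
            page_size=page_size,
            num_pages=num_pages,
            has_next=page < num_pages,
            has_prev=page > 1,
            start_index=offset + 1 if entities else 0,  # 1-based, 0 when empty
            end_index=offset + len(entities),
            total_results=total,
        )
        return cls(entities, info)


class OrganizationPaginatedType(AbstractPaginatedType):
    """Concrete type for organizations; it only inherits the pagination logic."""
```

With the five organizations of the example and a page size of three, page 1 reports two pages, a next page, no previous page, indexes 1 to 3 and five results in total; adding a new paginated type is just another subclass.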
And how do you test this? It was a bit difficult at first, until everything was set up. First, you need an application that can modify the headers of the request; otherwise you won't be able to perform the request and get the result. And configuring the Django CSRF token is not trivial if you haven't done it before, as those of you who have played with Django may know. I used an application called Insomnia; I recommend having a look at it. Here you can see the CSRF token in a request cookie, and then the Authorization header with the JWT token: these three letters, JWT, followed by the long token. Unit testing this also gave us some problems, because we had to build a request context for the tests to make it work. I don't think I have time for more, but I have a bonus slide in case you want to see how to do filtering. As for future work, we have to implement a command-line client and a web client for the server part we are developing. We also have to limit nested queries, which is a known problem in GraphQL: you can ask for organizations, then their domains, then the organizations of each of those domains, and so on, ending up with an endless chain of nested queries. And of course, feedback is welcome: you have the link to the repository on the talk's main page. Finally, SortingHat is part of GrimoireLab, a bigger platform for producing software development analytics, so please have a look. Thank you. Any questions? I think we're out of time. Thank you.
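To see what goes into the Authorization header, here is a standard-library sketch of an HS256 JSON Web Token (base64url-encoded header and payload, joined by dots and signed with HMAC-SHA256) and of the headers a client would send. In practice a JWT library issues and checks the tokens; the secret, the payload fields and the `X-CSRFToken` header choice are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWT uses base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest())


def make_jwt(payload: dict, secret: bytes) -> str:
    """Sign `payload` as an HS256 token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Check the signature and the optional `exp` claim; return the payload."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(signing_input, secret)):
        raise ValueError("invalid signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    if "exp" in payload and payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload


def auth_headers(token: str, csrf_token: str) -> dict:
    """Headers for an authenticated GraphQL request (CSRF header name assumed)."""
    return {
        "Authorization": f"JWT {token}",
        "X-CSRFToken": csrf_token,
        "Content-Type": "application/json",
    }
```

The server only needs the shared secret to check that the token was issued by it and has not expired, which is why the token can travel with every request instead of a session lookup.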