A short introduction of myself: I'm Yap, co-founder of Kogo, which is a content library for professional services firms. I'm on Twitter and GitHub, so if you're interested you can find me there. Before I started Kogo I worked as a data scientist. I don't really have a background in computer science, so sometimes I don't know all the algorithms, so forgive me for that. My talk is about data loading, and specifically about a problem you probably run into if you use GraphQL. It's been on my mind for some time, there wasn't really a great solution for it in Elixir, and that's the topic of my talk. It's actually not specific to GraphQL: if you use other mechanisms to load data, you probably have the same issues, but with GraphQL you run into it very quickly. So, who in this room has heard of GraphQL? A lot of people, that's good. And how many people have actually used it before? Okay, so I don't need to explain a lot about GraphQL. This is a typical GraphQL query. You say: I'm looking for a profile. From this profile I want the name. I also want the name of this person's best friend. And for the first five friends, I want the name and also the best friend's name. So this is quite nested, right? If you run this, you get back a big map, and the GraphQL server fills in the things you asked for. You run the query in your app, saying "this is the data I want", and you get the data back from the GraphQL server: I get back a name, I get back the name of the best friend, and then I have a list of friends with the same information. The problem here is that you run into the N+1 problem.
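The query on the slide isn't included in the transcript; based on the description, it would look roughly like this (the field and argument names here are my guess, not the speaker's actual schema):

```graphql
{
  profile {
    name
    bestFriend { name }
    friends(first: 5) {
      name
      bestFriend { name }
    }
  }
}
```

The response mirrors the query shape: a map with `name`, a nested `bestFriend` map, and a list of five `friends` maps, each with their own `bestFriend`.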
And in this case, because you're asking for the list of friends, if you run a query for each friend, you reach out to the database once per friend. For a small list that's not a problem, but with longer lists and deeper nesting it is, because there's quite a lot of overhead in sending a query to the database. You also get the waterfall effect: you wait for the first friend to arrive, and only then do you query the next friend. For more complex queries this gets pretty slow, and it will actually be noticeable. On the other side, you're also overloading the database: if you scale your service, you're basically hammering the database with a lot of queries. As queries get more complex it gets dramatically slower, roughly quadratically depending on the list sizes, and deeper nesting compounds that. So let's forget about the GraphQL query. If we write this as Elixir code, we might have this context module, which is where the business logic lives. We might have a function to get a user, which asks the database for that user, and another function to get the friend IDs of a particular user: given a user ID, ask the database for the IDs of all that user's friends. The database can be basically anything, that's not really important; it can be Ecto or raw SQL or whatever, the problem is the same for every database. Then, to load a user together with their best friend, the function can look like this: you ask the users context for this ID and get back a user, take the best friend ID from that user, and send it to the users context again to get back the best friend.
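The Elixir context described above isn't reproduced in the transcript, but the naive pattern can be sketched in Python with an in-memory "database" that logs every round trip (the names and data here are made up for illustration):

```python
# Fake database: every db_get_* call is one round trip, which we log.
USERS = {
    1: {"name": "Me", "best_friend_id": 3},
    2: {"name": "Alice", "best_friend_id": 1},
    3: {"name": "Bob", "best_friend_id": 2},
}
FRIENDS = {1: [2, 3]}
QUERY_LOG = []

def db_get_user(user_id):
    QUERY_LOG.append(f"get user {user_id}")
    return USERS[user_id]

def db_get_friend_ids(user_id):
    QUERY_LOG.append(f"get friend ids {user_id}")
    return FRIENDS.get(user_id, [])

def load_user(user_id):
    """Load a user and their best friend: two queries per call."""
    user = db_get_user(user_id)
    best_friend = db_get_user(user["best_friend_id"])
    return {"name": user["name"], "best_friend": best_friend["name"]}

def load_profile(user_id):
    """Naive version: one load_user call per friend, hence N+1 queries."""
    me = load_user(user_id)
    friend_ids = db_get_friend_ids(user_id)
    friends = [load_user(fid) for fid in friend_ids]
    return {**me, "friends": friends}

profile = load_profile(1)
print(len(QUERY_LOG))  # 7 round trips: 2 for me, 1 for friend IDs, 2 per friend
```

Every friend added to the list costs two more round trips, which is exactly the waterfall the talk describes.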
The result goes into a map, as you can see. Then, to get the full data set that the query asked for, there's a load_profile function. It loads me as a user, requests the friend IDs, maps over the friend IDs and calls the same load function for each friend, and the last step combines those two results into a single map. So this gives the same result as the GraphQL query, written out as Elixir code. How does that work out? If we do the first call, I get myself and my best friend. The right side is the database log, where I log what the database does: get user 1, get user 3. That's still fine. Then I ask for all the friend IDs, so it does another query for the friend IDs. That's fine too, but then if I load the whole profile, I get all these ten requests to the database, and that's the N+1 problem again. There are a few ways to optimize this. The most important one is batching: for the long list of friends, I want to send a single query to the database saying, I want all of these users. That's the most important thing; if you do this, you're probably fine. There are a few other things you can do as well. Caching: as you can see in this log, the fourth request is again user 3, but we already requested that user as the second request. If we apply caching, we don't have to repeat it, because we already have the result, and there are a few other duplicated requests as well. And finally you can run things in parallel when they don't depend on each other. Then you don't have a waterfall: you may still need two requests to the database, but they can happen at the same time, which also makes the response quicker. Now, if you use Elixir, Absinthe is the most popular GraphQL server.
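The batching and caching optimizations can be made concrete with a small sketch (again with made-up data; the point is the query log): batch all the user IDs into one query, and cache so that duplicated IDs are fetched only once.

```python
USERS = {1: "Me", 2: "Alice", 3: "Bob", 4: "Carol", 5: "Dave"}
QUERY_LOG = []

def db_get_users(ids):
    """One round trip for the whole batch, like SELECT ... WHERE id IN (...)."""
    QUERY_LOG.append(f"get users {sorted(ids)}")
    return {i: USERS[i] for i in ids}

def load_users(ids, cache):
    # Caching: only fetch IDs we don't already have, one copy of each.
    missing = {i for i in ids if i not in cache}
    if missing:
        cache.update(db_get_users(missing))  # batching: a single query
    return [cache[i] for i in ids]

cache = {}
load_users([1, 3], cache)           # me + best friend: one query
load_users([2, 3, 3, 4, 5], cache)  # friends: user 3 is cached, one more query
print(QUERY_LOG)
```

Two round trips instead of seven, no matter how long the friends list gets.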
There's another one as well, but I think Absinthe has the most traction within the Elixir community, and they created Dataloader. Dataloader was originally created by Facebook, which released versions in different languages, and the Absinthe project created a version for Elixir. How it works is that you declare your data needs: you say, I want all these users. Then you explicitly run the Dataloader, it fetches everything in one go, applying batching, caching and so on, and afterwards you can get the data back from the results. This is how it looks in code: you get the loader, you load (here I load user 1 and user 2), then I explicitly run the loader, and then I get the values back from the result, which is a data structure containing the results. So how does this work in your GraphQL layer? Each resolver gets a Dataloader. Resolvers are a bit like controllers, if you're familiar with Model-View-Controller: a resolver's job is to fetch, for instance, a user. In a resolver I can then load the data and supply a callback, and the middleware combines all of that, runs the Dataloader once, and then runs all the callbacks. This is how it looks. There is a shorthand, but written out explicitly: in your resolver you have the loader, you load the thing you want, and you supply a callback with on_load. Internally within Absinthe, the callback just returns a tuple that says: middleware, you have to resolve this. The middleware then combines all the dataloaders and runs the callbacks. Right, so problem solved? We don't have the N+1 problem, we query data efficiently. Well, in my opinion, I wasn't really happy with this.
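This isn't the actual Elixir `Dataloader` API, but the load/run/get flow described above can be sketched minimally in Python: declare what you need, run once, then read the results.

```python
class Loader:
    """Toy dataloader: collect keys, fetch them all in one batch on run()."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # fetches many keys in one round trip
        self.pending = set()
        self.results = {}

    def load(self, key):
        if key not in self.results:   # caching: skip keys we already have
            self.pending.add(key)
        return self

    def run(self):
        if self.pending:
            self.results.update(self.batch_fn(self.pending))
            self.pending = set()
        return self

    def get(self, key):
        return self.results[key]

calls = []
def fetch_users(ids):
    calls.append(sorted(ids))  # one query for the whole batch
    return {i: f"user-{i}" for i in ids}

loader = Loader(fetch_users)
loader.load(1).load(2)  # declare data needs; nothing is fetched yet
loader.run()            # single batched fetch
print(loader.get(1), loader.get(2), calls)
```

The important property is the explicit `run()` step between declaring needs and reading results; that is where the batching happens.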
And the most important reason was the single responsibility principle. If you do this, you do the data loading in your resolver, while in my opinion the resolver should only be responsible for calling out to the context. The creators at Facebook say the same thing: keep your GraphQL layer as thin as possible and do all your business logic in a different module. So that wasn't optimal. Another point is that if you do this in your resolvers, only your GraphQL API gets the benefit of efficient data loading. If you add a REST API, or a different API, or an integration with Slack, that data will not be loaded efficiently, and you have to do the same work again for that use case. And another problem, which for us was a really important one, is that you might need extra data that the GraphQL query doesn't mention. For instance, if I load a profile, I might need extra data to determine whether you're actually allowed to see that profile: we might want to check whether we're friends, and otherwise you cannot see it. That's another data point you need from the database, and that's pretty hard if you do all of this in your resolver. So how do other implementations solve this? There's a very popular DataLoader in Node.js that uses promises, and it actually gives you a way to use the data loader in your business logic. Because the data loader works with promises, every time you loop over something or call it multiple times, it waits for the next tick of the event loop before doing anything, and internally it batches everything that happens in the same tick. That's really nice; it solves the problem. However, if you look at Elixir, we don't have that. We don't have an event loop.
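The Node.js DataLoader trick (batching everything requested in the same event-loop tick) can be imitated in Python with asyncio; this is a rough sketch of the idea, not the real DataLoader implementation:

```python
import asyncio

class DataLoader:
    """Toy version of the Node.js DataLoader idea: loads requested in the
    same event-loop tick are collected and fetched as one batch."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.queue = []        # (key, future) pairs awaiting a batch
        self.scheduled = False

    def load(self, key):
        fut = asyncio.get_running_loop().create_future()
        self.queue.append((key, fut))
        if not self.scheduled:
            self.scheduled = True
            # Dispatch after the current tick, so concurrent loads join the batch.
            asyncio.get_running_loop().call_soon(self._dispatch)
        return fut

    def _dispatch(self):
        batch, self.queue, self.scheduled = self.queue, [], False
        results = self.batch_fn([k for k, _ in batch])
        for key, fut in batch:
            fut.set_result(results[key])

calls = []
def fetch(keys):
    calls.append(list(keys))
    return {k: f"user-{k}" for k in keys}

async def main():
    loader = DataLoader(fetch)
    # Two concurrent loads in the same tick resolve with one batched fetch.
    return await asyncio.gather(loader.load(1), loader.load(2))

result = tuple(asyncio.run(main()))
print(result, calls)
```

This is exactly the mechanism Elixir lacks: there is no shared event-loop tick to piggyback on, which is what motivates the deferral pattern later in the talk.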
We don't have ticks, or the concept of ticks. Processes are truly concurrent, and they're isolated. So how would you do this? If I used the data loader inside, for instance, a get_user function, it executes eagerly: if I get a bunch of users, they will not be batched together, because each call runs its query inside that function. My first idea was to set up a GenServer: you send a message to the GenServer saying, I want to load this, and if any other process reaches out to the GenServer at the same time, the GenServer can combine the requests into a single query. There are, however, a few downsides. It blocks execution for the process you're in: if you're not parallelizing, if you're looping over something rather than doing a parallel loop, it blocks the whole process, so no batching happens; it waits until the data loader comes back with a result before moving to the next item. That's a real problem. And we also have to wait some fixed amount of time, because how do we determine that the first message has come in and the last request for data has finished arriving? There are no ticks or anything like that to batch them together, so we have to wait a certain amount. Maybe there's actually a way around this, but in my experience these are serious downsides to that approach. There's another way to do this that I thought about: using the deferral pattern. A deferral is a lot like a promise, but a deferral you have to actively run or trigger. The downside is that it's not an official Elixir language feature; it doesn't exist yet, so it's something you need to implement. But the upside is that it's lazy. It's a lazy data structure.
It's non-blocking. You can loop over it, call it multiple times, and there's no delay you need to build in. The good part is that you can run it at the very end, and that's really good for GraphQL: at that point you know exactly what's being requested, you know the entire query, and only then do you resolve it and load all the data as efficiently as possible. [In response to a question] I'll have to stick with the example I prepared, but I'll go through it in quite some detail later, so let's see if that answers the question then. So, the deferrable protocol. It just describes the requirement; how you fulfil it is entirely up to the implementation, which is not described in the protocol and can be anything. You can think of the deferrable protocol a bit like Enumerable: it needs another module that implements the protocol. What's always included is a then callback: you say, I need this, and after this, I want to do that. The then callback can itself return a deferrable, so you can have a deferred value that resolves into another deferred value, or into a list of deferred values. And as I said before, you have to actively trigger it. If you have a deferred value, it doesn't do anything yet; it doesn't query the database. But when you call Defer.run, it does everything and comes back with the real result. [Audience: Did something like this already exist?] I had to create it. I looked for something like this on Hex, the package system. I didn't really look at Erlang, so there might be something there, but there was nothing in Elixir as far as I know. So, going back to the data loader, and the example of how the data loader works.
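The Elixir protocol itself isn't shown in the transcript, but the deferral pattern as described (a lazy value with a then callback, where nothing happens until you explicitly run it, and then may return another deferred) can be sketched like this in Python:

```python
class Deferred:
    """Lazy value: records which key to fetch and what to do afterwards."""
    def __init__(self, key, callback=None):
        self.key = key
        self.callback = callback  # runs only after the value is resolved

    def then(self, fn):
        return Deferred(self.key, fn)

def run(deferred, fetch):
    """Actively trigger a deferred: only now does the source get queried."""
    value = fetch(deferred.key)
    if deferred.callback:
        value = deferred.callback(value)
        if isinstance(value, Deferred):  # then() may return another deferred
            value = run(value, fetch)
    return value

calls = []
def fetch(key):
    calls.append(key)
    return f"user-{key}"

d = Deferred(1).then(
    lambda me: Deferred(3).then(lambda bf: {"me": me, "best_friend": bf})
)
print(calls)  # [] -- still lazy, nothing has been fetched yet
result = run(d, fetch)
print(result, calls)
```

Building `d` touches nothing; only the explicit `run` call performs the fetches, which is the property that lets a GraphQL layer resolve everything at the very end.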
On the left, we load user 1, load user 2, run the data loader, get back user 1 and user 2 using get, and put them in a map. On the right you can see how it works with the lazy version: we create a deferrable where we get user 1 and get user 2, and we supply a Defer.then callback. That callback only runs when both are resolved, and then it gets the two users back. If I then want the real result: calling get_users gives me the first value, which is just a data structure; I can't do anything with it yet. If I call Defer.run on that value, supplying the data loader as well (that's something I need to set up beforehand), it resolves into a list of two users. Going back to the problem: this actually solves it, because we can put this in our context. You have a context that can optionally return deferred values; we can call it multiple times, combine everything, and only then run the data loader and efficiently load all the data. So this solves the problem, but it's not the best developer experience. In my opinion, a lot of callbacks make code quite unreadable. If you've ever used Node.js, you know the callback hell of deeply nested callbacks, and in the Node.js world they came up with async/await. It would be really nice to have the same kind of syntax, but for deferrables. As an example: we mark this as a deferred function with the defer keyword, we await the two users here, and only when they come back is the next line executed, where we put them in the map. This looks really similar to code that just talks to the database and gets the two users directly.
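Elixir's defer/await macros aren't reproduced in the transcript; as a rough analogy for how await-style code can be flattened over deferreds, here's a Python sketch using generators, where `yield` plays the role of await and the runner plays the role of Defer.run, feeding resolved values back in:

```python
def run(gen, fetch):
    """Drive a generator-based 'deferred function': each yielded key is
    resolved and sent back in, so the function body reads top to bottom."""
    try:
        key = next(gen)
        while True:
            key = gen.send(fetch(key))  # resume with the resolved value
    except StopIteration as done:
        return done.value

def load_user_and_best_friend(user_id):
    user = yield user_id                 # "await" the user
    best = yield user["best_friend_id"]  # "await" the best friend
    return {"name": user["name"], "best_friend": best["name"]}

USERS = {
    1: {"name": "Me", "best_friend_id": 3},
    3: {"name": "Bob", "best_friend_id": 1},
}

result = run(load_user_and_best_friend(1), USERS.__getitem__)
print(result)
```

The function body has no visible callbacks, yet each `yield` is still a suspension point where control returns to the runner, just as each await compiles down to a then callback.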
And if your GraphQL layer ran the data loader automatically, if it knew how to handle deferred values, you might not even need the explicit run; you'd just have this code. But this means adding a language feature, which sounds pretty tricky. I thought about this, and I came up with this defer/await API, but I didn't really know how to do it. I thought, maybe we have to propose this to the Elixir language, and maybe they'll add it if we're successful, or maybe they won't think it's necessary. Then I realized: 80% of Elixir is written in Elixir, and we have macros. So maybe this is actually possible to do with macros. I wasn't sure, I played around a bit, and it turns out you can. The defer package brings the deferrable protocol to Elixir, and it includes the defer and await statements. This is something you can get from Hex right now. You need to combine it with an implementation, so the defer package by itself will not be very useful. It's actually quite a simple library; the macros are really powerful. It's not trivial, but it's also not a lot of work, and the protocol itself is really, really simple: most of the package is the macro transformations needed to support this syntax. And lazy_loader is the first use of the deferrable protocol. It uses the original Dataloader and implements the protocol on top of it, exactly as I just showed. It's also quite simple, because I didn't write a whole data loader implementation; I could just build on the existing one. So let's go back to our first example.
So, the context I showed before, where the users context has a get function and a get_friend_ids function: how do we do this with the new deferrable lazy loader? In the get function you call LazyLoader.get. This works the same way as the normal data loader I showed, except you don't have to call load, run and get separately; you just call get. And get_friend_ids works the same way: you call the lazy loader and say, I want the friend IDs for this user. As you can see, it's very similar to just using the database directly. We do have to write a source for the data loader; if you're using Ecto that already exists, so you don't have to write anything new. The database here is an example database, which I wrote on top of the key-value source, a very simple key-value data store. If we go back to load_user and load_profile and see how they work with deferrables, it's also very similar to the simple, naive case that had the data loading problem. Again, you get the user and await it: you call the lazy users context and await the result. For the best friend, you also call LazyUsers.get and await it, and then you create the map that includes you and the best friend. For the whole profile, we call this lazy load_user and await it, then get the friend IDs: we call the LazyUsers context again and await the result, so we get back the friend IDs. Then we map over the friend IDs and call the lazy load_user function again. The last line is exactly the same; it combines the two maps. As you can see, this code looks really similar to how it would look if you just called a normal database directly, and that's really nice.
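Putting the pieces together, the crucial step is that a list of deferred values can be resolved with one batched fetch. This Python sketch (my own illustration, not the lazy_loader internals) shows a "lazy context" whose get returns a deferred, and a run step that batches and deduplicates:

```python
class Deferred:
    """Lazy handle for one key plus a callback to run on the resolved value."""
    def __init__(self, key, callback=lambda v: v):
        self.key = key
        self.callback = callback

    def then(self, fn):
        return Deferred(self.key, lambda v, cb=self.callback: fn(cb(v)))

def run_all(deferreds, batch_fetch):
    """Resolve a list of deferreds with a single batched, deduplicated fetch."""
    keys = {d.key for d in deferreds}
    values = batch_fetch(keys)  # one round trip for all distinct keys
    return [d.callback(values[d.key]) for d in deferreds]

calls = []
def fetch_users(keys):
    calls.append(sorted(keys))
    return {k: {"id": k, "name": f"user-{k}"} for k in keys}

# The "lazy context": get returns a deferred instead of hitting the DB.
def lazy_get(user_id):
    return Deferred(user_id)

friends = [lazy_get(i).then(lambda u: u["name"]) for i in [2, 3, 3, 4]]
results = run_all(friends, fetch_users)
print(results, calls)
```

Four deferred lookups, including a duplicate, collapse into a single fetch for three distinct keys, which is what the context gains by returning deferrables instead of querying eagerly.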
So if we compare it to the earlier implementations: we query the name of the first user; that's quite similar. Then we query the name of the best friend; still the same, that's fine. Then we get the friend IDs, and here it starts to differ. In this case the naive version would query five different users, while the new implementation queries only four, because the data loader automatically caches, so it only asks for the users it doesn't already have. And when we query the best friends, again there's some duplication, so we only need to ask for three different users here. The code looks very similar, but we get efficient data loading for free. The conclusion is that with this approach, data loading can now be done in the context. You simply add defer to the function definition when you need it, you add await when you want to await a deferrable, and when you need to resolve a deferred value you call Defer.run and get back the real result. Everything is then combined as efficiently as possible. The data loader itself already does parallelization, it already does caching, and it batches results as well as it can. There may still be improvements to make, but those go into the data loader code; you don't need to change your application code. You can improve the data loader and get all those benefits for free. So, can you use it now? I wanted to say we're using it in production, but I didn't get it into production yet; I'm planning to really soon. We actually have the problem that we have quite complex authorization, so we need a lot of different data loading in different scenarios, depending on the situation, to determine whether a user can see what they're looking for. And that's actually not possible with the existing data loader.
So this is the first place we'll use it. Right now the queries are quite inefficient, and I'm really looking forward to using this. The APIs might change; it's pretty new. I opened a pull request on Dataloader, because the eventual goal, I think, is to have this in the official Dataloader package; it's a solution to a problem a lot of people have. So the goal is to have it within the Dataloader package, but you can use it now: I've released a separate package built on Dataloader that includes this, called lazy_loader, so you can just use that. And if there are any ideas on how to make this better, or any feedback, I'm really interested. [Audience: Look at Haxl, Facebook's Haxl; it solves very similar problems, and that's in Haskell. Some people have ported it to C#, so you could probably port the ideas as well.] Yeah, I actually had a quick look at that as well. I think the idea is quite similar to Dataloader, so maybe there are some improvements to be made there. All the code I had in the presentation is real code that you can run. I published it on GitHub, so if you really want to see the nuts and bolts and dive a little deeper, you can look at that repository and run it. So if you're interested, have a look. Thank you very much. Any questions? [Audience: So basically the await macro wraps all your code; previously you would be writing callbacks, and it just removes the callbacks?] Yeah, it wraps the callbacks, so it's still doing the same thing underneath. And I'm not sure you can remove them entirely, because it's quite dynamic: you don't know in advance whether you'll get back a plain value or a deferred one.
So there might be something you could do with static analysis to eliminate the callbacks, but only in some cases; in general you still need them. [Audience: But it hides the callbacks; you're just making it easier to read.] Yeah, it's basically syntactic sugar to make it more readable. [Audience: Like how they used to wrap Node.js callbacks.] Yeah, it's doing the same thing; it's kind of like a Babel plugin. [Audience: Can you go back to the code? Where exactly does the batching happen?] Right. So this is, I think, the most important part. It maps over all the friend IDs and creates a deferrable for each, so this will be a list of deferred values. With await on this list, you basically say: create a callback, and after this list resolves, run this. Inside the deferrable code you still need to trigger it, so you still need to call Defer.run. But when it has a list of deferrables, it uses the data loader to batch them together. So yes, underneath it's still callback, callback, callback. [Audience: Why can't you run them using an Elixir Task and then join the results?] With a Task you still do the individual queries. You can do that, and then they'll be parallelized, so the queries run at the same time, but it's still five queries you need to run. What we want is, if you're using SQL, a single SELECT with the list of user IDs in an IN clause: it becomes a single query. And especially with deeper nesting and a lot of values in the lists, instead of, say, a hundred queries for a single request, you only have about five.
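The Task-vs-batching point can be made concrete: parallel tasks still issue one query per ID, while batching collapses them into a single IN query. A small sketch (the SQL strings are illustrative):

```python
def parallel_queries(ids):
    # Tasks would run these concurrently, but it is still one query per friend.
    return [f"SELECT * FROM users WHERE id = {i}" for i in ids]

def batched_query(ids):
    # Batching: a single query covering the whole list.
    placeholders = ", ".join(str(i) for i in ids)
    return [f"SELECT * FROM users WHERE id IN ({placeholders})"]

friend_ids = [2, 3, 4, 5, 6]
print(len(parallel_queries(friend_ids)))  # 5 queries, merely parallelized
print(len(batched_query(friend_ids)))     # 1 query for all five friends
```

Parallelism hides latency but keeps the per-query overhead and database load; batching removes both.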
[Audience: In my previous job I did some experiments with Neo4j; that's a way to traverse graphs as well.] Yeah, this is more general-purpose: if you use GraphQL, most people still use a relational database underneath. But you could probably use the data loader with Neo4j too, because a lot of the time you still need to do some optimization of how queries are executed. [Audience: Neo4j tries to prevent the multiple-queries problem out of the box, and it seems to be a good fit for this problem domain.] Yeah. And I think deferrals aren't really used in Elixir because a lot of the problems you'd use them for elsewhere can be solved with processes; this one is slightly different. [Audience question about batch sizes] It batches as much as possible right now, and generally that's quite okay. But Dataloader is also a generic interface: they have a source for Ecto, for instance, and if you have specific requirements you can write your own source, which can do any optimization you might want. So if batch size is a problem, you can write a source that only issues batches of a particular size. All right, if there are no more questions: thank you.