Good afternoon, everybody. My name is Emil, and this is my talk: Turn Hours into Seconds Using Flow for Concurrent Processing. I was supposed to give this talk at ElixirConf in the UK, but I had to cancel my travel because I fell sick. So let me take this opportunity to thank all of you for giving me this wonderful opportunity to talk about something I'm really excited about. Let's get started. I work with a small team of developers at Codemancers, where we build web applications for our clients. This talk is based on my recent experience working on a soft real-time system that we built completely with Elixir. We're going to look at some of the performance optimizations we did that brought down the execution time of a critical part of the system from a little more than one hour to a few seconds. First, we'll look at an example that is similar to the real-world problem we worked on. Then we'll see the first version of the code, which had a lot of concurrency problems. Then we'll look at the second version, which fixes the concurrency problems but brings in some other problems. And finally, we'll talk about the actor pattern and the message-passing concurrency model, and how we used them to solve the problem in Elixir. So let's look at our example. We are building a module that does season rankings for a multiplayer game. After the season starts, players from all over the country can come and play this game — there might be around 100 players in a game — and they can keep playing until the season ends. At the end of the season, awards go out, like prizes for the top scorers. So there is a leaderboard that runs throughout the season, and the players have a rank on this leaderboard. Basically, players get scores based on their actions in the game, which in turn affect their ranks on the season leaderboard.
Remember that our ranks are not per game but for the whole season, and players can check their ranks whenever they want. The score calculation is a function that looks like this. It is based on the different in-game events — kills, deaths, and assists. If you're familiar with Counter-Strike or something like that, it's similar: there are kills, deaths, and assists, and the final score is a function of these. These in-game events are stored in a database as a game progresses. The first version of the code is as simple as this: we have a function that calculates the scores of all the players of a game, then re-ranks all the players in the system, and then saves the scores and ranks in the database. We call this function right when the game ends, and we make it asynchronous by wrapping it inside a Task. So after each game, a task is spawned that runs the scoring and ranking. And because tasks run inside separate processes, they can run concurrently. But when we tested this first version, we found that the calculations were going wrong: we would get duplicate ranks, and this would happen randomly. So we clearly have concurrency problems here. Let's understand what's going wrong. Say the current top score in the season is 100, and two games finish at the same time, so two asynchronous tasks are spawned to calculate the scores and ranks for the players. In each task, the algorithm finds one top scorer, because in each game there is one player whose score is more than 100. So when these tasks finish execution, we end up with two players with rank one. How do we solve this problem? What is the first thing we are taught to do when we have concurrency problems? We use locks, right?
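A minimal sketch of this first version, with hypothetical module and function names (the database-backed steps are stand-ins). Each finished game spawns a Task, and because each Task is a separate process, scorings run concurrently — which is exactly what causes the duplicate-rank race described above:

```elixir
defmodule SeasonScoring do
  # Called when a game ends; Task.start/1 makes the scoring asynchronous.
  def on_game_end(game_id) do
    Task.start(fn -> score_players(game_id) end)
  end

  # Score the game's players, re-rank everybody, persist the result.
  def score_players(game_id) do
    game_id
    |> calculate_scores()
    |> rerank_all_players()
    |> save_scores_and_ranks()
  end

  # Stand-ins for the real database-backed steps.
  defp calculate_scores(game_id), do: %{game_id: game_id, scores: %{}}
  defp rerank_all_players(result), do: Map.put(result, :ranks, %{})
  defp save_scores_and_ranks(result), do: {:ok, result}
end
```
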
So we wrap our code inside a lock so that only one scoring run can happen at a time. This is how we take global locks in Elixir — or Erlang, for that matter. We use :global.trans/2 for this, and it makes sure that only one of these runs at a time across all the nodes in your cluster. Now, without concurrency, this is how the execution timeline looks: because there is no concurrency, it takes a really long time for the scoring to finish, but at least we don't have any bugs now. But wait — we actually have some other problems too. We are still in beta, which means we are still fine-tuning our scoring function. There are some constants inside the function that we have to adjust to remove outliers in our scoring: every now and then somebody ends up with an abnormally high score because of some quirk in the formula, and we remove such outliers by adjusting the constants. Every time we do this — and we might do this multiple times a day — we have to redo the scoring of all the players for the whole season. And this is how the iteration looks: the requirements change, we understand the requirements and change the code, and after we change the scoring function, we run the scoring again and wait for about an hour, because we have a lot of games and we redo the scores for all of them. And then we find out we have another bug. The problem is that our scoring code is not idempotent. In order to avoid recalculating a player's score from all the in-game events of the whole season again and again, we calculate the score for the current game and just add it to the existing score of that player. This makes the calculation really fast, but it means we cannot run scoring multiple times — the scores would get added multiple times and we would end up with wrong scores.
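For reference, the global lock from the start of this section can be sketched like this (LockedScoring is an illustrative name, and the body is elided). :global.trans/2 takes a lock id of the form {resource, requester} and a function; it runs the function only after acquiring the lock on all known nodes, so only one scoring runs at a time cluster-wide:

```elixir
defmodule LockedScoring do
  def score_players(game_id) do
    :global.trans({:season_scoring, self()}, fn ->
      # ... calculate scores, re-rank everybody, save to the database ...
      {:scored, game_id}
    end)
  end
end
```

:global.trans/2 returns the function's result once the lock is acquired (or :aborted if it cannot be), which is why the fix works but serializes everything.
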
So now the iteration looks like this: first we change the scoring function, then we delete all the scores, then we redo the scoring from a blank state, then wait for an hour to verify — and then we end up feeling bad for ourselves, because we thought this process was good, but it's not. There has to be a better way. We have to make our scoring idempotent. But that would mean the scoring takes a really long time to complete, because we would be using all the in-game events for the whole season to calculate the scores, and we would do this for every game, one after the other — one after the other because we have locks. But hold on: why are we even using locks in Elixir when we have the actor model? The previous talks — the excellent talk by Brian — also covered the actor model, so I'm pretty sure you're familiar with it, but to recap: the actor model of concurrency says we don't take locks on shared state and change the shared state ourselves; instead, we let an actor own the state, and we send a message to the actor asking it to change the state for us. So let's rethink our solution in terms of actors and message passing. When each game ends, it sends a message to a scoring queue, and the message is a game ID. The scoring queue forwards a list of collected game IDs to an actor that fetches a list of player IDs from that list of game IDs. This actor now has a list of player IDs, and it sends it to an actor which is responsible for scoring the players. This scoring function is going to be idempotent — it will use all the in-game events from the database for the whole season. And because this actor is going to be around for some time, we can do some interesting things here, like caching and eager loading. That's one thing to keep in mind.
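The actor idea can be sketched with a plain process and a hand-written receive loop (CounterActor and its messages are illustrative names, not part of the talk's system). The process owns its state and mutates it only in response to messages, so no locks are needed:

```elixir
defmodule CounterActor do
  # Spawn a process whose loop owns the state.
  def start(initial), do: spawn(fn -> loop(initial) end)

  defp loop(state) do
    receive do
      {:add, n} ->
        # Mutate state by looping with a new value.
        loop(state + n)

      {:get, caller} ->
        # Reply to the asking process, keep the same state.
        send(caller, {:state, state})
        loop(state)
    end
  end
end

pid = CounterActor.start(0)
send(pid, {:add, 5})
send(pid, {:get, self()})

receive do
  {:state, s} -> s  # => 5
end
```
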
Then, after scoring, this actor sends a message to another actor to trigger a re-ranking of all the players in the system. So let's look at the different ways we can build this message-passing system in Elixir. You could use processes. A process, as you already know, is an entity which executes a function and can send and receive messages. If we use plain processes, all our actors are going to be processes like this: each process has a loop which receives a message from a sending process, does some computation, and sends another message to a receiving process downstream. But this receive loop has to be written by us. We have more than processes in Elixir, though — we have OTP, which is a library of behaviours, or rather design patterns, that we can use to quickly build solutions to common problems. OTP is a higher level of abstraction, so we don't have to write custom receive loops; instead we write the callback functions that the behaviour requires and we're all set. In addition, modules that are OTP-compliant can be added under a supervision tree, which makes them fault tolerant. There are also a lot of tools that come with OTP which we can use for debugging and inspecting our systems when things go wrong. GenStage is such a behaviour, available in Elixir — it's not in Erlang — which helps us create demand-driven message-passing stages, where each stage can be a producer, a consumer, or a combination of both, a producer-consumer. Let's see how we can build our pipeline using GenStage. This is how it would look: each producer emits an event — or events — from its queue whenever the consumer downstream has capacity, or demand, to consume them. So in this case, take the second stage, which fetches players.
If it can process more events from the previous stage — the scoring queue — the scoring queue takes a bunch of game IDs from its queue and sends those events downstream. All this sounds good, but can we create something better, something like this, where we have many concurrent stages? Given my past experience building applications with a lot of moving parts in Ruby and C++, I find building something like this in those languages far from trivial. But in Elixir we can build this concurrent event-processing pipeline — one that lets us plug our application code into it — in less than 10 lines of code. All this magic and heavy lifting is done by a little library called Flow, which is created and maintained by the Elixir team. Let's recap the levels of abstraction we've talked about. First we talked about processes, which are the building blocks of state encapsulation and message passing. Then we discussed GenStage, which helps us create demand-driven message streams. And now we are discussing Flow, which helps us create pipelines that do concurrent stream processing. So what is Flow, exactly? Flow is a module with a bunch of functions built on top of GenStage, which developers can use to express computational steps as a neat pipeline that gets executed concurrently. Let's look at an example to understand this better. Say our job is to count the number of occurrences of letters in strings like this: we have a list of strings, and we have to come up with a map where we count the number of occurrences of each letter. Let's see how we can build this using the Enum module. First we have a list of strings. Next we call flat_map on the strings and split the strings into words. We end up with a big list of words, and then we flat_map again and split these words into letters.
We use a function called String.graphemes/1 for this, which does exactly that. Finally we prepare a map using Map.update/4 inside a reduce. The Map.update/4 function lets us set an initial value for each letter, and when a key collides we bump the count up by one. So the code looks like this. Now let's see how we can rewrite the same thing using Flow. The first thing we do is just replace all the Enum calls with Flow and nothing else. And then we add a call to Flow.from_enumerable/1 at the top, which takes the list of strings and hands it over to the Flow pipeline. Let's understand what these Flow functions do behind the scenes. Flow.from_enumerable/1 takes an enumerable and returns a Flow struct — I think you already know what a struct is from the previous talk. It basically sets that enumerable inside the struct and returns the struct. But it also does one smart thing: it finds out the number of cores available on your machine and sets that inside the struct as well. Flow.flat_map/2 then receives this Flow struct along with a function, String.split/1. What does Flow.flat_map/2 do with them? It takes the Flow struct and the mapping function, pushes the function onto a list of operations inside the struct, and returns the struct. Flow.reduce then receives the Flow struct and a reduce function, and does the same thing: it pushes the reducer onto the list of operations and returns the Flow struct. So all of these functions just keep adding operations to a list inside the struct and keep passing the struct downstream. From this we understand one important property of Flow: Flow is lazy.
The Flow struct that gets passed around only gets evaluated when the flow is asked to run. We can make a flow run either by explicitly asking it to run, by calling Flow.run/1, or by enumerating the flow, which forces it to run. As you see here, we can pipe the flow into Map.new or Enum.to_list, which enumerates the flow and thereby runs it — it converts the output of the flow into something you can enumerate over. The second property of Flow is that it is concurrent. As you already saw, Flow knows the number of cores in your machine, and it uses this information when spawning consumer stages to achieve maximum concurrency. When our flow pipeline runs on a machine with two cores, Flow first spawns a producer, then spawns two consumers — because we have two cores — and these consumers subscribe to the producer. Inside each consumer, Flow runs the operations stored in the Flow struct one after the other. The first operation in our Flow struct is a flat_map: it calls String.split/1 and splits each string into words. The next operation is another flat_map, which splits the words into letters, so that runs after the first flat_map. The last operation is a reduce, which results in a map of letters to their number of occurrences. Finally we call Map.new, which enumerates the flow, and the results become the input to Map.new. And we get this result. But if you look closer, you can see that the counts are wrong: there are two f's and four o's, but we get one f and two o's, right? What happened here? This happened because the same letter goes into both stages, and in each of these stages the counts are calculated separately.
That means f has one occurrence in one stage and one in the other, and when these get merged into a single map, the values from one stage overwrite the values from the other. To solve this, Flow provides something called partitioning. You call the function Flow.partition/2, which makes sure that the same message always goes to the same stage. It uses a hashing function internally. The way the hashing function works is: you give it an input and the number of possible outputs, and it categorizes the input into one of those outputs. So if you have four stages, it gives an index between zero and three for any given input. After we call Flow.partition/2, Flow creates a set of new stages downstream, and the same letter always goes to the same stage. This way the same letter does not get counted in multiple stages, and we get the expected result. Let's recap the flow pipeline we have created so far. We start by passing the strings into Flow.from_enumerable/1; they get split into words in a flat_map operation; the next flat_map operation splits the words into letters; next we partition the letters across multiple stages; and then we build the map with the counts. Here's the final code for this pipeline. There are not many changes from the Enum code, if you notice: we just replaced the Enum calls with Flow and added Flow.from_enumerable/1 at the top — you can see a comparison here. We used Flow.flat_map/2 and Flow.reduce/3 for transforming the data, but there are a bunch of other functions that Flow provides — filter, map, reject, et cetera — which are similar to the functions with the same names in the Enum module, and you can use them to transform the data in your event stream. So let's go back to our original problem of ranking players.
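Putting the letter-counting example together, here is a sketch of the two versions side by side (assuming the flow package, e.g. {:flow, "~> 1.2"}, is available; the input strings are arbitrary). The Flow version has the same shape as the Enum one, with Flow.partition/2 routing the same letter to the same stage so the per-stage maps never clash:

```elixir
Mix.install([{:flow, "~> 1.2"}])

strings = ["the quick brown fox", "jumps over the lazy dog"]
count = fn letter, acc -> Map.update(acc, letter, 1, &(&1 + 1)) end

# Eager Enum version: each step materializes a full intermediate list.
enum_counts =
  strings
  |> Enum.flat_map(&String.split/1)      # strings -> words
  |> Enum.flat_map(&String.graphemes/1)  # words -> letters
  |> Enum.reduce(%{}, count)             # letter -> occurrence count

# Lazy, concurrent Flow version: same steps, partitioned by letter.
flow_counts =
  strings
  |> Flow.from_enumerable()
  |> Flow.flat_map(&String.split/1)
  |> Flow.flat_map(&String.graphemes/1)
  |> Flow.partition()                    # same letter always hits the same stage
  |> Flow.reduce(fn -> %{} end, count)
  |> Map.new()                           # enumerating the flow forces it to run
```

Both produce the same map; only the Flow version runs the reduce in concurrent stages.
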
Now that we know how Flow works, let's reimplement the scoring using Flow. Instead of starting with Flow.from_enumerable/1, we start the flow using Flow.from_stage/2. We use this function because in our example we already have a scoring queue, which is already built and sits outside the pipeline; we basically want a consumer as the starting stage of our pipeline. Flow.from_stage/2 sets up a subscription to a producer that sits outside the pipeline. The events we receive from this producer are game IDs — lists of game IDs. Next we use Flow.flat_map/2 to find the player IDs from the game IDs: given a list of game IDs, we do a database query and find all the player IDs for those games. Next we partition by player ID, because we don't want to score the same player in multiple stages and save the scores from multiple stages concurrently — we would have weird concurrency issues if we did that. We want to make sure that a given player ID always goes to one stage, so we partition using Flow.partition/2. Next we use Flow.map/2 to calculate the score of each player, and we do this in an idempotent way, using all the in-game events we have for the whole season. Then we pass the scores to the next operation — another map operation — and save the scores in the database. After saving the scores, we emit events to a queue, which triggers a re-ranking of all the players in the system. This process that does the re-ranking is again not inside the pipeline; it sits somewhere outside. It has a queue where it receives events that trigger re-ranking, and it may choose to run, say, once in five minutes, once in 10 minutes, or maybe once in an hour.
That logic is not really part of this pipeline — it sits outside the pipeline, receives these triggers, and does the re-ranking whenever it chooses to. So here is the final code we have. We created the flow pipeline using the Flow APIs, and then we can use Flow.start_link/2. If you've seen start_link before, you know it's used to establish links between processes. That means you can use Flow.start_link/2 to create this flow pipeline and put it under a supervision tree, which makes it fault tolerant. If there is any edge case we have not handled in any of these stages or functions — PlayerScorer.score_player/1 is application code, after all, not Flow code — then a bug there crashes the whole flow pipeline, but because it's under a supervision tree, it gets restarted again. As I mentioned before, it took less than 10 lines of code to build this message-passing topology that does concurrent processing. Now let's actually look at some code and compare the first version, which had locks, with the version that uses Flow. I can't see the mirroring option — can you see this in the back? Is the font all right? Okay, I have a lot of time, and I have a plan for this. So this is a project I just scaffolded, so it doesn't have anything else in it. This is the function we already saw: there is a run function which does the scoring and everything. We wrap it inside a Task and call the score_players function with a game ID, which scores that game. Ignore this other function — it's for benchmarking, because I wanted to count the number of games scored, so I put in a callback.
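A runnable sketch of this final pipeline follows. All module and function names here are illustrative (the talk's real code is on GitHub), the scoring and lookup functions are stubs, and Flow.from_enumerable/1 stands in for Flow.from_stage/2 so the example runs without a live GenStage producer:

```elixir
Mix.install([{:flow, "~> 1.2"}])

defmodule PlayerScorer do
  # Stub: in the real system this is a database query for a game's players.
  def player_ids_for_game(game_id), do: [game_id * 10 + 1, game_id * 10 + 2]

  # Stub: idempotent scoring from the season's full in-game event history.
  def score_player(player_id), do: {player_id, player_id * 3}
end

scores =
  [1, 2, 3]                                          # game ids from the scoring queue
  |> Flow.from_enumerable()                          # real code: Flow.from_stage(ScoringQueue)
  |> Flow.flat_map(&PlayerScorer.player_ids_for_game/1)
  |> Flow.partition()                                # same player id -> same stage
  |> Flow.map(&PlayerScorer.score_player/1)
  |> Enum.sort()                                     # enumerate to run the flow
```

In the real pipeline, further Flow.map steps save each score and emit a re-ranking trigger, and the whole thing is started with Flow.start_link/1 under a supervision tree instead of being enumerated.
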
If you look at score_players, you'll notice that we accept the game ID and then do all the reading from the database and calculating of total scores. This is the part where it is not idempotent: we have existing scores, and then we merge the new scores — can you see it at the back? — we merge the new scores into them, sort the scores, and save everything in one shot, so that we save the ranks and scores in one query. Okay, so this is the third version, where we have used Flow. There is a new file here called scorer, which contains only the flow pipeline; everything else remains in PlayerScorer.score_player. We have a scoring queue here, which is a producer — a GenStage producer. I'll just open the scorer. Here I've used GenStage again: the scoring queue is a producer. You can see in the init callback of the GenStage that we specify what kind of stage it is — here it is a producer. It has a push interface where you can push game IDs to it; it adds those game IDs to a queue, and when there's demand from the flow pipeline, the pipeline asks for some game IDs from this queue. It never sends game IDs unless there is demand. And if, say, we set max_demand to one, the consumer will ask for one item at a time instead of a bunch of 500 — I think the default is 500. So you can specify how much demand the consumer should deal with. These are the callbacks we write for GenStage — we have handle_demand, which has the logic for how the demand should be handled and how the events should be dispatched to the consumer.
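A sketch of such a scoring-queue producer (assuming {:gen_stage, "~> 1.2"}; this is an illustrative reconstruction, not the speaker's exact file). Game IDs are pushed in through a cast; events are dispatched only when demand arrives, and unmet demand is remembered while the queue is empty:

```elixir
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule ScoringQueue do
  use GenStage

  def start_link(opts \\ []), do: GenStage.start_link(__MODULE__, :ok, opts)

  # Push interface: called when a game ends.
  def push(queue, game_id), do: GenStage.cast(queue, {:push, game_id})

  @impl true
  def init(:ok), do: {:producer, {:queue.new(), 0}}

  @impl true
  def handle_cast({:push, game_id}, {queue, pending_demand}) do
    # Enqueue, then satisfy any demand that arrived while we were empty.
    dispatch(:queue.in(game_id, queue), pending_demand, [])
  end

  @impl true
  def handle_demand(incoming, {queue, pending_demand}) do
    dispatch(queue, pending_demand + incoming, [])
  end

  # Demand satisfied: emit collected events, keep the rest queued.
  defp dispatch(queue, 0, events), do: {:noreply, Enum.reverse(events), {queue, 0}}

  defp dispatch(queue, demand, events) do
    case :queue.out(queue) do
      {{:value, game_id}, queue} -> dispatch(queue, demand - 1, [game_id | events])
      # Queue drained: remember the unmet demand for the next push.
      {:empty, queue} -> {:noreply, Enum.reverse(events), {queue, demand}}
    end
  end
end
```

A consumer subscribing with max_demand: 1 will then pull one game ID at a time, exactly as described above.
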
Okay, I'll come back to this — it's not the focus of the talk, so I'm not really going into the details — but let's actually look at the flow pipeline here. Sorry guys, I cannot mirror this for some reason. Let's start with this: the game scorer. This is the pipeline — we already saw this. One interesting thing to notice: we already had this player_scorer module, but if I open it now — earlier we used to score games, now we score players. Earlier we did the scoring one game after the other; now we do it one player after the other. We get a bunch of player IDs and then score each player. That means we look at all the games the player has played, look at all the in-game actions, fetch them, do some caching — I'm not doing all of that here — do some eager loading, and then score the player. That's the difference: we have score_player instead of score_game. And here I'm using a small flow. Whenever you need to do a map but want it to be concurrent, you can always use Flow, because it's very lightweight to spawn processes and do things concurrently. You can make something concurrent just by doing this: you have a list, and you concurrently map over it. Okay, let's come back to the slides. We're going to do some benchmarking on top of this code — the code is on GitHub if you want to check out how it's written. When we tried to score 100 games with 10 players per game, it took 48 seconds using the version with locks — the games are scored one after the other — and using Flow it took three seconds. When we increased the load four times, from 100 games to 400, it took six minutes to execute the scoring and ranking with locks, whereas it took only 15 seconds using Flow.
When we bumped it up to 1500 games, it took 1 hour 40 minutes to complete the whole scoring, and using Flow it took only 55 seconds. This is not just because we made it concurrent, but also because, once we made it a flow pipeline, we could do interesting things like caching and eager loading, as I mentioned. So what are the takeaways from this talk? If you're using Elixir or Erlang and you ever reach for locks, stop and think — ask whether you can rewrite your algorithm using actors and message passing. Think in terms of event streams when you have to do computations continuously, and partition these streams across multiple stages to achieve maximum concurrency. Finally, don't be afraid of concurrency — Elixir makes it easy, so use it to your advantage. Thank you. Do I have time for questions? I can take a few questions if you have any. I guess no questions. Swaran, you promised you'd ask a question. I have it — you asked what happens when the number of players increases, right? Okay, I don't have numbers for 1500 games because it would take too long, but for 100 games: when we start with five players it took 1.3 seconds using Flow, with 10 players it took 2.3 seconds, and you can see the rest here. So with 40 players — okay, let's compare: with 10 players it took 2.3 seconds, and when we increased that four times, the time basically increased four times as well. I'm not doing any caching or anything, so a lot of things can be improved here — the sample code is kind of naive; I haven't really optimized anything. All right, one more thing: using locks, everything happens one after the other, right? Here we have made everything concurrent — all these computations happen concurrently, we fetch from the database concurrently — and then we do the re-ranking only once.
There, we did the ranking of all the players after each game, but here the ranking is done in a separate system, and we trigger the ranking maybe once in five minutes or so. We have that flexibility here because we don't have to do it after every game — and ranking takes a lot of time, because you basically load all the players from the whole season, re-order them, and save the ranks. The way we test this is a little tricky, because we are passing messages around. What we do is have some mock producers and consumers just for testing: we produce events from the mock producer, and on the consumer side we assert that we got the expected messages. We basically pass a message through the whole thing and assert that the messages are received as expected, okay? Okay, thanks guys. Thanks, everybody.