I'm going to give you a talk today about how you can learn from my terrible mistakes over the last two years of using Elixir in production, and a little bit about what I've done myself. For the last nine years, right out of college, I've built online games exclusively. I worked on games like Guild Wars 2, Lord of the Rings Online, Dungeons & Dragons Online — thank you — League of Legends, and most recently, Moonrise and the State of Decay franchise. Undead Labs is the last game company I was working with, where a lot of these stories take place. It was founded in 2009 by one of the founders of ArenaNet, also the creators of Guild Wars. I joined in 2013. I was around employee 20 or so; now the company is up to 50 or 60, and they're a great group of people. One of the games we were working on is Moonrise. This was the very first online game that Undead Labs was creating. I say was because unfortunately it was canceled, but it was not canceled due to any technical limitations. It's just really hard to make a game in the mobile space and also execute in the PC and Mac space at the same time. We learned a lot while developing it, and we get to keep the technology and use it on our future projects. Other projects that we have are State of Decay, which is a zombie survival game for Xbox 360, Xbox One, and the PC. The original launch date was June 5th, 2013, and April 28th, 2015 was the re-release. Nope, nothing's wrong, okay. The game was actually created with just about 20 people. It's a single-player game, and the future of the franchise continues. I started using Elixir at 0.9.0, about two and a half years ago or so. I was actually really happy with Erlang. When I first heard about Elixir, it was actually around 2011; my coworker pointed me to something on Reddit or Hacker News and said, hey, it's Ruby on the Erlang VM, and I'm like, I don't want that. I like Erlang on the Erlang VM.
I don't want any of the garbage collector problems or the threading problems — or lack of threading — or any of the things that really came with Ruby. But the thing that is nice, of course, is the syntax. When I read the web page originally, it didn't resonate with me. I didn't quite understand what Elixir was, so I didn't think about it for another two years or so. When I did start thinking about it again was when my friend Pat said, hey, it's not actually Ruby on the VM, I was wrong — it's actually a nicer syntax. That's what he pitched it to me as, and that was good enough for me to look at it, because while I was happy with Erlang, getting others on board was very hard. The standard library is fragmented in all these different ways. You don't know what's new and what's old. You don't know the parameter order, which totally depends on which module you're using — where you pass in the collection, or what the enumerator is going to be, varies. It was a little bit of a pain to use, with a lot of boilerplate too, and not awesome tooling. So getting people on board was a little rough. The thing about Elixir is, when I came to it, I would have been happy with just a better Erlang standard library. I think, José, all you ever really needed to create for me to be happy, and to get a bunch of people on board, would have been something standardized and run by the community and not by a corporation. That is actually a thing I really like about Elixir: the community — if some of you are new to it, I'm telling you that it is literally one of the best programming communities I've ever worked with. And that's something we were also able to cultivate in Elixir. But like I said, the Elixir team delivered so much more: a great standard library, and a build tool in Mix which is extremely powerful and extremely easy to use.
A lot of people in here might have done Rails in the past — it's really easy to get somebody started with rails new, and mix new is just as easy. I know rebar and concrete have come a long way as well, but two years ago they were not at this stage. Amazing docs, and they're first-class: you write docs with tests for the code they're documenting. And a package manager, which I always really, really wanted in Erlang, and that's Hex. Thank you, Eric, that was very much needed. Polymorphism through types with protocols — so for instance, if I created a JSON library and gave it to you and you had a custom type, you don't have to fork my JSON library anymore to determine how to serialize your type. You can now just implement the protocol. And hygienic macros, which I'm a huge fan of — I started using them in Clojure and was addicted to them. And then I learned not to do that. So I've used Elixir every single day for the last two years, and I can tell you that I am extremely happy. I wouldn't be here today if I wasn't really pleased with the language, the ecosystem, and the community. It allowed me to be very successful in a very short amount of time. Elixir is, in my words, humane Erlang. It's Erlang made for people in this century. And again, I like Erlang a lot — I'm just being empathetic with all the other people that don't enjoy it. So Elixir: it's easy to get started, to get it on your development machine and create your first project. And if you're using Phoenix, the web is super quick to get anything going. But it's not so easy at first to get it into production. And once it is in production, it really is brilliant. So the purpose of this talk is to share my learnings over the last two years of what it's like not to develop Elixir, but how to take the baby that you made on your computer and force it into AWS or somewhere else, right? Last year I was here and I promised myself that I'd chat about my experiences.
And I'd hoped the room would have more people than last year — and it's something like double or more now. That was amazing to walk into today. So first I'll tell you a little bit about what I do. I assume everybody understands video games and is into games and everybody programs video games, but that's not true — I've just never done anything else. So I'll give you a little bit of an example here. Basically, I build server-hosted online games. They use a persistent connection — so if you've done web stuff, it's like HTTP/2 in that you hold a persistent connection. They're stateful, so unlike HTTP, we run and simulate the game, and it's not easy to migrate or destroy an application and bring up a new one. Erlang does something really well here, and Elixir inherits it, which is code hot-swapping with no downtime. They're normally multi-service — I think the buzzword now is microservices — where you have this little thing that does chat and this little thing that does authentication. And they almost always have persistent storage. So if you've played an online game before — if you ever networked Doom, that was not a persistent online game; there was no storage, it was just transient, you played together. But persistent games — World of Warcraft, League of Legends, Guild Wars, EverQuest, Ultima Online — almost all modern games today have at least some sort of back-end storage, even shooters like Call of Duty. My specific role as a network programmer is to build game back-end platforms. A game back-end platform is a thing that provides authentication, chat, presence — all the microservices. Purchasing, player storage, matchmaking — determining if you should play against this player or not, who's less skilled or more skilled. And then administrative tools, like when I have to kick somebody for being a bad person in the game. So if you've ever played a Blizzard game, it's like Battle.net. We just don't brand them all the time.
But almost every online game has one. And this is a high-level landscape of what that looks like. This is not drawn to scale — I only show like four chat servers, but there could be upwards of 128 of them running, and I only show the chat service. Basically, the client is the game client — that's your phone or your computer — and that connects down to something called a lobby server. The blue lines are reliable UDP connections; that's the client speaking to the lobby, world, and combat servers. The lobby server, the world server, and the combat server are colored brown — those are the C# on Mono servers. The purple servers are the Elixir servers. Any red lines that you see are fully meshed nodes, and any black lines you see are just TCP. That's important later on, to understand what's fully meshed and not fully meshed — we'll get to that. The client talks down to the lobby, which holds a persistent connection. It then finds the player a world server to log into, which is like what you'd be walking around in. And then once you do combat, you transition to another server that simulates the combat. All three of the C# servers are clients of the back-end platform — anything in purple is that back end — and the routers are what they talk through. They talk a binary protocol to the routers. The routers' job, which we'll get to in a little bit, is just to send the messages down to the appropriate service and then get them back to the person who asked for them. And the nodes below it, say chat — those are called service nodes. They provide a microservice, in this case chat and presence, and they're database-backed. They connect to database nodes — the white nodes at the bottom, which in this case are Postgres nodes in our infrastructure. And they're sharded, meaning that there are many buckets that contain your data, and then you go to one. It's a way to load balance.
So how that breaks down into what it would look like in an Elixir project is pretty simple. Again, this isn't the whole application; this is just a small piece to make a point. There's a protocol library, a common library, a route app, and the chat app. The protocol contains the serializer and the deserializer for our binary messages. The common module contains any Active Record-y stuff — anything that's useful for every single one of the applications. And then we have the route server. The route server is a gateway into that back end. Like I said, it speaks TCP with Undead's binary protocol. At one point I tried to go over what that binary protocol looked like with an audience, and they were sleeping — so you can go look at that Erlang Factory talk and take a nap. But I talked about it, if you'd like to see it. There are like two people who are probably interested, and one of them's me. It also disconnects misbehaving clients. While we're not connecting players directly to the back end, somebody, as a company grows, could create a misbehaving server. It also does the job of dropping hanging and partial TCP connections, and it rate limits different types of requests. What it looks like when you're routing a message is basically: you read data from a socket into a buffer, and you stop when you've reached the amount of data the message said it was going to contain. Then we tag that message — which is just an Erlang, sorry, an Elixir struct — with the PID of the route acceptor, so we know where it came from. And then from there, we use a routing hash, which is a piece of data the client sets — like, hey, it's my account ID, or it's my name — to perform deterministic routing. Or randomly deterministic routing, because you don't really know where it's going to go. And it sends it to a service via the protocol. So the client will say, or the lobby will say, hey, I'm talking to the chat server.
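The talk doesn't show code for this step, so here is a minimal sketch of deterministic routing off a client-supplied routing key. Everything here — the module name, the 128-shard count, the message shape — is hypothetical; the real router speaks Undead's binary protocol and does much more.

```elixir
# Hypothetical sketch: pick a shard (and therefore a service node) from a
# client-supplied routing key such as an account ID. :erlang.phash2/2 is a
# stable hash, so the same key always lands on the same shard as long as
# the shard count doesn't change.
defmodule Router.Dispatch do
  @shard_count 128

  @doc "Map a routing key (account ID, name, ...) onto a shard number 1..128."
  def shard_for(routing_key) do
    :erlang.phash2(routing_key, @shard_count) + 1
  end

  @doc """
  Tag the message with the route acceptor's pid (so the reply can find its
  way back) and forward it to the service process that owns the shard.
  """
  def route(message, acceptor_pid, service_pids) do
    shard = shard_for(message.routing_key)
    tagged = Map.merge(message, %{from: acceptor_pid, shard: shard})
    send(Map.fetch!(service_pids, shard), tagged)
    tagged
  end
end
```

The determinism is the point: any router, given the same account ID, forwards to the same chat server without coordinating with the other routers.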
And it's like, all right, well, we're going to send to this chat server — chat 34, right? And then on the other end, the service nodes receive a routed message. The service receives that message, internally dispatches it to itself, and figures out what code path it's supposed to go through. At the end, it checks to see if the message was transactional or not, and replies with a message as well. And this is all in Erlang terms — Elixir terms. It bounces the message back to the router that sent it, because we tagged it with the PID of the route acceptor. And that route acceptor serializes a binary version and sends it back to the client. Through all of this, we are also using a database backed by Postgres with Ecto. Again, thank you very much, Eric — Ecto kicks fucking ass. It is so, so good. It's okay, you can clap for him, yeah. And José also — we'll get into it a little bit — helped a hell of a lot, fixed some issues that we had going into production with Ecto, and it really helped us scale to a huge degree. So it's sharded by schema. Postgres schemas are little buckets inside a database node, or an instance where you have Postgres running. And you can have different versions of that exact same schema, prefixed with a name. In this case, we pre-sharded data into X buckets — here it's 128 — on X amount of nodes, and in this case there are two nodes. The buckets are Postgres schemas and the nodes are Postgres instances. And each service node — say that chat node on the left — services users that belong to shards 1 through 32, and only 1 through 32. If we wanted to continue to fan out, we'd just add another database server and move the shards — the schemas — over to the other database server, and then have another chat server connect to that. So it's very easy to actually scale out. You don't take a full downtime when you use this model.
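As a sketch of that scale-out story: the shard-to-node assignment is just a mapping, and moving shards means changing it. The node names and the 1–64/65–128 split here are illustrative, mirroring the two-database-node example from the slide.

```elixir
# Illustrative shard placement: 128 Postgres schemas spread over two
# database nodes. Scaling out means moving a range of schemas to a new
# node and updating this mapping; only that slice of users goes offline.
defmodule ShardMap do
  def node_for(shard) when shard in 1..64, do: :"db1@10.0.0.20"
  def node_for(shard) when shard in 65..128, do: :"db2@10.0.0.21"
end
```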
Instead, you just say, we're going to take 64% of the population offline, or 50% of the population offline, or something like that. So that's the high-level overview. You don't need to know anything about games or the back end to learn from the rest of this talk. But now, once you've developed this, you're like, hey, how do I get this thing that works on my machine into production? So this is really how that adventure begins. Yeah — for the new people in the room who might have come from a scripting language or a non-compiled language: stop, and do not deploy your source code to your nodes. Really, if anyone has a security guy on your team and you said that, he'd be at your front door waiting for you to come out to go to work, and he'd bug you the whole way, telling you to please not do this. It's easy to do, and unfortunately a lot of the tutorials online tell you to do this, but you don't want to. In the Erlang ecosystem, we instead do something called creating a release. This is still my least favorite part of Elixir and Erlang — it's just not really getting better. I know that Paul might be in the audience here, who's bitwalker on GitHub. I've never met him, and he's creating exrm; unfortunately that didn't exist when I started either, but it's making great strides as well. Creating a release was described to me as dark magic one time, because there's like one guy — Sylvester Stallone as Harry Potter — on your team. Yeah, I found that image. And when I first learned how to do an Erlang release, I think relx existed; I can't remember, because I was really stumbling through it. It was so painful. There's some real old janky shit that you could use, like reltool. And then there's relx, which is a hell of a lot better. And then exrm now — like I said, exrm is on GitHub, and it might be the way that a lot of people are doing it right now. But it does a couple of things that I'm not a big fan of.
I had already created my own release process, and I don't actually believe that operations people should be managing my servers or my applications — I think they should auto-discover, and you should use tools for that. So I don't really care that there's a human-readable format; I didn't mind that it was an Erlang term. But overall, exrm is a really great way to get started. I'm going to talk a little bit about how to do a release without it, just so people know how it works — because it would be great if people understood it and then helped with exrm more, because I think that is the right approach for the long term. exrm basically just calls into relx, which is a release tool. It made it a hell of a lot easier to create releases than reltool ever did, and you can get it on GitHub. What's important to know about what's in a release is that you can think of it like a manifest plus all the files to be given to your VM on startup. It has all your compiled applications, all their compiled dependencies, some boot scripts (which the VM needs to know how to start the application), control scripts to start, stop, and restart, a sys.config, which is optional, a vm.args, which is optional, and then ERTS — the Erlang Runtime System — which is optional as well; you don't need to include it in the release, but you can. The config file and the args are just the way you configure the VM, or, in the case of the sys.config, the applications contained inside the VM. You can really just think of a release as Mr. Fusion on the DeLorean: you can put anything in it — my code, trash, same thing, doesn't matter, anything you want. And a release can contain many OTP applications. This was the thing that really helped me understand it: if you created one application, you can just create a release with that, plus the rest of your umbrella, and they all live in one single VM.
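To make the manifest-plus-files idea concrete, this is roughly the shape of an unpacked release tarball; the names and version numbers are illustrative.

```
my_release/
├── bin/my_release           # control script: start | stop | restart | remote_console
├── erts-6.4/                # the bundled Erlang Runtime System (optional)
├── lib/                     # your compiled apps and all their compiled deps
│   ├── chat-0.1.0/ebin/
│   └── logger-1.0.5/ebin/
└── releases/0.1.0/
    ├── my_release.boot      # boot script the VM uses to start every application
    ├── sys.config           # configuration for the applications in the VM (optional)
    └── vm.args              # flags for the VM itself, e.g. -name, -hidden (optional)
```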
So when we talk about applications, we don't really mean a binary process that's running on your machine, like nginx or something. Instead, it's an application inside of the VM. An OTP application is just a bunch of related code and processes, wrapped in an OTP behaviour — you might have seen it if you've ever typed use Application — and it informs the VM on how to start and stop your application. It might be helpful to illustrate what that looks like in the mix file. Because the releases are all Erlang, and then you have the mix files, and you have the config file as well, like config.exs. This was really hard for me to wrap my head around when I first started — well, first, it wasn't a config.exs when I started — but knowing what this mapped to when I was building the release really helped me. So here, this is just a library. It's not starting any applications. All it's saying is: I depend on these things — I want logger, and I want crypto. This is the most basic. If I were to create a release from this, it would do nothing, basically. I'm not starting an application; I'd basically include this in something else. And this is what an OTP application would look like. Very similar to the last one, but we have this key-value that says mod, and then the atom — the module name — of the app we're going to start: the thing that implements use Application, and the arguments we're going to pass to it. Then any registered processes that thing is going to start. If you do have any named registered processes, it's useful to fill that out. Please do it if you're distributing this application to anybody, or if you have a large organization — because if you name your registered process game, and I name my registered process game, they're going to fight and not be able to live in the same VM. And then the env bit down there actually maps to sys.config, or default configuration for your application. And it's very, very important — I'll go back one slide.
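Since the slides aren't in the transcript, here is a hedged sketch of the two mix.exs shapes being described — a bare library first, then an OTP application with mod:, registered:, and env:. All the names and values are made up for illustration.

```elixir
# A plain library: nothing to start, just declared applications it needs.
def application do
  [applications: [:logger, :crypto]]
end

# An OTP application: `mod:` names the module that implements
# `use Application` plus its start arguments; `registered:` declares the
# named processes it will register (so name clashes surface early); and
# `env:` is the default configuration that maps onto sys.config.
def application do
  [mod: {Chat, []},
   registered: [Chat.Supervisor, Chat.Registry],
   applications: [:logger, :crypto],
   env: [port: 5555]]
end
```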
For the applications like logger and crypto — any of the things you're depending on, where you're expecting that library to be there — don't just put them in your deps. You also have to put them inside of your applications list to be started. Otherwise, when you create the release, you'll get a module-not-found error, and you're like, what are you talking about? It worked on my machine. And then you go into the VM in production and find out that the module — the application — was completely not included. That bit me, and every single one of my coworkers, multiple times. So this is what the mix.exs file looks like when you want to include a dependency from a package manager — in this case discovery, which is an application that I created. I make sure it gets started and included in the release by putting it inside of that list of applications. So, configuring the release: which should I use? They all do configuration. Well, in the mix.exs, like I said, you can put environment configuration there, and it's like default config stuff. But — I skipped ahead. Crap. All right, well, forget what I said. The config.exs file: this is the file you would configure if you were not building releases and were instead pushing your code out onto your nodes. You just can't use it if you're building actual releases. It's not respected by the Erlang VM in any way — unless somebody's created a way to feed it in that I'm not aware of. And it really sucks, because the syntax is really nice Elixir syntax. Then there's the mix.exs file, which is another pathway that Elixir exposes — again, using the Elixir syntax. This is what I was talking about with setting default configuration. It's respected by release tooling to a degree, but it's only useful for configuring your own application — you can't configure dependencies with it. And then there's the sys.config. And this is Erlang — just straight Erlang terms.
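The point about the two lists can be shown in a few lines; the version requirement here is made up.

```elixir
# A Hex dependency has to show up in BOTH lists, or the release builder
# will happily omit it and you'll get module-not-found in production.
def application do
  [applications: [:logger, :discovery]]  # ensures it is started and packaged
end

defp deps do
  [{:discovery, "~> 0.5"}]               # only ensures it is fetched and compiled
end
```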
If you leave the period off at the end of your configuration file, it won't work. And that gets a lot of operations folks, and it gets a lot of people who are familiar with Elixir but don't know Erlang. I have four guys on my team who know Elixir very, very well, and they leave the period off of this every time. And I'm like, come on, we've had this chat four times. So it's just a little wart that I still don't like. Once we've built the release, we'll deploy it out. So let's imagine that you have a build server — I used Jenkins. And you have an artifact store, which is where your build server outputs — like a tar file with your application. Well, I used GitHub for my artifact store. And then you have a configuration management tool, which is going to run on the nodes to deploy out the actual code, configure the sys.config file, and a couple of other small things — I used Chef. And the hosting solution: you can use anything you like, but I used AWS. So once we get to this spot, we've extracted the code, it's on the node, and we go to start the application. But remember, I had a database-backed application — I'm connected to the database, but I haven't run any migrations yet. Okay, so let's go out to the machine — we could do this in configuration management, but just to illustrate it easily, we'll run the Ecto migration by hand. We'll go to where I extracted the code, and I'll try to run mix ecto.migrate. And then I'll realize that I can't, because my application is in a release now. It doesn't look like it did on my desktop anymore. Mix tasks just aren't available to me. So unfortunately, mix tasks aren't a valid path to running migrations once you're in production. Migrations, I believe, should be packaged separately and put on the database node. That way, your database nodes don't have your application code or your compiled application code on them.
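For anyone who hasn't seen one, a sys.config is just an Erlang term — a list of {AppName, [{Key, Value}]} tuples — and the final period is mandatory. The app names and values here are examples only.

```erlang
%% sys.config — plain Erlang terms. Forget the trailing period and the VM
%% will refuse to start, which is exactly the wart being described above.
[
  {chat,   [{port, 5555}]},
  {logger, [{level, info}]}
].
```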
And a couple of small things about the mix tasks: ecto.migrate actually requires your application code, and it requires starting the application. And that's rough if you're using automatic service discovery — because all of a sudden my database node pops up and it's like, hey, I'm a chat server, what's up? And I'm like, you are not a chat server. And something bad happens, right? So I created — I wish I didn't have to make this — something called migrator. Basically it's just a small command-line tool. Super easy to use, very small. All it does is include Ecto and a couple of other small things, and it allows you to run database migrations without having to have your application on the node. You can just run it with migrate up, the path to your migrations, and then the connection string to the database. And there's one neat little thing in here as well: José and I worked really closely — he did most of the work — where I was like, hey, I need to have this multi-schema thing where I have 128 account schemas, and Ecto has no way right now to run those migrations. So this thing also includes something called multi-schema migration, which will let you migrate 128 — or however many — schemas at the same time with the same migrations. And if you are using Chef, I created a cookbook that does this for you; include it and use that LWRP. So you might ask then, once you get the schemas migrated, how are you going to get the data into or out of the schemas? Because by default, Ecto thinks you're just using one database. Well, José, again, helped out a lot and put this into Ecto for me: you can place the prefix of the schema — which is the schema for your query — into a model and then put it into the database. And if you get something out of the database, it tags it with that schema as well. So as you pass the struct around, it knows what schema it came out of.
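As a sketch of how the schema-prefix feature gets used day to day: in current Ecto the API is Ecto.put_meta/2, which is roughly what the helper described here wraps. Chat.Message and Repo are hypothetical names.

```elixir
# Build the schema prefix for a shard, then tag a struct with it so Ecto
# reads and writes against that Postgres schema instead of the default.
defmodule ShardPrefix do
  def prefix(shard) when shard in 1..128, do: "shard_#{shard}"
end

# Usage, roughly (requires Ecto, so shown as a comment):
#   %Chat.Message{body: "hi"}
#   |> Ecto.put_meta(prefix: ShardPrefix.prefix(42))
#   |> Repo.insert!()
```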
And that's really nice. This is a helper function that I created — I think this might appear in Ecto at some point, instead of you needing to make it. The API is a little buried, right? But those are the helper functions to put the prefix. And that's what it looks like when one of the other developers at our company uses it: just put the prefix and then insert. Okay, so I talked really quickly about fully meshed clusters. The red lines on the graph — that's actually the background of this image — were between the two routers, and in our infrastructure, those are the only things that are fully meshed. What I mean by fully meshed is: when one connects to the other, and then a third connects to the first one, the second one auto-connects to the third one as well. The problem with this is that you can't really get any more than 100 to 150 Erlang nodes all connected together. And when I'm talking about supporting millions and millions of players and needing to scale out — and I have 128 possible chat servers, let alone the account servers, which are another 128, and then the routing servers — I just saw this as a potential bottleneck. So we wanted to avoid fully meshed clusters. And this is what it looks like at scale — basically, a dumpster on fire with wheels. It's just lines going everywhere, and I couldn't fit what the actual application would look like on this slide. But imagine, again, 128 chat servers and then 100 routers or whatever. I didn't think I'd ever need more than 100 routers; it turned out I did not need more than two. So then we have this partially meshed idea: the routers are connected — red lines again — and the routers just know about each chat, catalog, and account server. This is a subset of the service nodes, and it doesn't matter what they do; you can kind of figure it out from looking at it. The way you can achieve this is by using something called hidden nodes.
Hidden nodes can be started up with the -hidden flag. It can be passed on the command line, or it can be put into the vm.args file. What this does is, when you connect the node, it doesn't try to auto-discover its friends or create a connection to anybody else. So the routers know all about each other, but the chat servers have no idea where the other chat servers are. That means if I'm on one chat server and I want to message somebody else, I have to bounce off of a router to the other chat server. So there's a little bit of extra latency there, but it still allows us to scale much bigger than a fully meshed cluster would have allowed. And we, again, applied this to all the service nodes. There are some downsides: you can't use global processes if you do this, and by proxy, you can't use any libraries that leverage global processes. So if you're trying to use the distributed lock stuff inside of Erlang, you can't. But you could use something else for it, like Consul or etcd — there are other ways to do it. And you have to manually connect the nodes together. A nice thing would be if I had 80 nodes running and I connected one, and they would all just find each other. Unfortunately now, when you go to connect them, you have to do it manually — or you can use this thing that I built called Discovery. Discovery's job is to automatically discover and connect Erlang nodes together. There's a little bit of proprietary stuff that I'd really like to pull into this to make it more generalized and open source. But this is actually how we figure out where shards are. So when a chat server comes online, it tells the routers, by the way, I support shards 1 through 10. And the routers will say, oh, absolutely. And then when messages come in, it figures it out. And when that node goes offline, the routers say, oh, shards 1 through 10 are temporarily unavailable, so we can't service your request.
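A sketch of the moving parts, with made-up node names: the -hidden flag goes in vm.args, and after that, connections are explicit — made by you, or by a tool like Discovery.

```elixir
# vm.args for a chat node (illustrative):
#   -name chat1@10.0.0.5
#   -setcookie secret
#   -hidden
#
# A hidden node never auto-meshes, so something has to connect it to the
# routers it should know about:
for router <- [:"router1@10.0.0.1", :"router2@10.0.0.2"] do
  Node.connect(router)
end

# Hidden connections are excluded from Node.list/0; ask for them explicitly:
Node.list(:hidden)
```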
I could give a pretty big talk about how to configure this and how to use it. But instead, I'll link you to the readme, which is pretty extensive — we use this in production, and it's very useful. It sits on top of Consul, which is a HashiCorp tool. So once we deployed all of our stuff out, and I ran the database migrations, I started up my applications. And then I realized that nothing is connecting. I have a node here with a router, and I have a chat server here, and I ask them to talk, and they can't. It's because we haven't opened up the ports yet. You could do a deep dive through the mailing lists of 2002, like I did, or read the Erlang docs — but instead, I'll just tell you the couple of ports you need. First is the Erlang Port Mapper Daemon, epmd. By default, it runs on 4369; you can also configure it with an environment variable when you start the VM. Basically, the point of epmd is to keep track of which ports the Erlang VMs on a local node are taking up. So if I'm connecting to somebody else's machine, and I say, I'm looking for the node named jamie, it'll say, oh, it's at this port, and give it to me. So first it makes a connection over epmd and figures out where it needs to go, and then it connects to the node on the port epmd gave back. And that's a big range — it's actually a huge range if you don't configure it. You can configure it with this kernel config. This is actually a sys.config — that's why it's in Erlang syntax — but it's inet_dist_listen_min and inet_dist_listen_max, and you can set the range. Then all you'd want to do, if you're in a cloud, is go in and make sure that you've opened up this port range between any of the Erlang nodes that are going to need to talk to each other, along with the epmd port. And that's all you need to do. So once we had the application running, now's the time for the fun part, which is finding all the bottlenecks that I wrote.
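Concretely, the two kernel settings look like this in a sys.config — the range itself is a made-up example; pick whatever window your security groups allow.

```erlang
%% Pin the distribution listener to a known window, then open that window
%% plus 4369 (epmd) between the Erlang nodes that need to talk.
[
  {kernel, [
    {inet_dist_listen_min, 9100},
    {inet_dist_listen_max, 9155}
  ]}
].
```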
And there were plenty of them — because if you're learning a new language, or you're using libraries that people provided that have a great readme but you haven't looked through the source code yet, you really don't know what's in there. This is when the real fun starts. So, some general tips before we get into how you can find those. Avoid registered processes — they suck way more than you think. You almost never need a registered process. The only time you'd need one is if you were storing some state that you want to get at later. In my infrastructure, they're very minimal. And if you do use a registered process, it cannot be on a central path. At one point, I had a registered process that every single message coming into the system had to go through. Obviously, that's dumb. It was temporary, and I found it first. But that is a time when you do not use registered processes. Another one: a one_for_one supervisor with a transient restart strategy. What that means is, when you start up a child, if it dies, that's okay as long as it exits with :normal. And you really need to think: do I need this? So for instance, on the chat server, if the player is logged in and somehow a crash happens in his user process, yeah, I'd really like that to be restarted. But a time when you don't need this is the transaction manager — if a transaction fails, I never want it to be restarted. So don't do it for that. That was actually a mistake that I made. And the final tip is to use the observer. It is absolutely amazing, especially when you know what all the numbers mean. You can start it by opening an IEx console and just typing :observer.start. If you've never done it and you have your laptop open, this is the only time I'd say: please, open it and ignore me — but then quickly come back to me. This is what it'll look like. This is just the landing screen.
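The supervision tip can be sketched like this. The Agent children are stand-ins for the real user-session and transaction-manager processes; the point is the restart values — :transient for sessions (restart only on abnormal exit), :temporary for transactions (never restart).

```elixir
# one_for_one: children restart independently. :transient restarts a child
# only when it exits abnormally; :temporary never restarts it, which is
# what you want for one-shot work like a transaction.
defmodule Game.Supervisor do
  use Supervisor

  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    children = [
      # Stand-in for a user-session process: restarted if it crashes.
      %{id: :user_session, start: {Agent, :start_link, [fn -> %{} end]}, restart: :transient},
      # Stand-in for a transaction manager: a failed run is never retried.
      %{id: :tx_manager, start: {Agent, :start_link, [fn -> %{} end]}, restart: :temporary}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```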
This is the basic system information. You can see the different tabs, and you can connect to remote nodes with this as well. This is the part where, as I said, Erlang and Elixir in production are brilliant. Connecting to C++ game servers with Visual Studio was pretty cool, too, but this gave me a ton of power over what was actually going on. I can actually see what messages are coming in and out of processes, and what their whole message queue looks like, and I'll just show you some of it. So one of the tabs is Applications. If we look back, it's the fourth tab. Applications will show you the entire application and its process tree. So this is what you would see on your laptop. I could have started up our entire internal application, but it's really, really big, and this is just simple enough. So this is what Logger looks like when it's started. Registered processes are named with the atom syntax, and non-registered processes are just a PID. You can also look at the process list with the tab to the right of it that says Processes. That gives you information like the PID, reductions, memory, and the message queue. I'll tell you about all those in a second. And this is what that looks like. The important ones are, again, applications, processes, reductions, memory, and message queue. And if you click on any one of the processes in the process tree or the process list, you can get process information about it: what messages are in its queue, what it's waiting on. You can get information about the garbage collector, and about how much memory it's using. You can look at the entire stack trace of the thing, and how it was started. You can look at the state of the process. So this is actually neat for debugging your application as well, if you're building it and you're not understanding why your state's getting into a bad spot. This will help you out with that. And this is what that would look like.
So the thing that you're aiming for when you're looking for bottlenecks is to have low reductions (Reds) and an empty message queue (MsgQ). Memory is important, but memory is so cheap at this point. Those two, though, are how I was finding every bottleneck in my application. And then the question everybody in the room is asking, I hope, is: what are reductions? Because I had no idea. Not even a little bit of an idea what they were. Basically, reductions are a way to figure out how busy a process is. Erlang processes are scheduled on a reduction-count basis, and one reduction is roughly equivalent to a function call. A process is allowed to run until it pauses, or until it has executed about 2,000 reductions. And this is in the normal priority queue; we'll get to priority queues. You can pause either with receive, or you can use a function called yield, which is part of the Erlang standard library. What yield does is voluntarily let your process stop and give another process a chance. You're probably not going to use this much. But if you ever get into a spot where you're thinking, this process is a hog, and it can't afford to pause, yield is much more efficient than saying receive with an after 1 inside of Elixir. The Erlang scheduler uses preemptive scheduling, which basically ensures that at any given time the processor executes the highest-priority task of all the tasks that are currently ready to execute. There are four queues in the Erlang scheduler. And this is where the talk will get a little boring. I'm so sorry. There's beer right after it. This is the vegetables. I swear that it's going to be boring right now, especially if anyone here is just like, I can't wait to make web applications. You'll thank me at some point for knowing how this works. So there are four priorities. There's max, which is reserved for internal use; don't ever use it. And there's high, normal, and low.
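As a sketch, the two ways of pausing mentioned above look like this in Elixir:

```elixir
# Voluntarily give up the scheduler so other processes get a turn.
:erlang.yield()

# The heavier alternative: block in a receive with a 1 ms timeout.
receive do
after
  1 -> :ok
end
```

The `receive ... after 1` form allocates a timer and goes through the full message-queue machinery, which is why yield is the cheaper choice for a process that just wants to be polite.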
And there are two priority profiles for these queues: strict and fair. Strict is what max and high are set to. The scheduler processes all the messages in the max queue first. All of them, no matter how many. Then the scheduler processes everything in the high queue. Then it moves on to the fair queues. There's a little bit of simplification here, by the way, because if your runtime has SMP enabled, it does things in parallel. But this is a good general rule of thumb. And then there's fair, which covers the normal and low priorities. By default, any process you start is normal priority. The scheduler will process the normal queue until it's empty, or until it's done about 8,000 reductions, and then it'll go ahead and process one low-priority process, if available. You should be careful with this, because you can get priority inversion if you're using the low queue, where low will somehow take priority over your normal messages if you have a lot of normal messages. Thanks to Ulf Wiger, and that's probably not how to say his name either, for explaining to me how the scheduler works. And you can set the priority yourself with a process flag. So at the top, I'm setting the priority to high, and it returns what the old priority was. Then I'm setting it back to normal. And just like anything in C++, you have to be very careful with it. You probably don't want to set the priority yourself. You might come to a point where you need it, and it's just really good to know that this tool is available. But I can tell you that I have never had to set the priority. I've always thought I had to, and then said, no, I'm not going to do that. Here's an example of a bad time, when you're going to have a bad time if you set priority: say you set high priority on a process, and it uses dynamic code loading.
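The set-then-restore dance described above (the slide isn't in the transcript) would look roughly like this:

```elixir
# Bump the current process to high priority; Process.flag/2
# returns the previous value, so we can restore it afterwards.
old = Process.flag(:priority, :high)

# ... time-critical work here ...

# Restore the previous priority (usually :normal).
Process.flag(:priority, old)
```

Saving the old value rather than assuming :normal is the careful version of "setting it back" that the talk describes.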
So it's going to talk to the code server to pull, let's say, protocols, because you didn't consolidate them. And what happens then is that the code server is running at normal priority. So you won't cause a deadlock, but you'll cause a slowdown, because you have this high-priority message that needs to be handled, but it's depending on a normal-priority process. So just be careful. And this is a great time to talk about protocol consolidation. This was one of the first major bottlenecks that we ran into, because if you recall from earlier, I'm very excited about polymorphism in Elixir. Protocols and their implementations are a construct in Elixir, and if you don't consolidate them, every time you go to use one, it'll hit the code server and be like, hey, I'm using the Enumerable protocol. Can I have that? Yeah, sure. And the code server is a registered process. And if you remember, I told you that registered processes are way shittier than you think they are. If every single thing is touching the code server, then you have a massive bottleneck. And that's the code server running inside of the Observer. Thank you. Protocol consolidation, continued: if you think you've consolidated your protocols and you're in production, you can run Protocol.consolidated?/1, pass it the module, and see if you actually did. If you didn't, you can look at the path and see, oh, maybe I didn't put the consolidated modules in the code path when I started the Erlang VM, which is also a thing that I had done. Luckily, exrm will handle this for you. Again, because I wasn't using it and didn't have it available, I created this problem in a surprising number of ways where my protocols weren't available. So then I also ran into: why isn't Ecto starting my workers at application start? And that was a pretty easy one. There's a flag in Ecto, lazy. You can set it to false.
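The consolidation check mentioned above can be run from any IEx console attached to the running node; Enumerable here is just an example protocol:

```elixir
# Returns true if the protocol was consolidated at build time,
# false if every dispatch will hit the code server at runtime.
Protocol.consolidated?(Enumerable)
```

If this comes back false in production, check that the directory with the consolidated .beam files actually made it into the code path of the release.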
And that'll make sure that the whole pool gets warmed up, giving you a little bit of an extra boost at process start. I also ran into: why is Ecto not committing data to the database? This was a really fun one. I had no idea how this happened. And I was able to suss this one out, again, by using the Observer. I looked at the pool and saw that, for some reason, it was the test transaction pool instead of Postgrex. And then I realized it's because I had turned on a brand-new, undocumented flag, because I love doing things like that, which was to disable build_per_environment, because I don't believe in building per environment. I think that if you build an artifact, you should test it and then put that same artifact in production. I don't want to build a test artifact, test it, and then build a production one and put that one in production. So I turned that off. And then I immediately turned it right back on, because some libraries, like Ecto, expect that feature to be on. Some application configurations are only available at build time, like the adapter in Ecto. So don't turn that off. And this was a big one: you really want one Erlang VM per machine, especially if you're on something like AWS. The OS scheduler will get real mad at you. At one point, I had something like eight Erlang VMs running, and it was a nightmare. That's top running. And basically, I don't know how many people in here have ever done sysadmin stuff, but what you really want is, for the number of processors you have, say we have four, my load average should be at maybe 3.75 or under. The first load average is for one minute, the next is for five minutes, and the one after that is for 15 minutes. And that's a number that you really should be paying attention to, along with your disk usage, if you're writing out to disk much. And then this is the last little bit that I'll leave you with, something that I wish I had learned more about: the vm.args.
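The two Ecto settings just mentioned live in the repo config; the app and repo names here are hypothetical, and `lazy` is the flag from the Ecto version in use at the time:

```elixir
# config/config.exs (hypothetical app and repo names)
config :my_app, MyApp.Repo,
  # The adapter is read at build time, which is why
  # build_per_environment needs to stay enabled.
  adapter: Ecto.Adapters.Postgres,
  # Warm the whole connection pool at application start
  # instead of connecting on first use.
  lazy: false
```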
We got quite a significant performance boost by enabling kernel poll, which is just setting +K true, and by setting up an async thread pool that was larger than the default. And you can set that, again, in the vm.args. A lot of this information I got directly from José, who I owe a hell of a lot to, and Eric. And the rest of it was either from reading Erlang mailing lists or from a book that was recommended to me called Erlang in Anger. And I would very much recommend that once you're going into production. Read it. It's free. It's a PDF. And it is super, super helpful. So that's it. You can find me on GitHub or on Twitter. Thank you very much. And thank you very much to the AV guys for running the show.
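For reference, the vm.args flags mentioned above look like this; the +A size is illustrative, not a recommendation:

```
## vm.args -- flags mentioned in the talk
+K true     # enable kernel poll for I/O event notification
+A 128      # async thread pool size (illustrative; tune for your I/O load)
```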