Paul Dawson, I work for a consultancy called CitrusBite. We build stuff with Elixir and Ruby. Somehow I got pulled up on stage to do live coding demos. I blame this guy. And we should also blame him if anything goes wrong. I'm not the one doing the live coding.

I am James Gray. I work for NoRedInk and we use Elixir in production, which is kind of awesome. And I think the most exciting part of this talk will be driving down that ramp when it's over.

All right, so as we mentioned before, Hanabi is a collaborative game wherein you're not supposed to be able to see your hand at all. You're sort of giving each other clues. And you can see on the screen, my hand right here, this is sort of like a buildup of the clues that I've had thus far from James in this game. This is what I know about the tiles. But there's something that I think could make this a little easier for us, is if we also had a little bit of additional information about things that maybe we don't know. What do you think? You're saying cheat to win? I didn't go that far. Yeah. Okay, so let's see if we can make this happen. The game, by the way, is just running on a DigitalOcean box. If we look here, I have a couple of scripts, no magic involved. Specifically, this upgrade app script is the one I'm going to use. We can look at it real quick. It's not fancy. It basically just builds a directory for the release. It copies this file into the release directory. And then it does this fancy little upgrade command right here to upgrade to the next version. So if we run this, we get "made release permanent: 2.0", which is pretty cool. If we go back to our game, continue playing. I'm going to discard this five because I like to live dangerously. Please don't do that in a real game. And there's only one five in the game, so that was a terrible move on my part.
But you can see here, we have a little bit of extra information down at the bottom that's been built up from the clues that we've gotten throughout the game. For instance, I know that tile in the middle is not white. I didn't know that before. That information is not apparent to me from the thing, which is nice. So we live deployed, and I didn't break it. That's always a plus. I saw a juggler one time, and he showed this trick. And it was really fast, and he said it took him a year to learn and get good at. And he did it, and everybody in the audience is just like, yeah, I guess. And he's all, I know it does nothing for anybody. But then later on in his show, he turned on a black light and used glow-in-the-dark balls and clubs, and he did the trick again. And then you could see what was going on, and it was super amazing. So my hope is that what you just saw is like, yeah, I guess. But maybe by the end of this talk, you think it's cooler than you do now. We'll see, we'll see. Are you saying this has to be super amazing? Well, it's after lunch, it better be. All right, so yeah, so a good place to start in our exploration of what just happened would maybe be at the beginning. Whenever we set out to build a game, specifically Hanabi, we were thinking a lot about what is Hanabi, right? Like what sorts of moves can you make? How would you represent Hanabi in a computer? We were thinking about our desired outcome that we want to play this game. We were not thinking so much about how we would put it in a database or anything like that. And so often whenever we're building applications as software engineers, we find ourselves in the situation where we're trying to put together something. We're trying to put together the pieces. We have a user story. We have a project manager who's saying, hey, let's build this thing. 
And we have put ourselves, we've grown up in these sorts of conventions like REST where we build these fancy routes that are loosely based on some database tables and some relations that we think should exist to try and make our lives a little easier. But over time, the intended experience that we're giving gets sort of buried under the amount of plumbing that we have to build up in order to build that application, right? We end up in a case where we're doing 30 or 40 database queries just to put us back in a particular state. Just to put us back at a particular step in our application. In spite of all of our best efforts and conventions, this is sort of like the ball of mud you hear described sometimes, where as applications grow, they get more and more complex and the plumbing gets all intertwined and kind of leaky.

Hey Paul, I don't want to interrupt your rant here and I definitely don't want to take away from this amazing picture. But can I tell you about another way to build applications? Does it involve Docker? No Docker required. Okay, let's do it.

Okay, so I'm going to try this kind of radical idea, I guess. But instead of doing all these queries and stuff to re-establish the state of the world, another thing we could do is just keep the state of the world loaded. We could just have it in memory. And this allows us to avoid all of that redoing queries to remember where we were. And this also means that whatever this thing is we have in memory, it's probably just some kind of boring data structure that we can make simple changes to. And if we can remember what happens on both sides of the connection, then the messages that we have to send back and forth are really small and trivial, like move this thing over here or update this, right? And this may sound kind of radical for web development, but it would be totally normal for all other kinds of development, right? Like in a Nerves project, you would have your state in memory, for example.
So in a traditional web app, if you wanted to store things in memory, we do this a lot, right? We use things like Redis or memcached to push something outside of the stack, right, so that we can remember it and keep track of it without having to worry about this request-response cycle where we kind of have amnesia every time a request comes in. But we're on the BEAM. We have the BEAM's options, so we can use processes and have them loop over memory. Or we can have ETS tables, where values are stored outside of processes, right? So I would like to tell you that the in-memory state of this application is super fancy, but it's actually super boring. It's just a normal data structure that has the values that you would need to represent a game of Hanabi. So things like a discard pile or a draw pile. And this is just a trivial data structure. Then when we have to change state, we just have these pure functions that take in one set of state, fiddle a few bits in the state, and spit out the new set of state, right? It's just a simple change in state. Our messages, these are the messages that come from the front end to the back end, and remember I said those are simple. This is actually the largest message that the front end we were just playing with sends to the back end. And you can see that it says a player wants to give a hint to someone. And that's the full message. So if you're gonna keep this in memory, then you have to manage memory. But that's not too difficult to do. If you wanna make a new game, you can spin up a new process. If a game is done and you wanna reclaim that memory, then the process can exit, right, and that takes the memory back. You can even have processes check on themselves periodically and exit if there's been a period of inactivity. Say if two people are playing a game via email or something, and so they only move every so often. Okay, so that's about processes. Another tool we can use is ETS.
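As a rough illustration of that process-management idea, here's a minimal sketch of a game process. The module names and the trivial `Game` logic are invented for illustration, not the actual Hanabi code: the process holds the game data structure in memory, applies pure functions to change state, and uses a GenServer timeout to exit after a period of inactivity so the memory comes back.

```elixir
defmodule Game do
  # Trivial stand-in for the real game logic: just collect moves.
  defstruct moves: []

  def make_move(%Game{moves: moves} = game, move) do
    %Game{game | moves: [move | moves]}
  end
end

defmodule GameManager do
  use GenServer

  # Exit after 30 minutes with no messages, reclaiming the memory.
  @inactivity_timeout 30 * 60 * 1000

  def start_link(game), do: GenServer.start_link(__MODULE__, game)

  def move(pid, move), do: GenServer.call(pid, {:move, move})

  @impl true
  def init(game), do: {:ok, game, @inactivity_timeout}

  @impl true
  def handle_call({:move, move}, _from, game) do
    # A pure function takes the old state and returns the new state.
    new_game = Game.make_move(game, move)
    {:reply, new_game, new_game, @inactivity_timeout}
  end

  @impl true
  def handle_info(:timeout, game) do
    # No activity for a while: exit normally so the VM reclaims the memory.
    {:stop, :normal, game}
  end
end
```

The timeout in the `init/1` and `handle_call/3` return tuples is the "check on themselves periodically" part: if no message arrives within the window, the process gets a `:timeout` message and shuts itself down.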
This is a key-value store that's available in Erlang and, by extension, Elixir. It does simple insert and lookup operations, and also supports some not-very-complex queries, if you want to do querying. The syntax can be slightly strange because it uses this concept called match specs, which is a little weird in Erlang. And then if you're getting at it from Elixir, that's like two layers of weird. So it gets a little weird. But for simple stuff, it's simple. This is useful if you need to share state between processes, so outside of processes. It's also useful for just temporarily keeping track of something that you may not need to keep around long term. This is what an ETS setup looks like, or a very simple one. You can name a table, set some parameters for that table, what kind it is, and things like that. This is what an insert could look like. There's different ways to insert, but this is one. In this case, we're inserting a registered user into an ETS table. It just lets us keep track of their PID. And the value that's stored is just a term. It stands for Erlang Term Storage, I think. So it's just an Erlang term. In this case, a tuple, right? And here's a lookup, which is the opposite operation. This one's from a different part of the code, the games table. This lookup is seeing if there are any games already prepared to start with a given number of players. So when you use this kind of strategy, you end up with this scenario where your state is living in these kind of ephemeral places, right? It's surviving in memory. These game manager processes, they're looping over game data structures. And those game data structures are the memory. And then the ETS table lives outside of the processes, and different processes are communicating with it. And this is where the heart of your application lies. So we have this neat abstract concept at this point. But outside of my test suite, there's no way really to interact with it, right?
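In code, those ETS operations look roughly like this. The table names and stored values here are invented for illustration, not the actual ones from the app:

```elixir
# A named, public set table that any process can read and write.
:ets.new(:sessions, [:named_table, :public, :set])

# Insert: what's stored is just an Erlang term, in this case a tuple
# of a user name and their PID.
:ets.insert(:sessions, {"player_one", self()})

# Lookup is the opposite operation; it returns a list of matches.
[{"player_one", _pid}] = :ets.lookup(:sessions, "player_one")

# A simple query: find pending games waiting on a given player count.
:ets.new(:games, [:named_table, :public, :set])
:ets.insert(:games, {:game_1, 3})
:ets.insert(:games, {:game_2, 4})
:ets.match_object(:games, {:_, 3})
#=> [{:game_1, 3}]
```

The `{:_, 3}` pattern in `match_object/2` is a taste of the match-spec weirdness: `:_` matches any key, and the second element must equal 3.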
So what we want is a way to move tiles around, to do stuff, to interact with it. And that means we need an interface.

Wait, are you saying, so we have this cool process in the background and we want some way to talk to it from the outside world, that's what we're looking for? That's what we need. I think I have an answer. I read about, I think Rails is doing this thing with web sockets now. Let's do that. No, not really.

Yeah, so obviously we should use web sockets. We're not going to use Rails. But yeah, so having a stateful connection to an outside user via an interface is what we're looking for. And web sockets are super cool. How many people have played with web sockets? They're great. Yes, see, lots of hands, because they're awesome. I find it very close to how we tend to think about message passing within Elixir, like it's very event-driven. You have subscriptions, you're paying attention to what's coming down the line and acting on it, and Phoenix happens to have a very great abstraction for using web sockets. So I notice a lot of us are familiar with them, hopefully using Phoenix. The channel abstraction is pretty awesome. It's kind of reminiscent of how you would expect to use a GenServer in Elixir. In our case, we have the game engine on the back end, and then we're using an umbrella application. So the Phoenix application within that umbrella is actually completely isolated from the game engine, and its only reason for existence is to provide an interface for web sockets for the front end. There's no Ecto in the Phoenix app. There's not anything except for basically just the channel abstraction. And you can push messages to it. You can handle messages, and this is how most of our messages look in the application. If a play message comes down the line with a specific payload, then we handle it. We basically just hand it off to the game manager, which is just another process in the sky that does the right thing. Yeah, which is pretty great.
On the actual front end, we chose to use Elm, which is a fantastic pairing to web sockets, and to Phoenix channels specifically, I believe, due to some side effects of the Elm architecture. And without getting too deep into it, this little bit within the dotted lines is what is commonly referred to as the Elm architecture, where the model up at the top is a representation of a sort of user state. And the view is a means for representing that state. And then the update function is a method to update the model. And it is very specific and explicit in the ways that you are able to do that via certain actions. And so you can trigger actions from within the view if a user clicks on a specific button, or in our case you can tie certain events that may come down a web socket through a channel to specific actions that get triggered in the update function and then cause the model to update and then eventually the view, which is pretty cool. There is actually a Phoenix socket library for Elm. This is the example that we use for joining the game and subscribing to the game and player messages. Another side effect of the way that we chose to build this application is that on the front end there's not really much logic that has to do with the actual rules of Hanabi. Basically all it does is, when you click a button, it pushes the message back to the back end, and the back end responds with, hey look, here's a new game. And so really the only message we're having to listen to on the front end is: I got a new game, what am I supposed to do with it? And that's what this message does. It basically subscribes to that particular channel and then uses a JSON decoder in order to decode what the game is and then redraw it. And similarly, publishing messages back to the back end is pretty easy.
Here you see, I mean, we're just publishing a discard message back to the back end that a user has made, and it's the same thing, includes a JSON payload. Pretty simple, pretty neat.

So in doing this we've sort of created another problem, right? We have this very stateful connection to the user, and we have this very stateful process up in the sky that's holding all of the state of our game. And this is kind of problematic because we live in the real world. And there's things like spotty Wi-Fi, and computers crash. And computers are just computers and do things that we don't really understand sometimes. And that makes having all of this in-memory state and a very stateful, specific connection kind of problematic. If you just stop chewing on your cables, then your Wi-Fi will stay up. I have been trying to break that habit.

So it's obvious that we need some manner of long-term storage to protect us against this problem. And we have to decide how we would go about that. And I think my instincts tell me that I should rebuild my logic, my game logic, surrounded by these models, this database abstraction. But actually I'm gonna try to argue that that's not what we should do. Let's see how that goes. It's clear we need long-term storage to protect against the things that Paul mentioned are a problem, failures, things like that. He did forget to mention we need to keep track of how many games I've won, which I feel is pretty important, but I'm gonna toss that out there. So in our case, we were using PubSub to communicate with the front end. And the reason to do that is, though only one user makes a move, right? I wanna give a hint to somebody, they make a move. But actually the game updates for all the users, okay, maybe not in the case of a hint, but if you discard a tile, for example, the game updates for all the users. So the moves are sent up normally, but the game responds by publishing an update on a channel based on that game ID.
And then all the players are subscribed to that channel, or topic, I guess, sorry, to use Phoenix terminology, and get that update. Well, that means that storage can be just another subscriber to the PubSub system, right? So storage hooks into the PubSub system, it sees the events as they come out, and it records them in a database. We used append-only storage for this. We're gonna see the events as they come in, so it makes sense just to take the move and shove it in a database, right? And not try to keep updating and keep this complex game state in sync with whatever's going on on the other side; better to just record them. If you someday need to load a game, then you pull those events out of the database and you replay them on your data structure, right? And the advantage of doing something like that is that in the case of what we did on stage, where we changed how the game works, when we replayed those events to refresh that game in progress, then it caught up with the state of where it was, but now it had that new information involved, right? Like you can change your mind about something. If you're calculating a score for something, you can change the algorithm, replay the events, and get back to your new score calculation using your new rules. This can require roll-ups if your data gets huge. Like if you're just continually adding to one table forever, it can become prohibitive to go back to the dawn of time and replay all those events. You can usually fix that problem by snapshotting the data every so often, and then grabbing the latest snapshot and playing only the events forward from that usually solves that kind of problem. I'll also admit that it's not ideal for all tables. It works really well for the moves that we're doing here in the game. I also used it for the games table, and I kind of regret it there. It would have just been better to have a normal record where I set a finished-at flag on it when it was done.
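A sketch of that storage-as-subscriber idea, with a plain GenServer standing in for the storage process. All names are illustrative: with Phoenix.PubSub you would subscribe to the game's topic in `init/1`, and the in-memory list here stands in for the append-only database table.

```elixir
defmodule Storage do
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end

  @impl true
  def init(events) do
    # In the real app, this is where you'd hook into PubSub, e.g.:
    # Phoenix.PubSub.subscribe(MyApp.PubSub, "game:" <> game_id)
    {:ok, events}
  end

  # Every broadcast move arrives as a plain message; record it as-is.
  @impl true
  def handle_info({:move, _game_id, _move} = event, events) do
    # A real implementation would insert a database row here instead.
    {:noreply, [event | events]}
  end

  def events, do: GenServer.call(__MODULE__, :events)

  @impl true
  def handle_call(:events, _from, events) do
    {:reply, Enum.reverse(events), events}
  end
end
```

Because the events just accumulate in the process's mailbox, a slow database only delays the recording, not the gameplay, which is the independence described below.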
So it's not great for everything, but it really suits the move use case. So loading games: we did have to make a tiny change to the game manager. And that's just to have it take a callback that it can use anytime it needs to load a game. So that might happen because of, in our case, hot code reloading. But it could happen because a game was shut down for inactivity and then later brought back up or something like that. And so this is the only change we made to the game manager. It takes this callback. And internally, the callback just makes an empty new data structure. So our game engine still has zero knowledge of the Phoenix component and also the storage component. It knows nothing about them. I'm not gonna show you the full code to load a game, because a lot of it's just database concerns. But this is the beating heart of it right here. You take the saved moves and you just apply them one by one to your game data structure until it's back to where it was. So there are lots of pros and cons to the system. I mentioned that the game engine doesn't have any knowledge of storage. Storage is literally an add-on. It's the last thing we built, and we were playing with it before it was built. So it works fine without it, save that you can't resume games, of course. Also, there's an interesting kind of independence here. If your database gets a little behind, that's fine. Events can queue up in your storage system's mailbox, and you'll get to them when you get to them, right? You'll clear that out eventually. But that won't slow down gameplay necessarily, right? The gameplay is independent and players can continue interacting with each other. Now, the con to that is you probably need some form of back pressure, which I haven't put in yet, or we haven't put in. The back pressure is needed because if the games continually swamp the storage system, then eventually you can get some kind of memory error or something like that.
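That "beating heart" of loading a game is essentially a reduce: fold the saved moves, oldest first, over a fresh game. Sketched here with a trivial stand-in `Game` module; the real data structure and move logic are richer.

```elixir
defmodule Game do
  # Trivial stand-in for the real game state.
  defstruct moves: []

  def new, do: %Game{}

  def make_move(%Game{moves: moves} = game, move) do
    %Game{game | moves: moves ++ [move]}
  end
end

defmodule Loader do
  # Rebuild a game by applying saved moves, in order, to an empty game.
  def replay(saved_moves) do
    Enum.reduce(saved_moves, Game.new(), fn move, game ->
      Game.make_move(game, move)
    end)
  end
end
```

Because the replay runs through whatever `Game.make_move/2` currently does, replaying old events under new rules naturally produces a state that includes the new behavior, which is exactly what happened in the on-stage upgrade.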
So you need to be able to push back and slow down if we're too far behind. Also, with this scenario, it's possible for some catastrophic failure to leave the two systems disagreeing on what the current state of the world is. That is a solvable problem. If you have to have that kind of assurance, you could use a strategy more like this, where the user's system sends the event to storage, storage persists it, and then storage forwards the event on to the game engine, and then the game engine could respond to players with the update. That way you wouldn't process a move until it was persisted, so you would know it was safe. So this has different trade-offs. Again, you can still have the independence and the add-on nature of it. That's fine. And we can guarantee that the two systems stay in sync, which is good. We give up that easy independence and processing speed. You can still do it, but you'll need to do something like a worker pool at the storage layer or something like that to get your kind of independent processing. Also, we have kind of a new trade-off in that you have to be okay with storing illegal moves, or have the game engine ignore those when they come back to it, or validate at the storage layer, before you save, that the move is valid. Our system doesn't have this problem for two reasons. One, the validation is in a separate module from the actual engine itself, so it would be no problem to use it at the storage layer. But two, the Elm front end is architected in such a way that it never sends an illegal move. So there would be no illegal moves to worry about, but you do have to think about these things. Finally, I want to mention one other potential strategy. You've probably heard of the advantage of lazy computing: if you make something lazy, then maybe you never get around to invoking that code, which means you don't have to do it, right?
If you don't design your system around models and you add storage on as a planned feature, then maybe you don't have to do some of it at all. So like in the case of our ETS tables, they store games that are being put together but aren't yet in progress. So one player joins and another player joins, and they're waiting on the third player. If we had some kind of outage or something and that went down, players would have to reconnect to the games that they were in. But that's fine, we've lost their connections anyway. So having that information persisted long term has no value to us. And we just don't have to do it, right?

Okay, this is all great, we have a full system now. We can play games, we can restore things. We can keep it going until the time comes, and it always does, when you have to deploy, right? And then this is kind of bad, because now we've built up this whole system. Everything's in memory and we're playing everything from there. And basically at that point, we flush it all, right? So we lose everything. Cool, this is my best Morpheus impression. What if I told you that we never had to reboot the application? I would take the blue pill.

Perfect, yeah, so we are working in Elixir, right? And Elixir has some really awesome tools around doing hot code swapping, which does away with having to reboot the server and flush all the memory. The tool that we chose to use was Distillery. We've heard it mentioned a few times today. It is an awesome tool, it is super easy to use. This is an example of building a Distillery release. The one that we actually used today was built in this way. One note: even if you're using umbrella applications, specifically with Phoenix, you will have to compile your assets whenever you build the release. And then, yeah, you build a release and you can just run it, it's pretty nice. One other thing that I neglected to show during the live demo is that we bundled the Erlang runtime system and everything together in the release.
So that server on DigitalOcean that's running the game right now actually doesn't have Elixir installed, doesn't have Erlang/OTP installed, isn't running anything extra. It's just running from that one release that we pushed out. Similarly, to upgrade a release, you do the same thing. You tell it which version you're upgrading from and you build your release. However, I did notice that you do have to have the version that you are upgrading from on the same machine where you're building the release that you're upgrading to, if that makes any sense. So on the code side of things, though, a few things had to change in order to facilitate doing a hot code swap. One of those is the versioning of the data. So we have this data struct in memory that represents our game and the current play state that it's in, right? Whenever you run a hot code swap, there's this code_change callback that gets called, and that manages the migration of your in-memory state to whatever the new state of the world is in your running application. And you use this @vsn attribute up here to show that we are moving from this type of data structure that doesn't have the insights feature that we pushed today to one that does. And this is how you would go back and build that feature into the data struct. So in this case, we just had to call this replay game function that went back, read all of the game events that had happened, and rebuilt the game state in memory, taking the new rules into account, which also included that extra clue information. Some of the advantages of doing this are that, I mean, we have years, we have decades of OTP release history and how that is all handled as far as hot code swapping. It was actually really easy.
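In GenServer terms, that versioning looks something like the sketch below. This is illustrative, not the actual talk code: the real callback replayed the stored events to rebuild the game under the new rules, while here we just add an empty field to show the shape of the migration.

```elixir
defmodule GameManager do
  use GenServer

  # Bump this when the shape of the in-memory state changes; the release
  # handler passes the old version into code_change/3 during an upgrade.
  @vsn 2

  def start_link(state), do: GenServer.start_link(__MODULE__, state)

  @impl true
  def init(state), do: {:ok, state}

  # Called during a hot upgrade, while the process is suspended, to
  # migrate the old in-memory state to the new shape.
  @impl true
  def code_change(_old_vsn, old_game, _extra) do
    # v2 adds an :insights field. The real app called a replay-game
    # function here to rebuild state from stored events under the new
    # rules; adding an empty field is enough for illustration.
    {:ok, Map.put(old_game, :insights, [])}
  end
end
```

The key point is that the callback receives the old state and must return the new one; the process keeps running with whatever you hand back.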
I was kind of nervous when we were preparing and building this application and putting this talk together about being able to do a hot code swap like that live on stage; I don't know, it seemed like a terrible idea at the time, but it actually worked out pretty well. Distillery is a super neat tool. One thing though, and I have it on both the pros and the cons list: you do have to be very specific about your application architecture, right? You have to be very concerned about where the state in your application is and how you are going to move it to a new version. So in our case, most of the game state was in a game module and it was very easy. We had to change a few things in order to be able to handle that code_change callback and to be able to replay everything back to where we needed it to be, but in a large existing application, it can be very complex if your state is spread across many modules, and there can be some co-dependence, and so being able to do a hot code reload like that does require some specific architectural decisions. One of the downsides of approaching deployment in this way is that you do have to build your releases on the exact same architecture that you're planning to deploy to. As I mentioned before, I run Debian Linux full-time on my laptop. I was unable to build a release on my laptop that would then ship to an Ubuntu machine on DigitalOcean. Through some troubleshooting of OpenSSL conflicts and whatever, you can maybe make it work, but just generally you'll need a build server that builds for the same architecture that you're deploying to. Another thing: there's this big section in the Distillery documentation about how you should probably not do live code reloading in production. And I'm not sure, so I mean, I don't know. The way I kind of look at it is there are definitely cases, like this is a very simple example that we use for this talk, right?
There are definitely cases wherein you have to deal with something like a database migration, or you have multiple modules that are dependent on different versions of your data that you're trying to migrate during this hot code reload, or there's some external dependency that you need to manage. And in all of those cases, you need to have an understanding of what the appup file does, which Distillery generally generates for you. You need to have some knowledge of the internals of how an OTP release works and how to build that out. And so that's another thing; maybe it's just easier to do a rolling release than it is to go back and really think about those types of things. And then like I said before, it would be difficult, I think. I use Elixir at work in production, and it would be difficult for us to do a hot code swap on our application just because of some of the architectural decisions that were made in building it. So we built this application for the purpose of this talk and to show off hot code reloading and how it works, but we really learned a lot in the process of doing this and building it and making it work. These are some of our key takeaways, for what they're worth. First of all, as Paul just said, you have to stay aware of where your state lives. If you spread little bits of state all over the system and then do something like a hot code reload where you gotta change bunches of those bits of state, that's gonna be harder, right? Significantly harder. So you do have to stay aware of that. This next one was a big aha moment for me: if you can have this continuous conversation with both sides, where both sides are remembering all of the state, then it just gets drastically easier. You're not having to ship back and forth all the time all these breadcrumbs so that you can reestablish the state of the world on the other side of the net, right? And instead, you're just sending these tiny messages back and forth.
So if you imagine a complex front end where you make some Hanabi move and it updates its internal representation of the game and then sends messages to the back end, trying to keep the back end in sync with what's going on, it gets so complicated, so fast. But in our case, because the front end is just allowing a player to click buttons that make sense and the back end is doing all the game logic, the conversation is like: player A would like to give player B a hint about their fours, and then the game engine responds with, okay, then this changes and you change those things, and both sides are just very simple. I think we should strive to make storage just another feature in our application, right? That we shouldn't make it the center of the world that we base all things off of, right? If you have it as another feature, you can plan how and where it gets used and how that affects your application. You can choose not to use it at times, and it doesn't have to get all tangled up with what you're doing about, in this case, running a game, or in another case, serving surveys or whatever it is your app does. And then finally, this kind of strategy of app building does require a plan. You have to think about how the components are gonna be laid out, how they're gonna exist, how they're gonna get upgraded. You can't just kind of luck your way into this, I don't think. You have to think about it a little. Great. So that's what we have. All of the code that we used today is available open source in that repo that we've been working on for the last few months. James wrote an awesome README for that, so hopefully you can poke your way around. Feel free to reach out to us with questions afterwards. Yeah, thanks for coming.