Welcome, everyone. Thanks so much for having me here; I'm very excited to speak at this conference. This talk is titled "BEAM Architecture Handbook", and we're going to talk about a bunch of design patterns on the BEAM. How many of you have used Elixir or Erlang before? How many of you have used it in production? All right. So this is going to be a bunch of patterns, ideas, and tips for working with Erlang and Elixir applications running on the BEAM.

This is my username on the internet. I'm a member of the Elixir core team, and have been for three and a half years now. The good news about Elixir is that when I joined the core team it was a hipster language, and now I think it's not anymore, because Hacker News is asking "would you still pick Elixir in 2019?" So we've probably made it, you know, because people are getting tired of us. I work for a company called Community, which is a messaging system, and we are hiring remotely, so if you're interested, come talk to me later.

I come from a country called Italy; you've probably heard of it. I think the best thing we export from Italy is Italian people that are mad at Italian food cooked by other countries. If you want to know more, follow this Twitter account: it's a collection of screenshots of Italian people that are angry at other people, mostly Americans, for cooking Italian food wrong. I'll show you a bunch of tweets. For example: "obviously there is no spinach chicken pasta recipe in Italy; in Italy a recipe like this is a crime worse than a murder". That's the kind of stuff they say. "A world war won't be enough to forget this." Very strong people. This one: a "very common recipe used in Italy", which causes what we call a huge and uncontrolled vomit explosion. Very graphic people. This one is directed at Americans in particular: "we're just here to watch you burn one chicken parmesan four cheese mozzarella taco pasta Alfredo at a time." Very nice. Lots of vomit references. Then we have a defeated one: "I really don't understand why you do this." And a positive person that says "so happy that there is an ocean between us, guys". And then my personal favorite: "we're not offended, we're actually vomiting". So that's the best thing about Italy, in my opinion: Italian people very, very mad about food.

Anyways, let's talk about the BEAM for a little bit. We will talk about two aspects of it. First, the perspective of a single node: what happens inside a single node, how processes interact together, how supervisors work, and tips and tricks around that. Then we will move to multiple nodes interacting together: design patterns and tips to architect applications with multiple nodes talking to each other.

Let's start from the ground up, with a single node and a bunch of tips to architect applications around it. On a single node, the smallest unit of, let's say, life in an Erlang or Elixir application is a process. A process is a unit of isolation and concurrency; that's what it needs to be used for. If you need to isolate some operation, isolate failures, or isolate resources, then a process is a good idea. If you need to do something in parallel to something else, then a process is a good idea. Otherwise, processes are not a good idea. An example of where a process makes a lot of sense: if you have a TCP acceptor that just accepts TCP connections, then spawning a process for each TCP connection makes a lot of sense, because you're gaining, first of all, concurrency.
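As a minimal sketch of that process-per-connection acceptor (the `Acceptor` module, the echo logic, and the socket options are illustrative assumptions, not from the talk), one connection crashing leaves the others untouched:

```elixir
defmodule Acceptor do
  # Listen on a port and spawn one supervised task per TCP connection.
  def listen(port) do
    {:ok, sup} = Task.Supervisor.start_link()

    {:ok, listen_socket} =
      :gen_tcp.listen(port, [:binary, active: false, reuseaddr: true])

    accept_loop(listen_socket, sup)
  end

  defp accept_loop(listen_socket, sup) do
    {:ok, socket} = :gen_tcp.accept(listen_socket)

    # One process per connection: failures are isolated and
    # connections are handled concurrently.
    {:ok, pid} =
      Task.Supervisor.start_child(sup, fn ->
        # Wait until we own the socket before reading from it.
        receive do
          {:socket, socket} -> handle_connection(socket)
        end
      end)

    :ok = :gen_tcp.controlling_process(socket, pid)
    send(pid, {:socket, socket})
    accept_loop(listen_socket, sup)
  end

  # Illustrative handler: echo everything back until the peer closes.
  defp handle_connection(socket) do
    case :gen_tcp.recv(socket, 0) do
      {:ok, data} ->
        :gen_tcp.send(socket, data)
        handle_connection(socket)

      {:error, _reason} ->
        :ok
    end
  end
end
```

The handoff via `controlling_process/2` plus the `{:socket, ...}` message is there so the task never reads from a socket it doesn't own yet.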
So every TCP connection can be handled in parallel, but you're also gaining isolation, so that if a TCP connection goes down, the others are unaffected. That's a very good use case for processes.

Newcomers to the BEAM often tend to use processes for code organization: "I need to do this set of operations, I will do it in another process because they belong together." That doesn't make sense. That's what you want a module for. You want to group those operations together in code, sure, but probably not run them in a separate process, because a separate process entails a bunch of other things: it has a different memory space, it runs concurrently. So usually the good idea is to put that in a module, not in a process.

When using processes, my recommendation is to never use spawn directly. In Erlang it's sometimes necessary, but in Elixir you really have all the tools you need to never have to use spawn directly: things like GenServer, gen_statem, and Task. If you need a long-lived process that has state and interacts with other processes, a GenServer is a good idea. If you need a state machine, there's gen_statem. And when you need to spawn a process to do something, Elixir provides a very, very nice abstraction called tasks. A task is basically a very thin layer around a process that deals with a bunch of stuff that you would otherwise have to deal with yourself. Let's look at three super quick examples of why tasks are great: doing an async operation (something that happens, let's say, in the background), doing parallel mapping and parallel processing of collections, and doing time-boxed operations. Let's see why this is super useful.
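Before the task examples, here is a minimal sketch of the GenServer case just mentioned: a long-lived process holding state behind a small client API (the `Counter` module and its functions are made-up illustrations, not from the talk):

```elixir
defmodule Counter do
  use GenServer

  ## Client API: callers never touch the process state directly.

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  ## Server callbacks: the state lives in this process.

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end
```

The point is that GenServer gives you the request/response plumbing, timeouts, and supervision hooks that you would otherwise rebuild by hand on top of spawn.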
In the first example, we're just making an HTTP request in the background, and we do that by spawning a task with Task.async, which returns a task data structure. This is a thin layer around spawn, but it sets up a bunch of stuff that we would otherwise have to set up ourselves, like monitoring, and it fixes the request and response format you have with this process. It's very simple: you call Task.async and it returns a task; then you can do other stuff while the HTTP request is happening in the background. When you're ready to fetch the result of the HTTP request, you call Task.await, and Task.await again does a bunch of logic around receiving the response from the task: it cleans up, it won't leak messages from the task, it just returns the result if it's there. So this is not using spawn, but it's basically the same thing as doing spawn plus some sending and receiving, except that you don't have to write that yourself and there's less chance of making mistakes.

If you want to do some parallel processing and parallel mapping of collections, we have this super cool thing in Elixir called Task.async_stream. Task.async_stream takes a stream (so any collection, basically) and a function, and it returns a stream that will apply that function to every element of the original stream in parallel, but with a bounded number of processes. If you implement this naively with spawn and you have a 10,000-element collection, you spawn 10,000 processes; if you have a million-element collection, you spawn a million processes, and that doesn't scale very well. Task.async_stream lets you use a fixed number of processes to process the elements of the stream as they come.
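The two patterns just described can be sketched as follows; the `fetch` function is a stand-in for the real HTTP call (an assumption, since the talk doesn't name an HTTP client):

```elixir
# Stub standing in for an HTTP request.
fetch = fn url -> {:ok, "response from #{url}"} end

# 1. Async operation in the background: Task.async sets up
#    spawning, linking, and monitoring for us.
task = Task.async(fn -> fetch.("https://example.com") end)
# ...do other work while the request runs...
result = Task.await(task)

# 2. Bounded parallel mapping: at most 20 processes are used at a
#    time, no matter how many URLs there are.
urls = ["https://example.com/1", "https://example.com/2"]

responses =
  urls
  |> Task.async_stream(fetch, max_concurrency: 20)
  |> Enum.map(fn {:ok, response} -> response end)
```

Each element coming out of `Task.async_stream/3` is wrapped in an `{:ok, value}` tuple (or `{:exit, reason}` on failure), which is why the final map unwraps one layer.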
So in this case, say we have a bunch of URLs: we just say "fetch those URLs", then we map over the responses, and we can set options like max_concurrency: 20, meaning use at most 20 processes. That's a very nice use case, and very simple to use.

The last one is time-boxed operations: you want to do something, give it some time, and then give up. For example, here we want to make an HTTP request, give it five seconds, and give up if it doesn't finish in five seconds. We still start the task with Task.async, and then we use this Task.yield function. It basically says: wait for five seconds, and if the response is there, get it; if the response is not there yet, just return nil. Then we say ||, so if it's nil we fall through to Task.shutdown, which will shut down the task but also has nice logic to handle the race condition where the response arrives while the task is shutting down. I won't go through the code in detail, but it shows that there's nice tooling for this: stuff that isn't complex, but that, if you do it with spawn, takes a little bit of plumbing to set up the monitors and the right receives and to avoid race conditions. This is much easier to use, I think, so avoid spawn.

Since we're talking about processes: messaging between processes is asynchronous.
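The time-boxed pattern just described looks roughly like this; the slow function is an illustrative stand-in for the HTTP request:

```elixir
# Stub that takes longer than our deadline.
slow_request = fn ->
  Process.sleep(10_000)
  :response
end

task = Task.async(slow_request)

result =
  # yield waits up to 5 seconds; if it returns nil, fall through
  # to shutdown, which also catches a reply that races in while
  # the task is being stopped.
  case Task.yield(task, 5_000) || Task.shutdown(task) do
    {:ok, response} -> {:ok, response}
    {:exit, reason} -> {:error, reason}
    nil -> {:error, :timeout}
  end
```

The `yield || shutdown` idiom is the standard way to get "use the result if it arrived in time, otherwise give up cleanly" without leaking the task process or its messages.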
One thing that I really, really like about the BEAM is that it makes you think from a distributed-systems perspective naturally. On the BEAM there's process-to-process stuff to be concerned about: asynchronous messaging between processes, having timeouts, doing time-boxed operations, making sure that when you send a request you get something back. Then there's a layer on top, the node-to-node problems, and those are a superset of the problems you have between processes. We'll go into this in more detail a little later, but I think it sets up a very, very nice mindset for writing applications.

So one thing you have to think about when working with processes is asynchronicity. Communication between processes is asynchronous, so you have to force yourself to think: if I send a request, then to know that it arrived I have to receive an acknowledgement; but then I probably have to have a timeout; but what if the response comes after the timeout? This is a very nice thing to focus on, because it kind of forces you to build more robust systems. Crashes as well: any process can crash at any time, which is the same principle as every node, every computer, can crash at any time. So you always need to have a plan for when that happens, and the BEAM forces you to have this plan at a very low level, which I really, really like. I really like this quote by Chris Keathley: "any of your processes will crash at any given moment and all of their state will be lost". Which is true, and it's hard to deal with. A few things you can do: if you have a process with state, flush the state to a persistent storage and then fetch it back, so that the process basically just keeps the very latest state; or make the state rebuildable, so that if the process crashes it can just start back up and rebuild the state from somewhere. In general it's important to keep this in mind, because it forces you to build better and more robust applications.

One thing that's very easy to miss when dealing with this inter-process, asynchronous communication is the mailbox. Erlang has selective receive, meaning that when you receive a message you can pattern match on the message, and some code like this will never fail: if you get a message that doesn't match any of the clauses, it will just sit in the mailbox of the process. So with code like this you're risking leaving messages in the mailbox of the process that will never be consumed, and that becomes a memory leak, basically, because you never consume the messages that are there. This is rare, because it's rare to use receive manually; most of the time you will use something like a GenServer, which provides the handle_info callback to handle messages. The nice thing here is that it forces you to handle all possible messages, because if you don't handle one, it's actually going to fail with a function clause error. So it forces you to handle all possible messages, eliminating the memory leak of messages never being consumed.

When we step up from processes, the thing that coordinates processes is supervisors. The philosophy of supervisors is basically "have you tried turning it off and on again": when there's a problem with a process, like corrupted state, or the process arrives at a situation that made it crash, there's just another process that restarts it. And it turns out that, like a lot of simple designs, this works very well: restarting a process from a known good state works well, and it's rare that the process ends up in the same corrupted state again.

So, a bunch of advice around supervisors. The first one is to start with a whiteboard design, so I like doing
this when I'm designing my supervision trees: I like to draw them out. It's very practical advice: just draw them out, because it's very hard to see them in code. Having the components of your system laid out, and designing how you want to handle failures of parts of your system, is very nice. You can design the supervision tree and how it reacts to failures on paper, which I think is much nicer than doing it in code, because code is very hard to read for this.

Another piece of advice is to learn the supervision strategies very well, and to know when to apply which strategy. We have three strategies: one_for_one, one_for_all, and rest_for_one. one_for_one means that if a child dies, only that child is restarted; one_for_all means that if a child dies, all the children are restarted; and rest_for_one means that if a child dies, only the children started after it are restarted. Knowing the strategies is very powerful, I think, in combination with knowing how to nest supervisors. Nesting supervisors is something that's not very advertised, let's say, in the BEAM community, but I think it's a very powerful concept. Let's look at a quick example of why.

Say we have this situation: a supervisor, then a state process that holds some state, and then three worker processes, and these worker processes need the state to work, so the state process needs to be up. What strategy would you choose for the supervisor on top? If we use one_for_one, it works very well for the workers: if a worker goes down, one_for_one just restarts the worker and we're back to the original state. The problem with one_for_one is that if the state goes down, the workers are unaffected, so they will crash somehow, because they can't talk to the state. We want a situation where, if the state goes down, we restart all of the processes, so this is not the strategy for us.

What about one_for_all? one_for_all is good if the state goes down, because then we restart everything, and we restart in order: the state starts first, and then the workers will have the state available. But if a worker dies, we don't really care about restarting the state and all of the other workers: if a worker dies, it restarts everything, and that's not really a convenient strategy for us.

rest_for_one is the last one, and it kind of works better: if the state crashes, it will restart all processes after it, so it will restart all workers, which is the right behavior. But if a worker crashes, it depends on where the worker is. If the worker was started last, good, that's the only one restarted; but if the worker is, for example, second to last, we will restart two workers instead of one, which doesn't really make sense for our use case: we want to restart only that worker. So there's no single strategy that will make us happy.

But if we introduce nesting, we can easily solve this problem. Say we turn our supervision tree into something like this: the supervisor on top, then the state, and then another supervision subtree, with a worker supervisor and the workers under it. Once we have this situation, it's very easy to make choices for the strategies. The worker supervisor is one_for_one, so that workers are independent of each other: if one goes down, it's just restarted. And the main supervisor is rest_for_one, meaning that if the workers keep crashing, the worker supervisor will crash, but it will not affect the state; and if the state goes down, the supervisor will shut down and restart the whole worker supervision subtree. That's the strategy we want. So it's very nice to be able to nest, and composing and nesting supervisors like this makes the strategies extremely flexible to use.

Another couple of bits of very practical advice. First, supervise every process, because every process supervised
means that processes have proper shutdown when the application is shutting down or when parts of the application are crashing, that you can trace processes through the supervision tree, and that they have proper resource semantics. So it's nice to have them all supervised. Another very practical thing: give a name to most of your supervisors, because when you're debugging a system in production it's very hard to go find the supervisor you want, to find the child process, to examine what it's doing. If you give names to supervisors, it's just much easier. Very practical.

One more thing about supervisors: how to test them. This is a topic that's covered very rarely, if at all, in the literature of the Erlang and Elixir communities. I think the most practical thing to do is just go into Observer and manually kill stuff, and see if the supervisor reacts well. It's not what we're used to with automated testing and everything, but it's hard to test supervisors in an automated way, and doing this gives you a little bit more confidence: if you have your application running, you can go into Observer, right-click and kill processes, and observe whether it behaves correctly. It's nice to be able to do that. Otherwise you can introduce some chaos engineering to test this, if you want. And if you want to do it more programmatically, there's this library called sups, which is basically property-based testing applied to testing supervisors: it will go and shut down random processes and random supervisors in your application's supervision tree, and if it finds a failure, it will try to reduce it to the minimal case that creates the problem in your application. It's a little bit more work, a little bit more setup, but it's nice if you really want to make sure that your supervision architecture works.

Moving on from supervisors, I want to talk about a pattern that I see implemented a
lot on the BEAM and that needs to be discussed in order to be implemented right, and that is connection handling. In a lot of applications, at least for me, it turns out that we have to handle some kind of connection: a connection to a database, to a message queue, to a cache, whatever; a persistent connection to something external to your system. For example, it could be Redis: you have Redis, you have your app, you have a connection process that holds some kind of connection to Redis, for example a TCP socket, and then you have Redis the database sitting there.

The problem that I see often is that people write the Redis connection process so that it will crash if there's a network fault, because of the whole "let it crash" thing: the supervisor will restart it anyway. The problem is that the supervisor doesn't provide smart restart logic; it provides dumb restart logic. It just restarts. But with these kinds of connections we usually want smart restart logic: we want backoff, we want to say "let's reconnect after one second, and then maybe after two seconds", so exponential backoff, maybe randomized a little.

In those cases, I think the right abstraction is a connection manager. A connection manager is still a process that holds a connection, but it's a process that is not supposed to crash for things that are very much supposed to happen, like the connection going down. If the connection goes down, that's a normal case, and the connection process doesn't need to go down with it: it just needs to register that the connection is down, and then it needs to have some logic to say "I will try to reconnect, maybe in a second". In the meantime, while the connection is down, it can just return errors. It presents the connection at a lower level of abstraction: the connection manager, the Redix process in this case, is still there, and if it's not connected it returns an error with a reason. Then you bubble up the decision of what to do with the error to the application that's using it: for example, if you're using Redis as a cache, you just say "if there's an error, fall back to the database".

We now have in Erlang a very nice abstraction to help with this, which is gen_statem, an abstraction to build generic state machines. This one is pretty straightforward: we have basically two states, connected and disconnected. When you successfully establish the connection, you end up in the connected state; if there's a problem with the connection, you go back to the disconnected state; and when you receive a request in the disconnected state, you can't serve it, so you return an error. Fun fact: if you look at that connected state, you can tell that I can make typos even on slides, which I thought was impossible. Anyway, this is how to write a process that manages a connection, where the connection is a resource of the process, instead of a process that is tied to the life of the connection.

This also highlights one more thing we need to think about when working with supervisors, which is that init is for setting up state. init is what a supervisor calls in a GenServer, or a gen_statem, or any gen_* process: it calls init and waits for init to be done before moving on to starting the next child. So if you're writing a connection process and you establish the connection in init, the problem is that, first of all, you're slowing down the
startup of your application, and second of all, you're tying the availability of a resource that's going to be unavailable at some point to the life of your application. For example, if you have Redix in your supervision tree and you set up the connection in init, you're saying: if Redis is down at the moment my application starts, my application won't be able to start. That's usually not very good. That's why I'm saying init is just for setting up state. In a connection manager, the state is basically just, maybe, the address of the Redis you want to connect to: you just parse options and build a state that you can connect from, and then you return from init, and then you connect asynchronously. You basically have an asynchronous init, where you connect right after you return from init, so the supervisor will have already started the child. In a GenServer, for example, you can return {:continue, term} from init: the supervisor will consider this child started, and the GenServer goes straight to handle_continue, which is another callback, where you can set up the connection.

This also has the advantage of declaring that your application can handle this dependency, in this case Redis, being down: your application knows how to deal with Redis being down. In general, what I try to reason about is whether I hard-need a dependency. If my application really needs a dependency, and it can't work if it's not there, then you can have something like a sync init, where you establish the connection in init and raise on every error you get from Redis instead of trying to handle it. But in most applications I've worked with, you don't strictly need these things for the application to work: the application shouldn't go down just because, for example, Redis is down. In those cases it's very fine to have asynchronous initialization
of this connection process, and to handle the errors that come from it.

Talking about connection processes, it's easy to get into the discussion of bottleneck processes, because if you have a connection process, at some point it will not be enough to handle the load to that thing. For example, if you want to talk to Redis, it's usually not enough to have one connection: you maybe want more for the same application, otherwise everything is serialized through that one connection. That's a general thing to be aware of with processes: a process is sequential; processes are always sequential. So if you have a situation where you have a cache process and a bunch of processes that want to access this cache, and they do a blocking call, then the cache will serialize every call: every process will be waiting for all the calls before its own in order to get its result, because everything is serialized in the cache. In a situation like this, sometimes it's easier to use a shared resource like ETS, where maybe only reads go through ETS and writes still go through the cache process, so you don't have to care about race conditions too much. But you can't always use a shared resource like ETS, so what you do to increase the availability of connections to something external to the system, or of resources in general, is to use pools. A pool is just a set of connections, let's say, that are connected to a resource. In the Redis example, the resource is Redis, and you want a pool of connections to Redis, so that you can pick connections from this pool and have more than one, instead of having only one that all requests go through. There are two main types of pools in Elixir, and I ended up writing a bunch of pools in my Elixir career, so I'm giving this advice because it happens. The two
kinds of pools that I generally write are either checkout pools or name-based pools. In checkout pools, you have the resource and then a pool manager, which is a process that manages a bunch of connections to the resource. The way it works is that when a process needs a connection, it asks the pool manager to check out a connection, uses it exclusively for a while, and then checks it back in. In pseudocode it looks something like this: connection = Pool.checkout(pool), and now the connection belongs to this process, and this process is the only one that can use it; then you do something with that connection; then you check it back in when you're done. This is very useful when you have resources or connections in the pool that can only be used by one process at a time, for example when you make a request and wait for the response as one operation.

When you have connections that can themselves handle multiplexing, so they can send and receive responses at the same time, which is what TCP allows, then name-based pools are a very, very useful thing. I ended up writing a bunch of these because they are very useful, and the mechanism is simple: you register the connections under names that you can derive, for example "connection_" plus an index, and then, as long as you know how many you have, you can just choose a random index and get to the pid of a connection in the pool. This scales better because you don't have to go through a pool manager, and it works when the connections support multiple processes using them at the same time, so they don't need to be exclusive.

In Elixir we now also have this thing called Registry, which helps here. Registry is just a name registry where you have a name corresponding to a bunch of pids, and it's basically an ETS table, so it's accessible without going through a process. Connections can register themselves in the registry, and when a process needs a connection, it can just ask the registry for the connections and pick one at random. It's nicer because you can do things like de-registering a connection when it goes down, so you don't get it at all. In pseudocode: you get all the connections from the registry, you choose a random one, and now you have the pid of a connection that you can use. So when you have connections that don't need to be exclusive, this is a very, very useful kind of pool.

Going back a little to error handling: this is a topic that's very sensitive on the BEAM, and it's very intertwined with supervision and connection managers and all these things. My advice is to handle all expected errors. For example, a TCP disconnection is an expected error, and that's something you should handle most of the time. "Let it crash" is, in my opinion, a very misunderstood phrase on the BEAM, especially for newcomers, because we try to sell it as "you can write programs that crash at any time and everything is going to be fine". That's not really the case: if you have expected situations, like a TCP error, and you crash, you're not going to have a good time. You're going to have a good time if you crash on unexpected or unrecoverable things. For example, if you parse something when it comes into your system, and you say "from now on, this is an integer", then you don't have to check that it's an integer every time: you just do stuff, and if it's not an integer, it will crash. That's an unrecoverable state, because it will probably take more time to try to re-parse it or find out where it went wrong; crashing is usually good in those cases, because the system won't go down, since supervisors still
provide a protection for the system to go down from the system going down so that's very nice but crashing on expected errors doesn't really work because supervisors are for emergency restarts so they don't have smart logic to do restarts as we were as we were saying in the connection manager example if you want to do back off you can't do it through supervisors supervisors just something crashes restarts restart it, something crashes again restart it at some point just crash the whole part of the tree and then bubble that up it's not smart logic so if you need smart logic you just need to implement that in process supervisors just for emergency restarts so it's something crashes and it's unrecoverable a supervisor will guarantee that your whole application doesn't go down but the failure is very localized to a little part of the supervision tree and it takes a while to bubble up and there's a chance that it doesn't stay so those are a bunch of tips and tricks on processes and supervisors and everything is on a single beam node but many times you have applications that are main of multiple beam nodes I tend to reason more about applications that are on multiple nodes but that are not communicating through distributor even if it's mostly the same thing but this let's say that this means applications are run on different nodes but don't communicate to each other with distributor Erlang so there are normal services communicating to each other or even just distributor Erlang so multiple VMs actually connected together and communicating to each other, the same principles apply the nice thing that I like about Erlang is that it forces you to think even when you're a process to process communication but you have to think about when you have node-to-node communication basically node-to-node communication is the same problems as inter-process communication so you have async delivery you have crashes unexpected crashes plus unreliable message delivery so when you have 
When you have process-to-process communication, you have one advantage: if you send a message to another process, it's guaranteed to be delivered; it won't get lost, unless the system crashes, and then you have other problems. But if the sending process and the receiving process are both alive, the message is guaranteed to get there. This is not true in a distributed system: if you send a message to a process on another node, the message can get lost if the connection gets interrupted. So you have a few more problems, but the base set of problems is the same.

When you have distributed systems, with multiple nodes that need each other, I want you to know that the BEAM doesn't solve distributed systems. People say "the BEAM solves distributed systems"; my reaction is no. The BEAM doesn't really do anything for distributed systems except provide a good set of tools. The tools that the BEAM provides make it easier to deal with a bunch of distributed systems problems, but it doesn't provide solutions; it provides tools. A few examples of tools that I really like: send. Send is very simple, it lets you send a message to a process, and the cool thing is that it works node-to-node. If you have distributed Erlang, you can send a message to a process living on another node, transparently, and you don't have to care. And it kind of forces you to write your interactions in an asynchronous way: if you want to know that the other process got the message and processed it, you still have to wait for an ack. It's very nice because you can use send with the same semantics and the same error handling whether you're sending a message to a local process or to a process on another node. The same thing goes for process monitors and node monitors.
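A small sketch of this location transparency; the remote node name `:other@host` in the comments is made up:

```elixir
# Sketch: send and monitor look the same for local and remote processes.
pid =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end)

# send/2 is fire-and-forget whether pid is local or remote; if you need
# to know the message was processed, you wait for an ack like this :pong.
send(pid, {:ping, self()})
:pong = receive do msg -> msg end

# Monitoring is also identical for local and remote processes; here the
# process has already exited, so the :DOWN message arrives immediately.
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# With distributed Erlang the remote versions are the same calls, e.g.
#   send({:name, :other@host}, {:ping, self()})
#   Node.monitor(:other@host, true)
```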
Whether a process is on this node or on another node, monitoring it and getting notified when it goes down works very smoothly. Or global process registration, which is a global name registry for processes. These are tools that make it very pleasant to work with distributed systems and distributed applications, but they don't solve the inherent problems, like: how do I know that the other node got my message?

One thing I want to really quickly touch on is hot code upgrades on the BEAM. I've never done hot code upgrades. Have you ever done hot code upgrades? I don't even need to ask, because you haven't, because nobody has. It's just one of those features: it's cool that it's there, being able to update code on the fly without shutting down your application, but it has a bunch of enemies in the modern development world. Docker is enemy number one: when you're building a new Docker image with updated code, it's not straightforward to update the code on the fly in a running system, so usually it's just easier to stop the container and start a new one. But my question is: do you even need hot code upgrades? And my answer to myself is: probably not. Say you have a server and a WebSocket client. This looks like a good use case, because you don't want to interrupt the WebSocket connection when you're pushing new code to the server. The problem is that, in the world we live in, connections suck. The connection will be interrupted anyway, for a million reasons: because people put their phones in airplane mode, because the signal goes away, because the connection is unstable, whatever. Connections are terrible, so you still have to handle the connection going down; you might as well make the deploy, where you shut the application down and start it again, just one more possible case of the connection going down.

One more thing I want to talk about is "to BEAM or not to BEAM": when do we use the BEAM and when do we use other solutions? For example, for data storage, when do you use Postgres and when do you use a BEAM solution like Riak? The same goes for key-value storage: when do you use Redis, and when do you use ETS or Mnesia? I tend to go for whatever is best for the case. If I need something quick and fast that's fine on a single node, ETS is very good for key-value storage, because it's super fast and you don't have the overhead of connections and a protocol. But you have to think about a bunch of things. Interoperability: are you writing all your services in Erlang? If you aren't, then having a database that's not proprietary to the BEAM might be very nice, because you can access it from multiple services. Does your data need relational features, like foreign keys? Then it's very nice to have a proper relational database. Do you need to do data analysis on your data? It's harder to do data analysis on ETS than on Redis, because people have written tools that work with Redis. And some things, like replication, are very hard: Postgres will handle that for you, Redis will handle that for you, but if you have to build replication on top of ETS, it's not going to be fun. So my advice is: use the right tool for the job.

Then I want to talk about one thing that's very dear to me, that I've kind of hinted at throughout the whole talk, which is inter-component communication. That could be inter-process communication or inter-node communication: how do two things talk to each other asynchronously? It really boils down to requests. When an Elixir process or an Erlang process sends a message to another process, that's a request.
It's telling the other process to do something: it could be, for example, an HTTP connection process asking the database connection process to make a query and return results. It's the exact same thing in a distributed system where you have multiple services: one service, say an order service, sends a request to the storage service. I think it's really key, when writing applications, to understand what kind of requests you're making between your components. There are three kinds of requests: at-most-once requests, at-least-once requests, and exactly-once requests. Categorizing your requests, not all of them at once, but putting the particular request you're dealing with into one of these categories, helps a lot, at least it helps me, to decide the right behavior and to get into the right mindset for dealing with it. So let's quickly go over all three. At-most-once requests are basically fire-and-forget: you send a request somewhere and you don't care if it gets processed; you only care that it doesn't get processed twice. For example, this could be a GenServer cast, where I just send a message and don't care about the response. In service-to-service communication, it could be a message that you put on RabbitMQ once, without caring whether it gets there. This is very useful for things like push notifications: when I want to send a push notification, it's usually better to send it at-most-once, because if I don't send it, maybe the user doesn't notice, but if I send it twice, the user definitely notices. It's a good use case for best effort: just try to send it once.
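The GenServer cast version of an at-most-once push notification could look like this. The module, function names, and `deliver_push/2` stand-in are all made up for illustration:

```elixir
# Sketch of an at-most-once request using GenServer.cast.
defmodule Notifier do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # cast/2 returns :ok immediately: the caller never learns whether the
  # notification was delivered. Best effort, processed at most once.
  def push(user_id, text), do: GenServer.cast(__MODULE__, {:push, user_id, text})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:push, user_id, text}, state) do
    deliver_push(user_id, text)
    {:noreply, state}
  end

  # Stand-in for a real push-notification API call (APNs, FCM, ...).
  defp deliver_push(user_id, text), do: IO.puts("push to #{user_id}: #{text}")
end
```

If the Notifier process crashes between the cast and the delivery, the notification is simply lost, which is exactly the trade-off at-most-once accepts.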
Sometimes you want to make sure that your request is processed at least once. This requires a little more state, because you need to keep track of whether you got an acknowledgement from the other component that the request was processed. I make a request: did it get acked? No, so I retry. Did it get acked? Yes, so I'm good. The fundamental building block of at-least-once requests is the retry: when you want to make sure something gets processed, you need retry strategies, so that if it fails, you try again. The trick on the receiving side of at-least-once requests is idempotence. Idempotence means that handling the same request multiple times has the same effect as handling it once. For example, a delete on the database: it doesn't matter if you delete something once or a hundred times, it's still deleted at the end. When you have idempotence, things are easy, because you don't need to make sure the request reaches the recipient exactly once; it can get there many times, get processed many times, and it's all the same. The third kind is exactly-once requests. Those are the most expensive ones. Exactly-once means that you want the request to be processed only once, and that requires state at both the sender and the recipient: the sender needs to keep track of whether the recipient acked the request, so it can retry, and the recipient needs to ensure that if it ever gets the same request more than once, for example because of a retry after a dropped acknowledgement, those multiple copies of the same request are only processed once.
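A minimal sketch of both halves of this: a sender that retries until it gets an ack (at-least-once), and a receiver that remembers processed request IDs so duplicates are dropped, giving exactly-once processing overall. The module names, the ack atoms, and the 3-attempt limit are all assumptions:

```elixir
# Receiver side: process a request only if its ID hasn't been seen before.
defmodule Dedup do
  def handle(request_id, seen, fun) do
    if MapSet.member?(seen, request_id) do
      {:duplicate, seen}
    else
      fun.()
      {:ok, MapSet.put(seen, request_id)}
    end
  end
end

# Sender side: retry until acked, up to a fixed number of attempts.
defmodule Sender do
  def request(fun, attempts \\ 3)
  def request(_fun, 0), do: {:error, :no_ack}

  def request(fun, attempts) do
    case fun.() do
      :ack -> :ok
      :no_ack -> request(fun, attempts - 1)
    end
  end
end
```

In a real system the seen-IDs set would live in durable storage, not an in-memory MapSet, since the whole point is surviving crashes and retries.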
For example, you give each request an ID, and when you process a request you check that it hasn't already been processed, which means you need to store the IDs of processed requests. It's very rare that you need this, so if you can reason about your system without it, that's easier, but it's necessary sometimes, for example for financial transactions, where you want to execute the transaction exactly once.

Time for conclusions. The main advice, use the right tool for the job, is very obvious, I think. We live in a very polyglot software development world right now; I really like the BEAM, but I think you should always consider whether it's the best choice. If you might want to introduce other components into your system, make sure you're not locking yourself into the BEAM too much: go use Redis, go use Postgres, if they make sense and if they let you interoperate with other services. The other main advice is: learn about distributed systems, because the problems that distributed systems research tries to solve are problems we deal with all the time, whenever we have multiple services or multiple nodes talking to each other. And one more piece of advice: BEAM architecture is good, so take it with you even if you don't write Elixir. This kind of reasoning about requests, about asynchronous communication, about restarting processes that crash, about having some kind of isolation: those are generally good ideas, so take them with you even when you're writing something else. A few resources. The first is the book Designing for Scalability with Erlang/OTP. The second is ferd.ca, the blog of Fred Hebert, a very nice guy who writes very interesting things about the BEAM that I really enjoy reading. The third is Distributed Systems for Fun and Profit.
It's just an introductory book about distributed systems that I recommend everyone reads if you're writing systems where multiple things interact with each other.