Hi, everybody. I hope you had a good lunch. I'm going to be giving you a bit of a talk on Erlang's fault-tolerance capabilities and the framework, the OTP framework that comes with it, which allows you to actually leverage those capabilities. For those of you who were in Brian's talk this morning, he gave a very good overview of what sort of things you want out of a fault-tolerant system and why you want them. This is sort of more diving down into the particulars of Erlang and how you get those from a programming, from a code perspective. So before we get too far, this is designed as a tutorial. You can follow along, or not, as the mood takes you, but if you are following along, what you will need is a working version of Erlang installed, and you'll need the source code that's available at that GitHub address there. Also, if you're going to follow along to the end, we're going to, if there's time, we're going to be playing a little bit with Dialyzer. It's worth running that command that's at the top now, because that builds up what's called a PLT, which makes Dialyzer actually run at a sane speed, so we might actually finish it before the end of the day. I'll just give you a couple of seconds to look at that one. Who on earth am I? My name's Bernie or Bernard, whichever one is easier for you to pronounce. I work for a company called ShoreTel. This slide's actually a shade out of date. We've dropped the "Sky" off the name of the division I work for now, but I work for the hosted part of ShoreTel, which is notionally the cloud part, and I work on the enterprise-grade hosted Voice over IP system. So it's a system where we run in our own data center; you come along with a bunch of IP phones, plug them into the internet, and we provide you a service that looks exactly like you've got a Cisco UCM or a ShoreTel box in your own building, but we virtualize that and put it all in the cloud for you. 
I've been working on that for about nine years now, just over nine years, and for the last six or so of them, we've been doing stuff in Erlang to a greater or lesser extent. Nowadays, everything new we do is in Erlang. We've still got a lot of legacy C++ code there, because it's a very big system, but that's what I do for a crust day-to-day. We also actually have an office here in Bangalore, so, you know, go over and say hi to them. They're nice guys. I was over there earlier this week. So, first things first. We're going to start with a basic Erlang application. Now, this is the standard... it's the Erlang equivalent of Hello World. It's the first thing everyone learns to write in Erlang, which is a really basic chat server. Can I just get an idea? Who has written any amount of Erlang in the past? Who has got a bit of an overview? I know you have. Who actually works on Erlang in production? On production machines? Cool. Okay. About 50% of you know some of what I'm talking about. So we're going to go sort of very quickly through some basic Erlang stuff. It's not really the focus of this talk, but I realize that for a lot of you, the whole language is a bit of an unknown. So we'll go through some stuff fairly quickly, and then we'll sort of dive into the OTP. So, what we're going to write first is a chat server. And it's going to have these three components that we've got up on the board. It's going to have the main server, so the process that starts when you start everything up. It's going to have another process that is the socket listener. It sits there listening for incoming connections. And it's going to have a bunch of transient processes for each of the existing client connections. So, for those of you who've got the code, it's in the simple... Is it out of that? Exit. Come on. Be cool. There we go. Here we go. So, this is our basic chat client, and it's going to look fairly familiar to anyone who's written a little bit of Erlang. 
So, up the top, we've got our module description, we've got the function that we export, which is just start, and we've got the start function. And this is not the first thing I should be looking at. Chat.Erl is the first thing I should be looking at. Good. This one has the start function, which is what we're going to call to get everything going. And it's going to spawn two processes, so I actually missed the process off there. There's this main loop. There's also what I've added called the chat hub. And the chat hub is the process that sits at the center of all the clients. All the clients send their messages to it, and it rebroadcasts them out to all the rest of the clients. So, we start up the chat hub process. We start up a local process in the listen function. And when we stop, we send the shutdown, and we exit the chat process. And so, in the listen process, we're going to call... We're going to register ourselves as having the name chat. And we're going to start listening on a socket. And this socket really should just be default chat port, which I've cleverly defined up the top and not used. Excuse the sort of line wrapping there. It's obviously not very high resolution to work with on these projectors, but we're going to listen, and these parameters just say incoming messages on that socket should be sent as a list. It's not active, so we've got to explicitly set it active true when we want something, and we're going to reuse the address, which means if I've just shut down a socket on this one, feel free to reuse the same port. It's all good. And then we're going to go into this await connection loop here. And await connection takes the socket as a parameter. It calls accept on it. Accept is a blocking function. Now, this is going to be problematic later, but for this simple one, it's fine. We're going to call accept on that socket, accept blocks until we get a new socket connecting to us, and that's going to be returned as connection. 
And so when we get that connection coming in, we're going to spawn a new chat client process, which is the one I showed you, in error, earlier. We're going to say to that socket that we've got, hey, your new controlling process is the process I just spawned. By the way, for those of you who aren't familiar with Erlang, when I talk about processes, what I'm really talking about is, you can think of them as threads. They're called processes because semantically they are processes. They don't share any memory with the other processes, but in Erlang they're called processes. Think of them as threads if that helps. So we're going to call gen_tcp:controlling_process, which passes control of that connection, of that socket that was just created, to that new client process. We're going to tell the client that it now owns that socket, and we're going to do that with this. This, again, for those of you who aren't familiar, this in Erlang is send message. So we're going to send a message to the client. That message is going to be a tuple containing the atom socket and the socket itself. And then we're going to tail-recurse into ourselves and wait for the next connection. So it's a very simple loop: gets a connection, creates a process for it, passes that socket off, goes back and waits for the next connection. So this is our chat client process, the one that's spawned for that socket, to handle that socket. And again, it's not particularly complex. There's a little more to it. We'll just work from the top. When it starts, it's there waiting for that message that we just passed it. So it waits for that socket, receives that socket, and binds it to the socket variable. Then it sends a message to chat hub, which is that other process that we started, to say, hey, I'm a new client. Here's my process ID, which is what self returns. Just so you know, so that you're aware of me and are going to broadcast messages back to me when you receive them. 
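Pulling together the pieces described so far, a minimal sketch of the listener's accept-and-hand-off loop might look like this. The module and function names here (chat_listener, chat_client:init/0) are my reconstruction for illustration, not necessarily what's in the tutorial repo:

```erlang
-module(chat_listener).
-export([listen/0]).

-define(DEFAULT_CHAT_PORT, 2000).

listen() ->
    register(chat, self()),
    %% Deliver incoming data as lists, passive mode, allow port reuse.
    {ok, Socket} = gen_tcp:listen(?DEFAULT_CHAT_PORT,
                                  [list, {active, false}, {reuseaddr, true}]),
    await_connection(Socket).

await_connection(Socket) ->
    %% accept/1 blocks until a client connects.
    {ok, Connection} = gen_tcp:accept(Socket),
    %% Spawn a per-client process and hand the new socket over to it.
    Client = spawn(chat_client, init, []),
    ok = gen_tcp:controlling_process(Connection, Client),
    Client ! {socket, Connection},
    await_connection(Socket).
```

The hand-off via gen_tcp:controlling_process/2 means any subsequent TCP messages for that connection are delivered to the client process rather than to the listener.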
We're going to write out a little bit of debug, saying new client started. And then this is, this is just a bit of Erlang magic which says, on this socket, I'm now ready to receive something. But I'm only ready to receive something once. So I don't, like, fill up my message queue with stuff. I'm ready to receive something once. And then we're going to go into loop. This is our main loop. What does loop do? Loop looks at that socket. Sorry, loop doesn't look at that socket. Loop sits there and waits until it gets a message. And that message will be one of a few things. It can either be send, and this is a message that comes from chat hub which says, I've got a message that needs broadcasting to all the clients. Here's your copy of it. And all we do is just send that over the TCP socket. We loop again back to here. And then we get into the three things that the TCP socket can legitimately send us, which is TCP error. Something went wrong. Close the socket. Don't, well, we don't even have to close the socket because it's already broken. All we do there is write out an error message. You'll see there's nothing else that this function executes. So in fact, the process just ends there. And similarly, if the socket is seen to be closed by the remote end, we write out a message. Close, done. This is the actual workhorse function. It says we've received data on this socket. Now TCP is a streaming protocol, so packets can be broken up. And for simplicity, what we're doing here is the messages, we're not going to print them out, we're not going to re-broadcast them to everyone else until we've received a carriage return. So what we do with the data we get, you'll notice this other parameter here which I haven't mentioned, data so far. So that's going to be important in a second. But we're going to call handleData down here. HandleData takes in the list of bytes that was received and says, is there a carriage return at the end of it? Sorry, new line, not carriage return. 
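The client side of that hand-off, with the one-message-at-a-time {active, once} trick just described, might be sketched like this. Again, this is a reconstruction; handle_data/1, covered next, is assumed to return any unprocessed bytes:

```erlang
init() ->
    receive
        {socket, Socket} ->
            %% Tell the hub we exist so it will broadcast to us.
            chat_hub ! {new_client, self()},
            io:format("New client started: ~p~n", [self()]),
            %% Deliver exactly one TCP message, then go passive again.
            inet:setopts(Socket, [{active, once}]),
            loop(Socket, "")
    end.

loop(Socket, DataSoFar) ->
    receive
        {send, Message} ->
            %% The hub asked us to deliver a broadcast to our socket.
            gen_tcp:send(Socket, Message),
            loop(Socket, DataSoFar);
        {tcp, Socket, Data} ->
            Remaining = handle_data(DataSoFar ++ Data),
            inet:setopts(Socket, [{active, once}]),   %% re-arm for one more message
            loop(Socket, Remaining);
        {tcp_closed, Socket} ->
            io:format("Socket closed~n");             %% no recursion: process ends
        {tcp_error, Socket, Reason} ->
            io:format("Socket error: ~p~n", [Reason]) %% likewise, process just ends
    end.
```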
If there's no carriage return, we know that it's only a partial packet. It's only a partial message, so we've got to wait for the rest of it to come in over TCP. So all we're going to do there is just return what we've got so far and exit out of this function. And what's going to happen there is remaining, so the unused data, the unprocessed data is assigned to that data that we got. We've passed in, you'll notice, data so far plus data, plus plus data. And so we're going to return remaining as unprocessed data, and that's going to be fed back into the loop as data so far. So next time, hopefully, we do get an end of line. We'll append the packet we just got to the data we've got so far, and we'll use that whole thing as the message. If we did get the end of line, what we're going to do is split the string where that line is. We're going to handle a line with that one line, and we're going to recursively call handle data ourselves. In case two messages came in in that one TCP packet. And what does handle line do? Handle line, so ignore that commented out bit. That's what I'm going to show you later. Handle line just says send this message to the chat hub, send a message that is broadcast from me of the line. So it's relatively simple. When we get a line, we pull out the exact line, throw it to the chat hub and say broadcast this. So the final bit of the puzzle is the chat hub. And this looks, this is also very simple. We start up, we register ourselves as chat hub, and we call loop with an empty list. And the list that's passed in is going to be the list of active clients. And we're going to sit here waiting for a message. So we can receive the new client message, which you saw the client sending. What that does is it sets up a monitor on the process. For those of you not familiar with Erlang, monitor is a built-in function that allows you to watch a process and you automatically receive a message if that process ends. 
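The buffering logic described above, waiting for a newline and allowing for two messages in one TCP packet, is easiest to see as a pure function. This is a sketch under my own naming (split_lines/1), not the repo's exact code:

```erlang
%% Split buffered bytes into complete lines plus any unprocessed leftover.
%% The leftover is fed back into the loop as the new "data so far".
split_lines(Data) ->
    split_lines(Data, []).

split_lines(Data, Acc) ->
    case lists:splitwith(fun(C) -> C =/= $\n end, Data) of
        {Partial, []} ->
            %% No newline yet: it's a partial message, keep buffering it.
            {lists:reverse(Acc), Partial};
        {Line, [$\n | Rest]} ->
            %% Got a full line; recurse in case two arrived in one packet.
            split_lines(Rest, [Line | Acc])
    end.
```

So split_lines("hi\nthere\npart") gives {["hi", "there"], "part"}; each complete line would then be handed to handle_line for broadcasting, and "part" would be carried forward as the unprocessed data.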
And indeed, you receive a message if that process doesn't exist as well. And so we're going to add a monitor to it, and we're going to call ourselves, loop, with that process added to our existing list of clients. This is the magic message that monitor sends if the process does end. So we receive the down atom in a tuple, followed by a reference, which is the reference you get back from monitor, but we're not going to bother comparing that. It says it's of type process. And the important point from our perspective is this PID here, because we're going to remove it from our list of clients. And then we're going to loop back around. Info is just more info on why it exited; the important point for us is it's not one of our clients anymore. This is our broadcast function. So, and I actually forgot about this one, because I haven't implemented that in the chat client yet. This is our broadcast function. And it's the one you saw in the chat client. It says: broadcast, it's from the From process, and it's Message. And what this says, this is a list comprehension, a very powerful little Erlang thing, and it says: for each PID in clients, where PID is not equal to From. So that's all our clients who aren't the one that sent us this message. Send to PID the tuple, send, and the message. And you remember in chat client, we had the receiving end of that, where it received the message and printed it out onto its socket. And loop around again. If we receive the shutdown message, we send shutdown to all of our clients, just like before, except we send it to all of them, including From, and we exit. And any other message we get, we don't want our mailbox building up, so we're just going to eat it, do nothing with it, and loop around again. So that's all cool, I hear you say, but does it actually work? Well, I really hope so. So we go chat. Oops, hat. Start, full stop. That just returns us a process ID. 
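Putting the hub together as described, monitoring each new client and broadcasting with a list comprehension, might look roughly like this (a sketch, not necessarily the repo's exact code):

```erlang
-module(chat_hub).
-export([start/0]).

start() ->
    register(chat_hub, spawn(fun() -> loop([]) end)).

loop(Clients) ->
    receive
        {new_client, Pid} ->
            %% Get a 'DOWN' message automatically if Pid ever dies.
            erlang:monitor(process, Pid),
            loop([Pid | Clients]);
        {'DOWN', _Ref, process, Pid, _Info} ->
            %% A monitored client ended; it's not one of ours anymore.
            loop(lists:delete(Pid, Clients));
        {broadcast, From, Message} ->
            %% Send to every client except the one that sent it.
            [Pid ! {send, Message} || Pid <- Clients, Pid =/= From],
            loop(Clients);
        shutdown ->
            %% Shut everyone down, including the sender, and exit.
            [Pid ! shutdown || Pid <- Clients],
            exit(normal);
        _Other ->
            loop(Clients)   %% eat unknown mail so the mailbox doesn't build up
    end.
```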
And then these are going to be our two clients, and they're just going to telnet into port 2000 from localhost. I lose the ability to type when I'm on stage. And you can see now it's connected, and this has printed out our new client started, and it's given us the process ID of that client. This is the other one. Another client. Now, here comes the magic bit. This guy's going to say, hello, client two. And sure enough, there's that hello client two on the other one. Okay, so that's all cool and magic, and anything that anyone who's learned Erlang for about 15 minutes could probably do. But it gives us a starting point, right? So that's our chat server. That's our very basic chat server. What you will notice about this is that we've done absolutely nothing to deal with the possibility that something might crash, that something might go wrong. So what are we going to do about that? First, I'm going to find the slide I was on. So that's our chat server. Plus, I forgot the chat hub box. Sorry about that, but there's another process there called chat hub at any rate. Now we're going to introduce the OTP, which is what you came here to actually hear about. So these are the basic five components of the OTP. The first thing is supervision trees. This is a way of arranging your processes in a structured manner so that you can tell the system exactly what behavior you want when something fails. So we'll see this in pretty pictures later. But that's the sort of short version. And there are processes called supervisors that you can start up using the built-in Erlang supervisor module that will watch other processes and take the appropriate action when they fail. They can also be responsible for starting them up and other such things. And they use internally that monitor call that I showed you earlier. The next element is applications. I'll get into them a bit later, but they're sort of the way of grouping a bunch of processes together. 
And then you've got these three: gen_server, gen_fsm, and gen_event. They're called the Erlang behaviours. You can think of them, if you're familiar with the Java world, as interfaces. They give you a structured way of putting your process together so that supervisors and so forth can communicate with it in a known way, and so that it will behave in a known way. And it gives you certain bits of introspection and so forth for free. It allows you to get the state of a process automatically without you having to write any other code. The OTP, as I mentioned, is what allows you to take Erlang's sort of promise of fault tolerance, the thing that everyone waves their hands about when they talk about Erlang, myself included, and actually do something concrete with it and start to make your system fault-tolerant. So we'll look at each of those in a little bit more detail. Applications are your coarse-grained building blocks. They're what you group a bunch of related functionality under. You can start and stop them as a whole group. Our chat server fits nicely into an application. They're potentially reusable, so if you're familiar with some of the built-in applications in Erlang: Mnesia is an application. The SNMP server is an application. Crypto is an application. There's also Kernel, which is the low-level application that is always running. It's just a way of grouping all your processes together and having a top-level process that's looking after all of them. It also allows you to do things like configuration by application and so forth. GenServer is your basic long-lived server process. I should say, by the way, if anyone's got any questions at any time, throw up your hand. It's better for me to answer them as I'm talking about the thing than to wait till the end, necessarily. A GenServer is a behavior that you would use on a basic long-lived server-style process. It doesn't necessarily have to be that long-lived. 
It can be sort of minutes or 30 seconds or whatever, but it's something that sits there and responds to requests. So it's your standard sort of wrapper for an actor-model thing that you would use in Erlang. And it has systems for responding to both synchronous and asynchronous requests. So you send a message and you wait for a response, or you send a message and you don't get any kind of response, but you assume that it probably got there. The second one on the list was GenFSM. It's a behavior for implementing finite state machines. And it sounds really cool, and in reality it's actually a little bit clunky, and hardly anyone uses it as far as I know. We use it in a couple of little specific places in the code, but in reality there tend to be better ways to approach FSMs, which is a whole talk in and of itself that I'm not going to give today. But it's one simple way of representing an FSM. We're not going to really look at it. The third one is GenEvent. And this is the behavior that allows you to cleanly, and sort of within the OTP context, implement a subscribe-notify system. So the module that implements the GenEvent interface is the subscriber in this case. So if I want to listen for, say, messages coming in from the logging system that I've built, it might fire off GenEvent messages. I will build a GenEvent module that receives those messages according to the GenEvent interface. But you can use it for any other sort of subscribe-notify systems you might have. And the last one, and this is kind of the core of the whole thing, is the supervisor. The supervisor does exactly what it sounds like. It supervises, it watches things. It doesn't actually do any work itself, other than starting processes, watching processes, and restarting processes if they need to be. There's four different types of supervisor that you can have. And this will make a little bit more sense when you see a supervision tree laid out. 
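As a concrete example of the gen_event shape being described, a hypothetical log handler might look like this. gen_event's callback API (init/1, handle_event/2, and so on) is real OTP; the log_handler module and the {log, Line} event are made up for illustration:

```erlang
-module(log_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

init(_Args) -> {ok, []}.

%% Called for every gen_event:notify/2 sent to the manager we're attached to.
handle_event({log, Line}, State) ->
    io:format("LOG: ~s~n", [Line]),
    {ok, State}.

%% Remaining callbacks are boilerplate for this sketch.
handle_call(_Request, State) -> {ok, ignored, State}.
handle_info(_Info, State) -> {ok, State}.
terminate(_Arg, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

You'd attach and fire it with something like: {ok, Mgr} = gen_event:start_link(), then gen_event:add_handler(Mgr, log_handler, []), then gen_event:notify(Mgr, {log, "hello"}).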
But there's a one-for-one supervisor, which says: if any of my child processes crash, restart just the one that crashed. Don't touch any of the others. One for one. It died, I replace it. There's one-for-all, which kind of does what it sounds like. If one process crashes, the supervisor will tear down the rest of its supervised processes and start them all back up in the same order that they were originally defined to start up. Rest-for-one is kind of halfway between the two. Processes under a supervisor always have a defined order, and that order is the order they're started up in. For a rest-for-one supervisor, if one process crashes, all the processes to the right of it, all the ones that started after it, are also torn down, and then it will restart from the crashed process, but it will leave any that started before it. And this is actually more useful than it sounds, because there's a lot of cases where processes have a sort of one-way dependency. So you start this one, and then you start this one, and the reason you start in that order is because this one is dependent on that one, but that one is not dependent on this one. And you might have a whole chain like that where the dependency runs left to right. And so that's exactly where you want a rest-for-one supervisor. And the goal here, when you're choosing a type of supervisor, is to minimize the impact of a crash. So you want to restart the least number of processes, so the impact can be minor. Maybe it's not going to be visible to the customers. Maybe it is. You want to minimize that impact. There'll never be no impact from a fault. It's about minimizing that impact. And so that's a matter of choosing the appropriate supervisor that will restart the least number of things while still probably getting you back up to a running state. And the last one on that list is simple one-for-one. 
This is what you use when your supervisor is basically just a holding container for a pile of more or less homogeneous processes that you don't really particularly care about. You just want to know when they die, and you want to have some supervisor to jam them under. And the example here will be our chat client process. Yeah, a chat client process. We're going to have a bunch of them. If one of them dies, there's no point in restarting it because the socket's closed. So if it dies, it's going to die. We want that logged. That's the job of the supervisor. But nothing else is going to be done. So it's basically just a holding pen, the simple one-for-one. Let's go back to our chat server. This is how we're going to put it together. We're going to have our main supervisor, which is our top-level supervisor, and it's going to be a one-for-one supervisor, because none of these are intrinsically dependent on each other. They all refer to each other via registered names. So if one goes down and comes back up... Hmm. That's not quite right, is it? Let's say for the sake of argument that I'm right. If one of those goes down and comes back up, the other processes can all still refer to it by its registered name without needing to know that it went down and came back up. And we're going to have a simple one-for-one supervisor hanging under there, to which we attach any number of clients. Now, the reason I said it's not quite right is it turns out our chat server, or chat hub in this case, is going to keep a list of clients. So when it dies, we're going to lose that list of clients, and they're not going to be able to communicate anymore. So if the chat hub dies, we probably want to kill all the clients as well. But let's stick with this for the moment, because this is what all my code is based around. What you would probably do to solve that, by the way, is have another rest-for-one supervisor sitting under this. 
You would drop the chat hub next to the client supervisor so that if the chat hub died, the client supervisor would die, all the clients would die and the hub would come back up. The client supervisor would come back up and the connection listener would still be able to start new clients under the chat client supervisor. Nobody pointed that out to me last time I gave this talk. So this is what a supervisor specification looks like, and it's really ugly. I don't like them, but what are you going to do? The comments are almost required. Nobody can read these things just by glancing at them. Nobody I know anyway. But a supervisor specification, so you define... a supervisor and you pass it a module and the module's got to have an init function and that init function has to return something that looks like this. So it's got to be... it's got to be a tuple that starts with okay to say, okay, I've got a good specification for you. And then the first three lines are the strategy, the max retries, and the max timeout. So the strategy I've already been through, max retries and max timeout are important because you can imagine very easily a case where a process crashes because of some external thing that has been sent to it and it restarts immediately because it's fault tolerant and that same thing gets sent to it again and it crashes again and it starts back up again. And you can imagine something like that happening, getting it into a very bad loop. And it's not really becoming fault tolerant, it's not recovering at all, it's just sitting there chewing CPU cycles, restarting and crashing a lot and filling up your logs. So max retries says, only try to restart this four times. And only try to restart this four times within a period of the next one, which is 1,000 milliseconds. So if these processes that we're supervising under this specification crash more than four times in a second, we're going to give up, we're going to bail out. 
This supervisor is going to exit, and the next layer of the supervision tree is going to kick in. And maybe that's the last layer of the supervision tree, which means your whole application will shut down. And the other block there is a list of the children. And so I've just got one example child there. The first element is the ID of this child. It's the ID within the context of the supervisor. So if later code needs to refer to children of this supervisor, they can do it by that ID. The second is what's generally referred to as MFA, the module-function-arguments specification for starting up the process. And that's just what it sounds like: it's the module that the starting function is in, it's the starting function itself, and it's a list of arguments. So it can be empty or whatever. And so when this supervisor wants to start this child, it will call that function in that module with that argument list. The next is the restart type. So this tells the supervisor under what circumstances the restart should occur. And there's three types of restart type. There's permanent, which means always restart this. This is a process I expect to be permanently running all the time. If it's not, you're not doing your job properly. There's, let me get this straight, there's transient, which means restart this only if it exited in error. Every process exits with a particular exit reason, and the standard for a properly exiting process is the atom normal. Transient means if it exited with normal, don't restart it. If it exited in any other way, assume it errored before it had completed its task, and restart it. And the third type is temporary, which means never restart this. It pops up, it pops down, we don't care. We're just watching it. It's fine. Shutdown timeout says: when we go into a shutdown mode, so supervisors are also responsible for shutting things down, when we go into a shutdown mode, give it this long to shut down. 
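Tying the pieces of the specification together, here's a hedged sketch of what a supervisor module for the chat server's hub might look like, with each field of the child spec commented. One detail worth flagging: in the OTP supervisor spec, the restart period is measured in seconds, while the shutdown timeout is in milliseconds.

```erlang
-module(chat_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    Strategy   = one_for_one,  %% restart only the child that crashed
    MaxRetries = 4,            %% give up after 4 restarts...
    MaxTime    = 1,            %% ...within 1 second (note: seconds, not ms)
    Hub = {chat_hub,                    %% child ID within this supervisor
           {chat_hub, start_link, []},  %% MFA used to start the child
           permanent,                   %% always restart it
           1000,                        %% ms to shut down nicely before a kill
           worker,                      %% it's a worker, not a supervisor
           [chat_hub]},                 %% modules (enough unless hot-loading)
    {ok, {{Strategy, MaxRetries, MaxTime}, [Hub]}}.
```

If the supervisor's children crash more than MaxRetries times within MaxTime, the supervisor itself exits and the next layer up takes over, exactly as described above.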
Try to nicely shut it down first. If it hasn't shut down within a second, a thousand milliseconds, then you can kill it. Do whatever it takes to kill it. But give it some time first. We want it to try to cleanly shut down. The last two are of more interest if anyone's kind of doing hot code stuff, which is a whole separate discussion in itself that I'm not going to go into in great detail. I'm going to give a quick example later. But if you're doing a full release cycle with Erlang's release system, and you want to do full application hot code loading and stuff, the last two are relevant. Otherwise, the second-to-last one is just either worker or supervisor: whether this is a worker or a supervisor that we're starting up here. And the other one is the modules that are used by this child. Usually, unless you're doing hot code loading stuff, it's sufficient to just put whatever the starting module is in that modules field. Cool, so let's crash some processes. I hope this will work okay on this tiny screen. So there's a cool thing in Erlang called Observer. Not if I type it like that. Observer. For those of you who haven't seen this in Erlang, it was added a couple of major releases ago, and it replaces an old application called appmon. And it's awesome. It's the best thing ever. It's got all this sort of information about the running system, memory use, and all that sort of stuff. It's got load graphs. It's got lists of running applications. And this is the cool bit. So this is the kernel application. Process lists. ETS table viewers. Tracer overview. Okay, that's all cool. We'll use that, though, once we've actually got a running thing. I kind of put this out of order. So, what you'll notice here is we've got four processes. We've got, whoa, where's my mouse gone? We've got process 36, which is our server. There isn't a log line for it, but process 37 will be the chat hub. Processes 38 and 39 are the two client processes. So if I go exit, kill, is that the order? Probably. 
exit(pid(0,38,0), kill). So this is going to kill... No, that's going to get a syntax error. This is hopefully going to kill process 38, which is this window here, the middle one. Except I've done those parameters in the wrong order, haven't I? And you can see I killed that child process. And sure enough, I've been booted. This client has been booted. Now, that's pretty much what you'd hope would happen if the client process crashed anyway. The client gets booted. But what's much worse is if we kill process 36, I think is the one I want to kill, because suddenly that process has crashed and I can't reconnect to the chat server anymore. The chat server has gone down. Nothing's recovering it. We're doomed. Our customers are very angry at us. That's not fault tolerant. That's kind of crappy. So now we're going to go back to the tutorial stuff, and we're going to rewrite all this stuff with those Erlang behaviors. So let's step out of that for a moment. And we're going to go into the OTP directory, for those of you who've downloaded it. There's a template file there. But crucially, there's a blank GenServer there. And this is sort of your bare-minimum GenServer. So the trade-off with the OTP is it does require a little bit of boilerplate. Nothing like as much boilerplate as, like, a Java factory or anything like that, but it is more than perhaps you're used to writing in Erlang. I'm just going to run through the basic parts of it here. So start link isn't actually a required part of the GenServer behavior. It's kind of just a convention that you wrap the starting system in a start link. And that way, callers don't have to know that it's a GenServer. And in fact, any interface functions on the GenServer, you usually wrap them in a function with a sensible name, because the functions that are required for GenServer don't usually have sensible names. So we're going to have a start link function, and that's just going to be a wrapper for gen_server:start_link. 
The parameters here are the registered names. So we're going to say we're going to register it locally, not globally, and we're going to register it with the module name. The second parameter is going to be... You're right, I shouldn't code live, because I can't remember for the life of me what the second parameter is. I'm going to pull up an example. The second parameter is just going to be the module. So I don't know why there's an empty list there. And that tells GenServer that the functions implementing this GenServer are in this local module. Now you can obviously call gen_server:start_link from another module and pass it this module explicitly if you're so inclined. This is just kind of a standard way of doing it. And now our actual functions that do stuff. Init is the function that's first called. It's called with any parameters that you pass in this third element here of the start link call. So in this case, in the case of a simple blank GenServer, we don't care about the parameters. It's going to be an empty list, obviously. And the return value is defined to be OK and the state of this process. Now you'll notice that I have very cleverly... Oh, I very cleverly have defined the record state up here, but with no members in it. This is just a sort of boilerplate. This is the boilerplate required for it, but it's not actually implementing anything yet. And the state is... Remember, Erlang doesn't have any global variables or mutable data or anything. So every time a callback is called in GenServer, it's going to be passed this state, we're going to operate on it, and we're going to return a new state. Maybe it's going to be the same state, maybe it's going to be changed. But all this is executed in the context of one Erlang process, and so it shares that one state. And then these are the actual functions that do stuff: handle call, handle cast, and handle info. Handle call is your asynchronous function. No, sorry, handle call is your synchronous function. 
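For reference, the blank gen_server being walked through looks roughly like this. The module name blank_server is mine, but the callback set and return values are standard OTP:

```erlang
-module(blank_server).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {}).   %% boilerplate: no members yet

start_link() ->
    %% Register locally under the module name; callbacks live in this module.
    %% The first [] is the argument passed to init/1, the second is options.
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    {ok, #state{}}.

%% Synchronous: the caller of gen_server:call/2 blocks until we reply.
handle_call(_Request, _From, State) ->
    {reply, bad_call, State}.

%% Asynchronous: gen_server:cast/2 returns immediately, no reply.
handle_cast(_Msg, State) ->
    {noreply, State}.

%% Plain messages that aren't calls or casts land here.
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```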
It will do stuff, and the process that called gen_server:call, which is the way handle_call gets accessed, is going to wait until this handle_call has returned. And so the return value from this is going to be a reply. The reply that we're giving it, in this case, is bad_call, because, well, nobody should be calling this gen_server, because it doesn't do anything. And the same state that we were passed in, because we don't want to mutate the state at all. Handle_cast is the sister or brother or whatever to handle_call, and it's the asynchronous one. So it never replies. It usually returns noreply. It can also return stop if you want there to be terminating cases, but generally it returns noreply, and gen_server:cast, which is the function that a caller will call to get to this, returns immediately without checking whether this has done anything. And handle_info is the callback function you use to handle incoming messages that were not generated with a gen_server call or cast. So for example, the erlang:monitor that we called earlier: if it fires and we receive a monitor message, it just comes in the form of a tuple, which is the atom down and some process information. It doesn't come as a gen_server call or a gen_server cast, so it's going to have to be handled by the handle_info function. And we'll see some examples of this in a minute. Last of all, code_change. This is again used for hot code loading, so I'm not going to go into it in too much detail. It's used for mutating the state from the shape required by the old version of the code to the shape required by the new version. So if you've updated your code and you've added a couple of fields to the state, the state that's sitting in your existing running code isn't going to have those fields, and indeed, when one of your new functions runs, it's going to crash because they're not there.
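The three callback shapes can be sketched like this. The messages and the `count` field in `#state{}` are my placeholders, not from the talk's repo:

```erlang
%% Synchronous: the caller of gen_server:call/2 blocks until we reply.
handle_call(get_count, _From, State = #state{count = N}) ->
    {reply, N, State};
handle_call(_Other, _From, State) ->
    {reply, bad_call, State}.

%% Asynchronous: gen_server:cast/2 returns immediately; we never reply.
handle_cast(increment, State = #state{count = N}) ->
    {noreply, State#state{count = N + 1}}.

%% Naked Erlang messages, e.g. a monitor firing, arrive here.
handle_info({'DOWN', _Ref, process, Pid, Reason}, State) ->
    io:format("~p went down: ~p~n", [Pid, Reason]),
    {noreply, State}.
```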
So when hot code loading occurs in the OTP, code_change is called, you're passed the old state, and you've got to write code explicitly to mutate it into the new state, so that subsequent calls on your new code have the kind of state they expect. And you can pass some extra information there if it's going to be helpful. These are all part of the Erlang release stuff. Again, I'm not going to talk in great detail about that, because it's a whole two-day course on its own, and I don't know that anyone really understands it very well. And finally, terminate. This is kind of like a destructor in C++. It's called when the process exits. The catch is, it's not always called when the process exits. If you want to guarantee that terminate is going to be called, you've actually got to trap exits. So you go process_flag(trap_exit, true). And what that says is: when any process linked to this one dies abnormally, rather than just killing me immediately, you need to send me a message first. And it's not really obvious why that's connected to terminate, but it's an internal implementation detail of gen_server, which means if you want the terminate function to actually work as you expect it to work, as a destructor, you need to have that flag set. So that's all very fun. But we want to actually implement our chat server in this. So now I really am going to do some on-the-fly coding. Yeah. Yeah. So they're part of the function signature as defined by the gen_server behaviour. And what they mean: there's a few different things that you can return in your handle_call function. There's reply. You can return noreply, and later call gen_server:reply to explicitly generate a reply, if you need to wait on something but you don't want your server to block. You can return stop. Those are the three ones I can think of right now: reply, noreply, and stop. Reply has no meaning in handle_cast or handle_info, because there's nothing waiting for a reply at the other end of them.
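A minimal sketch of the trap_exit arrangement just described (the cleanup in terminate/2 is a placeholder):

```erlang
%% Trapping exits in init/1 is what makes terminate/2 reliable as a
%% "destructor": linked-process deaths then arrive as messages rather
%% than killing us outright, and the gen_server machinery gets the
%% chance to call terminate/2 before the process goes away.
init([]) ->
    process_flag(trap_exit, true),
    {ok, #state{}}.

terminate(_Reason, _State) ->
    %% Cleanup (close sockets, flush logs, and so on) goes here.
    ok.
```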
Reply is only meaningful in handle_call, where there's someone on the other end of that call waiting for a response. But they're a bit stronger than conventions; they're part of the definition of what those gen_server functions need to implement. So for handle_call to return a value, the way it's got to do that is that the handle_call function has to return reply. The second element of that return tuple has to be what it's replying with, and the third element is its new state. And similarly for handle_cast and handle_info: these will do some work, we'll see a bit more of this in the example, but these will do some work, and then they will return noreply, because they've got no one to reply to, and their new state. It'll become clearer with an example in a minute, I hope. So. We've got this chat client, and it's got a start function, and it's going to first of all wait to receive a socket message from the chat hub. And so we need to think... Because we don't want this function to be blocking in a gen_server, because it needs to be able to receive shutdown messages and stuff, we need to think a little bit differently about how we're going to manage this. So we can't necessarily just write a blocking receive, which is a bit of a bummer with OTP, and there are ways around it, but they're not ways that are built into the OTP. So your code does get a little bit more complex if you follow pure OTP style. But what we're going to do... So this is an incoming message, and it's an incoming message that's not a cast or a call or anything like that, but in fact it probably makes sense to make it a cast. Because it's an asynchronous message that's come in; the process that sends us this message doesn't really care whether we got it or when we got it. In a sense it cares whether we got it, but Erlang has guaranteed message delivery, for some value of guaranteed. So we're going to put it in the handle_cast, and it's going to look like this. It's going to match on a socket message.
And what we're going to do with that is we're going to jam that socket into our state. So I'll break this out into a couple of lines just so it's clear: NewState = State#state{socket = Socket}. And then we're going to return that new state. So what this has done is: when the process that's receiving the sockets sends us the socket, with the socket message, we're going to update our state so that we're now holding that socket. And the other thing, of course, that we're going to do in here: we're going to write out the debug line. These keys are all in a different place. We're going to write out the debug line, and we're going to set the socket active so that we can receive something on it. Now our init function is going to be pretty simple. We're not actually going to worry about the process flag. We're going to call this check client here instead. That behaviour line is a hint to the compiler. It says: make sure that this module implements all the functions you expect to see in a gen_server. So those are init, handle_call, handle_cast, handle_info, code_change, terminate. If it doesn't implement those, then the compiler will complain that you haven't fully implemented the gen_server. Now, because we've now got a socket in our state, we need to add a couple of things here. One thing: we need to add a holder for the socket. And so that's our start function, as a gen_server function. Well, it's not quite our start function. Our actual start function is this one here, which is init, and that just fires it up, but there's nothing for the check client to actually do in this case. So what other functions does this have to handle? Well, it goes into this loop and it's got four, five, five different messages that it can take. Six, if you count that. And so we basically want to write those down as well. Now, these are coming in not as gen_server casts.
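That socket-receiving cast might look like the following sketch. The talk just says "set the socket active"; `{active, once}` here is my assumption about the mode, and the `socket` field in `#state{}` is the holder mentioned above:

```erlang
%% The listening process hands us the accepted socket as an
%% asynchronous cast; we stash it in our state and arm the socket.
handle_cast({socket, Socket}, State) ->
    io:format("~p received socket ~p~n", [self(), Socket]),
    %% Make the socket deliver incoming data as Erlang messages.
    ok = inet:setopts(Socket, [{active, once}]),
    NewState = State#state{socket = Socket},
    {noreply, NewState}.
```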
They're coming in as naked Erlang messages. So we're going to have to put them in the handle_info function. So we'll just work from the top. And this one will be the send message. And what we're going to do there is gen_tcp:send(Socket, Message). Now, we don't have to call loop like we did in the previous one, because the gen_server module itself, which exists over in Erlang library land, actually has our main loop function in it. So it has the main loop; it manages that stuff. It manages a bunch of other potential incoming messages that we don't have to handle here, which I'll show you in a bit. But all we have to do to get back into the main loop is to return from one of these functions. Now, what you'll notice here is that we don't actually have this socket at the moment, because it's kind of jammed into the state thing. So we need to bind it in the call. So that pulls out the socket that was set up in the state here and passed into the new state. Now when we get that state, we can pull out the socket and make use of it by sending things. And we'll just do similar things for the other two. TCP error: don't really care what the error is. Don't really care what the... sorry, don't care what the socket is. Don't really care what the state is. Because all we're going to do here is stop. And the reason we're going to stop... I mean, we can throw in the debug line too if we're so inclined. So what this says is: any time we get a TCP error thing... We should probably really check that it's coming from the right socket, I guess, if we want to be properly careful. So now that function will only be called if the socket that we got the error on is the one that we have in our state. Handle it... What was I saying? Right, so we're calling stop. And what that says to the main gen_server loop is: I'm done. Don't bother calling me anymore. Shut down this process.
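A sketch of those handle_info clauses. The exact message shapes (`{send, Message}`) and the `handle_data/2` helper are assumptions based on the talk's description; note how binding `Socket` in the head is what makes the TCP clauses match only our own socket:

```erlang
%% Naked messages from the hub and from the TCP port land here.
handle_info({send, Message}, State = #state{socket = Socket}) ->
    ok = gen_tcp:send(Socket, Message),
    {noreply, State};
handle_info({tcp_error, Socket, Reason}, State = #state{socket = Socket}) ->
    io:format("TCP error ~p, stopping~n", [Reason]),
    {stop, normal, State};
handle_info({tcp_closed, Socket}, State = #state{socket = Socket}) ->
    {stop, normal, State};
handle_info({tcp, Socket, Data}, State = #state{socket = Socket,
                                                data_so_far = SoFar}) ->
    %% handle_data/2 is the pre-existing line-splitting helper from the
    %% non-OTP version; assume it returns the leftover partial line.
    Rest = handle_data(Socket, SoFar ++ Data),
    ok = inet:setopts(Socket, [{active, once}]),
    {noreply, State#state{data_so_far = Rest}}.
```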
If you do log it anywhere, by the way, here's the state it was in when it shut down. So it's always good to pass the correct state to that. But obviously that state is made no further use of. Yep. Sorry, the reason. Oh, yeah, I didn't... Sorry, quite right. Because I wasn't using it when I started writing that line, but I am now. Yep, and we do a similar... I won't duplicate all my code here, but we do a similar thing for tcp_closed. And then this is, of course, the interesting one. Plop it in here, wholesale, for the moment. Now, the nice thing about this is we can still just use the same handle_data and handle_line functions that we had in the previous version. Nothing's actually changed with them. Nothing much, anyway. I have realized there's a slight race condition here, but we won't talk about that for the moment. So we get just what we had before, except instead of looping here, we go... noreply. And what we want to do here is update the state, because something here has changed in the state. We've actually got new data-so-far, which was the other parameter we were passing around. So we're going to add another field here: we're going to add data-so-far to our state. So we've now got two things that are being passed around into every function and back out, and each function can modify them and pass out a modified version of that state. If you remember, data-so-far was just to handle the fact that TCP packets can fragment messages, and a single packet can also contain multiple messages. But the rest of our functions are going to be... The other two functions, the handle_data and the handle_line, are going to be the same. And that has basically OTP-ized our chat client process. So I can show you the other ones if you want. I can sort of run through them manually. They're all in the OTP chat directory of that Git repository I showed you.
So what I'm going to do instead, just quickly, is fire up the OTP version and show you what's different about it, and equally what's not. So we would follow a similar process for the other two, for the chat hub and for the main process itself. We'd also... There is something else I need to do, of course, or at least to show you. So I mentioned applications earlier. So when you define an application, you do it in one of these .app files, and it looks a bit like this, and it says it's an application, obviously. Its name is going to be chat, so we can refer to it in the application module as chat. The module that it's going to start in is chat_app. So I'll show you that. And an application module is very much... It's a behaviour just like a gen_server is, and it exports these three functions. So it exports start, stop, and config_change. Ignore the init for a moment; that's the other part of it that I'll show you. Start... this looks obviously very similar to our... Well, sorry, no, it doesn't look anything like it. This is the start function, so when you call... sorry, when you call application:start(chat), this is the function that ends up being called. And what it's saying is: start that supervisor. And the supervisor, this is saying, is defined in this module, and the parameter we're going to pass it is main_sup. Stop just does nothing. We've got no cleanup to do or anything. It just stops... When I say we've got no cleanup to do, we've got no explicit cleanup to do. That will stop all the supervisors under this application automatically, and thereby all the clients and all the processes and stuff. And config_change, again, is to do with hot code loading, so we'll skip over that. You will have noticed up here there were two behaviours: the application, and in the same module I've also implemented a supervisor.
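The shape of the two pieces just described, as a sketch. Every concrete value (description, version, module list) is illustrative rather than copied from the talk's repo, and I've put the supervisor in a separate `chat_sup` module for clarity, where the talk keeps both behaviours in one module:

```erlang
%% --- chat.app: the application resource file ---
{application, chat,
 [{description, "Toy chat server"},
  {vsn, "1.0.0"},
  {registered, [chat_hub]},
  {applications, [kernel, stdlib]},
  {mod, {chat_app, []}}]}.

%% --- chat_app.erl: the application callback module ---
-module(chat_app).
-behaviour(application).
-export([start/2, stop/1, config_change/3]).

%% Called by application:start(chat): kick off the top supervisor.
start(_Type, _Args) ->
    supervisor:start_link({local, main_sup}, chat_sup, main_sup).

%% No explicit cleanup: stopping the application tears down the
%% supervisors, and thereby all the workers, automatically.
stop(_State) ->
    ok.

config_change(_Changed, _New, _Removed) ->
    ok.
```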
That's probably bad form, but I wanted to keep it fairly small, so that's why. This start_link to a supervisor says the supervisor is also defined in this module, and the parameter you'll be passing is main_sup. So a supervisor is defined by an init function. And this is that horrible big blob of tuple that I showed you before, which defines what's going to run under the supervisor. You can see we pattern match on that main_sup thing, so we know which supervisor you're starting. That's an easy way to define multiple supervisors within a module. Now we're going to have the one_for_one. So this is that tree that I showed earlier. We're going to have a one_for_one. It's going to have four retries over 1,000 milliseconds. And the thing we're going to run is the client supervisor. So that's a supervisor under a supervisor. That client supervisor is going to be started with supervisor:start_link. Its name is going to be clients_sup, registered locally. And again, it's going to be in this same module. For those of you unaware, the question-mark MODULE is just a preprocessor macro that's replaced with this module's name. And the parameter we're going to pass it is clients_sup. So you'll see another init just below this one whose parameter is clients_sup. It's going to be permanent. It's going to have a shutdown timeout of 1,000 milliseconds. And crucially, this one is going to be a supervisor, not a worker. We've also got other things under the supervisor. So this is the list here of things we're going to start. We're going to start the chat event manager. The chat event manager I haven't shown you before, but it's going to be a process that listens... that generates chat events. I can't remember if I... So we'll skip over that one for a moment. That's a useful demonstration of the chat event... Oh, sorry, of gen_event, which I'll get to later if I have time. Ignore that one, that's part of it as well.
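The two init clauses, pattern-matched on the argument as described, might look roughly like this (the child-spec numbers follow the talk; the elided siblings are marked in a comment):

```erlang
%% One init/1, two supervisors, distinguished by the argument we were
%% started with. The spec tuples are the old-style
%% {Id, {M,F,A}, Restart, Shutdown, Type, Modules} form.
init(main_sup) ->
    {ok, {{one_for_one, 4, 1000},          % strategy, restart intensity
          [%% ... chat hub, event manager and listener specs go here ...
           {clients_sup,
            {supervisor, start_link,
             [{local, clients_sup}, ?MODULE, clients_sup]},
            permanent, 1000, supervisor, [?MODULE]}]}};
init(clients_sup) ->
    {ok, {{simple_one_for_one, 4, 1000},
          [{chat_client,
            {chat_client, start_link, []},
            temporary, 1000, worker, [chat_client]}]}}.
```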
We're going to start the chat hub, which is that central process which rebroadcasts all the messages. Now you'll see I haven't written out a full specification here. That's because one of the slightly neater ways to do it, rather than having 20 big specifications in one place, is to actually have the specification sitting in the module itself. So you'll see here we've got a function called sup_spec, which just returns a little snippet that can be dropped into a supervisor specification. And this last one is something I really wish didn't have to exist, but Erlang, for all its awesomeness, has one really kind of glaring design flaw in its TCP system, and that is that the built-in accept function is blocking. And what you'll have noticed in the rest of the gen_server stuff is that it doesn't take kindly to blocking. The whole point is that it should be constantly in a loop, constantly ready to receive a message. If you block waiting for a connection, you can't receive any other messages. You can't have someone request your state. You can't have someone send you a shutdown message or anything like that. So there's a hack that almost everyone uses to get around it. And it's this module that is sitting on the TrapExit website. That's not what I wanted to alt-tab to. A module sitting on the TrapExit website, this is a slightly modified version of it, called async TCP listener, blah, blah, blah. And it's a gen_server. I won't bother you with all the details. The key is this single line here, which is prim_inet:async_accept, and it's an undocumented function. But what it is, importantly, is a function that will probably always be there, so you can probably count on it to continue to exist. And it's an asynchronous accept. So instead of being a blocking accept like the documented accept function, it's asynchronous, and when a... sorry, when a remote client connects, it receives this message here.
So you can see we're in a handle_info, just like we were looking at a minute ago. And it's a tuple that contains the atom inet_async, the socket that was being listened on, a reference which we don't particularly care about, and {ok, ClientSocket}, which is what actually happened with the accept operation. So the accept would have returned an ok and this new socket. Have a look at this in your own time. I'm not going to dwell on this particular module. But it's a very useful one to be aware of if you're trying to code purely OTP stuff, and you go, well, how do I... It's Erlang. I'm writing a server. Of course I'm writing a server, because I'm writing Erlang. How do I do asynchronous connections? And the answer is: you don't, until you find this trick. Yeah, so I'm not going to spend any more time looking at that. The important point is that that's how you get asynchronous TCP stuff. Yeah, no. Because underlying Erlang's TCP system are... they're called port drivers, I think, in the case of TCP. And a port, from Erlang's perspective, is kind of just another process. So when you start up a socket, it turns out that that starts a port. I think it starts a separate port, or it goes into a sort of unified TCP port; I can't remember the implementation details, not important. The important point is there is already a separate process, that you can't see and don't need to know about, handling the TCP stuff behind the scenes. And so it's what will generate this incoming asynchronous message. And indeed, if you look into the code for the TCP accept function, it actually calls that anyway. It calls prim_inet:async_accept. It just blocks waiting for a response from it. So yeah, it's not really clear to me why that's not exposed. I really should get around to submitting a patch to see if we can get that fixed. But anyway, that's the situation at the moment, and that's how you get around it.
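The heart of that listener trick, as a fragment rather than a complete module. `start_client/1` is a hypothetical helper, and the real trapexit module also copies socket options from the listen socket and registers the new socket with inet_db, which I've omitted:

```erlang
%% In init/1: kick off an asynchronous accept instead of blocking.
%% prim_inet:async_accept/2 is the undocumented call; -1 means no timeout.
{ok, Ref} = prim_inet:async_accept(ListenSocket, -1),

%% When a client connects, the result arrives as a plain message,
%% so it lands in handle_info and we never had to block.
handle_info({inet_async, ListenSocket, _Ref, {ok, ClientSocket}},
            State = #state{listener = ListenSocket}) ->
    ok = start_client(ClientSocket),             % hand the socket off
    %% Re-arm: queue up the next asynchronous accept.
    {ok, _NewRef} = prim_inet:async_accept(ListenSocket, -1),
    {noreply, State}.
```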
And so those are the things we're starting, ignoring that one and ignoring the event listener. So that's our application, that's our supervisor. Now, the other supervisor, the clients_sup that we started here, its definition is down here. And you'll see it's much simpler. It's a simple_one_for_one supervisor, which is, as you recall, what we use when we have a bunch of not-necessarily-long-lived homogeneous processes that are all started the same way to do the same work, but for a pile of different clients. And in this case, it's our actual client processes that are going to run under here. And again, I've done the trick that I did up there: I've just defined the supervisor spec itself in the chat client module. So there's the supervisor spec. A simple_one_for_one supervisor spec looks a little bit different, but it fundamentally does the same stuff. It gives you an ID. Now, the ID for all of those processes is going to be the same. It gives you a function and a module that we start it in. This restart type is temporary. If you remember, that means that if it dies: I don't care, let it die. Because if one of these chat client processes crashes, if we bring it back up, there's no point, because the socket will have died. The client will have disconnected. So we just let it go and let the client reconnect and create a new process. Shutdown timeout, child type, child modules as before. This is obviously very similar to what I was manually writing before. I've added a couple of little things. So remember, I said it's always good form to wrap a gen_server function in a wrapper function like this, so that the people calling into this module don't need to know it's a gen_server. So in this case, the chat hub is going to want to send us a message, and it's just going to call chat_client:send. Now internally, we implement that as a gen_server cast, but the chat hub doesn't need to know that.
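Keeping the spec next to the code it starts might look like this; the `sup_spec`/`new_client` names are illustrative, and whether the socket goes in via start_child arguments or a later cast is a design choice (the talk uses a cast):

```erlang
%% Lives in chat_client.erl: a spec snippet the supervisor module can
%% drop straight into its child list.
sup_spec() ->
    {chat_client,                      % id (shared by every instance)
     {chat_client, start_link, []},    % {M, F, base args}
     temporary,                        % crashed client: don't restart
     1000, worker, [chat_client]}.

%% With simple_one_for_one, spawning goes through the supervisor;
%% we then hand the new process its socket asynchronously.
new_client(Socket) ->
    {ok, Pid} = supervisor:start_child(clients_sup, []),
    ok = gen_server:cast(Pid, {socket, Socket}),
    {ok, Pid}.
```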
It just knows it's calling the send function. And this is particularly handy, because it means you can rejig functions as synchronous or asynchronous. You can move away from a gen_server entirely if you want. We had one process implemented as a big complex gen_server that we decided wasn't really appropriate. We gutted it and rewrote it in a totally different style, but the external functions stayed the same, so none of the rest of the system needed to change. And that's kind of the value. It's the same value as you get from a set of accessor functions in C++ or whatever. It's a matter of encapsulation, and keeping the implementation details (the fact that it's a gen_server, in this case) hidden from the people trying to use it. This is all much what I was showing you before. So in my sample code here, I've actually done the transfer of the socket a little bit differently, I think. I don't know, it's basically the same. Except I've added a little feature, which says enter your name, and so this guy's also got a name, which we'll see in a moment. Yes, so a simple_one_for_one supervisor: when you want to start one of those processes, when you want to spawn them, it's actually the supervisor that you get to do that. And so it's the spec that you provided to the supervisor that will be called to start it. So when I say homogeneous, they really are all just... Well, they've all been started the same way. They'll obviously have different parameters, different sockets being passed to them and stuff. But the thing you're starting, the class of thing that you're managing, is always the same. If you want a bunch of different things (say, a supervisor managing your clients and a supervisor managing, I don't know, people looking at the stats of the system or something, managed by two totally different process types), you would just have two simple_one_for_one supervisors, one for the clients, one for the other type of stuff. And you can have as many of them as you want.
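The wrapper itself is tiny; the whole point is that this one line is the only place the gen_server detail is visible (the `{send, Message}` shape is my placeholder):

```erlang
%% Public API wrapper: callers see chat_client:send/2 and never learn
%% that the implementation is currently an asynchronous gen_server cast.
%% Switching this to a synchronous call, or away from gen_server
%% entirely, changes nothing for the callers.
send(Pid, Message) ->
    gen_server:cast(Pid, {send, Message}).
```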
These are all Erlang processes; they're really cheap. So, does it work? Maybe. So now we're going to call application:start(chat). Yeah, sweet. Live demos for the win. So I'm going to enter my name here, and my lovely assistant, Brian, is going to be on the other window. And you can see again, we're logging: the new connections have been established, we're logging the process IDs and stuff. And I added a tiny little feature here, so if I say hello Joe, it sticks the name in front. And that works well. So that's great. So now I'm going to fire up that observer process again. Now if I go to the applications tab, I don't know how well you can see that, but that is kind of the tree that I drew earlier, with a couple of extra things added, and rotated through 90 degrees counterclockwise. On the far left are the internal processes that Erlang uses to manage the application. The far left one is called the group leader. It's not particularly important right now, but if you ever hear the name group leader, it's the one on the far left of the tree of the application, and it's kind of the overlord of the whole application. I don't know what the next one is, some internal kernel thing. The next one down is our actual application itself. Sorry, it's our supervisor. It's our main supervisor. It doesn't have a registered name there, for reasons that I'm not entirely clear on, but <0.38.0>, that's the main supervisor. That's the one_for_one supervisor. And you can see hanging under it (just, you know, turn your head to the left and the "under" bit will work) is our async TCP listener, our event manager (ignore that), our chat hub, and our client supervisor. And that's our simple_one_for_one supervisor. And under the simple_one_for_one supervisor are two processes, just as you'd expect. And guess what? If I go here, background that one. Can I background it? No, it's not going to let me. We'll do this a different way. No, not that. Fire up another terminal.
I'm going to log in on that. Fire up another guy called Chris. And you can see there's now three processes under there. This observer thing is built into Erlang. It's the best. So, I've now got three guys there talking. You can see Chris says hello. He'll... oh, I've killed this guy's terminal accidentally. Oops, that's easy. Hi, Byron. So, you can see that's also working. I've got three people chatting now. But what happens if, heaven forbid, we've got a bug in our code? There's my observer gone. There it is. What happens if one of your processes were to meet with an unfortunate accident? In fact, the reason I didn't bother with that problem in the chat hub is because I fixed it in this version. So, the chat hub no longer actually maintains a list itself. I'll show you the code; it'll make more sense. The other nice thing about having supervisors that know where all their children are is that we can ask that supervisor where all its children are. So now, when the chat hub gets a broadcast request from one of the clients (sorry, it actually gets it from itself when it gets the broadcast message), but anyway, when it knows it needs to broadcast to a bunch of other processes, rather than maintaining its own list of processes and monitoring them and everything... well, there's already a process monitoring them and holding that list, and that's the client supervisor. So we can say: hey, client supervisor, what are your children? And it will just return us a list of its children. And then we can do basically what we did before. Now, the children list is actually a list of 4-tuples, which has a bunch of other information about the children, but the second element of each is the process ID.
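The which_children trick sketched out; `broadcast/2` is an illustrative name, and `chat_client:send/2` is the wrapper function discussed earlier:

```erlang
%% Instead of keeping its own client list, the hub asks the client
%% supervisor, which already tracks its children for us.
broadcast(From, Message) ->
    Children = supervisor:which_children(clients_sup),
    %% Each entry is {Id, Pid, Type, Modules}; we only want the pids,
    %% skipping the sender and any child that isn't currently running.
    [chat_client:send(Pid, Message)
     || {_Id, Pid, _Type, _Mods} <- Children,
        is_pid(Pid), Pid =/= From],
    ok.
```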
So we can say: for all the children, where P is a process ID and P is not the process that it came from originally, call chat_client:send. So that's that wrapper function around the gen_server; this guy doesn't know that chat_client is a gen_server, he doesn't care, he calls the wrapper function send, to that process, with that message. Again, that notify thing is the gen_event thing, which I probably won't have time to cover today. So, okay, so that's cool. That was a diversion, but guess what? I can now kill the chat hub process, because I don't like it. Now, what you'll notice is that it's still there, but it's there with a much higher process ID. And hopefully, if Erlang is being nice and logging this stuff... no, it's not. Okay. It used to have a process ID of about 40-something. It's now got a process ID of 143, so it's a whole new process, but it's still working, still doing its job. It has tolerated a fault. It has tolerated me killing the process entirely, because the supervisor saw it die, knows it's a permanent process, knows it's a one_for_one, so it starts it back up again. And similarly, if we look at this guy, he's got a process ID, you can see, I'm looking at the bottom left there, which is not especially clear, but he's got a process ID of 44. If I kill him, he's still there as well, but he's now got a process ID of 166, so he's a whole new process. So I'm going to show that by going back to one of our terminals, quitting, and you can see what I'm doing, and telnetting in again, and I'm still talking to the other guys. Except apparently I've just written "hell"; I meant to write "hello" again. And it's also working. So we've just had the equivalent of two different processes crashing for unknown reasons; not only have they come back up and are continuing to work, but these guys who were talking on them and using them didn't even know they'd gone. And that makes for happy customers, or at least less unhappy support people. So that's all really cool.
That's fault tolerance in a nutshell. And obviously this is a contrived, simplified example. You can come up with cases where something will break and the whole thing will go to hell. This is not a silver bullet. It doesn't excuse you from writing good code. You've still got to do everything well, but this mitigates the possibility, or the inevitability I should say, that you will have bugs in your code. And it means that things can go wrong, and the system can pick itself back up without having to restart everything. My battery is going flat because I forgot to plug my laptop in. How are we going for time? I've got another 15 minutes. So, one last cool trick to show, then. This is the one which is really going to catch me out and go horribly wrong, but I'll try anyway. By the way, just for anyone who doubts that that crash thing that I did was really crashing stuff, we can do a little test. We can put in a special crashy case here. So what this says is: if I get a message telling me to broadcast the message "crash", rather than actually broadcasting it, I'm going to stop with the crash atom, which is pretty much the equivalent of crashing something. You could probably do something a little bit more elaborate to crash it, but this will stop the process dead in its tracks with an abnormal exit reason. And we can demonstrate that, I hope. We'll rebuild it. Why didn't it rebuild? A couple of unused-variable warnings, no matter. And then we'll stop the chat application. Actually, I didn't even really need to do that. Start up a new chat application. Okay, so we'll get our clients. That's not a client terminal. Get our two clients again. Telnet them in. Okay, still working. Now hopefully if I type crash... no, it's not going to work, and I just realized why. Because this is concatenating names onto the front of it. Sorry, yes. What it actually needs to look like is that, because we're having the name prepended to the front. Okay, so I'll try this again. My name is ABC.
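The crashy clause might look like the following; the `{broadcast, From, Line}` message shape and `do_broadcast/2` helper are my placeholders for the hub's real internals:

```erlang
%% A deliberately crashy clause for the demo: asking the hub to
%% broadcast the literal line "crash" stops the process with an
%% abnormal exit reason, so the supervisor treats it as a crash
%% and restarts the (permanent) hub.
handle_cast({broadcast, _From, "crash"}, State) ->
    {stop, crash, State};
handle_cast({broadcast, From, Message}, State) ->
    do_broadcast(From, Message),
    {noreply, State}.
```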
My name is DEF. I say crash. Why did it still work? Take my word for it that it would have crashed. I want to show you something else. But anyway, stopping things in that... I mean, you could see the process ID changing. The process had clearly stopped and restarted. Try it at home with this code if you want, or take my word for it. So, the one last thing I want to show you. Okay, so we've started our chat application one more time. The chat application is up and running. Good. Cool. This is the bit that will inevitably go wrong, but let's try it anyway. What we want to do: we've discovered a bug in our code. We've discovered that there are not enough stars at the front of every message that's being sent. So what we want to do here, just when we're about to send it, is add three stars, or two stars, to the front of the message. But we don't want to bring down all our clients to do it. So what we can say is: compile just that chat hub module. And that little c shortcut there compiles and reloads the module. And so any calls into it, i.e. callbacks on the gen_server module, should now be using that new code. And without dropping the connection, without interrupting these guys, without them even knowing that we did anything, we now have that new behaviour that we just coded in, with the stars prepended to the messages. And that is super, super cool, in my opinion. Cool, so I think we've got about ten minutes left. Questions, comments, thoughts, anything. Yes, yes you can. You'll see here, in fact, that there's a nodes thing up in the menu here. So any nodes that are currently known by the node you start observer on... you can do that cookie stuff. I can't demonstrate it here easily. But yes, it certainly works with remote nodes. It's really cool. It's much nicer and more modern looking than appmon. Sorry? Well, yeah. But it just works over the Erlang distribution protocol, right? I mean, you need to do that for any Erlang meshing that you're doing.
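The hot-reload step in the shell is just this, assuming chat_hub.erl sits in the node's current directory:

```erlang
%% In the live node's shell: the c/1 shortcut compiles chat_hub.erl and
%% hot-loads the result into the running VM. Existing gen_server
%% processes pick up the new module code on their next callback, with
%% every client connection left intact.
1> c(chat_hub).
{ok,chat_hub}
```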
Yes, that's a slight pain in the arse if you've got network people who like to block all your ports. I know the feeling. You can actually constrain the set of ports that the distribution protocol uses, in the kernel config (the inet_dist_listen_min and inet_dist_listen_max settings), which makes it a lot easier, because you can just say: can you just open those two ports and we'll be happy. Which is what we ended up doing, because for some reason they weren't keen on opening like 4,000 ports, which is the default range you can use. Network guys, man, what are you going to do? But yeah, I mean, it works wherever you can get meshing working anyway.

Yeah, sorry, yeah, we'll go back first. Low is the answer. Allow me to demonstrate, though. Don't just take my word for it. So I mean, there's two elements to overhead, right? Two simple elements, anyway. There's memory use, and the time it actually takes to spin it up in terms of processor time. Memory use is something on the order of 1.5 kilobytes per process. So, low. I mean, compare and contrast with a POSIX thread, which starts at about, you know, two megs of stack space and sort of goes up from there. The overhead of Erlang threads, Erlang processes, sorry, is very, very low; they're lightweight by design. The whole point is that you're able to start thousands, hundreds of thousands of these things without a major problem. So here's a little demo I can do. Let's have a function. Now this really is coding on the fly. So I've got a function, and all it does is sit there waiting in a receive block for the exit message to arrive. I'm now going to spawn F. I'm going to do it 100,000 times. And that's probably going to break. What's Erlang's default process limit, do you remember? We'll find out. So this is just a simple little list comprehension, a simple, briefly syntactically incorrect, list comprehension, there we go, which is going to generate a list of 100,000 elements. It's going to say, for each of those, spawn a process with the function.
Spawn a process running F. And we're done: 100,000 processes. And don't take my word for it, there are 100,025 processes in the system. And I can now, you know, check some memory stuff. I don't remember most of the memory commands off the top of my head, but I can try. System info... cool, thank you. No... what am I passing in, sorry? Oh, memory, thank you. Nope, you both fail. That's for a single process, though. Hold on, hold on, I've got the docs right here. Just give me a second. I could, but that wouldn't really give you an idea of the memory use of the processes themselves. I mean, it would certainly show you the beam memory use. Yeah, yes, I could do that. Only because I think the default limit is something like 500,000 or something. So... oh, there we go. So where's the beam? Come on, that's too easy, man. Okay, so there you go. Those would be bytes, I'd imagine. So, 284 meg total, of which 273 meg is used by the processes.

Now, what I can do is kill them all. Kill them, such that P comes from... this is a cool trick I learned the other day... v(2), which was the return value of expression number 2 in the shell up here. So, that list of processes. I think it's v. I hope it's v. Sorry, that would explain why the memory use hasn't gone down. Thank you. So you can see the base system here is using a reasonable amount for its processes. The trivial little processes I started, the 100,000 of them, used what, an extra... where are the commas? An extra 90 megabytes. So, obviously that's the simplest possible process. It has no state. It has a single tiny function to it. But that's the sort of overhead you're looking at to spawn a process before you start doing anything with it. Interestingly, our total memory use has gone up slightly, of course.

But yeah, so... sorry, does that answer your question? Yeah. The supervisor? Who watches the watchers? Yes, excellent question.
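(For the record, that whole fumbled shell session reconstructs to roughly the following; exact numbers will vary by Erlang version and platform.)

```erlang
%% Each process just parks in a receive, waiting to be told to exit.
F = fun() -> receive exit -> ok end end,
Pids = [spawn(F) || _ <- lists:seq(1, 100000)],
erlang:system_info(process_count),  %% ~100025: ours plus the system's own
erlang:memory(total),               %% total bytes allocated by the VM
erlang:memory(processes),           %% bytes used by all processes
[P ! exit || P <- Pids],            %% tell every spawned process to finish
erlang:memory(processes).           %% drops back down shortly afterwards
```

In the live demo the spawned pids were recovered with v(2), the shell helper that returns the result of expression number 2, rather than being bound to a variable as above.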
I mean, you saw one of my supervisors was supervised by another supervisor. It's turtles all the way up, basically. The top-level supervisor is watched by the application process. The application process is watched by the Erlang kernel, slash the VM, the whole thing. And you can have the VM itself being watched by... Erlang has a built-in system called heart, which is an external C process that constantly pings the VM to make sure it's okay. If you're so inclined, you can attach a hardware watchdog to that heart thing. You can always find a point somewhere up the chain where it's going to fail; Erlang at least allows you to build that chain. By the way, quick plug: if you use heart and you find it kind of crummy, erld. Go ahead and Google erld. It's a little Erlang wrapper daemon that myself and a colleague wrote to provide Unix daemon-style functionality to an Erlang VM. It's awesome, if I do say so myself.

Yes? It doesn't. They don't intrinsically use any sockets. I mean, if you open a socket and attach it to a process, then obviously it uses a socket, but they're not associated with any specific OS resource as you would think of it, apart from a little bit of memory. Would you like to come up and explain? Yeah, quite right. I say processes because semantically they are processes that don't share memory with other stuff, but they're not Unix processes. They are Erlang processes within the Erlang VM. And in terms of stepping back to looking at the OS scheduling: the Erlang VM starts up exactly as many scheduler threads as you have cores, or as many cores as it can see, by default. I mean, you can change that, but what that means is it starts a few heavyweight processes, and then it has its own internal scheduler which it uses to pull those lightweight processes off the queue and throw them to those heavyweight processes. And so there's not a direct mapping between the Erlang lightweight processes and the scheduler threads that are actually executing them.
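(Back on the heart question for a moment: if the node was started with `erl -heart`, the external heart program can be steered from Erlang itself. A hedged sketch; the restart command path here is made up, and in practice you'd point it at your own init script, or set it via the HEART_COMMAND environment variable before starting the VM.)

```erlang
%% Only meaningful when the node was started with `erl -heart`.
%% heart pings the VM; if the VM stops answering its heartbeats,
%% heart runs this command to restart it.
ok = heart:set_cmd("/usr/local/bin/restart_my_node"),
{ok, Cmd} = heart:get_cmd().
```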
They can be pulled off the queue, put back on, run by a different thread, whatever, later on. So, yeah, there's no direct link to OS resources or to hardware resources in that respect. Yeah, that's consuming one socket, just like any server that's listening is, plus one socket for each connection that's established to it. Just like any other server you wrote that did that.

For reasons of politeness, I'm not going to compare Erlang to Go. I'm not a fan of Go myself. Erlang is good at doing what Erlang is good at doing: that highly concurrent, fault-tolerant stuff. If you want that kind of thing, if your system is going to benefit from that kind of thing, then Erlang is an excellent choice. If you don't need that kind of stuff, if you're writing a game that's going to be very graphics-intensive, don't choose Erlang. It's probably a bad choice. It may be a good choice for the server on that game, but possibly not a good choice for, what's the classic example, fast Fourier transform type code. If you're doing hyper-intensive mathematical stuff, you probably don't want to choose Erlang for that. The other reason I'm not going to compare it to Go is that I don't know enough about Go to draw reasonable comparisons, so I can't comment on Go specifically, I'm afraid.

I can briefly, sure. How are we going for time? I've got minus two minutes left, but I'll do what I can in that time. Erlang has distribution built into it. It's a very good language for using between machines. The OTP doesn't really touch on the distribution so much, but it doesn't have to, because the distribution is at a very basic level in the language.
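(A sketch of what that built-in distribution looks like in code, assuming two nodes started with -sname that share a cookie; the node name here is made up.)

```erlang
%% spawn/2 takes a node name; the returned pid is used exactly like
%% a local one, message passing included.
Node = 'chat@otherhost',
Pid = spawn(Node, fun() ->
          receive {From, Msg} -> From ! {self(), got, Msg} end
      end),
Pid ! {self(), hello},
receive {Pid, got, hello} -> ok end.
```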
You saw these process IDs, they've scrolled off the screen now, but those process IDs: I can spawn them on a different node, on a different machine, using basically the same code I used there, just providing the node as a parameter. I will get back a process ID like the ones I got there, and I can treat it exactly the same as if I'm running a process on the local machine. That's a slight lie, because obviously you can't pretend the network's not there; there's going to be slightly increased latency, there's going to be the possibility of packet loss and stuff. But from a basic semantic point of view, in the language, you can treat a remote process exactly the same as you treat a local process. And that doesn't make distribution easy. Distribution is never an easy problem. But it is an incredibly powerful base that you can build on. Does that answer your question?

Does Erlang come with any load management software? Not really, that I'm aware of. It comes with some basic stuff, like I showed you in observer: there's a load graph and stuff on there that allows you to do some measurements of load. There's nothing that I'm aware of that's built into the language at a basic level that manages load across machines. But as I was saying with the distribution stuff, the building blocks to build a very elegant, powerful system are there, in the sense that you can treat remote stuff just the same way you treat local stuff. But no, the short answer is there's no stuff in the standard Erlang library that does load management as such.

Anyone else? We're all done, I would say. Thank you very much for coming. Thank you, it was incredible.