We're going to be talking about OTP today. The title of this talk is OTP Has Done It: a survey of wonderful things. This is a magical land, and I'm excited to welcome you to it. My name is Nick DeMonner. I'm the CTO at a small startup in the local government software space called Seneca Systems, and as of a couple weeks ago we're now doing all greenfield development in Elixir, which we're really excited about. I'm something of a cross between a Ruby and an Erlang native; I come from both of those spaces on various projects. So I wanted to talk about the different things that the OTP ecosystem has to offer each of you. I think you're going to find some stuff in here that's really exciting for the work you're doing day to day.

The first thing I want to look at are the principles that OTP was designed around. Really quickly, and this is the only time I'll mention it: OTP stands for the Open Telecom Platform. I didn't put that up there so you could read it, because I think OTP often gets short shrift when people assume it's somehow only relevant to telecom applications. In reality that's where it was first developed, but these are common design implementations that we can use everywhere. The first, and I think most important, principle is that OTP is about separating what is generic from what is specific. What I mean by this is that it's very common in our industry to mistake the challenges we face as specific to our work, to the domain problem in front of us, when in reality there's a large amount of shared technical problem underneath. So OTP set out to say: look, we see these generic patterns of implementation across almost every software project we develop. Let's turn them into something that is as easy to use as LEGO bricks for building our little lands.
In OTP, every process is either a supervisor or a worker. This stuff is not complicated. It feels complicated from the outside; if you've looked at the Erlang docs, it's very easy to get confused and lost and say, I really don't know what all this stuff is doing. But the essence is that everything in the system is either a supervisor or a worker, and we're going to talk about what those things mean. And applications, which we'll look at in a little bit, are basically just trees of processes. That's it: trees of supervisors and workers all doing their thing. So it's not a complex concept. This is where to look if you're getting started on figuring out the philosophy behind it. It seems a little ridiculous to have to put this up there, but the Erlang docs are spread out everywhere, and if you're trying to figure out what something is about or how to use it, this is a good place to start.

And then I just wanted to throw this up there. This is a take on Greenspun's Tenth Rule that Robert Virding came up with. You can read it for yourself, but the idea is that if you really spend your time trying to reinvent all the things OTP has to offer, you're going to end up with an ad hoc, informally specified, bug-ridden, slow implementation of about half of Erlang. If you take nothing else away from this talk, let that stick in your mind every time you find yourself saying, no, we probably need to write that in-house, from scratch. So the first of these principles is, in my opinion, the most important: the generic versus specific battle. What I've learned over a couple of decades of doing this is that your domain problem, the problem you're trying to solve specific to your users, might be unique, and it might not. But much of the technical problem you face is probably not unique.
There are unique pieces, most likely, but much of it has been written by somebody else, somebody who sat there, thought about this, and built a great, solid implementation of it. That's what OTP is all about: what are the things we could have shared, and let's build them. This is a bit of a tangent, but I see this happening more and more in our industry: we have a tendency to reinvent a lot of stuff. That doesn't actually happen in a lot of other engineering disciplines. This is not a dig at JavaScript or anything like that, but it is a dig at a tendency I think we have, because we're all very curious people. You don't get into this particular profession unless you're a curious person who likes to build things. Part of it is that reinventing is always a challenge, and we like challenges. It's really fun; it's an entertaining way to keep our brains occupied. But I'd argue that if you're building production systems, for the most part (and there's a "mostly" in there to qualify this), it can be a waste. There are of course exceptions, when you hit problems literally no one has faced technically. That's fine. But I would like to see us reusing a little bit more, and I think OTP provides a perfect gateway into that new habit.

Here's a great metaphor from my mobile game days. There's a phrase in game development, which you may have heard, that goes along these lines: if you set out to make a game, which most people do when they're getting into game development, you can either make a game engine or you can make a game. Game engines are incredibly complex, and you'll very quickly find you're two years down the road with perhaps a great game engine, but you didn't actually accomplish what you wanted.
If you're a company with huge resources, this might not apply, because you have enough people to weather that. But if you're a small team looking to move in a nimble manner, it behooves us to ask: are we building an engine here, or are we trying to build the game itself? And lastly, if any of the abstractions we're going to talk about in the rest of this talk stop being useful to you, don't use them. None of this is written-in-stone law. It's common sense: these are abstractions we can take off the shelf and apply to our problem. But if at any point it's getting painful, I say stop, look at them for inspiration, and build your own thing. So that's the caveat to the rest of the talk, I guess.

And to visualize that: I think for most of us, this is what we imagine our problem looks like, with the problem you face, the problem everybody else faces, and there in the middle, the overlapping section, the shared problem, the problem where we can share wisdom or code. In reality, it probably looks more like this, where the problems other people are facing and your problems take up a lot smaller space than you realize, and it's really that shared problem in the middle that makes a community like this so exciting, right? This is a place where we can get those circles to overlap more. That, to me, is the secret to great productivity in this industry: the more I can make these circles overlap, the more I'm able to apply our expertise to that tinier sliver. So ultimately, if you're working on an Elixir system, I think it's critical to ask yourself: has OTP done this thing that I'm considering? The answer: probably. It's not an unqualified yes, but the answer is probably, these teams have built it.
OTP is a marvel of engineering, in my opinion, and we're going to look at it. People smarter than me spent a long time getting it right. And I think anytime I've looked, mostly, with a few exceptions, OTP has solved my problem.

So the first thing I want to talk about is this notion of applications. We hear it a lot: if you do a `mix new` with the `--sup` flag, you're going to get an OTP application. And it can look a little overwhelming. You're unsure of where things start. What are these configuration files I'm looking at? What is this version number for in this config.exs? An application is just a piece of functionality that can be started and stopped as a unit, okay? It's akin to systemd units (systemd being controversial, I know): a piece of functionality that can start and stop at will, and applications can be mixed together in OTP, which is really nice as well. Applications are optionally reusable. There are two kinds: regular applications (I don't remember exactly what they're called) and library applications. The idea is that library applications are the ones that don't actively manage any processes; they provide functionality to you from other places. A lot of the dependencies you mix into your projects are library applications in OTP. I would say, honestly, you don't need to worry about this a whole lot. For the vast majority of projects, the application callbacks are done for you with Mix. Mix is a beautiful tool: you provide it that `--sup` flag and you get a supervised OTP application ready to go. It's really, really nice. And if you find yourself looking at the config.exs for your application and thinking, this is not doing exactly what I need, look at Conform. It's been touched on a few times, and there was a great talk yesterday about it.
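The shape of what `mix new my_app --sup` generates can be sketched roughly like this; the module names and the initial state are invented for illustration:

```elixir
defmodule MyApp.Worker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)
  def init(arg), do: {:ok, arg}
end

defmodule MyApp do
  use Application

  # The application callback: build and start the root of the
  # supervision tree. Stopping the application later stops this
  # whole tree as a single unit.
  def start(_type, _args) do
    children = [
      {MyApp.Worker, :some_initial_state}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

A library application would simply omit the `mod` entry in its app spec and ship no `start/2` callback at all, since it manages no processes of its own.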
Conform is an incredibly powerful tool that I have some big hopes for, which I can talk a little about later.

So, supervisors. We hear this term thrown around a lot, and when I first started with Erlang I had no idea what these things were. They sounded like magical beings that ran and terminated things at random, and suddenly I got fault tolerance for free, kind of thing. That isn't the case, but they are very simple to understand: they're basically responsible for starting, stopping, and monitoring all of your workers and other supervisors. So in that tree of processes we talked about, supervisors are the managers, really, of the workers. OTP in particular was built with the understanding that the managers themselves will often fail, so the idea is that you are free to spin up as many supervisors as you feel necessary to achieve the fault tolerance you're looking for. It was a revelation for me the first time I properly structured an OTP app and realized you can get things like self-healing and failover and all sorts of other really neat properties in your system just by setting up a small tree of supervisors.

This part can be confusing: when you define a supervisor, there are all these restart strategies. I will tell you that, unless you know otherwise, you most likely want to stick to two of them. `:one_for_one` simply means that if a child process the supervisor is managing dies, it gets restarted, just that process. `:one_for_all` means that if one child process dies, they all get restarted. Typically we use `:one_for_one` for almost everything, and `:one_for_all` for the rest. As for stopping: there's no real concept of stopping a supervisor individually.
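Those two strategies can be sketched quickly; everything here is an invented toy example, with `Demo.Worker` standing in for any real worker:

```elixir
defmodule Demo.Worker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)
  def init(name), do: {:ok, name}
end

# Two workers under a :one_for_one supervisor: if one crashes, only
# that one is restarted. Swap the strategy for :one_for_all and a
# single crash would restart every child.
children = [
  Supervisor.child_spec({Demo.Worker, :a}, id: :a),
  Supervisor.child_spec({Demo.Worker, :b}, id: :b)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Kill one child; the supervisor restarts it from its child spec.
Process.exit(Process.whereis(:a), :kill)
```

After the kill, `:a` comes back under a fresh PID while `:b` is untouched, which is the whole point of `:one_for_one`.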
You terminate at the root, and that termination trickles down through your tree of supervisors, giving them time to do proper cleanup, or whatever they need to do, if you set your shutdown strategy that way. I put up there that they're stopped in reverse start order. This is something that bit me a while ago with Erlang: every once in a while you'll have an application where the termination of one supervisor depends on cleanup from another, so it's good to know that they stop in the reverse of the order they were started in.

And lastly, a child spec. You may have heard this term. A child spec is just a keyword list we pass to a supervisor when we're adding a child, a set of arguments that tell the supervisor what to do with this child. The `start` keyword tells it which module and function to execute. `restart` says what kind of worker this is: should it always be restarted, never be restarted, or only be restarted if it dies abnormally? `shutdown` says how to do the stopping; you saw the `:brutal_kill` atom earlier, which is one option. It's a means of telling the supervisor, you can kill this thing right away, or be patient with it and wait for it to do some cleanup. And `type` is either `:worker` or `:supervisor`. You almost never have to worry about this one; it defaults to `:worker`.

So knowing that, let's talk about these gen flavors. We hear about them all over the community, and I think it can get very confusing which you should be using when, or even what they do, frankly. But in general, they all share some common properties. The first is that they're all generic implementations of common design patterns. These are patterns that the developers of these libraries saw over and over in every piece of software they built, and they wanted to encapsulate them in a well-written form. These things are battle-tested at scales.
You probably won't have to worry about hitting those scales, and if you do, you can rest easy. I was telling a story in the green room back there to some of the other speakers about a mobile game I ran that had about two and a half million concurrent users at one point, during a special event, and the system basically never went down; I couldn't even max out the instances it was running on. We're talking rock-solid implementations here. You use these via behaviours (note the British spelling), which are implemented in callback modules. It's essentially just a set of functions that you define according to the behaviour, and they get called by the system automatically. We'll look at what some of those are, but they're very easy to implement; it's like an interface, kind of. So it's very easy to get started.

The king of the gen land is GenServer. GenServer, I would argue, forms the basis of probably 90% of the Erlang code out there in production. It encapsulates the request/response cycle we see in almost every form of interaction, between users and systems, and between systems and systems. One thing you're going to start thinking about if you land yourself on an Elixir stack is that microservices get way easier. It becomes much easier to split your code out, because you're basically just throwing up GenServer interfaces everywhere. So it's very efficient, very fast, and very maintainable. There are two forms of interaction GenServer supports. The first is the synchronous interaction, the request/response cycle. You use that via `GenServer.call`, passing the PID or name of the GenServer you want to call out to. Then in the GenServer, you set up a `handle_call` callback function, and all that does is take in the message you're looking for, do some work on it, and return a reply. OTP takes care of the whole rest of that process. The one-way form is called cast.
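The synchronous round trip can be sketched with a toy stack server; the module name and messages are made up for illustration:

```elixir
defmodule Stack do
  use GenServer

  # Client API: GenServer.call blocks until the server replies.
  def start_link(items \\ []), do: GenServer.start_link(__MODULE__, items)
  def pop(pid), do: GenServer.call(pid, :pop)

  # Server callbacks
  def init(items), do: {:ok, items}

  def handle_call(:pop, _from, [head | rest]) do
    # The middle element of the tuple is the reply sent to the caller;
    # the last element is the new server state.
    {:reply, head, rest}
  end

  def handle_call(:pop, _from, []) do
    {:reply, :empty, []}
  end
end
```

All the message plumbing, mailbox handling, and timeouts are OTP's problem; you only write the callback clauses.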
And it works the exact same way, except there's no expectation of a reply, so we can use it for asynchronous actions. GenServer is what you want to reach for almost all the time, I would say. We're going to talk about the other two implementations, but if you find yourself trying to force those to fit what you're doing, just go back to GenServer, because it does a whole lot. In fact, the next one is built on top of GenServer, and I really encourage you to go look at the code for it. It's very lightweight, and the code for GenServer itself is actually very readable. gen_fsm, which I'm going to talk about next, is basically just a specialized version of GenServer: finite state management as a process. I'm not sure how many of you have been following the mailing list recently, but there was a big discussion about gen_fsm and whether it might be better to split things out into a data component of the state machine and a process component. I'll leave that for more creative minds. I will say that I've used this in many large production systems, and it is highly maintainable and very elegant, in my opinion. All it is is a process that goes from state to state using what it calls events, which are either asynchronous or synchronous. Basically, you define a set of callbacks in your state machine that match on whatever the current state of the FSM is, and then match on the event you're passing in. So for example, in this FSM, if we're in the pending state and an approval event gets sent into the gen_fsm, this function will be called. Very easy to follow the logic, and it's composable; you can split this across multiple modules and states. And we actually use this, well, it will be going into production soon, for a set of interactions with old government systems.
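That pending/approval shape can be sketched by calling Erlang's `:gen_fsm` from Elixir; the module and event names are invented, and note that newer OTP releases would steer you toward `:gen_statem` instead, though the idea is the same:

```elixir
defmodule Ticket do
  @behaviour :gen_fsm

  def start_link, do: :gen_fsm.start_link(__MODULE__, :ok, [])
  def approve(pid), do: :gen_fsm.send_event(pid, :approval)
  def current_state(pid), do: :gen_fsm.sync_send_all_state_event(pid, :which_state)

  # Start in the :pending state with empty state data.
  def init(:ok), do: {:ok, :pending, %{}}

  # This clause only runs when the FSM is in :pending
  # and an :approval event arrives.
  def pending(:approval, data), do: {:next_state, :approved, data}

  # Once approved, further events are ignored.
  def approved(_event, data), do: {:next_state, :approved, data}

  def handle_sync_event(:which_state, _from, state_name, data) do
    {:reply, state_name, state_name, data}
  end

  # Remaining callbacks the behaviour expects.
  def handle_event(_event, state_name, data), do: {:next_state, state_name, data}
  def handle_info(_info, state_name, data), do: {:next_state, state_name, data}
  def terminate(_reason, _state_name, _data), do: :ok
  def code_change(_old_vsn, state_name, data, _extra), do: {:ok, state_name, data}
end
```

Each state is just a function head, so adding a failure mode means adding a clause, which is what makes these so maintainable.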
So at Seneca, we work with a lot of local governments that run COBOL systems and things like that, which we have to interact with for passing data into a 311 system or something like that. And there are tons of failure modes. Tons of them. They have business hours for the servers. I'm not kidding. So you want a way of encapsulating those failure modes and recovering from them gracefully, very gracefully; we spin up a gen_fsm for each of the processes that comes through. And they're really easy to write.

GenEvent. This is kind of a controversial gen flavor. It's the only one of them that you don't really spawn directly; you spawn an event manager. And I want to say for a second that "event" here has a connotation, when we speak about it in modern terms, that probably doesn't apply to the way it works here. These are blocking calls, for example, when you do this. You can send asynchronous events, but the event manager will block while it runs them in some cases. However, it is very easy to use. You spin it up at one point in your tree, and at any time you can add a handler. This works like any other event handling you've used, from an ergonomics point of view: you give it a handler module, and throughout the rest of your code you can just call out to that GenEvent manager and say, hey, this thing happened, execute whatever handlers are waiting for it. The great use case for this is logging. I would say that unless you absolutely know you need to reach for this, I wouldn't worry about it. There are a lot of edge cases involving concurrency here that will get you in trouble, but it's really powerful when it fits right. And also: ack_notify. This isn't really spelled out anywhere, I don't think, but ack_notify is a safer version. There are basically three notify variants.
They're `ack_notify`, `notify`, and `sync_notify`. `notify` does a cast and doesn't apply any back pressure to the event manager. `ack_notify`, on the other hand, waits for the event manager to acknowledge the incoming event, and then runs it asynchronously. So I suggest going with `ack_notify` if possible. And that's what that is.

So, moving away from the gen flavors for a moment. One of the common needs we face, especially in web infrastructure, is ephemeral storage, the ability to sit something in a performant cache that can be shared and read by multiple processes without introducing a third-party service into the loop. ETS, Erlang Term Storage, is Erlang's answer to this. It's for storing arbitrary Erlang terms: binaries, or lists, or tuples, or whatever you want to put in it. It's key-value, it's very easy to use, and it can be introduced into your process management tree so that you're never worried about it sticking around beyond what you want, although, as we'll find out, it might disappear before you want it to. This is a really powerful tool, and it comes in four forms. One is the `:set` type, the default, which gives you unique keys, one object per key. You can also specify `:ordered_set`, which keeps the keys ordered; `:bag`, which allows duplicate keys but not duplicate objects; and `:duplicate_bag`, which allows duplicate keys and even duplicate objects under those keys. I find myself reaching for `:set`, and sometimes `:ordered_set` to duplicate some of the stuff Redis can do every once in a while. I honestly haven't gotten a whole lot of use out of the bag or duplicate bag settings, although we were talking about how a bag might be useful for time series storage.
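The difference between those table types can be sketched in a few lines; the table names are arbitrary:

```elixir
# :set (the default): one object per key; a second insert overwrites.
set = :ets.new(:demo_set, [:set])
:ets.insert(set, {:alice, 1})
:ets.insert(set, {:alice, 2})
[{:alice, 2}] = :ets.lookup(set, :alice)

# :bag - duplicate keys allowed, but identical whole objects are not.
bag = :ets.new(:demo_bag, [:bag])
:ets.insert(bag, {:click, 1})
:ets.insert(bag, {:click, 2})
:ets.insert(bag, {:click, 2})  # exact duplicate object, stored only once
[{:click, 1}, {:click, 2}] = Enum.sort(:ets.lookup(bag, :click))

# :duplicate_bag - even identical objects are kept.
dbag = :ets.new(:demo_dbag, [:duplicate_bag])
:ets.insert(dbag, {:click, 2})
:ets.insert(dbag, {:click, 2})
2 = length(:ets.lookup(dbag, :click))
```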
ETS tables come with these kind of difficult-to-understand but really easy-to-use options for optimizing them to your use case. If you know your table is going to be read heavily from multiple processes, pass it the `read_concurrency: true` flag. If you know it's going to be written to a lot from a lot of different processes, pass `write_concurrency: true`. Behind the scenes, Erlang will optimize the table for those use cases. They come with overhead if you're not using them for the correct use case, but I think you can figure out whether you need them or not.

Tables also have three access modes. There's `:public`, which says the table can be written to or read by any process on that node. There's `:protected`, which says the table is readable by anyone but writable only by the managing process; that's usually what you want for a cache, so you stick something like a GenServer in front of it, and that process fully manages your cache store. And there's `:private`, which says the managing process is the only one that can read or write the table. This is probably the most important thing to understand about ETS: the process that spins a table up will destroy it when it terminates. So if you have a child process that owns an ETS table and you want that table to persist when the child dies, don't spin the table up inside that process, because it will take the table with it on its way out.

One of the difficulties with this talk is that OTP is such a broad ecosystem of libraries that I didn't really know how to do it justice in the time allotted. I really encourage you to go look up Mnesia. It's basically a relational database built on top of ETS, with all the power of ETS and a lot of the cons as well. Look at it for your use case.
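That protected-cache pattern, reads from anywhere, writes funneled through the owner, can be sketched like this; the module and table names are invented:

```elixir
defmodule Cache do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Writes go through the owning process...
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  # ...but reads hit the table directly from any calling process.
  def get(key) do
    case :ets.lookup(:cache, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  def init(:ok) do
    # :protected - anyone can read, only this owner can write.
    # If this GenServer dies, the table dies with it.
    :ets.new(:cache, [:set, :protected, :named_table, read_concurrency: true])
    {:ok, nil}
  end

  def handle_call({:put, key, value}, _from, state) do
    :ets.insert(:cache, {key, value})
    {:reply, :ok, state}
  end
end
```

Because readers never touch the GenServer, the owner process is not a bottleneck for lookups, only for writes.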
I would say Mnesia is probably easier to pick up for people coming from the Erlang community, its query language especially, but it's very powerful: a full database with a query language and, I think, foreign key constraints. So check it out if you get a chance.

Releases. OTP releases are one of those things that I really wish we talked about more. They are so incredibly powerful. I had a whole lot written on this, and then I saw Paul's talk yesterday, and I realized you should basically just do yourself a favor and watch his talk. It was an eye-opening experience for a lot of people to see what these things are capable of and what's already been done for us. But the general overview is that we're talking about completely self-contained artifacts for your application. Build it, you get a release artifact, a tarball, out of it; put it on a server; run it. It's that easy. It comes with the entire Erlang runtime system embedded inside it by default. It is absolutely incredible. And I can speak to this: we have a very small team running at Seneca right now, and these things have saved my life on many occasions, because they make fixing bugs so much easier in what you would think would be a very complex deployment process, given OTP's power. They're also highly configurable, as we saw with Conform; you can make these things do anything you want.

And then: hot upgrades and downgrades. This is interesting, because for me this was really exciting, but I've talked to a lot of people who have said, well, my applications aren't very stateful, I run fairly stateless applications, so it doesn't really matter if the system's down for a little bit. I'd argue two things about that. One is, I think applications are moving. We saw the keynote about the pendulum.
I think applications are moving toward needing at least some state handling on the back end. And while you can get away with throwing state out on, say, a restart, I look at it as: let's evolve the state of the industry a little. Let's expect this kind of hot upgrade to become the standard for how we deploy code. It's about time, I feel. This is an incredibly powerful means of holding onto your application state and not disconnecting people at random times when you're doing maintenance or whatever.

Part of this is that all of the gen flavors I talked about support this really magical little function that never gets looked at, called `code_change`. `code_change` is just a callback on the behaviour that takes in the old state of the process when it's called and is expected to return the new state. It gets called during a hot upgrade or a hot downgrade, and it's incredibly powerful. For example, you could have a stateful map of connections, and then you could literally change the entire logic of the way those connections are handled. You just write a migration inside this `code_change` function that takes the old map you were holding around and returns the new one, and all of your systems will make that migration seamlessly, in real time, with no downtime. It's really neat to see the first time you do it, and I highly recommend it for anybody doing any sort of real-time application where you're dealing with persistent connections.

Quickly, I want to touch on this really neat feature, which doesn't get a lot of play: in an OTP release, you can specify the nodes that you want to be on hot standby for your application. So you can say, when I start this up, I want to start a node at this host name, and it's going to be the primary.
And then I want to have these two hot standbys. Their job is to be started, but they're not going to get the traffic from the rest of the tree as it trickles down. At any time, if one of those nodes goes down (and you can configure what that means, how much of a chance you want to give it to come back up), OTP will automatically fail over to one of your hot standby nodes. And when your node comes back from failure, it's configured as part of the release, control goes back to that original primary node. So you can have self-healing topologies for your applications with little more than a few configuration steps. That comes with a little asterisk, because any sort of distributed failover can be difficult to configure, but I encourage you to look into this to see if it's an option for you.

Having said all that, I want to talk about my dream infrastructure, because I think we're really close to it here with Elixir. At Seneca, for example, we have a lot of systems that are in use but run on legacy languages, systems we need to keep running, and I would really like to have the power of OTP's supervision structure over even third-party processes. I think this could be absolutely incredible. I was talking with the CargoSense guys about what they're doing, and how they're starting to see the power of OTP releases and kind of questioning their decision to look at Docker for their move. Because I actually think, done right, you could build a complete deployment infrastructure and configuration management system on top of OTP that would blow your mind. I think ports can be used to get a lot of this: I know they can be used to manage third-party processes and restart them when they die.
And so I'd really love to see somebody take an attempt at building a supervision tree that mimics (you're not going to get all of it, but mimics) the supervisor structure as it sits today. I'm going to take all the hard questions and pass them off to James Smith, who's giving a talk later today on interoperability. We went back and forth on this, and he's the expert on how to do it. But I will say that the first person to build this will have one of the Phoenix-like apps of the Elixir ecosystem, if it's done right.

To take it way, way out there: I'm not sure how many people have seen this demo, but this is essentially an Erlang VM running as a unikernel on a hypervisor. Oh, that's somebody out there. What it does is spin up a complete hypervisor VM for every request. Go to this website; it will blow your mind how quickly it runs. And I was thinking about my dream, and I thought, man, what if you could get rid of the OS altogether and run an OTP infrastructure right on top of a hypervisor, spin it up, scale it up and down with new nodes, with self-healing failover like we talked about. Then I think you're competing with something like Kubernetes, which gets really exciting. So yeah, somebody get to it.

Finally, I just want to recap real quickly. OTP is about awesome implementations of common design patterns. Look to it anytime you have a suspicion somebody else has had to tackle the problem before, which is probably most things. We've got robust ephemeral storage options. And if you need something a little more than ephemeral storage, check out DETS as well. DETS is basically a disk version of ETS that you can use to flush ETS tables to disk every once in a while. It's very robust in the use cases it was designed for, but it comes with a lot of caveats. So check it out if it interests you.
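A minimal DETS sketch, including that flush-ETS-to-disk trick; the table name and filename here are made up:

```elixir
# Open (or create) a disk-backed table.
{:ok, table} = :dets.open_file(:demo_dets, file: ~c"demo_dets.tmp", type: :set)
:ok = :dets.insert(table, {:answer, 42})
[{:answer, 42}] = :dets.lookup(table, :answer)

# An in-memory ETS table can be flushed into it wholesale:
ets = :ets.new(:mem_cache, [:set])
:ets.insert(ets, {:key, "value"})
:ets.to_dets(ets, table)
[{:key, "value"}] = :dets.lookup(table, :key)

# Closing flushes everything to disk; the data survives restarts.
:ok = :dets.close(table)
```

The caveats the talk alludes to are real: DETS files are capped at 2 GB and repair after an unclean shutdown can be slow, so measure before leaning on it.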
We have a really powerful deployment story, I think the most powerful deployment story of any language, even more powerful than Go's, in my opinion. And Go is pretty hard to beat, given how self-contained it is; but with the monitoring and fault tolerance we're talking about, I think this blows it out of the water. And lastly, when you're rolling with OTP, you very quickly figure out that all the other platforms in your stack end up being the failure points. In fact, it's almost depressing, because you want to be writing more Elixir, but it just works. This was another thing Ben and Bruce from CargoSense were telling me: they said, yeah, we got all this OTP stuff going, and then it ended up being the Docker daemon that we had to watch all the time, because that was basically where the failure abstraction went. And I think that's incredible. I think it's really exciting. So yeah, that's the end of the talk.