A friend of mine, Lillie Chilen, came up with this idea and it's like super genius. So anytime during this talk that I'm going to walk over to this table, which is like forever away, and grab my glass of water and take a sip, I want everybody in here to just start like clapping and cheering, because otherwise I'm going to do this weird dance where I'm like, please don't watch me drink water. It's going to be great. So we're going to practice it now. So this is awesome. Thank you. So I'm Stella Cotton. I'm an engineer on the tools team at Heroku. And before we get started, I wanted to give you a heads up that I'm going to tweet out a link to the slides right after my talk. So if you want to take a closer look at any of the code or some of the links that I'm going to include at the bottom as references, you'll be able to do that. And I lured you all here under this very false pretense that we are going to learn how to write a small web server in Ruby, the little server that could. But the reality is that this is the little server that can't. If you're going to use this little server that we're going to talk about here today to run a production website, you're going to find it's slow, it's not secure. In some cases, I'm just going to be like, oh, we're just not going to do that. And also it's going to be a little bit limited. It's going to be limited to Unix-like systems, so it's not going to run on a Windows machine or a Windows server. But there are some reasons to care. It can do some cool things. And to talk about what this dinky little server can do, we're going to first talk about abstractions. And as engineers, a super powerful skill is to know when to dig into an abstraction and when to just accept the constraints of the abstraction. And just think: if every single time you wanted to use a third-party API, you decided, oh, I really need to know how this works underneath.
You'd be wasting your time, and you'd be really sacrificing the value of an API, which is an abstraction away from something that you just don't need to know about. Abstractions can also make your code a lot stronger when you're practicing them and implementing them in your own code base. And it really helps you manage your time efficiently. So you might be that person, or you might have worked alongside that person, who follows every single rabbit hole when they're trying to debug a problem, and they just waste hours and hours because they just can't accept that, like, you know, this part, it just works. But to be totally honest, as somebody who came to software much later in life, after a career of doing something completely different, I find that I have to, like, really fight this instinct to just write off abstractions as magic and something, like, completely unknowable. And good abstractions should feel like they're magic, but we really just need to remind ourselves that they're a tool. And servers specifically are a tool that we use every single day. They help web developers do their jobs. You run rails s, and you don't really need to care, like, what's happening underneath to start building a web application. The server just starts up. It's an easy-to-use abstraction, and it is so powerful that it feels like magic. But servers are not magic. If you dig down past this abstraction layer, you're going to find that it's just code. So we're going to explore some of the components of servers. So tonight, you can go home, go to GitHub, and you can open up a Ruby server's repo, like Puma or Unicorn, and there will be some familiar concepts inside instead of mystery. So our server is going to help us understand what's happening in production servers. What else can the little server do? Because the pieces that actually make up web servers are so fundamental.
It actually will help you build a foundation to understand all sorts of random cool things that are happening inside the Ruby community. So things like, why was garbage collection so important in Ruby 2? Why do we care about Koichi's new concurrency model with guilds that he talked about earlier? These fundamentals will also help you explore outside the Ruby community, and venture into operations or systems communities, like watching Kelsey Hightower live-code a talk on containers, or reading Julia Evans' zine about strace. So let's start off by talking about what a server really is. Today we're going to be talking specifically about a web server. It's going to live on a physical computer, which is somewhat confusingly also called a server. And right now, you are probably using one of these servers today. Unicorn, Puma, WEBrick, pretty common. And in a lot of ways, the server is really just like any other program you run on your computer. It has code, it lives in a file, you run it. Our server is just going to be run like this from the command line. So how is a server different from your web development code that lives inside Sinatra or Rails? One, it's going to communicate with the outside world, and it's going to leverage the power of the operating system to do that. And two, it's going to conform to a very specific API to communicate. And this might not really sound like a big deal, but the web today, or when I wrote this slide, has 4.68 billion indexed web pages, and it's more now than when I captured this, and it's being served up by servers all over the world. And these web pages are being viewed by typically five different browsers across mobile devices and desktop devices, and the fact that every single one of these is communicating in the exact same language is really incredible. And so how is that even possible? In my experience, if you asked five developers to solve a problem individually, you're going to get like six answers.
But it's the magic of a standards body, the W3C, the World Wide Web Consortium. It was formed in 1994, and it creates standards for the open web. And they created the API that we use today for web servers to communicate over HTTP. Specifically, they established a document called RFC 2616. It's 175 pages, and it outlines the entire API that web developers use every day. And you can see this, like, wall of plain text, and it's just, it's wild, and you might get overwhelmed. Why is it so confusing and so dense? An important thing to keep in mind is that it's not actually API documentation like we would think of it. So typical API documentation might be how you would use Twitter's API to create a tweet on behalf of a user. The corresponding documentation for a web request could be something like: the initial line; zero or more header lines; a blank line; and an optional message body. And in the real world, we form these web requests in a few different ways. And specifically in this talk, I'm gonna use the term client as an umbrella term to describe any of these methods. We can use the browser to visit a web page, we can use telnet, we can use curl, which is what we'll use here. But even inside our web apps, we use HTTP client libraries to format given parameters to make a request. And that documentation's pretty good, and it's pretty short. So what about this confusing RFC? Why is it so much more complicated than that? You can actually think of this as Twitter giving you all the specifications to implement your own Twitter API. It has to outline every single way that somebody could call out to that API and every single way that Twitter might respond. It's basically a giant contract for how we structure our web requests and our responses so that every single person in the world can communicate. And if you think about it in a more Ruby-ish way, it's actually like a giant list of tests you should write if you wanna create a web server.
So it's not just saying, like, you need to support a request that looks like this, which is what we were building earlier. It also has to say things like, every single one of these URLs is equivalent. It's every single bit of interface between the open web and our applications. It's amazing. And it's really impossible to talk about writing a web server without talking about this RFC, because it's so incredibly important. But when I was first starting to understand how web servers work, I was Googling, like, oh, how does a web server work? And so what comes up is the Stack Overflow answer. And it's basically saying the only way to understand how a web server works is to read this RFC, all 175 pages, and implement it yourself. And since this is a conference talk, I'm gonna make at least one meme reference, but this was the most how-to-draw-an-owl shit I had ever read. And to be honest, I was so frustrated, because I'm like, am I a real developer? I can't turn this 175-page spec into a web server. And the reality is that it's super important and it's really awesome. And I actually recommend that you take a look at it, but it's actually not gonna help you understand how production web servers really work. It's really just an instruction manual for respecting the conventions of an open web. So if you wanna build a production web server, yes, you definitely need to make sure you respect it. So we'll respect the basic contract. So we'll be able to curl our web server. We'll be able to get a response. But what we're really gonna focus on today is digging deeper into some of the fundamentals. Bless you. So we talked about what a server is, generally. So let's dig in a little deeper. We'll look at three fundamental building blocks of a simple web server and how it facilitates communication, because that's what servers are really all about. So first, let's talk about how it communicates with the outside world.
At its heart, it's just a program running on a machine that communicates in a few defined ways. And when you start up that program on your machine, Unix is gonna create a little world for your program to run in called a process. The idea of the process is basically you can assign variables, you can change state inside your program, without corrupting the global state of your operating system. And like any other process, you can use ps aux to see what processes are running. And a server process is special, like we talked about before, because it leverages the power of the operating system to talk to the outside world. And the Ruby standard library is going to give you a small web server you can run called WEBrick. But it also provides a lot of wrappers around common Unix system calls, which are gonna help you build your own WEBrick. So how do we do this? First, we're gonna talk about opening a socket. A socket is a way that processes on a system communicate. They can either communicate with other processes on the same machine or with the outside world. And you'll hear people say that everything in Unix is a file, and sockets are no different. They're just a specific kind of file that both servers and clients can read and write to. And if you're curious about what sockets are running on your machine right now, if you have a Mac, you can use netstat and you'll see the list of sockets. And the operating system is gonna identify each of these files that are on your system with a number. It's called a file descriptor or file handle. And Ruby's gonna give us a higher-level abstraction, so we actually don't need to pass this number around in our system calls. But underneath, it's what the operating system is gonna use to identify the specific files that we're reading and writing to, in this case, a socket. We're also not just gonna open any socket. We need to tell our operating system that we're opening a specific kind of socket that's capable of accepting web traffic.
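To make the file descriptor idea concrete, here's a tiny sketch (my own illustration, not code from the talk) showing that a Ruby socket exposes the numeric file descriptor the operating system uses to track it. Port 0 is a convention that asks the OS to pick any free port:

```ruby
require "socket"

# A socket really is just a file to the OS: it gets a numeric file
# descriptor like any other open file. TCPServer is Ruby's high-level
# wrapper around the lower-level socket calls.
server = TCPServer.new("127.0.0.1", 0)
fd = server.fileno
puts "our socket is file descriptor #{fd}"
server.close
```

This is the same number you'd see next to the process in netstat or lsof output.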
So to create our socket, we're gonna choose our addressing format, or communications domain. And in more plain terms, that's the kind of address you're gonna use to talk to your socket. The two most common formats that you're gonna see are gonna be Unix and Internet. And remember how I said that processes can use sockets to talk to other processes on the same machine or to the outside world? A Unix socket uses a path name to talk to other processes on the same machine. And an Internet socket uses an internet address so the outside world can talk to the process running on your machine. So we're gonna use an Internet socket because we wanna allow the outside world to talk to us. So we decided we got this address format that we're gonna use. Next we need to define the kind of socket we wanna create. And typically you're gonna hear about two kinds of sockets. You're gonna hear about stream sockets and you're gonna hear about datagram sockets. Stream sockets are gonna be like a telephone. They're bi-directional, and the communication protocol is gonna be TCP. And it's gonna be two servers connected, chatting back and forth. And you may have heard of something called the TCP three-way handshake. And this is how clients and servers are gonna establish communication with each other. SYN and ACK are two bit flags that are turned on and off as a part of this communication. And it's just essentially a way for both parties to know that both parties are also connected before exchanging very important information. So the client's gonna be like, hey, it's Stella. And the server's gonna be like, oh, hi, Stella. Server speaking, how are you? And the client's gonna respond, oh, I'm here to get some information from you. But instead of doing all that goofy, like, talking on the telephone, they're just gonna use SYN and ACK. So stream sockets are gonna make sure that both the server and the client are connected using the handshake.
But equally important is that all the information is being sent in the correct order. Otherwise your webpage is gonna look terrible, or it may not render at all. So the server is going to continue to send information over the socket along with a sequence number. And the client is in charge of keeping track of that data and making sure it's in order, even if your packets don't actually arrive in order. So the TCP protocol using a stream socket is a great way to ensure that your HTML pages are arriving in the correct order, but you're gonna pay a price for it in terms of how long that process takes to connect. So datagram sockets, on the other hand, are like a megaphone. They're unidirectional, they only go one way, and we don't actually care if there's anyone around to hear us. There's no handshake, because datagram sockets just don't care. And so not only are they yelling into a megaphone, but if you were the person standing around listening to this megaphone, there's not even a guarantee that the words are gonna come out in the same order that they went in. And the benefit is that this message is gonna come across really quickly, because it uses a protocol called UDP instead of TCP. And common real-world examples that use this are things like multiplayer games or streaming audio, or, if you've used monitoring, StatsD. So for our web server, let's first set up our socket. We're gonna tell it: use the internet communication domain, AKA use an IP address, and communicate via TCP over a stream socket. Now that we have the socket, we just create the address to bind to. This could be anything. It's like generating a telephone number for our server for outside clients to call in. We're gonna bind the server to the address so that it knows that's where we're gonna receive the information. And finally, we're gonna say: socket, why don't you listen for incoming connection attempts, and we'll just pass the socket back. We just wait, we listen for requests.
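Putting those steps together, here's a minimal sketch of the setup just described, using Ruby's Socket wrappers around the underlying system calls (the method name, address, and backlog size are arbitrary choices of mine):

```ruby
require "socket"

# Internet addressing format (AF_INET) plus stream type (SOCK_STREAM,
# i.e. TCP), bound to an address and told to listen.
def open_listening_socket(port)
  socket  = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
  address = Socket.pack_sockaddr_in(port, "127.0.0.1")
  socket.bind(address) # claim the address, like getting a phone number
  socket.listen(10)    # queue up to 10 pending connection attempts
  socket
end
```

Calling `open_listening_socket(8080)` hands you back a socket that's just waiting for clients to dial in.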
So next we're gonna create a whole new method that's just gonna be looping and continuously listening for requests. And if somebody dials our number, AKA writes on our socket, like we're doing here with this curl command, it's gonna then create a whole new pair of sockets so that we can talk back and forth to the client. And why wouldn't you just use the original socket? That's because that socket has one job: it accepts incoming communications. So you're gonna need a new set of sockets to communicate back and forth so that your data's not getting jumbled. So we receive the request off of the socket and we're ready to go. So what are we actually doing when we handle those requests? For now, let's just say that this is where we're gonna run some application code. It's gonna return an HTTP response. And what might that look like? It's gonna look pretty similar to the most basic response that's outlined back in the RFC that we talked about earlier. It's gonna have the header, it's gonna have the body. And we're just gonna write that response back to our socket, and the client will be able to read it off. And at the end, when we decide to shut down our server, we'll close our socket with socket.close so that it can be used by another program on our system. So this whole process here is just how our tiny little server is gonna communicate with the outside world. It's gonna repeat over and over each time the client sends a request, and it's just gonna keep returning this comfortable, familiar phrase, hello world. So now that it's actually chatting with the outside world, let's talk about how our server is gonna communicate with the application code that's living underneath. We're gonna start by talking about a parser. Before, when we were building this tiny web server, you can see we got back the request and we didn't actually do anything with it. We were just like, you get hello world, doesn't matter, I don't care.
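That whole accept-and-respond cycle can be sketched like this (a simplified illustration of mine; the method name is made up):

```ruby
require "socket"

# accept blocks until a client connects, then hands back a brand-new
# socket for this one conversation; the listening socket keeps its one
# job of accepting incoming connections.
def serve_one_request(listening_socket)
  connection, _addrinfo = listening_socket.accept
  request = connection.readpartial(1024) # read the raw request (and, for now, ignore it)
  connection.write("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello world")
  connection.close
  request
end
```

In a real server this would sit inside a loop, serving request after request.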
And a parser's job is to take in that request that we ignored, and it's gonna use the guidelines from RFC 2616 to break that request down into pieces so that you can act on it. A parser's gonna extract things like the headers, the body, the URL, and it needs to do these things very quickly and very accurately. So even though you'll actually find production web servers written in Ruby, the actual parser itself is typically gonna be written in C. So I'm not gonna dig too much into the details here, but one thing to note is that Zed Shaw wrote a very unique parser for the Mongrel web server which has basically been ported into many of the Ruby web servers that we use today. It uses a DSL called Ragel to explicitly outline what a valid HTTP request looks like, and it uses state machines to make sure it does that safely and accurately. This is really beyond the scope of this talk, but it's really worth knowing about if you start digging into Ruby web server code, because you'll find it in Puma, you'll find it in Unicorn, and it'll literally just be brought in directly. Now, as we build up our little server that can't, we're gonna just bypass building a parser ourselves, take some liberties, and continue to assume that no matter what request we get, everybody's gonna get the same thing back. So we talked about parsing. Let's talk about what happens when the server actually communicates with the application itself. Instead of having just a hard-coded hello world in our run application code method, how can we modify this server so that we could actually plug in any standard Rails or Sinatra web app? We can do that with the magic of Rack. In the Ruby world, there's this common interface that all servers and applications communicate over called Rack. Sinatra and the Rails framework both use it, and it's what allows you to substitute web servers in and out without actually having to do a lot of configuration changes.
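Just to give a flavor of the job a parser does, here's a toy sketch of mine — nothing like the strict, fast, Ragel-generated C parsers in real servers, but it shows the request being broken into the pieces the RFC describes:

```ruby
# Toy HTTP request parser: split the raw request into the request line,
# the headers, and the body, following RFC 2616's general shape.
# Real parsers validate every byte; this one trusts its input.
def parse_request(raw)
  head, body = raw.split("\r\n\r\n", 2)
  request_line, *header_lines = head.split("\r\n")
  method, path, version = request_line.split(" ")
  headers = header_lines.map { |line| line.split(": ", 2) }.to_h
  { method: method, path: path, version: version,
    headers: headers, body: body }
end
```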
And the basic implementation on the application side, this would be what's actually in Sinatra or in Rails or in your application, is that it's gonna be a Ruby object, it's gonna respond to the method call, take one argument, and return a response. So here's an example, a super lightweight Rack app. Ruby object, responds to call, takes one argument, returns status, headers, body. And so before, when we were calling this method run application code, we were just constructing a string. But now, we can just say: always just execute app.call, and it doesn't matter what's happening underneath, our server is just going to trust that that's going to return what it needs. So whether it's our super lightweight Rack app that we've written here, a Sinatra app, or a Rails app, we'll always get the information. So our server is running, it's chatting back and forth with a client, it's communicating with our application, which is super cool, and we've even got all this stuff, this is really awesome. And in fact, we can even have another client chatting with our server too. There's two of them over there, life is perfect. Except now, let's just pretend that inside of our Rack application, instead of just returning hello world, we started making a call to the external Giphy API. Who knows why? But we're making this external API call. And so we're gonna download a new cat GIF every single time you visit the homepage of our website. And so I'm adding a sleep to illustrate that this is a blocking, slow request to an external server. And it'll wait five seconds for the download before it'll return the response to the client on the right. Five seconds of waiting is really bad for one person, as you see with the top right client. But if you just add one more visitor to this webpage at nearly the same time, they're gonna be waiting 10 seconds for their cat GIF. And you might be saying, well, five seconds is like a totally unreasonable amount of time for a cat GIF.
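To make both ideas concrete — the Rack contract and the blocking call — here's a sketch of mine: a plain lambda standing in for the app, and a short sleep standing in for the five-second GIF download (0.5 seconds here, just to keep the example quick):

```ruby
# A minimal Rack-style app: responds to call, takes one env argument,
# returns [status, headers, body]. The sleep stands in for the slow,
# blocking call out to the external GIF API.
SLOW_APP = lambda do |env|
  sleep 0.5 # pretend this is the external API call
  [200, { "Content-Type" => "text/plain" }, ["hello world"]]
end

# Two clients served one after the other each pay the full delay,
# just like the single grocery-store line coming up:
start = Time.now
2.times { SLOW_APP.call({}) }
elapsed = Time.now - start
```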
And yeah, it's kind of arbitrary, but even relatively short, fast calls out to an external API can add up the more people visit your website. And so you can use a grocery store analogy to explore how that would impact your server if more and more clients start to retrieve this cat GIF. If you have a grocery store with cashiers checking people out, our server right now is like having only one line at the grocery store. Each request has to wait for the request in front of it to be finished before it can complete. And adding this cat GIF to your web request is like having a new, inexperienced cashier come on duty who's just twice as slow as the experienced cashier. And so if people keep getting in line at the same rate, or people keep visiting your website at the same rate, people are gonna wait longer and longer. So how can we make things go faster? We could add a new cashier. And one way to add a new cashier is to build a new checkout line. And the equivalent thing in our server is to fork a process and create a whole new sub-process to download our cat GIF. So before, we were listening for client requests and running the code that handles the request inline. But if you extract it out into another method, you can wrap that method in fork, and it'll create a whole new process that calls out to get the cat GIF. That means the parent process is still accepting requests from other clients without waiting on the API call from the first client. And you can see here, you're still running two clients at the same time, but they're not gonna be waiting five seconds each, blocking the other client from getting the cat GIF. One thing to remember is that you actually have to close the parent connection to that socket as well, even though you've forked and copied it into a child. And I mentioned before that file descriptors are a number that the operating system uses to keep track of files.
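Sketched out (again, my own simplified illustration, not the talk's slide code), forking per request looks something like this — note the parent closing its copy of the connection:

```ruby
require "socket"

# The child handles the (slow) request while the parent goes right back
# to accepting new clients. The parent must close its duplicate of the
# connection socket, or that file descriptor stays open until it exits.
def handle_in_child(listening_socket)
  connection, _addrinfo = listening_socket.accept
  pid = fork do
    # child process: pretend this is the slow cat-GIF download
    connection.write("HTTP/1.1 200 OK\r\n\r\nhello world")
    connection.close
    exit! 0
  end
  connection.close # parent closes its copy of the file descriptor
  pid
end
```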
And when you call fork, the child is just gonna duplicate that connection to the socket by duplicating its file descriptors. And the operating system is gonna keep track of how many references there are to that file descriptor. And so if you close that child socket, but you don't tell the parent to close its socket, that socket can basically stay open as long as the program is running. And the operating system limits the total number of files that you can have open, and you can eventually see an error just saying: too many open files. So this is a side note. You may screw up at some point. You may fork a process that does not get killed. And when you try to start your server again, it's gonna say 'address in use,' because this process is hanging around in the background and you don't know it. And it's holding onto that IP address. And you can just use lsof, which lists open files, to see all the files that are open and look for the process. Or you can see that it's running with ps aux and grep, and then you can just kill it. So at the end of the day, each child process is gonna be a copy of the parent process. And the boxes here are gonna represent the separate memory that they write to. And if any of these processes wanna actually talk to each other, they can't access each other's memory. They'll need to open a Unix socket like we talked about earlier. But wait, so if fork is duplicating my entire application, isn't this just gonna duplicate the amount of memory that my application is consuming? So Unix has this optimization called copy on write. Since a lot of the memory that's actually being taken up by your program is the code itself, which is static, and not these variables that you're dynamically assigning, there's a huge amount of memory duplication between the child and the parent. So Unix is actually able to share that allocated memory between the parent and its forked children.
And when a portion of that memory is changed, say you assign a variable in your program, only that small difference is actually registered. So it copies the memory to a new page when the child program tries to write to that memory. That's why it's called copy on write. And it means that the memory footprint of the child process is usually gonna be smaller than your parent process, which is awesome. But in the olden days, Ruby unfortunately was not able to take advantage of this optimization, and it got a really bad rap for being a huge memory hog. And why? Because of garbage collection. So what is garbage collection exactly? It's what allows us as Ruby programmers to allocate memory willy-nilly on our computers, and we never worry about freeing it back to the system. Very simple example: assigning the cat_gifs variable to the word awesome, or the string 'awesome'. We may never actually use this variable again, and in a manually managed language, we would need to write our code in a way that frees that memory back to the system. But in Ruby, we don't actually have to worry about that overhead. The garbage collector does all that work for us. It's periodically gonna run through the program's memory and figure out if it's still active, and if it's not, it's just gonna release it back into the wild. But in the pre-Ruby 2 days, we paid a huge cost for that abstraction when we forked a program. And why was that? Because the garbage collector basically marked all of the memory in the process of inspecting it, to see if it was in use, which made the operating system think, like, oh, every single value in the forked process, for the child and the parent process, is different. So basically, none of the memory is shared. And instead of the child process having a smaller footprint, it's basically the same as the parent as soon as the garbage collector runs.
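From the program's point of view, the copy-on-write semantics described above mean a child's writes never leak back into the parent. A quick sketch of mine (Unix-only, since it forks):

```ruby
# After fork, parent and child have logically separate memory; the OS
# shares pages copy-on-write and only copies when one side writes.
message = "hello"
pid = fork do
  message << " from the child" # this write triggers a copy in the child
  exit! 0
end
Process.wait(pid)
# the parent's copy is untouched: still "hello"
```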
However, Narihiro Nakada made a really cool change to the way the garbage collector works in Ruby 2. So I'm not gonna go into it here, but the main benefit is that the garbage collector no longer modifies the actual memory itself, and Ruby is able to take advantage of copy on write. If you wanna learn more, Pat Shaughnessy's blog has a really great explanation of the internals. And this code is not perfect. If you had a lot of clients trying to reach your server at once, and you just continued to fork every time you get a request, the machine running your server program is eventually going to run out of memory. So we learned that fork is a way to run processes in parallel. And if you're concerned about running out of memory, well, you might think, hmm, I've heard something called threads use less memory. Why don't we try that instead? So what's a thread exactly? It's the smallest bit of programmable instructions that can be coordinated by the operating system. Many threads can be a part of the same process in a multi-threaded system. And they'll share all of the process's resources, specifically its memory. There's no copy on write here, because all the threads can access the same memory space and write to it. One of the benefits of threading is that it uses less memory. The threads will die when the process that they're running in has died, so you don't run into these zombie processes that we talked about earlier. And it's really fast to communicate back and forth because there are no sockets. And building up and tearing down Unix sockets isn't free. So why don't we actually use threading more often? Well, one reason is that the version of Ruby that most of us, or many of us, use, MRI, doesn't actually support true parallelism. And why does it not support parallelism with multi-threading? Because of the global interpreter lock.
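Before getting into the GIL, the shared-memory point above can be sketched in a couple of lines (my own illustration): every thread writes directly into the same object, no sockets or copy-on-write involved.

```ruby
# Threads live inside one process and share its memory directly.
results = []
threads = 3.times.map do |i|
  Thread.new { results << i * 10 } # every thread appends to the same array
end
threads.each(&:join)
```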
And Matz touched on this very briefly in his keynote, but if you aren't familiar with it, the GIL means that even in a multi-threaded program, only one thread is ever executing Ruby code at any given time. So you might think, oh, well, that's awesome. Ruby itself makes sure that our code doesn't have any of those weird side effects that people warn you about when you fork a bunch of copies of your program, or you have a bunch of threads all sharing the same memory. But unfortunately, that's not the case. It only guarantees that a single operation in Ruby, the language itself, is thread safe. So what does that mean? So here we have a pretty simple method. It's just gonna check: are there more than zero cat toys available? If there are, we will let somebody buy one. Otherwise we're gonna say, no, you can't buy one of these. However, if we put this method running in parallel in two threads, the GIL can context switch in weird parts of the code and cause race conditions. So let's say thread one checks to see if there are any cat toys available. The instance variable is still gonna be more than zero, so we're gonna be like, cool, one left, we got this. But that's one operation. So the GIL can switch to the other thread, because that single Ruby operation is completed. The first thread hasn't yet purchased the cat toy, but the second thread is also gonna think that there's a cat toy left. Thread two buys the cat toy, but when you actually context switch back, thread one is gonna think, oh, there's a cat toy left, I'm gonna keep going, but there aren't any. So the GIL is valuable because it protects the Ruby language itself from race conditions, but it's not actually gonna help you make sure that your own code is thread safe. So why would you use a multi-threaded server if you're running MRI? It makes sense if you're using JRuby or Rubinius, because you can take advantage of true multi-threading support in those implementations.
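The cat-toy race above is usually fixed with a Mutex, which makes the whole check-then-buy sequence atomic so no other thread can sneak in between the check and the purchase. A sketch of mine (the class and method names are made up, not the talk's slide code):

```ruby
# Wrap the check-then-act sequence in a Mutex so the GIL can't context
# switch another buyer in between the stock check and the purchase.
class ToyStore
  def initialize(stock)
    @stock = stock
    @lock  = Mutex.new
  end

  def buy
    @lock.synchronize do
      return false if @stock.zero? # check...
      @stock -= 1                  # ...then act, atomically
      true
    end
  end

  attr_reader :stock
end
```

Without the synchronize block, two threads could both see one toy left and both "buy" it.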
So a benefit to using a multi-threaded server like Puma, for example, is if your app is actually waiting on a lot of third-party API calls, because that would actually give you a performance improvement; your thread isn't actually being blocked by Ruby code, it's just waiting on an external call. But at the end of the day, if you're using a multi-threaded server, you have to be sure that your code is thread safe and any gems you use are thread safe. And it's really beyond the scope of this talk to discuss how to write thread-safe code, but I can share a few resources afterwards if you're interested. But threading comes with a lot of problems, as Matz pointed out in his keynote. So our server is communicating with the outside world through internet sockets, communicating with the application using a parser and the Rack interface. We're even able to handle multiple clients with forking. Last, let's talk about how you, as a human server administrator, can talk to your server using signals and traps. Signals are a way that you can interact with processes that are actually running on your machine. You can see all the signals that your operating system supports using kill -l, and the most common signal that we use every single day is gonna be Control-C. It fires off an interrupt signal to our program, and the program knows that we'd like it to shut down. So we can actually use a trap in our server to capture that signal and execute some code before our server process totally shuts down. Here, we're trapping the signal so we can print 'terminating' to let the user know that we're shutting down gracefully.
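A sketch of that trap (INT is the signal Control-C sends; recording a message instead of printing is just my choice for illustration):

```ruby
messages = []

# Catch the interrupt signal and run some cleanup code instead of dying
# immediately. Handlers run on the main thread at a safe point.
Signal.trap("INT") do
  messages << "terminating"
end

Process.kill("INT", Process.pid) # simulate pressing Control-C
sleep 0.1                        # give the handler a moment to run
```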
And in this case, it's a little bit arbitrary, but in other scenarios, like a long-running process, like a job, if your web application needs to be restarted because of a deploy, and there's no way it's gonna finish that job in the short amount of time it's allowed to keep working between when an interrupt signal gets fired and when the program is just completely forced to quit, you could do something like store how far along you got in that long-running job, and it can pick back up later when your app restarts. And there's one signal you cannot trap, and that is SIGKILL, or signal nine. The kernel immediately terminates any process that's sent this signal, and no signal handling gets performed. So, wrapping up: today we've talked about what a server is and how it communicates. Along the way, we've written a little server that can't do everything, but hopefully it's taught us some cool Unix tricks, and it will help us navigate the mysteries of production servers in the future. I'll tweet out a link to my slides on Twitter, like I said, if you wanna check out any of the links, and also, if you wanna come say hi or ask me any questions, I'm gonna be around afterwards, and I'll also be at the Heroku booth tomorrow from 11 to 2 p.m. if you wanna chat or get a t-shirt or get some stickers. Thanks, everybody. Thank you.