Hi everybody, I'm Andreas, and I'm here to talk about the Unix input and output system. There is one particular aspect of the Unix I/O system that everybody kind of agrees on, and the idea is that everything is a file. Usually when we talk about files, we think of entries in the file system: some path name that points to something that has bytes in it. Those dormant file system entries aren't the kind of file that I'll be talking about today. This talk is about the files that processes have opened and are reading and writing. These are called file descriptors. And I apologize for the quality of the drawings in these slides.

So one day I was talking about Unix with my colleague Nelson, whom you may have heard of in an earlier talk. We all work on the systems team together, and Nelson knows kernels, and he sent me down the path of seeing what weird things Unix lets you do with I/O. I have learned that Unix does not disappoint when it comes to weird things. I found mechanisms to circumvent limits and break down process isolation, ways to bring a machine to its knees, and some very fun implementation details in the Linux and OS X kernels.

So file descriptors are really the concept behind how Unix does input and output. To an application, they are just numbers, but they hide a massive amount of complexity, and they also let you do incredibly weird and funny things. A number alone doesn't carry a whole lot of information. It's just a handle to an entry in a table somewhere that holds that file descriptor's state and all the other bookkeeping that the operating system keeps for you. It's a primitive but simple API design: numbers are very easy for programs to pass around, and the OS can keep track of all that state in the table and easily retrieve it using that number as an offset. And they're also not just files: they can be pipes, sockets, terminals, all sorts of weirdo stuff on Linux. Basically, any kind of input and output on a Unix system goes through one.
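The number-as-handle idea is easy to see from a program: a minimal sketch in Python (the file name here is made up for the demo).

```python
import os, tempfile

# A descriptor is just a small integer: an index into this process's
# file table. The path below is a hypothetical demo file.
path = os.path.join(tempfile.gettempdir(), "fd_demo.txt")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
print(fd)                 # a small number: the lowest free table slot
os.write(fd, b"hello")    # the kernel finds the real state by that number
os.close(fd)
os.unlink(path)
```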
Let's look at an example. Here's the output of a program called lsof. You probably can't see this in the back, so I'll describe it in words. lsof is short for List Open Files. It shows a bunch more stuff than just the open files, but the last three lines (for the people sitting in front here) are the important ones that I'm going to focus on. I ran a command to grep from a pipe and redirect the output of grep into a file. And you can see, or you could see if you could see in the back, I'm sorry, that grep has a pipe opened as file descriptor zero and the output file opened as file descriptor one. And then there's also a third file descriptor, number two, which is standard error, which goes to the terminal. All of these are conventions from the C library; they're meaningful there. Everything after file descriptor number two is up to the programmer.

File descriptors that are opened by one process can also be passed off to another. The terminal in the lsof output was inherited from the shell. This is a process called inheritance: a child process will inherit its parent's open files. Or you can pass file descriptors between arbitrary processes using UNIX domain sockets. The API is complicated, so I'm not going to talk about that very much. But to a programmer, these look like network sockets that you can connect to via a path name instead of a network address, something like /tmp/mysql.sock. They also work only on the local system, never across a network. And because they work only on the local system, you're guaranteed to get every message out the other end that you send in, and always in order. That's kind of great. They're also kind of a wormhole: they let you send files from one process to another. The procedure to do that is kind of complicated.
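The inheritance part is easy to demonstrate: a short sketch in which a child process created with `fork` uses a pipe descriptor it never opened itself, because it came along from the parent (just as the shell hands fds 0, 1, and 2 to grep).

```python
import os

# Conventions: fd 0 is standard input, 1 standard output, 2 standard
# error; everything above that is up to the programmer. A forked child
# inherits the parent's open descriptors.
r, w = os.pipe()          # two fresh descriptors, numbered above 2
pid = os.fork()
if pid == 0:              # child: the pipe came along for free
    os.close(r)
    os.write(w, b"written through an inherited descriptor")
    os._exit(0)
os.close(w)
os.waitpid(pid, 0)
msg = os.read(r, 100)
print(msg)
```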
Basically, you send a message containing out-of-band data (control messages mentioning those file descriptors), and at the other end, as soon as the receiving process reads that message, it has the file descriptors in its file table. The timeline here is interesting, though. The sending process could close that file before the receiving process picks it up, and it still arrives open. It's kind of like time travel, in a way. This gets pretty cool. From this we learn that the kernel makes sure the files stay open somehow. There must be some global table that holds all the state, and the kernel then closes them, hopefully, once all the references are closed. And UNIX domain sockets are themselves file descriptors, so you can totally send UNIX domain sockets down UNIX domain sockets. Socket recursion is totally a thing you can do.

People who design I/O layers like UNIX's love to put buffers and limits everywhere. The same goes for the number of files that a process can open: there is an upper bound on that. And what if I told you that, using UNIX domain sockets, you could circumvent that upper bound? You can totally circumvent that open file limit. Different systems have different defaults: OS X comes with about 7,000, Linux with about 1,000. As a system administrator, you can reset that; you can just bump the limit for any process. If a process needs more, you kind of have to budget. But what if you require the files to remain open and you don't want to talk to the sysadmin? Well, there's a workaround. You could make a pair of UNIX domain sockets that both connect to the same process. Then you send a file descriptor into one, close the file descriptor on your end, and when you need it back again, you read it out the other end. Yeah, that totally works. So I wrote a really little test program to do that. This illustration hopefully makes sense to somebody, maybe me. There's a great UNIX syscall called socketpair.
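The control-message dance described above can be sketched concisely: Python 3.9+ wraps the `sendmsg`/`recvmsg` SCM_RIGHTS machinery as `socket.send_fds` and `socket.recv_fds`, so this is the same mechanism with the boilerplate hidden. Note the time-travel part: the sender closes its copy of the descriptor before the receiver ever reads the message, yet the descriptor arrives alive.

```python
import os, socket

# Pass a file descriptor through a UNIX domain socket pair using
# SCM_RIGHTS control messages (via the Python 3.9+ convenience wrappers).
left, right = socket.socketpair(socket.AF_UNix if False else socket.AF_UNIX,
                                socket.SOCK_STREAM)

r, w = os.pipe()                         # some descriptor worth sending
socket.send_fds(left, [b"fd ahoy"], [w])
os.close(w)                              # sender closes its copy; the kernel
                                         # keeps the in-flight one open

msg, fds, flags, addr = socket.recv_fds(right, 1024, 1)
os.write(fds[0], b"hello through the wormhole")
reply = os.read(r, 100)
print(msg, reply)
```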
You get two UNIX domain sockets that are connected to each other. It works exactly like I described, and it works across platforms; it's wonderful. So how many more files can you have open this way? On Linux, about 500: that's when the buffer runs full, or when the kernel tells you you're over the limit. On OS X, about one and a half thousand. That's kind of great, but can we take this further? Yup. You take one of those ring buffers that I described, and then you recursively stuff it into another ring buffer. It totally works on Linux. You can have 200,000 open files, and then the operating system itself can no longer open files. This is really confusing to other processes, and also to the sysadmin: since each of these file descriptors is no longer in your file table, lsof doesn't show them. All it shows is these weirdo sockets, of which you have two.

So Linux is really cooperative with this. OS X does not like it. What worked was to send messages into the inner ring, put it on the outer ring, close the inner ring, read the inner ring back from the outer ring, and read messages from the inner ring: but those messages did not have file descriptors on them. So the only conclusion that I can draw, and I confirmed this with Nelson, who knows kernels, by the way, is that BSD-like kernels like OS X close all sockets trapped in a pair of UNIX domain sockets once you close both ends. So every time I closed the inner ring, all the files it contained were closed. It does not matter that the ring itself is trapped in another one. It's kind of reasonable behavior: you do this if you don't want your operating system to run out of open files. It's great, and it's definitely better than leaking them. But what does that mean about Linux? Does Linux do garbage collection? Yeah, it does. It totally does mark-and-sweep from open sockets.
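A minimal sketch of the stashing trick itself, under the same assumptions as my test program: park a descriptor in the kernel's socket buffer, close your own copy (freeing a slot in your file table), and pull it back out of the other end later. The descriptor survives the round trip with its state intact.

```python
import os, socket

# socketpair() gives the two connected ends of the "ring".
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

r, w = os.pipe()
os.write(w, b"stashed")
socket.send_fds(a, [b"x"], [r])   # park the pipe's read end in the buffer
os.close(r)                       # gone from our file table, but not closed

# ... later, when we want it back:
_, (r2,), _, _ = socket.recv_fds(b, 16, 1)
recovered = os.read(r2, 100)      # the stashed pipe still works
print(recovered)
```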
It traverses each one to see which file descriptors are in there, and if they are UNIX domain sockets, it recursively traverses those too, up to, I think, four or five layers deep. So yeah, that's mark-and-sweep garbage collection right there. And I find this kind of amazing: Linux has a garbage collector in it to ensure that my silly test program works. It's so considerate. It's really kind of weird.

So what have I learned from this? Weird computer behavior is awesome. Building these tiny test programs and trying out how they break, and what they break next, allowed me to take a mostly opaque thing and figure out how it works, and even to find some unintended consequences and pretty pathological behavior in the process. It's kind of awesome. This makes me really happy every time it happens. So thank you so much for listening to me. I hope you enjoyed my talk. If you want to try out my test code, go to boinkor.net/go/fd-fun. And if you do, I would really appreciate it if you come talk to me, or hit me up on Twitter; I'm @antifuchs. Thank you so much again. Thank you.