All right. Good morning everyone. Come on. All right. Exactly. All right. So, 30 minutes on modeling concurrency in Ruby is obviously kind of an ambitious topic, and we just spent half an hour talking about one specific aspect of it. So there's no way I'm going to cover this entire topic. What I want to do instead is tell you a little bit of a story of my own exploration of this topic, because concurrency is something I've been interested in for a long time. Going back to university, some of the courses I enjoyed the most were around this area, and of course what I learned there was all kinds of stuff about semaphores, locks, mutexes. I wrote my deadlock detection algorithms and all the rest, and then I went out into the industry and realized that nobody gave a damn about any of those things; in fact, it was something very different. So of course I picked up a few books, and if you do a quick Amazon search you'll find that there are probably over a million pages that have been written on this topic. So clearly this is something a lot of people care about, but nonetheless a lot of those books are kind of the same flavor, right? You can find a book on concurrency for virtually any language, and I picked up a few and found that we have books on patterns, we have books on anti-patterns, we have books for just about anything and everything on concurrency. But nonetheless, when people think concurrency, the first thing that comes to mind is threads. Of course, it's not only threads. No, it's all about events. So I'm here to tell you: forget threads, it's all about events. Well, not quite, right? It turns out, and I've spent a lot of time pouring gasoline on this discussion as well, it's a religious debate. It turns out you actually need both, and to understand why you need both, we once again need to take a step back and look at the hardware.
And this is something I didn't appreciate until I actually put all of the software stuff aside and looked at the hardware. So first of all, here's a very simple architecture of a CPU core today. You have a compute unit, you have a couple of caches, and we'll see why we need those caches, and then there are RAM devices and other IO devices attached to this thing. Today, on most architectures, and these are rough numbers, it takes about 100 nanoseconds to go from the core to RAM. So if you're trying to pull down an instruction to execute in your code, that'll take 100 nanoseconds. That's pretty quick, but not quick enough. That's the reason we started adding all of these caches, and in fact, if you get really into this topic, you'll find that there are L1 caches, L2, L3, some architectures even have L4 caches, and there are different designs for how those caches are constructed and how data is shared between multiple cores and all the rest. To go to L2 takes us about 7 nanoseconds, which is an order of magnitude better than going to RAM. Going to L1 is about half a nanosecond, right? So that's pretty good. Now let's do some math. Take a run-of-the-mill 2 GHz CPU today: one clock cycle is about half a nanosecond. That means we can actually fetch the next instruction from the L1 cache in about one operation. But the converse of that is anytime you have to go to RAM, and oftentimes you do, you're wasting roughly 200 CPU cycles. So to combat this, the hardware industry, for the last 30, 40, if not more years, has been inventing all kinds of crazy stuff that we don't even think about when we write software. There's prefetching, there's branch prediction, pipelining, hyperthreading, speculative execution, all kinds of crazy stuff that we don't really even have to think about. That all happens under the hood.
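To put the arithmetic above in one place: the latency figures are the rough ballpark numbers from the slide, not measurements of any particular chip, and the 2 GHz clock is the assumption that makes a cycle half a nanosecond.

```ruby
# Ballpark memory-access latencies from the talk, in nanoseconds.
# These are illustrative figures, not measurements of a specific CPU.
LATENCY_NS = { l1: 0.5, l2: 7.0, ram: 100.0 }

# Assume a ~2 GHz core: one clock cycle every 0.5 ns.
CYCLE_NS = 0.5

LATENCY_NS.each do |level, ns|
  cycles = (ns / CYCLE_NS).round
  puts format("%-4s %6.1f ns  ~%d cycle(s)", level, ns, cycles)
end
# The trip to RAM works out to roughly 200 cycles in which the core
# could have been retiring instructions; hence caches, prefetching,
# and the rest of the hardware tricks.
```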
And if you're interested in this sort of topic, follow the bit.ly link at the bottom. Here's a very interesting example that I came across recently. It's from a presentation given by Joshua Bloch, one of the lead developers of a whole bunch of JVM libraries. He gave this example: which of these code blocks do you think will execute faster, one or two? So, one or two? Well, your intuition is correct, but it turns out that we actually don't know. And the reason we don't know is that both the JVM and the hardware are doing enough crazy things that oftentimes those two are actually the same today. But in some cases, for example in the first case, when operations one and two are not both true, it's significantly slower. What happens here, if we rewind a little bit, is that we have branch prediction, speculative execution, and all the rest: the runtime will actually execute both branches in parallel at the same time, and you don't get any benefit. So while our intuition says the second one should be faster, we actually don't know. His message was: we have to measure. And in order to measure performance today, you actually have to do statistical tests. You don't just run your code once; you run it 10 times or 100 times, build a histogram, look at the histogram, get your error bars and everything. Only then can you make some sort of prediction. And even that depends on your hardware, your JVM version, and all the rest. So basically the answer is: we don't know. Coming back, what this means is we have hardware parallelism, right? It's just embedded in the hardware. And in order to make our software run faster, the hardware has to do all kinds of magic tricks to prefetch instructions, to execute them in parallel and do all these things. And because of that, we actually have software parallelism.
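That "we have to measure" advice translates into something like the following sketch: time a block many times and look at the distribution rather than a single run. The `measure` helper is my own illustration, not code from the talk.

```ruby
require "benchmark"

# Run a block n times and summarize the timing distribution. A single
# run tells you very little on hardware with caches, branch predictors,
# and speculative execution in the mix.
def measure(n = 100)
  samples = Array.new(n) { Benchmark.realtime { yield } }
  mean = samples.sum / n
  variance = samples.sum { |s| (s - mean)**2 } / n
  { mean: mean, stddev: Math.sqrt(variance),
    min: samples.min, max: samples.max }
end

stats = measure { (1..10_000).reduce(:+) }
puts format("mean %.6fs, stddev %.6fs", stats[:mean], stats[:stddev])
```

Only with the spread in hand can you say anything defensible about which version is faster, and even then only for this hardware and this runtime.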
Because of that, we have processes, threads, and events. And I'm not saying those are exclusive, or in fact even very different; they're all basically the same thing. How you schedule events, what your event detection algorithm is, whether you have a runtime loop or something else, is a whole different topic entirely, and I'm not going to go into that. What I think is interesting, though, is that because we have this software parallelism, we've invented a whole bunch of libraries like pthreads, which we've mentioned already, epoll, kqueue, and all the rest, and we've just bolted all of these things onto our operating systems, right? So these concepts are not specific to any particular language; they're just there as part of the operating system. And then what we've done is taken all those libraries and exposed them to all of our languages. So it's not even the case that when you design a language you actively think about concurrency. These things are basically an afterthought, where we just say: oh yeah, the operating system gives me all this stuff, so here, bind to this library and off we go. Go build concurrency stuff, all good. So recently I've been making my way through this book by Bruce Tate, Seven Languages in Seven Weeks. Anybody here read this book? Highly recommended, very, very good book. Within it there's a chapter, one of the seven languages is Ruby, and one of the questions the author asks Matz in the book is: if you could go back in time, what is the one thing you would change about the language? And his answer was: well, I would remove threads and maybe add actors or some other more advanced concurrency features. And of course the question you should be asking yourself now is: what the hell is an advanced concurrency feature? Because certainly when I was taught concurrency, nobody even mentioned actor models or anything like that. We were taught threads and all the rest.
So what he's implying, though, is: sure, we have pthreads, epoll, kqueue, and all the rest, but there's something missing, something in between that we have skipped. And that's the advanced concurrency model. It turns out there are actually quite a few of these, right? And this is anything but a complete list; in fact, I could fill this entire slide. There's dataflow, Petri nets, actor models, and all the rest. Follow that link at the bottom there; there's a very good collection of Wikipedia articles around this topic. In the interest of time, I'm going to focus on just two. One of them you're probably fairly familiar with at this point: the actor model has been going around, and a lot of people have been playing with it. And then there's the π-calculus and CSP, which I find very interesting, because it's actually very similar to the actor model but also significantly different. Before we get into the specifics: I think the value of a good tool or model is first in what it enables us to do, but one thing that's often overlooked is the constraints that a model imposes on us. A good model will give us something we couldn't do before. Maybe it gives us a way to express something cleanly; maybe that's a language feature. It can dictate a structure. Think about Rails, for example. The first time you created a Rails app and saw those 15 folders, you thought: what the hell is this for? But later, once you're a year into the development cycle, you realize it actually made sense: now I know where to put this stuff, and I don't have to re-architect my structure. Whether you actually like that pattern or not is a whole different conversation, but that's an example of what a model can give you. And likewise, it can dictate a style. Functional programming, for example, dictates a style, and that has its benefits and its downsides.
So those are the things a model can enable. But likewise, a model can disallow certain behaviors. For example, it can impose restrictions on the language such that you simply cannot make a certain type of mistake. And the best thing about that is somebody can think that through once and embed it into the language, and then we don't even have to think about it, because the language will just naturally not permit us to make that mistake. It can implicitly make the right choice; the next time you choose your concurrency framework, it may be as simple as the right set of defaults. And finally, as we already mentioned, it can eliminate a whole class of errors. So, the actor model. How many people here are familiar with the actor model or know what it means? Okay, so maybe half. The actor model, and actually a lot of these concurrency models, dates back to the early 70s, which is in itself very interesting to me, because there's been very little conversation, it seems, around this stuff outside of the academic community. It's only now that it's making its way into software development as we know it, in a practical sense. One of the first papers, published in 1973, was just the proposal, the idea of the actor model. And just to be clear, this was not an API or a programming model at that point. It was basically a mathematical notation; they were after a process calculus, as they called it, where you could prove certain things: that this program will execute, that it will finish, that it won't deadlock, that it will give you certain behaviors. All the things we complain about when we say it's really hard to test threaded or concurrent code: they were after formal proofs and ways to develop a language that would allow us to actually express all of this.
Later, in 1975, we had the first kind of practical application, where somebody took the mathematical notation and put it into something that smelled like something you could write code in. And since then there have been literally dozens and dozens of papers on this stuff. As for actual languages, the one most of us are familiar with is probably Erlang. That dates back to 1986; the first public release was in 1993, but it wasn't until around 2003 that we even really heard about it. There's Scala, and there's a whole bunch of frameworks that have been built with this model. Kilim is a good example: that's on the JVM; it's not a language, just a framework you add on top of the JVM. So needless to say, this has been around for a while. And the basic concept is very simple. I'm not going to bore you with the mathematical notation, but the practical aspect of it comes down to two things: every process has a name and a mailbox, and you communicate by messages. And when I say process, that's a deliberately ambiguous word. It could be a physical process on your machine, it could be a thread, it could be a different machine entirely. So effectively you have different things that have names, and you communicate with messages. What does that give us? Well, it gives us a message-centric view. It allows communication between all kinds of entities, whatever they may be. And as you can see, it's a very natural fit for distributed programming. You can take this model and scale it out within one machine and between multiple machines; there's nothing in the model that prohibits us from doing so, which in itself is very interesting. That's not something you can do with threads and shared memory by themselves. So that's a good example of what a good model can give us. It puts certain limitations on us, a message-centric view, but it gives us other things in the process. Constraints: no side effects.
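Those two ingredients, a name and a mailbox with message-only communication, can be sketched in a few lines of plain Ruby, with threads standing in for processes. This is my own illustration of the idea, not the API of any of the languages or libraries mentioned.

```ruby
# A registry of named mailboxes: the essence of the actor model.
MAILBOXES = {}

# Spawn an "actor": a named mailbox plus a thread that drains it.
def spawn_actor(name, &behavior)
  MAILBOXES[name] = Queue.new   # Ruby's Queue is thread-safe out of the box
  Thread.new { loop { behavior.call(MAILBOXES[name].pop) } }
end

# Senders only ever know the name, never the thread behind it.
def send_message(name, message)
  MAILBOXES[name] << message
end

replies = Queue.new
spawn_actor(:doubler) { |n| replies << n * 2 }

send_message(:doubler, 21)
puts replies.pop  # => 42
```

Note there is no lock anywhere: the mailbox is the only point of contact between the two threads.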
So we're not sharing variables here; we're passing them. There are no race conditions. We don't need locks anymore. There are no semaphores. There's never a case where we're sharing a pointer to the same thing. So forget it, we don't need that stuff anymore. So that's the actor model. The CSP model is a little bit newer; it dates back to 1978, and the first seminal paper on it was "Communicating Sequential Processes" by Tony Hoare. Later this evolved into a whole family, I should say, of related systems, and language adoption actually lagged even that of the actor model. There's been Limbo, and most recently there's Go, Google's Go language. How many people here have played with Go? A few. Okay, that's actually more than I thought it would be. So Go brings CSP back, and I'm going to say it's the only language with CSP built in that's actually interesting to play with today. So what's the difference between the actor model and CSP? Well, unlike the actor model, processes are anonymous. Instead of giving processes names, we give the communication channel a name, and you communicate over these named channels. The best analogy is a Unix pipe: you create something on the system, it has a name, you can shove stuff into it, and you can get stuff out of it. Really, that is the only difference between the actor model and CSP, but it yields very different results. So, likewise: a message-centric view, no shared variables. It allows communication between threads, processes, and machines. The distributed case is a little bit trickier, because now you have this pipe sitting in your machine and you have to somehow expose it to a different machine, but we can think of a way to do that. Nonetheless, distributed programming is a pretty natural fit. Once again: no side effects, no race conditions, no semaphores, and all the rest.
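The contrast with the actor sketch is that here the channel carries the name and the processes stay anonymous, exactly like a Unix pipe. A minimal stand-in, again using Ruby's thread-safe Queue as the channel; this is illustrative, not the Agent gem's API.

```ruby
# Two named channels; the worker between them stays anonymous,
# like a process wedged between two Unix pipes.
numbers = Queue.new
doubled = Queue.new

Thread.new do
  # Receive from one named channel, send on another. The worker has
  # no name of its own and nobody needs to address it directly.
  loop { doubled << numbers.pop * 2 }
end

numbers << 1
numbers << 2
puts doubled.pop  # => 2
puts doubled.pop  # => 4
```

Anyone holding `numbers` can feed the worker, and anyone holding `doubled` can consume from it, without knowing who is on the other end.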
So: very similar, but also very different. What is the advantage, the distinguishing feature, of this model? I'm calling that arrow a named channel; let's call it A. Because we have A, we can have multiple processes all pushing into the same pipe. We don't need to know who we're sending to; all we need to know is the destination I should be sending my messages to. So you could have 10 workers all pushing into the same pipe and not worry about who's on the other side receiving the data. And you can do some crazy stuff. Imagine if you could send a different channel over the channel. Then I could create this scenario where originally the blue process is talking to the orange process, but then there's this guy at the bottom, and the blue process sends the channel through which it's communicating with the orange guy down to him and says: hey, I'm talking to this pipe; take that pipe and talk to it as well. So we can actually delegate work to a different process. Or you can create a chain: I can do the same thing and say, hey, orange process, do something, and then talk to this other guy on the other end. So you can create very interesting flows of data using this model. To explore this, I was playing with Go, looking at the source code, trying to really wrap my head around what this actually means. Because I find that, at least personally, until I actually write some code, I really don't get it. And until I write a blog post, I really don't get it. So I started basically porting the Go concurrency model onto Ruby. You can play with this yourself: gem install agent. Let's take a look at an example. This is the hello world of the concurrency world, right? The producer-consumer example. What am I doing here? I'm declaring a named channel, right?
I'm going to call it increment, or incr, here. I'm going to give it a type, because channels in Go are typed channels: you have to declare what you're going to be sending over the channel. So I'm going to be sending integers. Here's where the magic happens. There's this special keyword called go, which in the language Go starts what's called a goroutine. In Ruby, in this case, what it actually does is take the code block I'm specifying and start a background thread. Now, you don't actually see any of the threading or synchronization code, because you don't need to, right? The library I'm implementing here takes care of all of that for you. You don't even have to think about it. But effectively, I'm defining a new keyword in the language called go. go takes a channel, and inside this block we loop forever and just keep sending messages into the channel, incrementing our number every single time. And then we're going to consume the results. To consume the results, we simply call receive on the channel, and that's all there is to it. There are two threads running here: one producing numbers, one consuming numbers. There's no synchronization, there's nothing else. Very clean, very simple. And actually even more compact than probably anything you could write with raw threads in Ruby, right? So let's take a look at a harder but much more interesting example. I'm going to implement something that resembles a multi-threaded web server. First, I declare a new struct; it's going to be a request. The reason I need to do that is that the channels are typed, so I'm going to transport request types over my channel. The request will contain an argument and a resulting channel. Then I declare a new channel; I'm going to call it client requests. I'm declaring the type, and I'm actually saying that the size of the channel should be two.
What that means is: by default, channels are unbuffered. So you can send one message, and if you call receive on that channel, it'll block until there's a message. When I declare a size, I'm saying you can push up to this many messages before you will block. We'll see why that's interesting in a second. Then I'm going to create a worker process, right? All I'm doing here is declaring a new proc, using regular Ruby, which will loop forever. It takes a request object and calls receive on it; as you can guess, that's going to be a channel. Basically, we have a worker that's going to sit there, listen, and receive. It'll sleep for one second after receiving something, and then it'll push a message back into the resulting channel, where it'll print the current time, take the argument, and add one to it. Pretty simple stuff: sleep, increment, and add a timestamp. Then I'm going to start two of these guys. I'm just using some Ruby syntax sugar here: instead of calling go with two different blocks, I'm capturing the block and passing it in, which is why I declared the proc at the top. So I'm starting two of these guys in parallel; now we have two threads running in the background. Next, I'm going to create two incoming requests. Here I'm kind of simulating this by hand, but you could easily picture how you would attach a network socket to this and create these things dynamically. So there's request one and request two. In the first one, the argument is one; in the second, the argument is two. And the second argument to a request object is a channel, a new channel over which we'll get back the result; the type of that channel is going to be string. Then what I do is take my client request channel, which is the channel the workers are listening on. Both of them are running in the background right now, and they have both called receive on this channel.
So they're both waiting for a message. All I'm doing here is taking the two requests I've created and shoving them into that queue, and then I call receive, right? What happens is: because the client request channel is of size two, the moment I push both requests into the queue, both workers pick up a message, sleep for one second, and then print. And not surprisingly, if you look at the timestamps that come back, we see that they both execute at the same time when we get our results back, which are two and three. So what we've done here is: we have a main loop, we have two workers, and we have a way to receive messages in a thread-safe way. And there's not a single synchronization point in this code. Nowhere in here did I declare a join or a wait, or have to guard some shared state or anything like that. Much simpler to write, much simpler to reason about, and much easier to test, in fact, because the model just does not permit you to make certain types of mistakes. I can tell you that while I was writing this, implementing all of it using threads underneath, I had a hell of a time trying to debug some of the edge cases. But once I had it working, it made writing these things very, very easy. So of course the question now is: well, this is Ruby, so what do we do about all this stuff? We have many different kinds of Rubies. We have JRuby, which doesn't have a global interpreter lock and has JVM threads, which is great. There's in fact a lot of existing work on the JVM that's independent of Ruby: frameworks like Akka and Kilim and others have already implemented the actor model on top of the JVM. And there has been some work on how to expose those primitives within JRuby, which I think is really interesting.
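To make the worker example walked through above concrete without the Agent gem, here is an approximation in plain Ruby, with Queue and SizedQueue standing in for the channels. The names, and the shortened sleep, are mine rather than the original slide's.

```ruby
# A request carries an argument and a channel to reply on.
Request = Struct.new(:argument, :reply_channel)

# A buffered channel of size two: both requests can be queued at once.
client_requests = SizedQueue.new(2)

# Two identical workers receive from the same channel; whichever is
# free picks up the next request. No locks, no explicit handoff.
2.times do
  Thread.new do
    loop do
      req = client_requests.pop
      sleep 0.1  # simulate work (the talk sleeps for a second)
      req.reply_channel << "#{Time.now} : #{req.argument + 1}"
    end
  end
end

req1 = Request.new(1, Queue.new)
req2 = Request.new(2, Queue.new)
client_requests << req1
client_requests << req2

puts req1.reply_channel.pop  # "<timestamp> : 2"
puts req2.reply_channel.pop  # "<timestamp> : 3", same timestamp: parallel
```

The replies come back with matching timestamps because both workers ran concurrently, and nowhere did we join a thread or guard shared state.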
I have yet to see something that's easy to use in a Ruby-idiomatic way for any one of those libraries right now. You still have to go through a page's worth of Java boilerplate to get something up and running, which just feels plain wrong. But I think that's a very promising area to explore, and JRuby is a great platform to experiment with some of these ideas, these advanced concurrency models. Rubinius is also an interesting one, because there's work being done on the hydra branch to remove the global interpreter lock, and Rubinius actually has built-in channel and actor types. In fact, a lot of the internals of Rubinius are built around this channel idea. I don't have hands-on experience with it, but it seems like this would be a very natural fit. And lately there's been a lot of talk about using Rubinius for building other languages on top, so I think this is very fertile ground for experimenting with these ideas. Maybe it's an extension to Rubinius, maybe it's a new language entirely; I don't know, that's up to you guys. MacRuby has Grand Central Dispatch, which I kind of see as halfway there: it makes thread scheduling and all that stuff a little bit easier, but it doesn't expose an actual better concurrency model. So it simplifies things somewhat, but it's not the answer. But I do think that at a certain point we will see MacRuby on iOS, so maybe there's a reason to invest in that ecosystem as well, and I think you could take Grand Central Dispatch and build a higher-level API on it that does something really interesting. And finally, unfortunately, I think MRI Ruby is kind of the big loser in this race right now, because it has a global interpreter lock. There is some talk about the MVM, which is a multi-VM implementation, but as far as I understand that's barely at the research phase at Tokyo University.
I'm not sure there's been real work done on getting it into the language, and getting it into the language would be a whole other proposition. I have a feeling that's just not going to happen anytime soon. Maybe there's room for libraries like Agent, the one I've built. Do I use Agent in production today? No, I don't. Do I think it's actually feasible to use? I think so. But nonetheless, you wouldn't get the benefits of multi-core. You would get the benefit of much cleaner code, code that's testable and sane, but you wouldn't get the multi-core aspect of it. So that's not as good. But of course, I don't think we should limit ourselves to Ruby. To really understand what these advanced concurrency models are, I would encourage everybody to actually go beyond Ruby at this point and explore some other possibilities. I've had a lot of fun working with these languages. Io is a fun language; it's a weekend project for you guys, honestly, because it's a prototype-based language, very similar, if you've done JavaScript, to what you're already familiar with. You can read the entire documentation in about two hours, and it has some very interesting primitives built in around the actor model, so I would encourage you guys to play with it. There's Go, of course, which some of you have already picked up; I would encourage you to look at goroutines and just work through the examples there. In fact, I've taken the examples from Go, put them into Agent, and basically re-implemented all of them in Ruby as well. That was my way of learning all of those concepts. So you can pull up the repo, look at the Go code and the Ruby code side by side, and just play with it. Look under the hood; look at how it's implemented. Scala is very popular nowadays. Clojure has some interesting concepts: it's a functional language, with transactional memory, all of these kinds of things.
So I think, before we make any sort of advances in the Ruby ecosystem, we really need to understand what's happening elsewhere, because there are trade-offs to every model. You need to find out what works for you first. So, in summary: there's hardware parallelism, and it's not about threads versus events. As I've said, I've poured enough gasoline on that topic myself. It's not one or the other; you need all of them. You need pthreads, you need epoll, you need kqueue, and all the rest. But what we're missing right now is that extra tier, which is the actor model, CSP, and all the rest, and I think that's where we should invest our time. I don't think threads and events are the right API to build concurrent programs of the future. I think they expose a language that is error-prone, that is easy to screw up in, and easy to maim yourself on, basically. So we need something better. Now, there are a couple of blog posts I've put up around this topic as I've been exploring this stuff, and I would really encourage you guys to go out and follow some of the links in this presentation. I think it's a very important topic for us moving forward. And then finally, do take a look at Agent, play with it, see if it makes sense. There are examples embedded, and I would encourage you to look at the specs, because in them I specifically spell out all the behaviors that the goroutines should give you, and you can walk through those and wrap your head around what this all means. So that's it. I don't know if I have time for questions. A few minutes. If you have shared resources, it seems like Go still wouldn't solve your problem; you'd still need mutexes, semaphores. So the question is: if you have shared resources, Go wouldn't solve that problem, because you still need to share them. Right. So the constraint that this model places on you is that you communicate with messages.
So instead of passing a pointer: there's never a point where two entities, whatever an entity is, a process or a thread, have access to the same resource at the same time; the model basically disallows it. I mean, sure, you could probably create a synthetic scenario where you try to do that, but that's kind of dangerous. Now, under the hood, I'll tell you that Go is designed to be a systems language, so they actually biased it toward building a high-performance system within the same host, not toward the distributed case. So when you pass a message, they don't necessarily copy the message; they pass a reference, which is obviously faster. But conceptually you're passing ownership: once you pass it to somebody else, you don't have it anymore. And that abstraction is completely hidden from you. So sure, you can probably still create scenarios with deadlocks and all the rest, but it's, I'm going to say, exponentially harder to do so. So the question is: I recommended Seven Languages in Seven Weeks; are there any books specifically about concurrency? You know what, one of my first slides had a whole array of different books, and honestly, I can't think of any one book that is specifically about this. I wish there was a book that talked about all these different models. I wish my university course had actually mentioned the actor model or anything beyond threads. That's just not the case. I think we're too focused on the hardware and software parallelism, too focused on the pthreads and the epoll and the kqueue stuff. There's a ton of material around that. There's not enough material on that layer in between, which is the advanced concurrency model. And I think the state-of-the-art research in that field right now is actually in the languages being invented at the moment: Go, Io, and a whole bunch of others, Clojure and all the rest.
So I think the best way to learn is to actually pick up some of the code samples and run them today. And I hope we'll fix that soon; I hope we'll get more resources about this. That's certainly my intent behind giving this talk: to kind of open this topic up and encourage you guys to go out and play with this stuff. Is Agent optimized for any one interpreter? It runs on MRI. It runs on JRuby, and it's able to take advantage of the absence of a global interpreter lock on JRuby; there's actually a benchmark example in the repo that will run faster on JRuby because it can. It doesn't work today on MacRuby, because their threading stuff is not implemented right; I can basically just segfault MacRuby, so it actually exercises quite a few features. Rubinius I haven't tested, but in principle I'm not using anything that is specific to any one implementation, just basic thread primitives underneath. So it should be able to run across all the platforms. That's certainly my intent. You said that Agent was blocking... sorry? You said that Agent blocks after a certain size. Oh, I said that Agent blocks after a certain size. Channels, right? So channels can be buffered or unbuffered. When you declare a channel, you can say: this is the number of messages it can hold. So you can keep up to, let's say, 50 messages in the queue; when you try to push the 51st message, it'll say: wait, hold on, stop. It'll pause that thread and keep it there. And the semantics of that depend completely on your application. By default a channel holds one message, but you can declare it to be something else. So for example, if you want to have a work queue with up to 10 workers, you would say: the size of my channel is going to be 10. That's how I control my concurrency. All right, thank you guys.
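For what it's worth, the buffered-channel semantics in that last answer map directly onto Ruby's built-in SizedQueue: the sender blocks once the buffer is full, which is exactly how a channel of size 10 caps a work queue at ten pending jobs. A small sketch, independent of Agent:

```ruby
# A channel buffering up to 3 messages; the 4th send blocks the sender.
channel = SizedQueue.new(3)
3.times { |i| channel << i }            # fills the buffer, no blocking

producer = Thread.new { channel << 99 } # this send has to wait
sleep 0.1
puts producer.status                    # blocked ("sleep") until a slot frees up

channel.pop                             # a consumer frees one slot...
producer.join                           # ...and the blocked send completes
puts channel.size                       # => 3 (the two remaining originals plus 99)
```

The buffer size becomes the backpressure knob: producers simply pause when consumers fall behind, with no explicit synchronization anywhere in the application code.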