Thank you. By way of obligatory introduction: my name's James, I'm a web developer, and I'm here to talk about a project of mine called Faye. How many of you have used it? More than two people. That's amazing. For those of you who haven't used it, I should probably explain what it is. It's a simple PubSub tool for the web. It lets you write programs that run in the web browser that can send messages between each other, between different users that are connected to your app. Your server can also send messages out to clients, so you can use it as a push server. It's based on the Bayeux messaging protocol, which was a protocol introduced around 2006, I think. It introduced a PubSub messaging model that was designed around the constraints of what browsers could do at the time. We didn't have WebSockets. Browsers have a limited number of connections per domain that you can open. If you have several things that you want to be notified about from the server, you can't open a connection for each of them, because you'll just flood the available connections; most browsers set the limit really low, so it's really limited. Bayeux is a protocol designed within those constraints to let you do messaging in a sane way that made good use of those resources. Faye provides a messaging server and a client based on that protocol. It has implementations in Node.js and in Ruby, and they can interoperate with each other. They speak the same protocol, so they can also interoperate with projects like CometD and other projects that use this protocol. One thing I get asked pretty frequently about this project is: why do we need it?
We've got WebSockets almost everywhere now, if we ignore Internet Explorer. But Bayeux does a bunch of nice stuff for you. It has a routing scheme built into it, so you don't have to deal with keeping track of which sockets belong to which clients or users; you can use its channel system to just publish stuff, and it routes all the messages to the right places for you. It deals with automatic reconnection, so if your socket drops out it will reconnect for you. It deals with browsers that don't support WebSockets: Firefox 3.6 and IE are still the ones that don't have them, so it falls back to long polling for those. That's a technique where the client opens a request and then the server doesn't respond to that request until it has new data. As soon as the client gets a response, it immediately opens another request to wait for more data, so it looks essentially like you've got a persistent connection. Faye is designed to be really, really simple to use. This is how you start a server: it's a Rack application, you just instantiate it and tell it what path on the server you want it mounted at. You can also use it as middleware, so if you install it in front of a Sinatra app, in this example it will catch all requests to the /bayeux path and treat all of those as things that it should handle, and it will delegate anything else down the stack. For the client in the browser, you just make a client, you tell it what URL the host lives on, and you can then just tell it to subscribe to a channel and give it a callback function that will accept messages that are sent to that channel. The channel system works like this: all channels look like path names, essentially, and when you publish a message you never send it directly to an individual client. You publish to a channel, and then the server figures out which clients are subscribed to that channel and sends the messages on to them.
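The server setup described here can be sketched as a config.ru. This is a sketch based on Faye's documented Rack API; the mount path, timeout value and SinatraApp name are arbitrary choices for illustration:

```ruby
# config.ru -- a sketch of starting Faye standalone or as middleware.
require 'faye'

# Standalone: Faye handles every request itself.
run Faye::RackAdapter.new(:mount => '/bayeux', :timeout => 25)

# Or as middleware in front of another app (e.g. a Sinatra app):
# requests to /bayeux are handled by Faye, everything else is
# delegated down the stack.
#
#   require './sinatra_app'
#   run Faye::RackAdapter.new(SinatraApp.new, :mount => '/bayeux',
#                                             :timeout => 25)
```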
The routing system also supports wildcards, so you can say: anything below a certain path, catch all channels underneath that. You can do a bit of pattern matching with that; it's quite powerful. When you publish stuff, you can publish anything that can be serialized to JSON, so any JavaScript object or array or string or number, and that will get deserialized on the other end, so you don't have to parse it when it comes back into your callback function. The Ruby client has exactly the same API; you just have to wrap what you're doing in EventMachine, or have EventMachine running somewhere. It's the same kind of thing: you subscribe to a channel and you give it a block, and the block gets run with the incoming message whenever the client receives something, and the publish API works the same way. It interoperates between Ruby and JavaScript data types, so if you publish a Ruby hash to a channel, it will turn that into JSON, and the JSON will get deserialized in the browser and become a JavaScript object. So that's quite nice. That is basically the whole Faye API; it's not something I could spend a whole talk talking about. What I really want to talk about is what I've learnt from building it, which is mostly centred around event driven programming. That's been kind of a hot topic the last year: we've had Node.js coming out, people having all sorts of fights about whether threaded servers are better than evented servers, or the other way around. That's not really what I want to talk about. I'm not going to tell you how to benchmark your network IO throughput and be all web scale. I want to talk about using events just as a design pattern, as a way to model what you're doing in such a way that it keeps your code modular and helps you decouple your components, and also helps you handle asynchronous IO. So what do we mean by event driven programming? Well, it turns out that you're already doing it. You just might not have realised it.
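To make the wildcard matching concrete, here's a toy sketch (not Faye's actual code) of Bayeux-style channel globbing, where `*` matches exactly one path segment and `**` matches one or more trailing segments:

```ruby
# Toy Bayeux-style channel matching.
# '*' matches exactly one path segment, '**' matches any number of
# trailing segments (at least one).
def channel_match?(pattern, channel)
  p_segs = pattern.sub(%r{^/}, '').split('/')
  c_segs = channel.sub(%r{^/}, '').split('/')

  if p_segs.last == '**'
    prefix = p_segs[0...-1]
    c_segs.size > prefix.size && c_segs[0, prefix.size] == prefix
  else
    p_segs.size == c_segs.size &&
      p_segs.zip(c_segs).all? { |p, c| p == '*' || p == c }
  end
end

channel_match?('/foo/*',  '/foo/bar')      # matches
channel_match?('/foo/*',  '/foo/bar/qux')  # doesn't: '*' is one segment
channel_match?('/foo/**', '/foo/bar/qux')  # matches: '**' is any depth
```

When a message is published, the server would run every subscribed pattern through a check like this to decide which clients get a copy.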
When you write controller methods in Rails, you're saying to the Rails environment: this is what I want to do when a certain request is made. Probably no part of your application ever calls one of your controller methods directly. You're just saying to Rails, hey, here's this code, tell me when a certain request happens, and then I'll deal with it. When you have a web app running, you're not sitting there waiting for a blog-post-create request, blocking on that and ignoring everything else. A web app reacts to whatever comes in. Controllers basically declare event handlers for handling requests. It turns out that event driven programming also relates quite closely to some of the fundamental concepts of object oriented programming. We've all heard maxims like tell, don't ask, or the Hollywood principle: don't call us, we'll call you. All these little bits of advice stem from Alan Kay's original vision of what OOP should be. This is a quote; it was in a talk yesterday, so I apologise to people who've already heard it. What he said was: I thought of objects being like biological cells, only able to communicate with messages. OOP to me means only messaging, local retention and protection and hiding of state-process. What he means by that is that objects aren't allowed to ask other objects about their state. They can only send messages out, like the way biological cells communicate: they send little chemical signals out into their environment, and they pick up signals that other cells are emitting, but they can never say to an individual other cell, can you tell me something? You can only publish stuff and listen to what's going on around you. This is the way that we communicate in a lot of situations electronically. When you send an email, or a Twitter message to someone, you can ask someone a question, but you don't sit around waiting for an answer. You've got stuff to do; YouTube isn't going to watch itself.
It's not like having a phone conversation, where you ask someone a question and then you sit there entirely focused on that. You can ask people questions, but you can do that concurrently with doing lots of other work. That's kind of how he saw objects communicating. Syntactically, that ends up not being very convenient a lot of the time, which is why we have things like the return statement. The return statement is really just a shorthand for the called object sending a message straight back to the caller to say: oh, you sent me a message, I'm going to immediately send you something back, and you don't have to wait. But let's imagine for a second that we didn't have the return statement, which is really what happens as soon as you start doing asynchronous programming: you can't use return values, because return values are blocking things. You won't get a return value immediately; you're going to have to wait until some future time when you actually know what the result of the computation was. So if we get rid of the return statement, the most convenient thing to do in Ruby is usually to use blocks. In the kind of traditional blocking API that we're used to, you call something. For example, if we're using HTTParty, we can say response = HTTParty.post with the URL and a query, and we're assuming that post is going to immediately return us a value, and our code is going to block while that request is made and we get something back. If we go to a non-blocking system, for example the EventMachine HTTP library, you tell it to make a POST request, but then you don't sit around waiting for the response. What you do is give it a callback and say: I'm going to go and do some other work, but when you know what the response was, call me back, and then I can run the stuff that depends on it.
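That move from a return value to a passed-in block can be sketched in plain Ruby. The fetch_* names here are invented for illustration, and no real IO happens:

```ruby
# Blocking style: the caller waits for the return value.
def fetch_blocking(request)
  "response to #{request}"    # stands in for a slow network call
end

# Continuation-passing style: no return value is used; the caller
# hands over the work it wants done once the result is known.
def fetch_async(request, &callback)
  # In a real evented system this block would run later, from the
  # reactor, when IO completes; it runs in-line here only to keep
  # the sketch self-contained.
  callback.call("response to #{request}")
end

response = fetch_blocking('GET /')   # code blocks until this returns

fetch_async('GET /') do |resp|
  puts "got: #{resp}"                # runs whenever the response is ready
end
puts 'carrying on with other work'   # in a real async system this line
                                     # would usually run before the callback
```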
So in this example, the puts at the end that says "this happens first", that will happen before what happens in the block, because once you've set up your callback, you can carry on doing work until you get the response back. This is a really simple transformation, going from blocking return-value-based code to using callbacks: this thing that we call continuation passing style, as in you're passing the work that you want to do afterwards in to the thing that you're calling. But people start doing this, and they find that they start making everything asynchronous and they get all these nested callbacks, and they complain and say it's a horrible pattern. But it doesn't have to be that way, and that's something that I hope to get across in this talk. So that's all very nice, a lot of theory, but I want to use Faye as an example of how this stuff actually works in practice and what I've learnt by building it. So the internals of Faye look something like this: there's the RackAdapter, which is the thing that you instantiate to start your server; that just deals with all the talking to the outside world and doing HTTP stuff. It figures out, because there are several different request types that Faye supports, it could be a GET or a POST, it could be a WebSocket, so it deals with sorting all that out: figuring out how to get a message out of a request, parsing the JSON that it gets so that you get a hash, and then once it's done all that kind of yucky business it just sends the hash down into this loop of objects that we have in the core. So beneath there, there's no HTTP, there's no JSON parsing; it's all just passing hashes around from object to object to object, pushing data around. I'm going to go through some very simplified versions of the code for these components, and the thing that I want to get across is that the components all push: all they do is forward data to another object.
We don't use the return statement to move data around; we keep pushing it through the system instead of pulling it out of the objects that we're calling. So the RackAdapter looks like this: it wraps a server object, which is the thing that actually speaks the messaging protocol, using Ruby hashes to pass back and forth. It just has a method to start it running, listening on a port to accept connections. The server itself has two collections of things inside it: it has a set of channels, which are the things that you publish messages to and that you subscribe to, and it has a set of connections, and a connection just represents a connected client; it's the thing that can sit there collecting messages until the client asks for them and they can be sent back. So inside the RackAdapter, this example is the way that it handles POST requests; it has a bunch of other things for different types of requests, but this just reads the request body and parses it with JSON, and then it tells the server to process the message. But it doesn't block waiting for a response, because part of this system is that we're going to have these very, very long running connections that wait for new messages to show up, so we don't want to block waiting for that to complete. The final bit of this method is just some boilerplate. The Thin web server has this protocol where you can tell it that what you're doing is asynchronous, because most Rack apps are expected to return a value immediately. Thin lets you not do that: it lets you say, I'll call you back when I have a response, and it gives you this -1 response thing, and it knows what to do with that. So when the server has finished processing the message, it runs this callback; it gives us the reply to the message that we sent it, and we can just serialize that with JSON and send it back.
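The Thin convention being described looks roughly like this. This is a simplified stand-in for the real adapter, with an invented class name; the server object is assumed to be anything that yields a reply hash to the block it's given:

```ruby
require 'json'

# A simplified sketch of Thin's async Rack convention: return the
# special -1 status to say "no response yet", then complete the
# response later through env['async.callback'].
class AsyncEndpoint
  ASYNC_RESPONSE = [-1, {}, []].freeze

  def initialize(server)
    @server = server   # anything with #process(message) { |reply| ... }
  end

  def call(env)
    message = JSON.parse(env['rack.input'].read)

    # Don't block waiting for the reply; hand the server a callback
    # that completes the HTTP response whenever the reply shows up.
    @server.process(message) do |reply|
      env['async.callback'].call(
        [200, {'Content-Type' => 'application/json'}, [JSON.dump(reply)]]
      )
    end

    ASYNC_RESPONSE   # tells Thin the real response will arrive later
  end
end
```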
That response.succeed is saying to the response object that I made: I'm done now, you can actually send something back to the client. So the server, as I mentioned, just has a pool of channels that it uses for routing purposes, but that's not really important, and it has a set of connections that are indexed by client ID. The client ID is something that the protocol handles for you, so you don't have to deal with knowing that this connection means this client and that one means this other one; that's all completely hidden. The messages that come into the server look something like this. They all have a message ID, which is just for keeping track of things internally. Every message has a channel that it has been sent to, every message says what client sent it, that's the client ID, and every message has a data field, which is the data that you published to that channel. So when you publish to /foo with a hash, /foo is the channel and the hash becomes the data for that message. Now, there are some channels that are special. Faye uses the channel system itself to do a lot of its internal management. Two special channels that exist in the system are called /meta/connect and /meta/subscribe. What /meta/connect does is tell the Faye server: this is a long running request that's designed to wait for messages. I'm not publishing something to /meta/connect, I'm just telling the server. And similarly, /meta/subscribe is the way that a client tells the server that it wants to subscribe to a particular channel. So this is what the server does: it looks at what channel the message was sent to and does something appropriate in each case. Notice how we're passing in a callback. None of these methods return anything. The callback is that block, the line where we say do |reply|. That callback gets passed down the system until something succeeds and can run the callback for you.
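The routing step just described might be sketched as a toy server; the class name is invented and the reply shapes are simplified:

```ruby
# A toy version of the server's routing: look at the channel, do the
# appropriate thing, and hand the reply to the callback we were given.
class ToyServer
  attr_reader :subscriptions

  def initialize
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def process(message, &on_reply)
    case message['channel']
    when '/meta/subscribe'
      # the client named by clientId wants messages from `subscription`
      @subscriptions[message['subscription']] << message['clientId']
      on_reply.call('channel' => '/meta/subscribe', 'successful' => true)
    when '/meta/connect'
      # a long poll: in the real server the callback is parked on the
      # connection and only runs when messages arrive; elided here
      on_reply.call('channel' => '/meta/connect', 'successful' => true)
    else
      # an ordinary publish: route to matching channels (elided),
      # then acknowledge immediately
      on_reply.call('channel' => message['channel'], 'successful' => true)
    end
  end
end
```

Nothing here returns data to its caller; every branch pushes its reply into the callback it was handed.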
But if we keep passing that down, it's basically giving the methods that you're calling a place that they can send data back to, so that the server can carry on doing its work. So that's just a simple routing system. When we accept a connection, the message has a client ID, so we just go and get the connection object for that client ID, and all we do is give it the callback that was passed into the server, because we don't know how long this is going to take; this might wait for messages for minutes, hours, I don't know. You just say to the connection: here's this callback, when you've got some messages, run it, and then the response will get sent out. To add a subscription, again we look at the client ID, we look at what channel they want to subscribe to, we pick that channel out of the pool that we have, and we tell the connection for that client to subscribe to that channel, and the connection knows how to do that. But in this case we aren't waiting on anything; we're not making a long running request, so we can immediately run the callback and send a response out. That will take that hash, pass it to the callback that the server gave, and that will get turned into JSON and sent out. And similarly, to distribute a message, this is where the publish method comes in: it will go, okay, what channel did you publish the message to? It will find all the channels that match, because there's a wildcard system, so there's some pattern matching; you find all the channels and you just push the message onto each channel. That's all the work that you need to do at this stage, and again we're not waiting on anything, so we can run the callback immediately. So what does the channel do with this message? The channel doesn't care what other objects in the system actually want to know about that message at this point.
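That announce-only relationship between channels and connections can be sketched like this (toy classes with invented names, not Faye's real ones):

```ruby
# The channel keeps a list of listener blocks and announces to them;
# it never knows what kind of object is listening.
class ToyChannel
  def initialize(name)
    @name      = name
    @listeners = []
  end

  def add_listener(&block)
    @listeners << block
  end

  def publish(message)
    @listeners.each { |listener| listener.call(message) }
  end
end

# A connection subscribes by handing the channel a block; the channel
# has no idea that a "connection" exists.
class ToyConnection
  attr_reader :inbox

  def initialize
    @inbox = []
  end

  def subscribe(channel)
    channel.add_listener { |message| @inbox << message }
  end
end
```

Calling publish on the channel pushes the message into every subscribed connection's inbox, and neither class ever names the other: the events are the only coupling.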
It can just announce, using its own publish method: I've received this message. It doesn't know about any of the other domain objects in the server, so it's decoupled from them; it can just make announcements, and other things can listen to that if they need to do work based on it. And that's exactly what the connection does. When you tell the connection to subscribe to a channel, you first check that it's not already subscribed to that channel, so you don't get duplicate messages, and then you just add a listener to that channel. This says: when the channel receives a message, run this callback. So as soon as any channel that the connection is subscribed to gets a message, the connection can succeed, meaning the request that we had open waiting for messages can succeed and we can send a response back, and the client gets the messages it wanted. And as soon as we've succeeded, we put the connection back in the deferred state, meaning that it's going to sit there and wait for more messages to come in. Now, I've glossed over a few bits in the code that I just showed you: this callback method, add_listener, publish, succeed. I didn't explain what any of those were doing. Those are incredibly common patterns in event driven programming that you can extract into mixins, and some of these already exist in the Ruby standard library and in EventMachine and in other places, they're so commonly used. So what are those missing pieces? The first one is Publisher, which implements the add_listener and publish methods, and you use this to make an object that just wants to make announcements. The most common example, at least in my experience, is when you're doing GUI programming and you want to say: whenever this link gets clicked, run this callback. It's that kind of thing: you just want other objects to get notifications when something interesting happens, so you can listen to different types of events and run callbacks based on that. And that's really easy to implement. You just say what event type you want to subscribe to and give it a block, and the nice thing about Ruby blocks is that you don't have to yield to them immediately; they're just Proc objects, and you can store them in a list and use them later. So that's what we do here: we make a list for storing the callbacks for that event type and just put the block in it, and then when we want to publish that event, we first check that there is actually a list of callbacks for that event, and then we just run each of them. It's really simple. The other piece of the jigsaw is Deferrable, which implements the callback, succeed and defer methods. What a Deferrable represents is a placeholder for a value that we don't know yet. In our HTTP example earlier, our EventMachine HTTP request object was a deferrable, because it represents the fact that we don't know what the response is yet. We can add callbacks to it and get notified when there is a response, but we can also pass this deferrable object around as a piece of data, so that any other objects that need to know what its eventual value is can add callbacks to it. In this system, any callbacks you add should only get run once; they shouldn't get notified every time some event happens, they're just waiting on one thing to complete. The way this works is like this: you have a defer method that puts the object in the deferred state, and as soon as you tell this thing that it's succeeded, it should run all the callbacks that were waiting for that value. So again, it just loops over the callbacks, runs them, and then it forgets about them, because they don't need running again, so we set callbacks back to an empty list. And the callback method is the thing that you use to attach callbacks to the object, so that other objects in your system can get notified. So when you add a callback, the first thing you do is ask: do we already know what value this thing is wrapping? If the status is success, we can
immediately run the block, because we know what value we can yield to it; otherwise we just put the block in a list and save it for later. So the thing that I want to illustrate with this is that we've got this cycle of data flow: the server accepts messages, it gives the messages to channels, the channels give the messages to connections, and then the connections give them back to the server as responses to other clients. So data is going around this loop. But if someone showed you an architectural plan for some software that looked like that, you'd get a bit wary: we've got a cyclic dependency, it'd be hard to tease this apart, it'd be hard to unit test, and if it got beyond the kind of hello world examples that I've been showing you, it just gets nasty pretty quick. The nice thing about having an event system is that, for example, the channel object doesn't actually know about the connection object; all it does is publish a message, and the connection object says to the channel, can you tell me when something happens. It's the same with the relationship between the connection and the server: the connection doesn't know what the server API is, the server gives callbacks to the connection objects. So using this, we can actually reverse the direction of our dependencies so that we get a more layered system. In this system, the channel is kind of the base thing: it doesn't know about any other types of things in the system. The connection has a bit more knowledge: it knows how clients work and it knows what events it publishes. And the server knows about both those things: it knows how to send messages to channels and it knows how to add callbacks to connections. But the point is that we've used these events as glue to connect objects that don't know that much about each other, and this turns out to be a really, really great way of keeping systems like this maintainable. It shows up especially if you do a lot of GUI programming: if you're doing the front end for a web app, you may have a bunch of components on a page that have to react in various ways to how your data model changes, or they might have to be synchronised in various ways, but you don't want to tie them together. When you think about what events an object publishes, don't just think about its API; think about how it can notify other things about what it's doing. In some situations that's as important, if not more important, than what its API is, because it gives you these hooks to glue stuff together. So event driven programming helps us keep components decoupled and layered and modular. It expresses in the code why and when things are happening: we don't just have this do-this-then-do-this-then-do-this sequence that doesn't really explain why those things are happening in that order. If we have events, that actually expresses: oh, this is happening because this other thing happened; we can see how the triggers propagate. And it helps us deal with non-blocking IO, because it gives us a way to add callbacks to things without blocking waiting for responses. How are we doing for time? Okay, cool. So, testing. It wouldn't be a Ruby conference if nobody mentioned testing. The thing about testing asynchronous systems, though, is that it's kind of yucky. This is really like a hello world integration test, and there are some dots dotted around where I've just left things out because they take up too much space. It won't even wait for the test to complete properly; there's a ton of stuff missing. All this does is check that one client can send a message to another client through a channel, but to do that, there's all this: I've got to start a server and wait for it to start up, and then I've got to make subscriptions and wait for them to complete, but there's two of them, so they rely on each other, and then it's horrible. And yeah, this code wouldn't even work, so don't try it. But this is a
hello world case, so it gets really ugly really quickly. It's also not very valuable to have tests like this, because they're just as complex as your code. You haven't eliminated any risk, you haven't proved that your system works properly; you've just moved the risk up into tests that you can't read and nobody knows what they're doing. So when I'm thinking about how to design things, how to factor things, this quote ends up popping into my head quite a lot. This is from a few years ago, but it's talking about designing some code that had to go and count how many employees in a company had a boss with some kind of relationship to some other people in the company, and he's talking about, okay, well, you know, you're going to need some loops and some if statements and whatever, and he writes down a kind of naive implementation. Then he goes: this is patently wrong, there's nothing inherently nested about what we're trying to do. And he proposes something that he calls a telephone test, which is that if you called up your boss or one of your co-workers and you just described what this code was supposed to do, you wouldn't say to them, oh, we need an if statement and then we need two nested loops. You'd say: no, we want to count all the employees in the company that make more than X amount per year. So we would rather have tests that tell us a story and that are readable. What the test I showed you a second ago is actually trying to do is this: given that there's a server running, and the client Alice has no subscriptions, and the client Bob is subscribed to /foo, when Alice publishes a message to /foo, then Bob should receive the message. That's what we actually want to happen. And the reason why this is readable is not necessarily because it's Cucumber and it's plain text. The killer feature of Cucumber that makes everything readable is this: there's just a straight line down the left hand side of the page. It doesn't let you write loops or nest anything; it kind of makes you do a telephone test by telling a linear story that you can understand really easily. So that's the thing that we actually care about. So what if we tried to refactor our test so it looked like this? Again, this is essentially the same as the Cucumber test that I just showed you, just written in Ruby: it makes a server, makes some clients with some subscriptions, publishes a message, and then checks what messages each client got. It isn't obvious how we're going to do that. In the Node community there's a ton of asynchronous testing frameworks; it's like the hello world app if you get into Node.js, like writing a blog when Rails came out, everyone's doing it. But they're not actually that much help. They're okay if you're just doing, I'll make this Ajax call and then check the response, but if you're doing something like this, where there's all this setup and all of it is asynchronous, you still end up with a mess. All that these systems do is make the test runner asynchronous; they don't make your code any cleaner. So we need a way to deal with this. One suggestion that will probably come up if you ask a Rubyist is that they'll tell you to use fibers, or use threads, or just wrap some kind of blocking API around this thing that uses callbacks. But I don't really want to do that. I don't want to make everything blocking; I just want it to look linear. I still want it to be asynchronous. I'd rather the caller was left to decide whether it cared that all the work had been completed or not; I don't want every little bit of this story to block until it's done. So the way that I do that is to use callbacks to glue the things together. We've got all these, essentially what would be step definitions in Cucumber, these little methods that implement bits of the story, that set up servers, that set up clients, that publish things. And they'll take arguments, but there's no callbacks in here; that's what
we're trying to eliminate. But what if we implemented all those story methods but made all of them take a callback, and they could use that callback to say when they were done doing their work? For running these tests I'm going to have this thing called an async scenario, and that models a use case of the Faye messaging system. What that has is a set of clients, and a set of inboxes to track the messages that they've got, and then it implements all the story methods, also making them take callbacks. The server method, for example, does all the work: it starts a server up on the port that you gave it, and then it uses the Rack API to get notified when the server is actually running, and as soon as that's the case you can call your resume block. I've called it resume here because that's the job that it's doing: it's telling the test runner that my bit of work is done and you can carry on going. Setting up a client works kind of the same way. Remember, we set up a client with a name and a list of subscriptions, so those are the arguments that we have, and again we have our callback. It makes a client, goes over the channels, subscribes the client to those channels, and then we use a timeout. Faye doesn't have an API for waiting for a subscription to be activated yet, but people have asked for it, so I might add it. All we're going to do here is wait a certain amount of time, a reasonable amount of time for the requests to go through and everything to get set up, so we just use EventMachine to add a timeout, and when that expires we can call resume and carry on running our test. The other methods are implemented in a similar way. The pattern is: you implement your methods like you would normally, but you give them an extra callback argument that they can use to notify the rest of the world that they're done working. And you just need something to tie this together. So when I'm setting up my tests for Faye,
I just have some stuff that I add to Test::Unit that lets me catch all those story methods that I want to run and gives them to the scenario to get glued together. In my test setup I make a new one of these scenarios, I have a list of commands that I want to run, and I just make sure that the EventMachine reactor is running in its own thread. Then in teardown, because Test::Unit isn't asynchronous, you need to make the test block until your scenario is finished running. In our example we're just going to sleep until EventMachine stops running. You could just use a flag for that, but in our case this makes sense, because we're going to stop the server and that will stop EventMachine, so we use that to check whether the test is finished. The interesting bit is the bit in the middle, the method_missing. We can use method_missing to catch all the story methods that you're calling and just put them in a queue. We don't run them sequentially; we just store them and let the callbacks glue everything together. So when method_missing gets run, it just queues up everything it received, and we want the first thing in the story to set the test running, because if we just put things in a queue, nothing would ever happen. So we go and run the next command unless the test has already started. So what does this do?
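The whole runner, the method_missing queue plus the run-next-command glue about to be described, can be condensed into a sketch like this. The names are invented and EventMachine is left out; the only moving parts are the queue and the callbacks:

```ruby
# A condensed sketch of the story runner: method_missing queues steps,
# and each step's completion callback triggers the next one.
class StoryRunner
  def initialize(scenario)
    @scenario = scenario   # object whose methods all take a final callback
    @commands = []
    @started  = false
  end

  def method_missing(name, *args)
    @commands << [name, args]
    run_next_command unless @started   # first step kicks the story off
  end

  def run_next_command
    @started = true
    command = @commands.shift
    return finished! unless command    # empty queue: the story is done

    name, args = command
    # tack our own callback on: when this step says it's done, run the next
    @scenario.send(name, *args) { run_next_command }
  end

  def finished!
    @finished = true                   # real code would stop EventMachine here
  end

  def finished?
    !!@finished
  end
end
```

Because each step only resumes the sequence from its completion callback, the steps read linearly in the test while still being fully asynchronous underneath.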
It just shifts the next command that it wants to run off the queue. If there's no command, that means we're done doing all the work that we wanted to do, and we can stop EventMachine, and that will stop the teardown method blocking and it will go and run the next test. Otherwise we take the command and we send it to our async scenario object, and in addition to everything that we called it with, we tack on this callback, and all the callback does is call run next command. It's just this tiny bit of glue that takes all this work you want it to do and sequences it, even though it's all asynchronous. And obviously if you're doing tests you can catch exceptions. Exceptions kind of don't work that well with asynchronous stuff, because as soon as you defer some work for later you've gone into a different call stack and weird things happen, but you can catch exceptions, and there are APIs in Test::Unit for reporting them: in Test::Unit you call add_error, and if you're using RSpec I think you can call example.exception, and that will show up in the error output if your test fails. So that is everything that I wanted to tell you about Faye. There's a demo that I want to show; I don't know if this will be visible, we had some tech problems and my computer won't talk to the projector, but I just want to show you the project that's the reason I wrote Faye in the first place, which is something called Terminus. How many people have used Capybara?
Cappibara most people terminus is a Cappibara driver that's designed to let you run tests on remote machines so you can run tests on your phone or your iPad or any machine that can see yours over a network you can script it using Ruby through Cappibara so I was going to do this by asking someone with an iPad to connect to this but it wasn't going to happen I'm afraid but hopefully I'll be able to show you just what's involved just on this machine I have a terminus server running it's in here if I open a web browser if I open a web browser we hope everyone's going to be able to see this I'm sorry it's not on the projector what terminus does is is that it runs a server and then you can go and open a browser and connect to that server and what that means is that that browser is then available to be taken over by any Cappibara scripts that you run so if I go into the terminus project and run our spec this is going to run all the Cappibara specs against terminus using the browser that I've just captured so I start the test running it's going to go and do some stuff and then it should start manipulating the browser and there you go it's visiting pages it's checking links that are there I won't make you sit for all the tests because it will take about 20 minutes it's very slow but this is just working by it sends the browser messages the browser runs some javascript to figure out what you want it to do and then it sends a message back and it's all using asynchronous communication if someone has an iPad and they want a demo I could probably show you afterwards but I'm not going to have fat around with that with already numerous technical problems we've had so yeah that's everything that I had to tell you thank you very much for listening