New Bamboo is a Rails consultancy down in London, and I'm now heading up the development of Pusher. So at New Bamboo over the last year or so we've been working on quite a few product ideas, and two products have really come out of that work. They're both cloud-based services, they've been interesting challenges to work on with lots of interesting problems, and those are the kind of problems I want to talk to you about today. So the first product that came out was Panda. Panda is a hosted video encoding service. And then Pusher, which I'm now leading, is a hosted real-time messaging system, and you'll notice that we're quite proud of our new domain name. All three stories I'd like to talk about today are about EventMachine and also about Redis. Just before I start I'd like to get an idea of who here has used Redis. Okay, that's brilliant. Practically everybody has used Redis. Redis is awesome. I'm sure you know that already. So what is Redis? Obviously you know that. It's an advanced key-value store; that's what it says on the website. So you store data in it, right? That's what it's for. Well, we actually only use Redis to store data in a very few little places, like storing API keys for really fast lookup, things like that. But then Redis is also described as an advanced data structure server, and I think that's getting a bit closer to it.
I was talking to a colleague of mine and he said it's kind of like a grab bag of algorithms and super-fast in-memory structures, and in an emergency you can just go in and grab some great tools that will help you solve your problem. And I think that's what I'd like to concentrate on today: some maybe non-standard use cases, specific solutions to problems we had. So, three stories, as I said. The first story is about auto-assigning roles to processes, the second is about asynchronous communication between processes, and the third is about synchronising state in your system. So let's get started. So this is about Pusher. First I'd like to tell you just a little bit about the basic architecture of Pusher, because that'll make things easier. At its very core, the idea of Pusher is to make it super easy to send messages from your server to web browsers. So we've got two components: on the left side we've got the API, which is fairly standard; you send messages to it via a REST API. They get sent to a message bus in the middle, and then those messages get sent to the socket servers, and those servers maintain persistent connections with large numbers of browsers. So in the system there are lots of operations happening all the time. There are hundreds of people connecting and disconnecting, sending messages up the pipe, sending messages down the pipe. And the important thing for us is to be able to measure what's happening and have an idea of the usage of the system. That's very important because the billing information is based on that: we bill based on message volume and that kind of thing. It also means that we can give our users very nice analytics, and that's actually very useful, because when all your clients are holding these persistent connections with your service you get a lot of real-time information that you can use very easily. We want to collect these statistics.
And Redis seems like a great place to store statistics. It's non-blocking. We can really, really quickly store the statistics. We can use increments to update values atomically. That's great, but that's not really what I want to talk to you about. What I really want to talk about is how we store those statistics in the long term. So Redis is great for storing things right here, right now, and getting them out of your process really fast without blocking any operations. But really we'd like to store that data in a database so that we can archive it and query it however we want. So the question is really: how does the data get from Redis to SQL? Well, it doesn't really matter who sends that data. It's not a really difficult job, and it's not a job that has to happen super quickly. It's just a background process: we need to read some data from Redis and write it into the database. But what we don't want is multiple processes doing this at the same time, because then in our code we have to write lots of locking code and it gets really ugly. What if two processes read the data from Redis at the same time and both try to push it into the database? We don't want that. So it should be a single process that does this. But we should be isolated from failure: we don't want this process dying and then suddenly there are no statistics for several days. What are we going to do? And ideally I don't really want to configure it, because if I have to configure it, it will go wrong, and if it fails then I have to change the configuration, and that's going to be a real pain. So we were kind of inspired here by something that Netflix put out. They wrote a blog post about this thing they've got called Chaos Monkey, and Chaos Monkey just sits in their AWS cloud and randomly goes and terminates instances and terminates services, and I think that's a great way of thinking about it.
The best way to survive failure is to fail constantly; that's what they said. So how do we solve this problem? The inspiration for this really came from the Chubby paper that Google put out. They use Chubby as a lock service so that they can solve some of these same kinds of problems. They obviously have huge distributed systems at Google, right? And one of the quotes from the paper is: before Chubby, most distributed systems used ad hoc methods for primary election, or required operator intervention in case of failure. So they were having to write this ad hoc code all the time that allowed instances in a distributed system to elect a master process, where the master process is responsible for some key operations. And there are some papers on this: Paxos, if you've read about it, is probably the most commonly used consensus protocol. But one of the interesting things Google says in the Chubby paper is that they actually decided not to use a distributed consensus protocol for defining the master. They said it's just easier to use a lock. So it's a central process. And we thought, well, if a central lock is good enough for Google, it's going to be good enough for us. Now obviously the central lock in Chubby is resilient to failure. I think they say they have five Chubby replicas communicating, and a majority of those must be up in order for the service to be up. But it means that they don't have to worry about consensus in their actual processes to define who the master process is. They can simply query the lock, and the lock will say: yes, you have the lock, you can do this job. So I thought, this is great, I bet I could build this in Redis. The simplest way to build a lock in Redis is to use the SETNX operation. I'm going to use this style. Can you read this text actually?
Is it okay? Okay, good. I'm going to use this style. I'm sorry about the dollar signs at the front; it's something to do with Showoff, which will only give me these nice command-line prompts if I use dollars. So SETNX basically says: set this key if it doesn't already exist. So nobody has the lock right now, I'd like to assign myself the lock, and the value there is a timestamp. That seems to be the usual way of doing things. It will reply with either a 1 if the operation succeeded or a 0 if the operation failed. So it returns 1: we've got the lock. Great. When I want to shut down, I can delete the lock, and that returns OK. That's fine; some other process can come along and claim the lock. If another process tries to claim the lock while it's held, SETNX returns 0: it didn't get the lock. Okay, you're getting the picture. But obviously there's an issue here. With multiple processes, some process might just die without cleaning up its state, and then there's a lock there that hasn't been cleaned up. The algorithm that I'm going to describe is actually documented in the Redis documentation for SETNX. The idea is that we start off as before: we attempt to acquire the lock, but that fails. So we say, well, maybe a process died and didn't clean up its state. The lock contains a timestamp, and that is the expiry time of the lock, whatever the process that originally set the lock defined as the expiry time. So I can fetch that value, T1, and check whether T1 is in the past or in the future. If it's in the future, then okay, somebody else has the lock; that's fine, we'll stop. If it's in the past, we'll try to acquire the lock ourselves. So we'll define a new expiry time, T2, which is, for example, right now plus five minutes, something like that.
And then GETSET, which is one of these nice atomic operations in Redis, allows you to set a key and at the same time return the value that key previously held. So in this case it returns T1, the old value of the lock, and we can compare it with the value we fetched previously and say: okay, that timestamp hasn't changed, therefore nobody else tried to acquire the lock at the same time. And to see how that works: if another process does the same thing, when they call GETSET at the bottom right, they'll get the new timestamp T2 in response, which means they failed to acquire the lock. So that's basically the algorithm we used, and it works great. The new version of Redis actually has a very nice feature which I'd like to talk about, a little bit on the side, which is transactions. The code in the previous slide ends up looking a little bit messy, and you can actually clean it up a lot using transactions. There are still quite a lot of commands, but the idea is that we start in the same way as previously: we call SETNX, and it returns one if we got the lock, otherwise zero. And then Redis uses a style of locking called optimistic locking, rather than the pessimistic locking which is maybe more common. Rather than creating a lock and saying okay, nobody else can change this data because it's mine, you just say: I'm going to watch this data, then I'm going to queue up a load of operations and perform them atomically, and if the value I was watching changed, then fail. If it didn't change, then everything's consistent and you can process those commands atomically. So the idea here is: we call WATCH on the lock, and then we call GET.
If we get an old timestamp we call MULTI, which starts the atomic block of operations, we SET a new T2 expiry time, and we call EXEC. EXEC will either fail or succeed depending on whether the lock was modified by another process or not. It's maybe a little bit simpler to look at the commands. So, as before: we call WATCH and it says okay, you're watching, fine. We call GET and that returns; we're not in a MULTI/EXEC block yet. Then we call MULTI, so all further commands, instead of being executed right now, just get queued up by Redis for execution at a future point in time, and then we call EXEC and they get executed. And you can do really quite interesting things with this: you can watch one or more keys, and you can execute many commands within the MULTI/EXEC block, so you can really create a nice atomic mechanism for yourself. So I think it's time to look at some Ruby. So we're going to do this, right? No: EventMachine callbacks, okay. Who was in the talk yesterday by Matthias? Okay, that's really great. I was hoping that practically everyone would have gone to that talk, because it was great and it means I don't have to explain a whole load of stuff. So, we're using EventMachine; how are we going to do this? There are at least two EventMachine Redis clients. I wrote em-hiredis because I wasn't really happy with the old em-redis gem; ask me later if you want to know why. I'm going to use it for the duration, purely because I know it, and it's fairly similar in terms of interface. I'm going to skip a whole load of the boilerplate code in these slides, but the basic idea with EventMachine is that you always run things inside a reactor loop. Here I've defined a connection that is an instance of the em-hiredis client, and it's got the callback style that you'd expect in EventMachine: you call get, or any Redis operation, against a key, and it returns a value for you asynchronously.
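To make that flow concrete, here is a minimal sketch of the timestamp-lock algorithm in plain Ruby. The `FakeRedis` class is a hypothetical in-memory stand-in for the three commands involved (SETNX, GET, GETSET), just so the logic can be followed without a server; it is not what Pusher actually ran.

```ruby
# Minimal in-memory stand-in for the three Redis commands the
# lock algorithm uses: SETNX, GET and GETSET.
class FakeRedis
  def initialize
    @data = {}
  end

  # Set only if the key is absent; 1 on success, 0 otherwise.
  def setnx(key, value)
    return 0 if @data.key?(key)
    @data[key] = value
    1
  end

  def get(key)
    @data[key]
  end

  # Atomically swap in a new value and return the old one.
  def getset(key, value)
    old = @data[key]
    @data[key] = value
    old
  end
end

# The timestamp-lock algorithm from the SETNX docs: try SETNX first;
# if that fails, check whether the stored expiry is in the past and,
# if so, race for the lock with GETSET.
def acquire_lock(redis, key, timeout, now = Time.now.to_f)
  expiry = now + timeout
  return true if redis.setnx(key, expiry) == 1

  current = redis.get(key).to_f
  return false if current > now # lock still held by someone else

  # Lock looks expired: GETSET a fresh expiry. If the value we get
  # back is the expired one we saw, we won the race.
  redis.getset(key, expiry).to_f == current
end
```

Passing `now` in explicitly is just for testability; the shape of the compare-then-GETSET step is the important part.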
Optionally you can provide callback and errback blocks to redis.get, or to any other operation in fact, because it returns a deferrable, which allows you to bind to success and failure separately if that's useful for you. So what's the interface that we'd like? This is something I'd quite like to talk about a little bit today. I think Matthias mentioned yesterday that EventMachine, and Node and other such frameworks, often get a bad name because of all these callbacks, right? It's all these callbacks, and that's a problem for some people. But what I'd like to say is that you can actually abstract the callbacks away from your main code in most cases. So the kind of interface that I'd like for a lock is something like this: I create a new lock, I give it a key, called my lock, and 60, which is the timeout. So I set my timeout value 60 seconds in the future, which means that when I acquire this lock I hold it for 60 seconds. And then what I'd like is basically a callback if I managed to acquire the lock successfully, and if the lock wasn't acquired, an errback. So rather than worrying about the complexities of the operations that are actually happening in Redis, I'd like to just get these simple callbacks and errbacks. And this is quite a common style in EventMachine: rather than returning true or false, or returning true or raising an exception, return an object that completes or fails some time in the future. So this is the core bit of the code for the Redis lock. This is the algorithm we were talking about earlier. Sorry about the amount of code; it's just like that. As you can see, it's got lots of nested callbacks, which is kind of horrible, but the point is we don't have to worry about that in our actual interface. That's good.
We're calling get, we're checking whether the expiry time is in the past, we're calling getset, and then we're looking at the value returned by getset, and depending on that we're either acquiring the lock or failing to acquire it. What I've also done here, you'll notice, is that inside the RedisLock class I'm including EM::Deferrable, which is one way in EventMachine of making an instance have this property of failing or succeeding some time in the future. So if we look at the code, we're basically attaching a callback and an errback to the lock itself. About two thirds of the way down we're calling fail, and that will basically end up triggering the errback block. And then in our lock-acquired method there we've got a call to succeed at the bottom, and that will call the callback. Another thing that's quite common, rather than making a whole class deferrable itself, is to just return a deferrable from a method call, and the unlock method I implemented happens to work like that. EventMachine comes with this thing called a DefaultDeferrable, which is just a really simple class that mixes in Deferrable, and I end up doing this quite a lot: at the start of the method definition, just define a new deferrable object. It doesn't have any logic, doesn't know anything about the problem; it's just a thing that can accept callback and errback functions. And then, if we manage to delete the lock, we succeed that deferrable. So we can return that deferrable to our calling method, attach a callback and an errback function, and when that deferrable fails or succeeds those callbacks get called. It's not an ideal interface, though, because what I was talking about at the start was that I'd like to be able to elect a master process effectively.
I'd like to say: okay, this process is going to be in charge of this operation. And ideally I would like to do that at the start of the lifetime of a process, and it could just say, okay, am I the master or not, and periodically it could check: do I need to become the master, has another node failed? I don't really want to be acquiring and then losing the lock and all this stuff all the time, and we want to keep the expiry time for the lock fairly short, 60 seconds, something like that, because if one of these processes fails we'd like that lock to expire and for that role to be taken over by another process relatively quickly. So we ended up wrapping up that code again, bundling up a whole load of callbacks inside a nice object that we could use, which I called a persistent lock. We gave the persistent lock a timeout, which is the timeout for the underlying lock, and also a polling interval between successive attempts to acquire the lock, and then we get a very simple interface. And that's another common pattern you'll see in a lot of EventMachine code: if a deferrable object is not suitable for whatever reason, you'll find people attaching on-event handlers, very similar to the JavaScript way of doing things. So in our code, where we're interested in being the process that synchronises statistics from Redis to MySQL, we can just write this code, and really simply we get a callback fired if we're the master process, and when we stop being the master process we get another call. I think this is actually what makes this code really understandable, and in many ways EventMachine gives you a very elegant abstraction. It allows you to write this kind of code, as long as you've abstracted away the complexity behind a simple interface.
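As a rough illustration of the deferrable pattern being described, here is a stripped-down, hypothetical stand-in for EM::Deferrable (not EventMachine's actual implementation): an object that can have callbacks and errbacks attached, which fire when something later calls succeed or fail.

```ruby
# A toy deferrable: an object that completes or fails some time in
# the future, invoking handlers registered either before or after
# the outcome is known.
class TinyDeferrable
  def initialize
    @callbacks = []
    @errbacks  = []
    @state     = :pending
    @args      = nil
  end

  # Register a success handler; fires immediately if already succeeded.
  def callback(&blk)
    @state == :succeeded ? blk.call(*@args) : @callbacks << blk
    self
  end

  # Register a failure handler; fires immediately if already failed.
  def errback(&blk)
    @state == :failed ? blk.call(*@args) : @errbacks << blk
    self
  end

  def succeed(*args)
    @state, @args = :succeeded, args
    @callbacks.each { |blk| blk.call(*args) }
  end

  def fail(*args)
    @state, @args = :failed, args
    @errbacks.each { |blk| blk.call(*args) }
  end
end
```

The point of the pattern is exactly what the talk describes: the caller attaches a callback and an errback to the lock object and never sees the nested Redis operations behind it.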
You can look at the code and really simply understand what's going on, and we ended up using this for quite a few different tasks within Pusher, and it works really great. One of the big advantages, as I said, is that it means we're tolerant to failure. Rather than having to worry about stopping instances or stopping nodes, it's just easy: you don't have to worry about whether this process was responsible for that task, whether you need to stop it cleanly first, or whatever. You can just boot up a new instance, kill one of the old instances, and things just work, and it's a lot more resilient. So the next story is about asynchronous communication between processes, and this is in the context of Panda. Panda, as I said before, is a hosted video encoding service. People upload videos to Panda, those videos are encoded and then stored on S3, and it provides you with webhook callbacks when those different events happen, that kind of thing. So obviously one of the core concerns in Panda is to be able to encode videos, so we have a huge number of Pandas that sit around encoding videos. It's kind of obvious. We'll call those encoders from now on. Now, the way we price Panda and sell it is we give people virtual encoders. Virtual Pandas, if you will. So say a customer purchases five virtual encoders. What that means is that if they have five or more videos in their queue to encode, they end up encoding exactly five videos at a time. It means that we don't end up provisioning a whole lot of resources for users that don't currently have any videos to process. We just have to make sure that in our system we've got sufficient resources so that we can always satisfy the demand for actual encoders at any given point in time. So this is what I'm describing: we've got maybe a red user, and the red user has an encoding capacity of three.
So we've got three of these encoders down on the right-hand side, and three of those are active. Say the blue user has an encoding capacity of two, but they've only got one video encoding right now and nothing in their queue, and so on. What we'd like is that when any one of these jobs completes, when we've finished encoding a video, we automatically assign a new job for that user if they have space in their allocation of encoders, and when a new job arrives in the system we immediately push it onto one of the available encoders. So in many ways this is a classic queuing problem, and in many ways you would solve it by using Delayed Job or Beanstalk or a library like that: you just have a queue of work that has to happen, and you do it as and when you have resources. But Panda is a little bit different, as I've described, in that you end up effectively having a queue for every user, and it's not always trivial to determine which encoder should take on which job next. Initially we did this with a whole lot of database logic in each encoder, but that was kind of a horrible solution, because the complexity just kept getting bigger and bigger. We wanted to optimise the placement of encodings in the system, and you end up putting a load of logic in all your encoders, and they have to poll the database, so it's not a push-based model, and as you scale the number of encoders it just gets unmanageable. So what we'd like is a way of pushing those jobs, which is why we introduced this single manager process.
So the manager process has knowledge of all of the queues for all of the users and what jobs are waiting, and it has knowledge of all of the different encoders in the system: who's encoding which videos for which clients, and which encoders are free. We'd also like to be able to boot up new encoding instances and have all of those encoders just automatically become available, without having to configure anything; the manager should just automatically be aware of what's available. So we wanted this kind of asynchronous communication between the manager and all of the different encoders in the system. The manager could ask an encoder: what's your status? And it should reply: well, I'm busy, or I'm doing something else. An encoder should be able to say: I've completed the video, all this kind of thing. Synchronous communication is a bad idea in this context, because encoding a video is a blocking operation: the encoders are not going to respond very quickly, and you don't know where they are anyway. So an asynchronous method just works so much better. The way we solved it was using Redis lists. So, Redis lists very quickly: they are extremely cheap and they have a lot of operations for pushing and popping. We can push an item onto a list, and we can pop that item off the list again. If we try to pop an item from a list that's empty, it returns nil. We can also do blocking pop operations, and what this means is that you don't have to keep popping all the time and asking: is there anything new on the list? Is there a new communication for me yet? You can just do a blocking pop, and you will be notified when something arrives on that list. So this is the Ruby interface, the BRPOP operation, and in this case we end up getting back nil, which is what Redis returns if it timed out.
So if we got nil it was a timeout; otherwise we got a value. So, same as before, getting rid of the callbacks, what do we do? We need an object to handle this. We call it a post box. Every process has a post box. What's the really nice interface for this? I've got a post box, and all I want to know, as a process, is when I get a message. Under the hood we'll use blocking pop, which means that when I send a message to a process, that process is immediately notified. This is just the boilerplate code for that class. It allows us to attach an on-message callback, and it allows us to send a message just by doing an LPUSH onto a list. This is the listen method, which as you can see does the blocking right-pop operation and checks whether it was a timeout. In all cases it ends up calling itself again in the callback after it completes, so this listen call just endlessly circulates. I would probably want some method of telling this class to stop at some point, but I've kind of skipped over that. But again, maybe this API is still not perfect. All it allows us to do is attach an on-message handler, but maybe I have different parts of my code base that are responsible for different operations. One part of the code base is responsible for receiving new jobs that require work; maybe another part is responsible for handling job-complete calls from encoders. Again, just shape the API to fit the problem. In Panda we ended up with an interface something like this. We create a post box for a process, so this is the manager process, we create a new post box for it, and then we can just register for the different callbacks we're interested in. So we say, you know, on new encoder, do something, so one class in the system registers for that call, and in another part of the code base we register for job complete, and then the encoder side looks basically the same.
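Here's a hedged sketch of the post-box idea. The names (`PostBox`, `on`, `send_message`) are illustrative, not Panda's real API, and the Redis list (LPUSH / BRPOP) is replaced with a plain in-memory queue so the dispatch-by-message-type logic stands on its own.

```ruby
require "json"

# Sketch of the post-box pattern: each process owns a list; other
# processes push JSON messages onto it, and handlers registered by
# message type get invoked when the owner drains its box.
class PostBox
  def initialize
    @queue    = []
    @handlers = {}
  end

  # Register a handler for a message type, e.g. "job_complete".
  def on(type, &blk)
    @handlers[type] = blk
  end

  # Stand-in for LPUSH onto this process's list.
  def send_message(type, payload = {})
    @queue << JSON.generate({ "type" => type }.merge(payload))
  end

  # Stand-in for the BRPOP loop: drain the queue and dispatch each
  # message to the handler registered for its type.
  def drain
    while (raw = @queue.shift)
      msg = JSON.parse(raw)
      handler = @handlers[msg["type"]]
      handler.call(msg) if handler
    end
  end
end
```

In the real thing, `drain` would be the self-rescheduling BRPOP listen loop described above, so the process is woken as soon as a message lands rather than polling.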
So the advantage of this was that the manager gave us a very clear global view of the system. The asynchronous communication over Redis worked really great, and the system ended up being very resilient to failure. Encoders could fail in the middle of processing a job; because we were querying their status we'd find out, oh, this encoder's vanished, okay, we'll reschedule that job for someone else. And we're actually even resilient to manager failure, because all of the communication to the manager goes over Redis lists, so if the manager fails for some reason we can just boot it up again: its mailbox is stored in Redis, and it can retrieve all the new events that happened in the system and process them accordingly. Zero-configuration scaling. Okay, how am I doing for time? Okay, so the last story, really quickly; this is actually a really fun one, so hopefully we'll get through it. This is about synchronising shared state. We have this feature called the debug console, and the debug console allows you to log into your account, click on the debug console, and see all the events that are happening in your system. So you'll see that users are connecting, disconnecting, subscribing to channels, unsubscribing from channels, sending messages, messages being delivered, all this kind of thing. This is extremely useful for people, especially when they're developing their applications: if something's not working the way they expect, they can see exactly what's going on.
The problem with supporting this for a busy application is that there could be tens of thousands, hundreds of thousands of operations per second happening in the system, and exposing all of this data is just a shedload of information to push over the message bus. So really what we'd like to know is which users have a debug console open right now, because there's a whole load of extra work I only need to do in the case that they have one open. Maybe the simplest way to think about this is: well, let's just use a set, right? We have a set in Redis: really cheap, O(1) adding, removing, and querying whether an element is a member of the set. The problem with using a set is that if a user has multiple debug consoles open, multiple tabs in their window, then when one of those closes you need to know whether that was the last one, and so you can't just remove them from the set at that point, because maybe they still have another one open. The solution to that is to use sorted sets. A sorted set is very simply a set where every element has a score associated with it, and it allows very cheap range queries by score, by index, that kind of thing. I'm running out of time, so I'm going to skip over this really quickly. Basically, we can query the score of an element. So we query app ID 42: it's nil, okay, that means it's not even in the set, so the debug console is not open. When someone opens the debug console we can call ZINCRBY with one, and it returns one: that's the new score of that element, which means somebody has their debug console open. We can query that, we can get ranges, see all the elements that have a score of at least one, all the apps that have at least one debug console open, and then we can keep incrementing and decrementing, and you get the idea. Okay, so the question is: how do we know if a debug console is open?
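A small sketch of the sorted-set bookkeeping just described, with the sorted set simulated by a Hash; in Redis these would be ZINCRBY, ZSCORE and ZRANGEBYSCORE calls. The class and method names are made up for illustration.

```ruby
# The score for each app ID is the number of debug consoles
# currently open for that app. Transitions between zero and one
# are the interesting events: they are the moments the talk
# publishes a notification.
class ConsoleTracker
  def initialize
    @scores = Hash.new(0) # stand-in for the Redis sorted set
  end

  # ZINCRBY app_id 1 -- returns :opened if this was the first console.
  def open(app_id)
    new_score = (@scores[app_id] += 1)
    new_score == 1 ? :opened : nil
  end

  # ZINCRBY app_id -1 -- returns :closed if that was the last console.
  def close(app_id)
    new_score = (@scores[app_id] -= 1)
    new_score == 0 ? :closed : nil
  end

  # ZRANGEBYSCORE 1 +inf -- all apps with at least one console open.
  def open_apps
    @scores.select { |_, score| score >= 1 }.keys
  end
end
```

Because the increment returns the new score atomically, a second tab opening (1 to 2) or one of two tabs closing (2 to 1) produces no event, which is exactly the multiple-tabs problem the plain set couldn't handle.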
So we have this distributed system: API processes, socket server processes, and they're all doing work, receiving subscriptions and unsubscriptions, and they'd like to know whether they need to broadcast that information to the message bus so that people can view it in the debug console. We could definitely do this by calling ZSCORE, which is a really nice atomic operation that tells us, in a cheap, constant-time way, whether the console is open or not. However, we'd end up doing that a lot. So by using the atomic operations we've got a consistent view of whether the application has a debug console open or not, but we haven't really solved the problem entirely. What we'd really like is to be able to push that change: instead of each process requesting it over and over and over again, we'd just like to tell them, okay, it changed. And Redis comes to the rescue again. Redis pub/sub is really, really great: it's really performant, it's really simple, and it solves problems really nicely. The idea here is that if you're interested in the fact that the set of open debug consoles changed, you can just subscribe to that, and so pub/sub allows us to really easily decouple the system. When the set changes, we publish a notification, and then any process that's interested in that event can very easily subscribe. So, skipping quickly over the interface: we can call subscribe against a channel, and that returns a subscription result; we can subscribe again if we want to. We can publish to that channel, say a foobar message, and that tells us how many subscribers there were that received the message, which can be quite useful.
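To show the decoupling idea, here's a tiny in-process publish/subscribe sketch; the real system uses Redis SUBSCRIBE and PUBLISH over connections, but the shape is the same: subscribers register interest in a channel, and publish fans the message out and reports how many subscribers received it. The class name is hypothetical.

```ruby
# In-process sketch of the pub/sub pattern: subscribers register a
# block against a channel; publish invokes every subscriber on that
# channel and returns the receiver count, mirroring what Redis
# PUBLISH reports.
class TinyPubSub
  def initialize
    @channels = Hash.new { |h, k| h[k] = [] }
  end

  def subscribe(channel, &blk)
    @channels[channel] << blk
  end

  def publish(channel, message)
    subs = @channels[channel]
    subs.each { |blk| blk.call(channel, message) }
    subs.size
  end
end
```

The value for a distributed system is that the publisher needs no knowledge of who is listening: new processes that care about the console set just subscribe, and nothing else changes.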
And then what happens is that when you call subscribe, that Redis connection blocks, and you can no longer use it as a normal communications channel; it just receives messages whenever they arrive in the system. Okay, so the interface for em-hiredis is very simple: you just subscribe, you can keep subscribing on the same connection as many times as you like, and then you just receive a message whenever one is sent to the pub/sub channel, and you get the channel and the message. Okay, skip that. So this is what we want to do: we want to publish when a console is opened or closed. And we can do this because the increment operations in Redis return the new score of that value, so we can do it really easily and atomically, which is important. When a new console is opened we increment by one, and the new score is returned to us. If that score is one, it means that previously it was definitely zero, and so this is new, the set changed, and we can publish that event. Exactly the same on close: if the new score is zero, that means the last console closed. And this means that if we'd like to maintain an in-memory list, rather than querying Redis all the time, then any process interested in the fact that the list changed can just subscribe to that event. So we subscribe, and when we get a message on that channel we say: okay, give me all of the consoles that have a score between one and infinity, and we'll store that in an array in memory, which we can query almost for free. So, to wrap up: the atomic operations were extremely useful for this; they meant that we could have this distributed state in the system and change it atomically, which is important so that we could publish those events. The pub/sub made it really easy to propagate that state across the system and made sure that we could decouple the components: if a new process arrives and we need to write a new
process that cares about the state of those debug consoles, we can use this, and we've used this for a lot of different data in the system, not just the debug consoles; it's been a really flexible tool. So, that's perfect timing. Thanks for coming. Have you got any questions? No, don't block the event loop; there's nothing wrong with it for doing message passing. Well, you end up with basically one connection if you're doing subscriptions. You can subscribe to multiple channels over the same Redis connection, so you can keep calling subscribe and unsubscribe and changing your subscriptions in real time. You usually end up with one connection which is blocking on the subscription, and one connection which you are using to send other messages. Right, that's a good question. Redis has a very nice feature for slaving, so I can boot up a new instance of Redis and slave it off another instance. We end up having at least one slave for the data, and all of the operations are sent to that slave as they happen, so it's always kept up to date, and then, if you want to, you can build a kind of auto-failover system on top of that. Yep, we haven't. We could have done that, especially for the statistics; we could have used expires. As I said, we're only storing the statistics transiently in Redis, for about five minutes, and then we're flushing them to the database so that we can give people near real-time statistics. Yes, that could probably work; that feature is fairly new. Anything else? Okay, great, thanks very much.