Hello everybody, my name is Tim Uchkin, and today I'm going to talk about Gearman. Earlier today Paul showed you a diagram, an inverted triangle, and so far everybody's been talking at the lower end of that, at the CPU level and at the chip level. This is more about how you get parallelization at the application level, for just normal people like me, not geniuses like the earlier speakers.
In my case, I was contracted to jump on a project for Enspiral, for a client, and they had chosen the stack of Linux, Postgres, Ruby, Rails to write this application. The application required a lot of scalability, for lack of a better word, and the only way to get that, in my opinion, with the chosen stack was Gearman. So I chose Gearman to get this done, and today I'm going to go through what Gearman is, how you can work with Gearman, and how Gearman helped us build this product.
So what is Gearman? Gearman is an anagram for "manager". Well, obviously it's more than that: it's a massively distributed, massively fault-tolerant fork mechanism. So it's basically just a fork mechanism. It allows you to parallelize at the fork level, at the application level, without having to choose anything special. No special languages, no special CPUs.
At its core, Gearman is just a protocol, and this protocol has multiple implementations. Originally it was started as a Perl server. There's also a C server, which is on Launchpad. There's also a C++ version of this. There are also several other ones in various stages of abandonment.
There's an Erlang version. There's even a Ruby version that somebody kind of started, went along with for a while, and then left. There's a fork of the C version with cron and epoch functionality built in. So people have been fooling around with it, and I'm sure many people in here might want to pick one of those up and go.
The client API is available for almost any language, so you name it, there's probably a Gearman client for it. More interestingly, there are also command-line tools, so you can actually just use Bash to submit jobs and to process jobs. If you can read pipes, you can do it. There are also UDFs for MySQL, Postgres, and Drizzle, so you can actually submit jobs from inside of triggers and stored procedures, which is also quite interesting.
There's a subtle difference between a client and a worker: a client submits jobs, a worker processes jobs. And obviously anything that functions as a client will more than likely function as a worker, including the command-line tools.
So why would you use Gearman as opposed to something else? I would say number four, flexible application design, is probably the best reason to use Gearman. It's open source; lots of things are open source. It's simple, it's fast, there's no single point of failure, but again, a lot of things like RabbitMQ or AMQP or whatever would give you that as well. But Gearman is more flexible than all of those things.
For example, Gearman gives you persistent queues, or not. By default it's in memory only, but you can choose your persistence options. You can use off-the-shelf databases like MySQL and Drizzle, Tokyo Cabinet, memcached, and I'm sure there are a couple of pieces of code around for just about any kind of data store that you want. Or you can write your own.
It's not that hard. You can work in the foreground, synchronously, or in the background, asynchronously, and I'll be showing examples of how to work with both of these in a little bit. It's actually quite simple. Large-scale applications work well, but the great thing is it's very easy to start off simple. It's very easy to start off small. You just apt-get install Gearman, plug in your code, and off you go. If you have just one worker, no problem. If you have 150 workers, no problem.
How does it work? Basically, you have this broker, which is the job server, and you set up any number of these. Of course, it's recommended you have at least two. Clients (again, in terms of nomenclature, a client is somebody that submits a job, that requests some work be done) connect to any of the job servers, and the workers connect to all the job servers. So if one job server goes down, the worker fetches jobs from the other job server. It's important to note here that the job servers themselves are not communicating with each other. Maybe if we have time, we'll talk about that later.
Some of the use cases: scatter/gather, map/reduce, asynchronous queues, and pipeline processing. These are the ways I use it.
I'm sure there are many, many other ways to use it, but in my case, these are the functions that I took.
Scatter/gather: basically, you take a job, you break it up into little pieces, you hand them off to workers, and the workers get them done and feed you back the results. Normally this is done synchronously. The tasks don't need to be related. One of the interesting use cases of this is to push the logic to where the data is. So if, for example, you need to read a log file, or process a binary, or take a snapshot, or whatever, you put the worker on that machine and you just push that work to it, and it's done fast. You don't have to fetch a huge amount of data over the network. And this is basically how it works: you have a client that submits the jobs, splits them up, and you have a resize-image worker, maybe something that does a location search, maybe a database query, whatever. Each one of these jobs could be done on the machine with the databases, the machine where the images are, whatever.
Map/reduce: this is the one that, personally, I use a lot, especially if you have a large number of records you need to process. You can break these up and say: worker X, process records one through n, whatever; n plus one through y, whatever. This could also be multi-tier, and it could be asynchronous. In my case it's mostly asynchronous, but you can obviously also do them synchronously. In this case, the client takes a task, breaks it up into pieces, and hands them off to workers, and a worker can then further break that task up and give it to other workers to get it done.
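To make the "records one through n, n plus one through y" idea concrete, here is a minimal sketch in plain Ruby. It is not Gearman code: threads stand in for remote workers, and the names `range_payloads`, `SUM_RANGE`, and `scatter_gather` are made up for illustration. The only thing it borrows from Gearman is the convention that each piece of work travels as a string, here JSON.

```ruby
require "json"

# Hypothetical helper: split record IDs 1..total into contiguous ranges,
# serialized as JSON strings because a job payload is just a string.
def range_payloads(total, chunk_size)
  1.step(total, chunk_size).map do |first|
    last = [first + chunk_size - 1, total].min
    JSON.generate("first" => first, "last" => last)
  end
end

# Stand-in for a worker function: decode the payload, "process" the records
# (here: add up their IDs), and return the partial result as a string.
SUM_RANGE = lambda do |payload|
  range = JSON.parse(payload)
  (range["first"]..range["last"]).sum.to_s
end

# Scatter the payloads to one thread per chunk, then gather and reduce.
# Threads stand in for remote workers; no job server is involved.
def scatter_gather(total, chunk_size, worker)
  threads = range_payloads(total, chunk_size).map do |payload|
    Thread.new { worker.call(payload) }
  end
  threads.map(&:value).sum(&:to_i)
end
```

With a real job server, the thread creation would presumably be replaced by submitting each payload as a task and the final sum by collecting the completed results; the splitting and the string encoding would stay the same.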
And of course you can just use it as a simple asynchronous queue. In this case it'd be no different than using any other queue. This is especially handy for long-latency tasks like emails, log entries, indexing, and batch operations.
Pipeline processing: I use this a lot. This is where a worker acts as both a client and a worker. So a client submits a task, the worker does something, and then takes the result and hands it off to another worker for further processing, and it just goes on and on and on. When you do something like this, more often than not you can keep the state in the data itself as you're passing it along. Or, if you want to, you can keep the state in a centralized database, and then each worker would have to coordinate the state of the job.
So here are some simple examples. In my case, again, I'm using Ruby, so this is all Ruby code. It should be relatively easy for everybody to understand. Asynchronous processing is basically a one-liner in Ruby, which is great. You create the Gearman client, and then you say do task: you give the name of the queue, you pass some data, and you specify whether or not it's going to be a background task. Now, there are additional pieces of data you can pass in there. You can pass in, for example, the name of your client, or some sort of a unique ID for the job, and various other options that Gearman allows for. But again, this can be done as a one-liner, which is very handy.
Workers themselves are very easy to do. You register an ability, so in this case the worker can recalculate the index, and Gearman connects to your worker and passes you some data, a job ID, and then the job object. I'll talk about that in a second.
It's up to the worker to know what that piece of data is. As far as Gearman is concerned, it's just a string, so the worker has to know what that string represents. In this case, it's some sort of a serialized object that you unserialize and start working on. But it could be anything; again, it's just a string as far as Gearman's concerned.
A little more complex client: in this case, the client submits the task and then registers events: what to do if there's an exception, what to do when the work is complete, what to do if there's a warning, what to do when a retry condition is reached, et cetera, and what happens when the worker fails. In this case, these are all lambdas.
And this is how you return data to a synchronous client. In my case, it's just pure Ruby. You just return a value from your function like you would return any value from any function, and the client takes that, turns it into a string, and dumps it. Now, if your return value is an object and can't be turned into a string, then obviously you're responsible for turning it into a string before you pass it on to Gearman, and it's up to you what that string representation looks like. It could be JSON, it could be XML as in the previous example, it could be a partial object of some sort. You just dump it back, and Gearman will duly deliver it back to the client.
You can also raise exceptions, for synchronous processes only, obviously. For asynchronous processes, if you raise an exception in your code, Gearman will count that job as not having been done and submit it to the next available worker. And if all your workers hit the same exception, then that job will just stay in Gearman forever, until some worker processes it successfully. However, if the job is submitted synchronously, then this exception will get raised back to the client, and the on-exception method will be called, so you can then process that exception, report it.
Whatever you need to do with it. Oops, did I go the wrong way? Yep.
You can also return the data to the client in chunks. This is good for progress bars or whatever. You just do some processing and then at some point say, okay, here's a chunk. In this case you say: on data, do this; on complete, do this. Then you just add the task onto the job. On the worker side, the worker does its processing and sends chunks of data: chunk one, chunk two, chunk three. At the end it sends a native EOD, and at that point Gearman knows that the job's done and closes the job out.
You can also query the state of the queue. This is a simple function I wrote, but it basically just opens a socket and sends the command; Gearman sends you back some data, and you just process it. In this case I took it and dumped it out, and what it shows you is the name of the queue. In this case, at the bottom, for example, the queue's name is Twitter, there are 126 jobs in the queue, there are three running workers, and there are four workers total. So you get a good idea of what's happening in your queue at any given time. How many workers are registered? How many are working?
How big is your backlog, and so on. You can also take all this data, obviously, and send it to some sort of monitoring system, SNMP, whatever you need, and then raise alarms when the queues are backed up. You could do all kinds of interesting stuff. If, for example, you're always expecting 24 workers of some type and you don't have 24 workers, maybe something's wrong: raise an alarm, do something with it.
As I mentioned earlier, there are also database UDFs for Postgres and MySQL, and of course Drizzle. This is really powerful to me. Something happens in your database, some value changes or some state changes, and you want to submit this to a worker. The database just connects to Gearman and submits that, so does the worker, and then the worker does whatever it needs to do. A lot of people are probably spooked about putting this kind of functionality in their database, and that's fine too. You don't have to, but you can if you want.
And of course, in writing any kind of complex application, you're going to have all kinds of optional ingredients in your mix. You can have a database; in my case, it was Postgres. You can have shared or distributed file systems to use as locking mechanisms, or for shared data, or whatever. I would obviously not be putting large chunks of binary data into Gearman to be passed around. You can leverage other types of protocols. And in my case, image manipulation and full-text indexing: these are jobs that you want to do in the background most of the time. So you're going to have all kinds of other tools.
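Coming back to the queue status query and the "expecting 24 workers" alarm: the function from the talk is not reproduced in this transcript, so the following is only a sketch of what such a check might look like. It assumes the job server answers the text command `status` with one tab-separated line per queue (name, jobs in queue, jobs running, registered workers) followed by a line containing a single dot, and that it listens on port 4730. The names `QueueStatus`, `parse_status`, `understaffed`, and `fetch_status` are invented for this example.

```ruby
require "socket"

QueueStatus = Struct.new(:name, :queued, :running, :workers)

# Parse the text reply to the "status" admin command: one tab-separated line
# per queue (name, jobs queued, jobs running, workers), ended by a lone ".".
def parse_status(text)
  lines = text.each_line.map(&:chomp).take_while { |line| line != "." }
  lines.reject(&:empty?).map do |line|
    name, queued, running, workers = line.split("\t")
    QueueStatus.new(name, queued.to_i, running.to_i, workers.to_i)
  end
end

# Names of queues with fewer registered workers than expected.
# `expected` is a hypothetical hash of queue name => minimum worker count.
def understaffed(statuses, expected)
  statuses.select { |s| s.workers < expected.fetch(s.name, 0) }.map(&:name)
end

# Ask a job server for its status. Host and port are assumptions
# (4730 is the usual gearmand default); this needs a running server.
def fetch_status(host = "localhost", port = 4730)
  TCPSocket.open(host, port) do |socket|
    socket.write("status\n")
    reply = +""
    while (line = socket.gets)
      reply << line
      break if line.chomp == "."
    end
    parse_status(reply)
  end
end
```

Keeping the parsing separate from the socket code means the alarm logic can be exercised against a canned reply, for example the Twitter queue from the talk with 126 jobs, three running, and four workers, and the result can be forwarded to whatever monitoring system is in use.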
The great thing about Gearman is that it's a very simple protocol, a Telnet-like thing. You connect, you send some strings, you get some strings back. So you can use your imagination and go wild with what you can do with it, but most of the time the client libraries and the stuff that you get for free are plenty.
I want to talk about timeouts a little bit. There is no timeout in Gearman. If a worker connects to Gearman and says that it's working on a job, Gearman will wait forever for that job to finish. It will not time out. It's up to you to monitor your workers and make sure they're not stuck in a loop somehow. It's up to you to monitor your queues. Nothing's going to be raised. So you may want to time out yourself, or you can leverage this for your own purposes as well.
Now, what are the shortcomings? It's kind of weird that the clients and the workers must connect to all the servers, and that if one of your Gearman servers goes down, the data that's in that queue is not known to any of the other job brokers. If it comes back up again, it'll pick it up again. But if that machine has blown up and it's on fire, then you might have lost that data. So you have to take some precautions to make sure that data has been replicated someplace. You could do that at the database level or whatever you want to do.
It's a little bit slower than, let's say, RabbitMQ or AMQP or something like that, but in my case, who cares? A couple of milliseconds here and there doesn't really matter. It takes longer to connect to the database and do a query than that, so I really don't care. I'm not crazy about the way it logs; I think it could give you a lot more detail in the log. And as I said earlier, steps must be taken to ensure recovery of queued messages if a server is completely destroyed.
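On the point about timing out yourself: one way to do that, sketched here with Ruby's standard `Timeout` module, is to wrap the body of a job in a deadline inside the worker. This is an illustration, not something shown in the talk; `JobDeadlineExceeded` and `with_deadline` are hypothetical names.

```ruby
require "timeout"

# Hypothetical error raised when a job body overruns its deadline.
class JobDeadlineExceeded < StandardError; end

# Run the given block, but give up after `seconds`. Raising from a worker
# is one way to signal that the job was not completed.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise JobDeadlineExceeded, "job exceeded #{seconds}s"
end
```

A worker function would call `with_deadline(30) { recalculate_index(data) }` or similar, with the limit chosen per job type. `Timeout.timeout` can interrupt the block at any point, so work that must not be cut off halfway probably needs its own cleanup or a different approach.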
And finally, the small community. The development has slowed. A couple of the lead developers are kind of not that interested in it anymore. So I think this is a really great opportunity for people here, if somebody wants to jump on a really interesting project. It's stable. It works well. There are a couple of maintainers from Ubuntu on it. All the code's on Launchpad, if anybody wants to go take a look at it. And people are talking about putting in some more functionality, especially epoch support, which seems very interesting. For example, put in a job and Gearman will submit that job in five minutes. Or you pick up a job and decide you don't want to do it, but you don't want to give it back to Gearman to immediately hand off to somebody else, so you give it back to Gearman with a delay and say, don't touch this for another hour. Or submit this every Tuesday at 5 p.m., or whatever.
So if you want to get involved: gearman.org. There's the IRC. And that's it. Short and sweet.
Thank you very much, Tim. Questions, please. Again, at the back.
Is it architecture independent?
No. I'm sorry, did you ask me if it was dependent or independent?
Independent.
As in, you can run it on little-endian, big-endian, 64-bit, 32-bit? You just compile it from source. I don't think it runs on Windows; I think it's got to be a Unix thing. But you just get the source code and compile it.
Okay, so please join me in saying thank you to Tim.