Thanks for coming, everybody. I'm going to talk about how to play nice with others. This is mainly a talk about tools, and mainly tools that you can use in multi-language environments.

First off, my name is Jeremy Hinegardner. I'm on Twitter at @copiousfreetime, and I'm jeremy@hinegardner.org or copiousfreetime.org, so play with that as you like. I work for a company called Collective Intellect. That part's not really too important, but the fun part is all the different technologies we use in the production of the products we make.

How many were here last year? Did anybody go to mine and Fernand's talk about building a Ruby infrastructure? A few, okay. So I'm still working for Collective Intellect, but our Ruby infrastructure has grown a little bit more, and we've added even more systems to the whole infrastructure: we've got some Java services, a couple of C/C++ libraries, a Groovy application, and, you know, 20 micro Rails apps, some Sinatra apps, and a whole slew of gems.

In this entire environment, we need things that can play nice with each other. We need the Java applications to be able to use some of the same resources that the Ruby applications do, and the C/C++ libraries, the Groovy application, all these guys need to be able to talk to each other. And what can we use besides what everyone loves to use, the relational database? Sometimes that's not the right tool for the job.

So we'll start out with this: everybody raise your hand if you have a favorite programming language. There shouldn't be a hand down. No, no, keep them up, keep them up. We're going to do a little survey and see how long you can keep your hands up. All right, drop your hand if it's Ruby. We still have some up. Good, awesome. Okay, Smalltalk? No. Java? C#? C++? C? Assembler? What am I missing?
Forth, okay. PHP, JavaScript, and Perl. Okay, all right. So we have all these languages. Yes, at a Ruby conference Ruby is probably the most popular favorite language, but we do have others, even in this room.

So we have all these different languages, and they need to be able to talk to each other through some mechanism, because you're going to have some program that's written in Forth, and it's going to have to do something, and then some program that's written in Java or Ruby or Smalltalk or something is going to have to talk to it. How are they going to exchange the information? The basic answer is just a file and stuff like that. But when I started looking at this problem and trying to figure out what was going on, there were some commonalities between all of these different things. You have languages that need to talk to each other, so what are the commonalities in the ways of having things talk to each other?

Actually, let's take it a different way. How many people have computer science degrees or a computer science background? Okay. What are some of the things you learned in computer science that have absolutely nothing to do with an actual language? Big O, okay. Computational complexity. All right. MIX assembler? MIX assembler, actually, well, that brings us to probably the big one I'm trying to get to. What is all of that MIX assembler stuff used for? Teaching what?
Data structures, okay. That's the key. What I'm thinking of is that everyone here learned something about data structures. How many people have the big white book with the blue sweep on it, the one by Rivest and somebody else? Yeah, okay, that's the one I learned on too.

So we have data structures. We've got all these great data structures, and every single language has some sort of implementation of the vast majority of them. You may not think of a number as a data structure, but hey, in Ruby you've got Integer, Numeric, Float, Rational, imaginary numbers, all those different types of things, so you can consider it a data structure.

Now, there are other things that are also common across languages, and the other one I'm thinking of is communication. So, a quick little survey: what does everyone currently use to communicate a data structure between different applications? SOAP. Say again? CORBA. JSON. HTTP. Tab-delimited files. Marshal. YAML. Well, Marshal, you use that between languages? Okay, well, we could stretch the definition of "between languages" a little bit. All right, so we've got Marshal, flat files of different formats, and you have another one over here? Okay, there we go. Yes, you had one? Say again? AMF. All right.

So these are all different ways of communicating data structures between programs, and I kind of think of them in two realms. We have network-based communication, and we have library, API, IPC, and flat-file based communication, so basically I think of these as network and local. Network communication passes a data structure between different physical machines, and local communication is something that's used on the same system and doesn't require some sort of network API. Does this make sense?
All right. Now, there's a third aspect to describing these tools for communicating data structures, and that is persistence. This may not be something everybody agrees with, but I'm roughly defining persistence in three ways.

First we have none: no persistence whatsoever. I think that's a valid kind of persistence. There is none. I'll get to an example in a minute.

Then snapshot. Snapshot persistence I think of like this: you have something with a data structure in it, and you just take a snapshot of that data structure and save it to disk or solid state or something like that. I'll get into examples in a little bit, but one thing about snapshot persistence is that the amount that can be persisted to permanent storage is strictly limited to the amount of RAM you have. It's a snapshot of something in RAM, saved to disk. Think of it as a checkpoint or a rollback point.

And then the other one is lifetime. (These are all my own categories of persistence, so if you have better ones, please let me know; I'd be glad to incorporate them.) Lifetime can mean the lifetime of the usefulness of the data, or it may be forever. If the data is useful forever, then it may be forever. If the data is only useful for a few days, or while it's being worked on, then lifetime persistence only counts for as long as that lasts.

So first we're going to see if anybody's been paying attention. Hopefully you have. Using these three descriptors, we're going to describe a couple of different tools and see if anybody can guess what they are. These are two widely used cross-language tools. So, first one.
Oh, that's just the equals. Network communication. So of these three things we have communication, persistence, and data structure, and we're going to describe a tool in terms of these three attributes. The first tool has network communication and no persistence. Any guesses yet? HTTP, okay, no. RPC? And a hash data structure. Now: memcached. That's what it is. It has no persistence, it has network communication, and it has a hash data structure. So this is a cross-language tool, and it's a server tool because it is network. This is how I'm describing things. If this taxonomy, if you will, doesn't catch, then let me know; maybe I'm thinking about this the wrong way. This is the way I've been working on it, so it works for me.

Let's try another one. This one is also network communication. (I'm one behind.) It has lifetime persistence, so this is data that is saved for a long period of time for some reason, and it uses, say, a struct data structure. A database, yeah. Pick your favorite database. Have I upset everybody yet? Yeah, there we are.

So this is kind of, you see, where I'm coming from. This is just the way I'm categorizing tools in some sort of taxonomy. So again, our taxonomy: we have a data structure, from a huge selection of data structures; communication, which is either network-based or local; and persistence, which is none, snapshot, or lifetime. And the other criterion, for me, for a cross-language tool for communicating this type of stuff is that it must support at least three languages.

So, quick little survey. How many people are working on a project that has more than one language? Okay. More than two? Okay. The ones that only have one language, what is it? JavaScript only. Only JavaScript, okay. Three languages? All right, what three languages? Okay, JavaScript, Perl, and shell. Four languages? Yes: JavaScript, Groovy, Ruby, and Java. Five? Yes. Okay, so it sounds like seven. All right, that's awesome.
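
The taxonomy described above (a data structure, a communication style, a persistence level, and at least three supported languages) can be written down as a small record. What follows is only a sketch in Ruby: the `Tool` struct, its field names, and the example language lists are illustrative assumptions, not something taken from the slides.

```ruby
# Sketch of the talk's taxonomy. All names here are illustrative assumptions.
COMMUNICATION = [:network, :local].freeze
PERSISTENCE   = [:none, :snapshot, :lifetime].freeze

Tool = Struct.new(:name, :data_structure, :communication, :persistence, :languages) do
  # The talk's extra criterion: a cross-language tool supports at least three languages.
  def cross_language?
    languages.size >= 3
  end

  # One-line description in terms of the three attributes.
  def describe
    "#{name}: communication=#{communication}, persistence=#{persistence}, " \
      "data_structure=#{data_structure}"
  end
end

# The two guessing-game answers from the talk. The language lists are example
# values for illustration only, not a complete inventory of client libraries.
MEMCACHED = Tool.new("memcached", :hash, :network, :none, %w[C Ruby Java])
DATABASE  = Tool.new("relational database", :struct, :network, :lifetime, %w[Ruby Java C])

[MEMCACHED, DATABASE].each do |tool|
  raise ArgumentError, "unknown communication" unless COMMUNICATION.include?(tool.communication)
  raise ArgumentError, "unknown persistence" unless PERSISTENCE.include?(tool.persistence)
  puts tool.describe
end
```

Every tool discussed later in the talk can be described with the same four fields, which is the point of the taxonomy.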
So I gave this talk once before, and there was nobody in the room working on a project that had fewer than three languages in it. So for me, I figure three is probably the rough average number of languages. I mean, even in your general Ruby on Rails project you're going to have Ruby and JavaScript at least, and maybe some shell or something else along the way. Say again? SQL counts too, yeah, that's very true. So in every single Ruby on Rails project, generally, you're going to have at least three languages: SQL, JavaScript, and Ruby.

So that's the background, and now I'm going to talk about a few different tools that I enjoy working with and how they fall into this realm of stuff.

First off is the Tokyo products. Who's familiar with any of these? All right, a smattering, which is good. The reason I started thinking about this is that in the past year there seems to have been a proliferation of simple tools that do very general things for many, many different languages, and Tokyo Cabinet and the other Tokyo products are among the ones that have come to the forefront quite a bit. In fact, I think there were two or three talks on Tokyo products at RubyKaigi, and maybe two or three at, I just forgot, the one that was in Toronto.

I'm using Tokyo Cabinet in production currently, in the form of Tyrant, but we'll get to that in just a second. This is how I describe Tokyo Cabinet. Data structures: it has arrays, it has hashes, it has structs. It basically has three different file formats, hash, B-tree, and table, plus array; if anyone wants a fixed-length file with fixed-length records, it's really good for that too. Communication: it's local. It's a straight library, that's it. Persistence: lifetime. It saves to disk, another process can access it, that kind of thing. And in terms of languages, it ships with C, Perl, Ruby, Java, Lua, and Python.
I mean, you can't ask for much more than a nice tool like that. Anyone using Cabinet in production? We've got one. All right, two in the back, awesome.

Tokyo Tyrant basically converts any Tokyo Cabinet database into a network server. So you have the exact same data structures, arrays, hashes, structs, all that good stuff, with network communication and lifetime persistence. Basically, Tokyo Tyrant just sits on top of the Tokyo Cabinet file and gives it a network interface.

Now, there are some pretty cool bonuses for using Tokyo Tyrant. One of them is compression. How many people, anybody using it? Well, there are only a few of us. We personally are using Tokyo Tyrant to store XML files, and one of the interesting things about XML is that it's text and compresses really easily. In Tokyo Tyrant, the values in your key-value store can be automatically compressed and decompressed with zlib, so with text you get something like an 80% saving. That's really useful.

The other one is that Tyrant fully understands the memcached protocol. Have you ever wanted to have your memcache persisted to disk? Raise your hand. One, two, three, four, okay. Shut down memcached, start up Tokyo Tyrant, you're done. That's all you have to do.

Another is that Tyrant actually has a full RESTful API. You can speak HTTP to it: with a GET request it'll respond with the value at that URL, and with a PUT request it'll put a value at that URL. It's very cool.

Well, you can get expiry with a Lua extension, right, and that's actually the sample they have on the website: how to do expiring keys. The Lua extension is very interesting, because in a Tyrant server you can have a Lua extension that is fired off when a particular request is made. I'm going to do some demos in a little bit, and I'll show you one exactly like that.

The other thing is that Tyrant has replication.
You can do master-master, master-slave, or master with multiple slaves, and I tried to break it by doing a master-master-master cycle. It didn't break, but I don't think it's supported. So it's a very solid system, and I fully recommend using it in production. And a slight pitch: I wrote a gem called tyrantmanager. If you have to run more than one Tyrant, check it out. That's it.

Okay, who's heard of Redis? All right, more than Tokyo. This is interesting. How many were at MountainWest? Okay, all of you saw Redis there, I assume. Redis literally calls itself a data structure server. It stores different data structures than Tokyo Tyrant: it has lists, hashes, and sets, and it'll also do your standard key-value pairs and increment and decrement of numbers, that kind of thing. But the really cool part is the lists and sets. That is where it really shines.

It's network-based, with its own protocol, and its persistence model is snapshot. That means that every so often Redis saves everything it has in memory to disk, and it does this asynchronously in the background. If your server does die, it'll recover from that, but you will have lost whatever happened in the window between the last save and the time it died. So it's a good, solid piece of work. And it has a slew of languages, all client libraries written for it: Ruby, Python, PHP, Erlang, Tcl, Perl, Lua, and Java. So this is a pretty good piece of work right here.

Redis also has some bonuses. It has replication: it'll do master-master or master-slave replication, so you can stream data from one server to another. It actually has server-to-server data movement.
You can tell server A, "Take this record and move it to server B," and you don't have to get the value yourself and then put it on server B. You just tell server A to put that value on server B, which I think is very, very cool.

The one I'll be demoing is in-server set operations. Redis has all these set values, so you can say, "I've got these values in set A and these values in set B; Redis, tell me what the intersection of A and B is and return it to me as a list." It's a niche area, but this is the only thing I know of that will actually do that in a network realm. It also has in-server sorting: if you have a list, you can say, "Give me the values from this list, but return them to me sorted," so you don't have to do it on your side. And then, as I said before, there are the asynchronous snapshots of memory.

This next one, I think there's only one person in this room besides me who may have heard of it. Anyone heard of libjlog? Okay. I think libjlog is a great tool. I unfortunately have yet to use it in an actual application, but I think it's a great concept. It is nothing more than a library for doing publish-subscribe on disk between processes. It's really cool. It is used in production today, not by me, but by the guys at OmniTI; they have it. Communication is local; it is strictly a library. You open it up and say, "Here's my queue, here's my subscription," another process can open it and say, "Here's my subscription," one can publish, the other can subscribe, and it cleans up the disk as it goes along. It's a pretty cool thing. It does have lifetime persistence.
Everything is on disk. There's nothing in memory other than the operation of the library itself. It has C, Perl, and PHP bindings right now, and I'm currently working on the Ruby one. The bonus here is that we get actual publish-subscribe behavior: if five different people subscribe to a queue and I publish once, all five get a copy. So it's that type of operation.

Okay, beanstalkd. How many people have heard of beanstalkd? All right, I know a few people here are actually using it in production. This is another one of my all-time favorites. In fact, so much so that I maintain the package for Fedora. If anyone needs it on Fedora or CentOS, it's there; just yum install and you're good to go.

The data structure beanstalkd uses is a queue. Straight-up queue, that's it. It has network communication, and there is no persistence. So in this realm it's essentially what memcached is, but instead of a hash it's a queue: you have memcached for hashes and beanstalkd for queues. It doesn't have persistence now, but the next minor version release will include it, so that's on the way. If you want to experiment with it, it's currently in the source code; just the option to turn it on is commented out. I think it'll be a -d option on the command line, and only the use of the -d is commented out in the code.

Now, the really cool bonus is that it's not just a queue, it's a job queue. You have someone pushing jobs onto the queue and many workers pulling jobs off the queue. This isn't publish-subscribe, it's a straight queue: when one worker grabs an item, they're the only one that gets it. But it is job queue behavior, so whoever says they're going to do the job also has to say that they've done it. So you reserve a job, and when you're done with it you tell beanstalk,
"I'm done with it," and you delete the job. It is strict job queue behavior: if the worker who reserved the job fails and doesn't do anything, the job gets reinserted back into the queue. So this is a great tool, an excellent tool.

Another one: ZeroMQ. Has anyone heard of this one? Aha, yes, a new one for everyone. This has great, great potential. A large majority of you are familiar with Git, and Git has this concept of the plumbing and the porcelain. I consider ZeroMQ the plumbing for any type of messaging system you want to build: publish-subscribe, central broker, any of these different ways of doing it. All the plumbing is there for you to implement it yourself. It's at zeromq.org. They say they are the "fastest messaging ever." Its data structure is a queue, but however you want to add additional attributes to queues, you're free to do that in whatever way you want. The communication is network; it'll do both brokered and, you know, multicast, different things along those lines. I haven't played with this one as much. It does ship with all of these languages from the vendor: C, C++, and hey, COBOL, you're taken care of right here. You're all good.
It is the only one I've found so far that really has a good Mono implementation, and it comes with a Common Language Runtime piece. Fortran: anyone using Fortran? All right, well, you can use ZeroMQ with Fortran if you like. A couple of other interesting things: it also speaks AMQP, so you can have it be an interface between your queuing system and an AMQP queuing system. It'll just sit there in the middle, you can send messages back and forth, and it'll take care of the exchange.

The persistence on it, just in the past couple of versions, has become lifetime. I wouldn't really call it lifetime; it's the lifetime of the usefulness of the message. What happens is that you can set a high-water mark for the amount of data that can flow into ZeroMQ, and if it crosses that high-water mark it'll start spilling to disk, so you never actually cross a memory barrier. Then, when your current in-process messages have been drained, it'll start reading them back off disk and into memory. So you're not going to blow out the memory on your system with this one; it'll spool to disk. beanstalk will go until you run out of memory. I put three million jobs of not insignificant size on beanstalk once, and it couldn't really work through them, because the processes couldn't get enough memory to actually pull the jobs off the queue. So that was a little fun. But yeah, lifetime of the usefulness of the message is basically what its persistence is.

Now, the bonuses; I've probably gone over some of these already. You can implement your own messaging models however you like. It has white papers, examples, documentation out the wazoo, so feel free to check it out.
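
The high-water-mark behavior described above, where messages spill to disk past a limit and are read back once the in-memory ones are drained, can be modeled in a few lines. This is a minimal sketch in plain Ruby and not ZeroMQ's API or implementation: the `SpoolingQueue` class, its method names, and the use of a temp file with one message per line are all assumptions made for illustration.

```ruby
require "tempfile"

# Illustrative model of high-water-mark spooling. Not ZeroMQ code.
# Assumption: messages are single-line strings.
class SpoolingQueue
  def initialize(high_water_mark)
    @hwm = high_water_mark
    @memory = []                       # messages currently held in RAM
    @spool = Tempfile.new("spool")     # overflow file, one message per line
    @spooled = 0                       # number of messages sitting on disk
  end

  def push(msg)
    # Once anything is on disk, new messages must follow it to keep FIFO order.
    if @spooled.zero? && @memory.size < @hwm
      @memory << msg
    else
      @spool.puts(msg)
      @spool.flush
      @spooled += 1
    end
  end

  def pop
    refill if @memory.empty? && @spooled > 0
    @memory.shift
  end

  def in_memory
    @memory.size
  end

  def on_disk
    @spooled
  end

  def close
    @spool.close!                      # close and delete the temp file
  end

  private

  # Memory has been drained: read up to one high-water mark's worth of
  # messages back from disk and rewrite whatever is left over.
  def refill
    @spool.rewind
    lines = @spool.each_line.map(&:chomp)
    @memory.concat(lines.shift(@hwm))
    @spool.rewind
    @spool.truncate(0)
    lines.each { |line| @spool.puts(line) }
    @spool.flush
    @spooled = lines.size
  end
end

queue = SpoolingQueue.new(2)
%w[a b c d e].each { |m| queue.push(m) }
puts "in memory: #{queue.in_memory}, on disk: #{queue.on_disk}"
queue.close
```

The memory footprint stays bounded by the high-water mark no matter how many messages arrive, which is the contrast the talk draws with the three-million-job beanstalk experience.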
Give it a good look. I'm still working on putting together a couple of really good examples, so this is one I won't be demoing today. And as for "fastest messaging ever," they do have one demo program for Ruby that does a performance test, with one- or two-byte messages, and it's some god-awful number of millions of messages per second.

So those are the quick ones I'm talking about today, the ones I really like. There are a lot more. MongoDB: there's a talk on that tomorrow, and I'm going to it. NMDB is a network database. There's a whole lot of this stuff. Anyone familiar with the term NoSQL now? Yeah, there are actually mailing lists, and there was a conference earlier this year. There are a lot of these different ones, and I consider all of them cross-language.

How many people know about Ehcache? Yeah, big in Java. I think they just got purchased by Voldemort or Spring or somebody like that, I forget. It's used with Hibernate, yeah, but you can use it with anything. It has a RESTful API, it's been around a long time, and it's very, very stable.

Flare is a new one on the market. It uses Tokyo Cabinet under the covers, but it does its own thing on top: it's for when you want sharded data, and Flare will actually spin up new shards on its own. I wouldn't say it's really super stable yet; that's one I'm looking at. Cassandra: who's heard of Cassandra? A few. This is the one from Facebook, right? Yeah, Facebook. It's getting some press lately. CouchDB: everybody likes CouchDB. Yeah. NetCDF: who's heard of that, or HDF5? Okay, so, really old-school technology.
It's been around a very long time, but when you need to deal with massive, massive quantities of data, it's probably got more experience doing so than any of these. Basically it's a data file format for storing self-describing data, so I'm probably going to look at it for a few different things here and there. Solr: who's used Solr before? All right. It's a cross-language storage tool. You can think of it as a column store; it's more than just a full-text search index. I use Solr a lot.

Did I miss any? Anybody got some favorites I haven't mentioned? Awesome. Well, if you do find some more, let me know, because I'm starting to collect all of these. It's fun to see all the stuff that's come around in the past year.

So now let's get on to some demos. I'm going to have to switch to mirrored display. We'll start with Tokyo Cabinet. The sample data I'm using is first and last names from the US Census, which gives us a nice little variety of data, and I have a tiny library to read some of it, just so you can see what it is. I have these things called data files, and it pulls them in. The files have the name, the frequency percent, the cumulative percent, and the rank of each name, which is either a male first name, a female first name, or a last name from the US Census. Oh, yeah, I'm sorry. Yes, I definitely need to increase the font size. Tell me when. All right. Okay, where did my tabs go?
Oh, yeah, that is kind of interesting. So this is Tokyo Cabinet, a quick little demo. I think probably the most underutilized file format in Tokyo Cabinet is the table file, which is essentially a key-value store where the value is a hash: basically a database with columns. Some of the cool stuff you can do with this is index a column. You can tell Tokyo Cabinet, "I want you to create an index on," in this case, "the name field," and then you can query for anything that has a name matching blah, and it'll be a super fast lookup because it's actually indexed.

So in this case we're just going to index all the last names. I'm using Rufus Tokyo, which I think is probably one of the better Tokyo Ruby libraries these days. We load up the last names; this here is a timing metric, just to show how fast it goes; and then for each name we store it in the table, and the record is a hash of those four fields. So we store them, pick a random one, and print it back out. It's something like 86,000 names, so it might take a couple of seconds. Yeah, it's all running locally. This is Tokyo Cabinet, which is purely local; it's just a straight library. So we've got 88,799 records, stored at 12,000 records a second. Not bad. And we picked a random last name. Anyone have that last name? Okay. And that's just one record out of the data.

So that's a quick example of how to use Tokyo Cabinet. Pretty much, you can use it as a hash, and depending on your database format the value is either a plain value or, with the table format, a hash. Feel free to interrupt me with questions at any time.

Next we're going to do Tokyo Tyrant, so I need to start the servers up. This is the tyrantmanager gem I mentioned. I've got two different Tyrants running, for two different demos. In this case, this is exactly the same demo.
Oh, I must be editing it elsewhere. So in this case it's pretty much the exact same code as the first time; we're just going to use men's first names this time. It's essentially the same code as for the local Tokyo Cabinet, but this time we're using a network server, which could be on any server we wanted. Here we're talking to localhost, port 1978. Tyrant uses port 1978 by default; I think it's the birth year of the author. So we're going to store all the male names and then pick a random one. We pretty much just swapped in the Tyrant table here in place of the Tokyo Cabinet table, and that's the only code change. In this case there are only 1,200 records, stored at 4,000 a second. So it's a third as fast, but it is over the network, and it's got to save to disk. And the random name is Warner. Anyone by the name of Warner here? Okay, oh well.

It depends on whether people are reading at the same time. If you're just talking raw insertion, I'm not sure what it would be, but it's at least comparable. I've heard a good number of stories of people using MySQL as just a key-value store, and it works out pretty well. It probably also depends on whether they're using an InnoDB or a MyISAM back end, and whether you're doing bulk inserts or individual inserts. Yeah, in this case, yeah, exactly, that's a good point.

I should also mention that the library I'm using here is Rufus Tokyo as well, and this is using the FFI side of Rufus Tokyo. It also has the Edo side, which wraps the actual Ruby bindings that ship with Tokyo Tyrant. So this is slower than optimal usage, but for most cases it's pretty good. 4,000 inserts a second.
I can deal with that.

The next one is a Lua demo. As I said before, with Tokyo Tyrant you can actually run Lua inside the server, and the way we do that is by calling an ext function. I'm actually running out of time shortly, so we're going to breeze through this real quick. I'm going to call the add function, give it a record, and insert it. In this case we're using female first names. And all the Lua function really does is this: when the function is given a key and a value, it gets the length of the name and creates a new key, which is "count-" plus the length of the name. So it just keeps a running total, essentially a histogram of name lengths, as separate keys in the database, managed by Lua. And there we go: it looks like the most common length of female first names in the United States is six characters. I think that's pretty cool stuff, where you can give it a record and say, "Hey, Lua, do this." You can call any function: if you write the function in Lua and put it in a file, you can invoke it remotely, and it can do pretty much whatever you like.

Let's do a Redis one real quick. That shouldn't take more than a minute. I'm just starting up a Redis server here. Okay. In this case we're going to do something slightly different: we're going to demo the set capabilities of Redis, the in-server set stuff. We're going to use all three sets of names, male first names, female first names, and common last names, and store each one under its own key: male names, female names, last names. The values are going to be the sets of names. So last names is going to have 88,000 members, male first names about a thousand, female first names 4,000. Then we're just going to look at all the possible intersections and have them
printed out. Okay, so it inserted 94,000 records at 9,000 records a second. And then it says there are 331 names in common between male and female first names in the US Census lists. Between female first names and last names there are 1,300 names in common. You know, a woman who worked for my dad was Kelly Kelly, so it happens. Maybe not always in the same person, but it happens. And there are a thousand names in common between male first names and common last names. So this is set operations in the server. I have a use for this. It may not be useful for everyone, but I think it's a pretty cool operation.

I think we have time for one more demo, or questions. Which would you guys prefer? Demo, okay. All right, this one's going to be beanstalk. This is a trivial one, I will admit. Let's see. Oh, this is the Redis log. Now we're going to do beanstalk.

We have a standard producer and consumer; we're going to store jobs and remove jobs. For beanstalk there's a great Ruby library, beanstalk-client. Just install it. We say, "I want to talk to a beanstalk server on this host," and create a queue on it, or in beanstalk terms a tube. beanstalk can have as many tubes as you like; just put them all in there. They exist as long as there's data in them or someone's talking to them; if nobody is talking on a tube and there's no data in it, it just disappears. So we iterate over all the last names and insert them into the queue. Pretty simple.

Then we've got a consumer, and it's just going to read them off. So here's what I described: we grab a job, we get the body of the job, and then we delete the job. If we didn't put this delete in, the jobs would never actually leave, because it's a job queue. It's not a standard queue.
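
The reserve-then-delete contract the consumer follows can be illustrated without a running server. The sketch below is a plain-Ruby, in-memory model and not the beanstalk-client API: the `JobQueue` class and its `put`, `reserve`, `delete`, and `release` methods are hypothetical names chosen to mirror the behavior described in the talk, and `release` stands in for what the server does on its own when a worker fails.

```ruby
# In-memory model of job-queue semantics. Not beanstalkd or beanstalk-client code.
class JobQueue
  Job = Struct.new(:id, :body)

  def initialize
    @next_id = 0
    @ready = []        # jobs waiting for a worker
    @reserved = {}     # jobs handed to a worker but not yet deleted, keyed by id
  end

  # Producer side: add a job and return its id.
  def put(body)
    @next_id += 1
    @ready << Job.new(@next_id, body)
    @next_id
  end

  # Consumer side: take the next job. Only this caller gets it, but the
  # queue still remembers it until delete is called.
  def reserve
    job = @ready.shift
    @reserved[job.id] = job if job
    job
  end

  # The worker says "I'm done with it". Returns false if the id is not reserved.
  def delete(id)
    !@reserved.delete(id).nil?
  end

  # Stand-in for a failed worker: the job goes back into the queue.
  # Appending at the end is a simplification made for this sketch.
  def release(id)
    job = @reserved.delete(id)
    @ready << job if job
    !job.nil?
  end

  def ready_count
    @ready.size
  end

  def reserved_count
    @reserved.size
  end
end

queue = JobQueue.new
%w[Smith Johnson Williams].each { |name| queue.put(name) }

job = queue.reserve
puts "working on #{job.body}"
queue.delete(job.id)             # without this, the job would never leave

failed = queue.reserve
queue.release(failed.id)         # simulated worker failure
puts "ready: #{queue.ready_count}, reserved: #{queue.reserved_count}"
```

A plain queue would forget a job the moment it was popped; keeping it in the reserved set until `delete` is what lets a failed job be handed to another worker.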
It's a job queue. And we're just going to print each one out as it goes. So I'm getting another terminal. All right, let's store them, and over here we'll run the consumer. So we're consuming, boom, boom, boom, over here; we're producing up there. I mean, beanstalk is so trivially simple. It's just a great, simple tool. Oh, it's done inserting all of them. This guy over here is just an infinite loop, and when it stops it's consumed them all. So that works.

That's all my demos, and I think I've got one minute left. Questions? Comments? Was it useful? All right.