The only thing I've added to Gem::Command is lazy loading. So when you start up gem, it doesn't have to load up everything; when you say gem install, then it'll load the install class for you. Gem::Command also gives you argument grouping. So for the help output, you can say, well, all these arguments are local/remote options, all these arguments are for version, et cetera. And there are also some argument helpers. So you can say, get one gem name, get one optional argument, get all gem names, and it'll go ahead and raise the appropriate errors for you, so you don't have to redo that across multiple classes. It also provides argument sharing via modules. So we've got a module for local/remote options, one for install and update options, and one for version and platform options, and for some of these, you can include only the ones you want. And finally, there's integrated help. So gem help will list all the commands that are active in RubyGems. And I think the niftiest feature is command completion. So since there's only one command that starts with i, you can type gem i to install gems instead of gem install. Now I'll review some important components in RubyGems that have changed. First up is the gem format. Originally, the format was YAML-based. The beginning was a Ruby self-install header, followed by a YAML gemspec, and then a YAML manifest, which listed all the files and their sizes. And after that came the file data. Each file was separated by dashes, and the contents were Base64 encoded and compressed. And these original gems, you can't install them by running the gem anymore, like you used to be able to, because the gem install interface has changed. The custom format was replaced with a tar/gzip format, which is a little more flexible and uses more standard tools.
So in the new format, a gem is a tar file, and inside that tar file is a data.tar.gz, which is all the files, and a YAML gemspec in metadata.gz. And when we added signed gems, it was really easy to go and extend this by just adding a signature for each of those files. How RubyGems integrates itself with Ruby has probably changed the most in RubyGems. Originally, you would use require_gem. So when you did a require_gem of some gem, it would add all the gem's paths to Ruby's load path, and if you had an autorequire set in your gemspec, it would automatically require that file for you. And this didn't work like require, and it's nicer to be able to just require stuff without caring about how it would be loaded, regardless of whether it's built in or it comes from a gem. So with library stubs, if you had an autorequire in your gem, RubyGems would install a stub for that autorequire, which, when you required it, would do a require_gem first, activating your gem, and then require the original file again, now that it was added to your load path. And this still had some problems. You could only have one active version, and if multiple gems had the same file, there could be conflicts there. So custom require replaced this. Custom require overrides Kernel#require. First it does a real require, searching the whole load path. If the file isn't there, it rescues the exception, activates a gem that has that file, and then runs the same process again. And now in Ruby 1.9, RubyGems is just built in, so you don't have to do a require 'rubygems' before requiring a file from a gem. So this is much more natural. The evolution of the remote index is long and painful, and some of you have been using RubyGems for a while, so you remember the bulk index update pain.
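That rescue-and-retry loop can be sketched in a few lines. This is a simplified illustration, not the actual RubyGems source (which handles many more edge cases and uses the alias gem_original_require); Gem::Specification.find_by_path is the modern lookup API, used here for clarity, and the alias name is deliberately different so we don't clobber the real one:

```ruby
# Simplified sketch of RubyGems' custom require.
module Kernel
  alias_method :plain_require, :require

  def require(path)
    plain_require path               # first, try the load path as-is
  rescue LoadError => load_error
    # No luck: find a gem that owns this file, activate it, retry.
    spec = Gem::Specification.find_by_path(path)
    raise load_error unless spec
    spec.activate                    # puts the gem's lib dir on $LOAD_PATH
    plain_require path               # retry now that the path is there
  end
end
```

The key property is that ordinary requires pay no extra cost; the gem activation machinery only runs when a plain require fails.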
So originally, RubyGems used one big YAML file, and that had every gemspec for every gem with every detail, including the file list, all the files that were in the gem. This was fine, because there weren't many gems way back in those days. But over time, RubyGems became more popular, and so that file got bigger and bigger and eventually got too big. And in order to install a gem, you needed to download this huge file to update your local cache. So incremental updates were added. And with incremental updates, RubyGems would fetch individual YAML files to update the local cache. So in theory, you'd have to download a few small files instead of a big file when updating the index. This still took a long time. So the format was switched to Marshal, and that helped a little bit, because the files were a little smaller, and persistent connections were added, and that helped a little bit because you didn't have to go and make a round trip to the server every time. But it still wasn't enough. The one file was still too big, because on the client side, you'd still have to load this giant file up to figure out what all the dependencies were to install a gem. For small virtual servers — a 128 megabyte virtual server was pretty popular back then — this would bring the virtual server to its knees, because it would take more than 128 megabytes of memory just to install a gem. So I replaced that with the modern index. The monolithic cache file was too big, and RubyGems needed to instead download only the files it needed, and then load just the gemspecs it needed on demand. This vastly reduced its memory footprint. So now with the modern index, when you type gem install, you download the latest specs file, and it contains the name, the version, and the platform of the latest version of every gem.
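As a rough sketch of what the client does with that file — assuming rubygems.org's layout, where latest_specs.4.8.gz is a gzipped, Marshal-ed array of [name, version, platform] triples — decoding looks like this. The sample entry below is made up so the sketch doesn't need a network connection:

```ruby
require 'rubygems'
require 'zlib'

# The client fetches latest_specs.4.8.gz (for example from
# https://rubygems.org/latest_specs.4.8.gz) and decodes it like this.
# Each entry is a [name, Gem::Version, platform] triple.
def decode_specs(gzipped_index)
  Marshal.load(Zlib.gunzip(gzipped_index))
end

# Round-trip a made-up entry instead of hitting the network:
sample = Zlib.gzip(Marshal.dump([['rake', Gem::Version.new('13.0.6'), 'ruby']]))
name, version, platform = decode_specs(sample).first
```

Because the payload is just name/version/platform triples rather than full gemspecs, it stays tiny even with many gems in the index.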
And then once it's got that, it can go and download that gem's spec, figure out the dependencies, and then continue to download the gemspecs just for the dependencies it needs. And this latest specs file is about 150 kilobytes, and it's kept small with some Marshal tricks. If you provide a version, it downloads the full specs file, which is just a little bit bigger, and the rest of the process is the same as an install without a version. So to recap, basically the problem with the index was a scaling problem. Originally we had that one big YAML file that was cached locally. That got too big and too hard to update, so we added the quick index and a gemspec file for every gem. That was too slow to download, so we added the Marshal file, which was smaller, and Marshal versions of the gemspecs, which were smaller. To reduce the number of downloads, we also added persistent connections. And then finally we replaced that with the two specs files, one for the latest gems, one for all the gems, and an individual spec file per gem. And the on-disk format is also now one cache file per remote file. So you only have to download a couple of them, and you only have to load one file at a time. That drove my first major contribution to RubyGems, which was in the installer. I wanted to have a tool that would go and download every gem, install it into a clean environment, run the tests for that gem, and then report that to a website. And in order to do this, I needed a way to automatically install all these gems in a sandbox. This was too hard with the original installer. When I started, there were two classes, the remote installer and the gem installer. The gem installer did the extracting of the gem file, put all the files in the right place, built the extensions, and installed all the executable stubs.
The remote installer would go and download the gem, figure out its dependencies, ask the user, hey, do you want to install this dependency?, then install the dependency, and finally install the gem. And the problem with this was it wasn't automatic enough. There was too much work for me to get a gem installed: it kept asking me if I wanted to install the dependencies, and I always wanted to install the dependencies. It also could sometimes install a dependency multiple times. So if you had a gem that had a dependency on rake, and another dependency that also had a dependency on rake, rake could be installed multiple times, because it wouldn't build a full dependency map first. When I finished, there were two installer classes, the gem installer and the gem dependency installer. The gem dependency installer worked a little bit differently. First it would go and resolve all of the dependencies, figuring out what the entire map is, and then install all of the gems. Also, the API was a little bit nicer in the way it interacted with the cache. Platforms are the way for RubyGems to deal with having alternate gems for people who don't have compilers. A platform works pretty similarly to a version. So x86-darwin-10 maps to 32-bit Ruby on OS X 10.6. universal-darwin-10 maps to any architecture that 10.6 supports. And universal-darwin maps to any architecture on any version of OS X. A platform is built this way to make it work similarly to a gem version. And when you're building with platforms, the best thing to do is use Luis Lavena's rake-compiler. This will go ahead and build a gem that can work on both 1.8 and 1.9 all in one package. If you're too lazy to set up rake-compiler, you can check the RUBY_VERSION constant.
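You can see that version-like matching directly with the Gem::Platform API; a quick illustration:

```ruby
require 'rubygems'

specific = Gem::Platform.new 'x86-darwin-10'       # 32-bit Ruby on OS X 10.6
any_cpu  = Gem::Platform.new 'universal-darwin-10' # any CPU on 10.6
any_osx  = Gem::Platform.new 'universal-darwin'    # any CPU, any OS X version

# Like versions, broader platforms match narrower ones:
any_cpu === specific  # a universal CPU matches x86
any_osx === specific  # a missing version matches any version
```

This is how a fat binary gem built for universal-darwin can be selected by an installer running on any particular Darwin machine.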
And if you're too lazy to go and build for both 1.8 and 1.9, you should set the required Ruby version to prevent the gem from installing on a version that you haven't built it for. Platforms are still not a perfect solution; there are still some problems. RubyGems prefers gems matching the platform, but many existing platform gems are 1.8 only. So if you go and try to use this on 1.9 — a lot of Windows people try — the gem just installs, but it doesn't actually work. The other problem is that RubyGems will fall back to the ruby platform, which may involve building an extension. This is a problem when the person who builds the platform gem is different from the person who releases the main one. So if the main version is updated, that goes and hides all the platform versions until new releases are made, for example, for Windows. So there are still several things that I'd like to fix in RubyGems that I think could be better. The first one is gem mirror. It's certainly important code, but not many people use it. It hasn't actually been worked on in years, and it's slow and bandwidth intensive when updating the gems. Instead, you should use rubygems-mirror by James Tucker. rubygems-mirror uses persistent connections to speed up the gem downloads. It also provides parallel fetching, so you can download multiple gems at once, and it uses the modern index rather than the old Marshal index. There's also gem test, which is also run by gem install -t. I don't think — actually, has anybody used gem install -t or gem test? Ah, you have used it. Yeah, I don't think it actually works very well. One of the problems is that now in Ruby, we've got a half dozen popular testing frameworks, and there's no consistent API across them. You can't easily figure out, hey, how do I run these tests, and did they run successfully? That's not something that's easy to do from RubyGems. So Erik Hollensbe is working on rubygems-test.
And so, one of the other things that the gem test command doesn't do is install the development dependencies. rubygems-test goes ahead and takes care of this for you and runs the right tests. It's still in development, so if you want to help out with this, you can go talk to him on the IRC channel. And fancy require is a feature that I developed, with inspiration from Nobu, to add lookup objects to Ruby's load path. So RubyGems would add a lookup object to the load path, and this would respond to a path_for method. Ruby would pass it the name of a file that's going to be required, and it would respond with one of three things: a file path, telling Ruby to go and load that file like normal; true or false, to say, hey, the load has already been taken care of and this was the status; or nil, saying to go on to the next item in the load path. And so with this, RubyGems could be encapsulated in this lookup object rather than sitting on top of require, which would reduce the memory footprint in Ruby 1.9. I'd also like the gemspec to be more strict about what it allows. Jeremy Hinegardner is going to be talking two talks from now, and he's going to have all the gory details on that. Basically, there's a lot of junk in gemspecs. Over time, because it's a Ruby format, you could pretty much add almost anything in Ruby to the gemspec, and previously RubyGems just didn't care. So it would be nice to have some better, cleaner metadata. Unfortunately, I don't know what the solution to this is, because it's hard to go and say, oh, you can't add that, but still support the old gems that have stuff that I'd like to avoid. So now on to RDoc. RDoc was originally written by Dave Thomas, and Dave Thomas wanted a set of tools to create documentation from the source for Ruby and for Ruby projects.
And I think he's absolutely succeeded with this, because pretty much every project has at least some documentation in the RDoc format. But when I asked him about this as I was researching my talk, he said, "I am surprised and terrified that RDoc lasted as long as it did." So my first contribution to RDoc was an ri output formatter for an IRC bot, so you could look up ri documentation with an IRC bot. And then I also added gem ri paths, so that the ri tool can look up the ri data for gems. So, like with RubyGems, I'll give a rundown of the RDoc releases. RDoc 1.0.1 was added to Ruby 1.8.1, and much of the development of the original version of RDoc was done in ruby trunk, or the 1.8 branch, without any separate releases. When I took over RDoc releases, I bumped the version to 2. In version 2, I replaced the page templating with ERB and moved everything into the RDoc namespace. I also added Ryan Davis's ri cache to speed up lookup of ri methods. In RDoc 2.1, I added support for metaprogrammed methods like attr_accessor. So you could document those and have them display in ri or the HTML output. I also added an ancestor lookup to ri. So if you looked up File.read, it would display IO.read — but I don't think that exact example would work, because there was another bug in RDoc where File wouldn't show up as a subclass of IO. In RDoc 2.2, we improved the ri cache and added interactive ri. In 2.3, Michael Granger's Darkfish generator was added, along with some RDoc generator speedups, such as threading, and RDoc discovery for plugins, to look up additional generators or templates. RDoc 2.4 removed the original HTML generator and the XML generator, because I didn't need two HTML generators to maintain, and nobody used the XML generator.
And finally, RDoc 2.5 added a new ri data format and removed the threading, because there were problems with visibility across multiple classes when threading was used. So, some components of RDoc. First, the parser. The parser goes and converts the code into a tree of objects representing the project. The Ruby parser is based on irb and walks a token stream. The C parser is based on regular expressions and just kind of greps around in the file looking for pieces of information. The tree that the parser builds is composed of RDoc code objects, and there's a subclass for each Ruby construct. So there's one for classes, and one each for modules, methods, attributes, files, constants, and requires — I think those are all of them. And then the parser goes and constructs a graph of these objects. So a file will contain a class, that class contains methods, and then the method will go and link back to the file where it was defined. And the generators go and take this code object graph and turn it into some kind of output. So RDoc generates HTML using Darkfish, generates ri data, and there's also a work-in-progress generator for the tags file format, for use in an editor. There's also an old Microsoft CHM generator which is still floating around out there; I'm not sure it works anymore. RDoc markup provides the block and inline formatting using a plain-text format, and for anything fancy you can embed HTML, if you wanted to add a table, say. And finally, ri is RDoc's command-line documentation tool. So it outputs several formats, including plain text and fancy colorized text, and you can also get HTML out of it. So I think the best feature in RDoc is the code object tree. The parsers build this graph of RDoc code objects which represents the entire project. So the various classes involved can provide full introspection, so the generators can more easily generate documentation.
You can also use it as a data source you can manipulate into a new format, like the HTML generator or the tags file outputter. The code object tree provides rich interaction. So, for example, the methods and attributes are sortable — they're separate classes, but they're all sortable with each other. It's easy to create new generators or build tests for some higher-order behaviors, and there's also pretty printing baked in, so when I'm writing some tests and I'm learning some new thing, if it fails, it'll go and show me pretty output that I can actually look at and figure out what's going on, instead of things I'd just have to inspect, which is kind of hard to read. So in RDoc, I replaced more code than in RubyGems, which was largely refactoring. First up is RDoc markup. The original RDoc markup used a regexp parser for text blocks. So you had blocks for paragraphs and lists and verbatim sections, and then the inline markup for things like emphasis or cross-references was separate. And the original parser used regular expressions to go and figure out stuff like indented blocks, like the slide here. And the state was kept separate from the regular expressions, which was a little bit hard for me to maintain and follow. So I replaced this with a tokenizer and a recursive descent parser. So it parses the input text and builds a markup tree, and the inline markup is still separate. And when I wrote this, I battle tested my new parser against all the gems using gauntlet, to make sure that I didn't have any bugs that I didn't know about from the tests. And then once you have a tree, you can use the visitor pattern to go and output text. So currently there's a visitor for RDoc text output, which is pretty much a mirror of the original; backspace text for paged output, because most pagers don't support ANSI codes;
ANSI text for colorized output; HTML; and an extra class for HTML with cross-reference links, for Darkfish. Like the gem index, ri suffered from a scaling problem as well. Originally there was just one ri data directory, and then with gems, you added an ri data directory per gem. And for every method and class, there was a file for that documentation, which was YAML. And so if you wanted to look up some method with ri, it would go and look through all of these data directories, trying to walk and stat all the files, asking, hey, is this method here? And once ancestor lookup was added, it got worse, because then you'd have to go and look up the class across all of those directories and then go and look up the method again. So you ended up doing a lot of file stats across the filesystem, which was really slow. There was also a separate formatter for output, so the HTML generator and ri didn't share any of that code. For the first pass at improving ri, largely done by Ryan Davis, the YAML objects were replaced with hashes. So ri would load up a YAML object originally, and now it just loaded a nested hash. Also, a cache of file locations was added, but given the nature of the original ri code, this was a bit difficult. For the second pass, the Marshal format was added. So we store the RDoc code objects directly, store the RDoc markup tree in there for the comments, and also reuse the visitor code. There are also new indexes included to speed up lookups, but there's not a complete cache across all the gems like the original format had. None of the original RDoc generators remain. The XML and CHM generators have been removed from RDoc, as they were unused and unmaintained. The CHM generator is still floating around if anybody wants to resurrect it, though. The original HTML generator used a class called TemplatePage to generate output.
This consumed a JSON-like hash and was very simple: each and if were the only looping constructs — or the only constructs, really — in it. It would also convert the RDoc code object tree into a huge nested hash, which lost all the richness of the Ruby objects. By later 1.8 releases, it would take about two gigabytes to go and generate all the HTML documentation. So to improve the HTML generator, first I switched to ERB, to make writing templates easier, but we still had this code object transformation. And then Michael Granger came in and wrote Darkfish, which I converted to walk the code object tree directly, meaning there was no extra tree to build, making it more memory friendly. Incidentally, one of Koichi's students, Tetsu So, wrote a memory profiler that found the cause of the memory consumption in RDoc's original HTML generator — but this was after the Darkfish switch. The original ri generator translated the code object tree into a separate tree of ri description objects, which were stored in the YAML format. And there was also no index to speed up any of the lookups. The new ri uses the code object tree with RDoc markup comments, and it has an index per directory. So we can easily go and see all the classes and methods and the ancestor chain for all the gems right off the bat, instead of sitting around walking through a bunch of files. So there are still some things I'd like to change in RDoc — this slide may look familiar. The Ruby parser is based on irb, which is rather hard to change. I added multibyte support to it, and that was pretty difficult. I've also added features to the C parser, which was not very fun. And my experience with RDoc markup and Ruby2Ruby tells me that it's much easier to walk a syntax tree than to interpret a token stream or to grep around in a file. So I'd like to switch to something like Ripper and cast.
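As a taste of why tree walking is nicer, here's a toy sketch using Ripper (which ships with Ruby 1.9 and later) to pull method names out of a snippet. It's nowhere near a real documentation parser, but the nested-array tree makes the job a few lines instead of a token-state machine:

```ruby
require 'ripper'

# Collect the names of instance methods defined in some Ruby source by
# walking Ripper's nested-array syntax tree.
def def_names(node, found = [])
  return found unless node.is_a?(Array)
  # A method definition looks like [:def, [:@ident, "name", [line, col]], params, body]
  found << node[1][1] if node[0] == :def
  node.each { |child| def_names(child, found) }
  found
end

source = <<~RUBY
  class Foo
    def bar; end
    def baz(x); x; end
  end
RUBY

def_names(Ripper.sexp(source))  # => ["bar", "baz"]
```

A real parser would also handle :defs (singleton methods), attributes, constants, and comments, but each is just another node shape to match against.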
Also, encoding support is actually done, but it's not actually released yet. The released versions of RDoc assume a single encoding across all the files; trunk can transcode all the input documents to one output encoding that you specify. I haven't battle tested this yet, which is why it hasn't been released — I'd like to make sure that I don't break any gems. I have some plugin support in RDoc, but I don't think I have enough yet. You can add generators, and you can add hook directives to do custom stuff. But I don't yet have a way of adding command-line options to a generator. And I'm sure that there are probably some plugin points I should add to ri, but I'm not sure what they should be. So what have I learned from RubyGems and RDoc? The first thing is to have a good project with a good API. And a good API has a clean separation of concerns. So, for example, the dependency installer should only be involved in figuring out what dependencies to install; the caching is separate, and the installing is separate from it. So your classes should all do one thing, and they should do that one thing really well. There should be rich interaction amongst your objects. So, are you gonna wanna sort your objects? Well, then you should add spaceship — spaceship and Comparable. You're gonna wanna enumerate your objects? Provide each and include Enumerable. And you're gonna use your objects as hash keys? Define hash and eql?. So refactor your methods aggressively, refactor your classes aggressively. Make sure you shave off any rough edges and encapsulate any state that's shared across multiple objects down into one place. This makes your API easier to use. To get there, you're gonna wanna have good tests for everything. They help you refactor. They also teach you about your pain points. If you're not sure how to go and find these out, build some other thing with your API that's not whatever your main thing is.
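Backing up a moment, the object-protocol advice above — spaceship for sorting, each for enumeration, hash and eql? for hash keys — looks like this in practice. A generic toy example, not actual RubyGems code:

```ruby
# A toy version-like class showing the protocols mentioned above.
class Release
  include Comparable
  attr_reader :name, :number

  def initialize(name, number)
    @name = name
    @number = number
  end

  # Spaceship plus Comparable gives you <, <=, ==, >, sort, min, max...
  def <=>(other)
    [name, number] <=> [other.name, other.number]
  end

  # hash and eql? let instances work as Hash keys.
  def hash
    [name, number].hash
  end

  def eql?(other)
    self.class == other.class && self == other
  end
end

# A class that yields its members from each gets all of Enumerable for free.
class ReleaseSet
  include Enumerable

  def initialize(releases)
    @releases = releases
  end

  def each(&block)
    @releases.each(&block)
  end
end

set = ReleaseSet.new [Release.new('rake', 2), Release.new('rake', 1)]
set.sort.map(&:number)  # => [1, 2]
```

Defining just the two core methods (spaceship, each) and mixing in the standard modules is what gives you the rich interaction for free.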
In this case, for the installers, it was Fire Brigade, which taught me how difficult it was to actually use the installer classes. And then once you've done this, you can go back and refactor the original implementation. And then once your software is out there and successful, people will be using it in ways you don't expect. So if you can, go and battle test it. You know, tests are good, but they're only your ideas about how to use the code. They're not everybody else's ideas. So you can only make it so idiot-proof. For RDoc and RubyGems, I can use gauntlet, which can harvest the wisdom of many idiots. When I replaced the RDoc markup parser, I ran gauntlet and found several crashing bugs and infinite loop bugs in my parser, even though I had all of Dave's original tests. So in this case, I was the idiot — and since I was able to battle test, nobody noticed. You know, gauntlet has a narrow focus. It's only for gems. It does this process: it downloads every latest gem, converts it to a tar.gz, and then runs a command against every gem. That's all it does. For example, at work at AT&T, we do a lot of Solr stuff. So in order to battle test our Solr parsing code, we'll do dictionary searches against the Solr server, and then go and run the results through our Solr parsing code. And you may even have some scaling problems after a while, and these can be hard to fix, especially when you have only a few tests. RDoc and RubyGems had some poor test coverage, so make sure you go and write more tests to figure out how the code works. When there are many moving parts, it can be kind of hard to figure out, hey, is it supposed to work this way? So tests can help you figure that out. And really, what this is doing is helping you familiarize yourself with the code, so you can test it.
So once you've tested it, that'll give you an idea of why it works the way it does, and once you've learned how it works, then you can make small improvements. And if those small improvements don't do it, then you can replace it completely. This is how I improved ri and the RubyGems remote index. And you know, be sure to examine the problem closely. What do you really need to fix here? Will a simple improvement do it? Will simpler code do it? If none of those things will work, then finally, replace it — try that last. Replacing it first means you may not replace it with a good enough thing. And finally, less is more. Having less in your project is much better. When there's less code in your project, it's easier to coordinate amongst busy developers. So for that, RDoc and RubyGems both provide plugins, which make it easier for many people to work on what they want to work on, rather than having me maintain everything. Plugins give you more flexibility. So if James Tucker wants to make a release of rubygems-mirror, or Erik Hollensbe wants to make a new release of rubygems-test, they don't have to worry about me making some unrelated feature in RubyGems work. They can just go and do it. And with this, more ideas can be explored — for example, there's gem sing, which will sing the indentation of your code, and, you know, Gemcutter really started out as a plugin. And so this ends up with less work on your original code base, because everybody can work on their own happy thing. But plugins can be hard to do right, so be very careful when you start. Think carefully about when you load the plugins and what they do. In RubyGems, I made a mistake with the way the plugins work, because they're loaded all the time. So even when you're not going to use a command plugin, it gets loaded. Also, be careful about API lock-in.
If you do it wrong, then you might get stuck supporting something you don't want to. Thank you, that's the end. Any questions? So I recently had occasion to need all of the gem quick specs in advance for my own mirror, and rubygems-mirror doesn't do that, so I wrote a thing that does. But in order to do it, I needed the deflated Marshal index, which I understand is going away over time. My understanding is that's the only place you can go and ask what all the versions of all the gems are, all at once. Is that still okay, since — So Wilson was asking about the list of every gem and all the gemspecs. And currently, the Marshal.4.8.Z file will be staying around for the foreseeable future. It's pretty expensive to generate, so I believe it's only generated once a day. Shane? Where do we go to help? So Shane asked, where do we go to help? RubyGems is on GitHub now. Actually, I've got — RubyGems, here we go. There, here's where you go to help. In the back. Are there any features planned for the future? So, are there any features planned for the future? Well, my talk is about the history, so I didn't go over many of those, other than my to-do items. For RDoc, I've got a release almost ready to go; I'm trying to figure out how to make command-line options work better. For RubyGems, we're looking at allowing arbitrary metadata in the gemspecs, but I'm kind of nervous about that because of the potential for abuse. So we've got a couple things planned, but not so much for RubyGems. Yes? I just wanted to say thanks. Well, you're welcome. Any comments on Yard? Any comments on Yard? Yard, Yard — competitors, how rude. Yes, sorry, YARD. For whatever reason, the YARD maintainer has never spoken to me about much of anything, so I haven't really looked at YARD much here.
I do know it uses Ripper, so I may go and look at it to see how it uses Ripper and what it does with it. I think that most of what YARD does could be added as extensions to RDoc, and I don't see any reason not to want that. But as to why he hasn't asked me about that, I'm not sure. I don't want to add those features to RDoc myself, because they're not my preferred style, and it's hard to maintain something that's not your passion. Any more? All right, I believe that's it. Thank you.