Hi everybody, I'm Jeremy. My talk today is Ruby Techniques by Example. I'm aiming for a slower pace than yesterday's lightning talk, so if I am going too fast, please tell me to slow down. This presentation will take you behind the scenes of production Ruby code and show you techniques that you can use in your own code.
A great Ruby programmer told us last year that to become better Ruby programmers, we should read more code. For those of you who weren't with us last year, that's James Edward Gray, who ran the Ruby Quiz for a long time. I don't know about you, but personally I find that reading code is often boring, and I think the reason is that the signal-to-noise ratio of most code is quite low, at least in terms of learning from the code. I don't mean that most code serves no purpose, but I do think the majority of code that you read is not going to teach you new things. I'm not suggesting reading code is a bad idea. However, I do think that it may not be the best use of your time, and that's where this presentation comes in. I've read over 10,000 lines of code so that you don't have to, and have chosen to highlight only those techniques that I think are interesting.
The first technique I'd like to discuss deals with creating more easily extensible classes. By extensible, I mean classes that are easy for other users to extend in any way that they see fit. I'd like to give a warning that the rest of the presentation is very code heavy. I'll try to give you enough time to read the code on the slides before I talk about it, but if you want more time, please just speak up.
Consider this quite common way of adding class and instance methods to a class. For most classes, this is perfectly fine, but it is not the most extensible way to add the methods to the class. The problem with this way is that it's difficult for someone to override one of the class or instance methods that you have defined and call super to get the default behavior. Yehuda touched on this near the end of his presentation yesterday, so I'll try to go into a little bit more detail. You can have users subclass your class in order to override your methods, but that will not affect all instances of the class, and the user may want to affect instances they are not themselves creating. The goal here is to allow users to override the methods that you have defined, but still call super to get the default behavior, and have the overriding affect all instances of the class.
Now here's one solution that unfortunately does not work. When you define methods in a class like this, you're defining them directly on the class or the class's singleton class. Class methods defined this way cannot be overridden by modules, and instance methods defined this way can only be overridden by modules on a per-instance basis. You cannot override the instance methods for all instances at once, and this is due to how method lookup works in Ruby.
I'm going to attempt to explain Ruby's method lookup quickly, with some simplification, imprecision, and inaccuracy. Basically, when a method is called on any Ruby object, Ruby first looks in the object's singleton class and then in any modules included in that singleton class, in reverse order of inclusion. If the method hasn't been found, or the found method calls super, it substitutes the singleton class's superclass for the singleton class and restarts the lookup. Given the lookup process that I've described, there are three interesting cases. For classes, the superclass of the singleton class is the singleton class of the class's superclass, unless the current class is subclassed directly from Object, in which case the superclass of the singleton class is the Class class. For other objects, the superclass of the singleton class is just the object's class itself. Now, here's an example of the method lookup process for a class method of Bignum.
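The lookup order described above can be inspected directly with singleton_class and ancestors. The following is a minimal sketch, not the slide from the talk: Base, Mid, Leaf, and Ext are hypothetical names for a three-level class hierarchy with a module extended into the middle class, and the sketch assumes Ruby 2.1 or later, where the ancestors of a singleton class include the singleton classes themselves. Current Ruby versions also list entries that the simplified description skips, such as the singleton classes of Object and BasicObject.

```ruby
module Ext; end            # hypothetical module
class Base; end            # subclassed directly from Object
class Mid < Base; end
class Leaf < Mid; end

Mid.extend(Ext)            # includes Ext in Mid's singleton class

# Where Ruby looks for a class method called on Leaf, in lookup order.
CLASS_METHOD_CHAIN = Leaf.singleton_class.ancestors
puts CLASS_METHOD_CHAIN.inspect
```

The printed chain should start with Leaf's singleton class, then Mid's singleton class, then Ext, then Base's singleton class, and further along reach Class, Module, Object, and Kernel in that relative order.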
Let's extend the Integer class with a new module, which includes the module in Integer's singleton class. Ruby tries the singleton classes of Bignum and Integer, then the module that extends Integer, and then the singleton class of Numeric. Since Numeric's superclass is Object, the superclass of Numeric's singleton class is Class, so it tries that next, followed by Module, Object, and finally Kernel, a module included in Object.
Now, here's the method lookup process for an instance of File extended with a module. Ruby will first look at the singleton class of that object, followed by the module that you extended the object with, followed by the File class, followed by the ancestors of File, which are class IO, module File::Constants, which is included in IO, module Enumerable, which is included in IO, class Object, and finally Kernel.
Now that we've finished the digression on Ruby's method lookup, let's get back to extensible design. If your initial class definition defines all of your class and instance methods in modules, then future modules can extend or be included in the class, and they can call super to call the definition of the method in the most recently included or extended module. This design approach works very well, but the simplistic approach shown here is a tad verbose. You can structure your extensions so that an outside module encloses both the class method and instance method modules, and then you can make extending your classes as easy as a simple method call inside the class. Implementing this is actually fairly simple. Here's a simplified version of Sequel::Model's plugin method. You just need to check if the submodules are present and extend or include them in the class if so. Now, you could accomplish the same thing using the extended method of the class methods module or the included method of the instance methods module.
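As a rough illustration of the plugin idea described above, here is a minimal sketch. It is not Sequel's code, and Sequel's actual method does more than this. Model, Contact, PluginOne, PluginTwo, and PluginThree are hypothetical names, and the sketch assumes each plugin is a module that may contain ClassMethods and InstanceMethods submodules.

```ruby
class Model
  # Extend or include whichever submodules the plugin actually defines.
  def self.plugin(mod)
    extend(mod::ClassMethods) if mod.const_defined?(:ClassMethods, false)
    include(mod::InstanceMethods) if mod.const_defined?(:InstanceMethods, false)
  end
end

module PluginOne                       # holds the default behavior
  module ClassMethods
    def all; [:one]; end
  end
  module InstanceMethods
    def label; "one"; end
  end
end

module PluginTwo                       # class methods only
  module ClassMethods
    def all; [:two] + super; end
  end
end

module PluginThree                     # overrides both kinds of method
  module ClassMethods
    def all; [:three] + super; end
  end
  module InstanceMethods
    def label; "three/" + super; end
  end
end

class Contact < Model
  plugin PluginOne
  plugin PluginTwo
  plugin PluginThree
end

puts Contact.all.inspect
puts Contact.new.label
```

Because every method lives in a module, the most recently added plugin is found first and each call to super reaches the plugin added before it.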
But some plugins may have only class methods and some plugins may have only instance methods, which is why a separate method is a superior solution. With your plugin method set up like that, you can allow the user to extend the class with as many plugins as they want. If all three plugins define the all class method and each calls super, then when the user calls Person.all, it will call the version in plugin three, then plugin two, and finally plugin one. This design approach makes it possible for extensions to have complete control to override any part of the class's behavior, while making it easy for the user to extend the class with any combination of extensions for their own use.
The second technique I'd like to discuss deals with handling class-level data in inheritance hierarchies. I'm going to go over three different approaches, which I'm going to refer to as the class variable approach, the delegating approach, and the copying approach.
This is the class variable approach, and you will rarely see it used, as it's basically unworkable when changes in subclasses need to be independent. With this code, setting the class variable inside Customer affects Person and Employee as well, since class variables are shared throughout an inheritance hierarchy. Class variables should generally be avoided and are definitely not appropriate for situations where parts of the hierarchy are independent.
Here's a simplified version of the delegating approach, pulled from ActiveSupport. With the delegating approach, whenever the lookup of a class instance variable is requested, it looks in its own class. If it has not been defined there, it tries its superclass, continuing up the hierarchy until the class instance variable is defined or the top-level class is reached.
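The delegating lookup just described can be sketched in a few lines. This is an illustration only, not ActiveSupport's implementation: the module name SuperclassDelegating and the table attribute are assumptions made for the example, and the methods are defined with define_singleton_method rather than a string eval.

```ruby
module SuperclassDelegating
  # Defines a class-level reader that walks up the hierarchy, plus a writer
  # that only ever sets the value on the receiving class.
  def superclass_delegating_accessor(name)
    ivar = :"@#{name}"
    define_singleton_method(name) do
      klass = self
      # Climb until some class has the variable, or we run out of classes.
      klass = klass.superclass until klass.nil? || klass.instance_variable_defined?(ivar)
      klass && klass.instance_variable_get(ivar)
    end
    define_singleton_method(:"#{name}=") do |value|
      instance_variable_set(ivar, value)
    end
  end
end

class Person
  extend SuperclassDelegating
  superclass_delegating_accessor :table
  self.table = :people
end

class Employee < Person; end
class Customer < Person; end

Customer.table = :customers   # affects Customer only
puts Employee.table.inspect   # found on Person
puts Customer.table.inspect   # found on Customer itself
```

Every read on a subclass that has not set its own value walks the superclass chain, which is where the speed cost of this approach comes from.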
The possible advantage to this approach is that if you create a subclass and you later modify the value of the class instance variable in the superclass, the subclass will see the changed value unless it has been overridden in the subclass itself. The disadvantage is speed, as lookups can be significantly slower, especially for deep hierarchies.
This example of the copying approach is pulled from Sequel's force_encoding plugin, which is similar to the approach of other Sequel plugins. In Ruby, when a subclass is created, the superclass's inherited method is called with the subclass as an argument. So you can override the inherited method in your class, call super, and then set values in the subclass. In this case, the value of forced_encoding is copied from the superclass to the subclass. The advantage is that lookups in the subclass are just as fast as in the superclass. The possible disadvantage is that modifications to the superclass after subclass creation are not copied to the subclass.
The next technique relates to safety, in particular, safely defining methods via metaprogramming. I'm going to pick on the delegating example I showed earlier as an example of unsafe metaprogramming. In general, any time you use eval with a string that's created by interpolating arguments, you need to be sure that the string created is valid Ruby code. For most cases, this will work just fine, but take a few seconds to look at this method and see if you can spot some safety issues.
Let's consider what happens if name contains a character that is valid in a method name but not valid in a literal, such as a space. In Ruby, it's perfectly valid to have a space inside a symbol and inside a method name, but the Ruby code produced by superclass_delegating_reader will not be valid. In this case, the string eval will raise an error.
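To make the failure concrete, here is a small sketch of a string eval that interpolates a name containing a space. The two helper methods are hypothetical and are not the ActiveSupport code under discussion. The first interpolates the name into the method body as well, which makes the generated source invalid. The second uses a body that does not mention the name, so the generated source parses, but not as the method that was intended.

```ruby
# Builds a reader by interpolating the name into both the def line and the body.
def string_eval_reader(klass, name)
  klass.class_eval("def #{name}; @#{name}; end", __FILE__, __LINE__)
end

# Same def line, but the body does not depend on the name.
def string_eval_constant(klass, name)
  klass.class_eval("def #{name}; :ok; end", __FILE__, __LINE__)
end

klass = Class.new

begin
  string_eval_reader(klass, "foo bar")
rescue SyntaxError => e
  puts "string eval failed: #{e.class}"
end

string_eval_constant(klass, "foo bar")
puts klass.method_defined?(:"foo bar")    # was the intended method defined?
puts klass.instance_method(:foo).arity    # arity of the method that was defined
```

Note that SyntaxError is not a StandardError, so a bare rescue would not catch it.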
But if the code body did not raise an error, instead of defining a method named "foo bar" that takes no arguments, it would define a method named foo that takes one argument. You can fix this by switching from a string eval to a block eval and referencing the objects directly. However, this has performance implications. Methods that are defined with define_method are closures, and they do not perform as well as methods created with string evals, which are not closures. So in general, there's a performance versus safety tradeoff, with string evals favoring performance and block evals favoring safety.
However, you don't have to sacrifice one for the other. You can write code that has maximum performance in the normal case using a string eval, while still handling the abnormal case with a block eval. This example is taken from Sequel, which creates accessor methods for the columns it finds when introspecting the database, and those column names may contain spaces or other characters not valid in Ruby literals. Basically, you just need to check your inputs and make sure that they would create valid Ruby code. If so, you can use the string eval. If not, you have to use the block eval.
Important to note here is the use of a separate method for the block evals, instead of including them in the main method. Because block evals create closures, objects created in the surrounding scope will not be garbage collected, even if they are never used in the block eval code. In a simplified example, it's not a major savings, but one application using Sequel measured a savings of half a megabyte of memory per process by moving block evals to a separate method.
Our next technique is a brief but complete description of a simple DSL, and some issues that you need to deal with so that your DSL works nicely. This example is taken from the validation_helpers_block plugin for Sequel, which provides a simple DSL for Sequel validations. The DSL syntax should be fairly straightforward.
Inside the validates block, methods indicate attributes, and inside those blocks, methods indicate the type of validation that you are doing. I'm going to go over how this DSL is implemented, which is actually fairly simple. The first step is the definition of validates, which just passes along the block it receives, along with a reference to the current object, to a new DSL class. The DSL class is named ValidationHelpersAttributesBlock to indicate that method names inside the block are used to specify attributes. The implementation of this class is fairly simple, but the important thing to note is that it derives from Sequel::BasicObject. With DSLs, it's usually important to derive from a basic object class so that the names of methods defined in Object can still be used in the DSL.
Now, there are a couple of issues with using BasicObject. For one, there is no BasicObject in Ruby 1.8. Here's Sequel's BasicObject class. Note that it has separate versions for Ruby 1.8 and Ruby 1.9. On Ruby 1.8, the Sequel::BasicObject class is just a subclass of Object, with most of the methods removed using undef_method. We'll get to the Ruby 1.9 implementation in just a bit, but first I want to take two quick surveys. A quick show of hands: who knows what this code will do in Ruby 1.9? Does anyone use Ruby 1.9 here? All right, a few people. If you think it outputs zero, you are correct. How about this code? Let's get another show of hands. Who knows what this code will do in Ruby 1.9? Matt, what will it do? Yeah, so you get a NameError, because inside the definition of BasicObject you cannot directly access constants defined in Object, which is where all other classes are defined by default. You can work around this by adding a double colon in front of all constants inside of BasicObject.
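The constant lookup problem and the double colon workaround can be reproduced with a small sketch. Bare and its two methods are hypothetical names, not the code from the slide. The sketch assumes Ruby 1.9 or later, where BasicObject exists.

```ruby
class Bare < BasicObject
  # Time is defined in Object, which is not an ancestor of Bare,
  # so the unqualified reference cannot be resolved.
  def unqualified_time
    Time.now
  end

  # The leading double colon starts the lookup at the top level instead.
  def qualified_time
    ::Time.now
  end
end

begin
  Bare.new.unqualified_time
rescue NameError => e
  puts "#{e.class}: #{e.message}"
end

puts Bare.new.qualified_time.class
```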
However, with DSL design, users are generally not going to know to do that, as they aren't going to know that the blocks they are using are going to be evaluated in the context of a BasicObject-derived class. Here's an example of a separate Sequel DSL that allows easy filtering of datasets. Without changing the constant lookup inside BasicObject, this used to raise constant lookup errors on Ruby 1.9, because the Time constant did not exist inside of BasicObject. The previous workaround was just to add a double colon before Time, but by changing the constant lookup, you can allow users to not worry about prefacing constants in DSLs with a double colon, which brings us to how to fix constant lookup in BasicObject.
Thankfully, it's actually fairly easy. You just need to add a const_missing class method to the class that calls Object.const_get. You'll note that you do need to preface the reference to Object with a double colon, otherwise you should get a SystemStackError. However, when I tried this without the double colon, I got a SIGILL and a core dump.
Now that we've finished the digression on constant lookup, let's look back at the DSL implementation that we were talking about originally. As you can see, this is fairly simple. In initialize, you're just keeping a reference to the outer self in an instance variable and instance_evaling the block. All method calls in the block are handled by method_missing, which passes the outer self and the method name specifying an attribute, along with the block, to a new DSL class. method_missing is used here because potentially any method name is valid. In general, you should only use method_missing if any possible method name is valid.
Now here's the final DSL class that handles the actual validation. In initialize, again it's just keeping a reference to the outer self and the current attribute and instance_evaling the block.
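The two-level structure described above can be sketched as follows. This is a hypothetical miniature, not the plugin's source: Validatable stands in for a model, AttributesBlock and ValidationsBlock stand in for the two DSL classes, and the two validations and their error messages are invented for the example. Both DSL classes derive directly from BasicObject, so constants are referenced with a leading double colon.

```ruby
class Validatable                      # hypothetical stand-in for a model
  attr_reader :name, :errors

  def initialize(name)
    @name = name
    @errors = []
  end

  # Hand the block and the current object to the outer DSL class.
  def validates(&block)
    AttributesBlock.new(self, &block)
    errors
  end

  def validates_presence(attr)
    errors << "#{attr} is not present" if send(attr).to_s.empty?
  end

  def validates_max_length(max, attr)
    errors << "#{attr} is longer than #{max} characters" if send(attr).to_s.length > max
  end
end

class AttributesBlock < BasicObject
  def initialize(obj, &block)
    @obj = obj
    instance_eval(&block)
  end

  # Any method name may be an attribute, so method_missing is appropriate here.
  def method_missing(attr, &block)
    ::ValidationsBlock.new(@obj, attr, &block)
  end
end

class ValidationsBlock < BasicObject
  def initialize(obj, attr, &block)
    @obj = obj
    @attr = attr
    instance_eval(&block)
  end

  # The set of validations is known, so real methods are defined for them.
  %w[presence].each do |m|
    class_eval("def #{m}; @obj.validates_#{m}(@attr); end", __FILE__, __LINE__)
  end

  %w[max_length].each do |m|
    class_eval("def #{m}(arg); @obj.validates_#{m}(arg, @attr); end", __FILE__, __LINE__)
  end
end

person = Validatable.new("")
person.validates do
  name do
    presence
    max_length 10
  end
end
puts person.errors.inspect
```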
Note that method_missing is not used here. Because we know in advance which validation methods exist, we only create methods for those validations. In this particular case, the methods also have different arity, so two separate class_evals are used, one for methods that accept an additional argument and one for methods that do not. In both cases, the methods created via metaprogramming just call the appropriate validation method on the object for the given attribute.
My next topic isn't really a technique, it's just a code example that shows off a little-appreciated aspect of Ruby. One of the libraries I work on is Scaffolding Extensions, which is a very flexible admin front end for multiple web frameworks and all three major ORMs. One of the things it allows you to do is override pretty much all of the defaults. For example, you can set the default fields to be displayed on the pages for the Person model to be name and age, but also show the position on the browse page. I'm going to go over two of the internal methods that implement this support.
First, all methods have a default implementation that is defined by the library, but the user can override any of those methods for specific cases by defining methods or instance variables that handle that case. This method takes multiple method name symbols. For each method name symbol, it first aliases the default method to a private method, and then it creates a public method that checks whether the method has been overridden for the given argument. If the method has been overridden, it calls the overridden method, otherwise it falls back to using the default method.
Now here's what I think is the more interesting part. In order to use this in classes, you just need to extend the class with the Overridable module. The Overridable module has an extended singleton method.
And in order for it to work correctly, it exploits a little-appreciated aspect of Ruby, which is that virtually all objects can have singleton classes, including singleton classes themselves. So when a class is extended with Overridable, it adds the necessary metaprogramming methods to the singleton class of the singleton class of that class. Then, inside the singleton class, it calls the metaprogramming methods, which in turn override the given singleton methods on the class itself. This is the only production code I've seen that modifies the singleton class of a singleton class. Anyone else used it?
Okay, our next topic is about presenting multiple backends using a unified interface. The general strategy for doing this is using separate subclasses for each backend and having a method in the parent class return an instance of the appropriate subclass. Distilled to its essence, that's what Sequel's connect method does. Sequel's connect method is supposed to return an appropriate Sequel::Database instance for the database. In order to handle differences between databases, the connect method returns an instance of the adapter-specific Database subclass. It processes its input, and then it calls adapter_class with the appropriate adapter.
This is a simplified version of the adapter_class method. It just takes the adapter scheme given and requires the appropriate adapter file. In each adapter file, the adapter registers the adapter's Database subclass in the adapter map, and then the adapter class is just looked up in the adapter map and returned. Back to the connect method: it just takes the class returned from the adapter_class method and instantiates a new instance of that class using the given options. That's basically all you need to do for the initial setup to work. As long as the subclasses implement the appropriate methods, the wrapping is fairly transparent.
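The registry pattern described above can be sketched in miniature. Everything here is hypothetical (the MiniDB namespace, register_adapter, and the SqliteDatabase subclass) and is not Sequel's API. A real implementation would also require the adapter file before the lookup, which the sketch only notes in a comment.

```ruby
module MiniDB
  ADAPTER_MAP = {}                     # adapter scheme => Database subclass

  class Database
    attr_reader :opts

    def initialize(opts = {})
      @opts = opts
    end

    # Called by each adapter's subclass to add itself to the map.
    def self.register_adapter(scheme)
      ADAPTER_MAP[scheme.to_sym] = self
    end

    # Look up the subclass for a scheme. Real code would first require the
    # adapter's file here, and that file would perform the registration.
    def self.adapter_class(scheme)
      ADAPTER_MAP.fetch(scheme.to_sym) do
        raise ArgumentError, "unknown adapter: #{scheme}"
      end
    end

    # Return an instance of the adapter-specific subclass.
    def self.connect(opts)
      adapter_class(opts.fetch(:adapter)).new(opts)
    end
  end

  class SqliteDatabase < Database      # would normally live in its own file
    register_adapter :sqlite
  end
end

db = MiniDB::Database.connect(adapter: "sqlite", database: "example.db")
puts db.class
```

Callers only ever name the parent class. Which subclass they get back is decided by the adapter option.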
Transparent, that is, until you have to handle exceptions raised by the underlying backends. For example, let's say you try to insert into a nonexistent table. In order to treat the multiple backends as one, you don't want a PGError to be raised when using PostgreSQL, or a Mysql::Error to be raised when using MySQL. Instead, you want to wrap the underlying exception classes in your own exception classes. In this case, no matter what backend you are using, if that backend raises an exception, Sequel will raise a Sequel::DatabaseError.
Doing this properly is actually a little bit more work than you might think. Here's a simplified version pulled from Sequel 2.0's MySQL adapter. This is the simplest thing that works, but it has some unfortunate drawbacks. When the Sequel error is raised, the exception message from the underlying exception is kept, but the backtrace is lost. What's also lost is the ability to tell which backend exception class was raised, which can be helpful when debugging.
This is a simplified version of Sequel's current exception class conversion method. Note first that the exception class name is included in the exception message, which easily allows the user to tell what the underlying exception class was, while still only rescuing Sequel's exception class. Also note that the wrapped exception is kept in its entirety, mainly to make it available for use in a case statement inside a rescue clause, so that higher-level application code can treat different backend errors differently if it wants to. And finally, the backtrace of the newly created exception is set to the backtrace of the underlying exception, so the user can easily see which line actually raised the error.
I'd like to close out the presentation with a few simple reminders to prevent some issues in Ruby code. The most experienced Ruby programmers probably know about all of these, but I still see these occasionally in production code. The first reminder relates to using strings with eval.
Now see if you can identify a problem with this code. The main problem is that the file and line arguments were not passed to instance_eval, so if an error is raised, you aren't sure where it happened. The solution is fairly simple: you just add the file and line arguments to all of your string evals. Note that if you are using a here document, you should add one to the line argument, because the string starts on the line after the line that calls the eval method. Then, if the evaled code raises an error, the user can see where the error actually occurs.
The second reminder relates to the appropriate use of the logical OR operator in Ruby. Think about problems with this code, which is supposed to set the single-threaded mode if given, or fall back to whatever the class default is. The problem with this code is that if the user sets the single_threaded option to false when instantiating the database, it will always use the class default. So if the class default is true, the database will be put in single-threaded mode even though that's not what the user requested. This is due to how the short-circuiting logical OR operator works. The solution is to not use a logical OR operator at all, but to switch to a conditional. The basic principle is that any time nil or false can be a valid value, you cannot use a logical OR operator.
The final topic is a combination of a fairly simple technique for creating re-entrant methods, along with a reminder about the proper use of ensure. Let's first discuss re-entrancy and why it is important, at least in this context. Consider this code. The insert method inserts a hash of values into one or two tables inside a transaction. However, the fact that insert uses a transaction is not obvious to the caller. If the caller calls insert inside their own transaction, you don't want to open up a new database connection and transaction, you want to reuse the database's currently open transaction.
Now this is the actual code used for an old version of Sequel's transaction method. These are the parts that deal with re-entrancy. Basically, you need to store references to the threads that are currently inside the method. Then, before the begin/rescue/ensure block, you check whether the current thread is already inside the method, and if so, you use return yield to immediately pass execution to the block and return. This return yield technique is used in multiple places inside of Sequel.
Now let's revisit the insert method we defined earlier and consider an issue with it. Think about the highlighted line and how it may not work correctly. The problem with returning inside the transaction block is that the lines of code between the yield call and the rescue block are not called. The solution is fairly simple. You just need to make sure that all code that must be executed is inside an ensure block. I think the original author of this code did not want the commit query to be sent in case of an error, which is why it was not in the ensure block to begin with. If you only want code to be executed if no exception was raised, you should have the rescue block assign the exception to a local variable, and then inside the ensure block only execute the code if the value of that variable is nil.
And that concludes my presentation. I hope you found some of these techniques interesting. Thank you very much for the opportunity to present here at Mountain West Ruby Conference. Anybody have any questions? No questions? All right, if you don't have questions, I will attempt to show you something at least vaguely interesting to me. All right, who else uses Word in their presentation? All right, so where those question marks are, what's the only thing that can be there in order to get this code to work?
So basically this is something where you call the class method, and it'll call the class method first and then call the instance method on the same class. Anybody? Matz? Matz is too busy reading his email. It's okay. Okay, Matz. Where you have those question marks, what's the only valid thing that can be there? What's the only thing where you can call the class method and calling super will then call the instance method? Anybody else? So if you use Class: because of the way I described Ruby's method lookup, the superclass of Class is, basically, Object. So when you call the singleton method of Class, the superclass of the singleton class is Class itself, which is where you get to the instance methods, and that's how you end up in the same class. Thank you. Questions?
You're talking about the method lookup slides? Or, let's see, okay. So let's see. That or? Okay. I mean, basically the idea for doing that would be, if you're defining a method in the class that's already there, you'd have to sort of create an anonymous module, include it in the class, and then have the method call super to go to that. There was also talk, I think Yehuda brought it up, talking with Matz about the ability to prepend modules to the ancestor chain so that they'd show up before the class itself, and I don't know if that got anywhere. Matz? Okay, we're still thinking about it. So, anyone else? Okay, thank you very much.