How are we doing this morning? Have a good night last night, good time at the party? Everybody awake? Do we need to do some stretching or anything? Serran's talk was really good, so I think we're probably all awake.

So how about this venue? We're in the Crystal Ballroom, which for those of you on the livestream looks like this. It makes me a little nervous that Statler and Waldorf are just going to pop up on one of those balconies and start heckling me. So given where we are, I want to be crystal clear about what I'm talking about. No need to be fuzzy. I mean, Fozzie. Anyway, now that my terrible puns are out of the way, let's get started.

I'm going to talk about subclassing Hash. What's the worst that could possibly happen? As Megan said, my name is Michael Herold. If you have any questions or anything, please tweet me at mherold or say hello at michaeljherold.com. Megan also said I work at Flywheel. We're a delightful WordPress hosting company for designers and creatives. Now, I said WordPress, but we do all Ruby, so don't worry, I'm not an imposter. We're looking for an engineering manager, so if you're an engineering manager looking for a new job, come talk to me or to one of my co-workers who are here.

All right, so this is a talk about a little gem called Hashie. If you read Hashie's GitHub page, it says Hashie is a collection of classes and mixins that make hashes more powerful. Let's think about this for a minute. What pops out to you in this sentence? "More powerful" immediately pops out to me. Whenever I hear that phrase, it makes me think of Uncle Ben. Of course, it might also make you think of unlimited power, which we know we also get through doing this. But let's get back to Uncle Ben. Stan Lee died this week, so it's a sad day. Uncle Ben is famous for saying that with great power comes great responsibility. I'd like to juxtapose this with an Alexander Pope quote: to err is human. So humans write computer programs.
What do computer programs have? They have bugs. This talk is centered around three different bugs in three different portions of the Hashie library. So that's the frame for our story. The first bug occurs in a hash extension that we call IndifferentAccess. If you're a Rails developer, you know that there is a HashWithIndifferentAccess in Active Support. We have an extension that gives you that power without having to use Active Support, and there's a bug in there. There's also a bug in Mash keys. I'll talk a lot about Mash here. It's a big part of our library. And then we're going to talk about destructuring a Dash. A Dash is another data structure within our library, and I'll tell you how that works.

So, to start out: our IndifferentAccess extension. I wake up one morning and I see that there's a bug report on GitHub. It's a good bug report, very thorough. So let's dig into what the reporter found. If we look at their sample code, they start out by making their own hash by subclassing Hash, and then they mix in this thing called the MergeInitializer. If you know how Hash works out of the box, you can't easily pass a hash into Hash.new to make a populated hash. MergeInitializer gives you that ability. It looks a little bit like this. We create a new MyHash and pass it a cat key with a value of meow and a dog key with another hash that has a name of Rover and a sound of woof. And we get what we would intuitively expect from Ruby's standard library: hash[:cat] gives us meow, and hash[:dog] in the bracket syntax gives us the nested hash. So that's MergeInitializer. That's not where the bug lives. The reporter also mixed in the IndifferentAccess extension, and this is where the problem lies.
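The merge-initializer idea can be sketched roughly like this. This is a minimal stand-in written for illustration, not Hashie's actual source; the module name MergeInitializerSketch is an assumption, and only MyHash and the cat/dog data come from the talk.

```ruby
# Hypothetical stand-in for a merge-style initializer (not Hashie's code).
module MergeInitializerSketch
  def initialize(hash = {}, default = nil, &block)
    # Hash#initialize only knows about a default value or a default block,
    # so forward those and then copy the given pairs in one by one.
    default ? super(default) : super(&block)
    hash.each { |key, value| self[key] = value }
  end
end

class MyHash < Hash
  include MergeInitializerSketch
end

hash = MyHash.new(cat: "meow", dog: { name: "Rover", sound: "woof" })

puts hash[:cat]          # the value stored under :cat
puts hash[:dog].inspect  # the nested hash stored under :dog
```

Copying the pairs after calling super keeps the usual Hash defaults available while still letting the constructor take initial contents.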
So, IndifferentAccess. If we create a new MyHash once we have IndifferentAccess mixed in, we see that hash["cat"] with a string gives us the same thing as hash[:cat] with a symbol. This is the indifferent access portion of the hash. It makes it so you don't have to remember whether you're using string keys or symbol keys, because that's easy to mix up, particularly when you're dealing with things that users have input via an endpoint or something like that. Also, intuitively, hash["dog"] gets the same thing as hash[:dog]. So everybody with me so far? Awesome.

So we have this. We create our hash again, and then we grab the dog hash and merge onto it that it's a Blue Heeler. We get NoMethodError, undefined method convert. I don't see that anywhere. What?

All right, so when we look at the IndifferentAccess extension, we see that we have a merge method which is implemented like this: it calls super and then it calls convert on the result of super. We also have a convert method. So what is happening here? We've mixed this into our hash. Why do we suddenly not have access to this convert method? So we ask our hash, do you respond to convert? I love Ruby. This is one of my favorite things in Ruby. Like, do you know what this is? Okay, yeah, I respond to convert. So you get true. Then you ask whether the dog hash within the hash responds to convert. True. What is happening?

We need to go deeper. So this is an introduction to two of my favorite tools, called pry and byebug. Any fans of pry and byebug in here? Yes, whoo, they make my life so much easier. When I come across a bug like this, this is often how I go about diagnosing what's going on. I write a failing test and then I insert something that looks like this. We take our merge method, we call super, and then we tap into the result of super.
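Here is a small, self-contained reproduction of that failure with the tap instrumentation in place. It is a sketch under assumptions: IndifferentSketch is a hypothetical stand-in for the extension, its convert simply stringifies keys, and a puts stands in for the spot where binding.pry would go.

```ruby
# Hypothetical stand-in for the extension described in the talk.
module IndifferentSketch
  # Rewrite every key as a string, in place.
  def convert
    keys.each { |key| self[key.to_s] = delete(key) unless key.is_a?(String) }
    self
  end

  # The buggy shape: call super, then call convert on whatever came back.
  def merge(*others)
    super.tap do |result|
      # In a real debugging session this is where `binding.pry` would go.
      puts "self responds to convert?   #{respond_to?(:convert)}"
      puts "result responds to convert? #{result.respond_to?(:convert)}"
    end.convert
  end
end

# The nested dog hash is a plain Hash that had the module extended onto it.
dog = { name: "Rover", sound: "woof" }.extend(IndifferentSketch)

begin
  dog.merge(breed: "Blue Heeler")
rescue NoMethodError => error
  puts "NoMethodError: #{error.message}"
end
```

Inside the block, self is still the extended hash while result is whatever Hash#merge produced, which is what makes the comparison possible.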
If you're not familiar with tap, it's a method on Object. All it does is pass the object that you called tap on into the block as the block parameter. So the result of super becomes result here, but self is still the IndifferentAccess hash. That gives us access to both the result and self to figure out what's going on. Tap then returns the object it was called on, which is result here, so there's no functionality change at all. It's going to do exactly what it did before: we call convert on the result.

When we do this and then call hash.merge(breed: "Blue Heeler"), we get dropped into a REPL. REPL stands for read-eval-print loop, and we can now type and interact with these variables. The first thing I want to know is what self is, just to make sure I know what I'm dealing with. Self is a hash. Okay, that makes sense so far. We see that result is also a hash. Okay, those match, so they should be behaving similarly. We ask self if it responds to convert. It says yes, I do. Then we ask result if it responds to convert and we get false. This makes no sense. They should be the same thing, right?

If you're unfamiliar with the singleton class, also called the eigenclass, it is the class that belongs to just that one object at a given point in time. If you call extend on an object, you modify the singleton class of that object. And when you call a method on an object, Ruby crawls up this array of singleton class ancestors and sees whether each of those modules has the method on it. We see that we have the IndifferentAccess extension in the ancestors, so that's why self responds to convert here. When we ask the same question of result, we see that its singleton class has no idea what IndifferentAccess is. So that's the source of our bug. The result doesn't know what IndifferentAccess is, so it doesn't know how to convert. Because of this, we know that we need to make sure that the result of merge gets IndifferentAccess set on it.
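The singleton-class diagnosis, and what setting the extension on the result could look like, can be sketched as follows. IndifferentSketch is again a hypothetical stand-in rather than Hashie's code, and using extend on the result is one assumed way of doing the injection.

```ruby
# Hypothetical stand-in for the extension; not Hashie's actual source.
module IndifferentSketch
  # Rewrite every key as a string, in place.
  def convert
    keys.each { |key| self[key.to_s] = delete(key) unless key.is_a?(String) }
    self
  end

  def merge(*others)
    result = super                    # plain Hash coming back from C
    result.extend(IndifferentSketch)  # put the module on its singleton class
    result.convert
  end
end

dog = { name: "Rover" }.extend(IndifferentSketch)

# What Hash#merge alone hands back: a copy built from the real class (Hash),
# so the module that was extended onto dog is not carried over.
raw = Hash.instance_method(:merge).bind(dog).call({ breed: "Blue Heeler" })

puts dog.singleton_class.ancestors.include?(IndifferentSketch)
puts raw.singleton_class.ancestors.include?(IndifferentSketch)

merged = dog.merge(breed: "Blue Heeler")
puts merged.respond_to?(:convert)
puts merged["breed"]
```

The first two checks show the asymmetry seen in the REPL session; the last two show the merged hash keeping the extension once it is re-applied.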
Without doing that, what comes back is just a normal hash, because of the implementation of the merge method. When we look at the merge method, we see that we call super. In this case, super is Hash's implementation of merge, which happens to be written in C in the Ruby VM. So what we get back when we call super is just a normal hash, even though we're asking the IndifferentAccess extension to give us the result. In order to fix this bug, we change this line: we grab the result of super and then we make sure to inject IndifferentAccess into that result. This means that the result's singleton class now has IndifferentAccess in its ancestors list, which means it can respond to convert. Once we make this change, we can go run our test again. We new up our MyHash, access the dog, merge on the breed, and it works.

So why was this a problem? As I said, the source of the bug was the fact that we call super. Because we use the base class of Hash, when we call super, Hash's implementation of merge is what runs. We call super so that multiple extensions can be chained together and interoperate. So we need the super in there, but because we're coming from Hash, the base of super is going to be the Hash class's merge method.

This was a single method. If you ask Hash how many public methods it has, it currently has 178 public methods. That might include some extra things that get mixed in when I new up IRB, but 178 public methods is about right. Aaron Patterson in 2014 wrote a blog post about how YAGNI methods are killing him. He's talking about a memory leak that happened in Rails when you use the ActionController::Parameters class. Because there are 178 methods, he goes on to explain, you need to handle all of those methods because they're part of the public interface of your class. Now, I'm not sure about any of you.
I would have a really hard time covering 178 methods of implicit behavior just from a subclass. I'm not really sure how to do that effectively, so bugs are going to come in when we do this in our own code. So that was one problem. It was relatively easy to fix once we knew where to dig in and how to dig in, but let's look at a second problem.

Mash keys. I wake up another morning and I get a bug report that says Mash keys that collide with hash methods produce strange results. Okay, it's a pretty good bug report that explains what's happening. Now, how many people here use Hashie? How many people have it appear in your Gemfile and you're not sure why? Yeah, okay. Not a spoiler alert, but a disclaimer: we do use a Mash inside of our application. I would recommend steering away from Mash, but if you've heard of Hashie, you've almost definitely heard of Mash. It's almost synonymous with our library. It's also highly controversial, as I alluded to. Let's see what year this was. Also 2014. 2014 was a good year for ripping on things that subclass Hash. Richard Schneeman wrote this blog article about how he considers Hashie harmful. Throughout the whole blog post he only says Hashie, but he really means Mash. He has a lot to say on the matter. It's a really good blog post. Short story: he noticed that once you add OmniAuth to your Rails application, suddenly every single endpoint in your application gets 5% slower. Every single one, not just your OmniAuth ones. So he was trying to fix that. It's a really interesting read. I would recommend checking it out.

Okay, so back to Mash. Mash works a bit like this. If we new up a Mash, we can ask, hey, do you have a name property? If it doesn't, it returns false. We can verify that by trying to fetch name with a method accessor. We get nil, which makes sense. We can also set name. So we can set the name of the Mash to My Mash.
Then when we ask for name, we get My Mash back. And we can also see that we have a name property set. This is most of what people use in the Mash interface. There's more to it, which we'll see in a sec, but this is basically what Mash is used for. It's also recursive. If you pass it a key whose value is a hash, that value gets wrapped in a Mash as well, so you can chain down through nested hashes as far as you like.

This is implemented through method_missing. method_missing is one of the best things in Ruby. It's also one of the sharpest tools in Ruby. The implementation says, hey, if you receive a message that you don't know about and you have a key that matches that method name, return it. That's how we get the ability to say .name on the Mash. Otherwise, we take the method name and split off any suffix that exists. Then we look at the suffix. If the method name ends in an equals sign, so you send name=, then we assign a property. If it ends in a question mark, then we check to see if we have a truthy value in our hash at that key. If you say bang, it does what's called an initializing reader, which basically news up a Mash at that key. If you do under-bang, it does this crazy DSL where you can build the next level of the hash by chaining messages onto the top level. If you ever need to just generate a data structure, it's useful for that. I would never recommend using it in production. And then we fall back to just calling our reader method if none of those suffix, suffixi, suffixes exists. So clear as mud, right? There's definitely one responsibility for this class, definitely.

All right, so what was Mash intended for? Originally, the README said that you can use it for JSON responses. Because the README said that, that's what people do. Makes sense, right? It's intended for this purpose, so let's do that. I'm guilty of this. I've written more than one API client library that uses Mash for this purpose.
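The method_missing dispatch described above can be sketched in a few lines. TinyMash is a hypothetical, much-reduced stand-in for illustration: it keeps string keys, handles the =, ? and ! suffixes, and leaves out the under-bang DSL and the recursive wrapping entirely.

```ruby
# Hypothetical, heavily simplified stand-in for a Mash-like class.
class TinyMash < Hash
  def method_missing(name, *args)
    method_name = name.to_s
    # A key that matches the message wins outright.
    return self[method_name] if key?(method_name)

    # Otherwise split off an optional one-character suffix.
    base, suffix = method_name.match(/\A(.*?)([=?!]?)\z/).captures
    case suffix
    when "=" then self[base] = args.first           # writer
    when "?" then !!self[base]                      # truthiness check
    when "!" then self[base] ||= self.class.new     # initializing reader
    else self[base]                                 # plain reader, nil if absent
    end
  end

  def respond_to_missing?(name, include_private = false)
    key?(name.to_s) || name.to_s.end_with?("=", "?", "!") || super
  end
end

mash = TinyMash.new
puts mash.name?        # no name yet
mash.name = "My Mash"
puts mash.name         # read it back through method_missing
puts mash.name?        # now there is a truthy value
puts mash.author!.class
```

Everything here only works for messages that Hash does not already answer, which is the weak spot the rest of this section is about.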
What this looks like is: we get our JSON response from the server, we parse it into a hash, and then we wrap it in a Mash, and we suddenly have method accessors on everything. You don't have to define your interface, it's just magically there. Nothing bad can happen from that, right?

But remember, a Mash is a Hash. It's explicitly in the definition of what we want Mash to be. Mash and all of its brethren have to be hashes. And remember, Hash has 178 public methods. Would any of those methods conflict with anything you would ever return from an API? Class, count, hash, length, trust, zip. So what happens when we do this? Say we have a Hashie::Mash for where we are, the Millennium Biltmore, which has a zip code of 90071. You call zip on it and you get a weird response. Anybody recognize what this is? Yeah, it's Enumerable#zip. It's a great, great method. Not what we want in this case, though. But because we implement all of this behavior through method_missing, there isn't really much we can do. The method is not missing, so it behaves unexpectedly.

What should we do? Well, we're in Ruby, and a lot of the Ruby culture has been influenced by DHH, who writes things like this: providing sharp knives. So let's make a sharp knife to handle this. I made the MethodAccessWithOverride extension that you can mix into your Mash. You have a Mash. We set awesome on it to sauce. It behaves how we expect, and you can then fetch awesome. Now we set zip to a-dee-doo-dah. We access it. It works. What I'm doing here is actually aliasing Enumerable#zip to __zip, so you can still access it if you want, but zip now responds how you expect it to. If Aaron Patterson is in the audience, and I'm not sure if he is, he would tell you never to do this because you bust the method cache. When you do this in production, every single time you get a Mash that has a zip key, you bust the method cache for your entire application.
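The zip collision and the override idea can be reproduced without the gem. This is a sketch under assumptions: ReaderMash is a hypothetical reader-only stand-in, and defining a singleton method plus a __zip alias is just one way to imitate the behavior described; it is not the extension's actual implementation.

```ruby
# Hypothetical reader-only stand-in for a Mash-like class.
class ReaderMash < Hash
  def method_missing(name, *args)
    key?(name.to_s) ? self[name.to_s] : super
  end

  def respond_to_missing?(name, include_private = false)
    key?(name.to_s) || super
  end
end

hotel = ReaderMash.new
hotel["name"] = "Millennium Biltmore"
hotel["zip"]  = "90071"

puts hotel.name          # goes through method_missing
puts hotel.zip.inspect   # Enumerable#zip answers first, so no zip code here

# Override sketch: keep the original reachable as __zip, then shadow it.
# Exploration only; the talk warns this is costly in production.
hotel.singleton_class.class_eval { alias_method :__zip, :zip }
hotel.define_singleton_method(:zip) { self["zip"] }

puts hotel.zip           # now the stored value
puts hotel.__zip.inspect # the original Enumerable#zip behavior
```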
Busting the method cache is really bad for performance, so please don't do this. This is really only for exploration. Okay, so that's Mash, and those are the problems that Mash has from extending Hash.

Next, we're going to talk about another data structure called a Dash. The names for these things, man. We also have a Trash, we have a Clash, we have a Rash. I have to look up what all of them mean at any given point in time. Okay, so Dash stands for declarative hash. I get another bug report: issue with double-splat merge and Dash. What does that mean? This is a really good bug report. He goes through and gives examples of everything that's happening. It's very long, it's very thorough. This was my initial reaction: whoa, it's a very good bug report. But I'm also saying whoa about something else that I learned from it. Did you know that in Ruby you can have a hash and treat it like it's JavaScript, and splat it out inside of another hash to build a new hash? I didn't know that until this bug report came in. Changed my world. Okay, so that's what he's talking about. This specific behavior is broken for Dashes.

So what is a Dash? It's a declarative hash, so you define, say, a PersonHash that subclasses Dash, and you have access to a class method called property, with which you define properties on the hash. This PersonHash expects, and can only have, the name value and the nickname value inside of it. If you try to do PersonHash.new(foo: "bar"), it says, nope, I don't know what foo is, it's not defined. So it's adding a level of, not validation, but prevention of a bad state in your hash. When we try to use this, we can say we have a sam variable, his name is Samwise, his nickname is Sam, and then we double-splat sam out and add height to it. That gives us something that looks like a hash, but when we try to access height, it says, I don't know what height is.
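A declarative hash along these lines might be sketched as below. TinyDash is a hypothetical stand-in, not the library's Dash: the property bookkeeping, the choice of NoMethodError, and the error message are all assumptions made for illustration. Only PersonHash, name, nickname, and the Samwise data come from the talk.

```ruby
# Hypothetical, minimal stand-in for a declarative hash.
class TinyDash < Hash
  # Each subclass keeps its own list of declared properties.
  def self.properties
    @properties ||= []
  end

  def self.property(name)
    properties << name
  end

  def initialize(attributes = {})
    super()
    attributes.each { |key, value| self[key] = value }
  end

  def [](key)
    assert_property!(key)
    super
  end

  def []=(key, value)
    assert_property!(key)
    super
  end

  private

  # Refuse any key that was not declared with `property`.
  def assert_property!(key)
    return if self.class.properties.include?(key)

    raise NoMethodError, "The property '#{key}' is not defined for #{self.class.name}."
  end
end

class PersonHash < TinyDash
  property :name
  property :nickname
end

sam = PersonHash.new(name: "Samwise", nickname: "Sam")
puts sam[:name]

begin
  PersonHash.new(foo: "bar")
rescue NoMethodError => error
  puts error.message
end
```

Note that all of the checking lives in Ruby-level methods, which matters for what happens next in the talk.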
So the result of splatting sam first isn't behaving like a plain hash, because that's not how Hash behaves. But when we do it the other way, where we say height is 1.66 meters and then splat out sam, and we access height, we get 1.66 meters. Clear as mud, right? If we call to_h on sam first, it still works as well. Why is that?

What happens when we double-splat? Double-splat uses the to_hash method, so if you define to_hash on any class, double-splatting an instance is basically going to use the hash that you returned from to_hash. When we create a test class with a to_hash method that returns foo and bar, and we double-splat that out and combine it with baz, we get a hash as we expect. Pretty cool.

What happens when we double-splat inside a hash literal? That's what we're doing here: we're double-splatting sam within a literal. In order to figure out what's going on, we have to use a very powerful tool. We take our bit of code, we wrap it in a string, and we call RubyVM::InstructionSequence.compile, which generates the instruction sequence that the VM is actually going to run for this code. We disassemble it and then we put out the result. When we do this, we get Ruby assembly code. You don't have to understand what's going on in here. It's really interesting to dig into if you are into that sort of thing, but what we care about here is instruction six. We see core#hash_merge_kwd. So we look for that function in Ruby's source code. It exists in C land. It lives in vm.c if you're interested. We find that function, and it looks a bit like this. I'm not a C programmer, so I have to squint, turn my head, and think about it a lot to see what's going on. But one thing pops out at me: I see rb_to_hash_type. What this does is, if the argument is not already a hash, it calls the to_hash method on it. The to_hash method, if you've defined it in Ruby, will be what you've defined in Ruby.
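The to_hash behavior and the disassembly step can be tried with a few lines. This is a sketch: SplatMe is a hypothetical class name, sam inside the compiled string is never evaluated, and the disassembly text differs between Ruby versions, so no particular instruction listing is shown here. RubyVM is specific to CRuby, hence the guard.

```ruby
# Hypothetical class used only to show that ** calls to_hash.
class SplatMe
  def to_hash
    { foo: "bar" }
  end
end

# Double-splatting a non-Hash object goes through its to_hash method.
combined = { **SplatMe.new, baz: "qux" }
puts combined.inspect

# Ask the VM what it would run for a double-splat inside a hash literal.
# The source is only compiled, not executed, so `sam` need not exist.
if defined?(RubyVM::InstructionSequence)
  sequence = RubyVM::InstructionSequence.compile("{ **sam, height: 1.66 }")
  puts sequence.disasm
end
```

Looking through the printed instructions for the hash-merge call is the same step the talk takes before heading into vm.c.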
However, Ruby's VM casts the value to a hash only when it isn't already a hash. Recall that a Dash is a Hash. That's one of the laws of Hashie: it always has to be a hash. So the VM does not call to_hash on this thing. When we do this, we get what looks like a good result, but then we try to access the height we merged on and we get "the property height is not defined." What?

Okay. There's a second piece of that C function that I showed. The rb_hash_foreach call is basically a merge that happens only in C. Without coming back to your Ruby code, it merges all the keys in the second argument onto self. You get a resulting hash, which is why we see that height key in there even though we can't access it. The VM doesn't call merge. Dash's property logic exists in Ruby, so it isn't run. So the thing that we get back from doing that double-splat is a Dash. It now has a height key, which violates the properties that you defined on the Dash, and you can't read it back. Unfortunately, there is not really anything that we can do to fix this, because it's just the way the Ruby VM works. Fixing it would require changing the Ruby VM and would actually be a performance penalty anytime you want to call merge on any hash. We don't want to do that, so we wrote it up in the README. One more gotcha among a lot of gotchas in our library.

All right, so to recap. We talked about IndifferentAccess. This is an extension that we have in Hashie that gives you indifferent access on any hash that you want. It means you can pass a symbol or a string as the key and it works as you would expect; you always access the same thing. We talked about Mash keys. Mash is a big data-munging system that allows you to wrap a bunch of hashes recursively. It's implemented through method_missing, and because of method_missing, it interferes with Hash methods. Like I said, there are 178 of them, so there are lots of corner cases you can run into with this.
We talked about destructuring a Dash. Because of the way that Hash is implemented in the Ruby VM, this is a bug that we can't fix, and we just have to say, hey, pay attention when you're doing these things.

All three of these problems have a root cause of subclassing Hash. Now, I pick on Hash in particular because that's what I'm most familiar with. But really, any time you're subclassing a core class of Ruby, you're going to run into issues like this. String, any of the number classes, Hash: all of those have a big surface area of methods on their public interface, many of which are implemented in C, so if you try to override them, it won't go quite as well as you think it will. So when you're doing this, you really just want to be aware of what you're doing. The blog post of Aaron's that I showed, where he talks about ActionController::Parameters, has a sister blog post where he talks about ActiveSupport::SafeBuffer and OutputBuffer, which are what handle html_safe in Rails. They have the same problem. Because when you subclass a core class, your interface suddenly has 173 methods and counting. There's nothing to say that it doesn't get bigger, and you have to support all of those methods in order for everything to work as promised.

Now, in your application code, this may or may not be a problem for you. You might know exactly what you're going to get back from a response. But in my case, since I'm a maintainer of Hashie, which is a transitive dependency for something like 1,700 gems that are out on RubyGems, I have to be aware of this. You can run yourself into performance problems because of this as well. So you have a 173-method interface. Do you think you can catch all the corner cases? I know that I can't. I try all the time, and I know that I can't. If you can, please contact me. We'd love another co-maintainer.

So I have a little bit of extra time. I have, like, an addendum. But wait, a wild PSA appears.
So we've talked about Hashie::Mash. Hashie::Mash, if you ask my co-maintainer, dB, is a devil. He wrote a big, long blog post about the demonic possession of Hashie::Mash, which is a chronicle of everything that has gone wrong with it over the years. It's really entertaining to read. It's pretty long too. For one class in our library, yeah.

If you look at the RubyGems data dump, this is every gem in the top 1,000 most downloaded gems on RubyGems that uses Hashie. This link's not live yet. I'll get it up later today and I'll tweet out my slides. So of the top 1,000 gems, 1% of them use Hashie. Every single one of these only uses Hashie::Mash. That makes me feel a bit like this.

So my PSA is that you might not need Hashie::Mash. Often what you're doing with a Mash is taking a JSON string, parsing it into a hash, and then passing that into a Mash in order to handle it. You get method access for foo, you can also access with strings, you can also access with symbols, you get method access inside arrays, and all that good stuff. That's a lot of good, powerful stuff. However, you're still dealing with a hash, and you don't really need a hash here. So let's take Hashie::Mash, blow it up in all of these libraries, and replace it with two things: require "json" and require "ostruct". We take our JSON string, run it through JSON.parse, and pass OpenStruct as the object_class option. We get an OpenStruct, a recursive OpenStruct. It parses all the way down. You can also pass array_class if you want a special array, but in this case we don't need that. When we do this we have a parsed variable. We can call foo on it and we get bar, we can bracket-access with "foo" and we get bar, and we can bracket-access with a symbol and we get bar. We also get bazes as an array, and that makes me feel good.

So if you'd like to work with me, come seek me out and work at Flywheel. My name is Michael Herold. Please tweet any questions at me. I'll be down here if you want to ask questions in person.
Feel free to email me. If you're interested in contributing to Hashie, please reach out. We've got a lot of backlog. Thanks.
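The OpenStruct replacement from the PSA can be sketched as follows. The JSON payload is made up for illustration (the nested key is an assumption); JSON.parse with the object_class option and OpenStruct both come from Ruby's standard distribution, though on newer Rubies ostruct may need to be listed in the Gemfile.

```ruby
require "json"
require "ostruct"

# Made-up payload standing in for an API response.
json = '{"foo": "bar", "nested": {"answer": 42}, "bazes": [1, 2, 3]}'

# Build OpenStruct objects instead of Hashes, all the way down.
parsed = JSON.parse(json, object_class: OpenStruct)

puts parsed.foo            # method access
puts parsed["foo"]         # bracket access with a string
puts parsed[:foo]          # bracket access with a symbol
puts parsed.nested.answer  # nested objects are OpenStructs too
puts parsed.bazes.inspect  # arrays stay plain arrays
```

Because an OpenStruct is not a Hash, none of Hash's public methods are in the way of the keys coming back from the response.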