So, once we've identified what it is that we want to build, we need to roughly, and the key word here is roughly, decide what we want it to look like. One of the worst things you can do is pin yourself down to a very exact API too early on. When you're refactoring towards a new API in an existing system, it's very important that you have good tests. And then in the final steps of actually implementing it, we build those objects, we use them internally, we compose them together manually where we need to, and then the DSLs will come from where we find duplication or pain. So before you can design a great API, the first step is to identify the API that you're missing. In any large legacy code base, and make no mistake, Rails is just a large legacy code base, all of the strategies that you can use everywhere else in your code still apply. You can find plenty of concepts that are duplicated across the domain. Some of the smells we'll look for are methods with the same prefix, or code that has similar structure. Or the big one that you find in Rails a lot: multiple modules or classes that are overriding the same method over and over and over again and calling super. So one of these concepts that we found in Active Record was the need to modify the type of an attribute. Say, for example, you have a price column on a products table and you would like to represent that as a money object instead of a float. It might look something like this: we're overriding the reader and the writer, checking to see if anything's nil. (We have to dim the lights a little. Awesome. It's really washed out.) Even with really experienced developers, everybody's always wondering: if I do this, am I going to break Rails? There are some things that might not work the way you expect. When you override the writer, Active Record is actually doing some casting underneath, and there's a before-type-cast version of everything that expects the value from before the casting was performed.
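The kind of hand-rolled override just described can be sketched in plain Ruby. This is an illustration of the pattern, not the actual slide code; the Money class and the @attributes hash are stand-ins for what Active Record would give you:

```ruby
# Hypothetical Money value object standing in for a real money class.
Money = Struct.new(:amount)

# Stand-in for an Active Record model: raw column values live in a hash.
class Product
  def initialize
    @attributes = {}
  end

  # Overridden reader: wrap the raw float from the "column" in a Money.
  def price
    raw = @attributes["price"]
    Money.new(raw) unless raw.nil?
  end

  # Overridden writer: unwrap a Money back down to the float the column expects.
  def price=(value)
    @attributes["price"] = value.nil? ? nil : value.amount.to_f
  end
end

product = Product.new
product.price = Money.new(9.99)
product.price # => #<struct Money amount=9.99>
```

Every nil check and manual unwrap here is exactly the kind of duplicated ceremony the talk is pointing at.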
You might have some validation that expects things to be a certain way, and this might now work a little bit differently. Even if you don't break things, there might be other behavior that you want to control, like the SQL representation of your money object. Maybe you add in a currency and are storing it as a string instead of a float, and you need to parse that out and combine them back together when you're going to the database. But the really hard one right now is that you might also want to be able to use the object as a value in a query. Being able to pass it into a query is incredibly useful. So Rails overrides the types of attributes internally, and you might be wondering: if this is so hard, how does Rails do it? And if you guessed "with a giant pile of hacks," you would be right. It's a giant pile of hacks. This is Rails internals, so it's not for the faint of heart. Don't worry too much about the specifics; the very, very specific details of what it's doing aren't too important here. This is a feature that's on by default, where we convert time values that you pass into Active Record to the current time zone. It's one of these things that we just sort of do, and most people don't know that it's there, but it is on by default in most applications. So we're just overriding the writer here. The first thing we're doing is converting the value to the current time zone. Then we're basically completely re-implementing dirty checking. The second and third lines in this method are copy-pasted one-to-one from inside of the Dirty module in Active Record. And once we've done that, we're then jumping through even more hoops so that we can maintain the before-type-cast version of the attribute. So in this case, we're looking at the common concepts and some common smells: we're overriding our writer, and we're duplicating a lot of code.
And this is a relatively small behavior change, but it has to jump through a lot of really complicated hoops in order to do it correctly. It's also important to note that code written this way introduces a lot of subtle bugs. A lot of other modules may be trying to modify the type of this attribute in very unexpected ways, and the bugs are hard to track down once behavior is scattered all over the place. Another place that we modify the behavior of the type casting system is with the serialize macro. Here, ultimately, we're overriding the method that gets called internally to perform the type casting, instead of overriding the readers and writers. This module wasn't this simple when we got started. Here's some more code from the file, and more. Really, this module literally overrode every single method in Active Record containing the word "attribute," and there are five or more slides with this code that I left out. So in this case, we are not explicitly overriding a reader and writer, but we are duplicating code from other parts of Active Record, we're jumping through a lot of hoops, and we're overriding literally everything. This file was the cause of so many bugs in 4.2 and earlier. This macro actually ends up directly modifying the value of the columns hash, which is problematic for reasons that we'll get into later. Another example is enum, where we store integers instead of strings. Here we're overriding the writer method, we're also overriding the reader method, we're also overriding the before-type-cast version, and there are several others. And once again, enum was the source of a significant number of bugs, disproportionate to the size of the feature. So, we found our missing concept: attribute types are overridden everywhere. And one of the things that you might think is: well, if we want to do this so much, maybe other people want to be able to do this, too. So, let's talk about what type casting is.
Type casting is when you go through and explicitly convert a value from one type to another. Here's a very simple example, where we have a value which is a string, and we want to convert it to an integer, so we call to_i. In Active Record, what we do isn't actually type casting; it's type coercion, which is the same thing done implicitly. So here's an example using Active Record. You have a User model; age is presumably an integer column in the database. We go look at that, and decide that whenever you assign a value to the age attribute, we're going to convert it to an integer. Now, the reason that we do this is because Active Record was originally designed to work with web forms. You're going to assign strings to attributes, and having to cast those manually would be a pain. Not just for integer types; something like date can be significantly harder. We didn't want to have this code littered all over our controllers, so Active Record's type coercion was born. The cases we handle today are much more complicated than that, but if you go through the history of how this evolved, everything can be traced back to that original motivation. Now, in Rails 4 and earlier, the only way that you can have a coerced attribute is if it's backed by a database column. We want to be able to hook into this behavior and be able to modify it. So this is where we get to step two: we're going to roughly identify what we want it to look like. And this is a simpler case. We're going to have a Product model, and we know a few things about our API at this point. We're going to need to call some method, in this case we'll call it attribute. We are going to need to say the name of the attribute, and have some marker for what we want the type to be. Now, this is very similar to what you might find in DataMapper or Mongoid, which have similar APIs. We're going to avoid over-specifying the API at this point.
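The casting-versus-coercion distinction fits in a few lines of plain Ruby (the User class here is a made-up stand-in, not actual Active Record code):

```ruby
# Explicit type casting: we convert the value ourselves.
value = "10"
value.to_i # => 10

# Implicit type coercion: the writer converts for us, the way Active Record
# does when it knows age is an integer column.
class User
  attr_reader :age

  def age=(value)
    @age = value.nil? ? nil : value.to_i
  end
end

user = User.new
user.age = "30" # a string, as it would arrive from a web form
user.age        # => 30, already an Integer
```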
And the really nebulous part is going to be how we pass in the type. So at this point, all we know for sure about our implementation is that we're going to need to introduce a type object into our system. But presumably that's not going to be enough for a reasonable implementation. We don't want something that's just a little bit less crappy; we want something that we can really be proud of and know that we will be able to maintain in the future. So we're going to start by composing the objects in our system manually. The only one we know of is our type object, but we're going to be looking for places to extract collaborators and compose them, to make our life easier. Before we start introducing the API, we need to say a few brief words about the fact that there are some rules you need to follow. Rule number one for refactoring is: have good test coverage. Rule number two is: have good test coverage. Rule number three is: have good test coverage. On the next couple of slides, the code is going to be very small; the specific details of it aren't too important here. What is important is that there's a giant case statement. Like many parts of Active Record, it's a giant case statement just switching over a set of symbols. And this is the entire type system in 4.1. We call a bunch of class methods based on a symbol that we had earlier derived from the SQL type. And you'll see at the top of this, there's a very small comment: "Casts value (which is a string) to the appropriate instance." And, I mean, you can think of a lot of ways to pass it a value that isn't a string. That is one of the most misleading comments I've seen. So we know that we're going to introduce a type object, and we know that type casting currently lives on the column. So, first step: let's give the column a type object. We add a constructor argument, we pass in nil everywhere, and we just run the tests.
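The 4.1-era code in question had roughly this shape (heavily abridged; the branch list and method name are approximations, not the real source):

```ruby
# The old style: the column derives a symbol from its SQL type, then a giant
# case statement dispatches on that symbol. Every new type means another branch.
class Column
  attr_reader :type

  def initialize(type)
    @type = type
  end

  def type_cast(value)
    return nil if value.nil?
    case type
    when :string, :text then value.to_s
    when :integer       then value.to_i
    when :float         then value.to_f
    when :boolean       then ["1", "t", "true", true, 1].include?(value)
    else value
    end
  end
end

Column.new(:integer).type_cast("5") # => 5
Column.new(:boolean).type_cast("t") # => true
```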
And that was the very first commit that went in toward this. It's a tiny, tiny step, extracting out more and more from what we know, which is that we need a type object and where it's going to live. Injecting it into the constructor of the columns, and finding where the column objects are being constructed, also points us at the other portions of behavior that we're going to need to modify. Where are we looking at the SQL types for the columns? Where are we constructing these? If we're injecting the type object into this so that we can modify it later, these surrounding bits of code are all going to have to change as well. So we go through our system and replace all of these case statements, and we just slowly move these methods to the type objects. At this point, we have introduced a type mapping system into our connection adapters, which we're not going to look at in detail because it's very long and tedious, but it takes the responsibility of looking at the SQL type string and, instead of a symbol lookup, builds a type object, a simple integer, a simple string, a simple timestamp, based on it. So we have a place that we can start moving all of these case statements to. So we go through our system one by one and we just remove each case statement, and each of these diffs is just removing a giant case statement and adding another method to our delegate block at the top of the file. This is what, at this point in the refactoring, a simple type object looks like. This is the string type, which has almost no behavior attached to it. Now we've refactored our system into something that's a little bit easier to look at, and we can actually start implementing the API that will let us hook into this. The simplest case we could start with is changing the type of an attribute from string to integer. So let's write a test. This is what the test might look like. We create a model with a schema.
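A simple type object of the kind described can be sketched in plain Ruby (modeled loosely on the 4.2-era internals, but simplified; the method names are assumptions for this sketch):

```ruby
module Type
  # Base class: by default, casting is a no-op.
  class Value
    def type_cast(value)
      value
    end
  end

  # The string type: almost no behavior, just "make it a string".
  class String < Value
    def type_cast(value)
      value.to_s unless value.nil?
    end
  end

  # The integer type, for comparison.
  class Integer < Value
    def type_cast(value)
      value.to_i unless value.nil?
    end
  end
end

Type::String.new.type_cast(42)    # => "42"
Type::Integer.new.type_cast("42") # => 42
```

The whole point is how boring these objects are: one tiny class per type, instead of one more branch in a giant case statement.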
It will create two attributes with the same type, and we say that we want to change the type of one of them, and then test the coercion. We've actually written the first invocation of our API, so let's take a little bit of a closer look at it. We're starting with the simplest thing that we know: we're going to have a type object, so we just pass the type object to our method directly. We could use a constant or a symbol or some other marker for the type, but for now we're going to keep it very simple and very explicit. This actually turns out to be a design choice that sticks with us through the rest of the refactoring, and there are a lot of benefits to giving me an actual object. You can understand what's happening here much more easily. The API becomes much simpler, and I'm not just talking from an implementation point of view: when you give me an object, you presumably have an inkling of what behavior can be modified by this API. The object that you gave me has a known set of methods on it. I presumably cannot possibly change the behavior of anything that won't be calling one of those methods. And every DSL that you add has a cost. When you do add a DSL, you want to try to avoid adding DSLs on top of your DSLs on top of your DSLs. There's a lot of cognitive overhead in knowing what gets modified and where. You basically have to memorize every DSL that you introduce into your system; understanding plain Ruby stops being enough. And the line between being helpful and less painful, and being too magic, is very, very thin. So we can come up with a very serviceable implementation early on by overriding the columns hash, actually the same thing serialize does internally. But it feels wrong: we're not changing the schema or changing the structure of the model. However, we want to take the smallest steps we possibly can, and we want to get to a working implementation of our API as quickly as possible.
But if we just try to modify the columns hash directly, we're going to run into another problem. This is how we look up the columns and columns hash inside of Active Record, inside of ActiveRecord::Base specifically. When you call either of these methods, they're going to go execute a query immediately. And that means that we can't actually use this inside of any class macros. It's very important that you be able to load your class definition and not need a database connection to do it. For example, on Heroku when you deploy, when your assets are precompiling, that loads up the environment, which will load all of your Active Record models into memory, but you won't have a database connection. So we need our implementation to be lazy. And when you find that you need laziness in your system, I find it very important to separate the lazy form from the strict form and have both of those available. So here's roughly what the code looks like at this point. On the top here we have our attribute method, which is the lazy version. Below that we have define_attribute, which is the strict version. And then after the schema has loaded, we go in and override all of the columns that we want to modify. Now, unfortunately, for most of our cases we're not just replacing the type of an attribute completely; we want to modify the existing type. Serialize might be backed by text; it might be backed by binary. So we really need decorators. But this, again, needs to be lazy. We can't go get the current type when you call it, because we don't know the current type yet; we haven't gone to the database. Now, decorators are not an API that's going to be public in Rails 5. However, when you are building these layers on top of each other, make your internal APIs just as nice to use.
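The lazy/strict split can be sketched like this. Everything here is a simplification: `attribute` only records the override, and the "schema" is a hard-coded hash standing in for a database connection:

```ruby
class IntegerType
  def cast(value)
    value.to_i
  end
end

class StringType
  def cast(value)
    value.to_s
  end
end

class Base
  # Lazy version: safe to call in a class body with no database around,
  # because it only records what we want.
  def self.attribute(name, type)
    pending_attribute_types[name.to_s] = type
  end

  def self.pending_attribute_types
    @pending_attribute_types ||= {}
  end

  # Strict version: resolves everything now. In real Active Record this is
  # where the schema would actually be queried.
  def self.attribute_types
    @attribute_types ||= types_from_schema.merge(pending_attribute_types)
  end

  # Stand-in for schema introspection (an assumption for this sketch).
  def self.types_from_schema
    { "name" => StringType.new, "my_string" => StringType.new }
  end
end

class Product < Base
  attribute :my_string, IntegerType.new # recorded lazily; no DB needed at load
end

Product.attribute_types["my_string"].cast("5") # => 5
```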
You as a maintainer want to be able to understand your system and have the same simple, composable APIs available to you that your users have in your public-facing APIs. There's a lot of code that I'm leaving out for brevity in the implementation of this, but the attribute type decorations is going to be an actual object in our system, not a hash, even though we're calling merge on it and other hash-like methods. And it keeps track of the order that the decorations were defined in, and other complicated things. One thing to note here in this design: when you're designing a class macro, one of the important things is that it be idempotent. If you call it twice with the same arguments, it should not modify the behavior multiple times. So we're passing the name of the decorator in as an argument, in addition to the name of the thing we want to decorate and the block, so that we can differentiate one decorator from another. If we're going to use this for serialize internally, and you call serialize twice, you don't want to convert a thing to JSON and then convert that to JSON again; you want to replace the original decoration. So this is what using this API starts to look like as we consume it internally. We give it a name, give it a block, and we look for any attribute that we had previously defined as one that we would convert the time zone on. We then create a new type object that wraps the original, and in its cast and deserialize methods it goes in and does the time zone conversion. Now, we can do the same thing for serialization. However, in this case we're not basing it off of whether it's a time column; we're basing it purely off of the name. When you call serialize, you say serialize :foo, and you might say JSON instead of YAML. So we can pull this out again. This seems like a common pattern, wanting to decorate purely based on the name instead of based on additional arguments, so we can pull this out into another API internally.
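The idempotence point can be sketched as a registry keyed by decorator name (the uppercasing decorator is made up purely to have observable behavior; it is not from the talk):

```ruby
require "delegate"

# Decorations are keyed by name, so registering the same name twice replaces
# the earlier decorator instead of stacking a second layer on top of it.
class TypeDecorations
  def initialize
    @decorations = {} # decorator name => block that wraps a type
  end

  def add(name, &decorator)
    @decorations[name.to_s] = decorator
  end

  def apply(type)
    @decorations.values.reduce(type) { |t, decorator| decorator.call(t) }
  end
end

class StringType
  def cast(value)
    value.to_s
  end
end

# A decorator wrapping another type and modifying its cast behavior.
class UpcaseDecorator < SimpleDelegator
  def cast(value)
    __getobj__.cast(value).upcase
  end
end

decorations = TypeDecorations.new
decorations.add("upcase") { |type| UpcaseDecorator.new(type) }
decorations.add("upcase") { |type| UpcaseDecorator.new(type) } # replaces, doesn't stack

decorated = decorations.apply(StringType.new)
decorated.cast("hello") # => "HELLO"
```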
So this is the same thing, but it just takes the name of the attribute instead of us comparing things ourselves. And this is what serialize looks like in 4.2. The entire file has basically been deleted. I wanted to put the diff of this file in 4.2, but it was so huge, with all of the red lines, that the dots were one pixel tall and it filled up the entire slide. And this is what the type object that we extracted from it looks like. There's code there, right? It's not zero code, but it's significantly smaller than what was there before. It turns out most of why internals like that were implemented in the way that they were was just because you had to know about every possible method that can affect type casting. So we're building our APIs in layers: a simple implementation of defining a type for an attribute, taking a type and an attribute name and replacing the original type with the new one. On top of that, we were able to build a thing that we used to decorate an older type. On top of that, we were able to build an API to represent a common pattern for that. Once we've introduced the API into our system, it should be universal. We're modifying this columns hash internally, which implies that the columns hash has a lot of additional information that is useful to type casting, and at this point it really doesn't separate the idea of a typed attribute from the database schema. What we had to do in Active Record is go through and introduce internal APIs, which may or may not go through the columns hash, that abstract that information away, so that eventually we could separate it out and there would be a single canonical way to access the type for an attribute. Now, we're not going to look at all the diffs for this, because it took about a year and required rewriting a lot of Active Record's internals and a lot of API. But this is what the schema inference code looks like in Rails today, on master. We're no longer defining all the behavior of Active Record based on the columns hash.
We have a single method where we go load that up, and we loop over it, and then we just call public API. So when Active Record determines the shape of the attributes and what types they are from the schema automatically, it's just doing something that we're giving you the ability to do as well. We also started to define several other objects that we could introduce into our system that made management of state in Active Record much easier. This is one of them. It's called Attribute, and it handles the memoization and state transitions between the various states that an attribute can live in, and it manages the types. We found that these objects started to be known about everywhere, so we introduced a collection object to handle the transitions between those, and this is the thing that actually gets mutated. And most methods inside of Active Record now very quickly changed to these small one-off things that just delegate to this other object. In a lot of ways, it feels like Active Record internally has become a really bad implementation of the Data Mapper pattern hidden behind a layer of indirection, which I think qualifies it for worst ORM naming of all time. And one of the things that we're looking for in the end is to remove all of these modules upon modules upon modules that are just overriding behavior over and over again. We found a common behavior that needed to be modified frequently, so we pulled out a new object in our system. When we need to add additional behavior on top of that, we can just use a decorator. We can use the object-oriented principles that we all know and love. And when you have an object, again, it has an interface; you can figure out what it can possibly change. So an API looking simple, or having simple invocations, is not the same thing as it being easy to understand. Here's the pathological example. I have a product, and product belongs to user. If I change the user's name and I save the product, did the user's name change in the database?
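The Attribute object just described can be sketched in plain Ruby. The method names here are assumptions for illustration, and the real object handles far more states; this just shows the memoization and the build-a-new-object-instead-of-mutating style:

```ruby
class IntegerType
  def cast(value)
    value.to_i
  end
end

# Sketch of an attribute object: it knows its name, its type, and the raw
# value it was built from, and it memoizes the cast value so casting happens
# at most once per value.
class Attribute
  attr_reader :name, :type, :value_before_type_cast

  def initialize(name, raw_value, type)
    @name = name
    @value_before_type_cast = raw_value
    @type = type
  end

  # Cast lazily and memoize: reading twice doesn't cast twice.
  def value
    @value = type.cast(value_before_type_cast) unless defined?(@value)
    @value
  end

  # State transitions build new objects instead of mutating this one.
  def with_value_from_user(new_value)
    Attribute.new(name, new_value, type)
  end

  def changed_from?(old_value)
    value != old_value
  end
end

age = Attribute.new("age", "30", IntegerType.new)
age.value                  # => 30
age.value_before_type_cast # => "30"
```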
Raise your hand if you think you know the answer. Trick question: it depends on whether the product is a new record. But that's sort of my point. With belongs_to, I wouldn't even think that it modified save if I didn't just know it. There's absolutely nothing here that would indicate what could or couldn't change. Sure, if I see that I'm calling the user method and there was a belongs_to :user, that's fine, but if I want to see what could possibly modify save, where do I look? Do the docs for save list every possible class macro that can modify it? Do I have to go look at the docs for every class macro I've ever invoked on this class to see if it might modify save? It's also important when developing these APIs to have a contract. So here are a couple of things that I think should be universally true for attributes in Active Record, that are not true in 4.1 and 4.2 before this refactoring and API. I should also mention this API exists mostly finished internally in 4.2; it's not going to be public until 5.0, even though most of this work went into 4.2. One of the things that we want to be universally true: when you assign a value to an attribute and then read that value back out, it should never change based on saving and then reloading from the database. If you assign the same value to a model that's already there, the model should never be marked as changed. If you just call new on a model and don't give it any attributes, it should never be marked as changed. And for any possible value of an attribute, when you pass that value to where or find_by or any of the finders, you should get that model back. So this is the point where I was supposed to have the big conclusion, but I don't really know how to end this talk, so please ask me questions now. Thank you very much. [Audience member] Yes, I used the 4.1 API and I used the 4.2 API, and worked this through a little bit in order to make a JDBC adapter, and I much prefer the 4.2 API.
Thank you very much. Thank you. [Audience member] You mentioned that the internals are better now; I was curious what the performance is like now. It seems like you're using more new objects and stuff in terms of type casting and all that. Yes and no. So the question was: what is the performance impact? It looks like we added a lot of new objects. And this was actually a very common concern that came up a lot during the development of it. If you saw me at RubyConf this year, you might have seen that I was in the hall the entire time, because right before RubyConf we had gotten a report that 4.2 was twice as slow as 4.1, and I was fixing that. So I have another branch where I removed the objects and replaced them with singletons, and I saw no improvement in performance. We did introduce the new allocation of the attribute objects, but we removed several hashes, which were string-keyed, and we can't actually guarantee that we're getting frozen strings coming in. So by replacing the multiple hashes with this attribute object instead, we were able to reduce the number of string allocations, and it comes out to be about the same. There's other low-hanging performance fruit that is becoming more possible because the internals have changed to this new structure. Dirty tracking can be moved to this object, which knows much more about whether or not a thing could have possibly changed, so we can do fewer checks and things like that. Right, so the question was from the maintainer of the Oracle enhanced adapter, and it was: how is a contract like this published to the connection adapters? So there are a couple of different pieces to this in the connection adapter. The first one is we have a method that we need to be able to call to look up the type for a given column object. That is the only method that we were calling for this from ActiveRecord::Base.
And then on the SQLite, MySQL, and Postgres adapters, we introduced a type map object, and have a consistent internal structure for how that gets populated and how the lookups occur, local to the adapter object itself. [Audience question, partially inaudible.] Yes, but I don't want to rewrite all of the associations to do that. No. Oh, I'm sorry. The question was: I felt that passing the type object was a much cleaner API than passing a symbol in this DSL; are there other APIs inside of Rails where I think the same thing is true? I think associations, definitely, because that modifies so much behavior in really unexpected ways. I think we could gain a lot by describing that more in terms of objects, especially when you get into the ways gems tend to want to add new behavior to that. But that's never going to happen. [Audience member] So you showed us an example of modifying the attribute of a record that belongs to another record. Can you go back to that slide? Are you going to forbid changing the associated model's attributes in this case in future versions of Rails? No. The question was: are we going to change this behavior that I think is really confusing? To answer that: no, we are not. That would be a hugely breaking change, and it's not painful enough to warrant going through a deprecation cycle. Any other questions? [Audience member] You showed an example where you can pass a custom money type as a type. If I actually need to create a custom type like that, do I need to create a type object? Is that an API that I have access to? Yes. Well, the API you have access to. The question is: if I want to create a money type in my system, is that an API that I have access to, and what do I need to do? Yes. The API that you have access to is creating a normal Ruby class.
Because the API of this object is three methods. There is a convenience class that you can inherit from if you want to; it's called Type::Value, and it gives you things like a template method: if you don't need separate behavior for form input versus database input, which is true of a lot of simpler types like integer, where you're just always converting to an integer, it just calls a single method by default, and it also has nil values filtered out by default. But it's really easy to just make this inherit from nothing. It has three methods, which are cast, serialize, and deserialize; that is, from form input, to the database, and from the database. The contract for these is documented; you can find it with the documentation for the attribute method, and also by looking at the docs for Type::Value, where all of those contracts are spelled out. [Audience member] Here again, we've got string and integer, and you mentioned how introspection happened automatically before. Is defining these going to be the recommended standard, or is it something you only do when you need it? Right. So, let me go back to that example real fast. Looking at this example, the question was: we have redefined the simple string and integer on this model. Is that the new standard? Is this where we're expected to define all the attributes? And does the introspection still happen? Yes, introspection does still happen. No, this is not a new standard, and this will override introspection. So, for example, one of the things that we can deprecate is this behavior we have where a decimal column with zero scale is treated as an integer, for performance reasons in Ruby. We can just deprecate that happening automatically, because if you want it to be an integer for performance reasons in Ruby, you can just do that.
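A sketch of what such a class can look like while depending on nothing. (In a real app you could inherit from ActiveRecord::Type::Value instead; storing money as an integer number of cents is a design choice made up for this example, and the Money struct is hypothetical.)

```ruby
Money = Struct.new(:cents)

class MoneyType
  # From user input (e.g. a form): accept Money, numeric strings, numbers.
  def cast(value)
    case value
    when Money then value
    when nil   then nil
    else Money.new((value.to_f * 100).round)
    end
  end

  # To the database: an integer number of cents.
  def serialize(value)
    value && value.cents
  end

  # From the database: wrap the stored integer back up in a Money.
  def deserialize(value)
    value && Money.new(value.to_i)
  end
end

type = MoneyType.new
money = type.cast("9.99")      # => #<struct Money cents=999>
type.serialize(money)          # => 999
type.deserialize(999) == money # => true
```

With the public API, this would then be wired up on the model with something like `attribute :price, MoneyType.new`.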
I'm personally going to exhaustively define everything on my models, because I hate having to go to schema.rb to see what methods I can call on an object. And I am experimenting with a workflow where you turn on this auto-magic schemaless thing in development when you're first creating the model. And then you test-drive, right? Okay, now I need a title, now I need a body. And you add them to the model, but you never create a migration. And then it just magically saves to one table where it can build everything and have it mostly still work. And then when you're done, you run a generator, something like rails g diff_migration, and it takes the model, takes the schema, diffs them, and comes up with the migration required to bring them in line with each other. That's probably not going to be done any time for Rails 5. I hate the "do I roll back and re-run this migration, or do I have eight migrations because I didn't think of it all up front?" problem, where you have to freeze and think of every attribute you're ever going to add. I'm trying to look at ways that we can use this API to eliminate that. [Audience question, largely inaudible, about database constraints and validation support.] Yes. Okay, so the question was: at Rover, they really like database constraints. (I hear Rover is pretty cool, by the way; you should check it out.) And specifically, if they like database constraints, is there any chance that this work will be used to better support validating things at the database layer? Hopefully. I would love to see us actually treat a unique index on the database as the canonical way to do that, and still be able to present the user-facing error that you get from the uniqueness validation in Rails.
If you're not familiar, the uniqueness validation in Rails cannot actually guarantee uniqueness, because it does not have a lock on the database, and the database can change between when it goes to check whether the value exists and when it tries to save the value. The database is really good at validating this sort of stuff. I'd love to see more of this pushed into the database, and yeah, hopefully one day we'll get to the point where that's the more standard way to do it.