Hi, welcome. My talk today is on an ORM I wrote called Ruby Preserves. My name's Craig Buchek. I'm an independent web developer. I've been doing Rails and Ruby since about 2006. I'm also a big fan of Agile, and I do a podcast called This Agile Life. Please check that out. So, I started writing a Ruby ORM last year, and it's surprisingly small. So, I suppose you could call it a micro-ORM. It's missing some features that most ORMs would have, but it has some features that other micro-ORMs don't have, actually. So, I wanna make sure we know what an ORM is. An ORM is an object-relational mapper. An SQL database deals with relations, based on something called relational algebra, and Ruby deals with objects. Those two sides actually work differently, and an ORM brings those two sides together. There are a few caveats with that. There's something that people call an impedance mismatch, because things don't always work the same on both sides. One example is a tree structure. A tree structure in an object-oriented language is pretty simple. You just have pointers or links or containment (has-a) relationships. That's not so easy to do in SQL. There are a few different techniques, and you have to map between the two representations to save things to the database or to pull them out of the database. So, why would I write an ORM? Why would I tackle such a daunting task? First, I'm not really happy with any of the existing Ruby ORMs. I wanna explore and learn. I've been interested in writing an ORM for a while. My colleague Amos King is in the audience. He often tells people, hey, you should write your own ORM. And, stupidly, I listened to him. Maybe I can learn enough to write my ideal ORM someday. They say, if you wanna write something good, write one, throw it away, and then write a new one. So, in some ways, this is that. I'm gonna try to make mistakes and learn from them. And, hopefully, the second try will have a better architecture if I decide to go that way.
So, I also wanna answer a few questions. ORMs are really complex, but do they have to be? How simple can we make one? What is the essence of an ORM? So, every ORM I've used has a DSL to help write SQL. A DSL is a domain-specific language. The problem is you usually end up having to write some SQL manually yourself. That's called a leaky abstraction. That means the abstraction doesn't always work, and sometimes you have to go down to a lower level of abstraction. So, what if we made the leaky abstraction leak all the way, and we just used SQL? What if we didn't use a DSL to help us write SQL? So, I started designing the ORM based on a few strong opinions. The thing that drives me most crazy about ActiveRecord is having to look in two places for things. The relationships are defined in the model class, like the has_many and the belongs_to. And the attributes are defined in the database schema. So, I always hate it when I'm trying to figure out how a model works and I have to look in those two different places. I'm of the opinion that NoSQL is usually misguided. Most uses are not legitimate. I don't think most of us know or understand SQL well enough to know when we should reject it. I think the people that created SQL are pretty darn smart. I don't think I'm smarter than them when it comes to SQL and database stuff. The other thing is Postgres can do just about anything you need, including most of what a NoSQL database does. It still uses SQL though. Sarah Mei has a great article called Why You Should Never Use MongoDB. And basically, the gist of it is you're going to paint yourself into a corner, but you're not gonna see that corner until you're about a year down the road. I was talking to someone at Stripe about this. They use MongoDB as their primary storage. And they ran into that problem. What they're doing now is they constantly copy the data from MongoDB into a Postgres database for ad hoc queries.
So, how many of you have ever switched database vendors on a single project? All right, only a few hands. Was it as easy as just changing your database YAML file? Yeah, I don't see any hands on that one. So, YAGNI: you ain't gonna need it. You're probably not gonna need it. Why are we half preparing for something that's probably not gonna happen? And if it does, you're probably gonna have to do a lot more work anyway. These days, a developer workstation is fast enough to run a full database system. It makes sense to do all your development in the same way as production. So, if you're using Postgres in production, you should, if you possibly can, use Postgres in dev. With current hardware, this is definitely feasible for Postgres and MySQL. If you use Oracle or SQL Server, that may or may not be feasible. That's probably more due to licensing than hardware capabilities. There are some express versions that probably work though. And even if you do change databases, it's never easy, unless your app is really simple, which means you probably didn't need to change the database for any reason. So, who loves ActiveRecord? Raise your hand. All right, a dozen or so. Who hates ActiveRecord? All right, about a dozen or so. Who raised their hand both times? All right, a few. I'm mostly in the hate camp, but we have to deal with it. Who's manually had to type SQL in an ActiveRecord class? Yeah, that's like a couple dozen. So, ActiveRecord is a leaky abstraction. SQL leaks up into the upper layers. But ActiveRecord is the 800-pound gorilla. Every Rubyist knows it. It's well tested. Frankly, it's just easy to start using it. You don't even have to think about it if you're using Rails. Odds are, if you're hired to work on Rails, you'll be using ActiveRecord. So, domain logic is the things in our application and the interactions between those things. ActiveRecord, the ORM, is based on the Active Record pattern.
So, I use ActiveRecord without a space to mean the ORM, the gem that comes with Rails, and Active Record with a space to mean the pattern. So, the pattern was first described, or at least described well, by Martin Fowler. He's got a book called Patterns of Enterprise Application Architecture. Good book. It's a little bit dry and technical, but if you wanna learn about patterns, architecture, and enterprise applications, that's the first place you should look. He discusses the trade-offs of the Active Record pattern. There are a lot of pros and a lot of cons, and we'll talk about those in a bit. Some of the terminology also comes from a book by Eric Evans called Domain-Driven Design. Another good book. So, here's the UML diagram for the Active Record pattern. A few things to note here. Find is a class method. The object knows how to save itself, and the model is dependent on the database. So, who didn't understand MVC before they used Rails? Yeah, I didn't really understand it, but once you get it, you get this aha moment, as most of us have. MVC is about separation of concerns. Your model contains your domain logic, your controllers handle the web requests and responses, and your views handle your output. So, separation of concerns is important, and it's closely related to the single responsibility principle. The single responsibility principle says that things that need to be changed at the same time should be together. So, my biggest problem with ActiveRecord is that it encourages bad engineering habits, mostly violating the single responsibility principle. And that makes it harder to test domain logic without testing the database too. So, my experience is that the sweet spot for ActiveRecord is 20 or fewer model classes and a fully CRUD app, where the database and the screens of the application pretty much mirror each other. But most of our apps are more sophisticated than just CRUD.
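As a rough illustration of the three points noted about the Active Record pattern (a class-level find, an object that saves itself, and a model tied to its data store), here is a minimal sketch in plain Ruby. It is not Rails' ActiveRecord: the class name, the FAKE_USERS_TABLE hash standing in for a database table, and the display_name method are all hypothetical names invented for this example.

```ruby
# Hypothetical stand-in for a database table: a hash of id => row.
FAKE_USERS_TABLE = {}

# Active Record pattern: one class holds both domain logic and persistence.
class ActiveRecordStyleUser
  attr_reader :id, :name

  def initialize(id:, name:)
    @id = id
    @name = name
  end

  # The finder is a class method that talks to the data store directly.
  def self.find(id)
    row = FAKE_USERS_TABLE[id]
    row && new(**row)
  end

  # The object knows how to save itself.
  def save
    FAKE_USERS_TABLE[id] = { id: id, name: name }
    self
  end

  # Domain logic lives in the same class as the persistence code.
  def display_name
    name.capitalize
  end
end

user = ActiveRecordStyleUser.new(id: 1, name: "craig")
user.save
puts ActiveRecordStyleUser.find(1).display_name
```

Because display_name sits in a class that also reads and writes FAKE_USERS_TABLE, any change to storage touches the same class as the domain logic, which is the single responsibility concern raised above.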
So, one of the alternatives is the Data Mapper pattern. There was a Ruby ORM called DataMapper that didn't quite implement the Data Mapper pattern, so that's weird and interesting. It was closer to the Active Record pattern. So, Martin Fowler explicitly says Active Record is a starting point, and you should move to Data Mapper once things get too complex. You might see domain model objects called entities sometimes in the Data Mapper pattern. An entity just means it's an object defined by its ID. So, Python has an ORM called SQLAlchemy. It uses the Data Mapper pattern. It's very highly regarded, and it's basically the Python ORM from what I've seen. So, here's a not-quite-UML diagram of the Data Mapper pattern. It actually conflates Data Mapper and the Repository pattern, which we'll talk about in a bit. This is sort of simplified. A few things to note. The user class knows nothing about the database, and the repo class knows how to find and save user objects. So, the repository represents a collection of domain objects. We can treat the database as an in-memory collection. So, we have something similar in ActiveRecord: scopes and class methods. But ActiveRecord doesn't support two different data stores for the same model class. I don't know if it's even possible to fit it in. I've never seen anyone do it before. Class methods are generally problematic. We would rather not have class methods. They lead to procedural code instead of object-oriented code. They often indicate you're missing an abstraction. They limit polymorphism. They're hard to test, and they're hard to refactor. I've linked to an article on Code Climate at the end that talks about the details of the problems, especially with refactoring class methods. So, the Repository pattern gives a clear separation of concerns. The domain model handles business logic, which is pretty much what we're used to in ActiveRecord.
The repository handles persistence: saving, retrieving, and finding things, pretty much like the class methods and the scopes in ActiveRecord. And the mapper handles mapping the database fields to the object attributes. So, I started Ruby Preserves by writing the README, or, I think, actually by coming up with a clever name. So, this is something called README-driven development. Before writing any code, I put everything in the README: my motivations, the opinions I was basing this on, and how I wanted the ORM to be used. Basically, the high-level API. It's changed some since then, but it's effectively pretty similar to what I envisioned originally. So, here's what that API looks like. You start with the domain model. We can just use a plain old Ruby object, a PORO, but here we're using a Struct. Notice I can create my model with no database involved. How much of our app could we write without persistence? Sometimes it's a lot more than you think. Uncle Bob has a talk about architecture, and he talks about delaying persistence until a long time into the project. You might actually use a flat file for a while before you get to the complexity needed for a database. So, this is a lot simpler than an ActiveRecord model, right? It also shows all the field names in one place. Also, it doesn't have 461, or whatever the count is, methods on that model. So, when we get to the point of needing persistence, we can configure Ruby Preserves with the name of the database, which is RubyConf example here. We define a repository associated with the model class. So, that's the user repository, and we say it should use the user model. As James Edward Gray pointed out, we probably don't need the model keyword there. It's always gonna be a model. So, that's a fix I need to work on. And then we can just use the fetch command that's built in to get something out of the database. And this is actually working code from the ORM right now.
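To make the split between domain model, mapper, and repository concrete, here is a small sketch in plain Ruby. It is not the Ruby Preserves API: UserMapper, UserRepository, the row format, and the sample data are assumptions made up for illustration, and an array of hashes stands in for a real database table.

```ruby
# Domain model: a plain Struct with no knowledge of persistence.
User = Struct.new(:id, :name, :age)

# Hypothetical mapper: converts raw rows (string keys and string values,
# as a database adapter might return them) into domain objects.
class UserMapper
  def to_object(row)
    User.new(row["id"], row["name"], Integer(row["age"]))
  end
end

# Hypothetical repository: the only place that knows where users are stored.
# An array of row hashes stands in for a real database table.
class UserRepository
  def initialize(rows, mapper: UserMapper.new)
    @rows = rows
    @mapper = mapper
  end

  # Look up one user by primary key; returns nil when nothing matches.
  def fetch(id)
    row = @rows.find { |candidate| candidate["id"] == id }
    row && @mapper.to_object(row)
  end
end

rows = [{ "id" => "alice", "name" => "Alice", "age" => "30" }]
repo = UserRepository.new(rows)
user = repo.fetch("alice")
puts user.name
puts user.age + 1
```

The User struct can be created and tested with no repository at all, which is the point made above about writing much of an app before persistence exists.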
The next thing we need to do is define a mapping between the domain model and the database table. We don't need to define the table name because we're handwriting all the SQL. We do need to define the primary key, so every repo gets a fetch method for free. The fetch method takes the ID, or whatever the primary key is, and returns the object associated with that ID. So, here we've added a database field named username, and that corresponds to the ID attribute in the model. This is how you map the different names on the database side versus the model side. We added an age field. It's got the same name in the database and the model, so we don't have to specify that, but we specified that it's an integer. Specifying the type lets us serialize and coerce the data as it goes between the database and the model. This is not coercion between user input and the model. That's a different scope, and sometimes it has different requirements, so we don't implement that. So, I haven't implemented any saving yet, but we can basically do it manually. This shows an example of how simple the API is. We just call query and it executes some arbitrary SQL. We can check the return value to make sure it has the right size. If we're doing an insert, we expect one row to be added, and we can raise an exception if that wasn't the case. Here we're using select. Select is basically the same thing as query, but it maps the result set to a set of objects. It's an Enumerable. You can go through them, you can ask for the first one, and you can work with the entire set. So, this is basically equivalent to an ActiveRecord scope. So, relationships aren't usually implemented in a micro-ORM, but I've implemented has_many and belongs_to. They took less than two hours each, but it took me several months to think about how to do them. I'm gonna see if I can do has_and_belongs_to_many and has_many through, like ActiveRecord does. So, here we've got a collection of addresses.
So, this is a has_many relationship, and in the database, addresses are stored in a different table. That's what a has_many relationship means. Addresses in that table reference the user table with a foreign key named username. And we have to specify the repository for the addresses so we know how to new up each of those address objects. Note, we only have two very simple queries there. We have a query that creates a result set for the addresses, and then we have a select that creates a result set for the users, and then we map everything to objects, including the addresses for all those users. Now, this is pretty simplistic. We'd probably wanna have a WHERE clause on that addresses query at the very least, so that we only get the addresses associated with the users we got. But these examples are actually from my acceptance tests on the gem. So, we've also implemented belongs_to. Here we've got a group object. We have to get that from another table. This is a belongs_to relationship. The group repository again tells us how to map that result set into an object. Again, we have only two very simple queries: the query and the select. And we could do a belongs_to and a has_many as two different fields in the same object, in the same mapping. But you generally wouldn't wanna go in both directions for the same relationship. That causes a circular dependency. We try to minimize circular dependencies to improve our architecture. So, we're not supporting that yet, and I haven't actually tested to see if it would work or not. So, N+1 queries are an anti-pattern. SQL queries involve network latency, so they're really slow. So, if you've got a post with a thousand comments, and you have an SQL query to get the post, and then you iterate through the IDs of the comments associated with it, you'd end up with a thousand and one database queries, and that ends up being really slow. So, sometimes ActiveRecord will give you those.
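The post-with-a-thousand-comments case can be sketched by counting round trips. This is only an illustration under stated assumptions: FakeDatabase and its three methods are hypothetical, each method call models one database query, and nothing here uses ActiveRecord or Ruby Preserves.

```ruby
# Hypothetical in-memory stand-in for a database that counts the queries it serves.
class FakeDatabase
  attr_reader :query_count

  def initialize(posts, comments)
    @posts = posts
    @comments = comments
    @query_count = 0
  end

  # Each public method below models one round trip to the database.
  def find_post(id)
    @query_count += 1
    @posts.find { |post| post[:id] == id }
  end

  def find_comment(id)
    @query_count += 1
    @comments.find { |comment| comment[:id] == id }
  end

  def comments_for_post(post_id)
    @query_count += 1
    @comments.select { |comment| comment[:post_id] == post_id }
  end
end

comments = (1..1000).map { |i| { id: i, post_id: 1, body: "comment #{i}" } }
posts = [{ id: 1, comment_ids: comments.map { |comment| comment[:id] } }]

# N+1 style: one query for the post, then one more per comment ID.
slow_db = FakeDatabase.new(posts, comments)
post = slow_db.find_post(1)
post[:comment_ids].each { |id| slow_db.find_comment(id) }
n_plus_one_queries = slow_db.query_count

# Two-query style: one query for the post, one for all of its comments.
fast_db = FakeDatabase.new(posts, comments)
post = fast_db.find_post(1)
post_comments = fast_db.comments_for_post(post[:id])
two_query_count = fast_db.query_count

puts "N+1 style: #{n_plus_one_queries} queries"    # 1001
puts "Two-query style: #{two_query_count} queries" # 2
```

The two-query version has the same shape as the has_many loading described earlier: one result set for the parents, one for the children, then mapping in memory.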
Usually it's just because you're manually iterating through a collection. So, there's one solution in ActiveRecord. It's usually to use includes to eagerly load the collection or the associated objects with the main object you're trying to get. Also, if you are using ActiveRecord, please use Bullet while you're in development mode. It will watch for and find pretty much all of your N+1 queries. So, I made a few mistakes along the way. I'm not so sure there aren't some big mistakes remaining in my gem. The biggest mistake was that I was starting to generate SQL for the relationships, and that violated pretty much the whole principle of using raw SQL, which was part of the premise of writing this. The code ended up very complex. I was generating terrible SQL, and it turned out I was generating N+1 queries. So, that wasn't very good. The idea of using a proxy object sounded good at first. You don't want to populate the relationships unless they're used, but I've found eager loading actually tends to be better with databases. So, it took me a long time to think about how to do relationships to fix that. I took several wrong turns before I came up with something reasonable. And even when I was headed in the right direction, it took me some time to narrow in. But once I narrowed in and found the solution, I was pretty sure it was the right abstraction because it was very quick to implement. So, I'm pretty happy with what I eventually came up with there. Maybe. Joins are the one thing I'm not 100% comfortable with. It'd be best if we did joins in SQL, but here's where the problem comes in. In this example, we've got a join of two tables that both have a column named name. And our result at the bottom there has a result set with name in there twice. Unfortunately, SQL doesn't deal very well with that. It doesn't have a simple solution for that. And the Ruby database adapters don't handle it very well either.
They want to create a hash out of that, and hashes have to have unique keys. So, this is how most ORMs solve this. You have to specify every column you want to pull from the database. So, you're gonna have to map in your code: okay, what did I call this on the SQL side? What did I call it on the Ruby side? You're gonna have to keep track of all that stuff. Since my users are writing SQL manually, I didn't want to force that API on them. It would have changed the whole concept. It kind of would have been like, well, what's the point of using this? So, I think my solution's okay. The patterns of web apps usually save us, because a web app usually doesn't want to show thousands of things on the screen at once. So, we're probably gonna have some sort of LIMIT clause. So, that's Ruby Preserves. I think it has some advantages. It's small, and it's got a simple API that makes it easy to use. It's easy to understand the whole thing. And it encourages good engineering practices. So, it's 350 lines. I compared it to some other ORMs. Perpetuity, and we'll talk about these other ORMs in a bit, is 2,500 lines, including an in-memory adapter, I believe a Mongo adapter, and a Postgres adapter. Lotus Model is 2,000 lines, plus it uses Sequel. Sequel is 31,000 lines, and ActiveRecord is 210,000 lines. I believe it's 210,000 lines out of 270,000 lines total for Rails. So, ActiveRecord is a big chunk of the Rails code. So, there's a lot of complexity in there. So, out of my 350 lines, 100 are actually just the word end. That's because I refactor mercilessly. Most of my methods are one line, because smaller methods with good names show intent and make the code easier to read. So, of course, there are some disadvantages to a 350-line ORM. The biggest one is probably composability. We can't chain scopes like ActiveRecord does. In the example here, we've got two scopes.
One is called published and the other is by author. You can actually chain those together, and they will create SQL that satisfies both of those conditions. We don't have any equivalent to that. If you wanted to do that, you'd have to write another method that has your handwritten SQL. You would have to write that SQL by hand. We're tightly integrated with Postgres. Part of that was a line-count savings. We could add an adapter layer. It wouldn't be too hard. But you'd probably have to change your SQL any time you changed your database. And as we talked about before, you're probably not gonna need to change your database very often. Even if we had an in-memory option, you'd need two separate repository objects: one for SQL with Postgres, and one for whatever the in-memory storage needed. Whether that's a problem or not, it's hard to say. Actually, some people do testing with an in-memory repository that's handwritten. It's not that hard to write, actually. We don't have any ActiveModel support yet. It makes the API slightly hard to work with in Rails if you don't have ActiveModel. So it's definitely not ready to use with Rails yet. So, here are some further ideas that I'd like to work on if I continue on with this. Mostly we're a proof of concept at this point. I haven't done any optimizations. I haven't done any performance testing. But I was able to make it work and keep it pretty small. And I sort of figured out what the essence of an ORM is. So, one of the optimizations I'd like to do first is prepared statements. Prepared statements cache the query plan on the database server. A query plan is basically the database software figuring out what pieces to grab from what tables. So that's a pretty good optimization, actually. You'd use that for queries that run frequently. I'd like to automatically determine the mappings if I can. We could pull them from the database like ActiveRecord does, because the repository knows about the database.
It's just the domain model class that doesn't know about the database. We could also pull them from the model if we have some way to do that. Virtus has a nice API for defining attributes, and the repository could ask the model, hey, what attributes do you have and what are their types? We could try implementing ActiveModel. That's probably possible. I'm not 100% sure on that one. We wouldn't be able to use plain old Ruby objects anymore, though. The biggest challenge is implementing the persisted predicate, persisted with a question mark. It would require, again, a circular dependency, because the model class would have to ask the repository, is this object persisted? When I was doing research for the presentation, I found some similarities with data access objects, and I kind of want to explore that more. You know, what's the difference? What are the advantages and disadvantages of those two patterns? A really interesting idea is, what if we took this as a base and layered code on top to map from a DSL to SQL for us? That gives a nice separation of concerns within the ORM. It would separate the SQL generation from the object-relational mapping. And I'd like to see if I can get this to the point where it can be used in production code sometime. So, Ruby Preserves is nowhere near ready for production, but I still don't have an ORM that I'm really happy with. This is the order I'd currently consider them in for my own personal projects. Obviously it depends on the context. So, Lotus is kind of an alternative to Rails. It's about a year and a half old, I guess close to two years now maybe. It's got an ORM called Lotus Model. It's pretty young and immature. 0.5 is the most recent version. It does implement the Data Mapper pattern, so that makes me happy. It can use plain old Ruby objects. You just need to define three pretty simple methods, most of which we usually have anyway. It uses scopes similar to how Ruby Preserves does.
So, all the built-in scopes are private, so you can't just call where and order and limit from outside. You have to use those within a method on the repository. That makes sure that you know exactly how your data is being accessed from that repository. It has database repositories, strategies, and adapters for SQL, in-memory, and flat files. The main problem I have with it, though, is that all the mappings are done in a single config section. I think that should be done in the repository class. The other problem is that the scopes are class methods, and we talked about how that causes some problems with testing and refactoring. Perpetuity is a small gem created by Jamie Gaskins. There's no support for relationships yet, but he is using it in a few of his production apps. And it's pretty simple. I've taken a look at it a few times. ROM is the Ruby Object Mapper, primarily written by Piotr Solnica. It goes further than the Data Mapper pattern, actually. It uses something called command query separation. It uses a lot of immutability. Piotr and a lot of that part of the Ruby community are very into functional programming and immutability. The main problem I have with it is that it requires a very different mindset, and I haven't been able to wrap my mind around the API and how he wants it to be used. The other problem is that it's pretty much built from the bottom up, and there's less attention to the public API that everyone would use. So, there's also the Sequel gem. I have to pronounce SQL as S-Q-L and Sequel as sequel to distinguish between the two. This is by Jeremy Evans. He won a Ruby Hero Award earlier this year. He also wrote a framework called Roda, which is really nice. You should take a look at that if you want something like Sinatra. So, I found this, I guess, two years ago when I was doing research for a talk called Alternatives to ActiveRecord. It's exactly what I'm looking for, except it's the Active Record pattern and not the Data Mapper pattern.
And my last choice is using ActiveRecord with attribute declarations. These were actually added in version 4.2, but no one really told us until they started working on version 5. So, the idea is that in your model class, you can say, I have this attribute that is the ID and it should be an integer. And it will check that to make sure that you do the right thing. So, before that, there were a few plugins available. One's called Annotate Models. It has several descendants now. It's written by Dave Thomas. It adds comments to your model file that list the fields and their types. And I wrote my own thing called Virtus ActiveRecord. It uses Virtus, which is a way to describe your attributes. You put those in your model, and when the model class loads, it makes sure that the database schema and what you declared in your model match up. So, I've got some further reading. There's Turning the Tables, subtitled How to Get Along with Your Object-Relational Mapper. Brad Urani is actually talking about immutable data structures tomorrow. And Sarah Mei is one of the organizers of this event. That's a really good article about understanding MongoDB and NoSQL versus SQL. So, this is online. I'll have the URL up in a minute. Actually, you can get there at the bottom. You can follow those links if you get the URL for this talk. So, I wanna thank all the people who gave me feedback, especially James Edward Gray II. Thank you. Amos King, thank you. The slideshow is done with Remark.js. And for the UML diagrams, I used an old program that's been unmaintained for about six years called ditaa, D-I-T-A-A. It was not too hard to use. I'd appreciate any feedback you've got. You can hit me up on Twitter at CraigBuchek, or on GitHub as booch. You can send me an email at that address. The project is on GitHub, URL right there.
The slides, you can get the slides from that URL or you can actually get the source code for the slides as well at that other URL. Thanks.