So, as Josh said, my name is Sarah. I'm a Ruby developer at Pivotal Labs in San Francisco, and we do a lot of Rails development, so you're going to see some Rails code in these slides today. I have some GitHub and some Twitter. I'm here to talk today about Ruby APIs for NoSQL, and I also like to call this talk Polyglot Persistence. We're going to talk about how you store data in Ruby when you're writing a system that uses more than just a relational database.

But I'd like to start out with a little bit of audience participation to get ready. So, hands: who's written an application in Ruby that uses a relational database? Okay, that's great. About equal to the number that do TDD, that's pretty awesome. So, who's written a Ruby app that uses a relational database and some kind of alternative data store? Any kind? Memcache counts, file system counts. All right. Who's written a Ruby app that uses only non-relational storage? Wow, that's a lot more than I thought. That's awesome. When I started asking this question, I don't know, six months ago, on that last one you would get like one or two little lonely hands in the middle of the room. So it's awesome. It's nice to see that stuff has come that far.

So, I want to start off by showing you a diagram you've probably seen a hundred times before. This is a diagram that shows up in every single "Learn Rails in 25 seconds" blog post. Here we have a very vanilla Rails application. Requests come in, they go through the routes to the controller, the controller talks to the models, the model talks to Active Record, which gets stuff out of SQL. And then we go back out the way we came in, except we go through the views at the end and we send out a response. And I was showing this diagram to a friend and he said, you know, if you show a diagram like that, you are legally obligated to have a little diagram of a cloud somewhere in there. So, I added one.
So, anyway, I would be really surprised if there's anyone in this room who has actually written an application with real users in which the system diagram looks like this. And in particular, I would be very surprised if anyone in here has written one where MySQL is the only way that data is persisted. And, you know, usually you don't set out to make a polyglot persistent system, but it just kind of happens. You have your nice little Rails app and then your product owner comes in and says, you know what, we need to be able to do free text search. So, you're like, all right, let's throw in some Solr. And pretty soon you're like, okay, we need another web server. Maybe it was a bad idea to let them upload their videos to our file system. So, maybe we'll do a little S3. And then pretty soon you realize you need to do some background jobs. Maybe you toss in some Resque. And you're like, oh, crap, I've got a friend graph like every other freaking web app in the world. We need to cache the friend graph. Another use for Redis. Then you're like, yeah, you know, we really should do some object caching. So, we'll put a little bit of Memcached in there.

And let me just pause to say that I'm drawing all these arrows into and out of the models. And I know there are people in here that do these things in the controllers. But I'm ignoring you. We're just going to pretend that everyone does this the right way, and it goes in and out of the models. So, just pretend with me.

So, at this point you've got five different data stores. And you haven't even done anything crazy, right? In fact, I would say that this setup is a lot more common than the one that I showed originally, the plain vanilla, undecorated Rails app. I would say that this is actually what a basic Rails app looks like. This is almost like the new normal, right? And that's actually pretty amazing. This has happened in the last year or so, or a couple of years, right?
We've come to recognize that most applications have some data that doesn't fit into relational persistence very well. These models have relationships. And who knew, but relationships are actually pretty hard to model in SQL, and the joins get nasty really, really quickly. And then you have blocks of text that need to be searched semantically, and not just by the order of the characters that are in them. And you've got these objects that are frequently read but rarely written, so you need to have them available more quickly. And you've got assets that live out in your other data stores that you need to associate with them.

And I would like to point out, before we go on, that most of these things until recently weren't even really thought of as alternative data stores, right? I mean, Cassandra and Redis, that kind of stuff, sure. But maybe not Memcache or Solr or S3. And a lot of people are like, well, really, S3, is that really a data store? I think you know how I feel about it. But one of the really interesting things about the NoSQL movement is that it really has expanded our idea of what a data store is. Does anyone remember, in the 90s, there were these guys doing Java, persisting stuff to XML text files? Was anyone one of those people? Those guys were doing NoSQL before it was cool.

So, big problem number one with a system like this is: how do you encapsulate stuff into a class where it makes sense? You've got a conceptual model, but you've got little bits of it split out over six different data stores. How do you reconcile that into a single class? What does that look like? So that's question number one. Keep that in mind, but there's another question too. And that's: how do you go from a system like this to a system like this? How do you replace your primary data store with something else? Because what we've been looking at so far have been sort of satellites to the primary data store, which is SQL.
But if you're lucky and you're very successful, then you may need to replace it with something else. Digg, for example, just moved everything over to Cassandra. They don't have, as far as I know, any relational storage left. And of course, this is not always the right answer, because they just fired their VP of engineering. But there are situations in which it's the right answer. And if you want to do this in Rails, this has historically not been super easy. So that's question two: how do you replace your primary data store?

And in reality, rather than this kind of thing where you've got no SQL-based storage at all, most apps end up using some kind of hybrid model, where you've got standard SQL persistence for some things, because there is data that it's suited for, right? There's basic in-and-out CRUD stuff that SQL is really good at doing. And then you end up using non-SQL storage for stuff where it fits and where it makes sense. I mean, you can keep your Cassandra for your voluminous sparse data, if you have any, which a surprising number of apps do.

So these are our two questions. How do you unify a model whose data is scattered across multiple data stores? And how do you replace the primary store if you want to?

So just to sum up this section: this idea that multiple databases are a fact of life, even if you're building a simple application, is what's called polyglot persistence. And this is not really a new idea. I think Ben Scofield has been talking about it for more than a year. But I think a lot of Ruby developers think "data storage layer" and they think, oh, it's a choice between MySQL and Postgres, when really what you're going to end up with is an assortment of different technologies. And that's going to happen whether or not you want it and whether or not you plan for it. So let's take a look at what that looks like. Let's start with a base case.
I decided that for my simple application we would build a cephalopod social network, mostly because I like the word cephalopod. So here we're starting out with a basic Squid class. We're inheriting from ActiveRecord::Base. We've got all the relational introspection and all the normal stuff for free. So far, we only have one data store.

So then the product owner comes along and tells us the customers want to free text search the set of squids. So we install Sunspot and we add Solr free text search by adding a searchable block to our model, which describes which attributes get indexed in Solr and how they are retrieved. So we're up to two.

And then, it's a web application, so we have to have a friend graph. So we decided to store our denormalized list of friends in Redis. This probably would not be the first solution that you settled on. I've seen a couple of these applications, and they normally start out with: OK, first we'll have a join model. And that works until you start spending 400 milliseconds at a time doing joins. And then the next strategy is usually: let's just keep a list of them somewhere, and we'll update it when we update the join model. And then you don't ever have to join to get the friend list. So that's an example of a relationship that just doesn't fit very well into SQL and doesn't go with the row-oriented SQL style, so we're trying something else. We add a follow method that uses the very nicely, descriptively named gem, redis, to insert stuff into Redis. So we're up to three.

And then, this is a social network centered around cephalopods and the novels they write. So they want to be able to upload their finished manuscripts so that other squids can read them. So we install the S3 gem, and we add an upload novel method that puts the text file into S3. So, up to four. And we haven't even been trying very hard.
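The slide code itself is not part of this transcript. As a rough reconstruction of the shape being described, here is a minimal plain-Ruby sketch in which four in-memory hashes stand in for MySQL, Solr, Redis, and S3. The constant names, the `search` method, and the `novels/<id>.txt` key format are assumptions made for illustration; in the application from the talk these roles would be filled by `ActiveRecord::Base`, Sunspot's `searchable` block, the redis gem, and an S3 client.

```ruby
require "set"

# One conceptual object whose data lives in four places.
# Every store below is a hypothetical in-memory stand-in.
class Squid
  ROWS         = {}                                       # relational rows (MySQL)
  SEARCH_INDEX = Hash.new { |h, word| h[word] = Set.new } # free text index (Solr)
  FRIEND_SETS  = Hash.new { |h, id| h[id] = Set.new }     # denormalized friends (Redis)
  BUCKET       = {}                                       # uploaded files (S3)

  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Store 1: the primary record. Saving also refreshes the search index,
  # which is the lifecycle hook a search library needs.
  def save
    ROWS[id] = { name: name }
    index!
    self
  end

  # Store 2: look up squid ids by a single word of the name.
  def self.search(word)
    SEARCH_INDEX[word.downcase].to_a.sort
  end

  # Store 3: keep the friend list as a set, so no join is needed to read it.
  def follow(other)
    FRIEND_SETS[id] << other.id
  end

  def friend_ids
    FRIEND_SETS[id].to_a.sort
  end

  # Store 4: put the manuscript in the file store under a per-squid key.
  def upload_novel(text)
    BUCKET["novels/#{id}.txt"] = text
  end

  private

  def index!
    name.downcase.split.each { |word| SEARCH_INDEX[word] << id }
  end
end
```

Even in this toy form, the class talks to four different stores through four unrelated interfaces, which is the mismatch the talk turns to next.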
Now at this point, the Squid class is starting to look a little bit mismatched. Here's the entire thing. It's a little bit hard to read; I think the text is a bit too small. But you're descending from ActiveRecord::Base. You've got this searchable method that deals with Solr. You've got the novel upload method that deals with S3. And you've got this follow method that deals with Redis. And each of these things persists a little piece of what is conceptually one object. And you know what I would really like out of this? I would really like to have a nice, consistent interface for all of this stuff. Wouldn't that be nice?

And, you might laugh, I was originally thinking that it would be nice to be able to put everything through Active Record, as a Rails developer. Because it's very tempting. Most of us who do Rails tend to think of Active Record as being synonymous with "model". So it's like, it would be awesome. I'll have all my little data stores down here. I'll have all my code on top. Everything will go through Active Record. It'll be really happy. But the problem with this is that it just doesn't work that way, sadly. And it's not ever going to. And this is not because Rails Core is trying to make our lives more difficult. This is because Active Record is specifically built to model a relational database. It's an ORM, an object-relational mapper.

And in fact, in Rails 3, it moves even more in that direction, because they've got this new thing called Arel, which we're going to have a whole talk on tomorrow, which conceptually is an implementation of the theory behind relational databases, which is called the relational algebra. And it's actually really awesome and cool. It's conceptually consistent. It's easy to understand. It's very tidy. I like it. Unfortunately, the relational algebra is totally useless when you're dealing with data that's not structured into rows and columns and tables.
It just doesn't map well onto something like a key value store or a column-oriented or document database.

The good news, though, is that once I started looking at Rails 3, I got really happy, because the folks who rewrote it realized, quite correctly, that what we were calling Active Record in Rails 2 was modeling two very different types of behavior. So here's what we have in Rails 2. We've got this big blobby Active Record thing that does both the communication with the database and lots of other useful stuff, like validations and serialization and lifecycle callback methods. That's stuff that your calling code uses, stuff that interfaces with the controller level but isn't directly related to how the data is persisted. So in Rails 3, they split it up into two different sections, Active Model and Active Record. Active Record handles the persistence. Active Model handles validations and callbacks and serialization and all kinds of other stuff.

A lot of the blog posts about this split have focused on the fact that it makes it possible to take Active Model out and use it in objects outside of Rails. But as someone who does mostly Rails, I'm actually not very interested in that. What I'm interested in with this split is the fact that it makes it possible to take Active Record out of the model and still use the model in Rails. And all you have to do in order to do that is write an adapter that presents the same API to Active Model that Active Record does, which is actually pretty easy. And in fact, there are already a bunch of them out there, like Mongoid: Active Model compliant, whatever that means, persistence libraries for accessing these stores. So you can have a similar interface for different persistence models in Rails 3.
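To make the split concrete, here is a toy plain-Ruby sketch of the idea, not the real Active Model or Active Record API. The `ModelBehavior` module plays the role of the behavior half (a validation and an after-save callback), and two interchangeable adapter classes play the role of the persistence half. All names here (`ModelBehavior`, `TableStore`, `DocumentStore`, `store`) are hypothetical.

```ruby
# Behavior half: validation and save callbacks, with no knowledge of storage.
module ModelBehavior
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def after_save_hooks
      @after_save_hooks ||= []
    end

    # Register a block to run after every successful save.
    def after_save(&block)
      after_save_hooks << block
    end
  end

  def valid?
    !name.to_s.strip.empty?
  end

  def save
    return false unless valid?
    self.class.store.write(id, name: name)           # delegate to the adapter
    self.class.after_save_hooks.each { |hook| hook.call(self) }
    true
  end
end

# Persistence half, variant 1: row-like storage.
class TableStore
  def initialize
    @rows = []
  end

  def write(id, attrs)
    @rows.reject! { |row| row[:id] == id }
    @rows << attrs.merge(id: id)
  end

  def read(id)
    @rows.find { |row| row[:id] == id }
  end
end

# Persistence half, variant 2: document-like storage keyed by id.
class DocumentStore
  def initialize
    @docs = {}
  end

  def write(id, attrs)
    @docs[id] = attrs.merge(id: id)
  end

  def read(id)
    @docs[id]
  end
end

class Squid
  include ModelBehavior

  class << self
    attr_accessor :store   # swap the primary store without touching the model
  end

  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end
end
```

Setting `Squid.store = TableStore.new` or `Squid.store = DocumentStore.new` changes where data goes, while `valid?` and any `after_save` hooks (a search indexer, say) keep working unchanged. That is the property the talk attributes to the Rails 3 split.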
And this doesn't really solve the problem of multiple stores in a single model, but it does make it possible to keep a lot of the niceties of what we think of as Active Record without actually persisting to a relational database, which is pretty cool. And it's going to make it much easier to use a non-relational store as a primary data store. So I think we're going to start seeing a lot more applications that do that. Should be interesting.

So here are a few of the existing Active Model compliant libraries. Most of them are quite alpha and experimental and they all have a lot of rough edges, but I would expect to see more of these, and I would expect to see them mature quite a bit over the next couple of months.

So, just to give you a quick example of what this actually looks like. In Rails 3, we don't actually inherit from ActiveRecord::Base anymore, so the model is nicely decoupled from any sort of default data store. In this one, we're defining Mongo as our primary data store, and then we tell it which fields we want.

And sadly, as I said before, Active Model doesn't really solve the problem of having multiple data stores in a single model. For instance, if you wanted to do free text search within this model, you would still need to add the searchable block so that you could use Solr. However, in Rails 2, it was really, really difficult to make something like this work together, because Active Record was so tightly coupled with all of the Active Model stuff, and the Sunspot library needed to hook into some of the lifecycle calls in your models. For example, whenever you update one of your indexed models, you want to go out to Sunspot and have it update its index, right? So Sunspot is always tracking the data that's in your database.
And so in Rails 2, if you wanted to replace Active Record with something like Mongoid, it was actually pretty tough, because Mongoid had to make sure that it implemented exactly the same set of callbacks so that searchable could find them in the right place. It didn't always seem to match up exactly right. But since we've decoupled that in Rails 3, it's much easier for these things to work together, which is pretty awesome.

So, coming back to our two questions from the beginning of the talk, which were: how do you encapsulate a model that has data scattered across multiple stores, and how do you replace the primary store? For the first question, I would say there's not really a pat answer yet. There's some work to do there. For now, the best you can do is break stuff into modules where possible, try to extract the common bits, put as much in lib as you can, and reuse it across the models that use that same kind of persistence. So there's not really a silver bullet here as far as I know, just general software engineering good practice.

But the second one is basically solved in Rails 3, which is really exciting. And it'll be interesting to see which Active Model compliant libraries actually start gaining traction, because right now they are all pretty rough, and they're all missing certain things that people expect there to be in an ORM, such as joining and searching, which are not going to exist in these libraries at all. So it'll be interesting to see how application development, especially in Rails, evolves as people start mixing these different types of data stores.

So if I can leave you with a thought, if it'll go forward, there we go. If all you did was read the blogs and the Twitter about SQL and NoSQL, you might think it's sort of a standoff. You have to pick one side, right? Either you're the guys in the yellow jackets or you're the guys in the blue jackets.
But I think that's a false dichotomy. I think any application with more than 100 users is going to use both. So please, just quit it with all the FUD, look for ways they can play nice together, and if you're going to do this, upgrade to Rails 3. So with that, I will take questions.

Quick question: how do you test with multiple data stores?

Usually I try to break the stuff for an individual store into a module that I can mix in, and then write shared examples for it that I can include in the tests for each model that uses that module. That's typically how I do it.

Do you get the sense, when looking at these libraries, especially the NoSQL ones, that they're shoehorning themselves into the same thinking about relationships? Or does Rails 3 let you embrace the reason you might pick that data store to begin with, a different way to store your data, et cetera?

I don't think that it's Rails that's doing that. I do agree that there's a little bit of, "You can just drop this in, it's just like using a relational database, except it's totally different." And I'm not sure that's super useful, but I think it's not the fault of Rails. It's more that that's the way people are used to thinking about things. And I think in many ways some of these libraries that are trying to be more relational, or to have at least a more relational interface, are almost like the gateway drug, right? They're like, come on, try it, it's not so bad. You can let go of one of the letters in ACID, right? Just come on over and try it. But I think we're going to see a more native style for each of these databases, because as you get into it, you start realizing that, oh, you know what, it would be a lot more useful if I thought about this as a column-oriented instead of a row-oriented database. But a lot of the interfaces still try to pretend it's row-oriented, so we'll see.
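The module-per-store advice from the talk and the testing answer above can be sketched together. This is a minimal plain-Ruby illustration under stated assumptions: `Followable`, `friend_key`, `Octopus`, and `check_followable` are hypothetical names, a hash of sets stands in for Redis, and the `check_followable` method plays the role that shared examples would play in a real test suite.

```ruby
require "set"

# Everything that touches the friend graph store lives in one module,
# so any model that needs the behavior can mix it in.
module Followable
  FRIENDS = Hash.new { |h, key| h[key] = Set.new } # stand-in for Redis sets

  def follow(other)
    FRIENDS[friend_key] << other.friend_key
  end

  def following?(other)
    FRIENDS[friend_key].include?(other.friend_key)
  end

  # Namespaced key, e.g. "squid:1", so different models do not collide.
  def friend_key
    "#{self.class.name.downcase}:#{id}"
  end
end

class Squid
  include Followable
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

class Octopus
  include Followable
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

# Shared checks: run the same expectations against any class that mixes in
# Followable. Returns a list of failure messages (empty means all passed).
def check_followable(klass)
  Followable::FRIENDS.clear # start each run from an empty store
  a = klass.new(1)
  b = klass.new(2)
  failures = []
  failures << "should not follow anyone by default" if a.following?(b)
  a.follow(b)
  failures << "should follow after calling follow" unless a.following?(b)
  failures << "following should be one-way" if b.following?(a)
  failures
end
```

Calling `check_followable(Squid)` and `check_followable(Octopus)` exercises the store-specific code once per model without duplicating the expectations.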
Yeah, I wanted to ask you about using NoSQL databases. Right now we're using, for instance, Mongoid with a Rails 2.3 app, and we love it, once you've gotten past some initial monkey patching. But we're really missing the sexy migrations of ActiveRecord. And I wonder whether you think that's more a symptom of us not using it correctly, or an inadequacy of where Mongoid is at right now, and it should have a better migration system in place. And whether you have any comments about model versioning.

You know, one of the things that I really, really like about traditional Rails is the migrations model. Migrations, database changes, data store changes of any kind are a really difficult problem to solve right. And I think that we're going to see more activity on that in some of the clients. Mongoid seems to be doing a little bit of work in that area. I haven't really had a chance to test it out, but they have some experimental stuff on that, I seem to recall. But yeah, it's one of those things that you give up when you go away from relational databases, right? It's this idea of a model with a set schema that you then have to change or version. I know there are people that would argue that if you need versions, you're doing it wrong. But on the other hand, you know, it's semi-structured data, right? You have to choose where the "semi" goes, I guess.

Yeah, I'm wondering if you've seen any of these adapters for things like the semantic web technologies, triple stores and RDF and that sort of thing.

I haven't seen any, but I haven't been looking.

Since not a lot of these data stores implement two-phase commit, how do you ensure consistency across your polyglot storage?

Well, usually you don't, right? And that's okay, though. I mean, basically how it works is that some of these systems are sort of self-healing, right?
If they notice that there's an inconsistency, they'll kind of fix it themselves, and sometimes you can set it up so that when you try to read something, it'll first make sure that it's consistent with whatever your primary store is, and then it'll read it for you. So one of the things that you're giving up when you do this kind of thing is guaranteed consistency all the time, right? You've got the CAP theorem: you can have two out of three of consistency, availability, and partition tolerance. And what you're doing is giving up a bit of that C in order to gain a little bit more of the A and the P. So that's definitely an issue, but usually with code you can kind of work around it.

Thanks for the talk, Sarah, it's really interesting. I have a question that's kind of off topic, but have you had any experience using some of the managed third-party services for some of these new tools, like MongoHQ and all the other stuff that goes along with them?

No, I haven't used any of the services.

So you just rolled your own?

Yeah, that's definitely a learning experience.

Hey, so a while back on App Engine we did a little monkey patching and got DataMapper to run. Initially people were concerned that Rails 2.3 had too much dependency on Active Record for things like scaffolding, but it turns out you can generate Active Record scaffolding that works with DataMapper, and you don't have to write schema migrations. So people should just go check out using DataMapper for fun on any version of Rails they want, and it might just work. I mean, there are many years of tuning and adapters and things. Strangely enough, people aren't writing DataMapper adapters for all these other key value stores. I know it's unusual for them to do that, but you can certainly use it on MySQL and some other things.

Yeah, I know DataMapper has an Active Model thing. I haven't used it myself.
Can you repeat the question?

Oh, the question was, I think what you were saying is that on App Engine you did some monkey patching so that you can use DataMapper with Rails 2.3, and that DataMapper runs on older versions of Rails, which allows you to do some stuff that Active Record doesn't give you. But DataMapper, if I'm not mistaken, is specifically designed to be an ORM. It's another ORM, right? So it's designed to be used with relational data stores. I think there are some plugins or adapters that do non-relational stores, but it's oriented towards being an ORM, and maybe that's why we haven't seen as many adapters for some of the key value stores and [inaudible].

With that, we're going to wrap it up. Thank you, Sarah.

Thank you very much.