 Thanks for coming to my talk. I am David Ferber, and I live in Ithaca, New York. I work for a software consulting company called Gorgias. Ithaca is Gorgias, so Gorgias is Ithaca. And I'm going to be talking to you about rapid data modeling and active record with the JSON data type, specifically the JSON data type in Postgres. Another title could be how I sped up my Rails development by fitting a document store into my active record models. And I'll fill in the blanks there as I go along, but I'm going to mainly tell a story of some projects out of which this emerged, okay? Ithaca, maybe you've heard of it, maybe you've not. It's a small city in upstate New York. It's a, yay, a bus ride away from New York City, so we get a lot of business from there. And it's the home of Cornell University in Ithaca College, so we get a lot of stuff from there. It tends to be academic stuff. It tends to be often small to medium sized stuff for which Rails can cut right through. So we have a Rails toolkit that we use to do it. And so everything for me is often about how to get this done as fast as possible, because I'm often working on three or four or six or even eight things at a time, and I need to get through. So rapid data modeling, even before you get up to rapid data modeling, there's a lot of things to do before you would think of such a thing that I thought it would go through. I'm a big fan of simple form and learning, how simple form works beyond just the top of the readme in terms of how you can use it to make your, well, your form simple, but more on that later. And slim for templating because it removes a lot of the extra stuff and taking advantage of the view template inheritance in Rails to minimize the amount of views. And I like to take common components and put them in engines like CK Editor. Every time I start a new project and I push it up, I take to Heroku, how long does it take to build and it started timing out. So we created an engine and put CK Editor what we use on a CDN and whenever we need it, we just plug it in. And I say, don't waste work. The things I'm going to tell you about rapid data modeling with Postgres, I'm not doing any of it with the intention of simply prototyping it and throwing it away at a later time, but of using the JSON in order to build something. And I suggest always to keep reading the instructions even if you started learning Rails at 1.2 or 2 or 3, they're always adding new things. And this is a story of how sitting down and reading what came out in Rails 4 helped me solve a problem. And of course, inherited resources. I'm a fan of that. The projects I'm talking about use that. I noticed that it's come out of fashion lately. If you go to the projects on GitHub, it says, please don't use this anymore. And so I tried to cut back. Also, the 10th time trying to explain to new developers what an association chain is caused me to back away as well, but the model works. And most often when you have a lot of little projects I've found try to be consistent in the way that you do things on all of the projects so that when you parachute in, the things are where they need to be. Now last spring, I got involved with a project, Project Aragats. Like many of our projects, this was, somebody at Cornell got a grant and they had a project that had been evolving over the years and they needed somebody to come in and clean it up and make it work. And this particular one was the archeology department. Some archeologists there, two of them, go every year and have been since 1995. They've skipped some years, but they've been going to Aragats, which is one of the areas of the oldest human settlements, going back to like 8,000 BC. And they do field work and they collect artifacts and then they've been describing them in a first to Microsoft access database that then went to Microsoft Excel and they valently did it all themselves. And with all respect, but it needed a database person to look at it and clean it up and they wanted to expand it and be able to search on it more efficiently than they could. They didn't wanna have to learn SQL in order to keep searching for their data and they wanted it to be publicly available to other researchers so that they could put it in their footnotes and stuff. So let me bring you into the world of archeology real quick. Last summer, when we were working on this, they took a drone with them to Armenia and thanks to the time that we saved them with the system, they were able to fly the drone around. And here, when an archeologist, data modeling for an archeologist, what you're seeing here is a, it looks like a tundra, but that would be, they go and they look around and they see, oh, here's a place where there's some human settlement and this site, they call it a landscape and here's another one and they survey a bunch of these sites and then they decide when they go in a particular year, which ones look interesting. So there's a database of surveys of sites that they've looked up, prospective places to dig and then they decide, okay, this year we're going to focus on that one and this one and that would be called an operation and then in this operation, they're going to go and set up their tents and stakes in the ground and measure and start to do some preliminary digging and say, oh, here, this place looks interesting and this little place looks interesting and this little place looks interesting within that confined landscape and then they're going to start digging with their picks and toothbrushes and whatnot. This, you see that there's a little animal kind of running along. They totally weren't meaning to film a discovery channel type of thing, but that's fun. So this application is going to be used for, they take all the pot shards and whatever and they put them in bags and boxes and they bring them back to the tents and then in the evening, they get out some beer and they and turn on the television and start measuring and guessing what is this and cataloging. So they have an application where they load up to the place that they're at and all the different things that they can find there, they can add them. So lots of different things meant to be used in action but also meant to be used to answer questions like, okay, last year, Bronze Age pots with rolled rims. I need to be able to find that without writing SQL. So there needed to be this very fancy and the beginning of it, very intimidating search interface to write. So the things that they find and of course there's some English and Russian things going on that it's not just in English or in Russian but the things that are in Russian are presented also in English and you choose what kind of thing you're looking for and then you can apply many filters and when you click Add Attributes then you get a lot of different choices and then you can pick your filter and check your lists of things and click Submit and you will get results. So things, lots of different things. In archeology, they, here's some of the kinds of things that they can find at a site. From a bone which they call fauna or a bone object which is an object made from bone or pottery which is seldom coming out as a pot, shells, wood objects, all different kinds of things that they find. So all of these things, there's some metadata about them that everything has. Everything has a period in which they guess that it might have started being from in a period in which it ended. If they really sure that it's from the early Bronze Age those would be the same, for example. But everything gets periodized and what place they found it at and anything else that all of these things have who found it would be on there but then each of these things, everything else is different and these things can have a lot, a lot of attributes about them. Here's, it's just a pot, right? But when an archeologist looks at it and they see pinch handles and grip handles and I'm not even sure what those are and out core positions and what is it a lid? Is it a neck? And core bands and colors and there's dozens of attributes on here and a bone is just as complex as that and whether it's been butchered and how it was butchered and what it was butchered with and it blew my mind how much they do. And I had originally started with having each of these things in its own table. Actually somebody else had started the schema and that's how it was when I got it and then for the searching the things in common they had this thing called an object registry and I was trying to wrecking my mind with the forms and hooking us all up and making the search interface when I read a blog post saying that you can do this in Postgres and I was like whoa, I wonder if I can use this. So here's an SQL query that's saying if we have a single table with everything in it all the attributes of everything and we have a data column that's a JSON column then we can actually search on the keys. Does anybody know you can do this? Has anybody done this? So I thought well, I wonder if I can make rails do that or active record easily. And so I sat down with a beer. I wanted to see if I could use a single table inheritance to have the stuff that's in common in actual database columns and then put everything that is a variant into this data column, okay? So I would have a common base class, we actually call the entity but it was having trouble explaining what entities were so I call the artifacts here. They call them objects but in Ruby that's taken. So we had to say look guys let's pick something else and we settled on entity but artifact I think is clear for a non-archeologist. Pottery extends artifact and all of the specific attributes will be in the data column. Okay, so how can we do this? Well let me tell you first here's a migration that would create a column that would have a JSON column. Now what is a JSON column? Has anybody ever stored JSON in the database before? Did you use a text column? Yes, I have too. I had a slide in here earlier that said humans have long stored JSON as text in the database but now you can actually store it as JSON. Now Postgres 9.2 they introduced this JSON column and all the JSON column really is is text. Postgres stores it as text just like you stored it when it was a text column. However they provided a bunch of new functions that would allow you to search within the JSON and to work with the JSON and blow it out say that you have an array of JSON objects you can blow it out in a query as if they're table rows and you can select out the JSON and pretend that they're columns all kinds of interesting things like that but it was all still stored as text and parsed on demand like James with the presentation about the scrolls and that you make a query it's gotta open up all the scrolls with the JSON. However you can index the individual keys if you want to. JSON B came along in Postgres 9.4 and if you have access to 9.4 and you wanna work with JSON use the JSON B because it gets stored as JSON and that means faster searching and it also allows you to create a what's called a GIN index you can say index all of the keys in this JSON and lookups when you say find me a document with this key value pair in the JSON it's actually faster I found than doing it in regular columns okay but it's much slower if you're doing greater than and less than I'll get some more of that later but if you've got this JSON column which here I call I always call it data then you can do something like this store accessor data comma width and when I saw that I was like I think I might be able to use this what does this do? Well let's say that we have our artifact and we say pass in the width is 20 whoa artifact dot width is 20 cool and there's no column for it data width because it put it in the hash which is your data field and is serializing it to the JSON so by the same token your HTML form with the width where the user type 20 gets posted it comes in as a string because active record doesn't know what to do with it because it's not an actual column and it assigns it as 20 so you have a string of 20 and you ask for the width it is 20 it's not an integer then you get wild and you ask if the width has changed and it goes kaboom because it's not an actual attribute it's just an accessor that gets stored in your data however data changed would become true otherwise active record won't know when to save the record now I had to have this massive search interface now I was gonna use ransack somehow has anybody used ransack? Okay so ransack is a gem written by Ernie Miller that basically glues a form to so you have an input that would be something like width greater than equal to like that so you have the column and then what they call a predicate and you can have a form input and then when you say entity.ransack then that drops into a rel and converts that into a where width is greater than or equal to 30 and then call dot result that pulls it back into active record land so that you can keep on adding scopes and stuff it's a way to build advanced search filter type things okay so I wanted it I needed it to look like that so that I could use it with ransack and then my search problems would be solved if I was gonna use this JSON column for storing attributes and I found that you could with a custom ransacker it's basically saying okay if you see width on my entity model then please when you make a query out of it please change the word width with data arrow width colon colon int now what this is telling Postgres you know data look for the width key in the data it's gonna come back as text and then convert that to an integer so that you can compare it to the integer that I gave you okay so put them together and what do you get the basis for a document store question is how to put them together and what is a document store so a document store is when you store you don't have a schema necessarily but you have or at least it's defined in the model that says it has these keys and there's not columns in the database I wanted to emphasize here that this is a hybrid schema mix type document store like if you're going to use the JSON column it doesn't mean you can't use the other columns or use columns like you normally do an active record so if I was going to use this and it was actually going to be rapid data modeling then I would need to have a quick way to declare what attributes go in my document something kind of like active model serializers or Mongo Mapper or Mongoid and I thought well okay I need to typecast my data coming in and out because it's probably coming in as a string and I want it to be an integer so then I need to convert it to something for a Ransack to be able to search store accessor as a method it turns out to be one of those methods in Rails you go to the Rails internals and sometimes it's like what is that but this turned out to simply wrap up two methods called read store attribute and write store attribute and I found that oh I could just do the same thing and just catch it before it goes in and catch it when it comes out and make it what I want it and then I can declare a custom Ransacker and that would allow me to do something like this with the integer attributes with length width and height and then I also have string attributes and float attributes and Boolean attributes and so forth so I thought well this might work as a template method where I have to define what goes into the blanks and then integer attributes and string attributes would just fill in what those blanks are in reality it turned out to be a little more complicated because there is no dot to bool kind of thing and a date conversion is a little more complicated than just calling to date on whatever the end result is a model that looks something like that and that's actually how the models look in this project and it got the work done but I wasn't really happy with it it kind of bugged me that the implementation to me seemed to be driven by those blanks I needed to fill in and so string attributes Boolean attributes, float attributes that seemed to me that Boolean string float would be like the arguments, the first argument of a common method that could be called something like attribute and doing it this way rather than grouping the attributes by semantically where they are in the meaning of that document instead I've grouped the strings and grouped them together by type and I asked myself what was Sandy Metz do? Apparently I'm not the only one that ever asked themselves that there's even a sticker that I just got but what would Sandy Metz do? I had just watched her last RailsConf talk on Confreaks and the refactoring she did, one of the stages was pretty much what this was it's like okay we're almost there but not quite what if it looks something more like that where it's declaring each attribute one by one along with what it is and that would give a place to add more interesting things like default arguments and make it easier to come up with different types of attributes I found that once I had the ability to say create my own essentially column types within my document and store them how I like but I could be more creative in making the models fit and that refactoring that you saw there did not happen on this project the code of a ceramic object doesn't look like that it's the next project where I got to keep on taking it further and this was for the Inclusive Recreation Resource Center which is based in Cortland, New York runs out of SUNY Cortland or the State University of New York College at Cortland and their mission is to make possible that the picture that you see there can happen that to have a database, a map of places that you can go if you have special needs for access and to assess them to find out is the bowling alley really accessible or are they just saying it's accessible is it really like that? You can say that's accessible but somebody really had to come and measure it right? So then when we have a need we wanna go to a state park we can type in Lake Placid and see what comes up and find out is there really an accessible bathroom there that I can go to should I load up my minivan and drive out there? Okay and part of this website is for an online course for training the assessors another part is for actually making the assessments and these assessments have a lot of forms like this many forms. This is just the beginning of one of the forms and there's like a PDF took a picture of the finders with some of those forms and many of these have multiple pages. Now what are they? As you see by some of the names a gazebo, picnic shelter, a gift shop, pro shop, locker shower, changing room, picnic area, overlook, shopping facility, sauna, amusement ride. There are a lot of different things but many of them are well they're all physical places many of them are physical structures buildings and buildings tend to have some common elements and we noticed that when we were looking at the forms as we were overcoming our initial shock at seeing how many forms there were and when do we have to get this done by and how many migrations do we have to write to fit all this data and then as we were actually reading the form seeing oh these questions are kind of similar to those questions, those questions oh they're always asking about the door and can you open the door and get in the door and how wide is the door and is there a clear route to the door is there a ramp, well describe the ramp is there a parking area and there were always variations of the same questions maybe with slightly different wording on each form but there was a recurring theme within this we never need to ask about unlike the archeologists we never need to search on the specific attributes within the JSON though the search is like give me a state park show me what's around like Placid but not give me all of the gazebos with 40-inch doors, okay in the domain-driven design angle that got me thinking that what I have is an assessment a structure as an entity and these doors and ramps and things you know are value objects that I would like to be able to compose each thing out of so that they would be a reusable part that I can plug in wherever it shows up and I wonder if I could embed these reusable parts into the JSON as it turns out I could I wanted something that would look like this to this is an embeddable model and everything tends to have a name you know door, okay, door to front door back door some things have many doors and very often the comments and whether or not the thing has the thing in question and then the door itself here we see an example what I mean by when you control what an attribute is because it's just going into the JSON then you have more flexibility than you do with ActiveRecord in order to customize what is in that attribute so here the first thing you see there attribute door type is an enumeration that is typically a drop-down list or a list of checkboxes maybe you can add another which is multiple is true means you can select many and then I had another option as like strict is true like you can only select from the list or you can have extra you know things and dictionary because there's different you know open handle and close handle both come from a common dictionary of of door handle types and then here we have a comfort station has a door well a lot of things have a door so I just, you know use single table inheritance for a specialty area which is what a comfort station is called and every specialty area gets the things that most everything seemed to have and then a comfort station has doesn't actually explicitly have a door just got one from a specialty area but as it turns out restrooms and showers also have doors and wherever a door needed to be it was as simple as writing embeds one or embeds many doors or restroom or parking area ramp or elevator or stairway and so forth and then when you have a door for then let's make it behave like except nested attributes for wants it to behave so that you can assign the attributes the form doesn't need to know that this isn't these aren't actual columns and this isn't a real life active record model but just a ruby object with some attributes on it and you can create a custom input for example in simple form that's what this checkbox is other and it hooks up to an enumeration with multiple true and gives you something like this with the add other button and suddenly when we were looking at that and thinking oh my god half of these checkboxes have the other and how are we going to do that and suddenly we owe enumeration multiple true and let's use the checkboxes other simple form input and it was no longer a thing and you know this isn't a partial so whenever we need to plug in a door we just call the partial or actually call a helper that calls the partial with some special arguments and then each door might have some slightly different hints or labeling on the form well simple form gives you some wonderful if tedious but very handy specific ways to address that in the you know in the language file so if you want your non-active record model thing to behave with accepts nest like accepts nested attributes for does all you gotta do is define a method that is your attribute name underscore attributes equals that receives what came in from the form and then builds out your array and sticks it in your jason here i i'm showing you this just to show you how you can do a lot of tricking of active record to make it behave the way that it's supposed to behave with your non-active record thing because i don't want the form to have to know about it i want don't want the controller to have to know that the thing is document store and not and not uh... regular columns so i found myself in this project pulling this out into a gem and using it since then on many other projects because now it's one of the tools in my toolbox putting it here both you can take a look at it i encourage you and would love some feedback on it and also because if you want to see like how i how i got from integer attributes to attribute well uh... but i wanted to reflect more on some of the yeses and noes about the experience that i've had working with this jason column especially as a document store emerge from me from using finding a way to use single-table inheritance and put all of the very attributes into the document store into the jason column so that i don't have the thousand column table which is probably a good indication that i shouldn't use single-table inheritance but i didn't want to have to work with all the dependent models and stuff this was just so easy and got the job done from the inclusive recreation resource center irrc you see that it's handy for storing value objects and arrays and stuff which means that it's good for what you probably think of when you think oh jason oh i have some jason i need to do something with it what should i do well let's put it in a jason column i was building a dashboard for interacting with digital ocean and dns simple and and bit bucket and what else and you get back an api response when i create a droplet let's stuff it in the in the jason column and there it is along with the other attributes that i've already i'm in the flow of building out of some forms and oh crap i forgot to add that field i better generate a migration and here i just go to the model pop it in there you know i think well i regret this the answer is no then it just goes uh... goes there and i move on that last joke i want to emphasize that uh... it's not a permanent commitment that there's ways to get into jason ways to get out of jason look at the jason documentation in post-dress nothing prevents you from running actual queries even though you're using active record and there's all kinds of cool things that you can do that i've not began to talk about that i was going to talk about really i did not need to know about any of it in order to do the work that i showed you so far so why uh... why talk about it here but i would say more that it has the advantages of a document store and all the usual oh you should use mongo db because and you can tell that post-dress uh... has been answering the mongo db question with this and the document store isn't the only thing they had in mind with it a lot of the demos i see would have like okay you know you you have orders and order orders have line items the line items don't really need to exist outside of the order so why not put the line items within the jason of the order model that was a that's a pretty common example and then they say oh but you have a context in which you want to access line items independently here's how you can write a query or create a materialized view that's going to pull all of the line items out of the models that they're stored in and present them to you as if they're in their own table it's like oh okay that's one of many things you can do with it so it has the advantages the speed the and it also has the disadvantages of a document store you know all the projects in which i've used it had a big legacy database for which a large part of the project was to normalize it and sort out like oh we have a text column that's parading as a bullion and they type in yes or no or why or n or why n or n why or yes and like no people uh... let's uh... you know and then figure out true or false once you know and then you have this big migration and then you push push the boat off the dock and it goes out into the pond and up until the time you push that that boat migration whatever you know i just you know there's going to be a migration and as we go we can you know add columns and add fields to the JSON tour hearts content but once it's out there and you want to say change and move stuff around then you have to go through all the documents and change them because there's no partial update you can't say update this key in my jason and set the value of if the age is forty set it to forty two because they're older now you gotta go through and you gotta update the entire you know and save the entire record and in terms of is the performance okay at the level the number of records that i have i don't notice a difference uh... i would and i'm still working on the uh... question of performance in terms of if i have a lot a lot of records you know what kinds of things my my my performance questions are more of how much slower is this or how much faster that is this than if i wrote columns for all of the things that i'm putting into my document store and i what i found is if i'm looking up simple stuff like find this where that equals that then it's pretty comparable and if you're using jason b with a genindex it's actually faster and that if you're doing something like greater than less than string matching any more complex operation than equals in the jason it's actually going to be somewhat slower then if it was in a column and how much slower seems to progress as the number of records increases okay so the more jay more records you have the slower it gets when you're doing greater than less than type stuff which was important for our guts the archaeologists and not relevant at all to the i r r c also i when i was doing this testing i totally did not expect to find this but i found that let's say that you have the restrooms with the doors let's say you have many restrooms and many doors and then you have age up here just as a as a key at the top level the fact that that jason has nested for some reason makes the entire you know search even for the age become much slower in other words if you want to use the jason and you're going to use the indexing and all of that if searching on the keys in your jason is going to be important to you then don't nest jason within the same column because it just it's like throwing in a wrench anywhere from five to sixty times slower depending on the query and depending on how many records and i didn't want that to be true but i can't make i keep looking maybe i'm doing something wrong but then i run a benchmark and it is still comes out slower so i would like to go back when we do human arrays excavations and arrogance i would like to turn grip handle into an object because it should be a value object but if i do that then the search performance is going to suck so i'm not going to do that so where to go from here i encourage you to check out my ordox store gem this is not a talk about that but more how that emerged from the adventures that i had in rapid data modeling and to uh... i've put in some blog post to learn more about the jason column and other things you can do with it from there and i encourage you to go and read them thank you for coming and uh... hearing about the jason column once again i'm david perber and there i am at email and on twitter please uh... give me a holler thank you