 All right, so I'm going to talk about an idea called Schemalisk SQL. It lets you have all the advantages you have of your relational database and some nice document stuff, too. I'm Will Linewerber. You probably know me best for my amazing gem bundle, which when you install it, installs bundler. It's widely popular. But I work at Heroku on the Postgres team, and we run a lot, a lot of Postgres databases. And I really like Postgres, and I want to tell you some of the cool things you can do with it. So a little bit of where I'm coming from to explain some of my motivations is I started out with originally doing plain, active record, simple CRUD apps, and that worked really well for a while. And then I found CouchDB, and I fell in love with Couch. It's a really great database. Has any of you guys tried Couch? Yeah, I mean, it has some great features. The HTTP Rustful-ish interface is nice. MapReduce is pretty fun. But the thing I loved most was the documents. Being able to store the things I was dealing with in Ruby, just as a regular document, just really felt right. Tables, they sort of suck. Third normalize form is a pain. And often my data that I'm dealing with, my web applications, isn't really relational data. It doesn't fit the model. Making a lot of tables in the pursuit of third normalize form, it just doesn't feel right a lot of the times to me. So one of the last projects I worked on was a tool where artists could version their songs and then show those versions to their fans. And so this is sort of how it went. Song had a title. It had a pointer back to the artist. Some other metadata there. Each song had versions, one, two, three, four, five. Those versions had lyrics that could change between versions. The data was made, stuff like that. Each of these versions had a list of tracks. The tracks had files and so on. I was trying to model this before Couch in SQL. I got a problem when I got into the file part. It was hard for me to justify having a files table when really a file in and of itself didn't make a whole lot of sense. It only made sense in the context of everything else. And so I mean, I could have done that in SQL, but I'm just not smart enough to keep all of that stuff in my head. And so I did it in Couch and it worked great. But then I came to Heroku and started learning Postgres. And Postgres is a fantastic database. Multiversion concurrency control. The whole tech search that it has is phenomenal. Geolocation, the last version of Postgres has this awesome thing called K nearest neighbors where you can say, find me the five things closest to that on an index, instead of having to get 20 things and boil it down to five. Listen notify built in. The data types are just great. On and on and on. I could give a whole talk on great things in Postgres. But the best thing that I didn't realize I missed so much was transactions and being to roll back. Like I'd be, now whenever I have a scary migration, I just spin up a fork of my database, you know, and then type begin, run the migration, you know, see if it worked, if it didn't, because I never get it right on the first try. I roll back and then try it again until I get it right. Throw away that fork database and then do it on the real one. Like being able to do that just saves so much time and it's just, it's fantastic. But, you know, I missed, I missed documents. Well, I should say that I missed documents. I don't anymore and I'll tell you why. Two things, HStore and PLV8. One of these you can use today. The other one is sort of cutting edge, not really ready for production, but I think it's really fantastic. So I'm going to start with HStore. So HStore gives you a key value column type inside of Postgres. And so you'll have your regular table and all the columns will be an HStore document. And the awesome part is you can have an index on that. How many guys in your apps do you have just like a general like serialized table that you just treat, or a serialized column that you just treat as like an opaque thing and you can't really query on? Did any of you guys do that? Yeah, now you can query on it. Okay, so I'm going to show you how HStore works. And so, see here's a query, it's select and then I have a string and then I'm casting that string into an HStore. And so this is just, you know, A is X, B is Y, and then that little single arrow operator, that's an HStore operator of how to get a key out. And so I'm asking, you know, give me the A key. And so I get X out as you might expect. Another example. So SQL as a language is pretty bad. And the OR operator is actually concat. And so if you can cat to HROs together, you will overwrite keys that are existing and any new keys will just show up as new keys. And so here the B key was overwritten to P and then C was added. You can also subtract keys out. They'll only get subtracted if they match specifically. And so in this case, the B key, since that matched, that's gone, but the A still stayed. And that's some basic HStore stuff. There's more and more operators that I'm not going to get into. You can look it up. The Postgres documentation is actually pretty good. But one of the great things you can do is you can use this, you know, this is how you would use it in a where clause. So I'm selecting all of my products that have the color attribute. So in this case, adders is an HStore column in my table. And I'm pulling out all the ones that are red. And I would get, you know, some results, like any other query. You can update things. So this query here would update all of the colors and change them all to blue and adds it if it's not there. And you can do joins on it. So this would give me all of the companies that I have a red product. And so you have the full power of all the things you're used to in SQL, but you can access into your serialized column there. But here's the killer feature. You can do an index. So this first one creates a functional index on the, pulling out the color. And now you have, you know, when you do a query with Postgres, it'll use the index. Post, the HStore also has a general index, which isn't as fast as the one above, but you don't have to say it for each key that you get. And so there's some trade-offs there, but you can have it either way. I think that's really cool. You can have it today. If you have a Postgres 911 database, all you have to do is create extension HStore, and you have it. HStore has been out since 8.0. Something, I forget the first version it was in, but you'd have to do some other things to get it. You'd have to, like, find where the HStore SQL is on your machine and do backslash I. This is, you know, much easier. And you can have this in your migrations. Great news. HStore 4 is going to have HStore support built-in. There's a patch. You can bring it back to ActiveRecord 3, but ActiveRecord 4 has it, and you get to use it like this. So this is an operator I didn't cover earlier, but this one would find you all the products, again, with a color and value of run. That's how you use it in ActiveRecord. I think that's pretty simple. One of my colleagues wrote a demo. You can check this out. HStoredemo.rokuapp.com to see it live, and then you can fork the code behind it if you want to see a real-life example of how to use HStore. Thanks to him. However, there is... I'm really glad that Xavier, I think, yesterday went on and on about SQL. I was going to... I had my own loving slides of SQL. It's fantastic. I do all of my apps in SQL these days in it. It has HStore support as well. Jeremy Evans. I love you. So what we use this for, as a real-world example of HStore, so we've run the Postgres service, and part of that service, we check in on every single database several times a minute, and we ask it some questions. How are you doing? Are you up? How many tables do you have? How many connections? And so on and so forth, and we check in and we store that information. Now, over the course of the project, we figure out new metrics that we want to check. We start checking them, and we figure out that other metrics aren't really that useful, so we stop checking them. And if we had to do database migrations all the time, that just wouldn't be fun. And so this, we start checking something new. We just shove it in the HStore and we have our observations. Another use, there's a tool called Wickel-D, which is a project in Go that just listens for your logs. And if you format your logs like this in a key value pair, it'll parse that and then just shove it into HStore, so that way you can just print to standard out and then grab these letters so you can go through your logs afterwards. And that uses HStore to store everything. So HStore is pretty good, but there is, as far as I know, no nesting, like you can't have HStores within HStores, and it's only strings. And so one of the ways that we get around that is we have some naming conventions. One of the things that we can do is underscore at, and Ruby will cast that back into a time, or if it's number something or something size, we'll cast it to an integer, and so on. In practice, it works well enough, but leave something to be desired. But the great thing is you can use it today. So next I'm going to talk about something that you can play around with today, but I really wouldn't recommend using it in production yet, and that's PLV8. PLV8 embeds inside Postgres. And V8, for those of you who don't know, is Google's V8 engine that Node is built on as well, and it's really fast. One thing that's great about this for me is that it's a trusted language, so eventually I'll be able to offer this to Heroku customers. Postgres has a number of languages that she can embed, such as like C, but we can't offer that because then you could run arbitrary C and that's not fun for exploits. But JavaScript is trusted, so we can run that. It's written by Hitoshi Harada, and it's been maintained lately by Andrew Dunstan, who's really cool. Here's the location of PLV8. Installing it's pretty easy. I wrote a brew recipe, but I'm not sure if it's going to get merged because it actually has to install into Postgres. It can't install into its own thing. But installing it yourself is pretty easy. You make install, does it, and then you say create extension PLV8, and then create language PLV8, and you have it. So how do you use PLV8? I'm going to go to one of my favorite examples of trivial programming, the Fibonacci, and it happens in nature. So this is Fibonacci written in PLPG SQL, and that's ugly. All those capital letters, it's like yelling at you. But, you know, this is standard, naive Fibonacci, and it's slow. So this is doing 0 through 35, every 5, and that's a long time. But it works. Here's that same code in PLV8. It takes a make a Postgres function that takes a integer, and then it returns an integer, and then in between the $2 signs is just JavaScript, and there's my function. And it's pretty fast. I didn't want to stop there. I wanted to cheat a little bit and memorize the function, which, you know, sure, that's cheating, but that's really fast. So that's cool. So how many of you guys know PLPG SQL? Okay. How many of you guys know JavaScript? Yeah, see, that's what I thought. Yeah, it's a lot more. So, let's go back to documents. So here's a sample of JSON document. It has name and age, you know, some nested stuff. I made a table that just has one column data, and I just shoved a bunch of JSON into it. Here's the first seven. I made a million of them. And then I made this function here, GetNumeric. So it takes a key list, it takes data as JSON, it returns a numeric, and then all I'm doing is parsing the JSON in JavaScript, pulling out the key, and then that's it. So what does that give us? So now we can select age there out of our documents, and here's, you know, do a statistic on it. There's the average age of all of them. So the first one is JavaScript. The second one is if I just had a table of just age. So, you know, this is slower than SQL. But when you do sort of more advanced queries, this is one up there is a common table expression, which is cool. I can't go over it now, but check out CTEs. It lets you define, like, subqueries and then do operations on the subqueries. It's really powerful. But when you're doing sort of more advanced work, it's still slower, but the difference is much less. But one of the things you can do to make it faster is you can create an index on it. Postgres doesn't care that the index came from JavaScript. Once it's an index, Postgres knows how to deal with it. And so here I am in the second there, creating an index, and then all of a sudden the same query is much faster. And so, for those of you who've used Couch, that's sort of the same trade-off, right, where in Couch you can make one-off queries or slopes as to scan everything, like a Couch view, it's really fast. This is the same trade-off. Once you make an index, it's really fast. But Postgres can do something cool, which is combine indices. And so I don't know, I'm not going to go over, like, how to read, explain statements in SQL, but in the bottom here, it's taking those two indices for getNumeric, siblings of getNumeric age, and it's combining them, making an and. Because once Postgres has the index, it knows how to do that. Combines the two indices, re-checks the one it has to, and then goes. And this is good. So Postgres 9.2 is going to support JSON as a native data type, which is great. There's probably, we're probably going to backport that to, I mean, I don't know, but I think there's an extension that backports it to 9.1. And so you can have JSON as a first-class thing in Postgres. So what else can you do when you have PLV8? Well, what about Mustache? So here, you take your function, returns text, it takes the template and the view, you paste in the 400 and so lines of Mustache, and then at the end there, just return parsing it and, you know, parse the JSON, and then pass it to Mustache and the template, and it works. And so here's, I'm selecting Mustache. The first thing is my template that iterates over my array, and I pass it in some things there, and now I have Mustache in my database. Is that useful? I don't know. But you can do it, and that's cool. Um, here's something that is useful. There's a project called JSON Select, which lets you do arbitrary CSS selector style queries on JSON. And so, same exact pattern as the Mustache one. This is, JSON Select is written to use either in jQuery or in Node, and so I just pretend that I'm Node, I make an exports object, paste in JSON Select, and then I parse my JSON, I, um, you know, pass in the selector, and then I re-stringify the JSON so that I'm up here instead of returning a numeric or something, I'm returning JSON. Um, you know, the same pattern as all the other functions there before. And now, just to remind you what the documents I'm working with, they look like this with some, you know, phone numbers. And you can select the first name if there, you know, there's more than one. You can select the numbers there, and now I have my return set, I have the names, and I have the JSON there of my numbers. And you can do just arbitrarily deep queries into your, your JSON. And this is, uh, you know, better than the, uh, get numeric that I showed you before, because this, you know, you can do arbitrary queries. Um, you can also do, you know, the same thing. You can have, uh, indices, so see, I'm, I'm gonna get all the ages of people who are 26. It's about 10,000, which is about right for, you know, the million documents evenly distributed age. Um, the problem though is there's a lot of, uh, rough edges right now with, with PLV8. I think I don't think there's any more crash bugs. There was one. I think that one got fixed. But there's still some problems, like, uh, sometimes what you get back is an array from Postgres, and that's sort of, um, not so nice to work with. Um, there's other problems with, uh, the data types and some things. So there's some rough edges. It needs to be, you know, cleaned up and polished up before, you know, people in general start using it. But I think there's a lot of promise. Uh, so here's a real bad idea. So here I'm just, uh, evaluating any JSON that, uh, or any, uh, JavaScript that comes in and evaluating and running it. Um, so, so what can we do with that? Uh, we can do something easy, like just returning a new date. There it is. Um, we could take a very large number and turn it into a string, and it happens to be JavaScript. Or we can, uh, you know, go actually iterate through our data there and, uh, split it, sort it, and jumble up the letters. So this is, you know, ad hoc JSON in your, your SQL. Uh, I don't know if I'd actually use that in real life. It seems kind of dangerous. But the JavaScript SQL injection. But, uh, you know, you know, there you go. Um, so what about Ruby? So PLV8 has or, or h-door has already seen, you know, some great Ruby adoption. Uh, PLV8 really needs, uh, some work to make it all transparent. I've done a little bit in there, but if any of you guys beat me to it, I'd be more than happy to. But, um, this is sort of what I'm, uh, experimenting with. Something like this. Uh, if we make, just like the h-door operator, if we make that same operator for JSON select, uh, we can do something like this, where the cost is greater than 10. Uh, the, the problem is, is when you do, you know, greater than a 10, uh, you have to cast that into a numeric. Uh, because Postgres can't do the greater than on a string. This is some of the rough edges of, you know, these Postgres is real tricky about types. It's, it's strict. Um, but, you know, this, this sort of works and it's kind of nice. Um, so, these are the, the ideas that I'm going over. Um, so we have is, you know, now we have friendly documents that, you know, everyone loves to use, but, you know, in a world-class database, which is pretty awesome. Um, but one of the, the ideas that I want to start thinking about and, and starting a conversation about is the locality of your, your data. So we've seen over time, like, you know, web applications were all in the, uh, you know, Rails place and then we started pushing some of more of the logic into JavaScript on the server, on the client side. And, you know, that makes a lot of sense for some things where you do the work on the client side that belongs there. Uh, the same thing applies to your database. Sometimes it makes a lot of sense to do the work in the database and I know I'm guilty of this myself, uh, before I was sort of, you know, treated my database as a black box and, you know, didn't really want to do anything that I just treated as a dumb data store. Uh, I was sort of buying into the idea that my ORM, you know, made my database agnostic, but I really think that the ORMs, that's really a selling point to get more people to use the ORMs, I'm never going to switch my database of a production app. You know, once I have a lot of data in there that becomes too painful and so might as well take advantage of the features that my database has and really explore that and, um, I invite all of you to, uh, do that too. And so, that's really all I have to say, um, the slides are up there and thanks.