 Today we're going to talk about a somewhat non-standard approach to search in your applications. Often search is done in some sort of an external service. Better. I'll just stand like this for half an hour. Starting over, because I had that extra time. We're going to talk about a somewhat non-standard approach for search in your applications. Often search is going to be done with some sort of an external service. Other than doing that, we're going to use the tool that's already responsible for storing and finding data. Welcome to multi-table full-text search in Postgres. I'm Caleb Thompson. We're not doing a Wednesday hug. You can find me at these places on the internet. Feel free to tweet to me during the talk. I love that shit. But if you're not tweeting, please do close your laptops. There's going to be a lot of code here, and it's going to be hard to get anything out of this talk if you're not looking. Now that I've told you that you should close your laptops, I don't like to jump into credentials at the beginning of the talk. You're already here, and that's sort of what the point is. We had the abstract. We had the bio. So you'll hear a little bit more about me at the end, but I do have one claim to fame, and that's that my birthday is in the Simple Delegator docs. So you know I'm an expert. I'm going to talk to you about a real-life feature. We're going to iterate, explore other options, and optimize, just like we would when we're developing a feature in our applications. We're going to talk about full-text search, what it is, how it can help us. Hopefully, you could have guessed that from the title. We're going to talk about views, and no, not the HTML views. We're going to talk about database views. Hopefully we want to explore the performance implications of whatever we're doing. So we're going to look at some of the performance implications and how we can mitigate them. We're going to talk about materialized views, of course, as one way to do that. We'll look at a couple of gems that can help us out while we're doing this fun stuff, and of course, we're going to look at all of the other options or some of the other options for what we're doing. All right, let's look at sort of the classic example. Let's search for articles in our application. The simplest thing that we could possibly do that works is to search for substrings. So here we've got articles where the body includes. How's that looking? Terrible? Can somebody hit the lights over there? Just keep going, it's fine. How's that? Better? That looks better. I can see it. We're going to look at articles where the body includes some substring. Pass in that query. You've probably seen this in your applications. This works. It works if your users know exactly what they're searching for. So if they want to find an article with, say, the word, book in it, and they know that it's going to be lower case in the middle of a sentence somewhere, then they can search like this. And you can pass that search right in. Like I said, that's doing exact substrings. That's not the most useful thing. So a tiny little step forward that we could take is to do the case and sensitive like with I like. I don't like this. But it is slightly better. All right, well, let's leave that. It sort of does what we need for now. You don't need to know where in the sentence the word that you're searching for is. That's cool. OK, well, naturally, features expand. We need to search based on the titles of our articles. We probably should have seen that coming when we were searching. We can handle this too. We'll just sort of extend what we had already done. We're going to pass in that query two times instead of once. And we'll still do that case in sensitive like. And the percent signs in here are so that it'll show up anywhere in the query. Anywhere in a word. So you could have any sort of substring in the. So now we want to search by the author's name. This is getting a little more complicated, but, you know, again, probably something we should have seen coming. We'll go ahead and join onto our user model. I apologize. We're going to use the user and author's tables interchangeably in this talk. So user's table. So this is basically that same query, but we're doing we're letting Rails, A-Rail handle the join. And then we're doing pulling out the user's name in the same two fields that we were already pulling out. That query, query, query is starting to stutter. Something that we don't really want in our code. So one way that we could refactor something like this is to do it in a query object. And this is less performant, but arguably easier to understand when you're looking at it. But when it comes down to it, we still got really poor results. We're only searching for these case in sensitive substrings. And, you know, that's not great. What if the word is going to be plural, and we have a singular query? Or what if we're searching for singular things? Google knows how to do these things, and that's what our users are going to expect from us when we're building this search feature. Enter full text search. Full text search allows us to query for rows based on natural language searching. Hey Caleb, what is natural language searching? I'm glad you asked. Natural language searching allows us to remove stop words from the query. So these are words that appear in all sorts of sentences that have not a lot of semantic meaning to us, and we want to not really include them in our search results. We don't want every article that returns the word and to show up in our search results. Again, just like with the like versus I like, we're going to eliminate casing, fairly straightforward. We want synonyms to show up. So if our user has sort of a concept in mind that they're searching for but doesn't remember exactly what it is, then both of these should return the same results. And we're going to incorporate stimming. It's another feature of natural language searching, which means that related words like try, trying, and tries, these are all different versions of the same root words, so we record them under the same concept. And when we're doing our searching under the hood, we're actually searching for that root word instead of the specific words that we're passed in. So here's an example of making that same query. We're going to sort of zoom around in this code a little bit, and we'll highlight the more important pieces. So here we're looking at the text that we want to join. We're saying title, concatenate that with an empty space, and then the body. We'll call that the text just because we need to give it a name for Postgres to be happy. And so that's what the two pipe operators are, the concatenation. And we're also going to pull in the author's name as the text. And naturally we want the ID when we're pulling out of the article and the article's ID when we're pulling out of authors. We want unique results because we don't want the same article to show up a bunch of times in our query if it shows up multiple times in the body or if it shows up both in the author's table and in the title or in the body. All right, so that's a lot of SQL. Where do we put all of this? We could throw that back into our query object that we looked at last time we had some code, basically just some inline SQL, pass that through an execute or where. Same thing with the scope. Throw it in there, just pass in that query so that it's interpolated. But to be honest, SQL doesn't belong in .rb files. We've got an extension for that. And so Postgres actually has our answer in the form of views. View is a partial query stored in the database that can be selected from and it returns a set of columns that can be searched on later on. The nature of views is that because they're basically just a query, they can have multiple source tables. So right now we've got the user's slash author's table and we've got the articles table. So this view will allow us to abstract that away and just say, this is the text that we care about. And then we can perform a where later on. We can do whatever we need to complete that query so that it's meaningful to our users when they're actually performing a search later on down the road. So if we were going to build, here's just sort of an example view, we've got this create view syntax. Just give it a name, just like you would a table. We can select distinct user IDs. So right now, obviously, we're pulling users with recent activity. So we're going to look at a couple of different tables. We want to see all of the information about the user and also the last time they were active. So we only want one instance of each user. Like I said, we want all of those rows from the user's table and we're going to create this concept of a user's last active time by pulling in the activities created at column. And we're just going to limit that to users who were active in the last seven days. So when we're looking through this view, it looks pretty similar to what searching through a table would look like. You select everything from users with recent activity, where, order, whatever you need to do. And in fact, it's so similar that ActiveRecord can use of you as a back end for a model. So what we can do is create a fairly vanilla model, looks very familiar, and we can interact with that as if it were a table in our database. So we've got this users with recent activities model. As you can see, it's an ActiveRecord subclass. We're going to give it a table name just because our naming of that table didn't match what rails would have expected when given the class name. And we're going to tell it that it's read only. This isn't strictly speaking true, but it's easiest to just assume that a view is going to be read only. If you need it not to be read only, then there are some special rules for that. That's an exercise for the audience. But what this says is it tells rails that nothing can be deleted and nothing can be written into this table. So that's read only. You can only query against it. Will this work with the full text search? Luckily, yes. We're going to talk now about our first gem, Textacular. Textacular was originally written by Aaron Patterson and lives here on GitHub. Textacular takes care of the full text search portions of the queries. And it assumes that you want to search over every text field because it's called full text search. So it's full text, full text search, I guess, on a record, on a table. And it gives you some variant search options like basic search and fuzzy search and advanced search. For our purposes, all we really care about is this basic search. And that's going to be what's most generally useful when you're building sort of a single field input that your users use to get results back. So that looks like this. If you're searching for a game, any sort of game that included Sonic, Sonic the Hedgehog, or Super Sonic, whatever, I don't know. And you can get a little more complicated even with the basic search and say the title needs to include Mario and the system needs to include Nintendo. But I want any Mario title on any Nintendo platform. So this is sort of the next simplest useful thing that you can do with Textacular's mixin. All right, so let's go back and take a look at that search that we wrote. This is that same SQL from before to get out articles based on either the article's name, body, or the author's name. So our search result is really simple on the rail side. We're going to create this three-line class. We're going to include Textacular. And we're going to tell it that it belongs to an article because we named that field article ID. And we want to actually use it if we want to say find an article written by Sandy or that it mentions Sandy. Then we just do this basic search for Sandy and map that onto the articles. If you wanted to get a little bit crazy, you could include enumerable into your record. Enumerable is a super important and very useful feature of Ruby built into the Standard Library. And if you don't know about it, feel free to come up and ask me afterwards. But basically, it's going to give you all of those cool each and map and everything else. So you'd be able to use this class with search result dot new, the query, and then dot each. So basically, you can treat it as if it were any sort of other collection, array-like collection. So creating this view, I'm sure I've convinced you now that views are great to use and that you want to use them, you want to know how to use them. So creating that is fairly straightforward. You've got active record-based connection and execute, so this is a migration. You could actually just shorten this to execute. And we're going to use that createViewSequel that we just had on the screen. And then to drop that, we just say dropView and then the name of the view. How resistant changes this? All right, well, let's find out. Let's look at some feature creep, as we always have in our features. Project Manager comes back and says, articles whose comments match the query should also show up in the results. If somebody has mentioned Sandy in a comment about an article, we want that article to show up in our search results. So to recap, we're now searching on an article's title and body, an author's name, and a comment's body. And any article that any of these things are related to should show up in our search results. So the updated SQL looks like this, the updated query. And the new part is this new union with a fairly straightforward select and join, where we're pulling in the body and the article ID from a comment. So let's take a look at that update view migration. What we can do is throw the new SQL into that up record, just like we had done before. And unfortunately, what we need to do for the down migration is put the old text of the view into that down migration so that when we're rolling back, Rails knows what state to put the database into. That's sort of a pain, but we can handle that. That's not too bad. And unfortunately, we can't always update. You actually can't remove any columns from the view, which we happen to be doing when we're rolling back this migration. Because we no longer have, oh, I'm sorry, no we don't. But if you had added a new column into your view so that you have more information, say that active back column from before, if you say we don't really need that active back column anymore, let's remove that from the user with recent activity, then you couldn't just do that with an update. So what you have to do is just first drop that view and then create the view again. And again, we need to dump that whole SQL into the migration. So that sucks. You also can't dump a view into dvschema.rb. And so the solution to that is to tell Rails that the database format is going to be structure, and then you're going to dump into structure.sql. It's going to dump out an actual Postgres SQL version of your entire database. Sorry. Luckily, we've got our second gem, scenic, which adds view methods to migrations and allows views to be dumped into dvschema.rb, which is what you expect. And just generally makes views a little bit easier to work with. Of course, I had a little hand in this. I am one of two main containers. The other one is in this room of this gem. So creating this scenic migration is pretty straightforward. The readme goes over it, but you're going to write into a SQL file basically just the query portion of the view. So you don't need to worry about the createViewSyntax or the dropViewSyntax. It'll handle that for you. And because you're writing it into, you actually write it into a .sql file, and so you're getting whatever sort of editor support, your Vim, or Tmk, or Sublime, or whatever people are using these days, Adam, you get whatever benefits that gives you. So mine gives me indentation and some nice syntax highlighting in SQL. And then for the syntax in the migration, you actually just have this createView, which is reversible, just like createTable, and you can go back to using that change method. Then if you need to change the view, you can actually just do this updateView. You tell it what version numbers, and it knows, based on a naming convention, how to find the new and old versions of the SQL for the view. Even that's a little bit tough to remember. So we did create some generators. We actually have a model generator that gives you that read only and infers the name of the model based on the name of the view, so that that naming will match up. So you end up with a three line method, or a one line method, and a total of five lines in your file. And then when you're writing into that SQL file, it looks just like this. So the first version of the searches, we just write in this is that same SQL from before, pretty straightforward. And that lives in a SQL file, so that it's a lot easier to read and look at when you're in your editor. We also have a view generator for when you need to update that view, so you can just do, rather than scenic model, you can do scenic view. I don't need notes. And that will give you the next version of the next numeric version of the view. And it dumps in the old version of the SQL, which then you can update with whatever you needed to add. Now there are definitely some performance implications with this approach. As I mentioned, this query is pretty slow. It has to search across three different tables and a couple of columns to get us all of the results that we need. So what it comes down to is actually an order of magnitude slower to get these results. And unfortunately, views can't be indexed. Luckily, underlying tables can. So the recommendation here is add indices. There are several types of index. The one that you're most familiar with is Btree. Btree is great for exact matches on a column. So either text or even like the substring matches are OK with a Btree, you might get an index hit with that. And definitely for primary keys where you're just looking up an ID or a UUID, those are great. For full text search, the ones that we're interested in are Jin and Just. Jin stands for Generalized Inverted Index and Just stands for Generalized Inverted Search Tree. There's some information you'll never use again. Jin lookups are generally about three times faster than Just. They also take about three times longer to build. Jin indexes are moderately slower to update than Just indexes. And they're two to three times larger on disk. What does that mean? Who cares? This is what it means. You should use Jin if you don't have hundreds of thousands of rows in your database. You're not concerned about longer writes blocking the DB. You're adding the index late in the game. And you don't care about disk space, perhaps, because it's 2016. And you want very fast lookups. So we're optimizing for read heavy. If we were building a log manager or something like that, then we would want to optimize for write heavy. And maybe Jin isn't the right solution. So on the other hand, you should use Just if you have very large tables, millions, billions of records. There's an order of magnitude in there. If you're between those two, it's up to you to figure out. I work at a consultancy. You can pay me. You should use it if you have performance concerns right now. And when I say that, I mean that you currently have performance issues, not that you are concerned that in the future you will have performance concerns. You should use it if, for some reason, disk space is important. It's 1994. And you should use it if your table is very write heavy. Like I said, log aggregators are a great example of this. So adding those indexes as Jin is pretty straightforward. These are the four fields that we've been using. And you just say using Jin, Rails knows how to handle that. Materialized views are another way that we can improve this performance. Materialized views are a tool to pre-populate the results of the view. So it's going to run that entire search query that we had. And it's going to store all of those results into a temporary table. So we still pay, say, the 400 milliseconds whenever we're creating that table. But then we can query against the temporary table, which already has the results in it. And that's much, much faster. So we query against that result set rather than perform the full query. And it's another order of magnitude faster even than the I like was. This is without the indexes. And the downside of a materialized view is that it's not always updating because it is storing into that temporary table. You have to tell it when you want to pay that 400 milliseconds to get your update happening or however long your query takes. And you can do that as often as you like. You can do that on every write with a trigger in SQL or with an after-commit hook. Looks like this. Or you can do it maybe on a timer. If your search results don't always need to be up to date, you could have the Heroku schedule or do it every 10 minutes or hour or day or whatever. So you can do that either with a Postgres trigger exercised to the reader or with Rails after-commit hook. That looks like this. All right, well, what about some pre-built solutions? There's a lot of options out there. And I did say that we would look at them. We've got Elasticsearch with either Tire or Elasticsearch Rails with Chewy. It's including who knows how much into your models. I know it's a lot. You can use Solar via Sunspot, but holy shit. Sphinx with thinking Sphinx actually does use a separate file. But still, I don't know what any of this means. Why do I have to figure this out? I already know things. I know SQL. All right, so what these services are great at is faceted search. If your search doesn't look like a single box, it looks like Amazon's sidebar. Then full-text search is gonna be a little bit more difficult to work with. I'll admit. Or Postgres' full-text search. These other tools do full-text search for you as well. All of these things have to run in your development machine. They have to run on your production machine, which means that they have to be running. They're slowing down on your machine. You have to deal with all of these dependencies. You also have to deal with them every time you're doing an update to your system. If you ever change that version, then you need to make sure that development and production are all in the same version. If you're ever gonna roll back, you need to make sure that's handled. Big pain in the ass. They all need to be faked in tests because you don't want to be actually using these things live in tests. In fact, I had a couple of co-workers who were using Solar, I believe, and a great feature of Solar is that it synchronizes its index across the same subnet. Because they were both sitting on their work computers and had the same external IP addresses, their test indexes were being synced between each other, and that was a lot of fun. For a week. All of these have a lot of cruft in the models. I said some, but it's all, and they're removing a data concern from your database, database, yeah. They all have this arcane syntax, and ultimately they make me make this face. So, by combining materialized view, full-text search, and some Rails magic, we have a pretty cool search feature that doesn't require any new dependencies, and it makes me smile. Thank you.