Hello? Is that fine? Can everybody hear me? Great. So, welcome, everyone, to this talk on search-index. Let's see if we can avoid the feedback. search-index is a persistent full-text search engine for the browser and Node.js. It's written in JavaScript and based on LevelDB.

My name is Mats. I have a degree in mathematics, but I do search at Comperio. We're a consultancy; we mostly do Elasticsearch and Solr. And in my spare time, together with a good friend and colleague of mine, we build search-index. I'd really like to thank all of the volunteers and people who made this conference happen, and thanks for inviting me and giving us the chance to talk about search-index. Switch? So, that's better, presumably. Right.

search-index is a project that saw the light of day in 2013, started by a good friend and colleague of mine, Fergus McDowall. It was originally the underlying technology of Norch. Norch is a search server, like Elasticsearch and Solr, that you can install with npm. Elasticsearch and Solr are heavy Java machines that need a couple of servers to run your application, and Fergus, at the time, thought that you should be able to have search by simply npm installing it. When I joined the project last fall, our priorities changed, because by then Elasticsearch and Solr had become so much better: Elasticsearch you can get up and running in less than a minute. So even though Norch is even quicker to npm install, that niche fell through. But at the same time, we realized that search-index can actually run inside a browser, and there is some amazing technology that lets us do that. Perhaps the most important piece is the LevelDB project.

Before I tell you more about that, I just want to say what kind of features you can expect from search-index. search-index is a search engine: it searches documents for full text. You can do search; you can have facets and filters, that is, count aggregations, and then filter on specific fields matching some value. It has a matcher, although it's not fuzzy at the moment, but we're working on it. It has stopword removal, and support for snapshots and replication, which we're experimenting with doing over HTTP. And you can weight individual fields and do deletes and updates. So it's basically what you'd expect from a modern search engine, without the fancy aggregations that Solr and Elasticsearch do now.

As I said, we're based on LevelDB. LevelDB, if you don't know it, is a key-value store developed at Google, inspired by the Bigtable database, and it's actually part of the implementation of the IndexedDB spec in Google Chrome. Since this is a key-value store, we store all the contents of the search engine, all the term frequencies, the inverse document frequencies, everything, as keys and values in this database. The reason search-index works at all is that LevelDB is lexicographically sorted, so we just pull out ranges from the database all the time.

LevelDB is an interesting project because it's written in C++, but it has bindings for Node.js. It's split up into two halves, leveldown and levelup, and levelup is the interface that we interact with. levelup, from leveldb.org, is a Node.js project that aims to provide a common, portable interface to a multitude of LevelDB forks. By talking to levelup, and not directly to some low-level database, we can actually switch out the back end.
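To make the levelup idea concrete, here is a minimal sketch, separate from search-index itself. It assumes the 2015-era levelup API, with leveldown installed as the default backend; newer versions differ, so treat the details as assumptions:

    // A rough sketch of talking to levelup rather than to LevelDB directly.
    // Assumes the 2015-era levelup API with leveldown installed as the
    // default backend.
    var levelup = require('levelup')

    // Opens (or creates) a LevelDB store on disk.
    var db = levelup('./my-level-db')

    db.put('hello', 'world', function (err) {
      if (err) return console.error(err)

      // Because LevelDB keeps keys lexicographically sorted, you can
      // pull out ranges of keys, which is how search-index reads data.
      db.createReadStream({ gte: 'h', lte: 'i' })
        .on('data', function (entry) {
          console.log(entry.key, '=', entry.value)
        })
    })

Swapping the back end is then a matter of handing levelup a different "down" store, which is exactly the property search-index relies on.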
So, search-index can use any levelup-compliant database as a back end, and this is an important step in our journey of making a search engine run inside your browser. At the same time, search-index is written in JavaScript, but for Node.js, so we have a bunch of require calls and things that you wouldn't normally do in a browser. That's where Browserify comes in. If you don't know it already (I'm sure all of you use it), Browserify turns code that has require calls into a bundle: it goes through your code and bundles all your require calls into a single file, which you can include in your browser. Together, levelup and Browserify make search-index truly back-end agnostic. On the server side, we can run search-index on top of LevelDB, on MongoDB, on Redis; and on the client side, specifically in browsers, we can run it on IndexedDB or even localStorage.

If this were a search conference, all of you would sit here shaking your heads and saying, this is madness. But this is a JavaScript conference and we are JavaScript people, and there are cool things about actually having a search engine running inside your client. First of all, depending on how much data you have, you could lower the bandwidth costs for your users and for yourself as an app developer. In a traditional search scenario, the user sends lots of search requests and you have to respond to them over HTTP. It also allows you to have lower hardware costs: if you replicate a search index as a file, and then replicate that index into the browser, all you need to do on the server is create that index once and serve it from a static file server.

A good friend, Espen, likes to say that search-index scales naturally. What we mean by that is: if you have an Elasticsearch cluster taking 1,000 or 10,000 requests per second, chances are your cluster is on fire and you're doing everything you can to save it. But if you have 1,000 or 10,000 search-index clients in the browser, each searching once per second, that's nothing for search-index. So that's one thing to think about: if your data isn't huge, or if you can break it into small pieces and tailor each individual user with a small ball of data, then perhaps moving search to the client is for you.

Another good point is that search-index runs offline, and some people say the best thing you can do for app performance is to make your app run offline. And by increasing your performance, you give a better user experience. There's also this point that we tend to forget: people actually are offline. Here in India, I can't do roaming on my phone because it's crazy expensive, so I'm offline all the time and all my favorite apps don't work. And on the train on my way to work, my phone is offline. Making your app run offline is a huge advantage for you and your users, and you should seek to do it.

So, now you know what this talk is about: it's partially about search-index, and it's also about moving search into the client. I've told you some reasons why and how you should do it, and now I'm going to show you exactly how search-index works. To install it, you do npm install search-index. Or, if you're feeling lucky, you can clone the GitHub repo and try out the master branch, which changes every day. We're working our way towards 1.0.
A bunch of the things you'll see here might change in a couple of weeks or a couple of months, but that's also a great opportunity to come in and contribute to the project, because at the moment it's just Fergus and me as active contributors. We work on this every day, and we rely on our users to come to us with feedback.

So, in your code, you simply require search-index, and that exposes a function that takes an options object. It can be empty, but this is where you typically switch out the low-level database. By default we use levelup, so by default it binds to LevelDB. But you could put MongoDB or Redis or your database of choice here, as long as it's levelup-compliant and has a "down" component that somebody has written.

Adding documents is very simple: you do si.add, where data is either a single JavaScript object or an array of JavaScript objects. Once the callback is executed, and the error object is null, the data should be indexed. Since this is a key-value situation, in order for it to be computationally feasible to do filters and facets, we require you to specify those fields in an options object, which is argument number two, if you want to provide a count aggregation on some field or let the user filter on some field. We don't require that if all you need is full-text search or matcher functionality.

This is what a typical document looks like. None of the fields are mandatory, and if you don't specify the id field, we'll generate one for you. So, this is a document: "Indian banks provide record low interest rates". Pretend there's a body in there. It has just one topic, and then a country.

Searching this kind of data is simple: you do si.search with a query object, which I'll show you next, and you supply a callback which we execute with error and response objects. Again, if error is null, then res.hits should hold the array of hits matching your documents.

The query object could look something like this. This is also subject to change, but right now it's a simple API that lets you cover quite a lot of use cases. If you want to search for "indian bank" in all fields, you specify a star here to indicate which field you want to search in, and then the tokens, the words, in an array. If you want to search a given field, you use title as the key and then the "indian bank" tokens as the value. The query object also lets you search in different fields at once. This is very typical if you want to say: I want a document that has "india" in the title and mentions "bank" in the body. That is an AND query; as of now, we don't have support for an OR query. And here's the not-so-beautiful thing: even though you're only searching for one token, you have to wrap it in an array. This is changing, and perhaps you're the one who implements it.
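Put together, the whole flow looks roughly like this. It's a sketch of the pre-1.0 API as described in this talk; the exact option names (indexPath, the second argument to add) and argument order may differ in the version you install:

    // A minimal sketch of indexing and searching with search-index,
    // assuming the pre-1.0 callback API described above.
    var searchIndex = require('search-index')
    var si = searchIndex({ indexPath: 'my-index' }) // options can be empty

    var doc = {
      id: 1,
      title: 'Indian banks provide record low interest rates',
      body: 'Pretend there is a body in here.',
      topic: 'banking', // placeholder value
      country: 'india'
    }

    // Argument two declares fields we want to facet/filter on later;
    // the exact shape of this options object varies by version.
    si.add([doc], { filters: ['country'] }, function (err) {
      if (err) return console.error(err)

      // '*' means: search all fields; tokens must be wrapped in an array.
      si.search({ query: { '*': ['indian', 'bank'] } }, function (err, res) {
        if (err) return console.error(err)
        console.log(res.hits) // array of matching documents with scores
      })
    })

An AND query across fields would then be query: { title: ['india'], body: ['bank'] }, following the shape just described.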
So, I told you a bit about facets. A facet is a count aggregation; it's typically what you find on Amazon when you search for a TV, and on the left-hand side it says 40 to 42 inch: 46 TVs; 50 to 55 inch: 30 TVs. For documents like the one I showed you before, if you specify facets, then country, and then an object (which can have some options, but doesn't have to), you'll get out something like this: it'll tell you we found six documents that have the country field set to india, two that have the country field set to usa, and one that has the country field set to norway.

The counterpart of facets is filters. What you typically do is expose the facets to the user; the user clicks on one, and then you put that into a filtered query. So if the user clicked on india and you want to filter on india, this is how you do it today. It's a bit ugly, because this is actually a range query: we're filtering on the country field from the value india to india. So if you wanted, for example, to filter on india and norway, you wouldn't do the thing you'd expect and just switch out the second value with norway; you'd have to specify another range. This is also probably changing in the next couple of months, but still, it lets you do what you want to do, just in a somewhat crooked way.
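As a rough sketch, and with the caveat that the facets and filter key names here are my reading of the pre-1.0 API rather than gospel, those two queries might look like this (continuing with the si object from the sketch above):

    // Faceting: ask for counts on the country field.
    si.search({
      query: { '*': ['bank'] },
      facets: { country: {} } // the options object may be empty
    }, function (err, res) {
      if (err) return console.error(err)
      console.log(res.facets) // e.g. india: 6, usa: 2, norway: 1
    })

    // Filtering: under the hood this is a range query, so filtering on
    // exactly 'india' means the range from 'india' to 'india'. Filtering
    // on india OR norway needs two ranges, not ['india', 'norway'].
    si.search({
      query: { '*': ['bank'] },
      filter: { country: [['india', 'india'], ['norway', 'norway']] }
    }, function (err, res) {
      if (err) return console.error(err)
      console.log(res.hits)
    })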
So, I'll show some examples. I'll show you how to use it in Node.js, and I'll show some examples of how we use search-index on the client. This is the simplest example I can think of: we just create a data array, here with two documents, id 1 and 2, and we instantiate search-index, where we can give it an indexPath. The indexPath is the file path to the LevelDB index. Once that's done (it's a synchronous operation), we can add documents: we do si.add(data), and if there's an error we just return and ignore everything. Then you can search in your data. Searching is as simple as I showed you: si.search, a query, specify what fields. Cool, we can run this, and we get back this response.hits object, where you can see the score for each hit; the document itself lies in the document property. We can switch out this field with the title field and it still works: we find the relevant documents. But if you switch it with the id field, I actually don't think you can search in the id field. So, if we specify some other fields here, like body: "hello" and body: "hi", we run this again and try to search in it. Again, it works. But if we switch this field with body now, we shouldn't get any hits, and the hits array is empty, as you see here. A bit confusing with the old output. So, it's actually really simple to get search-index up and running; the data array here can look like anything, it just has to be JavaScript objects.

What we tend to focus on now is actually running search-index inside the browser. So I'm going to show you a simple example of search-index running in the browser, storing data in IndexedDB, and show you how it works with facets and filters. And please excuse my limited HTML and CSS knowledge, because this is really ugly. Here is search-index running on IndexedDB, and searching for a star gets me a bunch of documents. This is just a test dataset which we like to use. Here I've provided some facets on the places field, so it says there are nine documents that are from usa. If I click that, the total hits changes to nine. If I click argentina, I should only have one document, and that one has the places field set to argentina, here. Normally, when you do a query here, you would update the facets too, but I didn't have time to do that, so I'm really sorry. But it still works: you get the correct documents back. So, let me show you, is that too small? I'll make this a bit bigger and show you what happens here. This is IndexedDB.

When you specify the indexPath, you also specify a namespace in IndexedDB. So here we have a table, which is how search-index stores data as of today: a bunch of keys. Some of them start with DELETE-DOCUMENT, which is not great because it takes up a lot of space; we're changing that as well. But you can see that it actually stores data inside the browser, in IndexedDB, and that makes queries very snappy: a search is just JavaScript and an IndexedDB operation. If you have bad internet, this will be miles faster than doing a search over HTTP.

So, I have this different example, on my screen at least. Let's see. There, it's back. Yeah. This is search-index storing data in localStorage. (Not certain what happens with the screen here; it doesn't like this example. Just one second. We'll be back. Okay.) So, this particular example is actually storing data in localStorage, and it looks like this. localStorage isn't really feasible for storing a lot of data; it's just cool to see that you can switch out what kind of database you want to run on. I'm going to show you what that looks like: cd into the localStorage example and open main.js. These two lines are all you need to change to actually switch out the database. Here we require a localstorage-down library, which provides low-level bindings for localStorage, and you just specify that to search-index. It really doesn't care.

I also want to show you an example with search-index and replication. search-index can replicate itself into a file; that's really just a LevelDB operation. See, HTTP here. You get this file; that's the raw index, just a JSON array of all the documents. And now I'm going to show you an example with JavaScript that actually goes out and gets the search index over HTTP, which is what you would do if you had a website or web app that had generated an index up front, and your user logs on for the first time and you replicate the data down. So, this is search-index replicating from a file over HTTP. I should probably show you the code first. It's just a simple AJAX call, with cache set to false, and then there's an API for replicating a batch. Once that's done, we can search in the data: once we see "DB serialized" here in the console, we can search in it. Here I've specified level.js, which is the project that actually makes search-index run on IndexedDB. Let's see how this works. I'm going to head up to the network tab. Whoops. That seems to work. Here you can see that it's done the request, and it got the whole index as a response.

There's a drawback to doing this, to getting the whole index pre-made: the index is generally bigger than your data, because it needs to store the data for each field under a different key. The plus side is that the indexing is a lot quicker; the client doesn't have to do any calculation once it gets the data. But if you have a lot of data, you typically want to gzip the raw data object and send that instead of the replica. So, here's the "DB serialized", and we can search in it. It's the same data as before; as proof, we have this HTTP example, and it's still in IndexedDB. Yes.
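The client side of that replication demo might look something like the sketch below. The snapshot URL, the jQuery-style AJAX call, and the replicateBatch name are stand-ins for whatever your version of search-index and HTTP library actually provide, so check the real replication API before copying this:

    // A sketch of pulling a pre-built index over HTTP and loading it into
    // search-index in the browser. level.js gives an IndexedDB backend;
    // si.replicateBatch is a stand-in name for the replication API
    // mentioned in the talk.
    var searchIndex = require('search-index')
    var si = searchIndex({
      indexPath: 'replicated-index',
      db: require('level-js') // run on IndexedDB in the browser
    })

    $.ajax({ url: '/search-index-snapshot.json', cache: false })
      .done(function (snapshot) {
        si.replicateBatch(snapshot, function (err) {
          if (err) return console.error(err)
          console.log('DB serialized') // now we can search, even offline
          si.search({ query: { '*': ['bank'] } }, function (err, res) {
            if (err) return console.error(err)
            console.log(res.hits)
          })
        })
      })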
So, the last cool example I have: there was a talk yesterday about JavaScript internals, a great talk, and it showed how running lots of JavaScript can obstruct the rendering of the page; it can block the loop that does the rendering. And if you give search-index 1,000 or 10,000 documents and it runs in the main thread of the tab, it's definitely going to block the rendering of the UI. So I have this example where search-index actually runs in a separate thread, in a web worker, so that once you load the tab, it spins off a different thread in the browser, indexes the data, and you search by sending messages via the worker messaging API.

Let's go to this web worker example. We have main and worker code. This is the web worker; it does the work you would normally expect from search-index: it requires search-index, it adds documents, and it has an event listener, which is the inter-thread communication channel. We listen on message and we expect a query; we just search for that query, and once we have the result back, we send it back. And this is the code that you put inside your HTML. We require webworkify, which is just a nice way to package web worker code, made by the same guy who made Browserify. The rest of the code is the same.

Let me show you how it works, actually. This is search-index running in a web worker. We can search here. The thing is, we won't actually find the data stored here in IndexedDB, because it doesn't live here; it lives somewhere we can access through some other inspector, but not this one. So this is how you would do complex and heavy operations on your page. If you'd like to index your data instead of replicating it, you would typically run it in a web worker, or even a service worker once that API gets a bit more mature. So, this is what we do today: we replicate big batches, and we do all the complex work inside a web worker. And really, if you're indexing, that is a lot of complex work: it has to calculate a lot of keys, and if you have updates to your documents, that's even more resource-intensive, so we do that in the web worker.
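A minimal version of that main/worker split might look like the following. The webworkify pattern (a worker module exporting a function that receives the worker's self) is that library's documented style; the message shapes and everything search-index-specific are my own assumptions:

    // worker.js: runs in its own thread, bundle-friendly via webworkify.
    var searchIndex = require('search-index')

    module.exports = function (self) {
      var si = searchIndex({ indexPath: 'worker-index' })

      // Index up front so the main thread never blocks on this work.
      // (In this sketch, messages sent before indexing finishes are lost.)
      si.add([{ id: 1, title: 'hello world' }], {}, function (err) {
        if (err) throw err
        self.addEventListener('message', function (ev) {
          // Expect a query object, search, and post the hits back.
          si.search(ev.data, function (err, res) {
            self.postMessage(err ? [] : res.hits)
          })
        })
      })
    }

    // main.js: the code that goes in your page.
    var work = require('webworkify')
    var worker = work(require('./worker.js'))

    worker.addEventListener('message', function (ev) {
      console.log('hits:', ev.data)
    })
    worker.postMessage({ query: { '*': ['hello'] } })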
In the future, there are some things we'd like to do with search-index. One of the things we've realized is that in the browser you probably don't need the indexing functionality: if you're going to replicate either way, you don't need the extra code for adding documents on the fly. So we plan to make modular builds, where the core of search-index only lets you search in your data, and you have to specify whether you'd like to be able to add documents. As of now, there are options that say whether you want to filter on data, facet on data, and be able to delete data; if you make the index non-deletable, the index size gets smaller. And since search-index is based on LevelDB, which has this really nice streaming API, we're thinking of rebuilding our API on top of that as well. There's also quite a lot of work to do on the index structure, to make the keys and values smaller. All of this is in the pipeline.

There is one caveat with search-index, as opposed to having a search server or a search cluster with Elasticsearch or Solr, and that is the amount of data. If you want to run search on the client, you probably shouldn't do it with gigabytes of data; you should do it with a couple of hundred megabytes at most. That makes it unfeasible for the big data revolution.

But we also think that you should embrace small data, because if you have lots of individual users with individual data, chances are that when you look at it all at once, it's a big blob of data; but if you look at it on a per-user basis, maybe it's just small blobs of data corresponding to each user. And if each user only needs that small amount of data, you can generate that index and serve it to that user. That's how we like to see search-index fitting into the future of search and data.

I'd love for some of you to get involved. There was a guy who came up to me before the talk and asked a question about a use case; that was awesome, please do that after the talk. We'd love for people to bring it to new platforms; make it run on Electron, for instance, which there was a talk about here yesterday. I haven't tried it yet, but I really want to. Just make an app, submit a pull request or an issue. We're really welcoming to people who do this, and you'll get a lot of love even if you just post a comment or something. You can go to github.com/fergiemcdowall/search-index. And yeah, that's it. Thanks a bunch. Questions?

Hi. It was great to learn about search-index in Node.js. My question is: when it comes to performance benchmarks, how does it compare with Solr and maybe Elasticsearch?

Well, if you factor in the cost of doing HTTP requests, you will have a very competitive search engine. But for doing search on big data on the server, Elasticsearch, and the things running on Lucene generally, will be faster, I guess. What you get from it is that a search server running in Node will demand a lot fewer resources. We have clients who need five or six servers to run search for their users, just because they need Java and Elasticsearch or Solr. So you trade off a bit of speed if you go for a Node.js version on a big, big dataset; but on the other hand, if you go for a moderate dataset on the client, you gain a lot from not having to do those HTTP requests.

One last one. Does it support geolocation searches?

Again?

Spatial searches. Based on latitude and longitude, you know, a radial-distance kind of search.

Oh, yeah, that actually works, by geohashing. And you can do numerical ranges, but it's a bit weird today: you actually have to pad your numbers in a special way, because everything in LevelDB is strings. But we're switching out the whole way of storing data, changing the encoding, so we'll be able to support that more natively.

Hello. (Where are you? There you are.) So I have a question. You gave the example of your code, of how search-index works. There's a part where you initialize it: you require the module and pass options for the low-level DB. But you haven't defined any callbacks. When it loads, how do we get the notification that the data is loaded? Or every time you add something, does it hit the database? I'm not sure how it works.

Not sure if I got that. (Yeah, we were talking about this before the talk.)

You require search-index, and you give it the option there for which DB you want to use.

Yeah, this one. Yeah.
And now, after that, you add something. But how do we get notified that the DB is up? Like in Elasticsearch, we get a notification callback that says it's up, and now you can insert things.

Yeah, so this is a synchronous operation: at the next line, you're able to interact with your database.

But what if I get some error? Will it still initialize the si object, or does it throw an error?

Oh, so you wonder what happens if you have data in that database already? You actually have to add it through search-index, so that search-index can do its relevance calculations and make the keys it needs. It can't just connect to a database and work on the existing data; it has to re-index it in order to calculate all those term frequencies.

Okay. Yeah, sorry.

Oh, hi. Continuing this question, I wanted to ask: you said that with 100 MB of data, search-index is competitive with Elasticsearch. Can you throw out some numbers? Is it faster than Elasticsearch, or not?

I haven't properly benchmarked it, so I can't really say yes or no. But what I can say is that you'd be surprised how fast it is, written totally in JavaScript. And the LevelDB implementation is in C++, so that part is extremely fast. I can't put a number on what amount of data it's feasible for, but it's feasible for low and moderate amounts. I wouldn't use this for the big data revolution, if you're Google or Facebook.

Thank you.

Thank you for your question.

One question. The add API is an asynchronous call. So does your entire search code have to be in the callback, or does it return a promise, so that I can have a callback chain and be saved from that?

Yeah, so as of now, you need to have your code in the callback. That's also one of the future things: the streaming API, which would also allow us to say, if you don't supply a callback, we give you a promise back. That is something I really want to do. So yeah, I know that's not so nice.

Hi. How does search-index work internally? MongoDB, for example, has its own query language. Does it convert the queries to MongoDB queries, like that?

As for the specific Mongo implementation, I'm pretty sure it gets ranges of keys, because that's how it works in LevelDB: we go into the database and get ranges of keys where we know we have stored the data needed to calculate the score. So in MongoDB, we probably have all the same keys and values, and we go out and fetch the relevant ranges from the data source in the same way. We don't use MongoDB's specific query language; we just get ranges.

So the client-side queries are converted, depending on the database, into that database's query language?

Yeah, that depends on the low-level database library you use. When I specify, here in the localStorage example, that I want to use localStorage as my database, I actually don't know how that library handles it internally. I just know that since it's levelup-compliant, it does what I want it to do. How it handles those queries internally, I don't know; I just say to levelup, give me this range of keys. It's up to that specific library to do the database-specific things; we only interact with the generic API on top.

Okay, thank you.

Hi. First of all, I think it was a great talk, and a great effort to simplify this.

Thank you.
Three questions. First, does it search incomplete or stemmed words? What I mean is: can it find "blue" in "blueberry"? Or, if the title has "cool" and I type c-o-o, will it find "cool"?

Yeah. The search functionality today is decoupled from what we call the matcher. The matcher matches on incomplete terms, so it would suggest that you write "cool", and then you'd get the document. It finds the closest matching term in the database, suggests that as a search term, and the search does the rest. That's also something we need to work on, because the matcher isn't fuzzy: if you write "cool" with one "o", it won't suggest "cool".

Okay. The other one I had is on the client-side search you're doing. You said you're using localStorage, and localStorage in HTML5 has a storage limitation. When it exceeds that, how do you handle it?

Yeah, well, as I said, it's not really feasible to run it on localStorage; that was just an example to show that you can run it on almost anything that has this kind of low-level library. But in browsers, if you run it on IndexedDB, IndexedDB can take up to something like 50% of the storage on your computer, and browsers handle this differently: Firefox will probably ask you, at around 50 megabytes, whether you'll allow the app to store more than that. So we don't have to think about it; the browser will ask the user in a dialog whether to allow more data to be stored.

And one final question. When I'm using IndexedDB or something on the client side, and let's say I have to update the data, is that possible? Are there APIs to update it on the client?

Yeah, there's an update API. What it really does is add the document again: it goes in, deletes the document, and adds it back to the index. So it's a bit slower than indexing a totally new document, but you can definitely do it. I'd suggest that if you do this in the browser, you do it in a separate thread, in a worker.

Thank you. That's it.

Thanks. That's all. Yeah, thank you all.