Hi everyone. Today we will talk about how to go from a basic distance search to a complex multi-criteria search. But first, a little bit about me. I'm Antonela Comme. I'm French, I have been a Python developer since 2009, and I'm currently tech lead at JeLoueMonCampingCar.com, which is a French website. We have many other websites, like alquilar me auto caravana.com for Spanish or irentmymotorhome.com for English, which mean exactly the same thing in different languages. We have others for Portuguese, Catalan and Italian, but because of my accent I will not try to pronounce them.

So what do we do at JeLoueMonCampingCar.com? I will just say GLM from now on, because the name is quite long. GLM is a private camper hire service. Basically, we try to connect camper owners and travelers. The owner posts ads on our websites and hopes to make money when the camper is not in use; the traveler looks for a vehicle and hopes to enjoy holidays traveling across Europe, France, Spain or wherever. Search is one of the primary features of our websites, because we have to connect supply and demand. To put it simply, we are like the Airbnb of campers.

I said that today we will talk about search, but I want to be clear: today we will talk about distance search, Haystack backends, the Elasticsearch function score and maybe campers. We will not talk about how to install Elasticsearch, how to install Haystack, or how to configure your Elasticsearch indexes, and we will not go deep into the Elasticsearch implementation either.

Three years ago, when we wrote the first version of the GLM website, we had to write the search page and the search backend. We didn't have a lot of time, because we had to write the whole website, so we made something quick for the search and said: okay, a simple distance search will be fine, at least for the next few months. We used GeoDjango to do that, with Google autocompletion in our form.
Google gives us a longitude and a latitude. We just create a geo point from that, ask GeoDjango to compute the distance between that point and the camper's parking address, and sort by distance. And guess what? Searching is hard, maybe harder than you think. Let's see what happens in the search results.

Here is an example of a search for Paris using this kind of algorithm. As you can see, the first camper is two kilometers from Paris, the second is three and the third is four kilometers from Paris. The first is a van, the second is another van and the third is a proper camper. You can see that none of them has any reviews and, worse, that we don't have any description for the third vehicle. I'm quite confident that we can find a better camper for people who want to rent one in Paris.

On this kind of slide you will see blue lines. Those blue lines help the people who work at GLM to understand how the search engine computes the score. Here the score depends only on distance, because we only sort by distance.

So the problem is: what do we want? We want to reward owners who deliver a great experience to guests. The other question is what guests want. Guests want to rent a vehicle close to a city, because that's important. But guests also want to rent from a top-notch owner, who rents often, has good reviews, answers fast, and probably other things in the future. Guests want an available vehicle.
They don't want a vehicle so heavily booked that the owner will never answer or will probably reject them. And guests probably want a vehicle with a lot of pictures, because the more pictures you have of something you want to rent, the better idea you have of what you will get. So let's do things better.

At GLM we chose a stack. In this stack we have Elasticsearch, which is a fully functional search engine that provides autocompletion, highlighting, more-like-this, full-text search, indexing and faceting. I don't know if anyone here doesn't know Elasticsearch? Okay. So it provides all that kind of thing, and more generally it is a kind of NoSQL database; some people use it like that.

On the other end, Haystack is a Django app that provides an ORM and a query set object similar to the Django ORM, on top of different search engines like Elasticsearch, Solr, Xapian and Whoosh, and probably others if you want to implement a backend. I like the Haystack approach, which makes it easy to start and easy to understand for people who already know Django, because it follows the same paradigm of queries and query sets. It is convenient for keeping the code clear for everyone, even for, say, a front-end developer who just takes a look at your form or your backend view: if that person has already seen what a Django query set looks like, they will know exactly what it is doing.

So let's rewrite the same function with Haystack and Elasticsearch. I will not spend time on how your data is indexed; I will assume that the Haystack and Elasticsearch docs are clear enough about how to index your models. Here is the equivalent function, the same as before, with a Point instantiation. The difference is that we ask Haystack to give us a search query set for a given model, which is an Ad. We ask it to compute the distance between the location field of an ad and the search point, and we order by distance.

But as I said, we need something better, not just a distance search.
If you read the Haystack documentation carefully, you will run into a problem. The problem is explicitly written in the Haystack documentation, which says that you cannot specify both a distance and lexicographic ordering together. So you can put as many sort-by clauses in Haystack as you want, except if one of them is the keyword distance. If there is a distance, Haystack will assume that distance is the name of a field and not the name of a computed field, and it does not work like that.

So after some googling, reading documentation and asking myself whether this was the right stack, I found something in Elasticsearch named function score, and decay functions. The function score query is a tool to take control of the scoring process in Elasticsearch. It allows you to apply a function to each document that matches a main query, in order to alter or totally replace the original score. So it looks like we probably need to use that. The problem is that neither function score nor decay functions are implemented in Haystack.

Before going deeper, let's have a look at how Elasticsearch, Haystack and our queries work together. Elasticsearch, on the right side, only provides an HTTP API. So basically, when you write your query set in Haystack, the Haystack Elasticsearch backend generates an HTTP GET and sends it to Elasticsearch. Elasticsearch returns a JSON response; Haystack parses that response, transforms it into Haystack objects and puts those objects in the query set. This is important for what happens next.

I already told you that Elasticsearch works with an HTTP API, and here is the documentation example of what a function score looks like. At the top, I don't know if you can see it, this is just a curl GET on a given path. What's interesting here is that all your data is just put inside a function score object, or dictionary, call it whatever you want. That function score has a query. The query is probably the original query that you want to search.
Maybe it's a full-text search, maybe it's a filter on a given field, or whatever you want to ask Elasticsearch to filter on. But the important things happen now. The score mode is there to tell Elasticsearch how to compute the score that will be used to sort your response. The score mode can have different values: multiply, which is the default one, sum, average, max, min, or first, which takes the first score generated. Here we will use sum, because the default multiply has a kind of problem: if one of your functions, which we will see afterwards, returns a score of zero for an object, then whatever the other functions return, your object will end up at the bottom, because zero multiplied by anything is zero. So let's use sum, and we will see that it works fine.

Next you have the functions. Functions is a list, and you can put in as many functions as you want. A function always has the same form. For decay functions, you define a curve. Curves are important; the curves are on the right.
You have three different kinds of curve: the Gauss, the exponential and the linear; the graph speaks for itself. Here we will apply a Gauss curve on the location field, with an origin at a given geo point, an offset of two kilometers and a scale of three. What does that mean? If you look at that beautiful graph, imagine a map where you have a central point and a two-kilometer offset. All objects inside these two kilometers will have the same score for that function, and all the points outside will see their score decay. The scale says how it will decay, or how severely.

That works for geo points, but it also works for other fields. The second example is another Gauss curve, but based on the price field. There is a little trick with origin, offset and scale. Let's assume that a price can't be negative, since that doesn't mean anything. So the origin is set to 50 and the offset to 50. That means our circle, which is not really a circle in this case, will be between 0 and 100, and every price above 100 will see its score decay. This example in the documentation is about hotels, and the less expensive your hotel is, the better it will be; well, maybe not better, but more relevant.

The last thing, and it's quite important, is that for each function you can define a weight. Here the price weight is twice as big as the location weight. Whatever: the point is that you can mix your curves, weights, origins, scales and offsets as you want.

So what now? Haystack doesn't provide that kind of functionality, so let's write a custom Elasticsearch backend for Haystack. Actually, it's not just a backend. We need to write something named a search engine, which embeds a custom backend and a custom query, and we also need to write a third piece, a search query set. Don't be afraid: it will be quick and fun. Let's see what it looks like.
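Before moving on to the backend, the decay curves and the sum score mode described above can be sketched in plain Python. This is only an illustration, not code from the talk: the helper names `decay_score` and `combine` are made up, the formulas follow the Elasticsearch documentation for decay functions with its default decay of 0.5, and the Paris coordinates and the "3km" unit for the scale are assumptions. The dictionary shows roughly what a function score body with one Gauss function on the location field could look like.

```python
import math


def decay_score(curve, value, origin, scale, offset=0.0, decay=0.5):
    """Score between 0 and 1 for one field value (hypothetical helper)."""
    # Values within `offset` of the origin are not penalised at all.
    distance = max(0.0, abs(value - origin) - offset)
    if curve == "gauss":
        sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
        return math.exp(-(distance ** 2) / (2.0 * sigma_sq))
    if curve == "exp":
        return math.exp(math.log(decay) / scale * distance)
    if curve == "linear":
        slope_length = scale / (1.0 - decay)
        return max((slope_length - distance) / slope_length, 0.0)
    raise ValueError("unknown curve: %r" % curve)


def combine(weighted_scores, score_mode="sum"):
    """Combine (score, weight) pairs as the 'sum' or 'multiply' mode would."""
    values = [score * weight for score, weight in weighted_scores]
    if score_mode == "sum":
        return sum(values)
    if score_mode == "multiply":
        return math.prod(values)
    raise ValueError("unsupported score_mode: %r" % score_mode)


# Illustrative request body: a main query wrapped in a function_score.
# The origin coordinates and the "3km" scale unit are assumptions.
function_score_body = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "score_mode": "sum",
            "functions": [
                {
                    "gauss": {
                        "location": {
                            "origin": {"lat": 48.8566, "lon": 2.3522},
                            "offset": "2km",
                            "scale": "3km",
                        }
                    },
                    "weight": 1,
                },
            ],
        }
    }
}

# Distances in kilometers from the search point (origin 0).
inside = decay_score("gauss", 1.0, origin=0.0, scale=3.0, offset=2.0)
at_scale = decay_score("gauss", 5.0, origin=0.0, scale=3.0, offset=2.0)
print(inside, round(at_scale, 3))

# One function returning zero wipes out the score with multiply, not with sum.
pairs = [(inside, 1), (0.0, 2)]
print(combine(pairs, "multiply"), combine(pairs, "sum"))
```

A point one kilometer away sits inside the offset and keeps the full score, while a point at offset plus scale has decayed to the default decay value of one half. The last line shows why sum is the safer choice when any single function can return zero.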
We can't see the top, but don't worry. This is the search query. Search queries are like Django queries: they are lazy and executed only when needed. The idea of this subclass is just to keep track of a decay function list, which you see at the top, in the init. The build params method is used by the backend to generate the HTTP request sent to Elasticsearch, so if we have a function score or decay functions, we just return them in our search kwargs. The add decay function method will be used internally by the query set to add new decay functions, and the clone method is there to give the query the ability to clone itself, like in Django. In Haystack and in Django it works the same way: every time you add a new filter, an order by, an exclude or whatever, the query clones itself and returns a new query with that new capability. Here, every time we clone the query, we just copy the decay functions to the new cloned query.

Here is the search backend. The search backend is in charge of building the parameters to send to Elasticsearch. Here we have something cool: if we don't have any decay function, we just bail out early. We don't have to do anything; we just let Haystack compute the rest for us. The difference is that, if we have decay functions, we embed the query computed by Haystack inside a function score, like the one we saw before. We put in the decay functions that we added, the query generated by Haystack, and the score mode set to sum.

I will go faster now. Here is the query set; this is how users use the query. We just add a new method to the query set, called decay. decay takes a function dict, which is a decay function.
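The three pieces just described (a query that tracks decay functions and copies them on clone, a backend step that wraps the query in a function score, and a query set method named decay) can be sketched without Haystack at all. This is a simplified stand-in under stated assumptions, not the code published on GitHub and not the Haystack API: the classes `DecayQuery` and `DecayQuerySet` and the function `build_search_kwargs` are hypothetical and only mirror the roles named in the talk.

```python
import copy


class DecayQuery:
    """Stand-in for a search query that remembers its decay functions."""

    def __init__(self, base_query=None):
        self.base_query = base_query or {"match_all": {}}
        self.decay_functions = []

    def add_decay_function(self, function_dict):
        self.decay_functions.append(function_dict)

    def clone(self):
        # Every clone carries its own copy of the decay functions.
        new_query = DecayQuery(copy.deepcopy(self.base_query))
        new_query.decay_functions = copy.deepcopy(self.decay_functions)
        return new_query

    def build_params(self):
        params = {"query": self.base_query}
        if self.decay_functions:
            params["decay_functions"] = self.decay_functions
        return params


def build_search_kwargs(params):
    """Stand-in for the backend step that builds the request body."""
    kwargs = {"query": params["query"]}
    functions = params.get("decay_functions")
    if not functions:
        # No decay function: leave the original query untouched.
        return kwargs
    kwargs["query"] = {
        "function_score": {
            "query": params["query"],
            "functions": functions,
            "score_mode": "sum",
        }
    }
    return kwargs


class DecayQuerySet:
    """Stand-in for a query set exposing a chainable decay() method."""

    def __init__(self, query=None):
        self.query = query or DecayQuery()

    def decay(self, function_dict):
        # Like filter() in Django: return a new query set, never mutate self.
        new_set = DecayQuerySet(self.query.clone())
        new_set.query.add_decay_function(function_dict)
        return new_set


base = DecayQuerySet()
# Illustrative decay function; field name and values are assumptions.
scored = base.decay({"gauss": {"picture_count": {"origin": 50, "scale": 9}}})
print(build_search_kwargs(base.query.build_params()))
print(build_search_kwargs(scored.query.build_params()))
```

The original query set is left untouched by decay(), which is what makes chaining safe, and the wrapping only happens when at least one decay function is present.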
A decay function is just a dictionary. All this code is available on GitHub at this address; I will give it to you later.

Now let's use it. Assume we want all the vehicles with park assistance and GPS near a geo point. Let's instantiate our function query set with the Ad model and filter on park assistance and GPS. Let's compute the distance between our search point and the vehicle points. Now you see what a decay function looks like. Here we apply an exponential curve on the field named vehicle location, with an origin, a scale and an offset, and we set the weight to 2, because distance is something important; not the most important thing, but quite important.

But we will have more fun now. At indexing time we add more fields, including a field named picture count, which is the number of pictures a vehicle has. Here we add a Gauss curve on picture count, with an origin of 50, an offset of 40 and a scale of 9. What does that mean? It just means that the fewer pictures you have, the more your score decays. We set that weight to 0.5, because it's important, but maybe less important than distance. And let's add this dictionary to the decay search query set.

But we need more. Also at indexing time, we compute something named owner quality rate, which goes from 0 to 1. It is arbitrarily computed by us, from the owner's acceptance rate, the number of reviews, the answer time and the number of bookings. One is the best owner you can imagine on earth, and zero is an owner who does nothing on our website. We want to reward good owners, so let's say that the lower your score is, the lower you will be in the search. We don't set a weight, which means a weight of one. Then we add that to our decay functions.

Okay, let's see what happens now. You can see that the first vehicle has a description. It's 12 kilometers from Paris, and 12 kilometers is okay: it's inside the non-decay circle that we defined. We have reviews.
We have eight pictures and a good owner quality rate. You can see that the score depends directly on the distance, the owner quality rate and the picture count, because all of them are better for the first and the second vehicles. It's the same for the second: we have reviews, a good quality rate and pictures. So we can now say that if you want to rent a vehicle near Paris, the first one will probably give you better holidays, or maybe a better experience of our website, just because we modified that search query.

Okay, I'm finished. Thank you all, and if you have any questions, I will be happy to answer them.

Hi, thanks for your talk. This actually seems like a pretty interesting feature. Did you ask the Haystack developers whether they want it back, so that they can merge it on their side?

Actually, I have a lot of things to do, and I just know that there is no functionality like that in Haystack. I didn't see any pull request or anything like that on the Haystack GitHub. The code that we wrote is now available on GitHub. If someone wants to create a pull request or ask the Haystack developers to do it, I will be happy. But I think the Haystack developers try to keep consistency between all the backends, so if they build something for Elasticsearch, they will try to build the same thing for Solr, Xapian and Whoosh, and I haven't read any documentation on how to do that with the other search engines.

Yeah, just contact them. They might be interested.

What about performance? Do you compute the results once, or are they computed at each search? Do you have a cache or something?
Okay, I don't know if you have already used Elasticsearch, but Elasticsearch is really performant. Adding decay functions to our search didn't degrade our search response time, and adding Elasticsearch and Haystack to our search function made the search faster compared to the GeoDjango search. So we have more functionality and it answers faster. I think that's a good compromise.

I was just wondering: you had to explore really deep into Haystack and all the different layers, and write a lot of code, to actually make that happen. Why did you stick with Haystack instead of just going with raw queries to Elasticsearch?

Actually, we already used Haystack for another part of our website, which is the FAQ. We don't use it really intensively, but we use Haystack for things like highlighting and some faceting. And, as I said before, after going deeper into Haystack I really thought that it would be easy to do. Actually, all the code I showed you is exactly the code on GitHub; there is no more code to write than that. And the thing is, writing that backend, query set and so on allows me to add even more custom things to our website, which are not open source for now. For example, we developed a more relevant more-like-this functionality for campers. Normally more-like-this only works with full-text indexing, and now more-like-this works for campers: for a given camper you get a whole set of campers nearby with the same options and the same type. That is our more-like-this.

Your search results are quite dependent on the parameters you pass into the decay functions. Did Elasticsearch give you any tools to help you understand where the breaks, the natural groupings in your data set, were? So did Elasticsearch help you to understand that?
For example, that a decay function with this particular set of parameters is going to chop up your owners really neatly?

If I understand the question correctly, this is something we do manually. All those decay functions together produce something quite odd, so it's like playing with sliders. We take a given city and test manually: okay, that's not good, this one is not really relevant, so maybe we have to decrease that weight by 0.1. We adjust our set of decay functions until we get something that we think is acceptable for us, but we have to do that manually.

Do you allow the users to sort the search results, and if yes, how?

The simple answer is no. The way we do the search is that we return all ads for every search. So if we allowed users to sort by price, for example, we would totally lose our sort by distance. It's not really compatible, or at least today I don't know how to give that functionality to our users. So for now we assume that our algorithm is good enough for users. Maybe it's not the easiest for users to understand, so you have to explain it to them, with blog posts for example, but we consider that okay.

Okay, thank you. Thank you.