Okay, it's the right time. Okay, good luck and take it away. I won't need you to do time. Okay, good morning everybody. My name is Diego Ceccarelli and I'm here with Sambhav Kothari, and today we're going to talk about learning to rank, explained for dinosaurs. So before we go into the learning to rank thing, a bit about us. We're both software engineers at Bloomberg, and we work on search relevance for news. This talk is called learning to rank explained for dinosaurs because the team where we work is called NSRX. This stands for News Search Experience, and we basically work on making the news search experience better for the user. So we own all the services that improve the quality of the results, that help users find better news and browse the collection that we have. Yes, so here we have a dinosaur. So, just a few numbers about our engine. We have around 300,000 users, and we receive 16 million queries per day. We have really tight constraints on latency, because some of our users do trading based on the news, so they really need to know when something happened as soon as possible. So we have a 100 millisecond timeout for indexing: when we receive a story, the constraint is that within 100 milliseconds it has to be searchable, and a query has to take less than 200 milliseconds. As for volume, Bloomberg has a journalism department, so there are journalists all around the world writing news, but we also acquire news from other newswires like the New York Times or El País. We also have an agreement with Twitter, so we index the full stream of tweets as well, and that adds up to a lot of news. Every day we receive around 2 million news stories, and at peak time we have to index 500 stories per second. In total, at the moment, we have 650 million stories in the index. Search comes in different flavors. We have normal search, Google style, where you search for a company or a country and you get back the top 10 results, let's say. And then we have what we call alerting, which is more Twitter-style search, where you subscribe to a query and then in real time you get the news as soon as it happens. So you can follow a company and see real-time news about IBM. For this we have around 1.5 million subscriptions, so at any moment we may have to notify more than a million users with news. So that's all about news search. Today we're going to talk about learning to rank, and just a quick show of hands, how many of you use learning to rank for search here? Can you raise your hands? One, two, yeah. Okay, so good. So we're going to talk about how this technique called learning to rank works from scratch; you don't need to know anything. The only thing that you need to know is that when you do a search on a search engine you get a ranked list of results, right? When you search on Google you get results one, two, three, four, five, and this is achieved because the engine gives a score to each document, which is just a number. It gives a score and then the documents are sorted by decreasing score. That's the only thing that you need to know, okay? And we're going to explain learning to rank in four steps. So there are four steps that I'm going to describe, and then afterwards Sambhav is going to show you a demo that covers all the steps, so you'll see how to improve relevance with learning to rank from the beginning. So what is learning to rank? Learning to rank is basically using machine learning to improve the relevance of the results of a search engine, okay?
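To make that score-and-sort mechanic concrete, here is a tiny Python sketch. The document ids and scores are invented, and real engines like Lucene compute scores very differently, but the ranking step itself is just sorting by score and keeping the top results.

```python
# Toy illustration of the only search mechanic assumed above: every matching
# document gets a numeric score, and results come back in decreasing score order.
# The (document id, score) pairs below are made up for illustration.

docs_with_scores = [
    ("doc_1", 3.2),
    ("doc_2", 7.9),
    ("doc_3", 0.4),
    ("doc_4", 5.1),
]

# Sort by score, highest first, and keep the top k results.
top_k = sorted(docs_with_scores, key=lambda pair: pair[1], reverse=True)[:10]
print(top_k)  # [('doc_2', 7.9), ('doc_4', 5.1), ('doc_1', 3.2), ('doc_3', 0.4)]
```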
And why do we want to do this? I have a nice example. Some time ago I was writing slides about learning to rank, and when you implement learning to rank in Lucene and Solr or Elasticsearch, one of the core points of the code is an object in Lucene that is called query, okay, it's Query.java. So I wanted to check something in the implementation of the class. I went on this website that is quite famous and I like it a lot, we use it at work as well, and I searched for query, because I was looking for that particular class, and I got these results. And that was not what I was looking for. I was looking for the class Query.java. So I refined my query. I changed it a bit to try to get the result. I wrote query.java and I got these, okay? Again, not what I was looking for. So I tried to do what I usually do on Google. I put it in quotes, so that it means I want to match exactly this string, and I got these, okay? And this is like, yeah, I mean it's funny, but it happens to everyone in search, right? When you put up a search engine, this will happen. It happens to us, it happens to them, it's totally natural, and it's bad, but it happens a lot. And that's why there is learning to rank, and that's what learning to rank is trying to solve, okay? So what is the point here? I have a search engine, okay? And what happens is that I have my query. This is the normal flow: I have a query, I search for something, and I decide that some results are relevant and some results are not relevant. So in this case, we have the relevant results in green and the not relevant results in red, okay? And in this case, how do I assign the score to my documents? I use this formula here that is called TF-IDF, and it's the standard, the default way that Lucene and Solr used to rank documents, and it's a very basic score that basically looks at the frequency of the term in the document. So this document, for example, contains the term solr 10 times, so it ranks higher. Now, Lucene and Solr don't use plain TF-IDF anymore, they use something similar, but it's the same concept. So, base score, I got these results, and I decide to tweak it because I'm not happy with these results at the end of the list. So I multiply the score by 2.3. It doesn't change anything, because I'm just scaling the score, so all the documents will still come back in the same order. I try to do this, but it doesn't really work. So what I can do easily in Solr or in Elasticsearch is combine it with other things. I can say: I compute the score of the query on the title, and the score of the query on just the description of the document, and I combine these three things together, multiplying the scores by different factors. And by combining these three together, I get a different score for each document, so that affects my ranking, and now I get a relevant result here and a relevant result here. And I can keep doing that. I can add, oh, let me look at the previous clicks that people did, let me look at the recency of the document, and I can end up with something that looks nice, where I have the two relevant results on the top. And I'm happy with that, I ship my new ranking function to production and release it to all the users. And then what is going to happen is this: for my query solr I get the two relevant results on the top, but then when I start to look at other queries, I get, for this query, everything red on top, and so on, right? And this happens a lot. People try to tweak the ranking function manually and they get these results.
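As a rough sketch of that kind of hand tweaking, here is a manually tuned scoring function in Python. The tfidf() helper is a crude stand-in for the engine's real scorer (Lucene's similarity is more involved), and the weights 2.3, 1.5 and 0.8 are exactly the sort of hand-picked factors that learning to rank is meant to replace.

```python
# A minimal sketch of a hand-tuned ranking function: a base score combined with
# per-field scores using weights chosen by a human.

def tfidf(query: str, text: str) -> float:
    """Crude term-frequency stand-in: count occurrences of query terms in the text."""
    terms = query.lower().split()
    words = text.lower().split()
    return float(sum(words.count(t) for t in terms))

def hand_tuned_score(query: str, doc: dict) -> float:
    # Manually chosen weights for body, title and description scores.
    return (2.3 * tfidf(query, doc["body"])
            + 1.5 * tfidf(query, doc["title"])
            + 0.8 * tfidf(query, doc["description"]))

doc = {"title": "Apache Solr",
       "description": "Solr search engine",
       "body": "Solr is built on Lucene ..."}
print(hand_tuned_score("solr", doc))
```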
So that's what learning to rank is trying to solve. It makes this automatic, okay? Because the problem with this is that you really need to be an expert to know how to tweak. You really need to know your domain. I work on financial news and I didn't study finance, so I can't really tweak the ranking function manually, okay? So, what are the goals of learning to rank? Don't tune relevance manually, but let the machine do everything for you. And also, at Bloomberg we decided to implement this for Solr when I joined the company, and one of the targets when we wrote this plugin that allows you to do learning to rank was also to make it easy to experiment. So, have the possibility to plug in different types of learning to rank techniques and make the whole thing easy, okay? So, we're going to start now with the four steps. Before we start, just an overview of how learning to rank works. This is a normal search pipeline: we have a user querying for something. The query goes to the index, which is a Lucene index. Then Lucene applies this TF-IDF scoring, the score that just looks at the frequency of the terms, and it retrieves the documents with the highest score, in general, like the Google use case, the top 10 results, and it returns the documents to the user, right? In learning to rank, this changes. Instead of retrieving the top 10 results, we retrieve, let's say, 1,000 results for the query. Then these 1,000 results go through a block that is called feature extraction; I'm going to describe what feature extraction is. Then they go through a learning to rank model, and this learning to rank model will look at the documents and produce new scores. So we reorder these 1,000 documents according to the new scores, we select the top 10 documents in the new ranking, and these are the results, okay? So, I'm going to go through all these steps now. The first one is not features, but training data, because if you want to train your model, you need examples. This is how it's going to learn how to give better results. So, how does it work? What is training data? Training data is just a set of queries, and for each query, you need to know which documents were relevant and which documents were not relevant, okay? It's just this: you have a set of queries, and for each query, a bunch of documents that were important and documents that were not important, okay? How do you produce this data? There are two major ways to do that. One way is using explicit data. You create an interface, you ask some annotators to type some queries and to tell you which documents are relevant or not relevant for each query. You can pay people, you can pay experts in the domain to do that. There are websites, like Amazon Mechanical Turk, where it's possible to set up jobs where you ask users to annotate. It is, in general, very expensive to do that. The other way is using implicit data. If you have users on your search engine, you can look at their interactions, right? If I search for something and then I keep refining my query, that's not a good sign; it means that all the documents that I skipped by refining the query were not relevant, I was not looking for those. If a user clicks on a document and spends some time on it, it means they're interested in the result, so that might be considered a relevant result. So, there are these two approaches, and they both have pros and cons.
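As an illustration of what that training data looks like, here is a small invented example: judged query-document pairs. The queries, document ids and labels are made up, and the labels can be binary as here or graded.

```python
# Training data for learning to rank: for each query, documents with a relevance
# label (here 1 = relevant, 0 = not relevant; graded labels such as 0-4 are also common).

training_data = [
    # (query, document id, relevance label)
    ("solr query parser", "doc_17", 1),
    ("solr query parser", "doc_42", 0),
    ("solr query parser", "doc_91", 0),
    ("ibm earnings",      "doc_05", 1),
    ("ibm earnings",      "doc_33", 1),
    ("ibm earnings",      "doc_64", 0),
]

# Explicit labels come from human annotators; implicit ones are derived from
# interactions, e.g. a click with long dwell time -> 1, a skipped result -> 0.
```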
As I said, explicit data is very clean, because the annotators will tell you exactly what is relevant or not, but it's quite expensive. While implicit data is easy to produce if you have users, but it's very noisy, and sometimes it's so noisy that you can't really learn anything from it. Okay? So, moving forward in the search pipeline, we have feature extraction. This is the first thing that you do after you retrieve the 1,000 documents: you need to extract these features. Now, what is a feature? A feature is just a number that describes something in your document, okay? So, it can be whether the query matches the title, so just a Boolean; it can be the length of the document in terms; or a popularity score, okay, how many views this document got in the past, for example. So, just to give an example, here I have a query, the ticker of Apple, and I got two documents for my query, and I produced features for these two documents, okay? We have the feature query-matches-the-title: here it is zero because it's false, here it is one because it's true, Apple is in the title. We have a feature like is-the-document-from-bloomberg.com, and here it is zero and here it is one. We have this feature with the popularity score; we can see this one is more popular, it got 3,000 views, and so on. So, it's just about coming up with features that you think might correlate with the fact that the document is relevant. For example, in the example with GitHub, you could have does-the-query-match-the-name-of-the-class as a feature, okay? So, these are features, and then the next step, once you have extracted all your feature vectors for the documents, is to train a model. Now, a learning to rank model is just a way to combine together the values of your features to get a score. So, it's similar to what I did before by tweaking the ranking function, but it's done automatically. A model, for example, might be the one on the first line, okay? I just multiply the query-matches-the-title feature by seven, add 42 times another feature, and then another number multiplied by the popularity, okay? This is a very simple example; it's called a linear model. There are much more complicated ways to train a model, and the way you train is that you take your training set and you try to predict the labels by combining together the features, okay? So, you have labels that might go from zero to four, and then you try to find the best way to combine the features over the training set and predict the label. There are multiple ways to train, as I said, and there are three big families. There's a family called pointwise, which was the first approach proposed for learning to rank in the 90s, and pointwise is basically this: you try to predict the score by looking at one document at a time. So, you just look at a document and you try to predict the label, so whether it was relevant or not. Then another big family came along that was much better, called pairwise methods. In pairwise methods, you don't look at the score, but you look at two documents and you want to predict which one comes first. So you're trying to predict, if I give you two documents for a query, which one is the best one? And once you have that comparison function, you can rank the documents, because you have a function that predicts which comes first.
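Here is a minimal sketch of the feature vectors and the linear model from the Apple example above. The feature values mirror that example, while the weights (7, 42, 0.01) are only illustrative: in learning to rank they would be fitted on the training set rather than written by hand.

```python
# Feature vectors for two documents retrieved for the query "AAPL" (values as in
# the example above), and a linear model that scores them as a weighted sum.

features = {
    "doc_a": {"title_match": 0.0, "from_bloomberg": 0.0, "popularity": 3000.0},
    "doc_b": {"title_match": 1.0, "from_bloomberg": 1.0, "popularity": 200.0},
}

# Illustrative weights only; a real model learns these from the training data.
weights = {"title_match": 7.0, "from_bloomberg": 42.0, "popularity": 0.01}

def linear_score(feature_vector: dict) -> float:
    # A linear model is just a weighted sum of the feature values.
    return sum(weights[name] * value for name, value in feature_vector.items())

for doc_id, vector in features.items():
    print(doc_id, linear_score(vector))
```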
And then there is the final family, which is the one used now by search engines like Google or Bing: listwise methods. They just look at the whole list and they try to optimize the global ranking. As I said, we're going to show you a listwise method, and these methods are usually based on ensembles of trees. So, you have many trees, and each tree says something very simple, like: if query-matches-the-title is greater than one and the popularity is greater than 31, assign this score. They combine together multiple features, and the trees work together to produce a final score, so they have much more expressivity than linear models. And finally, the last step is the evaluation. You want to say whether your model is good or not, and there are multiple metrics that tell you whether your search is good or not; you still need annotations to do that. So, you need to know, for some queries, which documents are relevant or not, and these are the main metrics that people use to evaluate learning to rank. There is precision, which basically says how many relevant documents I retrieved for my query with respect to all the documents that I retrieved. So, if I retrieve 10 documents for the query and only five are relevant, my precision is 0.5. Then there is recall, which is how much of the relevant documents in the collection you are covering with your query. So, if across the whole collection that you have in the index there are 100 documents that are relevant, and with your query you retrieve 10 relevant documents, your recall is 0.1: you're returning only 10% of the relevant documents in the collection. And usually it's important to have both these metrics good; precision and recall are the main things that you want in a search engine. So, before, when I was looking for the Query class, that's a problem of recall: in that case there is only one relevant document in the collection, because I'm looking exactly for that document, and I'm trying to retrieve it and I can't. So, it's a problem of recall. So, you want to have both. Basically, when you want to measure the overall quality of your search system, you use the F score, which is just the harmonic mean of precision and recall, so you want both of them to be good. And finally, we have NDCG, which is another metric that looks at the whole result list for a query and returns a score that tells you how good the ranking is. The more you push relevant documents to the top of the list, the higher this NDCG will be. Okay? So, we decided some years ago to implement all this logic inside Solr, and the reason we did that was that we were kind of doing it outside of Solr, but it was heavy from a performance point of view and also from a development point of view. So, we decided to put it inside Solr to make it faster and to make it easy to work with. Before, a feature used to be a Python function computing a value and storing it somewhere. Now a feature is something native in Solr: a feature is just a JSON snippet where you have the name of the feature, a type, which is a Java class that implements that particular feature, and then some parameters. And in this case, we're using a SolrFeature, which allows us to reuse the Solr/Lucene query syntax to specify the value that we want for that feature.
So, in this case, checking whether the document contains an executive person is just checking whether the query person matches in the field category and whether the field primary position contains one of these terms. And that's all: the engine will compute the score of that query for the document, and that will be the value of the feature. This is how you plug learning to rank into Solr at the moment; it's just adding these two fields inside the Solr configuration. And this is an example of a model. This is a linear model, and again, it's just a JSON snippet that has a Java class that describes how to compute the model, the features that the model is using, and then how to combine them together. So, in this case, we just have the weights for our features. So, benefits of the learning to rank plugin. As I said, simple feature engineering: we reuse the Solr search functionality, so the feature syntax can reuse everything that Solr already has, and that makes it easy to model features. And we got what we wanted out of it: improvements in search quality, and relevance tuning without having to tune the relevance by hand anymore. So, now Sambhav is going to show you a quick demo of learning to rank. For the demo, we use the Simple English Wikipedia JSON dump. It just contains the most important Wikipedia articles, and it's around 150,000 documents. We're going to index it into Solr and then we're going to set up some stuff that Sambhav is going to show you. Thanks. Hey, so, first of all, you can find all the code related to this demo at this repository. The entire repository is dockerized. We wanted to make the demo reproducible, so that it's easy to actually take a look at how we did this, and it's very simple code. We have outlined all the steps and I'm going to follow them exactly. So, when you want to go back and reproduce this demo, you can. Okay. So, first of all, what we want to do is bring up a Solr instance which has the learning to rank plugin enabled. So, let's do that. As you can see, I just ran a Docker command which brings up a container that runs Solr. It is reachable using this URL. We have learning to rank enabled on it, and since our goal is to actually search the Wikipedia data dump, I have added a schema that matches the actual data dump. So, as you can see, we have some 180,000 documents indexed, and let's actually take a look at some of the documents we have. The schema contains, for example, the title of the Wikipedia document; the wiki title, which is like the link; the links to other Wikipedia documents that it references; and the actual description of the document, which is the first paragraph of the document. So, you can see we have a bunch of these. Now, when you actually query for something like Brussels, we get a list of documents that it matches, and we get it along with the score. This is using the default Solr TF-IDF scoring; it's not using any learning to rank. So, let's try to think of some of the features that more important documents might have. For example, if you take a look at these fields and what Diego described earlier, can you try to come up with some features that more important documents might have? Yeah, sorry. The number of incoming links. Yeah, sure, that's a good feature. Yeah, sure. Anything, anyone else?
Yes, popularity, but right now in the schema we don't have something like clicks or reads. So, if you had that data stored somewhere, that could be a feature, but for this particular dataset we cannot use it. Any other ideas, yeah? Oh, yeah, sure. So, that's a good one. Yeah, sure, that's another good one. That's two. Yeah, sorry, that's two good ones. So, yeah, all of these are features you were able to come up with, and surprisingly, they're very similar to the features I had actually written. So, you see, these features are very intuitive: you can think of them on the spot and they're very relevant to how these documents look. So, if we take a look at the feature store, which has the definitions of the features, we see we have something like the number of words in the query that are contained in the description; the freshness, for example; the number of incoming links that a document has; the original score that is provided by Solr; and again, the TF-IDF score on the title instead of the description. So, you were able to come up with all these features on the spot, and surprisingly, they're very close to the original feature store that we had implemented. So, you can see these features are very intuitive and you basically just need to write them down. It's just a JSON file, you can easily write them, and learning to rank helps you extract these values very easily. So, let's see an example of how these feature values look. For example, we have the same documents we had in the earlier search, but now we have the actual values of the features. We can see that for our query Brussels each document has a freshness score, the number of links that the document has, and the score of the actual query on the description and on the title, so the description score and the title score. So, you can see it's very easy to extract these features once you have just provided that JSON definition. Next, now that we have the features and the documents, what we want to do is get the features out, train a model on them, and evaluate our results. So, I wrote a simple web UI so that you can give it to judges, or people who are experts in the domain, to mark the documents as relevant or not. This is where we'll get a training data set from, and it also has a bunch of screens for taking a look at how our model performs, the evaluation part. So, for example, we can type the same query again and we see the same list of results, and you can see there's a button here where they can mark the document as relevant or not. So, for example, this seems like a relevant document; I'm just marking it as relevant, and it's saving that particular document as a relevant document. So, this is how you can build up a training set: by giving it to people who can just type in a query and then mark the documents as relevant or not. And also you can see this particular document comes third in the results, it's not the top result, and you see a lot of results that are not relevant to the actual query. So, let's take a look at the baseline Solr model, where we don't have a learning to rank model at all, and we'll evaluate it on the same metrics that Diego described earlier. We'll check the precision, we'll check the recall and we'll check the F score.
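For reference, here is a simplified sketch of how those metrics can be computed at a cutoff k from binary relevance judgments. The returned and relevant ids are invented, and this is not the exact code behind the demo's evaluation UI.

```python
# Precision, recall and F score at a cutoff k, given the ranked result list and
# the set of documents the annotators marked as relevant.

def precision_recall_f(returned_docs, relevant_docs, k=5):
    top_k = returned_docs[:k]
    hits = sum(1 for doc in top_k if doc in relevant_docs)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_docs) if relevant_docs else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall > 0 else 0.0)
    return precision, recall, f_score

# Hypothetical example: five results returned for a query, three judged relevant.
returned = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d7", "d4", "d8"}
print(precision_recall_f(returned, relevant, k=5))  # (0.4, 0.666..., 0.5)
```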
So, yeah, as you can see for the original model, if we just take a look at the top five documents returned by the query, it has an F score of 0.38, a precision of 0.41 and a recall of 0.49. For all of these metrics, the higher they are, the better the results that your search engine is returning. We can also take a look at how the original baseline Solr model performs for each query. You can check, for each query, the F score, precision and recall, and you can actually check the documents that your annotators marked as relevant or not. So, now let's try and use learning to rank to improve these results. The first thing we'll do is train and evaluate a linear model. A linear model will just take the features we described earlier and combine them by assigning a weight to each of the features. To do that, all you need to do is run this script. It gives you two options: you can either train a linear model or a LambdaMART model. We want to train and upload a linear model. So, that's done. As you can see, a linear model right out of the box is not performing that well against the original Solr model. Now, this might be due to a number of reasons. Maybe the data set we have doesn't have enough examples. It's also because right now we have not normalized our features, so different features might have different distributions and we're not combining them together in a meaningful way. If you want to achieve better performance with your linear model, you can try to reduce your feature set: there might be some features that are very noisy, that are basically keeping the model from achieving ideal results, or maybe some of the features are overpowering the other ones. So, we can actually compare how both these models behave for each particular query. For example, here we have the linear model on the left side and the original model on the right. We can see that it's not performing that well; for example, the linear model puts the first relevant result at maybe the sixth or seventh position, while here it's in the second position. So, let's try and improve on that. As Diego said, we have better models now, like tree-based models, which take a look at the features and then basically go through a set of decisions. For example, if they see that a document has a lot of links, they go down a particular path, which then checks the score of the query in the title, and so on. So, they're able to combine the features in a much more meaningful way than a simple linear model with weights attached to it. Before we do that, there's something interesting I want to show with the linear model. We have trained the linear model and we know about the features. So, what do you think: if there's a feature like the number of incoming links, what kind of weight should it have? Should it have a positive weight, a negative weight? Any ideas? Anyone? Yeah, obviously. And what about the other features we had discussed? What about freshness? Do you think you'd prefer a fresher document to an older one? Yeah, so it's, again, a positive weight. What about things like the length of the document? Are we doing any normalization? Oh, no, it's not normalizing the length at all, it's just taking the raw length of the document. How do you think...
What kind of feature would that be, a positive one or a negative one? Yeah. Yeah, sure. So, if we take a look at our linear model and the weights it actually learned, we see that for something like description length, it learned a negative weight. So, it knows to penalize documents that have a very long description. For something like the number of links, it gave a very high positive weight, something like plus 31. So, it was able to learn and extract the fact that documents with more links are more important, and we were able to set all of this up very easily. So, yeah, that's the linear model. Now, let's try and train a tree-based model. We're using a tree-based model called LambdaMART; it is available in a library called RankLib. Let's train a very basic model. This tree-based model looks at a metric to optimize, and right now what it's trying to optimize is the ranking of the top 10 results. So, it's basically trying to put the most relevant result in the first position, the second most relevant in the second position, and so on. It's not just looking at the top 10 documents out of order: if the relevant documents end up in positions five to ten, that counts for less than if they were in positions one through five. So, we've trained this model and we've uploaded it to Solr, and let's check how it performs. We can see that the tree-based model actually performs much better than the original Solr model; you get better metrics across the board. And we can check at a query level how it's performing. So, for example, we have the LambdaMART model on the left and the original Solr model on the right. You see, it's put all the relevant results at the top, and that's typically what we want. And we didn't have to tune any of these weights manually; it was able to learn all of them, and it was able to generalize across all the documents. So, it's not just the first query: if we take any of the other queries we made, for example Apple, we see that it has tried to learn a general formula, so that it works better across the board, not just for one particular query. Typically, if you were doing this manually, it would be very hard to do. But here, all you have to give it is a dataset and a bunch of annotations marked as relevant or not, and it will automatically learn how to rank the documents. So, let's see how it performs for a query it has never actually seen before. Yeah, let's try and annotate a new query. Can you give me a query? Any query? Yeah, we don't know how it will actually perform. What? Oh, keep in mind that this is Simple Wikipedia, so it only has the most important articles. Yeah. Sweden. Sweden, okay. So, we see that we have a bunch of documents about Sweden. Let's try to mark something relevant that we see. So, Sweden is all the way down in like the 18th or 16th position here. Okay, now, this is a query that we have never seen before. Our model doesn't know about it at all; it hasn't been trained on it. So, let's just for now mark this particular document as relevant. So, yeah, we see it's there. Let's go back to our evaluation UI and, the moment of truth, let's see how it does. Whoa, so it actually got it in first place. Now, it never knew about this particular query.
And this is the first time it saw it, and it actually put it in first place, whereas the original Solr model put it all the way down in like the lower 20s. So, yeah, that's the power of learning to rank. You can also do some other cool things with it. For example, if you take a look at the parameters that this script accepts, it basically asks for the number of trees, a metric to optimize, and the number of documents to optimize that metric at. So, if you have something like a UI where a user types in a query and it automatically directs them to the first document returned by the search engine, then you basically want to optimize the precision for the first document. Now, even though we see that it's not performing as well as our other model overall, it specializes in what it does: it will always try to boost the most relevant document to the first position. It will make sure that the first document you get is always a relevant one. So, if we actually take a look at the queries and the documents, we'll find that the first document is always relevant. For example, for Apple the first document is relevant, for Berlin again the first is relevant, and so on. So, yeah, this is a basic demo of how easy learning to rank is to use, and how you can basically do away with a lot of the manual labor of tweaking the ranking function by hand. Yeah, we are open to questions. Thank you. Yeah. What's not relevant and became relevant? Yeah. So, that was, oh, yeah. So, the question is: in the UI, when we searched for Sweden, we had a bunch of documents that were marked as not relevant, and then we clicked on one and it became relevant, so what's the front end behind that? Yeah, that was something we built ourselves. What we tried to build was a basic annotation tool. So, you can give the users a UI where you can, let's say, pre-populate certain queries that you want them to evaluate. They look at the query, they get a list of documents, and they can simply click on the documents they think are relevant. May I suggest that you do a thumbs up, thumbs down, plus unmarked? And then if you do a thumbs up it becomes, say, a three, and lower when you do a thumbs down. So you can capture both directions, right? Yeah, okay, yeah, that definitely makes sense. That's wonderful, I love this. Yeah, yeah, that's true. So, oh, sorry. Yeah. So, the question is, why didn't we do any cross-validation for our machine learning models, and why didn't we mention it? So, for the purposes of this demo, we just wanted to showcase what learning to rank is capable of, and how easy it is to set up. We didn't want to get into that. Most people in the room didn't know what learning to rank was at the beginning, so the focus of the talk is not how you train a model, but how the whole framework works and how it makes relevance better. Then, if you want to know more about how to train a model, there is a course on Coursera about machine learning; you can do the course, learn about cross-validation and machine learning techniques, but that would require a whole semester course, I think. Okay, let's take the next question. Yeah?
So, in the interest of getting something going, how do you know when to get more labeled data versus tweaking your features? Okay, so the question is: how do we know when to get more training data rather than tweaking our features and the ranking function? Do you want to take it? I can take it. Yeah? So, there's no real answer for that. As the gentleman over there was saying, the problem with adding a lot of features is that the risk is that you overfit. The risk is that your model learns how to be really good on your training set, but it doesn't generalize, so at that point it's not able to predict. So, understanding whether you have too many features and you're overfitting can be done by using a validation data set that contains queries you've never seen before; you evaluate your metrics on this new data set and you check them. If the metrics drop on the validation data set, it means that your model didn't generalize: it didn't really learn how to rank the documents in general, it just memorized your training set. So, by doing that, you can understand if you have too many features, reduce the features, and then check the performance of your model again. If it's still not good, you want to add new examples. Okay. Yeah. Yeah? This is search on a document level. How easy is it to use these kinds of features and models to find parts of a document? So, the question is: right now, whatever we demonstrated was at the document level; how easy is it to extend it to parts of a document? Most of that would depend on how you've actually structured your schema. If your schema is structured so that the thing you want to return is a part of the document, you can basically create a field for that, and then you can create features based on that field, so that when you search for a query, whatever feature value is computed gets boosted for whatever you think is relevant. For example, let's say you have a part of the document called the links, and you only want to search on the links and not on the rest of the content of the document. You can have a field in the schema that is a list of links, and you can have a feature that only looks at the links, and when you're marking the documents as relevant, you only mark the documents which have the link as relevant, so it will try to pick up on the fact that you're looking at the links and not the rest of the document. If you are looking for a particular block inside the document, what you can do is split the document into blocks and then treat the problem as learning to rank, but done over the blocks of the document. So you retrieve the document, and then you have another model that will rank the blocks inside the document, and hopefully put the block that you want on top.
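As a rough sketch of that last idea, assuming a block is just a paragraph of text, the following Python pushes the ranking down to the block level: extract a couple of features per block and score them with a stand-in model. The block_features() helper and the weights in block_score() are hypothetical illustrations, not part of the Solr plugin.

```python
# Once a document is retrieved, treat its blocks (e.g. paragraphs) as the units
# to rank: featurize each block and score it with a second, block-level model.

def block_features(query: str, block: str) -> dict:
    terms = set(query.lower().split())
    words = block.lower().split()
    return {
        "matched_terms": float(sum(1 for w in words if w in terms)),
        "block_length": float(len(words)),
    }

def block_score(features: dict) -> float:
    # Stand-in for a trained model over block-level features.
    return 3.0 * features["matched_terms"] - 0.05 * features["block_length"]

def best_blocks(query: str, document_blocks: list, k: int = 3) -> list:
    scored = [(block, block_score(block_features(query, block)))
              for block in document_blocks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

blocks = ["Solr exposes a learning to rank plugin ...",
          "Unrelated paragraph about indexing throughput ...",
          "The plugin reuses the Solr query syntax for features ..."]
print(best_blocks("solr learning to rank plugin", blocks))
```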