Today, I'll be talking about how we can search across a lot of different languages on your mobile phone very efficiently. I think I was just introduced, but to reiterate: I'm from NUS CS, batch of 2018, so my junior batch has just graduated. I'm currently an iOS engineer at Sea Labs, and you can reach me at this email address.

So without further ado, here is what I will go through today. Three quick things: first, I will introduce you to full-text search in SQLite, which is a very powerful tool. After that, we will try to build our own advanced text search engine, going beyond the scope of what conventional search engines do. And finally, I will give you some tips and tricks for engineering search user experiences.

The first question that might come to you is: why do we want to search on mobile devices at all? Increasingly, security and privacy are becoming bigger concerns. A lot of messaging apps now have end-to-end encryption, and a lot of people prefer whatever they store to stay on the device itself. Searching on-device also keeps the feature working when the user is offline, and even when they are online, local results come back much faster than a round trip to a server. And also, just because we can: phones are so much more powerful nowadays. The iPhone 11 was just released, and the new A13 chip is very, very capable. I work on a product at Sea called SeaTalk, a messaging app, which is why all of these questions come up for us.

Just a small disclaimer: most of the results I present today come from my own firsthand research, and they may not be the best way of doing things. If you have a better way, feel free to reach out to me or suggest it in the Q&A. Most of the ideas in this talk are universal, but I'm using Apple's standard library for the examples because it is very complete and very useful for building mobile applications. All of this generalizes to Android as well; at Sea we also have Android and web teams, and all of the features I talk about today have already been shared with them.

Okay, so let's go straight into SQLite. I know that in NUS, people don't really talk about SQL a lot; if you have only done the very basic SQL module, this is all they teach you: write a query like SELECT ... FROM table WHERE column LIKE '%query%', with a percent wildcard on both ends, which matches everything containing that particular query. But this is very slow and inflexible. Starting from version 3.9, SQLite actually has a feature called full-text search, and the latest generation of it is FTS5. What is full-text search? As the name implies, it lets you search for a piece of text inside a large collection of documents. A lot of features come out of the box with FTS5: full-text search itself, of course; custom tokenizers to handle different languages (I will explain what tokenizers are later on); and a very simple syntax. If you want to create a table backed by FTS5, you can just issue a very simple command, CREATE VIRTUAL TABLE message USING fts5(...), and you can specify your custom tokenizer in there. The syntax for searching the text is also very simple.
You can just write SELECT ... FROM message WHERE content MATCH 'ful*'. The asterisk here means prefix search, so everything that starts with "ful" is returned as a result. There are many other operators as well: you can search by prefix, you can use '+' to require that terms appear together, you can use NEAR queries to require that terms appear close to each other, and there are boolean operators like OR and NOT. So the FTS5 query syntax is really powerful.

There are a lot of other nice things that come out of the box too. For example, FTS5 ships with a suite of default tokenizers. What are tokenizers? Tokenizers are just functions that process your text and turn it into searchable units. FTS5 comes with ascii, porter, and unicode61. The ascii tokenizer is very simple: it basically folds everything down to ASCII, so in French or other languages that put marks on top of letters, those marks are removed. The porter tokenizer stems words; those of you who have done information retrieval or machine learning might know the Porter stemmer. So, for example, "convenient" can match "convenience", because both are reduced to the same stem. And unicode61 is there to handle the many other languages out there. There are also some utility functions, for example highlight(): you can select highlight(...) where the content matches your query, and on top of the results returned to you, it surrounds each match with whatever markers you specify. Say I pass in an XML-style tag like <hl>; the results will then contain every matched term wrapped in that tag, which is very useful for rendering.

However, today is not an advertisement for FTS5, so let's talk about its trade-offs as well. What are the problems with FTS5? First of all, I'm an iOS developer, and the default SQLite version shipped with iOS does not support FTS5, so we need to bundle a custom build on top of that. The other problem is that the only column type allowed is text. If you go back a few slides to CREATE VIRTUAL TABLE: if you know SQL syntax, you usually have to specify a data type for each column, like integer or string, but here we are not allowed to specify anything. Why? Because the only supported data type is text. So if you want to sort by timestamp, or by some other arbitrary ranking, it can be very difficult to do efficiently. And for other languages: the default unicode61 tokenizer surprisingly does not support Chinese or Japanese. Say I index "nǐ hǎo, shìjiè" ("hello, world"). If I search for "shìjiè", it gives me nothing, no match found; but if I search for "nǐ hǎo", it does give me the result. Why? Because it is doing a prefix search, and because there are no spaces between the Chinese characters, it treats the whole phrase as one single word. That's why it doesn't work. Vietnamese has problems too. Take "xin chào thế giới" ("hello, world"): the "ê" with the circumflex on top will not match a plain "e", even though in Vietnamese we usually consider those two equivalent when we search. So out of the box it is not quite powerful enough for us.
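Before we get to how we work around these problems, here is a minimal sketch of the basic FTS5 usage described above. It is written in Swift using GRDB, the SQLite library that comes up later in this talk, and it assumes the linked SQLite build has FTS5 enabled; the table name and sample data are made up for illustration.

    import GRDB

    // Minimal FTS5 usage: create a virtual table, insert a row, and query it.
    // Assumes an SQLite build with FTS5 compiled in.
    let dbQueue = try DatabaseQueue(path: "/tmp/fts5-demo.sqlite")

    try dbQueue.write { db in
        // The porter tokenizer stems English words before indexing them.
        try db.execute(sql: """
            CREATE VIRTUAL TABLE IF NOT EXISTS message USING fts5(content, tokenize='porter')
            """)
        try db.execute(sql: "INSERT INTO message(content) VALUES (?)",
                       arguments: ["full text search is convenient"])
    }

    try dbQueue.read { db in
        // Prefix query: everything containing a token that starts with "ful".
        let hits = try String.fetchAll(db, sql: """
            SELECT content FROM message WHERE message MATCH ?
            """, arguments: ["ful*"])
        print(hits)   // ["full text search is convenient"]

        // highlight() wraps each matched token in the markers you pass in.
        let marked = try String.fetchAll(db, sql: """
            SELECT highlight(message, 0, '<hl>', '</hl>')
            FROM message WHERE message MATCH ?
            """, arguments: ["search"])
        print(marked) // ["full text <hl>search</hl> is convenient"]
    }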
So those are the challenges we face with FTS5. But that doesn't stop us: there are a lot of cool things we can do on top of it. Before I talk about how we address the challenges, though, let's dig into tokenizing a little more. Let's say: how many tokens are there in this text? Can anyone tell me, in this particular sentence, how many tokens there are? Yes, it's nine; each of these words is one token. How about this one? It's actually eight, because the "my" token and the "is" token each appear twice, and repeated occurrences count as the same token. When we tokenize a string, we only record which tokens are present and where they occur.

So how do we split text into tokens? Do we simply split on whitespace with a regular expression? As with all things, it's not so simple; let's dig deeper into the problem. What is a string? Has anyone been taught in school what a string is? Actually, the honest answer is that it depends on which programming language or framework you are talking about; each has a different definition of a string. So let's take a look at this string: what is its length? If you ask different languages, each of them will give you a different answer. Let me fire up a very simple demo (and let me increase the font size a little; is that big enough for everyone?).

Let's ask Node first. The moment I paste the string into the terminal, you can already see something fishy going on. If I ask for the length of this string, it tells me 17. I also wrote a very simple C program, string_length.c, that just declares the string and prints its length; run it and it returns 31. Let's look at Java: String.length() says 17, same as our friend Node. And finally Swift. What does Swift say? Swift says 7, which is the correct answer if you look at it from a human perspective: there are seven human-perceived characters in this string.

So each of these languages gives us a very different answer, and conveniently Swift gives us the "correct" one, which is what I will base the rest of this talk on. Why seven, and why do the other languages return different results? It's because Swift characters are grapheme clusters. That's a loaded term, but a grapheme cluster is basically a human-perceived character. Other languages count UTF-16 code units, and C here counts the number of UTF-8 bytes, so they give different results.
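To make the point concrete, here is a small Swift sketch of the same comparison. The example string is my own; any string with multi-code-unit characters (emoji, accented letters, CJK) shows the same effect.

    import Foundation

    // A short string mixing plain letters, an accented letter,
    // an emoji and CJK characters.
    let s = "héllo👩‍💻世界"

    print(s.count)        // grapheme clusters: what a human would count
    print(s.utf16.count)  // UTF-16 code units: what JavaScript's .length and Java's String.length() report
    print(s.utf8.count)   // UTF-8 bytes: what C's strlen sees for a UTF-8 encoded string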
So let's now try to decompose this string. The sentence on the slide mixes four languages: it starts with Chinese, "nǐ hǎo, wǒ jiào ..." ("hello, my name is ..."), followed by my name in Latin script, then the Japanese "tōkunaijingu ga suki desu" ("I like tokenizing"), and finally the Russian "ya lyublyu Swift" ("I love Swift"). So there are four scripts in this text: Chinese, Latin, Japanese, and Cyrillic. Let's talk about how we can tokenize it.

For Chinese, there is one character per token and there are no spaces between them. For Latin-based languages like English, and for Cyrillic like Russian, a token has multiple characters and tokens are mostly separated by spaces or punctuation. The Japanese token here is written in the katakana alphabet; it says "tokenizing", a word borrowed from English, so ideally it should be treated as one single token. However, the rest of the Japanese, "ga suki desu", is kanji and hiragana and should be treated similarly to Chinese. At this point you might ask why we support so many languages. It's because we are in a very unique region here in Southeast Asia: we have a lot of different languages around us. It's not like in the US, where you can often get away with a single language.

So let's talk about how we are going to tokenize this. Here is a brief outline, which I'll go through step by step; note that this strategy does not work for all languages, only for the ones I showed you on the previous slide. The first step is to separate potentially single-character tokens from the rest. You can see this rather loaded regular expression: it checks for all Chinese, Japanese and Korean characters in the text, anything belonging to the Han, Hiragana, Katakana or Hangul groups of Unicode blocks, and each such character is captured as one token. For the rest of the text, we strip diacritics, convert to lowercase (or uppercase, depending on your preference), and split by whether each character is a letter.

That is quite a lot, so let's go through it one by one. First, we split out everything that is Chinese, Japanese or Korean; each of those characters is marked and regarded as one token. For the rest, we strip diacritics. What are diacritics? If you look at Vietnamese or French text, you see those squiggly marks above or below the letters; stripping diacritics means removing all of them. The next step is to convert to lowercase or uppercase, whichever you prefer; here I use lowercase. Finally, we split by whether each character is a letter or not. Why can't we simply split on spaces and punctuation? Because "space" and "punctuation" are too varied: different languages have different spaces and different punctuation. The Japanese space is different from the English space, and the Japanese comma is different from the English comma. But even if we knew the full set of spaces and punctuation, we still shouldn't split only on them. Why? Because of emoji: if you split only on spaces and punctuation, "la" followed immediately by a smiley face would be treated as one token. So we have to go through each character and ask: are you a letter or not? If you are a letter, I add you to the current token; if not, I close the token and move on.
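Here is a minimal Swift sketch of that strategy, using the Character.isLetter property that is discussed next and String.folding for the diacritics. The function and variable names are my own illustration, not SeaTalk's actual code.

    import Foundation

    // A minimal sketch of the tokenizing strategy described above.
    func tokenize(_ text: String) -> [String] {
        // Characters from the Han, Hiragana, Katakana and Hangul blocks
        // become single-character tokens.
        let cjk = try! NSRegularExpression(
            pattern: "[\\p{Han}\\p{Hiragana}\\p{Katakana}\\p{Hangul}]"
        )

        // Strip diacritics and lowercase everything else.
        let folded = text
            .folding(options: [.diacriticInsensitive], locale: nil)
            .lowercased()

        var tokens: [String] = []
        var current = ""
        for ch in folded {
            let s = String(ch)
            let isCJK = cjk.firstMatch(
                in: s, range: NSRange(s.startIndex..., in: s)) != nil
            if isCJK {
                // Close any pending token, then emit the CJK character on its own.
                if !current.isEmpty { tokens.append(current); current = "" }
                tokens.append(s)
            } else if ch.isLetter {
                // Letters (in any script) extend the current token.
                current.append(ch)
            } else {
                // Anything else (spaces, punctuation, emoji) ends the token.
                if !current.isEmpty { tokens.append(current); current = "" }
            }
        }
        if !current.isEmpty { tokens.append(current) }
        return tokens
    }

    // Example: mixed Chinese, accented Latin and an emoji.
    print(tokenize("Ni hao 世界, xin chào😀"))
    // ["ni", "hao", "世", "界", "xin", "chao"]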
In the Apple standard library there is a very powerful property called isLetter, which was introduced quite recently. Even if you ask it about Arabic characters or Russian characters, it will correctly tell you "yes, this is a letter", and for anything else it will say no. That is one reason I really like working with the iOS libraries.

So let's talk about how we can bring this tokenizing behaviour into FTS5. SQLite supports what it calls a custom tokenizer: you register a custom function with SQLite and tell it, "don't use your default behaviour, use this one instead." It is a C API, which you can see from all the const char pointers: there is a constructor, a destructor and the tokenize function. You don't have to care about most of it; the important part is the tokenize function. It receives the full text to index and its length, plus a callback that you are supposed to call to report each token back to FTS5. The callback takes the token string, its length, and the start and end offsets of the token in the original string. You just call this callback once per token, and that way you can tokenize the text however you like. Because SQLite is a C API, on Android you go through JNI, while Swift and Objective-C can interface with C natively, so on iOS we don't really have a problem. In our app we actually use the framework GRDB, which abstracts a lot of this into very nice Swift functions; if you ever need to do this in your own project, take a look at GRDB.

The last thing I want to talk about is sorting, because the columns in the virtual table are all of text type and we cannot sort them fast enough. What can we do? FTS5 has an extra feature called the external content table. What it does is link your FTS5 virtual table to an ordinary SQLite table. You create the two tables together and then link them using triggers, so every time you insert a row into the external content table, it also gets indexed in the FTS5 table. The two are linked by a special default column called rowid; this is not something we create, it is created for us by SQLite, and it is how the two tables know which rows correspond to each other. With this external content table in place, you can finally query for your results. This is what people usually do: select from the external content table joined with the FTS5 table on equal rowids, and filter where the content matches the query. However, this is very slow, around 0.3 seconds for 100,000 matches of the query, which is very slow by our app's standards. Imagine having to wait noticeably before you receive your results. There is actually a better way: you have to force the index on the table. Here we are trying to sort all the messages by timestamp, because I want the latest message on top and the oldest at the bottom.
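Here is a rough sketch of that external content setup, again as raw SQL executed through GRDB. The table and column names are illustrative, a real setup also needs delete and update triggers to keep the index in sync, and this shows the slow query discussed above.

    import GRDB

    // External content table setup: an ordinary table with typed columns,
    // an FTS5 table indexing it, and a trigger keeping them in sync.
    let dbQueue = try DatabaseQueue(path: "/tmp/chat.sqlite")

    try dbQueue.write { db in
        try db.execute(sql: """
            CREATE TABLE IF NOT EXISTS message(
                id        INTEGER PRIMARY KEY,
                content   TEXT,
                timestamp INTEGER
            )
            """)
        try db.execute(sql: """
            CREATE INDEX IF NOT EXISTS message_by_timestamp ON message(timestamp)
            """)
        try db.execute(sql: """
            CREATE VIRTUAL TABLE IF NOT EXISTS message_fts
            USING fts5(content, content='message', content_rowid='id')
            """)
        // Every row inserted into `message` is also indexed in `message_fts`.
        try db.execute(sql: """
            CREATE TRIGGER IF NOT EXISTS message_after_insert
            AFTER INSERT ON message BEGIN
                INSERT INTO message_fts(rowid, content) VALUES (new.id, new.content);
            END
            """)
    }

    // The "usual" query: join on rowid and sort by timestamp.
    // This is the slow version; the fix is described next.
    let rows = try dbQueue.read { db in
        try Row.fetchAll(db, sql: """
            SELECT message.id, message.content, message.timestamp
            FROM message_fts
            JOIN message ON message.id = message_fts.rowid
            WHERE message_fts MATCH ?
            ORDER BY message.timestamp DESC
            """, arguments: ["ful*"])
    }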
To force the index, you can just add INDEXED BY with the index name, and the query automatically becomes about three times faster. Why is that? In SQLite, as in other SQL databases, there is the query plan: you can prefix your query with EXPLAIN QUERY PLAN to ask how it is going to perform the query, and it will tell you the steps it goes through. Without forcing the index, it scans the virtual message table first, then for each hit it looks up the row with the same rowid, and then it builds a temporary B-tree in memory to order the results, which is very slow. If you force the index, it sorts using the external table's index first, then does a subquery against the FTS table, and then returns the results; that is why it is so much faster. In our research we also tried other optimizations, but they turned out slower than this one, so we didn't use them. So yes, that's a lot of information about SQLite; I hope you remember some of it, and if not, just come and talk to me later.

Okay, let's move on to the second section of today: building a custom search engine. What are the challenges we face at SeaTalk that prompted us to build a custom search engine in the first place? First, all messages are stored securely on the device only, and messages are wiped from our servers as soon as they are delivered to all the intended recipients. That is why we can only do all of this on the device. A few months ago we shipped a new feature called Global Search in SeaTalk. Global Search matches everything within the app: you can search for messages, contacts, groups, everything. The second requirement is that we have to highlight partial matches: if I search for "v", only the "v" should be highlighted, not the whole "vu". It's actually harder than it sounds. And there are different matching strategies for single-token and multi-token queries. If you type one single token, say "vu", it matches as a substring; but if you type two tokens, say "ang vu", each token only matches by prefix. Don't ask me why; ask my product manager. Very specific requirements. It's a little small on the slide, but here the query is "ang": in the single-token case it matches as a substring, and in the multi-token case it matches only as a prefix. Okay, let's skip the details and move on.

A very big feature that came out just two months ago, I think, was pinyin search. What is pinyin search? In other Sea offices, some colleagues put only their Chinese name in the app, and other colleagues in the same office don't speak Chinese, so they have no way of knowing how to pronounce that person's name; they may know who the person is, but not which Chinese characters the name uses. So one of the requested features was: given a pinyin pattern, search for the corresponding Chinese name. As you can see here, I have obscured my colleague's name a little, but if you type the pinyin of part of the name, it brings this person up. That is the requirement, and it is not as simple as it looks.
Let's explore how we can address this problem. In Swift we have a very nice API called applyingTransform, available in the standard library. You can apply the Mandarin-to-Latin transform to convert a Chinese string into Latin text, and afterwards you can strip the diacritics as well, which turns it into a perfectly readable pinyin string. On Android you can use a library called TinyPinyin; that is what the Android team in our company uses. As an aside, string transforms work very well for other scripts too; you can transliterate other languages the same way.

From here on it gets a little technical, so bear with me. Inside our search we define an object called a token. As I said, each token object represents a unique token in the original string, together with all the alternate representations of that token. What are alternate representations? Here is some Swift code; I hope you can follow it. I have a struct called Token; it holds the original representation as a string, plus all the alternate forms of the token, and each alternate form is stored in a dictionary mapping it to a match type: it can either match another string partially, or only match as a whole. For example, take the token "big", which is English text, so it may partially match another string: "big" matches "bigger" and "ambiguous". But here is a Chinese token, "tā"; when you transliterate it, you should not treat it as a substring search, only as a whole match. Imagine typing a Chinese character and having it match a substring of some English colleague's name; that is why we define the behaviour this way. Currently, the way we match tokens in our app is quite dumb: it is a naive O(n × k × m) token-matching algorithm. Why am I telling you this when we are talking about efficient matching? I'll come back to how we can improve it later in the slides; let's just solve the problem in its entirety first.
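Here is a rough Swift sketch of the two ideas just described: the Mandarin-to-Latin transform and a Token type that carries alternate forms with their match types. The type and property names are my own guesses at what such a struct might look like, not SeaTalk's actual definitions.

    import Foundation

    // How an alternate form of a token is allowed to match.
    enum MatchType {
        case partial   // substring matching is acceptable (e.g. Latin text)
        case whole     // the alternate form must match the query token as a whole
    }

    // One unique token from the original string plus its alternate representations.
    struct Token {
        let original: String
        // Alternate forms (pinyin, diacritic-stripped, lowercased, ...)
        // mapped to how each one is allowed to match.
        let alternates: [String: MatchType]
    }

    // Building a pinyin alternate form with the transforms mentioned above.
    let pinyin = "他"
        .applyingTransform(.mandarinToLatin, reverse: false)?
        .applyingTransform(.stripDiacritics, reverse: false) ?? ""
    print(pinyin)   // "ta"

    // The Chinese token only matches its pinyin form as a whole word,
    // while an English token is allowed to match as a substring.
    let chineseToken = Token(original: "他", alternates: [pinyin: .whole])
    let englishToken = Token(original: "big", alternates: ["big": .partial])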
The next problem with this tokenization is that if the user types pinyin without spaces, there is no way to know where to split it into Chinese syllables. For example, I used to have a friend whose name could be read as "Yan Gen", but a lot of people called him "Yang En": written without spaces, you simply cannot tell where the split is. The same applies to the query: typed without spaces, we don't know where to split it into tokens so that it lines up with the text. This was one of the biggest problems we had when building this feature.

What I will propose is something we call reverse prefix pinyin matching. The text now becomes the pattern: we reverse the roles of the two entities. If you are trying to match a Chinese name against a pinyin query, we turn it around, so the name becomes the thing we walk through and the query becomes the thing we consume. All the steps are on the slide, but let me explain them one by one. Say this is our text (the name) and this is our query. We convert the first character of the name into its Latin form and find that it shares a prefix with the query. So we chop that part off and recursively match the rest, character by character, down to the last one. When we reach the base case, which is when the query string is empty, we know it is a match. That is how we do reverse prefix pinyin matching.

Like the previous algorithm, this is very inefficient: if you have N texts to match, each text has K Chinese characters, and each string comparison is O(M) at best, then the whole thing is O(N × K × M). We have a few improvement strategies in mind, but as with all software engineering, I have to ask my product manager whether I have time to do them. "I'm going to speed this up by 1.25 times, will you let me do it?" "No, we have other, more important features." So here are just some suggestions. The first is to build a trie that points each token to its matching candidates. Every time the user opens search, you do the indexing in the background: you build a trie that points each token to the intended matching participants, the users inside the chat. Then, when the query comes in, you just walk down the trie and it points you to all the matching candidates. That works, but it is very space-consuming on a large database of user names. Another strategy is simply to dump all of this into an SQLite database and let the database do the work.

There are a few drawbacks to this whole approach, though. Bear in mind that characters with several possible pronunciations cannot be transliterated reliably: if you feed such a name into Swift's transform, it picks one reading, which can be the wrong one for that particular person. Another drawback is that it treats Japanese names as if they were Chinese. For example, my favourite character from Detective Conan is Shinichi Kudo; read as Chinese, his name becomes "Gongteng Xinyi", which is a completely different name. It is also difficult to handle Japanese names where the same character has multiple readings: the name on the left is Yamada and the one on the right is Tanaka, and the 田 character is pronounced "da" or "ta" depending on where it sits in the name. So the best approach is really not to assume the pronunciation at all, and to let users put an actual Latin or alternative name into the database where possible. Otherwise, it's a fun and challenging problem to work on.
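To make the recursion concrete, here is a small Swift sketch of reverse prefix pinyin matching as I understood it from the description above. The helper names are mine, the romanisation reuses the transforms shown earlier, and real code would also have to handle mixed Latin and Chinese names, cache the transforms, and so on.

    import Foundation

    // Romanise a single Chinese character: "陈" -> "chen".
    // Characters with multiple readings get whichever reading the transform picks.
    func romanize(_ ch: Character) -> String {
        let latin = String(ch)
            .applyingTransform(.mandarinToLatin, reverse: false)?
            .applyingTransform(.stripDiacritics, reverse: false) ?? String(ch)
        return latin.replacingOccurrences(of: " ", with: "").lowercased()
    }

    // Does `query` (pinyin typed with no spaces, e.g. "chensh") match `name`?
    // The name plays the role of the pattern: walk it character by character,
    // chopping matched syllables off the query until the query is empty.
    func matchesReversePrefixPinyin(_ name: Substring, _ query: Substring) -> Bool {
        if query.isEmpty { return true }              // base case: whole query consumed
        guard let first = name.first else { return false }
        let syllable = romanize(first)
        if query.hasPrefix(syllable) {
            // The query contains this whole syllable; move on to the next character.
            return matchesReversePrefixPinyin(name.dropFirst(),
                                              query.dropFirst(syllable.count))
        }
        // Otherwise the query may end in the middle of this syllable.
        return syllable.hasPrefix(query)
    }

    // Usage: "chensh" should match a name whose pinyin is "chen shan".
    let name = "陈山"
    print(matchesReversePrefixPinyin(name[...], "chensh"))  // true
    print(matchesReversePrefixPinyin(name[...], "shan"))    // false: matching starts from the first character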
All right, with all of these O(n × k × m) algorithms, why not try to improve the performance? Now, in school, when you talk about performance, you talk a lot about big-O and about making things as close to linear, or even constant time, as possible. Engineering is a bit different: we are more concerned with UX, with how users perceive the search performance, and not always with how the search is actually executed in the code. So let's look at a few expectations users have when they perform a search. First, users nowadays are spoiled by the speed and accuracy of Google, so they expect the result they want to be among the top five returned, and to get it very fast. Second, people would rather change the query than go to the next page: if you cannot find what you need, you type a different query instead of scrolling on to the next page, or maybe that's just me. And most people will not go beyond page two; there is a popular saying that everything that lives on page two is dead, because nobody ever goes there.

These expectations are actually good for us, because they mean we can return results early. If users only expect the first five to ten results to be the best ones, we can always return those first, and even if more results are eventually needed, we can keep computing them in the background and report them later as the user lazily scrolls down. That is one way we can improve things.

Let's talk about a few more UX aspects of search. When the user first enters search, we need to show an initial state: for an app like SeaTalk, we have to tell users what they can search for, so we usually show an illustration saying "these are the things you can look for". Search also has to be very responsive: the moment you type your query, users expect results to come back quickly; for us, we target roughly 0.5 seconds. We also have to indicate whether there are more results, for example whether more results are available at the bottom, whether we are still loading, or an empty state if we cannot find anything at all.

So let me introduce what I call a paginated search state. This is not specific to any language; it is a general idea of what we do in our app to improve the search experience. Basically, we keep an array of the results we have so far, a Boolean flag saying whether there are more results, another Boolean flag saying whether a search is currently in progress, and a single function called fetchMoreResults. When you call it, it runs whatever your search operation is, and whenever results become available it reports them immediately to the main UI. So instead of walking through all N entries of the text, you walk only until you find the first five matches, report those on the main queue, and then carry on with the rest of the search in the background, reporting again when you are done. The details are quite technical, so I won't go through all of them here.
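Here is a minimal Swift sketch of what such a paginated search state could look like. The protocol name, the property names and the threading are my own illustration of the idea, not the actual SeaTalk implementation; in the real app the heavy part of the search would run on a background queue.

    import Foundation

    // A minimal sketch of the "paginated search state" idea described above.
    protocol PaginatedSearchState {
        associatedtype Result
        var results: [Result] { get }        // results fetched so far
        var hasMoreResults: Bool { get }     // is there anything left to fetch?
        var isSearching: Bool { get }        // is a fetch currently in progress?
        // Fetch the next page and report it back on the main queue.
        mutating func fetchMoreResults(_ onPage: @escaping ([Result]) -> Void)
    }

    // A toy implementation that pages through an in-memory array.
    struct MessageSearchState: PaginatedSearchState {
        let allMessages: [String]
        let query: String
        let pageSize = 5

        private(set) var results: [String] = []
        private(set) var hasMoreResults = true
        private(set) var isSearching = false
        private var nextIndex = 0

        mutating func fetchMoreResults(_ onPage: @escaping ([String]) -> Void) {
            isSearching = true
            var page: [String] = []
            // Walk only until we have one page's worth of matches,
            // instead of scanning the whole data set up front.
            while nextIndex < allMessages.count && page.count < pageSize {
                let message = allMessages[nextIndex]
                nextIndex += 1
                if message.localizedCaseInsensitiveContains(query) {
                    page.append(message)
                }
            }
            hasMoreResults = nextIndex < allMessages.count
            isSearching = false
            results.append(contentsOf: page)
            // Report the page to the UI on the main queue.
            DispatchQueue.main.async { onPage(page) }
        }
    }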
So yeah, that's about it for today; I hope I didn't rush through everything too fast. Now it's time for Q&A, so if you have any questions, I can address them right now.

[On the FTS5 sorting optimization:] FTS5 itself is actually already very efficient. The problem is that on top of the search results, we also want to sort them by a certain criterion; in my messaging app, I want to sort the messages by timestamp. Because the timestamp in the virtual table is text, I need to turn it into an integer somehow, and that is why I need the external table. The improvement itself was very simple: I only forced the index on that query. But that's a good question, because we did try a lot of different strategies, so let me show you another one. These are slides from a sharing I did at my own company; I have given this talk quite a few times. When we try to improve something like this, we don't just poke around in the dark: we look at the query plan. You type EXPLAIN QUERY PLAN in front of your query in the SQL console and it tells you how it is going to execute the search, and then you study its strategy: okay, it scans this table first, matches it against the other table, and then it constructs a temporary B-tree. That last step is notorious, because building a B-tree, a sorting tree, in memory is very slow, so we tried to eliminate it by making the query rely on an index instead. The external table already had an index on it, so if the sort can use that index first, SQLite never has to build the temporary B-tree at all.

We actually tried to push this further. Let me pull up another deck I presented internally a while ago (sorry, this question is taking a bit long). Besides forcing the index, we tried prefixing the timestamp with zeros: a bit of a cheat that avoids the external table entirely by zero-padding the timestamps so that sorting the text behaves like sorting the integers, comparing character by character. But this was also very slow, so we didn't want to do it. Then we explored another FTS5 feature called auxiliary functions. Traditionally they are used to rank how well a query fits a piece of text, but you can define your own custom auxiliary function through the C API, and I used one to handle the timestamp: I convert the text into a UInt32 and subtract it from the maximum value, because I want the latest message on top, so I need to reverse the order. In my initial trials this was very fast, 0.04 seconds, because there is only one step in the query plan: scan the table. At first it seemed like a miracle; this was actually our first prototype. However, we realized later that our test database was flawed: when I generated the 500,000 test messages, I generated them batch by batch, about 1,000 messages at a time, so large groups of them shared the same timestamp, and each distinct timestamp value only had to be computed once. Because of that, we wrongly credited this function with being very fast, when really it just had less work to do than the other methods.
It turns out that when we created a new test database with strictly increasing timestamps, forcing the index was actually much faster than the auxiliary-function approach, so after all this research, forcing the index on the external table is what we settled on. To come back to the question in general: just look at the query plan. If the query plan is doing something really silly, that is the part you should be improving.

Any other questions? [On which languages I speak:] I can speak Vietnamese, of course, and English. I speak a little bit of French ("je parle un peu français"), a little bit of Russian, and some Japanese and Chinese; honestly, I've lost count. It is actually a good thing, because for all of these languages I need to know how they should be indexed. In our company we also have Thai customers, and no one in our office is Thai, so we don't actually know whether our search on Thai text is correct; but so far we have not heard complaints from Thai users, so I think it's okay. Luckily we do have an Indian colleague, and also a Pakistani colleague who happens to know Greek. So currently our search algorithm works on most of these languages.

[On diacritics:] Yes, sometimes we ignore the accents, but it really depends on the users' expectations of the search. In this particular region, for the languages we deal with, we strip out the diacritics. In some other languages, though, stripping the diacritics changes the meaning entirely, so in SeaTalk, since we strip everything, we can return wrong results in those cases. I think it really depends on your user base: if you want to support many different languages, say you are building an app in Europe and want to support all the European languages, you may need a lot of special cases. Here, for our particular region, we found a minimal set of algorithms that processes the text such that we can search it correctly. So I'd say it is case by case.

[On the paginated search state:] This is more of an engineering construct; it is a protocol. Whatever you are trying to do, you just conform to this protocol, and how you produce the results is up to you. Sorry, let me show that slide again. In this search state we only care about the current state of the search; how you arrive at that state is entirely up to you, so it says nothing about the accuracy of the search. It just says that if you already have an accurate search algorithm, you can use this structure so that as soon as you have found your first five results you return them immediately, and then keep searching for more. I didn't go through this part in much detail because it would take quite a long time; if you want to know more, talk to me after the talk. I hope that answers your question.

Any more questions? No? Okay, thank you very much for coming today. By the way, a shameless plug: Sea is always hiring, for internships and everything else, so just come to me if you are interested. We need more of you; my team is entirely NUS, by the way.