Thanks first to the organizers for putting on FOSSASIA in person again. Big thanks to Hong Phuc and the whole team. I'm very happy to be back after four years. For most of that time we were not allowed to leave Japan; well, actually, we were allowed to leave Japan, but we were not allowed to enter it again, so leaving wasn't really an option for us. So I'm quite happy to be here.

This talk is on the AI track, so it will be a bit less about open source. But I think there's a reason for that: open source is so prominent in the whole industry that you don't even have to talk about it anymore. Everyone who works with AI uses open source AI libraries nowadays, right? That is the big advantage.

As Maya already said, we have two talks. This one is about the AI part, but don't worry, we don't have a lot of mathematics slides, so don't run away. The second part is about the MLOps and DevOps side, and how to get an AI system running in a rather big environment, as you will see.

Here is the plan: we will quickly introduce ourselves, then explain what Mercari is, because I guess most of you don't know it; then the state of search at Mercari; then a bit of technical material on how to improve search results and how to re-rank; and finally the key takeaways.

Let's start with the introductions. This is Alexander. He joined Mercari a bit more than two years ago, and he has broad experience from all kinds of famous places. He is actually the one who brought me to Mercari, because he moved very close to where I live in Japan. We call ourselves the inaka group, because we live out in the countryside, on the other side, not in Tokyo. So thanks to him, I joined Mercari a bit later. This photo was sent to me by a colleague while I was complaining about doing Google Slides, because I'm an old-style guy: I use LaTeX and produce PDFs. He said, "this is how I imagine you when you complain about Google Slides." I thought that was quite fitting.

Okay, so first about Mercari. It was nice that Hong Phuc brought up sustainability, because Mercari was founded for exactly that reason: the idea of creating what Mercari calls a circular economy. It's about reusing things and trying to be more sustainable. That was the original idea of the founder ten years ago. So the company is ten years old and now also has offices in the UK and the US. We have lots of other businesses going on, but at the core Mercari is a consumer-to-consumer marketplace: people sell their stuff to other users. Very simple.

What you see here is more or less how the application looks now. Sorry, it is all in Japanese, because it is a Japanese-only market; there is a US Mercari application that is a bit different, but since we are from the Japanese side, you will have to bear with this. The main way to interact with the application is via search: you are searching for stuff. I often buy things for my daughter, like ski boots that have been used once or twice; I don't mind that. This core search functionality is provided by Elasticsearch, another open source project; I guess most people here have heard of it. Elasticsearch already provides an excellent way to do document retrieval, and that was the basic use.
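To make that first-phase retrieval concrete, here is a minimal, hedged sketch of the kind of full-text query Elasticsearch serves in such a setup. The index name and fields ("listings", "title", "description") are invented for illustration, not Mercari's actual schema:

```python
# Minimal first-phase retrieval sketch with the Elasticsearch Python client
# (8.x style). Index name and field names are invented for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="listings",                          # hypothetical index
    query={
        "multi_match": {
            "query": "ski boots",              # the user's search terms
            "fields": ["title^2", "description"],  # weight title higher
        }
    },
    size=100,  # candidate set handed to a later re-ranking phase
)
hits = response["hits"]["hits"]  # BM25-ranked documents, best first
```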
A few numbers to give you an idea of the scale: about 150 billion yen, roughly 1 billion US dollars, in net sales per year; 20 million monthly users; hundreds of millions of active listings; and thousands and thousands of search requests per second. That is the dimension we are speaking about here.

When Alex joined in April 2021, the state of search was just the basics: we threw the search query, with a few tricks, at Elasticsearch, got results back, and displayed them to the user. That works, and actually it works quite nicely, but there is room for improvement. And if you are in this industry, in the area of search, there are a lot of techniques to improve search results with machine learning.

What we are trying to improve on is this plain text-based retrieval. Elasticsearch doesn't do anything special there: it takes the query tokens, the single words, and returns the best-matching documents, which works quite nicely. But we wanted to do better than that, to go beyond it, and that is called re-ranking. So we looked into the state of the art of re-ranking — I'll explain shortly what re-ranking is, so don't worry — and also into how it can be improved over time. This is not a process you run once and then you're finished. That would be nice, but there are always further improvements.

So what is re-ranking? Here is a simple image: a search result for sweatpants — training trousers, searched in Japanese. What you actually want is that the more relevant items, those with a higher probability of being sold, end up higher in the list. Because in the end, if an item a user would buy sits on the sixth results page, they will probably never find it. So it is better to move it up. That is basically what re-ranking does.

More abstractly: this is what you get from Elasticsearch, from your basic text search, and what you want is to increase relevance. Say listing two is more relevant to the user who is currently searching; then it should move up. And what is the reason for all this? Of course: increasing revenue. We want to sell more, and we want more people to use the platform. That is what we are aiming at.
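As a rough illustration of where re-ranking slots into the pipeline, here is a hedged sketch of the second phase: take the candidates from the first-phase text search and re-order them with a learned scoring function. The `score_listing` model and the user-context fields are placeholders, not Mercari's actual system:

```python
# Sketch of the second phase: re-order first-phase hits with a learned
# scoring function. `score_listing` stands in for any trained model; the
# user-context fields are invented for illustration.

def rerank(candidates, user_context, score_listing):
    """Sort candidates by descending model score instead of text match."""
    scored = [(score_listing(hit, user_context), hit) for hit in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in scored]

# Usage idea: candidates come from Elasticsearch; the model can mix in
# personalization signals such as the user's recent activity.
# page = rerank(hits, {"recent_clicks": [...]}, score_listing=my_model)
```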
Now the basic setup. We have the Mercari application here at the center, and here is the index with Elasticsearch; that was all already in place in 2021. The question was: how can we improve on top of this by adding something that takes the results from Elasticsearch and re-orders them in some way using machine learning? That is what was developed over the last two years. And with that, I'll pass over to Alex. Do you have a microphone?

Hello, hello. Yeah, thank you. Norbert, first of all, thank you. I will try to cover the ML side of things. Just to clarify what Norbert said previously: the later versions of Elasticsearch are fairly flexible. You can incorporate some ML models and do fun stuff at indexing time, for example natural language processing. But it is rather limited when you want to integrate other signals that would add personalization to the search results, for example user activity. So that is the previous architecture diagram; I will get back to this.

This is a pretty common setup, very much simplified, seen from a very high level. You have the first phase, where you recall results from the index — Solr, Elasticsearch, something else. Then you have a second phase that takes the results from the first phase and integrates other signals that add personalization. Recommendation systems often work like that: if you use Amazon or Netflix, that is how their recommendation algorithms work. When they recommend something, they add more signals to the search results or recommendation list: what you did yesterday, what you did in the past, your browsing history, what users in your area or of your age with similar interests are doing.

So we decided that yes, we need some sort of machine learning approach. But which one? There is an area in the information retrieval field called learning to rank: basically a set of supervised machine learning algorithms that help you add a relevance aspect to the search results — relevance as it pertains to the user who is searching.

How to choose an algorithm, and how to apply it? There are many algorithms available, and luckily for us there are open source frameworks that already provide implementations, so you don't have to write them from scratch. These algorithms approach the ranking problem differently. Some consider each document independently of the others — how relevant is this document to the query — which is called point-wise. In the pair-wise approach, documents are compared in pairs. And in the list-wise approach, the whole result set returned from the first-phase retrieval is evaluated together in terms of relevance.

We went with the TensorFlow Ranking framework, a TensorFlow module that sits on top of TensorFlow core. The reason is that Mercari is rather TensorFlow-oriented, so it was the more natural choice for us. But note that this is not the only framework out there; we just decided to give it a go. It is backed by Google, and there is activity around its GitHub repository, so we decided to check it out.
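The talk names TensorFlow Ranking but does not show code, so here is a minimal, hedged sketch of a list-wise Keras model in that framework. The list size, feature count, and layer sizes are invented for illustration; the point is that one shared network scores every candidate and a list-wise loss compares the whole result list at once:

```python
# Minimal list-wise learning-to-rank sketch with TensorFlow Ranking.
# Shapes and sizes are illustrative, not Mercari's real setup.
import tensorflow as tf
import tensorflow_ranking as tfr

LIST_SIZE = 100   # candidates per query from first-phase retrieval
NUM_FEATURES = 8  # e.g. price, freshness, text-match score, ...

# Score each document in the list with a small shared feed-forward net.
inputs = tf.keras.Input(shape=(LIST_SIZE, NUM_FEATURES))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(32, activation="relu")(x)
scores = tf.keras.layers.Dense(1)(x)   # [batch, list, 1]
scores = tf.squeeze(scores, axis=-1)   # [batch, list]

model = tf.keras.Model(inputs, scores)
model.compile(
    optimizer="adam",
    loss=tfr.keras.losses.SoftmaxLoss(),           # list-wise ranking loss
    metrics=[tfr.keras.metrics.NDCGMetric(topn=10)],
)
# Training data: features [batch, LIST_SIZE, NUM_FEATURES] plus graded
# relevance labels [batch, LIST_SIZE]; padded slots carry label -1,
# which TF-Ranking losses treat as invalid.
```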
Right, so how to start? We took an iterative approach: we created a simple model with a set of simple features, hoping to progress iteratively and see how our efforts help search relevance at Mercari.

As I mentioned, we use a supervised machine learning approach, which means we need to label our data. So how do we label it, and what is the label signal? The most obvious one is the click: when you search for something and want to preview it or show interest, you click. But there is a problem with that. Clicks are noisy: human users just click on stuff, and a click does not mean that something is relevant. The opposite is also true: the absence of a click does not mean the item is not relevant. Clicks are also biased: users tend to click on the top results much more than on results further down.

For example, if you like monkeys and you search for monkeys and get back 120 documents about monkeys, you would normally see the majority of the clicks on, let's say, the top 20 or 30 results. That means relevant results at the bottom never get clicked, because most of the clicks land at the top. So when you generate a dataset for machine learning, the labels are biased, which leads to problems like position bias and selection bias in the dataset, and to a loop where you constantly retrain the model on a biased labeled dataset.

Also, depending on your application's business domain, clicks may not be a good enough signal on their own. They may be a good signal in web search: when Google presents search results and the user clicks on one, it can be considered relevant. But in a C2C marketplace, clicks do not always lead to a purchase. Users click and preview a lot, but that does not mean they will buy what they clicked on. So when we label the data, a simple click is not good enough; we need other signals that teach the machine learning algorithm what the model should learn.

As I mentioned, clicks are binary labels — clicked or not, relevant or not — and that is not good enough. So we adopted an approach of graded relevance labels, meaning we incorporate other business events when we compute the label. If a user clicked on something, made a comment, liked an item, started a purchase process, and then purchased, this journey of user events through the application makes a good label. This approach is not our invention; other companies model it the same way. Depending on your business domain, you have to adapt your labeling strategy.

Of course, as a general statement in machine learning, the output you get corresponds to the data you feed the algorithm, so having good features is essential. Apart from deciding what the label should be, we also experimented with a number of different features while training the model.

To touch a little on metrics and what I said earlier about business domains: there is a very common metric in the information retrieval domain called NDCG, normalized discounted cumulative gain. It measures the quality of your ranking: a high NDCG score means users see results that are more relevant to them, so they click high up in the search results, which is what we want. But a high NDCG score, or any other offline metric, does not mean the company is actually making more money. In terms of the numbers, you may see a higher NDCG score while sales go down.
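For reference, here is a small self-contained sketch of how NDCG is computed from graded relevance labels; the grades in the example are invented:

```python
# Self-contained NDCG (normalized discounted cumulative gain) from graded
# relevance labels; grades in the example are invented.
import math

def dcg(relevances):
    """Discounted cumulative gain: early ranks are discounted the least."""
    return sum(
        (2**rel - 1) / math.log2(rank + 2)  # rank is 0-based
        for rank, rel in enumerate(relevances)
    )

def ndcg(relevances):
    """DCG divided by the DCG of the ideal ordering; result in [0, 1]."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that puts a grade-3 item second scores below a perfect 1.0:
print(ndcg([1, 3, 0, 2]))  # ~0.71
```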
Now, back to you, Norbert, for the key takeaways. — Okay, thanks a lot for the details. I hope you weren't overwhelmed by the mathematics; well, it wasn't that much, at least from my side. So, a few key takeaways if you want to implement something similar. We are not a big team, and we could pull this off in a short time for a large-scale company, using just basic off-the-shelf open source technologies. Here are a few hints you should take away if you want to look into this.

The first is, as already mentioned: bad data in, bad output out. You need well-structured data and properly engineered features. That is actually the most difficult part. It has nothing to do with software; it is about looking at your users, what they are doing, and what could be relevant to them. Cleaning your data is of course necessary, and invest a lot of time in quality data labeling. As mentioned before, with 20 million monthly users and thousands of searches per second, you cannot label manually; that is simply not possible anymore. So you have to think about good indicators that a user is engaged with or interested in a certain item. That is the main thing you have to get right if you want to set up a good AI system.

Then, keep an eye on the NDCG. NDCG is used everywhere for search results, but we have seen it again and again: when you actually test things, the business KPIs — your actual revenue or engagement or whatever — do not always move along with the NDCG. So you have to be a bit careful there.

For the feature engineering: if possible, it is good to include something in the user interface through which users can give you feedback — "that was not a good search page", "that was not interesting", "that was interesting". Even if you get very few of these, they will help you improve your search results. Unfortunately this is very hard to do well, because lots of people get annoyed if it is done badly, so one has to be careful.

Cleaning your data: I already mentioned garbage in, garbage out, and the same goes for extreme outliers. You won't believe what humans are able to do. We have examples of hundreds and hundreds of clicks on items in quick succession; we were very surprised by what some of our users are doing. So you have to actually curate your data properly.

Good data labeling, as I said, is something you can iterate on. You don't have to come up with the perfect solution to all your problems in the first step; we never did. We started with binary, very trivial labels and then improved incrementally to what we call graded relevance: when an item was liked or commented on, we give it a certain graded relevance level. That is how you can start quickly with an off-the-shelf system that provides re-ranking.
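As an illustration of such incremental label grading, here is a small sketch that maps business events to graded relevance values. The specific events and grade numbers are invented; the talk does not give Mercari's actual grading scheme:

```python
# Sketch: graded relevance labels from business events instead of raw clicks.
# Event names and grade values are illustrative only.
EVENT_GRADES = {
    "impression": 0,  # shown but ignored
    "click": 1,       # previewed the item
    "like": 2,        # stronger engagement signal
    "comment": 2,
    "purchase": 3,    # the signal we ultimately care about
}

def relevance_label(events):
    """Grade a (query, item) pair by the strongest observed engagement."""
    return max((EVENT_GRADES.get(e, 0) for e in events), default=0)

print(relevance_label(["click", "like"]))      # 2
print(relevance_label(["click", "purchase"]))  # 3
```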
I mentioned the NDCG blind side already: a good NDCG value does not mean you get a really serious improvement, so be careful with that. — A little bit more on that blind side: the metric itself might be high, but if your first-stage retrieval, your index, gives you absolutely horrible results when matching the query to the indexed documents, the NDCG may still be high while the results are not relevant to the user, because you have merely re-ranked poor results. Your metric goes up, but your users are still unhappy. That is the blind side: don't just look at the number and think you've solved it and your ML solution is working great.

Okay, last but not least: this is an open problem, and there are whole conferences dedicated just to search and AI. There are a lot of things one can still do. Unfortunately, we are running out of time; the slides I just showed list a few ideas one could implement. If you want to implement something similar, start with something simple — what is now, let's say, the established technology, what most people are using — and then you can progress from there.

Okay, time is over. Thanks for your interest. We are open for one or two short questions. ... Oh, sorry, I didn't see you. Actually, the second talk, by the two colleagues sitting here, will go into more detail on MLOps. We split this because we don't have that much time; the second talk, on Saturday, goes into MLOps and the implementation. But it is off-the-shelf stuff, I would say, nothing specific. Thanks. Yes, Marco?

[Question from Marco, partly inaudible: do these weird users actually purchase something?] So the question was about the weird user behavior we mentioned — whether those are real users or robots. Yes, this is a huge problem; a very large share of internet traffic consists of bots, and we have indications that they even make purchases. So how do you spot them? We identified very consistent clicking behavior patterns in our data that do not look human, for example huge numbers of clicks. There are certain indications that help you identify the bots, and you need to clean those out when you prepare your training dataset.

Okay, I think time is up. Thanks a lot for your attention. We also have a sponsor table, so if you have questions, we are around.
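As a purely illustrative sketch of the cleaning step described in that answer, one might drop sessions with implausibly many clicks before building the training set. The threshold and record format are invented for the example:

```python
# Hedged sketch: drop sessions whose click volume looks non-human before
# using them as training data. Threshold and record format are invented.
from collections import Counter

MAX_CLICKS_PER_SESSION = 200  # illustrative cutoff; tune on real data

def remove_bot_like_sessions(click_events):
    """click_events: iterable of (session_id, item_id) click records."""
    clicks_per_session = Counter(session for session, _ in click_events)
    return [
        (session, item)
        for session, item in click_events
        if clicks_per_session[session] <= MAX_CLICKS_PER_SESSION
    ]
```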