Search, where we're working, has some unique constraints, because we generally have a very high volume of queries and very strict, real-time latency requirements. A lot of machine learning is run in batches, say once an hour, but when you have to return results in real time at very large query volumes, things get a little more interesting. So I'll give a little background about what Mercari is and the situation we were in when we introduced our machine learning model.

Mercari is Japan's largest consumer-to-consumer online marketplace, like an online flea market. Our vision is a circular economy, so sustainability: we want to reduce the number of things that go to landfill, and reduce the number of new things that have to be manufactured, by allowing people to resell their things instead of throwing them away. Our net sales are about 150 billion yen, a little over a billion US dollars. Within search we have over 20 million monthly active users, meaning unique visitors to our website every month, hundreds of millions of active listings in our catalog, and thousands of queries per second, with an all-time peak of over 10,000 queries per second. That's the landscape we were looking at when we had our first machine learning model that we wanted to put into production.

Our original architecture was a traditional term-based architecture built on Elasticsearch. Elasticsearch is an industry standard and very widely used, but it's term-based: an item is only returned if the keywords you enter in the query actually appear in it. If a seller never wrote that exact keyword in the description of their item, you wouldn't find it. That was a big limitation of Elasticsearch, although it still worked very well for us for 10 years. Our challenge was to take this traditional architecture and add machine learning to it. And now I'll pass the mic over to Tio.

Thanks, Ryan. As Ryan mentioned, my name is Tio, and I'm going to discuss the next part of our talk, which is the problem. We have this existing search architecture, so why do we need AI in this system? While it was very effective, serving as Mercari's search backbone for over 10 years, there were a few major shortcomings. Elasticsearch is a keyword-based system, and because of that it falls short on cases like these, which are exactly the areas where AI can help: ambiguous keywords, semantics (a query like "cool toys for boys"), and personalization. With keyword-based matching you can't resolve any of these. Going back to ambiguous keywords: in Japan, "one piece" is a type of women's clothing, but it's also the name of a very popular Japanese manga franchise. So if you're just matching the query text to items, which gets returned? Maybe both, when you only wanted one or the other, and they are very different items. Then semantics: maybe a One Piece bag is a cool toy for boys, but if that phrase isn't in the title or the description, it will never come up. And personalization: there's no easy way to personalize results to a user based on the query text alone. These are all ways we knew AI-based methods could improve the search we had at Mercari. So the question is, where do we start? We have an idea, let's do AI in search at Mercari, but how do we do it in a system that wasn't designed for it?
The search architecture was originally designed without AI in mind at all, so there were no easy ways for us to integrate AI. And because Mercari is, again, Japan's largest consumer-to-consumer marketplace, at our scale we had very strict performance requirements: our latency budget was just tens of milliseconds, which is really tough for machine learning models. We also had no easy hooks for AI and no optimal way of serving these AI requests; that simply isn't built into the search architecture. Those are infrastructural constraints, but the main constraint we had was protecting the user search experience at all costs. No matter what we do, the user search experience must be as good as, if not better than, before at every step of the way. We always want the situation on the right and not the one on the left; that's exactly what we're trying to achieve by using AI to improve search results. This was the hardest constraint, but it was also the one that served us best in delivering actual business impact from our system, as opposed to just model performance.

So what is the first way we decided to apply AI in search, given an existing search infrastructure that didn't have AI? Enter search re-ranking. This was the place we felt AI would add the most value, the highest ROI, with the simplest, least-intrusive integration into our existing system. Our colleagues Alex and Norbert actually got into the ML side of this work in their talk on, I think, Thursday morning, the second talk of the day. There are two phases to consider when adding an ML system into an existing search architecture: the Elasticsearch results, the original results that come back, can be considered the first phase of retrieval, and we now add a second phase where we re-rank those results on top. Essentially, given this set of results, can we surface the most relevant ones to our users first? That's even more powerful in an e-commerce context. If you're on Google doing a web search, maybe you have a little more patience to go to the second or third page of results, but on an e-commerce platform it's crucial to have the most relevant results first. In the long term, that ultimately serves our broader purpose of matching users to the items they're looking for.

With that first concrete use case for AI in search in mind, we get to the next step, which is the evolution of this ML system: how do we grow an ML system while, again, running the business and preserving a really high-quality user search experience? I want to emphasize that the focus is always on the least risk, very iterative development on the highest-ROI areas we see in each iteration, and then going from there. Given our existing infrastructure, the first thing we did was the simplest possible solution: if we have a machine learning model, we can integrate it directly within the search server. This had a lot of benefits, but there were a few drawbacks. For instance, we tightly coupled model development with development of the search server itself. That code base is massive, we have many developers working on it concurrently, and, as you would imagine for such a key part of the Mercari platform, it has a very high release cadence.
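To make that first integration concrete, here is a minimal sketch of what re-ranking as a plain function call inside a Go search server can look like. The types and the scorer interface are illustrative assumptions, not Mercari's actual code.

```go
// Minimal sketch of in-process re-ranking (illustrative only).
package rerank

import "sort"

// Item is a hypothetical first-phase hit returned by Elasticsearch.
type Item struct {
	ID       string
	Features []float64 // model inputs assembled for this item
	Score    float64
}

// Scorer abstracts whatever model is loaded in-process, for example a
// LightGBM model served through a Go inference library.
type Scorer interface {
	Predict(features []float64) float64
}

// Rerank scores every first-phase hit with the in-process model and sorts
// the slice so the most relevant items come first. From the search server's
// point of view it is just a function call.
func Rerank(model Scorer, hits []Item) []Item {
	for i := range hits {
		hits[i].Score = model.Predict(hits[i].Features)
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	return hits
}
```

Everything happens inside the search request, which is what made this first version simple, and also what coupled every model change to a search-server release.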
So we had to be very careful when we were developing. This whole journey was very measured: we weren't able to ship major features right out of the gate, but we also preempted any major incidents, and in production we ran very reliably the whole time. Because of that it was difficult, and we went very, very slowly at this stage, but we did move forward, and this was a step in the right direction. In keeping with the open-source theme of this talk, for each stage of the system's evolution we want to highlight one of the many libraries that brought us from that stage to the next. In this case, our search server is written 100% in Go, so we were very limited in the libraries we could use, but we did find a very useful one called leaves. We developed an internal component that used this library to serve a LightGBM model directly in the Go code of the search server. As mentioned, it was very simple to implement: there's just a function call to the component that serves the model.

That said, there were problems with this solution, which we addressed in the next phase of the system. This phase was characterized by us going all in on our microservices architecture. Mercari is very heavily invested in microservices and is a very Kubernetes-first company, so it was an easy, natural choice to split the ML model out of the search server into a custom Python microservice for model serving. This was again very simple, but it unlocked huge benefits: now we could develop independently of the rest of the search team and that huge search code base. Originally the model was just a function call inside the existing search server; now we simply replaced that function call with an RPC. Since the call goes over the network, we gave it a small timeout, and we introduced the notion of a baseline response, which in this case is the Elasticsearch ranking. What's very useful when deploying ML systems to production is having some really simple baseline that you can iterate on top of. Because we already had Elasticsearch results, that was the natural choice, and it goes back to our do-no-harm-to-users tenet: by design, in the worst case this re-ranking system would do no worse than what was already in place. That key aspect of having a baseline response helped us iterate quickly in this stage and the next. We also took the time, as we built out our proofs of concept, to implement basic production features, in this case metrics, that would help us in the future. We added more observability to our system, constantly paving the path toward a more production-grade system with each iteration. That wasn't strictly necessary, but we knew it was a good thing to do at the time and it came naturally. And we want to emphasize that we only add complexity at the points where we realize there's an actual need. In keeping with the OSS theme, moving to a Python microservice gave us a lot more flexibility, and for re-ranking we went with TensorFlow Ranking.
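Before getting to TensorFlow Ranking, here is a small sketch of the RPC-plus-fallback pattern just described. The client interface, types, and timeout handling are illustrative assumptions, not the actual service definition.

```go
// Hedged sketch of calling a remote re-ranking service with a tight timeout
// and falling back to the Elasticsearch baseline ordering (illustrative only).
package rerank

import (
	"context"
	"time"
)

// SearchHit is a hypothetical first-phase result from Elasticsearch, already
// ordered by the baseline Elasticsearch ranking.
type SearchHit struct {
	ID    string
	Score float64
}

// Client is a hypothetical RPC client for the Python model-serving service.
type Client interface {
	Rerank(ctx context.Context, hits []SearchHit) ([]SearchHit, error)
}

// RerankWithFallback asks the remote model to re-rank within a small latency
// budget. If the call errors or exceeds the budget, the original Elasticsearch
// ordering is returned, so the worst case is no worse than before.
func RerankWithFallback(ctx context.Context, c Client, esHits []SearchHit, budget time.Duration) []SearchHit {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()

	reranked, err := c.Rerank(ctx, esHits)
	if err != nil {
		return esHits // baseline response: do no harm to users
	}
	return reranked
}
```

The point of the pattern is that the model can only ever add the latency of its budget, and a failure degrades to exactly the experience users already had.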
TensorFlow, for those of you who aren't familiar, is an open-source library made by Google. It's been open source for many years now and is an industry standard in a lot of ways: high performance, well validated, with very active development. So it was again a natural choice for us, and it really helped us get through this next stage successfully. That said, the system was still very basic, and there were still many ways to improve it, which brings us to the next solution we settled on.

As mentioned previously, in the version of the search server with the model baked in, features were computed within the search request, and we realized that a lot of that feature computation was redundant. So now consider a feature store: think of it as a distributed cache for a model's inputs. What we iterated on, which isn't really shown on the slide so I'll go over it quickly, was again a very basic next step: instead of computing features inside every search request, we pre-compute them. At first we used Bigtable, Google's cloud-managed on-disk storage. That was a lot better, but at the time it did not reliably meet our latency requirements. So we upgraded from that to Redis, an in-memory key-value store serving the same purpose. We were able to keep the interface exactly the same, and that was the key to further improving the system's performance to meet the aggressive latency requirements mentioned earlier. In addition, just as we have timeouts on the re-ranking request, we also have timeouts and fail-safes on this feature store component. We can get into the details later, but essentially, with each abstraction and each layer of complexity we added, there was always a safeguard to fall back to the previous, simpler layer. And we kept improving our monitoring and observability suite along the way, working out which metrics we needed to track both operationally and for model performance.

It's worth emphasizing the major bottleneck we now had. I mentioned we were developing very quickly; that meant we had to run a lot of A/B tests, where something already in production keeps serving most traffic, a new feature receives a certain share of traffic, and we check whether that feature moves the key metrics that relate to the business. Because we could develop and test new features quickly, the bottleneck was no longer feature development but the A/B tests themselves. Each new A/B test required a new model, which essentially had to be deployed as a new microservice and set up the same way we set up the original model, multiplied by the number of models in any given A/B test. That was taking roughly a full engineer's time per quarter, and even with that we could run two A/B tests at most. So that became the next bottleneck in this phase of the system's evolution. Before I go on to the next phase, I do want to highlight the next open-source library we used that was very high value. For features, again, we settled on Redis as the actual backing technology.
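To give a rough idea of that lookup path, here is a hedged sketch using the go-redis client. The key layout, field names, import path, and fallback behaviour are illustrative assumptions rather than Mercari's actual schema.

```go
// Hedged sketch of a pre-computed feature lookup with a tight timeout and a
// fail-safe fallback (illustrative only; not Mercari's actual schema).
package featurestore

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

type Store struct {
	rdb *redis.Client
}

func New(addr string) *Store {
	return &Store{rdb: redis.NewClient(&redis.Options{Addr: addr})}
}

// ItemFeatures fetches the pre-computed feature hash for one item ID.
// On timeout or error it returns nil, and the caller falls back to the
// simpler behaviour it had before the feature store existed.
func (s *Store) ItemFeatures(ctx context.Context, itemID string, budget time.Duration) map[string]string {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()

	feats, err := s.rdb.HGetAll(ctx, "item_features:"+itemID).Result()
	if err != nil {
		return nil // fail safe: re-rank without these features
	}
	return feats
}
```

Because the lookup sits behind the same interface whether the backing store is Bigtable or Redis, swapping the storage technology is invisible to the caller, which is what made that migration low-risk.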
That client is go-redis, the Redis client for the Go language. There's nothing really special to say about it, which I suppose is the main point: it integrated out of the box with our cloud-managed Redis instance and it had really great performance. So again, thanks to open source, we didn't have to reinvent the wheel on this one either.

Finishing up the system's evolution, this is the final iteration, the one we're at right now; we have many more planned for the future. It adds Seldon Core for serving and Istio for model routing, and together these will alleviate the bottleneck mentioned previously. Instead of having to set up new microservices and Kubernetes configurations for each model we're A/B testing and then tear them back down again, we can automate all of that manual work. The feature flags we mentioned earlier can get sent through as-is, without us modifying any code, and Seldon, with Istio's help, routes them to the correct endpoints, so we can spin A/B tests up and down hopefully an order of magnitude faster. Again, iterative development: this should let us iterate even more quickly, and help us not just develop features but A/B test them and release them, to improve the user search experience that much more.

In conclusion, we believe that ML-enhanced search really is worth the effort. If you have a traditional search architecture, ML can add a lot to it. I can't get into exact numbers to show how worthwhile the investment is, but as Ryan mentioned earlier, annual net sales are on the order of a billion dollars, so even a 1% improvement in that direction is pretty significant. And this comes back to our very iterative approach: a 1% decrease would be equally horrible, but because we went iteratively along the way, we were able to prove the system out, release it to production, and see positive lifts. So it was definitely well worth the investment in our case, and we would urge everyone here to consider a top-down integration of AI. While it would have been possible to totally rewrite this architecture and implement it from scratch, that would have been really dangerous, and I can't think of a feasible way to do it that wouldn't have negatively affected our users. With a top-down approach, you can integrate into search slowly, and over time you end up with an AI-based search system that is essentially what you would have gotten by rewriting it from scratch anyway. In doing this top-down integration, this iterative development, you can really balance the engineering-versus-business trade-off, because it allows very rapid iterations: you start with the simplest, highest-ROI features first, and at each step you lay just the minimum amount of groundwork necessary for the next stage. So don't over-engineer your systems, but make them at least a little bit extensible, so you can evolve to overcome the next bottleneck that presents itself.

I wanted to finish with a quote I personally really like, from John Maxwell: "One is too small a number to achieve greatness." I mention it because at Mercari, one of our values is All for One. Literally, it's all for one.
It means teamwork, everyone helping each other. A system at this scale would not have been possible without the help of many, many teams across the organization, so I want to say thank you to them. And to give back, in keeping with that spirit, we're also building the system to serve search in general. It started with our smaller team within search; now many more search engineers can be empowered to add machine learning to the overall search system at Mercari. Once that's in place, we're aiming for an organization-wide platform, so that AI can permeate much more easily across the org and reach all of our users. So again, All for One: one is too small a number to achieve greatness.

We also want to thank the open source community. As we've tried to emphasize along the way, this wouldn't have been possible without open source; search ML at Mercari would not have been possible, at least at this speed and quality, without the open-source software we use. The system is almost entirely built on open-source libraries. In keeping with that, we at Mercari also open source our internal tools and resources back to the community when possible, in addition to contributing PRs and bug fixes as engineers to the libraries we use. Mercari was founded on the premise of a circular economy where everyone can buy and sell, and we also believe in a circular development economy where anyone can contribute, and hopefully anyone can build these scalable ML systems without having to be at the scale of Mercari. We wanted to give back and pay it forward. We believe open source was the key differentiator in our case, and we really hope the information in this presentation was valuable to the open-source community here today. Thank you for listening, and we're excited to answer any questions anyone may have. Thank you.

OK, so we have some questions for Ryan and Tio. Thanks for sharing. I just have one question about the search method you're implementing. Correct me if I'm wrong, but you're actually using the ML models to re-rank the results based on your original results. Does that mean it will, in most cases, be slower than your original method without the ML models in the actual product?

Yeah, exactly, 100%. And sorry, what was your name again? Billy. Thanks, Billy, that was a great question. The short answer is yes, and that's what led to the strict latency requirements mentioned earlier: we did the bookkeeping and said this is the only permissible amount of latency that can be added on top of the latency the existing system without AI was already delivering.

Thanks, I'll just add to that. We did some experiments to determine how much extra latency a user would tolerate, and we found it was pretty low, but not zero. That was our opportunity to add re-ranking on top of the existing system.

Would you mind sharing the common metrics you look at when you perform the A/B testing for a new model?

That is a good question. In general, our business metric was sales: the more we sold, the happier our product people were. Hopefully that answers the question. And I guess that validation is something you only get once you put it into production, right?
So perhaps your question is: how do you test that in advance, to know you're not going to miss the mark once you roll the model out? OK, let's pass the mic back first.

Hi, would you be able to share a little bit about the SRE side and all the performance redundancies you had to put in place?

Sure, we'd love to. Is there a specific area you're thinking of?

Yeah, I guess you probably have multiple Elasticsearch instances, and you'll have multiple models, or rather containers that contain a model, in case one of them goes down. What does that look like? How do you do upgrades and things like that?

Great question. We actually run everything in Kubernetes, including Elasticsearch, which is maybe a little non-standard these days, but everything is in Kubernetes. We do canary deployments and rolling updates of the models themselves, so if they break we can quickly roll back. And we use Datadog for monitoring our various metrics, latency, QPS, those kinds of things, and we get alerted, sometimes late at night, when we're not meeting our metrics. So we have a good incentive to keep the system from getting too slow. Does that answer your question? As for state, our main state is our feature store. We compute features about our listings and store them, like Tio said, in an in-memory store, and we update it regularly. So in addition to caching them, we directly compute them offline and write them to our store. That is the only stateful part of our system. Thanks for the question.

Thanks for sharing your experience of integrating your machine learning model with your search system. You mentioned you use microservices and you use Kubernetes, right? Would you please share more details about how you use Kubernetes to organize your microservices? Thanks.

That's a good question, and maybe a better question for our platform team, so I can only give you a high-level view of how we use Kubernetes internally. Is there a specific question or topic within that you were thinking of?

My question is: you mentioned you use Kubernetes to organize your microservices. Could you share more details on how you do that, and on how you're able to change the system and its implementation so quickly?

Sure, I can give a high-level answer. At least from the ML productionization side of things, it's a very simple solution: we just create another set of Kubernetes manifests. We have an automated templating system internally that helps us do that. In our case, as mentioned earlier, when A/B testing these models there was a lot of manual setup of these manifests, and honestly a lot of it was copy and paste, just changing a few fields here and there; that's the problem we were trying to solve. So the short answer, maybe not at the right level of abstraction, is that to create the different microservices for the ML models themselves, we have a set of manifests, we use Kustomize on top of that, and then we generate a new set of manifests for each model.

Thanks, I'll add to that a little. Like we said, originally the system wasn't designed for ML, so it was a very static system where endpoints don't change often, and most of the endpoints were hard-coded in various configuration files.
Our entire Kubernetes configuration is a GitHub repo, and if you want to change the system you make a PR; if you want to add a microservice, that's a PR. When we first started, we had to change almost every part of the system to manually encode where the endpoints were, and then we had if-statements in our code to switch between which model to serve. That wasn't scalable. What we do now is use Istio for routing: we put the feature flags into the headers, and with Istio we let Seldon route the request to the right model. That allows us to serve a new model without changing any part of the system. The search service calls the same endpoint every time, and depending on what the feature-flag headers are, Seldon decides which model is going to serve that request.

Next question. May I ask, how does this search AI model support pagination in general?

That's a good question. Again, we worked within the existing search system, so whatever pagination was used within the search server is what we re-ranked on top of. We can't get into too many details, but essentially we had to re-rank with that pagination in mind; there was nothing special on top of that. That's the short answer, and hopefully it answers the question.

Thank you very much for sharing. I want to ask more about the feature store, which is quite interesting. Can you elaborate on how you're using it? Is it used to save the results of feature extraction from item descriptions? Do you use it mainly for model training, or where else does it come in? How does data flow through the system in this case?

I can answer at a high level, and the short answer to those questions is yes. Primarily, in the very beginning, there were some features of the model that were being computed within the search workflow itself, and some of those were already just there. Let's pretend item name is one: we don't need to do any processing for that, it's just there. But there are other features, say clicks on an item, let's pretend that's one of the features in the feature store, where we realized it was infeasible to calculate them within a search request, whereas you can easily look them up if you have a feature store. So that's how it started, and the yes to your other questions is that we're evolving it to serve many more complicated features after that, having to do with both items and users.

I'll elaborate a little bit. The first features we get are what comes back from Elasticsearch: things about the item, what's the description, how much it costs, how many people have liked it, things like that. And there are some things we can't calculate from the item itself, like the click-through rate Tio mentioned, or personalization, like what this user prefers, which we can't calculate from Elasticsearch alone. Those we pre-calculate, and they're also the same things we feed to our model, of course. Then at query time we look them up in the feature store, feed them into the model together with the things we get back from Elasticsearch, and we get a re-ranking. OK, I think we're up for time.
But thank you so much, Tio and Ryan. Thank you so much.