So I'll start by talking a little bit about myself. First of all, good afternoon to all of you, and I hope that you're all staying safe and healthy, wherever you are in the world right now. My name is Gunjan, and I work as a data scientist at Gojek. I'm a mathematician at heart: I have a bachelor's degree in mathematics and a master's degree in data science. I've been working at Gojek for the past two years on some really exciting problems, and I'm here to share one of those with you. So let me start by talking a little bit about Gojek and setting some context for the problem that I'm trying to showcase here. Gojek is an app that offers many services to its users: ordering food, commuting, digital payments, shopping, hyper-local delivery, getting a massage, and two dozen other services, all in one app. It is Indonesia's first and fastest-growing decacorn, building an on-demand empire. We are headquartered in Jakarta, Indonesia, and Gojek operates in 207 cities across five Southeast Asian countries: Indonesia, Singapore, Thailand, Vietnam, and the Philippines. Now, when you have so many services rolled into one app across so many locations, a very natural question comes up: how many people on our platform actually use more than one of these services? Let's assume a person, Mike, who uses our two-wheeler taxi service, GoRide, to commute to and from work every day, but has never used our car taxi service, GoCar. Hi, we have one more person, welcome. So I was just talking a little bit about Gojek, the place I work for right now, and then I'll go on to the problem that I'm trying to solve with the poster that you see on your screen. So at Gojek, we have many, many services rolled into one app.
Services like ordering food, commuting, digital payments, shopping, hyper-local delivery, getting a massage, and around two dozen others, all rolled into one app. So when you have so many services in one app, a very natural question comes up: how many people actually use multiple services that we offer? Let's assume a person, Mike, who uses our two-wheeler taxi service, GoRide, to commute to and from work every day, but has never used our car taxi service, GoCar. Does this mean that he will never use it? Or let's take the case of Indra, who religiously orders food from our food delivery service, GoFood, and uses our digital payments platform, GoPay, to pay for it, but has never taken a ride using GoRide or GoCar. Does it make sense for us to send her a voucher for these services to see if she uses them? Now, with millions of monthly active customers, the permutations are endless. But the key point is that for us as a business, it makes sense if more and more customers use more and more of the services that we offer. At the same time, we don't want to spam our customers with vouchers for services that are not relevant to them. Imagine constantly getting voucher notifications from a particular app for something you have no interest in whatsoever. It can get very annoying. So we decided to build an algorithm that helps us figure out which customer is most likely to use which service, based on their transaction history. This meant generating targeted campaigns for our customers. By a targeted campaign, I mean only sending a voucher or a cashback to a customer if they are highly probable to actually use it. This way, we get a higher conversion rate at a much lower cost, and we don't end up spamming our customers. Any questions till now? Feel free to stop me at any point if you have any questions. Okay, so let's zoom in to the left side of the poster and see what's going on here.
So here, you can see a Venn diagram that explains how a targeted campaign is designed. Let's say we have a universe of all Gojek users, and out of this, there will always be a base pool for the campaign that you want to run. Say we want to run a campaign to convert more people to our food delivery service, GoFood. Then our base pool consists of people who are eligible for that campaign, which means people who have used other services but have not used GoFood till now. And out of this base pool, our aim is to find a target group, a target audience of users who are more likely to be interested in the campaign; in this case, users who are more likely to convert to becoming GoFood users. So for any targeted cross-sell, our aim is to find this target audience. This is the basic problem statement that we're trying to solve: finding a small group of people to target for a particular campaign. If you have any questions about customer targeting, or if the diagram is not clear, please feel free to ask at any point, and I'll move forward. Now, to solve this problem, our first hunch was to build a classification model. A classification model is an algorithm designed to divide your data set into two or more classes, if such classes exist. In our case, we had two very clear classes in our data set: for our base pool, we classify users into customers who will cross-sell and customers who will not. Then we can look only at the customers who are predicted to cross-sell and send the voucher to those people, excluding a huge number of people from that voucher spend. Now, when we built the classification model, it was working great. We were getting an average uplift of around 5x over natural conversion rates on one of our key services.
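To make that classification setup concrete, here is a minimal sketch of what one such per-campaign model might look like. The features, labels, and threshold are all hypothetical, invented for illustration; the talk does not specify which model or features were actually used.

```python
import numpy as np

# Hypothetical per-user features for a GoFood cross-sell campaign:
# [rides last month, GoPay transactions last month]. Labels mark users
# who later converted to GoFood. All numbers are made up for illustration.
X = np.array([[12, 8], [1, 0], [9, 15], [0, 1], [7, 6], [2, 1]], dtype=float)
y = np.array([1, 0, 1, 0, 1, 0], dtype=float)
X = X / X.max(axis=0)  # scale features to [0, 1] so training stays stable

# Minimal logistic regression trained with batch gradient descent.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted conversion probability
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

# Only users above the probability threshold receive the voucher.
scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))
target_group = np.where(scores > 0.5)[0]
```

The point of the sketch is the workflow, not the model: everything here (base pool, features, labels) is specific to one campaign, which is exactly the scalability problem discussed next.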
But at the same time, the classification model had its pitfalls. To build a classification algorithm for a targeted campaign, we needed to train the model individually for each campaign, because the base pool itself changes for each campaign. If I want to run a campaign to target new GoFood users, then my base pool consists of people who have used other services except GoFood. Whereas if I want to run a campaign to target new GoCar users, then my base pool is people who have used other services except GoCar. So the base pool changes for each cross-sell campaign, and when that happens, you literally need to rebuild your classification model for each and every campaign that you want to run. Clearly this was not a scalable approach. So we needed to rethink the way we were approaching this problem and think of it as matchmaking, literally making a match between users and products, with an algorithm generic enough that moving from one targeted cross-sell model to another takes very little effort. That's when we thought of using recommendation systems to solve this problem. A recommendation system is an algorithm that uses users' past behavior, their history, to recommend items that they're most likely to purchase. In today's world, recommendation systems are literally all around us: relevant ads, relevant products on Amazon, relevant shows and movies on Netflix all have recommendation systems sitting behind them. We are using recommendation systems based on collaborative filtering as a matchmaking mechanism between users and products. Since we have many different products, we can treat each of these products as a different item in the recommendation system that we want to recommend to our users.
I'm gonna take a pause here for two minutes to see if there are any more questions. I don't see any questions, so I'll move on. Like I said, we are treating each of our services as one item in a recommendation system. Now, every recommendation system based on collaborative filtering has a utility matrix associated with it. A utility matrix captures interactions between users and items. By interactions, I mean these could be the rating that a user has given to an item, the purchase history of that user and item, the search history, how many times a user has clicked an item, et cetera. This is what our utility matrix looks like; what you see here is a sample, of course. Each column of the utility matrix is one service and each row is a customer. These columns can go as granular as having a merchant as the thing we're recommending, so the cross-sell model can actually work down to the granularity of a merchant, not just a service or a payment method. The matrix is filled like this: we look at the transaction history of a user for that particular product. Let's say user one has used GoFood three times in the last one month; then the cell between user one and GoFood is filled with three. Similarly, user two has used GoFood two times, so the cell between the two is filled with two. However, user three has not used GoFood in the past one month, so that cell stays empty. Now imagine this concept applied across millions of customers and 20 to 30 products. We would have a pretty huge utility matrix on our hands, and the idea is that this matrix is going to be very, very sparse.
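As a toy version of that construction, a count-style utility matrix can be built from a transaction log with a single pivot. The log below is invented for illustration; the real pipeline reads a month of transaction history from BigQuery.

```python
import pandas as pd

# Invented one-month transaction log; each row is one completed order.
txns = pd.DataFrame({
    "user":    ["user1", "user1", "user1", "user2", "user2", "user3"],
    "service": ["GoFood", "GoFood", "GoFood", "GoFood", "GoFood", "GoRide"],
})

# Count transactions per (user, service) pair. Combinations that never
# occur stay as NaN, which is exactly the sparsity the recommender fills in.
utility = txns.pivot_table(index="user", columns="service", aggfunc="size")
print(utility)
```

This reproduces the example from the poster: user1 gets a 3 against GoFood, user2 a 2, and user3's GoFood cell stays empty.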
By sparse, I mean most of the values of this matrix will be missing, because, as I said, we have a lot of users, but not everybody uses all the services. Wherever a person is not using a service, that particular cell will be empty. So the problem eventually boils down to finding these missing values in the matrix. If we are able to find these missing values, we will be able to figure out whether a particular user will be interested in a particular product in the future or not. Let's say we're able to predict the value for user one against GoCar. If this value is very small, we might not want to recommend GoCar to this user. However, if the predicted value is very high, we might want to recommend GoCar to this user and include that person in the campaign. Now, for the purposes of this poster, I'm gonna ask you to consider recommendation systems as a black box; I'm not going to go into too much detail about how the algorithm works. But just to give you an intuitive idea, we can look at the image here. As you can see, user one and user three have very similar behaviors: both of them have used GoPay Offline once, user one has used GoRide thrice, and user three has used GoRide four times. That's pretty similar behavior over the past one month. However, user one also seems to have used GoFood thrice. So what we can do is recommend GoFood to user three as well, because we have seen that in the past, user one and user three have behaved similarly. If there is a service that user one is using, we can use that information to recommend it to user three. This is the basic idea behind how recommendation systems work: they find similarities between users and similarities between products based on past behavior. Any questions till now?
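That neighbor-based intuition can be sketched directly with the numbers from the example. Note this is only the illustrative user-user flavor of collaborative filtering; the final system described later uses matrix factorization instead.

```python
import numpy as np

# Usage counts from the example: columns are [GoRide, GoPayOffline, GoFood];
# np.nan marks a service the user has never used.
user1 = np.array([3.0, 1.0, 3.0])
user3 = np.array([4.0, 1.0, np.nan])

# Cosine similarity computed over the services both users have actually used.
both = ~np.isnan(user1) & ~np.isnan(user3)
sim = (user1[both] @ user3[both]) / (
    np.linalg.norm(user1[both]) * np.linalg.norm(user3[both])
)

# A high similarity lets us project user 1's GoFood usage onto user 3,
# making GoFood a candidate recommendation for user 3.
predicted_gofood_score = sim * user1[2]
```

Here the two users' usage vectors are nearly parallel, so user 3 inherits a GoFood score close to user 1's count of three.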
Like I said, let's treat recommendation systems as a black box. Moving forward, this is what our final workflow looks like. We get our data, our one-month transaction history, from BigQuery. We feed this data into Pandas and do some ETL on it. Turning the raw data into a utility matrix is not a very expensive operation, so Pandas on its own was working really well for it, and we decided to go ahead with that. However, building a recommendation system on the utility matrix is quite an expensive operation: we're not only storing the entire utility matrix in memory, but also performing optimizations on top of it. We started off by trying out a Python library called Surprise to build our recommendation system, and it was taking around six to seven hours to build just one recommendation system. After exploring more, we found that the Spark ML library also deals with recommendation systems, and when we tried using that with a Python wrapper on it, our training time dropped from six hours to one hour. So we finally ended up using the Spark ML ALS recommendation engine with a Python wrapper on top of it. That library basically spits out your final filled-in utility matrix, from which we can infer the list of customers to target. Once we had this workflow in place, we did some field tests to see how the model actually performs out there in the field, and we got an uplift of around 5x to 7x over natural conversion rates across service types. Currently, this model is being used to target a lot of people, to send out a lot of vouchers for cross-selling across products. That's about it from me; that's a brief intro to the project. There is a relevant blog post about this that I will be posting on the Discord channel in a bit.
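Coming back to the training step in that workflow: Spark's ALS is a distributed implementation of alternating least squares matrix factorization. A single-machine toy version of the same idea, on a made-up 3x4 utility matrix, might look like this; the real model runs on Spark over millions of users.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utility matrix: rows = users, columns = services; 0 = unobserved.
R = np.array([[3.0, 1.0, 3.0, 0.0],
              [0.0, 2.0, 0.0, 1.0],
              [4.0, 1.0, 0.0, 0.0]])
observed = R > 0
k, lam = 2, 0.1                       # latent dimension, L2 regularization

U = rng.normal(size=(R.shape[0], k))  # user factors
V = rng.normal(size=(R.shape[1], k))  # item factors

# Alternate closed-form least-squares solves: fix V and solve for each
# user's factors, then fix U and solve for each item's factors.
for _ in range(20):
    for i in range(R.shape[0]):
        Vi = V[observed[i]]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(k),
                               Vi.T @ R[i, observed[i]])
    for j in range(R.shape[1]):
        Uj = U[observed[:, j]]
        V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(k),
                               Uj.T @ R[observed[:, j], j])

# The filled-in matrix: a predicted score for every (user, service) pair.
R_hat = U @ V.T
```

The previously empty cells of `R_hat` are the predicted scores used to decide who lands in a campaign's target group.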
So please feel free to go through it, and I'd be happy to answer any questions that you might have. Hi, I see someone has joined. How are you doing? Hi, how are you? Yeah, I'm doing fine, thanks. I actually just finished talking about the poster right before you joined, so please ask any questions that you might have; if there's any section that's not clear, I'll be happy to talk about it again. Yeah, amazing. I was in between talks, basically, and I missed the beginning; I'm very sorry about that. Let me just turn on... No problem at all. Let me turn on the video. I was intrigued by the poster. I reviewed it before the call because I knew I was not gonna have much time, and I saw that you were using the Spark ALS model, basically. Yes. I was wondering, in your setup with the recommender you were working on, if your products were changing quite often or if you had a stable set of products. I was thinking of using ALS as well; I tried ALS on my own problems, a couple of them actually, some to find similar users, some to find similar products, et cetera. And I have one use case in mind, but for that use case, basically from one week to another, the products are changing. So I was wondering if I could make that work, or how was your situation, basically? Okay. So currently this model is not a real-time model; it is trained on a weekly basis. The products are not changing on a weekly basis, but customers are being added on a weekly basis. That happens: as our customer base expands, we get more and more customers being added to the model. And this is not a model that is predicting results in real time, because it is currently used to generate campaigns, and those campaigns are usually pre-planned.
So how it works is, whenever a product team comes up to us and asks us for a list of customers to target for a particular campaign, we just use the pre-trained model for that particular week and give them the list of customers to target for that campaign. So it is not really real-time, but if a weekly retraining works for you, then you can definitely do something like this. Yeah. Also, in our case, like I was saying before, we started with the Surprise package in Python when we first built the recommendations. But there, the problem was that the training time was around six to seven hours, which was pretty huge, and just not optimal enough for us. So then we explored Spark a little bit and found out that it has an ALS package, and the training time actually dropped to one hour after we started using Spark. So that was pretty useful for us. I'm not sure how this would work in real time, if that's your use case. Yeah, I mean, I've used ALS successfully for a use case similar to yours, where I have my users and, basically, I want to matchmake users and entities that they follow. So I have user profiles with all the entities that they follow, and that's pretty stable. And thanks to ALS, I forgot the name of it in the API, but you can basically get the relationships; it's like with this market basket model, right? So you can basically get the next entities that the user... Yeah, yeah. Could follow. A probability score for the next entities the user would follow. Yeah, I get that. So that was working, and I was quite happy with that. But now I'm looking at the other one, which is more of an e-commerce use case where you have products that change.
And for that one, I'm a bit stuck, but I will keep on thinking about this. I was thinking, in order to be able to use ALS, of maybe finding the similarities between the products or something like this, and then doing the recommendations based on that. But I'm not sure how well that will work. Because then, if the products are more or less similar, which in my case, to some extent, they are, then instead of matchmaking a user with a product directly, I could match the user with, let's say, the archetype of the product or something like this. But then I would need to define these things. Yeah, so we also faced a similar problem. The upside that you have is that if you're working in an e-commerce space, your product similarities actually make sense. If a person is selling a type of clothing, then there would be other people selling similar types of clothing, so you would have something which could be used as a similar product. But what we had were services offered by Gojek, and it's very absurd to say that GoFood is similar to GoCar, that a food delivery platform is similar to a car-hailing service. That was one of the challenges we faced, and it's the very reason we ended up using a matrix factorization technique. We started by trying KNN approaches, but it just doesn't make sense to say that one product is similar to another in our case. Users can be similar based on their usage history, et cetera, but there's no such thing as a product profile that we can use to find similar products. That's where ALS has helped us cross that kind of blocker, because matrix factorization uses both user and item similarities; it takes a combination of both.
Yeah, yeah, definitely. Sounds great, yeah. Thanks. Actually, I had it on my list to check more of the product similarity techniques for that, and for the KNN, this Annoy library that I have not used yet. I'm just gonna make a note; actually, what you said makes sense. ALS is more for when you can train on that relationship between the two. Cool, thanks a lot. Yeah, I have to run, sorry, to attend another talk. Yeah, sure, thank you for joining. I'm happy I caught you and could ask my question. Thanks a lot. Thank you. Bye-bye. Bye-bye. Hello, hi, can I ask a question? Hey, yes, please go ahead. Yes, Mark, hi. So you spoke about the differences in learning time between the two packages. The first one, you were saying, was six to seven hours, and the second one was much shorter, I think about an hour. Apart from the speed of that process, did you determine any particular differences in the types of outcomes of those learnings, or was it purely just a speed benefit that drove you to the second package? That's a very good question. Before I answer it, I'll talk about something that was essentially different between the two algorithms, between Surprise and the Spark ML library. Recommendation engines traditionally work on explicit data. By explicit data, I mean data where you have a rating given by a user to a particular item. For example, on Netflix, you have the rating given by a user for a particular movie or TV show, and you use that explicit feedback to train your model. However, in our case, we didn't have that. What we had was implicit data: user transaction history. It doesn't necessarily indicate how much a user likes or dislikes an item. So what the Spark ML ALS library does is convert that implicit data into explicit data first.
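The conversion being described matches the standard implicit-feedback ALS formulation (Hu, Koren and Volinsky), which is what Spark's ALS applies when `implicitPrefs=True`. A tiny sketch of the idea, with invented counts:

```python
import numpy as np

# Invented one-month usage counts for a single user across four services.
counts = np.array([0.0, 3.0, 1.0, 10.0])

# Implicit-feedback ALS splits each raw count into a binary preference
# (did the user interact at all?) and a confidence weight that grows
# with usage; alpha = 40 is the value suggested in the original paper.
alpha = 40.0
preference = (counts > 0).astype(float)
confidence = 1.0 + alpha * counts
```

The model then fits the preferences, weighting each squared error by its confidence, so ten orders count for much more than one, and a zero is treated as a weak "probably not interested" rather than a hard dislike.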
Surprise did not have that conversion; it assumed that the data we were giving it was actually explicit data. So it was working on an underlying assumption which was not really true for us. When we moved to Spark, this was one of the benefits we got: it was converting our implicit data to explicit data. It converts it into a preference with a confidence score, saying that if this person is using such-and-such service X number of times, this is the probability that he or she likes this service. So the explicitness of liking or disliking a service was coming from that conversion, really. Overall, it enriched the process. And yes, our results, our accuracy numbers, were much better using the Spark ML library compared to the Surprise package because of this one enrichment that the model was doing. Okay, that's really interesting. Thanks, yeah. Thank you. All right, guys, it's time. I am going to drop off, and I'll be hanging out on Discord for a while. If you guys have any questions, please feel free to reach out. Thank you. Thank you for joining. Bye.