And now, let's crack on with our first speaker this afternoon. We're accustomed to paying different fares when we fly. Sitting on our flight to London, if we look around the plane, we know there's a huge disparity in the amount passengers have paid. Have you ever wondered how much the person sitting next to you on the plane might have paid for their ticket? Dynamic pricing optimization makes perfect sense from a rational point of view. So why don't we see it used more often? Why don't cinemas, for example, dynamically adjust their prices according to how full the cinema gets? Our first speaker today is a senior data scientist at Cabify, and he knows how hard it can be to maintain the balance between demand and supply. So let's welcome Tony Prada. Tony, how are you? Great to see you.

Well, how are you? Thank you for the introduction.

Take it away, Tony.

Yeah, I'll start straight away. I'm Tony Prada, I'm a senior data scientist at Cabify, and today I'm going to talk to you all about how we set prices at Cabify: how we define how much we charge for each journey, which problems we have with our current pricing process, and how we used reinforcement learning to create a system that is automatic and is running all the time, trying to optimize these prices. So let me put up the slides. Just one moment. Can you see the slides?

I can't see the slides, Tony.

There? OK?

No, no, no. Now, now you're OK. That's perfect.

Thank you. So for those of you who don't know Cabify: Cabify is a ride-hailing company, like Uber or Lyft, and we are based in Madrid and operate in Spain and South America. And because we are going to talk about prices, the first question here is what our prices look like. Here I've put a screenshot from one of the apps, and the most important feature is that our prices are upfront, so passengers know how much they are going to be charged before ordering a journey. How this works is that passengers select the origin and destination of the journey, we create a route that connects those two points, and we use a routing system to calculate the distance and estimated time to serve that journey, accounting, of course, for traffic and other conditions. Our prices then have mainly two ingredients, a price per minute and a price per kilometer, which, together with the distance and time, let us calculate the final price of the journey and display it to the users (it's the little calculation sketched below). Users can then decide whether they like the price: if so, they order the journey and we send a car to the pickup point; if it looks too expensive to them, they just abandon the app and nothing happens.

So why is pricing that important? Of course, pricing is important for every company, because it's usually the way to extract revenue from the operations, but here, because we are a marketplace, a two-sided marketplace, it is even more important. What I mean by marketplace is that we have a demand side, people who want to go from point A to point B, and a supply side, drivers who want to earn money by serving those journeys.
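The little calculation referenced above, as a minimal sketch: the function name and the tariff numbers are illustrative, not Cabify's actual values, and real tariffs presumably also involve base fares, minimum fares, and rounding rules not shown here.

```python
def upfront_price(distance_km: float, duration_min: float,
                  price_per_km: float, price_per_min: float) -> float:
    """Upfront price from the routed distance and the estimated travel time."""
    return price_per_km * distance_km + price_per_min * duration_min

# Made-up example: a 6 km, 14 minute journey at 0.80 EUR/km and 0.15 EUR/min.
print(upfront_price(6.0, 14.0, price_per_km=0.80, price_per_min=0.15))  # 6.9 EUR
```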
So in this marketplace, pricing is a very important tool to keep the equilibrium between demand and supply, because, for example, if we increase prices too much, we will kill all the demand, but if we lower prices too much, the drivers will disconnect, because they will feel they are not getting fair compensation for the time they are putting into the job. So we are a marketplace and, by the way, we work on commission here: for every journey that happens, we take a commission.

So how do we do pricing at Cabify? We have business teams that are local to the markets and know about things like cost of living, regulation, things like that. They are the ones who build those prices, that price per kilometer and price per minute, and they also have the responsibility of maintaining those prices in a good state, so they don't break this equilibrium between demand and supply. Of course, that means that sometimes we have to change prices, and the question is: how do we do that? Maybe we could just change the price, put it in production, release it for everybody, see how it works, and compare some metrics with some time period in the past. But that wouldn't work very well when there are external factors that can affect that period of time. For example, it's difficult to test something at the start of the summer, because you don't know if the metrics are affected by the new prices or just by the summer.

So what we do here is use randomized controlled experiments. We have some very complex designs, because in a marketplace the independence between groups is not easy, and all of that, but here we can think of it as just normal A/B testing. To refresh what A/B testing means: it's about randomly selecting a subset of users, then, randomly again, dividing them between control and treatment; the treatment group gets the new prices, and the control group stays business as usual. We leave things like this for some time, and then we check the difference in the metrics between treatment and control; any external factors will have affected both groups, so that's OK. We look for statistically significant differences between the metrics in the two groups (the kind of check sketched below), and if the differences are significant and positive, we apply those new prices to everybody.

And while this pricing process works well, we have identified some weaknesses, these three. First, a scalability problem, because this is a process that has to be managed by humans, by people, and that limits the amount of attention we can put into the problem. As the business grows in complexity, for example when we release more products, we have to divide the attention we put into maintaining prices across even more products and more cities, and that's a problem. Second, a reliability problem, because an experiment is a one-off process: it happens once, and then you use that learning in the future. Does it make any sense to learn something in January and then apply it in the summer, when maybe those results are not valid anymore? Then you have to experiment again, which links back to the first problem. And last but not least, each experiment is about offering several options, some options are worse than others, and by definition, by offering the worse options, you are losing money. So experiments have a cost.
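The significance check mentioned above can be sketched as a two-proportion z-test on a conversion-style metric. This is only an illustration of the idea; the talk does not say which test or which metrics Cabify actually uses, and the numbers below are made up.

```python
import math

def two_proportion_pvalue(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: control and treatment convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                      # pooled rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two normal tails

# Made-up numbers: control converts 1200/10000, treatment 1290/10000.
print(two_proportion_pvalue(1200, 10_000, 1290, 10_000))  # ~0.054: close, but not significant
```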
And what we thought in the data science team is that maybe we could do things better, that maybe there is a way to improve this process by creating something that is, first, automatic, so it reduces how many people have to be involved in the process; something that is always running; and something that is cheaper than just running experiments all the time.

Before we go into the solution we designed, I'm going to play a little game with you. I want you to imagine that you are in a casino in Las Vegas, and you have a slot machine in front of you, like in the old movies: you know, they are big, with a lot of lights and noise, and they have one big lever, one big arm that you pull down. But this is a special type of machine, because it has not one but two levers. And you have been told that one of the two has a higher payout than the other, so you are better off playing with one of the two levers, but you don't know which one is the good one. So what you do is maybe play a little with the left and the right, five times each. Maybe you put five coins into the left arm, and it gives you a prize two out of those five times, and the right arm only gives you a prize once. OK, that's all you know. What do you do now that you still have 50 coins left? Maybe some of you will say that the left arm looks better, because it has the higher average prize, something like that, so maybe some of you will put the money there. Or maybe some of you are not very confident that five tries is enough to discard the right arm; maybe it was just bad luck. So maybe you want to play a little more, left, right, left, right, see what happens, and then put all your money on whatever seems the best option. (There is a toy version of this game in code below.)

This little game is actually a real experiment that was done to try to understand how we humans face this exploration-exploitation trade-off: when you have a limited amount of resources, you have to decide when to explore to find out which the best option is, and when to exploit what you think is the best option. And if there is a field in machine learning that is good at managing uncertainty, at this exploration-exploitation trade-off, and at learning by trial and error, it is reinforcement learning. In fact, we are going to use one of the classical problems in reinforcement learning, the multi-armed bandit problem. The first question is: why the name? The name actually comes from the same setting, because "bandit" is just a funny way of naming a slot machine, since they steal your money, and "multi-armed" because these special machines have multiple arms to choose from. So what we have here is a sequential decision-making problem, sequential because you play coin by coin and you can think between rounds. You have a limited amount of resources, the coins you can play with. You have to choose between competing options, the two arms in this case, and one of them will give more rewards than the other; the rewards in this case are the prizes. And your objective is to maximize gain in the long run, that is, the accumulated reward: how many prizes you got by playing your money.

And this field was not created to play with slot machines, but for a more worthy cause. It was introduced by Thompson in 1933, and what he proposed was to reduce cruelty in clinical trials.
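Here is the toy version of the two-armed game promised above, with made-up payout probabilities that the player of course cannot see. Run it with different seeds and notice how easily five pulls per arm can point you at the wrong lever.

```python
import random

random.seed(42)
PAYOUT = {"left": 0.40, "right": 0.25}  # hidden from the player; made-up values

def pull(arm: str) -> int:
    """Play one coin on an arm; returns 1 if the machine pays a prize, else 0."""
    return 1 if random.random() < PAYOUT[arm] else 0

# Explore: five coins on each arm.
wins = {arm: sum(pull(arm) for _ in range(5)) for arm in PAYOUT}
print(wins)  # e.g. {'left': 2, 'right': 1}; is that enough evidence?

# Exploit: the remaining 50 coins on whichever arm looked best so far.
best = max(wins, key=wins.get)
print(best, sum(pull(best) for _ in range(50)), "prizes out of 50")
```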
Imagine that you want to run a clinical trial. What you have is a drug, for example, that you think is going to help some sick people. You take some of these sick people and randomly divide them between treatment and control. To the treatment group you give this drug that you hope is going to help them, but to the other half, the control group, you give a placebo, so nothing that helps. And what Thompson proposed was that, as the experiment advances and you gain more knowledge of whether the drug is actually helpful, maybe you can adapt the allocation: you put fewer people on the placebo and more people on the actual drug.

OK, so why, if this was introduced almost 90 years ago, is it such a hot topic right now? The reason is that with the rise of online platforms and online services, websites, apps, all of that, we now have the capability of offering very tailored experiences and getting immediate feedback about how those experiences work. To put another example, imagine that you were working in the advertising industry 50 years ago and you wanted to test two different ads. Maybe you had to put an ad on a billboard at the side of the road, leave it there for a week, come back after a week to change it, and then after some time try to understand which of the two options worked best at boosting the sales of your company. And that is very difficult, because attribution is difficult in traditional marketing. But nowadays we have the field of online advertising, where you can show different ads to different people and get immediate feedback about how those ads are working, because users can actually click on them. So you get that immediate feedback and use it immediately for the next ads you are going to show.

And the question here is: if we are also an online platform, and we also have this capability of offering tailored experiences, can we use these techniques to improve our pricing process? Of course, that is what we are going to do. Just to refresh how our pricing process works: remember that we show a price to users, and then they can decide if they like it, in which case the journey happens and we send a car to the pickup point, or, if they don't like the price, they can abandon the app and nothing happens; no money changes hands. So if we want to model our pricing process as a multi-armed bandit problem, we have to think: do we have a sequential decision-making problem? OK, we can make it sequential, just go request by request. Do we have a limited amount of resources? Actually, we do, because that's the demand: the amount of journeys that passengers want to make with us. So what is left is the set of competing options we can use.

And here what we did was to try to create something modular that works on top of what we already have. Remember that we are already offering prices. So what we did was define the concept of a factor: a scalar that multiplies prices up or down. For example, here this 0.99 will lower prices by 1%. The competing options are then the different factors we can use to multiply the original price. So, for example, in this case the original price was going to be 7 euros, but we used a factor lower than 1 to lower the price, and then showed the final price to the user, the final price, of course, not the factor. Then the user can decide whether to accept that price or walk away if it seems too expensive. We were very happy with this design, and we will use the factor as the reward.
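In code, the factor idea is tiny, which is part of its appeal. A sketch: the 7 euro journey and the 0.99 factor come from the talk's example, while the rounding is my assumption.

```python
def apply_factor(base_price: float, factor: float) -> float:
    """The user only ever sees the final price, never the factor itself."""
    return round(base_price * factor, 2)

def reward(factor: float, accepted: bool) -> float:
    """Bandit reward: the factor if the journey is ordered, 0 if abandoned."""
    return factor if accepted else 0.0

print(apply_factor(7.00, 0.99))                 # 6.93 EUR shown in the app
print(reward(0.99, True), reward(0.99, False))  # 0.99 0.0
```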
And this sounds a bit weird, because we could also use the price, but using the factor will simplify things later. So we use the factor: the higher the factor, the higher the reward, and the lower the factor, the lower the reward, of course. And in case the journey looks too expensive and is not accepted, the reward will be 0, because we will not get any money out of that request.

So now that we have modeled our pricing process as a multi-armed bandit problem, we want to leverage the different algorithms and strategies from this field. The first thing we need is a simulator, a way to play with these algorithms. And this simulator is actually really easy to implement: we just want to create fake responses from users, to simulate how users would respond to these factors. For that, we take a totally fake, made-up price-sensitivity function: the probability of acceptance, the probability of success, of a request with a given factor. Of course, this is a monotonically decreasing function, because the higher the factor, the lower the probability. And when we want to see how a user will respond, we just randomly draw from a Bernoulli distribution with that probability of success, which is the same as flipping a loaded coin with that probability of success.

So we have the simulator, and now we have to understand what we want to achieve: to maximize the cumulative reward, like with the slot machines. In this case, some factors are going to be better than others, because, for example, you can earn more money by doing more journeys, even if they are cheaper. The expected reward is just the probability of success multiplied by the factor itself, and because the probability is monotonically decreasing, this expected reward is unimodal: it has a maximum, and that maximum is the sweet spot we want to reach. So, for example, in this case, marked in green, we have a factor lower than one, meaning we should lower the price, that will return more revenue than the other factors. And we want to get there, of course, without knowing in advance how users respond to prices; we have to learn that from experience.

To do that, let's look at the different algorithms we can use; I will present two. The good thing about the multi-armed bandit problem is that all the algorithms are really easy to understand, really intuitive, and this is a good example. In the epsilon-greedy algorithm, each round you decide whether you want to explore or exploit. Most of the time, you exploit what seems to be the best option, and that best option is just the one with the highest average reward: the factor that has given you the most reward on average. You just store all the rewards you got, divide by the number of tries, and use the one with the highest average; that's the greedy part of the name. And sometimes, with a small probability epsilon, and that's the epsilon part, you do a random exploration round, which is just selecting one option, one factor, at random. In this case, imagine we only have three factors, so you select one of the three at random. And we like this algorithm; it's actually pretty powerful for its simplicity.
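A minimal end-to-end sketch of what was just described: the made-up price-sensitivity simulator plus an epsilon-greedy agent. The factor grid, the epsilon value, and the shape of the acceptance curve are all assumptions for illustration.

```python
import random

FACTORS = [0.95, 1.00, 1.05]   # assumed grid of three competing factors
EPSILON = 0.1                  # assumed exploration probability

def p_accept(factor: float) -> float:
    """Totally made-up, monotonically decreasing price-sensitivity curve."""
    return max(0.0, min(1.0, 1.6 - factor))

def simulate_user(factor: float) -> bool:
    """Fake user response: one Bernoulli draw, i.e. a loaded coin flip."""
    return random.random() < p_accept(factor)

totals = {f: 0.0 for f in FACTORS}   # accumulated reward per factor
tries = {f: 0 for f in FACTORS}      # times each factor was shown

def choose_factor() -> float:
    if random.random() < EPSILON or 0 in tries.values():
        return random.choice(FACTORS)  # explore: a factor at random
    # Exploit: the factor with the highest average reward so far.
    return max(FACTORS, key=lambda f: totals[f] / tries[f])

random.seed(1)
for _ in range(10_000):
    f = choose_factor()
    r = f if simulate_user(f) else 0.0   # reward: the factor, or 0 if abandoned
    totals[f] += r
    tries[f] += 1

for f in FACTORS:
    print(f, tries[f], round(totals[f] / tries[f], 3))
```

Most of the 10,000 requests end up on whichever factor looks best, but the fixed epsilon keeps spending a share of the traffic on purely random exploration forever.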
The thing is that we identified some drawbacks here. First, there is a hyperparameter, the epsilon, and if we want to have a system like this in every city, maybe every city has to have a different epsilon; that means a lot of hyperparameters, and that's a pain in the ass. Second, there is a constant exploration rate, epsilon, and that doesn't make much sense, because we should explore more at the beginning, when we know less, and as time passes we should explore less and less and exploit more. And third, even when we are exploring, we are doing random exploration: you have multiple options to explore, and even if one seems more promising than the others, the probability of picking each of them is the same. So we are not using that information in the problem.

One way to improve on this is another algorithm called Thompson sampling. Yes, this is the same Thompson from 1933. What we do is leverage Bayesian statistics to model our understanding of the probability of success of each of the factors. If we are modeling the response of users with a Bernoulli distribution, the flip of a coin, the conjugate prior is the beta distribution, which we can build by just counting successes and failures. I think this is easier to understand with an example. Imagine we have a factor we have not used before, which means zero successes and zero failures. The beta distribution then looks like this, a uniform distribution between zero and one, because we know nothing about the probability of success; we don't have any feedback. As time passes, maybe we try this factor three times and get two successes and one failure. Now we know that the probability of success is maybe around two thirds, but we are not so sure, and that uncertainty is modeled in the width of the distribution. And as we try more and more times, we see how the distribution gets thinner and thinner, because we are more and more sure about what the real probability of success is.

And how do we use that to make decisions? It's actually really easy. Imagine, again, that we have three factors to choose from. What we do is count the number of successes and failures for each of these factors, so we can build a beta distribution for each of them. Then, each round, we randomly draw a probability of success from each beta distribution, which, remember, can be wider or thinner, and because it is a random draw, it will be different every round. Once we have the probabilities of success, we can build the expected reward, which is just the probability of success multiplied by the factor itself, and then, of course, we choose the factor with the highest expected reward. Maybe you are wondering if this is not the same as being greedy, like the epsilon-greedy. It actually is not, because in epsilon-greedy we keep a point estimate of the expected reward, just a number, while here we model our belief about the probability of success with that distribution, so we have a distribution over the expected reward. And because we are randomly drawing from this distribution, sometimes we will get one number and sometimes another, and that balances exploration and exploitation.
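The same simulator with Thompson sampling instead. Beta(1 + successes, 1 + failures) is the standard posterior for Bernoulli feedback under a uniform prior; the factor grid and the acceptance curve are the same assumptions as before.

```python
import random

FACTORS = [0.95, 1.00, 1.05]          # assumed grid, as before
successes = {f: 0 for f in FACTORS}   # accepted journeys per factor
failures = {f: 0 for f in FACTORS}    # abandoned requests per factor

def p_accept(factor: float) -> float:
    """Made-up price-sensitivity curve, hidden from the agent."""
    return max(0.0, min(1.0, 1.6 - factor))

random.seed(1)
for _ in range(10_000):
    # Draw one plausible acceptance probability per factor from its
    # Beta(1 + successes, 1 + failures) posterior...
    sampled = {f: random.betavariate(1 + successes[f], 1 + failures[f])
               for f in FACTORS}
    # ...then act greedily on the *sampled* expected reward, p * factor.
    f = max(FACTORS, key=lambda x: sampled[x] * x)
    if random.random() < p_accept(f):
        successes[f] += 1
    else:
        failures[f] += 1

for f in FACTORS:
    n = successes[f] + failures[f]
    print(f, n, round(successes[f] / max(n, 1), 3))
```

Note there is no epsilon anywhere: early on the posteriors are wide, so the sampled probabilities jump around and every factor gets tried; as the counts grow, the posteriors thin out and the draws concentrate on the best factor.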
Indeed, this has three nice properties. One: there are no hyperparameters at all, which is really cool. Two: because the beta distribution is wider at the beginning and thinner at the end, we explore more at the beginning and less and less as time passes and we get a better understanding of how each factor works. And three: even when we are exploring (and we don't really have that separate concept of exploring versus exploiting here, it's more of a gradual thing), we will try the options that look best more often. So if there is a factor that we already know is the worst, we will not waste more tries on it.

And this is actually the system we ended up building. Of course, we needed a name for it, and the obvious choice was Optimus Price, because we are optimizing our prices; I hope I don't have to explain that one. Just to wrap up the idea: we wanted to optimize driver earnings, and we needed something that was always running and was an automatic system, and we got to that point using Thompson sampling, the algorithm we chose. And here we did two more modifications; I have a little bit of time, so I will be fast here. The first is what we call hill climbing. There is a natural structure in this problem: if you don't like a price, you won't like more expensive prices either, and if you like a price, you will also like cheaper prices. That structure is not used in the basic problem, because each factor is treated as totally independent. We can use that information by keeping a window of which factors we consider: only the last factor used and its neighbors. As we get close to the maximum of the expected reward, the window moves along with us, and we don't have to consider all the potential factors, which may be 20 or 30. The second is discounted rewards, which is just a way of giving more importance to new data than to old data, and that is important if you have a system that is going to be running for a long time. (Both tweaks are sketched in code below.)

And what do we do when we have a system like this? Of course, we again have to test how it works, because the only way to see if things are working well or not is a randomized controlled experiment. First you have to move all this to production, and we have an awesome engineering team that did that. Actually, if you think about it, it's not a complex system: you only need some databases to count successes and failures, a statistics library to build the beta distributions, and then some heuristics. It's not crazy complex. So we created a randomized controlled experiment, a test where the control group was just business as usual and the treatment group got its prices modified by the system, kind of an experiment inside an experiment. And the results were actually pretty good: we increased driver earnings by around 1%. And just a little point: we did it in cities where we were fairly confident that things were already working OK, so it should actually have been hard for the system to improve things there, and still we managed to increase driver earnings. And we did it most of the time by lowering prices, which I personally love; it's kind of a win-win situation. Remember that we did it by creating something that is automatic and intended to run all the time, to be always on. But some of the cities had worse results than expected.
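Before looking at what went wrong, here are the two tweaks sketched as small changes to the Thompson loop above. The window of one neighbor on each side and the discount value are my assumptions; the talk does not give the real parameters.

```python
GAMMA = 0.999  # assumed per-round discount: old evidence slowly fades

def candidate_window(factors: list, last_index: int) -> list:
    """'Hill climbing' window: consider only the last factor used and its
    neighbors, instead of all 20 or 30 factors in the grid."""
    lo = max(0, last_index - 1)
    hi = min(len(factors) - 1, last_index + 1)
    return factors[lo:hi + 1]

def record(successes: dict, failures: dict, factor: float, accepted: bool) -> None:
    """Discounted update: shrink every count a little, then add the new
    observation, so recent data weighs more than old data."""
    for f in successes:
        successes[f] *= GAMMA
        failures[f] *= GAMMA
    if accepted:
        successes[factor] += 1
    else:
        failures[factor] += 1
```

The discount is what makes an always-on system reasonable to leave running: if price sensitivity drifts with, say, the season, old counts decay and the bandit can change its mind.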
So we took a look at the data, and we found that the cities performing worse were the ones where prices were increased. We looked deeper and found that in those cities the treatment group had less activity on the platform than the control group, and that was because of lower retention. If you think about it, it makes total sense: the effect of prices on decisions is not only immediate. When you show a higher price, you are also reducing the probability that somebody will come back later in the future to use you again. So we had created something pretty myopic, something that only considers the short term, and we had to include that long-term information in our problem. And I personally also love this finding, because it seems that lowering prices is even better than we expected.

This is actually the last slide. The status right now is that we have included this information by modeling how price impacts retention and putting that into the reward function. The rewards are about the benefit that each option, each factor, gives you, so what we do is consider not only the current journey that is happening but also the potential future impact on journeys that haven't happened yet. By modifying the reward function, we can include that information. And we are going to test it again, of course, with randomized controlled experiments, as with everything. And, just to be very fast, thinking about the future roadmap, a really cool feature of bandits is that there is this thing called contextual bandits, which lets you include more information in the decision-making process. For example, we can include the time of day in the decision, so we have a different response for each time of day. And that's all. I hope you liked the talk, and maybe it has inspired you to use reinforcement learning for something more than playing video games; that is really cool, but it's not all it's good for.

A fantastic talk. I'm a big fan of Cabify, and I'm already a big fan of price optimization, so you didn't have to do much to win me over. I'm wondering about your customers in general, though. Is there a certain amount of confusion created by having changing prices? Even perhaps when prices are cheaper, doesn't that create a certain amount of confusion? And do you see your role here as educating the market to a certain degree?

Yeah, that's a very good question. Actually, these factors are not easy to notice as changing prices, because every journey is different; not every journey is equal, and it's difficult to compare my journey with your journey when we are going to different places. But it is true that changing prices can sometimes be confusing. We already have some systems, like high-demand surcharges and things like that, that change prices. But yeah, one of the things we have left is to research how users perceive these changes. Actually, the system should converge at some point, so after some time we should keep things fairly constant. But yeah, something we should do is use our user research team to ask users how they feel about it.

Right. And how do your competitors feel about what you're doing? There's that small company, you may have heard of them, the one that begins with U. I can't remember their name, but they're a competitor of yours.

Uber?
How do they feel about what you're doing, and are they following similar lines at all?

Yeah, usually we are in a world where our competitors are bigger than us, and that's actually cool, because we can learn from many things they do. They do things similar to this. For example, with Uber you never really know why they give you a particular price, and that price is changing too. So we assume they have a system like this one, but of course we are not sure about what they have.

Right, OK. We've got some questions that have come in here, so let me fire some of these at you. The question is: how do you handle the opportunity cost? Once you have assigned a driver, you can't use them again until they're available again; maybe there could be a better option a few seconds later. That's the question.

That's a different question; that's a different system. Here we are focusing just on pricing, and pricing is a really, really small part of the funnel: you ask for a price, and then you get the price. Of course, the marketplace is way more complex, and another very big part of it is the matching system. That is, when you match a driver to a passenger, maybe you wait a little longer to find a better driver, or maybe you match them straight away. That's a totally different talk, but there are systems like that, systems that consider the opportunity cost of those assignments.

OK, you've got a lot of fans here, several comments congratulating you on your talk. Another question coming in: I was wondering, if you are setting your factors, 0.99, 1, 1.01, et cetera, with respect to the optimal price and you obtain a factor different from 1, doesn't that mean that your original pricing wasn't optimal in the first place?

Yeah, that's a very good question. And a related question is, for example, what happens if the base price changes: are these factors not useful anymore? That is something that we haven't dealt with yet; the moment we move this to production, beyond just testing, we will have to get that right. But yeah, it seems like these factors are only useful while the base price stays the same. I don't know if that answers the question.

Well, another question here that I like very much: how long did it take you to develop version one of Optimus Price?

Yeah, very good question. Most of the work here was from the data science team, because what we did was define the problem, define the potential solutions, and play with the simulator, so play with just fake data. That was maybe three months of work, and not full time for everybody; I don't know exactly how much it cost. Then is when we involved other teams: the engineering team gets involved when we have something we want to test, and they have the big responsibility of moving that to production, which is sometimes hard, because these are real-time systems. So we did most of the work first, and then the engineering team moved it to production and helped us create the experiments.

Great, final question, Tony. I think you have another fan here; maybe someone is potentially going to send you a CV. How do you guys find the time to develop and share such awesome ideas? It seems like you've got a great team there.

Yeah, this is perfect for the recruitment advertisement: it is an awesome team. And as I said in the first question, we are smaller than other companies, and that means there are a lot of opportunities to do things like this, because not everything is done yet.
We don't have 100 data scientists. So part of our work is helping other teams, the everyday job that maybe is not as cool, but we also have the liberty to create stuff like this. So if we have an idea like this, we can manage to get the time and work on it.

OK, that's great. So yeah, if you want to send some CVs, that's awesome; I'm sure people will be getting in touch with you via the networking panel on the website. So, Tony Prada, thank you so much for that talk. That was fantastic.

Thanks to you.

So we're going to take a short break now, and we'll be back in the garage in five minutes for our next talk. Don't go away.