Jesús, welcome. Jesús, a pleasure having you here, and on this fabulous set, you can't complain? No, of course not. So, all yours.

Thank you very much, hello everybody. Thank you for that very nice introduction, and thanks to the organization for inviting me to be here with all of you today. A little bit about myself, very quickly: I'm Jesús Montes, I've been a data scientist for roughly the last 15 years, working in and out of academia on different projects during that time. Nowadays I am a professor at UPM, the Technical University of Madrid, and, as Elena said, I am also the principal data scientist at Cabify.

So what is Cabify? For those of you who don't know, Cabify was born approximately 10 years ago as a Spanish ride-hailing company. Nowadays it's a multi-mobility company with more than 33 million users worldwide and more than 400,000 drivers moving people all around the world. 65,000 companies operate with us every day, and we have a staff of roughly 900 people, 50% of whom are women, which is always nice, of course. Cabify operates in eight countries, in Spain and Latin America, and in more than 40 cities, which means we have a very heterogeneous marketplace: different regions, different areas, cities of different sizes, different types of cars, clients, etc. It's a very complex environment.

But let's start talking a little about what Cabify is from the point of view of our users, the people who use our application. The first and most common type of user (probably most of you who have used a ride-hailing app are part of this group) is the passenger: someone who wants to find a car to take them from one point to another. From the passenger's point of view, Cabify is an app that lets you find a driver to take you to your destination. Your main objective is to get there quickly, so you want the driver to reach you as soon as possible and take you to your destination quickly and efficiently. And there's always the bonus of a good price: if the fare applied to your trip is not very expensive, that's a better scenario.

On the other side we have the other type of user, the drivers. These are workers driving around cities with their cars, looking for passengers who want to use their services. Some of their objectives are very similar to the passenger's: they want to locate people who want to move around, and they want passengers who are close to them, so they don't have to drive long distances to pick them up. But when it comes to price, the objective is the opposite: these are workers trying to make a profit, so they want good journeys, and the more expensive, the better.

So these are our two main users, and Cabify sits in the middle. What do we see? We see a complex marketplace, with a lot of passengers and a lot of drivers looking for each other at the same time. And the question we try to answer here is: can we make everybody happy?
Can we find a way to assign the best driver to each passenger looking for a ride? As I said, there are two forces at play here, and some of their objectives are similar: they both want to find someone from the other group who is close to them. But when it comes to price, the objectives differ. So we have to find an equilibrium, a way to reach the best situation for all of them. This is what we call matching: finding a match between passengers and drivers.

And finding the right match is hard; it's not something you can do easily. From a mathematical point of view, this is what is called an assignment problem: you have two sets of elements forming what is called a bipartite graph, you assign each element of one set to an element of the other, and you look for the optimal assignment. This is based on what is called a cost function: a number, a way to measure how good a specific assignment between, in this case, a passenger and a driver is. And of course this is a problem that cannot be solved by brute force; with n passengers and n drivers there are n! (n factorial) possible assignments, so we have to find smarter ways to do it. Depending on the technology and the techniques we use, the approach will be different, but at the core of everything is always the cost function: how do we measure how good a specific assignment is, so that we can compare assignments and pick the best one?
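To make the assignment problem concrete, here is a minimal sketch of the matching stage (not Cabify's actual matcher) using SciPy's solver for the classic assignment problem, with a toy cost matrix in which the cost of each passenger-driver pair is simply its ETA in seconds:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: cost[i][j] = ETA in seconds for driver j
# to reach passenger i. In practice the cost function can combine the
# ETA with waiting time and other marketplace signals.
cost = np.array([
    [300, 120, 540],
    [ 90, 480, 210],
    [600, 150,  60],
])

# Find the passenger-driver assignment with minimal total cost,
# without enumerating all n! possibilities.
passengers, drivers = linear_sum_assignment(cost)
for p, d in zip(passengers, drivers):
    print(f"passenger {p} -> driver {d} (ETA {cost[p, d]} s)")
print("total cost:", cost[passengers, drivers].sum(), "s")
```

The candidate-selection stage described below is what keeps matrices like this small enough to solve within a tight latency budget.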
We do that based on the data we have. And what kind of data does a company like Cabify have when we're trying to make this assignment, this matching? Typically we do it in two stages. In the first stage, we select viable candidates. Say we have a passenger looking for a driver to take him or her somewhere: we have to find the feasible candidates for that passenger, and that means looking at the data we have and making a decision. We can use many different pieces of information for this, and one of the most important is how far the car is from the passenger, meaning how long it will take that car to drive to where the passenger is. That's what is called the ETA, the estimated time of arrival. Once we've done this first filtering of candidates, we have to pick the best one, and again we look at the data we have. We can use many different signals, such as how much time the passenger has been waiting, but one of the most important factors, if not the most important one, is how long it would take a specific car to get to where the passenger is waiting. So again, the ETA is the key factor here. We have to find a way to calculate this ETA, and of course it's something we cannot know in advance, because the car hasn't moved yet; we cannot time it and see how long it takes. We have to predict it. That's why it's called an estimated time of arrival. So how can we do this estimation?

Okay, before we start thinking about ways of doing this estimation, we have to set some bounds on the problem and understand what we require from it. The most important thing is that this is not an occasional calculation; we do it a lot. In a typical day of operation, Cabify makes more than 150 million of these estimations worldwide. So first, we need something that gives us an estimation very fast, because if we don't have the answer fast, we won't be able to do the matching fast. Second, the estimation needs to be accurate, a good estimation of what is actually going to happen: if the car is going to take two minutes and we say 60 minutes, or we say five seconds, that's not acceptable. And third, we have to do it reliably: the service that calculates this estimation has to work all the time and never crash, because if we lose this service, we lose the capacity to do the matching.

With all that in mind, there are three alternatives for doing this estimation. The first is to apply a simple solution: take something like a linear distance or a haversine distance, something really easy to calculate, use that, and see what happens. The second alternative is to rely on an external provider. There are many companies around the world that offer this kind of service, Google Maps for example; you can hire the service, send requests to it and get results. And the third option is to build an ETA estimator, an ETA predictor, ourselves: if we have the technology and the knowledge, we can create our own ETA calculator instead of relying on someone else's.

Let's consider the pros and cons of each of these alternatives. With the first one, the simple solution, the main pro is that it's very easy to build and maintain. From a purely technological point of view it's a very small piece of code, so it's easy to write, easy to maintain and, deployed correctly, very difficult to crash. Those are all good things. On the other hand, there's the question of how accurate the estimation is, and relying on such a simplified calculation, the estimation is not going to be very good. Still, it has the pros I just mentioned, which we have to weigh as well.
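For reference, the haversine distance mentioned as the "simple solution" is only a few lines of code. A minimal sketch, including a naive ETA derived from an assumed average urban speed (the 20 km/h figure is an illustration, not a Cabify number):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

dist = haversine_km(40.4168, -3.7038, 40.4530, -3.6883)  # two points in Madrid
print(f"{dist:.2f} km, naive ETA {dist / 20 * 3600:.0f} s")  # assume 20 km/h
```

Cheap and robust, but blind to streets, traffic and one-way restrictions, which is exactly the accuracy problem just described.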
With the second alternative, relying on a service provider, we get all the benefits of the accuracy and reliability of a service that has been tested and is backed by a big company like Google: we know it's going to be reasonably accurate, we know it's not going to crash, et cetera. But on the other hand, the problem with this alternative is that we would be relying on an external provider for one of our core functionalities. At the core of what we do is assigning passengers to drivers, and if we rely on an external provider for that, we create a dependency, and that dependency can become vendor lock-in, which limits our options. That's a potentially big problem.

The third alternative is to develop our own solution. That's good because we control the technology, and once the technology is running, the costs are much lower: with an external provider you don't just integrate the service, you also pay a monthly bill, and with our own technology we don't have that problem. On the other hand, the big question is: can we do it? Is it actually possible to create something like this and have it run at a level of performance similar to what a good external provider offers? That was the challenge we faced: can we build our own service for calculating ETAs, and can we beat the best commercial provider with it? That's not an easy question to answer, and it's what we have been trying to do for a couple of years now.

So with this challenge in our heads, we developed what we call Cabify Maps, our own ETA estimator. Cabify Maps is an artificial neural network, a deep learning model, that takes several inputs, basically the location of the car (the origin), the destination it wants to reach, and the date and time. It does all the data processing, it does the prediction, and it gives me the time in seconds: how long my driver is going to take to drive from point A to point B. And we wanted to do that with a couple of additional restrictions. First, we wanted to use exclusively our own data, without relying on additional external sources. There are several reasons for this. The first is that we don't want to depend on any external source, so we do it with what we have. And we're actually in a very particular situation here, because we are one of the few companies in the world that can do such a thing: we have the data. We monitor the position of our cars at every moment, because we have to know where the drivers are so we can assign passengers to them, so we can see in real time how our drivers move around more than 40 cities worldwide. We have the data; the question is whether we can use it to create the model. And second, this is something that has to evolve. It's not a model you deploy once and then sit back happy: you have to retrain, re-adapt, fine-tune, et cetera. So it has to be a piece of technology that can be adapted and improved over time.
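Seen from the outside, the predictor described above reduces to a very small contract. A purely hypothetical sketch of that interface (the names and types are illustrative, not Cabify's API):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EtaRequest:
    origin_lat: float   # where the car currently is
    origin_lng: float
    dest_lat: float     # where it needs to go (e.g. the pickup point)
    dest_lng: float
    when: datetime      # date and time of the request

def estimate_eta_seconds(req: EtaRequest) -> float:
    """Run feature processing and the trained per-city deep learning
    model, returning the predicted driving time in seconds."""
    ...  # feature engineering + model inference would go here
```

Everything that follows in the talk is about what happens behind that `...`.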
So how did we build Cabify Maps? We started in late 2018, as a sort of hackathon project: in a couple of days, a couple of people from the Data Science department very quickly hacked together a proof of concept, and it showed a lot of promise, so we decided to keep working on it and improve it. From that original hacked version to today, we have been adding lots and lots of features to Cabify Maps, making it better every month. So let's talk about those steps.

The first step was a very basic attempt. We took just the coordinates of the origin and destination, the date, and a couple of really easy-to-calculate values, like the haversine and Manhattan distances between the points, and we fed that into our deep learning network, trained it and tested it. The results were not good, which is probably no surprise to any of you. Specifically, depending on the city, we got a mean absolute error between 150 and 250 seconds, from two and a half minutes to a bit more than four minutes, which is well above the performance commercial providers deliver nowadays, around two minutes, around 120 seconds of error. Of course, this wasn't the final result; it was just the first step. It served to set the baseline: the data sources we needed, the data processes we had to build, and the framework for creating the model. Using this as a foundation, we started adding layers and functionality to make it grow.
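As an illustration of that first basic attempt, here is a minimal sketch in the same spirit: raw coordinates plus a couple of crude distance features fed into a small network. The exact feature set, layer sizes and distance approximations are assumptions, not the actual Cabify Maps baseline:

```python
import math
import tensorflow as tf

def basic_features(o_lat, o_lng, d_lat, d_lng, hour):
    """Origin/destination coordinates plus simple derived distances."""
    km_per_deg = 111.0  # rough km per degree of latitude
    dy = (d_lat - o_lat) * km_per_deg
    dx = (d_lng - o_lng) * km_per_deg * math.cos(math.radians((o_lat + d_lat) / 2))
    straight = math.hypot(dx, dy)   # straight-line (haversine-like) distance
    manhattan = abs(dx) + abs(dy)   # grid-style distance
    return [o_lat, o_lng, d_lat, d_lng, straight, manhattan, hour]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted ETA in seconds
])
model.compile(optimizer="adam", loss="mean_absolute_error")
# model.fit(X, y, ...) would then train on historical trips whose real
# durations the fleet actually observed.
```

With nothing but geometry and the clock as inputs, an error in the 150-250 second range is about what one would expect.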
So how did we make it grow? It's obvious, if you think about it, that a model this simple is not going to work well, and the most important thing we know is that location matters. What do I mean by that? Where a car is located when it's moving is really important. Being in a busy neighborhood at rush hour while it's raining is not the same as being in the middle of the countryside. Where the car is, what time it is, and the geographical characteristics of its surroundings have a lot of impact. If we had only one city, or a very small region, we could handle this by hand: we could map each one of the streets and tune our model to them. But that doesn't scale; we cannot pretend to do that for 40-plus cities around the world and keep it updated over time. So we had to find scalable ways to make the fact that location matters present in our model. The idea was to provide some geographic context, information about the geography of the place where the cars are located, so the model can use it to make better predictions. And when considering how to provide this geographical context, there are three things to take into account.

The first is what kind of geographic information we want to include in the model. Of course, there are things like traffic or weather that would obviously benefit the model, but remember that we're working in a scenario with hundreds of millions of estimations per day, and a service that cannot crash at any time, because if it crashes, our core business crashes. Including this kind of information means adding external sources to our system, and those sources have to be comparable between cities: a service providing traffic information for Madrid has to be about as good as one for Buenos Aires or Mexico D.F. That's not an easy task, and the same goes for the weather or any other source. On the other hand, you could argue that we already have that information implicitly, because we know how our drivers behave, and we can infer the conditions around them from how they drive. If it takes them too long to cover a certain distance, maybe something is slowing them down: heavy traffic, bad weather, something like that. So the information is there; the question is how to use the data we already have to infer it (the sketch below illustrates the idea).

The second thing to consider is how we aggregate the information. There are geographical elements that are relevant only when they are very close to a car: if I'm in the middle of a road and there's a roadblock, that's relevant to me, but it's not relevant to a car a few streets away; it's an element that depends very strongly on where I'm located. On the other hand, if I'm in a residential neighborhood, certain aspects of how people drive are going to be constant across the whole neighborhood. So different levels of scale apply to different sources of information, and we have to aggregate the information intelligently depending on the kind of information we want to incorporate into the model.

And the third is how we detect what's relevant in each case. Depending on the scenario, the city, the moment, what is relevant changes. At a particular moment, being close to a sports center could be irrelevant, if nothing is happening there, but at a different time it could matter a lot. These things are not easy to do at scale; knowing what's relevant and which geographic information is important in each case is not an easy task.
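As a toy illustration of the "infer conditions from the fleet itself" idea (not Cabify's pipeline), one could compare recent average speeds per spatial cell against each cell's historical average; cells running far below their usual speed are probably congested or affected by weather:

```python
from collections import defaultdict

# Hypothetical GPS-derived observations from the last few minutes:
# (cell_id, observed speed in km/h).
recent = [("cell_a", 12.0), ("cell_a", 9.5), ("cell_b", 38.0), ("cell_b", 41.0)]

# Historical average speed per cell, precomputed from months of trips.
historical = {"cell_a": 24.0, "cell_b": 40.0}

sums, counts = defaultdict(float), defaultdict(int)
for cell, speed in recent:
    sums[cell] += speed
    counts[cell] += 1

for cell in sums:
    now = sums[cell] / counts[cell]
    ratio = now / historical[cell]
    status = "slowed down (traffic? weather?)" if ratio < 0.6 else "normal"
    print(f"{cell}: {now:.1f} km/h vs usual {historical[cell]:.1f} km/h -> {status}")
```

The signal never names "traffic" or "rain"; it simply reflects how the drivers are actually moving, which is the point made above.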
So we had these three things in mind, and we created ways to incorporate all of them into our model, using two techniques that I'm going to tell you about right now. The first one: we moved from a very basic model based on raw coordinates to a model based on spatial index cells. Spatial indexes are a way of organizing geographical space: you divide the entire surface of the Earth into cells (hexagons or squares, depending on the index you use), you can control the size of those cells, and you can aggregate information inside them. In other words, you can work at different levels of resolution, so to speak. On the slide you can see two examples, H3 and S2, which are the two alternatives we have been using over the years.

Once we have cells, the next thing we want is to represent the information and determine what's relevant, and for that we use what are called cell embeddings. For those of you not familiar with the term, an embedding is a tool typically used in machine learning when you have a categorical variable and want to incorporate the semantics of that variable into the model. To explain what an embedding is, look at the example I'm showing in the graph, the classical case of computing word embeddings. In this case the variable is all the words in the English language, and what I want is to map the meaning of those words, their semantics, to coordinates in a multi-dimensional space, so that the position of each word in that space is somehow representative of its meaning. You can see the very typical example on the slide: I take the word "king", subtract the word "man", add the word "woman", and I get the word "queen". That's the kind of mapping between semantics and spatial coordinates we're after.

So how can we do this with spatial index cells? We don't have words here; we have the cells of a city, tiles of the world. There are different ways of doing this that we have explored over the years, and you can see the two main ones here. The first, on the left side of the slide, is the first approach we used, based on the word2vec algorithm. word2vec is one of the classical algorithms for generating word embeddings: it works by analyzing sentences in a language, the words in those sentences and their positions, and learning what each word means from its relationship with the rest of the words in the sentence. But we don't have words or sentences; we have cells, a completely different scenario. What we do have are cars moving around the city, each following a certain path, so we can use that path as if it were a sentence, with the sequence of cells the car crosses as its words. We turned the paths of our cars into sentences, so to speak, sequences of cell identifiers, and we trained word2vec using that.
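A minimal sketch of that path-to-sentence idea, assuming the h3 v4 Python API and gensim's Word2Vec; the cell resolution and hyperparameters are illustrative, not Cabify's settings:

```python
import h3
from gensim.models import Word2Vec

def trip_to_sentence(gps_points, resolution=9):
    """Turn a sequence of (lat, lng) GPS pings into a 'sentence' of
    H3 cell IDs, dropping consecutive duplicates."""
    cells = [h3.latlng_to_cell(lat, lng, resolution) for lat, lng in gps_points]
    return [c for i, c in enumerate(cells) if i == 0 or c != cells[i - 1]]

# In reality there would be millions of trips; here, one toy trip in Madrid.
trips = [[(40.4168, -3.7038), (40.4200, -3.7000), (40.4260, -3.6950)]]
sentences = [trip_to_sentence(t) for t in trips]

# Skip-gram word2vec over cell-ID 'sentences': cells that co-occur along
# drivers' paths end up with nearby embedding vectors.
w2v = Word2Vec(sentences, vector_size=64, window=5, min_count=1, sg=1)
cell = sentences[0][0]
print(w2v.wv[cell][:5])           # the cell's embedding vector
print(w2v.wv.most_similar(cell))  # cells 'driven like' this one
```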
And the result is that we get embeddings for those cells. Of course, these embeddings don't contain "meaning", because these are not words, but they contain information about the structure of the city: how cars move through it, which streets are faster and which are slower, which routes drivers prefer, the direction of the streets. All of that is implicit in the embeddings, even though we never gave the model any of it explicitly; it just looked at how drivers move around the city and produced the embeddings.

The second alternative is to use what is called an embedding layer in a deep learning network. An embedding layer is a specialized layer you can put into a deep learning model, and what it does is train an embedding using the final output of the model. So in this case, instead of training on the cell-ID sequences of the paths, we train the embeddings against the final output of our model, which is the ETA. We get a model that learns the relationships between the different cells of the city with respect to the kinds of trips made from those places, how long those trips take, and so on.

You can see here two examples of this second alternative. These are two images of the city of Madrid. The cell colored in yellow is the target, the cell we are analyzing, and the rest of the cells are colored depending on how similar they are to it: the darker the color, the more similar. In the image on the left you can see that, just by looking at how people drive around those cells, the model has learned the structure of the city. For those of you who know Madrid, there is a very big avenue called the Castellana that crosses the city from north to south; it acts like a geographical barrier and divides the city into two areas with different behavior. You can see on the left that the model has learned exactly that: the cells similar to the one we selected are almost all on the same side of the city, with very few on the other side. In the second image, on the right side of the slide, I picked the Puerta del Sol, the historic center of the city, and the similar cells (similar according to how our drivers move) turn out to be the rest of Madrid's old town: streets that are very narrow, most of them one-way, and generally very difficult to drive around. So without explicitly telling our model anything about the city of Madrid, it has learned the structure of our neighborhoods and how people drive around them. This is what we use as the input to our model.
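A minimal sketch of the embedding-layer alternative (a hypothetical architecture, not the actual Cabify Maps network), where the origin and destination cell IDs pass through an embedding trained end-to-end against the ETA target:

```python
import tensorflow as tf

N_CELLS = 50_000  # hypothetical number of cells covering one city
EMB_DIM = 32

cells_in = tf.keras.layers.Input(shape=(2,), dtype="int32", name="cells")  # [origin, destination]
extra_in = tf.keras.layers.Input(shape=(5,), name="extra")  # distances, time-of-day features...

# The embedding is learned jointly with the ETA objective, so cells with
# similar trip behavior end up with similar vectors.
emb = tf.keras.layers.Embedding(N_CELLS, EMB_DIM, name="cell_embedding")(cells_in)
x = tf.keras.layers.Concatenate()([tf.keras.layers.Flatten()(emb), extra_in])
x = tf.keras.layers.Dense(128, activation="relu")(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
eta = tf.keras.layers.Dense(1, name="eta_seconds")(x)

model = tf.keras.Model([cells_in, extra_in], eta)
model.compile(optimizer="adam", loss="mean_absolute_error")

# After training, per-cell vectors like the ones behind the Madrid
# similarity maps can be read out of the layer's weight matrix:
cell_vectors = model.get_layer("cell_embedding").get_weights()[0]  # (N_CELLS, EMB_DIM)
```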
So once we have that, and all the other things I've mentioned, we train our models. We don't have one single model; we have one model per city, which makes sense, because what happens in Buenos Aires is probably not relevant to what's happening in Madrid, and vice versa. We select our data intelligently so we can build a good model, we add all the typical improvements you would apply to an advanced deep learning model, and with that we train the models and use them to predict the ETA.

So how good are these models? That's the question. Here you see some results of our analysis. These numbers come from the last retrain of the model, which was only a couple of weeks ago. The image in the center is the mean absolute error of the model per city, and as you can see, basically all but one of our cities are below 150 seconds, around 120 seconds, which was the objective we had. On the right side you see the comparison between our model and the best commercial provider we use nowadays, and the good news is that in 60% of our cities our results are similar to or better than that provider's. So it's actually a very good thing to use this kind of model, because in most scenarios we get results equivalent to the commercial provider, or even better, and the average error we have, as I said, is around two minutes, roughly the same number we would get if we were using a commercial provider.

Once we have the model, of course, we have to deploy it and everything else; the work doesn't end with building the model. We have to monitor its performance, retrain it periodically, et cetera. For this we take advantage of state-of-the-art technologies, TensorFlow and so on, to export the model and deploy it quickly, and we have a lot of monitoring tools and performance metrics, so we can see how our models are behaving, detect when things are not going as we hoped, and retrain and re-adapt.
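As a sketch of that deploy-and-monitor loop (paths, thresholds and the retraining rule are hypothetical, not Cabify's infrastructure), one could export the trained network as a TensorFlow SavedModel for serving and watch a rolling error over completed trips:

```python
import numpy as np
import tensorflow as tf

# Export the trained Keras model (e.g. the one sketched earlier) so it can
# be served, for instance behind TensorFlow Serving. Path is illustrative.
tf.saved_model.save(model, "/models/eta/madrid/v20240101")

def needs_retrain(predicted_s, actual_s, threshold_s=150.0):
    """Compare predictions against the durations actually observed once
    trips complete; flag the city's model when the rolling MAE degrades."""
    mae = float(np.mean(np.abs(np.asarray(predicted_s) - np.asarray(actual_s))))
    return mae > threshold_s, mae

flag, mae = needs_retrain([230, 410, 95], [250, 380, 130])
print(f"rolling MAE {mae:.0f} s -> retrain: {flag}")
```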
So, some final remarks about what I've been telling you. As you can probably see, this is a very complex problem, and we have been able to tackle it successfully, both from the point of view of the model itself, the machine learning part, and from the technological part, so we can serve these ETAs in time and with the reliability we need. We found a way to incorporate something that was obviously necessary, location information, in a way that is also scalable and keeps the model under our control. We managed to do all this relying only on our own data, which is great because it removes a lot of dependencies on external sources. And finally, we have significantly reduced the cost of using an ETA predictor, and I'm talking about several million euros a year, without compromising the quality of the service we provide to our users. So, this is it. Thank you very much, and if you have any questions...

Thank you so much, Jesús! A round of applause for Jesús! That was fantastic! Come back over here with me; you're such a fast speaker! My god, I have to give him some water, because this man is going to die here on stage. Jesús! Well, first of all, I love that you started with "for those who don't know, let me explain what Cabify is." I mean, seriously, do we have to explain this? Have you been living in a cave for the past 10 years or so? Come on, very humble. I'm one of the best clients of Cabify; every time I get into a cab, I say this car is half mine, because I've been paying for it from the very beginning. Well, we have some questions and a few minutes.

They're asking you, very interesting: this model, to me, sounds too good to be true. More reliable, faster, cheaper... come on, are there any flaws? Well, we've been working on it for almost four years, so it's not just a fluke or a lucky shot.

They're asking: if you use deep learning to create the ETA, how do you get the route to display in the app? Because the deep learning model will not draw a route through the road network. OK, that's a very good question, yes. Exactly: Cabify Maps doesn't calculate the route, so what you see on the screen is not produced by this model; we have a different model for that. The reason is that drawing a route on the screen of your app is something you do once per journey, once per driver: once you hire a driver in the app, you get a route. But when you're doing the assignment, the matching, you have to do this calculation for many, many cars. The problem is different, and its restrictions and needs are different. So you can use one service just for drawing the route, and another service, with a different level of requirements, for doing all these very fast calculations internally.

You have so much fun at work, don't you? Great, there's another question: how do you account for the relevance of newer data in your model, instead of just training on historical data? Do you use an LSTM and... my god, too complicated! OK, so yeah, that's a very good question. We have experimented with that, but it's actually very complicated, because our marketplace changes every day: events happen in cities around the world every day, and different players appear in the marketplace from one day to the next, so it's very difficult to anticipate. Our approach so far has basically been to monitor how our models are performing and update them very quickly when things change, so we stay as close to the current situation as possible. But of course there are scenarios you cannot anticipate.

Drink. Drink some water, please. He's going to die here today. Come on, drink. Take it easy. We still have a couple of minutes. OK, you were saying? No, that's it. OK, well, I'm just checking; we have one more minute. You said this model is obviously improving, it's changing by the day; does this ever end? And what about the competition? Well, we cannot know what the competition is doing, but from what they publish, we understand that all of us players in this industry face the same kind of problems, and each of us has our own solutions for them. I would guess that some of the less mature players still rely on external providers for this kind of service, but I'm just guessing here; I can't know where the competition stands.

So, does this model have other uses, maybe eventually becoming a provider, even for the competition? An alliance, perhaps? I don't know, I'm just suggesting business models. Actually, I can't say; of course there's always that possibility, but you never know until it happens. Just in case, you can tell us something and nobody will know. Anyway, Jesús Montes, fascinating. Cabify, as always: every time Cabify is at an event, there's a lot of interest. So we'll stay tuned, and congratulations on the model. A big round of applause for Jesús Montes as we say goodbye to him.