Okay, it seems it's working now. First, thanks everyone for coming over here. It's the first time I'm in this huge room as a speaker; I tend to come here to watch movies, so it's kind of weird. I'm going to talk to you about how we use data at Cabify. If you are here, you probably know about Cabify. It is a company that has grown a lot lately; what everybody knows about us is that we are a ride-hailing company in Spain and Latin America, and we are one of the first, if not the first, Spanish unicorns. What not everybody knows, but I can share with you without getting fired, is that we play in one of the most competitive industries around, and the metric, since you are data people, is the billions in VC funding that mobility-as-a-service companies have raised in the last 10 years. All our tech is built in Madrid and São Paulo, and we grow exponentially for real. Since you are data people, you understand that a curve might merely look exponential, but when I plot ours on a logarithmic scale it is a straight line: 20 percent monthly growth sustained over five years. That is truly exponential, right?

Apart from having something that really looks like a business school case study, except that it is happening for real, the cool thing about this hypergrowth is that now we can start understanding the cities where we operate, and the best way to understand them is using our very own data. For example, here is a city; I think people in the audience can guess which one. The cool thing is that there is actually no map: I am just plotting Cabify drop-off locations, and as you can see, you can identify in the patterns where the streets are and where Retiro Park is.
This belt of black around the city is the highway, the M-30; obviously we don't drop passengers off there, that wouldn't be safe at all. It's kind of cool: I'm sure many people in the audience are familiar with the Facebook world friendship map, and this is the same kind of thing. We have so much data that the map draws itself.

This is the point I will be making during my entire talk, and because I am not really good at convincing people, I will tell you the same thing many times, so that by the end of the talk you will be convinced: because we are in a non-mature industry, we can get huge impact by applying simple solutions. To give a contrary example: if you are trying to use data science or machine learning to optimize the supply chain of the Airbus A380, you may get somewhere, but in the end you are looking at a problem that many smart people have been studying for the past 20 years, so the impact you have is probably incremental. Meanwhile, at Cabify and the other ride-hailing companies, we are facing problems that literally no one had ever faced until the last five years, because the problems themselves didn't exist.

Here is one of the things I see people doing more and more often. How many of you use Datadog in your companies? And how many of you use it for things it was not designed for? I am seeing this trend of hacking monitoring systems because they scale well. In principle, Datadog was designed for looking at systems, right? Computers: CPU, memory, whatever. We found that we could hack it and provide our operations teams with real-time information about how the city is performing. It was super nice: last month I was in São Paulo, and the operations teams were looking at the success rate, which is basically how much of the demand we are able to serve in real time, to decide which campaigns to send to drivers.
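To make the "hacked monitoring" idea concrete: Datadog's local agent, DogStatsD, accepts plain-text metric datagrams over UDP, so emitting a business gauge needs nothing beyond the standard library. This is a minimal sketch; the metric name `city.success_rate` and the tag are invented for illustration, not Cabify's actual metrics.

```python
import socket

def dogstatsd_payload(metric, value, tags):
    """Format a gauge in the plain-text DogStatsD wire format:
    metric:value|g|#tag1:v1,tag2:v2"""
    tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return f"{metric}:{value}|g|#{tag_str}"

def send_gauge(metric, value, tags, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a locally running DogStatsD agent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(dogstatsd_payload(metric, value, tags).encode(), (host, port))
    sock.close()

# Hypothetical business metric: share of demand served in a city right now.
print(dogstatsd_payload("city.success_rate", 0.93, {"city": "sao_paulo"}))
```

Because UDP is fire-and-forget, pushing a gauge every few seconds from the dispatch pipeline costs almost nothing, which is what makes a metrics system viable as a real-time business dashboard.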
Based on that, there was literally a row of computers with that dashboard, one I had assumed would be useless to the business team, and they were glued to it just like traders would be. So it's an example of something super simple that we are getting a lot of value from.

It's also cool that when you have that much data, you understand the city. Does anyone remember what happened in Mexico City on that date? There was an earthquake, right? Just by looking at our drivers' connectivity, you could see the exact minute of the earthquake, because the mobile network went down. As you see, we kept some drivers connected, but out of the blue you have built a seismograph. Because you are gathering so much information on so many people, you really can feel the big events, even with super simple tools. So again, the point of the talk: huge impact from simple solutions.

Another thing is dynamic pricing. All ride-hailing companies have dynamic pricing. We were one of the last to get there, and we did it because competition for drivers in the peak hours was hard. This is the first system we built; we now have something a bit more complicated, but the basics are the same.
Every minute, we look at how well we served the journeys of the last 20 minutes in different hexagons around the city (there you see a map). Basically, if many people were waiting for a long time, we raise the price; if there aren't as many people as there used to be, or the trend is going down, we lower it. So we are just nowcasting; we don't have a forecasting stage, and we are using a feedback loop where the user is part of the feedback: if we went too high in price, users would stop ordering, and therefore scarcity would go down. We just do that, and then we apply some smoothing, temporal and spatial, so that you don't feel like you are in a casino when you are ordering.

Just by using this super simple system (these are the rides with a high-demand supplement): on November 14th last year we launched the pilot in Bogotá, and 23 days later we were in full production in all of Cabify's cities. For such a simple system, it took about 17 days to generate one million dollars of additional revenue for the drivers, which was the aim of the project: to increase driver earnings by 15 to 20 percent during the peak hours. So it helped us build the case for simple solutions with huge impact.

Then we came up with a funny way to measure it: if you have done a ride with a 20 percent supplement, it's as if 0.2 rides happened in a new city that we just founded, called Surgetropolis ("surge" being the technical name of the thing). The cool thing is that just in the first week in full production, we had generated over a hundred and twenty thousand virtual rides in value for our drivers. To earn that, they would typically need to work over a hundred thousand hours and drive 2.8 million kilometers.
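The feedback loop described above (per-hexagon nowcasting plus smoothing) can be sketched in a few lines. Every number here, the wait-time threshold, the step, the smoothing factor and the cap, is invented for illustration rather than taken from the real system:

```python
def next_multiplier(current, wait_times, threshold_s=300,
                    step=0.05, alpha=0.3, lo=1.0, hi=2.0):
    """One nowcasting tick for a single hexagon.

    wait_times: pickup waits (seconds) observed in the last 20 minutes.
    If riders waited longer than the threshold on average, nudge the
    multiplier up; otherwise let it drift back down. Exponential
    smoothing keeps prices from jumping around between ticks.
    """
    if wait_times and sum(wait_times) / len(wait_times) > threshold_s:
        target = current + step   # riders waiting too long: raise price
    else:
        target = current - step   # demand easing off: lower price
    smoothed = (1 - alpha) * current + alpha * target  # no casino feeling
    return min(hi, max(lo, smoothed))

# Four successive ticks for one hexagon: two busy, one calm, one empty.
m = 1.0
for waits in [[400, 380], [420, 500], [100], []]:
    m = next_multiplier(m, waits)
print(round(m, 2))
```

Note that the rider is part of the loop: a higher multiplier suppresses orders, which lowers the observed waits, which pulls the multiplier back down on the next tick.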
So we just created that value out of the blue, purely by balancing the marketplace.

Another idea, and this one gets a bit more complicated. All of us ride-hailing apps love our matching systems. The idea is super simple: when there is a ride, which driver should receive it? This is actually where we bring a measurable impact to the world. Compared to hailing taxis on the street, ride-hailing apps can see the entire city and make decisions closer to optimal: a typical taxi driver is only busy 30% of the time, while our drivers are busy on average 55% of the time, so we come pretty close to doubling the productivity of the overall system.

How did that happen? Well, before there was a data team, we were doing something smart but probably overly simple: take the rider who has been longest in the queue, and if there is some driver available within X kilometers (those radiuses were set manually by local teams), assign that driver. We called this the greedy approach, and it's fairly easy to understand why doing things this way can lead to suboptimal decisions.
I will show an example later on. What we did, over several iterations with experiments and so on, was this. First, instead of considering straight-line distance, we try to use ETAs, because in a city like Rio de Janeiro, with big mountains in between, you can be very close on the map and yet one hour away from the pickup point.

Second, we replaced the manually set radiuses. This is what happens when companies grow big: before we killed this, we had ended up with something like 18,000 manually defined rules on what the radius should be in certain areas of a city, for certain products, at a certain hour, on a weekday, on a weekend. It was absolutely impossible to test.

And the other thing we did: instead of taking the greedy approach journey by journey, we solve the assignment problem with the Hungarian algorithm, which is hundred-year-old math. With the greedy approach, depending on whether we were lucky or not, we might match riders and drivers one way or another just depending on who ordered first; if we consider the entire city instead, we can be mathematically sure that we make the optimal assignment, which gives much shorter pickups.

Again, we measured this for real. Our experimental setup was: for 13 minutes in a city we would have matcher A working, then for 13 minutes matcher B, repeated over a long period of time to avoid any seasonality effects. What we measured is that our pickups are now shorter and faster: around 30 seconds faster on average. At our scale (we are bigger now, but at the time we ran the experiments), that already meant saving 25,000 to 30,000 hours of driver work per week going to pickups.
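A tiny worked example of why the greedy, first-come-first-served assignment loses to solving the whole city at once. The ETA matrix is made up, and brute force over permutations stands in for the Hungarian algorithm, which finds the same optimum in polynomial time:

```python
from itertools import permutations

# Hypothetical ETA matrix (minutes): eta[i][j] = driver i to rider j.
eta = [
    [1, 2],   # driver 0
    [2, 9],   # driver 1
]

def greedy(eta):
    """Assign riders in queue order, each to their closest free driver."""
    free = set(range(len(eta)))
    total = 0
    for rider in range(len(eta[0])):
        best = min(free, key=lambda d: eta[d][rider])
        total += eta[best][rider]
        free.remove(best)
    return total

def optimal(eta):
    """Global optimum by brute force over all driver permutations; in
    production the Hungarian algorithm solves the same assignment
    problem in polynomial time."""
    n = len(eta)
    return min(sum(eta[d][r] for r, d in enumerate(perm))
               for perm in permutations(range(n)))

print(greedy(eta), optimal(eta))  # 10 4
```

Here greedy gives rider 0 the shared closest driver and leaves rider 1 with a 9-minute pickup (total 10), while the city-wide optimum sends each driver the other way for a total of 4.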
So what I just explained is a little more complicated, but at the bottom of it there is nothing but hundred-year-old math; we didn't put in any machine learning at this stage whatsoever, and we were able to gather significant impact.

There are interesting aspects to this approach, basically around the ETAs. Consider that at any moment we might have thousands of requests and thousands of available drivers, and we have to compute the matrix of ETAs between all of them. So we basically became a goldmine for commercial route providers: at peak we may be requesting 2,000 to 3,000 routes per second, and the problem is that this scales with the square of the size of the business, and for some reason our CFO is not super happy about that.

Also, commercial routers are not exactly what we are looking for, because I don't just want to know how long it takes to drive from A to B. What I really want to know is: for a driver currently moving along a certain trajectory, if I offer him a ride, it will take him some time to accept it, set up the GPS and all of that, so how long will it take him to reach the pickup point? Those are significantly different questions. For example, I should never assign a ride to a driver who is about to enter a tunnel, because he probably won't be able to accept it and I am just wasting everyone's time.

So the challenge we faced was: can we build our own ETA estimation system? I will tell you how, because we found some pretty cool things along the way. As I said, we are heavily dependent on ETAs. For example, we need an ETA to show a price to the rider: nowadays everyone expects a decent ride-hailing platform to give an upfront, guaranteed price. You don't want "from 10 to 15 euros", you want "12.25". That means we have to
make a very accurate prediction of how long the ride is going to take. As I explained, the assignment problem is the biggest consumer of these ETAs, and we also need them for simulation and experiments. For improving the matcher, we can run experiments as I told you before, but if we are tweaking a parameter, there is no possible amount of time in life for us to test whether it should be 0.2, 0.4 or 0.5. For that we need simulations, and to make a simulation realistic we need to know how long it would have taken a driver to get to a given location.

The idea is simple, because what we have is a lot of data about how long it takes to go from A to B: basically, all the time our drivers are moving, they are feeding this system. There is a catch: we won't know which route was taken, because at evaluation time in the machine learning problem we will only have an origin and a destination. The other catch is that we don't have a map available. Digital mapping is one of the most expensive things you can build; I don't know the exact investment, but what Google has put into Google Maps is probably over a billion by now. Some of our competitors have gone for building their own maps; we are far from there yet. Also, we wanted to distinguish between the assignment ETA and the origin-to-drop-off ETA, for the reasons I explained before.

So we took a bunch of rides and started doing what any of you would have done: OK, let's extract some features. The latitude and the longitude; the Euclidean distance; the Manhattan distance; the Haversine distance; whatever basics you want. The target, of course, is the time the ride actually took in real life. And, to the surprise of no one, it didn't work too well, right?
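As a sketch of that naive baseline, here is the kind of per-ride feature vector it amounts to: raw coordinates plus a few straight-line distances. The coordinates are rough Madrid landmarks, and the "Manhattan" proxy is a simplification of my own, not necessarily what the team computed:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def features(o_lat, o_lon, d_lat, d_lon):
    """Naive per-ride feature vector: raw coordinates plus distances."""
    hav = haversine_km(o_lat, o_lon, d_lat, d_lon)
    # Crude "Manhattan" proxy: the two axis-aligned great-circle legs.
    manhattan = (haversine_km(o_lat, o_lon, d_lat, o_lon)
                 + haversine_km(d_lat, o_lon, d_lat, d_lon))
    return [o_lat, o_lon, d_lat, d_lon, hav, manhattan]

# Example: roughly Puerta del Sol to Retiro Park, Madrid.
print(features(40.4169, -3.7035, 40.4153, -3.6845))
```

A regressor trained on vectors like these knows nothing about roads, traffic or the time of day, which is why the baseline error is so large.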
But then came the magic. If we do the basic thing I explained before, every journey provides us with just one data point. But actually we have a lot more information: every five seconds the car is telling us where it is, where it is moving, what its trajectory is. The idea is that, instead of using just a ride's origin and drop-off, we use all the cells along the way, and then something quite magical happens: these cells behave statistically very much like words in a phrase. The same way you have common words in English or Spanish or any language, we have very common cells, typically train stations, transportation hubs, airports. And just as natural language processing leverages the proximity between words, so that because "Madrid" often appears next to "city", NLP learns that Madrid might be a city, we can learn that two cells tend to be connected if there is a major road going through them.

The crazy thing, when the person on the team who thought of this told me about it, was that he himself said: yeah, that's crazy, that will never work.
What happens if we apply to this problem techniques such as word2vec or, more generally, embeddings? We have millions of cells, but we could have a lower-dimensional representation that captures most of the variance, and therefore understand how the city is built. Then, instead of feeding the neural network a million-entry vector full of zeros, we feed it something a bit more interesting to learn from. That is basically what this slide shows; those of you who work with machine learning are probably familiar with the technique: you train another neural network and cut it, so that you get a low-dimensional representation of the high-dimensional space you are trying to study.

And the cool thing is that it works. What we are presenting here (I don't have a laser pointer) are the cells that are closest, in the sense of the lower-dimensional space learned from the trajectories, to a certain cell in the middle of that cluster. As you can see, the system has understood the structure of the city: it marks a lot of cells as close along the highway, and also along the other highway that goes northeast. Basically, it got all the major roads as being close, in the sense of the trajectories of the rides, to the cell we were looking at. And of course there is that thing in the upper-left corner that made us think it hadn't worked: why are those cells close to the cluster when they are clearly disconnected from it? Can anyone guess what is there? It's a tunnel, right?
There is a tunnel from the cell next to Duque de Pastrana; it goes under the railway station that is there. So it's pretty cool that the system understood the structure of the city, and remember that I never fed it any sort of map: it simply inferred the map from the trajectories.

And what is really cool, beyond giving me samples that look good in a talk, is that it works very well. Our naive model, the one we used to have, had a mean absolute error of 300 seconds; now we are at 190. That is remarkable, because among commercial providers, the best one we found, Google Maps, was making 180. So, without putting billions into it, or camera cars on the street (we put a lot of cars out there, but not that kind), we got very close.

This is an example of a route, and these are its estimations over an entire week. Our system learned by itself that there is a rush hour in the mornings and that on weekends everything is calm; it basically understood the entire pulse of the city. And there is one more cool thing we built on top of this to help scale the matcher up.
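To make the cells-as-words idea from before concrete: treat every ride's cell sequence as a sentence and learn a low-dimensional embedding from cell co-occurrence. This toy substitutes a co-occurrence matrix plus SVD for an actual word2vec training run, and the cell IDs and "roads" are invented; still, cells along the same road end up close in the embedding even though no map was ever provided:

```python
import numpy as np
from itertools import combinations

# Toy trajectories: each ride is a "sentence" of hexagonal cell IDs.
# Cells A-B-C lie along one invented major road, D-E along another.
rides = [
    ["A", "B", "C"], ["A", "B", "C"], ["C", "B", "A"],
    ["D", "E"], ["E", "D"], ["A", "B"], ["B", "C"],
]

cells = sorted({c for ride in rides for c in ride})
idx = {c: i for i, c in enumerate(cells)}

# Co-occurrence within a ride stands in for word2vec's context window.
cooc = np.zeros((len(cells), len(cells)))
for ride in rides:
    for a, b in combinations(ride, 2):
        cooc[idx[a], idx[b]] += 1
        cooc[idx[b], idx[a]] += 1

# Truncated SVD gives the low-dimensional representation; word2vec
# would learn an equivalent thing with a shallow neural network.
u, s, _ = np.linalg.svd(cooc, full_matrices=False)
emb = u[:, :2] * s[:2]

def dist(a, b):
    """Distance between two cells in embedding space."""
    return float(np.linalg.norm(emb[idx[a]] - emb[idx[b]]))

print(dist("A", "C") < dist("A", "D"))  # True: same-road cells sit closer
```

Feeding a downstream ETA model these dense vectors, instead of one-hot vectors over millions of cells, is what makes the cells usable as input at all.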
Using the system, we can build our own isochrones and replace the manual radiuses I told you about before. Studying the manual rules, the only pattern we could find that all local teams were applying was that they wanted drivers not to take more than X minutes to pick up; but because they had to express that as a distance, the rules read like: in the peak hour the radius will be shorter, late at night the radius will be larger, and in this area, because there are highways, the radius is larger. All of those insights we were able to rebuild with this system, which, as you can see, captures the pulse of the city: late at night the isochrones cover most of the space. It's also pretty cool that it catches the highways: that's why the isochrone is not round, because if you are on a highway you will reach its center much faster. This was a super cool thing we did very recently: simply using ten-minute isochrones makes our drivers more efficient than the 18,000 manually defined rules we used to have.

Now we can start building complexity on top of this huge simplification, with things like opportunity cost: if you are in the suburbs, you will probably stay available for a while, because there is not much demand there, so we might use a bigger isochrone; in the center of the city, the opposite, since we are getting requests all the time.
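Once the ETA model gives per-cell traversal times, an isochrone is just a shortest-path frontier; no commercial API is needed. In this sketch the grid and times are invented, and the cheap middle column plays the role of a highway, which is exactly why the reachable set comes out elongated rather than a round radius:

```python
import heapq

# Hypothetical 4x4 grid of cells with per-cell traversal times (minutes).
# The cheap "1" column acts like a highway cutting through slow blocks.
grid = [
    [3, 3, 1, 3],
    [3, 3, 1, 3],
    [3, 3, 1, 3],
    [3, 3, 1, 3],
]

def isochrone(grid, start, budget_min):
    """Cells reachable from `start` within `budget_min`, via Dijkstra
    over the grid (replacing manually set pickup radiuses)."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > best.get((r, c), float("inf")):
            continue  # stale entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nt = t + grid[nr][nc]
                if nt <= budget_min and nt < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nt
                    heapq.heappush(heap, (nt, (nr, nc)))
    return set(best)

reach = isochrone(grid, (0, 0), 6)
print(sorted(reach))
```

Swapping the static grid for time-of-day traversal times from the ETA model would reproduce the behavior described above: wide isochrones late at night, narrow ones at rush hour.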
So in the center, it doesn't make any sense for a driver to take seven minutes to pick someone up. And the cool thing here is that commercial APIs do not provide this, or not at the scale we are interested in, so it's really something we are super happy with.

And here is the slide full of logos of the tools we used to build all of this; I think all of you are very familiar with them. We don't get any discounts for saying this, but we are very heavy users of Google Cloud, TensorFlow and friends, and we are quite happy about it. When you play with cutting-edge technology you know you will suffer, because for many things you will be the first one doing exactly what you are doing; but we are happy with how things are progressing, the speed of progress, and how fast they solve the issues.

What we now want is to bring all these prototypes to production. At Cabify, production means these requirements: we have to be up all the time, because people's livelihoods depend on us; the scale is high, we should be able to answer a hundred thousand requests per second; the accuracy has to be no worse than 90% of the commercial accuracy; and the system should auto-retrain.

So the reason we give these talks, in the end, is that the current data team is great, but it needs to grow to make all of these things happen. These are our current numbers, and I think I mentioned it in the beginning, but I repeat it because I know it's catchy.
We also have a development center in Brazil, and the plan for next year is to triple the number of data engineers and double the number of data scientists. If any of you is up for the challenge, or knows someone who is, you can always look at cabify.com/jobs, but you can also email me; I'm a friendly guy and we can bypass a lot of the bureaucracy. That's what I wanted to share with you today. Now it's time for questions.

Q: As I was introduced, I'm the guy who is always asking questions after talks. My question: obviously building your own maps is very expensive, and Google has invested tons of money in it. Have you considered using OpenStreetMap?

A: Yes, we have. In the early days of Cabify we were of course using OpenStreetMap, because we had no money, right? What we found is that in Latin America the data is really poor. We are actively contributing to improve it, but mostly in terms of places, POIs, because mapping itself is a serious business that typically requires local authorities to be on board, and we have way too many issues to sort out with local authorities before going there. But we would love to; the project is awesome.

Q: It's nice to hear that you contribute to the project, thanks. Another question, since we have time, and thanks for the talk: have you thought about using a graph database, Neo4j or something like that, for the maps?

A: Yes, certainly. The thing is, as I explained, we don't have maps.
We don't have road segments; we are just using cells. The other thing is that for most of the use cases here, graph databases are really slow. Just to give you an idea, I don't know how many of you are familiar with Michelangelo, the centralized feature store architecture by Uber. We, and our friends at Gojek, which is basically the Cabify of Indonesia, found that the thing we had to change in that architecture was not to keep the features in Cassandra or Bigtable, but in Redis, because of how fast we need our features to be updated. For example, the ETA between a driver and a rider changes super fast. So it's not that we lack database problems, we have as many as you can imagine; the core of Cabify is built on CouchDB, which is really cool when you are a small startup, but we are suffering from it too much. I'm digressing, though: for this particular thing, we don't have a problem that I think databases would solve.

Q: Thank you for the presentation. Just a question out of curiosity: you mentioned that you have the information about the trajectories, but are you considering other variables as well, such as the weather or the time of day?

A: Those things are already implicit in what we feed the model; our system already knows. That's what I mean about being pervasive in people's lives: I don't need to know that it is raining, I just know that I have a thousand cars in that area and they are moving much more slowly. We haven't found any synergies using external data, because we have so much of our own, and my strategy there is: first, let's get as much value as possible from the data we have, and then we can start considering external data, such as GSM data for anomaly detection;
I'm talking about football games and that kind of thing. But yeah.

No more questions? Last chance: five, four, three, two, one. OK, a big round of applause; it's been a real honor to have you here.