Hi. First of all, thanks to the organizers for giving us the opportunity to share a little bit of what we are doing with data at Cabify, back to the community. So hopefully you can avoid some of our mistakes and take some of our lessons. The title of the talk is "One Journey Wiser." This is simply because our approach in the data team at Cabify is that every time we bring someone from A to B, we should learn something, so that future riders and future drivers have a better experience. That's the motto of the team. Because we are in Madrid — when I do these presentations abroad this is not so common, but we are playing at home — probably most of you know what Cabify is. We are a ride-hailing company, we are a unicorn, and we operate in Spain and in Latin America. Let me tell you a little bit about what you probably don't know. You probably know Cabify, but as you can see I'm using this multi-mobility framing, because we thought it might be boring to be doing just ride-hailing, and therefore a couple of years ago we started expanding our efforts. Now we not only have the classic black cars; a significant fraction of our volume is with taxis in Latin America, and the fastest-growing vertical we have is electric moped scooters and e-bikes under the MOVO brand. And of course, because that was not enough, we also entered the fintech space with Lana, which targets the problems of our users, especially in Latin America. The data strategy I'm presenting is in principle the same for all verticals, but because ride-hailing is the biggest portion of what we've done so far,
this talk will be ride-hailing focused. Probably what you don't know — and it helps understand why we are doing all these things — is that our vision is that technology can have a transformational effect in cities. That is the corporate vision, and when we go to the mission, as you see there, it clearly highlights that data and technology have to be the way we get that improvement in the cities where we operate. So we have this at the very core of the entire organization. Another thing you probably don't know, or maybe you know but haven't realized, is that despite being a unicorn, most of our competitors are 20 to 50 times bigger than us. So for us, standing on the shoulders of giants — this thing they say about science, that we have to build on top of what other people did before — is mandatory. We cannot reinvent wheels; we cannot afford that. So the way we envision providing a better experience to the people on our platform is that we think automating decision-making is the way to go. Google is probably the best company in the world at organizing information, not because they have the best librarians, but because it is the algorithms within Google that have learned, and keep improving, ways to organize and synthesize information. That's the kind of approach we want to bring here: I want to be the best in transportation not because I have the best classic transportation professionals, but because my systems are learning how to move people from A to B in a more effective way. This is when I joined the company, in mid-2017, a couple of years ago.
This was me, all lonely, and we had a couple of data engineers, and — because it's an operations-based business — we already had a comparatively large business intelligence team; you know, we need to run operations. As of now, this is where we are; as you can see, the team has increased significantly. And this is where we plan to be in one year, and as you see, all these spaces are opportunities. Just an idea: take this page and bookmark it, and maybe once a month you can check if there is something for you there. (I'm still waiting for the pictures on this one.) Okay, I'm going to tell you how we build smart decisions, and first, a huge disclaimer: the first thing I'm going to show is how Narnia looks — how things would be if everything were wonderful. But we know things are not wonderful, right? So I wanted to say this from the beginning. The first thing we do is dimension the problem or the opportunity we are trying to tackle. We go from anecdotal evidence — "you have this many drivers angry because this didn't work" — to something we can measure and follow the evolution of: is it getting worse, is it getting better, what was the real impact of this on the business? And of course we can consider it either a problem or an opportunity: if you are an optimist you will see an opportunity, if you are a pessimist you will see a problem, and we need both profiles in the projects we end up doing. Then, if we see that the problem is worth it — because, as I said, we don't have as many resources as the others we are competing with — we prototype a model that solves the problem, or solves it up to a point. In this phase we validate the prototype against either cold data or what we call "in shadow" mode, which is something kind of cool: we listen to production, but we don't make decisions in the marketplace; rather, we write down what we would have decided. The reason for that is that, by definition,
I cannot affect any real user, so I can take bigger risks, right? Then comes the production-ready phase: I had this prototype — a poorly written script in Python or whatever, but with a super valuable insight — and I want to scale it so that I can serve all the users on the platform, and here, for sure, real engineering is needed. Hopefully this finishes, and then we go to the inference part, which is basically: we need to establish a causal relationship between this thing we built and check whether it fixes the problem it tried to fix, or brings the value we expected from the prototype. Basically, this is done by experimenting — treatment and control, A/B testing, and all of these things you are probably familiar with. But this is basically how it should work. In order to ensure we do it right, this is a bit of what we do and how we do it: we are copy-pasting Airbnb, and I say this with pride, because there is a post I completely recommend, "Scaling Knowledge at Airbnb." We are doing the same thing at Cabify. It basically revolves around the open-source tool Knowledge Repo, which, if you haven't tried it, I highly recommend. It focuses on providing quality, reproducibility, consumability, and discoverability for the research that we do. How do you provide quality? Python notebooks — love them, amazing, very good. Reviewing them? I hate it. Like, how do I provide feedback on a Python notebook? We tried Colaboratory,
we tried many things, and none of them did we like at all. And then we found this approach, which is: get a tool that, from a non-reviewable Python notebook, creates reviewable Markdown. Then you can jump on top of tools like GitHub or GitLab — the things developers have been using for 15 years to give each other feedback — and use the whole idea of pull requests, merge requests, and so on. Reproducibility: what we do with Knowledge Repo is that anything we publish in the repo has version-controlled notebooks and frozen data. Frozen data, because the cost of this is like ten cents a month — ten US dollar cents. So why wouldn't you freeze the data of this prototype that you did, right? We are living in the cloud age, everything is so cheap, so let's make sure that the data scientist or data engineer one year in the future can reproduce this thing that we did here. Consumability: we are already an organization of over a thousand people, so communicating the things we learn to all stakeholders is far from trivial. Our idea here is that all of our deliverables have a hundred-word TL;DR and a key figure; then they have the structure of an executive summary, which cannot contain code; then a scientific report that allows you to reproduce the work, and there you have the code. And then discoverability: we now have over a hundred and fifty posts in the Knowledge Repo. So how does a person who works in rider growth get to the things they need? What we are using is tags and subscriptions, and actually this is a feature we added on top of the Airbnb repo — the engineer who built it is seated over there. It basically means that if you subscribe to the "rider" tag, from then on everything published in the repo under that tag will get to you at your @cabify.com email, which is also the tool we use for authentication. And again, this is Narnia.
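Going back to the quality point for a moment: the notebook-to-Markdown step can be approximated with the standard library alone, since .ipynb files are plain JSON. This is only a hedged sketch of the idea (the function name is made up, and the real Knowledge Repo tooling does much more):

```python
import json

def notebook_to_markdown(nb_json: str) -> str:
    """Turn a .ipynb payload (plain JSON) into reviewable Markdown:
    markdown cells pass through, code cells become fenced blocks,
    and cell outputs are dropped so diffs stay small and readable."""
    nb = json.loads(nb_json)
    fence = "`" * 3  # triple backtick, built indirectly so this listing stays valid Markdown
    parts = []
    for cell in nb.get("cells", []):
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            parts.append(fence + "python\n" + source + "\n" + fence)
    return "\n\n".join(parts)
```

The resulting Markdown can then go through an ordinary pull request or merge request — exactly the 15-year-old review workflow being reused.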
This is how it would be if everything were wonderful. Now I'm going to get real, and I'm going to get real by telling you a data story about a problem. Our last product is Smart Rider Engagement, and it starts with a question. We are pretty sure we spend lots of money on retention efforts every year — ride-hailing is probably the most subsidized industry in the history of mankind, and I measure that in terms of billions raised from venture capital; guess where they end up, right? So we had this idea, two years ago, or a year and a half ago, that this is worth doing for real. Then we start problem dimensioning. Hypothetically, each of these squares is tens of thousands of dollars, and these are different kinds of retention efforts that we do; this is one market over a period of time. Then we repeat this for all the markets over the same periods of time, and we start getting some stories. Among many other things, we find that we have a very fragmented ecosystem, that we don't have a joint approach to retention, and that almost no intel is being generated. Remember, the title of the talk is "One Journey Wiser." We realized that by giving discounts to people we were just getting poorer, because we were not learning. I mean, there were very few insights, after millions in investment, about what the most effective way to retain a user on a ride-hailing platform is.
We were not generating that knowledge, and we thought we could do something about it. Because the opportunity — I think you will agree with me — is worth it, we then start model prototyping. And I'm going to insist on this idea a lot: we build the simplest model that works. That means that, as data scientists or data engineers, we know we could build better models, but we know they would take longer and bring additional trouble. So we go step by step; we prove every hypothesis — I remember the previous speaker saying something similar along these lines. We verify that every step we take moves us in the right direction. Thus, the first step has to be a baby step. The model of choice here is BG/NBD, which is fairly common if you have been doing retention studies. It basically fits on just this. Imagine the kind of data that we have at Cabify about a user — we are a cloud-native company where everything you do is stored next to your user ID — and yet we decided to go with just the purchase history, in days, of the customer. We could feed the algorithm with just three features and four parameters, the simplest way to fit a model, right? And we realized that it actually works pretty well. The output of the model is the probability that we will see you again — we call it P(alive) — and you can see it as the opposite of the probability to churn. But really, because we are not a subscription-based business, it is hard for us to define what churn is. You haven't used me in one month, in three months — why do I have to make that arbitrary choice to define it?
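For the curious, the heart of this model really is tiny. Below is the standard closed-form P(alive) expression from the BG/NBD literature in plain Python; the three per-user features are frequency, recency, and customer age, and the four parameters shown as defaults (r, alpha, a, b) are illustrative values from the literature, not Cabify's fitted ones:

```python
def p_alive(x: int, t_x: float, T: float,
            r: float = 0.25, alpha: float = 4.0,
            a: float = 0.8, b: float = 2.4) -> float:
    """BG/NBD probability that a customer is still 'alive'.

    x   -- frequency: number of repeat purchases
    t_x -- recency: customer age (e.g. in days) at the last purchase
    T   -- customer age now, i.e. time since the first purchase
    r, alpha, a, b -- the model's four fitted parameters
                      (illustrative defaults, not real Cabify values)
    """
    if x == 0:
        # No repeat purchase yet: the model has no dropout evidence.
        return 1.0
    odds_dead = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_dead)
```

With the same purchase count, a user whose last ride was yesterday scores a much higher P(alive) than one who has been silent for weeks, which matches the curve behavior discussed in the talk.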
So we went this way, and as you can see, the model basically learns. This is a particular user, this is time, and these are the moments when the user rode with us. You can see that any time they ride, we increase P(alive), the probability of seeing him or her again. And also, the longer they have been retained on the platform, the smaller the slope — because if I have managed to bring back someone who had been away for a month, I don't worry too much if they're gone for 15 days; maybe they just don't need my service that often. Of course, I can discretize this space and start building actuators that do things, or do not do things, when different users cross those thresholds. And of course we have the classic bad-news guy: apparently he loved us and then left for good — a very, very bad story there — or people who use us very infrequently but are loyal. And because we can do this for every customer, we can draw the distribution of a certain city, compare it with the distribution of another city, and see whether we have a healthier user base in one market than in others. Then, remember, we go to: okay, let's experiment. As you can imagine, the experiment here is pretty straightforward: I keep some people in a control group and some in a treatment group, I go through the model, and I give discounts to those the model shows have a higher potential of leaving. And of course, in the first attempt we get something like this: a one-point-something percent improvement — really not significant. When you do this several times, you end up with what you actually wanted. Here I'm plotting time, and here profitability of the user. Of course, because I am giving discounts, my treatment group at the beginning has less profitability, because I am investing in them. But I am proving that, over time, the treatment group has more profitability than the control group; therefore, I have return on investment on retention. Okay, and now comes the fun part. We did this with scripts, for two cities, during two weeks — meaning people were manually running the scripts, someone in CRM was manually launching the campaigns, everything you can imagine needs to happen for this. And then: okay, how do we make this automatic? That's when things get a bit crazier, right? Fortunately, we had a proper data lake and data warehouse by then, and a talented group of data engineers to help us build these pipelines — super cool, very good-looking — to process the data. But it really takes time. Again: one simple model in production is much more effective than a million amazing models as prototypes. So keep things simple, because simple things are simpler to prototype and to productionize, and along the way you can nevertheless generate value, right? We tried to be agile, and I think we kind of achieved it: for four months, some engineers were coming to the office on Monday at 9 a.m.
and they were manually launching the scoring process and monitoring it. We were building the visualization in Tableau so the marketing team would know what we were doing, and we started with the people affected. And in the end — I think we released this just a month ago — we finally have the interface we always wanted to give to our local marketing teams: they can simply log in, say how much money they have for retention this month, and then everything happens magically. But we had already been delivering value for over a year. Of course, because we were launching manually, we couldn't serve all markets, so we said: okay, you can choose three markets, the ones that are most critical at the time, and for those we did things manually. It was also super useful that the person who would eventually use this interface had to launch things manually every Monday at 9 a.m., because then he or she has a huge incentive to get it done, right? You are basically automating yourself. But when we ran the post-mortem, we realized that, yes, the problem-dimensioning and model-prototyping cycles are fairly fast — mostly because we have Knowledge Repo and a proper warehouse and data lake — but getting them production-ready, not so fast. And that's probably a conservative estimate. So I think it's pretty clear where our bottleneck was. In terms of learnings: we had massive impact, we learned a lot about how to do retention in ride-hailing, and we were able to move from proof of concept to real impact on customers very fast. But among the bad things, we had a very complicated process for translating things from Python scripts into real production-ready code. We had a problem with the model infrastructure: it was very hard to know which version of the model was in production versus which prototype was in the repo. And probably worst of all: if I had to build V2 of the model, it would not be much faster than V1 — which is bad. So we realized we needed to be one platform cooler. Let me introduce you to Lykeion, Cabify's machine learning platform. Why a machine learning platform, if, as I said before, we shouldn't build many things on our end or reinvent wheels? Because we realized we had problems we could solve all at once. The first goal is pretty straightforward: I would like data scientists to be able to push machine learning models to production without talking to any engineer — as a goal, as a dream. I would like to close the gap between training data and prediction data, because part of the problem between production and prototype is that the version of the data that will actually feed your model in the execution environment is not yet in the data lake — and debugging that makes you want to die. Monitoring model performance is something that, for every automated decision-maker we built, we had to do all over again: explain to a new set of engineers what train versus test is, what the area under the curve is — we had to start from scratch with every engineering team implementing a new machine learning model. We also wanted to ensure that models are reproducible, because we are really giving them the power to decide things in our marketplace. I have more or less convinced my boss that dynamic pricing is good, but for sure he can challenge me on why we set this particular price at this particular time, and I have to be able to give some sort of answer — at least what would have happened if this variable had been three instead of two, right? Also, we are deploying many models. To be honest, we are not yet deploying thousands of models, but the way we are going,
that is where we are heading. And this is something we need now: our models sometimes have to answer thousands of prediction requests in a second, and that is not something you can achieve with just one script. So part of bringing things to production means scaling up to these levels, and that was part of the challenge. In a nutshell, we are building Lykeion to accelerate experimentation, because we want data scientists to iterate as much and as fast as they want, and — remember our 37-week productionization cycle — we were not achieving that before. Now I'm going to tell you a little bit about the platform. I won't get too technical: just the two big parts and why they are there. The first is a feature store. How many of you are familiar with the feature store concept? Okay, not many — therefore the explanation. When you grow as a company — at least in our case — and you go all microservices and all distributed, a simple aggregation like "how many rides has user 123 done with me?" is far from trivial, because of duplications and all these things that happen in distributed systems. So it's amazing that, if I have counted the number of trips per user for the SRE model I previously introduced, and a data scientist is building a fraud model, they can simply feed on this previous calculation from the other project, right? It sounds simple, but those of you who have been in a large enough ecosystem can probably feel the value of this, because calculating the features is, many times, over 80 or 90 percent of the job. Once we calculate these features, they carry value by themselves, even before any decision-making system uses them — and this has actually happened to us. The SRE model outputs something like how much you are going to consume in the next month.
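The point about shared feature definitions can be made with a toy example. Everything below is hypothetical — the week rule and function names are illustrative, not Lykeion's API — the idea is simply that every consumer of "trips per user per week" goes through one definition, so the number can never disagree between systems:

```python
from datetime import datetime

def week_key(started_at: datetime):
    """The single shared rule: a trip belongs to the ISO week in
    which it STARTED, regardless of when it ended."""
    iso = started_at.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)

def trips_per_week(trip_starts):
    """Aggregate trip start timestamps into per-week counts,
    always via the one shared week_key definition above."""
    counts = {}
    for ts in trip_starts:
        counts[week_key(ts)] = counts.get(week_key(ts), 0) + 1
    return counts
```

If the driver-facing app and the bonus system both read this one count, a "20 rides on screen, 19 in the bonus system" situation cannot occur.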
So even before the discount system was fully production-ready, the team building the customer-support tools was already feeding from here: when you complain to Cabify because you have some sort of trouble, they know whether or not you are a top customer, and they can treat you accordingly. So this part actually started providing value before Lykeion was completely ready. We have different storages for different purposes — and that is deliberate — but the data is the same, so there is no difference between training data and prediction data. It just happens that the data in production is served from Bigtable. Bigtable is Cassandra-like — one of those databases where, if you give them a key, in a few (at most a hundred) milliseconds you have a value. BigQuery is a data warehouse, the kind of database where you can say "give me one million users and these four features" and you will have that in 20 seconds. So these are two very different use cases: BigQuery we use to train and develop the models, and Bigtable we use to serve them in production, but we ensure the data is the same in both. And because we do this, we by definition have a consistent way of calculating features. Remember the super simple example I gave you before: how many trips has this user done this week? Okay, a trip belongs to a week if it started in that week, not if it ended in that week. This might sound stupid, but when you are telling a driver on the screen in the app that he has done 20 rides, and the bonus system is telling him he has only done 19 and therefore the bonus doesn't apply, then you start appreciating the value of calculating things consistently — and calculating them just once. That's something else we got from the feature store. And then, probably, the real magic: okay, I have a model and I want to bring it to production — how does that happen? We have several pieces. There is a repository of models — as I'll mention later, MLflow is the thing we're using — where a model is a chunk of bytes: this is my model, here is the file, you could have it on a pen drive, right? So there is the model; there are Airflow jobs to train, to evaluate, or simply to promote to production; then there is a management layer that makes sure that, if you update the model to a newer version, it deploys to as many containers of the model as needed; and then there is a serving layer where you basically get an API front — maybe behind it there are a thousand containers, but they are abstracted away from you — so you can query that endpoint as much as you need. And the model trainer service — remember, the biggest challenge is how we ensure that data scientists know exactly what's in production. Actually, we started building our own solution, and two weeks later we realized MLflow existed, and we basically erased everything we had done: let's go with the standard, because we are not here to maintain extra things. The key thing about MLflow is that it is the contract between engineering and science on what's in production. That's the best way I found to word it. So, a real example with the SRE model we saw before: someone hits the predict API saying they want an SRE prediction for Spain — because maybe we have different models trained for different markets — from version four, for this particular user ID. What this does is make sure that the model actually exists and has a deployment over there. The other thing is: okay, SRE at this version has these three features — remember the features we saw before: frequency, recency, and age — and it gathers those from Bigtable in the feature store. So in the end you are hitting a container somewhere that has the version of the model you wanted, with the features that were just fetched from the feature store, and out of this you get a prediction of how likely we are to see user 123 again as of today. And then, how does it work when you are a data scientist and you want to bring another feature into the model for a new version, because you have realized it will increase accuracy? You go to the feature manifest, you write that you want, say, a rider conversion rate, you put in a description so other data scientists can reuse it in their models, and you define how to calculate it — as of now we support SQL queries, and we plan to support non-SQL as well. Then you have a new version of the model, which you publish into the repo; you train the model with the new feature; hopefully this training confirms what you saw in your notebook, that it increases accuracy; and if that's the case, you hit deploy: we kill all the containers with the previous version of the model and deploy as many containers with the new version as needed. That's basically
how we do it. So, just wrapping up, the take-home messages: that's the way we do smart decisions at Cabify. For this part, we were able to basically reuse something that someone else in the world had already done for the same problems we had — and we were happy about it. For this part, we needed to build our own little thing, and we truly expect Lykeion to move Cabify to the next level. Before coming to the talk, I had a conversation with some of the senior leaders in the team about some of the models we have built. For example, SRE — the one we have been discussing in this talk: we know it has brought over five million dollars in return on investment yearly, but it took 37 weeks to bring to production, and they are fairly confident that on Lykeion — and this is what we consider success for Lykeion; remember, Lykeion is still being finished — we could do it in two weeks. Rosewood is a model to fight fraud, basically credit card fraud; it is saving over one and a half million a year in chargebacks that we are no longer having. It took us a really long time — it was the first machine learning model we built at Cabify, which is probably why it took so long to bring into production. Or Cabify Maps, which was basically my talk at this conference last year, if you want to learn more about it: we show a deep neural network the trajectories of our cars — it never sees any map — and based on this training we ask the network to estimate how long it will take from A to B at this time on this date. It's pretty amazing at capturing the city, so we don't need to call commercial maps APIs.
So we are saving four million a year in, basically, Google Maps bills. And this is the kind of thing where you can see that, even if we are being overly optimistic here and those numbers are actually double the reality, we will still have a huge increase in the speed at which we move forward. So yeah, that's all I wanted to share with you today. Thank you for listening. I will take questions in Spanish, or after the talk, or if someone feels like English, sure.

[Audience] I assume that your infrastructure — Lykeion — is pretty much dependent on the Google Cloud infrastructure, or not?

As of today, yes. As I said before, we do not like to reinvent things. But we made sure that, for example, if we had to move to AWS for strategic reasons, we know the equivalent of each of the pieces, so it wouldn't be that bad. And actually, we changed our cloud provider last year in two and a half weeks, which is pretty impressive.

[Audience] Second: at the end of the day, the models are being served from a container, from a pod. I assume they expose a predict method. How do you build that container — using Seldon, or how do you create it? I assume it is optimized.

It is optimized, and I know the input is MLflow and the output is a Kubernetes deployment, but if you are interested in that, I think there are people in the room who can have a better discussion with you, because I really don't know the details. Anyone else? Okay, I'll be around.