So, my name is Willem. I'm going to speak to you today about one of the use cases we had at Gojek. Basically, it's a story about clearing tech debt that we had. I lead the data science platform team at Gojek, which is why this is one of the things that we had to focus on. Before we get started, I wanted to give an introduction to Gojek, because I don't think everybody here is acquainted with the company. So, what is Gojek? Gojek is an Indonesian technology startup. We have a variety of products and services, but we're most famous for ride-hailing on motorcycles. Ride-hailing on motorcycles is so popular in Indonesia, it's actually called an ojek, a motorcycle taxi, because of the traffic congestion there. So we launched our first product, which was a ride-hailing product, in Indonesia, and since then we've gone on to launch many other products there. Our aim is to have a solution for every workday problem that you have, whether that's taking you to your office in the morning, buying lunch, or buying groceries for you. We aim to have at least one specific product in each of about 15 verticals: lifestyle, payments, ride-hailing, logistics. We have about 18 products in total at the moment, and we are classified as a unicorn. Just a bit about our scale and reach: at the moment we are in three and a half countries. I'm not sure about Singapore; it's kind of in beta, and we should be launching fully soon. If you just look at Indonesian statistics, on an average day we have hundreds of drivers online at one time and we do more than 100 million bookings a month, and that's one product. Another product is food delivery, where we have 250,000 merchants, making us one of the biggest, if not the biggest, food delivery services in Southeast Asia. Other products we have that are also pretty big are payments: GoPay is one of the leading e-wallets in Southeast Asia as well. So the scale is quite big, and there are not many other companies operating at that scale that are both digital and offline, because we pride ourselves on also being an offline company. So, what does my team do? The data science platform team at Gojek focuses on getting ML into production and reducing the time to market for ML: productionizing ML models with data scientists, and building a generalized platform and tooling for the data scientists so that they can focus more on ML and data science instead of the nitty-gritty engineering details. And if you're starting from scratch, that's often the problem you have: you need to solve engineering problems first. This slide actually captures that quite well. It's a common industry problem that you can easily clone a repository you find on GitHub and get some model up and running locally, but getting that into production can take months. This is kind of a dirty secret, but it's the reality, because you're integrating with, in our case, product teams. So, just a quick outline of what we're going to discuss. We're going to discuss one of the first models that we made at Gojek, which is our driver allocation model, some of the initial tech debt we built up building it, how we ended up clearing that debt, and how MLflow fits into that solution. And then I'll quickly demo MLflow and some of its functionality for you. So, the driver allocation model is actually quite simple.
I think if anybody has taken a ride-hailing trip with Grab or Gojek, you've probably triggered an algorithm or a decision like this: which driver do we send to a customer who wants to go from his location to his destination? And this is not always an easy decision, because it depends on what you're optimizing for. You can optimize for business objectives, or customer experience, or driver experience. There are many different objectives that you can optimize for. This is also a very important model because of the amount of money that's at stake; at Gojek scale, it's a lot of money, so small improvements in the model have a very big impact on the bottom line of the company. When we started the team, this was one of the first models that we had to build. And of course, if you're starting with no infrastructure, nothing really existing, and you have to get something into production, you need to deploy and provision everything. So we had a lot of decisions to make: how are we going to go from raw data to serving these models in production? We could do this manually, of course, but for a production system we actually want to do it in an automated way. So we looked at the machine learning lifecycle and identified the stages that we needed to solve. The first one we looked at was data processing. For that, at the time, Airflow was basically the primary solution, the most commonly used one, and I think even today it's still probably in first place in the open source world. It has a lot of gotchas, which are out of scope for this presentation, but we started with it for data processing and I think we're still happy with it for data processing. In Airflow, what you have on the screen here is a DAG: at the top you define a graph, then you define the specific steps in Python code, and that graph gets executed as a series of steps in which you can transform data. Typically Airflow runs based on some upstream event or a timer, so either a partition is created in another database or you have a daily job that runs, and then the tasks all complete and the graph completes. So we built that, we implemented that, and we needed it to get raw data refined. Secondly, we were looking at how we were going to do our deployments, and we saw that the company was using GitLab, so we could just use GitLab CI, which allows us to deploy our models into production: we can just define a YAML of all the steps that we want to execute to build our model serving application and deploy it into production, which in our case is Kubernetes. Then we were left with the question of how to train our models, and at the time we didn't really think it was worth implementing a custom ML solution to train them. So we looked at our existing stack and said, well, why don't we just use Airflow as one continuous process: we process our data there, we train our models there, and we connect that up to our deployment. Basically how this works is you have raw data, and at some schedule you have your Airflow DAG, a long-running process, being triggered by that scheduler.
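To make that setup a bit more concrete, here is a rough sketch of what such a monolithic, push-based Airflow DAG could look like: one scheduled graph that processes the data, trains the model, and then calls GitLab's pipeline-trigger API so CI builds and deploys the serving application. The DAG name, task bodies, URL, and variables are illustrative assumptions, not Gojek's actual pipeline.

```python
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def process_data():
    ...  # read raw data, compute features (placeholder)


def train_model():
    ...  # fit the model, write the binary somewhere CI can find it (placeholder)


def trigger_gitlab_deploy():
    # Push-based hand-off: tell GitLab CI which model was just produced so it can
    # build the serving image and deploy it. Endpoint, token, and variables are made up.
    resp = requests.post(
        "https://gitlab.example.com/api/v4/projects/1234/trigger/pipeline",
        data={
            "token": "<trigger-token>",
            "ref": "master",
            "variables[MODEL_VERSION]": datetime.utcnow().strftime("%Y%m%d%H%M"),
        },
    )
    resp.raise_for_status()


# One long-running, tightly coupled chain: scheduler -> ETL -> training -> deploy.
with DAG(
    "driver_allocation_pipeline",
    schedule_interval="@daily",
    start_date=datetime(2019, 1, 1),
) as dag:
    process = PythonOperator(task_id="process_data", python_callable=process_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    deploy = PythonOperator(task_id="trigger_deploy", python_callable=trigger_gitlab_deploy)

    process >> train >> deploy
```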
When it's done processing the data, which is your features, and done producing the model and your evaluation metadata and all of those statistics, it sends an API call to GitLab, and then GitLab builds the model serving application. Based on that trigger it knows what model was created, and it deploys all of that, the binary, the Docker image, and the configuration for that model, into prod. So this is basically a long-running, push-based system: each step requires the previous step to notify it of what's coming from upstream, which becomes important for how things can change later. But what we realized was that we'd actually built a very monolithic, tightly coupled process, because you have a single instance of that timer trigger happening at the start, and then all the way through to production you have the models producing all of your metadata, the building, deploying, serving, and all the testing; everything is essentially one long-running process. At first this was great, because our driver allocation system could get into production, but soon the cracks began to show once you wanted to experiment and change things. The first problem was that it's quite inefficient. We have a long-running process like this with a lot of complexity in it, so when you need to make changes, you have to wait a long time for each pipeline to run. Secondly, the steps are dependent on each other: your serving can't deploy without the pipeline triggering it from Airflow, and Airflow can't get anything into production without GitLab being triggered, so those two are very tightly coupled. The second problem we had was that it was extremely hard to experiment, because we still had one long-running pipeline. You have two options: you can fork the pipelines if you want different variants, say you want to add a feature, remove a feature, or try a new model type, but forking would then require multiple pipelines to trigger a single serving application. And if they're running at the same time, which one gets precedence? So basically you need to hard-code into your deployment which model should be the one being served. You get all these complexities arising because of this tight coupling. The alternative is to fan in and fan out within a single pipeline, but then you need to have feature flags in your serving as well. So we knew that experimentation was a problem. The third problem we had was versioning. The problem with versioning is that if your Airflow ETL is producing everything that's going into production, you're versioning based on time, not on instances, and it's actually very unnatural to version a model on time. What ends up happening is that everything is based on the timestamp of the original Airflow run, so you're eventually versioning your model serving application, and even your user clicks, with the timestamp at which that model was created. But that is actually out of scope or irrelevant to serving; it should be something that you can dig into later if you need to. The fourth problem was reproducibility, because our pipelines had side inputs. They make API calls or pull data from GCS or do all kinds of strange things that even we didn't know about, because we allow the data scientists, and we think the data scientists should be the ones authoring these pipelines, while we own the platform on which they run.
But if they're allowed to pull in data from the outside, then we can't have deterministic runs and there's no reproducibility. There's also no standardized way of tracking which artifacts are produced, so each project has its own way of tracking the artifacts that come out of these pipelines. In the driver allocation case, this was also a problem. Visibility was another one of the problems. This is a specific example of how we used GitLab, but it's often the case with CI tools: they are not meant for observability and evaluation of models. In GitLab you have some idea of, for example, the Git tag, the SHA, and who committed, but there's no indication of model metrics or parameters, or of how to compare specific runs or deployments with each other. Scalability was also a problem. One of the issues was that GitLab, because it's push-based, depends on you defining all the destinations where you want to deploy all of your models. So if you're scaling out to many regions, let's say we want to scale out from Indonesia to Singapore, Thailand, and Vietnam, and you have all of these destinations like your test, staging, and production environments, and all of these variants or model types depending on your objectives, your deployment pipeline can get extremely complicated. And the final problem we had was that there was no separation of roles. Data engineers, software engineers, and data scientists all had to work up and down the stack. So you'd have data scientists debugging issues in production after they made a change at the start of the pipeline, and you'd have software engineers fixing code that produces data or artifacts. We knew that this was unnatural, and we didn't know how we could limit this effect. So we sat down and figured out what our desired state was. The first thing we wanted was to simplify experimentation and make it a bit easier. That can be getting something new into production, but it can also be something as simple as changing some features or trying a new library for producing a model. We wanted to be able to easily reproduce results, so we wanted a deterministic way to version pipelines or the code that runs a specific process, whether it's deployments, training, or ETLs, and, based on the inputs and the code that runs, have a reproducible way to run it again and get the same results. Then we wanted to make deployments easy; deployments should be an afterthought. Evaluation was the other thing we really wanted: some way to have an easy overview of both model training parameters and metrics, as well as information about how the models and features are performing in production on real traffic. And then, especially for an engineer, the most important thing for me was how this could be scaled to hundreds or thousands of models, because when you're stuck using tools that force you into a certain design pattern, that can restrict you, and in our case it was restricting us. So we knew we needed to think about what we wanted to implement to scale this out properly. We evaluated some tools, and one of the ones we looked at was MLflow. On the screen is just a simple ML workflow, and it basically starts with raw data; this can be your data lake, it can be an event stream.
You generally take that, you do some processing on that data, some preparation, you train some models, and then you deploy to production. Once that's done, you collect metrics, those go back into your data lake, and the cycle continues. One of the things is that you typically want to experiment with a lot of technologies, and this is one area where, even though software engineering and ML are very similar, they're kind of different: you want to experiment with all kinds of different ways to process data, ways to prepare data like feature engineering, and especially in training you want to try different models; and in deployment you often want to try different serving technologies depending on the actual application. So you might have a specific model being deployed one way and another one going into TensorFlow Serving. So this is basically one of the goals MLflow sets out to achieve. The first was how to allow all kinds of different tools throughout this ML lifecycle. The second thing it sets out to do is to allow you to tune: at steps like data prep, training, and potentially even deployment, you want to change parameters, experiment, and see what the outcome is. Another thing you want to do is scale. At each stage there is actually scaling required: scaling your data lake, scaling data processing, handling very high load if it's a Kubernetes cluster or some cloud service that's serving your model. Then model exchange: basically, how do you transfer or interface between the stages when you have a model? That's not always easy, because it could be different teams at different stages, and if there's no common format for model exchange, then that's something that needs to be defined or it's going to lead to a lot of inefficiencies. And then governance: how do you know that your process is following the regulations you've set out, and how do you have a way to evaluate and observe what's going on in your system, which is often neglected when you're just starting from scratch and trying to get into production. So MLflow sets out to solve these aspects, and it tries to do it in a way that lets you use any kind of tool, basically bring your own data processing, training, and deployment tools and libraries, and on this platform you can run everything from training to serving. It actually consists of three components. An important point to add is that the components let you do this both in local development and in production; there's just a thin wrapper or execution difference when you actually take this into production and run your production code, but otherwise it's identical. The three components are tracking, projects, and models. With tracking, which we'll get to a little bit later, you can record and query your experimental runs: everything from the code, the data, the config, the parameters, the metrics, and the artifacts that are produced during your training runs. Because it's a very generic service, you can actually track any kind of instance of a process; it doesn't have to be ML, but in this case it's built for that use case. The second one is projects, which allows you to wrap any code in a repository with a specific format so that you can reproducibly run that code.
And models is a generalized way for you to deploy models that are produced using MLflow, or really anything that follows its standard interface, with multiple deployment tools. The component we were most interested in was tracking, because of its ability to track what we are training, the models themselves, and metadata about them, and because it would allow us to decouple a lot of our processes. So, the key concepts in tracking are parameters, metrics, artifacts, and your source. The two most important things for reproducibility are the parameters and the source, because with those you can reproduce, based on the source code of your training run and its inputs, what the outputs of that specific process were. In this case, the outputs would be the artifacts that are produced: that can be your models, or data, or any metadata, or even images if you're looking at feature importance. And then, importantly, the metrics, because that's what the data scientists ultimately use to measure how well their models are performing. This is an example of the MLflow UI. You can see that on the left hand side you have some experiments that you've defined, and for each experiment you produce experimental runs, which are all timestamped. You'll see that there's also a version, and this version is the commit on which you executed the code, and then you have parameters, which were the inputs to the run, and metrics, which were the outputs. So this is the basics of what you'd see in MLflow, but I'll have a quick demo of this a little bit later. For our use case this seemed to be quite well suited. So we went back to the drawing board, we looked at what we had built, and we knew that we needed to make some changes. The main problem with what we had built was that it was monolithic and long-running, and there was no real separation of concerns. So we broke the process up into three stages: processing of data, training of models, and then deployments. What we also did is remove GitLab from this whole workflow of ours, because it requires API calls to be pushed to it; it's basically a push-based system, and it's good at CI but not really good at continuous delivery. What we wanted, to allow us to continuously train models at scale, was our own software for running training, and in this case it's not really important what system those models are trained in. So, for example, a data scientist can use Jupyter to train a model, as long as they follow the right approach, structure, and libraries, and their models end up in the right place, which would be a model store downstream. We also have other ways to schedule those jobs, and there are some subtle differences in why we implemented this instead of using Airflow, which I'll touch on later. The second thing we did was implement a CD tool. The change here is that it's a pull-based system that has an idea of what is upstream of it, instead of one that requires that information to be passed downstream, because if it needs to be passed downstream, then your ML pipeline has to push that information downstream. For example, if you want to deploy to a specific region, then the model needs to know that it has to go to that region, but that's not really the focus of the model; it shouldn't actually care about that.
So, now that we have these ways of executing the stages of processing data, which is creating your features, training your models, and deploying, we needed ways in which you can asynchronously change those processes without affecting downstream processes. So we needed a way to store artifacts between the stages. The first way we did that was with a feature store, which is actually open source. It's a way for us to store data that's processed, both batch and streaming, in a single place; it's polyglot storage. It's actually used for training and serving, but in this case we're just focusing on the training aspect. So now somebody can focus on processing data without affecting the training downstream. And then, finally, where MLflow fits importantly into the picture is between training and deployment. With MLflow here, you can train models, and as a data scientist you only need to focus on getting your model into MLflow with the right parameters; that decouples you from the whole deployment aspect of this workflow. So this is ultimately what we ended up with. Some of the advantages of taking this approach: one of the big ones is that ETLs in Airflow are fundamentally different from model training in whatever training system you're using, because model training is instance-based whereas ETLs are generally time-based. ETLs are waiting for a partition to exist or for some day to roll over before they process the data. But for model training, in some cases it's on a schedule, but often you don't want to wait for something to happen; you want to make a change, run a training, and get a result. So if you're using MLflow to log the metrics from that training, you basically train your model, store the metrics in MLflow, and have that available to deployment. This is basically the change you make: you're just wrapping the existing training code in the mlflow start_run method, logging the parameters and the metrics within that context, and finally logging your model out. And that gets stored in the MLflow tracking server, which is what we use. That's all you do. You don't have to have your code deploy the model or trigger anything downstream. And then, finally, our deployment system is artifact-based: it's looking at MLflow, at configuration, at Docker registries, at Helm charts, at all the components it needs, in a pull-based way, checking for their availability. The second advantage is reproducibility. With the feature store and MLflow storing the versions of the code and the inputs that went into them, you can easily trace back from a deployment, back upstream, to what produced either the models or the features that went into production. Typically this is how it would work: you make a change, get your model into MLflow, get it into production, and then see some result. From there you go backwards and see what change you made in training, what change you made in processing the data, or even further upstream. So now, instead of just prod serving and raw data, we have two checkpoints which decouple the processes from each other and, because of the versioning, allow us to track and reproduce results. So, this is one of my favorite slides and it's quite important, because it really shows why MLflow is useful in our specific example.
So, on the left is just the previous slide that I showed you, but the typical flow would be that the feature store is used to train the model, while all the data and the config used to train the model are logged in MLflow. Then you deploy from your CD system into production, and in production you just have the ID of the model. When you get metrics from real traffic, you can actually log those metrics back into MLflow. So you don't just have the training metrics; you also log back against the model the objective you are targeting, which could be conversion rates, click-through rates, or something like that. But you can take that even further, because the feature store and the upstream processes that produce inputs to that model might also benefit from this. So you can even log the feature performance back into the feature store. Now, if there's a data engineer producing features, he can see that the features in this specific section are being used in many models and have a high importance in reaching our objectives. And then role separation: role separation was one of the other things that was really quite powerful about making this distinction. Once you have a decoupling, which is essentially an interface between these different processes, you can have data engineers focusing on producing the features that go into the feature store, you can have data scientists focus only on model training and getting models into MLflow, and the software engineers can, especially in my case, protect the production environment from everybody upstream. And, yeah, my goal is to basically squeeze the data scientists until they can just sit in a Jupyter notebook, and they're very safe there. The last one was scalability. For our driver allocation system, going multi-region was one of the most important things. And if you look at what we wanted to achieve, just with this specific system, it's multiple environments, multiple markets, many different model types because it's multi-objective, and you want to have many experiments. So you're looking at hundreds of simultaneous deployments, and we knew it couldn't be push-based. But if you have a CD system that's pull-based, the CD system is the one that knows which inputs or materials or artifacts should be in the pipeline, and it knows about all the destinations it needs to deploy to. Reversing that process from push to pull unlocks scalability for us, because now we can define all the markets in our configuration, and as soon as there's a new model, the CD pipeline will be alerted of the change in MLflow and it'll roll that change out automatically. So, I'm going to show you just a quick demo of MLflow. Can you guys see this? It's a simple example. What we're doing here is one of the basic MLflow examples, which I want to use to illustrate things. Let me just show you MLflow first; you can see the demo here. We're going to train a basic model, and then I want to show you what you can see in terms of parameters and metrics. In this example, you basically read in some wine quality data, just a CSV, train a scikit-learn model, and then produce that as an artifact to your tracking server. So, you can just run this code.
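The snippet below is a condensed sketch of that demo, closely following the standard MLflow scikit-learn wine-quality example rather than Gojek's own code; the CSV path, hyperparameter values, and the local tracking setup are illustrative.

```python
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Wine-quality CSV as used in the MLflow tutorial (the path here is illustrative).
data = pd.read_csv("wine-quality.csv")
train, test = train_test_split(data, random_state=42)
train_x, test_x = train.drop(["quality"], axis=1), test.drop(["quality"], axis=1)
train_y, test_y = train["quality"], test["quality"]

alpha, l1_ratio = 0.5, 0.5  # the parameters we vary between runs

with mlflow.start_run():
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    model.fit(train_x, train_y)
    preds = model.predict(test_x)

    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", float(np.sqrt(mean_squared_error(test_y, preds))))
    mlflow.log_metric("r2", float(r2_score(test_y, preds)))
    mlflow.sklearn.log_model(model, "model")  # the artifact sent to the tracking server
```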
And basically, the only difference between doing this normally and doing it with MLflow is that you're wrapping your code, as you can see at the top there, with mlflow start_run, and then you have your logging at the end. The rest of the code is identical. So we can produce a couple of these runs with different parameters, and they should show up here. Okay. Now, as a data scientist, you can have a look at this and compare the results of these different runs. Let me give you a quick glimpse of what these columns mean. One of the cool things that MLflow does is that it actually looks at the git version of the code that ran, so that version column tells you the git SHA. If you go back here and run git log, it'll be the same SHA: that's a01d, which is the same one that's over here. And then you have your parameters and your metrics, and as a data scientist you can have a look at them and evaluate the ones you want to. So, I'm just going to quickly show you how the CD process would look. If you're looking at GoCD or something similar, it's actually not that relevant which CD system you use, as long as it allows you to have a pull-based declaration of your inputs; you can use Spinnaker, you can use GoCD or Concourse, there are many tools that let you do this. In this case, we're looking at two things: the MLflow tracking server as well as Git, where Git is our configuration store. Now, we did some new runs here, three new runs, and if you go into MLflow and into one of these runs, you'll see that it has a promoted flag here, a tag. This indicates to the CD system that you want to deploy this into production. Of course, this promotion can be a manual step; it's typically not done from the actual training. Also, when you're on the detail page of an experimental run, you get the statistics of that specific run, so you can look at the parameters and the metrics, the inputs and outputs, as well as the artifacts that are produced. You can drill into the model that is logged; that model is the one being produced over here by the log_model method. And you can have a look at the inputs it requires, the packages and dependencies. Basically any arbitrary metric can be stored here, sorry, any arbitrary artifact. And it also allows you to back this up, or retain the state, in an object store like S3 or Google Cloud Storage or anything equivalent. But what's important about this from an engineering perspective, because I care more about this than about the metrics, that's up to the data scientists, is: how can I get that deployed? So, if everything went well, we should have a new model deployed here. Basically, if you have a look here, there's a seventh one that's being triggered, and the trigger is based on a new version being available upstream in MLflow. MLflow allows you to do this because it has an API that you can just hook up to and scan for new versions with that promoted tag. And then you can even click on this link, go back to MLflow where it's running, see what was actually deployed, and drill into your specific new model. This information can be passed into serving as well, so you can put the specific run ID or experiment ID into what's deployed in production, and then you can tag users or user segments with that and evaluate the model in a loop.
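As a rough illustration of that pull-based check, something like the following could poll the MLflow tracking API for promoted runs. This is an assumption about how such a poll might be written, not the actual GoCD configuration; the tracking URI, experiment name, and tag name are placeholders.

```python
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://mlflow.example.internal:5000")  # placeholder URI
client = MlflowClient()

# Assumes an experiment with this name already exists on the tracking server.
experiment = client.get_experiment_by_name("driver-allocation")

runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.promoted = 'true'",      # only runs a human has promoted
    order_by=["attributes.start_time DESC"],
    max_results=1,
)

if runs:
    latest = runs[0]
    model_uri = "runs:/{}/model".format(latest.info.run_id)
    print("Newest promoted model:", model_uri)
    # A pull-based CD pipeline would diff this run id against what is currently
    # deployed and, if it changed, roll the new model out to all configured
    # environments and markets.
```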
So, to give you a little bit more detail on what the UI can do, I'm going to show you some data that I've partially cleaned of anything important, but that is actual data. You can log any number of columns or parameters; in this case, we have a lot of parameters being logged for our driver allocation model, and the data scientists are looking at a lot of metrics here as well. In this case, not that much time has actually elapsed, but you already have 276 runs. So if you didn't have something like MLflow to evaluate this with, it would be very, very difficult; you'd need to go through it manually, or with some SQL database or some BI tool. So it's very convenient to have MLflow here: you can sort by AUC or whatever metric you're interested in, and then you can drill into that specific run. Let's say you want to see what's important about this model. One of the artifacts that we typically log is the feature importance. In this case I've blurred it out, because I don't want to show you guys what features we're looking at, but clearly there's one feature that's really important. Sorry, Grab people. So this is very useful, because the training pipeline has this information and the data scientists are very interested in it. The point is that you can log any kind of arbitrary file and the UI allows you to expose it. Another interesting function is that you can actually select multiple runs and compare them; you can do a side-by-side comparison between runs. So this is another way to compare specific instances of a run which you have pre-filtered. You get some horizontal scrolling here and you can look at that. But one of the cool things is that you actually get a graph at the bottom here that allows you to look, in a visual way, at the differences between the specific runs. Being human, it's easier to see the outliers this way, and then you can drill into them and see what makes those specific runs unique. You're able to do this for all kinds of metrics and parameters, you can make changes as you see fit, and you can zoom in and look at specific areas. So the functionality here may not seem very different from what you can get with a BI tool, but it's about the decisions they made in the API that they've built: something that's useful for ML but generalizable, and it doesn't restrict the way that you operate. It allows you to use the system both for offline development, when you're just testing things out, and for production systems. Yeah, so I think that's it from my side. Thank you. So, questions, please. I will come up to you and pass the mic if you have questions. Okay, so I'll go around with the mic; just raise your hand if you have a question. I'm really excited that you are protecting production from upstream. I'm still a bit unclear, though, on why you prefer a pull-based model over the push-based model. I think you briefly stated that it's probably because you have many variants that you want to serve to specific countries, and some that you're not going to serve. But wouldn't that be handled by your configuration or something?
Then you would need some way, some other store, where you basically change your push-based system into a pull-based one by having code at the start of your push-based system that goes and asks those stores what the latest version is of what you need to deploy. Do you know what I mean? There's a slight distinction there, as opposed to the CD system polling and knowing what the latest available versions are. So are you also, in effect, using MLflow as a kind of parameter storage service, which it actually provides? Yes, yes, yes. So the parameter storage is actually one of the things that makes it valuable as well. I mean, as an engineer, that's the part that's valuable to me: it's the interface between the two layers, more so than the metrics and artifact storage, which we could solve ourselves, but in this case it solves it well for us. But yeah, I hope that answers your question. Storage is definitely not a problem; it's the least of our problems. Can you ask that once more, because we capture the audio for the recording through this mic. Yes, so we'll repeat the question in this case. Thanks, Willem. My question is: for purposes of deployment, can you logically define a set of models? Often you don't deal with one model, right? You probably deal with a group of models for a particular use case. So can you logically define a group of models and version them as a unit, instead of doing it at the model level? You do it as a group of models that is abstracted out, so basically the mapping points to a deployment unit. Right, so the question is whether you can version groups of, let's say, artifacts, in this case models, together, just as you would at the model or atomic level. So yes, you can do that. In our case there are ways around it; I guess there are better ways, but we basically just sum up all the versions from all of the components, all the inputs, and we produce a unique version from that, basically an aggregate version, and then we use that to track the group together. But you should look at the specific attribute that is unchanging, or that's relevant. For example, if you have multiple runs producing a model with different timestamps but it's an identical run, then that should be an identical downstream version. So I guess it depends on what you're actually producing. In our case it hasn't been a requirement, actually, because the model is the atomic thing that we care about. If you want to ask more questions, feel free to raise your hands, and I'll take one more question here, please. Right, so your question is which other tools we evaluated for model tracking. The other tool we were looking at was ModelDB; I'm not sure if anybody has seen that. I think the biggest difference for us there was that it was focused a lot more on the evaluation side of things, whereas the engineers are making the decision in this case. So we felt like MLflow had made better and more generalizable decisions to support many technologies, and also the API that MLflow has is very lightweight compared to what ModelDB has. So we opted for MLflow because of its API, even though it is lighter weight on the UI side and in the ability to compare between experimental runs. That's the only other one we looked at in that case; when I was talking about different technologies, I think I was also referring to non-tracking tools. For training and for CD, there are a lot of tools out there.
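A minimal sketch of that aggregate-versioning idea mentioned in the answer above could look like this: combine the versions of every component of a deployment unit into one deterministic identifier, so identical inputs always yield the same group version. The hashing scheme and names are assumptions for illustration, not Gojek's actual implementation.

```python
import hashlib


def aggregate_version(component_versions):
    """Derive one deterministic version for a deployment unit from its parts,
    e.g. {"pickup_model": "run-abc123", "eta_model": "run-def456"}."""
    canonical = "|".join(
        "{}={}".format(name, version)
        for name, version in sorted(component_versions.items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Same components and versions always produce the same aggregate version.
print(aggregate_version({"pickup_model": "run-abc123", "eta_model": "run-def456"}))
```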
Okay, keep on raising hands. I'll come over with the mic, and I have one more question here in the back. Right, so we're not using projects and models because... wait, is that your question? Okay, have you looked at model chip? No, I have not looked at that one. I mean, I was just asking whether there are any more features other than tracking that you're using, because I understand serving is also possible, rather than making MLflow an endpoint to whatever you use. In this case, it was primarily the tracking that we were using, as well as the ability to easily log out the artifacts, because it abstracts that away for us and the API is very sane and lightweight. Okay, one question here; I think the mic is just coming to you. My question is about role separation. You mentioned that this is one of the advantages, so I'm just curious: is it the responsibility of the data scientists to introduce the MLflow logic into their code, or of the software engineers? No, that should be the data scientists. The data scientists need to be semi-software engineers in our case as well. That seems somewhat opposite to what you mentioned, that it clearly separates data scientists and software engineers; so now it means that they also have to do software engineering. Well, it depends on what you define as software engineering, because they're writing code, right, even if it's just some Python code. So it depends on what level of abstraction you're working at. I think the difference is that they would be working at a higher level of abstraction and the software engineers at lower levels. And I wouldn't want the data scientists to be deploying Kubernetes clusters and things like that. Okay, keep on raising your hands. Okay, you catch the next one. Okay, this gentleman. I have a couple of questions on the serving side. First: do the deployments need any downtime on the serving side today? You have so many deployments happening; do they require the services to go down, or do they require downtime on the serving side? So, Kubernetes does our deployments. Do our current deployments require downtime? It depends on the service in some cases. Most of the time we're using rolling updates with Kubernetes, so it's not an issue if it's a stateless service. If it's a stateful service that's loading a lot of data into memory, so we have, for example, food recommendations that load a lot of data into memory, that can sometimes have a blip where it's down for a few seconds. Yeah, does that answer your question? That's part of the question. The other part is a bit more involved. For example, before you train a model, you have complex feature engineering that you've done, right? And you have to repeat the same feature engineering on the serving side as well, and obviously you care about your latency, fast response times and so on. So how do you handle that sort of complex feature engineering on the serving side? Do you have some details on how you structure your serving infrastructure to be able to do this? So your question is how we account for feature engineering on the serving side so that it doesn't impact our latencies and our performance, because that's a requirement. I think that's one of the biggest problems that we all face, and we still haven't solved it. But one of the ways that we're addressing it is by moving as much of the data processing upstream as possible.
So, both your batch data transformations as well as your streaming ones. Let me just show you the slide. If you can do all of the data processing upstream, using Airflow or some other system, and also use Flink or Beam or something for your real-time processing and store that in real-time stores, then you pre-compute as much as you can, and you only need to do minimal feature transformations in your application itself. That's what we're trying to get to: not having any feature transformations in serving except some basic data structure manipulation. Sure. In some other cases, I think data scientists prefer to look at, you know, historic data, pre-computed features and so on, right? I think those are the cases where it's difficult to do that on its own, and also to run it on the JVM so that you don't take a hit on the performance side. Right. So, I'm not sure if that was a question, but just to present another alternative: there's a tool called MLeap, which allows you to take pipelines, I believe from scikit-learn, and run them on the JVM. So there are some other tools that are trying to solve this problem. Okay, keep on raising hands; I know who the next one is. Okay. So, sir. My question is: how frequently do you deploy all those models to production? And when new data is generated on the serving side, what is the feedback cycle? What is the latency of that feedback into your model? At some point new clicks or new traffic metrics are generated on the serving side; how frequently does that feedback go back into the models? So the first question is how frequently we deploy to production, and the second question is how frequently changes in our product are reflected in our models. Is that your question? Yes. I'm not sure how much detail I can give on this. We deploy many, many models every day, so at the very least daily; some models will deploy once every three months, it depends on the actual model and what it's trying to do. I don't think we deploy hourly for any project; I don't think we ever get to that scale, because normally the analysis takes a long time and the data scientists don't want to have too many model versions slowing that down. In terms of how quickly changes get applied, changes get applied a lot through the feature store, not so much through the models themselves; that process is slower. So I'd say on a daily basis at the fastest, but in aggregate, many models do. Okay, keep on raising hands. Okay, sir, you're the next one. Thank you for the great content, Willem. A few questions. One is: is it easy to interface MLflow with Airflow? For example, Airflow runs daily, and when the daily batch is done, you want MLflow to kick-start the training process. Second, does MLflow allow for control? For example, if I'm tracking several metrics and the metrics stay above a set threshold, deploy automatically, and if the metrics fall below the threshold, or if there's a problem with the artifact, send me a notification. Can it take care of things like that? So your first question was, can you repeat that first one? How would you interface Airflow with MLflow?
We haven't actually tried to integrate a time-based ETL process with MLflow, but I don't think it will be much different from interfacing any ML system with it. And secondly, in terms of whether you can have alerting or hooks based on conditions in MLflow: I'm not aware of any functionality like that, but I also don't think they should have that functionality there. I think that should be something that's left outside of MLflow. The API that they have built is easy to interface with, so I would rather have MLflow as the central point and communicate with an external service, maybe. Thank you. Keep raising your hands while we pass the mics. Yeah, so thanks for sharing. So actually, I'm quite interested in how MLflow can help in the deployment part, because tracking the models is very useful, but at the same time, when we deploy the models, which model version goes out, and how, is an important issue for us. I believe you have a solution for that problem, but I didn't quite see it. Well, we can just rephrase the question as: how does MLflow help with versioning and solving that? Well, this is a closed loop, right? Once you're live, you need to collect data and then trigger the training again, so it's like a closed loop for updating the model. Right. So connecting that up to training is a different matter. Your question is how we collect feedback from specific deploys, basically evaluate that and monitor the performance of the current model, and how the versioning works, or how MLflow helps with it. In the demo it was just a toy example, but if you have a CD system, you have basically combined inputs; let me see if I can find an example for you. You have an instance, right, produced by a specific pipeline, and the instance is a combination of specific inputs, materials. That can be a Helm chart, your Docker image, your MLflow model. You can actually have twenty models here, because if you're building this pipeline, you're saying: this is my driver allocation system, it requires these six models for six objectives, and the unique combination of them produces an instance, and that instance goes into production. And then, when you serve something to a user, you also tag that specific order, for example, or the user's session, with that instance. Then you have a stream that captures all of these users. That is custom integration, because you're hooking up into a product team with a specific backend, so there's nothing I can tell you there to make that easier for you, unfortunately. But you can log that back and store it somewhere. You can even store it in the feature store if you want to use it somehow, but typically we don't do that; we just store it in BigQuery or some place to analyze it. We don't generally use that in a closed loop, but we're thinking of hooking that straight up into MLflow as a metric, and then you have everything in line with that model, right? So you can just compare the metrics of training and the metrics of production. Okay, cool. So keep the questions coming. You have online and offline learning. So when you're doing online learning, can you have a change data capture, such that you can capture the changes automatically and redeploy the model?
Is that possible in MLflow? So, sorry, just repeat that: online learning. Right, so there are a few types of learning, and one is online. So suppose I want to capture the change in the data automatically and redeploy the model; is that possible in MLflow? No, MLflow does not allow you to detect a change in data and redeploy the model. The key thing here is that you don't want to use MLflow to deploy the model at all. MLflow doesn't need to retain that information; it's just the store, the artifact store, essentially. The downstream services use it as a repository to find the information and then take action, because that's their role, right? It's not MLflow's intention to do that. So you need to have a CDC in between to capture the changes and... Right. That's right. Also, just to repeat, Gojek uses the tracking component of MLflow extensively. There are two more components, projects and models, but we're not really covering them today. So, please raise your hand. Who is the person who wanted to ask a question in this corner? Okay. Any more questions? Anyone? Yeah. Any more questions from the attendees? Please feel free. Willem will hopefully be a returning guest at our meetup. Maybe next time? Maybe, I don't know. This year, hopefully. It depends on the stickers and the pens. Yes, the stickers and the pens are over there. Okay. One more question. Yes. My question is whether MLflow works only with a certain subset of models, where the parameters can be captured in a concise manner. For example, if you're using a deep learning model, the parameter is the architecture of the neural network, so how do you use that in MLflow? So your question is how you use MLflow with models whose parameters or metrics are not easily capturable by MLflow. Or maybe, to articulate it, what type of information do you want to be able to evaluate on that model? Then I can tell you if MLflow can do it or not. I'm referring to the model itself. So, for example, it's a deep learning model. Right. So you can basically log anything. The only thing I'm not aware of MLflow allowing you to log is time series data, but you can log strings or text or even artifacts. So in your case, you might want to log an artifact, which can be an image or a graph, or it can be some text, or any arbitrary binary or data. It's really up to you, as long as there's a way to represent it. If the model is hard to interpret, then that's going to be a big problem, but then you can also just get it into production and see what happens. Thank you. To add to that, I want to point you to the MLflow sample apps, where you can see how the artifacts, the pickled models themselves, get outputted; there is sample code. And MLflow also comes with built-in modules for the most popular libraries and algorithms. So you can take a look on GitHub; the MLflow examples are there, and it's quite easy. Okay, one more person here. Does MLflow change how you do monitoring and visibility on different kinds of things? Do you get more visibility into the specific models that are being deployed? The question is whether we have any more visibility due to MLflow being part of the stack, as opposed to how we previously did it. I don't think it gives us any more visibility, but it allows you to debug problems, because if a new model comes downstream, you'll see it here first, right?
You see new models being deployed, and you know something happened upstream; it allows you to investigate. There's a paper trail. But other than that, it doesn't give you more visibility. But it shouldn't, in my opinion. Oh, no, we have not done that. Any more questions? We actually have a Slack channel, and I believe those of you following the forum should have seen the link to get onto the Slack. We just established it two days ago, so we're there as well, and you can go and ask about any details that were not covered in the live session.