OK, we're going to get started. My name is Jules Damji. Good morning, and thank you very much for coming to my talk. Now, some of you probably attended yesterday's talk on MLflow by Matei Zaharia, my esteemed colleague and co-founder of Databricks. So some of this is going to be a bit repetitive, but I believe repetition is good for you, you'll agree — it's part of reinforcement learning. So some of the slides might be repetition, and then we'll actually get into the more detailed aspects of how to use the MLflow API. So this is what our schedule is going to be like. I'm going to talk about the challenges of the machine learning problems that most of you are actually working with. What are some of the challenges? How is it different from, say, doing software development and engineering in the normal sense of the word? And then I'm going to talk about how MLflow tries to address this. Note that MLflow is not the first one, but it tries to address it in a very unique way. Then I'm going to talk in detail about the components that comprise MLflow. And then finally, we'll actually get into a demo where I talk about how you actually build neural networks using Keras and Python to create a baseline model, which is normally what you do when you're comparing models and running experiments, so that we know the baseline model gives us enough accuracy; and then we want to somehow tune the parameters, and how do you actually do that using the MLflow API to experiment, and then compare some of the models as we go along. And then I'll end with a bit of a roadmap.

Now, how many of you are actually Python programmers here in this room? Brilliant. Anybody use Keras? TensorFlow? Lovely. So what we're going to be doing is building three models using Keras and then looking at how we can actually tune the parameters. But the important part is not so much learning how to do neural networks, but how MLflow can actually help you in tracking the experiments as we move forward.

Now, you heard yesterday that machine learning is actually complex, right? It is not something that is the same as your normal software development cycle. There are a lot of different steps involved. There are a lot of different stages involved. And it's a very iterative process, which means that you have to repeat certain things. And it's not like you're just using one set of tools — you're using a myriad of tools. You're using different kinds of parameters. You're doing things at scale. And it's imperative that you have the ability to reproduce certain experiments. It's very imperative for you to be able to somehow export a particular model. So that in itself introduces a number of complexities which make this task very complex.

Now, this is the quintessential cycle of repetitive things that you actually do in machine learning. You've got the data preparation part, you've got the training, you deploy, and then eventually you go back to the raw data and to monitoring the particular model. Now, each stage has its unique challenges. For one, in data preparation you will have loads of tools, unlike software development, where you might just have one database, you might have just one streaming engine from which you're actually extracting data and doing some feature engineering.
But over here, because we have multiple sources of data today, coming from different directions, you want to be able to use the best of the state-of-the-art data sources. There could be a SQL database. There could be Kafka. There could be a streaming engine. There could be Flink. There could be Python Pandas that you want to use. And each of these actually has a different configuration that you've got to worry about. So that's stage number one: you want to use the best of the tools, and you want to be able to say, I want to use a particular algorithm of choice that is conducive to the problem I'm trying to solve. So that's challenge number one.

The second challenge is that each of these has different requirements in terms of how you actually tune. Especially for those of you who are familiar with machine learning algorithms, you have the ability to do hyperparameter tuning. You have the ability to change some of the parameters that actually affect the metrics. And as you explore what some of those tuning parameters are, that can actually move the needle in terms of getting two or three percent more accuracy, or a drop in your validation loss. That accuracy actually makes a huge difference in your revenue and also in the confidence in your prediction model as you put it into production. So challenge number two is that you've got tuning parameters, and how do you actually track all these parameters? You might use five different parameters, and eventually, if you're happy with the eventual outcome of that particular result, you want to be able to reproduce that. You want to be able to track that somehow. So that's challenge number two: there are different kinds of parameters and you've got to worry about that.

The third thing is scale. We are in the zeitgeist of big data — in other words, big data is now a nomenclature that we see every day, and things happen at scale. And I would contend that the more data you actually have, whether you're using deep learning frameworks and deep learning models or whether you're using traditional machine learning models, the better your chances of good accuracy, because your trained model can actually be evaluated on data that it hasn't seen. And if you have a large validation set, then you can have confidence that whatever I've trained, I can actually get fairly high accuracy on the unseen data. So having a large set of data, at scale, is important. And scale is important not only at one stage of the machine learning cycle — scale is actually important throughout, because after you have done the training, you're going to deploy, and once you've deployed, you actually put it out into real production and you're going to be monitoring how your model is actually working. You might incorporate some of the new features that come in, because data changes, features change, and you might have to retrain the model. And so this particular cycle actually repeats itself. So challenge number three is that you've got to worry about scale, and that's an important part of it.

The next thing is, once you're done as a machine learning developer, you're going from feature engineering to training.
And now you're going to hand off the model — the model exchange — to your ops guy, and he or she is going to take that particular model; and you want to ensure that the model you actually hand over is the one you actually experimented with, the one you actually tested and the one you were happy with, so that he or she can actually deploy it, and can monitor it and then report on it. So model exchange is an important part in this entire predictive cycle.

And then finally, we are living in an era where GDPR is now an imperative, especially in the EU and also in the United States, where your private data is something very personal, and a lot of companies have had problems either selling data or inadvertently exposing your personal data. And so it's very important, when you deploy a particular model, that you have accountability, you have governance, you have provenance. You want to make sure that when you deploy the particular model, you have the ability at any point, at any time, to be able to query that: who ran the particular model, when it was run, what data was used, and so on and so forth.

So collectively, these are the kinds of things that people doing development in the machine learning life cycle encounter daily. And people build their own ad hoc, one-off internal tools to track that: they might do it in a spreadsheet, they might do it in a JSON file. And then they have to somehow create a report to hand over — these are the tuning parameters I used, this is my configuration, this is where the data came from, this is the code version that I actually used to create that model. All that bookkeeping can actually be taken care of by a tool that allows you to keep tracking.

And when we were doing research, when we were talking to developers, and we were talking to customers to find out what it is they do and how they're actually doing it, it was not very dissimilar to what we wanted to automate. And this was one of the quotes that actually came from a chief data scientist, who said: we develop models on a daily basis, 100 to 200 models. And we have data scientists who are R programmers, we have data scientists who are Java programmers, we have data scientists who are statisticians, and ML developers who swear by Python and don't use anything else. We want to be able to create tools so that we can actually deploy all of them. And it becomes like a religious war over which one we should actually use. And so in order to do that, we had to somehow come up with a model, come up with the strategies, come up with the scheme that actually addresses some of these things.

Now you might ask at this particular point: Jules, what's the problem, right? I mean, if this is such a problem, why haven't people actually solved it? And it turns out that people have. There are large companies out there who actually do this on a very frequent, daily basis. They go out and develop tools. And so we did some research to find out how these big companies do it. Facebook has got an enormous amount of data. They have machine learning models that they deploy on a daily basis — it's not only one model, they have several models, and they do a lot of A/B testing. And how do you actually track all the different requirements, how do you actually track all the parameters, and do that?
So they have their own internal tool called FBLearner, which is fine because it standardizes the way they train the models, the way they deploy them, and how they actually monitor them — which is good, because now we can actually learn something from it. Uber is another one; they came up with Michelangelo, which is very similar in terms of allowing them to do all of that. And so is Google. I mean, when you talk about Google, what do you think about? You think about scale, right? These guys are dealing with enormous amounts of data, and with TFX, TensorFlow Extended, and its serving infrastructure they can actually deploy various kinds of TensorFlow models, not only on one architecture, but also on your mobile phone, even on a Raspberry Pi, surprisingly. And all of these meet those requirements.

The only problem over there is that while it's true that they give you the ability to do this, they're limited in terms of the choice of algorithms you need, the choice of frameworks that you need, the choice of programming languages, because they will adopt the one that's commonly used within their company, the one that suits their stack, and they're tied to the company's infrastructure. So when you leave the company, what happens? You lose that intellectual property.

And I remember one particular anecdote. When I was running a meetup in San Francisco, I had a guy from Facebook, a data scientist, who gave a talk on how he was doing A/B testing and calculating sentiment analysis on the comments, right? When you look at a Facebook comment, you say like, or you might comment, and they want to find out: how do you actually do sentiment analysis on this? And he gave a very eloquent presentation on how he actually does that. And at the end I asked him, you know, what was the underlying library you actually used? Did you use Spark MLlib? Did you use this particular algorithm? And he was dumbfounded. He couldn't answer the question, because, you know: "we actually have an abstraction layer. All I do is create all these high-level declarations in my file. I say I want to use logistic regression, I want to use this particular function, here's my data — go ahead and create a model for me, and then run the model and give me the results." So that's a high-level abstraction. Now if he left and went somewhere else, he wouldn't be able to do that. Same thing with Google's Borg, which does provisioning at massive scale. When people left Google and went to Twitter, they couldn't provision things, because Borg didn't exist there. It was a platform that was very specific.

So we decided: what is it we can actually learn from this? Standardize things, but do it in a very open manner, so that when you leave the particular company you don't have to worry about losing that intellectual property — we can do that in an open manner. And one of the ideas behind doing it in the open actually came from the Apache Spark community, because the founders who created MLflow are also the founders who actually created Apache Spark. And they learned a lot from the fact that doing things in an open manner is where the innovation happens, because if you look historically at how open source has actually changed the world, innovation happens in collaboration. It doesn't happen in isolation. And that was the principle behind what we decided to do: to keep this in an open manner so you can actually do that very easily. And thus came MLflow. And MLflow was created with a couple of things in mind.
One was that we wanted to have the ability to run any algorithm that you actually need, any particular framework you want to use, any language, so you don't have language wars. We wanted it to be able to run in any particular environment — locally, because a lot of developers would run things in PyCharm or they would use an IDE, and at the same time run it on a cluster as well, at large scale. We wanted to make it very simple. We wanted to make it easy to use. And I think developers are where the power is — they are the new kingmakers. You provide them simple APIs, you provide them modularity so they can build things on their own, and you really win their minds and hearts. And we always wanted to keep developers first. And that was the design philosophy.

And the design philosophy was very simple: API first. If you look at history, at how APIs have actually changed how you build something — look at Unix, for example. The C API changed completely how people started writing stuff. You could write shared libraries, you could write window managers, and they all ran on the same platform. Why? Because there was a C interface to the kernel, there was a C interface to the application layer, very well-defined signatures in C. Then you could actually write those applications. Take Java as an example. Java came in the late 90s and it completely exploded across the world. Why? Because of the APIs. Developers love the APIs. So we wanted to stick with the idea that you give them the APIs, make them simple, make them modular. So that was the first design philosophy. That was the principle and the requirement we had there. And this allows people to be able to build these things using the APIs, and REST APIs as well.

The second part was that we wanted to make it modular, in that we didn't want this particular tool to be monolithic, right? We wanted to have modular tools so they can be used distinctly and independently, or they can be used together to create an entire MLflow structure. So that was the whole idea behind that: keep it distinct, keep it less monolithic, and keep it simple. And the whole idea is that if you have something this decoupled, you can actually build things like that. That's how Unix was built: you've got the shell, you've got all these different commands that you can stick together, and now you have a full-blown application. And so that was the principle and the proposition behind MLflow. And that's how we actually went forward and did that.

And if you look at the MLflow components, there are actually three very distinct components that can be used independently or that can be used together, right? So MLflow Tracking gives you APIs that allow you to track all the hyperparameter tuning, allow you to track and persist all the parameters that you use, all the models that you actually use. And so that is a very simple API: you take the API, you track with it. The second thing was Projects. And a project is a way for us to encapsulate your entire existing project, with your model. And that project can become a unit of execution in any particular environment, because we want it to encapsulate everything. Think of a project as a Dockerfile, right? A Dockerfile has all the dependencies that you actually need, and once you deploy it as a container, you can run pretty much anything.
So that was the idea: just like Docker, the project is a very simple way to express those things, and that way we can actually encapsulate all these different models in this particular project. And then finally, the third one is Models. Think about a model as a way, just like a project, to express the model in one particular flavor. The flavor could be a TensorFlow model. The flavor could be a Keras model. The flavor could be an MLlib model. And for us to support all these ML models — if we supported each of them in its own different particular format in every environment, we would actually have an N-by-N matrix of combinations. So instead we used the idea of saying, well, why don't we come up with a convention where we can express the model in a particular flavor? And then that flavor can run in any environment where that flavor is actually supported.

So let's look at tracking. There are a few concepts in tracking that people care about. There are parameters: those are things that you want to track, key-value pairs that you can actually use. Metrics are numeric values that you want to capture. So that might be, say, for example, the root mean squared error that you got and that you want to track. You might want to keep track of the binary loss, you might want to track the validation loss. Those are the metrics that you care about — those are the things that actually move your needle, the things that make a difference in your model. There might be tags and notes: those are, you know, "here's a note as a reminder for me to actually remember something." And the artifacts are things that the model uses. This could be files, this could be a code version, this could be data from where you actually did the preparation, it could be source code, it could be a revision that you actually used, it could be the model you actually saved. Those are the artifacts. And the source is, you know, what was the source, what was the origin of where it actually ran. And then version is obviously the version of the code. So those are the essential things that people want to be able to track, so that when the model is created with this particular experiment, I can always go back and find out exactly where the particular model came from.

And the tracking API is very simple, right? If you're a Python programmer — this is Python code — all you have to do is import mlflow, and then, using a compound statement such as "with" on the current run, go ahead and log this parameter, log that parameter, log this particular metric, and so on and so forth. So it's a very simple API that allows you to do things. And finally, tracking has this notion that I can have a tracking server running locally on my machine, or I can have a tracking server running somewhere remotely on a cluster where I can actually do the tracking. Invariably, if you're a developer, you're going to be doing things locally, and then finally set the tracking to run on a particular remote server, which could be production or which could be dev.
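In rough terms, those tracking calls look something like this (a minimal sketch; the parameter names, metric values, and file name are placeholders, not the demo's exact code):

```python
import mlflow

# Optional: point the client at a remote tracking server; when this is not set,
# runs are recorded in a local ./mlruns directory.
# mlflow.set_tracking_uri("http://my-tracking-server:5000")   # hypothetical URL

with mlflow.start_run():
    # key-value parameters you will want to reproduce later
    mlflow.log_param("hidden_layers", 2)
    mlflow.log_param("epochs", 20)

    # numeric metrics that move the needle
    mlflow.log_metric("binary_loss", 0.31)
    mlflow.log_metric("validation_acc", 0.87)

    # artifacts: files the run produced (plots, saved models, data samples)
    mlflow.log_artifact("training_curve.png")   # hypothetical file
```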
And that can be done in three different ways, right? You can run it in your notebooks, whether you're using Databricks notebooks or whether you're using Jupyter notebooks — if you have MLflow installed as part of the package, then you can just import it and, from your notebooks, go ahead and do the tracking. Or you can have a local application in PyCharm that just says the tracking server is here, go ahead and use it; or the tracking server is not set, so I'm just going to do everything locally, and once I'm happy with it, I can actually run the experiment remotely. Or it can be part of cloud jobs, whereby you might have a cron job that's running this model on a daily basis to do the training and then tracking all the experiments remotely. And that's governed by these two settings: on the command line, you can say, here's my environment variable, go ahead and run this experiment and put all the data on the tracking server; and if this environment variable is not set and I'm running locally, it's going to create an mlruns directory locally and you'll have all the experiments there. Or you can set it in your code, where you're doing the experimenting, by setting the tracking URI to be somewhere else, and now you're going to track there. So that's tracking.

We wanted to take this motivation and take it one step forward. The motivation behind creating projects, as an independent piece and as a way to express things, was the ability to create this project file, in a format called MLproject. And this project format encapsulates what your project is, what your model is, what flavor it is. And that way, this allows us to run this flavor on any particular machine. It makes productionizing a lot easier and it makes it easier to manage. You don't have to create thousands of different project files or different kinds of projects because one is Keras and one is scikit-learn and one is something else. You have one project format that expresses what your project is, and it becomes a unit of execution. And so the idea behind projects is very simple: I can run it locally or I can run it remotely. Think of a project as a Dockerfile, right? I express my mode of execution in my project file, I deploy that in a container, it downloads all the necessary dependencies that I have, and it can actually run. So that's the whole idea behind a project. And it encapsulates three very important things that comprise your model: the code that you actually used to build it, the configurations that you used, and the data — the files where you actually got the data from. It could be a Kafka server, it could be a SQL database, it could be an HDFS path where you have large amounts of data that you used to train your particular model.

And for those of you who are coming from a Unix background, the project file is very simple, right? It could be part of your Git repository, where the name of the MLproject is, you know, the Keras IMDB classifier. And the project file is nothing but a YAML file that says: here are my entry points; when you run this particular project, if I don't give you the parameters, use these default parameters, and use this Python code to execute it. And the conda environment is a conda YAML file, so that if you're running this using MLflow from a Git repository, the conda environment tells you what the dependencies are, what packages you actually need. It's going to download them — it's going to create a Python environment, it's going to create a conda environment and download the packages that you need. And you have a self-contained project running where you can do all of that.
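A minimal sketch of what such an MLproject file might look like (the project name, entry point, parameter names, and defaults here are illustrative, not the exact demo project):

```yaml
# MLproject (hypothetical example)
name: keras-imdb-classifier

conda_env: conda.yaml        # lists the packages this project depends on

entry_points:
  main:
    parameters:
      hidden_layers: {type: int, default: 1}
      epochs: {type: int, default: 20}
    command: "python train.py --hidden-layers {hidden_layers} --epochs {epochs}"
```

Given a file like this in a Git repository, something like `mlflow run <repository-url> -P epochs=30` would resolve the conda environment and execute the main entry point with the overridden parameter.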
So it's a very easy, intuitive way to express what your project is. And you can run a project in a number of ways, right? On the command line, I can say mlflow run, and if somebody has published this project in a Git repository, I just point it at the Git repository and it will download to my local environment. It will ascertain from the conda YAML that I need these packages — I need SciPy and I need Pandas and I need TensorFlow and Keras — it will download those, it will run in that particular conda environment or virtual Python environment, and it will do the experiment tracking depending on where the tracking server is. So it's very flexible, it's very powerful, and it's conceptually very simple to understand. So that's the idea behind projects.

And then the MLflow Models. Now, this was something that initially we had a bit of difficulty articulating, but the way I look at it — and I'm using this analogy over and over again — is that models, just like projects, and just like a Dockerfile, are a way to express what that particular flavor is. Now the model could be a Keras flavor; it could be any of the deep learning frameworks or machine learning libraries that you prefer. And once we encapsulate that particular flavor, you can think of this flavor as a lambda function. Those of you who are Python programmers understand what a lambda function is: you take the lambda function, you deploy it anywhere that's running Python, and you just have one function called predict, and you can use that particular predict function. Or you can have the TensorFlow flavor running in that particular Docker environment, and if TensorFlow is there, it will know that I'm running TensorFlow code: I'm going to load that particular model using the TensorFlow or Keras saved format, and I'm going to execute my predict function and get the particular output. So that's what allows us to do this.

How many of you use Apache Arrow? Anybody use Apache Arrow? So Apache Arrow is very similar, in terms of serializing and deserializing a Pandas data frame to allow applications to interoperate with the JVM. So instead of the JVM trying to understand "I'm going to have an MLlib function, I'm going to have this other function, I'm going to have all these different formats to understand," Arrow is an exchange format: I'm just going to have one format, and the JVM will know that you're actually using the Arrow format to serialize and deserialize. So in many ways it's very similar — you have one convention by which you can actually exchange the model.

And if you look at the model file, it's quite simple. You have the MLmodel file that says when the model was actually created, the time it was created, and what flavor it is. So where I'm deploying this particular model I can say, well, this is a TensorFlow model, so I can use the TensorFlow API to load that particular model the way it was saved, and I can use whatever the model API on the TensorFlow side is to predict, to evaluate, to do whatever I actually want with the particular model. And the second is a very generic flavor, the Python function flavor: it can run anywhere Python is running, as a pyfunc, like a lambda function. So those are the two flavors that allow you to deploy this model anywhere where either you have the ability to run Python, or you have the model targeted at an environment with TensorFlow, Keras, Spark MLlib, and so on and so forth.
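To make that "one function called predict" idea concrete, here is a hedged sketch of loading a saved model through the generic Python function flavor (the path and input are hypothetical placeholders, and the call shown is the one documented in recent MLflow releases):

```python
import mlflow.pyfunc
import pandas as pd

# The saved model directory carries an MLmodel file describing its flavors
# (for example keras plus the generic python_function flavor).
model = mlflow.pyfunc.load_model("path/to/saved/model")   # hypothetical path

# Whatever framework trained it, the generic flavor exposes a single predict()
# that takes a pandas DataFrame; the expected schema depends on the model.
batch = pd.DataFrame([[0.0] * 10000])   # hypothetical vectorized input row
predictions = model.predict(batch)
```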
So I think this is the time when you get to see some code, because I know yesterday there were a lot of high-level slides, so I thought today I would spend the next 15 minutes I have on code. Now, it's going to be quite challenging because it's going to be hard for me to type, but I'll try. So what's my model? You might want to take a picture of this, because you can actually go home and look at the GitHub repository if you miss some parts of it — it has a notebook that lets you do all this, and there's a blog associated with it, and all the code that I'm going to show you today is there, so I don't have to endlessly type everything while I'm talking.

All right, so this is what we're going to do. We're going to create three Keras models. We'll create the baseline model that says, OK, this is my baseline model, and once I have the baseline model I look at the accuracy to see whether it actually makes sense. And then I'm going to use two experiments, experiment one and experiment two, while I'm changing the hyperparameters, to see: how is the needle moving? Do I have better accuracy? Do I have a lower loss? Is my model overfitting? Things of that sort. These are the things you ask when you're actually experimenting, and you want to capture all the parameters involved.

So we'll have a neural network that looks like this. We have an input layer that's going to take the IMDB data as a vector of input, vectorized by Keras as tensors, and we'll have a number of hidden layers — we can experiment with the hidden layers — and the last layer is going to be an output layer with a sigmoid activation function. Those of you who are familiar with activation functions know that the sigmoid gives you a probability of what your classification is, between zero and one. So anything that's above 0.5 we'll call a positive prediction; anything below, a negative one. All right, so let's look at that.
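For those following along at home, here is a rough sketch of the kind of baseline-plus-experiments loop the demo walks through (the layer sizes, parameter names, and metric keys are illustrative, not the exact notebook code; the MLflow and Keras calls follow their documented APIs, but treat this as a sketch):

```python
import mlflow
import mlflow.keras
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_WORDS = 10000

def vectorize(sequences, dims=NUM_WORDS):
    # multi-hot encode the IMDB word-index sequences into fixed-size tensors
    out = np.zeros((len(sequences), dims))
    for i, seq in enumerate(sequences):
        out[i, seq] = 1.0
    return out

(train_x, train_y), _ = keras.datasets.imdb.load_data(num_words=NUM_WORDS)
train_x, train_y = vectorize(train_x), np.asarray(train_y, dtype="float32")

def run_experiment(hidden_layers=1, units=16, epochs=20):
    # dense hidden layers, sigmoid output squashing the score into a 0-1 probability
    model = keras.Sequential([layers.Input(shape=(NUM_WORDS,))])
    for _ in range(hidden_layers):
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

    with mlflow.start_run():
        # track the knobs we turn and the metrics that move the needle
        mlflow.log_param("hidden_layers", hidden_layers)
        mlflow.log_param("units", units)
        mlflow.log_param("epochs", epochs)
        history = model.fit(train_x, train_y, epochs=epochs, batch_size=512,
                            validation_split=0.2, verbose=0)
        mlflow.log_metric("val_loss", history.history["val_loss"][-1])
        mlflow.log_metric("val_acc", history.history["val_accuracy"][-1])
        mlflow.keras.log_model(model, "model")

run_experiment()                                      # baseline
run_experiment(hidden_layers=2)                       # experiment 1
run_experiment(hidden_layers=3, units=32, epochs=30)  # experiment 2
```

After a couple of runs, launching `mlflow ui` in the same directory shows the runs side by side so the parameters and metrics can be compared.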
So what I'm going to do now, if the demo gods are with me — could we switch to my screen? I can't seem to see my browser. Do I have to do anything different over here? Because I can't seem to see my browser. OK, so the demo gods are not with me. So what I'm going to do instead is run this experiment on my laptop. And what I'm doing is, I'm going to create three particular models, one of which is the base experiment. Oh, actually you can see it. Yeah, but I'm trying to go to the system preferences. I actually want this particular screen, you know, the browser, so I can type things in. OK, looks like the demo gods are not with me today and I can't seem to find my stuff. But the whole idea is — and I tried this before — does anybody know how to change the display preferences so I can actually show the demo? I can't seem to get the browser up. I want to be able to see the screen over here for that particular browser, Chrome. That's where I actually have the experiment, so I can show it. No, the browser is here, in Chrome, but I can't see it over here, right? So if you actually go to the mirroring preferences, I should be able to see that over here. Yeah, bring it over here — brilliant. All right, so in the interest of time, I actually have this experiment that we're running already. And what I've done, essentially — I've got nine minutes, so I'm going to run through it quickly. Can you guys see this? You can't see it. Oh dear, OK.

All right, this is unfortunate. I should be able to see it; I can see it on my screen, but I can't seem to get it to mirror. OK, so just to give you an idea — I mean, if you look at this particular code here — I can't even look at that particular thing. All right, what I'm going to do is probably take any questions you have. I think the important part was to actually show you a demo of how to use this. I'm going to be at the Databricks booth, or I'm going to hang around here afterwards, and I can actually show you personally. But the whole idea is that we create the experiments, we launch the MLflow UI to look at the experiments, and we see how we actually log the metrics and how we lower the loss. So since I have some time, I can answer any questions you actually have. Unfortunately, the demo gods are not with me — I can't seem to get the mirroring preferences to work. Otherwise, this would have been an interesting exercise, but I can only show you later on if you stop by the booth. Any questions? Yes.

Hi, nice talk. Thank you. So our stack is based on Python and scikit-learn, we deploy it with Docker, and our format is joblib. Is MLflow compatible with that?

Yeah, so scikit-learn is one of the frameworks, one of the flavors that we actually support. You can save the model as an MLflow model and deploy it as an MLflow model, and it will save it both in the generic MLflow format and as a scikit-learn pickle, whichever way you actually want. So if you're using scikit-learn, you have two choices. You can keep it as a generic MLflow model, or you can use the scikit-learn pickle that it stored. And depending on how you are storing it in Docker, you can load it using the scikit-learn API to load the particular model — it will load the pickle file if you saved it as a scikit-learn model — or, if you just leave it as a generic MLflow model, you can load it and then just provide a Pandas data frame to the model's predict, and you get all your results.
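Roughly, those two choices look like this (a hedged sketch; the model, paths, and data are hypothetical placeholders, not a prescription of how your Docker deployment should load things):

```python
import mlflow
import mlflow.pyfunc
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Train and log a scikit-learn model; MLflow stores it with both the sklearn
# flavor (a pickle under the hood) and the generic python_function flavor.
clf = LogisticRegression().fit([[0.0], [1.0]], [0, 1])   # hypothetical toy data
with mlflow.start_run():
    mlflow.sklearn.log_model(clf, "model")

# Choice 1: load it back with the scikit-learn flavor and use the sklearn API.
sk_model = mlflow.sklearn.load_model("path/to/model")     # hypothetical path

# Choice 2: load it generically and call predict on a pandas DataFrame.
py_model = mlflow.pyfunc.load_model("path/to/model")      # hypothetical path
print(py_model.predict(pd.DataFrame([[0.5]])))
```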
I wish I actually had the demo; it would have been worth it, but I apologize for that. All right, any other questions? Yes.

Thanks for your talk. Are there any plans for the MLflow project to run on Microsoft platforms?

Well, actually, yes. We have the ability to run MLflow on Azure Databricks — it actually runs on Azure Databricks. You can deploy your models on Azure ML, right? And we can also deploy models on AWS SageMaker. So yes, it actually runs on Microsoft Azure ML. With the current version, we provide you the Docker file so you can run on Azure ML, so any model that you save can be deployed on Azure ML as well.

And what about CNTK from Microsoft, which works like TensorFlow — do you have any flavor for that?

So CNTK is not on the list right now. We support the most popular ones, but if you feel that there is a need for CNTK — again, this is an open-source project, and I think it's people like you, really, who actually make the difference. And we want to be able to do that, whereby you just file a pull request, and we're very open. Some of the things that have actually come out — for example, the R API, the scatter plot, how you actually use it — have actually come from the open-source community. And so we're very open. We're still in alpha, at 0.8, and we are very, very receptive to people actually putting in a PR.

And we evaluate the requests on a daily basis, and we create releases per sprint, on a bi-weekly basis. And so if you think that you actually need that, please, please, please file a PR, and I'll be there to take it to the right people and say, OK, we need CNTK because we have people actually using it, right? Because we actually deploy to Microsoft Azure ML, it's not unreasonable to think about CNTK right now. But because TensorFlow and Keras are now very much integrated as sort of a default platform, a lot of people use those. But there's nothing that actually stops you from filing a PR, us approving it, and making that part of it. The more flavors we have, the more open it becomes, and people who come from a diverse set of skills can actually use it. Any other questions? Yeah, we have a question in the back up there.

So if you just go to mlflow.org, we've got documentation over there, and we have the GitHub repository where you'll see all the code examples. And if you actually follow the URL that I can give you, you'll be able to run this experiment, no problem, on your own. pip install mlflow — it's very easy. Import mlflow and start hacking on your local machine. You'll love it, literally.

Thank you for your talk. I just want to ask if you have any plans to include, as a service and not just as an installation, a registry as part of MLflow, and all of those parts.

Good question, good question. I think what you're actually asking is: how do you deal with metadata and how do you deal with governance? And if you look at the cycle that I showed, governance is actually the last part. And on the roadmap that I was going to show right now — but we're out of time — one of the things we actually want to do now is that last stage, which is the provenance and the governance, because I think without that, it's like pasta without garlic; you've got to have that, right, as part of it. And I think we are working diligently to cut releases where we actually have the entire accountability, by saying: I want to register not only who did the experiment, who ran the experiment, and when they ran the experiment, but also whether they were allowed to run the experiment. You know, can I actually go to the registry to find out what the access controls are? So all of those, which are part of the enterprise requirements, are going to be part of it. Remember, we just released 0.8 — it's still in its preliminary stages. And I think it's important right now to get the community galvanized and to start contributing. And I believe, as an advocate — and because the company that I work for were the creators of Apache Spark — that lightning can actually strike twice, because what happened with Apache Spark was that the community actually contributed quite a bit and it really, really has taken off. See what's happened with TensorFlow: you have Google, who actually released it, and the community is actually behind it. And we feel that we can do the same thing with MLflow, whereby we actually get a lot of PRs from you folks. And I think you can contribute and you can take it and help us go to the next level. Any other questions? Yeah, in the back.

Yes, so last year in our team, we started to work with MLeap, with the idea of serializing models from scikit-learn or from Spark — don't worry too much about which technology we're using.
But soon we realized that it was not so easy, because MLeap doesn't support some of the transformers that we have in Spark. Can MLflow help us with this kind of issue?

I think that's a good question. The question is: MLeap has certain restrictions, so what is it that MLflow can actually do? Because, as you noticed, MLeap is one of the flavors that we actually support — from scikit-learn you can store as MLeap. And there are customized additions that you can put into your model. So when you export, when you deploy your model, one of the customized things would be the code and the transformation that you actually employ, that you write, and that can become part of that particular model. And so, yes, we have plans to document that and provide code snippets showing how you can write your custom transform in the MLflow model when you save it. So when you run it, your entry point becomes not just the predict, but the transformation that you actually wanted.

Anybody else? Well, if not, thank you for coming. I'm sorry about the demo, but I'm going to be at the Databricks booth — come by and I can give you a personal demo. Thank you.