So I guess we either go for one ad or directly to the next speaker. Well, it's 4:15, 16:15, so let's bring the next speaker, Eduardo, to the stage, please.

All right, thank you. Thank you, Consuelo. Hello, Eduardo. Hi. So you're the next speaker up. Where are you connecting from? From Mexico City. Nice, nice place, very nice; I was there many, many years ago. So you're going to talk about developing and deploying a machine learning pipeline in 30 minutes with... how do you pronounce it? Ploomber. Perfect. You can start sharing your screen whenever you're ready. Okay, let me try. Can you see my screen? Now, yes, perfect. So I'll disappear. You have 30 minutes; take it away.

Great, thank you. Welcome, everyone, and thanks for being here at my presentation. My name is Eduardo, and I'm going to show a demo of a project I've been working on: Ploomber. The talk is "Develop and deploy a machine learning pipeline in 30 minutes with Ploomber". I'm going to be live coding, as fast as I can, trying to explain as many details as possible. But bear in mind that the objective of this presentation is not for you to become an expert in Ploomber, but rather to get a glimpse of what the experience looks like, so you can use it for your next project.

Before we start with the demo, I want to mention a few things, otherwise I'll forget them by the end of the presentation. The project is open source, so you can check out the code on GitHub; here's the link. If you like the project, please show your support with a star on GitHub. Please also join our community if you have any questions or just want to chat; the link is in the GitHub readme. Or you can reach out to me on Twitter; here's my handle.

Okay, let's start. The first thing we're going to do is create a base project, so I run the first command, ploomber scaffold. We're going to use conda for dependencies (pip works as well), and we're going to create an empty project. That's the first step: it creates a base project with the files we need to get started. I'm going to call this "demo". Then we go to the demo folder and I'll start explaining what this pipeline thing looks like.

A pipeline is just a bunch of tasks: we get some data, we clean the data, we generate some features, we train a model. We usually split this into many small steps so we can modularize our pipeline. The central piece in Ploomber is the pipeline.yaml file, where we declare our tasks, so that's what I'm going to do now: create my first task. I say source, that's where my source code lives; I'm going to store it in the scripts folder, so scripts/get.py. This script is going to get the raw data we need, and it's going to generate two outputs, so I say product. The first one is going to be a notebook. Why a notebook? Because Ploomber executes scripts as notebooks: we develop them interactively, then run them from the command line and get an output notebook. The idea is that if our script generates any kind of charts or tables, once we execute the pipeline we get all of that in a file we can look at. So I say products/get.ipynb. I could also change the format, for example to HTML, but I'll leave it as .ipynb.
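At this point the spec declares a single task; roughly, it looks like the sketch below (the exact paths are my reconstruction from the talk, not verbatim):

```yaml
# pipeline.yaml -- the first task as declared so far
tasks:
  - source: scripts/get.py
    product:
      nb: products/get.ipynb  # executed copy of the script; .html also works
```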
The task is also going to generate some data, so: products/get.csv, that's where I want to save my data. And that's our first task. Now let's continue with the next one. I'm going to be using the iris dataset, and I'm going to generate a feature from the sepal columns, so I'll just call this task sepal. Same idea: source and product. Let's continue with the next task, which uses the petal columns. Same thing. And finally we train a model, so I'm going to call this one fit. I'm going to change its product, because this task doesn't generate data, it generates a model, so I change the name and say model.pickle.

So now we have the basic structure, the basic layout. Next, I ask Ploomber to generate the base files for me with ploomber scaffold. I made a mistake... yes, this should be "product". Okay, let's try again. All right, now we have the base files, and I can generate a plot with ploomber plot. You can see that Ploomber recognizes these files as our tasks: we have four of them. The plot doesn't have any structure yet; that's what we're going to work on. I also want to show the integration with Jupyter, so I'm going to open JupyterLab and start writing the logic for our pipeline. Let's give it a few seconds.

Okay, now let's go to the first task, the one that gets the data. This is something important to mention: I have my pipeline.yaml here, and as you can see, get declares two outputs. Ploomber autocompletes that for me and tells me: this is where you're supposed to save your outputs. So I simply run this cell and I have the information I need. Now the imports: I'm getting my data from sklearn.datasets, so from sklearn.datasets import load_iris. The integration with Jupyter is really nice, because it allows me to work interactively; it makes things much easier than using a plain script. And remember, this is a regular script; it just happens that Ploomber has a plugin that lets us open scripts as notebooks. We rely on the fantastic jupytext package and add a bunch of things on top of it to make this work. So I think I need the frame attribute... yes, load_iris(as_frame=True).frame contains everything I need. I'll save it to CSV using a variable that Ploomber adds for me, product['data'], and I don't want to save the index, so index=False. That's it for our first task.

Now let's continue with the next one, the sepal feature, and I'm going to show something interesting here. We're going to generate a feature, but we depend on the raw data to do so. So I use this special upstream variable and say: I want get as a dependency. I save the file and reload, and you can see Ploomber autocompletes things for me: here is where I'm supposed to save my output, and here is where my input comes from. So I continue working: import pandas, and I read the raw data, the data generated by the previous task, from upstream['get']. Now that I have my raw data, I'm going to generate one simple feature: a new column that multiplies one sepal column by the other, the classic feature-engineering step. A really simple thing, just for the sake of the example.
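Put together, the first script looks roughly like this, in the percent format Ploomber scaffolds (paths and cell contents are reconstructed from the talk):

```python
# scripts/get.py -- a sketch of the first task
from sklearn.datasets import load_iris

# %% tags=["parameters"]
upstream = None  # this task has no dependencies
product = None   # injected by Ploomber at runtime

# %%
# load the iris dataset as one data frame (feature columns plus target)
df = load_iris(as_frame=True).frame
df.to_csv(product['data'], index=False)
```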
What's going on here? Oh, this one is extra. Okay, now I have my new column, and I'm only going to save that one, because I already have the rest of the columns. Same thing as before: I use the variable Ploomber autocompletes, product, because that's where I should save my output. Okay, that's the second task finished. Now let's move to the third one, the petal feature. The code is going to be really similar, so I'm just going to copy a few things. Oh, first I have to declare my dependencies, so let's reload. Okay, now let's add the new feature; again really similar, I'll just copy this and change one thing. I skipped one important step, which is loading my raw data; here, yes. So we load the raw data, we compute the feature, and we save it. Let's do a quick check... everything looks good. Okay, we've finished our third step.

Let's go to the final task, which is fitting the model. Oh, I kind of overlooked an important detail: these are .py scripts, so to open them as notebooks I have to right-click them and choose to open them as a notebook. Now, this final step uses all the previous tasks as inputs, so I declare a list: the sepal feature, the petal feature, and the get task. Those are my dependencies. I reload, and you can see I get everything I need. Let's work on our machine learning model: import pandas as pd. First, let's load the raw data: we read the CSV whose path comes from upstream; the raw data is here. So we have our data. Now let's load the features we generated, starting with sepal.

As you can see, the autocompletion and all these things allow us to really break the work down. What usually happens is that people code in notebooks, real notebooks, and it becomes a real mess. This way we break down that one huge notebook into many small files that we can chain one after another, which helps a lot with organization and maintainability. We can also collaborate better, because people can work on different files without any issues.

Okay, I have everything I need, so I'm going to create one data frame with everything; let's call it df. This is my training set: the raw data, the features I generated, and the target variable. Now let's train our model, just a random forest: from sklearn.ensemble import RandomForestClassifier. And just to show some evaluation charts, I'm going to create a confusion matrix, so I import confusion_matrix as well. Now let's split the data into X and y: for X we drop the target (axis='columns'), and y is df.target. All right, we have X and y, so let's train: create the random forest and call fit(X, y). I'm going to skip the cross-validation part to save time, but in real life you should do cross-validation to evaluate your models, so please don't do this in a real machine learning project. I generate predictions on my training set with predict, and compute the confusion matrix from y and y_pred. All right, we have our evaluation, and we've finished the training pipeline. I'm going to save this and close the notebook. So far, I've only run things interactively.
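Assembled, the fit task looks roughly like the sketch below. The task names (get, sepal, petal), the column names, and the model-saving line at the end, which the talk only adds after the first build attempt in a moment, are reconstructions rather than verbatim demo code:

```python
# scripts/fit.py -- a sketch of the training task
import pickle
from pathlib import Path

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# %% tags=["parameters"]
# declared dependencies; at runtime Ploomber replaces this list with a
# dict mapping each task name to its product paths
upstream = ['get', 'sepal', 'petal']
product = None

# %%
# join raw data and the two engineered features into one training set
df = pd.concat(
    [pd.read_csv(upstream['get']['data']),
     pd.read_csv(upstream['sepal']['data']),
     pd.read_csv(upstream['petal']['data'])],
    axis='columns',
)

X = df.drop('target', axis='columns')
y = df.target

model = RandomForestClassifier()
model.fit(X, y)

# evaluating on the training set is a demo shortcut; use cross-validation
print(confusion_matrix(y, model.predict(X)))

# persist the model declared as a product in pipeline.yaml
Path(product['model']).write_bytes(pickle.dumps(model))
```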
But I want to make sure that my pipeline runs from start to finish, for reproducibility. So I ask Ploomber to run everything for me, from start to finish, with ploomber build, and you can see that it runs things in order: it gets the data, then generates the first feature, then the second feature, and finally trains the model. So we're making sure... oh, I forgot something important. Yes: I trained the model, but I never saved it. Let's go back to JupyterLab and fix that. Give it a few seconds. Okay, let's come back here; this doesn't take long to run, so I'm just going to run everything. Here's where we have to save our model. We declared a model as an output, so we have to write it; that's why Ploomber was complaining. It's saying: well, you told me you were going to save something and I don't see it, so tell me what it is. What else? Oh, I need import pickle. Now I have my model, and I save it to the product path: Path(product['model']).write_bytes(pickle.dumps(model)). All right.

Now we can close this, and I can show a nice Ploomber feature by calling the build command again. I already built most of my pipeline: get ran, and so did the two feature tasks. So if I build again, check out what happens: it only runs fit, because I already have the outputs of the other tasks and I haven't changed anything. Ploomber can skip tasks that haven't changed since the last run. For example, if I run this again, it won't do anything, because I haven't changed anything. This helps you iterate faster on your pipeline.

Okay, we've finished the training pipeline; let's work on the serving pipeline. Let me show the new plot: now that we've established the relationships between the tasks, we get a chart with structure. Before, we had a plot without any structure; now it shows that we get some data, we generate some features, and we join everything to train a model.

Now I want to build a serving pipeline. The only difference between the training pipeline and the serving pipeline is what happens at the beginning and at the end. When we're training a model, we get historical data, process it, and train the model. When we want to make predictions, we get new data, all the new data points we want to predict on, apply the same pre-processing to generate the same features, then load the model and make predictions. As you can see, what happens in the middle is the same, so I'm going to use that fact and reuse the code; I don't want to write my feature-generation code twice.

So what I'm going to do now is create a new file holding what's common to both pipelines; I'll call it features.yaml. Then I'll create another file where I declare my serving logic. Let's go back to the training pipeline: I take out the two tasks that generate the features and put them in features.yaml, since they're common to both pipelines. And to fix my training pipeline, I import that file: under meta, I say import_tasks_from: features.yaml.
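After the split, the training spec looks roughly like this (paths reconstructed; features.yaml holds the two feature tasks as a plain top-level list with the same source/product structure):

```yaml
# pipeline.yaml -- the training spec now pulls in the shared feature tasks
meta:
  import_tasks_from: features.yaml

tasks:
  - source: scripts/get.py
    product:
      nb: products/get.ipynb
      data: products/get.csv

  - source: scripts/fit.py
    product:
      nb: products/fit.ipynb
      model: products/model.pickle
```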
OK, so that's it. Now, for the serving pipeline, I'm going to use our training pipeline as a base and just make a few changes. Instead of getting historical data, we need to get new data, so I create a new script called get-new. I put the name here, keeping the task name the same so it stays compatible with the rest of the code. What else do we have to do? I have to change the last task: it's not going to train a model, it's going to make predictions, so I call it predict, and its product is a data file with the predictions.

OK, now I'm going to parameterize these two pipelines, because when I run the training pipeline I want to save the outputs in one folder, and when I run the serving pipeline I want to save the files in a different folder. So I create a new file, env.yaml, to parameterize my pipelines, and I set out to train; my training pipeline saves its output there. For the serving pipeline I create env.serve.yaml, where out is serve. Now I need to use that parameter in the pipeline: I change the paths of the output files to include the placeholder I just created. I know this is really fast and I'm skipping lots of details; I just want to give you an idea of what the experience looks like. So I parameterized my training pipeline; now the same for the serving pipeline: out... and finally this other file: out.

Let me test this thing. Oh, I missed something: I have to include my model. When serving predictions, we have to load our model, so I say: my model is in this folder, in a file called model.pickle, and that has to be a parameter, so under params I add model. Now, let's keep the pickle file, the one we generated when we ran our training pipeline; I'm just going to copy it and delete the rest, because I want to show you that now that the pipelines are parameterized, I can run ploomber build again. This runs the training pipeline, and you can see it saves everything in the train folder, because we parameterized the pipeline. It's running everything from scratch again and training a new model.

Now that it's finished, I want to do the same for the serving pipeline; we want to test that we can actually serve. Oh, actually, I'm skipping a really important step, which is writing the logic for the serving pipeline, so let me do that. I have to tell Ploomber to use the serving pipeline instead of the training one, so I point it at the serving spec, pipeline.serve.yaml. And I generate the base files, because I don't have these scripts yet: I call ploomber scaffold with the serving spec as the entry point. OK, we got those two files; we can see them here.

Now let's go to JupyterLab and write the logic that gets new data, and the logic that loads the model and makes predictions. For simplicity, I'm going to load the same data; it won't really be new data, because this sample dataset is limited. So again, please don't do this in a real machine learning project; it's just to show how this works. In a real project we would be getting new observations that we want to predict on. I just copied the code from the other task, because this one is going to be really similar.
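Stepping back to the parameterization for a moment: it relies on Ploomber's env.yaml mechanism, where values from an env file are referenced in the spec through {{placeholders}}. The file names and paths below are my reconstruction of what the demo sets up:

```yaml
# env.yaml (training) contains:       out: train
# env.serve.yaml (serving) contains:  out: serve
#
# products in pipeline.yaml / features.yaml then reference the placeholder:
- source: scripts/get.py
  product:
    nb: '{{out}}/get.ipynb'
    data: '{{out}}/get.csv'
```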
So let's assume this load_iris call gives us new data that we want to make predictions on. I have to change this, because we don't want the target variable; as you can see, we only have the raw data we want to predict on. I'm going to save this. I didn't... oh, why isn't it autocompleting things for me? All right, let me see what's going on. Oh, I see what happened: this task name shouldn't be new, it should be get. All right, let's see if this works. Yes, okay: now I have my variable, the one I need to save my output. Now I... oh, I don't have the output folder. I'm missing the serve folder, so I can't save the file, because I only have train. I can actually use the command line to ask Ploomber to run my new code so it generates that folder. Just to show how the command line works, I say ploomber task, then the task name, pointing at the serving spec. This runs the script from the command line and creates the folder for me. All right, that's done; we can ignore this and continue working.

Next, we reuse the previous code, the two files that generate the features, this one and this one. I can simply run my pipeline, because we already declared them in our serving spec, so I do ploomber build. This, of course, breaks at the end, because I don't have the script that makes predictions yet; I have to work on that now. So for the moment I just generated the features, and I can continue with the final step. I can show that I now have the serve folder; this was generated by the serving pipeline.

Okay, let's continue working on this. We need everything from the previous tasks. Let's reload this thing, and I'm going to borrow some of the code from here... not from here, from here, just to make things a little faster. I need this; I'll just make a few changes here. Actually, I don't think I need to change anything; there's nothing important to change. So: we generate the features, and then we load the model and make predictions. You see, we have all the features, and we don't have the target variable, because this is the task that loads the model and predicts. Now let's load the model: from pathlib import Path, and import pickle. We have the path to our model here, in the model parameter, so we read its bytes and call pickle.loads, which returns the model object. Okay, we loaded our model; now we make predictions: model.predict on the data frame. Let's put the predictions into a data frame so we can save them as a CSV file. So let's assume these are the predictions we want to generate; now we save them: to_csv(product['data'], index=False).

Okay, we've finished coding. Let's make sure our serving pipeline actually runs from scratch before we deploy it to the cloud. Okay, it's working. Great. We're done with JupyterLab, so I can close it; let me shut it down. So we're finished with the coding part. We don't need the outputs anymore, just the model, so I'm going to delete the rest. Now we use a second command-line tool: Ploomber helps you develop pipelines locally, and if you want to run things in the cloud, you use this second tool, which we're going to use now.
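For reference before the deployment part: assembled, the serving-side predict script looks roughly like the sketch below. The upstream names, the reuse of the task name get for the get-new script, and the injected model parameter are reconstructions from the talk:

```python
# scripts/predict.py -- a sketch of the serving task
import pickle
from pathlib import Path

import pandas as pd

# %% tags=["parameters"]
# 'get' is the get-new script, registered under the same task name so the
# shared feature tasks keep working
upstream = ['get', 'sepal', 'petal']
product = None
model = None  # path to model.pickle, injected from params in pipeline.serve.yaml

# %%
# assemble the new observations plus their engineered features
df = pd.concat(
    [pd.read_csv(upstream['get']['data']),
     pd.read_csv(upstream['sepal']['data']),
     pd.read_csv(upstream['petal']['data'])],
    axis='columns',
)

# load the trained model and generate predictions
clf = pickle.loads(Path(model).read_bytes())
pd.DataFrame({'prediction': clf.predict(df)}).to_csv(product['data'],
                                                     index=False)
```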
So what I'm going to do is create a new deployment environment. I'm going to use soopervisor: soopervisor add, I'll call the environment serve, and we're going to use the AWS Batch backend. We could also use Airflow or Kubernetes; the experience is pretty much the same, the only difference is the configuration. I'm missing a couple of files that I need, my dependencies among them, so let me get the files: these three, just configuration files, my credentials for the S3 bucket, and my dependencies. That's why the command didn't work. So now, okay, now it worked, and we have this new file, soopervisor.yaml, where we set the configuration for execution in the cloud. These are AWS Batch settings; you can ignore the details here if you're not interested in AWS Batch, and these settings change if you change the backend. I need to get a copy of my repository URL... I have that here. Great, those are my settings; I'm done configuring this.

Now there's one remaining piece. When we run things in the cloud, each of these scripts runs in a different container, completely isolated. So if one task depends on a previous task, we need to pass or transfer the data between them. For that we use an S3 bucket, and I have to configure a client: I add a new entry to the spec pointing to clients.get, and I create that file now, clients.py. There I configure my S3 client: a function, get, that returns an S3Client pointing at my bucket and a folder inside it, with my credentials in a credentials file (a sketch of this file appears at the end of this section).

Okay, I think we're done. Let's check whether this configuration works: ploomber status against our serving pipeline. If this runs, the configuration can successfully connect to S3, which it did, so we're ready to deploy. So what I do now is soopervisor export plus the name of my environment, serve. Okay, let's run this. It loads my pipeline and makes sure it actually works. It's creating the Docker image; that's pretty fast here because I already generated a base image, so it only adds the new code instead of installing the dependencies again, just to make this faster. Now: it verified that the pipeline works and that it can be imported, it checked the configuration against the S3 bucket, it pushed the image, and it submitted the jobs. So that's it: we deployed our pipeline on AWS Batch. I can actually show the console. Ignore these; that's the practice run I did last night, just to make sure I could do this in 30 minutes. You can see the new tasks here; these are the ones we just submitted to AWS. And that's it. We finished, and we finished on time.

All right, Eduardo, thank you so much. Great talk, great tool, great presentation, live coding and everything. I think people in the Matrix chat were really impressed too. Well, we can cut a little bit into the break right now, so I'll ask you one very quick question that people had in the chat: can Ploomber be used without Jupyter notebooks? Yes, yes, you can; you can use it without Jupyter. I have a really strong preference for Jupyter, because it allows me to work interactively, but if you like a text editor, of course, you can use the tool you prefer. Fantastic. Great. Thank you so much, Eduardo.
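As a reference for the deployment part above: the client file described in the demo follows Ploomber's documented S3 pattern, with the spec pointing to it via a clients section (File: clients.get). The bucket name and parent folder below are placeholders, not the demo's actual values:

```python
# clients.py -- a sketch of the File client Ploomber uses to copy products
# to S3, so that the isolated AWS Batch containers can exchange data
from ploomber.clients import S3Client


def get():
    # Ploomber calls this function to obtain the client for File products;
    # credentials.json holds the AWS keys mentioned in the talk
    return S3Client(bucket_name='some-bucket',        # placeholder
                    parent='ml-pipeline',             # placeholder folder
                    json_credentials_path='credentials.json')
```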
Folks, we can... Let's thank all our speakers again for this session. I'll wrap up the chat myself. Thanks.